> With this PEP, the unicode object overhead grows to 10 pointer-sized
> words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine.
> Does it have any adverse effects?

For pure ASCII, it might be possible to use a shorter struct:

    typedef struct {
        PyObject_HEAD
        Py_ssize_t length;
        Py_hash_t hash;
        int state;
        Py_ssize_t wstr_length;
        wchar_t *wstr;
        /* no more utf8_length, utf8, str */
        /* followed by ascii data */
    } _PyASCIIObject;

(-2 pointers, -1 ssize_t: 56 bytes)

=> "a" is 58 bytes (with utf8 for free, without wchar_t)

For objects allocated with the new API, we can use a shorter struct:

    typedef struct {
        PyObject_HEAD
        Py_ssize_t length;
        Py_hash_t hash;
        int state;
        Py_ssize_t wstr_length;
        wchar_t *wstr;
        Py_ssize_t utf8_length;
        char *utf8;
        /* no more str pointer */
        /* followed by latin1/ucs2/ucs4 data */
    } _PyNewUnicodeObject;

(-1 pointer: 72 bytes)

=> "é" is 74 bytes (without utf8 / wchar_t)

For the legacy API:

    typedef struct {
        PyObject_HEAD
        Py_ssize_t length;
        Py_hash_t hash;
        int state;
        Py_ssize_t wstr_length;
        wchar_t *wstr;
        Py_ssize_t utf8_length;
        char *utf8;
        void *str;
    } _PyLegacyUnicodeObject;

(same size: 80 bytes)

=> "a" is 80+2 (2 malloc) bytes (without utf8 / wchar_t)

The current struct:

    typedef struct {
        PyObject_HEAD
        Py_ssize_t length;
        Py_UNICODE *str;
        Py_hash_t hash;
        int state;
        PyObject *defenc;
    } PyUnicodeObject;

=> "a" is 56+2 (2 malloc) bytes (without utf8, with wchar_t if
Py_UNICODE is wchar_t)

... but the code (maybe only the macros?) and debugging will be more
complex.

> Will the format codes returning a Py_UNICODE pointer with
> PyArg_ParseTuple be deprecated?

Because Python 2.x is still dominant and it's already hard enough to
port C modules, it's not the best moment to deprecate the legacy API
(Py_UNICODE*).

> Do you think the wstr representation could be removed in some future
> version of Python?

Conversion to wchar_t* is common, especially on Windows. But I don't
know if we *have to* cache the result. Is it cached, by the way? Or is
wstr only used when a string is created from Py_UNICODE?

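PS: the struct sizes quoted above (56, 72 and 80 bytes on a 64-bit
machine) can be double-checked with a small standalone sketch. It does
not include Python.h: the Sketch* types, sketch_ssize_t, sketch_hash_t
and SKETCH_OBJECT_HEAD are hypothetical stand-ins, and it assumes a
non-debug build where PyObject_HEAD is just a reference count plus a
type pointer, and where Py_ssize_t and Py_hash_t are pointer-sized.

```c
#include <stddef.h>   /* ptrdiff_t, wchar_t */
#include <stdio.h>

/* Stand-ins (assumptions, not the real CPython definitions). */
typedef ptrdiff_t sketch_ssize_t;   /* plays the role of Py_ssize_t */
typedef ptrdiff_t sketch_hash_t;    /* plays the role of Py_hash_t */

/* Assumed non-debug PyObject_HEAD: refcount + type pointer. */
#define SKETCH_OBJECT_HEAD sketch_ssize_t ob_refcnt; void *ob_type;

/* Pure ASCII variant: no utf8_length, utf8 or str fields. */
typedef struct {
    SKETCH_OBJECT_HEAD
    sketch_ssize_t length;
    sketch_hash_t hash;
    int state;
    sketch_ssize_t wstr_length;
    wchar_t *wstr;
} SketchASCII;

/* New-API variant: no str pointer, character data follows the struct. */
typedef struct {
    SKETCH_OBJECT_HEAD
    sketch_ssize_t length;
    sketch_hash_t hash;
    int state;
    sketch_ssize_t wstr_length;
    wchar_t *wstr;
    sketch_ssize_t utf8_length;
    char *utf8;
} SketchNew;

/* Legacy-API variant: all fields, data in a separate allocation. */
typedef struct {
    SKETCH_OBJECT_HEAD
    sketch_ssize_t length;
    sketch_hash_t hash;
    int state;
    sketch_ssize_t wstr_length;
    wchar_t *wstr;
    sketch_ssize_t utf8_length;
    char *utf8;
    void *str;
} SketchLegacy;

/* Layout of the current struct, with void* standing in for
   Py_UNICODE* and PyObject*. */
typedef struct {
    SKETCH_OBJECT_HEAD
    sketch_ssize_t length;
    void *str;
    sketch_hash_t hash;
    int state;
    void *defenc;
} SketchCurrent;

/* Print each struct size in bytes and in pointer-sized words. */
static void sketch_report_sizes(void)
{
    const size_t word = sizeof(void *);
    printf("ascii:   %zu bytes (%zu words)\n",
           sizeof(SketchASCII), sizeof(SketchASCII) / word);
    printf("new:     %zu bytes (%zu words)\n",
           sizeof(SketchNew), sizeof(SketchNew) / word);
    printf("legacy:  %zu bytes (%zu words)\n",
           sizeof(SketchLegacy), sizeof(SketchLegacy) / word);
    printf("current: %zu bytes (%zu words)\n",
           sizeof(SketchCurrent), sizeof(SketchCurrent) / word);
}
```

Calling sketch_report_sizes() from a main() prints the four sizes. Note
that "int state" is padded up to a full word on 64-bit machines, which
is why the word counts come out as 7, 9, 10 and 7. The per-string
totals quoted above additionally count the character data and its
terminator, which this sketch does not model.
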
Victor