> Repetition of "11"; I'm guessing that the 2byte/UCS-2 should read "10",
> so that they give the width of the char representation.

Thanks, fixed.

>> 00 => null pointer
>
> Naturally this assumes that all pointers are at least 4-byte aligned (so
> that they can be masked off). I assume that this is sane on every
> platform that Python supports, but should it be spelled out explicitly
> somewhere in the PEP?

I'll change the PEP to move the type indicator into the state field, so
that issue becomes irrelevant.

>> The string is null-terminated (in its respective representation).
>> - hash, state: same as in Python 3.2
>> - utf8_length, utf8: UTF-8 representation (null-terminated)
>
> If this is to share its buffer with the "str" representation for the
> Latin-1 case, then I take it this ptr will typically be (str & ~4) ?
> i.e. only "str" has the low-order-bit type info.

Yes, the other pointers are aligned. Notice that the case in which
sharing occurs is only ASCII, though (for Latin-1, some characters
require two bytes in UTF-8).

> Spelling out the meaning of "optional":
> does this mean that the relevant ptr is NULL; if so, if utf8 is null,
> is utf8_length undefined, or is it some dummy value?

I've clarified this: I propose length is undefined (unless there is a
good reason to clear it).

>> If the string is created directly with the canonical representation
>> (see below), this representation doesn't take a separate memory block,
>> but is allocated right after the PyUnicodeObject struct.
>
> Is the idea to do pointer arithmetic when deleting the PyUnicodeObject
> to determine if the ptr is in that location, and not delete it if it is,
> or is there some other way of determining whether the pointers need
> deallocating?

Correct.

> If the former, is this embedding an assumption that the
> underlying allocator couldn't have allocated a buffer directly adjacent
> to the PyUnicodeObject? I know that GNU libc's malloc/free
> implementation has gaps of two machine words between each allocation;
> off the top of my head I'm not sure if the optimized Objects/obmalloc.c
> allocator enforces such gaps.

No, it doesn't... So I guess I'll reserve another bit in the state for that.

> GDB Debugging Hooks
> -------------------
> Tools/gdb/libpython.py contains debugging hooks that embed knowledge
> about the internals of CPython's data types, including PyUnicodeObject
> instances. It will need to be slightly updated to track the change.

Thanks, added.

Regards,
Martin
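
For illustration, here is a minimal sketch of the "reserve another bit in
the state" idea discussed above. It is not the PEP's actual layout:
sketch_unicode, inline_data, and the three helper functions are names made
up for this example, and plain malloc/free stands in for the object
allocator. The only point it tries to show is that deallocation can consult
a flag recorded at creation time instead of comparing str against the
address just past the struct, so it no longer matters whether the allocator
might place a separate buffer directly adjacent to the object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the proposed struct. */
typedef struct {
    size_t length;                /* number of characters */
    unsigned int inline_data : 1; /* assumed state bit: buffer follows the struct */
    void *str;                    /* 1-byte representation, null-terminated */
} sketch_unicode;

/* One allocation: the character buffer sits right after the struct. */
static sketch_unicode *sketch_new_inline(const char *latin1)
{
    size_t n = strlen(latin1);
    sketch_unicode *u = malloc(sizeof(sketch_unicode) + n + 1);
    if (u == NULL)
        return NULL;
    u->length = n;
    u->inline_data = 1;
    u->str = (char *)(u + 1);
    memcpy(u->str, latin1, n + 1);
    return u;
}

/* Two allocations: the buffer is a separate memory block. */
static sketch_unicode *sketch_new_separate(const char *latin1)
{
    size_t n = strlen(latin1);
    sketch_unicode *u = malloc(sizeof(sketch_unicode));
    if (u == NULL)
        return NULL;
    u->str = malloc(n + 1);
    if (u->str == NULL) {
        free(u);
        return NULL;
    }
    u->length = n;
    u->inline_data = 0;
    memcpy(u->str, latin1, n + 1);
    return u;
}

/* Frees the object and returns how many blocks were released (1 or 2).
   The decision uses only the flag, never the value of the pointer. */
static int sketch_free(sketch_unicode *u)
{
    int blocks = 1;
    assert(u != NULL);
    if (!u->inline_data) {
        free(u->str);
        blocks = 2;
    }
    free(u);
    return blocks;
}
```

With this shape, a separately allocated buffer that happens to land at
(char *)(u + 1) is still freed correctly, because inline_data is 0 for it.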