In http://mail.python.org/pipermail/python-dev/2012-January/115368.html Stefan Behnel wrote:
> Admittedly, this may require some adaptation for the PEP393 unicode memory
> layout in order to produce identical hashes for all three representations
> if they represent the same content.

They SHOULD NOT represent the same content; comparing two strings
currently requires converting them to canonical form, which means the
smallest format (of those three) that works.

If it can be represented in PyUnicode_1BYTE_KIND, then representations
using PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND don't count as
canonical, won't be created by Python itself, and already compare
unequal according to both PyUnicode_RichCompare and stringlib/eq.h
(a shortcut used by dicts).

That said, I don't think smallest-format is actually enforced with
anything stronger than comments (such as in unicodeobject.h struct
PyASCIIObject) and asserts (mostly calling _PyUnicode_CheckConsistency).
I don't have any insight on how prevalent non-conforming strings will
be in practice, or whether supporting their equality will be required
as a bugfix.

-jJ
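For anyone who wants to see the "smallest format that works" rule concretely, here is a rough Python sketch. The helper names canonical_kind and measured_width are mine, not part of CPython. The first applies the PEP 393 thresholds (code points below 0x100, below 0x10000, everything else); the second guesses the per-character width CPython actually picked by comparing sys.getsizeof for two lengths of the same character, which assumes CPython 3.3+ and freshly built strings.

```python
import sys


def canonical_kind(s: str) -> int:
    """Bytes per character of the smallest PEP 393 kind able to hold s.

    Hypothetical helper: it mirrors the thresholds described in PEP 393,
    it does not inspect the real object layout.
    """
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:
        return 1  # would fit PyUnicode_1BYTE_KIND
    if max_cp < 0x10000:
        return 2  # would fit PyUnicode_2BYTE_KIND
    return 4      # needs PyUnicode_4BYTE_KIND


def measured_width(ch: str, n: int = 8) -> int:
    """Estimate the per-character storage CPython chose for ch.

    Assumption: CPython 3.3+ compact strings, where doubling the length
    grows the object by exactly n * kind bytes. Other implementations
    may report something else entirely.
    """
    short = ch * n
    long_ = ch * (2 * n)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // n


if __name__ == "__main__":
    for ch in ("a", "\xe9", "\u0100", "\U00010000"):
        print(hex(ord(ch)), canonical_kind(ch), measured_width(ch))
```

If the two columns agree for every character, Python itself only produced canonical strings in that run; it says nothing about strings built by a third-party extension, which is the case I was unsure about above.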