[Python-Dev] Hash collision security issue (now public)

Jim Jewett Sun, 08 Jan 2012 14:35:47 -0800

In http://mail.python.org/pipermail/python-dev/2012-January/115368.html
Stefan Behnel wrote:


> Admittedly, this may require some adaptation for the PEP393 unicode memory
> layout in order to produce identical hashes for all three representations
> if they represent the same content.

Different representations SHOULD NOT hold the same content; comparing
two strings currently assumes that both are already in canonical form,
which means the smallest format (of those three) that can hold the string.

If it can be represented in PyUnicode_1BYTE_KIND, then representations
using PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND don't count as
canonical, won't be created by Python itself, and already compare
unequal according to both PyUnicode_RichCompare and stringlib/eq.h (a
shortcut used by dicts).
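
To make "smallest format that works" concrete, here is a minimal Python
sketch of how the canonical kind follows from the largest code point in
a string. The name canonical_kind is hypothetical (it is not a CPython
API); the thresholds are the ones PEP 393 gives for the three kinds.

```python
def canonical_kind(s):
    """Return the bytes per character (1, 2 or 4) of the smallest
    PEP 393 representation that can hold every code point in s."""
    # The empty string has no code points; treat its maximum as 0.
    max_cp = max(map(ord, s), default=0)
    if max_cp <= 0xFF:
        return 1   # corresponds to PyUnicode_1BYTE_KIND
    if max_cp <= 0xFFFF:
        return 2   # corresponds to PyUnicode_2BYTE_KIND
    return 4       # corresponds to PyUnicode_4BYTE_KIND
```

Under this model, a string such as "abc" stored with 2 or 4 bytes per
character would be non-canonical, because canonical_kind("abc") is 1.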

That said, I don't think smallest-format is actually enforced with
anything stronger than comments (such as in unicodeobject.h struct
PyASCIIObject) and asserts (mostly calling
_PyUnicode_CheckConsistency).  I don't have any insight into how
prevalent non-conforming strings will be in practice, or whether
supporting their equality will be required as a bugfix.
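
As a rough illustration of the invariant those comments and asserts
describe, the following Python sketch models a string as a declared
kind plus a list of code points and reports whether the declared kind
is the smallest one that fits. This is only a model under that
assumption, not the actual CPython check; the name is_canonical is
hypothetical.

```python
def is_canonical(kind, code_points):
    """Return True if kind (1, 2 or 4 bytes per character) is the
    smallest width that can hold every value in code_points."""
    if kind not in (1, 2, 4):
        raise ValueError("kind must be 1, 2 or 4")
    # An empty sequence is treated as having maximum code point 0.
    max_cp = max(code_points, default=0)
    if max_cp > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if max_cp <= 0xFF:
        smallest = 1
    elif max_cp <= 0xFFFF:
        smallest = 2
    else:
        smallest = 4
    return kind == smallest
```

For example, is_canonical(2, [0x61, 0x62]) is False: the content fits
in one byte per character, so a 2-byte representation of it is the kind
of non-conforming string discussed above.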

-jJ
