Re: [Python-Dev] Hashing proposal: change only string-only dicts

Martin v. Löwis Wed, 18 Jan 2012 09:58:33 -0800

Am 18.01.2012 17:01, schrieb PJ Eby:
> On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis" <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     Am 17.01.2012 22:26, schrieb Antoine Pitrou:
>     > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
>     > could cache a "hash perturbation" computed from the string and the
>     > random bits:
>     >
>     > - hash() would use ob_shash
>     > - dict_lookup() would use ((ob_shash * 1000003) ^ (ob_sstate & ~3))
>     >
>     > This way, you cache almost all computations, adding only a computation
>     > and a couple logical ops when looking up a string in a dict.
> 
>     That's a good idea. For Unicode, it might be best to add another slot
>     into the object, even though this increases the object size.
> 
> 
> Wouldn't that break the ABI in 2.x?


I was thinking about adding the field at the end, so I thought it
shouldn't. However, if somebody inherits from PyUnicodeObject, it still
might - so my new proposal is to add the extra hash into the str block,
either at str[-1], or after the terminating 0. This would cause an
average increase of four bytes of the storage (0 bytes in 50% of the
cases, 8 bytes because of padding in the other 50%).

What do you think?

Regards,
Martin
_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Hashing proposal: change only string-only dicts

Reply via email to