Bugs item #1646068, was opened at 2007-01-27 18:23
Message generated for change (Comment added) made by ked-tao
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1646068&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.5
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: ked-tao (ked-tao)
Assigned to: Tim Peters (tim_one)
Summary: Dict lookups fail if sizeof(Py_ssize_t) < sizeof(long)

Initial Comment:
Porting problem.

Include/dictobject.h defines PyDictEntry.me_hash as a Py_ssize_t. Everywhere
else, a C 'long' is used for hashes.

On the system I'm porting to, ints and pointers (and ssize_t) are 32-bit, but 
longs and long longs are 64-bit. Therefore, the assignments to me_hash truncate 
the hash and subsequent lookups fail.
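
To illustrate, here's a standalone sketch (not the CPython code; 'my_ssize_t'
is just a stand-in name, and it assumes a 32-bit int and a 64-bit long, as on
my target):

#include <stdio.h>

typedef int my_ssize_t;  /* hypothetical stand-in for a 32-bit Py_ssize_t */

int main(void)
{
    long hash = 0x0123456789abcdefL;        /* a hash with the top 32 bits set */
    my_ssize_t me_hash = (my_ssize_t)hash;  /* truncates to the low 32 bits
                                             * (two's complement assumed) */

    /* The dict lookup's fast path compares the cached (truncated) hash
     * against the full 'long' hash, so it can never match: */
    printf("%d\n", me_hash == hash);        /* prints 0 */
    return 0;
}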

I've changed the definition of me_hash to 'long' and (in Objects/dictobject.c)
removed the casts from the various assignments and changed the definition of
'i' in dict_popitem(). This has fixed my immediate problems, but I guess I've
just reintroduced whatever problem the type got changed to avoid. The comment
in the header says:

/* Cached hash code of me_key.  Note that hash codes are C longs.
 * We have to use Py_ssize_t instead because dict_popitem() abuses
 * me_hash to hold a search finger.
 */

... but that doesn't really explain what it is about dict_popitem() that 
requires the different type.
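
For reference, here is the finger logic in question, condensed from
dict_popitem() in Objects/dictobject.c (error handling and the actual
removal of the entry elided):

/* Condensed from Objects/dictobject.c:dict_popitem(). */
Py_ssize_t i = 0;
dictentry *ep = &mp->ma_table[0];
if (ep->me_value == NULL) {
    i = ep->me_hash;              /* slot 0's me_hash doubles as a finger */
    /* The finger may be stale (e.g. the table shrunk), so clamp it. */
    if (i > mp->ma_mask || i < 1)
        i = 1;
    while ((ep = &mp->ma_table[i])->me_value == NULL) {
        i++;
        if (i > mp->ma_mask)
            i = 1;
    }
}
/* ... pop entry i ... */
mp->ma_table[0].me_hash = i + 1;  /* store the next search finger */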

Thanks. Kev.

----------------------------------------------------------------------

>Comment By: ked-tao (ked-tao)
Date: 2007-02-04 14:11

Message:
Logged In: YES 
user_id=1703158
Originator: YES

Hi Jim. I understand what the problem is (perhaps I didn't state it
clearly enough) - me_hash is a cache of the dict item's hash, which is
compared against the hash of the object being looked up before going any
further with the more expensive, richer comparisons. On my system, me_hash
is a 32-bit quantity, but hashes in general are declared 'long', which is
a 64-bit quantity. Therefore, for any object whose hash has any of the top
32 bits set, a dict lookup will fail, as it will never get past that first
check (regardless of why that slot is being checked - it has nothing to do
with the perturbation used to find the next slot).
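
The check I mean is the one in lookdict(), which condensed (freeslot
bookkeeping and error handling elided) looks like:

/* Condensed from Objects/dictobject.c:lookdict(). */
if (ep->me_key == key)
    return ep;
if (ep->me_hash == hash && ep->me_key != dummy) {
    /* Only now pay for the rich comparison.  On my system, me_hash
     * holds only the low 32 bits of the 64-bit 'long' hash, so this
     * test always fails when the top 32 bits are set. */
    if (PyObject_RichCompareBool(ep->me_key, key, Py_EQ) > 0)
        return ep;
}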

The deal is that my system is basically a 32-bit system (sizeof(int) ==
sizeof(void *) == 4, and therefore ssize_t is not unreasonably also
32-bit), but C longs are 64-bit.

You say "popitem assumes it can store a pointer there", but AFAICS it's
just storing an _index_, not a pointer. I was concerned that making that
index a 64-bit quantity might tickle some subtlety in the code, thinking
that perhaps it was changed from 'long' to 'Py_ssize_t' because it had to
be 32-bit for some reason. However, it seems much more likely that it was
defined like that to be more correct on a system with 64-bit addressing
and 32-bit longs (which would be more common). With that in mind, I've
attached a suggested patch which selects a reasonable type according to
the SIZEOF_ configuration defines.
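
The idea is roughly this shape (a sketch only - 'Py_me_hash_t' is just an
illustrative name; the attached diff is the actual change):

/* me_hash must be wide enough to hold both a C 'long' hash and a
 * Py_ssize_t search finger, so pick whichever type is larger
 * according to the pyconfig.h SIZEOF_ defines. */
#if SIZEOF_LONG <= SIZEOF_SIZE_T
typedef Py_ssize_t Py_me_hash_t;
#else
typedef long Py_me_hash_t;
#endif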

WRT forcing the hashes to 32 bits to "save space and time" - that means
inventing a 'Py_hash_t' type and going through the entire Python source
looking for 'long's that might be used to store or calculate hashes. I
think I'll pass on that ;)

Regards, Kev.
File Added: dict.diff

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2007-02-02 20:20

Message:
Logged In: YES 
user_id=764593
Originator: NO

The whole point of a hash is that if it doesn't match, you can skip an
expensive comparison.  How big to make the hash is a tradeoff between how
much you'll waste calculating and storing it vs how often it will save a
"real" comparison.

The comment means that, as an implementation detail, popitem assumes it
can store a pointer there instead, so hashes need to be at least as big as
a pointer.  

Going to the larger of the two sizes will certainly solve your problem; it
just wastes some space, and maybe some time calculating the hash.  

If you want to get that space back, just make sure the truncation is
correct and consistent.  I *suspect* your problem is that when there is a
collision, either 

(1)  It is comparing a truncated value to an untruncated value, or
(2)  The perturbation to find the next slot is going wrong, because of
when the truncation happens.
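
For reference, the probe loop in lookdict() is roughly (condensed):

/* Condensed from Objects/dictobject.c:lookdict().  'perturb' is
 * seeded from the full hash, so truncating the hash at different
 * points changes the probe sequence. */
register size_t perturb;
for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
    i = (i << 2) + i + perturb + 1;
    ep = &ep0[i & mask];
    /* ... stop on a match or an empty slot ... */
}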

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 19:40

Message:
Logged In: YES 
user_id=849994
Originator: NO

This is your code, Tim.

----------------------------------------------------------------------
