Bugs item #1646068, was opened at 2007-01-27 18:23
Message generated for change (Comment added) made by ked-tao
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1646068&group_id=5470

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request, not
just the latest update.

Category: Python Interpreter Core
Group: Python 2.5
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: ked-tao (ked-tao)
Assigned to: Tim Peters (tim_one)
Summary: Dict lookups fail if sizeof(Py_ssize_t) < sizeof(long)

Initial Comment:
Porting problem. Include/dictobject.h defines PyDictEntry.me_hash as a
Py_ssize_t. Everywhere else, a C 'long' is used for hashes. On the
system I'm porting to, ints and pointers (and ssize_t) are 32-bit, but
longs and long longs are 64-bit. Therefore, the assignments to me_hash
truncate the hash, and subsequent lookups fail.

I've changed the definition of me_hash to 'long', removed the casts
from the various assignments in Objects/dictobject.c, and changed the
definition of 'i' in dict_popitem(). This has fixed my immediate
problems, but I guess I've just reintroduced whatever problem the type
was changed to avoid. The comment in the header says:

/* Cached hash code of me_key.  Note that hash codes are C longs.
 * We have to use Py_ssize_t instead because dict_popitem() abuses
 * me_hash to hold a search finger.
 */

... but that doesn't really explain what it is about dict_popitem()
that requires the different type.

Thanks, Kev.

----------------------------------------------------------------------
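To make the reported failure concrete, here is a minimal standalone
sketch (not part of the original report), assuming a platform like the
reporter's where ssize_t is 32-bit and long is 64-bit:

    #include <stdio.h>

    typedef int my_ssize_t;  /* stand-in for a 32-bit Py_ssize_t */

    int main(void)
    {
        long hash = 0x123456789abcdefL;        /* high 32 bits set */
        /* the kind of cast dictobject.c performs when caching the
           hash in me_hash; truncates on this kind of platform */
        my_ssize_t me_hash = (my_ssize_t)hash;
        /* The cached value no longer equals the full hash, so the
           cheap me_hash == hash check in the lookup can never pass. */
        printf("%d\n", (long)me_hash == hash);  /* prints 0 */
        return 0;
    }

----------------------------------------------------------------------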
>Comment By: ked-tao (ked-tao)
Date: 2007-02-04 14:11

Message:
Logged In: YES
user_id=1703158
Originator: YES

Hi Jim. I understand what the problem is (perhaps I didn't state it
clearly enough) - me_hash is a cache of the dict item's hash, which is
compared against the hash of the object being looked up before going
any further with expensive richer comparisons. On my system, me_hash
is a 32-bit quantity, but hashes in general are declared 'long', which
is a 64-bit quantity. Therefore, for any object whose hash has any of
the top 32 bits set, a dict lookup will fail, because it will never
get past that first check (regardless of why that slot is being
checked - it has nothing to do with the perturbation used to find the
next slot).

The deal is that my system is basically a 32-bit system (sizeof(int)
== sizeof(void *) == 4, and therefore ssize_t is not unreasonably also
32-bit), but C longs are 64-bit.

You say "popitem assumes it can store a pointer there", but AFAICS it
is just storing an _index_, not a pointer. I was concerned that making
that index a 64-bit quantity might tickle some subtlety in the code,
thinking that perhaps it was changed from 'long' to 'Py_ssize_t'
because it had to be 32-bit for some reason. However, it seems much
more likely that it was defined that way to be more correct on a
system with 64-bit addressing and 32-bit longs (which would be more
common). With that in mind, I've attached a suggested patch which
selects a reasonable type according to the SIZEOF_ configuration
defines.

WRT forcing the hashes to 32 bits to "save space and time" - that
would mean inventing a 'Py_hash_t' type and going through the entire
Python source looking for 'long's that might be used to store or
calculate hashes. I think I'll pass on that ;)

Regards, Kev.

File Added: dict.diff

----------------------------------------------------------------------
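The attached dict.diff is not reproduced in this message. Purely as an
illustration of the approach described above - choosing a type for
me_hash via the SIZEOF_ configuration defines from pyconfig.h - a
sketch might look like this (the typedef name is hypothetical, not
taken from the actual patch):

    /* Hypothetical sketch, not the attached dict.diff.  Choose a type
       wide enough both for a C long (the cached hash) and for a
       Py_ssize_t (dict_popitem()'s search finger), using the SIZEOF_
       macros from pyconfig.h. */
    #if SIZEOF_LONG >= SIZEOF_SIZE_T
    typedef long Py_dict_hash_t;        /* hypothetical name */
    #else
    typedef Py_ssize_t Py_dict_hash_t;
    #endif

    typedef struct {
        Py_dict_hash_t me_hash;  /* cached hash of me_key; doubles as
                                    popitem's search finger */
        PyObject *me_key;
        PyObject *me_value;
    } PyDictEntry;

----------------------------------------------------------------------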
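For context, the first-check fast path that a truncated me_hash
defeats - the check both comments in this thread describe - looks
roughly like this, simplified from lookdict() in Objects/dictobject.c
(a sketch: declarations, error handling, and the probing loop are
elided):

    /* hash is a C long; ep->me_hash is a Py_ssize_t.  If me_hash was
       truncated when it was cached, the cheap equality test below can
       never succeed for hashes with high bits set, so the expensive
       comparison is never even attempted. */
    i = hash & mask;
    ep = &ep0[i];
    if (ep->me_key == NULL || ep->me_key == key)
        return ep;
    if (ep->me_hash == hash &&
        PyObject_RichCompareBool(ep->me_key, key, Py_EQ) > 0)
        return ep;
    /* otherwise perturb i and probe the next slot */

----------------------------------------------------------------------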
Comment By: Jim Jewett (jimjjewett)
Date: 2007-02-02 20:20

Message:
Logged In: YES
user_id=764593
Originator: NO

The whole point of a hash is that if it doesn't match, you can skip an
expensive comparison. How big to make the hash is a tradeoff between
how much you'll waste calculating and storing it versus how often it
will save a "real" comparison.

The comment means that, as an implementation detail, popitem assumes
it can store a pointer there instead, so hashes need to be at least as
big as a pointer. Going to the larger of the two sizes will certainly
solve your problem; it just wastes some space, and maybe some time
calculating the hash.

If you want to get that space back, just make sure the truncation is
correct and consistent. I *suspect* your problem is that when there is
a collision, either
(1) it is comparing a truncated value to an untruncated value, or
(2) the perturbation to find the next slot is going wrong because of
when the truncation happens.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 19:40

Message:
Logged In: YES
user_id=849994
Originator: NO

This is your code, Tim.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1646068&group_id=5470

_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com