[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Mark Dickinson
Changes by Mark Dickinson dicki...@gmail.com: Removed file: http://bugs.python.org/file12978/time_object_hash.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Apart from the misindentation Apologies. My fault for editing Python files while at work, with a substandard editor configuration... have you run the benchmark script with it? I have now. See attached file for 3 sets of results

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Mark Dickinson
Changes by Mark Dickinson dicki...@gmail.com: Added file: http://bugs.python.org/file13069/pointer_hash5_xor.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Mark Dickinson
Changes by Mark Dickinson dicki...@gmail.com: Added file: http://bugs.python.org/file13070/pointer_hash5_rotate.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Mark Dickinson
Changes by Mark Dickinson dicki...@gmail.com: Removed file: http://bugs.python.org/file13054/pointer_hash5.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Ok, so the rotate version is really significantly faster (and, as Adam pointed out, it's also theoretically better). ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Antoine, please check in a patch of your choice. I think we've beaten this issue to death already. :-) ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- assignee: - pitrou ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___ ___ Python-bugs-list

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: pointer_hash5_rotate.patch has been committed to trunk and py3k. Thanks! ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: -- resolution: accepted - fixed status: open - closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-13 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Sweet! ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___ ___ Python-bugs-list

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: - avoids comparing an unsigned long to -1 Just out of interest, why? The cast is unnecessary: there's no ambiguity or undefinedness (the int -1 gets promoted to unsigned long, with wraparound semantics), and neither gcc nor MSVC

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Mark: Just out of interest, why? The cast is unnecessary: there's no ambiguity or undefinedness (the int -1 gets promoted to unsigned long, with wraparound semantics), and neither gcc nor MSVC complains. Well, I had memories of a weird

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Well, I had memories of a weird signed/unsigned problem (issue4935) and I wasn't sure whether it could raise its head in the present case or not. I'm 99.9% sure that it's not a problem here. If it is a problem then it needs to be fixed in

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Other than that, the patch looks fine to me; x ^= x 4 would be fine too. I've just tried x ^= x 4 and the speedup is smaller on our microbenchmark (time_object_hash.py). I conjecture that trying to maintain the sequentiality of adresses

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: +1 for checking in pointer_hash4.patch, provided Raymond approves. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Consider it approved. Though I prefer you switch to x ^= x 4. -- resolution: - accepted ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Though I prefer you switch to x ^= x 4. Okay, how about this one? Short and sweet. No loss of information except when sizeof(void *) sizeof(long) (unavoidable until someone finds a way to fit all 64-bit pointers into a 32-bit integer

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Okay, how about this one? Apart from the misindentation (the file should use tabs not spaces), have you run the benchmark script with it? ___ Python tracker rep...@bugs.python.org

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Adam Olsen
Adam Olsen rha...@gmail.com added the comment: Antoine, x ^= x4 has a higher collision rate than just a rotate. However, it's still lower than a statistically random hash. If you modify the benchmark to randomly discard 90% of its contents this should give you random addresses, reflecting a

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Adam Olsen
Adam Olsen rha...@gmail.com added the comment: The alignment requirements (long double) make it impossible to have anything in those bits. Hypothetically, a custom allocator could lower the alignment requirements to sizeof(void *). However, rotating to the high bits is pointless as they're the

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: [Adam Olsen] The alignment requirements (long double) make it impossible to have anything in those bits. Not necessarily, since not all pointers passed to _Py_HashPointer come from a PyObject_Malloc. _Py_HashPointer is used for function

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Le mercredi 11 février 2009 à 03:31 +, Adam Olsen a écrit : .. although, if I replace object() with list() I get best results with a shift of 6 bits. Replacing it with dict() is best with 8 bits. But list() and dict() don't use id() for

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Here's an updated patch, that errs on the conservative side: - rotate instead of shifting, as suggested by Raymond. This costs very little, and I admit to feeling uncomfortable about the possibility of just throwing bits away -

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Chema Cortés
Changes by Chema Cortés dev.xt...@gmail.com: -- nosy: +chemacortes ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___ ___ Python-bugs-list

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Adam Olsen
Adam Olsen rha...@gmail.com added the comment: Antoine, I only meant list() and dict() to be an example of objects with a larger allocation pattern. We get a substantial benefit from the sequentially increasing memory addresses, and I wanted to make sure that benefit wasn't lost on larger

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: I'm fine with rotating 4 bits instead of 3, especially if the timings look good on 32-bit as well as 64-bit. We should really benchmark dict lookups (and set membership tests) as well as dict creation.

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: At four bits, you may be throwing away information and I don't think that's cool. Even if some selected timings are better with more bits shifted, all you're really showing is that there is more randomness in the upper bits

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: I'm *much* more comfortable with a byte-swap, rotation, or xoring-in upper bits than with shifts that potentially destroy entropy. Otherwise, your taxing apps that build giant sets/dicts and need all distinguishing bits to avoid collision

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: At four bits, you may be throwing away information and I don't think that's cool. The current patch *does* do a rotation: it doesn't throw away any information. ___ Python tracker

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Here is an updated patch. It uses a 4-bit shift and an addition. We should avoid the use of logical or, because it makes the outputs non-uniformly distributed ('1' bits are more likely). Added file:

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Here is an updated benchmark script, for both object() and an user-defined class, and adding dict lookup, set lookup and set difference. Set difference is massively faster: up to 60% faster. Added file:

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Antoine Pitrou
Changes by Antoine Pitrou pit...@free.fr: Removed file: http://bugs.python.org/file12981/time_object_hash.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Guys, let's be careful. Make sure that efforts to randomize lower bits don't destroy information. Something like x |= x8 is reversible and fast. Other fun looking transformations are not: len(set((x 4) + (x 15) for

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Ok, updated patch: - uses a 4-bit rotate (not shift) - avoids comparing an unsigned long to -1 - tries to streamline the win64 special path (but I can't test) Added file: http://bugs.python.org/file13040/pointer_hash4.patch

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Adam Olsen
Adam Olsen rha...@gmail.com added the comment: At four bits, you may be throwing away information and I don't think that's cool. Even if some selected timings are better with more bits shifted, all you're really showing is that there is more randomness in the upper bits than the lower ones.

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: [Antoine] Ok, updated patch: - uses a 4-bit rotate (not shift) - avoids comparing an unsigned long to -1 - tries to streamline the win64 special path (but I can't test) pointer_hash4.patch looks fine to me. Still, I

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread David W. Lambert
David W. Lambert lamber...@corning.com added the comment: x |= x4 Are you (Ray) sure you didn't mean x ^= x4? -- nosy: +LambertDW ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Adam Olsen
Adam Olsen rha...@gmail.com added the comment: Testing with a large set of ids is a good demonstration, but not proof. Forming a set of *all* possible values within a certain range is proof. However, XOR does work (OR definitely does not) — it's a 1-to-1 transformation (reversible as you say.)

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: David, yes, I did mean x ^= x4; How embarrassing. ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-10 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Some comments, while I remember: * the argument to _Py_HashPointer is not always divisible by 8. It's called to create hashes for various objects, including methodobjects; see the line: y = _Py_HashPointer((void*)(a-m_ml-ml_meth));

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-10 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Some tests on py3k (32-bit build): l = [object() for i in range(20)] [id(l[i+1]) - id(l[i]) for i in range(len(l)-1)] [16, -96, 104, 8, 8, 8, 8, 8, -749528, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] class C(object): ...__slots__ = () ... l = [C()

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-10 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Le mardi 10 février 2009 à 21:18 +, Antoine Pitrou a écrit : Note: a 64-bit build shows an even greater allocation unit: class C(object): ...__slots__ = ('x') ... l = [C() for i in range(20)] [id(l[i+1]) - id(l[i]) for i in

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-10 Thread Adam Olsen
Adam Olsen rha...@gmail.com added the comment: On my 64-bit linux box there's nothing in the last 4 bits: [id(o)%16 for o in [object() for i in range(128)]] [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-10 Thread Adam Olsen
Adam Olsen rha...@gmail.com added the comment: Upon further inspection, although a shift of 4 (on a 64-bit linux box) isn't perfect for dict, it's fairly close to it and well beyond random hash values. Mixing things more is just gonna lower it towards random values. c() 2: 1, 1, 1,

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-10 Thread Raymond Hettinger
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Instead of a shift, how about a rotate or byteswap in case the lower bits ever become significant again in some build. ___ Python tracker rep...@bugs.python.org

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-09 Thread Jesús Cea Avión
Changes by Jesús Cea Avión j...@jcea.es: -- nosy: +jcea ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue5186 ___ ___ Python-bugs-list mailing list

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-08 Thread Mark Dickinson
New submission from Mark Dickinson dicki...@gmail.com: In the issue 5169 discussion, Antoine Pitrou suggested that for an object x without a __hash__ method, id()/8 might be a better hash value than id(), since dicts use the low order bits of the hash as initial key, and the 3 lowest bits of

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-08 Thread Mark Dickinson
Mark Dickinson dicki...@gmail.com added the comment: Here are some timings for dict creation, created with the attached script. They're not particularly scientific, but they at least show that this one- line optimization can make a significant difference. Typical results on my machine (OS X

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-08 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: The code path for SIZEOF_LONG SIZEOF_VOID_P could probably also benefit from this optimization by casting the pointer to a size_t (this will effect 64-bit Windows, where long is 32 bits but pointers are 64 bits). (unfortunately it seems the

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-08 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: Benchmark results on my machine (64-bit Linux, gcc 4.3.2, AMD X2 3600+): Before: dict creation (selected): 5.09600687027 dict creation (shuffled): 5.66548895836 dict creation: 3.72823190689 After: dict creation (selected): 4.57248306274

[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-08 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: I observe even greater speedups (15%/20%/37%) on set creation. Here is the updated benchmark script. Added file: http://bugs.python.org/file12981/time_object_hash.py ___ Python tracker rep...@bugs.python.org