On Thu, 10 Aug 2017 07:00 pm, Peter Otten wrote:
> Steven D'Aprano wrote:
>> On Wed, 09 Aug 2017 20:07:48 +0300, Marko Rauhamaa wrote:
>>> Good point! A very good __hash__() implementation is:
>>> def __hash__(self):
>>> return id(self)
>>> In fact, I didn't know Python (kinda) did this by default already. I
>>> can't find that information in the definition of object.__hash__():
>> Hmmm... using id() as the hash would be a terrible hash function. Objects
> It's actually id(self) >> 4 (almost, see C code below), to account for
> memory alignment.
Thanks for tracking that down. As you show, the default hash isn't id() itself.
The C code says:
> /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
> excessive hash collisions for dicts and sets */
which I think agrees with my comment: using the id() itself would put too many
objects in the same bucket (i.e. too many collisions).
>>>> obj = object()
>>>> sample = (object() for _ in range(10))
>>>> all(id(obj) >> 4 == hash(obj) for obj in sample)
>> would fall into similar buckets if they were created at similar times,
>> regardless of their value, rather than being well distributed.
> If that were the problem it wouldn't be solved by the current approach:
>>>> sample = [object() for _ in range(10)]
>>>> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
> [1, 1, 1, 1, 1, 1, 1, 1, 1]
Arguably that's a flaw with the current approach that (maybe?) makes object()'s
hash too closely. But:
- perhaps it doesn't matter in practice, since the hash is taken modulo
the size of the hash table;
- or maybe Python's dicts and sets are good enough that a difference
of 1 is sufficient to give a good distribution of objects in the
- or maybe it does matter, but since people hardly ever use object()
itself as the keys in dicts, it doesn't come up.
Here's your example with a class that inherits from object, rather than object
py> class X(object):
py> sample = [X() for x in range(10)]
py> [hash(b) - hash(a) for a, b in zip(sample, sample[1:])]
[-5338, -10910, -2976, -2284, -21326, 4, -8, 2, -4]
So my money is on object() being anomalous: because it is so small, the hashes
end up so similar. For "typical" classes, the hash function does a much better
job of mixing the hash values up.
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.