Tim Peters <t...@python.org> added the comment:

I agree:  we "shouldn't have" documented anything about hash codes beyond the 
invariants needed to guarantee they work for their intended purpose, chiefly 
that x == y implies hash(x) == hash(y).

Which leads to your other question ;-)  That invariant is tricky to maintain 
across a pile of distinct but comparable numeric types!  Whatever we return for 
an int needs also to be what's returned for a float that happens to have the 
same value, and whatever we return for a float needs also to be what's returned 
for a fraction.Fraction with the same value, and so on.

So Python uses a single numeric hashing algorithm defined on mathematical 
rationals, described here:

https://docs.python.org/3.8/library/stdtypes.html#hashing-of-numeric-types

Presumably that's documented so that people defining their own numeric types 
have a clue about how to implement compatible hash codes.
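
For illustration, here's a rough pure-Python sketch of that scheme for a rational m/n in lowest terms with n > 0 (it's not CPython's actual code, and it skips some corner cases the docs spell out, like reducing m and n when both are divisible by the modulus):

    import sys

    def rational_hash(m, n):
        # sys.hash_info.modulus is the prime P (2**61 - 1 on 64-bit builds).
        P = sys.hash_info.modulus
        if n % P == 0:
            # n has no inverse mod P; the docs map this case to the "infinity" hash.
            h = sys.hash_info.inf
        else:
            # pow(n, -1, P) is the modular inverse of n mod P (Python 3.8+).
            h = (abs(m) % P) * pow(n, -1, P) % P
        if m < 0:
            h = -h
        return -2 if h == -1 else h  # -1 is reserved as an error return in CPython

On a 64-bit box that should agree with the builtin hashes:

    >>> from fractions import Fraction
    >>> rational_hash(3, 4) == hash(Fraction(3, 4)) == hash(0.75)
    True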

Anyway, that algorithm uses a large fixed prime P (different on 32- and 64-bit 
boxes), and does its arithmetic modulo P.  That's why the full bit width isn't 
used.  And it's not just any prime P:  one is picked for which computing the 
remainder mod P is especially efficient.  2**61 - 1 is as good as it gets on 
64-bit boxes.
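
The payoff from the special form:  because 2**61 is congruent to 1 modulo 
2**61 - 1, a remainder mod P can be computed by folding the high bits back in 
with a shift and a mask, no integer division needed.  Roughly (this shows the 
arithmetic idea, not what CPython literally does in C):

    P = 2**61 - 1

    def mod_mersenne(x):
        # Fold bits above bit 61 back into the low bits; since 2**61 == 1 (mod P),
        # this preserves the value mod P without a division.
        while x >> 61:
            x = (x >> 61) + (x & P)
        return x if x < P else x - P

A quick sanity check that it matches ordinary %:

    >>> all(mod_mersenne(x) == x % P for x in (0, 1, P - 1, P, P + 1, 2**64 + 12345))
    True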

I didn't pick this algorithm (I suspect Mark did), but I certainly approve of 
it.  It's clever and general enough to apply to any plausible subset-of-real 
numeric type short of the constructive reals (where equality isn't even 
theoretically decidable, so "x == y implies ..." can't get off the ground ;-) ).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37807>
_______________________________________