On Fri, Mar 19, 2010 at 3:07 AM, Mark Dickinson <dicki...@gmail.com> wrote:
> On Fri, Mar 19, 2010 at 9:37 AM, Mark Dickinson <dicki...@gmail.com> wrote:
>> Making hashes of int,
>> float, Decimal *and* Fraction all compatible with one another,
>> efficient for ints and floats, and not grossly inefficient for
>> Fractions and Decimals, is kinda hairy, though I have a couple of
>> ideas of how it could be done.
>
> To elaborate, here's a cute scheme for the above, which is actually
> remarkably close to what Python already does for ints and floats, and
> which doesn't require any of the numeric types to figure out whether
> it's exactly equal to an instance of some other numeric type.
>
> After throwing out infinities and nans (which can easily be dealt with
> separately), everything we care about is a rational number, so it's
> enough to come up with some mapping from the set of all rationals to
> the set of possible hashes, subject to the requirement that the
> mapping can be computed efficiently for the types we care about.
>
> For any prime P there's a natural 'reduce modulo P' map
>
>  reduce : {rational numbers} --> {0, 1, ..., P-1, infinity}
>
> defined in pseudocode by:
>
>  reduce(m/n) = ((m % P) * inv(n, P)) % P  if n % P != 0  else infinity
>
> where inv(n, P) represents the modular inverse to n modulo P.
>
> Now let P be the handy Mersenne prime P = 2**31-1 (on a 64-bit
> machine, the almost equally handy prime 2**61-1 could be used
> instead), and define a hash function from the set of rationals to the
> set [-2**31, 2**31) by:
>
> hash(0) = 0
> hash(m/n) = 1 + reduce(m/n - 1) if m/n > 0   # i.e., reduce m/n modulo
> P, but to [1..P] rather than [0..P-1].
> hash(m/n) = -hash(-m/n) if m/n < 0.
>
> and in the last two cases, map a result of infinity to the unused hash
> value -2**31.
>
> For ints, this hash function is almost identical to what Python
> already has, except that the current int hash does a reduction modulo
> 2**32-1 or 2**64-1 rather than 2**31-1.  For all small ints, hash(n)
> == n, as currently.  Either way, the hash can be computed
> digit-by-digit in exactly the same manner.  For floats, it's also easy
> to compute:  express the float as m * 2**e for some integers m and e,
> compute hash(m), and rotate e bits in the appropriate direction.  And
> it's straightforward to implement for the Decimal and Fraction types,
> too.
Will this change the result of hashing a long? I know that both gmpy
and SAGE use their own hash implementations for performance reasons. I
understand that CPython's hash function is an implementation detail,
but there are external modules that rely on the existing hash
behavior.

FWIW, I'd prefer 2.7 and 3.2 have the same behavior. I don't mind the
existing 3.1 behavior and I'd rather not have a difference between 3.1
and 3.2.

casevh
>
> (One minor detail:  as usual, some postprocessing would be required to
> replace a hash of -1 with something else, since a hash value of -1 is
> invalid.)
>
> Mark
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/casevh%40gmail.com
>
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to