Re: [racket-dev] plausible hash function for s16vectors

2013-11-27 Thread Matthew Flatt
At Tue, 26 Nov 2013 13:39:31 -0800, John Clements wrote:
> My question: is there an accepted hash function for an s16vector, or
> more generally, for a big block of memory?

No, not currently.

> Taking a look at the behavior of vectors, though, it looks like *every* 
> element is considered in computing the hash. [...]
> Which suggests that every change to the vector changes the result of
> the hash function. This seems... really expensive!

The hashing functions are generally linear in the size of the value
being hashed. That's not currently documented, as it should be.

To hash a list, array, transparent structure, hash table, etc., each
element is hashed recursively, but there is currently a limit on the
depth of recursive hash calls to 128 so that the hash function doesn't
have to detect cycles. (Lists are treated differently from non-list
pairs in that hashing the `rest` doesn't count against the depth.)

> My current guess is that Racket 
> uses a highly optimized (a.k.a. no safety checks) hash function that works 
> over arbitrary blocks of data, 

No. In the case of a vector, for example, the hash function is called
recursively on each element of the vector.

> Questions:
> 
> 1) Am I guessing right?

Mostly.

> 2) Is this documented somewhere?

No, and I'll fix that.

> 3) Is there a generic memory-hash function in the unsafe interface somewhere?

Not currently.

> 4) Does the hash function affect the time taken by 'equal?' -- i.e.,
> the hash value is cached for faster equal? checking ?

No, `equal?` doesn't hash its arguments.

_
  Racket Developers list:
  http://lists.racket-lang.org/dev


[racket-dev] plausible hash function for s16vectors

2013-11-26 Thread John Clements
I want to equip my rsounds with a gen:equal+hash implementation, so that I can 
compare them using equal? and thereby use them in check-expect test cases.  I 
have to roll this myself, since s16vectors don't do non-eq? equal?. My 
question: is there an accepted hash function for an s16vector, or more 
generally, for a big block of memory? These vectors are relatively unlikely to 
be sparse--that is, have lots of zeros--so it seems like it would be reasonable 
just to take three or four values (in this case s16s) and just XOR them 
together.  

Taking a look at the behavior of vectors, though, it looks like *every* element 
is considered in computing the hash. This code:

#lang racket

(equal-hash-code (make-vector 10 0))
(define v (make-vector 10 0))
(vector-set! v 7982 1)
(equal-hash-code v)
(vector-set! v 7182 1)
(equal-hash-code v)
(vector-set! v 17982 1)
(equal-hash-code v)
(vector-set! v 55089 1)
(equal-hash-code v)

produces this output:

-1113903107084523788
369193263530696121
555229980507179346
2232258794360948312
-1281429753928160859

Which suggests that every change to the vector changes the result of the hash 
function. This seems... really expensive!  My current guess is that Racket uses 
a highly optimized (a.k.a. no safety checks) hash function that works over 
arbitrary blocks of data, but that the hash computation is still linear in the 
size of the data. I haven't tried it, but I would guess that if I wrote my own 
looks-at-every-value hash function, it would be a lot slower. Questions:

1) Am I guessing right?
2) Is this documented somewhere?
3) Is there a generic memory-hash function in the unsafe interface somewhere?
4) Does the hash function affect the time taken by 'equal?' -- i.e., the hash 
value is cached for faster equal? checking ?

Thanks!

John


_
  Racket Developers list:
  http://lists.racket-lang.org/dev