Re: [racket-dev] plausible hash function for s16vectors
At Tue, 26 Nov 2013 13:39:31 -0800, John Clements wrote: > My question: is there an accepted hash function for an s16vector, or > more generally, for a big block of memory? No, not currently. > Taking a look at the behavior of vectors, though, it looks like *every* > element is considered in computing the hash. [...] > Which suggests that every change to the vector changes the result of > the hash function. This seems... really expensive! The hashing functions are generally linear in the size of the value being hashed. That's not currently documented, as it should be. To hash a list, array, transparent structure, hash table, etc., each element is hashed recursively, but there is currently a limit on the depth of recursive hash calls to 128 so that the hash function doesn't have to detect cycles. (Lists are treated differently from non-list pairs in that hashing the `rest` doesn't count against the depth.) > My current guess is that Racket > uses a highly optimized (a.k.a. no safety checks) hash function that works > over arbitrary blocks of data, No. In the case of a vector, for example, the hash function is called recursively on each element of the vector. > Questions: > > 1) Am I guessing right? Mostly. > 2) Is this documented somewhere? No, and I'll fix that. > 3) Is there a generic memory-hash function in the unsafe interface somewhere? Not currently. > 4) Does the hash function affect the time taken by 'equal?' -- i.e., > the hash value is cached for faster equal? checking ? No, `equal?` doesn't hash its arguments. _ Racket Developers list: http://lists.racket-lang.org/dev
[racket-dev] plausible hash function for s16vectors
I want to equip my rsounds with a gen:equal+hash implementation, so that I can compare them using equal? and thereby use them in check-expect test cases. I have to roll this myself, since s16vectors don't do non-eq? equal?. My question: is there an accepted hash function for an s16vector, or more generally, for a big block of memory? These vectors are relatively unlikely to be sparse--that is, have lots of zeros--so it seems like it would be reasonable just to take three or four values (in this case s16s) and just XOR them together. Taking a look at the behavior of vectors, though, it looks like *every* element is considered in computing the hash. This code: #lang racket (equal-hash-code (make-vector 10 0)) (define v (make-vector 10 0)) (vector-set! v 7982 1) (equal-hash-code v) (vector-set! v 7182 1) (equal-hash-code v) (vector-set! v 17982 1) (equal-hash-code v) (vector-set! v 55089 1) (equal-hash-code v) produces this output: -1113903107084523788 369193263530696121 555229980507179346 2232258794360948312 -1281429753928160859 Which suggests that every change to the vector changes the result of the hash function. This seems... really expensive! My current guess is that Racket uses a highly optimized (a.k.a. no safety checks) hash function that works over arbitrary blocks of data, but that the hash computation is still linear in the size of the data. I haven't tried it, but I would guess that if I wrote my own looks-at-every-value hash function, it would be a lot slower. Questions: 1) Am I guessing right? 2) Is this documented somewhere? 3) Is there a generic memory-hash function in the unsafe interface somewhere? 4) Does the hash function affect the time taken by 'equal?' -- i.e., the hash value is cached for faster equal? checking ? Thanks! John _ Racket Developers list: http://lists.racket-lang.org/dev