On 02/25/2014 05:04 PM, Andres Valloud wrote:
>
>> When going to 64-bit, and with the new ObjectMemory scheme, I guess a
>> couple of identity hashing functions will come under scrutiny.
I'd think so. I haven't been following discussions of the 64-bit design
for Pharo. How many bits of identity hash are in each object, and how
many bits is a SmallInteger (61, perhaps)?

> IMO it's not clear that SmallInteger>>identityHash should be implemented
> that way. Finding a permutation of the small integers that also behaves
> like a good quality hash function and evaluates quickly (in
> significantly less time and complexity than, say, Bob Jenkins' lookup3)
> would be a really interesting research project. I don't know if it's
> possible. If no such thing exists, then getting at least some rough
> idea of what's the minimum necessary complexity for such hash functions
> would be valuable.

Andres and I have an ongoing discussion on this topic. :-) I will say
this: using SmallInteger>>hashMultiply as SmallInteger>>hash is *far*
better than implementing SmallInteger>>hash as ^self. But Andres may be
right that something else would be even more desirable.

> The implementation of hashMultiply, yes. Note however that
> SmallInteger>>hash is ^self.
>
>> So, 16384 * is the same as bitShift: 14 and it looks like done once,
>> which may be better. And #bitShift *should* be faster. But test this.

In VW, multiplication is quite a lot faster than bitShift: for some
reason. Unless Andres has fixed this recently. :-)

> Keep in mind that the speed at which hash values are calculated is only
> part of the story. If the hash function quality is not great, or the
> hashed collection implementation is not efficient and induces collisions
> or other extra work, improving the efficiency of the hash functions
> won't do much. I think it's mentioned in the hash book (I'd have to
> check), but once I made a hash function 5x slower to get better quality,
> and the result was that a report that had been taking 30 minutes took
> 90 seconds instead (and hashing was gone from the profiler output).

I wholeheartedly agree with this!
The quality of the hash function is much more important than the speed.
Also, remember that the #hash method is only *half* of the hash function.
The other half is in the hashed collection (usually \\ tableSize). Those
two must coordinate well to form a good hash function or you can get
poor results. And the collections that are theoretically fastest with a
good hash function (open addressing with linear probing) seem to be the
most sensitive to the quality of the hash function.

> If this is a primitive instead, then you can also avoid the overflow
> into large integers and do the math with (basically)
>
>   mov eax, smallInteger
>   shr eax, numberOfTagBits
>   mul eax, 1664525   "the multiplication that throws out the high bits"
>   shl eax, 4         "throw out bits 29-32"
>   shr eax, 4
>   lea eax, [eax * 2^numberOfTagBits + smallIntegerTagBits]

Right. In Pharo, you might use NativeBoost to implement this, as you
noted.

Regards,

-Martin
