On 02/25/2014 05:04 PM, Andres Valloud wrote:
> 
>> When going to 64-bit, and with the new ObjectMemory scheme, I guess a
>> couple of identity hashing functions will come under scrutiny.

I'd think so. I haven't been following discussions of 64-bit design for
Pharo. How many bits of identity hash are in each object, and how many
bits is a SmallInteger (61, perhaps)?

> 
> IMO it's not clear that SmallInteger>>identityHash should be implemented
> that way.  Finding a permutation of the small integers that also behaves
> like a good quality hash function and evaluates quickly (in
> significantly less time and complexity than, say, Bob Jenkins' lookup3)
> would be a really interesting research project.  I don't know if it's
> possible.  If no such thing exists, then getting at least some rough
> idea of what's the minimum necessary complexity for such hash functions
> would be valuable.

Andres and I have an ongoing discussion on this topic. :-) I will say
this: using SmallInteger>>hashMultiply as SmallInteger>>hash is *far*
better than implementing SmallInteger>>hash as ^self. But Andres may be
right that something else would be even more desirable.

> 
> The implementation of hashMultiply, yes.  Note however that
> SmallInteger>>hash is ^self.


>>
>> So, 16384 * is the same as bitShift: 14, and it looks like it is done
>> only once, which may be better.

And #bitShift: *should* be faster. But test this: in VW, multiplication
is quite a lot faster than #bitShift: for some reason. Unless Andres has
fixed this recently. :-)

> 
> Keep in mind that the speed at which hash values are calculated is only
> part of the story.  If the hash function quality is not great, or the
> hashed collection implementation is not efficient and induces collisions
> or other extra work, improving the efficiency of the hash functions
> won't do much.  I think it's mentioned in the hash book (I'd have to
> check), but once I made a hash function 5x slower to get better
> quality and the result was that a report that was taking 30 minutes took
> 90 seconds instead (and hashing was gone from the profiler output).

I wholeheartedly agree with this! The quality of the hash function is
much more important than the speed. Also, remember that the #hash method
is only *half* of the hash function. The other half is in the hashed
collection (usually \\ tableSize). Those two must coordinate well to
form a good hash function or you can get poor results. And the
collections that are theoretically fastest with a good hash function
(open addressing with linear probing) seem to be the most sensitive to
the quality of the hash function.


> 
> If this is a primitive instead, then you can also avoid the overflow
> into large integers and do the math with (basically)
> 
> mov eax, smallInteger
> shr eax, numberOfTagBits
> mul eax, 1664525 "the multiplication that throws out the high bits"
> shl eax, 4 "throw out bits 29-32"
> shr eax, 4
> lea eax, [eax * 2^numberOfTagBits + smallIntegerTagBits]

Right. In Pharo, you might use NativeBoost to implement this, as you noted.

Regards,

-Martin

