Re: [Pharo-dev] hashMultiply

Andres Valloud Tue, 25 Feb 2014 17:05:58 -0800

Hello...

On 2/25/14 1:17 , [email protected] wrote:

I am currently reading through the Hashing in Smalltalk book
(http://www.lulu.com/shop/andres-valloud/hashing-in-smalltalk-theory-and-practice/paperback/product-3788892.html)
and, my head hurting notwithstanding, there are indeed a ton of gems in
this system. As he mentions, doing the exercises brings a lot of extra :-)


:) thank you.

When going to 64-bit, and with the new ObjectMemory scheme, I guess a
couple of identity hashing functions will come under scrutiny.

e.g.

SmallInteger>>hashMultiply
| low |

low := self bitAnd: 16383.
^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low)
bitAnd: 16383) * 16384))
bitAnd: 16r0FFFFFFF


which will need some more bits.

IMO it's not clear that SmallInteger>>identityHash should be implementedthat way. Finding a permutation of the small integers that also behaveslike a good quality hash function and evaluates quickly (insignificantly less time and complexity than, say, Bob Jenkins' lookup3)would be a really interesting research project. I don't know if it'spossible. If no such thing exists, then getting at least some roughidea of what's the minimum necessary complexity for such hash functionswould be valuable.

Looking at hashMultiply as a non-identity hash function, one would starthaving problems when significantly more than 2^28 objects are stored ina single hashed collection. 2^28 objects with e.g. 12 bytes per headerand a minimum of one instance variable (so the hash value isn't ainstance-constant) stored in a hashed collection requires more than 4gb,so that is clearly a 64 bit image problem. In 64 bits, 2^28 objectswith e.g. 16 bytes per header and a minimum of one instance variableeach is already 6gb, and a minimum of 8gb with the hashed collection itself.

Because of these figures, I'd think improving the implementation ofhashed collections takes priority over adding more non-identity hashfunction bits (as long as the existing hash values are of good quality).

Did you look at the Hash Analysis Tool I wrote? It's in the Cincompublic Store repository. It comes in two bundles: Hash Analysis Tool,and Hash Analysis Tool - Extensions. With everything loaded, the toolcomes with 300+ hash functions and 100+ data sets out of the box. Thecode is MIT.

I had a look at how it was done in VisualWorks;

The implementation of hashMultiply, yes. Note however thatSmallInteger>>hash is ^self.

hashMultiply
"Multiply the receiver by 16r0019660D mod 2^28
without using large integer arithmetic for speed.
The constant is a generator of the multiplicative
subgroup of Z_2^30, see Knuth's TAOCP vol 2."
<primitive: 1747>
| low14Bits |
low14Bits := self bitAnd: 16r3FFF.
^16384
* (16r260D * (self bitShift: -14) + (16r0065 * low14Bits) bitAnd: 16r3FFF)
+ (16r260D * low14Bits) bitAnd: 16rFFFFFFF

The hashing book version has:

multiplication
"Computes self times 1664525 mod 2^38 while avoiding overflow into a
large integer by making the multiplication into two 14 bits chunks. Do
not use any division or modulo operation."
| lowBits highBits|

lowBits := self bitAnd: 16r3FFF.
highBits := self bitShift: -14.
^(lowBits * 16r260D)
   + (((lowBits * 16r0065) bitAnd: 16r3FFF) bitShift: 14)
   + (((highBits * 16r260D) bitAnd: 16r3FFF) bitShift: 14)
bitAnd: 16rFFFFFFF

So, 16384 * is the same as bitShift: 14 and it looks like done once,
which may be better.

It should be a primitive (or otherwise optimized somehow). At somepoint though that hash function was implemented for e.g. ByteArray inSqueak, I thought at that point the multiplication step was alsoimplemented as a primitive?

Also VW marks it as a primitive, which Pharo does not.

In VW it is also a translated primitive, i.e. it's executed directly inthe JIT without calling C.

Keep in mind that the speed at which hash values are calculated is onlypart of the story. If the hash function quality is not great, or thehashed collection implementation is not efficient and induces collisionsor other extra work, improving the efficiency of the hash functionswon't do much. I think it's mentioned in the hash book (I'd have tocheck), but once I made a hash function 5x times slower to get betterquality and the result was that a report that was taking 30 minutes took90 seconds instead (and hashing was gone from the profiler output).

Would we gain
some speed doing that? hashMultiply is used a lof for identity hashes.

Bytecode has quite some work to do:

37 <70> self
38 <20> pushConstant: 16383
39 <BE> send: bitAnd:
40 <68> popIntoTemp: 0
41 <21> pushConstant: 9741
42 <10> pushTemp: 0
43 <B8> send: *
44 <21> pushConstant: 9741
45 <70> self
46 <22> pushConstant: -14
47 <BC> send: bitShift:
48 <B8> send: *
49 <23> pushConstant: 101
50 <10> pushTemp: 0
51 <B8> send: *
52 <B0> send: +
53 <20> pushConstant: 16383
54 <BE> send: bitAnd:
55 <24> pushConstant: 16384
56 <B8> send: *
57 <B0> send: +
58 <25> pushConstant: 268435455
59 <BE> send: bitAnd:
60 <7C> returnTop

If this is a primitive instead, then you can also avoid the overflowinto large integers and do the math with (basically)


mov eax, smallInteger
shr eax, numberOfTagBits
mul eax, 1664525 "the multiplication that throws out the high bits"
shl eax, 4 "throw out bits 29-32"
shr eax, 4
lea eax, [eax * 2^numberOfTagBits + smallIntegerTagBits]

Please excuse trivial omissions in the above, it's written only for thesake of illustration (e.g. it looks like the 3 last instructions can becombined into two... lea followed by shr). Also, did you see thelatency of integer multiplication instructions in modern x86 processors?...

I ran some experiments timing things.

It looks like that replacing 16384 * by bitShift:14 leads to a small
gain, bitShift (primitive 17) being faster than * (primitive 9)

Keep in mind those operations still have to check for overflow intolarge integers. In this case, large integers are not necessary.


Andres.

The bytecode is identical, except send: bitShift instead of send: *

multiplication3
| low |

low := self bitAnd: 16383.
^(16r260D * low + ((16r260D * (self bitShift: -14) + (16r0065 * low)
bitAnd: 16383) bitShift: 14))
bitAnd: 16r0FFFFFFF



[500000 timesRepeat: [ 15000 hashMultiply ]] timeToRun 12
[500000 timesRepeat: [ 15000 multiplication ]] timeToRun 41  (worse)
[500000 timesRepeat: [ 15000 multiplication3 ]] timeToRun 10 (better)

It looks like correct for SmallInteger minVal to: SmallInteger maxVal

Now, VW gives: [500000 timesRepeat: [ 15000 hashMultiply ]] timeToRun
1.149 milliseconds

Definitely worth investigating the primitive thing, or some NB Asm as
this is used about everywhere (Collections etc).

Toughts?

Phil

Re: [Pharo-dev] hashMultiply

Reply via email to