I got it working and kicked the tires a bit. It would be useful to have such a tool in Pharo :-)
Now, time to dig. Phil On Mon, Mar 3, 2014 at 12:13 AM, Andres Valloud < [email protected]> wrote: > Once the code is loaded, from the Tools menu use Hash Analysis Tool. > There's a manual below, and also the Fundamentals book has a somewhat in > depth discussion on how it works. > > ftp://sqrmax.us.to/pub/Smalltalk/Papers/Hash%20Analysis%20Tool.pdf > > On 3/2/14 15:06 , [email protected] wrote: > >> I have found the tools in the Cincom public repo. >> >> Now, I have to see how this works in VW, I am not that proficient with it. >> >> Phil >> >> >> >> On Wed, Feb 26, 2014 at 8:55 AM, [email protected] >> <mailto:[email protected]> <[email protected] >> <mailto:[email protected]>> wrote: >> >> Andres, >> >> Thanks for the insights. >> >> hash quality is indeed an key factor. At least, my mind is somewhat >> grasping this hashing field a bit better. >> >> I'll have a look at the tools, I haven't used them yet. >> >> And a shot at the ASM version with NativeBoost in Pharo. >> >> For what occurs in modern CPUs, well, no. Surprising to see that a >> mul would be faster than a shr or shl. How comes? >> I used to be ok with these things when I was writing demoscene code >> a loong time ago but I'd need an extremely serious refresh. >> >> As a side note, there is a huge uptake in the >> BigData/mapreduce/hadoop environment where Smalltalk is sorely >> absent. Scala seems to fill the void on the JVM. >> There hashing is quite key, to remap all of the mapping phase >> results to the reduce nodes. I am surprised to see that Smalltalk >> vendors haven't jumped in that space. >> >> Phil >> >> >> On Wed, Feb 26, 2014 at 2:04 AM, Andres Valloud >> <[email protected] >> <mailto:[email protected]>> wrote: >> >> Hello... >> >> On 2/25/14 1:17 , [email protected] <mailto:[email protected]> >> wrote: >> >> I am currently reading through the Hashing in Smalltalk book >> (http://www.lulu.com/shop/__andres-valloud/hashing-in-__ >> smalltalk-theory-and-practice/__paperback/product-3788892.html >> <http://www.lulu.com/shop/andres-valloud/hashing-in- >> smalltalk-theory-and-practice/paperback/product-3788892.html>__) >> and, my head hurting notwithstanding, there are indeed a ton >> of gems in >> this system. As he mentions, doing the exercises brings a >> lot of extra :-) >> >> >> :) thank you. >> >> When going to 64-bit, and with the new ObjectMemory scheme, >> I guess a >> couple of identity hashing functions will come under scrutiny. >> >> e.g. >> >> SmallInteger>>hashMultiply >> | low | >> >> low := self bitAnd: 16383. >> ^(16r260D * low + ((16r260D * (self bitShift: -14) + >> (16r0065 * low) >> bitAnd: 16383) * 16384)) >> bitAnd: 16r0FFFFFFF >> >> >> which will need some more bits. >> >> >> IMO it's not clear that SmallInteger>>identityHash should be >> implemented that way. Finding a permutation of the small >> integers that also behaves like a good quality hash function and >> evaluates quickly (in significantly less time and complexity >> than, say, Bob Jenkins' lookup3) would be a really interesting >> research project. I don't know if it's possible. If no such >> thing exists, then getting at least some rough idea of what's >> the minimum necessary complexity for such hash functions would >> be valuable. >> >> Looking at hashMultiply as a non-identity hash function, one >> would start having problems when significantly more than 2^28 >> objects are stored in a single hashed collection. 2^28 objects >> with e.g. 12 bytes per header and a minimum of one instance >> variable (so the hash value isn't a instance-constant) stored in >> a hashed collection requires more than 4gb, so that is clearly a >> 64 bit image problem. In 64 bits, 2^28 objects with e.g. 16 >> bytes per header and a minimum of one instance variable each is >> already 6gb, and a minimum of 8gb with the hashed collection >> itself. >> >> Because of these figures, I'd think improving the implementation >> of hashed collections takes priority over adding more >> non-identity hash function bits (as long as the existing hash >> values are of good quality). >> >> Did you look at the Hash Analysis Tool I wrote? It's in the >> Cincom public Store repository. It comes in two bundles: Hash >> Analysis Tool, and Hash Analysis Tool - Extensions. With >> everything loaded, the tool comes with 300+ hash functions and >> 100+ data sets out of the box. The code is MIT. >> >> I had a look at how it was done in VisualWorks; >> >> >> The implementation of hashMultiply, yes. Note however that >> SmallInteger>>hash is ^self. >> >> hashMultiply >> "Multiply the receiver by 16r0019660D mod 2^28 >> without using large integer arithmetic for speed. >> The constant is a generator of the multiplicative >> subgroup of Z_2^30, see Knuth's TAOCP vol 2." >> <primitive: 1747> >> | low14Bits | >> low14Bits := self bitAnd: 16r3FFF. >> ^16384 >> * (16r260D * (self bitShift: -14) + (16r0065 * low14Bits) >> bitAnd: 16r3FFF) >> + (16r260D * low14Bits) bitAnd: 16rFFFFFFF >> >> The hashing book version has: >> >> multiplication >> "Computes self times 1664525 mod 2^38 while avoiding >> overflow into a >> large integer by making the multiplication into two 14 bits >> chunks. Do >> not use any division or modulo operation." >> | lowBits highBits| >> >> lowBits := self bitAnd: 16r3FFF. >> highBits := self bitShift: -14. >> ^(lowBits * 16r260D) >> + (((lowBits * 16r0065) bitAnd: 16r3FFF) bitShift: 14) >> + (((highBits * 16r260D) bitAnd: 16r3FFF) bitShift: 14) >> bitAnd: 16rFFFFFFF >> >> So, 16384 * is the same as bitShift: 14 and it looks like >> done once, >> which may be better. >> >> >> It should be a primitive (or otherwise optimized somehow). At >> some point though that hash function was implemented for e.g. >> ByteArray in Squeak, I thought at that point the multiplication >> step was also implemented as a primitive? >> >> Also VW marks it as a primitive, which Pharo does not. >> >> >> In VW it is also a translated primitive, i.e. it's executed >> directly in the JIT without calling C. >> >> Keep in mind that the speed at which hash values are calculated >> is only part of the story. If the hash function quality is not >> great, or the hashed collection implementation is not efficient >> and induces collisions or other extra work, improving the >> efficiency of the hash functions won't do much. I think it's >> mentioned in the hash book (I'd have to check), but once I made >> a hash function 5x times slower to get better quality and the >> result was that a report that was taking 30 minutes took 90 >> seconds instead (and hashing was gone from the profiler output). >> >> Would we gain >> some speed doing that? hashMultiply is used a lof for >> identity hashes. >> >> Bytecode has quite some work to do: >> >> 37 <70> self >> 38 <20> pushConstant: 16383 >> 39 <BE> send: bitAnd: >> 40 <68> popIntoTemp: 0 >> 41 <21> pushConstant: 9741 >> 42 <10> pushTemp: 0 >> 43 <B8> send: * >> 44 <21> pushConstant: 9741 >> 45 <70> self >> 46 <22> pushConstant: -14 >> 47 <BC> send: bitShift: >> 48 <B8> send: * >> 49 <23> pushConstant: 101 >> 50 <10> pushTemp: 0 >> 51 <B8> send: * >> 52 <B0> send: + >> 53 <20> pushConstant: 16383 >> 54 <BE> send: bitAnd: >> 55 <24> pushConstant: 16384 >> 56 <B8> send: * >> 57 <B0> send: + >> 58 <25> pushConstant: 268435455 >> 59 <BE> send: bitAnd: >> 60 <7C> returnTop >> >> >> If this is a primitive instead, then you can also avoid the >> overflow into large integers and do the math with (basically) >> >> mov eax, smallInteger >> shr eax, numberOfTagBits >> mul eax, 1664525 "the multiplication that throws out the high >> bits" >> shl eax, 4 "throw out bits 29-32" >> shr eax, 4 >> lea eax, [eax * 2^numberOfTagBits + smallIntegerTagBits] >> >> Please excuse trivial omissions in the above, it's written only >> for the sake of illustration (e.g. it looks like the 3 last >> instructions can be combined into two... lea followed by shr). >> Also, did you see the latency of integer multiplication >> instructions in modern x86 processors?... >> >> I ran some experiments timing things. >> >> It looks like that replacing 16384 * by bitShift:14 leads to >> a small >> gain, bitShift (primitive 17) being faster than * (primitive >> 9) >> >> >> Keep in mind those operations still have to check for overflow >> into large integers. In this case, large integers are not >> necessary. >> >> Andres. >> >> The bytecode is identical, except send: bitShift instead of >> send: * >> >> multiplication3 >> | low | >> >> low := self bitAnd: 16383. >> ^(16r260D * low + ((16r260D * (self bitShift: -14) + >> (16r0065 * low) >> bitAnd: 16383) bitShift: 14)) >> bitAnd: 16r0FFFFFFF >> >> >> >> [500000 timesRepeat: [ 15000 hashMultiply ]] timeToRun 12 >> [500000 timesRepeat: [ 15000 multiplication ]] timeToRun 41 >> (worse) >> [500000 timesRepeat: [ 15000 multiplication3 ]] timeToRun 10 >> (better) >> >> It looks like correct for SmallInteger minVal to: >> SmallInteger maxVal >> >> Now, VW gives: [500000 timesRepeat: [ 15000 hashMultiply ]] >> timeToRun >> 1.149 milliseconds >> >> Definitely worth investigating the primitive thing, or some >> NB Asm as >> this is used about everywhere (Collections etc). >> >> Toughts? >> >> Phil >> >> >> >> >> >
