Vyacheslav, thanks for your suggestions. I tried them out; applying them together, over 200 random keys with 100,000 iterations each, it comes to about 0.000048ms per key (hash output is 32-bit signed).
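For concreteness, here is a minimal sketch of the int32 variant being timed, assuming fixed-length 20-byte keys in a Uint8Array. The names (tabHash, KEY_LENGTH) and the deterministic table fill are illustrative only, not the actual benchmark code:

```javascript
// Sketch of the int32 tabulation hash (illustrative, not the benchmark code).
// Keys are fixed-length 20-byte Uint8Arrays.
var KEY_LENGTH = 20;

// One 256-entry table per byte position. Int32Array keeps the constants
// untagged; (value % Math.pow(2, 32)) | 0 forces uint32 values into int32.
var tables = new Int32Array(256 * KEY_LENGTH);
for (var i = 0; i < tables.length; i++) {
  // Deterministic filler so the sketch is self-contained; the real tables
  // hold random integers.
  tables[i] = ((i + 1) * 2654435761 % Math.pow(2, 32)) | 0;
}

function tabHash(key) {
  var hash = 0;
  for (var i = 0; i < KEY_LENGTH; i++) {
    // XOR in the table entry selected by byte position and byte value.
    // No trailing >>> 0, so the result stays a 32-bit signed integer.
    hash ^= tables[(i << 8) | key[i]];
  }
  return hash;
}
```

With 31-bit table constants the same function would never leave the small-integer range on ia32, which is consistent with the timing differences discussed in this thread.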
Ultimately the hash result is used as an index into a hash table, so it needs to be cast to an unsigned integer at some point. If I do a) and b) but then cast the result, it's about 0.00022ms per key (hash output is 32-bit unsigned). If I use a Uint32Array or a normal array, with 31-bit unsigned integers as hashing table constants, and then cast the result to unsigned, it's about 0.000048ms per key (hash output is 31-bit unsigned). Is there something obvious I'm missing that would let me get an index into a hash table bucket from a 32-bit signed integer without slowing things down?

I also noticed along the way that if I run the original 32-bit unsigned integer hash a few times first, and then the faster 31-bit unsigned integer hash, the 31-bit hash instead becomes two orders of magnitude slower.

On Wednesday, May 30, 2012 6:25:39 PM UTC+2, Vyacheslav Egorov wrote:
>
> Is it essential that your hash should be uint32 not int32?
>
> I would assume you can get good performance and 32 bits of hash if
> you stay with int32's:
>
> a) fill your tables with (integer % Math.pow(2,32)) | 0 to force
> uint32 into int32 and
> b) make your table a typed array instead of a normal array: new
> Int32Array(256); [not necessary if you are running the x64 version of V8]
> c) stop doing >>> 0 at the end;
>
> --
> Vyacheslav Egorov
>
> On Wed, May 30, 2012 at 5:21 PM, Joran Greef <[email protected]> wrote:
> > The difference comes through when changing "integer % Math.pow(2,32)"
> > to "integer % Math.pow(2,31)" when pre-generating the hash tables.
> >
> > Hash tables containing 256 random 31-bit unsigned integers are
> > pre-generated for every byte of key. The hash operates on fixed-length
> > 20-byte keys.
> >
> > Each byte of the key is XOR-ed with one of the integers in the table,
> > depending on the position of the byte in the key, so the XOR is dealing
> > with an 8-bit unsigned integer and a 31-bit unsigned integer.
> >
> > The result is cast to an unsigned integer by >>> 0.
> >
> > On Wednesday, May 30, 2012 5:00:55 PM UTC+2, Vyacheslav Egorov wrote:
> >>
> >> Minor correction: 31-bit tagged _signed_ integers are used on ia32; on
> >> x64 you get 32-bit tagged _signed_ integers. Neither, though, is wide
> >> enough to contain all values from the unsigned 32-bit integer range.
> >>
> >> Thus if you are really using them as 32-bit _unsigned_ integers, e.g.
> >> you are doing something like:
> >>
> >> var a = (b ^ c) >>> 0; // force into uint32 and then use in a
> >> non-int32-truncating manner.
> >>
> >> then unfortunately even V8's optimizing compiler gets confused. It
> >> does not have a designated uint32 representation and does not try to
> >> infer when int32 can be safely used instead of uint32 (another example
> >> of this bug: http://code.google.com/p/v8/issues/detail?id=2097).
> >>
> >> I suggest you post your code here if possible so that we can take a
> >> look.
> >>
> >> --
> >> Vyacheslav Egorov
> >>
> >> On Wed, May 30, 2012 at 4:40 PM, Jakob Kummerow <[email protected]> wrote:
> >> > As long as you're running unoptimized code on a 32-bit V8, this is
> >> > expected: 31-bit integers are stored directly as "small integers", and
> >> > the 32nd bit is used to tag them as such, whereas 32-bit integers are
> >> > converted to doubles and stored as objects on the heap, which makes
> >> > accessing them more expensive.
> >> >
> >> > When your code runs long enough for the optimizer to kick in, it should
> >> > recognize this situation, use untagged 32-bit integer values in
> >> > optimized code, and the difference between 31-bit and 32-bit values
> >> > should go away. If it doesn't, please post a reduced test case that
> >> > exhibits the behavior so that we can investigate. (Running the code for
> >> > a second or so should be enough to get the full effect of optimization
> >> > and make the initial difference negligible.)
> >> >
> >> >
> >> > On Wed, May 30, 2012 at 4:31 PM, Joran Greef <[email protected]> wrote:
> >> >>
> >> >> I am implementing a table hash
> >> >> (http://en.wikipedia.org/wiki/Tabulation_hashing) and noticed that a
> >> >> table hash using a table of 31-bit unsigned integers is almost an
> >> >> order of magnitude faster than a table hash using a table of 32-bit
> >> >> unsigned integers.
> >> >>
> >> >> The former has an average hash time of 0.00007ms per 20 byte key for
> >> >> a 31-bit hash, and the latter has an average hash time of 0.00034ms
> >> >> per 20 byte key for a 32-bit hash.
> >> >>
> >> >> I figured that XOR on 8-bit integers would be faster than XOR on
> >> >> 16-bit integers would be faster than XOR on 24-bit integers would be
> >> >> faster than XOR on 32-bit integers, but did not anticipate such a
> >> >> difference between 31-bit and 32-bit integers.
> >> >>
> >> >> Is there something regarding XOR that I may be missing that could
> >> >> explain the difference?
> >> >>
> >> > --
> >> > v8-users mailing list
> >> > [email protected]
> >> > http://groups.google.com/group/v8-users
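As an editorial aside on the bucket-index question at the top of the thread (this is not a suggestion made in the thread itself): if the bucket count is kept a power of two, the index can be taken by masking the signed hash. In JavaScript, `&` coerces both operands to int32, and masking with a non-negative mask can never produce a negative result, so no `>>> 0` cast is needed. `BUCKETS` below is a hypothetical table size:

```javascript
// Sketch: derive a bucket index from an int32 hash without >>> 0.
// Assumes a power-of-two bucket count so the low bits can be masked off.
var BUCKETS = 1 << 16; // hypothetical table size
var MASK = BUCKETS - 1;

function bucketIndex(hash) {
  // & coerces to int32 and, with a non-negative mask, the result is
  // always in [0, BUCKETS - 1], so it stays a small integer in V8.
  return hash & MASK;
}
```

For non-negative hashes this is equivalent to `hash % BUCKETS`; for negative ones it still selects the low bits directly, avoiding both the modulo and the unsigned cast.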
