Re: [OMPI devel] RFC: improve the hash function used by opal_hash_table_t
On Jun 12, 2013, at 00:22 , Nathan Hjelmwrote: > Though a hardware accelerated crc32 (if available) would probably work great > as well. http://google-opensource.blogspot.fr/2011/04/introducing-cityhash.html with code available under MIT @ https://code.google.com/p/cityhash/ George.
Re: [OMPI devel] RFC: improve the hash function used by opal_hash_table_t
I think as long as we are updating this we might as well add a performant hash function. I see Murmur2 is 32-bit and has a public domain implementation: https://code.google.com/p/smhasher/source/browse/trunk/MurmurHash2.cpp Since Murmur2 hashes in 4-byte blocks I expect it probably would perform better than either a nieve crc32 or Jenkins. Though a hardware accelerated crc32 (if available) would probably work great as well. Both are good suggestions that I will look into. I will also look into the string variant of Murmur though that would require another set of opal_hash_table functions for string keys (I was using the _ptr variety). -Nathan On Wed, Jun 12, 2013 at 12:10:42AM +0200, George Bosilca wrote: > The one-at-the-time version computes on chars, if the performance of the hash > function is a critical element in the equation then you will be better off > avoiding its usage. I would suggest going with Murmur > (http://en.wikipedia.org/wiki/MurmurHash) instead, which is faster and > perform well in random distribution. Another interesting features is that > there are specialized derivative for strings based key, a feature that might > prove helpful with the MCA parameters and the MPI_T stuff. > > George. > > > On Jun 11, 2013, at 23:32 , Nathan Hjelmwrote: > > > What: Implement a better hash function in opal_hash_table_t. The function > > is a simple one-at-a-time Jenkin's hash (see > > http://en.wikipedia.org/wiki/Jenkins_hash_function) and has good collision > > rates and isn't overly complex or slow. > > > > Why: I am preparing an update to the MCA variable system (adding > > performance variables) which will include a hash-based lookup function (in > > preperation for the inevitable MPI_T_cvar/pvar/category lookup functions-- > > MPI 3.0 errata). The current hash function is not very good so now seems > > like a good time to update it. > > > > When: Will push this to trunk on Thursday if there are no objections. > > > > Patch below > > > > Index: opal/class/opal_hash_table.c > > === > > --- opal/class/opal_hash_table.c(revision 28609) > > +++ opal/class/opal_hash_table.c(working copy) > > @@ -356,14 +356,20 @@ > > static inline uint32_t opal_hash_value(size_t mask, const void *key, > >size_t keysize) > > { > > -size_t h, i; > > -const unsigned char *p; > > - > > -h = 0; > > -p = (const unsigned char *)key; > > -for (i = 0; i < keysize; i++, p++) > > -h = HASH_MULTIPLIER*h + *p; > > -return (uint32_t)(h & mask); > > +const unsigned char *p = (const unsigned char *) key; > > +uint32_t h = 0, i; > > + > > +for (i = 0 ; i < keysize ; ++i, ++p) { > > +h += *p; > > +h += h << 10; > > +h ^= h >> 6; > > +} > > + > > +h += h << 3; > > +h ^= h >> 11; > > +h += h << 15; > > + > > +return h & (uint32_t) mask; > > } > > > > int opal_hash_table_get_value_ptr(opal_hash_table_t* ht, const void* key, > > > > > > > > > > -Nathan > > ___ > > devel mailing list > > de...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] RFC: improve the hash function used by opal_hash_table_t
The one-at-the-time version computes on chars, if the performance of the hash function is a critical element in the equation then you will be better off avoiding its usage. I would suggest going with Murmur (http://en.wikipedia.org/wiki/MurmurHash) instead, which is faster and perform well in random distribution. Another interesting features is that there are specialized derivative for strings based key, a feature that might prove helpful with the MCA parameters and the MPI_T stuff. George. On Jun 11, 2013, at 23:32 , Nathan Hjelmwrote: > What: Implement a better hash function in opal_hash_table_t. The function is > a simple one-at-a-time Jenkin's hash (see > http://en.wikipedia.org/wiki/Jenkins_hash_function) and has good collision > rates and isn't overly complex or slow. > > Why: I am preparing an update to the MCA variable system (adding performance > variables) which will include a hash-based lookup function (in preperation > for the inevitable MPI_T_cvar/pvar/category lookup functions-- MPI 3.0 > errata). The current hash function is not very good so now seems like a good > time to update it. > > When: Will push this to trunk on Thursday if there are no objections. > > Patch below > > Index: opal/class/opal_hash_table.c > === > --- opal/class/opal_hash_table.c (revision 28609) > +++ opal/class/opal_hash_table.c (working copy) > @@ -356,14 +356,20 @@ > static inline uint32_t opal_hash_value(size_t mask, const void *key, >size_t keysize) > { > -size_t h, i; > -const unsigned char *p; > - > -h = 0; > -p = (const unsigned char *)key; > -for (i = 0; i < keysize; i++, p++) > -h = HASH_MULTIPLIER*h + *p; > -return (uint32_t)(h & mask); > +const unsigned char *p = (const unsigned char *) key; > +uint32_t h = 0, i; > + > +for (i = 0 ; i < keysize ; ++i, ++p) { > +h += *p; > +h += h << 10; > +h ^= h >> 6; > +} > + > +h += h << 3; > +h ^= h >> 11; > +h += h << 15; > + > +return h & (uint32_t) mask; > } > > int opal_hash_table_get_value_ptr(opal_hash_table_t* ht, const void* key, > > > > > -Nathan > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel
[OMPI devel] RFC: improve the hash function used by opal_hash_table_t
What: Implement a better hash function in opal_hash_table_t. The function is a simple one-at-a-time Jenkin's hash (see http://en.wikipedia.org/wiki/Jenkins_hash_function) and has good collision rates and isn't overly complex or slow. Why: I am preparing an update to the MCA variable system (adding performance variables) which will include a hash-based lookup function (in preperation for the inevitable MPI_T_cvar/pvar/category lookup functions-- MPI 3.0 errata). The current hash function is not very good so now seems like a good time to update it. When: Will push this to trunk on Thursday if there are no objections. Patch below Index: opal/class/opal_hash_table.c === --- opal/class/opal_hash_table.c(revision 28609) +++ opal/class/opal_hash_table.c(working copy) @@ -356,14 +356,20 @@ static inline uint32_t opal_hash_value(size_t mask, const void *key, size_t keysize) { -size_t h, i; -const unsigned char *p; - -h = 0; -p = (const unsigned char *)key; -for (i = 0; i < keysize; i++, p++) -h = HASH_MULTIPLIER*h + *p; -return (uint32_t)(h & mask); +const unsigned char *p = (const unsigned char *) key; +uint32_t h = 0, i; + +for (i = 0 ; i < keysize ; ++i, ++p) { +h += *p; +h += h << 10; +h ^= h >> 6; +} + +h += h << 3; +h ^= h >> 11; +h += h << 15; + +return h & (uint32_t) mask; } int opal_hash_table_get_value_ptr(opal_hash_table_t* ht, const void* key, -Nathan