Re: [OMPI devel] RFC: improve the hash function used by opal_hash_table_t

2013-06-11 Thread George Bosilca
On Jun 12, 2013, at 00:22 , Nathan Hjelm  wrote:

> Though a hardware accelerated crc32 (if available) would probably work great 
> as well.

http://google-opensource.blogspot.fr/2011/04/introducing-cityhash.html
with code available under MIT @ https://code.google.com/p/cityhash/

  George. 


Re: [OMPI devel] RFC: improve the hash function used by opal_hash_table_t

2013-06-11 Thread Nathan Hjelm
I think as long as we are updating this we might as well add a performant hash 
function. I see Murmur2 is 32-bit and has a public domain implementation: 
https://code.google.com/p/smhasher/source/browse/trunk/MurmurHash2.cpp

Since Murmur2 hashes in 4-byte blocks I expect it probably would perform better 
than either a nieve crc32 or Jenkins. Though a hardware accelerated crc32 (if 
available) would probably work great as well. Both are good suggestions that I 
will look into. I will also look into the string variant of Murmur though that 
would require another set of opal_hash_table functions for string keys (I was 
using the _ptr variety).

-Nathan

On Wed, Jun 12, 2013 at 12:10:42AM +0200, George Bosilca wrote:
> The one-at-the-time version computes on chars, if the performance of the hash 
> function is a critical element in the equation then you will be better off 
> avoiding its usage. I would suggest going with Murmur 
> (http://en.wikipedia.org/wiki/MurmurHash) instead, which is faster and 
> perform well in random distribution. Another interesting features is that 
> there are specialized derivative for strings based key, a feature that might 
> prove helpful with the MCA parameters and the MPI_T stuff.
> 
>   George.
> 
> 
> On Jun 11, 2013, at 23:32 , Nathan Hjelm  wrote:
> 
> > What: Implement a better hash function in opal_hash_table_t. The function 
> > is a simple one-at-a-time Jenkin's hash (see 
> > http://en.wikipedia.org/wiki/Jenkins_hash_function) and has good collision 
> > rates and isn't overly complex or slow.
> > 
> > Why: I am preparing an update to the MCA variable system (adding 
> > performance variables) which will include a hash-based lookup function (in 
> > preperation for the inevitable MPI_T_cvar/pvar/category lookup functions-- 
> > MPI 3.0 errata). The current hash function is not very good so now seems 
> > like a good time to update it.
> > 
> > When: Will push this to trunk on Thursday if there are no objections.
> > 
> > Patch below
> > 
> > Index: opal/class/opal_hash_table.c
> > ===
> > --- opal/class/opal_hash_table.c(revision 28609)
> > +++ opal/class/opal_hash_table.c(working copy)
> > @@ -356,14 +356,20 @@
> > static inline uint32_t opal_hash_value(size_t mask, const void *key,
> >size_t keysize)
> > {
> > -size_t h, i;
> > -const unsigned char *p;
> > -
> > -h = 0;
> > -p = (const unsigned char *)key;
> > -for (i = 0; i < keysize; i++, p++)
> > -h = HASH_MULTIPLIER*h + *p;
> > -return (uint32_t)(h & mask);
> > +const unsigned char *p = (const unsigned char *) key;
> > +uint32_t h = 0, i;
> > +
> > +for (i = 0 ; i < keysize ; ++i, ++p) {
> > +h += *p;
> > +h += h << 10;
> > +h ^= h >> 6;
> > +}
> > +
> > +h += h << 3;
> > +h ^= h >> 11;
> > +h += h << 15;
> > +
> > +return h & (uint32_t) mask;
> > }
> > 
> > int opal_hash_table_get_value_ptr(opal_hash_table_t* ht, const void* key,
> > 
> > 
> > 
> > 
> > -Nathan
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] RFC: improve the hash function used by opal_hash_table_t

2013-06-11 Thread George Bosilca
The one-at-the-time version computes on chars, if the performance of the hash 
function is a critical element in the equation then you will be better off 
avoiding its usage. I would suggest going with Murmur 
(http://en.wikipedia.org/wiki/MurmurHash) instead, which is faster and perform 
well in random distribution. Another interesting features is that there are 
specialized derivative for strings based key, a feature that might prove 
helpful with the MCA parameters and the MPI_T stuff.

  George.


On Jun 11, 2013, at 23:32 , Nathan Hjelm  wrote:

> What: Implement a better hash function in opal_hash_table_t. The function is 
> a simple one-at-a-time Jenkin's hash (see 
> http://en.wikipedia.org/wiki/Jenkins_hash_function) and has good collision 
> rates and isn't overly complex or slow.
> 
> Why: I am preparing an update to the MCA variable system (adding performance 
> variables) which will include a hash-based lookup function (in preperation 
> for the inevitable MPI_T_cvar/pvar/category lookup functions-- MPI 3.0 
> errata). The current hash function is not very good so now seems like a good 
> time to update it.
> 
> When: Will push this to trunk on Thursday if there are no objections.
> 
> Patch below
> 
> Index: opal/class/opal_hash_table.c
> ===
> --- opal/class/opal_hash_table.c  (revision 28609)
> +++ opal/class/opal_hash_table.c  (working copy)
> @@ -356,14 +356,20 @@
> static inline uint32_t opal_hash_value(size_t mask, const void *key,
>size_t keysize)
> {
> -size_t h, i;
> -const unsigned char *p;
> -
> -h = 0;
> -p = (const unsigned char *)key;
> -for (i = 0; i < keysize; i++, p++)
> -h = HASH_MULTIPLIER*h + *p;
> -return (uint32_t)(h & mask);
> +const unsigned char *p = (const unsigned char *) key;
> +uint32_t h = 0, i;
> +
> +for (i = 0 ; i < keysize ; ++i, ++p) {
> +h += *p;
> +h += h << 10;
> +h ^= h >> 6;
> +}
> +
> +h += h << 3;
> +h ^= h >> 11;
> +h += h << 15;
> +
> +return h & (uint32_t) mask;
> }
> 
> int opal_hash_table_get_value_ptr(opal_hash_table_t* ht, const void* key,
> 
> 
> 
> 
> -Nathan
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] RFC: improve the hash function used by opal_hash_table_t

2013-06-11 Thread Nathan Hjelm
What: Implement a better hash function in opal_hash_table_t. The function is a 
simple one-at-a-time Jenkin's hash (see 
http://en.wikipedia.org/wiki/Jenkins_hash_function) and has good collision 
rates and isn't overly complex or slow.

Why: I am preparing an update to the MCA variable system (adding performance 
variables) which will include a hash-based lookup function (in preperation for 
the inevitable MPI_T_cvar/pvar/category lookup functions-- MPI 3.0 errata). The 
current hash function is not very good so now seems like a good time to update 
it.

When: Will push this to trunk on Thursday if there are no objections.

Patch below

Index: opal/class/opal_hash_table.c
===
--- opal/class/opal_hash_table.c(revision 28609)
+++ opal/class/opal_hash_table.c(working copy)
@@ -356,14 +356,20 @@
 static inline uint32_t opal_hash_value(size_t mask, const void *key,
size_t keysize)
 {
-size_t h, i;
-const unsigned char *p;
-
-h = 0;
-p = (const unsigned char *)key;
-for (i = 0; i < keysize; i++, p++)
-h = HASH_MULTIPLIER*h + *p;
-return (uint32_t)(h & mask);
+const unsigned char *p = (const unsigned char *) key;
+uint32_t h = 0, i;
+
+for (i = 0 ; i < keysize ; ++i, ++p) {
+h += *p;
+h += h << 10;
+h ^= h >> 6;
+}
+
+h += h << 3;
+h ^= h >> 11;
+h += h << 15;
+
+return h & (uint32_t) mask;
 }

 int opal_hash_table_get_value_ptr(opal_hash_table_t* ht, const void* key,




-Nathan