On Fri, Nov 8, 2013 at 11:56 AM, Jed Brown <[email protected]> wrote:
> Jed Brown <[email protected]> writes:
>
> > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 1000
> > 0.870 real 0.857 user 0.010 sys 99.67 cpu
> >
> > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 2000
> > 2.762 real 2.690 user 0.073 sys 100.03 cpu
> > 0.870 * 2^2 = 3.48
> >
> > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 3000
> > 5.176 real 4.963 user 0.213 sys 100.01 cpu
> > 0.870 * 3^2 = 7.83
> > 2.762 * (3/2)^2 = 4.59
> >
> > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 4000
> > 8.183 real 7.893 user 0.280 sys 99.88 cpu
> > 0.870 * 4^2 = 13.92
> > 2.762 * (4/2)^2 = 7.63
> > 5.176 * (4/3)^2 = 8.95
> >
> > This looks stable at about 2 million lookups and insertions per second
> > (or 4M hash operations).
>
> If we use the khash API the way it was intended (via kh_put and kh_val;
> ignore the ugly API inconsistency)
>
>   PetscHashICreate(table);
>   for (i = 0; i < N; ++i) {
>     for (j = 0; j < N; ++j) {
>       PetscInt key = (PetscMin(i,j) << 16) + PetscMax(i,j);
>       khint_t  ret,idx = kh_put(HASHI,table,key,&ret);
>       if (ret == 1) kh_val(table,idx) = newp++;
>     }
>   }
>   PetscHashIDestroy(table);
>
> we get numbers that are about twice as good (no weird chaining and only
> one lookup):

One of the original use cases for PetscHash was pseudograph manipulation (multiple edges allowed), which required the 'multivalued' option; that option might slow things down unnecessarily in the simpler use cases. PetscHash also tried to hide the idiosyncrasies of the khash API, anticipating that khash might have to be replaced at the backend with another hash table implementation. It's possible that both the PetscHash API and the implementation need to be revised. The API itself could probably be simplified somewhat. If we were to stick with khash long-term, then maybe the PetscHash API should more directly reflect khash's preferred use pattern: put-test-val. Initially PetscHash was all macros, but some of it was later switched over to PETSC_STATIC_INLINE functions.
Even if these are being inlined by the compiler, there may be some call overhead, which may be avoidable. Besides, it may be more natural to have macros wrap the use case in the snippet above. And there are no error codes being returned from inside PetscHash at the moment. That might have to change, however, if the bare realloc() calls in khash are replaced by something like PetscRealloc().

> $ for n in 1000 2000 3000 4000; do time mpich-clang-optg/lib/ex26-obj/ex26 -N $n; done
> 0.462 real 0.453 user 0.007 sys 99.59 cpu
> 1.476 real 1.457 user 0.020 sys 100.01 cpu
> 2.791 real 2.767 user 0.023 sys 99.96 cpu
> 3.927 real 3.893 user 0.037 sys 100.06 cpu
>
> The debug-mode performance is also tons better.
>
> I see another hash table shootout by a good developer:
>
>   http://preshing.com/20110603/hash-table-performance-tests/

If the implementation of PetscHash is to be replaced, it would be a good idea to do some representative testing of the proposed new implementation. In particular, I don't have a good feel for how double hashing compares to linear probing or to chaining. The fastest implementation in that shootout uses a smallish fixed-size table, so it might not be an entirely fair comparison with khash et al. I don't think the author claims the test to be comprehensive anyhow.

Dmitry.

> The fastest in his comparison uses simple chaining instead of double
> hashing (which khash uses). I don't have a direct comparison of the
> two, and Jeff's version is hashing strings, so it would have to be
> modified/simplified to compare directly.
>
> Does your algorithm need lots better than 4M hash lookups per second, or
> do we just need to optimize the current implementation?
