On Fri, Nov 8, 2013 at 11:56 AM, Jed Brown <[email protected]> wrote:
> Jed Brown <[email protected]> writes:
>
> > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 1000
> > 0.870 real 0.857 user 0.010 sys 99.67 cpu
> >
> > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 2000
> > 2.762 real 2.690 user 0.073 sys 100.03 cpu
> > 0.870 * 2^2 = 3.48
> >
> > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 3000
> > 5.176 real 4.963 user 0.213 sys 100.01 cpu
> > 0.870 * 3^2 = 7.83
> > 2.762 * (3/2)^2 = 4.59
> >
> > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 4000
> > 8.183 real 7.893 user 0.280 sys 99.88 cpu
> > 0.870 * 4^2 = 13.92
> > 2.762 * (4/2)^2 = 7.63
> > 5.176 * (4/3)^2 = 8.95
> >
> > This looks stable at about 2 million lookups and insertions per second
> > (or 4M hash operations).
>
> If we use the khash API the way it was intended (via kh_put and kh_val;
> ignore the ugly API inconsistency)
>
>   PetscHashICreate(table);
>   for (i = 0; i < N; ++i) {
>     for (j = 0; j < N; ++j) {
>       PetscInt key = (PetscMin(i,j) << 16) + PetscMax(i,j);
>       khint_t  ret,idx = kh_put(HASHI,table,key,&ret);
>       if (ret == 1) kh_val(table,idx) = newp++;
>     }
>   }
>   PetscHashIDestroy(table);
>
> we get numbers that are about twice as good (no weird chaining and only
> one lookup):

One of the original use cases for PetscHash was pseudograph manipulation (multiple edges allowed), which required the 'multivalued' option; that option might slow things down unnecessarily in the simpler use cases. PetscHash also tried to hide the idiosyncrasies of the khash API, anticipating that khash might have to be replaced at the backend with another hash table implementation. It's possible that both the PetscHash API and the implementation need to be revised. The API itself could probably be simplified somewhat. If we were to stick with khash long-term, then maybe the PetscHash API should more directly reflect khash's preferred use pattern: put-test-val. Initially PetscHash was all macros, but some of it was later switched over to PETSC_STATIC_INLINE functions.
Even if these are being inlined by the compiler, there may be some call overhead, which may be avoidable. Besides, it may be more natural to have macros wrap the use case in the snippet above. And there are no error codes being returned from inside PetscHash at the moment. That might have to change, however, if the bare realloc() calls in khash are replaced by something like PetscRealloc().

> $ for n in 1000 2000 3000 4000; do time mpich-clang-optg/lib/ex26-obj/ex26 -N $n; done
> 0.462 real 0.453 user 0.007 sys 99.59 cpu
> 1.476 real 1.457 user 0.020 sys 100.01 cpu
> 2.791 real 2.767 user 0.023 sys 99.96 cpu
> 3.927 real 3.893 user 0.037 sys 100.06 cpu
>
> The debug-mode performance is also tons better.
>
> I see another hash table shootout by a good developer:
>
>   http://preshing.com/20110603/hash-table-performance-tests/

If the implementation of PetscHash is to be replaced, it would be a good idea to do some representative testing of the proposed new implementation. In particular, I don't have a good feel for how double hashing compares to linear probing or to chaining. The fastest implementation in that shootout uses a smallish fixed-size table, so it might not be an entirely fair comparison with khash et al. I don't think the author claims the test to be comprehensive anyhow.

Dmitry.

> The fastest in his comparison uses simple chaining instead of double
> hashing (which khash uses). I don't have a direct comparison of the
> two, and Jeff's version is hashing strings, so it would have to be
> modified/simplified to compare directly.
>
> Does your algorithm need lots better than 4M hash lookups per second, or
> do we just need to optimize the current implementation?
