On Fri, Nov 8, 2013 at 11:56 AM, Jed Brown <[email protected]> wrote:
> Jed Brown <[email protected]> writes: > > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 1000 > > 0.870 real 0.857 user 0.010 sys 99.67 cpu > > > > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 2000 > > 2.762 real 2.690 user 0.073 sys 100.03 cpu > > 0.870 * 2^2 = 3.48 > > > > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 3000 > > 5.176 real 4.963 user 0.213 sys 100.01 cpu > > 0.870 * 3^2 = 7.83 > > 2.762 * (3/2)^2 = 4.59 > > > > $ time mpich-clang-optg/lib/ex26-obj/ex26 -N 4000 > > 8.183 real 7.893 user 0.280 sys 99.88 cpu > > 0.870 * 4^2 = 13.92 > > 2.762 * (4/2)^2 = 7.63 > > 5.176 * (4/3)^2 = 8.95 > > > > This looks stable at about 2 million lookups and insertions per second > > (or 4M hash operations). > > If we use the khash API the way it was intended (via kh_put and kh_val; > ignore the ugly API inconsistency) > > PetscHashICreate(table); > for (i = 0; i < N; ++i) { > for (j = 0; j < N; ++j) { > PetscInt key = (PetscMin(i,j) << 16) + PetscMax(i,j); > khint_t ret,idx = kh_put(HASHI,table,key,&ret); > if (ret == 1) kh_val(table,idx) = newp++; > } > } > PetscHashIDestroy(table); > > we get numbers that are about twice as good (no weird chaining and only > one lookup): > > $ for n in 1000 2000 3000 4000; do time mpich-clang-optg/lib/ex26-obj/ex26 > -N $n; done > 0.462 real 0.453 user 0.007 sys 99.59 cpu > 1.476 real 1.457 user 0.020 sys 100.01 cpu > 2.791 real 2.767 user 0.023 sys 99.96 cpu > 3.927 real 3.893 user 0.037 sys 100.06 cpu > > The debug-mode performance is also tons better. > > I see another hash table shootout by a good developer: > > http://preshing.com/20110603/hash-table-performance-tests/ > > The fastest in his comparison uses simple chaining instead of double > hashing (which khash uses). I don't have a direct comparison of the > two, and Jeff's version is hashing strings, so it would have to be > modified/simplified to compare directly. > > Does your algorithm need lots better than 4M hash lookups per second, or > do we just need to optimize the current implementation? > Okay, here is the clearly quadratic performance, and its in next. Build SNES ex12 and run using /PETSc3/petsc/petsc-dev/arch-c-opencl-opt-next/lib/ex12-obj/ex12 -run_type perf -refinement_limit 0.00000625 -variable_coefficient field -petscspace_order 1 -mat_petscspace_order 0 -show_initial 0 -show_solution 0 -petscfe_type basic -log_summary -interpolate -refinement_limit 0.000625 DMPlexInterpolate 4 1.0 1.7443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 12 0 0 0 24 12 0 0 0 24 0 -refinement_limit 0.0003125 DMPlexInterpolate 4 1.0 3.3111e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 13 0 0 0 24 13 0 0 0 24 0 -refinement_limit 0.0000625 DMPlexInterpolate 4 1.0 2.3465e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 21 0 0 0 24 21 0 0 0 24 0 -refinement_limit 0.00003125 DMPlexInterpolate 4 1.0 7.7508e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 30 0 0 0 24 30 0 0 0 24 0 -refinement_limit 0.000015625 DMPlexInterpolate 4 1.0 2.7267e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 42 0 0 0 24 42 0 0 0 24 0 -refinement_limit 0.0000078125 DMPlexInterpolate 4 1.0 1.0175e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 58 0 0 0 24 58 0 0 0 24 0 -refinement_limit 0.00000625 DMPlexInterpolate 4 1.0 3.8912e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 72 0 0 0 24 72 0 0 0 24 0 You can see the quadratic complexity kick in, and this is what people are complaining about. Does this mean we have to ditch kh_hash, or have I done something wrong. Matt -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
