Matthew Knepley <[email protected]> writes:

> I think I have tracked down the DMPlex performance issue to a problem with
> PetscHashIJKL. I replaced it with PetscHashIJ in 2D, and I still have bad
> behavior. I made a simple test and the growth of runtime is definitely
> superlinear. The test is ex26 in knepley/fix-hash-scaling. I get
>
> next *$:/PETSc3/petsc/petsc-dev$
> /PETSc3/petsc/petsc-dev/arch-c-opencl-next/lib/ex26-obj/ex26 -N 100
> -log_summary | grep "(sec):" | cut -c 23-32

Your use of grep conceals the fact that you are not running in optimized mode.

> 6.067e-01
> next *$:/PETSc3/petsc/petsc-dev$
> /PETSc3/petsc/petsc-dev/arch-c-opencl-next/lib/ex26-obj/ex26 -N 110
> -log_summary | grep "(sec):" | cut -c 23-32
> 1.491e+00
> next *$:/PETSc3/petsc/petsc-dev$
> /PETSc3/petsc/petsc-dev/arch-c-opencl-next/lib/ex26-obj/ex26 -N 120
> -log_summary | grep "(sec):" | cut -c 23-32
> 2.913e+00
> next *$:/PETSc3/petsc/petsc-dev$
> /PETSc3/petsc/petsc-dev/arch-c-opencl-next/lib/ex26-obj/ex26 -N 130
> -log_summary | grep "(sec):" | cut -c 23-32
> 4.892e+00
> next *$:/PETSc3/petsc/petsc-dev$
> /PETSc3/petsc/petsc-dev/arch-c-opencl-next/lib/ex26-obj/ex26 -N 140
> -log_summary | grep "(sec):" | cut -c 23-32
> 6.812e+00
> next *$:/PETSc3/petsc/petsc-dev$
> /PETSc3/petsc/petsc-dev/arch-c-opencl-next/lib/ex26-obj/ex26 -N 150
> -log_summary | grep "(sec):" | cut -c 23-32
> 9.777e+00

This test is a nested loop, so its runtime should scale quadratically in N.

$ time mpich-clang-optg/lib/ex26-obj/ex26 -N 1000
0.870 real   0.857 user   0.010 sys   99.67 cpu

$ time mpich-clang-optg/lib/ex26-obj/ex26 -N 2000
2.762 real   2.690 user   0.073 sys   100.03 cpu
0.870 * 2^2 = 3.48

$ time mpich-clang-optg/lib/ex26-obj/ex26 -N 3000
5.176 real   4.963 user   0.213 sys   100.01 cpu
0.870 * 3^2 = 7.83
2.762 * (3/2)^2 = 6.21

$ time mpich-clang-optg/lib/ex26-obj/ex26 -N 4000
8.183 real   7.893 user   0.280 sys   99.88 cpu
0.870 * 4^2 = 13.92
2.762 * (4/2)^2 = 11.05
5.176 * (4/3)^2 = 9.20
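
The quadratic extrapolation used above, t_ref * (N/N_ref)^2, can be
checked with a short script. This is only a sketch in Python, not part
of ex26; it uses the "real" times reported above, and the names
`timings`, `predict_quadratic` and `fitted_exponent` are made up for
illustration.

```python
import math

# Wall-clock seconds ("real") from the optimized runs above, keyed by N.
timings = {1000: 0.870, 2000: 2.762, 3000: 5.176, 4000: 8.183}

def predict_quadratic(n_ref, n):
    """Extrapolate the time at n from the run at n_ref, assuming O(N^2)."""
    return timings[n_ref] * (n / n_ref) ** 2

def fitted_exponent(n_a, n_b):
    """Exponent p such that time ~ N^p between two measurements."""
    return math.log(timings[n_b] / timings[n_a]) / math.log(n_b / n_a)

for n in sorted(timings):
    # Compare each measurement with extrapolations from all smaller runs.
    preds = [f"{predict_quadratic(m, n):.2f} (from N={m})"
             for m in sorted(timings) if m < n]
    print(f"N={n}: measured {timings[n]:.3f}", "; predicted", ", ".join(preds) or "n/a")

print(f"fitted exponent, N=1000 -> 4000: {fitted_exponent(1000, 4000):.2f}")
```

A fitted exponent at or below 2 means the measured growth is no worse
than the quadratic growth expected from the nested loop.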

This looks stable at about 2 million lookups and insertions per second
(or 4M hash operations).
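
The rate can be estimated from the timings above. The sketch below is
in Python and is not taken from ex26; it assumes that the nested loop
performs N^2 inner iterations and that each iteration does one lookup
and one insertion (two hash operations), which is inferred from the
statement above rather than from the test source. The function name
`iterations_per_second` is hypothetical.

```python
# Wall-clock seconds ("real") from the optimized runs above, keyed by N.
timings = {1000: 0.870, 2000: 2.762, 3000: 5.176, 4000: 8.183}

def iterations_per_second(n, seconds):
    """Inner-loop iterations per second, assuming n*n iterations in total."""
    return n * n / seconds

for n, t in sorted(timings.items()):
    rate = iterations_per_second(n, t)
    # Assumption: one lookup plus one insertion per iteration, so 2 hash ops.
    print(f"N={n}: {rate / 1e6:.2f}M iterations/s, {2 * rate / 1e6:.2f}M hash ops/s")
```

Under these assumptions the rate at N=4000 works out to a little under
2M iterations per second.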
