On Sun, Nov 10, 2013 at 12:47 PM, Jed Brown <[email protected]> wrote:

> Matthew Knepley <[email protected]> writes:
> > Okay, here is the clearly quadratic performance, and it's in next. Build
> > SNES ex12 and run using
> >
> >   /PETSc3/petsc/petsc-dev/arch-c-opencl-opt-next/lib/ex12-obj/ex12
> >     -run_type perf -refinement_limit 0.00000625
> >     -variable_coefficient field -petscspace_order 1
> >     -mat_petscspace_order 0 -show_initial 0 -show_solution 0
> >     -petscfe_type basic -log_summary -interpolate
> >
> > -refinement_limit 0.000625      DMPlexInterpolate 4 1.0 1.7443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 12 0 0 0 24 12 0 0 0 24 0
> > -refinement_limit 0.0003125     DMPlexInterpolate 4 1.0 3.3111e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 13 0 0 0 24 13 0 0 0 24 0
> > -refinement_limit 0.0000625     DMPlexInterpolate 4 1.0 2.3465e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 21 0 0 0 24 21 0 0 0 24 0
> > -refinement_limit 0.00003125    DMPlexInterpolate 4 1.0 7.7508e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 30 0 0 0 24 30 0 0 0 24 0
> > -refinement_limit 0.000015625   DMPlexInterpolate 4 1.0 2.7267e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 42 0 0 0 24 42 0 0 0 24 0
> > -refinement_limit 0.0000078125  DMPlexInterpolate 4 1.0 1.0175e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 58 0 0 0 24 58 0 0 0 24 0
> > -refinement_limit 0.00000625    DMPlexInterpolate 4 1.0 3.8912e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 72 0 0 0 24 72 0 0 0 24 0
>
> Testing in optimized mode, I have
>
> -refinement_limit 0.00001
> DMPlexInterpolate 4 1.0 2.1803e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 56  0 0 0 24 56  0 0 0 24 0
> DMPlexStratify    9 1.0 2.0112e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 52  0 0 0 24 52  0 0 0 24 0
> DMPlexPreallocate 1 1.0 5.3650e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00 14  0 0 0 12 14  0 0 0 12 0
> DMPlexResidualFEM 1 1.0 5.6863e-01 1.0 3.54e+06 1.0 0.0e+00 0.0e+00 0.0e+00 15 54 0 0  0 15 54 0 0  0 6
>
> -refinement_limit 0.000005
> DMPlexInterpolate 4 1.0 9.0466e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 72  0 0 0 24 72  0 0 0 24 0
> DMPlexStratify    9 1.0 8.6705e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 69  0 0 0 24 69  0 0 0 24 0
> DMPlexPreallocate 1 1.0 1.0999e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  9  0 0 0 12  9  0 0 0 12 0
> DMPlexResidualFEM 1 1.0 1.1739e+00 1.0 7.08e+06 1.0 0.0e+00 0.0e+00 0.0e+00  9 54 0 0  0  9 54 0 0  0 6
>
> -refinement_limit 0.0000025
> DMPlexInterpolate 4 1.0 3.9527e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 85  0 0 0 24 85  0 0 0 24 0
> DMPlexStratify    9 1.0 3.8794e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 83  0 0 0 24 83  0 0 0 24 0
> DMPlexPreallocate 1 1.0 2.1816e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  5  0 0 0 12  5  0 0 0 12 0
> DMPlexResidualFEM 1 1.0 2.4221e+00 1.0 1.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00  5 54 0 0  0  5 54 0 0  0 6
>
> These are coming from DMLabelSetValue(), which is O(N) when the point
> does not yet exist. To insert/modify these entries with storage bounded
> by the number of points rather than the number of times you mutate them,
> either use a hash (perhaps compressing to sorted array later) or a tree.
> (The guy that wrote khash also wrote kbtree, which has the same
> interface, though it tends to use more memory and be slower.)

I rewrote insertion for DMLabel to use the hash table, and convert to flat
arrays after insertion.
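For anyone following along, here is a minimal sketch of that insert-then-flatten pattern. It is not the DMLabel code: PointSet, set_init, set_insert, and set_to_sorted_array are hypothetical names, the table is a hand-rolled open-addressing set rather than khash, and it assumes points are non-negative ints and the initial capacity is a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical illustration only; none of these names exist in PETSc. */
#define EMPTY (-1) /* free-slot marker, so points must be >= 0 */

typedef struct {
  int   *slots; /* open-addressing table with linear probing */
  size_t cap;   /* number of slots, always a power of two */
  size_t count; /* number of distinct points stored */
} PointSet;

static size_t hash_point(int point, size_t cap) {
  uint32_t x = (uint32_t)point;
  x ^= x >> 16;
  x *= 0x45d9f3bu;
  x ^= x >> 16;
  return (size_t)x & (cap - 1);
}

static void set_init(PointSet *s, size_t cap) {
  s->slots = malloc(cap * sizeof(int));
  assert(s->slots != NULL);
  for (size_t i = 0; i < cap; ++i) s->slots[i] = EMPTY;
  s->cap = cap;
  s->count = 0;
}

static void set_insert(PointSet *s, int point);

/* Double the table and rehash every stored point. */
static void set_grow(PointSet *s) {
  PointSet bigger;
  set_init(&bigger, s->cap * 2);
  for (size_t i = 0; i < s->cap; ++i)
    if (s->slots[i] != EMPTY) set_insert(&bigger, s->slots[i]);
  free(s->slots);
  *s = bigger;
}

/* Expected O(1) per insert: no shifting of existing entries. */
static void set_insert(PointSet *s, int point) {
  if (2 * (s->count + 1) > s->cap) set_grow(s); /* keep load <= 1/2 */
  size_t i = hash_point(point, s->cap);
  while (s->slots[i] != EMPTY) {
    if (s->slots[i] == point) return; /* already present */
    i = (i + 1) & (s->cap - 1);
  }
  s->slots[i] = point;
  s->count++;
}

static int cmp_int(const void *a, const void *b) {
  int x = *(const int *)a, y = *(const int *)b;
  return (x > y) - (x < y);
}

/* Flatten once after all insertions; the caller frees the result. */
static int *set_to_sorted_array(const PointSet *s, size_t *n) {
  int *out = malloc((s->count ? s->count : 1) * sizeof(int));
  assert(out != NULL);
  size_t k = 0;
  for (size_t i = 0; i < s->cap; ++i)
    if (s->slots[i] != EMPTY) out[k++] = s->slots[i];
  qsort(out, k, sizeof(int), cmp_int);
  *n = k;
  return out;
}
```

The point of the split is that each insert avoids the memmove a sorted array needs when a new point lands in the middle, and the sort is paid once at the end, so N insertions cost roughly O(N log N) in total instead of O(N^2).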
Now it's all linear:

-refinement_limit 0.000625      DMPlexInterpolate 4 1.0 1.1356e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  9 0 0 0 24  9 0 0 0 24 0
-refinement_limit 0.0003125     DMPlexInterpolate 4 1.0 2.2995e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  9 0 0 0 24  9 0 0 0 24 0
-refinement_limit 0.0000625     DMPlexInterpolate 4 1.0 9.0071e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  9 0 0 0 24  9 0 0 0 24 0
-refinement_limit 0.00003125    DMPlexInterpolate 4 1.0 1.7823e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  9 0 0 0 24  9 0 0 0 24 0
-refinement_limit 0.000015625   DMPlexInterpolate 4 1.0 3.7626e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01  9 0 0 0 24  9 0 0 0 24 0
-refinement_limit 0.0000078125  DMPlexInterpolate 4 1.0 8.7979e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 11 0 0 0 24 11 0 0 0 24 0
-refinement_limit 0.00000625    DMPlexInterpolate 4 1.0 1.7574e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.2e+01 10 0 0 0 24 10 0 0 0 24 0

Time (sec):           1.275e-01      1.00000   1.275e-01
Time (sec):           2.476e-01      1.00000   2.476e-01
Time (sec):           9.859e-01      1.00000   9.859e-01
Time (sec):           1.993e+00      1.00000   1.993e+00
Time (sec):           4.032e+00      1.00000   4.032e+00
Time (sec):           8.257e+00      1.00000   8.257e+00
Time (sec):           1.674e+01      1.00000   1.674e+01

This should help Karl running his GPU test now :)

   Matt

> https://github.com/attractivechaos/klib/blob/master/kbtree.h
>
> If you run in a debugger, you'll find that your stack is pretty much
> always here:
>
> (gdb) bt
> #0  0x00007ffff51314be in __memmove_ssse3_back () from /usr/lib/libc.so.6
> #1  0x00007ffff751bca4 in PetscMemmove (a=0x26a0030, b=0x26a002c, n=979520)
>     at src/sys/utils/memc.c:94
> #2  0x00007ffff78f51b6 in DMLabelSetValue (label=0x4529b0, point=404548, value=<optimized out>)
>     at src/dm/impls/plex/plexlabel.c:279
> #3  0x00007ffff78ad27f in DMPlexStratify (dm=0x429350)
>     at src/dm/impls/plex/plex.c:1440
> #4  0x00007ffff78e8b63 in DMPlexInterpolateFaces_Internal (cellDepth=<error reading variable: Cannot access memory at address 0x2>, dm=<optimized out>, idm=<optimized out>)
>     at src/dm/impls/plex/plexinterpolate.c:290
> #5  DMPlexInterpolate (dm=0x428370, dmInt=0x7fffffffbda0)
>     at src/dm/impls/plex/plexinterpolate.c:331
> #6  0x00007ffff78a629a in DMPlexCreateFromCellList (comm=<optimized out>, dim=<optimized out>, numCells=262144, numVertices=131585, numCorners=<optimized out>, interpolate=<optimized out>, cells=<optimized out>, spaceDim=40501292, vertexCoords=0x26ac3a4, dm=0x7fffffffc520)
>     at src/dm/impls/plex/plexcreate.c:945
> #7  0x00007ffff78ba14d in DMPlexRefine_Triangle (dm=<optimized out>, maxVolumes=0x42e9f0, dmRefined=0x7fffffffc520)
>     at src/dm/impls/plex/plex.c:3639
> #8  0x00007ffff78baae0 in DMRefine_Plex (dm=0x42cb40, comm=40501292, dmRefined=0x7fffffffc520)
>     at src/dm/impls/plex/plex.c:4313
> #9  0x00007ffff791d9cc in DMRefine (dm=0x42cb40, comm=40501292, dmf=0x7fffffffc520)
>     at src/dm/interface/dm.c:1498
> #10 0x0000000000404c64 in CreateMesh (comm=1140850688, user=0x7fffffffc608, dm=0x7fffffffd740)
>     at /home/jed/petsc/src/snes/examples/tutorials/ex12.c:311
> #11 0x0000000000406e60 in main (argc=21, argv=0x7fffffffd858)
>     at /home/jed/petsc/src/snes/examples/tutorials/ex12.c:648
>
> DMPlexStratify also calls ISGetIndices, but forgets ISRestoreIndices.

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
