On Tue, Apr 28, 2020 at 7:03 AM Stefano Zampini <[email protected]> wrote:
> I think the slowdown in the second test (interpolate 0) is coming from the
> fact that Plex has to compute the dual graph on the fly, see here
> https://gitlab.com/petsc/petsc/-/blob/master/src/dm/impls/plex/plexpartition.c#L469

You are right.

   Matt

> Are you having a performance regression with DMPLEX on this second case,
> or is it just the first time you noticed it?
>
> On Tue, Apr 28, 2020 at 1:57 PM Matthew Knepley <[email protected]> wrote:
>
>> On Tue, Apr 28, 2020 at 5:19 AM Pierre Jolivet <[email protected]> wrote:
>>
>>> On 14 Apr 2020, at 2:36 PM, Matthew Knepley <[email protected]> wrote:
>>>
>>> On Tue, Apr 14, 2020 at 6:36 AM Pierre Jolivet <[email protected]> wrote:
>>>
>>>> Hello,
>>>> I'd like to call DMPlexInterpolate after DMPlexDistribute, and not the
>>>> other way around, for performance reasons (please stop me here if the two
>>>> are equivalent).
>>>> When there is no overlap in DMPlexDistribute, it goes through fine.
>>>> If there is overlap, I run into an error.
>>>> Is this the expected behavior?
>>>
>>> Sorry for taking so long to get back to this.
>>> I thought I had everything set up, but when looking at the performance,
>>> I was quite surprised.
>>> I went back and looked at src/dm/impls/plex/tutorials/ex2.c, just adding
>>> the DMPlexDistribute step. I guess this is equivalent to your steps #1 and #2.
>>>
>>> diff --git a/src/dm/impls/plex/tutorials/ex2.c b/src/dm/impls/plex/tutorials/ex2.c
>>> index a069d922b2..5e5c4fb584 100644
>>> --- a/src/dm/impls/plex/tutorials/ex2.c
>>> +++ b/src/dm/impls/plex/tutorials/ex2.c
>>> @@ -65,2 +65,5 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>>>      ierr = DMPlexCreateFromFile(comm, user->filename, user->interpolate, dm);CHKERRQ(ierr);
>>> +    DM dmParallel;
>>> +    ierr = DMPlexDistribute(*dm, 0, NULL, &dmParallel);CHKERRQ(ierr);
>>> +    ierr = DMDestroy(&dmParallel);CHKERRQ(ierr);
>>>    }
>>>
>>> With a cube of 741,000 nodes and 4,574,068 elements:
>>> $ cat untitled.geo && ~/gmsh-4.5.4-Linux64/bin/gmsh untitled.geo -bin -3
>>> //+
>>> SetFactory("OpenCASCADE");
>>> Box(1) = {-0.5, -0.5, 0, 1, 1, 1};
>>> Characteristic Length {:} = 0.01;
>>>
>>> I get the following timings (optimized build, scalar-type=real,
>>> 64-bit-indices=false, SKL cluster):
>>> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate true
>>> DMPlexInterp        1 1.0 1.0444e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00 73  0  0  0  5  73  0  0  0  6     0
>>> DMPlexDistribute    1 1.0 2.8793e+01 1.0 0.00e+00 0.0 3.6e+03 6.2e+05 3.3e+01 21  0100100 60  21  0100100 69     0
>>> $ mpirun -n 120 ./ex2 -filename untitled.msh -log_view -interpolate false
>>> DMPlexDistribute    1 1.0 7.0265e+02 1.0 0.00e+00 0.0 3.1e+03 2.0e+05 2.6e+01 99  0100100 58  99  0100100 68     0
>>>
>>> Do you think I messed up something else?
>>> Our legacy implementation, on the same mesh, takes around 20 seconds
>>> (accounting for the ParMETIS call) to partition and distribute the mesh and
>>> generate the underlying structures for our FEM kernels (k-d tree,
>>> distributed numbering, and so on). Of course, DMPlex is much more versatile
>>> and generic, so I'd understand if it were a little slower, but right now it
>>> is quite a lot slower.
>>> Are there any options I could play with to speed things up?
>>
>> Yes, something is messed up.
>> Vaclav's latest paper has almost the exact setup of your cube benchmark
>> (https://arxiv.org/pdf/2004.08729.pdf), and you can see that the
>> interpolation time is an order of magnitude smaller for 128 procs and also
>> scales linearly. What you show above in ex2 looks serial, in that the mesh
>> is loaded on 1 proc and then interpolation is also done on 1 proc, which
>> corresponds to the time he reports as "Serial Startup". I cannot explain the
>> poor performance of your second run; however, the interpolation time in the
>> first can be greatly reduced by interpolating in parallel. Thus, the right
>> answer is to load in parallel, interpolate, and redistribute. That is what
>> we do in that paper.
>>
>> Short answer: if you want scalable load and interpolation, I think you
>> should be using the parallel machinery that Vaclav just put in. I think that
>> means first converting your mesh to the HDF5 format, using
>> DMPlexCreateFromFile and then DMView to HDF5. Then you can load it in
>> parallel. Vaclav, has everything been pushed for this?
>>
>>   Thanks,
>>
>>      Matt
>>
>>> Thanks in advance for your help,
>>> Pierre
>>>
>>> Yes. What we want you to do is:
>>>
>>> 1) Load/generate mesh
>>>
>>> 2) Distribute (this can be done at the same time as load with parallel load)
>>>
>>> 3) Interpolate (this is also an option from parallel load)
>>>
>>> 4) If necessary, redistribute for load balance
>>>
>>> 5) Construct overlap
>>>
>>> When you pass '1' below to DMPlexDistribute(), it distributes as normal
>>> and then calls
>>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/DMPLEX/DMPlexDistributeOverlap.html
>>> at the end. So you just postpone calling that until you have interpolated.
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>>> Here is an MWE.
>>>> $ patch -p1 < patch.txt
>>>> $ cd src/dm/impls/plex/tests/
>>>> $ make ex18
>>>> $ mpirun -n 2 ./ex18 -distribute -interpolate after_distribute
>>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>> [0]PETSC ERROR: Petsc has generated inconsistent data
>>>> [0]PETSC ERROR: Point SF contains 1 which is a cell
>>>>
>>>> Thanks,
>>>> Pierre
>>>>
>>>> diff --git a/src/dm/impls/plex/tests/ex18.c b/src/dm/impls/plex/tests/ex18.c
>>>> index 07421b3522..dd62be58e5 100644
>>>> --- a/src/dm/impls/plex/tests/ex18.c
>>>> +++ b/src/dm/impls/plex/tests/ex18.c
>>>> @@ -806 +806 @@ static PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *user, DM *dm)
>>>> -  ierr = DMPlexDistribute(*dm, 0, NULL, &pdm);CHKERRQ(ierr);
>>>> +  ierr = DMPlexDistribute(*dm, 1, NULL, &pdm);CHKERRQ(ierr);
>
> --
> Stefano

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
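[Editor's note] For readers following the postponed-overlap advice in Matt's 14 Apr reply quoted above, here is a minimal sketch of that workflow: distribute with overlap 0, interpolate the distributed mesh, and only then call DMPlexDistributeOverlap(). The mesh file name and the overlap width of 1 are placeholders, not values from the thread, and error handling is kept to plain CHKERRQ.

#include <petscdmplex.h>

int main(int argc, char **argv)
{
  DM             dm, dmDist, dmInt, dmOverlap;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  /* Load a non-interpolated mesh on rank 0 ("untitled.msh" is a placeholder) */
  ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "untitled.msh", PETSC_FALSE, &dm);CHKERRQ(ierr);
  /* Distribute with overlap = 0, i.e. postpone overlap construction */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;}
  /* Interpolate the already-distributed mesh */
  ierr = DMPlexInterpolate(dm, &dmInt);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  dm   = dmInt;
  /* Now build the one-cell overlap that DMPlexDistribute(dm, 1, ...) would have built */
  ierr = DMPlexDistributeOverlap(dm, 1, NULL, &dmOverlap);CHKERRQ(ierr);
  if (dmOverlap) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmOverlap;}
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}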
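[Editor's note] Likewise, a sketch of the two-stage startup Matt recommends in his 28 Apr reply: convert the Gmsh mesh once to PETSc's HDF5 format, then load it in parallel, interpolate, and redistribute. This assumes a PETSc build configured with HDF5 and recent enough to include the parallel DMPlex load described in the cited paper; file names are placeholders, and the exact viewer format options needed for parallel reading may differ from this default-format sketch.

#include <petscdmplex.h>
#include <petscviewerhdf5.h>

int main(int argc, char **argv)
{
  DM             dm, dmInt, dmDist;
  PetscViewer    viewer;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* Stage 1 (in practice done once, offline): read the Gmsh file and write it
     back out in HDF5 ("untitled.msh" and "mesh.h5" are placeholder names) */
  ierr = DMPlexCreateFromFile(PETSC_COMM_WORLD, "untitled.msh", PETSC_FALSE, &dm);CHKERRQ(ierr);
  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "mesh.h5", FILE_MODE_WRITE, &viewer);CHKERRQ(ierr);
  ierr = DMView(dm, viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = DMDestroy(&dm);CHKERRQ(ierr);

  /* Stage 2 (every run): load the HDF5 mesh, interpolate, redistribute */
  ierr = DMCreate(PETSC_COMM_WORLD, &dm);CHKERRQ(ierr);
  ierr = DMSetType(dm, DMPLEX);CHKERRQ(ierr);
  ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD, "mesh.h5", FILE_MODE_READ, &viewer);CHKERRQ(ierr);
  ierr = DMLoad(dm, viewer);CHKERRQ(ierr);                     /* parallel read, when supported */
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
  ierr = DMPlexInterpolate(dm, &dmInt);CHKERRQ(ierr);          /* interpolation now runs in parallel */
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  dm   = dmInt;
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr); /* rebalance the naive on-disk partition */
  if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;}

  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Whether DMLoad actually reads the topology in parallel depends on the HDF5 format used when writing; the paper cited above (https://arxiv.org/pdf/2004.08729.pdf) describes the intended on-disk layout and the resulting "Parallel Startup" timings.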
