Excellent, we'll still need to fix the parallel coloring but I'm glad we can put that off :-)

Barry

> On Aug 18, 2019, at 1:19 PM, Mark Lohry <[email protected]> wrote:
>
> Barry, thanks for your suggestion to do the serial coloring on the mesh
> itself / block size 1 case first, and then manually color the blocks. Works
> like a charm. The 2 million cell case is small enough to create the sparse
> system on one process and color it in about a second.
>
> On Sun, Aug 11, 2019 at 9:41 PM Mark Lohry <[email protected]> wrote:
> So the parallel JP runs just as proportionally slow in serial as it does in
> parallel.
>
> valgrind --tool=callgrind shows essentially 100% of the runtime in
> jp.c:255-262, within the larger loop commented
> /* pass two -- color it by looking at nearby vertices and building a mask */
>
>   for (j=0;j<ncols;j++) {
>     if (seen[cols[j]] != cidx) {
>       bidx++;
>       seen[cols[j]] = cidx;
>       idxbuf[bidx] = cols[j];
>       distbuf[bidx] = dist+1;
>     }
>   }
>
> I'll dig into how this algorithm is supposed to work, but anything obvious in
> there? It kinda feels like something is doing something N^2 or worse when it
> doesn't need to be.
>
> On Sun, Aug 11, 2019 at 3:47 PM Mark Lohry <[email protected]> wrote:
> Sorry, forgot to reply to the mailing list.
>
> where does your matrix come from? A mesh? Structured, unstructured, a graph,
> something else? What type of discretization?
>
> Unstructured tetrahedral mesh (CGNS, I can give links to the files if that's
> of interest), the discretization is arbitrary order discontinuous Galerkin
> for compressible Navier-Stokes. 5 coupled equations x 10 nodes per element
> for this 2nd order case to give the 50x50 blocks. Each tet cell depends on
> its neighbors, so for tets 4 extra off-diagonal blocks per cell.
>
> I would expect one could exploit the large block size here in computing the
> coloring -- the underlying mesh is 2M nodes with the same connectivity as a
> standard cell-centered finite volume method.
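The "color the mesh at block size 1, then manually color the blocks" step mentioned above can be sketched as follows. This is only an illustration, not the code actually used in the thread: the function name expand_block_coloring and its arguments are hypothetical, and it assumes the per-cell coloring is already valid for finite differencing (no two cells of the same color share a block row). Under that assumption, giving unknown k of cell c the color cell_color[c]*bs + k keeps the coloring valid at the scalar level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand a per-cell coloring into a per-unknown coloring.
 *
 * cell_color[c]  : color of cell c from a block-size-1 coloring (>= 0)
 * n_cells        : number of cells (block rows/columns)
 * bs             : block size (50 in the case discussed in the thread)
 * unknown_color  : output array of length n_cells*bs; unknown c*bs + k
 *                  receives color cell_color[c]*bs + k
 *
 * Returns the total number of scalar colors, (max cell color + 1) * bs.
 */
int expand_block_coloring(const int *cell_color, size_t n_cells, int bs,
                          int *unknown_color)
{
  int max_cell_color = -1;

  assert(bs > 0);
  for (size_t c = 0; c < n_cells; ++c) {
    assert(cell_color[c] >= 0);
    if (cell_color[c] > max_cell_color) max_cell_color = cell_color[c];
    /* Each of the bs unknowns in a cell gets its own color within the
       cell's color group, so columns inside one block never collide. */
    for (int k = 0; k < bs; ++k) {
      unknown_color[c * (size_t)bs + (size_t)k] = cell_color[c] * bs + k;
    }
  }
  return (max_cell_color + 1) * bs;
}
```

With 50x50 blocks the scalar coloring has 50 times as many colors as the cell coloring. In PETSc the resulting array would presumably be handed to ISColoringCreate after conversion to ISColoringValue; that step is not shown here, and the thread does not say how it was done.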
>
> On Sun, Aug 11, 2019 at 2:12 PM Smith, Barry F. <[email protected]> wrote:
>
> These are due to attempting to copy the entire matrix to one process and do
> the sequential coloring there. Definitely won't work for larger problems,
> we'll need to focus on
>
> 1) having useful parallel coloring and
> 2) maybe using an alternative way to determine the coloring:
>
> where does your matrix come from? A mesh? Structured, unstructured, a
> graph, something else? What type of discretization?
>
> Barry
>
> > On Aug 11, 2019, at 10:21 AM, Mark Lohry <[email protected]> wrote:
> >
> > On the very large case, there does appear to be some kind of overflow
> > ending up with an attempt to allocate too much memory in MatFDColoringCreate,
> > even with --with-64-bit-indices. Full terminal output here:
> > https://raw.githubusercontent.com/mlohry/petsc_miscellany/master/slurm-3451378.out
> >
> > In particular:
> > PETSC ERROR: Memory requested 1036713571771129344
> >
> > Log filename here:
> > https://github.com/mlohry/petsc_miscellany/blob/master/petsclogfile.0
> >
> > On Sun, Aug 11, 2019 at 9:49 AM Mark Lohry <[email protected]> wrote:
> > Hi Barry, I made a minimum example comparing the colorings on a very small
> > case. You'll need to unzip the jacobian_sparsity.tgz to run it.
> >
> > https://github.com/mlohry/petsc_miscellany
> >
> > This is a sparse block system with 50x50 block sizes, ~7,680 blocks.
> > Comparing the coloring types sl, lf, jp, id, greedy, I get these wallclock
> > timings, running with -np 16:
> >
> > SL: 1.5s
> > LF: 1.3s
> > JP: 29s !
> > ID: 1.4s
> > greedy: 2s
> >
> > As far as I'm aware, JP is the only parallel coloring implemented? It is
> > looking as though I'm simply running out of memory with the sequential
> > methods (I should apologize to my cluster admin for chewing up 10TB and
> > crashing...).
> >
> > On this small problem JP is taking 30 seconds wallclock, but that time
> > grows exponentially with larger problems (last I tried it, I killed the job
> > after 24 hours of spinning).
> >
> > Also as I mentioned, the "greedy" method appears to be producing an invalid
> > coloring for me unless I also specify weights "lexical". But
> > "-mat_coloring_test" doesn't complain. I'll have to make a different
> > example to actually show it's an invalid coloring.
> >
> > Thanks,
> > Mark
> >
> > On Sat, Aug 10, 2019 at 4:38 PM Smith, Barry F. <[email protected]> wrote:
> >
> > Mark,
> >
> > Would you be able to cook up an example (or examples) that demonstrate
> > the problem (or problems) and how to run it? If you send it to us and we
> > can reproduce the problem then we'll fix it. If need be you can send large
> > matrices to [email protected]; don't send them to petsc-users since it
> > will reject large files.
> >
> > Barry
> >
> > > On Aug 10, 2019, at 1:56 PM, Mark Lohry <[email protected]> wrote:
> > >
> > > Thanks Barry, been trying all of the above. I think I've homed in on it
> > > to an out-of-memory and/or integer overflow inside MatColoringApply.
> > > Which makes some sense since I only have a sequential coloring algorithm
> > > working...
> > >
> > > Is anyone out there using coloring in parallel? I still have the same
> > > previously mentioned issues with MATCOLORINGJP (on small problems takes
> > > upwards of 30 minutes to run) which as far as I can see is the only
> > > "parallel" implementation. MATCOLORINGSL and MATCOLORINGID both work on
> > > less large problems, MATCOLORINGGREEDY works on less large problems if
> > > and only if I set weight type to MAT_COLORING_WEIGHT_LEXICAL, and all 3
> > > are failing on larger problems.
> > >
> > > On Tue, Aug 6, 2019 at 9:36 AM Smith, Barry F.
<[email protected]> wrote:
> > >
> > > There is also
> > >
> > > $ ./configure --help | grep color
> > >   --with-is-color-value-type=<char,short>
> > >        char, short can store 256, 65536 colors current: short
> > >
> > > I can't imagine you have over 65 k colors but something to check
> > >
> > > > On Aug 6, 2019, at 8:19 AM, Mark Lohry <[email protected]> wrote:
> > > >
> > > > My first guess is that the code is getting integer overflow somewhere.
> > > > 25 billion is well over the 2 billion that 32 bit integers can hold.
> > > >
> > > > Mine as well -- though in later tests I have the same issue when using
> > > > --with-64-bit-indices. Ironically I had removed that flag at some point
> > > > because the coloring / index set was using a serious chunk of total
> > > > memory on medium sized problems.
> > >
> > > Understood
> > >
> > > > Questions on the petsc internals there though: Are matrices indexed
> > > > with two integers (i,j) so the max matrix dimension is (int limit) x
> > > > (int limit) or a single integer so the max dimension is sqrt(int
> > > > limit)?
> > > > Also I was operating under the assumption the 32 bit limit should only
> > > > constrain per-process problem sizes (25B over 400 processes giving 62M
> > > > non-zeros per process), is that not right?
> > >
> > > It is mostly right but may not be right for everything in PETSc. For
> > > example I don't know about the MatFD code
> > >
> > > Since using a debugger is not practical for large core counts to find
> > > the point the two processes diverge you can try
> > >
> > > -log_trace
> > >
> > > or
> > >
> > > -log_trace filename
> > >
> > > in the second case it will generate one file per core called filename.%d
> > > note it will produce a lot of output
> > >
> > > Good luck
> > >
> > > > We are adding more tests to nicely handle integer overflow but it is
> > > > not easy since it can occur in so many places
> > > >
> > > > Totally understood.
I know the pain of only finding an overflow bug
> > > > after days of waiting in a cluster queue for a big job.
> > > >
> > > > We urge you to upgrade.
> > > >
> > > > I'll do that today and hope for the best. On first tests on 3.11.3, I
> > > > still have a couple issues with the coloring code:
> > > >
> > > > * I am still getting the nasty hangs with MATCOLORINGJP mentioned here:
> > > >   https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2017-October/033746.html
> > > > * MatColoringSetType(coloring, MATCOLORINGGREEDY); this produces a
> > > >   wrong jacobian unless I also set MatColoringSetWeightType(coloring,
> > > >   MAT_COLORING_WEIGHT_LEXICAL);
> > > > * MATCOLORINGMIS mentioned in the documentation doesn't seem to exist.
> > > >
> > > > Thanks,
> > > > Mark
> > > >
> > > > On Tue, Aug 6, 2019 at 8:56 AM Smith, Barry F. <[email protected]>
> > > > wrote:
> > > >
> > > > My first guess is that the code is getting integer overflow
> > > > somewhere. 25 billion is well over the 2 billion that 32 bit integers
> > > > can hold.
> > > >
> > > > We urge you to upgrade.
> > > >
> > > > Regardless for problems this large you likely need the ./configure
> > > > option --with-64-bit-indices
> > > >
> > > > We are adding more tests to nicely handle integer overflow but it is
> > > > not easy since it can occur in so many places
> > > >
> > > > Hopefully this will resolve your problem with large process counts
> > > >
> > > > Barry
> > > >
> > > > > On Aug 6, 2019, at 7:43 AM, Mark Lohry via petsc-users
> > > > > <[email protected]> wrote:
> > > > >
> > > > > I'm running some larger cases than I have previously with a working
> > > > > code, and I'm running into failures I don't see on smaller cases.
> > > > > Failures are on 400 cores, ~100M unknowns, 25B non-zero jacobian
> > > > > entries. Runs successfully on half size case on 200 cores.
> > > > >
> > > > > 1) The first error output from petsc is "MPI_Allreduce() called in
> > > > > different locations".
Is this a red herring, suggesting some process
> > > > > failed prior to this and processes have diverged?
> > > > >
> > > > > 2) I don't think I'm running out of memory -- globally at least.
> > > > > Slurm output shows e.g.
> > > > > Memory Utilized: 459.15 GB (estimated maximum)
> > > > > Memory Efficiency: 26.12% of 1.72 TB (175.78 GB/node)
> > > > > I did try with and without --with-64-bit-indices.
> > > > >
> > > > > 3) The debug traces seem to vary, see below. I *think* the failure
> > > > > might be happening in the vicinity of a Coloring call. I'm using
> > > > > MatFDColoring like so:
> > > > >
> > > > > ISColoring iscoloring;
> > > > > MatFDColoring fdcoloring;
> > > > > MatColoring coloring;
> > > > >
> > > > > MatColoringCreate(ctx.JPre, &coloring);
> > > > > MatColoringSetType(coloring, MATCOLORINGGREEDY);
> > > > >
> > > > > // convergence stalls badly without this on small cases, don't know why
> > > > > MatColoringSetWeightType(coloring, MAT_COLORING_WEIGHT_LEXICAL);
> > > > >
> > > > > // none of these worked.
> > > > > // MatColoringSetType(coloring, MATCOLORINGJP);
> > > > > // MatColoringSetType(coloring, MATCOLORINGSL);
> > > > > // MatColoringSetType(coloring, MATCOLORINGID);
> > > > > MatColoringSetFromOptions(coloring);
> > > > >
> > > > > MatColoringApply(coloring, &iscoloring);
> > > > > MatColoringDestroy(&coloring);
> > > > > MatFDColoringCreate(ctx.JPre, iscoloring, &fdcoloring);
> > > > >
> > > > > I have had issues in the past with getting a functional coloring
> > > > > setup for finite difference jacobians, and the above is the only
> > > > > configuration I've managed to get working successfully. Have there
> > > > > been any significant development changes to that area of code since
> > > > > v3.8.3? I'll try upgrading in the meantime and hope for the best.
> > > > >
> > > > > Any ideas?
> > > > > > > > > > > > > > > Thanks, > > > > > Mark > > > > > > > > > > > > > > > ************************************* > > > > > > > > > > mlohry@lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]" slurm-3429773.out > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Petsc has generated inconsistent data > > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations > > > > > (functions) on different processors > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. > > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19 by > > > > > mlohry Tue Aug 6 06:05:02 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun --with-64-bit-indices > > > > > [0]PETSC ERROR: #1 TSSetMaxSteps() line 2944 in > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: #2 TSSetMaxSteps() line 2944 in > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Invalid argument > > > > > [0]PETSC ERROR: Enum value must be same on all processes, argument # 2 > > > > > [0]PETSC ERROR: See > > > > 
> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. > > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19 by > > > > > mlohry Tue Aug 6 06:05:02 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun --with-64-bit-indices > > > > > [0]PETSC ERROR: #3 TSSetExactFinalTime() line 2250 in > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: > > > > > ------------------------------------------------------------------------ > > > > > [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or > > > > > the batch system) has told this process to end > > > > > [0]PETSC ERROR: Try option -start_in_debugger or > > > > > -on_error_attach_debugger > > > > > [0]PETSC ERROR: or see > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > > > > > OS X to find memory corruption errors > > > > > [0]PETSC ERROR: likely location of problem given in stack below > > > > > [0]PETSC ERROR: --------------------- Stack Frames > > > > > ------------------------------------ > > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > > > > > available, > > > > > [0]PETSC ERROR: INSTEAD the line number of the start of the > > > > > function > > > > > [0]PETSC ERROR: is given. 
> > > > > [0]PETSC ERROR: [0] PetscCommDuplicate line 130 > > > > > /home/mlohry/build/external/petsc/src/sys/objects/tagm.c > > > > > [0]PETSC ERROR: [0] PetscHeaderCreate_Private line 34 > > > > > /home/mlohry/build/external/petsc/src/sys/objects/inherit.c > > > > > [0]PETSC ERROR: [0] DMCreate line 36 > > > > > /home/mlohry/build/external/petsc/src/dm/interface/dm.c > > > > > [0]PETSC ERROR: [0] DMShellCreate line 983 > > > > > /home/mlohry/build/external/petsc/src/dm/impls/shell/dmshell.c > > > > > [0]PETSC ERROR: [0] TSGetDM line 5287 > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: [0] TSSetIFunction line 1310 > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: [0] TSSetExactFinalTime line 2248 > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: [0] TSSetMaxSteps line 2942 > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Signal received > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n19 by > > > > > mlohry Tue Aug 6 06:05:02 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun --with-64-bit-indices > > > > > [0]PETSC ERROR: #4 User provided function() line 0 in unknown file > > > > > > > > > > > > > > > ************************************* > > > > > > > > > > > > > > > mlohry@lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]" slurm-3429158.out > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Petsc has generated inconsistent data > > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code > > > > > lines) on different processors > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1 by > > > > > mlohry Mon Aug 5 23:58:19 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun > > > > > [0]PETSC ERROR: #1 MatSetBlockSizes() line 7206 in > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: #2 MatSetBlockSizes() line 7206 in > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: #3 MatSetBlockSize() line 7170 in > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Petsc has generated inconsistent data > > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code > > > > > lines) on different processors > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1 by > > > > > mlohry Mon Aug 5 23:58:19 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun > > > > > [0]PETSC ERROR: #4 VecSetSizes() line 1310 in > > > > > /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c > > > > > [0]PETSC ERROR: #5 VecSetSizes() line 1310 in > > > > > /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c > > > > > [0]PETSC ERROR: #6 VecCreateMPIWithArray() line 609 in > > > > > /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c > > > > > [0]PETSC ERROR: #7 MatSetUpMultiply_MPIAIJ() line 111 in > > > > > /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mmaij.c > > > > > [0]PETSC ERROR: #8 MatAssemblyEnd_MPIAIJ() line 735 in > > > > > /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c > > > > > [0]PETSC ERROR: #9 MatAssemblyEnd() line 5243 in > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: > > > > > ------------------------------------------------------------------------ > > > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, > > > > > probably memory access out of range > > > > > [0]PETSC ERROR: Try option -start_in_debugger or > > > > > -on_error_attach_debugger > > > > > [0]PETSC ERROR: or see > > > > > 
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind > > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac > > > > > OS X to find memory corruption errors > > > > > [0]PETSC ERROR: likely location of problem given in stack below > > > > > [0]PETSC ERROR: --------------------- Stack Frames > > > > > ------------------------------------ > > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not > > > > > available, > > > > > [0]PETSC ERROR: INSTEAD the line number of the start of the > > > > > function > > > > > [0]PETSC ERROR: is given. > > > > > [0]PETSC ERROR: [0] PetscSFSetGraphLayout line 497 > > > > > /home/mlohry/build/external/petsc/src/vec/is/utils/pmap.c > > > > > [0]PETSC ERROR: [0] GreedyColoringLocalDistanceTwo_Private line 208 > > > > > /home/mlohry/build/external/petsc/src/mat/color/impls/greedy/greedy.c > > > > > [0]PETSC ERROR: [0] MatColoringApply_Greedy line 559 > > > > > /home/mlohry/build/external/petsc/src/mat/color/impls/greedy/greedy.c > > > > > [0]PETSC ERROR: [0] MatColoringApply line 357 > > > > > /home/mlohry/build/external/petsc/src/mat/color/interface/matcoloring.c > > > > > [0]PETSC ERROR: [0] VecSetSizes line 1308 > > > > > /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c > > > > > [0]PETSC ERROR: [0] VecCreateMPIWithArray line 605 > > > > > /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c > > > > > [0]PETSC ERROR: [0] MatSetUpMultiply_MPIAIJ line 24 > > > > > /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mmaij.c > > > > > [0]PETSC ERROR: [0] MatAssemblyEnd_MPIAIJ line 698 > > > > > /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c > > > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5234 > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: [0] MatSetBlockSizes line 7204 > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: [0] MatSetBlockSize line 
7167 > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Signal received > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. > > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h21c2n1 by > > > > > mlohry Mon Aug 5 23:58:19 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun > > > > > [0]PETSC ERROR: #10 User provided function() line 0 in unknown file > > > > > > > > > > > > > > > > > > > > ************************* > > > > > > > > > > > > > > > mlohry@lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]" slurm-3429134.out > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Petsc has generated inconsistent data > > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code > > > > > lines) on different processors > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n1 by > > > > > mlohry Mon Aug 5 23:24:23 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun > > > > > [0]PETSC ERROR: #1 PetscSplitOwnership() line 88 in > > > > > /home/mlohry/build/external/petsc/src/sys/utils/psplit.c > > > > > [0]PETSC ERROR: #2 PetscSplitOwnership() line 88 in > > > > > /home/mlohry/build/external/petsc/src/sys/utils/psplit.c > > > > > [0]PETSC ERROR: #3 PetscLayoutSetUp() line 137 in > > > > > /home/mlohry/build/external/petsc/src/vec/is/utils/pmap.c > > > > > [0]PETSC ERROR: #4 VecCreate_MPI_Private() line 489 in > > > > > /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c > > > > > [0]PETSC ERROR: #5 VecCreate_MPI() line 537 in > > > > > /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/pbvec.c > > > > > [0]PETSC ERROR: #6 VecSetType() line 51 in > > > > > /home/mlohry/build/external/petsc/src/vec/vec/interface/vecreg.c > > > > > [0]PETSC ERROR: #7 VecCreateMPI() line 40 in > > > > > /home/mlohry/build/external/petsc/src/vec/vec/impls/mpi/vmpicr.c > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Object is in wrong state > > > > > [0]PETSC ERROR: Vec object's type is not set: Argument # 1 > > > > > [0]PETSC ERROR: See > > > > > 
http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. > > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h20c2n1 by > > > > > mlohry Mon Aug 5 23:24:23 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun > > > > > [0]PETSC ERROR: #8 VecGetLocalSize() line 665 in > > > > > /home/mlohry/build/external/petsc/src/vec/vec/interface/vector.c > > > > > > > > > > > > > > > > > > > > ************************************** > > > > > > > > > > > > > > > > > > > > mlohry@lancer:/ssd/dev_ssd/cmake-build$ grep "\[0\]" slurm-3429102.out > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Petsc has generated inconsistent data > > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code > > > > > lines) on different processors > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16 by > > > > > mlohry Mon Aug 5 22:50:12 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun > > > > > [0]PETSC ERROR: #1 TSSetExactFinalTime() line 2250 in > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: #2 TSSetExactFinalTime() line 2250 in > > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Petsc has generated inconsistent data > > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code > > > > > lines) on different processors > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017 > > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16 by > > > > > mlohry Mon Aug 5 22:50:12 2019 > > > > > [0]PETSC ERROR: Configure options > > > > > PETSC_DIR=/home/mlohry/build/external/petsc > > > > > PETSC_ARCH=arch-linux2-c-opt > > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc > > > > > > > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx > > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes > > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1 > > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS > > > > > --with-mpiexec=/usr/bin/srun > > > > > [0]PETSC ERROR: #3 MatSetBlockSizes() line 7206 in > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: #4 MatSetBlockSizes() line 7206 in > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: #5 MatSetBlockSize() line 7170 in > > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c > > > > > [0]PETSC ERROR: --------------------- Error Message > > > > > -------------------------------------------------------------- > > > > > [0]PETSC ERROR: Petsc has generated inconsistent data > > > > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code > > > > > lines) on different processors > > > > > [0]PETSC ERROR: See > > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble > > > > > shooting. 
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16 by
> > > > > mlohry Mon Aug 5 22:50:12 2019
> > > > > [0]PETSC ERROR: Configure options
> > > > > PETSC_DIR=/home/mlohry/build/external/petsc
> > > > > PETSC_ARCH=arch-linux2-c-opt
> > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
> > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
> > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
> > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
> > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
> > > > > --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #6 MatStashScatterBegin_Ref() line 476 in
> > > > > /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: #7 MatStashScatterBegin_Ref() line 476 in
> > > > > /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: #8 MatStashScatterBegin_Private() line 455 in
> > > > > /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: #9 MatAssemblyBegin_MPIAIJ() line 679 in
> > > > > /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
> > > > > [0]PETSC ERROR: #10 MatAssemblyBegin() line 5154 in
> > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR:
> > > > > ------------------------------------------------------------------------
> > > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> > > > > probably memory access out of range
> > > > > [0]PETSC ERROR: Try option -start_in_debugger or
> > > > > -on_error_attach_debugger
> > > > > [0]PETSC ERROR: or see
> > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > > > > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
> > > > > OS X to find memory corruption errors
> > > > > [0]PETSC ERROR: likely location of problem given in stack below
> > > > > [0]PETSC ERROR: --------------------- Stack Frames
> > > > > ------------------------------------
> > > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> > > > > available,
> > > > > [0]PETSC ERROR: INSTEAD the line number of the start of the
> > > > > function
> > > > > [0]PETSC ERROR: is given.
> > > > > [0]PETSC ERROR: [0] MatStashScatterEnd_Ref line 137
> > > > > /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: [0] MatStashScatterEnd_Private line 126
> > > > > /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd_MPIAIJ line 698
> > > > > /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyEnd line 5234
> > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] MatStashScatterBegin_Ref line 473
> > > > > /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: [0] MatStashScatterBegin_Private line 454
> > > > > /home/mlohry/build/external/petsc/src/mat/utils/matstash.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyBegin_MPIAIJ line 676
> > > > > /home/mlohry/build/external/petsc/src/mat/impls/aij/mpi/mpiaij.c
> > > > > [0]PETSC ERROR: [0] MatAssemblyBegin line 5143
> > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] MatSetBlockSizes line 7204
> > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] MatSetBlockSize line 7167
> > > > > /home/mlohry/build/external/petsc/src/mat/interface/matrix.c
> > > > > [0]PETSC ERROR: [0] TSSetExactFinalTime line 2248
> > > > > /home/mlohry/build/external/petsc/src/ts/interface/ts.c
> > > > > [0]PETSC ERROR: --------------------- Error Message
> > > > > --------------------------------------------------------------
> > > > > [0]PETSC ERROR: Signal received
> > > > > [0]PETSC ERROR: See
> > > > > http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
> > > > > shooting.
> > > > > [0]PETSC ERROR: Petsc Release Version 3.8.3, Dec, 09, 2017
> > > > > [0]PETSC ERROR: maDG on a arch-linux2-c-opt named tiger-h19c1n16 by
> > > > > mlohry Mon Aug 5 22:50:12 2019
> > > > > [0]PETSC ERROR: Configure options
> > > > > PETSC_DIR=/home/mlohry/build/external/petsc
> > > > > PETSC_ARCH=arch-linux2-c-opt
> > > > > --with-cc=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigcc
> > > > > --with-cxx=/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpigxx
> > > > > --with-fc=0 --with-clanguage=C++ --with-pic=1 --with-debugging=yes
> > > > > COPTFLAGS='-O3' CXXOPTFLAGS='-O3' --with-shared-libraries=1
> > > > > --download-parmetis --download-metis MAKEFLAGS=$MAKEFLAGS
> > > > > --with-mpiexec=/usr/bin/srun
> > > > > [0]PETSC ERROR: #11 User provided function() line 0 in unknown file
Re: [petsc-users] Sporadic MPI_Allreduce() called in different locations on larger core counts
Smith, Barry F. via petsc-users Sun, 18 Aug 2019 11:38:50 -0700
