On Sat 11. Jan 2020 at 00:04, Santiago Andres Triana <[email protected]> wrote:
> Hi Barry, petsc-users:
>
> Just updated to petsc-3.12.3 and the performance is about the same as
> 3.12.2, i.e. about 2x the memory use of petsc-3.9.4
>
>
> petsc-3.12.3 (uses superlu_dist-6.2.0)
>
> Summary of Memory Usage in PETSc
> Maximum (over computational time) process memory:        total 2.9368e+10 max 1.2922e+09 min 1.1784e+09
> Current process memory:                                  total 2.8192e+10 max 1.2263e+09 min 1.1456e+09
> Maximum (over computational time) space PetscMalloc()ed: total 2.7619e+09 max 1.4339e+08 min 8.6494e+07
> Current space PetscMalloc()ed:                           total 3.6127e+06 max 1.5053e+05 min 1.5053e+05
>
>
> petsc-3.9.4
>
> Summary of Memory Usage in PETSc
> Maximum (over computational time) process memory:        total 1.5695e+10 max 7.1985e+08 min 6.0131e+08
> Current process memory:                                  total 1.3186e+10 max 6.9240e+08 min 4.2821e+08
> Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09 max 1.5869e+08 min 1.0179e+08
> Current space PetscMalloc()ed:                           total 1.8808e+06 max 7.8368e+04 min 7.8368e+04
>
>
> However, it seems that the culprit is superlu_dist: I recompiled current
> petsc/slepc with superlu_dist-5.4.0 (used option
> --download-superlu_dist=/home/spin/superlu_dist-5.4.0.tar.gz) and the
> result is this:
>
> petsc-3.12.3 with superlu_dist-5.4.0:
>
> Summary of Memory Usage in PETSc
> Maximum (over computational time) process memory:        total 1.5636e+10 max 7.1217e+08 min 5.9963e+08
> Current process memory:                                  total 1.3401e+10 max 6.5498e+08 min 4.2626e+08
> Maximum (over computational time) space PetscMalloc()ed: total 2.7619e+09 max 1.4339e+08 min 8.6494e+07
> Current space PetscMalloc()ed:                           total 3.6127e+06 max 1.5053e+05 min 1.5053e+05
>
> I could not compile petsc-3.12.3 with the exact superlu_dist version that
> petsc-3.9.4 uses (5.3.0), but will try newer versions to see how they
> perform ... I guess I should address this issue to the superlu maintainers?

Yes.

> Thanks!
> Santiago
>
> On Fri, Jan 10, 2020 at 9:19 PM Smith, Barry F.
> <[email protected]> wrote:
>
>>
>>   Can you please try v3.12.3? There was some funky business mistakenly added related to partitioning that has been fixed in 3.12.3
>>
>>   Barry
>>
>>
>> > On Jan 10, 2020, at 1:57 PM, Santiago Andres Triana <[email protected]> wrote:
>> >
>> > Dear all,
>> >
>> > I ran the program with valgrind --tool=massif, the results are cryptic to me ... not sure who's the memory hog! the logs are attached.
>> >
>> > The command I used is:
>> > mpiexec -n 24 valgrind --tool=massif --num-callers=20 --log-file=valgrind.log.%p ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 $opts -eps_target -4.008e-3+1.57142i -eps_target_magnitude -eps_tol 1e-14
>> >
>> > Is there any possibility to install a version of superlu_dist (or mumps) different from what the petsc version automatically downloads?
>> >
>> > Thanks!
>> > Santiago
>> >
>> >
>> > On Thu, Jan 9, 2020 at 10:04 PM Dave May <[email protected]> wrote:
>> > This kind of issue is difficult to untangle because you have potentially three pieces of software which might have changed between v3.9 and v3.12, namely
>> > PETSc, SLEPc and SuperLU_dist.
>> > You need to isolate which software component is responsible for the 2x increase in memory.
>> >
>> > When I look at the memory usage in the log files, things look very very similar for the raw PETSc objects.
>> >
>> > [v3.9]
>> > --- Event Stage 0: Main Stage
>> >
>> > Viewer               4   3       2520  0.
>> > Matrix              15  15  125236536  0.
>> > Vector              22  22   19713856  0.
>> > Index Set           10  10     995280  0.
>> > Vec Scatter          4   4       4928  0.
>> > EPS Solver           1   1       2276  0.
>> > Spectral Transform   1   1        848  0.
>> > Basis Vectors        1   1       2168  0.
>> > PetscRandom          1   1        662  0.
>> > Region               1   1        672  0.
>> > Direct Solver        1   1      17440  0.
>> > Krylov Solver        1   1       1176  0.
>> > Preconditioner       1   1       1000  0.
>> >
>> > versus
>> >
>> > [v3.12]
>> > --- Event Stage 0: Main Stage
>> >
>> > Viewer               4   3       2520  0.
>> > Matrix              15  15  125237144  0.
>> > Vector              22  22   19714528  0.
>> > Index Set           10  10     995096  0.
>> > Vec Scatter          4   4       3168  0.
>> > Star Forest Graph    4   4       3936  0.
>> > EPS Solver           1   1       2292  0.
>> > Spectral Transform   1   1        848  0.
>> > Basis Vectors        1   1       2184  0.
>> > PetscRandom          1   1        662  0.
>> > Region               1   1        672  0.
>> > Direct Solver        1   1      17456  0.
>> > Krylov Solver        1   1       1400  0.
>> > Preconditioner       1   1       1000  0.
>> >
>> > Certainly there is no apparent factor 2x increase in memory usage in the underlying petsc objects themselves.
>> > Furthermore, the counts of creations of petsc objects in toobig.log and justfine.log match, indicating that none of the implementations used in either PETSc or SLEPc have fundamentally changed wrt the usage of the native petsc objects.
>> >
>> > It is also curious that VecNorm is called 3 times in "justfine.log" and 19 times in "toobig.log" - although I don't see how that could be related to your problem...
>> >
>> > The above at least gives me the impression that the issue of memory increase is likely not coming from PETSc.
>> > I just read Barry's useful email which is even more compelling and also indicates SLEPc is not the likely culprit either as it uses PetscMalloc() internally.
>> >
>> > Some options to identify the problem:
>> >
>> > 1/ Eliminate SLEPc as a possible culprit by not calling EPSSolve() and rather just call KSPSolve() with some RHS vector.
>> > * If you still see a 2x increase, switch the preconditioner to using -pc_type bjacobi -ksp_max_it 10 rather than superlu_dist.
>> > If the memory usage is good, you can be pretty certain the issue arises internally to superlu_dist.
>> >
>> > 2/ Leave your code as is and perform your profiling using mumps rather than superlu_dist.
>> > This is a less reliable test than 1/ since the mumps implementation used with v3.9 and v3.12 may differ...
>> >
>> > Thanks
>> > Dave
>> >
>> > On Thu, 9 Jan 2020 at 20:17, Santiago Andres Triana <[email protected]> wrote:
>> > Dear all,
>> >
>> > I think parmetis is not involved since I still run out of memory if I use the following options:
>> > export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_type superlu_dist -eps_true_residual 1'
>> > and issuing:
>> > mpiexec -n 24 ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target -4.008e-3+1.57142i $opts -eps_target_magnitude -eps_tol 1e-14 -memory_view
>> >
>> > Bottom line is that the memory usage of petsc-3.9.4 / slepc-3.9.2 is much lower than current version. I can only solve relatively small problems using the 3.12 series :(
>> > I have an example with smaller matrices that will likely fail in a 32 Gb ram machine with petsc-3.12 but runs just fine with petsc-3.9. The -memory_view output is
>> >
>> > with petsc-3.9.4: (log 'justfine.log' attached)
>> >
>> > Summary of Memory Usage in PETSc
>> > Maximum (over computational time) process memory:        total 1.6665e+10 max 7.5674e+08 min 6.4215e+08
>> > Current process memory:                                  total 1.5841e+10 max 7.2881e+08 min 6.0905e+08
>> > Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09 max 1.5868e+08 min 1.0179e+08
>> > Current space PetscMalloc()ed:                           total 1.8808e+06 max 7.8368e+04 min 7.8368e+04
>> >
>> >
>> > with petsc-3.12.2: (log 'toobig.log' attached)
>> >
>> > Summary of Memory Usage in PETSc
>> > Maximum (over computational time) process memory:        total 3.1564e+10 max 1.3662e+09 min 1.2604e+09
>> > Current process memory:                                  total 3.0355e+10 max 1.3082e+09 min 1.2254e+09
>> > Maximum (over computational time) space PetscMalloc()ed: total 2.7618e+09 max 1.4339e+08 min 8.6493e+07
>> > Current space PetscMalloc()ed:                           total 3.6127e+06 max 1.5053e+05 min 1.5053e+05
>> >
>> > Strangely, monitoring with 'top' I can see *appreciably higher* peak memory use, usually twice what
>> > -memory_view ends up reporting, both for petsc-3.9.4 and current. The program usually fails at this peak if not enough ram is available.
>> >
>> > The matrices for the example quoted above can be downloaded here (I use slepc's tutorial ex7.c to solve the problem):
>> > https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0 (about 600 Mb)
>> > https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0 (about 210 Mb)
>> >
>> > I haven't been able to use a debugger successfully since I am using a compute node without the possibility of an xterm ... note that I have no experience using a debugger so any help on that will also be appreciated!
>> > Hope I can switch to the current petsc/slepc version for my production runs soon...
>> >
>> > Thanks again!
>> > Santiago
>> >
>> >
>> >
>> > On Thu, Jan 9, 2020 at 4:25 PM Stefano Zampini <[email protected]> wrote:
>> > Can you reproduce the issue with smaller matrices? Or with a debug build (i.e. using --with-debugging=1 and compilation flags -O2 -g)?
>> >
>> > The only changes in parmetis between the two PETSc releases are these below, but I don’t see how they could cause issues
>> >
>> > kl-18448:pkg-parmetis szampini$ git log -2
>> > commit ab4fedc6db1f2e3b506be136e3710fcf89ce16ea (HEAD -> master, tag: v4.0.3-p5, origin/master, origin/dalcinl/random, origin/HEAD)
>> > Author: Lisandro Dalcin <[email protected]>
>> > Date:   Thu May 9 18:44:10 2019 +0300
>> >
>> >     GKLib: Make FPRFX##randInRange() portable for 32bit/64bit indices
>> >
>> > commit 2b4afc79a79ef063f369c43da2617fdb64746dd7
>> > Author: Lisandro Dalcin <[email protected]>
>> > Date:   Sat May 4 17:22:19 2019 +0300
>> >
>> >     GKlib: Use gk_randint32() to define the RandomInRange() macro
>> >
>> >
>> >
>> >> On Jan 9, 2020, at 4:31 AM, Smith, Barry F.
>> >> via petsc-users <[email protected]> wrote:
>> >>
>> >>
>> >>   This is extremely worrisome:
>> >>
>> >> ==23361== Use of uninitialised value of size 8
>> >> ==23361==    at 0x847E939: gk_randint64 (random.c:99)
>> >> ==23361==    by 0x847EF88: gk_randint32 (random.c:128)
>> >> ==23361==    by 0x81EBF0B: libparmetis__Match_Global (in /space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib/libparmetis.so)
>> >>
>> >> do you get that with PETSc-3.9.4 or only with 3.12.3?
>> >>
>> >>   This may result in Parmetis using non-random numbers and then giving back an inappropriate ordering that requires more memory for SuperLU_DIST.
>> >>
>> >>   Suggest looking at the code, or running in the debugger to see what is going on there. We use parmetis all the time and don't see this.
>> >>
>> >>   Barry
>> >>
>> >>
>> >>
>> >>> On Jan 8, 2020, at 4:34 PM, Santiago Andres Triana <[email protected]> wrote:
>> >>>
>> >>> Dear Matt, petsc-users:
>> >>>
>> >>> Finally back after the holidays to try to solve this issue, thanks for your patience!
>> >>> I compiled the latest petsc (3.12.3) with debugging enabled, the same problem appears: relatively large matrices result in out of memory errors. This is not the case for petsc-3.9.4, all fine there.
>> >>> This is a non-hermitian, generalized eigenvalue problem, I generate the A and B matrices myself and then I use example 7 (from the slepc tutorial at $SLEPC_DIR/src/eps/examples/tutorials/ex7.c ) to solve the problem:
>> >>>
>> >>> mpiexec -n 24 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target -2.5e-4+1.56524i -eps_target_magnitude -eps_tol 1e-14 $opts
>> >>>
>> >>> where the $opts variable is:
>> >>> export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu -eps_error_relative ::ascii_info_detail -st_pc_factor_mat_solver_type superlu_dist -mat_superlu_dist_iterrefine 1 -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1 -eps_converged_reason -eps_conv_rel -eps_monitor_conv -eps_true_residual 1'
>> >>>
>> >>> the output from valgrind (sample from one processor) and from the program are attached.
>> >>> If it's of any use the matrices are here (might need at least 180 Gb of ram to solve the problem successfully under petsc-3.9.4):
>> >>>
>> >>> https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0
>> >>> https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0
>> >>>
>> >>> With petsc-3.9.4 and slepc-3.9.2 I can use matrices up to 10Gb (with 240 Gb ram), but only up to 3Gb with the latest petsc/slepc.
>> >>> Any suggestions, comments or any other help are very much appreciated!
>> >>>
>> >>> Cheers,
>> >>> Santiago
>> >>>
>> >>>
>> >>>
>> >>> On Mon, Dec 23, 2019 at 11:19 PM Matthew Knepley <[email protected]> wrote:
>> >>> On Mon, Dec 23, 2019 at 3:14 PM Santiago Andres Triana <[email protected]> wrote:
>> >>> Dear all,
>> >>>
>> >>> After upgrading to petsc 3.12.2 my solver program crashes consistently. Before the upgrade I was using petsc 3.9.4 with no problems.
>> >>>
>> >>> My application deals with a complex-valued, generalized eigenvalue problem.
>> >>> The matrices involved are relatively large, typically 2 to 10 Gb in size, which is no problem for petsc 3.9.4.
>> >>>
>> >>> Are you sure that your indices do not exceed 4B? If they do, you need to configure using
>> >>>
>> >>>   --with-64-bit-indices
>> >>>
>> >>> Also, it would be nice if you ran with the debugger so we can get a stack trace for the SEGV.
>> >>>
>> >>>   Thanks,
>> >>>
>> >>>     Matt
>> >>>
>> >>> However, after the upgrade I can only obtain solutions when the matrices are small, the solver crashes when the matrices' size exceeds about 1.5 Gb:
>> >>>
>> >>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> >>> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
>> >>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> >>> [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> >>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
>> >>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
>> >>> [0]PETSC ERROR: to get more information on the crash.
>> >>>
>> >>> and so on for each cpu.
>> >>>
>> >>>
>> >>> I tried using valgrind and this is the typical output:
>> >>>
>> >>> ==2874== Conditional jump or move depends on uninitialised value(s)
>> >>> ==2874==    at 0x4018178: index (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x400752D: expand_dynamic_string_token (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4008009: _dl_map_object (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x40013E4: map_doit (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x400EA53: _dl_catch_error (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4000ABE: do_preload (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4000EC0: handle_ld_preload (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x40034F0: dl_main (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4016274: _dl_sysdep_start (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x4004A99: _dl_start (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x40011F7: ??? (in /lib64/ld-2.22.so)
>> >>> ==2874==    by 0x12: ???
>> >>> ==2874==
>> >>>
>> >>>
>> >>> These are my configuration options. Identical for both petsc 3.9.4 and 3.12.2:
>> >>>
>> >>> ./configure --with-scalar-type=complex --download-mumps --download-parmetis --download-metis --download-scalapack=1 --download-fblaslapack=1 --with-debugging=0 --download-superlu_dist=1 --download-ptscotch=1 CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3 -march=native' COPTFLAGS='-O3 -march=native'
>> >>>
>> >>>
>> >>> Thanks in advance for any comments or ideas!
>> >>>
>> >>> Cheers,
>> >>> Santiago
>> >>>
>> >>>
>> >>> --
>> >>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> >>> -- Norbert Wiener
>> >>>
>> >>> https://www.cse.buffalo.edu/~knepley/
>> >>> <test1.e6034496><valgrind.log.23361>
>> >>
>> >
>> > <massif.out.petsc-3.9><massif.out.petsc-3.12>
>>
>>
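
For anyone who wants to compare -memory_view summaries like the ones quoted in this thread without eyeballing them, here is a rough Python sketch. It assumes each summary line has the "label: total X max Y min Z" layout shown in the quoted output; the names parse_memory_view and ratio are made up for this example and are not part of PETSc or SLEPc. The numbers are copied from the petsc-3.9.4 and petsc-3.12.3 (superlu_dist-6.2.0) summaries in the top message.

```python
import re

# One summary line of -memory_view: "<label>: total X max Y min Z"
# (layout assumed from the output quoted in this thread).
LINE_RE = re.compile(
    r"^(?P<label>.+?):\s+total\s+(?P<total>\S+)"
    r"\s+max\s+(?P<max>\S+)\s+min\s+(?P<min>\S+)$"
)

PEAK_PROCESS = "Maximum (over computational time) process memory"
PEAK_MALLOC = "Maximum (over computational time) space PetscMalloc()ed"


def parse_memory_view(text):
    """Map each summary label to its total/max/min values (bytes)."""
    summary = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            summary[match.group("label")] = {
                key: float(match.group(key)) for key in ("total", "max", "min")
            }
    return summary


def ratio(new, old, label):
    """Ratio new/old of the 'total' value for one summary label."""
    return new[label]["total"] / old[label]["total"]


# Values copied from the summaries quoted in this thread.
PETSC_3_9_4 = """\
Maximum (over computational time) process memory: total 1.5695e+10 max 7.1985e+08 min 6.0131e+08
Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09 max 1.5869e+08 min 1.0179e+08
"""
PETSC_3_12_3 = """\
Maximum (over computational time) process memory: total 2.9368e+10 max 1.2922e+09 min 1.1784e+09
Maximum (over computational time) space PetscMalloc()ed: total 2.7619e+09 max 1.4339e+08 min 8.6494e+07
"""

old = parse_memory_view(PETSC_3_9_4)
new = parse_memory_view(PETSC_3_12_3)
print(f"peak process memory ratio:  {ratio(new, old, PEAK_PROCESS):.2f}")
print(f"peak PetscMalloc()ed ratio: {ratio(new, old, PEAK_MALLOC):.2f}")
```

With these inputs the process-memory peak comes out at a bit under 2x while the PetscMalloc()ed peak does not grow at all, which is consistent with the extra memory being allocated outside PetscMalloc(), e.g. inside an external package such as superlu_dist.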
