Dear Matt, petsc-users:

Finally back after the holidays to try to solve this issue; thanks for your patience! I compiled the latest petsc (3.12.3) with debugging enabled, and the same problem appears: relatively large matrices result in out-of-memory errors. This is not the case for petsc-3.9.4, where all is fine. This is a non-Hermitian, generalized eigenvalue problem. I generate the A and B matrices myself and then use example 7 from the SLEPc tutorials ($SLEPC_DIR/src/eps/examples/tutorials/ex7.c) to solve the problem:
mpiexec -n 24 valgrind --tool=memcheck -q --num-callers=20 --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target -2.5e-4+1.56524i -eps_target_magnitude -eps_tol 1e-14 $opts

where the $opts variable is:

export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu -eps_error_relative ::ascii_info_detail -st_pc_factor_mat_solver_type superlu_dist -mat_superlu_dist_iterrefine 1 -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1 -eps_converged_reason -eps_conv_rel -eps_monitor_conv -eps_true_residual 1'

The output from valgrind (a sample from one processor) and from the program are attached. In case it is of any use, the matrices are here (at least 180 GB of RAM may be needed to solve the problem successfully under petsc-3.9.4):

https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0
https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0

With petsc-3.9.4 and slepc-3.9.2 I can use matrices up to 10 GB (with 240 GB of RAM), but only up to 3 GB with the latest petsc/slepc. Any suggestions, comments or other help are very much appreciated!

Cheers,
Santiago

On Mon, Dec 23, 2019 at 11:19 PM Matthew Knepley <[email protected]> wrote:

> On Mon, Dec 23, 2019 at 3:14 PM Santiago Andres Triana <[email protected]> wrote:
>
>> Dear all,
>>
>> After upgrading to petsc 3.12.2 my solver program crashes consistently.
>> Before the upgrade I was using petsc 3.9.4 with no problems.
>>
>> My application deals with a complex-valued, generalized eigenvalue
>> problem. The matrices involved are relatively large, typically 2 to 10 Gb
>> in size, which is no problem for petsc 3.9.4.
>
> Are you sure that your indices do not exceed 4B? If so, you need to
> configure using
>
>   --with-64-bit-indices
>
> Also, it would be nice if you ran with the debugger so we can get a stack
> trace for the SEGV.
> Thanks,
>
>    Matt
>
>> However, after the upgrade I can only obtain solutions when the matrices
>> are small, the solver crashes when the matrices' size exceed about 1.5 Gb:
>>
>> [0]PETSC ERROR: ------------------------------------------------------------------------
>> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
>> X to find memory corruption errors
>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link,
>> and run
>> [0]PETSC ERROR: to get more information on the crash.
>>
>> and so on for each cpu.
>>
>> I tried using valgrind and this is the typical output:
>>
>> ==2874== Conditional jump or move depends on uninitialised value(s)
>> ==2874==    at 0x4018178: index (in /lib64/ld-2.22.so)
>> ==2874==    by 0x400752D: expand_dynamic_string_token (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4008009: _dl_map_object (in /lib64/ld-2.22.so)
>> ==2874==    by 0x40013E4: map_doit (in /lib64/ld-2.22.so)
>> ==2874==    by 0x400EA53: _dl_catch_error (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4000ABE: do_preload (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4000EC0: handle_ld_preload (in /lib64/ld-2.22.so)
>> ==2874==    by 0x40034F0: dl_main (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4016274: _dl_sysdep_start (in /lib64/ld-2.22.so)
>> ==2874==    by 0x4004A99: _dl_start (in /lib64/ld-2.22.so)
>> ==2874==    by 0x40011F7: ??? (in /lib64/ld-2.22.so)
>> ==2874==    by 0x12: ??? (in /lib64/ld-2.22.so)
>> ==2874==
>>
>> These are my configuration options.
>> Identical for both petsc 3.9.4 and 3.12.2:
>>
>> ./configure --with-scalar-type=complex --download-mumps
>> --download-parmetis --download-metis --download-scalapack=1
>> --download-fblaslapack=1 --with-debugging=0 --download-superlu_dist=1
>> --download-ptscotch=1 CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3
>> -march=native' COPTFLAGS='-O3 -march=native'
>>
>> Thanks in advance for any comments or ideas!
>>
>> Cheers,
>> Santiago
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
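[Editor's note on Matt's 4B-indices question above: a back-of-envelope check may help. The sketch below is a hedged estimate only; the sizes, the 16-bytes-per-complex-double figure, and the fill factor are illustrative assumptions derived from the file sizes quoted in this thread, not measured nonzero counts from A.petsc or B.petsc.]

```python
# Rough estimate: can a nonzero count overflow a signed 32-bit integer
# (the default PetscInt)? All numbers here are illustrative assumptions.
INT32_MAX = 2**31 - 1

bytes_per_nonzero = 16        # one complex double value: 2 x 8 bytes
matrix_bytes = 10 * 1024**3   # the ~10 GB matrices that worked under 3.9.4
fill_factor = 10              # hypothetical fill-in from an LU factorization

nnz = matrix_bytes // bytes_per_nonzero
nnz_factored = nnz * fill_factor

print(nnz, nnz > INT32_MAX)                    # matrix itself fits in 32 bits
print(nnz_factored, nnz_factored > INT32_MAX)  # the factor may not
```

On these assumptions the input matrix alone stays under 2^31 nonzeros, but fill-in during the SuperLU_DIST factorization could plausibly push index counts past it, in which case reconfiguring with --with-64-bit-indices, as suggested above, would be the fix.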
Attachments:
test1.e6034496 (binary data)
valgrind.log.23361 (binary data)
