Hi, I'm having some problems with my PETSc application similar to the ones discussed in this thread, so perhaps one of you can help. In my application I factorize a preconditioner matrix with mumps or superlu_dist, then use this factorized preconditioner to accelerate gmres on a matrix that is denser than the preconditioner. I've been running on edison at nersc. My program works reliably for problem sizes below about 1 million x 1 million, but above this size, the factorization step fails in one of many possible ways, depending on the compiler, # of nodes, # of procs/node, etc.:
When I use superlu_dist, I get 1 of 2 failure modes:
(1) the first step of KSP returns "0 KSP residual norm -nan", and KSP then returns KSPConvergedReason = -9, or
(2) the factorization completes, but GMRES then converges excruciatingly slowly or not at all, even if I choose the "real" matrix to be identical to the preconditioner matrix, so KSP ought to converge in 1 step (which it does for smaller matrices).

For mumps, the factorization can fail in many different ways:
(3) With the intel compiler I usually get "Caught signal number 11 SEGV: Segmentation Violation".
(4) Sometimes with the intel compiler I get "Caught signal number 7 BUS: Bus Error".
(5) With the gnu compiler I often get a bunch of lines like
    problem with NIV2_FLOPS message -5.9604644775390625E-008 0 -227464733.99999997
(6) Other times with gnu I get a mumps error with INFO(1)=-9 or INFO(1)=-17. The mumps documentation suggests I should increase icntl(14), but what is an appropriate value? 50? 10000?
(7) With the Cray compiler I consistently get this cryptic error:

    Fatal error in PMPI_Test: Invalid MPI_Request, error stack:
    PMPI_Test(166): MPI_Test(request=0xb228dbf3c, flag=0x7ffffffe097c, status=0x7ffffffe0a00) failed
    PMPI_Test(121): Invalid MPI_Request
    _pmiu_daemon(SIGCHLD): [NID 02784] [c6-1c1s8n0] [Sun Mar 2 10:35:20 2014] PE RANK 0 exit signal Aborted
    [NID 02784] 2014-03-02 10:35:20 Apid 3374579: initiated application termination
    Application 3374579 exit codes: 134

For linear systems smaller than around 1 million^2, my application is very robust, working consistently with both mumps & superlu_dist, working for a wide range of # of nodes and # of procs/node, and working with all 3 available compilers on edison (intel, gnu, cray).

By the way, mumps failed for much smaller problems until I tried -mat_mumps_icntl_7 2 (inspired by your conversation last week). I tried all the other options for icntl(7), icntl(28), and icntl(29), finding icntl(7)=2 works best by far.
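For context, the structure of what I'm doing — factor an approximate matrix once, then use that factorization as the preconditioner for GMRES on the true, denser matrix — can be sketched in serial with scipy. This is only a toy stand-in for the parallel PETSc/MUMPS setup; the matrices here are made up for illustration:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
# Hypothetical preconditioner matrix P: a sparse tridiagonal stand-in
P = sp.diags(
    [-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
    [-1, 0, 1], format="csc")
# Hypothetical "real" matrix A: P plus extra, denser couplings
A = (P + 0.01 * sp.random(n, n, density=0.05, random_state=0)).tocsc()

lu = spla.splu(P)  # direct LU factorization of the preconditioner, done once
M = spla.LinearOperator((n, n), matvec=lu.solve)  # applies P^{-1} to a vector

b = np.ones(n)
x, info = spla.gmres(A, b, M=M)  # info == 0 means GMRES converged
```

If A is chosen equal to P, the preconditioned operator is the identity and GMRES should converge in a single iteration — the same sanity check I describe above.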
I tried the flags that worked for Samar (-mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1) with superlu_dist, but they did not appear to change anything in my case. Can you recommend any other parameters of petsc, superlu_dist, or mumps that I should try changing? I don't care in the end whether I use superlu_dist or mumps.

Thanks!
Matt Landreman

On Tue, Feb 25, 2014 at 3:50 PM, Xiaoye S. Li <[email protected]> wrote:
> Very good! Thanks for the update.
> I guess you are using all 16 cores per node? Since superlu_dist currently is MPI-only, if you generate 16 MPI tasks, serial symbolic factorization only has less than 2 GB memory to work with.
>
> Sherry
>
> On Tue, Feb 25, 2014 at 12:22 PM, Samar Khatiwala <[email protected]> wrote:
>
>> Hi Sherry,
>>
>> Thanks! I tried your suggestions and it worked!
>>
>> For the record I added these flags: -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1
>>
>> Also, for completeness and since you asked:
>>
>> size: 2346346 x 2346346
>> nnz: 60856894
>> unsymmetric
>>
>> The hardware (http://www2.cisl.ucar.edu/resources/yellowstone/hardware) specs are: 2 GB/core, 32 GB/node (27 GB usable), 16 cores per node.
>> I've been running on 8 nodes (so 8 x 27 ~ 216 GB).
>>
>> Thanks again for your help!
>>
>> Samar
>>
>> On Feb 25, 2014, at 1:00 PM, "Xiaoye S. Li" <[email protected]> wrote:
>>
>> I didn't follow the discussion thread closely ... How large is your matrix dimension, and number of nonzeros?
>> How large is the memory per core (or per node)?
>>
>> The default setting in superlu_dist is to use serial symbolic factorization. You can turn on parallel symbolic factorization by:
>>
>> options.ParSymbFact = YES;
>> options.ColPerm = PARMETIS;
>>
>> Is your matrix symmetric? If so, you need to give both the upper and lower halves of matrix A to superlu, which doesn't exploit symmetry.
>>
>> Do you know whether you need numerical pivoting?
>> If not, you can turn off pivoting by:
>>
>> options.RowPerm = NATURAL;
>>
>> This avoids some other serial bottleneck.
>>
>> All these options can be turned on in the petsc interface. Please check out the syntax there.
>>
>> Sherry
>>
>> On Tue, Feb 25, 2014 at 8:07 AM, Samar Khatiwala <[email protected]> wrote:
>>
>>> Hi Barry,
>>>
>>> You're probably right. I note that the error occurs almost instantly, and I've tried increasing the number of CPUs (as many as ~1000 on Yellowstone) to no avail. I know this is a big problem but I didn't think it was that big!
>>>
>>> Sherry: Is there any way to write out more diagnostic info? E.g., how much memory superlu thinks it needs/is attempting to allocate.
>>>
>>> Thanks,
>>>
>>> Samar
>>>
>>> On Feb 25, 2014, at 10:57 AM, Barry Smith <[email protected]> wrote:
>>> >
>>> >> I tried superlu_dist again and it crashes even more quickly than MUMPS with just the following error:
>>> >>
>>> >> ERROR: 0031-250 task 128: Killed
>>> >
>>> > This is usually a symptom of running out of memory.
>>> >
>>> >> Absolutely nothing else is written out to either stderr or stdout. This is with -mat_superlu_dist_statprint.
>>> >> The program works fine on a smaller matrix.
>>> >>
>>> >> This is the sequence of calls:
>>> >>
>>> >> KSPSetType(ksp,KSPPREONLY);
>>> >> PCSetType(pc,PCLU);
>>> >> PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST);
>>> >> KSPSetFromOptions(ksp);
>>> >> PCSetFromOptions(pc);
>>> >> KSPSolve(ksp,b,x);
>>> >>
>>> >> All of these successfully return *except* the very last one, KSPSolve.
>>> >>
>>> >> Any help would be appreciated. Thanks!
>>> >>
>>> >> Samar
>>> >>
>>> >> On Feb 24, 2014, at 3:58 PM, Xiaoye S. Li <[email protected]> wrote:
>>> >>
>>> >>> Samar:
>>> >>> If you include the error message while crashing using superlu_dist, I probably know the reason. (Better yet, include the printout before the crash.
>>> >>> )
>>> >>>
>>> >>> Sherry
>>> >>>
>>> >>> On Mon, Feb 24, 2014 at 9:56 AM, Hong Zhang <[email protected]> wrote:
>>> >>> Samar:
>>> >>> There are limitations for direct solvers.
>>> >>> Do not expect any solver to work on arbitrarily large problems.
>>> >>> Since superlu_dist also crashes, direct solvers may not be able to work on your application.
>>> >>> This is why I suggested increasing the size incrementally.
>>> >>> You may have to experiment with other types of solvers.
>>> >>>
>>> >>> Hong
>>> >>>
>>> >>> Hi Hong and Jed,
>>> >>>
>>> >>> Many thanks for replying. It would indeed be nice if the error messages from MUMPS were less cryptic!
>>> >>>
>>> >>> 1) I have tried smaller matrices, although given how my problem is set up, a jump is difficult to avoid. But a good idea that I will try.
>>> >>>
>>> >>> 2) I did try various orderings, but not the one you suggested.
>>> >>>
>>> >>> 3) Tracing the error through the MUMPS code suggests a rather abrupt termination of the program (there should be more error messages if, for example, memory was a problem). I therefore thought it might be an interface problem rather than one with mumps, and turned to the petsc-users group first.
>>> >>>
>>> >>> 4) I've tried superlu_dist but it also crashes (also unclear as to why), at which point I decided to try mumps. The fact that both crash would again indicate a common (memory?) problem.
>>> >>>
>>> >>> I'll try a few more things before asking the MUMPS developers.
>>> >>>
>>> >>> Thanks again for your help!
>>> >>>
>>> >>> Samar
>>> >>>
>>> >>> On Feb 24, 2014, at 11:47 AM, Hong Zhang <[email protected]> wrote:
>>> >>>
>>> >>>> Samar:
>>> >>>> The crash occurs in
>>> >>>> ...
>>> >>>> [161]PETSC ERROR: Error in external library!
>>> >>>> [161]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-1, INFO(2)=48
>>> >>>>
>>> >>>> for a very large matrix, likely a memory problem as you suspected.
>>> >>>> I would suggest:
>>> >>>> 1. Run problems with increasing sizes (not a jump from a small one to a very large one) and observe memory usage using '-ksp_view'.
>>> >>>> I see you use '-mat_mumps_icntl_14 1000', i.e., the percentage increase in the estimated workspace. Is it too large? Anyway, this input should not cause the crash, I guess.
>>> >>>> 2. Experiment with different matrix orderings via -mat_mumps_icntl_7 <> (I usually use sequential ordering 2).
>>> >>>> I see you use parallel ordering -mat_mumps_icntl_29 2.
>>> >>>> 3. Send a bug report to the mumps developers for their suggestions.
>>> >>>> 4. Try other direct solvers, e.g., superlu_dist.
>>> >>>>
>>> >>>> ...
>>> >>>>
>>> >>>> etc. etc. The above error, I can tell, has something to do with processor 48 (INFO(2)) and so forth, but not the previous one.
>>> >>>>
>>> >>>> The full output enabled with -mat_mumps_icntl_4 3 looks as in the attached file. Any hints as to what could be giving this error would be very much appreciated.
>>> >>>>
>>> >>>> I do not know how to interpret this output file. The mumps developers would give you better suggestions on it.
>>> >>>> I would appreciate to learn as well :-)
>>> >>>>
>>> >>>> Hong
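To summarize the runtime options discussed in this thread, a representative launch line might look roughly like the following. This is a sketch only: ./myapp is a placeholder executable, the core count is arbitrary, the icntl(14) value of 50 is just one of the candidates Matt asked about, and exact flag syntax may differ between PETSc versions:

```shell
# MUMPS variant; flags are those mentioned in the thread above.
# -mat_mumps_icntl_7 2   : sequential ordering that worked best for Matt
# -mat_mumps_icntl_14 50 : percent extra workspace (raise if INFO(1)=-9)
# -mat_mumps_icntl_4 3   : verbose MUMPS diagnostic output
mpiexec -n 128 ./myapp -ksp_type gmres -pc_type lu \
    -pc_factor_mat_solver_package mumps \
    -mat_mumps_icntl_7 2 -mat_mumps_icntl_14 50 -mat_mumps_icntl_4 3

# SuperLU_DIST variant (the flags Samar used successfully, plus stats):
mpiexec -n 128 ./myapp -ksp_type preonly -pc_type lu \
    -pc_factor_mat_solver_package superlu_dist \
    -mat_superlu_dist_colperm PARMETIS -mat_superlu_dist_parsymbfact 1 \
    -mat_superlu_dist_statprint
```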
