On Feb 25, 2014, at 8:23 AM, Samar Khatiwala <[email protected]> wrote:

> Hi Sherry,
>
> Thanks for the offer to help!
>
> I tried superlu_dist again and it crashes even more quickly than MUMPS,
> with just the following error:
>
>   ERROR: 0031-250 task 128: Killed

This is usually a symptom of running out of memory.

> Absolutely nothing else is written out to either stderr or stdout. This
> is with -mat_superlu_dist_statprint.
> The program works fine on a smaller matrix.
>
> This is the sequence of calls:
>
>   KSPSetType(ksp,KSPPREONLY);
>   PCSetType(pc,PCLU);
>   PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST);
>   KSPSetFromOptions(ksp);
>   PCSetFromOptions(pc);
>   KSPSolve(ksp,b,x);
>
> All of these return successfully *except* the very last one, the call to
> KSPSolve.
>
> Any help would be appreciated. Thanks!
>
> Samar
>
> On Feb 24, 2014, at 3:58 PM, Xiaoye S. Li <[email protected]> wrote:
>
>> Samar:
>> If you include the error message while crashing using superlu_dist, I
>> probably know the reason. (Better yet, include the printout before the
>> crash.)
>>
>> Sherry
>>
>> On Mon, Feb 24, 2014 at 9:56 AM, Hong Zhang <[email protected]> wrote:
>>
>> Samar:
>> There are limitations for direct solvers.
>> Do not expect any solver to handle arbitrarily large problems.
>> Since superlu_dist also crashes, direct solvers may not be able to work
>> on your application.
>> This is why I suggest increasing the size incrementally.
>> You may have to experiment with other types of solvers.
>>
>> Hong
>>
>> Hi Hong and Jed,
>>
>> Many thanks for replying. It would indeed be nice if the error messages
>> from MUMPS were less cryptic!
>>
>> 1) I have tried smaller matrices, although given how my problem is set
>> up a jump is difficult to avoid. But a good idea that I will try.
>>
>> 2) I did try various orderings, but not the one you suggested.
>>
>> 3) Tracing the error through the MUMPS code suggests a rather abrupt
>> termination of the program (there should be more error messages if, for
>> example, memory were a problem). I therefore thought it might be an
>> interface problem rather than one with MUMPS, and turned to the
>> petsc-users group first.
>>
>> 4) I've tried superlu_dist but it also crashes (also unclear as to why),
>> at which point I decided to try MUMPS. The fact that both crash would
>> again indicate a common (memory?) problem.
>>
>> I'll try a few more things before asking the MUMPS developers.
>>
>> Thanks again for your help!
>>
>> Samar
>>
>> On Feb 24, 2014, at 11:47 AM, Hong Zhang <[email protected]> wrote:
>>
>>> Samar:
>>> The crash occurs in
>>> ...
>>> [161]PETSC ERROR: Error in external library!
>>> [161]PETSC ERROR: Error reported by MUMPS in numerical factorization
>>> phase: INFO(1)=-1, INFO(2)=48
>>>
>>> For a very large matrix, this is likely a memory problem, as you
>>> suspected. I would suggest:
>>> 1. Run problems of increasing size (do not jump from a small one to a
>>>    very large one) and observe memory usage using '-ksp_view'.
>>>    I see you use '-mat_mumps_icntl_14 1000', i.e., the percentage
>>>    increase in the estimated workspace. Is it too large? Anyway, this
>>>    input should not cause the crash, I guess.
>>> 2. Experiment with different matrix orderings, -mat_mumps_icntl_7 <>
>>>    (I usually use sequential ordering 2). I see you use the parallel
>>>    ordering -mat_mumps_icntl_29 2.
>>> 3. Send a bug report to the MUMPS developers for their suggestions.
>>> 4. Try other direct solvers, e.g., superlu_dist.
>>>
>>> …
>>>
>>> etc. etc. The above error, I can tell, has something to do with
>>> processor 48 (INFO(2)) and so forth, but not the previous one.
>>>
>>> The full output enabled with -mat_mumps_icntl_4 3 looks as in the
>>> attached file. Any hints as to what could be giving this error would
>>> be very much appreciated.
>>>
>>> I do not know how to interpret this output file. The MUMPS developers
>>> would give you a better suggestion on it.
>>> I would appreciate learning as well :-)
>>>
>>> Hong
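
For reference, here is the call sequence Samar describes, fleshed out as a
minimal sketch. This assumes the 2014-era (PETSc 3.4) API, an already
assembled parallel AIJ matrix A, and vectors b and x; PetscInitialize,
matrix/vector assembly, and cleanup are omitted, so it is an outline
rather than a complete program:

  Mat A;             /* assembled MPIAIJ matrix (created elsewhere)      */
  Vec b, x;          /* right-hand side and solution (created elsewhere) */
  KSP ksp;
  PC  pc;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  /* PETSc 3.4 signature; the MatStructure flag was removed in 3.5 */
  ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
  ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);  /* no Krylov iterations: factor and solve only */
  ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
  ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);      /* picks up -mat_superlu_dist_* options */
  ierr = PCSetFromOptions(pc);CHKERRQ(ierr);
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);           /* factorization happens inside this call */

Because the symbolic and numeric factorizations are performed inside
KSPSolve, an out-of-memory kill during factorization shows up exactly as
Samar describes: every earlier call returns cleanly and only KSPSolve dies.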

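Hong's ICNTL suggestions can also be applied in code rather than through
the command-line options quoted above. Below is a rough sketch, again
assuming the PETSc 3.4-era MUMPS interface (PCFactorSetUpMatSolverPackage,
MatMumpsSetIcntl) and continuing from the setup sketch above, with
MATSOLVERMUMPS in place of SuperLU_DIST; the particular ICNTL values are
only examples:

  Mat F;   /* the factored matrix held by the PC */

  ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
  ierr = PCFactorSetUpMatSolverPackage(pc);CHKERRQ(ierr);  /* create F so ICNTL values can be set */
  ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
  ierr = MatMumpsSetIcntl(F,7,2);CHKERRQ(ierr);   /* sequential ordering, cf. -mat_mumps_icntl_7 2  */
  ierr = MatMumpsSetIcntl(F,14,50);CHKERRQ(ierr); /* % workspace increase, cf. -mat_mumps_icntl_14  */
  ierr = MatMumpsSetIcntl(F,4,3);CHKERRQ(ierr);   /* verbose MUMPS output, cf. -mat_mumps_icntl_4 3 */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);

Running with '-ksp_view' should then also print the ICNTL values actually
used along with the factorization statistics, which is the memory-usage
check Hong recommends.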