Re: [petsc-users] Can't expand MemType 1: jcol 16104

Anthony Paul Haas Tue, 07 Jul 2015 15:26:18 -0700

Hi Sherry,

Thanks for your message. I have used the superlu_dist default options. I did
not realize that I was doing serial symbolic factorization. That is
probably the cause of my problem.
Each node on Garnet has 60GB usable memory and I can run with 1,2,4,8,16 or
32 cores per node.
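
To put a number on the "memory per MPI task" question, here is a minimal sketch of the arithmetic. It assumes one MPI task per core and that the 60GB of a node is shared evenly among the tasks on it; both are simplifications, and the names are only illustrative.

```python
# Memory available to each MPI task for the cores-per-node choices on Garnet.
# Assumption: one MPI task per core, node memory split evenly among tasks.
NODE_MEMORY_GB = 60.0
CORES_PER_NODE_OPTIONS = (1, 2, 4, 8, 16, 32)


def memory_per_task_gb(node_memory_gb, tasks_per_node):
    """Return the even share of node memory for one MPI task, in GB."""
    return node_memory_gb / tasks_per_node


per_task = {c: memory_per_task_gb(NODE_MEMORY_GB, c) for c in CORES_PER_NODE_OPTIONS}

for cores, gb in per_task.items():
    print(f"{cores:2d} tasks/node -> {gb:.3f} GB per task")
```

With 32 tasks per node, each task gets under 2GB. That matters when a single task has to hold the entire graph of A during serial symbolic factorization.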


So I should use:

-mat_superlu_dist_r 20
-mat_superlu_dist_c 32
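
As a rough sketch of the grid rule (nprow no larger than npcol), the helper below picks a factorization of the total number of MPI tasks. The function name is hypothetical, and choosing the most nearly square grid is my own assumption; the only stated requirement is that nprow should not exceed npcol. The total of 640 tasks comes from the 32 x 20 grid reported in the output.

```python
import math


def choose_process_grid(nprocs):
    """Return (nprow, npcol) with nprow * npcol == nprocs and nprow <= npcol.

    Picks the largest divisor of nprocs that does not exceed sqrt(nprocs),
    which gives the most nearly square grid satisfying nprow <= npcol.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    nprow = math.isqrt(nprocs)
    while nprocs % nprow != 0:
        nprow -= 1
    return nprow, nprocs // nprow


nprow, npcol = choose_process_grid(640)
print(f"-mat_superlu_dist_r {nprow} -mat_superlu_dist_c {npcol}")
```

For 640 tasks this gives the same 20 x 32 grid as above.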

How do you specify the parallel symbolic factorization option? Is it
-mat_superlu_dist_matinput 1

Thanks,

Anthony


On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <[email protected]> wrote:

> For superlu_dist failure, this occurs during symbolic factorization.
> Since you are using serial symbolic factorization, it requires the entire
> graph of A to be available in the memory of one MPI task. How much memory
> do you have for each MPI task?
>
> It won't help even if you use more processes.  You should try to use
> the parallel symbolic factorization option.
>
> Another point.  You set up process grid as:
>        Process grid nprow 32 x npcol 20
> For better performance, you should swap the grid dimensions. That is, it's
> better to use 20 x 32; never make nprow larger than npcol.
>
>
> Sherry
>
>
> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <[email protected]> wrote:
>
>>
>>    I would suggest running a sequence of problems, 101 by 101, 111 by 111,
>> etc., and getting the memory usage in each case (when you run out of memory you
>> can get NO useful information out about memory needs). You can then plot
>> memory usage as a function of problem size to get a handle on how much
>> memory it is using.  You can also run on more and more processes (which
>> have a total of more memory) to see how large a problem you may be able to
>> reach.
>>
>>    MUMPS also has an "out of core" version (which we have never used)
>> that could in theory anyways let you get to large problems if you have lots
>> of disk space, but you are on your own figuring out how to use it.
>>
>>   Barry
>>
>> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <[email protected]>
>> wrote:
>> >
>> > Hi Jose,
>> >
>> > In my code, I use PETSc once to solve a linear system to get the
>> baseflow (without using SLEPc) and then I use SLEPc to do the stability
>> analysis of that baseflow. This is why there are some SLEPc options that
>> are not used in test.out-superlu_dist-151x151 (when I am solving for the
>> baseflow with PETSc only). I have attached a 101x101 case for which I get
>> the eigenvalues. That case works fine. However, if I increase to 151x151, I
>> get the error that you can see in test.out-superlu_dist-151x151 (similar
>> error with mumps: see test.out-mumps-151x151 line 2918). If you look at the
>> very end of the files test.out-superlu_dist-151x151 and
>> test.out-mumps-151x151, you will see that the last info message printed is:
>> >
>> > On Processor (after EPSSetFromOptions)  0    memory:
>> 0.65073152000E+08          =====>  (see line 807 of module_petsc.F90)
>> >
>> > This means that the memory error probably occurs in the call to
>> EPSSolve (see module_petsc.F90 line 810). I would like to evaluate how much
>> memory is required by the most memory intensive operation within EPSSolve.
>> Since I am solving a generalized EVP, I would imagine that it would be the
>> LU decomposition. But is there an accurate way of doing it?
>> >
>> > Before starting with iterative solvers, I would like to exploit direct
>> solvers as much as I can. I tried GMRES with the default preconditioner at
>> some point but I had convergence problems. What solver/preconditioner would you
>> recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?
>> >
>> > Thanks,
>> >
>> > Anthony
>> >
>> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <[email protected]>
>> wrote:
>> >
>> > On 07/07/2015, at 02:33, Anthony Haas wrote:
>> >
>> > > Hi,
>> > >
>> > > I am computing eigenvalues using PETSc/SLEPc and superlu_dist for the
>> LU decomposition (my problem is a generalized eigenvalue problem). The code
>> runs fine for a grid with 101x101 but when I increase to 151x151, I get the
>> following error:
>> > >
>> > > Can't expand MemType 1: jcol 16104   (and then [NID 00037] 2015-07-06
>> 19:19:17 Apid 31025976: OOM killer terminated this process.)
>> > >
>> > > It seems to be a memory problem. I monitor the memory usage as far as
>> I can and it seems that memory usage is pretty low. The most memory
>> intensive part of the program is probably the LU decomposition in the
>> context of the generalized EVP. Is there a way to evaluate how much memory
>> will be required for that step? I am currently running the debug version of
>> the code which I would assume would use more memory?
>> > >
>> > > I have attached the output of the job. Note that the program uses
>> PETSc twice: 1) to solve a linear system, for which no problem occurs, and
>> 2) to solve the generalized EVP with SLEPc, where I get the error.
>> > >
>> > > Thanks
>> > >
>> > > Anthony
>> > > <test.out-superlu_dist-151x151>
>> >
>> > In the output you are attaching there are no SLEPc objects in the
>> report and SLEPc options are not used. It seems that SLEPc calls are
>> skipped?
>> >
>> > Do you get the same error with MUMPS? Have you tried to solve linear
>> systems with a preconditioned iterative solver?
>> >
>> > Jose
>> >
>> >
>> >
>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>>
>>
>
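
Barry's suggestion in the quoted thread (run a sequence of grid sizes, record the memory used, and see how it grows) could be sketched as follows. This is only an illustration: it assumes memory follows a power law mem = a * n^b in the grid dimension n, which is my own modelling choice, and the function names are hypothetical. The numbers in the usage part are synthetic placeholders generated from a known formula, not measurements from any run.

```python
import math


def fit_power_law(sizes, mem_bytes):
    """Least-squares fit of log(mem) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(m) for m in mem_bytes]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict(a, b, n):
    """Extrapolate the fitted model to grid dimension n."""
    return a * n ** b


# Synthetic placeholder data (NOT real measurements): mem = 2 * n^3 bytes.
sizes = [101, 111, 121, 131]
mem = [2.0 * n ** 3 for n in sizes]

a, b = fit_power_law(sizes, mem)
print(f"fitted exponent b = {b:.3f}")
print(f"extrapolated memory at n = 151: {predict(a, b, 151):.3e} bytes")
```

In practice the `mem` list would be replaced by the memory figures reported by the runs that complete. The extrapolation is only as good as the power-law assumption.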
