Re: [petsc-users] Can't expand MemType 1: jcol 16104

Xiaoye S. Li Tue, 28 Jul 2015 08:46:15 -0700

I am checking v4.1 now. I'll let you know when I have fixed the problem.

Sherry


On Tue, Jul 28, 2015 at 8:27 AM, Hong <[email protected]> wrote:

> Sherry,
> I tested with superlu_dist v4.1. The extra printing is gone, but the hang
> remains.
> It hangs at
>
> #5  0x00007fde5af1c818 in PMPI_Wait (request=0xb6e4e0,
> status=0x7fff9cd83d60)
>     at src/mpi/pt2pt/wait.c:168
> #6  0x00007fde602dd635 in pzgstrf (options=0x9202f0, m=4900, n=4900,
>     anorm=13.738475134194639, LUstruct=0x9203c8, grid=0x9202c8,
>     stat=0x7fff9cd84880, info=0x7fff9cd848bc) at pzgstrf.c:1308
>
>                 if (recv_req[0] != MPI_REQUEST_NULL) {
>  -->                   MPI_Wait (&recv_req[0], &status);
>
> We will update the PETSc interface to superlu_dist v4.1.
>
> Hong
>
>
> On Mon, Jul 27, 2015 at 11:33 PM, Xiaoye S. Li <[email protected]> wrote:
>
>> Hong,
>> Thanks for trying it out.
>> The extra printing is not properly guarded by the print level.  I will
>> fix that.   I will look into the hang problem soon.
>>
>> Sherry
>> 
>>
>> On Mon, Jul 27, 2015 at 7:50 PM, Hong <[email protected]> wrote:
>>
>>> Sherry,
>>>
>>> I can reproduce the hang using petsc/src/ksp/ksp/examples/tutorials/ex10.c:
>>> mpiexec -n 4 ./ex10 -f0 /homes/hzhang/tmp/Amat_binary.m -rhs 0 -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
>>> ...
>>> .. Starting with 1 OpenMP threads
>>> [0] .. BIG U size 1342464
>>> [0] .. BIG V size 131072
>>>   Max row size is 1311
>>>   Using buffer_size of 5000000
>>>   Threads per process 1
>>> ...
>>>
>>> Using a debugger (with the PETSc option '-start_in_debugger'), I find that
>>> the hang occurs at
>>> #0  0x00007f117d870998 in __GI___poll (fds=0x20da750, nfds=4,
>>>     timeout=<optimized out>, timeout@entry=-1)
>>>     at ../sysdeps/unix/sysv/linux/poll.c:83
>>> #1  0x00007f117de9f7de in MPIDU_Sock_wait (sock_set=0x20da550,
>>>     millisecond_timeout=millisecond_timeout@entry=-1,
>>>     eventp=eventp@entry=0x7fff654930b0)
>>>     at src/mpid/common/sock/poll/sock_wait.i:123
>>> #2  0x00007f117de898b8 in MPIDI_CH3i_Progress_wait (
>>>     progress_state=0x7fff65493120)
>>>     at src/mpid/ch3/channels/sock/src/ch3_progress.c:218
>>> #3  MPIDI_CH3I_Progress (blocking=blocking@entry=1,
>>>     state=state@entry=0x7fff65493120)
>>>     at src/mpid/ch3/channels/sock/src/ch3_progress.c:921
>>> #4  0x00007f117de1a559 in MPIR_Wait_impl (request=request@entry=0x262df90,
>>>     status=status@entry=0x7fff65493390) at src/mpi/pt2pt/wait.c:67
>>> #5  0x00007f117de1a818 in PMPI_Wait (request=0x262df90,
>>> status=0x7fff65493390)
>>>     at src/mpi/pt2pt/wait.c:168
>>> #6  0x00007f11831da557 in pzgstrf (options=0x23dfda0, m=4900, n=4900,
>>>     anorm=13.738475134194639, LUstruct=0x23dfe78, grid=0x23dfd78,
>>>     stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgstrf.c:1308
>>>
>>> #7  0x00007f11831bf3bd in pzgssvx (options=0x23dfda0, A=0x23dfe30,
>>>     ScalePermstruct=0x23dfe50, B=0x0, ldb=1225, nrhs=0, grid=0x23dfd78,
>>>     LUstruct=0x23dfe78, SOLVEstruct=0x23dfe98, berr=0x0, stat=0x7fff65493ea0,
>>>     info=0x7fff65493edc) at pzgssvx.c:1063
>>>
>>> #8  0x00007f11825c2340 in MatLUFactorNumeric_SuperLU_DIST (F=0x23a0110,
>>>     A=0x21bb7e0, info=0x2355068)
>>>     at
>>> /sandbox/hzhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:411
>>> #9  0x00007f1181c6c567 in MatLUFactorNumeric (fact=0x23a0110,
>>> mat=0x21bb7e0,
>>>     info=0x2355068) at
>>> /sandbox/hzhang/petsc/src/mat/interface/matrix.c:2946
>>> #10 0x00007f1182a56489 in PCSetUp_LU (pc=0x2353a10)
>>>     at /sandbox/hzhang/petsc/src/ksp/pc/impls/factor/lu/lu.c:152
>>> #11 0x00007f1182b16f24 in PCSetUp (pc=0x2353a10)
>>>     at /sandbox/hzhang/petsc/src/ksp/pc/interface/precon.c:983
>>> #12 0x00007f1182be61b5 in KSPSetUp (ksp=0x232c2a0)
>>>     at /sandbox/hzhang/petsc/src/ksp/ksp/interface/itfunc.c:332
>>> #13 0x0000000000405a31 in main (argc=11, args=0x7fff65499578)
>>>     at /sandbox/hzhang/petsc/src/ksp/ksp/examples/tutorials/ex10.c:312
>>>
>>> You may take a look at it. Sequential symbolic factorization works fine.
>>>
>>> Why does superlu_dist (v4.0) in complex precision display
>>>
>>> .. Starting with 1 OpenMP threads
>>> [0] .. BIG U size 1342464
>>> [0] .. BIG V size 131072
>>>   Max row size is 1311
>>>   Using buffer_size of 5000000
>>>   Threads per process 1
>>> ...
>>>
>>> I realize that I am using superlu_dist v4.0. Would v4.1 work? I'll give it a
>>> try tomorrow.
>>>
>>> Hong
>>>
>>> On Mon, Jul 27, 2015 at 1:25 PM, Anthony Paul Haas <
>>> [email protected]> wrote:
>>>
>>>> Hi Hong,
>>>>
>>>> No, that is not the correct matrix. Note that I forgot to mention that
>>>> it is a complex matrix. I tried loading the matrix I sent you this morning
>>>> with:
>>>>
>>>> !...Load a Matrix in Binary Format
>>>>       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
>>>>       call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
>>>>       call MatSetType(DLOAD,MATAIJ,ierr)
>>>>       call MatLoad(DLOAD,viewer,ierr)
>>>>       call PetscViewerDestroy(viewer,ierr)
>>>>
>>>>       call MatView(DLOAD,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>
>>>> The first 37 rows should look like this:
>>>>
>>>> Mat Object: 2 MPI processes
>>>>   type: mpiaij
>>>> row 0: (0, 1)
>>>> row 1: (1, 1)
>>>> row 2: (2, 1)
>>>> row 3: (3, 1)
>>>> row 4: (4, 1)
>>>> row 5: (5, 1)
>>>> row 6: (6, 1)
>>>> row 7: (7, 1)
>>>> row 8: (8, 1)
>>>> row 9: (9, 1)
>>>> row 10: (10, 1)
>>>> row 11: (11, 1)
>>>> row 12: (12, 1)
>>>> row 13: (13, 1)
>>>> row 14: (14, 1)
>>>> row 15: (15, 1)
>>>> row 16: (16, 1)
>>>> row 17: (17, 1)
>>>> row 18: (18, 1)
>>>> row 19: (19, 1)
>>>> row 20: (20, 1)
>>>> row 21: (21, 1)
>>>> row 22: (22, 1)
>>>> row 23: (23, 1)
>>>> row 24: (24, 1)
>>>> row 25: (25, 1)
>>>> row 26: (26, 1)
>>>> row 27: (27, 1)
>>>> row 28: (28, 1)
>>>> row 29: (29, 1)
>>>> row 30: (30, 1)
>>>> row 31: (31, 1)
>>>> row 32: (32, 1)
>>>> row 33: (33, 1)
>>>> row 34: (34, 1)
>>>> row 35: (35, 1)
>>>> row 36: (1, -41.2444)  (35, -41.2444)  (36, 118.049 - 0.999271 i) (37,
>>>> -21.447)  (38, 5.18873)  (39, -2.34856)  (40, 1.3607)  (41, -0.898206)
>>>> (42, 0.642715)  (43, -0.48593)  (44, 0.382471)  (45, -0.310476)  (46,
>>>> 0.258302)  (47, -0.219268)  (48, 0.189304)  (49, -0.165815)  (50,
>>>> 0.147076)  (51, -0.131907)  (52, 0.119478)  (53, -0.109189)  (54, 0.1006)
>>>> (55, -0.0933795)  (56, 0.0872779)  (57, -0.0821019)  (58, 0.0777011)  (59,
>>>> -0.0739575)  (60, 0.0707775)  (61, -0.0680868)  (62, 0.0658258)  (63,
>>>> -0.0639473)  (64, 0.0624137)  (65, -0.0611954)  (66, 0.0602698)  (67,
>>>> -0.0596202)  (68, 0.0592349)  (69, -0.0295536)  (71, -21.447)  (106,
>>>> 5.18873)  (141, -2.34856)  (176, 1.3607)  (211, -0.898206)  (246,
>>>> 0.642715)  (281, -0.48593)  (316, 0.382471)  (351, -0.310476)  (386,
>>>> 0.258302)  (421, -0.219268)  (456, 0.189304)  (491, -0.165815)  (526,
>>>> 0.147076)  (561, -0.131907)  (596, 0.119478)  (631, -0.109189)  (666,
>>>> 0.1006)  (701, -0.0933795)  (736, 0.0872779)  (771, -0.0821019)  (806,
>>>> 0.0777011)  (841, -0.0739575)  (876, 0.0707775)  (911, -0.0680868)  (946,
>>>> 0.0658258)  (981, -0.0639473)  (1016, 0.0624137)  (1051, -0.0611954)
>>>> (1086, 0.0602698)  (1121, -0.0596202)  (1156, 0.0592349)  (1191,
>>>> -0.0295536)  (1261, 0)  (3676, 117.211)  (3711, -58.4801)  (3746,
>>>> -78.3633)  (3781, 29.4911)  (3816, -15.8073)  (3851, 9.94324)  (3886,
>>>> -6.87205)  (3921, 5.05774)  (3956, -3.89521)  (3991, 3.10522)  (4026,
>>>> -2.54388)  (4061, 2.13082)  (4096, -1.8182)  (4131, 1.57606)  (4166,
>>>> -1.38491)  (4201, 1.23155)  (4236, -1.10685)  (4271, 1.00428)  (4306,
>>>> -0.919116)  (4341, 0.847829)  (4376, -0.787776)  (4411, 0.736933)  (4446,
>>>> -0.693735)  (4481, 0.656958)  (4516, -0.625638)  (4551, 0.599007)  (4586,
>>>> -0.576454)  (4621, 0.557491)  (4656, -0.541726)  (4691, 0.528849)  (4726,
>>>> -0.518617)  (4761, 0.51084)  (4796, -0.50538)  (4831, 0.502142)  (4866,
>>>> -0.250534)
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Anthony
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 24, 2015 at 7:56 PM, Hong <[email protected]> wrote:
>>>>
>>>>> Anthony:
>>>>> I tested your Amat_binary.m
>>>>> using petsc/src/ksp/ksp/examples/tutorials/ex10.c.
>>>>> Your matrix has many zero rows:
>>>>> ./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more
>>>>> Mat Object: 1 MPI processes
>>>>>   type: seqaij
>>>>> row 0: (0, 1)
>>>>> row 1: (1, 0)
>>>>> row 2: (2, 1)
>>>>> row 3: (3, 0)
>>>>> row 4: (4, 1)
>>>>> row 5: (5, 0)
>>>>> row 6: (6, 1)
>>>>> row 7: (7, 0)
>>>>> row 8: (8, 1)
>>>>> row 9: (9, 0)
>>>>> ...
>>>>> row 36: (1, 1)  (35, 0)  (36, 1)  (37, 0)  (38, 1)  (39, 0)  (40, 1)
>>>>>  (41, 0)  (42, 1)  (43, 0)  (44, 1)  (45,
>>>>> 0)  (46, 1)  (47, 0)  (48, 1)  (49, 0)  (50, 1)  (51, 0)  (52, 1)
>>>>>  (53, 0)  (54, 1)  (55, 0)  (56, 1)  (57, 0)
>>>>>  (58, 1)  (59, 0)  (60, 1)  ...
>>>>>
>>>>> Did you send us the correct matrix?
>>>>>
>>>>>>
>>>>>> I ran my code through valgrind and gdb as suggested by Barry. I am
>>>>>> now coming back to a problem I have had while running with parallel
>>>>>> symbolic factorization. I am attaching a test matrix (PETSc binary
>>>>>> format) that I LU decompose and then use to solve a linear system (see
>>>>>> code below). I can run on 2 processors with parsymbfact or on 4
>>>>>> processors without parsymbfact. However, if I run on 4 procs with
>>>>>> parsymbfact, the code just hangs. Below is the simplified test case
>>>>>> that I have used. The matrices A and B are built somewhere else in my
>>>>>> program. The matrix I am attaching is A-sigma*B (see below).
>>>>>>
>>>>>> One thing I don't know is the optimum number of processors to use for
>>>>>> an LU decomposition of a sparse matrix. Does it depend on the total
>>>>>> number of nonzeros? Do you have an easy way to compute it?
>>>>>>
>>>>>
>>>>> You have to experiment with your matrix on the target machine to find out.
>>>>>
>>>>> Hong
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>      Subroutine HowBigLUCanBe(rank)
>>>>>>
>>>>>>       IMPLICIT NONE
>>>>>>
>>>>>>       integer(i4b),intent(in) :: rank
>>>>>>       integer(i4b)            :: i,ct
>>>>>>       real(dp)                :: begin,endd
>>>>>>       complex(dpc)            :: sigma
>>>>>>
>>>>>>       PetscErrorCode ierr
>>>>>>
>>>>>>
>>>>>>       if (rank==0) call cpu_time(begin)
>>>>>>
>>>>>>       if (rank==0) then
>>>>>>          write(*,*)
>>>>>>          write(*,*)'Testing How Big LU Can Be...'
>>>>>>          write(*,*)'============================'
>>>>>>          write(*,*)
>>>>>>       endif
>>>>>>
>>>>>>       sigma = (1.0d0,0.0d0)
>>>>>>       call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
>>>>>>
>>>>>> !.....Write Matrix to ASCII and Binary Format
>>>>>>       !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
>>>>>>       !call MatView(DXX,viewer,ierr)
>>>>>>       !call PetscViewerDestroy(viewer,ierr)
>>>>>>
>>>>>>       call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
>>>>>>       call MatView(A,viewer,ierr)
>>>>>>       call PetscViewerDestroy(viewer,ierr)
>>>>>>
>>>>>> !.....Create Linear Solver Context
>>>>>>       call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>>>>>
>>>>>> !.....Set operators. Here the matrix that defines the linear system
>>>>>> !     also serves as the preconditioning matrix.
>>>>>>       !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>>>>>>       call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
>>>>>>
>>>>>> !.....Set Relative and Absolute Tolerances and Use Default for Divergence Tol
>>>>>>       tol = 1.e-10
>>>>>>       call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>>>>>>
>>>>>> !.....Set the Direct (LU) Solver
>>>>>>       call KSPSetType(ksp,KSPPREONLY,ierr)
>>>>>>       call KSPGetPC(ksp,pc,ierr)
>>>>>>       call PCSetType(pc,PCLU,ierr)
>>>>>>       call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
>>>>>>
>>>>>> !.....Create Right-Hand-Side Vector
>>>>>>       call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>>>>>>       call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
>>>>>>
>>>>>>       allocate(xwork1(IendA-IstartA))
>>>>>>       allocate(loc(IendA-IstartA))
>>>>>>
>>>>>>       ct=0
>>>>>>       do i=IstartA,IendA-1
>>>>>>          ct=ct+1
>>>>>>          loc(ct)=i
>>>>>>          xwork1(ct)=(1.0d0,0.0d0)
>>>>>>       enddo
>>>>>>
>>>>>>       call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>>>>>>       call VecZeroEntries(sol,ierr)
>>>>>>
>>>>>>       deallocate(xwork1,loc)
>>>>>>
>>>>>> !.....Assemble Vectors
>>>>>>       call VecAssemblyBegin(frhs,ierr)
>>>>>>       call VecAssemblyEnd(frhs,ierr)
>>>>>>
>>>>>> !.....Solve the Linear System
>>>>>>       call KSPSolve(ksp,frhs,sol,ierr)
>>>>>>
>>>>>>       !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>>>
>>>>>>       if (rank==0) then
>>>>>>          call cpu_time(endd)
>>>>>>          write(*,*)
>>>>>>          print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin
>>>>>>       endif
>>>>>>
>>>>>>       call SlepcFinalize(ierr)
>>>>>>
>>>>>>       STOP
>>>>>>
>>>>>>
>>>>>>     end Subroutine HowBigLUCanBe
>>>>>>
>>>>>> On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:
>>>>>>
>>>>>>  Indeed, the parallel symbolic factorization routine needs a power-of-2
>>>>>> number of processes. However, you can use as many processes as you
>>>>>> need; internally, we redistribute the matrix to the nearest power-of-2
>>>>>> number of processes, do the symbolic factorization, then redistribute
>>>>>> back to all the processes to do the factorization, triangular solve,
>>>>>> etc.  So there is no restriction from the user's viewpoint.
>>>>>>
>>>>>>  It's difficult to tell what the problem is.  Do you think you can
>>>>>> print your matrix, so that I can do some debugging by running
>>>>>> superlu_dist standalone?
>>>>>>
>>>>>>  Sherry
>>>>>>
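Sherry's description of the internal redistribution for parallel symbolic factorization can be illustrated with a small Python sketch. It assumes that "nearest power of 2" means the largest power of two that does not exceed the number of MPI processes; the thread does not spell this out, and the function name is hypothetical rather than part of superlu_dist.

```python
def largest_power_of_two_at_most(nprocs: int) -> int:
    """Return the largest power of two that does not exceed nprocs.

    Assumption (not confirmed in the thread): this is the process count
    on which the parallel symbolic factorization would actually run.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    # bit_length() is the position of the highest set bit, so shifting 1
    # by (bit_length - 1) keeps only that bit.
    return 1 << (nprocs.bit_length() - 1)


if __name__ == "__main__":
    for nprocs in (2, 4, 6, 640):
        print(nprocs, "->", largest_power_of_two_at_most(nprocs))
```

Under this assumption, a run on a power-of-two process count such as 4 would do the symbolic step on all of its processes, while a 640-process run would use a subset.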
>>>>>>
>>>>>> On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>>   Hi,
>>>>>>>
>>>>>>>  I have used the switch -mat_superlu_dist_parsymbfact in my pbs
>>>>>>> script. However, although my program worked fine with sequential
>>>>>>> symbolic factorization, I get one of the following 2 behaviors when I
>>>>>>> run with parallel symbolic factorization (depending on the number of
>>>>>>> processors that I use):
>>>>>>>
>>>>>>>  1) the program just hangs (it seems stuck in some subroutine ==>
>>>>>>> see test.out-hangs)
>>>>>>>  2) I get a floating point exception ==> see
>>>>>>> test.out-floating-point-exception
>>>>>>>
>>>>>>>  Note that, as suggested in the SuperLU manual, I use a power-of-2
>>>>>>> number of procs. Are there any tunable parameters for the parallel
>>>>>>> symbolic factorization? Note that when I build my sparse matrix, most
>>>>>>> elements I add are nonzero of course, but to simplify the programming I
>>>>>>> also add a few zero elements to the sparse matrix. I was thinking that
>>>>>>> if the parallel symbolic factorization proceeds by block, there could
>>>>>>> be some blocks where the pivot would be zero, hence creating the FPE?
>>>>>>>
>>>>>>>  Thanks,
>>>>>>>
>>>>>>>  Anthony
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <[email protected]> wrote:
>>>>>>>
>>>>>>>>  Did you find out how to change the option to use parallel symbolic
>>>>>>>> factorization?  Perhaps the PETSc team can help.
>>>>>>>>
>>>>>>>>  Sherry
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <[email protected]> wrote:
>>>>>>>>
>>>>>>>>>  Is there an inquiry function that tells you all the available
>>>>>>>>> options?
>>>>>>>>>
>>>>>>>>>  Sherry
>>>>>>>>>
>>>>>>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>    Hi Sherry,
>>>>>>>>>>
>>>>>>>>>>  Thanks for your message. I have used the superlu_dist default
>>>>>>>>>> options. I did not realize that I was doing serial symbolic
>>>>>>>>>> factorization. That is probably the cause of my problem.
>>>>>>>>>>  Each node on Garnet has 60GB of usable memory and I can run with
>>>>>>>>>> 1, 2, 4, 8, 16 or 32 cores per node.
>>>>>>>>>>
>>>>>>>>>>  So I should use:
>>>>>>>>>>
>>>>>>>>>> -mat_superlu_dist_r 20
>>>>>>>>>> -mat_superlu_dist_c 32
>>>>>>>>>>
>>>>>>>>>>  How do you specify the parallel symbolic factorization option?
>>>>>>>>>> Is it -mat_superlu_dist_matinput 1?
>>>>>>>>>>
>>>>>>>>>>  Thanks,
>>>>>>>>>>
>>>>>>>>>>  Anthony
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>  The superlu_dist failure occurs during symbolic factorization.
>>>>>>>>>>> Since you are using serial symbolic factorization, it requires the
>>>>>>>>>>> entire graph of A to be available in the memory of one MPI task.
>>>>>>>>>>> How much memory do you have for each MPI task?
>>>>>>>>>>>
>>>>>>>>>>>  It won't help even if you use more processes.  You should try
>>>>>>>>>>> to use the parallel symbolic factorization option.
>>>>>>>>>>>
>>>>>>>>>>>  Another point.  You set up the process grid as:
>>>>>>>>>>>        Process grid nprow 32 x npcol 20
>>>>>>>>>>>  For better performance, you should swap the grid dimensions. That
>>>>>>>>>>> is, it's better to use 20 x 32; never make nprow larger than npcol.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  Sherry
>>>>>>>>>>>
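The process-grid advice in Sherry's message (do not make nprow larger than npcol) can be sketched in Python as follows. The helper names are hypothetical, and picking the most nearly square factorization is an added assumption: the thread only states that nprow should not exceed npcol. The option names -mat_superlu_dist_r and -mat_superlu_dist_c are the ones quoted elsewhere in this thread.

```python
import math


def choose_grid(nprocs: int) -> tuple[int, int]:
    """Factor nprocs as nprow x npcol with nprow <= npcol.

    Assumption: among all such factorizations, take the one closest to
    square. The thread itself only requires nprow <= npcol.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")
    nprow = math.isqrt(nprocs)      # largest candidate with nprow <= npcol
    while nprocs % nprow != 0:      # step down to the nearest exact divisor
        nprow -= 1
    return nprow, nprocs // nprow


def superlu_dist_grid_options(nprocs: int) -> str:
    """Format the grid as the PETSc run-time options used in this thread."""
    nprow, npcol = choose_grid(nprocs)
    return f"-mat_superlu_dist_r {nprow} -mat_superlu_dist_c {npcol}"


if __name__ == "__main__":
    # 640 processes is the 32 x 20 layout discussed in the thread.
    print(superlu_dist_grid_options(640))
```

For 640 processes this reproduces the 20 x 32 layout recommended above; for a prime process count it degenerates to a 1 x nprocs grid, which still satisfies nprow <= npcol.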
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>    I would suggest running a sequence of problems, 101 by 101,
>>>>>>>>>>>> 111 by 111, etc., and getting the memory usage in each case (when
>>>>>>>>>>>> you run out of memory you can get NO useful information out about
>>>>>>>>>>>> memory needs). You can then plot memory usage as a function of
>>>>>>>>>>>> problem size to get a handle on how much memory it is using.  You
>>>>>>>>>>>> can also run on more and more processes (which have a total of
>>>>>>>>>>>> more memory) to see how large a problem you may be able to reach.
>>>>>>>>>>>>
>>>>>>>>>>>>    MUMPS also has an "out of core" version (which we have never
>>>>>>>>>>>> used) that could, in theory anyway, let you get to large problems
>>>>>>>>>>>> if you have lots of disk space, but you are on your own figuring
>>>>>>>>>>>> out how to use it.
>>>>>>>>>>>>
>>>>>>>>>>>>   Barry
>>>>>>>>>>>>
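Barry's suggestion (measure memory on a sequence of smaller problems, then look at the trend) could be turned into a rough extrapolation along these lines. This is only a sketch: it assumes memory grows roughly as a power law in the number of unknowns, the function names are hypothetical, and the numbers below are synthetic placeholders generated from a known exponent, not measurements from this thread.

```python
import math


def fit_power_law(sizes, memory):
    """Least-squares fit of memory ~ c * size**p in log-log space.

    Returns the pair (c, p).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(m) for m in memory]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    p = sxy / sxx                     # slope of the log-log line
    c = math.exp(ybar - p * xbar)     # intercept, mapped back from log space
    return c, p


def predict(c, p, size):
    """Evaluate the fitted model at a new problem size."""
    return c * size ** p


# SYNTHETIC placeholder data: replace with memory actually reported by
# runs on 101x101, 111x111, ... grids. Values are generated from
# memory = 2.0e3 * unknowns**1.5 purely so the fit can be checked.
grid_points = [101, 111, 121, 131]
unknowns = [n * n for n in grid_points]
memory_bytes = [2.0e3 * u ** 1.5 for u in unknowns]

c, p = fit_power_law(unknowns, memory_bytes)
print(f"fitted exponent p = {p:.3f}")
print(f"extrapolated memory at 151x151: {predict(c, p, 151 * 151):.3e} bytes")
```

With real measurements, comparing the extrapolated value against the memory available per MPI task gives a first estimate of whether a 151x151 run can fit, subject to how well the power-law assumption holds.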
>>>>>>>>>>>> > On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Hi Jose,
>>>>>>>>>>>> >
>>>>>>>>>>>> > In my code, I use PETSc once to solve a linear system to get
>>>>>>>>>>>> the baseflow (without using SLEPc) and then I use SLEPc to do the
>>>>>>>>>>>> stability analysis of that baseflow. This is why there are some
>>>>>>>>>>>> SLEPc options that are not used in test.out-superlu_dist-151x151
>>>>>>>>>>>> (when I am solving for the baseflow with PETSc only). I have
>>>>>>>>>>>> attached a 101x101 case for which I get the eigenvalues. That case
>>>>>>>>>>>> works fine. However, if I increase to 151x151, I get the error
>>>>>>>>>>>> that you can see in test.out-superlu_dist-151x151 (similar error
>>>>>>>>>>>> with mumps: see test.out-mumps-151x151 line 2918). If you look at
>>>>>>>>>>>> the very end of the files test.out-superlu_dist-151x151 and
>>>>>>>>>>>> test.out-mumps-151x151, you will see that the last info message
>>>>>>>>>>>> printed is:
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Processor (after EPSSetFromOptions)  0    memory:
>>>>>>>>>>>> 0.65073152000E+08          =====>  (see line 807 of module_petsc.F90)
>>>>>>>>>>>> >
>>>>>>>>>>>> > This means that the memory error probably occurs in the call
>>>>>>>>>>>> to EPSSolve (see module_petsc.F90 line 810). I would like to
>>>>>>>>>>>> evaluate how much memory is required by the most memory intensive
>>>>>>>>>>>> operation within EPSSolve. Since I am solving a generalized EVP, I
>>>>>>>>>>>> would imagine that it would be the LU decomposition. But is there
>>>>>>>>>>>> an accurate way of doing it?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Before starting with iterative solvers, I would like to
>>>>>>>>>>>> exploit direct solvers as much as I can. I tried GMRES with the
>>>>>>>>>>>> default preconditioner at some point but I had convergence
>>>>>>>>>>>> problems. What solver/preconditioner would you recommend for a
>>>>>>>>>>>> generalized non-Hermitian (EPS_GNHEP) EVP?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Anthony
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > On 07/07/2015, at 02:33, Anthony Haas wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > > Hi,
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > I am computing eigenvalues using PETSc/SLEPc and
>>>>>>>>>>>> superlu_dist for the LU decomposition (my problem is a generalized
>>>>>>>>>>>> eigenvalue problem). The code runs fine for a grid with 101x101
>>>>>>>>>>>> but when I increase to 151x151, I get the following error:
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Can't expand MemType 1: jcol 16104   (and then [NID 00037]
>>>>>>>>>>>> 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > It seems to be a memory problem. I monitor the memory usage
>>>>>>>>>>>> as far as I can and it seems that memory usage is pretty low. The
>>>>>>>>>>>> most memory intensive part of the program is probably the LU
>>>>>>>>>>>> decomposition in the context of the generalized EVP. Is there a
>>>>>>>>>>>> way to evaluate how much memory will be required for that step? I
>>>>>>>>>>>> am currently running the debug version of the code which I would
>>>>>>>>>>>> assume would use more memory?
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > I have attached the output of the job. Note that the
>>>>>>>>>>>> program uses PETSc twice: 1) to solve a linear system, for which
>>>>>>>>>>>> no problem occurs, and 2) to solve the generalized EVP with SLEPc,
>>>>>>>>>>>> where I get the error.
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Thanks
>>>>>>>>>>>> > >
>>>>>>>>>>>> > > Anthony
>>>>>>>>>>>> > > <test.out-superlu_dist-151x151>
>>>>>>>>>>>> >
>>>>>>>>>>>> > In the output you are attaching there are no SLEPc objects in
>>>>>>>>>>>> the report and SLEPc options are not used. It seems that SLEPc
>>>>>>>>>>>> calls are skipped?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Do you get the same error with MUMPS? Have you tried to solve
>>>>>>>>>>>> linear systems with a preconditioned iterative solver?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Jose
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>>  >
>>>>>>>>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
