I've updated petsc to use v4.1. The changes are in the branch 'balay/update-superlu_dist-4.1' - and merged to 'next' for now.
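To test the update before it reaches 'master', the usual PETSc git workflow should be roughly the sketch below; the remote name 'origin' is an assumption:

    git fetch origin
    git checkout balay/update-superlu_dist-4.1    # or: git checkout next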
Satish

On Wed, 29 Jul 2015, Xiaoye S. Li wrote:

Thanks for the quick update. In the new tarball, I have already removed the junk files, as pointed out by Satish.

Sherry

On Wed, Jul 29, 2015 at 8:36 AM, Hong <[email protected]> wrote:

Sherry,
With your bugfix, superlu_dist-4.1 works now:

    petsc/src/ksp/ksp/examples/tutorials (master)
    $ mpiexec -n 4 ./ex10 -f0 Amat_binary.m -rhs 0 -pc_type lu \
        -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
    Number of iterations =   1
    Residual norm 2.11605e-11

Once you address Satish's request, we'll update the petsc interface to this version of superlu_dist.

Anthony:
Please download the latest superlu_dist-v4.1, then configure petsc with '--download-superlu_dist=superlu_dist_4.1.tar.gz'.
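For reference, a complete configure line would look roughly like the sketch below. The complex-scalar flag is inferred from the complex matrices discussed later in this thread; compiler and MPI options for a specific machine are omitted:

    ./configure --with-scalar-type=complex \
        --download-superlu_dist=superlu_dist_4.1.tar.gz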
Hong

On Tue, Jul 28, 2015 at 11:11 AM, Satish Balay <[email protected]> wrote:

Sherry,

One minor issue with the tarball. I see the following new files in the v4.1 tarball [when comparing it with v4.0]. Some of these files are perhaps junk files - and can be removed from the tarball?

    EXAMPLE/dscatter.c.bak
    EXAMPLE/g10.cua
    EXAMPLE/g4.cua
    EXAMPLE/g4.postorder.eps
    EXAMPLE/g4.rua
    EXAMPLE/g4_postorder.jpg
    EXAMPLE/hostname
    EXAMPLE/pdgssvx.c
    EXAMPLE/pdgstrf2.c
    EXAMPLE/pwd
    EXAMPLE/pzgstrf2.c
    EXAMPLE/pzgstrf_v3.3.c
    EXAMPLE/pzutil.c
    EXAMPLE/test.bat
    EXAMPLE/test.cpu.bat
    EXAMPLE/test.err
    EXAMPLE/test.err.1
    EXAMPLE/zlook_ahead_update.c
    FORTRAN/make.out
    FORTRAN/zcreate_dist_matrix.c
    MAKE_INC/make.xc30
    SRC/int_t
    SRC/lnbrow
    SRC/make.out
    SRC/rnbrow
    SRC/temp
    SRC/temp1

Thanks,
Satish

On Tue, 28 Jul 2015, Xiaoye S. Li wrote:

I am checking v4.1 now. I'll let you know when I've fixed the problem.

Sherry

On Tue, Jul 28, 2015 at 8:27 AM, Hong <[email protected]> wrote:

Sherry,
I tested with superlu_dist v4.1. The extra printings are gone, but the hang remains. It hangs at

    #5  0x00007fde5af1c818 in PMPI_Wait (request=0xb6e4e0, status=0x7fff9cd83d60)
        at src/mpi/pt2pt/wait.c:168
    #6  0x00007fde602dd635 in pzgstrf (options=0x9202f0, m=4900, n=4900,
        anorm=13.738475134194639, LUstruct=0x9203c8, grid=0x9202c8,
        stat=0x7fff9cd84880, info=0x7fff9cd848bc) at pzgstrf.c:1308

        if (recv_req[0] != MPI_REQUEST_NULL) {
    -->     MPI_Wait (&recv_req[0], &status);

We will update the petsc interface to superlu_dist v4.1.

Hong

On Mon, Jul 27, 2015 at 11:33 PM, Xiaoye S. Li <[email protected]> wrote:

Hong,
Thanks for trying it out.
The extra printings are not properly guarded by the print level. I will fix that. I will look into the hang problem soon.

Sherry

On Mon, Jul 27, 2015 at 7:50 PM, Hong <[email protected]> wrote:

Sherry,

I can reproduce the hang using petsc/src/ksp/ksp/examples/tutorials/ex10.c:

    mpiexec -n 4 ./ex10 -f0 /homes/hzhang/tmp/Amat_binary.m -rhs 0 -pc_type lu \
        -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
    ...
    .. Starting with 1 OpenMP threads
    [0] .. BIG U size 1342464
    [0] .. BIG V size 131072
    Max row size is 1311
    Using buffer_size of 5000000
    Threads per process 1
    ...

Using a debugger (with the petsc option '-start_in_debugger'), I find that the hang occurs at

    #0  0x00007f117d870998 in __GI___poll (fds=0x20da750, nfds=4,
        timeout=<optimized out>, timeout@entry=-1)
        at ../sysdeps/unix/sysv/linux/poll.c:83
    #1  0x00007f117de9f7de in MPIDU_Sock_wait (sock_set=0x20da550,
        millisecond_timeout=millisecond_timeout@entry=-1,
        eventp=eventp@entry=0x7fff654930b0)
        at src/mpid/common/sock/poll/sock_wait.i:123
    #2  0x00007f117de898b8 in MPIDI_CH3i_Progress_wait (
        progress_state=0x7fff65493120)
        at src/mpid/ch3/channels/sock/src/ch3_progress.c:218
    #3  MPIDI_CH3I_Progress (blocking=blocking@entry=1,
        state=state@entry=0x7fff65493120)
        at src/mpid/ch3/channels/sock/src/ch3_progress.c:921
    #4  0x00007f117de1a559 in MPIR_Wait_impl (request=request@entry=0x262df90,
        status=status@entry=0x7fff65493390) at src/mpi/pt2pt/wait.c:67
    #5  0x00007f117de1a818 in PMPI_Wait (request=0x262df90, status=0x7fff65493390)
        at src/mpi/pt2pt/wait.c:168
    #6  0x00007f11831da557 in pzgstrf (options=0x23dfda0, m=4900, n=4900,
        anorm=13.738475134194639, LUstruct=0x23dfe78, grid=0x23dfd78,
        stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgstrf.c:1308
    #7  0x00007f11831bf3bd in pzgssvx (options=0x23dfda0, A=0x23dfe30,
        ScalePermstruct=0x23dfe50, B=0x0, ldb=1225, nrhs=0, grid=0x23dfd78,
        LUstruct=0x23dfe78, SOLVEstruct=0x23dfe98, berr=0x0,
        stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgssvx.c:1063
    #8  0x00007f11825c2340 in MatLUFactorNumeric_SuperLU_DIST (F=0x23a0110,
        A=0x21bb7e0, info=0x2355068)
        at /sandbox/hzhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:411
    #9  0x00007f1181c6c567 in MatLUFactorNumeric (fact=0x23a0110, mat=0x21bb7e0,
        info=0x2355068) at /sandbox/hzhang/petsc/src/mat/interface/matrix.c:2946
    #10 0x00007f1182a56489 in PCSetUp_LU (pc=0x2353a10)
        at /sandbox/hzhang/petsc/src/ksp/pc/impls/factor/lu/lu.c:152
    #11 0x00007f1182b16f24 in PCSetUp (pc=0x2353a10)
        at /sandbox/hzhang/petsc/src/ksp/pc/interface/precon.c:983
    #12 0x00007f1182be61b5 in KSPSetUp (ksp=0x232c2a0)
        at /sandbox/hzhang/petsc/src/ksp/ksp/interface/itfunc.c:332
    #13 0x0000000000405a31 in main (argc=11, args=0x7fff65499578)
        at /sandbox/hzhang/petsc/src/ksp/ksp/examples/tutorials/ex10.c:312

You may take a look at it. Sequential symbolic factorization works fine.

Why does superlu_dist (v4.0) in complex precision display the following?

    .. Starting with 1 OpenMP threads
    [0] .. BIG U size 1342464
    [0] .. BIG V size 131072
    Max row size is 1311
    Using buffer_size of 5000000
    Threads per process 1
    ...

I realize that I use superlu_dist v4.0. Would v4.1 work? I'll give it a try tomorrow.
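For anyone reproducing this, the run above can be started under a debugger by adding PETSc's '-start_in_debugger' option to the same command line:

    mpiexec -n 4 ./ex10 -f0 Amat_binary.m -rhs 0 -pc_type lu \
        -pc_factor_mat_solver_package superlu_dist \
        -mat_superlu_dist_parsymbfact -start_in_debugger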
Hong

On Mon, Jul 27, 2015 at 1:25 PM, Anthony Paul Haas <[email protected]> wrote:

Hi Hong,

No, that is not the correct matrix. Note that I forgot to mention that it is a complex matrix. I tried loading the matrix I sent you this morning with:

    !...Load a Matrix in Binary Format
    call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
    call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
    call MatSetType(DLOAD,MATAIJ,ierr)
    call MatLoad(DLOAD,viewer,ierr)
    call PetscViewerDestroy(viewer,ierr)

    call MatView(DLOAD,PETSC_VIEWER_STDOUT_WORLD,ierr)

The first 37 rows should look like this:

    Mat Object: 2 MPI processes
      type: mpiaij
    row 0: (0, 1)
    row 1: (1, 1)
    row 2: (2, 1)
    row 3: (3, 1)
    row 4: (4, 1)
    row 5: (5, 1)
    row 6: (6, 1)
    row 7: (7, 1)
    row 8: (8, 1)
    row 9: (9, 1)
    row 10: (10, 1)
    row 11: (11, 1)
    row 12: (12, 1)
    row 13: (13, 1)
    row 14: (14, 1)
    row 15: (15, 1)
    row 16: (16, 1)
    row 17: (17, 1)
    row 18: (18, 1)
    row 19: (19, 1)
    row 20: (20, 1)
    row 21: (21, 1)
    row 22: (22, 1)
    row 23: (23, 1)
    row 24: (24, 1)
    row 25: (25, 1)
    row 26: (26, 1)
    row 27: (27, 1)
    row 28: (28, 1)
    row 29: (29, 1)
    row 30: (30, 1)
    row 31: (31, 1)
    row 32: (32, 1)
    row 33: (33, 1)
    row 34: (34, 1)
    row 35: (35, 1)
    row 36: (1, -41.2444) (35, -41.2444) (36, 118.049 - 0.999271 i) (37, -21.447)
      (38, 5.18873) (39, -2.34856) (40, 1.3607) (41, -0.898206) (42, 0.642715)
      (43, -0.48593) (44, 0.382471) (45, -0.310476) (46, 0.258302) (47, -0.219268)
      (48, 0.189304) (49, -0.165815) (50, 0.147076) (51, -0.131907) (52, 0.119478)
      (53, -0.109189) (54, 0.1006) (55, -0.0933795) (56, 0.0872779) (57, -0.0821019)
      (58, 0.0777011) (59, -0.0739575) (60, 0.0707775) (61, -0.0680868) (62, 0.0658258)
      (63, -0.0639473) (64, 0.0624137) (65, -0.0611954) (66, 0.0602698) (67, -0.0596202)
      (68, 0.0592349) (69, -0.0295536) (71, -21.447) (106, 5.18873) (141, -2.34856)
      (176, 1.3607) (211, -0.898206) (246, 0.642715) (281, -0.48593) (316, 0.382471)
      (351, -0.310476) (386, 0.258302) (421, -0.219268) (456, 0.189304) (491, -0.165815)
      (526, 0.147076) (561, -0.131907) (596, 0.119478) (631, -0.109189) (666, 0.1006)
      (701, -0.0933795) (736, 0.0872779) (771, -0.0821019) (806, 0.0777011)
      (841, -0.0739575) (876, 0.0707775) (911, -0.0680868) (946, 0.0658258)
      (981, -0.0639473) (1016, 0.0624137) (1051, -0.0611954) (1086, 0.0602698)
      (1121, -0.0596202) (1156, 0.0592349) (1191, -0.0295536) (1261, 0)
      (3676, 117.211) (3711, -58.4801) (3746, -78.3633) (3781, 29.4911)
      (3816, -15.8073) (3851, 9.94324) (3886, -6.87205) (3921, 5.05774)
      (3956, -3.89521) (3991, 3.10522) (4026, -2.54388) (4061, 2.13082)
      (4096, -1.8182) (4131, 1.57606) (4166, -1.38491) (4201, 1.23155)
      (4236, -1.10685) (4271, 1.00428) (4306, -0.919116) (4341, 0.847829)
      (4376, -0.787776) (4411, 0.736933) (4446, -0.693735) (4481, 0.656958)
      (4516, -0.625638) (4551, 0.599007) (4586, -0.576454) (4621, 0.557491)
      (4656, -0.541726) (4691, 0.528849) (4726, -0.518617) (4761, 0.51084)
      (4796, -0.50538) (4831, 0.502142) (4866, -0.250534)

Thanks,

Anthony

On Fri, Jul 24, 2015 at 7:56 PM, Hong <[email protected]> wrote:

Anthony:
I tested your Amat_binary.m using petsc/src/ksp/ksp/examples/tutorials/ex10.c. Your matrix has many zero rows:

    ./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more
    Mat Object: 1 MPI processes
      type: seqaij
    row 0: (0, 1)
    row 1: (1, 0)
    row 2: (2, 1)
    row 3: (3, 0)
    row 4: (4, 1)
    row 5: (5, 0)
    row 6: (6, 1)
    row 7: (7, 0)
    row 8: (8, 1)
    row 9: (9, 0)
    ...
    row 36: (1, 1) (35, 0) (36, 1) (37, 0) (38, 1) (39, 0) (40, 1) (41, 0)
      (42, 1) (43, 0) (44, 1) (45, 0) (46, 1) (47, 0) (48, 1) (49, 0) (50, 1)
      (51, 0) (52, 1) (53, 0) (54, 1) (55, 0) (56, 1) (57, 0) (58, 1) (59, 0)
      (60, 1) ...

Did you send us the correct matrix?

> I ran my code through valgrind and gdb as suggested by Barry. I am now
> coming back to a problem I have had while running with parallel symbolic
> factorization. I am attaching a test matrix (petsc binary format) that I
> LU-decompose and then use to solve a linear system (see code below). I can
> run on 2 processors with parsymbfact, or on 4 processors without
> parsymbfact. However, if I run on 4 procs with parsymbfact, the code just
> hangs. Below is the simplified test case that I have used to test. The
> matrices A and B are built somewhere else in my program. The matrix I am
> attaching is A-sigma*B (see below).
>
> One thing is that I don't know, for sparse matrices, what the optimum
> number of processors to use for an LU decomposition is. Does it depend on
> the total number of nonzeros? Do you have an easy way to compute it?

You have to experiment with your matrix on the target machine to find out.

Hong

The test case quoted in the exchange above:

    Subroutine HowBigLUCanBe(rank)

      IMPLICIT NONE

      integer(i4b),intent(in) :: rank
      integer(i4b)            :: i,ct
      real(dp)                :: begin,endd
      complex(dpc)            :: sigma

      PetscErrorCode ierr

      if (rank==0) call cpu_time(begin)

      if (rank==0) then
         write(*,*)
         write(*,*)'Testing How Big LU Can Be...'
         write(*,*)'============================'
         write(*,*)
      endif

      sigma = (1.0d0,0.0d0)
      call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B

      !.....Write Matrix to ASCII and Binary Format
      !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
      !call MatView(DXX,viewer,ierr)
      !call PetscViewerDestroy(viewer,ierr)

      call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
      call MatView(A,viewer,ierr)
      call PetscViewerDestroy(viewer,ierr)

      !.....Create Linear Solver Context
      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)

      !.....Set operators. Here the matrix that defines the linear system
      !     also serves as the preconditioning matrix.
      !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
      call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B

      !.....Set Relative and Absolute Tolerances and Use Default for Divergence Tol
      tol = 1.e-10
      call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)

      !.....Set the Direct (LU) Solver
      call KSPSetType(ksp,KSPPREONLY,ierr)
      call KSPGetPC(ksp,pc,ierr)
      call PCSetType(pc,PCLU,ierr)
      call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS

      !.....Create Right-Hand-Side Vector
      call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
      call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)

      allocate(xwork1(IendA-IstartA))
      allocate(loc(IendA-IstartA))

      ct=0
      do i=IstartA,IendA-1
        ct=ct+1
        loc(ct)=i
        xwork1(ct)=(1.0d0,0.0d0)
      enddo

      call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
      call VecZeroEntries(sol,ierr)

      deallocate(xwork1,loc)

      !.....Assemble Vectors
      call VecAssemblyBegin(frhs,ierr)
      call VecAssemblyEnd(frhs,ierr)

      !.....Solve the Linear System
      call KSPSolve(ksp,frhs,sol,ierr)

      !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)

      if (rank==0) then
         call cpu_time(endd)
         write(*,*)
         print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin
      endif

      call SlepcFinalize(ierr)

      STOP

    end Subroutine HowBigLUCanBe

On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:

Indeed, the parallel symbolic factorization routine needs a power-of-2 number of processes; however, you can use however many processes you need. Internally, we redistribute the matrix to the nearest power-of-2 number of processes, do the symbolic factorization, then redistribute back to all the processes for the factorization, triangular solve, etc. So there is no restriction from the user's viewpoint.

It's difficult to tell what the problem is. Do you think you can print your matrix? Then I can do some debugging by running superlu_dist standalone.

Sherry

On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <[email protected]> wrote:

Hi,

I have used the switch -mat_superlu_dist_parsymbfact in my PBS script. However, although my program worked fine with sequential symbolic factorization, I get one of the following two behaviors when I run with parallel symbolic factorization (depending on the number of processors that I use):

1) the program just hangs (it seems stuck in some subroutine ==> see test.out-hangs)
2) I get a floating point exception ==> see test.out-floating-point-exception

Note that, as suggested in the SuperLU manual, I use a power-of-2 number of procs.
Are there any tunable parameters for the parallel symbolic factorization? Note that when I build my sparse matrix, most elements I add are nonzero of course, but to simplify the programming I also add a few zero elements in the sparse matrix. I was thinking that maybe, if the parallel symbolic factorization proceeds by block, there could be some blocks where the pivot would be zero, hence creating the FPE??

Thanks,

Anthony

On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <[email protected]> wrote:

Did you find out how to change the option to use parallel symbolic factorization? Perhaps the PETSc team can help.

Sherry

On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <[email protected]> wrote:

Is there an inquiry function that tells you all the available options?

Sherry

On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <[email protected]> wrote:

Hi Sherry,

Thanks for your message. I have used the superlu_dist default options. I did not realize that I was doing serial symbolic factorization. That is probably the cause of my problem. Each node on Garnet has 60GB of usable memory, and I can run with 1, 2, 4, 8, 16, or 32 cores per node.

So I should use:

    -mat_superlu_dist_r 20
    -mat_superlu_dist_c 32

How do you specify the parallel symbolic factorization option? Is it -mat_superlu_dist_matinput 1?

Thanks,

Anthony

On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <[email protected]> wrote:

The superlu_dist failure occurs during symbolic factorization. Since you are using serial symbolic factorization, it requires the entire graph of A to be available in the memory of one MPI task. How much memory do you have for each MPI task?

It won't help even if you use more processes. You should try the parallel symbolic factorization option.

Another point. You set up the process grid as:

    Process grid nprow 32 x npcol 20

For better performance, you should swap the grid dimensions. That is, it's better to use 20 x 32; never make nprow larger than npcol.
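Putting the two suggestions together - the swapped 20 x 32 grid and parallel symbolic factorization - a run on 640 (= 20 x 32) MPI processes would look roughly like the sketch below; the executable name is illustrative:

    mpiexec -n 640 ./my_solver -pc_type lu \
        -pc_factor_mat_solver_package superlu_dist \
        -mat_superlu_dist_r 20 -mat_superlu_dist_c 32 \
        -mat_superlu_dist_parsymbfact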
Sherry

On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <[email protected]> wrote:

I would suggest running a sequence of problems, 101 by 101, 111 by 111, etc., and getting the memory usage in each case (when you run out of memory you get NO useful information about memory needs). You can then plot memory usage as a function of problem size to get a handle on how much memory it is using. You can also run on more and more processes (which have a total of more memory) to see how large a problem you may be able to reach.

MUMPS also has an "out of core" version (which we have never used) that could, in theory anyway, let you get to large problems if you have lots of disk space, but you are on your own figuring out how to use it.

Barry

On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <[email protected]> wrote:

Hi Jose,

In my code, I use PETSc once to solve a linear system to get the baseflow (without using SLEPc), and then I use SLEPc to do the stability analysis of that baseflow. This is why there are some SLEPc options that are not used in test.out-superlu_dist-151x151 (when I am solving for the baseflow with PETSc only). I have attached a 101x101 case for which I get the eigenvalues. That case works fine. However, if I increase to 151x151, I get the error that you can see in test.out-superlu_dist-151x151 (similar error with mumps: see test.out-mumps-151x151, line 2918). If you look at the very end of the files test.out-superlu_dist-151x151 and test.out-mumps-151x151, you will see that the last info message printed is:

    On Processor (after EPSSetFromOptions)  0  memory:  0.65073152000E+08  =====>  (see line 807 of module_petsc.F90)

This means that the memory error probably occurs in the call to EPSSolve (see module_petsc.F90, line 810). I would like to evaluate how much memory is required by the most memory-intensive operation within EPSSolve. Since I am solving a generalized EVP, I would imagine that it would be the LU decomposition. But is there an accurate way of doing it?
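One way to bracket the cost of EPSSolve is to query PETSc's memory accounting just before and after the call. A minimal sketch, assuming the same eps, rank, and ierr variables as module_petsc.F90 (PetscMemoryGetCurrentUsage reports the calling process's current resident memory, in bytes):

    PetscLogDouble :: mem_before, mem_after

    call PetscMemoryGetCurrentUsage(mem_before,ierr)   ! memory before the solve
    call EPSSolve(eps,ierr)
    call PetscMemoryGetCurrentUsage(mem_after,ierr)    ! memory after the solve
    if (rank==0) then
       print '("Memory before/after EPSSolve: ",e15.7," / ",e15.7," bytes")', &
             mem_before, mem_after
    endif

Repeating this over the sequence of grid sizes Barry suggests would give the memory-versus-problem-size curve directly.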
Before starting with iterative solvers, I would like to exploit direct solvers as much as I can. I tried GMRES with the default preconditioner at some point, but I had convergence problems. What solver/preconditioner would you recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?

Thanks,

Anthony

On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <[email protected]> wrote:

On 07/07/2015, at 02:33, Anthony Haas wrote:

> Hi,
>
> I am computing eigenvalues using PETSc/SLEPc and superlu_dist for the LU
> decomposition (my problem is a generalized eigenvalue problem). The code
> runs fine for a 101x101 grid, but when I increase to 151x151, I get the
> following error:
>
>     Can't expand MemType 1: jcol 16104
>
> (and then [NID 00037] 2015-07-06 19:19:17 Apid 31025976: OOM killer
> terminated this process.)
>
> It seems to be a memory problem. I monitor the memory usage as far as I
> can, and it seems that memory usage is pretty low. The most
> memory-intensive part of the program is probably the LU decomposition in
> the context of the generalized EVP. Is there a way to evaluate how much
> memory will be required for that step? I am currently running the debug
> version of the code, which I would assume uses more memory?
>
> I have attached the output of the job. Note that the program uses PETSc
> twice: 1) to solve a linear system, for which no problem occurs, and
> 2) to solve the generalized EVP with SLEPc, where I get the error.
>
> Thanks
>
> Anthony
> <test.out-superlu_dist-151x151>

In the output you are attaching, there are no SLEPc objects in the report and SLEPc options are not used. It seems that the SLEPc calls are skipped?

Do you get the same error with MUMPS? Have you tried to solve the linear systems with a preconditioned iterative solver?

Jose

<module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
