https://bitbucket.org/petsc/petsc/commits/1216793c5e02eeb91a6711f4ee34b47012a7925a
All it does is update --download-superlu_dist to use the v4.1 tarball by default, i.e., what you currently have with --download-superlu_dist=superlu_dist_4.1.tar.gz is the same thing.

Satish

On Thu, 30 Jul 2015, Anthony Haas wrote:

Hi Hong, Satish,

I have been using petsc-3.6.1 with superlu_dist 4.1 (--download-superlu_dist=superlu_dist_4.1.tar.gz) for a few days now. It seems to be working fine. However, reading Satish's email below, I wonder if there is some kind of patch I need to apply for PETSc?

Thanks,
Anthony

On 07/30/2015 01:51 PM, Satish Balay wrote:

I've updated petsc to use v4.1. The changes are in branch 'balay/update-superlu_dist-4.1' - and merged to 'next' for now.

Satish

On Wed, 29 Jul 2015, Xiaoye S. Li wrote:

Thanks for the quick update. In the new tarball, I have already removed the junk files, as pointed out by Satish.

Sherry

On Wed, Jul 29, 2015 at 8:36 AM, Hong <[email protected]> wrote:

Sherry,
With your bugfix, superlu_dist-4.1 works now:

  petsc/src/ksp/ksp/examples/tutorials (master)
  $ mpiexec -n 4 ./ex10 -f0 Amat_binary.m -rhs 0 -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
  Number of iterations = 1
  Residual norm 2.11605e-11

Once you address Satish's request, we'll update the petsc interface to this version of superlu_dist.

Anthony:
Please download the latest superlu_dist-v4.1, then configure petsc with '--download-superlu_dist=superlu_dist_4.1.tar.gz'.

Hong

On Tue, Jul 28, 2015 at 11:11 AM, Satish Balay <[email protected]> wrote:

Sherry,

One minor issue with the tarball. I see the following new files in the v4.1 tarball [when comparing it with v4.0]. Some of these files are perhaps junk files - and can be removed from the tarball?

  EXAMPLE/dscatter.c.bak
  EXAMPLE/g10.cua
  EXAMPLE/g4.cua
  EXAMPLE/g4.postorder.eps
  EXAMPLE/g4.rua
  EXAMPLE/g4_postorder.jpg
  EXAMPLE/hostname
  EXAMPLE/pdgssvx.c
  EXAMPLE/pdgstrf2.c
  EXAMPLE/pwd
  EXAMPLE/pzgstrf2.c
  EXAMPLE/pzgstrf_v3.3.c
  EXAMPLE/pzutil.c
  EXAMPLE/test.bat
  EXAMPLE/test.cpu.bat
  EXAMPLE/test.err
  EXAMPLE/test.err.1
  EXAMPLE/zlook_ahead_update.c
  FORTRAN/make.out
  FORTRAN/zcreate_dist_matrix.c
  MAKE_INC/make.xc30
  SRC/int_t
  SRC/lnbrow
  SRC/make.out
  SRC/rnbrow
  SRC/temp
  SRC/temp1

Thanks,
Satish

On Tue, 28 Jul 2015, Xiaoye S. Li wrote:

I am checking v4.1 now. I'll let you know when I have fixed the problem.

Sherry

On Tue, Jul 28, 2015 at 8:27 AM, Hong <[email protected]> wrote:

Sherry,
I tested with superlu_dist v4.1. The extra printings are gone, but the hang remains.
It hangs at

  #5 0x00007fde5af1c818 in PMPI_Wait (request=0xb6e4e0, status=0x7fff9cd83d60)
     at src/mpi/pt2pt/wait.c:168
  #6 0x00007fde602dd635 in pzgstrf (options=0x9202f0, m=4900, n=4900,
     anorm=13.738475134194639, LUstruct=0x9203c8, grid=0x9202c8,
     stat=0x7fff9cd84880, info=0x7fff9cd848bc) at pzgstrf.c:1308

  if (recv_req[0] != MPI_REQUEST_NULL) {
  --> MPI_Wait (&recv_req[0], &status);

We will update the petsc interface to superlu_dist v4.1.

Hong

On Mon, Jul 27, 2015 at 11:33 PM, Xiaoye S. Li <[email protected]> wrote:

Hong,
Thanks for trying it out. The extra printings are not properly guarded by the print level; I will fix that. I will look into the hang problem soon.

Sherry

On Mon, Jul 27, 2015 at 7:50 PM, Hong <[email protected]> wrote:

Sherry,

I can reproduce the hang using petsc/src/ksp/ksp/examples/tutorials/ex10.c:

  mpiexec -n 4 ./ex10 -f0 /homes/hzhang/tmp/Amat_binary.m -rhs 0 -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_parsymbfact
  ...
  .. Starting with 1 OpenMP threads
  [0] .. BIG U size 1342464
  [0] .. BIG V size 131072
  Max row size is 1311
  Using buffer_size of 5000000
  Threads per process 1
  ...
Using a debugger (with the petsc option '-start_in_debugger'), I find that the hang occurs at

  #0  0x00007f117d870998 in __GI___poll (fds=0x20da750, nfds=4, timeout=<optimized out>, timeout@entry=-1)
      at ../sysdeps/unix/sysv/linux/poll.c:83
  #1  0x00007f117de9f7de in MPIDU_Sock_wait (sock_set=0x20da550, millisecond_timeout=millisecond_timeout@entry=-1,
      eventp=eventp@entry=0x7fff654930b0) at src/mpid/common/sock/poll/sock_wait.i:123
  #2  0x00007f117de898b8 in MPIDI_CH3i_Progress_wait (progress_state=0x7fff65493120)
      at src/mpid/ch3/channels/sock/src/ch3_progress.c:218
  #3  MPIDI_CH3I_Progress (blocking=blocking@entry=1, state=state@entry=0x7fff65493120)
      at src/mpid/ch3/channels/sock/src/ch3_progress.c:921
  #4  0x00007f117de1a559 in MPIR_Wait_impl (request=request@entry=0x262df90, status=status@entry=0x7fff65493390)
      at src/mpi/pt2pt/wait.c:67
  #5  0x00007f117de1a818 in PMPI_Wait (request=0x262df90, status=0x7fff65493390)
      at src/mpi/pt2pt/wait.c:168
  #6  0x00007f11831da557 in pzgstrf (options=0x23dfda0, m=4900, n=4900, anorm=13.738475134194639,
      LUstruct=0x23dfe78, grid=0x23dfd78, stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgstrf.c:1308
  #7  0x00007f11831bf3bd in pzgssvx (options=0x23dfda0, A=0x23dfe30, ScalePermstruct=0x23dfe50, B=0x0,
      ldb=1225, nrhs=0, grid=0x23dfd78, LUstruct=0x23dfe78, SOLVEstruct=0x23dfe98, berr=0x0,
      stat=0x7fff65493ea0, info=0x7fff65493edc) at pzgssvx.c:1063
  #8  0x00007f11825c2340 in MatLUFactorNumeric_SuperLU_DIST (F=0x23a0110, A=0x21bb7e0, info=0x2355068)
      at /sandbox/hzhang/petsc/src/mat/impls/aij/mpi/superlu_dist/superlu_dist.c:411
  #9  0x00007f1181c6c567 in MatLUFactorNumeric (fact=0x23a0110, mat=0x21bb7e0, info=0x2355068)
      at /sandbox/hzhang/petsc/src/mat/interface/matrix.c:2946
  #10 0x00007f1182a56489 in PCSetUp_LU (pc=0x2353a10)
      at /sandbox/hzhang/petsc/src/ksp/pc/impls/factor/lu/lu.c:152
  #11 0x00007f1182b16f24 in PCSetUp (pc=0x2353a10)
      at /sandbox/hzhang/petsc/src/ksp/pc/interface/precon.c:983
  #12 0x00007f1182be61b5 in KSPSetUp (ksp=0x232c2a0)
      at /sandbox/hzhang/petsc/src/ksp/ksp/interface/itfunc.c:332
  #13 0x0000000000405a31 in main (argc=11, args=0x7fff65499578)
      at /sandbox/hzhang/petsc/src/ksp/ksp/examples/tutorials/ex10.c:312

You may take a look at it. Sequential symbolic factorization works fine.
Why does superlu_dist (v4.0) in complex precision display

  .. Starting with 1 OpenMP threads
  [0] .. BIG U size 1342464
  [0] .. BIG V size 131072
  Max row size is 1311
  Using buffer_size of 5000000
  Threads per process 1
  ...

I realize that I use superlu_dist v4.0. Would v4.1 work? I'll give it a try tomorrow.

Hong

On Mon, Jul 27, 2015 at 1:25 PM, Anthony Paul Haas <[email protected]> wrote:

Hi Hong,

No, that is not the correct matrix. Note that I forgot to mention that it is a complex matrix. I tried loading the matrix I sent you this morning with:

  !...Load a Matrix in Binary Format
  call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
  call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
  call MatSetType(DLOAD,MATAIJ,ierr)
  call MatLoad(DLOAD,viewer,ierr)
  call PetscViewerDestroy(viewer,ierr)

  call MatView(DLOAD,PETSC_VIEWER_STDOUT_WORLD,ierr)

The first 37 rows should look like this:

  Mat Object: 2 MPI processes
    type: mpiaij
  row 0: (0, 1)
  row 1: (1, 1)
  row 2: (2, 1)
  row 3: (3, 1)
  row 4: (4, 1)
  row 5: (5, 1)
  row 6: (6, 1)
  row 7: (7, 1)
  row 8: (8, 1)
  row 9: (9, 1)
  row 10: (10, 1)
  row 11: (11, 1)
  row 12: (12, 1)
  row 13: (13, 1)
  row 14: (14, 1)
  row 15: (15, 1)
  row 16: (16, 1)
  row 17: (17, 1)
  row 18: (18, 1)
  row 19: (19, 1)
  row 20: (20, 1)
  row 21: (21, 1)
  row 22: (22, 1)
  row 23: (23, 1)
  row 24: (24, 1)
  row 25: (25, 1)
  row 26: (26, 1)
  row 27: (27, 1)
  row 28: (28, 1)
  row 29: (29, 1)
  row 30: (30, 1)
  row 31: (31, 1)
  row 32: (32, 1)
  row 33: (33, 1)
  row 34: (34, 1)
  row 35: (35, 1)
  row 36: (1, -41.2444) (35, -41.2444) (36, 118.049 - 0.999271 i) (37, -21.447) (38, 5.18873)
    (39, -2.34856) (40, 1.3607) (41, -0.898206) (42, 0.642715) (43, -0.48593) (44, 0.382471)
    (45, -0.310476) (46, 0.258302) (47, -0.219268) (48, 0.189304) (49, -0.165815) (50, 0.147076)
    (51, -0.131907) (52, 0.119478)
    (53, -0.109189) (54, 0.1006) (55, -0.0933795) (56, 0.0872779) (57, -0.0821019) (58, 0.0777011)
    (59, -0.0739575) (60, 0.0707775) (61, -0.0680868) (62, 0.0658258) (63, -0.0639473) (64, 0.0624137)
    (65, -0.0611954) (66, 0.0602698) (67, -0.0596202) (68, 0.0592349) (69, -0.0295536) (71, -21.447)
    (106, 5.18873) (141, -2.34856) (176, 1.3607) (211, -0.898206) (246, 0.642715) (281, -0.48593)
    (316, 0.382471) (351, -0.310476) (386, 0.258302) (421, -0.219268) (456, 0.189304) (491, -0.165815)
    (526, 0.147076) (561, -0.131907) (596, 0.119478) (631, -0.109189) (666, 0.1006) (701, -0.0933795)
    (736, 0.0872779) (771, -0.0821019) (806, 0.0777011) (841, -0.0739575) (876, 0.0707775)
    (911, -0.0680868) (946, 0.0658258) (981, -0.0639473) (1016, 0.0624137) (1051, -0.0611954)
    (1086, 0.0602698) (1121, -0.0596202) (1156, 0.0592349) (1191, -0.0295536) (1261, 0) (3676, 117.211)
    (3711, -58.4801) (3746, -78.3633) (3781, 29.4911) (3816, -15.8073) (3851, 9.94324) (3886, -6.87205)
    (3921, 5.05774) (3956, -3.89521) (3991, 3.10522) (4026, -2.54388) (4061, 2.13082) (4096, -1.8182)
    (4131, 1.57606) (4166, -1.38491) (4201, 1.23155) (4236, -1.10685) (4271, 1.00428) (4306, -0.919116)
    (4341, 0.847829) (4376, -0.787776) (4411, 0.736933) (4446, -0.693735) (4481, 0.656958)
    (4516, -0.625638) (4551, 0.599007) (4586, -0.576454) (4621, 0.557491) (4656, -0.541726)
    (4691, 0.528849) (4726, -0.518617) (4761, 0.51084) (4796, -0.50538) (4831, 0.502142) (4866, -0.250534)

Thanks,
Anthony

On Fri, Jul 24, 2015 at 7:56 PM, Hong <[email protected]> wrote:

Anthony:
I tested your Amat_binary.m using petsc/src/ksp/ksp/examples/tutorials/ex10.c. Your matrix has many zero rows:

  ./ex10 -f0 ~/tmp/Amat_binary.m -rhs 0 -mat_view |more
  Mat Object: 1 MPI processes
    type: seqaij
  row 0: (0, 1)
  row 1: (1, 0)
  row 2: (2, 1)
  row 3: (3, 0)
  row 4: (4, 1)
  row 5: (5, 0)
  row 6: (6, 1)
  row 7: (7, 0)
  row 8: (8, 1)
  row 9: (9, 0)
  ...
  row 36: (1, 1) (35, 0) (36, 1) (37, 0) (38, 1) (39, 0) (40, 1) (41, 0) (42, 1) (43, 0) (44, 1)
    (45, 0) (46, 1) (47, 0) (48, 1) (49, 0) (50, 1) (51, 0) (52, 1) (53, 0) (54, 1) (55, 0) (56, 1)
    (57, 0) (58, 1) (59, 0) (60, 1) ...

Did you send us the correct matrix?

> I ran my code through valgrind and gdb as suggested by Barry. I am now coming back to a problem
> I have had while running with parallel symbolic factorization. I am attaching a test matrix (petsc
> binary format) that I LU-decompose and then use to solve a linear system (see code below). I can
> run on 2 processors with parsymbfact, or on 4 processors without parsymbfact. However, if I run on
> 4 procs with parsymbfact, the code just hangs. Below is the simplified test case that I have used
> to test. The matrices A and B are built somewhere else in my program. The matrix I am attaching is
> A-sigma*B (see below).
>
> One thing I don't know is, for sparse matrices, what the optimum number of processors to use for
> an LU decomposition is. Does it depend on the total number of nonzeros? Do you have an easy way to
> compute it?

You have to experiment with your matrix on a target machine to find out.

Hong

> Subroutine HowBigLUCanBe(rank)
>
>   IMPLICIT NONE
>
>   integer(i4b),intent(in) :: rank
>   integer(i4b)            :: i,ct
>   real(dp)                :: begin,endd
>   complex(dpc)            :: sigma
>
>   PetscErrorCode ierr
>
>   if (rank==0) call cpu_time(begin)
>
>   if (rank==0) then
>      write(*,*)
>      write(*,*)'Testing How Big LU Can Be...'
>      write(*,*)'============================'
>      write(*,*)
>   endif
>
>   sigma = (1.0d0,0.0d0)
>   call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr)   ! on exit A = A-sigma*B
>
>   !.....Write Matrix to ASCII and Binary Format
>   !call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
>   !call MatView(DXX,viewer,ierr)
>   !call PetscViewerDestroy(viewer,ierr)
>
>   call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
>   call MatView(A,viewer,ierr)
>   call PetscViewerDestroy(viewer,ierr)
>
>   !.....Create Linear Solver Context
>   call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>
>   !.....Set operators. Here the matrix that defines the linear system
>   !     also serves as the preconditioning matrix.
>   !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>   call KSPSetOperators(ksp,A,A,ierr)   ! remember: here A = A-sigma*B
>
>   !.....Set Relative and Absolute Tolerances and Uses Default for Divergence Tol
>   tol = 1.e-10
>   call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>
>   !.....Set the Direct (LU) Solver
>   call KSPSetType(ksp,KSPPREONLY,ierr)
>   call KSPGetPC(ksp,pc,ierr)
>   call PCSetType(pc,PCLU,ierr)
>   call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr)   ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
>
>   !.....Create Right-Hand-Side Vector
>   call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>   call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
>
>   allocate(xwork1(IendA-IstartA))
>   allocate(loc(IendA-IstartA))
>
>   ct=0
>   do i=IstartA,IendA-1
>      ct=ct+1
>      loc(ct)=i
>      xwork1(ct)=(1.0d0,0.0d0)
>   enddo
>
>   call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>   call VecZeroEntries(sol,ierr)
>
>   deallocate(xwork1,loc)
>
>   !.....Assemble Vectors
>   call VecAssemblyBegin(frhs,ierr)
>   call VecAssemblyEnd(frhs,ierr)
>
>   !.....Solve the Linear System
>   call KSPSolve(ksp,frhs,sol,ierr)
>
>   !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>
>   if (rank==0) then
>      call cpu_time(endd)
>      write(*,*)
>      print '("Total time for HowBigLUCanBe = ",f21.3," seconds.")',endd-begin
>   endif
>
>   call SlepcFinalize(ierr)
>
>   STOP
>
> end Subroutine HowBigLUCanBe

On 07/08/2015 11:23 AM, Xiaoye S. Li wrote:

Indeed, the parallel symbolic factorization routine needs a power-of-2 number of processes; however, you can use however many processes you need. Internally, we redistribute the matrix to the nearest power of 2 processes, do the symbolic factorization, then redistribute back to all the processes for the factorization, triangular solve, etc. So there is no restriction from the user's viewpoint.

It's difficult to tell what the problem is. Do you think you can print your matrix? Then I can do some debugging by running superlu_dist standalone.

Sherry
On Wed, Jul 8, 2015 at 10:34 AM, Anthony Paul Haas <[email protected]> wrote:

Hi,

I have used the switch -mat_superlu_dist_parsymbfact in my pbs script. However, although my program worked fine with sequential symbolic factorization, I get one of the following two behaviors when I run with parallel symbolic factorization (depending on the number of processors that I use):

1) the program just hangs (it seems stuck in some subroutine ==> see test.out-hangs)
2) I get a floating point exception ==> see test.out-floating-point-exception

Note that, as suggested in the SuperLU manual, I use a power-of-2 number of procs. Are there any tunable parameters for the parallel symbolic factorization? Note that when I build my sparse matrix, most elements I add are nonzero of course, but to simplify the programming I also add a few zero elements in the sparse matrix. I was thinking that maybe, if the parallel symbolic factorization proceeds by blocks, there could be some blocks where the pivot would be zero, hence creating the FPE??

Thanks,
Anthony
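(A side note on the explicit zeros mentioned above: PETSc can drop zero values at insertion time so that they never enter the sparsity pattern handed to SuperLU_DIST. Below is a minimal C sketch of that option only; the helper name, matrix size and stencil values are invented for illustration, and whether this has any effect on the parsymbfact FPE is a separate question.)

  #include <petscmat.h>

  /* Assemble a toy AIJ matrix while discarding explicitly inserted zeros.
     Sizes and values here are made up purely for illustration. */
  static PetscErrorCode AssembleWithoutExplicitZeros(Mat *Aout)
  {
    Mat            A;
    PetscInt       i, Istart, Iend, ncols, cols[3];
    PetscInt       n = 32;
    PetscScalar    vals[3];
    PetscErrorCode ierr;

    ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
    ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);CHKERRQ(ierr);
    ierr = MatSetType(A,MATAIJ);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);

    /* Any 0.0 passed to MatSetValues() from here on is silently skipped,
       so it never becomes part of the nonzero pattern seen by the LU solver. */
    ierr = MatSetOption(A,MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE);CHKERRQ(ierr);

    ierr = MatGetOwnershipRange(A,&Istart,&Iend);CHKERRQ(ierr);
    for (i=Istart; i<Iend; i++) {
      ncols = 0;
      if (i>0)   {cols[ncols] = i-1; vals[ncols++] = -1.0;}
      cols[ncols] = i; vals[ncols++] = 2.0;
      if (i<n-1) {cols[ncols] = i+1; vals[ncols++] = 0.0;}  /* convenience zero: dropped */
      ierr = MatSetValues(A,1,&i,ncols,cols,vals,INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    *Aout = A;
    return 0;
  }

Note that MatSetOption() has to be called before the corresponding MatSetValues() calls for the filtering to take effect.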
On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <[email protected]> wrote:

Did you find out how to change the option to use parallel symbolic factorization? Perhaps the PETSc team can help.

Sherry

On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <[email protected]> wrote:

Is there an inquiry function that tells you all the available options?

Sherry

On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <[email protected]> wrote:

Hi Sherry,

Thanks for your message. I have used the superlu_dist default options. I did not realize that I was doing serial symbolic factorization; that is probably the cause of my problem. Each node on Garnet has 60GB of usable memory, and I can run with 1, 2, 4, 8, 16 or 32 cores per node.

So I should use:

  -mat_superlu_dist_r 20
  -mat_superlu_dist_c 32

How do you specify the parallel symbolic factorization option? Is it -mat_superlu_dist_matinput 1?

Thanks,
Anthony

On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <[email protected]> wrote:

The superlu_dist failure occurs during symbolic factorization. Since you are using serial symbolic factorization, it requires the entire graph of A to be available in the memory of one MPI task. How much memory do you have for each MPI task?

It won't help even if you use more processes. You should try to use the parallel symbolic factorization option.

Another point: you set up the process grid as

  Process grid nprow 32 x npcol 20

For better performance, you should swap the grid dimensions. That is, it's better to use 20 x 32; never make nprow larger than npcol.

Sherry
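(For reference, the runtime flags discussed in this exchange can also be set from code before the factorization is set up. The sketch below assumes a PETSc 3.6-era C API: newer releases add a PetscOptions argument to PetscOptionsSetValue() and rename PCFactorSetMatSolverPackage() to PCFactorSetMatSolverType(). The helper name is hypothetical, and the option names and grid values are simply the ones quoted above; on some versions the bare -mat_superlu_dist_parsymbfact flag without a value is enough.)

  #include <petscksp.h>

  /* Sketch: select SuperLU_DIST through PETSc and request parallel symbolic
     factorization plus a 20 x 32 process grid programmatically instead of on
     the command line.  A is assumed to be an assembled (MPI)AIJ matrix and
     b, x conforming vectors. */
  PetscErrorCode SolveWithSuperLUDist(Mat A, Vec b, Vec x)
  {
    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    /* Same effect as the command-line flags discussed above
       (pre-3.7 PetscOptionsSetValue signature). */
    ierr = PetscOptionsSetValue("-mat_superlu_dist_parsymbfact","1");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue("-mat_superlu_dist_r","20");CHKERRQ(ierr);
    ierr = PetscOptionsSetValue("-mat_superlu_dist_c","32");CHKERRQ(ierr);

    ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
    ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
    /* Renamed PCFactorSetMatSolverType() in later PETSc versions. */
    ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* allow further command-line overrides */

    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    return 0;
  }

The -mat_superlu_dist_* options are read when the LU factorization is set up (inside KSPSetUp()/KSPSolve() here), so they must be in the options database before that point.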
On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <[email protected]> wrote:

I would suggest running a sequence of problems, 101 by 101, 111 by 111, etc., and getting the memory usage in each case (when you run out of memory you get NO useful information about memory needs). You can then plot memory usage as a function of problem size to get a handle on how much memory it is using. You can also run on more and more processes (which have a total of more memory) to see how large a problem you may be able to reach.

MUMPS also has an "out of core" version (which we have never used) that could, in theory anyway, let you get to larger problems if you have lots of disk space, but you are on your own figuring out how to use it.

Barry
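(One way to follow this suggestion from inside the application is to log the resident set size around the factorization for each problem size. A rough C sketch; the helper name and output format are invented, and PetscMemoryGetMaximumUsage() only reports a meaningful peak if PetscMemorySetGetMaximumUsage() was called earlier, e.g. right after PetscInitialize().)

  #include <petscsys.h>

  /* Print current and peak resident memory, summed over all ranks. */
  PetscErrorCode ReportMemory(MPI_Comm comm, const char *stage)
  {
    PetscLogDouble cur = 0.0, max = 0.0, curtot = 0.0, maxtot = 0.0;
    PetscErrorCode ierr;

    ierr = PetscMemoryGetCurrentUsage(&cur);CHKERRQ(ierr);
    ierr = PetscMemoryGetMaximumUsage(&max);CHKERRQ(ierr);
    /* PetscLogDouble is a plain double, so MPI_DOUBLE is safe here. */
    ierr = MPI_Allreduce(&cur,&curtot,1,MPI_DOUBLE,MPI_SUM,comm);CHKERRQ(ierr);
    ierr = MPI_Allreduce(&max,&maxtot,1,MPI_DOUBLE,MPI_SUM,comm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm,"[%s] RSS now %g MB, peak %g MB (sum over ranks)\n",
                       stage,curtot/1048576.0,maxtot/1048576.0);CHKERRQ(ierr);
    return 0;
  }

Calling this before and after KSPSetUp() or EPSSolve() for the 101x101, 111x111, ... runs gives the memory-versus-problem-size curve described above.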
On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <[email protected]> wrote:

Hi Jose,

In my code, I first use PETSc to solve a linear system to get the baseflow (without using SLEPc), and then I use SLEPc to do the stability analysis of that baseflow. This is why there are some SLEPc options that are not used in test.out-superlu_dist-151x151 (when I am solving for the baseflow with PETSc only). I have attached a 101x101 case for which I get the eigenvalues. That case works fine. However, if I increase to 151x151, I get the error that you can see in test.out-superlu_dist-151x151 (similar error with mumps: see test.out-mumps-151x151, line 2918). If you look at the very end of the files test.out-superlu_dist-151x151 and test.out-mumps-151x151, you will see that the last info message printed is:

  On Processor (after EPSSetFromOptions)  0  memory:  0.65073152000E+08   =====> (see line 807 of module_petsc.F90)

This means that the memory error probably occurs in the call to EPSSolve (see module_petsc.F90 line 810). I would like to evaluate how much memory is required by the most memory-intensive operation within EPSSolve. Since I am solving a generalized EVP, I would imagine that it would be the LU decomposition. But is there an accurate way of doing it?

Before starting with iterative solvers, I would like to exploit direct solvers as much as I can. I tried GMRES with the default preconditioner at some point, but I had convergence problems. What solver/preconditioner would you recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?

Thanks,
Anthony
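(For the EPS_GNHEP question above, the usual SLEPc pattern is shift-and-invert with an exact LU inner solve, which makes the factorization of A - sigma*B exactly the memory-dominant step being discussed. A minimal C sketch, assuming A and B are assembled elsewhere as in the Fortran code earlier in the thread and a hypothetical target sigma; this is not taken from the attached module_petsc.F90.)

  #include <slepceps.h>

  /* Generalized non-Hermitian eigenproblem A x = lambda B x, solved with
     shift-and-invert around a target and a direct (SuperLU_DIST) inner solve. */
  PetscErrorCode SolveGNHEP(Mat A, Mat B, PetscScalar sigma)
  {
    EPS            eps;
    ST             st;
    KSP            ksp;
    PC             pc;
    PetscErrorCode ierr;

    ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr);
    ierr = EPSSetOperators(eps,A,B);CHKERRQ(ierr);
    ierr = EPSSetProblemType(eps,EPS_GNHEP);CHKERRQ(ierr);
    ierr = EPSSetTarget(eps,sigma);CHKERRQ(ierr);
    ierr = EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE);CHKERRQ(ierr);

    /* Spectral transformation: (A - sigma B)^{-1} B, factored once with LU. */
    ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
    ierr = STSetType(st,STSINVERT);CHKERRQ(ierr);
    ierr = STGetKSP(st,&ksp);CHKERRQ(ierr);
    ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
    ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST);CHKERRQ(ierr);

    ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
    ierr = EPSSolve(eps);CHKERRQ(ierr);
    ierr = EPSDestroy(&eps);CHKERRQ(ierr);
    return 0;
  }

Comparing against MUMPS is then a one-argument change: MATSOLVERMUMPS instead of MATSOLVERSUPERLU_DIST.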
On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <[email protected]> wrote:

El 07/07/2015, a las 02:33, Anthony Haas escribió:

> Hi,
>
> I am computing eigenvalues using PETSc/SLEPc and superlu_dist for the LU decomposition (my problem is a generalized eigenvalue problem). The code runs fine for a grid of 101x101, but when I increase to 151x151 I get the following error:
>
>   Can't expand MemType 1: jcol 16104
>
> (and then: [NID 00037] 2015-07-06 19:19:17 Apid 31025976: OOM killer terminated this process.)
>
> It seems to be a memory problem. I monitor the memory usage as far as I can, and it seems that memory usage is pretty low. The most memory-intensive part of the program is probably the LU decomposition in the context of the generalized EVP. Is there a way to evaluate how much memory will be required for that step? I am currently running the debug version of the code, which I would assume would use more memory?
>
> I have attached the output of the job. Note that the program uses PETSc twice: 1) to solve a linear system, for which no problem occurs, and 2) to solve the generalized EVP with SLEPc, where I get the error.
>
> Thanks
>
> Anthony
> <test.out-superlu_dist-151x151>

In the output you are attaching there are no SLEPc objects in the report, and SLEPc options are not used. It seems that the SLEPc calls are skipped?

Do you get the same error with MUMPS? Have you tried to solve linear systems with a preconditioned iterative solver?

Jose

<module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
