Hong, petsc master is updated to download/install mumps-5.0.2
Satish

On Mon, 12 Dec 2016, Hong wrote:

> Alfredo:
> Sure, I got the tarball of mumps-5.0.2, and will test it and update the
> petsc-mumps interface. I'll let you know if the problem remains.
>
> Hong
>
> > Dear all,
> > sorry for the late reply. The petsc installation went super smoothly and
> > I could easily reproduce the issue. I dumped the matrix generated by
> > petsc and read it back with a standalone mumps tester in order to
> > confirm the bug. This bug had already been reported by another user,
> > was fixed a few months ago, and the fix was included in the 5.0.2
> > release. Could you please check if everything works well with mumps
> > 5.0.2?
> >
> > Kind regards,
> > the MUMPS team
> >
> > On Thu, Oct 20, 2016 at 4:44 PM, Hong <[email protected]> wrote:
> > > Alfredo:
> > > It would be much easier to install petsc with mumps and parmetis and
> > > debug this case. Here is what you can do on a linux machine
> > > (see http://www.mcs.anl.gov/petsc/documentation/installation.html):
> > >
> > > 1) get petsc-release:
> > >    git clone -b maint https://bitbucket.org/petsc/petsc petsc
> > >    cd petsc
> > >    git pull
> > >    export PETSC_DIR=$PWD
> > >    export PETSC_ARCH=<>
> > >
> > > 2) configure petsc with the additional options
> > >    '--download-metis --download-parmetis --download-mumps
> > >     --download-scalapack --download-ptscotch'
> > >    see http://www.mcs.anl.gov/petsc/documentation/installation.html
> > >
> > > 3) build petsc and test:
> > >    make
> > >    make test
> > >
> > > 4) test ex53.c:
> > >    cd $PETSC_DIR/src/ksp/ksp/examples/tutorials
> > >    make ex53
> > >    mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2
> > >      -mat_mumps_icntl_29 2
> > >
> > > 5) debug ex53.c:
> > >    mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2
> > >      -mat_mumps_icntl_29 2 -start_in_debugger
> > >
> > > Give it a try. Contact us if you cannot reproduce this case.
> > >
> > > Hong
> > >
> > >> Dear all,
> > >> this may well be due to a bug in the parallel analysis. Do you think you
> > >> can reproduce the problem in a standalone MUMPS program (i.e., without
> > >> going through PETSc)? That would save a lot of time in tracking down the
> > >> bug, since we do not have a PETSc install at hand. Otherwise we'll take a
> > >> shot at installing petsc and reproducing the problem on our side.
> > >>
> > >> Kind regards,
> > >> the MUMPS team
> > >>
> > >> On Wed, Oct 19, 2016 at 8:32 PM, Barry Smith <[email protected]> wrote:
> > >>>
> > >>>   Tim,
> > >>>
> > >>>   You can/should also run with valgrind to determine exactly the first
> > >>>   point with memory corruption issues.
> > >>>
> > >>>   Barry
> > >>>
> > >>> > On Oct 19, 2016, at 11:08 AM, Hong <[email protected]> wrote:
> > >>> >
> > >>> > Tim:
> > >>> > With '-mat_mumps_icntl_28 1', i.e., sequential analysis, I can run ex56
> > >>> > with np=3 or larger np successfully.
> > >>> >
> > >>> > With '-mat_mumps_icntl_28 2', i.e., parallel analysis, I can run up to
> > >>> > np=3.
> > >>> >
> > >>> > For np=4:
> > >>> > mpiexec -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2
> > >>> >   -mat_mumps_icntl_29 2 -start_in_debugger
> > >>> >
> > >>> > the code crashes inside mumps:
> > >>> > Program received signal SIGSEGV, Segmentation fault.
> > >>> > 0x00007f33d75857cb in
> > >>> > dmumps_parallel_analysis::dmumps_build_scotch_graph (
> > >>> >     id=..., first=..., last=..., ipe=...,
> > >>> >     pe=<error reading variable: Cannot access memory at address 0x0>,
> > >>> >     work=...)
> > >>> >     at dana_aux_par.F:1450
> > >>> > 1450            MAPTAB(J) = I
> > >>> > (gdb) bt
> > >>> > #0  0x00007f33d75857cb in
> > >>> > dmumps_parallel_analysis::dmumps_build_scotch_graph (
> > >>> >     id=..., first=..., last=..., ipe=...,
> > >>> >     pe=<error reading variable: Cannot access memory at address 0x0>,
> > >>> >     work=...)
> > >>> >     at dana_aux_par.F:1450
> > >>> > #1  0x00007f33d759207c in dmumps_parallel_analysis::dmumps_parmetis_ord (
> > >>> >     id=..., ord=..., work=...) at dana_aux_par.F:400
> > >>> > #2  0x00007f33d7592d14 in dmumps_parallel_analysis::dmumps_do_par_ord (id=...,
> > >>> >     ord=..., work=...) at dana_aux_par.F:351
> > >>> > #3  0x00007f33d7593aa9 in dmumps_parallel_analysis::dmumps_ana_f_par (id=...,
> > >>> >     work1=..., work2=..., nfsiz=...,
> > >>> >     fils=<error reading variable: Cannot access memory at address 0x0>,
> > >>> >     frere=<error reading variable: Cannot access memory at address 0x0>)
> > >>> >     at dana_aux_par.F:98
> > >>> > #4  0x00007f33d74c622a in dmumps_ana_driver (id=...) at dana_driver.F:563
> > >>> > #5  0x00007f33d747706b in dmumps (id=...) at dmumps_driver.F:1108
> > >>> > #6  0x00007f33d74721b5 in dmumps_f77 (job=1, sym=0, par=1,
> > >>> >     comm_f77=-2080374779, n=10000, icntl=..., cntl=..., keep=..., dkeep=...,
> > >>> >     keep8=..., nz=0, irn=..., irnhere=0, jcn=..., jcnhere=0, a=..., ahere=0,
> > >>> >     nz_loc=7500, irn_loc=..., irn_lochere=1, jcn_loc=..., jcn_lochere=1,
> > >>> >     a_loc=..., a_lochere=1, nelt=0, eltptr=..., eltptrhere=0, eltvar=...,
> > >>> >     eltvarhere=0, a_elt=..., a_elthere=0, perm_in=..., perm_inhere=0, rhs=...,
> > >>> >     rhshere=0, redrhs=..., redrhshere=0, info=..., rinfo=..., infog=...,
> > >>> >     rinfog=..., deficiency=0, lwk_user=0, size_schur=0, listvar_schur=...,
> > >>> > ---Type <return> to continue, or q <return> to quit---
> > >>> >     ar_schurhere=0, schur=..., schurhere=0, wk_user=..., wk_userhere=0,
> > >>> >     colsca=..., colscahere=0, rowsca=..., rowscahere=0, instance_number=1,
> > >>> >     nrhs=1, lrhs=0, lredrhs=0,
> > >>> >     rhs_sparse=..., rhs_sparsehere=0, sol_loc=..., sol_lochere=0,
> > >>> >     irhs_sparse=..., irhs_sparsehere=0, irhs_ptr=..., irhs_ptrhere=0,
> > >>> >     isol_loc=..., isol_lochere=0,
> > >>> >     nz_rhs=0, lsol_loc=0, schur_mloc=0, schur_nloc=0, schur_lld=0,
> > >>> >     mblock=0, nblock=0, nprow=0, npcol=0, ooc_tmpdir=..., ooc_prefix=...,
> > >>> >     write_problem=..., tmpdirlen=20, prefixlen=20, write_problemlen=20)
> > >>> >     at dmumps_f77.F:260
> > >>> > #7  0x00007f33d74709b1 in dmumps_c (mumps_par=0x16126f0) at mumps_c.c:415
> > >>> > #8  0x00007f33d68408ca in MatLUFactorSymbolic_AIJMUMPS (F=0x1610280, A=0x14bafc0,
> > >>> >     r=0x160cc30, c=0x1609ed0, info=0x15c6708)
> > >>> >     at /scratch/hzhang/petsc/src/mat/impls/aij/mpi/mumps/mumps.c:1487
> > >>> >
> > >>> > -mat_mumps_icntl_29 = 0 or 1 gives the same error.
> > >>> > I'm cc'ing this email to the mumps developer, who may help to resolve
> > >>> > this matter.
> > >>> >
> > >>> > Hong
> > >>> >
> > >>> >
> > >>> > Hi all,
> > >>> >
> > >>> > I have some problems with PETSc using MUMPS and PARMETIS.
> > >>> > In some cases it works fine, but in some others it doesn't, so I am
> > >>> > trying to understand what is happening.
> > >>> >
> > >>> > I just picked the following example:
> > >>> > http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex53.c.html
> > >>> >
> > >>> > Now, when I start it with less than 4 processes it works as expected:
> > >>> > mpirun -n 3 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1
> > >>> >   -mat_mumps_icntl_29 2
> > >>> >
> > >>> > But with 4 or more processes it crashes, but only when I am using Parmetis:
> > >>> > mpirun -n 4 ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 1
> > >>> >   -mat_mumps_icntl_29 2
> > >>> >
> > >>> > Metis worked in every case I tried without any problems.
> > >>> >
> > >>> > I wonder if I am doing something wrong, or if this is a general problem
> > >>> > or even a bug? Is Parmetis supposed to work with that example with 4
> > >>> > processes?
> > >>> >
> > >>> > Thanks a lot and kind regards.
> > >>> >
> > >>> > Volker
> > >>> >
> > >>> >
> > >>> > Here is the error log of process 0:
> > >>> >
> > >>> > Entering DMUMPS 5.0.1 driver with JOB, N =   1       10000
> > >>> > =================================================
> > >>> > MUMPS compiled with option -Dmetis
> > >>> > MUMPS compiled with option -Dparmetis
> > >>> > =================================================
> > >>> > L U Solver for unsymmetric matrices
> > >>> > Type of parallelism: Working host
> > >>> >
> > >>> >  ****** ANALYSIS STEP ********
> > >>> >
> > >>> >  ** Max-trans not allowed because matrix is distributed
> > >>> > Using ParMETIS for parallel ordering.
> > >>> > [0]PETSC ERROR:
> > >>> > ------------------------------------------------------------------------
> > >>> > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
> > >>> > probably memory access out of range
> > >>> > [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> > >>> > [0]PETSC ERROR: or see
> > >>> > http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> > >>> > [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
> > >>> > OS X to find memory corruption errors
> > >>> > [0]PETSC ERROR: likely location of problem given in stack below
> > >>> > [0]PETSC ERROR: ---------------------  Stack Frames ------------------------------------
> > >>> > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> > >>> > [0]PETSC ERROR:       INSTEAD the line number of the start of the function
> > >>> > [0]PETSC ERROR:       is given.
> > >>> > [0]PETSC ERROR: [0] MatLUFactorSymbolic_AIJMUMPS line 1395
> > >>> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/impls/aij/mpi/mumps/mumps.c
> > >>> > [0]PETSC ERROR: [0] MatLUFactorSymbolic line 2927
> > >>> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/mat/interface/matrix.c
> > >>> > [0]PETSC ERROR: [0] PCSetUp_LU line 101
> > >>> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/impls/factor/lu/lu.c
> > >>> > [0]PETSC ERROR: [0] PCSetUp line 930
> > >>> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/pc/interface/precon.c
> > >>> > [0]PETSC ERROR: [0] KSPSetUp line 305
> > >>> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c
> > >>> > [0]PETSC ERROR: [0] KSPSolve line 563
> > >>> > /fsgarwinhpc/133/petsc/sources/petsc-3.7.4a/src/ksp/ksp/interface/itfunc.c
> > >>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> > >>> > [0]PETSC ERROR: Signal received
> > >>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> > >>> > for trouble shooting.
> > >>> > [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
> > >>> > [0]PETSC ERROR: ./ex53 on a linux-manni-mumps named manni by 133 Wed
> > >>> > Oct 19 16:39:49 2016
> > >>> > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-cxx=mpiicpc
> > >>> > --with-fc=mpiifort --with-shared-libraries=1
> > >>> > --with-valgrind-dir=~/usr/valgrind/
> > >>> > --with-mpi-dir=/home/software/intel/Intel-2016.4/compilers_and_libraries_2016.4.258/linux/mpi
> > >>> > --download-scalapack --download-mumps --download-metis
> > >>> > --download-metis-shared=0 --download-parmetis
> > >>> > --download-parmetis-shared=0
> > >>> > [0]PETSC ERROR: #1 User provided function() line 0 in unknown file
> > >>> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
> > >>
> > >> --
> > >> -----------------------------------------
> > >> Alfredo Buttari, PhD
> > >> CNRS-IRIT
> > >> 2 rue Camichel, 31071 Toulouse, France
> > >> http://buttari.perso.enseeiht.fr
> >
> > --
> > -----------------------------------------
> > Alfredo Buttari, PhD
> > CNRS-IRIT
> > 2 rue Camichel, 31071 Toulouse, France
> > http://buttari.perso.enseeiht.fr
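For reference, below is a minimal sketch of the kind of standalone MUMPS tester Alfredo mentions, for anyone who wants to exercise the crashing analysis path without going through PETSc. It assumes the MUMPS 5.0.x C interface (dmumps_c with a DMUMPS_STRUC_C instance, as in the c_example.c shipped with MUMPS) and distributed assembled input, which is what PETSc uses; the local matrix arrays are placeholders to be filled from the matrix dumped out of PETSc, and this is not the MUMPS team's actual tester.

/* Minimal standalone MUMPS analysis tester (sketch).
 * The local matrix data below are placeholders to be filled in
 * from the matrix dumped by PETSc. */
#include <stdio.h>
#include <mpi.h>
#include "dmumps_c.h"

#define USE_COMM_WORLD -987654    /* as in MUMPS's c_example.c */
#define ICNTL(I) icntl[(I)-1]     /* 1-based ICNTL indexing, as in the manual */

int main(int argc, char **argv)
{
  DMUMPS_STRUC_C id;
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  id.comm_fortran = USE_COMM_WORLD;
  id.par = 1;                     /* host participates in the computation */
  id.sym = 0;                     /* unsymmetric, as in the log above     */
  id.job = -1;                    /* JOB = -1: initialize the instance    */
  dmumps_c(&id);

  id.ICNTL(5)  = 0;               /* assembled matrix                     */
  id.ICNTL(18) = 3;               /* distributed matrix entries           */
  id.ICNTL(28) = 2;               /* parallel analysis                    */
  id.ICNTL(29) = 2;               /* ParMETIS ordering                    */

  /* Placeholder local matrix: fill these from the dumped PETSc matrix. */
  id.n       = 10000;             /* global size, matching ex53 -n 10000  */
  id.nz_loc  = 0;
  id.irn_loc = NULL;
  id.jcn_loc = NULL;
  id.a_loc   = NULL;

  id.job = 1;                     /* JOB = 1: analysis only */
  dmumps_c(&id);
  if (rank == 0) printf("analysis: INFOG(1) = %d\n", id.infog[0]);

  id.job = -2;                    /* JOB = -2: free the instance */
  dmumps_c(&id);
  MPI_Finalize();
  return 0;
}

Built against the same MUMPS/ParMETIS libraries and run with mpiexec -n 4, this goes through the same dmumps_parallel_analysis path that appears in the gdb backtrace above.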

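Following Barry's suggestion, the failing run can also be put under valgrind to catch the first invalid access; a typical invocation, with the memcheck flags recommended in the PETSc FAQ linked in the error output (adjust the launcher and options to your MPI setup):

mpiexec -n 4 valgrind -q --tool=memcheck --num-callers=20 --log-file=valgrind.log.%p ./ex53 -n 10000 -ksp_view -mat_mumps_icntl_28 2 -mat_mumps_icntl_29 2

The first invalid read or write reported in the per-rank log files would be expected to point at the same dana_aux_par.F routines seen in the gdb stack.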