Barry:

> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason
> Linear solve did not converge due to DIVERGED_NANORINF iterations 0
> Number of iterations = 0
> Residual norm 0.0220971
Hmm, I'm working on it, and forgot to check '-ksp_converged_reason'.
However, superlu_dist does not report the zero pivot; it might simply 'exit'.
I'll contact Sherry about it.

Hong

> The matrix has a zero pivot with the nd ordering
>
> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#zeropivot
> [0]PETSC ERROR: Zero pivot row 5 value 0. tolerance 2.22045e-14
>
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.6.2-1539-g5ca2a2b GIT Date: 2015-11-13 02:00:47 -0600
> [0]PETSC ERROR: ./ex10 on a arch-mpich-nemesis named Barrys-MacBook-Pro.local by barrysmith Sun Nov 15 21:37:15 2015
> [0]PETSC ERROR: Configure options --download-mpich --download-mpich-device=ch3:nemesis
> [0]PETSC ERROR: #1 MatPivotCheck_none() line 688 in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h
> [0]PETSC ERROR: #2 MatPivotCheck() line 707 in /Users/barrysmith/Src/PETSc/include/petsc/private/matimpl.h
> [0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1332 in /Users/barrysmith/Src/PETSc/src/mat/impls/aij/seq/inode.c
> [0]PETSC ERROR: #4 MatLUFactorNumeric() line 2946 in /Users/barrysmith/Src/PETSc/src/mat/interface/matrix.c
> [0]PETSC ERROR: #5 PCSetUp_LU() line 152 in /Users/barrysmith/Src/PETSc/src/ksp/pc/impls/factor/lu/lu.c
> [0]PETSC ERROR: #6 PCSetUp() line 984 in /Users/barrysmith/Src/PETSc/src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: #7 KSPSetUp() line 332 in /Users/barrysmith/Src/PETSc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #8 main() line 312 in /Users/barrysmith/Src/petsc/src/ksp/ksp/examples/tutorials/ex10.c
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -f0 /Users/barrysmith/Downloads/mat.dat
> [0]PETSC ERROR: -ksp_converged_reason
> [0]PETSC ERROR: -ksp_error_if_not_converged
> [0]PETSC ERROR: -ksp_monitor_true_residual
> [0]PETSC ERROR: -malloc_test
> [0]PETSC ERROR: -pc_type lu
> [0]PETSC ERROR: -rhs /Users/barrysmith/Downloads/rhs.dat
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to [email protected]
> application called MPI_Abort(MPI_COMM_WORLD, 71) - process 0
> ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis
>
> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal"
>
> ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis
> $ ./ex10 -pc_type lu -ksp_monitor_true_residual -f0 ~/Downloads/mat.dat -rhs ~/Downloads/rhs.dat -ksp_converged_reason -ksp_error_if_not_converged -pc_factor_nonzeros_along_diagonal
>   0 KSP preconditioned resid norm 1.905901677970e+00 true resid norm 2.209708691208e-02 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP preconditioned resid norm 1.703926496877e-14 true resid norm 5.880234823611e-15 ||r(i)||/||b|| 2.661090507997e-13
> Linear solve converged due to CONVERGED_RTOL iterations 1
> Number of iterations = 1
> Residual norm < 1.e-12
> ~/Src/petsc/src/ksp/ksp/examples/tutorials (barry/utilize-hwloc *>) arch-mpich-nemesis
>
> Jan,
>
> Remember you ALWAYS have to call KSPGetConvergedReason() after a KSPSolve to see what happened in the solve.
>
>
> > On Nov 15, 2015, at 8:53 PM, Hong <[email protected]> wrote:
> >
> > Jan:
> > I can reproduce the reported behavior using petsc/src/ksp/ksp/examples/tutorials/ex10.c on your mat.dat and rhs.dat.
> >
> > Using petsc sequential lu with the default ordering 'nd', I get
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu
> > Number of iterations = 0
> > Residual norm 0.0220971
> >
> > Changing to
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_ordering_type natural
> > Number of iterations = 1
> > Residual norm < 1.e-12
> >
> > Back to superlu_dist, I get
> > mpiexec -n 3 ./ex10 -f0 /homes/hzhang/tmp/mat.dat -rhs /homes/hzhang/tmp/rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist
> > Number of iterations = 4
> > Residual norm 25650.8
> >
> > which uses the default ordering (-ksp_view)
> >   Row permutation LargeDiag
> >   Column permutation METIS_AT_PLUS_A
> >
> > Run it with
> > mpiexec -n 3 ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_rowperm NATURAL -mat_superlu_dist_colperm NATURAL
> > Number of iterations = 1
> > Residual norm < 1.e-12
> >
> > i.e., your problem is sensitive to matrix ordering, though I do not know why.
> >
> > I checked the condition number of your mat.dat using superlu:
> > ./ex10 -f0 mat.dat -rhs rhs.dat -pc_type lu -pc_factor_mat_solver_package superlu -mat_superlu_conditionnumber
> > Recip. condition number = 1.137938e-03
> > Number of iterations = 1
> > Residual norm < 1.e-12
> >
> > As you see, the matrix is well-conditioned. Why is it so sensitive to matrix ordering?
> >
> > Hong
> >
> > Using the attached petsc4py code, matrix and right-hand side, SuperLU_dist returns a totally wrong solution for a mixed Laplacian:
> >
> > $ tar -xzf report.tar.gz
> > $ python test-solve.py -pc_factor_mat_solver_package mumps -ksp_final_residual
> > KSP final norm of residual 3.81865e-15
> > $ python test-solve.py -pc_factor_mat_solver_package umfpack -ksp_final_residual
> > KSP final norm of residual 3.68546e-14
> > $ python test-solve.py -pc_factor_mat_solver_package superlu_dist -ksp_final_residual
> > KSP final norm of residual 1827.72
> >
> > Moreover, the final residual is random when run using mpirun -np 3. Maybe a memory corruption issue? This is reproducible with PETSc 3.6.2 (and the SuperLU_dist configured by PETSc) and with much older versions, see http://fenicsproject.org/pipermail/fenics-support/2014-March/000439.html, but it has never been reported upstream.
> >
> > The code for assembling the matrix and rhs using FEniCS is also included for the sake of completeness.
> >
> > Jan
> >
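
For anyone following along from petsc4py rather than ex10, here is a rough sketch of Barry's check. It is only a sketch, not part of the original report: it assumes mat.dat and rhs.dat are PETSc binary files in the working directory and reuses the option names quoted above (-pc_factor_nonzeros_along_diagonal, -ksp_error_if_not_converged).

import sys
import petsc4py
petsc4py.init(sys.argv)          # allow extra PETSc options from the command line
from petsc4py import PETSc

# Load the matrix and right-hand side from PETSc binary files (assumed paths).
A = PETSc.Mat().create(PETSc.COMM_WORLD)
A.load(PETSc.Viewer().createBinary('mat.dat', 'r'))
b = PETSc.Vec().create(PETSc.COMM_WORLD)
b.load(PETSc.Viewer().createBinary('rhs.dat', 'r'))

# Mirror the options from the runs above: direct LU, reorder to avoid the
# zero pivot, and error out instead of silently returning a wrong "solution".
opts = PETSc.Options()
opts['pc_type'] = 'lu'
opts['pc_factor_nonzeros_along_diagonal'] = None   # flag option, no value
opts['ksp_error_if_not_converged'] = None

ksp = PETSc.KSP().create(PETSc.COMM_WORLD)
ksp.setOperators(A)
ksp.setType('preonly')           # one application of the LU preconditioner
ksp.setFromOptions()

x = A.createVecRight()
ksp.solve(b, x)

# The point of the thread: always query the converged reason after KSPSolve.
reason = ksp.getConvergedReason()
PETSc.Sys.Print('converged reason: %d' % reason)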
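
A similar hypothetical sketch of Hong's superlu_dist experiment, with the same assumed input files, the default KSP (GMRES, as in ex10), and the 3.6-era option names quoted above; it is meant to be run under mpirun -np 3 as in Hong's test.

from petsc4py import PETSc   # importing PETSc initializes it

# Same assumed input files as in the previous sketch.
A = PETSc.Mat().create(PETSc.COMM_WORLD)
A.load(PETSc.Viewer().createBinary('mat.dat', 'r'))
b = PETSc.Vec().create(PETSc.COMM_WORLD)
b.load(PETSc.Viewer().createBinary('rhs.dat', 'r'))

# Hong's workaround: superlu_dist with NATURAL row and column permutations.
# (In PETSc 3.6 the solver option is still called *_mat_solver_package.)
opts = PETSc.Options()
opts['pc_type'] = 'lu'
opts['pc_factor_mat_solver_package'] = 'superlu_dist'
opts['mat_superlu_dist_rowperm'] = 'NATURAL'
opts['mat_superlu_dist_colperm'] = 'NATURAL'

ksp = PETSc.KSP().create(PETSc.COMM_WORLD)
ksp.setOperators(A)
ksp.setFromOptions()

x = A.createVecRight()
ksp.solve(b, x)

# Check the outcome and the true residual ||A*x - b|| explicitly.
reason = ksp.getConvergedReason()
r = b.duplicate()
A.mult(x, r)          # r = A*x
r.axpy(-1.0, b)       # r = A*x - b
PETSc.Sys.Print('reason = %d, true residual norm = %g' % (reason, r.norm()))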
