Junchao:
When matrix factorization fails, we deliver the error message back to the user and skip MatSolve. Can you reproduce this problem so that I can take a look at it?
What is embarrassing is the user sent me beautiful -log_view outputs and began doing performance comparison. The whole thing is meaningless only because he forgot to check the converged reason on a direct solver.

When the linear solver fails, snes/ts also fails, which should display error output to the user. The user should check the accuracy of his final solution with '-snes_converged_reason' before looking at performance.

The MUMPS manual has "A call to MUMPS with JOB=2 must be preceded by a call with JOB=1 on the same instance", and similar language for other phases. It implies we at least should not call MatSolve_MUMPS with a failed factorization since it might crash the code.

Yes. I've never seen this happen before, thus I want to check.

Hong

________________________________
From: Smith, Barry F.
Sent: Wednesday, October 10, 2018 6:41:20 PM
To: Zhang, Junchao
Cc: petsc-dev
Subject: Re: [petsc-dev] MUMPS silent errors

I looked at the code and it is handled in the PETSc way. The user should not expect KSP to error just because it was unable to solve a linear system; they should be calling KSPGetConvergedReason() after KSPSolve() to check that the solution was computed successfully.

Barry

> On Oct 10, 2018, at 2:12 PM, Zhang, Junchao wrote:
>
> I met a case where MUMPS numeric factorization returned an error code -9 in
> mumps->id.INFOG(1) but A->erroriffailure was false in the following code in
> mumps.c
>
> PetscErrorCode MatFactorNumeric_MUMPS(Mat F,Mat A,const MatFactorInfo *info)
> {
>   ...
>   PetscMUMPS_c(mumps);
>   if (mumps->id.INFOG(1) < 0) {
>     if (A->erroriffailure) {
>       SETERRQ2(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error reported by MUMPS in numerical factorization phase: INFOG(1)=%d, INFO(2)=%d\n",mumps->id.INFOG(1),mumps->id.INFO(2));
>     } else {
>       if (mumps->id.INFOG(1) == -10) { /* numerically singular matrix */
>         PetscInfo2(F,"matrix is numerically singular, INFOG(1)=%d, INFO(2)=%d\n",mumps->id.INFOG(1),mumps->id.INFO(2));
>         F->factorerrortype = MAT_FACTOR_NUMERIC_ZEROPIVOT;
>
> The code continued to KSPSolve and finished successfully (with a wrong answer).
> The user did not call KSPGetConvergedReason() after KSPSolve. I found I had
> to either add -ksp_error_if_not_converged or call
> KSPSetErrorIfNotConverged(ksp,PETSC_TRUE) to make the code fail.
>
> Is this expected? In my view, it is dangerous. If MUMPS fails in one stage,
> PETSc should not proceed to the next stage because it may hang there.
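
For readers following the thread, here is a minimal sketch in plain C of the behavior under discussion: a failed factorization sets a negative "converged reason" and skips the solve, but raises no error unless the caller asked for one. Everything in it (MockSolver, mock_solve, solve_checked, the REASON_* values) is a hypothetical stand-in invented for illustration, not PETSc or MUMPS API. It only mirrors the roles that KSPSolve(), KSPGetConvergedReason() and KSPSetErrorIfNotConverged() play in the messages above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; NOT the PETSc API. */
typedef enum {
    REASON_CONVERGED = 1,  /* positive: the solve succeeded */
    REASON_DIVERGED  = -1  /* negative: the solve did not succeed */
} Reason;

typedef struct {
    int    factor_info;      /* plays the role of a status like INFOG(1); < 0 means failure */
    bool   error_if_failure; /* plays the role of the "error if not converged" switch */
    Reason reason;           /* what the caller is expected to query after the solve */
} MockSolver;

/* Returns 0 when no error is raised, nonzero when an error is raised.
 * Pretends to solve 2*x = rhs. */
int mock_solve(MockSolver *s, double rhs, double *x)
{
    assert(s != NULL && x != NULL);
    if (s->factor_info < 0) {
        s->reason = REASON_DIVERGED;
        if (s->error_if_failure) return 1; /* hard error, as the caller requested */
        return 0; /* silent path: solve skipped, x untouched, no error code */
    }
    *x = rhs / 2.0;
    s->reason = REASON_CONVERGED;
    return 0;
}

/* Caller-side pattern: trust x only if no error was raised AND the reason is positive. */
bool solve_checked(MockSolver *s, double rhs, double *x)
{
    if (mock_solve(s, rhs, x) != 0) return false;
    if (s->reason < 0) {
        fprintf(stderr, "solve failed, reason=%d\n", (int)s->reason);
        return false;
    }
    return true;
}
```

In this sketch, calling mock_solve with factor_info = -9 and error_if_failure = false returns 0 even though nothing was solved. That is the situation Junchao describes, and only the explicit reason check in solve_checked catches it.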
