Junchao:

Hong,

 The user's example code reads a matrix, calls KSPSolve, and then exits. In his
-log_view output, I saw a long MatLUFactorNum time and a short MatSolve time. Now I
know that is because MatSolve was skipped. Thanks.

This is intended.
Hong
________________________________
From: Zhang, Hong
Sent: Thursday, October 11, 2018 10:07:10 AM
To: Zhang, Junchao
Cc: Smith, Barry F.; For users of the development version of PETSc
Subject: Re: [petsc-dev] MUMPS silent errors

Junchao :
When matrix factorization fails, we deliver an error message back to the user and
skip MatSolve. Can you reproduce this problem so that I can take a look at it?


What is embarrassing is that the user sent me beautiful -log_view outputs and began
doing performance comparisons. The whole thing is meaningless only because he
forgot to check the converged reason on a direct solver.


When the linear solver fails, snes/ts also fails, which should display error output
to the user. The user should check the accuracy of his final solution with
'-snes_converged_reason' before looking at performance.


The MUMPS manual has "A call to MUMPS with JOB=2 must be preceded by a call with
JOB=1 on the same instance", and similar language for other phases. It
implies that we at least should not call MatSolve_MUMPS after a failed
factorization, since it might crash the code.

Yes. I've never seen this happen before, so I want to check.
Hong
________________________________
From: Smith, Barry F.
Sent: Wednesday, October 10, 2018 6:41:20 PM
To: Zhang, Junchao
Cc: petsc-dev
Subject: Re: [petsc-dev] MUMPS silent errors


  I looked at the code and it is handled in the PETSc way. The user should not 
expect KSP to error just because it was unable to solve a linear system; they 
should be calling KSPGetConvergedReason() after KSPSolve() to check that the 
solution was computed successfully.

   Barry
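
The convention described above (a failed factorization does not raise an error by default; the solve is skipped and the outcome is recorded for the caller to query) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not PETSc or MUMPS code, every name in it (ToySolver, toy_factor_numeric, toy_solve, toy_get_reason) is hypothetical, and a diagonal matrix stands in for a real factorization.

```c
#include <stddef.h>

/* Toy model only: none of these names are PETSc or MUMPS API. */

typedef enum {
    TOY_CONVERGED       =  1,  /* solve ran on a valid factorization */
    TOY_DIVERGED_FACTOR = -1   /* factorization failed, solve was skipped */
} ToyReason;

typedef struct {
    size_t        n;                 /* system size (diagonal matrix for simplicity) */
    const double *diag;              /* diagonal entries of the matrix */
    int           factor_failed;     /* set by toy_factor_numeric on a zero pivot */
    int           error_if_failure;  /* if nonzero, failures become error returns */
    ToyReason     reason;            /* outcome of the last toy_solve */
} ToySolver;

/* "Numeric factorization": fails on a zero pivot. It returns nonzero only when
   error_if_failure is set; otherwise it records the failure and returns 0. */
static int toy_factor_numeric(ToySolver *s)
{
    s->factor_failed = 0;
    for (size_t i = 0; i < s->n; i++) {
        if (s->diag[i] == 0.0) {
            s->factor_failed = 1;
            return s->error_if_failure ? 1 : 0;
        }
    }
    return 0;
}

/* Solve diag * x = b. If the factorization failed, the solve step is skipped,
   x is left untouched, and the failure is recorded in s->reason. */
static int toy_solve(ToySolver *s, const double *b, double *x)
{
    int err = toy_factor_numeric(s);
    if (s->factor_failed) {
        s->reason = TOY_DIVERGED_FACTOR;
        return err;  /* 0 unless error_if_failure is set */
    }
    for (size_t i = 0; i < s->n; i++) x[i] = b[i] / s->diag[i];
    s->reason = TOY_CONVERGED;
    return 0;
}

/* The caller is expected to query this after every toy_solve. */
static ToyReason toy_get_reason(const ToySolver *s)
{
    return s->reason;
}
```

In this model, a caller that checks only the return value of toy_solve sees success and goes on to use an x that was never computed. In the thread's terms, the corresponding check is calling KSPGetConvergedReason() after KSPSolve(), and the opt-in failure is -ksp_error_if_not_converged or KSPSetErrorIfNotConverged(ksp,PETSC_TRUE).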


> On Oct 10, 2018, at 2:12 PM, Zhang, Junchao wrote:
>
> I met a case where MUMPS numeric factorization returned an error code -9 in 
> mumps->id.INFOG(1) but A->erroriffailure was false in the following code in 
> mumps.c
> PetscErrorCode MatFactorNumeric_MUMPS(Mat F,Mat A,const MatFactorInfo *info)
> {
>   ...
>   PetscMUMPS_c(mumps);
>   if (mumps->id.INFOG(1) < 0) {
>     if (A->erroriffailure) {
>       SETERRQ2(PETSC_COMM_SELF,PETSC_ERR_LIB,"Error reported by MUMPS in numerical factorization phase: INFOG(1)=%d, INFO(2)=%d\n",mumps->id.INFOG(1),mumps->id.INFO(2));
>     } else {
>       if (mumps->id.INFOG(1) == -10) { /* numerically singular matrix */
>         PetscInfo2(F,"matrix is numerically singular, INFOG(1)=%d, INFO(2)=%d\n",mumps->id.INFOG(1),mumps->id.INFO(2));
>         F->factorerrortype = MAT_FACTOR_NUMERIC_ZEROPIVOT;
>
>
> The code continued to KSPSolve and finished successfully (with a wrong answer).
> The user did not call KSPGetConvergedReason() after KSPSolve. I found I had to
> either add -ksp_error_if_not_converged or call
> KSPSetErrorIfNotConverged(ksp,PETSC_TRUE) to make the code fail.
> Is this expected? In my view, it is dangerous. If MUMPS fails in one stage,
> PETSc should not proceed to the next stage because it may hang there.
