> On 4 Mar 2023, at 2:30 PM, Zongze Yang <[email protected]> wrote:
> 
> Hi, 
> 
> I am writing to seek your advice regarding a problem I encountered while 
> using multigrid.
> I am currently using multigrid with the coarse problem solved by PCLU. 
> However, the PC failed randomly with the error below (the value of INFO(2) 
> may differ):
> ```shell
> [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
> ```
> 
> Upon checking the documentation of MUMPS, I discovered that increasing the 
> value of ICNTL(14) may help resolve the issue. Specifically, I set the option 
> -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to 
> disappear after I set the value of ICNTL(14) to 80. However, I am still 
> curious as to why MUMPS failed randomly in the first place.
> 
> Upon further inspection, I found that the number of nonzeros of the PETSc 
> matrix and the MUMPS matrix were different every time I ran the code. I am 
> now left with the following questions:
> 
> 1. What could be causing the number of nonzeros of the MUMPS matrix to change 
> every time I run the code?

Is the Mat being fed to MUMPS distributed on a communicator of size greater 
than one?
If yes, then, depending on the pivoting and the renumbering, you may get 
non-deterministic results, including a different number of nonzeros in the 
factors from one run to the next.

> 2. Why is the number of nonzeros of the MUMPS matrix significantly greater 
> than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 
> vs 7346177)?

Exact factorizations introduce fill-in.
The number of nonzeros you are seeing for MUMPS is the number of nonzeros in 
the factors.
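
To make the fill-in concrete, here is a small sketch of a symbolic elimination 
on a toy sparsity pattern. It is purely illustrative and is not how MUMPS 
works internally: the function `symbolic_fill` and the 6x6 "arrow" pattern are 
made up for this example. It shows that the same matrix can produce very 
different amounts of fill-in depending on the elimination order, and it 
computes the fill ratio for the two nonzero counts reported in the question.

```python
from itertools import combinations


def symbolic_fill(n, edges, order):
    """Count fill-in entries created by eliminating a symmetric sparsity pattern.

    n: number of unknowns
    edges: set of (i, j) pairs, the off-diagonal nonzeros (stored once per pair)
    order: elimination order, a permutation of range(n)
    """
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    eliminated = set()
    fill = 0
    for k in order:
        # Neighbours of the pivot that have not been eliminated yet.
        remaining = [v for v in adj[k] if v not in eliminated]
        # Eliminating k couples all of them pairwise; each new coupling is fill-in.
        for u, v in combinations(remaining, 2):
            if v not in adj[u]:
                adj[u].add(v)
                adj[v].add(u)
                fill += 1
        eliminated.add(k)
    return fill


n = 6
# Toy "arrow" pattern: unknown 0 is coupled to every other unknown.
arrow = {(0, j) for j in range(1, n)}

fill_hub_first = symbolic_fill(n, arrow, order=range(n))
fill_hub_last = symbolic_fill(n, arrow, order=list(range(1, n)) + [0])
print("fill-in, hub eliminated first:", fill_hub_first)
print("fill-in, hub eliminated last: ", fill_hub_last)

# Fill ratios for the nonzero counts quoted in the question.
nnz_matrix = 7346177
for nnz_factors in (115025949, 115377847):
    print("factor/matrix nonzero ratio:", round(nnz_factors / nnz_matrix, 2))
```

Eliminating the densely coupled unknown first turns the rest of the pattern 
into a full block, while eliminating it last creates no fill-in at all. A 
factor roughly 15 to 16 times denser than the original matrix is therefore not 
surprising for a sparse direct solver.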

> 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is 
> the cause of the random failure?

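Yes. MUMPS uses dynamic scheduling, which depends on numerical pivoting and 
may generate factors with a different number of nonzeros from run to run.
The working space is estimated during the analysis phase, and INFOG(1)=-9 
means that this estimate turned out to be too small during the numerical 
factorization, so a run with more fill-in can fail where another one succeeds.
ICNTL(14) is the percentage by which that estimate is increased, which is why 
raising it makes the error go away.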
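
For reference, a minimal sketch of how the larger ICNTL(14) could be passed 
through a Firedrake-style `solver_parameters` dictionary. The `mg_coarse_` 
prefix and the helper names `with_prefix` and `as_command_line` are 
assumptions made for this example; the actual prefix depends on how the coarse 
solver is nested in your script, so check it against the output of -ksp_view.

```python
# Options for the coarse solve: a direct LU factorization done by MUMPS.
coarse_lu = {
    "ksp_type": "preonly",
    "pc_type": "lu",
    "pc_factor_mat_solver_type": "mumps",
    # ICNTL(14): percentage increase of the estimated working space.
    "mat_mumps_icntl_14": 80,
}


def with_prefix(prefix, options):
    """Return a copy of `options` with `prefix` prepended to every key."""
    return {prefix + key: value for key, value in options.items()}


def as_command_line(options):
    """Render an options dictionary as PETSc command-line flags."""
    return " ".join(f"-{key} {value}" for key, value in options.items())


# Assumed nesting: the coarse level of PCMG uses the "mg_coarse_" prefix.
solver_parameters = {"pc_type": "mg"}
solver_parameters.update(with_prefix("mg_coarse_", coarse_lu))

print(as_command_line(solver_parameters))
```

The dictionary can be handed to the solver as `solver_parameters`, or the 
printed flags can be appended to the `mpiexec` command line instead.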

Thanks,
Pierre

> I have attached a test example written in Firedrake. The output of `ksp_view` 
> after running the code twice is included below for your reference.
> In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 
> 115377847, respectively, while that of the PETSc matrix was only 7346177.
> 
> ```shell
> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   left preconditioning
> --
>   type: lu
>     out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: external
> --
>           type: mumps
>           rows=1050625, cols=1050625
>           package used to perform factorization: mumps
>           total: nonzeros=115025949, allocated nonzeros=115025949
> --
>     type: mpiaij
>     rows=1050625, cols=1050625
>     total: nonzeros=7346177, allocated nonzeros=7346177
>     total number of mallocs used during MatSetValues calls=0
> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
>   type: preonly
>   maximum iterations=10000, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>   left preconditioning
> --
>   type: lu
>     out-of-place factorization
>     tolerance for zero pivot 2.22045e-14
>     matrix ordering: external
> --
>           type: mumps
>           rows=1050625, cols=1050625
>           package used to perform factorization: mumps
>           total: nonzeros=115377847, allocated nonzeros=115377847
> --
>     type: mpiaij
>     rows=1050625, cols=1050625
>     total: nonzeros=7346177, allocated nonzeros=7346177
>     total number of mallocs used during MatSetValues calls=0
> ```
> 
> I would greatly appreciate any insights you may have on this matter. Thank 
> you in advance for your time and assistance.
> 
> Best wishes,
> Zongze
> <test_mumps.py>
