> On 4 Mar 2023, at 2:30 PM, Zongze Yang <[email protected]> wrote:
>
> Hi,
>
> I am writing to ask for your advice about a failure I encountered while
> using multigrid.
> I am currently using multigrid with the coarse problem solved by PCLU.
> However, the PC fails at random with the error below (the value of INFO(2)
> varies from run to run):
> ```shell
> [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
> ```
>
> Upon checking the MUMPS documentation, I found that increasing ICNTL(14) may
> help resolve the issue. I tried higher values of -mat_mumps_icntl_14 (such as
> 40), and the error seemed to disappear once I set ICNTL(14) to 80. However, I
> am still curious as to why MUMPS fails randomly in the first place.
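Some background on ICNTL(14). According to the MUMPS user guide, ICNTL(14) is the percentage increase applied to the working-space estimate computed during the analysis phase. INFOG(1)=-9 means the main real/complex workspace turned out to be too small during factorization. The sketch below models only that arithmetic. The function names and all sizes are hypothetical; they are neither MUMPS internals nor values from this run.

```python
# Hypothetical model of a workspace estimate plus an ICNTL(14)-style margin.
# Function names and sizes are illustrative, not MUMPS internals.


def allocated_workspace(estimate: int, icntl14: int) -> int:
    """Estimate increased by icntl14 percent (rounded down, integer arithmetic)."""
    return estimate + (estimate * icntl14) // 100


def missing_entries(estimate: int, icntl14: int, needed: int) -> int:
    """Entries that would be missing; 0 means the factorization fits."""
    return max(0, needed - allocated_workspace(estimate, icntl14))


def min_icntl14(estimate: int, needed: int) -> int:
    """Smallest integer percentage that makes the allocation cover `needed`."""
    if needed <= estimate:
        return 0
    # ceil(100 * (needed - estimate) / estimate) without floating point
    return ((needed - estimate) * 100 + estimate - 1) // estimate


estimate = 1_000_000  # hypothetical analysis-phase estimate
for needed in (1_300_000, 1_450_000):  # hypothetical needs in two runs
    for margin in (40, 80):
        print(f"needed={needed} margin={margin}% "
              f"missing={missing_entries(estimate, margin, needed)}")
    print(f"  smallest sufficient margin: {min_icntl14(estimate, needed)}%")
```

With these made-up numbers, a 40% margin covers one run but not the other, while 80% covers both. This mirrors the behaviour described above without claiming these are the real sizes.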
>
> Upon further inspection, I found that the number of nonzeros of the MUMPS
> matrix differs from that of the PETSc matrix, and that it changes every time
> I run the code. I am now left with the following questions:
>
> 1. What could be causing the number of nonzeros of the MUMPS matrix to change
> every time I run the code?
Is the Mat being fed to MUMPS distributed on a communicator of size greater
than one?
If yes, then, depending on the pivoting and the renumbering, you may get
non-deterministic results.
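Here is a minimal illustration of one source of such non-determinism, assuming nothing about MUMPS internals. When contributions from several processes are accumulated in an order that depends on scheduling, the floating-point rounding differs. A pivot close to the threshold can then be accepted in one run and rejected in another. The snippet shows only the rounding part, using hypothetical values.

```python
from itertools import permutations

# Toy model: three processes each contribute a value to the same matrix entry,
# and the contributions are accumulated in whatever order they arrive.
# The values are hypothetical; the point is that floating-point addition is
# not associative, so different arrival orders can give different sums.
contributions = [0.1, 0.2, 0.3]


def reduce_in_order(values):
    """Accumulate values left to right, as one particular arrival order would."""
    total = 0.0
    for v in values:
        total += v
    return total


sums = {reduce_in_order(order) for order in permutations(contributions)}
print(f"{len(sums)} distinct results: {sorted(sums)}")
```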
> 2. Why is the number of nonzeros of the MUMPS matrix significantly greater
> than that of the PETSc matrix (as seen in the output of ksp_view, 115025949
> vs 7346177)?
Exact factorizations introduce fill-in.
The number of nonzeros you are seeing for MUMPS is the number of nonzeros in
the factors.
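Fill-in refers to entries that are zero in A but nonzero in L or U. How much appears depends strongly on the ordering; the "matrix ordering: external" line in the ksp_view output suggests the ordering is left to MUMPS. A toy dense LU on a hypothetical 5x5 "arrow" matrix shows the effect. This is not how MUMPS computes its factors, only a count of the resulting nonzeros.

```python
# Count nonzeros of A and of its LU factors (no pivoting) for a small "arrow"
# matrix, to show fill-in and how strongly it depends on the ordering.


def arrow_matrix(n, dense_first, diag=5.0):
    """n x n matrix with `diag` on the diagonal and ones in one dense row/column.

    dense_first=True puts the dense row/column first, False puts it last.
    """
    k = 0 if dense_first else n - 1
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = diag
        if i != k:
            a[i][k] = 1.0
            a[k][i] = 1.0
    return a


def lu_nopivot(a):
    """Doolittle LU on a copy: strict lower part holds L, upper part holds U."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            if lu[i][k] == 0.0:
                continue  # nothing to eliminate in this row
            m = lu[i][k] / lu[k][k]
            lu[i][k] = m
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu


def nnz(a):
    return sum(1 for row in a for x in row if x != 0.0)


n = 5
for dense_first in (True, False):
    a = arrow_matrix(n, dense_first)
    label = "dense row/col first" if dense_first else "dense row/col last"
    print(label, "nnz(A) =", nnz(a), "nnz(L+U) =", nnz(lu_nopivot(a)))
```

When the dense row and column come first, eliminating the first column fills the whole trailing block (13 nonzeros in A, 25 in L+U). When they come last, there is no fill at all.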
> 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is
> the cause of the random failure?
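Yes, MUMPS uses dynamic scheduling, which depends on numerical pivoting and
may generate factors with a different number of nonzeros. If the factors need
more memory than the estimate made during the analysis phase, increased by the
ICNTL(14) percentage, the factorization fails with INFOG(1)=-9.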
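Below is a toy sketch of how a pivoting decision changes the fill. It assumes a simple threshold partial pivoting rule: keep the diagonal pivot a_kk if |a_kk| >= u * max_i |a_ik|, otherwise swap in the row holding the largest entry. MUMPS exposes a relative pivoting threshold of this kind (CNTL(1), available in PETSc as -mat_mumps_cntl_1). The dense Python code is only an illustration on a hypothetical matrix, not the MUMPS algorithm.

```python
# Toy LU with threshold partial pivoting on a hypothetical arrow matrix whose
# (0, 0) entry is small, to show that the pivot choice changes the fill.


def arrow_small_pivot(n=5, diag=5.0, small=0.01):
    """Arrow matrix with dense last row/column and a small (0, 0) entry."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = diag
        if i != n - 1:
            a[i][n - 1] = 1.0
            a[n - 1][i] = 1.0
    a[0][0] = small
    return a


def lu_threshold(a, u):
    """Return (lu, perm, swaps); strict lower of lu holds L, upper holds U."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    swaps = 0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if abs(lu[k][k]) < u * abs(lu[p][k]):  # diagonal pivot too small
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
            swaps += 1
        for i in range(k + 1, n):
            if lu[i][k] == 0.0:
                continue  # nothing to eliminate in this row
            m = lu[i][k] / lu[k][k]
            lu[i][k] = m
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, perm, swaps


def nnz(a):
    return sum(1 for row in a for x in row if x != 0.0)


a = arrow_small_pivot()
for u in (0.001, 0.1):
    lu, perm, swaps = lu_threshold(a, u)
    print(f"u={u}: swaps={swaps}, nnz(A)={nnz(a)}, nnz(L+U)={nnz(lu)}")
```

With u=0.001, the small pivot 0.01 is accepted and L+U keeps the 13 nonzeros of A, at the cost of a multiplier of about 100. With u=0.1, rows 0 and 4 are swapped and three fill entries appear, for 16 nonzeros. In a real factorization, a value near the threshold can go either way depending on rounding, so the size of the factors can differ between runs.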
Thanks,
Pierre
> I have attached a test example written in Firedrake. The output of `ksp_view`
> after running the code twice is included below for your reference.
> In the output, the number of nonzeros of the MUMPS matrix was 115025949 and
> 115377847 in the two runs, while that of the PETSc matrix was only 7346177 in
> both.
>
> ```shell
> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> --
> type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: external
> --
> type: mumps
> rows=1050625, cols=1050625
> package used to perform factorization: mumps
> total: nonzeros=115025949, allocated nonzeros=115025949
> --
> type: mpiaij
> rows=1050625, cols=1050625
> total: nonzeros=7346177, allocated nonzeros=7346177
> total number of mallocs used during MatSetValues calls=0
> (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
> type: preonly
> maximum iterations=10000, initial guess is zero
> tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> left preconditioning
> --
> type: lu
> out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: external
> --
> type: mumps
> rows=1050625, cols=1050625
> package used to perform factorization: mumps
> total: nonzeros=115377847, allocated nonzeros=115377847
> --
> type: mpiaij
> rows=1050625, cols=1050625
> total: nonzeros=7346177, allocated nonzeros=7346177
> total number of mallocs used during MatSetValues calls=0
> ```
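To quantify the difference between the two runs, a small Python helper can extract the "total: nonzeros=" counts per Mat type from the -ksp_view text. The helper name and the trimmed copies of the output are illustrative. The numbers are the ones reported above.

```python
import re

# Extract "total: nonzeros=" counts per Mat type from -ksp_view output and
# compare the factor size with the assembled matrix.

RUN1 = """\
type: mumps
rows=1050625, cols=1050625
package used to perform factorization: mumps
total: nonzeros=115025949, allocated nonzeros=115025949
--
type: mpiaij
rows=1050625, cols=1050625
total: nonzeros=7346177, allocated nonzeros=7346177
"""

RUN2 = RUN1.replace("115025949", "115377847")  # second run's factor size


def nonzeros_by_type(text):
    """Map each 'type: X' seen in the text to the next 'total: nonzeros=N'."""
    counts = {}
    current = None
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # tolerate e-mail quoting
        m = re.match(r"type:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"total: nonzeros=(\d+)", line)
        if m and current is not None and current not in counts:
            counts[current] = int(m.group(1))
    return counts


r1, r2 = nonzeros_by_type(RUN1), nonzeros_by_type(RUN2)
fill = r1["mumps"] / r1["mpiaij"]
drift = (r2["mumps"] - r1["mumps"]) / r1["mumps"]
print(f"fill ratio {fill:.2f}, run-to-run change {100 * drift:.2f}%")
```

By these numbers, the factors hold roughly 15.7 times as many nonzeros as the assembled matrix, and the factor size changes by about 0.3% between the two runs.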
>
> I would greatly appreciate any insights you may have on this matter. Thank
> you in advance for your time and assistance.
>
> Best wishes,
> Zongze
> <test_mumps.py>