On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <[email protected]> wrote:

> > On 4 Mar 2023, at 2:30 PM, Zongze Yang <[email protected]> wrote:
> >
> > Hi,
> >
> > I am writing to seek your advice regarding a problem I encountered while using multigrid to solve a certain issue.
> > I am currently using multigrid with the coarse problem solved by PCLU. However, the PC failed randomly with the error below (the value of INFO(2) may differ):
> > ```shell
> > [ 0] Error reported by MUMPS in numerical factorization phase: INFOG(1)=-9, INFO(2)=36
> > ```
> >
> > Upon checking the documentation of MUMPS, I discovered that increasing the value of ICNTL(14) may help resolve the issue. Specifically, I set the option -mat_mumps_icntl_14 to a higher value (such as 40), and the error seemed to disappear after I set the value of ICNTL(14) to 80. However, I am still curious as to why MUMPS failed randomly in the first place.
> >
> > Upon further inspection, I found that the number of nonzeros of the PETSc matrix and the MUMPS matrix were different every time I ran the code. I am now left with the following questions:
> >
> > 1. What could be causing the number of nonzeros of the MUMPS matrix to change every time I run the code?
>
> Is the Mat being fed to MUMPS distributed on a communicator of size greater than one?
> If yes, then, depending on the pivoting and the renumbering, you may get non-deterministic results.

Hi, Pierre,

Thank you for your prompt reply. Yes, the size of the communicator is greater than one.
Even if the size of the communicator is the same from run to run, are the results still non-deterministic?
Can I assume the Mat being fed to MUMPS is the same in this case?
Are the pivoting and renumbering all done by MUMPS rather than PETSc?

> > 2. Why is the number of nonzeros of the MUMPS matrix significantly greater than that of the PETSc matrix (as seen in the output of ksp_view, 115025949 vs 7346177)?
>
> Exact factorizations introduce fill-in.
> The number of nonzeros you are seeing for MUMPS is the number of nonzeros in the factors.
>
> > 3. Is it possible that the varying number of nonzeros of the MUMPS matrix is the cause of the random failure?
>
> Yes, MUMPS uses dynamic scheduling, which will depend on numerical pivoting, and which may generate factors with different number of nonzeros.

Got it. Thank you for your clear explanation.

Zongze

> Thanks,
> Pierre
>
> > I have attached a test example written in Firedrake. The output of `ksp_view` after running the code twice is included below for your reference.
> > In the output, the number of nonzeros of the MUMPS matrix was 115025949 and 115377847, respectively, while that of the PETSc matrix was only 7346177.
> >
> > ```shell
> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
> > type: preonly
> > maximum iterations=10000, initial guess is zero
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > --
> > type: lu
> > out-of-place factorization
> > tolerance for zero pivot 2.22045e-14
> > matrix ordering: external
> > --
> > type: mumps
> > rows=1050625, cols=1050625
> > package used to perform factorization: mumps
> > total: nonzeros=115025949, allocated nonzeros=115025949
> > --
> > type: mpiaij
> > rows=1050625, cols=1050625
> > total: nonzeros=7346177, allocated nonzeros=7346177
> > total number of mallocs used during MatSetValues calls=0
> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view ::ascii_info_detail | grep -A3 "type: "
> > type: preonly
> > maximum iterations=10000, initial guess is zero
> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > left preconditioning
> > --
> > type: lu
> > out-of-place factorization
> > tolerance for zero pivot 2.22045e-14
> > matrix ordering: external
> > --
> > type: mumps
> > rows=1050625, cols=1050625
> > package used to perform factorization: mumps
> > total: nonzeros=115377847, allocated nonzeros=115377847
> > --
> > type: mpiaij
> > rows=1050625, cols=1050625
> > total: nonzeros=7346177, allocated nonzeros=7346177
> > total number of mallocs used during MatSetValues calls=0
> > ```
> >
> > I would greatly appreciate any insights you may have on this matter. Thank you in advance for your time and assistance.
> >
> > Best wishes,
> > Zongze
> > <test_mumps.py>
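
To put the two `-ksp_view` outputs quoted in this message in perspective, the short calculation below uses only the nonzero counts reported there to estimate the fill-in ratio of the MUMPS factors and how much the factor size changed between the two runs. The constant and function names are illustrative only and are not part of PETSc or MUMPS.

```python
# Nonzero counts copied from the two -ksp_view outputs in this thread.
NNZ_PETSC = 7_346_177                      # mpiaij matrix, identical in both runs
NNZ_FACTORS = [115_025_949, 115_377_847]   # MUMPS factors, run 1 and run 2


def fill_ratio(nnz_factors, nnz_matrix):
    """Nonzeros in the factors divided by nonzeros in the original matrix."""
    return nnz_factors / nnz_matrix


def relative_spread(values):
    """Difference between largest and smallest value, relative to the smallest."""
    return (max(values) - min(values)) / min(values)


ratios = [fill_ratio(n, NNZ_PETSC) for n in NNZ_FACTORS]
spread = relative_spread(NNZ_FACTORS)

for run, ratio in enumerate(ratios, start=1):
    print(f"run {run}: fill ratio = {ratio:.2f}")
print(f"run-to-run spread in factor nonzeros = {spread:.2%}")
```

The factors hold roughly 15.7 times as many nonzeros as the assembled matrix, and the two runs differ by about 0.3%. A variation of that size would be consistent with a workspace estimate that is sometimes, but not always, too small at a given ICNTL(14) setting.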

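
As a sketch of the workaround described in the original message, the helper below builds a PETSc options dictionary that raises ICNTL(14) for an LU coarse solve handled by MUMPS, in the dictionary form that Firedrake's `solver_parameters` accepts. The `mg_coarse_` prefix and the helper name `coarse_mumps_options` are assumptions made for illustration; the actual prefix depends on how the multigrid solver is configured in `test_mumps.py`, which is not shown in this thread.

```python
def coarse_mumps_options(icntl_14=80, prefix="mg_coarse_"):
    """Build PETSc options for a MUMPS LU coarse solve with extra workspace.

    icntl_14 is the value passed to -mat_mumps_icntl_14, i.e. the percentage
    increase of the working space estimated by MUMPS (80 is the value that
    removed the error in the original report).
    prefix is an ASSUMPTION: adjust it to the prefix of the actual coarse solver.
    """
    return {
        f"{prefix}ksp_type": "preonly",
        f"{prefix}pc_type": "lu",
        f"{prefix}pc_factor_mat_solver_type": "mumps",
        f"{prefix}mat_mumps_icntl_14": icntl_14,
    }


options = coarse_mumps_options(icntl_14=80)

# Print the equivalent command-line form of each option.
for name, value in options.items():
    print(f"-{name} {value}")
```

Such a dictionary would typically be merged into the rest of the solver parameters. Raising ICNTL(14) only gives MUMPS more headroom; it does not make the factorization deterministic.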