> On 4 Mar 2023, at 3:26 PM, Zongze Yang <[email protected]> wrote:
> 
> 
> 
> 
>> On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet <[email protected]> wrote:
>> 
>> 
>>>> On 4 Mar 2023, at 2:51 PM, Zongze Yang <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <[email protected]> wrote:
>>>>> 
>>>>> 
>>>>> > On 4 Mar 2023, at 2:30 PM, Zongze Yang <[email protected]> wrote:
>>>>> > 
>>>>> > Hi,
>>>>> > 
>>>>> > I am writing to seek your advice regarding a problem I encountered
>>>>> > while using multigrid to solve a certain issue.
>>>>> > I am currently using multigrid with the coarse problem solved by PCLU.
>>>>> > However, the PC failed randomly with the error below (the value of
>>>>> > INFO(2) may differ):
>>>>> > ```shell
>>>>> > [ 0] Error reported by MUMPS in numerical factorization phase:
>>>>> > INFOG(1)=-9, INFO(2)=36
>>>>> > ```
>>>>> > 
>>>>> > Upon checking the documentation of MUMPS, I discovered that increasing
>>>>> > the value of ICNTL(14) may help resolve the issue. Specifically, I set
>>>>> > the option -mat_mumps_icntl_14 to a higher value (such as 40), and the
>>>>> > error seemed to disappear after I set the value of ICNTL(14) to 80.
>>>>> > However, I am still curious as to why MUMPS failed randomly in the
>>>>> > first place.
>>>>> > 
>>>>> > Upon further inspection, I found that the number of nonzeros of the
>>>>> > PETSc matrix and the MUMPS matrix were different every time I ran the
>>>>> > code. I am now left with the following questions:
>>>>> > 
>>>>> > 1. What could be causing the number of nonzeros of the MUMPS matrix to
>>>>> > change every time I run the code?
>>>>> 
>>>>> Is the Mat being fed to MUMPS distributed on a communicator of size
>>>>> greater than one?
>>>>> If yes, then, depending on the pivoting and the renumbering, you may get
>>>>> non-deterministic results.
>>>> 
>>>> Hi, Pierre,
>>>> Thank you for your prompt reply. Yes, the size of the communicator is
>>>> greater than one.
>>>> Even if the size of the communicator stays the same, are the results still
>>>> non-deterministic?
>>> 
>>> In the most general case, yes.
>>> 
>>> Can I assume the Mat being fed to MUMPS is the same in this case?
>> 
>> Are you doing algebraic or geometric multigrid?
>> Are the prolongation operators computed by Firedrake or by PETSc, e.g.,
>> through GAMG?
>> If it’s the latter, I believe the Mat being fed to MUMPS should always be
>> the same.
>> If it’s the former, you’ll have to ask the Firedrake people if there may be
>> non-determinism in the coarsening process.
> 
> I am using geometric multigrid, and the prolongation operators, I think, are
> computed by Firedrake.
> Thanks for your suggestion, I will ask the Firedrake people.
> 
>> 
>>> Are the pivoting and renumbering all done by MUMPS rather than PETSc?
>> 
>> You could provide your own numbering, but by default, this is outsourced to
>> MUMPS indeed, which will itself outsource this to METIS, AMD, etc.
> 
> I think I won't do this.
> By the way, does the result of superlu_dist show a similar non-determinism?
SuperLU_DIST uses static pivoting as far as I know, so it may be more deterministic.

Thanks,
Pierre

> Thanks,
> Zongze
> 
>> 
>> Thanks,
>> Pierre
>> 
>>>> 
>>>> > 2. Why is the number of nonzeros of the MUMPS matrix significantly
>>>> > greater than that of the PETSc matrix (as seen in the output of
>>>> > ksp_view, 115025949 vs 7346177)?
>>>> 
>>>> Exact factorizations introduce fill-in.
>>>> The number of nonzeros you are seeing for MUMPS is the number of nonzeros
>>>> in the factors.
>>>> 
>>>> > 3. Is it possible that the varying number of nonzeros of the MUMPS
>>>> > matrix is the cause of the random failure?
>>>> 
>>>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical
>>>> pivoting, and which may generate factors with different numbers of nonzeros.
>>> 
>>> Got it. Thank you for your clear explanation.
>>> Zongze
>>> 
>>>> 
>>>> Thanks,
>>>> Pierre
>>>> 
>>>> > I have attached a test example written in Firedrake. The output of
>>>> > `ksp_view` after running the code twice is included below for your
>>>> > reference.
>>>> > In the output, the number of nonzeros of the MUMPS matrix was 115025949
>>>> > and 115377847, respectively, while that of the PETSc matrix was only
>>>> > 7346177.
>>>> > 
>>>> > ```shell
>>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view
>>>> > ::ascii_info_detail | grep -A3 "type: "
>>>> > type: preonly
>>>> > maximum iterations=10000, initial guess is zero
>>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> > left preconditioning
>>>> > --
>>>> > type: lu
>>>> > out-of-place factorization
>>>> > tolerance for zero pivot 2.22045e-14
>>>> > matrix ordering: external
>>>> > --
>>>> > type: mumps
>>>> > rows=1050625, cols=1050625
>>>> > package used to perform factorization: mumps
>>>> > total: nonzeros=115025949, allocated nonzeros=115025949
>>>> > --
>>>> > type: mpiaij
>>>> > rows=1050625, cols=1050625
>>>> > total: nonzeros=7346177, allocated nonzeros=7346177
>>>> > total number of mallocs used during MatSetValues calls=0
>>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view
>>>> > ::ascii_info_detail | grep -A3 "type: "
>>>> > type: preonly
>>>> > maximum iterations=10000, initial guess is zero
>>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> > left preconditioning
>>>> > --
>>>> > type: lu
>>>> > out-of-place factorization
>>>> > tolerance for zero pivot 2.22045e-14
>>>> > matrix ordering: external
>>>> > --
>>>> > type: mumps
>>>> > rows=1050625, cols=1050625
>>>> > package used to perform factorization: mumps
>>>> > total: nonzeros=115377847, allocated nonzeros=115377847
>>>> > --
>>>> > type: mpiaij
>>>> > rows=1050625, cols=1050625
>>>> > total: nonzeros=7346177, allocated nonzeros=7346177
>>>> > total number of mallocs used during MatSetValues calls=0
>>>> > ```
>>>> > 
>>>> > I would greatly appreciate any insights you may have on this matter.
>>>> > Thank you in advance for your time and assistance.
>>>> > 
>>>> > Best wishes,
>>>> > Zongze
>>>> > <test_mumps.py>
>>
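
The workaround discussed in the thread, raising the MUMPS workspace relaxation ICNTL(14) for the coarse-grid LU factorization, can also be expressed directly in a Firedrake solver configuration. The sketch below is hypothetical and is not the attached test_mumps.py: the mesh, equation, and every option other than -mat_mumps_icntl_14 (the one confirmed in the thread) are illustrative, and depending on how the PETSc option prefixes nest in a given setup, a prefixed key such as mg_coarse_mat_mumps_icntl_14 may be needed instead of the unprefixed one.

```python
# Hypothetical sketch (not the attached test_mumps.py): a Poisson solve using
# geometric multigrid whose coarse problem is factored by MUMPS, with
# ICNTL(14), the percentage increase of the estimated working space, raised
# to 80, the value the thread reports as making the INFOG(1)=-9 failure
# disappear.
from firedrake import *

mesh = UnitSquareMesh(64, 64)
hierarchy = MeshHierarchy(mesh, 3)      # refinement hierarchy for geometric MG
V = FunctionSpace(hierarchy[-1], "CG", 1)

u = TrialFunction(V)
v = TestFunction(V)
a = inner(grad(u), grad(v)) * dx
rhs = inner(Constant(1.0), v) * dx
bcs = DirichletBC(V, 0, (1, 2, 3, 4))   # all four sides of the unit square

parameters = {
    "ksp_type": "cg",
    "pc_type": "mg",
    # Coarse level: direct solve through MUMPS.
    "mg_coarse_ksp_type": "preonly",
    "mg_coarse_pc_type": "lu",
    "mg_coarse_pc_factor_mat_solver_type": "mumps",
    # Workspace relaxation; mirrors the -mat_mumps_icntl_14 80 option used in
    # the thread. A prefixed key (e.g. "mg_coarse_mat_mumps_icntl_14") may be
    # required instead, depending on the option-prefix nesting.
    "mat_mumps_icntl_14": 80,
}

uh = Function(V)
solve(a == rhs, uh, bcs=bcs, solver_parameters=parameters)
```

On the command line, the same relaxation is the -mat_mumps_icntl_14 80 option mentioned in the thread, appended to the mpiexec invocation shown in the quoted ksp_view runs.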
