> On 4 Mar 2023, at 3:26 PM, Zongze Yang <[email protected]> wrote:
> 
> 
> 
> 
>> On Sat, 4 Mar 2023 at 22:03, Pierre Jolivet <[email protected]> wrote:
>> 
>> 
>>>> On 4 Mar 2023, at 2:51 PM, Zongze Yang <[email protected]> wrote:
>>>> 
>>>> 
>>>> 
>>>> On Sat, 4 Mar 2023 at 21:37, Pierre Jolivet <[email protected]> wrote:
>>>>> 
>>>>> 
>>>>> > On 4 Mar 2023, at 2:30 PM, Zongze Yang <[email protected]> wrote:
>>>>> > 
>>>>> > Hi,
>>>>> > 
>>>>> > I am writing to seek your advice regarding a problem I encountered
>>>>> > while using multigrid to solve a certain issue.
>>>>> > I am currently using multigrid with the coarse problem solved by PCLU.
>>>>> > However, the PC failed randomly with the error below (the value of
>>>>> > INFO(2) may differ):
>>>>> > ```shell
>>>>> > [ 0] Error reported by MUMPS in numerical factorization phase:
>>>>> > INFOG(1)=-9, INFO(2)=36
>>>>> > ```
>>>>> > 
>>>>> > Upon checking the documentation of MUMPS, I discovered that increasing
>>>>> > the value of ICNTL(14) may help resolve the issue. Specifically, I set
>>>>> > the option -mat_mumps_icntl_14 to a higher value (such as 40), and the
>>>>> > error seemed to disappear after I set the value of ICNTL(14) to 80.
>>>>> > However, I am still curious as to why MUMPS failed randomly in the
>>>>> > first place.
>>>>> > 
>>>>> > Upon further inspection, I found that the number of nonzeros of the
>>>>> > PETSc matrix and the MUMPS matrix were different every time I ran the
>>>>> > code. I am now left with the following questions:
>>>>> > 
>>>>> > 1. What could be causing the number of nonzeros of the MUMPS matrix to
>>>>> > change every time I run the code?
>>>>> 
>>>>> Is the Mat being fed to MUMPS distributed on a communicator of size
>>>>> greater than one?
>>>>> If yes, then, depending on the pivoting and the renumbering, you may get
>>>>> non-deterministic results.
>>>> 
>>>> Hi, Pierre,
>>>> Thank you for your prompt reply. Yes, the size of the communicator is
>>>> greater than one.
>>>> Even if the size of the communicator stays the same, are the results still
>>>> non-deterministic?
>>> 
>>> In the most general case, yes.
>>> 
>>> Can I assume the Mat being fed to MUMPS is the same in this case?
>> 
>> Are you doing algebraic or geometric multigrid?
>> Are the prolongation operators computed by Firedrake or by PETSc, e.g.,
>> through GAMG?
>> If it’s the latter, I believe the Mat being fed to MUMPS should always be
>> the same.
>> If it’s the former, you’ll have to ask the Firedrake people if there may be
>> non-determinism in the coarsening process.
> 
> I am using geometric multigrid, and the prolongation operators, I think, are
> computed by Firedrake.
> Thanks for your suggestion, I will ask the Firedrake people.
> 
>> 
>>> Are the pivoting and renumbering all done by MUMPS rather than PETSc?
>> 
>> You could provide your own numbering, but by default, this is outsourced to
>> MUMPS indeed, which will itself outsource this to METIS, AMD, etc.
> 
> I think I won't do this.
> By the way, does the result of superlu_dist show a similar non-determinism?
SuperLU_DIST uses static pivoting as far as I know, so it may be more deterministic.

Thanks,
Pierre

> Thanks,
> Zongze
> 
>> 
>> Thanks,
>> Pierre
>> 
>>>> 
>>>> > 2. Why is the number of nonzeros of the MUMPS matrix significantly
>>>> > greater than that of the PETSc matrix (as seen in the output of
>>>> > ksp_view, 115025949 vs 7346177)?
>>>> 
>>>> Exact factorizations introduce fill-in.
>>>> The number of nonzeros you are seeing for MUMPS is the number of nonzeros
>>>> in the factors.
>>>> 
>>>> > 3. Is it possible that the varying number of nonzeros of the MUMPS
>>>> > matrix is the cause of the random failure?
>>>> 
>>>> Yes, MUMPS uses dynamic scheduling, which will depend on numerical
>>>> pivoting, and which may generate factors with different numbers of nonzeros.
>>> 
>>> Got it. Thank you for your clear explanation.
>>> Zongze
>>> 
>>>> 
>>>> Thanks,
>>>> Pierre
>>>> 
>>>> > I have attached a test example written in Firedrake. The output of
>>>> > `ksp_view` after running the code twice is included below for your
>>>> > reference.
>>>> > In the output, the number of nonzeros of the MUMPS matrix was 115025949
>>>> > and 115377847, respectively, while that of the PETSc matrix was only
>>>> > 7346177.
>>>> > 
>>>> > ```shell
>>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view
>>>> > ::ascii_info_detail | grep -A3 "type: "
>>>> > type: preonly
>>>> > maximum iterations=10000, initial guess is zero
>>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> > left preconditioning
>>>> > --
>>>> > type: lu
>>>> > out-of-place factorization
>>>> > tolerance for zero pivot 2.22045e-14
>>>> > matrix ordering: external
>>>> > --
>>>> > type: mumps
>>>> > rows=1050625, cols=1050625
>>>> > package used to perform factorization: mumps
>>>> > total: nonzeros=115025949, allocated nonzeros=115025949
>>>> > --
>>>> > type: mpiaij
>>>> > rows=1050625, cols=1050625
>>>> > total: nonzeros=7346177, allocated nonzeros=7346177
>>>> > total number of mallocs used during MatSetValues calls=0
>>>> > (complex-int32-mkl) $ mpiexec -n 32 python test_mumps.py -ksp_view
>>>> > ::ascii_info_detail | grep -A3 "type: "
>>>> > type: preonly
>>>> > maximum iterations=10000, initial guess is zero
>>>> > tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>> > left preconditioning
>>>> > --
>>>> > type: lu
>>>> > out-of-place factorization
>>>> > tolerance for zero pivot 2.22045e-14
>>>> > matrix ordering: external
>>>> > --
>>>> > type: mumps
>>>> > rows=1050625, cols=1050625
>>>> > package used to perform factorization: mumps
>>>> > total: nonzeros=115377847, allocated nonzeros=115377847
>>>> > --
>>>> > type: mpiaij
>>>> > rows=1050625, cols=1050625
>>>> > total: nonzeros=7346177, allocated nonzeros=7346177
>>>> > total number of mallocs used during MatSetValues calls=0
>>>> > ```
>>>> > 
>>>> > I would greatly appreciate any insights you may have on this matter.
>>>> > Thank you in advance for your time and assistance.
>>>> > 
>>>> > Best wishes,
>>>> > Zongze
>>>> > <test_mumps.py>
>>
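
The workaround discussed in the thread, raising the MUMPS workspace relaxation ICNTL(14) for the coarse-grid LU factorization, can also be expressed directly in a Firedrake solver configuration. The sketch below is hypothetical and is not the attached test_mumps.py: the mesh, equation, and every option other than -mat_mumps_icntl_14 (the one confirmed in the thread) are illustrative, and depending on how the PETSc option prefixes nest in a given setup, a prefixed key such as mg_coarse_mat_mumps_icntl_14 may be needed instead of the unprefixed one.

```python
# Hypothetical sketch (not the attached test_mumps.py): a Poisson solve using
# geometric multigrid whose coarse problem is factored by MUMPS, with
# ICNTL(14), the percentage increase of the estimated working space, raised
# to 80, the value the thread reports as making the INFOG(1)=-9 failure
# disappear.
from firedrake import *

mesh = UnitSquareMesh(64, 64)
hierarchy = MeshHierarchy(mesh, 3)      # refinement hierarchy for geometric MG
V = FunctionSpace(hierarchy[-1], "CG", 1)

u = TrialFunction(V)
v = TestFunction(V)
a = inner(grad(u), grad(v)) * dx
rhs = inner(Constant(1.0), v) * dx
bcs = DirichletBC(V, 0, (1, 2, 3, 4))   # all four sides of the unit square

parameters = {
    "ksp_type": "cg",
    "pc_type": "mg",
    # Coarse level: direct solve through MUMPS.
    "mg_coarse_ksp_type": "preonly",
    "mg_coarse_pc_type": "lu",
    "mg_coarse_pc_factor_mat_solver_type": "mumps",
    # Workspace relaxation; mirrors the -mat_mumps_icntl_14 80 option used in
    # the thread. A prefixed key (e.g. "mg_coarse_mat_mumps_icntl_14") may be
    # required instead, depending on the option-prefix nesting.
    "mat_mumps_icntl_14": 80,
}

uh = Function(V)
solve(a == rhs, uh, bcs=bcs, solver_parameters=parameters)
```

On the command line, the same relaxation is the -mat_mumps_icntl_14 80 option mentioned in the thread, appended to the mpiexec invocation shown in the quoted ksp_view runs.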
