Re: [petsc-users] MUMPS error and superLU error

Barry Smith Mon, 22 Jun 2015 10:44:09 -0700

  There is nothing we can really do to help on the PETSc side. I do note from 
the output


 REDISTRIB: TOTAL DATA LOCAL/SENT         =   328575589  1437471711
 GLOBAL TIME FOR MATRIX DISTRIBUTION       =    206.6792
 ** Memory relaxation parameter ( ICNTL(14)  )            :        35
 ** Rank of processor needing largest memory in facto     :        30
 ** Space in MBYTES used by this processor for facto      :     21593
 ** Avg. Space in MBYTES per working proc during facto    :      7708

some processes (like rank 30) require roughly three times as much memory as the 
average, so better load balancing of the matrix during the factorization might 
help with memory usage.
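You can quantify the imbalance Barry points out from the three MUMPS statistics above by dividing the peak per-process factorization space by the average. Below is a minimal Python sketch. It assumes the lines are copied verbatim from a log written with ICNTL(4) >= 2. `stat_value` is a hypothetical helper, not part of PETSc or MUMPS:

```python
# Lines copied from the MUMPS statistics quoted above.
MUMPS_STATS = """\
 ** Rank of processor needing largest memory in facto     :        30
 ** Space in MBYTES used by this processor for facto      :     21593
 ** Avg. Space in MBYTES per working proc during facto    :      7708
"""


def stat_value(text, label):
    """Return the integer after the last colon on the first line containing `label`."""
    for line in text.splitlines():
        if label in line:
            return int(line.rsplit(":", 1)[1])
    raise KeyError(label)


rank = stat_value(MUMPS_STATS, "Rank of processor needing largest memory")
peak_mb = stat_value(MUMPS_STATS, "used by this processor for facto")
avg_mb = stat_value(MUMPS_STATS, "Avg. Space in MBYTES")
imbalance = peak_mb / avg_mb

print(f"rank {rank}: {peak_mb} MB vs average {avg_mb} MB -> {imbalance:.2f}x")
```

With these numbers, the busiest rank needs about 2.8 times the average. That rank, not the average one, is the one that has to fit in the memory available to it.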

  Barry


> On Jun 22, 2015, at 10:57 AM, venkatesh g <[email protected]> wrote:
> 
> Hi,
> As you suggested, I have reformulated my matrix eigenvalue problem, writing the 
> governing equations in a different form, to find out why B was singular.
> 
> Now B is no longer singular: both A and B are invertible in Ax = lambda Bx.
> 
> I still get an error from MUMPS because it uses too much memory (the error 
> log is attached).
> 
> I gave the command: aprun -n 240 -N 24 ./ex7 -f1 A100t -f2 B100t -st_type 
> sinvert -eps_target 0.01 -st_ksp_type preonly -st_pc_type lu 
> -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-5 
> -mat_mumps_icntl_4 2 -evecs v100t
> 
> About 60% of the entries of A are zeros.
> 
> Kindly help me.
> 
> Venkatesh 
> 
> On Sun, May 31, 2015 at 8:04 PM, Hong <[email protected]> wrote:
> venkatesh,
> 
> As we discussed previously, both MUMPS and SuperLU_DIST failed even on smaller 
> problems, although MUMPS reported an "OOM" error during numerical 
> factorization.
> 
> You acknowledged that B is singular, which may require reformulating your 
> eigenvalue problem. The option '-st_type sinvert' likely uses B^{-1} (have you 
> read the SLEPc manual?), which could be the source of the trouble.
> 
> Please investigate your model to understand why B is singular, and check 
> whether there is a way to remove its null space before submitting a large 
> simulation.
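For reference, the SLEPc users manual describes the shift-and-invert spectral transformation with the operator below. Here σ is the target (for example `-eps_target 0.01`). Start from Ax = λBx and subtract σBx from both sides:

```latex
\begin{aligned}
Ax = \lambda Bx
  &\;\Longrightarrow\; (A - \sigma B)\,x = (\lambda - \sigma)\,Bx \\
  &\;\Longrightarrow\; (A - \sigma B)^{-1} B\,x = \theta\,x,
  \qquad \theta = \frac{1}{\lambda - \sigma}.
\end{aligned}
```

Under this transformation, the matrix passed to the LU factorization (`-st_pc_type lu`) is A − σB, which must be nonsingular. B itself is not inverted. A vector with Bx = 0 gives θ = 0, which corresponds to an infinite eigenvalue λ.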
> 
> Hong
> 
> 
> On Sun, May 31, 2015 at 8:36 AM, Dave May <[email protected]> wrote:
> It failed due to a lack of memory. "OOM" stands for "out of memory". OOM 
> killer terminated your job means you ran out of memory.
> 
> On Sunday, 31 May 2015, venkatesh g <[email protected]> wrote:
> Hi all,
> 
> I tried to run my generalized eigenproblem on 120 x 24 = 2880 cores. 
> The matrix files are 20 GB for A and 5 GB for B.
> 
> It was killed after 7 hours of run time. Please see the MUMPS error log. Why 
> does it fail?
> I gave the command: 
> 
> aprun -n 240 -N 24 ./ex7 -f1 a110t -f2 b110t -st_type sinvert -eps_nev 1 
> -log_summary -st_ksp_type preonly -st_pc_type lu 
> -st_pc_factor_mat_solver_package mumps -mat_mumps_cntl_1 1e-2
> 
> Kindly let me know.
> 
> cheers,
> Venkatesh
> 
> On Fri, May 29, 2015 at 10:46 PM, venkatesh g <[email protected]> wrote:
> Hi Matt, users,
> 
> Thanks for the info. Do you also use PETSc and SLEPc with MUMPS? I get a 
> segmentation fault if I increase my matrix size.
> 
> Can you suggest other software for a parallel direct QR solver, since LU may 
> not be suitable for a singular B matrix in Ax = lambda Bx? I am attaching the 
> MUMPS log from the working version.
> 
> My matrix size here is around 47000x47000. If I am not wrong, the memory 
> usage per core is 272 MB.
> 
> Can you tell me whether I am wrong, or whether it really is this light on 
> memory for this matrix?
> 
> Thanks
> cheers,
> Venkatesh
> 
> On Fri, May 29, 2015 at 4:00 PM, Matt Landreman <[email protected]> 
> wrote:
> Dear Venkatesh,
> 
> As you can see in the error log, you are now getting a segmentation fault, 
> which is almost certainly a separate issue from the info(1)=-9 memory problem 
> you had previously. Here is one idea which may or may not help. I've used 
> mumps on the NERSC Edison system, and I found that I sometimes get 
> segmentation faults when using the default Intel compiler. When I switched to 
> the Cray compiler, the problem disappeared. So you could perhaps try a 
> different compiler if one is available on your system.
> 
> Matt
> 
> On May 29, 2015 4:04 AM, "venkatesh g" <[email protected]> wrote:
> Hi Matt,
> 
> I did what you suggested and read the section of the manual on the CNTL 
> parameters. With CNTL(1)=1e-4, it works.
> 
> But that was a 46000x46000 test matrix. The actual matrix is 108900x108900 
> and will grow in the future.
> 
> For it, I get a "memory allocation failed" error. The binary matrix files are 
> 20 GB for A and 5 GB for B.
> 
> I have submitted the job on 240 processors with 4 GB of RAM each, and also on 
> 128 processors with 512 GB of RAM in total.
> 
> In both cases, it fails with the following error saying there is not enough 
> memory. But a 90000x90000 problem ran serially in Matlab with < 256 GB of 
> RAM.
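A rough upper bound helps put these sizes in perspective. A dense n x n LU factor, with L and U packed together, holds n^2 entries. Below is a minimal Python sketch. It assumes 8-byte real entries (16 bytes if the problem is complex) and ignores integer index storage and MUMPS workspace. `dense_gb` is a hypothetical helper:

```python
def dense_gb(n, bytes_per_entry=8):
    """Storage for a dense n x n matrix in decimal GB (8 = real double, 16 = complex double)."""
    return n * n * bytes_per_entry / 1e9


# Problem sizes mentioned in the thread
sizes = {"Matlab run": 90_000, "target problem": 108_900}
for label, n in sizes.items():
    print(f"{label:15s} n = {n:>7d}: real {dense_gb(n):7.2f} GB, "
          f"complex {dense_gb(n, 16):7.2f} GB")

# Evenly spreading a dense real factor of the target problem over each job layout
for procs in (240, 128):
    share = dense_gb(108_900) / procs
    print(f"{procs:3d} processes x 4 GB = {procs * 4} GB total; "
          f"even share of a dense factor = {share:.2f} GB per process")
```

The dense bound for n = 90000 (about 65 GB, real) is consistent with the Matlab run fitting in 256 GB. MUMPS may report a total factorization space (printed with the ICNTL(4) diagnostics) far above this bound for the same n. In that case the excess comes from workspace and buffers, including the ICNTL(14) relaxation, rather than from the factor entries themselves.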
> 
> Kindly let me know.
> 
> Venkatesh
> 
> On Tue, May 26, 2015 at 8:02 PM, Matt Landreman <[email protected]> 
> wrote:
> Hi Venkatesh,
> 
> I've struggled a bit with mumps memory allocation too.  I think the behavior 
> of mumps is roughly the following. First, in the "analysis step", mumps 
> computes a minimum memory required based on the structure of nonzeros in the 
> matrix.  Then when it actually goes to factorize the matrix, if it ever 
> encounters an element smaller than CNTL(1) (default=0.01) in the diagonal of 
> a sub-matrix it is trying to factorize, it modifies the ordering to avoid the 
> small pivot, which increases the fill-in (hence memory needed).  ICNTL(14) 
> sets the margin allowed for this unanticipated fill-in.  Setting 
> ICNTL(14)=200000 as in your email is not the solution, since this means mumps 
> asks for a huge amount of memory at the start. Better would be to lower 
> CNTL(1) or (I think) use static pivoting (CNTL(4)).  Read the section in the 
> mumps manual about these CNTL parameters. I typically set CNTL(1)=1e-6, which 
> eliminated all the INFO(1)=-9 errors for my problem, without having to modify 
> ICNTL(14).
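The pivot test described above is a relative one. A candidate diagonal entry is compared with the largest entry in its column of the fully summed block, scaled by CNTL(1). Below is a minimal Python sketch of such a threshold test, for illustration only. The MUMPS manual documents exactly which entries MUMPS scans and how it treats symmetric matrices. `pivot_accepted` is a hypothetical helper:

```python
def pivot_accepted(column, threshold):
    """Relative threshold pivoting test.

    column[0] is the candidate diagonal pivot; column[1:] are the entries
    below it in the same column. The pivot is accepted when
    |pivot| >= threshold * max |entry in column|.
    """
    largest = max(abs(x) for x in column)
    return abs(column[0]) >= threshold * largest


# A small diagonal entry next to larger off-diagonal entries
column = [1e-3, 0.5, -2.0]

# 1e-2: default quoted above; 1e-4: value that worked later; 1e-6: value suggested above
for cntl1 in (1e-2, 1e-4, 1e-6):
    status = "accepted" if pivot_accepted(column, cntl1) else "rejected (delayed)"
    print(f"CNTL(1) = {cntl1:g}: {status}")
```

Lowering CNTL(1) accepts more small pivots. Fewer pivots are then delayed and less unanticipated fill-in occurs, at the price of possibly reduced numerical stability.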
> 
> Also, I recommend running with ICNTL(4)=3 to display diagnostics. Look for 
> the line in standard output that says "TOTAL     space in MBYTES for IC 
> factorization".  This is the amount of memory that mumps is trying to 
> allocate, and for the default ICNTL(14), it should be similar to matlab's 
> need.
> 
> Hope this helps,
> -Matt Landreman
> University of Maryland
> 
> On Tue, May 26, 2015 at 10:03 AM, venkatesh g <[email protected]> wrote:
> I posted on the MUMPS forums a while ago, but no one has replied.
> 
> I am solving a large generalized eigenvalue problem.
> 
> After running the following command, I get the attached error:
> 
> /cluster/share/venkatesh/petsc-3.5.3/linux-gnu/bin/mpiexec -np 64 -hosts 
> compute-0-4,compute-0-6,compute-0-7,compute-0-8 ./ex7 -f1 a72t -f2 b72t 
> -st_type sinvert -eps_nev 3 -eps_target 0.5 -st_ksp_type preonly -st_pc_type 
> lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 200000
> 
> It is impossible to allocate so much memory per processor: it is asking for 
> around 70 GB per processor.
> 
> A serial job in MATLAB for the same matrices takes < 60GB. 
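The MUMPS manual documents ICNTL(14) as a percentage increase over the working space estimated during analysis. That is why `-mat_mumps_icntl_14 200000` produces such large requests. Below is a minimal Python sketch of the arithmetic. It simplifies the relaxation to a single multiplier on the estimate, and the helper names are hypothetical:

```python
def relaxed_mb(estimate_mb, icntl14):
    """Working space after applying ICNTL(14) as a percentage increase (simplified)."""
    return estimate_mb * (1 + icntl14 / 100)


def implied_estimate_mb(requested_mb, icntl14):
    """Invert relaxed_mb: the estimate that would lead to a given request."""
    return requested_mb / (1 + icntl14 / 100)


multiplier_35 = relaxed_mb(1.0, 35)              # ICNTL(14)=35, as in the log at the top of the thread
multiplier_200000 = relaxed_mb(1.0, 200_000)     # -mat_mumps_icntl_14 200000
estimate = implied_estimate_mb(70_000, 200_000)  # ~70 GB (70000 MB) requested per process

print(f"ICNTL(14)=35     -> x{multiplier_35:.2f}")
print(f"ICNTL(14)=200000 -> x{multiplier_200000:.0f}")
print(f"a 70 GB request corresponds to an estimate of about {estimate:.0f} MB")
```

Under this simplified model, ICNTL(14)=200000 multiplies the estimate by 2001. A request of about 70 GB per process would then correspond to an analysis estimate of only about 35 MB. Lowering CNTL(1) instead targets the fill-in directly rather than inflating every allocation.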
> 
> I also tried SuperLU_DIST and have attached that error as well (a 
> segmentation fault).
> 
> Kindly help me. 
> 
> Venkatesh
> 
> <mumps_error_log.txt>