OK, I have now restructured the matrix. I will not use MUMPS any more and will use only PETSc and SLEPc. I am getting some errors with that setup.
I will open another thread for them.

Venkatesh

On Mon, May 25, 2015 at 2:54 AM, venkatesh g wrote:

Ok, this will load the matrices in parallel, correct?

On Mon, May 25, 2015 at 2:36 PM, Matthew Knepley wrote:

Yes.

Matt

On Sun, May 24, 2015 at 8:57 AM, venkatesh g wrote:

I am using the MatLoad option, as in the ex7.c code provided by SLEPc:

    ierr = MatLoad(A,viewer);CHKERRQ(ierr);

There is no problem here, right? Or is any additional option required for very large matrices while running the eigensolver in parallel?

Cheers,
Venkatesh

On Sun, May 24, 2015 at 7:36 PM, Matthew Knepley wrote:

This will load the matrix from the viewer (presumably disk). There are no special options for large matrices.

Thanks,
Matt
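For reference, the MatLoad call in ex7.c is already a parallel operation: every rank opens the binary file collectively and MatLoad hands each rank its block of rows. A minimal stand-alone sketch of that pattern (this is not the actual ex7.c; the file name "a2" is borrowed from the commands quoted later in this thread):

    #include <petscmat.h>

    int main(int argc,char **argv)
    {
      Mat            A;
      PetscViewer    viewer;
      PetscInt       mlocal;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
      /* all ranks open the same binary file collectively */
      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"a2",FILE_MODE_READ,&viewer);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);   /* honours -mat_type etc. from the command line */
      ierr = MatLoad(A,viewer);CHKERRQ(ierr);      /* the rows are distributed across the ranks here */
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
      ierr = MatGetLocalSize(A,&mlocal,NULL);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_SELF,"this rank owns %D rows\n",mlocal);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

With the 3600x3600 matrices from this thread on 4 ranks, each rank should report roughly 900 local rows; ex7.c repeats the same viewer/MatLoad sequence for both the -f1 and -f2 files.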
On Sat, May 23, 2015 at 7:09 AM, venkatesh g wrote:

Hi, thanks. Per node it has 24 cores, and each core has 4 GB of RAM. The job was submitted on 10 nodes. So does that mean it requires 10 GB for one core, or for one node?

Cheers,
Venkatesh

On Sat, May 23, 2015 at 5:43 PM, Matthew Knepley wrote:

The error message from MUMPS said that it tried to allocate 10 GB. We must assume each process tried to do the same thing. That means that if you scheduled 24 processes on a node, it would try to allocate at least 240 GB, which is in excess of what you specify above.

Note that this has nothing to do with PETSc. It is all in the documentation for that machine and its scheduling policy.

Thanks,
Matt
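To spell out the arithmetic in that reply, using the roughly 10 GB per process that MUMPS asked for and the node description above:

    needed on one node:    24 processes x ~10 GB/process  =  ~240 GB
    available on one node: 24 cores x 4 GB/core           =    96 GB

so a node filled with 24 MPI ranks is short of memory by more than a factor of two, and the machine's scheduling documentation is the place to look for how to place fewer ranks per node.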
On Sat, May 23, 2015 at 6:44 AM, venkatesh g wrote:

Hi, the same eigenproblem runs with 120 GB of RAM on a serial machine in Matlab. On the Cray I fired it with 240 * 4 GB of RAM in parallel, so it has to fit, right? And for small matrices it shows negative scaling, i.e. 24 cores run faster. I have attached the submission script; please see it and kindly let me know.

Cheers,
Venkatesh

On Sat, May 23, 2015 at 5:17 PM, Matthew Knepley wrote:

I do not know how MUMPS allocates memory, but the message is unambiguous. Also, this is concerned with the memory available per node. Do you know how many processes per node were scheduled? The message below indicates that it was trying to allocate 10 GB for one process.

As for the small matrices: yes, for strong scaling you always get a slowdown eventually, since overheads come to dominate the work; see Amdahl's Law.

Thanks,
Matt

On Sat, May 23, 2015 at 2:39 AM, venkatesh g wrote:

Hi again, I have installed PETSc and SLEPc on the Cray with the Intel compilers and MUMPS. I am getting this error when I solve the eigenvalue problem with large matrices:

    [201]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: Cannot allocate required memory 9632 megabytes

Also, it is again not scaling well for small matrices. Kindly let me know what to do.

Cheers,
Venkatesh

On Sat, May 23, 2015 at 4:58 PM, Matthew Knepley wrote:

It ran out of memory on the node. And MUMPS strong scaling for small matrices is not very good; weak scaling means looking at big matrices.

Thanks,
Matt

On Tue, May 19, 2015 at 1:04 AM, venkatesh g wrote:

Hi, I have attached the log of the command which I gave on the master node: make streams NPMAX=32. I don't know why it says "It appears you have only 1 node", but other codes run in parallel with good scaling on 8 nodes. Kindly let me know.

Venkatesh

On Tue, May 19, 2015 at 3:02 PM, Matthew Knepley wrote:

If you look at the STREAMS numbers, you can see that your system is only able to support about 2 cores with the available memory bandwidth. Thus for bandwidth-constrained operations (almost everything in sparse linear algebra and solvers) your speedup will not be bigger than 2. Other codes may do well on this machine, but they would be compute constrained, using things like DGEMM.

Thanks,
Matt
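A rough way to read the streams log in this light (the attached log itself is not reproduced here): if the aggregate Triad rate stops growing after about 2 processes, then for memory-bandwidth-bound kernels

    best possible speedup  ~  saturated aggregate rate / single-process rate  ~  2

no matter how many of the node's cores are used; only compute-bound kernels, such as the DGEMM-heavy codes mentioned above, can keep scaling beyond that point.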
On May 18, 2015, at 11:14 AM, venkatesh g wrote:

Hi, I have emailed the mumps-user list. Actually the cluster has 8 nodes with 16 cores each, and other codes scale well. I wanted to ask: since this job takes so much time, if I submit on more cores I have to increase icntl(14), which would again take a long time. So is there another way?

Cheers,
Venkatesh

On Mon, May 18, 2015 at 11:21 PM, Barry Smith wrote:

Run the streams benchmark on this system and send the results:
http://www.mcs.anl.gov/petsc/documentation/faq.html#computers

On Mon, May 18, 2015 at 8:29 AM, venkatesh g wrote:

Hi, I have attached the performance logs for two jobs on different numbers of processors. I had to increase the workspace icntl(14) when I submit on more cores, since it fails with a small value of icntl(14).

1. performance_log1.txt is run on 8 cores (option given: -mat_mumps_icntl_14 200)
2. performance_log2.txt is run on 2 cores (option given: -mat_mumps_icntl_14 85)

Venkatesh

On Mon, May 18, 2015 at 7:16 PM, Matthew Knepley wrote:

1) Your number of iterates increased from 7600 to 9600, but that is a relatively small effect.

2) MUMPS is just taking a lot longer to do the forward/backward solves. You might try emailing their list about that. However, I would bet that your system has enough bandwidth for 2 processes and not much more.

Thanks,
Matt

On Sun, May 17, 2015 at 1:38 AM, venkatesh g wrote:

Hi, thanks for the information. I have now increased the workspace by adding '-mat_mumps_icntl_14 100', and it works. However, the problem is that if I submit on 1 core I get the answer in 200 seconds, but with 4 cores and '-mat_mumps_icntl_14 100' it takes 3500 seconds. My command line is:

    mpiexec -np 4 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps -mat_mumps_icntl_14 100

Kindly let me know.

Venkatesh

On Sun, May 17, 2015 at 6:13 PM, Matthew Knepley wrote:

Send the output of -log_summary for all performance queries. Otherwise we are just guessing.

Matt
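For anyone repeating this, the numbers Matt asks for come from adding the profiling option to the same run, for example 'mpiexec -np 4 ./ex7 -f1 a2 -f2 b2 ... -mat_mumps_icntl_14 100 -log_summary'; the summary table is printed when the program exits. (In later PETSc releases the option is called -log_view.)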
On Sat, May 16, 2015 at 8:08 AM, venkatesh g wrote:

Hi, I am trying to solve the AX = lambda BX eigenvalue problem. A and B are of size 3600x3600. I run with this command:

    mpiexec -np 4 ./ex7 -f1 a2 -f2 b2 -eps_nev 1 -st_type sinvert -eps_max_it 5000 -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package mumps

I get this error (I get a result only when I give 1 or 2 processors):

    Reading COMPLEX matrices from binary files...
    [0]PETSC ERROR: --------------------- Error Message ------------------------------------
    [0]PETSC ERROR: Error in external library!
    [0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-9, INFO(2)=2024

On Sat, May 16, 2015 at 7:10 PM, David Knezevic wrote:

The MUMPS error types are described in Chapter 7 of the MUMPS manual. In this case you have INFO(1)=-9, which is explained in the manual as:

"-9 Main internal real/complex workarray S too small. If INFO(2) is positive, then the number of entries that are missing in S at the moment when the error is raised is available in INFO(2). If INFO(2) is negative, then its absolute value should be multiplied by 1 million. If an error -9 occurs, the user should increase the value of ICNTL(14) before calling the factorization (JOB=2) again, except if ICNTL(23) is provided, in which case ICNTL(23) should be increased."

This says that you should use ICNTL(14) to increase the working space size:

"ICNTL(14) is accessed by the host both during the analysis and the factorization phases. It corresponds to the percentage increase in the estimated working space. When significant extra fill-in is caused by numerical pivoting, increasing ICNTL(14) may help. Except in special cases, the default value is 20 (which corresponds to a 20% increase)."

So, for example, you can avoid this error via the following command-line argument to PETSc: "-mat_mumps_icntl_14 30", where 30 indicates that we allow a 30% increase in the workspace instead of the default 20%.

David
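The ICNTL(14) increase can also be requested from the code rather than from the command line. The sketch below follows the pattern used in PETSc's own MUMPS examples and uses the PETSc 3.5/3.6-era names (PCFactorSetMatSolverPackage and PCFactorSetUpMatSolverPackage, renamed to ...MatSolverType in later releases); it assumes a KSP whose operators have already been set, the helper name is made up, and the value 50 is only an illustration. For the ex7 runs above, the command-line option remains the simpler route.

    #include <petscksp.h>

    /* Ask MUMPS for 50% extra workspace (ICNTL(14)) before the factorization is run. */
    PetscErrorCode UseMumpsWithLargerWorkspace(KSP ksp)
    {
      PC             pc;
      Mat            F;                                       /* the factor matrix handled by MUMPS */
      PetscErrorCode ierr;

      ierr = KSPSetType(ksp,KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
      ierr = PCSetType(pc,PCLU);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS);CHKERRQ(ierr);
      ierr = PCFactorSetUpMatSolverPackage(pc);CHKERRQ(ierr); /* creates F; no numerical work yet */
      ierr = PCFactorGetMatrix(pc,&F);CHKERRQ(ierr);
      ierr = MatMumpsSetIcntl(F,14,50);CHKERRQ(ierr);         /* allow a 50% workspace increase */
      return 0;
    }

Setting it this way keeps the workspace choice next to the rest of the solver configuration instead of in the job script, which may be convenient when the value has to change with the core count, as it did in this thread.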
