On Fri, Dec 21, 2018 at 12:55 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:
> Matt:
>
>>> Does anyone know how to profile memory usage?
>>>
>> The best serial way is to use Massif, which is part of valgrind. I think
>> it might work in parallel if you only look at one process at a time.
>
> Can you give an example of using Massif?
> For example, how to use it on petsc/src/ksp/ksp/examples/tutorials/ex56.c
> with np=8?

I have not used it in a while, so I have nothing lying around. However, the
manual is very good: http://valgrind.org/docs/manual/ms-manual.html

  Thanks,

    Matt

> Hong
>
>>> Hong
>>>
>>> Thanks, Hong,
>>>>
>>>> I just briefly went through the code. I was wondering if it is possible
>>>> to destroy "c->ptap" (which caches a lot of intermediate data) to
>>>> release the memory after the coarse matrix is assembled. I understand
>>>> you may still want to reuse these data structures by default, but for
>>>> my simulation the preconditioner is fixed and there is no reason to
>>>> keep the "c->ptap".
>>>>
>>>> It would be great if we could have this optional functionality.
>>>>
>>>> Fande Kong,
>>>>
>>>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:
>>>>
>>>>> We use the nonscalable implementation as the default, and switch to
>>>>> scalable for matrices over finer grids. You may use the option
>>>>> '-matptap_via scalable' to force the scalable PtAP implementation for
>>>>> all PtAP. Let me know if it works.
>>>>> Hong
>>>>>
>>>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>>>>>
>>>>>> See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>>>>>> for "large" problems, which is determined by some heuristic.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>>>>> >
>>>>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:
>>>>>> > Fande:
>>>>>> > Hong,
>>>>>> > Thanks for your improvements on PtAP, which are critical for
>>>>>> > MG-type algorithms.
>>>>>> >
>>>>>> > On Wed, May 3, 2017 at 10:17 AM Hong <hzh...@mcs.anl.gov> wrote:
>>>>>> > Mark,
>>>>>> > Below is a copy of my email sent to you on Feb 27:
>>>>>> >
>>>>>> > I implemented scalable MatPtAP and did comparisons of three
>>>>>> > implementations using ex56.c on the ALCF Cetus machine (this machine
>>>>>> > has small memory, 1 GB/core):
>>>>>> > - nonscalable PtAP: uses an array of length PN to do a dense axpy
>>>>>> > - scalable PtAP: does a sparse axpy without using the PN array
>>>>>> >
>>>>>> > What does PN mean here?
>>>>>> > Global number of columns of P.
>>>>>> >
>>>>>> > - hypre PtAP.
>>>>>> >
>>>>>> > The results are attached. Summary:
>>>>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>>>>>> > - scalable PtAP is 4x faster than hypre PtAP
>>>>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>>>> >
>>>>>> > I was wondering how much more memory PETSc PtAP uses than hypre? I
>>>>>> > am implementing an AMG algorithm based on PETSc right now, and it is
>>>>>> > working well. But we found a bottleneck with PtAP: for the same P
>>>>>> > and A, PETSc PtAP fails to generate a coarse matrix because it runs
>>>>>> > out of memory, while hypre can still generate the coarse matrix.
>>>>>> >
>>>>>> > I do not want to just use the HYPRE one, because we would have to
>>>>>> > duplicate matrices if I used HYPRE PtAP.
>>>>>> >
>>>>>> > It would be nice if you have already done some comparisons of the
>>>>>> > memory usage of these implementations.
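One way to compare the variants directly on a given run is to select the PtAP
algorithm from the command line and turn on PETSc's logging. A rough sketch
(the solver options are placeholders for whatever is normally passed to ex56,
and -memory_view may not be available in older PETSc versions):

  # force the scalable PtAP for every PtAP product, plus timing and memory summaries
  mpiexec -n 8 ./ex56 <usual ex56/GAMG options> \
      -matptap_via scalable -log_view -memory_view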
>>>>>> > Do you encounter a memory issue with the scalable PtAP?
>>>>>> >
>>>>>> > By default do we use the scalable PtAP? Do we have to specify some
>>>>>> > options to use the scalable version of PtAP? If so, it would be
>>>>>> > nice to use the scalable version by default. I am totally missing
>>>>>> > something here.
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Fande
>>>>>> >
>>>>>> > Karl had a student in the summer who improved MatPtAP(). Do you use
>>>>>> > the latest version of petsc?
>>>>>> > HYPRE may use less memory than PETSc because it does not save and
>>>>>> > reuse the matrices.
>>>>>> >
>>>>>> > I do not understand why generating the coarse matrix fails due to
>>>>>> > running out of memory. Do you use a direct solver on the coarse grid?
>>>>>> > Hong
>>>>>> >
>>>>>> > Based on the above observation, I set the default PtAP algorithm to
>>>>>> > 'nonscalable'. When PN > the locally estimated number of nonzeros
>>>>>> > of C = PtAP, the default switches to 'scalable'. The user can
>>>>>> > override the default.
>>>>>> >
>>>>>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>>>>>> >   MatPtAP            3.6224e+01  (nonscalable for small mats, scalable for larger ones)
>>>>>> >   scalable MatPtAP   4.6129e+01
>>>>>> >   hypre              1.9389e+02
>>>>>> >
>>>>>> > This work is in petsc-master. Give it a try. If you encounter any
>>>>>> > problem, let me know.
>>>>>> >
>>>>>> > Hong
>>>>>> >
>>>>>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>> > (Hong), what is the current state of optimizing RAP for scaling?
>>>>>> >
>>>>>> > Nate is driving 3D elasticity problems at scale with GAMG, and we
>>>>>> > are working out performance problems. They are hitting problems at
>>>>>> > ~1.5B dof on a basic Cray (XC30, I think).
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Mark

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
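P.S. Since a concrete Massif example was asked for above, here is a rough
sketch of how one might run it on ex56 with np=8. The ex56 runtime options are
placeholders, and the exact Valgrind flags should be checked against the
manual linked above; expect a large slowdown under Valgrind:

  cd $PETSC_DIR/src/ksp/ksp/examples/tutorials
  make ex56
  # each MPI rank writes its own massif.out.<pid> heap profile
  mpiexec -n 8 valgrind --tool=massif --massif-out-file=massif.out.%p \
      ./ex56 <usual ex56 options>
  # then inspect one rank's profile at a time
  ms_print massif.out.<pid>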