On Fri, Dec 21, 2018 at 12:55 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:

> Matt:
>
>> Does anyone know how to profile memory usage?
>>>
>>
>> The best serial way is to use Massif, which is part of valgrind. I think
>> it might work in parallel if you
>> only look at one process at a time.
>>
>
> Can you give an example of using Massif?
> For example, how to use it on petsc/src/ksp/ksp/examples/tutorials/ex56.c
> with np=8?
>

I have not used it in a while, so I have nothing lying around. However,
the manual is very good:

    http://valgrind.org/docs/manual/ms-manual.html
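
For what it's worth, a typical invocation (untested here; it assumes Valgrind
and MPI are available and that ex56 is built) would look something like

    mpiexec -n 8 valgrind --tool=massif --massif-out-file=massif.out.%p ./ex56 <ex56 options>
    ms_print massif.out.<pid>

This writes one massif.out.* file per MPI rank, so you can inspect the ranks
one at a time with ms_print (or a GUI such as massif-visualizer).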

  Thanks,

    Matt


> Hong
>
>>
>>
>>> Hong
>>>
>>> Thanks, Hong,
>>>>
>>>> I just briefly went through the code. I was wondering if it is possible
>>>> to destroy "c->ptap" (which caches a lot of intermediate data) to release
>>>> the memory after the coarse matrix is assembled. I understand you may still
>>>> want to reuse these data structures by default, but for my simulation the
>>>> preconditioner is fixed and there is no reason to keep "c->ptap".
>>>>
>>>
>>>> It would be great if we could have this optional functionality.
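>>>>
>>>> To make the request concrete, what I have in mind is something along these
>>>> lines (the function name is purely hypothetical, not an existing PETSc
>>>> call):
>>>>
>>>>     /* hypothetical: release the cached PtAP intermediates once the
>>>>        preconditioner hierarchy is fixed and will not be rebuilt */
>>>>     ierr = MatPtAPFreeIntermediateData(C);CHKERRQ(ierr);
>>>>
>>>> where C is the coarse matrix produced by MatPtAP.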
>>>>
>>>> Fande Kong,
>>>>
>>>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong <hzh...@mcs.anl.gov> wrote:
>>>>
>>>>> We use the nonscalable implementation by default and switch to the
>>>>> scalable one for matrices over finer grids. You may use the option
>>>>> '-matptap_via scalable' to force the scalable PtAP implementation for
>>>>> all PtAP products. Let me know if it works.
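>>>>> For example (assuming ex56 is run with GAMG as in the earlier tests),
>>>>> something like
>>>>>
>>>>>     mpiexec -n 8 ./ex56 -pc_type gamg -matptap_via scalable
>>>>>
>>>>> would force the scalable algorithm for every PtAP in the setup.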
>>>>> Hong
>>>>>
>>>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. <bsm...@mcs.anl.gov>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to the scalable version
>>>>>> automatically for "large" problems, as determined by a heuristic.
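>>>>>>
>>>>>>   For reference, that routine sits behind the public MatPtAP() interface;
>>>>>> a minimal sketch of a call that ends up there (assuming A and P are
>>>>>> assembled MPIAIJ matrices) is
>>>>>>
>>>>>>     Mat            A, P, C;   /* A and P assumed assembled MPIAIJ */
>>>>>>     PetscErrorCode ierr;
>>>>>>     ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &C);CHKERRQ(ierr);
>>>>>>     /* ... use the coarse matrix C ... */
>>>>>>     ierr = MatDestroy(&C);CHKERRQ(ierr);
>>>>>>
>>>>>>   The MPIAIJ implementation then picks the nonscalable or scalable
>>>>>> kernel by that heuristic unless an option such as -matptap_via
>>>>>> overrides it.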
>>>>>>
>>>>>>    Barry
>>>>>>
>>>>>>
>>>>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>>>>>> petsc-users@mcs.anl.gov> wrote:
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong <hzh...@mcs.anl.gov>
>>>>>> wrote:
>>>>>> > Fande:
>>>>>> > Hong,
>>>>>> > Thanks for your improvements to PtAP, which is critical for MG-type
>>>>>> algorithms.
>>>>>> >
>>>>>> > On Wed, May 3, 2017 at 10:17 AM Hong <hzh...@mcs.anl.gov> wrote:
>>>>>> > Mark,
>>>>>> > Below is a copy of my email sent to you on Feb 27:
>>>>>> >
>>>>>> > I implemented scalable MatPtAP and did comparisons of three
>>>>>> implementations using ex56.c on the ALCF Cetus machine (this machine
>>>>>> has small memory, 1GB/core):
>>>>>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>>>>>> > - scalable PtAP:       do sparse axpy without use of PN array
>>>>>> >
>>>>>> > What does PN mean here?
>>>>>> > Global number of columns of P.
>>>>>> >
>>>>>> > - hypre PtAP.
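>>>>>> >
>>>>>> > To illustrate the difference between the first two (a sketch only, not
>>>>>> the actual PETSc kernels): when a scaled row of P is accumulated into a
>>>>>> row of C, the nonscalable variant scatters into a dense workspace of
>>>>>> length PN, e.g.
>>>>>>
>>>>>>     /* dense axpy: workspace 'dense' has length PN = global number of
>>>>>>        columns of P, so memory grows with the global problem size */
>>>>>>     for (j = 0; j < nz; j++) dense[pcol[j]] += a * pval[j];
>>>>>>
>>>>>> while the scalable variant merges only the nonzero column indices of
>>>>>> that row (a sparse axpy), so its workspace stays proportional to the
>>>>>> local number of nonzeros.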
>>>>>> >
>>>>>> > The results are attached. Summary:
>>>>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
>>>>>> PtAP
>>>>>> > - scalable PtAP is 4x faster than hypre PtAP
>>>>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>>>> >
>>>>>> > I was wondering how much more memory PETSc PtAP uses than hypre. I
>>>>>> am implementing an AMG algorithm based on PETSc right now, and it is
>>>>>> working well, but we found a bottleneck with PtAP. For the same P and
>>>>>> A, PETSc PtAP fails to generate a coarse matrix because it runs out of
>>>>>> memory, while hypre can still generate the coarse matrix.
>>>>>> >
>>>>>> > I do not want to just use the HYPRE one because we would have to
>>>>>> duplicate matrices if we used HYPRE PtAP.
>>>>>> >
>>>>>> > It would be nice if you guys have already done some comparisons of
>>>>>> the memory usage of these implementations.
>>>>>> > Do you encounter memory issues with the scalable PtAP?
>>>>>> >
>>>>>> > Do we use the scalable PtAP by default? Do we have to specify some
>>>>>> options to use the scalable version of PtAP? If so, it would be nice to
>>>>>> use the scalable version by default. I am totally missing something here.
>>>>>> >
>>>>>> > Thanks,
>>>>>> >
>>>>>> > Fande
>>>>>> >
>>>>>> >
>>>>>> > Karl had a student in the summer who improved MatPtAP(). Do you use
>>>>>> the latest version of PETSc?
>>>>>> > HYPRE may use less memory than PETSc because it does not save and
>>>>>> reuse the matrices.
>>>>>> >
>>>>>> > I do not understand why generating the coarse matrix runs out of
>>>>>> memory. Do you use a direct solver on the coarse grid?
>>>>>> > Hong
>>>>>> >
>>>>>> > Based on the above observation, I set the default PtAP algorithm to
>>>>>> 'nonscalable'.
>>>>>> > When PN > the locally estimated number of nonzeros of C = PtAP, the
>>>>>> default switches to 'scalable'.
>>>>>> > The user can override the default.
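>>>>>> > In pseudocode, the selection is roughly this (a paraphrase, not the
>>>>>> exact source; the names below are illustrative):
>>>>>>
>>>>>>     /* PN: global number of columns of P;
>>>>>>        nz_est: locally estimated number of nonzeros of C = PtAP */
>>>>>>     alg = (PN > nz_est) ? PTAP_ALG_SCALABLE : PTAP_ALG_NONSCALABLE;
>>>>>>     /* the user can still override this via -matptap_via */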
>>>>>> >
>>>>>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get:
>>>>>> >   MatPtAP            3.6224e+01   (nonscalable for small mats, scalable for larger ones)
>>>>>> >   scalable MatPtAP   4.6129e+01
>>>>>> >   hypre              1.9389e+02
>>>>>> >
>>>>>> > This work is in petsc-master. Give it a try. If you encounter any
>>>>>> problem, let me know.
>>>>>> >
>>>>>> > Hong
>>>>>> >
>>>>>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams <mfad...@lbl.gov>
>>>>>> wrote:
>>>>>> > (Hong), what is the current state of optimizing RAP for scaling?
>>>>>> >
>>>>>> > Nate is driving 3D elasticity problems at scale with GAMG, and we
>>>>>> are working out performance problems. They are hitting problems at ~1.5B
>>>>>> dof on a basic Cray (an XC30, I think).
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Mark
>>>>>> >
>>>>>>
>>>>>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
