Re: [petsc-users] Select a preconditioner for SLEPc eigenvalue solver Jacobi-Davidson

2019-11-05 Thread Fande Kong via petsc-users
What if I want to determine the ST type at runtime?

 mpirun -n 1 ./ex3  -eps_type jd -st_ksp_type gmres  -st_pc_type none
-eps_view  -eps_target  0 -eps_monitor  -st_ksp_monitor

ST is indeed STPrecond, but the passed preconditioning matrix is still
ignored.

EPS Object: 1 MPI processes
  type: jd
search subspace is orthogonalized
block size=1
type of the initial subspace: non-Krylov
size of the subspace after restarting: 6
number of vectors after restarting from the previous iteration: 1
threshold for changing the target in the correction equation (fix): 0.01
  problem type: symmetric eigenvalue problem
  selected portion of the spectrum: closest to target: 0. (in magnitude)
  number of eigenvalues (nev): 1
  number of column vectors (ncv): 17
  maximum dimension of projected problem (mpd): 17
  maximum number of iterations: 1700
  tolerance: 1e-08
  convergence test: relative to the eigenvalue
BV Object: 1 MPI processes
  type: svec
  17 columns of global length 100
  vector orthogonalization method: classical Gram-Schmidt
  orthogonalization refinement: if needed (eta: 0.7071)
  block orthogonalization method: GS
  doing matmult as a single matrix-matrix product
DS Object: 1 MPI processes
  type: hep
  solving the problem with: Implicit QR method (_steqr)
ST Object: 1 MPI processes
  type: precond
  shift: 0.
  number of matrices: 1
  KSP Object: (st_) 1 MPI processes
type: gmres
  restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
  happy breakdown tolerance 1e-30
maximum iterations=90, initial guess is zero
tolerances:  relative=0.0001, absolute=1e-50, divergence=1.
left preconditioning
using PRECONDITIONED norm type for convergence test
  PC Object: (st_) 1 MPI processes
type: none
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
  type: shell
  rows=100, cols=100
 Solution method: jd


The preconditioning matrix should be a SeqAIJ, not a shell matrix.


Fande,
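
For reference, a minimal sketch of the ordering Jose suggests below, using only
calls that already appear in the attached example (eps, st, and B as declared
there):

  ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
  ierr = STSetType(st,STPRECOND);CHKERRQ(ierr);    /* must come before STPrecondSetMatForPC() */
  ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr); /* B is the SeqAIJ preconditioning matrix */
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);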

On Tue, Nov 5, 2019 at 9:07 AM Jose E. Roman  wrote:

> Currently, the function that passes the preconditioner matrix is specific
> of STPRECOND, so you have to add
>   ierr = STSetType(st,STPRECOND);CHKERRQ(ierr);
> before
>   ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr);
> otherwise this latter call is ignored.
>
> We may be changing a little bit the way in which ST is initialized, and
> maybe we modify this as well. It is not decided yet.
>
> Jose
>
>
> > On 5 Nov 2019, at 0:28, Fande Kong  wrote:
> >
> > Thanks Jose,
> >
> > I think I understand now. Another question: what is the right way to
> setup a linear preconditioning matrix for the inner linear solver of JD?
> >
> > I was trying to do something like this:
> >
> >   /*
> >  Create eigensolver context
> >   */
> >   ierr = EPSCreate(PETSC_COMM_WORLD,&eps);CHKERRQ(ierr);
> >
> >   /*
> >  Set operators. In this case, it is a standard eigenvalue problem
> >   */
> >   ierr = EPSSetOperators(eps,A,NULL);CHKERRQ(ierr);
> >   ierr = EPSSetProblemType(eps,EPS_HEP);CHKERRQ(ierr);
> >   ierr = EPSGetST(eps,&st);CHKERRQ(ierr);
> >   ierr = STPrecondSetMatForPC(st,B);CHKERRQ(ierr);
> >
> >   /*
> >  Set solver parameters at runtime
> >   */
> >   ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
> >
> >   /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> >   Solve the eigensystem
> >  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> */
> >
> >   ierr = EPSSolve(eps);CHKERRQ(ierr);
> >
> >
> > But it did not work. A complete example is attached.  I could try to dig
> into the code, but you may already know the answer.
> >
> >
> > On Wed, Oct 23, 2019 at 3:58 AM Jose E. Roman 
> wrote:
> > Yes, it is confusing. Here is the explanation: when you use a target,
> the preconditioner is built from matrix A-sigma*B. By default, instead of
> TARGET_MAGNITUDE we set LARGEST_MAGNITUDE, and in Jacobi-Davidson we treat
> this case by setting sigma=PETSC_MAX_REAL. In this case, the preconditioner
> is built from matrix B. The thing is that in a standard eigenproblem we
> have B=I, and hence there is no point in using a preconditioner, that is
> why we set PCNONE.
> >
> > Jose
> >
> >
> > > On 22 Oct 2019, at 19:57, Fande Kong via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
> > >
> > > Hi All,
> > >
> > > It looks like the preconditioner is hard-coded in the Jacobi-Davidson
> solver. I could not select a preconditioner other than the default setting.
> > >
> > > For example, I was trying to select LU,

Re: [petsc-users] Problem about Residual evaluation

2019-04-01 Thread Fande Kong via petsc-users
On Mon, Apr 1, 2019 at 10:24 AM Matthew Knepley via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> On Mon, Apr 1, 2019 at 10:22 AM Yingjie Wu via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Dear PETSc developers:
>> Hi,
>>
>> I've been using -snes_mf_operator and I've customized a preconditioning
>> matrix to solve my problem. I have two questions about the residuals of
>> the linear steps (KSP residual).
>>
>>
>> 1. Since I'm using a matrix-free method, how do we get the KSP residuals in
>> PETSc?
>>
>> r_m = b - A*x_m
>>
>> Is finite differencing used to approximate "A*x_m"?
>>
>
> Yes.
>
>
>> 2. What is the difference between the options '-ksp_monitor' and
>> '-ksp_monitor_true_residual' in how they are calculated?
>>
>
> The true residual is the unpreconditioned residual.
>
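
(For reference: with -snes_mf_operator the Jacobian action is approximated by
differencing the residual function, roughly

  A*x_m ≈ (F(u + h*x_m) - F(u)) / h,

where h is a differencing parameter chosen automatically; the preconditioner is
still built from the user-supplied matrix.)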

I have a specific understanding of "-ksp_monitor_true_residual", but I am not
sure whether it is right. If I am wrong, please correct me.

When the preconditioning matrix is severely ill-conditioned, the "true
residual" is not necessarily "true" for right preconditioning, since an
unwinding step is applied. That is, "-ksp_monitor_true_residual" does not print
||b - A*x||; instead, it prints the unpreconditioned residual obtained by
un-preconditioning the preconditioned residual.
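
If in doubt, the unpreconditioned residual can always be checked directly after
the solve; a minimal sketch (A, b, and x being the system matrix, right-hand
side, and computed solution):

  Vec            r;
  PetscReal      rnorm;
  PetscErrorCode ierr;

  ierr = VecDuplicate(b,&r);CHKERRQ(ierr);
  ierr = MatMult(A,x,r);CHKERRQ(ierr);            /* r = A*x     */
  ierr = VecAYPX(r,-1.0,b);CHKERRQ(ierr);         /* r = b - A*x */
  ierr = VecNorm(r,NORM_2,&rnorm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"true residual ||b - A*x|| = %g\n",(double)rnorm);CHKERRQ(ierr);
  ierr = VecDestroy(&r);CHKERRQ(ierr);

Comparing this number with the last line of -ksp_monitor_true_residual shows
whether the monitor reports the same quantity.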


Thanks,

Fande,



>
>   Matt
>
>
>> Thanks,
>>
>> Yingjie
>>
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


[petsc-users] How to use khash in PETSc?

2019-03-24 Thread Fande Kong via petsc-users
Hi All,

Since PetscTable will be replaced by khash at some point, it is better to use
khash for new implementations. I was wondering where I can find some examples
that use khash. Do we have any PETSc wrappers for khash?

Thanks,

Fande,
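
For what it is worth, recent PETSc versions ship khash-based hash tables in the
private headers; a minimal sketch, assuming the PetscHMapI wrappers declared in
include/petsc/private/hashmapi.h are available in your PETSc version:

  #include <petscsys.h>
  #include <petsc/private/hashmapi.h>  /* PetscHMapI: a PetscInt -> PetscInt map built on khash */

  PetscErrorCode BuildMap(void)
  {
    PetscHMapI     map;
    PetscInt       val = 0;
    PetscBool      has;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscHMapICreate(&map);CHKERRQ(ierr);
    ierr = PetscHMapISet(map,42,7);CHKERRQ(ierr);       /* map[42] = 7 */
    ierr = PetscHMapIHas(map,42,&has);CHKERRQ(ierr);    /* has == PETSC_TRUE */
    if (has) {ierr = PetscHMapIGet(map,42,&val);CHKERRQ(ierr);}
    ierr = PetscHMapIDestroy(&map);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }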


Re: [petsc-users] MPI Iterative solver crash on HPC

2019-01-14 Thread Fande Kong via petsc-users
Hi Hong,

According to this PR
https://bitbucket.org/petsc/petsc/pull-requests/1061/a_selinger-feature-faster-scalable/diff

Should we set the scalable algorithm as default?

Thanks,

Fande Kong,

On Fri, Jan 11, 2019 at 10:34 AM Zhang, Hong via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Add option '-mattransposematmult_via scalable'
> Hong
>
> On Fri, Jan 11, 2019 at 9:52 AM Zhang, Junchao via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> I saw the following error message in your first email.
>>
>> [0]PETSC ERROR: Out of memory. This could be due to allocating
>> [0]PETSC ERROR: too large an object or bleeding by not properly
>> [0]PETSC ERROR: destroying unneeded objects.
>>
>> Probably the matrix is too large. You can try with more compute nodes,
>> for example, use 8 nodes instead of 2, and see what happens.
>>
>> --Junchao Zhang
>>
>>
>> On Fri, Jan 11, 2019 at 7:45 AM Sal Am via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Using a larger problem set with 2B non-zero elements and a matrix of 25M
>>> x 25M I get the following error:
>>> [4]PETSC ERROR:
>>> 
>>> [4]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>>> probably memory access out of range
>>> [4]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> [4]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>> [4]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
>>> OS X to find memory corruption errors
>>> [4]PETSC ERROR: likely location of problem given in stack below
>>> [4]PETSC ERROR: -  Stack Frames
>>> 
>>> [4]PETSC ERROR: Note: The EXACT line numbers in the stack are not
>>> available,
>>> [4]PETSC ERROR:   INSTEAD the line number of the start of the
>>> function
>>> [4]PETSC ERROR:   is given.
>>> [4]PETSC ERROR: [4] MatCreateSeqAIJWithArrays line 4422
>>> /lustre/home/vef002/petsc/src/mat/impls/aij/seq/aij.c
>>> [4]PETSC ERROR: [4] MatMatMultSymbolic_SeqAIJ_SeqAIJ line 747
>>> /lustre/home/vef002/petsc/src/mat/impls/aij/seq/matmatmult.c
>>> [4]PETSC ERROR: [4]
>>> MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ_nonscalable line 1256
>>> /lustre/home/vef002/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c
>>> [4]PETSC ERROR: [4] MatTransposeMatMult_MPIAIJ_MPIAIJ line 1156
>>> /lustre/home/vef002/petsc/src/mat/impls/aij/mpi/mpimatmatmult.c
>>> [4]PETSC ERROR: [4] MatTransposeMatMult line 9950
>>> /lustre/home/vef002/petsc/src/mat/interface/matrix.c
>>> [4]PETSC ERROR: [4] PCGAMGCoarsen_AGG line 871
>>> /lustre/home/vef002/petsc/src/ksp/pc/impls/gamg/agg.c
>>> [4]PETSC ERROR: [4] PCSetUp_GAMG line 428
>>> /lustre/home/vef002/petsc/src/ksp/pc/impls/gamg/gamg.c
>>> [4]PETSC ERROR: [4] PCSetUp line 894
>>> /lustre/home/vef002/petsc/src/ksp/pc/interface/precon.c
>>> [4]PETSC ERROR: [4] KSPSetUp line 304
>>> /lustre/home/vef002/petsc/src/ksp/ksp/interface/itfunc.c
>>> [4]PETSC ERROR: - Error Message
>>> --
>>> [4]PETSC ERROR: Signal received
>>> [4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> [4]PETSC ERROR: Petsc Release Version 3.10.2, unknown
>>> [4]PETSC ERROR: ./solveCSys on a linux-cumulus-debug named r02g03 by
>>> vef002 Fri Jan 11 09:13:23 2019
>>> [4]PETSC ERROR: Configure options PETSC_ARCH=linux-cumulus-debug
>>> --with-cc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicc
>>> --with-fc=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpifort
>>> --with-cxx=/usr/local/depot/openmpi-3.1.1-gcc-7.3.0/bin/mpicxx
>>> --download-parmetis --download-metis --download-ptscotch
>>> --download-superlu_dist --download-mumps --with-scalar-type=complex
>>> --with-debugging=yes --download-scalapack --download-superlu
>>> --download-fblaslapack=1 --download-cmake
>>> [4]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>>>
>>> --
>>> MPI_ABORT was invoked on rank 4 in communicator MPI_COMM_WORLD
>>> with errorcode 59.
>>>
>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> You may or may not see output from other processes, depending on
>>> exactly when Open MPI kills them.
>>>
>>> --
>>> [0]PETSC ERROR:
>>> 
>>> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>>> batch system) has told this process to end
>>> [0]PETSC ERROR: Try option -start_in_debugger or
>>> -on_error_attach_debugger
>>> [0]PETSC ERROR: or see
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>
>>> Using Valgrind on only one of the valgrind files the following error was
>>> written:
>>>
>>> ==9053== Invalid read of 

Re: [petsc-users] Any reason for API change: DMGetWorkArray()

2019-01-10 Thread Fande Kong via petsc-users
OK...,

Thanks for the words.

Fande,

On Thu, Jan 10, 2019 at 3:36 PM Matthew Knepley  wrote:

> On Thu, Jan 10, 2019 at 5:31 PM Fande Kong  wrote:
>
>> Thanks, Matt,
>>
>> And then, what is the reason to remove PetscDataType? Just out of
>> curiosity.
>>
>
> Occam's Razor: "one should not increase, beyond what is necessary, the
> number of entities required to explain anything"
>
>Matt
>
>
>> Using MPI_Datatype in DMGetWorkArray is a little misleading. It may make
>> people think the routine is related to MPI, but it has nothing to do with MPI.
>>
>> Thanks,
>>
>> Fande,
>>
>> On Thu, Jan 10, 2019 at 3:22 PM Matthew Knepley 
>> wrote:
>>
>>> We are trying to eliminate PetscDataType.
>>>
>>>   Matt
>>>
>>> On Thu, Jan 10, 2019 at 5:10 PM Fande Kong via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>>
>>>> Hi All,
>>>>
>>>> The second parameter was changed from PetscDataType to MPI_Datatype
>>>> starting with PETSc 3.9.x.
>>>>
>>>> Thanks,
>>>>
>>>> Fande Kong,
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> <http://www.cse.buffalo.edu/~knepley/>
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>


[petsc-users] Any reason for API change: DMGetWorkArray()

2019-01-10 Thread Fande Kong via petsc-users
Hi All,

The second parameter was changed from PetscDataType to MPI_Datatype starting
with PETSc 3.9.x.

Thanks,

Fande Kong,
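
For reference, the change is only in the datatype argument; a minimal sketch of
the new calling sequence (assuming PETSc >= 3.9, a DM named dm, and n entries
of work space needed):

  PetscScalar    *work;
  PetscErrorCode ierr;

  /* before 3.9 this was: DMGetWorkArray(dm,n,PETSC_SCALAR,&work); */
  ierr = DMGetWorkArray(dm,n,MPIU_SCALAR,&work);CHKERRQ(ierr);
  /* ... use work[0..n-1] ... */
  ierr = DMRestoreWorkArray(dm,n,MPIU_SCALAR,&work);CHKERRQ(ierr);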


Re: [petsc-users] GAMG scaling

2018-12-21 Thread Fande Kong via petsc-users
Sorry, hit the wrong button.



On Fri, Dec 21, 2018 at 7:56 PM Fande Kong  wrote:

>
>
> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams  wrote:
>
>> Also, you mentioned that you are using 10 levels. This is very strange
>> with GAMG. You can run with -info and grep on GAMG to see the sizes and the
>> number of non-zeros per level. You should coarsen at a rate of about 2^D to
>> 3^D with GAMG (with 10 levels this would imply a very large fine grid
>> problem so I suspect there is something strange going on with coarsening).
>> Mark
>>
>
> Hi Mark,
>
>
Thanks for your email. We have not tried GAMG much for our problems, since we
are still having trouble figuring out how to use GAMG effectively. Instead, we
are building our own customized AMG that needs PtAP to construct the coarse
matrices. The customized AMG works pretty well for our specific simulations.
The bottleneck right now is that PtAP might take too much memory, and the code
crashes within the function "PtAP". I definitely need a memory profiler to
confirm this.

Thanks,

Fande Kong,



>
>
>
>>
>> On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Fande:
>>> I will explore it and get back to you.
>>> Does anyone know how to profile memory usage?
>>> Hong
>>>
>>> Thanks, Hong,
>>>>
>>>> I just briefly went through the code. I was wondering if it is possible
>>>> to destroy "c->ptap" (that caches a lot of intermediate data) to release
>>>> the memory after the coarse matrix is assembled. I understand you may still
>>>> want to reuse these data structures by default but for my simulation, the
>>>> preconditioner is fixed and there is no reason to keep the "c->ptap".
>>>>
>>>
>>>> It would be great, if we could have this optional functionality.
>>>>
>>>> Fande Kong,
>>>>
>>>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:
>>>>
>>>>> We use nonscalable implementation as default, and switch to scalable
>>>>> for matrices over finer grids. You may use option '-matptap_via scalable'
>>>>> to force scalable PtAP  implementation for all PtAP. Let me know if it
>>>>> works.
>>>>> Hong
>>>>>
>>>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>>>>>> for "large" problems, which is determined by some heuristic.
>>>>>>
>>>>>>Barry
>>>>>>
>>>>>>
>>>>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>>>>>> petsc-users@mcs.anl.gov> wrote:
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
>>>>>> wrote:
>>>>>> > Fande:
>>>>>> > Hong,
>>>>>> > Thanks for your improvements on PtAP that is critical for MG-type
>>>>>> algorithms.
>>>>>> >
>>>>>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>>>>>> > Mark,
>>>>>> > Below is the copy of my email sent to you on Feb 27:
>>>>>> >
>>>>>> > I implemented scalable MatPtAP and did comparisons of three
>>>>>> implementations using ex56.c on alcf cetus machine (this machine has 
>>>>>> small
>>>>>> memory, 1GB/core):
>>>>>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>>>>>> > - scalable PtAP:   do sparse axpy without use of PN array
>>>>>> >
>>>>>> > What PN means here?
>>>>>> > Global number of columns of P.
>>>>>> >
>>>>>> > - hypre PtAP.
>>>>>> >
>>>>>> > The results are attached. Summary:
>>>>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
>>>>>> PtAP
>>>>>> > - scalable PtAP is 4x faster than hypre PtAP
>>>>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>>>> >
>>>>>> > I was wondering how much more m

Re: [petsc-users] GAMG scaling

2018-12-21 Thread Fande Kong via petsc-users
Thanks so much, Hong,

If there are any new findings, please let me know.


On Fri, Dec 21, 2018 at 9:36 AM Zhang, Hong  wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
>

We are using gperftools
https://gperftools.github.io/gperftools/heapprofile.html
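
(One possible invocation, assuming libtcmalloc from gperftools is installed;
paths are illustrative, and depending on the MPI launcher the environment
variables may need to be forwarded explicitly, e.g. with mpirun -x:

  env HEAPPROFILE=/tmp/app.hprof LD_PRELOAD=/usr/lib/libtcmalloc.so mpirun -n 4 ./app
  pprof --text ./app /tmp/app.hprof.0001.heap

The first command dumps heap profiles periodically while the job runs, and
pprof then reports which functions allocated the most memory.)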

Fande,



> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is possible
>> to destroy "c->ptap" (that caches a lot of intermediate data) to release
>> the memory after the coarse matrix is assembled. I understand you may still
>> want to reuse these data structures by default but for my simulation, the
>> preconditioner is fixed and there is no reason to keep the "c->ptap".
>>
>
>> It would be great, if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:
>>
>>> We use nonscalable implementation as default, and switch to scalable for
>>> matrices over finer grids. You may use option '-matptap_via scalable' to
>>> force scalable PtAP  implementation for all PtAP. Let me know if it works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>> wrote:
>>>
>>>>
>>>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>>>> for "large" problems, which is determined by some heuristic.
>>>>
>>>>Barry
>>>>
>>>>
>>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>>>> petsc-users@mcs.anl.gov> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
>>>> wrote:
>>>> > Fande:
>>>> > Hong,
>>>> > Thanks for your improvements on PtAP that is critical for MG-type
>>>> algorithms.
>>>> >
>>>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>>>> > Mark,
>>>> > Below is the copy of my email sent to you on Feb 27:
>>>> >
>>>> > I implemented scalable MatPtAP and did comparisons of three
>>>> implementations using ex56.c on alcf cetus machine (this machine has small
>>>> memory, 1GB/core):
>>>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>>>> > - scalable PtAP:   do sparse axpy without use of PN array
>>>> >
>>>> > What PN means here?
>>>> > Global number of columns of P.
>>>> >
>>>> > - hypre PtAP.
>>>> >
>>>> > The results are attached. Summary:
>>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
>>>> PtAP
>>>> > - scalable PtAP is 4x faster than hypre PtAP
>>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>> >
>>>> > I was wondering how much more memory PETSc PtAP uses than hypre? I am
>>>> implementing an AMG algorithm based on PETSc right now, and it is working
>>>> well. But we found a bottleneck with PtAP. For the same P and A, PETSc
>>>> PtAP fails to generate a coarse matrix because it runs out of memory, while
>>>> hypre can still generate the coarse matrix.
>>>> >
>>>> > I do not want to just use the HYPRE one because we would have to
>>>> duplicate matrices if we used HYPRE PtAP.
>>>> >
>>>> > It would be nice if you guys have already done some comparisons of
>>>> these implementations for memory usage.
>>>> > Do you encounter memory issue with  scalable PtAP?
>>>> >
>>>> > By default do we use the scalable PtAP?? Do we have to specify some
>>>> options to use the scalable version of PtAP?  If so, it would be nice to
>>>> use the scalable version by default.  I am totally missing something here.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Fande
>>>> >
>>>> >
>>>> > Karl had a student in the summer who improved MatPtAP(). Do you use
>>>> the latest version of petsc?
>>>> > HYPRE may use less memory than PETSc because it does not save and
>>>> reuse the matrices.
>>>> >
>>>> > I do not understand why generating coarse matrix fails due to out of
>>>> memory. Do you use direct solver at coarse grid?
>>>> > Hong
>>>> >
>>>> > Based on above observation, I set the default PtAP algorithm as
>>>> 'nonscalable'.
>>>> > When PN > local estimated nonzero of C=PtAP, then switch default to
>>>> 'scalable'.
>>>> > User can overwrite default.
>>>> >
>>>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>>>> > MatPtAP   3.6224e+01 (nonscalable for small mats,
>>>> scalable for larger ones)
>>>> > scalable MatPtAP 4.6129e+01
>>>> > hypre            1.9389e+02
>>>> >
>>>> > This work is on petsc-master. Give it a try. If you encounter any
>>>> problem, let me know.
>>>> >
>>>> > Hong
>>>> >
>>>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
>>>> > (Hong), what is the current state of optimizing RAP for scaling?
>>>> >
>>>> > Nate is driving 3D elasticity problems at scale with GAMG and we
>>>> are working out performance problems. They are hitting problems at ~1.5B
>>>> dof problems on a basic Cray (XC30 I think).
>>>> >
>>>> > Thanks,
>>>> > Mark
>>>> >
>>>>
>>>>


Re: [petsc-users] GAMG scaling

2018-12-20 Thread Fande Kong via petsc-users
Thanks, Hong,

I just briefly went through the code. I was wondering if it is possible to
destroy "c->ptap" (that caches a lot of intermediate data) to release the
memory after the coarse matrix is assembled. I understand you may still
want to reuse these data structures by default but for my simulation, the
preconditioner is fixed and there is no reason to keep the "c->ptap".

It would be great, if we could have this optional functionality.

Fande Kong,

On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:

> We use nonscalable implementation as default, and switch to scalable for
> matrices over finer grids. You may use option '-matptap_via scalable' to
> force scalable PtAP  implementation for all PtAP. Let me know if it works.
> Hong
>
> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
> wrote:
>
>>
>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for
>> "large" problems, which is determined by some heuristic.
>>
>>Barry
>>
>>
>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>> >
>> >
>> >
>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong  wrote:
>> > Fande:
>> > Hong,
>> > Thanks for your improvements on PtAP that is critical for MG-type
>> algorithms.
>> >
>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>> > Mark,
>> > Below is the copy of my email sent to you on Feb 27:
>> >
>> > I implemented scalable MatPtAP and did comparisons of three
>> implementations using ex56.c on alcf cetus machine (this machine has small
>> memory, 1GB/core):
>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>> > - scalable PtAP:   do sparse axpy without use of PN array
>> >
>> > What PN means here?
>> > Global number of columns of P.
>> >
>> > - hypre PtAP.
>> >
>> > The results are attached. Summary:
>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>> > - scalable PtAP is 4x faster than hypre PtAP
>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>> >
>> > I was wondering how much more memory PETSc PtAP uses than hypre? I am
>> implementing an AMG algorithm based on PETSc right now, and it is working
>> well. But we found a bottleneck with PtAP. For the same P and A, PETSc
>> PtAP fails to generate a coarse matrix because it runs out of memory, while
>> hypre can still generate the coarse matrix.
>> >
>> > I do not want to just use the HYPRE one because we would have to
>> duplicate matrices if we used HYPRE PtAP.
>> >
>> > It would be nice if you guys have already done some comparisons of
>> these implementations for memory usage.
>> > Do you encounter memory issue with  scalable PtAP?
>> >
>> > By default do we use the scalable PtAP?? Do we have to specify some
>> options to use the scalable version of PtAP?  If so, it would be nice to
>> use the scalable version by default.  I am totally missing something here.
>> >
>> > Thanks,
>> >
>> > Fande
>> >
>> >
>> > Karl had a student in the summer who improved MatPtAP(). Do you use the
>> latest version of petsc?
>> > HYPRE may use less memory than PETSc because it does not save and reuse
>> the matrices.
>> >
>> > I do not understand why generating coarse matrix fails due to out of
>> memory. Do you use direct solver at coarse grid?
>> > Hong
>> >
>> > Based on above observation, I set the default PtAP algorithm as
>> 'nonscalable'.
>> > When PN > local estimated nonzero of C=PtAP, then switch default to
>> 'scalable'.
>> > User can overwrite default.
>> >
>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>> > MatPtAP   3.6224e+01 (nonscalable for small mats,
>> scalable for larger ones)
>> > scalable MatPtAP 4.6129e+01
>> > hypre            1.9389e+02
>> >
>> > This work is on petsc-master. Give it a try. If you encounter any
>> problem, let me know.
>> >
>> > Hong
>> >
>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
>> > (Hong), what is the current state of optimizing RAP for scaling?
>> >
>> > Nate is driving 3D elasticity problems at scale with GAMG and we are
>> working out performance problems. They are hitting problems at ~1.5B dof
>> problems on a basic Cray (XC30 I think).
>> >
>> > Thanks,
>> > Mark
>> >
>>
>>