Mark,

Below is a copy of my email sent to you on Feb 27:

I implemented a scalable MatPtAP and compared three implementations using ex56.c on the ALCF Cetus machine (this machine has small memory, 1GB/core):
- nonscalable PtAP: uses a dense array of length PN to do dense axpy
- scalable PtAP: does sparse axpy without the PN-length array
- hypre PtAP
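A small standalone sketch of the two accumulation styles above, written only for illustration (this is not PETSc source; the names, sizes, and toy data are made up). The "nonscalable" kernel scatters into a work array of length PN, while the "scalable" kernel merges sorted sparse rows and never allocates anything of length PN:

/* Illustrative only: contrast dense-array axpy vs sparse-merge axpy
 * for accumulating one row of C = P^T*A*P.  Column indices are sorted. */
#include <stdio.h>
#include <string.h>

#define PN 12 /* stand-in for the array length used by the nonscalable path */

/* "Nonscalable" style: dense axpy into a work array of length PN.
 * Cheap per entry, but every process needs O(PN) workspace. */
static void dense_axpy(double *y, double alpha,
                       const int *cols, const double *vals, int nnz)
{
  for (int k = 0; k < nnz; k++) y[cols[k]] += alpha * vals[k];
}

/* "Scalable" style: sparse axpy that merges alpha*row into a sorted
 * compressed row, touching only actual nonzeros; no PN-length array. */
static int sparse_axpy(int *ycols, double *yvals, int ynnz, double alpha,
                       const int *cols, const double *vals, int nnz,
                       int *tc, double *tv) /* scratch of size ynnz+nnz */
{
  int i = 0, j = 0, t = 0;
  while (i < ynnz || j < nnz) {
    if (j == nnz || (i < ynnz && ycols[i] < cols[j])) {
      tc[t] = ycols[i]; tv[t++] = yvals[i++];
    } else if (i == ynnz || cols[j] < ycols[i]) {
      tc[t] = cols[j];  tv[t++] = alpha * vals[j++];
    } else {
      tc[t] = ycols[i]; tv[t++] = yvals[i++] + alpha * vals[j++];
    }
  }
  memcpy(ycols, tc, (size_t)t * sizeof(*ycols));
  memcpy(yvals, tv, (size_t)t * sizeof(*yvals));
  return t; /* updated nonzero count of the accumulated row */
}

int main(void)
{
  /* two sparse row contributions to be combined into one row of C */
  int    c1[] = {1, 4, 9};  double v1[] = {1.0, 2.0, 3.0};
  int    c2[] = {1, 7};     double v2[] = {5.0, 6.0};

  /* dense accumulation */
  double yd[PN] = {0};
  dense_axpy(yd, 2.0, c1, v1, 3);
  dense_axpy(yd, 1.0, c2, v2, 2);

  /* sparse accumulation */
  int ycols[8], tc[8]; double yvals[8], tv[8]; int ynnz = 0;
  ynnz = sparse_axpy(ycols, yvals, ynnz, 2.0, c1, v1, 3, tc, tv);
  ynnz = sparse_axpy(ycols, yvals, ynnz, 1.0, c2, v2, 2, tc, tv);

  for (int k = 0; k < ynnz; k++)
    printf("col %d: sparse %g dense %g\n", ycols[k], yvals[k], yd[ycols[k]]);
  return 0;
}

The trade-off is the one described below: the dense scatter is cheaper per entry, while the sparse merge avoids the O(PN) workspace, which is why it makes sense as the fallback once PN exceeds the estimated nonzeros of C.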
The results are attached. Summary:
- nonscalable PtAP is 2x faster than scalable PtAP and 8x faster than hypre PtAP
- scalable PtAP is 4x faster than hypre PtAP
- hypre uses less memory (see job.ne399.n63.np1000.sh)

Based on the above observations, I set the default PtAP algorithm to 'nonscalable'. When PN > the locally estimated number of nonzeros of C = PtAP, the default switches to 'scalable'. The user can override the default.

For the case np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get

  MatPtAP           3.6224e+01  (nonscalable for small mats, scalable for larger ones)
  scalable MatPtAP  4.6129e+01
  hypre             1.9389e+02

This work is on petsc-master. Give it a try. If you encounter any problems, let me know.

Hong

On Wed, May 3, 2017 at 10:01 AM, Mark Adams <[email protected]> wrote:
> (Hong), what is the current state of optimizing RAP for scaling?
>
> Nate is driving 3D elasticity problems at scale with GAMG and we are
> working out performance problems. They are hitting problems at ~1.5B dof
> on a basic Cray (XC30 I think).
>
> Thanks,
> Mark
>
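A minimal usage sketch for trying the three implementations. The runtime option is, to my understanding, -matptap_via, but treat the exact option names as an assumption and check the MatPtAP man page for your petsc-master checkout:

  # keep the default (nonscalable, switching to scalable when PN is large)
  mpiexec -n 1000 ./ex56 <usual ex56 options>

  # force the sparse-axpy implementation
  mpiexec -n 1000 ./ex56 <usual ex56 options> -matptap_via scalable

  # compare against hypre's PtAP (needs a hypre-enabled PETSc build)
  mpiexec -n 1000 ./ex56 <usual ex56 options> -matptap_via hypre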
Attachment: out_ex56_cetus_short (binary data)
