This work is merged to master https://bitbucket.org/petsc/petsc/commits/57c33550d09512dabc9444743ef186e1d13ee746
Hong On Tue, Feb 28, 2017 at 8:04 AM, Mark Adams <[email protected]> wrote: > Yes, this looks great. Good to get some data on this. Hard problem. > > On Mon, Feb 27, 2017 at 3:31 PM, Barry Smith <[email protected]> wrote: > >> >> Hong, >> >> Very nice, thanks! >> >> Barry >> >> >> > On Feb 27, 2017, at 2:06 PM, Hong <[email protected]> wrote: >> > >> > Mark, >> > I implemented scalable MatPtAP and did comparisons of three >> implementations using ex56.c on alcf cetus machine (this machine has small >> memory, 1GB/core): >> > - nonscalable PtAP: use an array of length PN to do dense axpy >> > - scalable PtAP: do sparse axpy without use of PN array >> > - hypre PtAP. >> > >> > The results are attached. Summary: >> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP >> > - scalable PtAP is 4x faster than hypre PtAP >> > - hypre uses less memory (see job.ne399.n63.np1000.sh) >> > >> > Based on above observation, I set the default PtAP algorithm as >> 'nonscalable'. >> > When PN > local estimated nonzero of C=PtAP, then switch default to >> 'scalable'. >> > User can overwrite default. >> > >> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get >> > MatPtAP 3.6224e+01 (nonscalable for small mats, >> scalable for larger ones) >> > scalable MatPtAP 4.6129e+01 >> > hypre 1.9389e+02 >> > >> > I'm merging this work to next, then to master soon. >> > >> > Hong >> > >> > On Wed, Feb 8, 2017 at 7:25 PM, Mark Adams <[email protected]> wrote: >> > Thanks Hong, that sounds great. >> > >> > I am weary of silent optimizations like you suggest but 2x is big! and >> failure is very bad. So I would vote for your suggestion. >> > >> > ex56 is an elasticity problem. It would be nice, now that you have this >> experimental setup in place, to compare with hypre on a 3D scalar problem. >> Hypre might not spend much effort optimizing for block matrices. 3x better >> than hypre seems large to me. I have to suspect that hypre does not >> exploit blocked matrices as much as we do. >> > >> > Thanks again, >> > >> > >> > >> > On Wed, Feb 8, 2017 at 12:07 PM, Hong <[email protected]> wrote: >> > I conducted tests on MatMatMult() and MatPtAP() using >> petsc/src/ksp/ksp/examples/tutorials/ex56.c (gamg) on a 8-core machine >> (petsc machine). The output file is attached. >> > >> > Summary: >> > 1) non-scalable MatMatMult() for mpiaij format is 2x faster than >> scalable version. The major difference between the two is dense-axpy vs. >> sparse-axpy. >> > >> > Currently, we set non-scalable as default, which leads to problem when >> running large problems. >> > How about setting default as >> > - non-scalable for small to medium size matrices >> > - scalable for larger ones, e.g. >> > >> > + ierr = PetscOptionsEList("-matmatmult_via","Algorithmic >> approach","MatMatMult",algTypes,nalg,algTypes[1],&alg,&flg); >> > >> > + if (!flg) { /* set default algorithm based on B->cmap->N */ >> > + PetscMPIInt size; >> > + ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr); >> > + if ((PetscReal)(B->cmap->N)/size > 100000.0) alg = 0; /* >> scalable algorithm */ >> > + } >> > >> > i.e., if user does NOT pick an algorithm, when ave cols per process > >> 100k, use scalable implementation; otherwise, non-scalable version. >> > >> > 2) We do NOT have scalable implementation for MatPtAP() yet. >> > We have non-scalable PtAP and interface to Hypre's PtAP. Comparing the >> two, >> > Petsc MatPtAP() is approx 3x faster than Hypre's. >> > >> > I'm writing a scalable MatPtAP() now. >> > >> > Hong >> > >> > On Thu, Feb 2, 2017 at 2:54 PM, Stefano Zampini < >> [email protected]> wrote: >> > >> > >> > Il 02 Feb 2017 23:43, "Mark Adams" <[email protected]> ha scritto: >> > >> > >> > On Thu, Feb 2, 2017 at 12:02 PM, Stefano Zampini < >> [email protected]> wrote: >> > Mark, >> > >> > I saw your configuration has hypre. If you could run with master, you >> may try -matptap_via hypre. >> > >> > This is worth trying. Does this even work with GAMG? >> > >> > Yes, it should work, except that the block sizes, if any, are not >> propagated to the resulting matrix. I can add it if you need it. >> > >> > >> > >> > Treb: try hypre anyway. It has its own RAP code. >> > >> > >> > With that option, you will use hypre's RAP with MATAIJ >> > >> > >> > It uses BoomerAMGBuildCoarseOperator directly with the AIJ matrices. >> > >> > Stefano >> > >> >> On Feb 2, 2017, at 7:28 PM, Mark Adams <[email protected]> wrote: >> >> >> >> >> >> >> >> On Thu, Feb 2, 2017 at 11:13 AM, Hong <[email protected]> wrote: >> >> Mark: >> >> Try '-matmatmult_via scalable' first. If this works, should we set it >> as default? >> >> >> >> If it is robust I would say yes unless it is noticeably slower (say >> >20%) small scale problems. >> >> >> > >> > >> > >> > >> > >> > >> > <out_ex56_cetus_short> >> >> >
