Re: [petsc-dev] no petsc on Edison

Hong Tue, 28 Feb 2017 07:55:59 -0800

This work is merged to master
https://bitbucket.org/petsc/petsc/commits/57c33550d09512dabc9444743ef186e1d13ee746


Hong

On Tue, Feb 28, 2017 at 8:04 AM, Mark Adams <[email protected]> wrote:

> Yes, this looks great. Good to get some data on this. Hard problem.
>
> On Mon, Feb 27, 2017 at 3:31 PM, Barry Smith <[email protected]> wrote:
>
>>
>>    Hong,
>>
>>   Very nice, thanks!
>>
>>     Barry
>>
>>
>> > On Feb 27, 2017, at 2:06 PM, Hong <[email protected]> wrote:
>> >
>> > Mark,
>> > I implemented scalable MatPtAP and did comparisons of three
>> implementations using ex56.c on alcf cetus machine (this machine has small
>> memory, 1GB/core):
>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>> > - scalable PtAP:       do sparse axpy without use of PN array
>> > - hypre PtAP.
>> >
>> > The results are attached. Summary:
>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>> > - scalable PtAP is 4x faster than hypre PtAP
>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>> >
>> > Based on above observation, I set the default PtAP algorithm as
>> 'nonscalable'.
>> > When PN > local estimated nonzero of C=PtAP, then switch default to
>> 'scalable'.
>> > User can overwrite default.
>> >
>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>> > MatPtAP                   3.6224e+01 (nonscalable for small mats,
>> scalable for larger ones)
>> > scalable MatPtAP     4.6129e+01
>> > hypre                        1.9389e+02
>> >
>> > I'm merging this work to next, then to master soon.
>> >
>> > Hong
>> >
>> > On Wed, Feb 8, 2017 at 7:25 PM, Mark Adams <[email protected]> wrote:
>> > Thanks Hong, that sounds great.
>> >
>> > I am weary of silent optimizations like you suggest but 2x is big! and
>> failure is very bad. So I would vote for your suggestion.
>> >
>> > ex56 is an elasticity problem. It would be nice, now that you have this
>> experimental setup in place, to compare with hypre on a 3D scalar problem.
>> Hypre might not spend much effort optimizing for block matrices. 3x better
>> than hypre seems large to me.  I have to suspect that hypre does not
>> exploit blocked matrices as much as we do.
>> >
>> > Thanks again,
>> >
>> >
>> >
>> > On Wed, Feb 8, 2017 at 12:07 PM, Hong <[email protected]> wrote:
>> > I conducted tests on MatMatMult() and MatPtAP() using
>> petsc/src/ksp/ksp/examples/tutorials/ex56.c (gamg) on a 8-core machine
>> (petsc machine). The output file is attached.
>> >
>> > Summary:
>> > 1) non-scalable MatMatMult() for mpiaij format is 2x faster than
>> scalable version. The major difference between the two is dense-axpy vs.
>> sparse-axpy.
>> >
>> > Currently, we set non-scalable as default, which leads to problem when
>> running large problems.
>> > How about setting default as
>> >  - non-scalable for small to medium size matrices
>> >  - scalable for larger ones, e.g.
>> >
>> > +    ierr = PetscOptionsEList("-matmatmult_via","Algorithmic
>> approach","MatMatMult",algTypes,nalg,algTypes[1],&alg,&flg);
>> >
>> > +    if (!flg) { /* set default algorithm based on B->cmap->N */
>> > +      PetscMPIInt size;
>> > +      ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr);
>> > +      if ((PetscReal)(B->cmap->N)/size > 100000.0) alg = 0; /*
>> scalable algorithm */
>> > +    }
>> >
>> > i.e., if user does NOT pick an algorithm, when ave cols per process >
>> 100k, use scalable implementation; otherwise, non-scalable version.
>> >
>> > 2) We do NOT have scalable implementation for MatPtAP() yet.
>> > We have non-scalable PtAP and interface to Hypre's PtAP. Comparing the
>> two,
>> > Petsc MatPtAP() is approx 3x faster than Hypre's.
>> >
>> > I'm writing a scalable MatPtAP() now.
>> >
>> > Hong
>> >
>> > On Thu, Feb 2, 2017 at 2:54 PM, Stefano Zampini <
>> [email protected]> wrote:
>> >
>> >
>> > Il 02 Feb 2017 23:43, "Mark Adams" <[email protected]> ha scritto:
>> >
>> >
>> > On Thu, Feb 2, 2017 at 12:02 PM, Stefano Zampini <
>> [email protected]> wrote:
>> > Mark,
>> >
>> > I saw your configuration has hypre. If you could run with master, you
>> may try -matptap_via hypre.
>> >
>> > This is worth trying. Does this even work with GAMG?
>> >
>> > Yes, it should work, except that the block sizes, if any, are not
>> propagated to the resulting matrix. I can add it if you need it.
>> >
>> >
>> >
>> > Treb: try hypre anyway. It has its own RAP code.
>> >
>> >
>> > With that option, you will use hypre's RAP with MATAIJ
>> >
>> >
>> > It uses BoomerAMGBuildCoarseOperator directly with the AIJ matrices.
>> >
>> > Stefano
>> >
>> >> On Feb 2, 2017, at 7:28 PM, Mark Adams <[email protected]> wrote:
>> >>
>> >>
>> >>
>> >> On Thu, Feb 2, 2017 at 11:13 AM, Hong <[email protected]> wrote:
>> >> Mark:
>> >> Try '-matmatmult_via scalable' first. If this works, should we set it
>> as default?
>> >>
>> >> If it is robust I would say yes unless it is noticeably slower (say
>> >20%) small scale problems.
>> >>
>> >
>> >
>> >
>> >
>> >
>> >
>> > <out_ex56_cetus_short>
>>
>>
>

Re: [petsc-dev] no petsc on Edison

Reply via email to