Re: [petsc-dev] no petsc on Edison

Hong Mon, 27 Feb 2017 12:07:08 -0800

Mark,
I implemented scalable MatPtAP and did comparisons of three implementations
using ex56.c on alcf cetus machine (this machine has small memory,
1GB/core):
- nonscalable PtAP: use an array of length PN to do dense axpy
- scalable PtAP:       do sparse axpy without use of PN array
- hypre PtAP.


The results are attached. Summary:
- nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
- scalable PtAP is 4x faster than hypre PtAP
- hypre uses less memory (see job.ne399.n63.np1000.sh)

Based on above observation, I set the default PtAP algorithm as
'nonscalable'.
When PN > local estimated nonzero of C=PtAP, then switch default to
'scalable'.
User can overwrite default.

For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
MatPtAP                   3.6224e+01 (nonscalable for small mats, scalable
for larger ones)
scalable MatPtAP     4.6129e+01
hypre                        1.9389e+02

I'm merging this work to next, then to master soon.

Hong

On Wed, Feb 8, 2017 at 7:25 PM, Mark Adams <[email protected]> wrote:

> Thanks Hong, that sounds great.
>
> I am weary of silent optimizations like you suggest but 2x is big! and
> failure is very bad. So I would vote for your suggestion.
>
> ex56 is an elasticity problem. It would be nice, now that you have this
> experimental setup in place, to compare with hypre on a 3D scalar
> problem. Hypre might not spend much effort optimizing for block matrices.
> 3x better than hypre seems large to me.  I have to suspect that hypre does
> not exploit blocked matrices as much as we do.
>
> Thanks again,
>
>
>
> On Wed, Feb 8, 2017 at 12:07 PM, Hong <[email protected]> wrote:
>
>> I conducted tests on MatMatMult() and MatPtAP()
>> using petsc/src/ksp/ksp/examples/tutorials/ex56.c (gamg) on a 8-core
>> machine (petsc machine). The output file is attached.
>>
>> Summary:
>> 1) non-scalable MatMatMult() for mpiaij format is 2x faster than scalable
>> version. The major difference between the two is dense-axpy vs. sparse-axpy.
>>
>> Currently, we set non-scalable as default, which leads to problem when
>> running large problems.
>> How about setting default as
>>  - non-scalable for small to medium size matrices
>>  - scalable for larger ones, e.g.
>>
>> +    ierr = PetscOptionsEList("-matmatmult_via","Algorithmic
>> approach","MatMatMult",algTypes,nalg,algTypes[1],&alg,&flg);
>>
>> +    if (!flg) { /* set default algorithm based on B->cmap->N */
>> +      PetscMPIInt size;
>> +      ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr);
>> +      if ((PetscReal)(B->cmap->N)/size > 100000.0) alg = 0; /* scalable
>> algorithm */
>> +    }
>>
>> i.e., if user does NOT pick an algorithm, when ave cols per process >
>> 100k, use scalable implementation; otherwise, non-scalable version.
>>
>> 2) We do NOT have scalable implementation for MatPtAP() yet.
>> We have non-scalable PtAP and interface to Hypre's PtAP. Comparing the
>> two,
>> Petsc MatPtAP() is approx 3x faster than Hypre's.
>>
>> I'm writing a scalable MatPtAP() now.
>>
>> Hong
>>
>> On Thu, Feb 2, 2017 at 2:54 PM, Stefano Zampini <
>> [email protected]> wrote:
>>
>>>
>>>
>>> Il 02 Feb 2017 23:43, "Mark Adams" <[email protected]> ha scritto:
>>>
>>>
>>>
>>> On Thu, Feb 2, 2017 at 12:02 PM, Stefano Zampini <
>>> [email protected]> wrote:
>>>
>>>> Mark,
>>>>
>>>> I saw your configuration has hypre. If you could run with master, you
>>>> may try -matptap_via hypre.
>>>>
>>>
>>> This is worth trying. Does this even work with GAMG?
>>>
>>>
>>> Yes, it should work, except that the block sizes, if any, are not
>>> propagated to the resulting matrix. I can add it if you need it.
>>>
>>>
>>>
>>> Treb: try hypre anyway. It has its own RAP code.
>>>
>>>
>>>
>>> With that option, you will use hypre's RAP with MATAIJ
>>>
>>>
>>> It uses BoomerAMGBuildCoarseOperator directly with the AIJ matrices.
>>>>
>>>> Stefano
>>>>
>>>> On Feb 2, 2017, at 7:28 PM, Mark Adams <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Feb 2, 2017 at 11:13 AM, Hong <[email protected]> wrote:
>>>>
>>>>> Mark:
>>>>> Try '-matmatmult_via scalable' first. If this works, should we set it
>>>>> as default?
>>>>>
>>>>
>>>> If it is robust I would say yes unless it is noticeably slower (say
>>>> >20%) small scale problems.
>>>>
>>>>
>>>>
>>>
>>>
>>
>

out_ex56_cetus_short
Description: Binary data

Re: [petsc-dev] no petsc on Edison

Reply via email to