[petsc-users] ASM vs GASM

2019-04-29 Thread Griffith, Boyce Eugene via petsc-users
I am trying to set up a GASM preconditioner and am running into some difficulty 
getting convergence even with exact subdomain solvers. I am just doing things 
in serial right now; I tried switching to ASM and do get convergence. I am 
wondering if I am misunderstanding the GASM interface. If inner_ises and 
outer_ises define the “inner” and “outer” subdomains for GASM, should GASM and 
ASM do the same thing when configured via:

PCGASMSetSubdomains(pc, n_subdomains, inner_ises, outer_ises);

and

PCASMSetLocalSubdomains(pc, n_subdomains, outer_ises, inner_ises);
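
For concreteness, here is a minimal serial sketch of the two configurations being 
compared; the index sets are purely illustrative (one subdomain, inner indices 
0..9, outer indices 0..14), and no operator is attached:

#include <petscksp.h>

int main(int argc, char **argv)
{
  PC             pc;
  IS             inner_ises[1], outer_ises[1];
  PetscInt       n_subdomains = 1;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* Illustrative subdomains: inner = nonoverlapping part, outer = inner plus overlap. */
  ierr = ISCreateStride(PETSC_COMM_SELF, 10, 0, 1, &inner_ises[0]);CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, 15, 0, 1, &outer_ises[0]);CHKERRQ(ierr);

  ierr = PCCreate(PETSC_COMM_SELF, &pc);CHKERRQ(ierr);

  /* GASM takes (inner, outer) ... */
  ierr = PCSetType(pc, PCGASM);CHKERRQ(ierr);
  ierr = PCGASMSetSubdomains(pc, n_subdomains, inner_ises, outer_ises);CHKERRQ(ierr);

  /* ... whereas ASM takes (overlapping, nonoverlapping), i.e., (outer, inner):
     ierr = PCSetType(pc, PCASM);CHKERRQ(ierr);
     ierr = PCASMSetLocalSubdomains(pc, n_subdomains, outer_ises, inner_ises);CHKERRQ(ierr); */

  ierr = PCDestroy(&pc);CHKERRQ(ierr);
  ierr = ISDestroy(&inner_ises[0]);CHKERRQ(ierr);
  ierr = ISDestroy(&outer_ises[0]);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}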

Thanks!

— Boyce

Re: [petsc-users] FIELDSPLIT fields

2018-08-15 Thread Griffith, Boyce Eugene


> On Aug 15, 2018, at 10:07 PM, Smith, Barry F.  wrote:
> 
> 
>   Yes you can have "overlapping fields" with FIELDSPLIT but I don't think you 
> can use FIELDSPLIT for your case. You seem to have a geometric decomposition 
> into regions. ASM and GASM are intended for such decompositions. Fieldsplit 
> is for multiple fields that each live across the entire domain.

Basically, there is one field that lives on the entire domain, and another field 
that lives only on a subdomain.

Perhaps we could do GASM for the geometric split and FIELDSPLIT within the 
subdomain with the two fields.
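
A rough sketch of what that nesting might look like (the function and field names 
are illustrative, the field index sets would need to be expressed in the 
subdomain's numbering, and the operators are assumed to already be set on the KSP 
before this is called):

#include <petscksp.h>

/* Sketch only: GASM handles the geometric decomposition, and the sub-KSP on the
   subdomain that carries both fields gets a FIELDSPLIT preconditioner. */
PetscErrorCode ConfigureSubdomainFieldSplit(KSP ksp, PetscInt which_sub,
                                            IS is_field_u, IS is_field_c)
{
  PC             pc, subpc;
  KSP           *subksp;
  PetscInt       n_local, first;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCGASM);CHKERRQ(ierr);
  /* ... PCGASMSetSubdomains(pc, ...) with the geometric subdomains ... */
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);                 /* sub-KSPs exist only after setup */
  ierr = PCGASMGetSubKSP(pc, &n_local, &first, &subksp);CHKERRQ(ierr);

  ierr = KSPGetPC(subksp[which_sub], &subpc);CHKERRQ(ierr);
  ierr = PCSetType(subpc, PCFIELDSPLIT);CHKERRQ(ierr);
  ierr = PCFieldSplitSetIS(subpc, "u", is_field_u);CHKERRQ(ierr); /* field on the whole domain */
  ierr = PCFieldSplitSetIS(subpc, "c", is_field_c);CHKERRQ(ierr); /* field only on this subdomain */
  PetscFunctionReturn(0);
}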

>   Barry
> 
> 
>> On Aug 15, 2018, at 7:42 PM, Griffith, Boyce Eugene  
>> wrote:
>> 
>> Is it permissible to have overlapping fields in FIELDSPLIT? We are 
>> specifically thinking about how to handle DOFs living on the interface 
>> between two regions.
>> 
>> Thanks!
>> 
>> — Boyce
> 



Re: [petsc-users] FIELDSPLIT fields

2018-08-15 Thread Griffith, Boyce Eugene


On Aug 15, 2018, at 9:17 PM, Matthew Knepley <knep...@gmail.com> wrote:

On Wed, Aug 15, 2018 at 8:42 PM Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
Is it permissible to have overlapping fields in FIELDSPLIT? We are specifically 
thinking about how to handle DOFs living on the interface between two regions.

There is only 1 IS, so there is no way to do RASM, or anything else, on the 
overlap. This sort of thing was supposed to be handled by GASM.

There are three big blocks, with one interface between two of them. It may not 
make much difference which subdomain the interfacial DOFs are assigned to.

I am not sure that works. If you want blocks that are not parallel, then you 
can probably use PCPATCH as soon as I get it merged.

   Matt

Thanks!

— Boyce


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.caam.rice.edu/~mk51/>



[petsc-users] FIELDSPLIT fields

2018-08-15 Thread Griffith, Boyce Eugene
Is it permissible to have overlapping fields in FIELDSPLIT? We are specifically 
thinking about how to handle DOFs living on the interface between two regions.

Thanks!

— Boyce

Re: [petsc-users] Neumann BC with non-symmetric matrix

2016-03-01 Thread Griffith, Boyce Eugene

On Mar 1, 2016, at 10:56 AM, Jed Brown wrote:

Boyce Griffith writes:
Jed, can you also do this for Stokes?  It seems like something like
RT0 is the right place to start.

See, for example, Arnold, Falk, and Winther's 2007 paper on mixed FEM
for elasticity with weakly imposed symmetry.  It's the usual H(div)
methodology and should apply equally well to Stokes.  I'm not aware of
any analysis or results of choosing quadrature to eliminate flux terms
in these discretizations.

Two papers that are along the direction that I have in mind are:

http://onlinelibrary.wiley.com/doi/10.1002/fld.1566/abstract
http://onlinelibrary.wiley.com/doi/10.1002/fld.1723/abstract

I would love to know how to do this kind of thing on a SAMR or octree grid.

-- Boyce


Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()

2016-01-17 Thread Griffith, Boyce Eugene
Barry --

Another random thought --- are these smallish direct solves things that make 
sense to (try to) offload to a GPU?

Thanks,

-- Boyce

> On Jan 16, 2016, at 10:46 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> 
> 
>  Boyce,
> 
>   Of course anything is possible in software. But I expect that an optimization 
> to avoid rebuilding common submatrices/factorizations requires a custom 
> PCSetUp_ASM() rather than some PETSc option that we could add (especially if 
> you are using Matt's PC_COMPOSITE_MULTIPLICATIVE).
> 
>   I would start by copying PCSetUp_ASM(), stripping out all the setup stuff 
> that doesn't relate to your code, and then marking identical domains so you 
> don't need to call MatGetSubMatrices() on those domains and don't create a new 
> KSP for each one of those subdomains (but reuse a common one). The 
> PCApply_ASM() should hopefully be reusable so long as you have created the 
> full array of KSP objects (some of which will be common). If you increase the 
> reference counts of the common KSPs in PCSetUp_ASM() (and maybe the common 
> submatrices), then PCDestroy_ASM() should also work unchanged.
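
The reference-counting part of the above can be sketched in isolation (this is 
only an illustration of the idea, not the actual PCSetUp_ASM()/PCDestroy_ASM() 
modification; the number of slots is arbitrary):

#include <petscksp.h>

int main(int argc, char **argv)
{
  KSP            common, slots[4];
  PetscInt       i;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = KSPCreate(PETSC_COMM_SELF, &common);CHKERRQ(ierr);

  for (i = 0; i < 4; i++) {
    slots[i] = common;                                              /* every "identical" block shares one KSP */
    ierr = PetscObjectReference((PetscObject)common);CHKERRQ(ierr); /* one extra reference per slot */
  }

  /* A uniform destroy loop still works: each KSPDestroy() only decrements the count. */
  for (i = 0; i < 4; i++) { ierr = KSPDestroy(&slots[i]);CHKERRQ(ierr); }
  ierr = KSPDestroy(&common);CHKERRQ(ierr); /* last reference released here */
  ierr = PetscFinalize();
  return ierr;
}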
> 
> Good luck,
> 
>  Barry
> 
>> On Jan 16, 2016, at 8:25 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> 
>> wrote:
>> 
>> 
>>> On Jan 16, 2016, at 7:00 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>> 
>>> 
>>> Ok, I looked at your results in hpcviewer and don't see any surprises. The 
>>> PETSc time is in the little LU factorizations, the LU solves, and the 
>>> matrix-vector products, as it should be. Not much can be done to speed these 
>>> up except running on machines with high memory bandwidth.
>> 
>> Looks like LU factorizations are about 25% for this particular case.  Many 
>> of these little subsystems are going to be identical (many will correspond 
>> to constant coefficient Stokes), and it is fairly easy to figure out which 
>> are which.  How hard would it be to modify PCASM to allow for the 
>> specification of one or more "default" KSPs that can be used for specified 
>> blocks?
>> 
>> Of course, we'll also look into tweaking the subdomain solves --- it may not 
>> even be necessary to do exact subdomain solves to get reasonable MG 
>> performance.
>> 
>> -- Boyce
>> 
>>> If you are using the master branch of PETSc, two users gave us a nifty new 
>>> profiler that is "PETSc style" but shows the hierarchy of PETSc solvers' 
>>> time and flops, etc. You can run with -log_view :filename.xml:ascii_xml and 
>>> then open the file with a browser (for example, open -a Safari filename.xml) 
>>> or email the file.
>>> 
>>> Barry
>>> 
>>>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S <amne...@live.unc.edu> 
>>>> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jan 16, 2016, at 1:13 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>> 
>>>>> Either way is fine so long as I don't have to install a ton of stuff, 
>>>>> which it sounds like I won’t.
>>>> 
>>>> http://hpctoolkit.org/download/hpcviewer/
>>>> 
>>>> Unzip HPCViewer for Mac OS X from the command line and drag the unzipped 
>>>> folder to Applications. You will be able to launch HPCViewer from Launchpad. 
>>>> Point it to the attached directory. You will be able to see three different 
>>>> kinds of profiling under Calling Context View, Callers View, and Flat View.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 



Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()

2016-01-16 Thread Griffith, Boyce Eugene

> On Jan 16, 2016, at 7:00 PM, Barry Smith  wrote:
> 
> 
>  Ok, I looked at your results in hpcviewer and don't see any surprises. The 
> PETSc time is in the little LU factorizations, the LU solves, and the 
> matrix-vector products, as it should be. Not much can be done to speed these 
> up except running on machines with high memory bandwidth.

Looks like LU factorizations are about 25% for this particular case.  Many of 
these little subsystems are going to be identical (many will correspond to 
constant coefficient Stokes), and it is fairly easy to figure out which are 
which.  How hard would it be to modify PCASM to allow for the specification of 
one or more "default" KSPs that can be used for specified blocks?

Of course, we'll also look into tweaking the subdomain solves --- it may not 
even be necessary to do exact subdomain solves to get reasonable MG performance.

-- Boyce

>   If you are using the master branch of PETSc, two users gave us a nifty new 
> profiler that is "PETSc style" but shows the hierarchy of PETSc solvers' time 
> and flops, etc. You can run with -log_view :filename.xml:ascii_xml and then 
> open the file with a browser (for example, open -a Safari filename.xml) or 
> email the file.
> 
>   Barry
> 
>> On Jan 16, 2016, at 5:09 PM, Bhalla, Amneet Pal S  
>> wrote:
>> 
>> 
>> 
>>> On Jan 16, 2016, at 1:13 PM, Barry Smith  wrote:
>>> 
>>> Either way is fine so long as I don't have to install a ton of stuff, which 
>>> it sounds like I won’t.
>> 
>> http://hpctoolkit.org/download/hpcviewer/
>> 
>> Unzip HPCViewer for Mac OS X from the command line and drag the unzipped 
>> folder to Applications. You will be able to launch HPCViewer from Launchpad. 
>> Point it to the attached directory. You will be able to see three different 
>> kinds of profiling under Calling Context View, Callers View, and Flat View.
>> 
>> 
>> 
>> 
>> 
> 



Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()

2016-01-16 Thread Griffith, Boyce Eugene

On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S wrote:



On Jan 15, 2016, at 5:40 PM, Matthew Knepley wrote:

I am inclined to try
Barry's experiment first, since this may have bugs that we have not yet 
discovered.

Ok, I tried Barry’s suggestion. The runtime for PetscOptionsFindPair_Private() 
fell from 14% to a mere 1.6%. If I am getting it right, it’s the PETSc options 
handling in KSPSolve() that is sucking up a nontrivial amount of time 
(14% - 1.6%) and not KSPSetFromOptions() itself (1.6%).

Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that 
also bypass these calls to PetscOptionsXXX?

Thanks,

-- Boyce


Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()

2016-01-16 Thread Griffith, Boyce Eugene

On Jan 16, 2016, at 4:06 PM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:

Does Instruments save results somewhere (like in a cascade view) that I can 
send to Barry?

Yes --- "save as..." will save the current trace, and then you can open it back 
up.

-- Boyce

--Amneet Bhalla

On Jan 16, 2016, at 1:04 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:


On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:



--Amneet Bhalla

On Jan 16, 2016, at 10:21 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:


On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:


On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:



On Jan 15, 2016, at 5:40 PM, Matthew Knepley <knep...@gmail.com> wrote:

I am inclined to try
Barry's experiment first, since this may have bugs that we have not yet 
discovered.

Ok, I tried Barry’s suggestion. The runtime for PetscOptionsFindPair_Private() 
fell from 14% to a mere 1.6%. If I am getting it right, it’s the PETSc options 
handling in KSPSolve() that is sucking up a nontrivial amount of time 
(14% - 1.6%) and not KSPSetFromOptions() itself (1.6%).

Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that 
also bypass these calls to PetscOptionsXXX?

 No, that is a different issue.

  In the short term, I recommend that when running optimized/production you work 
with a PETSc build that has the options checking in KSPSolve() commented out; you 
don't use it anyway*.  Since you are using ASM with many subdomains, there are 
many "fast" calls to KSPSolve(), which is why, for your particular case, 
PetscOptionsFindPair_Private() takes so much time.

 Now that you have eliminated this issue, I would be very interested in seeing 
the HPCToolkit or Instruments profiling of the code to see the hot spots in the 
PETSc solver configuration you are using. Thanks

Barry --- the best way, and the one with the least back and forth, would be if I 
can send you the files (maybe off-list) that you can view in HPCViewer, which is a 
lightweight Java app. You can view the calling context (which PETSc function calls 
which internal PETSc routine) in a cascade form. If I send you an Excel sheet, it 
would be in a flat view and not that useful for serious profiling.

Amneet, can you just run with OS X Instruments, which Barry already knows how 
to use (right Barry?)? :-)

Thanks,

-- Boyce


Let me know if you would like to try that.

  Barry

* Eventually we'll switch to a KSPPreSolveMonitorSet() and 
KSPPostSolveMonitorSet() model to eliminate this overhead but still have the 
functionality.


Thanks,

-- Boyce





Re: [petsc-users] Amortizing calls to PetscOptionsFindPair_Private()

2016-01-16 Thread Griffith, Boyce Eugene

On Jan 16, 2016, at 4:00 PM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:



--Amneet Bhalla

On Jan 16, 2016, at 10:21 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:


On Jan 16, 2016, at 7:12 AM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:


On Jan 16, 2016, at 12:34 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:



On Jan 15, 2016, at 5:40 PM, Matthew Knepley <knep...@gmail.com> wrote:

I am inclined to try
Barry's experiment first, since this may have bugs that we have not yet 
discovered.

Ok, I tried Barry’s suggestion. The runtime for PetscOptionsFindPair_Private() 
fell from 14% to a mere 1.6%. If I am getting it right, it’s the PETSc options 
handling in KSPSolve() that is sucking up a nontrivial amount of time 
(14% - 1.6%) and not KSPSetFromOptions() itself (1.6%).

Barry / Matt / Jed, if we were using KSPReset here and reusing KSPs, would that 
also bypass these calls to PetscOptionsXXX?

 No, that is a different issue.

  In the short term, I recommend that when running optimized/production you work 
with a PETSc build that has the options checking in KSPSolve() commented out; you 
don't use it anyway*.  Since you are using ASM with many subdomains, there are 
many "fast" calls to KSPSolve(), which is why, for your particular case, 
PetscOptionsFindPair_Private() takes so much time.

 Now that you have eliminated this issue, I would be very interested in seeing 
the HPCToolkit or Instruments profiling of the code to see the hot spots in the 
PETSc solver configuration you are using. Thanks

Barry --- the best way, and the one with the least back and forth, would be if I 
can send you the files (maybe off-list) that you can view in HPCViewer, which is a 
lightweight Java app. You can view the calling context (which PETSc function calls 
which internal PETSc routine) in a cascade form. If I send you an Excel sheet, it 
would be in a flat view and not that useful for serious profiling.

Amneet, can you just run with OS X Instruments, which Barry already knows how 
to use (right Barry?)? :-)

Thanks,

-- Boyce


Let me know if you would like to try that.

  Barry

* Eventually we'll switch to a KSPPreSolveMonitorSet() and 
KSPPostSolveMonitorSet() model to eliminate this overhead but still have the 
functionality.


Thanks,

-- Boyce




Re: [petsc-users] HPCToolKit/HPCViewer on OS X

2016-01-14 Thread Griffith, Boyce Eugene

> On Jan 14, 2016, at 2:24 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> 
> 
>   Matt is right, there is a lot of "missing" time from the output. Please 
> send the output from -ksp_view so we can see exactly what solver is being 
> used. 
> 
>   From the output we have:
> 
>Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC 
> is taking about 22% of the time)
>Linear solver 77 % of the time (this is reasonable pretty much the entire 
> cost of the nonlinear solve is the linear solve)
>Time to set up the preconditioner is 19%  (10 + 9)  
>Time of iteration in KSP 35 % (this is the sum of the vector operations 
> and MatMult() and MatSolve())
> 
> So 77 - (19 + 35) = 23 % unexplained time inside the linear solver 
> (custom preconditioner???)
> 
>Also getting the results with Instruments or HPCToolkit would be useful 
> (so long as we don't need to install HPCTool ourselves to see the results).

Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some 
matrix-based stuff implemented using PETSc along with some matrix-free stuff 
that is built on top of SAMRAI. Amneet and I should take a look at performance 
off-list first.

-- Boyce

> 
> 
>   Barry
> 
>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> 
>> wrote:
>> 
>> 
>> 
>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> 
>>> wrote:
>>> 
>>> I see one hot spot:
>> 
>> 
>> Here is with opt build
>> 
>> 
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
>> -fCourier9' to print this document***
>> 
>> 
>> -- PETSc Performance Summary: 
>> --
>> 
>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 
>> 02:24:43 2016
>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2  GIT Date: 
>> 2016-01-13 21:30:26 -0600
>> 
>> Max   Max/MinAvg  Total 
>> Time (sec):   1.018e+00  1.0   1.018e+00
>> Objects:  2.935e+03  1.0   2.935e+03
>> Flops:4.957e+08  1.0   4.957e+08  4.957e+08
>> Flops/sec:4.868e+08  1.0   4.868e+08  4.868e+08
>> MPI Messages: 0.000e+00  0.0   0.000e+00  0.000e+00
>> MPI Message Lengths:  0.000e+00  0.0   0.000e+00  0.000e+00
>> MPI Reductions:   0.000e+00  0.0
>> 
>> Flop counting convention: 1 flop = 1 real number operation of type 
>> (multiply/divide/add/subtract)
>>e.g., VecAXPY() for real vectors of length N --> 
>> 2N flops
>>and VecAXPY() for complex vectors of length N --> 
>> 8N flops
>> 
>> Summary of Stages:   - Time --  - Flops -  --- Messages ---  
>> -- Message Lengths --  -- Reductions --
>>Avg %Total Avg %Total   counts   %Total   
>>   Avg %Total   counts   %Total 
>> 0:  Main Stage: 1.0183e+00 100.0%  4.9570e+08 100.0%  0.000e+00   0.0%  
>> 0.000e+000.0%  0.000e+00   0.0% 
>> 
>> 
>> See the 'Profiling' chapter of the users' manual for details on interpreting 
>> output.
>> Phase summary info:
>>   Count: number of times phase was executed
>>   Time and Flops: Max - maximum over all processors
>>   Ratio - ratio of maximum to minimum over all processors
>>   Mess: number of messages sent
>>   Avg. len: average message length (bytes)
>>   Reduct: number of global reductions
>>   Global: entire computation
>>   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
>> PetscLogStagePop().
>>  %T - percent time in this phase %F - percent flops in this phase
>>  %M - percent messages in this phase %L - percent message lengths in 
>> this phase
>>  %R - percent reductions in this phase
>>   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over 
>> all processors)
>> --

Re: [petsc-users] HPCToolKit/HPCViewer on OS X

2016-01-14 Thread Griffith, Boyce Eugene

On Jan 14, 2016, at 3:09 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:


On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:


On Jan 14, 2016, at 2:24 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:


Matt is right, there is a lot of "missing" time from the output. Please send 
the output from -ksp_view so we can see exactly what solver is being used.

From the output we have:

 Nonlinear solver 78 % of the time (so your "setup code" outside of PETSC is 
taking about 22% of the time)
 Linear solver 77 % of the time (this is reasonable pretty much the entire cost 
of the nonlinear solve is the linear solve)
 Time to set up the preconditioner is 19%  (10 + 9)
 Time of iteration in KSP 35 % (this is the sum of the vector operations and 
MatMult() and MatSolve())

  So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom 
preconditioner???)

 Also getting the results with Instruments or HPCToolkit would be useful (so 
long as we don't need to install HPCTool ourselves to see the results).

Thanks, Barry (& Matt & Dave) --- This is a solver that is mixing some 
matrix-based stuff implemented using PETSc along with some matrix-free stuff 
that is built on top of SAMRAI. Amneet and I should take a look at performance 
off-list first.

  Just put a PetscLogEvent() in (or several) to track that part. Plus put an 
event or two outside the SNESSolve() to track the setup time outside of PETSc.

  The PETSc time looks reasonable; at most, I can only imagine optimizations we 
could do bringing it down by a small percentage.
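
For example, a user-defined event wrapping the SAMRAI matrix-free apply might 
look like the following sketch (the event and class names are illustrative):

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscLogEvent  USER_MatFreeApply;
  PetscClassId   classid;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
  ierr = PetscClassIdRegister("UserApp", &classid);CHKERRQ(ierr);
  ierr = PetscLogEventRegister("UserMatFreeApply", classid, &USER_MatFreeApply);CHKERRQ(ierr);

  ierr = PetscLogEventBegin(USER_MatFreeApply, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... the matrix-free operator application built on SAMRAI would go here ... */
  ierr = PetscLogEventEnd(USER_MatFreeApply, 0, 0, 0, 0);CHKERRQ(ierr);

  ierr = PetscFinalize(); /* the event then shows up in the -log_view output */
  return ierr;
}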

Here is a bit more info about what we are trying to do:

This is a Vanka-type MG preconditioner for a Stokes-like system on a structured 
grid. (Currently just uniform grids, but hopefully soon with AMR.) For the 
smoother, we are using damped Richardson + ASM with relatively small block 
subdomains --- e.g., all DOFs associated with 8x8 cells in 2D (~300 DOFs), or 
8x8x8 in 3D (~2500 DOFs). Unfortunately, MG iteration counts really tank when 
using smaller subdomains.
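
For reference, a runtime-options sketch of that kind of level smoother, assuming 
a stock PCMG configuration with the usual option prefixes; the damping factor and 
sweep count are placeholders, and the 8x8-cell block decomposition itself is set 
in code rather than by these options:

-pc_type mg
# damped Richardson smoother on each level (damping factor is a placeholder)
-mg_levels_ksp_type richardson
-mg_levels_ksp_richardson_scale 0.7
-mg_levels_ksp_max_it 2
# ASM on each level, with an exact (LU) solve on every little block
-mg_levels_pc_type asm
-mg_levels_sub_ksp_type preonly
-mg_levels_sub_pc_type lu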

I can't remember whether we have quantified this carefully, but PCASM seems to 
bog down with smaller subdomains. A question is whether there are different 
implementation choices that could make the case of "lots of little subdomains" 
run faster. But before we get to that, Amneet and I should take a more careful 
look at overall solver performance.

(We are also starting to play around with PCFIELDSPLIT for this problem too, 
although we don't have many ideas about how to handle the Schur complement.)

Thanks,

-- Boyce



  Barry


-- Boyce



Barry

On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:



On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:

I see one hot spot:


Here is with opt build


*** WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document***


-- PETSc Performance Summary: 
--

./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 
02:24:43 2016
Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2  GIT Date: 
2016-01-13 21:30:26 -0600

  Max   Max/MinAvg  Total
Time (sec):   1.018e+00  1.0   1.018e+00
Objects:  2.935e+03  1.0   2.935e+03
Flops:4.957e+08  1.0   4.957e+08  4.957e+08
Flops/sec:4.868e+08  1.0   4.868e+08  4.868e+08
MPI Messages: 0.000e+00  0.0   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00  0.0   0.000e+00  0.000e+00
MPI Reductions:   0.000e+00  0.0

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
 e.g., VecAXPY() for real vectors of length N --> 2N 
flops
 and VecAXPY() for complex vectors of length N --> 8N 
flops

Summary of Stages:   - Time --  - Flops -  --- Messages ---  -- 
Message Lengths --  -- Reductions --
 Avg %Total Avg %Total   counts   %Total 
Avg %Total   counts   %Total
0:  Main Stage: 1.0183e+00 100.0%  4.9570e+08 100.0%  0.000e+00   0.0%  
0.000e+000.0% 0.000e+00   0.0%


See 

Re: [petsc-users] HPCToolKit/HPCViewer on OS X

2016-01-13 Thread Griffith, Boyce Eugene
I see one hot spot:

On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S wrote:

  ##
  ##
  #  WARNING!!!#
  ##
  #   This code was compiled with a debugging option,  #
  #   To get timing results run ./configure#
  #   using --with-debugging=no, the performance will  #
  #   be generally two or three times faster.  #
  ##
  ##