Thank you very much!
KPS Setup does appear in the flame login it in the far left (and you have to 
zoom to see it using the browser)
I will look into the debugger option.

You said “. -pc_bjacobi_local_blocks 2 indicates you want to 2 blocks per rank” 
does it imply on mpi -n 2 that KSPSetUP should be called 4 times?


The below shows a different behaviour?

mpirun -n 6 ……… -pc_bjacobi_local_blocks 1 • KSPSetUP is called 2 times

mpirun -n 6 ……… -pc_bjacobi_local_blocks 2 • KSPSetUP is called 3 times

mpirun -n 6 ……… -pc_bjacobi_local_blocks 3 • KSPSetUP is called 4 times



mpirun -n 3 ……… -pc_bjacobi_local_blocks 1 • KSPSetUP is called 2 times

mpirun -n 3 ……… -pc_bjacobi_local_blocks 2 • KSPSetUP is called 3 times

mpirun -n 3 ……… -pc_bjacobi_local_blocks 3 • KSPSetUP is called 4 times

Thanks,
Karthik.

From: Barry Smith <[email protected]>
Date: Tuesday, 5 October 2021 at 17:47
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]>
Cc: Matthew Knepley <[email protected]>, "[email protected]" 
<[email protected]>
Subject: Re: [petsc-users] (percent time in this phase)




On Oct 5, 2021, at 12:28 PM, Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:

Thanks Barry.

Please find the attached screen shoot (flamegraph) and foo.txt which generated 
that graph (using https://www.speedscope.app<https://www.speedscope.app/>).
If you look at the far right of the flamegraph PCSetUpOnBlock() calls PCSetUp() 
and not KSPSetup(). Unless, I am not reading the graph right?

 For block Jacobi

static PetscErrorCode PCSetUpOnBlocks_BJacobi_Singleblock(PC pc)
{
  PetscErrorCode     ierr;
  PC_BJacobi         *jac = (PC_BJacobi*)pc->data;
  KSP                subksp = jac->ksp[0];
  KSPConvergedReason reason;

  PetscFunctionBegin;
  ierr = KSPSetUp(subksp);CHKERRQ(ierr);
  ierr = KSPGetConvergedReason(subksp,&reason);CHKERRQ(ierr);
  if (reason == KSP_DIVERGED_PC_FAILED) {
    pc->failedreason = PC_SUBPC_ERROR;
  }
  PetscFunctionReturn(0);
}

I am not sure why the KSPSetUp does not appear in the Flame logging, there may 
be something that ensures it does not get logged.

KSPSetUpOnBlocks() is not logged so does not appear in the logging.

Sometimes you may need to run in the debugger with break points on certain 
functions to indicate if they are called and when

Secondly how can I know, how many blocks are being setup? Is there a default 
flag on the number of blocks being SetUp?

For ASM and block Jacobi the default blocks is one per MPI rank. 
-pc_bjacobi_local_blocks 2 indicates you want to 2 blocks per rank






From: Barry Smith <[email protected]<mailto:[email protected]>>
Date: Tuesday, 5 October 2021 at 17:15
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: Matthew Knepley <[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] (percent time in this phase)


  PCSetUpOnBlocks() calls KSPSetUp() on each block which calls PCSetUp() on 
each block. This is why you are getting the confusing logging results. So the "
extra" KSPSetUp() is the one on the blocks.



On Oct 5, 2021, at 12:02 PM, Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:

Hi Matt,

I have a couple of questions;


  1.  Weather I run on single core or on multiple cores I find that KSPSetUP 
(ksp/tutorial/ex.45.c) is always called twice. But why? Is setup not done once?
  2.  I find PCSetUpOnBlocks is calling PCSetup and not the other way around. 
Can you shed some light? The preconditioner used is block jacobi.

Thanks,
Karthik.

From: Matthew Knepley <[email protected]<mailto:[email protected]>>
Date: Sunday, 3 October 2021 at 12:54
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] (percent time in this phase)

On Sun, Oct 3, 2021 at 5:43 AM Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:
Hi Matt,

Thank you. The flamegraph tool is helpful.
Please find the attached screen shoot and foo.txt which generated that graph 
(using https://www.speedscope.app<https://www.speedscope.app/>).
I find the following call sequence from the graph
KSPSolve -> PCApply -> KSPSolve -> PCApply -> MatSolve

I have a couple of questions

  1.  The KSPSolve time listed in the file using -log_summary (or -log_view), 
is it the time taken by the first KSPSolve (in the above call sequence)?
Yes. When calls are nested, we just do not record the time for the nested call 
in log_view.


  1.
  2.  What is the unit of measurement in flamegrah?
I believe it is microseconds, but I am not sure.

  Thanks,

     Matt


  1.
Thanks,
Karthik.

From: Matthew Knepley <[email protected]<mailto:[email protected]>>
Date: Friday, 1 October 2021 at 14:51
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: Barry Smith <[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] (percent time in this phase)

On Thu, Sep 30, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:
When comparing the MatSolve data for

GPU

MatSolve             341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 
0.0e+00 36 34  0  0  0  36 34  0  0  0  4536    4538      1 5.34e+02    0 
0.00e+00 100
MatCUSPARSSolAnl       2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  9  0  0  0  0   9  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0

and CPU

MatSolve             352 1.0 1.3553e+02 1.0 1.02e+11 1.0 0.0e+00 0.0e+00 
0.0e+00 35 34  0  0  0  35 34  0  0  0  4489

the time spent is almost the same for this preconditioner.  Look like 
MatCUSPARSSolAnl is called only twice (since I am running on two cores)

mpirun -n 2 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511  
-dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi 
-ksp_monitor

So would it be fair to assume MatCUSPARSSolAnl  is not accounted for in 
MatSolve and it is an exclusive event?
KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 
100 %

I am getting so old. We have a different kind of log output if you are really 
concerned about inclusion. You can run with

  -log_view :foo.txt:ascii_flamegraph

and then there are tools for plotting that output, described here

  https://firedrakeproject.org/optimising.html

This output _guarantees_ strict inclusion, so you will not have the problems 
you have above adding things up.

  Thanks,

    Matt

Best,
Karthik.


From: Matthew Knepley <[email protected]<mailto:[email protected]>>
Date: Wednesday, 29 September 2021 at 16:29
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: Barry Smith <[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] %T (percent time in this phase)

On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:
Thank you!

Just to summarize

KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ 
100 %

You didn’t happen to mention how MatCUSPARSSolAnl is accounted for? Am I right 
in accounting for it as above?

I am not sure.I thought it might be the GPU part of MatSolve(). I will have to 
look in the code. I am not as familiar with the GPU part.

MatCUSPARSSolAnl       2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  9  0  0  0  0   9  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0

Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and VecAYPX 
are mutually exclusive?

Yes.

  Thanks,

     Matt

Best,

Karthik.

From: Matthew Knepley <[email protected]<mailto:[email protected]>>
Date: Wednesday, 29 September 2021 at 11:58
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: Barry Smith <[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] %T (percent time in this phase)

On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:
Thank you Mathew. Now, it is all making sense to me.

From data file ksp_ex45_N511_gpu_2.txt

KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%).

However, you said “So an iteration would mostly consist of MatMult + PCApply, 
with some vector work”

1) You do one solve, but 2 KSPSetUp()s. You must be running on more than one 
process and using Block-Jacobi . Half the time is spent in the solve (53%)


KSPSetUp               2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
2.4e+01  0  0  0  0  1   0  0  0  0  1     0       0      0 0.00e+00    0 
0.00e+00  0

KSPSolve               1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 
4.4e+03 53100100100 95  53100100100 96 10881   11730   1022 6.40e+03 1021 
8.17e-03 100

2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which is 
all setup of the individual blocks, and this is all used by the numerical ILU 
factorization.

PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 16 
0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 1.3 
1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 6.93e+03 0 
0.00e+00 0

MatLUFactorNum         1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0    63       0      2 6.93e+03    0 
0.00e+00  0

MatILUFactorSym        1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0

3) The preconditioner application takes 37% of the time, which is all solving 
the factors and recorded in MatSolve(). Matrix multiplication takes 4%.

PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 
37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100

MatSolve             341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 
0.0e+00 36 34  0  0  0  36 34  0  0  0  4536    4538      1 5.34e+02    0 
0.00e+00 100

MatMult              341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 
2.0e+00  4 34100100  0   4 34100100  0 54801   66441      2 5.86e+03    0 
0.00e+00 100


4) The significant vector time is all in norms (11%) since they are really slow 
on the GPU.


VecNorm              342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 
6.8e+02 11  5  0  0 15  11  5  0  0 15  1466   196884      0 0.00e+00  342 
2.74e-03 100

VecTDot              680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 
1.4e+03  1 10  0  0 29   1 10  0  0 29 106079   133922      0 0.00e+00  680 
5.44e-03 100

VecAXPY              681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 
0.0e+00  1 11  0  0  0   1 11  0  0  0 56728   58367    682 5.34e+02    0 
0.00e+00 100

VecAYPX              339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 
0.0e+00  1  5  0  0  0   1  5  0  0  0 34136   34153    339 2.71e-03    0 
0.00e+00 100


So the solve time is:

  53% ~ 37% + 4% + 11%

and the setup time is about 16%. I was wrong about the SetUp time being 
included, as it is outside the event:

  https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852

It looks like the remainder of the time (23%) is spent preallocating the matrix.

  Thanks,

     Matt

The MalMult event is 4 %. How does this event figure into the above equation; 
if preconditioning (MatMult + PCApply) is included in KSPSolve?

Best,
Karthik.

From: Matthew Knepley <[email protected]<mailto:[email protected]>>
Date: Wednesday, 29 September 2021 at 10:58
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: Barry Smith <[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] %T (percent time in this phase)

On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:
That was helpful. I would like to provide some additional details of my run on 
cpus and gpus. Please find the following attachments:


  1.  graph.pdf a plot showing overall time and various petsc events.
  2.  ksp_ex45_N511_cpu_6.txt data file of the log_summary
  3.  ksp_ex45_N511_gpu_2.txt data file of the log_summary

I used the following petsc options for cpu

mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 
-dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi -ksp_monitor

and for gpus

mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511  
-dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type bjacobi 
-ksp_monitor

to run the following problem

https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html

From the above code, I see is there no individual function called KSPSetUp(), 
so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, 
kSPSetComputeOperators all are timed together as KSPSetUp. For this example, is 
KSPSetUp time and KSPSolve time mutually exclusive?

No, KSPSetUp() will be contained in KSPSolve() if it is called automatically.

In your response you said that

   “PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on 
how much of the preconditioner construction can take place early, so depends 
exactly on the preconditioner used.”

I don’t see a explicit call to PCSetUp() or  PCApply() in ex45; so for this 
particular preconditioner (bjacobi) how can I tell how they are timed?

They are all inside KSPSolve(). If you have a preconditioned linear solve, the 
oreconditioning happens during the iteration. So an iteration would mostly
consist of MatMult + PCApply, with some vector work.

I am hoping to time KSP solving and preconditioning mutually exclusively.

I am not sure that concept makes sense here. See above.

  Thanks,

     Matt


Kind regards,
Karthik.


From: Barry Smith <[email protected]<mailto:[email protected]>>
Date: Tuesday, 28 September 2021 at 19:19
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] %T (percent time in this phase)



On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:

Thanks for Barry for your response.

I was just benchmarking the problem with various preconditioner on cpu and gpu. 
I understand, it is not possible to get mutually exclusive timing.
However, can you tell if KSPSolve time includes both PCSetup and PCApply? And 
if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and 
PCApply.

   If you do not call KSPSetUp() separately from KSPSolve() then its time is 
included with KSPSolve().

   PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on 
how much of the preconditioner construction can take place early, so depends 
exactly on the preconditioner used.

   So yes the answer is not totally satisfying. The one thing I would recommend 
is to not call KSPSetUp() directly and then KSPSolve() will always include the 
total time of the solve plus all setup time. PCApply will contain all the time 
to apply the preconditioner but may also include some setup time.

  Barry


Best,
Karthik.




From: Barry Smith <[email protected]<mailto:[email protected]>>
Date: Tuesday, 28 September 2021 at 16:56
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] %T (percent time in this phase)



On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI 
<[email protected]<mailto:[email protected]>>
 wrote:

Hello,

I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson 
problem. I noticed from the output from using the flag -log_summary that for 
various events their respective %T (percent time in this phase) do not add up 
to 100 but rather exceeds 100. So, I gather there is some overlap among these 
events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and 
PCSolve. Is it possible to get a mutually exclusive %T or Time for these 
individual events? I have attached  the log_summary output file from my run for 
your reference.


  For nested solvers it is tricky to get the times to be mutually exclusive 
because some parts of the building of the preconditioner is for some 
preconditioners delayed until the solve has started.

  It looks like you are using the default preconditioner options which for this 
example are taking more or less no time since so many iterations are needed. It 
is best to use -pc_type mg to use geometric multigrid on this problem.

  Barry



Thanks!
Karthik.

This email and any attachments are intended solely for the use of the named 
recipients. If you are not the intended recipient you must not use, disclose, 
copy or distribute this email or any of its attachments and should notify the 
sender immediately and delete this email from your system. UK Research and 
Innovation (UKRI) has taken every reasonable precaution to minimise risk of 
this email or any attachments containing viruses or malware but the recipient 
should carry out its own virus and malware checks before opening the 
attachments. UKRI does not accept any liability for any losses or damages which 
the recipient may sustain due to presence of any viruses.
<ksp_ex45_N511_cpu_6.txt>



--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/<http://www.cse.buffalo.edu/~knepley/>

<flamegrah.png><foo.txt>


Reply via email to