That max_time is the maximum over all processes from PetscInitialize() to PetscFinalize(), it is not a good number to use to compute the percentages since it also includes the time to get all the MPI ranks up and running. This is why I recommend using a PetscStage and the percentages it reports for time, these will reflect exactly the times relevant for your computations.
Barry > On Oct 1, 2021, at 12:52 PM, Karthikeyan Chockalingam - STFC UKRI > <[email protected]> wrote: > > Hi Barry, > > Thanks for your comment. > > I took max time as 2.868e+02 sec from the below table, as the total time > taken to solve the entire problem. Was I correct in my assumption? > Using this max time, I manually tried to calculate the individual event > percentage to see if it matched up to T%. > > Max Max/Min Avg Total > Time (sec): 2.868e+02 1.000 2.868e+02 > Objects: 3.800e+01 1.000 3.800e+01 > Flop: 8.659e+11 1.004 8.642e+11 1.728e+12 > Flop/sec: 3.019e+09 1.004 3.013e+09 6.026e+09 > Memory: 1.764e+10 1.004 1.760e+10 3.521e+10 > MPI Messages: 3.430e+02 1.000 3.430e+02 6.860e+02 > MPI Message Lengths: 7.134e+08 1.000 2.080e+06 1.427e+09 > MPI Reductions: 4.637e+03 1.000 > > Best, > Karthik. > > From: Barry Smith <[email protected] <mailto:[email protected]>> > Date: Friday, 1 October 2021 at 16:03 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > <[email protected] > <mailto:[email protected]>> > Cc: "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: [petsc-users] (percent time in this phase) > > > What is "max time"? It is best to gather timings with a stage > PetscLogStagePush() to get a separate subtable for exactly the part of the > code you want timing for. For example if you are studying linear solver time > you want only the solver part of the code in the stage, not the time to build > the matrix and right hand side. > > It is very difficult to get really correct publishable reliable performance > numbers when comparing solvers with similar timings on parallel machines and > especially GPUs. Values can be very dependent on particular compilers used, > the specific hardware used, generation of memory used etc. > > Barry > > > > On Oct 1, 2021, at 8:51 AM, Karthikeyan Chockalingam - STFC UKRI > <[email protected] > <mailto:[email protected]>> wrote: > > Hi Barry, > > Yes, looks like it is computationally faster using GPUs. I used block jacobi > as the preconditioner. > I have attached the output data for cpu and gpu using -ksp_view. I am not > sure; what information I should be looking at using -ksp_view? > > I have an outstanding question, > > event time > T% > cal = (event/max)*100 > max time > 2.87E+02 > KSPSolve > 1.58E+02 > 53 > 55.2 > MatMult > 1.08E+01 > 4 > 3.76 > PCApply > 1.31E+02 > 37 > 45.6 > VecNorm > 6.23E+01 > 11 > 21.7 > > Matt couple of days back helped breakdown KSPSolve (53 %) ~ PCApply (37%) + > VecNorm (11%) + MatMul (4%) > > However, when I calculate T% manually using max time, the numbers for PCApply > and VecNorm are way off as you can see from the above table. > As a result, the cumulative sum of event time don’t match up to KSPSolve. Can > you please let me know what I might be doing wrong? > > I will be performing extensive benchmarking of various preconditioners and > comparing their performance on cpus and gpus, so this information is critical. > > Many thanks! > Karthik. > > From: Barry Smith <[email protected] <mailto:[email protected]>> > Date: Thursday, 30 September 2021 at 15:47 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > <[email protected] > <mailto:[email protected]>> > Cc: "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: [petsc-users] (percent time in this phase) > > > The MatSolve is no better on the GPUs then on the CPU; while other parts of > the computation seem to speed up nicely. What is the result of -ksp_view ? > Are you using ILU(0) as the preconditioner, this will not solve well on the > GPU, its solve is essentially sequential. You won't want to use ILU(0) in > this way on GPUs. > > Barry > > > > > On Sep 30, 2021, at 9:41 AM, Karthikeyan Chockalingam - STFC UKRI > <[email protected] > <mailto:[email protected]>> wrote: > > Based on your feedback from yesterday. I was trying to breakdown KSPSolve. > Please find the attached bar plot. The numbers are not adding up at least for > GPUs. > Your feedback from yesterday were based on T%. > I plotted the time spend on each event, hoping that the cumulative sum would > add up to KSPSolve time. > > Kind regards, > Karthik. > > From: Matthew Knepley <[email protected] <mailto:[email protected]>> > Date: Thursday, 30 September 2021 at 13:52 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > <[email protected] > <mailto:[email protected]>> > Cc: Barry Smith <[email protected] <mailto:[email protected]>>, > "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: [petsc-users] (percent time in this phase) > > On Thu, Sep 30, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI > <[email protected] > <mailto:[email protected]>> wrote: > When comparing the MatSolve data for > > GPU > > MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 > 0.00e+00 100 > MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > > and CPU > > MatSolve 352 1.0 1.3553e+02 1.0 1.02e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 35 34 0 0 0 35 34 0 0 0 4489 > > the time spent is almost the same for this preconditioner. Look like > MatCUSPARSSolAnl is called only twice (since I am running on two cores) > > mpirun -n 2 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 > -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type > bjacobi -ksp_monitor > > So would it be fair to assume MatCUSPARSSolAnl is not accounted for in > MatSolve and it is an exclusive event? > > Looks like that. > > Thanks > > Matt > > KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ > 100 % > > Best, > Karthik. > > > From: Matthew Knepley <[email protected] <mailto:[email protected]>> > Date: Wednesday, 29 September 2021 at 16:29 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > <[email protected] > <mailto:[email protected]>> > Cc: Barry Smith <[email protected] <mailto:[email protected]>>, > "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: [petsc-users] %T (percent time in this phase) > > On Wed, Sep 29, 2021 at 10:18 AM Karthikeyan Chockalingam - STFC UKRI > <[email protected] > <mailto:[email protected]>> wrote: > Thank you! > > Just to summarize > > KSPSolve (53%) + PCSetup (16%) + DMCreateMat (23%) + MatCUSPARSSolAnl (9%) ~ > 100 % > > You didn’t happen to mention how MatCUSPARSSolAnl is accounted for? Am I > right in accounting for it as above? > > I am not sure.I thought it might be the GPU part of MatSolve(). I will have > to look in the code. I am not as familiar with the GPU part. > > MatCUSPARSSolAnl 2 1.0 3.2338e+01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 9 0 0 0 0 9 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > > Finally, I believe the vector events, VecNorn, VecTDot, VecAXPY, and VecAYPX > are mutually exclusive? > > Yes. > > Thanks, > > Matt > > Best, > > Karthik. > > From: Matthew Knepley <[email protected] <mailto:[email protected]>> > Date: Wednesday, 29 September 2021 at 11:58 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > <[email protected] > <mailto:[email protected]>> > Cc: Barry Smith <[email protected] <mailto:[email protected]>>, > "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: [petsc-users] %T (percent time in this phase) > > On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI > <[email protected] > <mailto:[email protected]>> wrote: > Thank you Mathew. Now, it is all making sense to me. > > From data file ksp_ex45_N511_gpu_2.txt > > KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). > > However, you said “So an iteration would mostly consist of MatMult + PCApply, > with some vector work” > > 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than one > process and using Block-Jacobi . Half the time is spent in the solve (53%) > > KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 > 0.00e+00 0 > KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 > 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 > 8.17e-03 100 > > > 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which is > all setup of the individual blocks, and this is all used by the numerical ILU > factorization. > > PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 > 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 1.3 > 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 6.93e+03 0 > 0.00e+00 0 > MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 > 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 > 0.00e+00 0 > MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 > 0.00e+00 0 > > 3) The preconditioner application takes 37% of the time, which is all solving > the factors and recorded in MatSolve(). Matrix multiplication takes 4%. > > PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 > 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 > MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 > 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 > 0.00e+00 100 > MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 > 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 > 0.00e+00 100 > > 4) The significant vector time is all in norms (11%) since they are really > slow on the GPU. > > > VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 > 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 > 2.74e-03 100 > VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 > 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 > 5.44e-03 100 > VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 > 0.00e+00 100 > VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 > 0.00e+00 100 > > So the solve time is: > > 53% ~ 37% + 4% + 11% > > and the setup time is about 16%. I was wrong about the SetUp time being > included, as it is outside the event: > > > https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 > > <https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852> > > It looks like the remainder of the time (23%) is spent preallocating the > matrix. > > Thanks, > > Matt > > The MalMult event is 4 %. How does this event figure into the above equation; > if preconditioning (MatMult + PCApply) is included in KSPSolve? > > Best, > Karthik. > > From: Matthew Knepley <[email protected] <mailto:[email protected]>> > Date: Wednesday, 29 September 2021 at 10:58 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > <[email protected] > <mailto:[email protected]>> > Cc: Barry Smith <[email protected] <mailto:[email protected]>>, > "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: [petsc-users] %T (percent time in this phase) > > On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI > <[email protected] > <mailto:[email protected]>> wrote: > That was helpful. I would like to provide some additional details of my run > on cpus and gpus. Please find the following attachments: > > graph.pdf a plot showing overall time and various petsc events. > ksp_ex45_N511_cpu_6.txt data file of the log_summary > ksp_ex45_N511_gpu_2.txt data file of the log_summary > > I used the following petsc options for cpu > > mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 > -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi > -ksp_monitor > > and for gpus > > mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z 511 > -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type > bjacobi -ksp_monitor > > to run the following problem > > https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html > <https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html> > > From the above code, I see is there no individual function called KSPSetUp(), > so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, > kSPSetComputeOperators all are timed together as KSPSetUp. For this example, > is KSPSetUp time and KSPSolve time mutually exclusive? > > No, KSPSetUp() will be contained in KSPSolve() if it is called automatically. > > In your response you said that > > “PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends > on how much of the preconditioner construction can take place early, so > depends exactly on the preconditioner used.” > > I don’t see a explicit call to PCSetUp() or PCApply() in ex45; so for this > particular preconditioner (bjacobi) how can I tell how they are timed? > > They are all inside KSPSolve(). If you have a preconditioned linear solve, > the oreconditioning happens during the iteration. So an iteration would mostly > consist of MatMult + PCApply, with some vector work. > > I am hoping to time KSP solving and preconditioning mutually exclusively. > > I am not sure that concept makes sense here. See above. > > Thanks, > > Matt > > > Kind regards, > Karthik. > > > From: Barry Smith <[email protected] <mailto:[email protected]>> > Date: Tuesday, 28 September 2021 at 19:19 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > <[email protected] > <mailto:[email protected]>> > Cc: "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: [petsc-users] %T (percent time in this phase) > > > > > On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI > <[email protected] > <mailto:[email protected]>> wrote: > > Thanks for Barry for your response. > > I was just benchmarking the problem with various preconditioner on cpu and > gpu. I understand, it is not possible to get mutually exclusive timing. > However, can you tell if KSPSolve time includes both PCSetup and PCApply? And > if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp and > PCApply. > > If you do not call KSPSetUp() separately from KSPSolve() then its time is > included with KSPSolve(). > > PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends on > how much of the preconditioner construction can take place early, so depends > exactly on the preconditioner used. > > So yes the answer is not totally satisfying. The one thing I would > recommend is to not call KSPSetUp() directly and then KSPSolve() will always > include the total time of the solve plus all setup time. PCApply will contain > all the time to apply the preconditioner but may also include some setup time. > > Barry > > > Best, > Karthik. > > > > > From: Barry Smith <[email protected] <mailto:[email protected]>> > Date: Tuesday, 28 September 2021 at 16:56 > To: "Chockalingam, Karthikeyan (STFC,DL,HC)" > <[email protected] > <mailto:[email protected]>> > Cc: "[email protected] <mailto:[email protected]>" > <[email protected] <mailto:[email protected]>> > Subject: Re: [petsc-users] %T (percent time in this phase) > > > > > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI > <[email protected] > <mailto:[email protected]>> wrote: > > Hello, > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson > problem. I noticed from the output from using the flag -log_summary that for > various events their respective %T (percent time in this phase) do not add up > to 100 but rather exceeds 100. So, I gather there is some overlap among these > events. I am primarily looking at the events KSPSetUp, KSPSolve, PCSetUp and > PCSolve. Is it possible to get a mutually exclusive %T or Time for these > individual events? I have attached the log_summary output file from my run > for your reference. > > > For nested solvers it is tricky to get the times to be mutually exclusive > because some parts of the building of the preconditioner is for some > preconditioners delayed until the solve has started. > > It looks like you are using the default preconditioner options which for > this example are taking more or less no time since so many iterations are > needed. It is best to use -pc_type mg to use geometric multigrid on this > problem. > > Barry > > > > > Thanks! > Karthik. > > This email and any attachments are intended solely for the use of the named > recipients. If you are not the intended recipient you must not use, disclose, > copy or distribute this email or any of its attachments and should notify the > sender immediately and delete this email from your system. UK Research and > Innovation (UKRI) has taken every reasonable precaution to minimise risk of > this email or any attachments containing viruses or malware but the recipient > should carry out its own virus and malware checks before opening the > attachments. UKRI does not accept any liability for any losses or damages > which the recipient may sustain due to presence of any viruses. > <ksp_ex45_N511_cpu_6.txt> > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/> > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/> > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/> > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/> > <KSPSolve.pdf> > > <ksp_ex45_N511_cpu_6.txt><ksp_ex45_N511_gpu_2.txt>
