On Wed, Sep 29, 2021 at 6:24 AM Karthikeyan Chockalingam - STFC UKRI < [email protected]> wrote:
> Thank you Mathew. Now, it is all making sense to me. > > > > From data file ksp_ex45_N511_gpu_2.txt > > > > KSPSolve (53%) + KSPSetup (0%) = PCSetup (16%) + PCApply (37%). > > > > However, you said “So an iteration would mostly consist of MatMult + > PCApply, with some vector work” > 1) You do one solve, but 2 KSPSetUp()s. You must be running on more than one process and using Block-Jacobi . Half the time is spent in the solve (53%) KSPSetUp 2 1.0 5.3149e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01 0 0 0 0 1 0 0 0 0 1 0 0 0 0.00e+00 0 0.00e+00 0 KSPSolve 1 1.0 1.5837e+02 1.1 8.63e+11 1.0 6.8e+02 2.1e+06 4.4e+03 53100100100 95 53100100100 96 10881 11730 1022 6.40e+03 1021 8.17e-03 100 2) The preconditioner look like BJacobi-ILU. The setup time is 16%, which is all setup of the individual blocks, and this is all used by the numerical ILU factorization. PCSetUp 2 1.0 4.9623e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 16 0 0 0 0 16 0 0 0 0 58 0 2 6.93e+03 0 0.00e+00 0 PCSetUpOnBlocks 1 1.0 4.9274e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15 0 0 0 0 15 0 0 0 0 59 0 2 6.93e+03 0 0.00e+00 0 MatLUFactorNum 1 1.0 4.6126e+01 1.3 1.45e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 0 0 0 0 14 0 0 0 0 63 0 2 6.93e+03 0 0.00e+00 0 MatILUFactorSym 1 1.0 2.5110e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 3) The preconditioner application takes 37% of the time, which is all solving the factors and recorded in MatSolve(). Matrix multiplication takes 4%. PCApply 341 1.0 1.3068e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 37 34 0 0 0 37 34 0 0 0 4516 4523 1 5.34e+02 0 0.00e+00 100 MatSolve 341 1.0 1.3009e+02 1.6 2.96e+11 1.0 0.0e+00 0.0e+00 0.0e+00 36 34 0 0 0 36 34 0 0 0 4536 4538 1 5.34e+02 0 0.00e+00 100 MatMult 341 1.0 1.0774e+01 1.1 2.96e+11 1.0 6.9e+02 2.1e+06 2.0e+00 4 34100100 0 4 34100100 0 54801 66441 2 5.86e+03 0 0.00e+00 100 4) The significant vector time is all in norms (11%) since they are really slow on the GPU. VecNorm 342 1.0 6.2261e+01129.9 4.57e+10 1.0 0.0e+00 0.0e+00 6.8e+02 11 5 0 0 15 11 5 0 0 15 1466 196884 0 0.00e+00 342 2.74e-03 100 VecTDot 680 1.0 1.7107e+00 1.3 9.09e+10 1.0 0.0e+00 0.0e+00 1.4e+03 1 10 0 0 29 1 10 0 0 29 106079 133922 0 0.00e+00 680 5.44e-03 100 VecAXPY 681 1.0 3.2036e+00 1.7 9.10e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 11 0 0 0 1 11 0 0 0 56728 58367 682 5.34e+02 0 0.00e+00 100 VecAYPX 339 1.0 2.6502e+00 1.8 4.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 1 5 0 0 0 1 5 0 0 0 34136 34153 339 2.71e-03 0 0.00e+00 100 So the solve time is: 53% ~ 37% + 4% + 11% and the setup time is about 16%. I was wrong about the SetUp time being included, as it is outside the event: https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/interface/itfunc.c#L852 It looks like the remainder of the time (23%) is spent preallocating the matrix. Thanks, Matt The MalMult event is 4 %. How does this event figure into the above > equation; if preconditioning (MatMult + PCApply) is included in KSPSolve? > > > > Best, > > Karthik. > > > > *From: *Matthew Knepley <[email protected]> > *Date: *Wednesday, 29 September 2021 at 10:58 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > [email protected]> > *Cc: *Barry Smith <[email protected]>, "[email protected]" < > [email protected]> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > On Wed, Sep 29, 2021 at 5:52 AM Karthikeyan Chockalingam - STFC UKRI < > [email protected]> wrote: > > That was helpful. I would like to provide some additional details of my > run on cpus and gpus. Please find the following attachments: > > > > 1. graph.pdf a plot showing overall time and various petsc events. > 2. ksp_ex45_N511_cpu_6.txt data file of the log_summary > 3. ksp_ex45_N511_gpu_2.txt data file of the log_summary > > > > I used the following petsc options for cpu > > > > mpirun -n 6 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaij -dm_vec_type mpi -ksp_type cg -pc_type bjacobi > -ksp_monitor > > > > and for gpus > > > > mpirun -n 1 ./ex45 -log_summary -da_grid_x 511 -da_grid_y 511 -da_grid_z > 511 -dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type cg -pc_type > bjacobi -ksp_monitor > > > > to run the following problem > > > > https://petsc.org/release/src/ksp/ksp/tutorials/ex45.c.html > > > > From the above code, I see is there no individual function called KSPSetUp(), > so I gather KSPSetDM, KSPSetComputeInitialGuess, KSPSetComputeRHS, > kSPSetComputeOperators all are timed together as KSPSetUp. For this > example, is KSPSetUp time and KSPSolve time mutually exclusive? > > > > No, KSPSetUp() will be contained in KSPSolve() if it is called > automatically. > > > > In your response you said that > > > > “PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it > depends on how much of the preconditioner construction can take place > early, so depends exactly on the preconditioner used.” > > > > I don’t see a explicit call to PCSetUp() or PCApply() in ex45; so for > this particular preconditioner (bjacobi) how can I tell how they are timed? > > > > They are all inside KSPSolve(). If you have a preconditioned linear solve, > the oreconditioning happens during the iteration. So an iteration would > mostly > > consist of MatMult + PCApply, with some vector work. > > > > I am hoping to time KSP solving and preconditioning mutually exclusively. > > > > I am not sure that concept makes sense here. See above. > > > > Thanks, > > > > Matt > > > > > > Kind regards, > > Karthik. > > > > > > *From: *Barry Smith <[email protected]> > *Date: *Tuesday, 28 September 2021 at 19:19 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > [email protected]> > *Cc: *"[email protected]" <[email protected]> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 12:11 PM, Karthikeyan Chockalingam - STFC UKRI < > [email protected]> wrote: > > > > Thanks for Barry for your response. > > > > I was just benchmarking the problem with various preconditioner on cpu and > gpu. I understand, it is not possible to get mutually exclusive timing. > > However, can you tell if KSPSolve time includes both PCSetup and PCApply? > And if KSPSolve and KSPSetup are mutually exclusive? Likewise for PCSetUp > and PCApply. > > > > If you do not call KSPSetUp() separately from KSPSolve() then its time > is included with KSPSolve(). > > > > PCSetUp() time may be in KSPSetUp() or it maybe in PCApply() it depends > on how much of the preconditioner construction can take place early, so > depends exactly on the preconditioner used. > > > > So yes the answer is not totally satisfying. The one thing I would > recommend is to not call KSPSetUp() directly and then KSPSolve() will > always include the total time of the solve plus all setup time. PCApply > will contain all the time to apply the preconditioner but may also include > some setup time. > > > > Barry > > > > > > Best, > > Karthik. > > > > > > > > > > *From: *Barry Smith <[email protected]> > *Date: *Tuesday, 28 September 2021 at 16:56 > *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" < > [email protected]> > *Cc: *"[email protected]" <[email protected]> > *Subject: *Re: [petsc-users] %T (percent time in this phase) > > > > > > > > On Sep 28, 2021, at 10:55 AM, Karthikeyan Chockalingam - STFC UKRI < > [email protected]> wrote: > > > > Hello, > > > > I ran ex45 in the KPS tutorial, which is a 3D finite-difference Poisson > problem. I noticed from the output from using the flag -log_summary that > for various events their respective %T (percent time in this phase) do not > add up to 100 but rather exceeds 100. So, I gather there is some overlap > among these events. I am primarily looking at the events KSPSetUp, > KSPSolve, PCSetUp and PCSolve. Is it possible to get a mutually exclusive > %T or Time for these individual events? I have attached the log_summary > output file from my run for your reference. > > > > > > For nested solvers it is tricky to get the times to be mutually > exclusive because some parts of the building of the preconditioner is for > some preconditioners delayed until the solve has started. > > > > It looks like you are using the default preconditioner options which for > this example are taking more or less no time since so many iterations are > needed. It is best to use -pc_type mg to use geometric multigrid on this > problem. > > > > Barry > > > > > > > > Thanks! > > Karthik. > > > > This email and any attachments are intended solely for the use of the > named recipients. If you are not the intended recipient you must not use, > disclose, copy or distribute this email or any of its attachments and > should notify the sender immediately and delete this email from your > system. UK Research and Innovation (UKRI) has taken every reasonable > precaution to minimise risk of this email or any attachments containing > viruses or malware but the recipient should carry out its own virus and > malware checks before opening the attachments. UKRI does not accept any > liability for any losses or damages which the recipient may sustain due to > presence of any viruses. > > <ksp_ex45_N511_cpu_6.txt> > > > > > > > -- > > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > > > https://www.cse.buffalo.edu/~knepley/ > <http://www.cse.buffalo.edu/~knepley/> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
