I am not sure I understand the logic, we print the ratio of max/min. I report max and look at the ratio to see if I might be catching some load imbalance or whatever. Is there a problem with that workflow?
I assume there is or you would not have done this, so can I add a method that I think also has legitimate values? I did observe a 15% hit on my whole stage with the GPU timers but I would still like to take my chances with: Landau Operator 29 1.0 1.1200e-02 1.0 4.54e+06 1.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 40 4 0 0 0 405 509 0 0.00e+00 0 0.00e+00 100 Landau Jacobian 15 1.0 7.8189e-03 1.0 4.54e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 28 4 0 0 0 580 731 0 0.00e+00 0 0.00e+00 100 Landau Mass 14 1.0 3.3759e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 12 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0 These methods require barriers at the end before they call MatSetValuesCOO, which is like < 1% of these total matrix times, so I trust the data w/o GPU times set. (The rest of the stage is the rest of the time step, which is mostly the linear solver, and I guess that is where the 15% comes from but I have not dug into it) Thanks, Mark On Wed, Apr 27, 2022 at 11:06 AM Barry Smith <bsm...@petsc.dev> wrote: > > Only KSPSolve, SNESSolve, TSSolve will have legitimate values. > "High-level" functions like PtAP can be asynchronous (meaning the GPU > returns to the CPU to do more stuff) before they are complete, hence their > timings would be incorrect and so they must be labeled with a special > marker and not have numbers for time. > > Jed reports a possible 10% smaller time for KSPSolve() in this mode in > some of his runs, compared to timing correctly the inner operations, and > thus feels it is the best default. > > Barry > > > On Apr 27, 2022, at 10:08 AM, Mark Adams <mfad...@lbl.gov> wrote: > > > > On Tue, Apr 26, 2022 at 8:00 PM Barry Smith <bsm...@petsc.dev> wrote: > >> >> The current nan output has to be replaced to get the column alignment >> correct, I just didn't feel like making that change also in the same MR. >> >> Something like Unknown or anything that fits in the column space would >> be fine. It just means for the given run the timing numbers are not >> meaningful/correct for those events. >> > > Just a note, just about every event is NAN for me. My GAMG setup that is > all CPU is NAN. High level functions like PtAP as well. > That said, adding -log_view_gpu_time is fine. Not worth the churn. > > >> This is to obtain the best meaningful results for the outer events per >> Jed since timing the inner events accurately introduces extra time in the >> outer events. That is it is not possible to have the best accurate times >> for both inner events and outer events in the same run. So if you want to >> compare KSPSolve timings, for example, you run as-is, it you want to >> examine, low-level vector operations run also with -log_view_gpu_time but >> know that the KSP times are higher than need be. >> >> Sorry for the confusion. >> >> >> >> >> On Apr 26, 2022, at 3:49 PM, Matthew Knepley <knep...@gmail.com> wrote: >> >> On Tue, Apr 26, 2022 at 12:03 PM Mark Adams <mfad...@lbl.gov> wrote: >> >>> Well, Nans are a clear sign that something is very wrong. >>> >> >> Barry chose them so that it could not be mistaken for an actual number. >> >> Matt >> >> >>> On Tue, Apr 26, 2022 at 11:52 AM Jacob Faibussowitsch < >>> jacob....@gmail.com> wrote: >>> >>>> There is an automatic warning that shows when you do run with >>>> `-log_view_gpu_time`, but perhaps there should also be an automatic warning >>>> when *not* running with it. It is unfortunate that NaN is the value printed >>>> as this implies a bug but AFAIK it is unavoidable (Barry can say more on >>>> this though). >>>> >>>> Best regards, >>>> >>>> Jacob Faibussowitsch >>>> (Jacob Fai - booss - oh - vitch) >>>> >>>> > On Apr 26, 2022, at 09:48, Jose E. Roman <jro...@dsic.upv.es> wrote: >>>> > >>>> > You have to add -log_view_gpu_time >>>> > See https://gitlab.com/petsc/petsc/-/merge_requests/5056 >>>> > >>>> > Jose >>>> > >>>> > >>>> >> El 26 abr 2022, a las 16:39, Mark Adams <mfad...@lbl.gov> escribió: >>>> >> >>>> >> I'm seeing this on Perlmutter with Kokkos-CUDA. Nans in most log >>>> timing data except the two 'Solve' lines. >>>> >> Just cg/jacobi on snes/ex56. >>>> >> >>>> >> Any ideas? >>>> >> >>>> >> VecTDot 2 1.0 nan nan 1.20e+01 1.0 0.0e+00 0.0e+00 >>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 >>>> 0.00e+00 100 >>>> >> VecNorm 2 1.0 nan nan 1.00e+01 1.0 0.0e+00 0.0e+00 >>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 >>>> 0.00e+00 100 >>>> >> VecCopy 2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 >>>> 0.00e+00 0 >>>> >> VecSet 5 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 >>>> 0.00e+00 0 >>>> >> VecAXPY 4 1.0 nan nan 2.40e+01 1.0 0.0e+00 0.0e+00 >>>> 0.0e+00 0 0 0 0 0 1 0 0 0 0 -nan -nan 0 0.00e+00 0 >>>> 0.00e+00 100 >>>> >> VecPointwiseMult 1 1.0 nan nan 3.00e+00 1.0 0.0e+00 0.0e+00 >>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 >>>> 0.00e+00 100 >>>> >> KSPSetUp 1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 >>>> 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 >>>> 0.00e+00 0 >>>> >> KSPSolve 1 1.0 4.0514e-04 1.0 5.50e+01 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 -nan 0 >>>> 0.00e+00 0 0.00e+00 100 >>>> >> SNESSolve 1 1.0 2.2128e-02 1.0 5.55e+05 1.0 0.0e+00 >>>> 0.0e+00 0.0e+00 72 56 0 0 0 100100 0 0 0 25 -nan 0 >>>> 0.00e+00 0 0.00e+00 0 >>>> > >>>> >>>> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> >> https://www.cse.buffalo.edu/~knepley/ >> <http://www.cse.buffalo.edu/~knepley/> >> >> >> >