On Sat, Jan 22, 2022 at 6:22 PM Barry Smith <bsm...@petsc.dev> wrote:
>
> I cleaned up Mark's last run and put it in a fixed-width font. I
> realize this may be too difficult but it would be great to have identical
> runs to compare with on Summit.
>

I was planning to run this on Perlmutter today, along with some sanity checks, such as confirming that all GPUs are being used. I'll try PetscDeviceView.

Junchao modified the timers and all GPU > CPU now, but he seemed to move the timers further outside, and Barry wants them tight on the "kernel". I think Junchao is going to work on that, so I will hold off.

(I removed the Kokkos wait stuff and it seemed to run a little faster, but I am not sure how deterministic the timers are. I also did a test with GAMG and it was fine.)

>
> As Jed noted Scatter takes a long time but the pack and unpack take no
> time? Is this not timed if using Kokkos?
>
>
> --- Event Stage 2: KSP Solve only
>
> MatMult           400 1.0 8.8003e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00 2 55 61 54 0 70 91100100 95,058 132,242 0 0.00e+00 0 0.00e+00 100
> VecScatterBegin   400 1.0 1.3391e+00 2.6 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00 0 0 61 54 0 7 0100100 0 0 0 0.00e+00 0 0.00e+00 0
> VecScatterEnd     400 1.0 1.3240e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 9 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> SFPack            400 1.0 1.8276e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
> SFUnpack          400 1.0 6.2653e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0 0.00e+00 0 0.00e+00 0
>
> KSPSolve            2 1.0 1.2540e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03 3 60 61 54 60 100100100 73,592 116,796 0 0.00e+00 0 0.00e+00 100
> VecTDot           802 1.0 1.3551e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02 0 2 0 0 40 10 3 0 19,627 52,599 0 0.00e+00 0 0.00e+00 100
> VecNorm           402 1.0 9.0151e-01 2.2 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02 0 1 0 0 20 5 1 0 0 14,788 125,477 0 0.00e+00 0 0.00e+00 100
> VecAXPY           800 1.0 8.2617e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 7 3 0 0 32,112 61,644 0 0.00e+00 0 0.00e+00 100
> VecAYPX           398 1.0 8.1525e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 5 1 0 0 16,190 20,689 0 0.00e+00 0 0.00e+00 100
> VecPointwiseMult  402 1.0 3.5694e-01 1.0 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 1 0 0 18,675 38,633 0 0.00e+00 0 0.00e+00 100
>
>
> On Jan 22, 2022, at 12:40 PM, Mark Adams <mfad...@lbl.gov> wrote:
>
> And I have a new MR with if you want to see what I've done so far.
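
To put a number on the scatter versus pack/unpack question, here is a rough Python sketch that uses only the times from the "KSP Solve only" stage quoted above. The `summarize` helper and its key names are made up for illustration. The times are maxima over ranks, and summing maxima that may come from different ranks is only an approximation, so treat the ratios as indicative rather than exact.

```python
# Max-over-ranks times in seconds, copied from the quoted
# "Event Stage 2: KSP Solve only" rows.
times = {
    "MatMult": 8.8003e+00,
    "VecScatterBegin": 1.3391e+00,
    "VecScatterEnd": 1.3240e+00,
    "SFPack": 1.8276e-03,
    "SFUnpack": 6.2653e-05,
}


def summarize(t):
    """Compare scatter time against pack/unpack time (hypothetical helper)."""
    # Assumption: adding per-event maxima is a reasonable upper-bound estimate.
    scatter = t["VecScatterBegin"] + t["VecScatterEnd"]
    pack = t["SFPack"] + t["SFUnpack"]
    return {
        "scatter_s": scatter,
        "pack_unpack_s": pack,
        "scatter_over_pack": scatter / pack,
        "scatter_frac_of_matmult": scatter / t["MatMult"],
    }


summary = summarize(times)
for key, value in summary.items():
    print(f"{key:26s} {value:.4g}")
```

With these numbers the scatter begin/end pair accounts for a substantial fraction of the MatMult time, while pack plus unpack is smaller by roughly three orders of magnitude. That fits the suspicion that the pack/unpack kernels are launched asynchronously and not really timed, but the arithmetic alone does not prove it.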