I am not sure how to interpret this. The "GPU" flop rates for the dot and norm are a good amount higher (the exact placement of the logging calls can affect this), but their overall flop rates are not much better. The scatter is better without GPU-aware MPI. How much of this is noise? We need statistics from multiple runs. Certainly not satisfying.
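As a rough illustration (plain Python; the Mflop/s numbers are copied from the two -log_view tables below, and the interpretation is my reading, not anything PETSc reports directly): the "GPU" rate counts only time inside the device kernels, while the total rate includes kernel-launch latency, synchronization, and MPI, so the ratio of the two is a crude measure of how latency-bound each event is.

```python
# Kernel-only ("GPU") vs. end-to-end Mflop/s, taken from the -log_view
# output quoted below. Columns: (total, GPU) with GPU-aware MPI,
# then (total, GPU) without it.
events = {
    "MatMult": (98667, 139198, 102324, 133771),
    "VecTDot": (19186, 48762, 12907, 25655),
    "VecNorm": (14345, 127332, 14018, 96704),
}

for name, (tot_aware, dev_aware, tot_plain, dev_plain) in events.items():
    # A large kernel/total ratio means the event's wall time is dominated
    # by overhead (launch latency, reductions, sync), not the kernel itself.
    print(f"{name:8s} GPU-aware MPI kernel/total: {dev_aware / tot_aware:.1f}x, "
          f"plain MPI: {dev_plain / tot_plain:.1f}x")
```

For MatMult the two rates are close (the kernel dominates), while for VecTDot and especially VecNorm the kernel-only rate is several times the overall rate, consistent with those small reductions being latency- and synchronization-bound rather than compute-bound.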
GPU-aware MPI:

Event                Count      Time (sec)     Flop                              --- Global ---   --- Stage ----   Total    GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R   %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
MatMult              400 1.0 8.4784e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0   68 91 100 100   0  98667  139198      0 0.00e+00    0 0.00e+00 100
KSPSolve               2 1.0 1.2222e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  3 60 61 54 60 100 100 100 100 100  75509  122610      0 0.00e+00    0 0.00e+00 100
VecTDot              802 1.0 1.3863e+00 1.3 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40   10  3   0   0  67  19186   48762      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.2933e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20    6  1   0   0  33  14345  127332      0 0.00e+00    0 0.00e+00 100
VecAXPY              800 1.0 8.2405e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0    7  3   0   0   0  32195   62486      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 8.6891e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0    6  1   0   0   0  15190   19019      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.5227e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0    3  1   0   0   0  18922   39878      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.1519e+00 2.1 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0    7  0 100 100   0      0       0      0 0.00e+00    0 0.00e+00   0
VecScatterEnd        400 1.0 1.5642e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   10  0   0   0   0      0       0      0 0.00e+00    0 0.00e+00   0

Without GPU-aware MPI:

MatMult              400 1.0 8.1754e+00 1.0 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0   65 91 100 100   0 102324  133771    800 4.74e+02  800 4.74e+02 100
KSPSolve               2 1.0 1.2605e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  2 60 61 54 60 100 100 100 100 100  73214  113908    800 4.74e+02  800 4.74e+02 100
VecTDot              802 1.0 2.0607e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40   15  3   0   0  67  12907   25655      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.5100e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20    6  1   0   0  33  14018   96704      0 0.00e+00    0 0.00e+00 100
VecAXPY              800 1.0 7.9864e-01 1.1 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0    6  3   0   0   0  33219   65843      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 8.0719e-01 1.7 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0    5  1   0   0   0  16352   21253      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.7318e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0    3  1   0   0   0  17862   38464      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.4075e+00 1.8 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0    9  0 100 100   0      0       0      0 0.00e+00  800 4.74e+02   0
VecScatterEnd        400 1.0 6.3044e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0    5  0   0   0   0      0       0    800 4.74e+02    0 0.00e+00   0

> On Jan 24, 2022, at 10:25 AM, Mark Adams <mfad...@lbl.gov> wrote:
>
> > Mark,
> >
> > Can you run both with GPU aware MPI?
>
> Perlmutter fails with GPU-aware MPI. I think there are known problems with this that are being worked on.
>
> And here is Crusher with GPU-aware MPI.
>
> <jac_out_001_kokkos_Crusher_6_1_notpl.txt>