Not clear how to interpret: the "GPU" flop rates for the dot and norm are a good 
amount higher with GPU-aware MPI (the exact placement of the logging functions 
can affect this), but their overall flop rates are not much better. The scatter 
is better without GPU MPI. How much of this is noise? I need to see statistics 
from multiple runs. Certainly not satisfying.
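
To get those statistics, a throwaway script along these lines might do. This is 
a minimal sketch, not anything in PETSc: it assumes each run's -log_view output 
is saved to its own file (the file names and event list are mine), and it only 
trusts the first few whitespace-separated fields of each event line, since the 
percentage columns can run together (e.g. "91100100").

    import statistics
    import sys

    EVENTS = ("MatMult", "KSPSolve", "VecTDot", "VecNorm",
              "VecScatterBegin", "VecScatterEnd")

    def event_times(path):
        """Return {event: max time in seconds} from one -log_view file."""
        times = {}
        with open(path) as f:
            for line in f:
                parts = line.split()
                # Event lines look like: Name Count Ratio Time Ratio ...
                if parts and parts[0] in EVENTS and len(parts) > 3:
                    times[parts[0]] = float(parts[3])
        return times

    runs = [event_times(p) for p in sys.argv[1:]]  # e.g. run1.log run2.log ...
    for ev in EVENTS:
        ts = [r[ev] for r in runs if ev in r]
        if len(ts) >= 2:
            print(f"{ev:16s} mean {statistics.mean(ts):.3e}  "
                  f"stdev {statistics.stdev(ts):.3e}  n={len(ts)}")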

GPU MPI

MatMult              400 1.0 8.4784e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0  68 91100100  0 98667  139198      0 0.00e+00    0 0.00e+00 100
KSPSolve               2 1.0 1.2222e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  3 60 61 54 60 100100100100100 75509  122610      0 0.00e+00    0 0.00e+00 100
VecTDot              802 1.0 1.3863e+00 1.3 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  10  3  0  0 67 19186   48762      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.2933e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   6  1  0  0 33 14345  127332      0 0.00e+00    0 0.00e+00 100
VecAXPY              800 1.0 8.2405e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   7  3  0  0  0 32195   62486      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 8.6891e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   6  1  0  0  0 15190   19019      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.5227e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 18922   39878      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.1519e+00 2.1 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0   7  0100100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        400 1.0 1.5642e+00 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  10  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
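
As a sanity check on the vector kernel rates, one can back out an effective 
memory bandwidth from the VecAXPY line above. A rough sketch: the 12 bytes/flop 
figure is my assumption (double precision, y = a*x + y, so 2 flops against 
load x, load y, store y per entry), and it reads the logged Mflop/s as totals 
over all ranks.

    # VecAXPY: 2 flops and 3 eight-byte accesses per entry (assumption).
    bytes_per_flop = 3 * 8 / 2
    for label, mflops in [("total", 32195), ("GPU", 62486)]:
        print(f"VecAXPY {label}: {mflops * 1e6 * bytes_per_flop / 1e9:.0f} GB/s")
    # -> roughly 386 (total) and 750 (GPU) GB/s aggregate.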

No GPU MPI

MatMult              400 1.0 8.1754e+00 1.0 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0  65 91100100  0 102324  133771    800 4.74e+02  800 4.74e+02 100
KSPSolve               2 1.0 1.2605e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  2 60 61 54 60 100100100100100 73214  113908    800 4.74e+02  800 4.74e+02 100
VecTDot              802 1.0 2.0607e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  15  3  0  0 67 12907   25655      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.5100e-01 2.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   6  1  0  0 33 14018   96704      0 0.00e+00    0 0.00e+00 100
VecAXPY              800 1.0 7.9864e-01 1.1 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   6  3  0  0  0 33219   65843      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 8.0719e-01 1.7 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0  0 16352   21253      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.7318e-01 1.1 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 17862   38464      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.4075e+00 1.8 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0   9  0100100  0     0       0      0 0.00e+00  800 4.74e+02  0
VecScatterEnd        400 1.0 6.3044e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  0  0  0  0     0       0    800 4.74e+02    0 0.00e+00  0
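
To make the with/without comparison concrete, here is the same arithmetic I am 
doing by eye, with the max times copied straight from the two tables (a 
throwaway sketch, nothing more):

    # Seconds, from the logs above; "gpu" = GPU-aware MPI run.
    gpu   = {"VecTDot": 1.3863, "VecNorm": 0.92933, "VecScatterBegin": 1.1519,
             "VecScatterEnd": 1.5642, "KSPSolve": 12.222}
    nogpu = {"VecTDot": 2.0607, "VecNorm": 0.95100, "VecScatterBegin": 1.4075,
             "VecScatterEnd": 0.63044, "KSPSolve": 12.605}
    for ev in gpu:
        print(f"{ev:16s} no-GPU-MPI / GPU-MPI time: {nogpu[ev] / gpu[ev]:.2f}")
    # VecScatterEnd is ~2.5x faster without GPU-aware MPI, VecTDot ~1.5x
    # slower, and KSPSolve overall is within ~3% -- hence the noise question.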


> On Jan 24, 2022, at 10:25 AM, Mark Adams <mfad...@lbl.gov> wrote:
> 
> 
>   Mark,
> 
>      Can you run both with GPU aware MPI?
> 
> 
> Perlmutter fails with GPU-aware MPI. I think there are known problems with this 
> that are being worked on.
> 
> And here is Crusher with GPU-aware MPI.
> 
> <jac_out_001_kokkos_Crusher_6_1_notpl.txt>
