Hi, Philip,

It looks like the performance of MatPtAP is pretty bad. There are a lot of issues with PtAP that I am going to address.
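[Editor's note: for context, below is a minimal, hypothetical micro-benchmark of the numeric PtAP phase; A and P stand in for one GAMG level's operator and prolongator and are not objects from Xolotl or this log. It mirrors the pattern in the profile quoted just below, where MatPtAPSymbolic runs only a few times while MatPtAPNumeric runs 181 times and accounts for 56% of the run, i.e., the symbolic product is reused and only the numeric product is repeated.]

    /* Hypothetical micro-benchmark for the numeric PtAP phase. A and P are
       placeholders, not objects from this thread. */
    #include <petscmat.h>
    #include <petsctime.h>

    static PetscErrorCode TimePtAPNumeric(Mat A, Mat P, PetscInt nrepeat)
    {
      Mat            C;
      PetscLogDouble t0, t1;

      PetscFunctionBeginUser;
      /* Symbolic phase + first numeric product (logged as MatPtAPSymbolic / MatPtAPNumeric) */
      PetscCall(MatPtAP(A, P, MAT_INITIAL_MATRIX, PETSC_DETERMINE, &C));
      PetscCall(PetscTime(&t0));
      for (PetscInt i = 0; i < nrepeat; i++) {
        /* Reuses the symbolic data; only the numeric product is recomputed,
           as in the log where MatPtAPNumeric is called 181 times */
        PetscCall(MatPtAP(A, P, MAT_REUSE_MATRIX, PETSC_DETERMINE, &C));
      }
      PetscCall(PetscTime(&t1));
      PetscCall(PetscPrintf(PETSC_COMM_WORLD, "PtAP numeric: %g s per call\n", (double)((t1 - t0) / nrepeat)));
      PetscCall(MatDestroy(&C));
      PetscFunctionReturn(0);
    }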
MatPtAPNumeric  181 1.0  nan nan  0.00e+00 0.0  3.3e+03 1.8e+04 0.0e+00  56 0 4 21 0  56 0 4 21 0  -nan -nan  0 0.00e+00 0 0.00e+00 0

Thanks.
--Junchao Zhang

On Fri, Jan 20, 2023 at 10:55 AM Fackler, Philip via petsc-users <[email protected]> wrote:

> The following is the log_view output for the ported case using 4 MPI tasks.
>
> ****************************************************************************************************************************************************************
> ***                       WIDEN YOUR WINDOW TO 160 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                       ***
> ****************************************************************************************************************************************************************
>
> ------------------------------------------------------------------ PETSc Performance Summary: ------------------------------------------------------------------
>
> Unknown Name on a named iguazu with 4 processors, by 4pf Fri Jan 20 11:53:04 2023
> Using Petsc Release Version 3.18.3, unknown
>
>                          Max       Max/Min   Avg        Total
> Time (sec):           1.447e+01   1.000     1.447e+01
> Objects:              1.229e+03   1.003     1.226e+03
> Flops:                5.053e+09   1.217     4.593e+09  1.837e+10
> Flops/sec:            3.492e+08   1.217     3.174e+08  1.269e+09
> MPI Msg Count:        1.977e+04   1.067     1.895e+04  7.580e+04
> MPI Msg Len (bytes):  7.374e+07   1.088     3.727e+03  2.825e+08
> MPI Reductions:       2.065e+03   1.000
>
> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>                           and VecAXPY() for complex vectors of length N --> 8N flops
>
> Summary of Stages:   ----- Time ------  ----- Flop ------   --- Messages ---   -- Message Lengths --   -- Reductions --
>                        Avg     %Total     Avg     %Total     Count   %Total     Avg        %Total       Count   %Total
>  0:  Main Stage:   1.4471e+01 100.0%   1.8371e+10 100.0%   7.580e+04 100.0%   3.727e+03     100.0%    2.046e+03  99.1%
>
> ------------------------------------------------------------------------------------------------------------------------
> See the 'Profiling' chapter of the users' manual for details on interpreting output.
> Phase summary info:
>    Count: number of times phase was executed
>    Time and Flop: Max - maximum over all processors
>                   Ratio - ratio of maximum to minimum over all processors
>    Mess: number of messages sent
>    AvgLen: average message length (bytes)
>    Reduct: number of global reductions
>    Global: entire computation
>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>       %T - percent time in this phase          %F - percent flop in this phase
>       %M - percent messages in this phase      %L - percent message lengths in this phase
>       %R - percent reductions in this phase
>    Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
>    GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
>    CpuToGpu Count: total number of CPU to GPU copies per processor
>    CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
>    GpuToCpu Count: total number of GPU to CPU copies per processor
>    GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
>    GPU %F: percent flops on GPU in this event
> ------------------------------------------------------------------------------------------------------------------------
> Event            Count   Time (sec)   Flop        --- Global ---  --- Stage ----  Total   GPU   - CpuToGpu -  - GpuToCpu -  GPU
>                  Max Ratio  Max Ratio  Max Ratio  Mess AvgLen Reduct  %T %F %M %L %R  %T %F %M %L %R  Mflop/s Mflop/s  Count Size  Count Size  %F
> ------------------------------------------------------------------------------------------------------------------------
>
> --- Event Stage 0: Main Stage
>
> BuildTwoSided      257 1.0  nan nan  0.00e+00 0.0  4.4e+02 8.0e+00 2.6e+02   1  0  1  0 12   1  0  1  0 13  -nan -nan  0 0.00e+00  0 0.00e+00   0
> BuildTwoSidedF     210 1.0  nan nan  0.00e+00 0.0  1.5e+02 4.2e+04 2.1e+02   1  0  0  2 10   1  0  0  2 10  -nan -nan  0 0.00e+00  0 0.00e+00   0
> DMCreateMat          1 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 7.0e+00  10  0  0  0  0  10  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFSetGraph          69 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFSetUp             47 1.0  nan nan  0.00e+00 0.0  7.3e+02 2.1e+03 4.7e+01   0  0  1  1  2   0  0  1  1  2  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFBcastBegin       222 1.0  nan nan  0.00e+00 0.0  2.3e+03 1.9e+04 0.0e+00   0  0  3 16  0   0  0  3 16  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFBcastEnd         222 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   3  0  0  0  0   3  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFReduceBegin      254 1.0  nan nan  0.00e+00 0.0  1.5e+03 1.2e+04 0.0e+00   0  0  2  6  0   0  0  2  6  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFReduceEnd        254 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   3  0  0  0  0   3  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFFetchOpBegin       1 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFFetchOpEnd         1 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFPack            8091 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SFUnpack          8092 1.0  nan nan  4.78e+04 1.5  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecDot              60 1.0  nan nan  4.30e+06 1.2  0.0e+00 0.0e+00 6.0e+01   0  0  0  0  3   0  0  0  0  3  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecMDot            398 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 4.0e+02   0  0  0  0 19   0  0  0  0 19  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecNorm            641 1.0  nan nan  4.45e+07 1.2  0.0e+00 0.0e+00 6.4e+02   1  1  0  0 31   1  1  0  0 31  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecScale           601 1.0  nan nan  2.08e+07 1.2  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecCopy           3735 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecSet            2818 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecAXPY            123 1.0  nan nan  8.68e+06 1.2  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecAYPX           6764 1.0  nan nan  1.90e+08 1.2  0.0e+00 0.0e+00 0.0e+00   0  4  0  0  0   0  4  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecAXPBYCZ        2388 1.0  nan nan  1.83e+08 1.2  0.0e+00 0.0e+00 0.0e+00   0  4  0  0  0   0  4  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecWAXPY            60 1.0  nan nan  4.30e+06 1.2  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecMAXPY           681 1.0  nan nan  1.36e+08 1.2  0.0e+00 0.0e+00 0.0e+00   0  3  0  0  0   0  3  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecAssemblyBegin     7 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 6.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecAssemblyEnd       7 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecPointwiseMult  4449 1.0  nan nan  6.06e+07 1.2  0.0e+00 0.0e+00 0.0e+00   0  1  0  0  0   0  1  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecScatterBegin   7614 1.0  nan nan  0.00e+00 0.0  7.1e+04 2.9e+03 1.3e+01   0  0 94 73  1   0  0 94 73  1  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecScatterEnd     7614 1.0  nan nan  4.78e+04 1.5  0.0e+00 0.0e+00 0.0e+00   3  0  0  0  0   3  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecReduceArith     120 1.0  nan nan  8.60e+06 1.2  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> VecReduceComm       60 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 6.0e+01   0  0  0  0  3   0  0  0  0  3  -nan -nan  0 0.00e+00  0 0.00e+00   0
> VecNormalize       401 1.0  nan nan  4.09e+07 1.2  0.0e+00 0.0e+00 4.0e+02   0  1  0  0 19   0  1  0  0 20  -nan -nan  0 0.00e+00  0 0.00e+00 100
> TSStep              20 1.0  1.2908e+01 1.0  5.05e+09 1.2  7.6e+04 3.7e+03 2.0e+03  89 100 100 98 96  89 100 100 98 97  1423 -nan  0 0.00e+00  0 0.00e+00  99
> TSFunctionEval     140 1.0  nan nan  1.00e+07 1.2  1.1e+03 3.7e+04 0.0e+00   1  0  1 15  0   1  0  1 15  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> TSJacobianEval      60 1.0  nan nan  1.67e+07 1.2  4.8e+02 3.7e+04 6.0e+01   2  0  1  6  3   2  0  1  6  3  -nan -nan  0 0.00e+00  0 0.00e+00  87
> MatMult           4934 1.0  nan nan  4.16e+09 1.2  5.1e+04 2.7e+03 4.0e+00  15 82 68 49  0  15 82 68 49  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> MatMultAdd        1104 1.0  nan nan  9.00e+07 1.2  8.8e+03 1.4e+02 0.0e+00   1  2 12  0  0   1  2 12  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> MatMultTranspose  1104 1.0  nan nan  9.01e+07 1.2  8.8e+03 1.4e+02 1.0e+00   1  2 12  0  0   1  2 12  0  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> MatSolve           368 0.0  nan nan  3.57e+04 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatSOR              60 1.0  nan nan  3.12e+07 1.2  0.0e+00 0.0e+00 0.0e+00   0  1  0  0  0   0  1  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatLUFactorSym       2 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatLUFactorNum       2 1.0  nan nan  4.24e+02 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatConvert           8 1.0  nan nan  0.00e+00 0.0  8.0e+01 1.2e+03 4.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatScale            66 1.0  nan nan  1.48e+07 1.2  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00  99
> MatResidual       1104 1.0  nan nan  1.01e+09 1.2  1.2e+04 2.9e+03 0.0e+00   4 20 16 12  0   4 20 16 12  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> MatAssemblyBegin   590 1.0  nan nan  0.00e+00 0.0  1.5e+02 4.2e+04 2.0e+02   1  0  0  2 10   1  0  0  2 10  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatAssemblyEnd     590 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 1.4e+02   2  0  0  0  7   2  0  0  0  7  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatGetRowIJ          2 0.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatCreateSubMat    122 1.0  nan nan  0.00e+00 0.0  6.3e+01 1.8e+02 1.7e+02   2  0  0  0  8   2  0  0  0  8  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatGetOrdering       2 0.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatCoarsen           3 1.0  nan nan  0.00e+00 0.0  5.0e+02 1.3e+03 1.2e+02   0  0  1  0  6   0  0  1  0  6  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatZeroEntries      61 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatAXPY              6 1.0  nan nan  1.37e+06 1.2  0.0e+00 0.0e+00 1.8e+01   1  0  0  0  1   1  0  0  0  1  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatTranspose         6 1.0  nan nan  0.00e+00 0.0  2.2e+02 2.9e+04 4.8e+01   1  0  0  2  2   1  0  0  2  2  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatMatMultSym        4 1.0  nan nan  0.00e+00 0.0  2.2e+02 1.7e+03 2.8e+01   0  0  0  0  1   0  0  0  0  1  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatMatMultNum        4 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatPtAPSymbolic      5 1.0  nan nan  0.00e+00 0.0  6.2e+02 5.2e+03 4.4e+01   3  0  1  1  2   3  0  1  1  2  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatPtAPNumeric     181 1.0  nan nan  0.00e+00 0.0  3.3e+03 1.8e+04 0.0e+00  56  0  4 21  0  56  0  4 21  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatGetLocalMat     185 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   1  0  0  0  0   1  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatSetPreallCOO      1 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 1.0e+01   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> MatSetValuesCOO     60 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> KSPSetUp           483 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 2.2e+01   0  0  0  0  1   0  0  0  0  1  -nan -nan  0 0.00e+00  0 0.00e+00   0
> KSPSolve            60 1.0  1.1843e+01 1.0  4.91e+09 1.2  7.3e+04 2.9e+03 1.2e+03  82 97 97 75 60  82 97 97 75 60  1506 -nan  0 0.00e+00  0 0.00e+00  99
> KSPGMRESOrthog     398 1.0  nan nan  7.97e+07 1.2  0.0e+00 0.0e+00 4.0e+02   1  2  0  0 19   1  2  0  0 19  -nan -nan  0 0.00e+00  0 0.00e+00 100
> SNESSolve           60 1.0  1.2842e+01 1.0  5.01e+09 1.2  7.5e+04 3.6e+03 2.0e+03  89 99 100 96 95  89 99 100 96 96  1419 -nan  0 0.00e+00  0 0.00e+00  99
> SNESSetUp            1 1.0  nan nan  0.00e+00 0.0  0.0e+00 0.0e+00 2.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> SNESFunctionEval   120 1.0  nan nan  3.01e+07 1.2  9.6e+02 3.7e+04 0.0e+00   1  1  1 13  0   1  1  1 13  0  -nan -nan  0 0.00e+00  0 0.00e+00 100
> SNESJacobianEval    60 1.0  nan nan  1.67e+07 1.2  4.8e+02 3.7e+04 6.0e+01   2  0  1  6  3   2  0  1  6  3  -nan -nan  0 0.00e+00  0 0.00e+00  87
> SNESLineSearch      60 1.0  nan nan  6.99e+07 1.2  9.6e+02 1.9e+04 2.4e+02   1  1  1  6 12   1  1  1  6 12  -nan -nan  0 0.00e+00  0 0.00e+00 100
> PCSetUp_GAMG+       60 1.0  nan nan  3.53e+07 1.2  5.2e+03 1.4e+04 4.3e+02  62  1  7 25 21  62  1  7 25 21  -nan -nan  0 0.00e+00  0 0.00e+00  96
> PCGAMGCreateG        3 1.0  nan nan  1.32e+06 1.2  2.2e+02 2.9e+04 4.2e+01   1  0  0  2  2   1  0  0  2  2  -nan -nan  0 0.00e+00  0 0.00e+00   0
> GAMG Coarsen         3 1.0  nan nan  0.00e+00 0.0  5.0e+02 1.3e+03 1.2e+02   1  0  1  0  6   1  0  1  0  6  -nan -nan  0 0.00e+00  0 0.00e+00   0
> GAMG MIS/Agg         3 1.0  nan nan  0.00e+00 0.0  5.0e+02 1.3e+03 1.2e+02   0  0  1  0  6   0  0  1  0  6  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCGAMGProl           3 1.0  nan nan  0.00e+00 0.0  7.8e+01 7.8e+02 4.8e+01   0  0  0  0  2   0  0  0  0  2  -nan -nan  0 0.00e+00  0 0.00e+00   0
> GAMG Prol-col        3 1.0  nan nan  0.00e+00 0.0  5.2e+01 5.8e+02 2.1e+01   0  0  0  0  1   0  0  0  0  1  -nan -nan  0 0.00e+00  0 0.00e+00   0
> GAMG Prol-lift       3 1.0  nan nan  0.00e+00 0.0  2.6e+01 1.2e+03 1.5e+01   0  0  0  0  1   0  0  0  0  1  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCGAMGOptProl        3 1.0  nan nan  3.40e+07 1.2  5.8e+02 2.4e+03 1.1e+02   1  1  1  0  6   1  1  1  0  6  -nan -nan  0 0.00e+00  0 0.00e+00 100
> GAMG smooth          3 1.0  nan nan  2.85e+05 1.2  1.9e+02 1.9e+03 3.0e+01   0  0  0  0  1   0  0  0  0  1  -nan -nan  0 0.00e+00  0 0.00e+00  43
> PCGAMGCreateL        3 1.0  nan nan  0.00e+00 0.0  4.8e+02 6.5e+03 8.0e+01   3  0  1  1  4   3  0  1  1  4  -nan -nan  0 0.00e+00  0 0.00e+00   0
> GAMG PtAP            3 1.0  nan nan  0.00e+00 0.0  4.5e+02 7.1e+03 2.7e+01   3  0  1  1  1   3  0  1  1  1  -nan -nan  0 0.00e+00  0 0.00e+00   0
> GAMG Reduce          1 1.0  nan nan  0.00e+00 0.0  3.6e+01 3.7e+01 5.3e+01   0  0  0  0  3   0  0  0  0  3  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCGAMG Gal l00      60 1.0  nan nan  0.00e+00 0.0  1.1e+03 1.4e+04 9.0e+00  46  0  1  6  0  46  0  1  6  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCGAMG Opt l00       1 1.0  nan nan  0.00e+00 0.0  4.8e+01 1.7e+02 7.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCGAMG Gal l01      60 1.0  nan nan  0.00e+00 0.0  1.6e+03 2.9e+04 9.0e+00  13  0  2 16  0  13  0  2 16  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCGAMG Opt l01       1 1.0  nan nan  0.00e+00 0.0  7.2e+01 4.8e+03 7.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCGAMG Gal l02      60 1.0  nan nan  0.00e+00 0.0  1.1e+03 1.2e+03 1.7e+01   0  0  1  0  1   0  0  1  0  1  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCGAMG Opt l02       1 1.0  nan nan  0.00e+00 0.0  7.2e+01 2.2e+02 7.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCSetUp            182 1.0  nan nan  3.53e+07 1.2  5.3e+03 1.4e+04 7.7e+02  64  1  7 27 37  64  1  7 27 38  -nan -nan  0 0.00e+00  0 0.00e+00  96
> PCSetUpOnBlocks    368 1.0  nan nan  4.24e+02 0.0  0.0e+00 0.0e+00 0.0e+00   0  0  0  0  0   0  0  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> PCApply             60 1.0  nan nan  4.85e+09 1.2  7.3e+04 2.9e+03 1.1e+03  81 96 96 75 54  81 96 96 75 54  -nan -nan  0 0.00e+00  0 0.00e+00  99
> KSPSolve_FS_0       60 1.0  nan nan  3.12e+07 1.2  0.0e+00 0.0e+00 0.0e+00   0  1  0  0  0   0  1  0  0  0  -nan -nan  0 0.00e+00  0 0.00e+00   0
> KSPSolve_FS_1       60 1.0  nan nan  4.79e+09 1.2  7.2e+04 2.9e+03 1.1e+03  81 95 96 75 54  81 95 96 75 54  -nan -nan  0 0.00e+00  0 0.00e+00 100
>
> --- Event Stage 1: Unknown
>
> ------------------------------------------------------------------------------------------------------------------------
>
> Object Type          Creations   Destructions. Reports information only for process 0.
>
> --- Event Stage 0: Main Stage
>
>            Container    14    14
>     Distributed Mesh     9     9
>            Index Set   120   120
>    IS L to G Mapping    10    10
>    Star Forest Graph    87    87
>      Discrete System     9     9
>            Weak Form     9     9
>               Vector   761   761
>              TSAdapt     1     1
>                   TS     1     1
>                 DMTS     1     1
>                 SNES     1     1
>               DMSNES     3     3
>       SNESLineSearch     1     1
>        Krylov Solver    11    11
>      DMKSP interface     1     1
>               Matrix   171   171
>       Matrix Coarsen     3     3
>       Preconditioner    11    11
>               Viewer     2     1
>          PetscRandom     3     3
>
> --- Event Stage 1: Unknown
>
> ========================================================================================================================
> Average time to get PetscTime(): 3.82e-08
> Average time for MPI_Barrier(): 2.2968e-06
> Average time for zero size MPI_Send(): 3.371e-06
> #PETSc Option Table entries:
> -log_view
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with 64 bit PetscInt
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 8
> Configure options: PETSC_DIR=/home2/4pf/petsc PETSC_ARCH=arch-kokkos-serial --prefix=/home2/4pf/.local/serial --with-cc=mpicc --with-cxx=mpicxx --with-fc=0 --with-cudac=0 --with-cuda=0 --with-shared-libraries --with-64-bit-indices --with-debugging=0 --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --with-kokkos-dir=/home2/4pf/.local/serial --with-kokkos-kernels-dir=/home2/4pf/.local/serial --download-f2cblaslapack
> -----------------------------------------
> Libraries compiled on 2023-01-06 18:21:31 on iguazu
> Machine characteristics: Linux-4.18.0-383.el8.x86_64-x86_64-with-glibc2.28
> Using PETSc directory: /home2/4pf/.local/serial
> Using PETSc arch:
> -----------------------------------------
> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector -fvisibility=hidden -O3
> -----------------------------------------
> Using include paths: -I/home2/4pf/.local/serial/include
> -----------------------------------------
> Using C linker: mpicc
> Using libraries: -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lpetsc -Wl,-rpath,/home2/4pf/.local/serial/lib64 -L/home2/4pf/.local/serial/lib64 -Wl,-rpath,/home2/4pf/.local/serial/lib -L/home2/4pf/.local/serial/lib -lkokkoskernels -lkokkoscontainers -lkokkoscore -lf2clapack -lf2cblas -lm -lX11 -lquadmath -lstdc++ -ldl
> -----------------------------------------
>
> ---
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> ------------------------------
> *From:* Zhang, Junchao <[email protected]>
> *Sent:* Tuesday, January 17, 2023 17:25
> *To:* Fackler, Philip <[email protected]>; [email protected] <[email protected]>; [email protected] <[email protected]>
> *Cc:* Mills, Richard Tran <[email protected]>; Blondel, Sophie <[email protected]>; Roth, Philip <[email protected]>
> *Subject:* [EXTERNAL] Re: Performance problem using COO interface
>
> Hi, Philip,
> Could you add -log_view and see what functions are used in the solve? Since it is CPU-only, perhaps with -log_view of different runs, we can easily see which functions slowed down.
>
> --Junchao Zhang
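[Editor's note: as an illustration of that suggestion (not code from Xolotl), here is a small, self-contained sketch that brackets a solve in its own log stage, so that -log_view output from the different runs can be compared stage by stage. The 1-D Laplacian and all names are invented for the example; run it with -log_view, optionally adding -mat_type aijkokkos -vec_type kokkos.]

    /* Sketch: put the solve in a named log stage for per-stage -log_view comparison. */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat           A;
      Vec           x, b;
      KSP           ksp;
      PetscLogStage solve_stage;
      PetscInt      i, n = 100, Istart, Iend;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(PetscLogStageRegister("KSP solve", &solve_stage));

      /* 1-D Laplacian assembled with the classic MatSetValues path */
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
      PetscCall(MatSetFromOptions(A));
      PetscCall(MatSetUp(A));
      PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
      for (i = Istart; i < Iend; i++) {
        PetscScalar v[3]    = {-1.0, 2.0, -1.0};
        PetscInt    cols[3] = {i - 1, i, i + 1};
        PetscInt    ncols = 3, *c = cols;
        PetscScalar *vv = v;
        if (i == 0) { c = cols + 1; vv = v + 1; ncols = 2; }  /* drop left neighbor */
        if (i == n - 1) ncols--;                              /* drop right neighbor */
        PetscCall(MatSetValues(A, 1, &i, ncols, c, vv, INSERT_VALUES));
      }
      PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
      PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

      PetscCall(MatCreateVecs(A, &x, &b));
      PetscCall(VecSet(b, 1.0));
      PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
      PetscCall(KSPSetOperators(ksp, A, A));
      PetscCall(KSPSetFromOptions(ksp));

      PetscCall(PetscLogStagePush(solve_stage));  /* events below are reported under "KSP solve" */
      PetscCall(KSPSolve(ksp, b, x));
      PetscCall(PetscLogStagePop());

      PetscCall(KSPDestroy(&ksp));
      PetscCall(VecDestroy(&x));
      PetscCall(VecDestroy(&b));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }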
> ------------------------------
> *From:* Fackler, Philip <[email protected]>
> *Sent:* Tuesday, January 17, 2023 4:13 PM
> *To:* [email protected] <[email protected]>; [email protected] <[email protected]>
> *Cc:* Mills, Richard Tran <[email protected]>; Zhang, Junchao <[email protected]>; Blondel, Sophie <[email protected]>; Roth, Philip <[email protected]>
> *Subject:* Performance problem using COO interface
>
> In Xolotl's feature-petsc-kokkos branch I have ported the code to use PETSc's COO interface for creating the Jacobian matrix (and the Kokkos interface for interacting with Vec entries). As the attached plots show for one case, while the code for computing the RHSFunction and RHSJacobian performs similarly (or slightly better) after the port, the performance of the solve as a whole is significantly worse.
>
> Note:
> This is all CPU-only (so kokkos and kokkos-kernels are built with only the serial backend).
> The dev version is using MatSetValuesStencil with the default implementations for Mat and Vec.
> The port version is using MatSetValuesCOO and is run with -dm_mat_type aijkokkos -dm_vec_type kokkos.
> The port/def version is using MatSetValuesCOO and is run with -dm_vec_type kokkos (using the default Mat implementation).
>
> So, this seems to be due to a performance difference in the PETSc implementations. Please advise. Is this a known issue? Or am I missing something?
>
> Thank you for the help,
>
> *Philip Fackler*
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>
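[Editor's note: for readers unfamiliar with the COO interface mentioned above, here is a minimal, hypothetical sketch of the pattern; the sizes, indices, and values are invented and none of this is Xolotl's actual code. The sparsity is declared once with MatSetPreallocationCOO, and each Jacobian evaluation then only passes a fresh value array to MatSetValuesCOO.]

    /* Sketch of COO assembly: pattern set once, values updated repeatedly. */
    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      PetscInt    n = 4;
      /* COO triplets for a tiny made-up pattern */
      PetscInt    coo_i[] = {0, 1, 2, 3, 0, 1};
      PetscInt    coo_j[] = {0, 1, 2, 3, 1, 2};
      PetscScalar coo_v[] = {2, 2, 2, 2, -1, -1};
      PetscCount  ncoo    = 6;

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
      PetscCall(MatSetFromOptions(A));                       /* e.g. -mat_type aijkokkos */
      PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
      /* Inside a Jacobian callback this call would be repeated with new values */
      PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));
      PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }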
