Add an implementation of MatGetDiagonal_SeqAIJCUSPARSE(), which is currently missing. For example, the approach described here could be used: https://stackoverflow.com/questions/60311408/how-to-get-the-diagonal-of-a-sparse-matrix-in-cusparse
Jose

> On 8 Jun 2022, at 3:21, Mark Adams <[email protected]> wrote:
>
> I am looking at TS/SNES/KSP/GAMG solve with Landau, which is all on the GPU,
> but it looks like MatGetDiagonal (see attached), and to a lesser extent
> VecPointWiseMult (biggest red band on the right side under PCApply), are
> resulting in expensive CPU-GPU movement. MatGetDiagonal on the fine grid is
> taking about 10x the time of TFQMR/GAMG iteration.
>
> Attached is a view of this with CUDA and an nsys data file with Kokkos that
> is pretty much the same.
>
> Any thoughts on how to fix this?
>
> Thanks,
> Mark
>
> Attachments: Screen Shot 2022-06-07 at 8.31.20 PM.png, output_ex2_3d_kokkos.nsys-rep
