Dear all,

We are trying to solve linear systems with KSP on GPUs. We found the example src/ksp/ksp/tutorials/bench_kspsolve.c, in which the matrix is created and assembled using the COO interface provided by PETSc. In this example, the number of CPUs is the same as the number of GPUs. In our case, the parameters of the matrix are computed on the CPUs, and this computation is expensive: it can take half of the total run time or even more.
We would like to use more CPUs to compute these parameters in parallel, and to create a smaller communicator (say, gpu_comm) containing only the CPUs that are paired with GPUs. The parameters would be computed by all of the CPUs (in MPI_COMM_WORLD) and then sent via MPI to the CPUs in gpu_comm. The matrix (of type aijcusparse) would then be created and assembled within gpu_comm, and finally KSPSolve would be performed on the GPUs.

I’m not sure if this approach will work in practice. Are there any comparable examples I can look to for guidance?

Best,
Wenbo
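
P.S. For concreteness, here is a rough sketch of the layout we have in mind. It is not a tested implementation. It assumes that world ranks are grouped in blocks of ranks_per_gpu and that the first rank of each block drives one GPU. The names is_gpu_rank, gpu_owner_of, SolveOnGpuRanks, and the USE_PETSC guard are hypothetical and only for illustration. The part inside the guard uses MPI_Comm_split together with the COO calls from bench_kspsolve.c, and assumes the COO triplets have already been gathered on the GPU ranks and that a recent PETSc (with PetscCount and PETSC_SUCCESS) is available.

```c
/* Hypothetical layout: ranks are grouped in blocks of ranks_per_gpu,
   and the first rank of each block is the one attached to a GPU. */

/* Returns 1 if this world rank should be a member of gpu_comm. */
static inline int is_gpu_rank(int world_rank, int ranks_per_gpu)
{
  return world_rank % ranks_per_gpu == 0;
}

/* World rank of the GPU-attached process that receives this rank's entries. */
static inline int gpu_owner_of(int world_rank, int ranks_per_gpu)
{
  return (world_rank / ranks_per_gpu) * ranks_per_gpu;
}

#ifdef USE_PETSC /* hypothetical guard: define it when building against PETSc */
#include <petscksp.h>

/* Collective on MPI_COMM_WORLD. On GPU ranks, coo_i/coo_j/coo_v hold the
   (global) COO entries already received from the other CPUs; the arguments
   are ignored on ranks that are not in gpu_comm. */
static PetscErrorCode SolveOnGpuRanks(int ranks_per_gpu, PetscInt nlocal, PetscCount ncoo,
                                      PetscInt coo_i[], PetscInt coo_j[], const PetscScalar coo_v[])
{
  MPI_Comm gpu_comm;
  int      world_rank;
  Mat      A;
  Vec      x, b;
  KSP      ksp;

  PetscFunctionBeginUser;
  PetscCallMPI(MPI_Comm_rank(MPI_COMM_WORLD, &world_rank));
  /* Ranks passing MPI_UNDEFINED get MPI_COMM_NULL and skip the solve. */
  PetscCallMPI(MPI_Comm_split(MPI_COMM_WORLD,
                              is_gpu_rank(world_rank, ranks_per_gpu) ? 0 : MPI_UNDEFINED,
                              world_rank, &gpu_comm));
  if (gpu_comm == MPI_COMM_NULL) PetscFunctionReturn(PETSC_SUCCESS);

  /* Create and assemble the matrix on the sub-communicator only. */
  PetscCall(MatCreate(gpu_comm, &A));
  PetscCall(MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE));
  PetscCall(MatSetType(A, MATAIJCUSPARSE));
  PetscCall(MatSetPreallocationCOO(A, ncoo, coo_i, coo_j));
  PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));

  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0)); /* placeholder right-hand side */

  PetscCall(KSPCreate(gpu_comm, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCallMPI(MPI_Comm_free(&gpu_comm));
  PetscFunctionReturn(PETSC_SUCCESS);
}
#endif
```

With ranks_per_gpu = 4, for example, world ranks 0, 4, 8, ... would form gpu_comm, and ranks 5, 6, 7 would send their computed entries to rank 4 before the assembly step.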
