Start by running a good-sized problem with -log_view (one MPI rank is best for all initial studies) and look at the overall performance and at which parts actually run on the GPU. For a GPU, your minimum problem size should be about 1 million unknowns!
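
As a rough sketch of that first step, the two runs to compare might look like the following. The executable name ./my_petsc_app is hypothetical, and mpiexec stands in for whatever MPI launcher your PETSc build uses; the options themselves are the ones mentioned in this thread. Note that the runtime type options only take effect if the code sets its types from the options database, for example by calling MatSetFromOptions() and VecSetFromOptions().

```shell
# Hypothetical executable name; substitute your own PETSc program.
APP=./my_petsc_app

# CPU baseline: default matrix and vector types, one MPI rank, with -log_view.
CPU_CMD="mpiexec -n 1 $APP -log_view"

# GPU run: same code, with the GPU types selected at run time.
GPU_CMD="mpiexec -n 1 $APP -mat_type aijcusparse -vec_type cuda -log_view"

# Print both commands so they can be checked before running them.
echo "$CPU_CMD"
echo "$GPU_CMD"
```

Running both with the same problem size and comparing the two -log_view summaries should show where the time goes in each case.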

Feel free to send the -log_view output.

  Barry

> On Jan 14, 2022, at 4:27 PM, Rohan Yadav <roh...@alumni.cmu.edu> wrote:
>
> Hi,
>
> I'm looking to use PETSc with GPUs to do some linear algebra operations, like
> SpMV, SPMM etc. Building PETSc with `--with-cuda=1` and running with
> `-mat_type aijcusparse -vec_type cuda` gives me a large slowdown from the
> same code running on the CPU. This is not entirely unexpected, as things like
> data transfer costs across the PCIE might erroneously be included in my
> timing. Are there some examples of benchmarking GPU computations with PETSc,
> or just the proper way to write code in PETSc that will work for CPUs and
> GPUs?
>
> Rohan