Dominic Meiser <[email protected]> writes:

> - To get reliable timing you should configure PETSc without debugging
>   (i.e. --with-debugging=no)
> - The ILU preconditioning in your GPU benchmark is done on the CPU. The
>   host-device data transfers are killing performance. Can you try to run
>   with the additional option -pc_factor_mat_solver_package cusparse? This
>   will perform the preconditioning on the GPU.
> - If you're interested in running benchmarks in parallel you will need a
>   few patches that are not yet in petsc/master. I can put together a
>   branch that has the needed fixes.
And for the CPU version, consider using a configuration that makes sense there, such as FMG with Gauss-Seidel or Chebyshev smoothers and an error tolerance proportional to the discretization error. You might find that not enough time is spent on the fine grid to see a significant speed-up.
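
As a rough sketch of what such a CPU configuration could look like on the command line, the options below select full multigrid with Chebyshev iterations preconditioned by SOR (PETSc's Gauss-Seidel-type smoother) on each level. The variable name PETSC_FMG_OPTS and the tolerance value 1e-4 are only placeholders for illustration; the tolerance should be picked to match your own discretization error, and the setup assumes your code provides a grid hierarchy (for example through a DM) so that PCMG can build its levels.

```shell
# Illustrative option set, not a tuned configuration.
#   -pc_type mg -pc_mg_type full   : full multigrid (FMG) cycle
#   -mg_levels_ksp_type chebyshev  : Chebyshev smoother on each level
#   -mg_levels_pc_type sor         : SOR / Gauss-Seidel-type preconditioner inside the smoother
#   -ksp_rtol 1e-4                 : placeholder; choose it relative to your discretization error
PETSC_FMG_OPTS="-pc_type mg -pc_mg_type full -mg_levels_ksp_type chebyshev -mg_levels_pc_type sor -ksp_rtol 1e-4"

# Print the options so they can be appended to your benchmark's command line.
echo "$PETSC_FMG_OPTS"
```

Adding -log_summary (or -log_view in newer PETSc releases) to the run shows how the time is split across the multigrid levels, which is one way to check how much of it is actually spent on the fine grid.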
