Dominic Meiser <[email protected]> writes:
> - To get reliable timing you should configure PETSc without debugging 
> (i.e. --with-debugging=no)
> - The ILU preconditioning in your GPU benchmark is done on the CPU. The 
> host-device data transfers are killing performance. Can you try to run 
> with the additional option --pc_factor_mat_solver_package cusparse? This
> will perform the preconditioning on the GPU.
> - If you're interested in running benchmarks in parallel you will need a 
> few patches that are not yet in petsc/master. I can put together a 
> branch that has the needed fixes.

And for the CPU version, consider using a configuration that makes
sense there: for example, FMG with Gauss-Seidel or Chebyshev smoothers
and an error tolerance proportional to the discretization error.  You
might find that not enough time is spent on the fine grid to see a
significant speed-up.
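
To make the FMG point concrete, here is a minimal pure-Python sketch. It is not PETSc code. It assumes a 1D model problem, -u'' = f on (0,1) with u(0) = u(1) = 0, discretized with the 3-point Laplacian. The function names, the V(2,2) cycle, full-weighting restriction and linear interpolation are illustrative choices, not anything PETSc prescribes. With one V-cycle per level, the fine grid only sees a few Gauss-Seidel sweeps, yet the final error should already be on the order of the O(h^2) discretization error. That is why a tight algebraic tolerance buys nothing, and why little of the total time lands on the fine grid.

```python
import math

def gauss_seidel(u, f, h, sweeps):
    """Lexicographic Gauss-Seidel for -u'' = f; boundary values stay fixed."""
    n = len(u) - 1
    for _ in range(sweeps):
        for i in range(1, n):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])

def residual(u, f, h):
    """r = f - A u for the 3-point Laplacian; zero at the boundaries."""
    n = len(u) - 1
    r = [0.0] * (n + 1)
    for i in range(1, n):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full-weighting restriction from n to n/2 intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc

def prolong(ec):
    """Linear interpolation from nc to 2*nc intervals."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for j in range(nc + 1):
        e[2 * j] = ec[j]
    for j in range(nc):
        e[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return e

def v_cycle(u, f, h, nu1=2, nu2=2):
    """One V(nu1, nu2) cycle, updating u in place."""
    n = len(u) - 1
    if n == 2:
        # Coarsest grid has a single unknown: solve it exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return
    gauss_seidel(u, f, h, nu1)               # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = [0.0] * len(rc)
    v_cycle(ec, rc, 2.0 * h, nu1, nu2)       # coarse-grid error equation
    e = prolong(ec)
    for i in range(1, n):
        u[i] += e[i]                          # coarse-grid correction
    gauss_seidel(u, f, h, nu2)               # post-smoothing

def fmg(rhs, levels, cycles_per_level=1):
    """Full multigrid: solve coarse, interpolate up, a few V-cycles per level."""
    n = 2
    h = 1.0 / n
    u = [0.0] * (n + 1)
    v_cycle(u, [rhs(i * h) for i in range(n + 1)], h)   # exact coarse solve
    while n < 2 ** levels:
        n *= 2
        h = 1.0 / n
        f = [rhs(i * h) for i in range(n + 1)]
        u = prolong(u)                        # coarse solution as initial guess
        for _ in range(cycles_per_level):
            v_cycle(u, f, h)
    return u, h

if __name__ == "__main__":
    # Manufactured solution u = sin(pi x), so f = pi^2 sin(pi x).
    rhs = lambda x: math.pi ** 2 * math.sin(math.pi * x)
    for levels in (5, 6, 7):
        u, h = fmg(rhs, levels)
        err = max(abs(u[i] - math.sin(math.pi * i * h)) for i in range(len(u)))
        print(f"n = {len(u) - 1:4d}  max error = {err:.2e}  h^2 = {h * h:.2e}")
```

If the error drops by roughly a factor of 4 per refinement, FMG is tracking the O(h^2) discretization error with only one cycle per level.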
