Hi,

> I am new to PETSc and trying to determine if GPU speedup is possible with the 3D Poisson solvers. I configured 2 copies of 'petsc-master' on a standalone machine, one with CUDA toolkit 5.0 and one without (both without MPI):
> Machine: HP Z820 Workstation, Redhat Enterprise Linux 5.0
> CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
> GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, Cuda Compatibility: 3.5, Driver: 313.09)

> I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting about 20% speedup with GPU. Is this reasonable or did I miss something?

That is fairly reasonable for your configuration, but the setup is not ideal: with the default ILU preconditioner, the residual gets copied between host and device in each iteration. It is better to use a preconditioner that is suitable for the GPU. For a Poisson problem you should get good numbers with the algebraic multigrid preconditioner in CUSP (-pc_type sacusp).

For Poisson you may also try CG (-ksp_type cg) instead of GMRES to save all the orthogonalization costs, assuming that you use a symmetric preconditioner.
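
To make the CG remark concrete, here is a minimal sketch in plain Python. It is not PETSc code, and all function names are made up for illustration. It runs preconditioned CG on a 1D Poisson matrix (stencil [-1, 2, -1]) with a Jacobi preconditioner, chosen only because it is trivially symmetric. Each iteration works with a fixed handful of vectors (x, r, z, p, Ap), whereas GMRES has to orthogonalize every new vector against a growing Krylov basis.

```python
def poisson_1d_matvec(x):
    """Apply the 1D Poisson stencil [-1, 2, -1] with zero Dirichlet boundaries."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(a, b):
    """Euclidean inner product of two equally long lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def jacobi_apply(r):
    """Jacobi preconditioner: divide by the constant diagonal entry 2.0."""
    return [ri / 2.0 for ri in r]


def pcg(matvec, precond, b, rtol=1e-10, max_it=1000):
    """Preconditioned CG starting from x = 0; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                      # r = b - A*x with x = 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    bnorm = dot(b, b) ** 0.5
    for it in range(1, max_it + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= rtol * bnorm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz           # short recurrence, no stored basis
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_it


n = 50
b = [1.0] * n
x, iterations = pcg(poisson_1d_matvec, jacobi_apply, b)
residual = [bi - yi for bi, yi in zip(b, poisson_1d_matvec(x))]
residual_norm = dot(residual, residual) ** 0.5
```

The short recurrence for p is only valid when both the matrix and the preconditioner are symmetric positive definite, hence the caveat about the preconditioner above.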

> Attached is a comparison chart with two sample logs. The y-axis is the elapsed time in seconds and the x-axis corresponds to the size of the problem. In particular, I wonder if the numbers of calls to 'VecCUSPCopyTo' and 'VecCUSPCopyFrom' shown in the GPU log are excessive?

They simply reflect that the residual gets copied between host and device in each iteration, because ILU is only run sequentially on the host.

Best regards,
Karli
