On 09/22/2014 12:57 PM, Chung Shen wrote:
Dear PETSc Users,

I am new to PETSc and trying to determine if GPU speedup is possible with the 
3D Poisson solvers. I configured 2 copies of 'petsc-master' on a standalone 
machine, one with CUDA toolkit 5.0 and one without (both without MPI):
Machine: HP Z820 Workstation, Red Hat Enterprise Linux 5.0
CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, CUDA compute capability: 3.5, Driver: 313.09)

I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting about a 20% 
speedup with the GPU. Is this reasonable or did I miss something?

Attached is a comparison chart with two sample logs. The y-axis is the elapsed 
time in seconds and the x-axis corresponds to the size of the problem. In 
particular, I wonder if the numbers of calls to 'VecCUSPCopyTo' and 
'VecCUSPCopyFrom' shown in the GPU log are excessive?

Thanks in advance for your reply.

Best Regards,

Chung Shen

A few comments:

- To get reliable timings you should configure PETSc without debugging (i.e. --with-debugging=no).
- The ILU preconditioning in your GPU benchmark is done on the CPU, and the resulting host-device data transfers are killing performance. Can you try running with the additional option -pc_factor_mat_solver_package cusparse? This will perform the preconditioning on the GPU.
- If you're interested in running benchmarks in parallel, you will need a few patches that are not yet in petsc/master. I can put together a branch that has the needed fixes.
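One way to check whether the transfers are excessive is to compare their call counts with the number of preconditioner applications in the profiling output. The following is a minimal Python sketch, not part of PETSc. The function name transfer_counts is hypothetical, and the sketch assumes the logs were produced with -log_summary and that each event row in its table begins with the event name, the maximum call count, the count ratio and the maximum time in seconds. Rows for the same event in different stages are summed.

```python
from collections import defaultdict

# Events named in the question; other event names (e.g. "PCApply") can be passed in.
TRANSFER_EVENTS = ("VecCUSPCopyTo", "VecCUSPCopyFrom")


def transfer_counts(log_text, events=TRANSFER_EVENTS):
    """Return (calls, seconds) dicts for the requested events.

    Assumption: each event row in the -log_summary table starts with
    the event name, max call count, count ratio, max time (sec), ...
    Rows for the same event in different stages are summed.
    """
    calls = defaultdict(int)
    seconds = defaultdict(float)
    for line in log_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and events we were not asked about.
        if len(fields) < 4 or fields[0] not in events:
            continue
        calls[fields[0]] += int(fields[1])
        seconds[fields[0]] += float(fields[3])
    return dict(calls), dict(seconds)
```

For example, call transfer_counts(open("gpu.log").read(), ("VecCUSPCopyTo", "VecCUSPCopyFrom", "PCApply")), where gpu.log is a placeholder file name. If the copy counts track the PCApply count, that is consistent with vectors moving between host and device on every CPU-side ILU application.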

Cheers,
Dominic

--
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com
