On 09/22/2014 12:57 PM, Chung Shen wrote:
Dear PETSc Users,
I am new to PETSc and trying to determine if GPU speedup is possible with the
3D Poisson solvers. I configured 2 copies of 'petsc-master' on a standalone
machine, one with CUDA toolkit 5.0 and one without (both without MPI):
Machine: HP Z820 Workstation, Red Hat Enterprise Linux 5.0
CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, CUDA Compute Capability: 3.5,
Driver: 313.09)
I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting about 20%
speedup with GPU. Is this reasonable or did I miss something?
Attached is a comparison chart with two sample logs. The y-axis is the elapsed
time in seconds and the x-axis corresponds to the size of the problem. In
particular, I wonder if the number of calls to 'VecCUSPCopyTo' and
'VecCUSPCopyFrom' shown in the GPU log are excessive?
Thanks in advance for your reply.
Best Regards,
Chung Shen
A few comments:
- To get reliable timing you should configure PETSc without debugging
(i.e. --with-debugging=no)
- The ILU preconditioning in your GPU benchmark is done on the CPU. The
host-device data transfers are killing performance. Can you try to run
with the additional option -pc_factor_mat_solver_package cusparse? This
will perform the preconditioning on the GPU.
- If you're interested in running benchmarks in parallel you will need a
few patches that are not yet in petsc/master. I can put together a
branch that has the needed fixes.
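
As a rough sketch of the first two suggestions, the commands might look like
the following. PETSC_CONFIGURE_OPTS and EX32_OPTS are hypothetical
placeholders for whatever configure and run options the existing CUDA build
and benchmark already use. -log_summary is assumed to be how the attached
logs were produced.

```sh
# Hypothetical placeholders: fill in the options already used for the
# CUDA build and for the existing ex32 runs.
PETSC_CONFIGURE_OPTS=""
EX32_OPTS=""

# 1. Reconfigure without debugging so the timings are representative.
./configure --with-debugging=no $PETSC_CONFIGURE_OPTS

# 2. Rerun the benchmark with the ILU factorization and triangular solves
#    handled by cusparse, then compare the VecCUSPCopyTo and
#    VecCUSPCopyFrom counts in the new log against the old one.
./ex32 $EX32_OPTS -pc_factor_mat_solver_package cusparse -log_summary
```

If the preconditioner really runs on the GPU, the copy counts in the log
should drop noticeably compared with the run you attached.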
Cheers,
Dominic
--
Dominic Meiser
Tech-X Corporation