Hi Dominic,

I have some time at the end of this week for a merge to next. Is anything needed other than PR #178? It currently shows some conflicts, so could you rebase it by around Thursday?

Best regards,
Karli



On 09/22/2014 09:38 PM, Dominic Meiser wrote:
On 09/22/2014 12:57 PM, Chung Shen wrote:
Dear PETSc Users,

I am new to PETSc and trying to determine if GPU speedup is possible
with the 3D Poisson solvers. I configured 2 copies of 'petsc-master'
on a standalone machine, one with CUDA toolkit 5.0 and one without
(both without MPI):
Machine: HP Z820 Workstation, Redhat Enterprise Linux 5.0
CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, Cuda Compatibility: 3.5,
Driver: 313.09)

I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting
about 20% speedup with GPU. Is this reasonable or did I miss something?

Attached is a comparison chart with two sample logs. The y-axis is the
elapsed time in seconds and the x-axis corresponds to the size of the
problem. In particular, I wonder if the numbers of calls to
'VecCUSPCopyTo' and 'VecCUSPCopyFrom' shown in the GPU log are excessive?

Thanks in advance for your reply.

Best Regards,

Chung Shen
A few comments:

- To get reliable timing you should configure PETSc without debugging
(i.e. --with-debugging=no)
- The ILU preconditioning in your GPU benchmark is done on the CPU, so the
host-device data transfers are killing performance. Can you try running
with the additional option -pc_factor_mat_solver_package cusparse? This
will perform the preconditioning on the GPU.
- If you're interested in running benchmarks in parallel you will need a
few patches that are not yet in petsc/master. I can put together a
branch that has the needed fixes.
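
For reference, the first two suggestions can be sketched as a small Python
helper that only assembles the command lines; it does not run anything. This
is a rough sketch under assumptions: the helper names are hypothetical, the
executable path ./ex32 assumes the example has already been built in the
current directory, and -log_summary is assumed to be the option used to
produce the logs. Any GPU vector/matrix type options from the original run
would be passed through `extra`.

```python
import shlex
from typing import List, Optional


def configure_command() -> str:
    """Command line for an optimized (non-debug) PETSc build.

    Only --with-debugging=no comes from the advice above; any CUDA or
    MPI related configure options would be appended as before.
    """
    return shlex.join(["./configure", "--with-debugging=no"])


def run_command(executable: str, extra: Optional[List[str]] = None) -> str:
    """Command line for a benchmark run with the factorization on the GPU.

    `executable` is a hypothetical path to the built example.
    `extra` holds whatever options the original GPU run already used.
    """
    args = [
        executable,
        # Ask for the cusparse solver package so the ILU solve runs on the GPU.
        "-pc_factor_mat_solver_package", "cusparse",
        # Assumption: performance summary option used for the timing logs.
        "-log_summary",
    ]
    return shlex.join(args + list(extra or []))


if __name__ == "__main__":
    print(configure_command())
    print(run_command("./ex32"))
```

Comparing the VecCUSPCopyTo and VecCUSPCopyFrom counts in the resulting log
against the earlier GPU log should show whether the transfers went down.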

Cheers,
Dominic

