Hi Karli,
PR #178 gets you most of the way. src/ksp/ksp/examples/tests/ex32.c uses
DMDAs, which require a few additional fixes. I haven't opened a pull
request for these yet, but I will do that before Thursday.
Regarding the rebase, wouldn't it be preferable to just resolve the
conflicts in the merge commit? In any event, I've merged these branches
several times into local integration branches created off of recent
petsc/master branches so I'm pretty familiar with the conflicts and how
to resolve them. I can help with the merge or do a rebase, whichever you
prefer.
Cheers,
Dominic
On 09/22/2014 10:37 PM, Karl Rupp wrote:
Hi Dominic,
I've got some time available at the end of this week for a merge to
next. Is there anything other than PR #178 needed? It currently shows
some conflicts, so is there any chance to rebase it on ~Thursday?
Best regards,
Karli
On 09/22/2014 09:38 PM, Dominic Meiser wrote:
On 09/22/2014 12:57 PM, Chung Shen wrote:
Dear PETSc Users,
I am new to PETSc and trying to determine if GPU speedup is possible
with the 3D Poisson solvers. I configured 2 copies of 'petsc-master'
on a standalone machine, one with CUDA toolkit 5.0 and one without
(both without MPI):
Machine: HP Z820 Workstation, Red Hat Enterprise Linux 5.0
CPU: (x2) 8-core Xeon E5-2650 2.0GHz, 128GB Memory
GPU: (x2) Tesla K20c (706MHz, 5.12GB Memory, CUDA Compute Capability: 3.5,
Driver: 313.09)
I used 'src/ksp/ksp/examples/tests/ex32.c' as a test and was getting
about 20% speedup with GPU. Is this reasonable or did I miss something?
Attached is a comparison chart with two sample logs. The y-axis is the
elapsed time in seconds and the x-axis corresponds to the size of the
problem. In particular, I wonder if the numbers of calls to
'VecCUSPCopyTo' and 'VecCUSPCopyFrom' shown in the GPU log are
excessive?
Thanks in advance for your reply.
Best Regards,
Chung Shen
A few comments:
- To get reliable timing you should configure PETSc without debugging
(i.e. --with-debugging=no)
- The ILU preconditioning in your GPU benchmark is done on the CPU, so
the host-device data transfers are killing performance. Can you try to run
with the additional option -pc_factor_mat_solver_package cusparse? This
will perform the preconditioning on the GPU.
- If you're interested in running benchmarks in parallel, you will need a
few patches that are not yet in petsc/master. I can put together a
branch that has the needed fixes.
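To make the first two suggestions concrete, here is a rough dry-run sketch
that only prints the commands instead of running them. The ./ex32 binary
location, the explicit -pc_type ilu, and the use of -log_summary to produce
the logs are assumptions on my part; you would also keep whatever CUDA-related
configure options and GPU vector/matrix type options you are already using.

```shell
#!/bin/sh
# Dry-run sketch: prints the suggested commands, does not execute them.

# Build without debugging so the timings are meaningful.
# (Add your existing CUDA-related configure options for the GPU build.)
CONFIGURE_OPTS="--with-debugging=no"

# Run-time options for the GPU benchmark:
#   -pc_factor_mat_solver_package cusparse  do the ILU solves on the GPU
#   -pc_type ilu                            assumed: made explicit here
#   -log_summary                            assumed: used to produce the logs
RUN_OPTS="-pc_type ilu -pc_factor_mat_solver_package cusparse -log_summary"

echo "./configure $CONFIGURE_OPTS"
echo "./ex32 $RUN_OPTS"
```

If the preconditioner really runs on the GPU, the VecCUSPCopyTo and
VecCUSPCopyFrom counts in the log should drop noticeably.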
Cheers,
Dominic
--
Dominic Meiser
Tech-X Corporation
5621 Arapahoe Avenue
Boulder, CO 80303
USA
Telephone: 303-996-2036
Fax: 303-448-7756
www.txcorp.com