Hi,
as you can see from the screenshot, the communication consists merely of
scalars from the dot products and/or norms. These are needed on the host
for the control flow and the convergence checks; this is true for any
iterative solver.
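To make this concrete, here is a rough sketch (not PETSc's actual code
path; the function and variable names are made up for illustration) of
why a device-side norm still produces a host-side scalar when cuBLAS is
used with its default host pointer mode:

  #include <cublas_v2.h>

  /* Sketch only: compute ||r|| on the GPU, but hand the scalar result
   * back to the host so the CPU can run the convergence check.
   * d_r is assumed to be a device array of length n. */
  double residual_norm(cublasHandle_t handle, const double *d_r, int n)
  {
    double nrm = 0.0;
    /* The reduction runs on the device; with CUBLAS_POINTER_MODE_HOST
     * (the default) the scalar is copied device-to-host into nrm. */
    cublasDnrm2(handle, n, d_r, 1, &nrm);
    return nrm;
  }

The host then uses that value for the control flow, e.g.
"if (nrm < rtol * nrm0) stop", which is why every Krylov method incurs
at least one such small transfer per iteration.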
Best regards,
Karli
On 7/18/19 3:11 PM, Xiangdong via petsc-users wrote:
On Thu, Jul 18, 2019 at 5:11 AM Smith, Barry F.
<[email protected]> wrote:
1) What preconditioner are you using? If any.
Currently I am using none, since I want to understand how GMRES works on the GPU.
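For reference, the solver setup in my test looks roughly like this (a
sketch with error checking omitted; A, b and x are assumed to be already
created and assembled as MATAIJCUSPARSE / VECSEQCUDA):

  #include <petscksp.h>

  /* Sketch: solve A x = b with unpreconditioned GMRES. */
  static void solve_gmres_nopc(Mat A, Vec b, Vec x)
  {
    KSP ksp;
    PC  pc;
    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPSetType(ksp, KSPGMRES);   /* or KSPGCR */
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCNONE);       /* no preconditioner */
    KSPSetFromOptions(ksp);      /* equivalently: -ksp_type gmres -pc_type none */
    KSPSolve(ksp, b, x);
    KSPDestroy(&ksp);
  }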
2) Where/how are you getting this information about the
MemCpy(HtoD) and MemCpy(DtoH) calls? We might like to utilize
this same sort of information to plan future optimizations.
I am using nvprof and nvvp from the CUDA toolkit. It looks like there is
one MemCpy(HtoD) call and three MemCpy(DtoH) calls per iteration in the
np=1 case. See the attached snapshots.
3) Are you using more than 1 MPI rank?
I tried both np=1 and np=2. Attached please find nvvp snapshots for both
the np=1 and np=2 cases. The figures show the GPU calls during two pure
GMRES iterations.
Thanks.
Xiangdong
If you use the master branch (which we highly recommend for
anyone using GPUs with PETSc) the -log_view option will log
communication between the CPU and GPU and display it in the summary
table. This is useful for seeing exactly which operations are doing
vector communication between the CPU and GPU.
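For example, something along the lines of (./ex is just a placeholder
for your executable; the exact columns may differ between versions)

  mpiexec -n 2 ./ex -ksp_type gmres -pc_type none -log_view

should show, per event, the counts and sizes of the CPU-to-GPU and
GPU-to-CPU copies in the summary table.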
We welcome all feedback on the GPU support since it has so far only
been lightly used.
Barry
> On Jul 16, 2019, at 9:05 PM, Xiangdong via petsc-users
> <[email protected]> wrote:
>
> Hello everyone,
>
> I am new to petsc gpu and have a simple question.
>
> When I tried to solve Ax=b, where A is MATAIJCUSPARSE and b and x
> are VECSEQCUDA, with GMRES (or GCR) and pcnone, I found that during
> each Krylov iteration there is one MemCpy(HtoD) call and one
> MemCpy(DtoH) call. Does that mean the Krylov solve is not 100% on the
> GPU and still needs some work from the CPU? What are these MemCpys
> for during each iteration?
>
> Thank you.
>
> Best,
> Xiangdong