Hi,

as you can see from the screenshot, the communication is merely for the scalars from the dot products and/or norms. These are needed on the host for control flow and convergence checks; this is true for any iterative solver.
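
To make the point concrete, here is a minimal sketch in plain C. It is not PETSc code and does not run on a GPU. It uses unpreconditioned CG instead of GMRES to keep it short, and the names (dot_to_host, host_scalar_reads, apply_laplacian, cg_solve) are hypothetical. The counter stands in for MemCpy(DtoH) events: all vector work could stay on the device, and only the scalar result of each reduction has to reach the host, where it drives the step sizes and the convergence test.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64

/* Hypothetical counter standing in for MemCpy(DtoH) events: every reduction
   result the host needs is counted as one scalar transfer. */
static int host_scalar_reads = 0;

/* Stand-in for a device-side dot product whose scalar result goes to the host. */
static double dot_to_host(const double *x, const double *y, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += x[i] * y[i];
    ++host_scalar_reads;
    return s;
}

/* y = A x for the 1D Laplacian tridiag(-1, 2, -1); in a GPU solver this
   matrix-vector product would stay entirely on the device. */
static void apply_laplacian(const double *x, double *y, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        double left  = (i > 0)     ? x[i - 1] : 0.0;
        double right = (i + 1 < n) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}

/* Unpreconditioned CG with zero initial guess; returns the iteration count. */
static int cg_solve(const double *b, double *x, size_t n, double rtol, int max_it) {
    double r[MAX_N], p[MAX_N], Ap[MAX_N];
    assert(n <= MAX_N);
    for (size_t i = 0; i < n; ++i) { x[i] = 0.0; r[i] = b[i]; p[i] = b[i]; }

    double rr = dot_to_host(r, r, n);         /* scalar to host: initial norm */
    const double stop = rtol * rtol * rr;
    int it = 0;
    while (it < max_it && rr > stop) {        /* host-side control flow */
        apply_laplacian(p, Ap, n);
        double alpha = rr / dot_to_host(p, Ap, n);   /* scalar to host */
        for (size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot_to_host(r, r, n);        /* scalar to host */
        double beta = rr_new / rr;
        for (size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    return it;
}
```

In this sketch the host reads one scalar before the loop and two per iteration, and nothing else. The exact number of transfers per iteration in a real GMRES run depends on the implementation, so the counts here should not be compared directly with the nvvp output.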

Best regards,
Karli



On 7/18/19 3:11 PM, Xiangdong via petsc-users wrote:


On Thu, Jul 18, 2019 at 5:11 AM Smith, Barry F. <[email protected]> wrote:


        1) What preconditioner are you using? If any.

Currently I am using none, as I want to understand how GMRES works on the GPU.


        2) Where/how are you getting this information about the
    MemCpy(HtoD) and one call MemCpy(DtoH)? We might like to utilize
    this same sort of information to plan future optimizations.

I am using nvprof and nvvp from the CUDA toolkit. It looks like there is one MemCpy(HtoD) call and three MemCpy(DtoH) calls per iteration for the np=1 case. See the attached snapshots.

        3) Are you using more than 1 MPI rank?


I tried both np=1 and np=2. Attached please find snapshots from nvvp for both cases. The figures show the GPU calls for two pure GMRES iterations.

Thanks.
Xiangdong


       If you use the master branch (which we highly recommend for
    anyone using GPUs and PETSc) the -log_view option will log
    communication between CPU and GPU and display it in the summary
    table. This is useful for seeing exactly what operations are doing
    vector communication between the CPU/GPU.

       We welcome all feedback on the GPUs since they have previously
    been only lightly used.

        Barry


     > On Jul 16, 2019, at 9:05 PM, Xiangdong via petsc-users
    <[email protected]> wrote:
     >
     > Hello everyone,
     >
     > I am new to petsc gpu and have a simple question.
     >
     > When I tried to solve Ax=b where A is MATAIJCUSPARSE and b and x
    are VECSEQCUDA with GMRES (or GCR) and pcnone, I found that during
    each Krylov iteration, there is one MemCpy(HtoD) call and one
    MemCpy(DtoH) call. Does that mean the Krylov solve is not 100% on
    the GPU and still needs some work from the CPU? What are these
    MemCpys for during each iteration?
     >
     > Thank you.
     >
     > Best,
     > Xiangdong
