Re: [petsc-users] MemCpy (HtoD and DtoH) in Krylov solver

2019-07-23 Thread Karl Rupp via petsc-users
Hi, I have two quick questions related to running GPU solvers. 1) # of MPI processes vs. # of GPUs: is it true that we should set these two numbers equal if most of the computation is done on the GPU? For one case I tested, with only one GPU, running with np=2 is 15% slower than np=1 (probably due to ...
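One way to check this on a given machine is to run both configurations under -log_view, which reports per-event timing and, in sufficiently recent GPU builds, host/device copy counts. A minimal sketch, assuming a CUDA-enabled PETSc build, a code that calls VecSetFromOptions()/MatSetFromOptions(), and with ./app as a placeholder for your executable:

    mpiexec -n 1 ./app -vec_type cuda -mat_type aijcusparse -log_view
    mpiexec -n 2 ./app -vec_type cuda -mat_type aijcusparse -log_view

With a single GPU, both ranks launch kernels on the same device, so some serialization and extra transfers are expected, consistent with the slowdown reported above.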

Re: [petsc-users] MemCpy (HtoD and DtoH) in Krylov solver

2019-07-19 Thread Karl Rupp via petsc-users
Hi Xiangdong, I can understand some of the numbers, but not the HtoD case. In DtoH1, it is the data movement from VecMDot. The size of the data is 8.192 KB, which is sizeof(PetscScalar) * MDOT_WORKGROUP_NUM * 8 = 8 * 128 * 8 = 8192 bytes. My question is: instead of calling cublasDdot nv times, why do you ...
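For reference, here is a minimal two-stage reduction sketch that reproduces this arithmetic; the names GROUPS, THREADS, dot_partial, and dot_finish are illustrative, not PETSc's actual source. Each of 128 blocks writes one partial sum, and batching 8 dot products per kernel pass is what makes the single DtoH copy 8 * 128 * 8 = 8192 bytes rather than 8 separate scalar copies:

    #include <cuda_runtime.h>

    #define GROUPS  128   /* one partial sum per block; cf. MDOT_WORKGROUP_NUM */
    #define THREADS 128

    /* Stage 1: each block reduces its grid-stride slice to one partial sum. */
    __global__ void dot_partial(const double *x, const double *y, int n,
                                double *partial)
    {
        __shared__ double s[THREADS];
        double sum = 0.0;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            sum += x[i] * y[i];
        s[threadIdx.x] = sum;
        __syncthreads();
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0) partial[blockIdx.x] = s[0];
    }

    /* Stage 2: one DtoH copy of GROUPS doubles, finished on the host.
       A multi-dot variant handling 8 vectors per pass copies
       8 * GROUPS doubles = 8192 bytes in one transfer. */
    double dot_finish(const double *d_x, const double *d_y, int n,
                      double *d_partial)
    {
        double h[GROUPS], result = 0.0;
        dot_partial<<<GROUPS, THREADS>>>(d_x, d_y, n, d_partial);
        cudaMemcpy(h, d_partial, GROUPS * sizeof(double),
                   cudaMemcpyDeviceToHost);
        for (int g = 0; g < GROUPS; ++g) result += h[g];
        return result;
    }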

Re: [petsc-users] MemCpy (HtoD and DtoH) in Krylov solver

2019-07-18 Thread Karl Rupp via petsc-users
Hi, as you can see from the screenshot, the communication is merely for scalars from the dot products and/or norms. These are needed on the host for the control flow and convergence checks, which is true for any iterative solver. Best regards, Karli On 7/18/19 3:11 PM, Xiangdong via ...
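To illustrate the point, a minimal sketch (the helper converged() is hypothetical, not PETSc code) of why every Krylov iteration moves at least a few scalars to the host:

    #include <cublas_v2.h>

    /* The residual norm must live on the host so the CPU can decide
       whether to stop iterating; with the default
       CUBLAS_POINTER_MODE_HOST each call below implies one 8-byte
       DtoH transfer. */
    static int converged(cublasHandle_t handle, const double *d_r, int n,
                         double rnorm0, double rtol)
    {
        double rnorm;
        cublasDnrm2(handle, n, d_r, 1, &rnorm);  /* scalar lands on host */
        return rnorm < rtol * rnorm0;            /* host-side branch */
    }

This is why these small DtoH transfers cannot be eliminated: the branch that terminates the solver loop runs on the CPU.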

Re: [petsc-users] GPUs, cuda, complex

2019-02-24 Thread Karl Rupp via petsc-users
Hi, just for information: we've seen some issues with Thrust in recent CUDA versions (mostly compilation issues). I don't know whether this is the cause of this particular error, though. Best regards, Karli On 2/23/19 6:00 AM, Smith, Barry F. via petsc-users wrote: I get this in the ...