Hi,
I have two quick questions related to running GPU solvers.
1) # of MPI processes vs # of GPUs. Is it true that we should set these
two numbers equal if most of the computation is done on the GPU? For one
case I tested, with only one GPU, running with np=2 is 15% slower than
np=1 (probably due to
Hi Xiangdong,
I can understand some of the numbers, but not the HtoD case.
In DtoH1, it is the data movement from VecMDot. The size of the data is
8.192 KB, which is sizeof(PetscScalar) * MDOT_WORKGROUP_NUM * 8 = 8*128*8
= 8192 bytes. My question is: instead of calling cublasDdot nv times, why
do you
Hi,
as you can see from the screenshot, the communication is merely for
scalars from the dot products and/or norms. These are needed on the host
for the control flow and convergence checks; this is true for any
iterative solver.
Best regards,
Karli
On 7/18/19 3:11 PM, Xiangdong via
Hi,
just for information: we've seen some issues with Thrust in recent CUDA
versions (mostly compilation issues). I don't know whether this is the
cause of this particular error, though.
Best regards,
Karli
On 2/23/19 6:00 AM, Smith, Barry F. via petsc-users wrote:
I get this in the