Good catch. Thanks. Maybe the SeqCUDA has the same problem?
> On Jul 27, 2019, at 10:40 AM, Mark Adams <[email protected]> wrote: > > Yea, I just figured out the problem. VecDuplicate_MPICUDA did not call > PinToCPU or even copy pinnedtocpu. It just copied ops, so I added and am > testing: > > ierr = VecCreate_MPICUDA_Private(*v,PETSC_TRUE,w->nghost,0);CHKERRQ(ierr); > vw = (Vec_MPI*)(*v)->data; > ierr = PetscMemcpy((*v)->ops,win->ops,sizeof(struct _VecOps));CHKERRQ(ierr); > ierr = VecPinToCPU(*v,win->pinnedtocpu);CHKERRQ(ierr); > > Thanks, > > On Sat, Jul 27, 2019 at 11:33 AM Smith, Barry F. <[email protected]> wrote: > > I don't understand the context. Once a vector is pinned to the CPU the flag > should be PETSC_OFFLOAD_CPU permanently until the pin to cpu is turned off. > Do you have a pinned vector that has the value PETSC_OFFLOAD_GPU? For > example here it is set to PETSC_OFFLOAD_CPU > > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin) > { > .... > if (pin) { > ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); > V->valid_GPU_array = PETSC_OFFLOAD_CPU; /* since the CPU code will likely > change values in the vector */ > > > Is there any way to reproduce the problem? > > Barry > > > > > > On Jul 27, 2019, at 10:28 AM, Mark Adams <[email protected]> wrote: > > > > I'm not sure what to do here. The problem is that pinned-to-cpu vectors are > > calling VecCUDACopyFromGPU here. > > > > Should I set x->valid_GPU_array to something else, like PETSC_OFFLOAD_CPU, > > in PinToCPU so this block of code i s not executed? > > > > PetscErrorCode VecGetArray(Vec x,PetscScalar **a) > > { > > PetscErrorCode ierr; > > #if defined(PETSC_HAVE_VIENNACL) > > PetscBool is_viennacltype = PETSC_FALSE; > > #endif > > > > PetscFunctionBegin; > > PetscValidHeaderSpecific(x,VEC_CLASSID,1); > > ierr = VecSetErrorIfLocked(x,1);CHKERRQ(ierr); > > if (x->petscnative) { > > #if defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_CUDA) > > if (x->valid_GPU_array == PETSC_OFFLOAD_GPU) { > > #if defined(PETSC_HAVE_VIENNACL) > > ierr = > > PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr); > > if (is_viennacltype) { > > ierr = VecViennaCLCopyFromGPU(x);CHKERRQ(ierr); > > } else > > #endif > > { > > #if defined(PETSC_HAVE_CUDA) > > ierr = VecCUDACopyFromGPU(x);CHKERRQ(ierr); > > #endif > > } > > } else if (x->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) { > > #if defined(PETSC_HAVE_VIENNACL) > > ierr = > > PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr); > > if (is_viennacltype) { > > ierr = VecViennaCLAllocateCheckHost(x);CHKERRQ(ierr); > > } else > > #endif > > { > > #if defined(PETSC_HAVE_CUDA) > > ierr = VecCUDAAllocateCheckHost(x);CHKERRQ(ierr); > > #endif > > } > > } > > #endif > > *a = *((PetscScalar**)x->data); > > } else { > > > > > > On Tue, Jul 23, 2019 at 9:18 PM Smith, Barry F. <[email protected]> wrote: > > > > Yes, it needs to be able to switch back and forth between the CPU and GPU > > methods so you need to move into it the setting of the methods that is > > currently directly in the create method. See how > > MatConvert_SeqAIJ_SeqAIJViennaCL() calls ierr = > > MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr); to set the methods > > for the GPU initially. > > > > Barry > > > > > > > On Jul 23, 2019, at 7:32 PM, Mark Adams <[email protected]> wrote: > > > > > > > > > What are the symptoms of it not working? Does it appear to be still > > > copying the matrices to the GPU? then running the functions on the GPU? > > > > > > > > > The object is dispatching the CUDA mat-vec etc. > > > > > > I suspect the pinning is incompletely done for CUDA (and MPIOpenCL) > > > matrices. > > > > > > > > > Yes, git grep MatPinToCPU shows stuff for ViennaCL but not CUDA. > > > > > > I guess I can add something like this below. Do we need to set the device > > > methods? They are already set when this method is set, right? > > > > > > We need the equivalent of > > > > > > static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat A,PetscBool flg) > > > { > > > PetscFunctionBegin; > > > A->pinnedtocpu = flg; > > > if (flg) { > > > A->ops->mult = MatMult_SeqAIJ; > > > A->ops->multadd = MatMultAdd_SeqAIJ; > > > A->ops->assemblyend = MatAssemblyEnd_SeqAIJ; > > > A->ops->duplicate = MatDuplicate_SeqAIJ; > > > } else { > > > A->ops->mult = MatMult_SeqAIJViennaCL; > > > A->ops->multadd = MatMultAdd_SeqAIJViennaCL; > > > A->ops->assemblyend = MatAssemblyEnd_SeqAIJViennaCL; > > > A->ops->destroy = MatDestroy_SeqAIJViennaCL; > > > A->ops->duplicate = MatDuplicate_SeqAIJViennaCL; > > > } > > > PetscFunctionReturn(0); > > > } > > > > > > for MPIViennaCL and MPISeqAIJ Cusparse but it doesn't look like it has > > > been written yet. > > > > > > > > > > > > > > It does not seem to work. It does not look like CUDA has an > > > > MatCreateVecs. Should I add one and copy this flag over? > > > > > > We do need this function. But I don't see how it relates to pinning. > > > When the matrix is pinned to the CPU we want it to create CPU vectors > > > which I assume it does. > > > > > > > > > > > > > > Mark > > > > > >
