Hi Mark,

feel free to submit a fresh pull request now. I looked at your latest commit in the repository in order to cherry-pick it, but it looked like it had a few other bits in it as well.

Best regards,
Karli


On 7/28/19 6:27 PM, Mark Adams via petsc-dev wrote:
This is looking good. I'm not seeing the numerical problems, but I've just hid them by avoiding the GPU on coarse grids.

Should I submit a pull request now or test more or wait for Karl?

On Sat, Jul 27, 2019 at 7:37 PM Mark Adams <[email protected] <mailto:[email protected]>> wrote:

    Barry, I fixed CUDA to pin to CPUs correctly for GAMG at least.
    There are some hacks here that we can work on.

    I will start testing it tomorrow, but I am pretty sure that I have
    not regressed. I am hoping that this will fix the numerical
    problems, which seem to be associated with empty processors.

    I did need to touch code outside of GAMG and CUDA. It might be nice
    to test this in a next.

    GAMG now puts all reduced processorg grids on the CPU. This could be
    looked at in the future.


    On Sat, Jul 27, 2019 at 1:00 PM Smith, Barry F. <[email protected]
    <mailto:[email protected]>> wrote:



         > On Jul 27, 2019, at 11:53 AM, Mark Adams <[email protected]
        <mailto:[email protected]>> wrote:
         >
         >
         > On Sat, Jul 27, 2019 at 11:39 AM Smith, Barry F.
        <[email protected] <mailto:[email protected]>> wrote:
         >
         >   Good catch. Thanks. Maybe the SeqCUDA has the same problem?
         >
         > THis is done  (I may have done it).
         >
         > Now it seems to me that when you call VecPinToCPU you are
        setting up and don't have data, so this copy does not seem
        necessary. Maybe remove the copy here:
         >
         > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
         > {
         >   PetscErrorCode ierr;
         >
         >   PetscFunctionBegin;
         >   V->pinnedtocpu = pin;
         >   if (pin) {
         >     ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); ????

            The copy from GPU should actually only do anything if the
        GPU already has data and PETSC_OFFLOAD_GPU. If the GPU does not
        have data
        the copy doesn't do anything. When one calls VecPinToCPU() one
        doesn't know where the data is so the call must be made, but it
        may do nothing

           Note that VecCUDACopyFromGPU() calls
        VecCUDAAllocateCheckHost() not VecCUDAAllocateCheck() so the GPU
        will not allocate space,
        VecCUDAAllocateCheck() is called from VecCUDACopyToGPU().

            Yes, perhaps the naming could be more consistent:

        1) in one place it is Host in an other place it is nothing
        2) some places it is Host, Device, some places GPU,CPU

            Perhaps Karl can make these all consistent and simpler in
        his refactorization


           Barry


         >
         > or
         >
         > Not allocate the GPU if it is pinned by added in a check here:
         >
         > PetscErrorCode VecCUDAAllocateCheck(Vec v)
         > {
         >   PetscErrorCode ierr;
         >   cudaError_t    err;
         >   cudaStream_t   stream;
         >   Vec_CUDA       *veccuda;
         >
         >   PetscFunctionBegin;
         >   if (!v->spptr) {
         >     ierr = PetscMalloc(sizeof(Vec_CUDA),&v->spptr);CHKERRQ(ierr);
         >     veccuda = (Vec_CUDA*)v->spptr;
         > if (v->valid_GPU_array != PETSC_OFFLOAD_CPU) {
         >     err =
        
cudaMalloc((void**)&veccuda->GPUarray_allocated,sizeof(PetscScalar)*((PetscBLASInt)v->map->n));CHKERRCUDA(err);
         >     veccuda->GPUarray = veccuda->GPUarray_allocated;
         >     err = cudaStreamCreate(&stream);CHKERRCUDA(err);
         >     veccuda->stream = stream;
         >     veccuda->hostDataRegisteredAsPageLocked = PETSC_FALSE;
         >     if (v->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) {
         >       if (v->data && ((Vec_Seq*)v->data)->array) {
         >         v->valid_GPU_array = PETSC_OFFLOAD_CPU;
         >       } else {
         >         v->valid_GPU_array = PETSC_OFFLOAD_GPU;
         >       }
         >     }
         > }
         >   }
         >   PetscFunctionReturn(0);
         > }
         >
         >
         >
         >
         >
         > > On Jul 27, 2019, at 10:40 AM, Mark Adams <[email protected]
        <mailto:[email protected]>> wrote:
         > >
         > > Yea, I just figured out the problem. VecDuplicate_MPICUDA
        did not call PinToCPU or even copy pinnedtocpu. It just copied
        ops, so I added and am testing:
         > >
         > >   ierr =
        VecCreate_MPICUDA_Private(*v,PETSC_TRUE,w->nghost,0);CHKERRQ(ierr);
         > >   vw   = (Vec_MPI*)(*v)->data;
         > >   ierr = PetscMemcpy((*v)->ops,win->ops,sizeof(struct
        _VecOps));CHKERRQ(ierr);
         > >   ierr = VecPinToCPU(*v,win->pinnedtocpu);CHKERRQ(ierr);
         > >
         > > Thanks,
         > >
         > > On Sat, Jul 27, 2019 at 11:33 AM Smith, Barry F.
        <[email protected] <mailto:[email protected]>> wrote:
         > >
         > >   I don't understand the context. Once a vector is pinned
        to the CPU the flag should be PETSC_OFFLOAD_CPU permanently
        until the pin to cpu is turned off.  Do you have a pinned vector
        that has the value PETSC_OFFLOAD_GPU?  For example here it is
        set to PETSC_OFFLOAD_CPU
         > >
         > > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin)
         > > {
         > > ....
         > >   if (pin) {
         > >     ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr);
         > >     V->valid_GPU_array = PETSC_OFFLOAD_CPU; /* since the
        CPU code will likely change values in the vector */
         > >
         > >
         > >   Is there any way to reproduce the problem?
         > >
         > >   Barry
         > >
         > >
         > >
         > >
         > > > On Jul 27, 2019, at 10:28 AM, Mark Adams <[email protected]
        <mailto:[email protected]>> wrote:
         > > >
         > > > I'm not sure what to do here. The problem is that
        pinned-to-cpu vectors are calling VecCUDACopyFromGPU here.
         > > >
         > > > Should I set x->valid_GPU_array to something else, like
        PETSC_OFFLOAD_CPU, in PinToCPU so this block of code i s not
        executed?
         > > >
         > > > PetscErrorCode VecGetArray(Vec x,PetscScalar **a)
         > > > {
         > > >   PetscErrorCode ierr;
         > > > #if defined(PETSC_HAVE_VIENNACL)
         > > >   PetscBool      is_viennacltype = PETSC_FALSE;
         > > > #endif
         > > >
         > > >   PetscFunctionBegin;
         > > >   PetscValidHeaderSpecific(x,VEC_CLASSID,1);
         > > >   ierr = VecSetErrorIfLocked(x,1);CHKERRQ(ierr);
         > > >   if (x->petscnative) {
         > > > #if defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_CUDA)
         > > >     if (x->valid_GPU_array == PETSC_OFFLOAD_GPU) {
         > > > #if defined(PETSC_HAVE_VIENNACL)
         > > >       ierr =
        
PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);
         > > >       if (is_viennacltype) {
         > > >         ierr = VecViennaCLCopyFromGPU(x);CHKERRQ(ierr);
         > > >       } else
         > > > #endif
         > > >       {
         > > > #if defined(PETSC_HAVE_CUDA)
         > > >         ierr = VecCUDACopyFromGPU(x);CHKERRQ(ierr);
         > > > #endif
         > > >      }
         > > >     } else if (x->valid_GPU_array ==
        PETSC_OFFLOAD_UNALLOCATED) {
         > > > #if defined(PETSC_HAVE_VIENNACL)
         > > >       ierr =
        
PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr);
         > > >       if (is_viennacltype) {
         > > >         ierr = VecViennaCLAllocateCheckHost(x);CHKERRQ(ierr);
         > > >       } else
         > > > #endif
         > > >       {
         > > > #if defined(PETSC_HAVE_CUDA)
         > > >         ierr = VecCUDAAllocateCheckHost(x);CHKERRQ(ierr);
         > > > #endif
         > > >       }
         > > >     }
         > > > #endif
         > > >     *a = *((PetscScalar**)x->data);
         > > >   } else {
         > > >
         > > >
         > > > On Tue, Jul 23, 2019 at 9:18 PM Smith, Barry F.
        <[email protected] <mailto:[email protected]>> wrote:
         > > >
         > > >  Yes, it needs to be able to switch back and forth
        between the CPU and GPU methods so you need to move into it the
        setting of the methods that is currently directly in the create
        method. See how  MatConvert_SeqAIJ_SeqAIJViennaCL() calls ierr =
        MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr); to set
        the methods for the GPU initially.
         > > >
         > > >   Barry
         > > >
         > > >
         > > > > On Jul 23, 2019, at 7:32 PM, Mark Adams
        <[email protected] <mailto:[email protected]>> wrote:
         > > > >
         > > > >
         > > > >   What are the symptoms of it not working? Does it
        appear to be still copying the matrices to the GPU? then running
        the functions on the GPU?
         > > > >
         > > > >
         > > > > The object is dispatching the CUDA mat-vec etc.
         > > > >
         > > > >   I suspect the pinning is incompletely done for CUDA
        (and MPIOpenCL) matrices.
         > > > >
         > > > >
         > > > > Yes, git grep MatPinToCPU shows stuff for ViennaCL but
        not CUDA.
         > > > >
         > > > > I guess I can add something like this below. Do we need
        to set the device methods? They are already set when this method
        is set, right?
         > > > >
         > > > > We need the equivalent of
         > > > >
         > > > > static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat
        A,PetscBool flg)
         > > > > {
         > > > >   PetscFunctionBegin;
         > > > >   A->pinnedtocpu = flg;
         > > > >   if (flg) {
         > > > >     A->ops->mult           = MatMult_SeqAIJ;
         > > > >     A->ops->multadd        = MatMultAdd_SeqAIJ;
         > > > >     A->ops->assemblyend    = MatAssemblyEnd_SeqAIJ;
         > > > >     A->ops->duplicate      = MatDuplicate_SeqAIJ;
         > > > >   } else {
         > > > >     A->ops->mult           = MatMult_SeqAIJViennaCL;
         > > > >     A->ops->multadd        = MatMultAdd_SeqAIJViennaCL;
         > > > >     A->ops->assemblyend    = MatAssemblyEnd_SeqAIJViennaCL;
         > > > >     A->ops->destroy        = MatDestroy_SeqAIJViennaCL;
         > > > >     A->ops->duplicate      = MatDuplicate_SeqAIJViennaCL;
         > > > >   }
         > > > >   PetscFunctionReturn(0);
         > > > > }
         > > > >
         > > > > for MPIViennaCL and MPISeqAIJ Cusparse but it doesn't
        look like it has been written yet.
         > > > >
         > > > >
         > > > > >
         > > > > > It does not seem to work. It does not look like CUDA
        has an MatCreateVecs. Should I add one and copy this flag over?
         > > > >
         > > > >    We do need this function. But I don't see how it
        relates to pinning. When the matrix is pinned to the CPU we want
        it to create CPU vectors which I assume it does.
         > > > >
         > > > >
         > > > > >
         > > > > > Mark
         > > > >
         > > >
         > >
         >

Reply via email to