I don't understand the notation in the legend on the second page
12,288 cpus and no GPUs ? 24 GPUs? or 6 GPUs 192 GPUs? 1536 GPUs? 12,288 GPUs? or 12288/4 = 3072 GPUs? So on the largest run using GPUs or not takes pretty much exactly the same amount of time? What about 6 GPUs vs 24 CPUs ? Same equal amount of time. Can you send some log summaries > On Jul 29, 2019, at 4:01 PM, Mark Adams <[email protected]> wrote: > > FYI, CUDA is running and here is some preliminary data on up to 1/8 of > SUMMIT. This run with 4 cores/processes per GPU, so the GPU is virtualized > into 4 GPUs. > > On Sun, Jul 28, 2019 at 2:34 PM Karl Rupp <[email protected]> wrote: > Hi Mark, > > feel free to submit a fresh pull request now. I looked at your latest > commit in the repository in order to cherry-pick it, but it looked like > it had a few other bits in it as well. > > Best regards, > Karli > > > On 7/28/19 6:27 PM, Mark Adams via petsc-dev wrote: > > This is looking good. I'm not seeing the numerical problems, but I've > > just hid them by avoiding the GPU on coarse grids. > > > > Should I submit a pull request now or test more or wait for Karl? > > > > On Sat, Jul 27, 2019 at 7:37 PM Mark Adams <[email protected] > > <mailto:[email protected]>> wrote: > > > > Barry, I fixed CUDA to pin to CPUs correctly for GAMG at least. > > There are some hacks here that we can work on. > > > > I will start testing it tomorrow, but I am pretty sure that I have > > not regressed. I am hoping that this will fix the numerical > > problems, which seem to be associated with empty processors. > > > > I did need to touch code outside of GAMG and CUDA. It might be nice > > to test this in a next. > > > > GAMG now puts all reduced processorg grids on the CPU. This could be > > looked at in the future. > > > > > > On Sat, Jul 27, 2019 at 1:00 PM Smith, Barry F. <[email protected] > > <mailto:[email protected]>> wrote: > > > > > > > > > On Jul 27, 2019, at 11:53 AM, Mark Adams <[email protected] > > <mailto:[email protected]>> wrote: > > > > > > > > > On Sat, Jul 27, 2019 at 11:39 AM Smith, Barry F. > > <[email protected] <mailto:[email protected]>> wrote: > > > > > > Good catch. Thanks. Maybe the SeqCUDA has the same problem? > > > > > > THis is done (I may have done it). > > > > > > Now it seems to me that when you call VecPinToCPU you are > > setting up and don't have data, so this copy does not seem > > necessary. Maybe remove the copy here: > > > > > > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin) > > > { > > > PetscErrorCode ierr; > > > > > > PetscFunctionBegin; > > > V->pinnedtocpu = pin; > > > if (pin) { > > > ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); ???? > > > > The copy from GPU should actually only do anything if the > > GPU already has data and PETSC_OFFLOAD_GPU. If the GPU does not > > have data > > the copy doesn't do anything. When one calls VecPinToCPU() one > > doesn't know where the data is so the call must be made, but it > > may do nothing > > > > Note that VecCUDACopyFromGPU() calls > > VecCUDAAllocateCheckHost() not VecCUDAAllocateCheck() so the GPU > > will not allocate space, > > VecCUDAAllocateCheck() is called from VecCUDACopyToGPU(). > > > > Yes, perhaps the naming could be more consistent: > > > > 1) in one place it is Host in an other place it is nothing > > 2) some places it is Host, Device, some places GPU,CPU > > > > Perhaps Karl can make these all consistent and simpler in > > his refactorization > > > > > > Barry > > > > > > > > > > or > > > > > > Not allocate the GPU if it is pinned by added in a check here: > > > > > > PetscErrorCode VecCUDAAllocateCheck(Vec v) > > > { > > > PetscErrorCode ierr; > > > cudaError_t err; > > > cudaStream_t stream; > > > Vec_CUDA *veccuda; > > > > > > PetscFunctionBegin; > > > if (!v->spptr) { > > > ierr = PetscMalloc(sizeof(Vec_CUDA),&v->spptr);CHKERRQ(ierr); > > > veccuda = (Vec_CUDA*)v->spptr; > > > if (v->valid_GPU_array != PETSC_OFFLOAD_CPU) { > > > err = > > > > cudaMalloc((void**)&veccuda->GPUarray_allocated,sizeof(PetscScalar)*((PetscBLASInt)v->map->n));CHKERRCUDA(err); > > > veccuda->GPUarray = veccuda->GPUarray_allocated; > > > err = cudaStreamCreate(&stream);CHKERRCUDA(err); > > > veccuda->stream = stream; > > > veccuda->hostDataRegisteredAsPageLocked = PETSC_FALSE; > > > if (v->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) { > > > if (v->data && ((Vec_Seq*)v->data)->array) { > > > v->valid_GPU_array = PETSC_OFFLOAD_CPU; > > > } else { > > > v->valid_GPU_array = PETSC_OFFLOAD_GPU; > > > } > > > } > > > } > > > } > > > PetscFunctionReturn(0); > > > } > > > > > > > > > > > > > > > > > > > On Jul 27, 2019, at 10:40 AM, Mark Adams <[email protected] > > <mailto:[email protected]>> wrote: > > > > > > > > Yea, I just figured out the problem. VecDuplicate_MPICUDA > > did not call PinToCPU or even copy pinnedtocpu. It just copied > > ops, so I added and am testing: > > > > > > > > ierr = > > VecCreate_MPICUDA_Private(*v,PETSC_TRUE,w->nghost,0);CHKERRQ(ierr); > > > > vw = (Vec_MPI*)(*v)->data; > > > > ierr = PetscMemcpy((*v)->ops,win->ops,sizeof(struct > > _VecOps));CHKERRQ(ierr); > > > > ierr = VecPinToCPU(*v,win->pinnedtocpu);CHKERRQ(ierr); > > > > > > > > Thanks, > > > > > > > > On Sat, Jul 27, 2019 at 11:33 AM Smith, Barry F. > > <[email protected] <mailto:[email protected]>> wrote: > > > > > > > > I don't understand the context. Once a vector is pinned > > to the CPU the flag should be PETSC_OFFLOAD_CPU permanently > > until the pin to cpu is turned off. Do you have a pinned vector > > that has the value PETSC_OFFLOAD_GPU? For example here it is > > set to PETSC_OFFLOAD_CPU > > > > > > > > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin) > > > > { > > > > .... > > > > if (pin) { > > > > ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); > > > > V->valid_GPU_array = PETSC_OFFLOAD_CPU; /* since the > > CPU code will likely change values in the vector */ > > > > > > > > > > > > Is there any way to reproduce the problem? > > > > > > > > Barry > > > > > > > > > > > > > > > > > > > > > On Jul 27, 2019, at 10:28 AM, Mark Adams <[email protected] > > <mailto:[email protected]>> wrote: > > > > > > > > > > I'm not sure what to do here. The problem is that > > pinned-to-cpu vectors are calling VecCUDACopyFromGPU here. > > > > > > > > > > Should I set x->valid_GPU_array to something else, like > > PETSC_OFFLOAD_CPU, in PinToCPU so this block of code i s not > > executed? > > > > > > > > > > PetscErrorCode VecGetArray(Vec x,PetscScalar **a) > > > > > { > > > > > PetscErrorCode ierr; > > > > > #if defined(PETSC_HAVE_VIENNACL) > > > > > PetscBool is_viennacltype = PETSC_FALSE; > > > > > #endif > > > > > > > > > > PetscFunctionBegin; > > > > > PetscValidHeaderSpecific(x,VEC_CLASSID,1); > > > > > ierr = VecSetErrorIfLocked(x,1);CHKERRQ(ierr); > > > > > if (x->petscnative) { > > > > > #if defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_CUDA) > > > > > if (x->valid_GPU_array == PETSC_OFFLOAD_GPU) { > > > > > #if defined(PETSC_HAVE_VIENNACL) > > > > > ierr = > > > > PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr); > > > > > if (is_viennacltype) { > > > > > ierr = VecViennaCLCopyFromGPU(x);CHKERRQ(ierr); > > > > > } else > > > > > #endif > > > > > { > > > > > #if defined(PETSC_HAVE_CUDA) > > > > > ierr = VecCUDACopyFromGPU(x);CHKERRQ(ierr); > > > > > #endif > > > > > } > > > > > } else if (x->valid_GPU_array == > > PETSC_OFFLOAD_UNALLOCATED) { > > > > > #if defined(PETSC_HAVE_VIENNACL) > > > > > ierr = > > > > PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr); > > > > > if (is_viennacltype) { > > > > > ierr = VecViennaCLAllocateCheckHost(x);CHKERRQ(ierr); > > > > > } else > > > > > #endif > > > > > { > > > > > #if defined(PETSC_HAVE_CUDA) > > > > > ierr = VecCUDAAllocateCheckHost(x);CHKERRQ(ierr); > > > > > #endif > > > > > } > > > > > } > > > > > #endif > > > > > *a = *((PetscScalar**)x->data); > > > > > } else { > > > > > > > > > > > > > > > On Tue, Jul 23, 2019 at 9:18 PM Smith, Barry F. > > <[email protected] <mailto:[email protected]>> wrote: > > > > > > > > > > Yes, it needs to be able to switch back and forth > > between the CPU and GPU methods so you need to move into it the > > setting of the methods that is currently directly in the create > > method. See how MatConvert_SeqAIJ_SeqAIJViennaCL() calls ierr = > > MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr); to set > > the methods for the GPU initially. > > > > > > > > > > Barry > > > > > > > > > > > > > > > > On Jul 23, 2019, at 7:32 PM, Mark Adams > > <[email protected] <mailto:[email protected]>> wrote: > > > > > > > > > > > > > > > > > > What are the symptoms of it not working? Does it > > appear to be still copying the matrices to the GPU? then running > > the functions on the GPU? > > > > > > > > > > > > > > > > > > The object is dispatching the CUDA mat-vec etc. > > > > > > > > > > > > I suspect the pinning is incompletely done for CUDA > > (and MPIOpenCL) matrices. > > > > > > > > > > > > > > > > > > Yes, git grep MatPinToCPU shows stuff for ViennaCL but > > not CUDA. > > > > > > > > > > > > I guess I can add something like this below. Do we need > > to set the device methods? They are already set when this method > > is set, right? > > > > > > > > > > > > We need the equivalent of > > > > > > > > > > > > static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat > > A,PetscBool flg) > > > > > > { > > > > > > PetscFunctionBegin; > > > > > > A->pinnedtocpu = flg; > > > > > > if (flg) { > > > > > > A->ops->mult = MatMult_SeqAIJ; > > > > > > A->ops->multadd = MatMultAdd_SeqAIJ; > > > > > > A->ops->assemblyend = MatAssemblyEnd_SeqAIJ; > > > > > > A->ops->duplicate = MatDuplicate_SeqAIJ; > > > > > > } else { > > > > > > A->ops->mult = MatMult_SeqAIJViennaCL; > > > > > > A->ops->multadd = MatMultAdd_SeqAIJViennaCL; > > > > > > A->ops->assemblyend = MatAssemblyEnd_SeqAIJViennaCL; > > > > > > A->ops->destroy = MatDestroy_SeqAIJViennaCL; > > > > > > A->ops->duplicate = MatDuplicate_SeqAIJViennaCL; > > > > > > } > > > > > > PetscFunctionReturn(0); > > > > > > } > > > > > > > > > > > > for MPIViennaCL and MPISeqAIJ Cusparse but it doesn't > > look like it has been written yet. > > > > > > > > > > > > > > > > > > > > > > > > > > It does not seem to work. It does not look like CUDA > > has an MatCreateVecs. Should I add one and copy this flag over? > > > > > > > > > > > > We do need this function. But I don't see how it > > relates to pinning. When the matrix is pinned to the CPU we want > > it to create CPU vectors which I assume it does. > > > > > > > > > > > > > > > > > > > > > > > > > > Mark > > > > > > > > > > > > > > > > > > > > > <strong_su.png><dynamic_range_log.png>
