On Thu, Jan 5, 2023 at 10:24 AM Mark Lohry <mlo...@gmail.com> wrote:
>> I am thinking something like MatSeqAIJGetArrayAndMemType
>
> Isn't the "MemType" of the matrix an invariant fixed at creation? E.g., a
> user shouldn't care what memtype a pointer is; just that if a device
> matrix was created it returns device pointers, and if a host matrix was
> created it returns host pointers.
>
> Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType
> <https://petsc.org/release/docs/manualpages/Mat/MatSeqAIJGetCSRAndMemType/>;
> isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE, it
> will cudaMalloc the CSR arrays, and then MatSeqAIJGetCSRAndMemType will
> return me those raw device pointers?

Yeah, I forgot I added it :). On "a user shouldn't care what memtype a
pointer is": yes, if you can; otherwise you can use mtype to differentiate
your code path.
On Thu, Jan 5, 2023 at 11:06 AM Junchao Zhang <junchao.zh...@gmail.com> wrote:
> On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry <mlo...@gmail.com> wrote:
>>> A workaround is to let petsc build the matrix and allocate the memory,
>>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it
>>> up.
>>
>> Junchao, looking at the code for this, it seems to return only a pointer
>> to the value array, but not pointers to the column and row index arrays.
>> Is that right?
>
> Yes, that is correct.
> I am thinking of something like MatSeqAIJGetArrayAndMemType(Mat A, const
> PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype),
> which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP}
> if A is a device matrix, and otherwise (a, i, j) on host and mtype =
> PETSC_MEMTYPE_HOST.
> We currently have similar things like
> VecGetArrayAndMemType(Vec, PetscScalar**, PetscMemType*), and I am adding
> MatDenseGetArrayAndMemType(Mat, PetscScalar**, PetscMemType*).
>
> It looks like you need (a, i, j) for assembly, but the above function
> only works for an assembled matrix.

On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or neither, but not
>> both
>
> CUPM works with both enabled simultaneously; I don't think there are any
> direct restrictions for it. Vec at least was fully usable with both CUDA
> and HIP (though untested) last time I checked.
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)

On Jan 5, 2023, at 00:09, Junchao Zhang <junchao.zh...@gmail.com> wrote:

On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley <knep...@gmail.com> wrote:
> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry <mlo...@gmail.com> wrote:
>>> Oh, is the device backend not known at compile time?
>>
>> Currently it is known at compile time.
>
> Are you sure? I don't think it is known at compile time.

We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or neither, but not both.

> Thanks,
>
>    Matt

>>> Or can multiple backends be alive at once?
>>
>> Some petsc developers (Jed and Barry) want to support this, but we are
>> not capable of it now.

>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry <mlo...@gmail.com> wrote:
>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
>>>> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on
>>>> AMD GPUs, ...
>>>
>>> Wouldn't one function suffice? Assuming these are contiguous arrays in
>>> CSR format, they're just raw device pointers in all cases.
>>
>> But we need to know what device it is (to dispatch to either the
>> petsc-CUDA or petsc-HIP backend).

On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs.
> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we
> would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD
> GPUs, ...
>
> The real problem, I think, is dealing with multiple MPI ranks. Providing
> the split arrays for petsc MATMPIAIJ is not easy, and thus users are
> discouraged from doing so.
>
> A workaround is to let petsc build the matrix and allocate the memory;
> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>
> We recently added routines to support matrix assembly on GPUs; see if
> MatSetValuesCOO
> <https://petsc.org/release/docs/manualpages/Mat/MatSetValuesCOO/> helps.
>
> --Junchao Zhang
>
> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry <mlo...@gmail.com> wrote:
>> I have a sparse matrix constructed in non-petsc code using a standard
>> CSR representation, where I compute the Jacobian to be used in an
>> implicit TS context. In the CPU world I call
>>
>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr,
>> colidxptr, valptr, Jac);
>>
>> which, as I understand it: (1) never copies or allocates that
>> information, and the matrix Jac is just a non-owning view into the
>> already allocated CSR; (2) I can write directly into the original data
>> structures and the Mat just "knows" about it, although it still needs a
>> call to MatAssemblyBegin/MatAssemblyEnd after modifying the values. So
>> far this works great with GAMG.
>>
>> I have the same CSR representation filled in GPU data allocated with
>> cudaMalloc and filled on-device. Is there an equivalent Mat constructor
>> for GPU arrays, or some other way to avoid unnecessary copies?
>>
>> Thanks,
>> Mark

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>