On Thu, Jan 5, 2023 at 9:39 AM Mark Lohry <[email protected]> wrote:
>
>> A workaround is to let petsc build the matrix and allocate the memory,
>> then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up.
>
> Junchao, looking at the code for this, it seems to only return a pointer
> to the value array, but not pointers to the column and row index arrays.
> Is that right?

Yes, that is correct. I am thinking of something like

  MatSeqAIJGetArrayAndMemType(Mat A, const PetscInt **i, const PetscInt **j,
                              PetscScalar **a, PetscMemType *mtype)

which returns (a, i, j) on the device with mtype = PETSC_MEMTYPE_{CUDA, HIP}
if A is a device matrix, and otherwise returns (a, i, j) on the host with
mtype = PETSC_MEMTYPE_HOST. We already have similar routines, such as
VecGetArrayAndMemType(Vec, PetscScalar**, PetscMemType*), and I am adding
MatDenseGetArrayAndMemType(Mat, PetscScalar**, PetscMemType*).
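[Editor's note: for concreteness, a usage sketch of the routine proposed above.
MatSeqAIJGetArrayAndMemType does not exist yet; the name and signature are
only the ones floated in this thread, so treat this as pseudocode.]

```c
const PetscInt *i, *j;
PetscScalar    *a;
PetscMemType    mtype;

MatSeqAIJGetArrayAndMemType(A, &i, &j, &a, &mtype); /* proposed, not yet in petsc */
if (PetscMemTypeDevice(mtype)) {
  /* (a, i, j) are device pointers (CUDA or HIP); fill them in a GPU kernel */
} else {
  /* mtype == PETSC_MEMTYPE_HOST: plain host pointers */
}
```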
It looks like you need (a, i, j) for assembly, but the function above only
works with an already assembled matrix.

> On Thu, Jan 5, 2023 at 5:47 AM Jacob Faibussowitsch <[email protected]> wrote:
>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>> CUPM works with both enabled simultaneously; I don't think there are any
>> direct restrictions on it. Vec, at least, was fully usable with both CUDA
>> and HIP (though untested) the last time I checked.
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>> On Jan 5, 2023, at 00:09, Junchao Zhang <[email protected]> wrote:
>>
>> On Wed, Jan 4, 2023 at 6:02 PM Matthew Knepley <[email protected]> wrote:
>>
>>> On Wed, Jan 4, 2023 at 6:49 PM Junchao Zhang <[email protected]> wrote:
>>>
>>>> On Wed, Jan 4, 2023 at 5:40 PM Mark Lohry <[email protected]> wrote:
>>>>
>>>>> Oh, is the device backend not known at compile time?
>>>>
>>>> Currently it is known at compile time.
>>>
>>> Are you sure? I don't think it is known at compile time.
>>
>> We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
>>
>>> Thanks,
>>>
>>>   Matt
>>>
>>>>> Or can multiple backends be alive at once?
>>>>
>>>> Some petsc developers (Jed and Barry) want to support this, but we are
>>>> not able to yet.
>>>>
>>>>> On Wed, Jan 4, 2023, 6:27 PM Junchao Zhang <[email protected]> wrote:
>>>>>
>>>>>> On Wed, Jan 4, 2023 at 5:19 PM Mark Lohry <[email protected]> wrote:
>>>>>>
>>>>>>>> Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then
>>>>>>>> we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE
>>>>>>>> on AMD GPUs, ...
>>>>>>>
>>>>>>> Wouldn't one function suffice? Assuming these are contiguous arrays
>>>>>>> in CSR format, they're just raw device pointers in all cases.
>>>>>> But we need to know what device it is (to dispatch to either the
>>>>>> petsc-CUDA or the petsc-HIP backend).
>>>>>>
>>>>>>> On Wed, Jan 4, 2023 at 6:02 PM Junchao Zhang <[email protected]> wrote:
>>>>>>>
>>>>>>>> No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for
>>>>>>>> GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but
>>>>>>>> then we would need another for MATMPIAIJCUSPARSE, and then for
>>>>>>>> HIPSPARSE on AMD GPUs, ...
>>>>>>>>
>>>>>>>> The real problem, I think, is dealing with multiple MPI ranks.
>>>>>>>> Providing the split arrays for petsc MATMPIAIJ is not easy, so users
>>>>>>>> are discouraged from doing so.
>>>>>>>>
>>>>>>>> A workaround is to let petsc build the matrix and allocate the
>>>>>>>> memory; then you call MatSeqAIJCUSPARSEGetArray() to get the array
>>>>>>>> and fill it up.
>>>>>>>>
>>>>>>>> We recently added routines to support matrix assembly on GPUs; see
>>>>>>>> if MatSetValuesCOO
>>>>>>>> <https://petsc.org/release/docs/manualpages/Mat/MatSetValuesCOO/>
>>>>>>>> helps.
>>>>>>>>
>>>>>>>> --Junchao Zhang
>>>>>>>>
>>>>>>>> On Wed, Jan 4, 2023 at 2:22 PM Mark Lohry <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I have a sparse matrix constructed in non-petsc code using a
>>>>>>>>> standard CSR representation, where I compute the Jacobian to be
>>>>>>>>> used in an implicit TS context. In the CPU world I call
>>>>>>>>>
>>>>>>>>> MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols,
>>>>>>>>> rowidxptr, colidxptr, valptr, Jac);
>>>>>>>>>
>>>>>>>>> which, as I understand it, (1) never copies/allocates that
>>>>>>>>> information, so the matrix Jac is just a non-owning view into the
>>>>>>>>> already allocated CSR, and (2) lets me write directly into the
>>>>>>>>> original data structures with the Mat just "knowing" about it,
>>>>>>>>> although it still needs a call to MatAssemblyBegin/MatAssemblyEnd
>>>>>>>>> after the values are modified.
>>>>>>>>> So far this works great with GAMG.
>>>>>>>>>
>>>>>>>>> I have the same CSR representation filled in GPU data allocated
>>>>>>>>> with cudaMalloc and filled on-device. Is there an equivalent Mat
>>>>>>>>> constructor for GPU arrays, or some other way to avoid unnecessary
>>>>>>>>> copies?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Mark
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
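[Editor's note: for the COO route Junchao suggests, a minimal host-side
sketch using the MatSetPreallocationCOO/MatSetValuesCOO pair from the linked
manual pages. The function name AssembleCOO is made up, error handling is
abbreviated, and MATAIJCUSPARSE is assumed for the CUDA case; with a device
matrix type, coo_v may be a device pointer.]

```c
#include <petscmat.h>

/* Sketch: assemble an n x n matrix from nnz COO triplets (coo_i, coo_j, coo_v). */
PetscErrorCode AssembleCOO(MPI_Comm comm, PetscInt n, PetscCount nnz,
                           PetscInt coo_i[], PetscInt coo_j[],
                           const PetscScalar coo_v[], Mat *A)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetType(*A, MATAIJCUSPARSE)); /* or MATAIJ for host assembly */
  PetscCall(MatSetPreallocationCOO(*A, nnz, coo_i, coo_j)); /* fixes the pattern */
  PetscCall(MatSetValuesCOO(*A, coo_v, INSERT_VALUES));     /* insert the values */
  PetscFunctionReturn(0);
}
```

For a Jacobian with a fixed sparsity pattern, the preallocation call is done
once and only MatSetValuesCOO is repeated each time the values change.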
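[Editor's note: as a concrete, self-contained illustration of the three
arrays that MatCreateSeqAIJWithArrays() consumes, here is a minimal plain-C
sketch of a 3x3 CSR matrix. The names rowidx/colidx/val mirror the pointers
in Mark's call above but are otherwise made up; no petsc code is involved.]

```c
#include <assert.h>

/* CSR layout of the 3x3 matrix
 *
 *   [ 1 0 2 ]
 *   [ 0 3 0 ]
 *   [ 4 0 5 ]
 *
 * rowidx[r] .. rowidx[r+1] delimit row r's entries in colidx/val,
 * so rowidx has nrows+1 entries and rowidx[nrows] equals the nnz count. */
static const int    rowidx[4] = {0, 2, 3, 5};
static const int    colidx[5] = {0, 2, 1, 0, 2};
static const double val[5]    = {1.0, 2.0, 3.0, 4.0, 5.0};

/* Look up entry (r, c); returns 0.0 for structural zeros. */
static double csr_get(int r, int c)
{
  for (int k = rowidx[r]; k < rowidx[r + 1]; k++)
    if (colidx[k] == c) return val[k];
  return 0.0;
}
```

Because the Mat created this way is a non-owning view, writing through `val`
updates the matrix in place, which is exactly why the subsequent
MatAssemblyBegin/MatAssemblyEnd call is still required.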
