Yes, the user generally only cares whether it is host or device. Sometimes internally, though, we care whether it is host or device, and if device, what kind of device. PetscMemType is a bit overloaded in this sense.
Many matrix “types” allow conversion, so what was once a host matrix can become a device matrix and vice versa. In those cases you’d want to know which pointers are returned. Best regards,
Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch)
Isn't the "MemType" of the matrix an invariant on creation? e.g. a user shouldn't care what memtype a pointer is for, just that if a device matrix was created it returns device pointers, if a host matrix was created it returns host pointers.
Now that I'm looking at those docs I see MatSeqAIJGetCSRAndMemType, isn't this what I'm looking for? If I call MatCreateSeqAIJCUSPARSE it will cudaMalloc the csr arrays, and then MatSeqAIJGetCSRAndMemType will return me those raw device pointers?
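Yes, that is the intended pattern. A minimal sketch (assuming a PETSc build configured with CUDA; the sizes and preallocation numbers are placeholders, and error handling is the standard PetscCall macro):

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat             A;
  const PetscInt *i, *j;
  PetscScalar    *a;
  PetscMemType    mtype;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* 10x10 placeholder matrix, ~3 nonzeros per row */
  PetscCall(MatCreateSeqAIJCUSPARSE(PETSC_COMM_SELF, 10, 10, 3, NULL, &A));
  /* ... set values ... */
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* Raw CSR pointers; mtype reports where they live */
  PetscCall(MatSeqAIJGetCSRAndMemType(A, &i, &j, &a, &mtype));
  if (PetscMemTypeDevice(mtype)) {
    /* i, j, a are device pointers here */
  }
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}
```

Note this still lets PETSc own the allocation; you fill the device arrays it hands back rather than wrapping your own.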
Junchao, looking at the code for this it seems to only return a pointer to the value array, but not pointers to the column and row index arrays, is that right?
Yes, that is correct. I am thinking something like MatSeqAIJGetArrayAndMemType(Mat A, const PetscInt **i, const PetscInt **j, PetscScalar **a, PetscMemType *mtype), which returns (a, i, j) on device and mtype = PETSC_MEMTYPE_{CUDA, HIP} if A is a device matrix, otherwise (a, i, j) on host and mtype = PETSC_MEMTYPE_HOST. We currently have similar things like VecGetArrayAndMemType(Vec, PetscScalar**, PetscMemType*), and I am adding MatDenseGetArrayAndMemType(Mat, PetscScalar**, PetscMemType*).
It looks like you need (a, i, j) for assembly, but the above function only works for an assembled matrix.
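For reference, the existing Vec analogue mentioned above already follows this shape. A hedged sketch (assumes a PETSc build; the helper name Inspect is made up for illustration):

```c
#include <petscvec.h>

static PetscErrorCode Inspect(Vec v)
{
  PetscScalar *arr;
  PetscMemType mtype;

  PetscFunctionBeginUser;
  /* Device pointer + memtype for device vectors, host pointer otherwise */
  PetscCall(VecGetArrayAndMemType(v, &arr, &mtype));
  if (PetscMemTypeHost(mtype)) {
    /* arr is a host pointer */
  } else {
    /* arr lives on the device (PETSC_MEMTYPE_CUDA or PETSC_MEMTYPE_HIP) */
  }
  PetscCall(VecRestoreArrayAndMemType(v, &arr));
  PetscFunctionReturn(PETSC_SUCCESS);
}
```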
We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
CUPM works with both enabled simultaneously; I don’t think there are any direct restrictions on it. Vec at least was fully usable with both CUDA and HIP (though untested) the last time I checked. Best regards,
Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch)
Oh, is the device backend not known at compile time?
Currently it is known at compile time.
Are you sure? I don't think it is known at compile time.
We define either PETSC_HAVE_CUDA or PETSC_HAVE_HIP or NONE, but not both
Or can multiple backends be alive at once?
Thanks,
Matt
Some PETSc developers (Jed and Barry) want to support this, but it is not supported yet.
Wouldn't one function suffice? Assuming these are contiguous arrays in CSR format, they're just raw device pointers in all cases.
But we need to know what device it is (to dispatch to either petsc-CUDA or petsc-HIP backend)
No, we don't have a counterpart of MatCreateSeqAIJWithArrays() for GPUs. Maybe we could add a MatCreateSeqAIJCUSPARSEWithArrays(), but then we would need another for MATMPIAIJCUSPARSE, and then for HIPSPARSE on AMD GPUs, ...
The real problem, I think, is dealing with multiple MPI ranks. Providing the split arrays for petsc MATMPIAIJ is not easy, so users are discouraged from doing it. A workaround is to let petsc build the matrix and allocate the memory; then you call MatSeqAIJCUSPARSEGetArray() to get the array and fill it up. We recently added routines to support matrix assembly on GPUs; see if MatSetValuesCOO helps.
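A hedged sketch of that COO assembly path (assumes a PETSc build with CUDA; the 3x3 matrix and its coordinate arrays are illustrative placeholders):

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat         A;
  PetscInt    coo_i[] = {0, 0, 1, 2};
  PetscInt    coo_j[] = {0, 1, 1, 2};
  PetscScalar v[]     = {2.0, -1.0, 2.0, 2.0};

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 3, 3));
  PetscCall(MatSetType(A, MATAIJCUSPARSE)); /* or MATAIJ on host */
  /* Describe the sparsity once with coordinates ... */
  PetscCall(MatSetPreallocationCOO(A, 4, coo_i, coo_j));
  /* ... then set values; for GPU matrix types, v may itself be a
     device pointer, avoiding a host round-trip */
  PetscCall(MatSetValuesCOO(A, v, INSERT_VALUES));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}
```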
I have a sparse matrix constructed in non-petsc code using a standard CSR representation where I compute the Jacobian to be used in an implicit TS context. In the CPU world I call
MatCreateSeqAIJWithArrays(PETSC_COMM_WORLD, nrows, ncols, rowidxptr, colidxptr, valptr, &Jac);
which as I understand it -- (1) never copies/allocates that information, and the matrix Jac is just a non-owning view into the already allocated CSR, (2) I can write directly into the original data structures and the Mat just "knows" about it, although it still needs a call to MatAssemblyBegin/MatAssemblyEnd after modifying the values. So far this works great with GAMG.
I have the same CSR representation filled in GPU data allocated with cudaMalloc and filled on-device. Is there an equivalent Mat constructor for GPU arrays, or some other way to avoid unnecessary copies?
Thanks, Mark
-- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener