> On Jun 24, 2022, at 8:58 AM, Mark Adams <[email protected]> wrote: > > And before we move to the MR, I think Matt found a clear problem: > > * PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); > returns "localA seqaij" > > * And, oddly, PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); > returns: > "it seems to detect that the pointer is a device mapped pointer but that it > is invalid"
It does not return a device mapped pointer, it returns a valid host pointer only. MatSeqAIJGetArrayRead() is intended to only return a host pointer, it cannot return a device pointer. MatSeqAIJCusparseGetArrayRead() returns device pointers and should be used for this purpose. > > Matt, lets just comment out the REUSE line and add another INITIAL line > (destroying the old Mat of course), and lets press on. Looking at the code there is no way that simply using INITIAL instead of REUSE will make a code that does not work on the GPU run on the GPU. The MatMPIAIJGetLocalMat() returns only a MATSEQAIJ matrix regardless of the INITIAL versus REUSE and one can never get a device pointer from a non-GPU matrix. As noted by Stefano, the code either needs to use MatMPIAIJGetLocalMatMerge () which does return a CUSPARSE matrix (but the columns are not supported) or MatMPIAIJGetLocalMat() needs to be updated to return a CUSPARSE matrix when the input MPI matrix is a CUSPARSE matrix. > We can keep the debugging code for now. > > We (PETSc) can work on this independently, > > Thanks, > Mark > > On Fri, Jun 24, 2022 at 8:51 AM Mark Adams <[email protected] > <mailto:[email protected]>> wrote: > I am not seeing this response, I see my "hstack" comment last. > https://gitlab.com/petsc/petsc/-/merge_requests/4323 > <https://gitlab.com/petsc/petsc/-/merge_requests/4323> > > On Thu, Jun 23, 2022 at 4:37 PM Barry Smith <[email protected] > <mailto:[email protected]>> wrote: > > I have responded in the MR, which has all the context and the code. Please > move this conversation from petsc-dev to the MR. Note you can use the little > cartoon cloud symbol (upper write of the sub window with my text) to reply > to my post and keep everything in a thread for clarity. > > We are confused because it seems you are trying a variety of things and we > don't know how the different things you tried resulted in the multiple errors > you reported. > > > >> On Jun 23, 2022, at 3:59 PM, Matthew Martineau <[email protected] >> <mailto:[email protected]>> wrote: >> >> I checked in the changes and some debugging statements. >> >> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); >> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, >> MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, "")); >> >> Then the call returns false. If we instead call PetscObjectTypeCompareAny on >> Pmat then it returns true. If you print the type of the matrices: >> >> localA seqaij >> Pmat mpiaijcusparse >> >> If you subsequently call MatSeqAIJCUSPARSEGetArrayRead on localA then it >> errors (presumably because of the type mismatch). >> >> If we call MatSeqAIJGetArrayRead on localA and then pass the `values` to >> AmgX it seems to detect that the pointer is a device mapped pointer but that >> it is invalid. >> >> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); >> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Seems to >> return invalid pointer, but I’ll investigate more >> >> This doesn’t reproduce if we call: >> >> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); >> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Pointer >> appears to be valid and we converge >> >> Essentially all I want to achieve is that when we are parallel, we fetch the >> local part of A and the device pointer to the matrix values from that >> structure so that we can pass to AmgX. Preferring whichever API calls are >> the most efficient. >> >> >> From: Stefano Zampini <[email protected] >> <mailto:[email protected]>> >> Sent: 23 June 2022 20:55 >> To: Mark Adams <[email protected] <mailto:[email protected]>> >> Cc: Barry Smith <[email protected] <mailto:[email protected]>>; For users of >> the development version of PETSc <[email protected] >> <mailto:[email protected]>>; Matthew Martineau <[email protected] >> <mailto:[email protected]>> >> Subject: Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs >> >> External email: Use caution opening links or attachments >> >> The logic is wrong. It should check for MATSEQAIJCUSPARSE. >> >> On Thu, Jun 23, 2022, 21:36 Mark Adams <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <[email protected] >> <mailto:[email protected]>> wrote: >> >> It looks like the current code copies the nonzero values to the CPU from >> the MPI matrix (with the calls >> PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav)); >> PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));, then copies them into >> the CPU memory of the Seq matrix. When the matrix entries are next accessed >> on the GPU it should automatically copy them down to the GPU. So the code >> looks ok even for GPUs. We'll need to see the full error message with what >> the "invalid pointer" is. >> >> I showed Matt how to peek into offloadmask and he found that it is a host >> state, but this is not the issue. The access method should do the copy to >> the device. >> >> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT" in >> the comparison below). >> >> Matt, is the issue that you are calling MatSeqAIJCUSPARSEGetArrayRead and >> getting a host pointer? >> >> I think the state of amgx->localA after the call to >> MatSeqAIJCUSPARSEGetArrayRead should be "BOTH" because this copied the data >> to the device so they are both valid and you should have device data. >> >> 211 PetscBool is_dev_ptrs; >> 212 PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, >> &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, "")); >> 213 >> 214 if (is_dev_ptrs) { >> 216 PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, >> &amgx->values)); >> 217 } else { >> 219 PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); >> 220 } >> >> >> Barry >> >> >> Yes this routine is terribly inefficient for GPU matrices, it needs to be >> specialized to not use the GPU memory but that is a separate issue from >> there being bugs in the current code. >> >> The code also seems to implicitly assume the parallel matrix has the same >> nonzero pattern with a reuse. This should be checked with each use by >> stashing the nonzero state of the matrix into the sequential matrix and >> making sure the parallel matrix has that same stashed value each time. >> Currently if one changes the nonzero matrix of the parallel matrix one is >> likely to get random confusing crashes due to memory corruption. But likely >> not the problem here. >> >> >> On Jun 23, 2022, at 2:23 PM, Mark Adams <[email protected] >> <mailto:[email protected]>> wrote: >> >> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel. >> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass in >> the code below, where the local matrix is created (INITIAL), but in the next >> pass, when "REUSE" is used, he sees an invalid pointer. >> Matt found that it does have offloadmask == CPU. >> Maybe it is missing logic to put the output in same state as the input? >> >> Any ideas on this or should I just dig into it? >> >> Thanks, >> bool partial_setup_allowed = (pc->setupcalled && pc->flag != >> DIFFERENT_NONZERO_PATTERN); >> 199 if (amgx->nranks > 1) { >> 200 if (partial_setup_allowed) { >> 202 PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, >> &amgx->localA)); // This path seems doesn't work by the time we reach AmgX >> API >> 203 } else { >> 205 PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, >> &amgx->localA)); // This path works >> 206 } >> 207 } else { >> 208 amgx->localA = Pmat; >> 209 } >> 210 >
