> On Jun 24, 2022, at 8:58 AM, Mark Adams <[email protected]> wrote:
> 
> And before we move to the MR, I think Matt found a clear problem:
> 
> * PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
> returns "localA seqaij"
> 
> * And, oddly, PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); 
> returns:
> "it seems to detect that the pointer is a device mapped pointer but that it 
> is invalid"

  It does not return a device-mapped pointer; it returns a valid host pointer 
only. MatSeqAIJGetArrayRead() is intended to return only a host pointer; it 
cannot return a device pointer. MatSeqAIJCUSPARSEGetArrayRead() returns device 
pointers and should be used for this purpose.
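To make the distinction concrete, here is a minimal self-contained mock (not PETSc itself; `MockMat`, `mock_get_array_host`, and `mock_get_array_device` are illustrative stand-ins for the real Mat object and getters): the host getter always hands back host memory, and the device getter is only meaningful when the matrix really has the CUSPARSE type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock matrix: type name plus separate host/device value arrays. */
typedef struct {
  const char   *type_name;     /* e.g. "seqaij" or "seqaijcusparse" */
  const double *host_values;
  const double *device_values; /* only valid for the cusparse type */
} MockMat;

/* Mimics MatSeqAIJGetArrayRead(): always returns a host pointer. */
static const double *mock_get_array_host(const MockMat *A) {
  return A->host_values;
}

/* Mimics MatSeqAIJCUSPARSEGetArrayRead(): a device pointer, but only
 * when the matrix really is the cusparse type (otherwise an error,
 * modeled here as NULL). */
static const double *mock_get_array_device(const MockMat *A) {
  if (strcmp(A->type_name, "seqaijcusparse") != 0) return NULL;
  return A->device_values;
}
```

In this sketch, asking a plain "seqaij" matrix for a device pointer fails, which mirrors the type-mismatch error Matt saw.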

> 
> Matt, let's just comment out the REUSE line and add another INITIAL line 
> (destroying the old Mat of course), and let's press on.

  Looking at the code, there is no way that simply using INITIAL instead of 
REUSE will make code that does not work on the GPU run on the GPU. 
MatMPIAIJGetLocalMat() returns only a MATSEQAIJ matrix regardless of 
INITIAL versus REUSE, and one can never get a device pointer from a non-GPU 
matrix.

   As noted by Stefano, the code either needs to use 
MatMPIAIJGetLocalMatMerge(), which does return a CUSPARSE matrix (but the 
columns are not supported), or MatMPIAIJGetLocalMat() needs to be updated to 
return a CUSPARSE matrix when the input MPI matrix is a CUSPARSE matrix.
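The second option amounts to making the extraction type-aware. A tiny self-contained mock of that idea (illustrative only; `is_cusparse` and `local_mat_type` are hypothetical helpers, not the PETSc implementation): the local matrix type is derived from the parallel matrix type instead of being hard-wired to "seqaij".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the parallel matrix type name indicates a CUSPARSE matrix. */
static bool is_cusparse(const char *mpi_type) {
  return strstr(mpi_type, "cusparse") != NULL;
}

/* Choose the local (sequential) matrix type from the parallel type,
 * so a GPU input yields a GPU local matrix. */
static const char *local_mat_type(const char *mpi_type) {
  return is_cusparse(mpi_type) ? "seqaijcusparse" : "seqaij";
}
```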

> We can keep the debugging code for now.
> 
> We (PETSc) can work on this independently,
> 
> Thanks,
> Mark
> 
> On Fri, Jun 24, 2022 at 8:51 AM Mark Adams <[email protected]> wrote:
> I am not seeing this response; I see my "hstack" comment last.
> https://gitlab.com/petsc/petsc/-/merge_requests/4323
> 
> On Thu, Jun 23, 2022 at 4:37 PM Barry Smith <[email protected]> wrote:
> 
>  I have responded in the MR, which has all the context and the code. Please 
> move this conversation from petsc-dev to the MR. Note you can use the little 
> cartoon cloud symbol (upper right of the sub-window with my text) to reply 
> to my post and keep everything in a thread for clarity.
> 
>   We are confused because it seems you are trying a variety of things and we 
> don't know how the different things you tried resulted in the multiple errors 
> you reported.
> 
> 
> 
>> On Jun 23, 2022, at 3:59 PM, Matthew Martineau <[email protected]> wrote:
>> 
>> I checked in the changes and some debugging statements.
>>  
>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, 
>> MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>  
>> Then the call returns false. If we instead call PetscObjectTypeCompareAny on 
>> Pmat then it returns true. If you print the type of the matrices:
>>  
>> localA seqaij
>> Pmat mpiaijcusparse
>>  
>> If you subsequently call MatSeqAIJCUSPARSEGetArrayRead on localA then it 
>> errors (presumably because of the type mismatch).
>>  
>> If we call MatSeqAIJGetArrayRead on localA and then pass the `values` to 
>> AmgX, it seems to detect that the pointer is a device-mapped pointer but 
>> that it is invalid.
>>  
>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Seems to 
>> return invalid pointer, but I’ll investigate more
>>  
>> This doesn’t reproduce if we call:
>>  
>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Pointer 
>> appears to be valid and we converge
>>  
>> Essentially all I want to achieve is that when we are parallel, we fetch the 
>> local part of A and the device pointer to the matrix values from that 
>> structure so that we can pass them to AmgX, preferring whichever API calls 
>> are the most efficient.
>>  
>>  
>> From: Stefano Zampini <[email protected]> 
>> Sent: 23 June 2022 20:55
>> To: Mark Adams <[email protected]>
>> Cc: Barry Smith <[email protected]>; For users of the development version of 
>> PETSc <[email protected]>; Matthew Martineau <[email protected]>
>> Subject: Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs
>>  
>> The logic is wrong. It should check for MATSEQAIJCUSPARSE. 
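[Editor's note: the check Stefano refers to compares a Mat against VEC type names, which can never match. A self-contained mock of the mechanism (illustrative; `type_compare_any` is a simplified stand-in for PetscObjectTypeCompareAny(), which matches an object's type name against an ""-terminated list):]

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for PetscObjectTypeCompareAny(): true if the object's
 * type name equals any candidate; the list ends with an empty string "". */
static bool type_compare_any(const char *type_name, ...) {
  va_list ap;
  bool match = false;
  va_start(ap, type_name);
  for (const char *cand = va_arg(ap, const char *);
       cand != NULL && cand[0] != '\0';
       cand = va_arg(ap, const char *)) {
    if (strcmp(type_name, cand) == 0) { match = true; break; }
  }
  va_end(ap);
  return match;
}
```

Comparing a matrix's type name against "veccuda"/"vecmpicuda"/"vecseqcuda" is always false, so the code silently falls into the host-pointer branch; checking MAT type names fixes the dispatch.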
>>  
>> On Thu, Jun 23, 2022, 21:36 Mark Adams <[email protected]> wrote:
>>  
>>  
>> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <[email protected]> wrote:
>>  
>>   It looks like the current code copies the nonzero values to the CPU from 
>> the MPI matrix (with the calls 
>> PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav)); and 
>> PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));), then copies them into 
>> the CPU memory of the Seq matrix. When the matrix entries are next accessed 
>> on the GPU, it should automatically copy them down to the GPU, so the code 
>> looks OK even for GPUs. We'll need to see the full error message with what 
>> the "invalid pointer" is.
>>  
>> I showed Matt how to peek into offloadmask and he found that it is a host 
>> state, but this is not the issue. The access method should do the copy to 
>> the device.
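[Editor's note: the offload-mask bookkeeping being discussed can be sketched with a self-contained mock (not PETSc; `MockMat`, `device_read`, and the enum are illustrative stand-ins): data starts valid on the CPU only, and a device-side read access is expected to copy it down and leave the mask at BOTH, which is why a host-only mask should not by itself cause an invalid pointer.]

```c
#include <assert.h>
#include <string.h>

/* Where the matrix values are currently valid. */
typedef enum { OFFLOAD_CPU, OFFLOAD_GPU, OFFLOAD_BOTH } OffloadMask;

typedef struct {
  OffloadMask mask;
  double      host[4];
  double      device[4]; /* stand-in for GPU memory */
} MockMat;

/* A device read access: sync host -> device if needed, then mark BOTH. */
static const double *device_read(MockMat *A) {
  if (A->mask == OFFLOAD_CPU) {
    memcpy(A->device, A->host, sizeof A->host); /* the copy-down */
    A->mask = OFFLOAD_BOTH;
  }
  return A->device;
}
```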
>>  
>> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT" in 
>> the comparison below).
>>  
>> Matt, is the issue that you are calling  MatSeqAIJCUSPARSEGetArrayRead and 
>> getting a host pointer?
>>  
>> I think the state of amgx->localA after the call to 
>> MatSeqAIJCUSPARSEGetArrayRead should be "BOTH" because this copied the data 
>> to the device so they are both valid and you should have device data.
>>  
>> PetscBool is_dev_ptrs;
>> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, 
>> VECCUDA, VECMPICUDA, VECSEQCUDA, ""));
>> 
>> if (is_dev_ptrs) {
>>   PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, &amgx->values));
>> } else {
>>   PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values));
>> }
>>  
>>  
>>   Barry
>>  
>>  
>>  Yes, this routine is terribly inefficient for GPU matrices; it needs to be 
>> specialized to not use the GPU memory, but that is a separate issue from 
>> there being bugs in the current code.
>>  
>>   The code also seems to implicitly assume the parallel matrix has the same 
>> nonzero pattern with a reuse. This should be checked with each use by 
>> stashing the nonzero state of the matrix into the sequential matrix and 
>> making sure the parallel matrix has that same stashed value each time. 
>> Currently, if one changes the nonzero pattern of the parallel matrix, one is 
>> likely to get random confusing crashes due to memory corruption. But that is 
>> likely not the problem here.
>> 
>> 
>> On Jun 23, 2022, at 2:23 PM, Mark Adams <[email protected]> wrote:
>>  
>> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel.
>> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass in 
>> the code below, where the local matrix is created (INITIAL), but in the next 
>> pass, when "REUSE" is used, he sees an invalid pointer.
>> Matt found that it does have offloadmask == CPU. 
>> Maybe it is missing logic to put the output in the same state as the input?
>>  
>> Any ideas on this or should I just dig into it?
>>  
>> Thanks,
>> bool partial_setup_allowed = (pc->setupcalled && pc->flag != 
>> DIFFERENT_NONZERO_PATTERN);
>> if (amgx->nranks > 1) {
>>   if (partial_setup_allowed) {
>>     PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); 
>>     // This path doesn't seem to work by the time we reach the AmgX API
>>   } else {
>>     PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); 
>>     // This path works
>>   }
>> } else {
>>   amgx->localA = Pmat;
>> }
> 
