It appears to be crashing in kh_resize() in khash.h on a memory allocation failure when it tries to get additional memory for storing the matrix.
This code seems to be only using the CPU memory so it should also fail in a similar way with 'aij'. But the matrix is not large and so I don't think it should be running out of memory. I cannot reproduce the crash with the same parameters on my non-CUDA machine so debugging will be tricky.

  Barry

> On Jan 18, 2024, at 3:35 PM, Barry Smith <[email protected]> wrote:
>
>   Do you ever get a problem with 'aij'? Can you run in a loop with 'aij'
> to confirm it doesn't fail then?
>
>   Barry
>
>> On Jan 17, 2024, at 4:51 PM, Yesypenko, Anna <[email protected]> wrote:
>>
>> Dear Petsc users/developers,
>>
>> I'm experiencing a bug when using petsc4py with GPU support. It may be my
>> mistake in how I set up an AIJCUSPARSE matrix.
>> For larger matrices, I sometimes encounter an error in assigning matrix
>> values; the error is thrown in PetscHMapIJVQuerySet().
>> Here is a minimum snippet that populates a sparse tridiagonal matrix.
>>
>> ```
>> from petsc4py import PETSc
>> from scipy.sparse import diags
>> import numpy as np
>>
>> n = int(5e5);
>>
>> nnz = 3 * np.ones(n, dtype=np.int32); nnz[0] = nnz[-1] = 2
>> A = PETSc.Mat(comm=PETSc.COMM_WORLD)
>> A.createAIJ(size=[n,n],comm=PETSc.COMM_WORLD,nnz=nnz)
>> A.setType('aijcusparse')
>> tmp = diags([-1,2,-1],[-1,0,+1],shape=(n,n)).tocsr()
>> A.setValuesCSR(tmp.indptr,tmp.indices,tmp.data)
>> ####### this is the line where the error is thrown.
>> A.assemble()
>> ```
>>
>> The error trace is below:
>> ```
>> File "petsc4py/PETSc/Mat.pyx", line 2603, in petsc4py.PETSc.Mat.setValuesCSR
>> File "petsc4py/PETSc/petscmat.pxi", line 1039, in
>> petsc4py.PETSc.matsetvalues_csr
>> File "petsc4py/PETSc/petscmat.pxi", line 1032, in
>> petsc4py.PETSc.matsetvalues_ijv
>> petsc4py.PETSc.Error: error code 76
>> [0] MatSetValues() at
>> /work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:1497
>> [0] MatSetValues_Seq_Hash() at
>> /work/06368/annayesy/ls6/petsc/include/../src/mat/impls/aij/seq/seqhashmatsetvalues.h:52
>> [0] PetscHMapIJVQuerySet() at
>> /work/06368/annayesy/ls6/petsc/include/petsc/private/hashmapijv.h:10
>> [0] Error in external library
>> [0] [khash] Assertion: `ret >= 0' failed.
>> ```
>>
>> If I run the same script a handful of times, it will run without errors
>> eventually.
>> Does anyone have insight on why it is behaving this way? I'm running on a
>> node with 3x NVIDIA A100 PCIE 40GB.
>>
>> Thank you!
>> Anna
>
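For a rough sense of scale behind the remark that the matrix is not large, here is a back-of-envelope estimate of how much memory a khash-style table would need to hold every nonzero of the tridiagonal matrix in the snippet. It is only a sketch under stated assumptions, not a description of what PETSc actually allocates: the helper `khash_table_bytes` is hypothetical, the 0.77 load factor and 2 flag bits per bucket are assumed from khash's usual layout, and the key and value sizes (two indices plus one double-precision scalar) are assumptions about the build.

```python
# Rough size of a khash-style open-addressing table holding every nonzero
# of the n x n tridiagonal matrix from the report (pure arithmetic, no PETSc).

def khash_table_bytes(n_entries, key_bytes, val_bytes, load_factor=0.77):
    """Return (buckets, bytes) for a power-of-two table that holds n_entries.

    Assumptions: the bucket count doubles until n_entries fits under the
    load factor; keys and values live in two flat arrays; 2 flag bits
    are kept per bucket.
    """
    buckets = 4
    while buckets * load_factor < n_entries:
        buckets *= 2
    flag_bytes = max(buckets // 16, 1) * 4  # 16 buckets per 32-bit flag word
    return buckets, buckets * (key_bytes + val_bytes) + flag_bytes

n = int(5e5)
nnz_total = 3 * n - 2  # tridiagonal: 3 per row, 2 in the first and last rows

# Assumed sizes: key = two 32-bit indices (8 bytes), value = one double (8 bytes).
buckets32, bytes32 = khash_table_bytes(nnz_total, key_bytes=8, val_bytes=8)
# Same estimate if the build used 64-bit indices (16-byte key).
buckets64, bytes64 = khash_table_bytes(nnz_total, key_bytes=16, val_bytes=8)

print(f"entries: {nnz_total}, buckets: {buckets32}")
print(f"32-bit indices: {bytes32 / 2**20:.1f} MiB")
print(f"64-bit indices: {bytes64 / 2**20:.1f} MiB")
```

Under these assumptions the table comes out at tens of MiB, and stays far below the memory of a typical compute node even if a resize briefly needs extra working space, which is consistent with the view that a genuine out-of-memory condition is unlikely here.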
