On Thu, Jan 18, 2024 at 4:18 PM Yesypenko, Anna <[email protected]> wrote:
> Hi Matt, Barry,
>
> Apologies for the extra dependency on scipy. I can replicate the error by
> calling setValue (i,j,v) in a loop as well.
> In roughly half of 10 runs, the following script fails because of an error
> in hashmapijv – the same as my original post.
> It successfully runs without error the other times.
>
> Barry is right that it's CUDA specific. The script runs fine on the CPU.
> Do you have any suggestions or example scripts on assigning entries to an
> AIJCUSPARSE matrix?
>

Oh, you definitely do not want to be doing this. I believe you would rather

1) Make the CPU matrix and then convert to AIJCUSPARSE. This is efficient.

2) Produce the values on the GPU and call

     https://petsc.org/main/manualpages/Mat/MatSetPreallocationCOO/
     https://petsc.org/main/manualpages/Mat/MatSetValuesCOO/

   This is what most people do who are forming matrices directly on the GPU.

What you are currently doing is incredibly inefficient, and I think accounts
for you running out of memory. It talks back and forth between the CPU and GPU.

  Thanks,

     Matt

> Here is a minimum snippet that doesn't depend on scipy.
> ```
> from petsc4py import PETSc
> import numpy as np
>
> n = int(5e5);
> nnz = 3 * np.ones(n, dtype=np.int32)
> nnz[0] = nnz[-1] = 2
> A = PETSc.Mat(comm=PETSc.COMM_WORLD)
> A.createAIJ(size=[n,n],comm=PETSc.COMM_WORLD,nnz=nnz)
> A.setType('aijcusparse')
>
> A.setValue(0, 0, 2)
> A.setValue(0, 1, -1)
> A.setValue(n-1, n-2, -1)
> A.setValue(n-1, n-1, 2)
>
> for index in range(1, n - 1):
>     A.setValue(index, index - 1, -1)
>     A.setValue(index, index, 2)
>     A.setValue(index, index + 1, -1)
> A.assemble()
> ```
> If it means anything to you, when the hash error occurs, it is for index
> 67283 after filling 201851 nonzero values.
>
> Thank you for your help and suggestions!
> Anna
>
> ------------------------------
> *From:* Barry Smith <[email protected]>
> *Sent:* Thursday, January 18, 2024 2:35 PM
> *To:* Yesypenko, Anna <[email protected]>
> *Cc:* [email protected] <[email protected]>
> *Subject:* Re: [petsc-users] HashMap Error when populating AIJCUSPARSE matrix
>
>
>    Do you ever get a problem with 'aij'? Can you run in a loop with
> 'aij' to confirm it doesn't fail then?
>
>    Barry
>
>
> On Jan 17, 2024, at 4:51 PM, Yesypenko, Anna <[email protected]> wrote:
>
> Dear PETSc users/developers,
>
> I'm experiencing a bug when using petsc4py with GPU support. It may be my
> mistake in how I set up an AIJCUSPARSE matrix.
> For larger matrices, I sometimes encounter an error in assigning matrix
> values; the error is thrown in PetscHMapIJVQuerySet().
> Here is a minimum snippet that populates a sparse tridiagonal matrix.
>
> ```
> from petsc4py import PETSc
> from scipy.sparse import diags
> import numpy as np
>
> n = int(5e5);
>
> nnz = 3 * np.ones(n, dtype=np.int32); nnz[0] = nnz[-1] = 2
> A = PETSc.Mat(comm=PETSc.COMM_WORLD)
> A.createAIJ(size=[n,n],comm=PETSc.COMM_WORLD,nnz=nnz)
> A.setType('aijcusparse')
> tmp = diags([-1,2,-1],[-1,0,+1],shape=(n,n)).tocsr()
> A.setValuesCSR(tmp.indptr,tmp.indices,tmp.data)
> ####### this is the line where the error is thrown.
> A.assemble()
> ```
>
> The error trace is below:
> ```
> File "petsc4py/PETSc/Mat.pyx", line 2603, in petsc4py.PETSc.Mat.setValuesCSR
> File "petsc4py/PETSc/petscmat.pxi", line 1039, in petsc4py.PETSc.matsetvalues_csr
> File "petsc4py/PETSc/petscmat.pxi", line 1032, in petsc4py.PETSc.matsetvalues_ijv
> petsc4py.PETSc.Error: error code 76
> [0] MatSetValues() at /work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:1497
> [0] MatSetValues_Seq_Hash() at /work/06368/annayesy/ls6/petsc/include/../src/mat/impls/aij/seq/seqhashmatsetvalues.h:52
> [0] PetscHMapIJVQuerySet() at /work/06368/annayesy/ls6/petsc/include/petsc/private/hashmapijv.h:10
> [0] Error in external library
> [0] [khash] Assertion: `ret >= 0' failed.
> ```
>
> If I run the same script a handful of times, it will run without errors
> eventually.
> Does anyone have insight on why it is behaving this way? I'm running on a
> node with 3x NVIDIA A100 PCIE 40GB.
>
> Thank you!
> Anna
>
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
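
For readers of the archive, here is a minimal sketch of the two routes suggested in the reply above, applied to the same tridiagonal test matrix. It is not taken from the thread. It assumes a single-process run and a petsc4py recent enough to expose `Mat.setPreallocationCOO` and `Mat.setValuesCOO` (the Python counterparts of the two manual pages linked above). The helper names `tridiagonal_coo`, `assemble_coo` and `assemble_on_cpu_then_convert` are hypothetical. The index and value arrays here are built with numpy on the host, which is an assumption made to keep the sketch short; producing them on the GPU is the case the reply describes.

```python
import numpy as np


def tridiagonal_coo(n):
    """Return COO triplets (rows, cols, vals) of the n x n tridiagonal [-1, 2, -1] matrix."""
    idx = np.arange(n)
    # Order of entries: main diagonal, then sub-diagonal, then super-diagonal.
    rows = np.concatenate([idx, idx[1:], idx[:-1]])
    cols = np.concatenate([idx, idx[:-1], idx[1:]])
    vals = np.concatenate([np.full(n, 2.0), np.full(n - 1, -1.0), np.full(n - 1, -1.0)])
    return rows, cols, vals


def assemble_coo(n, mat_type="aijcusparse"):
    """Route 2: hand PETSc all entries at once through the COO interface."""
    from petsc4py import PETSc  # imported lazily so the helper above works without PETSc

    rows, cols, vals = tridiagonal_coo(n)
    A = PETSc.Mat().create(comm=PETSc.COMM_SELF)  # assumption: one process
    A.setSizes([n, n])
    A.setType(mat_type)
    # Declare the sparsity pattern once, then supply the values in the same order.
    A.setPreallocationCOO(rows.astype(PETSc.IntType), cols.astype(PETSc.IntType))
    A.setValuesCOO(vals.astype(PETSc.ScalarType))
    return A


def assemble_on_cpu_then_convert(n):
    """Route 1: build a plain 'aij' matrix on the CPU, then convert it in place."""
    A = assemble_coo(n, mat_type="aij")
    A.convert("aijcusparse")
    return A
```

With a CUDA-enabled PETSc build, `assemble_coo(int(5e5))` or `assemble_on_cpu_then_convert(int(5e5))` would replace the per-entry `setValue` loop from the snippets above. Either way the matrix receives its entries in one bulk call instead of roughly 1.5 million individual insertions.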
