Hi, I was wondering what the official status of 64-bit integer support is in the PETSc GPU backend (specifically CUDA). This question came up while benchmarking some PETSc code and reading through the sources. In particular, I found that PETSc's cuSPARSE SpMV path always seems to use the 32-bit integer interface, even when PETSc is compiled with `--with-64-bit-indices`. After digging around more, I see that PETSc also only ever creates cuSPARSE matrices with 32-bit indices: https://gitlab.com/petsc/petsc/-/blob/v3.19.4/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu?ref_type=tags#L2501. I looked for a switch to 64-bit integers somewhere inside this code, but everything seems to be hardcoded to `THRUSTINTARRAY32`.
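For reference, as far as I can tell the index width in the cuSPARSE generic API is a per-descriptor property: `cusparseCreateCsr()` takes `cusparseIndexType_t` arguments for the row offsets and column indices, and accepts `CUSPARSE_INDEX_64I` as well as `CUSPARSE_INDEX_32I` (whether every SpMV algorithm then supports that combination is a separate question and may depend on the CUDA version). Below is a minimal standalone sketch, independent of PETSc, just illustrating the descriptor-creation call that the linked PETSc code currently hardcodes to 32-bit indices:

```
#include <cusparse.h>
#include <cuda_runtime.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
  // Tiny 2x3 CSR matrix with 3 nonzeros, using 64-bit row offsets and
  // column indices (CUSPARSE_INDEX_64I) instead of the 32-bit variant.
  int64_t rowOffsets[3] = {0, 2, 3};
  int64_t colInd[3]     = {0, 2, 1};
  double  vals[3]       = {1.0, 2.0, 3.0};

  int64_t *dRowOffsets, *dColInd;
  double  *dVals;
  cudaMalloc((void**)&dRowOffsets, sizeof(rowOffsets));
  cudaMalloc((void**)&dColInd, sizeof(colInd));
  cudaMalloc((void**)&dVals, sizeof(vals));
  cudaMemcpy(dRowOffsets, rowOffsets, sizeof(rowOffsets), cudaMemcpyHostToDevice);
  cudaMemcpy(dColInd, colInd, sizeof(colInd), cudaMemcpyHostToDevice);
  cudaMemcpy(dVals, vals, sizeof(vals), cudaMemcpyHostToDevice);

  // The index types are chosen per matrix descriptor, so in principle a
  // 64-bit-index build could pass CUSPARSE_INDEX_64I here instead of always
  // compressing its indices into 32-bit arrays.
  cusparseSpMatDescr_t matA;
  cusparseStatus_t st = cusparseCreateCsr(&matA,
                                          2, 3, 3,            /* rows, cols, nnz */
                                          dRowOffsets, dColInd, dVals,
                                          CUSPARSE_INDEX_64I, /* row offset type */
                                          CUSPARSE_INDEX_64I, /* column index type */
                                          CUSPARSE_INDEX_BASE_ZERO,
                                          CUDA_R_64F);
  printf("cusparseCreateCsr status: %d\n", (int)st);

  cusparseDestroySpMat(matA);
  cudaFree(dRowOffsets);
  cudaFree(dColInd);
  cudaFree(dVals);
  return 0;
}
```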
As expected, this all works when the range of indices in each sparse matrix partition is less than INT_MAX, but the PETSc GPU code breaks in different ways (in calls to cuBLAS and cuSPARSE) on a (synthetic) problem that needs 64-bit indices:

```
#include <limits.h>
#include "petscmat.h"
#include "petscvec.h"
#include "petsc.h"

int main(int argc, char** argv) {
  PetscErrorCode ierr;
  PetscInitialize(&argc, &argv, (char *)0, "GPU bug");

  // A 1 x (2 * INT_MAX) matrix, so column indices overflow 32-bit integers.
  PetscInt numRows = 1;
  PetscInt numCols = (PetscInt)INT_MAX * 2;

  // Matrix with nonzeros in the first and last columns of its single row.
  Mat A;
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, numRows, numCols);
  MatSetType(A, MATMPIAIJ);
  MatSetFromOptions(A);
  MatSetValue(A, 0, 0, 1.0, INSERT_VALUES);
  MatSetValue(A, 0, numCols - 1, 1.0, INSERT_VALUES);
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  // Right-hand-side vector with nonzeros in the first and last entries.
  Vec b;
  ierr = VecCreate(PETSC_COMM_WORLD, &b); CHKERRQ(ierr);
  VecSetSizes(b, PETSC_DECIDE, numCols);
  VecSetFromOptions(b);
  VecSet(b, 0.0);
  VecSetValue(b, 0, 42.0, INSERT_VALUES);
  VecSetValue(b, numCols - 1, 58.0, INSERT_VALUES);
  VecAssemblyBegin(b);
  VecAssemblyEnd(b);

  // Result vector x = A * b.
  Vec x;
  ierr = VecCreate(PETSC_COMM_WORLD, &x); CHKERRQ(ierr);
  VecSetSizes(x, PETSC_DECIDE, numRows);
  VecSetFromOptions(x);
  VecSet(x, 0.0);

  MatMult(A, b, x);

  PetscScalar result;
  VecSum(x, &result);
  PetscPrintf(PETSC_COMM_WORLD, "Result of mult: %f\n", result);

  PetscFinalize();
}
```

When this program is run on CPUs, it outputs 100.0, as expected. When run on a single GPU with `-vec_type cuda -mat_type aijcusparse -use_gpu_aware_mpi 0` it fails with

```
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Argument out of range
[0]PETSC ERROR: 4294967294 is too big for cuBLAS, which may be restricted to 32-bit integers
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
[0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11 09:34:10 2023
[0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices
[0]PETSC ERROR: #1 checkCupmBlasIntCast() at /local/home/rohany/petsc/include/petsc/private/cupmblasinterface.hpp:435
[0]PETSC ERROR: #2 VecAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:335
[0]PETSC ERROR: #3 VecCUPMAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:360
[0]PETSC ERROR: #4 DeviceAllocateCheck_() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:389
[0]PETSC ERROR: #5 GetArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:545
[0]PETSC ERROR: #6 VectorArray() at /local/home/rohany/petsc/include/petsc/private/veccupmimpl.h:273
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
with errorcode 63.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
```

and when run with just `-mat_type aijcusparse -use_gpu_aware_mpi 0` it fails with

```
** On entry to cusparseCreateCsr(): dimension mismatch for CUSPARSE_INDEX_32I, cols (4294967294) + base (0) > INT32_MAX (2147483647)
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: GPU error
[0]PETSC ERROR: cuSPARSE errorcode 3 (CUSPARSE_STATUS_INVALID_VALUE) : invalid value
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.19.4, unknown
[0]PETSC ERROR: ./gpu-bug on a named sean-dgx2 by rohany Fri Aug 11 09:43:07 2023
[0]PETSC ERROR: Configure options --with-cuda=1 --prefix=/local/home/rohany/petsc/petsc-install/ --with-cuda-dir=/usr/local/cuda-11.7/ CXXFLAGS=-O3 COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 --download-fblaslapack=1 --with-debugging=0 --with-64-bit-indices
[0]PETSC ERROR: #1 MatSeqAIJCUSPARSECopyToGPU() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2503
[0]PETSC ERROR: #2 MatMultAddKernel_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3544
[0]PETSC ERROR: #3 MatMult_SeqAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:3485
[0]PETSC ERROR: #4 MatMult_MPIAIJCUSPARSE() at /local/home/rohany/petsc/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:452
[0]PETSC ERROR: #5 MatMult() at /local/home/rohany/petsc/src/mat/interface/matrix.c:2599
```

Thanks,

Rohan Yadav