So the PETSc tests all run, including the test that uses a GPU. The hypre test is failing, and it is impossible to tell from the output why.
You can run it manually:

    cd src/snes/tutorials
    make ex19
    mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -info > somefile

then take a look at the output in somefile and send it to us.

  Barry

> On Jul 14, 2022, at 12:32 PM, Juan Pablo de Lima Costa Salazar via petsc-users <[email protected]> wrote:
>
> Hello,
>
> I was hoping to get help regarding a runtime error I am encountering on a cluster node with 4 Tesla K40m GPUs after configuring PETSc with the following command:
>
> $ ./configure --force \
>       --with-precision=double \
>       --with-debugging=0 \
>       --COPTFLAGS=-O3 \
>       --CXXOPTFLAGS=-O3 \
>       --FOPTFLAGS=-O3 \
>       PETSC_ARCH=linux64GccDPInt32-spack \
>       --download-fblaslapack \
>       --download-openblas \
>       --download-hypre \
>       --download-hypre-configure-arguments=--enable-unified-memory \
>       --with-mpi-dir=/opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.4 \
>       --with-cuda=1 \
>       --download-suitesparse \
>       --download-dir=downloads \
>       --with-cudac=/opt/ohpc/admin/spack/0.15.0/opt/spack/linux-centos8-ivybridge/gcc-9.3.0/cuda-11.7.0-hel25vgwc7fixnvfl5ipvnh34fnskw3m/bin/nvcc \
>       --with-packages-download-dir=downloads \
>       --download-sowing=downloads/v1.1.26-p4.tar.gz \
>       --with-cuda-arch=35
>
> When I run
>
> $ make PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda PETSC_ARCH=linux64GccDPInt32-spack check
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda and PETSC_ARCH=linux64GccDPInt32-spack
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 3,5c3,15
> <   1 SNES Function norm 4.12227e-06
> <   2 SNES Function norm 6.098e-11
> < Number of SNES iterations = 2
> ---
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > --------------------------------------------------------------------------
> > Primary job terminated normally, but 1 process returned
> > a non-zero exit code. Per user-direction, the job has been aborted.
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status,
> > thus causing the job to be terminated. The first process to do so was:
> >
> >   Process name: [[52712,1],0]
> >   Exit code:    1
> > --------------------------------------------------------------------------
> /home/juan/OpenFOAM/juan-v2206/petsc-cuda/src/snes/tutorials
> Possible problem with ex19 running with hypre, diffs above
> =========================================
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> C/C++ example src/snes/tutorials/ex19 run successfully with suitesparse
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
>
> I have compiled the code on the head node (without GPUs) and on the compute node where there are 4 GPUs.
>
> $ nvidia-debugdump -l
> Found 4 NVIDIA devices
>     Device ID:       0
>     Device name:     Tesla K40m
>     GPU internal ID: 0320717032250
>
>     Device ID:       1
>     Device name:     Tesla K40m
>     GPU internal ID: 0320717031968
>
>     Device ID:       2
>     Device name:     Tesla K40m
>     GPU internal ID: 0320717032246
>
>     Device ID:       3
>     Device name:     Tesla K40m
>     GPU internal ID: 0320717032235
>
> Attached are the log files from configure and make.
>
> Any pointers are highly appreciated. My intention is to use PETSc as a linear solver for OpenFOAM, leveraging the availability of GPUs at the same time. Currently I can run PETSc without GPU support.
>
> Cheers,
> Juan S.
>
> <configure.log.tar.gz><make.log.tar.gz>
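[Editor's note: the manual reproduction steps above can be collected into a small script. This is only a sketch under stated assumptions: it assumes PETSC_DIR points at the CUDA-enabled PETSc source tree and that it is run on the GPU node; the ex19 options are taken verbatim from the thread, and the actual build/run commands are left as comments since they require the cluster environment.]

```shell
# Sketch of Barry's manual reproduction of the failing hypre test.
# Assumes PETSC_DIR points at the CUDA-enabled PETSc tree (site-specific).
# Solver options copied verbatim from the thread:
EX19_OPTS="-dm_vec_type cuda -dm_mat_type aijcusparse -da_refine 3 \
-snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -info"

# On the GPU node, one would run:
#   cd "$PETSC_DIR/src/snes/tutorials"
#   make ex19
#   mpiexec -n 1 ./ex19 $EX19_OPTS > somefile 2>&1
# and then inspect somefile for the first CUDA error reported by -info.
echo "mpiexec -n 1 ./ex19 $EX19_OPTS > somefile"
```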
