Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-31 Thread Junchao Zhang
On Mon, Jan 31, 2022 at 10:50 AM Fande Kong wrote: > Sorry for the confusion. I thought I explained pretty well :-) > > Good: > > PETSc was linked to /usr/lib64/libcuda for libcuda > > Bad: > > PETSc was linked > to >

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-31 Thread Fande Kong
Sorry for the confusion. I thought I explained pretty well :-) Good: PETSc was linked to /usr/lib64/libcuda for libcuda Bad: PETSc was linked to /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs for libcuda My question would be: where should I

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-31 Thread Junchao Zhang
Fande, From your configure_main.log cuda: Version: 10.1 Includes: -I/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/include Library: -Wl,-rpath,/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-31 Thread Fande Kong
OK, Finally we resolved the issue. The issue was that there were two libcuda libs on a GPU compute node: /usr/lib64/libcuda and /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda. But on a login node there is one libcuda lib:

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-26 Thread Junchao Zhang
Do you have the configure.log with main? --Junchao Zhang On Wed, Jan 26, 2022 at 12:26 PM Fande Kong wrote: > I am on the petsc-main > > commit 1390d3a27d88add7d79c9b38bf1a895ae5e67af6 > > Merge: 96c919c d5f3255 > > Author: Satish Balay > > Date: Wed Jan 26 10:28:32 2022 -0600 > > >

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-26 Thread Fande Kong
I am on the petsc-main commit 1390d3a27d88add7d79c9b38bf1a895ae5e67af6 Merge: 96c919c d5f3255 Author: Satish Balay Date: Wed Jan 26 10:28:32 2022 -0600 Merge remote-tracking branch 'origin/release' It is still broken. Thanks, Fande On Wed, Jan 26, 2022 at 7:40 AM Junchao Zhang

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-26 Thread Fande Kong
On Tue, Jan 25, 2022 at 10:59 PM Barry Smith wrote: > > bad has extra > > -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs > -lcuda > > good does not. > > Try removing the stubs directory and -lcuda from the bad >

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-26 Thread Junchao Zhang
The good uses the compiler's default library/header path. The bad searches from cuda toolkit path and uses rpath linking. Though the paths look the same on the login node, they could have different behavior on a compute node depending on its environment. I think we fixed the issue in cuda.py

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-25 Thread Barry Smith
bad has extra -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs -lcuda good does not. Try removing the stubs directory and -lcuda from the bad $PETSC_ARCH/lib/petsc/conf/variables and likely the bad will start working. Barry I never liked
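Barry's suggestion above amounts to filtering the link line: drop any `-L` search path that points at a `stubs` directory, and drop `-lcuda` itself. A minimal C++ sketch of that filtering follows. The function name `strip_cuda_stubs` and the helper `ends_with` are hypothetical, introduced only for illustration; this is not how PETSc's configure handles it, and it assumes the link line is a plain whitespace-separated string.

```cpp
#include <sstream>
#include <string>

// True if `s` ends with `suffix`.
static bool ends_with(const std::string &s, const std::string &suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: remove `-L<dir>/stubs` search paths and `-lcuda`
// from a whitespace-separated link line, keeping every other token in order.
std::string strip_cuda_stubs(const std::string &link_line) {
  std::istringstream in(link_line);
  std::string tok, out;
  while (in >> tok) {
    const bool is_search_path = tok.rfind("-L", 0) == 0;  // token starts with -L
    const bool stub_dir =
        is_search_path && (ends_with(tok, "/stubs") || ends_with(tok, "/stubs/"));
    if (stub_dir || tok == "-lcuda") continue;  // drop stub path and driver lib
    if (!out.empty()) out += ' ';
    out += tok;
  }
  return out;
}
```

For example, with a made-up toolkit prefix `/opt/cuda`, the input `-L/opt/cuda/lib64 -lcudart -L/opt/cuda/lib64/stubs -lcuda` would come back as `-L/opt/cuda/lib64 -lcudart`. Whether dropping `-lcuda` is safe depends on whether anything in the build links the driver API directly.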

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-25 Thread Junchao Zhang
Fande, do you have a configure.log with current petsc/main? --Junchao Zhang On Tue, Jan 25, 2022 at 10:30 PM Fande Kong wrote: > Hi Junchao, > > I attached a "bad" configure log and a "good" configure log. > > The "bad" one was produced at 246ba74192519a5f34fb6e227d1c64364e19ce2c > > and

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-25 Thread Junchao Zhang
Fande, could you send the configure.log that works (i.e., before this offending commit)? --Junchao Zhang On Tue, Jan 25, 2022 at 8:21 PM Fande Kong wrote: > Not sure if this is helpful. I did "git bisect", and here was the result: > > [kongf@sawtooth2 petsc]$ git bisect bad >

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-25 Thread Fande Kong
Not sure if this is helpful. I did "git bisect", and here was the result: [kongf@sawtooth2 petsc]$ git bisect bad 246ba74192519a5f34fb6e227d1c64364e19ce2c is the first bad commit commit 246ba74192519a5f34fb6e227d1c64364e19ce2c Author: Junchao Zhang Date: Wed Oct 13 05:32:43 2021 +

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-25 Thread Fande Kong
On Tue, Jan 25, 2022 at 9:04 AM Jacob Faibussowitsch wrote: > Configure should not have an impact here I think. The reason I had you run > `cudaGetDeviceCount()` is because this is the CUDA call (and in fact the > only CUDA call) in the initialization sequence that returns the error code. >

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-20 Thread Jed Brown
Junchao Zhang writes: > I don't see values using PetscUnlikely() today. It's usually premature optimization and PetscUnlikelyDebug makes it too easy to skip important checks. But at the time when I added PetscUnlikely, it was important for CHKERRQ(ierr). Specifically, without PetscUnlikely,

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-20 Thread Junchao Zhang
I don't see values using PetscUnlikely() today. --Junchao Zhang On Thu, Jan 20, 2022 at 7:26 PM Jacob Faibussowitsch wrote: > Segfault is caused by the following check at > src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a > PetscUnlikelyDebug() rather than just PetscUnlikely(): >

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-20 Thread Jacob Faibussowitsch
Segfault is caused by the following check at src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a PetscUnlikelyDebug() rather than just PetscUnlikely(): ``` if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and uncaught ``` To clarify: “lazy”
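The difference Jacob describes can be sketched with stand-in macros. The sketch assumes that the debug variant only evaluates its condition in debug builds and is constant-false otherwise, which would explain why a negative `_defaultDevice` goes uncaught in an optimized build. The `SKETCH_*` names and the two functions are hypothetical; these are not PETSc's actual macro definitions.

```cpp
// Hypothetical stand-ins for illustration only; not PETSc's definitions.
#ifndef SKETCH_USE_DEBUG
#define SKETCH_USE_DEBUG 0  // 0 mimics an optimized (non-debug) build
#endif

#if defined(__GNUC__) || defined(__clang__)
// Branch-prediction hint: tells the compiler the condition is rarely true.
#define SKETCH_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#define SKETCH_UNLIKELY(cond) (cond)
#endif

// Debug-only variant: short-circuits to false when debugging is off,
// so the condition is never evaluated in an optimized build.
#define SKETCH_UNLIKELY_DEBUG(cond) (SKETCH_USE_DEBUG && SKETCH_UNLIKELY(cond))

// Returns true when an invalid (negative) device id is caught.
bool caught_always(int default_device) {
  if (SKETCH_UNLIKELY(default_device < 0)) return true;
  return false;
}

// Same check, but through the debug-only macro.
bool caught_debug_only(int default_device) {
  if (SKETCH_UNLIKELY_DEBUG(default_device < 0)) return true;
  return false;
}
```

With `SKETCH_USE_DEBUG` left at 0, `caught_always(-1)` reports the bad id while `caught_debug_only(-1)` lets it through, so any later code that uses the device id would run on an invalid value.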

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-20 Thread Matthew Knepley
On Thu, Jan 20, 2022 at 6:44 PM Fande Kong wrote: > Thanks, Jed > > On Thu, Jan 20, 2022 at 4:34 PM Jed Brown wrote: > >> You can't create CUDA or Kokkos Vecs if you're running on a node without >> a GPU. > > > I am running the code on compute nodes that do have GPUs. > If you are actually

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-20 Thread Fande Kong
Thanks, Jed On Thu, Jan 20, 2022 at 4:34 PM Jed Brown wrote: > You can't create CUDA or Kokkos Vecs if you're running on a node without a > GPU. I am running the code on compute nodes that do have GPUs. With PETSc-3.16.1, I got good speedup by running GAMG on GPUs. That might be a bug of

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-20 Thread Jed Brown
You can't create CUDA or Kokkos Vecs if you're running on a node without a GPU. The point of lazy initialization is to make it possible to run a solve that doesn't use a GPU in PETSC_ARCH that supports GPUs, regardless of whether a GPU is actually present. Fande Kong writes: > I spoke too
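The lazy-initialization idea Jed describes can be illustrated with a small C++ sketch: the device is probed only when something actually asks for it, so a CPU-only solve never triggers the probe even if no GPU is present. The class `LazyDevice`, the `probe` callback, and `cpu_only_solve` are hypothetical names for this sketch and are not taken from PETSc.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical sketch of lazy device initialization; not PETSc code.
class LazyDevice {
public:
  // `probe` stands in for a device query: it returns a device id,
  // or a negative value if no usable device was found.
  explicit LazyDevice(std::function<int()> probe) : probe_(std::move(probe)) {}

  // The probe runs on first use only. An eager scheme would call it at startup.
  int id() {
    if (!initialized_) {
      id_ = probe_();
      initialized_ = true;
    }
    if (id_ < 0) throw std::runtime_error("no usable GPU device");
    return id_;
  }

  bool initialized() const { return initialized_; }

private:
  std::function<int()> probe_;
  bool initialized_ = false;
  int id_ = -1;
};

// A CPU-only code path: it receives the device but never touches it,
// so the probe is never invoked.
double cpu_only_solve(LazyDevice &) { return 42.0; }
```

In this sketch a failing probe only surfaces as an error when `id()` is called, for instance when a GPU vector is requested; the CPU-only path completes without ever initializing the device.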

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-20 Thread Fande Kong
I spoke too soon. It seems that we have trouble creating cuda/kokkos vecs now. Got Segmentation fault. Thanks, Fande Program received signal SIGSEGV, Segmentation fault. 0x2aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-20 Thread Fande Kong
Thanks, Jed, This worked! Fande On Wed, Jan 19, 2022 at 11:03 PM Jed Brown wrote: > Fande Kong writes: > > > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch < > jacob@gmail.com> > > wrote: > > > >> Are you running on login nodes or compute nodes (I can’t seem to tell > from > >>

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-19 Thread Jed Brown
Fande Kong writes: > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch > wrote: > >> Are you running on login nodes or compute nodes (I can’t seem to tell from >> the configure.log)? >> > > I was compiling codes on login nodes, and running codes on compute nodes. > Login nodes do not have

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-19 Thread Fande Kong
On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch wrote: > Are you running on login nodes or compute nodes (I can’t seem to tell from > the configure.log)? > I was compiling codes on login nodes, and running codes on compute nodes. Login nodes do not have GPUs, but compute nodes do have

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-19 Thread Jacob Faibussowitsch
Are you running on login nodes or compute nodes (I can’t seem to tell from the configure.log)? If running from login nodes, do they support running with GPUs? Some clusters will install stub versions of cuda runtime on login nodes (such that configuration can find them), but that won’t

Re: [petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-19 Thread Jacob Faibussowitsch
Hi Fande, What machine are you running this on? Please attach configure.log so I can troubleshoot this. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) > On Jan 19, 2022, at 10:04, Fande Kong wrote: > > Hi All, > > Upgraded PETSc from 3.16.1 to the current main branch.

[petsc-users] Cannot eagerly initialize cuda, as doing so results in cuda error 35 (cudaErrorInsufficientDriver) : CUDA driver version is insufficient for CUDA runtime version

2022-01-19 Thread Fande Kong
Hi All, Upgraded PETSc from 3.16.1 to the current main branch. I suddenly got the following error message: 2d_diffusion]$ ../../../moose_test-dbg -i 2d_diffusion_test.i -use_gpu_aware_mpi 0 -gpu_mat_type aijcusparse -gpu_vec_type cuda -log_view [0]PETSC ERROR: - Error