On Mon, Jan 31, 2022 at 10:50 AM Fande Kong wrote:
> Sorry for the confusion. I thought I had explained it pretty well :-)
>
> Good:
>
> PETSc was linked to /usr/lib64/libcuda for libcuda
>
> Bad:
>
> PETSc was linked
> to
>
Sorry for the confusion. I thought I had explained it pretty well :-)
Good:
PETSc was linked to /usr/lib64/libcuda for libcuda
Bad:
PETSc was linked to
/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs
for libcuda
My question would be: where should I
Fande,
From your configure_main.log:
cuda:
  Version:  10.1
  Includes: -I/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/include
  Library:  -Wl,-rpath,/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64
OK, finally we resolved the issue. The issue was that there were two libcuda
libraries on a GPU compute node: /usr/lib64/libcuda
and
/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda.
But on a login node there is only one libcuda lib:
Do you have the configure.log with main?
--Junchao Zhang
On Wed, Jan 26, 2022 at 12:26 PM Fande Kong wrote:
> I am on the petsc-main
>
> commit 1390d3a27d88add7d79c9b38bf1a895ae5e67af6
>
> Merge: 96c919c d5f3255
>
> Author: Satish Balay
>
> Date: Wed Jan 26 10:28:32 2022 -0600
>
>
>
I am on the petsc-main
commit 1390d3a27d88add7d79c9b38bf1a895ae5e67af6
Merge: 96c919c d5f3255
Author: Satish Balay
Date: Wed Jan 26 10:28:32 2022 -0600
Merge remote-tracking branch 'origin/release'
It is still broken.
Thanks,
Fande
On Wed, Jan 26, 2022 at 7:40 AM Junchao Zhang
On Tue, Jan 25, 2022 at 10:59 PM Barry Smith wrote:
>
> bad has extra
>
> -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs
> -lcuda
>
> good does not.
>
> Try removing the stubs directory and -lcuda from the bad
>
The good build uses the compiler's default library/header paths. The bad build
searches the CUDA toolkit path and uses rpath linking.
Though the paths look the same on the login node, they could behave differently
on a compute node depending on its environment.
I think we fixed the issue in cuda.py
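(A quick way to check which libcuda a build actually picks up at run time is a small
standalone probe like the sketch below. It is only an illustration, not part of PETSc;
it assumes a Linux/glibc system, and the file name which_libcuda.cpp is made up.
Running it on the login node and on a compute node should show whether the stubs
library or /usr/lib64's libcuda gets resolved.)
```
// which_libcuda.cpp -- minimal sketch, not part of PETSc.
// Prints the path the dynamic loader resolves for libcuda.so.1.
// Build with: g++ which_libcuda.cpp -o which_libcuda -ldl
#include <dlfcn.h>
#include <link.h>
#include <cstdio>

int main()
{
  void *handle = dlopen("libcuda.so.1", RTLD_NOW);
  if (!handle) {
    std::fprintf(stderr, "dlopen(libcuda.so.1) failed: %s\n", dlerror());
    return 1;
  }
  struct link_map *lm = nullptr;
  if (dlinfo(handle, RTLD_DI_LINKMAP, &lm) == 0 && lm) {
    // On the compute node this should print the /usr/lib64 copy, not the stubs directory.
    std::printf("libcuda resolved to: %s\n", lm->l_name);
  }
  dlclose(handle);
  return 0;
}
```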
bad has extra
-L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs
-lcuda
good does not.
Try removing the stubs directory and -lcuda from the bad build's
$PETSC_ARCH/lib/petsc/conf/variables, and likely the bad build will start working.
Barry
I never liked
Fande, do you have a configure.log with current petsc/main?
--Junchao Zhang
On Tue, Jan 25, 2022 at 10:30 PM Fande Kong wrote:
> Hi Junchao,
>
> I attached a "bad" configure log and a "good" configure log.
>
> The "bad" one was on produced at 246ba74192519a5f34fb6e227d1c64364e19ce2c
>
> and
Fande, could you send the configure.log that works (i.e., before this
offending commit)?
--Junchao Zhang
On Tue, Jan 25, 2022 at 8:21 PM Fande Kong wrote:
> Not sure if this is helpful. I did "git bisect", and here was the result:
>
> [kongf@sawtooth2 petsc]$ git bisect bad
>
Not sure if this is helpful. I did "git bisect", and here was the result:
[kongf@sawtooth2 petsc]$ git bisect bad
246ba74192519a5f34fb6e227d1c64364e19ce2c is the first bad commit
commit 246ba74192519a5f34fb6e227d1c64364e19ce2c
Author: Junchao Zhang
Date: Wed Oct 13 05:32:43 2021 +
On Tue, Jan 25, 2022 at 9:04 AM Jacob Faibussowitsch
wrote:
> Configure should not have an impact here I think. The reason I had you run
> `cudaGetDeviceCount()` is because this is the CUDA call (and in fact the
> only CUDA call) in the initialization sequence that returns the error code.
>
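(That call can be reproduced outside PETSc with a tiny standalone program, sketched
below on the assumption that the CUDA toolkit is available on the node; probe_cuda.cpp
is a made-up name. Running it on the compute node should print the same raw error code
that the initialization sequence gets back from cudaGetDeviceCount().)
```
// probe_cuda.cpp -- standalone sketch (not PETSc code) that performs the
// cudaGetDeviceCount() call mentioned above and prints its raw error code.
// Build (assumes the CUDA toolkit is installed): nvcc probe_cuda.cpp -o probe_cuda
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
  int         count = 0;
  cudaError_t err   = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    std::printf("cudaGetDeviceCount failed: error %d (%s)\n",
                (int)err, cudaGetErrorString(err));
    return 1;
  }
  std::printf("visible CUDA devices: %d\n", count);
  return 0;
}
```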
Junchao Zhang writes:
> I don't see value in using PetscUnlikely() today.
It's usually premature optimization, and PetscUnlikelyDebug makes it too easy to
skip important checks. But at the time I added PetscUnlikely, it was
important for CHKERRQ(ierr). Specifically, without PetscUnlikely,
I don't see value in using PetscUnlikely() today.
--Junchao Zhang
On Thu, Jan 20, 2022 at 7:26 PM Jacob Faibussowitsch
wrote:
> Segfault is caused by the following check at
> src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a
> PetscUnlikelyDebug() rather than just PetscUnlikely():
>
Segfault is caused by the following check at
src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a
PetscUnlikelyDebug() rather than just PetscUnlikely():
```
if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and uncaught
```
To clarify:
“lazy”
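(For context, the difference between the two macros is roughly as follows; the sketch
below uses stand-in MY_* names and is a simplification, not PETSc's actual definitions.
A PetscUnlikely-style check evaluates its condition in every build and only adds a
branch-prediction hint, while a PetscUnlikelyDebug-style check is dead code in
optimized builds, which is why the negative _defaultDevice above goes uncaught there.)
```
// Simplified illustration with stand-in names (MY_*); PETSc's real macros differ
// in detail. Compile with -DMY_DEBUG_BUILD to keep the debug-only check active.
#include <cstdio>

#define MY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)  // hint: branch rarely taken, still evaluated
#if defined(MY_DEBUG_BUILD)
#  define MY_UNLIKELY_DEBUG(cond) MY_UNLIKELY(cond)      // debug build: check is active
#else
#  define MY_UNLIKELY_DEBUG(cond) (0 && (cond))          // optimized build: branch is dead code
#endif

int main()
{
  const int defaultDevice = -1; // mimics _defaultDevice < 0
  if (MY_UNLIKELY_DEBUG(defaultDevice < 0)) {
    std::fprintf(stderr, "invalid default device caught (debug build)\n");
    return 1;
  }
  std::printf("check skipped: invalid device slips through (optimized build)\n");
  return 0;
}
```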
On Thu, Jan 20, 2022 at 6:44 PM Fande Kong wrote:
> Thanks, Jed
>
> On Thu, Jan 20, 2022 at 4:34 PM Jed Brown wrote:
>
>> You can't create CUDA or Kokkos Vecs if you're running on a node without
>> a GPU.
>
>
> I am running the code on compute nodes that do have GPUs.
>
If you are actually
Thanks, Jed
On Thu, Jan 20, 2022 at 4:34 PM Jed Brown wrote:
> You can't create CUDA or Kokkos Vecs if you're running on a node without a
> GPU.
I am running the code on compute nodes that do have GPUs.
With PETSc 3.16.1, I got a good speedup running GAMG on GPUs. That might
be a bug of
You can't create CUDA or Kokkos Vecs if you're running on a node without a GPU.
The point of lazy initialization is to make it possible to run a solve that
doesn't use a GPU in PETSC_ARCH that supports GPUs, regardless of whether a GPU
is actually present.
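(A schematic of that lazy model, using hypothetical names and no real CUDA calls, just
to make the behavior concrete: nothing touches the device until the first GPU object is
requested, so CPU-only runs work on any node, while a GPU request fails at that point if
no device is visible.)
```
// Schematic sketch of lazy device initialization; hypothetical names, not
// PETSc's implementation. In PETSc the first real probe in this path is
// cudaGetDeviceCount(); here probe_device_count() is a stand-in.
#include <cstdio>
#include <stdexcept>

static int  g_device_count = 0;
static bool g_probed       = false;

static int probe_device_count() { return 0; } // stand-in: pretend no GPU is visible

static void ensure_device()
{
  if (!g_probed) {               // the first GPU request triggers the probe
    g_device_count = probe_device_count();
    g_probed       = true;
  }
  if (g_device_count <= 0) throw std::runtime_error("no GPU visible on this node");
}

void create_cpu_vec() { /* never touches the device */ }
void create_gpu_vec() { ensure_device(); /* ...allocate on the device... */ }

int main()
{
  create_cpu_vec();                                  // fine on login and compute nodes
  std::puts("CPU-only path ran without initializing any device");
  try {
    create_gpu_vec();                                // fails cleanly where no GPU exists
  } catch (const std::exception &e) {
    std::printf("GPU path: %s\n", e.what());
  }
  return 0;
}
```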
Fande Kong writes:
> I spoke too
I spoke too soon. It seems that we have trouble creating CUDA/Kokkos Vecs
now. I got a segmentation fault.
Thanks,
Fande
Program received signal SIGSEGV, Segmentation fault.
0x2aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at
Thanks, Jed,
This worked!
Fande
On Wed, Jan 19, 2022 at 11:03 PM Jed Brown wrote:
> Fande Kong writes:
>
> > On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <
> jacob@gmail.com>
> > wrote:
> >
> >> Are you running on login nodes or compute nodes (I can’t seem to tell
> from
> >>
Fande Kong writes:
> On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch
> wrote:
>
>> Are you running on login nodes or compute nodes (I can’t seem to tell from
>> the configure.log)?
>>
>
> I was compiling codes on login nodes, and running codes on compute nodes.
> Login nodes do not have
On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch
wrote:
> Are you running on login nodes or compute nodes (I can’t seem to tell from
> the configure.log)?
>
I was compiling codes on login nodes, and running codes on compute nodes.
Login nodes do not have GPUs, but compute nodes do have
Are you running on login nodes or compute nodes (I can’t seem to tell from the
configure.log)? If running from login nodes, do they support running with
GPUs? Some clusters will install stub versions of the CUDA runtime on login nodes
(such that configuration can find them), but that won’t
Hi Fande,
What machine are you running this on? Please attach configure.log so I can
troubleshoot this.
Best regards,
Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
> On Jan 19, 2022, at 10:04, Fande Kong wrote:
>
> Hi All,
>
> Upgraded PETSc from 3.16.1 to the current main branch.
Hi All,
Upgraded PETSc from 3.16.1 to the current main branch. I suddenly got the
following error message:
2d_diffusion]$ ../../../moose_test-dbg -i 2d_diffusion_test.i
-use_gpu_aware_mpi 0 -gpu_mat_type aijcusparse -gpu_vec_type cuda
-log_view
[0]PETSC ERROR: - Error