Fande,

From your configure_main.log:

  cuda:
    Version:  10.1
    Includes: -I/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/include
    Library:  -Wl,-rpath,/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64 -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64 -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs -lcudart -lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda

You can see the `stubs` directory is not in the rpath. We took a lot of effort to achieve that. You need to double-check the reason.

--Junchao Zhang

On Mon, Jan 31, 2022 at 9:40 AM Fande Kong <fdkong...@gmail.com> wrote:

OK, finally we resolved the issue. The issue was that there were two libcuda libraries on a GPU compute node: /usr/lib64/libcuda and /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda. On a login node there is only one libcuda library: /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda. We cannot see /usr/lib64/libcuda from a login node, which is where I was compiling the code.

Before Junchao's commit, we did not have "-Wl,-rpath" forcing PETSc to take /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda, so a code compiled on a login node could correctly pick up the CUDA library from /usr/lib64/libcuda at runtime. With "-Wl,-rpath", the code always takes the CUDA library from /apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs/libcuda, which is a bad library.

Right now, I just compiled the code on a compute node instead of a login node; PETSc was able to pick up the correct library from /usr/lib64/libcuda, and everything ran fine.

I am not sure whether it is a good idea to search for "stubs", since the system might have the correct libraries in other places. Should I instead do the compiling in a batch job on a compute node?

Thanks,

Fande

On Wed, Jan 26, 2022 at 1:49 PM Fande Kong <fdkong...@gmail.com> wrote:

Yes, please see the attached file.

Fande

On Wed, Jan 26, 2022 at 11:49 AM Junchao Zhang <junchao.zh...@gmail.com> wrote:

Do you have the configure.log with main?

--Junchao Zhang

On Wed, Jan 26, 2022 at 12:26 PM Fande Kong <fdkong...@gmail.com> wrote:

I am on petsc-main:

  commit 1390d3a27d88add7d79c9b38bf1a895ae5e67af6
  Merge: 96c919c d5f3255
  Author: Satish Balay <ba...@mcs.anl.gov>
  Date:   Wed Jan 26 10:28:32 2022 -0600

      Merge remote-tracking branch 'origin/release'

It is still broken.

Thanks,

Fande

On Wed, Jan 26, 2022 at 7:40 AM Junchao Zhang <junchao.zh...@gmail.com> wrote:

The good one uses the compiler's default library/header path. The bad one searches the CUDA toolkit path and uses rpath linking. Though the paths look the same on the login node, they could behave differently on a compute node depending on its environment. I think we fixed the issue in cuda.py (i.e., first try the compiler's default, then the toolkit). That's why I wanted Fande to use petsc/main.

--Junchao Zhang

On Tue, Jan 25, 2022 at 11:59 PM Barry Smith <bsm...@petsc.dev> wrote:

bad has extra

  -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs -lcuda

good does not.

Try removing the stubs directory and -lcuda from the bad $PETSC_ARCH/lib/petsc/conf/variables, and likely the bad will start working.

Barry

I never liked the stubs stuff.
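A minimal diagnostic along these lines (a sketch, assuming only the CUDA runtime API; it is not one of the logs attached to this thread) can show at runtime whether a binary picked up the real /usr/lib64/libcuda or the stub: with the stub loaded, the driver version typically reports as 0 and cudaGetDeviceCount() fails with cudaErrorInsufficientDriver (error 35).

```c
/* Hedged sketch: report driver/runtime versions and device count to tell
 * whether the real libcuda or the link-time stub was loaded at runtime. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
  int driverVersion = 0, runtimeVersion = 0, ndev = 0;

  /* Version reported by the loaded libcuda; typically 0 when no usable driver (e.g. the stub) is found. */
  cudaDriverGetVersion(&driverVersion);
  /* Version of the CUDA runtime (libcudart) the binary uses. */
  cudaRuntimeGetVersion(&runtimeVersion);

  cudaError_t err = cudaGetDeviceCount(&ndev);

  printf("driver version  : %d\n", driverVersion);
  printf("runtime version : %d\n", runtimeVersion);
  printf("device count    : %d (error %d: %s)\n", ndev, (int)err, cudaGetErrorString(err));
  return (int)err; /* nonzero, e.g. 35 (cudaErrorInsufficientDriver), suggests the wrong libcuda */
}
```

Running this, together with `ldd ./a.out | grep libcuda`, on both a login node and a compute node would make the difference between the two libcuda libraries visible.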
On Jan 25, 2022, at 11:29 PM, Fande Kong <fdkong...@gmail.com> wrote:

Hi Junchao,

I attached a "bad" configure log and a "good" configure log. The "bad" one was produced at 246ba74192519a5f34fb6e227d1c64364e19ce2c and the "good" one at 384645a00975869a1aacbd3169de62ba40cad683. The good hash is the last good one, immediately before the bad one.

I think you could compare these two logs and check what the differences are.

Thanks,

Fande

On Tue, Jan 25, 2022 at 8:21 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:

Fande, could you send the configure.log that works (i.e., before this offending commit)?

--Junchao Zhang

On Tue, Jan 25, 2022 at 8:21 PM Fande Kong <fdkong...@gmail.com> wrote:

Not sure if this is helpful. I did "git bisect", and here is the result:

  [kongf@sawtooth2 petsc]$ git bisect bad
  246ba74192519a5f34fb6e227d1c64364e19ce2c is the first bad commit
  commit 246ba74192519a5f34fb6e227d1c64364e19ce2c
  Author: Junchao Zhang <jczh...@mcs.anl.gov>
  Date:   Wed Oct 13 05:32:43 2021 +0000

      Config: fix CUDA library and header dirs

  :040000 040000 187c86055adb80f53c1d0565a8888704fec43a96 ea1efd7f594fd5e8df54170bc1bc7b00f35e4d5f M      config

Starting from this commit, GPU did not work for me on our HPC.

Thanks,

Fande

On Tue, Jan 25, 2022 at 7:18 PM Fande Kong <fdkong...@gmail.com> wrote:

On Tue, Jan 25, 2022 at 9:04 AM Jacob Faibussowitsch <jacob....@gmail.com> wrote:

Configure should not have an impact here, I think. The reason I had you run `cudaGetDeviceCount()` is that this is the CUDA call (and in fact the only CUDA call) in the initialization sequence that returns the error code. There should be no prior CUDA calls. Maybe this is a problem with oversubscribing GPUs? In the runs that crash, how many ranks are using any given GPU at once? Maybe MPS is required.

I used one MPI rank.

Fande

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

On Jan 21, 2022, at 12:01, Fande Kong <fdkong...@gmail.com> wrote:

Thanks Jacob,

On Thu, Jan 20, 2022 at 6:25 PM Jacob Faibussowitsch <jacob....@gmail.com> wrote:

The segfault is caused by the following check at src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a PetscUnlikelyDebug() rather than just PetscUnlikely():

```
if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and uncaught
```

To clarify:

"Lazy" initialization is not that lazy after all; it still does some 50% of the initialization that "eager" initialization does. It stops short of initializing the CUDA runtime, checking CUDA-aware MPI, gathering device data, and initializing cuBLAS and friends. Lazy also, importantly, swallows any errors that crop up during initialization, storing the resulting error code for later (specifically _defaultDevice = -init_error_value;).

So whether you initialize lazily or eagerly makes no difference here, as _defaultDevice will always contain -35.
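In other words, a simplified sketch of the pattern being described (hypothetical names, not the actual PETSc source): the failure from cudaGetDeviceCount() is recorded as a negative sentinel during lazy initialization and only surfaces when a device is first requested.

```c
/* Hedged sketch of the error-swallowing pattern; the real logic lives in
 * src/sys/objects/device/impls/cupm/cupmdevice.cxx. */
#include <cuda_runtime.h>

static int defaultDevice = 0; /* >= 0: usable device id, < 0: negated CUDA error code */

static void lazy_initialize(void)
{
  int ndev = 0;
  cudaError_t cerr = cudaGetDeviceCount(&ndev);
  /* Swallow the failure: record it instead of erroring out immediately. */
  if (cerr != cudaSuccess) defaultDevice = -(int)cerr; /* e.g. -35 for cudaErrorInsufficientDriver */
}

static int get_device(int *device)
{
  /* The stored failure only matters once a device is actually requested. If this
   * check is compiled out of optimized builds (the PetscUnlikelyDebug() issue
   * described above), the negative value escapes as a "device id" and the caller
   * crashes later. */
  if (defaultDevice < 0) return -defaultDevice; /* report the saved CUDA error */
  *device = defaultDevice;
  return 0;
}

int main(void)
{
  int dev = -1;
  lazy_initialize();       /* no visible failure yet, even without a usable driver */
  return get_device(&dev); /* the saved error (e.g. 35) surfaces only here */
}
```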
The bigger question is why cudaGetDeviceCount() is returning cudaErrorInsufficientDriver. Can you compile and run

```
#include <cuda_runtime.h>

int main()
{
  int ndev;
  return cudaGetDeviceCount(&ndev);
}
```

and then show the value of "echo $?"?

I modified your code a little to get more information:

```
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
  int ndev;
  int error = cudaGetDeviceCount(&ndev);
  printf("ndev %d \n", ndev);
  printf("error %d \n", error);
  return 0;
}
```

Results:

  $ ./a.out
  ndev 4
  error 0

I have not read the PETSc CUDA initialization code yet. If I had to guess at what is happening, I would naively think that PETSc did not get correct GPU information during configuration because the compile node does not have GPUs, so there was no way to get any GPU device information. At runtime on the GPU nodes, PETSc might then use the incorrect information grabbed during configuration and produce this kind of false error message.

Thanks,

Fande

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

On Jan 20, 2022, at 17:47, Matthew Knepley <knep...@gmail.com> wrote:

On Thu, Jan 20, 2022 at 6:44 PM Fande Kong <fdkong...@gmail.com> wrote:

Thanks, Jed

On Thu, Jan 20, 2022 at 4:34 PM Jed Brown <j...@jedbrown.org> wrote:

You can't create CUDA or Kokkos Vecs if you're running on a node without a GPU.

I am running the code on compute nodes that do have GPUs.

If you are actually running on GPUs, why would you need lazy initialization? It would not break with GPUs present.

Matt

With PETSc-3.16.1, I got good speedup by running GAMG on GPUs. That might be a bug in PETSc-main.
Thanks,

Fande

```
KSPSetUp              13 1.0 6.4400e-01 1.0 2.02e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  3140   64630   15 1.05e+02    5 3.49e+01 100
KSPSolve               1 1.0 1.0109e+00 1.0 3.49e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 87  0  0  0   0 87  0  0  0 34522   69556    4 4.35e-03    1 2.38e-03 100
KSPGMRESOrthog       142 1.0 1.2674e-01 1.0 1.06e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 27  0  0  0   0 27  0  0  0 83755   87801    0 0.00e+00    0 0.00e+00 100
SNESSolve              1 1.0 4.4402e+01 1.0 4.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21100  0  0  0  21100  0  0  0   901   51365   57 1.10e+03   52 8.78e+02 100
SNESSetUp              1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    0 0.00e+00    0 0.00e+00   0
SNESFunctionEval       2 1.0 1.7097e+01 1.0 1.60e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     1       0    0 0.00e+00    6 1.92e+02   0
SNESJacobianEval       1 1.0 1.6213e+01 1.0 2.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     2       0    0 0.00e+00    1 3.20e+01   0
SNESLineSearch         1 1.0 8.5582e+00 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0    14   64153    1 3.20e+01    3 9.61e+01  94
PCGAMGGraph_AGG        5 1.0 3.0509e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0    5 3.49e+01    9 7.43e+01   0
PCGAMGCoarse_AGG       5 1.0 3.8711e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0       0    0 0.00e+00    0 0.00e+00   0
PCGAMGProl_AGG         5 1.0 7.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    0 0.00e+00    0 0.00e+00   0
PCGAMGPOpt_AGG         5 1.0 1.2904e+00 1.0 2.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  1661   29807   26 7.15e+02   20 2.90e+02  99
GAMG: createProl       5 1.0 8.9489e+00 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  6  0  0  0   4  6  0  0  0   249   29666   31 7.50e+02   29 3.64e+02  96
  Graph               10 1.0 3.0478e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27       0    5 3.49e+01    9 7.43e+01   0
  MIS/Agg              5 1.0 4.1290e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    0 0.00e+00    0 0.00e+00   0
  SA: col data         5 1.0 1.9127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    0 0.00e+00    0 0.00e+00   0
  SA: frmProl0         5 1.0 6.2662e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0    0 0.00e+00    0 0.00e+00   0
  SA: smooth           5 1.0 4.9595e-01 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   244    2709   15 1.97e+02   15 2.55e+02  90
GAMG: partLevel        5 1.0 4.7330e-01 1.0 6.98e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1475    4120    5 1.78e+02   10 2.55e+02 100
PCGAMG Squ l00         1 1.0 2.6027e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0       0    0 0.00e+00    0 0.00e+00   0
PCGAMG Gal l00         1 1.0 3.8406e-01 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1426    4270    1 1.48e+02    2 2.11e+02 100
PCGAMG Opt l00         1 1.0 2.4932e-01 1.0 7.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   289    2653    1 6.41e+01    1 1.13e+02 100
PCGAMG Gal l01         1 1.0 6.6279e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1645    3851    1 2.40e+01    2 3.64e+01 100
PCGAMG Opt l01         1 1.0 2.9544e-02 1.0 7.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   242    1671    1 4.84e+00    1 1.23e+01 100
PCGAMG Gal l02         1 1.0 1.8874e-02 1.0 3.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1974    3636    1 5.04e+00    2 6.58e+00 100
PCGAMG Opt l02         1 1.0 7.4353e-03 1.0 2.40e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   323    1457    1 7.71e-01    1 2.30e+00 100
PCGAMG Gal l03         1 1.0 2.8479e-03 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1440    2266    1 4.44e-01    2 5.51e-01 100
PCGAMG Opt l03         1 1.0 8.2684e-04 1.0 2.80e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   339    1667    1 6.72e-02    1 2.03e-01 100
PCGAMG Gal l04         1 1.0 1.2238e-03 1.0 2.09e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   170     244    1 2.05e-02    2 2.53e-02 100
PCGAMG Opt l04         1 1.0 4.1008e-04 1.0 1.77e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    43     165    1 4.49e-03    1 1.19e-02 100
PCSetUp                2 1.0 9.9632e+00 1.0 4.95e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 12  0  0  0   5 12  0  0  0   496   17826   55 1.03e+03   45 6.54e+02  98
PCSetUpOnBlocks       44 1.0 9.9087e-04 1.0 2.88e+03 1.0
```

The point of lazy initialization is to make it possible to run a solve that doesn't use a GPU in a PETSC_ARCH that supports GPUs, regardless of whether a GPU is actually present.

Fande Kong <fdkong...@gmail.com> writes:

I spoke too soon. It seems that we have trouble creating cuda/kokkos vecs now. Got Segmentation fault.

Thanks,

Fande

Program received signal SIGSEGV, Segmentation fault.
```
0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
54      PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64 libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64 libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64 libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64 libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64 libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64 libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64 librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64 libxcb-1.13-1.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64 systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) bt
#0  0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
#1  0x00002aaab5558db7 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice (this=this@entry=0x2aaab7f37b70 <CUDADevice>, device=0x115da00, id=-35, id@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344
#2  0x00002aaab55577de in PetscDeviceCreate (type=type@entry=PETSC_DEVICE_CUDA, devid=devid@entry=-1, device=device@entry=0x2aaab7f37b48 <defaultDevices+8>) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107
#3  0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal (type=type@entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId@entry=-1) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273
#4  0x00002aaab5557bf6 in PetscDeviceInitialize (type=type@entry=PETSC_DEVICE_CUDA) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234
#5  0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244
#6  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x2aaab70b45b8 "seqcuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
#7  0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:214
#8  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x7fffffff9260 "cuda") at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
#9  0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150, PetscOptionsObject=0x7fffffff9210) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263
#10 VecSetFromOptions (vec=0x115d150) at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297
#11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init (this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL) at /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693
```

On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <fdkong...@gmail.com> wrote:

Thanks, Jed,

This worked!

Fande

On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <j...@jedbrown.org> wrote:

Fande Kong <fdkong...@gmail.com> writes:

On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <jacob....@gmail.com> wrote:

Are you running on login nodes or compute nodes (I can't seem to tell from the configure.log)?

I was compiling codes on login nodes, and running codes on compute nodes. Login nodes do not have GPUs, but compute nodes do have GPUs.

Just to be clear, the same thing (code, machine) with PETSc-3.16.1 worked perfectly. I have this trouble with PETSc-main.

I assume you can

  export PETSC_OPTIONS='-device_enable lazy'

and it'll work.

I think this should be the default. The main complaint is that timing the first GPU-using event isn't accurate if it includes initialization, but I think this is mostly hypothetical, because you can't trust any timing that doesn't preload in some form, and the first GPU-using event will almost always be something uninteresting, so I think it will rarely lead to confusion. Meanwhile, eager initialization is viscerally disruptive for lots of people.
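For reference, a minimal sketch of the kind of program affected here (assuming a PETSc build configured with CUDA): under `-device_enable lazy`, PetscInitialize() should not touch the GPU, and the device is only initialized when the CUDA vector type is first set; under eager initialization the same device error would presumably surface inside PetscInitialize() instead.

```c
/* Hedged sketch: the device is touched only at VecSetType() under lazy
 * initialization, which is the same path the libMesh backtrace above hits. */
#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            x;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return (int)ierr; /* no GPU contact yet under lazy init */
  ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
  ierr = VecSetSizes(x, PETSC_DECIDE, 441);CHKERRQ(ierr);
  ierr = VecSetType(x, VECCUDA);CHKERRQ(ierr); /* first GPU-using call: device initialized here */
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return (int)ierr;
}
```

Setting `export PETSC_OPTIONS='-device_enable lazy'` before running, as suggested above, exercises the lazy path without changing the code.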
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

<configure_bad.log><configure_good.log>