Do you have the configure.log with main? --Junchao Zhang
On Wed, Jan 26, 2022 at 12:26 PM Fande Kong <fdkong...@gmail.com> wrote:

I am on petsc-main:

commit 1390d3a27d88add7d79c9b38bf1a895ae5e67af6
Merge: 96c919c d5f3255
Author: Satish Balay <ba...@mcs.anl.gov>
Date: Wed Jan 26 10:28:32 2022 -0600

    Merge remote-tracking branch 'origin/release'

It is still broken.

Thanks,
Fande

On Wed, Jan 26, 2022 at 7:40 AM Junchao Zhang <junchao.zh...@gmail.com> wrote:

The good build uses the compiler's default library/header path; the bad one searches the CUDA toolkit path and uses rpath linking. Though the paths look the same on the login node, they could behave differently on a compute node depending on its environment. I think we fixed the issue in cuda.py (i.e., first try the compiler's default, then the toolkit). That's why I wanted Fande to use petsc/main.

--Junchao Zhang

On Tue, Jan 25, 2022 at 11:59 PM Barry Smith <bsm...@petsc.dev> wrote:

The bad build has an extra

  -L/apps/local/spack/software/gcc-7.5.0/cuda-10.1.243-v4ymjqcrr7f72qfiuzsstuy5jiajbuey/lib64/stubs -lcuda

The good one does not.

Try removing the stubs directory and -lcuda from the bad $PETSC_ARCH/lib/petsc/conf/variables and likely the bad build will start working.

Barry

I never liked the stubs stuff.

On Jan 25, 2022, at 11:29 PM, Fande Kong <fdkong...@gmail.com> wrote:

Hi Junchao,

I attached a "bad" configure log and a "good" configure log.

The "bad" one was produced at 246ba74192519a5f34fb6e227d1c64364e19ce2c and the "good" one at 384645a00975869a1aacbd3169de62ba40cad683. The good hash is the last good one, immediately preceding the bad one.

I think you could compare the two logs and check what the differences are.

Thanks,
Fande

On Tue, Jan 25, 2022 at 8:21 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:

Fande, could you send the configure.log that works (i.e., from before this offending commit)?

--Junchao Zhang

On Tue, Jan 25, 2022 at 8:21 PM Fande Kong <fdkong...@gmail.com> wrote:

Not sure if this is helpful, but I did "git bisect", and here was the result:

[kongf@sawtooth2 petsc]$ git bisect bad
246ba74192519a5f34fb6e227d1c64364e19ce2c is the first bad commit
commit 246ba74192519a5f34fb6e227d1c64364e19ce2c
Author: Junchao Zhang <jczh...@mcs.anl.gov>
Date: Wed Oct 13 05:32:43 2021 +0000

    Config: fix CUDA library and header dirs

:040000 040000 187c86055adb80f53c1d0565a8888704fec43a96 ea1efd7f594fd5e8df54170bc1bc7b00f35e4d5f M config

Starting from this commit, the GPU runs did not work for me on our HPC.

Thanks,
Fande

On Tue, Jan 25, 2022 at 7:18 PM Fande Kong <fdkong...@gmail.com> wrote:

On Tue, Jan 25, 2022 at 9:04 AM Jacob Faibussowitsch <jacob....@gmail.com> wrote:

Configure should not have an impact here, I think. The reason I had you run cudaGetDeviceCount() is that this is the CUDA call (and in fact the only CUDA call) in the initialization sequence that returns the error code; there should be no prior CUDA calls. Maybe this is a problem with oversubscribing GPUs? In the runs that crash, how many ranks are using any given GPU at once? Maybe MPS is required.

I used one MPI rank.

Fande

Best regards,
Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

On Jan 21, 2022, at 12:01, Fande Kong <fdkong...@gmail.com> wrote:

Thanks Jacob,

On Thu, Jan 20, 2022 at 6:25 PM Jacob Faibussowitsch <jacob....@gmail.com> wrote:

The segfault is caused by the following check at src/sys/objects/device/impls/cupm/cupmdevice.cxx:349 being a PetscUnlikelyDebug() rather than just PetscUnlikely():

```
if (PetscUnlikelyDebug(_defaultDevice < 0)) { // _defaultDevice is in fact < 0 here and uncaught
```

To clarify: "lazy" initialization is not that lazy after all; it still does some 50% of the work that "eager" initialization does. It stops short of initializing the CUDA runtime, checking CUDA-aware MPI, gathering device data, and initializing cuBLAS and friends. Importantly, lazy initialization also swallows any errors that crop up along the way, storing the resulting error code for later (specifically _defaultDevice = -init_error_value;).

So whether you initialize lazily or eagerly makes no difference here, as _defaultDevice will always contain -35.

The bigger question is why cudaGetDeviceCount() is returning cudaErrorInsufficientDriver. Can you compile and run

```
#include <cuda_runtime.h>

int main()
{
  int ndev;
  return cudaGetDeviceCount(&ndev);
}
```

and then show the value of "echo $?"?

I modified your code a little to get more information:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
  int ndev;
  int error = cudaGetDeviceCount(&ndev);
  printf("ndev %d \n", ndev);
  printf("error %d \n", error);
  return 0;
}

Results:

$ ./a.out
ndev 4
error 0

I have not read the PETSc CUDA initialization code yet, but if I had to guess at what is happening, I would naively think that PETSc did not get correct GPU information during configuration, because the compile node does not have GPUs and there was no way to get any GPU device information. At runtime on the GPU nodes, PETSc might then be using the incorrect information grabbed during configuration and produce this kind of false error message.

Thanks,
Fande

Best regards,
Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

On Jan 20, 2022, at 17:47, Matthew Knepley <knep...@gmail.com> wrote:

On Thu, Jan 20, 2022 at 6:44 PM Fande Kong <fdkong...@gmail.com> wrote:

Thanks, Jed

On Thu, Jan 20, 2022 at 4:34 PM Jed Brown <j...@jedbrown.org> wrote:

You can't create CUDA or Kokkos Vecs if you're running on a node without a GPU.

I am running the code on compute nodes that do have GPUs.
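A note on the standalone cudaGetDeviceCount() test above: the same kind of check, run directly on a compute node, can also report the CUDA driver and runtime versions that node actually sees, which is the pair that matters for cudaErrorInsufficientDriver (error 35, i.e. the loaded driver is older than the runtime the binary was built against). Below is a minimal diagnostic sketch using only standard CUDA runtime calls; it is not part of PETSc and assumes the node can compile against cuda_runtime.h (e.g. with nvcc, or gcc plus -lcudart).

```
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
  int ndev = -1, driver = 0, runtime = 0;
  cudaError_t err = cudaGetDeviceCount(&ndev);

  /* Versions are encoded as 1000*major + 10*minor, e.g. 10010 for CUDA 10.1. */
  cudaDriverGetVersion(&driver);
  cudaRuntimeGetVersion(&runtime);

  printf("cudaGetDeviceCount: %d (%s)\n", (int)err, cudaGetErrorString(err));
  printf("ndev %d, driver %d, runtime %d\n", ndev, driver, runtime);
  return (int)err;
}
```

If the driver number printed on the compute node is lower than the runtime number, that alone would explain error 35. If instead the executable was linked against the stub libcuda from the lib64/stubs directory Barry mentions, the library loaded at run time may not be the real driver, which could produce the same failure even on a node with a perfectly good driver.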
If you are actually running on GPUs, why would you need lazy initialization? It would not break with GPUs present.

Matt

With PETSc-3.16.1, I got good speedup by running GAMG on GPUs. That might be a bug in PETSc-main.

Thanks,
Fande

KSPSetUp            13 1.0 6.4400e-01 1.0 2.02e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  5  0  0  0   0  5  0  0  0  3140 64630 15 1.05e+02  5 3.49e+01 100
KSPSolve             1 1.0 1.0109e+00 1.0 3.49e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 87  0  0  0   0 87  0  0  0 34522 69556  4 4.35e-03  1 2.38e-03 100
KSPGMRESOrthog     142 1.0 1.2674e-01 1.0 1.06e+10 1.0 0.0e+00 0.0e+00 0.0e+00  0 27  0  0  0   0 27  0  0  0 83755 87801  0 0.00e+00  0 0.00e+00 100
SNESSolve            1 1.0 4.4402e+01 1.0 4.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 21 100 0  0  0  21 100 0  0  0   901 51365 57 1.10e+03 52 8.78e+02 100
SNESSetUp            1 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0     0  0 0.00e+00  0 0.00e+00   0
SNESFunctionEval     2 1.0 1.7097e+01 1.0 1.60e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     1     0  0 0.00e+00  6 1.92e+02   0
SNESJacobianEval     1 1.0 1.6213e+01 1.0 2.80e+07 1.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     2     0  0 0.00e+00  1 3.20e+01   0
SNESLineSearch       1 1.0 8.5582e+00 1.0 1.24e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0    14 64153  1 3.20e+01  3 9.61e+01  94
PCGAMGGraph_AGG      5 1.0 3.0509e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27     0  5 3.49e+01  9 7.43e+01   0
PCGAMGCoarse_AGG     5 1.0 3.8711e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0     0  0 0.00e+00  0 0.00e+00   0
PCGAMGProl_AGG       5 1.0 7.0748e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0     0  0 0.00e+00  0 0.00e+00   0
PCGAMGPOpt_AGG       5 1.0 1.2904e+00 1.0 2.14e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  5  0  0  0   1  5  0  0  0  1661 29807 26 7.15e+02 20 2.90e+02  99
GAMG: createProl     5 1.0 8.9489e+00 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  6  0  0  0   4  6  0  0  0   249 29666 31 7.50e+02 29 3.64e+02  96
Graph               10 1.0 3.0478e+00 1.0 8.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    27     0  5 3.49e+01  9 7.43e+01   0
MIS/Agg              5 1.0 4.1290e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0     0  0 0.00e+00  0 0.00e+00   0
SA: col data         5 1.0 1.9127e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0     0  0 0.00e+00  0 0.00e+00   0
SA: frmProl0         5 1.0 6.2662e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0     0  0 0.00e+00  0 0.00e+00   0
SA: smooth           5 1.0 4.9595e-01 1.0 1.21e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   244  2709 15 1.97e+02 15 2.55e+02  90
GAMG: partLevel      5 1.0 4.7330e-01 1.0 6.98e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  1475  4120  5 1.78e+02 10 2.55e+02 100
PCGAMG Squ l00       1 1.0 2.6027e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0     0  0 0.00e+00  0 0.00e+00   0
PCGAMG Gal l00       1 1.0 3.8406e-01 1.0 5.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1426  4270  1 1.48e+02  2 2.11e+02 100
PCGAMG Opt l00       1 1.0 2.4932e-01 1.0 7.20e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   289  2653  1 6.41e+01  1 1.13e+02 100
PCGAMG Gal l01       1 1.0 6.6279e-02 1.0 1.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1645  3851  1 2.40e+01  2 3.64e+01 100
PCGAMG Opt l01       1 1.0 2.9544e-02 1.0 7.15e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   242  1671  1 4.84e+00  1 1.23e+01 100
PCGAMG Gal l02       1 1.0 1.8874e-02 1.0 3.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1974  3636  1 5.04e+00  2 6.58e+00 100
PCGAMG Opt l02       1 1.0 7.4353e-03 1.0 2.40e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   323  1457  1 7.71e-01  1 2.30e+00 100
PCGAMG Gal l03       1 1.0 2.8479e-03 1.0 4.10e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1440  2266  1 4.44e-01  2 5.51e-01 100
PCGAMG Opt l03       1 1.0 8.2684e-04 1.0 2.80e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   339  1667  1 6.72e-02  1 2.03e-01 100
PCGAMG Gal l04       1 1.0 1.2238e-03 1.0 2.09e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   170   244  1 2.05e-02  2 2.53e-02 100
PCGAMG Opt l04       1 1.0 4.1008e-04 1.0 1.77e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    43   165  1 4.49e-03  1 1.19e-02 100
PCSetUp              2 1.0 9.9632e+00 1.0 4.95e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 12  0  0  0   5 12  0  0  0   496 17826 55 1.03e+03 45 6.54e+02  98
PCSetUpOnBlocks     44 1.0 9.9087e-04 1.0 2.88e+03 1.0

The point of lazy initialization is to make it possible to run a solve that doesn't use a GPU in a PETSC_ARCH that supports GPUs, regardless of whether a GPU is actually present.

Fande Kong <fdkong...@gmail.com> writes:

I spoke too soon. It seems that we have trouble creating cuda/kokkos vecs now. Got a segmentation fault.

Thanks,
Fande

Program received signal SIGSEGV, Segmentation fault.
0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
54        PetscErrorCode CUPMDevice<T>::CUPMDeviceInternal::initialize() noexcept
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-325.el7_9.x86_64 libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-11.el7.x86_64 libibmad-5.4.0.MLNX20190423.1d917ae-0.1.49224.x86_64 libibumad-43.1.1.MLNX20200211.078947f-0.1.49224.x86_64 libibverbs-41mlnx1-OFED.4.9.0.0.7.49224.x86_64 libmlx4-41mlnx1-OFED.4.7.3.0.3.49224.x86_64 libmlx5-41mlnx1-OFED.4.9.0.1.2.49224.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-41mlnx1-OFED.4.7.3.0.6.49224.x86_64 librxe-41mlnx1-OFED.4.4.2.4.6.49224.x86_64 libxcb-1.13-1.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 numactl-libs-2.0.12-5.el7.x86_64 systemd-libs-219-78.el7_9.3.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-19.el7_9.x86_64
(gdb) bt
#0  0x00002aaab5558b11 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::CUPMDeviceInternal::initialize (this=0x1)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:54
#1  0x00002aaab5558db7 in Petsc::CUPMDevice<(Petsc::CUPMDeviceType)0>::getDevice (this=this@entry=0x2aaab7f37b70 <CUDADevice>, device=0x115da00, id=-35, id@entry=-1)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/impls/cupm/cupmdevice.cxx:344
#2  0x00002aaab55577de in PetscDeviceCreate (type=type@entry=PETSC_DEVICE_CUDA, devid=devid@entry=-1, device=device@entry=0x2aaab7f37b48 <defaultDevices+8>)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:107
#3  0x00002aaab5557b3a in PetscDeviceInitializeDefaultDevice_Internal (type=type@entry=PETSC_DEVICE_CUDA, defaultDeviceId=defaultDeviceId@entry=-1)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:273
#4  0x00002aaab5557bf6 in PetscDeviceInitialize (type=type@entry=PETSC_DEVICE_CUDA)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/sys/objects/device/interface/device.cxx:234
#5  0x00002aaab5661fcd in VecCreate_SeqCUDA (V=0x115d150)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/seq/seqcuda/veccuda.c:244
#6  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x2aaab70b45b8 "seqcuda")
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
#7  0x00002aaab579c33f in VecCreate_CUDA (v=0x115d150)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/impls/mpi/mpicuda/mpicuda.cu:214
#8  0x00002aaab5649b40 in VecSetType (vec=vec@entry=0x115d150, method=method@entry=0x7fffffff9260 "cuda")
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vecreg.c:93
#9  0x00002aaab5648bf1 in VecSetTypeFromOptions_Private (vec=0x115d150, PetscOptionsObject=0x7fffffff9210)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1263
#10 VecSetFromOptions (vec=0x115d150)
    at /home/kongf/workhome/sawtooth/moosegpu/petsc/src/vec/vec/interface/vector.c:1297
#11 0x00002aaab02ef227 in libMesh::PetscVector<double>::init (this=0x11cd1a0, n=441, n_local=441, fast=false, ptype=libMesh::PARALLEL)
    at /home/kongf/workhome/sawtooth/moosegpu/scripts/../libmesh/installed/include/libmesh/petsc_vector.h:693

On Thu, Jan 20, 2022 at 1:09 PM Fande Kong <fdkong...@gmail.com> wrote:

Thanks, Jed,

This worked!

Fande

On Wed, Jan 19, 2022 at 11:03 PM Jed Brown <j...@jedbrown.org> wrote:

Fande Kong <fdkong...@gmail.com> writes:

On Wed, Jan 19, 2022 at 11:39 AM Jacob Faibussowitsch <jacob....@gmail.com> wrote:

Are you running on login nodes or compute nodes (I can't seem to tell from the configure.log)?

I was compiling the code on login nodes and running it on compute nodes. Login nodes do not have GPUs, but compute nodes do.

Just to be clear, the same thing (code, machine) worked perfectly with PETSc-3.16.1. I have this trouble with PETSc-main.

I assume you can

  export PETSC_OPTIONS='-device_enable lazy'

and it'll work.

I think this should be the default. The main complaint is that timing of the first GPU-using event isn't accurate if it includes initialization, but I think this is mostly hypothetical: you can't trust any timing that doesn't preload in some form, and the first GPU-using event will almost always be something uninteresting, so it will rarely lead to confusion. Meanwhile, eager initialization is viscerally disruptive for lots of people.

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

<configure_bad.log><configure_good.log>
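A closing note on the lazy-initialization behavior Jacob described earlier in the thread: the error from cudaGetDeviceCount() is swallowed at initialization time and cached as a negative "device id" (_defaultDevice = -init_error_value), and only a debug-mode check (PetscUnlikelyDebug) inspects the cached value later. The following is a schematic illustration of that pattern, not PETSc source; the names defaultDevice, lazy_initialize, and get_device are invented for the sketch.

```
#include <stdio.h>

static int defaultDevice = 0;

/* Initialization swallows the CUDA error and caches it as a negative id,
 * e.g. -35 for cudaErrorInsufficientDriver. */
static void lazy_initialize(int init_error_value)
{
  if (init_error_value) defaultDevice = -init_error_value;
}

static int get_device(void)
{
#if defined(DEBUG)
  /* Only a debug build checks the cached failure, mirroring PetscUnlikelyDebug(). */
  if (defaultDevice < 0) {
    fprintf(stderr, "device initialization failed earlier: error %d\n", -defaultDevice);
    return -1;
  }
#endif
  return defaultDevice;
}

int main(void)
{
  lazy_initialize(35); /* pretend cudaGetDeviceCount() returned 35 */
  printf("device id used: %d\n", get_device());
  return 0;
}
```

In an optimized build the check is compiled out, so the negative id flows into later code as if it were a valid device, which matches the backtrace above where getDevice() is entered with id=-35.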