I talked to the MVAPICH people, and they told me to try adding /path/to/mvapich2-gdr/lib64/libmpi.so to LD_PRELOAD (apparently, they've had this issue before). This seemed to do the trick; I can build everything with MVAPICH2-GDR and run with it now. Not sure if this is something you want to add to the docs.
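Concretely, the workaround amounts to something like this. This is only a
sketch: the library path is the same placeholder as above, and the launcher
and executable names (ibrun, ./my_app) stand in for whatever the system
actually uses.

    # Preload the GPU-aware MVAPICH2-GDR MPI library before launching
    export LD_PRELOAD=/path/to/mvapich2-gdr/lib64/libmpi.so${LD_PRELOAD:+:$LD_PRELOAD}
    ibrun ./my_app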
Thanks,
Sreeram

On Wed, Apr 17, 2024 at 9:17 AM Junchao Zhang <[email protected]> wrote:

> I looked at it before and checked again, and I still see
> https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html#inter-gpu-communication-with-cuda-aware-mpi
>
> > Using both MPI and NCCL to perform transfers between the same sets of
> > CUDA devices concurrently is therefore not guaranteed to be safe.
>
> That scared me off: it would mean replacing all MPI device communications
> (what if they come from a third-party library?) with NCCL.
>
> --Junchao Zhang

On Wed, Apr 17, 2024 at 8:27 AM Sreeram R Venkat <[email protected]> wrote:

> Yes, I saw this paper
> https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X
> that mentioned it, and I heard in Barry's talk at SIAM PP this year about
> the need for stream-aware MPI, so I was wondering if NCCL would be used
> in PETSc to do GPU-GPU communication.

On Wed, Apr 17, 2024, 7:58 AM Junchao Zhang <[email protected]> wrote:

> On Wed, Apr 17, 2024 at 7:51 AM Sreeram R Venkat <[email protected]> wrote:
> > Do you know if there are plans for NCCL support in PETSc?
>
> What is your need? Do you mean using NCCL for the MPI communication?

On Tue, Apr 16, 2024, 10:41 PM Junchao Zhang <[email protected]> wrote:

> Glad to hear you found a way. Did you use Frontera at TACC? If so, I
> could give it a try.
>
> --Junchao Zhang

On Tue, Apr 16, 2024 at 8:35 PM Sreeram R Venkat <[email protected]> wrote:

> I finally figured out a way to make it work. I had to build PETSc and my
> application using the (non GPU-aware) Intel MPI. Then, before running, I
> switch to the MVAPICH2-GDR. I'm not sure why that works, but it's the
> only way I've found to compile and run successfully without throwing any
> errors about not having a GPU-aware MPI.

On Fri, Dec 8, 2023 at 5:30 PM Mark Adams <[email protected]> wrote:

> You may need to set some env variables. This can be system specific, so
> you might want to look at the docs or ask TACC how to run with GPU-aware
> MPI.
>
> Mark

On Fri, Dec 8, 2023 at 5:17 PM Sreeram R Venkat <[email protected]> wrote:

> Actually, when I compile my program with this build of PETSc and run, I
> still get the error:
>
>   PETSC ERROR: PETSc is configured with GPU support, but your MPI is not
>   GPU-aware. For better performance, please use a GPU-aware MPI.
>
> I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1.
>
> Is there anything else I need to do?
>
> Thanks,
> Sreeram
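(For reference, the setup described in that last message boils down to
roughly the following. The module name is whatever the site provides, and
the final comment refers to a PETSc runtime option, mentioned only as a
fallback.)

    # Environment sketch for a GPU-aware MVAPICH2-GDR run
    module load mvapich2-gdr    # site-specific module name
    export MV2_USE_CUDA=1       # have MVAPICH2-GDR handle GPU (device) buffers in MPI calls
    # If PETSc still reports a non-GPU-aware MPI, it can be told to stage MPI
    # buffers through host memory with the runtime option -use_gpu_aware_mpi 0.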
On Fri, Dec 8, 2023 at 3:29 PM Sreeram R Venkat <[email protected]> wrote:

> Thank you, changing to CUDA 11.4 fixed the issue. The mvapich2-gdr module
> didn't require CUDA 11.4 as a dependency, so I was using 12.0.

On Fri, Dec 8, 2023 at 1:15 PM Satish Balay <[email protected]> wrote:

> Executing: mpicc -show
> stdout: icc -I/opt/apps/cuda/11.4/include -I/opt/apps/cuda/11.4/include
>   -lcuda -L/opt/apps/cuda/11.4/lib64/stubs -L/opt/apps/cuda/11.4/lib64
>   -lcudart -lrt -Wl,-rpath,/opt/apps/cuda/11.4/lib64
>   -Wl,-rpath,XORIGIN/placeholder -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/
>   -lm -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include
>   -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath
>   -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags -lmpi
>
> Checking for program /opt/apps/cuda/12.0/bin/nvcc...found
>
> Looks like you are trying to mix two different CUDA versions in this build.
>
> Perhaps you need to use cuda-11.4 with this install of mvapich.
>
> Satish
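(The mismatch Satish points out can be checked directly from the shell. A
quick sketch, with the paths as they appear in his output:)

    # Compare the CUDA the MPI wrappers were built against with the nvcc on PATH
    mpicc -show | tr ' ' '\n' | grep -i cuda   # here: /opt/apps/cuda/11.4/... paths
    which nvcc                                 # here: /opt/apps/cuda/12.0/bin/nvcc
    nvcc --version                             # toolkit release that configure picked up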
On Fri, 8 Dec 2023, Matthew Knepley wrote:

> On Fri, Dec 8, 2023 at 1:54 PM Sreeram R Venkat <[email protected]> wrote:
>
> > I am trying to build PETSc with CUDA using the CUDA-aware MVAPICH2-GDR.
> >
> > Here is my configure command:
> >
> >   ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr --download-hypre
> >     --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true
> >     --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental --download-metis
> >     --download-parmetis --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90
> >
> > which errors with:
> >
> >   UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
> >   -------------------------------------------------------------------------
> >   CUDA compile failed with arch flags " -ccbin mpic++ -std=c++14
> >   -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode
> >   arch=compute_80,code=sm_80"
> >   generated from "--with-cuda-arch=80"
> >
> > The same configure command works when I use the Intel MPI, and I can
> > build with CUDA. The full config.log file is attached. Please let me
> > know if you need any other information. I appreciate your help with this.
> >
> > Thanks,
> > Sreeram
>
> The proximate error is
>
>   Executing: nvcc -c -o /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o
>     -I/tmp/petsc-kn3f29gl/config.setCompilers -I/tmp/petsc-kn3f29gl/config.types
>     -I/tmp/petsc-kn3f29gl/config.packages.cuda -ccbin mpic++ -std=c++14
>     -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode
>     arch=compute_80,code=sm_80 /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu
>   stdout:
>   /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one
>   instance of overloaded function "__nv_associate_access_property_impl" has
>   "C" linkage
>   1 error detected in the compilation of
>   "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu".
>   Possible ERROR while running compiler: exit code 1
>
> This looks like screwed-up headers to me, but I will let someone who
> understands CUDA compilation reply.
>
> Thanks,
>
>   Matt
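(Pulling the thread's resolution together: the build succeeded once configure
was run against the same CUDA toolkit that the mvapich2-gdr wrappers were
built with. Roughly, with module names as placeholders for the site's actual
ones:)

    # Load the CUDA that matches what 'mpicc -show' reports (11.4 here), then reconfigure
    module load cuda/11.4 mvapich2-gdr
    ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr --with-cuda=true \
      --cuda-dir=$TACC_CUDA_DIR --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
      --download-hypre --with-hdf5=true --with-hdf5-dir=$TACC_PHDF5_DIR \
      --download-elemental --download-metis --download-parmetis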
