Thanks for the trick. We can prepare the example script for Lonestar6 and mention it.
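A rough sketch of what such a script could look like (module names and versions, the ibrun launcher, and the application name below are placeholders to be checked against the Lonestar6 documentation; the LD_PRELOAD line is the workaround Sreeram describes in the message below):

    #!/bin/bash
    # load a CUDA version that matches the MVAPICH2-GDR build, plus the GPU-aware MPI
    module load cuda/11.4 mvapich2-gdr
    # runtime setting used later in this thread to enable device-buffer support
    export MV2_USE_CUDA=1
    # workaround from the message below: preload the MVAPICH2-GDR MPI library
    # (substitute the actual mvapich2-gdr install prefix for /path/to/mvapich2-gdr)
    export LD_PRELOAD=/path/to/mvapich2-gdr/lib64/libmpi.so:$LD_PRELOAD
    # launch with TACC's MPI launcher
    ibrun ./my_petsc_app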
--Junchao Zhang

On Fri, Apr 19, 2024 at 11:55 AM Sreeram R Venkat <[email protected]> wrote:

I talked to the MVAPICH people, and they told me to try adding /path/to/mvapich2-gdr/lib64/libmpi.so to LD_PRELOAD (apparently, they've had this issue before). This seemed to do the trick; I can build everything with MVAPICH2-GDR and run with it now. Not sure if this is something you want to add to the docs.

Thanks,
Sreeram

On Wed, Apr 17, 2024 at 9:17 AM Junchao Zhang <[email protected]> wrote:

I looked at it before and checked again, and still see
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html#inter-gpu-communication-with-cuda-aware-mpi

> Using both MPI and NCCL to perform transfers between the same sets of CUDA devices concurrently is therefore not guaranteed to be safe.

I was scared by it. It means we have to replace all MPI device communications (what if they are from a third-party library?) with NCCL.

--Junchao Zhang

On Wed, Apr 17, 2024 at 8:27 AM Sreeram R Venkat <[email protected]> wrote:

Yes, I saw this paper
https://www.sciencedirect.com/science/article/abs/pii/S016781912100079X
that mentioned it, and I heard in Barry's talk at SIAM PP this year about the need for stream-aware MPI, so I was wondering if NCCL would be used in PETSc to do GPU-GPU communication.

On Wed, Apr 17, 2024, 7:58 AM Junchao Zhang <[email protected]> wrote:

> On Wed, Apr 17, 2024 at 7:51 AM Sreeram R Venkat <[email protected]> wrote:
>
> Do you know if there are plans for NCCL support in PETSc?

What is your need? Do you mean using NCCL for the MPI communication?

On Tue, Apr 16, 2024, 10:41 PM Junchao Zhang <[email protected]> wrote:

Glad to hear you found a way. Did you use Frontera at TACC? If yes, I could have a try.

--Junchao Zhang

On Tue, Apr 16, 2024 at 8:35 PM Sreeram R Venkat <[email protected]> wrote:

I finally figured out a way to make it work. I had to build PETSc and my application using the (non GPU-aware) Intel MPI. Then, before running, I switch to the MVAPICH2-GDR. I'm not sure why that works, but it's the only way I've found to compile and run successfully without throwing any errors about not having a GPU-aware MPI.

On Fri, Dec 8, 2023 at 5:30 PM Mark Adams <[email protected]> wrote:

You may need to set some env variables. This can be system specific so you might want to look at docs or ask TACC how to run with GPU-aware MPI.

Mark
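In concrete terms, Mark's suggestion is about the runtime environment rather than the build; a sketch of the kind of settings and checks involved (MV2_USE_CUDA=1 appears in the next message; the launcher and application name are placeholders, and as far as I know the last line uses PETSc's -use_gpu_aware_mpi option, which is only a diagnostic fallback that stages data through the host):

    export MV2_USE_CUDA=1      # MVAPICH2-GDR: enable support for GPU (device) buffers
    env | grep '^MV2_'         # confirm the MV2_* settings reach the job environment
    # diagnostic only: tell PETSc not to expect a GPU-aware MPI
    ibrun ./my_petsc_app -use_gpu_aware_mpi 0

Other MV2_* options (GPUDirect RDMA, gdrcopy) vary by MVAPICH2-GDR version, so check the MVAPICH2-GDR user guide or ask TACC.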
On Fri, Dec 8, 2023 at 5:17 PM Sreeram R Venkat <[email protected]> wrote:

Actually, when I compile my program with this build of PETSc and run, I still get the error:

PETSC ERROR: PETSc is configured with GPU support, but your MPI is not GPU-aware. For better performance, please use a GPU-aware MPI.

I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1.

Is there anything else I need to do?

Thanks,
Sreeram

On Fri, Dec 8, 2023 at 3:29 PM Sreeram R Venkat <[email protected]> wrote:

Thank you, changing to CUDA 11.4 fixed the issue. The mvapich2-gdr module didn't require CUDA 11.4 as a dependency, so I was using 12.0.

On Fri, Dec 8, 2023 at 1:15 PM Satish Balay <[email protected]> wrote:

Executing: mpicc -show
stdout: icc -I/opt/apps/cuda/11.4/include -I/opt/apps/cuda/11.4/include -lcuda -L/opt/apps/cuda/11.4/lib64/stubs -L/opt/apps/cuda/11.4/lib64 -lcudart -lrt -Wl,-rpath,/opt/apps/cuda/11.4/lib64 -Wl,-rpath,XORIGIN/placeholder -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/ -lm -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags -lmpi

Checking for program /opt/apps/cuda/12.0/bin/nvcc...found

Looks like you are trying to mix in 2 different cuda versions in this build.

Perhaps you need to use cuda-11.4 - with this install of mvapich..

Satish
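A quick way to catch this kind of mismatch before running configure is to compare what the MPI wrappers were built against with the nvcc that is in the PATH (mpicc -show is the same command Satish ran above; the module name on the last line is a guess based on his output):

    mpicc -show | tr ' ' '\n' | grep -i cuda   # CUDA paths baked into the MPI wrappers
    which nvcc && nvcc --version               # CUDA toolkit currently in the PATH
    module load cuda/11.4                      # then load the matching CUDA before ./configure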
On Fri, 8 Dec 2023, Matthew Knepley wrote:

> On Fri, Dec 8, 2023 at 1:54 PM Sreeram R Venkat <[email protected]> wrote:
>
> I am trying to build PETSc with CUDA using the CUDA-Aware MVAPICH2-GDR.
>
> Here is my configure command:
>
> ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr --download-hypre --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental --download-metis --download-parmetis --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90
>
> which errors with:
>
> UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
> ---------------------------------------------------------------------------------------------
> CUDA compile failed with arch flags " -ccbin mpic++ -std=c++14 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode arch=compute_80,code=sm_80"
> generated from "--with-cuda-arch=80"
>
> The same configure command works when I use the Intel MPI and I can build with CUDA. The full config.log file is attached. Please let me know if you need any other information. I appreciate your help with this.
>
> Thanks,
> Sreeram

The proximate error is

Executing: nvcc -c -o /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o -I/tmp/petsc-kn3f29gl/config.setCompilers -I/tmp/petsc-kn3f29gl/config.types -I/tmp/petsc-kn3f29gl/config.packages.cuda -ccbin mpic++ -std=c++14 -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode arch=compute_80,code=sm_80 /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu
stdout:
/opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one instance of overloaded function "__nv_associate_access_property_impl" has "C" linkage
1 error detected in the compilation of "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu".
Possible ERROR while running compiler: exit code 1
stderr:
/opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one instance of overloaded function "__nv_associate_access_property_impl" has "C" linkage
1 error detected in the compilation of "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu".

This looks like screwed up headers to me, but I will let someone that understands CUDA compilation reply.

Thanks,

Matt
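One way to narrow this down outside of PETSc's configure is to compile a trivial CUDA file by hand with the same flags configure used; the file name and source below are made up for illustration, but with the mismatched CUDA 12.0 nvcc this would likely reproduce the sm_80_rt.hpp error above, while with a matching CUDA 11.4 toolkit it should compile cleanly:

    cat > conftest_check.cu <<'EOF'
    /* minimal kernel, just enough to exercise nvcc plus the host compiler from mpic++ */
    __global__ void noop() {}
    int main() { noop<<<1, 1>>>(); return 0; }
    EOF
    nvcc -c -ccbin mpic++ -std=c++14 -Xcompiler -fPIC \
        -gencode arch=compute_80,code=sm_80 conftest_check.cu -o conftest_check.o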
