On Fri, Oct 7, 2022 at 1:08 PM Rob Kudyba <rk3...@columbia.edu> wrote:
> Thanks for the quick reply. I added these options to make, and make check
> still produced the warnings, so I used the command like this:
> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug
> MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca
> opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'" check
> Running check examples to verify correct installation
> Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> Completed test examples
>
> Could be useful for the FAQ.

You mentioned you had "OpenMPI 4.1.1 with CUDA aware", so I think a working
mpicc should automatically find the CUDA libraries. Maybe you unloaded the
CUDA libraries?

> I'm now trying to use PETSc to compile, and linking appears to go awry:
> [ 58%] Building CXX object
> CMakeFiles/wtm.dir/src/update_effective_storativity.cpp.o
> [ 62%] Linking CXX static library libwtm.a
> [ 62%] Built target wtm
> [ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o
> [ 70%] Linking CXX executable wtm.x
> /usr/bin/ld: cannot find -lpetsc
> collect2: error: ld returned 1 exit status
> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
> make[1]: *** [CMakeFiles/Makefile2:269: CMakeFiles/wtm.x.dir/all] Error 2
> make: *** [Makefile:136: all] Error 2

It seems cmake could not find petsc. Look at
$PETSC_DIR/share/petsc/CMakeLists.txt and try to modify your CMakeLists.txt.

> Is there an environment variable I'm missing?
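One way to act on the CMakeLists.txt suggestion above, sketched with the thread's placeholder paths: PETSc installs pkg-config files under $PETSC_DIR/$PETSC_ARCH/lib/pkgconfig (the pkgconfig/ directory is visible in the lib listing below), and pointing PKG_CONFIG_PATH there lets pkg-config-aware build tools discover the library. This is a sketch, not the poster's actual setup.

```shell
# Sketch, assuming the thread's placeholder PETSC_DIR/PETSC_ARCH layout.
# PETSc ships pkg-config files in $PETSC_DIR/$PETSC_ARCH/lib/pkgconfig;
# exporting PKG_CONFIG_PATH makes them discoverable at configure time.
PETSC_DIR=/path/to/petsc
PETSC_ARCH=arch-linux-c-debug
export PKG_CONFIG_PATH="$PETSC_DIR/$PETSC_ARCH/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
echo "${PKG_CONFIG_PATH%%:*}"   # → /path/to/petsc/arch-linux-c-debug/lib/pkgconfig
```

With that in place, a CMakeLists.txt could use `find_package(PkgConfig REQUIRED)` followed by `pkg_search_module(PETSC REQUIRED IMPORTED_TARGET PETSc petsc)` and link against `PkgConfig::PETSC` — shown only to illustrate the direction, not taken from the poster's project.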
> I've seen the suggestion
> <https://www.mail-archive.com/search?l=petsc-users@mcs.anl.gov&q=subject:%22%5C%5Bpetsc%5C-users%5C%5D+CMake+error+in+PETSc%22&o=newest&f=1>
> to add it to LD_LIBRARY_PATH, which I did with export
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib, and that
> points to:
> ls -l /path/to/petsc/arch-linux-c-debug/lib
> total 83732
> lrwxrwxrwx 1 rk3199 user 18 Oct 7 13:56 libpetsc.so -> libpetsc.so.3.18.0
> lrwxrwxrwx 1 rk3199 user 18 Oct 7 13:56 libpetsc.so.3.18 -> libpetsc.so.3.18.0
> -rwxr-xr-x 1 rk3199 user 85719200 Oct 7 13:56 libpetsc.so.3.18.0
> drwxr-xr-x 3 rk3199 user 4096 Oct 6 10:22 petsc
> drwxr-xr-x 2 rk3199 user 4096 Oct 6 10:23 pkgconfig
>
> Anything else to check?

If modifying CMakeLists.txt does not work, you can try
export LIBRARY_PATH=$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib
LD_LIBRARY_PATH is for run time, but the error happened at link time.

> On Fri, Oct 7, 2022 at 1:53 PM Satish Balay <ba...@mcs.anl.gov> wrote:
>
>> you can try
>>
>> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug
>> MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca
>> opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'"
>>
>> Wrt configure - it can be set with the --with-mpiexec option - it's saved in
>> PETSC_ARCH/lib/petsc/conf/petscvariables
>>
>> Satish
>>
>> On Fri, 7 Oct 2022, Rob Kudyba wrote:
>>
>> > We are on RHEL 8, using modules so that we can load/unload various
>> > versions of packages/libraries, and I have OpenMPI 4.1.1 with CUDA aware
>> > loaded along with GDAL 3.3.0, GCC 10.2.0, and cmake 3.22.1.
>> >
>> > make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug check
>> > fails with the below errors:
>> > Running check examples to verify correct installation
>> >
>> > Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
>> > Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
>> > See https://petsc.org/release/faq/
>> > --------------------------------------------------------------------------
>> > The library attempted to open the following supporting CUDA libraries,
>> > but each of them failed. CUDA-aware support is disabled.
>> > libcuda.so.1: cannot open shared object file: No such file or directory
>> > libcuda.dylib: cannot open shared object file: No such file or directory
>> > /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
>> > /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
>> > If you are not interested in CUDA-aware support, then run with
>> > --mca opal_warn_on_missing_libcuda 0 to suppress this message. If you are
>> > interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the
>> > location of libcuda.so.1 to get passed this issue.
>> > --------------------------------------------------------------------------
>> > --------------------------------------------------------------------------
>> > WARNING: There was an error initializing an OpenFabrics device.
>> >
>> > Local host: g117
>> > Local device: mlx5_0
>> > --------------------------------------------------------------------------
>> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>> > Number of SNES iterations = 2
>> > Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
>> > See https://petsc.org/release/faq/
>> >
>> > The library attempted to open the following supporting CUDA libraries,
>> > but each of them failed. CUDA-aware support is disabled.
>> > libcuda.so.1: cannot open shared object file: No such file or directory
>> > libcuda.dylib: cannot open shared object file: No such file or directory
>> > /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
>> > /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
>> > If you are not interested in CUDA-aware support, then run with
>> > --mca opal_warn_on_missing_libcuda 0 to suppress this message. If you are
>> > interested in CUDA-aware support, then try setting LD_LIBRARY_PATH to the
>> > location of libcuda.so.1 to get passed this issue.
>> >
>> > WARNING: There was an error initializing an OpenFabrics device.
>> >
>> > Local host: xxx
>> > Local device: mlx5_0
>> >
>> > lid velocity = 0.0016, prandtl # = 1., grashof # = 1.
>> > Number of SNES iterations = 2
>> > [g117:4162783] 1 more process has sent help message
>> > help-mpi-common-cuda.txt / dlopen failed
>> > [g117:4162783] Set MCA parameter "orte_base_help_aggregate" to 0 to see
>> > all help / error messages
>> > [g117:4162783] 1 more process has sent help message
>> > help-mpi-btl-openib.txt / error in device init
>> > Completed test examples
>> > Error while running make check
>> > gmake[1]: *** [makefile:149: check] Error 1
>> > make: *** [GNUmakefile:17: check] Error 2
>> >
>> > Where is $MPI_RUN set? I'd like to be able to pass options such as --mca
>> > orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml
>> > ucx --mca btl '^openib', which will help me troubleshoot and hide
>> > unneeded warnings.
>> >
>> > Thanks,
>> > Rob
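The link-time vs. run-time distinction drawn in this thread can be sketched in one place, using the thread's placeholder paths and assuming bash with GCC (which honors LIBRARY_PATH during linking): LIBRARY_PATH is what the "cannot find -lpetsc" step needed, while LD_LIBRARY_PATH only matters once the executable runs.

```shell
# Sketch using the thread's placeholder PETSC_DIR/PETSC_ARCH (bash, GCC assumed).
PETSC_DIR=/path/to/petsc
PETSC_ARCH=arch-linux-c-debug

# Link time: GCC consults LIBRARY_PATH while resolving -lpetsc.
export LIBRARY_PATH="$PETSC_DIR/$PETSC_ARCH/lib${LIBRARY_PATH:+:$LIBRARY_PATH}"

# Run time: the dynamic loader consults LD_LIBRARY_PATH to find libpetsc.so.3.18.
export LD_LIBRARY_PATH="$PETSC_DIR/$PETSC_ARCH/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "${LIBRARY_PATH%%:*}"   # → /path/to/petsc/arch-linux-c-debug/lib
```

With both exported, the mpiexec flags from the thread go through the MPIEXEC make variable, as shown earlier (`make ... MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 ..." check`), or can be baked in at configure time via --with-mpiexec.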