Re: [petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

2022-10-08 Thread Rob Kudyba
>
> Perhaps we can back up one step:
> Use your mpicc to build a "hello world" mpi test, then run it on a compute
> node (with GPU) to see if it works.
> If no, then your MPI environment has problems;
> If yes, then use it to build petsc (turn on petsc's gpu support,
>  --with-cuda  --with-cudac=nvcc), and then your code.
> --Junchao Zhang

OK, I tried this just to rule out the CUDA-capable OpenMPI as a factor:
./configure --with-debugging=0 --with-cmake=true   --with-mpi=true
 --with-mpi-dir=/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support --with-fc=0
  --with-cuda=1
[..]
cuda:
  Version:11.7
  Includes:   -I/path/to/cuda11.7/toolkit/11.7.1/include
  Libraries:  -Wl,-rpath,/path/to/cuda11.7/toolkit/11.7.1/lib64
-L/cm/shared/apps/cuda11.7/toolkit/11.7.1/lib64
-L/path/to/cuda11.7/toolkit/11.7.1/lib64/stubs -lcudart -lnvToolsExt
-lcufft -lcublas -lcusparse -lcusolver -lcurand -lcuda
  CUDA SM 75
  CUDA underlying compiler:
CUDA_CXX="/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/bin"/mpicxx
  CUDA underlying compiler flags: CUDA_CXXFLAGS=
  CUDA underlying linker libraries: CUDA_CXXLIBS=
[...]
 Configure stage complete. Now build PETSc libraries with:
   make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-opt all

C++ compiler version: g++ (GCC) 10.2.0
Using C++ compiler to compile PETSc
-
Using C/C++ linker:
/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/bin/mpicxx
Using C/C++ flags: -Wall -Wwrite-strings -Wno-strict-aliasing
-Wno-unknown-pragmas -Wno-lto-type-mismatch -fstack-protector
-fvisibility=hidden -g -O0
-
Using system modules:
shared:slurm/20.02.6:DefaultModules:openmpi/gcc/64/4.1.1_cuda_11.0.3_aware:gdal/3.3.0:cmake/3.22.1:cuda11.7/toolkit/11.7.1:openblas/dynamic/0.3.7:gcc/10.2.0
Using mpi.h: # 1
"/path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/include/mpi.h" 1
-
Using libraries: -Wl,-rpath,/path/to/petsc/arch-linux-cxx-debug/lib
-L/path/to/petsc/arch-linux-cxx-debug/lib -lpetsc -lopenblas -lm -lX11
-lquadmath -lstdc++ -ldl
--
Using mpiexec: mpiexec -mca orte_base_help_aggregate 0  -mca pml ucx --mca
btl '^openib'
--
Using MAKE: /path/to/petsc/arch-linux-cxx-debug/bin/make
Using MAKEFLAGS: -j24 -l48.0  --no-print-directory -- MPIEXEC=mpiexec\
-mca\ orte_base_help_aggregate\ 0\ \ -mca\ pml\ ucx\ --mca\ btl\ '^openib'
PETSC_ARCH=arch-linux-cxx-debug PETSC_DIR=/path/to/petsc
==
make[3]: Nothing to be done for 'libs'.
=
Now to check if the libraries are working do:
make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-cxx-debug check
=
[me@xxx petsc]$ make PETSC_DIR=/path/to/petsc
PETSC_ARCH=arch-linux-cxx-debug MPIEXEC="mpiexec -mca
orte_base_help_aggregate 0  -mca pml ucx --mca btl '^openib'" check
Running check examples to verify correct installation
Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-cxx-debug
C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes

./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Quadro RTX 8000
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(GB/s)
   3200 12.3

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(GB/s)
   3200 13.2

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes) Bandwidth(GB/s)
   3200 466.2

Result = PASS

On Sat, Oct 8, 2022 at 7:56 PM Barry Smith  wrote:

>
>   True, but when users send reports back to us they will never have used
> the VERBOSE=1 option, so it requires one more round trip of email to get
> this additional information.
>
> > On Oct 8, 2022, at 6:48 PM, Jed Brown  wrote:
> >
> > Barry Smith  writes:
> >
> >>   I hate these kinds of make rules that hide what the compiler is doing
> >> (in the name of having less output, I guess) it makes it difficult to
> >> figure out what is going wrong.
> >
> > You can make VERBOSE=1 with CMake-generated makefiles.
>


> Anyways, either some of the MPI libraries are missing from the link line
> or they are in the wrong order and thus it is not able to search them
> properly. Here is a bunch of discussions on why that error message can
> appear
> https://stackoverflow.com/questions/19901934/libpthread-so-0-error-adding-symbols-dso-missing-from-command-line
>


Still the same error, just with more noise. I have been using the suggested
LDFLAGS="-Wl,--copy-dt-needed-entries" along with make:
make[2]: Entering directory '/path/to/WTM/build'
cd /path/to/WTM/build && /path/to/cmake/cmake-3.22.1-linux-x86_64/bin/cmake
-E cmake_depends "Unix Makefiles" /path/to/WTM /path/to/WTM
/path/to/WTM/build 

Re: [petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

2022-10-08 Thread Barry Smith


  True, but when users send reports back to us they will never have used the 
VERBOSE=1 option, so it requires one more round trip of email to get this 
additional information. 

> On Oct 8, 2022, at 6:48 PM, Jed Brown  wrote:
> 
> Barry Smith  writes:
> 
>>   I hate these kinds of make rules that hide what the compiler is doing (in 
>> the name of having less output, I guess) it makes it difficult to figure out 
>> what is going wrong.
> 
> You can make VERBOSE=1 with CMake-generated makefiles.



Re: [petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

2022-10-08 Thread Jed Brown
Barry Smith  writes:

>I hate these kinds of make rules that hide what the compiler is doing (in 
> the name of having less output, I guess) it makes it difficult to figure out 
> what is going wrong.

You can make VERBOSE=1 with CMake-generated makefiles.
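
For anyone following along, a minimal sketch of capturing the verbose output and pulling out just the link step. It assumes the build directory is the current directory and that the failing target is wtm.x, as in the log elsewhere in this thread; the helper name link_lines and the file name build.log are made up for illustration.

```shell
#!/bin/sh
# Print only the lines of a verbose build log that link the given target.
# link_lines is a hypothetical helper, not part of CMake or make.
link_lines() {
    # $1 = log file, $2 = target name (e.g. wtm.x)
    grep -F -e "-o $2" "$1"
}

# Only run the build when this looks like a configured CMake build directory.
if [ -f CMakeCache.txt ]; then
    # VERBOSE=1 makes CMake-generated makefiles echo each full command.
    make VERBOSE=1 2>&1 | tee build.log
    link_lines build.log wtm.x
fi
```

Sending the link line itself along with the error saves a round trip, since it shows exactly which libraries were passed and in what order.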


Re: [petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

2022-10-08 Thread Junchao Zhang
Perhaps we can back up one step:
Use your mpicc to build a "hello world" mpi test, then run it on a compute
node (with GPU) to see if it works.
If no, then your MPI environment has problems;
If yes, then use it to build petsc (turn on petsc's gpu support,
--with-cuda  --with-cudac=nvcc), and then your code.
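
A minimal sketch of such a "hello world" test, written as a shell script that generates the C source, builds it with mpicc and runs it on two ranks. The file names and the helper write_hello_mpi are made up for illustration; the script assumes mpicc and mpiexec come from the MPI installation you intend to use and simply reports if they are not on PATH.

```shell
#!/bin/sh
# Write a minimal MPI program to the given path.
# write_hello_mpi is a hypothetical helper used only in this sketch.
write_hello_mpi() {
    cat > "$1" <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
}

workdir=$(mktemp -d)
write_hello_mpi "$workdir/hello_mpi.c"

# Build and run only when the MPI wrappers are on PATH.
if command -v mpicc >/dev/null 2>&1 && command -v mpiexec >/dev/null 2>&1; then
    if mpicc "$workdir/hello_mpi.c" -o "$workdir/hello_mpi" &&
       mpiexec -n 2 "$workdir/hello_mpi"; then
        echo "MPI hello world ran"
    else
        echo "MPI hello world failed; check the MPI environment first"
    fi
else
    echo "mpicc or mpiexec not found; load the MPI module first"
fi
```

On a cluster this would normally be run inside a job on the compute node, with whatever MCA options the site needs added to the mpiexec line.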

--Junchao Zhang


On Fri, Oct 7, 2022 at 10:45 PM Rob Kudyba  wrote:

> The error changes now and at an earlier place, 66% vs 70%:
> make LDFLAGS="-Wl,--copy-dt-needed-entries"
> Consolidate compiler generated dependencies of target fmt
> [ 12%] Built target fmt
> Consolidate compiler generated dependencies of target richdem
> [ 37%] Built target richdem
> Consolidate compiler generated dependencies of target wtm
> [ 62%] Built target wtm
> Consolidate compiler generated dependencies of target wtm.x
> [ 66%] Linking CXX executable wtm.x
> /usr/bin/ld: libwtm.a(transient_groundwater.cpp.o): undefined reference to
> symbol 'MPI_Abort'
> /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40: error
> adding symbols: DSO missing from command line
> collect2: error: ld returned 1 exit status
> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
> make[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/wtm.x.dir/all] Error 2
> make: *** [Makefile:136: all] Error 2
>
> So perhaps PETSc is now being found. Any other suggestions?
>
> On Fri, Oct 7, 2022 at 11:18 PM Rob Kudyba  wrote:
>
>>
>> Thanks for the quick reply. I added these options to make, but make check
>> still produced the warnings, so I used the command like this:
>> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug
>>  MPIEXEC="mpiexec -mca orte_base_help_aggregate 0 --mca
>> opal_warn_on_missing_libcuda 0 -mca pml ucx --mca btl '^openib'" check
>> Running check examples to verify correct installation
>> Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
>> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI
>> process
>> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI
>> processes
>> Completed test examples
>>
>> Could be useful for the FAQ.

>>> You mentioned you had "OpenMPI 4.1.1 with CUDA aware",  so I think a
>>> workable mpicc should automatically find cuda libraries.  Maybe you
>>> unloaded cuda libraries?
>>>
>> Oh, let me clarify: OpenMPI is CUDA-aware, however the node where PETSc is
>> compiling does not have a GPU and this code does not need one, so using
>> the MPIEXEC option during the 'check' worked to suppress the warning.
>>
>> I'm now trying to use PETSc to compile, and linking appears to go awry:
>> [ 58%] Building CXX object
>> CMakeFiles/wtm.dir/src/update_effective_storativity.cpp.o
>> [ 62%] Linking CXX static library libwtm.a
>> [ 62%] Built target wtm
>> [ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o
>> [ 70%] Linking CXX executable wtm.x
>> /usr/bin/ld: cannot find -lpetsc
>> collect2: error: ld returned 1 exit status
>> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
>> make[1]: *** [CMakeFiles/Makefile2:269: CMakeFiles/wtm.x.dir/all] Error 2
>> make: *** [Makefile:136: all] Error 2

>>> It seems cmake could not find petsc.   Look
>>> at $PETSC_DIR/share/petsc/CMakeLists.txt and try to modify your
>>> CMakeLists.txt.
>>>
>>
>> There is an explicit reference to the path in CMakeLists.txt:
>> # NOTE: You may need to update this path to identify PETSc's location
>> set(ENV{PKG_CONFIG_PATH}
>> "$ENV{PKG_CONFIG_PATH}:/path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/")
>> pkg_check_modules(PETSC PETSc>=3.17.1 IMPORTED_TARGET REQUIRED)
>> message(STATUS "Found PETSc ${PETSC_VERSION}")
>> add_subdirectory(common/richdem EXCLUDE_FROM_ALL)
>> add_subdirectory(common/fmt EXCLUDE_FROM_ALL)
>>
>> And that exists:
>> ls /path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/
>> petsc.pc  PETSc.pc
>>
>> Is there an environment variable I'm missing? I've seen the suggestion
>> to add it to LD_LIBRARY_PATH, which I did with export
>> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib and that
>> points to:
>>
>> ls -l /path/to/petsc/arch-linux-c-debug/lib
>> total 83732
>> lrwxrwxrwx 1 rk3199 user   18 Oct  7 13:56 libpetsc.so ->
>> libpetsc.so.3.18.0
>> lrwxrwxrwx 1 rk3199 user   18 Oct  7 13:56 libpetsc.so.3.18 ->
>> libpetsc.so.3.18.0
>> -rwxr-xr-x 1 rk3199 user 85719200 Oct  7 13:56 libpetsc.so.3.18.0
>> drwxr-xr-x 3 rk3199 user 4096 Oct  6 10:22 petsc
>> drwxr-xr-x 2 rk3199 user 4096 Oct  6 10:23 pkgconfig
>>
>> Anything else to check?
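
One thing that could be checked here: the CMakeLists.txt excerpt points pkg-config at arch-linux-cxx-debug, while the ls output is from arch-linux-c-debug. Whether that mismatch is the cause is not confirmed in the thread. The sketch below takes a set of linker flags (for example, what pkg-config reports for PETSc) and reports whether each -L directory actually contains a libpetsc library. The function name check_petsc_dirs is made up for illustration, and directories containing spaces are not handled.

```shell
#!/bin/sh
# For every -L<dir> in a set of linker flags, report whether a libpetsc
# library is present in <dir>. Returns 0 if at least one directory has it.
# check_petsc_dirs is a hypothetical helper used only in this sketch.
check_petsc_dirs() {
    # $1 = linker flags, e.g. the output of `pkg-config --libs PETSc`
    status=1
    for flag in $1; do
        case "$flag" in
            -L*)
                dir=${flag#-L}
                if ls "$dir"/libpetsc.* >/dev/null 2>&1; then
                    echo "ok:      $dir"
                    status=0
                else
                    echo "missing: $dir"
                fi
                ;;
        esac
    done
    return $status
}

# Only query pkg-config when it is installed and can see PETSc through the
# current PKG_CONFIG_PATH.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists PETSc; then
    check_petsc_dirs "$(pkg-config --libs PETSc)" ||
        echo "no libpetsc found in any -L directory"
fi
```

If every directory is reported as missing, the .pc file being picked up probably belongs to a different PETSC_ARCH than the one that was actually built.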

>>> If modifying  CMakeLists.txt does not work, you can try export
>>> LIBRARY_PATH=$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib
>>> LD_LIBRARY_PATH is for run time, but the error happened at link time.
>>>
>>
>> Yes that's what I already had. Any other 

Re: [petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

2022-10-08 Thread Barry Smith

   I hate these kinds of make rules that hide what the compiler is doing (in 
the name of having less output, I guess); they make it difficult to figure out 
what is going wrong.

   Anyway, either some of the MPI libraries are missing from the link line or 
they are in the wrong order, so the linker is not able to search them properly. 
Here are a number of discussions on why that error message can appear: 
https://stackoverflow.com/questions/19901934/libpthread-so-0-error-adding-symbols-dso-missing-from-command-line
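
As a rough illustration of the ordering point: with GNU ld, a static archive such as libwtm.a is searched at the position where it appears, so libraries that satisfy its undefined symbols (here MPI_Abort from libmpi) generally need to be named explicitly after it on the link line. The sketch below checks a link command for that ordering and, if an Open MPI wrapper is on PATH, prints the link flags it would add. The function name mpi_follows_archive is made up for illustration.

```shell
#!/bin/sh
# Return 0 if an MPI library appears after the named archive on a link line.
# mpi_follows_archive is a hypothetical helper used only in this sketch.
mpi_follows_archive() {
    # $1 = full link command, $2 = archive name (e.g. libwtm.a)
    seen_archive=no
    for tok in $1; do
        case "$tok" in
            *"$2") seen_archive=yes ;;
            -lmpi*|*libmpi.so*)
                [ "$seen_archive" = yes ] && return 0 ;;
        esac
    done
    return 1
}

# Ask the Open MPI compiler wrapper which link flags it would add.
# --showme:link is an Open MPI wrapper option; other MPI wrappers differ.
if command -v mpicxx >/dev/null 2>&1; then
    mpicxx --showme:link 2>/dev/null ||
        echo "mpicxx does not accept --showme:link (not an Open MPI wrapper?)"
fi
```

Feeding the link line from a verbose build into mpi_follows_archive gives a quick yes or no on whether libmpi is listed after libwtm.a.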


  Barry


> On Oct 7, 2022, at 11:45 PM, Rob Kudyba  wrote:
> 
> The error changes now and at an earlier place, 66% vs 70%:
> make LDFLAGS="-Wl,--copy-dt-needed-entries"
> Consolidate compiler generated dependencies of target fmt
> [ 12%] Built target fmt
> Consolidate compiler generated dependencies of target richdem
> [ 37%] Built target richdem
> Consolidate compiler generated dependencies of target wtm
> [ 62%] Built target wtm
> Consolidate compiler generated dependencies of target wtm.x
> [ 66%] Linking CXX executable wtm.x
> /usr/bin/ld: libwtm.a(transient_groundwater.cpp.o): undefined reference to 
> symbol 'MPI_Abort'
> /path/to/openmpi-4.1.1_ucx_cuda_11.0.3_support/lib/libmpi.so.40: error adding 
> symbols: DSO missing from command line
> collect2: error: ld returned 1 exit status
> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
> make[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/wtm.x.dir/all] Error 2
> make: *** [Makefile:136: all] Error 2
> 
> So perhaps PETSc is now being found. Any other suggestions?
> 
> On Fri, Oct 7, 2022 at 11:18 PM Rob Kudyba wrote:
> 
> Thanks for the quick reply. I added these options to make and make check 
> still produce the warnings so I used the command like this:
> make PETSC_DIR=/path/to/petsc PETSC_ARCH=arch-linux-c-debug  MPIEXEC="mpiexec 
> -mca orte_base_help_aggregate 0 --mca opal_warn_on_missing_libcuda 0 -mca pml 
> ucx --mca btl '^openib'" check
> Running check examples to verify correct installation
> Using PETSC_DIR=/path/to/petsc and PETSC_ARCH=arch-linux-c-debug
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> Completed test examples
> 
> Could be useful for the FAQ.
> You mentioned you had "OpenMPI 4.1.1 with CUDA aware",  so I think a workable 
> mpicc should automatically find cuda libraries.  Maybe you unloaded cuda 
> libraries?
> Oh, let me clarify: OpenMPI is CUDA-aware, however the node where PETSc is 
> compiling does not have a GPU and this code does not need one, so using the 
> MPIEXEC option during the 'check' worked to suppress the warning. 
> 
> I'm now trying to use PETSc to compile, and linking appears to go awry:
> [ 58%] Building CXX object 
> CMakeFiles/wtm.dir/src/update_effective_storativity.cpp.o
> [ 62%] Linking CXX static library libwtm.a
> [ 62%] Built target wtm
> [ 66%] Building CXX object CMakeFiles/wtm.x.dir/src/WTM.cpp.o
> [ 70%] Linking CXX executable wtm.x
> /usr/bin/ld: cannot find -lpetsc
> collect2: error: ld returned 1 exit status
> make[2]: *** [CMakeFiles/wtm.x.dir/build.make:103: wtm.x] Error 1
> make[1]: *** [CMakeFiles/Makefile2:269: CMakeFiles/wtm.x.dir/all] Error 2
> make: *** [Makefile:136: all] Error 2
> It seems cmake could not find petsc.   Look at 
> $PETSC_DIR/share/petsc/CMakeLists.txt and try to modify your CMakeLists.txt.
> 
> There is an explicit reference to the path in CMakeLists.txt:
> # NOTE: You may need to update this path to identify PETSc's location
> set(ENV{PKG_CONFIG_PATH} 
> "$ENV{PKG_CONFIG_PATH}:/path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/")
> pkg_check_modules(PETSC PETSc>=3.17.1 IMPORTED_TARGET REQUIRED)
> message(STATUS "Found PETSc ${PETSC_VERSION}")
> add_subdirectory(common/richdem EXCLUDE_FROM_ALL)
> add_subdirectory(common/fmt EXCLUDE_FROM_ALL)
>  
> And that exists:
> ls /path/to/petsc/arch-linux-cxx-debug/lib/pkgconfig/
> petsc.pc  PETSc.pc
> 
>  Is there an environment variable I'm missing? I've seen the suggestion 
> 
>  to add it to LD_LIBRARY_PATH which I did with export 
> LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib and that points 
> to:
> ls -l /path/to/petsc/arch-linux-c-debug/lib
> total 83732
> lrwxrwxrwx 1 rk3199 user   18 Oct  7 13:56 libpetsc.so -> 
> libpetsc.so.3.18.0
> lrwxrwxrwx 1 rk3199 user   18 Oct  7 13:56 libpetsc.so.3.18 -> 
> libpetsc.so.3.18.0
> -rwxr-xr-x 1 rk3199 user 85719200 Oct  7 13:56 libpetsc.so.3.18.0
> drwxr-xr-x 3 rk3199 user 4096 Oct  6 10:22 petsc
> drwxr-xr-x 2 rk3199 user 4096 Oct  6 10:23 pkgconfig
> 
> Anything else to check?
> If modifying  CMakeLists.txt does not work, you can try export 
> LIBRARY_PATH=$LIBRARY_PATH:$PETSC_DIR/$PETSC_ARCH/lib
> LD_LIBRARY_PATH is for run time, but the error