I'm bringing up EasyBuild on our new AMD "Genoa" platform: a single AMD EPYC 9124 16-core processor with 2 threads per core, 384 GB RAM, an Omni-Path (OPA) fabric, and AlmaLinux 8.8.

I wiped our existing EB modules so as to start with a clean slate. The goal is to build the foss-2023a toolchain as a starting point for further modules.

I previously hit the same error shown below with OpenMPI-4.1.4-GCC-12.2.0.eb, and Kenneth suggested that the lack of InfiniBand hardware might be the cause. I had an Omni-Path (OPA) adapter lying around, so I installed it in the system and verified that IPoIB works as expected.

Unfortunately, the build of OpenMPI-4.1.5-GCC-12.3.0.eb still fails in the sanity check step with the same "PML cm cannot be selected" error as before:

== 2023-10-03 09:36:16,437 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:126 in __init__): Sanity check failed: sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 (output: [e000.nifl.fysik.dtu.dk:2392967] PML cm cannot be selected
[e000.nifl.fysik.dtu.dk:2392963] PML cm cannot be selected
)
sanity check command mpirun -n 1 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 (output: [e000.nifl.fysik.dtu.dk:2392988] PML cm cannot be selected
)
sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_mpifh exited with code 1 (output: [e000.nifl.fysik.dtu.dk:2393008] PML cm cannot be selected
)
sanity check command mpirun -n 1 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_mpifh exited with code 1 (output: [e000.nifl.fysik.dtu.dk:2393029] PML cm cannot be selected
)
sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_usempi exited with code 1 (output: [e000.nifl.fysik.dtu.dk:2393042] PML cm cannot be selected
)
sanity check command mpirun -n 1 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_usempi exited with code 1 (output: [e000.nifl.fysik.dtu.dk:2393070] PML cm cannot be selected
) (at easybuild/framework/easyblock.py:3655 in _sanity_check_step)
== 2023-10-03 09:36:16,437 build_log.py:267 INFO ... (took 5 secs)
== 2023-10-03 09:36:16,437 filetools.py:2012 INFO Removing lock /home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock...
== 2023-10-03 09:36:16,438 filetools.py:383 INFO Path /home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock successfully removed.
== 2023-10-03 09:36:16,438 filetools.py:2016 INFO Lock removed: /home/modules/software/.locks/_home_modules_software_OpenMPI_4.1.5-GCC-12.3.0.lock
== 2023-10-03 09:36:16,438 easyblock.py:4277 WARNING build failed (first 300 chars): Sanity check failed: sanity check command OMPI_MCA_rmaps_base_oversubscribe=1 mpirun -n 8 /dev/shm/OpenMPI/4.1.5/GCC-12.3.0/mpi_test_hello_c exited with code 1 (output: node[e000.nifl.fysik.dtu.dk:2392967] PML cm cannot be selected
[e000.nifl.fysik.dtu.dk:2392963] PML cm cannot be selected
)
sanity chec
== 2023-10-03 09:36:16,438 easyblock.py:328 INFO Closing log for application name OpenMPI version 4.1.5
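
In case it helps when reading longer logs, here is a minimal sketch of how the failing sanity-check commands and the distinct Open MPI messages could be pulled out of output like the above. It assumes the log format shown in this excerpt; the function name `summarize_sanity_failures` is my own and is not part of EasyBuild.

```python
import re
from collections import Counter

# One "sanity check command ... exited with code N" entry, as in the excerpt.
FAILED_CMD = re.compile(r"sanity check command (.+?) exited with code (\d+)")
# The Open MPI message that follows a "[host:pid]" prefix.
MPI_MSG = re.compile(r"\[[^\]\s]+:\d+\]\s*(.+)")


def summarize_sanity_failures(log_text):
    """Return (unique failed commands with exit codes, counts of MPI messages)."""
    commands = []
    for cmd, code in FAILED_CMD.findall(log_text):
        entry = (cmd, int(code))
        # The log repeats the first failure in its "build failed" summary,
        # so keep each (command, exit code) pair only once.
        if entry not in commands:
            commands.append(entry)
    messages = Counter(msg.strip() for msg in MPI_MSG.findall(log_text))
    return commands, messages
```

Calling it on the full log text (for example the contents of the EasyBuild log file read into a string) gives the list of failed commands plus a count per message, which makes it easy to confirm that every failure carries the same "PML cm cannot be selected" message.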


Since we are now using the latest GCC 12.3.0 and have installed an OPA adapter, the problem would seem to be related to the AMD "Genoa" hardware.

Does anyone have suggestions for building OpenMPI successfully on this platform?

Thanks,
Ole

--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
