Hi all,
With the latest RHEL 8.6 kernel, 4.18.0-372.26.1.el8_6.x86_64, the Intel
MPI issue has been resolved! The older Intel MPI versions work again on
the new kernel; see
https://github.com/easybuilders/easybuild-easyconfigs/issues/15651. Here
is a copy of my comment:
The EL 8.6 kernel kernel-4.18.0-372.26.1.el8_6.x86_64.rpm has been
available in both AlmaLinux and Rocky Linux since last night! RHEL 8.6
was updated a couple of days ago.
I've upgraded an EL 8.6 server, and the cpulist file now has size > 0 as
expected:
$ uname -r
4.18.0-372.26.1.el8_6.x86_64
$ ls -l /sys/devices/system/node/node0/cpulist
-r--r--r--. 1 root root 28672 Sep 15 07:42
/sys/devices/system/node/node0/cpulist
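A quick way to spot the regression on other nodes might be a small check
like this (just a sketch; check_cpulist is a hypothetical helper name,
not part of any tool mentioned above):

```shell
# Sketch of a node health check for the EL 8.6 cpulist regression.
# On the broken kernel the sysfs file read back with size 0; on the
# fixed kernel it is non-empty. check_cpulist is a hypothetical helper.
check_cpulist() {
    f="$1"   # e.g. /sys/devices/system/node/node0/cpulist
    if [ -s "$f" ]; then
        echo "OK: $f is non-empty"
    else
        echo "BROKEN: $f reads back empty"
        return 1
    fi
}
```

On an affected node one would call it with the sysfs path, e.g.
check_cpulist /sys/devices/system/node/node0/cpulist, for instance from
a node health-check script.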
I then tested all our Intel toolchains on this system:
$ ml intel/2020b
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2019 Update 9 Build 20200923
(id: abd58e492)
Copyright 2003-2020, Intel Corporation.
$ ml purge
$ ml intel/2021b
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.4 Build 20210831 (id:
758087adf)
Copyright 2003-2021, Intel Corporation.
$ ml purge
$ ml intel/2022a
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id:
28877f3f32)
Copyright 2003-2022, Intel Corporation.
As you can see, Intel MPI is now working correctly again :-)) It was OK
on EL 8.5, but broken on EL 8.6 until the above-listed kernel was released.
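For admins rolling this out cluster-wide, a small version check might
help decide whether a node already runs a fixed kernel. This is only a
sketch: kernel_has_fix is a hypothetical helper, and 372.26.1 is simply
the first fixed build number mentioned above.

```shell
# Sketch: does an EL 8 kernel release string carry the cpulist fix?
# The first fixed build mentioned above is 4.18.0-372.26.1.el8_6, so we
# extract the "372.26.1" build part and compare it with sort -V.
# kernel_has_fix is a hypothetical helper name.
kernel_has_fix() {
    build=$(echo "$1" | sed -n 's/^4\.18\.0-\([0-9.]*\)\.el8.*/\1/p')
    [ -n "$build" ] || return 1              # not an EL 8 kernel string
    lowest=$(printf '%s\n' "$build" 372.26.1 | sort -V | head -n 1)
    [ "$lowest" = "372.26.1" ]               # true if build >= 372.26.1
}
```

One would then run something like: kernel_has_fix "$(uname -r)" && echo fixed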
Best regards,
Ole
On 6/9/22 11:09, Ole Holm Nielsen wrote:
Hi Alan,
Thanks a lot for the feedback! I've opened a new issue now:
https://github.com/easybuilders/easybuild-easyconfigs/issues/15651
Best regards,
Ole
On 6/9/22 10:52, Alan O'Cais wrote:
Ole,
Can you please copy this over to an issue in
https://github.com/easybuilders/easybuild-easyconfigs/issues so we can
keep track of things there? It is also being discussed in Slack but we
should really have the discussion and progress in a location where
anyone can find it.
If you don't have a GitHub account, can you give me permission to copy
the content of your email over to create the issue?
Thanks,
Alan
On Wed, 25 May 2022 at 10:54, Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
Hi Easybuilders,
I'm testing the upgrade of our compute nodes from AlmaLinux 8.5 to 8.6
(a RHEL 8 clone, similar to Rocky Linux).
We have found that *all* MPI codes built with any of the Intel
toolchains intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade.
The codes also fail on login nodes, so the Slurm queue system is not
involved.
The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6,
however.
My simple test uses the attached trivial MPI Hello World code running on
a single node:
$ module load intel/2021b
$ mpicc mpi_hello_world.c
$ mpirun ./a.out
Now the mpirun command enters an infinite loop (running many minutes),
and we see these processes with "ps":
/bin/sh
/home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpirun
./a.out
mpiexec.hydra ./a.out
The mpiexec.hydra process doesn't respond to 15/SIGTERM, and I have to
kill it with 9/SIGKILL. I've tried to enable debugging output with
export I_MPI_HYDRA_DEBUG=1
export I_MPI_DEBUG=5
but nothing gets printed from this.
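Until the cause is found, one workaround sketch is to wrap the launcher
in a hard deadline so jobs fail fast instead of hanging forever. This
assumes coreutils timeout is available; run_with_deadline is a
hypothetical helper, and SIGKILL is used because SIGTERM was ignored as
noted above.

```shell
# Sketch: run a possibly-hanging MPI launcher under a hard deadline.
# run_with_deadline is a hypothetical helper; coreutils timeout exits
# with status 137 (128+9) when it has to SIGKILL the command.
run_with_deadline() {
    secs="$1"; shift
    timeout --signal=KILL "$secs" "$@"
    status=$?
    if [ "$status" -eq 137 ]; then
        echo "command hung and was killed after ${secs}s" >&2
    fi
    return "$status"
}
```

For the test above one might then run: run_with_deadline 600 mpirun ./a.out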
Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and
mpiexec.hydra? Can you suggest how I may debug this issue?
OS information:
$ cat /etc/redhat-release
AlmaLinux release 8.6 (Sky Tiger)
$ uname -r
4.18.0-372.9.1.el8.x86_64
Thanks a lot,
Ole
-- Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark
--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620