Hi all,

With the latest update of the RHEL 8.6 kernel, 4.18.0-372.26.1.el8_6.x86_64, the Intel MPI issue has been resolved! The older Intel MPI versions work again on the new kernel; see https://github.com/easybuilders/easybuild-easyconfigs/issues/15651. Here is a copy of my comment:

The EL 8.6 kernel kernel-4.18.0-372.26.1.el8_6.x86_64.rpm has been available in both AlmaLinux and Rocky Linux since last night! RHEL 8.6 itself was updated a couple of days ago.
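
In case it's useful, a quick way to check whether the fixed kernel has reached your mirrors and is installed (just a sketch; adjust to your repository setup):

$ dnf list --showduplicates kernel | grep 372.26.1
$ rpm -q kernel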

I've upgraded an EL 8.6 server, and the cpulist file now has size > 0 as expected:

$ uname -r
4.18.0-372.26.1.el8_6.x86_64
$ ls -l /sys/devices/system/node/node0/cpulist
-r--r--r--. 1 root root 28672 Sep 15 07:42 /sys/devices/system/node/node0/cpulist
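
To check all NUMA nodes in one go, a simple loop like this can be used (not part of the original test, just a convenience):

$ for f in /sys/devices/system/node/node*/cpulist; do echo "$f: $(cat "$f")"; done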

I have also tested all our Intel toolchains on this system:

$ ml intel/2020b
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2019 Update 9 Build 20200923 (id: abd58e492)
Copyright 2003-2020, Intel Corporation.
$ ml purge
$ ml intel/2021b
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.4 Build 20210831 (id: 758087adf)
Copyright 2003-2021, Intel Corporation.
$ ml purge
$ ml intel/2022a
$ mpiexec.hydra --version
Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)
Copyright 2003-2022, Intel Corporation.
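
The same check can also be scripted across the toolchains, e.g. (module names as listed above):

$ for tc in intel/2020b intel/2021b intel/2022a; do ml purge; ml "$tc"; mpiexec.hydra --version; done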

As you can see, Intel MPI is now working correctly again :-)) It was OK on EL 8.5, but broken on EL 8.6 until the above-listed kernel was released.

Best regards,
Ole



On 6/9/22 11:09, Ole Holm Nielsen wrote:
Hi Alan,

Thanks a lot for the feedback!  I've opened a new issue now:
https://github.com/easybuilders/easybuild-easyconfigs/issues/15651

Best regards,
Ole

On 6/9/22 10:52, Alan O'Cais wrote:
Ole,

Can you please copy this over to an issue in https://github.com/easybuilders/easybuild-easyconfigs/issues so we can keep track of things there? It is also being discussed in Slack, but we should really have the discussion and progress in a location where anyone can find it.

If you don't have a GitHub account, can you give me permission to copy the content of your email over to create the issue?

Thanks,

Alan

On Wed, 25 May 2022 at 10:54, Ole Holm Nielsen <ole.h.niel...@fysik.dtu.dk> wrote:

    Hi Easybuilders,

    I'm testing the upgrade of our compute nodes from AlmaLinux 8.5 to 8.6
    (the RHEL 8 clone similar to Rocky Linux).

    We have found that *all* MPI codes built with any of the Intel toolchains
    intel/2020b or intel/2021b fail after the 8.5 to 8.6 upgrade.  The codes
    also fail on login nodes, so the Slurm queue system is not involved.
    The FOSS toolchains foss/2020b and foss/2021b work perfectly on EL 8.6,

    My simple test uses the attached trivial MPI Hello World code running on a
    single node:

    $ module load intel/2021b
    $ mpicc mpi_hello_world.c
    $ mpirun ./a.out
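
    If the attachment doesn't make it through the list, any minimal MPI Hello
    World along these lines should reproduce the test (just a sketch, not
    necessarily identical to the attached file):

$ cat > mpi_hello_world.c <<'EOF'
/* Minimal MPI Hello World: every rank reports its rank and the world size */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF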

    Now the mpirun command enters an infinite loop (running for many minutes),
    and we see these processes with "ps":

    /bin/sh /home/modules/software/impi/2021.4.0-intel-compilers-2021.4.0/mpi/2021.4.0/bin/mpirun ./a.out
    mpiexec.hydra ./a.out
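
    (For reference, the process tree can be inspected with something like the
    following; the exact options are not important:)

    $ ps -ef --forest | grep -B2 -A2 mpiexec.hydra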

    The mpiexec.hydra process doesn't respond to 15/SIGTERM and I have to kill
    it with 9/SIGKILL.  I've tried to enable debugging output with
    export I_MPI_HYDRA_DEBUG=1
    export I_MPI_DEBUG=5
    but nothing gets printed from this.
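
    One way to avoid cleaning up the hung process by hand each time is to wrap
    the run in a timeout (just a sketch; the 60-second limit is arbitrary):

    $ export I_MPI_HYDRA_DEBUG=1 I_MPI_DEBUG=5
    $ timeout -s KILL 60 mpirun ./a.out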

    Question: Has anyone tried an EL 8.6 Linux with the Intel toolchain and
    mpiexec.hydra?  Can you suggest how I may debug this issue?

    OS information:

    $ cat /etc/redhat-release
    AlmaLinux release 8.6 (Sky Tiger)
    $ uname -r
    4.18.0-372.9.1.el8.x86_64

    Thanks a lot,
    Ole

    --
    Ole Holm Nielsen
    PhD, Senior HPC Officer
    Department of Physics, Technical University of Denmark



--
Ole Holm Nielsen
PhD, Senior HPC Officer
Department of Physics, Technical University of Denmark,
Fysikvej Building 309, DK-2800 Kongens Lyngby, Denmark
E-mail: ole.h.niel...@fysik.dtu.dk
Homepage: http://dcwww.fysik.dtu.dk/~ohnielse/
Mobile: (+45) 5180 1620
