Public bug reported:

Which version of Ubuntu:
The newest daily build of Ubuntu Focal, with all updates applied as of today.

What I want to accomplish:
Use slurm-wlm in combination with Open MPI in a simple test.

How to reproduce:
1) Use slurm-wlm on a small single-node test setup, with just the example Slurm
conf copied and the server name changed accordingly in the slurm.conf file at
> SlurmctldHost=srv0
and
> NodeName=srv0 State=UNKNOWN
2) Start the slurmctld and slurmd daemons
3) Create a small hello world sample for the MPI test
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
4) mpicc test.c
5) mpirun ./a.out works correctly

6) Try the same with Slurm: srun --mpi=pmix ./a.out
which gives me
> srun: error: (null) [0] /mpi_pmix.c:133 [init] mpi/pmix: ERROR: pmi/pmix: can not load PMIx library
> srun: error: Couldn't load specified plugin name for mpi/pmix: Plugin init() callback failed
> srun: error: cannot create mpi context for mpi/pmix
> srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types

7) Test whether the pmix plugin is really supported with: srun --mpi=list
which gives me
> srun: MPI types are...
> srun: pmix_v3
> srun: none
> srun: pmi2
> srun: pmix
> srun: openmpi

8) More verbose output of the failing command: strace srun --mpi=pmix ./a.out
Shortened output (it shows that the library is not at the path where Slurm
expects it):

openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
poll([{fd=2, events=POLLOUT}], 1, 5000) = 1 ([{fd=2, revents=POLLOUT}])
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
write(2, "srun: error: (null) [0] /mpi_pmi"..., 100srun: error: (null) [0] /mpi_pmix.c:133 [init] mpi/pmix: ERROR: pmi/pmix: can not load PMIx library
) = 100

9) Check the actual contents of the library directory with:
ls -lh /usr/lib/x86_64-linux-gnu/pmix/lib/

total 2,6M
lrwxrwxrwx 1 root root   29 Okt 19 19:57 libmca_common_dstore.so.1 -> libmca_common_dstore.so.1.0.1
-rw-r--r-- 1 root root  59K Okt 19 19:57 libmca_common_dstore.so.1.0.1
lrwxrwxrwx 1 root root   16 Okt 19 19:57 libpmi2.so.1 -> libpmi2.so.1.0.0
-rw-r--r-- 1 root root 863K Okt 19 19:57 libpmi2.so.1.0.0
lrwxrwxrwx 1 root root   15 Okt 19 19:57 libpmi.so.1 -> libpmi.so.1.0.1
-rw-r--r-- 1 root root 863K Okt 19 19:57 libpmi.so.1.0.1
lrwxrwxrwx 1 root root   17 Okt 19 19:57 libpmix.so.2 -> libpmix.so.2.2.24
-rw-r--r-- 1 root root 847K Okt 19 19:57 libpmix.so.2.2.24
drwxr-xr-x 3 root root 4,0K Feb 11 21:43 pmix

10) The library is actually there, but not under the name Slurm looks for:
Slurm wants "libpmix.so", but the installed name is "libpmix.so.2"

11) Create a link with the expected name:
ln -s /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24 /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so

12) Try Slurm again:
$ srun --mpi=pmix -n4 ~segler/a.out 
Hello world from processor srv0, rank 0 out of 4 processors
Hello world from processor srv0, rank 2 out of 4 processors
Hello world from processor srv0, rank 1 out of 4 processors
Hello world from processor srv0, rank 3 out of 4 processors

13) it works!!! :)
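
The strace run in step 8 produces a lot of output. Below is a minimal sketch of how it could be narrowed down to the failed library lookups. It assumes strace's -f and -e trace=openat options; the helper name failed_opens is made up for illustration. The srun call itself is only shown as a comment because it needs a working Slurm setup.

```shell
#!/bin/sh
# failed_opens (hypothetical helper): read strace output on stdin and print
# the paths of openat() calls that failed with ENOENT.
failed_opens() {
    grep 'openat(' | grep 'ENOENT' | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p'
}

# Usage sketch (not executed here, needs a running slurmctld/slurmd):
#   strace -f -e trace=openat srun --mpi=pmix ./a.out 2>&1 | failed_opens | grep pmix
```

strace writes to stderr, hence the 2>&1 before the pipe. Fed the openat line quoted in step 8, the helper prints just the path /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.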

Could you add the library link to the libpmix2 package?
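
Until the package ships such a link, the manual check and workaround from steps 9 to 11 could be scripted roughly as follows. This is only a sketch: the helper name suggest_pmix_link is hypothetical, and unlike step 11 it points the link at the shortest versioned name (libpmix.so.2 here) rather than at libpmix.so.2.2.24, on the assumption that this name survives minor library updates. It prints the ln command instead of running it.

```shell
#!/bin/sh
# suggest_pmix_link (hypothetical helper): given the PMIx library directory,
# print the "ln -s" command that would create the missing unversioned name.
# It only prints the command; it does not modify anything.
suggest_pmix_link() {
    dir=$1
    if [ -e "$dir/libpmix.so" ]; then
        echo "ok: $dir/libpmix.so exists"
        return 0
    fi
    # Take the first versioned name in glob order, which is the shortest
    # one (e.g. libpmix.so.2 sorts before libpmix.so.2.2.24).
    target=
    for f in "$dir"/libpmix.so.[0-9]*; do
        [ -e "$f" ] || continue
        target=${f##*/}
        break
    done
    if [ -z "$target" ]; then
        echo "no libpmix.so.* found in $dir"
        return 1
    fi
    # A relative target is resolved relative to the directory of the link.
    echo "ln -s $target $dir/libpmix.so"
}

# Usage sketch:
#   suggest_pmix_link /usr/lib/x86_64-linux-gnu/pmix/lib
```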

$ dpkg -S /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24
libpmix2:amd64: /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24

Thank you!

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: slurm-wlm 19.05.5-1
ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
Uname: Linux 5.4.0-12-generic x86_64
ApportVersion: 2.20.11-0ubuntu16
Architecture: amd64
Date: Tue Feb 11 23:01:18 2020
InstallationDate: Installed on 2020-02-11 (0 days ago)
InstallationMedia: Ubuntu-Server 20.04 LTS "Focal Fossa" - Alpha amd64 (20200124)
ProcEnviron:
 SHELL=/bin/bash
 LANG=de_DE.UTF-8
 TERM=xterm-256color
 PATH=(custom, no user)
SourcePackage: slurm-llnl
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: slurm-llnl (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug focal

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1862854

Title:
  srun with pmix plugin searches .so file at wrong location
