Public bug reported:
Which version of Ubuntu:
The distribution is the newest daily build of Ubuntu Focal, with all updates applied as of today.
What I want to accomplish:
Use slurm-wlm in combination with Open MPI for a simple test.
How to reproduce:
1) Set up slurm-wlm on a small single-node test machine using just the copied example slurm.conf, with the server name changed accordingly in these lines:
> SlurmctldHost=srv0
and
> NodeName=srv0 State=UNKNOWN
2) Start the slurmctld and slurmd daemons.
3) Create a small MPI hello-world test program (test.c):
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
    return 0;
}
4) Compile it: mpicc test.c
5) Running mpirun ./a.out works correctly.
6) Try the same with Slurm: srun --mpi=pmix ./a.out
This gives me:
> srun: error: (null) [0] /mpi_pmix.c:133 [init] mpi/pmix: ERROR: pmi/pmix: can not load PMIx library
> srun: error: Couldn't load specified plugin name for mpi/pmix: Plugin init() callback failed
> srun: error: cannot create mpi context for mpi/pmix
> srun: error: invalid MPI type 'pmix', --mpi=list for acceptable types
7) Check whether the pmix plugin is really supported: srun --mpi=list
This gives me:
> srun: MPI types are...
> srun: pmix_v3
> srun: none
> srun: pmi2
> srun: pmix
> srun: openmpi
8) Get more verbose output for the failing command: strace srun --mpi=pmix ./a.out
Shortened output (it shows that the library is actually not at the path where Slurm expects it):
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
poll([{fd=2, events=POLLOUT}], 1, 5000) = 1 ([{fd=2, revents=POLLOUT}])
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}) = 0
write(2, "srun: error: (null) [0] /mpi_pmi"..., 100srun: error: (null) [0] /mpi_pmix.c:133 [init] mpi/pmix: ERROR: pmi/pmix: can not load PMIx library
) = 100
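The failing openat() call in the strace output is an ordinary open of a file that does not exist. A minimal C sketch of the same check, which can be run outside Slurm, is shown below. The helper name probe_library is hypothetical. It only mirrors the flags visible in the strace (O_RDONLY|O_CLOEXEC) and is not Slurm's own loading code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` the same way the strace shows
 * (O_RDONLY|O_CLOEXEC). Returns 0 if the file could be opened, otherwise
 * the errno value; ENOENT corresponds to "No such file or directory". */
int probe_library(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

On the affected system, probe_library("/usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so") would be expected to return ENOENT. The same call on .../libpmix.so.2 would be expected to return 0, matching the directory listing in step 9.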
9) Check the actual contents of the library directory: ls -lh /usr/lib/x86_64-linux-gnu/pmix/lib/
total 2,6M
lrwxrwxrwx 1 root root 29 Okt 19 19:57 libmca_common_dstore.so.1 -> libmca_common_dstore.so.1.0.1
-rw-r--r-- 1 root root 59K Okt 19 19:57 libmca_common_dstore.so.1.0.1
lrwxrwxrwx 1 root root 16 Okt 19 19:57 libpmi2.so.1 -> libpmi2.so.1.0.0
-rw-r--r-- 1 root root 863K Okt 19 19:57 libpmi2.so.1.0.0
lrwxrwxrwx 1 root root 15 Okt 19 19:57 libpmi.so.1 -> libpmi.so.1.0.1
-rw-r--r-- 1 root root 863K Okt 19 19:57 libpmi.so.1.0.1
lrwxrwxrwx 1 root root 17 Okt 19 19:57 libpmix.so.2 -> libpmix.so.2.2.24
-rw-r--r-- 1 root root 847K Okt 19 19:57 libpmix.so.2.2.24
drwxr-xr-x 3 root root 4,0K Feb 11 21:43 pmix
10) The library is actually there, but its name is not quite what Slurm expects:
Slurm wants "libpmix.so", but the real name is "libpmix.so.2" (a symlink to libpmix.so.2.2.24).
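The mismatch in step 10 is between the unversioned name libpmix.so and the versioned names libpmix.so.2 / libpmix.so.2.2.24 that are actually installed. The C sketch below makes that difference visible. It scans a directory for the exact unversioned name and, if that is missing, reports a versioned "libpmix.so.<N>" entry instead. The function find_library is hypothetical and is not Slurm's lookup logic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look in `dir` for a library named `base`
 * (e.g. "libpmix.so"). Writes the matching path into `out` and returns
 *   1 if the exact unversioned name exists,
 *   2 if only a versioned name such as "libpmix.so.2" exists,
 *   0 if nothing matches or the directory cannot be opened.
 * readdir() order is unspecified, so with several versioned entries
 * the one reported is arbitrary. */
int find_library(const char *dir, const char *base, char *out, size_t outlen)
{
    assert(dir && base && out && outlen > 0);
    DIR *d = opendir(dir);
    if (!d)
        return 0;

    size_t blen = strlen(base);
    int result = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strncmp(e->d_name, base, blen) != 0)
            continue;
        if (e->d_name[blen] == '\0') {
            /* Exact unversioned name: this is what the plugin tries to open. */
            snprintf(out, outlen, "%s/%s", dir, e->d_name);
            result = 1;
            break;
        }
        if (e->d_name[blen] == '.' && result == 0) {
            /* Versioned name: remember it, but keep looking for the exact one. */
            snprintf(out, outlen, "%s/%s", dir, e->d_name);
            result = 2;
        }
    }
    closedir(d);
    return result;
}
```

Run against /usr/lib/x86_64-linux-gnu/pmix/lib with base "libpmix.so", it would be expected to return 2 before the symlink from step 11 exists and 1 afterwards.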
11) Work around it by creating a symlink with the expected name:
ln -s /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24 /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so
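The workaround in step 11 amounts to a single symlink(2) call. Below is a C sketch of the same operation that is safe to re-run. The name make_symlink is hypothetical. An existing link that already points at the same target counts as success. Anything else at that path is left untouched and reported as EEXIST. Creating the real link under /usr/lib still requires root, just like the ln -s command.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: equivalent of `ln -s target linkpath`.
 * Returns 0 if the link was created or already points at `target`,
 * otherwise -1 with errno set. */
int make_symlink(const char *target, const char *linkpath)
{
    assert(target && linkpath);
    if (symlink(target, linkpath) == 0)
        return 0;
    if (errno != EEXIST)
        return -1;

    /* Something already exists at linkpath: accept it only if it is a
     * symlink whose contents are exactly `target`. */
    char buf[4096];
    ssize_t n = readlink(linkpath, buf, sizeof buf - 1);
    if (n < 0)
        return -1;              /* exists but is not a symlink (EINVAL) */
    buf[n] = '\0';
    if (strcmp(buf, target) == 0)
        return 0;
    errno = EEXIST;             /* a different link is already there */
    return -1;
}
```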
12) Try Slurm again:
$ srun --mpi=pmix -n4 ~segler/a.out
Hello world from processor srv0, rank 0 out of 4 processors
Hello world from processor srv0, rank 2 out of 4 processors
Hello world from processor srv0, rank 1 out of 4 processors
Hello world from processor srv0, rank 3 out of 4 processors
13) It works! :)
Could you add this library link to the libpmix2 package?
$ dpkg -S /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24
libpmix2:amd64: /usr/lib/x86_64-linux-gnu/pmix/lib/libpmix.so.2.2.24
Thank you!
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: slurm-wlm 19.05.5-1
ProcVersionSignature: Ubuntu 5.4.0-12.15-generic 5.4.8
Uname: Linux 5.4.0-12-generic x86_64
ApportVersion: 2.20.11-0ubuntu16
Architecture: amd64
Date: Tue Feb 11 23:01:18 2020
InstallationDate: Installed on 2020-02-11 (0 days ago)
InstallationMedia: Ubuntu-Server 20.04 LTS "Focal Fossa" - Alpha amd64 (20200124)
ProcEnviron:
SHELL=/bin/bash
LANG=de_DE.UTF-8
TERM=xterm-256color
PATH=(custom, no user)
SourcePackage: slurm-llnl
UpgradeStatus: No upgrade log present (probably fresh install)
** Affects: slurm-llnl (Ubuntu)
Importance: Undecided
Status: New
** Tags: amd64 apport-bug focal
--
https://bugs.launchpad.net/bugs/1862854
Title:
srun with pmix plugin searches .so file at wrong location
--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs