We traced the crash to `ConstrainDevices=yes` in cgroup.conf. The workaround was to add `/dev/*` to cgroup_allowed_devices_file.conf; we had nothing there before. We are now looking into which specific device pmi2 needs.
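For anyone hitting the same crash, here is a minimal sketch of the relevant cgroup.conf fragment. The `AllowedDevicesFile` path is an assumption (as far as I know it is the documented default location); adjust it to wherever your installation keeps its Slurm configuration.

```
# cgroup.conf (fragment)
ConstrainDevices=yes
# Assumed path, not taken from our actual setup; adjust to your install.
AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf
```

The allowed-devices file then contains the single line `/dev/*`. Note that this whitelists every device, so it is a stopgap until the specific device that pmi2 needs is known.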
Martijn Kruiten

On Thu, 2018-11-01 at 18:48 +0100, Bas van der Vlies wrote:
> OK, if we change:
>  * TaskPlugin=task/affinity,task/cgroup
>
> to:
>  * TaskPlugin=task/affinity
>
> The pmi2 interface works. Investigating this further.
>
> On 31/10/2018 08:26, Bas van der Vlies wrote:
> > I am busy with migrating from Torque/Moab to SLURM.
> >
> > I have installed slurm 18.03 and am trying to run an MPI program with
> > the pmi2 interface.
> >
> > {{{
> > ~/mpitest> srun --mpi=list
> > srun: MPI types are...
> > srun: none
> > srun: openmpi
> > srun: pmi2
> > }}}
> >
> > The none and openmpi interfaces work, but the pmi2 interface crashes
> > the slurmstepd. Have I missed some setting or is this a bug?
> >
> > {{{
> > (gdb) thread apply all bt
> >
> > Thread 6 (Thread 0x2b9ce9b8b700 (LWP 21945)):
> > #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> > #1  0x00002b9ce5c7862b in ?? () from /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so
> > #2  0x00002b9ce6c08494 in start_thread (arg=0x2b9ce9b8b700) at pthread_create.c:333
> > #3  0x00002b9ce6f06acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> >
> > Thread 5 (Thread 0x2b9ce9c8c700 (LWP 21946)):
> > #0  0x00002b9ce6efd67d in poll () at ../sysdeps/unix/syscall-template.S:84
> > #1  0x00002b9ce5d16cfb in slurm_eio_handle_mainloop () from /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so
> > #2  0x00005631c29f69f6 in ?? ()
> > #3  0x00002b9ce6c08494 in start_thread (arg=0x2b9ce9c8c700) at pthread_create.c:333
> > #4  0x00002b9ce6f06acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> >
> > Thread 4 (Thread 0x2b9ceaedb700 (LWP 21948)):
> > #0  0x00002b9ce6efd67d in poll () at ../sysdeps/unix/syscall-template.S:84
> > #1  0x00002b9cea2a8f52 in ?? () from /usr/lib/x86_64-linux-gnu/slurm//task_cgroup.so
> > #2  0x00002b9ce6c08494 in start_thread (arg=0x2b9ceaedb700) at pthread_create.c:333
> > #3  0x00002b9ce6f06acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> >
> > Thread 3 (Thread 0x2b9ceadda700 (LWP 21947)):
> > #0  0x00002b9ce6efd67d in poll () at ../sysdeps/unix/syscall-template.S:84
> > #1  0x00002b9ce5d16cfb in slurm_eio_handle_mainloop () from /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so
> > #2  0x00002b9ceaac7355 in ?? () from /usr/lib/x86_64-linux-gnu/slurm//mpi_pmi2.so
> > #3  0x00002b9ce6c08494 in start_thread (arg=0x2b9ceadda700) at pthread_create.c:333
> > #4  0x00002b9ce6f06acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> >
> > Thread 2 (Thread 0x2b9ce5ae0700 (LWP 21944)):
> > #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> > #1  0x00002b9ce5c7e65d in ?? () from /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so
> > #2  0x00002b9ce6c08494 in start_thread (arg=0x2b9ce5ae0700) at pthread_create.c:333
> > #3  0x00002b9ce6f06acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
> >
> > Thread 1 (Thread 0x2b9ce59dd080 (LWP 21943)):
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > #1  0x00002b9ce6e5242a in __GI_abort () at abort.c:89
> > #2  0x00002b9ce6e8ec00 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x2b9ce6f83d98 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
> > #3  0x00002b9ce6e94fc6 in malloc_printerr (action=3, str=0x2b9ce6f8094a "free(): invalid pointer", ptr=<optimized out>, ar_ptr=<optimized out>) at malloc.c:5049
> > #4  0x00002b9ce6e9580e in _int_free (av=0x2b9ce71b7b00 <main_arena>, p=0x2b9ce71bba60 <lock>, have_lock=0) at malloc.c:3905
> > #5  0x00002b9ce5d1084d in slurm_xfree () from /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so
> > #6  0x00002b9cea2ab0b0 in task_cgroup_devices_create () from /usr/lib/x86_64-linux-gnu/slurm//task_cgroup.so
> > #7  0x00002b9cea2a5977 in task_p_pre_setuid () from /usr/lib/x86_64-linux-gnu/slurm//task_cgroup.so
> > #8  0x00005631c2a04216 in task_g_pre_setuid ()
> > #9  0x00005631c29e713d in ?? ()
> > #10 0x00005631c29ec3f4 in job_manager ()
> > #11 0x00005631c29e9374 in main ()
> > }}}
> >
> > --
> --
> Bas van der Vlies
> Operations, Support & Development | SURFsara | Science Park 140 | 1098 XG Amsterdam
> T +31 (0) 20 800 1300 | bas.vandervl...@surfsara.nl | www.surfsara.nl

--
| System Programmer | SURFsara | Science Park 140 | 1098 XG Amsterdam |
| T +31 6 20043417 | martijn.krui...@surfsara.nl | www.surfsara.nl |