On 2016-05-10 19:14, [email protected] wrote:
Background:
I’m trying to write a slurmd task plugin to bind mount /tmp to
/tmp/USERID/JOBID.
Question 1: should I be using a task plugin or a spank plugin to do this?
There are a number of options here. The slides from the past few Slurm User
Group meetings have some info on this.
What we ended up with was to use pam_namespace to set up various bind-mounted
dirs (in our case, /tmp, /var/tmp, and /dev/shm), and then an epilog script to
clean up afterwards. Below is our /etc/security/namespace.d/site.conf:
/tmp /l/tmp-inst/ user root,bin,adm
/var/tmp /l/vartmp_inst/ user root,bin,adm
/dev/shm /dev/shm/inst/ user root,bin,adm
See the pam_namespace(8) and namespace.conf(5) man pages for more info. Then
add the following line to /etc/pam.d/slurm:
session required pam_namespace.so
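One detail from namespace.conf(5) that is easy to miss: the instance parent directories must already exist and, by default, have mode 000 (unless the ignore_instance_parent_mode option is given to pam_namespace.so). A minimal sketch of creating the three parents listed above follows. The function name make_instance_parents and its optional root-prefix argument are hypothetical, added only so the sketch can be tried under a scratch directory first.

```shell
#!/bin/bash
# Sketch: create the instance parent directories named in site.conf.
# make_instance_parents and its ROOT argument are hypothetical helpers,
# not part of pam_namespace. ROOT defaults to empty, i.e. the real paths.
make_instance_parents() {
    local root="${1:-}"
    local dir
    for dir in /l/tmp-inst /l/vartmp_inst /dev/shm/inst; do
        mkdir -p "${root}${dir}" || return 1
        # namespace.conf(5): instance parents must have mode 000 by default
        chmod 000 "${root}${dir}" || return 1
    done
}

# Real use, as root on the compute node:
#   make_instance_parents
# Dry run under a scratch directory:
#   make_instance_parents "$(mktemp -d)"
```

Since /dev/shm is normally a tmpfs, /dev/shm/inst presumably has to be recreated at every boot. Also, as far as I know, Slurm only consults /etc/pam.d/slurm when UsePAM=1 is set in slurm.conf; "scontrol show config | grep UsePAM" shows the current value.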
Finally, our Slurm epilog script is below; modify as appropriate.
#!/bin/bash
# If pam_namespace is used to create per-job /tmp/, /var/tmp, /dev/shm,
# clean it here in the epilog when no jobs are running on the node.
# Annoyingly, squeue always exits with status 0, so we must check that
# the output is empty, that is, no jobs by the user are running on the
# node and no error occurred (timeout etc.).
userlist=$(/usr/bin/squeue -w "$HOSTNAME" -o%u -h -u "$SLURM_JOB_USER" \
    -t R,S,CF 2>&1)
if [ -z "$userlist" ]; then
    /bin/rm -rf "/l/tmp-inst/$SLURM_JOB_USER" \
        "/l/vartmp_inst/$SLURM_JOB_USER" \
        "/dev/shm/$SLURM_JOB_USER"
fi
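Because the epilog runs rm -rf as root, it may be worth exercising the emptiness check before deploying it. A rough sketch, assuming the decision is factored into a hypothetical function should_clean that takes the captured squeue output as its only argument. Nothing here calls squeue or deletes anything, and the sample strings (including the error text) are made up for illustration.

```shell
#!/bin/bash
# should_clean is a hypothetical helper, not part of Slurm.
# It succeeds only when the captured squeue output is empty, i.e. the
# user has no remaining jobs on the node and squeue printed no error text.
should_clean() {
    [ -z "$1" ]
}

# Dry run with canned outputs instead of a real squeue call.
# All sample strings are invented; the last one stands in for an error.
for sample in "" "alice" "alice
alice" "made-up squeue error text"; do
    if should_clean "$sample"; then
        echo "would clean (squeue output: '$sample')"
    else
        echo "would keep (squeue output: '$sample')"
    fi
done
```

Only the empty sample should report "would clean". Quoting the argument matters: multi-line output (several jobs by the same user) is passed through as a single string instead of being split into words.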
Question 2:
I’m launching slurmd with the following line
sudo slurmd -N linux0 -D -vvvvvv
but the debug statements in slurmstepd code aren’t being printed to screen.
I assume that the slurmstepd code is being run in a fork of slurmd.
Where can I find debug output from slurmstepd?
Nowhere, see https://bugs.schedmd.com/show_bug.cgi?id=2631 . The workaround is to just
run slurmd without "-D" and tail -f the syslog.
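The workaround could look roughly like the sketch below. Where syslog ends up is distro-dependent, so the candidate paths are assumptions (common defaults, not something Slurm dictates), and pick_log is a hypothetical helper that simply prints the first of its arguments that exists. If SlurmdLogFile is set in slurm.conf, slurmd logs to that file instead of syslog, so that would be the file to tail.

```shell
#!/bin/bash
# pick_log is a hypothetical helper: print the first existing file among
# its arguments, or return non-zero if none of them exists.
pick_log() {
    local f
    for f in "$@"; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Typical use on the node (commented out so the sketch has no side effects).
# The two log paths are assumed distro defaults.
#   sudo slurmd -N linux0 -vvvvvv
#   sudo tail -f "$(pick_log /var/log/messages /var/log/syslog)" | grep slurmstepd
```

On a journald-only system neither file may exist, in which case something like "journalctl -f" would presumably be the equivalent.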
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || [email protected]