On 2016-03-18 19:11, Thomas Orgis wrote:
> Hi,
>
> we had a nice debugging session here with batch jobs writing data to
> the wrong temporary directories which then also were not cleaned up. It
> turns out that this is due to the feature of Slurm creating a TMPDIR if
> the environment variable is set to a non-existing directory (and
> setting TMPDIR to something in any case).

Ugh. We're bringing up a new cluster, and what we settled on was:

1. Use pam_namespace.so to create user-private /tmp, /var/tmp and
   /dev/shm directories. (pam_namespace.so does all the bind-mounting
   etc. dance.)

2. An epilog script which checks whether the user has any other job
   running on the node and, if not, deletes the above user-private
   directories.

> Even the idea of setting it to /tmp irritates me:
>
>     if (!(tmpdir = getenvp(job->env, "TMPDIR")))
>             setenvf(&job->env, "TMPDIR", "/tmp"); /* task may want it set */
>     else if (mkdir(tmpdir, 0700) < 0) {
>
> The comment says "task may want it set". May. Maybe not. I would
> appreciate the batch system not to guess, only adding environment
> variables related to batch job setup (SLURM_* variables) and leave the
> environment pretty please alone apart from that. Am I alone with that?

Well, the POSIX'y thing for an application/script/whatever to do is to
use TMPDIR if set, else fall back to /tmp. So Slurm setting TMPDIR to
/tmp should have no effect on a sensibly behaving app.

That being said, Slurm creating the tmp directory if it doesn't exist is
potentially very confusing, I agree.
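
For illustration, the "TMPDIR if set, else /tmp" convention could look
roughly like this in C. This is only a minimal sketch: tmp_base_dir() is
a name made up for this example, not anything from Slurm or libc, and
treating an empty TMPDIR as unset is my own assumption about what a
sensible app would want.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of Slurm or libc): return the directory
 * an application should use for temporary files.
 * Uses $TMPDIR when it is set and non-empty, otherwise falls back to /tmp.
 * Treating an empty value as "unset" is an assumption of this sketch. */
const char *tmp_base_dir(void)
{
    const char *dir = getenv("TMPDIR");

    if (dir != NULL && dir[0] != '\0')
        return dir;
    return "/tmp";
}
```

An app written like this ends up in /tmp whether TMPDIR is unset or has
been set to /tmp by the batch system, which is why that particular
default should be harmless.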
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & NBE
+358503841576 || [email protected]