On 2016-03-21 08:45, Janne Blomqvist wrote:
> On 2016-03-18 19:11, Thomas Orgis wrote:
>
>> Hi, we had a nice debugging session here with batch jobs writing data
>> to the wrong temporary directories, which then also were not cleaned up.
>> It turns out that this is due to the feature of Slurm creating a TMPDIR
>> if the environment variable is set to a non-existing directory (and
>> setting TMPDIR to something in any case).
>
> Ugh. We're bringing up a new cluster, and what we settled on was:
>
> 1. Use pam_namespace.so to create user-private /tmp, /var/tmp and
>    /dev/shm directories. (pam_namespace.so does all the bind-mounting
>    etc. dance.)
>
> 2. An epilog script which checks whether the user has any other job
>    running on the node; if not, delete the above user-private
>    directories.
>
>> Even the idea of setting it to /tmp irritates me:
>>
>>     if (!(tmpdir = getenvp(job->env, "TMPDIR")))
>>         setenvf(&job->env, "TMPDIR", "/tmp"); /* task may want it set */
>>     else if (mkdir(tmpdir, 0700) < 0) {
>>
>> The comment says "task may want it set". May. Maybe not. I would
>> appreciate it if the batch system did not guess: it should only add
>> environment variables related to batch job setup (SLURM_* variables)
>> and leave the rest of the environment alone, pretty please. Am I alone
>> with that?
>
> Well, the POSIX'y thing for an application/script/whatever to do is to
> use TMPDIR if set, else fall back to /tmp. So Slurm setting TMPDIR
> should have no effect on a sensibly behaving app.
>
> That being said, Slurm creating the tmp directory if it doesn't exist
> is potentially very confusing, I agree.
We use a prolog to create a per-job tempdir and a SPANK plugin to
bind-mount it on /tmp; at the end, an epilog script cleans up everything:
https://github.com/fafik23/slurm_plugins/tree/master/bindtmp
DB