Hello!
As far as I know, SlurmdSpoolDir is the directory where slurmd keeps its
state information and batch job scripts, while TmpFS is the file system
available to user jobs for temporary storage, which Slurm uses to
establish a node's TmpDisk space.
Regards,
Barbara
On 04/24/2014 10:07 AM, Alfonso Pardo wrote:
Thanks Janne,
We had the “SlurmdSpoolDir” parameter set to “/tmp/slurmd”. As you
say, a better value would be “/var/spool/slurmd”, for example.
But your reply raises some other questions:
- What is the difference between the “SlurmdSpoolDir” and “TmpFS” Slurm
parameters?
- Who writes to TmpFS: the Slurm daemon or user processes?
- We have compute nodes with 500 GB HDDs. How much space is recommended
for the TmpDisk parameter?
Regards,
Alfonso Pardo Diaz
*System Administrator / Researcher*
/c/ Sola nº 1; 10200 Trujillo, ESPAÑA/
/Tel: +34 927 65 93 17 Fax: +34 927 32 32 37/
On 24/04/2014, at 08:39, Janne Blomqvist <[email protected]> wrote:
On 2014-04-23T17:26:36 EEST, Alfonso Pardo wrote:
Hi,
Some of my jobs terminated prematurely with these error messages:
slurmd[bd-p14-01]: error: unlink(/tmp/slurmd/job60560/slurm_script): No such file or directory
slurmd[bd-p14-01]: error: rmdir(/tmp/slurmd/job60560): No such file or directory
Should the “TmpFS” location be a shared file system?
No. Or maybe it's possible, but why? Typically /tmp is considered a
machine-local directory.
That being said, the error messages you quote have nothing to do with
the slurm.conf TmpFS setting; rather, they indicate that your
SlurmdSpoolDir is set to "/tmp/slurmd". That is likely a bad idea, as
there might be various /tmp cleaner scripts such as tmpwatch emptying
/tmp regularly, leading to errors like the ones you see (been there,
done that).
Just leave it at the default value unless you have good reasons to do
otherwise. Note that it requires some trickery to move the contents
of the SlurmdSpoolDir if you want to do it on the fly without losing
track of running jobs.
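
As a rough illustration of the point above, here is a minimal Python
sketch that checks whether SlurmdSpoolDir in a slurm.conf-style text
sits under a directory that cleaner scripts may empty. The function
names and the cleaned_dirs parameter are hypothetical and not part of
Slurm; the parser only handles simple "Key=Value" lines, and the
fallback "/var/spool/slurmd" is assumed to be the default.

```python
import posixpath


def read_slurm_conf(text):
    """Parse simple 'Key=Value' lines into a dict with lowercased keys.

    Hypothetical helper: only the first '=' on a line is used, so
    multi-key lines (e.g. NodeName=...) are not split further.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip()
    return settings


def spool_dir_is_risky(conf_text, cleaned_dirs=("/tmp",)):
    """Return True if SlurmdSpoolDir is one of cleaned_dirs or lies below it."""
    settings = read_slurm_conf(conf_text)
    # Assumption: fall back to /var/spool/slurmd when the parameter is unset.
    spool = posixpath.normpath(settings.get("slurmdspooldir", "/var/spool/slurmd"))
    return any(spool == d or spool.startswith(d + "/") for d in cleaned_dirs)
```

For example, spool_dir_is_risky("SlurmdSpoolDir=/tmp/slurmd") flags the
setting discussed in this thread, while a value of /var/spool/slurmd is
not flagged.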
We have not set the TmpDisk parameter (it is at its default value). How
much space is reasonable for this parameter?
That depends on how large the disks on your nodes are, no? However, the
trend seems to be that /tmp is a relatively small space, frequently
on a RAM disk (tmpfs) rather than backed by a real disk [1]. So you
might not want to encourage your users to write code that assumes a
large /tmp is available. A large machine-local space is probably better
placed at /var/tmp or somewhere site-specific such as /local.
[1] http://0pointer.de/blog/projects/tmp.html
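
For the sizing question, one way to get a starting figure is to measure
the file system that TmpFS points to. The sketch below is only an
illustration: suggest_tmpdisk_mb and reserve_fraction are hypothetical
names, the 10% reserve is an arbitrary assumption, and the result is
expressed in megabytes on the understanding that TmpDisk is given in
that unit.

```python
import shutil


def suggest_tmpdisk_mb(tmpfs_path="/tmp", reserve_fraction=0.1):
    """Return a candidate TmpDisk value in megabytes for tmpfs_path.

    Hypothetical helper: takes the total size of the file system that
    holds tmpfs_path and holds back reserve_fraction of it as headroom.
    """
    usage = shutil.disk_usage(tmpfs_path)  # named tuple: total, used, free (bytes)
    usable_bytes = usage.total * (1.0 - reserve_fraction)
    return int(usable_bytes // (1024 * 1024))


if __name__ == "__main__":
    # Run on a compute node; pass the site's actual TmpFS path if not /tmp.
    print("TmpDisk=%d" % suggest_tmpdisk_mb("/tmp"))
```

Run it on a compute node against whatever path TmpFS is set to, for
example /local if that is where the large scratch space lives.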
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || [email protected]