Hello!

As far as I know, SlurmdSpoolDir is the directory for slurmd state information, and TmpFS is the directory the daemon should use when SlurmdSpoolDir runs out of space.

Regards,
Barbara


On 04/24/2014 10:07 AM, Alfonso Pardo wrote:
Thanks Janne,


We had the “SlurmdSpoolDir” parameter set to “/tmp/slurmd”. As you say, a correct value would be, for example, “/var/spool/slurmd”.
But your reply raises some other questions:

- What is the difference between the “SlurmdSpoolDir” and “TmpFS” Slurm parameters?
- Who writes to TmpFS: the Slurm daemon or user processes?
- Our compute nodes have 500 GB HDDs. How much space is advisable for the TmpDisk parameter?




Regards,

Alfonso Pardo Diaz
*System Administrator / Researcher*
/c/ Sola nº 1; 10200 Trujillo, ESPAÑA/
/Tel: +34 927 65 93 17 Fax: +34 927 32 32 37/

CETA-Ciemat <http://www.ceta-ciemat.es/>


On 24/04/2014, at 08:39, Janne Blomqvist <[email protected] <mailto:[email protected]>> wrote:


On 2014-04-23T17:26:36 EEST, Alfonso Pardo wrote:
Hi,

Some jobs terminated prematurely with these errors:

slurmd[bd-p14-01]: error: unlink(/tmp/slurmd/job60560/slurm_script): No such file or directory
slurmd[bd-p14-01]: error: rmdir(/tmp/slurmd/job60560): No such file or directory
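To see which jobs were hit, one could scan the slurmd log for these cleanup errors. A minimal Python sketch, assuming each message sits on a single line in exactly the format quoted above; the function name `missing_spool_files` is hypothetical and the regex only targets these two unlink/rmdir messages.

```python
import re
from typing import Dict, Iterable, List

# Matches slurmd cleanup errors such as:
#   slurmd[node]: error: unlink(/tmp/slurmd/job60560/slurm_script): No such file or directory
#   slurmd[node]: error: rmdir(/tmp/slurmd/job60560): No such file or directory
ENOENT_RE = re.compile(
    r"slurmd\[[^\]]+\]: error: (?:unlink|rmdir)\("
    r"(?P<path>[^)]*/job(?P<jobid>\d+)(?:/[^)]*)?)\): No such file or directory"
)


def missing_spool_files(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group 'No such file or directory' cleanup errors by job ID."""
    jobs: Dict[str, List[str]] = {}
    for line in lines:
        match = ENOENT_RE.search(line)
        if match:
            # Keep the paths in log order so unlink/rmdir pairs stay together.
            jobs.setdefault(match.group("jobid"), []).append(match.group("path"))
    return jobs
```

Feeding it the two lines above yields one entry for job 60560 listing both the missing `slurm_script` and its job directory.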


Should the “TmpFS” location be a shared file system?

No. Or maybe it's possible, but why? Typically /tmp is considered a machine-local directory.

That being said, the error messages you quote have nothing to do with the slurm.conf TmpFS setting; rather, they indicate that your SlurmdSpoolDir is set to "/tmp/slurmd". That is likely a bad idea, as there might be various /tmp cleaner scripts such as tmpwatch emptying /tmp regularly, leading to errors like the ones you see (been there, done that). Just leave it at the default value unless you have good reasons to do otherwise. Note that it requires some trickery to move the contents of the SlurmdSpoolDir if you want to do it on the fly without losing track of running jobs.
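A quick sanity check for this pitfall could look like the Python sketch below. It assumes one `Key=Value` parameter per line in slurm.conf, matches parameter names case-insensitively, does not expand substitutions such as `%n`, and falls back to the documented default of /var/spool/slurmd when SlurmdSpoolDir is unset. Treating /tmp and /var/tmp as the directories a cleaner like tmpwatch might purge is an assumption; adjust it to what your cleaners actually touch. Both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Documented slurm.conf default for SlurmdSpoolDir.
DEFAULT_SPOOL_DIR = "/var/spool/slurmd"
# Assumption: directories that periodic cleaners (e.g. tmpwatch) may purge.
VOLATILE_DIRS = ("/tmp", "/var/tmp")


def spool_dir_from_conf(conf_text: str) -> str:
    """Return SlurmdSpoolDir from slurm.conf text, or the default if unset."""
    value: Optional[str] = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, val = line.partition("=")
        if sep and key.strip().lower() == "slurmdspooldir":
            value = val.strip()  # a later definition overrides an earlier one
    return value or DEFAULT_SPOOL_DIR


def is_volatile(path: str) -> bool:
    """True if path is, or lies inside, a directory that cleaners may purge."""
    p = PurePosixPath(path)
    return any(
        p == PurePosixPath(d) or PurePosixPath(d) in p.parents
        for d in VOLATILE_DIRS
    )
```

For the configuration described in this thread, `is_volatile(spool_dir_from_conf("SlurmdSpoolDir=/tmp/slurmd"))` is `True`, while an unset parameter resolves to /var/spool/slurmd and passes.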

We haven't set the TmpDisk parameter (it is at its default value). How much
space is reasonable for this parameter?

That depends on how large the disks on your nodes are, no? However, the trend seems to be that /tmp is a relatively small space, frequently on a RAM disk (tmpfs) rather than backed by a real disk [1]. So you might not want to encourage your users to write code that assumes a large /tmp is available. A large machine-local space is probably better placed at /var/tmp or somewhere site-specific such as /local.
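As a rough worked example of sizing TmpDisk, here is a Python sketch. slurm.conf expresses TmpDisk in megabytes; the sketch takes a megabyte as 2**20 bytes and adds an optional reserve fraction, both of which are assumptions, and the helper names are hypothetical. It measures whatever file system holds the given path, so pointing it at /tmp on a node where /tmp is a tmpfs will report the RAM disk, not the HDD.

```python
import shutil

MB = 1024 * 1024  # assumption: Slurm's "megabytes" taken as 2**20 bytes


def tmpdisk_from_bytes(total_bytes: int, reserve_fraction: float = 0.0) -> int:
    """Convert a file-system size to a TmpDisk value in MB.

    reserve_fraction is a hypothetical safety margin: 0.1 keeps 10% of the
    file system out of what jobs are told they may use.
    """
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable_bytes = int(total_bytes * (1.0 - reserve_fraction))
    return usable_bytes // MB


def tmpdisk_for_path(tmpfs_path: str = "/tmp", reserve_fraction: float = 0.0) -> int:
    """Suggest TmpDisk for the file system that contains tmpfs_path."""
    total = shutil.disk_usage(tmpfs_path).total  # size in bytes
    return tmpdisk_from_bytes(total, reserve_fraction)
```

For a 500 GB (5 × 10^11 bytes) disk, `tmpdisk_from_bytes(500_000_000_000)` returns 476837, which could then go into the node definition as `TmpDisk=476837`.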


[1] http://0pointer.de/blog/projects/tmp.html

--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || [email protected] <mailto:[email protected]>
