Thanks Janne,
We had the “SlurmdSpoolDir” parameter set to “/tmp/slurmd”. As you say, a more suitable value would be “/var/spool/slurmd”, for example. But your reply raises some further questions:

- What is the difference between the “SlurmdSpoolDir” and “TmpFS” Slurm parameters?
- Who writes to the TmpFS location? The Slurm daemon or the user processes?
- We have compute nodes with 500 GB HDDs. How much space is recommended for the TmpDisk parameter?

Regards,

Alfonso Pardo Diaz
System Administrator / Researcher
c/ Sola nº 1; 10200 Trujillo, ESPAÑA
Tel: +34 927 65 93 17
Fax: +34 927 32 32 37

On 24/04/2014, at 08:39, Janne Blomqvist <[email protected]> wrote:

> On 2014-04-23T17:26:36 EEST, Alfonso Pardo wrote:
>> Hi,
>>
>> I had some errors from a premature terminate jobs with this message:
>>
>> slurmd[bd-p14-01]: error: unlink(/tmp/slurmd/job60560/slurm_script):
>> No such file or directory
>> slurmd[bd-p14-01]: error: rmdir(/tmp/slurmd/job60560): No such file or
>> directory
>>
>>
>> Should “TmpFS” location be a shared file system?
>
> No. Or maybe it's possible, but why? Typically /tmp is considered a
> machine-local directory.
>
> That being said, the error messages you quote have nothing to do with the
> slurm.conf TmpFS setting but rather tell that your SlurmdSpoolDir is set to
> "/tmp/slurmd". That is likely a bad idea, as there might be various /tmp
> cleaner scripts such as tmpwatch emptying /tmp regularly, leading to errors
> like you see (been there, done that). Just leave it at the default value
> unless you have good reasons to do otherwise. Note that it requires some
> trickery to move the contents of the SlurmdSpoolDir if you want to do it on
> the fly without losing track of running jobs.
>
>> We don’t have TmpDisk parameter established (default value). How many
>> space is reasonable for this parameter?
>
> Depends on how large disks you have on your nodes, no? However, the trend
> seems to be that /tmp is a relatively small space, frequently on a ram disk
> (tmpfs) rather than backed by a real disk [1]. So you might not want to
> encourage your users to write code assuming a large /tmp is available. A
> large machine-local space is probably better to place at /var/tmp or
> something site-specific such as /local.
>
>
> [1] http://0pointer.de/blog/projects/tmp.html
>
> --
> Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
> Aalto University School of Science, PHYS & BECS
> +358503841576 || [email protected]
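
For anyone who wants to check for the misconfiguration discussed above, here is a rough Python sketch that reads slurm.conf text and reports whether SlurmdSpoolDir ends up under /tmp, where a cleaner such as tmpwatch could remove it. The function names (`parse_conf`, `spool_dir_under_tmp`) are made up for this illustration and are not part of Slurm. The parser is deliberately simplified: it assumes one `Key=Value` pair per line and ignores Slurm features such as includes or hostname substitution in paths.

```python
import posixpath

# Default used when SlurmdSpoolDir is not set, matching the value
# suggested in this thread.
DEFAULT_SPOOL_DIR = "/var/spool/slurmd"


def parse_conf(text):
    """Parse simple Key=Value lines (hypothetical, simplified parser).

    Keys are lowercased so lookups are case-insensitive; anything after
    a '#' is treated as a comment and dropped.
    """
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip().lower()] = value.strip()
    return params


def spool_dir_under_tmp(text, tmp_root="/tmp"):
    """Return (spool_dir, is_under_tmp_root) for the given slurm.conf text."""
    params = parse_conf(text)
    spool = posixpath.normpath(params.get("slurmdspooldir", DEFAULT_SPOOL_DIR))
    tmp_root = posixpath.normpath(tmp_root)
    # Compare whole path components so "/tmpfoo" does not match "/tmp".
    under = spool == tmp_root or spool.startswith(tmp_root + "/")
    return spool, under


if __name__ == "__main__":
    spool, risky = spool_dir_under_tmp("SlurmdSpoolDir=/tmp/slurmd\n")
    if risky:
        print(f"warning: SlurmdSpoolDir={spool} may be emptied by a /tmp cleaner")
```

Pass the contents of your own slurm.conf to `spool_dir_under_tmp`; a `True` flag means the spool directory sits in a location that periodic /tmp cleanup may empty, which matches the unlink/rmdir errors quoted in this thread.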
