Hello,

I defined "TmpDisk=930000" for some machines in Slurm 20.02.3 (TmpFS is set to a local volume slightly bigger than that). When I run

    sbatch --tmp=100000 -w node01 --array=1-100 --wrap="sleep 300"

I end up with 36 jobs on the machine at a time, one per CPU core. I expected the --tmp option to limit it to 9 jobs at a time, since the node is defined as having 930000 MB of TmpDisk. If I raise the option to "--tmp=1000000", sbatch rejects the job with "Temporary disk specification cannot be satisfied", so this should not be a typo in the config or a unit conversion issue. I would expect TmpDisk to be treated as a managed resource that helps limit how many jobs land on a machine. Am I misunderstanding how the "--tmp" option is supposed to work?

I also have a general question regarding TMPDIR and TmpFS. I understand that Slurm sets TMPDIR to /tmp regardless of what TmpFS is set to, and that local sites are expected to define TMPDIR in a prolog or plugin if they feel it is necessary. I would have expected TmpFS to affect the value of TMPDIR by default. What is the reasoning behind the decision not to set TMPDIR to something like ${TmpFS}/${SLURM_JOB_ID}? Is there any documented discussion of Slurm's expected use of TmpFS/TMPDIR, or the philosophy behind it, that I can read?

Thanks
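
P.S. To spell out the arithmetic behind my expectation of 9 jobs, here is a minimal shell sketch. It assumes --tmp would be counted against TmpDisk as a consumable per-node resource, which is my assumption and not something I have found documented. max_jobs_by_tmp is a hypothetical helper written for this post, not a Slurm command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): how many jobs would fit on a
# node if each job's --tmp request were subtracted from the node's TmpDisk.
# Both arguments are in MB, the same unit used in the post above.
max_jobs_by_tmp() {
    tmp_disk_mb=$1   # node's TmpDisk value
    per_job_mb=$2    # value passed to sbatch --tmp
    # Shell arithmetic is integer division, so the result is rounded down.
    echo $(( tmp_disk_mb / per_job_mb ))
}

max_jobs_by_tmp 930000 100000     # prints 9
max_jobs_by_tmp 930000 1000000    # prints 0
```

The second call matches the rejection I see with --tmp=1000000: a single job already asks for more than the node has. What I observe instead of 9 is 36 concurrent jobs, one per core.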