We save all job scripts by adding a line of scontrol write batch_script to
our slurmctld.prolog. For example:
test -e "$jobdir/jobscript" || timeout 15s scontrol write batch_script \
    "${SLURM_JOBID}" "$jobdir/jobscript"
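For context, here is a minimal sketch of how that line could sit inside a prolog script. The archive location (ARCHIVE_ROOT), the per-job directory layout and the function name save_jobscript are assumptions made for illustration; the message above only shows the test/timeout/scontrol line itself.

```shell
#!/bin/bash
# Sketch of a slurmctld prolog fragment that archives the batch script.
# ARCHIVE_ROOT and the directory layout are hypothetical, not from the post.
ARCHIVE_ROOT="${ARCHIVE_ROOT:-/var/spool/slurm/job_archive}"

save_jobscript() {
    local jobid="$1"
    local jobdir="$ARCHIVE_ROOT/$jobid"

    # Create the per-job directory; give up quietly if that fails.
    mkdir -p "$jobdir" || return 1

    # Skip the call if the script was already saved (for example after a
    # requeue), and cap it at 15 seconds so a slow controller cannot stall
    # the prolog.
    test -e "$jobdir/jobscript" || \
        timeout 15s scontrol write batch_script "$jobid" "$jobdir/jobscript"
}

# Only act when the prolog environment provides a job id.
if [ -n "${SLURM_JOBID:-}" ]; then
    # Ignore failures so that a problem with archiving does not make the
    # prolog itself return non-zero.
    save_jobscript "$SLURM_JOBID" || true
fi
```

Swallowing the error with "|| true" is a design choice in this sketch: the job script copy is only kept for later troubleshooting, so it seems safer not to let it affect whether the job starts.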
Best regards,
Jessica Nettelblad, UPPMAX
Best regards,
Jessica Nettelblad (and some others)
Uppsala University
jess...@nettelblad.se
+46-706-346175
a file. This is what we use in
the prolog when we gather information for later (possible) troubleshooting.
So I suppose that'll be available from 18.08 without patching.
https://github.com/SchedMD/slurm/commit/158551a7059702ba46eaab78168008fe75d1c070#diff-bc42eb80c8e7b0e26af71e04acab6aca
like you did.
- The current job's process gets the new value of the environment variable.
Code and child processes that run after the change can refer to the new name
as $SLURM_JOB_NAME.
- It does not update the job name stored by the Slurm controller.
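The two points above can be checked with a small sketch like the following. The names original_name and renamed_step are made up for the example, and the scontrol call is skipped when the script is not running inside a Slurm job.

```shell
#!/bin/bash
# Stand-in for the value Slurm sets inside a job (hypothetical name),
# so the sketch also runs outside a job.
export SLURM_JOB_NAME="${SLURM_JOB_NAME:-original_name}"

# Change the variable inside the running job script.
export SLURM_JOB_NAME="renamed_step"

# Child processes started after the change inherit the new value.
child_sees=$(bash -c 'echo "$SLURM_JOB_NAME"')
echo "child process sees: $child_sees"

# The controller's record is a separate thing; inside a real job this
# prints the job name as the controller knows it, for comparison.
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    scontrol show job "$SLURM_JOB_ID" | grep -o 'JobName=[^ ]*'
fi
```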
Best regards,
Jessica Nettelblad, UPPMAX
On Thu, Mar 22, 2018 at 10:16 PM
Try this instead:
scontrol hold name=g09
(Line 441:
https://github.com/SchedMD/slurm/blob/master/src/scontrol/update_job.c)
On Mon, Mar 19, 2018 at 9:17 AM, Loris Bennett
wrote:
> Hi,
>
> The manpage for version 17.02.9 of 'scontrol' says the following about
> the
rol, squeue, and other Slurm commands with
date information. Otherwise they get the server's default local time.
Best regards,
Jessica Nettelblad, UPPMAX
We experienced the same problem. On our two new clusters with smaller
databases (<1 million jobs), the upgrade from 17.02.9 to 17.11.2 and
17.11.3 was quick and smooth. On the third, older cluster, where we have a
larger database (>30 million jobs), the upgrade was a mess, both in MySQL
and