It is not exactly clear from the documentation here (
https://computing.llnl.gov/linux/slurm/checkpoint_blcr.html) how I am
supposed to checkpoint jobs launched via SLURM.
Say I have launched an MPI job with the following command:
srun_cr -N2 -n24 --checkpoint 1 --checkpoint-dir
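For reference, a complete invocation along those lines might look like the sketch below. The application name, checkpoint directory, and job id are hypothetical placeholders, not values from the original command, and this assumes a Slurm build with BLCR checkpoint support:

```shell
# Hypothetical example: 24 tasks on 2 nodes with BLCR checkpointing,
# writing a checkpoint periodically to a shared directory.
# /scratch/ckpt and ./my_mpi_app are assumed placeholders.
srun_cr -N2 -n24 --checkpoint 1 --checkpoint-dir=/scratch/ckpt ./my_mpi_app

# In checkpoint-enabled builds, a checkpoint can also be requested
# manually for a running job (job id 1234 is hypothetical):
scontrol checkpoint create 1234
```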
Hi guys,
On our cluster we have run into a situation where we want to change the
SlurmdSpoolDir location. Do you know of any way to do this without draining
the whole cluster?
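One possible approach, sketched below: this is an untested outline, not from the thread, and it assumes that slurm.conf is shared across nodes, that running jobs survive a slurmd restart (which slurmd is designed to allow), and that `pdsh` is available. The new spool path is a placeholder.

```shell
# 1. Create the new spool directory on every compute node
#    (path and pdsh usage are assumptions):
pdsh -a 'mkdir -p /new/spool/dir'

# 2. Update SlurmdSpoolDir in the shared slurm.conf, then restart
#    slurmd on each node so it picks up the new location. Restarting
#    slurmd does not kill running jobs, but any saved state in the
#    old spool directory may need to be copied over first:
pdsh -a 'systemctl restart slurmd'
```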
cheers,
marcin
Marcin Stolarek
Interdisciplinary Center for Mathematical and Computational Modeling,
On 11/03/14 06:20, Andy Riebs wrote:
Has anyone seen this before? slurm.conf is on an NFS server, so
it's possible we've got a configuration error there.
We've seen this same problem too, lost a heap of jobs to it. :-(
--
Christopher Samuel
I wanted to add another reason (just discovered today) for the "We have more
allocated time than is possible" error emitted to slurmdbd.log.
Disclaimer: I found this in an old version (v2.3.3) of Slurm, and can't
confirm that the problem can still happen.
The slurmctld submits job records
Hi All,
When we use --export=NONE with sbatch, so that we get a clean environment to
work with, some SLURM environment variables are not set. At least the
following are missing:
SLURM_JOB_NAME
SLURM_NTASKS_PER_NODE
SLURM_PRIO_PROCESS
SLURM_CPUS_PER_TASK
SLURM_SUBMIT_DIR
SLURM_SUBMIT_HOST
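A possible workaround (a sketch, not from the original thread): instead of --export=NONE, pass an explicit variable list to --export. In the Slurm versions I have used, an explicit list without ALL propagates only the named variables while the SLURM_* job variables are still set inside the batch script. The script name and chosen variables below are placeholders:

```shell
# Propagate only PATH and HOME; the job's SLURM_* variables
# should still be populated in the batch environment:
sbatch --export=PATH,HOME my_job.sh

# Inside the script, some missing values can also be recovered via
# scontrol, since SLURM_JOB_ID is among the variables that are set:
scontrol show job "$SLURM_JOB_ID"
```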
Script is: