-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 04/07/14 17:06, Arjun J Rao wrote:
> Also, is the missing /usr/local/sbin/scch an integral part of the problem ?
I think that's a red herring; the man page for slurm.conf says:
    checkpoint/blcr   Berkeley Lab Checkpoint Restart (BLCR).
                      NOTE: If a file is found at sbin/scch (relative
                      to the SLURM installation location), it will be
                      executed upon completion of the checkpoint. This
                      can be a script used for managing the checkpoint
                      files. NOTE: SLURM’s BLCR logic only supports batch
                      jobs.
*However*, I think the second NOTE at the end may explain it. You say you are running:
srun -N2 -n24 --checkpoint 1 --checkpoint-dir /home/arjun/ACIM/Ctrl ./MPIJob
An srun launched directly from the command line is not a batch job,
so I think you'll need to run it from inside a batch script submitted
with sbatch for this to work.
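
Something along these lines might do it. This is an untested sketch: the
file name checkpoint_job.sh is made up for illustration, and I'm assuming
the --checkpoint and --checkpoint-dir options of sbatch behave the same
way as the srun ones you used.

```shell
# Write a batch script (checkpoint_job.sh is a hypothetical file name).
cat > checkpoint_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=24
# Assumption: same checkpoint interval and directory as the original srun line.
#SBATCH --checkpoint=1
#SBATCH --checkpoint-dir=/home/arjun/ACIM/Ctrl

# Launch the MPI program as a job step inside the batch allocation.
srun ./MPIJob
EOF

# Then submit it as a batch job (needs a working SLURM installation):
#   sbatch checkpoint_job.sh
```

The point is only that the job goes through sbatch rather than a bare
srun, since the man page says the BLCR logic only supports batch jobs.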
Caveat: We've never used this, so YMMV.
All the best,
Chris
- --
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: [email protected] Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iEYEARECAAYFAlO54jYACgkQO2KABBYQAh9sdACfWoq1EBZJD7efbiEnYdqxY53U
y3gAnjDnMw39y2IoGWaMV9DUftXhhJ8U
=QPs4
-----END PGP SIGNATURE-----