-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28/03/14 11:02, Franco Broi wrote:

> Just an update on this, after running with the new version of slurm
> on the control node overnight, this morning several nodes running
> 2.6.5 were showing as down. Restarted the local slurm daemons but
> lost the jobs running on them.

Ouch - that's not good.   Anything in your slurmctld log (or syslog)
about why they seemed to go away?

It's frustrating that Slurm kills jobs on down nodes. I don't think
we've had an instance yet where it's been justified, and we've lost a
heap of jobs from this behaviour (especially on BG/Q).

A setting that let you change that behaviour to mark nodes as draining
rather than down would be very handy.

cheers,
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlM4t34ACgkQO2KABBYQAh/HywCeMPWjaVDNtpIEyaR7FZrJgxKC
x9EAnR73QIjD3tNcXZHPT3tlQkLj2J2z
=28B7
-----END PGP SIGNATURE-----
