On 28/03/14 11:02, Franco Broi wrote:
> Just an update on this, after running with the new version of slurm
> on the control node overnight, this morning several nodes running
> 2.6.5 were showing as down. Restarted the local slurm daemons but
> lost the jobs running on them.

Ouch, that's not good. Anything in your slurmctld log (or syslog) about why they seemed to go away?

It's frustrating that Slurm kills jobs on down nodes. I don't think we've had an instance yet where it's been justified, and we've lost a heap of jobs to this behaviour (especially on BG/Q). A setting that let you change that behaviour to mark nodes as draining rather than down would be very handy.

cheers,
Chris
-- 
Christopher Samuel        Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: [email protected] Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
