On 31/03/14 19:57, Franco Broi wrote:
[nodes going down on migration from 2.6 to 14.03]

> The slurmctld log just shows them as unresponsive and I don't have
> logging on the cluster nodes - never needed it before.

No worries, we're going to do some migration testing on VMs, so we'll
see if we can reproduce this.

[killing jobs when nodes go down]

> I agree. I can see that a parallel job that loses a node could be
> restarted but maybe Slurm should ping the node before deciding that
> it's definitely down.

For us, Open-MPI catches those sorts of things, so I think we'd rather
Slurm just didn't kill jobs unless we (or the user) tell it to.

cheers,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci