-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 31/03/14 19:57, Franco Broi wrote:

[nodes going down on migration from 2.6 to 14.03]
> The slurmctld log just shows them as unresponsive and I don't have 
> logging on the cluster nodes - never needed it before.

No worries, we're going to test the migration on VMs, so we'll see if
we can reproduce this.

[killing jobs when nodes go down]
> I agree. I can see that a parallel job that loses a node could be 
> restarted but maybe Slurm should ping the node before deciding that
> it's definitely down.

Open-MPI catches those sorts of things for us, so I think we'd rather
Slurm just didn't kill jobs unless we (or the user) tell it to.

cheers,
Chris
- -- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlM58U8ACgkQO2KABBYQAh9mEQCfQp1J2mAHAxTxuUae/a+wgzOS
BfMAniP24zDWxD+tQT21ie+9X5Vbo6HV
=GsY8
-----END PGP SIGNATURE-----
