I'm sure other problems exist when the daemon can't run, but this patch should
fix your particular problem for batch jobs. The patch works in SLURM v2.2 and
should work in v2.1.6 too.
Index: src/slurmd/slurmstepd/mgr.c
===================================================================
--- src/slurmd/slurmstepd/mgr.c (revision 22973)
+++ src/slurmd/slurmstepd/mgr.c (working copy)
@@ -855,6 +855,8 @@
 	if (rc) {
 		error("IO setup failed: %m");
+		job->task[0]->estatus = 0x0100;
+		step_complete.step_rc = 0x0100;
 		rc = SLURM_SUCCESS;	/* drains node otherwise */
 		goto fail2;
 	} else {
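
For anyone wondering about the 0x0100 value: assuming these fields hold a wait(2)-style status (which is how I read the patch), 0x0100 decodes as "exited normally with exit code 1" under the traditional encoding used on Linux. The sketch below is not SLURM code; exit_code_of() and counts_as_failure() are hypothetical helper names used only to show the decoding.

```c
#include <sys/wait.h>

/* Status value the patch stores in estatus and step_rc. */
#define PATCH_STATUS 0x0100

/* Hypothetical helper (not part of SLURM): return the exit code encoded
 * in a wait(2)-style status, or -1 if the status does not describe a
 * normal exit (for example, termination by a signal). */
int exit_code_of(int status)
{
	if (!WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Hypothetical helper: treat anything other than a clean exit with
 * code 0 as a failure, which is the condition an afterok dependency
 * is meant to guard against. */
int counts_as_failure(int status)
{
	return exit_code_of(status) != 0;
}
```

With the traditional encoding, exit_code_of(PATCH_STATUS) yields 1, so the job is seen as having failed rather than completed, and an afterok dependent should not be started.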
________________________________________
From: [email protected] [[email protected]] On Behalf
Of Alejandro Lucero Palau [[email protected]]
Sent: Wednesday, March 30, 2011 8:17 AM
To: [email protected]
Subject: [slurm-dev] Job state after IO_setup failed at slurmstepd
Hi,
We have users submitting jobs with an afterok dependency. When a node
is not working properly (filesystem problems), IO_setup in slurmstepd
fails and the user program is never executed. However, the job is reported
as completed to slurmctld, so jobs that depend on it are executed.
There is probably a good reason for this, but I cannot see it.
We are working with slurm-2.1.6, but I have checked out the latest slurm
version and it seems to behave the same.