These patches will be in SLURM version 2.3.3, which will probably be
released next week.
Thanks!
Moe
Quoting "Mark A. Grondona":
It was found that slurmstepd was intermittently leaving SIGPIPE
blocked when launching user tasks. This may have something to do
with the fact that the xsignal_unblock() call in _fork_all_tasks()
is referencing an extern array (nominally this should have unblocked
SIGPIPE), but I didn't spend the time to fully track this issue
down. Instead, I figured there is probably no reason we would _not_
want to unblock *all* signals, so this patch does that.
Before this change, the following program fails every once in a while:
#include <stdio.h>
#include <signal.h>

int main (int ac, char **av)
{
	int i, rc = 0;
	struct sigaction act;

	for (i = 1; i < SIGRTMAX; i++) {
		/* Skip signal numbers the C library refuses to query;
		 * act would be left uninitialized for them. */
		if (sigaction (i, NULL, &act) < 0)
			continue;
		if (act.sa_handler == SIG_DFL)
			continue;
		fprintf (stderr, "Signal %d appears to be ignored!\n", i);
		rc = 1;
	}
	return (rc);
}
when run with:

	srun -N1 -n1 ./test

it sometimes prints (signal 13 is SIGPIPE on Linux):

	Signal 13 appears to be ignored!

After the change, the program succeeds.
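The test above inspects signal dispositions. A companion check of the blocked mask itself could look like the following sketch (report_blocked_signals() is a hypothetical name, not part of SLURM). It reads the current mask with sigprocmask() and a NULL new set, which changes nothing, and prints each signal it finds blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Print every signal blocked in the caller's mask to out and return
 * how many were found (0 means nothing is blocked, -1 on error). */
static int report_blocked_signals(FILE *out)
{
	sigset_t cur;
	int sig, count = 0;

	/* With a NULL new set, sigprocmask() only reports the current mask. */
	if (sigprocmask(SIG_BLOCK, NULL, &cur) < 0)
		return -1;
	for (sig = 1; sig <= SIGRTMAX; sig++) {
		/* sigismember() returns -1 for invalid numbers; only 1 means blocked. */
		if (sigismember(&cur, sig) == 1) {
			fprintf(out, "Signal %d appears to be blocked!\n", sig);
			count++;
		}
	}
	return count;
}
```

Called from main() in a task launched under srun, a non-zero return would indicate that the task inherited a blocked signal from its launcher.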
---
src/slurmd/slurmstepd/mgr.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
diff --git a/src/slurmd/slurmstepd/mgr.c b/src/slurmd/slurmstepd/mgr.c
index 87b5fda..698d4d3 100644
--- a/src/slurmd/slurmstepd/mgr.c
+++ b/src/slurmd/slurmstepd/mgr.c
@@ -1207,6 +1207,13 @@ static void prepare_stdio (slurmd_job_t *job, slurmd_task_info_t *task)
return;
}
+static void unblock_signals (void)
+{
+ sigset_t set;
+ sigemptyset(&set);
+ xsignal_set_mask (&set);
+}
+
/* fork and exec N tasks
*/
static int
@@ -1319,7 +1326,7 @@ _fork_all_tasks(slurmd_job_t *job)
/* log_fini(); */ /* note: moved into exec_task() */
- xsignal_unblock(slurmstepd_blocked_signals);
+ unblock_signals();
/*
* Need to setup stdio before setpgid() is called
--
1.7.1
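The mask matters here because a child created with fork() inherits its parent's signal mask, and the mask is preserved across exec(), so anything slurmstepd leaves blocked ends up blocked in the user's task. The sketch below uses a hypothetical helper, child_sees_sigpipe_blocked() (not SLURM code), that forks a child which optionally clears its mask, as unblock_signals() does before the exec, and reports through its exit status whether SIGPIPE is still blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that reports via its exit status whether SIGPIPE is
 * blocked, optionally clearing its mask first.
 * Returns 1 if the child saw SIGPIPE blocked, 0 if not, -1 on error. */
static int child_sees_sigpipe_blocked(int clear_mask_in_child)
{
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		sigset_t set, cur;

		if (clear_mask_in_child) {
			/* Same idea as unblock_signals(): install an empty mask. */
			sigemptyset(&set);
			sigprocmask(SIG_SETMASK, &set, NULL);
		}
		sigprocmask(SIG_BLOCK, NULL, &cur);
		/* _exit() avoids flushing stdio buffers copied from the parent. */
		_exit(sigismember(&cur, SIGPIPE) == 1 ? 1 : 0);
	}

	int status;
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

With SIGPIPE blocked in the parent, the child reports it as blocked unless it clears its mask first; the parent's own mask is unaffected either way, since each process has its own copy.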