These patches will be in SLURM version 2.3.3, which will probably be released next week.

Thanks!
Moe


Quoting "Mark A. Grondona" <[email protected]>:

It was found that slurmstepd was intermittently leaving SIGPIPE
blocked when launching user tasks. This may have something to do
with the fact that the xsignal_unblock() call in _fork_all_tasks()
is referencing an extern array (nominally this should have unblocked
SIGPIPE), but I didn't spend the time to fully track this issue
down. Instead, I figured there is probably no reason we would _not_
want to unblock *all* signals, so this patch does that.

Before this change, the following program fails every once in a while:

 #include <stdio.h>
 #include <signal.h>

 int main (int ac, char **av)
 {
        int i, rc = 0;
        struct sigaction act;
         for (i = 1; i < SIGRTMAX; i++) {
                 /* Skip signals that cannot be queried; act is not filled in */
                 if (sigaction (i, NULL, &act) < 0)
                         continue;
                 if (act.sa_handler == SIG_DFL)
                         continue;
                fprintf (stderr, "Signal %d appears to be ignored!\n", i);
                rc = 1;
        }
        return (rc);
 }

with:

 srun -N1 -n1 ./test
 Signal 13 appears to be ignored!

After the change, the program succeeds.
---
 src/slurmd/slurmstepd/mgr.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/src/slurmd/slurmstepd/mgr.c b/src/slurmd/slurmstepd/mgr.c
index 87b5fda..698d4d3 100644
--- a/src/slurmd/slurmstepd/mgr.c
+++ b/src/slurmd/slurmstepd/mgr.c
@@ -1207,6 +1207,13 @@ static void prepare_stdio (slurmd_job_t *job, slurmd_task_info_t *task)
        return;
 }

+static void unblock_signals (void)
+{
+       sigset_t set;
+       sigemptyset(&set);
+       xsignal_set_mask (&set);
+}
+
 /* fork and exec N tasks
  */
 static int
@@ -1319,7 +1326,7 @@ _fork_all_tasks(slurmd_job_t *job)

                        /* log_fini(); */ /* note: moved into exec_task() */

-                       xsignal_unblock(slurmstepd_blocked_signals);
+                       unblock_signals();

                        /*
                         *  Need to setup stdio before setpgid() is called
--
1.7.1




