These changes were motivated by the need to support Cray systems, and
conditionally compiling this bit of logic only on Cray systems currently
seems like the most attractive option to me.

LLNL has no Cray systems and the vast majority of jobs are
submitted using sbatch, so these changes have gone unnoticed
by our users.

Moe


Quoting [email protected]:

Moe,

Yes, that change does allow bash to run under salloc when salloc is
invoked from within a script. Other shells ("tcsh", "ksh", and "zsh") on my
system would also fail to run, and now work correctly.

I sympathize with Gerrit's problem of hung jobs,  and I understand his
motivation for trying to manage the potentially orphaned processes.  But I
have to contend with our users as well, many of whom have been happily
(until now) using salloc in scripts, or running it non-interactively as
background jobs,  without encountering any such dire consequences.

As you mentioned, perhaps the job control changes should be in conditional
code instead of general code, since the problem seems to be specific to
Cray?  What about other users of SLURM with Cray systems?   Doesn't LLNL
itself run SLURM on Cray?   Have you all encountered the same blockages of
resources that motivated Gerrit to add these changes?

        -Don Albert-





[email protected]
06/02/2011 06:47 AM

To
[email protected]
cc
[email protected]
Subject
Re: [slurm-dev] Running Salloc From Bash Fails Under 2.2.3?






Don,

I believe that the patch below will fix the problem that you have
reported and return salloc to something very similar to its previous
behavior. It would cause problems on Cray systems, so this logic might
need to be conditionally compiled. Perhaps you could try this change and
see how it works for you.


diff --git a/src/salloc/salloc.c b/src/salloc/salloc.c
index 4cbc9c5..092668f 100644
--- a/src/salloc/salloc.c
+++ b/src/salloc/salloc.c
@@ -219,10 +219,6 @@ int main(int argc, char *argv[])
          * a) input is from a terminal (stdin has valid termios attributes),
          * b) controlling terminal exists (non-negative tpgid),
          * c) salloc is not run in allocation-only (--no-shell) mode,
-        * d) salloc runs in its own process group (true in interactive
-        *    shells that support job control),
-        * e) salloc has been configured at compile-time to support background
-        *    execution and is not currently in the background process group.
          */
         if (tcgetattr(STDIN_FILENO, &saved_tty_attributes) < 0) {
                 /*
@@ -234,9 +230,8 @@ int main(int argc, char *argv[])
                         error("no controlling terminal: please set --no-shell");
                         exit(error_exit);
                 }
-       } else if ((!opt.no_shell) && (pid == getpgrp())) {
-               if (tpgid == pid)
-                       is_interactive = true;
+       } else if (!opt.no_shell) {
+               is_interactive = true;
  #ifdef SALLOC_RUN_FOREGROUND
                 while (tcgetpgrp(STDIN_FILENO) != pid) {
                         if (!is_interactive) {


Quoting Gerrit Renker <[email protected]>:

Don,

as I stated earlier, I am sorry that this broke things for you, but ...

There was one major and one minor reason for submitting the set of job
control patches. The minor one, which I would have no problem with the
SLURM developers withdrawing, is that without job control there will
always be situations where users are forced to use kill -9 or similar to
get rid of salloc, since salloc is then not in control of the
sub-processes it runs.

The major reason for keeping this is the proprietary system we use; the
technical details are below. I am not writing these to enter into a
discussion of Cray features, but hope that the work at ORNL proceeds to
replace ALPS with SLURM. Then the problem described below would not
exist. That problem basically forced us to either
 * disable interactive sessions on our systems (unacceptable, since both
   researchers and newcomers rely on interactive sessions to test out
   how a particular combination of parameters works), or
 * use job control as a trade-off.

The situation in December was such that it took only a single mistake by
one user to make an entire multi-cabinet system unusable; i.e. if SLURM
were a firewall or malware protection program, we would not even have this
discussion.

In December, before job control had been added, we twice had stretches on
the order of 10-12 hours where the machine became blocked and no new jobs
would run; both of the messed-up salloc sessions happened in the evening.
Perhaps I was not clear in my description of the problem, but the patch
you supplied most emphatically does *not* solve the problem that the bash
shell is crashing and getting into the SIGTTIN loop before ever issuing
its own prompt to the user! The "/bin/bash" command should have executed
the shell and allowed the user to enter commands, and not immediately
terminated.

Have you tried this with other shells? This is coming from bash itself,
which tries to become the foreground process in order to perform its own
job control. When run within a job script, salloc does not allow another
process to come into the foreground; it only permits this when run in
interactive mode. The behaviour is expected; it may be that other shells
try less aggressively to move into the foreground.

You seem to imply that it is somehow illegitimate to execute "salloc"
within a script. I submit that it is almost second nature for Linux/Unix
programmers to create various "wrapper" scripts to invoke commands
(including "salloc") with certain fixed parameters, while allowing easy
substitution of other parameters.
If I understand you correctly, you mean a shell invoking a shell, such as:

% cat a
#!/bin/bash
/bin/sh -c /bin/bash
% ./a
% ps f -o pid,ppid,sid,pgid,tpgid,stat,wchan,cmd
  PID  PPID   SID  PGID TPGID STAT WCHAN  CMD
25480 25477 25480 25480 26024 Ss   wait   -bash
25739 25480 25480 25739 26024 S    wait    \_ sh a
25740 25739 25480 25740 26024 S    wait        \_ /bin/bash
26024 25740 25480 26024 26024 R+   -               \_ ps f -o pid,ppid,sid,pgid,tpgid,stat,wchan,cmd

The three nested shells all have the same session ID; the login shell
25480 remains session leader. The bottom-level shell 25740 has
backgrounded itself so that the ps command it forked can be the
foreground process (tpgid = 26024).

The "salloc" command itself is essentially a wrapper which allows a user
to invoke a specific command or shell after calling SLURM to reserve some
resources. I see no reason that salloc should not be able to be executed
within a script.

The above use case uses shell scripts to construct interactive sessions.
I have found no non-ugly way of allowing this, and believe also that sh
or bash resort to some tricks and heuristics to make this possible (i.e.
I am not sure that sh/bash will always get this right).

The reason is the "some resources". When a user kills the salloc process,
the remote processes running via aprun on compute nodes are not
terminated at the same time. Once the salloc session has in addition
either been terminated via scancel or naturally timed out, it would be
time to free up the resources reserved for this session. Very likely
SLURM itself could handle this situation.

However, on Cray the compute nodes are not under the control of SLURM,
but of the Basil batch system layer. At the end of the salloc session,
this layer receives a notification that the job is done. However, it will
refuse to launch any new reservations until it has cleaned up the
existing ones. But it cannot clean up the existing ones while the remote
orphaned processes are still executing. Until an operator comes in (e.g.
in the middle of the night) and cleans up the orphaned processes, no new
jobs will run, since the old reservation sits in "pending cancel" state.

Hence we do not allow other processes to obtain control over salloc
unless salloc is running in an interactive mode.

We run SLURM on 3 XT systems, including our main production system, and
one XE system. When we migrated we had many novice users. Since the
introduction of job control into salloc, we have not had a single case of
machine-unusable time caused by orphaned child processes of salloc.

Hence if you have your way, we would need to fork in order to keep
things running.

