In order to allow the customer to continue his regression tests without 
modifying many script files,  the "salloc" command at the site has been 
modified to take out the lines checking for salloc being in the foreground 
when it starts. 

Will this behavior (requiring foreground) remain the default for salloc? 
Or can we come up with a way to allow salloc to run as background and 
still improve the handling of signals when it has a controlling terminal?

  -Don Albert-

[email protected] wrote on 02/04/2011 03:12:23 PM:

> 
> Gerrit, 
> 
> I looked back in the "slurm-dev" list and found some discussion of 
> this issue about signal handling in "salloc" back in December 2010. 
> At that time you proposed a patch to add more signal handling to 
> salloc.c.   But I did not find a patch or discussion on the list 
> about arbitrarily restricting salloc to running in foreground mode only. 

> 
> Yes, it would be possible to use sbatch instead.  However,  a 
> difference between salloc and sbatch is that the sbatch script is 
> submitted for execution on one of the nodes of the allocation, 
> rather than the command running on the node submitting the request. 
> 
> You can also bypass the test for "is_interactive" by adding input 
> redirection to the salloc: 
> 
> salloc -n 16 -N 1 mpirun -n 16 <some-mpi-job> </dev/null & 
> 
> and it will run in the background. 
> 
> But I guess my major complaint is that this was a fairly major 
> change introduced right at the end of a release cycle with no 
> warning and no documentation indicating that salloc was now intended
> for foreground use only. 
> 
>   -Don Albert- 
> 
> [email protected] wrote on 02/04/2011 11:05:38 AM:
> 
> > While considering a patch for this, I decided against it, having been 
at the
> > same point about a month ago. The problem is that the command 
namefollowing
> > salloc has no restrictions (and also the SallocDefaultCommand): it 
could be
> > a non-interactive command (such as an mpirun), but it could as well 
> > be a shell.
> > 
> > So it divided into two cases:
> >  * interactive subcommands spawned by salloc, such as e.g. a shell,
> >  * non-interactive subcommands run by salloc, e.g. mpirun
> > 
> > For the latter a batch script seems suitable. 
> > 
> > While looking through the manpages -- did you perhaps consider
> > 
> >  sbatch -n 16 -N1 --wrap="mpirun -n 16 <some-mpi-job>" 
> > 
> > (with or without ampersand)?
> > 
> > Gerrit
> > 
> > On Fri, 4 Feb 2011 08:48:45 -0700 you wrote:
> > > Gerrit,
> > > 
> > > This report came to me from a customer site.  From what I gather, 
they 
> > > are running a lot of test jobs from scripts that use "salloc" with 
> > > background jobs of the form:
> > > 
> > > salloc -n 16 -N 1 mpirun -n 16 <some-mpi-job> &
> > > 
> > > using SLURM to generate an allocation and mpirun to run a job.  As 
such, 
> > > I don't think they need to be the controlling terminal as they would 
if 
> > > they were launching a shell under an allocation and running jobs 
> > > interactively.
> > > 
> > > Perhaps this could have been done using "sbatch" instead of 
"salloc", but 
> > > the fact remains that this change in the latest update broke 
> their testing 
> > > procedures for regression tests on the new release.
> > > 
> > >         -Don Albert-
> > > 
> > > 
> > > Gerrit Renker <[email protected]> wrote on 02/03/2011 11:15:30 PM:
> > > 
> > > > Hi Don,
> > > > 
> > > > I submitted the patch and can give an account why it is necessary. 

> > > > We had lots of
> > > > problems with salloc due to the absence of job control (meaning 
> > > > those jobs that
> > > > were spawned by salloc as child processes).
> > > > 
> > > > This is not the only change, it needs to be seen in the context of 

> > > > the others. The
> > > > loop is used in order to gain control over the terminal. As long 
as 
> > > > salloc runs in
> > > > the background, it is not in control of the terminal.
> > > > 
> > > > This piece of code is comparable to running
> > > > prompt> bash &
> > > > [1] 3291
> > > > prompt> jobs -l
> > > > [1]+  3291 Stopped (tty input)     bash
> > > > 
> > > > bash is doing the same thing - as long as it is not the foreground 

> > > > process in control
> > > > of the terminal, it receives SIGTTIN to stop itself.
> > > > 
> > > > Further below in salloc, once it is in the foreground, it makes 
> > > > itself the controlling
> > > > process (tpgid), and then hands this over to the child.
> > > > 
> > > > Why would you want to start salloc in the background if, once you 
> > > > use it, it needs to
> > > > run in the foreground?
> > > > 
> > > > Gerrit
> > > > 
> > > > On Thu, 3 Feb 2011 11:27:50 -0700 you wrote:
> > > > > There appears to have been a change in "salloc.c" sometime 
between 
> > > SLURM 
> > > > > 2.2.0-RC1 and the final release of SLURM 2.2.0,  involving 
signal 
> > > handling 
> > > > > and whether "salloc" is running in the foreground or background. 
 In 
> > > > > particular, the lines:
> > > > > 
> > > > >         is_interactive = isatty(STDIN_FILENO);
> > > > >         if (is_interactive) {
> > > > >                 bool sent_msg = false;
> > > > >                 /* Wait as long as we are running in the 
background */
> > > > >                 while (tcgetpgrp(STDIN_FILENO) != (pid = 
getpgrp())) {
> > > > >                         if (!sent_msg) {
> > > > >                                 error("Waiting for program to be 

> > > placed in 
> > > > > "
> > > > >                                       "the foreground");
> > > > >                                 sent_msg = true;
> > > > >                         }
> > > > >                         killpg(pid, SIGTTIN);
> > > > >                 }
> > > > > 
> > > > >                 /*
> > > > >                  * Save tty attributes and reset at exit, in 
case a 
> > > child
> > > > >                  * process died before properly resetting 
terminal.
> > > > >                  */
> > > > >                 tcgetattr (STDIN_FILENO, &saved_tty_attributes);
> > > > >                 atexit (_reset_input_mode);
> > > > >         }
> > > > > 
> > > > > were added right at the beginning of the "main" function in 
> > > "salloc.c". 
> > > > > There are no comments to indicate the rationale for this change. 
  I 
> > > don't 
> > > > > recall any discussion of such a change in the "slurm-dev" 
> list,  but I 
> > > 
> > > > > could have missed it.
> > > > > 
> > > > > This change seems to prevent salloc from being launched as a 
> > > background 
> > > > > process.   An salloc with a simple command like "sleep" gets:
> > > > > 
> > > > > [stag] (dalbert) dja-slurm> salloc -n 2 sleep 10 &
> > > > > [2] 30235
> > > > > [stag] (dalbert) dja-slurm> salloc: error: Waiting for program 
to be 
> > > > > placed in the foreground
> > > > > 
> > > > > and the job sits until you bring it to the foreground.   Can 
someone 
> > > > > comment on the reason for this change?
> > > > > 
> > > > >         -Don Albert-

Reply via email to