Gerrit,

I looked back in the "slurm-dev" list and found some discussion of this 
issue about signal handling in "salloc" back in December 2010.  At that 
time you proposed a patch to add more signal handling to salloc.c.   But I 
did not find a patch or discussion on the list about arbitrarily 
restricting salloc to running in foreground mode only.

Yes, it would be possible to use sbatch instead.  However,  a difference 
between salloc and sbatch is that the sbatch script is submitted for 
execution on one of the nodes of the allocation,  rather than the command 
running on the node submitting the request.

You can also bypass the test for "is_interactive" by adding input 
redirection to the salloc: 

salloc -n 16 -N 1 mpirun -n 16 <some-mpi-job> </dev/null &

and it will run in the background.

But I guess my major complaint is that this was a fairly major change 
introduced right at the end of a release cycle with no warning and no 
documentation indicating that salloc was now intended for foreground use 
only.

  -Don Albert-

[email protected] wrote on 02/04/2011 11:05:38 AM:

> While considering a patch for this, I decided against it, having been at 
the
> same point about a month ago. The problem is that the command name 
following
> salloc has no restrictions (and also the SallocDefaultCommand): it could 
be
> a non-interactive command (such as an mpirun), but it could as well 
> be a shell.
> 
> So it divided into two cases:
>  * interactive subcommands spawned by salloc, such as e.g. a shell,
>  * non-interactive subcommands run by salloc, e.g. mpirun
> 
> For the latter a batch script seems suitable. 
> 
> While looking through the manpages -- did you perhaps consider
> 
>  sbatch -n 16 -N1 --wrap="mpirun -n 16 <some-mpi-job>" 
> 
> (with or without ampersand)?
> 
> Gerrit
> 
> On Fri, 4 Feb 2011 08:48:45 -0700 you wrote:
> > Gerrit,
> > 
> > This report came to me from a customer site.  From what I gather, they 

> > are running a lot of test jobs from scripts that use "salloc" with 
> > background jobs of the form:
> > 
> > salloc -n 16 -N 1 mpirun -n 16 <some-mpi-job> &
> > 
> > using SLURM to generate an allocation and mpirun to run a job.   As 
such, 
> > I don't think they need to be the controlling terminal as they would 
if 
> > they were launching a shell under an allocation and running jobs 
> > interactively.
> > 
> > Perhaps this could have been done using "sbatch" instead of "salloc", 
but 
> > the fact remains that this change in the latest update broke their 
testing 
> > procedures for regression tests on the new release.
> > 
> >         -Don Albert-
> > 
> > 
> > Gerrit Renker <[email protected]> wrote on 02/03/2011 11:15:30 PM:
> > 
> > > Hi Don,
> > > 
> > > I submitted the patch and can give an account why it is necessary. 
> > > We had lots of
> > > problems with salloc due to the absence of job control (meaning 
> > > those jobs that
> > > were spawned by salloc as child processes).
> > > 
> > > This is not the only change, it needs to be seen in the context of 
> > > the others. The
> > > loop is used in order to gain control over the terminal. As long as 
> > > salloc runs in
> > > the background, it is not in control of the terminal.
> > > 
> > > This piece of code is comparable to running
> > > prompt> bash &
> > > [1] 3291
> > > prompt> jobs -l
> > > [1]+  3291 Stopped (tty input)     bash
> > > 
> > > bash is doing the same thing - as long as it is not the foreground 
> > > process in control
> > > of the terminal, it receives SIGTTIN to stop itself.
> > > 
> > > Further below in salloc, once it is in the foreground, it makes 
> > > itself the controlling
> > > process (tpgid), and then hands this over to the child.
> > > 
> > > Why would you want to start salloc in the background if, once you 
> > > use it, it needs to
> > > run in the foreground?
> > > 
> > > Gerrit
> > > 
> > > On Thu, 3 Feb 2011 11:27:50 -0700 you wrote:
> > > > There appears to have been a change in "salloc.c" sometime between 

> > SLURM 
> > > > 2.2.0-RC1 and the final release of SLURM 2.2.0,  involving signal 
> > handling 
> > > > and whether "salloc" is running in the foreground or background. 
In 
> > > > particular, the lines:
> > > > 
> > > >         is_interactive = isatty(STDIN_FILENO);
> > > >         if (is_interactive) {
> > > >                 bool sent_msg = false;
> > > >                 /* Wait as long as we are running in the 
background */
> > > >                 while (tcgetpgrp(STDIN_FILENO) != (pid = 
getpgrp())) {
> > > >                         if (!sent_msg) {
> > > >                                 error("Waiting for program to be 
> > placed in 
> > > > "
> > > >                                       "the foreground");
> > > >                                 sent_msg = true;
> > > >                         }
> > > >                         killpg(pid, SIGTTIN);
> > > >                 }
> > > > 
> > > >                 /*
> > > >                  * Save tty attributes and reset at exit, in case 
a 
> > child
> > > >                  * process died before properly resetting 
terminal.
> > > >                  */
> > > >                 tcgetattr (STDIN_FILENO, &saved_tty_attributes);
> > > >                 atexit (_reset_input_mode);
> > > >         }
> > > > 
> > > > were added right at the beginning of the "main" function in 
> > "salloc.c". 
> > > > There are no comments to indicate the rationale for this change. I 

> > don't 
> > > > recall any discussion of such a change in the "slurm-dev" list, 
but I 
> > 
> > > > could have missed it.
> > > > 
> > > > This change seems to prevent salloc from being launched as a 
> > background 
> > > > process.   An salloc with a simple command like "sleep" gets:
> > > > 
> > > > [stag] (dalbert) dja-slurm> salloc -n 2 sleep 10 &
> > > > [2] 30235
> > > > [stag] (dalbert) dja-slurm> salloc: error: Waiting for program to 
be 
> > > > placed in the foreground
> > > > 
> > > > and the job sits until you bring it to the foreground.   Can 
someone 
> > > > comment on the reason for this change?
> > > > 
> > > >         -Don Albert-

Reply via email to