I considered writing a patch for this but decided against it, having been at
the same point about a month ago. The problem is that there are no
restrictions on the command that follows salloc (or on SallocDefaultCommand):
it could be a non-interactive command (such as mpirun), but it could just as
well be a shell.

So it divides into two cases:
 * interactive subcommands spawned by salloc, e.g. a shell,
 * non-interactive subcommands run by salloc, e.g. mpirun.

For the latter, a batch script seems suitable.
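
To make the limitation concrete: what salloc can test from the inside is its
own relationship to the terminal, not whether the command it is about to spawn
is interactive. The following is a minimal sketch of that test as a standalone
helper, built from the same isatty()/tcgetpgrp()/getpgrp() calls as the loop
quoted further down. The names foreground_state and fg_state are hypothetical
and do not appear in salloc.c.

```c
#include <unistd.h>   /* isatty, tcgetpgrp, getpgrp */

/* Hypothetical helper for illustration only; not part of salloc.c. */
enum fg_state {
        FG_NOT_A_TTY  = -1,   /* fd is not a terminal (pipe, file, ...)     */
        FG_BACKGROUND =  0,   /* terminal, but we are not its foreground pg */
        FG_FOREGROUND =  1    /* our process group owns the terminal        */
};

enum fg_state foreground_state(int fd)
{
        /* Job control only applies when fd refers to a terminal. */
        if (!isatty(fd))
                return FG_NOT_A_TTY;

        /* Compare the terminal's foreground process group with our own. */
        return (tcgetpgrp(fd) == getpgrp()) ? FG_FOREGROUND : FG_BACKGROUND;
}
```

Called as foreground_state(STDIN_FILENO), this reports FG_BACKGROUND both for
"salloc ... bash &" and for "salloc ... mpirun ... &", which is why the check
alone cannot separate the two cases above.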

While looking through the manpages -- did you perhaps consider

 sbatch -n 16 -N1 --wrap="mpirun -n 16 <some-mpi-job>"  

(with or without ampersand)?

Gerrit

On Fri, 4 Feb 2011 08:48:45 -0700 you wrote:
> Gerrit,
> 
> This report came to me from a customer site.  From what I gather,  they 
> are running a lot of test jobs from scripts that use "salloc" with 
> background jobs of the form:
> 
> salloc -n 16 -N 1 mpirun -n 16 <some-mpi-job> &
> 
> using SLURM to generate an allocation and mpirun to run a job.   As such, 
> I don't think they need to be the controlling terminal as they would if 
> they were launching a shell under an allocation and running jobs 
> interactively.
> 
> Perhaps this could have been done using "sbatch" instead of "salloc", but 
> the fact remains that this change in the latest update broke their testing 
> procedures for regression tests on the new release.
> 
>         -Don Albert-
> 
> 
> Gerrit Renker <[email protected]> wrote on 02/03/2011 11:15:30 PM:
> 
> > Hi Don,
> > 
> > I submitted the patch and can give an account of why it is necessary. We
> > had lots of problems with salloc due to the absence of job control (meaning
> > those jobs that were spawned by salloc as child processes).
> > 
> > This is not the only change; it needs to be seen in the context of the
> > others. The loop is used in order to gain control over the terminal. As
> > long as salloc runs in the background, it is not in control of the terminal.
> > 
> > This piece of code is comparable to running
> > prompt> bash &
> > [1] 3291
> > prompt> jobs -l
> > [1]+  3291 Stopped (tty input)     bash
> > 
> > bash is doing the same thing - as long as it is not the foreground process
> > in control of the terminal, it receives SIGTTIN to stop itself.
> > 
> > Further below in salloc, once it is in the foreground, it makes itself the
> > controlling process (tpgid), and then hands this over to the child.
> > 
> > Why would you want to start salloc in the background if, once you use it,
> > it needs to run in the foreground?
> > 
> > Gerrit
> > 
> > On Thu, 3 Feb 2011 11:27:50 -0700 you wrote:
> > > There appears to have been a change in "salloc.c" sometime between SLURM
> > > 2.2.0-RC1 and the final release of SLURM 2.2.0, involving signal handling
> > > and whether "salloc" is running in the foreground or background.  In
> > > particular, the lines:
> > > 
> > >         is_interactive = isatty(STDIN_FILENO);
> > >         if (is_interactive) {
> > >                 bool sent_msg = false;
> > >                 /* Wait as long as we are running in the background */
> > >                 while (tcgetpgrp(STDIN_FILENO) != (pid = getpgrp())) {
> > >                         if (!sent_msg) {
> > >                                 error("Waiting for program to be placed in "
> > >                                       "the foreground");
> > >                                 sent_msg = true;
> > >                         }
> > >                         killpg(pid, SIGTTIN);
> > >                 }
> > > 
> > >                 /*
> > >                  * Save tty attributes and reset at exit, in case a child
> > >                  * process died before properly resetting terminal.
> > >                  */
> > >                 tcgetattr (STDIN_FILENO, &saved_tty_attributes);
> > >                 atexit (_reset_input_mode);
> > >         }
> > > 
> > > were added right at the beginning of the "main" function in "salloc.c".
> > > There are no comments to indicate the rationale for this change.  I don't
> > > recall any discussion of such a change in the "slurm-dev" list, but I
> > > could have missed it.
> > > 
> > > This change seems to prevent salloc from being launched as a background
> > > process.  An salloc with a simple command like "sleep" gets:
> > > 
> > > [stag] (dalbert) dja-slurm> salloc -n 2 sleep 10 &
> > > [2] 30235
> > > [stag] (dalbert) dja-slurm> salloc: error: Waiting for program to be placed in the foreground
> > > 
> > > and the job sits until you bring it to the foreground.   Can someone 
> > > comment on the reason for this change?
> > > 
> > >         -Don Albert-
