While considering a patch for this, I decided against it, having been at the same point about a month ago. The problem is that there are no restrictions on the command name following salloc (nor on SallocDefaultCommand): it could be a non-interactive command (such as mpirun), but it could just as well be a shell.
So it divided into two cases:
 * interactive subcommands spawned by salloc, such as e.g. a shell,
 * non-interactive subcommands run by salloc, e.g. mpirun

For the latter a batch script seems suitable. While looking through the manpages -- did you perhaps consider

   sbatch -n 16 -N1 --wrap="mpirun -n 16 <some-mpi-job>"

(with or without ampersand)?

Gerrit

On Fri, 4 Feb 2011 08:48:45 -0700 you wrote:
> Gerrit,
>
> This report came to me from a customer site. From what I gather, they
> are running a lot of test jobs from scripts that use "salloc" with
> background jobs of the form:
>
>    salloc -n 16 -N 1 mpirun -n 16 <some-mpi-job> &
>
> using SLURM to generate an allocation and mpirun to run a job. As such,
> I don't think they need to be the controlling terminal as they would if
> they were launching a shell under an allocation and running jobs
> interactively.
>
> Perhaps this could have been done using "sbatch" instead of "salloc", but
> the fact remains that this change in the latest update broke their testing
> procedures for regression tests on the new release.
>
> -Don Albert-
>
>
> Gerrit Renker <[email protected]> wrote on 02/03/2011 11:15:30 PM:
>
> > Hi Don,
> >
> > I submitted the patch and can give an account why it is necessary.
> > We had lots of problems with salloc due to the absence of job control
> > (meaning those jobs that were spawned by salloc as child processes).
> >
> > This is not the only change, it needs to be seen in the context of
> > the others. The loop is used in order to gain control over the
> > terminal. As long as salloc runs in the background, it is not in
> > control of the terminal.
> >
> > This piece of code is comparable to running
> >    prompt> bash &
> >    [1] 3291
> >    prompt> jobs -l
> >    [1]+ 3291 Stopped (tty input) bash
> >
> > bash is doing the same thing - as long as it is not the foreground
> > process in control of the terminal, it receives SIGTTIN to stop itself.
> >
> > Further below in salloc, once it is in the foreground, it makes
> > itself the controlling process (tpgid), and then hands this over
> > to the child.
> >
> > Why would you want to start salloc in the background if, once you
> > use it, it needs to run in the foreground?
> >
> > Gerrit
> >
> > On Thu, 3 Feb 2011 11:27:50 -0700 you wrote:
> > > There appears to have been a change in "salloc.c" sometime between SLURM
> > > 2.2.0-RC1 and the final release of SLURM 2.2.0, involving signal handling
> > > and whether "salloc" is running in the foreground or background. In
> > > particular, the lines:
> > >
> > >         is_interactive = isatty(STDIN_FILENO);
> > >         if (is_interactive) {
> > >                 bool sent_msg = false;
> > >                 /* Wait as long as we are running in the background */
> > >                 while (tcgetpgrp(STDIN_FILENO) != (pid = getpgrp())) {
> > >                         if (!sent_msg) {
> > >                                 error("Waiting for program to be placed in "
> > >                                       "the foreground");
> > >                                 sent_msg = true;
> > >                         }
> > >                         killpg(pid, SIGTTIN);
> > >                 }
> > >
> > >                 /*
> > >                  * Save tty attributes and reset at exit, in case a child
> > >                  * process died before properly resetting terminal.
> > >                  */
> > >                 tcgetattr (STDIN_FILENO, &saved_tty_attributes);
> > >                 atexit (_reset_input_mode);
> > >         }
> > >
> > > were added right at the beginning of the "main" function in "salloc.c".
> > > There are no comments to indicate the rationale for this change. I don't
> > > recall any discussion of such a change in the "slurm-dev" list, but I
> > > could have missed it.
> > >
> > > This change seems to prevent salloc from being launched as a background
> > > process. An salloc with a simple command like "sleep" gets:
> > >
> > > [stag] (dalbert) dja-slurm> salloc -n 2 sleep 10 &
> > > [2] 30235
> > > [stag] (dalbert) dja-slurm> salloc: error: Waiting for program to be placed in the foreground
> > >
> > > and the job sits until you bring it to the foreground. Can someone
> > > comment on the reason for this change?
> > >
> > > -Don Albert-
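
For readers who want to try the foreground test discussed in this thread outside of salloc, here is a minimal sketch. It is not code from SLURM: the helper name in_foreground() is hypothetical, and it only combines the POSIX calls isatty(), tcgetpgrp() and getpgrp() that appear in the quoted salloc.c excerpt. Unlike the quoted loop, it only reports the state and never sends SIGTTIN.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of SLURM).
 * Returns true only if `fd` refers to a terminal AND the calling process
 * group is that terminal's foreground process group.
 */
static bool in_foreground(int fd)
{
	if (!isatty(fd))
		return false;	/* pipe, file, or invalid fd: no terminal to control */

	/* Same comparison as the while-condition in the quoted salloc.c lines */
	return tcgetpgrp(fd) == getpgrp();
}
```

Calling in_foreground(STDIN_FILENO) from a program started with a trailing "&" in an interactive shell with job control should return false, which is the situation where the quoted loop stops salloc with SIGTTIN. When stdin is not a terminal at all, the helper also returns false, so a caller that needs to tell the two cases apart would have to check isatty() separately, as salloc does via is_interactive.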
