Gerrit, sorry to come across as unconstructive, but Don's points were sufficient to express the problem, and his patch (which looks like the one I wrote at CEA) is interesting: as you said, not perfect, but at least a good basis for finding an alternative solution. I guess your users do not use salloc to launch background applications. That is great for you, but we could perhaps try to find a solution that works for everyone.

As point 4 of the comment says, in the latest versions of SLURM the ability to launch a process in the background using salloc has been completely removed, to prevent interactive shells/applications launched with salloc in the background from being badly handled. Well, launching an interactive application in the background with salloc sounds really weird to me. The logic you added to better manage signals and the TTY is really great, but it should be reserved for applications launched in the foreground.

Fine, so why not add an option to salloc to request this specific behavior and leave the default untouched? We could have a "-i" option (like "bash -i") in salloc to ask for this "foreground only" mode; if it is not specified, salloc would follow the behavior of Don's patch, which is to treat a backgrounded salloc as non-interactive. salloc does not have an equivalent of "bash -c" because that was its default behavior. The --no-shell option is not an equivalent, as it returns as soon as the allocation is granted. If it is not possible to add a "-i", perhaps we could add the equivalent of bash's "-c" instead.

Let me know your views. I have started asking my users to redirect STDIN externally when submitting in the background, but I think that is far less intuitive, as they were never asked to run "bash -c command </dev/null &".

Regards,
Matthieu

2011/2/15 Gerrit Renker <[email protected]>

> On Tue, 15 Feb 2011 08:39:10 +0100 Matthieu wrote:
> > Hi,
> >
> > I come a little bit late on that but I would like to add that I agree with
> > Don on that.
> >
> > IMHO, modifying such a behavior is not really great. There is more scenarios
> > where salloc is executing a non-interactive command (salloc/mpirun) in
> > background than scenarios where it is running a particular shell
> > interactively _starting_ in background. If it is mandatory to have this
> > behavior for interactive application I would rather have a new option for
> > salloc to use this mode that making it the default.
> > At least for me, 99% of the backgrounded salloc are made for mpirun
> > executions in non regression tests, I am not sure that my users will be
> > happy to rewrite all their scripts or python programs that launches
> > concurrent salloc/mpirun using threads (automatically set in background
> > by python).
> >
>
> So far you are saying the same as Don. I have neither disagreed with your
> and Don's comments, but apart from claiming that it "broke" things, there
> have been no constructive suggestions how to make this better.
>
> The change affects solely jobs started in the background, for foreground
> processes the behaviour is not "broken" (it did not change).
>
> If you can be sure that the job is indeed non-interactive, it can still be
> started in the background, but apparently there is a large number of
> scripts that resist any changes through e.g. perl or sed.
>
> But if the job is interactive, starting it in the background will result in
> hanging salloc, as per reply to Mark. There is no way of bringing such a job
> into the foreground within salloc, the session is doomed to fail.
>
> That case is indeed broken, and this is not limited to the question
> background or not. We had a user running gdb in this way, the lack of job
> control in salloc lead him having to use skill/scancel to clean up the hung
> session.
>
> > Would it be possible to make it configurable ?
> >
> But how do you intend that? As per the reply to Don, there is no a priori
> way of telling whether a program is run in interactive mode or not. Program
> name or commandline flags are not sufficient - a shell can also run in non
> interactive mode.
>
> Gerrit
>
> > 2011/2/15 <[email protected]>
> >
> > > It looks as though I am being outvoted in this, but I would like to
> > > make a few more points:
> > >
> > > 1. The reason I got involved in this was that a rather large Bull
> > >    customer has acceptance test script jobs that submit thousands of
> > >    "salloc" requests as background jobs. These scripts worked just
> > >    fine, and then they were broken by a change that appeared in the
> > >    final version of 2.2.0, with no explanation of why salloc is
> > >    suddenly restricted to running in the foreground.
> > > 2. I understand that there are job control issues with salloc when run
> > >    in the background, and that the changes in signal handling that
> > >    Gerrit made improve the situation when salloc is run in the
> > >    foreground by retaining better control of the job from the
> > >    terminal, but I disagree that this is sufficient justification to
> > >    remove the ability to run salloc in the background, especially
> > >    since this change can be trivially bypassed by using input
> > >    redirection. All that has been accomplished is to break the scripts
> > >    I mentioned above and whatever else depended on the current
> > >    behavior of salloc, and force users to add a "kludge" to obtain the
> > >    behavior they had before. The customer scripts above have a
> > >    legitimate use in invoking "non-interactive" usage of salloc, as do
> > >    other examples such as starting an "xterm" on a SLURM allocation.
> > > 3. I disagree that the proposed comments in the code provide
> > >    sufficient explanation of this change. The new comments explain
> > >    that salloc must be running in the foreground to issue the
> > >    "tcsetpgrp" call and run "interactive" subprocesses, but they do
> > >    not explain the rationale for disallowing salloc to run in the
> > >    background when it is running only "non-interactive" subprocesses.
> > > 4. The test case for my patch of submitting an interactive shell as a
> > >    background job request is spurious. As Gerrit said, "if starting an
> > >    interactive session via salloc, why would a user want to start it
> > >    in a stopped state"? The answer is: you wouldn't. If you wanted to
> > >    run interactively, you wouldn't add that "&" at the end of your
> > >    command. But if you knew that you wanted to run something that
> > >    would run "non-interactively", such as an "mpirun" or an "xterm",
> > >    why would you not want to be able to add that "&", and free up your
> > >    terminal or script for other commands? As Mark noted previously, if
> > >    users inadvertently try to run jobs that need to be interactive in
> > >    the background, they should fairly quickly learn that it isn't a
> > >    good idea, whether under salloc or just a normal shell.
> > >
> > > Ok, I've had my say. I will rest my case now.
> > >
> > > -Don Albert-
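
For the Python launchers mentioned in the thread, the external STDIN redirection can be done in the launcher itself instead of in the shell. What follows is only a minimal sketch under stated assumptions, not anything taken from salloc or from Don's patch: the helper names is_foreground and run_non_interactive are hypothetical, and the child command is a stand-in where a real salloc/mpirun command line would go. The foreground check compares the terminal's foreground process group with our own, which is one common way to tell whether a process was started with "&".

```python
import os
import subprocess
import sys


def is_foreground(fd=0):
    """Return True if fd is a terminal whose foreground process group is ours.

    Hypothetical helper: a process started with "&" from a job-control shell
    normally belongs to a different process group than the terminal's
    foreground group.
    """
    try:
        return os.isatty(fd) and os.tcgetpgrp(fd) == os.getpgrp()
    except OSError:
        # No usable controlling terminal (batch script, cron, closed stdin).
        return False


def run_non_interactive(cmd):
    """Run cmd with stdin tied to /dev/null, like `cmd </dev/null` in a shell."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # the child sees end-of-file immediately
        stdout=subprocess.PIPE,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Stand-in child that reports what it reads on stdin. A real launcher
    # would pass something like ["salloc", "-N", "2", "mpirun", "./a.out"]
    # (hypothetical command line) instead.
    probe = [sys.executable, "-c", "import sys; print(repr(sys.stdin.read()))"]
    result = run_non_interactive(probe)
    print("foreground:", is_foreground())
    print("child read from stdin:", result.stdout.strip())
```

Because the redirection happens in the launcher, each thread can call run_non_interactive without the existing scripts having to append "</dev/null" to every command.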
