Probably the only way to resolve this is to provide an option to support either
behavior. The attached patch adds a configuration option at SLURM build time.
This is from the NEWS file:
-- Add new SLURM configure time parameter of --enable-salloc-background. If
set, then salloc can execute in the background. Otherwise a message will be
printed and the job allocation halted until brought into the foreground.
________________________________________
From: [email protected] [[email protected]] On Behalf
Of Gerrit Renker [[email protected]]
Sent: Tuesday, February 15, 2011 12:41 AM
To: [email protected]
Cc: [email protected]
Subject: Re: [slurm-dev] Change in "salloc" to inhibit running in background?
On Tue, 15 Feb 2011 08:39:10 +0100 Matthieu wrote:
> Hi,
>
> I am coming a little late to this, but I would like to add that I agree
> with Don.
>
> IMHO, modifying such a behavior is not really great. There are more
> scenarios where salloc runs a non-interactive command (salloc/mpirun) in
> the background than scenarios where it runs a particular shell
> interactively _starting_ in the background. If this behavior is mandatory
> for interactive applications, I would rather have a new salloc option to
> enable this mode than make it the default. At least for me, 99% of the
> backgrounded salloc calls are mpirun executions in non-regression tests,
> and I am not sure that my users will be happy to rewrite all their scripts
> or Python programs that launch concurrent salloc/mpirun using threads
> (automatically put in the background by Python).
>
>
So far you are saying the same as Don. I have not disagreed with either your
or Don's comments, but apart from the claim that it "broke" things, there
have been no constructive suggestions on how to make this better.
The change affects only jobs started in the background; for foreground
processes the behaviour is not "broken" (it did not change).
If you can be sure that the job is indeed non-interactive, it can still be
started in the background, but apparently there is a large number of
scripts that resist any changes through e.g. perl or sed.
But if the job is interactive, starting it in the background will result in
a hanging salloc, as per my reply to Mark. There is no way of bringing such a
job into the foreground within salloc; the session is doomed to fail.
That case is indeed broken, and this is not limited to the question of
background or not. We had a user running gdb in this way; the lack of job
control in salloc led him to use skill/scancel to clean up the hung session.
> Would it be possible to make it configurable?
>
But how would you do that? As per my reply to Don, there is no a priori
way of telling whether a program is run in interactive mode or not. The
program name or command-line flags are not sufficient: a shell can also run
in non-interactive mode.
Gerrit
> 2011/2/15 <[email protected]>
>
> >
> > It looks as though I am being outvoted in this, but I would like to make a
> > few more points:
> >
> > 1. The reason I got involved in this was that a rather large Bull
> >    customer has acceptance test script jobs that submit thousands of
> >    "salloc" requests as background jobs. These scripts worked just fine,
> >    and then they were broken by a change that appeared in the final
> >    version of 2.2.0, with no explanation of why salloc is suddenly
> >    restricted to running in the foreground.
> > 2. I understand that there are job control issues with salloc when run
> >    in the background, and that the changes in signal handling that Gerrit
> >    made improve the situation when salloc is run in the foreground by
> >    retaining better control of the job from the terminal, but I disagree
> >    that this is sufficient justification to remove the ability to run
> >    salloc in the background, especially since this change can be
> >    trivially bypassed by using input redirection. All that has been
> >    accomplished is to break the scripts I mentioned above and whatever
> >    else depended on the current behavior of salloc, and to force users to
> >    add a "kludge" to obtain the behavior they had before. The customer
> >    scripts above are a legitimate "non-interactive" use of salloc, as are
> >    other examples such as starting an "xterm" on a SLURM allocation.
> > 3. I disagree that the proposed comments in the code provide sufficient
> >    explanation of this change. The new comments explain that salloc must
> >    be running in the foreground to issue the "tcsetpgrp" call and run
> >    "interactive" subprocesses, but they do not explain the rationale for
> >    disallowing salloc to run in the background when it is running only
> >    "non-interactive" subprocesses.
> > 4. The test case against my patch, submitting an interactive shell as a
> >    background job request, is spurious. As Gerrit said, "if starting an
> >    interactive session via salloc, why would a user want to start it in
> >    a stopped state"? The answer is: you wouldn't. If you wanted to run
> >    interactively, you wouldn't add that "&" at the end of your command.
> >    But if you knew that you wanted to run something that would run
> >    "non-interactively", such as an "mpirun" or an "xterm", why would you
> >    not want to be able to add that "&", and free up your terminal or
> >    script for other commands? As Mark noted previously, if users
> >    inadvertently try to run jobs that need to be interactive in the
> >    background, they should fairly quickly learn that it isn't a good
> >    idea, whether under salloc or just a normal shell.
> >
> >
> > Ok, I've had my say. I will rest my case now.
> >
> > -Don Albert-
> >
Index: configure
===================================================================
--- configure (revision 22477)
+++ configure (working copy)
@@ -1014,6 +1014,7 @@
enable_memory_leak_debug
enable_front_end
enable_partial_attach
+enable_salloc_background
with_slurmctld_port
with_slurmd_port
with_slurmdbd_port
@@ -1691,6 +1692,8 @@
--enable-front-end enable slurmd operation on a front-end
--disable-partial-attach
disable debugger partial task attach support
+ --enable-salloc-background
+ enable salloc to execute in the background
--enable-multiple-slurmd
enable multiple-slurmd support
@@ -7202,13 +7205,13 @@
else
lt_cv_nm_interface="BSD nm"
echo "int some_variable = 0;" > conftest.$ac_ext
- (eval echo "\"\$as_me:7205: $ac_compile\"" >&5)
+ (eval echo "\"\$as_me:7208: $ac_compile\"" >&5)
(eval "$ac_compile" 2>conftest.err)
cat conftest.err >&5
- (eval echo "\"\$as_me:7208: $NM \\\"conftest.$ac_objext\\\"\"" >&5)
+ (eval echo "\"\$as_me:7211: $NM \\\"conftest.$ac_objext\\\"\"" >&5)
(eval "$NM \"conftest.$ac_objext\"" 2>conftest.err > conftest.out)
cat conftest.err >&5
- (eval echo "\"\$as_me:7211: output\"" >&5)
+ (eval echo "\"\$as_me:7214: output\"" >&5)
cat conftest.out >&5
if $GREP 'External.*some_variable' conftest.out > /dev/null; then
lt_cv_nm_interface="MS dumpbin"
@@ -8413,7 +8416,7 @@
;;
*-*-irix6*)
# Find out which ABI we are using.
- echo '#line 8416 "configure"' > conftest.$ac_ext
+ echo '#line 8419 "configure"' > conftest.$ac_ext
if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5
(eval $ac_compile) 2>&5
ac_status=$?
@@ -10202,11 +10205,11 @@
-e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
-e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \
-e 's:$: $lt_compiler_flag:'`
- (eval echo "\"\$as_me:10205: $lt_compile\"" >&5)
+ (eval echo "\"\$as_me:10208: $lt_compile\"" >&5)
(eval "$lt_compile" 2>conftest.err)
ac_status=$?
cat conftest.err >&5
- echo "$as_me:10209: \$? = $ac_status" >&5
+ echo "$as_me:10212: \$? = $ac_status" >&5
if (exit $ac_status) && test -s "$ac_outfile"; then
# The compiler can only warn and ignore the option if not recognized
# So say no if there are warnings other than the usual output.
@@ -10541,11 +10544,11 @@
-e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
-e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \
-e 's:$: $lt_compiler_flag:'`
- (eval echo "\"\$as_me:10544: $lt_compile\"" >&5)
+ (eval echo "\"\$as_me:10547: $lt_compile\"" >&5)
(eval "$lt_compile" 2>conftest.err)
ac_status=$?
cat conftest.err >&5
- echo "$as_me:10548: \$? = $ac_status" >&5
+ echo "$as_me:10551: \$? = $ac_status" >&5
if (exit $ac_status) && test -s "$ac_outfile"; then
# The compiler can only warn and ignore the option if not recognized
# So say no if there are warnings other than the usual output.
@@ -10646,11 +10649,11 @@
-e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
-e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \
-e 's:$: $lt_compiler_flag:'`
- (eval echo "\"\$as_me:10649: $lt_compile\"" >&5)
+ (eval echo "\"\$as_me:10652: $lt_compile\"" >&5)
(eval "$lt_compile" 2>out/conftest.err)
ac_status=$?
cat out/conftest.err >&5
- echo "$as_me:10653: \$? = $ac_status" >&5
+ echo "$as_me:10656: \$? = $ac_status" >&5
if (exit $ac_status) && test -s out/conftest2.$ac_objext
then
# The compiler can only warn and ignore the option if not recognized
@@ -10701,11 +10704,11 @@
-e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
-e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \
-e 's:$: $lt_compiler_flag:'`
- (eval echo "\"\$as_me:10704: $lt_compile\"" >&5)
+ (eval echo "\"\$as_me:10707: $lt_compile\"" >&5)
(eval "$lt_compile" 2>out/conftest.err)
ac_status=$?
cat out/conftest.err >&5
- echo "$as_me:10708: \$? = $ac_status" >&5
+ echo "$as_me:10711: \$? = $ac_status" >&5
if (exit $ac_status) && test -s out/conftest2.$ac_objext
then
# The compiler can only warn and ignore the option if not recognized
@@ -13085,7 +13088,7 @@
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
-#line 13088 "configure"
+#line 13091 "configure"
#include "confdefs.h"
#if HAVE_DLFCN_H
@@ -13181,7 +13184,7 @@
lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
lt_status=$lt_dlunknown
cat > conftest.$ac_ext <<_LT_EOF
-#line 13184 "configure"
+#line 13187 "configure"
#include "confdefs.h"
#if HAVE_DLFCN_H
@@ -15137,11 +15140,11 @@
-e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
-e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \
-e 's:$: $lt_compiler_flag:'`
- (eval echo "\"\$as_me:15140: $lt_compile\"" >&5)
+ (eval echo "\"\$as_me:15143: $lt_compile\"" >&5)
(eval "$lt_compile" 2>conftest.err)
ac_status=$?
cat conftest.err >&5
- echo "$as_me:15144: \$? = $ac_status" >&5
+ echo "$as_me:15147: \$? = $ac_status" >&5
if (exit $ac_status) && test -s "$ac_outfile"; then
# The compiler can only warn and ignore the option if not recognized
# So say no if there are warnings other than the usual output.
@@ -15236,11 +15239,11 @@
-e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
-e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \
-e 's:$: $lt_compiler_flag:'`
- (eval echo "\"\$as_me:15239: $lt_compile\"" >&5)
+ (eval echo "\"\$as_me:15242: $lt_compile\"" >&5)
(eval "$lt_compile" 2>out/conftest.err)
ac_status=$?
cat out/conftest.err >&5
- echo "$as_me:15243: \$? = $ac_status" >&5
+ echo "$as_me:15246: \$? = $ac_status" >&5
if (exit $ac_status) && test -s out/conftest2.$ac_objext
then
# The compiler can only warn and ignore the option if not recognized
@@ -15288,11 +15291,11 @@
-e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \
-e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \
-e 's:$: $lt_compiler_flag:'`
- (eval echo "\"\$as_me:15291: $lt_compile\"" >&5)
+ (eval echo "\"\$as_me:15294: $lt_compile\"" >&5)
(eval "$lt_compile" 2>out/conftest.err)
ac_status=$?
cat out/conftest.err >&5
- echo "$as_me:15295: \$? = $ac_status" >&5
+ echo "$as_me:15298: \$? = $ac_status" >&5
if (exit $ac_status) && test -s out/conftest2.$ac_objext
then
# The compiler can only warn and ignore the option if not recognized
@@ -19428,7 +19431,30 @@
$as_echo "${x_ac_partial_attach=no}" >&6; }
+ { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to permit salloc to execute in the background" >&5
+$as_echo_n "checking whether to permit salloc to execute in the background... " >&6; }
+ # Check whether --enable-salloc-background was given.
+if test "${enable_salloc_background+set}" = set; then :
+ enableval=$enable_salloc_background; case "$enableval" in
+ yes) x_ac_salloc_background=yes ;;
+ no) x_ac_salloc_background=no ;;
+ *) { $as_echo "$as_me:${as_lineno-$LINENO}: result: doh!" >&5
+$as_echo "doh!" >&6; }
+ as_fn_error $? "bad value \"$enableval\" for --enable-salloc-background" "$LINENO" 5 ;;
+ esac
+
+fi
+
+ if test "$x_ac_salloc_background" = yes; then
+
+$as_echo "#define SALLOC_RUN_BACKGROUND 1" >>confdefs.h
+
+ fi
+ { $as_echo "$as_me:${as_lineno-$LINENO}: result: ${x_ac_salloc_background=no}" >&5
+$as_echo "${x_ac_salloc_background=no}" >&6; }
+
+
if test "x$ac_debug" = "xtrue"; then
DEBUG_MODULES_TRUE=
DEBUG_MODULES_FALSE='#'
Index: src/salloc/salloc.c
===================================================================
--- src/salloc/salloc.c (revision 22478)
+++ src/salloc/salloc.c (working copy)
@@ -218,13 +218,17 @@
if ((!opt.no_shell) && isatty(STDIN_FILENO)) {
bool sent_msg = false;
- is_interactive = true;
/*
* Job control: interactive sub-processes run in the foreground
* process group of the controlling terminal. In order to grant
* this (tcsetpgrp), salloc needs to be in the foreground first.
*/
- while (tcgetpgrp(STDIN_FILENO) != (pid = getpgrp())) {
+ pid = getpgrp();
+#ifdef SALLOC_RUN_BACKGROUND
+ if (tcgetpgrp(STDIN_FILENO) == pid)
+ is_interactive = true;
+#else
+ while (tcgetpgrp(STDIN_FILENO) != pid) {
if (!sent_msg) {
error("Waiting for program to be placed in "
"the foreground");
@@ -232,11 +236,14 @@
}
killpg(pid, SIGTTIN);
}
-
- /*
- * Save tty attributes and reset at exit, in case a child
- * process died before properly resetting terminal.
- */
+ is_interactive = true;
+#endif
+ }
+ /*
+ * Save tty attributes and reset at exit, in case a child
+ * process died before properly resetting terminal.
+ */
+ if (is_interactive) {
tcgetattr (STDIN_FILENO, &saved_tty_attributes);
atexit (_reset_input_mode);
}
Index: auxdir/x_ac_debug.m4
===================================================================
--- auxdir/x_ac_debug.m4 (revision 22477)
+++ auxdir/x_ac_debug.m4 (working copy)
@@ -91,5 +91,22 @@
fi
AC_MSG_RESULT([${x_ac_partial_attach=no}])
+
+ AC_MSG_CHECKING([whether to permit salloc to execute in the background])
+ AC_ARG_ENABLE(
+ [salloc-background],
+ AS_HELP_STRING(--enable-salloc-background,enable salloc to execute in the background),
+ [ case "$enableval" in
+ yes) x_ac_salloc_background=yes ;;
+ no) x_ac_salloc_background=no ;;
+ *) AC_MSG_RESULT([doh!])
+ AC_MSG_ERROR([bad value "$enableval" for --enable-salloc-background]) ;;
+ esac
+ ]
+ )
+ if test "$x_ac_salloc_background" = yes; then
+ AC_DEFINE(SALLOC_RUN_BACKGROUND, 1, [Define to 1 to permit salloc to run in the background.])
+ fi
+ AC_MSG_RESULT([${x_ac_salloc_background=no}])
]
)
Index: config.h.in
===================================================================
--- config.h.in (revision 22477)
+++ config.h.in (working copy)
@@ -389,6 +389,9 @@
/* Define the project's release. */
#undef RELEASE
+/* Define to 1 to permit salloc to run in the background. */
+#undef SALLOC_RUN_BACKGROUND
+
/* Define to 1 if sched_getaffinity takes three arguments. */
#undef SCHED_GETAFFINITY_THREE_ARGS
Index: NEWS
===================================================================
--- NEWS (revision 22478)
+++ NEWS (working copy)
@@ -39,6 +39,9 @@
-- BLUEGENE - Fix for bad conn-type set when running small blocks in HTC mode.
-- If salloc's --no-shell option is used, then do not attempt to preserve the
terminal's state.
+ -- Add new SLURM configure time parameter of --enable-salloc-background. If
+ set then salloc can execute in the background. Otherwise a message will be
+ printed and the job allocation halted until brought into the foreground.
* Changes in SLURM 2.2.1
========================