Hi,

David Bigagli <[email protected]> writes:

>    can we see what script is broken? You may want to review #991 to
> get the full picture.

We have a small shell script called "jobsh" that has been designed to
behave similarly to RSH/SSH. When run non-interactively it uses -u,
which has worked fine since at least Slurm 2.4.

These are obviously not real-world examples, but things like these work:

  rsync -e jobsh ...          - rsync over srun
  GIT_SSH=jobsh git clone ... - clone git repo over srun

In Slurm 14.11 this breaks, since -u now injects carriage return
characters into the output. Running without --unbuffered in 14.11 appears
to give the behavior that --unbuffered gave in previous versions. So for
this use case we could change our script to check the Slurm version and
remove -u if on 14.11. But that is the kind of change we should
not have to make when we rely on an option that has worked for 10+
years.
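For reference, the version check mentioned above could look roughly like the sketch below. It is only a sketch under stated assumptions: the function name srun_use_unbuffered is made up for illustration, it assumes `srun --version` prints a single line of the form "slurm 14.11.4" (as in the job output further down), and it treats only 14.11 as affected, since that is the only version tested here that shows the problem.

```bash
#!/bin/bash
# Hypothetical helper (not part of Slurm): succeeds (status 0) if it looks
# safe to pass -u/--unbuffered to srun. An optional argument overrides the
# version string; otherwise it is parsed from `srun --version`, assumed to
# print something like "slurm 14.11.4".
srun_use_unbuffered() {
    local version major minor
    version=${1:-$(srun --version 2>/dev/null | awk '{print $2}')}
    major=${version%%.*}    # "14.11.4" -> "14"
    minor=${version#*.}     # "14.11.4" -> "11.4"
    minor=${minor%%.*}      # "11.4"    -> "11"
    # Only 14.11 has been seen to inject CR characters with -u.
    ! { [ "$major" = 14 ] && [ "$minor" = 11 ]; }
}

# Possible use inside a jobsh-like wrapper (hypothetical):
#   if srun_use_unbuffered; then opts=(-u); else opts=(); fi
#   exec srun "${opts[@]}" "$@"
```

Using an array for the options keeps the empty case from turning into a stray empty argument to srun.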

Bug #991 appears to be about a mismatch between observed behavior and
documentation. The problem is that it was then assumed that the
documentation was wrong. The documentation actually described pretty
accurately how srun has "always" done buffering. I can't find any
mention anywhere of why the behavior was changed. Maybe the change
was made unintentionally? That would explain the lack of updated
documentation.

Also, is there really no way at all to get the old behavior with 14.11?
Line buffering actually makes a lot of sense as the default
behavior. There are sadly plenty of codes out there with rather poor
output routines. With srun doing line buffering, at least output from
multiple processes doesn't end up on the same line.
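To illustrate what line buffering buys, here is a minimal shell sketch that emulates it outside srun. It is not how srun implements buffering; line_buffer is a hypothetical helper that holds each writer's output until a newline arrives and then emits the complete line with a single printf, so in practice short lines from concurrent writers stay intact.

```bash
#!/bin/bash
# Hypothetical helper: copy stdin to stdout one complete line at a time.
# `read` waits for a full line, so partial output (e.g. from `echo -n`)
# is held back until its newline arrives.
line_buffer() {
    local line
    # The `|| [ -n "$line" ]` part also flushes a final line that lacks
    # a trailing newline (a newline is added to it).
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
    done
}
```

For example, `./srun-buffering-test | line_buffer & ./srun-buffering-test | line_buffer & wait` should print whole "host: 12345678910" style lines rather than digits from both processes mixed on one line.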

Here is a stupid example that shows how srun used to behave, compared to
now.

--- srun-buffering-test ----------------
#!/bin/bash
echo -n "$(hostname): "
i=1
while [ "$i" -lt 20 ]; do
    echo -n "$i"
    [ $((i % 10)) -eq 0 ] && echo -ne "\n$(hostname): "
    sleep 0.5
    ((i++))
done
echo ''
----------------------------------------

--- srun-buffering-test.job ------------
#!/bin/bash
#SBATCH -N2 -n2 -t 15
#SBATCH -o srun-buffering-test.out
srun --version
set -x
srun ./srun-buffering-test
srun -u ./srun-buffering-test
----------------------------------------

Running this job on previous versions (tested on 2.4, 2.6, 14.03)
results in output like this:

--- srun-buffering-test.out ------------
slurm 2.6.4
+ srun ./srun-buffering-test
a3: 12345678910
a4: 12345678910
a3: 111213141516171819
a4: 111213141516171819
+ srun -u ./srun-buffering-test
a3: a4: 11223344556677889910
a4: 10
a3: 111112121313141415151616171718181919

----------------------------------------

With 14.11, both srun and srun -u give the scrambled output, the difference
being that with -u line breaks are also changed into CR+LF (which most
e-mail clients probably won't show):

--- srun-buffering-test.out ------------
slurm 14.11.4
+ srun ./srun-buffering-test
n549: 1n550: 1223344556677889910
n549: 10
n550: 111112121313141415151616171718181919

+ srun -u ./srun-buffering-test
n549: 1n550: 1223344556677889910
n549: 10
n550: 111112121313141415151616171718181919

----------------------------------------
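Since the CR characters are easy to miss, a quick way to check an output file for them is sketched below. count_cr_lines is a hypothetical helper; it relies on bash's $'\r' quoting and grep -c. POSIX `sed -n l srun-buffering-test.out` can also be used to print each line with any \r made visible.

```bash
#!/bin/bash
# Hypothetical helper: print the number of lines in file "$1" that contain
# a carriage return (\r). grep -c prints 0 but exits 1 when nothing
# matches, so `|| true` keeps the exit status at 0 in that case.
count_cr_lines() {
    grep -c $'\r' "$1" || true
}
```

Running `count_cr_lines srun-buffering-test.out` on the 14.11 output above should report a non-zero count, while output from the older versions should report 0.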


I would still prefer that this change be reverted completely.

At the very least you need to restore the previous -u,--unbuffered
behavior. If the current default behavior is kept, then -u,--unbuffered
should simply do nothing. A --line-buffered option to get the old
behavior would also be nice in that case.

Regards,
Pär Lindfors, NSC
