Thanks for your explanations! My notes are below...
Date: Tue, 17 Mar 2015 17:19:24 -0500
From: terrence.j.doyle+...@gmail.com
There are two points, here.
First, you're using an old version of ksh.
Yes, I'm aware and noted that myself. I'd be happy if newer ksh's
behaviour would not only be different than older ksh's, but also...
(see below)
Second, ksh has a different spin from bash on reading fixed-length
data. Read sees variables in two flavors, text and binary.
In this case I'm primarily interested in text.
When read
puts characters into a textual variable newlines are always changed to a
null character which serves as a string terminator. Also, when ksh reads
fixed-length data, characters continue to fill the buffer after a
newline is read. Thus, any characters read into your line variable
after a newline will effectively disappear.
Disappearing characters when processing ordinary text are what
annoys and surprises me, also conceptually.
The man page says:
The -n option causes at most n bytes to read rather a full line
but will return when reading from a slow device as soon as any
characters have been read.
No mention of discarding any characters. (And why would that
make more sense than reading up to the desired amount and
keeping the rest for another read?)
(And I didn't specify -N, which reads exactly the given number of
characters; I called it with -n, which reads at most that many.)
When I run your test with ksh-20140929, I get these results:
'ABCDEFGHIJ123456'
'7890'
'234567890'
(Frankly, this shocks me even more than the output I got from that
older ksh version I used.)
The first 16 characters are put into line as expected and printf %s
displays them. In the next loop iteration read puts another 16
characters into line but changes the newline to a null character. Printf
%s sees the null character as end-of-string and prints only 7890.
That's fine and as expected.
Characters 6 through 16 (abcdefghij1) in line can't be seen using printf
%s.
Here I see the problem. According to read's option -n, it should not
read more than the 6 characters (i.e. it should not insert a NUL and
then another string of characters). Rather, it should read just 6
characters and abort the read, skipping the NL and terminating the
string internally with NUL, so that the rest is available for the next read.
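
For illustration, the semantics described above (at most 16 characters per read, stop at the newline, keep the rest for the next read) can be sketched in portable shell without relying on read -n at all. This is only a sketch under assumptions: chunk16 is a hypothetical helper name, the width of 16 is hard-coded as sixteen "?" in the pattern, and it works on whole lines rather than on a stream of single reads.

```shell
# chunk16 (hypothetical helper): print each input line in pieces of at most
# 16 characters, never crossing a newline and never dropping characters.
chunk16() {
    while IFS= read -r rest || [ -n "$rest" ]
    do
        while :
        do
            # Strip the first 16 characters. If fewer than 16 remain, the
            # pattern does not match and $rest comes back unchanged.
            remainder=${rest#????????????????}
            if [ "$remainder" = "$rest" ]; then
                printf "'%s'\n" "$rest"     # last (short or empty) piece
                break
            fi
            head=${rest%"$remainder"}       # the 16 characters just stripped
            printf "'%s'\n" "$head"
            rest=$remainder
            [ -n "$rest" ] || break         # line was an exact multiple of 16
        done
    done
}

chunk16 <<EOT
ABCDEFGHIJ1234567890
abcdefghij1234567890
EOT
```

With the two test lines this prints 'ABCDEFGHIJ123456', '7890', 'abcdefghij123456' and '7890', one per line, so no characters go missing.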
In the final iteration the remaining characters are put in line and
printf %s displays them.
If you don't want ksh to change newlines to null characters in
fixed-length data, you need to use binary variables.
No, what I would want is that no more characters are read than
asked for.
All the features related to binary processing are not what I wanted;
I just want to do text processing (non-binary, printable characters).
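
For plain text like this, one possible workaround that sidesteps read -n entirely is to let the standard fold utility do the splitting and read whole lines afterwards. This is a sketch, not a statement about how ksh should behave: chunk_with_fold is a hypothetical name, and fold -w counts columns, which matches characters only for simple single-width text such as the ASCII test data here.

```shell
# chunk_with_fold (hypothetical helper): split every input line into pieces
# of at most 16 columns with fold, then read the pieces as ordinary lines.
chunk_with_fold() {
    fold -w 16 | while IFS= read -r line
    do
        printf "'%s'\n" "$line"
    done
}

chunk_with_fold <<EOT
ABCDEFGHIJ1234567890
abcdefghij1234567890
EOT
```

The design choice is simply to move the length limit out of read: every read then consumes one complete, already shortened line, so nothing can be left behind in a buffer.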
Binary variables
are declared with typeset -b. To print binary variables you should use
printf %B. As with textual variables read will continue to fill the
buffer after a newline is read. I modified your example for binary
variables as follows:
typeset -b line
while IFS= read -n16 -r line
do
    printf -- "'%B'\n" line
done <<EOT
ABCDEFGHIJ1234567890
abcdefghij1234567890
EOT
This produced the following output:
'ABCDEFGHIJ123456'
'7890
abcdefghij1'
'234567890
'
(A change to binary processing is not what I wanted to achieve, so the
results of your changes naturally don't reflect the expected outcome.)
It might be more appropriate to rename the variable line to buffer to
go along with ksh's behavior.
It's still different from bash, but I'm content letting ksh be ksh and
bash be bash.
To be clear; it's *not* about ksh imitating bash. It's about a behaviour
of ksh's read that's (IMO) extremely surprising (and [IMO] not helpful).
I don't know if the new bash compatibility mode makes
reading fixed-length data closer to bash's style.
I'm not a bash user (usually), I very much prefer ksh. But methinks the
visible behaviour of read -n is not intuitive, specifically when it comes
to silently discarding characters.
The explanations here left me with a feeling that read -n behaves the
way it does only because of how it happens to be implemented. Not really
satisfying.
But thanks again. More insights about why the given read -n behaviour
is more sensible than bash's (in this specific read -n case) are welcome.
Terrence Doyle
On 3/15/15 9:10 AM, Janis Papanagnou wrote:
I observe a problem (see testcase below) with ksh's read -n.
(Version 93t 2008-11-04 on Cygwin).
Bash's behaviour would be what I expect.
Ksh doesn't read the second line and doesn't terminate output.
(Maybe an old bug fixed in newer versions? Or am I missing something?)
--snip--
$ cat readtest
while IFS= read -n16 -r line
do
    printf "'%s'\n" "$line"
done <<EOT
ABCDEFGHIJ1234567890
abcdefghij1234567890
EOT
$ bash readtest | head
'ABCDEFGHIJ123456'