Re: [ast-users] Problem with ksh read -n

2015-03-17 Thread Janis Papanagnou



Thanks for your explanations! - My notes below...

 Date: Tue, 17 Mar 2015 17:19:24 -0500
 From: terrence.j.doyle+...@gmail.com
 
   There are two points, here.
 
   First, you're using an old version of ksh.

Yes, I'm aware and noted that myself. I'd be happy if newer ksh's
behaviour would not only be different than older ksh's, but also...
(see below)

 
   Second, ksh has a different spin from bash on reading fixed-length
 data. Read sees variables in two flavors, text and binary.

I'm in this case primarily interested in text.

 When read
 puts characters into a textual variable newlines are always changed to a
 null character which serves as a string terminator. Also, when ksh reads
 fixed-length data, characters continue to fill the buffer after a
 newline is read. Thus, any characters read into your line variable
 after a newline will effectively disappear.

Disappearing characters when processing ordinary text are what
annoy and surprise me, also conceptually.

The man page says:
"The -n option causes at most n bytes to read rather a full line
but will return when reading from a slow device as soon as any
characters have been read."

No mention of discarding any characters. (And why would that
make more sense than reading up to the desired amount and
keeping the rest for another read?)

(And I didn't specify -N, to read exactly an amount of characters,
I called it with -n, to read at most the given amount of characters.)

 
   When I run your test with ksh-20140929, I get these results:
 
 'ABCDEFGHIJ123456'
 '7890'
 '234567890'

(Frankly, this shocks me even more than the output I got from that
older ksh version I used.)

 
 The first 16 characters are put into line as expected and printf %s
 displays them. In the next loop iteration read puts another 16
 characters into line but changes the newline to a null character. Printf
 %s sees the null character as end-of-string and prints only 7890.

That's fine and as expected.

 Characters 6 through 16 (abcdefghij1) in line can't be seen using printf
 %s.

Here I see the problem. According to read's option -n, it should not
read more than the 6 characters (i.e. insert a NUL, and then another
string of characters). Rather, it should read just 6 characters and
abort the read, skipping the NL and terminating the string internally
with NUL, so that the rest is available for the next read.

 In the final iteration the remaining characters are put in line and
 printf %s displays them.
 
   If you don't want ksh to change newlines to null characters in
 fixed-length data, you need to use binary variables.

No, what I would want is that no more characters are read than
asked for.

All the features related to binary processing are not what I wanted;
I just want to do (non-binary, printable characters) text processing.
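
For plain text, the "read at most n characters, keep the rest for the next read" behaviour can be approximated without read -n at all, by reading whole lines and cutting them up afterwards. What follows is only a minimal sketch under that assumption. The function name chunk_lines is hypothetical, and the sketch sticks to POSIX constructs (printf precision and prefix removal) so that it should behave the same in ksh and bash; in ksh93 and bash the substring expansion ${line:0:n} could be used instead.

```shell
# Hypothetical helper: print every input line in quoted pieces of at most
# $1 characters. Nothing is discarded; a short remainder becomes its own piece.
# Note: uses the global variables n, line and head.
chunk_lines() {
    n=$1
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]
    do
        while [ "${#line}" -gt "$n" ]
        do
            head=$(printf "%.${n}s" "$line")   # first n characters
            printf "'%s'\n" "$head"
            line=${line#"$head"}               # drop them, keep the rest
        done
        printf "'%s'\n" "$line"
    done
}

chunk_lines 16 <<EOT
ABCDEFGHIJ1234567890
abcdefghij1234567890
EOT
```

For the two test lines this prints 'ABCDEFGHIJ123456', '7890', 'abcdefghij123456' and '7890', one per line. It forks a subshell for each piece, so it is meant as an illustration rather than something tuned for large inputs.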

 Binary variables
 are declared with typeset -b. To print binary variables you should use
 printf %B. As with textual variables read will continue to fill the
 buffer after a newline is read. I modified your example for binary
 variables as follows:
 
 typeset -b line
 while IFS= read -n16 -r line
 do
     printf -- "'%B'\n" line
 done <<EOT
 ABCDEFGHIJ1234567890
 abcdefghij1234567890
 EOT
 
 This produced the following output:
 
 'ABCDEFGHIJ123456'
 '7890
 abcdefghij1'
 '234567890
 '

(A change to binary processing is not what I wanted to achieve, so the
results of your changes naturally don't reflect the expected outcome.)

 
 It might be more appropriate to change line to buffer to go along
 with ksh's behavior.
 
   It's still different from bash, but I'm content letting ksh be ksh and
 bash be bash.

To be clear; it's *not* about ksh imitating bash. It's about a behaviour
of ksh's read that's (IMO) extremely surprising (and [IMO] not helpful).

 I don't know if the new bash compatibility mode makes
 reading fixed-length data closer to bash's style.

I'm not a bash user (usually), I very much prefer ksh. But methinks the
visible behaviour of read -n is not intuitive, specifically when it comes
to silently discarding characters.

The explanations here left me with a feeling that read -n behaves the
way it does because of the way it's technically implemented. Not really
satisfying.

But thanks again. - More insights about why the given read -n behaviour
is more sensible than bash's (in this specific read -n case) are welcome.

 
   Terrence Doyle
 
 On 3/15/15 9:10 AM, Janis Papanagnou wrote:
  I observe a problem (see testcase below) with ksh's read -n.
  (Version 93t 2008-11-04 on Cygwin).
  Bash's behaviour would be what I expect.
  Ksh doesn't read the second line and doesn't terminate output.
  (Maybe an old bug fixed in newer versions? Or am I missing something?)
  
  --snip--
  
  $ cat readtest
  while IFS= read -n16 -r line
  do
    printf "'%s'\n" "$line"
  done <<EOT
  ABCDEFGHIJ1234567890
  abcdefghij1234567890
  EOT
  
  $ bash readtest | head
  'ABCDEFGHIJ123456'

Re: [ast-users] Problem with ksh read -n

2015-03-17 Thread Terrence J. Doyle

There are two points, here.

First, you're using an old version of ksh.

Second, ksh has a different spin from bash on reading fixed-length
data. Read sees variables in two flavors, text and binary. When read
puts characters into a textual variable newlines are always changed to a
null character which serves as a string terminator. Also, when ksh reads
fixed-length data, characters continue to fill the buffer after a
newline is read. Thus, any characters read into your line variable
after a newline will effectively disappear.

When I run your test with ksh-20140929, I get these results:

'ABCDEFGHIJ123456'
'7890'
'234567890'

The first 16 characters are put into line as expected and printf %s
displays them. In the next loop iteration read puts another 16
characters into line but changes the newline to a null character. Printf
%s sees the null character as end-of-string and prints only 7890.
Characters 6 through 16 (abcdefghij1) in line can't be seen using printf
%s. In the final iteration the remaining characters are put in line and
printf %s displays them.
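
To make the layout of that second 16-byte buffer concrete, the sketch below builds the same bytes by hand and dumps them. It only imitates the buffer described above and does not call ksh's read; the function name second_buffer is hypothetical.

```shell
# Build the 16 bytes by hand: "7890", a NUL where the newline was,
# then "abcdefghij1". This imitates the buffer; it is not produced by read.
second_buffer() {
    printf '7890\000abcdefghij1'
}

# Dump every byte, including the NUL after "7890".
second_buffer | od -c

# Total size of the buffer in bytes.
second_buffer | wc -c

# What a consumer that stops at the first NUL would see.
second_buffer | tr '\000' '\n' | head -n 1
```

The od dump shows the NUL sitting in position 5, which is why a NUL-terminated view of the buffer ends after 7890 even though the remaining 11 characters are still there.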

If you don't want ksh to change newlines to null characters in
fixed-length data, you need to use binary variables. Binary variables
are declared with typeset -b. To print binary variables you should use
printf %B. As with textual variables read will continue to fill the
buffer after a newline is read. I modified your example for binary
variables as follows:

typeset -b line
while IFS= read -n16 -r line
do
    printf -- "'%B'\n" line
done <<EOT
ABCDEFGHIJ1234567890
abcdefghij1234567890
EOT

This produced the following output:

'ABCDEFGHIJ123456'
'7890
abcdefghij1'
'234567890
'

It might be more appropriate to change line to buffer to go along
with ksh's behavior.

It's still different from bash, but I'm content letting ksh be ksh and
bash be bash. I don't know if the new bash compatibility mode makes
reading fixed-length data closer to bash's style.

Terrence Doyle

On 3/15/15 9:10 AM, Janis Papanagnou wrote:
 I observe a problem (see testcase below) with ksh's read -n.
 (Version 93t 2008-11-04 on Cygwin).
 Bash's behaviour would be what I expect.
 Ksh doesn't read the second line and doesn't terminate output.
 (Maybe an old bug fixed in newer versions? Or am I missing something?)
 
 --snip--
 
 $ cat readtest
 while IFS= read -n16 -r line
 do
   printf "'%s'\n" "$line"
 done <<EOT
 ABCDEFGHIJ1234567890
 abcdefghij1234567890
 EOT
 
 $ bash readtest | head
 'ABCDEFGHIJ123456'
 '7890'
 'abcdefghij123456'
 '7890'
 
 $ ksh readtest | head
 'ABCDEFGHIJ123456'
 '7890'
 ''
 ''
 ''
 ''
 ''
 ''
 ''
 ''
 
 $ ksh --version
   version sh (ATT Research) 93t 2008-11-04
 
 --snip--
 
 
 
 ___
 ast-users mailing list
 ast-users@lists.research.att.com
 http://lists.research.att.com/mailman/listinfo/ast-users