Re: New Line vs. Line Feed

2015-06-04 Thread Bill Godfrey
On Thu, 4 Jun 2015 19:19:56 -0500, Bill Godfrey wrote:

On Thu, 4 Jun 2015 11:05:15 -0400, Shmuel Metz (Seymour J.) wrote:

In 4767436570688083.wa.bgodfrey.gzgmail@listserv.ua.edu, on
06/01/2015
   at 10:18 PM, Bill Godfrey said:

The grep and awk commands don't match \n to end-of-line on omvs,
or on linux for that matter.

Don't they match \n to LF on most Eunix and *ix systems?

In awk there are regex patterns for the input data and there are regex 
patterns for strings. The regex patterns for the input data are like patterns 
in grep, in that they do not match \n with anything, but they do match $ with 
end-of-line.

Do '/test$/' and '/test\n/' have the same semantics in awk? In grep?

'/test\n/' doesn't match anything in grep or in awk's pattern for input data.

'/test$/' matches test at end-of-line in grep or in awk's pattern for input 
data.
Correcting myself. grep doesn't use slashes. awk's pattern for input data uses 
slashes.

In awk's pattern for strings, test\n (without slashes) matches test\n 
anywhere within a string, which could have multiple \n characters, whereas 
test$ matches test at the end of the string if the string has no \n at the 
end. You would need test\n$ to match test\n at the end of a string.


--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: New Line vs. Line Feed

2015-06-04 Thread Bill Godfrey
On Thu, 4 Jun 2015 11:05:15 -0400, Shmuel Metz (Seymour J.) wrote:

In 4767436570688083.wa.bgodfrey.gzgmail@listserv.ua.edu, on
06/01/2015
   at 10:18 PM, Bill Godfrey said:

The grep and awk commands don't match \n to end-of-line on omvs,
or on linux for that matter.

Don't they match \n to LF on most Eunix and *ix systems?

In awk there are regex patterns for the input data and there are regex patterns 
for strings. The regex patterns for the input data are like patterns in grep, 
in that they do not match \n with anything, but they do match $ with 
end-of-line.

Do '/test$/' and '/test\n/' have the same semantics in awk? In grep?

'/test\n/' doesn't match anything in grep or in awk's pattern for input data.

'/test$/' matches test at end-of-line in grep or in awk's pattern for input 
data.

In awk's pattern for strings, test\n (without slashes) matches test\n 
anywhere within a string, which could have multiple \n characters, whereas 
test$ matches test at the end of the string if the string has no \n at the 
end. You would need test\n$ to match test\n at the end of a string.
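To make the three string-pattern cases concrete, here is a minimal Python sketch. It is not awk itself. One assumption needs flagging: Python's `$` also matches just before a trailing newline, and awk's string matching does not, so the sketch spells awk's `$` as Python's strict end-of-string anchor `\Z`. The table and function names are illustrative only.

```python
import re

# Python stand-ins for the awk string patterns discussed above.
# Python's "$" also matches just before a trailing newline, which awk's
# string matching does not do, so "\Z" (strict end of string) is used
# wherever the awk pattern has "$".
AWK_TO_PYTHON = {
    r"test\n": r"test\n",     # "test" plus newline, anywhere in the string
    r"test$": r"test\Z",      # "test" only at the very end of the string
    r"test\n$": r"test\n\Z",  # "test" plus newline only at the very end
}


def string_matches(awk_pattern: str, text: str) -> bool:
    """Report whether the Python stand-in for awk_pattern matches text."""
    return re.search(AWK_TO_PYTHON[awk_pattern], text) is not None


for text in ("a\ntest\ntesting\n", "a\ntest", "a\ntest\n"):
    results = {p: string_matches(p, text) for p in AWK_TO_PYTHON}
    print(repr(text), results)
```

Against "a\ntest\ntesting\n", only `test\n` matches. Against "a\ntest\n", `test\n` and `test\n$` match, but `test$` does not.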

Bill



Re: New Line vs. Line Feed

2015-06-04 Thread Shmuel Metz (Seymour J.)
In 4767436570688083.wa.bgodfrey.gzgmail@listserv.ua.edu, on
06/01/2015
   at 10:18 PM, Bill Godfrey bgodfrey...@gmail.com said:

The grep and awk commands don't match \n to end-of-line on omvs,
or on linux for that matter.

Don't they match \n to LF on most Eunix and *ix systems?

Do '/test$/' and '/test\n/' have the same semantics in awk? In grep?
 
-- 
 Shmuel (Seymour J.) Metz, SysProg and JOAT
 ISO position; see http://patriot.net/~shmuel/resume/brief.html 
We don't care. We don't have to care, we're Congress.
(S877: The Shut up and Eat Your spam act of 2003)



Re: New Line vs. Line Feed

2015-06-02 Thread Paul Gilmartin
On Mon, 1 Jun 2015 22:18:20 -0500, Bill Godfrey wrote:

The grep and awk commands don't match \n to end-of-line on omvs, or on 
linux for that matter.

awk certainly does.  To wit:
user@OS/390.24.00: cat awknl   
#! /bin/sh -x

awk 'BEGIN {
String = "First line\nSecond line.\n"

# Show that \n is a line end.
printf( "%s", String )

# show that \n matches line end.
print( match( String, "\n" ) )
}'
user@OS/390.24.00: sh awknl
First line
Second line.
11
user@OS/390.24.00: 

-- gil



Re: New Line vs. Line Feed

2015-06-02 Thread Bill Godfrey
On Tue, 2 Jun 2015 03:17:35 -0500, Paul Gilmartin wrote:

On Mon, 1 Jun 2015 22:18:20 -0500, Bill Godfrey wrote:

The grep and awk commands don't match \n to end-of-line on omvs, or on 
linux for that matter.

awk certainly does.  To wit:
user@OS/390.24.00: cat awknl   
#! /bin/sh -x

awk 'BEGIN {
String = "First line\nSecond line.\n"

# Show that \n is a line end.
printf( "%s", String )

# show that \n matches line end.
print( match( String, "\n" ) )
}'
user@OS/390.24.00: sh awknl
First line
Second line.
11
user@OS/390.24.00: 

I was only referring to \n in the pattern used in awk's general
pattern {action} syntax, where the pattern is matched against text being
read. I should have qualified my statement.

It's important to note that in your awk example and my Perl example
the \n is not being treated as an anchor in the regex pattern, like $ would be.

You could change \n to \n$ in the last print statement, and the result 
would be 24 instead of 11.
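The 11 and 24 can be checked with a Python re-run of the same string. This assumes Python's `re` as a stand-in for awk's regex engine: `awk_match` is a hypothetical helper that returns a 1-based position (0 for no match) the way awk's `match()` does, and `\Z` stands in for `$` as the end-of-string anchor.

```python
import re

# The string from the awk example above.
STRING = "First line\nSecond line.\n"


def awk_match(text: str, pattern: str) -> int:
    """Mimic awk's match(): 1-based start of the first match, or 0."""
    m = re.search(pattern, text)
    return m.start() + 1 if m else 0


print(awk_match(STRING, r"\n"))    # 11: the newline after "First line"
print(awk_match(STRING, r"\n\Z"))  # 24: the newline that ends the string
```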

I'm sure you know all of this already. I'm just mentioning it for anyone who 
might not.

Bill



Re: New Line vs. Line Feed

2015-06-02 Thread Paul Gilmartin
On Tue, 2 Jun 2015 05:48:31 -0500, Bill Godfrey wrote:

I was only referring to \n in the pattern used in awk's general
pattern {action} syntax, where the pattern is matched against text being
read. I should have qualified my statement.
 
This is a characteristic not of awk's pattern matching but of awk's input
processing.  Awk discards the delimiting \n like gets() rather than
retaining it like fgets().
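The gets()/fgets() distinction can be sketched in Python. This uses a hypothetical `grep` helper over an in-memory stream, not the real grep or awk. When the delimiter is kept, as Perl's `-n` loop keeps it, `test\n` can match. When it is dropped, as awk does, only `$` can match. Python's `$`, like Perl's, also matches just before a final newline, which is why both patterns hit when the delimiter is kept.

```python
import io
import re

DATA = "a\ntest\ntesting\n"


def read_records(stream, keep_delimiter: bool):
    """Yield lines like fgets() (delimiter kept) or gets() (delimiter dropped)."""
    for line in stream:  # iterating a text stream yields lines ending in "\n"
        if not keep_delimiter and line.endswith("\n"):
            line = line[:-1]
        yield line


def grep(pattern: str, keep_delimiter: bool) -> list:
    """Return the records of DATA that contain a match for pattern."""
    stream = io.StringIO(DATA)
    return [r for r in read_records(stream, keep_delimiter) if re.search(pattern, r)]


for keep in (True, False):
    print(keep, grep(r"test\n", keep), grep(r"test$", keep))
```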

-- gil



Re: New Line vs. Line Feed

2015-06-01 Thread Shmuel Metz (Seymour J.)
In 5568a3c8.5030...@gmail.com, on 05/30/2015
   at 01:37 AM, David Crayford dcrayf...@gmail.com said:

It implicitly converted strings to ASCII

That's good if you want them converted; not so good if you don't.

  She died of a favor
  from which none could save her.
  and that was the end of Sweet Mollie Malone.
 


Re: New Line vs. Line Feed

2015-06-01 Thread Shmuel Metz (Seymour J.)
In 0379604180364016.wa.bgodfrey.gzgmail@listserv.ua.edu, on
05/29/2015
   at 10:30 AM, Bill Godfrey bgodfrey...@gmail.com said:

I get identical results whether I use \n or $ in the OP's example. In
OMVS. I'm not addressing your question but rather the OP's example.

Which OMVS facilities match \n to end of line (record) and which to
LF? What do grep et al. do about matching \n against legacy PS data
sets, where there is a logical end of record?
 


Re: New Line vs. Line Feed

2015-06-01 Thread Bill Godfrey
On Mon, 1 Jun 2015 17:11:54 -0400, Shmuel Metz (Seymour J.) 
shmuel+ibm-m...@patriot.net wrote:

In 0379604180364016.wa.bgodfrey.gzgmail@listserv.ua.edu, on
05/29/2015
   at 10:30 AM, Bill Godfrey bgodfrey...@gmail.com said:

I get identical results whether I use \n or $ in the OP's example. In
OMVS. I'm not addressing your question but rather the OP's example.

Which OMVS facilities match \n to end of line (record) and which to
LF? What do grep et al. do about matching \n against legacy PS data
sets, where there is a logical end of record?

When commands like cat and cp read legacy PS data sets as text, the results 
reflect this description of reading text files in the C/C++ Programming Guide:

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/CBCPG1C0/2.9.4.2

Where it says:

For files opened in fixed text format, rightmost blanks are stripped off a 
record at input, and a new-line character is placed in the logical record.

That new-line character is hex 15.

So a 3-line data set of 80-byte records that look like this:

a
test
testing

will look like this in hex after being read by cat

cat //test.cntl | od -tx1 -An
81  15  A3  85  A2  A3  15  A3  85  A2  A3  89  95  87  15

which is the same result as this command:

printf '%b' 'a\ntest\ntesting\n' | od -tx1 -An
81  15  A3  85  A2  A3  15  A3  85  A2  A3  89  95  87  15
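The stripping-and-NL behavior quoted from the C/C++ Programming Guide can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the C runtime. The records are built in memory, `fixed_records_to_text` is a made-up name, and Python's `cp037` codec is used only to produce the lowercase letters and the blank. Those byte values are the same in IBM-037 and IBM-1047. `bytes.hex(" ")` needs Python 3.8 or later.

```python
EBCDIC_BLANK = 0x40
EBCDIC_NL = 0x15  # New Line, appended per record on a text-mode read


def fixed_records_to_text(records: list) -> bytes:
    """Sketch of a text-mode read of fixed-length records: strip the
    rightmost blanks from each record, then append one NL (0x15)."""
    out = bytearray()
    for rec in records:
        out += rec.rstrip(bytes([EBCDIC_BLANK]))
        out.append(EBCDIC_NL)
    return bytes(out)


LRECL = 80
lines = ["a", "test", "testing"]
# Pad each line to 80 bytes, as in an FB data set, then encode to EBCDIC.
records = [line.ljust(LRECL).encode("cp037") for line in lines]
print(fixed_records_to_text(records).hex(" ").upper())
# 81 15 A3 85 A2 A3 15 A3 85 A2 A3 89 95 87 15
```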

The only facility with regular expressions that I have found that matches \n to 
end-of-line is in Perl. For example:

printf '%b' 'a\ntest\ntesting\n' | perl -ne 'print if /test\n/'
test

printf '%b' 'a\ntest\ntesting\n' | perl -ne 'print if /test$/'
test

The grep and awk commands don't match \n to end-of-line on omvs, or on 
linux for that matter.

The grep command can't read a legacy PS data set directly, but awk can.

awk '/test$/' //test.cntl
test

Bill



Re: New Line vs. Line Feed

2015-05-31 Thread Shmuel Metz (Seymour J.)
In 5567e81c.4040...@vse2pdf.com, on 05/29/2015
   at 12:16 AM, Tony Thigpen t...@vse2pdf.com said:

1960's AT&T pushes for a replacement of ITA2 which the ATA published
as  ASCII in 1963.

I might believe ASA, through several iterations. I hate overloaded
code points!
 


Re: New Line vs. Line Feed

2015-05-31 Thread Shmuel Metz (Seymour J.)
In 838271083.1045039.1432870193420.javamail.ya...@mail.yahoo.com, on
05/29/2015
   at 03:29 AM, Ze'ev Atlas
004b34e7c98a-dmarc-requ...@listserv.ua.edu said:

Does anybody know why do we need two characters that seem to do the
same thing

No, especially since they *don't* do the same thing. A better question
would be why Eunix hijacked the Line Feed instead of using CRLF.
 


Re: New Line vs. Line Feed

2015-05-31 Thread Shmuel Metz (Seymour J.)
In 8215077812992901.wa.paulgboulderaim@listserv.ua.edu, on
05/29/2015
   at 12:27 AM, Paul Gilmartin
000433f07816-dmarc-requ...@listserv.ua.edu said:

Using a device-specific hardware command to separate records in a
general file makes as little sense as Assembler H's use of machine
carriage control.

Or as Eunix using LF as a record separator.

A device-neutral convention might have been
Record Separator, ASCII 0x1e.

Please inform Ken Thompson.

IBM clearly violates a standard.

That's not at all clear. What do POSIX et al. formally say about the
use of LF-broken ASCII?


Re: New Line vs. Line Feed

2015-05-30 Thread Ze'ev Atlas
Gil is correct: \n is implementation-dependent.
Actually, PCRE handles it correctly, except that I got confused and chose an
incorrect option in my config.h. Once I corrected it, the tests ran smoothly and
produced correct results.

Thanks, all, for the explanations and advice
ZA



Re: New Line vs. Line Feed

2015-05-29 Thread David Crayford

On 29/05/2015 1:43 PM, Anne & Lynn Wheeler wrote:

EBCDIC and the P-Bit, The Biggest Computer Goof Ever
http://www.bobbemer.com/P-BIT.HTM

The culprit was T. Vincent Learson. The only thing for his defense is
that he had no idea of what he had done. It was when he was an IBM Vice
President, prior to tenure as Chairman of the Board, those lofty
positions where you believe that, if you order it done, it actually will
be done. I've mentioned this fiasco elsewhere.


And how much has that dumb decision cost mainframe customers over the 
years? Fiasco is the right word.




Re: New Line vs. Line Feed

2015-05-29 Thread Ze'ev Atlas
Thank you all for the comprehensive explanations

ZA



Re: New Line vs. Line Feed

2015-05-29 Thread Ze'ev Atlas
Your messages clarified my issue and actually assured me that the solution I'd 
suggested is correct, so I would like to brief you.

It is apparent that IBM chose to mark the end of line with NL rather than with
LF or CRLF. That in itself is probably a correct decision, and probably what
the standard should have been to begin with. The problem is that in the C
language convention, the escape sequence \n has a subtle double meaning: it
means LF, but it also carries the semantics of NL.

When we do 
printf("some text \n");
it will work correctly on all platforms and nobody would ever notice any
problem. It will produce on EBCDIC
some text NL
and on ASCII platforms
some text LF
or
some text CRLF

But when we issue a pattern matching (I'll use Perl syntax for brevity) 
if ($text =~ /some text \n/)
the \n is translated by convention to LF and the EBCDIC based pattern matching 
will fail to match!

So the solution should be to somehow (optionally) dictate to the package that
\n is NL and not LF. I've requested that such an option be implemented so that
I can use it.
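This failure mode can be reproduced with Python's bytes regexes, assuming the record arrives the way a z/OS text-mode read would deliver it (ending in NL, 0x15). `pattern_for` is a hypothetical helper that builds the equivalent of `/some text \n/` with `\n` bound to a chosen byte. It does not model PCRE's actual build options.

```python
import re

EBCDIC_NL = b"\x15"  # New Line: what z/OS C text I/O puts at end of record
EBCDIC_LF = b"\x25"  # Line Feed

PREFIX = "some text ".encode("cp037")  # EBCDIC bytes for "some text "
record = PREFIX + EBCDIC_NL            # one record as read in text mode


def pattern_for(newline: bytes):
    """Compile the equivalent of /some text \\n/ with \\n bound to newline."""
    return re.compile(re.escape(PREFIX) + re.escape(newline))


print(bool(pattern_for(EBCDIC_LF).search(record)))  # False: \n taken as LF
print(bool(pattern_for(EBCDIC_NL).search(record)))  # True: \n taken as NL
```

Binding `\n` to NL is the kind of option being requested here. Anchoring with `$`, as suggested elsewhere in the thread, avoids naming the delimiter at all.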

ZA



Re: New Line vs. Line Feed

2015-05-29 Thread Bill Godfrey
On Fri, 29 May 2015 09:56:20 -0500, Paul Gilmartin paulgboul...@aim.com wrote:

On Fri, 29 May 2015 09:52:42 -0500, Bill Godfrey wrote:

On Fri, 29 May 2015 09:03:59 -0500, Ze'ev Atlas wrote:


But when we issue a pattern matching (I'll use Perl syntax for brevity) 
if ($text =~ /some text \n/)
the \n is translated by convention to LF and the EBCDIC based pattern 
matching will fail to match!


why not this?
if ($text =~ /some text $/)
 
That's a circumvention, not a solution to the problem.  But my question 
remains,
by what convention in the z/OS EBCDIC environment is \n translated to LF
rather than NL?


I get identical results whether I use \n or $ in the OP's example. In OMVS.
I'm not addressing your question but rather the OP's example.

Bill



Re: New Line vs. Line Feed

2015-05-29 Thread Paul Gilmartin
On Fri, 29 May 2015 09:03:59 -0500, Ze'ev Atlas wrote:

It is apparent that IBM chose to mark the end of line with NL rather than with
LF or CRLF. That in itself is probably a correct decision, and probably
what the standard should have been to begin with. The problem is that in the
C language convention, the escape sequence \n has a subtle double meaning: it
means LF, but it also carries the semantics of NL.
 
The semantics of \n are implementation-dependent. In Linux, it compiles as
LF; in z/OS as NL (but, I believe, as LF in Enhanced ASCII mode); and in
Classic Mac OS (pre OS X) as CR.

When we do 
printf("some text \n");
it will work correctly on all platforms and nobody would ever notice any
problem. It will produce on EBCDIC
some text NL
and on ASCII platforms
some text LF
or
some text CRLF

Much of this is handled by the device driver.

But when we issue a pattern matching (I'll use Perl syntax for brevity) 
if ($text =~ /some text \n/)
the \n is translated by convention to LF and the EBCDIC based pattern matching 
will fail to match!

That problem should not occur.  By z/OS convention, \n represents NL and 
then pattern matching succeeds.  What z/OS facility treats \n as LF?

So the solution should be to somehow (optionally) dictate to the package that
\n is NL and not LF. I've requested that such an option be implemented so that
I can use it.

That should not be necessary.  Can you provide more context for your example?

-- gil



Re: New Line vs. Line Feed

2015-05-29 Thread Bill Godfrey
On Fri, 29 May 2015 09:03:59 -0500, Ze'ev Atlas wrote:


But when we issue a pattern matching (I'll use Perl syntax for brevity) 
if ($text =~ /some text \n/)
the \n is translated by convention to LF and the EBCDIC based pattern matching 
will fail to match!


why not this?
if ($text =~ /some text $/)

Bill



Re: New Line vs. Line Feed

2015-05-29 Thread John McKown
On Fri, May 29, 2015 at 9:31 AM, Paul Gilmartin 
000433f07816-dmarc-requ...@listserv.ua.edu wrote:

 On Fri, 29 May 2015 19:54:20 +0800, David Crayford wrote:
 
 And how much has that dumb decision cost mainframe customers over the
 years? Fiasco is the right word.
 
 And IBM could have recovered, rather than compounding the fiasco at
 the inception of OMVS by making OMVS ASCII based and providing
 ASCII--EBCDIC conversion in the C RTL for Legacy data sets except
 when fopen() was called with mode=*b.  The kernel would have been
 simpler for omitting autoconversion.  (I believe Legacy I/O is not
 handled by kernel.)


99.99% agreement. The only change I'd make would be to use UTF-8, not
ASCII, instead of EBCDIC. But I'm sure there would be other
interoperability problems that I haven't thought of if legacy data continued
to be mainly CP-037 based with UNIX being UTF-8 based.




 And there would have been no EBCDIC obstacle to porting GNU and
 other FOSS.

 Fiasco ** 2.

 Even yet, I wish IBM would complete the Enhanced ASCII support in the
 C RTL. Significant omissions are Curses and X11; sockets is already
 supported.

 -- gil



-- 
My sister opened a computer store in Hawaii. She sells C shells down by the
seashore.

If someone tell you that nothing is impossible:
Ask him to dribble a football.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown



Re: New Line vs. Line Feed

2015-05-29 Thread Paul Gilmartin
On Fri, 29 May 2015 09:52:42 -0500, Bill Godfrey wrote:

On Fri, 29 May 2015 09:03:59 -0500, Ze'ev Atlas wrote:


But when we issue a pattern matching (I'll use Perl syntax for brevity) 
if ($text =~ /some text \n/)
the \n is translated by convention to LF and the EBCDIC based pattern 
matching will fail to match!


why not this?
if ($text =~ /some text $/)
 
That's a circumvention, not a solution to the problem.  But my question remains,
by what convention in the z/OS EBCDIC environment is \n translated to LF
rather than NL?

-- gil



Re: New Line vs. Line Feed

2015-05-29 Thread David Crayford

On 29/05/2015 10:31 PM, Paul Gilmartin wrote:

On Fri, 29 May 2015 19:54:20 +0800, David Crayford wrote:

And how much has that dumb decision cost mainframe customers over the
years? Fiasco is the right word.


And IBM could have recovered, rather than compounding the fiasco at
the inception of OMVS by making OMVS ASCII based and providing
ASCII--EBCDIC conversion in the C RTL for Legacy data sets except
when fopen() was called with mode=*b.  The kernel would have been
simpler for omitting autoconversion.  (I believe Legacy I/O is not
handled by kernel.)


Legacy I/O is handled quite well by Java. I had a good experience this 
week with Java when I got it to push over 1GB of CSV data to a Linux 
server in  30 seconds over a Redis backplane.
It implicitly converted strings to ASCII (which saved me heaps of time) 
and it was very fast.
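The Java code isn't shown, so the following is only a Python sketch of the same idea, not Java's mechanism. EBCDIC/ASCII conversion tables are not all consistent about whether 0x15 (NL) or 0x25 (LF) becomes ASCII LF. For that reason, the hypothetical `ebcdic_text_to_ascii` helper splits records on NL itself and runs only the record contents through a codec. The default `cp037` is an assumption; real data may use IBM-1047 or another code page.

```python
EBCDIC_NL = 0x15  # record delimiter in z/OS text streams


def ebcdic_text_to_ascii(data: bytes, codepage: str = "cp037") -> bytes:
    """Convert an EBCDIC text stream to ASCII, mapping NL (0x15) to LF.

    The delimiter is handled explicitly; only record contents go through
    the codec, so the NL/LF choice does not depend on the code page table.
    Raises UnicodeEncodeError if a record holds non-ASCII characters.
    """
    records = data.split(bytes([EBCDIC_NL]))
    converted = [r.decode(codepage).encode("ascii") for r in records]
    return b"\n".join(converted)


sample = "id,name".encode("cp037") + b"\x15" + "1,abc".encode("cp037") + b"\x15"
print(ebcdic_text_to_ascii(sample))  # b'id,name\n1,abc\n'
```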



And there would have been no EBCDIC obstacle to porting GNU and
other FOSS.

Fiasco ** 2.

Even yet, I wish IBM would complete the Enhanced ASCII support in the
C RTL. Significant omissions are Curses and X11; sockets is already
supported.

-- gil



Re: New Line vs. Line Feed

2015-05-29 Thread Tony Thigpen

 change the file!?  That's exactly what I don't want, and it gives me no
 choice.  Notepad++ and vim do better: they allow the user to choose the
 output format, defaulting to the original input format.

Paul,

It was part of a general comment about how even Microsoft has different 
rules in different programs.  BUT(!)


I use the Wordpad conversion process quite frequently at work. While my 
laptop is Linux, everybody else is Windows. When I send files (via 
email), I sometimes forget to run unix2dos against the file. When my 
coworkers get a text file from me that seems to be one long line, they 
know to open it with Wordpad instead of Notepad. If they need to retain
the file on their Windows box, they just save it from Wordpad and never
have to worry about its Linux format again.
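The effect of unix2dos, or of WordPad's save, on line ends can be approximated in Python. `lf_to_crlf` is a hypothetical helper, and the real tools offer more than this. Normalizing any existing CRLF to LF first keeps the conversion from producing CR CR LF on files that are already partly converted.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert LF-only line ends to CRLF without doubling existing CRLFs."""
    # First collapse CRLF to LF, then expand every LF to CRLF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


print(lf_to_crlf(b"line one\nline two\n"))  # b'line one\r\nline two\r\n'
```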


Tony Thigpen

Paul Gilmartin wrote on 05/29/2015 05:52 PM:

On Fri, 29 May 2015 00:16:28 -0400, Tony Thigpen wrote:


Interesting, Windows Notepad requires CRLF, but Windows Wordpad will
read and display a LF only file correctly and even change the file to
CRLF when saved.


change the file!?  That's exactly what I don't want, and it gives me no
choice.  Notepad++ and vim do better: they allow the user to choose the
output format, defaulting to the original input format.


On Sat, 30 May 2015 01:37:12 +0800, David Crayford wrote:


Legacy I/O is handled quite well by Java. I had a good experience this
week with Java when I got it to push over 1GB of CSV data to a Linux
server in  30 seconds over a Redis backplane.
It implicitly converted strings to ASCII (which saved me heaps of time)
and it was very fast.


There; was that so hard!?  -- Nick Burns, Your Company's Computer Guy

-- gil



Re: New Line vs. Line Feed

2015-05-29 Thread Paul Gilmartin
On Fri, 29 May 2015 00:16:28 -0400, Tony Thigpen wrote:

Interesting, Windows Notepad requires CRLF, but Windows Wordpad will
read and display a LF only file correctly and even change the file to
CRLF when saved.

change the file!?  That's exactly what I don't want, and it gives me no
choice.  Notepad++ and vim do better: they allow the user to choose the
output format, defaulting to the original input format.


On Sat, 30 May 2015 01:37:12 +0800, David Crayford wrote:

Legacy I/O is handled quite well by Java. I had a good experience this
week with Java when I got it to push over 1GB of CSV data to a Linux
server in  30 seconds over a Redis backplane.
It implicitly converted strings to ASCII (which saved me heaps of time)
and it was very fast.

There; was that so hard!?  -- Nick Burns, Your Company's Computer Guy

-- gil



Re: New Line vs. Line Feed

2015-05-29 Thread Anne Lynn Wheeler
john.archie.mck...@gmail.com (John McKown) writes:
 As a side note (as I have heard it), the reason that Windows uses CRLF
 as a line ending is because MS-DOS did the same. MS-DOS used CRLF
 because CPM-80 used CRLF. And, finally, CPM-80 used CRLF because the
 common printers at the time could not do a carriage return / line feed
 in a single operation.  So, Gary Kildall (author of CPM-80) decided to
 end text files with CRLF so that he didn't need to complicate the
 printer driver to put a LF in when a CR was detected. This made good
 sense in the day that 64K RAM and a 1 MHz 8080 was top of the line
 equipment for the hobbyist.

a little other topic drift from recent IBM antitrust thread

Other trivia ... also at the scientific center ... GML was invented at
the science center in 1969 (G, M, and L are the 1st letters of the
inventors' last names). This is a posting by Sowa about GML being used by
IBM for documents used in the antitrust suit
http://ontolog.cim3.net/forum/ontolog-forum/2012-04/msg00058.html

from above:

For text that was copied from the original OED, they got GML to produce
exactly the same line breaks and hyphenation.  They needed to get it
exactly right in order to aid the proof readers who had to make sure
that the new copy was identical to the old.

The GML-based software in the 1980s was far more flexible than MS Word
is today.  Just look at the OED and imagine how you might use MS Word to
match that exactly.

... snip ...

in the mid-60s at science center, CMS script was implementation of CTSS
runoff using dot formating controls ... then later, script was
enhanced to support GML tag processing. in late 70s, a vm370 SE in the
LA branch ... did implementation of CMS script on trs80 (NewScript)

and periodically mentioned ... before ms/dos
http://en.wikipedia.org/wiki/MS-DOS
there was seattle computer
http://en.wikipedia.org/wiki/Seattle_Computer_Products
before seattle computer there was cp/m,
http://en.wikipedia.org/wiki/CP/M
before cp/m, kildall worked with cp67/cms at npg
http://en.wikipedia.org/wiki/Naval_Postgraduate_School

other Sowa trivia ... on the failure of FS and how poorly 3081 compared
to competition
http://www.jfsowa.com/computer/memo125.htm


-- 
virtualization experience starting Jan1968, online at home since Mar1970



Re: New Line vs. Line Feed

2015-05-29 Thread Paul Gilmartin
On Fri, 29 May 2015 19:54:20 +0800, David Crayford wrote:

And how much has that dumb decision cost mainframe customers over the
years? Fiasco is the right word.
 
And IBM could have recovered, rather than compounding the fiasco at
the inception of OMVS by making OMVS ASCII based and providing
ASCII--EBCDIC conversion in the C RTL for Legacy data sets except
when fopen() was called with mode=*b.  The kernel would have been
simpler for omitting autoconversion.  (I believe Legacy I/O is not
handled by kernel.)

And there would have been no EBCDIC obstacle to porting GNU and
other FOSS.

Fiasco ** 2.

Even yet, I wish IBM would complete the Enhanced ASCII support in the
C RTL. Significant omissions are Curses and X11; sockets is already
supported.

-- gil



Re: New Line vs. Line Feed

2015-05-28 Thread Tony Thigpen

It's actually much worse. There are three:

Ebcdic:
CR = x0D
NL = x15
LF = x25

Originally, CR only moved the print back to the first position of the 
same line. LF only moved the print down one line without moving 
sideways. NL moved both down and to the first position of the line.
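The three motions described above can be simulated with a toy printing terminal in Python. The token names and `render` are illustrative assumptions. The EBCDIC byte values appear only in a comment.

```python
# EBCDIC: CR = 0x0D, NL = 0x15, LF = 0x25 (per the post above). The
# simulation uses the names "CR", "LF" and "NL" as tokens instead of bytes.


def render(tokens: list) -> list:
    """Simulate a simple printing terminal.

    tokens are printable strings or the control names "CR", "LF", "NL".
    Returns the printed page as a list of text lines.
    """
    page = {}  # (line, column) -> character
    line = col = 0
    for tok in tokens:
        if tok == "CR":    # back to column 0, same line
            col = 0
        elif tok == "LF":  # down one line, column unchanged
            line += 1
        elif tok == "NL":  # down one line AND back to column 0
            line, col = line + 1, 0
        else:
            for ch in tok:
                page[(line, col)] = ch  # overprints anything already there
                col += 1
    rows = []
    for r in range(line + 1):
        width = max((c + 1 for (ln, c) in page if ln == r), default=0)
        rows.append("".join(page.get((r, c), " ") for c in range(width)))
    return rows


print(render(["ab", "LF", "cd"]))        # ['ab', '  cd']
print(render(["ab", "NL", "cd"]))        # ['ab', 'cd']
print(render(["ab", "CR", "LF", "cd"]))  # ['ab', 'cd']
```

A bare LF leaves the print position where it was, so the next text starts mid-line. NL, or CR followed by LF, starts it at the margin.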


When it was designed, they were using teletype machines and simple 
printers. No CRTs.


Historically:

1930's had the Teletype standard: International Telegraph Alphabet No. 2 
(ITA2); which had both a CR and a LF and required both at the end of a line.


1950's IBM introduces BCD and adds NL
1960's IBM introduces EBCDIC and continued using the 3 values.

1960's AT&T pushes for a replacement of ITA2 which the ATA published as 
ASCII in 1963. (One of their requirements was 7 bit so EBCDIC was ruled 
out.)


In the ASCII world, CR and LF were the standard until the mid-1960's 
when the Multics developers decided that using two characters was stupid 
and they started using just LF. Unix and follow-on OSs carried on the 
same tradition.


Today, it's a mess. Windows wants CRLF. Internet RFCs normally use CRLF. 
Mac and Linux use just LF.


Interesting, Windows Notepad requires CRLF, but Windows Wordpad will 
read and display a LF only file correctly and even change the file to 
CRLF when saved.



Tony Thigpen

Ze'ev Atlas wrote on 05/28/2015 11:29 PM:

Hi all,

I am dealing with some C package on classic z/OS (PDS/E, no USS). When C
reads text files, it inserts 0x15 at the end of the record (it goes so far
as to drop the trailing blanks and substitute one 0x15 for them in
fixed-length records, but I think there is an option to override that).
0x15 is defined as New Line, but there is a separate character, 0x25, that
is defined as Line Feed. Does anybody know why we need two characters that
seem to do the same thing (besides the evil desire to confuse the poor
user :)

Ze'ev Atlas




Re: New Line vs. Line Feed

2015-05-28 Thread Anne Lynn Wheeler
t...@vse2pdf.com (Tony Thigpen) writes:
 It's actually much worse. There are three:

 Ebcdic:
 CR = x0D
 NL = x15
 LF = x25

 Originally, CR only moved the print back to the first position of the
 same line. LF only moved the print down one line without moving
 sideways. NL moved both down and to the first position of the line.

 When it was designed, they were using teletype machines and simple
 printers. No CRTs.

 Historically:

 1930's had the Teletype standard: International Telegraph Alphabet
 No. 2 (ITA2); which had both a CR and a LF and required both at the
 end of a line.

 1950's IBM introduces BCD and adds NL
 1960's IBM introduces EBCDIC and continued using the 3 values.

 1960's AT&T pushes for a replacement of ITA2 which the ATA published as
 ASCII in 1963. (One of their requirements was 7 bit so EBCDIC was
 ruled out.)

 In the ASCII world, CR and LF were the standard until the mid-1960's
 when the Multics developers decided that using two characters was
 stupid and they started using just LF. Unix and follow-on OSs carried
 on the same tradition.

 Today, it's a mess. Windows wants CRLF. Internet RFCs normally use
 CRLF. Mac and Linux use just LF.

 Interestingly, Windows Notepad requires CRLF, but Windows WordPad will
 read and display an LF-only file correctly and even change the file to
 CRLF when saved.
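To make the CR / LF / NL distinction in the table above concrete, here is a small Python sketch of an idealized printing terminal fed EBCDIC bytes. It is not a model of any particular IBM device; `render` is a hypothetical name, and printable bytes are decoded with Python's cp037 codec as an assumption.

```python
from __future__ import annotations

CR, NL, LF = 0x0D, 0x15, 0x25  # EBCDIC carriage return, new line, line feed


def render(stream: bytes) -> list[str]:
    """Simulate an idealized printing terminal fed EBCDIC bytes.

    CR moves the print head back to column 0 of the same line, LF advances
    the paper one line without moving the head sideways, and NL does both.
    Any other byte is decoded with cp037 and printed at the head position,
    overstriking whatever is already there.
    """
    page: list[list[str]] = [[]]
    row = col = 0
    for byte in stream:
        if byte in (CR, NL):
            col = 0                     # carriage return component
        if byte in (LF, NL):
            row += 1                    # line feed component
            if row == len(page):
                page.append([])
        if byte in (CR, LF, NL):
            continue
        line = page[row]
        while len(line) < col:          # head is past the end of the text
            line.append(" ")
        ch = bytes([byte]).decode("cp037")
        if col < len(line):
            line[col] = ch              # overstrike (e.g. after a bare CR)
        else:
            line.append(ch)
        col += 1
    return ["".join(line) for line in page]


if __name__ == "__main__":
    ab, cd = "AB".encode("cp037"), "CD".encode("cp037")
    print(render(ab + bytes([LF]) + cd))      # LF only: CD starts in column 2
    print(render(ab + bytes([NL]) + cd))      # NL: CD starts in column 0
    print(render(ab + bytes([CR, LF]) + cd))  # CR LF: same result as NL
```

A bare CR followed by more text overprints the same line, which is the "move back to the first position of the same line" behaviour described above.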

IBM did much of the standardization for ASCII and 360 originally was
supposed to be an ASCII machine ... unfortunately the 360 ASCII unit
record gear wasn't ready ... and the decision was made to go
(temporarily) with the old BCD unit record gear (but there were some
unfortunate side effects of that decision).

EBCDIC and the P-Bit, The Biggest Computer Goof Ever
http://www.bobbemer.com/P-BIT.HTM

The culprit was T. Vincent Learson. The only thing for his defense is
that he had no idea of what he had done. It was when he was an IBM Vice
President, prior to tenure as Chairman of the Board, those lofty
positions where you believe that, if you order it done, it actually will
be done. I've mentioned this fiasco elsewhere.

... snip ...

by the father of ASCII
http://www.bobbemer.com/FATHEROF.HTM
his history index
http://www.bobbemer.com/HISTORY.HTM



-- 
virtualization experience starting Jan1968, online at home since Mar1970

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: New Line vs. Line Feed

2015-05-28 Thread Paul Gilmartin
On Thu, 28 May 2015 23:34:53 -0500, John McKown wrote:

0x15 is _NOT_ a Line Feed character. It is a New Line (NEL) character from
the 3215 console days. In EBCDIC, 0x25 is the true Line Feed character. On
the 3215, the NEL was a single byte which did a carriage return and line feed
operation all in one. If you sent a 0x25 (LF) to a 3215, the platen (roller)
would advance one line, but the print head would remain stationary.

As a side note (as I have heard it), the reason that Windows uses CRLF as a
line ending is that MS-DOS did the same. MS-DOS used CRLF because CP/M-80
used CRLF. And, finally, CP/M-80 used CRLF because the common printers at
the time could not do a carriage return / line feed in a single operation.
So, Gary Kildall (author of CP/M-80) decided to end text files with CRLF so
that he didn't need to complicate the printer driver to put a LF in when a
CR was detected. This made good sense in the days when 64K RAM and a 1 MHz
8080 were top-of-the-line equipment for the hobbyist.
 
The Teletype 33, running at 10 CPS, could do a CR in less than 0.2 seconds;
a LF in less than 0.1 second, so it made sense to use CRLF so the
combined operation completed before the next printable character was
issued.
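A rough check of the timing argument, assuming characters arrive back to back at 10 CPS, that each motion starts as soon as its character has been received, and that the LF motion can overlap the carriage return. The 0.2 s and 0.1 s figures are the upper bounds quoted above, not measured values.

```latex
\begin{aligned}
t_{\text{char}} &= \frac{1\ \text{s}}{10} = 0.1\ \text{s} && \text{(one character time at 10 CPS)}\\
t_{\text{next}} &= 2\,t_{\text{char}} = 0.2\ \text{s} && \text{(CR at } t=0\text{, LF at } t=0.1\ \text{s, next character at } t=0.2\ \text{s)}\\
t_{\text{CR done}} &< 0 + 0.2\ \text{s} = t_{\text{next}}\\
t_{\text{LF done}} &< 0.1\ \text{s} + 0.1\ \text{s} = t_{\text{next}}
\end{aligned}
```

With a bare CR, the next printable character would arrive at t = 0.1 s while the carriage might still be returning; under these assumptions the LF effectively serves as one character time of padding.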

Taking its cue from the 3215, VM CP (CP/67?) used NL as a command
separator. When the first C compilers appeared (from ISVs, not IBM, and
on VM, not MVS), they used 0x15 as the line terminator; UNIX was not a
concern. Then OMVS used 0x15 for compatibility with those compilers.

Using a device-specific hardware command to separate records in a
general file makes as little sense as Assembler H's use of machine
carriage control. A device-neutral convention might have been
Record Separator, ASCII 0x1e.
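A minimal Python sketch of that device-neutral alternative: records terminated by ASCII RS (0x1e) instead of a printer control. The helper names are hypothetical, and the sketch does no escaping, so a record that itself contains RS is rejected.

```python
from __future__ import annotations

RS = 0x1E  # ASCII Record Separator


def join_records(records: list[bytes]) -> bytes:
    """Terminate every record with RS rather than LF, NL, or CRLF."""
    for rec in records:
        if RS in rec:
            raise ValueError("record contains RS; this sketch does no escaping")
    return b"".join(rec + bytes([RS]) for rec in records)


def split_records(data: bytes) -> list[bytes]:
    """Inverse of join_records: every record, including the last, ends in RS."""
    if data and data[-1] != RS:
        raise ValueError("data does not end with RS")
    return data.split(bytes([RS]))[:-1]
```

Because no device control is involved, a record may carry LF, NEL, or CR bytes as ordinary data, and an empty record survives the round trip: `split_records(join_records([b"a\nb", b""]))` gives back both records.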

CMS Pipelines's A2E/E2A map:

        ASCII         EBCDIC
  NEL   0x85    --    NL  0x15
  LF    0x0a    --    LF  0x25

... as do Linux iconv commands and subroutines, and even OMVS's
dd command. The standouts are OMVS's iconv and other utilities, which
instead map EBCDIC NL 0x15 to ASCII LF 0x0a. This results in painful
incompatibilities.
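To see the incompatibility from the application side, here is a small Python sketch. Python's cp037 codec follows the same mapping as the table above (EBCDIC 0x15 to U+0085 NEL, 0x25 to U+000A LF), so records ending in 0x15 do not split on "\n" after decoding. `decode_zos_text` is a hypothetical helper that swaps the two code points before decoding to mimic the OMVS convention; it is not a reproduction of the z/OS iconv tables.

```python
def decode_standard(data: bytes) -> str:
    """Decode with Python's cp037 codec: 0x15 -> U+0085 (NEL), 0x25 -> U+000A (LF)."""
    return data.decode("cp037")


# Swap EBCDIC NL (0x15) and LF (0x25) so that NL ends up as "\n" after decoding.
NL_LF_SWAP = bytes.maketrans(b"\x15\x25", b"\x25\x15")


def decode_zos_text(data: bytes) -> str:
    """Hypothetical helper mimicking the OMVS convention (NL treated as newline)."""
    return data.translate(NL_LF_SWAP).decode("cp037")


if __name__ == "__main__":
    # Two records, each terminated by EBCDIC NL as the C runtime delivers them.
    zos_text = "LINE1".encode("cp037") + b"\x15" + "LINE2".encode("cp037") + b"\x15"
    print(decode_standard(zos_text).split("\n"))   # NEL is not "\n": one piece
    print(decode_standard(zos_text).splitlines())  # splitlines() also breaks on NEL
    print(decode_zos_text(zos_text).split("\n"))   # after the swap, "\n" splits it
```

The same bytes give different line structure depending on which mapping the reader assumes, which is the practical cost of the two conventions coexisting.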

IBM clearly violates a standard.  Footnotes on various reference
manual pages do not excuse such a violation.

-- gil

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Re: New Line vs. Line Feed

2015-05-28 Thread John McKown
On Thu, May 28, 2015 at 10:29 PM, Ze'ev Atlas 
004b34e7c98a-dmarc-requ...@listserv.ua.edu wrote:

 Hi all,

 I am dealing with some C package on classic z/OS (PDS/E, no USS).
 When C reads text files it inserts 0x15 at the end of each record (it goes
 so far as to drop the trailing blanks and substitute them with one 0x15
 for fixed-length records, but I think that there is an option to override
 that). 0x15 is defined as New Line, but there is a separate character,
 0x25, that is defined as Line Feed. Does anybody know why we need two
 characters that seem to do the same thing (besides the evil desire to
 confuse the poor user :)

 Ze'ev Atlas


0x15 is _NOT_ a Line Feed character. It is a New Line (NEL) character from
the 3215 console days. In EBCDIC, 0x25 is the true Line Feed character. On
the 3215, the NEL was a single byte which did a carriage return and line feed
operation all in one. If you sent a 0x25 (LF) to a 3215, the platen (roller)
would advance one line, but the print head would remain stationary.

As a side note (as I have heard it), the reason that Windows uses CRLF as a
line ending is that MS-DOS did the same. MS-DOS used CRLF because CP/M-80
used CRLF. And, finally, CP/M-80 used CRLF because the common printers at
the time could not do a carriage return / line feed in a single operation.
So, Gary Kildall (author of CP/M-80) decided to end text files with CRLF so
that he didn't need to complicate the printer driver to put a LF in when a
CR was detected. This made good sense in the days when 64K RAM and a 1 MHz
8080 were top-of-the-line equipment for the hobbyist.

-- 
My sister opened a computer store in Hawaii. She sells C shells down by the
seashore.

If someone tells you that nothing is impossible:
Ask him to dribble a football.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown

--
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN