On 01/13/2015 11:49 AM, David E. Wheeler wrote:
On Jan 13, 2015, at 10:31 AM, Karl Williamson <[email protected]> wrote:
What Perl does to handle this is to simply swap the NEL and LF code points.
That makes \n mean NEL instead of LF. Apparently LF is unused in EBCDIC
applications, so it works. There is official support for this swap, as
Unicode's definition of how to get UTF-8 to work on EBCDIC platforms says to do
the swap.
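A minimal Python sketch of the swap described above, for illustration only; it is not Perl's implementation. It assumes Python's `cp037` codec as a stand-in for an EBCDIC code page, and `swap_nel_lf` is a hypothetical helper name.

```python
# Hypothetical illustration of the NEL/LF swap; not Perl's actual code.
NEL = "\u0085"  # Unicode NEXT LINE
LF = "\u000a"   # Unicode LINE FEED

# Translation table that exchanges the two code points.
_SWAP = str.maketrans({NEL: LF, LF: NEL})


def swap_nel_lf(text: str) -> str:
    """Exchange every NEL with LF and vice versa."""
    return text.translate(_SWAP)


# Assumption: cp037 stands in for "EBCDIC" here. Its byte 0x15 decodes
# to NEL under a straight code-page mapping.
ebcdic_line = "one".encode("cp037") + b"\x15" + "two".encode("cp037")
decoded = ebcdic_line.decode("cp037")  # line break is NEL, not LF
swapped = swap_nel_lf(decoded)         # line break is now "\n"
```

Applying the swap twice returns the original string, so the same table serves for both directions.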
Huh. Good to know (and have it documented now!).
It does mean that NL doesn't mean the character that a native EBCDIC speaker
would think.
But the bottom line is that because of this character swapping, the NEL
characters in EBCDIC appear as \n, so aren't a problem for CP1252.
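For context, a short Python sketch of why a raw NEL could matter for CP1252. This assumes the concern is the byte value 0x85, which is an inference; the thread does not spell it out. Latin-1 maps that byte to NEL, while CP1252 maps it to an ellipsis, whereas "\n" is the same in both.

```python
raw_nel = b"\x85"

as_latin1 = raw_nel.decode("latin-1")  # U+0085 NEXT LINE
as_cp1252 = raw_nel.decode("cp1252")   # U+2026 HORIZONTAL ELLIPSIS

# A plain "\n" byte decodes identically under both encodings.
newline_same = b"\n".decode("latin-1") == b"\n".decode("cp1252")
```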
Nice. So should we then adopt the same pattern as the HTML 5 spec?
I'm still leery of overruling an =encoding line, especially if we have
no provision for telling us to not overrule. But it means that it's
fine to s/latin1/cp1252/ when there is no =encoding, as far as I'm
concerned, and I haven't heard any dissent from that here. If you like,
I can prepare a patch for that; the EBCDIC portion is a little tricky.
Are you going to release a version of this module without this change?
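The fallback proposed above (use cp1252 instead of latin1 only when there is no =encoding line) might look roughly like this in Python. This is a sketch under stated assumptions, not the module's code: `choose_encoding` and `decode_pod` are hypothetical names, and declared encoding names are assumed to be ones Python's codecs recognise.

```python
import re

# Matches a Pod "=encoding NAME" directive at the start of a line.
_ENCODING_LINE = re.compile(rb"^=encoding[ \t]+(\S+)", re.MULTILINE)


def choose_encoding(pod_bytes: bytes, default: str = "cp1252") -> str:
    """Return the declared encoding, or the default when none is declared.

    A declared =encoding is never overruled in this sketch.
    """
    match = _ENCODING_LINE.search(pod_bytes)
    if match:
        return match.group(1).decode("ascii", "replace")
    return default


def decode_pod(pod_bytes: bytes) -> str:
    """Decode Pod source using the encoding picked by choose_encoding."""
    return pod_bytes.decode(choose_encoding(pod_bytes))
```

With no =encoding line, bytes such as 0x93 and 0x94 come out as curly quotes rather than C1 control characters; with an =encoding line, the declared value is used as given.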
And I wonder if that W3 spec issue you pointed to the other day could use a
comment to this effect.
I don't understand you here. This is a W3 website document, and we
can't edit it. I
Best,
David