Re: Assume CP1252

Karl Williamson Tue, 13 Jan 2015 13:52:49 -0800

On 01/13/2015 11:49 AM, David E. Wheeler wrote:

On Jan 13, 2015, at 10:31 AM, Karl Williamson <[email protected]> wrote:

What Perl does to handle this is to simply swap the NEL and LF code points.
That makes \n mean NEL instead of LF.  Apparently LF is unused in EBCDIC 
applications, so it works.  There is official support for this swap, as 
Unicode's definition of how to get UTF-8 to work on EBCDIC platforms says to do 
the swap.


Huh. Good to know (and have it documented now!).

It does mean that NL doesn't mean the character that a native EBCDIC speaker 
would think.

But the bottom line is that because of this character swapping, the NEL 
characters in EBCDIC appear as \n, so aren't a problem for CP1252.
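
For anyone who wants to see the swap concretely, here is a rough sketch in Python rather than Perl. It is not Perl's actual implementation: the names swap_nel_lf and decode_ebcdic are made up for illustration, and the choice of the cp037 code page (where byte 0x15 is NEL and 0x25 is LF) is an assumption.

```python
# Illustrative sketch only, not Perl's implementation.
# Swap NEL (U+0085) and LF (U+000A) so that an EBCDIC newline reads as "\n".
NEL, LF = 0x85, 0x0A
SWAP_NEL_LF = {NEL: LF, LF: NEL}


def swap_nel_lf(text: str) -> str:
    """Exchange every NEL with LF and every LF with NEL."""
    return text.translate(SWAP_NEL_LF)


def decode_ebcdic(raw: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes (cp037 assumed), then apply the NEL/LF swap."""
    return swap_nel_lf(raw.decode(codec))


if __name__ == "__main__":
    line = "Hi".encode("cp037") + b"\x15"  # 0x15 is NEL in cp037
    print(repr(decode_ebcdic(line)))
```

Applying the swap twice gives back the original text, so the same table works in both directions.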


Nice. So should we then adopt the same pattern as the HTML 5 spec?

I'm still leery of overruling an =encoding line, especially if we have
no provision for telling us to not overrule. But it means that it's
fine to s/latin1/cp1252 when there is no =encoding, as far as I'm
concerned, and I haven't heard any dissent from that here. If you like,
I can prepare a patch for that; the EBCDIC portion is a little tricky.
Are you going to release a version of this module without this change?
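
To make the proposed rule concrete, here is a minimal sketch in Python rather than Perl. It is not the module's code: decode_pod and ENCODING_RE are hypothetical names, and the latin-1 fallback for bytes that Python's cp1252 codec leaves undefined is my own assumption, not something settled in this thread.

```python
import re

# Hypothetical helper for illustration; not the module's actual logic.
# Matches an explicit "=encoding NAME" directive at the start of a line.
ENCODING_RE = re.compile(rb"^=encoding[ \t]+(\S+)", re.MULTILINE)


def decode_pod(raw: bytes) -> str:
    """Honor an =encoding line if present; otherwise assume CP1252."""
    match = ENCODING_RE.search(raw)
    if match:
        # An explicit declaration is never overruled.
        return raw.decode(match.group(1).decode("ascii"))
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec rejects a few undefined bytes (e.g. 0x81);
        # falling back to latin-1 here is an assumption of this sketch.
        return raw.decode("latin-1")


if __name__ == "__main__":
    # 0x93 and 0x94 are curly quotes in CP1252 but C1 controls in Latin-1.
    print(decode_pod(b"=pod\n\n\x93quoted\x94\n"))
```

With no =encoding line, bytes 0x93 and 0x94 come out as curly quotes; with an explicit "=encoding latin1" they are left as the C1 control characters Latin-1 says they are.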


And I wonder if that W3 spec issue you pointed to the other day could use a 
comment to this effect.

I don't understand you here. This is a W3 website document, and we
can't edit it. I


Best,

David
