Re: Assume CP1252

2015-01-12 Thread Karl Williamson

On 01/12/2015 06:25 AM, Shawn H Corey wrote:

On Sun, 11 Jan 2015 20:57:26 -0700
Karl Williamson pub...@khwilliamson.com wrote:


To be clear, I think that assuming 1252 when there is no =encoding
line is a good idea.  But I'm leery of overriding an actual =encoding
line.


Agreed.


I could possibly be persuaded, if someone wants to make it, by the 
argument that 'latin1' is kind of colloquial, and someone using it may 
very well not be familiar with the possibility that they really mean 
cp1252.  But, if so, there needs to be a way for someone to say "I 
really mean it" and not be overridden by us.  Perhaps that could be
=encoding ISO-8859-1.
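
A minimal sketch of the rule under discussion, in Python for illustration only: fall back to CP1252 when a document has no =encoding line, and honor an explicit declaration as written. The names pick_encoding and decode_pod are hypothetical and are not part of Pod::Simple; the regular expression is an assumption about what counts as an =encoding line.

```python
import re

# Hypothetical helper names; nothing here is Pod::Simple's actual API.
# Assumption: an =encoding directive starts a line and is followed by a label.
ENCODING_RE = re.compile(rb"^=encoding[ \t]+(\S+)", re.MULTILINE)


def pick_encoding(pod_bytes, default="cp1252"):
    """Return the declared encoding label, or the default if none is declared."""
    match = ENCODING_RE.search(pod_bytes)
    if match is None:
        # No =encoding line at all: assume CP1252.
        return default
    # An explicit declaration is honored as written, never overridden.
    return match.group(1).decode("ascii").lower()


def decode_pod(pod_bytes):
    """Decode raw POD bytes using the encoding chosen by pick_encoding()."""
    return pod_bytes.decode(pick_encoding(pod_bytes))
```

With this sketch, the byte 0x85 in an undeclared document decodes as a horizontal ellipsis, while the same byte after =encoding ISO-8859-1 stays the C1 control character U+0085.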



Q: What if there is more than one =encoding line? Does it switch
encoding part way thru a POD?




Error while formatting with Pod::Perldoc::ToMan:
 Nested processed encoding. at 
/usr/share/perl/5.18/Pod/Simple/BlackBox.pm line 380.




Re: Assume CP1252

2015-01-12 Thread David E. Wheeler
On Jan 12, 2015, at 11:18 AM, Karl Williamson pub...@khwilliamson.com wrote:

 To be clear, I think that assuming 1252 when there is no =encoding
 line is a good idea.  But I'm leery of overriding an actual =encoding
 line.
 
 Agreed.

I’m okay with this.

 I could possibly be persuaded, if someone wants to make it, by the argument 
 that 'latin1' is kind of colloquial, and someone using it may very well not 
 be familiar with the possibility that they really mean cp1252.  But, if so, 
 there needs to be a way for someone to say "I really mean it" and not be 
 overridden by us.  Perhaps that could be =encoding ISO-8859-1.

If we *were* to assume CP1252 for Latin-1, I would want it to be consistent 
with the precedent set by the W3C. Sean supplied this link:

  http://www.w3.org/TR/encoding/#names-and-labels

Here’s the list of labels that they translate to Windows-1252:


ansi_x3.4-1968
ascii
cp1252
cp819
csisolatin1
ibm819
iso-8859-1
iso-ir-100
iso8859-1
iso88591
iso_8859-1
iso_8859-1:1987
l1
latin1
us-ascii
windows-1252
x-cp1252

In their interpretation, no label ever resolves to iso-8859-1. Pretty 
interesting.
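
As a rough illustration of what following that precedent could look like, here is a Python sketch that resolves a label against the list above. The label set is copied from this message; the function name resolve_label is hypothetical, and the normalization (trim whitespace, compare case-insensitively) is my reading of how the W3C document matches labels, so treat it as an assumption.

```python
# Labels copied from the list above. The function name is hypothetical.
WINDOWS_1252_LABELS = frozenset({
    "ansi_x3.4-1968", "ascii", "cp1252", "cp819", "csisolatin1",
    "ibm819", "iso-8859-1", "iso-ir-100", "iso8859-1", "iso88591",
    "iso_8859-1", "iso_8859-1:1987", "l1", "latin1", "us-ascii",
    "windows-1252", "x-cp1252",
})


def resolve_label(label):
    """Return 'windows-1252' for any label in the list, otherwise None."""
    # Assumption: labels match after trimming whitespace and lowercasing.
    normalized = label.strip().lower()
    if normalized in WINDOWS_1252_LABELS:
        return "windows-1252"
    return None
```

Under this mapping, resolve_label("Latin1") and resolve_label("ISO-8859-1") both give "windows-1252", which is why an "I really mean it" escape hatch would need some other spelling.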

 Q: What if there is more than one =encoding line? Does it switch
 encoding part way thru a POD?
 
 
 
 Error while formatting with Pod::Perldoc::ToMan:
 Nested processed encoding. at /usr/share/perl/5.18/Pod/Simple/BlackBox.pm 
 line 380.

I recently changed this error, because that was a pretty useless message. The 
new message is "Cannot have multiple =encoding directives". Also, it is no 
longer fatal, but is passed to scream(), which means it would be a failure for 
Test::Pod, but won’t break tools that generate docs.

  http://github.com/theory/pod-simple/commit/cb884b5
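
To show the general shape of a non-fatal complaint, here is a small Python sketch. It is not the Pod::Simple code from the commit above: the function name find_encoding is hypothetical, and keeping the first directive while only reporting later ones is an assumption made for the example.

```python
def find_encoding(lines):
    """Return (encoding, complaints) for a list of POD lines.

    Illustrative only: keeps the first =encoding directive and records a
    complaint for each later one instead of raising an exception.
    """
    encoding = None
    complaints = []
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) < 2 or parts[0] != "=encoding":
            continue
        if encoding is None:
            encoding = parts[1]
        else:
            # Report the problem, but keep going so doc generators still work.
            complaints.append(
                f"line {number}: Cannot have multiple =encoding directives"
            )
    return encoding, complaints
```

A checker in the role of Test::Pod could fail when the complaints list is non-empty, while a formatter could print the complaints and carry on.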

Best,

David





Re: Assume CP1252

2015-01-12 Thread Karl Williamson

On 01/12/2015 12:37 PM, David E. Wheeler wrote:

On Jan 12, 2015, at 11:18 AM, Karl Williamson pub...@khwilliamson.com wrote:


To be clear, I think that assuming 1252 when there is no =encoding
line is a good idea.  But I'm leery of overriding an actual =encoding
line.


Agreed.


I’m okay with this.


I could possibly be persuaded, if someone wants to make it, by the argument that 'latin1' 
is kind of colloquial, and someone using it may very well not be familiar with the 
possibility that they really mean cp1252.  But, if so, there needs to be a way for 
someone to say "I really mean it" and not be overridden by us.  Perhaps that could be
=encoding ISO-8859-1.


If we *were* to assume CP1252 for Latin-1, I would want it to be consistent 
with the precedent set by the W3C.


That sounds reasonable.


 Sean supplied this link:


   http://www.w3.org/TR/encoding/#names-and-labels

Here’s the list of labels that they translate to Windows-1252:


ansi_x3.4-1968
ascii
cp1252
cp819
csisolatin1
ibm819
iso-8859-1
iso-ir-100
iso8859-1
iso88591
iso_8859-1
iso_8859-1:1987
l1
latin1
us-ascii
windows-1252
x-cp1252

In their interpretation, no label ever resolves to iso-8859-1. Pretty 
interesting.


I ran across this link, but didn't see what action was taken on it:
http://www.w3.org/TR/newline






Q: What if there is more than one =encoding line? Does it switch
encoding part way thru a POD?




Error while formatting with Pod::Perldoc::ToMan:
Nested processed encoding. at /usr/share/perl/5.18/Pod/Simple/BlackBox.pm line 
380.


I recently changed this error, because that was a pretty useless message. The new message 
is "Cannot have multiple =encoding directives". Also, it is no longer fatal, 
but is passed to scream(), which means it would be a failure for Test::Pod, but won’t 
break tools that generate docs.

   http://github.com/theory/pod-simple/commit/cb884b5

Best,

David





Re: Assume CP1252

2015-01-12 Thread Karl Williamson

On 01/12/2015 12:49 PM, David E. Wheeler wrote:

On Jan 12, 2015, at 11:46 AM, Karl Williamson pub...@khwilliamson.com wrote:


I ran across this link, but didn't see what action was taken on it:
http://www.w3.org/TR/newline


Pardon my ignorance. Does that mean that `s/Latin-1/CP1252/g` could be a 
mistake on EBCDIC?

David



Yes, that's essentially what I meant when I said in an earlier email 
that NEL is THE new-line character on os390, which generally runs using 
EBCDIC.  In cp1252, the code point that Latin-1 assigns to NEL (0x85) is 
a horizontal ellipsis, not a next line, but on some platforms, like 
os390, it means next line.  This is a conflict.
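
The conflict can be seen by decoding the single byte 0x85 both ways. This is a Python sketch for illustration only; it covers just the Latin-1 versus cp1252 reading of that byte and does not model EBCDIC or os390. The helper name describe is hypothetical.

```python
import unicodedata

NEL_BYTE = b"\x85"

as_latin1 = NEL_BYTE.decode("latin-1")  # C1 control NEL (next line)
as_cp1252 = NEL_BYTE.decode("cp1252")   # a printable punctuation character


def describe(ch):
    """Return the code point and Unicode name, or '<control>' if unnamed."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}"


print(describe(as_latin1))  # U+0085 <control>
print(describe(as_cp1252))  # U+2026 HORIZONTAL ELLIPSIS

# Python's str.splitlines() treats U+0085 as a line boundary, so the two
# readings of the same byte produce different line structure.
print(("a" + as_latin1 + "b").splitlines())  # ['a', 'b']
print(("a" + as_cp1252 + "b").splitlines())  # ['a…b']
```

So text whose line breaks are stored as 0x85 would lose them, and gain ellipses, if that byte were decoded as cp1252.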


However, now that I think about it, when I look at os390 runs, I rarely 
see NELs.  Maybe there is a filter that translates them to \n before the 
pod parser sees them.  But sometimes I do see NELs all over the place 
and no \n.  I'll ask on the perl-mvs list about this.