In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/df0c79950af75059c3150ef5df5b4fac77f78720?hp=4d2ca8b5c9aea7369aec591dabed8f7f35f61ce3>

- Log -----------------------------------------------------------------
commit df0c79950af75059c3150ef5df5b4fac77f78720
Author: Karl Williamson <[email protected]>
Date: Sat Apr 4 11:58:25 2015 -0600

    perlpodspec: Finish EBCDIC updates

    This is a follow up to bd940430ebc41b7b346cc761cc46be9674f34111

M pod/perlpodspec.pod

commit 1bca558f18a99576886aa54a63386d1d4ce6f1f7
Author: Karl Williamson <[email protected]>
Date: Sat Apr 4 11:57:34 2015 -0600

    perlpodspec: Nits

    Some of these weren't displaying correctly.

M pod/perlpodspec.pod
-----------------------------------------------------------------------

Summary of changes:
 pod/perlpodspec.pod | 50 ++++++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/pod/perlpodspec.pod b/pod/perlpodspec.pod
index c3d172f..20a9195 100644
--- a/pod/perlpodspec.pod
+++ b/pod/perlpodspec.pod
@@ -70,7 +70,7 @@ else with the Pod (like counting words, scanning for index points,
 etc.).

 Pod content is contained in B<Pod blocks>. A Pod block starts with a
-line that matches <m/\A=[a-zA-Z]/>, and continues up to the next line
+line that matches C<m/\A=[a-zA-Z]/>, and continues up to the next line
 that matches C<m/\A=cut/> or up to the end of the file if there is no
 C<m/\A=cut/> line.

@@ -416,7 +416,7 @@ formatting code. Examples:
     B<< $foo->bar(); >>

 With this syntax, the whitespace character(s) after the "CE<lt><<"
-and before the ">>" (or whatever letter) are I<not> renderable. They
+and before the ">>>" (or whatever letter) are I<not> renderable. They
 do not signify whitespace, are merely part of the formatting codes
 themselves. That is, these are all synonymous:

@@ -622,8 +622,13 @@ The well known Unicode Byte Order Marks are as follows: if the
 file begins with the two literal byte values 0xFE 0xFF, this is the
 BOM for big-endian UTF-16. If the file begins with the two literal
 byte value 0xFF 0xFE, this is the BOM for little-endian
-UTF-16. If the file begins with the three literal byte values
+UTF-16. On an ASCII platform, if the file begins with the three literal
+byte values
 0xEF 0xBB 0xBF, this is the BOM for UTF-8.
+(A mechanism portable to EBCDIC platforms is to:
+
+  my $utf8_bom = "\x{FEFF}";
+  utf8::encode($utf8_bom);

 =for comment
  use bytes; print map sprintf(" 0x%02X", ord $_), split '', "\x{feff}";
@@ -634,7 +639,8 @@ UTF-16. If the file begins with the three literal byte values

 =item *

-A naive but often sufficient heuristic for testing the first highbit
+A naive, but often sufficient heuristic on ASCII platforms, for testing
+the first highbit
 byte-sequence in a BOM-less file (whether in code or in Pod!), to see
 whether that sequence is valid as UTF-8 (RFC 2279) is to check whether
 that the first byte in the sequence is in the range 0xC2 - 0xFD
@@ -642,7 +648,8 @@ I<and> whether the next byte is in the range 0x80 - 0xBF. If so, the
 parser may conclude that this file is in UTF-8, and all highbit
 sequences in the file should be assumed to be UTF-8. Otherwise the
 parser should treat the file as being
-in CP-1252. (A better check is to pass a copy of the sequence to
+in CP-1252. (A better check, and which works on EBCDIC platforms as
+well, is to pass a copy of the sequence to
 L<utf8::decode()|utf8> which performs a full validity check on the
 sequence and returns TRUE if it is valid UTF-8, FALSE otherwise. This
 function is always pre-loaded, is fast because it is written in C, and
@@ -671,12 +678,6 @@ is sufficient to establish this file's encoding.

 =item *

-This document's requirements and suggestions about encodings
-do not apply to Pod processors running on non-ASCII platforms,
-notably EBCDIC platforms.
-
-=item *
-
 Pod processors must treat a "=for [label] [content...]" paragraph as
 meaning the same thing as a "=begin [label]" paragraph, content, and
 an "=end [label]" paragraph. (The parser may conflate these two
@@ -848,10 +849,11 @@ are the Latin1/Unicode values, even on EBCDIC platforms.
 When referring to characters by using a EE<lt>n> numeric code, numbers
 in the range 32-126 refer to those well known US-ASCII characters (also
 defined there by Unicode, with the same meaning), which all Pod
-formatters must render faithfully. Numbers in the ranges 0-31 and
-127-159 should not be used (neither as literals, nor as EE<lt>number>
-codes), except for the literal byte-sequences for newline (13, 13 10, or
-10), and tab (9).
+formatters must render faithfully. Characters whose EE<lt>E<gt> numbers
+are in the ranges 0-31 and 127-159 should not be used (neither as
+literals,
+nor as EE<lt>number> codes), except for the literal byte-sequences for
+newline (ASCII 13, ASCII 13 10, or ASCII 10), and tab (ASCII 9).

 Numbers in the range 160-255 refer to Latin-1 characters (also
 defined there by Unicode, with the same meaning). Numbers above
@@ -902,17 +904,17 @@ character 34 (doublequote, "), "EE<lt>amp>" for character 38

 =item *

-Note that in all cases of "EE<lt>whatever>", I<whatever> (whether
+Note that in all cases of "EE<lt>whateverE<gt>", I<whatever> (whether
 an htmlname, or a number in any base) must consist only of
 alphanumeric characters -- that is, I<whatever> must watch
-C<m/\A\w+\z/>. So "EE<lt> 0 1 2 3 >" is invalid, because
+C<m/\A\w+\z/>. So S<"EE<lt> 0 1 2 3 E<gt>"> is invalid, because
 it contains spaces, which aren't alphanumeric characters. This
 presumably does not I<need> special treatment by a Pod processor;
-" 0 1 2 3 " doesn't look like a number in any base, so it would
+S<" 0 1 2 3 "> doesn't look like a number in any base, so it would
 presumably be looked up in the table of HTML-like names. Since
-there isn't (and cannot be) an HTML-like entity called " 0 1 2 3 ",
+there isn't (and cannot be) an HTML-like entity called S<" 0 1 2 3 ">,
 this will be treated as an error. However, Pod processors may
-treat "EE<lt> 0 1 2 3 >" or "EE<lt>e-acute>" as I<syntactically>
+treat S<"EE<lt> 0 1 2 3 E<gt>"> or "EE<lt>e-acute>" as I<syntactically>
 invalid, potentially earning a different error message than the
 error message (or warning, or event) generated by a merely unknown
 (but theoretically valid) htmlname, as in "EE<lt>qacute>"
@@ -1136,7 +1138,7 @@ four attributes:

 =item First:

-The link-text. If there is none, this must be undef. (E.g., in
+The link-text. If there is none, this must be C<undef>. (E.g., in
 "LE<lt>Perl Functions|perlfunc>", the link-text is "Perl Functions".
 In "LE<lt>Time::HiRes>" and even "LE<lt>|Time::HiRes>", there is no
 link text. Note that link text may contain formatting.)
@@ -1149,13 +1151,13 @@ text, then this is the text that we'll infer in its place. (E.g., for

 =item Third:

-The name or URL, or undef if none. (E.g., in "LE<lt>Perl
+The name or URL, or C<undef> if none. (E.g., in "LE<lt>Perl
 Functions|perlfunc>", the name (also sometimes called the page)
-is "perlfunc". In "LE<lt>/CAVEATS>", the name is undef.)
+is "perlfunc". In "LE<lt>/CAVEATS>", the name is C<undef>.)

 =item Fourth:

-The section (AKA "item" in older perlpods), or undef if none. E.g.,
+The section (AKA "item" in older perlpods), or C<undef> if none. E.g.,
 in "LE<lt>Getopt::Std/DESCRIPTIONE<gt>", "DESCRIPTION" is the section. (Note
 that this is not the same as a manpage section like the "5" in "man 5
 crontab". "Section Foo" in the Pod sense means the part of the text

--
Perl5 Master Repository
