=encoding in perlpod/perlpodspec

Sean M. Burke Sun, 07 Sep 2003 04:09:37 -0700

This look okay?


--- perlpod.pod~core    Mon Sep  1 22:15:26 2003
+++ perlpod.pod Fri Sep  5 02:39:04 2003
@@ -269,6 +269,24 @@
 normal formatting (e.g., may not be a normal-use paragraph, but might
 be for formatting as a footnote).
 
+
+=item C<=encoding I<encodingname>>
+
+This command is used for declaring the encoding of a document.  Most
+users won't need this; but if your encoding isn't US-ASCII or Latin-1,
+then put a C<=encoding I<encodingname>> command early in the document so
+that pod formatters will know how to decode the document.  For
+I<encodingname>, use a name recognized by the L<Encode::Supported>
+module.  Examples:
+
+  =encoding utf8
+
+  =encoding koi8-r
+  
+  =encoding ShiftJIS
+  
+  =encoding big5
+
 =back
 
 And don't forget, when using any command, that the command lasts up


--- perlpodspec.pod~core        Mon Sep  1 22:15:26 2003
+++ perlpodspec.pod     Fri Sep  5 02:52:18 2003
@@ -332,6 +332,30 @@
 to use "=for formatname text..." to express "text..." as a verbatim
 paragraph.
 
+=item "=encoding encodingname"
+
+This command, which should occur early in the document (at least
+before any non-US-ASCII data!), declares that this document is
+encoded in the encoding I<encodingname>, which must be
+an encoding name that L<Encode> recognizes.  (Encode's list
+of supported encodings, in L<Encode::Supported>, is useful here.)
+If the Pod parser cannot decode the declared encoding, it 
+should emit a warning and may abort parsing the document
+altogether.
+
+A document having more than one "=encoding" line should be
+considered an error.  Pod processors may silently tolerate this if
+the not-first "=encoding" lines are just duplicates of the
+first one (e.g., if there's a "=encoding utf8" line, and later on
+another "=encoding utf8" line).  But Pod processors should complain if
+there are contradictory "=encoding" lines in the same document
+(e.g., if there is a "=encoding utf8" early in the document and
+"=encoding big5" later).  Pod processors that recognize BOMs
+may also complain if they see an "=encoding" line
+that contradicts the BOM (e.g., if a document with a UTF-16LE BOM
+has an "=encoding shiftjis" line).
+
+
 =back
 
 If a Pod processor sees any command other than the ones listed
@@ -569,12 +593,14 @@
 being UTF-8 if the first highbit byte sequence in the file seems
 valid as a UTF-8 sequence, or otherwise as Latin-1.
 
-Future versions of this specification may specify
-how Pod can accept other encodings.  Presumably treatment of other
-encodings in Pod parsing would be as in XML parsing: whatever the
-encoding declared by a particular Pod file, content is to be
-stored in memory as Unicode characters.
+It is, however, good practice to explicitly declare the document
+as UTF-8, with an "=encoding utf8" line, early in the document.
 
+At time of writing (2003), UTF-8 is considered the normal encoding
+for Unicode; the UTF-16 encodings are not widely used,
+and support for them in various Perl tools is spotty.
+
+
 =item *
 
 The well known Unicode Byte Order Marks are as follows:  if the
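
For anyone who wants to see the multiple-"=encoding" rule from the perlpodspec hunk spelled out, here is a rough sketch of it. It is written in Python only to stay independent of any particular Pod parser, and it is not how any existing Pod module does this. The names check_encoding_lines and normalize are made up for illustration, and the name comparison (case-insensitive, ignoring "-" and "_") is my assumption, not something the patch specifies. It also ignores the paragraph structure of Pod and just looks for lines that start with "=encoding".

```python
import re

# An "=encoding NAME" command at the very start of a line.
# Indented (verbatim) lines do not match, because "=" must be in column 0.
ENCODING_RE = re.compile(r"^=encoding[ \t]+(\S+)", re.MULTILINE)


def normalize(name):
    # Assumption for this sketch: names compare case-insensitively and
    # ignore "-" and "_", so "UTF-8" and "utf8" count as the same encoding.
    return re.sub(r"[-_]", "", name).lower()


def check_encoding_lines(pod_text):
    """Return (first_declared_encoding_or_None, list_of_complaints)."""
    names = ENCODING_RE.findall(pod_text)
    if not names:
        return None, []
    first = names[0]
    complaints = []
    for later in names[1:]:
        # Exact duplicates of the first declaration are tolerated silently;
        # anything else contradicts it and gets a complaint.
        if normalize(later) != normalize(first):
            complaints.append(
                "contradictory =encoding lines: %r then %r" % (first, later)
            )
    return first, complaints
```

With this sketch, a document that repeats "=encoding utf8" yields no complaints, while one that says "=encoding utf8" and later "=encoding big5" yields one complaint and still reports utf8 as the declared encoding. Checking the declaration against a BOM would be a separate step and is not shown.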
