At 08:10 2001-08-13 +0200, Philip Newton wrote:
>[...]
>> > So what should happen with literal byte values over 127?
>
>What *should* happen with them, then? The way I read Sean M. Burke's 
>reply, "starts with BOM means UTF-16, otherwise some undefined 
>encoding, presumably the platform native", with no way to specify UTF-
>8.

Actually, I said quite the opposite.
 Since Perl recognizes a Unicode Byte Order Mark at the start of files
 as signaling that the file is Unicode encoded as in UTF-16 (whether
 big-endian or little-endian), pod parsers should do the same.
 Otherwise, the character encoding should be understood as being
 UTF-8.

That's from perlpodspec, draft 1.
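
As a rough illustration of that rule (a sketch only, in Python rather than Perl, and not taken from any actual pod parser), the decision could look like the following. The function names guess_pod_encoding and decode_pod are hypothetical.

```python
import codecs


def guess_pod_encoding(data: bytes) -> str:
    """Pick an encoding for raw pod bytes per the draft rule quoted above.

    A leading UTF-16 byte order mark selects UTF-16 (either endianness);
    anything else is taken to be UTF-8. Hypothetical helper for illustration.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return "utf-8"


def decode_pod(data: bytes) -> str:
    """Decode pod bytes, dropping the 2-byte BOM when one selected UTF-16."""
    encoding = guess_pod_encoding(data)
    if encoding.startswith("utf-16"):
        data = data[2:]  # the BOM is a signature, not content
    return data.decode(encoding)
```

Note that under this rule a Latin-1 file with high-bit bytes has no way to announce itself, which is the gap discussed below.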

>Unless, say, UTF-8 is mandated or we get the possibility of putting 
>a faux UTF-8 BOM at the beginning of the file as a charset signature.

I have been loath to introduce completely new features to pod in this round
of perlpod at least, but I am tempted to provide some mechanism for people
who want to use arbitrary non-UTF8 encodings (or at least just Latin-1).
Otherwise there's no way to get out of the default (UTF8) other than a BOM.

Altho such a thing (i.e., a mechanism for explicitly saying "this is UTF8"
or "this is Latin-1"/etc.) would not be quite so necessary if the first
high-bit sequence seen from an otherwise undisciplined handle would throw
it into the appropriate mode of utf8 / plain-8bit (or if an initial BOM is
seen, a utf16 mode).  DWIM DWIM DWIM!
Or maybe this could be triggered by a special ":auto" discipline, rather
than by a /lack/ of a discipline.
AND if perl itself read its source files that way, that'd be hunky-dory
too, I think.
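
To make the ":auto" idea concrete, here is a minimal sketch of the heuristic, again in Python and purely illustrative. It assumes the mode is fixed by the first high-bit byte sequence alone: if that sequence is well-formed UTF-8 the input is treated as UTF-8, otherwise as plain 8-bit (Latin-1 here, as a stand-in for "platform native"). The name auto_mode is hypothetical, and no such discipline is claimed to exist in perl.

```python
import codecs


def auto_mode(data: bytes) -> str:
    """Guess a decoding mode: 'utf-16', 'utf-8', 'latin-1', or 'ascii'.

    Hypothetical DWIM heuristic: an initial BOM means UTF-16; otherwise the
    first high-bit byte sequence decides between UTF-8 and plain 8-bit.
    """
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    for i, byte in enumerate(data):
        if byte < 0x80:
            continue  # plain ASCII tells us nothing yet
        # Expected sequence length from the UTF-8 lead byte.
        if 0xC2 <= byte <= 0xDF:
            length = 2
        elif 0xE0 <= byte <= 0xEF:
            length = 3
        elif 0xF0 <= byte <= 0xF4:
            length = 4
        else:
            return "latin-1"  # not a valid UTF-8 lead byte
        try:
            data[i:i + length].decode("utf-8")
        except UnicodeDecodeError:
            return "latin-1"  # malformed or truncated sequence
        return "utf-8"
    return "ascii"  # no high-bit bytes at all; any of the modes would do
```

The obvious weakness is that some Latin-1 text happens to form valid UTF-8 sequences (for example the bytes 0xC3 0xA9), so a heuristic like this can guess wrong; an explicit declaration would not have that problem.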


--
Sean M. Burke    [EMAIL PROTECTED]    http://www.spinn.net/~sburke/
