At 08:10 2001-08-13 +0200, Philip Newton wrote:
>[...]
>> > So what should happen with literal byte values over 127?
>
>What *should* happen with them, then? The way I read Sean M. Burke's
>reply, "starts with BOM means UTF-16, otherwise some undefined
>encoding, presumably the platform native", with no way to specify
>UTF-8.

Actually, I said quite the opposite:

  Since Perl recognizes a Unicode Byte Order Mark at the start of files
  as signaling that the file is Unicode encoded in UTF-16 (whether
  big-endian or little-endian), pod parsers should do the same.
  Otherwise, the character encoding should be understood as being UTF-8.

That's from perlpodspec, draft 1.

>Unless, say, UTF-8 is mandated or we get the possibility of putting
>a faux UTF-8 BOM at the beginning of the file as a charset signature.

I have been loath to introduce completely new features to pod in this
round of perlpod, at least, but I am tempted to provide some mechanism
for people who want to use arbitrary non-UTF-8 encodings (or at least
just Latin-1). Otherwise, there's no way to get out of the default
(UTF-8) other than a BOM.

Although such a thing (i.e., a mechanism for explicitly saying "this is
UTF-8" or "this is Latin-1", etc.) would not be quite so necessary if
the first high-bit sequence read from an otherwise undisciplined handle
threw it into the appropriate mode, utf8 or plain 8-bit (or, if an
initial BOM is seen, a utf16 mode). DWIM DWIM DWIM!

Or maybe this could be triggered by a special ":auto" discipline,
rather than by a /lack/ of a discipline.

AND if perl itself read its source files that way, that'd be
hunky-dory too, I think.

--
Sean M. Burke    [EMAIL PROTECTED]    http://www.spinn.net/~sburke/
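The detection rule quoted from perlpodspec (a BOM means UTF-16, otherwise UTF-8), combined with the ":auto"-style DWIM idea of deciding on the first high-bit sequence, can be sketched roughly as below. This is a Python illustration only, not code from Perl or any pod parser; the function names `guess_pod_encoding` and `_utf8_sequence_at` are hypothetical, honoring a faux UTF-8 BOM follows Philip Newton's suggestion rather than the draft spec, and treating "plain 8-bit" as Latin-1 is an assumption.

```python
import codecs


def _utf8_sequence_at(data: bytes, i: int) -> bool:
    """Hypothetical helper: does a well-formed UTF-8 sequence start at data[i]?"""
    lead = data[i]
    # Sequence length is implied by the lead byte.
    if 0xC2 <= lead <= 0xDF:
        n = 2
    elif 0xE0 <= lead <= 0xEF:
        n = 3
    elif 0xF0 <= lead <= 0xF4:
        n = 4
    else:
        return False  # stray continuation byte or invalid lead byte
    try:
        # Strict decoding rejects truncated sequences, overlongs and surrogates.
        data[i:i + n].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def guess_pod_encoding(data: bytes) -> str:
    """Hypothetical ':auto'-style guess; returns a Python codec name."""
    # 1. An initial UTF-16 BOM selects UTF-16, in either byte order.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # Assumption: a "faux" UTF-8 BOM is honored as a charset signature.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # 2. The first byte with the high bit set decides utf8 vs plain 8-bit.
    for i, byte in enumerate(data):
        if byte >= 0x80:
            # Assumption: "plain 8-bit" is treated as Latin-1.
            return "utf-8" if _utf8_sequence_at(data, i) else "latin-1"
    # 3. Pure 7-bit data: the UTF-8 default reads it identically to ASCII.
    return "utf-8"
```

Looking only at the first high-bit sequence keeps the check cheap, but it can misjudge a Latin-1 file whose first accented bytes happen to form valid UTF-8 (for example "Ã©", which is the bytes C3 A9), which is one argument for also having an explicit way to declare the encoding.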
