Sean M Burke <[EMAIL PROTECTED]> writes:

> Here's a problem I've run up against in testing Pod::PXML by
> round-tripping existing POD: what character-encoding is POD in?

Pure ASCII.  Any non-ASCII characters have to be expressed as E<> escapes.
Anything else will break pod2man at present, because while groff may have
some ability to handle high-bit characters, the vendor nroff commands
generally don't, and pod2man doesn't as yet know enough to catch high-bit
characters and convert them to the appropriate escapes.
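As a rough illustration of the conversion pod2man does not yet perform, here is a minimal sketch that rewrites every non-ASCII character as a numeric E<> escape. It is not pod2man's code; the function name escape_non_ascii is hypothetical, and it assumes the input has already been decoded to characters and that the formatter reading the result accepts decimal E<number> escapes.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with a decimal POD escape, E<N>.

    Assumption: `text` is already decoded, so ord() yields the code point.
    """
    return "".join(
        ch if ord(ch) < 128 else f"E<{ord(ch)}>"
        for ch in text
    )


if __name__ == "__main__":
    print(escape_non_ascii("caf\u00e9"))
```

Note that this only helps outside verbatim blocks, since E<> escapes are not interpreted there.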

> What I'm leaning toward is to assume that all POD is in /either/ UTF8 or
> Latin-1 (or US-ASCII, in which case the difference is moot), and that
> one should start out treating it as Latin-1, scan all clusters of
> ([\x80-\xFF]+) to see if they look like UTF8, and if they all do, then
> put the whammy on the text so that it's magically to be considered
> thenceforth as Unicode.

Since it's in neither right now, if we decide to accept non-ASCII
characters in POD, I vote for just flatly declaring them to be UTF-8 and
modifying all the translators as appropriate and being done with it.  UTF-8
is clearly the direction that everything is going, so taking a detour
through ISO 8859-1 seems pointless.
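For reference, the detection heuristic quoted above (treat the input as Latin-1 unless every run of high-bit bytes is valid UTF-8) could be sketched roughly as follows. This is only an illustration of the idea, not code from Pod::PXML; the names looks_like_utf8 and decode_pod are hypothetical.

```python
import re

# Runs of bytes with the high bit set, as in the quoted ([\x80-\xFF]+).
HIGH_BIT_RUN = re.compile(rb"[\x80-\xFF]+")


def looks_like_utf8(raw: bytes) -> bool:
    """Return True if every high-bit run decodes as strict UTF-8.

    Every byte of a multi-byte UTF-8 sequence has the high bit set, so
    each run can be checked on its own. Pure ASCII input has no runs
    and therefore trivially passes.
    """
    for match in HIGH_BIT_RUN.finditer(raw):
        try:
            match.group().decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


def decode_pod(raw: bytes) -> str:
    """Decode as UTF-8 when the heuristic passes, else fall back to Latin-1."""
    return raw.decode("utf-8" if looks_like_utf8(raw) else "latin-1")
```

The heuristic can misfire: a short Latin-1 sequence that happens to be valid UTF-8 will be read as UTF-8, which is one reason simply declaring the encoding is more predictable.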

> Now, there's a converse problem when emitting text as POD: encode it as
> UTF8, or as Latin-1?  (Remember, the difference currently arises /only/
> when there's high-bit content in verbatim blocks -- everywhere else, you
> can use an E<...>.)

High-bit content in verbatim blocks currently isn't supported in POD,
period.  It happens to kind of work for pod2html and pod2text if your HTML
and text browsers agree with the character set of the original author, but
that's by pure chance, and it will make many vendor nroff implementations
explode if you use pod2man.

I agree that this should change; I'm just pointing out that you don't have
to worry about backward compatibility because up until now, this never
worked.

> I'm leaning toward "verbatims always come out in UTF8", for sake of
> uniformity.

Yup.
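A minimal sketch of that emitting rule, under stated assumptions: paragraphs are separated by blank lines, a verbatim paragraph is one whose first line begins with a space or tab, and everything else can carry decimal E<> escapes. The function name emit_pod is hypothetical and this is not any existing translator's code.

```python
import re


def emit_pod(text: str) -> bytes:
    """Emit POD with verbatim paragraphs left as-is and encoded as UTF-8.

    Non-verbatim paragraphs have non-ASCII characters turned into E<N>
    escapes, so only verbatim blocks can contain high-bit bytes.
    """
    out = []
    # The capturing group keeps the blank-line separators in the result.
    for para in re.split(r"(\n{2,})", text):
        if para.startswith((" ", "\t", "\n")):
            # Verbatim paragraph or separator: pass through untouched.
            out.append(para)
        else:
            out.append(
                "".join(c if ord(c) < 128 else f"E<{ord(c)}>" for c in para)
            )
    return "".join(out).encode("utf-8")
```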

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
