At 08:57 2002-11-13 -0800, Russ Allbery wrote:
> Slaven Rezic <[EMAIL PROTECTED]> writes:
>> According to man groff_char, you do not have to make any conversions
>> to latin1 characters from 161 to 255.
>
> That's assuming that the character set is latin1. The POD specification says that POD should handle Unicode, which will require different handling. (That's part of why this is rather complicated.)

Well, Pod parsers should try to accept Unicode as an input character set. But that doesn't imply anything about the output range of any particular pod2whatever formatting application.
The route I advise/assume is to expand all E<...>'s to their characters, then escape all characters (as necessary) to whatever escape sequence the output format needs, and then just kill whatever the output format can't represent (like all sorts of Unicode arcana). This approach doesn't distinguish between an é that started out as a literal é (whether in Latin-1 or some Unicode encoding), an E<233>, an E<0xe9>, or an E<eacute>.
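As a rough illustration of the first step (expanding E<...> codes so that every spelling of é collapses to the same character), here is a minimal Python sketch. The names expand_entity and expand_all are made up for this example, and borrowing Python's HTML 4 entity table for the named codes is an assumption, not how any particular Pod formatter does it.

```python
import re
from html.entities import name2codepoint

# Pod-specific names that are not in the HTML 4 entity table.
POD_SPECIAL = {"lt": "<", "gt": ">", "verbar": "|", "sol": "/"}

def expand_entity(body: str) -> str:
    """Turn the inside of one E<...> code into a single character."""
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", body):   # E<0xe9>: hexadecimal
        return chr(int(body, 16))
    if re.fullmatch(r"0[0-7]+", body):             # E<0351>: octal
        return chr(int(body, 8))
    if re.fullmatch(r"[0-9]+", body):              # E<233>: decimal
        return chr(int(body, 10))
    if body in POD_SPECIAL:
        return POD_SPECIAL[body]
    if body in name2codepoint:                     # E<eacute>: named entity
        return chr(name2codepoint[body])
    return "E<" + body + ">"                       # unknown: leave untouched

def expand_all(text: str) -> str:
    """Expand every E<...> code in a string of Pod text."""
    return re.sub(r"E<([^<>]+)>", lambda m: expand_entity(m.group(1)), text)

if __name__ == "__main__":
    for sample in ("caf\u00e9", "cafE<233>", "cafE<0xe9>", "cafE<eacute>"):
        print(repr(expand_all(sample)))
```

After this step the formatter only ever sees characters, so the escaping and killing steps don't need to know how a character was originally written.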
For example, for pre-Unicode RTF, this approach means turning \ { } \x00-\x1F and \x80-\xFF into "\'xx" escapes, and then doing s/[^\x00-\xff]/*/g, since there was no way of representing/escaping characters over 0xFF in RTF. (But now there is, incidentally; it's a different escape mechanism altogether.)
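The same escape-then-kill idea for pre-Unicode RTF might look like the following in Python. This is only a sketch of the two substitutions described above; the function name rtf_escape_legacy is hypothetical, and a real formatter would probably treat newlines and tabs specially rather than escaping them like every other control character.

```python
import re

# Backslash, braces, control characters, and the high half of Latin-1.
NEEDS_ESCAPE = re.compile(r"[\\{}\x00-\x1f\x80-\xff]")
# Anything above 0xFF, which pre-Unicode RTF has no way to represent.
UNREPRESENTABLE = re.compile(r"[^\x00-\xff]")

def rtf_escape_legacy(text: str) -> str:
    """Escape text for pre-Unicode RTF; assumes E<...> codes are already expanded."""
    # Step 1: escape what RTF can carry but not literally, as \'xx (two hex digits).
    escaped = NEEDS_ESCAPE.sub(lambda m: "\\'%02x" % ord(m.group(0)), text)
    # Step 2: kill the rest, as in s/[^\x00-\xff]/*/g.
    return UNREPRESENTABLE.sub("*", escaped)

if __name__ == "__main__":
    print(rtf_escape_legacy("caf\u00e9 {x} \u263a"))
```

The order of the two steps doesn't matter here, because the escapes produced in step 1 are plain ASCII and step 2 only touches characters above 0xFF.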
--
Sean M. Burke http://search.cpan.org/author/sburke/
