Right now, all my Pod-handling code basically ignores =encoding. It doesn't know anything about what the directive does or what it's for. I do not plan to add support for much of it, because for the most part I don't think it's worth the time. My plan is, more or less, to do this:

* assume documents are in ASCII unless =encoding appears
* raise an exception on 8-bit characters unless =encoding appears
* accept the instruction "=encoding utf-8" as meaning the document is UTF-8
* raise an exception on any other =encoding instruction
* possibly raise an exception if =encoding is not the first directive

I know this is not entirely compliant, but I think it's good enough for my purposes. I should have almost no problem decoding only the Pod. Nonpod paragraphs can be left as octets.

My only question is: how shall I handle data paragraphs? For example:

    =encoding utf-8

    =begin data

    This is a data paragraph with a UTF-8 sequence right here: €

    ...and this is part of the same data paragraph (because they're all combined.)

    =end data

    Look, an ordinary paragraph.

If that Pod is converted to an element tree and the data paragraph is extracted, should its contents be a character string or byte string?

-- rjbs
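
For illustration, the policy in the bullet list above could be sketched roughly as follows. This is Python, used only as a convenient notation, and it is not the code discussed in this post: pod_encoding, PodEncodingError, and require_first are hypothetical names. The sketch also makes two simplifying assumptions: any line beginning with "=" and a letter is treated as a directive, and the 8-bit check runs over the whole file rather than only the Pod paragraphs. It does not try to answer the data-paragraph question.

```python
import re


class PodEncodingError(ValueError):
    """Raised when a document violates the simplified =encoding policy."""


# Assumption: any line starting with "=" plus a letter is a directive.
# Real Pod only recognizes a directive at the start of a paragraph.
_DIRECTIVE = re.compile(rb"^=([A-Za-z]\w*)[ \t]*(.*?)[ \t]*\r?$", re.MULTILINE)


def pod_encoding(raw: bytes, require_first: bool = False) -> str:
    """Return "ascii" or "utf-8" for a raw document, or raise PodEncodingError."""
    directives = [(m.group(1), m.group(2)) for m in _DIRECTIVE.finditer(raw)]
    encodings = [
        (index, arg)
        for index, (name, arg) in enumerate(directives)
        if name == b"encoding"
    ]

    if not encodings:
        # No =encoding: the document must be pure ASCII.
        if any(byte > 0x7F for byte in raw):
            raise PodEncodingError("8-bit character found without =encoding")
        return "ascii"

    # Only "=encoding utf-8" is accepted (case-insensitively, by assumption).
    for _, arg in encodings:
        if arg.lower() != b"utf-8":
            shown = arg.decode("ascii", "replace")
            raise PodEncodingError(f"unsupported =encoding: {shown!r}")

    # Optional rule: =encoding has to be the first directive in the file.
    if require_first and encodings[0][0] != 0:
        raise PodEncodingError("=encoding is not the first directive")

    return "utf-8"
```

A caller would then decode the Pod paragraphs with the returned encoding name and leave everything else as octets.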