Right now, all my Pod-handling code basically ignores =encoding.  It doesn't
know anything about what it does or what it's for.  I do not plan to add
support for much of it, because for the most part I don't think it's worth the
time.  My plan is to, more or less, do this:

  * assume documents are in ASCII unless =encoding appears
  * raise an exception on non-ASCII (high-bit) bytes unless =encoding appears
  * accept the instruction "=encoding utf-8" as meaning the document is UTF-8
  * raise an exception on any other =encoding instruction
  * possibly raise an exception if =encoding is not the first directive
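
To make those rules concrete, here is a rough sketch of them.  It is in
Python rather than Perl purely for illustration, and the names pod_encoding,
PodEncodingError, and strict_position are made up for this example; they do
not come from any existing Pod library.  It also assumes that any line
starting with "=" and a letter is a directive, which is looser than real Pod
paragraph parsing.

```python
import re


class PodEncodingError(Exception):
    """Hypothetical exception for documents that break the rules above."""


# Simplification: treat any line starting with "=" plus a letter as a directive.
DIRECTIVE = re.compile(rb"^=([A-Za-z]\S*)[ \t]*(.*?)[ \t]*\r?$", re.MULTILINE)


def pod_encoding(source: bytes, strict_position: bool = False) -> str:
    """Return "ascii" or "utf-8" for raw Pod octets, or raise PodEncodingError."""
    directives = [(m.group(1), m.group(2)) for m in DIRECTIVE.finditer(source)]
    declared = [(i, arg) for i, (name, arg) in enumerate(directives)
                if name == b"encoding"]

    if not declared:
        # No =encoding: the document must be pure ASCII.
        if any(byte > 0x7F for byte in source):
            raise PodEncodingError("high-bit byte found without =encoding")
        return "ascii"

    for _, arg in declared:
        # Assumption: the encoding name is matched case-insensitively.
        if arg.lower() != b"utf-8":
            raise PodEncodingError("unsupported =encoding %r" % arg)

    # The optional rule: =encoding must be the first directive.
    if strict_position and declared[0][0] != 0:
        raise PodEncodingError("=encoding is not the first directive")

    return "utf-8"
```

The returned name would then be used to decode only the Pod paragraphs; the
"possibly" rule is left behind a flag because it may turn out to be too strict.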

I know this is not entirely compliant, but I think it's good enough for my
purposes.  I should have almost no problem decoding only the Pod.  Non-Pod
paragraphs can be left as octets.

My only question is:  how shall I handle data paragraphs?  For example:

  =encoding utf-8

  =begin data

  This is a data paragraph with a UTF-8 sequence right here: €

  ...and this is part of the same data paragraph (because they're all
  combined.)

  =end data

  Look, an ordinary paragraph.

If that Pod is converted to an element tree and the data paragraph is
extracted, should its contents be a character string or byte string?
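
To spell out what the two answers would look like for the first line of that
data paragraph, here is a small illustration (Python again, only as a
sketch; the variable names are invented for the example):

```python
# The first line of the data paragraph as it sits in the file: UTF-8 octets.
raw = "This is a data paragraph with a UTF-8 sequence right here: \u20ac".encode("utf-8")

# Option 1: byte string.  The octets are handed through untouched.
as_bytes = raw

# Option 2: character string.  The octets are decoded according to =encoding.
as_text = raw.decode("utf-8")

# The euro sign is three octets in the byte string...
euro_octets = as_bytes[-3:]    # b"\xe2\x82\xac"
# ...and a single character (U+20AC) in the character string.
euro_char = as_text[-1]
```

With a byte string, whatever consumes the data paragraph has to know the
document's encoding itself; with a character string, it gets text but loses
the original octets.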

-- 
rjbs
