Grant McLean <gr...@mclean.net.nz> writes:

> OK, so I went ahead and implemented both the warning and the heuristic
> to guess Latin-1 vs UTF-8 (only when no encoding was specified).  The
> resulting patch is here:
>
>   https://github.com/theory/pod-simple/pull/26

This patch forces authors to add an "=encoding UTF-8" line to
specify that the doc is, indeed, UTF-8 encoded.

Wouldn't it be far better to treat all POD documents as UTF-8
encoded Unicode and fall back to Latin-1 if invalid UTF-8 sequences are
detected? In other words, do not require the author to add "=encoding
UTF-8" since that's the default? And only add "=encoding ISO8859-1" for
Latin-1 encoded documents?

Since most POD documents currently are ASCII, they won't be affected.

POD docs that are Latin-1 or something similar must get an explicit
encoding line added. These are precisely the documents affected by your
patch.
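
To make the proposal concrete, here is a rough sketch of the decoding
rule I have in mind. It is written in Python purely for illustration and
is not taken from the patch or from Pod::Simple; the function name
decode_pod and its "declared" parameter are hypothetical.

```python
from typing import Optional, Tuple


def decode_pod(raw: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, encoding_used) for the raw bytes of a POD document.

    Hypothetical helper: 'declared' stands for the value of an explicit
    "=encoding" line, or None when the document has no such line.
    """
    if declared is not None:
        # An explicit "=encoding" line always wins.
        return raw.decode(declared), declared
    try:
        # Default assumption: the document is UTF-8 (pure ASCII is valid UTF-8).
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Invalid UTF-8 sequence found, so fall back to Latin-1.
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"
```

With this rule an ASCII document and a UTF-8 document both decode as
UTF-8 without any "=encoding" line, and a document containing a lone
high byte such as 0xE9 falls back to Latin-1. A Latin-1 document whose
high bytes happen to form valid UTF-8 sequences would be misread, which
is the case where an explicit encoding line is still needed.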

-- Johan
