On Mon, 2012-04-30 at 14:24 +0200, Johan Vromans wrote:
> Grant McLean <gr...@mclean.net.nz> writes:
> 
> > OK, so I went ahead and implemented both the warning and the heuristic
> > to guess Latin-1 vs UTF-8 (only when no encoding was specified).  The
> > resulting patch is here:
> >
> >   https://github.com/theory/pod-simple/pull/26
> 
> This patch enforces authors to add an "=encoding UTF-8" line to
> specify that the doc is, indeed, UTF-8 encoded.

Not exactly.  It generates a warning during the parsing process which
will be visible in the output of any formatter that has error output
enabled.  It's not a fatal error so it doesn't exactly "enforce"
anything.

The aim is to help people comply with the spec for POD as it is
currently written.  And that spec says that if there are non-ASCII
characters there must be an =encoding declaration.

> Wouldn't it be far better to consider all POD documents to be Utf-8
> encoded Unicode and fall back to Latin1 if invalid UTF-8 sequences are
> detected?

You won't get any argument from me that UTF-8 would be a better default,
but that's not how the spec is currently written.

If your Perl source code includes UTF-8 characters, you must say:

  use utf8;

If your POD includes UTF-8 characters, you must say:

  =encoding utf8

> In other words, do not enforce the author to add "=encoding
> UTF-8" since that's the default? And only add "=encoding ISO8859-1" for
> Latin1 encoded documents?

The patch also implements the heuristic recommended in perlpodspec, so
a document with no declaration will work whether it is Latin-1 or UTF-8
(the default is ASCII).  This will be a win for sites like metacpan.org,
which currently don't display UTF-8 correctly from POD that lacks an
=encoding declaration.
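
As a rough illustration of that kind of guess, here is a minimal sketch.
It is written in Python rather than the Perl used in the patch, the
function name guess_pod_encoding is hypothetical, and it checks the whole
document strictly, whereas my reading of perlpodspec is that it only
suggests checking the first high-bit byte sequence.  It is not the code
from the pull request.

```python
def guess_pod_encoding(raw: bytes) -> str:
    """Guess the encoding of POD bytes that carry no =encoding line.

    Assumption: this simplified check validates the entire input as
    UTF-8; the real patch may only inspect part of the document.
    """
    # Pure ASCII needs no guess (and would not trigger a warning).
    if raw.isascii():
        return "ascii"
    try:
        # Strict decoding raises on any invalid UTF-8 byte sequence.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # High-bit bytes that are not valid UTF-8: fall back to Latin-1.
        return "latin-1"
    return "utf-8"
```

The fallback works because every byte string is valid Latin-1, while
Latin-1 text containing accented characters is rarely valid UTF-8 by
accident.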

Any formatter that has error display disabled will simply render UTF-8
better with this patch.

Additionally, if errors are displayed, the non-compliance with
perlpodspec will be reported.

Regards
Grant

