On Thu, 2012-04-26 at 15:23 +1200, Grant McLean wrote:
> Hi POD people
> 
> There's been a discussion on #metacpan about non-ASCII characters in POD
> being rendered incorrectly on the metacpan.org web site.
> 
> The short story is that some people use utf8 characters without
> including: =encoding utf8.  Apparently the metacpan tool chain assumes
> latin1 encoding, but with the right encoding declaration, the characters
> would be rendered correctly.
> 
> The latest perlpodspec seems to imply an ASCII default and anything else
> should have an =encoding.  In the implementation notes section it also
> suggests a heuristic of checking whether the first highbit byte-sequence
> is valid as UTF-8 and default to UTF-8 if so and Latin-1 otherwise.
> 
> This raises two issues:
> 
> 1) Pod::Simple (as used by metacpan) does not seem to implement this
>    heuristic
> 2) We need to educate people who are not aware of the =encoding command
> 
> My thoughts on the second issue are that we could modify Pod::Simple to
> 'whine' if it sees non-ASCII bytes but no =encoding.  This in turn would
> cause Test::Pod to pick up the error and help people fix it.
> 
> I'd be happy to look at implementing both these things if it's agreed
> they're a good idea.

OK, so I went ahead and implemented both the warning and the heuristic
to guess Latin-1 vs UTF-8 (only when no encoding was specified).  The
resulting patch is here:

  https://github.com/theory/pod-simple/pull/26
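
For anyone who wants the gist without reading the diff, here is a rough sketch of the two behaviours described above. It is not the code from the pull request (that is Perl, inside Pod::Simple). It is a stand-alone Python illustration, and the function name, the regexes and the warning text are all made up for the example. It assumes the "first highbit byte-sequence" means the first run of consecutive bytes with the high bit set.

```python
import re

# Hypothetical helpers for illustration only; not part of Pod::Simple.
ENCODING_RE = re.compile(rb"^=encoding\s+(\S+)", re.MULTILINE)
HIGHBIT_RUN_RE = re.compile(rb"[\x80-\xff]+")


def guess_pod_encoding(raw: bytes):
    """Return (encoding, warning) for raw POD bytes.

    warning is None unless non-ASCII bytes appear without an =encoding.
    """
    declared = ENCODING_RE.search(raw)
    if declared:
        # An explicit declaration always wins; no guessing, no warning.
        return declared.group(1).decode("ascii", "replace"), None

    run = HIGHBIT_RUN_RE.search(raw)
    if run is None:
        # Pure ASCII: nothing to guess and nothing to complain about.
        return "ascii", None

    # Heuristic: if the first run of high-bit bytes is valid UTF-8,
    # assume UTF-8; otherwise fall back to Latin-1.
    try:
        run.group(0).decode("utf-8")
        guess = "utf-8"
    except UnicodeDecodeError:
        guess = "latin-1"

    line_no = raw.count(b"\n", 0, run.start()) + 1
    warning = (
        f"line {line_no}: non-ASCII byte seen without =encoding; "
        f"assuming {guess}"
    )
    return guess, warning
```

Calling guess_pod_encoding() on the UTF-8 bytes of a document containing "café" with no =encoding line gives "utf-8" plus a warning, while the same text encoded as Latin-1 gives "latin-1" plus a warning. The guess is only a heuristic: a Latin-1 document whose first high-bit run happens to be valid UTF-8 will be misdetected, which is why the warning still matters.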

Regards
Grant


