Non-ASCII data in POD

Grant McLean Wed, 25 Apr 2012 20:23:44 -0700

Hi POD people

There's been a discussion on #metacpan about non-ASCII characters in POD
being rendered incorrectly on the metacpan.org web site.


The short story is that some people put UTF-8 encoded characters in
their POD without including an '=encoding utf8' declaration.  Apparently
the metacpan tool chain then assumes Latin-1, so the characters come out
wrong; with the right encoding declaration they would be rendered
correctly.

The latest perlpodspec seems to imply that ASCII is the default and that
anything else should be declared with =encoding.  In the implementation
notes section it also suggests a heuristic: check whether the first
high-bit byte sequence is valid as UTF-8, then default to UTF-8 if it is
and to Latin-1 otherwise.
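
To make the heuristic concrete, here is a rough sketch of how I read it.
It is written in Python purely for illustration and is not Pod::Simple
code; the function name guess_pod_encoding is made up, and treating "the
first high-bit byte sequence" as the first unbroken run of bytes >= 0x80
is my assumption about what perlpodspec means.

```python
import re

# First unbroken run of bytes with the high bit set (assumed reading of
# "first highbit byte-sequence" in perlpodspec).
HIGHBIT_RUN = re.compile(rb"[\x80-\xff]+")


def guess_pod_encoding(data: bytes) -> str:
    """Guess the encoding of POD source that has no =encoding line.

    Hypothetical helper: returns "ascii", "utf-8" or "latin-1".
    """
    match = HIGHBIT_RUN.search(data)
    if match is None:
        # No high-bit bytes at all, so plain ASCII is a safe answer.
        return "ascii"
    try:
        # Strict decode: fails on anything that is not well-formed UTF-8.
        match.group().decode("utf-8")
    except UnicodeDecodeError:
        return "latin-1"
    return "utf-8"
```

Note that only the first run is examined, so a Latin-1 file whose first
high-bit bytes happen to form a valid UTF-8 sequence would be misdetected.
That is a limitation of the heuristic itself, not just of this sketch.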

This raises two issues:

1) Pod::Simple (as used by metacpan) does not seem to implement this
   heuristic
2) We need to educate people who are not aware of the =encoding command

My thoughts on the second issue are that we could modify Pod::Simple to
'whine' if it sees non-ASCII bytes but no =encoding.  This in turn would
cause Test::Pod to pick up the error and help people fix it.
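
The check I have in mind would be roughly the following.  Again this is
an illustrative Python sketch, not a patch: the function name
missing_encoding_warnings and the warning text are invented, and for
simplicity it treats an =encoding line anywhere in the file as sufficient.

```python
import re

# An =encoding command at the start of a line, followed by a name.
ENCODING_LINE = re.compile(rb"^=encoding\s+\S+", re.MULTILINE)


def missing_encoding_warnings(data: bytes) -> list:
    """Return a one-item list of warnings, or an empty list if all is well.

    Hypothetical helper: warns when non-ASCII bytes appear in POD source
    that contains no =encoding declaration.
    """
    if ENCODING_LINE.search(data):
        return []
    for lineno, line in enumerate(data.splitlines(), start=1):
        # Iterating over bytes yields integers; >= 0x80 means non-ASCII.
        if any(byte >= 0x80 for byte in line):
            return [
                "line %d: non-ASCII byte found but no =encoding declaration"
                % lineno
            ]
    return []
```

Reporting only the first offending line keeps the output short; one
warning should be enough to prompt the author to add the declaration.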

I'd be happy to look at implementing both these things if it's agreed
they're a good idea.

Regards
Grant
