On Thu, 2012-04-26 at 15:23 +1200, Grant McLean wrote:
> Hi POD people
>
> There's been a discussion on #metacpan about non-ASCII characters in POD
> being rendered incorrectly on the metacpan.org web site.
>
> The short story is that some people use utf8 characters without
> including: =encoding utf8. Apparently the metacpan tool chain assumes
> latin1 encoding, but with the right encoding declaration, the characters
> would be rendered correctly.
>
> The latest perlpodspec seems to imply an ASCII default and anything else
> should have an =encoding. In the implementation notes section it also
> suggests a heuristic of checking whether the first highbit byte-sequence
> is valid as UTF-8 and default to UTF-8 if so and Latin-1 otherwise.
>
> This raises two issues:
>
> 1) Pod::Simple (as used by metacpan) does not seem to implement this
>    heuristic
> 2) We need to educate people who are not aware of the =encoding command
>
> My thoughts on the second issue are that we could modify Pod::Simple to
> 'whine' if it sees non-ASCII bytes but no =encoding. This in turn would
> cause Test::Pod to pick up the error and help people fix it.
>
> I'd be happy to look at implementing both these things if it's agreed
> they're a good idea.

OK, so I went ahead and implemented both the warning and the heuristic
to guess Latin-1 vs UTF-8 (only when no encoding was specified).

The resulting patch is here:

  https://github.com/theory/pod-simple/pull/26

Regards
Grant
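
For illustration only, the guessing heuristic and the warning described above can be sketched roughly as follows. This is a minimal sketch in Python, not the Perl code from the patch: the function name guess_pod_encoding, its declared parameter, and the (encoding, warn) return shape are all hypothetical. It assumes the "first highbit byte-sequence" means the first maximal run of bytes in the range 0x80-0xFF.

```python
import re

# First maximal run of bytes with the high bit set (0x80-0xFF).
HIGHBIT_RUN = re.compile(rb"[\x80-\xff]+")


def guess_pod_encoding(raw, declared=None):
    """Return (encoding, warn) for raw POD bytes.

    Hypothetical helper, not part of Pod::Simple. `declared` stands in
    for the value of an =encoding command, or None if there was none.
    `warn` is True when non-ASCII bytes appear without a declaration.
    """
    if declared is not None:
        # An explicit =encoding always wins; no guessing, no warning.
        return declared, False

    match = HIGHBIT_RUN.search(raw)
    if match is None:
        # Pure ASCII: nothing to guess and nothing to complain about.
        return "ascii", False

    try:
        # UTF-8 multi-byte characters consist only of high-bit bytes,
        # so a valid run decodes cleanly on its own.
        match.group().decode("utf-8")
    except UnicodeDecodeError:
        return "latin-1", True
    return "utf-8", True
```

Only the first high-bit run is inspected, so a file that mixes encodings later on is still classified by its first non-ASCII sequence, as the perlpodspec note suggests.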
