Re: What to do about L and L<"Foo Bar">
On Tue, 2018-05-29 at 15:20 +1000, Ron Savage wrote:
> On 29/05/18 13:49, Karl Williamson wrote:
> > The question is what to do?
> >
> > 1) We could leave things as they always have been, to let sleeping
> > dogs lie.  It's worked for so long that we're not seriously going
> > to stop accepting these.
>
> This pretends things have not changed when in fact they have, so not
> my preference.
>
> > 2) Raise the warnings, either on both cases or just the deprecated
>
> Raise warnings on deprecated structures, so users can fix problems,
> is my choice here.

I agree that this *sounds* like the sensible thing to do, but the
reality is that it will cause tests to fail.

The Test::Pod module is used extensively throughout CPAN and its
function is to take any errors or warnings in the file under test and
turn them into test failures.  Distributions which passed their tests
when uploaded will suddenly start to fail tests on install.

A few years back I added a warning about non-ASCII characters in POD
without an =encoding declaration - the fallout continues to this day.
I'm still smarting from being accused of "breaking half of CPAN" :-)

I'm not saying don't choose this option, but it's likely some pain
will ensue.

A number of people wisely pointed out that Test::Pod should only be
used in author tests and should not be run at install time.  Who
knows, perhaps enough people have since taken that advice and it won't
be a problem.  If not, regularly breaking things in this way is really
the only way to get that message out.  Perhaps a doc patch to
Test::Pod might help.

> > 3) Don't raise warnings, but change Pod::Checker to do so, under
> > the theory that you won't be using that unless you want to know the
> > iffy things.  Maybe make the deprecated come out always, and the
> > tolerated only for level 2 warnings.
>
> This imposes a burden on users.  Tasks: (a) Change to Pod::Checker,
> to find problems; (b) Fix problems.  Not my preference since it's
> simpler for the end user to find the same problems under (2).

I tend to agree.  If non-compliant POD is a problem then I'm not sure
that option 3 is going to solve that problem.

Regards
Grant
Re: Assume CP1252
On Mon, 2015-01-05 at 21:58 -0800, David E. Wheeler wrote:
> Pod Peeps:
>
> perlpodspec says:
>
> * Since Perl recognizes a Unicode Byte Order Mark at the start of
>   files as signaling that the file is Unicode encoded as in UTF-16
>   (whether big-endian or little-endian) or UTF-8, Pod parsers should
>   do the same.  Otherwise, the character encoding should be
>   understood as being UTF-8 if the first highbit byte sequence in the
>   file seems valid as a UTF-8 sequence, or otherwise as Latin-1.
>
> I suggest we switch from Latin-1 to CP1252.

I also agree this is a good idea.  None of the Latin-1 control
characters that CP1252 replaces with printable characters should be
appearing in POD anyway.

Regards
Grant
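A minimal sketch of the difference, using Perl's core Encode module
(the byte 0x93 is just one illustrative example): bytes in the
\x80-\x9F range are unprintable C1 control characters in Latin-1, but
common punctuation in CP1252.

```perl
use strict;
use warnings;
use Encode qw(decode);

# 0x93 is a C1 control character in Latin-1, but a left double
# quotation mark ("smart quote") in CP1252.
my $byte = "\x93";

my $as_cp1252 = decode('cp1252',     $byte);   # U+201C
my $as_latin1 = decode('iso-8859-1', $byte);   # U+0093 (control)

printf "cp1252: U+%04X  latin-1: U+%04X\n",
    ord($as_cp1252), ord($as_latin1);
```

So a Latin-1 fallback turns smart quotes into invisible control
characters, while a CP1252 fallback renders them as intended; and
since no sane POD contains C1 controls, the CP1252 interpretation
costs nothing.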
Re: pod::simple::text
On Wed, 2014-06-25 at 18:05 +, John E Guillory wrote:
> Hello,
>
> I thought I wasn’t this new to perl but …
>
> How does one use pod::simple::text to print out a section of POD,
> say the DESCRIPTION section?

Pod::Simple provides some core POD parsing functionality which is
shared by a number of formatter classes.  There isn't any sort of
query API that would allow you to specify which sections of the POD
you want.

If you do want to produce formatted plain-text output of just the
DESCRIPTION section, then probably the easiest way is to slurp in all
the POD source, use a regex to extract the section you want, and then
pass that to a formatter:

    my($pod_source) = read_file($source_file)
        =~ m{^(=head1\s+DESCRIPTION.*?)(?:^=head1.*)?\z}ms;

    my $parser = Pod::Simple::Text->new();
    $parser->parse_string_document($pod_source);

The parser(/formatter) will write its output to STDOUT unless you call
$parser->output_fh with an alternative filehandle.

An alternative approach would be to subclass Pod::Simple::Text and
maintain a flag indicating when the parser is 'in' the DESCRIPTION
section, suppressing all output when the flag is not set.
Unfortunately the "suppress all output" bit is tricky since all the
methods that produce formatted output write directly to the output
filehandle.

Regards
Grant
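A self-contained variant of the snippet above, for anyone who wants to
try it without a File::Slurp dependency.  The sample POD and names are
invented for illustration, the regex uses a lookahead instead of the
trailing match, and output is captured with output_string rather than
sent to STDOUT:

```perl
use strict;
use warnings;
use Pod::Simple::Text;

# Invented sample POD, for illustration only.
my $pod_source = <<'EOF';
=head1 NAME

Foo::Bar - an example module

=head1 DESCRIPTION

This section is the one we want to extract.

=head1 AUTHOR

A. N. Other
EOF

# Extract from "=head1 DESCRIPTION" up to the next =head1 (or EOF).
my ($section) = $pod_source =~ m{^(=head1\s+DESCRIPTION.*?)(?=^=head1|\z)}ms;

my $parser = Pod::Simple::Text->new();
my $text;
$parser->output_string(\$text);   # capture output instead of STDOUT
$parser->parse_string_document($section);

print $text;
```

This prints the formatted DESCRIPTION heading and body, with the NAME
and AUTHOR sections stripped out before the formatter ever sees them.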
Re: Allow =over 0?
On Wed, 2012-10-10 at 08:53 -0700, David E. Wheeler wrote:
> Pod People,
>
> See the forwarded complaint below.

The complaint that:

    In "perldoc POD" I can't find any hint that 0 is an invalid
    indentation.

is a bit bogus because the formal specification for POD is in 'perldoc
perlpodspec', which says:

    If there is any text following the "=over", it must consist of
    only a nonzero positive numeral.

So a change to Pod::Simple would need to be accompanied by a change to
perlpodspec.

Regards
Grant
Re: The Encoding Warning (Again) - some data
On Mon, 2012-08-27 at 10:17 -0700, David E. Wheeler wrote:
> Pod People,
>
> In https://rt.cpan.org/Ticket/Display.html?id=79232, Slaven Rezic
> writes:
>
> > Pod::Simple currently (e.g. with version 3.23) complains if a Pod
> > document has latin-1 characters in it but no =encoding command
> > specified.  I think this is incorrect, both perlpod.pod and
> > perlpodspec.pod specify that a document without =encoding command
> > is in latin-1:

When I kicked this process off in April, the issue I was trying to fix
was that UTF-8 documents did not render correctly on metacpan.org.  I
proposed two changes: implementing the encoding heuristic and adding
the warning.  There was a small amount of discussion and both
proposals were considered sane.

  http://www.nntp.perl.org/group/perl.pod-people/2012/04/msg1789.html

At the time I had no data on how many distributions were affected (I
only knew I saw mangled characters quite frequently).  Now that the
patch is in and generating the warning, I am able to get that data.
So today I rendered all the POD from all current distributions in my
minicpan and collected stats on how often the warning was generated.

From a total of 5157 distributions, files in 1215 distributions
generated the warning (i.e. contained non-ASCII characters in POD with
no =encoding declaration).  The split was roughly 50-50, with 1187
files being detected as Latin-1 and 1131 as UTF-8.

So there are currently 1131 files which are now able to render
correctly on metacpan.org, which was my goal.

There are also 1187 files (from 570 distributions) which rendered
perfectly fine before but will now include the new warning in places
where rendering of parser errors is enabled.  This is approximately
11% of current releases on CPAN - probably a higher number than I
would have anticipated.  Some portion of that 11% will be using
Test::Pod and will now have a test failure where none existed before.
(Sorry, I don't have statistics on what proportion use Test::Pod and
don't limit it to 'author' tests.)

So if anyone's opinion is likely to be swayed by data - there's some
data.

My opinion is that the warning is useful.  However, to be pragmatic,
now that the encoding detection heuristic has been implemented, adding
=encoding declarations to any of those 1215 distributions will have no
practical effect other than silencing the warning.  That's the best
argument I can come up with in favour of changing the status quo.  If
we decided to turn off that warning then I would like to see an option
to allow people to turn it back on if they want.

On a tangentially related note, I was pondering whether the heuristic
should actually fall back to CP1252 rather than ISO8859-1 - after all,
that's what the W3C recommend:

  http://www.w3.org/TR/html5/parsing.html#character-encodings-0

However, my statistics show that only 44 files in current releases
were detected as Latin-1 but actually contained CP1252 (typically
"smart quote" symbols in the \x80-\x9F range).  So it doesn't seem
worth pursuing that change.

Finally, I searched for files which were detected as UTF-8 but
actually contained characters from the CP1252 range.  There was only
one, and it wasn't an error in the detection - the source file
contains a double-encoded character.  It was a mangled attempt to name
a contributor - Slaven Resić :-)

Regards
Grant
Re: Fwd: Topic/metacpan.org (#36)
On Mon, 2012-08-13 at 09:41 -0700, David E. Wheeler wrote:
> Pod People,
>
> I got a pull request to switch to metacpan.org for L<> http links.
> AFAIK search.cpan.org is not deprecated, and is still the official
> community CPAN search site.  If there is some discussion about
> changing it, or if Graham thinks it's time to switch then great.
> Otherwise, I am not inclined to accept this patch (though if it is
> hard to change the default URL with a subclass I would be happy to
> take that, or a command-line option).
>
> But I thought it ought to be subject to discussion here before I
> make any unilateral (and potentially uninformed) decisions.
> Comments?

Graham's search.cpan.org site has provided an excellent and valuable
service and continues to do so.  However, I think that the switch to
metacpan.org as the default source for CPAN metadata and documentation
should happen eventually.  The key difference is that metacpan.org is
an open source project, which makes it easier for people to contribute
to.  As such, it is a community site.

Regards
Grant
Re: Possible patch to Test::Pod
On Wed, 2012-06-06 at 16:55 +1200, Grant McLean wrote:
> I'm considering a patch to make Test::Pod treat the new "missing
> =encoding" warning differently to other warnings.

OK, I've been convinced it's not worth bothering with :-)

Regards
Grant
Possible patch to Test::Pod
Hi POD People

I've created a bit of a storm by adding the new warning to Pod::Simple
which is emitted if the source POD contains non-ASCII characters but
does not include an =encoding POD command.

This has "broken CPAN" because a number of CPAN distributions
(including DBI and Dancer) include a call to Test::Pod in their main
test suite.  Test::Pod's role is to assert that parsing the POD
produces no warnings or errors.  While my patch adds a warning, in
combination with Test::Pod it is effectively elevated to a fatal error
which blocks a clean installation of affected distributions.

The "correct" answer is for people who use Test::Pod to only run those
tests on the author's system - i.e. pre-release rather than
pre-install.  (And ideally add the missing =encoding too.)  Of course
it might be a bit inconvenient for some maintainers to rush out a new
release for that reason alone.

I'm considering a patch to make Test::Pod treat the new "missing
=encoding" warning differently to other warnings.  The current
behaviour is to fail the test if any warnings were generated.  Instead
we could patch it as per the following pseudo code:

    if no warnings
        pass test
    else if exactly one warning AND it's the new =encoding message
        spit out a warning via test diag output
        pass test
    else
        fail test

This would mean that an end-user having trouble with failing tests at
install time could work around it by upgrading Test::Pod.

I asked David Wheeler whether this was a) sensible and b) worth doing.
He confessed to not having much of an opinion on the matter and
suggested I post here.

Opinions?

Regards
Grant
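For concreteness, the pseudo code above could be sketched as a plain
Perl function.  This is only an illustration of the decision logic,
not actual Test::Pod code: the function name is invented, and the
regex assumes the warning text starts with Pod::Simple's "Non-ASCII
character seen before =encoding" message.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed Test::Pod behaviour.  Takes the
# list of warnings produced by parsing one POD file and returns a
# verdict.  In real Test::Pod, 'pass-with-diag' would emit the warning
# via diag() and still pass the test.
sub pod_warnings_verdict {
    my (@warnings) = @_;

    return 'pass' if @warnings == 0;

    if (@warnings == 1
        && $warnings[0] =~ /Non-ASCII character seen before =encoding/) {
        return 'pass-with-diag';
    }

    return 'fail';
}
```

Anything other than a clean parse, or exactly one instance of the new
=encoding warning, still fails as before.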
Re: Non-ASCII data in POD
On Mon, 2012-04-30 at 14:24 +0200, Johan Vromans wrote:
> Grant McLean writes:
>
> > OK, so I went ahead and implemented both the warning and the
> > heuristic to guess Latin-1 vs UTF-8 (only when no encoding was
> > specified).  The resulting patch is here:
> >
> >   https://github.com/theory/pod-simple/pull/26
>
> This patch enforces authors to add an "=encoding UTF-8" line to
> specify that the doc is, indeed, UTF-8 encoded.

Not exactly.  It generates a warning during the parsing process which
will be visible in the output of any formatter that has error output
enabled.  It's not a fatal error, so it doesn't exactly "enforce"
anything.  The aim is to help people comply with the spec for POD as
it is currently written.  And that spec says that if there are
non-ASCII characters there must be an =encoding declaration.

> Wouldn't it be far better to consider all POD documents to be Utf-8
> encoded Unicode and fall back to Latin1 if invalid UTF-8 sequences
> are detected?

You won't get any argument from me that UTF-8 would be a better
default, but that's not how the spec is currently written.

If your Perl source code includes UTF-8 characters, you must say:

    use utf8;

If your POD includes UTF-8 characters, you must say:

    =encoding utf8

> In other words, do not enforce the author to add "=encoding UTF-8"
> since that's the default?  And only add "=encoding ISO8859-1" for
> Latin1 encoded documents?

The patch does also implement the heuristic recommended in
perlpodspec, which has the effect of allowing either Latin-1 or UTF-8
to work (the default is ASCII) in spite of the missing declaration.
This will be a win for sites like metacpan.org which currently don't
display UTF-8 correctly from POD that lacks an =encoding declaration.

Any formatter that has error display disabled will see better
rendering of UTF-8 with this patch.  Additionally, if errors are
displayed, the non-compliance with perlpodspec will be reported.

Regards
Grant
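The heuristic itself is simple enough to sketch in a few lines using
Perl's core Encode module.  This is an illustration, not the actual
Pod::Simple implementation: the function name is invented, and for
simplicity it tests whether the whole byte string decodes cleanly as
UTF-8, whereas perlpodspec talks about checking just the first
high-bit byte sequence.

```perl
use strict;
use warnings;
use Encode ();

# Guess the encoding of a POD document supplied as raw octets.
sub guess_pod_encoding {
    my ($octets) = @_;

    # Pure ASCII needs no guessing.
    return 'ASCII' unless $octets =~ /[\x80-\xFF]/;

    # If the bytes decode cleanly as UTF-8, assume UTF-8 ...
    my $ok = eval {
        Encode::decode('UTF-8', $octets,
            Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };

    # ... otherwise fall back to Latin-1.
    return $ok ? 'UTF-8' : 'ISO-8859-1';
}
```

The fallback works because any byte sequence is valid Latin-1, while
high-bit bytes that don't form well-structured multi-byte sequences
are invalid UTF-8.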
Re: Non-ASCII data in POD
On Fri, 2012-04-27 at 09:17 -0700, David E. Wheeler wrote:
> On Apr 27, 2012, at 12:10 AM, Grant McLean wrote:
>
> > OK, so I went ahead and implemented both the warning and the
> > heuristic to guess Latin-1 vs UTF-8 (only when no encoding was
> > specified).  The resulting patch is here:
> >
> >   https://github.com/theory/pod-simple/pull/26
>
> I like this, but wonder if maybe it shouldn't be consistent?  That
> is, if you see more than one of these in a single document, and one
> can be output as UTF-8 and the other can’t, would the resulting
> output have mixed encodings?  IOW, should it not perhaps use the
> encoding it determined for the first one of these it finds in a
> document?

I'm not sure I quite understand what you're saying.  The first time a
non-ASCII byte is encountered, the code will 'fire' and apply the
heuristic to set an encoding.  Once the encoding is set, the code
won't be called again.

The perlpodspec seems pretty clear that a POD document containing
different encodings should be considered an error.

Regards
Grant
Re: Non-ASCII data in POD
On Thu, 2012-04-26 at 15:23 +1200, Grant McLean wrote:
> Hi POD people
>
> There's been a discussion on #metacpan about non-ASCII characters in
> POD being rendered incorrectly on the metacpan.org web site.
>
> The short story is that some people use utf8 characters without
> including: =encoding utf8.  Apparently the metacpan tool chain
> assumes latin1 encoding, but with the right encoding declaration,
> the characters would be rendered correctly.
>
> The latest perlpodspec seems to imply an ASCII default and anything
> else should have an =encoding.  In the implementation notes section
> it also suggests a heuristic of checking whether the first highbit
> byte-sequence is valid as UTF-8 and default to UTF-8 if so and
> Latin-1 otherwise.
>
> This raises two issues:
>
> 1) Pod::Simple (as used by metacpan) does not seem to implement this
>    heuristic
> 2) We need to educate people who are not aware of the =encoding
>    command
>
> My thoughts on the second issue are that we could modify Pod::Simple
> to 'whine' if it sees non-ASCII bytes but no =encoding.  This in
> turn would cause Test::Pod to pick up the error and help people fix
> it.
>
> I'd be happy to look at implementing both these things if it's
> agreed they're a good idea.

OK, so I went ahead and implemented both the warning and the heuristic
to guess Latin-1 vs UTF-8 (only when no encoding was specified).  The
resulting patch is here:

  https://github.com/theory/pod-simple/pull/26

Regards
Grant
Non-ASCII data in POD
Hi POD people

There's been a discussion on #metacpan about non-ASCII characters in
POD being rendered incorrectly on the metacpan.org web site.

The short story is that some people use utf8 characters without
including: =encoding utf8.  Apparently the metacpan tool chain assumes
latin1 encoding, but with the right encoding declaration, the
characters would be rendered correctly.

The latest perlpodspec seems to imply an ASCII default and anything
else should have an =encoding.  In the implementation notes section it
also suggests a heuristic of checking whether the first highbit
byte-sequence is valid as UTF-8 and default to UTF-8 if so and Latin-1
otherwise.

This raises two issues:

1) Pod::Simple (as used by metacpan) does not seem to implement this
   heuristic
2) We need to educate people who are not aware of the =encoding
   command

My thoughts on the second issue are that we could modify Pod::Simple
to 'whine' if it sees non-ASCII bytes but no =encoding.  This in turn
would cause Test::Pod to pick up the error and help people fix it.

I'd be happy to look at implementing both these things if it's agreed
they're a good idea.

Regards
Grant
Re: how is the html on search.cpan.org generated?
On Tue, 2009-10-20 at 14:19 +0100, Mitch Gower wrote:
> The pod to html conversion used on search.cpan.org seems to produce
> quite nice output.  What tool is being used there, and is it
> publicly available?  I couldn't see any evidence in the html itself
> of what's being used.

As far as I know, the source for Graham Barr's search.cpan.org site is
not available in a public repo.

There are a number of modules on CPAN that can deliver HTML from POD
in a similar form.  At the moment, I'm using Apache2::Pod and have
this snippet in my Apache config:

    SetHandler perl-script
    PerlHandler Apache2::Pod::HTML
    PerlSetVar INDEX 1
    PerlSetVar STYLESHEET /pod-style.css
    PerlSetVar LINKBASE LOCAL

In my browser I can view the POD of installed modules using a URL like
this:

    http://putnam/perldoc/Apache2::Pod::HTML

The module has a built-in CSS stylesheet which you can use like this:

    PerlSetVar STYLESHEET auto

But I've got a slightly modified version in /pod-style.css which more
closely emulates the styles on search.cpan.org.

Cheers
Grant
How to generate pod.idx?
Hi pod-people

The documentation for the perldoc command refers to a -X switch which
causes the module/POD lookup to use a pre-generated index rather than
crawling @INC.

I haven't been able to locate a utility for generating the pod.idx
file which is referred to by the code behind the -X option.  Does such
a utility exist?

Cheers
Grant