Re: What to do about L and L<"Foo Bar">

2018-05-29 Thread Grant McLean
On Tue, 2018-05-29 at 15:20 +1000, Ron Savage wrote:
> On 29/05/18 13:49, Karl Williamson wrote:
> > The question is what to do?
> > 
> > 1) We could leave things as they always have been, to let sleeping
> > dogs lie.  It's worked for so long that we're not seriously going to
> > stop accepting these.
> This pretends things have not changed when in fact they have, so not
> my preference.
> 
> > 
> > 2) Raise the warnings, either on both cases or just the deprecated
> Raise warnings on deprecated structures, so users can fix problems, is
> my choice here.

I agree that this *sounds* like the sensible thing to do, but the
reality is that it will cause tests to fail.  The Test::Pod module is
used extensively throughout CPAN and its function is to take any
errors or warnings in the file under test and turn them into test
failures.  Distributions which passed their tests when uploaded will
suddenly start to fail tests on install.

A few years back I added a warning about non-ASCII characters in POD
without an =encoding declaration - the fallout continues to this day.
I'm still smarting from being accused of "breaking half of CPAN" :-)

I'm not saying don't choose this option, but it's likely some pain will
ensue.

A number of people wisely pointed out that Test::Pod should only be
used in author tests and should not be run at install time.  Who knows,
perhaps enough people have since taken that advice and it won't be a
problem.  If not, regularly breaking things in this way is really the
only way to get that message out.  Perhaps a doc patch to Test::Pod
might help.
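
For anyone unsure what "author tests only" looks like in practice, here
is a minimal sketch of a t/pod.t that skips itself at install time.  It
assumes the common convention of opting in through the AUTHOR_TESTING or
RELEASE_TESTING environment variables; the skip messages are my own
wording, not anything Test::Pod prescribes.

```perl
use strict;
use warnings;
use Test::More;

# Skip unless the person running the suite has opted in to author tests
# (AUTHOR_TESTING / RELEASE_TESTING are a convention, assumed here)
plan skip_all => 'Author test; set AUTHOR_TESTING=1 to run'
    unless $ENV{AUTHOR_TESTING} || $ENV{RELEASE_TESTING};

# Skip (rather than fail) when Test::Pod is not installed
eval { require Test::Pod; Test::Pod->import; 1 }
    or plan skip_all => 'Test::Pod is required for checking POD';

# Check every POD file the distribution ships
all_pod_files_ok();
```

With a guard like this, a new parser warning can only fail the test on
a machine where someone has asked for author tests.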

> > 3) Don't raise warnings, but change Pod::Checker to do so, under
> > the theory that you won't be using that unless you want to know the
> > iffy things.  Maybe make the deprecated come out always, and the
> > tolerated only for level 2 warnings.
> This imposes a burden on users. Tasks: (a) Change to Pod::Checker,
> to find problems; (b) Fix problems. Not my preference since it's
> simpler for the end user to find the same problems under (2).

I tend to agree.  If non-compliant POD is a problem then I'm not sure
that option 3 is going to solve that problem.

Regards
Grant


Re: Assume CP1252

2015-01-07 Thread Grant McLean
On Mon, 2015-01-05 at 21:58 -0800, David E. Wheeler wrote:
> Pod Peeps:
> 
> perlpodspec says:
> 
>     *   Since Perl recognizes a Unicode Byte Order Mark at the start of files
>         as signaling that the file is Unicode encoded as in UTF-16 (whether
>         big-endian or little-endian) or UTF-8, Pod parsers should do the same.
>         Otherwise, the character encoding should be understood as being UTF-8
>         if the first highbit byte sequence in the file seems valid as a UTF-8
>         sequence, or otherwise as Latin-1.
> 
> I suggest we switch from Latin-1 to CP1252.

I also agree this is a good idea.  None of the Latin-1 control
characters that CP1252 replaces with printable characters should be
appearing in POD anyway.
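
To make the difference concrete, here is a small sketch using the core
Encode module.  The sample bytes are made up for illustration: \x93 and
\x94 are the "smart quote" bytes that Windows editors typically write.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up sample: a word wrapped in CP1252 "smart quotes"
my $bytes = "\x93quoted\x94";

# Decode copies of the same bytes under each assumption
my $as_latin1 = decode('ISO-8859-1', "$bytes");
my $as_cp1252 = decode('cp1252',     "$bytes");

# Latin-1 maps 0x93 to a C1 control character; CP1252 maps it to a
# printable left double quotation mark
printf "Latin-1: U+%04X\n", ord $as_latin1;
printf "CP1252:  U+%04X\n", ord $as_cp1252;
```

The two decodings only differ for bytes in the \x80-\x9F range, which
is why the switch should be harmless for genuine Latin-1 documents.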

Regards
Grant



Re: pod::simple::text

2014-06-25 Thread Grant McLean
On Wed, 2014-06-25 at 18:05 +, John E Guillory wrote:
> Hello,
>
> I thought I wasn’t this new to perl but …
>
> How does one use pod::simple::text to print out a section of POD, say
> the DESCRIPTION section? 

Pod::Simple provides some core POD parsing functionality which is shared
by a number of formatter classes.  There isn't any sort of query API
that would allow you to specify which sections of the POD you want.

If you do want to produce formatted plain-text output of just the
DESCRIPTION section, then probably the easiest way is to slurp in all
the POD source; use a regex to extract the section you want; and then
pass that to a formatter:

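  use File::Slurp qw(read_file);
  use Pod::Simple::Text;

  # Capture from "=head1 DESCRIPTION" up to the next =head1 (or end of file)
  my($pod_source) =
      read_file($source_file) =~
      m{^(=head1\s+DESCRIPTION.*?)(?:^=head1.*)?\z}ms;

  # Format just that section; output goes to STDOUT by default
  my $parser = Pod::Simple::Text->new();
  $parser->parse_string_document($pod_source);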

The parser(/formatter) will write its output to STDOUT unless you call
$parser->output_fh with an alternative filehandle.
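
If you would rather have the formatted text in a variable than on a
filehandle, Pod::Simple also has an output_string method.  A
self-contained sketch, using the same regex as above on some made-up
POD held in a string (so no file reading is needed):

```perl
use strict;
use warnings;
use Pod::Simple::Text;

# Made-up POD document, just for illustration
my $pod_source = <<'END_POD';
=head1 NAME

Example - a made-up module

=head1 DESCRIPTION

This is the description.

=head1 AUTHOR

Nobody in particular.

=cut
END_POD

# Keep only the DESCRIPTION section
my ($description) =
    $pod_source =~ m{^(=head1\s+DESCRIPTION.*?)(?:^=head1.*)?\z}ms;

# Collect the formatter's output in $text instead of printing it
my $text = '';
my $parser = Pod::Simple::Text->new();
$parser->output_string(\$text);
$parser->parse_string_document($description);

print $text;
```

Note that output_string (like output_fh) needs to be called before
parsing starts.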

An alternative approach would be to subclass Pod::Simple::Text and
maintain a flag indicating when the parser is 'in' the DESCRIPTION
section and suppress all output when the flag is not set.  Unfortunately
the "suppress all output" bit is tricky since all the methods that
produce formatted output write directly to the output filehandle.

Regards
Grant




Re: Allow =over 0?

2012-10-10 Thread Grant McLean
On Wed, 2012-10-10 at 08:53 -0700, David E. Wheeler wrote:
> Pod People,
> 
> See the forwarded complaint below.

The complaint that:

   In "perldoc POD" I can't find any hint that 0 is an invalid
   indentation.

is a bit bogus because the formal specification for POD is in 'perldoc
perlpodspec' which says:

   If there is any text following the "=over", it must consist of
   only a nonzero positive numeral.

So a change to Pod::Simple would need to be accompanied by a change to
perlpodspec.

Regards
Grant



Re: The Encoding Warning (Again) - some data

2012-08-27 Thread Grant McLean
On Mon, 2012-08-27 at 10:17 -0700, David E. Wheeler wrote:
> Pod People,
> 
> In https://rt.cpan.org/Ticket/Display.html?id=79232, Slaven Rezic
> writes:
> 
> > Pod::Simple currently (e.g. with version 3.23) complains if a Pod
> > document has latin-1 characters in it but no =encoding command
> > specified. I think this is incorrect, both perlpod.pod and
> > perlpodspec.pod specify that a document without =encoding command is
> > in latin-1:

When I kicked this process off in April the issue I was trying to fix
was that UTF-8 documents did not render correctly on metacpan.org.  I
proposed two changes: implementing the encoding heuristic and adding the
warning.  There was a small amount of discussion and both proposals were
considered sane.

  http://www.nntp.perl.org/group/perl.pod-people/2012/04/msg1789.html

At the time I had no data on how many distributions were affected (I
only knew I saw mangled characters quite frequently).  Now that the
patch is in and generating the warning, I am able to get that data.  So
today I rendered all the POD from all current distributions in my
minicpan and collected stats on how often the warning was generated.

From a total of 5157 distributions, files in 1215 distributions generated
the warning (i.e.: contained non-ASCII characters in POD with no
=encoding declaration).
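
For anyone wanting to repeat a survey like this, the condition being
counted can be approximated without Pod::Simple at all.  The following
is only a rough sketch: it assumes each file's POD has already been
read as raw bytes, encoding_status is a made-up helper name, and the
three sample strings are invented.  The numbers in this message came
from rendering with Pod::Simple, not from this code.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one document's raw POD bytes
sub encoding_status {
    my ($pod_bytes) = @_;

    # No high-bit bytes at all: the warning cannot fire
    return 'ascii' unless $pod_bytes =~ /[\x80-\xFF]/;

    # Everything before the first high-bit byte
    my $before = substr $pod_bytes, 0, $-[0];

    # Was an =encoding command seen before that byte?
    return $before =~ /^=encoding\s+\S/m ? 'declared' : 'undeclared';
}

# Invented samples: plain ASCII, declared UTF-8, undeclared Latin-1
my @samples = (
    "=head1 NAME\n\nplain text\n",
    "=encoding utf8\n\n=head1 NAME\n\ncaf\xC3\xA9\n",
    "=head1 NAME\n\ncaf\xE9\n",
);

my %tally;
$tally{ encoding_status($_) }++ for @samples;

printf "%-10s %d\n", $_, $tally{$_} for sort keys %tally;
```

Only the 'undeclared' bucket corresponds to documents that would
generate the new warning.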

The split was roughly 50-50 with 1187 files being detected as Latin-1
and 1131 as UTF-8.

So there are currently 1131 files which are now able to render correctly
on metacpan.org which was my goal.

There are also 1187 files (from 570 distributions) which rendered
perfectly fine before but will now include the new warning in places
where rendering of parser errors is enabled.  This is approximately 11%
of current releases on CPAN - probably a higher number than I would have
anticipated.

Some portion of that 11% will be using Test::Pod and will now have a
test failure where none existed before.  (Sorry I don't have the
statistics on what proportion use Test::Pod and don't limit it to
'author' tests).

So if anyone's opinion is likely to be swayed by data - there's some
data.


My opinion is that the warning is useful.  However to be pragmatic, now
that the encoding detection heuristic has been implemented, adding
=encoding declarations to any of those 1215 distributions will have no
practical effect other than silencing the warning.  That's the best
argument I can come up with in favour of changing the status quo.

If we decided to turn off that warning then I would like to see an
option to allow people to turn it back on if they want.


On a tangentially related note, I was pondering whether the heuristic
should actually fall back to CP1252 rather than ISO8859-1 - after all
that's what the W3C recommend:

  http://www.w3.org/TR/html5/parsing.html#character-encodings-0

However my statistics show that only 44 files in current releases were
detected as Latin-1 but actually contained CP1252 (typically "smart
quote" symbols in the \x80-\x9F range).  So it doesn't seem worth
pursuing that change.

Finally I searched for files which were detected as UTF-8 but actually
contained characters from the CP1252 range.  There was only one and it
wasn't an error in the detection, the source file contains a
double-encoded character. It was a mangled attempt to name a contributor
- Slaven Resić   :-)

Regards
Grant





Re: Fwd: Topic/metacpan.org (#36)

2012-08-13 Thread Grant McLean
On Mon, 2012-08-13 at 09:41 -0700, David E. Wheeler wrote:
> Pod People,
> 
> 
> I got a pull request to switch to metacpan.org for L<> http links.
> AFAIK search.cpan.org is not deprecated, and is still the official
> community CPAN search site. If there is some discussion about changing
> it, or if Graham thinks it's time to switch then great. Otherwise, I
> am not inclined to accept this patch (though if it is hard to change
> the default URL with a subclass I would be happy to take that, or a
> command-line option).
>
> But I thought it ought to be subject to discussion here before I make
> any unilateral (and potentially uninformed) decisions. Comments?

Graham's search.cpan.org site has provided an excellent and valuable
service and continues to do so. However I think that the switch to
metacpan.org as the default source for CPAN metadata and documentation
should happen eventually. The key difference is that metacpan.org is an
open source project, which makes it easier for people to contribute to.
As such, it is a community site.

Regards
Grant





Re: Possible patch to Test::Pod

2012-06-08 Thread Grant McLean
On Wed, 2012-06-06 at 16:55 +1200, Grant McLean wrote:
> I'm considering a patch to make Test::Pod treat the new "missing
> =encoding" warning differently to other warnings.

OK I've been convinced it's not worth bothering with :-)

Regards
Grant



Possible patch to Test::Pod

2012-06-05 Thread Grant McLean
Hi POD People

I've created a bit of a storm by adding the new warning to Pod::Simple
which is emitted if the source POD contains non-ASCII characters but
does not include an =encoding POD command.

This has "broken CPAN" because a number of CPAN distributions (including
DBI and Dancer) include a call to Test::Pod in their main test suite.
Test::Pod's role is to assert that parsing the POD produces no warnings
or errors.

While my patch adds a warning, in combination with Test::Pod it is
effectively elevated to a fatal error which blocks a clean installation
of affected distributions.

The "correct" answer is for people who use Test::Pod to only run those
tests on the author's system - i.e. pre release rather than pre install.
(And ideally add the missing =encoding too). Of course it might be a
bit inconvenient for some maintainers to rush out a new release for that
reason alone.

I'm considering a patch to make Test::Pod treat the new "missing
=encoding" warning differently to other warnings.  The current behaviour
is to fail the test if any warnings were generated.  Instead we could
patch it as per the following pseudo code:

  if no warnings
      pass test
  else if exactly one warning AND it's the new =encoding message
      spit out a warning via test diag output
      pass test
  else
      fail test

This would mean that an end-user having trouble with failing tests at
install time could work around it by upgrading Test::Pod.
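
As a rough illustration of the pseudo code (not a patch against
Test::Pod), the decision could be written as a small function.  The
name pod_test_outcome is made up, and the pattern for the warning text
is my assumption of its wording, so it would need checking against the
Pod::Simple release in use.

```perl
use strict;
use warnings;

# Assumed wording of the new warning; verify before relying on it
my $ENCODING_WARNING = qr/Non-ASCII character seen before =encoding/;

# Hypothetical helper: map the parser's warnings to a test outcome
sub pod_test_outcome {
    my (@warnings) = @_;

    # No warnings at all: a clean pass
    return 'pass' unless @warnings;

    # Exactly one warning and it is the =encoding one: pass, but the
    # caller should report the message via diag()
    return 'pass_with_diag'
        if @warnings == 1 && $warnings[0] =~ $ENCODING_WARNING;

    # Anything else is still a failure
    return 'fail';
}

print pod_test_outcome(), "\n";
print pod_test_outcome('=over without closing =back'), "\n";
```

The caller would still be responsible for collecting the warnings from
the parser and for emitting the diag output.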

I asked David Wheeler whether this was a) sensible and b) worth doing.
He confessed to not having much of an opinion on the matter and
suggested I post here.

Opinions?

Regards
Grant




Re: Non-ASCII data in POD

2012-05-03 Thread Grant McLean
On Mon, 2012-04-30 at 14:24 +0200, Johan Vromans wrote:
> Grant McLean  writes:
> 
> > OK, so I went ahead and implemented both the warning and the heuristic
> > to guess Latin-1 vs UTF-8 (only when no encoding was specified).  The
> > resulting patch is here:
> >
> >   https://github.com/theory/pod-simple/pull/26
> 
> This patch enforces authors to add an "=encoding UTF-8" line to
> specify that the doc is, indeed, UTF-8 encoded.

Not exactly.  It generates a warning during the parsing process which
will be visible in the output of any formatter that has error output
enabled.  It's not a fatal error so it doesn't exactly "enforce"
anything.

The aim is to help people comply with the spec for POD as it is
currently written.  And that spec says that if there are non-ASCII
characters there must be an =encoding declaration.

> Wouldn't it be far better to consider all POD documents to be Utf-8
> encoded Unicode and fall back to Latin1 if invalid UTF-8 sequences are
> detected?

You won't get any argument from me that UTF-8 would be a better default,
but that's not how the spec is currently written.

If your Perl source code includes UTF-8 characters, you must say:

  use utf8;

If your POD includes UTF-8 characters, you must say:

  =encoding utf8

> In other words, do not enforce the author to add "=encoding
> UTF-8" since that's the default? And only add "=encoding ISO8859-1" for
> Latin1 encoded documents?

The patch does also implement the heuristic recommended in the
perlpodspec which has the effect of allowing either Latin-1 or UTF-8 to
work (the default is ASCII) in spite of the missing declaration.  This
will be a win for sites like metacpan.org which currently don't display
UTF-8 correctly from POD that lacks an =encoding declaration.

Any formatter that has error display disabled will see better rendering
of UTF-8 with this patch.

Additionally, if errors are displayed, the non-compliance with
perlpodspec will be reported.

Regards
Grant




Re: Non-ASCII data in POD

2012-04-27 Thread Grant McLean
On Fri, 2012-04-27 at 09:17 -0700, David E. Wheeler wrote:
> On Apr 27, 2012, at 12:10 AM, Grant McLean wrote:
> 
> > OK, so I went ahead and implemented both the warning and the heuristic
> > to guess Latin-1 vs UTF-8 (only when no encoding was specified).  The
> > resulting patch is here:
> > 
> >  https://github.com/theory/pod-simple/pull/26
> 
> I like this, but wonder if maybe it shouldn't be consistent? That is,
> if you see more than one of these in a single document, and one can be
> output as UTF-8 and the other can’t, would the resulting output have
> mixed encodings? IOW, should it not perhaps use the encoding it
> determined for the first one of these it finds in a document?

I'm not sure I quite understand what you're saying.  The first time a
non-ASCII byte is encountered, the code will 'fire' and apply the
heuristic to set an encoding.  Once the encoding is set, the code won't
be called again.
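
To illustrate the idea (this is a sketch of the perlpodspec heuristic,
not the code from the patch), the guess could look something like the
following.  The name guess_pod_encoding is made up, and the input is
assumed to be undecoded bytes.

```perl
use strict;
use warnings;

# One well-formed UTF-8 multi-byte character, expressed as raw bytes
my $UTF8_CHAR = qr/
      [\xC2-\xDF] [\x80-\xBF]
    | \xE0        [\xA0-\xBF] [\x80-\xBF]
    | [\xE1-\xEC\xEE\xEF] [\x80-\xBF]{2}
    | \xED        [\x80-\x9F] [\x80-\xBF]
    | \xF0        [\x90-\xBF] [\x80-\xBF]{2}
    | [\xF1-\xF3] [\x80-\xBF]{3}
    | \xF4        [\x80-\x8F] [\x80-\xBF]{2}
/x;

# Hypothetical helper: guess once, from the first run of high-bit
# bytes, and let the caller keep that answer for the whole document
sub guess_pod_encoding {
    my ($bytes) = @_;

    # Find the first run of high-bit bytes; none means plain ASCII
    my ($run) = $bytes =~ /([\x80-\xFF]+)/
        or return 'ASCII';

    # Valid as UTF-8 means UTF-8, anything else falls back to Latin-1
    return $run =~ /\A(?:$UTF8_CHAR)+\z/ ? 'UTF-8' : 'ISO-8859-1';
}

print guess_pod_encoding("caf\xC3\xA9"), "\n";   # e-acute as UTF-8 bytes
print guess_pod_encoding("caf\xE9"),     "\n";   # e-acute as a Latin-1 byte
```

Because only the first high-bit sequence is examined, a document that
mixes encodings gets a single answer, which is the consistency being
asked about.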

The perlpodspec seems pretty clear that a POD document containing
different encodings should be considered an error.

Regards
Grant



Re: Non-ASCII data in POD

2012-04-27 Thread Grant McLean
On Thu, 2012-04-26 at 15:23 +1200, Grant McLean wrote:
> Hi POD people
> 
> There's been a discussion on #metacpan about non-ASCII characters in POD
> being rendered incorrectly on the metacpan.org web site.
> 
> The short story is that some people use utf8 characters without
> including: =encoding utf8.  Apparently the metacpan tool chain assumes
> latin1 encoding, but with the right encoding declaration, the characters
> would be rendered correctly.
> 
> The latest perlpodspec seems to imply an ASCII default and anything else
> should have an =encoding.  In the implementation notes section it also
> suggests a heuristic of checking whether the first highbit byte-sequence
> is valid as UTF-8 and default to UTF-8 if so and Latin-1 otherwise.
> 
> This raises two issues:
> 
> 1) Pod::Simple (as used by metacpan) does not seem to implement this
>    heuristic
> 2) We need to educate people who are not aware of the =encoding command
> 
> My thoughts on the second issue are that we could modify Pod::Simple to
> 'whine' if it sees non-ASCII bytes but no =encoding.  This in turn would
> cause Test::Pod to pick up the error and help people fix it.
> 
> I'd be happy to look at implementing both these things if it's agreed
> they're a good idea.

OK, so I went ahead and implemented both the warning and the heuristic
to guess Latin-1 vs UTF-8 (only when no encoding was specified).  The
resulting patch is here:

  https://github.com/theory/pod-simple/pull/26

Regards
Grant





Non-ASCII data in POD

2012-04-25 Thread Grant McLean
Hi POD people

There's been a discussion on #metacpan about non-ASCII characters in POD
being rendered incorrectly on the metacpan.org web site.

The short story is that some people use utf8 characters without
including: =encoding utf8.  Apparently the metacpan tool chain assumes
latin1 encoding, but with the right encoding declaration, the characters
would be rendered correctly.

The latest perlpodspec seems to imply an ASCII default and anything else
should have an =encoding.  In the implementation notes section it also
suggests a heuristic of checking whether the first highbit byte-sequence
is valid as UTF-8 and default to UTF-8 if so and Latin-1 otherwise.

This raises two issues:

1) Pod::Simple (as used by metacpan) does not seem to implement this
   heuristic
2) We need to educate people who are not aware of the =encoding command

My thoughts on the second issue are that we could modify Pod::Simple to
'whine' if it sees non-ASCII bytes but no =encoding.  This in turn would
cause Test::Pod to pick up the error and help people fix it.

I'd be happy to look at implementing both these things if it's agreed
they're a good idea.

Regards
Grant





Re: how is the html on search.cpan.org generated?

2009-10-20 Thread Grant McLean
On Tue, 2009-10-20 at 14:19 +0100, Mitch Gower wrote:
> The pod to html conversion used on search.cpan.org seems to produce
> quite nice output.  What tool is being used there, and is it publicly
> available?  I couldn't see any evidence in the html itself of what's
> being used.

As far as I know, the source for Graham Barr's search.cpan.org site is
not available in a public repo.  There are a number of modules on CPAN
that can deliver HTML from POD in a similar form.

At the moment, I'm using Apache2::Pod and have this snippet in my Apache
config:


  <Location /perldoc>
    SetHandler  perl-script
    PerlHandler Apache2::Pod::HTML
    PerlSetVar  INDEX 1
    PerlSetVar  STYLESHEET /pod-style.css
    PerlSetVar  LINKBASE LOCAL
  </Location>


In my browser I can view the POD of installed modules using a URL like
this:

  http://putnam/perldoc/Apache2::Pod::HTML

The module has a built-in CSS stylesheet which you can use like this:

PerlSetVar  STYLESHEET auto

But I've got a slightly modified version in /pod-style.css which more
closely emulates the styles on search.cpan.org.

Cheers
Grant



How to generate pod.idx?

2008-09-30 Thread Grant McLean
Hi pod-people

The documentation for the perldoc command refers to a -X switch which
causes the module/POD lookup to use a pre-generated index rather than
crawling @INC.

I haven't been able to locate a utility for generating the pod.idx file
which is referred to by the code behind the -X option.  Does such a
utility exist?

Cheers
Grant