Re: the construct "=item * Foo bar"

Russ Allbery Fri, 18 Oct 2002 01:59:37 -0700

Sean M Burke <[EMAIL PROTECTED]> writes:

> I have a feeling this is something entirely peculiar to you, and your
> unique combination of inexperience with modern markup language parsing
> approaches, and wealth of experience with Pod::Parser's very strange
> approach to, well, everything, including the very concept of parsing
> itself.


There are some parts of this that are likely peculiar to me, such as the
fact that I don't have experience with the current in-vogue method of
doing XML parsing.  I've never thought of POD as being particularly like
XML, but I do understand the basic theory of event-stream parsing and why
people would find that convenient.  I don't dispute that Pod::Parser has a
steep learning curve; it took me quite a while to figure out how to make
it do exactly what I wanted.

What is not peculiar to me is an aversion to changing running code built
on stable and tested infrastructure, and I think you're understating this
portion of the problem.

It's certainly possible to retarget Pod::Text, Pod::Man, and friends to
use Pod::Simple as the parser.  I see how I could do that.  It's a
significant restructuring, and I expect that the result will have
regressions with respect to the current output simply because it's a large
change.

What I'm missing is the compelling case for doing this.

I was initially excited about a new POD parser, because Pod::Parser isn't
exactly obvious to understand and particularly for Pod::Man I had to fight
it in some places to get it to do what I wanted.  Unfortunately, the new
parser, while it may well improve the ease of writing HTML formatters
considerably, does nothing for me for Pod::Man.  I'm still going to have
to fight the parser in different ways to get correct output, and the
structural changes are extensive.

That's a lot of work.  Pod::Man works right now.

If you had maintained an interface compatible with Pod::Parser, existing
formatters could have taken advantage of whatever new features there are
(and again, I'm unclear on what those features might be at present)
without having to do that work and incur the resulting instability.

> If you can articulate what it is you like about Pod::Parser's interface,
> I think I can make a thin wrapper around Pod::Simple that will make it
> work that way in that particular respect.  Would you like if I made
> something like the Pod::Simple::Methody class that uses the return value
> of methods as the code that is to be sent to the output channel, and/or
> if it had some notion of "the output text of the paragraph in progress"?
> I'm more than willing to accommodate all sorts of approaches to things.
> As you explain what you want, do bear in mind that it's been a while
> since I've been conversant with Pod::Parser's interface.

That's a very reasonable request, and I'll try to think about that.

The problems that I'm specifically running into with the direction that
you're going are that the POD translators that I maintain work right now
and are extremely stable.  They have very few outstanding issues; the only
significant difficulties they have right now are problems handling
non-ASCII character sets in some circumstances, something which is
exceedingly hard to fix for Pod::Man in a portable fashion regardless of
what parser you use, and which Pod::Simple doesn't yet fully address
anyway.

Therefore, unless the new parser interface clearly makes my life much
easier as a translator author, I don't see the advantages in doing a great
deal of work to switch.  That means there are three things that you or
someone else could do for me, any one of which would help: explain the
significant advantages of switching; make it simple to switch by
supporting a backward compatibility layer, so that formatters written to
use the Pod::Parser interface keep working; or provide an interface that,
when I read the documentation, shows a great deal of promise for making my
code easier to maintain.

The last of these I can try to help with by explaining the problems that
I'm facing, and I'll try to do that.

> I wouldn't mind if someone really did try to bring Pod::Parser up to
> spec, and also would improve its docs, and its interface, and to make it
> a real parser for Pod, instead of barely more than a tokenizer for the
> set of Pod-like languages.

The documentation for Pod::Parser is excellent.  It clearly explains every
portion of how the interface works, and furthermore the code itself is
extensively commented and documented.  Pod::Parser may have problems that
make it difficult for people to wrap their minds around how it works, but
those are interface issues, not documentation problems.  The documentation
standard set by Pod::Parser is one that I wish other Perl module writers
would follow.

I see very little benefit in making the parsing layer any more than the
tokenizer that Pod::Parser is unless you intend to support, as part of the
parser, some sort of guesswork layer.  Given that you've previously
indicated strong opposition to guesswork, I don't understand what you're
getting at.  It takes all of 10 lines of code in a formatter to reject
unknown escapes and commands, which is the primary difference.  I'm not
interested in having the main parser parse L<> sequences for me; I already
wrote a well-tested parser that does exactly that, shortly after you wrote
perlpodspec.
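
As a rough illustration of how small that check can be, here is a sketch
in Python rather than Perl.  It does not use Pod::Parser's API; the
function name check_paragraph and the error format are hypothetical, and
it assumes the caller passes only command and ordinary paragraphs (not
verbatim blocks).  The command and formatting-code names are the ones
perlpodspec defines.

```python
import re

# Command names and formatting-code letters defined by perlpodspec.
KNOWN_COMMANDS = {"pod", "cut", "head1", "head2", "head3", "head4",
                  "over", "item", "back", "begin", "end", "for", "encoding"}
KNOWN_CODES = set("BCEFILSXZ")

def check_paragraph(paragraph, line_num):
    """Return a list of error strings for unknown commands or codes."""
    errors = []
    # A command paragraph starts with "=" followed by the command name.
    m = re.match(r"=([A-Za-z]\w*)", paragraph)
    if m and m.group(1) not in KNOWN_COMMANDS:
        errors.append(f"line {line_num}: unknown command ={m.group(1)}")
    # A capital letter directly followed by "<" opens a formatting code.
    for code in re.findall(r"([A-Z])<", paragraph):
        if code not in KNOWN_CODES:
            errors.append(f"line {line_num}: unknown formatting code {code}<>")
    return errors
```

A formatter built this way would call the check once per paragraph and
either report the errors or collect them for later.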

> I considered being that somebody, but as I read Pod::Parser's
> documentation and source, I decided it /really would/ be easier to start
> over.

I can understand why it would be easier for you to start over, but that
doesn't make it easier for the *rest* of us when you've finished starting
over and want us all to switch to a completely different module,
infrastructure, and interface.

Perhaps the reason why this seems peculiar to me is because I maintain
some of the few *actively* maintained POD formatters that use Pod::Parser.
Many of the other Pod::Parser-based formatters are largely unmaintained,
and many of the POD formatters use their own parsers or parsers written by
the same person as the formatter.

> It has been about a year since I finished writing perlpodspec, and I've
> seen no signs that anyone is interested in modernizing Pod::Parser.  You
> and one or two other people have said "it'd be nice if someone did", but
> there has been a universal expression of disinterest in actually doing
> it.

Again, I think this is directly related to the advantages in doing so not
being entirely obvious.

> Incidentally, it is my opinion that changing Pod::Man to use Pod::Simple
> is a much more important (and vastly easier) task than bringing
> Pod::Parser into the 90's, much less the 00's.  Are you interested in
> trying to make Pod::Man use Pod::Simple?  If dealing with Pod::Parser's
> interfaces is too hard for you, I could probably find someone else who
> knows enough *roff to pull it off.  But I'd like it to happen sooner
> rather than later.

So, basically, what you're saying is that after I've spent the past three
years completely rewriting pod2man, responding to bug reports on
perl5-porters, adding many changes to improve the output and make the
result more portable on multiple platforms, and turning it into a useful
tool for projects well outside of just Perl (and responding to their
feature requests and bug reports as well), the obvious solution to you if
I don't like the style of your rewritten POD formatter is to fork the
module and throw away what expertise I offer in continuing to maintain and
improve the formatter?

I suppose that means that I've done my job rather well as maintainer if
you think that would be so easy.

Let's come back again, for a moment, to the fact that Pod::Man works.  It
takes POD and it converts it to manual pages that can be viewed or
printed, it's widely used not only by Perl but by many other projects as
well, and it has an active maintainer who, I might add, fairly quickly
fixed the issues with its compliance with perlpodspec that came up after
the specification was written.

Perhaps a better way to go forward here would be for you to comment
specifically on the places where you feel that Pod::Man does not comply
with perlpodspec?

The specific problems that I'm aware of, reading the perlpodspec in Perl
5.8.0, are:

 * Unicode text input is not handled.  This is a difficult problem for
   Pod::Man that isn't going to get less difficult by switching parsers.
   I already have some ideas about how to handle this down the road, but
   any real solution is going to require limiting the portability of
   the generated *roff output, which means it's going to have to be
   optional to do anything with Unicode characters other than suppress
   them, unfortunately.

 * A variety of different input encodings isn't handled cleanly.
   This includes literal high-bit ISO 8859-1 characters in verbatim
   blocks.  This is a subproblem of the above.

 * Pod::Man does not support the full current HTML entity list.  Adding
   recognition isn't particularly hard; doing something sensible with all
   of them while still being portable to the lowest common denominator
   *roff engine is rather more difficult.

 * Errors can't be suppressed and turned into a section of the document.
   This can be easily added, however; the necessary hook is in
   Pod::Parser.

 * Pod::Man doesn't correctly make decisions about the type of =over/=back
   list by inspecting the entire thing before formatting the output, and
   therefore occasionally guesses incorrectly.  This is the only place
   where changing parsers would provide some benefit, so far as I can
   tell.  Incidentally, the insistence that any numbered list always start
   with 1 is completely unnecessary for Pod::Man, and for that matter for
   most output formats including HTML.

 * Pod::Man treats multiple =item tags without an intervening paragraph as
   multiple tags for the same paragraph.  perlpodspec is simply incorrect
   on this point.  Changing Pod::Man's behavior here would make the
   formatted output noticeably worse in many cases, including cases
   within Perl's own documentation.  This behavior was adopted
   specifically at the request of users, and I'm not changing it back.  I
   made this clear at the time that perlpodspec was being written.

 * A variety of relatively obscure errors are not diagnosed as described
   in perlpodspec.
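
The =item behavior described in the list above can be sketched as
follows.  This is an illustrative Python sketch, not Pod::Man's code; the
(kind, text) event format and the name group_items are assumptions made
for the example.

```python
def group_items(events):
    """Group consecutive =item tags so they share the body that follows.

    events is a sequence of (kind, text) pairs, where kind is "item" for
    an =item tag and anything else for a body paragraph.
    Returns a list of (tags, body_paragraphs) pairs.
    """
    groups = []
    last_was_item = False
    for kind, text in events:
        if kind == "item":
            # Start a new group only if the previous event was not an =item.
            if not last_was_item:
                groups.append(([], []))
            groups[-1][0].append(text)
            last_was_item = True
        else:
            # Body text before any =item gets a group with no tags.
            if not groups:
                groups.append(([], []))
            groups[-1][1].append(text)
            last_was_item = False
    return groups
```

With this grouping, "=item -a" immediately followed by "=item --all" and
then one paragraph comes out as a single entry with two tags.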

So, of these problems, unless I'm missing something significant, the only
advantage to switching to Pod::Simple would be slightly better handling of
some =over/=back lists, something that could also be solved other ways.
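
To illustrate what inspecting the whole =over/=back list before choosing
its type might look like, here is a small Python sketch.  The
classification rules are a simplification chosen for the example, not a
restatement of perlpodspec or of either parser, and classify_list is a
hypothetical name.

```python
import re

def classify_list(items):
    """Pick one list type after looking at every =item argument.

    items holds the text that follows each "=item" in one list.
    """
    if not items:
        # No =item at all: treated here as a plain indented block.
        return "block"
    if all(item == "*" or item.startswith("* ") for item in items):
        return "bullet"
    # A number, an optional period, then optional text after whitespace.
    if all(re.fullmatch(r"\d+\.?(\s.*)?", item) for item in items):
        return "number"
    return "text"
```

A formatter that decides from the first =item alone would call a list
beginning "1." numbered even if a later item is plain text; looking at
every item first avoids that particular wrong guess.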

This is why I'm unclear on why this is such a high priority to you.

The other issues are ones that I plan on working on as I find the time,
and that of course I would welcome other people's help with.  I've not
seen any reason to date to expect that converting to Pod::Simple will
cause more people to offer to help than help now; intimate knowledge of
*roff isn't particularly common no matter what parsing infrastructure one
is using.  The primary outstanding issue is handling of non-ASCII
characters, which is simply hard, and will remain hard.

> Okay, give me compelling arguments for why I should make it be a good
> reference implementation, rather than just work well and have a good
> API.  Bearing in mind that I'm doing all the actual work here, I
> emphasize the word "compelling".

If you want to demand that everyone switch to your parser or you'll fork
their code, or if you want to put your parser into Perl core and remove
Pod::Parser, I think that's the standard that you should attempt to meet.
Obviously I'm not the Perl pumpking, and they may have different opinions
on that score.

> I'm torn between saying "funny, that's what I thought when I read
> Pod::Parser", and "funny, that's what I thought when I read the perl
> source".

Perl 5 source is not something that I would put forward to anyone as a
model of maintainable code, and I believe that's the near-unanimous
opinion of the people who are maintaining it as well.  It's reasonably
readable (outside of the regex engine) once you understand the complex
layer of macros sitting on top of it, but the learning curve is steep.
The coding standards for Perl 6 are significantly different as a result.

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
