Sean M Burke <[EMAIL PROTECTED]> writes:

> I have a feeling this is something entirely peculiar to you, and your
> unique combination of inexperience with modern markup language parsing
> approaches, and wealth of experience with Pod::Parser's very strange
> approach to, well, everything, including the very concept of parsing
> itself.

There are some parts of this that are likely peculiar to me, such as the fact that I don't have experience with the current in-vogue method of doing XML parsing. I've never thought of POD as being particularly like XML, but I do understand the basic theory of event-stream parsing and why people would find that convenient. I don't dispute that Pod::Parser has a steep learning curve; it took me quite a while to figure out how to make it do exactly what I wanted.

What is not peculiar to me is an aversion to changing running code built on stable and tested infrastructure, and I think you're understating this portion of the problem.

It's certainly possible to retarget Pod::Text, Pod::Man, and friends to use Pod::Simple as the parser. I see how I could do that. It's a significant restructuring, and I expect that the result will have regressions with respect to the current output simply because it's a large change. What I'm missing is the compelling case for doing this.

I was initially excited about a new POD parser, because Pod::Parser isn't exactly obvious to understand and particularly for Pod::Man I had to fight it in some places to get it to do what I wanted. Unfortunately, the new parser, while it may well improve the ease of writing HTML formatters considerably, does nothing for me for Pod::Man. I'm still going to have to fight the parser in different ways to get correct output, and the structural changes are extensive. That's a lot of work. Pod::Man works right now.

If you had maintained an interface compatible with Pod::Parser, existing formatters could have taken advantage of whatever new features there are (and again, I'm unclear on what those features might be at present) without having to do that work and incur the resulting instability.

> If you can articulate what it is you like about Pod::Parser's interface,
> I think I can make a thin wrapper around Pod::Simple that will make it
> work that way in that particular respect.
> Would you like if I made something like the Pod::Simple::Methody class
> that uses the return value of methods as the code that is to be sent to
> the output channel, and/or if it had some notion of "the output text of
> the paragraph in progress"? I'm more than willing to accommodate all
> sorts of approaches to things. As you explain what you want, do bear in
> mind that it's been a while since I've been conversant with
> Pod::Parser's interface.

That's a very reasonable request, and I'll try to think about that.

The problems that I'm specifically running into with the direction that you're going are that the POD translators that I maintain work right now and are extremely stable. They have very few outstanding issues; the only significant difficulties they have right now are problems handling non-ASCII character sets in some circumstances, something which is exceedingly hard to fix for Pod::Man in a portable fashion regardless of what parser you use, and which Pod::Simple doesn't yet fully address anyway. Therefore, unless the new parser interface clearly makes my life much easier as a translator author, I don't see the advantages in doing a great deal of work to switch.

So the three things that you or someone else could do for me would be any of: explaining the significant advantages of switching; making it simple to switch by supporting a backward compatibility layer so that formatters written to use the Pod::Parser interface will keep working; or providing an interface that, when I read the documentation, shows a great deal of promise for making my code easier to maintain. The last of these I can try to help with by explaining the problems that I'm facing, and I'll try to do that.

> I wouldn't mind if someone really did try to bring Pod::Parser up to
> spec, and also would improve its docs, and its interface, and to make it
> a real parser for Pod, instead of barely more than a tokenizer for the
> set of Pod-like languages.

The documentation for Pod::Parser is excellent. It clearly explains every portion of how the interface works, and furthermore the code itself is extensively commented and documented. Pod::Parser may have problems that make it difficult for people to wrap their minds around how it works, but those are interface issues, not documentation problems. The documentation standard set by Pod::Parser is one that I wish other Perl module writers would follow.

I see very little benefit in making the parsing layer any more than the tokenizer that Pod::Parser is unless you intend to support, as part of the parser, some sort of guesswork layer. Given that you've previously indicated strong opposition to guesswork, I don't understand what you're getting at. It takes all of 10 lines of code in a formatter to reject unknown escapes and commands, which is the primary difference. I'm not interested in having the main parser parse L<> sequences for me; I already wrote a well-tested parser that does exactly that, shortly after you wrote perlpodspec.

> I considered being that somebody, but as I read Pod::Parser's
> documentation and source, I decided it /really would/ be easier to start
> over.

I can understand why it would be easier for you to start over, but that doesn't make it easier for the *rest* of us when you've finished starting over and want us all to switch to a completely different module, infrastructure, and interface.

Perhaps the reason why this seems peculiar to me is because I maintain some of the few *actively* maintained POD formatters that use Pod::Parser. Many of the other Pod::Parser-based formatters are largely unmaintained, and many of the POD formatters use their own parsers or parsers written by the same person as the formatter.

> It has been about a year since I finished writing perlpodspec, and I've
> seen no signs that anyone is interested in modernizing Pod::Parser.
> You and one or two other people have said "it'd be nice if someone did",
> but there has been a universal expression of disinterest in actually
> doing it.

Again, I think this is directly related to the advantages in doing so not being entirely obvious.

> Incidentally, it is my opinion that changing Pod::Man to use Pod::Simple
> is a much more important (and vastly easier) task than bringing
> Pod::Parser into the 90's, much less the 00's. Are you interested in
> trying to make Pod::Man use Pod::Simple? If dealing with Pod::Parser's
> interfaces is too hard for you, I could probably find someone else who
> knows enough *roff to pull it off. But I'd like it to happen sooner
> rather than later.

So, basically, what you're saying is that after I've spent the past three years completely rewriting pod2man, responding to bug reports on perl5-porters, adding many changes to improve the output and make the result more portable on multiple platforms, and turning it into a useful tool for projects well outside of just Perl (and responding to their feature requests and bug reports as well), the obvious solution to you if I don't like the style of your rewritten POD formatter is to fork the module and throw away what expertise I offer in continuing to maintain and improve the formatter? I suppose that means that I've done my job rather well as maintainer if you think that would be so easy.

Let's come back again, for a moment, to the fact that Pod::Man works. It takes POD and it converts it to manual pages that can be viewed or printed, it's widely used not only by Perl but by many other projects as well, and it has an active maintainer who, I might add, fairly quickly fixed the issues with its compliance with perlpodspec that came up after the specification was written.

Perhaps a better way to go forward here would be for you to comment specifically on the places where you feel that Pod::Man does not comply with perlpodspec?

The specific problems that I'm aware of, reading the perlpodspec in Perl 5.8.0, are:

 * Unicode text input is not handled. This is a difficult problem for Pod::Man that isn't going to get less difficult by switching parsers. I already have some ideas about how to handle this down the road, but any real solution is going to require limiting the portability of the generated *roff output, which means it's going to have to be optional to do anything with Unicode characters other than suppress them, unfortunately.

 * A variety of different input encodings aren't handled cleanly. This includes literal high-bit ISO 8859-1 characters in verbatim blocks. This is a subproblem of the above.

 * Pod::Man does not support the full current HTML entity list. Adding recognition isn't particularly hard; doing something sensible with all of them while still being portable to the lowest common denominator *roff engine is rather more difficult.

 * Errors can't be suppressed and turned into a section of the document. This can be easily added, however; the necessary hook is in Pod::Parser.

 * Pod::Man doesn't correctly make decisions about the type of =over/=back list by inspecting the entire thing before formatting the output, and therefore occasionally guesses incorrectly. This is the only place where changing parsers would provide some benefit, so far as I can tell. Incidentally, the insistence that any numbered list always start with 1 is completely unnecessary for Pod::Man, and for that matter for most output formats including HTML.

 * Pod::Man treats multiple =item tags without an intervening paragraph as multiple tags for the same paragraph. perlpodspec is simply incorrect on this point. Changing Pod::Man's behavior here would make the formatted output noticeably worse in many cases, including cases within Perl's own documentation. The current behavior was adopted specifically at the request of users, and I'm not changing it back.
   I made this clear at the time that perlpodspec was being written.

 * A variety of relatively obscure errors are not diagnosed as described in perlpodspec.

So, of these problems, unless I'm missing something significant, the only advantage to switching to Pod::Simple would be slightly better handling of some =over/=back lists, something that could also be solved other ways. This is why I'm unclear on why this is such a high priority to you.

The other issues are ones that I plan on working on as I find the time, and that of course I would welcome other people's help with. I've not seen any reason to date to expect that converting to Pod::Simple will cause more people to offer help than do now; intimate knowledge of *roff isn't particularly common no matter what parsing infrastructure one is using. The primary outstanding issue is handling of non-ASCII characters, which is simply hard, and will remain hard.

> Okay, give me compelling arguments for why I should make it be a good
> reference implementation, rather than just work well and have a good
> API. Bearing in mind that I'm doing all the actual work here, I
> emphasize the word "compelling".

If you want to demand that everyone switch to your parser or you'll fork their code, or if you want to put your parser into Perl core and remove Pod::Parser, I think that's the standard that you should attempt to meet. Obviously I'm not the Perl pumpking, and they may have different opinions on that score.

> I'm torn between saying "funny, that's what I thought when I read
> Pod::Parser", and "funny, that's what I thought when I read the perl
> source".

Perl 5 source is not something that I would put forward to anyone as a model of maintainable code, and I believe that's the near-unanimous opinion of the people who are maintaining it as well. It's reasonably readable (outside of the regex engine) once you understand the complex layer of macros sitting on top of it, but the learning curve is steep.
The coding standards for Perl 6 are significantly different as a result.

--
Russ Allbery ([EMAIL PROTECTED]) <http://www.eyrie.org/~eagle/>
