Sorry this is so long. This idea comes up every so often, and I don't remember the last time it was possible to lay all the issues on the table in a single message.

On Thu, Jan 03, 2002 at 02:03:48PM -0700, Sean M. Burke wrote:
> So I've been thinking many Deep Thoughts lately about Pod.
>
> I have competing goals in the design of Pod as a document format:

OK.

> The first and foremost goal is the absolute requirement that Pod be
> sufficient for easily writing text documentation, and that its semantics be
> simple enough for all its constructs to be easily translatable into any
> sane markup language or typesetting system.

OK.

> The second goal is that Pod be extensible enough that you could use it as a
> sort of "Huffman-coding for XML" [...]

Er, um, uh, why?

And which definition of XML are you using? The simple definition of a well-formed tagged document where block and inline tags are conflated, or the whole big tur^Wshiny ball of metal that includes schemata, hyperlinking, namespaces and whatnot?

It would be really nice if the second didn't induce carpal tunnel syndrome. It would be sufficient if the first was explicitly targeted as a goal for an extension of Pod. It would be an interesting hack to target the second goal, but one that would probably never be adopted outside a small group of extremists (see http://www.yaml.org, sml-dev, etc.).

Getting back to the first goal, remember that Pod is a *formatting* language, and XML is simply a grammar. Yes, a grammar, not a format. The ideal use case for XML begins with an XML vocabulary that completely divorces structure from presentation, and forces presentation to be handled by Some Other Program(tm).

Here, Pod is like HTML, *TeX and *roff, only less so. It's a good lowest common denominator format that is roughly interchangeable between these other formats. But Pod's greatest strength is also its greatest weakness:

    =head1 Author and Copyright Information

    Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
    All rights reserved. [...]

Or even this:

    =head1 SYNOPSIS

    B<perl> S<[ B<-sTuU> ]> S<[ B<-hv> ] [ B<-V>[:I<configvar>] ]>
        S<[ B<-cw> ] [ B<-d>[:I<debugger>] ] [ B<-D>[I<number/list>] ]>
        S<[ B<-pna> ] [ B<-F>I<pattern> ] [ B<-l>[I<octal>] ] [ B<-0>[I<octal>] ]>
        S<[ B<-I>I<dir> ] [ B<-m>[B<->]I<module> ] [ B<-M>[B<->]I<'module...'> ]>
        S<[ B<-P> ]> S<[ B<-S> ]> S<[ B<-x>[I<dir>] ]> S<[ B<-i>[I<extension>] ]>
        S<[ B<-e> I<'command'> ] [ B<--> ] [ I<programfile> ] [ I<argument> ]...>

Keep in mind that Pod is designed to communicate directly to a human audience via a formatting system of some type. Ideally, a Huffman-coded XML version of these documents would allow the author to specify:

  - who are the authors?
  - where can they be contacted?
  - when was the document copyrighted?
  - under what terms the document was copyrighted? (GPL, LGPL, AL)
  - what is being protected by this copyright statement? (this document? a module? a distribution?)
  - what is B<perl> here? A piece of boldfaced text, or a program name?
  - Is this string of text formatted literally, or is it something more structured (e.g. a command synopsis typically found at the beginning of a manpage)?
  - what is B<-d>? Is it an operator that tests for the presence of directories? Is it a command line switch? A boldfaced negative d?
  - what do I<configvar> and I<debugger> mean exactly? Are they optional parameters? Are the preceding colons required when they appear? Italicized words?
  - What exactly is a pattern, as described by B<-F>I<pattern>? a regex? A swatch of cloth? A literal string?
  - Is B<-hv> one switch or two? If two, are they always used together? Do they serve similar functions? Do they serve

DocBook allows most of these questions to be answered explicitly; formatting these items as bold, plaintext, italics, concatenated into paragraphs, etc. is necessarily handled by a stylesheet that associates these formatting properties with specific types of text.

If Pod were to solve this problem in a Pod-like manner, then it would intuit most of the answers to these questions much like it intuits literal sections (or overeagerly intuits "ls(1)" to be shorthand for "the ls(1) manpage"). The grammar would be nasty, the parser would have a lot of special cases, but the complexity would be where it belongs -- with the parser, not with the documentation format.

There are two core issues here that have always been conflated in Pod:

  - Pod is a simple authoring syntax; the syntax has been oversimplified to better match the problem domain of authoring documents, and the complexity has been shoved to the parser (rather than the other way around, e.g. XML)

  - Pod is a compact formatting language specifically designed for writing documentation that is directly targeted at a human audience

It's the second point that's the thorn. It's also the second point that gives the 80/20 benefit. The first point is a holy grail.

> (as I remember Larry once expressing the
> idea, altho I'm quoting from memory, as I can't now find the exact
> message).

I've heard what Larry has to say about XML, and I don't think he fully groks the difference between presentational markup and structural markup. Pod (the language) simply isn't structural (quibbles about =head1 vs. =sect1 notwithstanding).

That isn't to say that Pod can't do a better job, or that Pod can't be extended ever so slightly to hit an 80/20 spot in a much wider domain. But Pod as defined in perlpod and perlpodspec is quite simply a formatting language.

So, from here on out, I'll talk about the possibilities of extending the Pod *syntax* to be a Huffman coding for XML. This means keeping command paragraphs as block tags, formatting codes as inline tags, paragraph parsing, and (possibly) verbatim paragraphs -- and ignoring the semantic behaviors of =head1, C<>, B<> and the like.

The best idea Larry has blessed so far is the =use clause that optionally makes Pod behave differently.
That is, the definition of Pod as a syntax is pretty stable, the definition of the formatting semantics is sacrosanct, but the definition of new semantics is up for discussion. Use of the standard semantics in a new context is similarly up for discussion.

Here's a proposal (from Ilya, by way of Sean):

> So instead of:
>
> The <emphasis>destructor</emphasis> (<function>DESTROY</function>) for the
> object <literal>$b</literal> will be called...
>
> You would have something like:
>
> =equate M emphasis,B
>
> =equate U function,C
>
> =equate T literal,C
>
> and then anytime later...
>
> The M<destructor> (U<function>) for the object T<$b> will be called...

The obvious issue here is that there are only so many uppercase ASCII characters. I wrote up some ideas for TPC5 about an experiment to create a new Pod language using an extension of basic Pod syntax. The jump from m/[A-Z]<+.../ to m/[A-Za-z]{1,2}<+.../ isn't that great. Furthermore, ln<> or link<> is so much more intuitive and self-describing than L<>. (I experimented with multiple spellings of the same tag name: em<>, emph<>, emphasis<> => <emphasis>...</emphasis>; lit<>, literal<> => <literal>...</literal>; but 2-4 characters seemed sufficient, as Larry has said on many occasions.)

The =equate proposal only addresses formatting codes, and surely new block codes will be necessary to extend beyond =head1 and =begin/=end. If Pod is to be a Huffman encoding for XML, then we can do much better than =begin table/=end table "compressing" <table>...</table>. :-)

Presumably, a similar mechanism could be created to define new command paragraphs, both "standalone" commands like =head1 and "block" commands like =begin/=end. However, this spontaneously recreates the problems of *roff (and perhaps TeX) that have been solved by SGML and XML.

James Clark wrote groff because there was no GNU replacement for *roff, and because it seemed like a good idea at the time.
However, in the process of working so deeply with the *roff language, James found that conflating the macro language with the formatting language (or markup language if you prefer) really makes things quite difficult to maintain, and just isn't as expressive as it ought to be. The experience drove him to SGML, and to write sp/jade/etc. (He discusses this in a recent interview in Dr. Dobb's.)

Like many XML folks, I trust James implicitly when it comes to markup languages; if he says that adding a macro facility such as =equate is a bad idea in a markup language, then it's a Bad Idea(tm).

James has also been pretty down on XML DTDs of late, but the more I look at the issue of extending Pod, the more the SGML DTD looks like the best idea worth stealing.

Consider this:

  - Pod (the syntax) parsers do nothing more than convert a document in the Pod syntax into a series of events or a tree (the two most popular parsing APIs).

  - Pod (the language) formatters take this tree of events (possibly validated through a Pod checker to barf on illegal constructs such as E<0 1 2 3>) and process these events/trees into something else (new Pod documents, HTML, spell-checked Pod, word-count summaries, etc.)

Given that, consider:

  - A Pod formatter that comes across a =use clause may load a Perl module that contains:

      - code for processing new linguistic constructs (e.g. =list, =table, =bibliography)

      - code for formatting them appropriately (?)

Note that the "DTD" in this case is actually Perl code that defines these new tags/"linguistic constructs", but also contains the code to process them. This is similar to the SGML DTD in that a document cannot be parsed without the document definition, but improves upon it in that it is not simply a declaration of what is valid, but also the validator itself.

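A rough sketch of that split between the syntax layer and the vocabulary loaded by =use, written in Python only to keep it short and runnable (the idea above has the vocabulary living in a Perl module). Everything named here is hypothetical: the VOCABULARIES table, both function names, and the assumption that each command maps to an XML child element of the same name. The only detail taken from this message is the p5ee.component target and its <p5ee-component> element.

```python
from xml.sax.saxutils import escape

# Hypothetical stand-in for the modules a =use clause would load.
# Each vocabulary names its root element and the commands it accepts.
VOCABULARIES = {
    "p5ee.component": {
        "root": "p5ee-component",
        "commands": {"name", "prereq", "version", "author",
                     "maintainer", "copyright"},
    },
}

SAMPLE = """\
=use p5ee.component

=name ElfinSword::OrcFinder

=version 1.0

=author Keebler the Immortal Elf

=copyright GPL

=pod

=head1 NAME
"""


def parse_commands(pod_text):
    """Syntax layer: yield (command, text) for each command paragraph."""
    for paragraph in pod_text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph.startswith("="):
            continue  # ordinary and verbatim paragraphs are ignored here
        parts = paragraph[1:].split(None, 1)
        if not parts:
            continue
        text = " ".join(parts[1].split()) if len(parts) > 1 else ""
        yield parts[0], text


def format_as_xml(pod_text):
    """Language layer: validate against the =use'd vocabulary, emit XML."""
    vocabulary = None
    lines = []
    for command, text in parse_commands(pod_text):
        if command == "use":
            if text not in VOCABULARIES:
                raise ValueError(f"unknown =use target: {text}")
            vocabulary = VOCABULARIES[text]
            lines.append(f"<{vocabulary['root']}>")
        elif vocabulary is None:
            continue  # standard Pod; out of scope for this sketch
        elif command == "pod":
            break  # standard Pod resumes after =pod
        elif command not in vocabulary["commands"]:
            raise ValueError(f"={command} is not valid in this vocabulary")
        else:
            lines.append(f"  <{command}>{escape(text)}</{command}>")
    if vocabulary is not None:
        lines.append(f"</{vocabulary['root']}>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_as_xml(SAMPLE))
```

The point of the sketch is only that parse_commands knows nothing about what =name or =version mean; all of that lives in the vocabulary, which both rejects unknown commands and decides what to emit.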
Using an invalid target in a =use clause is obviously an error; support of the =use clause is mandatory for an "extensible Pod" formatter, but completely optional for a standard Pod formatter (thus maintaining Pod the formatting language as sacrosanct).

If that's the case, then we can have documents such as this:

    =use p5ee.component

    =name ElfinSword::OrcFinder

    =interface [...]

    =prereq ElfinSword ElfinMagic OrcFinder

    =version 1.0

    =author Keebler the Immortal Elf

    =maintainer Frodo Baggins <[EMAIL PROTECTED]>

    =copyright GPL

    =pod

    =head1 ....

No, it's not a simple document.
Yes, it's easy to understand, easy to create, easy to validate, easy to process.
Yes, it's more explicit and contains much more metadata than jrandom.pod *for a specific problem domain*.
No, it's not an extension of the Pod formatting language, it's a new language that reuses the Pod syntax.
No, it's not a replacement for DocBook.
Yes, it's a replacement for Carpal-Tunnel-P5EE-ML.
Yes, it's better than punting with

    =for xml <p5ee-component>...</p5ee-component>

From here, it's a SMOP to add support for other XML constructs, such as

    PIs
    comments
    attributes
    namespaces (for specific problem domains)
    deeply nested blocks
    complex markup

All that's required is a "validator" that recognizes Pod constructs that map to these XML constructs, as well as the appropriate formatters to emit them. As always, the formatter for a hypothetical p5ee Pod format is an exercise left to the reader, much like the formatter for FooML.

> While I realize that these are all "problems" not with Pod, but with the
> attempt to allow use Pod as a shorthand for XML. But like I say, if
> there's some way to kill many birds with one stone (without requiring that
> stone be the size of Ireland, be in hyperspace, and/or be made of
> neutronium), it'd be nice to do, so that we could spread around the
> numminess of Pod!
>
> Thoughts, anyone?

Yes. Keep up the good work.

Z.
