Sean,

I like the idea. However, it seems to me that we both did up to 70% of the
same work, i.e. parsing POD and turning it into a more structured
representation. See Pod::Compiler for what I did on this
subject. Attaching an XML backend to it should be a matter of one day max.
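
To show why I think such a backend is small, here is a minimal sketch
(in Python, purely for illustration). The nested (tag, attributes,
children) tree is a made-up stand-in, not Pod::Compiler's actual data
structure, and the function names are hypothetical; only the element
names (pod, head1, p, c, link/xref) come from Sean's draft DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical parsed-POD tree: (tag, attributes, children).
# Children are plain strings (text) or nested tuples (elements).
# This is an assumption for illustration, not a real module's format.
TREE = ("pod", {}, [
    ("head1", {}, ["NAME"]),
    ("p", {}, ["See ", ("link", {"xref": "perlref"}, []),
               " and ", ("c", {}, ["use strict"]), "."]),
])

def to_element(node):
    """Recursively convert one (tag, attrs, children) node to an Element."""
    tag, attrs, children = node
    elem = ET.Element(tag, attrs)
    last = None  # most recently appended child element, if any
    for child in children:
        if isinstance(child, str):
            if last is None:
                # Text before the first child element belongs to elem.text
                elem.text = (elem.text or "") + child
            else:
                # Text after a child element is that child's tail
                last.tail = (last.tail or "") + child
        else:
            last = to_element(child)
            elem.append(last)
    return elem

def to_pxml(tree):
    """Serialize the tree; empty elements are written as <link ...></link>."""
    return ET.tostring(to_element(tree), encoding="unicode",
                       short_empty_elements=False)

if __name__ == "__main__":
    print(to_pxml(TREE))
```

Escaping of &, < and > in text and attribute values is left to the XML
library here, which is most of what a backend has to get right.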

I guess that you first build up an internal object-based XML
representation and then write the code. Is the module suited for just
reading POD and returning the internal representation? Then I can just
throw away Pod::Compiler after checking whether there is something to be
merged into Pod::PXML. Do you report errors in the POD syntax? Do you check
whether internal L<>'s are valid?
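
By "valid" I mean roughly the following check, sketched here in Python
for illustration only. It is deliberately simplified: it assumes links
contain no nested formatting codes, ignores the L<< ... >> form, and
treats any =head1..=head4 or =item text as a possible target. The
function name is hypothetical and this is not what Pod::Compiler or
Pod::PXML actually does.

```python
import re

# Headings and items that an internal link may point at.
HEADING_RE = re.compile(r"^=(?:head[1-4]|item)\s+(.+?)\s*$", re.MULTILINE)
# Simplified link pattern: no nested <...> inside the link.
LINK_RE = re.compile(r"L<([^<>]+)>")

def broken_internal_links(pod):
    """Return the section names of internal links with no matching target."""
    targets = {m.group(1) for m in HEADING_RE.finditer(pod)}
    broken = []
    for m in LINK_RE.finditer(pod):
        dest = m.group(1).split("|", 1)[-1]  # drop the optional "text|" part
        if not dest.startswith("/"):
            continue  # points at another document; not checked here
        section = dest[1:].strip('"')        # L</"Sec"> and L</Sec> both occur
        if section not in targets:
            broken.append(section)
    return broken

if __name__ == "__main__":
    sample = (
        "=head1 NAME\n\nfoo\n\n"
        "=head1 SEE ALSO\n\n"
        'L</NAME>, L<text|/"SEE ALSO">, L</MISSING>, L<perlref>\n'
    )
    print(broken_internal_links(sample))
```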

Cheers,

Marek


On Tue, 17 Apr 2001, Sean M. Burke wrote:

SMB>I've been working like a madman for the past few days, coming up with
SMB>a pod2xml and matching xml2pod.  (Why?  Because it's not there.)
SMB>
SMB>The idea of these xml2pod/pod2xml (both wrapped up in a not-yet-released
SMB>Pod::PXML) is to freely go between a conventional POD document and an XML
SMB>representation of it -- write in whichever you want, convert to whichever
SMB>you need.
SMB>I came up with a DTD for the XML doctype I had in mind, and then wrote
SMB>the pod2xml and xml2pod to work for that doctype.
SMB>
SMB>(This whole endeavor, BTW, is unrelated to Matt Sergeant's Pod::XML
SMB>module.)
SMB>
SMB>The pod2xml and xml2pod that I've come up with seem to work.
SMB>I'm as surprised as anyone else.
SMB>
SMB>I've been testing them on various POD files to be found in the Perl dist
SMB>and elsewhere.  I convert each POD document to XML, check the XML against
SMB>the DTD I wrote, convert that XML to POD, and then diff that against the
SMB>original POD.  So far the XML validates, and diffing the original pod
SMB>against the round-tripped pod is showing only representational
SMB>differences, like between these two:
SMB>
SMB> C<< (?>...) >> 
SMB> C<(?E<gt>...)>
SMB>
SMB>which both equally well represent the text "(?>...)" in code style.
SMB>
SMB>I've attached the DTD below, and I'd be happy to hear your reactions,
SMB>before I go releasing Pod::PXML in a few days.
SMB>And after the DTD, I've included a sample file, the result of
SMB>feeding strict.pm thru my pod2xml.
SMB>
SMB>
SMB>One point on which I'm a bit unsure is whether I'm right in forbidding
SMB>"head1"..."head4" elements as children of "list" elements.  I've found
SMB>a few cases of head1 under list elements, but all but one or two of them
SMB>are pretty clearly typos.
SMB>
SMB>Also, the treatment of L<...> elements is a bit strange, basically
SMB>because the interpretation of bar in L<foo|bar> is a mess, itself.
SMB>I've tried to make the XML representation no crazier than the Perl
SMB>representation, and to allow for flawless roundtripping.
SMB>
SMB>
SMB>===========================================================================
SMB>
SMB>DRAFT EXPERIMENTAL DTD:
SMB>
SMB><!-- It so happens that all the content-models came out
SMB>  as either (#PCDATA), (#PCDATA | foo | bar)*, or
SMB>  (foo | bar)*.  That makes validation simple.
SMB>-->
SMB>
SMB><!ELEMENT pod (head1|head2|head3|head4|p|pre|list|for|cmd)*>
SMB><!ATTLIST pod
SMB>  xmlns CDATA #FIXED 'http://www.perl.com/CPAN/authors/id/S/SB/SBURKE/pxml_0.01.dtd'
SMB>>
SMB>
SMB><!ENTITY % Style " b | i | c | x | f | s | link ">
SMB>
SMB><!-- ==== BLOCK-LEVEL ELEMENTS ==== -->
SMB>
SMB><!ELEMENT head1 (#PCDATA | %Style; )* >
SMB><!ELEMENT head2 (#PCDATA | %Style; )* >
SMB><!ELEMENT head3 (#PCDATA | %Style; )* >
SMB><!ELEMENT head4 (#PCDATA | %Style; )* >
SMB><!ELEMENT p     (#PCDATA | %Style; )* > <!-- a normal paragraph -->
SMB>
SMB><!ELEMENT pre (#PCDATA) >  <!-- a verbatim/preformatted paragraph -->
SMB><!ATTLIST pre xml:space (preserve) #FIXED 'preserve' >
SMB>
SMB><!ELEMENT list (item|p|pre|list|for|cmd)* > <!-- "=over ...stuff... =back" -->
SMB><!ATTLIST list indent CDATA #IMPLIED > <!-- where you put the 8 in "over 8" -->
SMB><!-- sound advice: have at least one p, pre, or item per list -->
SMB><!-- sane advice: if the first thing in the list is a p, don't have items -->
SMB>
SMB><!ELEMENT item (#PCDATA | %Style;)*  > <!-- "=item ...label..." -->
SMB>
SMB><!ELEMENT for (#PCDATA) > <!-- I guess? -->
SMB><!ATTLIST for target CDATA #IMPLIED >
SMB> <!-- where you put the thing in "=begin thing" -->
SMB>
SMB><!-- Hm, maybe I should delete this element... -->
SMB><!ELEMENT cmd (#PCDATA) > <!-- I guess? -->
SMB><!ATTLIST cmd cname CDATA #IMPLIED >
SMB> <!-- where you put the thing in "=thing" -->
SMB>
SMB><!-- ==== STYLE ELEMENTS ==== -->
SMB>
SMB><!ELEMENT link (#PCDATA | %Style;)* >
SMB><!ATTLIST link
SMB> xref    CDATA #REQUIRED
SMB>><!--
SMB> Yes, every link element MUST have an xref attribute.
SMB>   xref="HTML::Tree"
SMB>   xref="HTML::Tree/That Item"
SMB>   xref="HTML::Tree/&quot;That Section&quot;"
SMB>   xref="chmod(3)/That Item"
SMB>   xref="/&quot;That Section In This Document&quot;"
SMB> See perlpod for an explanation of these designators.
SMB> Note that what perlpod says goes before the | in an L<...|designator>
SMB>  actually is, in PXML, the content of the link element, not an attribute.
SMB> So this is quite wrong: <link xref="foo|bar">...</link>
SMB>  Instead have: <link xref="bar">foo</link>.
SMB> Note that you can have an empty link element: <link xref="bar"></link>
SMB>  That's the PXML representation of POD L<bar>.
SMB>  Neither the PXML nor the POD form is advisable, tho, as they both
SMB>  let the formatter decide what the link text should be; and current
SMB>  formatters are inconsistent.
SMB>-->
SMB><!-- validity constraint: don't nest your link elements -->
SMB>
SMB><!-- And now the other style elements... -->
SMB><!ELEMENT b (#PCDATA | %Style;)* > <!-- bold -->
SMB><!ELEMENT i (#PCDATA | %Style;)* > <!-- italic -->
SMB><!ELEMENT c (#PCDATA | %Style;)* > <!-- code (monospace) -->
SMB><!ELEMENT f (#PCDATA | %Style;)* > <!-- filename (monospace) -->
SMB><!ELEMENT x (#PCDATA | %Style;)* > <!-- index-point -->
SMB><!ELEMENT s (#PCDATA | %Style;)* > <!-- spaces are nonbreaking -->
SMB><!-- validity constraint: don't nest your x elements -->
SMB><!-- note: no, there's nothing in PXML corresponding to Z<> -->
SMB>
SMB><!-- ===== CHARACTER ENTITIES ===== -->
SMB>
SMB><!-- Two specials defined by perlpod, in case you want them... -->
SMB><!ENTITY sol  "&#47;">   <!-- solidus -->
SMB><!ENTITY vbar "&#124;">  <!-- vertical bar -->
SMB>
SMB><!-- And then all the W3C HTML entities that you know and love... -->
SMB>
SMB><!ENTITY % HTMLlat1 PUBLIC
SMB>   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
SMB>   "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
SMB>%HTMLlat1;
SMB><!ENTITY % HTMLspecial PUBLIC
SMB>   "-//W3C//ENTITIES Special for XHTML//EN"
SMB>   "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
SMB>%HTMLspecial;
SMB><!ENTITY % HTMLsymbol PUBLIC
SMB>   "-//W3C//ENTITIES Symbols for XHTML//EN"
SMB>   "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
SMB>%HTMLsymbol;
SMB>
SMB><!-- And they all lived happily ever after. -->
SMB>
SMB>
SMB>
SMB>===========================================================================
SMB>And now a sample document:
SMB>
SMB>
SMB><?xml version='1.0' encoding='iso-8859-1'?>
SMB><!DOCTYPE pod PUBLIC "-//Sean Michael Burke//DTD PXML 0.01//EN"
SMB> "http://www.perl.com/CPAN/authors/id/S/SB/SBURKE/pxml_0.01.dtd">
SMB><pod xmlns="http://www.perl.com/CPAN/authors/id/S/SB/SBURKE/pxml_0.01.dtd">
SMB>
SMB><head1>NAME</head1>
SMB>
SMB><p>strict - Perl pragma to restrict unsafe constructs</p>
SMB>
SMB><head1>SYNOPSIS</head1>
SMB>
SMB><pre>
SMB>    use strict;
SMB></pre>
SMB>
SMB><pre>
SMB>    use strict "vars";
SMB>    use strict "refs";
SMB>    use strict "subs";
SMB></pre>
SMB>
SMB><pre>
SMB>    use strict;
SMB>    no strict "vars";
SMB></pre>
SMB>
SMB><head1>DESCRIPTION</head1>
SMB>
SMB><p>If no import list is supplied, all possible restrictions are assumed.
SMB>(This is the safest mode to operate in, but is sometimes too strict for
SMB>casual programming.)  Currently, there are three possible things to be
SMB>strict about:  "subs", "vars", and "refs".</p>
SMB>
SMB><list indent="6">
SMB>
SMB><item><c>strict refs</c></item>
SMB>
SMB><p>This generates a runtime error if you 
SMB>use symbolic references (see <link xref="perlref"></link>).</p>
SMB>
SMB><pre>
SMB>    use strict 'refs';
SMB>    $ref = \$foo;
SMB>    print $$ref;    # ok
SMB>    $ref = "foo";
SMB>    print $$ref;    # runtime error; normally ok
SMB>    $file = "STDOUT";
SMB>    print $file "Hi!";      # error; note: no comma after $file
SMB></pre>
SMB>
SMB><item><c>strict vars</c></item>
SMB>
SMB><p>This generates a compile-time error if you access a variable that wasn't
SMB>declared via "our" or <c>use vars</c>,
SMB>localized via <c>my()</c>, or wasn't fully qualified.  Because this is to avoid
SMB>variable suicide problems and subtle dynamic scoping issues, a merely
SMB>local() variable isn't good enough.  See <link xref="perlfunc/my"></link> and
SMB><link xref="perlfunc/local"></link>.</p>
SMB>
SMB><pre>
SMB>    use strict 'vars';
SMB>    $X::foo = 1;     # ok, fully qualified
SMB>    my $foo = 10;    # ok, my() var
SMB>    local $foo = 9;  # blows up
SMB></pre>
SMB>
SMB><pre>
SMB>    package Cinna;
SMB>    our $bar;                       # Declares $bar in current package
SMB>    $bar = 'HgS';           # ok, global declared via pragma
SMB></pre>
SMB>
SMB><p>The local() generated a compile-time error because you just touched a global
SMB>name without fully qualifying it.</p>
SMB>
SMB><p>Because of their special use by sort(), the variables $a and $b are
SMB>exempted from this check.</p>
SMB>
SMB><item><c>strict subs</c></item>
SMB>
SMB><p>This disables the poetry optimization, generating a compile-time error if
SMB>you try to use a bareword identifier that's not a subroutine, unless it
SMB>appears in curly braces or on the left hand side of the "=&gt;" symbol.</p>
SMB>
SMB><pre><![CDATA[
SMB>    use strict 'subs';
SMB>    $SIG{PIPE} = Plumber;           # blows up
SMB>    $SIG{PIPE} = "Plumber";         # just fine: bareword in curlies always ok
SMB>    $SIG{PIPE} = \&Plumber;         # preferred form
SMB>]]></pre>
SMB>
SMB></list>
SMB>
SMB><p>See <link xref="perlmodlib/Pragmatic Modules"></link>.</p>
SMB>
SMB></pod>
SMB>
SMB>
SMB>===========================================================================
SMB>
SMB>
SMB>--
SMB>Sean M. Burke  [EMAIL PROTECTED]  http://www.spinn.net/~sburke/
SMB>
