pod2xml/xml2pod (Pod::PXML)

Sean M. Burke Tue, 17 Apr 2001 02:36:04 -0700
I've been working like a madman for the past few days, coming up with
a pod2xml and matching xml2pod.  (Why?  Because it's not there.)

The idea of these xml2pod/pod2xml (both wrapped up in a not-yet-released
Pod::PXML) is to freely go between a conventional POD document and an XML
representation of it -- write in whichever you want, convert to whichever
you need.
I came up with a DTD for the XML doctype I had in mind, and then wrote
the pod2xml and xml2pod to work for that doctype.

(This whole endeavor, BTW, is unrelated to Matt Sergeant's Pod::XML
module.)

The pod2xml and xml2pod that I've come up with seem to work.
I'm as surprised as anyone else.

I've been testing them on various POD files to be found in the Perl dist
and elsewhere.  I convert each POD document to XML, check the XML against
the DTD I wrote, convert that XML to POD, and then diff that against the
original POD.  So far the XML validates, and diffing the original pod
against the round-tripped pod is showing only representational
differences, like between these two:

 C<< (?>...) >> 
 C<(?E<gt>...)>

which both equally well represent the text "(?>...)" in code style.
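
To make "representational difference" concrete, here is a minimal sketch
(in Python, purely for illustration; it is not part of Pod::PXML) that
reduces a single, non-nested C<...> sequence to the text it stands for.
The function name code_text and the tiny ESCAPES table are assumptions
made up for this example, and real POD parsing handles far more cases.

```python
import re

# Only the two escapes needed for this illustration; perlpod defines more.
ESCAPES = {"gt": ">", "lt": "<"}

def code_text(seq):
    """Return the plain text of one non-nested C<...> or C<< ... >> sequence."""
    seq = seq.strip()
    # Doubled-bracket form: C<< text >> with whitespace inside the brackets.
    doubled = re.fullmatch(r"C(<{2,})\s+(.*?)\s+(>{2,})", seq, re.S)
    if doubled and len(doubled.group(1)) == len(doubled.group(3)):
        inner = doubled.group(2)
    else:
        single = re.fullmatch(r"C<(.*)>", seq, re.S)
        if single is None:
            raise ValueError(f"not a C<...> sequence: {seq!r}")
        inner = single.group(1)
    # Expand E<name> escapes to the characters they stand for.
    return re.sub(r"E<(\w+)>", lambda m: ESCAPES[m.group(1)], inner)

a = code_text("C<< (?>...) >>")
b = code_text("C<(?E<gt>...)>")
print(a == b, a)
```

Both spellings reduce to the same string, which is why a plain diff flags
them even though nothing was lost in the round trip.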

I've attached the DTD below, and I'd be happy to hear your reactions,
before I go releasing Pod::PXML in a few days.
And after the DTD, I've included a sample file, the result of
feeding strict.pm thru my pod2xml.


One point on which I'm a bit unsure is whether I'm right in forbidding
"head1"..."head4" elements as children of "list" elements.  I've found
a few cases of head1 under list elements, but all but one or two of them
are pretty clearly typos.
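
A rough way to hunt for such cases in already-converted documents is
sketched below, in Python and only as an illustration.  The helper names
(local, heads_under_lists) are invented for this example; it assumes the
XML is well-formed and only looks at direct children of list elements.

```python
import xml.etree.ElementTree as ET

HEADS = {"head1", "head2", "head3", "head4"}

def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def heads_under_lists(root):
    """Return local names of headN elements that are direct children of a list."""
    found = []
    for parent in root.iter():
        if local(parent.tag) == "list":
            found.extend(local(c.tag) for c in parent if local(c.tag) in HEADS)
    return found

doc = ET.fromstring(
    "<pod><head1>NAME</head1>"
    "<list><item>one</item><head1>OOPS</head1></list></pod>"
)
print(heads_under_lists(doc))
```

Only the head1 inside the list is reported; the top-level one is fine.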

Also, the treatment of L<...> elements is a bit strange, basically
because the interpretation of bar in L<foo|bar> is a mess, itself.
I've tried to make the XML representation no crazier than the Perl
representation, and to allow for flawless roundtripping.
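
The basic mapping can be sketched like this (Python, for illustration
only; link_to_pxml is a hypothetical helper, not the code in Pod::PXML).
It assumes the link text contains no nested formatting codes and no
literal "|", which real POD does not guarantee.

```python
from xml.sax.saxutils import escape, quoteattr

def link_to_pxml(pod_link):
    """Convert the inside of an L<...> code to a PXML link element."""
    text, sep, target = pod_link.partition("|")
    if not sep:
        # L<bar> has no explicit text: the whole thing is the designator.
        text, target = "", pod_link
    return f"<link xref={quoteattr(target)}>{escape(text)}</link>"

print(link_to_pxml("foo|bar"))
print(link_to_pxml("perlref"))
```

The part before the | becomes element content and the designator goes,
untouched, into the xref attribute, so converting back is a matter of
gluing the two together again.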


===========================================================================

DRAFT EXPERIMENTAL DTD:

<!-- It so happens that all the content-models came out
  as either (#PCDATA), (#PCDATA | foo | bar)*, or
  (foo | bar)*.  That makes validation simple.
-->

<!ELEMENT pod (head1|head2|head3|head4|p|pre|list|for|cmd)*>
<!ATTLIST pod
  xmlns CDATA #FIXED 'http://www.perl.com/CPAN/authors/id/S/SB/SBURKE/pxml_0.01.dtd'
>

<!ENTITY % Style " b | i | c | x | f | s | link ">

<!-- ==== BLOCK-LEVEL ELEMENTS ==== -->

<!ELEMENT head1 (#PCDATA | %Style; )* >
<!ELEMENT head2 (#PCDATA | %Style; )* >
<!ELEMENT head3 (#PCDATA | %Style; )* >
<!ELEMENT head4 (#PCDATA | %Style; )* >
<!ELEMENT p     (#PCDATA | %Style; )* > <!-- a normal paragraph -->

<!ELEMENT pre (#PCDATA) >  <!-- a verbatim/preformatted paragraph -->
<!ATTLIST pre xml:space (preserve) #FIXED 'preserve' >

<!ELEMENT list (item|p|pre|list|for|cmd)* > <!-- "=over ...stuff... =back" -->
<!ATTLIST list indent CDATA #IMPLIED > <!-- where you put the 8 in "over 8" -->
<!-- sound advice: have at least one p, pre, or item per list -->
<!-- sane advice: if the first thing in the list is a p, don't have items -->

<!ELEMENT item (#PCDATA | %Style;)*  > <!-- "=item ...label..." -->

<!ELEMENT for (#PCDATA) > <!-- I guess? -->
<!ATTLIST for target CDATA #IMPLIED >
 <!-- where you put the thing in "=begin thing" -->

<!-- Hm, maybe I should delete this element... -->
<!ELEMENT cmd (#PCDATA) > <!-- I guess? -->
<!ATTLIST cmd cname CDATA #IMPLIED >
 <!-- where you put the thing in "=thing" -->

<!-- ==== STYLE ELEMENTS ==== -->

<!ELEMENT link (#PCDATA | %Style;)* >
<!ATTLIST link
 xref    CDATA #REQUIRED
><!--
 Yes, every link element MUST have an xref attribute.
   xref="HTML::Tree"
   xref="HTML::Tree/That Item"
   xref="HTML::Tree/&quot;That Section&quot;"
   xref="chmod(3)/That Item"
   xref="/&quot;That Section In This Document&quot;"
 See perlpod for an explanation of these designators.
 Note that what perlpod says goes before the | in an L<...|designator>
  actually is, in PXML, the content of the link element, not an attribute.
 So this is quite wrong: <link xref="foo|bar">...</link>
  Instead have: <link xref="bar">foo</link>.
 Note that you can have an empty link element: <link xref="bar"></link>
  That's the PXML representation of POD L<bar>.
  Neither the PXML nor the POD form is advisable, tho, as they both
  let the formatter decide what the link text should be; and current
  formatters are inconsistent.
-->
<!-- validity constraint: don't nest your link elements -->

<!-- And now the other style elements... -->
<!ELEMENT b (#PCDATA | %Style;)* > <!-- bold -->
<!ELEMENT i (#PCDATA | %Style;)* > <!-- italic -->
<!ELEMENT c (#PCDATA | %Style;)* > <!-- code (monospace) -->
<!ELEMENT f (#PCDATA | %Style;)* > <!-- filename (monospace) -->
<!ELEMENT x (#PCDATA | %Style;)* > <!-- index-point -->
<!ELEMENT s (#PCDATA | %Style;)* > <!-- spaces are nonbreaking -->
<!-- validity constraint: don't nest your x elements -->
<!-- note: no, there's nothing in PXML corresponding to Z<> -->

<!-- ===== CHARACTER ENTITIES ===== -->

<!-- Two specials defined by perlpod, in case you want them... -->
<!ENTITY sol  "&#47;">   <!-- solidus -->
<!ENTITY vbar "&#124;">  <!-- vertical bar -->

<!-- And then all the W3C HTML entities that you know and love... -->

<!ENTITY % HTMLlat1 PUBLIC
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLspecial PUBLIC
   "-//W3C//ENTITIES Special for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
%HTMLspecial;
<!ENTITY % HTMLsymbol PUBLIC
   "-//W3C//ENTITIES Symbols for XHTML//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
%HTMLsymbol;

<!-- And they all lived happily ever after. -->
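
Because every content model above is (#PCDATA), a mixed-content group, or
a plain (foo | bar)* group, checking a document comes down to a table
lookup per element.  Below is a minimal sketch of that idea in Python; it
is not the validator I used.  It assumes element names without a
namespace prefix, ignores attributes and entities, and reads "don't nest"
as "not at any depth", which is my interpretation for this example.

```python
import xml.etree.ElementTree as ET

STYLE = {"b", "i", "c", "x", "f", "s", "link"}
BLOCKS = {"head1", "head2", "head3", "head4", "p", "pre", "list", "for", "cmd"}

# element name -> (allowed child elements, is character data allowed?)
MODEL = {
    "pod": (BLOCKS, False),
    "list": ({"item", "p", "pre", "list", "for", "cmd"}, False),
    "pre": (set(), True), "for": (set(), True), "cmd": (set(), True),
}
for name in STYLE | {"head1", "head2", "head3", "head4", "p", "item"}:
    MODEL[name] = (STYLE, True)

def has_text(elem):
    """True if elem has non-whitespace character data directly inside it."""
    chunks = [elem.text] + [child.tail for child in elem]
    return any(c and c.strip() for c in chunks)

def validate(elem, inside=()):
    """Return a list of error strings for elem and its descendants."""
    errors = []
    children, text_ok = MODEL[elem.tag]
    if not text_ok and has_text(elem):
        errors.append(f"text not allowed in <{elem.tag}>")
    # The two constraints the DTD can only state in comments.
    if elem.tag in ("link", "x") and elem.tag in inside:
        errors.append(f"nested <{elem.tag}>")
    for child in elem:
        if child.tag not in children:
            errors.append(f"<{child.tag}> not allowed in <{elem.tag}>")
        else:
            errors.extend(validate(child, inside + (elem.tag,)))
    return errors

good = ET.fromstring(
    "<pod><head1>NAME</head1><p>see <link xref='perlref'></link></p></pod>")
bad = ET.fromstring(
    "<pod><list><head1>X</head1></list>"
    "<p><link xref='a'><b><link xref='b'>t</link></b></link></p></pod>")
print(validate(good))
print(validate(bad))
```

The second document trips both a content-model error (head1 under list)
and one of the comment-only constraints (a link inside a link).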



===========================================================================
And now a sample document:


<?xml version='1.0' encoding='iso-8859-1'?>
<!DOCTYPE pod PUBLIC "-//Sean Michael Burke//DTD PXML 0.01//EN"
 "http://www.perl.com/CPAN/authors/id/S/SB/SBURKE/pxml_0.01.dtd">
<pod xmlns="http://www.perl.com/CPAN/authors/id/S/SB/SBURKE/pxml_0.01.dtd">

<head1>NAME</head1>

<p>strict - Perl pragma to restrict unsafe constructs</p>

<head1>SYNOPSIS</head1>

<pre>
    use strict;
</pre>

<pre>
    use strict "vars";
    use strict "refs";
    use strict "subs";
</pre>

<pre>
    use strict;
    no strict "vars";
</pre>

<head1>DESCRIPTION</head1>

<p>If no import list is supplied, all possible restrictions are assumed.
(This is the safest mode to operate in, but is sometimes too strict for
casual programming.)  Currently, there are three possible things to be
strict about:  "subs", "vars", and "refs".</p>

<list indent="6">

<item><c>strict refs</c></item>

<p>This generates a runtime error if you 
use symbolic references (see <link xref="perlref"></link>).</p>

<pre>
    use strict 'refs';
    $ref = \$foo;
    print $$ref;        # ok
    $ref = "foo";
    print $$ref;        # runtime error; normally ok
    $file = "STDOUT";
    print $file "Hi!";  # error; note: no comma after $file
</pre>

<item><c>strict vars</c></item>

<p>This generates a compile-time error if you access a variable that wasn't
declared via "our" or <c>use vars</c>,
localized via <c>my()</c>, or wasn't fully qualified.  Because this is to avoid
variable suicide problems and subtle dynamic scoping issues, a merely
local() variable isn't good enough.  See <link xref="perlfunc/my"></link> and
<link xref="perlfunc/local"></link>.</p>

<pre>
    use strict 'vars';
    $X::foo = 1;         # ok, fully qualified
    my $foo = 10;        # ok, my() var
    local $foo = 9;      # blows up
</pre>

<pre>
    package Cinna;
    our $bar;                   # Declares $bar in current package
    $bar = 'HgS';               # ok, global declared via pragma
</pre>

<p>The local() generated a compile-time error because you just touched a global
name without fully qualifying it.</p>

<p>Because of their special use by sort(), the variables $a and $b are
exempted from this check.</p>

<item><c>strict subs</c></item>

<p>This disables the poetry optimization, generating a compile-time error if
you try to use a bareword identifier that's not a subroutine, unless it
appears in curly braces or on the left hand side of the "=&gt;" symbol.</p>

<pre><![CDATA[
    use strict 'subs';
    $SIG{PIPE} = Plumber;       # blows up
    $SIG{PIPE} = "Plumber";     # just fine: bareword in curlies always ok
    $SIG{PIPE} = \&Plumber;     # preferred form
]]></pre>

</list>

<p>See <link xref="perlmodlib/Pragmatic Modules"></link>.</p>

</pod>


===========================================================================


--
Sean M. Burke  [EMAIL PROTECTED]  http://www.spinn.net/~sburke/