Re: Pod::InputObjects / Pod::ParseTree

Marek Rouchal Sun, 04 Nov 2001 23:27:19 -0800

I'm finally starting to implement this in Pod::Man and Pod::Text, and here
are the nits that I've found so far in perlpodspec.


> Note that EE<lt>number> I<must not> be interpreted as simply "codepoint
> I<number> in the current/native character set".  It always means only
> "the character represented by codepoint I<number> in Unicode."  (This is
> identical to the semantics of &#I<number>; in XML.)

> This will likely require many formatters to have tables mapping from
> treatable Unicode codepoints (such as the "\xE9" for the e-acute
> character) to the escape sequences or codes necessary for conveying such
> sequences in the target output format.  A converter to *roff would, for
> example know that "\xE9" (whether conveyed literally, or via a
> EE<lt>...> sequence) is to be conveyed as "e\\*'".

\\*' is an escape defined by the prelude created by Pod::Man; it's not a
general nroff thing.  It is defined as:

.    ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"

and the accent is dropped in nroff mode.  Less than ideal.  With groff,
it's probably okay to just leave the literal ISO 8859-1 character, but
that isn't portable and will cause some nroff implementations to
segfault.
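
For what it's worth, the kind of table the spec is asking for could look roughly like the following. This is only a sketch in Python rather than the Perl that Pod::Man uses; the names ROFF_ESCAPES and to_roff are made up for illustration, the single table entry is the e-acute example quoted above, and the "?" fallback for unmapped characters is an assumption, not something the spec says. It also ignores the other escaping a real *roff formatter needs (backslashes, leading dots).

```python
# Hypothetical table from Unicode codepoints to *roff escape sequences.
# Only the e-acute example from the spec is filled in; \*' relies on the
# string defined in the Pod::Man prelude.
ROFF_ESCAPES = {
    0xE9: "e\\*'",
}


def to_roff(text, fallback="?"):
    """Replace non-ASCII characters with table entries (sketch only)."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            # Assumption: unmapped characters become a visible placeholder.
            out.append(ROFF_ESCAPES.get(code, fallback))
    return "".join(out)
```

The same lookup serves both a literal "\xE9" in the source and an E<233> code, provided the parser has already turned the code into the character.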

> =head1 About LE<lt>...E<gt> Codes

> As you can tell from a glance at L<perlpod|perlpod>, the LE<lt>...> code
> is the most complex of the Pod formatting codes.  The points below will
> hopefully clarify what it means and how processors should deal with it.

It would be good to include more of the technical syntax in here.  For
example, how should the following be parsed?

    L<"Parsing and/or Formatting">
    L<"Parsing/Formatting">
    L<"Fun Times"/"Summer Vacation">
    L< Time::HiRes >
    L< perlfunc / $. >
    L<
    perlfunc
    /
    $.
    >
    L< perlpodspec / " About LE<lt>...E<gt> Codes ">

What I'm currently implementing retains the behavior of stripping off
surrounding double-quotes from the section, as well as leading or trailing
whitespace inside or outside the double-quotes.  It also interprets L<>
entries that are fully contained in double-quotes as links to sections for
backward compatibility, not just looking at internal whitespace.

I'm implementing a parser that breaks down link text into the five items
described here and will include it in the next release of podlators.
(Called Pod::ParseLink.)
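
To make the rules above concrete, here is a rough sketch of that parsing in Python. It is not the Pod::ParseLink code; parse_link, URL_RE and the dictionary keys are names invented for this example, and it covers only link text, name, section and type. It assumes the URL test is applied after any "text|" part is split off, and it does not expand E<> codes.

```python
import re

# Assumption: a target counts as a URL if it looks like scheme:rest with
# no whitespace, and the character after the colon is not another colon
# (so Time::HiRes is not mistaken for a URL).
URL_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:[^:\s]\S*")
QUOTED_RE = re.compile(r'"\s*(.*?)\s*"')


def parse_link(content):
    """Split the inside of an L<> code into text, name, section, type."""
    # Fold newlines and runs of whitespace, and trim both ends.
    content = " ".join(content.split())
    text = None
    if "|" in content:
        text, content = (part.strip() for part in content.split("|", 1))
    if URL_RE.fullmatch(content):
        return {"text": text, "name": content, "section": None, "type": "url"}
    quoted = QUOTED_RE.fullmatch(content)
    if quoted:
        # Entirely inside double quotes: a section, even with a slash.
        name, section = None, quoted.group(1)
    else:
        name, slash, section = content.partition("/")
        name, section = name.strip(), section.strip()
        inner = QUOTED_RE.fullmatch(section)
        if inner:
            section = inner.group(1)
        if not slash and " " in name:
            # Old L<section> syntax: internal whitespace means a section.
            name, section = None, name
    return {"text": text, "name": name or None,
            "section": section or None, "type": "pod"}
```

With this sketch, L< perlfunc / $. > and its multi-line form both give name "perlfunc" and section "$.", and L<"Parsing and/or Formatting"> gives a section with no name. What it does with L<"Fun Times"/"Summer Vacation"> falls out of the whole-string quote test and is not meant as an answer to the question above.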

> Pod processors must now treat "text|"-less links as follows:

>   L<name>         =>  L<name|name>
>   L</section>     =>  L<"section"|/section>
>   L<name/section> =>  L<"section" in name|name/section>
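
The three rules quoted above amount to a small function. A sketch in Python (default_link_text is a made-up name, and the name and section are assumed to have been parsed out already):

```python
def default_link_text(name, section):
    """Text to display for an L<> code that has no "text|" part."""
    if section is None:
        return name                      # L<name>         => name
    if name is None:
        return f'"{section}"'            # L</section>     => "section"
    return f'"{section}" in {name}'      # L<name/section> => "section" in name
```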

This and some of the other text there outlaw rendering:

    L<http://www.eyrie.org/~eagle/>

as:

    <http://www.eyrie.org/~eagle/>

and instead require that there be nothing distinguishing the URL from the
surrounding text for output formats that can't do hypertext or don't want
to use a different font.  Was that intentional?  I think I'd rather
surround it with angle brackets for Pod::Text and Pod::Man.

> Authors wanting to link to a particular (absolute) URL, must do so only
> with "LE<lt>scheme:...>" codes (like LE<lt>http://www.perl.org>), and
> must not attempt "LE<lt>Some Site Name|scheme:...>" codes.  This
> restriction avoids many problems in parsing and rendering LE<lt>...>
> codes.

This is rather unfortunate, and I don't recall the reason for it.  Can't
we just say that no unescaped | is permitted in L<> and then allow anchor
text to be specified for regular URLs?  It would be a bit trickier to deal
with this for text renderings, but something like:

    L<Anchor Text|http://www.perl.com/>
        => Anchor Text <http://www.perl.com/>

for formats that don't support hyperlinks might work.
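
As a sketch of what I have in mind for text output (Python for brevity; render_url is a made-up name, and this is the proposal above, not what the spec currently permits):

```python
def render_url(url, text=None):
    """Render a URL link for a format without hyperlinks (sketch)."""
    bracketed = f"<{url}>"
    # With anchor text, show the text followed by the bracketed URL;
    # without it, show only the bracketed URL.
    return f"{text} {bracketed}" if text else bracketed
```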

> Previous versions of perlpod allowed for a C<LE<lt>sectionE<gt>> syntax
> (as in "C<LE<lt>Object AttributesE<gt>>"), which was not easily
> distinguishable from C<LE<lt>nameE<gt>> syntax.  This syntax is no
> longer in the specification, and has been replaced by the
> C<LE<lt>"section"E<gt>> syntax (where the quotes were formerly
> optional).  Pod parsers should tolerate the C<LE<lt>sectionE<gt>>
> syntax, for a while at least.  The suggested heuristic for
> distinguishing C<LE<lt>sectionE<gt>> from C<LE<lt>nameE<gt>> is that if
> it contains any whitespace, it's a I<section>.  Pod processors may warn
> about this being deprecated syntax.

This is where to say something about L<"Parsing/Formatting"> if we want to.

> Authors of Pod formatters are reminded that "=over" ... "=back" may map
> to several different constructs in your output format.  For example, in
> converting Pod to (X)HTML, it can map to any of <ul>...</ul>,
> <ol>...</ol>, <dl>...</dl>, or <blockquote>...</blockquote>.

Hm.  The blockquote stuff is new, but makes sense.  *time passes*  Wow,
that was a mess to implement in *roff.  Gah.

> Pod formatters I<must> tolerate arbitrarily large amounts of text in the
> "=item I<text...>" paragraph.

I think this is the wrong approach.  Why require this?

> But they may be arbitrarily long:

>   =item For transporting us beyond seas to be tried for pretended
>   offenses

>   =item He is at this time transporting large armies of foreign
>   mercenaries to complete the works of death, desolation and
>   tyranny, already begun with circumstances of cruelty and perfidy
>   scarcely paralleled in the most barbarous ages, and totally
>   unworthy the head of a civilized nation.

Why would someone use item tags for that?  What is that accomplishing that
can't be better accomplished by using some other form of markup?  How do
you expect HTML translators to turn that into an <a name=""> tag?  How are
you going to refer to that in an L<> code?

The only way to implement this in *roff is to stop using the .Ip macros
entirely.  That seems like a really bad idea to me.

> But (for the foreseeable future), Pod does not provide any way for Pod
> authors to distinguish which grouping is meant by the above
> "=item"-cluster structure.  So formatters should format it like so:

>   Neque

>   Porro

>   Quisquam Est

>     Qui dolorem ipsum quia dolor sit amet, consectetur, adipisci
>     velit, sed quia non numquam eius modi tempora incidunt ut
>     labore et dolore magnam aliquam quaerat voluptatem.

No, I strongly disagree.

Multiple =item tags in a row have been clustered items for years; see
oodles and oodles of examples in perlfunc.  I won't implement it as above,
particularly given that the current implementation of clustering was a
specifically requested feature (and by users outside the perl5-porters
community as well as users within it).  It's not ambiguous; the item tags
are clustered.

If one really wants to render an =item tag with an empty paragraph, use
Z<> as the paragraph text.
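
The clustering I mean can be sketched like this, in Python and over an invented event format of ("item", text) and ("para", text) pairs; cluster_items is a made-up name and this is not the podlators code:

```python
def cluster_items(paragraphs):
    """Group consecutive =item tags so that they share one body."""
    clusters = []
    tags, body = [], []
    for kind, text in paragraphs:
        if kind == "item":
            if body:
                # An item after body text starts a new cluster.
                clusters.append((tags, body))
                tags, body = [], []
            tags.append(text)
        else:
            body.append(text)
    if tags or body:
        clusters.append((tags, body))
    return clusters
```

Given the three items and one paragraph of the example quoted above, this yields a single cluster with tags Neque, Porro, and Quisquam Est over the one body paragraph.  An item followed by a Z<> paragraph closes its own cluster, because the Z<> paragraph counts as body text.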

....

Overall, this is really good.  Thank you!  Except as noted above, and
except where I've missed something (and except for a bunch of the E<>
processing which will have to wait for another time), I think I now have
this all implemented and it will be in the next release of podlators.

-- 
Russ Allbery ([EMAIL PROTECTED])             <http://www.eyrie.org/~eagle/>
