Re: perlpodspec, draft 2

Sean M. Burke Tue, 02 Oct 2001 14:49:45 -0700

[Summary: There are serious semantic problems with all "=for :lang en"-like
notations, and I don't want to let it in perlpodspec until I'm sure I can
identify the least-bad of the many ways to do it.]

At 05:40 PM 2001-10-01 +0300, Jarkko Hietaniemi wrote:
>[...] Moreover, I tried searching the new perlpod and perlpodspec, but
>couldn't find much mention of the i18n/l10n issues (what if somebody
>wants to write pods in Italian?)  (Maybe I just didn't find the
>discussion.) [...]

Well, you can just write in Italian.  "Ecco la funzione foo()!"  As long as
you save in Unicode or Latin-1 you should be fine.

However, if you want to write a part in Italian that is a content-alternate
with equivalent text in English, such that English-speakers see only the
English part, and Italian-speakers see only the Italian part, then THAT
opens up a whole can of worms.  I've been thinking about those worms
lately.  They are slimy and crawly!

The message
  http:[EMAIL PROTECTED]/msg00383.html
is what got me started thinking about it in the first place.

To paraphrase it, you could write something like this:

[newlines elided for concision]
=over
=begin :lang en
=item function($filename)
This is a function.
=end :lang en
=begin :lang fr
=item function($nom_de_fichier)
C'est une fonction.
=end :lang fr
...

This introduces a whole level of complexity into Pod parsing and
processing, because:

* No longer can the first (non-=pod non-=cut) command after =over be used
to say which of the four kinds of over-regions this is.
(Yes, that's just a problem with one command, but when you have only a
half-dozen commands in the language, every one of them counts.)

* It's not backwards compatible.  Pulling English things off into "=begin
:lang en"..."=end :lang en" regions (or "=begin :lang-en"..."=end :lang-en"
or however it's expressed -- maybe I prefer :lang-en) looks to all existing
processors as if all the content just up and disappeared, and that's REALLY
bad.  It's not "degrading gracefully", to reuse a term from HTML standards.
(I can think of a way that's backwards compatible AND degrades gracefully,
but it makes some of the other problems actually worse.)

I think this means that I don't want this in the current
perlpod/perlpodspec drafts, because the current drafts are basically
best-current-practices documents, and the few new(-seeming) things that they
introduce are backwards-compatible.  Really, I don't like breaking things!

* If/when you assemble a doctree, you can no longer say that the parent of
every 'item' node is an 'over' node (or whatever you call an =over...=back
region).  In general, this could make for pretty crazy trees, whereas
before the doctrees pretty much made good sense.  For example, consider:

[newlines elided for concision]
=head1 Foo
Bar
=over 
=item *
Stuff
=item *
Thing
=back

That assembles to probably something like this:

  * head1
     * "Foo"
  * p
     * "Bar"
  * over-bullet
     * item-bullet
        * "Stuff"
     * item-bullet
        * "Thing"

Nice and straightforward.  And if someone wants to convert this to a format
where headings can't stand on their own, but have to bracket the text that
they're a heading for, then it only takes a /little/ doing.  (Like: look
for all the right siblings of each "head1" node, stopping just before the
next "head1" node.)
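
To make that "/little/ doing" concrete, here is a rough sketch in Python
(not in any existing Pod module).  The (tag, children) tuples and the
sections() helper are my own assumptions for illustration, not anything
perlpodspec defines:

```python
# Hypothetical doctree: each node is (tag, children); text leaves are strings.
doctree = [
    ("head1", ["Foo"]),
    ("p", ["Bar"]),
    ("over-bullet", [
        ("item-bullet", ["Stuff"]),
        ("item-bullet", ["Thing"]),
    ]),
]

def sections(nodes):
    """Group each head1 with the right siblings that follow it,
    stopping just before the next head1."""
    result = []
    current = None
    for node in nodes:
        if node[0] == "head1":
            current = {"heading": node, "body": []}
            result.append(current)
        elif current is not None:
            current["body"].append(node)
        # Nodes before the first head1 are simply dropped in this sketch.
    return result

grouped = sections(doctree)
```

This only works so simply because every head1 is a top-level sibling of the
things it governs.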

However, consider:

[newlines elided for concision]
=begin :lang fr
=head1 Les Fonctions Publiques
Ces fonctions sont la pour vous!
=end :lang fr
=begin :lang en
=head1 Public Functions
These functions are for you!
=end :lang en
Bar
=over

=begin :lang fr
=item *
Des trucs
=end :lang fr
=begin :lang en
=item *
Stuff
=end :lang en

=begin :lang fr
=item *
Les choses
=end :lang fr
=begin :lang en
=item *
Things
=end :lang en

=back

The tree for that looks like this:

  * for (lang fr)
     * head1
        * "Les Fonctions Publiques"
     * p
        * "Ces fonctions sont la pour vous!"
  * for (lang en)
     * head1
        * "Public Functions"
     * p
        * "These functions are for you!"
  * over-bullet
     * for (lang fr)
        * item-bullet
           * "Des trucs"
     * for (lang en)
        * item-bullet
           * "Stuff"
     * for (lang fr)
        * item-bullet
           * "Les choses"
     * for (lang en)
        * item-bullet
           * "Things"

I think that makes for a rather harder-to-traverse doctree -- and our
example operation of "find the nodes governed by this =head1 node" has now
become really much more difficult.

Moreover, as you process these things (whether in a doctree or in an event
model), you can no longer get away with treating for/begin...end regions by
saying "is their target tag 'rtf', because I'm an rtf formatter!", and if
so, then process the nodes under it, otherwise prune.  Once you
introduce a "=for lang langtag", you instead have to see whether it's
"lang", and if so, see whether that's a tag of any of the languages you've
been told to process for (which maybe MUST be listed in the VERSION
section?  Or even before that?).
(BTW, I /think/ that a ":lang-fr" syntax instead solves some of these
problems.)
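
Roughly, the difference in the keep-or-prune test looks like this.  This is
a Python sketch under my own assumptions: keep_region() and
keep_region_single() are hypothetical names, and the target is assumed to
arrive already split into words:

```python
def keep_region(target_words, formats, langs):
    """Two-word targets like ["lang", "fr"]: the formatter has to
    special-case "lang" and consult a second set of accepted languages."""
    if not target_words:
        return False
    head = target_words[0].lstrip(":")   # ":lang" and "lang" treated alike
    if head == "lang":
        return len(target_words) > 1 and target_words[1] in langs
    return head in formats

def keep_region_single(target, accepted):
    """Single-token targets like "lang-fr": presumably the test goes back
    to plain membership in one target-set."""
    return target.lstrip(":") in accepted
```

With the single-token form, "lang-en" is just one more member of the same
set as "rtf", which is what I mean by it possibly solving some of this.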

Now, it could be that the relationship of these things to doctree models
can be ironed out by saying that to get a doctree, you /have/ to say what
targets (like "rtf, lang-en, private") you're formatting for.  I strongly
resist that idea, partly because it carries the stink of preprocessors (which
are bad news for just about any language they touch, /especially/ markup
languages) and also because it really destroys one of my long-term goals
(and Larry's too, unless he changed his mind since 1999), which is that Pod
should be a notational variant of (and losslessly expressible as) a subset
of XML.  And currently it IS!  Pod::PXML, while still experimental (and
probably requiring minor revision in light of perlpodspec), makes it so.
Except possibly for  in XML, you can convert POD to XML
and back without losing any information.  And that is true /only/ because
you can take a Pod document and get THE doctree, not just A doctree for my
target-set.  (And you can take a PXML document's doctree, and make /the/
Pod doctree from it.)

An alternate approach is that there are two kinds of trees:  one is THE
doctree, and has all these things like piles of "for" elements and whatnot,
as with the bigger tree shown above; and then you say "now, make this a
happy simple doctree by throwing out anything that's not in my target-set,
which is qw~lang-en rtf~".  And then the simplifier goes thru and promotes
some nodes, and destroys others, so that what you're left with is one of
the N-possible handy simplified doctrees, like this (for lang-en):

  * head1
     * "Public Functions"
  * p
     * "These functions are for you!"
  * over-bullet
     * item-bullet
        * "Stuff"
     * item-bullet
        * "Things"
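
A simplifier along those lines might look something like the following
Python sketch.  The node shapes, the "for:lang-en" tag spelling, and the
simplify() name are all assumptions of mine for illustration:

```python
# Hypothetical full doctree: nodes are (tag, children); a for/begin region
# is tagged "for:<target>"; text leaves are plain strings.
full_tree = [
    ("for:lang-fr", [("head1", ["Les Fonctions Publiques"]),
                     ("p", ["Ces fonctions sont la pour vous!"])]),
    ("for:lang-en", [("head1", ["Public Functions"]),
                     ("p", ["These functions are for you!"])]),
    ("over-bullet", [
        ("for:lang-fr", [("item-bullet", ["Des trucs"])]),
        ("for:lang-en", [("item-bullet", ["Stuff"])]),
        ("for:lang-fr", [("item-bullet", ["Les choses"])]),
        ("for:lang-en", [("item-bullet", ["Things"])]),
    ]),
]

def simplify(nodes, targets):
    """Return a new node list with "for" regions resolved against targets."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)                  # text leaf, keep as-is
            continue
        tag, children = node
        if tag.startswith("for:"):
            if tag[len("for:"):] in targets:
                # Promote: splice the region's children into its parent.
                out.extend(simplify(children, targets))
            # Otherwise destroy: the whole region is pruned.
        else:
            out.append((tag, simplify(children, targets)))
    return out

simple_en = simplify(full_tree, {"lang-en", "rtf"})
```

Note that simplify() builds a new tree and leaves full_tree alone, so THE
doctree survives and any of the simplified ones can be derived from it on
demand.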

However, this dichotomy between The Doctree for a document and a usable
doctree for your target-set is still upsetting.
But maybe there's no way around it (or something very much like it).

I think it'll take a lot more thought (and me sitting in diners and
scribbling lots of code on the back of napkins), and I don't want to
prematurely add it to current perlpodspec until I'm sure of how it needs to
happen.  For the moment, I want the current perlpodspec (once I tidy up a
few loose ends people have commented on) to basically be
backwards-compatible and a best-current-practices document.

--
Sean M. Burke  [EMAIL PROTECTED]  http://www.spinn.net/~sburke/
