On 2013-08-12 12:22-0700 Alan W. Irwin wrote:

> [Because SGML is dead] I have decided to try xmlto (starting later today) to
> see whether we can use it instead of those SGML [DocBook backend] tools.

To Andrew and Orion:

Sorry this is long, but I have achieved quite a few results since I
wrote the above just yesterday so this e-mail is packed with goodness.  :-)

I discovered a documentation issue with the SGML backend tools:
Table 3.4 (which is designed to show how #g<normal character> maps to
Greek characters) currently renders as gibberish (see
http://plplot.sourceforge.net/docbook-manual/plplot-html-5.9.9/characters.html#greek).
This problem obviously occurred for our last release (done by Hazen
with whatever Debian release he was using at that time) and also
occurs now on my Debian wheezy platform.  It is some sort of
regression in the SGML html backend since we did not have this problem
for older releases.  This regression and the errors that Orion is
experiencing when generating our DocBook-based documentation on a
cutting-edge Linux distribution are both symptoms of the lack of
maintenance of the SGML backend tools for many years now.  Therefore,
I think it is long past time to move from the SGML backend tools to
XML backend tools to put generation of documentation from our DocBook
source back on a solid footing.

I changed the subject line of this e-mail to something appropriate to
that project, and the following concerns my initial results with that
project.

1. xmlto initially bombed because it uses xmllint for validation, and
that validator showed there were issues with our DocBook source code.
I fixed those issues (as of revision 12482), so this is an immediate
benefit of looking into xmlto.  Specifically,

xmllint --noout --nonet --xinclude --postvalid --noent plplotdoc.xml

works perfectly for revision 12482.  However, I am not going to
replace our current validator, onsgmls, with xmllint.  onsgmls is less
careful than xmllint (it validated our DocBook source code both
several revisions before 12482 and also for that revision, so it did
not detect the need for these changes), but when there are validation
errors I find that xmllint is not robust, i.e., it tends to segfault.
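
To tell an xmllint segfault apart from an ordinary validation failure,
a small wrapper along the following lines might help.  This is only a
sketch: the helper names classify_returncode and run_validator are
hypothetical, and it relies on the POSIX convention that Python's
subprocess module reports death by signal N as return code -N.

```python
import subprocess

# Command line quoted from the message above.
XMLLINT_CMD = ["xmllint", "--noout", "--nonet", "--xinclude",
               "--postvalid", "--noent", "plplotdoc.xml"]


def classify_returncode(returncode):
    """Interpret a subprocess return code (hypothetical helper).

    On POSIX, subprocess reports death by signal N as -N, so a
    segfault (SIGSEGV) shows up as a negative value.
    """
    if returncode == 0:
        return "valid"
    if returncode < 0:
        return "crashed (signal %d)" % -returncode
    return "reported errors (exit status %d)" % returncode


def run_validator(cmd=XMLLINT_CMD):
    """Run the validator and classify how it ended (hypothetical helper)."""
    try:
        completed = subprocess.run(cmd)
    except FileNotFoundError:
        # The validator executable is not on PATH.
        return "validator not installed"
    return classify_returncode(completed.returncode)
```

Calling run_validator() in the directory that holds plplotdoc.xml
would then distinguish a clean run, a run that reported errors, and a
crash, without having to read the shell's exit status by hand.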

2. HTML results generated from our DocBook source.

There are promising html results from xmlto but with some caveats.

a. Extremely promising....

After revision 12482 and after running make validate in the build tree
to take care of dependencies, the command

xmlto -o html-dir html plplotdoc.xml

succeeded and also rendered Table 3.4 without issues (a big improvement
on results generated with the html SGML backend tool).

b. Caveats....

Just before Table 3.4 on that webpage, the overline-underline example
ends up empty.  Furthermore, the colour-coded API examples in the API
chapter now appear in a much blander format with no colour coding, and
the filenames for the html bits and pieces are arbitrarily numerical
(e.g., html-dir/ch19s133.html) rather than the logical names you get
from the SGML backend (e.g., plplot-html-5.9.9/plssym.html, which
refers to the same area of the documentation as that numerical file
name generated by the above xmlto command).

All these html style issues are currently controlled for the SGML HTML
backend by a configurable DSSSL stylesheet (plplotdoc-html.dsl.in in
doc/docbook/src) supplemented by the CSS stylesheet, stylesheet.css, in
that same directory.  Norman Walsh's on-line "DocBook: The Definitive
Guide" <http://docbook.org/tdg/en/html/docbook.html> (TDG, copyright
2003, last updated in 2006) covers DSSSL stylesheets in some detail in
Chapter 4, but remarks that (a) few tools honour DSSSL and (b) DSSSL
stylesheets are actually SGML documents (which means the paucity of
open-source tools that can deal with SGML makes life difficult for the
DSSSL approach).  It was also clear from that book that XSL stylesheets
were rapidly gaining acceptance as an alternative to DSSSL (probably
because there are so many XML tools out there in the open-source
world).  Bob Stayton wrote a chapter in that book concerning XSL which
has since expanded into its own independent book, "DocBook XSL: The
Complete Guide", 4th edition (TCG, copyright 2007)
<http://www.sagehill.net/docbookxsl/>.

Interest in DSSSL has waned further since TDG was written ~10 years
ago.  Norman Walsh has published
<http://docbook.org/tdg51/en/html/docbook.html> (copyright 2013), which
is the DocBook 5.1 variant of TDG, and Chapter 4 in that latest variant
doesn't even mention DSSSL as a publishing tool (i.e., a backend
language)!  Also, I am pretty sure that the tools actually invoked by
the xmlto script don't understand DSSSL stylesheets.  Thus, my
conclusion is that our current DSSSL stylesheets (and probably the CSS
stylesheet, stylesheet.css, as well) must be replaced by XSL
stylesheets following the methods that are documented in TCG.  Until we
do that, the style of our xmlto results is going to be quite bland.
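
As a possible starting point for that XSL work, here is a sketch that
generates a minimal XSL customization fragment.  It is untested
against our documentation and rests on assumptions: that the DocBook
XSL parameters use.id.as.filename (name each html chunk after its id
attribute) and html.stylesheet (link a CSS file) behave as TCG
describes, and that our existing stylesheet.css would be usable
unchanged.  The function name customization_layer and the output file
name mentioned below are hypothetical.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Assumed DocBook XSL parameters (see lead-in); values are strings.
PARAMS = {
    "use.id.as.filename": "1",            # logical names such as plssym.html
    "html.stylesheet": "stylesheet.css",  # reuse the existing CSS file
}


def customization_layer(params):
    """Return an XSL stylesheet fragment that sets the given parameters."""
    ET.register_namespace("xsl", XSL_NS)
    root = ET.Element("{%s}stylesheet" % XSL_NS, {"version": "1.0"})
    for name, value in params.items():
        # Each parameter becomes <xsl:param name="...">value</xsl:param>.
        param = ET.SubElement(root, "{%s}param" % XSL_NS, {"name": name})
        param.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(customization_layer(PARAMS))
```

If the printed fragment were saved as, say, plplotdoc-html.xsl, then
my reading of the xmlto man page is that it could be applied with
"xmlto -m plplotdoc-html.xsl -o html-dir html plplotdoc.xml".  This
would not address the loss of colour coding in the API examples, and
logical file names would only appear for sections that carry an id
attribute.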

3. Print (PDF) results generated from our DocBook source.
Here too, there are promising results but also some caveats.

a. Extremely promising....

After revision 12482 and after running make validate in the build tree
to take care of dependencies, the command

xmlto --with-fop pdf plplotdoc.xml

succeeded and also rendered Table 3.4 without issues.  (For example,
the small number of glyphs that are missing from the SGML PDF backend
results are present here.)

b. Caveats....

xmlto pdf plplotdoc.xml

errors out (see https://bugzilla.redhat.com/show_bug.cgi?id=949087 where
there doesn't seem to be any quick solution for this default pdf issue) and

xmlto --with-dblatex pdf plplotdoc.xml

succeeds but does not give good Table 3.4 results.  So avoid these
variants of the xmlto command for pdf (which is easy to do, but I
thought I had better remark on it here).

Another much more important caveat is that all the style issues that
occurred for html with xmlto also occur for pdf.  So the same remarks
about moving to XSL stylesheets apply here as well.

4. Print (PostScript) results generated from our DocBook source.
Here too, there are promising results but also some caveats.

a. Extremely promising....

After revision 12482 and after running make validate in the build tree
to take care of dependencies, the command

xmlto --with-fop ps plplotdoc.xml

succeeded and also rendered Table 3.4 without issues (as does the
PostScript SGML backend).

b. Caveats....

Both

xmlto ps plplotdoc.xml

and

xmlto --with-dblatex ps plplotdoc.xml

error out so avoid these variants for ps (which is easy to do, but I
thought I had better remark on it here).

Another much more important caveat is that all the style issues that
occurred for html and pdf also occur for ps.  So the same remarks
about moving to XSL stylesheets apply here as well.

5. Print (dvi) results generated from our DocBook source.
Here there are slightly promising results but also some strong caveats.

a. Slightly promising....

After revision 12482 and after running make validate in the build tree
to take care of dependencies, the command

xmlto --with-dblatex dvi plplotdoc.xml

succeeded without obvious error messages if and only if I locally
replaced the Greek entities in math.ent by their equivalent Math
symbol Unicode values, e.g., Unicode x391 (Greek capital alpha)
changed to Unicode x1D6A8.  (Note that probably any value that was
unrecognizable to this backend would have worked for these entities.)

b. Caveats....

Both

xmlto dvi plplotdoc.xml

and

xmlto --with-fop dvi plplotdoc.xml

error out (with or without the changed math.ent) so avoid these
variants for dvi (which is easy to do, but I thought I had better
remark on it here).

Another caveat concerns the dvi result produced by the one
(--with-dblatex) variant of xmlto that works for dvi above.  The
entities (mostly Math symbols for the Greek letters) defined by the
locally replaced math.ent were all meaningless to the tools invoked by
--with-dblatex (probably because of the large numerical Unicode index
for the Math symbol variants of the Greek letters).  So the resulting
Table 3.4 printed out the entities verbatim (e.g., &#x1D6A8; rather
than the Greek letter, capital alpha).  (This issue also occurred for
--with-dblatex for pdf which is why that variant should be avoided in
the pdf case, see above.)  The SGML backend dvi results do not have
this issue.  I presume there is some sort of bug in dblatex concerning
propagating entities to dvi that is avoided if you use large
unrecognizable (at least for this XML dvi backend) Unicode indices.
So this is a pretty crummy dvi result which depends on internal
details to avoid other bugs in the XML dvi backend.

An additional less serious caveat is that all the style issues that
occurred for html also occur for dvi.  So the same remarks about
moving to XSL stylesheets apply here as well.
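
The invocations that worked in items 2 through 5 can be collected in
one place, for instance like this.  It is a sketch only: the function
name xmlto_command is hypothetical, and the table simply records which
variant succeeded on my platform for each output format.

```python
# Backend option that worked for each output format in the tests above.
# None means plain xmlto with no --with-* option.
WORKING_BACKEND = {
    "html": None,
    "pdf": "--with-fop",
    "ps": "--with-fop",
    "dvi": "--with-dblatex",  # only with the locally modified math.ent
}


def xmlto_command(fmt, source="plplotdoc.xml", output_dir=None):
    """Build the xmlto argument list that worked for the given format."""
    if fmt not in WORKING_BACKEND:
        raise ValueError("no tested xmlto variant for format %r" % fmt)
    cmd = ["xmlto"]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    backend = WORKING_BACKEND[fmt]
    if backend is not None:
        cmd.append(backend)
    cmd += [fmt, source]
    return cmd


if __name__ == "__main__":
    for fmt in WORKING_BACKEND:
        print(" ".join(xmlto_command(fmt)))
```

Keeping the choice of backend in a single table like this would make
it easy to change one entry if, for example, the default pdf variant
gets fixed upstream.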

Other remarks:

I have not looked yet at man and info results with xmlto, but
apparently they are possible (which would complete the
backend set of tools that we need).

Also, according to documentation available on the web, XML is almost
completely (with just a few necessary exceptions) utf8 aware, so I
tried the experiment of inserting the utf8 code for a gamma (i.e., "γ"
if your mailer is utf8 aware) right into math.xml, and

xmlto --with-fop pdf plplotdoc.xml

filled out the appropriate bit of Table 3.4 in the resulting
plplotdoc.pdf with no issues.  So this constitutes a proof-of-concept
that numerical entities such as "&#x3B3;" (or the decimal equivalent
"&#947;") that define the "&gamma;" entity in math.ent could be
replaced by the utf8 code for gamma, "γ", and so on for all the other
Greek letters.
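
If we did go that way, the replacement could be automated.  The
following is a minimal sketch, assuming the entity declarations use
the usual hexadecimal or decimal numeric character reference syntax;
the function name expand_char_refs is hypothetical.  It leaves
references to XML-significant characters alone, since writing those
literally inside an entity value would change the meaning of the
declaration.

```python
import re

# Matches &#xHHHH; (hexadecimal) or &#DDDD; (decimal) references.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

# Characters that must stay escaped inside an entity value.
_KEEP_ESCAPED = set("&<>\"'%")


def expand_char_refs(text):
    """Replace numeric character references by the literal characters."""
    def _substitute(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        char = chr(code)
        # Keep the original reference for XML-significant characters.
        return match.group(0) if char in _KEEP_ESCAPED else char
    return _CHAR_REF.sub(_substitute, text)


if __name__ == "__main__":
    print(expand_char_refs('<!ENTITY gamma "&#x3B3;">'))
```

The converted file would have to be written out as utf8 for the
literal characters to be read back correctly.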

In sum, xmlto is looking pretty good right now (except for problematic
dvi results and the replacement of DSSSL stylesheets by XSL stylesheets
that would have to be made in the future).  So assuming I can get man
and info to work with the xmlto approach, I would likely deprecate the
SGML backend tools: by default (unless -DDEPRECATED_SGML_BACKEND=ON was
specified) those building our documentation would get the xmlto backend
results.  That would give us a chance to work on XSL stylesheets
(using the TCG reference above) to bring the style of our xmlto
backend results up to, or beyond, the style of our current SGML
backend results.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

_______________________________________________
Plplot-devel mailing list
https://lists.sourceforge.net/lists/listinfo/plplot-devel
