On 2013-08-12 12:22-0700 Alan W. Irwin wrote:

> [Because SGML is dead] I have decided to try xmlto (starting later
> today) to see whether we can use it instead of those SGML [DocBook
> backend] tools.

To Andrew and Orion:

Sorry this is long, but I have achieved quite a few results since I
wrote the above just yesterday so this e-mail is packed with
goodness. :-)

I discovered a documentation issue with the SGML backend tools which
is that Table 3.4 (which is designed to show how #g<normal character>
maps to Greek characters) currently gives gibberish (see
http://plplot.sourceforge.net/docbook-manual/plplot-html-5.9.9/characters.html#greek).
This problem obviously occurred for our last release (done by Hazen
with whatever Debian release he was using at that time) and also
occurs now with my Debian wheezy platform. (This is some sort of
regression in the SGML html backend since we did not have this
problem for older releases.)

This regression and also the errors that Orion is experiencing with
generating our DocBook-based documentation for a cutting-edge Linux
distribution are all symptoms of the lack of maintenance for the SGML
backend tools for many years now. Therefore, I think it is long past
time to move from the SGML backend tools to XML backend tools to put
generation of documentation from our DocBook source back on solid
footing again. I changed the subject line of this e-mail to something
appropriate to that project, and the following concerns my initial
results with that project.

1. xmlto initially bombed because it uses xmllint for validation, and
that validator showed there were issues with our DocBook source code.
I fixed those issues (as of revision 12482) so this is an immediate
benefit of looking into xmlto. Specifically,

xmllint --noout --nonet --xinclude --postvalid --noent plplotdoc.xml

works perfectly for revision 12482.
However, I am not going to replace our current validator, onsgmls,
because when there are validation errors I find that xmllint is not
robust, i.e., it tends to segfault. (onsgmls is less careful than
xmllint: it did not detect the need for changes, and it validated our
DocBook source code both several revisions before 12482 and also for
that revision.)

2. HTML results generated from our DocBook source. There are
promising html results from xmlto but with some caveats.

a. Extremely promising.... After revision 12482 and after running
make validate in the build tree to take care of dependencies, the
command

xmlto -o html-dir html plplotdoc.xml

succeeded and also rendered Table 3.4 without issues (a big
improvement on results generated with the html SGML backend tool).

b. Caveats.... Just before that Table 3.4 on the webpage the
overline-underline example ends up empty. Furthermore, the
colour-coded API examples in the API chapter are now in a much
blander format with no colour coding, and the filenames for the html
bits and pieces are arbitrarily numerical (e.g.,
html-dir/ch19s133.html) rather than the logical names you get from
the SGML backend (e.g., plplot-html-5.9.9/plssym.html, which refers
to the same area of the documentation as that numerical file name
generated by the above xmlto command).

All these html style issues are currently controlled for the SGML
HTML backend by a configurable DSSSL stylesheet
(plplotdoc-html.dsl.in in doc/docbook/src) supplemented by the CSS
stylesheet, stylesheet.css, in that same directory. Norman Walsh's
on-line "DocBook: The Definitive Guide"
<http://docbook.org/tdg/en/html/docbook.html> (TDG, copyright 2003,
last updated in 2006) covers DSSSL stylesheets in some detail in
Chapter 4, but remarks that (a) few tools honour DSSSL and (b) DSSSL
stylesheets are actually SGML documents (which means the paucity of
open-source tools that can deal with SGML makes life difficult for
the DSSSL approach).
Furthermore, from that book it was clear that XSL stylesheets were
rapidly gaining acceptance as an alternative to DSSSL (probably
because there are so many XML tools out there in the open-source
world) and Bob Stayton wrote a chapter in that book concerning XSL
which has since expanded into its own independent book, DocBook XSL:
The Complete Guide 4th edition (TCG, Copyright 2007)
<http://www.sagehill.net/docbookxsl/>.

Furthermore, interest in DSSSL has waned since TDG was written ~10
years ago. Norman Walsh has published
<http://docbook.org/tdg51/en/html/docbook.html> (copyright 2013)
which is the DocBook 5.1 variant of TDG. Chapter 4 in that latest
variant doesn't even mention DSSSL as a publishing tool (i.e., a
backend language)! Also, I am pretty sure that the tools actually
invoked by the xmlto script don't understand DSSSL stylesheets.

Thus, my conclusion is that our current DSSSL style sheets (and
probably the CSS stylesheet, stylesheet.css, as well) must be
replaced by XSL style sheets following the methods that are
documented in TCG. And until we do that the style of our xmlto
results is going to be quite bland.

3. Print (PDF) results generated from our DocBook source. Here too,
there are promising results but also some caveats.

a. Extremely promising.... After revision 12482 and after running
make validate in the build tree to take care of dependencies, the
command

xmlto --with-fop pdf plplotdoc.xml

succeeded and also rendered Table 3.4 without issues. (For example,
the small number of missing glyphs in the SGML PDF backend results
are present here.)

b. Caveats....

xmlto pdf plplotdoc.xml

errors out (see https://bugzilla.redhat.com/show_bug.cgi?id=949087
where there doesn't seem to be any quick solution for this default
pdf issue) and

xmlto --with-dblatex pdf plplotdoc.xml

succeeds but does not give good Table 3.4 results. So avoid these
variants of the xmlto command for pdf (which is easy to do, but I
thought I had better remark on it here).
Another much more important caveat is that all the style issues that
occurred for html with xmlto also occur for pdf. So the same remarks
about moving to XSL style sheets apply here as well.

4. Print (PostScript) results generated from our DocBook source. Here
too, there are promising results but also some caveats.

a. Extremely promising.... After revision 12482 and after running
make validate in the build tree to take care of dependencies, the
command

xmlto --with-fop ps plplotdoc.xml

succeeded and also rendered Table 3.4 without issues (as does the
PostScript SGML backend).

b. Caveats.... Both

xmlto ps plplotdoc.xml

and

xmlto --with-dblatex ps plplotdoc.xml

error out so avoid these variants for ps (which is easy to do, but I
thought I had better remark on it here). Another much more important
caveat is that all the style issues that occurred for html and pdf
also occur for ps. So the same remarks about moving to XSL style
sheets apply here as well.

5. Print (dvi) results generated from our DocBook source. Here there
are slightly promising results but also some strong caveats.

a. Slightly promising.... After revision 12482 and after running make
validate in the build tree to take care of dependencies, the command

xmlto --with-dblatex dvi plplotdoc.xml

succeeded without obvious error messages if and only if I locally
replaced the Greek entities in math.ent by their equivalent Math
symbol unicode values, e.g., unicode x391 changed to unicode x1D6A8.
(Note that probably anything that was unrecognizable would have
worked for these entities.)

b. Caveats.... Both

xmlto dvi plplotdoc.xml

and

xmlto --with-fop dvi plplotdoc.xml

error out (with or without the changed math.ent) so avoid these
variants for dvi (which is easy to do, but I thought I had better
remark on it here).
Another caveat, for the dvi result produced by the one
(--with-dblatex) variant of xmlto that works for dvi above, is that
the entities (mostly Math symbols for the Greek letters) defined by
the locally replaced math.ent were all meaningless to the tools
invoked by --with-dblatex (probably because of the large numerical
unicode index for the Math symbol variants of the Greek letters). So
the resulting Table 3.4 printed out the entities verbatim (e.g.,
"&#x1D6A8;" rather than the Greek letter, capital alpha). (This issue
also occurred for --with-dblatex for pdf which is why that variant
should be avoided in the pdf case, see above.) The SGML backend dvi
results do not have this issue. I presume there is some sort of bug
in dblatex concerning propagating entities to dvi that is avoided if
you use large unrecognizable (at least for this XML dvi backend)
unicode indices. So this is a pretty crummy dvi result which depends
on internal details to avoid other bugs in the XML dvi backend.

An additional less serious caveat is that all the style issues that
occurred for html also occur for dvi. So the same remarks about
moving to XSL style sheets apply here as well.

Other remarks:

I have not looked yet at man and info results with xmlto, but
apparently they are possible (which would complete the backend set of
tools that we need).

Also, according to documentation available on the web, xml is almost
completely (with just a few necessary exceptions) utf8 aware so I
have tried the experiment of inserting the utf8 code for a gamma
(e.g., "γ" if your mailer is utf8 aware) right into math.xml and

xmlto --with-fop pdf plplotdoc.xml

filled out the appropriate bit of Table 3.4 in the resulting
plplotdoc.pdf with no issues. So this constitutes a proof-of-concept
that numerical entities such as "&#x3B3;" (or the decimal equivalent
"&#947;") that define the "&gamma;" entity in math.ent could be
replaced by the utf8 code for gamma, "γ", and so on for all the other
Greek letters.
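
The utf8 proof-of-concept above rests on an XML parser treating a
hexadecimal character reference, a decimal character reference, and
the literal utf8 character as the same thing. Here is a minimal sketch
of that equivalence, assuming Python's standard
xml.etree.ElementTree parser as a stand-in for the XML tools that
xmlto invokes. The <entry> element and the variable names are purely
illustrative and are not taken from math.ent or plplotdoc.xml.

```python
import xml.etree.ElementTree as ET

# Three spellings of a lowercase gamma (U+03B3) in XML source.
# The <entry> wrapper is a hypothetical element used only for this demo.
variants = {
    "hex reference": "<entry>&#x3B3;</entry>",
    "decimal reference": "<entry>&#947;</entry>",
    "literal utf8": "<entry>\u03b3</entry>",
}


def parsed_text(xml_source: str) -> str:
    """Return the text content the parser sees for a one-element document."""
    # Encode to bytes so the parser decodes the input as UTF-8 itself.
    return ET.fromstring(xml_source.encode("utf-8")).text


# Parse each spelling and report the code point the parser produced.
results = {name: parsed_text(src) for name, src in variants.items()}
for name, text in results.items():
    print(f"{name}: U+{ord(text):04X}")
```

All three spellings come out of the parser as the same code point,
which is why replacing the numerical entities by utf8 characters
should be harmless for any backend that goes through a conforming XML
parser. It says nothing about what dblatex or fop do afterwards.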

In sum, xmlto is looking pretty good right now (except for
problematic dvi results and the replacement of DSSSL stylesheets by
XSL stylesheets that would have to be made in the future), so
assuming I can get man and info to work with the xmlto approach, I
would likely deprecate the SGML backend tools. So by default (unless
-DDEPRECATED_SGML_BACKEND=ON was specified) those building our
documentation would get the xmlto backend results. That would give us
a chance to work on XSL stylesheets (using the TCG reference above)
to bring the style of our xmlto backend results up to or beyond the
style of our current SGML backend results.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and
Astronomy, University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________
