[sphinx-dev] Re: using sphinx latex with scientific journal templates

Guenter Milde Tue, 01 Feb 2011 00:30:10 -0800

On 2011-01-29, foobaron wrote:
> Hi,
> I've been using Sphinx a long time for software projects and more
> recently for writing all my scientific papers.  I love being able to
> run it right on my iPad to view and edit papers with lots of equations
> (HTML output with jsMath), and at the same time generate latex & PDFs
> I can circulate to people for comments.


For single documents, I suggest Docutils over Sphinx.
It lacks some of the bells and whistles, but it is easier to set up and
configure.

OTOH, for my taste reStructuredText is still too limited for scientific
papers: no citation support, no referencing of numbered
tables/figures via labels, ...

> I've found that the problems start when I want to submit the paper to
> a journal, because each journals requires the latex source of the
> manuscript to conform exactly to their custom latex template, which of
> course Sphinx's latex output does not.  After studying how Sphinx /
> docutils generate latex output, I decided not to mess with that code
> at all but instead just wrote a separate Python script to extract the
> relevant sections of Sphinx's latex output and insert them into the
> latex template supplied by the journal.  This is pretty easy, works
> great, and is easy to adapt to different journals (so far I've done it
> for PNAS, PLOS, and Information; if people want me to post an example
> rewriter script I can do that).  But it feels like a kluge; it seems
> like lots of people want this kind of latex output customization and
> we should instead all be using docutils latex templates etc. 

Docutils has a "publisher" API for this kind of work. It lets you
write custom back-ends that get the document parts and are free to
combine them as required.

As no one has used it with the LaTeX writer yet, there might be issues.
Report them on the docutils-devel list (if you want to go this way).
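
A minimal sketch of getting the parts through the publisher, assuming
Docutils is installed. `publish_parts` is the Docutils entry point that
returns the document parts as a dictionary. The helper `pick_parts` and
the choice of part names ("title", "abstract", "body") are illustrative
only; check the parts your Docutils version's LaTeX writer actually
provides.

```python
def latex_parts(rst_source):
    """Run the Docutils publisher and return the LaTeX writer's parts."""
    # Imported lazily so pick_parts() below also works without Docutils.
    from docutils.core import publish_parts
    return publish_parts(source=rst_source, writer_name="latex")


def pick_parts(parts, wanted=("title", "abstract", "body")):
    """Keep only the named parts; a missing part becomes an empty string.

    `wanted` is a hypothetical selection for a journal template.
    """
    return {name: parts.get(name, "") for name in wanted}
```

Typical use would be `pick_parts(latex_parts(rst_text))`, after which a
custom back-end can combine the pieces in whatever order the journal
requires.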

> example, if Sphinx alters little details of the latex it outputs, that
> might make my re-writer script stop working (because it has to search
> for specific strings in Sphinx's output, and transform them).

There is the open task of making Sphinx use the Docutils LaTeX writer to
reduce code duplication. This will definitely change lots of these
details.

...

> I'm totally sold on Sphinx as my long-term solution for being able to
> "cross-compile" my content to many different outputs.  I would now
> like to work on this latex output template customization problem in a
> general way that would be usable, extensible and customizable by
> others.  My question is what approach Sphinx developers would
> recommend.  I'll quickly note a few requirements:

> - this is not just a matter of changing a template file.  The Sphinx
> code itself contains many fragments of latex that must be customized
> for each possible output.  For example,
> sphinx.ext.mathbase.wrap_displaymath() outputs displaymath using the
> non-standard environments "gather" and "split".  This does not match
> any scientific journal's template, so either this Sphinx code itself
> must change, or we are stuck with my klugey approach (external scripts
> that rewrite the latex output by Sphinx, inserting it into a journal's
> template).

I am very cautious regarding changes to the Sphinx/Docutils code just to
satisfy the special requirements of a particular publisher.

* Some of Sphinx's LaTeX output is still very specific (due to its
  ancestry as a special-purpose Python documentation writer). These
  parts should be changed to be more "mainstream".

* Some of these customizations go "away from the mainstream" (e.g. "gather"
  and "split" are provided by the "amsmath" package, which can be regarded
  as a requirement for any serious math typesetting, unless special
  packages are used that emulate its behaviour).
  
  In these cases, I propose subclassing the Sphinx (or Docutils) writer
  and doing the customizations in the child.
 

> - scientific journals supply precise templates and demand that authors
> follow them exactly.  Nowadays they input the latex file directly into
> their typesetting production, so to ensure a uniform appearance and
> standard across all the papers they publish, they *require* that the
> manuscript follow the template.  As an author, I cannot deviate from
> their template at all.  For example, they won't permit the inclusion
> of any packages other than a specified list that is used by their
> template.  Unfortunately, much of the Sphinx code for latex support
> assumes the use of many custom packages.  Again either that code has
> to change, or we're stuck with the external re-writer script approach.

This is (IMO) a clear case for using the publisher (to get the parts)
together with a custom writer (inheriting from the default one).

This may require emulating (or copying and pasting) the definitions of
the "prohibited" packages in the document preamble.


Generally, generating content that strictly follows a template is
troublesome.

> - this will require user-settable options for what to do with figures
> and tables.  For example, during the initial submission / review
> phase, PLOS journals want the figures and tables included at the end
> of the manuscript (not in the middle of the text where Sphinx inserts
> them).  However, once the paper is accepted for production, they want
> *only* the figure legends included at the end of the manuscript (i.e.
> do not include the figure images in the manuscript at all; they must
> be submitted separately).  With my re-writer script this is easy; it
> just takes an optional argument that controls whether it includes the
> figures in the output or not.

Much of this can be done with custom style sheets and existing LaTeX
packages, which Docutils can also embed.

For leaving out objects (e.g. figures), the "strip-elements-with-class"
config option can be used.
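
A minimal sketch of switching figures on and off this way, assuming
Docutils is installed. The setting is spelled
`strip_elements_with_classes` in the Docutils configuration
(`--strip-elements-with-class` on the command line). The class name
"paper-figure" and both function names are hypothetical; the elements to
be dropped must carry that class in the reStructuredText source.

```python
def figure_overrides(include_figures, figure_class="paper-figure"):
    """Build Docutils settings that optionally strip classed elements."""
    if include_figures:
        return {}
    # Elements whose "classes" attribute contains figure_class are
    # removed from the document tree before the writer runs.
    return {"strip_elements_with_classes": [figure_class]}


def render_latex(rst_source, include_figures=True):
    """Render LaTeX, with or without the elements marked for stripping."""
    # Imported lazily so figure_overrides() works without Docutils.
    from docutils.core import publish_string
    # publish_string returns encoded bytes unless output_encoding
    # is overridden.
    return publish_string(
        source=rst_source,
        writer_name="latex",
        settings_overrides=figure_overrides(include_figures),
    )
```

This mirrors the optional argument of the rewriter script: one flag
decides whether the marked elements reach the LaTeX output.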

> I'd like to get some advice about what approach people think would be
> best.  A few options come to mind:

- use what is already available (custom style sheets, config options,
  preamble code)
  
  This will reduce the number of changes needed.

> - external rewriter scripts:
  ...
  
- external back-ends:
  
  the back-end takes from Docutils (or Sphinx) the parts of the
  latex output and a LaTeX template
  
> and inserts the relevant pieces of
> content into the template.  This could be designed in a relatively
> modular way.  I.e. a parser that extracts relevant sections from the
> Sphinx latex output; a "standardizer" that removes non-standard things
> like "gather" and "split".  Then for each output target there could be
> a very small amount of code that processes journal-specific options
> like "submission format" vs. "production format".  

  As there is no parsing of a complete document, things are less likely
  to break (but still not fail-safe across Docutils/Sphinx versions).
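
  A minimal sketch of such a back-end's core step, using only the
  Python standard library. It assumes the journal template has been
  marked up by hand with placeholders such as `%%BODY%%`; that
  placeholder syntax and the name `fill_template` are inventions for
  this example, not a Docutils or Sphinx convention. The `parts`
  dictionary stands for whatever the publisher returned.

```python
import re

# Hypothetical placeholder syntax: %%NAME%% (upper-case letters and "_").
PLACEHOLDER = re.compile(r"%%([A-Z_]+)%%")


def fill_template(template, parts):
    """Replace each %%NAME%% in the template with parts["name"]."""
    def substitute(match):
        key = match.group(1).lower()
        if key not in parts:
            raise KeyError("template asks for unknown part: " + key)
        # Returning the text from a function keeps LaTeX backslashes
        # from being interpreted as regex replacement escapes.
        return parts[key]
    return PLACEHOLDER.sub(substitute, template)
```

  Failing loudly on an unknown placeholder means a renamed part shows
  up as an error instead of silently producing an incomplete manuscript.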


> - for this reason, it might make sense to make the "parser" and
> "standardizer" components of this actually part of the sphinx
> codebase, along with a bunch of automated tests that ensure they are
> working.  Since these pieces must be kept in sync with the Sphinx
> code, that argues that they should be part of the Sphinx mercurial
> tree.  Then the set of journal-specific "writer" scripts (which will
> be *very* simple, since all they have to do is process various little
> options) could either also be included with Sphinx, or distributed as
> a separate project.

This will only make sense if there is someone committed to keeping
them in sync.

> - "the full Monty": instead of using an external re-writer script, we
> modify the Sphinx latex code (e.g. sphinx.ext.mathbase,
> sphinx.writers.latex) to make it easy to customize the latex output in
> a truly general way (i.e. to produce output that does *not* assume non-
> standard packages, that inserts directly into any template file the
> user specifies, etc.).  Having browsed the sphinx code a bit, this
> seems like a fair amount of work, as it requires understanding what
> both docutils and sphinx are doing to produce the latex output, and a
> fair amount of code is involved...

I suggest a mixed approach. Doing "the right thing at the right place"
requires a good understanding of the code and a lot of discussion, but
it will be the best way to improve both Docutils and Sphinx for everyone.

Günter
