Re: [HACKERS] [GENERAL] Postgres 10 manual breaks links with anchors

Peter Eisentraut Fri, 27 Oct 2017 04:32:11 -0700

On 10/26/17 16:10, Tom Lane wrote:
> Peter Eisentraut <peter.eisentr...@2ndquadrant.com> writes:
>> On 10/16/17 03:19, Thomas Kellerer wrote:
>>> I don't know if this is intentional, but the Postgres 10 manual started to 
>>> use lowercase IDs as anchors in the manual.
> 
>> Here is a patch that can be applied to PG 10 to put the upper case
>> anchors back.
>> The question perhaps is whether we want to maintain this patch
>> indefinitely, or whether a clean break is better.
> 
> In view of commit 1ff01b390, aren't we more or less locked into
> lower-case anchors going forward?  I'm not sure I see the point
> of changing v10 back to the old way if v11 will be incompatible
> anyhow.


The details are more complicated.

The IDs in DocBook documents have two purposes.

One is to ensure non-broken links between things like <sect1 id="foo">
and <xref linkend="foo">.  This is set up in the DTD and checked during
parsing (validation, more precisely).  In DocBook SGML, many things
including tag names, attribute names, and IDs are case insensitive.  But
in DocBook XML, everything is case sensitive.  So in order to make
things compatible for a conversion, we had to consolidate some variant
spellings that have accumulated in our sources.  For simplicity, I have
converted everything to lower case.
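
As a rough illustration of why the variant spellings matter, here is a
small Python sketch (not part of the actual build; the IDs and the
helper name dangling_links are made up for the example).  It checks
linkend values against declared IDs, once with SGML-like
case-insensitive matching and once with XML-like case-sensitive
matching:

```python
def dangling_links(ids, linkends, case_sensitive):
    """Return the linkend values that match no declared id."""
    # SGML-like matching folds case; XML-like matching compares as written.
    norm = (lambda s: s) if case_sensitive else str.lower
    declared = {norm(i) for i in ids}
    return [target for target in linkends if norm(target) not in declared]

# Hypothetical IDs with the kind of variant spellings described above.
ids = ["SQL-SELECT", "app-psql"]
linkends = ["sql-select", "APP-PSQL", "app-psql"]

sgml_like = dangling_links(ids, linkends, case_sensitive=False)
xml_like = dangling_links(ids, linkends, case_sensitive=True)
```

With case folding every link resolves; without it, only the link whose
spelling matches exactly does, which is why the spellings had to be
consolidated before the conversion.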

The other purpose is that the DocBook XSL and DSSSL stylesheets use the
IDs for creating anchors in HTML documents (and also for the HTML file
names themselves).  This is merely a useful choice of those stylesheets.

In PG 9.6 and earlier, we used a straight SGML toolchain, using Jade and
DSSSL.  The internal representation of a DocBook SGML document after
parsing converts all the case insensitive bits to upper case.  (This
might be configured somewhere; I'm not sure.)  So the stylesheets see
all the IDs as upper case to begin with, and that's why all the anchors
come out in upper case in the HTML output.

In PG 10, the build first converts the SGML sources to XML, redeclares
them as DocBook XML, then builds using XSLT.  Because DocBook XML
requires lower-case tags and attribute names, we have to use the osx -x
lower option to convert all the case-insensitive bits to lower case
instead of the default upper case.  That's why the XSLT stylesheets see
the IDs as lower case and that's why they are like that in the output.
(If there were options more detailed than -x lower, that could have been
useful.)

The proposed patch works much later in the build process and converts
IDs to upper case only when they are being considered for making an HTML
anchor.  The structure of the document as far as the XML parser is
concerned stays the same.
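
The real patch is a stylesheet customization; the following Python
sketch only mimics its effect under simplifying assumptions.  The
document fragment and the function name html_anchor are hypothetical.
The point is that the upper-casing happens at anchor-generation time
and the parsed document keeps its lower-case IDs:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with lower-case IDs, as the PG 10 build sees them.
doc = ET.fromstring(
    '<chapter id="tutorial">'
    '<sect1 id="tutorial-start"/>'
    '<sect1 id="tutorial-sql"/>'
    '</chapter>'
)

def html_anchor(element_id):
    """Upper-case an ID only at the point where an HTML anchor is made."""
    return element_id.upper()

# Anchors for the HTML output are derived from the IDs ...
anchors = [html_anchor(e.get("id")) for e in doc.iter() if e.get("id")]
# ... while the IDs in the parsed document are left untouched.
ids_after = [e.get("id") for e in doc.iter() if e.get("id")]
```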

For PG 11, the idea is to convert the sources to a pure XML document.
XML is case sensitive, so the XML parser would see the IDs exactly as
they are written.  Without the earlier change that converted all IDs to
lower case in the source, the XSL processor would see the IDs in
whatever case they are, and anchors would end up in the HTML output
using whatever case they are.  So the conversion to lower case in the
source also ensured anchor compatibility with PG 10.  Otherwise,
someone might well have complained in a similar manner a year from now.
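
To make that concrete, here is a minimal Python sketch using the
standard library XML parser as a stand-in for the real toolchain.  Both
source strings and the helper name anchors are invented for the
example.  An XML parser hands attribute values through as written, so
the case used in the source is the case that reaches the output:

```python
import xml.etree.ElementTree as ET

# Hypothetical sources: one with leftover mixed-case IDs, one lower-cased.
mixed_source = '<book><sect1 id="SQL-Select"/><sect1 id="app-psql"/></book>'
lowered_source = '<book><sect1 id="sql-select"/><sect1 id="app-psql"/></book>'

def anchors(xml_text):
    """Return the IDs as the parser reports them, with no case folding."""
    return [e.get("id") for e in ET.fromstring(xml_text).iter("sect1")]

mixed_anchors = anchors(mixed_source)
lowered_anchors = anchors(lowered_source)
```

Only the lower-cased source yields anchors that match what the PG 10
build produces.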

Applying the proposed patch to master/PG 11 would have the same effect
as in PG 10.  It would convert anchors to upper case in the HTML output
but leave the logical and physical structure of the XML document alone.

So the options are simply

1) Use the patch and keep it indefinitely, keeping anchors compatible
with all past versions and with future ones.

2) Don't use the patch, breaking anchors from <=9.6, but keeping them
compatible going forward.

Considering how small the patch is compared to some other customizations
we carry, #1 seems reasonable to me.  I just didn't know to what extent
people had actually bookmarked fragment links.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
