[docbook-apps] Canonical DocBook / para vs simpara

2022-02-27 Thread Frank Steimke
Yes. If the para element has an xml:id attribute, you will have to
decide which of the elements in the generated sequence should get that
id. I would vote for the first element, but any of the other elements
would also be possible. In DocBook, a reference to the para element
points to a larger area (namely the one including the block elements)
than the corresponding reference in the generated document.
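A minimal Python sketch of such a split, using only the standard library's ElementTree. It assumes DocBook 5 in the DocBook namespace, treats only a small, illustrative set of elements as blocks, and follows the "first element gets the xml:id" choice. The function names are hypothetical and not taken from xslTNG.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper names; not part of xslTNG or any DocBook tool.
DB = "http://docbook.org/ns/docbook"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
PARA, SIMPARA = f"{{{DB}}}para", f"{{{DB}}}simpara"

# Illustrative subset only; DocBook has many more block elements.
BLOCKS = {f"{{{DB}}}{name}" for name in (
    "itemizedlist", "orderedlist", "variablelist",
    "table", "informaltable", "programlisting", "note")}


def split_para(para):
    """Return the simpara/block sequence that replaces `para`."""
    out, current = [], None

    def inline_run():
        # Open a new simpara only when there is inline content to put in it.
        nonlocal current
        if current is None:
            current = ET.Element(SIMPARA)
            out.append(current)
        return current

    if para.text and para.text.strip():
        inline_run().text = para.text
    for child in list(para):
        if child.tag in BLOCKS:
            tail, child.tail = child.tail, None
            out.append(child)
            current = None  # text after a block starts a fresh simpara
            if tail and tail.strip():
                inline_run().text = tail
        else:
            inline_run().append(child)  # inline markup keeps its own tail

    # The id can go to only one element; this sketch picks the first one.
    if para.get(XML_ID) is not None and out:
        out[0].set(XML_ID, para.get(XML_ID))
    return out


def split_all_paras(root):
    """Replace every para that has block children, anywhere below root."""
    for parent in list(root.iter()):
        for i in reversed(range(len(parent))):
            child = parent[i]
            if child.tag == PARA and any(c.tag in BLOCKS for c in child):
                replacement = split_para(child)
                if replacement:
                    replacement[-1].tail = child.tail
                del parent[i]
                for offset, new in enumerate(replacement):
                    parent.insert(i + offset, new)
    return root
```

After the split, the xml:id points only at the first simpara, which is exactly the narrowing of the referenced area discussed above.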


On the other hand, an author who for some reason needs to publish his
document in an Office format would have to solve this problem anyway:
if there is no automatic transformation from DocBook to Office, then
manually.


If no agreement can be reached on this aspect, intermediate solutions
might help. For example, it would help to transform all para elements
that do not contain block elements into simpara elements. This would at
least make it clear that any remaining para element contains block
elements which may require special attention. Their transformation into
a sequence of simpara and block elements could then be done as the
first step of the transformation to ODF.
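The intermediate step could look roughly like this (Python, standard library only). The set of block elements is an illustrative assumption rather than the full DocBook list, and the function name is hypothetical. It renames block-free paras in place and returns the ones that still need special attention.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper; not part of xslTNG or any DocBook tool.
DB = "http://docbook.org/ns/docbook"
PARA, SIMPARA = f"{{{DB}}}para", f"{{{DB}}}simpara"

# Illustrative subset only; the real list of DocBook block elements is longer.
BLOCKS = {f"{{{DB}}}{name}" for name in (
    "itemizedlist", "orderedlist", "variablelist",
    "table", "informaltable", "programlisting", "note")}


def para_to_simpara(root: ET.Element) -> list[ET.Element]:
    """Rename block-free paras to simpara; return the paras that remain."""
    remaining = []
    for el in list(root.iter(PARA)):
        if any(child.tag in BLOCKS for child in el):
            remaining.append(el)  # still needs the split into simpara + blocks
        else:
            el.tag = SIMPARA  # no block children (per the subset above)
    return remaining
```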


The benefit would be that the challenge is clearly documented in the
description of the interchange format: /"If you see a para element in
the output, you have to take care of the block elements it contains."/
Even with para elements allowed, this would help in developing
transformations into Office formats.


It would be interesting to know how many such controversial proposals 
actually exist. I would also advocate, for example, that the sect1 to 
sect6 elements should be transformed to section elements.
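Renaming sect1 through sect6 is mechanical, because the section level is already implied by the nesting. A minimal Python sketch under the assumption of DocBook 5 input in the DocBook namespace; the function name is hypothetical. Attributes such as xml:id survive because only the element name changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper; not part of xslTNG or any DocBook tool.
DB = "http://docbook.org/ns/docbook"
SECTION = f"{{{DB}}}section"
SECT_N = {f"{{{DB}}}sect{n}" for n in range(1, 7)}


def sects_to_sections(root: ET.Element) -> int:
    """Rename sect1 ... sect6 to section in place; return how many were renamed."""
    renamed = 0
    for el in root.iter():
        if el.tag in SECT_N:
            el.tag = SECTION  # attributes and children are untouched
            renamed += 1
    return renamed
```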


Perhaps one should start by collecting the properties of the
interchange format (that is, a rough first description of the DocBook
subset) and checking which of them are already supported by the xslTNG
steps, or could be achieved very easily?


Regards,
Frank

In reply to David Cramer, 27.02.22 at 23:23.


Re: [docbook-apps] Canonical DocBook

2022-02-27 Thread David Cramer

On 2/27/22 1:18 PM, Frank Steimke wrote:


> /No block elements within para/
>
> That's in my 80% because neither ODF nor OOXML allows tables or
> lists in paragraphs. I would see a great benefit if the DocBook-based
> structural interchange format allowed easy transformation into
> Office standards, especially ODF.


You potentially lose information by doing that. For example, when
normalizing, profiling and other attributes on the enclosing para would
need to be copied to the paras you create before and after the table or
list and onto the table or list itself as well. But doing so might
change the author's intent in subtle ways depending on what job those
attributes are doing.
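To make the concern concrete, here is a hypothetical Python sketch that copies profiling attributes from a split para onto the generated pieces. The attribute tuple is an illustrative subset of the DocBook effectivity attributes. When a nested element already carries its own value for the same attribute, the original markup meant that both conditions applied at once, which a single attribute value on the piece cannot express, so the sketch only reports those cases instead of guessing.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of the DocBook effectivity (profiling) attributes.
PROFILING = ("arch", "audience", "condition", "os", "revision", "userlevel", "vendor")


def propagate_profiling(
    para: ET.Element, pieces: list[ET.Element]
) -> list[tuple[str, str, str, str]]:
    """Hypothetical helper: copy the para's profiling attributes onto its pieces.

    Returns (tag, attribute, own value, inherited value) for every piece
    whose own value differs; those need a human decision.
    """
    conflicts = []
    for piece in pieces:
        for name in PROFILING:
            inherited = para.get(name)
            if inherited is None:
                continue
            own = piece.get(name)
            if own is None:
                piece.set(name, inherited)
            elif own != inherited:
                conflicts.append((piece.tag, name, own, inherited))
    return conflicts
```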


I guess this is Norm's point about everybody having a different 80%.

Regards,

David


Re: [docbook-apps] Canonical DocBook

2022-02-27 Thread Frank Steimke
Thank you very much for your comments and suggestions, Norm. Please
allow me a few remarks.


/"After a while, this starts to feel less like a canonical DocBook and 
more like a structural interchange format"./


Yes, based on DocBook. After all, the result of standard steps 1 to 7 is
almost a valid DocBook document, isn't it? That is, apart from a few
additional attributes in a separate namespace (e.g. ghost attributes in
tables). But it's true that this format is not intended for authors.
They keep writing the way they do today, and the interchange format is
generated by applying the xslTNG steps.


/No block elements within para/

That's in my 80% because neither ODF nor OOXML allows tables or lists
in paragraphs. I would see a great benefit if the DocBook-based
structural interchange format allowed easy transformation into Office
standards, especially ODF.


/Image size and scaling attributes/

You are right, this is more a question of the application or tool.

/I could spin off the normalizing stylesheets, steps 1 to 5 above, 
optional 6 and 7, into a separate package. And I suppose, that could be 
documented. /


That would be great. Is there a way I can help?

Thanks,

Frank Steimke


In reply to Norm Tovey-Walsh, 27.02.22 at 14:42.

Re: [docbook-apps] Canonical DocBook

2022-02-27 Thread Norm Tovey-Walsh
> Our own stylesheets are therefore divided into at least two phases. First,
[…]
> As far as I can see, the XSL 3 stylesheets for XslTNG are also similar
> in structure. 

Yep. The xslTNG stylesheets go through several standard stages:

1. Normalize the logical structure (get rid of entity refs, basically)
2. Expand XIncludes
3. Upgrade from 4 to 5 if the input isn’t in a namespace
4. Process transclusions
5. Normalize the markup
6. Process annotations
7. Process external link bases

Plus a couple more that are conditional.

> So there is a point in these stylesheets where the input document is
> in a sort of "canonical DocBook". However, this canonical format is
> not documented.

That’s true.

> My suggestion is that the DocBook TC standardize and document the
> canonical DocBook format. Subsequently, stylesheets for transforming

The problem with a documented canonical format is that, like a “minimal
subset”, you could probably get broad agreement on 80% of it, but no two
people would have the same 80% in mind.

Another problem is that no one wants to author in the canonical format.
It’s the format that removes all markup minimization.

I could spin off the normalizing stylesheets, steps 1 to 5 above,
optional 6 and 7, into a separate package. And I suppose, that could be
documented. I don’t know if that’s a TC activity or not though as it’s
pretty application specific.

> para/simpara: canonical DocBook should only support simpara. para with
> block-content (tables, lists) must be transformed into a sequence of
> simpara and other block-content.

That’s in your 80%, is it? :-)

> Tables: In canonical DocBook, each table must have table column
> specifications. Default values are replaced by explicit values.
[…]
> which column it starts and where it ends without complex calculations.
> Content of table cell must be element only.

It sounds like what you really want here isn’t even CALS (or HTML)
tables. You want the completely explicit internal format that the xslTNG
stylesheets generate during table processing. They turn the entire table
into a perfectly rectangular grid, using “ghost” elements for cells that
are missing. 

That’s kind of true for a few of the other ideas you proposed, like the
inline markup.

After a while, this starts to feel less like a canonical DocBook and
more like a structural interchange format.

> Images: Each image must have at least the attributes for image size
> and scaling.

Getting those, if the author didn’t provide them, requires extensions
and is even then only speculative. I’m sure there are image formats I
can’t parse. Authors really should provide them.
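For PNG at least, reading the pixel size is easy, as this standard-library Python sketch shows (the function name is hypothetical). It deliberately returns None for any other format, and pixel counts alone still say nothing about the intended physical size or scaling, which is why values supplied by the author are more reliable.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_pixel_size(data: bytes):
    """Hypothetical helper: (width, height) in pixels for PNG data, else None.

    A PNG file starts with an 8-byte signature followed by the IHDR chunk:
    4-byte length, the type b"IHDR", then width and height as big-endian
    32-bit unsigned integers.
    """
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```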

> P. S. This text was translated with www.DeepL.com/Translator (free
> version) from german language.

Wow. It did a remarkably good job. I would not, on a casual reading,
have suspected autotranslation.
Be seeing you,
  norm

--
Norman Tovey-Walsh 
https://nwalsh.com/

> Before you criticize someone, walk a mile in his shoes. That way, when
> you criticize him, you're a mile away and you have his shoes.




Re: [docbook-apps] Canonical DocBook

2022-02-27 Thread Frank Steimke

OK, I see. Not so easy.

However, I never wanted to be unfair to anyone. My hope or expectation
was that the stylesheets already exist, because we have XslTNG. We
have the stylesheets for preprocessing in XslTNG, but we do not have
documentation of the DocBook subset they produce.


But of course, when someone writes it down, there will be a discussion
about the target format: someone will suggest a "better" solution, which
leads to a change request for the stylesheets ...


Frank

In reply to Dave Pawson, 27.02.22 at 12:29.

-
To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org



Re: [docbook-apps] Canonical DocBook

2022-02-27 Thread Dave Pawson
On Sun, 27 Feb 2022 at 11:20, Frank Steimke
 wrote:
>
> What would be a better way to accomplish this?
>
> Ask Norm to document its internal format so that a particular community
> can choose it as the "de facto standard" for developing own stylesheets
> based on it?

Not sure at all Frank.
Democratic solution perhaps? Majority of TC support A vs B?
Then rely on Norm () to write appropriate stylesheets? Seems unfair IMHO

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.




Re: [docbook-apps] Canonical DocBook

2022-02-27 Thread Frank Steimke

What would be a better way to accomplish this?

Ask Norm to document its internal format so that a particular community 
can choose it as the "de facto standard" for developing own stylesheets 
based on it?


Frank

In reply to Dave Pawson, 27.02.22 at 12:09.



Re: [docbook-apps] Canonical DocBook

2022-02-27 Thread Dave Pawson
On Sun, 27 Feb 2022 at 11:02, Frank Steimke
 wrote:

> My suggestion is that the DocBook TC standardize and document the
> canonical DocBook format.

(My view). This is the problem Frank?
I'm sure, even within the TC, that defining and agreeing which 'form'
is to be named Canonical
would be contentious?

I agree with the motive, I'm less sure about the means of achieving it?

regards


-- 
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.




[docbook-apps] Canonical DocBook

2022-02-27 Thread Frank Steimke

Hello List,

I would like to propose a project "canonical DocBook" to the DocBook TC, 
and I am interested in the opinion of this mailing list. I hope this is 
the right list.


DocBook is a great system for creating technical documents. We use it 
successfully for various purposes, which include transformation to 
formats other than HTML and PDF. For example, we work with stylesheets 
for transformation into ODF and into NISO-STS.


Here, the flexibility of DocBook schemas is problematic, because it 
increases complexity. To give a very simple example, the title of a 
section is valid both with and without an enclosing info element. A 
template for transforming the title element must account for both 
possibilities.


Our own stylesheets are therefore divided into at least two phases.
First, the input document is transformed into a uniform structure. This
ensures, for example, that each title element is always contained in an
info element. In a second step, the document is converted into the
target format. The advantage of this method is that the transformation
in the second phase becomes much simpler.
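The two phases can be sketched with Python's standard-library ElementTree: the first pass moves any title that sits directly in a section-like element into an info wrapper, so the second pass needs to handle only one shape. The element list and the function names are illustrative assumptions, not taken from xslTNG.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper names; not part of xslTNG or any DocBook tool.
DB = "http://docbook.org/ns/docbook"


def q(name):
    return f"{{{DB}}}{name}"


INFO, TITLE = q("info"), q("title")
TITLE_LIKE = {TITLE, q("subtitle"), q("titleabbrev")}
# Illustrative choice of elements whose title may or may not sit in info.
SECTION_LIKE = {q(n) for n in ("book", "article", "chapter", "appendix", "preface", "section")}


def wrap_titles_in_info(root):
    """Phase 1: make sure section-like titles always sit inside info."""
    for el in list(root.iter()):
        if el.tag not in SECTION_LIKE:
            continue
        direct = [c for c in el if c.tag in TITLE_LIKE]
        if not direct:
            continue
        info = el.find(INFO)
        if info is None:
            info = ET.Element(INFO)
            el.insert(0, info)
        for pos, title in enumerate(direct):
            el.remove(title)
            info.insert(pos, title)
    return root


def section_title(el):
    """Phase 2 code only needs to handle one shape: info/title."""
    return el.findtext(f"{INFO}/{TITLE}")
```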


As far as I can see, the XSL 3 stylesheets for XslTNG are also similar 
in structure. These are certainly much more professional, comprehensive 
and systematic in design. So there is a point in these stylesheets where 
the input document is in a sort of "canonical DocBook". However, this 
canonical format is not documented.


My suggestion is that the DocBook TC standardize and document the
canonical DocBook format. Subsequently, stylesheets for transforming
valid DocBook 5 documents into the canonical format would be published
(possibly these already exist as part of the XslTNG stylesheets). The
advantage would be that other projects could more easily transform
canonical DocBook into other formats. They would be able to build on a
standard, documented DocBook format of lower complexity.


Besides the simple example of the title elements, canonical DocBook 
would have to consider the following aspects, among others:


para/simpara: canonical DocBook should only support simpara. para with 
block-content (tables, lists) must be transformed into a sequence of 
simpara and other block-content.


Tables: In canonical DocBook, each table must have table column 
specifications. Default values are replaced by explicit values. Spanspec 
elements are converted to corresponding column start and end positions. 
Each cell of a table must have information about its position within the 
table, so that it is possible to determine at which column it starts and 
where it ends without complex calculations. Content of table cell must 
be element only.
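Computing the column position of each cell, as proposed here, can be sketched in Python as follows. This is a simplified, hypothetical implementation: it handles colspec (including the implicit colnum), spanspec, namest/nameend, colname and morerows for thead and tbody rows, but ignores tfoot and entrytbl and does not validate overlapping or unknown column names.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper names; not part of xslTNG or any DocBook tool.
DB = "http://docbook.org/ns/docbook"


def q(name):
    return f"{{{DB}}}{name}"


def cell_positions(tgroup):
    """Yield (row_index, first_col, last_col, entry) for each entry of a CALS tgroup.

    Columns are 1-based; rows are counted across thead and tbody together.
    """
    # colspec: colnum defaults to the previous colspec's number plus one.
    names, next_num = {}, 1
    for spec in tgroup.findall(q("colspec")):
        num = int(spec.get("colnum", next_num))
        if spec.get("colname"):
            names[spec.get("colname")] = num
        next_num = num + 1
    spans = {s.get("spanname"): (s.get("namest"), s.get("nameend"))
             for s in tgroup.findall(q("spanspec"))}

    occupied = set()  # (row, col) cells already covered by a morerows span
    rows = [row for part in ("thead", "tbody")
            for row in tgroup.findall(f"{q(part)}/{q('row')}")]
    for r, row in enumerate(rows):
        col = 1
        for entry in row.findall(q("entry")):
            namest, nameend = entry.get("namest"), entry.get("nameend")
            if entry.get("spanname"):
                namest, nameend = spans[entry.get("spanname")]
            if namest:
                first = names[namest]
                last = names[nameend] if nameend else first
            elif entry.get("colname"):
                first = last = names[entry.get("colname")]
            else:
                while (r, col) in occupied:  # skip cells spanned from above
                    col += 1
                first = last = col
            for dr in range(int(entry.get("morerows", 0)) + 1):
                for c in range(first, last + 1):
                    occupied.add((r + dr, c))
            col = last + 1
            yield r, first, last, entry
```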


Images: Each image must have at least the attributes for image size and 
scaling.


emphasis: explicit values instead of default values (e.g. role='bold'),
and a list of role values that must be supported (bold, italic,
underline).


Lists: explicit values instead of default values (e.g. numeration for
orderedlist).
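Making such defaults explicit is a small pass over the tree. In this Python sketch (hypothetical names), the assumed defaults are role="italic" for emphasis and numeration="arabic" for orderedlist; real processors may differ, for example some stylesheets vary list numeration with nesting depth, which is exactly why an explicit value helps. Emphasis roles outside the proposed list are reported rather than changed.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper names; not part of xslTNG or any DocBook tool.
DB = "http://docbook.org/ns/docbook"


def q(name):
    return f"{{{DB}}}{name}"


# Assumed processor defaults, made explicit (element -> (attribute, value)).
DEFAULTS = {
    q("emphasis"): ("role", "italic"),
    q("orderedlist"): ("numeration", "arabic"),
}
SUPPORTED_ROLES = {"bold", "italic", "underline"}  # the role list proposed in the mail


def make_defaults_explicit(root: ET.Element) -> list:
    """Fill in default attribute values; return emphasis roles outside the list."""
    unsupported = []
    for el in root.iter():
        if el.tag in DEFAULTS:
            attr, value = DEFAULTS[el.tag]
            if el.get(attr) is None:
                el.set(attr, value)
        if el.tag == q("emphasis") and el.get("role") not in SUPPORTED_ROLES:
            unsupported.append(el.get("role"))
    return unsupported
```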


Of course, this task could be exceedingly difficult if we were starting
from scratch. I hope it will be less difficult in practice if we take
the XslTNG stylesheets as a basis, and accept the intermediate format
they generate after simplifying the structure as the basis for
standardizing canonical DocBook.


I would be very interested in the opinion of the members of this list on 
this proposal.


Sincerely
Frank Steimke

P.S. This text was translated from German with
www.DeepL.com/Translator (free version).

