[docbook-apps] Canonical DocBook / para vs simpara
Yes. If the para element has an xml:id attribute, you will have to decide which of the elements in the generated sequence should get that id. I would vote for the first element, but any other element would also be possible. A reference to the para element in DocBook then covers a larger area (namely the one including the block elements) than the same reference in the generated document.

On the other hand, an author who for some reason needs to publish a document in an Office format has to find a solution to this problem anyway. If there is no automatic transformation from DocBook to Office, then it has to be done manually.

If no agreement can be reached in a discussion on this aspect, intermediate solutions might help. For example, all para elements that do not contain block elements could be transformed into simpara elements. This would at least make it clear that the remaining para elements are always those that contain block elements, which may require special attention. Their transformation into a sequence of simpara and block elements could be done as the first step of the transformation to ODF.

The benefit would be that the challenge is clearly documented in the description of the interchange format: /"If you see a para element in the output, then you have to take care of the included block elements."/ Even with para elements allowed, this would help with the development of transformations into office formats.

It would be interesting to know how many such controversial proposals actually exist. I would also advocate, for example, that the sect1 to sect6 elements be transformed to section elements. Probably one should start with a collection of properties for the interchange format (that is, a rough first description of the DocBook subset) and check which of them are already supported by xslTNG steps or can be achieved very easily.

Regards,
Frank

On 27.02.22 at 23:23, David Cramer wrote:
> On 2/27/22 1:18 PM, Frank Steimke wrote:
> > /No block Elements within para/ That's in my 80% because neither ODF nor OOXML allows tables or lists in paragraphs. I would see a great benefit if the DocBook-based structural interchange format allowed easy transformation into office standards, especially ODF.
>
> You potentially lose information by doing that. For example, when normalizing, profiling and other attributes on the enclosing para would need to be copied to the paras you create before and after the table or list, and onto the table or list itself as well. But doing so might change the author's intent in subtle ways, depending on what job those attributes are doing. I guess this is Norm's point about everybody having a different 80%.
>
> Regards, David
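The intermediate solution Frank describes (rename every para without block content to simpara, so that any remaining para signals block content) could be sketched roughly as follows. This is only an illustration in Python with the standard library, not part of xslTNG; the `BLOCKS` set is an assumed, incomplete list of block-level elements, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"

# Assumption: a partial list of block-level elements allowed inside para.
BLOCKS = {"itemizedlist", "orderedlist", "variablelist", "table",
          "informaltable", "programlisting", "screen", "blockquote"}


def local(tag):
    """Return the local name of a namespaced tag ('' for comments and PIs)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def paras_to_simparas(root):
    """Rename block-free para elements to simpara, in place.

    Returns the number of para elements left over, i.e. those that still
    contain block content and need special attention.
    """
    remaining = 0
    # Materialize the list first so renaming cannot disturb the iteration.
    for el in list(root.iter(f"{{{DB}}}para")):
        if any(local(child.tag) in BLOCKS for child in el):
            remaining += 1
        else:
            el.tag = f"{{{DB}}}simpara"
    return remaining
```

A return value of zero would mean the document already satisfies the stricter "simpara only" rule; anything else counts the paragraphs a downstream ODF transformation still has to split.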
Re: [docbook-apps] Canonical DocBook
On 2/27/22 1:18 PM, Frank Steimke wrote:
> /No block Elements within para/ That's in my 80% because neither ODF nor OOXML allows tables or lists in paragraphs. I would see a great benefit if the DocBook-based structural interchange format allowed easy transformation into office standards, especially ODF.

You potentially lose information by doing that. For example, when normalizing, profiling and other attributes on the enclosing para would need to be copied to the paras you create before and after the table or list, and onto the table or list itself as well. But doing so might change the author's intent in subtle ways, depending on what job those attributes are doing. I guess this is Norm's point about everybody having a different 80%.

Regards,
David
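To make David's point concrete, here is a minimal Python sketch of splitting a para with block content into a sequence of simpara and block elements while copying the enclosing para's attributes onto every piece. It is an illustration under assumptions, not the xslTNG behaviour: the `BLOCKS` set is partial, `split_para` is a hypothetical name, and giving the xml:id to the first generated element is just one of the possible choices discussed in this thread.

```python
import copy
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BLOCKS = {"itemizedlist", "orderedlist", "table", "informaltable"}  # assumption: partial


def local(tag):
    """Return the local name of a namespaced tag ('' for comments and PIs)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def split_para(para):
    """Return the list of elements that would replace `para`."""
    # Everything except xml:id is copied to each generated piece.
    attrs = {k: v for k, v in para.attrib.items() if k != XML_ID}
    out = []
    current = None  # the simpara currently being filled, if any

    def open_simpara(text):
        sp = ET.Element(f"{{{DB}}}simpara", attrs)
        sp.text = text
        out.append(sp)
        return sp

    if para.text and para.text.strip():
        current = open_simpara(para.text)
    for child in para:
        node = copy.deepcopy(child)
        if local(node.tag) in BLOCKS:
            tail, node.tail = node.tail, None
            for name, value in attrs.items():
                node.attrib.setdefault(name, value)  # keep the block's own values
            out.append(node)
            # Text after the block starts a new simpara.
            current = open_simpara(tail) if tail and tail.strip() else None
        else:
            if current is None:
                current = open_simpara(None)
            current.append(node)  # inline element, its tail travels with it
    # Assumption: the first generated element inherits the xml:id,
    # unless it already carries one of its own.
    if XML_ID in para.attrib and out:
        out[0].attrib.setdefault(XML_ID, para.get(XML_ID))
    return out
```

Whether blindly copying an attribute such as `condition` onto the table or list preserves the author's intent is exactly the open question raised above; the sketch only shows where the copying would happen.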
Re: [docbook-apps] Canonical DocBook
Thank you very much for your comments and suggestions, Norm. Please allow a few remarks.

/"After a while, this starts to feel less like a canonical DocBook and more like a structural interchange format"./ Yes, based on DocBook. After all, the result of standard steps 1 to 7 is almost a valid DocBook document, isn't it? That is, with the exception of a few additional attributes in a separate namespace (e.g. ghost attributes in tables). But it's true that this format is not intended for authors. They keep writing the way they do today, and the interchange format is generated by applying the xslTNG steps.

/No block Elements within para/ That's in my 80% because neither ODF nor OOXML allows tables or lists in paragraphs. I would see a great benefit if the DocBook-based structural interchange format allowed easy transformation into office standards, especially ODF.

/Image size and scaling attributes/ You are right, this is more a question of the application or tool.

/I could spin off the normalizing stylesheets, steps 1 to 5 above, optional 6 and 7, into a separate package. And I suppose, that could be documented./ That would be great. Is there a way I can help?

Thanks,
Frank Steimke

On 27.02.22 at 14:42, Norm Tovey-Walsh wrote:
> > Our own stylesheets are therefore divided into at least two phases. First, […]
> > As far as I can see, the XSL 3 stylesheets for XslTNG are also similar in structure.
>
> Yep. The xslTNG stylesheets go through several standard stages:
>
> 1. Normalize the logical structure (get rid of entity refs, basically)
> 2. Expand XIncludes
> 3. Upgrade from 4 to 5 if the input isn’t in a namespace
> 4. Process transclusions
> 5. Normalize the markup
> 6. Process annotations
> 7. Process external link bases
>
> Plus a couple more that are conditional.
>
> > So there is a point in these stylesheets where the input document is in a sort of "canonical DocBook". However, this canonical format is not documented.
>
> That’s true.
>
> > My suggestion is that the DocBook TC standardize and document the canonical DocBook format. Subsequently, stylesheets for transforming
>
> The problem with a documented canonical format is that, like a “minimal subset”, you could probably get broad agreement on 80% of it, but no two people would have the same 80% in mind.
>
> Another problem is that no one wants to author in the canonical format. It’s the format that removes all markup minimization.
>
> I could spin off the normalizing stylesheets, steps 1 to 5 above, optional 6 and 7, into a separate package. And I suppose that could be documented. I don’t know if that’s a TC activity or not, though, as it’s pretty application specific.
>
> > para/simpara: canonical DocBook should only support simpara. para with block-content (tables, lists) must be transformed into a sequence of simpara and other block-content.
>
> That’s in your 80%, is it :-).
>
> > Tables: In canonical DocBook, each table must have table column specifications. Default values are replaced by explicit values. […] which column it starts and where it ends without complex calculations. Content of table cell must be element only.
>
> It sounds like what you really want here isn’t even CALS (or HTML) tables. You want the completely explicit internal format that the xslTNG stylesheets generate during table processing. They turn the entire table into a perfectly rectangular grid, using “ghost” elements for cells that are missing.
>
> That’s kind of true for a few of the other ideas you proposed, like the inline markup. After a while, this starts to feel less like a canonical DocBook and more like a structural interchange format.
>
> > Images: Each image must have at least the attributes for image size and scaling.
>
> Getting those, if the author didn’t provide them, requires extensions and is even then only speculative. I’m sure there are image formats I can’t parse. Authors really should provide them.
>
> > P. S. This text was translated with www.DeepL.com/Translator (free version) from german language.
>
> Wow. It did a remarkably good job. I would not, on a casual reading, have suspected autotranslation.
>
> Be seeing you,
> norm
>
> --
> Norman Tovey-Walsh
> https://nwalsh.com/
>
> Before you criticize someone, walk a mile in his shoes. That way, when you criticize him, you're a mile away and you have his shoes.
Re: [docbook-apps] Canonical DocBook
> Our own stylesheets are therefore divided into at least two phases. First, […]
> As far as I can see, the XSL 3 stylesheets for XslTNG are also similar in structure.

Yep. The xslTNG stylesheets go through several standard stages:

1. Normalize the logical structure (get rid of entity refs, basically)
2. Expand XIncludes
3. Upgrade from 4 to 5 if the input isn’t in a namespace
4. Process transclusions
5. Normalize the markup
6. Process annotations
7. Process external link bases

Plus a couple more that are conditional.

> So there is a point in these stylesheets where the input document is in a sort of "canonical DocBook". However, this canonical format is not documented.

That’s true.

> My suggestion is that the DocBook TC standardize and document the canonical DocBook format. Subsequently, stylesheets for transforming

The problem with a documented canonical format is that, like a “minimal subset”, you could probably get broad agreement on 80% of it, but no two people would have the same 80% in mind.

Another problem is that no one wants to author in the canonical format. It’s the format that removes all markup minimization.

I could spin off the normalizing stylesheets, steps 1 to 5 above, optional 6 and 7, into a separate package. And I suppose that could be documented. I don’t know if that’s a TC activity or not, though, as it’s pretty application specific.

> para/simpara: canonical DocBook should only support simpara. para with block-content (tables, lists) must be transformed into a sequence of simpara and other block-content.

That’s in your 80%, is it :-).

> Tables: In canonical DocBook, each table must have table column specifications. Default values are replaced by explicit values. […] which column it starts and where it ends without complex calculations. Content of table cell must be element only.

It sounds like what you really want here isn’t even CALS (or HTML) tables. You want the completely explicit internal format that the xslTNG stylesheets generate during table processing. They turn the entire table into a perfectly rectangular grid, using “ghost” elements for cells that are missing.

That’s kind of true for a few of the other ideas you proposed, like the inline markup. After a while, this starts to feel less like a canonical DocBook and more like a structural interchange format.

> Images: Each image must have at least the attributes for image size and scaling.

Getting those, if the author didn’t provide them, requires extensions and is even then only speculative. I’m sure there are image formats I can’t parse. Authors really should provide them.

> P. S. This text was translated with www.DeepL.com/Translator (free version) from german language.

Wow. It did a remarkably good job. I would not, on a casual reading, have suspected autotranslation.

Be seeing you,
norm

--
Norman Tovey-Walsh
https://nwalsh.com/

> Before you criticize someone, walk a mile in his shoes. That way, when you criticize him, you're a mile away and you have his shoes.
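The "perfectly rectangular grid" idea can be illustrated with a small, self-contained Python sketch. It is not the xslTNG implementation and does not reproduce its internal format; the input shape (rows of `(label, colspan, rowspan)` tuples), the `GHOST` marker, and the function name `to_grid` are all assumptions made for the example. Here a ghost is any position covered by a span or left out of a short row.

```python
GHOST = None  # hypothetical marker for a position with no cell of its own


def to_grid(rows):
    """Expand rows of (label, colspan, rowspan) cells into a rectangular grid.

    Each real cell appears once, at its top-left position; every other
    position is filled with GHOST.
    """
    occupied = {}  # (row, column) -> label or GHOST
    for r, row in enumerate(rows):
        c = 0
        for label, colspan, rowspan in row:
            # Skip positions already claimed by a rowspan from a row above.
            while (r, c) in occupied:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    occupied[(r + dr, c + dc)] = label if (dr, dc) == (0, 0) else GHOST
            c += colspan
    if not occupied:
        return []
    height = max(r for r, _ in occupied) + 1
    width = max(c for _, c in occupied) + 1
    return [[occupied.get((r, c), GHOST) for c in range(width)]
            for r in range(height)]
```

With such a grid, the start and end column of every cell follow directly from its position, which is the property Frank asked for ("without complex calculations").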
Re: [docbook-apps] Canonical DocBook
OK, I see. Not so easy. However, I never wanted to be unfair to anyone. My hope or expectation was that the stylesheets are already available, because we have xslTNG. We have the stylesheets for preprocessing in xslTNG, but we do not have documentation of the DocBook subset they produce. But of course, when someone writes it down, there will be a discussion about the target format, and someone will suggest a "better" solution, which leads to a change request for the stylesheets ...

Frank

On 27.02.22 at 12:29, Dave Pawson wrote:
> On Sun, 27 Feb 2022 at 11:20, Frank Steimke wrote:
> > What would be a better way to accomplish this?
> >
> > Ask Norm to document the internal format so that a particular community can choose it as the "de facto standard" for developing its own stylesheets based on it?
>
> Not sure at all Frank. Democratic solution perhaps? Majority of TC support A vs B? Then rely on Norm to write appropriate stylesheets? Seems unfair IMHO
>
> regards

To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org
Re: [docbook-apps] Canonical DocBook
On Sun, 27 Feb 2022 at 11:20, Frank Steimke wrote:
> What would be a better way to accomplish this?
>
> Ask Norm to document the internal format so that a particular community can choose it as the "de facto standard" for developing its own stylesheets based on it?

Not sure at all Frank. Democratic solution perhaps? Majority of TC support A vs B? Then rely on Norm to write appropriate stylesheets? Seems unfair IMHO

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
Re: [docbook-apps] Canonical DocBook
What would be a better way to accomplish this?

Ask Norm to document the internal format so that a particular community can choose it as the "de facto standard" for developing its own stylesheets based on it?

Frank

On 27.02.22 at 12:09, Dave Pawson wrote:
> On Sun, 27 Feb 2022 at 11:02, Frank Steimke wrote:
> > My suggestion is that the DocBook TC standardize and document the canonical DocBook format.
>
> (My view). This is the problem Frank? I'm sure, even within the TC, that defining and agreeing which 'form' is to be named Canonical would be contentious? I agree with the motive, I'm less sure about the means of achieving it?
>
> regards
Re: [docbook-apps] Canonical DocBook
On Sun, 27 Feb 2022 at 11:02, Frank Steimke wrote:
> My suggestion is that the DocBook TC standardize and document the canonical DocBook format.

(My view). This is the problem Frank? I'm sure, even within the TC, that defining and agreeing which 'form' is to be named Canonical would be contentious? I agree with the motive, I'm less sure about the means of achieving it?

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
[docbook-apps] Canonical DocBook
Hello List,

I would like to propose a project "canonical DocBook" to the DocBook TC, and I am interested in the opinion of this mailing list. I hope this is the right list.

DocBook is a great system for creating technical documents. We use it successfully for various purposes, which include transformation to formats other than HTML and PDF. For example, we work with stylesheets for transformation into ODF and into NISO-STS. Here, the flexibility of the DocBook schemas is problematic, because it increases complexity. To give a very simple example, the title of a section is valid both with and without an enclosing info element. A template for transforming the title element must account for both possibilities.

Our own stylesheets are therefore divided into at least two phases. First, the input document is transformed into a uniform structure. This ensures, for example, that each title element is always contained in an info element. In a second step, the document is converted into the target format. The advantage of this method is that the transformation in the second phase becomes much simpler.

As far as I can see, the XSLT 3.0 stylesheets of xslTNG are similar in structure. They are certainly much more professional, comprehensive and systematic in design. So there is a point in these stylesheets where the input document is in a sort of "canonical DocBook". However, this canonical format is not documented.

My suggestion is that the DocBook TC standardize and document the canonical DocBook format. Subsequently, stylesheets for transforming valid DocBook 5 documents into the canonical format would be published; possibly these already exist as part of the xslTNG stylesheets.

The advantage would be that other projects could more easily transform canonical DocBook to other formats. They would be able to build on a standard, documented DocBook format of lower complexity.
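The title example from the first phase can be sketched as follows. This is a minimal Python illustration, not the poster's stylesheets and not xslTNG: the `SECTIONING` set is an assumed, incomplete list of elements whose titles are normalized, and `normalize_titles` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
TITLE_TAGS = {f"{{{DB}}}{name}" for name in ("title", "subtitle", "titleabbrev")}

# Assumption: only these containers are normalized in this sketch.
SECTIONING = {"book", "part", "article", "chapter", "appendix", "preface", "section"}


def local(tag):
    """Return the local name of a namespaced tag ('' for comments and PIs)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def normalize_titles(root):
    """Move bare title elements into an info child, creating it if needed."""
    # Snapshot the elements first, because the tree is modified below.
    for parent in list(root.iter()):
        if local(parent.tag) not in SECTIONING:
            continue
        titles = [child for child in parent if child.tag in TITLE_TAGS]
        if not titles:
            continue
        info = parent.find(f"{{{DB}}}info")
        if info is None:
            info = ET.Element(f"{{{DB}}}info")
            parent.insert(0, info)
        for position, title in enumerate(titles):
            parent.remove(title)
            info.insert(position, title)  # titles go first, in document order
    return root
```

After this pass, a second-phase template only needs to match `info/title`, which is the simplification the two-phase design is after.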
Besides the simple example of the title elements, canonical DocBook would have to consider the following aspects, among others:

para/simpara: canonical DocBook should only support simpara. para with block content (tables, lists) must be transformed into a sequence of simpara and other block content.

Tables: In canonical DocBook, each table must have table column specifications. Default values are replaced by explicit values. Spanspec elements are converted to corresponding column start and end positions. Each cell of a table must carry information about its position within the table, so that it is possible to determine at which column it starts and where it ends without complex calculations. The content of a table cell must be element-only.

Images: Each image must have at least the attributes for image size and scaling.

emphasis: explicit values instead of default values (e.g. role='bold'), plus a list of values for role which must be supported (bold, italic, underline).

Lists: explicit values instead of default values (e.g. numeration for orderedlist).

Of course, this task could be exceedingly difficult if we were starting on a greenfield site. I hope that in reality it will be less difficult if we take the xslTNG stylesheets as a basis and accept the format they generate as an intermediate result, after simplifying the structure, as the basis for canonical DocBook standardization.

I would be very interested in the opinion of the members of this list on this proposal.

Sincerely,
Frank Steimke

P. S. This text was translated with www.DeepL.com/Translator (free version) from German.
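The "explicit values instead of default values" items for emphasis and lists could look roughly like this in Python. The defaults in the table are assumptions chosen for illustration (an unmarked emphasis treated as italic, an unmarked orderedlist as arabic); a real canonical format would have to fix these values, including how nested lists are numbered, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"

# Assumption: illustrative defaults only, not taken from any specification.
DEFAULTS = {
    "emphasis": {"role": "italic"},
    "orderedlist": {"numeration": "arabic"},
}


def local(tag):
    """Return the local name of a namespaced tag ('' for comments and PIs)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def make_defaults_explicit(root):
    """Write assumed default attribute values onto elements that omit them.

    Returns the number of attributes added; author-supplied values are
    never overwritten.
    """
    added = 0
    for el in root.iter():
        for name, value in DEFAULTS.get(local(el.tag), {}).items():
            if name not in el.attrib:
                el.set(name, value)
                added += 1
    return added
```

Keeping the defaults in one table makes the contentious part visible: agreeing on the contents of `DEFAULTS` is the kind of decision a documented canonical format would have to settle.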