Re: [docbook-apps] change default HTML encoding to UTF-8
Hi Bob.

Do the stylesheets output all of html 4, html 5, xhtml and xhtml5? Or did you conflate html 4 and html 5? See more below.

On 14 Aug 2017, at 18:48, Bob Stayton wrote:

We have a bug report suggesting that the default output encoding for the DocBook html stylesheet be changed from ISO-8859-1 to UTF-8.

I agree with this bug report. Why? Well, for one thing, you - here - talk about "html", and "html" today means "html 5". HTML 5.x recommends that documents be authored using UTF-8. Also, when I look at the link in the forwarded message (https://www.oxygenxml.com/forum/viewtopic.php?f=6=14812=43711#p43711), I note that the discussion thread talks about HTML 5. I cannot see that HTML 4 is mentioned at all in that thread.

Note this only applies to the original HTML 4 output from the "html" directory.

Are you saying that the stylesheet also outputs HTML 5? (Note that I ask about "HTML 5" and not about xhtml or xhtml5.)

The "xhtml" and "xhtml5" outputs already output UTF-8.

The justification for that ought to be that XML defaults to UTF-8. Xhtml and xhtml5 are not 'html'.

The original HTML 4 standard said ISO-8859-1 was the default encoding, but that UTF-8 would be acceptable.

I am not able to find such a statement in the HTML 4 specification. I looked at the one-page version: https://www.w3.org/TR/html401/html40.txt

UTF-8 "took over" as the dominant encoding on the Web long before HTML 5 became the official version of HTML. Technically speaking, ISO-8859-1 is STILL the default HTML encoding from the user agents' perspective. It is only from an authoring perspective that HTML 5 recommends UTF-8. But the DocBook stylesheets are an authoring tool, and there is only one processing model for HTML: the one defined by the latest HTML spec. Thus the stylesheets should use UTF-8. At the very least, the DocBook stylesheets should not use the HTML 4 specification as a justification for failing to output HTML 5 as UTF-8.
It isn't difficult for a user to change the output to UTF-8, but it does require a customization. The question here is whether to change the default output encoding to UTF-8.

If the user has to change the output to UTF-8 in order to produce HTML 5 output, then the stylesheet does not follow HTML5's recommendations. The fact that the user can produce XHTML - and thus automatically get UTF-8 - does not alter the picture.

This would change the HTML output to replace character references like
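
For readers following the thread: the customization Bob refers to ("it does require a customization") could look roughly like the sketch below. It is not taken from the thread. The import URI is an assumption about where the stock stylesheets are resolved from, and the chunker parameter only matters if chunk.xsl is imported instead of docbook.xsl.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumed location of the stock HTML stylesheet; adjust to your installation. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

  <!-- Single-file output: this xsl:output has higher import precedence
       than the one in the imported stylesheet, so its encoding wins. -->
  <xsl:output method="html" encoding="UTF-8" indent="no"/>

  <!-- Chunked output is written by the chunker, which reads this parameter
       rather than xsl:output. Only relevant when chunk.xsl is imported. -->
  <xsl:param name="chunker.output.encoding">UTF-8</xsl:param>

</xsl:stylesheet>
```

With method="html", XSLT 1.0 says the processor should also add a META element declaring the encoding actually used right after the start of HEAD, so the declaration and the bytes ought to stay in agreement. Whether a given processor does so is worth checking in the output.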
[docbook-apps] Getting the HTML encoding declaration in XML output
Hello.

Back in 2009, Michael Leslie asked the list (but received no answer) the following: [1]

«Does anyone have any experience generating UTF-8 XHTML that can be consistently rendered in both Firefox and IE?»

And, like him, I want to use DocBook to produce HTML-compatible XHTML. However, as a (former) member of the HTML working group (and co-editor of a spec for polyglot markup - that is: XHTML that is HTML as well), I can say that the question has (since) been answered by the HTML5.x specifications: HTML-compatible XHTML(5) documents MUST NOT include the XML declaration, and they MUST be UTF-8 encoded, and the encoding must be declared using either the HTTP header, the byte-order mark or the HTML encoding declaration. The latter - the HTML encoding declaration - comes in two variants:

1) <meta charset="UTF-8"/>
2) <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>

Both work equally well in Web browsers, but occasionally there are some fringe, legacy implementations that only support the http-equiv variant.

The DocBook XSL book also tries to explain the encoding issues of HTML and XHTML.[2] See the chapter on 'Special characters' under the heading «HTML encoding». However, the book fails to nail the solution that HTML5.x specifies.

Furthermore, it is (probably) well known that when the output mode of DocBook XSL is set to 'xml', then, by default, the HTML encoding declaration is not included. As a result, browsing a DocBook XSL-generated XHTML file as text/html (e.g. by giving it the extension .html instead of .xhtml) fails, because Web browsers receive no encoding declaration from the HTML document itself.

Hence, I propose that in the next version of DocBook XSL, you allow the HTML encoding declaration (both variants) to be used. In fact, it would be best if, by default, the HTML encoding declaration were always included.

To solve my own problem, I have created the following customization (that I use with XMLmind XML Editor), see below.
If there is a better/more generic way to do it, I would be thankful for your help. (For instance, I am not sure why, in my implementation, I had to include the namespace declaration. I'm sure that could have been avoided. Anyway, it is excluded in the final output, so it does not matter.)

    [...] xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    [...]
    <meta xmlns="http://www.w3.org/1999/xhtml" http-equiv="Content-Type" content="text/html;charset=UTF-8" />
    [...]

Btw - and not to stamp on too many toes, but: I had a look at how the TEI XSL sheets work, and they seem to have taken care of the issue: They output their HTML as XML but without the XML declaration.[3] And they include the HTML encoding declaration in both their HTML outputs and their Epub3 output - which seems very wise.[4] I hope that DocBook XSL follows the same lead.

In fact, to me DocBook XSL's html output mode seems like a waste of time. Better to simply produce HTML-compatible XML output.

[1] https://lists.oasis-open.org/archives/docbook-apps/200902/msg00099.html
[2] http://www.sagehill.net/docbookxsl/SpecialChars.html
[3] http://www.tei-c.org/release/doc/tei-xsl/profiles/default/html/to0.html#bt_src_O_S_to.xsl
[4] http://www.tei-c.org/release/doc/tei-xsl/profiles/default/html/to4.html#bt_src_T_metaHTMLS_..htmlhtml_param.xsl

--
leif halvard silli

To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org
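
For anyone who wants to try the approach described in this message, here is a minimal sketch of such a customization layer. It is a reconstruction under assumptions, not the stylesheet that was originally posted: the import URI and the choice of the 'user.head.content' hook are assumptions, and only the meta element itself comes from the message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumed location of the stock XHTML stylesheet; adjust to your installation. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"/>

  <!-- Keep XML serialization and UTF-8, but drop the XML declaration,
       as HTML-compatible XHTML requires. -->
  <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>

  <!-- 'user.head.content' is empty in the stock stylesheets and is called
       while the head is built. Using it here is an assumption of this sketch. -->
  <xsl:template name="user.head.content">
    <!-- The namespace declaration puts the element in the XHTML namespace;
         without it the literal result element would be in no namespace. -->
    <meta xmlns="http://www.w3.org/1999/xhtml"
          http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
  </xsl:template>

</xsl:stylesheet>
```

The namespace point may be the answer to the question about why the declaration was needed: elements written in a customization layer belong to whatever namespace is in scope in that layer, not to the namespace used by the imported stylesheet. Declaring xmlns="http://www.w3.org/1999/xhtml" once on xsl:stylesheet should have the same effect for every literal result element in the file.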
Re: [docbook-apps] Conversion: xml:lang not added to root element of Epub/XHTML1/XHTML5 files
Hi Bob and Dave,

One thing that has held me back from using DocBook is that the XSLT XHTML transformation results I have seen have not been up to my expectations. So, thanks for your great feedback. See more below.

On 15 Feb 2017, at 8:37, Dave Pawson wrote:

On 15 February 2017 at 01:48, Bob Stayton <b...@sagehill.net> wrote:

Hi Leif,

This can be easily done by adding the following to your customization layer:

[...]

This makes use of two utility templates. The 'root.attributes' template is called right after the opening tag of <html>, and it should output xsl:attribute elements. The 'xml.language.attribute' will generate an xml:lang attribute name and value.

Thanks. This works very well. It even works for the lang attribute:

[...]

I'd like to hear from members of the mailing list about whether you think this should be default behavior or not.

(1) According to the i18n community of the W3.org: «Always use a language attribute on the html tag to declare the default language of the text in the page. When the page contains content in another language, add a language attribute to an element surrounding that content.» See https://www.w3.org/International/questions/qa-html-language-declarations

(2) I try to follow the same attitude when I create a DocBook document: I declare the (main) document language on the root element.

(3) If this community agrees that the XSLT sheet should declare the language on the <html> element, then, in addition, the XSLT sheet should stop declaring the language on the stand-in element for the DocBook root element (in this case, this means that the language should not be declared on the HTML [...] element). It is no error to repeat the declaration, but it is not necessary.

Can this be avoided?

Fair request IMHO.

Thanks.

Every element? Suggest that is too much.

Of course. On every HTML root element, only: <html>. That is: the one and only <html> element.
Anything else is too much - unless you need to override the language declaration on the root element (because the text switches to/from another language).

A parameter on the root element, then customisation thereafter perhaps? Pareto: suggest most documents will major in one language with (small?) parts in another?

Indeed. Btw, I think the main reason for the issues we discuss here is the fact that the DocBook vocabulary does not match the HTML vocabulary.

Which reminds me of another, perhaps minor, issue: If the DocBook title element happens to be in German, while the document otherwise is in English (Example: <title xml:lang="de">Nein!...), then the XSLT sheet should declare the language on the <title> element: <title [...] xmlns='http://www.w3.org/1999/xhtml'>Nein</title>. Currently, the language is declared on the stand-in element of the DocBook <title> element - namely the HTML [...] element - but not on the HTML <title> element:

* Current result (when Bob's customization layer is added):
  Nein Nein …
* Wanted result:
  Nein Nein …

--
leif halvard silli
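
The customization quoted from Bob did not survive in this copy of the message. Based only on his description of the two utility templates, it may have looked something like the sketch below. The import URI is an assumption, and the second call ('language.attribute', intended to produce the plain lang attribute mentioned in the reply) is a guess rather than something the thread confirms.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumed location of the stock XHTML stylesheet; adjust to your installation. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"/>

  <!-- Called right after the opening <html> tag; whatever it outputs
       must be xsl:attribute results, which then land on <html>. -->
  <xsl:template name="root.attributes">
    <!-- Described in the thread: generates an xml:lang attribute. -->
    <xsl:call-template name="xml.language.attribute"/>
    <!-- Assumption of this sketch: generates the matching lang attribute. -->
    <xsl:call-template name="language.attribute"/>
  </xsl:template>

</xsl:stylesheet>
```

This only adds the declaration to <html>. It does not remove the repeated declaration from the element that stands in for the DocBook root element, and it does nothing for the HTML <title> case raised at the end of the message.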