Re: [docbook-apps] change default HTML encoding to UTF-8

2017-08-15 Thread Leif Halvard Silli
Hi Bob. Do the stylesheets output all of html 4, html 5, xhtml and xhtml5? 
Or did you conflate html 4 and html 5? See more below.


On 14 Aug 2017, at 18:48, Bob Stayton wrote:

We have a bug report suggesting that the default output encoding for 
the DocBook html stylesheet be changed from ISO-8859-1 to UTF-8.


I agree with this bug report. Why? Well, for one thing, you - here - 
talk about "html", and "html" today means "html 5". HTML 5.x recommends 
that documents are authored using UTF-8.


Also, when I look at the link in the forwarded message 
(https://www.oxygenxml.com/forum/viewtopic.php?f=6&t=14812&p=43711#p43711), 
I note that the discussion thread talks about HTML 5. I am not able to 
see that HTML 4 is mentioned at all in that thread.


Note this only applies to the original HTML 4 output from the "html" 
directory.



Are you saying that the stylesheet also outputs HTML 5? (Note that I ask 
about "HTML 5" and not about xhtml or xhtml5.)




The "xhtml" and "xhtml5" outputs already output UTF-8.



The justification for that ought to be that XML defaults to UTF-8. Xhtml 
and xhtml5 are not 'html'.



The original HTML 4 standard said ISO-8859-1 was the default encoding, 
but that UTF-8 would be acceptable.


I am not able to find such a statement in the HTML 4 specification. I 
looked at the one-page version: https://www.w3.org/TR/html401/html40.txt


UTF-8 “took over” as the dominant encoding on the Web long before 
HTML 5 became the official version of HTML.


Technically speaking ISO-8859-1 is STILL the default HTML encoding, from 
user agents’ perspective. It is only from an authoring perspective 
that HTML 5 recommends UTF-8.


The DocBook stylesheets are an authoring tool. There is only one processing 
model for HTML, and that model is defined by the latest HTML spec. Thus 
they should use UTF-8, as that spec recommends.


At the very least, the DocBook stylesheet should not use the HTML 4 
specification as a justification for failing to output HTML 5 as UTF-8.


It isn't difficult for a user to change the output to UTF-8, but it 
does require a customization.  The question here is whether to change 
the default output encoding to UTF-8.
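For the non-chunked HTML 4 output, such a customization could be as small as the following sketch. It assumes that the canonical DocBook XSL import URI is resolved through an XML catalog; otherwise, point the import at a local copy of html/docbook.xsl. The sketch relies on two things. First, in XSLT 1.0 an xsl:output in the importing stylesheet takes precedence over the one in docbook.xsl. Second, the chunked stylesheets take their encoding from the chunker.output.encoding parameter instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a customization layer that switches the HTML 4 output of
     the "html" stylesheets from ISO-8859-1 to UTF-8.
     Assumption: the import URI below is resolved via an XML catalog. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

  <!-- Higher import precedence than the xsl:output in docbook.xsl, so this
       encoding wins. With method="html", XSLT 1.0 processors are expected
       to add a matching META charset element right after <head>. -->
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Only relevant when chunk.xsl is imported instead of docbook.xsl. -->
  <xsl:param name="chunker.output.encoding">UTF-8</xsl:param>

</xsl:stylesheet>
```

With xsltproc, for example, it would be run as: xsltproc -o mydoc.html utf8-html.xsl mydoc.xml (the file names are placeholders).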


If the user has to change the output to UTF-8 in order to produce HTML 5 
output, then the stylesheet does not follow HTML5’s recommendations.


The fact that the user can produce XHTML - and thus automatically get 
UTF-8 - does not alter the picture.


This would change the HTML output to replace character references like 


[docbook-apps] Getting the HTML encoding declaration in XML output

2017-02-20 Thread Leif Halvard Silli
Hello. Back in 2009, Michael Leslie asked the list (but received no 
answer) the following: [1]


   «Does anyone have any experience generating UTF-8 XHTML
   that can be consistently rendered in both Firefox and IE?»

And, like him, I want to use Docbook to produce HTML-compatible XHTML. 
However, as a (former) member of the HTML working group (and co-editor 
of a spec for polyglot markup - that is: XHTML that is HTML as well), I 
can say that the question has since been answered by the HTML5.x 
specifications: HTML-compatible XHTML(5) documents MUST NOT include the 
XML declaration, they MUST be UTF-8 encoded, and the encoding must be 
declared using either the HTTP header, the byte-order mark or the 
HTML encoding declaration. The latter - the HTML encoding declaration - 
comes in two variants:


 1) <meta charset="UTF-8"/>
 2) <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

Both work equally well in Web browsers, but occasionally there are some 
fringe, legacy implementations that only support the http-equiv variant.


The Docbook XSL book also tries to explain the encoding issues of HTML and 
XHTML.[2] See the chapter ‘Special characters’ under the heading 
«HTML encoding». However, the book fails to nail the solution that 
HTML5.x specifies.


Furthermore, it is (probably) well known that when the output mode of 
Docbook XSL is set to 'xml', the HTML encoding declaration is not 
included by default. As a result, when a Docbook XSL-generated XHTML 
file is browsed as text/html (e.g. because it is given the extension 
.html instead of .xhtml), Web browsers receive no encoding declaration 
from the HTML document itself.


Hence, I propose that in the next version of Docbook XSL, you allow the HTML 
encoding declaration (both variants) to be used. In fact, it would be 
best if the HTML encoding declaration were always included by default.


To solve my own problem, I have created the following customization 
(that I use with XMLmind XML editor), see below. If there is a better, more 
generic way to do it, I would be thankful for your help. (For instance, I 
am not sure why, in my implementation, I had to include the namespace 
declaration. I’m sure that could have been avoided. Anyway, it is 
excluded in the final output, so it does not matter.)


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
  <!-- … -->
    <meta xmlns="http://www.w3.org/1999/xhtml"
      http-equiv="Content-Type"
      content="text/html;charset=UTF-8"
    />
  <!-- … -->
</xsl:stylesheet>


Btw - and not to stamp on too many toes, but: I had a look at how the 
TEI xsl sheets work, and they seem to have taken care of the issue: 
They output their HTML as XML but without the XML declaration.[3] And 
they include the HTML encoding declaration in both their HTML outputs as 
well as their Epub3 output - which seems very wise.[4] I hope that 
Docbook XSL follows the same lead. In fact, to me Docbook XSL’s html 
output mode seems like a waste of time. Better to simply produce 
HTML-compatible XML output.
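The approach described above has three parts: UTF-8, XML serialization without the XML declaration, and an HTML encoding declaration in <head>. It could be sketched as a small customization layer for the xhtml stylesheets. The sketch makes three assumptions. The import URI is resolved through a catalog. The user.head.content template is the DocBook XSL hook for adding elements to <head>. The http-equiv variant is used because, as noted above, some legacy tools support only that one.

The sketch may also explain the namespace declaration mentioned above. A literal result element in a stylesheet is in whatever default namespace is in scope there. Without xmlns="http://www.w3.org/1999/xhtml", the <meta> element would be in no namespace, and the serializer would have to write xmlns="" on it. Since the surrounding <head> is already in the XHTML namespace, the declaration is not repeated in the output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: XHTML output that also works when served as text/html.
     Assumptions: catalog-resolved import URI; the user.head.content hook. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"/>

  <!-- UTF-8, serialized as XML, but without the XML declaration.
       Attributes not given here (e.g. doctype) come from the imported xsl:output. -->
  <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>

  <!-- Emits the HTML encoding declaration inside <head>. The xmlns puts the
       literal <meta> in the XHTML namespace, like the surrounding <head>. -->
  <xsl:template name="user.head.content">
    <meta xmlns="http://www.w3.org/1999/xhtml"
          http-equiv="Content-Type"
          content="text/html; charset=UTF-8"/>
  </xsl:template>

</xsl:stylesheet>
```

HTML5 also requires the element carrying the encoding declaration to be serialized completely within the first 1024 bytes of the document. It is therefore worth checking where the hook places it in the generated <head>. For chunked output, the stylesheets use separate chunker.output.* parameters, so that case needs its own check.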


[1] https://lists.oasis-open.org/archives/docbook-apps/200902/msg00099.html
[2] http://www.sagehill.net/docbookxsl/SpecialChars.html
[3] http://www.tei-c.org/release/doc/tei-xsl/profiles/default/html/to0.html#bt_src_O_S_to.xsl
[4] http://www.tei-c.org/release/doc/tei-xsl/profiles/default/html/to4.html#bt_src_T_metaHTMLS_..htmlhtml_param.xsl
--
leif halvard silli

-
To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org



Re: [docbook-apps] Conversion: xml:lang not added to root element of Epub/XHTML1/XHTML5 files

2017-02-15 Thread Leif Halvard Silli

Hi Bob and Dave,

One thing that has held me back from using DocBook is that the XHTML 
results of the XSLT transformations I have seen have not lived up to my 
expectations. So, thanks for your great feedback. See more below.


On 15 Feb 2017, at 8:37, Dave Pawson wrote:

On 15 February 2017 at 01:48, Bob Stayton <b...@sagehill.net> wrote:

Hi Leif,
This can be easily done by adding the following to your customization 
layer:



  


This makes use of two utility templates. The 'root.attributes' template is
called right after the opening tag of <html>, and it should output
xsl:attribute elements.

The 'xml.language.attribute' will generate an xml:lang attribute name and
value.


Thanks. This works very well. It even works for the lang attribute:
   
  

I'd like to hear from members of the mailing list about whether you think
this should be default behavior or not.
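Going by the two template names in Bob's message, and by Leif's report that the lang attribute works the same way, such a customization presumably looks roughly like the following sketch. root.attributes and xml.language.attribute are the names Bob gives. The language.attribute name for the lang variant is an assumption here: it is believed to be the DocBook XSL template that emits lang on ordinary output elements. The import URI is also an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: declare the document language on the one <html> element. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml5/docbook.xsl"/>

  <!-- Called right after the opening <html> tag, so it should only
       output attributes (per Bob's description). -->
  <xsl:template name="root.attributes">
    <!-- xml:lang, taken from the DocBook document -->
    <xsl:call-template name="xml.language.attribute"/>
    <!-- lang; assumed template name, see the note above -->
    <xsl:call-template name="language.attribute"/>
  </xsl:template>

</xsl:stylesheet>
```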


(1) According to the i18n community of the W3C:

   «Always use a language attribute on the html tag to declare the 
default language of the text in the page. When the page contains content 
in another language, add a language attribute to an element surrounding 
that content.»


   See 
https://www.w3.org/International/questions/qa-html-language-declarations


(2) I try to follow the same attitude when I create a DocBook document: 
I declare the (main) document language on the root element.


(3) If this community agrees that the XSLT sheet should declare the 
language on the <html> element, then, in addition, the XSLT sheet should 
stop declaring the language on the stand-in element for the DocBook root 
element, that is, on the HTML element that the DocBook root element is 
converted to. It is no error to repeat the declaration, but it is not 
necessary.
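Until the stylesheets themselves stop repeating the declaration, a separate clean-up pass over the generated XHTML could remove it. The following generic XSLT 1.0 sketch is not part of DocBook XSL. It copies the document unchanged, except that it drops any lang or xml:lang attribute whose value equals that of the nearest ancestor carrying the same attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a clean-up pass, run on the XHTML output (not on DocBook):
     removes language declarations that only repeat the inherited value. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" encoding="UTF-8" omit-xml-declaration="yes"/>

  <!-- Identity template: copy every node and attribute as-is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop xml:lang when the nearest ancestor with xml:lang has the same
       value ([1] on the reverse ancestor axis selects the nearest one). -->
  <xsl:template match="@xml:lang[. = ../ancestor::*[@xml:lang][1]/@xml:lang]"/>

  <!-- The same for the plain HTML lang attribute. -->
  <xsl:template match="@lang[. = ../ancestor::*[@lang][1]/@lang]"/>

</xsl:stylesheet>
```

Values are compared as plain strings, so "en" and "EN" count as different. In that case the attribute is simply kept, which is harmless. The pass does not copy a DOCTYPE. If you need one, add doctype-system (and doctype-public) to its xsl:output.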


Can this be avoided?


Fair request IMHO.


Thanks.


Every element? Suggest that is too much.


Of course. Only on the HTML root element, <html>: the one and only 
<html> element. Anything else is too much, unless you need to 
override the language declaration on the root element (because the text 
switches to/from another language).


A parameter on the root element, then customisation thereafter 
perhaps?
Pareto: suggest most documents will major in one language with 
(small?) parts in another?


Indeed.

Btw, I think the main reason for the issues we are discussing here is 
the fact that the DocBook vocabulary does not match the HTML vocabulary. 
Which reminds me of another, perhaps minor, issue:


If the DocBook <title> element happens to be in German while the document 
otherwise is in English (example: <title xml:lang="de">Nein!...), then 
the XSLT sheet should declare the language on the HTML <title> element as 
well: <title … xmlns='http://www.w3.org/1999/xhtml'>Nein …. Currently, the 
language is declared on the stand-in element of the DocBook <title> 
element (the HTML heading) but not on the HTML <title> element:


* Current result (when Bob’s customization layer is added):

Nein
Nein …

* Wanted result:

Nein
    Nein …
--
leif halvard silli
