Henry and Martin,

Martin J. Dürst, Wed, 18 Dec 2013 16:59:10 +0900, in reply to Henry S. 
Thompson:

>>   * In cases where conflicting information is supplied (from charset
>>     param, BOM and/or XML encoding declaration) it give a BOM, if
>>     present, authoritative status;
> 
> I'm a bit uneasy about the fact that we now have BOM (internal) - 
> charset (external) - encoding (internal), i.e. 
> internal-external-internal,

A better way of looking at it would be that we now get External-Internal.

Where external is subdivided into the charset parameter and the encoding 
signature [BOM], and internal is subdivided into the encoding declaration 
and the default/fallback encoding. Yeah, it might be that the lack of a 
clear classification of the BOM as an external method is quite directly 
linked to the lack of interoperability.

Previously we had External-Limbo-Internal. However, per XML, both BOM 
and charset param are external.[1] The draft makes a point about 
this:[2] ”[XML] further states that the BOM is an encoding signature, 
and is not part of either the markup or the character data of the XML 
document.”

> but I guess there is lots of experience 
> in HTML 5 for giving the BOM precedence.

Sorry for focusing on XML rather than XML media types, but I think both 
of them should be edited.

The way of looking at it that I propose above also incorporates the 
fact that XML-capable Web browsers (the HTML 5 browsers) give 
precedence to the BOM, and do so without a fatal error even if there is 
a (conflicting) XML encoding declaration. (Btw, I find it very odd that, 
up until now, the *charset* parameter could override the encoding 
declaration, but if the BOM does the same [that is: overrides the 
encoding declaration], *then* it is a fatal error ...)

It makes sense to treat all external encoding declaration methods the 
same. Currently only the external *transport* protocol may override the 
internal mechanism. But the BOM should have the same ”right”.

Therefore I would suggest that the other spec, XML 1.0, section 4.3.3 
[3], be changed as follows (see the <INS> element):

]]In the absence of information provided by an external transport 
protocol (e.g. HTTP or MIME) <INS>OR BY THE BYTE ORDER MARK</INS>, it 
is a fatal error for an entity including an encoding declaration to be 
presented to the XML processor in an encoding other than that named in 
the declaration,[[

It should still be an error, but not a fatal error, if the XML encoding 
declaration conflicts with the external method - BOM or HTTP.
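
To make the External-Internal order concrete, here is a rough Python 
sketch of how I read the proposal. It is only an illustration, not text 
from either spec: the names pick_encoding, _compatible, _BOMS and _DECL 
are made up, UTF-32 signatures are left out, the declaration matching is 
deliberately loose, and the final UTF-8 fallback is a simplification.

```python
import codecs
import re

# BOM signatures considered by this sketch (UTF-32 omitted for brevity).
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

# Loose match for the encoding declaration at the very start of the entity.
_DECL = re.compile(
    r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def _compatible(external, declared):
    """True if two labels agree; plain "UTF-16" matches UTF-16BE/LE."""
    e, d = external.upper(), declared.upper()
    return e == d or (d == "UTF-16" and e.startswith("UTF-16"))


def pick_encoding(data, charset=None):
    """Return (encoding, errors) for the raw bytes of an XML entity.

    Assumed order: BOM, then charset parameter (both external), then the
    encoding declaration, then the default (both internal). Conflicts are
    collected as non-fatal errors instead of aborting.
    """
    errors = []

    # External, step 1: the encoding signature (BOM).
    bom, offset = None, 0
    for signature, name in _BOMS:
        if data.startswith(signature):
            bom, offset = name, len(signature)
            break

    # Internal: read the start of the entity to find a declaration.
    head = data[offset:offset + 200].decode(bom or "latin-1", errors="replace")
    match = _DECL.match(head)
    declared = match.group(1) if match else None

    # External, step 2: the charset parameter applies only without a BOM.
    external = bom or charset
    if bom and charset and not _compatible(bom, charset):
        errors.append("charset parameter conflicts with BOM")
    if external and declared and not _compatible(external, declared):
        errors.append("encoding declaration conflicts with external method")

    return (external or declared or "UTF-8"), errors
```

With this, a UTF-8 BOM followed by encoding="ISO-8859-1" yields UTF-8 plus 
one recorded (non-fatal) error, which is the behaviour I am arguing for.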

[1] http://www.w3.org/TR/REC-xml/#NT-document
[2] http://tools.ietf.org/html/draft-ietf-appsawg-xml-mediatypes-06#section-3.3
[3] http://www.w3.org/TR/REC-xml/#charencoding
-- 
leif halvard silli

