Internal Subset:

The latest Firefox, Chrome and IE all support the doctype.internalSubset 
property in the DOM. Their behavior diverges slightly when parsing and 
serializing:
For HTML parsing, the internal subset is ignored, as specified in HTML5, and 
the internalSubset property returns null. For XHTML parsing, IE and Firefox 
parse the literal contents of the internal subset up until the closing angle 
bracket into the 'internalSubset' property. Chrome does not.
For serializing, if the browser has stored an internalSubset property, it is 
serialized as part of the Doctype.

Since two of the three main browsers do this, I added this serialization step 
as optional, conditional on the browser storing an internalSubset. If browsers 
choose to remove their internalSubset support, they will still conform to this 
specification.
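
To make the optional step concrete, here is a rough sketch in Python. It is 
not the spec's algorithm; the Doctype class, its field names and 
serialize_doctype are hypothetical stand-ins for illustration only. The 
internal subset is emitted only when the platform actually stored one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doctype:
    # Hypothetical stand-in for a DOM DocumentType node.
    name: str
    public_id: str = ""
    system_id: str = ""
    # None models a browser that did not store an internal subset.
    internal_subset: Optional[str] = None


def serialize_doctype(dt: Doctype) -> str:
    out = "<!DOCTYPE " + dt.name
    if dt.public_id:
        out += ' PUBLIC "' + dt.public_id + '"'
    if dt.system_id:
        if not dt.public_id:
            out += " SYSTEM"
        out += ' "' + dt.system_id + '"'
    # Optional step: runs only if an internal subset was stored.
    if dt.internal_subset is not None:
        out += " [" + dt.internal_subset + "]"
    return out + ">"
```

A browser that drops internalSubset support simply never takes the optional 
branch, so its output is still valid under the same rules.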

CDATASection:

From what I can determine from the DOM spec (DOM4), the CDATASection object has 
been removed to "simplify the DOM platform" (Section 10.2). That seems nice, 
since CDATASections cannot be parsed by the HTML parser defined in HTML5. 
However, CDATASection (as a parser concept) is alive and well in XHTML and XML 
documents and as such these get parsed into CDATASection objects today on all 
browsers. In these cases (XHTML & XML documents), I presume that the DOM spec 
would like browsers to store parsed CDATASection content as Text objects? 
Today, no browser does this.

There shouldn't be any material problem that I can see for browsers to treat 
XHTML/XML parsed CDATASections as Text. Characters accepted without escaping in 
CDATASections like "<" and ">" would be put into a Text node literally, and 
then escaped out on serialization. This will make serialized text containing 
lots of angle brackets much larger than the original text content, but that's 
not a technical downside. There may be compat risk to making this change, but 
that's another story. Since it doesn't hurt browsers to leave it in the 
platform, I wonder whether there are any browser implementers who want to make 
this change? It certainly isn't on IE's radar.
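
As a small illustration of the size difference, here is a sketch in Python. 
The function names and the sample content are made up for this example, and 
the escaping covers only "&", "<" and ">" rather than any spec's full rules.

```python
def serialize_text(data: str) -> str:
    # Escape "&" first so the entities produced below are not re-escaped.
    return data.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def serialize_cdata(data: str) -> str:
    # Assumes data does not contain "]]>", which would end the section early.
    return "<![CDATA[" + data + "]]>"


# Hypothetical content of the kind people tend to wrap in a CDATA section.
content = "if (a < b && b > c) { return a << 2; }"
as_cdata = serialize_cdata(content)
as_text = serialize_text(content)
```

Each escaped character grows to four or five characters, so as_text ends up 
longer than as_cdata even after counting the CDATA delimiters.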

I suppose I could make CDATASection serialization a historical (optional) 
behavior for platforms that preserve the identity of CDATASection objects in 
the DOM. I hate to pull it out altogether, because this is something that all 
platforms support interoperably today. Leaving it in the spec is not a problem 
because once a browser starts converting CDATASection input to Text, then the 
identity of the object to serialize is now Text, and the CDATASection 
serialization rules don't apply.
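
A rough sketch of that argument in Python, assuming CDATASection derives from 
Text as it does in DOM L3 Core. The class and function names are hypothetical, 
and the preserve_identity flag stands in for whatever the platform's XML 
parser chooses to do.

```python
class Text:
    def __init__(self, data: str) -> None:
        self.data = data


class CDATASection(Text):
    """Exists only on platforms that preserve CDATASection identity."""


def parse_cdata(data: str, preserve_identity: bool) -> Text:
    # A platform that converts CDATA input to Text never creates the subclass.
    return CDATASection(data) if preserve_identity else Text(data)


def serialize_node(node: Text) -> str:
    # Check the more specific type first, since CDATASection is also a Text.
    if isinstance(node, CDATASection):
        return "<![CDATA[" + node.data + "]]>"
    return node.data.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
```

Once the parser stops producing CDATASection objects, the first branch is 
never reached and needs no change to the Text rules.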

It seems like there may be a separate concern with the references though. I 
don't currently make a reference to DOM L3 Core for CDATASection or 
internalSubset. Should I be?

-Travis

From: [email protected] [mailto:[email protected]] 
On Wed, Nov 27, 2013 at 5:22 PM, Travis Leithead 
<[email protected]> wrote:
> I did end up talking about the (historical) internalSubset property of the 
> Doctype object for serialization--since browsers will include it if they 
> support it. Is this what you're referring to?

Do all browsers include it or only some?

I was referring to CDATASection. I had not noticed this doctype-related change, 
which also seems substantive. If you want to change the tree model relative to 
DOM, you really ought to argue that against the DOM specification, and not make 
willy-nilly changes on the serialization side.

--
http://annevankesteren.nl/
