Re: [whatwg] On tag inference

Henri Sivonen Thu, 01 Sep 2005 10:49:06 -0700

On Aug 29, 2005, at 22:29, Henri Sivonen wrote:

What kind of approach to tag inference can HTML5 be expected to take? For an SGML validator that is parsing HTML 4, the set of possible element names is finite. However, a browser needs to deal with an infinite set of potential element names. Therefore, it makes a difference whether end tag inference is based on what is allowed as a child of an element or on what elements are not allowed.
Example:
<p><foo>
Is 'foo' an element that is not allowed as a child of 'p' and, therefore, implicitly closes the 'p'? Or is 'foo' not on the list of elements that close 'p' and, therefore, does not implicitly close it? Which way are the inference rules going to be defined?

I think the latter approach should be chosen, because otherwise it would be impossible to extend HTML in the future with an element that can occur as a child of 'p'.
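To make the difference concrete, here is a minimal sketch of the two policies applied to the <p><foo> example. The two sets are deliberately abbreviated and the function names are made up for illustration; nothing here is taken from a spec.

```python
# Abbreviated, hypothetical sets used only to contrast the two policies.
ALLOWED_IN_P = {"em", "strong", "a", "span"}  # policy A: what may be a child of 'p'
CLOSES_P = {"p", "div", "ul", "table"}        # policy B: what closes 'p'


def closes_p_by_allow_list(name):
    """Policy A: any element not known to be allowed in 'p' closes it."""
    return name not in ALLOWED_IN_P


def closes_p_by_close_list(name):
    """Policy B: only elements explicitly listed as closers close 'p'."""
    return name in CLOSES_P


if __name__ == "__main__":
    # An unknown element closes the paragraph under A but not under B.
    print(closes_p_by_allow_list("foo"), closes_p_by_close_list("foo"))
```

Under policy A, an element introduced later could never appear inside 'p' in older parsers, which is the extensibility problem described above.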


Therefore:

End tag inference

I made the following list based on the HTML 4.01 Transitional DTD. Before the colon on each line there is an element whose end tag is optional. After the colon, there is the list of elements whose start tag can cause the end tag to be inferred. How should this list be augmented for HTML5? E.g., should a start tag for <section> close a paragraph?

p: p, h1, h2, h3, h4, h5, h6, ol, ul, pre, dl, div, center, noscript, noframes, blockquote, form, isindex, hr, table, fieldset, address

dt: dt, dd
dd: dt, dd
li: li
thead: tfoot, tbody
tfoot: tbody
tbody: tbody
colgroup: colgroup, thead, tfoot, tbody, tr
tr: tr, tfoot, tbody
td: td, th, tr, tfoot, tbody
th: td, th, tr, tfoot, tbody
html:
body:
head: ANY BUT script, style, meta, link, object, title, isindex, base
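As a rough sketch, the list above could be encoded as a lookup table, with the 'head' line handled as the one negated rule. The names END_TAG_CLOSERS, HEAD_CONTENT, start_closes and infer_end_tags are invented for this illustration, and the open-element stack is assumed to be a plain Python list with the top at the end.

```python
# The list above as data: element with optional end tag -> start tags that close it.
END_TAG_CLOSERS = {
    "p": {"p", "h1", "h2", "h3", "h4", "h5", "h6", "ol", "ul", "pre", "dl",
          "div", "center", "noscript", "noframes", "blockquote", "form",
          "isindex", "hr", "table", "fieldset", "address"},
    "dt": {"dt", "dd"},
    "dd": {"dt", "dd"},
    "li": {"li"},
    "thead": {"tfoot", "tbody"},
    "tfoot": {"tbody"},
    "tbody": {"tbody"},
    "colgroup": {"colgroup", "thead", "tfoot", "tbody", "tr"},
    "tr": {"tr", "tfoot", "tbody"},
    "td": {"td", "th", "tr", "tfoot", "tbody"},
    "th": {"td", "th", "tr", "tfoot", "tbody"},
    "html": set(),
    "body": set(),
}

# 'head' is closed by ANY start tag except these.
HEAD_CONTENT = {"script", "style", "meta", "link", "object", "title",
                "isindex", "base"}


def start_closes(open_element, start):
    """Return True if the start tag `start` implies the end of `open_element`."""
    if open_element == "head":
        return start not in HEAD_CONTENT
    return start in END_TAG_CLOSERS.get(open_element, set())


def infer_end_tags(stack, start):
    """Pop every element the start tag closes; return them in closing order."""
    closed = []
    while stack and start_closes(stack[-1], start):
        closed.append(stack.pop())
    return closed
```

With this encoding an unknown element such as 'foo' closes nothing except 'head', which matches the "list of elements that close" approach argued for above.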


Start tag inference

* If the top of the stack is 'table' and the element start is 'tr', infer 'tbody'.
* If the stack is empty and the element start is anything but 'html', infer 'html'.
* If the top of the stack is 'html', the element start is not 'head' and 'head' has not been seen yet, infer 'head'.
* If the top of the stack is 'html', the element start is not 'body' and 'head' has been seen, infer 'body'.
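The four rules above might be written as a single function. This is only a sketch: infer_start_tag and the head_seen flag are names chosen for the illustration, and the stack is assumed to be a list with the top at the end.

```python
def infer_start_tag(stack, start, head_seen):
    """Return the name of the element to infer before `start`, or None.

    `stack` is the list of open elements (top at the end); `head_seen`
    records whether a 'head' start has already been reported.
    """
    if not stack:
        # Empty stack: anything but 'html' implies 'html'.
        return "html" if start != "html" else None
    top = stack[-1]
    if top == "table" and start == "tr":
        return "tbody"
    if top == "html":
        if not head_seen and start != "head":
            return "head"
        if head_seen and start != "body":
            return "body"
    return None
```

The function makes at most one inference per call, so the caller decides how often to repeat it.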

Should (in memory of HTML 4.01 Transitional) character data imply the start of body?

As far as I can tell, there are four kinds of inference needed when parsing *conforming* documents (i.e. no second stack for residual style):

1) Element end causes the end of the element that is on the top of the stack.

If the top of the stack does not match the element end event, see if the top of the stack is on the list of elements whose end tag is optional. Pop and report the end of the popped element if yes. Err if not. Repeat.

2) End of the data stream causes the end of the element that is on the top of the stack.

See if the top of the stack is on the list of elements whose end tag is optional. Pop and report the end of the popped element if yes. Err if not. Repeat.
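Cases 1) and 2) can be sketched as follows. OPTIONAL_END is the left-hand column of the end tag list; ParseError, handle_end and handle_eof are names invented for the sketch. Raising an error when the stack runs out without a match is my assumption, since the description above does not cover that situation.

```python
# Elements whose end tag is optional (left-hand side of the end tag list).
OPTIONAL_END = {"p", "dt", "dd", "li", "thead", "tfoot", "tbody", "colgroup",
                "tr", "td", "th", "html", "body", "head"}


class ParseError(Exception):
    """Raised where the description says 'Err'."""


def handle_end(stack, name):
    """Case 1: an element end arrives. Return the elements closed, in order."""
    closed = []
    while stack and stack[-1] != name:
        if stack[-1] not in OPTIONAL_END:
            raise ParseError("end tag for %r is not optional" % stack[-1])
        closed.append(stack.pop())
    if not stack:
        # Assumption: an end tag with no matching open element is an error.
        raise ParseError("no open element %r" % name)
    closed.append(stack.pop())  # the element the end tag actually names
    return closed


def handle_eof(stack):
    """Case 2: end of the data stream. Close everything that may be closed."""
    closed = []
    while stack:
        if stack[-1] not in OPTIONAL_END:
            raise ParseError("end tag for %r is not optional" % stack[-1])
        closed.append(stack.pop())
    return closed
```

For example, an end tag for 'ul' arriving with 'li' and 'p' still open closes 'p', then 'li', then 'ul'.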

3) Element start causes the end of the element that is on the top of the stack.
4) Element start causes another element start before itself.

a) Perform end tag inference repeatedly according to the lists given above until no inference can be made.

b) Perform the start tag inference once.

Repeat from a) until additional inference cannot be performed. Then let the original element start go through.
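Put together, the a)/b) loop for cases 3) and 4) might look like the sketch below. It repeats the two lookup rules in compact form so that it runs on its own. process_start, the state dictionary and the event tuples are all made up for this illustration.

```python
# End tag list in compact form: element -> start tags that close it.
CLOSERS = {
    "p": ("p h1 h2 h3 h4 h5 h6 ol ul pre dl div center noscript noframes "
          "blockquote form isindex hr table fieldset address").split(),
    "dt": ["dt", "dd"], "dd": ["dt", "dd"], "li": ["li"],
    "thead": ["tfoot", "tbody"], "tfoot": ["tbody"], "tbody": ["tbody"],
    "colgroup": ["colgroup", "thead", "tfoot", "tbody", "tr"],
    "tr": ["tr", "tfoot", "tbody"],
    "td": ["td", "th", "tr", "tfoot", "tbody"],
    "th": ["td", "th", "tr", "tfoot", "tbody"],
}
HEAD_CONTENT = "script style meta link object title isindex base".split()


def start_closes(top, start):
    if top == "head":  # closed by any start tag except head content
        return start not in HEAD_CONTENT
    return start in CLOSERS.get(top, [])


def infer_start(stack, start, head_seen):
    if not stack:
        return "html" if start != "html" else None
    if stack[-1] == "table" and start == "tr":
        return "tbody"
    if stack[-1] == "html":
        if not head_seen and start != "head":
            return "head"
        if head_seen and start != "body":
            return "body"
    return None


def process_start(stack, start, state):
    """Run steps a) and b) until nothing more can be inferred, then push `start`."""
    events = []
    while True:
        progressed = False
        # a) end tag inference, repeated until no inference can be made
        while stack and start_closes(stack[-1], start):
            events.append(("end", stack.pop()))
            progressed = True
        # b) start tag inference, performed once
        inferred = infer_start(stack, start, state["head_seen"])
        if inferred is not None:
            stack.append(inferred)
            state["head_seen"] = state["head_seen"] or inferred == "head"
            events.append(("start", inferred))
            progressed = True
        if not progressed:
            break
    # Let the original element start go through.
    stack.append(start)
    state["head_seen"] = state["head_seen"] or start == "head"
    events.append(("start", start))
    return events


if __name__ == "__main__":
    stack, state = [], {"head_seen": False}
    for name in ["p", "table", "tr", "td", "tr"]:
        print(name, process_start(stack, name, state))
```

In this sketch, feeding a bare 'p' into an empty stack reports the starts of 'html' and 'head', the end of 'head', and the start of 'body' before the 'p' itself goes through.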

Is this correct for *conforming* documents (i.e. without residual style, etc.)?


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
