On Wed, 08 Apr 2015 14:46:44 +0200, Mikko Rantalainen <mikko.rantalai...@peda.net> wrote:

Simon Pieters (2015-04-08 11:07 Europe/Helsinki):
On Wed, 08 Apr 2015 07:55:26 +0200, Mikko Rantalainen
<mikko.rantalai...@peda.net> wrote:
Section 12.2.3.3, "The list of active formatting elements"
(https://html.spec.whatwg.org/multipage/syntax.html#the-list-of-active-formatting-elements)
has steps to "reconstruct the active formatting elements". The steps
include

[...]
How should one deal with the case where `entry` points to a marker after
step 7? Obviously one cannot create a marker as an HTML element.

This case seems possible because only Step 6 checks for a marker, and
Step 7 then blindly advances through the list and may set `entry` to a marker.

(I'm asking this question because I hit this case while parsing user
input with the html5lib PHP implementation, and that implementation crashes
while trying to create an HTML element from a marker.)

What is the input that triggers this? I fail to come up with a list of
active formatting elements that makes the reconstruct algorithm have a
marker as entry in step 8.

A minimal test case that reproduces the problem is

<table><tr><td>
<p><b>1<span><div><a>2</a></div></span></b></p>
</td></tr></table>

Some of that may not be strictly required, but at least this test case causes a crash at https://github.com/PedaNet/html5lib/blob/a11001bb9fd27d8a54228eb7851564cf27c25d6d/php/library/HTML5/TreeBuilder.php#L3307 where $entry->cloneNode() is called and $entry actually contains self::MARKER instead of a DOMNode. The source code comments refer to the "steps to reconstruct the active formatting elements".

If no other parser implementation has issues with this source, I guess it's another bug in the html5lib PHP implementation that causes an extra marker in the list of active formatting elements.

I don't think that's the issue, since you have one marker and there should be one (for <td>). Skipping past the "advance" step could explain this situation. Looking at the code, it appears that $step_seven is not defined on the first iteration, so that step is skipped and $entry is still the marker when cloneNode() is called. Adding $step_seven = true; at the top of the function might fix this.
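To make the suspected failure mode concrete, here is a hypothetical, heavily simplified Python model. It is not the html5lib PHP code; it only mirrors the idea of emulating the spec's "go to" steps with a flag that gates step 7 (advance), as `$step_seven` appears to do. Entries are tag-name strings, `MARKER` is a stand-in sentinel for `self::MARKER`, and the function merely reports which entry the create step would receive first.

```python
# Hypothetical model only: not the actual html5lib PHP TreeBuilder code.
MARKER = object()  # stand-in for self::MARKER


def first_entry_to_clone(afe, stack, initialize_flag):
    """Return the entry the create step would receive first, or None.

    afe             -- list of active formatting elements (MARKER or tag names)
    stack           -- stack of open elements (tag names)
    initialize_flag -- True models the suggested fix ($step_seven = true)
    """
    if not afe:
        return None
    entry = afe[-1]
    if entry is MARKER or entry in stack:
        return None  # nothing to reconstruct

    # None models a variable that was never defined before it is tested.
    step_seven = True if initialize_flag else None
    i = len(afe) - 1
    while True:
        if i == 0:  # no earlier entry: jump straight to create, no advance
            step_seven = False
            break
        i -= 1
        entry = afe[i]
        if entry is MARKER or entry in stack:
            break  # stop rewinding; advance is supposed to run next

    if step_seven:  # skipped entirely while the flag is undefined
        i += 1
        entry = afe[i]
    return entry


afe = [MARKER, "b"]
stack = ["html", "body", "table", "tbody", "tr", "td", "div"]
buggy = first_entry_to_clone(afe, stack, initialize_flag=False)  # the marker
fixed = first_entry_to_clone(afe, stack, initialize_flag=True)   # "b"
```

With the flag left undefined, the entry handed to create is the marker itself, which matches the `cloneNode()` crash; initializing the flag makes the model advance to `b`.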

Could somebody explain the intended contents of the list of active formatting elements? Should that list ever contain multiple markers by design?

Sure, e.g. <object><object> will have two markers.
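A rough Python sketch of where multiple markers come from, assuming entries are tag-name strings and `MARKER` is a sentinel object. Only a few marker-inserting and formatting tags are modeled, and the spec's extra rules for pushing formatting elements (such as the Noah's Ark clause) are ignored.

```python
MARKER = object()  # sentinel standing in for a marker entry

# "in body" start tags that insert a marker (td, th, caption and template
# insert markers too, from other insertion modes; omitted here).
MARKER_TAGS = {"applet", "marquee", "object"}
# A small subset of the formatting elements, for illustration only.
FORMATTING_TAGS = {"a", "b", "i"}


def start_tag(tag, afe):
    """Record the effect of a start tag on the list of active formatting elements."""
    if tag in MARKER_TAGS:
        afe.append(MARKER)
    elif tag in FORMATTING_TAGS:
        afe.append(tag)


def clear_to_last_marker(afe):
    """Remove entries from the end, up to and including the last marker."""
    while afe:
        if afe.pop() is MARKER:
            break


afe = []
for tag in ["object", "b", "object", "i"]:
    start_tag(tag, afe)
print(afe.count(MARKER))  # 2, one marker per <object>
clear_to_last_marker(afe)  # what </object> does for the inner object
```

Each marker scopes the formatting elements opened after it, so the inner `</object>` only discards `i` and leaves the outer marker and `b` in place.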

In the case of crash, the list contains one marker followed by one DOM node.

OK. So I think the crash happens when seeing the <a>, but it's not a bug in the spec AFAICT. It also doesn't crash in Blink/WebKit/Gecko/Presto.



<table><tr><td><p><b>1<span>

This is straightforward.
SoOE (stack of open elements): html, body, table, tbody, tr, td, p, b, span
LoAFE (list of active formatting elements): marker (td), b


<table><tr><td><p><b>1<span><div>

"If the stack of open elements has a p element in button scope, then close a p element."
->
"Pop elements from the stack of open elements until a p element has been popped from the stack."

SoOE: html, body, table, tbody, tr, td, div
LoAFE: marker (td), b
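The transition from the previous state to this one can be checked with a small Python sketch, assuming tag-name strings for elements and a sentinel `MARKER`. It implements only the popping step quoted above; the full "close a p element" steps also generate implied end tags (except for p) and report a parse error if the current node is not a p, both omitted here.

```python
MARKER = object()  # sentinel standing in for the marker pushed for <td>

# State after "<table><tr><td><p><b>1<span>"
stack = ["html", "body", "table", "tbody", "tr", "td", "p", "b", "span"]
afe = [MARKER, "b"]


def close_p_element(stack):
    """Pop elements from the stack until a p element has been popped."""
    while stack.pop() != "p":
        pass


close_p_element(stack)  # triggered by <div>, since a p is in button scope
stack.append("div")     # then the div itself is inserted
```

Note that `b` leaves the stack but stays in the list of active formatting elements; that mismatch is exactly what the reconstruction at the next `<a>` repairs.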


<table><tr><td><p><b>1<span><div><a>

"Reconstruct the active formatting elements, if any."
->
"1. If there are no entries in the list of active formatting elements, then there is nothing to reconstruct; stop this algorithm."

There are two entries. Carry on.

"2. If the last (most recently added) entry in the list of active formatting elements is a marker, or if it is an element that is in the stack of open elements, then there is nothing to reconstruct; stop this algorithm."

It's not a marker, and it's not in the SoOE. Carry on.

"3. Let entry be the last (most recently added) element in the list of active formatting elements."

entry = b

"4. Rewind: If there are no entries before entry in the list of active formatting elements, then jump to the step labeled create."

There is an entry before. Carry on.

"5. Let entry be the entry one earlier than entry in the list of active formatting elements."

entry = marker

"6. If entry is neither a marker nor an element that is also in the stack of open elements, go to the step labeled rewind."

entry is marker. Carry on.

"7. Advance: Let entry be the element one later than entry in the list of active formatting elements."

entry = b

"8. Create: Insert an HTML element for the token for which the element entry was created, to obtain new element."

This creates a <b> element.

"9. Replace the entry for entry in the list with an entry for new element."

Carry on.

"10. If the entry for new element in the list of active formatting elements is not the last entry in the list, return to the step labeled advance."

It is the last entry. The algorithm stops here.
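The walkthrough above can be condensed into a small Python sketch of steps 1 to 10. It is an illustration under assumptions, not a parser: `Element` is a hypothetical stand-in that compares by identity, and "insert an HTML element" is modeled as creating a new `Element` with the same tag and pushing it onto the stack of open elements (no DOM, no attributes, no "appropriate place" logic).

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity comparison, like DOM nodes
class Element:
    tag: str


MARKER = object()  # sentinel standing in for a marker entry


def reconstruct_active_formatting_elements(afe, stack):
    """Sketch of "reconstruct the active formatting elements" (steps 1-10)."""
    # 1. Nothing to reconstruct if the list is empty.
    if not afe:
        return
    # 2. ...or if the last entry is a marker or still open.
    last = afe[-1]
    if last is MARKER or last in stack:
        return
    # 3. Start at the last entry.
    i = len(afe) - 1
    # 4.-6. Rewind until a marker or an open element is found...
    while i > 0:
        i -= 1
        entry = afe[i]
        if entry is MARKER or entry in stack:
            i += 1  # 7. ...then advance past it (never create from it).
            break
    # If the loop ran out (i == 0), step 4 jumps straight to create.
    while True:
        entry = afe[i]
        new_element = Element(entry.tag)  # 8. Create
        stack.append(new_element)         #    (modeled as a push)
        afe[i] = new_element              # 9. Replace the entry
        if i == len(afe) - 1:             # 10. Stop after the last entry
            return
        i += 1                            # 7. Advance


# The state at "<a>" in the trace above.
b = Element("b")
stack = [Element(t) for t in ("html", "body", "table", "tbody", "tr", "td", "div")]
afe = [MARKER, b]
reconstruct_active_formatting_elements(afe, stack)
```

Because rewinding stops only on a marker or an open element and step 7 always moves past it, every entry that reaches create lies after that stopping point, and step 2 guarantees none of them is a marker. On this input a single new `b` is created and replaces the old entry.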

HTH,
--
Simon Pieters
Opera Software
