https://bugzilla.wikimedia.org/show_bug.cgi?id=17486

--- Comment #31 from Daniel Friesen <[email protected]> ---
(In reply to comment #30)
> My comment was on topic simply because the malformed output is caused by
> incorrect specification about how distinct content elements can be safely
> embedded into each other.
> 
> And the whole topic is about this issue: the basic wiki syntax interacts very
> badly with the HTML (or XML) syntax based on *explicit* closure of tags (or
> wiki syntaxes). The current parsing rules contradict between each other, and
> we constantly have to find tricks to avoid these issues and incorrect output
> (which may parse as valid HTML5 but was in fact not the one intended and will
> be wrong XHTML5 anyway).

Specification and mixing custom WikiText syntaxes with HTML are irrelevant.
We're supposed to fail silently when bad WikiText is used and output valid HTML
even when given crap, not output malformed markup.

This WikiText:
* List item 1. <table class="wikitable">
<tr>
<td> Cell 1. </td>
</tr>
</table>
* List item 2.

Outputs this:
<ul><li> List item 1. <table class="wikitable">
</li></ul>
<tr>
<td> Cell 1. </td>
</tr>
</table>
<ul><li> List item 2.
</li></ul>

There's a </li></ul> right after the <table class="wikitable">. It closes the
list item while the table is still open and leaves the <tr> and <td> elements
outside of a table. That's invalid.
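
To make the mis-nesting concrete, here is a minimal sketch in Python that walks
the tags of the output above and reports the first closing tag that does not
match the innermost open element. NestingChecker and first_nesting_error are
hypothetical names used only for illustration; this is not how the MediaWiki
parser works, and it ignores void elements such as <br>.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Records the first close tag that does not match the innermost open tag.

    Illustrative only: void elements (e.g. <br>) are not special-cased.
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, innermost last
        self.error = None  # (closing_tag, innermost_open_tag) of first mismatch

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.error is not None:
            return  # only the first mismatch is reported
        if not self.stack or self.stack[-1] != tag:
            self.error = (tag, self.stack[-1] if self.stack else None)
        else:
            self.stack.pop()


def first_nesting_error(markup):
    """Return (closing_tag, open_tag) for the first mismatch, or None."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.error


# The parser output quoted above, copied verbatim.
first_output = """<ul><li> List item 1. <table class="wikitable">
</li></ul>
<tr>
<td> Cell 1. </td>
</tr>
</table>
<ul><li> List item 2.
</li></ul>"""

print(first_nesting_error(first_output))  # ('li', 'table')
```

The reported pair says that </li> arrived while <table> was still the innermost
open element, which is exactly the problem described above.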

This bug has nothing to do with integrating the WikiText list syntax and HTML
table markup. The fix for this issue is simply making sure that the garbage we
output for this invalid input is still well-formed markup.

Try inserting that garbage output back into a wiki page:
<ul><li> List item 1. <table class="wikitable">
</li></ul>
<tr>
<td> Cell 1. </td>
</tr>
</table>
<ul><li> List item 2.
</li></ul>

This is essentially the same garbage that the user gives us. But this time the
parser outputs:
<ul><li> List item 1. <table class="wikitable">
&lt;/li&gt;&lt;/ul&gt;
<tr>
<td> Cell 1. </td>
</tr>
</table>
<ul><li> List item 2.
</li></ul></li>
</ul>

While there is a minor validity issue in the fact that we have a string of text
inside of a <table> but outside of a cell -- fixing that would probably be a
separate bug -- the markup is otherwise still well-formed XML. Tags are
properly paired up, there are as many closing tags as opening tags for each
element, and they are closed in the correct order. When output into an XHTML5
page parsed with an XML parser, this will work and won't give you an XML parse
error.
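
The well-formedness claim can be checked with any XML parser. The following is
a minimal sketch using Python's standard xml.etree.ElementTree. The helper name
is_well_formed and the <root> wrapper element are assumptions added for the
example (an XML document needs a single root element); neither is part of
MediaWiki.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
    except ET.ParseError:
        return False
    return True


# Output for the original WikiText: </li> closes while <table> is still open.
first_output = """<ul><li> List item 1. <table class="wikitable">
</li></ul>
<tr>
<td> Cell 1. </td>
</tr>
</table>
<ul><li> List item 2.
</li></ul>"""

# Output when the markup above is fed back in: the stray tags are escaped.
second_output = """<ul><li> List item 1. <table class="wikitable">
&lt;/li&gt;&lt;/ul&gt;
<tr>
<td> Cell 1. </td>
</tr>
</table>
<ul><li> List item 2.
</li></ul></li>
</ul>"""

print(is_well_formed(first_output))   # False
print(is_well_formed(second_output))  # True
```

This only tests XML well-formedness. It says nothing about HTML validity, so
the text sitting directly inside <table> in the second output is not flagged.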
