On 07/25/2013 01:03 PM, Roan Kattouw wrote:
On Wed, Jul 24, 2013 at 2:49 PM, C. Scott Ananian
<canan...@wikimedia.org> wrote:
For what it's worth, both the DOM serialization-to-a-string and DOM
parsing-from-a-string are done with the domino package.  It has a
substantial test suite of its own (originally from
http://www.w3.org/html/wg/wiki/Testing I believe).  So although the above
is probably worth doing as a low-priority task, it's really a test of the
third-party library, not of Parsoid.  (Although, since I'm a co-maintainer
of domino, I'd be very interested in fixing any bugs which it did turn up.)

I didn't mean it as a test of Domino; I meant it as a test of Parsoid:
does it generate things that are then foster-parented out, or other
things that a compliant DOM parser won't round-trip? It's also a more
realistic test, because the way that Parsoid is actually used by VE in
practice is that it serializes its DOM, sends it over the wire to VE,
which then does things with it and gives an HTML string back, which is
then parsed through Domino. So even in normal operation, ignoring the
fact that VE runs stuff through the browser's DOM parser, Parsoid
itself already round-trips the HTML through Domino, effectively.

We use two different libraries for different things:

* html5 library for building a DOM from a tag soup
* domino for serializing DOM --> HTML and for parsing HTML --> DOM

When doing a WT2WT roundtrip test, there are 2 ways to do this:

1. wikitext --> tag soup --> DOM (in-memory tree) --> wikitext
2. wikitext --> tag soup --> DOM (in-memory tree) --> HTML (string) --> DOM --> wikitext

We currently do 1. in our wt2wt testing. If there are foster-parenting bugs in the HTML5 library, then they will get hidden if we use path 1. However, when using VE and serializing its result back to wikitext, we are effectively using path 2.
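To make the difference between the two pathways concrete, here is a toy sketch in Python. It is not Parsoid code: `Node`, `serialize`, `parse`, `path1`, `path2` and the `TABLE_CHILDREN` table are hypothetical. The parser handles only a tiny, well-formed, attribute-free subset of HTML with a heavily simplified version of the HTML5 foster-parenting rule, and `serialize()` stands in for the DOM --> wikitext step.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)


VOID = {"meta"}
# Heavily simplified stand-in for the HTML5 "in table" rules: the children each
# table-context element may hold. Any other start tag is foster-parented.
TABLE_CHILDREN = {"table": {"tbody", "tr"}, "tbody": {"tr"}, "tr": {"td", "th"}}


def serialize(node: Node) -> str:
    inner = "".join(serialize(c) for c in node.children)
    if node.tag == "#root":
        return inner
    if node.tag in VOID:
        return f"<{node.tag}>"
    return f"<{node.tag}>{inner}</{node.tag}>"


def parse(html: str) -> Node:
    """Toy parser for well-formed, attribute-free markup, with foster parenting."""
    root = Node("#root")
    stack = [root]
    for closing, tag in re.findall(r"<(/?)(\w+)>", html):
        if closing:
            while stack[-1].tag != tag:  # pop up to the matching element
                stack.pop()
            stack.pop()
            continue
        node = Node(tag)
        cur = stack[-1]
        tables = [k for k, n in enumerate(stack) if n.tag == "table"]
        if cur.tag in TABLE_CHILDREN and tag not in TABLE_CHILDREN[cur.tag] and tables:
            # Foster parenting: insert just before the innermost open <table>.
            table, parent = stack[tables[-1]], stack[tables[-1] - 1]
            pos = next(k for k, c in enumerate(parent.children) if c is table)
            parent.children.insert(pos, node)
        else:
            cur.children.append(node)
        if tag not in VOID:
            stack.append(node)
    return root


def path1(dom: Node) -> str:
    # In-memory DOM straight to output; the tree is never re-parsed.
    return serialize(dom)


def path2(dom: Node) -> str:
    # DOM -> HTML string -> re-parsed DOM -> output, as when VE sends HTML back.
    return serialize(parse(serialize(dom)))


# An in-memory DOM with a <meta> placed directly inside a <tr>.
dom = Node("#root", [Node("table", [Node("tbody", [Node("tr", [Node("meta"), Node("td")])])])])
print(path1(dom))  # <table><tbody><tr><meta><td></td></tr></tbody></table>
print(path2(dom))  # <meta><table><tbody><tr><td></td></tr></tbody></table>
```

Path 1 reports nothing unusual because the misplaced <meta> never leaves memory; path 2 shows it being fostered out in front of the table. A real RT test would diff the resulting wikitext rather than the HTML.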

And both Roan and Scott are correct. Pathway 2 would be a test of external libraries (HTML5 and Domino, not just Domino). And we did have bugs in the HTML5 parsing library we used (which I fixed based on reports from Roan), and then added test cases for them to the parser tests.

But if we use path 2 for all our RT testing on Wikipedia pages, other latent bugs with fostered content will show up.

Hope this clarifies the issue.

Subbu.


The foster parenting issues mostly arise in the wikitext->parsoid DOM
phase.  Basically, the wikitext is tokenized into an HTML tag soup and then
a customized version of the standard HTML parser is used to assemble the
soup into a DOM, mimicking the process by which a browser would parse the
tag soup emitted by the current PHP parser.  So the existing test suite
does expose these foster-parenting issues already.

Does it really? There were a number of foster-parenting issues a few
months ago where Parsoid inserted <meta> tags in places where they
can't be put (e.g. <tr>s), and no one in the Parsoid team seemed to
have noticed until I tracked down a few VE bugs to that problem.
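Since the <meta>-in-<tr> case only surfaced through VE, a complementary check that needs no string round-trip might help: walk the in-memory DOM and flag children of table-context elements that a compliant HTML5 parser would foster-parent out. A minimal Python sketch follows; `Node`, `find_fosterable` and the `NOT_FOSTERED` set are hypothetical and only approximate the spec's "in table" and "in row" insertion-mode rules. Tags in that set may still be restructured on reparse (for example, gaining an implied <tbody>), so this flags fostering only, not every non-round-trippable shape.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list[Node] = field(default_factory=list)


# Elements whose direct children are subject to foster parenting.
TABLE_CONTEXT = {"table", "thead", "tbody", "tfoot", "tr"}
# Simplified: start tags a compliant parser handles inside a table context
# without fostering them (it may add implied wrappers or close elements).
# Ignores text nodes, <form>, hidden <input>, and other special cases.
NOT_FOSTERED = {"caption", "colgroup", "col", "thead", "tbody", "tfoot",
                "tr", "td", "th", "table", "script", "style", "template"}


def find_fosterable(node: Node, path: tuple[str, ...] = ()) -> list[str]:
    """Return slash-separated paths of children that would be fostered out."""
    here = path + (node.tag,)
    found: list[str] = []
    for child in node.children:
        if node.tag in TABLE_CONTEXT and child.tag not in NOT_FOSTERED:
            found.append("/".join(here + (child.tag,)))
        found.extend(find_fosterable(child, here))
    return found


# A <meta> directly in a <tr> is flagged; one inside a <td> is fine.
dom = Node("table", [Node("tbody", [Node("tr", [Node("meta"), Node("td", [Node("meta")])])])])
print(find_fosterable(dom))  # ['table/tbody/tr/meta']
```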

Roan

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
