--- Comment #16 from ssas...@wikimedia.org ---
Parsoid converts wikitext to HTML and back. Tidy does not enter the picture at
all in the Parsoid pipeline.
There are two separate issues:
1. Parsoid uses the same set of PHP parser tests to make sure that the HTML it
generates doesn't break existing wikitext. In those tests it serializes the PHP
parser's HTML back to wikitext. The extra newline that the PHP parser emits
shows up as an extra newline in the wikitext: "a\n\n*b" instead of "a\n*b". So,
the extra newline is a "dirty diff". This obscures our regular testing because
we blacklist these failing tests and then try to compare changes in HTML
output. So, it is not entirely unworkable, but it requires greater diligence to
monitor regressions during development. Ideally, we wouldn't have these
failures. Alternatively, we have to fix Parsoid's HTML to wikitext conversion
to ignore the extra newlines when they come from the parser's insertion rather
than from source -- that means extra bookkeeping.
2. If you want Parsoid HTML to have the same CSS semantics as PHP HTML,
Parsoid would also have to start adding these newlines after <ul> and before
</ul>. These extra newline characters would then have to be accounted for in
the DSR calculations (DSR maps a range of wikitext source to a DOM subtree,
e.g. wikitext substring [3,9] generates <p>foobar</p> HTML), which is also
additional bookkeeping. DSR calculations have been carefully tweaked to account
for every single character of wikitext so that Parsoid can faithfully output
original wikitext on unedited portions of the HTML. So, Parsoid would have to
ignore the extra inserted newlines in its calculations. Without accounting for
them, Parsoid will generate different wikitext on conversion, which manifests
as "dirty diffs". Again, all of this extra bookkeeping can be done.
But, if we can avoid the extra newlines, the extra work can be avoided.
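
To make the bookkeeping in point 2 concrete, here is a toy sketch in Python. It
is not Parsoid code: Token, compute_dsr and reuse_source are hypothetical names
made up for illustration, and the token stream is an assumed simplification of
what a parser might produce for "a\n*b". It only shows why a newline that is
inserted by the parser, and is not present in the source, has to be skipped when
source offsets are computed.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token; not a Parsoid data structure."""
    text: str                 # characters the token contributes to the output
    synthetic: bool = False   # True if inserted by the parser, absent from source


def compute_dsr(tokens, ignore_synthetic):
    """Return a (start, end) source range for every non-synthetic token."""
    ranges = []
    offset = 0
    for tok in tokens:
        if tok.synthetic:
            if not ignore_synthetic:
                # Mistake being illustrated: counting characters that
                # never appeared in the wikitext source.
                offset += len(tok.text)
            continue
        ranges.append((offset, offset + len(tok.text)))
        offset += len(tok.text)
    return ranges


def reuse_source(source, ranges):
    """Rebuild wikitext for unedited content by slicing the original source."""
    return "".join(source[start:end] for start, end in ranges)


source = "a\n*b"
# Assumed token stream: the third token is the extra, parser-inserted newline.
tokens = [Token("a"), Token("\n"), Token("\n", synthetic=True), Token("*b")]

good = compute_dsr(tokens, ignore_synthetic=True)
bad = compute_dsr(tokens, ignore_synthetic=False)

print(repr(reuse_source(source, good)))
print(repr(reuse_source(source, bad)))
```

With the inserted newline ignored, the ranges line up with the source and the
original wikitext comes back unchanged. With it counted, the range for the list
item is shifted by one character and the reconstructed wikitext no longer
matches the source, which is the kind of "dirty diff" described above.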