Chad wrote:
> 
> To elaborate on the final point. Sometimes the parser is changed and
> it breaks output on purpose. Case in point was when Tim rewrote the
> preprocessor. Some parts of syntax were intentionally changed. You'd
> have to establish a new baseline for this new behavior at that point.
> 
> This also comes down to the fact that we don't have a formal grammar
> for wikisyntax (basically it's whatever the Parser says it is at any given
> time). This makes testing the parser hard--we can only give it input and
> expected output, there's no standard to check against.
> 
> Finally, I don't think we need to dump all of enwiki. It can't require that
> much content to describe the various combinations of wiki syntax...

In principle, I rather like the idea of using the entire English 
Wikipedia (or why limit to that? we have plenty of other projects too) 
as a parser test, or at least of having the ability to do that if we want.

You see, the flip side to not having a formal grammar for wikimarkup is 
that we also don't have a spec sheet for it: the best description of how 
people actually expect the parser to behave and what features they 
expect it to support is what they're actually using it for on their 
wikis.  And en.wikipedia is the biggest and ugliest of the bunch.

There's no way we can ever write a test suite comprehensive enough to 
cover every single feature, bug, quirk and coincidence that actual wiki 
pages and templates may have come to rely on.  That's simply because for 
every MediaWiki coder there are dozens or hundreds of template writers 
and thousands of other editors.

In a way, all those editors form the biggest, most thorough fuzz tester 
there can be.  The only problem is that it's also a rather inefficient 
one, even for a fuzz tester: most wiki pages exercise only a fairly 
small and boring set of parser features.  But at least, if one were to, 
say, run a random sample of a few thousand Wikipedia pages through the 
parser and observe no unexpected changes in the output, one could start 
to make some statistical predictions about how many of the remaining 
pages one could at worst expect to break.
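To make that concrete, here's a rough sketch of the kind of bound one could derive -- the classic "rule of three" for zero observed failures. The page counts are made up purely for illustration:

```python
# Rough sketch: if a uniform random sample of n pages shows zero
# unexpected diffs, then with 95% confidence the true breakage rate
# is below roughly 3/n (the "rule of three"). All numbers illustrative.

def worst_case_breakage(sample_size, total_pages, confidence=0.95):
    """Upper bound on how many pages could break, given zero failures
    observed in a uniform random sample of sample_size pages."""
    # Solve (1 - p)**sample_size <= 1 - confidence for p.
    p = 1.0 - (1.0 - confidence) ** (1.0 / sample_size)
    return p * total_pages

# e.g. 3000 clean samples against a hypothetical 3 million pages:
bound = worst_case_breakage(3000, 3_000_000)   # roughly 3000 pages
```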

The real problem, as noted elsewhere in the thread, is of course 
filtering the unexpected changes from any expected ones.  A partial 
solution could be having the test implementation extract the changes -- 
we conveniently have a word-level diff implementation available already 
-- and combining any duplicates.
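For what it's worth, that extract-and-combine step might look something like this -- a Python sketch using the standard library's difflib as a stand-in for MediaWiki's word-level diff; none of these names are real MediaWiki APIs:

```python
# Sketch: pull out the differing word-level hunks from two parser
# outputs and tally identical changes, so one systematic difference
# shows up once (with a count) rather than once per page.
# difflib stands in for MediaWiki's word-level diff; no real APIs here.
import difflib
from collections import Counter

def changed_hunks(old_out, new_out):
    """Yield (old_words, new_words) for each region that differs."""
    old, new = old_out.split(), new_out.split()
    matcher = difflib.SequenceMatcher(a=old, b=new)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            yield (' '.join(old[i1:i2]), ' '.join(new[j1:j2]))

tally = Counter()
pages = [("a b c", "a X c"),   # the same substitution on two pages...
         ("a b d", "a X d")]
for old_out, new_out in pages:
    tally.update(changed_hunks(old_out, new_out))
# ...collapses into a single tallied entry: ("b", "X") with count 2
```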

Another, complementary approach would be to allow the person running the 
tests to postprocess the two outputs before they're compared, so as to 
try and eliminate any expected differences.  Of course, this would 
require some significant extra effort on the part of that person, beyond 
just typing "php runSomeTests.php" and hitting enter, but then again, 
thoroughly analyzing the effects of a major parser change is a nontrivial 
exercise anyway, no matter what.  And for things that _shouldn't_ cause 
any changes to the parser output, it really could be just as easy, in 
principle at least, as running parserTests currently is.
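The postprocessing hook could be as simple as a caller-supplied normalization function applied to both outputs before comparing. Again just a Python sketch, with whitespace collapsing standing in for whatever differences the tester considers expected:

```python
# Sketch: compare two parser outputs only after a user-supplied
# postprocessing step has stripped out the expected differences.
import re

def outputs_match(old_html, new_html, normalize=lambda s: s):
    """True if the two outputs agree once both are normalized."""
    return normalize(old_html) == normalize(new_html)

# Example normalizer: treat whitespace-only changes as expected.
collapse_ws = lambda s: re.sub(r'\s+', ' ', s).strip()

raw_match = outputs_match("<p>a  b</p>", "<p>a b</p>")               # False
norm_match = outputs_match("<p>a  b</p>", "<p>a b</p>", collapse_ws)  # True
```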

-- 
Ilmari Karonen

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l