I am investigating how to write a comprehensive parser regression test. By this 
I mean something you wouldn't normally run frequently, but rather something we 
could use to get past the "known-to-fail" tests that are currently disabled. 
The problem is that no one understands the parser well enough to be confident 
that fixing one of these tests will not break something else.

So, I thought, how about using the guts of DumpHTML to create a comprehensive 
parser regression test. The idea is to have two versions of phase3 + 
extensions, one without the change you make to the parser to fix a 
known-to-fail test (call this Base) and one with the change (call this 
Current). Modify DumpHTML to first visit a page through Base, saving the HTML, 
then visit the same page through Current and compare the two results. Do this 
for every page in the database. If the only differences are the ones the fix is 
meant to produce, the change in Current works without breaking anything else.

Sitting here I can see the eyeballs of various developers bulging from their 
faces. "What?" they say. "If you ran this test on, for example, Wikipedia, it 
could take days to complete." Well, that is one of the things I want to find 
out. The key to making this test useful is getting the code in the loop 
(rendering the page twice and testing the results for equality) to run very 
efficiently. I may not have the skills to do this, but I can at least establish 
an upper bound on the time it would take to run such a test.
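One way to get such a bound is to time the render-twice-and-compare step on a 
sample of pages and extrapolate. The Python sketch below assumes a hypothetical 
`render_and_compare(title)` callable wrapping both renderings and the comparison, 
and takes the slowest sampled page as the per-page cost. That gives a conservative 
estimate rather than a guaranteed bound, since unsampled pages could be slower 
still. The `clock` parameter exists only so the timing can be made deterministic.

```python
import time
from typing import Callable, Iterable


def estimate_total_seconds(sample_titles: Iterable[str],
                           render_and_compare: Callable[[str], object],
                           total_pages: int,
                           clock: Callable[[], float] = time.perf_counter) -> float:
    """Extrapolate total run time from the slowest page in a timed sample."""
    worst = 0.0
    timed = 0
    for title in sample_titles:
        start = clock()
        render_and_compare(title)  # hypothetical: render via Base and Current, then compare
        worst = max(worst, clock() - start)
        timed += 1
    if timed == 0:
        raise ValueError("need at least one sample page")
    # Conservative assumption: every page is as slow as the slowest sampled page.
    return worst * total_pages
```

With purely hypothetical numbers, a slowest sampled page of 0.5 s and 2,000,000 
pages would give 1,000,000 s, or roughly 11.6 days.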

A comprehensive parser regression test would be valuable for:

* fixing the known-to-fail tests.
* testing any new parser that some courageous developer decides to write.
* testing major releases before they ship.
* catching bugs that aren't found by the current parserTest tests.
* other things I haven't thought of.

Of course, you wouldn't run this thing nightly or, perhaps, even weekly. Maybe 
once a month would be enough to ensure the parser hasn't regressed out of sight.


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l