On Wed, Aug 12, 2009 at 4:48 PM, dan nessett<[email protected]> wrote:
> --- On Wed, 8/12/09, Roan Kattouw <[email protected]> wrote:
>
>> I read this paragraph first, then read the paragraph above
>> and
>> couldn't help saying "WHAT?!?". Using a huge set of pages
>> is a poor
>> replacement for decent tests.
>
> I am not proposing that the CPRT be a substitute for "decent tests." We still
> need a good set of tests for the whole MW product (not just the parser).
> Nor would I recommend making a change to the parser and then immediately 
> running the CPRT. Any developer that isn't masochistic would first run the 
> existing parserTests and ensure it passes. Then, you probably want to run the 
> modified DumpHTML against a small random selection of pages in the WP DB. 
> Only if it passes those tests would you then run the CPRT for final assurance.
>
> The CPRT I am proposing is about as good a test of the parser as I can
> think of. If a change to the parser passes it using the Wikipedia database 
> (currently 5 GB), then I would say for all practical purposes the changes 
> made to the parser do not regress it.
>
>> Also, how would you handle
>> intentional
>> changes to the parser output, especially when they're
>> non-trivial?
>
> I don't understand this point. Would you elaborate?
>
> Dan
>
>
>
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>

To elaborate on the final point: sometimes the parser is changed and
its output changes on purpose. A case in point was when Tim rewrote the
preprocessor; some parts of the syntax were intentionally changed. You'd
have to establish a new baseline for the new behavior at that point.

This also comes down to the fact that we don't have a formal grammar
for wikisyntax (basically it's whatever the Parser says it is at any given
time). This makes testing the parser hard: we can only give it input and
compare against expected output; there's no standard to check against.
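
As a rough sketch of what "input and expected output" testing with a
re-baselining step could look like: the Python below is illustrative only.
All names in it are made up, and old_render / new_render are toy stand-ins,
not MediaWiki's parser or the proposed CPRT.

```python
import hashlib
from typing import Callable, Dict, List

Renderer = Callable[[str], str]


def digest(html: str) -> str:
    """Hash rendered output so the baseline stays small."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def build_baseline(render: Renderer, pages: Dict[str, str]) -> Dict[str, str]:
    """Record the current output of `render` for every page."""
    return {title: digest(render(text)) for title, text in pages.items()}


def find_changes(render: Renderer, pages: Dict[str, str],
                 baseline: Dict[str, str]) -> List[str]:
    """Return titles whose output no longer matches the baseline."""
    return sorted(title for title, text in pages.items()
                  if baseline.get(title) != digest(render(text)))


# Toy renderers (hypothetical): the "new" one intentionally changes
# how a line of four dashes is rendered.
def old_render(text: str) -> str:
    return "<p>" + text + "</p>"


def new_render(text: str) -> str:
    return "<hr />" if text == "----" else "<p>" + text + "</p>"


pages = {"Plain": "hello", "Rule": "----"}

baseline = build_baseline(old_render, pages)
changed = find_changes(new_render, pages, baseline)

# If the change to "Rule" is intentional, accept it by re-baselining.
baseline = build_baseline(new_render, pages)
changed_after = find_changes(new_render, pages, baseline)
```

Note that the comparison can only say that output changed, not whether the
change is correct; someone still has to review the changed titles before
accepting a new baseline.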

Finally, I don't think we need to dump all of enwiki. It can't require that
much content to describe the various combinations of wiki syntax...

-Chad
