Say, while everybody's trying to figure out a formal grammar, have you had a 
look at Ward Cunningham's exploratory parsing kit? He gave me a demo at 
OSBridge, and it's a really handy tool. Basically, it's a web app with an 
asynchronous C backend. You paste a tentative PEG grammar into a textarea, and 
it runs through whatever corpus you want, showing you representative instances 
of how it does or does not match. He was running it against the full English 
Wikipedia on his laptop, and it took only half an hour or so, with results 
coming in as they were generated, of course.
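
To make that workflow concrete, here is a minimal sketch in Python. It is not 
Ward's code: a regular expression stands in for the tentative PEG rule, and the 
names explore, link_rule, and sample_corpus are made up for illustration. It 
only shows the idea of streaming a few representative matches and misses while 
the corpus is still being scanned.

```python
import re
from typing import Iterable, Iterator, Tuple


def explore(pattern: str, corpus: Iterable[str],
            max_samples: int = 3) -> Iterator[Tuple[str, str]]:
    """Yield ("match" | "miss", line) pairs as soon as they are found.

    At most max_samples representatives of each outcome are reported.
    Assumption: a regex stands in for a real PEG rule here.
    """
    rule = re.compile(pattern)
    shown = {"match": 0, "miss": 0}
    for line in corpus:
        outcome = "match" if rule.fullmatch(line) else "miss"
        if shown[outcome] < max_samples:
            shown[outcome] += 1
            yield outcome, line  # reported immediately, not at the end
        if all(count >= max_samples for count in shown.values()):
            break  # enough representatives of both outcomes


# Hypothetical corpus and rule: a rough guess at an internal wiki link.
sample_corpus = [
    "[[Main Page]]",
    "[[Help:Contents|help]]",
    "[[unclosed link",
    "plain text",
    "[[Another]]",
]
link_rule = r"\[\[[^\[\]|]+(\|[^\[\]]+)?\]\]"

results = list(explore(link_rule, sample_corpus, max_samples=2))
for outcome, line in results:
    print(f"{outcome:5} {line}")
```

Because explore is a generator, a caller can show each result the moment it is 
produced, which is the property that makes a half-hour run over a large corpus 
tolerable.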

Using that, they made a PEG-and-then-some implementation of MediaWiki (MW) 
syntax that parses darn near all of Wikipedia: 
https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg. (I call it 
"PEG-and-then-some" because it has a lot of callbacks which might interlock 
with and affect the rule matching; I'm not sure.)
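
For readers wondering how a callback could affect rule matching at all, here 
is a small, hedged sketch in Python. It is not taken from syntax.leg and makes 
no claim about what kiwi actually does. The combinators literal, sequence, 
choice, and predicate, and the state dictionary, are all hypothetical. The 
point is only that a rule which consults outside state can make the same input 
parse differently.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def literal(s: str) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def sequence(*parts: Parser) -> Parser:
    def parse(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for part in parts:
            current = part(text, current)
            if current is None:
                return None
        return current
    return parse


def choice(*alternatives: Parser) -> Parser:
    # PEG-style ordered choice: the first alternative that succeeds wins.
    def parse(text: str, pos: int) -> Optional[int]:
        for alternative in alternatives:
            end = alternative(text, pos)
            if end is not None:
                return end
        return None
    return parse


def predicate(check: Callable[[], bool]) -> Parser:
    # Consumes no input; succeeds only if the callback returns True.
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if check() else None
    return parse


# Hypothetical parser state that a callback might set elsewhere.
state = {"in_table": False}

row = choice(
    sequence(predicate(lambda: state["in_table"]), literal("|-")),
    literal("|"),
)

outside = row("|-", 0)    # predicate fails, so only "|" is consumed
state["in_table"] = True
inside = row("|-", 0)     # predicate passes, so "|-" is consumed
print(outside, inside)
```

With grammars written this way, the rules alone no longer determine the 
language, which is why such an implementation is more than a plain PEG.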

Cheers,
Erik
_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
