On Wed, Jun 29, 2011 at 4:14 PM, Peter17 <[email protected]> wrote:
> I have been working as a student on the 2011 edition of the Google
> Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
> My mentor is Erik Rose.
>
> For this purpose, we use a Python PEG parser called Pijnu [2] and
> implement a grammar for it [3]. This way, we parse the wikitext into
> an abstract syntax tree that we will then transform to HTML or other
> formats.
>
> One of the advantages of Pijnu is the simplicity and readability of
> the grammar definition [3]. It is not finished yet, but what we have
> done so far seems very promising.

Neat! Your life is definitely made easier by skipping full compatibility with some of our freakier syntax oddities ;) which'll still be very handy for various embedded-style "lite wiki" usages.

Great list of alternatives, libraries & algorithms in your notes too, though obviously mostly Python-oriented; looks like you've already looked at PediaPress's mwlib library, which is also Python-based. It's definitely a bit... hairier due to having to handle more of our funky syntax (it drives the PDF download and print-on-demand system on Wikipedia).

I'm still looking around for good parser generator tools for PHP (we've been fiddling with PEG.js in some of our JavaScript-side experiments so far, but will eventually need both JS and PHP implementations to cover editing tools and actual back-end rendering), so if anybody stumbles on good existing ones, give a shout, or we may have to roll some of our own.

Bonus points if we can eventually share the formal grammar production rules between multiple language implementations. :)

-- brion
