Re: [Wikitext-l] MediaWiki parser in Python

Brion Vibber Wed, 29 Jun 2011 16:48:47 -0700

On Wed, Jun 29, 2011 at 4:14 PM, Peter17 <[email protected]> wrote:

> I have been working as a student on the 2011 edition of the Google
> Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
> My mentor is Erik Rose.
>
> For this purpose, we use a Python PEG parser called Pijnu [2] and
> implement a grammar for it [3]. This way, we parse the wikitext into
> an abstract syntax tree that we will then transform to HTML or other
> formats.
>
> One of the advantages of Pijnu is the simplicity and readability of
> the grammar definition [3]. It is not finished yet, but what we have
> done so far seems very promising.
>
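
To make the pipeline in the quoted message concrete (wikitext -> abstract
syntax tree -> HTML), here is a minimal hand-written sketch. It does not use
Pijnu or its grammar format; the function names parse_inline and to_html, the
tuple-based node layout, and the restriction to bold/italic markup are all
assumptions made for illustration. Unclosed markers are simply treated as
closed at end of input, which is not how MediaWiki itself behaves.

```python
from html import escape

# Hypothetical mapping from AST node kinds to HTML tags.
TAGS = {"bold": "b", "italic": "i"}


def marker_at(src, pos):
    """Return the quote marker starting at pos, longest first, or None."""
    for marker in ("'''", "''"):
        if src.startswith(marker, pos):
            return marker
    return None


def parse_inline(src, pos=0, stop=None):
    """Parse src from pos until the `stop` marker or end of input.

    Returns (nodes, new_pos); each node is ("text", str) or (kind, children).
    """
    nodes = []
    while pos < len(src):
        marker = marker_at(src, pos)
        if marker is not None and marker == stop:
            # Closing marker for the element we are currently inside.
            return nodes, pos + len(marker)
        if marker is not None:
            # Opening marker: recurse to collect the children.
            kind = "bold" if marker == "'''" else "italic"
            children, pos = parse_inline(src, pos + len(marker), marker)
            nodes.append((kind, children))
            continue
        # Plain character: merge into the previous text node if there is one.
        if nodes and nodes[-1][0] == "text":
            nodes[-1] = ("text", nodes[-1][1] + src[pos])
        else:
            nodes.append(("text", src[pos]))
        pos += 1
    return nodes, pos


def to_html(nodes):
    """Render the AST produced by parse_inline as an HTML string."""
    out = []
    for kind, value in nodes:
        if kind == "text":
            out.append(escape(value, quote=False))
        else:
            tag = TAGS[kind]
            out.append(f"<{tag}>{to_html(value)}</{tag}>")
    return "".join(out)


if __name__ == "__main__":
    ast, _ = parse_inline("Hello '''world''' & ''friends''")
    print(ast)
    print(to_html(ast))
```

Keeping the tree separate from the renderer is what would let the same parse
feed HTML or other output formats, as the quoted message describes.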


Neat! Your life is definitely made easier by skipping full compatibility
with some of our freakier syntax oddities ;) and the result will still be
very handy for various embedded-style "lite wiki" usages.

Great list of alternatives, libraries & algorithms in your notes too, though
obviously mostly Python-oriented; looks like you've already looked at
PediaPress's mwlib library, which is also Python-based. It's definitely a
bit... hairier due to having to handle more of our funky syntax (it drives
the PDF download and print-on-demand system on Wikipedia).

I'm still looking around for good parser generator tools for PHP (we've been
fiddling with PEG.js in some of our JavaScript-side experiments so far, but
will eventually need both JS and PHP implementations to cover editing tools
and actual back-end rendering), so if anybody stumbles on good existing ones,
give a shout, or we may have to roll some of our own.

Bonus points if we can eventually share the formal grammar production rules
between multiple language implementations. :)

-- brion

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
