Hello Andreas, I am interested in your project.
But I cannot download the source. Could you send it to me via mail (mingli.yuan AT gmail.com)? Thanks a lot.

Regards,
Mingli

On Wed, Aug 4, 2010 at 6:10 AM, Andreas Jonsson <[email protected]> wrote:
> Hello,
>
> I am initiating yet another attempt at writing a new parser for
> MediaWiki. It seems that more than six months have passed since the
> last attempt, so it's about time. :)
>
> Parser functions, magic words, and HTML comments are better handled by
> a preprocessor than by trying to integrate them with the parser (at
> least if you want to preserve the current behavior). So I am only
> aiming at implementing something that can be plugged in after the
> preprocessing stages.
>
> In the wikimodel project (http://code.google.com/p/wikimodel/) we are
> using a parser design that works well for wiki syntax: a front end
> (implemented using an LL-parser generator) scans the text and feeds
> events to a context object, which can be queried by the front end to
> enable context-sensitive parsing. The context object will in turn
> feed a well-formed sequence of events to a listener that may build a
> tree structure, generate XML, or any other format.
>
> As for parser generators, ANTLR seems to be the best choice. It has
> support for semantic predicates and rather sophisticated options for
> backtracking. I'm peeking at Steve Bennett's ANTLR grammar
> (http://www.mediawiki.org/wiki/Markup_spec/ANTLR), but I cannot really
> use that one, since the parsing algorithm is fundamentally different.
>
> There are two problems with ANTLR:
>
> 1. No PHP back end.
>
>    Writing a PHP back end for ANTLR is a matter of providing a set of
>    templates and porting the runtime. It's a lot of work, but seems
>    fairly straightforward.
>
>    The parser can, of course, be written in C and deployed as a PHP
>    extension. The drawback is that it would be harder to deploy,
>    while the advantage is performance.
>    For MediaWiki it might be worth maintaining both a PHP and a C
>    version, though, since both speed and deployability are important.
>
> 2. No UTF-8 support in the C runtime in the latest release of ANTLR.
>
>    Trunk has support for various character encodings, though, so it
>    will probably be there in the next release.
>
> My implementation is just at the beginning stages, but I have
> successfully reproduced the exact behavior of MediaWiki's parsing of
> apostrophes, which seems to be by far the hardest part. :)
>
> I put it up right here if anyone is interested in looking at it:
>
> http://kreablo.se:8080/x/bin/download/Gob/libmwparser/libwikimodel%2D0.1.tar.gz
>
> Best regards,
>
> Andreas Jonsson
>
> _______________________________________________
> Wikitext-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
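The event-driven design Andreas describes (front end scans text, feeds events to a context object it can also query, and the context forwards a well-formed event sequence to a listener) can be sketched roughly as follows. This is a minimal illustration only, not the actual wikimodel API: all class and method names here (Listener, Context, text(), blankLine()) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// The listener receives a well-formed event sequence and may build a
// tree, generate XML, or produce any other format.
interface Listener {
    void beginParagraph();
    void endParagraph();
    void onText(String text);
}

// The context tracks open blocks so the front end can query parsing
// state, and it guarantees the listener sees balanced begin/end events.
class Context {
    private final Listener listener;
    private final Deque<String> openBlocks = new ArrayDeque<>();

    Context(Listener listener) { this.listener = listener; }

    // Query method the front end can call for context-sensitive parsing.
    boolean inParagraph() { return "p".equals(openBlocks.peek()); }

    void text(String s) {
        if (!inParagraph()) {          // open a paragraph implicitly
            openBlocks.push("p");
            listener.beginParagraph();
        }
        listener.onText(s);
    }

    void blankLine() {                 // a blank line closes the paragraph
        if (inParagraph()) {
            openBlocks.pop();
            listener.endParagraph();
        }
    }

    void endOfInput() {                // close anything still open
        while (!openBlocks.isEmpty()) {
            openBlocks.pop();
            listener.endParagraph();
        }
    }
}

public class EventSketch {
    static String demo() {
        final StringBuilder out = new StringBuilder();
        Listener xml = new Listener() { // a listener that generates XML
            public void beginParagraph() { out.append("<p>"); }
            public void endParagraph()   { out.append("</p>"); }
            public void onText(String t) { out.append(t); }
        };
        Context ctx = new Context(xml);
        // The front end (simulated here) feeds raw events as it scans.
        ctx.text("first paragraph");
        ctx.blankLine();
        ctx.text("second paragraph");
        ctx.endOfInput();
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // <p>first paragraph</p><p>second paragraph</p>
    }
}
```

The point of routing everything through the context is that listeners stay simple: they can rely on every beginParagraph having a matching endParagraph, even when the input itself is unbalanced or malformed.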
