Hi,

On Mon, Oct 25, 2010 at 10:27 AM, Ista Pouss <[email protected]> wrote:
> There is no official spec of the markup language. There are some
> parsers... I find "Wiki2HtmlJavaProgram"
> (http://community.jboss.org/wiki/Wiki2HtmlJavaProgram) and "jwpl"
> (http://code.google.com/p/jwpl/). Perhaps it's best to start from
> scratch with antlr?

Note that since the MediaWiki markup is practically plain text with
some structural formatting rules, you can get pretty far with Tika's
normal plain text parser unless you really need the structural
information.

Or, if you already have the markup of a wiki page available as a
string or a character stream (for example, if you're accessing the
underlying database or JSON exports directly), there may be no need
to involve Tika in the process at all.

BR,

Jukka Zitting
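
PS: A rough sketch of that second case, in case it helps. It assumes the markup is already available as a java.io.Reader and uses only the JDK, no Tika. The class name WikiTextSketch and both method names are made up for illustration, and the three regex rules (links, bold/italic quotes, headings) are my own simplification covering only a small part of the MediaWiki syntax; templates, tables, and nested constructs are left untouched.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Tika or MediaWiki.
final class WikiTextSketch {

    // Drain a character stream into a String.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Crude cleanup of a few common constructs; everything else is
    // passed through as plain text.
    static String stripBasicMarkup(String markup) {
        return markup
            // [[target|label]] -> label, [[target]] -> target
            .replaceAll("\\[\\[(?:[^\\]|]*\\|)?([^\\]]*)\\]\\]", "$1")
            // ''italic'', '''bold''', '''''both''''' -> bare text
            .replaceAll("'{2,5}", "")
            // == Heading == -> Heading (one heading per line)
            .replaceAll("(?m)^=+[ \\t]*(.*?)[ \\t]*=+[ \\t]*$", "$1");
    }

    public static void main(String[] args) {
        Reader source = new StringReader(
            "== Intro ==\n'''Tika''' parses [[Plain text|text]] and [[HTML]].");
        System.out.println(stripBasicMarkup(readAll(source)));
    }
}
```

If even this much cleanup is unnecessary for your use case, readAll alone is enough to get the raw markup as text.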
