2010/10/25 Paul Jakubik <[email protected]>:
>
> If your goal is to index or perform any kind of text analysis of mediawiki
> pages, I understand why you want to parse the page since the markup tends to
> mess up text analysis.

My goal is to perform some text analysis and some structure analysis on the pages. It's for text search, and also to reformat these pages, something like a MediaWiki bot. It's also to write some Wikipedia pages, but I understand that is outside the Tika scope. I can't download all Wikipedia pages: I need to work with the MediaWiki API. If I understand correctly, there are two big difficulties: no MIME type, and a complicated markup. I'm going to sit down and think a little bit. Thanks.
