2010/10/25 Paul Jakubik <[email protected]>:
>
> If your goal is to index or perform any kind of text analysis of mediawiki
> pages, I understand why you want to parse the page since the markup tends to
> mess up text analysis.

My goal is to perform some text analysis and some structural analysis
on the pages.

It's for text search, and also to reformat these pages, something
like a MediaWiki bot.

It's also to write some Wikipedia pages, but I understand that's outside
the Tika scope.

I can't download all Wikipedia pages: I need to work with the MediaWiki API.

If I understand correctly, there are two big difficulties: no MIME type,
and complicated markup. I'm going to sit down and think about it a little.

Thanks.
