Re: Tika for mediawiki ?

Ista Pouss Sun, 24 Oct 2010 06:33:36 -0700

2010/10/24 Jukka Zitting <[email protected]>:
>
> No. MediaWiki uses a database backend instead of a special file format
> for storing data, so you'd need to use something like the ManifoldCF
> (http://incubator.apache.org/connectors/) to extract information from
> a MediaWiki installation.
>


Yes, but it's also possible to use the MediaWiki API
(http://www.mediawiki.org/wiki/API) and read the results in JSON, YAML,
XML, etc. It's also possible to read the wikitext source of a single
page (for example, http://en.wikipedia.org/wiki/Lucene?action=raw
returns the MediaWiki source of the Lucene page). Is it possible to
build an extractor on top of that, or is it better to use ManifoldCF?
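
To make the second option concrete, here is a rough sketch in Python of
fetching a page's wikitext through the ?action=raw URL mentioned above.
The function names, the User-Agent string, and the timeout are my own
assumptions for illustration; nothing here is part of Tika or
ManifoldCF, and a real extractor would still have to parse the wikitext
it gets back.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def raw_page_url(base, title):
    """Build the URL that returns a page's raw wikitext.

    `base` is the wiki's article path, e.g. http://en.wikipedia.org/wiki
    (hypothetical helper, not an official client).
    """
    # MediaWiki uses underscores for spaces in page titles.
    return f"{base}/{quote(title.replace(' ', '_'))}?action=raw"


def fetch_wikitext(base, title, timeout=10):
    """Download the raw wikitext of one page as a string."""
    # The User-Agent value is an assumed placeholder; some wikis
    # reject requests that do not identify the client.
    req = Request(
        raw_page_url(base, title),
        headers={"User-Agent": "mediawiki-extract-sketch/0.1"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Needs network access; prints the start of the wikitext.
    print(fetch_wikitext("http://en.wikipedia.org/wiki", "Lucene")[:200])
```

The text returned this way could then be handed to whatever parser does
the actual extraction.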

Thanks.
