SearchEngine subclasses can implement getTextFromContent() if they want to override the normal text fetching behavior.

I can't put it into a SearchEngine subclass because Tika isn't a search engine. Rather, it's a Java application that runs separately and extracts text from binary files such as *.doc, *.pdf and so on.

TikaMW is a plugin that should work with any search engine: it just modifies the indexed text for pages in the File: namespace.
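To illustrate why this design is engine-agnostic, here is a minimal sketch in Python. MediaWiki itself is PHP, and this is not TikaMW's actual code. It assumes a hook that a search backend calls on a page's text before indexing it. The function `make_index_text_hook`, its arguments, and the choice to append the extracted text rather than replace the page text are all hypothetical. `extract` stands in for a call to the separately running Tika application. The only real MediaWiki detail is that the File: namespace has number 6 (`NS_FILE`).

```python
from typing import Callable, Optional

FILE_NAMESPACE = 6  # MediaWiki's NS_FILE constant

def make_index_text_hook(
    extract: Callable[[str], Optional[str]],
) -> Callable[[int, str, str], str]:
    """Build a hypothetical pre-indexing hook that any search backend could call.

    `extract` is assumed to send the file to the external Tika process and
    return plain text, or None if nothing could be extracted.
    """
    def hook(namespace: int, file_path: str, page_text: str) -> str:
        if namespace != FILE_NAMESPACE:
            return page_text  # non-file pages are indexed unchanged
        extracted = extract(file_path)
        if not extracted:
            return page_text  # extraction failed: keep the normal text
        # Assumption: append the extracted text to the file description page text.
        return page_text + "\n" + extracted
    return hook

# Stand-in for Tika: pretend only PDF files yield text.
def fake_tika(path: str) -> Optional[str]:
    return "Quarterly report body" if path.endswith(".pdf") else None

hook = make_index_text_hook(fake_tika)
print(hook(6, "Report.pdf", "File description"))
print(hook(0, "", "Main page text"))
```

Because the hook only transforms text and knows nothing about how that text is stored or queried, the same function could be called from any backend. A `getTextFromContent()` override, by contrast, would have to be repeated in every SearchEngine subclass.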

--
With best regards,
  Vitaliy Filippov

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l