SearchEngine subclasses can implement getTextFromContent() if they want to override the normal text fetching behavior.
I can't put it into a SearchEngine subclass because Tika isn't a search engine; it's a Java application that runs separately and extracts text from binary files such as *.doc, *.pdf and so on.
TikaMW is a plugin that should work with any search engine: it just modifies the indexed text for pages in the File: namespace.
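
To make the idea concrete, here is a rough Python sketch of an engine-independent step that adjusts the indexed text only for File: pages. It is not the actual TikaMW code: the function name text_for_index, the extract callable (standing in for a call to the separately running Tika process) and the choice to append the extracted text rather than replace the page text are all assumptions made for illustration. The only fact taken from MediaWiki is that the File: namespace has the numeric ID 6.

```python
from typing import Callable, Optional

NS_FILE = 6  # MediaWiki's numeric ID for the File: namespace


def text_for_index(
    namespace: int,
    wikitext: str,
    file_bytes: Optional[bytes],
    extract: Callable[[bytes], str],
) -> str:
    """Return the text that any search engine backend should index.

    `extract` is a hypothetical stand-in for whatever hands the binary
    file to Tika and returns plain text.
    """
    # Pages outside File:, or File: pages without an uploaded file,
    # keep their normal text untouched.
    if namespace != NS_FILE or file_bytes is None:
        return wikitext

    extracted = extract(file_bytes)
    # Assumption: extracted text is appended to the description page text.
    return wikitext + "\n" + extracted if extracted else wikitext
```

Because the step sits in front of the search engine rather than inside a SearchEngine subclass, the same function would serve whichever backend is configured.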
--
With best regards,
Vitaliy Filippov