FWIW, we do index the full text of DjVu (and, I think, PDF) files on
Commons (because it's stored in img_metadata). It's probably the biggest
improvement CirrusSearch brought for Commons.

And we also index office documents via Tika (*.doc and similar).

And I think this should not be a feature of the search engine at all! Extracting text from uploaded files is a separate feature, completely independent of the search engine in use (that's how it's implemented in my TikaMW).
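To illustrate that separation, here is a minimal sketch in Python. It is not MediaWiki's PHP API, and every name in it (SearchUpdateHooks, InMemoryBackend, update_page, append_extracted_text) is hypothetical. Extractors register on a hook that rewrites the text before any backend sees it, so the backend never needs to know where the extra text came from. A dictionary lookup stands in for a real Tika call.

```python
from typing import Callable, Dict, List

# Hypothetical hook signature: (page title, text) -> text to be indexed
TextHook = Callable[[str, str], str]


class SearchUpdateHooks:
    """Hypothetical registry of handlers allowed to rewrite the indexed text."""

    def __init__(self) -> None:
        self._handlers: List[TextHook] = []

    def register(self, handler: TextHook) -> None:
        self._handlers.append(handler)

    def run(self, title: str, text: str) -> str:
        # Each handler receives the output of the previous one.
        for handler in self._handlers:
            text = handler(title, text)
        return text


class InMemoryBackend:
    """Stand-in for any search engine: it only ever sees the final text."""

    def __init__(self) -> None:
        self.index: Dict[str, str] = {}

    def add(self, title: str, text: str) -> None:
        self.index[title] = text


def update_page(hooks: SearchUpdateHooks, backend: InMemoryBackend,
                title: str, text: str) -> None:
    # The hook runs before the backend, independent of which backend is used.
    backend.add(title, hooks.run(title, text))


# Hypothetical extracted text; a real extension would call Tika here.
FAKE_FILE_TEXT = {"File:Report.doc": "quarterly revenue figures"}


def append_extracted_text(title: str, text: str) -> str:
    extracted = FAKE_FILE_TEXT.get(title)
    return f"{text}\n{extracted}" if extracted else text
```

Usage: after `hooks.register(append_extracted_text)`, calling `update_page(hooks, backend, "File:Report.doc", "Description page")` stores the description followed by the extracted text, while pages without an attached file are indexed unchanged.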

So, is there any replacement for the SearchUpdate hook to modify the indexed text?

Of course I could just restore the SearchUpdate hook by including a patch in our distribution, mediawiki4intranet, but I would prefer that TikaMW not require patching...

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l