I wrote about my problem ~2 years ago: http://wikitech-l.wikimedia.narkive.com/6G0YPmWQ/need-a-way-to-modify-text-before-indexing-was-searchupdate
It seems I missed the last message in that thread, so I want to reply to it now:
With lsearchd and Elasticsearch, we absolutely wouldn't want to munge file text into page content (with sql-backed search, you might maybe).
Why? Aren't these also just full-text search backends? As I understand it, they're much faster than SQL-backed search. What would prevent them from storing file text?
Personally, I use Sphinx (http://sphinxsearch.com) with TikaMW, and everything works fine.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l