On 27/08/2015 22:29, Trey Jones wrote:
Anyway, I like stripping stop words better than relaxing AND to OR, unless there's some additional post-search ranking to sort the results into a more AND-ish order.

I think my previous mail was misleading: I don't want to replace AND with OR. What I mean is that when the query contains a lot of words (e.g. questions), the default AND is not appropriate, because a single missing stopword could hide a good result. We could use the minimum_should_match parameter, which lets us require a minimum number of terms to match (e.g. 90% of the query terms must match).
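To make this concrete, here is a rough sketch of what such a request body could look like. The field name "text" and the helper names are placeholders of mine, not anything from our actual config, and the rounding helper only reflects my reading of the minimum_should_match docs (percentages are rounded down).

```python
import json

QUESTION = "What's the connection between power laws and zipf distribution"


def match_query(text, field="text", msm="90%"):
    """Build a match query body; `field` is a hypothetical field name."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # Default operator is OR; this sets how many terms must match.
                    "minimum_should_match": msm,
                }
            }
        }
    }


def required_terms(n_terms, percent):
    """Number of terms that must match for a positive percentage (rounded down)."""
    return n_terms * percent // 100


body = match_query(QUESTION)
print(json.dumps(body, indent=2))
# Naive whitespace split, only to get a rough term count for the illustration.
print(required_terms(len(QUESTION.split()), 90))
```

With a naive whitespace split the question has 9 terms, so 90% lets one of them be missing.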

There's also another interesting query that does the "stopword stripping" automagically: the common terms query [1]. In a few words, this query detects stopwords by analyzing term frequency at query time, so the query:

What's the connection between power laws and zipf distribution
will be split into two clauses:
- low-frequency terms: connection power laws zipf distribution
- high-frequency terms: what's the between and

And we can control the boolean operator of each clause independently, e.g. OR for high-frequency words and AND for low-frequency words. We can even do more complex things like "3<80%" [2]: if there are more than 3 words, only 80% of them are required.
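As a sketch of the two variants described above, the request bodies could be built like this. The parameter names (cutoff_frequency, low_freq_operator, high_freq_operator, minimum_should_match with a low_freq key) are the ones documented in [1]; the field name "text" and the 0.001 cutoff are placeholder values I picked for illustration, not tuned settings.

```python
import json

QUESTION = "What's the connection between power laws and zipf distribution"


def common_terms_query(text, field="text", cutoff=0.001, low_freq_msm=None):
    """Build a `common` terms query body (field and cutoff are placeholders)."""
    params = {
        "query": text,
        # Terms more frequent than this cutoff are treated as high-frequency.
        "cutoff_frequency": cutoff,
        "high_freq_operator": "or",
    }
    if low_freq_msm is None:
        # Strict variant: every low-frequency term is required.
        params["low_freq_operator"] = "and"
    else:
        # Relaxed variant: e.g. "3<80%" for the low-frequency terms.
        params["minimum_should_match"] = {"low_freq": low_freq_msm}
    return {"query": {"common": {field: params}}}


strict = common_terms_query(QUESTION)
relaxed = common_terms_query(QUESTION, low_freq_msm="3<80%")
print(json.dumps(relaxed, indent=2))
```

The strict variant is the plain "AND for low freq, OR for high freq" setup; the relaxed one swaps the AND for the "3<80%" rule on the low-frequency terms only.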

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-common-terms-query.html
[2] https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html

Wikimedia-search mailing list
