On 27/08/2015 22:29, Trey Jones wrote:
> Anyway, I like stripping stop words better than relaxing AND to OR,
> unless there's some additional post-search ranking to sort the results
> into a more AND-ish order.
I think my previous mail was misleading: I don't want to replace AND with
OR. What I mean is that when the query contains a lot of words (questions), the
default AND is not appropriate because a single missing stopword could
hide a good result. We could use the minimum_should_match parameter,
which lets us require a minimum number of terms to match (e.g. 90% of the
query terms should match).
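
To make the minimum_should_match idea concrete, here is a minimal sketch of what such a request body could look like. The field name "text" and both helper functions are made up for illustration, and the rounding shown (percentages rounded down) is how I understand Elasticsearch to behave, so treat it as an assumption to verify rather than a statement about our setup.

```python
import json


def match_with_min_should_match(text, field="text", minimum="90%"):
    """Build a match query body that requires only a share of the terms.

    `field` is a hypothetical field name; with the default OR operator,
    minimum_should_match sets how many of the terms must be present.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "minimum_should_match": minimum,
                }
            }
        }
    }


def required_terms(term_count, percent):
    """Number of terms required for a positive percentage (rounded down)."""
    return term_count * percent // 100


question = "What's the connection between power laws and zipf distribution"
body = match_with_min_should_match(question)
print(json.dumps(body, indent=2))

# A naive whitespace split gives 9 terms; the real analyzer may differ.
print(required_terms(len(question.split()), 90))
```

With nine terms and 90%, one term may be missing, which is exactly the "single missing stopword" case.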
There's also another interesting query that does the "stopword
stripping" automagically: the common terms query.
In short, this query detects stopwords by analyzing word
frequencies at query time, so the query:
What's the connection between power laws and zipf distribution
will be split into two clauses:
- connection power laws zipf distribution
- what's the between and
And we can control the boolean operator of these clauses independently,
e.g. OR for high-frequency words and AND for low-frequency words. Or even more
complex stuff like "3<80%": if there are more than 3 words, only 80%
of them are required.
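
As a rough sketch of the above: the first function mimics the frequency split with a toy lookup table, and the second builds a common terms query body. The field name "text", the cutoff value, and the document frequencies are invented for illustration (they are not measured from any index), and the parameter names are the ones I recall from the Elasticsearch query DSL, so check them against the version we run.

```python
import json


def split_by_frequency(terms, doc_freq, cutoff=0.001):
    """Toy version of the split: terms above the cutoff are 'high freq'."""
    low, high = [], []
    for term in terms:
        (high if doc_freq.get(term, 0.0) > cutoff else low).append(term)
    return low, high


def common_terms_query(text, field="text", cutoff=0.001, low_freq_msm=None):
    """Build a common terms query body (hypothetical field name).

    By default, low-frequency terms are ANDed and high-frequency terms
    are ORed. If `low_freq_msm` is given (e.g. "3<80%"), it replaces the
    AND on the low-frequency clause.
    """
    params = {
        "query": text,
        "cutoff_frequency": cutoff,
        "high_freq_operator": "or",
    }
    if low_freq_msm is None:
        params["low_freq_operator"] = "and"
    else:
        params["minimum_should_match"] = {"low_freq": low_freq_msm}
    return {"query": {"common": {field: params}}}


question = "what's the connection between power laws and zipf distribution"

# Invented document frequencies, only to show the mechanics of the split.
toy_doc_freq = {"what's": 0.2, "the": 0.9, "between": 0.3, "and": 0.8}

low, high = split_by_frequency(question.split(), toy_doc_freq)
print("low freq: ", " ".join(low))
print("high freq:", " ".join(high))
print(json.dumps(common_terms_query(question, low_freq_msm="3<80%"), indent=2))
```

The nice part is that nothing here needs a hand-maintained stopword list: the cutoff decides per index which words count as common.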
Wikimedia-search mailing list