Hi Dirk,

First of all, sorry for the delayed response; I was away on holiday
last week.

I've already commented on the issue itself, but I thought we could
still discuss the comparability of scores across UNION-ed clauses in
this thread.

On Fri, Dec 29, 2017 at 9:31 PM, Dirk Rudolph
<[email protected]> wrote:
> ...
> In my naive assumption I would say that the fulltext constraint, if splitting 
> into multiple queries will be part of any on the disjunctive statements (or 
> unions) and with that the queryNorm(q) according to 
> https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>  will be the same for each of the queries. Property constraints and even path 
> constraints could potentially be boosted to 0 to not have any impact on the 
> score - anyway from what I could observe in our tests scores are, if coming 
> from the same index, comparable across (similar) queries with the same 
> fulltext constraint but different property constraints.
>

Your argument about TFIDF similarity is correct for the fulltext
clauses, but the complete score also accounts for the other clauses in
the query. Consider the following example:

(A AND B AND contains('test')) OR (C AND contains('test'))

When Lucene scores the first sub-query, the fulltext similarity would
be cut to 1/3rd because there are 2 more matching clauses (assuming
the scores for A, B and C are boosted to 0 as you said). OTOH, for the
second sub-query, it would only get halved. Also, documents from the
first clause that might also have C as true would not get scored for
that (assuming we don't boost to 0 and have better matches above).
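
To make the arithmetic concrete, here is a rough sketch in Python. It is
only a toy model that averages per-clause scores to mirror the reasoning
above, not Lucene's actual TFIDFSimilarity formula; the function name
toy_score and the fulltext score of 0.6 are made up for illustration.

```python
def toy_score(clause_scores):
    """Average the per-clause scores (toy model, NOT Lucene's formula)."""
    return sum(clause_scores) / len(clause_scores)


# Hypothetical score for contains('test'); the same in both sub-queries.
fulltext = 0.6

# (A AND B AND contains('test')), with A and B boosted to 0
first = toy_score([0.0, 0.0, fulltext])

# (C AND contains('test')), with C boosted to 0
second = toy_score([0.0, fulltext])

# Same document, same fulltext match, different overall scores.
print(first, second)
```

Under this assumption the same fulltext match ends up with a lower
score in the first sub-query than in the second, so merging the two
result sets by score would not rank them on equal footing.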

OTOH, if we could send the whole query to Lucene (which is
unfortunately not possible today), then all resulting documents would
be scored on an equal footing.

(Of course, this is a trivial example, but I guess it brings out the
point of why UNION scores aren't necessarily comparable even if they
get answered by the same index.)

Thanks,
Vikas
