Hi Dirk,

First of all, sorry for the delayed response - I was away on holiday last week.
I've commented on the issue itself, but I thought we could still discuss comparable scoring across UNION-ed clauses in this thread.

On Fri, Dec 29, 2017 at 9:31 PM, Dirk Rudolph <[email protected]> wrote:
> ...
> In my naive assumption I would say that the fulltext constraint, if splitting
> into multiple queries will be part of any on the disjunctive statements (or
> unions) and with that the queryNorm(q) according to
> https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> will be the same for each of the queries. Property constraints and even path
> constraints could potentially be boosted to 0 to not have any impact on the
> score - anyway from what I could observe in our tests scores are, if coming
> from the same index, comparable across (similar) queries with the same
> fulltext constraint but different property constraints.
>

Your argument about TFIDF similarity is correct for the fulltext clauses, but
the complete score also accounts for the other clauses in the query. Consider
the following example:

    (A AND B AND contains('test')) OR (C AND contains('test'))

When Lucene scores the first sub-query, the fulltext similarity would be
weighted at 1/3rd because there are two more clauses to match (assuming the
scores for A, B and C are boosted to 0, as you said). OTOH, for the second
sub-query, it would only get halved. Also, documents from the first clause
that might also have C as true would not get scored for that (assuming we
don't boost to 0 and have better matches above).

OTOH, if we could send the whole query to Lucene (which is unfortunately not
possible today), then all resulting documents would be scored on equal
footing.

(Of course, this is a trivial example, but I guess it brings out the point why
UNION scores aren't necessarily comparable even if they get answered by the
same index.)

Thanks,
Vikas
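
P.S. In case it helps, here is a toy sketch of the arithmetic in the example
above. It is not Lucene's actual scoring formula: it simply assumes that a
sub-query's score is the sum of its boosted clause scores divided by the number
of clauses, which is the simplification I used when saying 1/3rd and halved.
The function name and the raw score of 0.75 are made up for illustration.

```python
def subquery_score(clause_scores, boosts):
    """Toy model (assumption, not Lucene's formula): boosted clause scores
    are summed and then divided by the number of clauses in the sub-query."""
    total = sum(score * boost for score, boost in zip(clause_scores, boosts))
    return total / len(clause_scores)


# Hypothetical raw fulltext score for contains('test') on one document.
fulltext = 0.75

# (A AND B AND contains('test')): A and B boosted to 0, fulltext boost 1.
first = subquery_score([1.0, 1.0, fulltext], [0.0, 0.0, 1.0])

# (C AND contains('test')): C boosted to 0, fulltext boost 1.
second = subquery_score([1.0, fulltext], [0.0, 1.0])

# Same document, same fulltext match, different score per UNION branch.
print(first, second)
```

Under that assumption the same fulltext match ends up with a different score
in each branch, so sorting the merged UNION results by score would favour the
branch with fewer clauses.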
