[ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301777#comment-16301777
 ] 

Vikas Saurabh commented on OAK-7109:
------------------------------------

[~diru],
bq. That basically works, but only in the case that both queries hit the same 
index as only then TF/IDF score is comparable (also across multiple queries). 
So the solutions I see are:
Umm... I don't think Lucene scores from different queries against the same 
index are comparable (the first thing that comes to mind is that the 
normalization factors would differ for each query... there might be other 
reasons too).

bq. a) creating DNF disjunctive statements of a query as alternatives (not sure 
if the alternative currently created is DNF) and support proper counting over 
union queries
Well, the alternative is indeed very similar... although ORs are turned into 
UNIONs. The bigger problem is that the current Lucene cost estimation would 
give the same cost (at least for the example in the description) to both 
sub-queries... that would make the total cost of the UNION-ed execution double 
what the non-alternative version would give.
The current work (OAK-6776) would scale the cost of both components down... 
so the cost war would be fairer... but there would still be a chance that the 
original query wins the cost war.
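
To illustrate the DNF/UNION idea, here is a minimal Python sketch. It is not 
Oak code: the function name to_dnf and the list-of-lists query representation 
are made up for this example. It expands a conjunction of OR-groups into one 
conjunctive branch per UNION sub-query, using the query from the issue 
description.

```python
from itertools import product

def to_dnf(and_of_ors):
    """Expand a conjunction of OR-groups into conjunctive branches.

    and_of_ors is a list of lists: the outer list is AND-ed together,
    each inner list holds OR-ed alternatives (hypothetical representation).
    """
    return [list(branch) for branch in product(*and_of_ors)]

# The example query: one fulltext constraint AND (path1 OR path2).
query = [
    ["contains(a.[*], 'ipsum')"],
    ["isdescendantnode(a, '/content1')", "isdescendantnode(a, '/content2')"],
]

branches = to_dnf(query)
for branch in branches:
    # Each branch would become one sub-query of the UNION.
    print(" and ".join(branch))
```

The number of branches is the product of the OR-group sizes, so every extra 
OR-group multiplies the number of sub-queries, and if each sub-query is 
estimated at the same cost the total grows accordingly.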

bq. b) filtering the results using the query plan's filter while counting 
facets, similar to the way it's done for ACLs
I think that would be pretty bad for performance. I haven't looked closely at 
how the ACL filtering was done - but there definitely were concerns... not 
sure how they were avoided... or whether that wasn't required at all.
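
For illustration only, a small Python sketch of what option b) amounts to, 
using the sample content from the issue description. Nothing here is Oak or 
Lucene API: count_facets, path_filter and the dict-based hits are hypothetical 
stand-ins. Facets are counted once over the raw hits and once with the path 
restriction applied as a post-filter.

```python
from collections import Counter

# Documents matching ":fulltext:ipsum", as in the issue description.
lucene_hits = [
    {"path": "/content1/test/foo", "tags": ["tag1", "tag2"]},
    {"path": "/content2/test/bar", "tags": ["tag1", "tag2"]},
    {"path": "/content3/test/bar", "tags": ["tag1", "tag2"]},
]

def is_descendant(path, root):
    """True if path lies strictly below root."""
    return path.startswith(root.rstrip("/") + "/")

def path_filter(hit):
    """Stand-in for the query plan's filter: the two path restrictions."""
    return any(is_descendant(hit["path"], r) for r in ("/content1", "/content2"))

def count_facets(hits, accept=lambda hit: True):
    """Count tag facets over the hits that pass the accept predicate."""
    counts = Counter()
    for hit in hits:
        if accept(hit):
            counts.update(hit["tags"])
    return dict(counts)

raw = count_facets(lucene_hits)                    # what is returned today
filtered = count_facets(lucene_hits, path_filter)  # what option b) would return
print(raw, filtered)
```

The filtered count matches the expected facets, but every raw hit has to be 
visited and checked individually, which is where the performance concern 
comes from.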

bq. c) implementing a mode which translates any query as it is to its lucene 
equivalent
I'm not sure what you mean by "any query" - as far as I know, all reasonable 
constraints (property, ordering, fulltext) do get passed down well to Lucene. 
Of course, that depends on the backing index definition being sufficient for 
the query. Imo, if both (or more... along with operators) could be passed down 
well, then this could have been solved - but we don't have that functionality 
yet.

bq. We tried already running one query for each path, but even with that the 
individual queries are too complex to be passed to lucene with all constraints. 
(not entirely sure why though ...)
I'd be interested in looking at your query and the index def. Can you share 
some details in a mail to oak-dev?

> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
>                 Key: OAK-7109
>                 URL: https://issues.apache.org/jira/browse/OAK-7109
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.7
>            Reporter: Dirk Rudolph
>              Labels: facet
>         Attachments: facetsInMultipleRoots.patch
>
>
> Complex queries in this case are queries that are passed to Lucene without 
> all of their original constraints. For example, queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to Lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of Lucene, the returned 
> facets are incorrect. For example, having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>    + tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To work around that, the only solution that came to my mind is building the 
> [disjunctive normal 
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex 
> query and executing a query for each of the disjunctive statements. As this 
> expands exponentially, it's only a theoretical solution, nothing for 
> production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
