[ https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332267#comment-16332267 ]

Thomas Mueller commented on OAK-7109:
-------------------------------------

[~diru] I'm sorry for the delay. I'm afraid I can't follow you... Here are some 
links (mostly for myself; please tell me if I made a mistake):

CNF https://en.wikipedia.org/wiki/Conjunctive_normal_form
example: A and (B or C)

DNF https://en.wikipedia.org/wiki/Disjunctive_normal_form
example: (A and not(B) and not(C)) or (not(D) and E and F)

NNF https://en.wikipedia.org/wiki/Negation_normal_form
example: (A or B) and C
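To make the three forms concrete, here is a minimal Java sketch that turns a boolean expression into DNF: first push every "not" down to the variables (NNF, via De Morgan's laws), then distribute "and" over "or". The Expr, Var, Not, And and Or types are hypothetical and exist only for this illustration; they are not Oak's query classes.

```java
import java.util.ArrayList;
import java.util.List;

public class DnfSketch {

    // Hypothetical expression tree for illustration only; these are not Oak classes.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Not(Expr e) implements Expr {}
    record And(Expr l, Expr r) implements Expr {}
    record Or(Expr l, Expr r) implements Expr {}

    // Step 1: negation normal form. Push "not" down to the variables using De Morgan's laws.
    static Expr nnf(Expr e, boolean negate) {
        if (e instanceof Var v) {
            return negate ? new Not(v) : v;
        } else if (e instanceof Not n) {
            return nnf(n.e(), !negate);
        } else if (e instanceof And a) {
            Expr l = nnf(a.l(), negate), r = nnf(a.r(), negate);
            return negate ? new Or(l, r) : new And(l, r);   // not(L and R) == not(L) or not(R)
        } else if (e instanceof Or o) {
            Expr l = nnf(o.l(), negate), r = nnf(o.r(), negate);
            return negate ? new And(l, r) : new Or(l, r);   // not(L or R) == not(L) and not(R)
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }

    // Step 2: DNF of an NNF tree. Each inner list is one conjunction of literals;
    // the outer list is the disjunction of those conjunctions.
    static List<List<String>> dnf(Expr e) {
        List<List<String>> out = new ArrayList<>();
        if (e instanceof Var v) {
            out.add(List.of(v.name()));
        } else if (e instanceof Not n && n.e() instanceof Var x) {
            out.add(List.of("not(" + x.name() + ")"));
        } else if (e instanceof Or o) {
            out.addAll(dnf(o.l()));
            out.addAll(dnf(o.r()));
        } else if (e instanceof And a) {
            // Distribute "and" over "or": every left conjunction is combined
            // with every right conjunction, so the sizes multiply.
            for (List<String> left : dnf(a.l())) {
                for (List<String> right : dnf(a.r())) {
                    List<String> both = new ArrayList<>(left);
                    both.addAll(right);
                    out.add(both);
                }
            }
        } else {
            throw new IllegalArgumentException("not in NNF: " + e);
        }
        return out;
    }

    static List<List<String>> toDnf(Expr e) {
        return dnf(nnf(e, false));
    }

    public static void main(String[] args) {
        Expr a = new Var("A"), b = new Var("B"), c = new Var("C");
        // The CNF example above: A and (B or C)
        System.out.println(toDnf(new And(a, new Or(b, c))));             // [[A, B], [A, C]]
        // not(A or B) and C
        System.out.println(toDnf(new And(new Not(new Or(a, b)), c)));    // [[not(A), not(B), C]]
    }
}
```

Distributing "and" over "or" multiplies the number of conjunctions: (A1 or B1) and (A2 or B2) and ... and (An or Bn) expands to 2^n of them, which is the exponential growth mentioned in the issue description.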

> all constraints have to be passed to lucene, so the query has to be in DNF, 
> which is not the case at the moment

Only the filter is passed to Lucene currently, and that one doesn't have any 
"or" conditions (except for "x in(1, 2, 3)"). Changing that will be hard, and 
has some disadvantages. Other "or" conditions are currently only supported by 
using "union" (aggregation), or by not processing them in the index (filtering 
in the query engine).

So I think it's not so much about "not" conditions.

> would require also a deduplication between the lucene result sets returned 
> from each of the unions.

Yes. I think that's possible, even though it's not optimal.
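A minimal Java sketch of that deduplication, assuming each union branch yields rows identified by their path: the rows are merged into a map keyed by path before the facets are counted, so a node matched by several branches is counted once. The Row record and both helper methods are hypothetical, not Oak API, and the sketch simply keeps every path in memory; it says nothing about how Oak would implement this.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnionFacetSketch {

    // Hypothetical result row: a node path plus the values of its simple/tags property.
    record Row(String path, List<String> tags) {}

    // Merge the rows of all union branches, keeping the first row seen for each path.
    static List<Row> mergeDistinct(List<List<Row>> branches) {
        Map<String, Row> byPath = new LinkedHashMap<>();
        for (List<Row> branch : branches) {
            for (Row row : branch) {
                byPath.putIfAbsent(row.path(), row);
            }
        }
        return List.copyOf(byPath.values());
    }

    // Count facet values over the deduplicated rows, not over each branch separately.
    static Map<String, Integer> countFacets(List<Row> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Row row : rows) {
            for (String tag : row.tags()) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("tag1", "tag2");
        // Hypothetical branch results: /content1/test/foo is matched by both branches.
        List<Row> branch1 = List.of(new Row("/content1/test/foo", tags));
        List<Row> branch2 = List.of(new Row("/content1/test/foo", tags),
                                    new Row("/content2/test/bar", tags));
        List<Row> merged = mergeDistinct(List.of(branch1, branch2));
        System.out.println(merged.size());         // 2
        System.out.println(countFacets(merged));   // {tag1=2, tag2=2}
    }
}
```

Summing the per-branch facet counts instead would give tag1: 3 and tag2: 3 for the same input, because /content1/test/foo would be counted twice.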



> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
>                 Key: OAK-7109
>                 URL: https://issues.apache.org/jira/browse/OAK-7109
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.7
>            Reporter: Dirk Rudolph
>            Priority: Major
>              Labels: facet
>         Attachments: facetsInMultipleRoots.patch, restrictionPropagationTest.patch
>
>
> Complex queries in this context are queries that are passed to Lucene without 
> all of their original constraints. Examples are queries with multiple path 
> restrictions, such as:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In this particular case the index planner passes only ":fulltext:ipsum" to 
> Lucene, even though the index supports evaluating path constraints.
> Because the facets are counted on the raw Lucene result, the returned facet 
> counts are incorrect. For example, given the following content:
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> {code}
> The expected facet counts for the simple/tags dimension and the query above 
> are
> - tag1: 2
> - tag2: 2
> because the result set contains 2 rows and all documents carry the same tags. 
> The actual counts are
> - tag1: 3
> - tag2: 3
> because the path constraint is not evaluated by Lucene.
> As a workaround, the only solution I could think of is to build the 
> [disjunctive normal form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] 
> of my complex query and execute one query for each disjunct. As the DNF can 
> grow exponentially, this is only a theoretical solution, not one for 
> production.
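The miscount described in the issue can be reproduced without Oak. The following Java sketch is a simplified, hypothetical model of the example content: the Doc record, the substring check standing in for the fulltext match, and the helper methods are all assumptions, not Oak code. Facets counted over everything that matches ":fulltext:ipsum" give 3 per tag, while the rows that survive the path restriction only account for 2.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetMiscountSketch {

    // Hypothetical stand-in for the example content; simple/tags is flattened into the Doc.
    record Doc(String path, String text, List<String> tags) {}

    static final List<Doc> DOCS = List.of(
        new Doc("/content1/test/foo", "lorem ipsum", List.of("tag1", "tag2")),
        new Doc("/content2/test/bar", "lorem ipsum", List.of("tag1", "tag2")),
        new Doc("/content3/test/bar", "lorem ipsum", List.of("tag1", "tag2")));

    // Mirrors isdescendantnode(a, root): the path lies strictly below root.
    static boolean isDescendant(String path, String root) {
        return path.startsWith(root + "/");
    }

    static Map<String, Integer> countFacets(List<Doc> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc d : docs) {
            for (String tag : d.tags()) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    // What the index evaluates: only the fulltext condition (crude substring match).
    static List<Doc> indexResult() {
        return DOCS.stream().filter(d -> d.text().contains("ipsum")).toList();
    }

    // What the query engine returns after applying the path restriction on top.
    static List<Doc> queryResult() {
        return indexResult().stream()
            .filter(d -> isDescendant(d.path(), "/content1") || isDescendant(d.path(), "/content2"))
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(countFacets(indexResult()));   // {tag1=3, tag2=3}: facets on the raw index result
        System.out.println(countFacets(queryResult()));   // {tag1=2, tag2=2}: facets matching the 2 returned rows
    }
}
```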



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
