[
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16312632#comment-16312632
]
Vikas Saurabh commented on OAK-7109:
------------------------------------
[~diru], thanks for the investigation. I now see the issue, but unfortunately,
with the current design of how the query engine parses queries and then passes
sub-queries to index providers, it's almost impossible to have correct faceting
for complex queries.
The way I see the fundamental problem is:
* a facet is an aggregation function => any query with rep:facet must be
completely resolved by a single index
* currently, index providers only resolve ANDed clauses => so complex queries
never get all their clauses passed down to the (lucene) index
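To make the two points above concrete, here is a minimal sketch in plain Python (not Oak code). It assumes a toy in-memory content model mirroring the example in the issue description; the names `DOCS`, `index_query`, `facet_counts` and `is_descendant` are hypothetical. It only illustrates that counting facets on what the index returns, before the remaining constraints are applied, gives different numbers than counting on the final result set.

```python
from collections import Counter

# Toy content mirroring the issue description (hypothetical in-memory model).
DOCS = {
    "/content1/test/foo": {"text": "lorem ipsum", "tags": ["tag1", "tag2"]},
    "/content2/test/bar": {"text": "lorem ipsum", "tags": ["tag1", "tag2"]},
    "/content3/test/bar": {"text": "lorem ipsum", "tags": ["tag1", "tag2"]},
}


def index_query(term):
    """Stand-in for the index: only the fulltext clause is pushed down."""
    return [path for path, doc in DOCS.items() if term in doc["text"].split()]


def facet_counts(paths):
    """Count tag values over the given set of document paths."""
    counts = Counter()
    for path in paths:
        counts.update(DOCS[path]["tags"])
    return dict(counts)


def is_descendant(path, root):
    """True if path lies strictly below root."""
    return path.startswith(root + "/")


# What the index sees: only the fulltext restriction.
raw = index_query("ipsum")

# What the query engine applies afterwards: the ORed path restrictions.
filtered = [
    p for p in raw
    if is_descendant(p, "/content1") or is_descendant(p, "/content2")
]

facets_from_index = facet_counts(raw)      # counted on the raw index result
facets_expected = facet_counts(filtered)   # counted on the final result set
```

In this toy model `facets_from_index` counts all three documents while `facets_expected` counts only the two under /content1 and /content2, which is the mismatch reported in the issue.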
I really don't have any solution or work-around for your problem though :(.
[~tmueller], would you have any ideas about how we can make such cases work?
PS: Btw, [~diru], the scoring across UNIONed clauses won't work (as you
mentioned in the mail) - but that's a digression and won't solve the problem at
hand as you correctly said that the different clauses across UNIONs won't be
disjoint.
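As a rough illustration of why non-disjoint UNION branches are a problem for facets, here is a small Python sketch (again not Oak code). The documents and the boolean properties `x` and `y` are made up for the example; they stand for two ORed clauses that each become one branch of the UNION. Adding up the facet counts of the branches counts a document once per branch it matches, whereas the correct counts come from the de-duplicated union of the branches.

```python
from collections import Counter

# Hypothetical documents; "x" and "y" stand for two ORed query clauses.
DOCS = {
    "/a": {"tags": ["tag1"], "x": True, "y": True},   # matches both branches
    "/b": {"tags": ["tag1"], "x": True, "y": False},
    "/c": {"tags": ["tag2"], "x": False, "y": True},
}


def branch(predicate):
    """Paths of all documents matching one branch of the UNION."""
    return {path for path, doc in DOCS.items() if predicate(doc)}


def facets(paths):
    """Count tag values over the given set of document paths."""
    counts = Counter()
    for path in paths:
        counts.update(DOCS[path]["tags"])
    return counts


branch_x = branch(lambda doc: doc["x"])
branch_y = branch(lambda doc: doc["y"])

# Naive merge: add per-branch facet counts; "/a" is counted twice.
summed = dict(facets(branch_x) + facets(branch_y))

# Correct counts: facet over the de-duplicated union of both branches.
correct = dict(facets(branch_x | branch_y))
```

Here `summed` reports one more `tag1` than `correct`, because "/a" satisfies both clauses. Getting from per-branch counts to the right numbers requires knowing which documents the branches share, which is information per-branch facet counts alone do not carry.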
> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
> Key: OAK-7109
> URL: https://issues.apache.org/jira/browse/OAK-7109
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: lucene
> Affects Versions: 1.6.7
> Reporter: Dirk Rudolph
> Labels: facet
> Attachments: facetsInMultipleRoots.patch,
> restrictionPropagationTest.patch
>
>
> Complex queries in this case are queries that are passed to lucene without
> all of their original constraints. For example, queries with multiple path
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*],
> 'ipsum') and (isdescendantnode(a,'/content1') or
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene
> even though the index supports evaluating path constraints.
> As counting the facets happens on the raw result of lucene, the returned
> facets are incorrect. For example, having the following content
> {code}
> /content1/test/foo
> + text = lorem ipsum
> - simple/
> + tags = tag1, tag2
> /content2/test/bar
> + text = lorem ipsum
> - simple/
> + tags = tag1, tag2
> /content3/test/bar
> + text = lorem ipsum
> - simple/
> + tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual
> result set is
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To work around that, the only solution that came to my mind is building the
> [disjunctive normal
> form|https://en.wikipedia.org/wiki/Disjunctive_normal_form] of my complex
> query and executing a query for each of the disjunctive statements. As this
> expands exponentially, it's only a theoretical solution, nothing for
> production.