[jira] [Commented] (OAK-7109) rep:facet returns wrong results for complex queries

Vikas Saurabh (JIRA) Fri, 22 Dec 2017 04:23:40 -0800

    [ 
https://issues.apache.org/jira/browse/OAK-7109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16301319#comment-16301319
 ]


Vikas Saurabh commented on OAK-7109:
------------------------------------

bq. To workaround that the only solution that came to my mind is building the 
DNF of my complex query and executing a query for each of the disjunctive 
statements. As this is expanding exponentially its only a theoretical solution, 
nothing for production. 
Interesting issue. Btw, the workaround you mentioned above would most likely 
not work right away either :(. The result from
{noformat}
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and isdescendantnode(a,'/content1')
UNION
select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
'ipsum') and isdescendantnode(a,'/content2')
{noformat}
would be
||path||facet||
|/content1/test/bar|{"tag1":1,"tag2":1}|
|/content2/test/bar|{"tag1":1,"tag2":1}|

Basically, afaict, you are hitting 2 issues:
* a single query passes only 1 path restriction down to the planner - so, 
without a manual break into a UNION, the single query would win the cost war 
(unfortunately) and give the result you have in the description 
({{tag1:3, tag2:3}})
* otoh, with a manual break into a UNION query, you'd get different facet 
results for each part of the UNION and you'd have to aggregate the results at 
your end
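
For the second point, the aggregation at your end could look roughly like the sketch below. It is only an illustration: it assumes the facet column of each row arrives as a JSON object of label to count (as in the table above), and {{merge_facets}} is a made-up helper name, not an Oak API. Summing is only correct when the path restrictions don't overlap, otherwise a node under both paths would be counted twice.

```python
import json
from collections import Counter


def merge_facets(facet_columns):
    """Sum facet counts across the parts of a manually split query.

    Assumption: each entry is a JSON object mapping a facet label to its
    count, e.g. '{"tag1":1,"tag2":1}'. This helper is hypothetical.
    """
    total = Counter()
    for column in facet_columns:
        # Counter.update adds the counts of matching labels together
        total.update(json.loads(column))
    return dict(total)


# One facet value per part of the UNION (first row of each part)
rows = ['{"tag1":1,"tag2":1}', '{"tag1":1,"tag2":1}']
print(merge_facets(rows))
```

For the sample content in the description this adds up to the expected tag1: 2, tag2: 2.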

I don't see how to easily fix this issue though :(. [~tmueller], [~chetanm], 
[~teofili], you guys might be interested in this issue.

Otoh, btw, if we "accept" that you can break the query up and aggregate the 
facets once more at your end, even then I think what you should do is:
* run multiple facet queries - one for each path
* get the first row from each query and aggregate the facets
* run the normal query (without facet) with 
union/or/what-you-have-in-description - so that you still get the benefit of 
lucene scores being compared correctly across different paths.

Btw, the reason I think you should run separate queries and extract the facets 
from the first result of each one is to avoid consuming all the results from a 
single path before being able to get the facet output from the second path.
(... and, yes, I know, this is sub-optimal... but, afaict, that's the best 
possible way as of now).
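
To make the split concrete, here is a rough sketch of generating the query strings: one facet query per path plus a single non-facet query over all paths. The helper names ({{quote}}, {{build_queries}}) are made up for illustration, the {{[jcr:path]}} selection in the result query is just a placeholder, and the quote-doubling escape is my assumption about how string literals should be handled. Actually executing the queries against the repository is left out.

```python
def quote(literal):
    # Assumption: a single quote inside a string literal is escaped by doubling it
    return "'" + literal.replace("'", "''") + "'"


def build_queries(facet_dim, fulltext, paths):
    """Return (facet_queries, result_query) for the given path restrictions.

    Hypothetical helper: facet_queries holds one facet query per path,
    result_query is the plain query (no facet) across all paths.
    """
    text = f"contains(a.[*], {quote(fulltext)})"
    # One facet query per path, so lucene sees exactly one path restriction each
    facet_queries = [
        f"select [rep:facet({facet_dim})] from [nt:base] as a "
        f"where {text} and isdescendantnode(a, {quote(p)})"
        for p in paths
    ]
    # A single non-facet query, so scores are compared across all paths
    path_clause = " or ".join(f"isdescendantnode(a, {quote(p)})" for p in paths)
    result_query = (
        f"select [jcr:path] from [nt:base] as a where {text} and ({path_clause})"
    )
    return facet_queries, result_query


facet_queries, result_query = build_queries(
    "simple/tags", "ipsum", ["/content1", "/content2"]
)
for q in facet_queries:
    print(q)
print(result_query)
```

You would then read only the first row of each facet query, sum the facet counts yourself, and page through the result query as usual.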

> rep:facet returns wrong results for complex queries
> ---------------------------------------------------
>
>                 Key: OAK-7109
>                 URL: https://issues.apache.org/jira/browse/OAK-7109
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: lucene
>    Affects Versions: 1.6.7
>            Reporter: Dirk Rudolph
>              Labels: facet
>         Attachments: facetsInMultipleRoots.patch
>
>
> Complex queries in that case are queries, which are passed to lucene not 
> containing all original constraints. For example queries with multiple path 
> restrictions like:
> {code}
> select [rep:facet(simple/tags)] from [nt:base] as a where contains(a.[*], 
> 'ipsum') and (isdescendantnode(a,'/content1') or 
> isdescendantnode(a,'/content2'))
> {code}
> In that particular case the index planner gives ":fulltext:ipsum" to lucene 
> even though the index supports evaluating path constraints. 
> As counting the facets happens on the raw result of lucene, the returned 
> facets are incorrect. For example having the following content 
> {code}
> /content1/test/foo
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content2/test/bar
>  + text = lorem ipsum
>  - simple/
>   + tags = tag1, tag2
> /content3/test/bar
>  + text = lorem ipsum
>  - simple/
>    + tags = tag1, tag2
> {code}
> the expected result for the dimensions of simple/tags and the query above is 
> - tag1: 2
> - tag2: 2
> as the result set is 2 results long and all documents are equal. The actual 
> result set is 
> - tag1: 3
> - tag2: 3
> as the path constraint is not handled by lucene.
> To workaround that the only solution that came to my mind is building the DNF 
> of my complex query and executing a query for each of the disjunctive 
> statements. As this is expanding exponentially its only a theoretical 
> solution, nothing for production. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
