[jira] [Commented] (OAK-2730) Faster result count estimation for QueryResult on lines of resultFetchSize support in JR2

Thomas Mueller (JIRA) Thu, 09 Apr 2015 00:51:02 -0700

    [ 
https://issues.apache.org/jira/browse/OAK-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486925#comment-14486925
 ]


Thomas Mueller commented on OAK-2730:
-------------------------------------

The estimation in Jackrabbit 2.x had security problems, so I think we should 
not do that.

In my view, we should do OAK-2423 "paths-based read-access evaluation", and use 
that within the query engine. _If_ this is not good enough (but only then), we 
can check whether we need to do something else. But even then, I think it would 
be a different approach, not the solution proposed here.

> I would strongly recommend not introducing such an optimization without 
> adding dedicated benchmarks.

+1. Especially because it's a big change, a potential security problem, and it 
could break other things.

> Faster result count estimation for QueryResult on lines of resultFetchSize 
> support in JR2
> -----------------------------------------------------------------------------------------
>
>                 Key: OAK-2730
>                 URL: https://issues.apache.org/jira/browse/OAK-2730
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: query
>            Reporter: Chetan Mehrotra
>             Fix For: 1.3.0
>
>
> Currently in Oak, the time taken to fetch the result size for a query is 
> proportional to the result size, which does not perform well when the result 
> size is big. A complete traversal is required to perform the ACL checks that 
> ensure the result count is *accurate*.
> JR2 used to support {{resultFetchSize}} (defaults to integer max). This was 
> used to get an estimate of the possible result count, whereby the count might 
> not be accurate.
> Per [~mreutegg] this feature worked like below
> {quote}
> If resultFetchSize is set to 50 then QueryEngine will initially collect up to 
> 50 nodes the current session is allowed to read from the raw lucene result 
> set. While doing that, it counts the number of nodes denied by access control 
> checks. The result size reported is then calculated as: 
> raw-lucene-result-size - number-of-nodes-denied. The resultFetchSize is 
> doubled and the query is executed again if a client iterates past the currently
> available nodes. If it is required to have an exact result size, then the 
> configuration for 'resultFetchSize' can be increased to a much higher value. 
> However, this has a severe performance impact for large result sets, because 
> the query will now have to apply access control checks for the complete 
> result set.
> {quote}
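The quoted description explains the JR2 resultFetchSize estimate only in prose. Below is a minimal Java sketch of that logic, written for illustration and not taken from Jackrabbit 2.x or Oak. All names (ResultSizeEstimate, exactSize, estimatedSize, fetchSizeFor, sampleHits, everyThirdDenied) are hypothetical, the raw Lucene result is modelled as a plain list of paths, and the canRead predicate is an assumed stand-in for the real access-control check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration of the resultFetchSize estimate described above;
// not the Jackrabbit 2.x or Oak implementation.
public class ResultSizeEstimate {

    // Exact count: every raw index hit is access-checked, so cost grows with result size.
    static long exactSize(List<String> rawHits, Predicate<String> canRead) {
        return rawHits.stream().filter(canRead).count();
    }

    // Estimate: stop once fetchSize readable hits are collected, then report
    // raw-result-size minus the number of denied hits seen so far.
    static long estimatedSize(List<String> rawHits, Predicate<String> canRead, int fetchSize) {
        int readable = 0;
        int denied = 0;
        for (String path : rawHits) {
            if (readable >= fetchSize) {
                break;
            }
            if (canRead.test(path)) {
                readable++;
            } else {
                denied++;
            }
        }
        return rawHits.size() - denied;
    }

    // Doubles the fetch size until it covers the position the client wants to reach.
    static int fetchSizeFor(int requested, int initialFetchSize) {
        long size = Math.max(1, initialFetchSize);
        while (size < requested) {
            size *= 2;
        }
        return (int) Math.min(size, Integer.MAX_VALUE);
    }

    // Sample data: paths /content/node0 .. /content/node{n-1}.
    static List<String> sampleHits(int n) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            hits.add("/content/node" + i);
        }
        return hits;
    }

    // Hypothetical ACL stand-in: every node whose index is a multiple of 3 is denied.
    static Predicate<String> everyThirdDenied() {
        return path -> Integer.parseInt(path.substring(path.lastIndexOf("node") + 4)) % 3 != 0;
    }

    public static void main(String[] args) {
        List<String> hits = sampleHits(1000);
        Predicate<String> canRead = everyThirdDenied();
        System.out.println(exactSize(hits, canRead));           // 666
        System.out.println(estimatedSize(hits, canRead, 50));   // 975
        int next = fetchSizeFor(51, 50);                        // client iterates past 50 -> 100
        System.out.println(estimatedSize(hits, canRead, next)); // 950
    }
}
```

The gap between 975 and 666 is the inaccuracy the description accepts in exchange for speed, and it shrinks as the fetch size doubles. With fetchSize left at Integer.MAX_VALUE (the JR2 default), the loop never stops early, so the estimate equals the exact count but pays the same full access-check cost.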



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
