[
https://issues.apache.org/jira/browse/OAK-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485277#comment-14485277
]
Michael Marth commented on OAK-2730:
------------------------------------
[~chetanm], re
{quote}
Currently in Oak while fetching the result size for a query the time taken is
proportional to the result size. This would not perform well when result size
is big.
{quote}
I agree with the first sentence, but would at least discuss the second one :)
I had an offline discussion with [~tmueller] a while back about this. AFAIR we
had this idea: the Query Engine can maybe avoid initializing nodes in order
to check ACLs, but rather just check the path. This would avoid object
creation. In the common scenario with few ACLs on the whole repo this check could
potentially be done very efficiently, so that a result set of, say, 100k nodes
could still be ACL-checked reasonably fast to calculate the size.
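To make the idea concrete, here is a rough sketch in Java. It is not Oak code: {{PathAclCounter}}, {{isUnder}} and {{countReadable}} are hypothetical names, and the permission model is deliberately simplified to "everything is readable except a few denied subtrees". Real ACL evaluation (principals, allow/deny ordering, restrictions) is more involved; the point is only that the check works on path strings and creates no node objects.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: count readable results from their paths alone. */
class PathAclCounter {

    /** Returns true if path is the given root or lies below it. */
    static boolean isUnder(String path, String root) {
        if (root.equals("/")) {
            return true;
        }
        // Append "/" so that "/a/secret" does not match "/a/secretary".
        return path.equals(root) || path.startsWith(root + "/");
    }

    /**
     * Counts the result paths that are not covered by any denied subtree.
     * Assumption: read access is granted everywhere except below deniedRoots.
     */
    static long countReadable(List<String> resultPaths, List<String> deniedRoots) {
        long count = 0;
        for (String path : resultPaths) {
            boolean denied = false;
            for (String root : deniedRoots) {
                if (isUnder(path, root)) {
                    denied = true;
                    break;
                }
            }
            if (!denied) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
                "/content/a", "/content/secret/b", "/content/secretary");
        System.out.println(countReadable(paths, Arrays.asList("/content/secret")));
    }
}
```

With only a handful of denied subtrees the inner loop stays short, so the cost per result is a few string comparisons.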
Wanted to at least put this out here before we rush to implement a workaround.
[~anchela], [~tmueller], thoughts?
> Faster result count estimation for QueryResult on lines of resultFetchSize
> support in JR2
> -----------------------------------------------------------------------------------------
>
> Key: OAK-2730
> URL: https://issues.apache.org/jira/browse/OAK-2730
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: query
> Reporter: Chetan Mehrotra
> Fix For: 1.3.0
>
>
> Currently in Oak while fetching the result size for a query the time taken is
> proportional to the result size. This would not perform well when result size
> is big. A complete traversal is required to perform the ACL checks and ensure
> that the result count is *accurate*.
> JR2 used to support {{resultFetchSize}} (default to integer max). This was
> used to get an estimate of possible result count whereby the count might not
> be accurate.
> Per [~mreutegg] this feature worked like below
> {quote}
> If resultFetchSize is set to 50 then QueryEngine will initially collect up to
> 50 nodes the current session is allowed to read from the raw lucene result
> set. While doing that, it counts the number of nodes denied by access control
> checks. The result size reported is then calculated as:
> raw-lucene-result-size - number-of-nodes-denied. The resultFetchSize is
> doubled and the query is executed again if a client iterates past the currently
> available nodes. If it is required to have an exact result size, then the
> configuration for 'resultFetchSize' can be increased to a much higher value.
> However, this has a severe performance impact for large result sets, because
> the query will now have to apply access control checks for the complete
> result set
> {quote}
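The estimate described in the quote above can be sketched roughly as follows. This is an illustration in Java, not JR2 or Oak code: {{FetchSizeEstimator}}, {{estimateSize}} and the {{canRead}} predicate are hypothetical names, and the raw index result is modelled as a plain list of paths.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the resultFetchSize-style size estimate. */
class FetchSizeEstimator {

    /**
     * Collects up to fetchSize readable entries from the raw result and counts
     * the entries denied along the way. Entries beyond that point are never
     * access-checked, so the returned size may overestimate the true count.
     */
    static long estimateSize(List<String> rawResult, Predicate<String> canRead, int fetchSize) {
        int collected = 0;
        long denied = 0;
        for (String path : rawResult) {
            if (collected >= fetchSize) {
                break; // enough readable nodes collected; stop checking
            }
            if (canRead.test(path)) {
                collected++;
            } else {
                denied++;
            }
        }
        // raw-lucene-result-size - number-of-nodes-denied
        return rawResult.size() - denied;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a", "x1", "b", "x2", "c", "x3");
        Predicate<String> canRead = p -> !p.startsWith("x");
        System.out.println(estimateSize(raw, canRead, 2));
        System.out.println(estimateSize(raw, canRead, Integer.MAX_VALUE));
    }
}
```

A small fetch size keeps the number of access checks bounded at the price of accuracy; a fetch size of integer max checks every entry and yields the exact count.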