[
https://issues.apache.org/jira/browse/OAK-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486925#comment-14486925
]
Thomas Mueller commented on OAK-2730:
-------------------------------------
The estimation in Jackrabbit 2.x had security problems, so I think we should
not do that.
In my view, we should do OAK-2423 "paths-based read-access evaluation", and use
that within the query engine. _If_ this is not good enough (but only then), we
can check whether we need to do something else. Even then, I think it would be
something other than the solution proposed here.
> i would strongly recommend to not introduce such an optimization without
> adding dedicated benchmarks.
+1. Especially because it's a big change, a potential security problem, and can
break other things.
> Faster result count estimation for QueryResult on lines of resultFetchSize
> support in JR2
> -----------------------------------------------------------------------------------------
>
> Key: OAK-2730
> URL: https://issues.apache.org/jira/browse/OAK-2730
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: query
> Reporter: Chetan Mehrotra
> Fix For: 1.3.0
>
>
> Currently in Oak, the time taken to fetch the result size for a query is
> proportional to the result size. This does not perform well when the result
> size is big. The complete traversal is required to perform the ACL checks
> that ensure the result count is *accurate*.
> JR2 used to support {{resultFetchSize}} (defaults to integer max). This was
> used to get an estimate of the possible result count, whereby the count might
> not be accurate.
> Per [~mreutegg], this feature worked as follows:
> {quote}
> If resultFetchSize is set to 50 then QueryEngine will initially collect up to
> 50 nodes the current session is allowed to read from the raw lucene result
> set. While doing that, it counts the number of nodes denied by access control
> checks. The result size reported is then calculated as:
> raw-lucene-result-size - number-of-nodes-denied. The resultFetchSize is
> doubled and the query executed again if a client iterates past the currently
> available nodes. If it is required to have an exact result size, then the
> configuration for 'resultFetchSize' can be increased to a much higher value.
> However, this has a severe performance impact for large result sets, because
> the query will now have to apply access control checks for the complete
> result set.
> {quote}
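
For illustration only, the behaviour quoted above can be sketched in Python
roughly as follows. This is not the Jackrabbit 2.x code: the names fetch_batch,
iterate_all, HITS and can_read are hypothetical, and the raw Lucene result is
modelled as a plain list of paths.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical raw (unfiltered) index result and access check, for illustration.
HITS = ["/a", "/b", "/secret/x", "/c", "/secret/y", "/d"]


def can_read(path: str) -> bool:
    """Stand-in for the session's access control check (assumption)."""
    return not path.startswith("/secret")


def fetch_batch(
    raw_hits: Sequence[str],
    can_read: Callable[[str], bool],
    fetch_size: int,
) -> Tuple[List[str], int]:
    """Collect up to fetch_size readable hits and estimate the result size.

    The estimate is len(raw_hits) minus the number of hits denied while
    collecting; hits beyond the batch are never access-checked.
    """
    readable: List[str] = []
    denied = 0
    for path in raw_hits:
        if len(readable) >= fetch_size:
            break  # batch is full: stop checking further hits
        if can_read(path):
            readable.append(path)
        else:
            denied += 1
    return readable, len(raw_hits) - denied


def iterate_all(
    raw_hits: Sequence[str],
    can_read: Callable[[str], bool],
    fetch_size: int,
) -> Iterator[str]:
    """Yield every readable hit, doubling fetch_size and re-running the
    fetch each time the client iterates past the collected nodes."""
    if fetch_size < 1:
        raise ValueError("fetch_size must be at least 1")
    served = 0
    while True:
        readable, _ = fetch_batch(raw_hits, can_read, fetch_size)
        for path in readable[served:]:
            yield path
        if len(readable) < fetch_size:
            return  # raw result exhausted before the batch filled up
        served = len(readable)
        fetch_size *= 2  # client went past the batch: double and re-run


if __name__ == "__main__":
    for size in (2, 3, 100):
        print(size, fetch_batch(HITS, can_read, size))
```

In this sketch the estimate only becomes exact once the batch covers the whole
raw result; with a small fetch size, hits that were never access-checked are
still counted.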