[
https://issues.apache.org/jira/browse/OAK-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486759#comment-14486759
]
Chetan Mehrotra commented on OAK-2730:
--------------------------------------
bq. Wanted to at least put this out here before we rush to implement a work
around.
[~mmarth] No rush here ;) I just wanted to initiate a discussion on this issue. The
idea looks worth trying and benchmarking. We already use access control
within the index implementation to filter results more quickly in Lucene for
the suggestor.
Another point is that the approach used in JR2 has been in use for a long
time, and people are probably fine with it. So we should leave that choice to
the administrator and expose this as a configurable feature. At times
people do require a fast count, and we can surely provide one more quickly for
Lucene-based indexes.
> Faster result count estimation for QueryResult on lines of resultFetchSize
> support in JR2
> -----------------------------------------------------------------------------------------
>
> Key: OAK-2730
> URL: https://issues.apache.org/jira/browse/OAK-2730
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: query
> Reporter: Chetan Mehrotra
> Fix For: 1.3.0
>
>
> Currently in Oak, the time taken to fetch the result size for a query is
> proportional to the result size. This does not perform well when the result
> size is large. The complete traversal is required to perform the ACL checks
> that ensure the result count is *accurate*.
> JR2 used to support {{resultFetchSize}} (defaulting to the integer maximum).
> This was used to get an estimate of the possible result count, which might
> not be accurate.
> Per [~mreutegg], this feature worked as follows:
> {quote}
> If resultFetchSize is set to 50 then QueryEngine will initially collect up to
> 50 nodes the current session is allowed to read from the raw lucene result
> set. While doing that, it counts the number of nodes denied by access control
> checks. The result size reported is then calculated as:
> raw-lucene-result-size - number-of-nodes-denied. The resultFetchSize is
> doubled and the query executed again if a client iterates past the currently
> available nodes. If it is required to have an exact result size, then the
> configuration for 'resultFetchSize' can be increased to a much higher value.
> However, this has a severe performance impact for large result sets, because
> the query will now have to apply access control checks for the complete
> result set.
> {quote}
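
The estimation described in the quote above can be sketched roughly as follows. This is only an illustrative simulation, not the JR2 or Oak implementation: the function name {{estimate_result_size}}, the {{can_read}} callback, and the sample paths are all hypothetical, and the raw Lucene result set is modelled as a plain list of paths.

```python
from typing import Callable, List, Sequence, Tuple


def estimate_result_size(
    raw_hits: Sequence[str],
    can_read: Callable[[str], bool],
    fetch_size: int,
) -> Tuple[List[str], int]:
    """Collect up to fetch_size readable nodes and estimate the total size.

    Hypothetical helper: returns (readable nodes collected, estimated size).
    """
    readable: List[str] = []
    denied = 0
    for path in raw_hits:
        if len(readable) >= fetch_size:
            break  # stop early: the remaining hits are never access-checked
        if can_read(path):
            readable.append(path)
        else:
            denied += 1
    # Estimate = raw index hit count minus the denials seen so far.
    return readable, len(raw_hits) - denied


# Sample data (assumed): 100 raw hits, every tenth node denied by access control.
hits = [f"/content/n{i}" for i in range(100)]
denied_paths = {f"/content/n{i}" for i in range(0, 100, 10)}


def can_read(path: str) -> bool:
    return path not in denied_paths


nodes, estimate = estimate_result_size(hits, can_read, 50)
print(len(nodes), estimate)  # 50 94 (only 6 denials seen, so the count is an overestimate)

# Doubling the fetch size, as happens when a client iterates past the
# available nodes, checks every hit here and makes the count exact.
nodes, estimate = estimate_result_size(hits, can_read, 100)
print(len(nodes), estimate)  # 90 90
```

With a small fetch size, the access checks stop after 56 of the 100 hits, so the reported size (94) is higher than the true readable count (90). A fetch size large enough to cover the whole result set gives the exact count, at the cost of checking every hit.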