[ 
https://issues.apache.org/jira/browse/OAK-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486759#comment-14486759
 ] 

Chetan Mehrotra commented on OAK-2730:
--------------------------------------

bq. Wanted to at least put this out here before we rush to implement a 
workaround.

[~mmarth] No rush here ;) I just wanted to initiate a discussion on this issue. 
The idea looks worth trying and benchmarking. We already make use of access 
control within the index implementation to filter out results faster in Lucene 
for the suggestor. 

The other point is that the approach used in JR2 has been in use for a long 
time, and people are probably fine with it. So we should leave the choice to 
the administrator and expose this as a configurable feature. At times people 
do require a fast count, and we can certainly provide one more quickly for 
Lucene-based indexes. 

> Faster result count estimation for QueryResult on lines of resultFetchSize 
> support in JR2
> -----------------------------------------------------------------------------------------
>
>                 Key: OAK-2730
>                 URL: https://issues.apache.org/jira/browse/OAK-2730
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: query
>            Reporter: Chetan Mehrotra
>             Fix For: 1.3.0
>
>
> Currently in Oak, the time taken to fetch the result size for a query is 
> proportional to the result size. This does not perform well when the result 
> size is big. The complete traversal is required to perform ACL checks to 
> ensure that the result count is *accurate*.
> JR2 used to support {{resultFetchSize}} (defaults to integer max). This was 
> used to get an estimate of the possible result count, whereby the count might 
> not be accurate.
> Per [~mreutegg], this feature worked as follows:
> {quote}
> If resultFetchSize is set to 50 then QueryEngine will initially collect up to 
> 50 nodes the current session is allowed to read from the raw lucene result 
> set. While doing that, it counts the number of nodes denied by access control 
> checks. The result size reported is then calculated as: 
> raw-lucene-result-size - number-of-nodes-denied. The resultFetchSize is 
> doubled and the query is executed again if a client iterates past the 
> currently available nodes. If it is required to have an exact result size, 
> then the configuration for 'resultFetchSize' can be increased to a much 
> higher value. However, this has a severe performance impact for large result 
> sets, because the query will now have to apply access control checks for the 
> complete result set.
> {quote}
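
To make the quoted description concrete, here is a minimal sketch of the estimation logic in Java. It is not the actual JR2 or Oak code: the class {{ResultSizeEstimator}}, its methods, and the use of plain path strings with a {{Predicate}} standing in for the access control check are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of resultFetchSize-style estimation; not JR2 or Oak code. */
public class ResultSizeEstimator {

    /** Outcome of one fetch pass over the raw index hits. */
    public static final class Estimate {
        public final List<String> readable; // paths the session may read, collected so far
        public final int denied;            // hits rejected by access control in this pass
        public final int estimatedSize;     // raw result size minus denied count

        Estimate(List<String> readable, int denied, int estimatedSize) {
            this.readable = readable;
            this.denied = denied;
            this.estimatedSize = estimatedSize;
        }
    }

    /**
     * Collects up to fetchSize readable paths from the raw hits and counts
     * the hits denied along the way. Hits beyond the fetch window are never
     * checked, so the reported size is only an estimate.
     */
    public static Estimate fetch(List<String> rawHits,
                                 Predicate<String> canRead,
                                 int fetchSize) {
        List<String> readable = new ArrayList<>();
        int denied = 0;
        for (String path : rawHits) {
            if (readable.size() >= fetchSize) {
                break; // window is full; stop applying access control checks
            }
            if (canRead.test(path)) {
                readable.add(path);
            } else {
                denied++;
            }
        }
        return new Estimate(readable, denied, rawHits.size() - denied);
    }

    /** Fetch size to use when a client iterates past the collected nodes. */
    public static int nextFetchSize(int fetchSize) {
        // Double the window, guarding against integer overflow.
        return fetchSize > Integer.MAX_VALUE / 2 ? Integer.MAX_VALUE : fetchSize * 2;
    }
}
```

With a small fetch size, the estimate can exceed the true count because denied nodes outside the window are not subtracted. With a fetch size of integer max, every hit is checked and the count becomes exact, at the cost described above.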



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
