Hi,

On Mon, Jun 23, 2014 at 1:58 PM, Thomas Mueller <muel...@adobe.com> wrote:
>>It's more than access control. The query engine needs to double-check
>>the constraints of the query for each matching path before passing
>>that node to the client (see the constraint.evaluate() call in [1]). I
>>don't see any easy way to avoid that step without major refactoring.
>
> If there is no other constraint, then no additional checks are needed.

That's not correct. For example, if I query for a value with more than
100 characters, the PropertyIndex may return paths that actually won't
match the query. This requirement to double-check the results is also
described in the QueryIndex.query() contract: "An implementation
should only filter the result if it can do so easily and efficiently;
the query engine will verify the data again (in memory) and check for
access rights."

>>Are there any potential indexes where the
>>AdvancedQueryIndex.getCostPerEntry() method (at least the way it's now
>>used in [2]) should return a value that's different from 1?
>
> Yes, for example an index that keep all (relevant) entries in memory, the
> cost should be close to zero.

Makes sense if we adapt the calculation as suggested in OAK-1910.

> Yes. The AdvancedQueryIndex has separate methods so that the query engine
> can better calculate the cost (getCostPerExecution, getCostPerEntry,
> getEstimatedEntryCount). The query could contain a limit (let's say 100)
> which should also be taken into account, and possibly an "order by"
> restriction. Plus the query engine could take into account that typically,
> only the first 50 entries are read (optimize for "fast first 50 entries" -
> see also
> http://stackoverflow.com/questions/1308946/should-i-use-query-hint-fast-number-rows-fastfirstrow ).

Agreed.

>>The index-level entry cost estimates only become relevant when the
>>cost of returning a path is more than a fraction of the cost of
>>loading a node. I don't believe that's the case for any reasonable
>>index implementations.
>
> It depends on whether (and when) the node needs to be loaded.

The node always needs to be loaded, see above.

>>I'm just worried about potential confusion about what the
>>getCostPerEntry() method (as used in [2]) should return. The value is
>>currently only set in [3], but there the estimate seems to be based on
>>the relative performance of the *index lookup*, not the overall
>>performance of a query.
>>I believe either [2] or [3] should be adjusted
>>to fix the cost calculation.
>
> Yes, you are right. Currently the formula assumes that the query engine
> doesn't load the node. That's not correct. I created OAK-1910 to track
> this.

Thanks!

BR,

Jukka Zitting
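
PS. To make the double-check point above concrete, here is a minimal
Java sketch, not Oak code. Every name in it (IndexRecheckSketch, NODES,
indexLookup, query, estimatedCost, nodeLoadCost) is hypothetical, and
the cost formula is only an assumption about the general shape of such
an estimate, not the calculation proposed in OAK-1910. The index
returns candidate paths without checking the value length, so the
engine has to load each node and re-evaluate the constraint; the
estimate then charges a per-node load cost on top of the index's
per-entry cost.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class IndexRecheckSketch {

    // Hypothetical stand-in for the repository: path -> property value.
    static final Map<String, String> NODES = new LinkedHashMap<>();
    static {
        NODES.put("/a", "short");
        NODES.put("/b", "x".repeat(150));
        NODES.put("/c", "y".repeat(20));
    }

    // Hypothetical index: it knows which paths carry the property, but it
    // cannot filter on value length, so it returns every candidate path.
    static List<String> indexLookup() {
        return new ArrayList<>(NODES.keySet());
    }

    // Engine side: load each candidate node and evaluate the constraint
    // again before handing the path to the client.
    static List<String> query(Predicate<String> constraint) {
        List<String> result = new ArrayList<>();
        for (String path : indexLookup()) {
            String value = NODES.get(path); // stands in for loading the node
            if (value != null && constraint.test(value)) {
                result.add(path);
            }
        }
        return result;
    }

    // Assumed shape of a cost estimate that accounts for the node load:
    // a fixed per-execution cost plus, for every entry, the index's
    // per-entry cost and the cost of loading the node.
    static double estimatedCost(double costPerExecution, double costPerEntry,
                                double nodeLoadCost, long entryCount) {
        return costPerExecution + entryCount * (costPerEntry + nodeLoadCost);
    }

    public static void main(String[] args) {
        // Only /b has a value longer than 100 characters.
        System.out.println(query(v -> v.length() > 100));
        // In-memory index: per-entry cost near zero, node loads dominate.
        System.out.println(estimatedCost(1, 0, 1, 100));
    }
}
```

With a per-entry cost close to zero (the in-memory index case), the
estimate in this sketch is dominated by the node loads, which is why
the per-entry cost alone says little about the overall query cost.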