Re: [DISCUSS] - QueryIndex selection

Jukka Zitting Mon, 23 Jun 2014 11:35:04 -0700

Hi,

On Mon, Jun 23, 2014 at 1:58 PM, Thomas Mueller <muel...@adobe.com> wrote:
>>It's more than access control. The query engine needs to double-check
>>the constraints of the query for each matching path before passing
>>that node to the client (see the constraint.evaluate() call in [1]). I
>>don't see any easy way to avoid that step without major refactoring.
>
> If there is no other constraint, then no additional checks are needed.


That's not correct. For example, if I query for a value with more than
100 characters, the PropertyIndex may return paths that actually won't
match the query.

This requirement to double-check the results is also described in the
QueryIndex.query() contract: "An implementation should only filter the
result if it can do so easily and efficiently; the query engine will
verify the data again (in memory) and check for access rights."
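
To illustrate why the re-check is needed, here is a minimal sketch. It
is not the actual PropertyIndex code: the class RecheckSketch, its
methods, and the assumption that the index keys entries on only the
first 100 characters of a value are all hypothetical, chosen to mirror
the example above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Oak code.
final class RecheckSketch {
    // Assumption: the index only stores the first 100 characters of a value.
    static final int MAX_KEY_LENGTH = 100;

    // Toy index: (possibly truncated) value -> paths of candidate nodes.
    private final Map<String, List<String>> index = new HashMap<>();
    // Toy node store: path -> full property value.
    private final Map<String, String> nodes = new HashMap<>();

    static String key(String value) {
        return value.length() > MAX_KEY_LENGTH
                ? value.substring(0, MAX_KEY_LENGTH)
                : value;
    }

    void add(String path, String value) {
        nodes.put(path, value);
        index.computeIfAbsent(key(value), k -> new ArrayList<>()).add(path);
    }

    // What the index returns: a superset of the real matches.
    List<String> candidates(String value) {
        List<String> paths = index.get(key(value));
        return paths == null ? Collections.<String>emptyList()
                             : new ArrayList<>(paths);
    }

    // What the query engine does: load each node and verify the constraint.
    List<String> query(String value) {
        List<String> result = new ArrayList<>();
        for (String path : candidates(value)) {
            if (value.equals(nodes.get(path))) {
                result.add(path);
            }
        }
        return result;
    }
}
```

Two values that share the same first 100 characters land under the
same key, so candidates() returns both paths and only the re-check in
query() removes the false positive.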

>>Are there any potential indexes where the
>>AdvancedQueryIndex.getCostPerEntry() method (at least the way it's now
>>used in [2]) should return a value that's different from 1?
>
> Yes. For example, for an index that keeps all (relevant) entries in
> memory, the cost should be close to zero.

Makes sense if we adapt the calculation as suggested in OAK-1910.

> Yes. The AdvancedQueryIndex has separate methods so that the query engine
> can better calculate the cost (getCostPerExecution, getCostPerEntry,
> getEstimatedEntryCount). The query could contain a limit (let's say 100)
> which should also be taken into account, and possibly an "order by"
> restriction. Plus the query engine could take into account that typically,
> only the first 50 entries are read (optimize for "fast first 50 entries" -
> see also
> http://stackoverflow.com/questions/1308946/should-i-use-query-hint-fast-number-rows-fastfirstrow ).

Agreed.
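
As a rough sketch of how the three values and a limit could be
combined: the formula below is an assumption for illustration, not the
calculation currently used in [2], and the class CostSketch and its
parameter names are hypothetical.

```java
// Hypothetical sketch, not Oak code.
final class CostSketch {
    /**
     * Assumed formula: a fixed per-execution cost plus a per-entry cost
     * for each entry that is expected to be read. A query limit caps the
     * number of entries read.
     */
    static double estimate(double costPerExecution,
                           double costPerEntry,
                           long estimatedEntryCount,
                           long limit) {
        long entriesRead = Math.min(estimatedEntryCount, limit);
        return costPerExecution + costPerEntry * entriesRead;
    }

    public static void main(String[] args) {
        // 10000 estimated entries, but the query has "limit 100".
        System.out.println(estimate(1.0, 1.0, 10000L, 100L));
        // No limit: every estimated entry counts.
        System.out.println(estimate(1.0, 1.0, 10000L, Long.MAX_VALUE));
    }
}
```

The same cap could model the "fast first 50 entries" idea by passing
50 as the limit when the query itself does not specify one.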

>>The index-level entry cost estimates only become relevant when the
>>cost of returning a path is more than a fraction of the cost of
>>loading a node. I don't believe that's the case for any reasonable
>>index implementations.
>
> It depends on whether (and when) the node needs to be loaded.

The node always needs to be loaded, see above.
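
A small sketch of what that means for the estimate. The numbers and
the class NodeLoadCostSketch are made up for illustration, and the
formula is an assumption, not the fix proposed in OAK-1910: if every
returned path costs one node load, the index's own per-entry cost
only changes the total by a fraction.

```java
// Hypothetical sketch, not Oak code.
final class NodeLoadCostSketch {
    // Assumed per-entry cost: index lookup cost plus the cost of loading
    // the node so that the query engine can re-check the constraints.
    static double perEntryCost(double indexCostPerEntry, double nodeLoadCost) {
        return indexCostPerEntry + nodeLoadCost;
    }

    static double total(double costPerExecution,
                        double indexCostPerEntry,
                        double nodeLoadCost,
                        long entries) {
        return costPerExecution
                + entries * perEntryCost(indexCostPerEntry, nodeLoadCost);
    }

    public static void main(String[] args) {
        // In-memory index (per-entry cost 0) vs. a slower index (0.5),
        // both with an assumed node load cost of 1 per entry.
        System.out.println(total(0.0, 0.0, 1.0, 100L));
        System.out.println(total(0.0, 0.5, 1.0, 100L));
    }
}
```

Under these assumptions even a free index lookup cannot bring the
per-entry cost below the node load cost.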

>>I'm just worried about potential confusion about what the
>>getCostPerEntry() method (as used in [2]) should return. The value is
>>currently only set in [3], but there the estimate seems to be based on
>>the relative performance of the *index lookup*, not the overall
>>performance of a query. I believe either [2] or [3] should be adjusted
>>to fix the cost calculation.
>
> Yes, you are right. Currently the formula assumes that the query engine
> doesn't load the node. That's not correct. I created OAK-1910 to track
> this.

Thanks!

BR,

Jukka Zitting
