Re: [DISCUSS] - QueryIndex selection

Tommaso Teofili Wed, 18 Jun 2014 08:33:11 -0700

Hi,

2014-06-18 16:02 GMT+02:00 Jukka Zitting <[email protected]>:

> Hi,
>
> On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili
> <[email protected]> wrote:
> > should we just return the number of estimated entries for the cost?
>
> Yes, that's what I think the contract should be.
>

ok, that's different from what Thomas suggests, right? Just entry
estimates, no network roundtrips / asynchronous index penalties, etc.

>
> > My other concern on this point is that it's not granted, in my opinion,
> > that the index returning less entries would be the faster.
>
> The key concern here isn't the performance of the index, but the
> overall performance of the query. If an index returns n paths for a
> given query, then the amount of work done by the query engine will be
> O(n) as it needs to look up each path and double-check the constraints
> on those nodes (or O(n log n) if it also needs to sort the results).
> Since I don't expect any reasonable index to perform worse than O(n),
> returning n as the cost estimate seems like the best thing to do.
>

ok, under such perspective the index is not returning a cost, but how many
nodes it will provide to the engine, the cost of the query is then a
function of the number of entries.
At the moment node number estimates and performance of the index aspects
seem kind of merged into the "getCost".
Then we should probably decouple (at least) the concepts of:
1. how many nodes the index will return for this query (as an estimate)
2. how fast in retrieving the estimated nodes the index is

Even with this distinction we would have to make some choices as given two
indices returning the same number of estimated nodes for the same query, (I
assume) the fastest should be chosen, but if two indices return two
different node number estimates (e.g. that's likely if you have two
different full text indices being able to handle the same query), which one
should be chosen and why?

>
> On a related note, I tend to disagree with the additional cost factors
> introduced in AdvancedQueryIndex/IndexPlan. In practically all cases
> the index lookups will be asymptotically faster than looking up all
> the paths returned by the index.

true, but then again the same question comes to my mind: how we select the
index to be used? Are we just interested in how many nodes it will return
or how much time it takes to do that plays some role in selection?

> Thus I don't think there is value in
> trying to make detailed estimates about the cost of the index lookup.
>

Thanks to all, I will probably need some more time to digest all the
thoughts and maybe I'll come back with more questions / ideas.
Regards,
Tommaso

>
> BR,
>
> Jukka Zitting
>

Re: [DISCUSS] - QueryIndex selection

Reply via email to