Hi, 2014-06-18 16:02 GMT+02:00 Jukka Zitting <[email protected]>:
> Hi, > > On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili > <[email protected]> wrote: > > should we just return the number of estimated entries for the cost? > > Yes, that's what I think the contract should be. > ok, that's different from what Thomas suggests, right? Just entry estimates, no network roundtrips / asynchronous index penalties, etc. > > > My other concern on this point is that it's not granted, in my opinion, > > that the index returning less entries would be the faster. > > The key concern here isn't the performance of the index, but the > overall performance of the query. If an index returns n paths for a > given query, then the amount of work done by the query engine will be > O(n) as it needs to look up each path and double-check the constraints > on those nodes (or O(n log n) if it also needs to sort the results). > Since I don't expect any reasonable index to perform worse than O(n), > returning n as the cost estimate seems like the best thing to do. > ok, under such perspective the index is not returning a cost, but how many nodes it will provide to the engine, the cost of the query is then a function of the number of entries. At the moment node number estimates and performance of the index aspects seem kind of merged into the "getCost". Then we should probably decouple (at least) the concepts of: 1. how many nodes the index will return for this query (as an estimate) 2. how fast in retrieving the estimated nodes the index is Even with this distinction we would have to make some choices as given two indices returning the same number of estimated nodes for the same query, (I assume) the fastest should be chosen, but if two indices return two different node number estimates (e.g. that's likely if you have two different full text indices being able to handle the same query), which one should be chosen and why? > > On a related note, I tend to disagree with the additional cost factors > introduced in AdvancedQueryIndex/IndexPlan. In practically all cases > the index lookups will be asymptotically faster than looking up all > the paths returned by the index. true, but then again the same question comes to my mind: how we select the index to be used? Are we just interested in how many nodes it will return or how much time it takes to do that plays some role in selection? > Thus I don't think there is value in > trying to make detailed estimates about the cost of the index lookup. > Thanks to all, I will probably need some more time to digest all the thoughts and maybe I'll come back with more questions / ideas. Regards, Tommaso > > BR, > > Jukka Zitting >
