Hi,

On Wed, Jun 18, 2014 at 11:31 AM, Tommaso Teofili
<tommaso.teof...@gmail.com> wrote:
> 2014-06-18 16:02 GMT+02:00 Jukka Zitting <jukka.zitt...@gmail.com>:
>> On Wed, Jun 18, 2014 at 4:26 AM, Tommaso Teofili
>> <tommaso.teof...@gmail.com> wrote:
>> > should we just return the number of estimated entries for the cost?
>>
>> Yes, that's what I think the contract should be.
>
> ok, that's different from what Thomas suggests, right? Just entry
> estimates, no network roundtrips / asynchronous index penalties, etc.

Right. I don't believe the cost of the index lookup is significant (at
least in the asymptotic sense) compared to the overall cost of
executing a query.

> ok, under such perspective the index is not returning a cost, but how many
> nodes it will provide to the engine, the cost of the query is then a
> function of the number of entries.

Exactly.

> At the moment node number estimates and performance of the index aspects
> seem kind of merged into the "getCost".
> Then we should probably decouple (at least) the concepts of:
> 1. how many nodes the index will return for this query (as an estimate)
> 2. how fast in retrieving the estimated nodes the index is

I would further argue that point 2 is mostly irrelevant for any decent
index. The only case where I would expect index performance to show up
as a significant factor is when n is small, but the best way to
optimize such queries is probably to just cache the results per query
instead of trying to make informed guesses about expected index
performance.

> Even with this distinction we would have to make some choices as given two
> indices returning the same number of estimated nodes for the same query, (I
> assume) the fastest should be chosen, but if two indices return two
> different node number estimates (e.g. that's likely if you have two
> different full text indices being able to handle the same query), which one
> should be chosen and why?

Unless there are other contributing factors (like preferring a
synchronous index over an asynchronous one, or an explicit preference
by a client), it shouldn't really matter much which one of equally
costly indexes is being selected.

BR,

Jukka Zitting
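
P.S. To make the selection rule above concrete, here is a minimal
sketch, not Oak code. It assumes each index reports only an estimated
entry count for the query, and that the engine breaks ties by
preferring a synchronous index. The names IndexSelection, Candidate,
estimatedEntries, synchronous and select are all hypothetical and do
not correspond to any existing Oak API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class IndexSelection {

    /** Hypothetical stand-in for one index's answer to a query; not an Oak type. */
    public static final class Candidate {
        public final String name;
        public final long estimatedEntries; // estimated number of nodes the index would return
        public final boolean synchronous;   // false for an asynchronously updated index

        public Candidate(String name, long estimatedEntries, boolean synchronous) {
            this.name = name;
            this.estimatedEntries = estimatedEntries;
            this.synchronous = synchronous;
        }
    }

    /**
     * Picks the candidate with the fewest estimated entries. When two
     * candidates report the same estimate, a synchronous index is preferred
     * over an asynchronous one. Any remaining tie is treated as arbitrary.
     * Lookup speed is deliberately not part of the comparison.
     */
    public static Optional<Candidate> select(List<Candidate> candidates) {
        return candidates.stream().min(
                Comparator.comparingLong((Candidate c) -> c.estimatedEntries)
                        .thenComparingInt(c -> c.synchronous ? 0 : 1));
    }

    public static void main(String[] args) {
        // Made-up estimates for a single query, purely for illustration.
        List<Candidate> candidates = Arrays.asList(
                new Candidate("propertyIndex", 500, true),
                new Candidate("fullTextA", 120, false),
                new Candidate("fullTextB", 120, true));

        select(candidates).ifPresent(c -> System.out.println(c.name));
    }
}
```

The comparison uses only the entry estimate plus the sync/async
tie-breaker. An explicit client preference, if wanted, could be added
as a further key in the same comparator chain.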