Hi,

>should we just return the number of estimated entries for the cost?

For Lucene, the property index, the ordered index, and the node type
index: yes. 

For Solr, the cost per index lookup (not per entry) is probably a bit
higher, because there is a network round trip, especially if Solr is
remote. That's a fixed offset, so the cost could be: 1 +
estimatedEntryCount.

For in-memory indexes (that keep all entries in memory), no. One example
is an in-memory index of transient UUIDs. In this case the cost might
depend on whether the UUID is in memory or not. Calculating the cost in
this case is quick, as it's just an in-memory lookup. Therefore, the cost
could be very close to 0.
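
To make the three cases concrete, here is a rough sketch. All names
(IndexCostSketch, REMOTE_ROUND_TRIP, the method names) are made up for
illustration and are not an existing API. The infinite cost for a UUID
that is not in memory is my assumption, meaning "this index can't answer
the lookup"; the 0.01 is just a placeholder for "very close to 0".

```java
public class IndexCostSketch {

    // Hypothetical fixed offset for one network round trip to a remote index.
    static final double REMOTE_ROUND_TRIP = 1.0;

    // Hypothetical placeholder for "very close to 0".
    static final double IN_MEMORY_COST = 0.01;

    // Local index (Lucene, property, ordered, node type):
    // the cost is just the estimated number of entries.
    static double localIndexCost(long estimatedEntryCount) {
        return estimatedEntryCount;
    }

    // Remote index (for example Solr): a fixed offset per lookup,
    // not per entry, plus the estimated number of entries.
    static double remoteIndexCost(long estimatedEntryCount) {
        return REMOTE_ROUND_TRIP + estimatedEntryCount;
    }

    // In-memory index of transient UUIDs: near zero if the UUID is held
    // in memory. Assumption: otherwise the index reports an infinite cost
    // so that it is never selected.
    static double inMemoryIndexCost(boolean uuidInMemory) {
        return uuidInMemory ? IN_MEMORY_COST : Double.POSITIVE_INFINITY;
    }
}
```

With 100 estimated entries this gives 100 for the local index and 101 for
the remote one, so the fixed offset only matters when the entry counts are
close.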

>Right. I don't believe the cost of the index lookup is significant (at
>least in the asymptotic sense) compared to the overall cost of
>executing a query.

Sorry, I don't understand. The cost of the index lookup *is* significant,
of course, especially if the nodes are not in the cache (which is very
common). In that case, the index lookup can contribute about 50% of the
overall cost of a query (assuming one disk read is needed for the index
lookup, and one disk read is needed to read the actual node). For a
covering index - 
http://stackoverflow.com/questions/62137/what-is-a-covered-index - the
index lookup is nearly 100% of the query cost.
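
The arithmetic behind those two numbers, as a small sketch. It assumes a
simplified model where only disk reads count and every read costs the
same; the class and method names are made up for illustration.

```java
public class LookupShareSketch {

    // Fraction of a query's disk reads that is spent on the index lookup.
    static double indexShare(int indexReads, int nodeReads) {
        return (double) indexReads / (indexReads + nodeReads);
    }

    public static void main(String[] args) {
        // One read for the index, one read for the node itself.
        System.out.println(indexShare(1, 1)); // prints 0.5
        // Covering index: the node does not need to be read at all.
        System.out.println(indexShare(1, 0)); // prints 1.0
    }
}
```

In this simplified model the covering index comes out at exactly 100%; in
practice there is some other work, hence "nearly".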

> in the asymptotic sense

Sorry, could you translate that for me, please? :-)

>>ok, under such perspective the index is not returning a cost, but how
>>many
>> nodes it will provide to the engine, the cost of the query is then a
>> function of the number of entries.
>
>Exactly.

As I wrote above, it depends on the nature of the index (in-memory,
disk-based, remote or local).

>instead of trying to make informed guesses about expected index
>performance.

Trying to make an informed guess about the cost is the whole point of a
cost-based optimizer - http://en.wikipedia.org/wiki/Query_optimization

>it shouldn't really matter much which one of equally
>costly indexes is being selected.

Sure, if the cost of two indexes is the same, then it doesn't matter which
index is used. That's the reason to make sure the returned cost is
somewhat accurate, and the reason to use the same scale (unit of
measurement) in all index implementations.
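
A minimal sketch of why the common scale matters: the selection step can
only compare the numbers it is given. The class, the method, and the index
names used as map keys are hypothetical, not an existing API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CheapestIndexSketch {

    // Returns the name of the index with the lowest reported cost.
    // On a tie the first entry in iteration order wins; returns null if
    // the map is empty or every index reports an infinite cost.
    static String cheapest(Map<String, Double> costByIndex) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : costByIndex.entrySet()) {
            if (e.getValue() < bestCost) {
                bestCost = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("lucene", 100.0);  // estimatedEntryCount
        costs.put("solr", 101.0);    // 1 + estimatedEntryCount
        costs.put("uuid", 0.01);     // in-memory, close to 0
        System.out.println(cheapest(costs)); // prints uuid
    }
}
```

If one implementation returned, say, milliseconds and another returned
entry counts, this comparison would still pick a "winner", but the choice
would be meaningless.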

Regards,
Thomas
