Hi,

>should we just return the number of estimated entries for the cost?

For Lucene, the property index, the ordered index, and the node type index: yes.

For Solr, the cost per index lookup (not per entry) is probably a bit higher, because there is a network round trip, especially if Solr is remote. That's a fixed offset, so the cost could be: 1 + estimatedEntryCount.

For in-memory indexes (that keep all entries in memory), no. One example is an in-memory index of transient UUIDs. In this case the cost might depend on whether the UUID is in memory or not. Calculating the cost in this case is quick, as it's just an in-memory lookup. Therefore, the cost could be very close to 0.

>Right. I don't believe the cost of the index lookup is significant (at
>least in the asymptotic sense) compared to the overall cost of
>executing a query.

Sorry, I don't understand. The cost of the index lookup *is* significant of course, especially if the nodes are not in the cache (which is very common). In that case, the index lookup can contribute about 50% of the overall cost of a query (if, let's assume, one disk read is needed for the index lookup, and one disk read is needed to read the actual node). For a covering index - http://stackoverflow.com/questions/62137/what-is-a-covered-index - the index lookup is nearly 100% of the query cost.

> in the asymptotic sense

Sorry, could you translate that for me please? :-)

>>ok, under such perspective the index is not returning a cost, but how many
>> nodes it will provide to the engine, the cost of the query is then a
>> function of the number of entries.
>
>Exactly.

As I wrote above, it depends on the nature of the index (in-memory, disk based, remote or local).

>instead of trying to make informed guesses about expected index
>performance.

Trying to make an informed guess about the cost is the whole point of a cost based optimizer - http://en.wikipedia.org/wiki/Query_optimization

>it shouldn't really matter much which one of equally
>costly indexes is being selected.

Sure, if the cost of two indexes is the same, then it doesn't matter which index is used. That's the reason to make sure the returned cost is somewhat accurate, and the reason to use the same scale (units of measurement) in all index implementations.

Regards,
Thomas
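
P.S. To make the "same scale" point a bit more concrete, here is a minimal sketch in Java of how the three cases above could map onto one cost unit. Everything in it is an assumption for illustration only: the names (IndexCostSketch, IndexKind, Candidate, estimateCost, cheapest) are hypothetical and not an existing API, the remote offset of 1 is just the "1 + estimatedEntryCount" example from above, and the 0.01 for in-memory lookups is an arbitrary stand-in for "very close to 0".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IndexCostSketch {

    /** Hypothetical index categories, following the cases discussed in this mail. */
    enum IndexKind { LOCAL_DISK, REMOTE, IN_MEMORY }

    /** Hypothetical description of one candidate index for a given query. */
    static final class Candidate {
        final String name;
        final IndexKind kind;
        final long estimatedEntryCount;

        Candidate(String name, IndexKind kind, long estimatedEntryCount) {
            this.name = name;
            this.kind = kind;
            this.estimatedEntryCount = estimatedEntryCount;
        }
    }

    // Assumption: one network round trip costs about as much as one entry.
    static final double REMOTE_LOOKUP_OFFSET = 1.0;
    // Assumption: arbitrary small value standing in for "very close to 0".
    static final double IN_MEMORY_COST = 0.01;

    /** Returns a cost on a shared scale: roughly "number of entries to read". */
    static double estimateCost(Candidate c) {
        switch (c.kind) {
            case REMOTE:
                // fixed offset per lookup, plus one unit per estimated entry
                return REMOTE_LOOKUP_OFFSET + c.estimatedEntryCount;
            case IN_MEMORY:
                // just an in-memory lookup, independent of the entry count here
                return IN_MEMORY_COST;
            default:
                // local, disk based index: the estimated entry count itself
                return c.estimatedEntryCount;
        }
    }

    /** Picks the candidate with the lowest estimated cost. */
    static Candidate cheapest(List<Candidate> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(IndexCostSketch::estimateCost))
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    public static void main(String[] args) {
        List<Candidate> candidates = Arrays.asList(
                new Candidate("lucene", IndexKind.LOCAL_DISK, 100),
                new Candidate("solr", IndexKind.REMOTE, 100),
                new Candidate("transientUuid", IndexKind.IN_MEMORY, 1));
        for (Candidate c : candidates) {
            System.out.println(c.name + " -> " + estimateCost(c));
        }
        System.out.println("selected: " + cheapest(candidates).name);
    }
}
```

The comparison in cheapest() is only meaningful because every branch of estimateCost() returns a value in the same unit. If one index reported milliseconds and another reported entry counts, the minimum would be meaningless.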
