Hi, I looked a bit into how MongoDB selects indexes (query plans), and I think we could take some inspiration from it.
So, the way MongoDB does it, as far as I understand:

* The query gets parsed into an abstract syntax tree (so that parameters can be stripped out).
* The first time this query shape is seen, the query is executed against *all* available indexes.
* The fastest index is put into a cache, so that when the same query (same shape, regardless of parameter values) comes in again, only that fastest index is used (looked up from the cache).
* After a certain number of modifications, that index-selection cache is flushed and the process starts from the beginning.

What I dislike about this process is that the first execution puts a lot of extra load on the system (because all indexes run the query). Moreover, that first execution could be disturbed by noise, so the selection could be wrong.

What I like, though (if we ignore the noise issue above), is that the selected index is the one that has actually proven to be the fastest.

So, for Oak: maybe we could enhance the deterministic selection process we have right now. We could run queries in the background to determine whether the cost factors that the indexes claim are actually correct (and if not, correct them in the query engine). Those background queries could be the ones most often executed by users on that repository for which multiple indexes are capable of answering the query.

Consider this scenario: you have the same nodes indexed both in a local property index (on the machine that also serves requests) and in a remote SolrCloud cluster. If we only reason about index size etc., we can never account for the fact that the local machine's index might be much slower than the external machines that are used exclusively for answering queries. We could account for it, though, if we actually ran those queries a number of times on both indexes.

Cheers
Michael
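P.S. To make the MongoDB-style mechanism above concrete, here is a minimal sketch of the plan cache: winner per parameter-stripped query shape, flushed after a threshold of writes. All names (PlanCache, recordWinner, etc.) are invented for illustration, not MongoDB's or Oak's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not real MongoDB/Oak code.
class PlanCache {
    // Maps a parameter-stripped query shape to the index that won the race.
    private final Map<String, String> bestIndexByShape = new ConcurrentHashMap<>();
    private final AtomicLong writesSinceFlush = new AtomicLong();
    private final long flushThreshold;

    PlanCache(long flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Returns the cached winner for this shape, or null if we must race
    // all indexes again.
    String cachedIndex(String queryShape) {
        return bestIndexByShape.get(queryShape);
    }

    // Record the index that actually ran fastest for this shape.
    void recordWinner(String queryShape, String indexName) {
        bestIndexByShape.put(queryShape, indexName);
    }

    // Called on every modification; after enough writes the whole
    // selection cache is flushed and the process starts over.
    void onWrite() {
        if (writesSinceFlush.incrementAndGet() >= flushThreshold) {
            bestIndexByShape.clear();
            writesSinceFlush.set(0);
        }
    }
}
```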
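P.P.S. The background-calibration idea for Oak could look roughly like this: time the same query a number of times on each index that claims to handle it, and scale each index's claimed cost by the observed/claimed ratio. Everything here (CostCalibrator, the cost maps, the timing suppliers) is a hypothetical sketch, not an existing Oak interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of cost-factor calibration, not real Oak code.
class CostCalibrator {
    // Per-index correction factor: observed cost / claimed cost.
    private final Map<String, Double> correction = new HashMap<>();

    // For each index, run the query 'runs' times (each Supplier returns the
    // elapsed time of one run) and derive a correction from the average.
    void calibrate(Map<String, Double> claimedCost,
                   Map<String, Supplier<Long>> timedRun, int runs) {
        for (Map.Entry<String, Supplier<Long>> e : timedRun.entrySet()) {
            long total = 0;
            for (int i = 0; i < runs; i++) {
                total += e.getValue().get();
            }
            double observed = total / (double) runs;
            correction.put(e.getKey(), observed / claimedCost.get(e.getKey()));
        }
    }

    // The query engine would use the corrected cost instead of the
    // index's raw claim when picking a plan.
    double correctedCost(String index, double claimed) {
        return claimed * correction.getOrDefault(index, 1.0);
    }
}
```

With measurements like these, the local property index from the SolrCloud scenario above could end up with a higher corrected cost than the remote cluster even though its claimed cost (based on size etc.) is lower.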
