Hello,

Using entryCount was our first option, but we decided to modify costPerEntry instead. Basically, these are the reasons:
- It can only be used to reduce the cost. It can't be used to increase the index cost, because the index planner selects Math.min(definition.getEntryCount(), getReader().numDocs()). Thus it is useless to set an entryCount value bigger than the index size.
- The entry count for the other competing indexes will in general be 1000, since it will not be defined and all of our indexes contain more than 100 documents. Remember that if entryCount is not defined, definition.getEntryCount() will use DEFAULT_ENTRY_COUNT, which is 1000, and the planner will take Math.min(definition.getEntryCount(), getReader().numDocs()). In other words, entryCount would have to be set below 1000 to make the index "very attractive".

Using costPerEntry allows us to increase the cost, so that the index is not picked for every user's queries. This way, only selected applications performing native Lucene queries will use it.

In this apparent maze that is the planner, it is still not clear to me which index is selected in case of a cost tie.

I guess part of my misconception about how queries work in Oak comes from my experience building search engines with Solr and Lucene. When we develop a search component with Solr or Lucene, all the necessary data (all the documents and fields) is stored within the index (or indexes). Every query is therefore routed deliberately to the index we want to use, and that index alone is enough to produce the complete response we expect. In Oak, it seems that indexes are used to speed up queries, but not as a complete source of information that must answer every query clause.

Well, I'm still figuring out the big picture of the querying process in Oak.

Regards.

On Tue, May 16, 2017 at 6:23 PM, Chetan Mehrotra <[email protected]> wrote:

> Having same asset indexed twice would add overhead in terms of async
> indexing speed and space consumption by index. So if possible avoid
> such a setup
>
> > We could assume that we always add a path restriction, but I'm not sure how
> > index movement can help. I mean, both indexes contains documents under
> > /nodeA/nodeB/nodeC so any query under this path is satisfied by both
>
> One way it would help is that entryCount for second index would be
> smaller compared to first and hence queries with path restriction for
> second index would have lesser cost
>
> Chetan Mehrotra
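
As a rough illustration of the entryCount versus costPerEntry reasoning in my message, here is a minimal Java sketch. It is not Oak code: the class IndexCostSketch and its methods are hypothetical, and it assumes a simplified cost model (estimated entries multiplied by costPerEntry, with a costPerEntry of 1.0 for the competing indexes). Only the Math.min(...) cap and the default of 1000 come from the discussion above.

```java
// Hypothetical, simplified model of the planner behaviour discussed above.
// NOT the Oak implementation: the cost formula is an assumption.
class IndexCostSketch {

    // Default used when entryCount is not configured (value taken from the thread).
    static final long DEFAULT_ENTRY_COUNT = 1000L;

    // Mirrors Math.min(definition.getEntryCount(), getReader().numDocs()).
    // A null entryCount stands for "not defined on the index definition".
    static long estimatedEntries(Long entryCount, long numDocs) {
        long configured = (entryCount != null) ? entryCount : DEFAULT_ENTRY_COUNT;
        return Math.min(configured, numDocs);
    }

    // Assumed cost model: estimated entries multiplied by costPerEntry.
    static double cost(Long entryCount, long numDocs, double costPerEntry) {
        return estimatedEntries(entryCount, numDocs) * costPerEntry;
    }

    public static void main(String[] args) {
        long numDocs = 50_000L; // illustrative index size

        // Competing index: nothing configured, so min(1000, numDocs) * 1.0.
        System.out.println("competitor:        " + cost(null, numDocs, 1.0));

        // A large entryCount is capped at the index size.
        System.out.println("entryCount=5M:     " + estimatedEntries(5_000_000L, numDocs));

        // A small entryCount lowers the cost and makes the index attractive.
        System.out.println("entryCount=500:    " + cost(500L, numDocs, 1.0));

        // A higher costPerEntry raises the cost without touching entryCount.
        System.out.println("costPerEntry=10:   " + cost(null, numDocs, 10.0));
    }
}
```

Under these assumptions, an index with costPerEntry set to 10 loses against any competitor left at the defaults, while entryCount alone can never push the estimate past the number of documents in the index.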
