Hello,

Using entryCount was our first option, but we decided to modify
costPerEntry instead, for these reasons:

   - It can only be used to reduce the cost. It can't be used to increase
   the index cost, as the index planner selects
   Math.min(definition.getEntryCount(), getReader().numDocs()). Thus it is
   useless to set an entryCount value bigger than the index size.
   - The entry count for the other competing indexes will, in general, be
   1000, as it will not be defined and all of our indexes contain more
   than 100 documents. Remember that if entryCount is not defined, then
   definition.getEntryCount() will use DEFAULT_ENTRY_COUNT, which is 1000,
   and the planner will take
   Math.min(definition.getEntryCount(), getReader().numDocs()). In other
   words, entryCount has to be set lower than 1000 to make the index
   "very attractive".
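
To make the two points above concrete, here is a minimal sketch of the
clamping they describe. It is not the Oak source: the class and the helper
effectiveEntryCount are hypothetical names, and only the Math.min rule and
the default of 1000 come from the description above.

```java
public class EntryCountSketch {
    // Default described above; used when entryCount is not configured.
    static final long DEFAULT_ENTRY_COUNT = 1000;

    // Hypothetical helper: a null configuredEntryCount means "not defined".
    static long effectiveEntryCount(Long configuredEntryCount, long numDocs) {
        long defined = (configuredEntryCount != null)
                ? configuredEntryCount
                : DEFAULT_ENTRY_COUNT;
        // The planner never sees more entries than the index really holds.
        return Math.min(defined, numDocs);
    }

    public static void main(String[] args) {
        long numDocs = 50_000; // illustrative index size

        // Not defined: falls back to the default.
        System.out.println(effectiveEntryCount(null, numDocs));       // 1000
        // Defined larger than the index: clamped to numDocs.
        System.out.println(effectiveEntryCount(1_000_000L, numDocs)); // 50000
        // Defined small: the only direction that actually takes effect.
        System.out.println(effectiveEntryCount(10L, numDocs));        // 10
    }
}
```

So a large entryCount cannot push the estimate above the index size, while
a small one pulls it below the 1000 that the undefined indexes report.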

Using costPerEntry allows us to increase the cost, so that the index is
not picked for every user's queries. In this way, only selected
applications performing Lucene native queries will use it.
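
As a rough illustration of why costPerEntry can raise the cost where
entryCount cannot, here is a small sketch. It assumes the estimated cost
grows linearly with costPerEntry (costPerExecution + entryCount *
costPerEntry); that formula, the class and the numbers are assumptions
made for illustration, not taken from the Oak source.

```java
public class CostPerEntrySketch {
    // Assumed cost model, for illustration only:
    // cost = costPerExecution + entryCount * costPerEntry
    static double estimatedCost(double costPerExecution,
                                long entryCount,
                                double costPerEntry) {
        return costPerExecution + entryCount * costPerEntry;
    }

    public static void main(String[] args) {
        long entryCount = 1000; // both indexes report the default count

        // Hypothetical values: a regular index and the penalised one.
        double regular = estimatedCost(1.0, entryCount, 1.0);
        double penalised = estimatedCost(1.0, entryCount, 10.0);

        System.out.println(regular);   // 1001.0
        System.out.println(penalised); // 10001.0
    }
}
```

Under this assumption the multiplier is not clamped by the index size, so
the penalised index loses to any competing index with a similar entry
count.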

In this apparent maze that is the planner, one thing is still not clear to
me:

   - which index is selected in case of a cost tie.

I guess that part of my misconception about how queries work in Oak comes
from my experience building search engines with Solr and Lucene.
Basically, when we develop a search component using Solr or Lucene, all
the necessary data (all the documents and fields) is stored within the
index (or indexes). Thus every query is deliberately routed to the index
we want to use, and that index is enough to get the complete response we
expect. It seems that in Oak, indexes are used to speed up queries, but
not as a complete information entity that must answer every query clause.
Well, I'm still figuring out the big picture of the querying process in
Oak.

Regards.

On Tue, May 16, 2017 at 6:23 PM, Chetan Mehrotra <[email protected]>
wrote:

> Having same asset indexed twice would add overhead in terms of async
> indexing speed and space consumption by index. So if possible avoid
> such a setup
>
> > We could assume that we always add a path restriction, but I'm not sure how
> > index movement can help. I mean, both indexes contains documents under
> > /nodeA/nodeB/nodeC  so any query under this path is satisfied by both
>
> One way it would help is that entryCount for second index would be
> smaller compared to first and hence queries with path restriction for
> second index would have lesser cost
>
> Chetan Mehrotra
>
