> - Rebuilding Assets index takes several days

Is the time spent in text extraction?
Would the code always specify a path restriction in the queries for
my:Asset? If yes, then you can just move the index definitions under the
respective paths. Would that be an option?

Chetan Mehrotra

On Tue, May 16, 2017 at 8:20 PM, Alvaro Cabrerizo <[email protected]> wrote:
> Hello,
>
> Yes, there are some reasons:
>
>    - One team is working with all the assets under /nodeA/nodeB
>    - Other team only works with a small subset, only assets under
>    /nodeA/node/nodeC
>    - Both teams have different searching requirements
>    - Rebuilding Assets index takes several days
>    - Rebuilding deeperAsset takes a couple of hours. Thus merging both will
>    penalize the size and modification agility of deeperAsset
>
> One option we are evaluating is to cheat the deeperAsset cost (using the
> costPerEntry property) and make that team work with lucene native queries
> (any recommendation is welcome). Anyway, it is not clear which index is
> selected in case of a tie.
>
> Regards.
>
>
> On Tue, May 16, 2017 at 4:36 PM, Chetan Mehrotra <[email protected]>
> wrote:
>
>> Any reason for having separate definitions for same nodetype?
>> Chetan Mehrotra
>>
>>
>> On Tue, May 16, 2017 at 7:52 PM, Alvaro Cabrerizo <[email protected]>
>> wrote:
>> > Hello,
>> >
>> > Actually, it is OAK-5449. Sorry, I hadn't seen it.
>> >
>> > On the other hand, having these two definitions under oak:index (just a
>> > sketch):
>> >
>> >
>> >    - Asset
>> >       - evaluatePathRestrictions="true"
>> >       - type="lucene"
>> >       - includedPath="/nodeA/nodeB"
>> >       - indexRules
>> >          - my:Asset
>> >             - properties
>> >                - name="my:title"
>> >    - deeperAsset
>> >       - evaluatePathRestrictions="true"
>> >       - type="lucene"
>> >       - includedPath="/nodeA/nodeB/includedC"
>> >       - indexRules
>> >          - my:Asset
>> >             - properties
>> >                - name="my:description"
>> >
>> > made the system return a cost of 1001 (for both indexes) when performing
>> > this kind of query:
>> >
>> >    - SELECT * FROM [my:Asset] AS s WHERE ISDESCENDANTNODE(s,'/nodeA/nodeB')
>> >    - SELECT * FROM [my:Asset] AS s WHERE ISDESCENDANTNODE(s,'/nodeA/nodeB')
>> >    AND s.[my:title]='title'
>> >    - SELECT * FROM [my:Asset] AS s WHERE ISDESCENDANTNODE(s,'/nodeA/nodeB')
>> >    AND s.[my:description]='description'
>> >
>> > Once the cost is assigned (equal for both indexes), it is not clear which
>> > index will be selected.
>> >
>> > Regards.
>> >
>> > On Tue, May 16, 2017 at 3:50 PM, Chetan Mehrotra <[email protected]>
>> > wrote:
>> >
>> >> This looks similar to OAK-5449 (not yet fixed). Can you give a sample
>> >> index definition there and some usecase details which is leading to
>> >> ambiguity in index selection.
>> >>
>> >> In general index selection should not have multiple competing index
>> >> definitions hence interested in knowing setup details
>> >> Chetan Mehrotra
>> >>
>> >>
>> >> On Tue, May 16, 2017 at 1:53 PM, Alvaro Cabrerizo <[email protected]>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > I've been checking the code of the IndexPlanner (Apache Oak 1.4.1) and I
>> >> > was surprised because the costPerEntryFactor remains 1 in both cases:
>> >> >
>> >> >    - when no indexed or sorted property matches any property clause or
>> >> >    sort clause from the query
>> >> >    - when only one indexed or sorted property matches a property clause
>> >> >    or sort clause from the query
>> >> >
>> >> > Although this piece of code avoids a division by zero (see
>> >> > org.apache.jackrabbit.oak.plugins.index.lucene.IndexPlanner lines
>> >> > 201-203)
>> >> >
>> >> >   if (costPerEntryFactor == 0){
>> >> >     costPerEntryFactor = 1;
>> >> >   }
>> >> >
>> >> > it also avoids the boosting of indexes that match 1 query clause (or in
>> >> > other words, it doesn't penalize indexes that don't match any clause).
>> >> > I'm thinking about opening an issue. Although that is for the long term,
>> >> > right now I would like to know which index is selected in case more
>> >> > than one has the same cost.
>> >> >
>> >> > Regards.
>> >>
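
The effect described in the oldest message of the thread (the one about costPerEntryFactor) can be illustrated with a small standalone model. This is only a sketch, not the Oak implementation: the class name CostSketch, the constants, and the formula cost = costPerExecution + entryCount * costPerEntry / factor are assumptions, chosen because they reproduce the cost of 1001 reported in the thread. Only the zero guard mirrors the snippet quoted from IndexPlanner.

```java
public class CostSketch {
    // Hypothetical defaults (assumptions, not taken from the thread);
    // picked so that the result matches the reported cost of 1001.
    static final double COST_PER_EXECUTION = 1.0;
    static final double COST_PER_ENTRY = 1.0;
    static final long ENTRY_COUNT = 1000;

    // Mirrors the guard quoted in the thread: a factor of 0 is bumped to 1,
    // so "no matching clause" and "one matching clause" give the same factor.
    public static int factor(int matchedClauses) {
        int costPerEntryFactor = matchedClauses;
        if (costPerEntryFactor == 0) {
            costPerEntryFactor = 1;
        }
        return costPerEntryFactor;
    }

    // Assumed cost model: fixed per-execution cost plus per-entry cost,
    // with the per-entry cost divided by the factor.
    public static double cost(int matchedClauses) {
        return COST_PER_EXECUTION
                + ENTRY_COUNT * (COST_PER_ENTRY / factor(matchedClauses));
    }

    public static void main(String[] args) {
        for (int matched = 0; matched <= 2; matched++) {
            System.out.println(matched + " matched clause(s): cost " + cost(matched));
        }
    }
}
```

Under these assumptions an index that matches no clause and an index that matches exactly one clause end up with the same cost, and only a second matching clause lowers it. That is the tie between Asset and deeperAsset that the thread asks about.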
