>  - Rebuilding Assets index takes several days

Is the time spent in text extraction?

Would the code always specify a path restriction in the queries for
my:Asset? If yes, then you can just move each index definition under
its respective path.

Would that be an option?
Chetan Mehrotra


On Tue, May 16, 2017 at 8:20 PM, Alvaro Cabrerizo <[email protected]> wrote:
> Hello,
>
> Yes, there are some reasons:
>
>    - One team is working with all the assets under /nodeA/nodeB
>    - The other team works only with a small subset: the assets under
>    /nodeA/node/nodeC
>    - Both teams have different searching requirements
>    - Rebuilding Assets index takes several days
>    - Rebuilding deeperAsset takes a couple of hours. Thus merging both will
>    penalize the size and modification agility of deeperAsset
>
> One option we are evaluating is to artificially lower the deeperAsset cost
> (using the costPerEntry property) and have that team work with Lucene native
> queries (any recommendation is welcome). In any case, it is not clear which
> index is selected in case of a tie.
>
> Regards.
>
>
> On Tue, May 16, 2017 at 4:36 PM, Chetan Mehrotra <[email protected]>
> wrote:
>
>> Any reason for having separate definitions for the same nodetype?
>> Chetan Mehrotra
>>
>>
>> On Tue, May 16, 2017 at 7:52 PM, Alvaro Cabrerizo <[email protected]>
>> wrote:
>> > Hello,
>> >
>> > Actually, it is OAK-5449. Sorry, I hadn't seen it.
>> >
>> > On the other hand, having these two definitions under oak:index (just a
>> > sketch):
>> >
>> >
>> >    - Asset
>> >       - evaluatePathRestrictions="true"
>> >       - type="lucene"
>> >       - includedPath="/nodeA/nodeB"
>> >       - indexRules
>> >          - my:Asset
>> >             - properties
>> >                - name="my:title"
>> >    - deeperAsset
>> >       - evaluatePathRestrictions="true"
>> >       - type="lucene"
>> >       - includedPath="/nodeA/nodeB/includedC"
>> >       - indexRules
>> >          - my:Asset
>> >             - properties
>> >                - name="my:description"
>> >
>> > made the system return a cost of 1001 (for both indexes) when performing
>> > these kinds of queries:
>> >
>> >    - SELECT * FROM [my:Asset] AS s WHERE ISDESCENDANTNODE(s,'/nodeA/nodeB')
>> >    - SELECT * FROM [my:Asset] AS s WHERE ISDESCENDANTNODE(s,'/nodeA/nodeB')
>> >    AND s.[my:title]='title'
>> >    - SELECT * FROM [my:Asset] AS s WHERE ISDESCENDANTNODE(s,'/nodeA/nodeB')
>> >    AND s.[my:description]='description'
>> >
>> > Once the cost is assigned (equal for both indexes), it is not clear which
>> > index will be selected.
>> >
>> > Regards.
>> >
>> > On Tue, May 16, 2017 at 3:50 PM, Chetan Mehrotra <[email protected]>
>> > wrote:
>> >
>> >> This looks similar to OAK-5449 (not yet fixed). Can you give a sample
>> >> index definition there and some use case details that are leading to
>> >> ambiguity in index selection?
>> >>
>> >> In general, index selection should not have multiple competing index
>> >> definitions, hence I am interested in knowing the setup details.
>> >> Chetan Mehrotra
>> >>
>> >>
>> >> On Tue, May 16, 2017 at 1:53 PM, Alvaro Cabrerizo <[email protected]>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > I've been checking the code of the IndexPlanner (Apache Oak 1.4.1) and I
>> >> > was surprised because the costPerEntryFactor remains 1 in both cases:
>> >> >
>> >> >    - when no indexed or sorted property matches any property clause or
>> >> >    sort clause from the query
>> >> >    - when only one indexed or sorted property matches a property clause
>> >> >    or sort clause from the query
>> >> >
>> >> > Although this piece of code avoids a division by zero (see
>> >> > org.apache.jackrabbit.oak.plugins.index.lucene.IndexPlanner lines
>> >> > 201-203)
>> >> >
>> >> >             if (costPerEntryFactor == 0){
>> >> >                 costPerEntryFactor = 1;
>> >> >             }
>> >> >
>> >> > it also prevents boosting indexes that match one query clause (or, in
>> >> > other words, it doesn't penalize indexes that don't match any clause).
>> >> > I'm thinking about opening an issue. Although that is for the long term,
>> >> > right now I would like to know which index is selected when more than
>> >> > one has the same cost.
>> >> >
>> >> > Regards.
>> >>
>>
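
The tie described in the original message can be illustrated with a small, simplified model. This is only a sketch, not the Oak source: the class name CostTieSketch and both method names are hypothetical, and it assumes that the factor is the count of indexed or sorted properties matching the query and that the cost per entry is the inverse of that factor. Only the zero guard is taken from the snippet quoted in the thread.

```java
public class CostTieSketch {

    // Assumption: the factor starts as the number of indexed or sorted
    // properties that match a property clause or sort clause of the query.
    static int costPerEntryFactor(int matchedProperties) {
        int costPerEntryFactor = matchedProperties;
        // Guard quoted in the thread: avoids a division by zero, but it also
        // makes "no match" indistinguishable from "one match".
        if (costPerEntryFactor == 0) {
            costPerEntryFactor = 1;
        }
        return costPerEntryFactor;
    }

    // Assumption for illustration: cost per entry is the inverse of the factor.
    static double costPerEntry(int matchedProperties) {
        return 1.0 / costPerEntryFactor(matchedProperties);
    }

    public static void main(String[] args) {
        // Print the factor and cost per entry for 0 to 3 matching properties.
        for (int matched = 0; matched <= 3; matched++) {
            System.out.printf("matched=%d factor=%d costPerEntry=%.3f%n",
                    matched, costPerEntryFactor(matched), costPerEntry(matched));
        }
    }
}
```

Under these assumptions, an index matching no clause and an index matching exactly one clause end up with the same factor, so the per-entry cost alone cannot separate them.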
