[ 
https://issues.apache.org/jira/browse/OAK-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738447#comment-14738447
 ] 

Thomas Mueller commented on OAK-3380:
-------------------------------------

As an alternative: instead of pruning while updating the index, we might be 
able to prune while reading from the index. I'm not sure if that's really a 
good idea (write while reading), but anyway: when traversing a empty node, 
remove it with a probability of let's say 5% (using a pseudo random number 
generator). That way, empty nodes are removed eventually, and the effect of 
query performance would be limited.

> Property index pruning should happen asynchronously
> ---------------------------------------------------
>
>                 Key: OAK-3380
>                 URL: https://issues.apache.org/jira/browse/OAK-3380
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: query
>    Affects Versions: 1.3.5
>            Reporter: Vikas Saurabh
>            Priority: Minor
>              Labels: resilience
>
> Following up on this (a relatively old) thread \[1], we should do pruning of 
> property index structure asynchronously. The thread was never concluded.. 
> here are a couple of ideas picked from the thread:
> * Move pruning to an async thread
> * Throttle pruning i.e. prune only once in a while
> ** I'm not sure how that would work though -- an unpruned part would remain 
> as is until another index happens on that path.
> Once we can move pruning to some async thread (reducing concurrent updates), 
> OAK-2673 + OAK-2929 can take care of add-add conflicts.
> ----
> h6. Why is this an issue despite merge retries taking care of it?
> A couple of cases which have concurrent updates hitting merge conflicts in 
> our product (Adobe AEM):
> * Some index are very volatile (in the sense that indexed property switches 
> its values very quickly) e.g. sling job status, AEM workflow status.
> * Multiple threads take care of jobs. Although sling maintains a bucketed 
> structure for job storage to reduce conflicts... but inside index tree the 
> bucket structure, at times, gets pruned and needs to be created in the next 
> job status change
> While retries do take care of these conflict a lot of times and even when 
> they don't, AEM workflows has it's own retry to work around. But, retrying, 
> IMHO, is just a waste of time -- more importantly in paths where application 
> doesn't really have a control.
> h6. Would this add to cost of traversing index structure?
> Yes, there'd be some left over paths in index structure between asynchronous 
> prunes. But, I think the cost of such wasted traversals would be covered up 
> with time saved in avoiding the concurrent update conflict.
> ----
> (cc [~tmueller], [~mreutegg], [~alex.parvulescu], [~chetanm])
> \[1]: 
> http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201506.mbox/%3ccadichf66u2vh-hlrjunansytxfidj2mt3vktr4ybkngpzy9...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to