[ https://issues.apache.org/jira/browse/OAK-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738447#comment-14738447 ]
Thomas Mueller commented on OAK-3380:
-------------------------------------
As an alternative: instead of pruning while updating the index, we might be
able to prune while reading from the index. I'm not sure if that's really a
good idea (write while reading), but anyway: when traversing an empty node,
remove it with a probability of, let's say, 5% (using a pseudo-random number
generator). That way, empty nodes are removed eventually, and the effect on
query performance would be limited.
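
To make the idea concrete, here is a minimal sketch of probabilistic pruning during a read. It is an illustration under stated assumptions, not Oak code: `PruneOnReadSketch`, `IndexNode`, `hasEntry` and `countEntries` are hypothetical names, and the tree is a plain in-memory map rather than Oak's node API. In particular, it does not model how a read path would obtain a writable node or commit the removal, which is exactly the "write while reading" concern.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical in-memory stand-in for an index tree; not Oak's API. */
final class PruneOnReadSketch {

    /** A node is "empty" when it holds no index entry and has no children. */
    static final class IndexNode {
        final Map<String, IndexNode> children = new LinkedHashMap<>();
        boolean hasEntry; // true if this node carries an indexed value

        boolean isEmpty() {
            return !hasEntry && children.isEmpty();
        }
    }

    private final Random random;
    private final double pruneProbability; // e.g. 0.05 for the 5% suggested above

    PruneOnReadSketch(Random random, double pruneProbability) {
        this.random = random;
        this.pruneProbability = pruneProbability;
    }

    /**
     * Counts the index entries at and below {@code node}. While reading, each
     * empty child is removed with probability {@code pruneProbability}.
     */
    int countEntries(IndexNode node) {
        int count = node.hasEntry ? 1 : 0;
        Iterator<IndexNode> it = node.children.values().iterator();
        while (it.hasNext()) {
            IndexNode child = it.next();
            count += countEntries(child);
            // Check emptiness after the recursive read, so a child whose own
            // empty children were just pruned can be removed as well.
            if (child.isEmpty() && random.nextDouble() < pruneProbability) {
                it.remove();
            }
        }
        return count;
    }
}
```

With a removal probability p per traversal, an empty node survives n traversals with probability (1 - p)^n, so for p = 0.05 it is removed after 20 traversals on average. Nodes that are never read are never pruned by this scheme.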
> Property index pruning should happen asynchronously
> ---------------------------------------------------
>
> Key: OAK-3380
> URL: https://issues.apache.org/jira/browse/OAK-3380
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: query
> Affects Versions: 1.3.5
> Reporter: Vikas Saurabh
> Priority: Minor
> Labels: resilience
>
> Following up on this (relatively old) thread [1], we should prune the
> property index structure asynchronously. The thread was never concluded.
> Here are a couple of ideas picked from the thread:
> * Move pruning to an async thread
> * Throttle pruning i.e. prune only once in a while
> ** I'm not sure how that would work though -- an unpruned part would remain
> as is until another index update happens on that path.
> Once we can move pruning to some async thread (reducing concurrent updates),
> OAK-2673 + OAK-2929 can take care of add-add conflicts.
> ----
> h6. Why is this an issue despite merge retries taking care of it?
> A couple of cases which have concurrent updates hitting merge conflicts in
> our product (Adobe AEM):
> * Some indexes are very volatile (in the sense that the indexed property
> switches its value very quickly), e.g. Sling job status, AEM workflow status.
> * Multiple threads take care of jobs. Sling maintains a bucketed structure
> for job storage to reduce conflicts, but inside the index tree the bucket
> structure at times gets pruned and needs to be re-created on the next job
> status change.
> Retries do take care of these conflicts most of the time, and even when
> they don't, AEM workflows have their own retry to work around them. But
> retrying, IMHO, is just a waste of time -- especially in paths over which the
> application doesn't really have control.
> h6. Would this add to cost of traversing index structure?
> Yes, there'd be some leftover paths in the index structure between
> asynchronous prunes. But I think the cost of such wasted traversals would be
> offset by the time saved by avoiding concurrent update conflicts.
> ----
> (cc [~tmueller], [~mreutegg], [~alex.parvulescu], [~chetanm])
> [1]:
> http://mail-archives.apache.org/mod_mbox/jackrabbit-oak-dev/201506.mbox/%3ccadichf66u2vh-hlrjunansytxfidj2mt3vktr4ybkngpzy9...@mail.gmail.com%3E