[
https://issues.apache.org/jira/browse/OAK-1926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071411#comment-14071411
]
Chetan Mehrotra edited comment on OAK-1926 at 7/23/14 7:22 AM:
---------------------------------------------------------------
The following notes are based on a discussion with [~mreutegg] on this issue
* DocumentNodeStore needs to keep track of UnmergedBranches to distinguish
revisions which are part of a branch
* If a process terminates with pending UnmergedBranches, the info for those
branches remains in the root document's revision map and can only be removed by
a garbage collection that removes all commits which were part of those
branches. Without that, we need to maintain the in-memory state
* Loading of unmerged branch was done in
[1461193|http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-mongomk/src/main/java/org/apache/jackrabbit/mongomk/prototype/MongoMK.java?r1=1461193&r2=1461192&pathrev=1461193]
* Currently there are the following problems with unmerged branches
** A - Checking whether a revision is part of a branch is costly - The check as
currently implemented does not distinguish between live in-memory branches and
persisted unmerged branches. To simplify the check, we distinguish between the
two types; for persisted unmerged branches we keep a set of their revisions and
first do a lookup there to confirm that a rev is part of an unmerged branch
before doing the actual check
** B - Tracking of branches which are not merged - An unmerged branch state
would be persisted in two cases
*** The client did not merge the branch - In this case we can somehow figure out
that a branch has gone out of scope (possibly via a WeakReference on
DocumentNodeStoreBranch) and will not be merged. In such a case we know the
commits done in that branch and can perform a cleanup
*** The Oak process exited suddenly - In this case the branch commit info would
be lost and we would have to resort to GC
** C - Unmerged Rev GC (OAK-1981) - Once we implement a full GC, such
branch state can be collected in that GC
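The out-of-scope detection mentioned under B could look roughly like the sketch below. It is only an illustration of the WeakReference idea, not existing Oak code: BranchTracker, BranchRef, track, merged and drainAbandonedCommits are hypothetical names, the branch is a plain Object standing in for DocumentNodeStoreBranch, and revisions are plain strings.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remembers the commits of each live branch and reports
// them once the branch object is no longer reachable by the client.
class BranchTracker {

    // Weak reference to the branch plus the commit revisions made in it.
    static final class BranchRef extends WeakReference<Object> {
        final List<String> commits;

        BranchRef(Object branch, List<String> commits, ReferenceQueue<Object> queue) {
            super(branch, queue);
            this.commits = commits;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // Strong references to the BranchRef objects themselves; without these the
    // references could be collected before they are ever enqueued.
    private final Set<BranchRef> live = ConcurrentHashMap.newKeySet();

    BranchRef track(Object branch, List<String> commits) {
        BranchRef ref = new BranchRef(branch, new ArrayList<>(commits), queue);
        live.add(ref);
        return ref;
    }

    // Called on a successful merge: nothing is left to clean up for this branch.
    void merged(BranchRef ref) {
        live.remove(ref);
    }

    // Returns the commits of branches that went out of scope without a merge.
    // The caller would then remove those commits from the store.
    List<String> drainAbandonedCommits() {
        List<String> abandoned = new ArrayList<>();
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            BranchRef branchRef = (BranchRef) ref;
            if (live.remove(branchRef)) {
                abandoned.addAll(branchRef.commits);
            }
        }
        return abandoned;
    }
}
```

A background job could call drainAbandonedCommits() periodically. This only helps while the process is alive; after a sudden exit the tracked state is gone, which is the second case above.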
For now, as part of this bug, we would implement #A, as that should mitigate the
performance issue; later we can go for #B and #C
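A minimal sketch of the lookup proposed in #A, assuming revisions can be represented as plain strings. UnmergedRevisionIndex and its methods are hypothetical names and not part of Oak; the costly existing check is passed in as a Predicate so that the sketch stays self-contained.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch: a set of revisions that belong to persisted unmerged
// branches, consulted before the costly branch check is run.
class UnmergedRevisionIndex {

    private final Set<String> persistedUnmergedRevs = ConcurrentHashMap.newKeySet();

    // Stand-in for the existing, expensive check that scans the branches.
    private final Predicate<String> fullBranchCheck;

    UnmergedRevisionIndex(Predicate<String> fullBranchCheck) {
        this.fullBranchCheck = fullBranchCheck;
    }

    // Would be populated when persisted unmerged branches are loaded.
    void addPersistedUnmergedRevision(String rev) {
        persistedUnmergedRevs.add(rev);
    }

    boolean isPersistedUnmergedRevision(String rev) {
        // Cheap set lookup first: most revisions are not in the set and
        // never reach the expensive check.
        if (!persistedUnmergedRevs.contains(rev)) {
            return false;
        }
        return fullBranchCheck.test(rev);
    }
}
```

Live in-memory branches would still be checked separately; the set only covers the persisted unmerged ones.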
> UnmergedBranch state growing with empty BranchCommit leading to performance
> degradation
> ---------------------------------------------------------------------------------------
>
> Key: OAK-1926
> URL: https://issues.apache.org/jira/browse/OAK-1926
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: mongomk
> Affects Versions: 1.0.1
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.1
>
>
> In some cluster deployments it has been seen that the in-memory state of
> UnmergedBranches contains a large number of empty commits. For example, in one
> of the runs there were 750 entries in UnmergedBranches and each Branch
> had empty branch commits.
> If there is a large number of UnmergedBranches then read performance
> degrades, because the logic that determines revision validity currently scans
> all branches.
> Below is part of the UnmergedBranch state
> {noformat}
> Branch 1
> 1 -> br146d2edb7a7-0-1 (true) (revision: "br146d2edb7a7-0-1", clusterId: 1,
> time: "2014-06-25 05:08:52.903", branch: true)
> 2 -> br146d2f0450b-0-1 (true) (revision: "br146d2f0450b-0-1", clusterId: 1,
> time: "2014-06-25 05:11:40.171", branch: true)
> Branch 2
> 1 -> br146d2ef1d08-0-1 (true) (revision: "br146d2ef1d08-0-1", clusterId: 1,
> time: "2014-06-25 05:10:24.392", branch: true)
> Branch 3
> 1 -> br146d2ed26ca-0-1 (true) (revision: "br146d2ed26ca-0-1", clusterId: 1,
> time: "2014-06-25 05:08:15.818", branch: true)
> 2 -> br146d2edfd0e-0-1 (true) (revision: "br146d2edfd0e-0-1", clusterId: 1,
> time: "2014-06-25 05:09:10.670", branch: true)
> Branch 4
> 1 -> br146d2ecd85b-0-1 (true) (revision: "br146d2ecd85b-0-1", clusterId: 1,
> time: "2014-06-25 05:07:55.739", branch: true)
> Branch 5
> 1 -> br146d2ec21a0-0-1 (true) (revision: "br146d2ec21a0-0-1", clusterId: 1,
> time: "2014-06-25 05:07:08.960", branch: true)
> 2 -> br146d2ec8eca-0-1 (true) (revision: "br146d2ec8eca-0-1", clusterId: 1,
> time: "2014-06-25 05:07:36.906", branch: true)
> Branch 6
> 1 -> br146d2eaf159-1-1 (true) (revision: "br146d2eaf159-1-1", clusterId: 1,
> time: "2014-06-25 05:05:51.065", counter: 1, branch: true)
> Branch 7
> 1 -> br146d2e9a513-0-1 (true) (revision: "br146d2e9a513-0-1", clusterId: 1,
> time: "2014-06-25 05:04:26.003", branch: true)
> {noformat}
> [~mreutegg] suggested that these branches might belong to revisions which
> resulted in a collision, and upon checking that indeed appears to be the
> case (the value true in brackets above indicates that). Further, given the age
> of these revisions, it looks like they get populated at startup
> *Fix*
> * Need to check why we need to populate the UnmergedBranches
> * Possibly implement a purge job which would remove such stale entries