[ 
https://issues.apache.org/jira/browse/OAK-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312663#comment-14312663
 ] 

Chetan Mehrotra commented on OAK-2492:
--------------------------------------

One possible approach would be

# Introduce a new flag in the {{nodes}} document: {{_manyChildren}}
# When a diff in {{diffImpl}} finds that the node at a given path has many 
children and is not already marked, it records that path in an in-memory 
list
# Periodically, a background thread pulls such paths from the list and updates 
the documents to set {{_manyChildren}} to true
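
The three steps above could be sketched roughly as follows. This is only an illustration and not the actual Oak code: the class {{ManyChildrenMarker}}, its methods and the {{flagWriter}} callback (a stand-in for the store update that sets {{_manyChildren}}) are hypothetical names, and the child count is passed in directly instead of being derived from the fetched child list.

```java
import java.util.ArrayList;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: remembers paths seen with many children and flags them later. */
class ManyChildrenMarker {
    // Paths noticed by the diff code but not yet flagged in the store.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    // Stand-in for the store update that sets _manyChildren = true on one document.
    private final Consumer<String> flagWriter;

    ManyChildrenMarker(Consumer<String> flagWriter) {
        this.flagWriter = flagWriter;
    }

    /** Called from the diff code path; only touches the in-memory set. */
    void record(String path, int childCount, int threshold, boolean alreadyFlagged) {
        if (!alreadyFlagged && childCount > threshold) {
            pending.add(path);
        }
    }

    /** Flags all pending paths and returns how many updates succeeded. */
    int flush() {
        int flagged = 0;
        for (String path : new ArrayList<>(pending)) {
            pending.remove(path);
            try {
                flagWriter.accept(path);
                flagged++;
            } catch (RuntimeException e) {
                // Best effort: a later diff will record the path again.
            }
        }
        return flagged;
    }

    /** Runs flush() periodically on its own daemon thread. */
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "many-children-marker");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleWithFixedDelay(this::flush, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Using a set rather than a plain list keeps a path that is diffed repeatedly from being queued more than once before the next flush.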

Points to note
# The flag is set on a best-effort basis
# Once set, no effort is made to unset it. So if the number of children later 
drops, the diff logic would continue to use the query-based approach
# The threshold is currently configurable at runtime. However, a node is marked 
on the basis of the threshold in effect at that time. If the threshold is 
changed later, already flagged documents would still be diffed via the query only
# It would be better to use a separate thread for this task and not club it 
with backgroundOperations, as it is not critical and does not depend on how 
revisions are seen. So there is no point adding work to the time-critical 
backgroundOperation logic we have
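
To make the "sticky" behaviour of the flag concrete, here is a minimal sketch of how the diff could pick its strategy. The names {{DiffStrategyChooser}}, {{Strategy}} and {{childCounter}} are made up for illustration and do not exist in Oak; {{childCounter}} stands in for the costly child fetch.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch of how a diff could choose its strategy from the flag. */
class DiffStrategyChooser {
    enum Strategy { DIRECT, QUERY }

    /**
     * @param flagged      value of _manyChildren on the parent document
     * @param childCounter stand-in for fetching the children and counting them
     * @param threshold    current many-children threshold
     */
    static Strategy choose(boolean flagged, IntSupplier childCounter, int threshold) {
        if (flagged) {
            // The flag is never re-evaluated: skip the child fetch entirely,
            // whatever the current threshold or child count is.
            return Strategy.QUERY;
        }
        return childCounter.getAsInt() > threshold ? Strategy.QUERY : Strategy.DIRECT;
    }
}
```

For a flagged document the supplier is never invoked, which is the work the flag is meant to save. The flip side is that a flagged node keeps using the query-based diff even if it now has fewer children or the threshold has since been raised.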

> Flag Document having many children
> ----------------------------------
>
>                 Key: OAK-2492
>                 URL: https://issues.apache.org/jira/browse/OAK-2492
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.0.12, 1.1.7
>
>
> The current DocumentMK logic for diffing child nodes works as 
> below
> # Get children for the _before_ revision up to MANY_CHILDREN_THRESHOLD (which 
> defaults to 50). Note that the current logic for fetching child nodes 
> also adds the children's {{NodeDocument}} to the {{Document}} cache and reads 
> the complete Document for those children
> # Get children for the _after_ revision with limits as above
> # If the child list is complete then it does a direct diff on the fetched 
> children
> # If the list is not complete, i.e. the number of children is more than the 
> threshold, then it goes for a query-based diff (also see OAK-1970)
> So in those cases where the number of children is large, all work done in #1 
> above is wasted and should be avoided. To do that we can mark those parent 
> nodes which have many children via a special flag like {{_manyChildren}}. Once 
> such nodes are marked, the diff logic can check for the flag and skip the work 
> done in #1
> This is similar to the way we mark nodes which have at least one child 
> (OAK-1117)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
