[ 
https://issues.apache.org/jira/browse/OAK-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcel Reutegger reopened OAK-1970:
-----------------------------------
      Assignee: Marcel Reutegger  (was: Chetan Mehrotra)

OAK-2829 is somewhat different because it focuses on comparing node states for 
external changes. This issue is about the more general case of comparing states 
with many children. Now that OAK-2685 is resolved, we have the root revision 
available and can use it to narrow the modified range.

> Optimize the diff logic for large number of children case 
> ----------------------------------------------------------
>
>                 Key: OAK-1970
>                 URL: https://issues.apache.org/jira/browse/OAK-1970
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk
>            Reporter: Chetan Mehrotra
>            Assignee: Marcel Reutegger
>              Labels: performance, resilience
>             Fix For: 1.3.1
>
>
> DocumentNodeStore currently makes use of query to determine child nodes which 
> have changed after certain time. Query used is something like
> {noformat}
> db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, 
> _modified: { $gte: <start time> } }).sort({_id:1})
> {noformat}
> OAK-1966 tries to optimize the majority case where start times is recent and 
> in that case it makes use of _modified index. However if the start time is 
> quite old and a node has large number of children say 100k then it would 
> involve scan of all those 100k nodes as _modified index would not be of much 
> help. 
> Instead of querying like this we can have a special handling for cases where 
> large number of children are involved. It would involve following steps
> After analyzing the runtime queries in most case it is seen that even with 
> old modified time the number of change nodes is < 50
> # Mark parent nodes which have large number of children say > 50
> # On such nodes we would keep an array of \{modifiedtime, childName\} ## 
> Array would be bounded say keep last 50 updates. This can be done via splice 
> and push operators [1]
> ## Each entry in array would record modifiedtime and name of child node which 
> was modified. 
> ## Array would be sorted on modifiedtime
> # Each updated to any child belonging to such parent would also involve 
> update to above array
> # When we query for modified we check if the parent has such an array (if 
> parent is in cache) and if that array has time entries from the required 
> start time we directly make use of that and avoid the query
> This should reduce needs for such queries in majority of cases
> [1] http://docs.mongodb.org/manual/reference/operator/update-array/
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to