Hi,

On Thu, Mar 20, 2014 at 11:05 AM, Alex Parvulescu <[email protected]> wrote:
> On Thu, Mar 20, 2014 at 1:33 PM, Jukka Zitting <[email protected]> wrote:
>> Perhaps the comparison is between content in the source repository and
>> that in the backup repository? In that case the segment identifiers
>> wouldn't match, and the comparison would slow down as described.
>
> Yes exactly, the backup runs a diff between the source instance and the
> target instance.

Ideally it shouldn't; that's why the backup code instead tries to use
the checkpoint that was used for the previous backup [1]. The
comparison across repositories is much slower and produces some
slightly unexpected behavior like what you're probably seeing here.

> One would expect that the backup is incremental in the
> sense that running it a consecutive time yields only the modifications that
> happened during that time period but these findings show otherwise.

If the comparison is done across repositories, the content diff will
call childNodeChanged() even on unmodified nodes, as permitted by
OAK-914 [2]. The reason for this is that the child node identifiers
will not match across repositories, so there's no efficient way for
the content diff to tell whether the subtrees are equal or not.

However, AFAICT these extra childNodeChanged() calls should only
result in some slowdown as ApplyDiff recurses down the tree, not in
actual changes to be written by SegmentWriter.

Can you, for example, try dumping the output of JsopDiff.diffToJsop()
before line 74 in FileStoreBackup to verify that no changes are
unexpectedly showing up?

[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/backup/FileStoreBackup.java#L61
[2] https://issues.apache.org/jira/browse/OAK-914

BR,

Jukka Zitting
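
PS. To illustrate the point about identifiers: the following is a
self-contained Java sketch, not Oak code. Node, Stats, tree() and
diff() are made-up names, the string id only stands in for a segment
record identifier, and added or removed children are ignored for
brevity. It shows that a diff which can only short-circuit on matching
identifiers has to recurse through an identical tree when the
identifiers come from different stores, reporting childNodeChanged()
on the way without finding any real change.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DiffSketch {

    /** Hypothetical node: a store-assigned id, a value and named children. */
    static final class Node {
        final String id;     // stands in for a segment record identifier
        final String value;  // stands in for the node's properties
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String id, String value) {
            this.id = id;
            this.value = value;
        }

        Node with(String name, Node child) {
            children.put(name, child);
            return this;
        }
    }

    /** Counters collected while diffing. */
    static final class Stats {
        int childNodeChangedCalls; // how often the diff had to descend into a child
        int realChanges;           // how often the content actually differed
    }

    /** Builds the same content every time; only the ids depend on the store name. */
    static Node tree(String store) {
        Node a = new Node(store + ":a", "A").with("c", new Node(store + ":c", "C"));
        Node b = new Node(store + ":b", "B");
        return new Node(store + ":root", "R").with("a", a).with("b", b);
    }

    static Stats diff(Node before, Node after) {
        Stats stats = new Stats();
        // Equal ids mean the same record, so the whole tree can be skipped.
        if (!before.id.equals(after.id)) {
            compare(before, after, stats);
        }
        return stats;
    }

    private static void compare(Node before, Node after, Stats stats) {
        if (!before.value.equals(after.value)) {
            stats.realChanges++;
        }
        for (Map.Entry<String, Node> entry : after.children.entrySet()) {
            Node oldChild = before.children.get(entry.getKey());
            Node newChild = entry.getValue();
            // Ids differ: the subtree may or may not have changed, so the
            // only way to find out is to report it and recurse.
            if (oldChild != null && !oldChild.id.equals(newChild.id)) {
                stats.childNodeChangedCalls++;
                compare(oldChild, newChild, stats);
            }
        }
    }

    public static void main(String[] args) {
        Stats sameStore = diff(tree("source"), tree("source"));
        Stats crossStore = diff(tree("source"), tree("backup"));
        System.out.println("same store:  calls=" + sameStore.childNodeChangedCalls
                + " changes=" + sameStore.realChanges);
        System.out.println("cross store: calls=" + crossStore.childNodeChangedCalls
                + " changes=" + crossStore.realChanges);
    }
}
```

In this toy model the cross-store run visits every child node while
the change counter stays at zero, which is the "slowdown but no actual
changes" outcome described above.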
