Davide Giannella closed OAK-4751.

Bulk close for 1.5.10.

> Improve the checkpoint migration performance
> --------------------------------------------
>                 Key: OAK-4751
>                 URL: https://issues.apache.org/jira/browse/OAK-4751
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar, upgrade
>            Reporter: Tomek Rękawek
>            Assignee: Tomek Rękawek
>             Fix For: 1.6, 1.5.10
>         Attachments: OAK-4751.patch
> (based on [~alex.parvulescu] input):
> During the segment -> segment-tar migration, a significant amount of time is
> taken by the deduplication process. The repository ingests large amounts of
> content (a checkpoint is the equivalent of a full repo state), and only when
> it deduplicates the data does it find that the data is already available in
> the destination repository.
> This happens because the diff mechanism cannot be efficient across
> repositories.
> For example: on the source repo we have the root state r0 and a checkpoint
> cp0 that is very close to r0. The diff(r0, cp0) is extremely cheap, measured
> in milliseconds. But the sidegrade copies r0 to the destination repository
> (r0 -> rx1) and then runs diff(rx1, cp0), which becomes very expensive: the
> two node states don't originate from the same repository, so diffing falls
> back to a slow content-equals comparison. Because the content is almost
> equal, a huge number of cycles is wasted deduplicating data across the two
> repositories.
> I have no easy solution here other than providing a diff mechanism that
> compares the two local states, diff(r0, cp0), but applies the delta to the
> destination repository (on top of rx1). I'm not sure how easy this will turn
> out to be, or whether it's worth the effort.
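
The idea proposed above (diff the two source-side states, then replay the delta on the destination copy) can be sketched with a toy model. This is only an illustration under stated assumptions, not the Oak API and not the attached patch: a "repository state" is modelled as a flat dict from path to value, and the names diff, apply_delta, Tree and Delta are hypothetical helpers invented for this sketch. In this toy model the diff is a plain dictionary comparison, so it does not reproduce the cost difference between same-repository and cross-repository diffs; it only shows that replaying diff(r0, cp0) on rx1 yields the checkpoint content without ever comparing rx1 against cp0.

```python
from typing import Dict, List, Tuple

# Toy stand-ins (assumptions, not Oak types): a state is path -> value,
# a delta is a list of (operation, path, value) tuples.
Tree = Dict[str, str]
Delta = List[Tuple[str, str, str]]


def diff(base: Tree, target: Tree) -> Delta:
    """Return the operations that turn `base` into `target`."""
    ops: Delta = []
    # Paths present in base but missing in target were deleted.
    for path in sorted(base.keys() - target.keys()):
        ops.append(("delete", path, ""))
    # Paths in target are either new or (possibly) changed.
    for path in sorted(target):
        if path not in base:
            ops.append(("add", path, target[path]))
        elif base[path] != target[path]:
            ops.append(("change", path, target[path]))
    return ops


def apply_delta(dest: Tree, delta: Delta) -> Tree:
    """Replay `delta` on a copy of `dest` and return the result."""
    result = dict(dest)
    for op, path, value in delta:
        if op == "delete":
            result.pop(path, None)
        else:  # "add" and "change" both set the value
            result[path] = value
    return result


# Source repository: root state r0 and a checkpoint cp0 close to it.
r0 = {"/a": "1", "/b": "2", "/c": "3"}
cp0 = {"/a": "1", "/b": "20", "/d": "4"}

# Destination repository: rx1 is the migrated copy of r0.
rx1 = dict(r0)

# Diff on the source side only, then apply the delta on the destination.
delta = diff(r0, cp0)
cpx = apply_delta(rx1, delta)
```

Here cpx plays the role of the migrated checkpoint. The sketch relies on rx1 having the same content as r0, which is what the copy step r0 -> rx1 is assumed to guarantee.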

This message was sent by Atlassian JIRA
