[ https://issues.apache.org/jira/browse/OAK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tomek Rękawek updated OAK-4751:
-------------------------------
Fix Version/s: 1.4.12
> Improve the checkpoint migration performance
> --------------------------------------------
>
> Key: OAK-4751
> URL: https://issues.apache.org/jira/browse/OAK-4751
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-tar, upgrade
> Reporter: Tomek Rękawek
> Assignee: Tomek Rękawek
> Fix For: 1.5.10, 1.4.12, 1.6
>
> Attachments: OAK-4751.patch
>
>
> (based on [~alex.parvulescu] input):
> During the segment->segment-tar migration, a fair amount of time is taken by
> the deduplication process. Basically, the repository ingests large amounts
> of content (a checkpoint is the equivalent of a full repo state) and, once it
> deduplicates the data, finds it already available in the destination
> repository.
> This happens because the diff mechanism cannot be efficient across
> repositories.
> For example: on the source repo we have the root state r0 and a checkpoint
> cp0 very close to r0. The diff(r0, cp0) is extremely cheap, measured in
> milliseconds. But what the sidegrade does is copy r0 to the destination
> repository (r0 -> rx1) and then run diff(rx1, cp0), which becomes very
> expensive: as the 2 node states don't originate from the same repository,
> diffing falls back to a slow content-equals comparison. On top of that, the
> content is almost equal, so a huge number of cycles is wasted deduplicating
> data across the 2 repositories.
> I have no easy solution here other than looking into providing a diff
> mechanism that will compare the 2 local states, diff(r0, cp0), BUT apply the
> delta to the destination repository (apply it on rx1). I'm not sure how easy
> this will turn out to be, or whether it's worth the effort.
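
To illustrate the idea in the description (diff the two source-local states, then apply the delta on the destination copy), here is a minimal sketch in Python. It is not Oak code and does not reflect the attached patch: node states are modelled as plain nested dicts, and the names diff, apply_delta and the ("child" | "set" | "delete") operation tuples are hypothetical, chosen only for this example.

```python
from copy import deepcopy

_MISSING = object()  # sentinel for "key not present" (hypothetical helper)


def diff(before, after):
    """Return a nested delta that turns `before` into `after`.

    States are nested dicts: a dict value is a child node, anything
    else is a property value. Both states are assumed to be local,
    so unchanged subtrees are skipped by the identity/equality check.
    """
    delta = {}
    for key in before.keys() | after.keys():
        old = before.get(key, _MISSING)
        new = after.get(key, _MISSING)
        if old is new or old == new:
            continue  # unchanged subtree or property: nothing to copy
        if isinstance(old, dict) and isinstance(new, dict):
            delta[key] = ("child", diff(old, new))
        elif new is _MISSING:
            delta[key] = ("delete", None)
        else:
            delta[key] = ("set", deepcopy(new))
    return delta


def apply_delta(state, delta):
    """Apply a delta produced by diff() to `state` in place and return it."""
    for key, (op, payload) in delta.items():
        if op == "child":
            apply_delta(state[key], payload)
        elif op == "delete":
            del state[key]
        else:  # "set": add or overwrite a property or a whole subtree
            state[key] = payload
    return state


# Source repository: root state r0 and a checkpoint cp0 close to it.
r0 = {"content": {"a": 1, "b": 2}, "index": {"x": "y"}}
cp0 = {"content": {"a": 1, "b": 3}, "index": {"x": "y"}, "tmp": {}}

rx1 = deepcopy(r0)        # r0 has already been copied to the destination
delta = diff(r0, cp0)     # computed entirely on the source side
cpx0 = apply_delta(deepcopy(rx1), delta)  # checkpoint rebuilt on destination
assert cpx0 == cp0
```

Only the changed entries ("content/b" and the new "tmp" node here) travel to the destination, so the destination never has to compare rx1 against cp0 by content. Whether the same shape can be achieved with Oak's own diffing is exactly the open question raised above.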