[
https://issues.apache.org/jira/browse/OAK-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831877#comment-13831877
]
Jukka Zitting commented on OAK-1159:
------------------------------------
bq. but if there already is a folder where we have an unknown state, why not
use that as the reference?
It's much faster to compare states from the same store than to compare
states from two different stores, such as the main repository and the backup
store. For example, when two checkpoints contain no changes to a particular
subtree, the comparison can simply skip that subtree because the relevant
record identifiers are equal in both states. If the states come from different
stores, the identifiers don't match, so the comparison needs to traverse the
entire tree to see whether anything changed.
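To illustrate the shortcut, here is a minimal self-contained sketch. It does not use the Oak API: the Node class, the store names and the numeric record identifiers are all hypothetical stand-ins. It only counts how many nodes a comparison has to look into when the identifiers can or cannot be matched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SubtreeDiffSketch {

    /** Hypothetical stand-in for a stored node; not an Oak class. */
    static final class Node {
        final String storeId;  // which store holds the record
        final int recordId;    // record identifier within that store
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String storeId, int recordId) {
            this.storeId = storeId;
            this.recordId = recordId;
        }

        Node child(String name, Node node) {
            children.put(name, node);
            return this;
        }

        /** Identifiers are only comparable within one store. */
        boolean sameRecord(Node other) {
            return storeId.equals(other.storeId) && recordId == other.recordId;
        }
    }

    /** Returns how many nodes the comparison had to look into. */
    static int compare(Node before, Node after) {
        if (before.sameRecord(after)) {
            return 0; // same record: the whole subtree is unchanged, skip it
        }
        int visited = 1;
        for (Map.Entry<String, Node> entry : after.children.entrySet()) {
            Node base = before.children.get(entry.getKey());
            if (base != null) {
                visited += compare(base, entry.getValue());
            }
        }
        return visited;
    }

    /** Subtree "a" with one leaf; it is identical in the old and new state. */
    static Node unchangedSubtree(String store) {
        return new Node(store, 2).child("a1", new Node(store, 3));
    }

    /** Old state as it would look in the given store. */
    static Node oldState(String store) {
        return new Node(store, 1)
                .child("a", unchangedSubtree(store))
                .child("b", new Node(store, 4));
    }

    /** New state in the main store: only "b" (and therefore the root) changed. */
    static Node newState() {
        return new Node("main", 5)
                .child("a", unchangedSubtree("main"))
                .child("b", new Node("main", 6));
    }

    public static void main(String[] args) {
        // Same store: subtree "a" is skipped, only the root and "b" are visited.
        System.out.println(compare(oldState("main"), newState()));   // prints 2
        // Different stores: no identifier matches, every node is visited.
        System.out.println(compare(oldState("backup"), newState())); // prints 4
    }
}
```

In this toy tree the difference is only two nodes versus four, but the gap grows with the size of the unchanged subtrees.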
bq. where would I save the last seen&backed-up checkpoint and how do I make
sure that the state is still there?
A good approach would be, for example, to add an extra top-level node to the
backup store and keep track of the last checkpoint as a property of that node,
with the backed-up state stored as a subtree of the same node. Something like
this:
{code}
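// Open the backup store and load its current head state.
FileStore backup = new FileStore(...);
Journal root = backup.getJournal("root");
SegmentNodeState state = new SegmentNodeState(
        backup.getWriter().getDummySegment(), root.getHead());
SegmentRootBuilder builder = state.builder();

// "store" is not declared in this snippet; presumably the main repository's store.
// Look up the checkpoint that was backed up last time.
String checkpoint = state.getString("checkpoint");
NodeState before = store.retrieve(checkpoint);

// Create a new checkpoint in the main repository and retrieve its state.
checkpoint = store.checkpoint(...);
NodeState after = store.retrieve(checkpoint);

// Record the new checkpoint and apply the changes since the old one under "root".
builder.setProperty("checkpoint", checkpoint, Type.STRING);
after.compareAgainstBaseState(before, new ApplyDiff(builder.child("root")));

// One journal update publishes the property and the subtree together.
root.setHead(builder.getNodeState().getRecordId());
backup.close();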
{code}
Note that unless someone explicitly messes up the backup storage, the above
pattern guarantees that the "checkpoint" property and the backed-up state
stored under "root" will remain in sync, because the journal update is atomic.
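As a rough illustration of why a single atomic head update keeps the two values in sync, here is a small sketch that is independent of the Oak classes. The Head and AtomicHeadSketch types are hypothetical; an AtomicReference stands in for the journal head, and plain strings stand in for the checkpoint name and the backed-up state.

```java
import java.util.concurrent.atomic.AtomicReference;

class AtomicHeadSketch {

    /** Hypothetical immutable snapshot: checkpoint name and state travel together. */
    static final class Head {
        final String checkpoint;
        final String rootState;

        Head(String checkpoint, String rootState) {
            this.checkpoint = checkpoint;
            this.rootState = rootState;
        }
    }

    // Stand-in for the journal head of the backup store.
    private final AtomicReference<Head> head =
            new AtomicReference<>(new Head("checkpoint-1", "state-at-checkpoint-1"));

    Head getHead() {
        return head.get();
    }

    /** Replaces both values in one step; readers see the old pair or the new pair. */
    void setHead(Head updated) {
        head.set(updated);
    }

    /** Runs one backup round and returns the head that readers see afterwards. */
    static Head demo() {
        AtomicHeadSketch journal = new AtomicHeadSketch();
        journal.setHead(new Head("checkpoint-2", "state-at-checkpoint-2"));
        return journal.getHead();
    }

    public static void main(String[] args) {
        Head current = demo();
        System.out.println(current.checkpoint + " -> " + current.rootState);
    }
}
```

Because the checkpoint name and the state are never written separately, there is no intermediate point at which a reader could observe a new checkpoint paired with an old state.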
bq. what happens if the checkpoint gets recycled in the meantime?
In that case we could fall back to using the backed-up state as the base. It
will be slower, as mentioned above, but it lets us catch up with the latest
state without doing a full backup:
{code}
NodeState before = store.retrieve(checkpoint);
if (before == null) {
    // The checkpoint is gone; diff against the backed-up state instead.
    before = state.getChildNode("root");
}
{code}
bq. I'd see this ApplyDiff happen automagically in the FileStore
I'd prefer to keep the SegmentStore implementations as simple and
straightforward as possible, i.e. have as little custom processing inside them
as possible. If something can be done efficiently above the SegmentStore
interface, it generally should be done above it.
> Backup and restore
> ------------------
>
> Key: OAK-1159
> URL: https://issues.apache.org/jira/browse/OAK-1159
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: core, mk
> Reporter: Michael Marth
> Assignee: Alex Parvulescu
>
> We need a way to backup and restore a repository. I was thinking that the MK
> impl could expose an interface for this, as the actual implementation would
> differ quite a bit between e.g. TarMK and MongoMK.
> Also, I think we could leverage the MVCC nature of the MKs and mark a
> specific revision as "the revision to backup" (regardless of ongoing writes).
> That would allow us to prevent the ugly situation in JR2, that we need to
> stop writes for a while to produce a consistent backup.
> The restore in such a scenario would discard revisions that happened after
> said marker (but still made it into the backup).