Re: Checkpointing on read only databases

Calvin Metcalf Sun, 13 Apr 2014 18:48:16 -0700

Ooh, I didn't think of that. Yeah, UUIDs wouldn't hurt, though the more I
think about the rolling hash on revs, the more I like it.
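
For what it's worth, here is a rough sketch of the rolling hash idea from
Adam's mail quoted below. It is only an illustration under assumptions: the
changes feed is modelled as a plain list of (id, rev) pairs whose position is
the seq, SHA-256 is an arbitrary choice, and rolling_hashes / can_resume are
made-up names, not CouchDB or PouchDB APIs.

```python
import hashlib


def rolling_hashes(changes):
    """Return [(seq, hex_digest)], where each digest covers every
    (id, rev) pair up to and including that seq.

    `changes` is a hypothetical stand-in for the sequence index:
    a list of (doc_id, rev) tuples, with seq = position + 1.
    """
    digest = b""
    result = []
    for seq, (doc_id, rev) in enumerate(changes, start=1):
        h = hashlib.sha256()
        h.update(digest)  # chain in the hash of everything before
        # Length-prefix the id so ("ab", "c") and ("a", "bc") differ.
        h.update(f"{len(doc_id)}:{doc_id}:{rev}".encode("utf-8"))
        digest = h.digest()
        result.append((seq, digest.hex()))
    return result


def can_resume(source_changes, checkpoint):
    """True if the source history up to checkpoint['seq'] still hashes
    to the value recorded in the checkpoint document.

    A real index would store the hashes; this sketch recomputes them.
    """
    hashes = dict(rolling_hashes(source_changes))
    return hashes.get(checkpoint["seq"]) == checkpoint["hash"]
```

A source that was deleted and recreated, or restored from an older file,
would have to reproduce exactly the same history up to the checkpointed seq
to pass this check, which is what comparing only the {id, rev} at that seq
cannot guarantee.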



On Sun, Apr 13, 2014 at 6:00 PM, Adam Kocoloski <[email protected]> wrote:

> Yes, but then sysadmins have to be very very careful about restoring from
> a file-based backup. We run the risk that {uuid, seq} could be
> multi-valued, which diminishes its value considerably.
>
> I like the UUID in general -- we've added them to our internal shard files
> at Cloudant -- but on their own they're not a bulletproof solution for
> read-only incremental replications.
>
> Adam
>
> > On Apr 13, 2014, at 5:16 PM, Calvin Metcalf <[email protected]> wrote:
> >
> > I mean if you're going to add new features to couch you could just have the
> > db generate a random uuid on creation that would be different if it was
> > deleted and recreated
> >> On Apr 13, 2014 1:59 PM, "Adam Kocoloski" <[email protected]> wrote:
> >>
> >> Other thoughts:
> >>
> >> - We could enhance the authorization system to have a role that allows
> >> updates to _local docs but nothing else. It wouldn't make sense for
> >> completely untrusted peers, but it could give peace of mind to sysadmins
> >> trying to execute replications with the minimum level of access
> >> possible.
> >>
> >> - We could teach the sequence index to maintain a rolling hash of the
> >> {id,rev} pairs that comprise the database up to that sequence, record
> >> that in the replication checkpoint document, and check that it's
> >> unchanged on resume. It's a new API enhancement and it grows the amount
> >> of information stored with each sequence, but it completely closes off
> >> the probabilistic edge case associated with simply checking that the
> >> {id, rev} associated with the checkpointed sequence has not changed.
> >> Perhaps overkill for what is admittedly a pretty low-probability event.
> >>
> >> Adam
> >>
> >> On Apr 13, 2014, at 1:50 PM, Adam Kocoloski <[email protected]> wrote:
> >>
> >>> Yeah, this is a subtle little thing. The main reason we checkpoint on
> >>> both source and target and compare is to cover the case where the source
> >>> database is deleted and recreated in between replication attempts. If
> >>> that were to happen and the replicator just resumes blindly from the
> >>> checkpoint sequence stored on the target then the replication could
> >>> permanently miss some documents written to the new source.
> >>>
> >>> I'd love to have a robust solution for incremental replication of
> >>> read-only databases. To first order a UUID on the source database that
> >>> was fixed at create time could do the trick, but we'll run into trouble
> >>> with file-based backup and restores. If a database file is restored to
> >>> a point before the latest replication checkpoint we'd again be in a
> >>> position of potentially permanently missing updates.
> >>>
> >>> Calvin's suggestion of storing e.g. {seq, id, rev} instead of simply
> >>> seq as the checkpoint information would dramatically reduce the
> >>> likelihood of that type of permanent skip in the replication, but it's
> >>> only a probabilistic answer.
> >>>
> >>> Adam
> >>>
> >>>> On Apr 13, 2014, at 1:31 PM, Calvin Metcalf <[email protected]> wrote:
> >>>
> >>>> Though currently we have the opposite problem, right, if we delete the
> >>>> target db? (This is me brainstorming.)
> >>>>
> >>>> Could we store last rev in addition to last seq?
> >>>>> On Apr 13, 2014 1:15 PM, "Dale Harvey" <[email protected]> wrote:
> >>>>>
> >>>>> If the src database was to be wiped, when we restarted replication
> >>>>> nothing would happen until the source database caught up to the
> >>>>> previously written checkpoint
> >>>>>
> >>>>> create A, write 5 documents
> >>>>> replicate 5 documents A -> B, write checkpoint 5 on B
> >>>>> destroy A
> >>>>> recreate A, write 4 documents
> >>>>> replicate A -> B, pick up checkpoint from B and go to ?since=5
> >>>>> .. no documents written
> >>>>>
> >>>>> https://github.com/pouchdb/pouchdb/blob/master/tests/test.replication.js#L771
> >>>>> is our test that covers it
> >>>>>
> >>>>>
> >>>>> On 13 April 2014 18:02, Calvin Metcalf <[email protected]> wrote:
> >>>>>
> >>>>>> If we were to unilaterally switch to checkpoint on target what would
> >>>>>> happen, replications in progress would lose their place?
> >>>>>>> On Apr 13, 2014 11:21 AM, "Dale Harvey" <[email protected]> wrote:
> >>>>>>>
> >>>>>>> So with checkpointing we write the checkpoint to both A and B and
> >>>>>>> verify they match before using the checkpoint
> >>>>>>>
> >>>>>>> What happens if the src of the replication is read only?
> >>>>>>>
> >>>>>>> As far as I can tell couch will just throw a checkpoint_commit_error
> >>>>>>> and carry on from the start. The only improvement I can think of is
> >>>>>>> for the user to specify that they know the src is read only and to
> >>>>>>> use only the target checkpoint; we can 'possibly' make that happen
> >>>>>>> automatically if the src specifically fails the write due to
> >>>>>>> permissions.
> >>
> >>
>
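
Dale's five-document scenario quoted above can be played through with a toy
model. This is only a sketch under loud assumptions: Db and replicate are
hypothetical in-memory stand-ins (not the CouchDB or PouchDB replicator), seq
is simply the position in an append-only list, and the checkpoint lives on
the target only, as it would for a read-only source. With verify=True the
checkpoint's {seq, id, rev} is compared against the source before resuming,
along the lines Calvin suggested.

```python
class Db:
    """Toy source database: an append-only list of (id, rev) changes."""

    def __init__(self):
        self.changes = []  # seq is position + 1

    def put(self, doc_id, rev):
        self.changes.append((doc_id, rev))

    def since(self, seq):
        """Return [(seq, (id, rev))] for every change after `seq`."""
        return list(enumerate(self.changes, start=1))[seq:]


def replicate(source, target_docs, checkpoint, verify=False):
    """Copy changes to target_docs and return how many were copied.

    `checkpoint` is a dict stored on the target side only.
    """
    since = checkpoint.get("seq", 0)
    if verify and since:
        # Does the source still have the same (id, rev) at that seq?
        in_range = since <= len(source.changes)
        last = source.changes[since - 1] if in_range else None
        if last != (checkpoint["id"], checkpoint["rev"]):
            since = 0  # source looks different: start from scratch
    copied = 0
    for seq, (doc_id, rev) in source.since(since):
        target_docs[doc_id] = rev
        checkpoint.update(seq=seq, id=doc_id, rev=rev)
        copied += 1
    return copied


a = Db()
for i in range(5):
    a.put(f"doc{i}", "1-a")
b, checkpoint = {}, {}
first = replicate(a, b, checkpoint)           # copies 5, checkpoint seq=5

a = Db()                                      # destroy and recreate A
for i in range(4):
    a.put(f"new{i}", "1-b")
blind = replicate(a, b, checkpoint)           # ?since=5, nothing copied
checked = replicate(a, b, checkpoint, verify=True)  # mismatch, restarts
```

As Adam points out, the {seq, id, rev} check is still only probabilistic: a
recreated source that happens to carry the same id and rev at that seq would
slip through.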



-- 
-Calvin W. Metcalf
