I would certainly say "yes". This thread has given me food for thought about my design, in light of the current behaviour of CouchDB replication!
On 08/07/2013, at 11:53 PM, Paul Hirst <[email protected]> wrote:
> Actually the third option is the most compelling argument.
>
> * Replicate with target validate_doc_update that rejects "foo";
>
> If you changed the validate_doc_update function so it accepts "foo", would
> you expect "foo" to now arrive when you replicate again?
>
> I say no, because it feels more useful that way (and unless I'm really
> confused that's the current behaviour), but I can imagine other people might
> say yes.
>
> -----Original Message-----
> From: Jason Smith [mailto:[email protected]]
> Sent: 08 July 2013 08:46
> To: [email protected]
> Subject: Re: Purging documents and view invalidation
>
> I think the "official" behavior of purging and replicating is undefined.
>
> It is still "as if the document had never existed" because these three
> procedures are equivalent (modulo my own misunderstanding of the question or
> CouchDB's code)
>
> * Replicate all docs; purge doc "foo"; replicate again
> * Replicate with a source filter that blocks doc "foo"; replicate again
> * Replicate with target validate_doc_update that rejects "foo"; replicate
>   again
>
> Paul, when I think about it this way, it is not as fragile as I first
> thought. Replicating with a filter is pretty well-understood, and a
> post-replication purge should behave the same way.
>
>
>
> On Mon, Jul 8, 2013 at 2:38 PM, Steven Barlow <[email protected]> wrote:
>
>> OK, the thing is, I have been having some issues when I want to
>> re-replicate documents that have been previously purged. Thinking
>> about this some more, and reading some of the below thread, I suspect
>> that the replication probably is failing to always replicate documents
>> that have been purged, due to some stored sequence number. My
>> suspicion is, therefore, that at least as far as replication is
>> concerned, purging a document is not "as if the document had never
>> existed".
>>
>> I'm tempted to suggest that this is a bug with purging and replication?
>>
>>
>> On 08/07/2013, at 5:18 PM, Jason Smith <[email protected]> wrote:
>>
>>> Paul, I believe you are correct on both counts: It would not
>>> re-replicate but IMO it is a fragile thing to depend on.
>>>
>>> The database has a purge_seq value which tracks the number of
>>> purges. I do not recall if that is factored into the replication ID.
>>> It should be. If the target database has undergone a purge since you
>>> last met you have no idea what its state is. Note, the database name is
>>> relevant to the replication id, so simply copying foo.couch to
>>> bar.couch would trigger re-replicating the purged documents.
>>>
>>> To me, purging is as if a document had never existed. A replication should
>>> recreate it (unless you change your filter policy or validate_doc_update).
>>> This seems to be what Steven is doing so I think he is using it correctly.
>>> CouchDB purge is like Git rebase: sure it is dangerous, but that's because
>>> it is powerful; and sometimes power users need power tools.
>>>
>>>
>>> On Mon, Jul 8, 2013 at 2:04 PM, Paul Hirst <[email protected]> wrote:
>>>
>>>> Wouldn't the _local document which tracks replication prevent that?
>>>> Provided the source and destination databases don't change URL it should
>>>> pick up where it left off every time, and therefore never consider
>>>> the documents due for consideration unless they change. Are you
>>>> suggesting that it's rather fragile to rely on that?
>>>>
>>>> -----Original Message-----
>>>> From: Jason Smith [mailto:[email protected]]
>>>> Sent: 05 July 2013 16:23
>>>> To: [email protected]
>>>> Subject: Re: Purging documents and view invalidation
>>>>
>>>> If you do that, and you re-run replication (or potentially if you
>>>> use continuous replication) then those documents will be
>>>> re-replicated back to the remote site.
>>>> Purging is as if the document was never created at all. So
>>>> when replication runs, the couches will want to copy it from the "master"
>>>> source.
>>>>
>>>>
>>>> On Fri, Jul 5, 2013 at 8:12 PM, Steven Barlow <[email protected]> wrote:
>>>>
>>>>> Purged at the remote site. The master always contains the complete
>>>>> data set, the remote sites replicate partial data sets for their
>>>>> immediate needs, and then clean themselves up once the tasks are
>>>>> complete.
>>>>>
>>>>> On 05/07/2013, at 9:57 PM, Jason Smith <[email protected]> wrote:
>>>>>
>>>>>> On which database will you perform the purging?
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 5, 2013 at 6:52 PM, Steven Barlow <[email protected]> wrote:
>>>>>>
>>>>>>> Sorry if this is a tangent, but I wanted to pick up on the
>>>>>>> "rarely used in the wild" thread: I personally intend to use
>>>>>>> purge, because I have temporary partial (filtered) replications of a
>>>>>>> "master" database at remote sites. When the data has been consumed
>>>>>>> by the remote site, I figured I could purge it (to save space). Is this
>>>>>>> not a valid, or common use case for purging?
>>>>>>>
>>>>>>> On 05/07/2013, at 7:21 PM, Jason Smith <[email protected]> wrote:
>>>>>>>
>>>>>>>> I slightly disagree with Bob, but he is right that all purge
>>>>>>>> buys you (vs. filtered replication and then swapping DBs) is a
>>>>>>>> little bit of uptime.
>>>>>>>> Purge is not "untested" but it is rarely used in the wild, so
>>>>>>>> the cost/benefit for your uptime is something between "risky"
>>>>>>>> and "unknown."
>>>>>>>>
>>>>>>>> (For me, personally, I would purge.)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 5, 2013 at 3:31 PM, Robert Newson <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Paul,
>>>>>>>>>
>>>>>>>>> If you replicate this database to another database and use a
>>>>>>>>> filter that blocks deleted documents, the target will not
>>>>>>>>> contain a trace of your 100 million deletes (that is, you can
>>>>>>>>> build a new database without cruft without messing with your
>>>>>>>>> existing database). During the replication, you can query the
>>>>>>>>> view on the target to build it incrementally, or wait till the
>>>>>>>>> end, query it once and wait for completion. At the end, flip
>>>>>>>>> your app to look at the new database instead.
>>>>>>>>>
>>>>>>>>> The _purge feature is really only for the case where you
>>>>>>>>> accidentally write your root password down in a document id or
>>>>>>>>> something (since compaction will sweep away old document
>>>>>>>>> contents). I advise against using it for any other reason.
>>>>>>>>>
>>>>>>>>> B.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5 July 2013 09:17, Jason Smith <[email protected]> wrote:
>>>>>>>>>> Hi, Paul. I wrote up some thoughts on purging here:
>>>>>>>>>> https://github.com/iriscouch/cqs#purging-couchdb
>>>>>>>>>>
>>>>>>>>>> Note, that procedure is untested. It works as a thought
>>>>>>>>>> experiment only.
>>>>>>>>>>
>>>>>>>>>> The procedure looks complicated, but all you will need is the
>>>>>>>>>> core purge, view, purge, view, etc. cadence as described in
>>>>>>>>>> Damien's email I linked to.
>>>>>>>>>> As long as you never purge twice before hitting the view, you
>>>>>>>>>> are fine.
>>>>>>>>>> Again, to my knowledge, the purge code is less well tested
>>>>>>>>>> than other parts of CouchDB, so perhaps copy your .couch file
>>>>>>>>>> and try with that until you are confident.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 5, 2013 at 2:37 PM, Paul Hirst <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I would like to purge a few (~100 million) documents from my
>>>>>>>>>>> database.
>>>>>>>>>>> I've been going through deleting them all, and that'll be
>>>>>>>>>>> complete in the next few days but I would like to free up some
>>>>>>>>>>> extra space by purging them also.
>>>>>>>>>>>
>>>>>>>>>>> My concern is around a comment on the wiki page here
>>>>>>>>>>> http://wiki.apache.org/couchdb/Purge_Documents
>>>>>>>>>>>
>>>>>>>>>>> 'If you have purged more than one document between querying
>>>>>>>>>>> your views, you will find that they will rebuild from scratch.'
>>>>>>>>>>>
>>>>>>>>>>> Since I have already deleted the documents I know they
>>>>>>>>>>> aren't showing up in the view any longer. Is there any way I can
>>>>>>>>>>> avoid this view invalidation? (My views take about 10 days to
>>>>>>>>>>> build from scratch so I can't afford the hit).
>>>>>>>>>>>
>>>>>>>>>>> I have a replica of the database. I could do the purge on
>>>>>>>>>>> the replica, wait for the view to rebuild, switch over, purge on
>>>>>>>>>>> the original db, wait for the view, switch back, unless there are
>>>>>>>>>>> any obvious problems with this approach?
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Paul
>>>>>>>>>>>
>>>>>>>>>>> ________________________________
>>>>>>>>>>>
>>>>>>>>>>> Sophos Limited, The Pentagon, Abingdon Science Park,
>>>>>>>>>>> Abingdon, OX14 3YP, United Kingdom.
>>>>>>>>>>> Company Reg No 2096520. VAT Reg No GB 991 2418 08.
>>>>
>>>> ________________________________
>>>>
>>>> Sophos Limited, The Pentagon, Abingdon Science Park, Abingdon, OX14
>>>> 3YP, United Kingdom.
>>>> Company Reg No 2096520.
>>>> VAT Reg No GB 991 2418 08.
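
For anyone reading this thread later, the checkpoint behaviour discussed above can be illustrated with a toy model. This is only a sketch under loud assumptions: it is not CouchDB code, and ToyDB, replicate, accept and the integer checkpoint are hypothetical names standing in for a database, a replication run, a filter or validate_doc_update policy, and the sequence stored in the _local replication document. It shows why a purged document does not come back while the checkpoint survives, and why it does come back once the checkpoint is lost (for example, when the replication id changes).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyDB:
    """Hypothetical stand-in for a database: documents plus a change log."""
    docs: Dict[str, dict] = field(default_factory=dict)
    changes: List[str] = field(default_factory=list)  # doc ids, in update order

    def put(self, doc_id: str, body: dict) -> None:
        self.docs[doc_id] = body
        self.changes.append(doc_id)

    def purge(self, doc_id: str) -> None:
        # In this model a purge removes the document and logs no change.
        self.docs.pop(doc_id, None)


def replicate(source: ToyDB, target: ToyDB, checkpoint: int = 0,
              accept: Callable[[str], bool] = lambda doc_id: True) -> int:
    """Copy source changes recorded after `checkpoint`; return the new checkpoint."""
    for doc_id in source.changes[checkpoint:]:
        if doc_id in source.docs and accept(doc_id):
            target.put(doc_id, source.docs[doc_id])
    return len(source.changes)


master, remote = ToyDB(), ToyDB()
master.put("foo", {"v": 1})
master.put("bar", {"v": 1})

seq = replicate(master, remote)        # first run copies both documents
remote.purge("foo")                    # clean up at the remote site
seq = replicate(master, remote, seq)   # resumes from the stored checkpoint
print(sorted(remote.docs))             # "foo" has not been copied again

replicate(master, remote, 0)           # checkpoint lost: start from scratch
print(sorted(remote.docs))             # "foo" is copied again
```

In this model, passing accept=lambda doc_id: doc_id != "foo" to a fresh replication leaves the target in the same state as replicating everything and then purging "foo", which is the equivalence Jason describes. Whether real CouchDB behaves this way in every case is exactly what the thread leaves open.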
