Hi Stefan,

Thanks for pointing to the /_db_updates feature, I wasn't aware of it.
Yes, my only need for a central DB is to simplify polling for changes; no central views etc. I'll see if I can efficiently poll for changes on multiple databases by listening on the /_db_updates feed:

- on each new entry where type is 'updated' and db_name starts with some common prefix for DBs I want to track, subscribe to the _changes feed of that db_name
- stop listening to _changes after some period of inactivity

If this works, I think rotating databases are not a big deal, as I don't actually care which databases I listen to; I just pick whichever ones I see update notifications from.

Thanks one more time.
--Vovan

P.S. After a little more reading I see that _purge may not be the right tool. I was also a little confused in thinking it's invoked the same way _compact is, i.e. once for the entire database. That is not the case: I have to pass a document id with all its revisions as a parameter, handle conflicts, etc., which makes things more complicated.

thanks,
--Vovan

> On Jun 20, 2017, at 2:12 PM, Stefan Klein <[email protected]> wrote:
>
> Hi,
>
> commenting on individual topics inline, though I only have experience with
> CouchDB 1.6; maybe CouchDB 2.0 behaves a bit differently.
>
> 2017-06-20 20:34 GMT+02:00 Vladimir Kuznetsov <[email protected]>:
>
>>
>> Now, I know that in CouchDB documents are not really deleted, just marked
>> as 'deleted', so the database will grow permanently. I have the option
>> either to use periodic _purge (which I heard may not be safe, especially
>> in a clustered environment) or to implement this as a monthly rotating
>> database (which is more complex, and I don't really want to follow that
>> route).
>>
>
> I think rotating databases are not that much more complex, but see below.
>
>
>> My questions are:
>>
>> - Is this a valid use case for CouchDB? I want to use it primarily because
>> of its good replication capabilities, especially in unreliable
>> environments with periods of being offline etc.
>> Otherwise I'll have to write the whole set of data sync APIs with
>> buffering, retries etc. myself.
>>
>
> In our case, mobiles replicating to and from CouchDB, replication has
> proven to be very, very reliable; it just works.
> If I had to implement that on my own it would have been much, much worse.
>
>
>> - Is it recommended practice to set up a chain of replication? Due to
>> security considerations I want each customer device to replicate to its
>> own database in the cloud. Then I want those databases to replicate to a
>> single central log database whose _changes feed I'd subscribe to. The
>> reason is that it's easier for me to have a single source of _changes
>> rather than multiple databases.
>>
>
> We do this: each user gets his own database, which we consider "outside";
> we monitor these databases for changes and take appropriate actions. :)
> On our server, continuous replications from thousands of customer databases
> to a central database occupied too many connections [1] and overall
> performance degraded, even if only "some" users were actually active. We
> now listen to DB changes (_db_updates), start replications for the DB in
> question and stop them again after a certain timeout; activity obviously
> resets the timeout.
>
> If you want the single central log database only so that you have a single
> changes feed (you don't need centralized views etc.), I would skip the
> central database and just process _all_docs (or a view containing only
> unprocessed log entries) of any database an "updated" event was triggered
> on.
>
> If you go that route, you are already halfway to rotating databases: your
> backend no longer cares which database a change is triggered on.
>
>
>> - Is using _purge safe in my case? From the official doc I read: "In
>> clustered or replicated environments it is very difficult to guarantee
>> that a particular purged document has been removed from all replicas".
>> I don't think this is a problem for me, as I primarily care about
>> database size, so it shouldn't be critical if some documents fail to
>> delete.
>>
>
> _purge is the wrong tool for this job.
> From my understanding it's there as a last resort to get sensitive data
> out of a DB.
>
> Regards,
> Stefan
>
> [1] I think the main reason for this was actually the operating system,
> but it was faster, easier, and more future-proof to implement the
> described solution than to tune the OS to handle the connections, at
> least for me.
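The polling scheme discussed in the thread (watch /_db_updates for 'updated' events on databases matching a prefix, then follow that database's _changes feed until it goes idle) could be sketched roughly as below. This is only an illustration, not anyone's actual code: the server URL, the database prefix, and the idle timeout are all assumed values, and it relies on CouchDB's continuous _db_updates and _changes feeds with the _changes `timeout` parameter (milliseconds of inactivity before the server closes the feed with a final `last_seq` line).

```python
# Sketch: fan out from /_db_updates to per-database _changes listeners.
# COUCH_URL, DB_PREFIX and IDLE_TIMEOUT are assumptions, not values from
# the thread.
import json
import threading
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumed server address
DB_PREFIX = "customer_"              # assumed common prefix for tracked DBs
IDLE_TIMEOUT = 60.0                  # seconds of inactivity before we stop


def wants_db(event):
    """True for 'updated' events on databases with the tracked prefix."""
    return (event.get("type") == "updated"
            and event.get("db_name", "").startswith(DB_PREFIX))


def follow_changes(db_name):
    """Read one database's continuous _changes feed until it goes idle."""
    url = (f"{COUCH_URL}/{db_name}/_changes?feed=continuous"
           f"&since=now&timeout={int(IDLE_TIMEOUT * 1000)}")
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            line = raw.strip()
            if not line:
                continue  # heartbeat newline, ignore
            change = json.loads(line)
            if "last_seq" in change:
                break     # server closed the feed after the idle timeout
            print(db_name, change.get("id"))  # process the change here


def follow_db_updates():
    """Read /_db_updates and start a listener per interesting database."""
    active = set()  # a real implementation would also remove finished DBs
    with urllib.request.urlopen(f"{COUCH_URL}/_db_updates?feed=continuous") as resp:
        for raw in resp:
            line = raw.strip()
            if not line:
                continue
            event = json.loads(line)
            if wants_db(event) and event["db_name"] not in active:
                active.add(event["db_name"])
                threading.Thread(target=follow_changes,
                                 args=(event["db_name"],),
                                 daemon=True).start()


if __name__ == "__main__":
    follow_db_updates()
```

Note that this sketch never removes a database from `active` when its listener times out; restarting listeners on renewed activity would need that bookkeeping.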

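For context on why _purge is more involved than _compact, as noted in the P.S.: _compact is one POST per database, while _purge must be given each document id together with the revisions to remove. A minimal sketch of the request body shape (the document id and revision string here are made up):

```python
# Sketch of the POST /{db}/_purge body: {doc_id: [rev, ...]}.
# The document id and revision below are hypothetical examples.
import json


def purge_body(doc_revs):
    """Build the JSON body for POST /{db}/_purge from {doc_id: [revs]}."""
    return json.dumps(doc_revs)


# You would first look up the revisions of each document to purge, then
# POST something like:
body = purge_body({"log-2017-06-01-abc": ["3-c50a32451890a3f1c3e061e9f8085abc"]})
```

This per-document, per-revision bookkeeping (plus conflict handling) is what makes _purge awkward as a routine space-reclamation tool, which matches Stefan's advice to treat it as a last resort.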