Hi Vladimir. I suppose you have already evaluated this option, but just in case: to me this sounds like a task for a queue. Have you thought about using a queue as your central log storage (instead of the CouchDB cluster in the cloud you described)? It feels like a more natural fit for the use case. From your devices you publish new messages (log entries) into the queue, and from your application you consume those messages to process them. The queue deletes processed messages for you, so you won't need to take care of that yourself (no rotating DBs, purging, ...). There are many different tools out there, so I'd guess you'll find one that provides your required durability/ordering guarantees out of the box.
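For illustration, the publish/consume flow could look roughly like this. This is a minimal in-process sketch using Python's stdlib queue.Queue as a stand-in for a real broker (RabbitMQ, Kafka, SQS, ...); the names `log_queue`, `publish` and `consume_all` are made up for the example:

```python
import json
import queue

# Stand-in for a durable message broker (RabbitMQ, Kafka, SQS, ...).
log_queue = queue.Queue()

def publish(device_id, message):
    """Device side: push a log entry onto the queue."""
    entry = {"device": device_id, "msg": message}
    log_queue.put(json.dumps(entry))

def consume_all(handler):
    """Application side: drain and process pending entries.
    Once consumed, an entry is gone -- no purge or rotation needed."""
    processed = 0
    while True:
        try:
            raw = log_queue.get_nowait()
        except queue.Empty:
            break
        handler(json.loads(raw))
        processed += 1
    return processed

# Usage sketch
publish("device-42", "temperature spike")
publish("device-42", "reboot")
count = consume_all(lambda e: print(e["device"], e["msg"]))
```

A real broker would add the durability and ordering guarantees mentioned above; the point is that deletion of processed messages comes for free.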
Hope it helps.

On Tue, Jun 20, 2017 at 11:12 PM Stefan Klein <[email protected]> wrote:

> Hi,
>
> commenting on individual topics inline, though I only have experience with
> CouchDB 1.6; maybe CouchDB 2.0 behaves a bit differently.
>
> 2017-06-20 20:34 GMT+02:00 Vladimir Kuznetsov <[email protected]>:
>
> > Now, I know that in CouchDB documents are not really deleted, just
> > marked as 'deleted', so the database will grow permanently. I have the
> > option either to use periodic _purge (which I've heard may be unsafe,
> > especially in a clustered environment) or to implement this as a monthly
> > rotating database (which is more complex, and I don't really want to
> > follow that route).
>
> I think rotating databases are not that much more complex, but see below.
>
> > My questions are:
> >
> > - Is this a valid use case for CouchDB? I want to use it primarily
> > because of its good replication capabilities, especially in unreliable
> > environments with periods of being offline etc. Otherwise I'll have to
> > write the whole set of data sync APIs with buffering, retries etc.
> > myself.
>
> In our case, mobiles replicating to and from CouchDB, replication has
> proven to be very, very reliable; it just works.
> If I had to implement that on my own it would have been much, much worse.
>
> > - Is it recommended practice to set up a chain of replication? Due to
> > security considerations I want customer devices to each replicate to
> > their own database in the cloud. Then I want those databases to
> > replicate to a single central log database whose _changes feed I'd
> > subscribe to. The reason is that it's easier for me to have a single
> > source of _changes rather than multiple databases.
>
> We do this: each user gets his own database, which we consider "outside";
> we monitor these databases for changes and take appropriate actions.
> :)
>
> On our server, continuous replications from thousands of customer
> databases to a central database occupied too many connections [1] and
> overall performance degraded, even if only "some" users were actually
> active. We now listen to database changes (_db_updates), start
> replications for the database in question and stop them again after a
> certain timeout; activity obviously resets the timeout.
>
> If you consider the single central log database only so you have a single
> changes feed (you don't need centralized views etc.), I would skip the
> central database and just process _all_docs (or a view containing only
> unprocessed log entries) of any database an "updated" event was triggered
> for.
>
> If you go that route, you are halfway to rotating databases already:
> your backend no longer cares which database a change is triggered on.
>
> > - Is using _purge safe in my case? From the official docs I read: "In
> > clustered or replicated environments it is very difficult to guarantee
> > that a particular purged document has been removed from all replicas."
> > I don't think this is a problem for me, as I primarily care about
> > database size, so it shouldn't be critical if some documents fail to
> > delete.
>
> _purge is the wrong tool for this job.
> From my understanding it's there as a last resort to get sensitive data
> out of a DB.
>
> Regards,
> Stefan
>
> [1] I think the main reason for this was actually the operating system,
> but it was faster, easier and more future-proof to implement the described
> solution than to tune the OS to handle the connections, at least for me.
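Stefan's scheme of watching _db_updates and starting/stopping per-database replications on an idle timeout could be sketched like this. This is a hedged illustration: the `start_fn`/`stop_fn` callbacks stand in for actual calls to CouchDB's replication API (e.g. POSTing to `/_replicate`), and the class name is invented for the example:

```python
import time

class ReplicationManager:
    """Start a replication when a database reports activity on the
    _db_updates feed, and stop it after `timeout` seconds of silence."""

    def __init__(self, start_fn, stop_fn, timeout=300):
        self.start_fn = start_fn    # e.g. POST a replication for this DB
        self.stop_fn = stop_fn      # e.g. cancel that replication
        self.timeout = timeout
        self.last_seen = {}         # db name -> last activity timestamp

    def on_db_update(self, db, now=None):
        """Call for each 'updated' event from the _db_updates feed."""
        now = time.time() if now is None else now
        if db not in self.last_seen:
            self.start_fn(db)       # first activity: start replicating
        self.last_seen[db] = now    # any activity resets the timeout

    def expire(self, now=None):
        """Stop replications for databases that have been idle too long;
        call this periodically."""
        now = time.time() if now is None else now
        for db, seen in list(self.last_seen.items()):
            if now - seen >= self.timeout:
                self.stop_fn(db)
                del self.last_seen[db]
```

In a real deployment `start_fn` would submit a replication with the customer database as source, and on CouchDB 2.x one would more likely create/delete documents in the `_replicator` database than use transient `/_replicate` calls.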
--
Carlos Alonso
Data Engineer, Cabify
Madrid, Spain
[email protected]
