Hi Carlos,

One of the requirements for the system is to be able to work completely offline and sync when there's internet connectivity. Right, this is a task for a queue; I would use one if I could find an implementation addressing the offline replication use case.
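(A rough sketch of what delegating that offline-tolerant queueing to CouchDB's own replicator could look like: a document placed in the persistent `_replicator` database describes a continuous replication that CouchDB itself retries and resumes across connectivity gaps. Names, URLs and the `logs` database are hypothetical.)

```python
# Sketch: "offline queueing" delegated to CouchDB replication. A document
# in the _replicator database asks CouchDB to run a continuous replication
# and keep retrying it across network outages. device_id, the cloud URL and
# the local "logs" database name are all hypothetical.
import json

def replicator_doc(device_id, cloud_base="https://cloud.example.com"):
    """Build a _replicator document for one device's push replication."""
    return {
        "_id": f"push-{device_id}",
        "source": "logs",                              # local device DB
        "target": f"{cloud_base}/device_{device_id}",  # per-device cloud DB
        "continuous": True,                            # retry/resume forever
    }

print(json.dumps(replicator_doc("abc123"), indent=2))
```

Saving such a document into `_replicator` (rather than POSTing to `_replicate`) makes the replication survive server restarts, which is what you want for fire-and-forget sync.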
thanks,
--Vovan

> On Jun 21, 2017, at 1:33 AM, Carlos Alonso <[email protected]> wrote:
>
> Hi Vladimir.
>
> I suppose you have already evaluated this option, but just in case... To me
> this sounds like a task for a queue. Have you thought about having a queue
> as your central log storage (instead of the CouchDB cluster in the cloud you
> described)? To me it feels like a more natural use case. From your devices
> you publish new messages (log entries) into the queue, and then from your
> application you consume those messages to process them. The queue will
> delete the processed messages for you, so you won't need to take care of it
> (no rotating DBs, purging, ...). There are many different tools out there,
> so I guess you'll find one that provides your required durability/ordering
> guarantees out of the box.
>
> Hope it helps.
>
> On Tue, Jun 20, 2017 at 11:12 PM Stefan Klein <[email protected]> wrote:
>
>> Hi,
>>
>> commenting on individual topics inline, though I only have experience with
>> CouchDB 1.6; maybe CouchDB 2.0 behaves a bit differently.
>>
>> 2017-06-20 20:34 GMT+02:00 Vladimir Kuznetsov <[email protected]>:
>>
>>> Now, I know that in CouchDB documents are not really deleted, just
>>> marked as 'deleted', so the database will grow permanently. I have an
>>> option either to use periodic _purge (which I've heard may not be safe,
>>> especially in a clustered environment) or to implement this as a monthly
>>> rotating database (which is more complex, and I don't really want to
>>> follow that route).
>>
>> I think rotating databases are not that much more complex, but see below.
>>
>>> My questions are:
>>>
>>> - Is this a valid use case for CouchDB? I want to use it primarily
>>> because of its good replication capabilities, especially in unreliable
>>> environments with periods of being offline, etc. Otherwise I'll have to
>>> write the whole set of data sync APIs with buffering, retries, etc. myself.
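(The monthly-rotation idea discussed in the thread boils down to naming: route each log entry to a database named after its month, then delete whole old databases instead of purging individual documents. A minimal sketch of that naming logic, with a hypothetical `logs_` prefix and no actual CouchDB calls:)

```python
# Sketch of monthly rotating databases: pick a per-month database name for
# each log entry, and compute which old databases are safe to drop outright.
# Prefix and retention window are hypothetical; no CouchDB calls here.
from datetime import date

def monthly_db_name(day, prefix="logs"):
    """Database that a log entry written on `day` belongs to."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"

def expired_db_names(today, keep_months=3, prefix="logs", horizon=24):
    """Names of rotated databases old enough to delete as a whole."""
    names = []
    for age in range(keep_months + 1, horizon):
        y, m = today.year, today.month - age
        while m < 1:          # roll back across year boundaries
            m += 12
            y -= 1
        names.append(f"{prefix}_{y:04d}_{m:02d}")
    return names

print(monthly_db_name(date(2017, 6, 20)))  # logs_2017_06
```

Deleting a whole database is cheap and reclaims all space at once, which is the main appeal over `_purge`.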
>> In our case, mobiles replicating to and from CouchDB, replication has
>> proven to be very, very reliable; it just works.
>> If I had to implement that on my own, it would have been much worse.
>>
>>> - Is it recommended practice to set up a chain of replications? Due to
>>> security considerations I want customer devices to each replicate to
>>> their own database in the cloud. Then I want those databases to replicate
>>> to a single central log database whose _changes I'd subscribe to. The
>>> reason is that it's easier for me to have a single _changes feed as the
>>> source rather than multiple databases.
>>
>> We do this: each user gets his own database, which we consider "outside";
>> we monitor these databases for changes and take appropriate actions. :)
>> On our server, continuous replications from thousands of customer
>> databases to a central database occupied too many connections [1] and
>> overall performance degraded, even if only "some" users were actually
>> active. We now listen for DB changes (_db_updates), start replications for
>> the DB in question and stop them again after a certain timeout; activity
>> obviously resets the timeout.
>>
>> If you want the single central log database only so that you have a single
>> changes feed (i.e. you don't need centralized views etc.), I would skip
>> the central database and just process _all_docs (or a view containing only
>> unprocessed log entries) of any database an "updated" event was triggered
>> on.
>>
>> If you go that route, you are halfway to rotating databases already, since
>> your backend no longer cares which database a change is triggered on.
>>
>>> - Is using _purge safe in my case? From the official docs I read: "In
>>> clustered or replicated environments it is very difficult to guarantee
>>> that a particular purged document has been removed from all replicas."
>>> I don't think this is a problem for me, as I primarily care about
>>> database size, so it shouldn't be critical if some documents fail to
>>> delete.
>>
>> _purge is the wrong tool for this job.
>> From my understanding, it's there as a last resort to get sensitive data
>> out of a DB.
>>
>> Regards,
>> Stefan
>>
>> [1] I think the main reason for this was actually the operating system,
>> but it was faster, easier and more future-proof to implement the described
>> solution than to tune the OS to handle the connections, at least for me.
>
> --
> Carlos Alonso
> Data Engineer
> Madrid, Spain
> [email protected]
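(Stefan's `_db_updates`-driven scheme, i.e. start a replication when a database shows activity and stop it after an idle timeout, can be sketched as plain timeout bookkeeping. `start_fn`/`stop_fn` stand in for real start/stop calls, and event delivery is simulated; all names are hypothetical.)

```python
# Sketch: instead of thousands of permanently-running continuous
# replications, start one when _db_updates reports activity on a database
# and stop it after an idle timeout. Pure bookkeeping; start/stop are
# injected stubs and timestamps are passed in explicitly for clarity.

class ReplicationManager:
    def __init__(self, start_fn, stop_fn, timeout=300.0):
        self.start_fn, self.stop_fn = start_fn, stop_fn
        self.timeout = timeout        # seconds of silence before stopping
        self.last_seen = {}           # db name -> timestamp of last update

    def on_db_updated(self, db, now):
        """Called for each 'updated' event on the _db_updates feed."""
        if db not in self.last_seen:
            self.start_fn(db)         # first activity: start replicating db
        self.last_seen[db] = now      # any activity resets the idle timer

    def reap_idle(self, now):
        """Stop replications idle for longer than the timeout."""
        for db in [d for d, t in self.last_seen.items()
                   if now - t >= self.timeout]:
            self.stop_fn(db)
            del self.last_seen[db]

started, stopped = [], []
mgr = ReplicationManager(started.append, stopped.append, timeout=300)
mgr.on_db_updated("device_abc", now=0)
mgr.on_db_updated("device_abc", now=200)  # activity resets the timer
mgr.reap_idle(now=400)                    # only 200s idle: keep running
mgr.reap_idle(now=600)                    # 400s idle: stop it
print(started, stopped)                   # ['device_abc'] ['device_abc']
```

This keeps the number of concurrent replications proportional to *active* databases rather than total databases, which is what resolved the connection-exhaustion problem described above.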
