Hi Carlos,

One of the requirements for the system is to be able to work completely offline and sync when there's internet connectivity. Right, this is a task for a queue; I would use one if I could find an implementation addressing the offline replication use case.
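(A rough sketch of what delegating that offline-tolerant queueing to CouchDB's own replicator could look like: a document placed in the persistent `_replicator` database describes a continuous replication that CouchDB itself retries and resumes across connectivity gaps. Names, URLs and the `logs` database are hypothetical.)

```python
# Sketch: "offline queueing" delegated to CouchDB replication. A document
# in the _replicator database asks CouchDB to run a continuous replication
# and keep retrying it across network outages. device_id, the cloud URL and
# the local "logs" database name are all hypothetical.
import json

def replicator_doc(device_id, cloud_base="https://cloud.example.com"):
    """Build a _replicator document for one device's push replication."""
    return {
        "_id": f"push-{device_id}",
        "source": "logs",                              # local device DB
        "target": f"{cloud_base}/device_{device_id}",  # per-device cloud DB
        "continuous": True,                            # retry/resume forever
    }

print(json.dumps(replicator_doc("abc123"), indent=2))
```

Saving such a document into `_replicator` (rather than POSTing to `_replicate`) makes the replication survive server restarts, which is what you want for fire-and-forget sync.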
thanks,
--Vovan

> On Jun 21, 2017, at 1:33 AM, Carlos Alonso <[email protected]> wrote:
>
> Hi Vladimir.
>
> I suppose you have already evaluated this option, but just in case... To me
> this sounds like a task for a queue. Have you thought about having a queue
> as your central log storage (instead of the CouchDB cluster in the cloud you
> described)? To me it feels like a more natural use case. From your devices
> you publish new messages (log entries) into the queue, and then from your
> application you consume those messages to process them. The queue will
> delete the processed messages for you, so you won't need to take care of it
> (no rotating DBs, purging, ...). There are many different tools out there,
> so I guess you'll find one that provides your required durability/ordering
> guarantees out of the box.
>
> Hope it helps.
>
> On Tue, Jun 20, 2017 at 11:12 PM Stefan Klein <[email protected]> wrote:
>
>> Hi,
>>
>> commenting on individual topics inline, though I only have experience with
>> CouchDB 1.6; maybe CouchDB 2.0 behaves a bit differently.
>>
>> 2017-06-20 20:34 GMT+02:00 Vladimir Kuznetsov <[email protected]>:
>>
>>> Now, I know that in CouchDB documents are not really deleted, just
>>> marked as 'deleted', so the database will grow permanently. I have an
>>> option either to use periodic _purge (which I've heard may not be safe,
>>> especially in a clustered environment) or to implement this as a monthly
>>> rotating database (which is more complex, and I don't really want to
>>> follow that route).
>>
>> I think rotating databases are not that much more complex, but see below.
>>
>>> My questions are:
>>>
>>> - Is this a valid use case for CouchDB? I want to use it primarily
>>> because of its good replication capabilities, especially in unreliable
>>> environments with periods of being offline, etc. Otherwise I'll have to
>>> write the whole set of data sync APIs with buffering, retries, etc. myself.
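(The monthly-rotation idea discussed in the thread boils down to naming: route each log entry to a database named after its month, then delete whole old databases instead of purging individual documents. A minimal sketch of that naming logic, with a hypothetical `logs_` prefix and no actual CouchDB calls:)

```python
# Sketch of monthly rotating databases: pick a per-month database name for
# each log entry, and compute which old databases are safe to drop outright.
# Prefix and retention window are hypothetical; no CouchDB calls here.
from datetime import date

def monthly_db_name(day, prefix="logs"):
    """Database that a log entry written on `day` belongs to."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"

def expired_db_names(today, keep_months=3, prefix="logs", horizon=24):
    """Names of rotated databases old enough to delete as a whole."""
    names = []
    for age in range(keep_months + 1, horizon):
        y, m = today.year, today.month - age
        while m < 1:          # roll back across year boundaries
            m += 12
            y -= 1
        names.append(f"{prefix}_{y:04d}_{m:02d}")
    return names

print(monthly_db_name(date(2017, 6, 20)))  # logs_2017_06
```

Deleting a whole database is cheap and reclaims all space at once, which is the main appeal over `_purge`.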
>> In our case, mobiles replicating to and from CouchDB, replication has
>> proven to be very, very reliable; it just works.
>> If I had to implement that on my own, it would have been much worse.
>>
>>> - Is it recommended practice to set up a chain of replications? Due to
>>> security considerations I want customer devices to each replicate to
>>> their own database in the cloud. Then I want those databases to replicate
>>> to a single central log database whose _changes I'd subscribe to. The
>>> reason is that it's easier for me to have a single _changes feed as the
>>> source rather than multiple databases.
>>
>> We do this: each user gets his own database, which we consider "outside";
>> we monitor these databases for changes and take appropriate actions. :)
>> On our server, continuous replications from thousands of customer
>> databases to a central database occupied too many connections [1] and
>> overall performance degraded, even if only "some" users were actually
>> active. We now listen for DB changes (_db_updates), start replications for
>> the DB in question and stop them again after a certain timeout; activity
>> obviously resets the timeout.
>>
>> If you want the single central log database only so that you have a single
>> changes feed (i.e. you don't need centralized views etc.), I would skip
>> the central database and just process _all_docs (or a view containing only
>> unprocessed log entries) of any database an "updated" event was triggered
>> on.
>>
>> If you go that route, you are halfway to rotating databases already, since
>> your backend no longer cares which database a change is triggered on.
>>
>>> - Is using _purge safe in my case? From the official docs I read: "In
>>> clustered or replicated environments it is very difficult to guarantee
>>> that a particular purged document has been removed from all replicas."
>>> I don't think this is a problem for me, as I primarily care about
>>> database size, so it shouldn't be critical if some documents fail to
>>> delete.
>>
>> _purge is the wrong tool for this job.
>> From my understanding, it's there as a last resort to get sensitive data
>> out of a DB.
>>
>> Regards,
>> Stefan
>>
>> [1] I think the main reason for this was actually the operating system,
>> but it was faster, easier and more future-proof to implement the described
>> solution than to tune the OS to handle the connections, at least for me.
>
> --
> Carlos Alonso
> Data Engineer
> Madrid, Spain
> [email protected]
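(Stefan's `_db_updates`-driven scheme, i.e. start a replication when a database shows activity and stop it after an idle timeout, can be sketched as plain timeout bookkeeping. `start_fn`/`stop_fn` stand in for real start/stop calls, and event delivery is simulated; all names are hypothetical.)

```python
# Sketch: instead of thousands of permanently-running continuous
# replications, start one when _db_updates reports activity on a database
# and stop it after an idle timeout. Pure bookkeeping; start/stop are
# injected stubs and timestamps are passed in explicitly for clarity.

class ReplicationManager:
    def __init__(self, start_fn, stop_fn, timeout=300.0):
        self.start_fn, self.stop_fn = start_fn, stop_fn
        self.timeout = timeout        # seconds of silence before stopping
        self.last_seen = {}           # db name -> timestamp of last update

    def on_db_updated(self, db, now):
        """Called for each 'updated' event on the _db_updates feed."""
        if db not in self.last_seen:
            self.start_fn(db)         # first activity: start replicating db
        self.last_seen[db] = now      # any activity resets the idle timer

    def reap_idle(self, now):
        """Stop replications idle for longer than the timeout."""
        for db in [d for d, t in self.last_seen.items()
                   if now - t >= self.timeout]:
            self.stop_fn(db)
            del self.last_seen[db]

started, stopped = [], []
mgr = ReplicationManager(started.append, stopped.append, timeout=300)
mgr.on_db_updated("device_abc", now=0)
mgr.on_db_updated("device_abc", now=200)  # activity resets the timer
mgr.reap_idle(now=400)                    # only 200s idle: keep running
mgr.reap_idle(now=600)                    # 400s idle: stop it
print(started, stopped)                   # ['device_abc'] ['device_abc']
```

This keeps the number of concurrent replications proportional to *active* databases rather than total databases, which is what resolved the connection-exhaustion problem described above.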
