Hi Stefan,

Thanks for pointing to the /_db_updates feature, I wasn't aware of it.
Yes, my only need for a central DB is to simplify polling for changes; no central views etc. I'll see if I can efficiently poll for changes on multiple databases by listening on the /_db_updates feed:

- on each new entry where type is 'updated' and db_name starts with some common prefix for DBs I want to track, subscribe to the _changes feed of that db_name
- stop listening to _changes after some period of inactivity

If this works, I think rotating databases are not a big deal, as I don't actually care which databases I listen to; I just pick whichever ones I see update notifications from.

Thanks one more time.
--Vovan

P.S. After a little more reading I see that _purge may not be the right tool. I was also a little confused in thinking it's invoked the same way _compact is, i.e. once for the entire database. That is not the case: I have to pass a document id with all its revisions as a parameter, handle conflicts, etc., which makes things more complicated.

thanks,
--Vovan

> On Jun 20, 2017, at 2:12 PM, Stefan Klein <[email protected]> wrote:
>
> Hi,
>
> commenting on individual topics inline, though I only have experience with
> CouchDB 1.6; maybe CouchDB 2.0 behaves a bit differently.
>
> 2017-06-20 20:34 GMT+02:00 Vladimir Kuznetsov <[email protected]>:
>
>>
>> Now, I know that in CouchDB documents are not really deleted, just marked
>> as 'deleted', so the database will grow permanently. I have the option
>> either to use periodic _purge (which I heard may not be safe, especially
>> in a clustered environment) or to implement this as a monthly rotating
>> database (which is more complex, and I don't really want to follow that
>> route).
>>
>
> I think rotating databases are not that much more complex, but see below.
>
>
>> My questions are:
>>
>> - Is this a valid use case for CouchDB? I want to use it primarily because
>> of its good replication capabilities, especially in unreliable
>> environments with periods of being offline etc.
>> Otherwise I'll have to write the whole set of data sync APIs with
>> buffering, retries etc. myself.
>>
>
> In our case, mobiles replicating to and from CouchDB, replication has
> proven to be very, very reliable; it just works.
> If I had to implement that on my own it would have been much, much worse.
>
>
>> - Is it recommended practice to set up a chain of replication? Due to
>> security considerations I want each customer device to replicate to its
>> own database in the cloud. Then I want those databases to replicate to a
>> single central log database whose _changes feed I'd subscribe to. The
>> reason is that it's easier for me to have a single source of _changes
>> rather than multiple databases.
>>
>
> We do this: each user gets his own database, which we consider "outside";
> we monitor these databases for changes and take appropriate actions. :)
> On our server, continuous replications from thousands of customer databases
> to a central database occupied too many connections [1] and overall
> performance degraded, even if only "some" users were actually active. We
> now listen to DB changes (_db_updates), start replications for the DB in
> question and stop them again after a certain timeout; activity obviously
> resets the timeout.
>
> If you want the single central log database only so that you have a single
> changes feed (you don't need centralized views etc.), I would skip the
> central database and just process _all_docs (or a view containing only
> unprocessed log entries) of any database an "updated" event was triggered
> on.
>
> If you go that route, you are already halfway to rotating databases: your
> backend no longer cares which database a change is triggered on.
>
>
>> - Is using _purge safe in my case? From the official doc I read: "In
>> clustered or replicated environments it is very difficult to guarantee
>> that a particular purged document has been removed from all replicas".
>> I don't think this is a problem for me, as I primarily care about
>> database size, so it shouldn't be critical if some documents fail to
>> delete.
>>
>
> _purge is the wrong tool for this job.
> From my understanding it's there as a last resort to get sensitive data
> out of a DB.
>
> Regards,
> Stefan
>
> [1] I think the main reason for this was actually the operating system,
> but it was faster, easier, and more future-proof to implement the
> described solution than to tune the OS to handle the connections, at
> least for me.
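The polling scheme discussed in the thread (watch /_db_updates for 'updated' events on databases matching a prefix, then follow that database's _changes feed until it goes idle) could be sketched roughly as below. This is only an illustration, not anyone's actual code: the server URL, the database prefix, and the idle timeout are all assumed values, and it relies on CouchDB's continuous _db_updates and _changes feeds with the _changes `timeout` parameter (milliseconds of inactivity before the server closes the feed with a final `last_seq` line).

```python
# Sketch: fan out from /_db_updates to per-database _changes listeners.
# COUCH_URL, DB_PREFIX and IDLE_TIMEOUT are assumptions, not values from
# the thread.
import json
import threading
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumed server address
DB_PREFIX = "customer_"              # assumed common prefix for tracked DBs
IDLE_TIMEOUT = 60.0                  # seconds of inactivity before we stop


def wants_db(event):
    """True for 'updated' events on databases with the tracked prefix."""
    return (event.get("type") == "updated"
            and event.get("db_name", "").startswith(DB_PREFIX))


def follow_changes(db_name):
    """Read one database's continuous _changes feed until it goes idle."""
    url = (f"{COUCH_URL}/{db_name}/_changes?feed=continuous"
           f"&since=now&timeout={int(IDLE_TIMEOUT * 1000)}")
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            line = raw.strip()
            if not line:
                continue  # heartbeat newline, ignore
            change = json.loads(line)
            if "last_seq" in change:
                break     # server closed the feed after the idle timeout
            print(db_name, change.get("id"))  # process the change here


def follow_db_updates():
    """Read /_db_updates and start a listener per interesting database."""
    active = set()  # a real implementation would also remove finished DBs
    with urllib.request.urlopen(f"{COUCH_URL}/_db_updates?feed=continuous") as resp:
        for raw in resp:
            line = raw.strip()
            if not line:
                continue
            event = json.loads(line)
            if wants_db(event) and event["db_name"] not in active:
                active.add(event["db_name"])
                threading.Thread(target=follow_changes,
                                 args=(event["db_name"],),
                                 daemon=True).start()


if __name__ == "__main__":
    follow_db_updates()
```

Note that this sketch never removes a database from `active` when its listener times out; restarting listeners on renewed activity would need that bookkeeping.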

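For context on why _purge is more involved than _compact, as noted in the P.S.: _compact is one POST per database, while _purge must be given each document id together with the revisions to remove. A minimal sketch of the request body shape (the document id and revision string here are made up):

```python
# Sketch of the POST /{db}/_purge body: {doc_id: [rev, ...]}.
# The document id and revision below are hypothetical examples.
import json


def purge_body(doc_revs):
    """Build the JSON body for POST /{db}/_purge from {doc_id: [revs]}."""
    return json.dumps(doc_revs)


# You would first look up the revisions of each document to purge, then
# POST something like:
body = purge_body({"log-2017-06-01-abc": ["3-c50a32451890a3f1c3e061e9f8085abc"]})
```

This per-document, per-revision bookkeeping (plus conflict handling) is what makes _purge awkward as a routine space-reclamation tool, which matches Stefan's advice to treat it as a last resort.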