All replications should checkpoint periodically too, not just at the end. The 
log will show this as a PUT to a _local URL. 
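
You can read those checkpoint documents back to see where a replication 
stands. A minimal sketch (host, database name and checkpoint id are 
hypothetical; the real id appears in the log line):

    import requests

    db = "http://127.0.0.1:5984/userdb"
    checkpoint = "c0ebe9256695ff083347cbf95f93e280"  # read from the log

    doc = requests.get("%s/_local/%s" % (db, checkpoint)).json()
    # source_last_seq should be the last update sequence the replicator
    # confirmed on both source and target.
    print(doc.get("source_last_seq"))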

Sent from my iPhone

> On 26 May 2016, at 14:04, Paul Okstad <[email protected]> wrote:
> 
> I'll double check my situation since I have not thoroughly verified it. This 
> particular issue occurs between restarts of the server where I make no 
> changes to the continuous replications in the _replicator DB, but it may also 
> be related to the issue of too many continuous replications causing 
> replications to stall out from lack of resources. It's possible that I 
> assumed they were starting over from seq 1 when in fact they were never able 
> to complete a full replication in the first place.
> 
> -- 
> Paul Okstad
> 
>> On May 26, 2016, at 2:51 AM, Robert Newson <[email protected]> wrote:
>> 
>> There must be something else wrong. Filtered replications definitely make 
>> and resume from checkpoints, same as unfiltered.
>> 
>> We mix the filter code and parameters into the replication checkpoint id to 
>> ensure we start from 0 for a potentially different filtering. Perhaps you 
>> are changing those? Or maybe supplying since_seq as well (which overrides 
>> the checkpoint)?
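>> 
>> To illustrate the idea only (this is not the replicator's actual 
>> derivation, which mixes in more terms): think of the checkpoint id as a 
>> digest over the replication properties, so any change to the filter 
>> source or its query parameters produces a fresh id with no history 
>> behind it.
>> 
>>     import hashlib, json
>> 
>>     def checkpoint_id(source, target, filter_code, query_params):
>>         # Conceptual sketch only, not CouchDB's real algorithm.
>>         material = json.dumps([source, target, filter_code, query_params],
>>                               sort_keys=True)
>>         return hashlib.md5(material.encode("utf-8")).hexdigest()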
>> 
>> Sent from my iPhone
>> 
>>> On 25 May 2016, at 16:39, Paul Okstad <[email protected]> wrote:
>>> 
>>> This isn’t just a problem of filtered replication, it’s a major issue in 
>>> the database-per-user strategy (at least in the v1.6.1 I’m using). I’m also 
>>> using a database-per-user design with thousands of users and a single 
>>> global database. If even a small fraction of the users (hundreds) have 
>>> continuously running replications from their user DBs to the global DB, 
>>> it causes extremely high CPU utilization. This is without any JavaScript 
>>> replication filter function at all.
>>> 
>>> Another huge issue with filtered replications is that they lose their place 
>>> when replications are restarted. In other words, they don’t keep track of 
>>> sequence ID between restarts of the server or stopping and starting the 
>>> same replication. So for example, if I want to perform filtered replication 
>>> of public documents from the global DB to the public DB, and I have a ton 
>>> of documents in global, then each time I restart the filtered replication 
>>> it will begin from sequence #1. I’m guessing this is due to the fact that 
>>> CouchDB does not know if the filter function has been modified between 
>>> replications, but this behavior is still very disappointing.
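>>> 
>>> For reference, the kind of replication document I mean looks like this 
>>> (db and filter names are made up):
>>> 
>>>     import requests
>>> 
>>>     repl = {
>>>         "source": "global",
>>>         "target": "public",
>>>         "filter": "app/public",  # hypothetical design-doc filter
>>>         "continuous": True,
>>>         # "since_seq": 12345,    # if supplied, overrides the checkpoint
>>>     }
>>>     requests.put("http://127.0.0.1:5984/_replicator/global_to_public",
>>>                  json=repl, auth=("admin", "secret"))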
>>> 
>>> — 
>>> Paul Okstad
>>> http://pokstad.com
>>> 
>>> 
>>> 
>>>> On May 25, 2016, at 4:25 AM, Stefan Klein <[email protected]> wrote:
>>>> 
>>>> 2016-05-25 12:48 GMT+02:00 Stefan du Fresne <[email protected]>:
>>>> 
>>>> 
>>>> 
>>>>> So to be clear, this is effectively replacing replication— where the
>>>>> client negotiates with the server for a collection of changes to download—
>>>>> with a daemon that builds up a collection of documents that each client
>>>>> should get (and also presumably delete), which clients can then query for
>>>>> when they’re able?
>>>> 
>>>> Sorry, I didn't describe it well enough.
>>>> 
>>>> On the server side we have one big database containing all documents and
>>>> one DB for each user.
>>>> The clients always replicate to and from their individual user DB,
>>>> unfiltered, so the DB for a user is a 1:1 copy of their pouchdb/... on
>>>> their client.
>>>> 
>>>> Initially we set up a filtered replication for each user, from the
>>>> server's main database to the server copy of the user's database.
>>>> With this we ran into performance problems, and sooner or later we
>>>> probably would have run into issues with open file descriptors.
>>>> 
>>>> So what we do instead is listen to the changes feed of the main database
>>>> and distribute the documents to the server-side user DBs, which are then
>>>> synced with the clients.
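>>>> 
>>>> Roughly like this (a simplified sketch: host, db names and the routing
>>>> rule are placeholders, and rev/conflict handling is omitted):
>>>> 
>>>>     import json
>>>>     import requests
>>>> 
>>>>     COUCH = "http://127.0.0.1:5984"
>>>> 
>>>>     def owners_of(doc):
>>>>         # Placeholder: decide which users actually work with this doc.
>>>>         return doc.get("members", [])
>>>> 
>>>>     feed = requests.get(COUCH + "/maindb/_changes",
>>>>                         params={"feed": "continuous",
>>>>                                 "include_docs": "true",
>>>>                                 "since": "0", "heartbeat": "10000"},
>>>>                         stream=True)
>>>> 
>>>>     for line in feed.iter_lines():
>>>>         if not line:
>>>>             continue  # heartbeats arrive as empty lines
>>>>         change = json.loads(line)
>>>>         doc = change.get("doc")
>>>>         if doc is None or doc["_id"].startswith("_design/"):
>>>>             continue
>>>>         base = dict(doc)
>>>>         base.pop("_rev", None)
>>>>         for user in owners_of(doc):
>>>>             url = "%s/userdb-%s/%s" % (COUCH, user, doc["_id"])
>>>>             copy = dict(base)
>>>>             cur = requests.get(url)
>>>>             if cur.status_code == 200:
>>>>                 copy["_rev"] = cur.json()["_rev"]
>>>>             requests.put(url, json=copy)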
>>>> 
>>>> Note: this is only for documents the users actually work with (as in
>>>> possibly modify); for queries on the data we query views on the main
>>>> database.
>>>> 
>>>> For the way back, we listen to _db_updates, so we get an event for
>>>> changes on the user DBs, fetch the change from the user's DB and
>>>> determine what to do with it.
>>>> We do not replicate users' changes back to the main database but rather
>>>> have an internal API to evaluate all kinds of constraints on user input.
>>>> If you do not have to check user input, you could certainly listen to
>>>> _db_updates and "blindly" one-shot replicate from the changed DB to your
>>>> main DB.
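>>>> 
>>>> Something along these lines (a sketch; assumes _db_updates is enabled
>>>> and the user dbs are named "userdb-<name>"):
>>>> 
>>>>     import json
>>>>     import requests
>>>> 
>>>>     COUCH = "http://127.0.0.1:5984"
>>>> 
>>>>     feed = requests.get(COUCH + "/_db_updates",
>>>>                         params={"feed": "continuous"}, stream=True)
>>>> 
>>>>     for line in feed.iter_lines():
>>>>         if not line:
>>>>             continue
>>>>         event = json.loads(line)
>>>>         name = event.get("db_name", "")
>>>>         if event.get("type") == "updated" and name.startswith("userdb-"):
>>>>             # One-shot replication from the changed user db to main.
>>>>             requests.post(COUCH + "/_replicate",
>>>>                           json={"source": name, "target": "maindb"})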
>>>> 
>>>> -- 
>>>> Stefan
>> 
