I'll double-check my situation since I have not thoroughly verified it. This particular issue occurs between restarts of the server, where I make no changes to the continuous replications in the _replicator DB, but it may also be related to the issue of too many continuous replications causing replications to stall out from lack of resources. It's possible that I assumed they were starting over from seq 1 when in fact they were never able to complete a full replication in the first place.
-- 
Paul Okstad

> On May 26, 2016, at 2:51 AM, Robert Newson <[email protected]> wrote:
> 
> There must be something else wrong. Filtered replications definitely make and
> resume from checkpoints, same as unfiltered.
> 
> We mix the filter code and parameters into the replication checkpoint id to
> ensure we start from 0 for a potentially different filtering. Perhaps you are
> changing those? Or maybe supplying since_seq as well (which overrides the
> checkpoint)?
> 
> Sent from my iPhone
> 
>> On 25 May 2016, at 16:39, Paul Okstad <[email protected]> wrote:
>> 
>> This isn’t just a problem of filtered replication, it’s a major issue in the
>> database-per-user strategy (at least in the v1.6.1 I’m using). I’m also
>> using a database-per-user design with thousands of users and a single global
>> database. If a small fraction of the users (hundreds) have continuously
>> ongoing replications from the user DB to the global DB, it will cause
>> extremely high CPU utilization. This is without any JavaScript replication
>> filter function.
>> 
>> Another huge issue with filtered replications is that they lose their place
>> when replications are restarted. In other words, they don’t keep track of
>> the sequence ID between restarts of the server or stopping and starting the
>> same replication. So for example, if I want to perform filtered replication
>> of public documents from the global DB to the public DB, and I have a ton of
>> documents in global, then each time I restart the filtered replication it
>> will begin from sequence #1. I’m guessing this is due to the fact that
>> CouchDB does not know if the filter function has been modified between
>> replications, but this behavior is still very disappointing.
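[Editor's note: the checkpoint behaviour described above can be sketched as follows. This is an illustrative model, not CouchDB's actual algorithm; the function name, the use of MD5, and the key layout are assumptions made for the example. The point it demonstrates is the one Robert states: the checkpoint id is derived from the endpoints plus the filter code and its parameters, so changing either produces a new checkpoint and the replication restarts from sequence 0.]

```python
import hashlib
import json

def checkpoint_id(source, target, filter_code=None, query_params=None):
    # Illustrative only: derive a stable id from everything that defines
    # the replication, including the filter and its parameters. Any change
    # to these inputs yields a different id, i.e. a fresh checkpoint.
    key = json.dumps([source, target, filter_code, query_params],
                     sort_keys=True)
    return hashlib.md5(key.encode("utf-8")).hexdigest()

same_a = checkpoint_id("global", "public", "function(doc){ return doc.public; }")
same_b = checkpoint_id("global", "public", "function(doc){ return doc.public; }")
changed = checkpoint_id("global", "public", "function(doc){ return doc.public; }",
                        {"tag": "news"})

assert same_a == same_b   # identical filter -> same checkpoint id, resume works
assert same_a != changed  # changed params -> new checkpoint id, restart from 0
```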
>> 
>> — 
>> Paul Okstad
>> http://pokstad.com <http://pokstad.com/>
>> 
>> 
>> 
>>> On May 25, 2016, at 4:25 AM, Stefan Klein <[email protected]> wrote:
>>> 
>>> 2016-05-25 12:48 GMT+02:00 Stefan du Fresne <[email protected]>:
>>> 
>>>> So to be clear, this is effectively replacing replication— where the
>>>> client negotiates with the server for a collection of changes to download—
>>>> with a daemon that builds up a collection of documents that each client
>>>> should get (and also presumably delete), which clients can then query for
>>>> when they’re able?
>>> 
>>> Sorry, I didn't describe it well enough.
>>> 
>>> On the server side we have one big database containing all documents and
>>> one DB for each user.
>>> The clients always replicate to and from their individual user DB,
>>> unfiltered. So the DB for a user is a 1:1 copy of their pouchdb/... on
>>> their client.
>>> 
>>> Initially we set up a filtered replication for each user from the server's
>>> main database to the server copy of the user's database.
>>> With this we ran into performance problems, and sooner or later we probably
>>> would have run into issues with open file descriptors.
>>> 
>>> So what we do instead is listen to the changes feed of the main database
>>> and distribute the documents to the per-user DBs on the server, which are
>>> then synced with the clients.
>>> 
>>> Note: this is only for documents the users actually work with (as in
>>> possibly modify); for queries on the data we query views on the main
>>> database.
>>> 
>>> For the way back, we listen to _db_updates, so we get an event for changes
>>> on the user DBs, fetch that change from the user's DB, and determine what
>>> to do with it.
>>> We do not replicate users' changes back to the main database but rather
>>> have an internal API to evaluate all kinds of constraints on user input.
>>> If you do not have to check user input, you could certainly listen to
>>> _db_updates and "blindly" one-shot replicate from the changed DB to your
>>> main DB.
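[Editor's note: the fan-out approach Stefan describes can be sketched as below. The helper name `route_change` and the `users_for_doc` routing rule are hypothetical; the "databases" are modeled as in-memory dicts so the sketch is self-contained. The real version would consume the main DB's changes feed (with docs included) and write into the per-user CouchDB databases instead of one filtered replication per user.]

```python
def route_change(change, user_dbs, users_for_doc):
    """Copy a changed document into each interested user's server-side DB.

    change        -- one entry from the main DB's changes feed, with the
                     document included (e.g. via include_docs=true)
    user_dbs      -- user id -> that user's database, modeled here as a
                     dict of doc id -> doc
    users_for_doc -- application-specific rule deciding which users
                     should receive the document
    """
    doc = change["doc"]
    for user in users_for_doc(doc):
        user_dbs.setdefault(user, {})[doc["_id"]] = doc

# Toy run against in-memory "databases":
user_dbs = {}
route_change({"doc": {"_id": "task:1", "owner": "alice", "text": "hi"}},
             user_dbs,
             lambda doc: [doc["owner"]])
assert "task:1" in user_dbs["alice"]
```

One daemon reading a single changes feed replaces hundreds of concurrent filtered replications, which is where the CPU and file-descriptor pressure came from.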
>>> 
>>> -- 
>>> Stefan
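[Editor's note: Stefan's closing suggestion, reacting to a database-update event by triggering a one-shot replication back to the main DB, can be sketched as follows. The helper name and event shape are illustrative; the request body matches what CouchDB's /_replicate endpoint accepts, and the actual HTTP POST is left out so the sketch stays self-contained.]

```python
import json

def one_shot_replication_body(event, main_db="main"):
    # `event` models one entry from CouchDB's /_db_updates feed,
    # e.g. {"db_name": "userdb-alice", "type": "updated"}.
    return json.dumps({
        "source": event["db_name"],
        "target": main_db,
        # Omitting "continuous": true makes this a one-shot replication:
        # it runs to the current sequence and exits, instead of holding
        # resources open for every user indefinitely.
    })

body = one_shot_replication_body({"db_name": "userdb-alice", "type": "updated"})
parsed = json.loads(body)
assert parsed == {"source": "userdb-alice", "target": "main"}
```

POSTing such a body to /_replicate on each update event keeps the reverse direction cheap: replications only run when a user DB actually changes.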
