Re: The state of filtered replication

Robert Newson Thu, 26 May 2016 02:52:24 -0700

There must be something else wrong. Filtered replications definitely make and 
resume from checkpoints, same as unfiltered.


We mix the filter code and parameters into the replication checkpoint id to 
ensure we start from 0 whenever the filtering may have changed. Perhaps you are 
changing those? Or maybe you are also supplying since_seq (which overrides the 
checkpoint)?
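
The Python sketch below is a simplified model of the behavior described above, not CouchDB's actual Erlang implementation. It assumes the checkpoint id is a hash over the source, target, filter code and query parameters, and that an explicit since_seq takes precedence over any stored checkpoint. `replication_id` and `start_seq` are hypothetical names used only for illustration.

```python
import hashlib
import json
from typing import Optional


def replication_id(source: str, target: str,
                   filter_code: Optional[str] = None,
                   query_params: Optional[dict] = None) -> str:
    """Hypothetical checkpoint id: a hash of everything that defines the replication.

    Changing the filter code or its parameters yields a new id, so no old
    checkpoint is found and the replication starts again from sequence 0.
    """
    key = json.dumps(
        [source, target, filter_code or "", query_params or {}],
        sort_keys=True,  # parameter order must not change the id
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def start_seq(checkpoints: dict, rep_id: str, since_seq=None):
    """Pick where a (re)started replication resumes from.

    An explicit since_seq wins over any stored checkpoint; otherwise the
    checkpoint recorded under rep_id is used, falling back to 0.
    """
    if since_seq is not None:
        return since_seq
    return checkpoints.get(rep_id, 0)
```

In this model, restarting an unchanged filtered replication resumes from its stored checkpoint. Only a changed filter, changed query parameters, or an explicit since_seq sends it back to the beginning.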

> On 25 May 2016, at 16:39, Paul Okstad <[email protected]> wrote:
> 
> This isn’t just a problem with filtered replication; it’s a major issue in the 
> database-per-user strategy (at least in v1.6.1, which I’m using). I’m also using 
> a database-per-user design with thousands of users and a single global 
> database. If even a small fraction of the users (hundreds) have continuously 
> ongoing replications from their user DB to the global DB, CPU utilization 
> becomes extremely high. This happens without any JavaScript filter function.
> 
> Another huge issue with filtered replications is that they lose their place 
> when they are restarted. In other words, they don’t keep track of the 
> sequence ID across server restarts or when the same replication is stopped 
> and started again. For example, if I want to perform a filtered replication of 
> public documents from the global DB to the public DB, and I have a ton of 
> documents in global, then each time I restart the filtered replication it 
> begins from sequence #1. I’m guessing this is because CouchDB cannot tell 
> whether the filter function was modified between replications, but this 
> behavior is still very disappointing.
> 
> — 
> Paul Okstad
> http://pokstad.com
>
>> On May 25, 2016, at 4:25 AM, Stefan Klein <[email protected]> wrote:
>> 
>> 2016-05-25 12:48 GMT+02:00 Stefan du Fresne <[email protected]>:
>> 
>>> So to be clear, this is effectively replacing replication (where the
>>> client negotiates with the server for a collection of changes to download)
>>> with a daemon that builds up a collection of documents that each client
>>> should get (and also presumably delete), which clients can then query for
>>> when they’re able?
>> 
>> Sorry, I didn't describe it well enough.
>> 
>> On the server side we have one big database containing all documents and one DB
>> for each user.
>> The clients always replicate to and from their individual userdb,
>> unfiltered, so each user's DB is a 1:1 copy of their pouchdb/... on
>> their client.
>> 
>> Initially we set up a filtered replication for each user from the server's main
>> database to the server copy of that user's database.
>> With this we ran into performance problems, and sooner or later we probably
>> would have run into issues with open file descriptors.
>> 
>> So what we do instead is listen to the changes of the main database and
>> distribute the documents to the server-side userdbs, which are then synced with
>> the clients.
>> 
>> Note: this is only for documents the users actually work with (as in
>> possibly modify); for queries on the data, we query views on the main
>> database.
>> 
>> For the way back, we listen to _dbchanges, so we get an event for
>> changes on the users' DBs, fetch that change from the user's DB and determine
>> what to do with it.
>> We do not replicate users' changes back to the main database but rather have
>> an internal API that evaluates all kinds of constraints on user input.
>> If you do not have to check user input, you could certainly listen to
>> _dbchanges and "blindly" one-shot replicate from the changed DB to your
>> main DB.
>> 
>> -- 
>> Stefan
>
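
Stefan Klein describes a fan-out where a daemon follows the main database's changes and copies documents into the per-user databases. A minimal Python sketch of the routing step follows, under stated assumptions. Rows are assumed to look like _changes rows fetched with include_docs=true. Each document is assumed to carry a hypothetical "owners" list naming the users who should receive it. The "userdb-" naming in user_db_name is also hypothetical. Fetching the feed and writing the batches (e.g. via _bulk_docs) are left out.

```python
from collections import defaultdict
from typing import Iterable


def user_db_name(username: str) -> str:
    # Hypothetical naming scheme; adjust to however your per-user DBs are named.
    return "userdb-" + username


def fan_out(rows: Iterable[dict]):
    """Route rows from the main DB's changes feed to per-user databases.

    Each row is assumed to look like {"seq": ..., "id": ..., "doc": {...}}.
    The "owners" field is a hypothetical convention naming the recipients.

    Returns (batches, last_seq): the docs to write per user DB, and the last
    sequence seen, which the daemon should persist as its own checkpoint.
    """
    batches = defaultdict(list)
    last_seq = None
    for row in rows:
        doc = row.get("doc") or {}
        for user in doc.get("owners", []):
            batches[user_db_name(user)].append(doc)
        last_seq = row["seq"]
    return dict(batches), last_seq
```

The returned last_seq acts as the daemon's own checkpoint. Persisting it and passing it as `since` on the next _changes request lets the daemon resume after a restart instead of rescanning the main database. Deletions need extra care: a tombstone created with a plain DELETE carries no "owners" field, so ownership would have to be tracked elsewhere.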

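Stefan also mentions the simpler option of "blindly" one-shot replicating each changed user DB into the main DB. The following is a minimal Python sketch of that decision step. It assumes update events shaped like {"db_name": ..., "type": "updated"}; the real shape depends on the feed you listen to. It also assumes a hypothetical "userdb-" prefix and a main database called "main", and it only builds request bodies rather than sending them. A body of the form {"source": ..., "target": ...} POSTed to CouchDB's /_replicate endpoint runs a one-shot replication when "continuous" is not set.

```python
from typing import Iterable, List


def replication_requests(events: Iterable[dict], main_db: str = "main",
                         user_prefix: str = "userdb-") -> List[dict]:
    """Turn database-update events into one-shot replication request bodies.

    The event shape, main_db and user_prefix are assumptions for illustration.
    Several updates to the same user DB in one batch collapse into a single
    replication request, in first-seen order.
    """
    seen = set()
    ordered = []
    for event in events:
        name = event.get("db_name", "")
        # Only react to content updates on per-user databases.
        if event.get("type") != "updated" or not name.startswith(user_prefix):
            continue
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    # No "continuous" key: each request describes a one-shot replication.
    return [{"source": name, "target": main_db} for name in ordered]
```

Each one-shot replication with the same source and target keeps its own checkpoint, so repeated runs for one user DB should only transfer what changed since the previous run. Where user input has to be validated, as in Stefan's setup, this step would instead fetch the change and pass it to your own checks.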