On Jul 9, 2013, at 11:09 AM, Robert Newson <[email protected]> wrote:

> If you didn’t have filters at all, but still had n^2 replications, you've 
> still got a scaling problem, it's just not directly related to the filtering 
> overhead.

Yes, I agree that filtering in CouchDB doesn’t cost significantly more CPU than 
not filtering :) and it’s likely cheaper once you include the savings from not 
transmitting the filtered-out revisions.

But if you _do_ filter heavily, so any one client is seeing only a small 
fraction of the total update traffic, the filtering overhead starts to dominate 
as the number of clients grows, because the server is still fetching, decoding, 
and running a JS function on (say) 100 or 1000 rejected documents for every one 
that does get sent. That’s a pretty typical scenario for a system with mobile 
or desktop clients — think of Exchange or SalesForce.com or Words With Friends; 
what fraction of the total server-side updates does any one client see?
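
To put rough numbers on that, here is a back-of-envelope sketch in Python. It
is a toy cost model, not a measurement of CouchDB: the function name and the
`one_in` parameter are made up for illustration, and it assumes every client
runs its own filtered replication over every update, with each client passing
one update in `one_in`.

```python
def filter_work(num_clients, num_updates, one_in):
    """Toy model: count filter evaluations vs. documents actually sent.

    Assumes each client's filtered replication evaluates the filter on
    every update, and that one update in `one_in` passes for that client.
    """
    evaluated = num_clients * num_updates   # filter calls on the server
    sent = evaluated // one_in              # revisions that pass a filter
    rejected = evaluated - sent             # fetched, decoded, then dropped
    return evaluated, sent, rejected


if __name__ == "__main__":
    evaluated, sent, rejected = filter_work(1000, 10000, 1000)
    print("filter calls:", evaluated)
    print("sent:", sent)
    print("rejected per sent:", rejected // sent)
```

In this model the rejected work grows with clients times updates, while the
useful output grows only with the updates each client actually cares about.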

The alternative is the hypothetical view-based filtering that’s been talked 
about here before, where the source db would iterate over a pre-filtered list 
of revisions from a view index rather than going through the entire by-sequence 
index. Or the actual-but-alpha-quality “channels” mechanism we’re using in the 
Couchbase Sync Gateway.
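
A minimal sketch of the difference, in Python. This is a toy in-memory model
and not how CouchDB or the Sync Gateway actually store anything: the `Source`
class, its method names, and the `channels` document field are all assumptions
made up for illustration. It contrasts scanning the whole by-sequence list
with a filter function against reading a pre-built per-channel list of
sequence numbers.

```python
from bisect import bisect_right
from collections import defaultdict


class Source:
    """Toy source db: a by-sequence list plus a per-channel index."""

    def __init__(self):
        self.by_seq = []                     # index i holds the doc with seq i+1
        self.by_channel = defaultdict(list)  # channel -> ascending seqs

    def update(self, doc):
        self.by_seq.append(doc)
        seq = len(self.by_seq)
        # The filtering decision is made once, at write time.
        for channel in doc.get("channels", []):
            self.by_channel[channel].append(seq)
        return seq

    def changes_filtered(self, since, predicate):
        """Scan everything after `since`, running the filter on each doc."""
        matches, examined = [], 0
        for offset, doc in enumerate(self.by_seq[since:]):
            examined += 1
            if predicate(doc):
                matches.append(since + offset + 1)
        return matches, examined

    def changes_indexed(self, since, channel):
        """Read only the pre-filtered sequence list for one channel."""
        seqs = self.by_channel.get(channel, [])
        matches = seqs[bisect_right(seqs, since):]
        return matches, len(matches)


if __name__ == "__main__":
    db = Source()
    for i in range(1000):
        db.update({"_id": "doc%d" % i, "channels": ["c%d" % (i % 100)]})
    print(db.changes_filtered(0, lambda d: "c7" in d["channels"])[1])
    print(db.changes_indexed(0, "c7")[1])
```

Both calls return the same sequence numbers; the difference is how many
documents the source has to touch to produce them.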

Anyway. I don’t mean to knock filtering in general, and in the OP’s case 
it sounds like the target databases are corporate customers rather than 
end-users, so there probably aren’t nearly as many of them as in the scenarios 
I’m talking about.

—Jens
