Nathan, 

I dropped the pool size down to 500 and it's still the same story.  I also tried 
lowering the number of replicator processes to 1 per replicator.  Again, same 
thing.  

All the while, I keep an eye on how much memory beam.smp consumes during one 
replication "wave", and it never exceeds 2%.  So I'm reluctant to think that the 
OS is running out of memory.  It does seem like there's some sort of process 
contention, however.  The error code that the replicators report while 
trying to POST is 503, which I assume means the web server is 
unavailable.
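
For reference, the retry delays in the log excerpt quoted below (0.25 sec, 0.5 sec, 1 sec, 2 sec) look like plain doubling backoff. A minimal Python sketch of that pattern follows. It is only an illustration, not the replicator's actual code, and the function name and defaults are made up:

```python
def backoff_delays(initial=0.25, retries=4, factor=2.0):
    """Return the wait (in seconds) before each retry attempt.

    Hypothetical helper: the starting delay and the doubling factor are
    read off the log messages, not taken from CouchDB's source.
    """
    delays = []
    delay = initial
    for _ in range(retries):
        delays.append(delay)
        delay *= factor  # each retry waits twice as long as the previous one
    return delays


if __name__ == "__main__":
    print(backoff_delays())       # the four delays seen in the log
    print(sum(backoff_delays()))  # total time spent waiting across retries
```

Four retries at those delays add up to 3.75 seconds of waiting per request.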

Yes, I'm going to add filtering on top of this, and I think I'm going to need to 
do those in Erlang, although I'd like to try to avoid it first.

This is probably a dumb question: do I need to restart Couch after changing 
these settings?
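
In case it helps frame the question, one way to change settings on a running server is the `_config` HTTP API rather than editing local.ini by hand. The sketch below only builds the requests and sends nothing. The section names are an assumption about where each key lives in a CouchDB 1.x config, the values are the ones mentioned in this thread, and the helper name is made up. Whether a restart is still needed afterwards is exactly the part I am unsure about.

```python
import json

# Assumed section/key layout for a CouchDB 1.x install; check default.ini.
SETTINGS = {
    ("couchdb", "max_dbs_open"): 500,
    ("query_server_config", "os_process_limit"): 500,
    ("replicator", "http_connections"): 20000,
}


def config_requests(base_url, settings):
    """Build (method, url, body) tuples for PUT /_config/<section>/<key>.

    Hypothetical helper: nothing is sent over the network here.
    The _config API takes each value as a JSON-encoded string.
    """
    requests = []
    for (section, key), value in sorted(settings.items()):
        url = "%s/_config/%s/%s" % (base_url.rstrip("/"), section, key)
        requests.append(("PUT", url, json.dumps(str(value))))
    return requests


if __name__ == "__main__":
    for method, url, body in config_requests("http://localhost:5984", SETTINGS):
        print(method, url, body)
```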


On Feb 5, 2013, at 10:22 AM, Nathan Vander Wilt <[email protected]> 
wrote:

> Hi Stephen,
> 
> I've been doing some tests related to replication lately too 
> (continuous+filtered in my case). I suspect the reason Futon hangs is because 
> your whole VM is running out of RAM due to your very high os_process_limit. I 
> went into a bit more detail in 
> http://mail-archives.apache.org/mod_mbox/couchdb-dev/201302.mbox/%[email protected]%3e
>  but this setting basically determines the size of the couchjs worker pool — 
> you'd probably rather have a bit of contention for the pool at a reasonable 
> size (maybe ~100 per GB free, tops?) than start paging.
> 
> hth,
> -natevw
> 
> 
> 
> On Feb 4, 2013, at 5:15 PM, Stephen Bartell wrote:
> 
>> Hi all,
>> 
>> I'm hitting some limits while replicating, and I'm hoping someone could advise. 
>>  
>> I'm running this in a VM on my MacBook with the following allocated resources:
>> Ubuntu 11.04
>> 4 cores @ 2.3 GHz
>> 8 GB memory
>> 
>> I'm doing a one-to-many replication.  
>> 1) I create one db named test. 
>> 2) Then create [test_0 .. test_99] databases.  
>> 3) I then set up replications from test -> [test_0 .. test_99].  100 
>> replications total.
>> 4) I finally go to test and create a doc, hit save.
>> 
>> When I hit save, Futon becomes completely unresponsive for around 10 sec.  It 
>> eventually returns to normal behavior.
>> 
>> Tailing the couchdb log I find waves of the following errors:
>> [Tue, 05 Feb 2013 00:46:26 GMT] [info] [<0.6936.1>] Retrying POST request to 
>> http://admin:*****@localhost:5984/test_25/_revs_diff in 1.0 seconds due to 
>> error {code,503}
>> 
>> I see that the replicator is finding the server to be unresponsive.  The 
>> waves of these messages show that the replicator retries in 0.25 sec, then 0.5 
>> sec, then 1 sec, then 2 sec.  This is expected.  Everything settles down after 
>> about 4 retries.  
>> 
>> So my first thought is resource limits.  I threw the book at it and set:
>> 1) max_dbs_open: 500
>> 2) os_process_limit: 5000
>> 3) http_connections: 20000
>> 4) ulimit -Sn 4096 (the hard limit is 4096)
>> 
>> I really don't know what's reasonable for these values relative to how many 
>> replications I am setting up.  So these values, save max_dbs_open, are all 
>> stabs in the dark.
>> 
>> No change in performance.
>> 
>> So, I'm at a loss now.  What can I do to get all this to work? Or what am I 
>> doing wrong?  And note that this is only a test.  I aim to quadruple the 
>> number of replications and have lots and lots of insertions on the so-called 
>> "test" database.  Actually, there will be several of these one-to-many 
>> databases.
>> 
>> I've heard people get systems up to thousands of dbs and replicators running 
>> just fine.  So I hope I'm just not offering the right sacrifices up to CouchDB 
>> yet.
>> 
>> Thanks for any insight,
>> 
>> sb
>> 
> 
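
For anyone who wants to reproduce the one-to-many setup described in the original message quoted above, the replication documents can be generated with a few lines of Python. This is a sketch under assumptions: it assumes the replications are continuous and are defined as documents (for example in the `_replicator` database), it uses bare database names where the log shows full URLs with credentials, and the function name and `_id` scheme are made up.

```python
def replication_docs(source="test", count=100, continuous=True):
    """Build one replication document per target database.

    Hypothetical helper: targets are named <source>_0 .. <source>_<count-1>,
    matching the test_0 .. test_99 layout described in the thread.
    """
    docs = []
    for i in range(count):
        target = "%s_%d" % (source, i)
        docs.append({
            "_id": "%s_to_%s" % (source, target),  # made-up naming scheme
            "source": source,
            "target": target,
            "continuous": continuous,  # assumption, not stated in the thread
        })
    return docs


if __name__ == "__main__":
    docs = replication_docs()
    print(len(docs), docs[0]["target"], docs[-1]["target"])
```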
