Sorry to hear you are having trouble. It's odd that this started only in 1.5; otherwise I'd wonder whether it was related to https://issues.apache.org/jira/browse/COUCHDB-1874, but that's an older bug. I know I've had trouble with large replications hanging lately too, but haven't been able to track it down.
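
One thing that may help narrow it down: the exit_status of 137 in the quoted log looks like the usual Unix convention of 128 plus a signal number, which would mean couchjs was killed with signal 9 (SIGKILL). That is the signal the Linux OOM killer sends, so it fits the memory exhaustion described in the report. A rough Python sketch of that arithmetic follows; the function name describe_exit_status is made up for illustration, and it assumes a Unix-like system:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status using the shell-style 128+signal convention.

    Hypothetical helper for illustration only; not part of CouchDB.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 137 is the value reported in the couch_os_process crash log.
    print(describe_exit_status(137))
```

If that reading is right, the kernel log (dmesg or /var/log/syslog) should show a matching OOM-killer entry around the time of each crash.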
hth,
-nvw

On Nov 17, 2013, at 2:13 PM, Edward Levin <[email protected]> wrote:

> Hi,
>
> I recently compiled and installed CouchDB 1.5.0 on Ubuntu 12.04 with the
> intention of creating my own replica of the NPM registry. I followed the
> official NPM (https://github.com/isaacs/npmjs.org) and kicked off
> replication of the NPM database (50gb+):
>
> curl -X POST http://@10.0.0.34:5984/_replicate -d '{"source":"
> http://isaacs.iriscouch.com/registry/", "target":"registry",
> "continuous":true, "create_target":true}' -H "Content-Type:
> application/json"
>
> Replication was proceeding smoothly for a few days until the database
> reached 48gb. From this point forward replication crashes every time after
> a few minutes with the following error:
>
>
> [Sun, 17 Nov 2013 22:10:25 GMT] [error] [<0.609.0>] ** Generic server
> <0.609.0> terminating
> ** Last message in was {#Port<0.3446>,{exit_status,137}}
> ** When Server state == {os_proc,"/usr/local/bin/couchjs
> /usr/local/share/couchdb/server/main.js",
> #Port<0.3446>,
> #Fun<couch_os_process.2.132569728>,
> #Fun<couch_os_process.3.35601548>,5000}
> ** Reason for termination ==
> ** {exit_status,137}
>
> [Sun, 17 Nov 2013 22:10:29 GMT] [error] [<0.609.0>] {error_report,<0.31.0>,
> {<0.609.0>,crash_report,
> [[{initial_call,
> {couch_os_process,init,['Argument__1']}},
> {pid,<0.609.0>},
> {registered_name,[]},
> {error_info,
> {exit,
> {exit_status,137},
> [{gen_server,terminate,6},
> {proc_lib,init_p_do_apply,3}]}},
> {ancestors,
>
> [couch_query_servers,couch_secondary_services,
> couch_server_sup,<0.32.0>]},
> {messages,[]},
> {links,[<0.104.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,running},
> {heap_size,6765},
> {stack_size,24},
> {reductions,12152}],
> []]}}
>
>
>
> Further investigation revealed that after replication is invoked, process:
>
> /usr/lib/erlang/erts-5.8.5/bin/beam
>
> starts consuming all available system memory within a few minutes (1.5gb
> ram + 1gb swap) until a crash and an OOM
> error above.
>
> Result from _active_tasks prior to crash:
>
> [{"pid":"<0.397.0>","checkpointed_source_seq":729415,"continuous":true,"doc_id":null,"doc_write_failures":0,"docs_read":0,"docs_written":0,"missing_revisions_found":0,"progress":92,"replication_id":"42d81068841a085e7120226f0a010519+continuous+create_target","revisions_checked":982,"source":"
> http://isaacs.iriscouch.com/registry/
> ","source_seq":787280,"started_on":1384725566,"target":"registry","type":"replication","updated_on":1384725571}]
>
> Tried reducing worker_processes to 1 and worker_batch_size to 100 with no
> effect.
>
> At this point not sure if this behavior might be due to a memory leak,
> insufficient resources, or a misconfiguration.
>
> Any help would be appreciated,
>
> Ed
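
For anyone comparing runs, the _active_tasks output quoted above can be boiled down to how far the last checkpoint trails the source. The sketch below is only an illustration: the function name replication_lag is hypothetical, and SAMPLE is a trimmed copy of the fields from the post rather than a live response.

```python
import json

# Trimmed copy of the _active_tasks entry quoted in the post.
SAMPLE = json.dumps([{
    "type": "replication",
    "replication_id": "42d81068841a085e7120226f0a010519+continuous+create_target",
    "checkpointed_source_seq": 729415,
    "source_seq": 787280,
    "docs_read": 0,
    "docs_written": 0,
    "revisions_checked": 982,
    "progress": 92,
}])


def replication_lag(active_tasks_json: str):
    """Return (replication_id, sequences_behind, docs_written) per replication.

    Hypothetical helper; it only does arithmetic on fields already present
    in the _active_tasks response.
    """
    report = []
    for task in json.loads(active_tasks_json):
        if task.get("type") != "replication":
            continue  # skip compaction, indexer and other task types
        behind = task["source_seq"] - task["checkpointed_source_seq"]
        report.append((task["replication_id"], behind, task["docs_written"]))
    return report


if __name__ == "__main__":
    for rep_id, behind, written in replication_lag(SAMPLE):
        print(f"{rep_id}: {behind} sequences behind, {written} docs written")
```

With the numbers from the post, the checkpoint is 57865 sequences behind the source, and docs_written is still 0 even though 982 revisions were checked, so the crash appears to hit before any new document is written after the restart.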
