On Fri, Feb 25, 2011 at 4:18 AM, Pasi Eronen <[email protected]> wrote:
> Hi,
>
> I had a big batch job (inserting 10M+ documents and generating views for them)
> that ran just fine for about 6 hours, but then I got this error:
>
> [Thu, 24 Feb 2011 19:42:57 GMT] [error] [<0.276.0>] ** Generic server
> <0.276.0> terminating
> ** Last message in was delayed_commit
> ** When Server state == {db,<0.275.0>,<0.276.0>,nil,<<"1298547642391489">>,
> <0.273.0>,<0.277.0>,
> {db_header,5,739828,0,
> {4778613011,{663866,0}},
> {4778614954,663866},
> nil,0,nil,nil,1000},
> 739828,
> {btree,<0.273.0>,
> {4778772755,{663866,0}},
> #Fun<couch_db_updater.7.10053969>,
> #Fun<couch_db_updater.8.35220795>,
> #Fun<couch_btree.5.124754102>,
> #Fun<couch_db_updater.9.107593676>},
> {btree,<0.273.0>,
> {4778774698,663866},
> #Fun<couch_db_updater.10.30996817>,
> #Fun<couch_db_updater.11.96515267>,
> #Fun<couch_btree.5.124754102>,
> #Fun<couch_db_updater.12.117826253>},
> {btree,<0.273.0>,nil,
> #Fun<couch_btree.0.83553141>,
> #Fun<couch_btree.1.30790806>,
> #Fun<couch_btree.2.124754102>,nil},
> 739831,<<"foo_replication_tmp">>,
> "/data/foo/couchdb-data/foo_replication_tmp.couch",
> [],[],nil,
> {user_ctx,null,[],undefined},
> #Ref<0.0.1793.256453>,1000,
> [before_header,after_header,on_file_open],
> false}
> ** Reason for termination ==
> ** {{badmatch,{error,emfile}},
> [{couch_file,sync,1},
> {couch_db_updater,commit_data,2},
> {couch_db_updater,handle_info,2},
> {gen_server,handle_msg,5},
> {proc_lib,init_p_do_apply,3}]}
>
> (+lot of other messages with the same timestamp -- can send if they're useful)
>
> Exactly at this time, the client got HTTP 500 status code; the request
> was a bulk get (POST /foo_replication_tmp/_all_docs?include_docs=true).
>
> Just before this request, the client had made a PUT (updating an existing
> document) that got 200 status code, but apparently was not successfully
> committed to the disk (I'm using "delayed_commits=true" - for my app,
> this is just fine). The client had received the new _rev value, but when
> it tried updating the same document a minute later, there was a conflict
> (and it's not possible that somebody else updated this same document).
>
> About four hours later, there was a different error ("accept_failed"
> sounds like some temporary problem with sockets?):
>
> [Thu, 24 Feb 2011 23:55:42 GMT] [error] [<0.20693.4>] {error_report,<0.31.0>,
> {<0.20693.4>,std_error,
> [{application,mochiweb},
> "Accept failed error","{error,emfile}"]}}
>
> [Thu, 24 Feb 2011 23:55:42 GMT] [error] [<0.20693.4>] {error_report,<0.31.0>,
> {<0.20693.4>,crash_report,
> [[{initial_call,{mochiweb_socket_server,acceptor_loop,['Argument__1']}},
> {pid,<0.20693.4>},
> {registered_name,[]},
> {error_info,
> {exit,
> {error,accept_failed},
> [{mochiweb_socket_server,acceptor_loop,1},
> {proc_lib,init_p_do_apply,3}]}},
> {ancestors,
> [couch_httpd,couch_secondary_services,couch_server_sup,<0.32.0>]},
> {messages,[]},
> {links,[<0.106.0>]},
> {dictionary,[]},
> {trap_exit,false},
> {status,running},
> {heap_size,233},
> {stack_size,24},
> {reductions,200}],
> []]}}
>
> (+lots of other messages within the next couple of minutes)
>
> The same error occurred once more, about four hours later.
>
> I'm quite new to CouchDB, so I'd appreciate any help in interpreting
> what these error messages mean. (BTW, are these something I should
> report as bugs in JIRA? I can do that, but I'd like to at least understand
> which parts of the error messages are actually relevant here :-)
>
> I'm running CouchDB 1.0.2 with Erlang R14B on 64-bit RHEL 5.6.
>
> Best regards,
> Pasi
>
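Both errors have the same cause. The relevant part of each trace is {error,emfile}: that is the operating system's EMFILE error ("too many open files"), which means the CouchDB process has run out of available file descriptors. It shows up in couch_file:sync during the delayed commit and again in mochiweb's accept, because sockets consume file descriptors too. Try increasing the open-file limit (ulimit -n) for the user running CouchDB.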
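
If it helps to see what that limit looks like from inside a process, here is a minimal sketch in Python (my choice of language for illustration, nothing CouchDB-specific). It reads the soft and hard open-file limits with the standard resource module and counts a process's open descriptors by listing /proc/<pid>/fd, which assumes Linux. The function names are made up for this example. Note that raising the limit this way only affects the Python process itself and its children; for CouchDB the limit has to be raised in the environment that starts the server.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors by listing /proc/<pid>/fd (Linux only).

    Inspecting another user's process (e.g. the CouchDB pid) needs
    sufficient permissions.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def raise_soft_limit(target):
    """Raise the soft limit toward `target`, never above the hard limit.

    Returns the soft limit in effect afterwards. This changes the limit
    for this process and its children only.
    """
    soft, hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target <= soft:
        return soft  # never lower the limit in this sketch
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    if os.path.isdir("/proc/self/fd"):
        print("open descriptors:", open_fd_count())
```

Comparing the descriptor count of the running server against its soft limit while the batch job is active should show whether it is creeping toward the ceiling before the emfile errors start.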
