Thanks for the reply, Chris. I'll look into upgrading our test environment to the trunk version of CouchDB, and see if I can reproduce the error there.
We're using CouchRest version 0.33 as the client library.

Thanks again,
John

On Tue, Sep 1, 2009 at 12:49 PM, Chris Anderson <[email protected]> wrote:
> On Tue, Sep 1, 2009 at 7:52 AM, John Wood <[email protected]> wrote:
> > Hi everybody,
> >
> > I'm currently facing an issue with our production installation of CouchDB.
> > Twice within the past 5 days, the Erlang process running CouchDB pegs
> > one of the 4 cores on the machine, consumes about 40% of the system RAM
> > (which is 4GB), and becomes completely unresponsive to incoming HTTP
> > requests. The only way we can get it back to normal is to restart CouchDB.
> >
> > I'm trying to determine what may be causing this, but I'm not having much
> > luck. Nothing stands out in the CouchDB log files. I can see that there
> > are no entries in the log files from the time it goes unresponsive until
> > the time I restart it. Besides that, there don't appear to be any errors
> > leading up to the issue. There are, however, a few errors like the one
> > below, but none right before CouchDB goes unresponsive:
> >
> > [error] [<0.11738.288>] {error_report,<0.21.0>,
> >     {<0.11738.288>,crash_report,
> >         [[{pid,<0.11738.288>},
> >           {registered_name,[]},
> >           {error_info,
> >               {error,
> >                   {case_clause,{error,enotconn}},
> >                   [{mochiweb_request,get,2},
> >                    {couch_httpd,handle_request,4},
> >                    {mochiweb_http,headers,5},
> >                    {proc_lib,init_p,5}]}},
> >           {initial_call,
> >               {mochiweb_socket_server,acceptor_loop,
> >                   [{<0.56.0>,#Port<0.148>,#Fun<mochiweb_http.1.81679042>}]}},
> >           {ancestors,
> >               [couch_httpd,couch_secondary_services,couch_server_sup,
> >                <0.1.0>]},
> >           {messages,[]},
> >           {links,[<0.56.0>,#Port<0.5032425>]},
> >           {dictionary,[{mochiweb_request_qs,[{"limit","0"}]}]},
> >           {trap_exit,false},
> >           {status,running},
> >           {heap_size,28657},
> >           {stack_size,23},
> >           {reductions,14034}],
> >          []]}}
> > [error] [<0.56.0>] {error_report,<0.21.0>,
> >     {<0.56.0>,std_error,
> >         {mochiweb_socket_server,235,
> >             {child_error,{case_clause,{error,enotconn}}}}}}
> >
> > =ERROR REPORT==== 30-Aug-2009::04:29:07 ===
> > {mochiweb_socket_server,235,
> >     {child_error,{case_clause,{error,enotconn}}}}
> >
> > I checked some of the other system log files (/var/log/messages, etc.),
> > and there doesn't appear to be any information there either.
> >
> > Our CouchDB installation is fairly large. We have 7 production databases,
> > totaling almost 250GB. The largest database is 129GB. We are running
> > CouchDB 0.9.0 on Red Hat Enterprise Server 5.3. As far as usage goes, we
> > are constantly inserting documents into the database (5,000 at a time via
> > a bulk insert), and pausing to regenerate the views after 100,000
> > documents have been inserted. Aside from the process that does the
> > inserts, all views are accessed using stale=ok.
> >
> > Has anybody else faced a similar issue? Can anybody suggest tips on
> > how I should go about diagnosing it?
>
> Just a guess, based on the information available here, but the
> enotconn error suggests that the remote client is dropping the
> connection prematurely. There is an old bug about this in the tracker,
> which might be a good one to reopen if we learn much more about the
> issue (and it is still present in trunk / 0.10):
>
> http://issues.apache.org/jira/browse/COUCHDB-45
>
> There is also this open bug which could be related:
>
> https://issues.apache.org/jira/browse/COUCHDB-394
>
> Perhaps you have clients who aren't properly closing the connection,
> and somehow this is running up against a limit in the underlying
> server system (the maximum number of connections, or maybe even the
> maximum number of Erlang processes in the VM).
>
> It would be nice to get to the bottom of this one, eventually.
>
> The first step I'd suggest is attempting to reproduce on the
> 0.10.x branch from svn. That will at least tell us whether the bug has
> been fixed. If it's still around and repeatable, it will give us a
> test case for finally crushing it into oblivion.
>
> It might help to know more about which client library you are using,
> as this bug seems to depend on the TCP behavior of clients.
>
> Chris
>
> > Thanks,
> > John
> >
> > --
> > John Wood
> > Interactive Mediums
> > [email protected]
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io

--
John Wood
Interactive Mediums
[email protected]
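For readers following along, the workload described in the thread (bulk inserts of 5,000 documents at a time, pausing for a view rebuild every 100,000 documents, with readers using stale=ok) can be sketched roughly as below. This is an illustrative Python sketch, not CouchRest code; the helper names and the step encoding are invented for illustration. Only the shapes of the real CouchDB HTTP API are assumed: `POST /<db>/_bulk_docs` takes a `{"docs": [...]}` body, and view reads can pass `stale=ok` to avoid triggering an index rebuild.

```python
import json

BATCH_SIZE = 5_000     # docs per _bulk_docs POST, as described in the thread
REGEN_EVERY = 100_000  # pause for a view rebuild after this many inserts


def bulk_docs_payload(docs):
    """Request body for POST /<db>/_bulk_docs (CouchDB bulk insert)."""
    return json.dumps({"docs": docs})


def insert_plan(total_docs, batch_size=BATCH_SIZE, regen_every=REGEN_EVERY):
    """Yield ("insert", n) and ("regenerate_views",) steps modeling the
    writer process: bulk-insert in batches, pausing to rebuild views
    every `regen_every` documents. Purely a schedule, no HTTP here."""
    inserted = 0
    while inserted < total_docs:
        n = min(batch_size, total_docs - inserted)
        yield ("insert", n)
        inserted += n
        if inserted % regen_every == 0:
            yield ("regenerate_views",)


def stale_view_url(db, ddoc, view):
    """Readers query views with stale=ok so they never trigger a rebuild."""
    return f"/{db}/_design/{ddoc}/_view/{view}?stale=ok"
```

For example, planning 250,000 inserts yields fifty 5,000-document batches with view-regeneration pauses after the 100,000th and 200,000th documents.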
