Thanks for the reply, Chris. I'll look into upgrading our test environment to the trunk version of CouchDB, and see if I can reproduce the error there.
We're using CouchRest version 0.33 as the client library.

Thanks again,
John

On Tue, Sep 1, 2009 at 12:49 PM, Chris Anderson <[email protected]> wrote:
> On Tue, Sep 1, 2009 at 7:52 AM, John Wood <[email protected]> wrote:
> > Hi everybody,
> >
> > I'm currently facing an issue with our production installation of CouchDB.
> > Twice within the past 5 days, the Erlang process running CouchDB pegs
> > one of the 4 cores on the machine, consumes about 40% of the system RAM
> > (which is 4GB), and becomes completely unresponsive to incoming HTTP
> > requests. The only way we can get it back to normal is to restart CouchDB.
> >
> > I'm trying to determine what may be causing this, but I'm not having much
> > luck. Nothing stands out in the CouchDB log files. I can see that there
> > are no entries in the log files from the time it goes unresponsive until
> > the time I restart it. Besides that, there don't appear to be any errors
> > leading up to the issue. There are, however, a few errors like the one
> > below, but none right before CouchDB goes unresponsive:
> >
> > [error] [<0.11738.288>] {error_report,<0.21.0>,
> >     {<0.11738.288>,crash_report,
> >         [[{pid,<0.11738.288>},
> >           {registered_name,[]},
> >           {error_info,
> >               {error,
> >                   {case_clause,{error,enotconn}},
> >                   [{mochiweb_request,get,2},
> >                    {couch_httpd,handle_request,4},
> >                    {mochiweb_http,headers,5},
> >                    {proc_lib,init_p,5}]}},
> >           {initial_call,
> >               {mochiweb_socket_server,acceptor_loop,
> >                   [{<0.56.0>,#Port<0.148>,#Fun<mochiweb_http.1.81679042>}]}},
> >           {ancestors,
> >               [couch_httpd,couch_secondary_services,couch_server_sup,
> >                <0.1.0>]},
> >           {messages,[]},
> >           {links,[<0.56.0>,#Port<0.5032425>]},
> >           {dictionary,[{mochiweb_request_qs,[{"limit","0"}]}]},
> >           {trap_exit,false},
> >           {status,running},
> >           {heap_size,28657},
> >           {stack_size,23},
> >           {reductions,14034}],
> >          []]}}
> > [error] [<0.56.0>] {error_report,<0.21.0>,
> >     {<0.56.0>,std_error,
> >         {mochiweb_socket_server,235,
> >             {child_error,{case_clause,{error,enotconn}}}}}}
> >
> > =ERROR REPORT==== 30-Aug-2009::04:29:07 ===
> > {mochiweb_socket_server,235,
> >     {child_error,{case_clause,{error,enotconn}}}}
> >
> > I checked some of the other system log files (/var/log/messages, etc.),
> > and there doesn't appear to be any information there either.
> >
> > Our CouchDB installation is fairly large. We have 7 production databases,
> > totaling almost 250GB. The largest database is 129GB. We are running
> > CouchDB 0.9.0 on Red Hat Enterprise Server 5.3. As far as usage goes, we
> > are constantly inserting documents into the database (5,000 at a time via
> > a bulk insert), and pausing to regenerate the views after 100,000
> > documents have been inserted. Aside from the process that does the
> > inserts, all views are accessed using stale=ok.
> >
> > Has anybody else faced a similar issue? Can anybody suggest tips on
> > how I should go about diagnosing it?
>
> Just a guess, based on the information available here, but the
> enotconn error suggests that the remote client is dropping the
> connection prematurely. There is an old bug about this in the tracker,
> which might be a good one to reopen if we learn much more about the
> issue (and it is still present in trunk / 0.10):
>
> http://issues.apache.org/jira/browse/COUCHDB-45
>
> There is also this open bug which could be related:
>
> https://issues.apache.org/jira/browse/COUCHDB-394
>
> Perhaps you have clients who aren't properly closing the connection,
> and somehow this is running up against a limit in the underlying
> server system (the maximum number of connections, or maybe even the
> maximum number of Erlang processes in the VM).
>
> It would be nice to get to the bottom of this one, eventually.
>
> The first step I'd suggest is attempting to reproduce on the
> 0.10.x branch from svn. That will at least tell us whether the bug has
> been fixed. If it's still around and repeatable, it will give us a
> test case for finally crushing it into oblivion.
>
> It might help to know more about which client library you are using,
> as this bug seems to depend on the TCP behavior of clients.
>
> Chris
>
> > Thanks,
> > John
> >
> > --
> > John Wood
> > Interactive Mediums
> > [email protected]
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io

--
John Wood
Interactive Mediums
[email protected]
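For readers following along, the workload described in the thread (bulk inserts of 5,000 documents at a time, pausing for a view rebuild every 100,000 documents, with readers using stale=ok) can be sketched roughly as below. This is an illustrative Python sketch, not CouchRest code; the helper names and the step encoding are invented for illustration. Only the shapes of the real CouchDB HTTP API are assumed: `POST /<db>/_bulk_docs` takes a `{"docs": [...]}` body, and view reads can pass `stale=ok` to avoid triggering an index rebuild.

```python
import json

BATCH_SIZE = 5_000     # docs per _bulk_docs POST, as described in the thread
REGEN_EVERY = 100_000  # pause for a view rebuild after this many inserts


def bulk_docs_payload(docs):
    """Request body for POST /<db>/_bulk_docs (CouchDB bulk insert)."""
    return json.dumps({"docs": docs})


def insert_plan(total_docs, batch_size=BATCH_SIZE, regen_every=REGEN_EVERY):
    """Yield ("insert", n) and ("regenerate_views",) steps modeling the
    writer process: bulk-insert in batches, pausing to rebuild views
    every `regen_every` documents. Purely a schedule, no HTTP here."""
    inserted = 0
    while inserted < total_docs:
        n = min(batch_size, total_docs - inserted)
        yield ("insert", n)
        inserted += n
        if inserted % regen_every == 0:
            yield ("regenerate_views",)


def stale_view_url(db, ddoc, view):
    """Readers query views with stale=ok so they never trigger a rebuild."""
    return f"/{db}/_design/{ddoc}/_view/{view}?stale=ok"
```

For example, planning 250,000 inserts yields fifty 5,000-document batches with view-regeneration pauses after the 100,000th and 200,000th documents.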
