Re: random couch crash

stephen bartell Tue, 07 Aug 2012 10:43:20 -0700

delayed_commits = true

Stephen Bartell


On Aug 7, 2012, at 10:39 AM, Robert Newson wrote:

> Are you running with delayed_commits=true or false?
> 
> B.
> 
> On 7 Aug 2012, at 18:27, stephen bartell wrote:
> 
>> 
>>> Hi Stephen,
>>> 
>>> Can you tell us any more about the context, or did you start seeing these in 
>>> the logs?
>> 
>> Sure, here's some context.  This couch is part of a demo server.  It travels 
>> a lot and is cycled a lot.  There is one physical server; it consists of 
>> nginx (serving web apps and reverse proxying for couch), couchdb for 
>> persistence, and numerous programs which read and write to couch.  Traffic 
>> on couch can get very heavy.
>> 
>> I didn't first see this in the logs.  Some of the web apps would grind to a 
>> halt, nginx would return 404, and then eventually couch would restart.  This 
>> would happen every couple of minutes. 
>> 
>>> By chance do you have a scenario that reproduces this? Was this db 
>>> compacted or replicated from elsewhere?
>> 
>> I wish I had a pliable scenario other than sending the server through taxi 
>> cabs, airlines, and pulling the power cord several times a day.  We haven't 
>> seen this on any of our production servers.
>> This server was not subject to any replication.  Most databases on it are 
>> compacted often.  
>> 
>> Last night we were able to drill down to one particular program which was 
>> triggering the crash.  One by one, we backed up, deleted, and rebuilt the 
>> databases that program touched.  There was one database which seemed to be 
>> the culprit, let's call it History.  History is a dumping ground for stale 
>> docs from another db. History is almost always written to, and rarely read 
>> from.   We don't compact History since all docs in it are one revision deep. 
>>  We never replicate to or from it.  The only reason we deem History the 
>> culprit is because after rebuilding it, there hasn't been a crash for over 
>> 12 hours.
>> 
>> I have an additional question.  Is it possible to turn couch logging off 
>> entirely, or would redirecting to /dev/null suffice?  When couch would crash, 
>> hundreds of MB of crap would get dumped to the log. ( 
>> {{badmatch,{ok,<<32,50,48,48,10 … 'hundreds of MB of crap' … ,0,3,232>>}}).  
>> Right when this dump occurred, the cpu spiked and the server began its 
>> downward descent. 
>> 
>> Best
>> 
>>> 
>>> Thanks,
>>> 
>>> Bob
>>> On Aug 7, 2012, at 2:06 AM, stephen bartell <[email protected]> wrote:
>>> 
>>>> Hi all, could someone help shed some light on this crash I'm having?  I'm 
>>>> on v1.2, Ubuntu 11.04.  
>>>> 
>>>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server 
>>>> <0.492.0> terminating 
>>>> ** Last message in was {pread_iolist,88385709}
>>>> ** When Server state == 
>>>> {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
>>>>                           93302896}
>>>> ** Reason for termination == 
>>>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
>>>> [{couch_file,read_raw_iolist_int,3},
>>>>  {couch_file,maybe_read_more_iolist,4},
>>>>  {couch_file,handle_call,3},
>>>>  {gen_server,handle_msg,5},
>>>>  {proc_lib,init_p_do_apply,3}]}
>>>> 
>>>> I'm not too familiar with Erlang, but what I gathered from the source is 
>>>> that the `pread_iolist` function is used when reading anything from disk.  
>>>> So I think this might be a corrupt db problem.
>>>> 
>>>> Thanks,
>>>> Stephen Bartell
>>> 
>> 
>
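
The binary in the badmatch is elided in the log above, so only its first five bytes can be looked at. A minimal Python sketch (not part of the original exchange; the variable names are illustrative) that decodes just those bytes:

```python
# Leading bytes copied from the badmatch binary quoted in the log.
# The remainder of the dump is elided in the original and is not
# reconstructed here.
leading = bytes([32, 50, 48, 48, 10])

# All five values fall in the ASCII range, so decode them as text.
text = leading.decode("ascii")

print(repr(text))
```

The five bytes are plain ASCII: a space, the digits "200", and a newline. That reads more like text than like binary data, which would fit the corrupt-db guess, but it is only a hint. Nothing else in the thread confirms where those bytes came from.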
