I'll consider delayed_commits.

The database was 85MB before compaction. We ran compaction and it was still
85MB afterwards, so compaction didn't work.  The same db on other servers
shrinks by ~10x when compacted.
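
For reference, a minimal sketch of how the before/after sizes can be compared
over CouchDB's HTTP API (`POST /{db}/_compact`, then `GET /{db}` and read
`disk_size`). The base URL, the database name "history", and the helper names
are placeholders for illustration, not our actual setup, and the compact call
may need admin credentials depending on configuration.

```python
import json
import urllib.request


def compact_request(base_url, db):
    """Build (but do not send) the POST that asks CouchDB to compact `db`."""
    return urllib.request.Request(
        f"{base_url}/{db}/_compact",
        data=b"",
        # CouchDB rejects _compact without a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def db_info(base_url, db):
    """Fetch the database info document (needs a reachable CouchDB)."""
    with urllib.request.urlopen(f"{base_url}/{db}") as resp:
        return json.load(resp)


def shrink_ratio(size_before, size_after):
    """How many times smaller the file became; 1.0 means no change."""
    return size_before / size_after


if __name__ == "__main__":
    base, db = "http://localhost:5984", "history"  # hypothetical values
    before = db_info(base, db)["disk_size"]
    urllib.request.urlopen(compact_request(base, db)).close()
    # Compaction runs in the background; poll db_info(...)["compact_running"]
    # until it is false before reading disk_size again.
```

A ratio of 1.0 from `shrink_ratio` is what we saw here; the other servers give
something closer to 10.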




> I strongly suggest disabling delayed_commits on general principles (what's 
> written should stay written). Are you able to compact the database(s) that 
> give this error?
> 
> B.
> 
> On 7 Aug 2012, at 18:42, stephen bartell wrote:
> 
>> delayed_commits = true
>> 
>> Stephen Bartell
>> 
>> On Aug 7, 2012, at 10:39 AM, Robert Newson wrote:
>> 
>>> Are you running with delayed_commits=true or false?
>>> 
>>> B.
>>> 
>>> On 7 Aug 2012, at 18:27, stephen bartell wrote:
>>> 
>>>> 
>>>>> Hi Stephen,
>>>>> 
>>>>> Can you tell us any more about the context, or did you start seeing these 
>>>>> in the logs?
>>>> 
>>>> Sure, here's some context.  This couch is part of a demo server.  It 
>>>> travels a lot and is cycled a lot.  There is one physical server, it 
>>>> consists of nginx (serving web apps and reverse proxying for couch), 
>>>> couchdb for persistence, and numerous programs which read and write to 
>>>> couch.  Traffic on couch can get very heavy.
>>>> 
>>>> I didn't first see this in the logs.  Some of the web apps would grind to 
>>>> a halt, nginx would return 404, and then eventually couch would restart.  
>>>> This would happen every couple of minutes. 
>>>> 
>>>>> By chance do you have a scenario that reproduces this? Was this db 
>>>>> compacted or replicated from elsewhere?
>>>> 
>>>> I wish I had a pliable scenario other than sending the server through taxi 
>>>> cabs, airlines, and pulling the power cord several times a day.  We 
>>>> haven't seen this on any of our production servers.
>>>> This server was not subject to any replication.  Most databases on it are 
>>>> compacted often.  
>>>> 
>>>> Last night we were able to drill down to one particular program which was 
>>>> triggering the crash.  One by one, we backed up, deleted, and rebuilt the 
>>>> databases that program touched.  There was one database which seemed to be 
>>>> the culprit, let's call it History.  History is a dumping ground for stale 
>>>> docs from another db. History is almost always written to, and rarely read 
>>>> from.   We don't compact History since all docs in it are one revision 
>>>> deep.  We never replicate to or from it.  The only reason we deem History 
>>>> the culprit is because after rebuilding it, there hasn't been a crash for 
>>>> over 12 hours.
>>>> 
>>>> I have an additional question.  Is it possible to turn couch logging off 
>>>> entirely, or would redirecting to /dev/null suffice?  When couch would 
>>>> crash, hundreds of MB of crap would get dumped to the log. ( 
>>>> {{badmatch,{ok,<<32,50,48,48,10 … 'hundreds of MB of crap' … 
>>>> ,0,3,232>>}}).  Right when this dump occurred, the cpu spiked and the 
>>>> server began its downward descent. 
>>>> 
>>>> Best
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Bob
>>>>> On Aug 7, 2012, at 2:06 AM, stephen bartell <[email protected]> wrote:
>>>>> 
>>>>>> Hi all, could someone help shed some light on this crash I'm having?  
>>>>>> I'm on v1.2, ubuntu 11.04.  
>>>>>> 
>>>>>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server 
>>>>>> <0.492.0> terminating 
>>>>>> ** Last message in was {pread_iolist,88385709}
>>>>>> ** When Server state == 
>>>>>> {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
>>>>>>                         93302896}
>>>>>> ** Reason for termination == 
>>>>>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
>>>>>> [{couch_file,read_raw_iolist_int,3},
>>>>>> {couch_file,maybe_read_more_iolist,4},
>>>>>> {couch_file,handle_call,3},
>>>>>> {gen_server,handle_msg,5},
>>>>>> {proc_lib,init_p_do_apply,3}]}
>>>>>> 
>>>>>> I'm not too familiar with erlang, but what I gathered from the src was 
>>>>>> `pread_iolist` function is used when reading anything from the disk.  So 
>>>>>> I think this might be a corrupt db problem.
>>>>>> 
>>>>>> Thanks,
>>>>>> Stephen Bartell
>>>>> 
>>>> 
>>> 
>> 
> 
