Did compaction complete, though? I wasn't thinking of reducing the file size,
but of being able to successfully read all live data and write it back out
again.
B.
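
One way to check whether compaction actually ran to completion, rather than judging by file size alone, is to trigger it over HTTP and poll the database info document. The following is a minimal sketch, not a definitive procedure. It assumes CouchDB 1.x listening on the default local address with no admin account configured (otherwise credentials are needed), and the function names are hypothetical. Note that `compact_running` going back to false only means no compaction is running any more; a compaction that crashed would look the same, so the server log is still worth checking.

```python
import json
import time
import urllib.request

COUCH_URL = "http://127.0.0.1:5984"  # assumption: default local CouchDB address


def compact_request(db_name, base_url=COUCH_URL):
    """Build the POST /{db}/_compact request (CouchDB expects a JSON content type)."""
    return urllib.request.Request(
        f"{base_url}/{db_name}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compaction_finished(db_info):
    """True once the database info document no longer reports a running compaction."""
    return not db_info.get("compact_running", False)


def compact_and_wait(db_name, base_url=COUCH_URL, poll_seconds=2.0):
    """Trigger compaction, poll until it stops, and return (size_before, size_after)."""

    def info():
        # GET /{db} returns the database info document as JSON.
        with urllib.request.urlopen(f"{base_url}/{db_name}") as resp:
            return json.load(resp)

    before = info()["disk_size"]
    urllib.request.urlopen(compact_request(db_name, base_url)).close()
    current = info()
    while not compaction_finished(current):
        time.sleep(poll_seconds)
        current = info()
    return before, current["disk_size"]
```

Calling `compact_and_wait("history")` against a live server (the database name here is only a placeholder) returns the on-disk size before and after, which can be compared with what the other servers report.
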
On 7 Aug 2012, at 21:01, stephen bartell wrote:
> I'll consider delayed_commits.
>
> The database was 85 MB before compaction. We ran compact and it was still
> 85 MB, so compaction didn't work. The same db on other servers compacts to
> roughly a tenth of its original size.
>
>> I strongly suggest disabling delayed_commits on general principles (what's
>> written should stay written). Are you able to compact the database(s) that
>> give this error?
>>
>> B.
>>
>> On 7 Aug 2012, at 18:42, stephen bartell wrote:
>>
>>> delayed_commits = true
>>>
>>> Stephen Bartell
>>>
>>> On Aug 7, 2012, at 10:39 AM, Robert Newson wrote:
>>>
>>>> Are you running with delayed_commits=true or false?
>>>>
>>>> B.
>>>>
>>>> On 7 Aug 2012, at 18:27, stephen bartell wrote:
>>>>
>>>>>
>>>>>> Hi Stephen,
>>>>>>
>>>>>> Can you tell us any more about the context, or did you start seeing these
>>>>>> in the logs?
>>>>>
>>>>> Sure, here's some context. This couch is part of a demo server. It
>>>>> travels a lot and is cycled a lot. There is one physical server, it
>>>>> consists of nginx (serving web apps and reverse proxying for couch),
>>>>> couchdb for persistence, and numerous programs which read and write to
>>>>> couch. Traffic on couch can get very heavy.
>>>>>
>>>>> I didn't first see this in the logs. Some of the web apps would grind to
>>>>> a halt, nginx would return 404, and then eventually couch would restart.
>>>>> This would happen every couple of minutes.
>>>>>
>>>>>> By chance do you have a scenario that reproduces this? Was this db
>>>>>> compacted or replicated from elsewhere?
>>>>>
>>>>> I wish I had a reproducible scenario other than sending the server through
>>>>> taxi cabs, airlines, and pulling the power cord several times a day. We
>>>>> haven't seen this on any of our production servers.
>>>>> This server was not subject to any replication. Most databases on it are
>>>>> compacted often.
>>>>>
>>>>> Last night we were able to drill down to one particular program which was
>>>>> triggering the crash. One by one, we backed up, deleted, and rebuilt the
>>>>> databases that program touched. There was one database which seemed to
>>>>> be the culprit, let's call it History. History is a dumping ground for
>>>>> stale docs from another db. History is almost always written to, and
>>>>> rarely read from. We don't compact History since all docs in it are one
>>>>> revision deep. We never replicate to or from it. The only reason we
>>>>> deem History the culprit is because after rebuilding it, there hasn't
>>>>> been a crash for over 12 hours.
>>>>>
>>>>> I have an additional question. Is it possible to turn couch logging off
>>>>> entirely, or would redirecting to /dev/null suffice? When couch would
>>>>> crash, hundreds of MB of crap would get dumped to the log. (
>>>>> {{badmatch,{ok,<<32,50,48,48,10 … 'hundreds of MB of crap' …
>>>>> ,0,3,232>>}}). Right when this dump occurred, the cpu spiked and the
>>>>> server began its downward descent.
>>>>>
>>>>> Best
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Bob
>>>>>> On Aug 7, 2012, at 2:06 AM, stephen bartell wrote:
>>>>>>
>>>>>>> Hi all, could someone help shed some light on this crash I'm having?
>>>>>>> I'm on CouchDB v1.2, Ubuntu 11.04.
>>>>>>>
>>>>>>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server
>>>>>>> <0.492.0> terminating
>>>>>>> ** Last message in was {pread_iolist,88385709}
>>>>>>> ** When Server state ==
>>>>>>> {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
>>>>>>> 93302896}
>>>>>>> ** Reason for termination ==
>>>>>>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
>>>>>>> [{couch_file,read_raw_iolist_int,3},
>>>>>>> {couch_file,maybe_read_more_iolist,4},
>>>>>>> {couch_file,handle_call,3},
>>>>>>> {gen_server,handle_msg,5},
>>>>>>> {proc_lib,init_p_do_apply,3}]}
>>>>>>>
>>>>>>> I'm not too familiar with Erlang, but what I gathered from the source is
>>>>>>> that the `pread_iolist` function is used when reading anything from disk.
>>>>>>> So I think this might be a corrupt db problem.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Stephen Bartell
>>>>>>
>>>>>
>>>>
>>>
>>
>
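
For the `delayed_commits` setting discussed in this thread, CouchDB 1.x lets it be changed at runtime through the `_config` HTTP endpoint, as an alternative to setting `delayed_commits = false` under `[couchdb]` in `local.ini`. The sketch below only builds the request. It assumes the default local address and no admin account configured (otherwise credentials are needed), and the helper name is hypothetical.

```python
import json
import urllib.request


def delayed_commits_request(enabled, base_url="http://127.0.0.1:5984"):
    """Build PUT /_config/couchdb/delayed_commits; the body is a JSON string."""
    value = "true" if enabled else "false"
    return urllib.request.Request(
        f"{base_url}/_config/couchdb/delayed_commits",
        data=json.dumps(value).encode("utf-8"),  # e.g. b'"false"'
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Sending it with `urllib.request.urlopen(delayed_commits_request(False))` would switch delayed commits off on a running server.
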