We don't even "think" it started. After starting compaction we looked at the
status in Futon and nothing came up. The reason I say "think" is that
compaction can finish too quickly for us to click over to the status page and
watch it start and end. But for a db of this size it should have taken roughly
5-10 seconds, so we assumed it failed and went on to destroying and rebuilding
the db.
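
For reference, this is roughly how we trigger and check compaction over HTTP
(db name and host are placeholders, and this is from memory, so double-check
against the docs):

    # kick off compaction -- 1.2 seems to want the JSON content type
    curl -X POST http://localhost:5984/mydb/_compact \
         -H "Content-Type: application/json"

    # poll the db info; compact_running should go true while it runs,
    # and disk_size should drop once it finishes
    curl http://localhost:5984/mydb
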
On Aug 7, 2012, at 1:11 PM, Robert Newson wrote:
>
> did compaction complete, though? I wasn't thinking of reducing the file size,
> but of being able to successfully read all live data and write it back out
> again.
>
> B.
>
> On 7 Aug 2012, at 21:01, stephen bartell wrote:
>
>> I'll consider delayed_commits.
>>
>> The database was 85MB before compaction. We ran compact and it was still
>> 85MB, so compaction didn't work. The same db on other servers typically
>> shrinks to roughly a tenth of its original size when compacted.
>>
>>
>>
>>
>>> I strongly suggest disabling delayed_commits on general principles (what's
>>> written should stay written). Are you able to compact the database(s) that
>>> give this error?
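>>>
>>> (If it helps, what I mean is the delayed_commits flag in the [couchdb]
>>> section of local.ini -- something along these lines, then a restart:
>>>
>>>     [couchdb]
>>>     delayed_commits = false
>>>
>>> That is from memory, so check it against your default.ini.)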
>>>
>>> B.
>>>
>>> On 7 Aug 2012, at 18:42, stephen bartell wrote:
>>>
>>>> delayed_commits = true
>>>>
>>>> Stephen Bartell
>>>>
>>>> On Aug 7, 2012, at 10:39 AM, Robert Newson wrote:
>>>>
>>>>> Are you running with delayed_commits=true or false?
>>>>>
>>>>> B.
>>>>>
>>>>> On 7 Aug 2012, at 18:27, stephen bartell wrote:
>>>>>
>>>>>>
>>>>>>> Hi Stephen,
>>>>>>>
>>>>>>> Can you tell us any more about the context, or did you start seeing
>>>>>>> these in the logs?
>>>>>>
>>>>>> Sure, here's some context. This couch is part of a demo server. It
>>>>>> travels a lot and is power-cycled a lot. There is one physical server;
>>>>>> it runs nginx (serving web apps and reverse proxying for couch),
>>>>>> couchdb for persistence, and numerous programs which read from and
>>>>>> write to couch. Traffic on couch can get very heavy.
>>>>>>
>>>>>> I didn't first see this in the logs. Some of the web apps would grind
>>>>>> to a halt, nginx would return 404, and then eventually couch would
>>>>>> restart. This would happen every couple of minutes.
>>>>>>
>>>>>>> By chance do you have a scenario that reproduces this? Was this db
>>>>>>> compacted or replicated from elsewhere?
>>>>>>
>>>>>> I wish I had a reproducible scenario beyond sending the server through
>>>>>> taxi cabs and airlines and pulling the power cord several times a day. We
>>>>>> haven't seen this on any of our production servers.
>>>>>> This server was not subject to any replication. Most databases on it
>>>>>> are compacted often.
>>>>>>
>>>>>> Last night we were able to drill down to one particular program which
>>>>>> was triggering the crash. One by one, we backed up, deleted, and
>>>>>> rebuilt the databases that program touched. There was one database
>>>>>> which seemed to be the culprit; let's call it History. History is a
>>>>>> dumping ground for stale docs from another db. History is almost always
>>>>>> written to, and rarely read from. We don't compact History since all
>>>>>> docs in it are one revision deep. We never replicate to or from it.
>>>>>> The only reason we deem History the culprit is that, after rebuilding
>>>>>> it, there hasn't been a crash for over 12 hours.
>>>>>>
>>>>>> I have an additional question. Is it possible to turn couch logging off
>>>>>> entirely, or would redirecting to /dev/null suffice? When couch would
>>>>>> crash, hundreds of MB of crap would get dumped to the log
>>>>>> ({{badmatch,{ok,<<32,50,48,48,10 … 'hundreds of MB of crap' …
>>>>>> ,0,3,232>>}}). Right when this dump occurred, the cpu spiked and the
>>>>>> server began its descent.
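>>>>>>
>>>>>> (What I have in mind, if I'm reading the config right, is something
>>>>>> like this in local.ini -- either raise the level or point the file at
>>>>>> /dev/null:
>>>>>>
>>>>>>     [log]
>>>>>>     file = /dev/null
>>>>>>     level = error
>>>>>>
>>>>>> but I'd like to know if there's a supported way to switch it off.)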
>>>>>>
>>>>>> Best
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Bob
>>>>>>> On Aug 7, 2012, at 2:06 AM, stephen bartell <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all, could someone help shed some light on this crash I'm having?
>>>>>>>> I'm on v1.2, Ubuntu 11.04.
>>>>>>>>
>>>>>>>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server
>>>>>>>> <0.492.0> terminating
>>>>>>>> ** Last message in was {pread_iolist,88385709}
>>>>>>>> ** When Server state ==
>>>>>>>> {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
>>>>>>>> 93302896}
>>>>>>>> ** Reason for termination ==
>>>>>>>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
>>>>>>>> [{couch_file,read_raw_iolist_int,3},
>>>>>>>> {couch_file,maybe_read_more_iolist,4},
>>>>>>>> {couch_file,handle_call,3},
>>>>>>>> {gen_server,handle_msg,5},
>>>>>>>> {proc_lib,init_p_do_apply,3}]}
>>>>>>>>
>>>>>>>> I'm not too familiar with Erlang, but what I gathered from the source
>>>>>>>> is that the `pread_iolist` function is used when reading anything from
>>>>>>>> disk. So I think this might be a corrupt db problem.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Stephen Bartell
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>