Hi Octavian, I usually tail -f the log while debugging on couch. It was actually my coworker who ran the compaction, determined it had failed, and rebuilt the db, and I didn't observe the logs during that process. The server is on the road right now. Once it gets back I can grep the log for details on the compaction he attempted.
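As an aside, for anyone following along: compaction can also be triggered and polled over CouchDB's HTTP API instead of racing Futon's status page. A minimal sketch only; the `history` database name and host are placeholders, and the little helper is just illustrative:

```python
import json

# Triggering compaction over CouchDB 1.x's HTTP API (admin credentials
# may be required on your setup):
#   curl -H "Content-Type: application/json" -X POST \
#        http://127.0.0.1:5984/history/_compact
# Then poll GET /history and watch the compact_running flag instead of
# trying to catch it in Futon.

def compact_running(db_info_json):
    """Return True while the db info response says compaction is in progress."""
    info = json.loads(db_info_json)
    return info.get("compact_running", False)

# Trimmed sample of a GET /history response while compaction is running:
sample = '{"db_name":"history","compact_running":true,"disk_size":89128960}'
print(compact_running(sample))  # -> True
```

Comparing disk_size before and after the flag drops back to false is a more reliable check than eyeballing the status page.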
On Aug 7, 2012, at 1:21 PM, Octavian Damiean wrote:

> Hello Stephen,
>
> Just "less" the log and let it wait for changes. That way you can inspect
> what it does.
>
> Cheers,
> Octavian
>
> On Tue, Aug 7, 2012 at 10:18 PM, stephen bartell <[email protected]> wrote:
>
>> We don't even "think" it started. After starting compaction we looked at
>> the status in Futon and nothing came up. The reason I say "think" is that
>> compaction can finish too quickly for us to click over to the status page
>> and watch it start and end. But for a db of this size it should have
>> taken roughly 5-10 seconds. So we assumed it failed and went on to
>> destroying and rebuilding the db.
>>
>> On Aug 7, 2012, at 1:11 PM, Robert Newson wrote:
>>
>>> Did compaction complete, though? I wasn't thinking of reducing the file
>>> size, but of being able to successfully read all live data and write it
>>> back out again.
>>>
>>> B.
>>>
>>> On 7 Aug 2012, at 21:01, stephen bartell wrote:
>>>
>>>> I'll consider delayed_commits.
>>>>
>>>> The database was 85 MB before compaction. We ran compaction and it was
>>>> still 85 MB, so compaction didn't work. The same db on other servers
>>>> will compact down to roughly a tenth of its original size.
>>>>
>>>>> I strongly suggest disabling delayed_commits on general principles
>>>>> (what's written should stay written). Are you able to compact the
>>>>> database(s) that give this error?
>>>>>
>>>>> B.
>>>>>
>>>>> On 7 Aug 2012, at 18:42, stephen bartell wrote:
>>>>>
>>>>>> delayed_commits = true
>>>>>>
>>>>>> Stephen Bartell
>>>>>>
>>>>>> On Aug 7, 2012, at 10:39 AM, Robert Newson wrote:
>>>>>>
>>>>>>> Are you running with delayed_commits=true or false?
>>>>>>>
>>>>>>> B.
>>>>>>>
>>>>>>> On 7 Aug 2012, at 18:27, stephen bartell wrote:
>>>>>>>
>>>>>>>>> Hi Stephen,
>>>>>>>>>
>>>>>>>>> Can you tell us any more about the context, or did you start
>>>>>>>>> seeing these in the logs?
>>>>>>>>
>>>>>>>> Sure, here's some context. This couch is part of a demo server.
>>>>>>>> It travels a lot and is cycled a lot.
>>>>>>>> There is one physical server; it consists of nginx (serving web
>>>>>>>> apps and reverse-proxying for couch), couchdb for persistence, and
>>>>>>>> numerous programs which read from and write to couch. Traffic on
>>>>>>>> couch can get very heavy.
>>>>>>>>
>>>>>>>> I didn't first see this in the logs. Some of the web apps would
>>>>>>>> grind to a halt, nginx would return 404, and then eventually couch
>>>>>>>> would restart. This would happen every couple of minutes.
>>>>>>>>
>>>>>>>>> By chance do you have a scenario that reproduces this? Was this db
>>>>>>>>> compacted or replicated from elsewhere?
>>>>>>>>
>>>>>>>> I wish I had a reliable scenario other than sending the server
>>>>>>>> through taxi cabs and airlines and pulling the power cord several
>>>>>>>> times a day. We haven't seen this on any of our production servers.
>>>>>>>> This server was not subject to any replication. Most databases on
>>>>>>>> it are compacted often.
>>>>>>>>
>>>>>>>> Last night we were able to drill down to one particular program
>>>>>>>> which was triggering the crash. One by one, we backed up, deleted,
>>>>>>>> and rebuilt the databases that program touched. There was one
>>>>>>>> database which seemed to be the culprit; let's call it History.
>>>>>>>> History is a dumping ground for stale docs from another db. History
>>>>>>>> is almost always written to and rarely read from. We don't compact
>>>>>>>> History since all docs in it are one revision deep. We never
>>>>>>>> replicate to or from it. The only reason we deem History the
>>>>>>>> culprit is that after rebuilding it, there hasn't been a crash for
>>>>>>>> over 12 hours.
>>>>>>>>
>>>>>>>> I have an additional question. Is it possible to turn couch logging
>>>>>>>> off entirely, or would redirecting to /dev/null suffice? When couch
>>>>>>>> crashed, hundreds of MB of crap would get dumped to the log
>>>>>>>> ({{badmatch,{ok,<<32,50,48,48,10 … hundreds of MB of crap … ,0,3,232>>}}).
>>>>>>>> Right when this dump occurred, the CPU spiked and the server began
>>>>>>>> its descent.
>>>>>>>> Best
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Bob
>>>>>>>>>
>>>>>>>>> On Aug 7, 2012, at 2:06 AM, stephen bartell <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all, could someone help shed some light on this crash I'm
>>>>>>>>>> having? I'm on v1.2, Ubuntu 11.04.
>>>>>>>>>>
>>>>>>>>>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server <0.492.0> terminating
>>>>>>>>>> ** Last message in was {pread_iolist,88385709}
>>>>>>>>>> ** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
>>>>>>>>>>                               93302896}
>>>>>>>>>> ** Reason for termination ==
>>>>>>>>>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
>>>>>>>>>>     [{couch_file,read_raw_iolist_int,3},
>>>>>>>>>>      {couch_file,maybe_read_more_iolist,4},
>>>>>>>>>>      {couch_file,handle_call,3},
>>>>>>>>>>      {gen_server,handle_msg,5},
>>>>>>>>>>      {proc_lib,init_p_do_apply,3}]}
>>>>>>>>>>
>>>>>>>>>> I'm not too familiar with Erlang, but what I gathered from the
>>>>>>>>>> src is that pread_iolist is used when reading anything from disk,
>>>>>>>>>> so I think this might be a corrupt db problem.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Stephen Bartell
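On the logging question upthread: in CouchDB 1.x the log destination and verbosity come from the [log] section of the ini config, so pointing the file at /dev/null or raising the level should cut the volume without redirecting anything at the shell. A sketch only; check the default.ini shipped with your build for the exact keys:

```ini
; local.ini -- reduce or silence CouchDB logging (CouchDB 1.x, a sketch)
[log]
; discard log output entirely:
file = /dev/null
; or keep the file and log only errors (levels: debug, info, error):
level = error
```

Note that with the file discarded you lose the crash dumps too, which may be exactly what you want on a demo box but not while you're still chasing this bug.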
