Hello Stephen,

Just "less" the log and let it wait for changes. That way you can inspect what it does.

Cheers,
Octavian

On Tue, Aug 7, 2012 at 10:18 PM, stephen bartell <[email protected]> wrote:

> we don't even "think" it started. After starting compact we looked at the status in futon and nothing came up. The reason I say "think" is because compact can happen too quickly for us to click over to status and watch it start/end. But for a db of this size it should have taken ~5-10 sec. So we assumed it failed and went on to destroying/rebuilding the db.
>
>
> On Aug 7, 2012, at 1:11 PM, Robert Newson wrote:
>
> > did compaction complete, though? I wasn't thinking of reducing the file size, but of being able to successfully read all live data and write it back out again.
> >
> > B.
> >
> > On 7 Aug 2012, at 21:01, stephen bartell wrote:
> >
> >> I'll consider delayed_commits.
> >>
> >> The database was 85MB before compaction. We ran compact and it was still 85MB. So compact didn't work. The same db on other servers will shrink ~10x when compacted.
> >>
> >>
> >>
> >>
> >>> I strongly suggest disabling delayed_commits on general principles (what's written should stay written). Are you able to compact the database(s) that give this error?
> >>>
> >>> B.
> >>>
> >>> On 7 Aug 2012, at 18:42, stephen bartell wrote:
> >>>
> >>>> delayed_commits = true
> >>>>
> >>>> Stephen Bartell
> >>>>
> >>>> On Aug 7, 2012, at 10:39 AM, Robert Newson wrote:
> >>>>
> >>>>> Are you running with delayed_commits=true or false?
> >>>>>
> >>>>> B.
> >>>>>
> >>>>> On 7 Aug 2012, at 18:27, stephen bartell wrote:
> >>>>>
> >>>>>>
> >>>>>>> Hi Stephen,
> >>>>>>>
> >>>>>>> Can you tell us any more about the context, or did you start seeing these in the logs?
> >>>>>>
> >>>>>> Sure, here's some context. This couch is part of a demo server. It travels a lot and is cycled a lot. There is one physical server; it consists of nginx (serving web apps and reverse proxying for couch), couchdb for persistence, and numerous programs which read and write to couch. Traffic on couch can get very heavy.
> >>>>>>
> >>>>>> I didn't first see this in the logs. Some of the web apps would grind to a halt, nginx would return 404, and then eventually couch would restart. This would happen every couple of minutes.
> >>>>>>
> >>>>>>> By chance do you have a scenario that reproduces this? Was this db compacted or replicated from elsewhere?
> >>>>>>
> >>>>>> I wish I had a pliable scenario other than sending the server through taxi cabs, airlines, and pulling the power cord several times a day. We haven't seen this on any of our production servers.
> >>>>>> This server was not subject to any replication. Most databases on it are compacted often.
> >>>>>>
> >>>>>> Last night we were able to drill down to one particular program which was triggering the crash. One by one, we backed up, deleted, and rebuilt the databases that program touched. There was one database which seemed to be the culprit, let's call it History. History is a dumping ground for stale docs from another db. History is almost always written to, and rarely read from. We don't compact History since all docs in it are one revision deep. We never replicate to or from it. The only reason we deem History the culprit is because after rebuilding it, there hasn't been a crash for over 12 hours.
> >>>>>>
> >>>>>> I have an additional question. Is it possible to turn couch logging off entirely, or would redirecting to /dev/null suffice? When couch would crash, hundreds of MB of crap would get dumped to the log. ({{badmatch,{ok,<<32,50,48,48,10 … 'hundreds of MB of crap' … ,0,3,232>>}}). Right when this dump occurred, the CPU spiked and the server began its downward descent.
> >>>>>>
> >>>>>> Best
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Bob
> >>>>>>> On Aug 7, 2012, at 2:06 AM, stephen bartell <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi all, could someone help shed some light on this crash I'm having? I'm on v1.2, Ubuntu 11.04.
> >>>>>>>>
> >>>>>>>> [Mon, 06 Aug 2012 18:29:16 GMT] [error] [<0.492.0>] ** Generic server <0.492.0> terminating
> >>>>>>>> ** Last message in was {pread_iolist,88385709}
> >>>>>>>> ** When Server state == {file,{file_descriptor,prim_file,{#Port<0.2899>,79}},
> >>>>>>>> 93302896}
> >>>>>>>> ** Reason for termination ==
> >>>>>>>> ** {{badmatch,{ok,<<32,50,48,48,10 … huge dump … ,0,3,232>>}},
> >>>>>>>> [{couch_file,read_raw_iolist_int,3},
> >>>>>>>> {couch_file,maybe_read_more_iolist,4},
> >>>>>>>> {couch_file,handle_call,3},
> >>>>>>>> {gen_server,handle_msg,5},
> >>>>>>>> {proc_lib,init_p_do_apply,3}]}
> >>>>>>>>
> >>>>>>>> I'm not too familiar with Erlang, but what I gathered from the src was that the `pread_iolist` function is used when reading anything from the disk. So I think this might be a corrupt db problem.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Stephen Bartell
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
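
For the "did compaction even start?" question in this thread, polling the database info document is less racy than clicking over to the status page in Futon. The following is only a minimal sketch, assuming a CouchDB 1.x node at http://localhost:5984 with no admin credentials configured, and relying on `POST /{db}/_compact` plus the `compact_running` and `disk_size` fields of `GET /{db}`. The helper names and the database name `history` are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB 1.x, no auth


def db_info(db, base_url=BASE_URL):
    """Return the JSON info document from GET /{db}."""
    with urllib.request.urlopen(f"{base_url}/{db}") as resp:
        return json.load(resp)


def start_compaction(db, base_url=BASE_URL):
    """Send POST /{db}/_compact with the JSON content type CouchDB expects."""
    req = urllib.request.Request(
        f"{base_url}/{db}/_compact",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_compaction(get_info, poll_interval=0.5, timeout=60.0, sleep=time.sleep):
    """Poll get_info() until compact_running is false and return the last info.

    Raises TimeoutError if compaction is still running after `timeout` seconds
    of accumulated sleeping.
    """
    waited = 0.0
    while True:
        info = get_info()
        if not info.get("compact_running", False):
            return info
        if waited >= timeout:
            raise TimeoutError("compaction still running")
        sleep(poll_interval)
        waited += poll_interval


def compact_and_report(db, base_url=BASE_URL):
    """Compact `db` and return (disk_size_before, disk_size_after) in bytes."""
    before = db_info(db, base_url)["disk_size"]
    start_compaction(db, base_url)
    after = wait_for_compaction(lambda: db_info(db, base_url))["disk_size"]
    return before, after


# Usage (needs a running CouchDB; "history" is a hypothetical database name):
#   before, after = compact_and_report("history")
#   print(before, after)
```

Note that `compact_running` going back to false only says the compactor is no longer running, not that it finished cleanly. An unchanged `disk_size` together with a fresh error in the log would point to a failed run.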

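
On the delayed_commits suggestion above: besides editing the `[couchdb]` section of local.ini, CouchDB 1.x also lets you change the setting at runtime through its `_config` HTTP endpoint. The sketch below only builds the request. It assumes a node at http://localhost:5984 without admin credentials, and the function name is hypothetical.

```python
import json
import urllib.request


def delayed_commits_request(enabled, base_url="http://localhost:5984"):
    """Build (but do not send) a PUT /_config/couchdb/delayed_commits request."""
    value = "true" if enabled else "false"
    return urllib.request.Request(
        f"{base_url}/_config/couchdb/delayed_commits",
        # The body is a JSON-encoded string such as "false", quotes included.
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage (needs a running CouchDB 1.x):
#   with urllib.request.urlopen(delayed_commits_request(False)) as resp:
#       print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect what would go over the wire before touching a live server.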