Dear list,

For future reference, I think my problem is solved, and it doesn't
appear to be a CouchDB or Erlang thing, but rather a library/Gentoo
Linux issue.

This is a Gentoo Linux box, and Gentoo likes to be rebuilt from top to
bottom every 6 months or so, so I bit the bullet and did that. In the
process I noticed messages here and there about links to the icu
library within couchdb that required a rebuild of couchdb. So, wildly
guessing, I *think* that was the problem: an older build of icu was
being used during the couchdb build, but was incompatible with some
other, more recently built system library. Or perhaps it was something
else.

Regardless, a rebuild of everything solved the problems I was having.
It has been stable for a few hours now with about twice the load that
was crashing it before.

Thanks,
James Marca

On Mon, Sep 16, 2013 at 08:28:09PM +0200, Dave Cottlehuber wrote:
> My gut feel is that some OS thing is killing off beam and the usual
> suspect for that is OOM. I see you've noted nothing wrt in logs
> though.
>
> On ubuntu > 12.x this works:
>
> ps -ef |grep beam
> # you'll see 2 processes, so do this for both pids
> cat /proc/$PID/oom_score
> 124
> # echo '-1000' > /proc/$PID/oom_score_adj
> # cat /proc/$PID/oom_score
>
>
> only other advice I can offer is to login & run as sudo <couchdb_user>
> `couchdb -i` for a while, it's interactive mode and *maybe* something
> useful will be left…
>
>
>
> On 16 September 2013 18:59, James Marca <[email protected]> wrote:
> > On Sun, Sep 15, 2013 at 10:10:24PM -0700, James Marca wrote:
> >> On Sun, Sep 15, 2013 at 08:04:27PM +0200, Dave Cottlehuber wrote:
> >> > NIF scheduler issues could be a reasonable suspect;
> >> >
> >> > heart: Fri Sep 13 20:59:36 2013: heart-beat time-out, no activity for
> >> > 15 seconds
> >> >
> >> > 15 seconds is a *long* time however.
> >> >
> >> > 1.4.0 needs 14B04 or higher I think due to one of our dependencies, so
> >> > I'd suggest reverting back to that & seeing if you are having any
> >> > other issues.
> >> >
> >> > Also, probably unrelated, why is kernel polling disabled?
> >>
> >> Honestly, on my gentoo boxes I just use the ebuild. I have no idea
> >> why kernel polling is false...it is whatever the default is in the
> >> ebuild I guess. I have no clue about whether kpoll should be enabled,
> >> so I'm trusting the default.
> >
> > correction. kernel polling is enabled. The kpoll option is set when
> > building, and /usr/bin/couchdb has +K true. If I invoke erl with +K true,
> > then
> > kpoll=true. One thing I do not have though is HIPE enabled.
> >
> > --
> > This message has been scanned for viruses and
> > dangerous content by MailScanner, and is
> > believed to be clean.
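
P.S. For anyone who lands on this thread with similar symptoms: a rough
sketch of how one might check for the kind of stale library link I am
guessing at above, without rebuilding everything. This is not what I
actually ran. The helper name missing_libs is made up for illustration,
and the driver path in the usage comment is a placeholder that will
differ between installs. It only filters ldd output for libraries the
dynamic linker cannot resolve.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it also works on saved output.
# (missing_libs is a hypothetical helper name, not part of any package.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example use. The path is a placeholder (an assumption); point it at
# wherever your couchdb build put its ICU driver:
#   ldd /path/to/couch_icu_driver.so | missing_libs
```

Empty output only means every linked library resolves; it says nothing
about a library that resolves but was built against something
incompatible. On Gentoo, revdep-rebuild from gentoolkit does a broader
version of this check across the whole system.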
