On Tue, Oct 6, 2009 at 2:21 PM, Glenn Rempe <[email protected]> wrote:
> Thanks Paul. Comments below.
>
> On Tue, Oct 6, 2009 at 11:01 AM, Paul Davis <[email protected]> wrote:
>
>> Glenn,
>>
>> The quickest way to check if you have a bad document in your DB would
>> probably be something like:
>>
>> $ ps ax | grep beam.smp
>> $ curl 'http://127.0.0.1:5984/db_name/_all_docs?include_docs=true' > /dev/null
>> $ ps ax | grep beam.smp
>>
>> You only need to trigger the doc to exit through the JSON serializer
>> to trigger the badness.
>
> I am running this now.
>
>> If it's being restarted by heart, then it's most likely a complete VM
>> death. The fact the PID is changing suggests that you're hitting VM
>> death. And on complete VM death there is nothing CouchDB can do to
>> help. VM deaths are instant and dramatic. Have you tried checking
>> memory allocated to the beam.smp process as it gets further along? A
>> common cause of instant VM deaths is when malloc returns NULL.
>
> I have kept an eye on the overall system memory usage. The EC2 XLarge
> instance I am running on has 15GB RAM, and I have never seen the RAM usage
> go over 4-5GB since I switched to XLarge. Is there a specific command you
> suggest for tracking memory explicitly assigned to the beam?

I'm not very high tech here. Top and free generally, just to get an idea.
Memory reporting is kinda wonky, so I generally only check at the
order-of-magnitude level. Though the next time you start an indexing run,
a small script that spins and records the high-water-mark memory
allocation for that PID could prove useful if it's a major spike that
causes VM death.

>> Also, I just went through and re-read the entire discussion. After
>> your 0.9.1 -> trunk upgrade did you compact the database? I can't
>> think of anything that'd cause an issue there but it might be
>> something to try (there is a conversion process during compaction).
>
> I did not do a compaction. I can try that. Unfortunately that probably
> kills another day compacting my 50GB 28mm record DB. ;-) But, hey, if it
> helps... :-)

It's a possibility is all. Theoretically this is more incremental, so even
if you kick it off and it dies it'll restart part way through even without
a complete run. (Very theoretically, as I haven't tried it yet.) Also it'll
run just fine in the background.

>> If the db dump and compaction don't show anything then we'll take a
>> look at writing some scripts to go through and check docs and add some
>> reporting to the view generation process to try and get a handle on
>> what's going on.
>>
>> Paul Davis
>
> So there is no way to turn on an additional level of debugging in the view
> generation process with the current code? I noticed that there is a 'tmi'
> logging level in the erlang couchdb code (which I just turned on). Will
> this help?

A TMI log level is news to me. I've never seen a log macro that uses it.

> Again, thanks. I know this is my problem, but knowing that there are some
> people willing to lend a hand, and maybe write some code to help identify /
> resolve this is what's keeping me going. :-) Much appreciated. And
> hopefully couchdb will be the better for it in the end.
>
> Glenn

Don't worry. I quite dislike not figuring out the cause of anything that
sounds even remotely like a bug in CouchDB.

Paul Davis
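
For reference, the "small script that spins and records high water mark
memory allocation" mentioned above could be sketched roughly as follows.
This is only an illustration, not something from the CouchDB tree. It
assumes a Linux host, where /proc/<pid>/status reports VmHWM (the kernel's
own peak resident set size, in kB). The names parse_status_kb and watch are
made up for this example.

```python
import sys
import time


def parse_status_kb(status_text, field):
    """Return the kB value of a /proc/<pid>/status field, or None if absent."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            # Lines look like "VmHWM:     102400 kB"
            return int(rest.split()[0])
    return None


def watch(pid, interval=1.0, out=sys.stdout):
    """Poll the process and log every time its peak resident memory grows.

    Returns the last peak seen once the process can no longer be read,
    which is what happens after the VM dies and the PID goes away.
    """
    peak_kb = 0
    path = "/proc/%d/status" % pid
    while True:
        try:
            with open(path) as fh:
                text = fh.read()
        except OSError:
            break  # process is gone, or /proc is not available
        hwm_kb = parse_status_kb(text, "VmHWM")
        if hwm_kb is not None and hwm_kb > peak_kb:
            peak_kb = hwm_kb
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write("%s pid=%d peak RSS %d kB\n" % (stamp, pid, peak_kb))
            out.flush()
        time.sleep(interval)
    return peak_kb


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    # Usage (hypothetical file name): python hwm.py <pid of beam.smp>
    watch(int(sys.argv[1]))
```

Run it against the PID shown by `ps ax | grep beam.smp` and redirect the
output to a file. The last line written before the VM dies gives the
high-water mark as of the final poll. A spike that happens entirely within
the last polling interval can still be missed, so a shorter interval may be
worth trying if the log shows nothing unusual.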
