Thanks Paul. Comments below.

On Tue, Oct 6, 2009 at 11:01 AM, Paul Davis <[email protected]> wrote:

> Glenn,
>
> The quickest way to check if you have a bad document in your DB would
> probably be something like:
>
> $ ps ax | grep beam.smp
> $ curl http://127.0.0.1:5984/db_name/_all_docs?include_docs=true > /dev/null
> $ ps ax | grep beam.smp
>
> You only need to trigger the doc to exit through the JSON serializer
> to trigger the badness.

I am running this now.

> If it's being restarted by heart, then it's most likely a complete VM
> death. The fact the PID is changing suggests that you're hitting VM
> death. And on complete VM death there is nothing CouchDB can do to
> help. VM deaths are instant and dramatic. Have you tried checking
> memory allocated to the beam.smp process as it gets further along? A
> common cause of instant VM deaths is when malloc returns NULL.

I have kept an eye on the overall system memory usage. The EC2 XLarge
instance I am running on has 15GB RAM, and I have never seen the RAM usage
go over 4-5GB since I switched to XLarge. Is there a specific command you
suggest for tracking the memory assigned to the beam.smp process itself?

> Also, I just went through and re-read the entire discussion. After
> your 0.9.1 -> trunk upgrade did you compact the database? I can't
> think of anything that'd cause an issue there but it might be
> something to try (there is a conversion process during compaction).

I did not do a compaction. I can try that. Unfortunately that probably
kills another day compacting my 50GB 28mm record DB. ;-) But, hey, if it
helps... :-)

> If the db dump and compaction don't show anything then we'll take a
> look at writing some scripts to go through and check docs and add some
> reporting to the view generation process to try and get a handle on
> what's going on.
>
> Paul Davis

So there is no way to turn on an additional level of debugging in the view
generation process with the current code? I noticed that there is a 'tmi'
logging level in the Erlang CouchDB code (which I just turned on). Will
this help?

Again, thanks.
I know this is my problem, but knowing that there are some people willing
to lend a hand, and maybe write some code to help identify / resolve this,
is what's keeping me going. :-)

Much appreciated. And hopefully CouchDB will be the better for it in the end.

Glenn
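
P.S. On the memory question above: a minimal sketch of one way to log the
resident memory of beam.smp while the view build runs, and to flag a PID
change of the kind a heart restart would cause. This is only an
illustration, not a CouchDB tool. It assumes a Linux procps `ps` that
accepts `-C` and `-o pid=,rss=` (RSS reported in KiB), and the function
names `parse_ps`, `sample`, `restarted` and `watch` are made up for this
example.

```python
import subprocess
import time


def parse_ps(output):
    """Parse `ps -o pid=,rss=` output into a {pid: rss_kib} dict."""
    usage = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank or malformed lines instead of failing mid-run.
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            usage[int(fields[0])] = int(fields[1])
    return usage


def sample(process_name="beam.smp"):
    """Return {pid: rss_kib} for each running process with this command name."""
    # Assumption: procps `ps` on Linux. -C selects by command name, and the
    # trailing "=" in "pid=,rss=" suppresses the header row.
    proc = subprocess.run(
        ["ps", "-C", process_name, "-o", "pid=,rss="],
        capture_output=True,
        text=True,
    )
    return parse_ps(proc.stdout)


def restarted(previous, current):
    """True if none of the previously seen PIDs is still running."""
    return bool(previous) and not (set(previous) & set(current))


def watch(interval_s=5.0, samples=None):
    """Print one timestamped RSS line per process per sample.

    Loops until interrupted when `samples` is None.
    """
    previous = {}
    taken = 0
    while samples is None or taken < samples:
        current = sample()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if restarted(previous, current):
            print(f"{stamp} PID changed: {sorted(previous)} -> {sorted(current)}")
        for pid, rss_kib in sorted(current.items()):
            print(f"{stamp} pid={pid} rss={rss_kib / 1024:.1f} MiB")
        # Keep the last non-empty sample so a brief gap still compares PIDs.
        if current:
            previous = current
        taken += 1
        time.sleep(interval_s)
```

Calling `watch(interval_s=5)` in a separate terminal while the view is
being generated would show whether RSS climbs before the process vanishes.
RSS is only a rough proxy for what the VM has allocated, so a flat reading
would not by itself rule out an allocation failure.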
