nope, no thread dump unfortunately. the whole cluster had been restarted by the IT department of my client before I came in to help figure out what had happened. I will tell them to get a thread dump if it happens again.
I don't think the JVM itself was frozen, but I am not sure. in the meantime, I think we will try increasing the heap size and the memory-free-min value.

..mike..

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Scott Ferguson
Sent: Monday, March 10, 2008 4:42 PM
To: General Discussion for the Resin application server
Subject: Re: [Resin-interest] Resin cluster failure with a single node running out of heap space

On Mar 10, 2008, at 3:57 PM, Mike Wynholds wrote:

> Hmmm... we do have a <memory-free-min> setting of 1MB (Scott asked about
> that just before this email). So then how would Resin still get an OOM
> error? Is there a thread in the server that watches the heap space?
> Because we do a lot of in-JVM image manipulation, which takes up a LOT
> of memory and quite quickly. So if it is a timing issue, it's possible
> that the heap-watcher doesn't have a chance to act quickly enough.

It's the main thread and checked every 10 seconds. So it's possible that using a lot of memory could run by it. Still, that thread should detect the problem after 10 seconds and force a restart.

Although, memory checking isn't exact. It's even possible the original failure freed up memory between checks, but that's not likely.

Did you happen to get a thread dump or was the JVM itself frozen? JVM freezes are hard to deal with.

-- Scott

> ..mike..
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Sam
> Sent: Monday, March 10, 2008 3:38 PM
> To: General Discussion for the Resin application server
> Subject: Re: [Resin-interest] Resin cluster failure with a single
> node running out of heap space
>
>> We are currently looking at our watchdog process config to see why it
>> did not auto-restart Resin. I think we didn't give enough memory buffer
>> for the watchdog to detect a needed restart, and our app lost
>> responsiveness before the watchdog could restart it. But that's just a
>> theory.
>
> The memory low detection happens within the server itself. If the
> server itself detects that the memory is about to be exhausted, it
> exits. The watchdog then notices that the server did not exit cleanly,
> and starts a new server to replace it.
>
> -- Sam
>
> _______________________________________________
> resin-interest mailing list
> resin-interest@caucho.com
> http://maillist.caucho.com/mailman/listinfo/resin-interest
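To make the mechanism Scott and Sam describe a little more concrete, here is a minimal Java sketch of a sampled low-memory check. It is not Resin's code: the class name `HeapWatcher`, its methods, and the 1 MB default (chosen to mirror the `<memory-free-min>` value mentioned in the thread) are hypothetical. Scott says Resin runs the check on its main thread; this sketch uses a `ScheduledExecutorService` instead, for brevity. It also illustrates why a 10-second sampling interval can miss a fast allocation burst such as large in-JVM image manipulation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of a periodic low-memory check, modeled on the
 * behavior described in this thread. Not Resin's actual implementation.
 */
public class HeapWatcher {
    // Assumed threshold, roughly comparable to <memory-free-min> of 1MB.
    public static final long DEFAULT_MIN_FREE_BYTES = 1L * 1024 * 1024;

    /** Heap still available: the -Xmx limit minus what is currently in use. */
    public static long availableHeapBytes(Runtime rt) {
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /** True when the available heap has dropped below the configured minimum. */
    public static boolean isMemoryLow(long availableBytes, long minFreeBytes) {
        return availableBytes < minFreeBytes;
    }

    /**
     * Samples the heap every periodSeconds and runs onLow when it is low.
     * Because this is a sampled check, a burst of allocation between two
     * samples can exhaust the heap (OutOfMemoryError) before it is noticed.
     */
    public static ScheduledExecutorService start(long minFreeBytes, long periodSeconds,
                                                 Runnable onLow) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heap-watcher");
            t.setDaemon(true); // never keeps the JVM alive on its own
            return t;
        });
        exec.scheduleAtFixedRate(() -> {
            long available = availableHeapBytes(Runtime.getRuntime());
            if (isMemoryLow(available, minFreeBytes)) {
                onLow.run();
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return exec;
    }

    public static void main(String[] args) {
        long available = availableHeapBytes(Runtime.getRuntime());
        System.out.println("available heap bytes: " + available
                + ", low: " + isMemoryLow(available, DEFAULT_MIN_FREE_BYTES));

        // In a long-running server this keeps sampling for the life of the
        // process. On a low reading it exits with a non-zero status so an
        // external watchdog sees an unclean shutdown and starts a replacement.
        start(DEFAULT_MIN_FREE_BYTES, 10, () -> {
            System.err.println("heap below minimum; exiting so the watchdog restarts the server");
            System.exit(1);
        });
    }
}
```

The non-zero exit is the hand-off Sam describes: the server process only decides that it is low on memory, and a separate watchdog process is what notices the unclean exit and starts a new server.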