Nope, no thread dump, unfortunately.  The whole cluster had been
restarted by my client's IT department before I came in to help
figure out what had happened.  I will tell them to get a thread dump if
it happens again.
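
For reference, a thread dump of a running JVM is usually taken from outside the process, with `jstack <pid>` or `kill -QUIT <pid>` on Unix. The sketch below is a minimal in-process alternative built on the standard `Thread.getAllStackTraces()` call. The class name `ThreadDump` and the output layout are assumptions made for illustration, not anything Resin provides, and an in-process dump will not help if the JVM itself is frozen.

```java
import java.util.Map;

public class ThreadDump {
    /** Builds a plain-text dump of every live thread and its stack frames. */
    static String dump() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```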

I don't think the JVM itself was frozen, but I am not sure.

In the meantime, I think we will try increasing the heap size and the
memory-free-min value.

..mike..

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Scott Ferguson
Sent: Monday, March 10, 2008 4:42 PM
To: General Discussion for the Resin application server
Subject: Re: [Resin-interest] Resin cluster failure with a single
node running out of heap space


On Mar 10, 2008, at 3:57 PM, Mike Wynholds wrote:

> Hmmm... we do have a <memory-free-min> setting of 1MB (Scott asked about
> that just before this email).  So then how would Resin still get an OOM
> error?  Is there a thread in the server that watches the heap space?
> Because we do a lot of in-JVM image manipulation, which takes up a LOT
> of memory and quite quickly.  So if it is a timing issue, it's possible
> that the heap-watcher doesn't have a chance to act quickly enough.

It's checked by the main thread every 10 seconds.  So it's possible
that a quick burst of memory use could slip past it between checks.
Still, that thread should detect the problem after 10 seconds and
force a restart.

Although, memory checking isn't exact.  It's even possible the  
original failure freed up memory between checks, but that's not likely.
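
To make the timing issue concrete, here is a minimal sketch of a periodic heap check of the kind described above. It is not Resin's actual implementation. The class name `HeapWatcher`, the 1MB threshold (chosen to mirror the <memory-free-min> setting mentioned in this thread), and the exit code are all assumptions for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeapWatcher {
    // Hypothetical threshold mirroring a memory-free-min of 1MB.
    static final long MIN_FREE_BYTES = 1024L * 1024L;

    /** Bytes the heap could still hand out before hitting the -Xmx limit. */
    static long availableBytes(Runtime rt) {
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    /** True when the available heap has dropped below the threshold. */
    static boolean isLow(long availableBytes, long minFreeBytes) {
        return availableBytes < minFreeBytes;
    }

    public static void main(String[] args) {
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
        // Check every 10 seconds; anything allocated between two checks
        // goes unnoticed until the next one runs.
        timer.scheduleAtFixedRate(() -> {
            long available = availableBytes(Runtime.getRuntime());
            if (isLow(available, MIN_FREE_BYTES)) {
                System.err.println("low memory: " + available + " bytes left");
                // Non-zero exit so a supervising process can start a replacement.
                System.exit(1);
            }
        }, 0, 10, TimeUnit.SECONDS);
    }
}
```

With a 10-second interval and a 1MB margin, a single large image buffer allocated between checks can exhaust the heap before the watcher runs again, which is one reason a larger margin may help.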

Did you happen to get a thread dump or was the JVM itself frozen?  JVM  
freezes are hard to deal with.

-- Scott

>
>
> ..mike..
>
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Sam
> Sent: Monday, March 10, 2008 3:38 PM
> To: General Discussion for the Resin application server
> Subject: Re: [Resin-interest] Resin cluster failure with a single
> node running out of heap space
>
>> We are currently looking at our watchdog process config to see why it
>> did not auto-restart Resin.  I think we didn't give enough memory buffer
>> for the watchdog to detect a needed restart, and our app lost
>> responsiveness before the watchdog could restart it.  But that's just a
>> theory.
>
> The low-memory detection happens within the server itself.  If the
> server detects that memory is about to be exhausted, it exits.  The
> watchdog then notices that the server did not exit cleanly and starts
> a new server to replace it.
>
> -- Sam
>
>
>
> _______________________________________________
> resin-interest mailing list
> resin-interest@caucho.com
> http://maillist.caucho.com/mailman/listinfo/resin-interest


