Scott and others-
My client has a five-node Resin Pro cluster, with each node running version 3.1.2. Today one of the nodes experienced an OutOfMemoryError, which did not bring Resin down but seems to have put it in a completely unresponsive state. Within 10 minutes or so of that happening, the other four servers stopped responding as well.

Their logs show that they were continuously getting socket timeouts while trying to communicate with the first server for session clustering (stack trace below). To be fair, this is not the only exception being thrown. We also see our distributed EhCache system unsuccessfully trying to replicate itself, and we *also* see the occasional Hessian exception (also below). Ultimately each server seems to get so bogged down that it needs to be restarted.

So my question is this: assuming a Resin node runs out of memory, is there a way for the other Resin nodes to detect that and take the same action as if the node were actually down? I'm not sure this is really a bug, but it is probably a good super-edge-case scenario worth thinking about.

We are currently looking at our watchdog process config to see why it did not auto-restart Resin. I think we didn't leave enough of a memory buffer for the watchdog to detect that a restart was needed, so our app lost responsiveness before the watchdog could restart it. But that's just a theory.

I am interested in feedback from Scott and other Caucho developers about this issue, as well as from other Resin users who may have experienced issues like this before and have any thoughts or suggestions on the matter.

Thanks.
..mike..
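To make the question concrete, here is a minimal sketch of the kind of detection I am asking about. It is not Resin code and says nothing about how Resin's clustering actually works: the class name PeerFailureDetector, the failure threshold, and the idea of a heartbeat port that sends at least one byte on connect are all hypothetical. The point is only that a peer which accepts connections but never answers within a short timeout gets counted as failed, and is treated as down after several failures in a row.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical failure detector for one peer; not part of Resin. */
public class PeerFailureDetector {
    private final int failureThreshold;
    private int consecutiveFailures = 0;

    public PeerFailureDetector(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** Record the outcome of one probe. Any success resets the count. */
    public synchronized void record(boolean success) {
        consecutiveFailures = success ? 0 : consecutiveFailures + 1;
    }

    /** True once the peer has failed enough probes in a row. */
    public synchronized boolean isDown() {
        return consecutiveFailures >= failureThreshold;
    }

    /**
     * Probe a peer: connect, then wait for at least one byte.
     * Assumes (hypothetically) that a healthy peer writes a byte on connect.
     * A node that accepts the connection but never answers, for example
     * because it is wedged after an OutOfMemoryError, hits the read timeout
     * and is reported as a failed probe.
     */
    public static boolean probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            InputStream in = socket.getInputStream();
            return in.read() != -1;
        } catch (IOException e) {
            // Connection refused, connect timeout, or read timeout.
            return false;
        }
    }
}
```

Each node would probe its peers on a short interval, pass the result to record(), and stop sending session traffic to a peer once isDown() returns true, instead of blocking every request on a full read timeout.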
--- Socket Timeout stack trace (partial) ---
[14:47:10.389] java.net.SocketTimeoutException: Read timed out
[14:47:10.389]     at java.net.SocketInputStream.socketRead0(Native Method)
[14:47:10.389]     at java.net.SocketInputStream.read(SocketInputStream.java:129)
[14:47:10.389]     at com.caucho.vfs.TcpStream.read(TcpStream.java:163)
[14:47:10.389]     at com.caucho.vfs.ReadStream.readBuffer(ReadStream.java:1001)
[14:47:10.389]     at com.caucho.vfs.ReadStream.read(ReadStream.java:306)
[14:47:10.389]     at com.caucho.server.cluster.ClusterStore.updateAccess(ClusterStore.java:856)
[14:47:10.389]     at com.caucho.server.cluster.ClusterStore.accessServer(ClusterStore.java:823)
[14:47:10.389]     at com.caucho.server.cluster.ClusterStore.accessImpl(ClusterStore.java:804)
[14:47:10.389]     at com.caucho.server.cluster.ClusterObject.access(ClusterObject.java:337)
[14:47:10.389]     at com.caucho.server.session.SessionImpl.setAccess(SessionImpl.java:839)
[14:47:10.389]     at com.caucho.server.session.SessionManager.load(SessionManager.java:1477)
[14:47:10.389]     at com.caucho.server.session.SessionManager.getSession(SessionManager.java:1335)
[14:47:10.389]     at com.caucho.server.connection.AbstractHttpRequest.createSession(AbstractHttpRequest.java:1455)
[14:47:10.389]     at com.caucho.server.connection.AbstractHttpRequest.getSession(AbstractHttpRequest.java:1270)
[14:47:10.389]     at net.sf.acegisecurity.context.HttpSessionContextIntegrationFilter.doFilter(HttpSessionContextIntegrationFilter.java:172)
[14:47:10.389]     at net.sf.acegisecurity.util.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:303)
[14:47:10.389]     at net.sf.acegisecurity.util.FilterChainProxy.doFilter(FilterChainProxy.java:173)
[14:47:10.389]     at net.sf.acegisecurity.util.FilterToBeanProxy.doFilter(FilterToBeanProxy.java:125)
[14:47:10.389]     at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:73)

--- Hessian failure stack trace ---
[14:15:00.065] Caused by: org.springframework.web.util.NestedServletException: Hessian skeleton invocation failed; nested exception is java.io.IOException: expected 'c' in hessian input at -1
[14:15:00.065]     at org.springframework.remoting.caucho.HessianServiceExporter.handleRequest(HessianServiceExporter.java:150)
[14:15:00.065]     at org.springframework.web.servlet.mvc.HttpRequestHandlerAdapter.handle(HttpRequestHandlerAdapter.java:49)
[14:15:00.065]     at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:857)
[14:15:00.065]     at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:792)
[14:15:00.065]     at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:475)
[14:15:00.065]     at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:440)
[14:15:00.065]     at javax.servlet.http.HttpServlet.service(HttpServlet.java:153)
[14:15:00.065]     at javax.servlet.http.HttpServlet.service(HttpServlet.java:91)
[14:15:00.065]     at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:103)
[14:15:00.065]     at net.sf.acegisecurity.util.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:292)
[14:15:00.065]     at taylor.tops.security.UserTrackerFilter.doFilter(UserTrackerFilter.java:27)
[14:15:00.065]     at net.sf.acegisecurity.util.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:303)
[14:15:00.065]     at net.sf.acegisecurity.intercept.web.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:84)
[14:15:00.065]     at net.sf.acegisecurity.intercept.web.SecurityEnforcementFilter.doFilter(SecurityEnforcementFilter.java:182)
[14:15:00.065]     ... 18 more
[14:15:00.065] Caused by: java.io.IOException: expected 'c' in hessian input at -1
[14:15:00.065]     at org.springframework.remoting.caucho.Hessian2SkeletonInvoker.invoke(Hessian2SkeletonInvoker.java:51)
[14:15:00.065]     at org.springframework.remoting.caucho.HessianServiceExporter.handleRequest(HessianServiceExporter.java:147)
[14:15:00.065]     ... 31 more

.....
Michael Wynholds
President
Carbon Five, Inc.
310 821 7125 x13
[EMAIL PROTECTED]
_______________________________________________
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest