Hello again The system has now been running for a few days and I've not seen the hang on startup so I believe the issue has been resolved. Great!
I am still seeing some warnings like this: 10:58:49,250 WARN [UpdateReplicationCommand] Update has not been properly cascaded due to a communication failure. If a targeted node has been lost, state will be re-balanced automatically. org.codehaus.wadi.servicespace.ServiceInvocationException: org.codehaus.wadi.group.MessageExchangeException: No correlated messages received within [2000]ms at org.codehaus.wadi.servicespace.basic.CGLIBServiceProxyFactory$ProxyMethodInterceptor.intercept(CGLIBServiceProxyFactory.java:209) at org.codehaus.wadi.replication.storage.ReplicaStorage$$EnhancerByCGLIB$$4372d264.mergeUpdate(<generated>) at org.codehaus.wadi.replication.manager.basic.UpdateReplicationCommand.cascadeUpdate(UpdateReplicationCommand.java:93) at org.codehaus.wadi.replication.manager.basic.UpdateReplicationCommand.run(UpdateReplicationCommand.java:86) at org.codehaus.wadi.replication.manager.basic.SyncReplicationManager.update(SyncReplicationManager.java:138) at org.codehaus.wadi.replication.manager.basic.LoggingReplicationManager.update(LoggingReplicationManager.java:100) at org.codehaus.wadi.core.session.AbstractReplicableSession.onEndProcessing(AbstractReplicableSession.java:49) at org.codehaus.wadi.core.session.AtomicallyReplicableSession.onEndProcessing(AtomicallyReplicableSession.java:58) at org.apache.geronimo.clustering.wadi.WADISessionAdaptor.onEndAccess(WADISessionAdaptor.java:77) at org.apache.geronimo.jetty6.cluster.ClusteredSessionManager.complete(ClusteredSessionManager.java:60) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:198) at org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler.doHandle(ClusteredSessionHandler.java:59) at org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler$ActualHandler.handle(ClusteredSessionHandler.java:66) at org.apache.geronimo.jetty6.cluster.AbstractClusteredPreHandler$WebClusteredInvocation.invokeLocally(AbstractClusteredPreHandler.java:71) at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler$WADIWebClusteredInvocation.access$000(WADIClusteredPreHandler.java:52) at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler$WADIWebClusteredInvocation$1.doFilter(WADIClusteredPreHandler.java:64) at org.codehaus.wadi.web.impl.WebInvocation.invoke(WebInvocation.java:116) at org.codehaus.wadi.core.contextualiser.MemoryContextualiser.handleLocally(MemoryContextualiser.java:71) at org.codehaus.wadi.core.contextualiser.AbstractExclusiveContextualiser.handle(AbstractExclusiveContextualiser.java:94) at org.codehaus.wadi.core.contextualiser.AbstractMotingContextualiser.contextualise(AbstractMotingContextualiser.java:37) at org.codehaus.wadi.core.manager.StandardManager.processStateful(StandardManager.java:154) at org.codehaus.wadi.core.manager.StandardManager.contextualise(StandardManager.java:146) at org.codehaus.wadi.core.manager.ClusteredManager.contextualise(ClusteredManager.java:81) at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler$WADIWebClusteredInvocation.invoke(WADIClusteredPreHandler.java:72) at org.apache.geronimo.jetty6.cluster.AbstractClusteredPreHandler.handle(AbstractClusteredPreHandler.java:39) at org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler.handle(ClusteredSessionHandler.java:51) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.apache.geronimo.jetty6.handler.TwistyWebAppContext.access$101(TwistyWebAppContext.java:41) at org.apache.geronimo.jetty6.handler.TwistyWebAppContext$TwistyHandler.handle(TwistyWebAppContext.java:66) at org.apache.geronimo.jetty6.handler.ThreadClassloaderHandler.handle(ThreadClassloaderHandler.java:46) at org.apache.geronimo.jetty6.handler.InstanceContextHandler.handle(InstanceContextHandler.java:58) at org.apache.geronimo.jetty6.handler.UserTransactionHandler.handle(UserTransactionHandler.java:48) at org.apache.geronimo.jetty6.handler.ComponentContextHandler.handle(ComponentContextHandler.java:47) at org.apache.geronimo.jetty6.handler.TwistyWebAppContext.handle(TwistyWebAppContext.java:60) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.apache.geronimo.pool.ThreadPool$1.run(ThreadPool.java:214) at org.apache.geronimo.pool.ThreadPool$ContextClassLoaderRunnable.run(ThreadPool.java:344) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: org.codehaus.wadi.group.MessageExchangeException: No correlated messages received within [2000]ms at org.codehaus.wadi.group.impl.AbstractDispatcher.attemptMultiRendezVous(AbstractDispatcher.java:174) at org.codehaus.wadi.servicespace.basic.BasicServiceInvoker.invokeOnPeers(BasicServiceInvoker.java:90) at org.codehaus.wadi.servicespace.basic.BasicServiceInvoker.invoke(BasicServiceInvoker.java:69) at org.codehaus.wadi.servicespace.basic.CGLIBServiceProxyFactory$ProxyMethodInterceptor.intercept(CGLIBServiceProxyFactory.java:193) ... 49 more Not sure if I should worry about that. Many thanks for your help! Trygve On Tue, Nov 3, 2009 at 4:35 PM, Trygve Hardersen <[email protected]> wrote: > > > On Mon, Nov 2, 2009 at 5:39 PM, Kevan Miller <[email protected]>wrote: >> >> >> Thanks Gianny. I'd like to see this included in the Geronimo 2.2 release. >> Can we look for a new WADI release, soon? Once we know the problem is fixed? >> >> Trygve, the sooner we get confirmation that your issue is resolved, the >> sooner we can start finalizing the 2.2 release. >> >> --kevan >> > > I've now built Geronimo using the 2.2-SNAPSHOT of WADI and installed it on > our test environment. No obvious issues so I'll go ahead and deploy this to > production either later this evening (EU time) or tomorrow. Then it needs to > run for a few days before I can confirm if the issue has really been > resolved. > > BTW I got this on our test system: > > AS-000: > 16:23:17,773 INFO [TcpFailureDetector] Received > memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, > -88, 1, 61}:4000,{-64, -88, 1, 61},4000, alive=1814258,id={50 18 86 10 111 > -47 79 83 -108 -4 82 -8 26 82 -79 -59 }, payload={-84 -19 0 5 115 114 0 50 > 111 ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]] > message. Will verify. > 16:23:17,897 INFO [TcpFailureDetector] Verification complete. Member still > alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, 1, > 61}:4000,{-64, -88, 1, 61},4000, alive=1814258,id={50 18 86 10 111 -47 79 83 > -108 -4 82 -8 26 82 -79 -59 }, payload={-84 -19 0 5 115 114 0 50 111 > ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]] > > AS-001: > 16:23:18,446 INFO [TcpFailureDetector] Received > memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, > -88, 1, 60}:4000,{-64, -88, 1, 60},4000, alive=2500759,id={107 -64 91 -23 > 109 93 75 116 -95 109 110 22 -85 53 -52 85 }, payload={-84 -19 0 5 115 114 0 > 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, > ]] message. Will verify. > 16:23:18,456 INFO [TcpFailureDetector] Verification complete. Member still > alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, 1, > 60}:4000,{-64, -88, 1, 60},4000, alive=2500759,id={107 -64 91 -23 109 93 75 > 116 -95 109 110 22 -85 53 -52 85 }, payload={-84 -19 0 5 115 114 0 50 111 > ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]] > > And then: > > AS-000 > 16:30:02,576 INFO [ChannelInterceptorBase] memberDisappeared:tcp://{-64, > -88, 1, 61}:4000 > 16:30:02,577 INFO [BasicPartitionBalancerSingletonService] Queueing > partition rebalancing > 16:30:02,600 INFO [SimpleStateManager] > ============================= > New Partition Balancing > Partition Balancing > Size [24] > Partition[0] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[1] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[2] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[3] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[4] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[5] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[6] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[7] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[8] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[9] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[10] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[11] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[12] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[13] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[14] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[15] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[16] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[17] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[18] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[19] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[20] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[21] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[22] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > Partition[23] owned by [TribesPeer [AS-000; tcp://192.168.1.60:4000]]; > version [3]; mergeVersion [0] > ============================= > > 16:30:02,888 WARN [TcpFailureDetector] Member added, even though we werent > notified:org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, > 1, 61}:4000,{-64, -88, 1, 61},4000, alive=2221072,id={50 18 86 10 111 -47 79 > 83 -108 -4 82 -8 26 82 -79 -59 }, payload={-84 -19 0 5 115 114 0 50 111 > ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ] > 16:30:02,889 INFO [ChannelInterceptorBase] memberAdded:tcp://{-64, -88, 1, > 61}:4000 > > AS-001 > Nothing.... > > There is practically no load on this network. Anyway I'll try this with > load and see what happens. > > Many thanks again! > > Trygve >
