Hi Trygve,
Thanks for reporting these problems with such comprehensive log
messages; they were very helpful for diagnosing the issues.
The bug causing the inability to reliably restart without downtime
has been identified and fixed. I just deployed new WADI 2.2-SNAPSHOT
artifacts, which can be found here:
http://snapshots.repository.codehaus.org/org/codehaus/wadi/
Can you please confirm that this is now working as expected within
your environment?
Regarding the warning message
16:22:43,524 WARN [UpdateReplicationCommand] Update has not been
properly cascaded due to a communication failure. If a targeted
node has been lost, state will be re-balanced automatically.
the timeout is not exposed via the API. Even when this warning message
is displayed, the update message has still been queued for delivery to
the relevant back-up nodes. However, there is no guarantee that, in
case of failure, the cluster will be able to restore the latest
session state, as the latest update message may not have actually been
delivered to the back-up nodes.
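For illustration only, here is a minimal Java sketch of that behaviour (a hypothetical class, not WADI's actual code or timeout values): the update is queued first no matter what, and a missed acknowledgement within the timeout only downgrades to a warning, so delivery of the latest update is never guaranteed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, NOT WADI's actual classes: an update is queued for
// the back-up peers regardless of outcome; the sender then waits a bounded
// time for an acknowledgement, and a timeout only produces a warning.
public class BestEffortReplication {
    private final LinkedBlockingQueue<String> outbound = new LinkedBlockingQueue<>();

    /** Queue an update, then wait up to timeoutMs for the back-up's ack. */
    public boolean replicate(String update, CompletableFuture<Void> ack, long timeoutMs) {
        outbound.add(update); // queued regardless of the outcome below
        try {
            ack.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true; // back-up confirmed within the timeout
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            // Corresponds to the WARN in the log: the message stays queued,
            // but it may never actually reach the back-up node.
            System.out.println("WARN: update not cascaded within " + timeoutMs + "ms");
            return false;
        }
    }

    public int queued() {
        return outbound.size();
    }
}
```

This is why the warning is benign when the peer is merely slow (the queued message is still delivered later) but can lose the latest session state if the peer is actually gone.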
I conducted load testing on a single computer. Since network traffic
going through the loopback interface does not actually touch the
network card, this is a problem I was never able to observe.
Do you have an idea of the traffic volume at which this warning
message starts to appear? Also, do you have an idea of the session size?
Thanks,
Gianny
On 31/10/2009, at 3:39 AM, Trygve Hardersen wrote:
Hello
We have been using Geronimo 2.2-SNAPSHOT in production for a good
month now, and I thought I'd share some experiences with the
community, and maybe get some help. We are an online backup
service; check out jottabackup.com if you're interested.
Generally our experience has been very positive. We're using the
GBean framework for custom server components, the DB connection
pools against MySQL databases, stateless service EJBs and various
MDBs, and of course the web tier (Jetty). Everything is running
smoothly and we're very much looking forward to 2.2 being released
so we can "release" our own software.
The issues we're having are related to WADI clustering with Jetty.
First, we can't use Jetty7 because of GERONIMO-4846, so we're using
Jetty6, which works fine. The more serious issue is that we often
cannot update our servers without downtime. This is what happens:
We have 2 application servers (AS-000 and AS-001) running dynamic
WADI HTTP session replication between them. When updating we first
stop one, AS-000 in this case. That works fine and the active
sessions are migrated to AS-001:
23:43:18,160 INFO [SimpleStateManager]
=============================
New Partition Balancing
Partition Balancing
Size [24]
Partition[0] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[1] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[2] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[3] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[4] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[5] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[6] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[7] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[8] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[9] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[10] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[11] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[12] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[13] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[14] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[15] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[16] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[17] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[18] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[19] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[20] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[21] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[22] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
Partition[23] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
=============================
23:43:28,539 INFO [TcpFailureDetector] Verification complete.
Member disappeared[org.apache.catalina.tribes.membership.MemberImpl
[tcp://{10, 0, 10, 100}:4000,{10, 0, 10, 100},4000,
alive=41104531,id={-4 -32 54 90 -109 -17 65 64 -117 40 -110 -14 36
93 -12 -118 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)},
command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={74 79 84 84 65
95 87 65 68 ...(10)}, ]]
23:43:28,540 INFO [ChannelInterceptorBase] memberDisappeared:tcp://{10, 0, 10, 100}:4000
We then update AS-000 and try to start it, but it fails to rejoin
the WADI cluster:
23:46:30,784 INFO [ReceiverBase] Receiver Server Socket bound to:/10.0.10.100:4000
23:46:30,864 INFO [ChannelInterceptorBase] memberStart
local:org.apache.catalina.tribes.membership.MemberImpl[tcp://
10.0.10.100:4000,10.0.10.100,4000, alive=0,id={-103 34 80 -19 68
-51 70 -91 -108 39 -84 65 50 50 103 -107 }, payload={-84 -19 0 5
115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95
87 65 68 ...(10)}, ] notify:false peer:AS-000
23:46:30,868 INFO [McastService] Setting cluster mcast soTimeout
to 500
23:46:30,908 INFO [McastService] Sleeping for 1000 milliseconds to
establish cluster membership, start level:4
23:46:31,139 INFO [ChannelInterceptorBase] memberAdded:tcp://{10, 0, 10, 101}:4000
23:46:31,908 INFO [McastService] Done sleeping, membership
established, start level:4
23:46:31,912 INFO [McastService] Sleeping for 1000 milliseconds to
establish cluster membership, start level:8
23:46:31,927 INFO [BufferPool] Created a buffer pool with max size:
104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl
23:46:32,912 INFO [McastService] Done sleeping, membership
established, start level:8
23:46:32,912 INFO [ChannelInterceptorBase] memberStart
local:org.apache.catalina.tribes.membership.MemberImpl[tcp://
10.0.10.100:4000,10.0.10.100,4000, alive=272,id={-103 34 80 -19 68
-51 70 -91 -108 39 -84 65 50 50 103 -107 }, payload={-84 -19 0 5
115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95
87 65 68 ...(10)}, ] notify:false peer:AS-000
23:46:37,848 INFO [DiscStore] Creating directory: /usr/lib/jotta/
jotta-as-prod-1.0-SNAPSHOT/var/temp/SessionStore
23:46:37,930 INFO [BasicSingletonServiceHolder] [TribesPeer
[AS-000; tcp://10.0.10.100:4000]] owns singleton service
[PartitionManager for ServiceSpace [/]]
23:46:37,964 INFO [BasicSingletonServiceHolder] [TribesPeer
[AS-000; tcp://10.0.10.100:4000]] resigns ownership of singleton
service [PartitionManager for ServiceSpace [/]]
23:47:40,065 ERROR [BasicServiceRegistry] Error while starting [Holder for service [org.codehaus.wadi.location.partitionmanager.simplepartitionmana...@7dc2445f] named [PartitionManager] in space [ServiceSpace [/]]]
org.codehaus.wadi.location.partitionmanager.PartitionManagerException: Partition [0] is unknown.
at
org.codehaus.wadi.location.partitionmanager.SimplePartitionManager.wai
tForBoot(SimplePartitionManager.java:248)
at
org.codehaus.wadi.location.partitionmanager.SimplePartitionManager.sta
rt(SimplePartitionManager.java:119)
at org.codehaus.wadi.servicespace.basic.BasicServiceHolder.start
(BasicServiceHolder.java:60)
at org.codehaus.wadi.servicespace.basic.BasicServiceRegistry.start
(BasicServiceRegistry.java:152)
at org.codehaus.wadi.servicespace.basic.BasicServiceSpace.start
(BasicServiceSpace.java:169)
at
org.apache.geronimo.clustering.wadi.BasicWADISessionManager.doStart
(BasicWADISessionManager.java:125)
at org.apache.geronimo.gbean.runtime.GBeanInstance.createInstance
(GBeanInstance.java:948)
at
org.apache.geronimo.gbean.runtime.GBeanInstanceState.attemptFullStart(
GBeanInstanceState.java:269)
at org.apache.geronimo.gbean.runtime.GBeanInstanceState.start
(GBeanInstanceState.java:103)
at
org.apache.geronimo.gbean.runtime.GBeanInstanceState.startRecursive
(GBeanInstanceState.java:125)
at org.apache.geronimo.gbean.runtime.GBeanInstance.startRecursive
(GBeanInstance.java:538)
at org.apache.geronimo.kernel.basic.BasicKernel.startRecursiveGBean
(BasicKernel.java:377)
at
org.apache.geronimo.kernel.config.ConfigurationUtil.startConfiguration
GBeans(ConfigurationUtil.java:456)
at
org.apache.geronimo.kernel.config.ConfigurationUtil.startConfiguration
GBeans(ConfigurationUtil.java:493)
at
org.apache.geronimo.kernel.config.KernelConfigurationManager.start
(KernelConfigurationManager.java:190)
at
org.apache.geronimo.kernel.config.SimpleConfigurationManager.startConf
iguration(SimpleConfigurationManager.java:546)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.geronimo.gbean.runtime.ReflectionMethodInvoker.invoke
(ReflectionMethodInvoker.java:34)
at org.apache.geronimo.gbean.runtime.GBeanOperation.invoke
(GBeanOperation.java:130)
at org.apache.geronimo.gbean.runtime.GBeanInstance.invoke
(GBeanInstance.java:815)
at org.apache.geronimo.gbean.runtime.RawInvoker.invoke
(RawInvoker.java:57)
at org.apache.geronimo.kernel.basic.RawOperationInvoker.invoke
(RawOperationInvoker.java:35)
at
org.apache.geronimo.kernel.basic.ProxyMethodInterceptor.intercept
(ProxyMethodInterceptor.java:96)
at org.apache.geronimo.gbean.GBeanLifecycle$$EnhancerByCGLIB$
$628b9237.startConfiguration(<generated>)
at org.apache.geronimo.system.main.EmbeddedDaemon.doStartup
(EmbeddedDaemon.java:161)
at org.apache.geronimo.system.main.EmbeddedDaemon.execute
(EmbeddedDaemon.java:78)
at
org.apache.geronimo.kernel.util.MainConfigurationBootstrapper.main
(MainConfigurationBootstrapper.java:45)
at org.apache.geronimo.cli.AbstractCLI.executeMain
(AbstractCLI.java:65)
at org.apache.geronimo.cli.daemon.DaemonCLI.main(DaemonCLI.java:30)
23:47:40,078 ERROR [BasicWADISessionManager] Failed to stop
org.codehaus.wadi.servicespace.ServiceSpaceNotFoundException: ServiceSpaceName not found
at
org.codehaus.wadi.servicespace.basic.BasicServiceSpaceRegistry.unregis
ter(BasicServiceSpaceRegistry.java:55)
at
org.codehaus.wadi.servicespace.basic.BasicServiceSpace.unregisterServi
ceSpace(BasicServiceSpace.java:228)
at org.codehaus.wadi.servicespace.basic.BasicServiceSpace.stop
(BasicServiceSpace.java:175)
at
org.apache.geronimo.clustering.wadi.BasicWADISessionManager.doFail
(BasicWADISessionManager.java:134)
at org.apache.geronimo.gbean.runtime.GBeanInstance.createInstance
(GBeanInstance.java:978)
at
org.apache.geronimo.gbean.runtime.GBeanInstanceState.attemptFullStart(
GBeanInstanceState.java:269)
at org.apache.geronimo.gbean.runtime.GBeanInstanceState.start
(GBeanInstanceState.java:103)
at
org.apache.geronimo.gbean.runtime.GBeanInstanceState.startRecursive
(GBeanInstanceState.java:125)
at org.apache.geronimo.gbean.runtime.GBeanInstance.startRecursive
(GBeanInstance.java:538)
at org.apache.geronimo.kernel.basic.BasicKernel.startRecursiveGBean
(BasicKernel.java:377)
at
org.apache.geronimo.kernel.config.ConfigurationUtil.startConfiguration
GBeans(ConfigurationUtil.java:456)
at
org.apache.geronimo.kernel.config.ConfigurationUtil.startConfiguration
GBeans(ConfigurationUtil.java:493)
at
org.apache.geronimo.kernel.config.KernelConfigurationManager.start
(KernelConfigurationManager.java:190)
at
org.apache.geronimo.kernel.config.SimpleConfigurationManager.startConf
iguration(SimpleConfigurationManager.java:546)
at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.geronimo.gbean.runtime.ReflectionMethodInvoker.invoke
(ReflectionMethodInvoker.java:34)
at org.apache.geronimo.gbean.runtime.GBeanOperation.invoke
(GBeanOperation.java:130)
at org.apache.geronimo.gbean.runtime.GBeanInstance.invoke
(GBeanInstance.java:815)
at org.apache.geronimo.gbean.runtime.RawInvoker.invoke
(RawInvoker.java:57)
at org.apache.geronimo.kernel.basic.RawOperationInvoker.invoke
(RawOperationInvoker.java:35)
at
org.apache.geronimo.kernel.basic.ProxyMethodInterceptor.intercept
(ProxyMethodInterceptor.java:96)
at org.apache.geronimo.gbean.GBeanLifecycle$$EnhancerByCGLIB$
$628b9237.startConfiguration(<generated>)
at org.apache.geronimo.system.main.EmbeddedDaemon.doStartup
(EmbeddedDaemon.java:161)
at org.apache.geronimo.system.main.EmbeddedDaemon.execute
(EmbeddedDaemon.java:78)
at
org.apache.geronimo.kernel.util.MainConfigurationBootstrapper.main
(MainConfigurationBootstrapper.java:45)
at org.apache.geronimo.cli.AbstractCLI.executeMain
(AbstractCLI.java:65)
at org.apache.geronimo.cli.daemon.DaemonCLI.main(DaemonCLI.java:30)
After this failure the server stops. On the running instance AS-001,
the following is logged:
23:46:31,909 INFO [ChannelInterceptorBase] memberAdded:tcp://{10, 0, 10, 100}:4000
23:46:37,929 INFO [BasicSingletonServiceHolder] [TribesPeer
[AS-001; tcp://10.0.10.101:4000]] owns singleton service
[PartitionManager for ServiceSpace [/]]
23:46:37,929 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
23:46:38,438 ERROR [BasicEnvelopeDispatcherManager] problem dispatching message
java.lang.IllegalArgumentException: org.codehaus.wadi.core.store.BasicStoreMotable is not a Session
at
org.codehaus.wadi.replication.manager.basic.SessionStateHandler.newExt
ractFullStateExternalizable(SessionStateHandler.java:105)
at
org.codehaus.wadi.replication.manager.basic.SessionStateHandler.extrac
tFullState(SessionStateHandler.java:53)
at
org.codehaus.wadi.replication.manager.basic.CreateStorageCommand.execu
te(CreateStorageCommand.java:45)
at
org.codehaus.wadi.replication.manager.basic.SyncSecondaryManager.updat
eSecondaries(SyncSecondaryManager.java:169)
at
org.codehaus.wadi.replication.manager.basic.SyncSecondaryManager.updat
eSecondaries(SyncSecondaryManager.java:114)
at
org.codehaus.wadi.replication.manager.basic.SyncSecondaryManager.updat
eSecondaries(SyncSecondaryManager.java:103)
at
org.codehaus.wadi.replication.manager.basic.SyncSecondaryManager.updat
eSecondariesFollowingJoiningPeer(SyncSecondaryManager.java:75)
at
org.codehaus.wadi.replication.manager.basic.ReOrganizeSecondariesListe
ner.receive(ReOrganizeSecondariesListener.java:53)
at
org.codehaus.wadi.servicespace.basic.BasicServiceMonitor.notifyListene
rs(BasicServiceMonitor.java:124)
at
org.codehaus.wadi.servicespace.basic.BasicServiceMonitor.processLifecy
cleEvent(BasicServiceMonitor.java:141)
at org.codehaus.wadi.servicespace.basic.BasicServiceMonitor
$ServiceLifecycleEndpoint.dispatch(BasicServiceMonitor.java:148)
at org.codehaus.wadi.group.impl.ServiceEndpointWrapper.dispatch
(ServiceEndpointWrapper.java:50)
at org.codehaus.wadi.group.impl.BasicEnvelopeDispatcherManager
$DispatchRunner.run(BasicEnvelopeDispatcherManager.java:121)
at org.codehaus.wadi.servicespace.basic.BasicServiceSpaceDispatcher
$ExecuteInThread.execute(BasicServiceSpaceDispatcher.java:102)
at
org.codehaus.wadi.group.impl.BasicEnvelopeDispatcherManager.onEnvelope
(BasicEnvelopeDispatcherManager.java:100)
at org.codehaus.wadi.group.impl.AbstractDispatcher.doOnEnvelope
(AbstractDispatcher.java:104)
at org.codehaus.wadi.group.impl.AbstractDispatcher.onEnvelope
(AbstractDispatcher.java:100)
at
org.codehaus.wadi.servicespace.basic.ServiceSpaceEndpoint.dispatch
(ServiceSpaceEndpoint.java:49)
at org.codehaus.wadi.group.impl.ServiceEndpointWrapper.dispatch
(ServiceEndpointWrapper.java:50)
at org.codehaus.wadi.group.impl.BasicEnvelopeDispatcherManager
$DispatchRunner.run(BasicEnvelopeDispatcherManager.java:121)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
23:46:40,063 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
23:46:42,938 WARN [BasicPartitionBalancerSingletonService] Rebalancing has failed
org.codehaus.wadi.group.MessageExchangeException: No correlated messages received within [5000]ms
at
org.codehaus.wadi.group.impl.AbstractDispatcher.attemptMultiRendezVous
(AbstractDispatcher.java:174)
at
org.codehaus.wadi.location.balancing.BasicPartitionBalancer.fetchBalan
cingInfoState(BasicPartitionBalancer.java:85)
at
org.codehaus.wadi.location.balancing.BasicPartitionBalancer.balancePar
titions(BasicPartitionBalancer.java:69)
at
org.codehaus.wadi.location.balancing.BasicPartitionBalancerSingletonSe
rvice.run(BasicPartitionBalancerSingletonService.java:85)
at java.lang.Thread.run(Thread.java:619)
23:46:42,939 WARN [BasicPartitionBalancerSingletonService] Will
retry rebalancing in [500] ms
23:46:43,439 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
23:47:40,269 INFO [TcpFailureDetector] Verification complete.
Member disappeared[org.apache.catalina.tribes.membership.MemberImpl
[tcp://{10, 0, 10, 100}:4000,{10, 0, 10, 100},4000, alive=69401,id=
{-103 34 80 -19 68 -51 70 -91 -108 39 -84 65 50 50 103 -107 },
payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={66 65 66
89 45 65 76 69 88 ...(9)}, domain={74 79 84 84 65 95 87 65 68 ...
(10)}, ]]
23:47:40,271 INFO [ChannelInterceptorBase] memberDisappeared:tcp://{10, 0, 10, 100}:4000
23:47:40,271 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
If I try to start AS-000 again, the same thing happens. If we stop
AS-001, the following is logged:
23:49:18,695 INFO [SimpleStateManager] Evacuating partitions
23:49:18,699 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
23:49:23,698 WARN [SimpleStateManager] Partition balancer has
disappeared - backing off for [1000]ms
23:49:24,699 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
23:49:29,698 WARN [SimpleStateManager] Partition balancer has
disappeared - backing off for [1000]ms
23:49:30,699 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
23:49:35,699 WARN [SimpleStateManager] Partition balancer has
disappeared - backing off for [1000]ms
23:49:36,700 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
23:49:41,699 WARN [SimpleStateManager] Partition balancer has
disappeared - backing off for [1000]ms
23:49:42,701 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
23:49:47,700 WARN [SimpleStateManager] Partition balancer has
disappeared - backing off for [1000]ms
23:49:48,700 INFO [SimpleStateManager] Evacuated
23:49:48,808 INFO [AbstractExclusiveContextualiser] Unloaded
sessions=[36]
23:49:48,843 INFO [AbstractExclusiveContextualiser] Unloaded
sessions=[13]
23:49:58,852 INFO [BasicSingletonServiceHolder] [TribesPeer
[AS-001; tcp://10.0.10.101:4000]] resigns ownership of singleton
service [PartitionManager for ServiceSpace [/]]
However, AS-001 then just hangs, and we have to kill the process
to get it stopped. After this we can start AS-000, update AS-001,
and it then always seems to join the cluster without problems.
The strange thing is that this problem does not always occur;
sometimes everything goes fine. I can't find a consistent pattern,
but I've tried stopping AS-001 before AS-000, and I'm sure no
serializable object in the session has changed between the updated
and running instances.
My gut feeling is that this is either a concurrency-related bug in
WADI or a network-timeout-related problem. During normal operation
I sometimes see messages like this in the log files:
17:14:08,869 INFO [TcpFailureDetector] Received memberDisappeared
[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 10,
101}:4000,{10, 0, 10, 101},4000, alive=95659954,id={-52 -76 98 22
10 71 76 -72 -122 -59 -21 -29 44 -86 38 114 }, payload={-84 -19 0 5
115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95
87 65 68 ...(10)}, ]] message. Will verify.
17:14:08,870 INFO [TcpFailureDetector] Verification complete.
Member still alive[org.apache.catalina.tribes.membership.MemberImpl
[tcp://{10, 0, 10, 101}:4000,{10, 0, 10, 101},4000,
alive=95659954,id={-52 -76 98 22 10 71 76 -72 -122 -59 -21 -29 44
-86 38 114 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)},
command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]]
And lately, as traffic has increased, errors like this:
16:22:43,524 WARN [UpdateReplicationCommand] Update has not been
properly cascaded due to a communication failure. If a targeted
node has been lost, state will be re-balanced automatically.
org.codehaus.wadi.servicespace.ServiceInvocationException: org.codehaus.wadi.group.MessageExchangeException: No correlated messages received within [2000]ms
at org.codehaus.wadi.servicespace.basic.CGLIBServiceProxyFactory
$ProxyMethodInterceptor.intercept(CGLIBServiceProxyFactory.java:209)
at org.codehaus.wadi.replication.storage.ReplicaStorage$
$EnhancerByCGLIB$$a901e91b.mergeUpdate(<generated>)
at
org.codehaus.wadi.replication.manager.basic.UpdateReplicationCommand.c
ascadeUpdate(UpdateReplicationCommand.java:93)
at
org.codehaus.wadi.replication.manager.basic.UpdateReplicationCommand.r
un(UpdateReplicationCommand.java:86)
at
org.codehaus.wadi.replication.manager.basic.SyncReplicationManager.upd
ate(SyncReplicationManager.java:138)
at
org.codehaus.wadi.replication.manager.basic.LoggingReplicationManager.
update(LoggingReplicationManager.java:100)
at
org.codehaus.wadi.core.session.AbstractReplicableSession.onEndProcessi
ng(AbstractReplicableSession.java:49)
at
org.codehaus.wadi.core.session.AtomicallyReplicableSession.onEndProces
sing(AtomicallyReplicableSession.java:58)
at
org.apache.geronimo.clustering.wadi.WADISessionAdaptor.onEndAccess
(WADISessionAdaptor.java:77)
at
org.apache.geronimo.jetty6.cluster.ClusteredSessionManager.complete
(ClusteredSessionManager.java:60)
at org.mortbay.jetty.servlet.SessionHandler.handle
(SessionHandler.java:198)
at
org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler.doHandle
(ClusteredSessionHandler.java:59)
at org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler
$ActualHandler.handle(ClusteredSessionHandler.java:66)
at org.apache.geronimo.jetty6.cluster.AbstractClusteredPreHandler
$WebClusteredInvocation.invokeLocally
(AbstractClusteredPreHandler.java:71)
at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler
$WADIWebClusteredInvocation.access$000(WADIClusteredPreHandler.java:
52)
at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler
$WADIWebClusteredInvocation$1.doFilter(WADIClusteredPreHandler.java:
64)
at org.codehaus.wadi.web.impl.WebInvocation.invoke
(WebInvocation.java:116)
at
org.codehaus.wadi.core.contextualiser.MemoryContextualiser.handleLocal
ly(MemoryContextualiser.java:71)
at
org.codehaus.wadi.core.contextualiser.AbstractExclusiveContextualiser.
handle(AbstractExclusiveContextualiser.java:94)
at
org.codehaus.wadi.core.contextualiser.AbstractMotingContextualiser.con
textualise(AbstractMotingContextualiser.java:37)
at org.codehaus.wadi.core.manager.StandardManager.processStateful
(StandardManager.java:150)
at org.codehaus.wadi.core.manager.StandardManager.contextualise
(StandardManager.java:142)
at org.codehaus.wadi.core.manager.ClusteredManager.contextualise
(ClusteredManager.java:81)
at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler
$WADIWebClusteredInvocation.invoke(WADIClusteredPreHandler.java:72)
at
org.apache.geronimo.jetty6.cluster.AbstractClusteredPreHandler.handle(
AbstractClusteredPreHandler.java:39)
at
org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler.handle
(ClusteredSessionHandler.java:51)
at org.mortbay.jetty.handler.ContextHandler.handle
(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle
(WebAppContext.java:417)
at org.apache.geronimo.jetty6.handler.TwistyWebAppContext.access
$101(TwistyWebAppContext.java:41)
at org.apache.geronimo.jetty6.handler.TwistyWebAppContext
$TwistyHandler.handle(TwistyWebAppContext.java:66)
at
org.apache.geronimo.jetty6.handler.ThreadClassloaderHandler.handle
(ThreadClassloaderHandler.java:46)
at org.apache.geronimo.jetty6.handler.InstanceContextHandler.handle
(InstanceContextHandler.java:58)
at org.apache.geronimo.jetty6.handler.UserTransactionHandler.handle
(UserTransactionHandler.java:48)
at
org.apache.geronimo.jetty6.handler.ComponentContextHandler.handle
(ComponentContextHandler.java:47)
at org.apache.geronimo.jetty6.handler.TwistyWebAppContext.handle
(TwistyWebAppContext.java:60)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle
(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle
(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle
(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest
(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.content
(HttpConnection.java:879)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run
(SelectChannelEndPoint.java:409)
at org.apache.geronimo.pool.ThreadPool$1.run(ThreadPool.java:214)
at org.apache.geronimo.pool.ThreadPool
$ContextClassLoaderRunnable.run(ThreadPool.java:344)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.codehaus.wadi.group.MessageExchangeException: No correlated messages received within [2000]ms
at
org.codehaus.wadi.group.impl.AbstractDispatcher.attemptMultiRendezVous
(AbstractDispatcher.java:174)
at
org.codehaus.wadi.servicespace.basic.BasicServiceInvoker.invokeOnPeers
(BasicServiceInvoker.java:90)
at org.codehaus.wadi.servicespace.basic.BasicServiceInvoker.invoke
(BasicServiceInvoker.java:69)
at org.codehaus.wadi.servicespace.basic.CGLIBServiceProxyFactory
$ProxyMethodInterceptor.intercept(CGLIBServiceProxyFactory.java:193)
... 49 more
Does anyone have some insight into what might be causing this, or
where the timeouts, if there are any, can be increased?
I'm thinking that a static WADI configuration might be more stable
than the dynamic setup we have now, which relies on multicasting.
Does anyone have experience with similar setups?
Thanks!
Trygve Hardersen - Jotta