Hi Trygve,
Thanks for your test.
These warnings indicate that a session replication message was not
acknowledged within a 2000ms timeframe. The message has not been lost
and will be processed by back-up nodes as expected.
Kevan, I just deployed release 2.1.2 of WADI for geronimo 2.2.
Thanks,
Gianny
On 09/11/2009, at 11:38 PM, Trygve Hardersen wrote:
Hello again
The system has now been running for a few days and I've not seen
the hang on startup so I believe the issue has been resolved. Great!
I am still seeing some warnings like this:
10:58:49,250 WARN [UpdateReplicationCommand] Update has not been
properly cascaded due to a communication failure. If a targeted
node has been lost, state will be re-balanced automatically.
org.codehaus.wadi.servicespace.ServiceInvocationException:
org.codehaus.wadi.group.MessageExchangeException: No correlated
messages received within [2000]ms
at org.codehaus.wadi.servicespace.basic.CGLIBServiceProxyFactory
$ProxyMethodInterceptor.intercept(CGLIBServiceProxyFactory.java:209)
at org.codehaus.wadi.replication.storage.ReplicaStorage$
$EnhancerByCGLIB$$4372d264.mergeUpdate(<generated>)
at
org.codehaus.wadi.replication.manager.basic.UpdateReplicationCommand.c
ascadeUpdate(UpdateReplicationCommand.java:93)
at
org.codehaus.wadi.replication.manager.basic.UpdateReplicationCommand.r
un(UpdateReplicationCommand.java:86)
at
org.codehaus.wadi.replication.manager.basic.SyncReplicationManager.upd
ate(SyncReplicationManager.java:138)
at
org.codehaus.wadi.replication.manager.basic.LoggingReplicationManager.
update(LoggingReplicationManager.java:100)
at
org.codehaus.wadi.core.session.AbstractReplicableSession.onEndProcessi
ng(AbstractReplicableSession.java:49)
at
org.codehaus.wadi.core.session.AtomicallyReplicableSession.onEndProces
sing(AtomicallyReplicableSession.java:58)
at
org.apache.geronimo.clustering.wadi.WADISessionAdaptor.onEndAccess
(WADISessionAdaptor.java:77)
at
org.apache.geronimo.jetty6.cluster.ClusteredSessionManager.complete
(ClusteredSessionManager.java:60)
at org.mortbay.jetty.servlet.SessionHandler.handle
(SessionHandler.java:198)
at
org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler.doHandle
(ClusteredSessionHandler.java:59)
at org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler
$ActualHandler.handle(ClusteredSessionHandler.java:66)
at org.apache.geronimo.jetty6.cluster.AbstractClusteredPreHandler
$WebClusteredInvocation.invokeLocally
(AbstractClusteredPreHandler.java:71)
at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler
$WADIWebClusteredInvocation.access$000(WADIClusteredPreHandler.java:
52)
at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler
$WADIWebClusteredInvocation$1.doFilter(WADIClusteredPreHandler.java:
64)
at org.codehaus.wadi.web.impl.WebInvocation.invoke
(WebInvocation.java:116)
at
org.codehaus.wadi.core.contextualiser.MemoryContextualiser.handleLocal
ly(MemoryContextualiser.java:71)
at
org.codehaus.wadi.core.contextualiser.AbstractExclusiveContextualiser.
handle(AbstractExclusiveContextualiser.java:94)
at
org.codehaus.wadi.core.contextualiser.AbstractMotingContextualiser.con
textualise(AbstractMotingContextualiser.java:37)
at org.codehaus.wadi.core.manager.StandardManager.processStateful
(StandardManager.java:154)
at org.codehaus.wadi.core.manager.StandardManager.contextualise
(StandardManager.java:146)
at org.codehaus.wadi.core.manager.ClusteredManager.contextualise
(ClusteredManager.java:81)
at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler
$WADIWebClusteredInvocation.invoke(WADIClusteredPreHandler.java:72)
at
org.apache.geronimo.jetty6.cluster.AbstractClusteredPreHandler.handle(
AbstractClusteredPreHandler.java:39)
at
org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler.handle
(ClusteredSessionHandler.java:51)
at org.mortbay.jetty.handler.ContextHandler.handle
(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle
(WebAppContext.java:417)
at org.apache.geronimo.jetty6.handler.TwistyWebAppContext.access
$101(TwistyWebAppContext.java:41)
at org.apache.geronimo.jetty6.handler.TwistyWebAppContext
$TwistyHandler.handle(TwistyWebAppContext.java:66)
at
org.apache.geronimo.jetty6.handler.ThreadClassloaderHandler.handle
(ThreadClassloaderHandler.java:46)
at org.apache.geronimo.jetty6.handler.InstanceContextHandler.handle
(InstanceContextHandler.java:58)
at org.apache.geronimo.jetty6.handler.UserTransactionHandler.handle
(UserTransactionHandler.java:48)
at
org.apache.geronimo.jetty6.handler.ComponentContextHandler.handle
(ComponentContextHandler.java:47)
at org.apache.geronimo.jetty6.handler.TwistyWebAppContext.handle
(TwistyWebAppContext.java:60)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle
(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle
(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle
(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest
(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.content
(HttpConnection.java:879)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run
(SelectChannelEndPoint.java:409)
at org.apache.geronimo.pool.ThreadPool$1.run(ThreadPool.java:214)
at org.apache.geronimo.pool.ThreadPool
$ContextClassLoaderRunnable.run(ThreadPool.java:344)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.codehaus.wadi.group.MessageExchangeException: No
correlated messages received within [2000]ms
at
org.codehaus.wadi.group.impl.AbstractDispatcher.attemptMultiRendezVous
(AbstractDispatcher.java:174)
at
org.codehaus.wadi.servicespace.basic.BasicServiceInvoker.invokeOnPeers
(BasicServiceInvoker.java:90)
at org.codehaus.wadi.servicespace.basic.BasicServiceInvoker.invoke
(BasicServiceInvoker.java:69)
at org.codehaus.wadi.servicespace.basic.CGLIBServiceProxyFactory
$ProxyMethodInterceptor.intercept(CGLIBServiceProxyFactory.java:193)
... 49 more
Not sure if I should worry about that.
Many thanks for your help!
Trygve
On Tue, Nov 3, 2009 at 4:35 PM, Trygve Hardersen <[email protected]>
wrote:
On Mon, Nov 2, 2009 at 5:39 PM, Kevan Miller
<[email protected]> wrote:
Thanks Gianny. I'd like to see this included in the Geronimo 2.2
release. Can we look for a new WADI release, soon? Once we know the
problem is fixed?
Trygve, the sooner we get confirmation that your issue is resolved,
the sooner we can start finalizing the 2.2 release.
--kevan
I've now built Geronimo using the 2.2-SNAPSHOT of WADI and
installed it on our test environment. No obvious issues so I'll go
ahead and deploy this to production either later this evening (EU
time) or tomorrow. Then it needs to run for a few days before I can
confirm if the issue has really been resolved.
BTW I got this on our test system:
AS-000:
16:23:17,773 INFO [TcpFailureDetector] Received memberDisappeared
[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88,
1, 61}:4000,{-64, -88, 1, 61},4000, alive=1814258,id={50 18 86 10
111 -47 79 83 -108 -4 82 -8 26 82 -79 -59 }, payload={-84 -19 0 5
115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95
87 65 68 ...(10)}, ]] message. Will verify.
16:23:17,897 INFO [TcpFailureDetector] Verification complete.
Member still alive[org.apache.catalina.tribes.membership.MemberImpl
[tcp://{-64, -88, 1, 61}:4000,{-64, -88, 1, 61},4000,
alive=1814258,id={50 18 86 10 111 -47 79 83 -108 -4 82 -8 26 82 -79
-59 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={},
domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]]
AS-001:
16:23:18,446 INFO [TcpFailureDetector] Received memberDisappeared
[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88,
1, 60}:4000,{-64, -88, 1, 60},4000, alive=2500759,id={107 -64 91
-23 109 93 75 116 -95 109 110 22 -85 53 -52 85 }, payload={-84 -19
0 5 115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65
95 87 65 68 ...(10)}, ]] message. Will verify.
16:23:18,456 INFO [TcpFailureDetector] Verification complete.
Member still alive[org.apache.catalina.tribes.membership.MemberImpl
[tcp://{-64, -88, 1, 60}:4000,{-64, -88, 1, 60},4000,
alive=2500759,id={107 -64 91 -23 109 93 75 116 -95 109 110 22 -85
53 -52 85 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)},
command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]]
And then:
AS-000
16:30:02,576 INFO [ChannelInterceptorBase] memberDisappeared:tcp://
{-64, -88, 1, 61}:4000
16:30:02,577 INFO [BasicPartitionBalancerSingletonService]
Queueing partition rebalancing
16:30:02,600 INFO [SimpleStateManager]
=============================
New Partition Balancing
Partition Balancing
Size [24]
Partition[0] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[1] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[2] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[3] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[4] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[5] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[6] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[7] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[8] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[9] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[10] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[11] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[12] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[13] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[14] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[15] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[16] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[17] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[18] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[19] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[20] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[21] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[22] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
Partition[23] owned by [TribesPeer [AS-000; tcp://
192.168.1.60:4000]]; version [3]; mergeVersion [0]
=============================
16:30:02,888 WARN [TcpFailureDetector] Member added, even though
we werent notified:org.apache.catalina.tribes.membership.MemberImpl
[tcp://{-64, -88, 1, 61}:4000,{-64, -88, 1, 61},4000,
alive=2221072,id={50 18 86 10 111 -47 79 83 -108 -4 82 -8 26 82 -79
-59 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={},
domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]
16:30:02,889 INFO [ChannelInterceptorBase] memberAdded:tcp://{-64,
-88, 1, 61}:4000
AS-001
Nothing....
There is practically no load on this network. Anyway I'll try this
with load and see what happens.
Many thanks again!
Trygve