Re: Cluster setup stopped working after 3 months in production

Igor Cicimov Mon, 11 Aug 2014 23:59:18 -0700

On 12/08/2014 4:24 PM, "Krishna Saranathan" <krishna.saran...@gmail.com>
wrote:
>
> We have a J2EE WAR application deployed in a cluster of two nodes.
> Tomcat 6.0.39 is installed on both nodes, with an identical WAR
> deployed on each. It is deployed in the Amazon AWS environment, and the


Which OS? Windows or Linux? And if Linux, which distro?

> two EC2 nodes sit behind an ELB, with session stickiness enabled on
> JSESSIONID. Session replication is also enabled on both Tomcat
> nodes.
>
> The following is the Cluster configuration added to server.xml:
>
> =============================================================================
> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
>          channelSendOptions="6" channelStartOptions="3">
>
>   <Manager className="org.apache.catalina.ha.session.DeltaManager"
>            expireSessionsOnShutdown="false"
>            notifyListenersOnReplication="true" />
>
>   <Channel className="org.apache.catalina.tribes.group.GroupChannel">
>
>     <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
>               autoBind="0" selectorTimeout="5000" maxThreads="6"
>               address="x.x.x.x" port="4444" />
>     <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
>       <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
>                  timeout="60000" keepAliveTime="10" keepAliveCount="0" />
>     </Sender>
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"
>                  staticOnly="true" />
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector" />
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
>       <Member className="org.apache.catalina.tribes.membership.StaticMember"
>               host="x.x.x.x" port="4444"
>               uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4}" />
>     </Interceptor>
>   </Channel>
>   <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="" />
>   <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" />
>   <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener" />
>   <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener" />
> </Cluster>
>
> ==========================================================================
>
> The receiver IP, static member IP, and unique ID are different in the
> server.xml of the other node in the cluster.
>
> This ran fine in production for 3 months. Then the following exception
> was suddenly logged, and it has been repeating continuously since:
>
>
> ==================================================
> Aug 6, 2014 12:00:39 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={}, domain={}, ]] message. Will verify.
> Aug 6, 2014 12:00:39 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={}, domain={}, ]]
> Aug 6, 2014 12:00:39 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send
> SEVERE: Unable to send message through cluster sender.
> org.apache.catalina.tribes.ChannelException: Operation has timed out(60000 ms.).; Faulty members:tcp://10.160.40.12:4444;
>         at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:97)
>         at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
>         at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
>         at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:76)
>         at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
>         at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
>         at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:88)
>         at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
>         at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
>         at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
>         at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
>         at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:817)
>         at org.apache.catalina.ha.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:791)
>         at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:553)
>         at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:537)
>         at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:519)
>         at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:430)
>         at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:363)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
>         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
>         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>         at java.lang.Thread.run(Thread.java:662)
>
> ============================================================================
>
>
> After this, the web application is no longer accessible, and we have
> to manually kill the Tomcat process on one node, thereby disabling the
> cluster.
>
>
> We are unsure why this started happening so suddenly and why it makes
> the application completely inaccessible. If you have any suggestions
> for a remedy, please share them.

Firewall?
Did you change anything in the Security Group the instances belong to that
might have affected port 4444? Can you telnet from the server where you shut
down Tomcat to port 4444 on the server where Tomcat is still running? Did you
do a restart or an OS update/upgrade that might have pulled in a firewall
package and activated it afterwards?
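
If telnet is not installed on the instances, the same check can be done
with a few lines of script. This is only a rough sketch, assuming Python 3
is available on the node; the function name probe is made up for
illustration, and the host and port to test are the ones from your log.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try one TCP connection and report 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The connection was actively rejected: usually nothing is listening
        # on that port, though a firewall REJECT rule looks the same.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout: typical of packets being
        # silently dropped somewhere along the way.
        return "timeout"
    except OSError:
        # Anything else, e.g. no route to host or name resolution failure.
        return "error"
```

Run it from the node where Tomcat was stopped, for example
probe("10.160.40.12", 4444). A result of "timeout" would point towards the
Security Group or a host firewall dropping the traffic, while "refused"
would more likely mean the Receiver is not listening on that address and
port.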
