On 12/08/2014 7:47 PM, "Krishna Saranathan" <krishna.saran...@gmail.com> wrote:
>
> It's a Linux distro.
> Linux version 2.6.32-358.14.1.el6.x86_64 (
> mockbu...@x86-022.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313
> (Red Hat 4.4.7-3) (GCC) ) #1 SMP Mon Jun 17 15:54:20 EDT 2013
>
> Java version - 1.6 update 45.
>
> I doubt a security group change was suddenly applied to the port. I am
> able to telnet from the server that is shut down to the currently running
> server on port 4444. Yes, an OS restart was done for a hardware upgrade
> (RAM and disk volume).
>
Well, your logs clearly show the member can't establish a connection to
10.160.40.12:4444. Did you telnet to that exact IP and port, or did you use
something else, like an internal DNS name? Note that some instances on AWS
change some parameters upon restart, so check in your console to confirm
they have the IPs you expect them to have.

>
> On Tue, Aug 12, 2014 at 6:58 AM, Igor Cicimov <icici...@gmail.com> wrote:
>
> > On 12/08/2014 4:24 PM, "Krishna Saranathan" <krishna.saran...@gmail.com>
> > wrote:
> > >
> > > We have a J2EE WAR application deployed in a cluster setup with two
> > > nodes. Tomcat 6.0.39 is installed on both nodes, with an identical
> > > WAR deployed on each. It's deployed in an Amazon AWS environment, and the
> >
> > What distro? Win or linux? And if linux which one?
> >
> > > two EC2 nodes are behind an ELB, with session stickiness enabled for
> > > JSESSIONID. Session replication is also enabled on the two Tomcat
> > > nodes.
> > >
> > > Following is the Cluster config from the updated server.xml file:
> > >
> > > =============================================================================
> > > <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
> > >          channelSendOptions="6" channelStartOptions="3">
> > >
> > >   <Manager className="org.apache.catalina.ha.session.DeltaManager"
> > >            expireSessionsOnShutdown="false" notifyListenersOnReplication="true"
> > >   />
> > >
> > >   <Channel className="org.apache.catalina.tribes.group.GroupChannel">
> > >
> > >     <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
> > >               autoBind="0" selectorTimeout="5000"
> > >               maxThreads="6"
> > >               address="x.x.x.x" port="4444" />
> > >     <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
> > >       <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
> > >                  timeout="60000"
> > >                  keepAliveTime="10"
> > >                  keepAliveCount="0"
> > >       />
> > >     </Sender>
> > >     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"
> > >                  staticOnly="true"/>
> > >     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
> > >     <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
> > >       <Member className="org.apache.catalina.tribes.membership.StaticMember"
> > >               host="x.x.x.x"
> > >               port="4444"
> > >               uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4}"/>
> > >     </Interceptor>
> > >   </Channel>
> > >   <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="" />
> > >   <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" />
> > >   <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
> > >   <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
> > > </Cluster>
> > > ==========================================================================
> > >
> > > The receiver IP, static member IP, and unique ID are different in the
> > > server.xml of the other node in the cluster.
> > >
> > > This was running fine in the production environment for 3 months.
> > > Suddenly an exception like the following was logged, and it kept
> > > recurring indefinitely.
> > >
> > > ==================================================
> > > Aug 6, 2014 12:00:39 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> > > INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={}, domain={}, ]] message. Will verify.
> > > Aug 6, 2014 12:00:39 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> > > INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={}, domain={}, ]]
> > > Aug 6, 2014 12:00:39 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send
> > > SEVERE: Unable to send message through cluster sender.
> > > org.apache.catalina.tribes.ChannelException: Operation has timed out(60000 ms.).; Faulty members:tcp://10.160.40.12:4444;
> > >     at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:97)
> > >     at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
> > >     at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
> > >     at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:76)
> > >     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
> > >     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
> > >     at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:88)
> > >     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
> > >     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
> > >     at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
> > >     at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
> > >     at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:817)
> > >     at org.apache.catalina.ha.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:791)
> > >     at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:553)
> > >     at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:537)
> > >     at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:519)
> > >     at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:430)
> > >     at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:363)
> > >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> > >     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
> > >     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
> > >     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> > >     at java.lang.Thread.run(Thread.java:662)
> > > ============================================================================
> > >
> > > After this, the web application is not accessible, and we have to
> > > manually kill the Tomcat process on one node, thereby disabling the
> > > cluster.
> > >
> > > We are unsure why this suddenly started happening and disabling
> > > application access altogether. If you have any suggestions for a
> > > remedy, please share them.
> >
> > Firewall???
> > Did you change something in the SecurityGroup the instances belong to that
> > might have affected the port 4444? Can you telnet from the server where you
> > shut down Tomcat to port 4444 on the server where Tomcat is running? Did you
> > do a restart or OS update/upgrade that might have pulled in some firewall
> > package and activated it afterwards?
> >
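If telnet is not installed on a node, the connectivity test discussed in this thread can also be done from Java itself. The following is only a rough sketch, not anything taken from Tomcat: the class name PortCheck and the method canConnect are made up for illustration, and the default host and port are simply the ones that appear in the log above. It opens a plain TCP connection with a timeout, which is roughly what telnet does, so it only tells you whether the port is reachable, not whether Tribes replication itself is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Tomcat): a telnet-style reachability test.
public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable/routable.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions copied from the log in this thread;
        // pass the exact IP and port from your own server.xml instead.
        String host = args.length > 0 ? args[0] : "10.160.40.12";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 4444;
        boolean ok = canConnect(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " reachable" : " NOT reachable"));
    }
}
```

Run it on each node against the other node's Receiver address, for example `java PortCheck 10.160.40.12 4444`, using the literal IP from server.xml rather than a DNS name, so the test exercises the same address the cluster sender uses.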