Hi Patrick,

A similar problem has been reported before:
http://tomcat.10.n6.nabble.com/org-apache-catalina-tribes-ChannelException-Operation-has-timed-out-3000-ms-Faulty-members-tcp-64-88-td4656393.html

The important error message from your log output is:
    Caused by: org.apache.catalina.tribes.ChannelException: Operation has
    timed out(3000 ms.).; Faulty members:tcp://{10, 230, 20, 86}:4001;
    tcp://{10, 230, 20, 87}:4001; tcp://{10, 230, 20, 94}:4001;
    tcp://{10, 230, 20, 95}:4001; tcp://{10, 230, 20, 70}:4001;
    tcp://{10, 230, 20, 89}:4001;
        at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:109)

I am familiar with the code that generates this message. The problem is that the sending operation is abandoned for any sender object that has not been drained of data within "timeout" milliseconds. The timeout parameter is declared in the AbstractSender class as (long) 3000. By my reckoning, a small increase to the timeout value could produce a large reduction in messaging failures.

According to the information on this page:
http://tomcat.apache.org/tomcat-7.0-doc/config/cluster-sender.html

you should be able to increase the timeout by setting a transport attribute, thus:

    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
                 timeout="4000"/>
    </Sender>

However, I cannot find the code where the system reads the configuration to override the default value. If you make the alteration and find that the error message still reports "3000 ms", that would indicate an oversight in the coding, which could be reported as a bug.

BTW, your configuration for the receiver has selectorTimeout="100". The code suggests that this should be the same value as the sender/transport timeout (i.e. 3000), whereas the documentation says the default is 5000. My examination of the code suggests that the PooledParallelSender class does not read this value from the configuration but always uses 5000. Nevertheless, you could try setting that value to 5000 and seeing what happens.
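For reference, the receiver side would then look something like this. This is only a sketch based on the documented NioReceiver attributes; the address, port, and maxThreads values here are illustrative, not taken from your configuration:

```xml
<!-- Receiver sketch: selectorTimeout raised from 100 to 5000 to match
     the documented default; other attribute values are placeholders. -->
<Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
          address="auto"
          port="4000"
          selectorTimeout="5000"
          maxThreads="6"/>
```

If raising selectorTimeout changes the behaviour, that would be further evidence about which of the two values the running code actually honours.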
BTW, my own interest was in running Tribes at Internet connection speeds; by manipulating the parameter in question, my system now copes with data transfers that take multiple seconds:
http://tomcat.10.x6.nabble.com/overcoming-a-message-size-limitation-in-tribes-parallel-messaging-with-NioSender-tt4995446.html