After heavily loading a 10-node cluster for 3-4 days, I got a concurrent mode failure that lasted 53 seconds, followed by a NodeIsDeadException, which caused the node to be shut down. Is there a timeout that can be increased so this does not occur in the future? From my experience with Cassandra, concurrent mode failures are a reality of Java, so my assumption is that the ZooKeeper and master timeouts need to tolerate them.
Any suggestions? The node was not even under load when this occurred; it was idle after being under load for several days. I also saw other CMF errors on this node and on others, so I assume this one just crossed the 60-second mark, which is presumably a timeout somewhere? Any help or suggestions would be appreciated. The ParNew generation was getting large and its collections were taking too long (> 100 ms), so I will try to limit its size with the suggestion from the performance tuning page (-XX:NewSize=6m -XX:MaxNewSize=6m). Thanks
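P.S. In case it helps with comparing nodes before and after the NewSize change, here is a minimal sketch of how the collector counters could be read from inside a JVM using the standard java.lang.management API. The class name GcStats is hypothetical and not part of any project mentioned above. It only reports cumulative collection counts and times per collector, not individual pause lengths, so it would not by itself show a single 53-second pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (illustration only): summarizes the collectors of the running JVM.
public class GcStats {

    // Returns one line per collector: name, number of collections so far,
    // and accumulated collection time in milliseconds.
    static List<String> summaries() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s: count=%d, totalMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Collector names depend on the JVM flags in use, so none are hard-coded here.
        summaries().forEach(System.out::println);
    }
}
```

Dividing totalMs by count gives a rough average collection time per collector, which can be compared against the 100 ms figure above.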
