Hello, I will preface this by saying that all of the nodes have been running for about the same amount of time and were not restarted before I ran nodetool tpstats.
This is more for my own understanding than anything else, but I have a 20-node Cassandra cluster running Cassandra 3.0.3. There are 0 reads and 0 writes going to the cluster, yet I see a strange banding of load average across the nodes: 18 of the 20 nodes sit at a 15-minute load average (LA) of around 0.3, while 2 nodes sit at 0.08. Once I applied writes to the cluster, all of the load averages increased by roughly the same proportion: the 18 nodes rose to ~0.4 and the 2 nodes rose to 0.1.

When I look at what differs between these nodes, all of which have been running for the same amount of time, the only numerical difference in nodetool tpstats is the InternalResponseStage. 18 of the 20 nodes show around 20,000 completed tasks, while the other 2 show only about 300. Interestingly, the 2 low nodes also appear to be exhibiting the symptoms of https://issues.apache.org/jira/browse/CASSANDRA-11090. I restarted one of the low nodes and it immediately jumped up to match the other 18 nodes in the cluster, settling at around a 0.4 15-minute LA.

I am curious about two things. First, why is the InternalResponseStage count so low on two of the nodes, and what does that mean? Second, is it atypical for the other 18 nodes to have such a high number of completed InternalResponseStage tasks, or do these numbers seem reasonable for an idle cluster?

Thanks!
Andrew Jorgensen
@ajorgensen