Hello,

I will preface this by saying that all of the nodes have been running for
about the same amount of time and were not restarted before running
nodetool tpstats.

This is more for my understanding than anything else. I have a 20 node
Cassandra cluster running Cassandra 3.0.3. There are 0 reads and 0 writes
going to the cluster, and I see a strange banding of load average for some
of the nodes.

18 out of 20 of the nodes sit around a 15 min LA of 0.3 while 2 nodes are
at 0.08. Once I applied writes to the cluster all of the load averages
increased by roughly the same proportion, so 18 out of 20 nodes increased
to ~0.4 and 2 nodes increased to 0.1.

When I look at what is different between these nodes, all of which have
been running for the same amount of time, the only numerical difference in
nodetool tpstats is the InternalResponseStage. 18 out of 20 of the nodes
are in the range of 20,000 completed while 2 are only at 300. Interestingly
it also looks like the 2 nodes that are on the low side are exhibiting
symptoms of https://issues.apache.org/jira/browse/CASSANDRA-11090.
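
For anyone who wants to repeat this comparison across nodes, here is a
minimal Python sketch. It assumes the tpstats columns are Pool Name, Active,
Pending, Completed, Blocked, All time blocked, in that order. The function
names, the sample text, and the factor-of-10 threshold are hypothetical and
only for illustration; the counts mirror the rough numbers above.

```python
from statistics import median

# Illustrative sample only; assumes the column order
# Pool Name, Active, Pending, Completed, Blocked, All time blocked.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0              0         0                 0
InternalResponseStage             0         0          20000         0                 0
"""


def completed_tasks(tpstats_output, pool="InternalResponseStage"):
    """Return the Completed count for one pool, or None if the pool is absent."""
    for line in tpstats_output.splitlines():
        fields = line.split()
        # Single-word pool names put Completed in the fourth column.
        if len(fields) >= 4 and fields[0] == pool:
            return int(fields[3])
    return None


def low_outliers(completed_by_node, factor=10):
    """Return nodes whose count is more than `factor` times below the median."""
    mid = median(completed_by_node.values())
    return sorted(
        node for node, count in completed_by_node.items() if count * factor < mid
    )


if __name__ == "__main__":
    # Hypothetical node names; 18 nodes near 20,000 and 2 nodes near 300.
    counts = {f"node{i:02d}": 20000 for i in range(1, 19)}
    counts.update({"node19": 300, "node20": 300})
    print(completed_tasks(SAMPLE))
    print(low_outliers(counts))
```

In practice the text passed to completed_tasks would be the nodetool tpstats
output collected from each node.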

I restarted one of the low nodes and it immediately jumped up to match the
other 18 nodes in the cluster, settling around a 15 minute LA of 0.4.

I am curious about two things. First, why is the InternalResponseStage
count so low on two of the nodes in the cluster, and what does that mean?
Second, is it atypical for the other 18 nodes in the cluster to have such a
high number of InternalResponseStage completed tasks, or do these numbers
seem reasonable for an idle cluster?

Thanks!
Andrew Jorgensen
@ajorgensen
