Does tpstats show unusually high counts for blocked flush writers? As Sebastian suggests, running ttop will paint a clearer picture of what is happening within C*. I would, however, recommend going back to CMS in this case, as that is the devil we all know and more folks will be able to offer advice on its output (and it removes a delta).

> It’s starting to look to me like it’s possibly related to brief IO spikes
> that are smaller than my usual graphing granularity. It feels surprising to
> me that these would affect the Gossip threads, but it’s the best current
> lead I have with my debugging right now. More to come when I learn it.

Probably not the case since this was a result of an upgrade, but I've seen similar behavior on systems where some kernels had issues with irqbalance doing the right thing and would end up parking most interrupts on CPU0 (like, say, for the disk and ethernet modules) regardless of the number of cores. Check via 'cat /proc/interrupts' and make sure the interrupts are spread out across the CPU cores. You can steer them off manually at runtime if they are not spread out.

Also, did you upgrade anything besides Cassandra?

--
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
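
P.S. For the /proc/interrupts check above, here is a minimal Python sketch that sums the per-CPU columns so a lopsided distribution (most interrupts on CPU0) is easier to spot than by eyeballing the raw table. The function names per_cpu_totals and report are made up for illustration, and the parsing assumes the usual layout: a header row of CPU names followed by one row per IRQ.

```python
def per_cpu_totals(text):
    """Sum interrupt counts per CPU from the contents of /proc/interrupts."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()          # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        # Drop the IRQ label before the first colon, keep the counters.
        _, _, rest = line.partition(":")
        fields = rest.split()[:len(cpus)]
        # Rows such as ERR/MIS carry a single counter, not one per CPU; skip them.
        if len(fields) < len(cpus) or not all(f.isdigit() for f in fields):
            continue
        for i, value in enumerate(fields):
            totals[i] += int(value)
    return dict(zip(cpus, totals))


def report(path="/proc/interrupts"):
    """Print each CPU's total and its share of all interrupts (Linux only)."""
    with open(path) as fh:
        totals = per_cpu_totals(fh.read())
    grand_total = sum(totals.values()) or 1   # avoid dividing by zero
    for cpu, count in totals.items():
        print(f"{cpu}: {count} ({100 * count / grand_total:.1f}%)")
```

Calling report() on the node prints one line per CPU. If CPU0 carries nearly everything, the affinity of a given IRQ can usually be changed at runtime by writing a CPU bitmask to /proc/irq/<IRQ>/smp_affinity as root, though irqbalance may override it if it is still running.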