System block cache vs. disk access and metrics

Jeff Ferland Thu, 04 Feb 2016 13:27:39 -0800

We struggled for a while to upgrade due to an out-of-order SSTables bug. During 
this time, load continued to increase and we were eventually accessing the disk 
a lot. When we could finally expand the cluster, disk access went down by an 
order of magnitude. This leads me to conclude that we had blown out the block 
cache.


Linux unfortunately doesn’t have a metric for tracking the block cache hit 
ratio. There is SystemTap, which may be the way we have to go, but I’m 
wondering about Cassandra counters as well. If I can track the ratio of SSTable 
reads vs. actual disk reads, I’ll have good enough data to not spend my time 
writing up a SystemTap script.
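
As a minimal sketch of that ratio, assuming the disk side comes from the 
"reads completed" column of /proc/diskstats and the SSTable side comes from 
whichever Cassandra counter turns out to be the right one (that counter is 
the open question here, so it is passed in as a plain number). The function 
names are hypothetical, and the result is only a rough proxy: disk reads also 
include compaction and anything else touching the device.

```python
def parse_reads_completed(diskstats_text, device):
    """Return the 'reads completed' counter for one device.

    diskstats_text is the content of /proc/diskstats, whose columns are:
    major, minor, device name, reads completed, reads merged, ...
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == device:
            return int(fields[3])
    raise ValueError(f"device {device!r} not found in diskstats")


def disk_reads_per_sstable_read(disk_before, disk_after,
                                sstable_before, sstable_after):
    """Disk reads per SSTable read between two samples of each counter.

    The SSTable counter values are assumed to come from a Cassandra metric
    that has not been identified yet. Returns None when no SSTable reads
    happened in the interval, since the ratio is undefined then.
    """
    sstable_delta = sstable_after - sstable_before
    if sstable_delta <= 0:
        return None
    return (disk_after - disk_before) / sstable_delta
```

Sampling both counters at a fixed interval and watching this number rise 
would suggest more SSTable reads are missing the cache and going to disk.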

This brings about the following specific questions:
 * Which metric, if any, corresponds to the number of queries made by clients?
 * Which metric, if any, corresponds to the number of SSTable reads performed?

It isn’t perfectly clear what metrics such as cassandra.ReadCount do and don’t 
indicate, so feedback on that before I go on another source code reading 
adventure is welcome.

Cheers all,
-Jeff
