Do you mean high CPU usage or a high load average? (A value of 20 suggests load average to me.) A high load average means tasks are queuing, either for a CPU core or for something else such as disk I/O.
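To make the distinction concrete, here is a minimal sketch that normalizes a load average by core count. The helper names (load_per_core, describe) are made up for illustration, and the "above 1.0 per core means queuing" threshold is a common rule of thumb, not a hard limit.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return load_avg / cores


def describe(load_avg: float, cores: int) -> str:
    """Summarize whether tasks are likely queuing (rule of thumb: > 1.0 per core)."""
    ratio = load_per_core(load_avg, cores)
    state = "tasks are queuing" if ratio > 1.0 else "no sustained queuing"
    return f"load {load_avg:.2f} on {cores} cores = {ratio:.2f} per core: {state}"


if __name__ == "__main__":
    # The numbers from the original report: load around 20 on a 2-core node.
    print(describe(20.0, 2))
    # Current 1-minute load of this machine (os.getloadavg is Unix-only).
    if hasattr(os, "getloadavg"):
        print(describe(os.getloadavg()[0], os.cpu_count() or 1))
```

A load of 20 on 2 cores works out to 10 per core, which is why it reads as load average rather than a CPU percentage.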
Run "iostat -dmx 1 100" to check your disk stats. The rMB/s and wMB/s columns show read and write throughput, and %util shows how busy each device is. Once you understand the bottleneck we can start to narrow down the cause.

On Thu, Dec 5, 2013 at 4:33 AM, Alexander Shutyaev <shuty...@gmail.com> wrote:

> Hi all,
>
> We have a 3 node cluster setup, single keyspace, about 500 tables. The
> hardware is 2 cores + 16 GB RAM (Cassandra chose to have 4GB). Cassandra
> version is 2.0.3. Our replication factor is 3, read/write consistency is
> QUORUM. We've plugged it into our production environment as a cache in
> front of postgres. Everything worked fine, we even stressed it by
> explicitly propagating about 30G (10G/node) data from postgres to cassandra.
>
> Then the problems came. Our nodes began showing high cpu usage (around
> 20). The funny thing is that they were actually doing it one after another
> and there was always only one node with high cpu usage. Using OpsCenter we
> saw that when the CPU was beginning to go high the node in question was
> performing compaction. But even after the compaction was performed the cpu
> remained still high, and in some cases didn't go down for hours. Our jmx
> monitoring showed that it was presumably in constant garbage collection.
> During that time cluster read latency goes from 2ms to 200ms
>
> What can be the reason? Can it be high number of tables? Do we need to
> adjust some settings for this setup? Is it ok to have so many tables?
> Theoretically we can stick them all in 3-4 tables.
>
> Thanks in advance,
> Alexander
>

--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade