With a 5s collection, the problem is almost certainly GC. 

GC pressure can be caused by a number of things, including normal read/write 
loads, but ALSO compaction calculation (pre-2.1.9 / #9882) and very large 
partitions (trying to load a very large partition with something like row cache 
in 2.0 and earlier, or issuing a full row read where the row is larger than you 
expect). 

You can try to tune the GC behavior, but the underlying problem may be 
something like a bad data model (which Samuel suggested), and no amount of GC 
tuning is going to fix trying to do bad things with very big rows. 
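
Before tuning anything, it may help to see how often the long collections happen and on which collector. The following is only a minimal sketch, assuming the log messages have exactly the format quoted further down this thread ("GC for <collector>: <ms> ms for <n> collections, <used> used; max is <max>"). The function names and the 1000 ms threshold are hypothetical choices for illustration, not anything Cassandra provides.

```python
import re

# Matches GC messages in the format quoted in this thread, e.g.
# "GC for ConcurrentMarkSweep: 5908 ms for 1 collections,
#  1986282520 used; max is 8375238656" (a single line in the real log).
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_line(line):
    """Return a dict describing one GC message, or None if the line is not one."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    return {
        "collector": match.group("collector"),
        "pause_ms": int(match.group("ms")),
        "collections": int(match.group("count")),
        "used_bytes": int(match.group("used")),
        "max_bytes": int(match.group("max")),
    }


def long_pauses(lines, threshold_ms=1000):
    """Collect GC events whose reported duration is at least threshold_ms.

    The 1000 ms default is an arbitrary illustrative cutoff.
    """
    events = []
    for line in lines:
        event = parse_gc_line(line)
        if event is not None and event["pause_ms"] >= threshold_ms:
            events.append(event)
    return events
```

To use it, pass the lines of system.log from an affected node, for example long_pauses(open(path)) with path set to wherever your install writes that file. Comparing the result with an unaffected node shows whether the multi-second ConcurrentMarkSweep collections are confined to the nodes that spike.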



From:  Roman Tkachenko
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, September 10, 2015 at 10:54 AM
To:  "user@cassandra.apache.org"
Subject:  Re: High CPU usage on some of nodes

Thanks for the responses guys. 

I also suspected GC and I guess it could be it, since during the spikes logs 
are filled with messages like "GC for ConcurrentMarkSweep: 5908 ms for 1 
collections, 1986282520 used; max is 8375238656", often right before messages 
about dropped queries, unlike other, unaffected, nodes that only have "GC for 
ParNew: 230 ms for 1 collections, 4418571760 used; max is 8375238656" type of 
messages.

Is my best shot to play with JVM settings trying to tune garbage collection 
then?


On Thu, Sep 10, 2015 at 6:52 AM, Samuel CARRIERE <samuel.carri...@urssaf.fr> 
wrote:
Hi Roman, 
If it affects only a subset of nodes and it's always the same ones, it could be 
a "problem" with your data model: maybe some (too) wide rows on these nodes.
If one of your rows is too wide, the deserialisation of the column index of 
this row can take a lot of resources (disk, RAM, and CPU).
If you are using leveled compaction strategy and you see abnormally big sstables 
on those nodes, it could be a clue.
Regards, 
Samuel 

Robert Wille <rwi...@fold3.com> wrote on 10/09/2015 15:27:41:

> From: Robert Wille <rwi...@fold3.com>
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: 10/09/2015 15:30
> Subject: Re: High CPU usage on some of nodes
> 
> It sounds like it's probably GC. Grep for GC in system.log to verify.
> If it is GC, there are a myriad of issues that could cause it, but 
> at least you’ve narrowed it down.
> 
> On Sep 9, 2015, at 11:05 PM, Roman Tkachenko <ro...@mailgunhq.com> wrote:
> 
> > Hey guys,
> > 
> > We've been having issues in the past couple of days with CPU usage
> > / load average suddenly skyrocketing on some nodes of the cluster,
> > affecting performance significantly so that the majority of requests
> > start timing out. It can go on for several hours, with CPU spiking
> > through the roof, then coming back down to normal, and so on. Weirdly,
> > it affects only a subset of nodes and it's always the same ones. The
> > boxes Cassandra is running on are pretty beefy, 24 cores, and these
> > CPU spikes go up to >1000%.
> > 
> > What is the best way to debug this kind of issue and find out
> > what Cassandra is doing during spikes like this? It doesn't seem to be
> > compaction related, as sometimes during these spikes "nodetool
> > compactionstats" says no compactions are running.
> > 
> > Thanks!
> > 
> 

