[ https://issues.apache.org/jira/browse/CASSANDRA-12965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

C. Scott Andreas updated CASSANDRA-12965:
-----------------------------------------
    Component/s: Streaming and Messaging

> StreamReceiveTask causing high CPU utilization during repair
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-12965
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12965
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Randy Fradin
>            Priority: Major
>
> During a full repair run, I observed one node in my cluster using 100% CPU 
> (100% of all cores on a 48-core machine). When I took a stack trace I found 
> exactly 48 running StreamReceiveTask threads. Each was in the same block of 
> code in StreamReceiveTask.OnCompletionRunnable:
> {noformat}
> "StreamReceiveTask:8077" #1511134 daemon prio=5 os_prio=0 tid=0x00007f01520a8800 nid=0x6e77 runnable [0x00007f020dfae000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.ComparableTimSort.binarySort(ComparableTimSort.java:258)
>         at java.util.ComparableTimSort.sort(ComparableTimSort.java:203)
>         at java.util.Arrays.sort(Arrays.java:1312)
>         at java.util.Arrays.sort(Arrays.java:1506)
>         at java.util.ArrayList.sort(ArrayList.java:1454)
>         at java.util.Collections.sort(Collections.java:141)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:257)
>         at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:280)
>         at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72)
>         at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:590)
>         at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:584)
>         at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:565)
>         at org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:761)
>         at org.apache.cassandra.db.DataTracker.addSSTablesToTracker(DataTracker.java:428)
>         at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:283)
>         at org.apache.cassandra.db.ColumnFamilyStore.addSSTables(ColumnFamilyStore.java:1422)
>         at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:148)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> All 48 threads were in ColumnFamilyStore.addSSTables(), and specifically in 
> the IntervalNode constructor called from the IntervalTree constructor.
> It stayed this way for maybe an hour before we restarted the node. The repair 
> was also generating thousands (20,000+) of tiny SSTables in a table that 
> previously had just 20.
> I don't know enough about SSTables and ColumnFamilyStore to know whether all this 
> CPU work is necessary or a bug, but I did notice that these tasks run on 
> a thread pool constructed in StreamReceiveTask.java, so perhaps this pool 
> should have a maximum thread count lower than the number of processors on the 
> machine, at least for machines with many processors. Is there any reason not to do 
> that? Any ideas for a reasonable number or formula to cap the thread count?
> Some additional info: We have never run incremental repair on this cluster, 
> so that is not a factor. All our tables use LCS. Unfortunately I don't have 
> the log files from the period saved.
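
The cap proposed in the description could look roughly like the sketch below. It is only an illustration of the idea, not the code in StreamReceiveTask.java: the class name CappedStreamPool, the method names, and the cap value of 8 are all hypothetical, and the right number or formula is exactly the open question raised above.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Cassandra: builds a pool whose size is
// bounded by both the processor count and a fixed upper limit.
public class CappedStreamPool {
    // Assumed cap for illustration only; the issue does not propose a value.
    public static final int HYPOTHETICAL_MAX_THREADS = 8;

    // Returns the smaller of the processor count and the configured cap.
    public static int cappedThreadCount(int availableProcessors, int maxThreads) {
        if (availableProcessors < 1 || maxThreads < 1) {
            throw new IllegalArgumentException("both arguments must be >= 1");
        }
        return Math.min(availableProcessors, maxThreads);
    }

    // Creates a fixed-size pool with an unbounded queue, so extra tasks wait
    // in the queue instead of occupying additional cores.
    public static ThreadPoolExecutor newCappedExecutor(int maxThreads) {
        int threads = cappedThreadCount(
                Runtime.getRuntime().availableProcessors(), maxThreads);
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                threads, threads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        // Let idle threads exit after the keep-alive period.
        executor.allowCoreThreadTimeOut(true);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor executor = newCappedExecutor(HYPOTHETICAL_MAX_THREADS);
        for (int i = 0; i < 100; i++) {
            executor.submit(() -> { /* stand-in for a completion task */ });
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("largest pool size: " + executor.getLargestPoolSize());
    }
}
```

With a cap like this, a 48-core machine would run at most 8 of these tasks at once and queue the rest. That limits how many cores the tasks can saturate, but it does not reduce the work each task performs.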



