[ https://issues.apache.org/jira/browse/CASSANDRA-14369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432238#comment-16432238 ]
Paulo Motta commented on CASSANDRA-14369:
-----------------------------------------

Since you are using multiple disks (JBOD), this looks similar to CASSANDRA-13948. Would you mind upgrading to 3.11.2 and checking whether the issue still happens there?

> infinite loop when decommission a node
> --------------------------------------
>
>                 Key: CASSANDRA-14369
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14369
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Daniel Woo
>            Priority: Major
>             Fix For: 3.11.1
>
>
> I have 6 nodes (N1 to N6). N2 to N6 are new hardware with two SSDs each;
> N1 is an old box with spinning disks, and I am trying to decommission N1.
> Then I see two nodes trying to receive streaming from N1 infinitely. The
> log rotates so quickly that I can only see this:
>
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,560 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,561 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,561 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
> {{INFO [CompactionExecutor:19401] 2018-04-07 13:07:56,561 LeveledManifest.java:474 - Adding high-level (L3) BigTableReader(path='/opt/platform/data1/cassandra/data/data/contract_center_cloud/contract-2f2f9f70cd9911e7bfe87fec03576322/mc-31-big-Data.db') to candidates}}
>
> nodetool tpstats shows that some compactions are pending:
>
> {{Pool Name                    Active   Pending      Completed   Blocked  All time blocked}}
> {{ReadStage                         0         0        1366419         0                 0}}
> {{MiscStage                         0         0              0         0                 0}}
> {{CompactionExecutor                9         9          77739         0                 0}}
> {{MutationStage                     0         0        7504702         0                 0}}
> {{MemtableReclaimMemory             0         0            327         0                 0}}
> {{PendingRangeCalculator            0         0             20         0                 0}}
> {{GossipStage                       0         0         486365         0                 0}}
> {{SecondaryIndexManagement          0         0              0         0                 0}}
>
> This is from the jstack output:
>
> {{"CompactionExecutor:16666" #26533 daemon prio=1 os_prio=4 tid=0x00007f971812f170 nid=0x6581 waiting for monitor entry [0x00007f9990f4a000]}}
> {{   java.lang.Thread.State: BLOCKED (on object monitor)}}
> {{    at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:310)}}
> {{    - waiting to lock <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)}}
> {{    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:119)}}
> {{    at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:119)}}
> {{    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:262)}}
> {{    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{    at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)}}
> {{    at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/2123444693.run(Unknown Source)}}
> {{    at java.lang.Thread.run(Thread.java:748)}}
>
> {{   Locked ownable synchronizers:}}
> {{    - <0x00000002b48f5ff0> (a java.util.concurrent.ThreadPoolExecutor$Worker)}}
>
> {{"CompactionExecutor:16632" #26499 daemon prio=1 os_prio=4 tid=0x00007f970c16c420 nid=0x6553 runnable [0x00007f9982714000]}}
> {{   java.lang.Thread.State: RUNNABLE}}
> {{    at org.apache.cassandra.db.compaction.LeveledManifest.getLevelSize(LeveledManifest.java:489)}}
> {{    - locked <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)}}
> {{    at org.apache.cassandra.db.compaction.LeveledManifest.getOverlappingStarvedSSTables(LeveledManifest.java:448)}}
> {{    at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:370)}}
> {{    - locked <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)}}
> {{    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getNextBackgroundTask(LeveledCompactionStrategy.java:119)}}
> {{    at org.apache.cassandra.db.compaction.CompactionStrategyManager.getNextBackgroundTask(CompactionStrategyManager.java:119)}}
> {{    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:262)}}
> {{    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)}}
> {{    at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)}}
> {{    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}}
> {{    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81)}}
> {{    at org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$5/2123444693.run(Unknown Source)}}
> {{    at java.lang.Thread.run(Thread.java:748)}}
>
> {{   Locked ownable synchronizers:}}
>
> Now the problem is that this is my online production environment. How can I fix
> it online?
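
When a thread dump like the one quoted in this issue runs to thousands of lines, it can help to summarize which thread owns each contended monitor and which threads are blocked on it. The following is only a rough sketch, assuming the jstack output has been saved as plain text in the usual HotSpot layout; the function name `monitor_contention` and the `SAMPLE` excerpt are hypothetical helpers for illustration, not part of Cassandra or the JDK.

```python
import re
from collections import defaultdict

# Patterns for the usual HotSpot jstack layout (an assumption about the input).
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
WAITING = re.compile(r'-\s+waiting to lock <(?P<addr>0x[0-9a-fA-F]+)>')
LOCKED = re.compile(r'-\s+locked <(?P<addr>0x[0-9a-fA-F]+)>')


def monitor_contention(dump: str) -> dict:
    """Map each monitor address to its owning thread and the threads waiting on it."""
    result = defaultdict(lambda: {"owner": None, "waiters": []})
    current = None  # name of the thread whose stack is being read
    for raw in dump.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        if current is None:
            continue
        waiting = WAITING.search(line)
        if waiting:
            result[waiting.group("addr")]["waiters"].append(current)
            continue
        locked = LOCKED.search(line)
        if locked:
            result[locked.group("addr")]["owner"] = current
    return dict(result)


# Hypothetical excerpt, shortened from the two stacks quoted in the issue.
SAMPLE = '''\
"CompactionExecutor:16666" #26533 daemon prio=1 os_prio=4 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.cassandra.db.compaction.LeveledManifest.getCompactionCandidates(LeveledManifest.java:310)
    - waiting to lock <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)

"CompactionExecutor:16632" #26499 daemon prio=1 os_prio=4 runnable
   java.lang.Thread.State: RUNNABLE
    at org.apache.cassandra.db.compaction.LeveledManifest.getLevelSize(LeveledManifest.java:489)
    - locked <0x00000001c14acab0> (a org.apache.cassandra.db.compaction.LeveledManifest)
'''

if __name__ == "__main__":
    for addr, info in monitor_contention(SAMPLE).items():
        print(addr, "owner:", info["owner"], "waiters:", info["waiters"])
```

On the excerpt, this reports CompactionExecutor:16632 as the owner of the LeveledManifest monitor and CompactionExecutor:16666 as a waiter, which matches the RUNNABLE and BLOCKED states shown in the quoted stacks. It only summarizes contention and does not diagnose why the owning thread keeps looping.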