[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-11-16 Thread Joel Knighton (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670769#comment-15670769 ]

Joel Knighton commented on CASSANDRA-11709:
---

I've been unable to get much further on this without a comparably large cluster 
to test on. The branch I've linked above does address part of the issue by 
reducing unnecessary invalidation of the cached ring; I think a patch fully 
addressing this issue will need that change as well as others.

Unassigning so as not to block progress.

> Lock contention when large number of dead nodes come back within short time
> ---
>
> Key: CASSANDRA-11709
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11709
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Joel Knighton
> Fix For: 2.2.x, 3.x
>
> Attachments: lock.jstack
>
>
> We have a few hundred nodes across 3 data centers, and we are doing a few 
> million writes per second into the cluster.
> We were trying to simulate a data center failure by disabling gossip on all 
> the nodes in one data center. After ~20 mins, I re-enabled gossip on those 
> nodes, doing 5 nodes in each batch and sleeping 5 seconds between batches.
> After that, I saw the latency of read/write requests increase a lot, and 
> client requests started to time out.
> On the node, I can see a huge number of pending tasks in GossipStage. 
> =====
> 2016-05-02_23:55:08.99515 WARN  23:55:08 Gossip stage has 36337 pending tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:09.36009 INFO  23:55:09 Node /2401:db00:2020:717a:face:0:41:0 state jump to normal
> 2016-05-02_23:55:09.99057 INFO  23:55:09 Node /2401:db00:2020:717a:face:0:43:0 state jump to normal
> 2016-05-02_23:55:10.09742 WARN  23:55:10 Gossip stage has 36421 pending tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:10.91860 INFO  23:55:10 Node /2401:db00:2020:717a:face:0:45:0 state jump to normal
> 2016-05-02_23:55:11.20100 WARN  23:55:11 Gossip stage has 36558 pending tasks; skipping status check (no nodes will be marked down)
> 2016-05-02_23:55:11.57893 INFO  23:55:11 Node /2401:db00:2030:612a:face:0:49:0 state jump to normal
> 2016-05-02_23:55:12.23405 INFO  23:55:12 Node /2401:db00:2020:7189:face:0:7:0 state jump to normal
> =====
> And I took a jstack of the node; I found the read/write threads are blocked 
> by a lock.
> ===== read thread =====
> "Thrift:7994" daemon prio=10 tid=0x7fde91080800 nid=0x5255 waiting for monitor entry [0x7fde6f8a1000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.cassandra.locator.TokenMetadata.cachedOnlyTokenMap(TokenMetadata.java:546)
> - waiting to lock <0x7fe4faef4398> (a org.apache.cassandra.locator.TokenMetadata)
> at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:111)
> at org.apache.cassandra.service.StorageService.getLiveNaturalEndpoints(StorageService.java:3155)
> at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1526)
> at org.apache.cassandra.service.StorageProxy.getLiveSortedEndpoints(StorageProxy.java:1521)
> at org.apache.cassandra.service.AbstractReadExecutor.getReadExecutor(AbstractReadExecutor.java:155)
> at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1328)
> at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1270)
> at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1195)
> at org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:118)
> at org.apache.cassandra.thrift.CassandraServer.getSlice(CassandraServer.java:275)
> at org.apache.cassandra.thrift.CassandraServer.multigetSliceInternal(CassandraServer.java:457)
> at org.apache.cassandra.thrift.CassandraServer.getSliceInternal(CassandraServer.java:346)
> at org.apache.cassandra.thrift.CassandraServer.get_slice(CassandraServer.java:325)
> at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3659)
> at org.apache.cassandra.thrift.Cassandra$Processor$get_slice.getResult(Cassandra.java:3643)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:205)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> ===== writer =====
> "Thrift:7668" daemon prio=10 tid=0x7fde90d91000 nid=0x50e

[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-08-02 Thread Joel Knighton (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404599#comment-15404599 ]

Joel Knighton commented on CASSANDRA-11709:
---

A long, long overdue update. I have a branch here that should greatly decrease 
the number of times we invalidate the cached ring when nodes restart or gossip 
is restarted. I believe this should significantly help the problem based on 
the logs/jstacks provided so far. That said, I still haven't had a large 
enough cluster to faithfully reproduce this, so there may be other problems 
lurking after this (potential) fix.

The branch is available at 
[CASSANDRA-11709-2.2|https://github.com/jkni/cassandra/tree/CASSANDRA-11709-2.2].
 I'm holding off on marking this officially Patch Available while I do further 
testing; I've been trying to run CI on this, but other problems are making it 
difficult to get a conclusive run at the moment. From what I can tell, CI 
looks pretty good on it.

If anyone who can reproduce this issue is interested in testing this patch in a 
dev environment, that would be greatly appreciated. I would strongly recommend 
against putting this patch in an environment of any importance until I test it 
more.
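
For clarity, the idea (a minimal sketch under my assumptions, not code from the 
branch itself) is to invalidate the cached ring only when a node's 
datacenter/rack actually changed, rather than on every gossip restart or state 
jump:

{code}
import java.util.Objects;

// Hypothetical illustration only, not the linked branch: skip cache
// invalidation when the reloaded snitch state is unchanged.
public class SnitchReloadCheck
{
    public static boolean shouldInvalidate(String oldDc, String oldRack,
                                           String newDc, String newRack)
    {
        // Invalidate only on a real topology change.
        return !Objects.equals(oldDc, newDc) || !Objects.equals(oldRack, newRack);
    }

    public static void main(String[] args)
    {
        System.out.println(shouldInvalidate("DC1", "r1", "DC1", "r1")); // false: keep cache
        System.out.println(shouldInvalidate("DC1", "r1", "DC2", "r1")); // true: rebuild
    }
}
{code}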


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-19 Thread Dikang Gu (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292449#comment-15292449 ]

Dikang Gu commented on CASSANDRA-11709:
---

[~jkni] Thanks a lot for looking at this!
1. The jstack was taken from a node that did not have gossip disabled, and I 
attached the full jstack.
2. I will send the logs to your email.
3. The latency increased several minutes after I re-enabled gossip. It could 
not recover by itself; I fixed it with a rolling restart of the cluster.


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-19 Thread Jeremiah Jordan (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292423#comment-15292423 ]

Jeremiah Jordan commented on CASSANDRA-11709:
-

It falls back so you can do a rolling upgrade from PFS to GPFS.


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-19 Thread Dikang Gu (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292415#comment-15292415 ]

Dikang Gu commented on CASSANDRA-11709:
---

[~jjordan], yes, it's definitely possible. I'm wondering what the reason is 
that GPFS would fall back to the PFS?


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-17 Thread Jeremiah Jordan (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288161#comment-15288161 ]

Jeremiah Jordan commented on CASSANDRA-11709:
-

[~dikanggu] if you are using GossipingPropertyFileSnitch the DC1 R1 thing is 
most likely because you have a cassandra-topology.properties file and GPFS is 
falling back to that when it shouldn't.
https://issues.apache.org/jira/browse/CASSANDRA-11508
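
For anyone auditing their own nodes, a minimal illustration (my example values, 
not from this cluster): with GPFS, each node's datacenter/rack should come from 
cassandra-rackdc.properties, e.g.

{code}
# cassandra-rackdc.properties, read by GossipingPropertyFileSnitch
dc=DC1
rack=RAC1
{code}

If a cassandra-topology.properties file is also present, GPFS can fall back to 
it as described above, so on a GPFS-only cluster it is safest to remove or 
rename that file once no PropertyFileSnitch nodes remain.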


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-17 Thread Joel Knighton (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287794#comment-15287794 ]

Joel Knighton commented on CASSANDRA-11709:
---

[~dikanggu] - I'm investigating a patch that reduces cached token map 
invalidation and testing it to ensure that it still invalidates the cache when 
necessary. I believe this should help some of the issues; that said, I don't 
have a 500-node cluster easily available to test on.

A few questions:
* Is the jstack above from a node that had gossip disabled and re-enabled, or 
from a node that did not? Can you attach the full jstack? It would be very 
helpful.
* Are logs available from both a node that had gossip disabled/re-enabled and 
a node that did not?
* How long after the restart of gossip did you see degraded read/write 
performance? Did the cluster recover?


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-10 Thread Joel Knighton (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278576#comment-15278576 ]

Joel Knighton commented on CASSANDRA-11709:
---

Sorry [~dikanggu] - looking at this further, I believe that [CASSANDRA-11740] 
is actually the root cause of this issue, so I've resolved it as a duplicate 
and we can continue here.

I do not believe this is related to the pending gossip tasks - I believe the 
actual root cause is the change in datacenter/rack you mentioned. The 
GossipingPropertyFileSnitch periodically reloads its configuration and 
invalidates the cached token ring if the datacenter/rack has changed. If you 
didn't bootstrap/decommission any nodes during your testing, I suspect this is 
the root cause of the invalidations.
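
To make the contention above concrete, here is a deliberately simplified sketch 
of the general caching pattern involved (my illustration, not Cassandra's 
actual TokenMetadata code; the class and method bodies are hypothetical). Every 
invalidation sends the next wave of readers into the synchronized slow path, 
where all read/write threads queue on a single monitor while the cached copy is 
rebuilt:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a lock-free fast path over a cached copy, with a
// single-monitor slow path that every thread hits right after an invalidation.
public class CachedRing
{
    private volatile Map<String, String> cached = null;

    public Map<String, String> cachedOnlyTokenMap()
    {
        Map<String, String> map = cached;
        if (map != null)
            return map;                 // fast path: no locking once cached

        synchronized (this)             // slow path: threads block here
        {
            if (cached == null)
                cached = buildExpensiveCopy();
            return cached;
        }
    }

    public void invalidate()
    {
        cached = null;                  // e.g. on a perceived DC/rack change
    }

    private Map<String, String> buildExpensiveCopy()
    {
        return new HashMap<>();         // stand-in for cloning the token ring
    }
}
{code}

With frequent invalidations, the fast path is rarely usable and the monitor 
becomes the bottleneck seen in the jstack.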

The following information will help resolve this issue:
- What version of Cassandra are you running?
- Have you always been using GossipingPropertyFileSnitch?
- Is this an issue you've only seen once or is it reproducible?
- Did you make any changes to your cassandra-rackdc.properties files during 
the test? How do you make changes to your cassandra-rackdc.properties files?
- Can you attach a cassandra-rackdc.properties file from an affected node?
- Can you attach a log from an affected node? In addition, if you have it, can 
you attach the full jstack from an affected node?

In all the above questions, "affected node" refers to a node where many 
read/write threads are blocked on the cachedOnlyTokenMap.

Thanks.


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-09 Thread Joel Knighton (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277099#comment-15277099 ]

Joel Knighton commented on CASSANDRA-11709:
---

That sounds like a different issue to me - I'd recommend opening another issue 
with as much info about your setup (snitch, etc.) as possible.


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-09 Thread Dikang Gu (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277090#comment-15277090 ]

Dikang Gu commented on CASSANDRA-11709:
---

[~jkni], cool, thanks!

One more question: I also found a serious problem recently, that in "nodetool 
status", a bunch of nodes are not shown in the correct region. For example, we 
have three regions A, B, and C, and I found that almost 100 nodes are not 
shown in any of those regions; they are shown as in DC1 and Rack r1, which I 
think completely broke the replication and returned incorrect data to client 
requests.

Do you think they are the same issue, or should I open another jira?

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                          Load       Tokens  Owns    Host ID                               Rack
UN  2401:db00:11:6134:face:0:1:0     509.52 GB  256     ?       e24656ac-c3b2-4117-b933-a5b06852c993  r1
UN  2401:db00:11:b218:face:0:5:0     510.01 GB  256     ?       53da2104-b1b5-4fa5-a3dd-52c7557149f9  r1
UN  2401:db00:2130:5133:face:0:4d:0  459.75 GB  256     ?       ef8311f0-f6b8-491c-904d-baa925cdd7c2  r1


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-09 Thread Joel Knighton (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277075#comment-15277075 ]

Joel Knighton commented on CASSANDRA-11709:
---

I've just started really looking at this now - I'll post an update regarding 
the strategy to fix this once I have one.


[jira] [Commented] (CASSANDRA-11709) Lock contention when large number of dead nodes come back within short time

2016-05-09 Thread Dikang Gu (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-11709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277066#comment-15277066 ]

Dikang Gu commented on CASSANDRA-11709:
---

[~jkni], can you please share a bit about how you are going to fix this? Thanks!
