[jira] [Comment Edited] (CASSANDRA-13043) UnavailabeException caused by counter writes forwarded to leaders without complete cluster view

2017-09-15 Thread Stefano Ortolani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167923#comment-16167923
 ] 

Stefano Ortolani edited comment on CASSANDRA-13043 at 9/15/17 2:19 PM:
---

Thanks a lot for the feedback!

I am submitting a new patch: [^13043-3.0.patch].


was (Author: ostefano):
Thanks a lot for the feedback!

I am submitting a new patch.

> UnavailabeException caused by counter writes forwarded to leaders without 
> complete cluster view
> ---
>
> Key: CASSANDRA-13043
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13043
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
> Environment: Debian
> Reporter: Catalin Alexandru Zamfir
> Assignee: Stefano Ortolani
> Priority: Minor
> Fix For: 3.0.x, 3.11.x
>
> Attachments: 13043-3.0.patch, patch.diff
>
>
> In version 3.9 of Cassandra, we get the following exceptions in the 
> system.log whenever booting an agent. They seem to grow in number with each 
> reboot. Any idea where they come from, or what we can do about them? Note that 
> the cluster is healthy (has sufficient live nodes).
> {noformat}
> 12/14/2016 12:39:47 PM INFO  10:39:47 Updating topology for /10.136.64.120
> 12/14/2016 12:39:47 PM INFO  10:39:47 Updating topology for /10.136.64.120
> 12/14/2016 12:39:47 PM WARN  10:39:47 Uncaught exception on thread Thread[CounterMutationStage-111,5,main]: {}
> 12/14/2016 12:39:47 PM org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_QUORUM
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.db.ConsistencyLevel.assureSufficientLiveNodes(ConsistencyLevel.java:313) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.service.AbstractWriteResponseHandler.assureSufficientLiveNodes(AbstractWriteResponseHandler.java:146) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1054) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.service.StorageProxy.applyCounterMutationOnLeader(StorageProxy.java:1450) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.db.CounterMutationVerbHandler.doVerb(CounterMutationVerbHandler.java:48) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
> 12/14/2016 12:39:47 PM WARN  10:39:47 Uncaught exception on thread Thread[CounterMutationStage-118,5,main]: {}
> 12/14/2016 12:39:47 PM org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level LOCAL_QUORUM
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.db.ConsistencyLevel.assureSufficientLiveNodes(ConsistencyLevel.java:313) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.service.AbstractWriteResponseHandler.assureSufficientLiveNodes(AbstractWriteResponseHandler.java:146) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1054) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.service.StorageProxy.applyCounterMutationOnLeader(StorageProxy.java:1450) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.db.CounterMutationVerbHandler.doVerb(CounterMutationVerbHandler.java:48) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-3.9.jar:3.9]
> 12/14/2016 12:39:47 PM     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
> 12/14/2016 12:39:47 PM     at 
> 

[jira] [Comment Edited] (CASSANDRA-13043) UnavailabeException caused by counter writes forwarded to leaders without complete cluster view

2017-09-04 Thread Stefano Ortolani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149262#comment-16149262
 ] 

Stefano Ortolani edited comment on CASSANDRA-13043 at 9/4/17 10:58 AM:
---

Some updates: I added to ccm the ability to send a byteman rule when restarting a 
node (https://github.com/ostefano/ccm/tree/startup_byteman). Originally it was 
only possible to submit a rule _after_ the node had started, which made 
reproducing this race nearly impossible.

With that I could finally reproduce the bug. Here you can find my branch with a 
new dtest exercising the bug: 
https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043

Attached you can find a patch fixing the bug by removing nodes that are not RPC 
ready from leader election. Also, I fixed the corner case where the required CL 
is local and the current DC has no available nodes.

Here you can find the commit: 
https://github.com/ostefano/cassandra/commit/91cc9b4398009a3cee3004bc11a047c056fda6a6

Update: unit tests are passing.
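
To make the idea concrete, here is a minimal sketch of the approach (not the committed code; the helper and its signature are made up for illustration, and I am assuming {{StorageService.instance.isRpcReady()}} as the readiness check): filter out replicas that are not yet RPC ready before electing a counter leader, and fail fast when the filter leaves no candidate.

{code:java}
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.db.ConsistencyLevel;
import org.apache.cassandra.exceptions.UnavailableException;
import org.apache.cassandra.service.StorageService;

public final class CounterLeaderSelectionSketch
{
    /**
     * Keep only live replicas that are also RPC ready, so a node that has not
     * yet gossiped its complete cluster view is never picked as counter leader.
     */
    static List<InetAddress> eligibleLeaders(ConsistencyLevel cl, List<InetAddress> liveReplicas)
    throws UnavailableException
    {
        List<InetAddress> candidates = new ArrayList<>(liveReplicas.size());
        for (InetAddress replica : liveReplicas)
            if (StorageService.instance.isRpcReady(replica))
                candidates.add(replica);

        // Corner case handled by the patch: with a DC-local CL and no RPC-ready
        // candidate left, throw UnavailableException on the coordinator instead
        // of forwarding the counter write to a not-yet-ready leader.
        if (candidates.isEmpty())
            throw new UnavailableException(cl, liveReplicas.size(), 0);

        return candidates;
    }
}
{code}

The linked commit has the actual change; this is just the shape of it.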


was (Author: ostefano):
Some updates: I added to ccm the ability to send a byteman rule when restarting a 
node (https://github.com/ostefano/ccm/tree/startup_byteman). Originally it was 
only possible to submit a rule _after_ the node had started, which made 
reproducing this race nearly impossible.

With that I could finally reproduce the bug. Here you can find my branch with a 
new dtest exercising the bug: 
https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043

Attached you can find a patch fixing the bug by removing nodes that are not RPC 
ready from leader election. Also, I fixed the corner case where the required CL 
is local and the current DC has no available nodes.

Here you can find the commit: 
https://github.com/ostefano/cassandra/commit/91cc9b4398009a3cee3004bc11a047c056fda6a6


[jira] [Comment Edited] (CASSANDRA-13043) UnavailabeException caused by counter writes forwarded to leaders without complete cluster view

2017-09-04 Thread Stefano Ortolani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149262#comment-16149262
 ] 

Stefano Ortolani edited comment on CASSANDRA-13043 at 9/4/17 10:44 AM:
---

Some updates: I added to ccm the ability to send a byteman rule when restarting a 
node (https://github.com/ostefano/ccm/tree/startup_byteman). Originally it was 
only possible to submit a rule _after_ the node had started, which made 
reproducing this race nearly impossible.

With that I could finally reproduce the bug. Here you can find my branch with a 
new dtest exercising the bug: 
https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043

Attached you can find a patch fixing the bug by removing nodes that are not RPC 
ready from leader election. Also, I fixed the corner case where the required CL 
is local and the current DC has no available nodes.

Here you can find the commit: 
https://github.com/ostefano/cassandra/commit/91cc9b4398009a3cee3004bc11a047c056fda6a6


was (Author: ostefano):
Some updates: I added to ccm the ability to send a byteman rule when restarting a 
node (https://github.com/ostefano/ccm/tree/startup_byteman).

This allowed me to finally reproduce the bug: 
https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043

Attached is a patch that tries to fix it by removing from leader election nodes 
that are not RPC ready.
Also, I fixed the corner case where the required CL is local.


[jira] [Comment Edited] (CASSANDRA-13043) UnavailabeException caused by counter writes forwarded to leaders without complete cluster view

2017-09-03 Thread Stefano Ortolani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149262#comment-16149262
 ] 

Stefano Ortolani edited comment on CASSANDRA-13043 at 9/3/17 3:55 PM:
--

Some updates: I added to ccm the ability to send a byteman rule when restarting a 
node (https://github.com/ostefano/ccm/tree/startup_byteman).

This allowed me to finally reproduce the bug: 
https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043

Attached is a patch that tries to fix it by removing from leader election nodes 
that are not RPC ready.
Also, I fixed the corner case where the required CL is local.


was (Author: ostefano):
Some updates:

* Added to ccm the ability to send a byteman rule when restarting a node 
(https://github.com/ostefano/ccm/tree/startup_byteman). 
* I am not trying to slow down the gossip anymore; rather, I instruct the other 
two nodes to pick the restarting node as leader.

This allowed me to finally reproduce the bug: 
https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043
The way I plan to fix it is to make `assureSufficientLiveNodes` wait for the 
gossip to settle.
What do you think, [~iamaleksey]? Would that be a sound approach to fix it?


[jira] [Comment Edited] (CASSANDRA-13043) UnavailabeException caused by counter writes forwarded to leaders without complete cluster view

2017-08-31 Thread Stefano Ortolani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149262#comment-16149262
 ] 

Stefano Ortolani edited comment on CASSANDRA-13043 at 8/31/17 5:09 PM:
---

Some updates:

* Added to ccm the ability to send a byteman rule when restarting a node 
(https://github.com/ostefano/ccm/tree/startup_byteman). 
* I am not trying to slow down the gossip anymore; rather, I instruct the other 
two nodes to pick the restarting node as leader.

This allowed me to finally reproduce the bug: 
https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043
The way I plan to fix it is to make `assureSufficientLiveNodes` wait for the 
gossip to settle.
What do you think, [~iamaleksey]? Would that be a sound approach to fix it?
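
Roughly what I have in mind, as a sketch only (the wrapper and the retry budget are made up for illustration; the real `assureSufficientLiveNodes` is the one in `ConsistencyLevel` from the stack trace above):

{code:java}
import java.net.InetAddress;
import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.Uninterruptibles;

import org.apache.cassandra.db.ConsistencyLevel;
import org.apache.cassandra.db.Keyspace;
import org.apache.cassandra.exceptions.UnavailableException;

public final class GossipGraceSketch
{
    /**
     * Retry the liveness check a few times while gossip catches up, instead of
     * throwing UnavailableException on the first miss right after a restart.
     */
    static void assureSufficientLiveNodesWithGrace(ConsistencyLevel cl,
                                                   Keyspace keyspace,
                                                   Iterable<InetAddress> liveEndpoints)
    throws UnavailableException
    {
        int attemptsLeft = 3; // arbitrary grace budget, for illustration only
        while (true)
        {
            try
            {
                cl.assureSufficientLiveNodes(keyspace, liveEndpoints);
                return;
            }
            catch (UnavailableException e)
            {
                if (--attemptsLeft == 0)
                    throw e;
                Uninterruptibles.sleepUninterruptibly(1, TimeUnit.SECONDS);
            }
        }
    }
}
{code}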


was (Author: ostefano):
Some updates:

* Added to ccm the ability to send a byteman rule when restarting a node 
(https://github.com/ostefano/ccm/tree/startup_byteman). 
* I am not trying to slow down the gossip anymore; rather, I instruct the other 
two nodes to pick the restarting node as leader.

This allowed me to finally reproduce the bug: 
https://github.com/ostefano/cassandra-dtest/tree/CASSANDRA-13043
The way I plan to fix it is to make `assureSufficientLiveNodes` wait for the 
gossip to settle.
What do you think, [~iamaleksey]? Would that work?


[jira] [Comment Edited] (CASSANDRA-13043) UnavailabeException caused by counter writes forwarded to leaders without complete cluster view

2017-08-25 Thread Stefano Ortolani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141592#comment-16141592
 ] 

Stefano Ortolani edited comment on CASSANDRA-13043 at 8/25/17 1:16 PM:
---

Hi [~iamaleksey], I am having some difficulties reproducing it.

I am bootstrapping 3 nodes via cassandra-dtest, and updating the same counter 
three times, in three different phases.
During each phase I pick a node, stop it, and start it without waiting 
({{no_wait=True, wait_other_notice=False}}), all while I keep inserting. 
Unfortunately, no luck.

I have also been trying byteman, with the idea of slowing down how quickly the 
starting node becomes aware of the topology.
Unfortunately, it seems I can either start the node without waiting or submit a 
rule with byteman, but not both.
If I don't wait for the node to start, the rule submission fails, while if I 
wait, the rule doesn't trigger because the node is already up and running.

Any suggestions on how to proceed?
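
For reference, the kind of rule I have in mind looks roughly like this (the injection point {{Gossiper.realMarkAlive}} is a guess on my part and would need verifying; the delay is arbitrary):

{noformat}
RULE delay mark-alive on the restarting node
CLASS org.apache.cassandra.gms.Gossiper
METHOD realMarkAlive
AT ENTRY
IF TRUE
DO Thread.sleep(5000)
ENDRULE
{noformat}

Submitting something like this only after startup is too late, since by then the node has already marked its peers alive.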


was (Author: ostefano):
Hi [~iamaleksey], I am having some difficulties reproducing it.

I am bootstrapping 3 nodes via cassandra-dtest, and updating the same counter 
three times, in three different phases.
During each phase I pick a node, stop it, and start it without waiting 
(`no_wait=True, wait_other_notice=False`), all while I keep inserting. 
Unfortunately, no luck.

I have also been trying byteman, with the idea of slowing down how quickly the 
starting node becomes aware of the topology.
Unfortunately, it seems I can either start the node without waiting or submit a 
rule with byteman, but not both.
If I don't wait for the node to start, the rule submission fails, while if I 
wait, the rule doesn't trigger because the node is already up and running.

Any suggestions on how to proceed?
