Tom van der Woerdt created CASSANDRA-13810:
----------------------------------------------

             Summary: Overload because of hint pressure + MVs
                 Key: CASSANDRA-13810
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13810
             Project: Cassandra
          Issue Type: Bug
            Reporter: Tom van der Woerdt


Cluster setup: 3 DCs, 20 Cassandra nodes each, all 3.0.14, with approx. 200GB 
data per machine. Many tables have MVs associated.

During some maintenance we did a rolling restart of all nodes in the cluster. 
This caused a buildup of hints/batches, as expected. Most nodes came back just 
fine, except for two nodes.

These two nodes came back with a loadavg of >100, and 'nodetool tpstats' showed 
a million (not exaggerating) MutationStage tasks per second(!). It was clear 
that these were mostly (all?) mutations coming from hints, as indicated by 
thousands of log entries per second in debug.log :

{noformat}
DEBUG [SharedPool-Worker-107] 2017-08-27 13:16:51,098 HintVerbHandler.java:95 - 
Failed to apply hint
java.util.concurrent.CompletionException: 
org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - 
received only 0 responses.
    at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
 ~[na:1.8.0_144]
    at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
 ~[na:1.8.0_144]
    at 
java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) 
~[na:1.8.0_144]
    at 
java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
 ~[na:1.8.0_144]
    at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) 
~[na:1.8.0_144]
    at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
 ~[na:1.8.0_144]
    at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:481) 
~[apache-cassandra-3.0.14.jar:3.0.14]
    at 
org.apache.cassandra.db.Keyspace.lambda$applyInternal$0(Keyspace.java:495) 
~[apache-cassandra-3.0.14.jar:3.0.14]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_144]
    at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
 ~[apache-cassandra-3.0.14.jar:3.0.14]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
~[apache-cassandra-3.0.14.jar:3.0.14]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144]
Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation 
timed out - received only 0 responses.
    ... 6 common frames omitted
{noformat}

After reading the relevant code, it seems that a hint is considered droppable, 
and in the mutation path when the table contains a MV and the lock fails to 
acquire and the mutation is droppable, it throws a WTE without waiting until 
the timeout expires. This explains why Cassandra is able to process a million 
mutations per second without actually considering them 'dropped' in the 
'nodetool tpstats' output.

I managed to recover the two nodes by stopping handoffs on all nodes in the 
cluster and reenabling them one at a time. It's likely that the hint/batchlog 
settings were sub-optimal on this cluster, but I think that the retry 
behavior(?) of hints should be improved as it's hard to express hint throughput 
in kb/s when the mutations can involve MVs.

More data available upon request -- I'm not sure which bits are relevant and 
which aren't.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to