Hey guys,

Yesterday we had an outage after we lost a node, and we saw behavior we cannot explain.

Our data schema has both counter and normal tables. We have replication factor = 2 and consistency level LOCAL_ONE (explicitly set).
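For context, a counter table differs from a normal table in that its non-key columns are all of type counter and can only be modified via increments. A minimal sketch of what such a schema looks like (table and column names here are made up for illustration, not our actual schema):

```sql
-- Hypothetical counter table, just to illustrate the kind of schema involved
CREATE TABLE page_hits (
    page_id text PRIMARY KEY,
    hits    counter
);

-- Counter columns can only be changed by increment/decrement; unlike
-- regular writes, counter updates are not idempotent and involve a
-- read-before-write on the replica side
UPDATE page_hits SET hits = hits + 1 WHERE page_id = 'home';
```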

What we saw:
After a node went down, updates to the counter tables slowed down. A lot! These updates normally take only a few milliseconds, but they started to take 30-60 seconds(!). At the same time, write ops against the non-counter tables did not show any difference. The app log was silent in terms of errors, so the queries - including the counter table updates - were not failing at all (otherwise we would see exceptions from the DAO layer, originating from the Cassandra driver). One more thing: only those updates where the lost node was involved (due to the partition key) suffered from this huge wait time. Other updates went through just fine.

The whole thing looks as if Cassandra internally started to wait - a lot - for the lost node. The updates eventually succeeded without failure, at least from the App's (the client's) perspective.

Has anyone ever experienced similar behavior?
What could be an explanation for the above?

Some more details: the App is implemented in Java 8, we are using the Datastax driver 3.7.1, and the server cluster is running Cassandra 4.0 alpha 4. The cluster size is 3 nodes.

Any feedback is appreciated! :-)

Thanks

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

