Hey guys,

Yesterday we had an outage after we lost a node, and we saw behavior we cannot explain.

Our data schema has both counter and normal tables. We have replication factor = 2 and consistency level LOCAL_ONE (explicitly set).
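For context, a counter table differs from a normal table in that its non-key columns are all of type counter and can only be modified via increments. A minimal sketch of what such a schema looks like (table and column names here are made up for illustration, not our actual schema):

```sql
-- Hypothetical counter table, just to illustrate the kind of schema involved
CREATE TABLE page_hits (
    page_id text PRIMARY KEY,
    hits    counter
);

-- Counter columns can only be changed by increment/decrement; unlike
-- regular writes, counter updates are not idempotent and involve a
-- read-before-write on the replica side
UPDATE page_hits SET hits = hits + 1 WHERE page_id = 'home';
```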

What we saw:
After a node went down, updates to the counter tables slowed down. A lot! These updates normally take only a few milliseconds, but they started to take 30-60 seconds(!). At the same time, write ops against the non-counter tables did not show any difference. The app log was silent in terms of errors, so the queries - including the counter table updates - were not failing at all (otherwise we would see exceptions from the DAO layer, originating from the Cassandra driver). One more thing: only those updates where the lost node was involved (due to the partition key) suffered from this huge wait time. Other updates went through just fine.

The whole thing looks as if Cassandra internally started to wait - a lot - for the lost node. The updates eventually succeeded without failure, at least from the App's (the client's) perspective.

Has anyone ever experienced similar behavior?
What could be an explanation for the above?

Some more details: the App is implemented in Java 8, we are using the Datastax driver 3.7.1, and the server cluster is running Cassandra 4.0 alpha 4. The cluster size is 3 nodes.

Any feedback is appreciated! :-)

Thanks

--
Attila Wind

http://www.linkedin.com/in/attilaw
Mobile: +49 176 43556932

