I missed this at first, but after reading through the archives I found a thread that seems to describe *exactly* what we're experiencing:
http://www.mail-archive.com/user@cassandra.apache.org/msg46064.html

Only difference is we're getting this for reads and using CQL, but the behaviour is identical.

On Thu, 25 Feb 2016 at 14:55 Emīls Šolmanis <emils.solma...@gmail.com> wrote:

> Hello,
>
> We're having a problem with concurrent requests. It seems that whenever we
> try resolving more than ~ 15 queries at the same time, one or two get a
> read timeout and then succeed on a retry.
>
> We're running Cassandra 2.2.4 accessed via the 2.1.9 Datastax driver on
> AWS.
>
> What we've found while investigating:
>
> * this is not db-wide. Trying the same pattern against another table
>   everything works fine.
> * it fails 1 or 2 requests regardless of how many are executed in
>   parallel, i.e., it's still 1 or 2 when we ramp it up to ~ 120 concurrent
>   requests and doesn't seem to scale up.
> * the problem is consistently reproducible. It happens both under heavier
>   load and when just firing off a single batch of requests for testing.
> * tracing the faulty requests says everything is great. An example trace:
>   https://gist.github.com/emilssolmanis/41e1e2ecdfd9a0569b1a
> * the only peculiar thing in the logs is there's no acknowledgement of
>   the request being accepted by the server, as seen in
>   https://gist.github.com/emilssolmanis/242d9d02a6d8fb91da8a
> * there's nothing funny in the timed out Cassandra node's logs around
>   that time as far as I can tell, not even in the debug logs.
>
> Any ideas about what might be causing this, pointers to server config
> options, or how else we might debug this would be much appreciated.
>
> Kind regards,
> Emils
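For anyone trying to reproduce the pattern described in the quoted message (a batch of parallel reads where one or two time out and then succeed on a retry), a harness along these lines might help count how many requests in a batch needed a retry. This is only a minimal sketch and not the code used in this thread: `run_query`, `ReadTimeout`, `run_with_retry` and `fire_batch` are hypothetical names, and `run_query` stands in for whatever issues the actual CQL read through the driver.

```python
from concurrent.futures import ThreadPoolExecutor


class ReadTimeout(Exception):
    """Hypothetical stand-in for the driver's read-timeout error."""


def run_with_retry(run_query, key, max_retries=1):
    """Run one query; return (result, number of timeouts seen before success)."""
    timeouts = 0
    while True:
        try:
            return run_query(key), timeouts
        except ReadTimeout:
            timeouts += 1
            if timeouts > max_retries:
                raise  # give up once the retry budget is exhausted


def fire_batch(run_query, keys, max_retries=1):
    """Issue all queries in parallel; return (results, keys that needed a retry)."""
    if not keys:
        return {}, []
    # One worker per key so the whole batch is in flight at the same time.
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        futures = {
            key: pool.submit(run_with_retry, run_query, key, max_retries)
            for key in keys
        }
        outcomes = {key: fut.result() for key, fut in futures.items()}
    results = {key: res for key, (res, _) in outcomes.items()}
    retried = sorted(key for key, (_, n) in outcomes.items() if n > 0)
    return results, retried
```

Calling `fire_batch(run_query, keys)` with batches of roughly 15 and 120 keys, and comparing the length of the returned `retried` list, would show whether the number of timed-out requests stays at one or two regardless of batch size.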