If you're getting 30-second GCs, that by itself could, and probably does, explain the problem.
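
A quick way to confirm how bad the pauses are before tuning anything is to look at what GCInspector is logging and whether messages are being dropped. This is only a sketch and assumes a packaged install logging to /var/log/cassandra/system.log - adjust the path for your setup:

    # long stop-the-world pauses are reported by GCInspector
    grep GCInspector /var/log/cassandra/system.log | tail -20

    # dropped mutations/reads during those pauses show up in the thread pool stats
    nodetool tpstats
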
If you're writing exclusively to A, and there are frequent partitions between A and B, then A is potentially working a lot harder than B: it has to keep track of hinted handoffs to replay to B whenever connectivity is restored, and it's also acting as coordinator for writes which need to end up in B eventually. This in turn may be a significant contributing factor to your GC pressure in A.

I'd also grow suspicious of the integrity of B as a reliable backup of A unless you're running repair on a regular basis.

Also, if you have thousands of SSTables, then you're probably falling behind on compaction; check nodetool compactionstats - you should typically have < 5 outstanding tasks (preferably 0-1). If you're not behind on compaction, your sstable_size_in_mb might be a bad value for your use case.

On Mon, Jan 12, 2015 at 7:35 AM, Ja Sam <ptrstp...@gmail.com> wrote:

> *Environment*
>
> - Cassandra 2.1.0
> - 5 nodes in one DC (DC_A), 4 nodes in a second DC (DC_B)
> - 2500 writes per second; I write only to DC_A with LOCAL_QUORUM
> - minimal reads (usually none, sometimes a few)
>
> *Problem*
>
> After a few weeks of running I cannot read any data from my cluster,
> because I get ReadTimeoutExceptions like the following:
>
> ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 -
> Error occurred during processing of message.
> com.google.common.util.concurrent.UncheckedExecutionException:
> java.lang.RuntimeException:
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
> received only 2 responses.
>
> To be precise, this is not the only problem in my cluster. The second one was
> described here: Cassandra GC takes 30 seconds and hangs node
> <http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node>
> and I will try the fix from CASSANDRA-6541
> <http://issues.apache.org/jira/browse/CASSANDRA-6541> as leshkin suggested.
>
> *Diagnosis*
>
> I tried some of the tools presented in
> http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
> by Jon Haddad and got some strange results.
>
> I ran the same query in DC_A and DC_B with tracing enabled. The query is
> simple:
>
> SELECT * FROM X.customer_events WHERE customer='1234567' AND
> utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10);
>
> where the table is defined as follows:
>
> CREATE TABLE drev_maelstrom.customer_events (customer text, utc_day int,
> bucket int, event_time bigint, event_id blob, event_type int, event blob,
> PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id,
> event_type) [...]
>
> Results of the query:
>
> 1) In DC_B the query finished in less than 0.22 seconds. In DC_A it took more
> than 2.5 seconds (~10 times longer). The complication is that bucket can range
> from -128 to 256.
>
> 2) In DC_B it checked ~1000 SSTables, with lines like:
>
> Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] |
> 2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782
>
> whereas in DC_A it is:
>
> Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] |
> 2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527
>
> 3) The total number of records in both DCs was the same.
>
> *Question*
>
> The question is quite simple: how can I speed up DC_A? It is my primary
> DC; DC_B is mostly for backup, and there are a lot of network partitions
> between A and B.
>
> Maybe I should check something more, but I just don't have an idea what it
> should be.
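
P.S. For reference, the concrete checks I'd start with on one of the DC_A nodes - a sketch only, assuming nodetool is on the PATH and using the table name from your schema above:

    # outstanding compactions - ideally 0-1, certainly < 5
    nodetool compactionstats

    # SSTable count, partition sizes and read latencies for the table in question
    nodetool cfstats drev_maelstrom.customer_events

    # anything streaming or backed up toward DC_B
    nodetool netstats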