If you're getting 30-second GCs, that all by itself could, and probably
does, explain the problem.
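
A quick way to confirm how often those pauses happen, assuming a fairly
standard package install (the log path and PID will differ on your setup),
is to look at the GCInspector lines Cassandra logs and to watch the heap
with jstat:

    # long GC pauses reported by Cassandra (log path may differ)
    grep GCInspector /var/log/cassandra/system.log | tail -20

    # live heap/GC utilization; replace <pid> with the Cassandra process id
    jstat -gcutil <pid> 5000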

If you're writing exclusively to A, and there are frequent partitions
between A and B, then A is potentially working a lot harder than B, because
it needs to keep track of hinted handoffs to replay to B whenever
connectivity is restored.  It's also acting as coordinator for writes which
need to end up in B eventually.  This in turn may be a significant
contributing factor to your GC pressure in A.
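
To see how much hint traffic A is actually carrying, something like this
should work on 2.1 (system.hints was replaced by file-based hints in later
versions, so treat this as a 2.1-specific sketch; the count may be slow or
capped if the table is large):

    # pending/active hint delivery work on a DC_A node
    nodetool tpstats | grep -i hint

    # rough count of stored hints
    echo "SELECT count(*) FROM system.hints;" | cqlsh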

I'd also grow suspicious of the integrity of B as a reliable backup of A
unless you're running repair on a regular basis.
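
If you aren't already, something along these lines on every node, scheduled
well inside gc_grace_seconds, would be a reasonable baseline - exact flags
depend on your version and repair strategy, so take it as a sketch:

    # primary-range repair of the keyspace from your quoted schema
    nodetool repair -pr drev_maelstrom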

Also, if you have thousands of SSTables, then you're probably falling
behind on compaction. Check nodetool compactionstats - you should typically
have < 5 outstanding tasks (preferably 0-1). If you're not behind on
compaction, your sstable_size_in_mb might be a bad value for your use case.
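
Concretely, the checks I'd run, plus - only if the table already uses
leveled compaction and compaction is keeping up - an example of adjusting
the SSTable target size (160 MB here is purely illustrative, not a
recommendation):

    # outstanding compactions - ideally 0-1 pending tasks
    nodetool compactionstats

    # SSTable count for the table from your trace
    nodetool cfstats drev_maelstrom.customer_events | grep -i "SSTable count"

    # only if you're already on LCS and not behind on compaction
    echo "ALTER TABLE drev_maelstrom.customer_events
          WITH compaction = {'class': 'LeveledCompactionStrategy',
                             'sstable_size_in_mb': 160};" | cqlsh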

On Mon, Jan 12, 2015 at 7:35 AM, Ja Sam <ptrstp...@gmail.com> wrote:

> *Environment*
>
>
>    - Cassandra 2.1.0
>    - 5 nodes in one DC (DC_A), 4 nodes in second DC (DC_B)
>    - 2,500 writes per second; I write only to DC_A with local_quorum
>    - minimal reads (usually none, sometimes a few)
>
> *Problem*
>
> After a few weeks of running I cannot read any data from my cluster,
> because I get a ReadTimeoutException like the following:
>
> ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 - 
> Error occurred during processing of message.
> com.google.common.util.concurrent.UncheckedExecutionException: 
> java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
> received only 2 responses.
>
> To be precise, this is not the only problem in my cluster. The second one
> was described here: Cassandra GC takes 30 seconds and hangs node
> <http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node>,
> and I will try the fix from CASSANDRA-6541
> <http://issues.apache.org/jira/browse/CASSANDRA-6541>, as leshkin suggested.
>
> *Diagnosis*
>
> I tried some of the tools presented at
> http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
> by Jon Haddad and got some strange results.
>
>
> I ran the same query in DC_A and DC_B with tracing enabled. The query is
> simple:
>
>    SELECT * FROM X.customer_events WHERE customer='1234567' AND
> utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10);
>
> Where the table is defined as follows:
>
>   CREATE TABLE drev_maelstrom.customer_events (
>     customer text, utc_day int, bucket int, event_time bigint,
>     event_id blob, event_type int, event blob,
>     PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id, event_type)
>   ) [...]
>
> Results of the query:
>
> 1) In DC_B the query finished in less than 0.22 seconds. In DC_A it took
> more than 2.5 seconds (~10 times longer). -> the problem is that bucket can
> be in a range from -128 to 256
>
> 2) In DC_B it checked ~1000 SSTables with lines like:
>
>    Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] |
> 2015-01-12 13:51:49.467001 | 192.168.71.198 |           4782
>
> Whereas in DC_A it is:
>
>    Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] |
> 2015-01-12 14:01:39.520001 | 192.168.61.199 |          25527
>
> 3) The total number of records in both DCs was the same.
>
>
> *Question*
>
> The question is quite simple: how can I speed up DC_A? It is my primary
> DC, DC_B is mostly for backup, and there are a lot of network partitions
> between A and B.
>
> Maybe I should check something more, but I just don't have an idea what
> that would be.
>
>
>
