To address your remarks: 1) About the 30-second GCs. I know that over time my cluster developed this problem; we added the "magic" flag, but the results will only be visible in ~2 weeks (as I showed in the screenshot on StackOverflow). If you have any idea how to fix or further diagnose this problem, I will be very grateful.
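For reference, this is roughly how I am watching the GC behaviour in the meantime (the log path and the PID placeholder are only examples for a default install):

    # long stop-the-world pauses reported by Cassandra's own GCInspector
    grep GCInspector /var/log/cassandra/system.log | tail -n 20

    # heap occupancy and GC activity of the Cassandra JVM, sampled every second
    jstat -gcutil <cassandra-pid> 1000 10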
2) That is probably true, but I don't think I can change it. Our data centers are in different locations and the network between them is not perfect, but as far as we have observed, network partitions happen rarely - at most once a week, for about an hour. 3) We are trying to do regular (incremental) repairs, but they usually do not finish. Even local repairs have problems finishing. 4) I will check it as soon as possible and post the results here. If you have any suggestions about what else I should check, you are welcome :) (The exact commands I plan to run are listed at the end of this mail.)

On Mon, Jan 12, 2015 at 7:28 PM, Eric Stevens <migh...@gmail.com> wrote:

> If you're getting 30 second GCs, this all by itself could and probably
> does explain the problem.
>
> If you're writing exclusively to A, and there are frequent partitions
> between A and B, then A is potentially working a lot harder than B, because
> it needs to keep track of hinted handoffs to replay to B whenever
> connectivity is restored. It's also acting as coordinator for writes which
> need to end up in B eventually. This in turn may be a significant
> contributing factor to your GC pressure in A.
>
> I'd also grow suspicious of the integrity of B as a reliable backup of A
> unless you're running repair on a regular basis.
>
> Also, if you have thousands of SSTables, then you're probably falling
> behind on compaction; check nodetool compactionstats - you should typically
> have < 5 outstanding tasks (preferably 0-1). If you're not behind on
> compaction, your sstable_size_in_mb might be a bad value for your use case.
>
> On Mon, Jan 12, 2015 at 7:35 AM, Ja Sam <ptrstp...@gmail.com> wrote:
>
>> *Environment*
>>
>> - Cassandra 2.1.0
>> - 5 nodes in one DC (DC_A), 4 nodes in a second DC (DC_B)
>> - 2500 writes per second; I write only to DC_A with local_quorum
>> - minimal reads (usually none, sometimes a few)
>>
>> *Problem*
>>
>> After a few weeks of running I cannot read any data from my cluster,
>> because I get a ReadTimeoutException like the following:
>>
>> ERROR [Thrift:15] 2015-01-07 14:16:21,124 CustomTThreadPoolServer.java:219 -
>> Error occurred during processing of message.
>> com.google.common.util.concurrent.UncheckedExecutionException:
>> java.lang.RuntimeException:
>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
>> received only 2 responses.
>>
>> To be precise, this is not the only problem in my cluster. The second one
>> is described here: Cassandra GC takes 30 seconds and hangs node
>> <http://stackoverflow.com/questions/27843538/cassandra-gc-takes-30-seconds-and-hangs-node>
>> and I will try the fix from CASSANDRA-6541
>> <http://issues.apache.org/jira/browse/CASSANDRA-6541> as leshkin suggested.
>>
>> *Diagnosis*
>>
>> I tried some of the tools presented in
>> http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
>> by Jon Haddad and got some strange results.
>>
>> I ran the same query in DC_A and DC_B with tracing enabled. The query is
>> simple:
>>
>> SELECT * FROM X.customer_events WHERE customer='1234567' AND
>> utc_day=16447 AND bucket IN (1,2,3,4,5,6,7,8,9,10);
>>
>> where the table is defined as follows:
>>
>> CREATE TABLE drev_maelstrom.customer_events (customer text, utc_day
>> int, bucket int, event_time bigint, event_id blob, event_type int, event blob,
>> PRIMARY KEY ((customer, utc_day, bucket), event_time, event_id,
>> event_type) [...]
>>
>> Results of the query:
>>
>> 1) In DC_B the query finished in less than 0.22 seconds. In DC_A it took
>> more than 2.5 seconds (~10 times longer)
>> -> the problem is that bucket can be in the range from -128 to 256
>>
>> 2) In DC_B it checked ~1000 SSTables, with lines like:
>>
>> Bloom filter allows skipping sstable 50372 [SharedPool-Worker-7] |
>> 2015-01-12 13:51:49.467001 | 192.168.71.198 | 4782
>>
>> whereas in DC_A it is:
>>
>> Bloom filter allows skipping sstable 118886 [SharedPool-Worker-5] |
>> 2015-01-12 14:01:39.520001 | 192.168.61.199 | 25527
>>
>> 3) The total number of records in both DCs was the same.
>>
>> *Question*
>>
>> The question is quite simple: how can I speed up DC_A? It is my primary
>> DC, DC_B is mostly for backup, and there are a lot of network partitions
>> between A and B.
>>
>> Maybe I should check something more, but I just don't have an idea what
>> it should be.
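PS. As mentioned in point 4 above, these are the checks I plan to run and post results from. It is only a rough list - the flags may need adjusting for 2.1.0, and the keyspace/table names are taken from the schema above:

    # compaction backlog (Eric's suggestion) - typically should be < 5 pending tasks
    nodetool compactionstats

    # SSTable count and read latencies for the table used in the traced query
    nodetool cfstats drev_maelstrom.customer_events

    # whether DC_A is accumulating hints / streams destined for DC_B
    nodetool tpstats
    nodetool netstats

    # another attempt at a DC-local incremental repair, one node at a time
    nodetool repair -local -inc drev_maelstrom customer_events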