Apologies I sent that prior email too soon. I see exceptions in the log during nodetool repair like these: ERROR [AntiEntropySessions:1] 2014-01-06 21:46:45,339 RepairSession.java (line 278) [repair #179e84f0-775f-11e3-9c6a-538c9374b226] session completed with the following error org.apache.cassandra.exceptions.RepairException: [repair #179e84f0-775f-11e3-9c6a-538c9374b226 on test_video/videosearchresult, (9124197757189535229,9176696932386753491]] Validation failed in /10.20.2.11 at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:152) at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:188) at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:59) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744)
and on the server mentioned above: ERROR [ValidationExecutor:1] 2014-01-06 21:08:35,261 CassandraDaemon.java (line 187) Exception in thread Thread[ValidationExecutor:1,1,main] java.lang.AssertionError at org.apache.cassandra.db.compaction.PrecompactedRow.update(PrecompactedRow.java:171) at org.apache.cassandra.repair.Validator.rowHash(Validator.java:198) at org.apache.cassandra.repair.Validator.add(Validator.java:151) at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:799) at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:62) at org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:397) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) I searched on these exceptions and it appears someone else was having a similar problem where upgrading to 2.0.3 fixed it (we are currently 2.0.2) http://www.mail-archive.com/user@cassandra.apache.org/msg33647.html We will be upgrading today. After a restart, memory usage was normal, GC shows primarily only the new space collections, and I see none of the CMS collections either. My read query delays are gone now as well. Not sure exactly what happened but I will continue to monitor memory usage and GC stats. Thanks again everyone. Thunder On Tue, Jan 7, 2014 at 11:11 AM, Thunder Stumpges < thunder.stump...@gmail.com> wrote: > Thanks Lee, > I put these in place on one of the three servers, and while doing so I > noticed in the logs on the other servers that there were many errors from > our weekly nodetool repair of this form: > > > > On Thu, Jan 2, 2014 at 9:34 PM, Lee Mighdoll <l...@underneath.ca> wrote: > >> Well from what I see in system.log it does not appear that GC aligns with >>> this delay. >>> >> Though it does seem like quite a few GCs take place. Here is my >>> system.log around the time of the delay: >>> >> >> It does sound like a lot of CMS runs - you'd like most of your garbage to >> be collected in new space (ParallelNew) not in old space (CMS). It also >> looks like your heap is staying fairly full - either CMS isn't recovering >> much (most likely) or you're generating a lot of garbage. I'm not familiar >> with the GCInspector logs, though, so I may be guessing too far. >> >> You can get gold standard gc logs by passing these parameters to the java >> command when you launch cassandra: >> >> -Xloggc:/path/to/where/to/put/the/gc.log >> -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps >> -XX:+PrintHeapAtGC >> -XX:+PrintTenuringDistribution >> -XX:+PrintGCApplicationStoppedTime >> -XX:+PrintPromotionFailure >> >> Once you have good logs, more people (and tools) can help interpret them. >> >> >> Cheers, >> Lee >> >> >>> >>> INFO [ScheduledTasks:1] 2014-01-02 12:30:22,164 GCInspector.java (line >>> 116) GC for ConcurrentMarkSweep: 479 ms for 1 collections, 7417499584 used; >>> max is 8375238656 >>> INFO [ScheduledTasks:1] 2014-01-02 12:30:37,579 GCInspector.java (line >>> 116) GC for ConcurrentMarkSweep: 557 ms for 1 collections, 7663015624 used; >>> max is 8375238656 >>> INFO [ScheduledTasks:1] 2014-01-02 12:31:50,416 GCInspector.java (line >>> 116) GC for ConcurrentMarkSweep: 633 ms for 1 collections, 6053238384used; >>> max is 8375238656 >>> INFO [ScheduledTasks:1] 2014-01-02 12:32:06,525 GCInspector.java (line >>> 116) GC for ConcurrentMarkSweep: 219 ms for 1 collections, 6430575008 used; >>> max is 8375238656 >>> INFO [ScheduledTasks:1] 2014-01-02 12:32:22,548 GCInspector.java (line >>> 116) GC for ConcurrentMarkSweep: 296 ms for 1 collections, 6732761584 used; >>> max is 8375238656 >>> INFO [ScheduledTasks:1] 2014-01-02 12:32:37,870 GCInspector.java (line >>> 116) GC for ConcurrentMarkSweep: 343 ms for 1 collections, 7020263776 used; >>> max is 8375238656 >>> INFO [ScheduledTasks:1] 2014-01-02 12:32:53,116 GCInspector.java (line >>> 116) GC for ConcurrentMarkSweep: 443 ms for 1 collections, 7276089816used; >>> max is 8375238656 >>> INFO [ScheduledTasks:1] 2014-01-02 12:33:08,557 GCInspector.java (line >>> 116) GC for ConcurrentMarkSweep: 554 ms for 1 collections, 7662839184 used; >>> max is 8375238656 >>> >>> The earliest start time on my three delayed requests is 12:32:00.083 and >>> they all clear out at 12:32:00.517. GC log events occur at 12:31:50 and at >>> 21:32:06. The above entries are the only log entries in system.log at this >>> time. Any other ideas? >>> >>> Is this type of GC frequency and delay "normal" ? >>> >>> thanks, >>> Thunder >>> >>> >>> >>> >>> On Thu, Jan 2, 2014 at 2:27 PM, Robert Coli <rc...@eventbrite.com>wrote: >>> >>>> On Thu, Jan 2, 2014 at 2:24 PM, Thunder Stumpges < >>>> thunder.stump...@gmail.com> wrote: >>>> >>>>> Excuse my ignorance, but where would I look for the GC info? What logs >>>>> contain this? I will start looking for log files and more clues in them. >>>>> >>>> >>>> system.log contains some basic info, you can enable extended gc info >>>> via options to the JVM. >>>> >>>> =Rob >>>> >>>> >>> >>> >> >