My feeling is that some of the repair jobs somehow remained pending, and now when I try to run repair on those SSTables I get the "Cannot start multiple repair sessions over the same sstables" exception.
I checked with nodetool compactionstats for pending tasks before running nodetool repair, and I still get this error... Is there a way to continue that hanging repair, if it remained in progress?

Thanks,
Robert

Robert Sicoie

On Wed, Sep 28, 2016 at 3:56 PM, Robert Sicoie <[email protected]> wrote:

> Thanks Alexander,
>
> Now I started to run the repair with -pr arg and with keyspace and table
> args.
> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
> RepairRunnable.java:246 - Repair session 89af4d10-856f-11e6-b28f-df99132d7979
> for range [(8323429577695061526,8326640819362122791],
> ..., (4212695343340915405,4229348077081465596]]] Validation failed in
> /10.45.113.88"
>
> for one of the tables. 10.45.113.88 is the ip of the machine I am running
> the nodetool on.
> I'm wondering if this is normal...
>
> Thanks,
> Robert
>
> Robert Sicoie
>
> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
> [email protected]> wrote:
>
>> Hi,
>>
>> nodetool scrub won't help here, as what you're experiencing is most
>> likely that one SSTable is going through anticompaction, and then another
>> node is asking for a Merkle tree that involves it.
>> For understandable reasons, an SSTable cannot be anticompacted and
>> validation compacted at the same time.
>>
>> The solution here is to adjust the repair pressure on your cluster so
>> that anticompaction can end before you run repair on another node.
>> You may have a lot of anticompaction to do if you had high volumes of
>> unrepaired data, which can take a long time depending on several factors.
>>
>> You can tune your repair process to make sure no anticompaction is
>> running before launching a new session on another node, or you can try my
>> Reaper fork that handles incremental repair:
>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>> I may have to add a few checks in order to avoid all collisions between
>> anticompactions and new sessions, but it should be helpful if you struggle
>> with incremental repair.
>>
>> In any case, check if your nodes are still anticompacting before trying
>> to run a new repair session on a node.
>>
>> Cheers,
>>
>>
>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <[email protected]>
>> wrote:
>>
>>> Hi guys,
>>>
>>> I have a cluster of 5 nodes, cassandra 3.0.5.
>>> I was running nodetool repair over the last few days, one node at a
>>> time, when I first encountered this exception:
>>>
>>> ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:11,1,main]
>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>     at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>
>>> On some of the other boxes I see this:
>>>
>>> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv, [(-7505573573695693981,-7495786486761919991],
>>> ....
>>> (-8483612809930827919,-8480482504800860871]]] Validation failed in /10.45.113.67
>>>     at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>> ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]
>>> java.lang.AssertionError: java.lang.InterruptedException
>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>>>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
>>> Caused by: java.lang.InterruptedException: null
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_60]
>>>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_60]
>>>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339) ~[na:1.8.0_60]
>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     ... 6 common frames omitted
>>>
>>>
>>> Now if I run nodetool repair I get the
>>>
>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>
>>> exception.
>>> What do you suggest? Would nodetool scrub or sstablescrub help in this
>>> case, or would it just make things worse?
>>>
>>> Thanks,
>>>
>>> Robert
>>>
>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
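
The advice quoted in this thread, checking that a node is no longer anticompacting before starting repair on the next one, could be scripted roughly like this. It is only a sketch under assumptions: the function names, the polling interval, and the idea that `nodetool compactionstats` lists running anticompaction and validation tasks with those words in the compaction type column are mine, not something confirmed in the thread. Check the output on your own 3.0.5 nodes before relying on it.

```python
import subprocess
import time

# Assumption: words that show up in the "compaction type" column of
# `nodetool compactionstats` while repair-related work is running.
REPAIR_RELATED = ("anticompaction", "validation")


def repair_work_in_progress(compactionstats_output: str) -> bool:
    """Return True if any output line mentions anticompaction or validation."""
    for line in compactionstats_output.splitlines():
        lowered = line.lower()
        if any(marker in lowered for marker in REPAIR_RELATED):
            return True
    return False


def wait_until_idle(host: str, poll_seconds: int = 60) -> None:
    """Poll `nodetool -h <host> compactionstats` until no repair work is listed.

    Hypothetical helper: raises CalledProcessError if nodetool itself fails.
    """
    while True:
        result = subprocess.run(
            ["nodetool", "-h", host, "compactionstats"],
            capture_output=True,
            text=True,
            check=True,
        )
        if not repair_work_in_progress(result.stdout):
            return
        time.sleep(poll_seconds)
```

Calling `wait_until_idle` for every node in the cluster before launching `nodetool repair -pr` on the next node would cover the case described above, where another node asks for a Merkle tree over an SSTable that is still being anticompacted.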
