Anticompactions will show up in nodetool compactionstats: https://issues.apache.org/jira/browse/CASSANDRA-9098
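As a rough illustration of that check, here is a minimal Python sketch that scans the text printed by nodetool compactionstats for anticompaction rows before a repair is started on the next node. It assumes that a running anticompaction appears as a row whose compaction type contains the word "Anticompaction"; the function names and the sample row in the usage note are hypothetical, so compare them with what your own nodes actually print.

```python
import subprocess


def has_anticompaction(compactionstats_output: str) -> bool:
    """Return True if any line of the given text mentions an anticompaction.

    Assumption: the compaction type column contains the word
    "Anticompaction" while an anticompaction is running.
    """
    return any(
        "anticompaction" in line.lower()
        for line in compactionstats_output.splitlines()
    )


def node_is_anticompacting(host: str) -> bool:
    """Run `nodetool -h <host> compactionstats` and scan what it prints.

    Hypothetical helper: assumes nodetool is on the PATH and that the
    host is reachable over JMX without extra credentials.
    """
    result = subprocess.run(
        ["nodetool", "-h", host, "compactionstats"],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_anticompaction(result.stdout)
```

For example, has_anticompaction("pending tasks: 0") is False, while a made-up row such as "abc Anticompaction after repair notes operator_source_mv" makes it return True. A repair wrapper could poll node_is_anticompacting on every node and start the next session only once they all report False.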
Did you check nodetool tpstats to make sure no repair session is still running?
To be sure (and if you can actually do it), do a rolling restart of the cluster and try again. Repair sessions can get sticky sometimes.

On Wed, Sep 28, 2016 at 4:23 PM Robert Sicoie wrote:

> I am using nodetool compactionstats to check for pending compactions and
> it shows me 0 pending on all nodes, seconds before running nodetool repair.
> I am also monitoring PendingCompactions on jmx.
>
> Is there any other way I can find out whether an anticompaction is running
> on any node?
>
> Thanks a lot,
> Robert
>
> Robert Sicoie
>
> On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski wrote:
>
>> Robert,
>>
>> you need to make sure you have no repair session currently running on
>> your cluster, and no anticompaction.
>> I'd recommend doing a rolling restart in order to stop all running repair
>> for sure, then start the process again, node by node, checking that no
>> anticompaction is running before moving from one node to the other.
>>
>> Please do not use the -pr switch as it is both useless (token ranges are
>> repaired only once with inc repair, whatever the replication factor) and
>> harmful as all anticompactions won't be executed (you'll still have
>> sstables marked as unrepaired even if the process has run entirely with no
>> error).
>>
>> Let us know how that goes.
>>
>> Cheers,
>>
>> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie wrote:
>>
>>> Thanks Alexander,
>>>
>>> Now I started to run the repair with -pr arg and with keyspace and table
>>> args.
>>> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>>> RepairRunnable.java:246 - Repair session
>>> 89af4d10-856f-11e6-b28f-df99132d7979 for range
>>> [(8323429577695061526,8326640819362122791],
>>> ..., (4212695343340915405,4229348077081465596]]] Validation failed in
>>> /10.45.113.88"
>>>
>>> for one of the tables.
>>> 10.45.113.88 is the ip of the machine I am
>>> running the nodetool on.
>>> I'm wondering if this is normal...
>>>
>>> Thanks,
>>> Robert
>>>
>>> Robert Sicoie
>>>
>>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski wrote:
>>>
>>>> Hi,
>>>>
>>>> nodetool scrub won't help here, as what you're experiencing is most
>>>> likely that one SSTable is going through anticompaction, and then another
>>>> node is asking for a Merkle tree that involves it.
>>>> For understandable reasons, an SSTable cannot be anticompacted and
>>>> validation compacted at the same time.
>>>>
>>>> The solution here is to adjust the repair pressure on your cluster so
>>>> that anticompaction can end before you run repair on another node.
>>>> You may have a lot of anticompaction to do if you had high volumes of
>>>> unrepaired data, which can take a long time depending on several factors.
>>>>
>>>> You can tune your repair process to make sure no anticompaction is
>>>> running before launching a new session on another node or you can try my
>>>> Reaper fork that handles incremental repair :
>>>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>>>> I may have to add a few checks in order to avoid all collisions between
>>>> anticompactions and new sessions, but it should be helpful if you struggle
>>>> with incremental repair.
>>>>
>>>> In any case, check if your nodes are still anticompacting before trying
>>>> to run a new repair session on a node.
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I have a cluster of 5 nodes, cassandra 3.0.5.
>>>>> I was running nodetool repair last days, one node at a time, when I
>>>>> first encountered this exception
>>>>>
>>>>> ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:11,1,main]
>>>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>>>     at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>>>
>>>>> On some of the other boxes I see this:
>>>>>
>>>>> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv, [(-7505573573695693981,-7495786486761919991],
>>>>> ....
>>>>> (-8483612809930827919,-8480482504800860871]]] Validation failed in /10.45.113.67
>>>>>     at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>>> ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]
>>>>> java.lang.AssertionError: java.lang.InterruptedException
>>>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>>>>>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
>>>>> Caused by: java.lang.InterruptedException: null
>>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_60]
>>>>>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_60]
>>>>>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339) ~[na:1.8.0_60]
>>>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>>     ... 6 common frames omitted
>>>>>
>>>>>
>>>>> Now if I run nodetool repair I get the
>>>>>
>>>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>>>
>>>>> exception.
>>>>> What do you suggest? Would nodetool scrub or sstablescrub help in this
>>>>> case, or would it just make it worse?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Robert
>>>>>
>>>> --
>>>> -----------------
>>>> Alexander Dejanovski
>>>> France
>>>> @alexanderdeja
>>>>
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>
>>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
> --
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
