I am using nodetool compactionstats to check for pending compactions, and it shows 0 pending on all nodes seconds before I run nodetool repair. I am also monitoring PendingCompactions over JMX.
Is there any other way to find out whether an anticompaction is running on any node?

Thanks a lot,
Robert

Robert Sicoie

On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <[email protected]> wrote:

> Robert,
>
> you need to make sure you have no repair session currently running on your
> cluster, and no anticompaction.
> I'd recommend doing a rolling restart in order to stop all running repairs
> for sure, then start the process again, node by node, checking that no
> anticompaction is running before moving from one node to the other.
>
> Please do not use the -pr switch as it is both useless (token ranges are
> repaired only once with inc repair, whatever the replication factor) and
> harmful as all anticompactions won't be executed (you'll still have
> sstables marked as unrepaired even if the process has run entirely with no
> error).
>
> Let us know how that goes.
>
> Cheers,
>
> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie <[email protected]>
> wrote:
>
>> Thanks Alexander,
>>
>> Now I started to run the repair with the -pr arg and with keyspace and
>> table args.
>> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>> RepairRunnable.java:246 - Repair session 89af4d10-856f-11e6-b28f-df99132d7979
>> for range [(8323429577695061526,8326640819362122791],
>> ..., (4212695343340915405,4229348077081465596]]] Validation failed in
>> /10.45.113.88"
>>
>> for one of the tables. 10.45.113.88 is the IP of the machine I am running
>> nodetool on.
>> I'm wondering if this is normal...
>>
>> Thanks,
>> Robert
>>
>> Robert Sicoie
>>
>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> nodetool scrub won't help here, as what you're experiencing is most
>>> likely that one SSTable is going through anticompaction, and then another
>>> node is asking for a Merkle tree that involves it.
>>> For understandable reasons, an SSTable cannot be anticompacted and
>>> validation compacted at the same time.
>>>
>>> The solution here is to adjust the repair pressure on your cluster so
>>> that anticompaction can end before you run repair on another node.
>>> You may have a lot of anticompaction to do if you had high volumes of
>>> unrepaired data, which can take a long time depending on several factors.
>>>
>>> You can tune your repair process to make sure no anticompaction is
>>> running before launching a new session on another node or you can try my
>>> Reaper fork that handles incremental repair:
>>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>>> I may have to add a few checks in order to avoid all collisions between
>>> anticompactions and new sessions, but it should be helpful if you struggle
>>> with incremental repair.
>>>
>>> In any case, check if your nodes are still anticompacting before trying
>>> to run a new repair session on a node.
>>>
>>> Cheers,
>>>
>>>
>>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <[email protected]>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I have a cluster of 5 nodes, cassandra 3.0.5.
>>>> I was running nodetool repair over the last few days, one node at a
>>>> time, when I first encountered this exception:
>>>>
>>>> ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:11,1,main]
>>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>>     at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>>
>>>> On some of the other boxes I see this:
>>>>
>>>> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv, [(-7505573573695693981,-7495786486761919991],
>>>> ....
>>>> (-8483612809930827919,-8480482504800860871]]] Validation failed in /10.45.113.67
>>>>     at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>> ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]
>>>> java.lang.AssertionError: java.lang.InterruptedException
>>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>>>>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
>>>> Caused by: java.lang.InterruptedException: null
>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_60]
>>>>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_60]
>>>>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339) ~[na:1.8.0_60]
>>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>>     ... 6 common frames omitted
>>>>
>>>>
>>>> Now if I run nodetool repair I get the
>>>>
>>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>>
>>>> exception.
>>>> What do you suggest? Would nodetool scrub or sstablescrub help in this
>>>> case, or would it just make things worse?
>>>>
>>>> Thanks,
>>>>
>>>> Robert
>>>>
>>> --
>>> -----------------
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
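
For reference, the advice quoted above (make sure no anticompaction is running on a node before starting repair on the next one) could be scripted along these lines. This is only a rough sketch, not a tested tool: it looks at the task rows listed by nodetool compactionstats rather than the pending count, and it assumes that repair-related tasks show up in the "compaction type" column as "Anticompaction after repair" and "Validation". Those labels, the sample output format, and the helper names repair_tasks_in and node_is_quiet are assumptions made for illustration; please check them against what your Cassandra version actually prints.

```python
import subprocess

# Assumption: labels printed in the "compaction type" column for
# repair-related tasks. Verify against your own nodetool output.
REPAIR_TASK_LABELS = ("Anticompaction after repair", "Validation")


def repair_tasks_in(compactionstats_output):
    """Return the compactionstats rows that mention a repair-related task."""
    return [
        line.strip()
        for line in compactionstats_output.splitlines()
        if any(label.lower() in line.lower() for label in REPAIR_TASK_LABELS)
    ]


def node_is_quiet(host):
    """Run `nodetool -h <host> compactionstats` and report whether no
    anticompaction or validation row is listed for that node."""
    result = subprocess.run(
        ["nodetool", "-h", host, "compactionstats"],
        capture_output=True,
        text=True,
        check=True,  # raise if nodetool itself fails
    )
    return not repair_tasks_in(result.stdout)
```

A wrapper could call node_is_quiet for every node and only launch nodetool repair on the next one once all of them return True. Note that the match is a plain substring test, so a keyspace or table whose name contains "validation" would be reported as well.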
