Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

Robert Sicoie Wed, 28 Sep 2016 06:43:16 -0700

My feeling is that some of the repair jobs somehow remained pending, and
now when I try to run repair on those sstables I get the "Cannot start
multiple repair sessions over the same sstables" exception.


I checked with nodetool compactionstats for pending tasks before running
nodetool repair, and I still get this error... Is there a way to continue
that hanging repair, if it remained in progress?
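
For reference, a minimal sketch of that pre-repair check in Python. It looks
at the active-task lines printed by nodetool compactionstats rather than only
the pending count, since a running anticompaction or validation is what
collides with a new session. The labels "Anticompaction after repair" and
"Validation" are assumed to be what the "compaction type" column prints on
this version, and the helper names are made up for illustration. The match is
a plain substring test, so a keyspace or table whose name contains
"validation" would also match.

```python
import subprocess

# Operation types that hold SSTables a new validation would also need.
# ASSUMPTION: these labels match the "compaction type" column printed by
# `nodetool compactionstats` on the Cassandra version in use.
BLOCKING_TYPES = ("Anticompaction after repair", "Validation")


def blocking_tasks(compactionstats_output):
    """Return the output lines that mention a blocking operation type."""
    return [
        line.strip()
        for line in compactionstats_output.splitlines()
        if any(op.lower() in line.lower() for op in BLOCKING_TYPES)
    ]


def node_blocking_tasks(host):
    """Run `nodetool -h <host> compactionstats` and return blocking task lines."""
    result = subprocess.run(
        ["nodetool", "-h", host, "compactionstats"],
        capture_output=True, text=True, check=True,
    )
    return blocking_tasks(result.stdout)
```

Calling node_blocking_tasks() for every node in the cluster and starting
repair only when each call returns an empty list would be one way to avoid
launching a session while sstables are still being anticompacted.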

Thanks,
Robert

Robert Sicoie

On Wed, Sep 28, 2016 at 3:56 PM, Robert Sicoie <[email protected]>
wrote:

> Thanks Alexander,
>
> I have now started running the repair with the -pr option and with keyspace
> and table arguments.
> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
> RepairRunnable.java:246 - Repair session 89af4d10-856f-11e6-b28f-df99132d7979
> for range [(8323429577695061526,8326640819362122791],
> ..., (4212695343340915405,4229348077081465596]]] Validation failed in
> /10.45.113.88"
>
> for one of the tables. 10.45.113.88 is the IP of the machine I am running
> nodetool on.
> I'm wondering if this is normal...
>
> Thanks,
> Robert
>
>
>
>
> Robert Sicoie
>
> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
> [email protected]> wrote:
>
>> Hi,
>>
>> nodetool scrub won't help here, as what you're experiencing is most
>> likely that one SSTable is going through anticompaction, and then another
>> node is asking for a Merkle tree that involves it.
>> For understandable reasons, an SSTable cannot be anticompacted and
>> validation compacted at the same time.
>>
>> The solution here is to adjust the repair pressure on your cluster so
>> that anticompaction can end before you run repair on another node.
>> You may have a lot of anticompaction to do if you had high volumes of
>> unrepaired data, which can take a long time depending on several factors.
>>
>> You can tune your repair process to make sure no anticompaction is
>> running before launching a new session on another node or you can try my
>> Reaper fork that handles incremental repair:
>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>> I may have to add a few checks in order to avoid all collisions between
>> anticompactions and new sessions, but it should be helpful if you struggle
>> with incremental repair.
>>
>> In any case, check if your nodes are still anticompacting before trying
>> to run a new repair session on a node.
>>
>> Cheers,
>>
>>
>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <[email protected]>
>> wrote:
>>
>>> Hi guys,
>>>
>>> I have a cluster of 5 nodes running Cassandra 3.0.5.
>>> I had been running nodetool repair over the last few days, one node at a
>>> time, when I first encountered this exception:
>>>
>>> ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409 CassandraDaemon.java:195 - Exception in thread Thread[ValidationExecutor:11,1,main]
>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>     at org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>>
>>> On some of the other boxes I see this:
>>>
>>>
>>> Caused by: org.apache.cassandra.exceptions.RepairException: [repair #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv, [(-7505573573695693981,-7495786486761919991],
>>> ....
>>> (-8483612809930827919,-8480482504800860871]]] Validation failed in /10.45.113.67
>>>     at org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_60]
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
>>> ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096 CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI Runtime]
>>> java.lang.AssertionError: java.lang.InterruptedException
>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
>>>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
>>> Caused by: java.lang.InterruptedException: null
>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[na:1.8.0_60]
>>>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[na:1.8.0_60]
>>>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339) ~[na:1.8.0_60]
>>>     at org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168) ~[apache-cassandra-3.0.5.jar:3.0.5]
>>>     ... 6 common frames omitted
>>>
>>>
>>> Now if I run nodetool repair I get the
>>>
>>> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>>>
>>> exception.
>>> What do you suggest? Would nodetool scrub or sstablescrub help in this
>>> case, or would it just make things worse?
>>>
>>> Thanks,
>>>
>>> Robert
>>>
>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
