Re: How to get rid of "Cannot start multiple repair sessions over the same sstables" exception

Robert Sicoie Wed, 28 Sep 2016 07:25:26 -0700

I am using nodetool compactionstats to check for pending compactions, and it
shows 0 pending on all nodes seconds before I run nodetool repair.
I am also monitoring PendingCompactions over JMX.


Is there any other way I can find out whether an anticompaction is running
on any node?
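
One possible way to check, as a rough sketch rather than a tested tool: poll
nodetool compactionstats on every node and look at the listed operations
instead of the pending count, since the "pending tasks" line and the table of
active operations are separate parts of the output. The function names, the
poll interval and the assumption that an anticompaction row contains the word
"anticompaction" in its "compaction type" column are mine, not something
confirmed in this thread.

```python
import subprocess
import time


def has_anticompaction(compactionstats_output: str) -> bool:
    """Return True if any line of `nodetool compactionstats` output mentions anticompaction."""
    # Assumption: an active anticompaction shows up as a row whose
    # "compaction type" column contains the word "anticompaction".
    return any(
        "anticompaction" in line.lower()
        for line in compactionstats_output.splitlines()
    )


def compactionstats(host: str) -> str:
    """Run `nodetool -h <host> compactionstats` and return its stdout."""
    result = subprocess.run(
        ["nodetool", "-h", host, "compactionstats"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def wait_until_no_anticompaction(hosts, poll_seconds=30, fetch=compactionstats):
    """Block until none of the given hosts reports a running anticompaction."""
    while True:
        busy = [h for h in hosts if has_anticompaction(fetch(h))]
        if not busy:
            return
        time.sleep(poll_seconds)
```

Calling wait_until_no_anticompaction with the list of node addresses before
starting repair on the next node would be one way to apply it; the fetch
parameter exists only so the check can be exercised without a live cluster.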

Thanks a lot,
Robert

Robert Sicoie

On Wed, Sep 28, 2016 at 4:44 PM, Alexander Dejanovski <
[email protected]> wrote:

> Robert,
>
> you need to make sure you have no repair session currently running on your
> cluster, and no anticompaction.
> I'd recommend doing a rolling restart in order to stop all running repair
> for sure, then start the process again, node by node, checking that no
> anticompaction is running before moving from one node to the other.
>
> Please do not use the -pr switch as it is both useless (token ranges are
> repaired only once with inc repair, whatever the replication factor) and
> harmful as all anticompactions won't be executed (you'll still have
> sstables marked as unrepaired even if the process has run entirely with no
> error).
>
> Let us know how that goes.
>
> Cheers,
>
> On Wed, Sep 28, 2016 at 2:57 PM Robert Sicoie <[email protected]>
> wrote:
>
>> Thanks Alexander,
>>
>> Now I started to run the repair with -pr arg and with keyspace and table
>> args.
>> Still, I got the "ERROR [RepairJobTask:1] 2016-09-28 11:34:38,288
>> RepairRunnable.java:246 - Repair session 89af4d10-856f-11e6-b28f-df99132d7979
>> for range [(8323429577695061526,8326640819362122791],
>> ..., (4212695343340915405,4229348077081465596]]] Validation failed in
>> /10.45.113.88"
>>
>> for one of the tables. 10.45.113.88 is the ip of the machine I am running
>> the nodetool on.
>> I'm wondering if this is normal...
>>
>> Thanks,
>> Robert
>>
>>
>>
>>
>> Robert Sicoie
>>
>> On Wed, Sep 28, 2016 at 11:53 AM, Alexander Dejanovski <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> nodetool scrub won't help here, as what you're experiencing is most
>>> likely that one SSTable is going through anticompaction, and then another
>>> node is asking for a Merkle tree that involves it.
>>> For understandable reasons, an SSTable cannot be anticompacted and
>>> validation compacted at the same time.
>>>
>>> The solution here is to adjust the repair pressure on your cluster so
>>> that anticompaction can end before you run repair on another node.
>>> You may have a lot of anticompaction to do if you had high volumes of
>>> unrepaired data, which can take a long time depending on several factors.
>>>
>>> You can tune your repair process to make sure no anticompaction is
>>> running before launching a new session on another node or you can try my
>>> Reaper fork that handles incremental repair:
>>> https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
>>> I may have to add a few checks in order to avoid all collisions between
>>> anticompactions and new sessions, but it should be helpful if you struggle
>>> with incremental repair.
>>>
>>> In any case, check if your nodes are still anticompacting before trying
>>> to run a new repair session on a node.
>>>
>>> Cheers,
>>>
>>>
>>> On Wed, Sep 28, 2016 at 10:31 AM Robert Sicoie <[email protected]>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I have a cluster of 5 nodes, cassandra 3.0.5.
>>>> I have been running nodetool repair over the last few days, one node at
>>>> a time, when I first encountered this exception:
>>>>
>>>> *ERROR [ValidationExecutor:11] 2016-09-27 16:12:20,409
>>>> CassandraDaemon.java:195 - Exception in thread
>>>> Thread[ValidationExecutor:11,1,main]*
>>>> *java.lang.RuntimeException: Cannot start multiple repair sessions over
>>>> the same sstables*
>>>> * at
>>>> org.apache.cassandra.db.compaction.CompactionManager.getSSTablesToValidate(CompactionManager.java:1194)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1084)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:80)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:714)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>> ~[na:1.8.0_60]*
>>>> * at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>> ~[na:1.8.0_60]*
>>>> * at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>> [na:1.8.0_60]*
>>>> * at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]*
>>>>
>>>> On some of the other boxes I see this:
>>>>
>>>>
>>>> *Caused by: org.apache.cassandra.exceptions.RepairException: [repair
>>>> #9dd21ab0-83f4-11e6-b28f-df99132d7979 on notes/operator_source_mv,
>>>> [(-7505573573695693981,-7495786486761919991],*
>>>> *....*
>>>> * (-8483612809930827919,-8480482504800860871]]] Validation failed in
>>>> /10.45.113.67*
>>>> * at
>>>> org.apache.cassandra.repair.ValidationTask.treesReceived(ValidationTask.java:68)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:408)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:168)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>> ~[na:1.8.0_60]*
>>>> * at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>> ~[na:1.8.0_60]*
>>>> * at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>> [na:1.8.0_60]*
>>>> * at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>> [na:1.8.0_60]*
>>>> * at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]*
>>>> *ERROR [RepairJobTask:3] 2016-09-26 16:39:33,096
>>>> CassandraDaemon.java:195 - Exception in thread Thread[RepairJobTask:3,5,RMI
>>>> Runtime]*
>>>> *java.lang.AssertionError: java.lang.InterruptedException*
>>>> * at
>>>> org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:172)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:761)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:729)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> org.apache.cassandra.repair.ValidationTask.run(ValidationTask.java:56)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>> ~[na:1.8.0_60]*
>>>> * at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>> ~[na:1.8.0_60]*
>>>> * at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]*
>>>> *Caused by: java.lang.InterruptedException: null*
>>>> * at
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
>>>> ~[na:1.8.0_60]*
>>>> * at
>>>> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
>>>> ~[na:1.8.0_60]*
>>>> * at
>>>> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
>>>> ~[na:1.8.0_60]*
>>>> * at
>>>> org.apache.cassandra.net.OutboundTcpConnection.enqueue(OutboundTcpConnection.java:168)
>>>> ~[apache-cassandra-3.0.5.jar:3.0.5]*
>>>> * ... 6 common frames omitted*
>>>>
>>>>
>>>> Now if I run nodetool repair I get the
>>>>
>>>> *java.lang.RuntimeException: Cannot start multiple repair sessions over
>>>> the same sstables*
>>>>
>>>> exception.
>>>> What do you suggest? Would nodetool scrub or sstablescrub help in this
>>>> case, or would it just make things worse?
>>>>
>>>> Thanks,
>>>>
>>>> Robert
>>>>
>>> --
>>> -----------------
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
