The parent repair session lives on the node where you kicked off the repair. Are the logs above from that node? Can you make it clearer how many nodes are involved, and which logs come from which node?
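In case it helps with lining things up, here is a rough sketch (not Cassandra tooling, just a throwaway helper) that groups log lines by the UUID-style session ids that appear in the messages quoted below, so you can see which node logged what for each repair session. The function name and the node labels "X" and "Y" are my own placeholders, and I'm assuming every relevant line carries the session id in the same form as in your excerpt.

```python
import re
from collections import defaultdict

# UUID-style ids as they appear in the quoted log lines, e.g.
# "repair #1df000a0-effa-11e7-8361-b7c9edfbfc33" or
# "parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33".
SESSION_ID = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b"
)


def group_by_session(logs_by_node):
    """Return {session_id: [(node, line), ...]} for lines that mention an id.

    logs_by_node maps a node label (hypothetical, e.g. "X") to an iterable
    of log lines, such as an open debug.log file.
    """
    sessions = defaultdict(list)
    for node, lines in logs_by_node.items():
        for line in lines:
            # A line may mention an id more than once; record it once per id.
            for session_id in set(SESSION_ID.findall(line)):
                sessions[session_id].append((node, line.rstrip("\n")))
    return dict(sessions)


if __name__ == "__main__":
    # Two lines taken from the thread, shortened, labelled as node Y.
    sample = {
        "Y": [
            "ERROR Validator.java:268 - Failed creating a merkle tree for "
            "[repair #1df000a0-effa-11e7-8361-b7c9edfbfc33 on mykeyspace/mytable",
            "DEBUG RepairMessageVerbHandler.java:142 - Got anticompaction request "
            "AnticompactionRequest{parentRepairSession="
            "1de949e0-effa-11e7-8361-b7c9edfbfc33}",
        ],
    }
    for session_id, entries in group_by_session(sample).items():
        nodes = sorted({node for node, _ in entries})
        print(session_id, "seen on", nodes, "in", len(entries), "line(s)")
```

To use it on real logs, pass open file objects instead of the sample lists, one per node. Note that the parent session id and the per-table repair session id are different UUIDs, so the two still have to be tied together by hand (for example by timestamp).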
On 9 January 2018 at 09:49, Hannu Kröger <hkro...@gmail.com> wrote:
> We have run restarts on the cluster and that doesn’t seem to help at all.
>
> We ran repair separately for each table that seems to go through usually but running a repair on a keyspace doesn’t.
>
> Anything anyone?
>
> Hannu
>
> On 3 Jan 2018, at 23:24, Hannu Kröger <hkro...@gmail.com> wrote:
>
> I can certainly try that. No problem there.
>
> However wouldn’t we then get this kind of errors if that was the case:
>
> java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
>
> ?
>
> Hannu
>
> On 3 Jan 2018, at 20:50, Nandakishore Tokala <nandakishore.tok...@gmail.com> wrote:
>
> hi Hannu,
>
> I think some of the repairs are hanging there. please restart all the nodes in the cluster and start the repair
>
> Thanks
> Nanda
>
> On Wed, Jan 3, 2018 at 9:35 AM, Hannu Kröger <hkro...@gmail.com> wrote:
>
>> Additional notes:
>>
>> 1) If I run the repair just on those tables, it works fine
>> 2) Those tables are empty
>>
>> Hannu
>>
>> > On 3 Jan 2018, at 18:23, Hannu Kröger <hkro...@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > Situation is as follows:
>> >
>> > Repair was started on node X on this keyspace with --full -pr. Repair fails on node Y.
>> >
>> > Node Y has debug logging on (DEBUG on org.apache.cassandra) and I’m looking at the debug.log.
>> > I see following messages related to this repair request:
>> >
>> > -----------
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,530 RepairMessageVerbHandler.java:114 - Validating ValidationRequest{gcBefore=1511473932} org.apache.cassandra.repair.messages.ValidationRequest@5a17430c
>> > DEBUG [ValidationExecutor:4] 2018-01-02 17:52:12,531 StorageService.java:3321 - Forcing flush on keyspace mykeyspace, CF mytable
>> > DEBUG [MemtablePostFlush:54] 2018-01-02 17:52:12,531 ColumnFamilyStore.java:954 - forceFlush requested but everything is clean in mytable
>> > ERROR [ValidationExecutor:4] 2018-01-02 17:52:12,532 Validator.java:268 - Failed creating a merkle tree for [repair #1df000a0-effa-11e7-8361-b7c9edfbfc33 on mykeyspace/mytable, [(6917529027641081856,-9223372036854775808]]], /123.123.123.123 (see log for details)
>> > -----------
>> >
>> > then the same about another table and after that which indicates that repair “master” has told to abort basically, right?
>> >
>> > -----------
>> > DEBUG [AntiEntropyStage:1] 2018-01-02 17:52:12,563 RepairMessageVerbHandler.java:142 - Got anticompaction request AnticompactionRequest{parentRepairSession=1de949e0-effa-11e7-8361-b7c9edfbfc33} org.apache.cassandra.repair.messages.AnticompactionRequest@5dc8beea
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,563 RepairMessageVerbHandler.java:168 - Got error, removing parent repair session
>> > ERROR [AntiEntropyStage:1] 2018-01-02 17:52:12,564 CassandraDaemon.java:228 - Exception in thread Thread[AntiEntropyStage:1,5,main]
>> > java.lang.RuntimeException: java.lang.RuntimeException: Parent repair session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> >     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:171) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_111]
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_111]
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
>> >     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.0.jar:3.11.0]
>> >     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
>> > Caused by: java.lang.RuntimeException: Parent repair session with id = 1de949e0-effa-11e7-8361-b7c9edfbfc33 has failed.
>> >     at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:409) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     at org.apache.cassandra.service.ActiveRepairService.doAntiCompaction(ActiveRepairService.java:444) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:143) ~[apache-cassandra-3.11.0.jar:3.11.0]
>> >     ... 7 common frames omitted
>> > -----------
>> >
>> > But that is almost all in the log and I don’t really see what the original problem here is.
>> >
>> > Cassandra flushes the table to start building merkle tree and on next millisecond it already fails the repair but without proper exception or error logging about the problem.
>> >
>> > Cassandra version is the 3.11.0.
>> >
>> > Any ideas?
>> >
>> > Cheers,
>> > Hannu
>
> --
> Thanks & Regards,
> Nanda Kishore