Thank you, Reynald. I have contributed to that issue, but I cannot participate further right now because I'm having an out-of-memory issue, which may be unrelated. I think I'll start a new thread on this list for that.
~ Paul Nickerson

On Thu, Jan 29, 2015 at 11:15 AM, Reynald Bourtembourg <reynald.bourtembo...@esrf.fr> wrote:

> Hi Paul,
>
> There is a JIRA ticket about this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-8696
>
> I have seen these errors too the last time I ran "nodetool repair".
> I would also be interested to know the answer to the questions you were
> asking:
>
> "Are these errors problematic? Should I just let the repair process
> continue for however long it takes? "
> "I am wondering whether this is making the repair ineffectual."
>
> Best regards
>
> Reynald
>
> On 29/01/2015 17:03, Paul Nickerson wrote:
>
> I am running a 6 node cluster using Apache Cassandra 2.1.2 with DataStax
> OpsCenter 5.0.2 from the AWS EC2 AMI "DataStax Auto-Clustering AMI
> 2.5.1-hvm" (DataStax Community AMI). When I try to run a repair on the
> rollups60 column family in the OpsCenter keyspace, I get errors about
> failed snapshot creation in the Cassandra system log. The repair seems to
> continue, and then finishes with errors.
>
> I am wondering whether this is making the repair ineffectual.
>
> I am running the command
>
> nodetool repair OpsCenter rollups60
>
> on one of the nodes (10.63.74.70). From the command, I get this output:
>
> [2015-01-23 19:36:06,261] Starting repair command #9, repairing 511
> ranges for keyspace OpsCenter (seq=true, full=true)
> [2015-01-23 21:08:16,242] Repair session
> 67772db0-a337-11e4-9e78-37e5027a626b for range
> (5848435723460298978,5868916338423419522] failed with error
> java.io.IOException: Failed during snapshot creation.
>
> The error is repeated many times, and they all appear right at the end.
> Here is an example of what I see in the log on that same system (the one
> that I'm running the command from, and the one that's trying to snapshot):
>
> INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,235
> RepairSession.java:171 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b]
> Received merkle tree for rollups60 from /10.63.74.70
> INFO [AntiEntropySessions:9] 2015-01-23 19:38:28,236
> RepairSession.java:260 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] new
> session: will sync /10.63.74.70, /10.51.180.16 on range
> (5848435723460298978,5868916338423419522] for OpsCenter.[rollups60]
> INFO [RepairJobTask:3] 2015-01-23 19:38:28,237 Differencer.java:74 -
> [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Endpoints /10.13.157.190
> and /10.63.74.70 have 1 range(s) out of sync for rollups60
> INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,237
> ColumnFamilyStore.java:840 - Enqueuing flush of rollups60: 465365 (0%)
> on-heap, 0 (0%) off-heap
> INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,238
> Memtable.java:325 - Writing Memtable-rollups60@204861223(51960 serialized
> bytes, 1395 ops, 0%/0% of on/off-heap limit)
> INFO [RepairJobTask:3] 2015-01-23 19:38:28,239
> StreamingRepairTask.java:68 - [streaming task
> #138b42e0-a337-11e4-9e78-37e5027a626b] Performing streaming repair of 1
> ranges with /10.13.157.190
> INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,262
> Memtable.java:364 - Completed flushing
> /raid0/cassandra/data/OpsCenter/rollups60-445613507ca411e4bd3f1927a2a71193/OpsCenter-rollups60-ka-331933-Data.db
> (29998 bytes) for commitlog position
> ReplayPosition(segmentId=1422038939094, position=31047766)
> ERROR [RepairJobTask:2] 2015-01-23 19:38:39,067 RepairJob.java:127 -
> Error occurred during snapshot phase
> java.lang.RuntimeException: Could not create snapshot at /10.63.74.70
> at
> org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at
> org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> ~[na:1.7.0_51]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> ~[na:1.7.0_51]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_51]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> INFO [AntiEntropySessions:10] 2015-01-23 19:38:39,068
> RepairSession.java:260 - [repair #6dec29c0-a337-11e4-9e78-37e5027a626b] new
> session: will sync /10.63.74.70, /10.51.180.16 on range
> (-6918744323658665195,-6916171087863528821] for OpsCenter.[rollups60]
> ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,068
> RepairSession.java:303 - [repair #67772db0-a337-11e4-9e78-37e5027a626b]
> session completed with the following error
> java.io.IOException: Failed during snapshot creation.
> at
> org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at
> org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at
> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
> ~[guava-16.0.jar:na]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> [na:1.7.0_51]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,070
> CassandraDaemon.java:153 - Exception in thread
> Thread[AntiEntropySessions:9,5,RMI Runtime]
> java.lang.RuntimeException: java.io.IOException: Failed during
> snapshot creation.
> at
> com.google.common.base.Throwables.propagate(Throwables.java:160)
> ~[guava-16.0.jar:na]
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> ~[na:1.7.0_51]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> ~[na:1.7.0_51]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ~[na:1.7.0_51]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: java.io.IOException: Failed during snapshot creation.
> at
> org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at
> org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128)
> ~[apache-cassandra-2.1.2.jar:2.1.2]
> at
> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
> ~[guava-16.0.jar:na]
> ... 3 common frames omitted
>
> The errors are repeated many times. The IP Address 10.63.74.70 in the
> log is the node I'm running the repair from. I am able to repair the rest
> of the OpsCenter column families, and they complete pretty quickly without
> error.
>
> I have tried creating my own snapshot, and it completes successfully
> with nothing logged.
>
> nodetool snapshot OpsCenter
>
> The disk has plenty of space left. Are these errors problematic? Should
> I just let the repair process continue for however long it takes? The
> cluster is currently not in use by any application, yet it has some load
> while it's trying this repair, so it's not sitting idle (it has no load
> when I'm not repairing).
>
> Thanks for any help.
>
> And if this is the wrong place to ask about a DataStax Community thing,
> could someone point me in the right direction?
>
> ~ Paul Nickerson