Hi Paul,

There is a JIRA ticket about this issue:
https://issues.apache.org/jira/browse/CASSANDRA-8696

I also saw these errors the last time I ran "nodetool repair".
I would also be interested to know the answer to the questions you were asking:

"Are these errors problematic? Should I just let the repair process continue for however long it takes? "
"I am wondering whether this is making the repair ineffectual."

Best regards

Reynald


On 29/01/2015 17:03, Paul Nickerson wrote:
I am running a 6 node cluster using Apache Cassandra 2.1.2 with DataStax OpsCenter 5.0.2 from the AWS EC2 AMI "DataStax Auto-Clustering AMI 2.5.1-hvm" (DataStax Community AMI). When I try to run a repair on the rollups60 column family in the OpsCenter keyspace, I get errors about failed snapshot creation in the Cassandra system log. The repair seems to continue, and then finishes with errors.

I am wondering whether this is making the repair ineffectual.

I am running the command

    nodetool repair OpsCenter rollups60

on one of the nodes (10.63.74.70). From the command, I get this output:

    [2015-01-23 19:36:06,261] Starting repair command #9, repairing 511 ranges for keyspace OpsCenter (seq=true, full=true)
    [2015-01-23 21:08:16,242] Repair session 67772db0-a337-11e4-9e78-37e5027a626b for range (5848435723460298978,5868916338423419522] failed with error java.io.IOException: Failed during snapshot creation.

The error is repeated many times, and the errors all appear right at the end. Here is an example of what I see in the log on that same system (the one I'm running the command from, which is also the one trying to create the snapshot):

    INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,235 RepairSession.java:171 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Received merkle tree for rollups60 from /10.63.74.70
    INFO [AntiEntropySessions:9] 2015-01-23 19:38:28,236 RepairSession.java:260 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (5848435723460298978,5868916338423419522] for OpsCenter.[rollups60]
    INFO [RepairJobTask:3] 2015-01-23 19:38:28,237 Differencer.java:74 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Endpoints /10.13.157.190 and /10.63.74.70 have 1 range(s) out of sync for rollups60
    INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,237 ColumnFamilyStore.java:840 - Enqueuing flush of rollups60: 465365 (0%) on-heap, 0 (0%) off-heap
    INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,238 Memtable.java:325 - Writing Memtable-rollups60@204861223(51960 serialized bytes, 1395 ops, 0%/0% of on/off-heap limit)
    INFO [RepairJobTask:3] 2015-01-23 19:38:28,239 StreamingRepairTask.java:68 - [streaming task #138b42e0-a337-11e4-9e78-37e5027a626b] Performing streaming repair of 1 ranges with /10.13.157.190
    INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,262 Memtable.java:364 - Completed flushing /raid0/cassandra/data/OpsCenter/rollups60-445613507ca411e4bd3f1927a2a71193/OpsCenter-rollups60-ka-331933-Data.db (29998 bytes) for commitlog position ReplayPosition(segmentId=1422038939094, position=31047766)
    ERROR [RepairJobTask:2] 2015-01-23 19:38:39,067 RepairJob.java:127 - Error occurred during snapshot phase
    java.lang.RuntimeException: Could not create snapshot at /10.63.74.70
            at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) ~[apache-cassandra-2.1.2.jar:2.1.2]
            at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) ~[apache-cassandra-2.1.2.jar:2.1.2]
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
            at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
            at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
    INFO [AntiEntropySessions:10] 2015-01-23 19:38:39,068 RepairSession.java:260 - [repair #6dec29c0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (-6918744323658665195,-6916171087863528821] for OpsCenter.[rollups60]
    ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,068 RepairSession.java:303 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] session completed with the following error
    java.io.IOException: Failed during snapshot creation.
            at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
            at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
            at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
            at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
    ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,070 CassandraDaemon.java:153 - Exception in thread Thread[AntiEntropySessions:9,5,RMI Runtime]
    java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
            at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
            at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.2.jar:2.1.2]
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
            at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
            at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
    Caused by: java.io.IOException: Failed during snapshot creation.
            at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
            at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
            at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
            ... 3 common frames omitted

The errors are repeated many times. The IP address 10.63.74.70 in the log is the node I'm running the repair from. I am able to repair the rest of the OpsCenter column families, and those repairs complete quickly without errors.
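Because every failed session in the `nodetool repair` output names its token range, one way to see how many of the 511 ranges were affected is to pull those ranges out of the saved output. This is a minimal sketch, assuming the output was saved to a file (the name `repair-output.txt` is hypothetical) and that the messages keep the "for range (start,end] failed" wording shown above. The function name `failed_ranges` is also hypothetical, and `grep -o` is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper (not part of nodetool): read `nodetool repair` output on
# stdin and print the token range of every session reported as failed,
# one range per line, in the form "(start,end]".
failed_ranges() {
  # grep -o prints each match on its own line, even when several messages
  # ended up on a single line. Tokens are signed integers, hence [-0-9]*.
  # sed then strips the surrounding words, keeping only "(start,end]".
  grep -o 'for range ([-0-9]*,[-0-9]*] failed' |
    sed -e 's/^for range //' -e 's/ failed$//'
}
```

For example, `nodetool repair OpsCenter rollups60 | tee repair-output.txt` keeps a copy of the output, and `failed_ranges < repair-output.txt | wc -l` then counts the failed sessions.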

I have tried creating my own snapshot, and it completes successfully with nothing logged.

    nodetool snapshot OpsCenter

The disk has plenty of space left. Are these errors problematic? Should I just let the repair process continue for however long it takes? The cluster is not currently used by any application. It does carry some load while this repair is running, so it is not idle during the repair; when no repair is running, it has no load.

Thanks for any help.

And if this is the wrong place to ask about a DataStax Community thing, could someone point me in the right direction?

 ~ Paul Nickerson
