Hi Paul,
There is a JIRA ticket about this issue:
https://issues.apache.org/jira/browse/CASSANDRA-8696
I also saw these errors the last time I ran "nodetool repair".
I would also be interested to know the answer to the questions you were
asking:
"Are these errors problematic? Should I just let the repair process
continue for however long it takes?"
"I am wondering whether this is making the repair ineffectual."
Best regards
Reynald
On 29/01/2015 17:03, Paul Nickerson wrote:
I am running a 6 node cluster using Apache Cassandra 2.1.2 with
DataStax OpsCenter 5.0.2 from the AWS EC2 AMI "DataStax
Auto-Clustering AMI 2.5.1-hvm" (DataStax Community AMI). When I try to
run a repair on the rollups60 column family in the OpsCenter keyspace,
I get errors about failed snapshot creation in the Cassandra system
log. The repair seems to continue, and then finishes with errors.
I am wondering whether this is making the repair ineffectual.
I am running the command
nodetool repair OpsCenter rollups60
on one of the nodes (10.63.74.70). From the command, I get this output:
[2015-01-23 19:36:06,261] Starting repair command #9, repairing 511 ranges for keyspace OpsCenter (seq=true, full=true)
[2015-01-23 21:08:16,242] Repair session 67772db0-a337-11e4-9e78-37e5027a626b for range (5848435723460298978,5868916338423419522] failed with error java.io.IOException: Failed during snapshot creation.
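To judge how much of the repair was affected, it may help to list which sessions and token ranges failed. The following is a minimal sketch, assuming Python 3.7 or later and assuming every failure line follows the format of the one shown above; the names failed_sessions, ENTRY_START and FAILED_SESSION are made up for this illustration and are not part of Cassandra or nodetool.

```python
import re

# Each nodetool message starts with a bracketed timestamp; split just before it.
ENTRY_START = re.compile(r"(?=\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\])")

# Assumed format, copied from the failure line quoted in this message.
FAILED_SESSION = re.compile(
    r"Repair session (?P<session>[0-9a-f-]{36}) for range "
    r"\((?P<start>-?\d+),(?P<end>-?\d+)\] failed with error (?P<error>.+)"
)


def failed_sessions(output):
    """Return one dict per failed repair session found in nodetool output."""
    failures = []
    for entry in ENTRY_START.split(output):
        # Collapse line breaks so hard-wrapped (e.g. emailed) output still matches.
        flat = " ".join(entry.split())
        match = FAILED_SESSION.search(flat)
        if match:
            failures.append(match.groupdict())
    return failures
```

Passing the captured output of the repair command to failed_sessions returns a list whose length can be compared against the 511 ranges reported at the start of the run.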
The error is repeated many times, and every occurrence appears right at
the end. Here is an example of what I see in the log on that same node
(the one I'm running the command from, which is also the one trying to
create the snapshot):
INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,235 RepairSession.java:171 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Received merkle tree for rollups60 from /10.63.74.70
INFO [AntiEntropySessions:9] 2015-01-23 19:38:28,236 RepairSession.java:260 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (5848435723460298978,5868916338423419522] for OpsCenter.[rollups60]
INFO [RepairJobTask:3] 2015-01-23 19:38:28,237 Differencer.java:74 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Endpoints /10.13.157.190 and /10.63.74.70 have 1 range(s) out of sync for rollups60
INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,237 ColumnFamilyStore.java:840 - Enqueuing flush of rollups60: 465365 (0%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,238 Memtable.java:325 - Writing Memtable-rollups60@204861223(51960 serialized bytes, 1395 ops, 0%/0% of on/off-heap limit)
INFO [RepairJobTask:3] 2015-01-23 19:38:28,239 StreamingRepairTask.java:68 - [streaming task #138b42e0-a337-11e4-9e78-37e5027a626b] Performing streaming repair of 1 ranges with /10.13.157.190
INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,262 Memtable.java:364 - Completed flushing /raid0/cassandra/data/OpsCenter/rollups60-445613507ca411e4bd3f1927a2a71193/OpsCenter-rollups60-ka-331933-Data.db (29998 bytes) for commitlog position ReplayPosition(segmentId=1422038939094, position=31047766)
ERROR [RepairJobTask:2] 2015-01-23 19:38:39,067 RepairJob.java:127 - Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.63.74.70
    at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
INFO [AntiEntropySessions:10] 2015-01-23 19:38:39,068 RepairSession.java:260 - [repair #6dec29c0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (-6918744323658665195,-6916171087863528821] for OpsCenter.[rollups60]
ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,068 RepairSession.java:303 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] session completed with the following error
java.io.IOException: Failed during snapshot creation.
    at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,070 CassandraDaemon.java:153 - Exception in thread Thread[AntiEntropySessions:9,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.io.IOException: Failed during snapshot creation.
    at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
    ... 3 common frames omitted
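Since the RuntimeException names the endpoint whose snapshot failed, tallying those endpoints across the log may show whether a single node is responsible. This is only a sketch, assuming the message text matches the "Could not create snapshot at /<IPv4 address>" line quoted above; the function name snapshot_failures_by_endpoint is hypothetical, and the location of the system log depends on the installation.

```python
import re
from collections import Counter

# Assumed message format, taken from the log excerpt in this message.
# \s+ tolerates a line break between "at" and the address.
SNAPSHOT_FAILURE = re.compile(
    r"Could not create snapshot at\s+/(\d{1,3}(?:\.\d{1,3}){3})"
)


def snapshot_failures_by_endpoint(log_text):
    """Count 'Could not create snapshot' errors per endpoint address."""
    return Counter(SNAPSHOT_FAILURE.findall(log_text))
```

Calling snapshot_failures_by_endpoint on the full text of the system log returns a Counter keyed by IP address, so most_common() lists the endpoints with the most failed snapshots first.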
The errors are repeated many times. The IP address 10.63.74.70 in the
log belongs to the node I'm running the repair from. I am able to repair the
rest of the OpsCenter column families, and they complete pretty
quickly without error.
I have tried creating my own snapshot with the command below, and it
completes successfully with nothing logged:
nodetool snapshot OpsCenter
The disk has plenty of space left. Are these errors problematic?
Should I just let the repair process continue for however long it
takes? The cluster is currently not in use by any application, yet it
has some load while it's trying this repair, so it's not sitting idle
(it has no load when I'm not repairing).
Thanks for any help.
And if this is the wrong place to ask about a DataStax Community
thing, could someone point me in the right direction?
~ Paul Nickerson