Re: Repairing OpsCenter rollups60 Results in Snapshot Errors

Reynald Bourtembourg Thu, 29 Jan 2015 08:17:05 -0800

Hi Paul,

There is a JIRA ticket about this issue:
https://issues.apache.org/jira/browse/CASSANDRA-8696


I have seen these errors too the last time I ran "nodetool repair".

I would also be interested to know the answer to the questions you were asking:

"Are these errors problematic? Should I just let the repair process continue for however long it takes?"

"I am wondering whether this is making the repair ineffectual."

Best regards

Reynald


On 29/01/2015 17:03, Paul Nickerson wrote:

I am running a 6 node cluster using Apache Cassandra 2.1.2 with DataStax OpsCenter 5.0.2 from the AWS EC2 AMI "DataStax Auto-Clustering AMI 2.5.1-hvm" (DataStax Community AMI). When I try to run a repair on the rollups60 column family in the OpsCenter keyspace, I get errors about failed snapshot creation in the Cassandra system log. The repair seems to continue, and then finishes with errors.
I am wondering whether this is making the repair ineffectual.

I am running the command

    nodetool repair OpsCenter rollups60

on one of the nodes (10.63.74.70). From the command, I get this output:
[2015-01-23 19:36:06,261] Starting repair command #9, repairing 511 ranges for keyspace OpsCenter (seq=true, full=true)
[2015-01-23 21:08:16,242] Repair session 67772db0-a337-11e4-9e78-37e5027a626b for range (5848435723460298978,5868916338423419522] failed with error java.io.IOException: Failed during snapshot creation.
The error is repeated many times, and they all appear right at the end. Here is an example of what I see in the log on that same system (the one that I'm running the command from, and the one that's trying to snapshot):
INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,235 RepairSession.java:171 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Received merkle tree for rollups60 from /10.63.74.70
INFO [AntiEntropySessions:9] 2015-01-23 19:38:28,236 RepairSession.java:260 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (5848435723460298978,5868916338423419522] for OpsCenter.[rollups60]
INFO [RepairJobTask:3] 2015-01-23 19:38:28,237 Differencer.java:74 - [repair #138b42e0-a337-11e4-9e78-37e5027a626b] Endpoints /10.13.157.190 and /10.63.74.70 have 1 range(s) out of sync for rollups60
INFO [AntiEntropyStage:1] 2015-01-23 19:38:28,237 ColumnFamilyStore.java:840 - Enqueuing flush of rollups60: 465365 (0%) on-heap, 0 (0%) off-heap
INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,238 Memtable.java:325 - Writing Memtable-rollups60@204861223(51960 serialized bytes, 1395 ops, 0%/0% of on/off-heap limit)
INFO [RepairJobTask:3] 2015-01-23 19:38:28,239 StreamingRepairTask.java:68 - [streaming task #138b42e0-a337-11e4-9e78-37e5027a626b] Performing streaming repair of 1 ranges with /10.13.157.190
INFO [MemtableFlushWriter:25] 2015-01-23 19:38:28,262 Memtable.java:364 - Completed flushing /raid0/cassandra/data/OpsCenter/rollups60-445613507ca411e4bd3f1927a2a71193/OpsCenter-rollups60-ka-331933-Data.db (29998 bytes) for commitlog position ReplayPosition(segmentId=1422038939094, position=31047766)
ERROR [RepairJobTask:2] 2015-01-23 19:38:39,067 RepairJob.java:127 - Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.63.74.70
    at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
INFO [AntiEntropySessions:10] 2015-01-23 19:38:39,068 RepairSession.java:260 - [repair #6dec29c0-a337-11e4-9e78-37e5027a626b] new session: will sync /10.63.74.70, /10.51.180.16 on range (-6918744323658665195,-6916171087863528821] for OpsCenter.[rollups60]
ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,068 RepairSession.java:303 - [repair #67772db0-a337-11e4-9e78-37e5027a626b] session completed with the following error
java.io.IOException: Failed during snapshot creation.
    at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
ERROR [AntiEntropySessions:9] 2015-01-23 19:38:39,070 CassandraDaemon.java:153 - Exception in thread Thread[AntiEntropySessions:9,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
Caused by: java.io.IOException: Failed during snapshot creation.
    at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
    ... 3 common frames omitted
The errors are repeated many times. The IP Address 10.63.74.70 in the log is the node I'm running the repair from. I am able to repair the rest of the OpsCenter column families, and they complete pretty quickly without error.
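Since the same error repeats many times, it may help to count how many repair sessions failed relative to the number of ranges being repaired. The following is a minimal Python sketch, not part of the original report, that tallies this from saved `nodetool repair` output. The regular expressions are modeled only on the two output lines quoted in this message, so other Cassandra versions may word them differently, and the function name `summarize_repair_output` is hypothetical.

```python
import re
import sys

# Patterns modeled on the nodetool output quoted in this thread (assumption:
# other Cassandra versions may format these lines differently).
START_RE = re.compile(r"Starting repair command #(\d+), repairing\s*(\d+) ranges")
FAILED_RE = re.compile(
    r"Repair session\s*([0-9a-f-]+) for range\s*\((-?\d+),(-?\d+)\]"
    r" failed with error\s*(.+)"
)


def summarize_repair_output(text):
    """Return (total_ranges, failures) parsed from nodetool repair output.

    total_ranges is None if no "Starting repair command" line is found.
    failures is a list of dicts with the session id, token range and error.
    """
    total_ranges = None
    failures = []
    for line in text.splitlines():
        start = START_RE.search(line)
        if start:
            total_ranges = int(start.group(2))
        failed = FAILED_RE.search(line)
        if failed:
            session, low, high, error = failed.groups()
            failures.append(
                {
                    "session": session,
                    "range": (int(low), int(high)),
                    "error": error.strip(),
                }
            )
    return total_ranges, failures


if __name__ == "__main__":
    # Reads the saved nodetool output from standard input.
    total, failed_sessions = summarize_repair_output(sys.stdin.read())
    print(f"{len(failed_sessions)} failed session(s) out of {total} range(s)")
    for item in failed_sessions:
        print(f"  {item['session']} {item['range']}: {item['error']}")
```

Assuming the output was saved to a file (for example a hypothetical `repair.out`), the script would be fed that file on standard input. The count only shows how many sessions reported a failure; it does not say whether the affected ranges were left unrepaired.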
I have tried creating my own snapshot, and it completes successfully with nothing logged.
    nodetool snapshot OpsCenter
The disk has plenty of space left. Are these errors problematic? Should I just let the repair process continue for however long it takes? The cluster is currently not in use by any application, yet it has some load while it's trying this repair, so it's not sitting idle (it has no load when I'm not repairing).
Thanks for any help.
And if this is the wrong place to ask about a DataStax Community thing, could someone point me in the right direction?
 ~ Paul Nickerson
