[jira] [Commented] (CASSANDRA-11345) Assertion Errors "Memory was freed" during streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-11345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256291#comment-15256291 ]

Jean-Francois Gosselin commented on CASSANDRA-11345:

During a sequential repair.

> Assertion Errors "Memory was freed" during streaming
>
> Key: CASSANDRA-11345
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11345
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Jean-Francois Gosselin
> Assignee: Paulo Motta
>
> We encountered the following AssertionError (twice on the same node) during a repair:
> On node /172.16.63.41
> {noformat}
> INFO [STREAM-IN-/10.174.216.160] 2016-03-09 02:38:13,900 StreamResultFuture.java:180 - [Stream #f6980580-e55f-11e5-8f08-ef9e099ce99e] Session with /10.174.216.160 is complete
> WARN [STREAM-IN-/10.174.216.160] 2016-03-09 02:38:13,900 StreamResultFuture.java:207 - [Stream #f6980580-e55f-11e5-8f08-ef9e099ce99e] Stream failed
> ERROR [STREAM-OUT-/10.174.216.160] 2016-03-09 02:38:13,906 StreamSession.java:505 - [Stream #f6980580-e55f-11e5-8f08-ef9e099ce99e] Streaming error occurred
> java.lang.AssertionError: Memory was freed
>     at org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331) ~[apache-cassandra-2.1.13.jar:2.1.13]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> {noformat}
> On node /10.174.216.160
> {noformat}
> ERROR [STREAM-OUT-/172.16.63.41] 2016-03-09 02:38:14,140 StreamSession.java:505 - [Stream #f6980580-e55f-11e5-8f08-ef9e099ce99e] Streaming error occurred
> java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.7.0_65]
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.7.0_65]
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.7.0_65]
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.7.0_65]
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487) ~[na:1.7.0_65]
>
[jira] [Commented] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208597#comment-15208597 ]

Jean-Francois Gosselin commented on CASSANDRA-10769:

Based on the comments in CASSANDRA-9935 the "AssertionError: row DecoratedKey" is still present in 2.1.13.

> "received out of order wrt DecoratedKey" after scrub
>
> Key: CASSANDRA-10769
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10769
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.11, Debian Wheezy
> Reporter: mlowicki
>
> After running scrub and cleanup on all nodes in single data center I'm getting:
> {code}
> ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,530 Validator.java:245 - Failed creating a merkle tree for [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]], /10.210.3.221 (see log for details)
> ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,531 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:103,1,main]
> java.lang.AssertionError: row DecoratedKey(-5867787467868737053, 000932373633313036313204808800) received out of order wrt DecoratedKey(-5865937851627253360, 000933313230313737333204c3c700)
>     at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> {code}
> What I did is to run repair on other node:
> {code}
> time nodetool repair --in-local-dc
> {code}
> Corresponding log on the node where repair has been started:
> {code}
> ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 RepairSession.java:303 - [repair #89fa2b70-933d-11e5-b036-75bb514ae072] session completed with the following error
> org.apache.cassandra.exceptions.RepairException: [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]] Validation failed in /10.210.3.117
>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_80]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> INFO [AntiEntropySessions:415] 2015-11-25 06:28:21,533 RepairSession.java:260 - [repair #b9458fa0-933d-11e5-b036-75bb514ae072] new session: will sync /10.210.3.221, /10.210.3.118, /10.210.3.117 on range (7119703141488009983,7129744584776466802] for sync.[device_token, entity2, user_stats, user_device, user_quota, user_store, user_device_progress, entity_by_id2]
> ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 CassandraDaemon.java:227 - Exception in thread Thread[AntiEntropySessions:414,5,RMI Runtime]
> java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]] Validation failed in /10.210.3.117
>     at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_80]
>     at
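For readers trying to interpret the assertion quoted in the report above: the message reads as if the validator expects rows in strictly ascending key order, and the row named first arrived after a key that sorts higher. The sketch below is only an illustration of that kind of check, not Cassandra's Validator code. It assumes tokens that compare as signed 64-bit integers (as the negative values in the log suggest) and ignores any tie-break on the raw key bytes. The class name TokenOrderCheck and its method are hypothetical.

```java
import java.util.OptionalInt;

/** Hypothetical illustration only; this is not Cassandra's Validator. */
final class TokenOrderCheck {

    /**
     * Returns the index of the first token that is not strictly greater than
     * its predecessor, or an empty result if the sequence is strictly ascending.
     * Assumption: tokens compare as signed 64-bit integers.
     */
    static OptionalInt firstOutOfOrder(long[] tokens) {
        for (int i = 1; i < tokens.length; i++) {
            if (Long.compare(tokens[i - 1], tokens[i]) >= 0) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Tokens copied from the assertion message: the previously seen key
        // first, then the row that was reported as out of order.
        long[] fromLog = { -5865937851627253360L, -5867787467868737053L };
        System.out.println(firstOutOfOrder(fromLog));
    }
}
```

Both tokens fall inside the repaired range (-5867793819051725444,-5865919628027816979], so under these assumptions the failure concerns the order in which rows were returned, not rows outside the range.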
[jira] [Commented] (CASSANDRA-11374) LEAK DETECTED during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199874#comment-15199874 ]

Jean-Francois Gosselin commented on CASSANDRA-11374:

Same issue as CASSANDRA-9117 but not fixed in 2.1.x?

> LEAK DETECTED during repair
> ---
>
> Key: CASSANDRA-11374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11374
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jean-Francois Gosselin
>
> When running a range repair we are seeing the following LEAK DETECTED errors:
> {noformat}
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,261 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5ee90b43) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@367168611:[[OffHeapBitSet]] was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ea9d4a7) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1875396681:Memory@[7f34b905fd10..7f34b9060b7a) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@27a6b614) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@838594402:Memory@[7f34bae11ce0..7f34bae11d84) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,263 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@64e7b566) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@674656075:Memory@[7f342deab4e0..7f342deb7ce0) was not released before the reference was garbage collected
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
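When triaging many LEAK DETECTED lines like the ones in this report, it can help to count them per leaked resource type. The following is a minimal sketch of such a tally, assuming the log line layout shown above ("... a reference (<ref>) to class <type>@<id>: ..."). The class LeakLogTally and its method are hypothetical helpers, not part of Cassandra or nodetool.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical triage helper; not part of Cassandra. */
final class LeakLogTally {

    // Captures the fully qualified type name that follows "to class" and
    // precedes the '@' identity suffix. Assumes the layout seen in the report.
    private static final Pattern LEAK =
        Pattern.compile("LEAK DETECTED: a reference \\(\\S+\\) to class ([\\w.$]+)@");

    /** Counts LEAK DETECTED lines per leaked type, in first-seen order. */
    static Map<String, Integer> tally(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = LEAK.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two lines abbreviated from the report; real input would be system.log.
        List<String> sample = Arrays.asList(
            "ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,261 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5ee90b43) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@367168611:[[OffHeapBitSet]] was not released before the reference was garbage collected",
            "ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ea9d4a7) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1875396681:Memory@[7f34b905fd10..7f34b9060b7a) was not released before the reference was garbage collected");
        tally(sample).forEach((type, n) -> System.out.println(n + " " + type));
    }
}
```

Grouping by type only separates the bitset-backed leak from the SafeMemory ones; it says nothing about the cause.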
[jira] [Issue Comment Deleted] (CASSANDRA-11374) LEAK DETECTED during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Francois Gosselin updated CASSANDRA-11374:
---
    Comment: was deleted

(was: We are using the Reaper https://nuance.webex.com/join/sylvain_boily, so a subrange repair (we are not using incremental repair).)

> LEAK DETECTED during repair
> ---
>
> Key: CASSANDRA-11374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11374
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jean-Francois Gosselin
> Assignee: Marcus Eriksson
>
> When running a range repair we are seeing the following LEAK DETECTED errors:
> {noformat}
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,261 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5ee90b43) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@367168611:[[OffHeapBitSet]] was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ea9d4a7) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1875396681:Memory@[7f34b905fd10..7f34b9060b7a) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@27a6b614) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@838594402:Memory@[7f34bae11ce0..7f34bae11d84) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,263 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@64e7b566) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@674656075:Memory@[7f342deab4e0..7f342deb7ce0) was not released before the reference was garbage collected
> {noformat}
[jira] [Created] (CASSANDRA-11374) LEAK DETECTED during repair
Jean-Francois Gosselin created CASSANDRA-11374:
--

Summary: LEAK DETECTED during repair
Key: CASSANDRA-11374
URL: https://issues.apache.org/jira/browse/CASSANDRA-11374
Project: Cassandra
Issue Type: Bug
Reporter: Jean-Francois Gosselin

When running a range repair we are seeing the following LEAK DETECTED errors:
{noformat}
ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,261 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5ee90b43) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@367168611:[[OffHeapBitSet]] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ea9d4a7) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1875396681:Memory@[7f34b905fd10..7f34b9060b7a) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@27a6b614) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@838594402:Memory@[7f34bae11ce0..7f34bae11d84) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,263 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@64e7b566) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@674656075:Memory@[7f342deab4e0..7f342deb7ce0) was not released before the reference was garbage collected
{noformat}
[jira] [Commented] (CASSANDRA-11374) LEAK DETECTED during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201920#comment-15201920 ]

Jean-Francois Gosselin commented on CASSANDRA-11374:

We are using the Reaper from Spotify https://github.com/spotify/cassandra-reaper, so subrange repair. We are not using incremental repair.

> LEAK DETECTED during repair
> ---
>
> Key: CASSANDRA-11374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11374
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jean-Francois Gosselin
> Assignee: Marcus Eriksson
>
> When running a range repair we are seeing the following LEAK DETECTED errors:
> {noformat}
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,261 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5ee90b43) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@367168611:[[OffHeapBitSet]] was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ea9d4a7) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1875396681:Memory@[7f34b905fd10..7f34b9060b7a) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@27a6b614) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@838594402:Memory@[7f34bae11ce0..7f34bae11d84) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,263 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@64e7b566) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@674656075:Memory@[7f342deab4e0..7f342deb7ce0) was not released before the reference was garbage collected
> {noformat}
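Since the comment above mentions subrange repair, here is a rough sketch of what splitting one token range into smaller subranges can look like. It is not the algorithm used by cassandra-reaper. It assumes signed 64-bit tokens and a range that does not wrap around, and the class SubrangeSplitter is hypothetical. BigInteger is used so the range width cannot overflow.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not the splitting logic of cassandra-reaper. */
final class SubrangeSplitter {

    /**
     * Splits the token range (start, end] into at most `parts` contiguous
     * subranges of near-equal width. Each element is {subStart, subEnd},
     * read as (subStart, subEnd]. Assumes start < end (no wrap-around).
     */
    static List<long[]> split(long start, long end, int parts) {
        if (parts < 1 || start >= end) {
            throw new IllegalArgumentException("need parts >= 1 and start < end");
        }
        BigInteger s = BigInteger.valueOf(start);
        BigInteger width = BigInteger.valueOf(end).subtract(s);
        BigInteger n = BigInteger.valueOf(parts);
        List<long[]> out = new ArrayList<>();
        long prev = start;
        for (int i = 1; i <= parts; i++) {
            // The last boundary is pinned to `end` so rounding never drops tokens.
            long next = (i == parts)
                ? end
                : s.add(width.multiply(BigInteger.valueOf(i)).divide(n)).longValueExact();
            if (next > prev) { // skip empty slices when width < parts
                out.add(new long[] { prev, next });
                prev = next;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (long[] r : split(7119703141488009983L, 7129744584776466802L, 4)) {
            System.out.println("(" + r[0] + "," + r[1] + "]");
        }
    }
}
```

Each resulting pair could then be repaired on its own, for example with the `-st` and `-et` options of `nodetool repair`; check `nodetool help repair` for the options available in your version.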
[jira] [Commented] (CASSANDRA-11374) LEAK DETECTED during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201915#comment-15201915 ]

Jean-Francois Gosselin commented on CASSANDRA-11374:

We are using the Reaper https://nuance.webex.com/join/sylvain_boily, so a subrange repair (we are not using incremental repair).

> LEAK DETECTED during repair
> ---
>
> Key: CASSANDRA-11374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11374
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jean-Francois Gosselin
> Assignee: Marcus Eriksson
>
> When running a range repair we are seeing the following LEAK DETECTED errors:
> {noformat}
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,261 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5ee90b43) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@367168611:[[OffHeapBitSet]] was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ea9d4a7) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1875396681:Memory@[7f34b905fd10..7f34b9060b7a) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@27a6b614) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@838594402:Memory@[7f34bae11ce0..7f34bae11d84) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,263 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@64e7b566) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@674656075:Memory@[7f342deab4e0..7f342deb7ce0) was not released before the reference was garbage collected
> {noformat}
[jira] [Created] (CASSANDRA-11345) Assertion Errors "Memory was freed" during streaming
Jean-Francois Gosselin created CASSANDRA-11345:
--

Summary: Assertion Errors "Memory was freed" during streaming
Key: CASSANDRA-11345
URL: https://issues.apache.org/jira/browse/CASSANDRA-11345
Project: Cassandra
Issue Type: Bug
Components: Streaming and Messaging
Reporter: Jean-Francois Gosselin

We encountered the following AssertionError (twice on the same node) during a repair:

On node /172.16.63.41
{noformat}
INFO [STREAM-IN-/10.174.216.160] 2016-03-09 02:38:13,900 StreamResultFuture.java:180 - [Stream #f6980580-e55f-11e5-8f08-ef9e099ce99e] Session with /10.174.216.160 is complete
WARN [STREAM-IN-/10.174.216.160] 2016-03-09 02:38:13,900 StreamResultFuture.java:207 - [Stream #f6980580-e55f-11e5-8f08-ef9e099ce99e] Stream failed
ERROR [STREAM-OUT-/10.174.216.160] 2016-03-09 02:38:13,906 StreamSession.java:505 - [Stream #f6980580-e55f-11e5-8f08-ef9e099ce99e] Streaming error occurred
java.lang.AssertionError: Memory was freed
    at org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
{noformat}
On node /10.174.216.160
{noformat}
ERROR [STREAM-OUT-/172.16.63.41] 2016-03-09 02:38:14,140 StreamSession.java:505 - [Stream #f6980580-e55f-11e5-8f08-ef9e099ce99e] Streaming error occurred
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.7.0_65]
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.7.0_65]
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.7.0_65]
    at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.7.0_65]
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487) ~[na:1.7.0_65]
    at org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44) ~[apache-cassandra-2.1.13.jar:2.1.13]
    at
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157028#comment-15157028 ]

Jean-Francois Gosselin commented on CASSANDRA-9625:
---

I will give it a try.

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
> Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
> Reporter: Eric Evans
> Assignee: T Jake Luciani
> Attachments: metrics.yaml, thread-dump.log
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops working. The usual startup is logged, and one batch of samples is sent, but the reporting interval comes and goes, and no other samples are ever sent. The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment on 2.1.6; We are able to reproduce this on all 6 of production nodes, but not on a 3 node (otherwise identical) staging cluster (maybe it takes a certain level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154789#comment-15154789 ]

Jean-Francois Gosselin commented on CASSANDRA-9935:
---

We are doing range repair with https://github.com/spotify/cassandra-reaper. We don't use incremental repair. We also see the issue with: nodetool repair -pr

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
> Reporter: mlowicki
> Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
>     at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
>     at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (806371695398849,8065203836608925992] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least two times in attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range (5765414319217852786,5781018794516851576] failed with error org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80]
>     at
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15145336#comment-15145336 ]

Jean-Francois Gosselin commented on CASSANDRA-9935:
---

[~yukim] What's the next step to troubleshoot this issue? Any specific log we could enable at DEBUG?

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
> Reporter: mlowicki
> Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
>     at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
>     at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (806371695398849,8065203836608925992] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,957 StorageService.java:2952 - Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> {code}
> but a bit above I see (at least two times in attached log):
> {code}
> ERROR [Thread-173887] 2015-07-29 20:44:03,853 StorageService.java:2959 - Repair session 1b07ea50-3608-11e5-a93e-4963524a8bde for range (5765414319217852786,5781018794516851576] failed with error org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #1b07ea50-3608-11e5-a93e-4963524a8bde on sync/entity_by_id2, (5765414319217852786,5781018794516851576]] Validation failed in /10.195.15.162
>     at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_80]
>     at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_80]
>     at
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143400#comment-15143400 ]

Jean-Francois Gosselin commented on CASSANDRA-9935:
---
OK, from 172.16.63.39, same error "received out of order wrt DecoratedKey":
{noformat}
ERROR [ValidationExecutor:118] 2016-02-11 17:21:27,512 Validator.java:245 - Failed creating a merkle tree for [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b on foo/bar, (-5525881226490706160,-5525442713957813067]], /10.174.216.158 (see log for details)
ERROR [ValidationExecutor:118] 2016-02-11 17:21:27,516 CassandraDaemon.java:223 - Exception in thread Thread[ValidationExecutor:118,1,main]
java.lang.AssertionError: row DecoratedKey(-5525725068665570338, 0010e3a74bf82717394598e2b7421c89382e250265336137346266382d323731372d333934352d393865322d62373432316338393338326510f64b1c2b7d1c3ff893b70c24c5dbdc6b00) received out of order wrt DecoratedKey(-5525444669477674618, 0010581499f0b99337e1bf468611fd0233e4250235383134393966302d623939332d333765312d626634362d3836313166643032653410f64b1c2b7d1c3ff893b70c24c5dbdc6b00)
    at org.apache.cassandra.repair.Validator.add(Validator.java:126) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1003) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:615) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
{noformat}

> Repair fails with RuntimeException
> --
>
> Key: CASSANDRA-9935
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9935
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.8, Debian Wheezy
> Reporter: mlowicki
> Assignee: Yuki Morishita
> Fix For: 2.1.x
>
> Attachments: db1.sync.lati.osa.cassandra.log, db5.sync.lati.osa.cassandra.log, system.log.10.210.3.117, system.log.10.210.3.221, system.log.10.210.3.230
>
>
> We had problems with slow repair in 2.1.7 (CASSANDRA-9702) but after upgrade to 2.1.8 it started to work faster but now it fails with:
> {code}
> ...
> [2015-07-29 20:44:03,956] Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> [2015-07-29 20:44:03,957] Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
> [2015-07-29 20:44:03,957] Repair session 4ccd8430-3632-11e5-a93e-4963524a8bde for range (-5372806541854279315,-5369354119480076785] finished
> [2015-07-29 20:44:03,957] Repair session 59f129f0-3632-11e5-a93e-4963524a8bde for range (8166489034383821955,8168408930184216281] finished
> [2015-07-29 20:44:03,957] Repair session 6ae7a9a0-3632-11e5-a93e-4963524a8bde for range (6084602890817326921,6088328703025510057] finished
> [2015-07-29 20:44:03,957] Repair session 8938e4a0-3632-11e5-a93e-4963524a8bde for range (-781874602493000830,-781745173070807746] finished
> [2015-07-29 20:44:03,957] Repair command #4 finished
> error: nodetool failed, check server logs
> -- StackTrace --
> java.lang.RuntimeException: nodetool failed, check server logs
>     at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:290)
>     at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:202)
> {code}
> After running:
> {code}
> nodetool repair --partitioner-range --parallel --in-local-dc sync
> {code}
> Last records in logs regarding repair are:
> {code}
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 09ff9e40-3632-11e5-a93e-4963524a8bde for range (-7695808664784761779,-7693529816291585568] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 17d8d860-3632-11e5-a93e-4963524a8bde for range (806371695398849,8065203836608925992] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 23a811b0-3632-11e5-a93e-4963524a8bde for range (-5474076923322749342,-5468600594078911162] finished
> INFO [Thread-173887] 2015-07-29 20:44:03,956 StorageService.java:2952 - Repair session 336f8740-3632-11e5-a93e-4963524a8bde for range (-8631877858109464676,-8624040066373718932] finished
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143270#comment-15143270 ]

Jean-Francois Gosselin commented on CASSANDRA-9935:
---
[~yukim] Yesterday we ran nodetool scrub on all the nodes and restarted them. No luck: we're still getting "received out of order wrt DecoratedKey". Any suggestions for the next step?
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143307#comment-15143307 ]

Jean-Francois Gosselin commented on CASSANDRA-9935:
---
Here's a new one with no clear message from the exception:
{noformat}
INFO [AntiEntropyStage:1] 2016-02-11 17:21:20,947 RepairSession.java:171 - [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b] Received merkle tree for bar from /10.53.10.30
ERROR [AntiEntropySessions:28] 2016-02-11 17:21:21,033 RepairSession.java:303 - [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b] session completed with the following error
org.apache.cassandra.exceptions.RepairException: [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b on foo/bar, (-5525881226490706160,-5525442713957813067]] Validation failed in /172.16.63.39
    at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
ERROR [AntiEntropySessions:28] 2016-02-11 17:21:21,034 CassandraDaemon.java:223 - Exception in thread Thread[AntiEntropySessions:28,5,RMI Runtime]
java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b on foo/bar, (-5525881226490706160,-5525442713957813067]] Validation failed in /172.16.63.39
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_65]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: org.apache.cassandra.exceptions.RepairException: [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b on foo/bar, (-5525881226490706160,-5525442713957813067]] Validation failed in /172.16.63.39
    at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:406) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:134) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-2.1.9.jar:2.1.9]
    ... 3 common frames omitted
ERROR [Thread-20728] 2016-02-11 17:21:21,034 StorageService.java:2966 - Repair session d78e02b0-d0e3-11e5-a04a-4ffa10ef584b for range (-5525881226490706160,-5525442713957813067] failed with error org.apache.cassandra.exceptions.RepairException: [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b on foo/bar, (-5525881226490706160,-5525442713957813067]] Validation failed in /172.16.63.39
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b on foo/bar, (-5525881226490706160,-5525442713957813067]] Validation failed in /172.16.63.39
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_65]
    at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_65]
    at org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2957) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) [apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [na:1.7.0_65]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [na:1.7.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: [repair #d78e02b0-d0e3-11e5-a04a-4ffa10ef584b on foo/bar,
[jira] [Commented] (CASSANDRA-10769) "received out of order wrt DecoratedKey" after scrub
[ https://issues.apache.org/jira/browse/CASSANDRA-10769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140965#comment-15140965 ]

Jean-Francois Gosselin commented on CASSANDRA-10769:

We are also seeing this issue in our multi-datacenter cluster (3 DCs), C* 2.1.9 (and using LCS). We ran nodetool scrub on all the nodes but the error keeps coming back. How can we get into this state?
{noformat}
ERROR [ValidationExecutor:5884] 2016-02-03 09:27:41,703 Validator.java:245 - Failed creating a merkle tree for [repair #a8f3f040-ca58-11e5-9dda-130298de45de on keyspace1/xyz, (5126461213031423923,5128334161692376535]], /10.174.216.163 (see log for details)
ERROR [ValidationExecutor:5884] 2016-02-03 09:27:41,704 CassandraDaemon.java:223 - Exception in thread Thread[ValidationExecutor:5884,1,main]
java.lang.AssertionError: row DecoratedKey(5126475305931285312, 00103cee13c2c0ea38328138fcad86515eef250233636565313363322d633065612d333833322d383133382d666361643836353135656566105cc950f02b6239f0bf9af60ac7dd452400) received out of order wrt DecoratedKey(5128167525973821686, 00105fe2e7db8810387a9a2955a07ecfa7d3250235666532653764622d383831302d333837612d396132392d35356130376563666137643310f64b1c2b7d1c3ff893b70c24c5dbdc6b00)
    at org.apache.cassandra.repair.Validator.add(Validator.java:126) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1003) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:615) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
{noformat}

> "received out of order wrt DecoratedKey" after scrub
>
>
> Key: CASSANDRA-10769
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10769
> Project: Cassandra
> Issue Type: Bug
> Environment: C* 2.1.11, Debian Wheezy
> Reporter: mlowicki
>
> After running scrub and cleanup on all nodes in single data center I'm getting:
> {code}
> ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,530 Validator.java:245 - Failed creating a merkle tree for [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]], /10.210.3.221 (see log for details)
> ERROR [ValidationExecutor:103] 2015-11-25 06:28:21,531 CassandraDaemon.java:227 - Exception in thread Thread[ValidationExecutor:103,1,main]
> java.lang.AssertionError: row DecoratedKey(-5867787467868737053, 000932373633313036313204808800) received out of order wrt DecoratedKey(-5865937851627253360, 000933313230313737333204c3c700)
>     at org.apache.cassandra.repair.Validator.add(Validator.java:127) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1010) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:622) ~[apache-cassandra-2.1.11.jar:2.1.11]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_80]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
> {code}
> What I did is to run repair on other node:
> {code}
> time nodetool repair --in-local-dc
> {code}
> Corresponding log on the node where repair has been started:
> {code}
> ERROR [AntiEntropySessions:414] 2015-11-25 06:28:21,533 RepairSession.java:303 - [repair #89fa2b70-933d-11e5-b036-75bb514ae072] session completed with the following error
> org.apache.cassandra.exceptions.RepairException: [repair #89fa2b70-933d-11e5-b036-75bb514ae072 on sync/entity_by_id2, (-5867793819051725444,-5865919628027816979]] Validation failed in /10.210.3.117
>     at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
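
The assertion message quoted in this thread suggests that validation expects rows to arrive in increasing key order. As a minimal sketch only, assuming keys are compared by their token (the first number printed in each DecoratedKey) and using a hypothetical helper named `firstOutOfOrder` that is not part of Cassandra, the check can be illustrated with the two tokens from the log above:

```java
import java.util.OptionalInt;

class TokenOrderCheck {
    /**
     * Hypothetical helper (not Cassandra code): returns the index of the first
     * token that is not strictly greater than the one before it, if any.
     */
    public static OptionalInt firstOutOfOrder(long[] tokens) {
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i] <= tokens[i - 1]) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Tokens copied from the AssertionError above: the row with token
        // 5126475305931285312 was received after 5128167525973821686.
        long[] fromLog = { 5128167525973821686L, 5126475305931285312L };
        System.out.println(firstOutOfOrder(fromLog));
    }
}
```

The second token is smaller than the first, so the sketch flags index 1, which matches the shape of the reported error. It says nothing about why the keys were read in that order.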
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141245#comment-15141245 ]

Jean-Francois Gosselin commented on CASSANDRA-9935:
---
[~yukim] Should the WARN message appear in the C* log or on the stdout of nodetool?
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141341#comment-15141341 ]

Jean-Francois Gosselin commented on CASSANDRA-9935:
---
No, we haven't seen this WARN. The only thing we haven't tried is a node restart (based on your comment above: "... The latter may be fixed by restarting the node."). I'm not sure it will fix the problem, though, since we've used C* 2.1.9 from the beginning.
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139500#comment-15139500 ]

Jean-Francois Gosselin commented on CASSANDRA-9625:
---
It's not fixed. I ended up adding a catch for the AssertionError in the GraphiteReporter as a workaround.

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
> Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
> Reporter: Eric Evans
> Attachments: metrics.yaml, thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops working. The usual startup is logged, and one batch of samples is sent, but the reporting interval comes and goes, and no other samples are ever sent. The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment on 2.1.6; we are able to reproduce this on all 6 of our production nodes, but not on a 3 node (otherwise identical) staging cluster (maybe it takes a certain level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
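
The actual patch behind the workaround mentioned in this comment is not shown in the thread. As a rough sketch only, assuming the reporter runs as a periodic task on a `ScheduledExecutorService` (which silently cancels all later runs once one run throws), a guard of this kind keeps the schedule alive. The names `GuardedReport`, `guarded`, and `flakyReport` are hypothetical and are not part of Cassandra or the metrics library:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class GuardedReport {
    /**
     * Hypothetical wrapper: catches AssertionError from one run so that the
     * executor does not suppress the following runs.
     */
    public static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (AssertionError e) {
                // Log and skip this cycle; the next one is still scheduled.
                System.err.println("report skipped: " + e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        // Stand-in for a reporting cycle that fails once, on its second run.
        Runnable flakyReport = () -> {
            if (runs.incrementAndGet() == 2) {
                throw new AssertionError("simulated failure");
            }
        };
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(guarded(flakyReport), 0, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(400);
        ses.shutdownNow();
        System.out.println("runs = " + runs.get());
    }
}
```

Without the guard, the simulated failure on the second run would stop the schedule with no further output, which would be consistent with the symptom described in the issue (one batch sent, then nothing, and no errors in the logs). Whether that is the real cause here is not confirmed by the thread.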
[jira] [Commented] (CASSANDRA-9935) Repair fails with RuntimeException
[ https://issues.apache.org/jira/browse/CASSANDRA-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134437#comment-15134437 ]

Jean-Francois Gosselin commented on CASSANDRA-9935:
---
[~yukim] We are also seeing this issue in our multi-datacenter cluster (3 DCs), C* 2.1.9 (and using LCS). We ran nodetool scrub on all the nodes but the error keeps coming back. We did have some network glitches, as [~mlowicki] was saying. Can it be related to network issues?
{noformat}
ERROR [ValidationExecutor:5884] 2016-02-03 09:27:41,703 Validator.java:245 - Failed creating a merkle tree for [repair #a8f3f040-ca58-11e5-9dda-130298de45de on keyspace1/xyz, (5126461213031423923,5128334161692376535]], /10.174.216.163 (see log for details)
ERROR [ValidationExecutor:5884] 2016-02-03 09:27:41,704 CassandraDaemon.java:223 - Exception in thread Thread[ValidationExecutor:5884,1,main]
java.lang.AssertionError: row DecoratedKey(5126475305931285312, 00103cee13c2c0ea38328138fcad86515eef250233636565313363322d633065612d333833322d383133382d666361643836353135656566105cc950f02b6239f0bf9af60ac7dd452400) received out of order wrt DecoratedKey(5128167525973821686, 00105fe2e7db8810387a9a2955a07ecfa7d3250235666532653764622d383831302d333837612d396132392d35356130376563666137643310f64b1c2b7d1c3ff893b70c24c5dbdc6b00)
    at org.apache.cassandra.repair.Validator.add(Validator.java:126) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1003) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:94) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:615) ~[apache-cassandra-2.1.9.jar:2.1.9]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_65]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
{noformat}
[jira] [Commented] (CASSANDRA-10502) Cassandra query degradation with high frequency updated tables
[ https://issues.apache.org/jira/browse/CASSANDRA-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086025#comment-15086025 ]

Jean-Francois Gosselin commented on CASSANDRA-10502:

[~thobbs] Have you tried dumping the data for this key with sstable2json?

> Cassandra query degradation with high frequency updated tables
> --
>
> Key: CASSANDRA-10502
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10502
> Project: Cassandra
> Issue Type: Bug
> Reporter: Dodong Juan
> Priority: Minor
> Labels: perfomance, query, triage
> Fix For: 2.2.x
>
>
> Hi,
> So we are developing a system that computes a profile of the things that it observes. The observations come in the form of events. Each thing that it observes has an id, and each thing has a set of subthings in it, each of which has a measurement of some kind. Roughly, there are about 500 subthings within each thing. We receive events containing measurements of these 500 subthings every 10 seconds or so.
> So as we receive events, we read the old profile value, calculate the new profile based on the new value, and save it back.
> One of the things we observe is the processes running on the server.
> We use the following schema to hold the profile.
> {noformat}
> CREATE TABLE processinfometric_profile (
>     profilecontext text,
>     id text,
>     month text,
>     day text,
>     hour text,
>     minute text,
>     command text,
>     cpu map,
>     majorfaults map,
>     minorfaults map,
>     nice map,
>     pagefaults map,
>     pid map,
>     ppid map,
>     priority map,
>     resident map,
>     rss map,
>     sharesize map,
>     size map,
>     starttime map,
>     state map,
>     threads map,
>     user map,
>     vsize map,
>     PRIMARY KEY ((profilecontext, id, month, day, hour, minute), command)
> ) WITH CLUSTERING ORDER BY (command ASC)
>     AND bloom_filter_fp_chance = 0.1
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99.0PERCENTILE';
> {noformat}
> This profile will then be used for certain analytics, either in the context of the ‘thing’ or in the context of a specific thing and subthing.
> A profile can be defined as monthly, daily, or hourly. In the monthly case, the month will be set to the current month (i.e. ‘Oct’) and the day and hour will be set to the empty string ‘’.
> The problem that we have observed is that over time (actually in just a matter of hours) we see a huge degradation of query response time for the monthly profile. At the start it responds in 10-100 ms, and after a couple of hours it goes to 2000-3000 ms. If you leave it for a couple of days you will start experiencing read timeouts. The query is basically just:
> {noformat}
> select * from myprofile where id=‘1’ and month=‘Oct’ and day=‘’ and hour=‘' and minute=''
> {noformat}
> This will have only about 500 rows or so.
> We were using Cassandra 2.2.1, but upgraded to 2.2.2 to see if it fixed the issue, to no avail. And since this is a test, we are running on a single node.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943468#comment-14943468 ]

Jean-Francois Gosselin commented on CASSANDRA-9625:
---

I ran into another assertion that breaks the GraphiteReporter on 2.1.9. When SSTableReader.getApproximateKeyCount is called, how can we end up in a state where the CompactionMetadata is null?

{code:title=SSTableReader.java|borderStyle=solid}
276 try
278 {
279     CompactionMetadata metadata = (CompactionMetadata) sstable.descriptor.getMetadataSerializer().deserialize(sstable.descriptor, MetadataType.COMPACTION);
280     assert metadata != null : sstable.getFilename();
281     if (cardinality == null)
{code}

{noformat}
    at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:279)
    at org.apache.cassandra.metrics.ColumnFamilyMetrics$9.value(ColumnFamilyMetrics.java:296)
    at org.apache.cassandra.metrics.ColumnFamilyMetrics$9.value(ColumnFamilyMetrics.java:290)
    at com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:292)
    at com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:27)
    at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
    at com.yammer.metrics.reporting.GraphiteReporter.printRegularMetrics(GraphiteReporter.java:235)
    at com.yammer.metrics.reporting.GraphiteReporter.run(GraphiteReporter.java:199)
{noformat}

> GraphiteReporter not reporting
> --
>
> Key: CASSANDRA-9625
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
> Project: Cassandra
> Issue Type: Bug
> Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
> Reporter: Eric Evans
> Attachments: metrics.yaml, thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops working. The usual startup is logged, and one batch of samples is sent, but the reporting interval comes and goes, and no other samples are ever sent. The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment on 2.1.6; we are able to reproduce this on all 6 of our production nodes, but not on a 3-node (otherwise identical) staging cluster (maybe it takes a certain level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.
[jira] [Comment Edited] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943468#comment-14943468 ]

Jean-Francois Gosselin edited comment on CASSANDRA-9625 at 10/5/15 2:51 PM:

I ran into another assert that breaks the GraphiteReporter on 2.1.9. When SSTableReader.getApproximateKeyCount is called, how can I get in a state where the CompactionMetadata is null?

{code:title=SSTableReader.java|borderStyle=solid}
276 try
278 {
279     CompactionMetadata metadata = (CompactionMetadata) sstable.descriptor.getMetadataSerializer().deserialize(sstable.descriptor, MetadataType.COMPACTION);
280     assert metadata != null : sstable.getFilename();
281     if (cardinality == null)
{code}

{noformat}
    at org.apache.cassandra.io.sstable.SSTableReader.getApproximateKeyCount(SSTableReader.java:279)
    at org.apache.cassandra.metrics.ColumnFamilyMetrics$9.value(ColumnFamilyMetrics.java:296)
    at org.apache.cassandra.metrics.ColumnFamilyMetrics$9.value(ColumnFamilyMetrics.java:290)
    at com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:292)
    at com.yammer.metrics.reporting.GraphiteReporter.processGauge(GraphiteReporter.java:27)
    at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
    at com.yammer.metrics.reporting.GraphiteReporter.printRegularMetrics(GraphiteReporter.java:235)
    at com.yammer.metrics.reporting.GraphiteReporter.run(GraphiteReporter.java:199)
{noformat}
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715194#comment-14715194 ]

Jean-Francois Gosselin commented on CASSANDRA-9625:
---

[~benedict] Can this issue be reopened?

GraphiteReporter not reporting
--

Key: CASSANDRA-9625
URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
Project: Cassandra
Issue Type: Bug
Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
Reporter: Eric Evans
Assignee: T Jake Luciani
Attachments: metrics.yaml, thread-dump.log
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711887#comment-14711887 ]

Jean-Francois Gosselin commented on CASSANDRA-9625:
---

[~tjake] I think I've found the issue. When the gauge for CompressionMetadataOffHeapMemoryUsed is read, the following method in org.apache.cassandra.io.util.Memory is invoked:

{code:title=org.apache.cassandra.io.util.Memory.java|borderStyle=solid}
public long size()
{
    assert peer != 0;
    return size;
}
{code}

For some reason peer was 0, so the assertion failed. After the AssertionError, the Graphite metrics reporter thread is no longer executed.

GraphiteReporter not reporting
--

Key: CASSANDRA-9625
URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
Project: Cassandra
Issue Type: Bug
Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
Reporter: Eric Evans
Assignee: T Jake Luciani
Attachments: metrics.yaml, thread-dump.log
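
The symptom reported in the comment above (one failed assertion, then no further reports) matches the documented behaviour of periodic tasks on a java.util.concurrent.ScheduledExecutorService: once an execution throws, subsequent executions are suppressed. The following is a minimal sketch of that behaviour only. It assumes the reporter is driven by such a periodic task, which this thread does not confirm; the class SuppressedReporterDemo, the method runsBeforeSuppression, and the simulated failure are hypothetical names for illustration and are not Cassandra or metrics code.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of Cassandra or the metrics library.
final class SuppressedReporterDemo {

    /**
     * Schedules a periodic task that throws an AssertionError on run number
     * failOnRun, then returns the total number of runs that actually happened.
     */
    static int runsBeforeSuppression(int failOnRun) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        try {
            ScheduledFuture<?> future = executor.scheduleWithFixedDelay(() -> {
                if (runs.incrementAndGet() == failOnRun) {
                    // Stand-in for a gauge whose assertion fails (e.g. peer == 0).
                    throw new AssertionError("simulated failed assertion inside a gauge");
                }
            }, 0, 10, TimeUnit.MILLISECONDS);

            try {
                // For a periodic task, get() only returns once the task has died.
                future.get();
            } catch (ExecutionException e) {
                System.out.println("periodic task died: " + e.getCause());
            }

            // Wait well beyond the 10 ms delay: no further runs are scheduled.
            Thread.sleep(200);
            return runs.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("runs: " + runsBeforeSuppression(3));
    }
}
```

Note that nothing is logged unless someone calls get() on the returned future, which would be consistent with the "logs are free from errors" observation in the issue description.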
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695853#comment-14695853 ]

Jean-Francois Gosselin commented on CASSANDRA-9625:
---

[~tjake] I can easily reproduce the issue after about 12 hours. Setting com.yammer.metrics.reporting to DEBUG didn't reveal anything. Are there any specific places where I should add traces in GraphiteReporter?

GraphiteReporter not reporting
--

Key: CASSANDRA-9625
URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
Project: Cassandra
Issue Type: Bug
Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
Reporter: Eric Evans
Assignee: T Jake Luciani
Attachments: metrics.yaml, thread-dump.log
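
One generic way to add the kind of trace asked about above is to wrap the periodic task so that anything it throws is logged before it can silently end the schedule. The sketch below is only an illustration under assumptions: LoggingRunnableDemo, logFailures, and keepsRunning are hypothetical names, java.util.logging stands in for whatever logger the node uses, and it does not show how GraphiteReporter itself is wired. Catching Throwable is deliberate here because an AssertionError is an Error, not an Exception.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical diagnostic helper, not part of Cassandra or the metrics library.
final class LoggingRunnableDemo {

    private static final Logger LOG = Logger.getLogger("reporter-trace");

    /** Wraps a task so that any Throwable is logged instead of propagating. */
    static Runnable logFailures(Runnable delegate) {
        return () -> {
            try {
                delegate.run();
            } catch (Throwable t) {
                // Diagnostic only: log the failure (including Errors) and swallow
                // it, so the scheduler keeps running the task.
                LOG.log(Level.SEVERE, "periodic task failed", t);
            }
        };
    }

    /** Returns true if a task that always fails still runs expectedRuns times. */
    static boolean keepsRunning(int expectedRuns) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(expectedRuns);
        try {
            executor.scheduleWithFixedDelay(logFailures(() -> {
                latch.countDown();
                throw new AssertionError("simulated gauge failure");
            }), 0, 5, TimeUnit.MILLISECONDS);
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("still running after failures: " + keepsRunning(5));
    }
}
```

With such a wrapper in place, the stack trace of the first failing run ends up in the log, which is the information that was missing at DEBUG level.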
[jira] [Commented] (CASSANDRA-9625) GraphiteReporter not reporting
[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682117#comment-14682117 ]

Jean-Francois Gosselin commented on CASSANDRA-9625:
---

We are seeing this issue on 2.1.8.

GraphiteReporter not reporting
--

Key: CASSANDRA-9625
URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
Project: Cassandra
Issue Type: Bug
Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
Reporter: Eric Evans
Assignee: T Jake Luciani
Fix For: 2.1.x
Attachments: metrics.yaml, thread-dump.log