[jira] [Commented] (CASSANDRA-6101) Debian init script broken
[ https://issues.apache.org/jira/browse/CASSANDRA-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785808#comment-13785808 ]

Anton Winter commented on CASSANDRA-6101:
-----------------------------------------

That {{service cassandra status}} problem is resolved by CASSANDRA-6090

Debian init script broken
-------------------------

Key: CASSANDRA-6101
URL: https://issues.apache.org/jira/browse/CASSANDRA-6101
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Anton Winter
Assignee: Eric Evans
Priority: Minor
Attachments: 6101-classpath.patch, 6101.txt

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (CASSANDRA-6101) Debian init script broken
[ https://issues.apache.org/jira/browse/CASSANDRA-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784838#comment-13784838 ]

Anton Winter commented on CASSANDRA-6101:
-----------------------------------------

Yes, that works as well.

Debian init script broken
-------------------------

Key: CASSANDRA-6101
URL: https://issues.apache.org/jira/browse/CASSANDRA-6101
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Anton Winter
Assignee: Eric Evans
Priority: Minor
Attachments: 6101-classpath.patch, 6101.txt
[jira] [Created] (CASSANDRA-6101) Debian init script broken
Anton Winter created CASSANDRA-6101:
------------------------------------

Summary: Debian init script broken
Key: CASSANDRA-6101
URL: https://issues.apache.org/jira/browse/CASSANDRA-6101
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Anton Winter
Priority: Minor

The debian init script released in 2.0.1 contains 2 issues:

# The pidfile directory is not created if it doesn't already exist.
# Classpath not exported to the start-stop-daemon.

These lead to the init script not picking up jna.jar, or anything from the debian EXTRA_CLASSPATH environment variable, and the init script not being able to stop/restart Cassandra.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
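The two fixes the report asks for can be sketched as follows. This is only an illustration written in Python, not the init script itself and not the attached patches; the helper names `ensure_pidfile_dir` and `daemon_environment` and the example paths are assumptions made for the sketch.

```python
import os
import tempfile


def ensure_pidfile_dir(pidfile):
    """Create the directory that will hold the pidfile if it is missing."""
    directory = os.path.dirname(pidfile)
    if directory:
        # exist_ok makes the call safe when the directory is already there.
        os.makedirs(directory, exist_ok=True)
    return directory


def daemon_environment(base_env, jars, extra_classpath=""):
    """Return a copy of base_env with CLASSPATH set for the child process.

    A value that only lives in the parent script, and is never placed in
    the environment handed to the launcher, is invisible to the daemon.
    """
    entries = list(jars)
    if extra_classpath:
        entries.extend(extra_classpath.split(os.pathsep))
    env = dict(base_env)
    env["CLASSPATH"] = os.pathsep.join(entries)
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical pidfile location, created under a temporary root.
        pidfile = os.path.join(root, "run", "cassandra", "cassandra.pid")
        print(ensure_pidfile_dir(pidfile))
    # Hypothetical jar paths, used only to show the resulting CLASSPATH.
    env = daemon_environment({}, ["/usr/share/java/jna.jar"], "/opt/extra/a.jar")
    print(env["CLASSPATH"])
```

In the sketch, the environment is built explicitly and passed to the child, which mirrors the point of the second item in the report: the classpath has to reach the process that the daemon launcher starts.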
[jira] [Commented] (CASSANDRA-4561) update column family fails
[ https://issues.apache.org/jira/browse/CASSANDRA-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450479#comment-13450479 ]

Anton Winter commented on CASSANDRA-4561:
-----------------------------------------

My schema is in agreement, but the timestamps aren't fixed, even though the log suggests otherwise (is that previously mentioned NPE related?).

{code}
[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
        89b22434-5e34-381d-83d1-2a3cde1482fe: [x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x, x]

[default@unknown]
{code}

Schema updates continue to silently fail.

update column family fails
--------------------------

Key: CASSANDRA-4561
URL: https://issues.apache.org/jira/browse/CASSANDRA-4561
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4
Reporter: Zenek Kraweznik
Assignee: Pavel Yaskevich
Fix For: 1.1.5
Attachments: CASSANDRA-4561.patch

[default@test] show schema;
create column family Messages
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 2
  and max_compaction_threshold = 4
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and compaction_strategy_options = {'sstable_size_in_mb' : '1024'}
  and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.DeflateCompressor'};

[default@test] update column family Messages with min_compaction_threshold = 4 and max_compaction_threshold = 32;
a5b7544e-1ef5-3bfd-8770-c09594e37ec2
Waiting for schema agreement...
... schemas agree across the cluster

[default@test] show schema;
create column family Messages
  with column_type = 'Standard'
  and comparator = 'AsciiType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.1
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 864000
  and min_compaction_threshold = 2
  and max_compaction_threshold = 4
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'KEYS_ONLY'
  and compaction_strategy_options = {'sstable_size_in_mb' : '1024'}
  and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' : 'org.apache.cassandra.io.compress.DeflateCompressor'};
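Because the update reports success while the second `show schema` is unchanged, the failure is only visible by comparing the requested values with what the schema shows afterwards. A minimal sketch of that comparison, assuming the `show schema` text has been captured as a string; `parse_cf_attributes` and `unapplied_changes` are hypothetical helpers written for this example and are not part of cassandra-cli:

```python
def parse_cf_attributes(statement):
    """Parse 'create column family X with a = b and c = d;' into a dict."""
    # Collapse line breaks and indentation so one- and multi-line forms match.
    body = " ".join(statement.split()).rstrip(";")
    _, _, attributes = body.partition(" with ")
    parsed = {}
    for clause in attributes.split(" and "):
        name, _, value = clause.partition(" = ")
        parsed[name.strip()] = value.strip()
    return parsed


def unapplied_changes(requested, schema_statement):
    """Return {attribute: (requested, current)} for values that did not take."""
    current = parse_cf_attributes(schema_statement)
    return {
        name: (wanted, current.get(name))
        for name, wanted in requested.items()
        if current.get(name) != wanted
    }


if __name__ == "__main__":
    # Abbreviated form of the schema shown after the update in the report.
    schema_after = (
        "create column family Messages with column_type = 'Standard' "
        "and min_compaction_threshold = 2 and max_compaction_threshold = 4 "
        "and compaction_strategy_options = {'sstable_size_in_mb' : '1024'};"
    )
    requested = {"min_compaction_threshold": "4", "max_compaction_threshold": "32"}
    print(unapplied_changes(requested, schema_after))
```

An empty result would mean every requested value is present in the schema; with the values from the report, both thresholds come back as unapplied.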
[jira] [Commented] (CASSANDRA-4561) update column family fails
[ https://issues.apache.org/jira/browse/CASSANDRA-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450670#comment-13450670 ]

Anton Winter commented on CASSANDRA-4561:
-----------------------------------------

Ring upgraded with the second patch and I am now able to perform schema updates. Thanks!

update column family fails
--------------------------

Key: CASSANDRA-4561
URL: https://issues.apache.org/jira/browse/CASSANDRA-4561
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4
Reporter: Zenek Kraweznik
Assignee: Pavel Yaskevich
Fix For: 1.1.5
Attachments: CASSANDRA-4561-CS.patch, CASSANDRA-4561.patch
[jira] [Commented] (CASSANDRA-4411) Assertion with LCS compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421061#comment-13421061 ]

Anton Winter commented on CASSANDRA-4411:
-----------------------------------------

I can also confirm that, after multiple offline sstablescrub runs across all nodes, several nodes (but not all), spread across multiple DCs, were still exhibiting this problem as described above by Mina.

In an attempt to work around the problem I shut down the affected instances, deleted all data and re-bootstrapped them as if they were dead nodes. The problem has not returned since, although it is still early days.

Assertion with LCS compaction
-----------------------------

Key: CASSANDRA-4411
URL: https://issues.apache.org/jira/browse/CASSANDRA-4411
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.2
Reporter: Anton Winter
Assignee: Sylvain Lebresne
Fix For: 1.1.3
Attachments: 0001-Add-debugging-info-for-LCS.txt, 4411-followup.txt, 4411.txt, assertion-w-more-debugging-info-omid.log, assertion.moreinfo.system.log, system.log

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-4411) Assertion with LCS compaction
Anton Winter created CASSANDRA-4411:
------------------------------------

Summary: Assertion with LCS compaction
Key: CASSANDRA-4411
URL: https://issues.apache.org/jira/browse/CASSANDRA-4411
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.2
Reporter: Anton Winter

As instructed in CASSANDRA-4321 I have raised this issue as a continuation of that issue as it appears the problem still exists.

I have repeatedly run sstablescrub across all my nodes after the 1.1.2 upgrade until sstablescrub shows no errors. The exceptions described in CASSANDRA-4321 do not occur as frequently now but the integrity check still throws exceptions on a number of nodes. Once those exceptions occur compactionstats shows a large number of pending tasks with no progression afterwards.

{code}
ERROR [CompactionExecutor:150] 2012-07-05 04:26:15,570 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:150,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
{code}
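When the same error has to be confirmed on a number of nodes, it can help to reduce each logged trace to its exception class and top frame before comparing. The following is a rough sketch under that assumption; `summarize_trace` and both regular expressions are written for this example and are not part of any Cassandra tooling.

```python
import re

# One stack frame: "at package.Class.method(File.java:123)".
FRAME = re.compile(r"\bat\s+([^\s()]+)\(([^\s():]+):(\d+)\)")
# A fully qualified class name ending in Error or Exception.
EXCEPTION = re.compile(r"\b((?:[a-z_]\w*\.)+[A-Z]\w*(?:Error|Exception))\b")


def summarize_trace(text):
    """Return the exception class, the top frame and the number of frames."""
    exception = EXCEPTION.search(text)
    frames = [
        (match.group(1), match.group(2), int(match.group(3)))
        for match in FRAME.finditer(text)
    ]
    return {
        "exception": exception.group(1) if exception else None,
        "top_frame": frames[0] if frames else None,
        "depth": len(frames),
    }


if __name__ == "__main__":
    # First lines of the trace quoted in the report.
    sample = (
        "ERROR [CompactionExecutor:150] 2012-07-05 04:26:15,570 "
        "AbstractCassandraDaemon.java (line 134) Exception in thread "
        "Thread[CompactionExecutor:150,1,main] java.lang.AssertionError "
        "at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214) "
        "at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)"
    )
    print(summarize_trace(sample))
```

Two traces with the same exception and the same top frame, such as the `LeveledManifest.promote` assertion above, can then be treated as the same failure even when thread names and timestamps differ.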
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406844#comment-13406844 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

New issue raised as requested: CASSANDRA-4411

stackoverflow building interval tree possible sstable corruptions
-----------------------------------------------------------------

Key: CASSANDRA-4321
URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
Fix For: 1.1.2
Attachments: 0001-Fix-overlapping-computation-v7.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 0003-Create-standalone-scrub-v7.txt, 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, ooyala-hastur-stacktrace.txt

After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowErrors resulting in compaction backlog and failure to restart.

The ring currently consists of 6 DCs and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DCs.

When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart.

The initial stack overflow error on a running instance looks like this:

ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main]
java.lang.StackOverflowError
    at java.util.Arrays.mergeSort(Arrays.java:1157)
    at java.util.Arrays.sort(Arrays.java:1092)
    at java.util.Collections.sort(Collections.java:134)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
[snip - this repeats until stack overflow. Compactions stop from this point onwards]

I restarted this failing instance with DEBUG logging enabled and it throws the following exception part way through startup:

ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.StackOverflowError
    at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
    at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
    at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
    at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
    at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
[snip - this repeats until stack overflow]
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39)
    at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
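The traces above were trimmed by hand where the same recursive frame repeats. A small sketch of doing the same thing mechanically, by run-length encoding consecutive identical frames; `collapse_repeats` and `render` are hypothetical helpers for this example, and the frame strings are shortened labels rather than full log lines.

```python
from itertools import groupby


def collapse_repeats(frames):
    """Run-length encode consecutive identical frames, keeping their order."""
    return [(frame, sum(1 for _ in run)) for frame, run in groupby(frames)]


def render(collapsed):
    """Format the collapsed frames, marking any frame that repeats."""
    lines = []
    for frame, count in collapsed:
        suffix = f"  [x{count}]" if count > 1 else ""
        lines.append(f"at {frame}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Shortened labels standing in for the frames of the first trace.
    frames = [
        "IntervalNode.java:114",
        "IntervalNode.java:49",
        "IntervalNode.java:62",
        "IntervalNode.java:62",
        "IntervalNode.java:62",
    ]
    print(render(collapse_repeats(frames)))
```

Only adjacent duplicates are merged, so an alternating pattern such as lines 62, 62, 64 in the second trace stays visible instead of being folded into a single count.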
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406254#comment-13406254 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

I have repeatedly run sstablescrub across all my nodes and the exceptions do not occur as frequently now; however, the integrity check still throws exceptions. compactionstats shows a large number of pending tasks but no progression after this error.

Should this ticket be reopened or a new one raised?

{code}
ERROR [CompactionExecutor:912] 2012-07-04 01:07:16,470 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:912,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
{code}

stackoverflow building interval tree possible sstable corruptions
-----------------------------------------------------------------

Key: CASSANDRA-4321
URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
Fix For: 1.1.2
Attachments: 0001-Fix-overlapping-computation-v7.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 0003-Create-standalone-scrub-v7.txt, 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, ooyala-hastur-stacktrace.txt
[jira] [Comment Edited] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406254#comment-13406254 ]

Anton Winter edited comment on CASSANDRA-4321 at 7/4/12 3:28 AM:
-----------------------------------------------------------------

I have repeatedly run sstablescrub across all my nodes and the exceptions do not occur as frequently now, however, the integrity check still throws exceptions and compactionstats shows a large number of pending tasks but no progression afterwards.

Should this ticket be reopened or a new one raised?

{code}
ERROR [CompactionExecutor:912] 2012-07-04 01:07:16,470 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:912,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
{code}

was (Author: awinter):
I have repeatedly run sstablescrub across all my nodes and the exceptions do not occur as frequently now, however, the integrity check still throw exceptions. compactionstats shows a large number of pending tasks but no progression after this error.

Should this ticket be reopened or a new one raised?

{code}
ERROR [CompactionExecutor:912] 2012-07-04 01:07:16,470 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:912,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
{code}

stackoverflow building interval tree possible sstable corruptions
-----------------------------------------------------------------

Key: CASSANDRA-4321
URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
Fix For: 1.1.2
Attachments: 0001-Fix-overlapping-computation-v7.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 0003-Create-standalone-scrub-v7.txt, 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403764#comment-13403764 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

I've applied the v7 patches and have successfully offline scrubbed and reinserted a number of nodes in my ring without further occurrence of the previous issues. Thanks :)

stackoverflow building interval tree possible sstable corruptions
-----------------------------------------------------------------

Key: CASSANDRA-4321
URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
Fix For: 1.1.2
Attachments: 0001-Fix-overlapping-computation-v7.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 0003-Create-standalone-scrub-v7.txt, 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404333#comment-13404333 ] Anton Winter commented on CASSANDRA-4321: -

Maybe I spoke too soon. Overnight I've seen the exceptions happen again on nodes that were patched with v7 and scrubbed.

{code}
ERROR [CompactionExecutor:1301] 2012-06-29 21:54:12,078 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:1301,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(116816802911061669023614481109871014436, 4faa631ca88ef85b8e26ddeb) >= current key DecoratedKey(115179899219377463875853982254751557438, 4fa892bf42d3f24479f627b6) writing into /var/lib//data/cassandra/KS/CF/KS-CF-tmp-hd-837655-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
{code}

stackoverflow building interval tree possible sstable corruptions --- Key: CASSANDRA-4321 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Anton Winter Assignee: Sylvain Lebresne Fix
For: 1.1.2 Attachments: 0001-Fix-overlapping-computation-v7.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 0003-Create-standalone-scrub-v7.txt, 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, ooyala-hastur-stacktrace.txt After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's. When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart. The initial stack overflow error on a running instance looks like this: ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main] java.lang.StackOverflowError at java.util.Arrays.mergeSort(Arrays.java:1157) at java.util.Arrays.sort(Arrays.java:1092) at java.util.Collections.sort(Collections.java:134) at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:49) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) [snip - this repeats until stack overflow. 
Compactions stop from this point onwards] I restarted this failing instance with DEBUG logging enabled and it throws the following exception part way through startup: ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main] java.lang.StackOverflowError at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307) at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276) at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230) at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124) at
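
For readers following the thread: the RuntimeException in the comment above is raised from SSTableWriter.beforeAppend, which suggests an ordering check made each time a row is appended to a new sstable. The sketch below is only an illustration of that kind of invariant, not Cassandra's code. The class name, the file name, and the exact comparison (rejecting any key that does not sort strictly after the previous one) are assumptions.

```python
class OutOfOrderKeyError(RuntimeError):
    """Raised when a key does not sort strictly after the previous one."""


class SortedWriter:
    """Hypothetical append-only writer that only accepts increasing keys."""

    def __init__(self, filename):
        self.filename = filename      # illustrative name, used in messages only
        self.last_written_key = None  # nothing written yet
        self.rows = []

    def append(self, key, value):
        # Assumed invariant: every key must be greater than the last one written.
        if self.last_written_key is not None and self.last_written_key >= key:
            raise OutOfOrderKeyError(
                f"Last written key {self.last_written_key} >= current key {key} "
                f"writing into {self.filename}"
            )
        self.rows.append((key, value))
        self.last_written_key = key


if __name__ == "__main__":
    writer = SortedWriter("example-tmp-Data.db")
    # The two tokens from the log above, in the order they were seen.
    writer.append(116816802911061669023614481109871014436, "row-1")
    try:
        writer.append(115179899219377463875853982254751557438, "row-2")
    except OutOfOrderKeyError as err:
        print(err)
```

With a check like this, a compaction that merges input sstables whose rows are not in token order would fail on the first key that goes backwards, which would match the symptom of compactions stopping.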
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404338#comment-13404338 ] Anton Winter commented on CASSANDRA-4321: - 1.1 dev branch + patches stackoverflow building interval tree possible sstable corruptions --- Key: CASSANDRA-4321 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Anton Winter Assignee: Sylvain Lebresne Fix For: 1.1.2 Attachments: 0001-Fix-overlapping-computation-v7.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 0003-Create-standalone-scrub-v7.txt, 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, ooyala-hastur-stacktrace.txt After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's. When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart. 
The initial stack overflow error on a running instance looks like this: ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main] java.lang.StackOverflowError at java.util.Arrays.mergeSort(Arrays.java:1157) at java.util.Arrays.sort(Arrays.java:1092) at java.util.Collections.sort(Collections.java:134) at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:49) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) [snip - this repeats until stack overflow. Compactions stop from this point onwards] I restarted this failing instance with DEBUG logging enabled and it throws the following exception part way through startup: ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main] java.lang.StackOverflowError at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307) at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276) at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230) at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124) at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:45) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) [snip - this repeats until stack overflow] at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64) at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalTree.init(IntervalTree.java:39) at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560) at
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13399760#comment-13399760 ] Anton Winter commented on CASSANDRA-4321: -

bq. Was I lucky? Are you guys able to reproduce those steps and still get more errors?

As discussed, but repeated here for the ticket's reference: I was patching and scrubbing in the same way as described above. Once the scrubbed nodes were restarted in the cluster, they were under normal read/write load and experienced the exceptions again. The fact that sstablescrub and the subsequent compactions run fine in Sylvain's test, using my out-of-order sstables, suggests that the sstablescrub command does its job. The root cause, originally expected to be resolved by the 0001 patch, still appears to be occurring, so Sylvain was going to investigate the code further.

stackoverflow building interval tree possible sstable corruptions --- Key: CASSANDRA-4321 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Anton Winter Assignee: Sylvain Lebresne Fix For: 1.1.2 Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's. When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for.
This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart. The initial stack overflow error on a running instance looks like this: ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main] java.lang.StackOverflowError at java.util.Arrays.mergeSort(Arrays.java:1157) at java.util.Arrays.sort(Arrays.java:1092) at java.util.Collections.sort(Collections.java:134) at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:49) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) [snip - this repeats until stack overflow. 
Compactions stop from this point onwards] I restarted this failing instance with DEBUG logging enabled and it throws the following exception part way through startup: ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main] java.lang.StackOverflowError at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307) at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276) at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230) at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124) at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:45) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) [snip - this repeats until stack overflow] at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at
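
As background to the repeated IntervalNode.init frames above, here is one way an out-of-order sstable could make a centered interval tree recurse without bound. This is a simplified sketch, not Cassandra's IntervalNode: the function name, the median choice, and the depth guard are all assumptions, and whether this is the exact code path hit in this ticket is not confirmed by the thread.

```python
def tree_height(intervals, depth=0, max_depth=50):
    """Height of a simplified centered interval tree over (lo, hi) pairs.

    Raises RecursionError if the recursion goes deeper than max_depth,
    which stands in for the JVM's StackOverflowError.
    """
    if not intervals:
        return 0
    if depth >= max_depth:
        raise RecursionError("partition did not shrink; tree cannot be built")

    # Use the median of all endpoints as the center of this node.
    points = sorted(p for interval in intervals for p in interval)
    center = points[len(points) // 2]

    left, right = [], []
    for lo, hi in intervals:
        if hi < center:
            left.append((lo, hi))    # entirely below the center
        elif lo > center:
            right.append((lo, hi))   # entirely above the center
        # otherwise the interval overlaps the center and stays at this node

    return 1 + max(
        tree_height(left, depth + 1, max_depth),
        tree_height(right, depth + 1, max_depth),
    )


if __name__ == "__main__":
    print(tree_height([(1, 5), (2, 3), (6, 9)]))  # well-formed input terminates
    try:
        tree_height([(10, 1)])  # inverted interval: first key > last key
    except RecursionError as err:
        print("inverted interval:", err)
```

For well-formed intervals (lo <= hi) the interval that owns the median endpoint always stays at the current node, so both sub-lists shrink. An inverted interval such as (10, 1) is pushed to the left on every level with the same center, so the list never shrinks. That would be consistent with scrubbing out-of-order sstables making the stack overflow go away while leaving the underlying cause in place.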
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398206#comment-13398206 ] Anton Winter commented on CASSANDRA-4321: -

After working around the issue with the 0003 v5 patch that Omid refers to, I've had an sstablescrub complete on one of my servers. sstablescrub did detect several overlapping sstables, resetting them to L0, but no out-of-order keys. However, the "Last written key DecoratedKey >= current key" exception resurfaces after the first set of compactions, 5 minutes after startup, in exactly the same manner as before. The same exception occurs for various CFs until compactions stop completely. compactionstats still shows a large number of pending compaction tasks after this event.

stackoverflow building interval tree possible sstable corruptions --- Key: CASSANDRA-4321 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Anton Winter Assignee: Sylvain Lebresne Fix For: 1.1.2 Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's. When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart.
The initial stack overflow error on a running instance looks like this: ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main] java.lang.StackOverflowError at java.util.Arrays.mergeSort(Arrays.java:1157) at java.util.Arrays.sort(Arrays.java:1092) at java.util.Collections.sort(Collections.java:134) at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:49) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) [snip - this repeats until stack overflow. Compactions stop from this point onwards] I restarted this failing instance with DEBUG logging enabled and it throws the following exception part way through startup: ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main] java.lang.StackOverflowError at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307) at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276) at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230) at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124) at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:45) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) [snip - this repeats until stack overflow] at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64) at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at
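
The comment above mentions sstablescrub detecting overlapping sstables and resetting them to L0. A rough sketch of what such a check could look like follows. It is purely illustrative: the function name, the (name, first_key, last_key) tuples, and the rule of keeping the earlier sstable and demoting the later one are assumptions, not the behaviour of the attached patches.

```python
def split_overlapping(level):
    """Split one level's sstables into those kept and those sent back to L0.

    `level` is a list of (name, first_key, last_key) tuples. Within a level
    above L0 the key ranges are expected not to overlap, so any sstable whose
    first key falls inside the previously kept range is demoted.
    """
    kept, demoted = [], []
    prev_last = None
    for name, first, last in sorted(level, key=lambda s: s[1]):
        if prev_last is not None and first <= prev_last:
            demoted.append(name)   # overlaps the previously kept sstable
        else:
            kept.append(name)
            prev_last = last
    return kept, demoted


if __name__ == "__main__":
    level_1 = [("a", 0, 10), ("b", 5, 15), ("c", 11, 20)]
    print(split_overlapping(level_1))
```

Demoting rather than deleting is what keeps this kind of repair cheap: nothing is rewritten, and the demoted sstables are simply compacted back up from L0 later.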
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396433#comment-13396433 ] Anton Winter commented on CASSANDRA-4321: -

I can confirm I also experienced the Unexpected empty index file errors on some of the nodes that I have run sstablescrub on. On some other nodes the sstablescrub command appears to complete successfully, but compactions still stop at the java.lang.RuntimeException: Last written key DecoratedKey error. Is there any further information we can supply to help debug?

stackoverflow building interval tree possible sstable corruptions --- Key: CASSANDRA-4321 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Anton Winter Assignee: Sylvain Lebresne Fix For: 1.1.2 Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's. When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart.
[jira] [Comment Edited] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396433#comment-13396433 ] Anton Winter edited comment on CASSANDRA-4321 at 6/19/12 2:01 AM: -- I can confirm I also experienced the Unexpected empty index file errors on some of the nodes that I have run sstablescrub on. Other nodes had this error when running sstablescrub: {code} Scrub of SSTableReader(path='/var/lib//data/cassandra/KS/CF/KS-CF-hd-259648-Data.db') complete: 1592 rows in new sstable and 0 empty (tombstoned) rows dropped EOF after 6 bytes out of 8 {code} Compactions stop with the java.lang.RuntimeException: Last written key DecoratedKey error on the nodes affected by either of the above 2 errors . Nodes that seem to have been repaired by the sstablescrub still continue to have java.lang.RuntimeException: Last written key DecoratedKey errors scattered through the logs but are still be compacting. Is there any further information we can supply to help debug? was (Author: awinter): I can confirm I also experienced the Unexpected empty index file errors on some of the nodes that I have run sstablescrub on. On some other nodes the sstablescrub command appears to complete successfully but compactions still stops at the java.lang.RuntimeException: Last written key DecoratedKey error. Is there any further information we can supply to help debug? 
stackoverflow building interval tree possible sstable corruptions --- Key: CASSANDRA-4321 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Anton Winter Assignee: Sylvain Lebresne Fix For: 1.1.2 Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's. When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart. 
[jira] [Comment Edited] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396433#comment-13396433 ] Anton Winter edited comment on CASSANDRA-4321 at 6/19/12 2:02 AM: --

I can confirm I also experienced the Unexpected empty index file errors on some of the nodes that I have run sstablescrub on. Other nodes had this error when running sstablescrub:

{code}
Scrub of SSTableReader(path='/var/lib//data/cassandra/KS/CF/KS-CF-hd-259648-Data.db') complete: 1592 rows in new sstable and 0 empty (tombstoned) rows dropped
EOF after 6 bytes out of 8
{code}

Compactions stop with the java.lang.RuntimeException: Last written key DecoratedKey error on the nodes affected by either of the above 2 errors. Nodes that seem to have been repaired by the sstablescrub still continue to have java.lang.RuntimeException: Last written key DecoratedKey errors scattered through the logs but are still compacting. Is there any further information we can supply to help debug?

was (Author: awinter):

I can confirm I also experienced the Unexpected empty index file errors on some of the nodes that I have run sstablescrub on. Other nodes had this error when running sstablescrub:

{code}
Scrub of SSTableReader(path='/var/lib//data/cassandra/KS/CF/KS-CF-hd-259648-Data.db') complete: 1592 rows in new sstable and 0 empty (tombstoned) rows dropped
EOF after 6 bytes out of 8
{code}

Compactions stop with the java.lang.RuntimeException: Last written key DecoratedKey error on the nodes affected by either of the above 2 errors . Nodes that seem to have been repaired by the sstablescrub still continue to have java.lang.RuntimeException: Last written key DecoratedKey errors scattered through the logs but are still be compacting. Is there any further information we can supply to help debug?
stackoverflow building interval tree possible sstable corruptions --- Key: CASSANDRA-4321 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Anton Winter Assignee: Sylvain Lebresne Fix For: 1.1.2 Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's. When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart. 
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293412#comment-13293412 ] Anton Winter commented on CASSANDRA-4321: -

If I use the v2 patch, startup stops with the following:

{code}
INFO [main] 2012-06-12 14:23:33,899 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
INFO [FlushWriter:2] 2012-06-12 14:23:33,903 Memtable.java (line 266) Writing Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
ERROR [FlushWriter:2] 2012-06-12 14:23:33,953 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[FlushWriter:2,5,main]
java.lang.RuntimeException: Last written key null >= current key DecoratedKey(61078635599166706937511052402724559481, 4c) writing into /var/lib//cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-65597-Data.db
{code}

Given the above, I scrubbed the system keyspace, which removed all sstables and left only the snapshots, e.g.:

{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java (line 651) Row at 100 is unreadable; skipping to next
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java (line 602) Non-fatal error reading row (stacktrace follows)
java.lang.RuntimeException: Last written key null >= current key DecoratedKey(135285944860343992175601105924967452217, 63716c) writing into /var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-37-Data.db
{code}

This eventually results in:

{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,674 CompactionManager.java (line 692) No valid rows found while scrubbing SSTableReader(path='/var/lib//data/cassandra/system/Versions/system-Versions-hd-35-Data.db'); it is marked for deletion now. If you want to attempt manual recovery, you can find a copy in the pre-scrub snapshot
{code}

A clean bootstrap also stops with similar errors:

{code}
java.lang.RuntimeException: Last written key null >= current key DecoratedKey(61078635599166706937511052402724559481, 4c) writing into /var/lib//data/cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-1-Data.db
{code}

and

{code}
java.lang.RuntimeException: Last written key null >= current key DecoratedKey(93220794208128599841715671226150005828, 746872696674) writing into /var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-1-Data.db
{code}

stackoverflow building interval tree possible sstable corruptions --- Key: CASSANDRA-4321 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Reporter: Anton Winter Assignee: Sylvain Lebresne Fix For: 1.1.2 Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows.txt After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's. When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart.
The initial stack overflow error on a running instance looks like this: ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main] java.lang.StackOverflowError at java.util.Arrays.mergeSort(Arrays.java:1157) at java.util.Arrays.sort(Arrays.java:1092) at java.util.Collections.sort(Collections.java:134) at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:49) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62) [snip - this repeats until stack overflow. Compactions stop from this point onwards] I restarted this failing instance with DEBUG logging
[jira] [Comment Edited] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293412#comment-13293412 ]

Anton Winter edited comment on CASSANDRA-4321 at 6/12/12 7:46 AM:
------------------------------------------------------------------

If I use the v2 patch startup stops with the following:

{code}
INFO [main] 2012-06-12 14:23:33,899 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
INFO [FlushWriter:2] 2012-06-12 14:23:33,903 Memtable.java (line 266) Writing Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
ERROR [FlushWriter:2] 2012-06-12 14:23:33,953 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[FlushWriter:2,5,main]
java.lang.RuntimeException: Last written key null = current key DecoratedKey(61078635599166706937511052402724559481, 4c) writing into /var/lib//cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-65597-Data.db
{code}

Given the above I (probably incorrectly) scrubbed the system keyspace which removed all sstables, leaving only the snapshots eg:

{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java (line 651) Row at 100 is unreadable; skipping to next
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java (line 602) Non-fatal error reading row (stacktrace follows)
java.lang.RuntimeException: Last written key null = current key DecoratedKey(135285944860343992175601105924967452217, 63716c) writing into /var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-37-Data.db
{code}

..eventually resulting in

{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,674 CompactionManager.java (line 692) No valid rows found while scrubbing SSTableReader(path='/var/lib//data/cassandra/system/Versions/system-Versions-hd-35-Data.db'); it is marked for deletion now.
If you want to attempt manual recovery, you can find a copy in the pre-scrub snapshot
{code}

A clean bootstrap also stops with similar errors:

{code}
java.lang.RuntimeException: Last written key null = current key DecoratedKey(61078635599166706937511052402724559481, 4c) writing into /var/lib//data/cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-1-Data.db
{code}

and

{code}
java.lang.RuntimeException: Last written key null = current key DecoratedKey(93220794208128599841715671226150005828, 746872696674) writing into /var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-1-Data.db
{code}

stackoverflow building interval tree possible sstable corruptions
[jira] [Created] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
Anton Winter created CASSANDRA-4321:
------------------------------------

Summary: stackoverflow building interval tree possible sstable corruptions
Key: CASSANDRA-4321
URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter

After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart. The ring currently consists of 6 DC's and 22 nodes using LCS compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's.

When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart.

The initial stack overflow error on a running instance looks like this:

ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main]
java.lang.StackOverflowError
    at java.util.Arrays.mergeSort(Arrays.java:1157)
    at java.util.Arrays.sort(Arrays.java:1092)
    at java.util.Collections.sort(Collections.java:134)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:49)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
[snip - this repeats until stack overflow. Compactions stop from this point onwards]

I restarted this failing instance with DEBUG logging enabled and it throws the following exception part way through startup:

ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.StackOverflowError
    at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
    at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
    at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
    at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
    at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:45)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
[snip - this repeats until stack overflow]
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
    at org.apache.cassandra.utils.IntervalTree.IntervalTree.init(IntervalTree.java:39)
    at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
    at org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:617)
    at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:320)
    at org.apache.cassandra.db.DataTracker.addInitialSSTables(DataTracker.java:259)
    at org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:234)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:331)
    at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:309)
    at org.apache.cassandra.db.Table.initCf(Table.java:367)
    at org.apache.cassandra.db.Table.init(Table.java:299)
    at
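The trace above shows IntervalNode construction recursing into itself until the stack is exhausted. The report does not say why the recursion fails to terminate, so the following is only a sketch of one way a centred interval tree build can loop forever: an interval whose start sorts after its end (for example from out-of-order rows) is pushed into the left child at every level and never consumed. The class below is a simplified, hypothetical stand-in written in Python, not Cassandra's IntervalNode, and the `max_depth` guard and `builds_ok` helper are additions for illustration.

```python
class IntervalNode:
    """Simplified centred interval tree node (illustrative only, not Cassandra's class)."""

    def __init__(self, intervals, depth=0, max_depth=100):
        # Explicit guard so non-termination is reported instead of exhausting the stack.
        if depth > max_depth:
            raise RecursionError("interval tree construction did not converge")
        # The centre is the median of all endpoints at this level.
        points = sorted(p for lo, hi in intervals for p in (lo, hi))
        self.center = points[len(points) // 2]
        self.intersects, left, right = [], [], []
        for lo, hi in intervals:
            if hi < self.center:
                left.append((lo, hi))      # entirely left of the centre
            elif lo > self.center:
                right.append((lo, hi))     # entirely right of the centre
            else:
                self.intersects.append((lo, hi))
        # Well-formed intervals (lo <= hi) always shrink here, because the
        # interval owning the centre point lands in `intersects`.
        self.left = IntervalNode(left, depth + 1, max_depth) if left else None
        self.right = IntervalNode(right, depth + 1, max_depth) if right else None


def builds_ok(intervals):
    """Return True if the tree can be built, False if construction never converges."""
    try:
        IntervalNode(intervals)
        return True
    except RecursionError:
        return False
```

With well-formed input such as `[(1, 3), (2, 5), (7, 9)]` the build finishes after one level of children. With the single inverted interval `[(10, 1)]` the centre is 10, the end point 1 is below it, and the same interval is handed to the left child at every level.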
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291520#comment-13291520 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

The partitioner (RP) was not changed.

stackoverflow building interval tree possible sstable corruptions
---
Key: CASSANDRA-4321
URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
[jira] [Created] (CASSANDRA-3194) repair streaming forwarding loop
repair streaming forwarding loop
--------------------------------

Key: CASSANDRA-3194
URL: https://issues.apache.org/jira/browse/CASSANDRA-3194
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.0.0
Reporter: Anton Winter

I am able to reproduce what appears to be a streaming forwarding loop when running repairs. This affects only nodes that use broadcast_address (the EC2 external IP) together with a listen_address of 0.0.0.0. (The configuration uses the property file snitch in a multi-DC NTS setup where some DCs are in EC2 and others are not.) The hosts in the other DCs, which do not use broadcast_address, do not experience this symptom.

On EC2 host dc1host1:

INFO [AntiEntropyStage:1] 2011-09-13 06:34:01,673 StreamingRepairTask.java (line 211) [streaming task #ce793c30-ddd1-11e0--071a4b76fefb] Received task from /0.0.0.0 to stream 12259 ranges to /external.ec2.ip.dc1host3
INFO [AntiEntropyStage:1] 2011-09-13 06:34:01,673 StreamingRepairTask.java (line 136) [streaming task #ce793c30-ddd1-11e0--071a4b76fefb] Forwarding streaming repair of 12259 ranges to /external.ec2.ip.of.dc1host1 (to be streamed with /external.ip.of.host3)

The above appears to trigger another streaming task and results in saturating the network interfaces of dc1host1. The above log entries are repeated until cassandra is killed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
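The log lines show dc1host1 receiving a task and then forwarding it to its own external address. The report does not identify the cause, so the sketch below only models one plausible reading: a node that knows itself solely by its listen address (0.0.0.0) does not recognise its broadcast address as local, and keeps forwarding the task back to itself. Every name here (`Node`, `deliver`, the `identity` set, the 203.0.113.1 documentation address) is hypothetical and written in Python for illustration; none of it is Cassandra's StreamingRepairTask code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    identity: frozenset  # addresses this node treats as "me" (assumption for the sketch)


def deliver(network, task_src, start, max_hops=10):
    """Follow a streaming task until some node accepts it as local.

    Returns the number of forwards that were needed, or None if the task is
    still being forwarded after max_hops (a forwarding loop).
    """
    node = start
    for hop in range(max_hops):
        if task_src in node.identity:
            return hop              # node recognises itself and streams locally
        node = network[task_src]    # otherwise forward to whoever owns task_src
    return None
```

In this model, a host whose identity is only `{"0.0.0.0"}` but which is reachable at `203.0.113.1` forwards a task for `203.0.113.1` to itself indefinitely, while a host whose identity also contains its broadcast address accepts the task with zero forwards.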