[jira] [Commented] (CASSANDRA-6101) Debian init script broken

2013-10-03 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785808#comment-13785808
 ] 

Anton Winter commented on CASSANDRA-6101:
-

That {{service cassandra status}} problem is resolved by CASSANDRA-6090

 Debian init script broken
 -

 Key: CASSANDRA-6101
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6101
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anton Winter
Assignee: Eric Evans
Priority: Minor
 Attachments: 6101-classpath.patch, 6101.txt


 The Debian init script released in 2.0.1 contains two issues:
 # The pidfile directory is not created if it doesn't already exist.
 # The classpath is not exported to start-stop-daemon.
 As a result, the init script does not pick up jna.jar or anything from the 
 Debian EXTRA_CLASSPATH environment variable, and it cannot stop or restart 
 Cassandra.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-6101) Debian init script broken

2013-10-02 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784838#comment-13784838
 ] 

Anton Winter commented on CASSANDRA-6101:
-

Yes, that works as well.



[jira] [Created] (CASSANDRA-6101) Debian init script broken

2013-09-26 Thread Anton Winter (JIRA)
Anton Winter created CASSANDRA-6101:
---

 Summary: Debian init script broken
 Key: CASSANDRA-6101
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6101
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Anton Winter
Priority: Minor


The Debian init script released in 2.0.1 contains two issues:

# The pidfile directory is not created if it doesn't already exist.
# The classpath is not exported to start-stop-daemon.

As a result, the init script does not pick up jna.jar or anything from the 
Debian EXTRA_CLASSPATH environment variable, and it cannot stop or restart 
Cassandra.
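
The two fixes described above might look roughly like the following inside an init script. This is only a minimal sketch and not the attached patch: the helper names `ensure_pid_dir` and `build_classpath` are hypothetical, and the real script may need additional steps (for example, changing the ownership of the directory) that are not shown here.

```shell
#!/bin/sh
# Hypothetical sketch of the two fixes. The function names are illustrative
# assumptions and are not taken from the Cassandra init script or its patches.

# 1. Make sure the directory that will hold the pidfile exists.
ensure_pid_dir() {
    pidfile="$1"
    piddir=$(dirname "$pidfile")
    [ -d "$piddir" ] || mkdir -p "$piddir"
}

# 2. Build the classpath, appending EXTRA_CLASSPATH when it is set.
#    The caller must export the result so that the process started by
#    start-stop-daemon inherits it from the environment.
build_classpath() {
    base="$1"
    if [ -n "${EXTRA_CLASSPATH:-}" ]; then
        printf '%s:%s\n' "$base" "$EXTRA_CLASSPATH"
    else
        printf '%s\n' "$base"
    fi
}
```

In a start function, these would be called before start-stop-daemon is invoked, for example `ensure_pid_dir "$PIDFILE"` followed by `CLASSPATH=$(build_classpath "$CLASSPATH"); export CLASSPATH`, where `PIDFILE` is assumed to be defined by the script.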

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4561) update column family fails

2012-09-07 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450479#comment-13450479
 ] 

Anton Winter commented on CASSANDRA-4561:
-

My schema is in agreement, but the timestamps aren't fixed, even though the log 
suggests otherwise (is the previously mentioned NPE related?).

{code}
[default@unknown] describe cluster;
Cluster Information:
   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
89b22434-5e34-381d-83d1-2a3cde1482fe: [x, x, x, x, x, x, x, x, x, x, x, 
x, x, x, x, x, x, x, x, x, x, x, x, x, x]

[default@unknown]
{code}

Schema updates continue to silently fail.

 update column family fails
 --

 Key: CASSANDRA-4561
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4561
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4
Reporter: Zenek Kraweznik
Assignee: Pavel Yaskevich
 Fix For: 1.1.5

 Attachments: CASSANDRA-4561.patch


 [default@test] show schema;
 create column family Messages
   with column_type = 'Standard'
   and comparator = 'AsciiType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'AsciiType'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 864000
   and min_compaction_threshold = 2
   and max_compaction_threshold = 4
   and replicate_on_write = true
   and compaction_strategy = 
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'KEYS_ONLY'
   and compaction_strategy_options = {'sstable_size_in_mb' : '1024'}
   and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' 
 : 'org.apache.cassandra.io.compress.DeflateCompressor'};
 [default@test] update column family Messages with min_compaction_threshold = 
 4 and  max_compaction_threshold = 32;
 a5b7544e-1ef5-3bfd-8770-c09594e37ec2
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@test] show schema;
 create column family Messages
   with column_type = 'Standard'
   and comparator = 'AsciiType'
   and default_validation_class = 'BytesType'
   and key_validation_class = 'AsciiType'
   and read_repair_chance = 0.1
   and dclocal_read_repair_chance = 0.0
   and gc_grace = 864000
   and min_compaction_threshold = 2
   and max_compaction_threshold = 4
   and replicate_on_write = true
   and compaction_strategy = 
 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
   and caching = 'KEYS_ONLY'
   and compaction_strategy_options = {'sstable_size_in_mb' : '1024'}
   and compression_options = {'chunk_length_kb' : '64', 'sstable_compression' 
 : 'org.apache.cassandra.io.compress.DeflateCompressor'};
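
One way to catch the silent failure shown in the transcript above is to read the setting back after the update and compare it with the requested value. The following is a minimal sketch, assuming the `show schema` output has been saved as text in the format quoted above. The function names `schema_value` and `check_applied` are hypothetical helpers and are not part of cassandra-cli.

```shell
#!/bin/sh
# Hypothetical helpers: they only parse text in the "show schema" format
# quoted above and do not talk to Cassandra themselves.

# Print the value of one setting from "show schema" output on stdin.
schema_value() {
    awk -v key="$1" '
        ($1 == "with" || $1 == "and") && $2 == key && $3 == "=" {
            print $4
            exit
        }'
}

# Compare the value found on stdin with the value that was requested.
# Returns non-zero when the update did not take effect.
check_applied() {
    key="$1"
    expected="$2"
    actual=$(schema_value "$key")
    if [ "$actual" = "$expected" ]; then
        echo "$key: applied ($actual)"
    else
        echo "$key: NOT applied (expected $expected, found $actual)"
        return 1
    fi
}
```

Fed the second `show schema` output from the transcript, `check_applied min_compaction_threshold 4` would report that the setting was not applied, because the schema still shows 2.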



[jira] [Commented] (CASSANDRA-4561) update column family fails

2012-09-07 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450670#comment-13450670
 ] 

Anton Winter commented on CASSANDRA-4561:
-

Ring upgraded with the second patch and I am now able to perform schema 
updates.  Thanks!

 update column family fails
 --

 Key: CASSANDRA-4561
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4561
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4
Reporter: Zenek Kraweznik
Assignee: Pavel Yaskevich
 Fix For: 1.1.5

 Attachments: CASSANDRA-4561-CS.patch, CASSANDRA-4561.patch





[jira] [Commented] (CASSANDRA-4411) Assertion with LCS compaction

2012-07-23 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421061#comment-13421061
 ] 

Anton Winter commented on CASSANDRA-4411:
-

I can also confirm that, after multiple offline sstablescrub runs across all 
nodes, several nodes (but not all), spread across multiple DCs, still exhibited 
the problem described above by Mina.

In an attempt to work around the problem I shut down the affected instances, 
deleted all data and re-bootstrapped them as if they were dead nodes.  Since 
doing so the problem has not returned; however, it is still early days.

 Assertion with LCS compaction
 -

 Key: CASSANDRA-4411
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4411
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.2
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.3

 Attachments: 0001-Add-debugging-info-for-LCS.txt, 4411-followup.txt, 
 4411.txt, assertion-w-more-debugging-info-omid.log, 
 assertion.moreinfo.system.log, system.log


 As instructed in CASSANDRA-4321 I have raised this issue as a continuation of 
 that issue as it appears the problem still exists.
 I have repeatedly run sstablescrub across all my nodes after the 1.1.2 
 upgrade until sstablescrub shows no errors.  The exceptions described in 
 CASSANDRA-4321 do not occur as frequently now but the integrity check still 
 throws exceptions on a number of nodes.  Once those exceptions occur 
 compactionstats shows a large number of pending tasks with no progression 
 afterwards.
 {code}
 ERROR [CompactionExecutor:150] 2012-07-05 04:26:15,570 
 AbstractCassandraDaemon.java (line 134) Exception in thread 
 Thread[CompactionExecutor:150,1,main]
 java.lang.AssertionError
 at 
 org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
 at 
 org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
 at 
 org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
 at 
 org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
 at 
 org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)
 {code}





[jira] [Created] (CASSANDRA-4411) Assertion with LCS compaction

2012-07-04 Thread Anton Winter (JIRA)
Anton Winter created CASSANDRA-4411:
---

 Summary: Assertion with LCS compaction
 Key: CASSANDRA-4411
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4411
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.2
Reporter: Anton Winter


As instructed in CASSANDRA-4321 I have raised this issue as a continuation of 
that issue as it appears the problem still exists.

I have repeatedly run sstablescrub across all my nodes after the 1.1.2 upgrade 
until sstablescrub shows no errors.  The exceptions described in CASSANDRA-4321 
do not occur as frequently now but the integrity check still throws exceptions 
on a number of nodes.  Once those exceptions occur compactionstats shows a 
large number of pending tasks with no progression afterwards.

{code}
ERROR [CompactionExecutor:150] 2012-07-05 04:26:15,570 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:150,1,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
at 
org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
at 
org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
{code}
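
The "pending tasks with no progression" symptom described above can be checked by sampling compactionstats twice and comparing the counts. The following is a rough sketch only: it assumes the output contains a line of the form `pending tasks: N`, and the function names `pending_tasks` and `backlog_state` are hypothetical helpers rather than anything shipped with Cassandra.

```shell
#!/bin/sh
# Hypothetical helpers for spotting a stalled compaction backlog.

# Extract the pending-task count from compactionstats-style output on stdin.
# Assumes a line of the form "pending tasks: N"; adjust if your version differs.
pending_tasks() {
    awk -F': *' 'tolower($1) == "pending tasks" { print $2; exit }'
}

# Classify the backlog from two samples taken some time apart.
# Prints "idle" if nothing is pending, "progressing" if the count went
# down, and "stalled" otherwise.
backlog_state() {
    before="$1"
    after="$2"
    if [ "$after" -eq 0 ]; then
        echo idle
    elif [ "$after" -lt "$before" ]; then
        echo progressing
    else
        echo stalled
    fi
}
```

A typical use would be to capture `nodetool compactionstats | pending_tasks` into a variable, wait for an interval of your choosing, capture it again, and pass both values to `backlog_state`. A count that stays flat or grows after the AssertionError above would be reported as stalled.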






[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-07-04 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406844#comment-13406844
 ] 

Anton Winter commented on CASSANDRA-4321:
-

New issue raised as requested: CASSANDRA-4411

 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 0001-Fix-overlapping-computation-v7.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 
 0003-Create-standalone-scrub-v7.txt, 
 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, 
 ooyala-hastur-stacktrace.txt


 After upgrading to 1.1.1 (from 1.1.0) I have started experiencing 
 StackOverflowError's resulting in compaction backlog and failure to restart. 
 The ring currently consists of 6 DC's and 22 nodes using LCS & compression.  
 This issue was first noted on 2 nodes in one DC and then appears to have 
 spread to various other nodes in the other DC's.  
 When the first occurrence of this was found I restarted the instance but it 
 failed to start so I cleared its data and treated it as a replacement node 
 for the token it was previously responsible for.  This node successfully 
 streamed all the relevant data back but failed again a number of hours later 
 with the same StackOverflowError and again was unable to restart. 
 The initial stack overflow error on a running instance looks like this:
 ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main]
 java.lang.StackOverflowError
 at java.util.Arrays.mergeSort(Arrays.java:1157)
 at java.util.Arrays.sort(Arrays.java:1092)
 at java.util.Collections.sort(Collections.java:134)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 [snip - this repeats until stack overflow.  Compactions stop from this point onwards]
 I restarted this failing instance with DEBUG logging enabled and it throws 
 the following exception part way through startup:
 ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
 java.lang.StackOverflowError
 at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
 at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
 at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
 at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
 at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 [snip - this repeats until stack overflow]
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39)
 at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
 

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-07-03 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406254#comment-13406254
 ] 

Anton Winter commented on CASSANDRA-4321:
-

I have repeatedly run sstablescrub across all my nodes and the exceptions do 
not occur as frequently now; however, the integrity check still throws 
exceptions.  compactionstats shows a large number of pending tasks but no 
progression after this error.

Should this ticket be reopened or a new one raised?

{code}
ERROR [CompactionExecutor:912] 2012-07-04 01:07:16,470 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:912,1,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
at 
org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
at 
org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
{code}



[jira] [Comment Edited] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-07-03 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406254#comment-13406254
 ] 

Anton Winter edited comment on CASSANDRA-4321 at 7/4/12 3:28 AM:
-

I have repeatedly run sstablescrub across all my nodes and the exceptions do 
not occur as frequently now, however, the integrity check still throws 
exceptions and compactionstats shows a large number of pending tasks but no 
progression afterwards.

Should this ticket be reopened or a new one raised?

{code}
ERROR [CompactionExecutor:912] 2012-07-04 01:07:16,470 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:912,1,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
at 
org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
at 
org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
{code}



[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-29 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403764#comment-13403764
 ] 

Anton Winter commented on CASSANDRA-4321:
-

I've applied the v7 patches and have successfully offline scrubbed & reinserted 
a number of nodes in my ring without further occurrence of the previous issues. 
Thanks :)


[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions

2012-06-29 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404333#comment-13404333
 ] 

Anton Winter commented on CASSANDRA-4321:
-

Maybe I spoke too soon.  Overnight I've seen the exceptions happen again on 
nodes that were v7 patched & scrubbed.  

{code}
ERROR [CompactionExecutor:1301] 2012-06-29 21:54:12,078 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:1301,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(116816802911061669023614481109871014436, 4faa631ca88ef85b8e26ddeb) >= current key DecoratedKey(115179899219377463875853982254751557438, 4fa892bf42d3f24479f627b6) writing into /var/lib//data/cassandra/KS/CF/KS-CF-tmp-hd-837655-Data.db
at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
{code}
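
For readers of the ticket, the exception above reports that a key reached the writer without sorting after the key written just before it. The following is only a minimal sketch of that kind of ordering check, not the actual SSTableWriter code: the class name {{AppendOrderCheck}}, its {{append}} method and the use of a bare {{BigInteger}} token are assumptions made for illustration. The two token values are the ones from the log above.

```java
import java.math.BigInteger;

public class AppendOrderCheck {
    // Hypothetical stand-in for a writer that only accepts strictly increasing tokens.
    private BigInteger lastWrittenToken = null;

    /** Accepts a token only if it sorts strictly after the previously accepted one. */
    public void append(BigInteger token) {
        if (lastWrittenToken != null && lastWrittenToken.compareTo(token) >= 0) {
            throw new IllegalStateException(
                "Last written key " + lastWrittenToken + " >= current key " + token);
        }
        lastWrittenToken = token;
    }

    public static void main(String[] args) {
        AppendOrderCheck writer = new AppendOrderCheck();
        // Tokens taken from the log: the larger one is written first,
        // then the smaller one arrives and is rejected.
        writer.append(new BigInteger("116816802911061669023614481109871014436"));
        try {
            writer.append(new BigInteger("115179899219377463875853982254751557438"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the check fails fast rather than writing an out-of-order row, which would match the behaviour seen here: the compaction task aborts instead of producing a mis-sorted file.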

 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 0001-Fix-overlapping-computation-v7.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 
 0003-Create-standalone-scrub-v7.txt, 
 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, 
 ooyala-hastur-stacktrace.txt


 After upgrading to 1.1.1 (from 1.1.0) I have started experiencing 
 StackOverflowErrors resulting in compaction backlog and failure to restart. 
 The ring currently consists of 6 DCs and 22 nodes using LCS & compression.  
 This issue was first noted on 2 nodes in one DC and then appears to have 
 spread to various other nodes in the other DCs.  
 When the first occurrence of this was found I restarted the instance but it 
 failed to start, so I cleared its data and treated it as a replacement node 
 for the token it was previously responsible for.  This node successfully 
 streamed all the relevant data back but failed again a number of hours later 
 with the same StackOverflowError and again was unable to restart. 
 The initial stack overflow error on a running instance looks like this:
 ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main]
 java.lang.StackOverflowError
 at java.util.Arrays.mergeSort(Arrays.java:1157)
 at java.util.Arrays.sort(Arrays.java:1092)
 at java.util.Collections.sort(Collections.java:134)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
 [snip - this repeats until stack overflow.  Compactions stop from this point 
 onwards]
 I restarted this failing instance with DEBUG logging enabled and it throws 
 the following exception part way through startup:
 ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
 java.lang.StackOverflowError
 at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
 at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
 at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
 at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
 at 
 

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions

2012-06-29 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404338#comment-13404338
 ] 

Anton Winter commented on CASSANDRA-4321:
-

1.1 dev branch + patches

 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 0001-Fix-overlapping-computation-v7.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 
 0003-Create-standalone-scrub-v7.txt, 
 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, 
 ooyala-hastur-stacktrace.txt


 After upgrading to 1.1.1 (from 1.1.0) I have started experiencing 
 StackOverflowError's resulting in compaction backlog and failure to restart. 
 The ring currently consists of 6 DC's and 22 nodes using LCS  compression.  
 This issue was first noted on 2 nodes in one DC and then appears to have 
 spread to various other nodes in the other DC's.  
 When the first occurrence of this was found I restarted the instance but it 
 failed to start so I cleared its data and treated it as a replacement node 
 for the token it was previously responsible for.  This node successfully 
 streamed all the relevant data back but failed again a number of hours later 
 with the same StackOverflowError and again was unable to restart. 
 The initial stack overflow error on a running instance looks like this:
 ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 
 AbstractCassandraDaemon.java (line 134) Exception in thread 
 Thread[CompactionExecutor:314,1,main]
 java.lang.StackOverflowError
 at java.util.Arrays.mergeSort(Arrays.java:1157)
 at java.util.Arrays.sort(Arrays.java:1092)
 at java.util.Collections.sort(Collections.java:134)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:49)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 [snip - this repeats until stack overflow.  Compactions stop from this point 
 onwards]
 I restarted this failing instance with DEBUG logging enabled and it throws 
 the following exception part way through startup:
 ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
 java.lang.StackOverflowError
 at 
 org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
 at 
 org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
 at 
 org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
 at 
 org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
 at 
 org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:45)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 [snip - this repeats until stack overflow]
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
 at 
 org.apache.cassandra.utils.IntervalTree.IntervalTree.init(IntervalTree.java:39)
 at 
 org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
 at 
 

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions

2012-06-22 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13399760#comment-13399760
 ] 

Anton Winter commented on CASSANDRA-4321:
-

bq. Was I lucky? Are you guys able to reproduce those steps and still get more 
errors?

As discussed, and repeated here for the ticket's reference: I was patching 
and scrubbing in the same way as described above.  Once the scrubbed nodes were 
restarted in the cluster they were under normal read/write load and 
experienced the exceptions again.  Since sstablescrub and the subsequent 
compactions run fine in Sylvain's test using my out-of-order sstables, the 
sstablescrub command appears to do its job.  The root cause, originally 
expected to be resolved by the 0001 patch, still appears to be occurring, so 
Sylvain was going to investigate the code further.

 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 
 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 
 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt



[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions

2012-06-20 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13398206#comment-13398206
 ] 

Anton Winter commented on CASSANDRA-4321:
-

After working around the issue with the 0003 v5 patch that Omid refers to, I've 
had an sstablescrub complete on one of my servers.  sstablescrub detected 
several overlapping sstables, resetting them to L0, but no out-of-order keys.

However, the "Last written key DecoratedKey >= current key" exception resurfaces 
after the first set of compactions, 5 minutes after startup, in exactly the 
same manner as before.  The same exception occurs for various CFs until 
compactions stop completely.  compactionstats still shows a large number of 
pending compaction tasks after this event.
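
To make the "overlapping sstables" finding concrete, here is a rough sketch of how overlap within a single level could be detected from each sstable's first and last key. It is an illustration only and not how sstablescrub or LeveledManifest is implemented: the {{LevelOverlapCheck}} and {{Range}} names, the use of {{long}} keys and the closed-range comparison are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LevelOverlapCheck {
    /** Hypothetical sstable summary: a name plus its closed key range [first, last]. */
    public static final class Range {
        public final String name;
        public final long first;
        public final long last;

        public Range(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }
    }

    /**
     * Returns the names of ranges that overlap some range starting earlier in the level.
     * Ranges are sorted by first key; a range overlaps if it starts at or before the
     * largest last key seen so far.
     */
    public static List<String> findOverlapping(List<Range> level) {
        List<Range> sorted = new ArrayList<>(level);
        sorted.sort(Comparator.comparingLong(r -> r.first));

        List<String> overlapping = new ArrayList<>();
        boolean seenAny = false;
        long maxLastSeen = Long.MIN_VALUE;
        for (Range r : sorted) {
            if (seenAny && r.first <= maxLastSeen) {
                overlapping.add(r.name);
            }
            maxLastSeen = Math.max(maxLastSeen, r.last);
            seenAny = true;
        }
        return overlapping;
    }

    public static void main(String[] args) {
        List<Range> level = Arrays.asList(
            new Range("a", 0, 10),
            new Range("b", 11, 20),
            new Range("c", 15, 30));
        // In this sketch a flagged sstable would be the candidate for a reset to L0.
        System.out.println(findOverlapping(level));
    }
}
```

Sorting by first key keeps the check to a single pass; a level with no overlaps returns an empty list.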

 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 
 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 
 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt



[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions

2012-06-18 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396433#comment-13396433
 ] 

Anton Winter commented on CASSANDRA-4321:
-

I can confirm I also experienced the "Unexpected empty index file" errors on 
some of the nodes that I have run sstablescrub on.

On some other nodes the sstablescrub command appears to complete successfully 
but compactions still stop at the "java.lang.RuntimeException: Last written 
key DecoratedKey" error.

Is there any further information we can supply to help debug?

 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 
 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt



[jira] [Comment Edited] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions

2012-06-18 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396433#comment-13396433
 ] 

Anton Winter edited comment on CASSANDRA-4321 at 6/19/12 2:01 AM:
--

I can confirm I also experienced the Unexpected empty index file errors on 
some of the nodes that I have run sstablescrub on.

Other nodes had this error when running sstablescrub:
{code}
Scrub of 
SSTableReader(path='/var/lib//data/cassandra/KS/CF/KS-CF-hd-259648-Data.db')
 complete: 1592 rows in new sstable and 0 empty (tombstoned) rows dropped
EOF after 6 bytes out of 8
{code}

Compactions stop with the java.lang.RuntimeException: Last written key 
DecoratedKey error on the nodes affected by either of the above 2 errors .

Nodes that seem to have been repaired by the sstablescrub still continue to 
have java.lang.RuntimeException: Last written key DecoratedKey errors 
scattered through the logs but are still be compacting.

Is there any further information we can supply to help debug?

  was (Author: awinter):
I can confirm I also experienced the Unexpected empty index file errors 
on some of the nodes that I have run sstablescrub on.

On some other nodes the sstablescrub command appears to complete successfully 
but compactions still stops at the java.lang.RuntimeException: Last written 
key DecoratedKey error.

Is there any further information we can supply to help debug?
  
 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 
 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt



[jira] [Comment Edited] (CASSANDRA-4321) stackoverflow building interval tree possible sstable corruptions

2012-06-18 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396433#comment-13396433
 ] 

Anton Winter edited comment on CASSANDRA-4321 at 6/19/12 2:02 AM:
--

I can confirm I also experienced the Unexpected empty index file errors on 
some of the nodes that I have run sstablescrub on.

Other nodes had this error when running sstablescrub:
{code}
Scrub of 
SSTableReader(path='/var/lib//data/cassandra/KS/CF/KS-CF-hd-259648-Data.db')
 complete: 1592 rows in new sstable and 0 empty (tombstoned) rows dropped
EOF after 6 bytes out of 8
{code}

Compactions stop with the java.lang.RuntimeException: Last written key 
DecoratedKey error on the nodes affected by either of the above 2 errors .

Nodes that seem to have been repaired by the sstablescrub still continue to 
have java.lang.RuntimeException: Last written key DecoratedKey errors 
scattered through the logs but are still compacting.

Is there any further information we can supply to help debug?

  was (Author: awinter):
I can confirm I also experienced the Unexpected empty index file errors 
on some of the nodes that I have run sstablescrub on.

Other nodes had this error when running sstablescrub:
{code}
Scrub of 
SSTableReader(path='/var/lib//data/cassandra/KS/CF/KS-CF-hd-259648-Data.db')
 complete: 1592 rows in new sstable and 0 empty (tombstoned) rows dropped
EOF after 6 bytes out of 8
{code}

Compactions stop with the java.lang.RuntimeException: Last written key 
DecoratedKey error on the nodes affected by either of the above 2 errors .

Nodes that seem to have been repaired by the sstablescrub still continue to 
have java.lang.RuntimeException: Last written key DecoratedKey errors 
scattered through the logs but are still be compacting.

Is there any further information we can supply to help debug?
  
 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 
 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt



[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-12 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293412#comment-13293412
 ] 

Anton Winter commented on CASSANDRA-4321:
-

If I use the v2 patch, startup stops with the following:
{code}
 INFO [main] 2012-06-12 14:23:33,899 ColumnFamilyStore.java (line 633) 
Enqueuing flush of Memtable-LocationInfo@1141455324(41/51 serialized/live 
bytes, 1 ops)
 INFO [FlushWriter:2] 2012-06-12 14:23:33,903 Memtable.java (line 266) Writing 
Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
ERROR [FlushWriter:2] 2012-06-12 14:23:33,953 AbstractCassandraDaemon.java 
(line 134) Exception in thread 
Thread[FlushWriter:2,5,main]java.lang.RuntimeException: Last written key null 
= current key DecoratedKey(61078635599166706937511052402724559481, 4c) writing 
into 
/var/lib//cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-65597-Data.db
{code}

Given the above, I scrubbed the system keyspace, which removed all sstables 
and left only the snapshots, e.g.:

{code}
 WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java 
(line 651) Row at 100 is unreadable; skipping to next
 WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java 
(line 602) Non-fatal error reading row (stacktrace follows)
java.lang.RuntimeException: Last written key null = current key 
DecoratedKey(135285944860343992175601105924967452217, 63716c) writing into 
/var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-37-Data.db
{code}
...eventually resulting in:
{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,674 CompactionManager.java 
(line 692) No valid rows found while scrubbing 
SSTableReader(path='/var/lib//data/cassandra/system/Versions/system-Versions-hd-35-Data.db');
 it is marked for deletion now. If you want to attempt manual recovery, you can 
find a copy in the pre-scrub snapshot
{code}

A clean bootstrap also stops with similar errors:
{code}
java.lang.RuntimeException: Last written key null = current key 
DecoratedKey(61078635599166706937511052402724559481, 4c) writing into 
/var/lib//data/cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-1-Data.db
{code}
and 
{code}
java.lang.RuntimeException: Last written key null = current key 
DecoratedKey(93220794208128599841715671226150005828, 746872696674) writing into 
/var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-1-Data.db
{code}
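
To make errors like the ones above easier to compare across nodes, the following sketch pulls the token, key and sstable path out of pasted log text. It is only an illustration: the function name is made up, and the pattern accepts either ">=" or "=" before "current key" because the pasted logs may have lost characters.

```python
import re

# Matches the exception text quoted above. Hard line wraps are removed
# before matching, so the pattern only needs single spaces.
PATTERN = re.compile(
    r"Last written key (null|DecoratedKey\(\d+, [0-9a-f]+\)) >?= current key "
    r"DecoratedKey\((\d+), ([0-9a-f]+)\) writing into (\S+)"
)


def parse_last_written_key_errors(log_text):
    """Return (last_key, token, key_hex, sstable_path) for every match."""
    # Collapse all whitespace runs to undo the hard wrapping of the paste.
    flat = " ".join(log_text.split())
    return [
        (m.group(1), int(m.group(2)), m.group(3), m.group(4))
        for m in PATTERN.finditer(flat)
    ]


SAMPLE = """java.lang.RuntimeException: Last written key null = current key 
DecoratedKey(61078635599166706937511052402724559481, 4c) writing into 
/var/lib//data/cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-1-Data.db
"""

for row in parse_last_written_key_errors(SAMPLE):
    print(row)
```

Grouping the results by path would show whether the errors cluster on a few sstables or are spread across the keyspace.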


 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 
 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, 
 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
 0002-Scrub-detects-and-repair-outOfOrder-rows.txt



[jira] [Comment Edited] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-12 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293412#comment-13293412
 ] 

Anton Winter edited comment on CASSANDRA-4321 at 6/12/12 7:46 AM:
--

If I use the v2 patch, startup stops with the following:
{code}
 INFO [main] 2012-06-12 14:23:33,899 ColumnFamilyStore.java (line 633) 
Enqueuing flush of Memtable-LocationInfo@1141455324(41/51 serialized/live 
bytes, 1 ops)
 INFO [FlushWriter:2] 2012-06-12 14:23:33,903 Memtable.java (line 266) Writing 
Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
ERROR [FlushWriter:2] 2012-06-12 14:23:33,953 AbstractCassandraDaemon.java 
(line 134) Exception in thread 
Thread[FlushWriter:2,5,main]java.lang.RuntimeException: Last written key null 
= current key DecoratedKey(61078635599166706937511052402724559481, 4c) writing 
into 
/var/lib//cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-65597-Data.db
{code}

Given the above, I (probably incorrectly) scrubbed the system keyspace, which 
removed all sstables and left only the snapshots, e.g.:

{code}
 WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java 
(line 651) Row at 100 is unreadable; skipping to next
 WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java 
(line 602) Non-fatal error reading row (stacktrace follows)
java.lang.RuntimeException: Last written key null = current key 
DecoratedKey(135285944860343992175601105924967452217, 63716c) writing into 
/var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-37-Data.db
{code}
...eventually resulting in:
{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,674 CompactionManager.java 
(line 692) No valid rows found while scrubbing 
SSTableReader(path='/var/lib//data/cassandra/system/Versions/system-Versions-hd-35-Data.db');
 it is marked for deletion now. If you want to attempt manual recovery, you can 
find a copy in the pre-scrub snapshot
{code}

A clean bootstrap also stops with similar errors:
{code}
java.lang.RuntimeException: Last written key null = current key 
DecoratedKey(61078635599166706937511052402724559481, 4c) writing into 
/var/lib//data/cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-1-Data.db
{code}
and 
{code}
java.lang.RuntimeException: Last written key null = current key 
DecoratedKey(93220794208128599841715671226150005828, 746872696674) writing into 
/var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-1-Data.db
{code}



[jira] [Created] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-07 Thread Anton Winter (JIRA)
Anton Winter created CASSANDRA-4321:
---

 Summary: stackoverflow building interval tree & possible sstable 
corruptions
 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter


After upgrading to 1.1.1 (from 1.1.0) I have started experiencing 
StackOverflowErrors resulting in a compaction backlog and failure to restart. 

The ring currently consists of 6 DCs and 22 nodes using LCS & compression.  
This issue was first noted on 2 nodes in one DC and then appears to have spread 
to various other nodes in the other DCs.  

When the first occurrence of this was found I restarted the instance but it 
failed to start so I cleared its data and treated it as a replacement node for 
the token it was previously responsible for.  This node successfully streamed 
all the relevant data back but failed again a number of hours later with the 
same StackOverflowError and again was unable to restart. 

The initial stack overflow error on a running instance looks like this:

ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:314,1,main]
java.lang.StackOverflowError
at java.util.Arrays.mergeSort(Arrays.java:1157)
at java.util.Arrays.sort(Arrays.java:1092)
at java.util.Collections.sort(Collections.java:134)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:49)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)

[snip - this repeats until stack overflow.  Compactions stop from this point 
onwards]
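
One way a recursive interval tree build can fail to terminate is sketched below. This is not the Cassandra implementation. It assumes a textbook centered interval tree and, as a guess, an interval whose minimum is greater than its maximum. The function name and the depth guard are hypothetical.

```python
def build_depth(intervals, depth=0, max_depth=64):
    """Recursively build a centered interval tree and return its depth.

    Each interval is a (min, max) tuple. Raises ValueError if the
    recursion goes deeper than max_depth.
    """
    if not intervals:
        return depth
    if depth > max_depth:
        raise ValueError("tree did not converge; is an interval inverted?")
    # Use the median endpoint as the center of this node.
    endpoints = sorted(p for interval in intervals for p in interval)
    center = endpoints[len(endpoints) // 2]
    # Intervals entirely below the center go left, entirely above go right;
    # anything containing the center stays at this node.
    left = [iv for iv in intervals if iv[1] < center]
    right = [iv for iv in intervals if iv[0] > center]
    return max(
        build_depth(left, depth + 1, max_depth),
        build_depth(right, depth + 1, max_depth),
    )


print(build_depth([(1, 3), (2, 5), (6, 9)]))  # well-formed input terminates

try:
    build_depth([(10, 1)])  # inverted interval: min > max
except ValueError as exc:
    print("guard tripped:", exc)
```

With well-formed input, the interval that supplied the center always stays at the current node, so each recursive call sees fewer intervals. The inverted interval (10, 1) is sent to the left child unchanged on every call, so without the guard the recursion would continue until the stack overflows.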


I restarted this failing instance with DEBUG logging enabled and it throws the 
following exception part way through startup:

ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.StackOverflowError
at 
org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
at 
org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
at 
org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:45)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)

[snip - this repeats until stack overflow]

at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:64)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.init(IntervalNode.java:62)
at 
org.apache.cassandra.utils.IntervalTree.IntervalTree.init(IntervalTree.java:39)
at 
org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
at 
org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:617)
at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:320)
at 
org.apache.cassandra.db.DataTracker.addInitialSSTables(DataTracker.java:259)
at 
org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:234)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:331)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:309)
at org.apache.cassandra.db.Table.initCf(Table.java:367)
at org.apache.cassandra.db.Table.init(Table.java:299)
at 

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-07 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291520#comment-13291520
 ] 

Anton Winter commented on CASSANDRA-4321:
-

The partitioner (RP) was not changed.

 stackoverflow building interval tree & possible sstable corruptions
 ---

 Key: CASSANDRA-4321
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
Reporter: Anton Winter


[jira] [Created] (CASSANDRA-3194) repair streaming forwarding loop

2011-09-13 Thread Anton Winter (JIRA)
repair streaming forwarding loop


 Key: CASSANDRA-3194
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3194
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Anton Winter


I am able to reproduce what appears to be a streaming forwarding loop when 
running repairs.  This affects only nodes using broadcast_address (EC2 external 
IP) & a listen_address of 0.0.0.0.  (The configuration uses the property file 
snitch in a multi-DC NTS setup where some DCs are EC2 and others are not.)  The 
hosts in the other DCs not using broadcast_address do not experience this symptom.

on ec2 host dc1host1:
INFO [AntiEntropyStage:1] 2011-09-13 06:34:01,673 StreamingRepairTask.java 
(line 211) [streaming task #ce793c30-ddd1-11e0--071a4b76fefb] Received task 
from /0.0.0.0 to stream 12259 ranges to /external.ec2.ip.dc1host3
 INFO [AntiEntropyStage:1] 2011-09-13 06:34:01,673 StreamingRepairTask.java 
(line 136) [streaming task #ce793c30-ddd1-11e0--071a4b76fefb] Forwarding 
streaming repair of 12259 ranges to /external.ec2.ip.of.dc1host1 (to be 
streamed with /external.ip.of.host3)

The above appears to trigger another streaming task and saturates the network 
interfaces on dc1host1.  The above log entries are repeated until Cassandra is 
killed.
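
The sketch below is a guess at the mechanism, not a description of the actual code. It assumes the node decides whether it is the streaming source by comparing the source against its listen_address (0.0.0.0) instead of its broadcast_address. In that case the comparison never succeeds and the task is forwarded back to the same node. The function name and the address 203.0.113.10 (a documentation-range IP) are made up for illustration.

```python
def forwards_before_streaming(stream_source, local_address, max_hops=5):
    """Count how often a task is forwarded before the source node accepts it.

    Returns None if the task is still being forwarded after max_hops.
    """
    hops = 0
    while stream_source != local_address:
        if hops == max_hops:
            return None
        # The node does not recognise stream_source as itself, so it forwards
        # the task there; in this scenario it arrives back at the same node.
        hops += 1
    return hops


# Node identifies itself by its listen address: the task never settles.
print(forwards_before_streaming("203.0.113.10", "0.0.0.0"))
# Node identifies itself by its broadcast address: no forwarding is needed.
print(forwards_before_streaming("203.0.113.10", "203.0.113.10"))
```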

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira