[jira] [Commented] (CASSANDRA-4456) AssertionError in ColumnFamilyStore.getOverlappingSSTables() during repair

2012-07-23 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420685#comment-13420685 ]

Jonathan Ellis commented on CASSANDRA-4456:
---

I think this was introduced by CASSANDRA-3721: getOverlappingSSTables assumes 
that the sstables we check for overlaps are part of the live set, but now we 
can validate over a snapshot instead.
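
A minimal sketch of the assumption described above, not Cassandra's actual
code: LiveSetSketch, overlapping() and the sstable names are hypothetical
stand-ins. The lookup asserts that every input is tracked in the live set, so
inputs that come from a snapshot and are not tracked there trip the assertion.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; these are not Cassandra's real classes.
final class LiveSetSketch {
    // Names of the sstables currently tracked as live for one column family.
    private final Set<String> live = new HashSet<>();

    LiveSetSketch(Set<String> initial) {
        live.addAll(initial);
    }

    // Mirrors the assumption in the comment: every input must be live.
    Set<String> overlapping(Set<String> inputs) {
        if (!live.containsAll(inputs))
            throw new AssertionError("inputs are not all part of the live set");
        Set<String> others = new HashSet<>(live);
        others.removeAll(inputs);
        return others; // real code would also filter by key-range overlap
    }

    public static void main(String[] args) {
        LiveSetSketch cfs = new LiveSetSketch(Set.of("a", "b"));

        // Validation over live sstables satisfies the assumption.
        System.out.println("live inputs ok: " + cfs.overlapping(Set.of("a")));

        // Validation over snapshot sstables (assumed untracked) does not.
        try {
            cfs.overlapping(Set.of("snap-a", "snap-b"));
        } catch (AssertionError e) {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```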

 AssertionError in ColumnFamilyStore.getOverlappingSSTables() during repair
 --

 Key: CASSANDRA-4456
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4456
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.2
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Sylvain Lebresne

 We have hit the following exception on several nodes while running repairs 
 across our 1.1.2 ring. We've observed it happen on either the node executing 
 the repair or a participating replica in the repair operation. The result in 
 either case is that the repair hangs.
 ERROR [ValidationExecutor:9] 2012-07-21 01:54:03,019 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[ValidationExecutor:9,1,main]
 java.lang.AssertionError
     at org.apache.cassandra.db.ColumnFamilyStore.getOverlappingSSTables(ColumnFamilyStore.java:874)
     at org.apache.cassandra.db.compaction.CompactionController.<init>(CompactionController.java:69)
     at org.apache.cassandra.db.compaction.CompactionManager$ValidationCompactionIterable.<init>(CompactionManager.java:834)
     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:698)
     at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:68)
     at org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:438)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)
 In building this ring we migrated sstables from an identical 0.8.8 ring by:
  1. Creating the schema on our new 1.1.2 ring.
  2. Rsyncing over sstables from the 0.8.8 ring.
  3. Renaming the sstables to match the directory and file naming structure of 1.1.x.
  4. Running nodetool refresh <keyspace> <cf> for each CF across each node.
  5. Running nodetool upgradesstables for each CF across each node.
 When those steps had completed, we began rolling repairs. Not all of the 
 repair operations have hit the exception -- some of the repairs have 
 completed successfully.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4456) AssertionError in ColumnFamilyStore.getOverlappingSSTables() during repair

2012-07-23 Thread Sylvain Lebresne (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420746#comment-13420746 ]

Sylvain Lebresne commented on CASSANDRA-4456:
-

Actually, I think this can happen even when snapshots are not used, since an
sstable can finish being compacted between the time we choose the sstables for
repair and the time we create the CompactionController for the validation
compaction. In particular, I wonder whether Michael and Mike used -snapshot for
their repair. It is true, though, that repairing on a snapshot makes this much
more likely to happen.

But I don't think we need to call getOverlappingSSTables at all for repair in
the first place: it is used only to decide whether we can purge, and repair
does not purge. Attaching a simple patch that skips the call entirely.
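
A rough sketch of the idea in this comment, under stated assumptions: Store,
Controller and ValidationController below are hypothetical stand-ins, not the
classes in the attached patch. A controller used only for validation never
purges, so it can skip the overlap lookup and never reaches the assertion.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model; not the code from the attached patch.
final class ValidationSkipSketch {
    // Stand-in for a column family store tracking live sstables by name.
    static final class Store {
        final Set<String> live = new HashSet<>();

        Set<String> overlapping(Set<String> inputs) {
            if (!live.containsAll(inputs))
                throw new AssertionError("inputs are not all part of the live set");
            Set<String> others = new HashSet<>(live);
            others.removeAll(inputs);
            return others;
        }
    }

    // Regular compaction: needs the overlapping sstables to decide on purging.
    static class Controller {
        final Set<String> overlapping;

        Controller(Store store, Set<String> inputs) {
            this(store.overlapping(inputs));
        }

        protected Controller(Set<String> overlapping) {
            this.overlapping = overlapping;
        }

        // Simplified rule: purging is safe only if nothing else overlaps.
        boolean shouldPurge() {
            return overlapping.isEmpty();
        }
    }

    // Validation: never purges, so it never asks the store for overlaps.
    static final class ValidationController extends Controller {
        ValidationController() {
            super(Collections.emptySet());
        }

        @Override
        boolean shouldPurge() {
            return false;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.live.add("c"); // "a" and "b" were already compacted into "c"

        try {
            new Controller(store, Set.of("a", "b"));
        } catch (AssertionError e) {
            System.out.println("regular controller: " + e.getMessage());
        }

        ValidationController vc = new ValidationController();
        System.out.println("validation controller purges: " + vc.shouldPurge());
    }
}
```

The design choice here is that the validation path does not depend on the live
set at all, so neither a snapshot nor a concurrent compaction can invalidate
its inputs.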


 AssertionError in ColumnFamilyStore.getOverlappingSSTables() during repair
 --

 Key: CASSANDRA-4456
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4456
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.2
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Sylvain Lebresne
 Fix For: 1.1.3

 Attachments: 4456.txt






[jira] [Commented] (CASSANDRA-4456) AssertionError in ColumnFamilyStore.getOverlappingSSTables() during repair

2012-07-23 Thread Jonathan Ellis (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420755#comment-13420755 ]

Jonathan Ellis commented on CASSANDRA-4456:
---

You need to wire VCC into ValidationCompactionIterable, but otherwise +1.





[jira] [Commented] (CASSANDRA-4456) AssertionError in ColumnFamilyStore.getOverlappingSSTables() during repair

2012-07-21 Thread Michael Theroux (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419912#comment-13419912 ]

Michael Theroux commented on CASSANDRA-4456:


I just hit this problem myself today, on a single node in a six-node cluster.
I was running nodetool repair, and it halted with this exception in the log. I
was monitoring the repair pretty closely. A couple of observations:

1) It happened while a compaction of the same column family was running.
2) When I re-ran the repair, it worked.

Note: I am not a Cassandra developer, but I looked at the code. A highly
uneducated guess is that an sstable was compacted and deleted while validation
was expecting it to be there.
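
A small sketch of the guess above, replayed deterministically rather than with
real threads. RepairRaceSketch and its methods are hypothetical and are not
taken from the Cassandra source. The sstables chosen for validation go stale
when a compaction finishes before the check, while a retry picks the new
sstable and passes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the suspected race; not Cassandra code.
final class RepairRaceSketch {
    private final Set<String> live = new HashSet<>();

    RepairRaceSketch(Set<String> initial) {
        live.addAll(initial);
    }

    // Returns a copy, so the choice goes stale if the live set changes later.
    Set<String> chooseForValidation() {
        return new HashSet<>(live);
    }

    // A compaction replaces its inputs with a single output sstable.
    void compact(Set<String> inputs, String output) {
        live.removeAll(inputs);
        live.add(output);
    }

    // Stand-in for the check that fails in the reported stack trace.
    void validate(Set<String> chosen) {
        if (!live.containsAll(chosen))
            throw new AssertionError("a chosen sstable is no longer live");
    }

    public static void main(String[] args) {
        RepairRaceSketch node = new RepairRaceSketch(Set.of("a", "b"));

        Set<String> chosen = node.chooseForValidation();
        node.compact(Set.of("a", "b"), "c"); // finishes before validation starts
        try {
            node.validate(chosen);
        } catch (AssertionError e) {
            System.out.println("first attempt: " + e.getMessage());
        }

        // Re-running picks up the compacted sstable and passes the check.
        node.validate(node.chooseForValidation());
        System.out.println("retry: ok");
    }
}
```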





[jira] [Commented] (CASSANDRA-4456) AssertionError in ColumnFamilyStore.getOverlappingSSTables() during repair

2012-07-21 Thread Michael Theroux (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419913#comment-13419913 ]

Michael Theroux commented on CASSANDRA-4456:


I am also on 1.1.2.
