More data: I decided to force a flush on the table I knew was corrupt, and got this error:
ERROR [MINOR-COMPACTION-POOL:1] 2009-09-30 18:42:01,010 DebuggableThreadPoolExecutor.java (line 103) Error in executor futuretask
java.util.concurrent.ExecutionException: java.io.EOFException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logFutureExceptions(DebuggableThreadPoolExecutor.java:95)
    at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableScheduledThreadPoolExecutor.java:50)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
    at org.apache.cassandra.io.IndexHelper.skipIndex(IndexHelper.java:77)
    at org.apache.cassandra.io.IndexHelper.skipBloomFilterAndIndex(IndexHelper.java:46)
    at org.apache.cassandra.io.IteratingRow.<init>(IteratingRow.java:47)
    at org.apache.cassandra.io.FileStruct.advance(FileStruct.java:124)
    at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:1108)
    at org.apache.cassandra.db.ColumnFamilyStore.doCompaction(ColumnFamilyStore.java:689)
    at org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:165)
    at org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:162)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    ...
2 more

-Anthony

On Wed, Sep 30, 2009 at 11:24:15AM -0700, Anthony Molinaro wrote:
> Hi,
>
> I'm not getting any responses on IRC, so figured I'd put this out on
> the mailing list.
>
> I had a 3 node cassandra cluster, replication factor 3, on
> 3 ec2 m1.large instances behind an haproxy. I restarted one
> of the nodes to test out some modified sysctls (tcp stack tuning).
> As soon as I restarted it, the other 2 nodes started spiking memory
> use and the first node seemed to have corrupted data. The corruption
> is an exception when I read some, and only some, keys.
>
> The exception is:
>
> ERROR [pool-1-thread-1] 2009-09-30 17:50:30,037 Cassandra.java (line 679)
> Internal error processing get_slice
> java.lang.RuntimeException: java.io.EOFException
>     at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:104)
>     at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
>     at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
>     at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
>     at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.EOFException
>     at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
>     at org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95)
>     at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118)
>     at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56)
>     at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64)
>     at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390)
>     at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325)
>     at org.apache.cassandra.db.Table.getRow(Table.java:590)
>     at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
>     at org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:471)
>     at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:309)
>     at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
>     ... 9 more
>
> I ended up having to fire up some new instances and reload the data
> (luckily this is my small instance, which I can reload quickly; I've got a
> large cassandra cluster currently loading which I will not be
> able to do this with, so I'm a little scared about that cluster).
>
> Anyway, any ideas? I've left the broken cluster up so I can
> investigate/patch/etc.
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <antho...@alumni.caltech.edu>

--
------------------------------------------------------------------------
Anthony Molinaro <antho...@alumni.caltech.edu>
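P.S. A note for anyone digging into this: both traces bottom out in java.io.RandomAccessFile.readInt(), which throws EOFException whenever fewer than four bytes remain before end-of-file. So the failure mode looks like a truncated (or mis-offset) SSTable index/bloom-filter block, not a logic error in the read path itself. A minimal JDK-only sketch of that mechanism (the file and class names here are made up for illustration; this is not Cassandra code):

```java
import java.io.EOFException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class TruncatedReadDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a truncated index file: only two bytes on disk
        // where readInt() expects four.
        Path f = Files.createTempFile("index", ".db");
        Files.write(f, new byte[] { 0x00, 0x01 });

        try (RandomAccessFile raf = new RandomAccessFile(f.toFile(), "r")) {
            int len = raf.readInt(); // fewer than 4 bytes before EOF -> throws
            System.out.println("read " + len);
        } catch (EOFException e) {
            // Same exception class both stack traces above report.
            System.out.println("EOFException on truncated read");
        } finally {
            Files.deleteIfExists(f);
        }
    }
}
```

If the on-disk index really is short like this, every reader that seeks into it (compaction or get_slice alike) would hit the same EOFException, which matches seeing it from both paths.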