I expect that this problem was due to https://issues.apache.org/jira/browse/CASSANDRA-2216. I'll make noise to try to get it released soon as 0.7.3.
On Tue, Feb 22, 2011 at 5:41 AM, David Boxenhorn <da...@lookin2.com> wrote:
> Thanks, Shimi. I'll keep you posted if we make progress. Riptano is working
> on this problem too.
>
> On Tue, Feb 22, 2011 at 3:30 PM, shimi <shim...@gmail.com> wrote:
>
>> I didn't solve it.
>> Since it is a test cluster, I deleted all the data. I copied some sstables
>> from my production cluster and tried again; this time I didn't have this
>> problem.
>> I am planning on removing everything from this test cluster. I will start
>> all over again with 0.6.x, then I will load it with tens of GB of data (not
>> an sstable copy) and test the upgrade again.
>>
>> I made the mistake of not backing up the data files before I upgraded.
>>
>> Shimi
>>
>> On Tue, Feb 22, 2011 at 2:24 PM, David Boxenhorn <da...@lookin2.com> wrote:
>>
>>> Shimi,
>>>
>>> I am getting the same error that you report here. What did you do to
>>> solve it?
>>>
>>> David
>>>
>>>
>>> On Thu, Feb 10, 2011 at 2:54 PM, shimi <shim...@gmail.com> wrote:
>>>
>>>> I upgraded the version on all the nodes but I still get the exceptions.
>>>> I ran cleanup on one of the nodes but I don't think there is any cleanup
>>>> going on.
>>>>
>>>> Another weird thing that I see is:
>>>> INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353 CompactionIterator.java (line 135) Compacting large row 33353135373036383536323735333838383638303536303639313532313238373630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273 (725849473109 bytes) incrementally
>>>>
>>>> In my production version the largest row is 10259. It shouldn't be
>>>> different in this case.
>>>>
>>>> The first exception is being thrown on 3 nodes during compaction.
>>>> The second exception (Internal error processing get_range_slices) is
>>>> being thrown all the time by a fourth node. I disabled gossip and any client
>>>> traffic to it and I still get the exceptions.
>>>> Is it possible to boot a node with gossip disabled?
>>>>
>>>> Shimi
>>>>
>>>> On Thu, Feb 10, 2011 at 11:11 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>>>
>>>>> You should be able to repair it: install the new version and kick off
>>>>> nodetool repair.
>>>>>
>>>>> If you are uncertain, search for cassandra-1992 on the list; there has
>>>>> been some discussion. You can also wait till some peeps in the States wake
>>>>> up if you want to be extra sure.
>>>>>
>>>>> The number is the number of columns the iterator is going to return
>>>>> from the row. I'm guessing that because this is happening during compaction,
>>>>> it asked for the maximum possible number of columns.
>>>>>
>>>>> Aaron
>>>>>
>>>>>
>>>>>
>>>>> On 10 Feb 2011, at 21:37, shimi wrote:
>>>>>
>>>>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>>>>
>>>>> Out of curiosity, do you really have on the order of 1,986,622,313
>>>>> elements (I believe elements=keys) in the cf?
>>>>>
>>>>> Dan
>>>>>
>>>>> No. I was puzzled by the numbers too.
>>>>>
>>>>>
>>>>> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>>>>
>>>>>> Shimi,
>>>>>> You may be seeing the result of CASSANDRA-1992. Are you able to test
>>>>>> with the most recent 0.7 build?
>>>>>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>>>>>
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>> I will. I hope the data was not corrupted.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Feb 10, 2011 at 10:30 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>>>>
>>>>>> Shimi,
>>>>>> You may be seeing the result of CASSANDRA-1992. Are you able to test
>>>>>> with the most recent 0.7 build?
>>>>>> https://hudson.apache.org/hudson/job/Cassandra-0.7/
>>>>>>
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>> On 10 Feb 2011, at 13:42, Dan Hendry wrote:
>>>>>>
>>>>>> Out of curiosity, do you really have on the order of 1,986,622,313
>>>>>> elements (I believe elements=keys) in the cf?
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> *From:* shimi [mailto:shim...@gmail.com]
>>>>>> *Sent:* February-09-11 15:06
>>>>>> *To:* user@cassandra.apache.org
>>>>>> *Subject:* Exceptions on 0.7.0
>>>>>>
>>>>>> I have a 4-node test cluster where I am testing the port to 0.7.0 from 0.6.X.
>>>>>> On 3 out of the 4 nodes I get exceptions in the log.
>>>>>> I am using RP.
>>>>>> Changes that I made:
>>>>>> 1. changed the replication factor from 3 to 4
>>>>>> 2. configured the nodes to use Dynamic Snitch
>>>>>> 3. RR of 0.33
>>>>>>
>>>>>> I ran repair on 2 nodes before I noticed the errors. One of them is
>>>>>> having the first error and the other the second.
>>>>>> I restarted the nodes but I still get the exceptions.
>>>>>>
>>>>>> The following exception I get from 2 nodes:
>>>>>> WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java (line 84) Cannot provide an optimal Bloom Filter for 1986622313 elements (1/4 buckets per element).
>>>>>> ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
>>>>>> java.io.IOError: java.io.EOFException
>>>>>>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
>>>>>>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
>>>>>>     at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
>>>>>>     at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
>>>>>>     at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
>>>>>>     at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
>>>>>>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>>>>>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>>>>>     at com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
>>>>>>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>>>>>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>>>>>     at org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:76)
>>>>>>     at org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)
>>>>>>     at org.apache.cassandra.io.LazilyCompactedRow.<init>(LazilyCompactedRow.java:88)
>>>>>>     at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:136)
>>>>>>     at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
>>>>>>     at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
>>>>>>     at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
>>>>>>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>>>>>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>>>>>     at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
>>>>>>     at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
>>>>>>     at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
>>>>>>     at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
>>>>>>     at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>> Caused by: java.io.EOFException
>>>>>>     at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
>>>>>>     at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
>>>>>>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
>>>>>>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
>>>>>>     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:101)
>>>>>>     ... 29 more
>>>>>>
>>>>>>
>>>>>> On another node I get:
>>>>>>
>>>>>> ERROR [pool-1-thread-2] 2011-02-09 19:48:32,137 Cassandra.java (line 2876) Internal error processing get_range_slices
>>>>>> java.lang.RuntimeException: error reading 1 of 1970563183
>>>>>>     at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:82)
>>>>>>     at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:39)
>>>>>>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>>>>>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>>>>>     at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
>>>>>>     at org.apache.commons.collections.iterators.CollatingIterator.anyHasNext(CollatingIterator.java:364)
>>>>>>     at org.apache.commons.collections.iterators.CollatingIterator.hasNext(CollatingIterator.java:217)
>>>>>>     at org.apache.cassandra.db.RowIteratorFactory$3.getReduced(RowIteratorFactory.java:136)
>>>>>>     at org.apache.cassandra.db.RowIteratorFactory$3.getReduced(RowIteratorFactory.java:106)
>>>>>>     at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
>>>>>>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>>>>>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>>>>>     at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
>>>>>>     at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1294)
>>>>>>     at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:438)
>>>>>>     at org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:473)
>>>>>>     at org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:2868)
>>>>>>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
>>>>>>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>> Caused by: java.io.EOFException
>>>>>>     at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
>>>>>>     at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
>>>>>>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
>>>>>>     at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
>>>>>>     at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:78)
>>>>>>     ... 21 more
>>>>>>
>>>>>> Any idea what went wrong?
>>>>>> Shimi
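
A side note on the implausible numbers in the logs quoted in this thread. Both oversized counts (the 1986622313 Bloom filter elements and the "error reading 1 of 1970563183") turn out to be four printable ASCII bytes when written as a big-endian 32-bit integer. Assuming the count is read as a 4-byte big-endian int, that would be consistent with a reader picking up column data where it expected a count, though this is only a guess and does not by itself confirm CASSANDRA-1992 or CASSANDRA-2216 as the cause. The row key in the "Compacting large row" line is plain hex and can be decoded the same way. The Python sketch below is not Cassandra code; the helper names are made up for illustration, and the values are copied from the log lines.

```python
import struct

# Values copied from the log lines quoted in this thread.
BLOOM_FILTER_ELEMENTS = 1986622313  # "Cannot provide an optimal Bloom Filter for ..."
SLICE_COLUMN_COUNT = 1970563183     # "error reading 1 of ..."

# Row key from the "Compacting large row" line, rejoined across the mail wrap.
ROW_KEY_HEX = (
    "333531353730363835363237353338383836383035363036393135323132383"
    "73630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273"
)


def int_as_ascii(value: int) -> str:
    """Show the 4 bytes that a big-endian 32-bit read would have consumed (hypothetical helper)."""
    return struct.pack(">I", value).decode("ascii", errors="replace")


def hex_key_as_text(hex_key: str) -> str:
    """Decode a hex-printed row key back into text (hypothetical helper)."""
    return bytes.fromhex(hex_key).decode("ascii", errors="replace")


print(int_as_ascii(BLOOM_FILTER_ELEMENTS))  # visi
print(int_as_ascii(SLICE_COLUMN_COUNT))     # utho
print(hex_key_as_text(ROW_KEY_HEX))
```

The decoded key has the form digits, a colon, then readable text, so the key itself looks sane; it is the sizes and counts around it that look wrong.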