I may have found a trigger that is causing these problems. Has anyone seen
these compaction problems in 1.1? I ran scrub on all my 1.0 data to
convert it to 1.1 and fix level-manifest problems before I started running
1.1.

1st node:
ERROR [CompactionExecutor:281] 2013-02-06 23:56:16,183 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:281,1,main]
java.io.IOError: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
        at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
        at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:98)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112)
        ... 21 more

2nd node:
ERROR [CompactionExecutor:266] 2013-02-06 23:51:35,181 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:266,1,main]
java.io.IOError: java.io.EOFException
        at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
        at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
        at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
        at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
        at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
        at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
        at java.io.RandomAccessFile.readFully(Unknown Source)
        at java.io.RandomAccessFile.readFully(Unknown Source)
        at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:120)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
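
In case it helps anyone compare nodes, here is a rough sketch of how the root causes in a log like the ones above could be tallied. It only looks for lines starting with "Caused by:" and counts the exception class that follows. The function name count_root_causes is made up for illustration and is not part of Cassandra or any tool; the sample lines are copied from the traces above.

```python
import re
from collections import Counter

# Matches the exception class name at the start of a "Caused by:" line.
# Class names may contain dots and '$' (for nested classes).
CAUSED_BY = re.compile(r"^Caused by: ([\w.$]+)")


def count_root_causes(log_lines):
    """Count how often each exception class appears on a 'Caused by:' line.

    Hypothetical helper for illustration only; it does not understand
    anything about the log beyond the 'Caused by:' prefix.
    """
    counts = Counter()
    for line in log_lines:
        match = CAUSED_BY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0",
        "        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:98)",
        "Caused by: java.io.EOFException",
    ]
    for name, n in count_root_causes(sample).items():
        print(n, name)
```

To run it against a real log, pass the open file object in place of the sample list; the log path depends on the installation.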

On Wed, Feb 6, 2013 at 11:32 AM, Terry Cumaranatunge <cumar...@gmail.com> wrote:

> I've gotten timeouts on clients when using Cassandra 1.1.8 in a cluster of
> 12 nodes, but I don't see the same behavior when using Cassandra 1.0.10.
> So, to do a controlled experiment, the following was tried:
>
> 1. Started with Cassandra 1.0.10. Built a database by running our test
> tools against it
> 2. Ran workload to ensure no timeout problems were seen. Stopped the load
> 3. Upgraded only 2 of the 12 nodes in the cluster to 1.1.8. Ran scrub
> afterwards, as the documentation states, to convert sstables to the 1.1
> format and to fix level-manifest problems.
> 4. Started load back up
> 5. After some time, started seeing timeouts on the client for requests
> that go to the 1.1.8 nodes (i.e. requests sent to those nodes as the
> coordinator node)
>
> There appears to be a pattern in these timeouts: a large burst of them
> occurs every 10 minutes (on the 10-minute boundaries of the hour, like
> 10:10:XX, 10:20:YY, 10:30:ZZ, etc.). All clients see the timeouts from those
> two 1.1.8 nodes at the same exact time. The workload is not I/O bound at
> this point, and requests are not being dropped either, based on tpstats
> output. I don't see hinted handoff messages either, which I checked because
> I believe hinted handoff runs every 10 minutes. Key cache size is set to
> 2.7GB and memtable size is 1/3 of heap (2.7GB). The key cache memory usage
> is the same as in 1.0.10 based on the heap size calculator. There are no GC
> pauses or any type of heap pressure messages in the logs. This is with Java
> 1.6.0.38.
>
> Does anyone know of a periodic task in Cassandra 1.1 that runs every 10
> minutes and could explain this problem, or have any other ideas?
>
> Thanks
>
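
For anyone who wants to check the 10-minute pattern described above against their own client logs, a minimal sketch follows. It assumes the timeout timestamps have already been extracted and parsed into datetime objects; the function names, the 600-second period, and the 60-second bucket width are all illustrative choices, not anything taken from Cassandra. If the timeouts really do cluster on the 10-minute boundaries, most of the counts should land in the first bucket.

```python
from collections import Counter
from datetime import datetime


def seconds_past_boundary(ts, period_s=600):
    """Seconds elapsed since the most recent period boundary within the hour."""
    return (ts.minute * 60 + ts.second) % period_s


def histogram_by_offset(timestamps, period_s=600, bucket_s=60):
    """Bucket timestamps by how far they fall past a period boundary.

    Bucket 0 holds events within the first bucket_s seconds after a
    boundary (e.g. 10:10:00 to 10:10:59 for the defaults).
    """
    counts = Counter()
    for ts in timestamps:
        counts[seconds_past_boundary(ts, period_s) // bucket_s] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample timestamps, not taken from any real log.
    raw = [
        "2013-02-06 10:10:05",
        "2013-02-06 10:20:12",
        "2013-02-06 10:30:40",
        "2013-02-06 10:34:00",
    ]
    sample = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]
    for bucket, n in sorted(histogram_by_offset(sample).items()):
        print("minute %d after boundary: %d timeouts" % (bucket, n))
```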
