We attempted a compaction to see if it would improve read performance (BTW: write performance is as expected, fast!). Here is the result, an ArrayIndexOutOfBoundsException:
 INFO 11:48:41,070 Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/DateIndex-e-7-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/FieldIndex-e-9-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/FieldIndex-e-10-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/test/cassandra/data/Logging/Messages-e-13-Data.db')]
ERROR 11:48:41,080 Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.lang.ArrayIndexOutOfBoundsException: 7
    at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:58)
    at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
    at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
    at java.util.concurrent.ConcurrentSkipListMap$ComparableUsingComparator.compareTo(ConcurrentSkipListMap.java:606)
    at java.util.concurrent.ConcurrentSkipListMap.doPut(ConcurrentSkipListMap.java:878)
    at java.util.concurrent.ConcurrentSkipListMap.putIfAbsent(ConcurrentSkipListMap.java:1893)
    at org.apache.cassandra.db.ColumnFamily.addColumn(ColumnFamily.java:218)
    at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:130)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:137)
    at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
    at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138)
    at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
    at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
    at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
    at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:312)
    at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
    at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Does any of that mean anything to anyone? (For context, a rough sketch of the two-step lookup from the quoted message is at the bottom of this mail.)

Thanks...

Bill-

On Thu, Feb 10, 2011 at 11:00 AM, Bill Speirs <bill.spe...@gmail.com> wrote:
> I have a 7-node setup with a replication factor of 1 and a read
> consistency of 1. I have two column families: Messages, which stores
> millions of rows with a UUID for the row key, and DateIndex, which
> stores thousands of rows with a String as the row key. I perform two
> look-ups per query:
>
> 1) Fetch the row from DateIndex that includes the date I'm looking
> for. This returns 1,000 columns whose column names are the UUIDs of
> the messages.
> 2) Do a multi-get (Hector client) using those 1,000 row keys from
> the first query.
>
> Query 1 is taking ~300ms to fetch 1,000 columns from a single row...
> respectable. However, query 2 is taking over 50s to perform 1,000 row
> look-ups! When I scale down to 100 row look-ups for query 2, the
> time scales in a similar fashion, down to 5s.
>
> Am I doing something wrong here? Taking 5s to look up 100 rows in a
> distributed hash table seems way too slow.
>
> Thoughts?
>
> Bill-
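
For context, here is roughly what the two-step lookup described in the
quoted message looks like (a minimal sketch against the Hector API; the
keyspace handle, the date-key format, and the serializer choices are
placeholders based on my description above, not the exact production code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.UUID;

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.serializers.UUIDSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.beans.Rows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSliceQuery;
    import me.prettyprint.hector.api.query.SliceQuery;

    public class MessageLookup {

        public static Rows<UUID, String, String> lookup(Keyspace keyspace, String dateKey) {
            // Step 1: read up to 1,000 TimeUUID column names from one DateIndex row.
            SliceQuery<String, UUID, String> indexQuery = HFactory.createSliceQuery(
                    keyspace, StringSerializer.get(), UUIDSerializer.get(), StringSerializer.get());
            indexQuery.setColumnFamily("DateIndex");
            indexQuery.setKey(dateKey);                   // e.g. "2011-02-10" (placeholder format)
            indexQuery.setRange(null, null, false, 1000); // full slice, capped at 1,000 columns

            List<HColumn<UUID, String>> indexCols = indexQuery.execute().get().getColumns();
            List<UUID> messageKeys = new ArrayList<UUID>(indexCols.size());
            for (HColumn<UUID, String> col : indexCols) {
                messageKeys.add(col.getName());           // column name = message row key
            }

            // Step 2: multi-get the Messages rows keyed by those UUIDs.
            MultigetSliceQuery<UUID, String, String> messageQuery = HFactory.createMultigetSliceQuery(
                    keyspace, UUIDSerializer.get(), StringSerializer.get(), StringSerializer.get());
            messageQuery.setColumnFamily("Messages");
            messageQuery.setKeys(messageKeys.toArray(new UUID[messageKeys.size()]));
            messageQuery.setRange(null, null, false, Integer.MAX_VALUE); // all columns per row

            return messageQuery.execute().get();          // this is the call taking ~50s
        }
    }

Step 1 is the ~300ms query; the multiget in step 2 is the call behind the
50s (1,000 keys) and 5s (100 keys) numbers above.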