Hi,
I'll explain a bit: I'm working with Abhinav.
We have an application, earlier based on Lucene, which indexes a huge
volume of data and later uses the indices to fetch records and perform
fuzzy matching. We wanted to move to Cassandra primarily for its
sharding, availability, and no-single-point-of-failure properties, and
for its write speed. The application runs on an 8-core machine with 8
threads, each reading different files and writing to 3 different CFs:
- one to store the raw data, keyed by an ID; the ID is of the form
ThreadName-<counter> and is unique
- one to store a subset of the raw data (a small set of fields), keyed
by the same ID as before
- one to store the inverted index, keyed by a field value in the data,
with the IDs of all the records in which that field matched (see the
sketch after this list)
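To make the layout concrete, here is a minimal sketch of what each
writer thread effectively does, with plain in-memory maps standing in
for the three CFs. The field name "fieldA" and the class/method names
are placeholders for illustration; this is not our actual client code:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class WriterSketch {
        // Stand-ins for the three column families (row key -> columns).
        static final Map<String, Map<String, String>> rawCf = new ConcurrentHashMap<>();
        static final Map<String, Map<String, String>> subsetCf = new ConcurrentHashMap<>();
        // Inverted index: field value -> IDs of the records containing it.
        static final Map<String, Set<String>> invertedCf = new ConcurrentHashMap<>();

        static void write(Map<String, String> record, long counter) {
            // Unique row key of the form ThreadName-<counter>.
            String id = Thread.currentThread().getName() + "-" + counter;
            rawCf.put(id, record); // full raw row, keyed by the ID

            // "fieldA" is a placeholder for the indexed field.
            Map<String, String> subset = new HashMap<>();
            subset.put("fieldA", record.get("fieldA"));
            subsetCf.put(id, subset); // small set of fields, same key

            // Point the field value back at every record ID that carried it.
            invertedCf.computeIfAbsent(record.get("fieldA"),
                    k -> ConcurrentHashMap.newKeySet()).add(id);
        }
    }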
On the 8-core machine, with 8 threads, it took us approx. 20 minutes
to create the index store with a data set of 24M rows - about 20k
rows/sec. And this was for a single instance of Cassandra. The 480
seconds Abhinav mentioned earlier was for a smaller data set.
When we created a ring by adding another similar machine and
re-executed the application from scratch (consistency level = ONE),
the total time increased considerably - it actually doubled. And the
nodes were unbalanced, showing a 70-30 distribution of load (sometimes
even more skewed). Effectively, in the ring, it's taking much longer
and the data distribution is skewed. The same thing happened when we
tried the application on a collection of desktops (4 or 5 of them).
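One thing we suspect - an assumption on our part, not something we
have verified - is that we let both nodes pick their own tokens. Our
understanding is that with RandomPartitioner an even split needs the
initial tokens spaced at i * 2^127 / N; a small sketch of how we'd
compute them:

    import java.math.BigInteger;

    public class TokenCalc {
        public static void main(String[] args) {
            int nodeCount = 2; // nodes in the ring
            BigInteger ringSize = BigInteger.valueOf(2).pow(127);
            for (int i = 0; i < nodeCount; i++) {
                // token_i = i * 2^127 / nodeCount, evenly spaced on the ring
                BigInteger token = ringSize
                        .multiply(BigInteger.valueOf(i))
                        .divide(BigInteger.valueOf(nodeCount));
                System.out.println("node " + i + ": initial token = " + token);
            }
        }
    }

If that is indeed the cause, we believe nodetool move with the
computed token should rebalance an existing ring, but we'd appreciate
confirmation.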
We have faced another issue while doing this. We ran jstack on the
application and found an output similar to JIRA issue 1594 (which I
mentioned in an earlier mail) - and this is true for both the 0.6.8
and 0.7 versions. The CPU usage on the nodes never goes above 50-60%
(user+sys), while the disk busy time is quite high. When we were using
Lucene, the CPU usage was pretty high on all the cores (90% or more).
It may be that the usage has gone down because of the disk IO, but we
aren't completely sure about this.
We have a feeling that we aren't creating the cluster properly or have
missed certain important configuration aspects. We are using the
default configuration; changing the memtable throughput (in MB) didn't
have much effect.
Following is a snapshot of the cfstats output (for a data set of 2M rows):
Keyspace: fct_cdr
  Read Count: 277537
  Read Latency: 0.43607250564789557 ms.
  Write Count: 3781264
  Write Latency: 0.01323008708199163 ms.
  Pending Tasks: 0
    Column Family: RawCDR
    SSTable count: 1
    Space used (live): 719796067
    Space used (total): 1439605485
    Memtable Columns Count: 218459
    Memtable Data Size: 120398507
    Memtable Switch Count: 4
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 1203177
    Write Latency: 0.016 ms.
    Pending Tasks: 0
    Key cache capacity: 10000
    Key cache size: 0
    Key cache hit rate: NaN
    Row cache capacity: 1000
    Row cache size: 0
    Row cache hit rate: NaN
    Compacted row minimum size: 535
    Compacted row maximum size: 924
    Compacted row mean size: 642

    Column Family: Index
    SSTable count: 5
    Space used (live): 326960041
    Space used (total): 564423442
    Memtable Columns Count: 264507
    Memtable Data Size: 9443853
    Memtable Switch Count: 15
    Read Count: 178785
    Read Latency: 0.425 ms.
    Write Count: 1203177
    Write Latency: 0.012 ms.
    Pending Tasks: 0
    Key cache capacity: 10000
    Key cache size: 10000
    Key cache hit rate: 0.0
    Row cache capacity: 1000
    Row cache size: 1000
    Row cache hit rate: 0.0
    Compacted row minimum size: 215
    Compacted row maximum size: 310
    Compacted row mean size: 215

    Column Family: IndexInverse
    SSTable count: 3
    Space used (live): 164782651
    Space used (total): 164782651
    Memtable Columns Count: 289647
    Memtable Data Size: 12757041
    Memtable Switch Count: 3
    Read Count: 98950
    Read Latency: 0.457 ms.
    Write Count: 1201911
    Write Latency: 0.017 ms.
    Pending Tasks: 0
    Key cache capacity: 10000
    Key cache size: 10000
    Key cache hit rate: 0.0
    Row cache capacity: 1000
    Row cache size: 1000
    Row cache hit rate: 0.0
    Compacted row minimum size: 149
    Compacted row maximum size: 14237
    Compacted row mean size: 179
The write latency shown here doesn't look bad - if we read it right,
3.78M writes at ~0.013 ms each is only about 50 seconds of cumulative
server-side write time - but we need to confirm this. It may be that
the problem is something to do with the application and/or our
configuration.
Regards
Arijit
--
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."