Troubleshooting IO performance?

Philippe Sat, 04 Jun 2011 15:35:09 -0700

Hello,
I am evaluating Cassandra and I'm running into some strange IO behavior
that I can't explain. I'd like some help or ideas to troubleshoot it.


I am running a 1-node cluster with a keyspace consisting of two column
families, one of which has dozens of supercolumns, each containing dozens
of columns.
All in all, this is a couple of gigabytes of data, 12GB on the hard drive.
The hardware is pretty good: 16GB memory + RAID-0 SSD drives with LVM and
an i5 processor (4 cores).

Keyspace: xxxxxxxxxxxxxxxxxxx
        Read Count: 460754852
        Read Latency: 1.108205793092766 ms.
        Write Count: 30620665
        Write Latency: 0.01411020877567486 ms.
        Pending Tasks: 0
                Column Family: xxxxxxxxxxxxxxxxxxxxxxxxxx
                SSTable count: 5
                Space used (live): 548700725
                Space used (total): 548700725
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 11
                Read Count: 2891192
                Read Latency: NaN ms.
                Write Count: 3157547
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 367396
                Key cache size: 367396
                Key cache hit rate: NaN
                Row cache capacity: 112683
                Row cache size: 112683
                Row cache hit rate: NaN
                Compacted row minimum size: 125
                Compacted row maximum size: 924
                Compacted row mean size: 172

                Column Family: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
                SSTable count: 7
                Space used (live): 8707538781
                Space used (total): 8707538781
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 30
                Read Count: 457863660
                Read Latency: 2.381 ms.
                Write Count: 27463118
                Write Latency: NaN ms.
                Pending Tasks: 0
                Key cache capacity: 4518387
                Key cache size: 4518387
                Key cache hit rate: 0.9247881700850826
                Row cache capacity: 1349682
                Row cache size: 1349682
                Row cache hit rate: 0.39400533823415573
                Compacted row minimum size: 125
                Compacted row maximum size: 6866
                Compacted row mean size: 165
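
As a rough cross-check of the figures above, here is a minimal Python sketch that adds up the live SSTable space and estimates how many reads went past the row cache. The names `cf_small` and `cf_large` are hypothetical placeholders for the two masked column families. The estimate assumes the reported hit rate applied uniformly to every read counted, which is probably not true, so treat the result as an order of magnitude only.

```python
# Figures copied from the cfstats output above.
# "cf_small" and "cf_large" are placeholder names (the real ones are masked).
column_families = {
    "cf_small": {
        "space_live": 548700725,       # bytes
        "reads": 2891192,
        "row_cache_hit_rate": None,    # reported as NaN
    },
    "cf_large": {
        "space_live": 8707538781,      # bytes
        "reads": 457863660,
        "row_cache_hit_rate": 0.39400533823415573,
    },
}


def total_live_bytes(cfs):
    """Sum the live SSTable space over all column families."""
    return sum(cf["space_live"] for cf in cfs.values())


def reads_past_row_cache(cf):
    """Estimate reads not served by the row cache.

    Assumes the hit rate held for every counted read (a simplification).
    Returns None when no hit rate was reported.
    """
    rate = cf["row_cache_hit_rate"]
    if rate is None:
        return None
    return round(cf["reads"] * (1.0 - rate))


if __name__ == "__main__":
    total = total_live_bytes(column_families)
    print(f"live SSTable data: {total} bytes ({total / 1e9:.2f} GB)")
    for name, cf in column_families.items():
        print(name, "reads past row cache:", reads_past_row_cache(cf))
```

The live total comes out below the 12GB seen on disk; the sketch does not try to explain the difference.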

My app makes a bunch of requests using a MultigetSuperSliceQuery for a set
of keys, typically a couple dozen at most. It also selects a subset of the
supercolumns. I am running 8 requests in parallel at most.
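
The access pattern described above can be sketched roughly as follows. This is an illustration only, not the actual client code: `fetch_batch` is a hypothetical stand-in for one multiget super-slice call, and the batch size of 24 is an assumed value for "a couple dozen". Running the same pass several times and comparing timings is one simple way to see whether repeated queries get faster.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 8   # from the post: at most 8 requests in parallel
BATCH_SIZE = 24    # assumed value for "a couple dozen" keys per request


def chunk(keys, size):
    """Split keys into consecutive batches of at most `size` items."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def fetch_batch(keys):
    """Hypothetical stand-in for one multiget call; replace with a real query."""
    time.sleep(0.001)  # simulated latency
    return {key: {} for key in keys}


def timed_fetch(keys):
    """Run one batch and return (rows returned, elapsed seconds)."""
    start = time.perf_counter()
    rows = fetch_batch(keys)
    return len(rows), time.perf_counter() - start


def run_pass(keys):
    """Fetch all keys in batches, with at most MAX_PARALLEL batches in flight."""
    batches = chunk(keys, BATCH_SIZE)
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        return list(pool.map(timed_fetch, batches))


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(100)]
    for attempt in range(3):
        results = run_pass(keys)
        rows = sum(n for n, _ in results)
        slowest = max(t for _, t in results)
        print(f"pass {attempt}: {rows} rows, slowest batch {slowest * 1000:.1f} ms")
```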


Two days ago, I ran a 1.5-hour process that basically read every key. The
server had no IO waits and everything was humming along. However, right at
the end of the process, there was a huge spike in IO. I didn't think much
of it.

Today, after two days of inactivity, any query I run pushes the SSD drives
to 80% IO utilization, even though I'm running the same query over and over
(is the cache not being used?).

Any ideas on how to troubleshoot this or, better, how to solve it?
Thanks,

Philippe
