Hi,
I have a small 2-node Cassandra cluster that seems to be constrained by
read throughput. There are about 100 writes/s and 60 reads/s, mostly against
a skinny column family. Here are the cfstats for that family:
SSTable count: 13
Space used (live): 231920026568
Space used (total): 231920026568
Number of Keys (estimate): 356899200
Memtable Columns Count: 1385568
Memtable Data Size: 359155691
Memtable Switch Count: 26
Read Count: 40705879
Read Latency: 25.010 ms.
Write Count: 9680958
Write Latency: 0.036 ms.
Pending Tasks: 0
Bloom Filter False Postives: 28380
Bloom Filter False Ratio: 0.00360
Bloom Filter Space Used: 874173664
Compacted row minimum size: 61
Compacted row maximum size: 152321
Compacted row mean size: 1445
iostat shows almost no write activity. Here's a typical line:
Device:  rrqm/s  wrqm/s     r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sdb        0.00    0.00  312.87  0.00   6.61   0.00     43.27     23.35  105.06   2.28  71.19
Also, nodetool tpstats always shows pending tasks in the ReadStage. The data
set has grown beyond physical memory (250GB/node with 64GB of RAM), so I know
disk access is required, but are there particular settings I should
experiment with that could help relieve some read I/O pressure? I already
put memcached in front of Cassandra, so the row cache probably won't help
much.
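
For reference, here is a rough back-of-the-envelope pass over the numbers
above, as a small Python sketch. The function and variable names are made up
for illustration. It assumes iostat reports avgrq-sz in 512-byte sectors and
that rMB/s means MiB/s, and it treats the single iostat line as
representative, which it may not be.

```python
# Inputs copied from the iostat line and the sizes quoted in this message.
READS_PER_S = 312.87         # r/s
READ_MB_PER_S = 6.61         # rMB/s (assumed to be MiB/s)
AVG_REQUEST_SECTORS = 43.27  # avgrq-sz (assumed 512-byte sectors)
DATA_GB_PER_NODE = 250
RAM_GB_PER_NODE = 64


def avg_read_kb(mb_per_s, reads_per_s):
    """Average size of one disk read in KiB, derived from throughput."""
    return mb_per_s * 1024 / reads_per_s


def sectors_to_kb(sectors, sector_bytes=512):
    """Convert a request size in sectors to KiB."""
    return sectors * sector_bytes / 1024


def data_to_ram_ratio(data_gb, ram_gb):
    """How many times larger the on-disk data is than physical memory."""
    return data_gb / ram_gb


if __name__ == "__main__":
    print("avg read size (from throughput): %.1f KiB"
          % avg_read_kb(READ_MB_PER_S, READS_PER_S))
    print("avg read size (from avgrq-sz):   %.1f KiB"
          % sectors_to_kb(AVG_REQUEST_SECTORS))
    print("data / RAM per node:             %.1fx"
          % data_to_ram_ratio(DATA_GB_PER_NODE, RAM_GB_PER_NODE))
```

Both size estimates come out at roughly 21.6 KiB per disk read, against a
mean compacted row size of 1445 bytes, and the data is about 3.9 times the
size of RAM on each node.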
Also, this column family stores smallish documents (usually 1-100K) along
with metadata. The document is only occasionally accessed; usually only the
metadata is read or written. Would splitting the document out into a separate
column family help?
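
To make the proposed split concrete, here is a toy Python model. Plain dicts
stand in for the two column families; the names doc_meta and doc_body and the
helper functions are hypothetical, and this is not a Cassandra client API. It
only illustrates the access pattern, not whether the split actually helps.

```python
# Toy model: each dict stands in for one column family, keyed by row key.
doc_meta = {}  # small metadata columns, read and written frequently
doc_body = {}  # document payload, read only occasionally


def write_document(key, metadata, body):
    """Store metadata and payload under the same row key, in separate maps."""
    doc_meta[key] = dict(metadata)
    doc_body[key] = body


def read_metadata(key):
    """Common case: touches only the small metadata row."""
    return doc_meta[key]


def read_document(key):
    """Occasional case: a second lookup fetches the payload as well."""
    return doc_meta[key], doc_body[key]
```

In this layout the frequent metadata reads never touch the payload, at the
cost of a second lookup whenever the full document is needed.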
Thanks
Kireet