Hi all, I am new to Cassandra, so I apologise in advance if I have missed anything obvious, but this one currently has me stumped.
I am currently running a six-node Cassandra 2.1.1 cluster on EC2 using C3.2XLarge nodes, which overall is working very well for us. However, after letting it run for a while, I get into a situation where the amount of disk space used far exceeds the total amount of data on each node, and I haven't been able to get the size back down except by stopping and restarting the node.

For example, almost all of my data is in a single table. On one of my nodes right now, the total space used for that table (as reported by nodetool cfstats) is 57.2 GB and there are no snapshots, yet the size of the data files on disk (measured with du) is 107 GB. Since the C3.2XLarge instances only have 160 GB of SSD, you can see why this quickly becomes a problem.

Running nodetool compact didn't reduce the size, and neither did running nodetool repair -pr on the node. I also tried nodetool flush and nodetool cleanup (even though I have not added or removed any nodes recently), but neither changed anything.

To keep my cluster up, I eventually stopped and started that node, and the size of the data files dropped to 54 GB while the total column family size (as reported by nodetool) stayed about the same.

Any suggestions as to what I could be doing wrong?

Thanks,
Nate