Hi All,

I am new to Cassandra, so I apologise in advance if I have missed anything
obvious, but this one currently has me stumped.

I am currently running a 6 node Cassandra 2.1.1 cluster on EC2 using
c3.2xlarge nodes, which overall is working very well for us. However, after
letting it run for a while I seem to get into a situation where the amount
of disk space used far exceeds the total amount of data on each node, and I
haven't been able to get the size to go back down except by stopping and
restarting the node.

For example, almost all of my data is in one table. On one of my nodes
right now the total space used (as reported by nodetool cfstats) is 57.2 GB
and there are no snapshots. However, when I look at the size of the data
files (using du), the data file for that table is 107 GB. Because the
c3.2xlarge instances only have 160 GB of SSD, you can see why this quickly
becomes a problem.
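
For reference, this is roughly how I am comparing the two numbers (the
keyspace and table names below are placeholders for my actual ones, and the
data directory is just the default location):

  nodetool cfstats my_keyspace.my_table | grep "Space used"
  du -sh /var/lib/cassandra/data/my_keyspace/my_table-*

The "Space used" lines from cfstats are where I am reading the 57.2 GB
figure, and du on that table's directory is where the 107 GB comes from.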

Running nodetool compact didn't reduce the size, and neither did running
nodetool repair -pr on the node. I also tried nodetool flush and nodetool
cleanup (even though I have not added or removed any nodes recently), but
they didn't change anything either. In order to keep my cluster up I then
stopped and started that node, and the size of the data file dropped to
54 GB while the total column family size (as reported by nodetool) stayed
about the same.
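
In case it is useful, this is roughly the sequence of commands I tried
before resorting to the restart (again with placeholder keyspace and table
names):

  nodetool compact my_keyspace my_table
  nodetool repair -pr my_keyspace
  nodetool flush my_keyspace
  nodetool cleanup my_keyspace

None of these brought the on-disk size back in line with what cfstats
reports; only the restart did.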

Any suggestions as to what I could be doing wrong?

Thanks,
Nate
