Hi Ian,

Thanks for the suggestion, but I had actually already done that before the scenario I described (to get myself some free space), and when I ran nodetool cfstats it listed 0 snapshots as expected, so unfortunately I don't think that is where my space went.
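(For anyone following along, this is roughly how you can cross-check nodetool's snapshot count against the disk itself; the data-directory path below is the common default and is an assumption, so adjust it for your install.)

```shell
# Assumed default Cassandra data directory; adjust for your install.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"

# Snapshots live under each table directory as <keyspace>/<table>/snapshots.
# This lists any remaining snapshot directories and their sizes;
# no du output means none are left on disk.
du -sh "$DATA_DIR"/*/*/snapshots 2>/dev/null || echo "no snapshots found"
```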
One additional piece of information I forgot to point out is that when I ran nodetool status on the node it included all 6 nodes. I have also heard it mentioned that I may want a prime number of nodes, which may help protect against split-brain. Is this true? If so, does it still apply when I am using vnodes?

Thanks again,
Nate

--
*Nathanael Yoder*
Principal Engineer & Data Scientist, Whistle
415-944-7344 // n...@whistle.com

On Tue, Dec 9, 2014 at 7:42 AM, Ian Rose <ianr...@fullstory.com> wrote:

> Try `nodetool clearsnapshot`, which will delete any snapshots you have. I
> have never taken a snapshot with nodetool, yet I recently found several
> snapshots on my disk (which can take a lot of space). So perhaps they are
> automatically generated by some operation? No idea. Regardless, nuking
> those freed up a ton of space for me.
>
> - Ian
>
> On Mon, Dec 8, 2014 at 8:12 PM, Nate Yoder <n...@whistle.com> wrote:
>
>> Hi All,
>>
>> I am new to Cassandra, so I apologise in advance if I have missed
>> anything obvious, but this one currently has me stumped.
>>
>> I am currently running a 6-node Cassandra 2.1.1 cluster on EC2 using
>> c3.2xlarge nodes, which overall is working very well for us. However,
>> after letting it run for a while I get into a situation where the amount
>> of disk space used far exceeds the total amount of data on each node,
>> and I haven't been able to get the size to go back down except by
>> stopping and restarting the node.
>>
>> For example, almost all of my data is in one table. On one of my nodes
>> right now the total space used (as reported by nodetool cfstats) is
>> 57.2 GB and there are no snapshots. However, when I look at the size of
>> the data files (using du), the data file for that table is 107 GB.
>> Because a c3.2xlarge has only 160 GB of SSD, you can see why this
>> quickly becomes a problem.
>>
>> Running nodetool compact didn't reduce the size, and neither did running
>> nodetool repair -pr on the node. I also tried nodetool flush and
>> nodetool cleanup (even though I have not added or removed any nodes
>> recently), but neither changed anything. In order to keep my cluster up,
>> I then stopped and started that node, and the size of the data file
>> dropped to 54 GB while the total column family size (as reported by
>> nodetool) stayed about the same.
>>
>> Any suggestions as to what I could be doing wrong?
>>
>> Thanks,
>> Nate
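(Editor's note for readers hitting the same 57.2 GB-vs-107 GB mismatch: the useful measurement is live SSTable size with snapshot size subtracted out, since du of the table directory lumps them together. Below is a minimal sketch of that subtraction using a throwaway mock directory layout; the keyspace, table, and snapshot names are invented for illustration. Point TABLE_DIR at a real table directory, e.g. /var/lib/cassandra/data/<keyspace>/<table>, to run it against actual data.)

```shell
# Mock table directory standing in for a real one; replace with your own.
TABLE_DIR="$(mktemp -d)"
mkdir -p "$TABLE_DIR/snapshots/example-snapshot"   # hypothetical snapshot name

# Create a fake 200 KB live SSTable and a 200 KB snapshot copy of it.
dd if=/dev/urandom of="$TABLE_DIR/ks-table-ka-1-Data.db" bs=1024 count=200 2>/dev/null
dd if=/dev/urandom of="$TABLE_DIR/snapshots/example-snapshot/ks-table-ka-1-Data.db" bs=1024 count=200 2>/dev/null

total_kb=$(du -sk "$TABLE_DIR" | cut -f1)           # live SSTables + snapshots
snap_kb=$(du -sk "$TABLE_DIR/snapshots" | cut -f1)  # snapshots only
live_kb=$((total_kb - snap_kb))                     # what cfstats should roughly match
echo "total=${total_kb}K snapshots=${snap_kb}K live=${live_kb}K"

rm -rf "$TABLE_DIR"   # clean up the mock layout
```

If the live figure still far exceeds what nodetool cfstats reports after ruling snapshots out, as in the thread above, the extra space is in SSTable files that Cassandra no longer counts as live but has not yet released on disk.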