If you are running a sequential repair (or have previously started one that is still running), Cassandra will still have file descriptors open for the files in the snapshot it is using for the repair operation.
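A minimal sketch of why the space is not reclaimed, assuming a POSIX filesystem (Linux/macOS; Windows refuses to unlink an open file). The file names `data.db` and `snapshot-data.db` are hypothetical stand-ins for a live SSTable and its snapshot hardlink. The data blocks are freed only when the inode has no names left *and* no process holds it open:

```python
import os
import tempfile


def demo_deleted_but_open():
    """Model the snapshot/compaction/clearsnapshot sequence with plain files.

    Returns (links_after_snapshot, links_after_clear, size_seen_via_fd).
    File names are hypothetical stand-ins, not real Cassandra paths.
    """
    with tempfile.TemporaryDirectory() as d:
        sstable = os.path.join(d, "data.db")             # stands in for a live SSTable
        snapshot = os.path.join(d, "snapshot-data.db")   # stands in for a snapshot hardlink

        with open(sstable, "wb") as f:
            f.write(b"x" * 4096)

        os.link(sstable, snapshot)                       # snapshot = second name for the same inode
        links_after_snapshot = os.stat(sstable).st_nlink

        os.unlink(sstable)                               # compaction removes the original name

        fd = os.open(snapshot, os.O_RDONLY)              # repair keeps the snapshot file open
        try:
            os.unlink(snapshot)                          # clearsnapshot removes the last name
            st = os.fstat(fd)
            links_after_clear = st.st_nlink              # no names left in the directory tree
            size_seen = st.st_size                       # but the data is still allocated
        finally:
            os.close(fd)                                 # only now can the filesystem free the blocks

        return links_after_snapshot, links_after_clear, size_seen


print(demo_deleted_but_open())
```

On a real node, `lsof +L1` lists open files whose link count has dropped below one, which is a quick way to check whether the Cassandra process is still holding deleted snapshot files.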
From http://www.datastax.com/dev/blog/repair-in-cassandra:

*Cassandra 1.2 introduced a new option to repair to help manage the problems caused by the nodes all repairing with each other at the same time; it is called a snapshot repair, or sequential repair. As of Cassandra 2.1, sequential repair is the default, and the old parallel repair is an option. Sequential repair has all of the nodes involved take a snapshot; the snapshot lives until the repair finishes, and then is removed. By taking a snapshot, repair can proceed in a serial fashion, such that only two nodes are ever comparing with each other at a time. This makes the overall repair process slower, but decreases the burden placed on the nodes, and means you have less impact on reads/writes to the system.*

On 16 March 2015 at 16:33, David Wahler <dwah...@indeed.com> wrote:

> On Mon, Mar 16, 2015 at 6:12 PM, Ben Bromhead <b...@instaclustr.com> wrote:
> > Cassandra will by default snapshot your data directory on the following
> > events:
> >
> > TRUNCATE and DROP schema events
> > when you run nodetool repair
> > when you run nodetool snapshot
> >
> > Snapshots are just hardlinks to existing SSTables, so the only disk space
> > they take up is for files that have since been compacted away. Disk space
> > for snapshots will be freed when the last link to the files is removed.
> > You can remove all snapshots in a cluster using nodetool clearsnapshot.
> >
> > Snapshots will fail if you are out of disk space (this is counterintuitive
> > to the above, but it is true), if you have not increased the number of
> > available file descriptors, or if there are permissions issues.
> >
> > Out of curiosity, how often are you running repair?
>
> Thanks for the information. We're running repair once per week, as
> recommended by the Datastax documentation. The repair is staggered to
> run on one machine at a time with the --partitioner-range option in
> order to spread out the load.
>
> Running "nodetool clearsnapshot" doesn't free up any space. I'm
> guessing that because the snapshot files have been deleted from the
> filesystem, Cassandra thinks the snapshots are already gone. But
> because it still has the file descriptors open, the disk space hasn't
> actually been reclaimed.

--
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr <http://twitter.com/instaclustr> | (650) 284 9692