Re: Truncate data from a single node
Thanks for the suggestions! Could altering the RF from 2 to 1 cause any issues, or will it basically just be changing the coordinator's write paths and also guiding future repairs/cleanups?

On Wed, Jul 12, 2017 at 22:29 Jeff Jirsa <jji...@apache.org> wrote:

> On 2017-07-11 20:09 (-0700), "Kevin O'Connor" <ke...@reddit.com.INVALID> wrote:
> > This might be an interesting question - but is there a way to truncate
> > data from just a single node or two as a test instead of truncating
> > from the entire cluster? We have time series data we don't really care
> > if we're missing gaps in, but it's taking up a huge amount of space and
> > we're looking to clear some. I'm worried if we run a truncate on this
> > huge CF it'll end up locking up the cluster, but I don't care so much
> > if it just kills a single node.
>
> IF YOU CAN TOLERATE DATA INCONSISTENCIES, You can stop a node, delete
> some sstables, and start it again. The risk in deleting arbitrary
> sstables is that you may remove a tombstone and bring data back to life,
> or remove the only replica with a write if you write at CL:ONE, but if
> you're OK with data going missing, you won't hurt much as long as you
> stop cassandra before you go killing sstables.
>
> TWCS does make this easier, because you can use sstablemetadata to
> identify timestamps/tombstone %s, and then nuke sstables that are
> old/mostly-expired first.
>
> > Is doing something like deleting SSTables from disk possible? If I
> > alter this keyspace from an RF of 2 down to 1 and then delete them,
> > they won't be able to be repaired if I'm thinking this through right.
>
> If you drop RF from 2 to 1, you can just run cleanup and delete half the
> data (though it'll rewrite sstables to do it, which will be a short term
> increase).
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
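To make the "drop RF from 2 to 1, then run cleanup" suggestion concrete, here is a toy sketch of which ranges each node holds before and after the change. It assumes a SimpleStrategy-like placement with one token per node (a range lives on its primary owner plus the next RF - 1 nodes clockwise); the node names and the helper `replicated_ranges` are made up for illustration, and real clusters with vnodes or NetworkTopologyStrategy place replicas differently.

```python
def replicated_ranges(nodes, rf):
    """Map each node to the primary owners whose ranges it stores.

    nodes is a list in ring (token) order. Assumption: a range is stored on
    its primary owner and on the next rf - 1 nodes clockwise.
    """
    held = {node: [] for node in nodes}
    for i, primary in enumerate(nodes):
        for step in range(rf):
            replica = nodes[(i + step) % len(nodes)]
            held[replica].append(primary)
    return held


ring = ["n1", "n2", "n3", "n4"]  # hypothetical four-node ring, one token each
before = replicated_ranges(ring, rf=2)
after = replicated_ranges(ring, rf=1)
for node in ring:
    # Ranges held at RF=2 but no longer owned at RF=1 are what cleanup removes.
    dropped = sorted(set(before[node]) - set(after[node]))
    print(f"{node}: keeps {after[node]}, cleanup can drop ranges of {dropped}")
```

In this model each node goes from holding two ranges to one, which matches the "delete half the data" estimate above. The old replica data stays on disk until cleanup is actually run on each node.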
Re: Truncate data from a single node
On 2017-07-11 20:09 (-0700), "Kevin O'Connor" <ke...@reddit.com.INVALID> wrote:
> This might be an interesting question - but is there a way to truncate
> data from just a single node or two as a test instead of truncating from
> the entire cluster? We have time series data we don't really care if
> we're missing gaps in, but it's taking up a huge amount of space and
> we're looking to clear some. I'm worried if we run a truncate on this
> huge CF it'll end up locking up the cluster, but I don't care so much if
> it just kills a single node.

IF YOU CAN TOLERATE DATA INCONSISTENCIES, You can stop a node, delete some sstables, and start it again. The risk in deleting arbitrary sstables is that you may remove a tombstone and bring data back to life, or remove the only replica with a write if you write at CL:ONE, but if you're OK with data going missing, you won't hurt much as long as you stop cassandra before you go killing sstables.

TWCS does make this easier, because you can use sstablemetadata to identify timestamps/tombstone %s, and then nuke sstables that are old/mostly-expired first.

> Is doing something like deleting SSTables from disk possible? If I alter
> this keyspace from an RF of 2 down to 1 and then delete them, they won't
> be able to be repaired if I'm thinking this through right.

If you drop RF from 2 to 1, you can just run cleanup and delete half the data (though it'll rewrite sstables to do it, which will be a short term increase).
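As a rough illustration of picking old sstables first, the sketch below lists the `-Data.db` files in one table directory that are older than a cutoff, oldest first, along with their sibling component files. It only reads the directory and deletes nothing. The directory path, the helper names, and the use of file modification time as a stand-in for data age are all assumptions; sstablemetadata remains the better source for real timestamps and tombstone ratios, and the node should be stopped before anything is removed.

```python
import os
import tempfile
import time
from pathlib import Path


def oldest_sstables(table_dir, older_than_days, now=None):
    """Return (data_file, age_days, size_bytes) tuples, oldest first.

    Only *-Data.db files directly inside table_dir are considered, so the
    snapshots/ and backups/ subdirectories are ignored.
    """
    now = time.time() if now is None else now
    cutoff = now - older_than_days * 86400
    found = []
    for path in Path(table_dir).glob("*-Data.db"):
        stat = path.stat()
        if stat.st_mtime < cutoff:
            age_days = (now - stat.st_mtime) / 86400
            found.append((path, age_days, stat.st_size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def sibling_components(data_file):
    """List every component file (Data, Index, Filter, ...) of one SSTable."""
    prefix = data_file.name[: -len("Data.db")]
    return sorted(p for p in data_file.parent.iterdir() if p.name.startswith(prefix))


if __name__ == "__main__":
    # Hypothetical path; substitute the real table directory on the node.
    table_dir = "/var/lib/cassandra/data/my_keyspace/my_table"
    if os.path.isdir(table_dir):
        for data_file, age_days, size in oldest_sstables(table_dir, older_than_days=30):
            print(f"{age_days:7.1f} days  {size:>14} bytes  {data_file.name}")
```

All components of an sstable share a prefix, so if a Data file is removed its Index, Filter, Statistics and other siblings should go with it, which is what `sibling_components` is meant to help with.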
Re: Truncate data from a single node
Hey Kevin,

I wouldn't worry that much about a truncate operation. It can quietly destroy all your data very efficiently. One thing you should know is that a snapshot is automatically created when you issue a truncate. Yes, an undelete if you screw up. Just don't be surprised when you find it.

Deleting SSTables is also valid. If you are using something like TWCS you can pick some files that are older and grouped together. Altering the keyspace to a different RF won't account for what keys are present in the SSTable. You could determine the keys in each file, but at this point it's getting much more complicated. Find some old SSTables for the table in question and delete them. Much easier.

Patrick

On Tue, Jul 11, 2017 at 8:09 PM, Kevin O'Connor <ke...@reddit.com.invalid> wrote:

> This might be an interesting question - but is there a way to truncate
> data from just a single node or two as a test instead of truncating from
> the entire cluster? We have time series data we don't really care if
> we're missing gaps in, but it's taking up a huge amount of space and
> we're looking to clear some. I'm worried if we run a truncate on this
> huge CF it'll end up locking up the cluster, but I don't care so much if
> it just kills a single node.
>
> Is doing something like deleting SSTables from disk possible? If I alter
> this keyspace from an RF of 2 down to 1 and then delete them, they won't
> be able to be repaired if I'm thinking this through right.
>
> Thanks!
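Since the goal here is to free disk space, it may help to see how much the automatic truncate snapshot is still holding. The sketch below sums file sizes per snapshot under a table's `snapshots/` directory. The table path and the helper name `snapshot_sizes` are assumptions for illustration. Snapshot files are hard links to the original sstables, so the number is the space that stays pinned until the snapshot is cleared, not extra space consumed on top of the live data.

```python
import os
import tempfile
from pathlib import Path


def snapshot_sizes(table_dir):
    """Map each snapshot name under table_dir/snapshots to its size in bytes."""
    sizes = {}
    snapshots = Path(table_dir) / "snapshots"
    if not snapshots.is_dir():
        return sizes
    for snap in snapshots.iterdir():
        if snap.is_dir():
            sizes[snap.name] = sum(
                f.stat().st_size for f in snap.rglob("*") if f.is_file()
            )
    return sizes


if __name__ == "__main__":
    # Hypothetical path; substitute the real table directory on the node.
    table_dir = "/var/lib/cassandra/data/my_keyspace/my_table"
    for name, size in sorted(snapshot_sizes(table_dir).items()):
        print(f"{size:>14} bytes  {name}")
```

If a truncate appears not to have freed anything, a large entry in this listing is the likely reason; the space comes back once that snapshot is removed (for example with nodetool clearsnapshot).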
Truncate data from a single node
This might be an interesting question - but is there a way to truncate data from just a single node or two as a test instead of truncating from the entire cluster? We have time series data we don't really care if we're missing gaps in, but it's taking up a huge amount of space and we're looking to clear some. I'm worried if we run a truncate on this huge CF it'll end up locking up the cluster, but I don't care so much if it just kills a single node.

Is doing something like deleting SSTables from disk possible? If I alter this keyspace from an RF of 2 down to 1 and then delete them, they won't be able to be repaired if I'm thinking this through right.

Thanks!