Re: Truncate data from a single node

2017-07-12 Thread Kevin O'Connor
Thanks for the suggestions! Could altering the RF from 2 to 1 cause any
issues, or will it basically just be changing the coordinator's write paths
and also guiding future repairs/cleans?

On Wed, Jul 12, 2017 at 22:29 Jeff Jirsa <jji...@apache.org> wrote:

>
>
> On 2017-07-11 20:09 (-0700), "Kevin O'Connor" <ke...@reddit.com.INVALID>
> wrote:
> > This might be an interesting question - but is there a way to truncate
> > data from just a single node or two as a test instead of truncating from
> > the entire cluster? We have time series data we don't really care if we're
> > missing gaps in, but it's taking up a huge amount of space and we're
> > looking to clear some. I'm worried if we run a truncate on this huge CF
> > it'll end up locking up the cluster, but I don't care so much if it just
> > kills a single node.
> >
>
> IF YOU CAN TOLERATE DATA INCONSISTENCIES, you can stop a node, delete some
> sstables, and start it again. The risk in deleting arbitrary sstables is
> that you may remove a tombstone and bring data back to life, or remove the
> only replica of a write if you write at CL:ONE, but if you're OK with
> data going missing, you won't hurt much as long as you stop Cassandra
> before you go killing sstables.
>
> TWCS does make this easier, because you can use sstablemetadata to
> identify timestamps/tombstone %s, and then nuke sstables that are
> old/mostly-expired first.
>
>
> > Is doing something like deleting SSTables from disk possible? If I alter
> > this keyspace from an RF of 2 down to 1 and then delete them, they won't be
> > able to be repaired if I'm thinking this through right.
> >
>
> If you drop RF from 2 to 1, you can just run cleanup and delete half the
> data (though it'll rewrite sstables to do it, which will cause a short-term
> increase in disk usage).
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: Truncate data from a single node

2017-07-12 Thread Jeff Jirsa


On 2017-07-11 20:09 (-0700), "Kevin O'Connor" <ke...@reddit.com.INVALID> wrote: 
> This might be an interesting question - but is there a way to truncate data
> from just a single node or two as a test instead of truncating from the
> entire cluster? We have time series data we don't really care if we're
> missing gaps in, but it's taking up a huge amount of space and we're
> looking to clear some. I'm worried if we run a truncate on this huge CF
> it'll end up locking up the cluster, but I don't care so much if it just
> kills a single node.
> 

IF YOU CAN TOLERATE DATA INCONSISTENCIES, you can stop a node, delete some
sstables, and start it again. The risk in deleting arbitrary sstables is that
you may remove a tombstone and bring data back to life, or remove the only
replica of a write if you write at CL:ONE, but if you're OK with data going
missing, you won't hurt much as long as you stop Cassandra before you go
killing sstables.

TWCS does make this easier, because you can use sstablemetadata to identify 
timestamps/tombstone %s, and then nuke sstables that are old/mostly-expired 
first.
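
To make the sstablemetadata step above concrete, here is a minimal Python sketch that parses the tool's text output and flags SSTables that look like good candidates to remove first. The field labels are the ones sstablemetadata is assumed to print on Cassandra 3.x; the output format differs between versions, so check the regular expressions against your own output. The function names, the 0.8 threshold, and the assumption that write timestamps are in microseconds are illustrative choices and do not come from the thread.

```python
import re

# Labels as assumed for Cassandra 3.x sstablemetadata output; other
# versions may word or format these lines differently.
_FIELDS = {
    "min_timestamp": (r"^Minimum timestamp:\s*(-?\d+)", int),
    "max_timestamp": (r"^Maximum timestamp:\s*(-?\d+)", int),
    "droppable_tombstones": (
        r"^Estimated droppable tombstones:\s*([0-9.]+(?:[eE][-+]?\d+)?)",
        float,
    ),
}


def parse_sstablemetadata(text):
    """Extract a few fields from sstablemetadata output; missing ones are None."""
    result = {}
    for key, (pattern, convert) in _FIELDS.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        result[key] = convert(match.group(1)) if match else None
    return result


def is_removal_candidate(meta, cutoff_micros, min_droppable=0.8):
    """True if every write is older than the cutoff, or the SSTable is
    estimated to be mostly droppable (expired or tombstoned) data."""
    max_ts = meta.get("max_timestamp")
    droppable = meta.get("droppable_tombstones")
    too_old = max_ts is not None and max_ts < cutoff_micros
    mostly_expired = droppable is not None and droppable >= min_droppable
    return too_old or mostly_expired
```

Feed it the output of sstablemetadata for each Data.db file and review the flagged files by hand before deleting anything; the sketch only classifies, it never touches the disk.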


> Is doing something like deleting SSTables from disk possible? If I alter
> this keyspace from an RF of 2 down to 1 and then delete them, they won't be
> able to be repaired if I'm thinking this through right.
> 

If you drop RF from 2 to 1, you can just run cleanup and delete half the data
(though it'll rewrite sstables to do it, which will cause a short-term increase
in disk usage).
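
The RF change followed by cleanup can be written down as an ordered plan. The Python sketch below only builds the commands and runs nothing. The keyspace name and host names are hypothetical, and it assumes the keyspace uses SimpleStrategy, which the thread does not state; a keyspace on NetworkTopologyStrategy would need per-datacenter replication settings instead.

```python
def rf_reduction_plan(keyspace, nodes, new_rf=1):
    """Return (where, command) steps for lowering RF and reclaiming space.

    Dry run only: the commands are returned as strings, never executed.
    """
    if new_rf < 1:
        raise ValueError("replication factor must be at least 1")
    # Assumes SimpleStrategy; adjust the replication map for other strategies.
    alter = (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {new_rf}}};"
    )
    steps = [("cqlsh", alter)]
    # Cleanup is a per-node operation. Listing nodes one at a time reflects
    # that it rewrites sstables and temporarily needs extra disk on each node.
    for host in nodes:
        steps.append((host, f"nodetool cleanup {keyspace}"))
    return steps
```

For example, `rf_reduction_plan("timeseries", ["node1", "node2"])` yields the ALTER KEYSPACE statement followed by one `nodetool cleanup timeseries` entry per node.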





Re: Truncate data from a single node

2017-07-11 Thread Patrick McFadin
Hey Kevin,

I wouldn't worry that much about a truncate operation. It can quietly destroy
all your data very efficiently. One thing you should know is that a
snapshot is automatically created when you issue a truncate. Yes. An
undelete if you screw up. Just don't be surprised when you find it.

Deleting SSTables is also valid. If you are using something like TWCS you
can pick some files that are older and grouped together. Altering the
keyspace to a different RF won't account for what keys are present in the
SSTable. You could determine the keys in each file, but at this point it's
getting much more complicated.

Find some old SSTables for the table in question and delete them. Much
easier.
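
A minimal Python sketch of "find some old SSTables", assuming the file modification time of each Data.db file is a good enough proxy for data age (reasonable with TWCS, less so otherwise). It groups every component file that shares a Data.db file's name prefix so that an SSTable is always handled as a whole, and it only lists files. The function name and the directory layout are illustrative; any actual deletion should happen with Cassandra stopped on that node.

```python
from pathlib import Path

DATA_SUFFIX = "Data.db"


def oldest_sstables(table_dir, count):
    """Return one list of component files per SSTable for the `count`
    oldest SSTables in table_dir, oldest first (age = Data.db mtime)."""
    table_path = Path(table_dir)
    # Non-recursive glob, so snapshots/ and backups/ subdirectories are skipped.
    data_files = sorted(
        table_path.glob("*-" + DATA_SUFFIX),
        key=lambda p: p.stat().st_mtime,
    )
    groups = []
    for data_file in data_files[:count]:
        # Everything before "Data.db" identifies the SSTable, e.g. "mc-12-big-".
        prefix = data_file.name[: -len(DATA_SUFFIX)]
        components = sorted(
            p for p in table_path.iterdir()
            if p.is_file() and p.name.startswith(prefix)
        )
        groups.append(components)
    return groups
```

Printing the returned groups gives a review list; removing only some components of an SSTable (say, Data.db without its Index.db) is what this grouping is meant to prevent.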

Patrick

On Tue, Jul 11, 2017 at 8:09 PM, Kevin O'Connor <ke...@reddit.com.invalid>
wrote:

> This might be an interesting question - but is there a way to truncate
> data from just a single node or two as a test instead of truncating from
> the entire cluster? We have time series data we don't really care if we're
> missing gaps in, but it's taking up a huge amount of space and we're
> looking to clear some. I'm worried if we run a truncate on this huge CF
> it'll end up locking up the cluster, but I don't care so much if it just
> kills a single node.
>
> Is doing something like deleting SSTables from disk possible? If I alter
> this keyspace from an RF of 2 down to 1 and then delete them, they won't be
> able to be repaired if I'm thinking this through right.
>
> Thanks!
>


Truncate data from a single node

2017-07-11 Thread Kevin O'Connor
This might be an interesting question - but is there a way to truncate data
from just a single node or two as a test instead of truncating from the
entire cluster? We have time series data we don't really care if we're
missing gaps in, but it's taking up a huge amount of space and we're
looking to clear some. I'm worried if we run a truncate on this huge CF
it'll end up locking up the cluster, but I don't care so much if it just
kills a single node.

Is doing something like deleting SSTables from disk possible? If I alter
this keyspace from an RF of 2 down to 1 and then delete them, they won't be
able to be repaired if I'm thinking this through right.

Thanks!