Re: What is your backup strategy for Cassandra?

2015-09-24 Thread Luigi Tagliamonte
Since I'm running on AWS, we wrote a script that takes a snapshot of each
column family and syncs it to S3; at the end of the script I also grab the
node's tokens and store them on S3.
For a restore I will use this procedure:
<http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html>
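
A script of the kind described above might look roughly like the following
Python sketch. It is not the script used in this thread: the helper names
(build_plan, save_ring), the S3 key layout, and the default data directory are
assumptions made for illustration. It shells out to nodetool and the AWS CLI,
and assumes snapshots live under a snapshots/<tag>/ directory inside each
table directory.

```python
import subprocess
from datetime import datetime, timezone


def build_plan(keyspace, tables, bucket, node, tag,
               data_dir="/var/lib/cassandra/data"):
    """Return the commands (argv lists) for one snapshot-and-upload run."""
    # One snapshot per table (column family), all sharing the same tag.
    plan = [["nodetool", "snapshot", "-t", tag, "-cf", table, keyspace]
            for table in tables]
    # Upload only the files that belong to this snapshot tag.
    plan.append([
        "aws", "s3", "sync",
        f"{data_dir}/{keyspace}",
        f"s3://{bucket}/{node}/{tag}/{keyspace}",
        "--exclude", "*",
        "--include", f"*/snapshots/{tag}/*",
    ])
    return plan


def save_ring(bucket, node, tag, run=subprocess.run):
    """Store the `nodetool ring` output (it lists tokens) next to the data."""
    ring = run(["nodetool", "ring"], capture_output=True, text=True,
               check=True).stdout
    # "-" makes `aws s3 cp` read the object body from standard input.
    run(["aws", "s3", "cp", "-", f"s3://{bucket}/{node}/{tag}/ring.txt"],
        input=ring, text=True, check=True)


if __name__ == "__main__":
    # Dry run: print the plan instead of executing it.
    tag = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    for cmd in build_plan("my_keyspace", ["users", "events"],
                          "my-backup-bucket", "node1", tag):
        print(" ".join(cmd))
```

Keeping the command list separate from its execution makes it easy to review
what a run would do before pointing it at a production node.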


-- 
Luigi
---
“The only way to get smarter is by playing a smarter opponent.”


Re: What is your backup strategy for Cassandra?

2015-09-21 Thread Sanjay Baronia
John,

Yes, the Trilio solution is private, and today it supports Cassandra running
in VMware and OpenStack environments. AWS support is on the roadmap. I will
reach out separately to give you a demo after the summit.

Thanks,

Sanjay
Sanjay Baronia
VP of Product & Solutions Management
Trilio Data
(c) 508-335-2306
sanjay.baro...@triliodata.com



Re: What is your backup strategy for Cassandra?

2015-09-18 Thread Maciek Sakrejda
On Thu, Sep 17, 2015 at 7:46 PM, Marc Tamsky wrote:

> This seems like an apt time to quote [1]:
>
> > Remember that you get 1 point for making a backup and 10,000 points for
> > restoring one.
>
> Restoring from backups is my goal.
>
> The commonly recommended tools (tablesnap, cassandra_snapshotter) all seem
> to leave the restore operation as a pretty complicated exercise for the
> operator.
>
> Do any include a working way to restore, on a different host, all of node
> X's data from backups to the correct directories, such that the restored
> files are in the proper places and the node restart method [2] "just works"?
>

As someone getting started with Cassandra, I'm very much interested in this
as well. It seems that, for the most part, folks rely on replication and
node replacement to recover from failures, and perhaps this is a testament
to how well that works, but as long as we're hauling out aphorisms, "RAID
is not a backup" seems to (partially) apply here too.

I'd love to hear more about how the community does restores, too. This
isn't complaining about shoddy tooling: this is trying to understand--and
hopefully, in time, improve--the status quo re: disaster recovery. E.g.,
given that tableslurp operates on a single table at a time, do people
normally just restore single tables? Is that used when there's filesystem
or disk corruption? Bugs? Other issues? Looking forward to learning more.

Thanks,
Maciek


Re: What is your backup strategy for Cassandra?

2015-09-18 Thread John Wong
On Fri, Sep 18, 2015 at 3:02 PM, Sanjay Baronia <
sanjay.baro...@triliodata.com> wrote:

>
> Will be at the Cassandra summit next week if any of you would like a demo.
>
>
>

Sanjay, is Trilio Data's work private? Unfortunately I will not attend the
Summit, but maybe Trilio can also talk about this in, say, a Cassandra
Planet blog post? I'd like to see a demo or get a little more technical. If
it is open source, that would be cool.

I didn't implement our solution, but it is currently based on full
snapshot copies to a remote server for storage using rsync (which only
transfers what is needed). On our remote server we have a complete backup
for every hour, so if you cd into the data directory you can see every
node's exact moment-in-time data, as if you were browsing the actual nodes.
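
One common way to keep a browsable full copy per hour while transferring only
changed files is rsync's --link-dest option, which hard-links files that are
unchanged since the previous copy. Whether our setup uses it is an assumption
on my part; the sketch below only builds the command, and the directory layout
and the hourly_rsync_command name are made up for illustration.

```python
from datetime import datetime, timedelta


def hourly_rsync_command(node, when, source_path, backup_root="/backups"):
    """Build the rsync command for the hourly copy taken at `when`."""
    hour = when.replace(minute=0, second=0, microsecond=0)
    previous = hour - timedelta(hours=1)
    stamp = "%Y-%m-%dT%H"
    dest = f"{backup_root}/{node}/{hour.strftime(stamp)}/"
    link_dest = f"{backup_root}/{node}/{previous.strftime(stamp)}/"
    return [
        "rsync", "-a",
        # Files unchanged since the previous hour become hard links.
        f"--link-dest={link_dest}",
        f"{node}:{source_path}/",
        dest,
    ]


if __name__ == "__main__":
    cmd = hourly_rsync_command("node1", datetime(2015, 9, 18, 20, 2),
                               "/var/lib/cassandra/data")
    print(" ".join(cmd))
```

Each hourly directory then looks like a full copy of the node's files, while
unchanged SSTables are stored only once on the backup server.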

We are an AWS shop, so we can further optimize our cost by using EBS
snapshots, which would let us shrink the volume (we currently provision
4000 GB, which is too much). We tried S3, and it is an okay solution; the
downsides are performance and not being able to quickly go back in time.
With EBS I can create a dozen volumes from the same snapshot, attach one to
each of my nodes, and cp -r the files over.
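
The EBS restore flow just described could be scripted roughly as follows. This
is a sketch under assumptions, not something we run: the restore_volumes
helper, the device name, and the instance-to-zone mapping are hypothetical.
It uses the AWS CLI subcommands create-volume, wait volume-available, and
attach-volume; mounting the volume and copying the files is left out.

```python
import json
import subprocess


def restore_volumes(snapshot_id, instances, device="/dev/sdf",
                    run=subprocess.run):
    """Create one volume per instance from a snapshot and attach it.

    `instances` maps instance ID -> availability zone; a volume must be
    created in the same zone as the instance it will be attached to.
    """
    attached = {}
    for instance_id, zone in instances.items():
        out = run(["aws", "ec2", "create-volume",
                   "--snapshot-id", snapshot_id,
                   "--availability-zone", zone,
                   "--output", "json"],
                  capture_output=True, text=True, check=True).stdout
        volume_id = json.loads(out)["VolumeId"]
        # Block until the new volume can be attached.
        run(["aws", "ec2", "wait", "volume-available",
             "--volume-ids", volume_id], check=True)
        run(["aws", "ec2", "attach-volume",
             "--volume-id", volume_id,
             "--instance-id", instance_id,
             "--device", device], check=True)
        attached[instance_id] = volume_id
    return attached
```

The run parameter exists so the function can be exercised with a stub before
it is allowed to create real, billable volumes.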

John



Re: What is your backup strategy for Cassandra?

2015-09-18 Thread Sanjay Baronia
Trilio Data provides an elegant backup and recovery solution for scale-out
Cassandra in VMware and OpenStack environments, with key highlights as follows:
- Discovers topology changes for accurate point-in-time backups
- Speeds recovery by an order of magnitude, as it takes an environmental and
  cluster-wide snapshot
- Eliminates maintenance of inherently error-prone script-based backups

Will be at the Cassandra summit next week if any of you would like a demo.

Regards,

Sanjay
Sanjay Baronia
VP of Product & Solutions Management
Trilio Data
(c) 508-335-2306
sanjay.baro...@triliodata.com




Re: What is your backup strategy for Cassandra?

2015-09-17 Thread Marc Tamsky
This seems like an apt time to quote [1]:

> Remember that you get 1 point for making a backup and 10,000 points for
> restoring one.

Restoring from backups is my goal.

The commonly recommended tools (tablesnap, cassandra_snapshotter) all seem
to leave the restore operation as a pretty complicated exercise for the
operator.

Do any include a working way to restore, on a different host, all of node
X's data from backups to the correct directories, such that the restored
files are in the proper places and the node restart method [2] "just works"?


On Thu, Sep 17, 2015 at 6:47 PM, Robert Coli wrote:

> tl;dr - tablesnap works. There are awkward aspects to its use, but if you
> are operating Cassandra in AWS it's probably the best off the shelf
> off-node backup.
>

Have folks here ever used tableslurp to restore a backup taken with
tablesnap?
How would you rate the difficulty of restore?

From my limited testing, tableslurp looks like it can only restore a single
table within a keyspace per execution.

I have hundreds of tables... so without automation around tableslurp, that
doesn't seem like a reliable path toward a full restore.

Perhaps someone has written a tool that drives tableslurp so it "just
works"?
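
The kind of driver I have in mind would be something like the Python sketch
below. It deliberately does not hard-code tableslurp's arguments: the command
is passed in as a template, and the template shown in the usage example is a
placeholder, not a real invocation. The function names are hypothetical.

```python
import shlex
import subprocess


def restore_commands(tables_by_keyspace, template):
    """Expand one per-table restore command for every keyspace/table pair."""
    return [
        shlex.split(template.format(keyspace=keyspace, table=table))
        for keyspace in sorted(tables_by_keyspace)
        for table in sorted(tables_by_keyspace[keyspace])
    ]


def restore_all(commands, run=subprocess.run):
    """Run every command in order; return the ones that exited non-zero."""
    failed = []
    for cmd in commands:
        if run(cmd).returncode != 0:
            failed.append(cmd)
    return failed


if __name__ == "__main__":
    # Placeholder template; substitute the real single-table restore command.
    template = "echo restore {keyspace} {table}"
    tables = {"app": ["users", "events"], "metrics": ["samples"]}
    for cmd in restore_commands(tables, template):
        print(cmd)
```

Collecting the failed commands instead of stopping at the first error matters
with hundreds of tables, since a rerun can then target only what is missing.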


[1] http://serverfault.com/a/277092/218999

[2]
http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_backup_noderestart_t.html


Re: What is your backup strategy for Cassandra?

2015-09-09 Thread Robert Coli
On Sun, Sep 6, 2015 at 12:32 AM, Gene wrote:

> I've seen quite a few blog posts here and there about various backup
> strategies.  I'm wondering if anyone on this list would be willing to share
> theirs.
>

https://github.com/JeremyGrosser/tablesnap


> Things I'm curious about:
>
> 1. Data size
>

Up to hundreds of gigs per node.


> 2. Frequency for full snapshots
>

Never/always (depends on your perspective).


> 3. Frequency for copying snapshots off of the Cassandra nodes
>

As SSTables are flushed.


> 4. Do you use the incremental backups feature
>

No.


> 5. Do you use commitlog archiving
>

No.


> 6. What method you use to copy data off of the cluster (e.g. NFS, rsync,
> rsync+ssh, etc)
>

S3 upload.


> 7. Do you compress your backups, if so how soon (e.g. compress backups
> older than N days)
>

My SSTables are already Snappy-compressed, so I am skeptical of any benefit
from re-compression.


> 8. Do you use any Off the Shelf scripts for your backups (e.g. tablesnap,
> cassandra_snapshotter, etc)
>

tablesnap


> 9. Do you utilise AWS for your backups, or do you keep it local (or
> offsite on your own hardware)
>

AWS.

tl;dr - tablesnap works. There are awkward aspects to its use, but if you
are operating Cassandra in AWS it's probably the best off the shelf
off-node backup.


What is your backup strategy for Cassandra?

2015-09-06 Thread Gene
Hello everyone,

I'm new to this mailing list, and still fairly new to Cassandra.  I'm a
systems administrator and have had a 3-node Cassandra cluster with a
replication factor of 3 running in Production for about a year now.  We
have about 200 GB of data per node currently.

Up until recently I have just been performing snapshots and clearing them
out as needed.  I recently implemented an automated process to perform
snapshots of our data and copy them off of our cluster via rsync+ssh.
Pretty soon I'll also be utilising the incremental backup feature for
sstables (cassandra.yaml:incremental_backups), and will be taking a look at
archiving for commitlog as well (commitlog_archiving.properties).
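
For anyone wondering what there is to copy once incremental backups are
enabled: the hard-linked SSTables end up in a backups/ directory inside each
table directory, alongside snapshots/<tag>/. The Python sketch below lists
both kinds of files under a data directory. The backup_files helper is just
an illustration I put together, not part of our current process.

```python
from pathlib import Path


def backup_files(data_dir):
    """Group backup files under a Cassandra data directory by kind."""
    root = Path(data_dir)
    found = {"snapshots": [], "incremental": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Expected layout: <keyspace>/<table>/<kind>/...
        parts = path.relative_to(root).parts
        if len(parts) >= 5 and parts[2] == "snapshots":
            # <keyspace>/<table>/snapshots/<tag>/<file>
            found["snapshots"].append(path)
        elif len(parts) >= 4 and parts[2] == "backups":
            # <keyspace>/<table>/backups/<file>
            found["incremental"].append(path)
    return found


if __name__ == "__main__":
    files = backup_files("/var/lib/cassandra/data")
    print(len(files["snapshots"]), "snapshot files")
    print(len(files["incremental"]), "incremental backup files")
```

A list like this could feed the rsync+ssh copy, and the incremental files can
be removed from the node once they are safely off the cluster.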

I've seen quite a few blog posts here and there about various backup
strategies.  I'm wondering if anyone on this list would be willing to share
theirs.

Things I'm curious about:

1. Data size
2. Frequency for full snapshots
3. Frequency for copying snapshots off of the Cassandra nodes
4. Do you use the incremental backups feature
5. Do you use commitlog archiving
6. What method you use to copy data off of the cluster (e.g. NFS, rsync,
rsync+ssh, etc)
7. Do you compress your backups, if so how soon (e.g. compress backups
older than N days)
8. Do you use any Off the Shelf scripts for your backups (e.g. tablesnap,
cassandra_snapshotter, etc)
9. Do you utilise AWS for your backups, or do you keep it local (or offsite
on your own hardware)
10. Anything else you'd like to add, especially if I missed something
important

I'm not asking for the best, perfect method for Cassandra backups. I'd just
like to see what others are doing and hopefully use some ideas to improve
our processes.

Thanks in advance for any responses, and sorry for the wall of text.

-Gene