cassandra backup

Marcelo Elias Del Valle Fri, 06 Dec 2013 05:15:01 -0800

Hello everyone,

    I am trying to create backups of my data on AWS. My goal is to store
the backups on S3 or glacier, as it's cheap to store this kind of data. So,
if I have a cluster with N nodes, I would like to copy data from all N
nodes to S3 and be able to restore later. I know Priam does that (we were
using it), but I am using the latest cassandra version and we plan to use
DSE some time, I am not sure Priam fits this case.
    I took a look at the docs:
http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_backup_takes_snapshot_t.html


    And I am trying to understand if it's really needed to take a snapshot
to create my backup. Suppose I do a flush and copy the sstables from each
node, 1 by one, to s3. Not all at the same time, but one by one.
    When I try to restore my backup, data from node 1 will be older than
data from node 2. Will this cause problems? AFAIK, if I am using a
replication factor of 2, for instance, and Cassandra sees data from node X
only, it will automatically copy it to other nodes, right? Is there any
chance of cassandra nodes become corrupt somehow if I do my backups this
way?

Best regards,
Marcelo Valle.

cassandra backup

Reply via email to