I have a couple of questions regarding the coordination of Cassandra nodetool 
snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore 
strategy.

Background: I have a cluster running in EC2. Its nodes are configured like so:

* Instance type: m1.xlarge
* Cassandra commit log writing to RAID-0 ephemeral storage
* Cassandra data writing to an EBS volume.

Note: there is a lot of conflicting information/advice about using Cassandra in 
EC2 w.r.t. ephemeral vs. EBS storage. The above configuration seems to work 
well for my application. I describe it only to provide context for my EBS 
snapshotting question. With respect, I hope not to debate Cassandra 
performance on ephemeral vs. EBS storage in this thread!

I am setting up a process that performs regular EBS (->S3) snapshots for the 
purpose of backing up Cassandra plus other data.
I presume this will need to be coordinated with regular Cassandra (nodetool) 
snapshots also.

My questions:
1. Is it feasible to run directly against a Cassandra data directory restored 
from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS 
snapshot).
2. Noting the wiki's advice on consistent Cassandra backups: if I schedule 
nodetool snapshots across the cluster, should the relative age of the 'sibling' 
snapshots be a concern? How far apart can they be before it's a problem? 
(seconds? minutes? hours?)

My motivation for these two questions: I'm trying to figure out how much effort 
needs to be put into:
* Time-coordinated scheduling of nodetool snapshots across the cluster
* Automation of the process of determining the most appropriate set of nodetool 
snapshots to use when restoring a cluster.
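
To make the second bullet concrete, here is a minimal Python sketch of the 
selection step. It assumes each node's nodetool snapshots can be listed as 
epoch-second timestamps; the function name pick_snapshot_set and the node 
names are hypothetical, and nothing here calls Cassandra or EC2. It picks one 
snapshot per node so that the gap between the oldest and newest 'sibling' is 
as small as possible, preferring the most recent set on ties.

```python
import heapq


def pick_snapshot_set(snapshots_by_node):
    """Pick one snapshot timestamp per node with the smallest overall spread.

    snapshots_by_node maps a (hypothetical) node name to a list of snapshot
    timestamps in epoch seconds. Returns (chosen, spread), where chosen maps
    each node to the selected timestamp and spread is newest minus oldest.
    """
    lists = {node: sorted(ts) for node, ts in snapshots_by_node.items()}
    if not lists or any(not ts for ts in lists.values()):
        raise ValueError("every node needs at least one snapshot")

    # Start with the oldest snapshot of every node; the heap yields the
    # oldest member of the current candidate set.
    current = {node: ts[0] for node, ts in lists.items()}
    heap = [(ts[0], node, 0) for node, ts in lists.items()]
    heapq.heapify(heap)
    newest = max(current.values())

    best, best_spread = None, None
    while True:
        oldest, node, idx = heap[0]
        spread = newest - oldest
        # "<=" keeps the later (more recent) set when spreads are equal.
        if best_spread is None or spread <= best_spread:
            best, best_spread = dict(current), spread
        # Only moving the oldest member forward can shrink the spread.
        if idx + 1 == len(lists[node]):
            break
        nxt = lists[node][idx + 1]
        heapq.heapreplace(heap, (nxt, node, idx + 1))
        current[node] = nxt
        newest = max(newest, nxt)
    return best, best_spread
```

The returned spread is the number I would compare against whatever tolerance 
turns out to be acceptable in answer to question 2.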

Thanks!
