I have a couple of questions regarding the coordination of Cassandra nodetool snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore strategy.

Background: I have a cluster running in EC2. Its nodes are configured like so:

* Instance type: m1.xlarge
* Cassandra commit log writing to RAID-0 ephemeral storage
* Cassandra data writing to an EBS volume

Note: there is a lot of conflicting information/advice about using Cassandra in EC2 w.r.t. ephemeral vs. EBS. The above configuration seems to work well for my application. I only describe it to provide context for my EBS snapshotting question. With respect, I hope not to debate Cassandra performance for ephemeral vs. EBS in this thread!

I am setting up a process that performs regular EBS (->S3) snapshots for the purpose of backing up Cassandra plus other data. I presume this will also need to be coordinated with regular Cassandra (nodetool) snapshots.

My questions:

1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot (as opposed to nodetool snapshots restored from an EBS snapshot)?
2. Noting the wiki's advice on consistent Cassandra backups: if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem (seconds? minutes? hours?)

My motivation for these two questions: I'm trying to figure out how much effort needs to be put into:

* Time-coordinated scheduling of nodetool snapshots across the cluster
* Automation of the process of determining the most appropriate set of nodetool snapshots to use when restoring a cluster

Thanks!
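
P.S. To make the second motivation point concrete, here is a rough sketch of the kind of snapshot-selection logic I have in mind. It is only an illustration under assumptions: the function name `pick_snapshot_set` and the node names are hypothetical, and it assumes the snapshot timestamps for each node (as epoch seconds) have already been collected somehow, which is not shown. For each node it picks the newest snapshot at or before the desired restore point and reports how far apart the chosen 'sibling' snapshots are.

```python
from typing import Dict, List, Tuple


def pick_snapshot_set(
    snapshots_by_node: Dict[str, List[int]],
    restore_point: int,
) -> Tuple[Dict[str, int], int]:
    """Pick one snapshot timestamp per node and report the spread.

    snapshots_by_node maps a (hypothetical) node name to the epoch-second
    timestamps of its available snapshots. For each node, the newest
    snapshot taken at or before restore_point is chosen. The returned
    spread is the gap in seconds between the oldest and newest chosen
    snapshot across the cluster.
    """
    if not snapshots_by_node:
        raise ValueError("no nodes given")

    chosen: Dict[str, int] = {}
    for node, stamps in snapshots_by_node.items():
        # Only snapshots taken at or before the restore point are usable.
        candidates = [t for t in stamps if t <= restore_point]
        if not candidates:
            raise ValueError(
                f"node {node} has no snapshot at or before {restore_point}"
            )
        chosen[node] = max(candidates)

    spread = max(chosen.values()) - min(chosen.values())
    return chosen, spread


if __name__ == "__main__":
    # Made-up timestamps, purely for illustration.
    example = {
        "node-a": [100, 200, 300],
        "node-b": [110, 210, 310],
    }
    picked, spread_seconds = pick_snapshot_set(example, restore_point=250)
    print(picked, spread_seconds)
```

Whether a given spread is acceptable is exactly what question 2 is asking, so the sketch only reports the spread and does not enforce any threshold.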