I'd recommend not storing commit logs or data files on EBS volumes if your machines are under any decent amount of load. I say that for three reasons.
First, EBS volumes contend directly for network throughput, under what appears to be a QoS policy that treats EBS traffic as a peer of standard packets. In other words, if you're saturating a network link, EBS throughput falls. The same has not been true of ephemeral volumes in any of our testing: ephemeral I/O speeds tend to take only a minor hit under network pressure and are consistently faster in raw speed tests.

Second, it's a given that at some point you will encounter misbehaving EBS volumes. They won't fail completely; worse, they will just get really, really slow. Oftentimes this is worse than a total failure, because the system keeps piling up reads and writes but doesn't fall over until the entire cluster becomes overwhelmed. We've never had single-volume problems with ephemeral disks.

Lastly, I think people have a tendency to bolt a large number of EBS volumes to a host and think that because they have the disk capacity they can serve more data from fewer hosts. If you push that too far, you'll outstrip the ability of the system to keep effective buffer caches and concurrently serve requests for all the data it is responsible for managing. IME an EC2 XL and its ephemeral disks are pretty well matched to how Cassandra uses disk and RAM, so adding more storage puts you right at the breaking point of overcommitting your hardware.

If you want protection from AZ failure, split your ring across AZs (Cassandra is quite good at this) or copy snapshots to EBS volumes.

-erik

There are a lot of benefits to EBS volumes; I/O throughput and reliability are not among them.

On Wed, Mar 9, 2011 at 8:39 AM, William Oberman <ober...@civicscience.com> wrote:
> I thought nodetool snapshot writes the snapshot locally, requiring 2x of
> expensive storage allocation 24x7 (vs. cheap storage allocation of a ebs
> snapshot). By that I mean EBS allocation is GB allocated per month costs at
> one rate, and EBS snapshots are delta compressed copies to S3.
>
> Can you point the snapshot to an external filesystem?
>
> will
>
> On Wed, Mar 9, 2011 at 11:31 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
>>
>> Could you not nodetool snapshot the data into an mounted ebs/s3 bucket and
>> satisfy your development requirement?
>> -sd
>>
>> On Wed, Mar 9, 2011 at 5:23 PM, William Oberman <ober...@civicscience.com>
>> wrote:
>>>
>>> For me, to transition production data into a development environment for
>>> real world testing. Also, backups are never a bad idea, though I agree most
>>> all risk is mitigated due to cassandra's design.
>>>
>>> will
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com
>
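
On the "copy snapshots to EBS volumes" option, here is a rough sketch of what that copy step could look like in Python. It is only an illustration under assumptions, not a tested backup tool: it assumes a snapshot with a known name has already been taken with nodetool, that the snapshot files sit in directories of the form <something>/snapshots/<name> under the Cassandra data directory, and that the EBS volume is already mounted. The function name copy_snapshots and the paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def copy_snapshots(data_dir, dest_dir, tag):
    """Copy every snapshot directory named `tag` found under data_dir.

    The relative path below data_dir is preserved under dest_dir.
    Returns the list of destination directories that were written.
    """
    data_dir = Path(data_dir)
    dest_dir = Path(dest_dir)
    copied = []
    # Look for any "snapshots" directory at any depth, so the sketch does
    # not depend on one particular on-disk layout (an assumption on my part).
    for snapshots_dir in sorted(data_dir.rglob("snapshots")):
        snap = snapshots_dir / tag
        if not snap.is_dir():
            continue  # this keyspace has no snapshot with that name
        target = dest_dir / snap.relative_to(data_dir)
        # copytree creates missing parent directories; dirs_exist_ok
        # requires Python 3.8 or newer.
        shutil.copytree(snap, target, dirs_exist_ok=True)
        copied.append(target)
    return copied


# Hypothetical usage, with the EBS volume mounted at /mnt/ebs-backup:
#   copy_snapshots("/var/lib/cassandra/data", "/mnt/ebs-backup", "nightly")
```

The point of copying rather than pointing the live data directory at EBS is that the node keeps serving from ephemeral disk, and the EBS volume only sees the occasional bulk copy.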