I'd recommend not storing commit logs or data files on EBS volumes if
your machines are under any decent amount of load. I say that for
three reasons.
First, EBS volumes contend directly for network throughput, with
what appears to be the same QoS priority as ordinary network packets. In
other words, if you're saturating a network link, EBS throughput falls.
The same has not been true of ephemeral volumes in any of our testing:
ephemeral I/O speeds tend to take only a minor hit under network
pressure, and they are consistently faster in raw speed tests.
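If you want to check this on your own instances, a crude sequential-write
test is enough to see the difference. The following is a hypothetical
Python sketch, not the test referred to above: it writes a file of random
blocks into a given directory, fsyncs it so the data really reaches the
device, and reports MB/s. Run it against an ephemeral mount and an EBS
mount, ideally once while the network link is busy.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory: str, total_mb: int = 64, block_kb: int = 1024) -> float:
    """Write about total_mb of random data into `directory`, fsync, return MB/s.

    Hypothetical helper for a rough raw-speed comparison between mounts.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # don't just measure the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file
    return (blocks * block_kb / 1024) / elapsed


# Usage (mount points are placeholders):
# for mount in ("/mnt/ephemeral", "/mnt/ebs"):
#     print(mount, round(sequential_write_mb_per_s(mount), 1), "MB/s")
```

A single run is noisy, so repeat it a few times per mount before drawing
conclusions.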
Second, at some point it's a given that you will encounter misbehaving
EBS volumes. They won't fail outright; instead, they will just get
really, really slow. This is often worse than a total failure,
because the system keeps piling up reads and writes but doesn't
fall over until the entire cluster becomes overwhelmed. We've never
had problems with a single ephemeral volume.
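Because a degraded volume slows down rather than failing, it's worth
watching latency per volume instead of waiting for errors. A minimal
sketch, assuming you already collect per-request latency samples for each
device (for example from the await column of iostat -x): flag any volume
whose median latency is far above the median across all volumes. The
device names and the 5x threshold are hypothetical, not tuned values.

```python
from statistics import median


def flag_slow_volumes(latencies_ms: dict[str, list[float]], factor: float = 5.0) -> list[str]:
    """Return volumes whose median latency exceeds factor x the fleet median."""
    per_volume = {name: median(s) for name, s in latencies_ms.items() if s}
    if not per_volume:
        return []
    baseline = median(per_volume.values())  # "typical" volume on this host/cluster
    return sorted(name for name, m in per_volume.items() if m > factor * baseline)


# Usage with made-up samples: xvdh is limping along, not dead.
print(flag_slow_volumes({"xvdf": [4, 5, 6], "xvdg": [5, 5, 5], "xvdh": [80, 120, 95]}))
# -> ['xvdh']
```

Comparing against the median of all volumes, rather than a fixed latency,
keeps one slow volume from hiding behind a generally busy host.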
Lastly, I think people have a tendency to bolt a large number of EBS
volumes onto a host and assume that, because they have the disk capacity,
they can serve more data from fewer hosts. If you push that too far,
you'll outstrip the system's ability to keep effective buffer caches and
concurrently serve requests for all the data it is responsible for.
IME, the RAM and ephemeral disk that come with an EC2 XL are already a
pretty good match for how Cassandra uses disk and RAM, so adding more
storage puts you right at the breaking point of overcommitting your
hardware.
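The overcommit argument is easy to put in numbers as the amount of
on-disk data each GB of RAM has to cover once the disks fill up. The
figures below are assumptions: roughly 15 GB of RAM and 1,690 GB of
ephemeral disk, approximately what AWS published for an m1.xlarge, plus a
hypothetical four 1 TB EBS volumes bolted onto the same host.

```python
def data_per_gb_ram(ram_gb: float, disk_gb: float) -> float:
    """GB of on-disk data each GB of RAM must cover if the disks are filled."""
    return disk_gb / ram_gb


# Assumed instance figures (approximate m1.xlarge spec).
XL_RAM_GB = 15.0
XL_EPHEMERAL_GB = 1690.0

ephemeral_only = data_per_gb_ram(XL_RAM_GB, XL_EPHEMERAL_GB)
# Hypothetical: add four 1 TB EBS volumes to the same host.
with_extra_ebs = data_per_gb_ram(XL_RAM_GB, XL_EPHEMERAL_GB + 4 * 1000.0)

print(f"ephemeral only: {ephemeral_only:.0f} GB of data per GB of RAM")
print(f"with 4 TB EBS:  {with_extra_ebs:.0f} GB of data per GB of RAM")
```

That is about 113 versus about 379 GB of data per GB of RAM, so the same
buffer cache ends up covering roughly a third of the fraction of the data
set it did before.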
If you want protection from AZ failure, split your ring across AZs
(Cassandra is quite good at this) or copy snapshots to EBS volumes.
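For the AZ option, one common approach with RandomPartitioner is to space
initial_token values evenly over the 0 to 2**127 token range and hand them
out to nodes round-robin across zones, so neighbours on the ring sit in
different AZs. A minimal Python sketch of that assignment; the AZ and host
names are hypothetical, and whether replicas actually land in distinct
zones also depends on the snitch and replica placement strategy you
configure.

```python
def ring_tokens(n_nodes: int) -> list[int]:
    """Evenly spaced initial_token values for RandomPartitioner (0 .. 2**127)."""
    return [i * (2**127) // n_nodes for i in range(n_nodes)]


def interleave_by_az(nodes_by_az: dict[str, list[str]]) -> list[tuple[str, str, int]]:
    """Order nodes round-robin across AZs, then assign ring tokens in that order.

    With equal node counts per AZ, any run of consecutive ring positions no
    longer than the number of AZs spans distinct AZs.
    """
    queues = {az: list(hosts) for az, hosts in nodes_by_az.items()}
    ordered: list[tuple[str, str]] = []
    while any(queues.values()):
        for az in sorted(queues):  # deterministic AZ order each round
            if queues[az]:
                ordered.append((az, queues[az].pop(0)))
    tokens = ring_tokens(len(ordered))
    return [(az, host, tok) for (az, host), tok in zip(ordered, tokens)]


# Usage: six hypothetical nodes, two per zone.
for az, host, token in interleave_by_az(
    {"us-east-1a": ["a1", "a2"], "us-east-1b": ["b1", "b2"], "us-east-1c": ["c1", "c2"]}
):
    print(az, host, token)
```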
There are a lot of benefits to EBS volumes; I/O throughput and
reliability are not among them.
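For the snapshot option, here is a rough sketch of the copy step once
nodetool snapshot has run. It assumes the snapshot layout
<data_dir>/<keyspace>/snapshots/<name>/ (check this against your Cassandra
version) and an EBS volume already mounted at backup_root; the function
name and all paths are placeholders.

```python
import shutil
from pathlib import Path


def copy_snapshot(data_dir: str, snapshot_name: str, backup_root: str) -> list[Path]:
    """Copy each keyspace's snapshot named `snapshot_name` under backup_root.

    Result layout: <backup_root>/<snapshot_name>/<keyspace>/...
    """
    copied = []
    for snap in sorted(Path(data_dir).glob(f"*/snapshots/{snapshot_name}")):
        if not snap.is_dir():
            continue
        keyspace = snap.parent.parent.name  # <data_dir>/<keyspace>/snapshots/<name>
        dest = Path(backup_root) / snapshot_name / keyspace
        shutil.copytree(snap, dest, dirs_exist_ok=True)  # Python 3.8+
        copied.append(dest)
    return copied


# Usage (placeholder paths):
# copy_snapshot("/var/lib/cassandra/data", "<snapshot dir name>", "/mnt/ebs-backup")
```

Once the files are on the EBS volume, an EBS snapshot of that volume gives
you the cheaper S3-backed copy without keeping a second full copy on the
instance itself.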
On Wed, Mar 9, 2011 at 8:39 AM, William Oberman
> I thought nodetool snapshot writes the snapshot locally, requiring 2x the
> expensive storage allocation 24x7 (vs. the cheap storage of an EBS
> snapshot). By that I mean EBS volumes are billed at one rate per GB
> allocated per month, while EBS snapshots are delta-compressed copies in S3.
> Can you point the snapshot to an external filesystem?
> On Wed, Mar 9, 2011 at 11:31 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
>> Could you not nodetool snapshot the data into a mounted EBS volume/S3
>> bucket and satisfy your development requirement?
>> On Wed, Mar 9, 2011 at 5:23 PM, William Oberman <ober...@civicscience.com>
>>> For me, it's to transition production data into a development environment
>>> for real-world testing. Also, backups are never a bad idea, though I agree
>>> most of the risk is mitigated by Cassandra's design.
> Will Oberman
> Civic Science, Inc.