Yup, as everyone has mentioned, ephemeral disks are fine if you run in
multiple AZs... which is pretty much mandatory for any production deployment
on AWS (and other cloud providers). i2.2xls are generally your best bet for
high read throughput applications on AWS.

Also, on AWS, ephemeral storage will generally survive a user-initiated
restart. For the times that AWS retires an instance, you get plenty of
notice, and it's generally pretty rare. We run over 1,000 instances on AWS
and see maybe one forced retirement a month, if that. We've never had an
instance pulled out from under our feet without warning.
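
If you want to check for scheduled retirement/maintenance events yourself,
one way (a sketch; the instance ID is a placeholder) is the AWS CLI:

    aws ec2 describe-instance-status \
        --instance-ids i-0123456789abcdef0 \
        --query 'InstanceStatuses[].Events'

An empty result means nothing is scheduled for that instance.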

To add another option for the original question: one thing you can do is
attach a large EBS volume to the instance and bind mount it to the directory
for the table that has the very large SSTables. You will need to copy the
data across to the EBS volume, let everything compact, then copy everything
back and detach the EBS volume. Latency may be higher than normal on the
node you are doing this on (especially if you are used to i2.2xl
performance).
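
Roughly, the dance looks like this (device name, mount point, and the
keyspace/table paths are all placeholders; adjust for your environment):

    # attach a large EBS volume, e.g. as /dev/xvdf, then:
    sudo mkfs.ext4 /dev/xvdf
    sudo mkdir -p /mnt/ebs-scratch
    sudo mount /dev/xvdf /mnt/ebs-scratch

    # stop Cassandra, then copy the table's SSTables across
    sudo rsync -a /var/lib/cassandra/data/<keyspace>/<table>/ \
        /mnt/ebs-scratch/<table>/

    # bind mount the copy over the original directory, restart Cassandra
    sudo mount --bind /mnt/ebs-scratch/<table> \
        /var/lib/cassandra/data/<keyspace>/<table>

    # once compaction settles: stop Cassandra, umount the bind mount,
    # copy the (now smaller) data back, and detach the EBS volume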

This is something we often have to do when we encounter pathological
compaction situations associated with bootstrapping, adding new DCs, STCS
with a dominant table, or people ignoring high disk usage warnings :)

On Mon, 17 Oct 2016 at 12:43 Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

> Ephemeral is fine, you just need to have enough replicas (in enough AZs
> and enough regions) to tolerate instances being terminated.
>
> *From: *Vladimir Yudovin <vla...@winguzone.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, October 17, 2016 at 11:48 AM
> *To: *user <user@cassandra.apache.org>
>
>
> *Subject: *Re: Adding disk capacity to a running node
>
> It's extremely unreliable to use ephemeral (local) disks. Even if you
> don't stop the instance yourself, it can be restarted on a different
> server in case of a hardware failure or an AWS-initiated update, and all
> node data will be lost.
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra on
> Azure and SoftLayer. Launch your cluster in minutes.*
>
> ---- On Mon, 17 Oct 2016 14:45:00 -0400 *Seth Edwards <s...@pubnub.com>*
> wrote ----
>
> These are i2.2xlarge instances, so the disks are currently configured as
> dedicated ephemeral disks.
>
> On Mon, Oct 17, 2016 at 11:34 AM, Laing, Michael
> <michael.la...@nytimes.com> wrote:
>
> You could just expand the size of your EBS volume and extend the file
> system. No data is lost, assuming you are running Linux.
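>
> For example (assuming an ext4 filesystem on a partitioned volume; device
> names are placeholders):
>
>     # after increasing the volume size via the AWS console or API:
>     sudo growpart /dev/xvdf 1    # grow the partition (skip if unpartitioned)
>     sudo resize2fs /dev/xvdf1    # grow an ext4 filesystem online
>     # for XFS use: sudo xfs_growfs <mount-point>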
>
> On Monday, October 17, 2016, Seth Edwards <s...@pubnub.com> wrote:
>
> We're running 2.0.16. We're migrating to a new data model, but we've had
> an unexpected increase in write traffic that has caused us some capacity
> issues when we encounter compactions. Our old data model is on STCS. We'd
> like to add another EBS volume (we're on AWS) to our JBOD config and
> hopefully avoid any situation where we run out of disk space during a
> large compaction. It appears that the behavior we are hoping to get is
> actually undesirable and was removed in 3.2. It still might be an option
> for us until we can finish the migration.
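>
> Concretely, the change would just be adding the new mount to
> data_file_directories in cassandra.yaml and restarting the node (paths
> here are made up):
>
>     data_file_directories:
>         - /var/lib/cassandra/data
>         - /mnt/ebs1/cassandra/data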
>
> I'm not familiar with LVM, so it may be a bit risky to try at this point.
>
> On Mon, Oct 17, 2016 at 9:42 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>
> I assume you're talking about a Cassandra JBOD (just a bunch of disks)
> setup, because you mention adding it to the list of data directories. If
> this is the case, you may run into issues, depending on your C* version.
> Check this out: http://www.datastax.com/dev/blog/improving-jbod.
>
> Or another approach is to use LVM to combine multiple devices into a
> single mount point. If you do so, all Cassandra sees is increased disk
> storage space, and there should be no problem.
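>
> A rough sketch of the LVM route (device and volume group names are made
> up):
>
>     # initial setup: one device in a volume group, one logical volume
>     sudo pvcreate /dev/xvdf
>     sudo vgcreate cassandra_vg /dev/xvdf
>     sudo lvcreate -l 100%FREE -n data cassandra_vg
>     sudo mkfs.ext4 /dev/cassandra_vg/data
>     sudo mount /dev/cassandra_vg/data /var/lib/cassandra
>
>     # later, to grow: add a new device, extend the LV and the filesystem
>     sudo pvcreate /dev/xvdg
>     sudo vgextend cassandra_vg /dev/xvdg
>     sudo lvextend -l +100%FREE -r /dev/cassandra_vg/data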
>
> Hope this helps,
>
> Yabin
>
> On Mon, Oct 17, 2016 at 11:54 AM, Vladimir Yudovin <vla...@winguzone.com>
> wrote:
>
> Yes, Cassandra should keep the percentage of disk usage equal across all
> disks. The compaction process and SSTable flushes will use the new disk to
> distribute both new and existing data.
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone <https://winguzone.com?from=list> - Hosted Cloud Cassandra on
> Azure and SoftLayer. Launch your cluster in minutes.*
>
> ---- On Mon, 17 Oct 2016 11:43:27 -0400 *Seth Edwards <s...@pubnub.com>*
> wrote ----
>
> We have a few nodes that are running out of disk capacity at the moment,
> and instead of adding more nodes to the cluster, we would like to add
> another disk to the server and add it to the list of data directories. My
> question is: will Cassandra use the new disk for compactions on SSTables
> that already exist in the primary directory?
>
> Thanks!
>
-- 
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer
