Re: AWS ephemeral instances + backup

2019-12-09 Thread Carl Mueller
Jeff: the gp2 drives are expensive, especially if you have to make them
unnecessarily large just to get the IOPS, and I want the per-node cost as
cheap as possible so we can run as many nodes as possible.

An i3 plus a cheap spinning-rust backup volume beat an m5 (or similar) plus
EBS gp2 on cost when I ran the numbers.

Ben: Going to S3 would be even cheaper and probably about the same speed. I
think I was avoiding it because of the network cost and the throttling
question, but if it is cheap enough versus the rust EBS then I'll do that. I
think I came across your page during earlier research.

Jon: I have my own thing that is very similar to Medusa but supports our
various wonky access modes (bastions, IPv6, etc.). Very similar otherwise,
with comparable incremental backups and the like. The backups run at
scheduled times, but my rewrite would enable a more local strategy by
watching the sstable dirs, roughly as in the sketch below. The restore modes
of Medusa are better in some respects, but I can do more complicated things
too. In the rewrite I'm trying to abstract the access mode (k8s/ssh/etc.),
the cloud, and even the tech (Kafka/Cassandra), and it is damn hard to avoid
leaky abstractions.
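
By "watching the sstable dirs" I mean roughly the following: poll the data
directory and upload any sstable component not seen before. This is a hedged
sketch, not my actual tool; the paths and bucket are invented, it only grabs
Data/Index components, and a real version would skip in-flight tmp files and
handle all the components.

import os
import time
import boto3

DATA_DIR = "/var/lib/cassandra/data"  # placeholder sstable root
BUCKET = "my-cassandra-backups"       # placeholder bucket

s3 = boto3.client("s3")
seen = set()

def scan_sstables(root):
    """Yield paths of sstable data/index components under root."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(("-Data.db", "-Index.db")):
                yield os.path.join(dirpath, name)

while True:
    for path in scan_sstables(DATA_DIR):
        if path not in seen:
            # sstables are immutable once fully written, so upload-once is safe
            s3.upload_file(path, BUCKET, os.path.relpath(path, DATA_DIR))
            seen.add(path)
    time.sleep(30)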

Reid: possibly we could, but the EBS snapshot has to cover the full ~100 GB
every time, while the various sstable copies/incremental backups only ship
the new files, so the raw number of bits being saved is smaller, which makes
it faster and more resilient.

Thank you, everyone. At least with all you bigwigs giving advice, I can argue
from appeal to authority to management :-) (which is always more effective
than arguing from reason or evidence)




Re: AWS ephemeral instances + backup

2019-12-06 Thread Reid Pinchback
Correction: “most of your database will be in chunk cache, or buffer cache,
anyways.”





Re: AWS ephemeral instances + backup

2019-12-06 Thread Reid Pinchback
If you’re only going to have a small storage footprint per node, like 100 GB,
another option comes to mind. Use an instance type with large RAM, use an EBS
storage volume on an EBS-optimized instance type, and take EBS snapshots. Most
of your database will be in chunk cache anyways, so you only need to make sure
that the dirty background writer is keeping up. I’d take a look at iowait
during a snapshot and see whether the results are acceptable for a running
node. Even if it is marginal, if you’re only snapshotting one node at a time,
speculative retry would just skip over the temporary slowpoke.
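
For what it’s worth, the snapshot-and-wait part of that is only a few lines of
boto3. A sketch under assumptions: the region, volume id, and node name are
placeholders, and a real tool would tag the snapshots and walk the ring one
node at a time.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

def snapshot_node_volume(volume_id: str, node_name: str) -> str:
    """Snapshot one node's EBS data volume and return the snapshot id."""
    resp = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"cassandra backup: {node_name}",
    )
    return resp["SnapshotId"]

# One node at a time, so speculative retry can route around the slow node.
snap_id = snapshot_node_volume("vol-0123456789abcdef0", "cassandra-node-1")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap_id])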




Re: AWS ephemeral instances + backup

2019-12-05 Thread daemeon reiydelle
If you can handle the slower I/O of S3 this can work, but you will have a
window of out-of-date images. You don't get a concept of persistent
snapshots.

Daemeon Reiydelle





Re: AWS ephemeral instances + backup

2019-12-05 Thread Jon Haddad
You can easily do this with bcache or LVM; there's an intro here:
http://rustyrazorblade.com/post/2018/2018-04-24-intro-to-lvm/

Medusa might be a good route to go down if you want to do backups instead:
https://thelastpickle.com/blog/2019/11/05/cassandra-medusa-backup-tool-is-open-source.html
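
To make the LVM option concrete, the cache setup is roughly the generic
dm-cache recipe below (wrapped in Python for illustration; the device names
are placeholders and this is an unvetted sketch, not a production script):

import subprocess

def run(cmd: str) -> None:
    print(f"+ {cmd}")
    subprocess.run(cmd.split(), check=True)

EPHEMERAL = "/dev/nvme0n1"  # placeholder: i3 local NVMe
EBS = "/dev/xvdf"           # placeholder: cheap EBS volume

run(f"pvcreate {EPHEMERAL} {EBS}")
run(f"vgcreate cassandra {EPHEMERAL} {EBS}")
# The backing (origin) LV lives on the cheap EBS volume...
run(f"lvcreate -n data -l 100%PVS cassandra {EBS}")
# ...and the ephemeral NVMe becomes a cache pool in front of it
# (95% rather than 100% to leave room for cache metadata).
run(f"lvcreate --type cache-pool -n nvmecache -l 95%PVS cassandra {EPHEMERAL}")
run(f"lvconvert -y --type cache --cachepool cassandra/nvmecache cassandra/data")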





Re: AWS ephemeral instances + backup

2019-12-05 Thread Ben Slater
We have some tooling that does that kind of thing, using S3 rather than
attached EBS but on a similar principle. There is a bit of an overview here:
https://www.instaclustr.com/advanced-node-replace/

It's become a pretty core part of our ops toolbox since we introduced it.

Cheers
Ben





Re: AWS ephemeral instances + backup

2019-12-05 Thread Jeff Jirsa
No experience doing it that way personally, but I'm curious: are you backing
up in case of the ephemeral instance dying, or backing up in case of data
problems/errors/etc.?

On instance dying, you're probably fine with just straight normal
replacements, not restoring from backup. For the rest, is it cheaper to use
something like tablesnap and go straight to S3?
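
The minimal straight-to-S3 version (leaving tablesnap itself aside) is just
"nodetool snapshot, then push the snapshot directories." A sketch with
invented bucket and paths, assuming the standard .../snapshots/<tag>/ layout:

import os
import subprocess
import boto3

BUCKET = "my-cassandra-backups"       # placeholder bucket
DATA_DIR = "/var/lib/cassandra/data"
TAG = "nightly"

# Flush memtables and hard-link the current sstables under a snapshot name.
subprocess.run(["nodetool", "snapshot", "-t", TAG], check=True)

s3 = boto3.client("s3")
for dirpath, _, filenames in os.walk(DATA_DIR):
    # snapshot files live under <keyspace>/<table>/snapshots/<TAG>/
    if f"snapshots/{TAG}" not in dirpath:
        continue
    for name in filenames:
        path = os.path.join(dirpath, name)
        s3.upload_file(path, BUCKET, os.path.relpath(path, DATA_DIR))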



AWS ephemeral instances + backup

2019-12-05 Thread Carl Mueller
Does anyone have experience with tooling written to support this strategy:

Use case: run Cassandra on i3 instances on ephemerals, but synchronize the
sstables and commitlog files to the cheapest EBS volume type (those have bad
IOPS but decent enough throughput).

On node replace, the startup script for the node back-copies the sstables and
commitlog state from the EBS volume to the ephemeral.
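
The back-copy step would be something like the following in the startup
script. A sketch only: the mount points are placeholders, it assumes Python
3.8+ for dirs_exist_ok, and a real version would rsync with checksums rather
than blindly copy.

import shutil
import subprocess

EBS_MOUNT = "/mnt/ebs/cassandra"        # placeholder: cheap-EBS mirror
EPHEMERAL_MOUNT = "/var/lib/cassandra"  # placeholder: i3 NVMe mount

# Pull sstables and commitlogs back onto the fast local disk
# before the cassandra service starts.
for subdir in ("data", "commitlog"):
    shutil.copytree(f"{EBS_MOUNT}/{subdir}", f"{EPHEMERAL_MOUNT}/{subdir}",
                    dirs_exist_ok=True)

subprocess.run(["systemctl", "start", "cassandra"], check=True)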

As can be seen:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

The (presumably) spinning rust tops out at 2,375 MB/s (presumably across
multiple EBS volumes), which would incur roughly a ten-minute delay for node
replacement on a 1 TB node. But I imagine this would only be used on
higher-IOPS read/write nodes with smaller densities, so 100 GB would mean
only about a minute of delay, already within the timeframe of an AWS node
replacement/instance restart.
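
Back-of-the-envelope for those delay figures, taking the 2,375 MB/s number at
face value (real copy rates will be lower, which is where the padding to
"about ten minutes" comes from):

def restore_minutes(data_gb: float, throughput_mb_s: float = 2375.0) -> float:
    """Minutes to stream data_gb gigabytes at throughput_mb_s MB/s."""
    return data_gb * 1024 / throughput_mb_s / 60

print(f"1 TB:   {restore_minutes(1024):.1f} min")  # ~7.4 min raw; ~10 with overhead
print(f"100 GB: {restore_minutes(100):.1f} min")   # ~0.7 min, about a minute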