Re: AWS I3.XLARGE retiring instances advices

2020-02-16 Thread Sergio
I really like these conversations, so feel free to continue this one or
create a new one. Thanks to everyone participating :)


On Sun, 16 Feb 2020 at 14:04 Reid Pinchback <
rpinchb...@tripadvisor.com> wrote:

> No, actually in this case I didn’t really have an opinion, because C* is an
> architecturally different beast than an RDBMS.  That’s kinda what piqued my
> curiosity when you made the suggestion about co-locating commit and data.
> It raises an interesting question for me.  As for the 10-second delay, I’m
> used to looking at graphite, so “bad” is relative.
>
>
>
> The question that pops to mind is this: if a commit log isn’t really an
> important recovery mechanism… should one even be part of C* at all?  It’s
> a lot of code complexity, I/O volume, and O/S tuning complexity to worry
> about just to get good I/O resiliency and performance on both the commit
> and data volumes.
>
>
>
> If the proper way to deal with all data volume problems in C* is to burn
> the node (or at least, its state) and rebuild via the state of its
> neighbours, then repairs (whether administratively triggered, or as a
> side-effect of ongoing operations) should always catch up with any
> mutations anyway, so long as the data is appropriately replicated.  The
> benefit of having a commit log would seem limited to data which isn’t
> replicated.
>
>
>
> However, I shouldn’t derail Sergio’s thread.  It was just something that
> caught my interest and got me mulling, but it’s a tangent.
>
>
>
> *From: *Erick Ramirez 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Friday, February 14, 2020 at 9:04 PM
> *To: *"user@cassandra.apache.org" 
> *Subject: *Re: AWS I3.XLARGE retiring instances advices
>
>
>
>
> Erick, a question purely as a point of curiosity.  The entire model of a
> commit log, historically (speaking in RDBMS terms), depended on a notion of
> stable store. The idea being that if your data volume lost recent writes,
> the failure mode there would be independent of writes to the volume holding
> the commit log, so that replay of the commit log could generally be
> depended on to recover the missing data.  I’d be curious what the C* expert
> viewpoint on that would be, with the commit log and data on the same volume.
>
>
>
> Those are fair points so thanks for bringing them up. I'll comment from a
> personal viewpoint and others can provide their opinions/feedback.
>
>
>
> If you think about it, you've lost the data volume -- not just the recent
> writes. Replaying the mutations in the commit log is probably insignificant
> compared to having to recover the data by various means (re-bootstrap,
> refresh from off-volume/off-server snapshots, etc). The data and
> redo/archive logs being on the same volume (in my opinion) is more relevant
> in an RDBMS since they're mostly deployed on SANs, compared to the
> shared-nothing architecture of C*. I know that's debatable and others will
> have their own view. :)
>
>
>
> How about you, Reid? Do you have concerns about both data and commitlog
> being on the same disk? And slightly off-topic but by extension, do you
> also have concerns about the default commitlog fsync() being 10 seconds?
> Cheers!
>


Re: AWS I3.XLARGE retiring instances advices

2020-02-16 Thread Reid Pinchback
No, actually in this case I didn’t really have an opinion, because C* is an
architecturally different beast than an RDBMS.  That’s kinda what piqued my
curiosity when you made the suggestion about co-locating commit and data.  It
raises an interesting question for me.  As for the 10-second delay, I’m used
to looking at graphite, so “bad” is relative.

The question that pops to mind is this: if a commit log isn’t really an
important recovery mechanism… should one even be part of C* at all?  It’s a
lot of code complexity, I/O volume, and O/S tuning complexity to worry about
just to get good I/O resiliency and performance on both the commit and data
volumes.

If the proper way to deal with all data volume problems in C* is to burn
the node (or at least, its state) and rebuild via the state of its neighbours,
then repairs (whether administratively triggered, or as a side-effect of
ongoing operations) should always catch up with any mutations anyway, so long
as the data is appropriately replicated.  The benefit of having a commit
log would seem limited to data which isn’t replicated.

However, I shouldn’t derail Sergio’s thread.  It was just something that caught
my interest and got me mulling, but it’s a tangent.

From: Erick Ramirez 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, February 14, 2020 at 9:04 PM
To: "user@cassandra.apache.org" 
Subject: Re: AWS I3.XLARGE retiring instances advices

Erick, a question purely as a point of curiosity.  The entire model of a commit 
log, historically (speaking in RDBMS terms), depended on a notion of stable
store. The idea being that if your data volume lost recent writes, the failure 
mode there would be independent of writes to the volume holding the commit log, 
so that replay of the commit log could generally be depended on to recover the 
missing data.  I’d be curious what the C* expert viewpoint on that would be, 
with the commit log and data on the same volume.

Those are fair points so thanks for bringing them up. I'll comment from a 
personal viewpoint and others can provide their opinions/feedback.

If you think about it, you've lost the data volume -- not just the recent
writes. Replaying the mutations in the commit log is probably insignificant
compared to having to recover the data by various means (re-bootstrap,
refresh from off-volume/off-server snapshots, etc). The data and redo/archive
logs being on the same volume (in my opinion) is more relevant in an RDBMS since
they're mostly deployed on SANs, compared to the shared-nothing architecture of
C*. I know that's debatable and others will have their own view. :)

How about you, Reid? Do you have concerns about both data and commitlog being 
on the same disk? And slightly off-topic but by extension, do you also have 
concerns about the default commitlog fsync() being 10 seconds? Cheers!
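
For reference, the 10-second figure comes from the stock commitlog sync
settings. A minimal cassandra.yaml excerpt (these are the defaults as I
understand them -- double-check them against your version):

    # Default "periodic" mode: fsync the commitlog every 10s. Writes are
    # acked before the sync, so up to ~10s of acked writes on this node can
    # be lost on a crash unless they are replicated elsewhere.
    commitlog_sync: periodic
    commitlog_sync_period_in_ms: 10000

    # Stricter alternative: "batch" mode only acks writes after the fsync.
    # commitlog_sync: batch
    # commitlog_sync_batch_window_in_ms: 2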


Re: AWS I3.XLARGE retiring instances advices

2020-02-14 Thread Erick Ramirez
>
> Erick, a question purely as a point of curiosity.  The entire model of a
> commit log, historically (speaking in RDBMS terms), depended on a notion of
> stable store. The idea being that if your data volume lost recent writes,
> the failure mode there would be independent of writes to the volume holding
> the commit log, so that replay of the commit log could generally be
> depended on to recover the missing data.  I’d be curious what the C* expert
> viewpoint on that would be, with the commit log and data on the same volume.


Those are fair points so thanks for bringing them up. I'll comment from a
personal viewpoint and others can provide their opinions/feedback.

If you think about it, you've lost the data volume -- not just the recent
writes. Replaying the mutations in the commit log is probably insignificant
compared to having to recover the data by various means (re-bootstrap,
refresh from off-volume/off-server snapshots, etc). The data and
redo/archive logs being on the same volume (in my opinion) is more relevant
in an RDBMS since they're mostly deployed on SANs, compared to the
shared-nothing architecture of C*. I know that's debatable and others will
have their own view. :)

How about you, Reid? Do you have concerns about both data and commitlog
being on the same disk? And slightly off-topic but by extension, do you
also have concerns about the default commitlog fsync() being 10 seconds?
Cheers!

>


Re: AWS I3.XLARGE retiring instances advices

2020-02-14 Thread Reid Pinchback
I was curious and did some digging.  400K is the max read IOPS on the 1-device
instance types; 3M IOPS is for the 8-device instance types.
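
If anyone wants to sanity-check those numbers on their own i3 nodes, a quick
fio random-read run against the instance store gives a rough idea. The device
name and job parameters below are assumptions -- adjust them for your setup:

    # 4k random reads against the raw NVMe instance-store device (read-only,
    # but verify the device name before pointing anything at it)
    sudo fio --name=nvme-randread --filename=/dev/nvme0n1 \
         --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --numjobs=4 \
         --direct=1 --runtime=60 --time_based --group_reporting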

From: Reid Pinchback 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, February 14, 2020 at 11:24 AM
To: "user@cassandra.apache.org" 
Subject: Re: AWS I3.XLARGE retiring instances advices

I’ve seen claims of 3M IOPS on reads for AWS, not sure about writes.  I think 
you just need a recent enough kernel to not get in the way of doing multiqueue 
operations against the NVMe device.

Erick, a question purely as a point of curiosity.  The entire model of a commit 
log, historically (speaking in RDBMS terms), depended on a notion of stable
store.  The idea being that if your data volume lost recent writes, the failure 
mode there would be independent of writes to the volume holding the commit log, 
so that replay of the commit log could generally be depended on to recover the 
missing data.  I’d be curious what the C* expert viewpoint on that would be, 
with the commit log and data on the same volume.

From: Sergio 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, February 13, 2020 at 10:46 PM
To: "user@cassandra.apache.org" 
Subject: Re: AWS I3.XLARGE retiring instances advices

A little off-topic but personally I would co-locate the commitlog on the same 
950GB NVMe SSD as the data files. You would get much better write performance
from the nodes compared to EBS and they shouldn't hurt your reads since the 
NVMe disks have very high IOPS. I think they can sustain 400K+ IOPS (don't 
quote me). I'm sure others will comment if they have a different experience. 
And of course, YMMV. Cheers!


Re: AWS I3.XLARGE retiring instances advices

2020-02-14 Thread Reid Pinchback
I’ve seen claims of 3M IOPS on reads for AWS, not sure about writes.  I think 
you just need a recent enough kernel to not get in the way of doing multiqueue 
operations against the NVMe device.

Erick, a question purely as a point of curiosity.  The entire model of a commit 
log, historically (speaking in RDBMS terms), depended on a notion of stable
store.  The idea being that if your data volume lost recent writes, the failure 
mode there would be independent of writes to the volume holding the commit log, 
so that replay of the commit log could generally be depended on to recover the 
missing data.  I’d be curious what the C* expert viewpoint on that would be, 
with the commit log and data on the same volume.

From: Sergio 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, February 13, 2020 at 10:46 PM
To: "user@cassandra.apache.org" 
Subject: Re: AWS I3.XLARGE retiring instances advices

A little off-topic but personally I would co-locate the commitlog on the same 
950GB NVMe SSD as the data files. You would get much better write performance
from the nodes compared to EBS and they shouldn't hurt your reads since the 
NVMe disks have very high IOPS. I think they can sustain 400K+ IOPS (don't 
quote me). I'm sure others will comment if they have a different experience. 
And of course, YMMV. Cheers!


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Jeff Jirsa
Feels that way and most people don’t do it, but definitely required for strict 
correctness.



> On Feb 13, 2020, at 8:57 PM, Erick Ramirez  wrote:
> 
> 
> Interesting... though it feels a bit extreme unless you're dealing with a 
> cluster that's constantly dropping mutations. In which case, you have bigger 
> problems anyway. :)


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Erick Ramirez
Interesting... though it feels a bit extreme unless you're dealing with a
cluster that's constantly dropping mutations. In which case, you have
bigger problems anyway. :)


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Jeff Jirsa
Option 1 is only strictly safe if you run repair while the down replica is
down (otherwise you violate quorum consistency guarantees).

Option 2 is probably easier to manage and won't require any special effort
to avoid violating consistency.

I'd probably go with option 2.
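
To make the first point concrete, the repair in question would be run from one
of the surviving replicas while the old node is still down, along the lines of
(keyspace name is hypothetical):

    # full (non-incremental) repair so the surviving replicas are consistent
    # with each other before the replacement node streams data from them
    nodetool repair -full my_keyspace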


On Thu, Feb 13, 2020 at 7:16 PM Sergio  wrote:

> We have i3.xlarge instances with the data directory on the ephemeral XFS
> filesystem, and *hints*, *commit_log* and *saved_caches* on the EBS
> volume.
> Whenever AWS is going to retire an instance due to degraded hardware
> performance, is it better to:
>
> Option 1)
>    - Nodetool drain
>    - Stop cassandra
>    - Restart the machine via the aws-cli so it is restored on a different
> underlying host (hypervisor)
>    - Start Cassandra with -Dcassandra.replace_address
>    - We lose only the ephemeral storage, but the commit_log, hints, and
> saved_caches will still be there
>
>
> OR
>
> Option 2)
>  - Add a new node and wait for the NORMAL status
>  - Decommission the one that is going to be retired
>  - Run cleanup with cstar across the datacenters
>
> ?
>
> Thanks,
>
> Sergio
>
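
For completeness, Option 2 boils down to roughly this sequence (cleanup can be
orchestrated with cstar, as Sergio mentions, or run manually on each node):

    # 1. bootstrap the new node, then wait until it shows UN (Up/Normal)
    nodetool status

    # 2. on the node being retired: stream its data away and leave the ring
    nodetool decommission

    # 3. on every remaining node: drop data the node no longer owns
    nodetool cleanup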


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Sergio
Thank you for the advice!

Best!

Sergio

On Thu, Feb 13, 2020, 7:44 PM Erick Ramirez 
wrote:

> Option 1 is a cheaper option because the cluster doesn't need to rebalance
> (with the loss of a replica) post-decommission then rebalance again when
> you add a new node.
>
> The hints directory on EBS is irrelevant because it would only contain
> mutations to replay to down replicas if the node was a coordinator. In the
> scenario where the node itself goes down, other nodes will be storing hints
> for this down node. The saved_caches are also useless if you're
> bootstrapping the node into the cluster because the cache entries are only
> valid for the previous data files, not the newly streamed files from the
> bootstrap. Similarly, your commitlog directory will be empty -- that's
> the whole point of running nodetool drain. :)
>
> A little off-topic but *personally* I would co-locate the commitlog on
> the same 950GB NVMe SSD as the data files. You would get much better
> write performance from the nodes compared to EBS and they shouldn't hurt
> your reads since the NVMe disks have very high IOPS. I think they can
> sustain 400K+ IOPS (don't quote me). I'm sure others will comment if they
> have a different experience. And of course, YMMV. Cheers!
>
>
>
> On Fri, 14 Feb 2020 at 14:16, Sergio  wrote:
>
>> We have i3.xlarge instances with the data directory on the ephemeral XFS
>> filesystem, and *hints*, *commit_log* and *saved_caches* on the EBS
>> volume.
>> Whenever AWS is going to retire an instance due to degraded hardware
>> performance, is it better to:
>>
>> Option 1)
>>    - Nodetool drain
>>    - Stop cassandra
>>    - Restart the machine via the aws-cli so it is restored on a different
>> underlying host (hypervisor)
>>    - Start Cassandra with -Dcassandra.replace_address
>>    - We lose only the ephemeral storage, but the commit_log, hints, and
>> saved_caches will still be there
>>
>>
>> OR
>>
>> Option 2)
>>  - Add a new node and wait for the NORMAL status
>>  - Decommission the one that is going to be retired
>>  - Run cleanup with cstar across the datacenters
>>
>> ?
>>
>> Thanks,
>>
>> Sergio
>>
>


Re: AWS I3.XLARGE retiring instances advices

2020-02-13 Thread Erick Ramirez
Option 1 is a cheaper option because the cluster doesn't need to rebalance
(with the loss of a replica) post-decommission then rebalance again when
you add a new node.

The hints directory on EBS is irrelevant because it would only contain
mutations to replay to down replicas if the node was a coordinator. In the
scenario where the node itself goes down, other nodes will be storing hints
for this down node. The saved_caches are also useless if you're
bootstrapping the node into the cluster because the cache entries are only
valid for the previous data files, not the newly streamed files from the
bootstrap. Similarly, your commitlog directory will be empty -- that's the
whole point of running nodetool drain. :)
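
If it helps anyone following along, Option 1 in practice looks roughly like the
sketch below. The instance ID, file paths and the exact replace flag are
assumptions -- they vary by version and packaging:

    # on the node AWS is about to retire
    nodetool drain
    sudo systemctl stop cassandra

    # stop/start (not reboot) so the instance comes back on healthy hardware;
    # the instance ID is a placeholder
    aws ec2 stop-instances --instance-ids i-0123456789abcdef0
    aws ec2 start-instances --instance-ids i-0123456789abcdef0

    # the instance store comes back empty, so bootstrap the node back in place
    # (remove the flag again once the node has finished rejoining)
    echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<old_node_ip>"' \
        | sudo tee -a /etc/cassandra/cassandra-env.sh
    sudo systemctl start cassandra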

A little off-topic but *personally* I would co-locate the commitlog on the
same 950GB NVMe SSD as the data files. You would get much better write
performance from the nodes compared to EBS and they shouldn't hurt your
reads since the NVMe disks have very high IOPS. I think they can sustain
400K+ IOPS (don't quote me). I'm sure others will comment if they have a
different experience. And of course, YMMV. Cheers!
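
For anyone who wants to try that, the change is just the directory settings in
cassandra.yaml. The mount points below are made up -- adjust them to wherever
the NVMe instance store is mounted:

    # put both data and commitlog on the NVMe instance store
    data_file_directories:
        - /mnt/nvme/cassandra/data
    commitlog_directory: /mnt/nvme/cassandra/commitlog

    # hints and saved caches can stay on EBS if preferred
    hints_directory: /var/lib/cassandra/hints
    saved_caches_directory: /var/lib/cassandra/saved_caches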



On Fri, 14 Feb 2020 at 14:16, Sergio  wrote:

> We have i3.xlarge instances with the data directory on the ephemeral XFS
> filesystem, and *hints*, *commit_log* and *saved_caches* on the EBS
> volume.
> Whenever AWS is going to retire an instance due to degraded hardware
> performance, is it better to:
>
> Option 1)
>    - Nodetool drain
>    - Stop cassandra
>    - Restart the machine via the aws-cli so it is restored on a different
> underlying host (hypervisor)
>    - Start Cassandra with -Dcassandra.replace_address
>    - We lose only the ephemeral storage, but the commit_log, hints, and
> saved_caches will still be there
>
>
> OR
>
> Option 2)
>  - Add a new node and wait for the NORMAL status
>  - Decommission the one that is going to be retired
>  - Run cleanup with cstar across the datacenters
>
> ?
>
> Thanks,
>
> Sergio
>