This is a great way to think through the problem and solution. I will add that 
part of my calculation on failure time is how long it actually takes to 
replace a drive and/or a server with (however many) drives. We pay for very 
fast vendor SLAs. However, in reality, there is quite a bit more activity 
before any of those SLAs kick in and before the hardware is actually ready for 
use by Cassandra. So, I calculate my needed capacity and preferred node sizes 
with those factors included. (This is for on-prem hardware, not a 
cloud-there’s-always-a-spare model.)
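
One way to fold that lead time into the failure window is sketched below. This is only an illustration: the function name, the parameter names, and the idea of summing the phases are assumptions for the example, not figures or tooling from this thread.

```python
def exposure_window_hours(data_tb, stream_mb_per_s, pre_sla_hours,
                          vendor_sla_hours, provisioning_hours):
    """Hours from a failure until the replacement has finished streaming.

    All inputs are hypothetical planning values you would measure yourself.
    """
    # Time to stream the node's data at an assumed sustained rate
    # (1 TB is treated as 1,000,000 MB here).
    stream_hours = (data_tb * 1_000_000) / stream_mb_per_s / 3600
    # Ticketing/approval before the SLA clock starts, the SLA itself,
    # and OS/Cassandra setup all extend the window before streaming begins.
    return pre_sla_hours + vendor_sla_hours + provisioning_hours + stream_hours
```

The point of the sketch is that the streaming time is only one term; on-prem, the other terms can dominate for small nodes.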


Sean Durity

From: Jeff Jirsa <jji...@gmail.com>
Sent: Wednesday, January 20, 2021 11:59 AM
To: cassandra <user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Node Size


Not going to give a number other than to say that 1TB/instance is probably 
super super super conservative in 2021. The modern number is likely 
considerably higher. But let's look at this from first principles. There are 
basically two things to worry about here:

1) Can you get enough CPU/memory to support a query load over that much data, 
and
2) When that machine fails, what happens?

Let's set aside 1, because you can certainly find some query pattern that 
works, e.g. write-only with time window compaction or something where there's 
very little actual work to maintain state.

So focusing on 2, a few philosophical notes:

2.a) For each range, cassandra streams from one replica. That means if you use 
a single token and RF=3, you're probably streaming from 3 hosts at a time
2.b) In cassandra 0.whatever to 3.11, streaming during replacement presumed 
that you would only send a portion of each data file to the new node, so it 
deserialized and reserialized most of the contents, even if the whole file was 
being sent (in LCS, sending the whole file is COMMON; in TWCS / STCS, it's less 
common)
2.c) Each data file doing the partial file streaming ser/deser uses exactly one 
core/thread on the receiving side. Adding extra CPU doesn't speed up streaming 
when you have to serialize/deserialize.
2.d) The more disks you put into a system, the more likely it is that any disk 
on a host fails, so your frequency of failure will go up with more disks.
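
Point 2.d can be put into numbers with a small sketch. It assumes disks fail independently and all share the same annual failure probability; both are simplifications, and the function name is made up for the example.

```python
def p_any_disk_fails(disks, annual_failure_rate):
    """Probability that at least one disk on a host fails within a year.

    Assumes independent failures with an identical per-disk probability.
    """
    # Complement of "every disk survives the year".
    return 1 - (1 - annual_failure_rate) ** disks
```

For example, comparing `p_any_disk_fails(16, afr)` with `p_any_disk_fails(24, afr)` for whatever `afr` your own hardware history suggests shows how much more often the denser host needs attention.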

What's that mean?

The time it takes to rebuild a failed node depends on:
- Whether or not you're using vnodes (recalling that Joey at Netflix did some 
fun math that says lots of vnodes makes your chance of outage/dataloss go up 
very very quickly)
- Whether or not you're using LCS (recalling that LCS is super IO intensive 
compared to other compaction strategies)
- Whether or not you're running RAID on the host

Vnodes means more streaming sources, but also increases your chance of an 
outage with concurrent host failures.
LCS means streaming is faster, but also requires a lot more IO to maintain.
RAID is ... well, RAID. You're still doing the same type of rebuild operation 
there, and losing capacity, so ... don't do that, probably.

If you are clever enough to run more than one cassandra instance on the host, 
you protect yourself from the "bad" vnode behaviors (likelihood of an outage 
with 2 hosts down, ability to do simultaneous hosts joining/leaving/moving, 
etc), but it requires multiple IPs and a lot more effort.

So, how much data can you put onto a machine? Calculate your failure rate. 
Calculate your rebuild time. Figure out your chances of two failures in that 
same window, and the cost to your business of an outage/data loss if that were 
to happen. Keep adjusting fill sizes / ratios until you get numbers that work 
for you.
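
A rough sketch of that calculation, under loudly stated assumptions: hosts fail independently at a constant rate (an exponential model), and `other_hosts` is the number of hosts whose loss during the rebuild would take a range below quorum (a handful of neighbors with single tokens, potentially most of the cluster with many vnodes). The function and parameter names are hypothetical.

```python
import math

def p_second_failure(other_hosts, host_failures_per_year, rebuild_hours):
    """Chance that at least one of `other_hosts` fails during the rebuild."""
    window_years = rebuild_hours / (24 * 365)
    # Constant-rate model: P(no failure among k hosts in time t) = exp(-k*rate*t)
    return 1 - math.exp(-other_hosts * host_failures_per_year * window_years)

def expected_overlaps_per_year(hosts, host_failures_per_year,
                               rebuild_hours, other_hosts):
    """Expected number of rebuilds per year that see a second failure."""
    first_failures = hosts * host_failures_per_year
    return first_failures * p_second_failure(
        other_hosts, host_failures_per_year, rebuild_hours)
```

Larger nodes raise `rebuild_hours`, more disks raise `host_failures_per_year`, and more vnodes raise `other_hosts`, so each of the choices discussed above shows up as one input you can adjust until the result is acceptable for your business.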



On Wed, Jan 20, 2021 at 7:59 AM Joe Obernberger 
<joseph.obernber...@gmail.com> wrote:

Thank you Sean and Yakir.  Is 4.x the same?

So if you were to build a 1PByte system, you would want 512-1024 nodes?  
Doesn't seem space efficient vs say 48TByte nodes where you would need ~21 
machines.
What would you do to build a 1PByte configuration?  I know there are a lot of 
"it depends" on that question, but say it was a write-heavy, light-read setup.  
Thank you!
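
The node counts in this question are plain division, sketched below taking 1 PByte as 1024 TBytes. The `fill_ratio` parameter is a hypothetical addition to show the effect of not filling disks completely; replication is ignored, as it is in the figures above.

```python
import math

def nodes_needed(total_tb, per_node_tb, fill_ratio=1.0):
    """Nodes required to hold total_tb when each node stores
    per_node_tb * fill_ratio of data (replication not included)."""
    return math.ceil(total_tb / (per_node_tb * fill_ratio))

print(nodes_needed(1024, 1))   # 1 TByte nodes
print(nodes_needed(1024, 2))   # 2 TByte nodes
print(nodes_needed(1024, 48))  # 48 TByte nodes
```

Rounding up gives 1024, 512, and 22 nodes respectively, in line with the 512-1024 and ~21 estimates.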

-Joe
On 1/20/2021 10:06 AM, Durity, Sean R wrote:
Yakir is correct. While it is feasible to have large disk nodes, the practical 
aspect of managing them is an issue. With the current technology, I do not 
build nodes with more than about 3.5 TB of disk available. I prefer 1-2 TB, but 
costs/number of nodes can change the considerations.

Putting more than 1 node of Cassandra on a given host is also possible, but you 
will want to consider your availability if that hardware goes down. Losing 2 or 
more nodes with one failure is usually not good.

NOTE: DataStax has some new features for supporting much larger disks and 
alleviating many of the admin pains associated with them. I don’t have personal 
experience with these features yet, but I will be testing them soon. In my 
understanding they are for use cases with massive needs for disk, but low to 
moderate throughput (i.e., where node expansion is only for disk, not 
additional traffic).

Sean Durity

From: Yakir Gibraltar <yaki...@gmail.com>
Sent: Wednesday, January 20, 2021 9:21 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Node Size

It is possible to use large nodes and it will work; the problems with large 
nodes will be:

  *   Maintenance like join/remove nodes will take more time.
  *   Larger heap
  *   etc.

On Wed, Jan 20, 2021 at 3:54 PM Joe Obernberger 
<joseph.obernber...@gmail.com> wrote:
Anyone know where I could find out more information on this?
Thanks!

-Joe

On 1/13/2021 8:42 AM, Joe Obernberger wrote:
> Reading the documentation on Cassandra 3.x, there are recommendations
> that node size should be ~1TByte of data.  Modern servers can have 24
> SSDs, each at 2TBytes in size for data.  Is that a bad idea for
> Cassandra?  Does 4.0beta4 handle larger nodes?
> We have machines that have 16, 8TBytes SATA drives - would that be a
> bad server for Cassandra?  Would it make sense to run multiple copies
> of Cassandra on the same node in that case?
>
> Thanks!
>
> -Joe
>


--
Regards,
Yakir Gibraltar

________________________________

The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.



