RE: [EXTERNAL] Re: Running and Managing Large Cassandra Clusters

2020-10-29 Thread Gediminas Blazys
Hey,

@Tom I just wanted to clarify that when I mentioned workload in the last email,
I meant the amount of requests the cluster has to serve.

Gediminas

From: Tom van der Woerdt 
Sent: Wednesday, October 28, 2020 14:35
To: user 
Subject: [EXTERNAL] Re: Running and Managing Large Cassandra Clusters

Heya,

We're running version 3.11.7; we can't use 3.11.8 as it won't even start
(CASSANDRA-16091). Our policy is to use LCS for everything unless there's a
good argument for a different compaction strategy (I don't think we have *any*
STCS at all other than system keyspaces). Since our nodes are mostly on-prem
they are generally oversized on CPU count, but when idle the 360-node cluster
ends up using less than two cores *peak* for background tasks like (full,
weekly) repairs and tombstone compactions. That said, they do get 32 logical
threads because that's what the hardware ships with (-:
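
For the record, switching a table to LCS is a one-statement CQL change; the
keyspace and table names below are made up, and 160 MB is just the default
sstable target size:

  ALTER TABLE app_ks.events
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': 160};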

We haven't had major problems with Gossip over the years. I think we've had to
run nodetool assassinate exactly once, a few years ago. Probably the only
gossip-related annoyance is that when you decommission all seed nodes,
Cassandra will happily run a single core at 100% trying to connect until you
update the list of seeds, but that's really minor.
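
In case it helps anyone, the two relevant knobs look roughly like this (all
IPs made up). Clearing a dead node's lingering gossip state:

  nodetool assassinate 10.0.0.12

and refreshing the seed list in cassandra.yaml (picked up on restart):

  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        - seeds: "10.0.0.1,10.0.0.2,10.0.0.3"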

There's also one cluster that has 50TB nodes, 60 of them, storing reasonably 
large cells (using LCS, previously TWCS, both fine). Replacing a node takes a 
few days, but other than that it's not particularly problematic.
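
The replacement itself is the usual procedure: boot a fresh machine carrying
the dead node's address in a JVM flag and let it stream. A sketch, with a
made-up IP, added to cassandra-env.sh before the node's first start:

  JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.42"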

In my experience it's the small clusters that wake you up ;-)

Tom van der Woerdt
Senior Site Reliability Engineer
Booking.com BV
Vijzelstraat Amsterdam Netherlands 1017HL


On Wed, Oct 28, 2020 at 12:32 PM Joshua McKenzie
<jmcken...@apache.org> wrote:

A few questions for you, Tom, if you have 30 seconds and care to disclose:

  1.  What version of C*?
  2.  What compaction strategy?
  3.  What's the core count allocated per C* node?
  4.  Does Gossip give you any headaches / do you have to be delicate there,
or does it behave itself?
Context: I'm a PMC member/committer and I manage the OSS C* team at DataStax.
We're doing a lot of thinking about how to generally improve the operator
experience across the board for folks in the post-4.0 time frame, so data like
the above (where things are going well at scale, and why) is super useful to
help feed into that effort.

Thanks!



On Wed, Oct 28, 2020 at 7:14 AM Tom van der Woerdt
<tom.vanderwoe...@booking.com.invalid> wrote:
Does 360 count? :-)

num_tokens is 16, which works fine (we had 256 on a 300-node cluster as well,
without too many problems either). Roughly 2.5TB per node, running on-prem on
reasonably stable hardware, so replacements end up happening once a week at
most, and there's no particular change needed in the automation. Scaling up or
down takes a while, but it doesn't appear to be slower than any other cluster.
Configuration-wise it's no different from a 5-node cluster either. Pretty
uneventful tbh.
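
For completeness, that token count is a single cassandra.yaml line, and it has
to be in place before a node first bootstraps:

  num_tokens: 16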

Tom van der Woerdt
Senior Site Reliability Engineer
Booking.com BV
Vijzelstraat Amsterdam Netherlands 1017HL


On Wed, Oct 28, 2020 at 8:58 AM Gediminas Blazys
<gediminas.bla...@microsoft.com.invalid> wrote:
Hello,

I wanted to seek out your opinion and experience.

Have any of you had a chance to run a Cassandra cluster of more than 350
nodes?
What are the major configuration considerations th

RE: [EXTERNAL] Re: Running and Managing Large Cassandra Clusters

2020-10-28 Thread Gediminas Blazys
Hey,

Thanks for chipping in, Tom. Could you describe what sort of workload the big
cluster is receiving, in terms of local C* reads, writes, and client requests?

You mention repairs; how do you run them?

Gediminas
