We have a cluster of over 100 nodes that performs just fine for its use case.
In our case, we needed the disk space and did not want the admin headache of
very dense nodes. It does take more automation and process to handle a larger
cluster, but those are all good things to solve anyway.

But count me in on being interested in what DataStax is calling “Big Node.” 
Would love to be able to use denser nodes, if the headaches are reduced.


Sean Durity

From: Ben Slater <ben.sla...@instaclustr.com>
Sent: Wednesday, November 07, 2018 6:08 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Multiple cluster for a single application

I tend to recommend an approach similar to Eric’s functional sharding, although
I describe it as quality-of-service sharding: group your small, hot data into
one cluster and your large, cooler data into another so you can provision
infrastructure and tune accordingly. I guess it depends on your management
environment, but if your app functionality allows you to split into multiple
clusters (i.e. your data is not all in one giant table) then I would
generally look to split. Splitting also gives you the advantage of making it
harder to have an outage that brings everything down.
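
As a rough illustration of this kind of split, here is a minimal Python sketch
of routing by table. Everything named in it is hypothetical: the table names,
the cluster names ("hot-small", "cool-large"), the addresses, and the helper
contact_points_for() are made up for the example and do not come from any
driver or from the setups described in this thread.

```python
from typing import Dict, List

# Hypothetical assignment of tables to clusters, grouped by workload profile.
CLUSTER_FOR_TABLE: Dict[str, str] = {
    "user_sessions": "hot-small",
    "user_profiles": "hot-small",
    "event_archive": "cool-large",
}

# Hypothetical contact points for each cluster.
CONTACT_POINTS: Dict[str, List[str]] = {
    "hot-small": ["10.0.1.10", "10.0.1.11"],
    "cool-large": ["10.0.2.10", "10.0.2.11"],
}


def contact_points_for(table: str) -> List[str]:
    """Return the contact points of the cluster that owns the given table."""
    try:
        cluster = CLUSTER_FOR_TABLE[table]
    except KeyError:
        # Fail loudly rather than silently writing to a default cluster.
        raise KeyError(f"no cluster assigned for table {table!r}") from None
    return CONTACT_POINTS[cluster]
```

An application would look up the contact points once per table and keep one
driver session per cluster, so each cluster can be sized and tuned on its own.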

Cheers
Ben

On Thu, 8 Nov 2018 at 08:44 Jonathan Haddad <j...@jonhaddad.com> wrote:
Interesting approach Eric, thanks for sharing that.

Regarding this:

> I've read documents recommending clusters with fewer than 50 or 100
> nodes (Netflix has hundreds of clusters with fewer than 100 nodes each).

Not sure where you read that, but it's nonsense.  We work with quite a few
clusters that are several hundred nodes each.  Your problems can get a bit
amplified; for instance, the dynamic snitch can make a cluster perform
significantly worse than if you just flat out disable it, which is what I
usually recommend.

I'm curious how you arrived at the estimate of needing > 100 nodes.  Is that 
due to space constraints or performance ones?



On Wed, Nov 7, 2018 at 12:52 PM Eric Stevens <migh...@gmail.com> wrote:
We are engaging in both strategies at the same time:

1) We call it functional sharding - we write to clusters targeted according to
the type of data being written.  Because different data types often have
different workloads, this has the nice side effect of letting us tune each
cluster according to its workload.  Your ability to grow in this dimension is
limited by the number of business object types you're recording.

2) We write to clusters sharded by time.  Our objects are network security 
events, so there's always an element of time.  We encode that time into 
deterministic object IDs so that we are able to identify in the read path which 
shard to direct the request to by extracting the time component.  This basic 
idea should be able to work any time you're able to use surrogate keys instead 
of natural keys.  If you are using natural keys, you may be facing an 
unpleasant migration should you need to increase the number of shards in this 
dimension.
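
To make the second strategy concrete, here is a minimal Python sketch of a
deterministic, time-encoded surrogate ID and the read-path lookup. It is an
illustration under assumptions, not our actual ID format: the layout (12 hex
digits of epoch milliseconds followed by 16 random hex digits), the shard
boundaries, the cluster names, and the functions make_id(), time_of() and
cluster_for() are all hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical shard map: (inclusive start time, cluster name), sorted by start.
SHARDS = [
    (datetime(2017, 1, 1, tzinfo=timezone.utc), "events-2017"),
    (datetime(2018, 1, 1, tzinfo=timezone.utc), "events-2018"),
]


def make_id(event_time: datetime) -> str:
    """Build an ID whose first 12 hex digits are the event time in epoch ms."""
    millis = (event_time - EPOCH) // timedelta(milliseconds=1)
    return f"{millis:012x}{os.urandom(8).hex()}"


def time_of(object_id: str) -> datetime:
    """Extract the time component back out of an ID built by make_id()."""
    millis = int(object_id[:12], 16)
    return EPOCH + timedelta(milliseconds=millis)


def cluster_for(object_id: str) -> str:
    """Pick the shard whose start time is the latest one not after the event."""
    event_time = time_of(object_id)
    chosen = None
    for start, name in SHARDS:
        if event_time >= start:
            chosen = name
    if chosen is None:
        raise LookupError(f"no shard covers {event_time.isoformat()}")
    return chosen
```

Because the time lives inside the ID, the read path needs only the ID to pick
a cluster, and adding a shard for a new time range is just a new entry in the
map; existing IDs keep resolving to the clusters they were written to.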

Our reason for engaging in the second strategy was not purely Cassandra's
fault; rather, we were using DSE with a search workload, and the cost of
rebuilding Solr indexes on streaming operations (such as adding nodes to an
existing cluster) required enough resources that we found it prohibitive.
That's because the bootstrapping node was also taking a production write
workload, and we didn't want to run our cluster with enough overhead that a
node could bootstrap and take production workload at the same time.

For vanilla Cassandra workloads we have run clusters with quite a few more
than 100 nodes without any appreciable trouble.  Curious if you can share
documents about clusters over 100 nodes causing trouble for users.  I'm
wondering if it's related to node failure rate combined with vnodes, meaning
that several concurrent node failures cause a part of the ring to go offline
too reliably.

On Mon, Nov 5, 2018 at 7:38 AM onmstester onmstester 
<onmstes...@zoho.com.invalid> wrote:
Hi,

One of my applications requires creating a cluster with more than 100 nodes.
I've read documents recommending clusters with fewer than 50 or 100 nodes
(Netflix has hundreds of clusters with fewer than 100 nodes each).
Is it a good idea to use multiple clusters for a single application, just to
reduce maintenance problems, system complexity, and performance issues?
If so, which of the policies below is more suitable for distributing data
among clusters, and why?
1. Each cluster would be responsible for a specific subset of tables only
(table sizes are almost equal, so the calculations are easy here); for example,
inserts to table X would go to cluster Y.
2. Shard data at the loader level by some business-logic grouping of data; for
example, all rows with some column starting with X would go to cluster Y.

I would appreciate you sharing your experiences working with big clusters, the
problems encountered, and the solutions.

Thanks in Advance


--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
--

Ben Slater
Chief Product Officer
