Best practice: Multiple clusters vs multiple tables in a single cluster?
Hi all - We currently have a single Cassandra cluster that is dedicated to a relatively narrow purpose, with just 2 tables. Soon we will need Cassandra for another, unrelated system, and I'm debating whether to just add the new tables to our existing Cassandra cluster or to spin up an entirely new, separate cluster for this new system. Does anyone have pros/cons to share on this? From watching talks and such online, it appears that the big users (e.g. Netflix, Spotify) tend to favor multiple, single-purpose clusters, and thus that was my initial preference. But we are (for now) nowhere close to them in traffic, so I'm wondering if running an entirely separate cluster would be a premature optimization that wouldn't pay for the (nontrivial) overhead in configuration management and ops. While we are still small, it might be much smarter to reuse our existing cluster so that I can get it done faster...
Thanks! - Ian
Re: Best practice: Multiple clusters vs multiple tables in a single cluster?
Adding a new keyspace should be perfectly fine, unless you have completely distinct workloads for the different keyspaces. Even so, you can balance some things at the keyspace/table level. But given the small size you describe, I would go with a new keyspace, not a new cluster.
Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com
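To make Carlos's keyspace-per-application suggestion concrete, here is a minimal Python sketch that only builds CQL text. One statement creates a separate keyspace with its own per-data-center replication, and another tunes compaction for a single table without touching the existing keyspaces. The keyspace name `new_app`, the table `events`, the data-center name `dc1`, and the replication/compaction choices are hypothetical; the generated statements would be run with whatever tool you already use (for example cqlsh).

```python
"""Hypothetical sketch: build CQL for a second application's own keyspace.

Assumed names (not from the thread): keyspace "new_app", table "events",
data center "dc1". These functions only produce CQL strings; they do not
connect to a cluster.
"""
import re

# Unquoted CQL identifiers: a letter followed by letters, digits, or underscores.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def _check_ident(name: str) -> str:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"invalid CQL identifier: {name!r}")
    return name


def create_keyspace_cql(keyspace: str, dc_replication: dict) -> str:
    """CREATE KEYSPACE with NetworkTopologyStrategy and a replication factor per data center."""
    _check_ident(keyspace)
    if not dc_replication:
        raise ValueError("need at least one data center")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(dc_replication.items()):
        if rf < 1:
            raise ValueError(f"replication factor for {dc!r} must be >= 1")
        parts.append(f"'{dc}': {rf}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{{', '.join(parts)}}};"
    )


def set_compaction_cql(keyspace: str, table: str, strategy: str) -> str:
    """ALTER TABLE so that one table gets its own compaction strategy."""
    _check_ident(keyspace)
    _check_ident(table)
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


if __name__ == "__main__":
    print(create_keyspace_cql("new_app", {"dc1": 3}))
    print(set_compaction_cql("new_app", "events", "LeveledCompactionStrategy"))
```

Replication is a per-keyspace setting and compaction a per-table setting, so the new system can be tuned independently. What this cannot isolate is what the keyspaces still share: the nodes' CPU, memory, and disks, and the cluster's upgrade and restart cycle.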
Re: Best practice: Multiple clusters vs multiple tables in a single cluster?
Thanks for the input, folks! As a startup, we don't really have different dev teams or apps; everything is in service of the product. Given these responses, I think putting both into the same cluster is the best idea. And if we want to split them out in the future, we are still small enough that it would be a pain but not the end of the world...
Cheers, Ian
On Thu, Apr 2, 2015 at 9:57 AM, Carlos Rolo r...@pythian.com wrote:
Re: Best practice: Multiple clusters vs multiple tables in a single cluster?
There is an old saying in the software industry: the structure of a system follows from the structure of the organization that created it (Conway's Law).

Seriously, the first question on your end is who owns the applications in terms of executive management. If management makes a decision that dramatically changes one app's impact on the cluster, is it likely to have done so with the concurrence of the management that owns the other app? Trust me, you do not want to be in the middle when two managers are in dispute over whose app is more important. IOW, if one manager owns both apps, you are probably safe, but if two different managers might have differing views of each other's priorities, tread with caution. In any case, be prepared to move one of the apps to a different cluster if and when usage patterns cause them to conflict.

There is also the concept of DevOps, where the app developers also own operations. You really can't have two separate development teams administering operations for one set of hardware. If you are dedicated to operations for both app teams and the teams seem reasonably compatible, then it could be fine.

In short, sure, technically a single cluster can support any number of keyspaces, but it will mostly come down to whether there might be excessive contention over load and operations of the cluster in production. And then there are little things like software upgrades: one app might really need a disruptive or risky upgrade, or need to bounce the entire cluster, and the other app may be impacted even though it had no need for the upgrade or to be bounced.

Are the apps synergistic in some way, such that there is an architectural benefit from running on the same hardware? In the end, the simplest solution is typically the better solution, unless any of these other factors loom too large.
-- Jack Krupansky
On Thu, Apr 2, 2015 at 9:06 AM, Ian Rose ianr...@fullstory.com wrote:
Re: Best practice: Multiple clusters vs multiple tables in a single cluster?
Jack did a superb job of explaining all of your issues, and his last sentence seems to fit your needs (and my experience) very well. The only other point I would add is to ascertain whether your usage patterns call for microservices that abstract away data locality, even if in the initial deployment that layer is a no-op in front of a single cluster. This depends on whether you foresee a rapid stream of special-purpose business functions. A second question is about data access: does Pig support your data-access response times? Many clients find Hadoop ideally suited to a sophisticated ECTL (extract, cleanup, transformation, and load) pipeline feeding fast, schema-oriented repositories such as MySQL. It all depends on the use case, the growth and fragmentation expectations for your business model(s), etc. Good luck.
PS: Jack, thanks for your succinct comment.
On Thu, Apr 2, 2015 at 6:33 AM, Jack Krupansky jack.krupan...@gmail.com wrote:
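The microservices suggestion in the last reply amounts to putting a thin layer between each application and the physical location of its data. The first deployment can then point everything at one cluster, and a later split only changes configuration. A minimal Python sketch of that idea follows. The application names, cluster names, hosts, and keyspaces are hypothetical, and it models only the lookup, not the Cassandra driver connection itself.

```python
"""Hypothetical sketch: a tiny routing layer that hides where each app's data lives.

Assumed names (not from the thread): apps "app_one"/"app_two", cluster names,
host names, keyspaces. A real service would hand `contact_points` and
`keyspace` to its Cassandra driver; here we only model the lookup.
"""
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataLocation:
    cluster: str                     # logical cluster name
    contact_points: Tuple[str, ...]  # hosts a driver would connect to
    keyspace: str


class DataRouter:
    """Maps each application to a DataLocation so callers never hard-code hosts."""

    def __init__(self) -> None:
        self._routes: Dict[str, DataLocation] = {}

    def register(self, app: str, location: DataLocation) -> None:
        # Re-registering an app replaces its location (e.g. after a cluster split).
        self._routes[app] = location

    def locate(self, app: str) -> DataLocation:
        if app not in self._routes:
            raise LookupError(f"no data location registered for {app!r}")
        return self._routes[app]

    def shares_cluster(self, app_a: str, app_b: str) -> bool:
        return self.locate(app_a).cluster == self.locate(app_b).cluster


if __name__ == "__main__":
    shared = ("cass-1.internal", "cass-2.internal")
    router = DataRouter()
    # Initial "no-op" deployment: both apps point at the same cluster.
    router.register("app_one", DataLocation("main", shared, "app_one_ks"))
    router.register("app_two", DataLocation("main", shared, "app_two_ks"))
    print(router.shares_cluster("app_one", "app_two"))  # True
    # Later split: only the routing entry changes, not the callers.
    router.register("app_two", DataLocation("app_two_cluster", ("cass-new-1.internal",), "app_two_ks"))
    print(router.shares_cluster("app_one", "app_two"))  # False
```

Keying routes by application rather than by host means that if one app is later moved to its own cluster, callers keep asking the router for "app_two" and only the registered entry changes; the data itself still has to be migrated separately.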