Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Ian Rose
Hi all -

We currently have a single cassandra cluster that is dedicated to a
relatively narrow purpose, with just 2 tables.  Soon we will need cassandra
for another, unrelated, system, and my debate is whether to just add the
new tables to our existing cassandra cluster or whether to spin up an
entirely new, separate cluster for this new system.

Does anyone have pros/cons to share on this?  It appears from watching
talks and such online that the big users (e.g. Netflix, Spotify) tend to
favor multiple, single-purpose clusters, and thus that was my initial
preference.  But we are (for now) nowhere close to them in traffic so I'm
wondering if running an entirely separate cluster would be a premature
optimization which wouldn't pay for the (nontrivial) overhead in
configuration management and ops.  While we are still small it might be
much smarter to reuse our existing cluster so that I can get it done
faster...

Thanks!
- Ian


Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Carlos Rolo
Adding a new keyspace should be perfectly fine, unless you have completely
distinct workloads for the different keyspaces. Even so, you can balance
some things at the keyspace/table level. But I would go with a new keyspace,
not a new cluster, given the small size you say you have.
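
As a rough illustration of the "new keyspace" route, here is a minimal Python sketch that only builds the CQL text for one keyspace per system. The keyspace names, the data center name `dc1`, and the replication factor of 3 are placeholders for illustration, not details from this thread; the resulting statement would still have to be run through cqlsh or a driver.

```python
import re
from typing import Mapping

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def create_keyspace_cql(name: str, replicas_per_dc: Mapping[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain CQL identifier: {name!r}")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(replicas_per_dc.items()):
        # Escape single quotes in the data center name for the CQL map literal.
        parts.append("'{}': {}".format(dc.replace("'", "''"), int(rf)))
    replication = "{" + ", ".join(parts) + "}"
    return f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = {replication};"


if __name__ == "__main__":
    # Hypothetical names: one keyspace for the existing system, one for the new one.
    for keyspace in ("existing_system", "new_system"):
        print(create_keyspace_cql(keyspace, {"dc1": 3}))
```

Because replication is set per keyspace, the two systems can share nodes and still be tuned somewhat independently, which is the kind of keyspace-level balancing mentioned above.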

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: http://linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com


Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Ian Rose
Thanks for the input, folks!

As a startup, we don't really have different dev teams / apps - everything
is in service of the product, so given these responses, I think putting
both into the same cluster is the best idea.  And if we want to split them
out in the future we are still small enough that it would be a pain but not
the end of the world...

Cheers,
Ian


Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread Jack Krupansky
There is an old saying in the software industry: The structure of a system
follows from the structure of the organization that created it (Conway's
Law). Seriously, the first question to ask on your end is who owns the
applications in terms of executive management. If one manager makes a
decision that dramatically changes their app's impact on the cluster, is it
likely that they will have done so with the concurrence of the manager who
owns the other app? Trust me, you do not want to be in the middle when
two managers are in dispute over whose app is more important. IOW, if one
manager owns both apps, you are probably safe, but if two different
managers might have differing views of each other's priorities, tread with
caution.

In any case, be prepared to move one of the apps to a different cluster if
and when usage patterns cause them to conflict.

There is also the concept of DevOps, where the app developers also own
operations. You really can't have two separate development teams administer
operations for one set of hardware.

If you are dedicated to operations for both app teams and the teams seem to
be reasonably compatible, then it could be fine.

In short, sure, technically a single cluster can support any number of
keyspaces, but mostly it will come down to whether there might be an excess
of contention for load and operations of the cluster in production.

And then there are little things like software upgrades: one app might
really need a disruptive or risky upgrade, or need to bounce the entire
cluster, and the other app may be impacted even though it had no need for
the upgrade or the bounce.

Are the apps synergistic in some way, such that there is an architectural
benefit from running on the same hardware?

In the end, the simplest solution is typically the better solution, unless
any of these other factors loom too large.


-- Jack Krupansky


Re: Best practice: Multiple clusters vs multiple tables in a single cluster?

2015-04-02 Thread daemeon reiydelle
Jack did a superb job of explaining all of your issues, and his last
sentence seems to fit your needs (and my experience) very well. The only
other point I would add is to ascertain whether the use patterns commend
microservices to abstract from data locality, even if the initial
deployment is a no-op to a single cluster. This depends on whether you see a
rapid stream of special-purpose business functions. A second question is
about data access ... does Pig support your data access response times?
Many clients find Hadoop ideally suited to a sophisticated ECTL (extract,
cleanup, transformation, and load) model feeding fast, schema-oriented
repositories such as MySQL. It all depends on the use case, the growth and
fragmentation expectations for your business model(s), etc.

Good luck.

PS: Jack, thanks for your succinct comment.



