Re: Using keyspaces for virtual clusters
@Alain I wanted to do 2, but it looks like that won't be possible because of too much overhead. @Eric Yeah, that's what I was afraid of. Though I know the client connects to every server, I just didn't want to write the extra code.
Re: Using keyspaces for virtual clusters
Using keyspaces to support multi-tenancy is very close to an anti-pattern unless there is a finite and reasonable upper bound to how many tenants you'll support overall. Large numbers of tables come with cluster overhead and operational complexity you will come to regret eventually.

> and because I don't like having multiple cql clients/connections on my app-code

You should note that although Cassandra drivers present a single logical connection per cluster, under the hood they maintain connection pools per C* host. You might be able to do a slightly better job of managing those pools as a single cluster and logical connection, but I doubt it will be very significant. It would depend on what options you have available in your driver of choice.

Application logic complexity would not be greatly improved either, because you still need to switch by tenant; whether it's a keyspace name or a connection name doesn't seem like it would make much difference.

As Alain pointed out, upgrades will be painful and maybe even dangerous as a monolithic cluster.
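Eric's "switch by tenant" point can be sketched in a few lines: whether each tenant maps to a keyspace or to a separate cluster connection, the application still routes every query by tenant. This is only an illustration; the `tenant_<id>` naming scheme and the table name are assumptions, not anything from the thread.

```python
# Hedged sketch of "switching by tenant", assuming tenants map 1:1 to
# keyspaces named "tenant_<id>" (the naming scheme is an assumption).

def keyspace_for(tenant_id: str) -> str:
    """Map a tenant id to its (hypothetical) keyspace name."""
    return f"tenant_{tenant_id}"

def qualified_query(tenant_id: str, table: str) -> str:
    # Qualifying the table as keyspace.table means no per-tenant USE
    # statement and no per-tenant session object is needed.
    return f"SELECT * FROM {keyspace_for(tenant_id)}.{table}"

print(qualified_query("acme", "events"))
# SELECT * FROM tenant_acme.events
```

With a driver such as the DataStax Python driver, the resulting string would simply be passed to `session.execute(...)`; either way the code branches on the tenant, which is why switching by keyspace name versus connection name makes little difference.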
Re: Using keyspaces for virtual clusters
Hi Dorian,

> I'm thinking of creating many keyspaces and storing them into many virtual datacenters (the servers will be in 1 logical datacenter, but separated by keyspaces).
>
> Does that make sense (so growing up to 200 dcs of 3 servers each in best case scenario)?

There are 3 main things you can do here:

1 - Use 1 DC, 200 keyspaces using the DC.
2 - Use 200 DCs, 1 keyspace per DC.
3 - Use 200 clusters, 1 DC each, 1 keyspace per client (or many keyspaces, but related to 1 client).

I am not sure if you want to go with 1 or 2; my understanding is that you meant to write "the servers will be in 1 -logical- *physical* datacenter" and that you are willing to do as described in 2.

This looks to be a good idea to me, but for other reasons (client / workload isolation, limited risk, independent growth for each client, visibility on cost per client, ...).

> Does that make sense (so growing up to 200 dcs of 3 servers each in best case scenario)?

Yet I would not go with distinct DCs, but rather distinct C* clusters (different cluster names, seeds, etc.).

I see no good reason to use a virtual cluster instead of distinct clusters. Keeping each keyspace in a distinct, isolated datacenter would work: the datacenters would be quite isolated, since no information or load would be shared, except through gossip.

Yet there are some issues with big clusters due to gossip, and I have had gossip issues in the past that affected all the DCs within a cluster. In that case you would face a major issue that you could have avoided or limited. Plus, when upgrading Cassandra, you would have to upgrade 600 nodes quite quickly, whereas distinct clusters can be upgraded independently. I would then go with either option 1 or 3.

> and because I don't like having multiple cql clients/connections on my app-code

In this case, wouldn't it make sense for you to have per-customer app code, or just conditional connection creation depending on the client?

I am just trying to give you some ideas.
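Option 2 above amounts to pinning each keyspace's replicas to a single datacenter via NetworkTopologyStrategy. A minimal sketch of building such a statement (the keyspace and DC names are invented for illustration):

```python
# Builds a CREATE KEYSPACE statement whose replicas live only in one DC,
# which is what "1 keyspace per DC" (option 2 above) amounts to.

def create_keyspace_cql(keyspace: str, dc: str, rf: int = 3) -> str:
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{dc}': {rf}}};"
    )

print(create_keyspace_cql("client_42", "dc_client_42"))
# CREATE KEYSPACE client_42 WITH replication =
#   {'class': 'NetworkTopologyStrategy', 'dc_client_42': 3};
```

Since the strategy names only one DC, no replicas of that keyspace are placed in the other datacenters, even though, as noted below, the schema itself is still known cluster-wide.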
> Are the keyspaces+tables of dc1 stored in a cassandra node of dc2? (since there is overhead with each keyspace + table, which would probably break this design)
> Or is it just a simple map dcx ---> ip1,ip2,ip3?

I just checked it. All the nodes know about every keyspace and table if they are part of the same Cassandra cluster (in my test version, C* 3.7, this is stored under system_schema.tables - local strategy, no replication). To avoid that, using distinct clusters is the way to go.

https://gist.github.com/arodrime/2f4fb2133c5b242b9500860ac8c6d89c

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
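Alain's check can be reproduced by querying `system_schema.tables` on any node of a C* 3.x cluster. A small sketch of summarising such a schema dump; the live-driver call is left as a comment, and the sample rows are invented for illustration:

```python
# Summarise a system_schema.tables dump: every node carries one schema
# entry for every table in every keyspace of the cluster, which is the
# per-keyspace overhead discussed above.
# With a live cluster the rows would come from something like:
#   rows = session.execute(
#       "SELECT keyspace_name, table_name FROM system_schema.tables")

def tables_per_keyspace(rows):
    counts = {}
    for keyspace_name, table_name in rows:
        counts[keyspace_name] = counts.get(keyspace_name, 0) + 1
    return counts

# Invented sample rows standing in for a real schema dump.
sample = [("tenant_a", "events"), ("tenant_a", "users"), ("tenant_b", "events")]
print(tables_per_keyspace(sample))
# {'tenant_a': 2, 'tenant_b': 1}
```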
Using keyspaces for virtual clusters
Hi,

I need to separate clients' data into multiple clusters, and because I don't like having multiple cql clients/connections on my app-code, I'm thinking of creating many keyspaces and storing them in many virtual datacenters (the servers will be in 1 logical datacenter, but separated by keyspaces).

Does that make sense (so growing up to 200 dcs of 3 servers each in the best-case scenario)?

Does the cql-engine make a new connection (like "use keyspace") when specifying "keyspace.table" in the query?

Are the keyspaces+tables of dc1 stored in a cassandra node of dc2? (since there is overhead with each keyspace + table, which would probably break this design)
Or is it just a simple map dcx ---> ip1,ip2,ip3?

Thank you!