Re: Replication to second data center with different number of nodes

2015-03-30 Thread Carlos Rolo
Sharing my experience here.

1) I have never had any issues with different-sized DCs. If the hardware is
the same, keep the number at 256.
2) In most cases I keep the 256 vnodes and see no performance problems
(and when problems are triggered, the cause is not the vnode count)

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | LinkedIn: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Mon, Mar 30, 2015 at 6:31 AM, Anishek Agarwal anis...@gmail.com wrote:

 Colin,

 When you said a larger number of tokens has a query performance hit, is it
 read or write performance? Also, if you have any links you could share to
 shed some light on this, that would be great.

 Thanks
 Anishek

 On Sun, Mar 29, 2015 at 2:20 AM, Colin Clark co...@clark.ws wrote:

 I typically use a number a lot lower than 256, usually less than 20, for
 num_tokens, as a larger number has historically had a dramatic impact on
 query performance.
 —
 Colin Clark
 co...@clark.ws
 +1 612-859-6129
 skype colin.p.clark

 On Mar 28, 2015, at 3:46 PM, Eric Stevens migh...@gmail.com wrote:

 If you're curious about how Cassandra knows how to replicate data in the
 remote DC: it works the same as in the local DC. Replication is independent
 in each, and you can even set a different replication factor per datacenter
 for each keyspace.  Nodes in each DC take up num_tokens positions on a ring,
 each partition key is mapped to a position on that ring, and whoever owns
 that part of the ring is the primary for that data.  Then (oversimplified)
 RF-1 adjacent nodes become replicas for that same data.

 On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles 
 charles.sibb...@bskyb.com wrote:


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens

  So go with a default 256, and leave initial token empty:

  num_tokens: 256

 # initial_token:


  Cassandra will always give each node the same number of tokens; the only
 time you might want to vary this is if your instances are of different
 sizing/capability, which is itself a bad scenario.

   From: Björn Hachmann bjoern.hachm...@metrigo.de
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Friday, 27 March 2015 12:11
 To: user user@cassandra.apache.org
 Subject: Re: Replication to second data center with different number of
 nodes


 2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:

 Cassandra’s Vnodes config


 Thank you. Yes, we are using vnodes! The num_tokens parameter controls
 the number of vnodes assigned to a specific node.

  It might be that I am seeing problems where there are none.

  Let me rephrase my question: How does Cassandra know it has to
 replicate 1/3 of all keys to each single node in the second DC? I can see
 two ways:
  1. It has to be configured explicitly.
  2. It is derived from the number of nodes available in the data center
 at the time `nodetool rebuild` is started.

  Kind regards
 Björn
   Information in this email including any attachments may be
 privileged, confidential and is intended exclusively for the addressee. The
 views expressed may not be official policy, but the personal views of the
 originator. If you have received it in error, please notify the sender by
 return e-mail and delete it from your system. You should not reproduce,
 distribute, store, retransmit, use or disclose its contents to anyone.
 Please note we reserve the right to monitor all e-mail communication
 through our internal and external networks. SKY and the SKY marks are
 trademarks of Sky plc and Sky International AG and are used under licence.
 Sky UK Limited (Registration No. 2906991), Sky-In-Home Service Limited
 (Registration No. 2067075) and Sky Subscribers Services Limited
 (Registration No. 2340150) are direct or indirect subsidiaries of Sky plc
 (Registration No. 2247735). All of the companies mentioned in this
 paragraph are incorporated in England and Wales and share the same
 registered office at Grant Way, Isleworth, Middlesex TW7 5QD.







Re: Replication to second data center with different number of nodes

2015-03-29 Thread Anishek Agarwal
Colin,

When you said a larger number of tokens has a query performance hit, is it read
or write performance? Also, if you have any links you could share to shed
some light on this, that would be great.

Thanks
Anishek

On Sun, Mar 29, 2015 at 2:20 AM, Colin Clark co...@clark.ws wrote:

 I typically use a number a lot lower than 256, usually less than 20, for
 num_tokens, as a larger number has historically had a dramatic impact on
 query performance.
 —
 Colin Clark
 co...@clark.ws
 +1 612-859-6129
 skype colin.p.clark

 On Mar 28, 2015, at 3:46 PM, Eric Stevens migh...@gmail.com wrote:

 If you're curious about how Cassandra knows how to replicate data in the
 remote DC: it works the same as in the local DC. Replication is independent
 in each, and you can even set a different replication factor per datacenter
 for each keyspace.  Nodes in each DC take up num_tokens positions on a ring,
 each partition key is mapped to a position on that ring, and whoever owns
 that part of the ring is the primary for that data.  Then (oversimplified)
 RF-1 adjacent nodes become replicas for that same data.

 On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles 
 charles.sibb...@bskyb.com wrote:


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens

  So go with a default 256, and leave initial token empty:

  num_tokens: 256

 # initial_token:


  Cassandra will always give each node the same number of tokens; the only
 time you might want to vary this is if your instances are of different
 sizing/capability, which is itself a bad scenario.

   From: Björn Hachmann bjoern.hachm...@metrigo.de
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Friday, 27 March 2015 12:11
 To: user user@cassandra.apache.org
 Subject: Re: Replication to second data center with different number of
 nodes


 2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:

 Cassandra’s Vnodes config


 Thank you. Yes, we are using vnodes! The num_tokens parameter controls
 the number of vnodes assigned to a specific node.

  It might be that I am seeing problems where there are none.

  Let me rephrase my question: How does Cassandra know it has to
 replicate 1/3 of all keys to each single node in the second DC? I can see
 two ways:
  1. It has to be configured explicitly.
  2. It is derived from the number of nodes available in the data center
 at the time `nodetool rebuild` is started.

  Kind regards
 Björn






Re: Replication to second data center with different number of nodes

2015-03-28 Thread Colin Clark
I typically use a number a lot lower than 256, usually less than 20, for num_tokens,
as a larger number has historically had a dramatic impact on query performance.
—
Colin Clark
co...@clark.ws
+1 612-859-6129
skype colin.p.clark

 On Mar 28, 2015, at 3:46 PM, Eric Stevens migh...@gmail.com wrote:
 
 If you're curious about how Cassandra knows how to replicate data in the
 remote DC: it works the same as in the local DC. Replication is independent
 in each, and you can even set a different replication factor per datacenter
 for each keyspace.  Nodes in each DC take up num_tokens positions on a ring,
 each partition key is mapped to a position on that ring, and whoever owns
 that part of the ring is the primary for that data.  Then (oversimplified)
 RF-1 adjacent nodes become replicas for that same data.
 
 On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles charles.sibb...@bskyb.com wrote:

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens
 
 So go with a default 256, and leave initial token empty:
 
 num_tokens: 256
 # initial_token:
 
 Cassandra will always give each node the same number of tokens; the only time
 you might want to vary this is if your instances are of different
 sizing/capability, which is itself a bad scenario.
 
 From: Björn Hachmann bjoern.hachm...@metrigo.de
 Reply-To: user@cassandra.apache.org
 Date: Friday, 27 March 2015 12:11
 To: user user@cassandra.apache.org
 Subject: Re: Replication to second data center with different number of nodes
 
 
 2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:
 Cassandra’s Vnodes config
 
 Thank you. Yes, we are using vnodes! The num_tokens parameter controls the
 number of vnodes assigned to a specific node.
 
 It might be that I am seeing problems where there are none.
 
 Let me rephrase my question: How does Cassandra know it has to replicate 1/3 
 of all keys to each single node in the second DC? I can see two ways:
  1. It has to be configured explicitly.
  2. It is derived from the number of nodes available in the data center at 
 the time `nodetool rebuild` is started.
 
 Kind regards
 Björn
 





Re: Replication to second data center with different number of nodes

2015-03-28 Thread Eric Stevens
If you're curious about how Cassandra knows how to replicate data in the
remote DC: it works the same as in the local DC. Replication is independent
in each, and you can even set a different replication factor per datacenter
for each keyspace.  Nodes in each DC take up num_tokens positions on a ring,
each partition key is mapped to a position on that ring, and whoever owns
that part of the ring is the primary for that data.  Then (oversimplified)
RF-1 adjacent nodes become replicas for that same data.
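To make that concrete, here is a toy sketch of the ring lookup (hypothetical
node names and a tiny token space; Cassandra really uses the Murmur3
partitioner over the full 64-bit range, and real placement also considers
racks):

```python
import bisect
import hashlib

# Hypothetical ring: (token, node) pairs sorted by token.
# With vnodes, each node appears num_tokens times on the ring.
ring = sorted([
    (10, "n1"), (35, "n2"), (60, "n3"),
    (110, "n1"), (160, "n2"), (220, "n3"),
])
tokens = [t for t, _ in ring]
TOKEN_SPACE = 256  # toy space; Murmur3 spans the 64-bit range

def token_for(key: str) -> int:
    """Stand-in hash for the partitioner."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % TOKEN_SPACE

def replicas(key: str, rf: int) -> list[str]:
    """Owner of the next token clockwise is the primary; then walk the
    ring collecting RF-1 further distinct nodes (oversimplified)."""
    i = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    out = []
    while len(out) < rf:
        node = ring[i % len(ring)][1]
        if node not in out:
            out.append(node)
        i += 1
    return out

print(replicas("some-partition-key", rf=2))
```

With rf equal to the number of distinct nodes in a DC, every node ends up a
replica, which is exactly the three-node-DC case discussed in this thread.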

On Fri, Mar 27, 2015 at 6:55 AM, Sibbald, Charles charles.sibb...@bskyb.com
 wrote:


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens

  So go with a default 256, and leave initial token empty:

  num_tokens: 256

 # initial_token:


  Cassandra will always give each node the same number of tokens; the only
 time you might want to vary this is if your instances are of different
 sizing/capability, which is itself a bad scenario.

   From: Björn Hachmann bjoern.hachm...@metrigo.de
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Friday, 27 March 2015 12:11
 To: user user@cassandra.apache.org
 Subject: Re: Replication to second data center with different number of
 nodes


 2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:

 Cassandra’s Vnodes config


 Thank you. Yes, we are using vnodes! The num_tokens parameter controls the
 number of vnodes assigned to a specific node.

  It might be that I am seeing problems where there are none.

  Let me rephrase my question: How does Cassandra know it has to replicate
 1/3 of all keys to each single node in the second DC? I can see two ways:
  1. It has to be configured explicitly.
  2. It is derived from the number of nodes available in the data center at
 the time `nodetool rebuild` is started.

  Kind regards
 Björn



Re: Replication to second data center with different number of nodes

2015-03-27 Thread Sibbald, Charles
I would recommend you utilise Cassandra’s vnodes config and let it manage this
itself.

This means it will create and manage the tokens all on its own, which allows
quick and easy scaling and bootstrapping.

From: Björn Hachmann bjoern.hachm...@metrigo.de
Reply-To: user@cassandra.apache.org
Date: Friday, 27 March 2015 10:40
To: user user@cassandra.apache.org
Subject: Replication to second data center with different number of nodes

Hi,

we currently plan to add a second data center to our Cassandra cluster. I have 
read about this procedure in the documentation (eg. 
https://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html),
 but at least one question remains:

Do I have to provide appropriate values for num_tokens dependent on the number 
of nodes per data center, or is this handled somehow by the 
NetworkTopologyStrategy?

Example: We currently have 12 nodes each covering 256 tokens. Our second 
datacenter will have three nodes only. Do I have to set num_tokens to 1024 
(12*256/3) for the nodes in that DC?

Thank you very much for your valuable input!

Kind regards
Björn Hachmann


Replication to second data center with different number of nodes

2015-03-27 Thread Björn Hachmann
Hi,

we currently plan to add a second data center to our Cassandra cluster. I
have read about this procedure in the documentation (eg.
https://www.datastax.com/documentation/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html),
but at least one question remains:

Do I have to provide appropriate values for num_tokens dependent on the
number of nodes per data center, or is this handled somehow by the
NetworkTopologyStrategy?

Example: We currently have 12 nodes each covering 256 tokens. Our second
datacenter will have three nodes only. Do I have to set num_tokens to 1024
(12*256/3) for the nodes in that DC?

Thank you very much for your valuable input!

Kind regards
Björn Hachmann
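A back-of-the-envelope check of the arithmetic in the question, assuming
replication is independent per DC (as other replies in this thread suggest)
and vnode ownership is evenly balanced; the RF values are hypothetical:

```python
# Per-node data share in a DC is ~RF_dc / N_dc when vnodes are balanced;
# num_tokens does not need to be scaled to 1024 for the smaller DC.
def per_node_share(rf: int, nodes: int) -> float:
    return min(rf, nodes) / nodes

print(per_node_share(3, 12))  # DC1: 12 nodes, RF 3 -> 0.25
print(per_node_share(3, 3))   # DC2:  3 nodes, RF 3 -> 1.0
```

Under that assumption, each of the three nodes in the second DC holds a full
copy whenever its RF is 3, whatever num_tokens is set to.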


Re: Replication to second data center with different number of nodes

2015-03-27 Thread Sibbald, Charles
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__num_tokens

So go with a default 256, and leave initial token empty:


num_tokens: 256

# initial_token:

Cassandra will always give each node the same number of tokens; the only time
you might want to vary this is if your instances are of different
sizing/capability, which is itself a bad scenario.
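A quick simulation of why unequal num_tokens values skew ownership within a
DC (random toy tokens rather than Cassandra's real token allocation; node
names are made up):

```python
import random

random.seed(7)
TOKEN_SPACE = 2**64

# Hypothetical: node "big" is given twice the vnodes of node "small".
assignments = [("small", 256), ("big", 512)]
ring = sorted(
    (random.randrange(TOKEN_SPACE), node)
    for node, count in assignments
    for _ in range(count)
)

# A node owns the range between the previous token and each of its tokens.
owned = {"small": 0, "big": 0}
prev = ring[-1][0] - TOKEN_SPACE  # wrap around the ring
for tok, node in ring:
    owned[node] += tok - prev
    prev = tok

for node, width in owned.items():
    print(node, round(width / TOKEN_SPACE, 2))
```

The split comes out near 1/3 vs 2/3, which is why mixing num_tokens values
only makes sense when hardware capacity differs in the same proportion.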

From: Björn Hachmann bjoern.hachm...@metrigo.de
Reply-To: user@cassandra.apache.org
Date: Friday, 27 March 2015 12:11
To: user user@cassandra.apache.org
Subject: Re: Replication to second data center with different number of nodes


2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:
Cassandra’s Vnodes config

Thank you. Yes, we are using vnodes! The num_tokens parameter controls the
number of vnodes assigned to a specific node.

It might be that I am seeing problems where there are none.

Let me rephrase my question: How does Cassandra know it has to replicate 1/3 of 
all keys to each single node in the second DC? I can see two ways:
 1. It has to be configured explicitly.
 2. It is derived from the number of nodes available in the data center at the 
time `nodetool rebuild` is started.

Kind regards
Björn


Re: Replication to second data center with different number of nodes

2015-03-27 Thread Björn Hachmann
2015-03-27 11:58 GMT+01:00 Sibbald, Charles charles.sibb...@bskyb.com:

 Cassandra’s Vnodes config


Thank you. Yes, we are using vnodes! The num_tokens parameter controls the
number of vnodes assigned to a specific node.

It might be that I am seeing problems where there are none.

Let me rephrase my question: How does Cassandra know it has to replicate
1/3 of all keys to each single node in the second DC? I can see two ways:
 1. It has to be configured explicitly.
 2. It is derived from the number of nodes available in the data center at
the time `nodetool rebuild` is started.

Kind regards
Björn