Re: Cassandra Pig with network topology and data centers.

2011-08-01 Thread Aaron Griffith
Ryan King ryan at twitter.com writes:

 
 It'd be great if we had different settings for inter- and intra-DC read 
 repair.
 
 -ryan
 


Is there a formula or standard for lowering read repair chance so that running
analytics on one DC doesn't hammer the other datacenter?

How low can you set the read repair chance?
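
From the column family settings it looks like read_repair_chance is just a
probability between 0.0 and 1.0, so in cassandra-cli terms (the keyspace and
column family names below are placeholders) the extreme case would be:

    [default@MyKS] update column family MyCF with read_repair_chance = 0.0;

Is it safe to set it all the way to 0.0 and disable read repair entirely?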

Re: Cassandra Pig with network topology and data centers.

2011-07-29 Thread Ryan King
It'd be great if we had different settings for inter- and intra-DC read repair.

-ryan

On Fri, Jul 29, 2011 at 5:06 PM, Jake Luciani jak...@gmail.com wrote:
 Yes, it's read repair; you can lower the read repair chance to tune this.

 On Jul 29, 2011, at 6:31 PM, Aaron Griffith aaron.c.griff...@gmail.com 
 wrote:

 I currently have a 9 node cassandra cluster setup as follows:

 DC1: Six nodes
 DC2: Three nodes

 The tokens alternate between the two datacenters.
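
 For reference, with RandomPartitioner that alternating layout works out
 roughly like this (a sketch, assuming evenly spaced tokens over the
 0..2**127 ring, with every third node belonging to DC2):

     token(i) = i * (2**127 / 9),  i = 0..8
     i = 0,1,3,4,6,7 -> DC1 nodes;  i = 2,5,8 -> DC2 nodes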

 I have Hadoop installed as tasktracker/datanode on the
 three Cassandra nodes in DC2.

 There is another, non-Cassandra node that is used as the Hadoop namenode /
 jobtracker.

 When running Pig scripts pointed at a node in DC2 with LOCAL_QUORUM read
 consistency, I am seeing network and CPU spikes on the nodes in DC1.  I was
 not expecting any impact on those nodes when LOCAL_QUORUM is used.
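
 For reference, the jobs are launched roughly like this, following the
 contrib/pig wrapper that ships with Cassandra (the hostname and script name
 are placeholders, and the consistency-level property is an assumption on my
 part; check ConfigHelper for your version):

     $ export PIG_INITIAL_ADDRESS=dc2-node1    # a node in DC2 (placeholder)
     $ export PIG_RPC_PORT=9160                # Thrift port
     $ export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
     $ export PIG_OPTS="-Dcassandra.consistencylevel.read=LOCAL_QUORUM"
     $ bin/pig_cassandra myscript.pig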

 Could read repair be causing the traffic/CPU spikes?

 The replication factor for DC1 is 5, and for DC2 it is 1.
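
 In cassandra-cli terms that corresponds to something like this (the keyspace
 name is a placeholder, and the strategy_options bracket syntax differs
 slightly between cli versions):

     [default@unknown] update keyspace MyKS
         with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
         and strategy_options = [{DC1:5, DC2:1}];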

 When looking at the map tasks I am seeing input splits for computers in
 both data centers.  I am not sure what this means.  My thought is
 that it should only be getting data from the nodes in DC2.

 Thanks

 Aaron