It'd be great if we had different settings for inter- and intra-DC read repair.
-ryan
On Fri, Jul 29, 2011 at 5:06 PM, Jake Luciani jak...@gmail.com wrote:
Yes, it's read repair. You can lower the read repair chance to tune this.
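For reference, a minimal sketch of how the read repair chance can be lowered from cassandra-cli (the column family name "MyCF" is a placeholder; check the exact syntax against your Cassandra version):

```
update column family MyCF with read_repair_chance = 0.1;
```

A value of 0.1 means roughly one read in ten triggers a background repair across all replicas, rather than every read.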
On Jul 29, 2011, at 6:31 PM, Aaron Griffith aaron.c.griff...@gmail.com
wrote:
I currently have a 9-node Cassandra cluster set up as follows:
DC1: Six nodes
DC2: Three nodes
The tokens alternate between the two datacenters.
I have Hadoop installed as tasktracker/datanode on the
three Cassandra nodes in DC2.
There is another non-Cassandra node that is used as the Hadoop namenode / job
tracker.
When running Pig scripts pointed at a node in DC2 with LOCAL_QUORUM as the read
consistency level, I am seeing network and CPU spikes on the nodes in DC1. I was
not expecting any impact on those nodes when LOCAL_QUORUM is used.
Can read repair be causing the traffic/cpu spikes?
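It can. A rough illustrative model (not Cassandra source code; the function and names are hypothetical) of why LOCAL_QUORUM reads can still touch the remote datacenter: when the per-read read_repair_chance fires, the coordinator sends digest requests to every replica, including those in DC1.

```python
import random

def replicas_contacted(local_replicas, remote_replicas,
                       read_repair_chance, rng=random.random):
    """Illustrative model only: a LOCAL_QUORUM read normally touches
    just local replicas, but when read repair fires (with probability
    read_repair_chance per read), digest requests go to every replica,
    including those in the remote datacenter."""
    if rng() < read_repair_chance:
        # Global read repair: all replicas in all DCs are contacted.
        return local_replicas + remote_replicas
    return local_replicas

# With read_repair_chance = 1.0, every read also hits the DC1 replicas,
# which would explain the cross-DC traffic and CPU spikes.
```

With the chance at 1.0, every single read generates cross-DC digest traffic; lowering it reduces that proportionally.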
The replication factor for DC1 is 5, and for DC2 it is 1.
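For context, a keyspace with per-DC replication like that would typically be defined with NetworkTopologyStrategy, roughly like this in cassandra-cli (the keyspace name "ks" is a placeholder; verify the syntax for your version):

```
create keyspace ks
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC1:5, DC2:1}];
```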
When looking at the map tasks I am seeing input splits for machines in
both datacenters. I am not sure what this means. My thought is
that it should only be getting data from the nodes in DC2.
Thanks
Aaron