I currently have a nine-node Cassandra cluster set up as follows:

- DC1: six nodes
- DC2: three nodes
The tokens alternate between the two datacenters. Hadoop is installed as task trackers/data nodes on the three Cassandra nodes in DC2, and a separate non-Cassandra node serves as the Hadoop namenode/jobtracker.

When running Pig scripts pointed at a node in DC2 with LOCAL_QUORUM as the read consistency level, I am seeing network and CPU spikes on the nodes in DC1. I was not expecting any impact on those nodes when LOCAL_QUORUM is used. Could read repair be causing the traffic/CPU spikes? The replication factor is 5 for DC1 and 1 for DC2.

When looking at the map tasks, I am seeing input splits for machines in both datacenters, and I am not sure what that means. My thought is that they should only be getting data from the nodes in DC2.

Thanks,
Aaron
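For reference, here is a minimal sketch of the quorum arithmetic behind my expectation, using the replication factors above (the `local_quorum` helper is mine, not a Cassandra API): LOCAL_QUORUM should only require replicas in the coordinator's own datacenter, so with RF=1 in DC2 a read should be satisfiable by a single DC2 replica.

```python
def local_quorum(rf: int) -> int:
    """Replicas that must respond for LOCAL_QUORUM, given the local DC's RF."""
    return rf // 2 + 1

# Replication factors as configured in my keyspace.
replication = {"DC1": 5, "DC2": 1}

for dc, rf in replication.items():
    print(f"{dc}: RF={rf}, LOCAL_QUORUM needs {local_quorum(rf)} replica(s)")
# DC1: RF=5, LOCAL_QUORUM needs 3 replica(s)
# DC2: RF=1, LOCAL_QUORUM needs 1 replica(s)
```

So on paper, reads coordinated in DC2 should never need to touch DC1, which is why the cross-DC traffic surprises me.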