I currently have a nine-node Cassandra cluster set up as follows:

- DC1: six nodes
- DC2: three nodes
The tokens alternate between the two datacenters. Hadoop is installed as task trackers/data nodes on the three Cassandra nodes in DC2, and a separate non-Cassandra node serves as the Hadoop namenode/jobtracker.

When running Pig scripts pointed at a node in DC2 with LOCAL_QUORUM as the read consistency level, I am seeing network and CPU spikes on the nodes in DC1. I was not expecting any impact on those nodes when LOCAL_QUORUM is used. Could read repair be causing the traffic/CPU spikes? The replication factor is 5 for DC1 and 1 for DC2.

When looking at the map tasks, I am seeing input splits for machines in both datacenters, and I am not sure what that means. My thought is that they should only be getting data from the nodes in DC2.

Thanks,
Aaron
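For reference, here is a minimal sketch of the quorum arithmetic behind my expectation, using the replication factors above (the `local_quorum` helper is mine, not a Cassandra API): LOCAL_QUORUM should only require replicas in the coordinator's own datacenter, so with RF=1 in DC2 a read should be satisfiable by a single DC2 replica.

```python
def local_quorum(rf: int) -> int:
    """Replicas that must respond for LOCAL_QUORUM, given the local DC's RF."""
    return rf // 2 + 1

# Replication factors as configured in my keyspace.
replication = {"DC1": 5, "DC2": 1}

for dc, rf in replication.items():
    print(f"{dc}: RF={rf}, LOCAL_QUORUM needs {local_quorum(rf)} replica(s)")
# DC1: RF=5, LOCAL_QUORUM needs 3 replica(s)
# DC2: RF=1, LOCAL_QUORUM needs 1 replica(s)
```

So on paper, reads coordinated in DC2 should never need to touch DC1, which is why the cross-DC traffic surprises me.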