I found the answer. By default, the DataStax Java driver for Cassandra uses RoundRobinPolicy to decide which Cassandra node a client read or write request is routed to, and that policy ignores data centers entirely.
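To make the difference concrete, here is a toy simulation (not the driver's actual code) of which hosts a plain round-robin policy and a simplified DC-aware policy would pick for four consecutive requests. The Host class, the addresses, and the method names are hypothetical. The real DCAwareRoundRobinPolicy can optionally be configured to fall back to hosts in remote data centers; this sketch leaves that out.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinDemo {
    // Hypothetical host: an address plus the data center it belongs to.
    static final class Host {
        final String address;
        final String dc;

        Host(String address, String dc) {
            this.address = address;
            this.dc = dc;
        }
    }

    // Plain round robin: cycles over every known host, ignoring data centers.
    static List<String> plainRoundRobin(List<Host> hosts, int requests) {
        List<String> picks = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            picks.add(hosts.get(i % hosts.size()).address);
        }
        return picks;
    }

    // Simplified DC-aware round robin: cycles only over hosts in the local DC
    // (no remote-DC fallback, unlike the real policy's optional setting).
    static List<String> dcAwareRoundRobin(List<Host> hosts, String localDc, int requests) {
        List<Host> local = new ArrayList<>();
        for (Host h : hosts) {
            if (h.dc.equals(localDc)) {
                local.add(h);
            }
        }
        if (local.isEmpty()) {
            throw new IllegalArgumentException("no hosts in local DC " + localDc);
        }
        List<String> picks = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            picks.add(local.get(i % local.size()).address);
        }
        return picks;
    }

    public static void main(String[] args) {
        // Two hosts per data center, mirroring the DC1/DC2 test cluster in the question.
        List<Host> hosts = List.of(
                new Host("10.0.1.1", "DC1"), new Host("10.0.1.2", "DC1"),
                new Host("10.0.2.1", "DC2"), new Host("10.0.2.2", "DC2"));
        System.out.println(plainRoundRobin(hosts, 4));
        System.out.println(dcAwareRoundRobin(hosts, "DC2", 4));
    }
}
```

In this model, the first line printed is [10.0.1.1, 10.0.1.2, 10.0.2.1, 10.0.2.2] and the second is [10.0.2.1, 10.0.2.2, 10.0.2.1, 10.0.2.2]: once the driver has discovered every node, plain round robin sends half of a DC2 reader's requests to DC1, while the DC-aware variant stays in DC2.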
Per the documentation (http://www.datastax.com/drivers/java/2.0/apidocs/com/datastax/driver/core/policies/LoadBalancingPolicy.html), if you have multiple data centers it's probably better to use DCAwareRoundRobinPolicy, which gives preference to the local data center. The client program needs to know which data center it resides in (e.g., "DC1"). For example:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.LoadBalancingPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    private void connect() {
        if (m_session != null) {
            return; // already connected
        }
        // m_cassandraNode is a comma-separated list of contact points
        String[] components = m_cassandraNode.split(",");
        Cluster.Builder builder = Cluster.builder();
        for (String component : components) {
            builder.addContactPoint(component.trim());
        }
        long start = System.currentTimeMillis();
        // Prefer nodes in the local data center (e.g., "DC1")
        LoadBalancingPolicy loadBalancingPolicy = new DCAwareRoundRobinPolicy(localDataCenterName);
        if (useTokenAwarePolicy) {
            // Optionally prefer replicas that own the statement's partition key
            loadBalancingPolicy = new TokenAwarePolicy(loadBalancingPolicy);
        }
        m_cluster = builder.withLoadBalancingPolicy(loadBalancingPolicy).build();
        m_session = m_cluster.connect();
        prepareQueries();
        float seconds = 0.001f * (System.currentTimeMillis() - start);
        System.out.println("Connected to cassandra host " + m_cassandraNode + " in " + seconds + " seconds.");
    }

-----Original Message-----
From: Duncan Sands [mailto:duncan.sa...@gmail.com]
Sent: Thursday, January 30, 2014 1:19 AM
To: user@cassandra.apache.org
Subject: Re: Question about local reads with multiple data centers

Hi Donald,

Which driver are you using? With the datastax python driver you need to use the DCAwareRoundRobinPolicy for the load balancing policy if you want the driver to distinguish between your data centres; otherwise by default it round robins requests amongst all nodes regardless of which data centre they are in, and regardless of which data centre the nodes you told it to connect to are in. Probably it is the same for the other datastax drivers.

Best wishes, Duncan.

On 30/01/14 02:07, Donald Smith wrote:
> We have two datacenters, DC1 and DC2 in our test cluster. Our *write*
> process uses a connection string with just the two hosts in DC1.
> Our *read* process uses a connection string just with the two hosts in DC2.
> We use a PropertyFileSnitch and a property file that 'DC1':2, 'DC2':1 between
> data centers.
>
> I notice from the *read* process's logs that the reader adds ALL the hosts
> (in both datacenters) to the list of queried hosts.
>
> My question: will the *read* process try to read first locally from the
> datacenter DC2 I specified in its connection string? I presume so. (I doubt
> that it uses the client's IP address to decide which datacenter is closer.
> And I am unaware of another way to tell it to read locally.)
>
> Also, will read repair happen between datacenters automatically
> ("read_repair_chance=0.100000")? Or does that only happen within a single
> data center?
>
> We're using Cassandra 2.0.4 and CQL.
>
> Thank you
>
> *Donald A. Smith* | Senior Software Engineer
> P: 425.201.3900 x 3866
> C: (206) 819-5965
> F: (646) 443-2333
> dona...@audiencescience.com
>
> AudienceScience
>