I found the answer.

By default, the DataStax driver for Cassandra uses the RoundRobinPolicy to 
decide which Cassandra node a client read or write request should be routed 
to. But that policy ignores data centers: it cycles through all nodes, 
whichever data center they are in.

Per the documentation 
(http://www.datastax.com/drivers/java/2.0/apidocs/com/datastax/driver/core/policies/LoadBalancingPolicy.html),
if you have multiple data centers, it's probably better to use 
DCAwareRoundRobinPolicy, which gives preference to the local data center. 
The client program needs to know which data center it resides in (e.g., "DC1").


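        import com.datastax.driver.core.Cluster;
        import com.datastax.driver.core.Cluster.Builder;
        import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
        import com.datastax.driver.core.policies.LoadBalancingPolicy;
        import com.datastax.driver.core.policies.TokenAwarePolicy;

        // m_session, m_cluster, m_cassandraNode, localDataCenterName and
        // useTokenAwarePolicy are fields of the enclosing client class.
        private void connect() {
                if (m_session != null) {
                        return; // already connected
                }
                // m_cassandraNode is a comma-separated list of contact points.
                String[] components = m_cassandraNode.split(",");
                Builder builder = Cluster.builder();
                for (String component : components) {
                        builder.addContactPoint(component);
                }
                long start = System.currentTimeMillis();
                // Prefer nodes in the client's own data center (e.g., "DC1").
                LoadBalancingPolicy loadBalancingPolicy =
                                new DCAwareRoundRobinPolicy(localDataCenterName);
                if (useTokenAwarePolicy) {
                        // Optionally wrap the DC-aware policy in a token-aware one.
                        loadBalancingPolicy = new TokenAwarePolicy(loadBalancingPolicy);
                }
                m_cluster = builder.withLoadBalancingPolicy(loadBalancingPolicy)
                                .build();
                m_session = m_cluster.connect();
                prepareQueries();
                float seconds = 0.001f * (System.currentTimeMillis() - start);
                System.out.println("Connected to cassandra host " + m_cassandraNode
                                + " in " + seconds + " seconds.");
        }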
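
To illustrate the difference between the two policies, here is a small 
standalone sketch. It does not use the driver at all: the Node class, the 
host names and the sampleNodes() helper are made up for illustration, and the 
selection logic is a simplification of what the real policies do (no failover, 
no token awareness, no remote-DC fallback). It only shows why plain round 
robin sends requests to both data centers while a DC-aware variant stays local.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DcAwareRoutingSketch {

    /** Hypothetical stand-in for a Cassandra node: an address plus its data center. */
    static final class Node {
        final String address;
        final String dataCenter;

        Node(String address, String dataCenter) {
            this.address = address;
            this.dataCenter = dataCenter;
        }
    }

    /** Two hosts in each of DC1 and DC2, as in the test cluster described in the thread. */
    static List<Node> sampleNodes() {
        return Arrays.asList(
                new Node("dc1-a", "DC1"), new Node("dc1-b", "DC1"),
                new Node("dc2-a", "DC2"), new Node("dc2-b", "DC2"));
    }

    /** Plain round robin: cycles through every node, ignoring data centers. */
    static List<String> roundRobin(List<Node> nodes, int requests) {
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            chosen.add(nodes.get(i % nodes.size()).address);
        }
        return chosen;
    }

    /** Simplified DC-aware round robin: cycles only through nodes in the local data center. */
    static List<String> dcAwareRoundRobin(List<Node> nodes, String localDc, int requests) {
        List<Node> local = new ArrayList<>();
        for (Node node : nodes) {
            if (node.dataCenter.equals(localDc)) {
                local.add(node);
            }
        }
        if (local.isEmpty()) {
            // The sketch does not model any fallback to remote data centers.
            throw new IllegalStateException("No nodes in data center " + localDc);
        }
        return roundRobin(local, requests);
    }

    public static void main(String[] args) {
        List<Node> nodes = sampleNodes();
        System.out.println("Round robin:    " + roundRobin(nodes, 4));
        System.out.println("DC-aware (DC2): " + dcAwareRoundRobin(nodes, "DC2", 4));
    }
}
```

With a reader whose local data center is "DC2", the round-robin variant 
visits all four hosts, whereas the DC-aware variant alternates between the 
two DC2 hosts only.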


-----Original Message-----
From: Duncan Sands [mailto:duncan.sa...@gmail.com] 
Sent: Thursday, January 30, 2014 1:19 AM
To: user@cassandra.apache.org
Subject: Re: Question about local reads with multiple data centers

Hi Donald, which driver are you using?  With the datastax python driver you 
need to use the DCAwareRoundRobinPolicy for the load balancing policy if you 
want the driver to distinguish between your data centres, otherwise by default 
it round robins requests amongst all nodes regardless of which data 
centre they are in, and regardless of which data centre the nodes you told it 
to connect to are in.  Probably it is the same for the other datastax drivers.

Best wishes, Duncan.

On 30/01/14 02:07, Donald Smith wrote:
> We have two datacenters, DC1 and DC2, in our test cluster. Our *write*
> process uses a connection string with just the two hosts in DC1. Our *read*
> process uses a connection string with just the two hosts in DC2. We use a
> PropertyFileSnitch and a property file that 'DC1':2, 'DC2':1 between data 
> centers.
>
> I notice from the *read* process's logs that the reader adds ALL the 
> hosts (in both datacenters) to the list of queried hosts.
>
> My question: will the *read* process try to read first locally from the
> datacenter DC2 I specified in its connection string? I presume so. (I doubt
> that it uses the client's IP address to decide which datacenter is
> closer. And I am unaware of another way to tell it to read locally.)
>
> Also, will read repair happen between datacenters automatically 
> ("read_repair_chance=0.100000")?  Or does that only happen within a 
> single data center?
>
> We're using Cassandra 2.0.4  and CQL.
>
> Thank you
>
> *Donald A. Smith*| Senior Software Engineer
> P: 425.201.3900 x 3866
> C: (206) 819-5965
> F: (646) 443-2333
> dona...@audiencescience.com <mailto:dona...@audiencescience.com>
>
>
> AudienceScience
>
