The difference is likely due to the DynamicEndpointSnitch (aka dynamic
snitch), which picks replicas to send messages to based on recently
observed latency and self-reported load (accounting for compactions,
repair, etc).  If you want to confirm this, you can disable the dynamic
snitch by adding this line to cassandra.yaml: "dynamic_snitch: false".
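For reference, a sketch of what that change looks like in cassandra.yaml (the default is true; the node must be restarted for the change to take effect):

```yaml
# cassandra.yaml -- disable the dynamic snitch so replica lists are no
# longer reordered by recently observed latency / self-reported load
# (default: true)
dynamic_snitch: false
```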

On Thu, Nov 20, 2014 at 9:52 AM, Nikolai Grigoriev <[email protected]>
wrote:

> Hi,
>
> There is something odd I have observed when testing a configuration with
> two DC for the first time. I wanted to do a simple functional test to prove
> myself (and my pessimistic colleagues ;) ) that it works.
>
> I have a test cluster of 6 nodes, 3 in each DC, and a keyspace that is
> replicated as follows:
>
> CREATE KEYSPACE xxxxxxx WITH replication = {
>
>   'class': 'NetworkTopologyStrategy',
>
>   'DC2': '3',
>
>   'DC1': '3'
>
> };
>
>
> I have disabled the traffic compression between DCs to get more accurate
> numbers.
>
> I have set up a bunch of IP accounting rules on each node so they count
> the outgoing traffic from this node to each other node. I had rules for
> different ports but, of course, it is mostly about port 7000 (or 7001)
> when talking about inter-node traffic. Anyway, I have a table that shows
> the traffic from any node to any node's port 7000.
>
> I ran a test with DCAwareRoundRobinPolicy and the client talking only
> to DC1 nodes. Everything looks fine - the client sent an identical amount
> of data to each of the 3 nodes in DC1. These nodes inside DC1 (I was
> writing with LOCAL_ONE consistency) sent similar amounts of data to each
> other, representing exactly two extra replicas.
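That DC1-internal accounting can be sanity-checked with a toy tally: with RF=3 per DC and LOCAL_ONE writes, each coordinator keeps one replica itself and forwards one copy of every write to each of the two other local replicas. A minimal sketch, assuming a made-up per-node payload (the byte count is hypothetical, only the fan-out logic matters):

```python
from collections import defaultdict

# Hypothetical setup: 3 coordinators in DC1, RF=3 in the local DC,
# LOCAL_ONE writes. Each coordinator stores one replica locally and
# forwards one copy of every write to each of the two other replicas.
dc1 = ["10.3.45.156", "10.3.45.157", "10.3.45.158"]
payload_per_node = 100  # bytes written by the client to each coordinator (made up)

sent = defaultdict(int)  # (src, dst) -> bytes forwarded inside DC1
for coordinator in dc1:
    for peer in dc1:
        if peer != coordinator:
            sent[(coordinator, peer)] += payload_per_node

# Each coordinator emits exactly "two extra replicas" worth of traffic:
for coordinator in dc1:
    total = sum(b for (src, _), b in sent.items() if src == coordinator)
    assert total == 2 * payload_per_node
```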
>
> However, when I look at the traffic from the nodes in DC1 to the nodes in
> DC2 the picture is different:
>
>   src            dst            port        bytes
>   10.3.45.156    10.3.45.159    dpt:7000    117,273,075
>   10.3.45.156    10.3.45.160    dpt:7000    228,326,091
>   10.3.45.156    10.3.45.161    dpt:7000     46,924,339
>   10.3.45.157    10.3.45.159    dpt:7000    118,978,269
>   10.3.45.157    10.3.45.160    dpt:7000    230,444,929
>   10.3.45.157    10.3.45.161    dpt:7000     47,394,179
>   10.3.45.158    10.3.45.159    dpt:7000    113,969,248
>   10.3.45.158    10.3.45.160    dpt:7000    225,844,838
>   10.3.45.158    10.3.45.161    dpt:7000     46,338,939
>
> Nodes 10.3.45.156-158 are in DC1, 10.3.45.159-161 are in DC2. As you can
> see, each of the nodes in DC1 has sent a very different amount of traffic
> to each of the remote nodes: roughly 117 MB, 228 MB and 46 MB
> respectively. Both DCs have one rack.
>
> So, here is my question: how does a node select the node in the remote DC
> to send the message to? I did a quick sweep through the code and I could
> only find the sorting by proximity (checking the rack and DC). So,
> considering that for each request I fire the targets are all 3 nodes in
> the remote DC, the list will contain all 3 nodes in DC2. And, if I
> understood correctly, the first node from the list is picked to send the
> message to.
>
> So, it seems to me that no round-robin-type logic is applied when
> selecting, from the list of targets in the remote DC, the node to forward
> the write to.
>
> If this is true (and the numbers kind of show it is, right?), then
> perhaps the list of nodes with equal proximity should be shuffled
> randomly? Or, instead of picking the first target, a random one should be
> picked?
>
>
> --
> Nikolai Grigoriev
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>
