I understand your point about the billing, but billing here was merely the
trigger that got me to start analyzing the traffic in the first place.

At the moment I'm no longer looking at the numbers on my bill, only at the
numbers I'm measuring with iftop on each node of the cluster. If I measure
the total traffic on port 7000, I see 35 GB in the example above; since
iftop counts each byte twice (it runs on every node, so each transfer shows
up on both the sender and the receiver), the cluster generated 17.5 GB of
unique internode traffic. I'm trying to explain that number in relation to
the traffic I'm seeing on port 9042, billing aside.
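
For completeness, this is the back-of-the-envelope math I'm doing (a minimal
sketch in Python with the numbers from the example above; the variable names
are just mine for illustration):

    # Rough sanity check with the numbers quoted above (illustrative only).
    # iftop runs on every node, so each internode byte is seen twice:
    # once as TX on the sender and once as RX on the receiver.
    port_7000_total_gb = 35.0    # sum of iftop totals on port 7000 across nodes
    port_9042_writes_gb = 1.0    # client -> cluster, counted once at the client
    port_9042_reads_gb = 1.5     # cluster -> client, counted once at the client

    internode_gb = port_7000_total_gb / 2                  # 17.5 GB of unique traffic
    client_gb = port_9042_writes_gb + port_9042_reads_gb   # 2.5 GB

    print(internode_gb / client_gb)   # ~7x, versus the 2-3x I would expect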

Unfortunately, these numbers still don't match at all.

And yes, the cluster is in a single DC, and since I'm using the EC2 snitch,
replica placement is AZ-aware.

Thanks

On Thursday, February 25, 2016, daemeon reiydelle <daeme...@gmail.com>
wrote:

> Hmm. From the AWS FAQ:
>
> *Q: If I have two instances in different availability zones, how will I be
> charged for regional data transfer?*
>
> Each instance is charged for its data in and data out. Therefore, if data
> is transferred between these two instances, it is charged out for the first
> instance and in for the second instance.
>
>
> I really am not seeing this factored into your numbers fully. If data
> transfer is only twice as much as expected, the above billing would seem to
> put the numbers in line. Since (I assume) you have one copy in EACH AZ (DC
> aware, but really dc=az), I am not seeing the bandwidth as that much out of
> line.
>
>
>
> .......
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Thu, Feb 25, 2016 at 11:00 PM, Gianluca Borello <gianl...@sysdig.com> wrote:
>
>> It is indeed very intriguing and I really hope to learn more from the
>> experience of this mailing list. To address your points:
>>
>> - The theory that full data is coming from replicas during reads is not
>> enough to explain the situation. In my scenario, over a time window I had
>> 17.5 GB of internode activity (port 7000) for 1 GB of writes and 1.5 GB of
>> reads (measured on port 9042), so even if both reads and writes hit all
>> replicas, I would get (1 + 1.5) * 3 = 7.5 GB, still leaving 10 GB on port
>> 7000 unaccounted for (see the sketch after this list)
>>
>> - We are doing regular backups the standard way, using periodic snapshots
>> and synchronizing them to S3. This traffic is not part of the anomalous
>> traffic described above, since it goes over port 80, it's clearly visible
>> with a separate bpf filter, and its magnitude is far lower anyway
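>>
>> To make that estimate explicit, here is a minimal sketch of the upper
>> bound I'm using (my own back-of-the-envelope model, assuming the
>> coordinator is never a replica and that reads ship full data rather than
>> digests; the variable names are just for illustration):
>>
>>     rf = 3            # replication factor of the keyspace
>>     writes_gb = 1.0   # client writes seen on port 9042
>>     reads_gb = 1.5    # client reads seen on port 9042
>>
>>     # Worst case: every write is forwarded to all 3 replicas and every read
>>     # pulls full data from all 3 replicas, all over port 7000.
>>     expected_internode_gb = (writes_gb + reads_gb) * rf   # (1 + 1.5) * 3 = 7.5
>>     measured_internode_gb = 17.5
>>
>>     print(measured_internode_gb - expected_internode_gb)  # ~10 GB unaccounted for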
>>
>> Thanks
>>
>> On Thu, Feb 25, 2016 at 9:03 PM, daemeon reiydelle <daeme...@gmail.com> wrote:
>>
>>> Intriguing. It's enough data to look like full data is coming from the
>>> replicas instead of digests when reads occur. Are you doing backup/DR? Are
>>> directories copied regularly over the network, or something else?
>>>
>>>
>>> .......
>>>
>>> Daemeon C.M. Reiydelle
>>> USA (+1) 415.501.0198
>>> London (+44) (0) 20 8144 9872
>>>
>>> On Thu, Feb 25, 2016 at 8:12 PM, Gianluca Borello <gianl...@sysdig.com> wrote:
>>>
>>>> Thank you for your reply.
>>>>
>>>> To answer your points:
>>>>
>>>> - I fully agree on the write volume; in fact, my isolated tests confirm
>>>> your estimate
>>>>
>>>> - About the reads, I agree as well, but the volume of data is still much
>>>> higher than the estimate
>>>>
>>>> - I am writing to a single keyspace with RF 3; there's just that one
>>>> keyspace
>>>>
>>>> - I am not using any indexes; the column families are very simple
>>>>
>>>> - I am aware of the double count; in fact, I measured the traffic on
>>>> port 9042 at the client side (so it's counted just once) and I divided by
>>>> two the traffic on port 7000 as measured on each node (35 GB -> 17.5 GB).
>>>> All the measurements have been done with iftop with proper bpf filters on
>>>> the ports, and the total traffic matches what I see in CloudWatch
>>>> (divided by two).
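>>>>
>>>> For reference, something like the following is roughly how the CloudWatch
>>>> comparison can be done (an illustrative sketch assuming boto3; the
>>>> instance id and region are placeholders, and NetworkOut covers all ports,
>>>> not just 7000, so it's only a coarse cross-check):
>>>>
>>>>     import boto3
>>>>     from datetime import datetime, timedelta
>>>>
>>>>     # Total bytes sent by one instance over the last hour
>>>>     # (hypothetical instance id and region).
>>>>     cw = boto3.client('cloudwatch', region_name='us-east-1')
>>>>     resp = cw.get_metric_statistics(
>>>>         Namespace='AWS/EC2',
>>>>         MetricName='NetworkOut',
>>>>         Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
>>>>         StartTime=datetime.utcnow() - timedelta(hours=1),
>>>>         EndTime=datetime.utcnow(),
>>>>         Period=300,
>>>>         Statistics=['Sum'],
>>>>     )
>>>>     total_gb = sum(p['Sum'] for p in resp['Datapoints']) / 1e9
>>>>     print(total_gb)   # compare against the iftop total for the same node/window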
>>>>
>>>> So unfortunately I still don't have any idea about what's going on and
>>>> why I'm seeing 17 GB of internode traffic instead of ~5-6 GB.
>>>>
>>>> On Thursday, February 25, 2016, daemeon reiydelle <daeme...@gmail.com> wrote:
>>>>
>>>>> If you read & write at quorum, then you write 3 copies of the data
>>>>> before returning to the caller; when reading, you read one full copy
>>>>> (assume it is not on the coordinator) and 1 digest (because a quorum
>>>>> read is 2 replicas, not 3).
>>>>>
>>>>> When you insert, how many keyspaces get written to? (Are you using,
>>>>> e.g., inverted indices?) My guess is that your db has about 1.8 bytes
>>>>> written for every byte inserted.
>>>>>
>>>>> Every byte you write is also counted as a read (system A sends 1 GB to
>>>>> system B, so system B receives 1 GB). You would not be charged if it's
>>>>> intra-AZ, but inter-AZ and inter-DC transfers will get that double count.
>>>>>
>>>>> So, my guess is reverse indexes, and you forgot to include both receive
>>>>> and transmit.
>>>>>
>>>>>
>>>>> .......
>>>>>
>>>>> Daemeon C.M. Reiydelle
>>>>> USA (+1) 415.501.0198
>>>>> London (+44) (0) 20 8144 9872
>>>>>
>>>>> On Thu, Feb 25, 2016 at 6:51 PM, Gianluca Borello <gianl...@sysdig.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We have a Cassandra 2.1.9 cluster on EC2 for one of our live
>>>>>> applications. There's a total of 21 nodes across 3 AWS availability 
>>>>>> zones,
>>>>>> c3.2xlarge instances.
>>>>>>
>>>>>> The configuration is pretty standard: we use the default settings that
>>>>>> come with the DataStax AMI, and the driver in our application is
>>>>>> configured to use lz4 compression. The keyspace where all the activity
>>>>>> happens has RF 3, and we read and write at quorum to get strong
>>>>>> consistency.
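>>>>>>
>>>>>> For illustration, the client-side settings are roughly equivalent to
>>>>>> something like this with the DataStax Python driver (just a sketch, not
>>>>>> our actual application code; the contact point and keyspace name are
>>>>>> placeholders):
>>>>>>
>>>>>>     from cassandra import ConsistencyLevel
>>>>>>     from cassandra.cluster import Cluster
>>>>>>
>>>>>>     # lz4 compression on the native protocol connections (port 9042).
>>>>>>     cluster = Cluster(['10.0.0.1'], port=9042, compression='lz4')
>>>>>>     session = cluster.connect('my_keyspace')   # keyspace has RF 3
>>>>>>
>>>>>>     # Read and write at QUORUM to get strong consistency with RF 3.
>>>>>>     session.default_consistency_level = ConsistencyLevel.QUORUM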
>>>>>>
>>>>>> While analyzing our monthly bill, we noticed that the amount of network
>>>>>> traffic related to Cassandra was significantly higher than expected.
>>>>>> After breaking it down by port, it seems that over any given time
>>>>>> window the internode network activity is 6-7 times higher than the
>>>>>> traffic on port 9042, whereas we would expect something around 2-3
>>>>>> times, given the replication factor and the consistency level of our
>>>>>> queries.
>>>>>>
>>>>>> For example, this is the network traffic broken down by port and
>>>>>> direction over a few minutes, measured as the sum across all nodes:
>>>>>>
>>>>>> Port 9042 from client to cluster (write queries): 1 GB
>>>>>> Port 9042 from cluster to client (read queries): 1.5 GB
>>>>>> Port 7000: 35 GB, which must be divided by two because the traffic is
>>>>>> always directed to another instance of the cluster, so that makes 17.5
>>>>>> GB of generated traffic
>>>>>>
>>>>>> The traffic on port 9042 completely matches our expectations: we do
>>>>>> about 100k write operations, each writing a 10 KB binary blob, and a
>>>>>> bit more reads on the same data.
>>>>>>
>>>>>> According to our calculations, in the worst case, when the
>>>>>> coordinator of the query is not a replica for the data, this should
>>>>>> generate about (1 + 1.5) * 3 = 7.5 GB, and instead we see 17 GB, which is
>>>>>> quite a lot more.
>>>>>>
>>>>>> Also, hinted handoffs are disabled and the nodes are healthy over the
>>>>>> period of observation, and I get the same numbers across pretty much
>>>>>> every time window, even including an entire 24-hour period.
>>>>>>
>>>>>> I tried to replicate this problem in a test environment, so I connected
>>>>>> a client to a test cluster running in a bunch of Docker containers (same
>>>>>> parameters; essentially the only difference is the
>>>>>> GossipingPropertyFileSnitch instead of the EC2 one), and I always get
>>>>>> what I expect: the amount of traffic on port 7000 is between 2 and 3
>>>>>> times the amount of traffic on port 9042, and the queries are pretty
>>>>>> much the same ones.
>>>>>>
>>>>>> Before doing more analysis, I was wondering if someone has an
>>>>>> explanation for this behavior, since perhaps we are missing something
>>>>>> obvious here?
>>>>>>
>>>>>> Thanks
>>>>>>
