Thank you for your reply.

- Repairs are not running on the cluster. In fact, we've been "slacking"
on repairs, mainly because we never manually delete data (everything is
TTLed) and we haven't had major failures or outages that required
repairing data (I know that's not a great excuse anyway)

- We are not using server-to-server encryption

- internode_compression is set to 'all', and the application driver uses LZ4 compression
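
For reference, both of those settings can be double-checked straight from
cassandra.yaml on each node; the path below assumes a package install and
may differ on your setup:

$ grep -E 'internode_compression|internode_encryption' /etc/cassandra/cassandra.yaml
internode_compression: all
    internode_encryption: none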

- I just did a "nodetool flush && service cassandra restart" on one node of
the affected cluster and let it run for a few minutes, and these are the
statistics (all the nodes get the same ratio of network activity on port
9042 and port 7000, so pardon my raw estimates below in assuming that the
activity of a single node can reflect the activity of the whole cluster):

9042 traffic: 400 MB (split between 200 MB reads and 200 MB writes)
7000 traffic: 5 GB (counted twice by iftop, so 2.5 GB)

$ nodetool netstats -H
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 10167
Mismatch (Blocking): 210
Mismatch (Background): 151
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         422986
Responses                       n/a         0         403144
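
The per-port figures above come from iftop with a capture filter on each
port, roughly along these lines (eth0 is just an example interface name,
and the exact flags may need adjusting):

$ iftop -i eth0 -nNP -f "port 9042"   # client (native protocol) traffic
$ iftop -i eth0 -nNP -f "port 7000"   # internode traffic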

If I do the same on a test cluster (with less activity and fewer nodes,
but the same RF and configuration), I get, again for a single node:

9042 traffic: 250 MB (split between 100 MB reads and 150 MB writes)
7000 traffic: 1 GB (counted twice by iftop, so 500 MB)

$ nodetool netstats -H
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 6668
Mismatch (Blocking): 159
Mismatch (Background): 43
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         125202
Responses                       n/a         0         141708

So, once again: on the affected cluster the internode (port 7000) traffic
is ~6-7x the 9042 traffic (~2.5 GB vs ~400 MB), whereas on the test
cluster it is ~2x (~500 MB vs ~250 MB), which is what I would expect.

Thanks


On Fri, Feb 26, 2016 at 10:04 AM, Nate McCall <n...@thelastpickle.com>
wrote:

>
>> Unfortunately, these numbers still don't match at all.
>>
>> And yes, the cluster is in a single DC and since I am using the EC2
>> snitch, replicas are AZ aware.
>>
>>
> Are repairs running on the cluster?
>
> Other thoughts:
> - is internode_compression set to 'all' in cassandra.yaml (should be 'all'
> by default, but worth checking since you are using lz4 on the client)?
> - are you using server-to-server encryption ?
>
> You can compare the output of nodetool netstats on the test cluster with
> the AWS cluster as well to see if anything sticks out.
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
