“data was allowed to fully rebalance/repair/drain before the next node was taken off?”
--------------------------------------------------------------
Judging by the messages, the decommission was healthy. As an example:
StorageService.java:3425 - Announcing that I have left the ring for 30000ms
…
INFO [RMI TCP Connection(4)-127.0.0.1] 2016-01-07 06:00:52,662 StorageService.java:1191 - DECOMMISSIONED
I do not believe repairs were run after each node removal. I’ll double-check.
I’m not sure what you mean by ‘rebalance’. How do you check whether a node is balanced? By the load/size of the data dir?
As for draining, there was no need to drain, and I believe it is not something you do as part of decommissioning a node.
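One rough way to check whether a node is “balanced” is to compare the Load (and, when a keyspace is supplied, Owns) columns that nodetool status reports for each node. Below is a minimal sketch of that comparison; it assumes nodetool is on the PATH, uses the keyspace name key_space_01 purely as an illustration, and the output parsing is simplified, so it may need adjusting for your Cassandra version.

    import re
    import subprocess

    # Rough balance check: compare the "Load" column of `nodetool status`
    # across nodes. The keyspace name is illustrative; parsing assumes the
    # usual "UN  <address>  <size> <unit> ..." node lines.
    out = subprocess.run(["nodetool", "status", "key_space_01"],
                         capture_output=True, text=True, check=True).stdout

    units = {"KB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12,
             "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
    loads = []
    for line in out.splitlines():
        m = re.match(r"^[UD][NLJM]\s+(\S+)\s+([\d.]+)\s+(\S+)", line)
        if m:
            addr, size, unit = m.groups()
            loads.append((addr, float(size) * units.get(unit, 1)))

    for addr, load in loads:
        print(f"{addr}: {load / 1e9:.1f} GB")
    if loads:
        sizes = [l for _, l in loads]
        print("max/min load ratio:", round(max(sizes) / min(sizes), 2))

With vnodes and healthy streaming, the max/min load ratio should stay fairly close to 1 once compactions have caught up; a large skew right after a decommission would point at uneven streaming.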
“did you take 1 off per rack/AZ?”
--------------------------------------------------------------
We removed 3 nodes, one from each AZ, in sequence.
These are some of the cfhistograms metrics. Read latencies are high after the removal of the nodes.
--------------------------------------------------------------
You can see that reads at the 99th percentile are ~186 ms and touch 5 SSTables. These are awfully high numbers given that these metrics measure the C* storage-layer read performance.
Does this mean removing the nodes undersized the cluster?
key_space_01/cf_01 histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             1.00             24.60           4055.27             11864                 4
75%             2.00             35.43          14530.76             17084                 4
95%             4.00            126.93          89970.66             35425                 4
98%             5.00            219.34         155469.30             73457                 4
99%             5.00            219.34         186563.16            105778                 4
Min             0.00              5.72             17.09                87                 3
Max             7.00          20924.30        1386179.89          14530764                 4

key_space_01/cf_01 histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             1.00             29.52           4055.27             11864                 4
75%             2.00             42.51          10090.81             17084                 4
95%             4.00            152.32          52066.35             35425                 4
98%             4.00            219.34          89970.66             73457                 4
99%             5.00            219.34         155469.30             88148                 4
Min             0.00              9.89             24.60                87                 0
Max             6.00           1955.67         557074.61          14530764                 4
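If it helps to track these numbers before and after a change, the same nodetool output is easy to scrape. A small sketch that prints the read-latency percentiles in milliseconds, assuming the six-column layout shown above and the key_space_01/cf_01 names (newer releases call the command tablehistograms, so adjust as needed):

    import subprocess

    # Print read-latency percentiles (micros -> ms) from nodetool cfhistograms.
    # Column layout assumed: Percentile, SSTables, Write Latency, Read Latency,
    # Partition Size, Cell Count.
    out = subprocess.run(["nodetool", "cfhistograms", "key_space_01", "cf_01"],
                         capture_output=True, text=True, check=True).stdout

    for line in out.splitlines():
        parts = line.split()
        if parts and parts[0] in ("50%", "75%", "95%", "98%", "99%", "Min", "Max"):
            sstables, read_us = float(parts[1]), float(parts[3])
            print(f"{parts[0]:>4}  {sstables:5.2f} sstables  {read_us / 1000.0:10.2f} ms read")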
----------------
Thank you
From: Carl Mueller
Sent: Wednesday, February 21, 2018 4:33 PM
To: [email protected]
Subject: Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster
Hm, nodetool decommission performs the stream-out of the replicated data, and you said that was apparently without error...
But if you dropped three nodes from one AZ/rack of five nodes with RF3, then we have a missing replica unless NetworkTopologyStrategy fails over to another AZ. But that would also entail cross-AZ streaming, queries, and repair.
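One way to settle the RF/placement question is to look at what the keyspace actually declares. A minimal sketch with the Python cassandra-driver, purely illustrative: 127.0.0.1 is a placeholder contact point, and system_schema is where 3.x keeps this (2.x exposes it as system.schema_keyspaces instead).

    from cassandra.cluster import Cluster

    # Print each keyspace's replication settings to confirm whether
    # NetworkTopologyStrategy is in use and what the per-DC factors are.
    cluster = Cluster(["127.0.0.1"])  # placeholder contact point
    session = cluster.connect()

    rows = session.execute(
        "SELECT keyspace_name, replication FROM system_schema.keyspaces")
    for row in rows:
        print(row.keyspace_name, dict(row.replication))

    cluster.shutdown()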
On Wed, Feb 21, 2018 at 3:30 PM, Carl Mueller <[email protected]>
wrote:
sorry for the idiot questions...
data was allowed to fully rebalance/repair/drain before the next node was taken
off?
did you take 1 off per rack/AZ?
On Wed, Feb 21, 2018 at 12:29 PM, Fred Habash <[email protected]> wrote:
One node at a time
On Feb 21, 2018 10:23 AM, "Carl Mueller" <[email protected]> wrote:
What is your replication factor?
Single datacenter, three availability zones, is that right?
You removed one node at a time or three at once?
On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash <[email protected]> wrote:
We have had a 15-node cluster across three zones, and cluster repairs using ‘nodetool repair -pr’ took about 3 hours to finish. Lately, we shrunk the cluster to 12 nodes. Since then, the same repair job has taken up to 12 hours to finish, and most times it never does.
More importantly, at some point during the repair cycle, we see read latencies
jumping to 1-2 seconds and applications immediately notice the impact.
stream_throughput_outbound_megabits_per_sec is set at 200 and compaction_throughput_mb_per_sec at 64. The /data dir on the nodes is around 500 GB at 44% usage.
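Both throttles can also be confirmed (and changed) at runtime with nodetool rather than read from cassandra.yaml, which is handy when checking every node. A small sketch; the values are simply the ones quoted above, not recommendations:

    import subprocess

    def nodetool(*args):
        # Thin wrapper around the nodetool CLI; assumes it is on the PATH.
        return subprocess.run(["nodetool", *args], capture_output=True,
                              text=True, check=True).stdout.strip()

    # Read the current throttles on this node.
    print(nodetool("getstreamthroughput"))
    print(nodetool("getcompactionthroughput"))

    # Adjust them at runtime (reverts to cassandra.yaml values on restart).
    nodetool("setstreamthroughput", "200")
    nodetool("setcompactionthroughput", "64")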
When shrinking the cluster, the ‘nodetool decommission’ was uneventful. It completed successfully with no issues.
What could possibly cause repairs to have this impact following the cluster downsizing? Taking three nodes out does not seem compatible with such a drastic effect on repair and read latency.
Any expert insights will be appreciated.
----------------
Thank you