Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-22 Thread Carl Mueller
Your partition sizes aren't ridiculous... kinda big cells if there are only 4
cells in 12 MB partitions, but I still don't think that is ludicrous.
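(Rough arithmetic behind the "kinda big cells" remark, taking the Max row of the prod cfhistograms further down the thread:

    12,108,970 bytes / 4 cells ≈ 3 MB per cell.)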

Whelp, I'm out of ideas at my "pay grade". Honestly, with AZs/racks you
theoretically might have been able to take the nodes off simultaneously,
but (disclaimer) I've never done that.

Rolling Restart? <-- definitely indicates I have no ideas :-)

On Thu, Feb 22, 2018 at 8:15 AM, Fd Habash <fmhab...@gmail.com> wrote:


RE: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-22 Thread Fd Habash
One more observation …

When we compare read latencies between the non-prod cluster (where nodes were removed)
and the prod cluster, the read latencies are 5 times slower in the downsized non-prod
cluster, even though the node load, as measured by the size of the /data dir, is similar.

The only difference we see is that prod reads from 4 sstables whereas non-prod
reads from 5, per cfhistograms.

Non-prod /data size
-------------------
Filesystem    Size  Used  Avail  Use%  Mounted on
/dev/nvme0n1  885G  454G  432G   52%   /data
/dev/nvme0n1  885G  439G  446G   50%   /data
/dev/nvme0n1  885G  368G  518G   42%   /data
/dev/nvme0n1  885G  431G  455G   49%   /data
/dev/nvme0n1  885G  463G  423G   53%   /data
/dev/nvme0n1  885G  406G  479G   46%   /data
/dev/nvme0n1  885G  419G  466G   48%   /data

Prod /data size
---------------
Filesystem    Size  Used  Avail  Use%  Mounted on
/dev/nvme0n1  885G  352G  534G   40%   /data
/dev/nvme0n1  885G  423G  462G   48%   /data
/dev/nvme0n1  885G  431G  454G   49%   /data
/dev/nvme0n1  885G  442G  443G   50%   /data
/dev/nvme0n1  885G  454G  431G   52%   /data
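For reference, a minimal sketch of how the per-node load above is typically collected, assuming /data is the mount backing each node's data_file_directories (that path is an assumption of this sketch):

    # run on every node being compared
    df -h /data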


Cfhistograms: comparing prod to non-prod
----------------------------------------

Non-prod (captured 08:21:38)
----------------------------
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         1.00      24.60          4055.27       11864           4
75%         2.00      35.43          14530.76      17084           4
95%         4.00      126.93         89970.66      35425           4
98%         5.00      219.34         155469.30     73457           4
99%         5.00      219.34         186563.16     105778          4
Min         0.00      5.72           17.09         87              3
Max         7.00      20924.30       1386179.89    14530764        4

Prod (captured 07:41:42)
------------------------
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%         1.00      24.60          2346.80       11864           4
75%         2.00      29.52          4866.32       17084           4
95%         3.00      73.46          14530.76      29521           4
98%         4.00      182.79         25109.16      61214           4
99%         4.00      182.79         36157.19      88148           4
Min         0.00      9.89           20.50         87              0
Max         5.00      219.34         155469.30     12108970        4
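For reference, these histograms are typically captured per node with nodetool; the keyspace and table names below are placeholders, and the subcommand name depends on the Cassandra version:

    nodetool cfhistograms <keyspace> <table>       # Cassandra 2.x
    nodetool tablehistograms <keyspace> <table>    # Cassandra 3.x and later

The roughly 5x difference quoted earlier matches the p99 read latency here: ~186.6 ms on the downsized non-prod cluster versus ~36.2 ms on prod.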



Thank you

From: Fd Habash
Sent: Thursday, February 22, 2018 9:00 AM
To: user@cassandra.apache.org
Subject: RE: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read
Latency After Shrinking Cluster


“Data was allowed to fully rebalance/repair/drain before the next node was
taken off?”
--
Judging by the messages, the decomm was healthy. As an example:

  StorageService.java:3425 - Announcing that I have left the ring for 3ms   
…
INFO  [RMI TCP Connection(4)-127.0.0.1] 2016-01-07 06:00:52,662
StorageService.java:1191 - DECOMMISSIONED
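For context, a minimal sketch of the removal itself and the usual way to watch it, assuming stock nodetool with no site-specific wrappers:

    # on the node being removed: streams its ranges to the remaining replicas
    nodetool decommission

    # from the same node, while it is leaving: watch streaming progress and the node's mode
    nodetool netstats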

I do not believe repairs were run after each node removal. I’ll double-check. 

I’m not sure what you mean by ‘rebalance’. How do you check if a node is
balanced? Load/size of the data dir?
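For what it's worth, one quick balance check, assuming stock tooling: nodetool status reports each node's Load and, when given a keyspace, its effective token ownership:

    nodetool status <keyspace>    # keyspace is a placeholder; shows Load and Owns (effective) per node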

As for the drain, there was no need to drain and I believe it is not something 
you do as part of decomm’ing a node. 

“Did you take 1 off per rack/AZ?”
--
We removed 3 nodes, one from each AZ, in sequence.

These are some