This looks fine, 8 KB read-ahead as you mentioned.
It doesn't look like a data model issue either, since the reads in this
screenshot https://cl.ly/2c3Z1u2k0u2I appear balanced.
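
If you want to double-check the read-ahead on both boxes side by side, a quick
sanity check might look like this (using the same devices that show up in your
blockdev output below):

blockdev --getra /dev/sda     # read-ahead in 512-byte sectors (16 sectors = 8 KB)
blockdev --getra /dev/dm-0
cat /sys/block/sda/queue/read_ahead_kb     # the same setting as the kernel exposes it, in KB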

Most likely this looks like a configuration issue on the new node to me. The
fact that very little data is going out of the node rules out the possibility
of more "hot" data than can be cached. Are your nodes running Spark jobs with
data locality, which filter data locally and send only limited data out?

I find 800M of disk IO for 4M of network transfer really fishy!

As a starting point, I believe you can try debugging page faults with:
sar -B 1 10
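
In that output, the columns I would watch (standard sysstat fields) are roughly:

sar -B 1 10
# fault/s  - total page faults per second (minor + major)
# majflt/s - major faults per second, i.e. pages that had to be read from disk
# pgpgin/s - kilobytes paged in from disk per second

If majflt/s and pgpgin/s stay high while the node is only serving reads, that
would point at the page cache not holding the hot data set.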

Regards,

On Sun, Feb 19, 2017 at 2:57 AM, Benjamin Roth <benjamin.r...@jaumo.com>
wrote:

> Just for the record, that's what dstat looks like while CS is starting:
>
> root@cas10:~# dstat -lrnv 10
> ---load-avg--- --io/total- -net/total- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage----
>  1m   5m  15m | read  writ| recv  send|run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq
> 0.69 0.18 0.06| 228  24.3 |   0     0 |0.0   0  24|17.8G 3204k  458M  108G|   0     0 |5257k  417k|  17k 3319 |  2   1  97   0   0   0
> 0.96 0.26 0.09| 591  27.9 | 522k  476k|4.1   0  69|18.3G 3204k  906M  107G|   0     0 |  45M  287k|  22k 6943 |  7   1  92   0   0   0
> 13.2 2.83 0.92|2187  28.7 |1311k  839k|5.3  90  18|18.9G 3204k 9008M 98.1G|   0     0 | 791M 8346k|  49k   25k| 17   1  36  46   0   0
> 30.6 6.91 2.27|2188  67.0 |4200k 3610k|8.8 106  27|19.5G 3204k 17.9G 88.4G|   0     0 | 927M 8396k| 116k  119k| 24   2  17  57   0   0
> 43.6 10.5 3.49|2136  24.3 |4371k 3708k|6.3 108 1.0|19.5G 3204k 26.7G 79.6G|   0     0 | 893M   13M| 117k  159k| 15   1  17  66   0   0
> 56.9 14.4 4.84|2152  32.5 |3937k 3767k| 11  83 5.0|19.5G 3204k 35.5G 70.7G|   0     0 | 894M   14M| 126k  160k| 16   1  16  65   0   0
> 63.2 17.1 5.83|2135  44.1 |4601k 4185k|6.9  99  35|19.6G 3204k 44.3G 61.9G|   0     0 | 879M   15M| 133k  168k| 19   2  19  60   0   0
> 64.6 18.9 6.54|2174  42.2 |4393k 3522k|8.4  93 2.2|20.0G 3204k 52.7G 53.0G|   0     0 | 897M   14M| 138k  160k| 14   2  15  69   0   0
>
> The IO shoots up (791M) as soon as CS has started up and accepts requests.
> I also diffed sysctl on both machines. No significant differences;
> only CPU-related values, random values and some hashes differ.
>
> 2017-02-18 21:49 GMT+01:00 Benjamin Roth <benjamin.r...@jaumo.com>:
>
>> 256 tokens:
>>
>> root@cas9:/sys/block/dm-0# blockdev --report
>> RO    RA   SSZ   BSZ   StartSec            Size   Device
>> rw   256   512  4096          0        67108864   /dev/ram0
>> rw   256   512  4096          0        67108864   /dev/ram1
>> rw   256   512  4096          0        67108864   /dev/ram2
>> rw   256   512  4096          0        67108864   /dev/ram3
>> rw   256   512  4096          0        67108864   /dev/ram4
>> rw   256   512  4096          0        67108864   /dev/ram5
>> rw   256   512  4096          0        67108864   /dev/ram6
>> rw   256   512  4096          0        67108864   /dev/ram7
>> rw   256   512  4096          0        67108864   /dev/ram8
>> rw   256   512  4096          0        67108864   /dev/ram9
>> rw   256   512  4096          0        67108864   /dev/ram10
>> rw   256   512  4096          0        67108864   /dev/ram11
>> rw   256   512  4096          0        67108864   /dev/ram12
>> rw   256   512  4096          0        67108864   /dev/ram13
>> rw   256   512  4096          0        67108864   /dev/ram14
>> rw   256   512  4096          0        67108864   /dev/ram15
>> rw    16   512  4096          0    800166076416   /dev/sda
>> rw    16   512  4096       2048    800164151296   /dev/sda1
>> rw    16   512  4096          0    644245094400   /dev/dm-0
>> rw    16   512  4096          0      2046820352   /dev/dm-1
>> rw    16   512  4096          0      1023410176   /dev/dm-2
>> rw    16   512  4096          0    800166076416   /dev/sdb
>>
>> 512 tokens:
>> root@cas10:/sys/block# blockdev --report
>> RO    RA   SSZ   BSZ   StartSec            Size   Device
>> rw   256   512  4096          0        67108864   /dev/ram0
>> rw   256   512  4096          0        67108864   /dev/ram1
>> rw   256   512  4096          0        67108864   /dev/ram2
>> rw   256   512  4096          0        67108864   /dev/ram3
>> rw   256   512  4096          0        67108864   /dev/ram4
>> rw   256   512  4096          0        67108864   /dev/ram5
>> rw   256   512  4096          0        67108864   /dev/ram6
>> rw   256   512  4096          0        67108864   /dev/ram7
>> rw   256   512  4096          0        67108864   /dev/ram8
>> rw   256   512  4096          0        67108864   /dev/ram9
>> rw   256   512  4096          0        67108864   /dev/ram10
>> rw   256   512  4096          0        67108864   /dev/ram11
>> rw   256   512  4096          0        67108864   /dev/ram12
>> rw   256   512  4096          0        67108864   /dev/ram13
>> rw   256   512  4096          0        67108864   /dev/ram14
>> rw   256   512  4096          0        67108864   /dev/ram15
>> rw    16   512  4096          0    800166076416   /dev/sda
>> rw    16   512  4096       2048    800164151296   /dev/sda1
>> rw    16   512  4096          0    800166076416   /dev/sdb
>> rw    16   512  4096       2048    800165027840   /dev/sdb1
>> rw    16   512  4096          0   1073741824000   /dev/dm-0
>> rw    16   512  4096          0      2046820352   /dev/dm-1
>> rw    16   512  4096          0      1023410176   /dev/dm-2
>>
>> 2017-02-18 21:41 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>>
>>> Hi Ben,
>>>
>>> If it's the same on both machines, then something else could be the issue. We
>>> faced high disk IO due to a misconfigured read-ahead, which resulted in a high
>>> amount of disk IO for a comparatively insignificant network transfer.
>>>
>>> Can you post the output of blockdev --report for a normal node and the
>>> 512-token node?
>>>
>>> Regards,
>>>
>>> On Sun, Feb 19, 2017 at 2:07 AM, Benjamin Roth <benjamin.r...@jaumo.com>
>>> wrote:
>>>
>>>> cat /sys/block/sda/queue/read_ahead_kb
>>>> => 8
>>>>
>>>> On all CS nodes. Is that what you mean?
>>>>
>>>> 2017-02-18 21:32 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:
>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> What is the disk read ahead on both nodes?
>>>>>
>>>>> Regards,
>>>>> Bhuvan
>>>>>
>>>>> On Sun, Feb 19, 2017 at 1:58 AM, Benjamin Roth <
>>>>> benjamin.r...@jaumo.com> wrote:
>>>>>
>>>>>> This is the status of the largest KS on these two nodes:
>>>>>> UN  10.23.71.10  437.91 GiB  512          49.1%
>>>>>> 2679c3fa-347e-4845-bfc1-c4d0bc906576  RAC1
>>>>>> UN  10.23.71.9   246.99 GiB  256          28.3%
>>>>>> 2804ef8a-26c8-4d21-9e12-01e8b6644c2f  RAC1
>>>>>>
>>>>>> So roughly as expected.
>>>>>>
>>>>>> 2017-02-17 23:07 GMT+01:00 kurt greaves <k...@instaclustr.com>:
>>>>>>
>>>>>>> what's the Owns % for the relevant keyspace from nodetool status?
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Benjamin Roth
>>>>>> Prokurist
>>>>>>
>>>>>> Jaumo GmbH · www.jaumo.com
>>>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Benjamin Roth
>>>> Prokurist
>>>>
>>>> Jaumo GmbH · www.jaumo.com
>>>> Wehrstraße 46 · 73035 Göppingen · Germany
>>>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>>>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>>>
>>>
>>>
>>
>>
>> --
>> Benjamin Roth
>> Prokurist
>>
>> Jaumo GmbH · www.jaumo.com
>> Wehrstraße 46 · 73035 Göppingen · Germany
>> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
>> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>>
>
>
>
> --
> Benjamin Roth
> Prokurist
>
> Jaumo GmbH · www.jaumo.com
> Wehrstraße 46 · 73035 Göppingen · Germany
> Phone +49 7161 304880-6 · Fax +49 7161 304880-1
> AG Ulm · HRB 731058 · Managing Director: Jens Kammerer
>
