Thanks Todd!

I check disk bandwidth by first running "hdparm" on the disk (this shows a
sequential read bandwidth of around 56 MB/s), and then running "iotop" while
the benchmark runs (this shows reads of only around 10-15 MB/s, but that
could well be because random seeks are the bottleneck).
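For reference, the invocations were along these lines (I'm reproducing the
flags from memory, so treat them as approximate):

  sudo hdparm -t /dev/sda   # sequential read timing on the raw device
  sudo iotop -o             # only show processes currently doing I/O
  iostat -xm 5              # extended per-device stats in MB/s at a fixed interval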
The iostat output below also seems to suggest that seeks are the problem,
although I'm not sure I'm interpreting the numbers correctly. Here is some
iostat output captured while the benchmark runs; do the queue lengths
(avgqu-sz) I see here indicate a bottleneck? (A rough back-of-the-envelope
reading of these numbers follows the output.)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 1.00 1.28 6.31 2.57 0.43 1.01 332.30 3.63 408.90 259.25 776.48 6.40 5.67
dm-0 0.00 0.00 6.55 2.87 0.43 1.01 311.48 4.60 487.90 380.14 733.49 5.99 5.65
dm-1 0.00 0.00 0.29 0.88 0.00 0.00 8.00 1.33 1135.17 89.17 1479.15 3.23 0.38

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 1.40 18.20 257.80 2.60 13.84 0.08 109.49 62.55 240.45 241.44 141.85 4.43 115.28
dm-0 0.00 0.00 258.80 3.40 13.81 0.01 107.99 63.17 241.17 241.93 183.76 4.40 115.28
dm-1 0.00 0.00 0.00 17.20 0.00 0.07 8.00 0.21 12.00 0.00 12.00 0.14 0.24

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 1.60 255.80 234.60 3.00 16.43 1.01 150.36 113.11 427.74 430.83 186.40 4.82 114.56
dm-0 0.00 0.00 262.00 1.00 18.03 0.00 140.44 113.86 389.06 389.87 175.20 4.36 114.56
dm-1 0.00 0.00 0.20 258.00 0.00 1.01 8.00 37.92 146.87 0.00 146.98 1.02 26.32

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 3.42 98.67 152.09 2.66 9.47 0.40 130.68 75.82 543.73 549.56 210.86 4.61 71.33
dm-0 0.00 0.00 132.32 1.33 8.04 0.01 123.43 76.06 631.83 635.09 308.00 5.34 71.33
dm-1 0.00 0.00 3.04 99.62 0.01 0.39 8.00 14.84 144.57 648.75 129.18 2.72 27.91

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 1.27 54.85 220.89 2.95 13.15 0.23 122.44 66.15 304.77 305.75 231.71 4.43 99.16
dm-0 0.00 0.00 232.49 3.38 14.07 0.02 122.30 66.66 291.36 292.25 230.00 4.20 99.16
dm-1 0.00 0.00 0.00 54.22 0.00 0.21 8.00 18.12 334.27 0.00 334.27 1.57 8.52

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 4.00 19.80 224.60 3.00 12.80 0.09 115.95 62.69 254.17 256.84 54.67 4.39 99.92
dm-0 0.00 0.00 229.40 2.20 13.00 0.01 115.01 61.95 246.34 247.99 73.82 4.29 99.28
dm-1 0.00 0.00 8.00 20.40 0.03 0.08 8.00 3.78 133.13 216.40 100.47 14.25 40.48

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.59 467.12 191.98 5.28 11.99 1.84 143.64 64.63 355.47 359.03 226.22 5.02 99.10
dm-0 0.00 0.00 174.76 2.54 10.82 0.01 125.05 64.04 392.46 396.73 99.38 5.59 99.10
dm-1 0.00 0.00 0.00 469.67 0.00 1.83 8.00 129.15 274.97 0.00 274.97 0.26 12.05

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 4.91 0.41 228.02 0.82 12.08 0.00 108.15 59.24 241.96 242.07 210.00 4.37 99.96
dm-0 0.00 0.00 218.00 1.02 12.97 0.00 121.31 52.74 223.31 223.21 244.00 4.56 99.96
dm-1 0.00 0.00 28.43 0.00 0.11 0.00 8.00 9.19 299.22 299.22 0.00 25.76 73.21

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 5.20 0.60 233.40 1.20 13.55 0.01 118.34 81.72 316.28 317.00 177.33 4.26 100.00
dm-0 0.00 0.00 243.80 1.00 14.17 0.01 118.60 81.06 302.02 302.52 180.80 4.08 99.92
dm-1 0.00 0.00 9.60 0.80 0.04 0.00 8.00 8.72 496.15 522.75 177.00 96.15 100.00

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 5.60 0.80 205.20 1.80 13.77 0.01 136.35 120.57 559.97 562.99 215.11 4.83 100.00
dm-0 0.00 0.00 203.60 1.80 13.86 0.01 138.27 120.66 565.62 568.50 239.56 4.87 100.00
dm-1 0.00 0.00 4.80 0.60 0.02 0.00 8.00 12.45 2189.93 2434.83 230.67 136.44 73.68

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 12.20 0.20 208.00 1.60 13.31 0.01 130.18 102.09 540.08 542.48 228.00 4.77 100.00
dm-0 0.00 0.00 196.40 1.40 13.09 0.01 135.61 100.83 563.25 565.48 249.71 5.02 99.36
dm-1 0.00 0.00 25.00 0.20 0.10 0.00 8.00 15.92 790.03 794.46 236.00 39.68 100.00

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 8.40 0.60 210.20 2.60 12.13 0.02 116.87 101.56 443.61 445.35 302.77 4.70 100.00
dm-0 0.00 0.00 189.40 2.20 12.12 0.01 129.66 104.66 509.39 511.55 324.00 5.22 100.00
dm-1 0.00 0.00 27.80 0.80 0.11 0.00 8.00 11.54 386.55 391.17 226.00 34.97 100.00

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 7.73 73.54 206.87 2.06 13.89 0.26 138.64 113.54 587.97 591.64 219.33 4.79 100.07
dm-0 0.00 0.00 190.21 1.03 12.46 0.00 133.43 111.91 633.26 634.96 319.33 5.23 100.07
dm-1 0.00 0.00 7.22 74.57 0.03 0.29 8.00 24.64 295.01 2199.43 110.71 12.24 100.07

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 22.01 207.89 233.49 14.35 10.57 0.92 94.90 63.80 245.64 258.79 31.80 4.03 99.90
dm-0 0.00 0.00 162.44 0.00 10.66 0.00 134.42 54.57 328.21 328.21 0.00 6.09 98.95
dm-1 0.00 0.00 100.00 222.01 0.39 0.87 8.00 24.66 77.74 113.21 61.76 3.10 99.90

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 8.00 0.80 185.40 1.00 12.21 0.01 134.20 36.76 195.85 196.16 136.80 5.36 100.00
dm-0 0.00 0.00 180.40 1.20 12.10 0.00 136.56 32.82 182.95 183.42 112.00 5.51 100.00
dm-1 0.00 0.00 9.60 0.40 0.04 0.00 8.00 17.82 420.32 428.67 220.00 99.60 99.60
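For what it's worth, here's the rough arithmetic behind my seek suspicion,
using a representative sda sample from above (~13 rMB/s at ~210 r/s) -- just
a back-of-the-envelope sketch, not a precise measurement:

  # average read size = read throughput / read requests per second
  echo "13 210" | awk '{ printf "avg read: %.0f KB across %d reads/s\n", $1 * 1024 / $2, $2 }'
  # prints: avg read: 63 KB across 210 reads/s
  # i.e. many small reads at ~100% util, with avgqu-sz around 60-120 and await
  # of several hundred ms, which is why I suspect seek time rather than bandwidth.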
On 13 February 2012 23:43, Bharath Ravi <[email protected]> wrote:
> Hi all,
>
> I have a distributed HBase setup, on which I'm running the YCSB benchmark
> <https://github.com/brianfrankcooper/YCSB/wiki/running-a-workload>.
> There are 5 region servers, each a dual-core machine with around 4GB of
> memory, connected simply by a 1Gbps Ethernet switch.
>
> The number of "handlers" per regionserver is set to 500 (!) and HDFS's
> maximum receivers per datanode is 4096.
>
> The benchmark dataset is large enough not to fit in memory.
> Update/Insert/Write throughput goes up to 8000 ops/sec easily.
> However, I see read latencies on the order of seconds, and read
> throughputs of only a few hundred ops per second.
>
> "Top" tells me that the CPU's on regionservers spend 70-80% of their time
> waiting for IO, while disk and network
> have plenty of unused bandwidth. How could I diagnose where the read
> bottleneck is?
>
> Any help would be greatly appreciated :)
>
> Thanks in advance!
> --
> Bharath Ravi
>
>
--
Bharath Ravi