Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-06-01 Thread Kelly Lesperance
Software RAID 10.  Servers are HP DL380 Gen 8s, with 12 x 4 TB 7200 RPM drives.

On 2016-06-01, 3:52 PM, "centos-boun...@centos.org on behalf of 
m.r...@5-cent.us" <centos-boun...@centos.org on behalf of m.r...@5-cent.us> 
wrote:

>Kelly Lesperance wrote:
>> I did some additional testing - I stopped Kafka on the host, and kicked
>> off a disk check, and it ran at the expected speed overnight. I started
>> Kafka this morning, and the raid check's speed immediately dropped down to
>> ~2000K/Sec.
>>
>> I then enabled the write-back cache on the drives (hdparm -W1 /dev/sd*).
>> The raid check is now running between 10K/Sec and 20K/Sec, and has
>> been for several hours (it fluctuates, but seems to stay within that
>> range). Write-back cache is NOT enabled for the drives on the hosts we
>> haven't upgraded yet, but the speeds are similar (I kicked off a raid
>> check on one of our CentOS 6 hosts as well, the window seems to be 15
>> - 20K/Sec on that host).
>
>Perhaps I missed where you answered this: is this software RAID, or
>hardware? And I think you said you're upgrading existing boxes?
>
>  mark
>



Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-06-01 Thread Kelly Lesperance
I did some additional testing - I stopped Kafka on the host, and kicked off a 
disk check, and it ran at the expected speed overnight. I started Kafka this 
morning, and the raid check's speed immediately dropped down to ~2000K/Sec.

I then enabled the write-back cache on the drives (hdparm -W1 /dev/sd*). The 
raid check is now running between 10K/Sec and 20K/Sec, and has been for 
several hours (it fluctuates, but seems to stay within that range). Write-back 
cache is NOT enabled for the drives on the hosts we haven't upgraded yet, but 
the speeds are similar (I kicked off a raid check on one of our CentOS 6 hosts 
as well, the window seems to be 15 - 20K/Sec on that host).
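
For reference, the rough sequence of that test, as a sketch (assuming the array is /dev/md127, as shown later in this thread, and that Kafka runs as a systemd unit named "kafka" - the unit name is an assumption):

    systemctl stop kafka                              # quiesce the write-heavy workload
    echo check > /sys/block/md127/md/sync_action      # kick off a RAID check manually
    cat /sys/block/md127/md/sync_speed                # current check speed, in K/sec
    systemctl start kafka                             # resume writes and watch the speed drop
    for d in /dev/sd[a-l]; do hdparm -W1 "$d"; done   # enable each drive's write-back cache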

Kelly

On 2016-05-27, 9:21 AM, "Kelly Lesperance" <klespera...@blackberry.com> wrote:

>All of our Kafka clusters are fairly write-heavy.  The cluster in question is 
>our second-heaviest – we haven’t yet upgraded the heaviest, due to the issues 
>we’ve been experiencing in this one. 
>
>Here is an iostat example from a host within the same cluster, but without the 
>RAID check running:
>
>[root@r2k1 ~] # iostat -xdmc 1 10
>Linux 3.10.0-327.13.1.el7.x86_64 (r2k1)  05/27/16  _x86_64_  (32 CPU)
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           8.87    0.02    1.28    0.21    0.00   89.62
>
>Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd        0.02    0.55   0.15   27.06    0.03   11.40   859.89     1.02   37.40   36.13   37.41   6.86  18.65
>sdf        0.02    0.48   0.15   26.99    0.03   11.40   862.17     0.15    5.56   40.94    5.37   7.27  19.73
>sdk        0.03    0.58   0.22   27.10    0.03   11.40   857.01     1.60   58.49   36.20   58.67   7.17  19.58
>sdb        0.02    0.52   0.15   27.43    0.03   11.40   848.37     0.02    0.78   42.84    0.55   7.07  19.50
>sdj        0.02    0.55   0.15   27.11    0.03   11.40   858.28     0.62   22.70   41.97   22.59   7.43  20.27
>sdg        0.03    0.68   0.22   27.76    0.03   11.40   836.98     0.76   27.10   34.36   27.04   7.33  20.51
>sde        0.03    0.48   0.22   26.99    0.03   11.40   860.43     0.33   12.07   33.16   11.90   7.34  19.98
>sda        0.03    0.52   0.22   27.43    0.03   11.40   846.65     0.57   20.48   36.42   20.35   7.34  20.31
>sdh        0.02    0.68   0.15   27.76    0.03   11.40   838.63     0.47   16.66   40.96   16.53   7.20  20.09
>sdc        0.03    0.55   0.22   27.06    0.03   11.40   858.19     0.74   27.30   36.96   27.22   7.55  20.58
>sdi        0.03    0.53   0.22   27.13    0.03   11.40   856.04     1.60   58.50   27.43   58.75   5.21  14.24
>sdl        0.02    0.56   0.15   27.11    0.03   11.40   858.27     1.12   41.09   27.89   41.16   5.00  13.63
>md127      0.00    0.00   2.53  161.84    0.36   68.39   856.56     0.00    0.00    0.00    0.00   0.00   0.00
>
>avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>          13.11    0.00    1.82    1.07    0.00   84.01
>
>Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>sdd        0.00    0.00   0.00   81.00    0.00   38.48   972.95    51.00  219.06    0.00  219.06   6.37  51.60
>sdf        0.00    1.00   0.00   73.00    0.00   33.70   945.33    55.02  235.86    0.00  235.86   7.12  52.00
>sdk        0.00    1.00   0.00   56.00    0.00   25.70   939.73    60.45  223.79    0.00  223.79   9.29  52.00
>sdb        0.00    2.00   0.00   70.00    0.00   34.48  1008.70    58.88  292.81    0.00  292.81   7.37  51.60
>sdj        0.00    3.00   0.00   62.00    0.00   29.87   986.60    59.32  243.48    0.00  243.48   8.26  51.20
>sdg        0.00    1.00   0.00   49.00    0.00   23.43   979.45    60.37  234.98    0.00  234.98  10.53  51.60
>sde        0.00    1.00   0.00   61.00    0.00   27.95   938.38    58.17  239.57    0.00  239.57   8.52  52.00
>sda        0.00    2.00   0.00   56.00    0.00   27.48  1004.88    56.27  202.88    0.00  202.88   9.27  51.90
>sdh        0.00    1.00   0.00   70.00    0.00   33.57   982.19    59.00  277.84    0.00  277.84   7.43  52.00
>sdc        0.00    0.00   0.00   64.00    0.00   30.06   961.89    58.20  268.30    0.00  268.30   8.08  51.70
>sdi        0.00    3.00   0.00  116.00    0.00   55.62   981.94    44.54  199.72    0.00  199.72   4.56  52.90
>sdl        0.00    1.00   0.00  128.00    0.00   60

Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-27 Thread Kelly Lesperance
           0.00    0.00   0.00  535.00    0.00  248.42   950.95     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.08    0.00    1.41    0.00    0.00   87.51

Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdd        0.00    5.00   0.00   42.00    0.00    0.38    18.55     2.25   53.52    0.00   53.52   4.93  20.70
sdf        0.00    0.00   0.00   35.00    0.00    0.21    12.43     1.62   46.17    0.00   46.17   5.29  18.50
sdk        0.00   23.00   0.00   42.00    0.00    0.44    21.40     1.99   47.29    0.00   47.29   4.64  19.50
sdb        0.00    9.00   0.00   58.00    0.00    0.34    12.02     2.77   47.78    0.00   47.78   4.12  23.90
sdj        0.00    1.00   0.00   39.00    0.00    0.24    12.79     1.79   45.97    0.00   45.97   5.21  20.30
sdg        0.00   11.00   0.00   66.00    0.00    0.40    12.45     3.60   54.47    0.00   54.47   3.42  22.60
sde        0.00    0.00   0.00   35.00    0.00    0.21    12.43     2.13   61.00    0.00   61.00   8.89  31.10
sda        0.00    9.00   0.00   58.00    0.00    0.34    12.02     2.48   42.81    0.00   42.81   3.71  21.50
sdh        0.00   11.00   0.00   66.00    0.00    0.40    12.45     4.81   72.83    0.00   72.83   3.80  25.10
sdc        0.00    5.00   0.00   43.00    0.00    0.88    41.93     1.99   63.81    0.00   63.81   5.00  21.50
sdi        0.00    1.00   0.00   39.00    0.00    0.24    12.79     1.31   33.69    0.00   33.69   4.03  15.70
sdl        0.00   23.00   0.00   42.00    0.00    0.44    21.40     1.23   29.33    0.00   29.33   3.71  15.60
md127      0.00    0.00   0.00  313.00    0.00    2.01    13.14     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.16    0.03    1.66    0.00    0.00   82.15

Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdd        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdk        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdj        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdg        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdh        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdl        0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
md127      0.00    0.00   0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

On 2016-05-26, 11:50 PM, "centos-boun...@centos.org on behalf of Gordon 
Messmer" <centos-boun...@centos.org on behalf of gordon.mess...@gmail.com> 
wrote:

>On 05/25/2016 09:54 AM, Kelly Lesperance wrote:
>> What we're seeing is that when the weekly raid-check script executes, 
>> performance nose dives, and I/O wait skyrockets. The raid check starts out 
>> fairly fast (200000K/sec - the limit that's been set), but then quickly drops 
>> down to about 4000K/Sec. dev.raid.speed sysctls are at the defaults:
>
>It looks like some pretty heavy writes are going on at the time. I'm not 
>sure what you mean by "nose dives", but I'd expect *some* performance 
>impact of running a read-intensive process like a RAID check at the same 
>time you're running a write-intensive process.
>
>Do the same write-heavy processes run on the other clusters, where you 
>aren't seeing performance issues?
>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>> 9.24    0.00    1.32   20.02    0.00   69.42
>>
>> Device:   tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
>>

Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-26 Thread Kelly Lesperance
sdf        0.00    0.00  16.00    6.00    1.00    0.02    95.27     0.14    6.59    4.06   13.33   6.55  14.40
sdg        0.00    0.00  16.00   29.00    1.00    0.11    50.56     0.31    6.80    0.44   10.31   4.82  21.70
sdb        0.00    0.00   8.00   16.00    0.50    0.06    48.08     0.24   10.17    0.62   14.94   6.75  16.20
md127      0.00    0.00   0.00   89.00    0.00    0.36     8.24     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          35.50    0.03    7.43    0.00    0.00   57.04

Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdi       74.00    0.00  21.00    7.00    5.94    1.53   546.00     0.47   16.71   17.57   14.14   4.89  13.70
sdk       70.00    0.00  26.00   10.00    6.00    1.41   421.56     0.41   10.94   11.73    8.90   4.33  15.60
sdc       77.00    0.00  11.00    9.00    5.50    1.57   723.60     0.64   32.00   42.64   19.00  10.65  21.30
sdd       77.00    0.00  11.00    9.00    5.50    1.57   723.60     1.19   59.60   96.36   14.67  12.10  24.20
sda       71.00    1.00  24.00   11.00    5.94    1.53   437.09     0.51   14.46   14.38   14.64   5.09  17.80
sdj       74.00    0.00  21.00    7.00    5.94    1.53   546.00     0.58   20.79   20.57   21.43   7.04  19.70
sdl       70.00    0.00  26.00   11.00    6.00    1.91   437.84     0.39   10.54   11.04    9.36   4.32  16.00
sdh       77.00    0.00  11.00    7.00    5.50    1.52   798.67     0.43   24.17   33.82    9.00   6.61  11.90
sde       77.00    0.00  11.00    6.00    5.50    1.52   845.18     0.58   34.24   36.91   29.33  13.71  23.30
sdf       77.00    0.00  11.00    6.00    5.50    1.52   845.18     0.60   35.35   45.36   17.00  10.06  17.10
sdg       77.00    0.00  11.00    8.00    5.50    1.52   757.05     0.43   22.95   32.00   10.50   6.89  13.10
sdb       71.00    1.00  24.00   11.00    5.94    1.53   437.09     0.60   17.14   13.67   24.73   9.03  31.60
md127      0.00    0.00   0.00   52.00    0.00    9.57   376.96     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          27.06    0.03    6.00    0.00    0.00   66.91

Device:  rrqm/s  wrqm/s    r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdi       14.00    0.00  10.00    9.00    1.50    4.06   599.58     0.13    6.84    2.60   11.56   6.63  12.60
sdk       14.00    0.00  10.00   10.00    1.50    5.00   665.60     0.13    7.05    2.50   11.60   6.35  12.70
sdc       14.00    0.00  11.00   10.00    1.56    4.01   543.24     0.15    7.33    1.00   14.30   7.24  15.20
sdd       14.00    0.00  11.00   10.00    1.56    4.01   543.24     0.15    7.14    1.00   13.90   7.05  14.80
sda       14.00    0.00  11.00   10.00    1.56    4.20   561.52     0.12    5.38    0.91   10.30   5.43  11.40
sdj       14.00    0.00  10.00    9.00    1.50    4.06   599.58     0.26   13.68    3.60   24.89  13.47  25.60
sdl       14.00    0.00  10.00    9.00    1.50    4.50   646.74     0.13    6.63    1.30   12.56   6.47  12.30
sdh       13.00    0.00  11.00    9.00    1.50    4.00   563.60     0.11    5.70    1.18   11.22   5.55  11.10
sde       14.00    0.00  10.00    8.00    1.50    4.00   625.78     0.09    4.78    1.10    9.38   4.67   8.40
sdf       14.00    0.00  10.00    8.00    1.50    4.00   625.78     0.14    8.06    4.00   13.12   7.17  12.90
sdg       13.00    0.00  11.00    9.00    1.50    4.00   563.60     0.14    7.00    1.91   13.22   6.80  13.60
sdb       14.00    0.00  11.00   10.00    1.56    4.20   561.52     0.17    7.67    1.73   14.20   7.71  16.20
md127      0.00    0.00   0.00   56.00    0.00   25.27   924.14     0.00    0.00    0.00    0.00   0.00   0.00



On 2016-05-25, 5:43 PM, "centos-boun...@centos.org on behalf of 
cpol...@surewest.net" <centos-boun...@centos.org on behalf of 
cpol...@surewest.net> wrote:

>On 2016-05-25 19:13, Kelly Lesperance wrote:
>> Hdparm didn’t get far:
>> 
>> [root@r1k1 ~] # hdparm -tT /dev/sda
>> 
>> /dev/sda:
>>  Timing cached reads:   Alarm clock
>> [root@r1k1 ~] #
>
>Hi Kelly,
>
>Try running 'iostat -xdmc 1'. Look for a single drive that has
>substantially greater await than ~10msec. If all the drives 
>except one are taking 6-8msec, but one is very much more, you've
>got a drive that drags down the whole array's performance.
>
>Ignore the very first output from the command - it's an
>average of the disk subsystem since boot.
>
>Pos
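
A rough way to automate that outlier check, as a sketch (field positions assume the sysstat layout shown earlier in this thread, where await is column 10; the 50 ms threshold is arbitrary):

    iostat -xd 5 | awk '$1 ~ /^sd/ && $10+0 > 50 {print $1, "await:", $10}'

A device that shows up repeatedly while its peers stay quiet is the one to suspect.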

Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
I should rephrase that – some parts of HP are helping us, but the team I opened 
the case with isn’t being very helpful.

On 2016-05-25, 4:29 PM, "Kelly Lesperance" <klespera...@blackberry.com> wrote:

>Already done – they’re not being very helpful, as we don’t have a support 
>contract, just standard warranty.
>
>On 2016-05-25, 4:27 PM, "centos-boun...@centos.org on behalf of 
>m.r...@5-cent.us" <centos-boun...@centos.org on behalf of m.r...@5-cent.us> 
>wrote:
>
>>Kelly Lesperance wrote:
>>> LSI/Avago’s web pages don’t have any downloads for the SAS2308, so I think
>>> I’m out of luck wrt MegaRAID.
>>>
>>> Bounced the node, confirmed MPT Firmware 15.10.09.00-IT.
>>> HP Driver is v 15.10.04.00.
>>>
>>> Both are the latest from HP.
>>>
>>> Unsure why, but the module itself reports version 20.100.00.00:
>>>
>>> [root@r1k1 sys] # cat module/mpt2sas/version
>>> 20.100.00.00
>>
>>Suggestion: if these are new, they're under warranty, and it's a hardware
>>issue. Call HP tech support and open a ticket with them - they might have
>>an answer.
>>
>> mark
>>
>



Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
Already done – they’re not being very helpful, as we don’t have a support 
contract, just standard warranty.

On 2016-05-25, 4:27 PM, "centos-boun...@centos.org on behalf of 
m.r...@5-cent.us" <centos-boun...@centos.org on behalf of m.r...@5-cent.us> 
wrote:

>Kelly Lesperance wrote:
>> LSI/Avago’s web pages don’t have any downloads for the SAS2308, so I think
>> I’m out of luck wrt MegaRAID.
>>
>> Bounced the node, confirmed MPT Firmware 15.10.09.00-IT.
>> HP Driver is v 15.10.04.00.
>>
>> Both are the latest from HP.
>>
>> Unsure why, but the module itself reports version 20.100.00.00:
>>
>> [root@r1k1 sys] # cat module/mpt2sas/version
>> 20.100.00.00
>
>Suggestion: if these are new, they're under warranty, and it's a hardware
>issue. Call HP tech support and open a ticket with them - they might have
>an answer.
>
> mark
>



Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
LSI/Avago’s web pages don’t have any downloads for the SAS2308, so I think I’m 
out of luck wrt MegaRAID.

Bounced the node, confirmed MPT Firmware 15.10.09.00-IT.
HP Driver is v 15.10.04.00.

Both are the latest from HP.

Unsure why, but the module itself reports version 20.100.00.00:

[root@r1k1 sys] # cat module/mpt2sas/version 
20.100.00.00


On 2016-05-25, 3:20 PM, "centos-boun...@centos.org on behalf of 
m.r...@5-cent.us" <centos-boun...@centos.org on behalf of m.r...@5-cent.us> 
wrote:

>John R Pierce wrote:
>> On 5/25/2016 11:44 AM, Kelly Lesperance wrote:
>>> The HBA is an HP H220.
>>
>> OH... it's a very good idea to verify the driver is at the same revision
>> level as the firmware... not 100% sure how you do this under CentOS, my
>> H220 system is running FreeBSD, and is at revision P20, both firmware
>> and driver. HP's firmware, at least what I could find, was a fairly
>> old P14 or something, so I had to re-flash mine with 'generic' LSI
>> firmware. This isn't exactly a recommended thing to do, but it's sure
>> working fine for me.
>
>Not sure if dmidecode will tell you, but you might see if you can run
>smartctl -i
>
>Also, you could either, on boot, go into the card's firmware interface,
>and that'll tell you, somewhere, what the firmware version is. Not sure if
>MegaRAID will work with this card - if it does, you really want it... even
>though it has an actively user-hostile interface.
>
>  mark
>



Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
I installed the latest firmware and driver (mpt2sas) from HP on one system.  
The driver is v20, it appears the firmware may be 15, though:

[   11.128979] mpt2sas version 20.100.00.00 loaded
[   11.513836] mpt2sas0: LSISAS2308: FWVersion(15.10.09.00), 
ChipRevision(0x05), BiosVersion(07.39.00.00)
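
For anyone trying to line the two up under CentOS, both values can be read from a running system (the same interfaces quoted above):

    cat /sys/module/mpt2sas/version          # driver version, e.g. 20.100.00.00
    dmesg | grep -i 'mpt2sas.*FWVersion'     # firmware version logged when the driver loads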


On 2016-05-25, 3:01 PM, "centos-boun...@centos.org on behalf of John R Pierce" 
<centos-boun...@centos.org on behalf of pie...@hogranch.com> wrote:

>On 5/25/2016 11:44 AM, Kelly Lesperance wrote:
>> The HBA is an HP H220.
>
>
>OH... it's a very good idea to verify the driver is at the same revision 
>level as the firmware... not 100% sure how you do this under CentOS, my 
>H220 system is running FreeBSD, and is at revision P20, both firmware 
>and driver. HP's firmware, at least what I could find, was a fairly 
>old P14 or something, so I had to re-flash mine with 'generic' LSI 
>firmware. This isn't exactly a recommended thing to do, but it's sure 
>working fine for me.
>
>
>
>
>-- 
>john r pierce, recycling bits in santa cruz
>



Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
Hdparm didn’t get far:

[root@r1k1 ~] # hdparm -tT /dev/sda

/dev/sda:
 Timing cached reads:   Alarm clock
[root@r1k1 ~] #

On 2016-05-25, 2:44 PM, "Kelly Lesperance" <klespera...@blackberry.com> wrote:

>The HBA is an HP H220.
>
>We haven’t really benchmarked individual drives – all 12 drives are utilized 
>in one RAID-10 array, I’m unsure how we would test individual drives without 
>breaking the array.  
>
>Trying ‘hdparm -tT /dev/sda’ now – it’s been running for 25 minutes so far… 
>
>Kelly
>
>On 2016-05-25, 2:12 PM, "centos-boun...@centos.org on behalf of Dennis 
>Jacobfeuerborn" <centos-boun...@centos.org on behalf of denni...@conversis.de> 
>wrote:
>
>>What is the HBA the drives are attached to?
>>Have you done a quick benchmark on a single disk to check if this is a
>>raid problem or further down the stack?
>>
>>Regards,
>>  Dennis
>>
>>On 25.05.2016 19:26, Kelly Lesperance wrote:
>>> [merging]
>>> 
>>> The HBA the drives are attached to has no configuration that I’m aware of.  
>>> We would have had to accidentally change 23 of them ☺
>>> 
>>> Thanks,
>>> 
>>> Kelly
>>> 
>>> On 2016-05-25, 1:25 PM, "Kelly Lesperance" <klespera...@blackberry.com> 
>>> wrote:
>>> 
>>>> They are:
>>>>
>>>> [root@r1k1 ~] # hdparm -I /dev/sda
>>>>
>>>> /dev/sda:
>>>>
>>>> ATA device, with non-removable media
>>>>Model Number:   MB4000GCWDC 
>>>>Serial Number:  S1Z06RW9
>>>>Firmware Revision:  HPGD
>>>>Transport:  Serial, SATA Rev 3.0
>>>>
>>>> Thanks,
>>>>
>>>> Kelly
>>> 
>>> 
>>> On 2016-05-25, 1:23 PM, "centos-boun...@centos.org on behalf of 
>>> m.r...@5-cent.us" <centos-boun...@centos.org on behalf of m.r...@5-cent.us> 
>>> wrote:
>>> 
>>>> Kelly Lesperance wrote:
>>>>> I’ve posted this on the forums at
>>>>> https://www.centos.org/forums/viewtopic.php?f=47&t=57926&p=244614#p244614
>>>>> - posting to the list in the hopes of getting more eyeballs on it.
>>>>>
>>>>> We have a cluster of 23 HP DL380p Gen8 hosts running Kafka. Basic specs:
>>>>>
>>>>> 2x E5-2650
>>>>> 128 GB RAM
>>>>> 12 x 4 TB 7200 RPM SATA drives connected to an HP H220 HBA
>>>>> Dual port 10 GB NIC
>>>>>
>>>>> The drives are configured as one large RAID-10 volume with mdadm,
>>>>> filesystem is XFS. The OS is not installed on the drive - we PXE boot a
>>>>> CentOS image we've built with minimal packages installed, and do the OS
>>>>> configuration via puppet. Originally, the hosts were running CentOS 6.5,
>>>>> with Kafka 0.8.1, without issue. We recently upgraded to CentOS 7.2 and
>>>>> Kafka 0.9, and that's when the trouble started.
>>>> 
>>>> One more stupid question: could the configuration of the card for how the
>>>> drives are accessed been accidentally changed?
>>>>
>>>>  mark
>>>>
>>> 
>>> 
>>
>



Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
The HBA is an HP H220.

We haven’t really benchmarked individual drives – all 12 drives are utilized in 
one RAID-10 array, I’m unsure how we would test individual drives without 
breaking the array.  
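
One non-destructive option: raw reads of a single member don't modify the array, so a per-drive sequential read test can be compared across all twelve (the device name here is just an example):

    dd if=/dev/sda of=/dev/null bs=1M count=4096 iflag=direct   # ~4 GB raw read, bypassing the page cache

A member that reads far slower than its peers would point at a drive rather than at md or the filesystem.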

Trying ‘hdparm -tT /dev/sda’ now – it’s been running for 25 minutes so far… 

Kelly

On 2016-05-25, 2:12 PM, "centos-boun...@centos.org on behalf of Dennis 
Jacobfeuerborn" <centos-boun...@centos.org on behalf of denni...@conversis.de> 
wrote:

>What is the HBA the drives are attached to?
>Have you done a quick benchmark on a single disk to check if this is a
>raid problem or further down the stack?
>
>Regards,
>  Dennis
>
>On 25.05.2016 19:26, Kelly Lesperance wrote:
>> [merging]
>> 
>> The HBA the drives are attached to has no configuration that I’m aware of.  
>> We would have had to accidentally change 23 of them ☺
>> 
>> Thanks,
>> 
>> Kelly
>> 
>> On 2016-05-25, 1:25 PM, "Kelly Lesperance" <klespera...@blackberry.com> 
>> wrote:
>> 
>>> They are:
>>>
>>> [root@r1k1 ~] # hdparm -I /dev/sda
>>>
>>> /dev/sda:
>>>
>>> ATA device, with non-removable media
>>> Model Number:   MB4000GCWDC 
>>> Serial Number:  S1Z06RW9
>>> Firmware Revision:  HPGD
>>> Transport:  Serial, SATA Rev 3.0
>>>
>>> Thanks,
>>>
>>> Kelly
>> 
>> 
>> On 2016-05-25, 1:23 PM, "centos-boun...@centos.org on behalf of 
>> m.r...@5-cent.us" <centos-boun...@centos.org on behalf of m.r...@5-cent.us> 
>> wrote:
>> 
>>> Kelly Lesperance wrote:
>>>> I’ve posted this on the forums at
>>>> https://www.centos.org/forums/viewtopic.php?f=47&t=57926&p=244614#p244614
>>>> - posting to the list in the hopes of getting more eyeballs on it.
>>>>
>>>> We have a cluster of 23 HP DL380p Gen8 hosts running Kafka. Basic specs:
>>>>
>>>> 2x E5-2650
>>>> 128 GB RAM
>>>> 12 x 4 TB 7200 RPM SATA drives connected to an HP H220 HBA
>>>> Dual port 10 GB NIC
>>>>
>>>> The drives are configured as one large RAID-10 volume with mdadm,
>>>> filesystem is XFS. The OS is not installed on the drive - we PXE boot a
>>>> CentOS image we've built with minimal packages installed, and do the OS
>>>> configuration via puppet. Originally, the hosts were running CentOS 6.5,
>>>> with Kafka 0.8.1, without issue. We recently upgraded to CentOS 7.2 and
>>>> Kafka 0.9, and that's when the trouble started.
>>> 
>>> One more stupid question: could the configuration of the card for how the
>>> drives are accessed been accidentally changed?
>>>
>>>  mark
>>>
>> 
>> 
>



Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
[merging]

The HBA the drives are attached to has no configuration that I’m aware of.  We 
would have had to accidentally change 23 of them ☺

Thanks,

Kelly

On 2016-05-25, 1:25 PM, "Kelly Lesperance" <klespera...@blackberry.com> wrote:

>They are:
>
>[root@r1k1 ~] # hdparm -I /dev/sda
>
>/dev/sda:
>
>ATA device, with non-removable media
>   Model Number:   MB4000GCWDC 
>   Serial Number:  S1Z06RW9
>   Firmware Revision:  HPGD
>   Transport:  Serial, SATA Rev 3.0
>
>Thanks,
>
>Kelly


On 2016-05-25, 1:23 PM, "centos-boun...@centos.org on behalf of 
m.r...@5-cent.us" <centos-boun...@centos.org on behalf of m.r...@5-cent.us> 
wrote:

>Kelly Lesperance wrote:
>> I’ve posted this on the forums at
>> https://www.centos.org/forums/viewtopic.php?f=47&t=57926&p=244614#p244614
>> - posting to the list in the hopes of getting more eyeballs on it.
>>
>> We have a cluster of 23 HP DL380p Gen8 hosts running Kafka. Basic specs:
>>
>> 2x E5-2650
>> 128 GB RAM
>> 12 x 4 TB 7200 RPM SATA drives connected to an HP H220 HBA
>> Dual port 10 GB NIC
>>
>> The drives are configured as one large RAID-10 volume with mdadm,
>> filesystem is XFS. The OS is not installed on the drive - we PXE boot a
>> CentOS image we've built with minimal packages installed, and do the OS
>> configuration via puppet. Originally, the hosts were running CentOS 6.5,
>> with Kafka 0.8.1, without issue. We recently upgraded to CentOS 7.2 and
>> Kafka 0.9, and that's when the trouble started.
>
>One more stupid question: could the configuration of the card for how the
>drives are accessed been accidentally changed?
>
>  mark
>



Re: [CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
They are:

[root@r1k1 ~] # hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
Model Number:   MB4000GCWDC 
Serial Number:  S1Z06RW9
Firmware Revision:  HPGD
Transport:  Serial, SATA Rev 3.0

Thanks,

Kelly

On 2016-05-25, 1:21 PM, "centos-boun...@centos.org on behalf of 
m.r...@5-cent.us" <centos-boun...@centos.org on behalf of m.r...@5-cent.us> 
wrote:

>Kelly Lesperance wrote:
>> I’ve posted this on the forums at
>> https://www.centos.org/forums/viewtopic.php?f=47&t=57926&p=244614#p244614
>> - posting to the list in the hopes of getting more eyeballs on it.
>>
>> We have a cluster of 23 HP DL380p Gen8 hosts running Kafka. Basic specs:
>>
>> 2x E5-2650
>> 128 GB RAM
>> 12 x 4 TB 7200 RPM SATA drives connected to an HP H220 HBA
>> Dual port 10 GB NIC
>>
>> The drives are configured as one large RAID-10 volume with mdadm,
>> filesystem is XFS. The OS is not installed on the drive - we PXE boot a
>> CentOS image we've built with minimal packages installed, and do the OS
>> configuration via puppet. Originally, the hosts were running CentOS 6.5,
>> with Kafka 0.8.1, without issue. We recently upgraded to CentOS 7.2 and
>> Kafka 0.9, and that's when the trouble started.
>
>Really stupid question: are the drives in that the ones that came with the
>unit?
>
>  mark, who, a few years ago, found serious issues with green drives in a
>  server
>



[CentOS] Slow RAID Check/high %iowait during check after upgrade from CentOS 6.5 -> CentOS 7.2

2016-05-25 Thread Kelly Lesperance
        704       7680
sdl              26.00       704.00      7680.00        704       7680
md127            92.00         0.00     46596.00          0      46596

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.24    0.00    2.22   19.89    0.00   63.65

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              33.00      1024.00      7244.00       1024       7244
sdb              33.00      1024.00      7244.00       1024       7244
sdc              31.00      1024.00      7668.00       1024       7668
sdd              31.00      1024.00      7668.00       1024       7668
sdf              31.00      1024.00      7680.00       1024       7680
sdg              26.00       768.00      6672.00        768       6672
sdh              26.00       768.00      6672.00        768       6672
sde              31.00      1024.00      7680.00       1024       7680
sdj              21.00       512.00      6656.00        512       6656
sdi              21.00       512.00      6656.00        512       6656
sdk              27.00       832.00      7168.00        832       7168
sdl              27.00       832.00      7168.00        832       7168
md127            88.00         0.00     43088.00          0      43088

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.02    0.13    1.42   23.90    0.00   66.53

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda              30.00      1024.00      7168.00       1024       7168
sdb              30.00      1024.00      7168.00       1024       7168
sdc              29.00       960.00      7168.00        960       7168
sdd              29.00       960.00      7168.00        960       7168
sdf              23.00       512.00      7668.00        512       7668
sdg              28.00       768.00      7680.00        768       7680
sdh              28.00       768.00      7680.00        768       7680
sde              23.00       512.00      7668.00        512       7668
sdj              30.00      1024.00      6672.00       1024       6672
sdi              30.00      1024.00      6672.00       1024       6672
sdk              30.00      1024.00      7168.00       1024       7168
sdl              30.00      1024.00      7168.00       1024       7168
md127            87.00         0.00     43524.00          0      43524


Details of the array:

[root@r1k1] # cat /proc/mdstat 
Personalities : [raid10] 
md127 : active raid10 sdf[5] sdi[8] sdh[7] sdk[10] sdb[1] sdj[9] sdc[2] sdd[3] 
sdl[11] sde[13] sdg[12] sda[0]
      23441323008 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
      [======>..............]  check = 30.8% (7237496960/23441323008) finish=62944.5min speed=4290K/sec

unused devices: <none>
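
Sanity-checking that estimate: (23441323008 - 7237496960) KB remaining / 4290 KB/sec ≈ 3,777,000 sec ≈ 62,950 minutes, i.e. roughly 44 days for one pass - consistent with the finish= figure above.
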
[root@r1k1] # mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Sep 18 09:57:57 2014
     Raid Level : raid10
     Array Size : 23441323008 (22355.39 GiB 24003.91 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 12
  Total Devices : 12
    Persistence : Superblock is persistent

    Update Time : Tue May 24 15:32:56 2016
          State : active, checking 
 Active Devices : 12
Working Devices : 12
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

   Check Status : 30% complete

           Name : localhost:kafka
           UUID : b6b98e3e:65ee06c3:3599d781:98908041
         Events : 2459193

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync set-A   /dev/sda
       1       8       16        1      active sync set-B   /dev/sdb
       2       8       32        2      active sync set-A   /dev/sdc
       3       8       48        3      active sync set-B   /dev/sdd
      13       8       64        4      active sync set-A   /dev/sde
       5       8       80        5      active sync set-B   /dev/sdf
      12       8       96        6      active sync set-A   /dev/sdg
       7       8      112        7      active sync set-B   /dev/sdh
       8       8      128        8      active sync set-A   /dev/sdi
       9       8      144        9      active sync set-B   /dev/sdj
      10       8      160       10      active sync set-A   /dev/sdk
      11       8      176       11      active sync set-B   /dev/sdl
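
For reference, an array with the layout shown above could be created along these lines - a sketch reconstructed from the --detail output, not a command to run against a live array:

    mdadm --create /dev/md127 --level=10 --raid-devices=12 \
          --layout=n2 --chunk=512 --metadata=1.2 --name=kafka \
          /dev/sd[a-l]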


We've tried changing the I/O scheduler, queue_depth, queue_type, read-ahead, 
etc, but nothing has helped. We've also upgraded all of the firmware, and 
installed HP's mpt2sas driver.
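
For anyone retracing those steps, the knobs in question live in /sys and sysctl; roughly (device names and values are placeholders):

    cat /sys/block/sda/queue/scheduler               # e.g. noop deadline [cfq]
    echo deadline > /sys/block/sda/queue/scheduler   # switch the I/O scheduler
    echo 256 > /sys/block/sda/device/queue_depth     # per-device queue depth
    blockdev --setra 4096 /dev/sda                   # read-ahead, in 512-byte sectors
    sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max   # md check throttles (defaults 1000 / 200000)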

We have 4 other Kafka clusters; however, they're HP DL180 G6 servers. We 
completed the same CentOS 6.5 -> 7.2/Kafka 0.8 -> 0.9 upgrade on those 
clusters, and there has been no impact to their performance.

We've been banging our heads against the wall for a few weeks now, really 
hoping someone from the community can point us in the right direction.

Thanks,

Kelly Lesperance
