Re: [CentOS] waiting IOs...
John Doe wrote:
> Hi, We have a storage server (HP DL360G5 + MSA20 (12 disks in RAID 6) on a
> SmartArray6400). 10 directories are exported through nfs to 10 clients
> (rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3). The server is
> apparently not doing much but... we have very high waiting IOs.

Probably not connected, but personally I would use 'hard' and 'proto=tcp'
instead of 'soft' and 'proto=udp' on the clients.

James Pearson
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
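James's suggestion would look something like this in a client's /etc/fstab. This is a sketch only: the server name, export path, and mount point are placeholders, and the remaining options are carried over from the original mount string.

```shell
# Hypothetical /etc/fstab entry for one client, switching 'soft' -> 'hard'
# and 'proto=udp' -> 'proto=tcp' as suggested. 'hard' makes the client
# retry indefinitely instead of returning I/O errors on timeouts, and TCP
# handles retransmission and congestion far better than UDP.
server:/export/dir1  /mnt/dir1  nfs  rsize=32768,wsize=32768,hard,intr,nosuid,proto=tcp,vers=3  0 0
```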
Re: [CentOS] waiting IOs...
James Pearson wrote:
> John Doe wrote:
>> Hi, We have a storage server (HP DL360G5 + MSA20 (12 disks in RAID 6) on a
>> SmartArray6400). 10 directories are exported through nfs to 10 clients
>> (rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3). The server is
>> apparently not doing much but... we have very high waiting IOs.
> Probably not connected, but personally I would use 'hard' and 'proto=tcp'
> instead of 'soft' and 'proto=udp' on the clients.

I'd usually blame disk seek time first. If your raid level requires several
drives to move their heads together and/or the data layout lands on the same
drive set, consider what happens when your 10 users all want the disk head(s)
to be in different places. Disk drives allow random access, but they really
aren't that good at it if they have to spend most of their time seeking.

Raid6 is particularly bad at write performance, so a different raid level
might help - or, if you know the data access pattern, you might split the
drives into different volumes that don't affect each other and arrange the
data accordingly. And mounting with the async option might give much better
performance - I'm not sure what the default is these days.

--
Les Mikesell
lesmikes...@gmail.com
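Les's async idea can apply on either side of the wire; on the server it would be an /etc/exports option. A sketch only, with a hypothetical export path and client name:

```shell
# Hypothetical /etc/exports line. 'async' lets the server acknowledge
# writes before they reach stable storage - faster for small-file write
# workloads, but data can be lost if the server crashes. 'sync' is the
# safe default in modern nfs-utils.
/export/dir1  client1.example.com(rw,async,no_subtree_check)
```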
Re: [CentOS] waiting IOs...
# iostat -m -x 10
Linux 2.6.18-8.1.6.el5 (data1.iol)  09/10/2009
. . .
avg-cpu:  %user %nice %system %iowait %steal  %idle
           0.20  0.00    0.31    8.79   0.00  90.70

Device:     rrqm/s wrqm/s   r/s   w/s rMB/s wMB/s avgrq-sz avgqu-sz  await svctm %util
cciss/c0d0    0.00   0.20  0.92  0.51  0.00  0.00    10.29     0.04  25.29 25.93  3.71
cciss/c0d1    0.00   0.20  3.68  2.45  0.02  0.01    11.07     0.06   9.87  7.27  4.45
cciss/c0d2    0.00   0.20  0.41  2.76  0.00  0.01     8.52     0.03   9.97  2.81  0.89
cciss/c0d3    1.23   0.51  3.98  1.53  0.03  0.01    14.52     0.05   9.69  8.07  4.45
cciss/c0d4    0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00   0.00  0.00  0.00
cciss/c0d5    0.00   0.00  1.02  0.10  0.00  0.00     8.00     0.01   9.36  9.36  1.05
cciss/c0d6    2.45   0.20  0.92  0.51  0.06  0.00    87.43     0.01   9.64  7.21  1.03
cciss/c0d7    0.00   0.00  0.31  0.10  0.00  0.00     8.00     0.01  14.25 14.25  0.58
cciss/c0d8    0.00   0.00  0.10  1.84  0.00  0.01     8.00     0.01   7.26  1.42  0.28
cciss/c0d9    0.00   0.10  0.41  3.78  0.00  0.02     9.56     0.05  12.24  1.59  0.66
cciss/c1d0    0.00   6.03  0.00  1.74  0.00  0.02    26.94     0.04  25.35 12.59  2.19

avg-cpu:  %user %nice %system %iowait %steal  %idle
           0.05  0.00    0.36   25.77   0.00  73.82

Device:     rrqm/s wrqm/s   r/s   w/s rMB/s wMB/s avgrq-sz avgqu-sz  await svctm %util
cciss/c0d0    0.00   0.82  4.60  0.20  0.03  0.00    13.11     0.06  13.55 13.00  6.25
cciss/c0d1    1.23   1.43  1.94  0.20  0.02  0.01    29.33     0.02  11.48 11.48  2.46
cciss/c0d2    0.00   0.00  0.82  0.00  0.00  0.00     8.00     0.01  14.00 14.00  1.15
cciss/c0d3    2.45   1.43 11.25  0.20  0.07  0.01    14.43     0.12  10.62 10.52 12.04
cciss/c0d4    0.00   1.64  7.98  0.20  0.03  0.01    10.00     0.08   9.24  9.10  7.44
cciss/c0d5    5.93   0.20 22.09  0.20  2.19  0.00   201.03     0.58  26.06  1.88  4.19
cciss/c0d6    2.45   1.12  5.62  0.41  0.08  0.01    29.15     0.06  10.66  9.66  5.83
cciss/c0d7    0.00   1.23  5.42  0.20  0.02  0.01    10.76     0.05   8.87  8.73  4.91
cciss/c0d8    0.00   0.72  3.17  0.20  0.02  0.00    13.09     0.04  12.33 12.21  4.12
cciss/c0d9    0.92   0.82  3.17  0.20  0.03  0.00    18.67     0.04  13.12 12.67  4.27
cciss/c1d0    0.00   2.66  0.41  3.07  0.00  0.01     8.29     0.28  81.91 10.94  3.80

avg-cpu:  %user %nice %system %iowait %steal  %idle
           0.10  0.00    0.20   15.95   0.00  83.74

Device:     rrqm/s wrqm/s   r/s   w/s rMB/s wMB/s avgrq-sz avgqu-sz  await svctm %util
cciss/c0d0    0.00   4.19 14.72  0.51  0.10  0.02    15.52     0.17  10.98  9.05 13.79
cciss/c0d1    0.00   0.20  0.31  0.20  0.00  0.00    11.20     0.01  10.20 10.20  0.52
cciss/c0d2    0.00   0.31  0.41  0.41  0.00  0.00    13.00     0.01   7.88  7.88  0.64
cciss/c0d3    0.00   0.31  4.50  0.20  0.02  0.00     9.74     0.05  10.61 10.37  4.88
cciss/c0d4    1.23   0.31  2.76  0.41  0.28  0.00   182.71     0.06  19.97  4.00  1.27
cciss/c0d5    0.00   0.82  3.17  0.20  0.02  0.00    11.64     0.04  11.30 11.30  3.81
cciss/c0d6    2.45   0.51  3.68  0.41  0.28  0.00   143.60     0.07  16.27  5.30  2.17
cciss/c0d7    1.23   0.10  1.94  0.20  0.01  0.00    15.24     0.03  13.33 13.33  2.86
cciss/c0d8    0.00   0.31  0.51  0.41  0.00  0.00    11.56     0.01   8.00  8.00  0.74
cciss/c0d9    0.00   0.10  0.61  0.20  0.00  0.00    10.00     0.01  11.88 11.75  0.96
cciss/c1d0    0.00   3.07  0.00  1.33  0.00  0.01    19.08     0.02  16.77 13.31  1.77

I tried nmon but I did not see anything out of the ordinary... But when I
try to see the NFS stats, nmon (11f-1.el5.rf) coredumps. I tried the iostat
nfs option (kernel 2.6.18-8.1.6.el5, which should support it) but it did not
show anything. nfsstat shows no apparent activity... nfs is normally only
used to put files or modify small files (1K) from time to time, while http
is used to get files.

From: Les Mikesell lesmikes...@gmail.com
> I'd usually blame disk seek time first. If your raid level requires several
> drives to move their heads together and/or the data layout lands on the same
> drive set, consider what happens when your 10 users all want the disk head(s)
> to be in different places. Disk drives allow random access but they really
> aren't that good at it if they have to spend most of their time seeking.

That could be it... It's always a
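A quick way to pick suspect spindles out of output like the above is to filter on the await column. A minimal sketch (the function name, threshold, and filename are arbitrary); it assumes the standard `iostat -m -x` column order, where await is the third field from the end:

```shell
# Print devices whose average request wait time (await, in ms) exceeds a
# threshold, from a saved 'iostat -m -x' capture. Assumes device lines
# start with 'cciss' and the trailing columns are: await svctm %util.
iostat_slow() {
  awk -v limit="${2:-20}" \
    '/^cciss/ { if ($(NF-2) + 0 > limit + 0) print $1, $(NF-2) }' "$1"
}
```

Run against a saved capture, e.g. `iostat_slow iostat.log 20`, and compare which devices stand out across sampling intervals.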
Re: [CentOS] waiting IOs...
John Doe wrote:
> # iostat -m -x 10
> Linux 2.6.18-8.1.6.el5 (data1.iol)  09/10/2009

Are you also following the 'Excessive NFS operations' thread? There's some
interesting information there about buggy kernels and apps.

--
Les Mikesell
lesmikes...@gmail.com
[CentOS] waiting IOs...
Hi,

We have a storage server (HP DL360G5 + MSA20 (12 disks in RAID 6) on a
SmartArray6400). 10 directories are exported through nfs to 10 clients
(rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3). The server is
apparently not doing much but... we have very high waiting IOs. dstat shows
very little activity, but high 'wai'...

# dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  0   0  88  12   0   0| 413k   98k|   0     0 |   0     0 | 188   132
  0   1  46  53   0   0| 716k   48k|  19k  420k|   0     0 |1345   476
  0   1  49  50   0   1| 492k   32k|  12k  181k|   0     0 |1269   482
  0   1  63  37   0   0| 316k  159k|  58k  278k|   0     0 |1789  1562
  0   0  74  26   0   0|  84k  512k|1937B 6680B|   0     0 |1200   106
  0   1  44  55   0   1| 612k   80k|  14k  221k|   0     0 |1378   538
  1   1  52  47   0   0| 628k    0 |  17k  318k|   0     0 |1327   520
  0   1  50  49   0   0| 484k   60k|  14k  178k|   0     0 |1303   494
  0   0  87  13   0   0| 124k    0 |7745B  116k|   0     0 |1083   139
  0   1  59  41   0   0| 316k   60k|4828B   67k|   0     0 |1179   346

top shows that one nfsd is usually in state 'D' (waiting).
# top -i   (sorted by cpu usage)
top - 18:11:28 up 207 days, 7:13, 2 users, load average: 0.99, 1.07, 1.00
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 54.3%id, 45.3%wa, 0.2%hi, 0.0%si, 0.0%st
Mem:  3089252k total, 3068112k used, 21140k free, 928468k buffers
Swap: 2008116k total, 164k used, 2007952k free, 293716k cached

  PID USER  PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
16571 root  15  0 12708 1076 788 R    1  0.0 0:00.02 top
 2580 root  15  0     0    0   0 D    0  0.0 2:36.70 nfsd

# cat /proc/net/rpc/nfsd
rc 8872 34768207 38630969
fh 142 0 0 0 0
io 2432226534 884662242
th 32 394 4851.311 2437.416 370.949 238.432 542.241 4.942 2.239 1.000 0.427 0.541
ra 64 3876274 5025 3724 2551 2030 2036 1506 1607 1219 1154 1136249
net 73410453 73261524 0 0
rpc 73408119 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 33 9503937 1315066 11670859 7139862 0 5033349 28129122 3729031 0 0 0 487614 0 1116215 0 0 2054329 21225 66 0 2351744
proc4 2 0 0
proc4ops 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Do you think nfs is the problem here? If so, is there something wrong with
our config? Is it too much to have 10 dirs x 10 clients, even if there is
almost no traffic?

Thx,
JD
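One line worth pulling out of that /proc/net/rpc/nfsd dump is 'th': the first number is the nfsd thread count and the second is how many times all threads were in use at once. A small sketch to extract those (the function name is made up; the field meanings are the standard ones for 2.6.18-era kernels):

```shell
# Report the nfsd thread count and how many times every thread was busy
# simultaneously. A persistently climbing second number suggests raising
# the thread count (RPCNFSDCOUNT in /etc/sysconfig/nfs on CentOS).
nfsd_thread_pressure() {
  awk '/^th / { print "threads:", $2, "all-busy:", $3 }' "${1:-/proc/net/rpc/nfsd}"
}
```

Here the th line reads `th 32 394 ...`, i.e. 32 threads with 394 all-busy events since boot, which is low for 207 days of uptime.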
Re: [CentOS] waiting IOs...
John Doe wrote:
> Hi, We have a storage server (HP DL360G5 + MSA20 (12 disks in RAID 6) on a
> SmartArray6400). 10 directories are exported through nfs to 10 clients
> (rsize=32768,wsize=32768,soft,intr,nosuid,proto=udp,vers=3). The server is
> apparently not doing much but... we have very high waiting IOs.

How about running iostat -x? Sounds like the system is doing a lot more
than you think it is...

nate
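When following nate's suggestion, it helps to sort a saved `iostat -x` capture by utilization rather than eyeballing the raw table. A sketch under two assumptions: device lines start with 'cciss', and %util is the final column:

```shell
# List the busiest devices from a saved 'iostat -x' capture, highest
# %util first. Assumes 'cciss'-prefixed device lines with %util as the
# last field; both are true for the MSA20 output in this thread.
busiest_devices() {
  awk '/^cciss/ { print $NF, $1 }' "$1" | sort -rn | head -n "${2:-3}"
}
```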
Re: [CentOS] waiting IOs...
> How about running iostat -x? Sounds like the system is doing a lot more
> than you think it is...

You might want to set yourself up with a performance monitoring system like
Munin to give you more extensive data, as well. If you get that far, you'll
find the iostat plugin to be a bit lacking - I've written a more useful one
that I'd be happy to share.

--
"Don't eat anything you've ever seen advertised on TV"
 - Michael Pollan, author of In Defense of Food