Re: strange high system cpu usage.
Lee,

Thanks for your help. In testing different kernels we found that using an
unpatched kernel from kernel.org seems to fix the problem. I'm assuming
that a patch added in the gentoo-sources patch set was creating the
problem. Our once 8-minute untar is now down to 7-8 seconds with a
vanilla 2.6.18.6 kernel.

If anyone is interested in our oprofile data or other info, just ask and
I'll post it. Otherwise I'll be reporting this to the Gentoo developers.

-E

> ----- Original Message -----
> From: "Elliott Johnson" <[EMAIL PROTECTED]>
> To: linux-kernel@vger.kernel.org
> Subject: Re: strange high system cpu usage.
> Date: Fri, 30 Mar 2007 11:54:57 +0800
>
> > What problem are you trying to solve? IOW, how do you know it's not
> > just an artifact of different load average calculation between 2.4 and
> > 2.6?
> >
> > Are you actually seeing reduced throughput/performance? Or are you
> > just looking at load average?
> >
> > Lee
>
> Well the problem is apparent: we are having abnormally high cpu usage.
> It's about a 20-40% performance hit.
>
> The load calculations were not between 2.4 and 2.6 kernel versions, but
> between 2.6.8 and 2.6.19. Sorry if this wasn't very clear from my last
> email.
>
> In trying to diagnose the problem I also looked at memory stats (vmstat)
> and found the 'buffered' memory statistic way off from the comparable
> Debian (2.6.8) install (0-300 kB versus 500 MB).
>
> The vmstat man page has little information on this statistic and there
> seem to be varying explanations on the web. I was hoping for a decisive
> explanation (or link) and possibly advice on toggling this value (or
> reasons not to).
>
> I'm still trying to work on this at my end. Some recent tests show that
> it might be related to the megasas driver or the large number of small
> files we are using on an XFS-formatted 10T array. I'll keep at it.
> Thanks for your response,
> -Elliott
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
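For scale, the untar times quoted above imply a very large slowdown on the patched kernel. A quick back-of-the-envelope check (taking the midpoint of the reported 7-8 second range, which is my own simplification):

```python
# Rough speedup implied by the untar times quoted above:
# ~8 minutes with the gentoo-sources kernel vs. 7-8 seconds with
# vanilla 2.6.18.6. The 7.5 s midpoint is an assumption.
patched_s = 8 * 60          # ~480 s on the problematic kernel
vanilla_s = (7 + 8) / 2.0   # ~7.5 s on the vanilla kernel

speedup = patched_s / vanilla_s
print(round(speedup))       # roughly 64x
```

A factor of this size is far beyond normal kernel-to-kernel variance, which is consistent with the conclusion that a specific patch, not general tuning, was at fault.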
Re: strange high system cpu usage.
On 3/29/07, Elliott Johnson <[EMAIL PROTECTED]> wrote:
> > What problem are you trying to solve? IOW, how do you know it's not
> > just an artifact of different load average calculation between 2.4 and
> > 2.6?
> >
> > Are you actually seeing reduced throughput/performance? Or are you
> > just looking at load average?
> >
> > Lee
>
> Well the problem is apparent: we are having abnormally high cpu usage.
> It's about a 20-40% performance hit.

Please post a kernel profile for the problematic workload with the
"good" and "bad" kernels (search the list archive for Andrew Morton's
instructions on doing it with oprofile; email me privately if you can't
find it).

> The vmstat man page has little information on this statistic and there
> seem to be varying explanations on the web. I was hoping for a decisive
> explanation (or link) and possibly advice on toggling this value (or
> reasons not to).

The meaning of these numbers can change drastically from one minor
release to the next, and the docs often lag behind the code. I would not
focus on tweaking VM knobs, but on describing the problem in enough
detail to fix the kernel - it's a bug if the same workload regresses
significantly from one release to another.

Lee
Re: strange high system cpu usage.
> What problem are you trying to solve? IOW, how do you know it's not
> just an artifact of different load average calculation between 2.4 and
> 2.6?
>
> Are you actually seeing reduced throughput/performance? Or are you
> just looking at load average?
>
> Lee

Well the problem is apparent: we are having abnormally high cpu usage.
It's about a 20-40% performance hit.

The load calculations were not between 2.4 and 2.6 kernel versions, but
between 2.6.8 and 2.6.19. Sorry if this wasn't very clear from my last
email.

In trying to diagnose the problem I also looked at memory stats (vmstat)
and found the 'buffered' memory statistic way off from the comparable
Debian (2.6.8) install (0-300 kB versus 500 MB).

The vmstat man page has little information on this statistic and there
seem to be varying explanations on the web. I was hoping for a decisive
explanation (or link) and possibly advice on toggling this value (or
reasons not to).

I'm still trying to work on this at my end. Some recent tests show that
it might be related to the megasas driver or the large number of small
files we are using on an XFS-formatted 10T array. I'll keep at it.

Thanks for your response,

-Elliott
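The 'buffered' statistic being asked about here is taken from the "Buffers:" line of /proc/meminfo, which is where vmstat's "buff" column comes from. A minimal sketch of parsing it, using a sample snippet built from the figures reported in this thread (the parsing helper `meminfo_kb` is my own, not part of any tool mentioned here):

```python
# vmstat's "buff" column is read from the "Buffers:" line of
# /proc/meminfo. Parse a sample snippet; the values come from the
# vmstat output posted in this thread.
sample = """\
MemTotal:      2075276 kB
MemFree:       1081896 kB
Buffers:           320 kB
Cached:         720204 kB
"""

def meminfo_kb(text):
    """Return {field: value-in-kB} for each 'Name: value kB' line."""
    out = {}
    for line in text.splitlines():
        name, rest = line.split(":", 1)
        out[name] = int(rest.split()[0])
    return out

info = meminfo_kb(sample)
print(info["Buffers"], info["Cached"])  # 320 720204
```

Note that Buffers is not directly settable: it is whatever block-device metadata the kernel currently caches, so a tiny Buffers value alongside a large Cached value mainly reflects how a given kernel accounts for cached pages, not a tunable knob.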
Re: strange high system cpu usage.
On 3/29/07, Elliott Johnson <[EMAIL PROTECTED]> wrote:
> Hello, I've been upgrading a few machines here at work and noticed some
> problems with high system cpu usage on one machine. In trying to debug
> the problem I've come across a few confusing stats that I was hoping
> could be cleared up by someone on this list.

What problem are you trying to solve? IOW, how do you know it's not
just an artifact of different load average calculation between 2.4 and
2.6?

Are you actually seeing reduced throughput/performance? Or are you
just looking at load average?

Lee
strange high system cpu usage.
Hello,

I've been upgrading a few machines here at work and noticed some
problems with high system cpu usage on one machine. In trying to debug
the problem I've come across a few confusing stats that I was hoping
could be cleared up by someone on this list.

First, some info about the system. It's a Dell 2850 running a 32-bit
Gentoo install with glibc 2.5 and kernel 2.6.19. A single multi-threaded
process is generating the system cpu usage. One of our developers wrote
the code and it does some intense disk reads and writes, which should
create primarily iowait cpu usage. On a Debian system with the same
hardware, using glibc 2.4 and kernel 2.6.8, the process generates
virtually zero load, compared to the upgraded server's load of 2-4.

Here is some disk usage info:

dc2 linux # sar -d -p
17:40:01    DEV    tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
17:45:01    sda   0.61     10.50      5.11     25.47      0.00   1.07   0.91   0.06
17:45:01    sdb  26.43     41.81    438.41     18.17      0.19   7.10   2.42   6.41
17:45:01  nodev  26.63     41.81    438.41     18.03      0.19   7.07   2.41   6.41
17:50:01    sda   0.10      0.35      0.97     13.66      0.00   0.28   0.28   0.00
17:50:01    sdb  25.45     56.05    523.54     22.77      0.06   2.38   1.88   4.80
17:50:01  nodev  25.67     56.05    523.54     22.58      0.06   2.38   1.87   4.79

Here is some cpu info:

dc2 linux # sar -p
17:40:01  CPU  %user  %nice  %system  %iowait  %steal  %idle
17:45:01  all   0.05   0.00    20.80     0.27    0.00  78.89
17:50:01  all   0.03   0.00    26.46     0.24    0.00  73.26

Here are some memory stats:

dc2 linux # vmstat 3
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd    free  buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 1166088   320 720204    0    0    14    63  131  124  0 23 76  0
 0  0      0 1165832   320 720808    0    0    32   410  400  253  0 30 70  0
 1  0      0 1165028   320 721208    0    0    16   260  469  297  0 22 78  0
 1  0      0 1164228   320 722156    0    0    14   159  474  340  0 27 71  2

dc2 linux # vmstat -s
      2075276  total memory
       993380  used memory
       162412  active memory
       697456  inactive memory
      1081896  free memory
          320  buffer memory
       792036  swap cache
      4152792  total swap
            0  used swap
      4152792  free swap
         1308  non-nice user cpu ticks
            0  nice user cpu ticks
       286057  system cpu ticks
       951507  idle cpu ticks
         3378  IO-wait cpu ticks
          188  IRQ cpu ticks
          362  softirq cpu ticks
            0  stolen cpu ticks
       157581  pages paged in
       780844  pages paged out
            0  pages swapped in
            0  pages swapped out
      1617473  interrupts
      1506805  CPU context switches
   1175215011  boot time
         7709  forks

We rebooted it a few moments ago, but before that vmstat showed that the
buffered memory was 0 kB. This is very different from the other machine,
which has around 500 MB. A low buffer mem count seems common in our
machines with kernels 2.6.17-2.6.19.

Looking at vmstat's man page, it's difficult to understand exactly what
buffered mem is and how to go about altering things to get this value
higher to test with. It seems to be a computed value and not something
settable via /proc. Does anyone know more about buffered memory and how
to adjust it?

-Elliott Johnson
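The vmstat -s tick counters above can be cross-checked against the sar %system figures: system ticks as a share of all accounted ticks should land in the same 20-26% band. A small sketch of that arithmetic, using the numbers from the output above:

```python
# Cross-check the vmstat -s tick counters against sar's %system:
# system ticks divided by the sum of all accounted tick categories.
ticks = {
    "user": 1308, "nice": 0, "system": 286057, "idle": 951507,
    "iowait": 3378, "irq": 188, "softirq": 362, "steal": 0,
}

total = sum(ticks.values())
system_pct = 100.0 * ticks["system"] / total
print(round(system_pct, 1))  # ~23% system time, matching sar's 20-26%
```

The consistency between the two tools rules out a reporting glitch in either one: the machine really is spending roughly a quarter of its time in the kernel while almost no iowait accumulates, which is the opposite of what an I/O-bound workload should show.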