Re: strange high system cpu usage.

2007-03-30 Thread Elliott Johnson
Lee,

Thanks for your help.  In testing different kernels we found that using an 
unpatched kernel from kernel.org seems to fix the problem.  I'm assuming that a 
patch in the gentoo-sources patch set was causing it.  Our once 8-minute 
untar is now down to 7-8 seconds with a vanilla 2.6.18.6 kernel.

If anyone is interested in our oprofile code or other info, just ask and I'll 
post it.  Otherwise I'll be reporting this to the Gentoo developers.
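
For anyone repeating this kind of kernel-to-kernel comparison, a minimal Python sketch of timing an untar follows. It is not the test harness used in this thread: the names `time_untar` and `slowdown` are made up for illustration, and the archive and destination paths are assumed to be supplied by the caller. It records wall-clock time together with the user and system CPU time consumed by the extracting process, since the symptom here was high system time.

```python
import os
import tarfile
import time


def time_untar(archive, dest):
    """Extract `archive` into `dest`; return wall, user and system seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    return {
        "wall": wall,
        "user": cpu_after.user - cpu_before.user,
        "system": cpu_after.system - cpu_before.system,
    }


def slowdown(bad_seconds, good_seconds):
    """How many times slower the bad kernel is than the good one."""
    return bad_seconds / good_seconds
```

With the rough figures above (about 8 minutes versus 7-8 seconds), `slowdown(480, 7.5)` comes out at 64.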

-E

> - Original Message -
> From: "Elliott Johnson" <[EMAIL PROTECTED]>
> To: linux-kernel@vger.kernel.org
> Subject: Re: strange high system cpu usage.
> Date: Fri, 30 Mar 2007 11:54:57 +0800
> 
> 
> > What problem are you trying to solve?  IOW, how do you know it's not
> > just an artifact of different load average calculation between 2.4 and
> > 2.6?
> >
> > Are you actually seeing reduced throughput/performance?  Or are you
> > just looking at load average?
> >
> > Lee
>
> Well the problem is apparent, we are having abnormally high cpu usage.
> It's about a 20-40% performance hit.
>
> The load calculations were not between 2.4 and 2.6 kernel versions, but
> between 2.6.8 and 2.6.19.  Sorry if this wasn't very clear from my last
> email.
>
> In trying to diagnose the problem I also looked at memory stats (vmstat)
> and found the 'buffered' memory statistic way off from the comparable
> debian (2.6.8) install (0-300 kB versus 500 MB).
>
> The vmstat man page has little information on this statistic and there
> seem to be varying explanations on the web.  I was hoping for a decisive
> explanation (or link) and possibly advice on toggling this value (or
> reasons not to).
>
> I'm still trying to work on this at my end.  Some recent tests show that
> it might be related to the megasas driver or the large number of small
> files we are using on an xfs formatted 10T array.  I'll keep at it.
> 
> Thanks for your response,
> 
> -Elliott
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


Re: strange high system cpu usage.

2007-03-29 Thread Lee Revell

On 3/29/07, Elliott Johnson <[EMAIL PROTECTED]> wrote:

> > What problem are you trying to solve?  IOW, how do you know it's not
> > just an artifact of different load average calculation between 2.4 and
> > 2.6?
> >
> > Are you actually seeing reduced throughput/performance?  Or are you
> > just looking at load average?
> >
> > Lee
>
> Well the problem is apparent, we are having abnormally high cpu usage.
> It's about a 20-40% performance hit.

Please post a kernel profile for the problematic workload with the
"good" and "bad" kernels (search the list archive for Andrew Morton's
instructions on doing it with oprofile, email me privately if you
can't find it).


> The vmstat man page has little information on this statistic and there
> seem to be varying explanations on the web.  I was hoping for a decisive
> explanation (or link) and possibly advice on toggling this value (or
> reasons not to).


The meaning of these numbers can change drastically from one minor
release to the next, and the docs often lag behind the code.

I would not focus on tweaking VM knobs, but on describing the problem
in enough detail to fix the kernel - it's a bug if the same workload
regresses significantly from one release to another.

Lee


Re: strange high system cpu usage.

2007-03-29 Thread Elliott Johnson
> What problem are you trying to solve?  IOW, how do you know it's not
> just an artifact of different load average calculation between 2.4 and
> 2.6?
>
> Are you actually seeing reduced throughput/performance?  Or are you
> just looking at load average?
>
> Lee

Well the problem is apparent, we are having abnormally high cpu usage.  It's 
about a 20-40% performance hit.

The load calculations were not between 2.4 and 2.6 kernel versions, but between 
2.6.8 and 2.6.19.  Sorry if this wasn't very clear from my last email.

In trying to diagnose the problem I also looked at memory stats (vmstat) and 
found the 'buffered' memory statistic way off from the comparable debian (2.6.8) 
install (0-300 kB versus 500 MB).

The vmstat man page has little information on this statistic and there seem to 
be varying explanations on the web.  I was hoping for a decisive explanation (or 
link) and possibly advice on toggling this value (or reasons not to).

I'm still trying to work on this at my end.  Some recent tests show that it 
might be related to the megasas driver or the large number of small files we are 
using on an xfs formatted 10T array.  I'll keep at it.

Thanks for your response,

-Elliott


Re: strange high system cpu usage.

2007-03-29 Thread Lee Revell

On 3/29/07, Elliott Johnson <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I've been upgrading a few machines here at work and noticed some problems with 
> high system cpu usage on one machine.  In trying to debug the problem I've come 
> across a few confusing stats that I was hoping could be cleared up by someone 
> on this list.

What problem are you trying to solve?  IOW, how do you know it's not
just an artifact of different load average calculation between 2.4 and
2.6?

Are you actually seeing reduced throughput/performance?  Or are you
just looking at load average?

Lee


strange high system cpu usage.

2007-03-29 Thread Elliott Johnson
Hello,

I've been upgrading a few machines here at work and noticed some problems with 
high system cpu usage on one machine.  In trying to debug the problem I've come 
across a few confusing stats that I was hoping could be cleared up by someone 
on this list.

Firstly some info about the system.  It's a Dell 2850 running a 32-bit Gentoo 
install with glibc 2.5 and kernel 2.6.19.  A single multi-threaded process is 
generating the system cpu usage.  One of our developers wrote the code and it 
does some intense disk reads and writes, which should create primarily iowait 
cpu usage.  On a Debian system with the same hardware using glibc-2.4 and 
kernel 2.6.8 the process generates virtually 0 load compared to the upgraded 
server's load of 2-4.

Here is some disk usage info:

 dc2 linux # sar -d -p
 17:40:01    DEV     tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
 17:45:01    sda    0.61     10.50      5.11     25.47      0.00   1.07   0.91   0.06
 17:45:01    sdb   26.43     41.81    438.41     18.17      0.19   7.10   2.42   6.41
 17:45:01  nodev   26.63     41.81    438.41     18.03      0.19   7.07   2.41   6.41
 17:50:01    sda    0.10      0.35      0.97     13.66      0.00   0.28   0.28   0.00
 17:50:01    sdb   25.45     56.05    523.54     22.77      0.06   2.38   1.88   4.80
 17:50:01  nodev   25.67     56.05    523.54     22.58      0.06   2.38   1.87   4.79
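
A quick sanity check on the sar figures can be sketched in Python as below. It assumes, per the sar man page, that sectors are 512 bytes and that avgrq-sz is the average request size in sectors; the function names are made up for illustration. The sample numbers are the sdb row at 17:45:01.

```python
SECTOR_BYTES = 512  # sar counts 512-byte sectors


def avg_request_sectors(tps, rd_sec, wr_sec):
    """Average request size in sectors: total sectors per second over requests per second."""
    return (rd_sec + wr_sec) / tps if tps else 0.0


def throughput_kib(rd_sec, wr_sec):
    """Combined read and write throughput in KiB/s."""
    return (rd_sec + wr_sec) * SECTOR_BYTES / 1024


# sdb at 17:45:01: tps=26.43, rd_sec/s=41.81, wr_sec/s=438.41
print(round(avg_request_sectors(26.43, 41.81, 438.41), 2))  # agrees with avgrq-sz (18.17)
print(round(throughput_kib(41.81, 438.41), 2))              # roughly 240 KiB/s
```

At roughly 240 KiB/s and about 6% utilisation the disk is close to idle, which fits the low %iowait reported by sar.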

Here is some cpu info:

 dc2 linux # sar -p
 17:40:01    CPU   %user   %nice  %system  %iowait  %steal   %idle
 17:45:01    all    0.05    0.00    20.80     0.27    0.00   78.89
 17:50:01    all    0.03    0.00    26.46     0.24    0.00   73.26

Here are some memory stats:

 dc2 linux # vmstat 3
 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
  r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  2  0      0 1166088    320 720204    0    0    14    63  131  124  0 23 76  0
  0  0      0 1165832    320 720808    0    0    32   410  400  253  0 30 70  0
  1  0      0 1165028    320 721208    0    0    16   260  469  297  0 22 78  0
  1  0      0 1164228    320 722156    0    0   141    59  474  340  0 27 71  2

dc2 linux # vmstat -s
      2075276  total memory
       993380  used memory
       162412  active memory
       697456  inactive memory
      1081896  free memory
          320  buffer memory
       792036  swap cache
      4152792  total swap
            0  used swap
      4152792  free swap
         1308 non-nice user cpu ticks
            0 nice user cpu ticks
       286057 system cpu ticks
       951507 idle cpu ticks
         3378 IO-wait cpu ticks
          188 IRQ cpu ticks
          362 softirq cpu ticks
            0 stolen cpu ticks
       157581 pages paged in
       780844 pages paged out
            0 pages swapped in
            0 pages swapped out
      1617473 interrupts
      1506805 CPU context switches
   1175215011 boot time
         7709 forks
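
The tick counters from `vmstat -s` can be turned into since-boot percentages with a few lines of Python. This is only a sketch: the dictionary keys and the `cpu_shares` name are invented here, and the values are copied from the output above.

```python
# CPU tick counters copied from the `vmstat -s` output above.
TICKS = {
    "user": 1308,
    "nice": 0,
    "system": 286057,
    "idle": 951507,
    "iowait": 3378,
    "irq": 188,
    "softirq": 362,
    "steal": 0,
}


def cpu_shares(ticks):
    """Return each counter as a percentage of all ticks since boot."""
    total = sum(ticks.values())
    return {name: 100.0 * value / total for name, value in ticks.items()}


shares = cpu_shares(TICKS)
print(f"system {shares['system']:.1f}%  iowait {shares['iowait']:.2f}%")
```

This gives about 23% system and 0.27% iowait since boot, in line with the sar samples: nearly all of the non-idle time is spent in the kernel rather than waiting on the disk.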


We rebooted it a few moments ago, but before that vmstat showed that the 
buffered memory was 0 kB.  This is very different from the other machine, which 
has around 500 MB.  A low buffer mem count seems common in our machines with 
kernel 2.6.17-2.6.19.  Looking at vmstat's man page it's difficult to understand 
exactly what buffered mem is and how to go about altering things to get this 
value higher to test with.  It seems to be a computed value and not something 
settable via /proc.  Does anyone know more about buffered memory and how to 
adjust it?
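
As far as I know, vmstat takes its buff figure from the `Buffers` line of /proc/meminfo, which is a counter maintained by the kernel and not a tunable. A minimal Python sketch for reading it directly follows. The function names are made up, and the sample text simply reuses the numbers from the `vmstat -s` output above; mapping vmstat's "swap cache" line to `Cached` is an assumption.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


# Sample built from the vmstat -s numbers above; the Cached mapping is assumed.
SAMPLE = """\
MemTotal:      2075276 kB
MemFree:       1081896 kB
Buffers:           320 kB
Cached:         792036 kB
"""

info = parse_meminfo(SAMPLE)
print(info["Buffers"], "kB buffers,", info["Cached"], "kB cached")
```

Logging `read_meminfo()["Buffers"]` on both machines while the workload runs would show whether the difference persists under load or only reflects what each box had cached at the time.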

-Elliott Johnson


