[Bug 1023214] Re: bogus utime and stime in /proc/

alecm3 Tue, 10 Jul 2012 17:59:40 -0700

** Description changed:

  I administer a large number of servers, and I have this problem only
  with Ubuntu 10.04 LTS: I run a server under normal load (say load
  average 3.0 on an 8-core server). The "top" command shows processes
  taking certain % of CPU that cause this load average: say
  
- Code:
+ 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  11008 mysql     20   0 25.9g  22g 5496 S   67 76.0 643539:38 mysqld
- Code:
+ 
  ps -o pcpu,pid -p11008
  %CPU   PID
  53.1 11008
- , everything is consistent.
+ 
+ everything is consistent.
  
  The all of the sudden, the process causing the load average disappears
  from "top", but the process continues to run normally (albeit with a
  slight performance decrease), and the system load average becomes
  somewhat higher. The output of ps -o pcpu becomes bogus:
  
- Code:
+ 
  # ps -o pcpu,pid -p11008
  %CPU      PID
  317910278 1587
+ 
+ 
  This happened to at least 5 different severs (different brand new IBM System 
X hardware), each running different software: one httpd 2.2, one mysqld 5.1, 
and one Twisted Python TCP servers. Each time the kernel was between 
2.6.32-32-server and 2.6.32-40-server. I updated some machines to 
2.6.32-41-server, and it has not happened on those yet, but the bug is rare 
(once every 60 days or so).
  
  This is from an affected machine:
  
  Code:
  top - 10:39:06 up 73 days, 17:57,  3 users,  load average: 6.62, 5.60, 5.34
  Tasks: 207 total,   2 running, 205 sleeping,   0 stopped,   0 zombie
  Cpu(s): 11.4%us, 18.0%sy,  0.0%ni, 66.3%id,  4.3%wa,  0.0%hi,  0.0%si,  0.0%st
  Mem:  74341464k total, 71985004k used,  2356460k free,   236456k buffers
  Swap:  3906552k total,      328k used,  3906224k free, 24838212k cached
  
- PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
                                                    
- 805 root      20   0     0    0    0 S    3  0.0   1493:09 fct0-worker        
                                                    
- 982 root      20   0     0    0    0 S    1  0.0 111:35.05 fioa-data-groom    
                                                    
- 914 root      20   0     0    0    0 S    0  0.0 884:42.71 fct1-worker        
                                                    
+ PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
+ 805 root      20   0     0    0    0 S    3  0.0   1493:09 fct0-worker
+ 982 root      20   0     0    0    0 S    1  0.0 111:35.05 fioa-data-groom
+ 914 root      20   0     0    0    0 S    0  0.0 884:42.71 fct1-worker
  1068 root      20   0 19364 1496 1060 R    0  0.0   0:00.02 top
- Nothing causing high load is showing on top, but I have two highly loaded 
mysqld instances on it, that suddenly show crazy %CPU:
  
- Code:
+ Nothing causing high load is showing on top, but I have two highly
+ loaded mysqld instances on it, that suddenly show crazy %CPU:
+ 
+ 
  #ps -o pcpu,pid,cmd -p1587
  %CPU   PID CMD
  317713124 1587 /nail/encap/mysql-5.1.60/libexec/mysqld
  and
  
- Code:
+ 
  #ps -o pcpu,pid,cmd -p1624
  %CPU   PID CMD
  2802  1624 /nail/encap/mysql-5.1.60/libexec/mysqld
+ 
+ 
  Here are the numbers from
  
- Code:
-  # cat /proc/1587/stat
+ # cat /proc/1587/stat
  1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 85773299069 
4611685932654088833 0 0 20 0 52 0 3549 27255418880 5483524 18446744073709551615 
4194304 11111617 140733749236976 140733749235984 8858659 0 552967 4102 26345 
18446744073709551615 0 0 17 5 0 0 0 0 0
  
  The 14th and 15th numbers according to
  
  man proc and fs/proc/array.c are supposed to be:
  
  utime %lu Amount of time that this process has been scheduled in user mode, 
measured in clock ticks (divide by
  sysconf(_SC_CLK_TCK). This includes guest time, guest_time (time spent 
running a virtual CPU, see
  below), so that applications that are not aware of the guest time field do 
not lose that time from
  their calculations.
  
  stime %lu Amount of time that this process has been scheduled in kernel mode, 
measured in clock ticks (divide by
  sysconf(_SC_CLK_TCK).
  
  On a normal server, these numbers 13 orders of magnitude smaller and
  they are advancing by about 10 per sec. On a buggy server, these numbers
  are stuck at a ridiculously high value like 4611685932654088833, and
  they are not changing.
  
+ I noticed that this bug does not happen on 2.6.24-26-server , and there
+ has been many changes in fs/proc/array.c between 2.6.24-26-server and
+ 2.6.32-32-server when this bug starting to occur.
+ 
  ProblemType: Bug
  DistroRelease: Ubuntu 10.04
  Package: linux-server 2.6.32.40.47
  Regression: Yes
  Reproducible: No
  ProcVersionSignature: Ubuntu 2.6.32-40.87-server 2.6.32.57+drm33.23
  Uname: Linux 2.6.32-40-server x86_64
  NonfreeKernelModules: iomemory_vsl
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 
2: ls: cannot access /dev/snd/: No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  Date: Tue Jul 10 17:33:10 2012
  Frequency: Once a month.
  InstallationMedia: Ubuntu-Server 10.04.2 LTS "Lucid Lynx" - Release amd64 
(20110211.1)
  MachineType: IBM System x3550 M3 -[7944AC1]-
  PciMultimedia:
-  
+ 
  ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-40-server 
root=UUID=ef643bc9-46ac-474d-bb00-4e0bfee77808 ro quiet
  ProcEnviron:
-  LANG=en_US.UTF-8
-  SHELL=/bin/bash
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
  SourcePackage: linux
  dmi.bios.date: 02/08/2011
  dmi.bios.vendor: IBM Corp.
  dmi.bios.version: -[D6E150CUS-1.11]-
  dmi.board.asset.tag: (none)
  dmi.board.name: 69Y5698
  dmi.board.vendor: IBM
  dmi.board.version: (none)
  dmi.chassis.asset.tag: none
  dmi.chassis.type: 23
  dmi.chassis.vendor: IBM
  dmi.chassis.version: none
  dmi.modalias: 
dmi:bvnIBMCorp.:bvr-[D6E150CUS-1.11]-:bd02/08/2011:svnIBM:pnSystemx3550M3-[7944AC1]-:pvr00:rvnIBM:rn69Y5698:rvr(none):cvnIBM:ct23:cvrnone:
  dmi.product.name: System x3550 M3 -[7944AC1]-
  dmi.product.version: 00
  dmi.sys.vendor: IBM


-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1023214

Title:
  bogus utime and stime in /proc/<PID/stat

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1023214/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1023214] Re: bogus utime and stime in /proc/

Reply via email to