Hello Tao,
I had the same problem on my server (see "[one-users] OpenNebula Drain CPU").
Thanks for your solution; it saved me from a very big hardware upgrade!
On localized servers, use:
date -s "$(LC_ALL=C date)"
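For the record, the reason the LC_ALL=C wrapper helps: `date -s` expects an English-format date string, but on a localized server plain `date` prints translated day and month names, which `date -s` may refuse to parse. A quick sketch of the difference (the it_IT locale is illustrative and may not be installed on your box):

```shell
# The C locale always prints English names, so its output is safe
# to feed back into `date -s`:
LC_ALL=C date '+%a %b %d %T %Y'

# A localized locale may print translated names instead (it_IT is an
# illustrative example; fall back gracefully if it is not installed):
LC_ALL=it_IT.UTF-8 date '+%a %b' 2>/dev/null || echo "locale not installed"
```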
Alberto Zuin
On 06/07/2012 00:24, Tao Craig wrote:
Hi Marshall,
I think this could be related to the "leap second bug". Did you notice
any lag prior to this past weekend?
If not, try issuing this command on your cloud controller(s):
date -s "`date`"
Alternatively, you can reboot. I had an issue pretty much identical to yours, and it appears it was related to the "leap second bug". A reboot fixed it for me, but some people have had success with the date command.
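Worth noting that the reset is effectively a no-op for wall-clock time: the string `date` prints parses back to the same instant, so it only re-sets the clock to "now" (which is what clears the kernel's post-leap-second state). You can check the round trip without root, and without touching the clock, by substituting GNU date's -d option for -s — a sketch, not the actual fix:

```shell
# Parse the current date string back into epoch seconds instead of
# setting it; the difference from "now" should be at most one second
# of elapsed runtime. (Uses GNU date's -d option; LC_ALL=C keeps the
# string parseable on localized systems.)
now=$(date +%s)
roundtrip=$(date -d "$(LC_ALL=C date)" +%s)
echo $((roundtrip - now))
```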
----- Original Message -----
*From:* Marshall Grillos <mailto:[email protected]>
*To:* [email protected] <mailto:[email protected]>
*Cc:* Rusty Wolf <mailto:[email protected]>
*Sent:* Thursday, July 05, 2012 11:53 AM
*Subject:* [one-users] OpenNebula Load Average/CPU Usage
My company is running OpenNebula 3.4; it has been in production for just over a month. Recently we started noticing issues in Sunstone: mainly, the VM list would not load, and several other parts of the GUI failed to load as well. I restarted Sunstone and oned, but the problem persists.
One item of note is the high CPU utilization of several of OpenNebula's processes. Here is the top output from our cloud controller (this server also serves the shared data for our VMs via NFS):
top - 13:47:21 up 48 days, 17:07, 1 user, load average: 7.78, 5.09, 2.69
Tasks: 305 total, 2 running, 303 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 52.2%sy, 6.5%ni, 39.1%id, 0.0%wa, 0.0%hi, 2.2%si, 0.0%st
Mem: 32865312k total, 32622008k used, 243304k free, 65196k buffers
Swap: 16383992k total, 0k used, 16383992k free, 30481224k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16664 oneadmin 39 19 53652 15m 1548 S 69.8 0.0 9:37.50 ruby
16718 oneadmin 39 19 39392 4044 1452 S 67.8 0.0 8:49.46 ruby
16697 oneadmin 39 19 39220 3880 1452 S 67.1 0.0 9:04.05 ruby
16687 oneadmin 39 19 39388 4040 1452 S 65.1 0.0 9:20.84 ruby
16677 oneadmin 39 19 39500 4228 1464 S 54.2 0.0 8:13.68 ruby
16708 oneadmin 39 19 43520 6780 1524 S 42.2 0.0 8:09.96 ruby
Here is the process list for oneadmin on the cloud controller:
oneadmin 16629 0.0 0.0 108288 1900 pts/0 S 13:31 0:00 bash
oneadmin 16641 0.0 0.0 1461156 12604 pts/0 Sl 13:31 0:00 /usr/bin/oned -f
oneadmin 16642 0.0 0.0 192200 5688 pts/0 Sl 13:31 0:00 /usr/bin/mm_sched
oneadmin 16664 58.9 0.0 53652 16160 pts/0 SNl 13:31 7:04 ruby /usr/lib/one/mads/one_vmm_exec.rb -t 15 -r 0 xen
oneadmin 16677 51.8 0.0 39500 4228 pts/0 SNl 13:31 6:12 ruby /usr/lib/one/mads/one_im_exec.rb xen
oneadmin 16687 56.4 0.0 39388 4040 pts/0 SNl 13:31 6:46 ruby /usr/lib/one/mads/one_tm.rb -t 15 -d dummy,lvm,shared,qcow2,ssh,vmware,iscsi
oneadmin 16697 53.9 0.0 39220 3880 pts/0 SNl 13:31 6:28 ruby /usr/lib/one/mads/one_hm.rb
oneadmin 16708 53.1 0.0 43520 6780 pts/0 SNl 13:31 6:21 ruby /usr/lib/one/mads/one_datastore.rb -t 15 -d fs,vmware,iscsi
oneadmin 16718 52.5 0.0 39392 4044 pts/0 SNl 13:31 6:17 ruby /usr/lib/one/mads/one_auth_mad.rb --authn ssh,x509,ldap,server_cipher,server_x509
oneadmin 16997 1.0 0.0 110212 1164 pts/0 R+ 13:43 0:00 ps -aux
oneadmin 16998 0.0 0.0 103116 908 pts/0 S+ 13:43 0:00 more
When I stop one, the load average on the server returns to normal
with over 98% idle CPU. I can't seem to find anything bad in the
logs:
Thu Jul 5 13:47:24 2012 [VMM][I]: --Mark--
Thu Jul 5 13:47:50 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:47:50 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:47:50 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:48:19 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:48:19 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:48:19 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:48:48 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:48:48 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:48:48 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:49:17 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:49:17 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:49:17 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:49:46 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:49:46 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:49:46 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:50:15 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:50:15 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:50:15 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:50:44 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:50:44 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:50:44 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:51:13 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:51:13 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:51:13 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 134.
Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 184.
Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 200.
Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 202.
Thu Jul 5 13:51:28 2012 [VMM][I]: Monitoring VM 206.
Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 123.
Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 130.
Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 162.
Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 186.
Thu Jul 5 13:51:32 2012 [VMM][I]: Monitoring VM 208.
Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.31 (6)
Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.32 (7)
Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.33 (9)
Thu Jul 5 13:51:36 2012 [InM][I]: Monitoring host 10.20.52.34 (10)
Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 127.
Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 141.
Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 146.
Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 190.
Thu Jul 5 13:51:36 2012 [VMM][I]: Monitoring VM 201.
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 202 ExitCode: 0
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS 202 NAME=one-202 STATE=a USEDCPU=0.3 USEDMEMORY=4197164 NETTX=5147 NETRX=24499
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 206 ExitCode: 0
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 184 ExitCode: 0
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 134 ExitCode: 0
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS 206 NAME=one-206 STATE=a USEDCPU=0.3 USEDMEMORY=4197164 NETTX=37568 NETRX=640851
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS 184 NAME=one-184 STATE=a USEDCPU=2.2 USEDMEMORY=4197164 NETTX=1220134 NETRX=496270
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS 134 NAME=one-134 STATE=a USEDCPU=0.9 USEDMEMORY=8391468 NETTX=665 NETRX=1451
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: LOG I 200 ExitCode: 0
Thu Jul 5 13:51:38 2012 [VMM][D]: Message received: POLL SUCCESS 200 NAME=one-200 STATE=a USEDCPU=10.3 USEDMEMORY=8392616 NETTX=265144 NETRX=370045
Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0
Thu Jul 5 13:51:39 2012 [InM][D]: Host 6 successfully monitored.
Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0
Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0
Thu Jul 5 13:51:39 2012 [InM][D]: Host 7 successfully monitored.
Thu Jul 5 13:51:39 2012 [InM][D]: Host 9 successfully monitored.
Thu Jul 5 13:51:39 2012 [InM][I]: ExitCode: 0
Thu Jul 5 13:51:39 2012 [InM][D]: Host 10 successfully monitored.
Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 120.
Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 143.
Thu Jul 5 13:51:40 2012 [VMM][I]: Monitoring VM 191.
Thu Jul 5 13:51:42 2012 [ReM][D]: HostPoolInfo method invoked
Thu Jul 5 13:51:42 2012 [ReM][D]: VirtualMachinePoolInfo method invoked
Thu Jul 5 13:51:42 2012 [ReM][D]: AclInfo method invoked
Thu Jul 5 13:51:42 2012 [VMM][D]: Message received: LOG I 208 ExitCode: 0
Thu Jul 5 13:51:42 2012 [VMM][D]: Message received: POLL SUCCESS 208 NAME=one-208 STATE=a USEDCPU=3.7 USEDMEMORY=8391468 NETTX=250167 NETRX=162294
The running VMs are not impacted in any way -- we have resorted to
leaving one stopped until we can resolve the issue. What
can/should I look at to begin debugging this problem?
Thanks,
Marshall
------------------------------------------------------------------------
_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
--
----------------------------
Alberto Zuin
via Mare, 36/A
36030 Lugo di Vicenza (VI)
Italy
P.I. 04310790284
Tel. +39.0499271575
Fax. +39.0492106654
Cell. +39.3286268626
www.azns.it - [email protected]