The console proxy became unavailable again yesterday afternoon. I could SSH into it via its link-local address, and nothing seemed to be wrong inside the VM itself: CPU usage was minimal and the Java process was running and listening on port 443. However, the qemu-kvm process for that VM was pegged at nearly 100% CPU on the hypervisor, so something appears to be wrong down at the KVM/QEMU level. It's odd that this keeps happening to the console proxy and never to any of the other VMs. I tried to reboot it from the management UI, and after about 15 minutes it finally did. The console proxy is working now, but I don't know how long it will last before it breaks again.
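For reference, this is roughly how I confirmed the load was in the qemu-kvm process rather than in the guest (the domain name v-2-VM is an assumption based on the proxy's id of 2; substitute whatever virsh list reports on your host):

virsh list --all                               # find the console proxy's domain name
pgrep -lf 'qemu-kvm.*v-2-VM'                   # PID of its qemu-kvm process
top -p "$(pgrep -f 'qemu-kvm.*v-2-VM')"        # confirm it sits near 100% CPU
strace -c -p "$(pgrep -f 'qemu-kvm.*v-2-VM')"  # sample its syscalls for a few seconds, then Ctrl-C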
I found this in libvirtd.log, which corresponds with the time the console proxy rebooted:

2014-05-22 17:17:04.362+0000: 25195: info : libvirt version: 0.10.2, package: 29.el6_5.7 (CentOS BuildSystem <http://bugs.centos.org>, 2014-04-07-07:42:04, c6b9.bsys.dev.centos.org)
2014-05-22 17:17:04.362+0000: 25195: error : qemuMonitorIO:614 : internal error End of file from monitor
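The "End of file from monitor" error just means libvirtd read EOF on its monitor socket to that domain's qemu-kvm process, i.e. the process exited out from under it (expected here, since this was the reboot). If it shows up again on its own, the per-domain qemu log should say why the process died; this assumes the stock EL6 paths and a proxy domain of v-2-VM:

tail -n 50 /var/log/libvirt/qemu/v-2-VM.log   # qemu's own stderr and shutdown reason
grep v-2-VM /var/log/libvirt/libvirtd.log     # matching libvirtd-side events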
On Wed, May 21, 2014 at 2:07 PM, Ian Young <[email protected]> wrote:

> I built and installed a libvirt 1.0.4 package from the Fedora src rpm.
> It installed fine inside a test VM, but installing it on the real
> hypervisor was a bad idea and I doubt I'll pursue it further. All VMs
> promptly stopped and this appeared in libvirtd.log:
>
> 2014-05-21 20:36:19.260+0000: 23567: info : libvirt version: 1.0.4, package: 1.el6 (Unknown, 2014-05-21-11:36:09, redacted.com)
> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_network.so not accessible
> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_storage.so not accessible
> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_nodedev.so not accessible
> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_secret.so not accessible
> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so not accessible
> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_interface.so not accessible
> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so not accessible
> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 : Module /usr/lib64/libvirt/connection-driver/libvirt_driver_lxc.so not accessible
> 2014-05-21 20:36:49.471+0000: 23570: error : do_open:1220 : no connection driver available for qemu:///system
> 2014-05-21 20:36:49.472+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> 2014-05-21 20:36:49.473+0000: 23571: error : do_open:1220 : no connection driver available for lxc:///
> 2014-05-21 20:36:49.474+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> 2014-05-21 20:36:49.475+0000: 23568: error : do_open:1220 : no connection driver available for qemu:///system
> 2014-05-21 20:36:49.476+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> 2014-05-21 20:36:49.678+0000: 23575: error : do_open:1220 : no connection driver available for qemu:///system
> 2014-05-21 20:36:49.678+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
> 2014-05-21 20:36:49.681+0000: 23572: error : do_open:1220 : no connection driver available for qemu:///system
> 2014-05-21 20:36:49.682+0000: 23567: error : virNetSocketReadWire:1370 : End of file while reading data: Input/output error
>
> On Wed, May 21, 2014 at 10:45 AM, Ian Young <[email protected]> wrote:
>
>> I was able to get it working by following these steps:
>>
>> 1. stop all instances
>> 2. service cloudstack-management stop
>> 3. service cloudstack-agent stop
>> 4. virsh shutdown {domain} (for each of the system VMs)
>> 5. service libvirtd stop
>> 6. umount primary and secondary
>> 7. reboot
>>
>> The console proxy is working again, though I expect it will break
>> again in a day or two. I have a feeling it's a result of this libvirt
>> bug, since I've seen "cannot acquire state change lock" several times:
>>
>> https://bugs.launchpad.net/nova/+bug/1254872
>>
>> I might try building my own libvirt 1.0.3 for EL6.
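The recovery sequence quoted above, as a script for next time (a rough sketch: it assumes every domain virsh still lists should be shut down, and the umount targets are placeholders for the real primary and secondary storage mount points):

#!/bin/sh
# quiesce CloudStack first so it stops trying to manage the VMs
service cloudstack-management stop
service cloudstack-agent stop
# gracefully shut down every domain that is still running, system VMs included
for dom in $(virsh list --name); do
    virsh shutdown "$dom"
done
service libvirtd stop
umount /mnt/primary /mnt/secondary   # placeholder mount points
reboot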
>>
>> On Tue, May 20, 2014 at 6:21 PM, Ian Young <[email protected]> wrote:
>>
>>> So I got the console proxy working via HTTPS last week (by managing
>>> my own "realhostip.com" DNS) and everything was fine until today,
>>> when the console proxy suddenly stopped working again. The browser
>>> says "Connecting to 192-168-100-159.realhostip.com..." and eventually
>>> times out. I tried to restart the proxy and it went into a "Stopping"
>>> state that never completed, and its Agent State was "Disconnected."
>>> I could not shut down the VM using virsh or with "kill -9" because
>>> libvirtd kept saying "cannot acquire state change lock," so I
>>> gracefully shut down the remaining instances and rebooted the entire
>>> management server/hypervisor to start over.
>>>
>>> When it came back up, the SSVM and console proxy started but the
>>> virtual router was stopped. I was able to start it manually from the
>>> UI. The console proxy still times out when I try to access it from a
>>> browser. I don't see any errors in the management or agent logs, just
>>> this:
>>>
>>> 2014-05-20 18:04:27,632 DEBUG [c.c.a.t.Request] (catalina-exec-10:null) Seq 1-2130378876: Sending { Cmd , MgmtId: 55157049428734, via: 1(virthost1.redacted.com), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.GetVncPortCommand":{"id":4,"name":"r-4-VM","wait":0}}] }
>>> 2014-05-20 18:04:27,684 DEBUG [c.c.a.t.Request] (AgentManager-Handler-3:null) Seq 1-2130378876: Processing: { Ans: , MgmtId: 55157049428734, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"192.168.100.6","port":5902,"result":true,"wait":0}}] }
>>> 2014-05-20 18:04:27,684 DEBUG [c.c.a.t.Request] (catalina-exec-10:null) Seq 1-2130378876: Received: { Ans: , MgmtId: 55157049428734, via: 1, Ver: v1, Flags: 10, { GetVncPortAnswer } }
>>> 2014-05-20 18:04:27,684 DEBUG [c.c.s.ConsoleProxyServlet] (catalina-exec-10:null) Port info 192.168.100.6
>>> 2014-05-20 18:04:27,684 INFO [c.c.s.ConsoleProxyServlet] (catalina-exec-10:null) Parse host info returned from executing GetVNCPortCommand. host info: 192.168.100.6
>>> 2014-05-20 18:04:27,686 DEBUG [c.c.s.ConsoleProxyServlet] (catalina-exec-10:null) Compose console url: https://192-168-100-159.realhostip.com/ajax?token=CsPhU4m_R2ZoLIdXOtjo3y3humnQN20wt5fSPjbZOHtRh7nli7tiq0ZiWUuwCVIn7VwF6503ziEqCRejlRsVcsyQcUfemTRXlhAOpJUyRugyCuTjmbUIX3EY1cHnFMKwF8FXXZr_PgwyXGPEoOHhkdRgsyRiczbk_Unuh4KmRngATr0FPCLtqhwIMpnbLSYwpnFDz65k9lEJmK6IlXYKVpWXg2rpVEsQvaNlulrZdhMQ7qUbacn82EG43OY8nmwm1SYB8TrUFH5Btb1RHpJm9A
>>> 2014-05-20 18:04:27,686 DEBUG [c.c.s.ConsoleProxyServlet] (catalina-exec-10:null) the console url is :: <html><title>r-4-VM</title><frameset><frame src="https://192-168-100-159.realhostip.com/ajax?token=CsPhU4m_R2ZoLIdXOtjo3y3humnQN20wt5fSPjbZOHtRh7nli7tiq0ZiWUuwCVIn7VwF6503ziEqCRejlRsVcsyQcUfemTRXlhAOpJUyRugyCuTjmbUIX3EY1cHnFMKwF8FXXZr_PgwyXGPEoOHhkdRgsyRiczbk_Unuh4KmRngATr0FPCLtqhwIMpnbLSYwpnFDz65k9lEJmK6IlXYKVpWXg2rpVEsQvaNlulrZdhMQ7qUbacn82EG43OY8nmwm1SYB8TrUFH5Btb1RHpJm9A"></frame></frameset></html>
>>> 2014-05-20 18:04:29,216 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-4:null) SeqA 2-545: Processing Seq 2-545: { Cmd , MgmtId: -1, via: 2, Ver: v1, Flags: 11, [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":2,"_loadInfo":"{\n  \"connections\": []\n}","wait":0}}] }
>>>
>>> If I try to restart the system VMs with cloudstack-sysvmadm, it says:
>>>
>>> Stopping and starting 1 secondary storage vm(s)...
>>> curl: (7) couldn't connect to host
>>> ERROR: Failed to stop secondary storage vm with id 1
>>>
>>> Done stopping and starting secondary storage vm(s)
>>>
>>> Stopping and starting 1 console proxy vm(s)...
>>> curl: (7) couldn't connect to host
>>> ERROR: Failed to stop console proxy vm with id 2
>>>
>>> Done stopping and starting console proxy vm(s).
>>>
>>> Stopping and starting 1 running routing vm(s)...
>>> curl: (7) couldn't connect to host
>>> 2
>>> Done restarting router(s).
>>>
>>> I notice there are now four entries for the same management server in
>>> the mshost table, all in an "Up" state with a NULL "removed" field.
>>> What's wrong with this system?
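On the duplicate mshost rows mentioned above, this is how I'd inspect them before touching anything ("cloud" is the default database name and user; adjust credentials to match your setup; the state and removed columns are the ones referenced in the quoted message):

mysql -u cloud -p -e 'SELECT id, state, removed FROM cloud.mshost;'

I wouldn't delete or edit any of the rows until it's clear why the management server keeps registering itself again.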

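Also, the "curl: (7) couldn't connect to host" lines from cloudstack-sysvmadm suggest the script can't reach the unauthenticated integration API it drives, which listens on port 8096 by default. Worth verifying that the integration.api.port global setting is enabled and that something is actually listening (localhost assumes you run this on the management server):

netstat -lnt | grep 8096
curl 'http://localhost:8096/client/api?command=listSystemVms'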