I was able to get it working by following these steps:
1. stop all instances
2. service cloudstack-management stop
3. service cloudstack-agent stop
4. virsh shutdown {domain} (for each of the system VMs)
5. service libvirtd stop
6. umount primary and secondary
7. reboot
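
For anyone who wants to repeat this, here is a rough sketch of steps 2 through 7 as a script. It only prints the commands unless DRY_RUN=0 is set. The system VM names and both mount points are placeholders I made up for illustration (only r-4-VM appears in the logs below), so substitute your own. Step 1 still has to be done from the UI or API.

```shell
#!/bin/sh
# Sketch of the recovery sequence above. Dry-run by default: commands are
# printed, not executed, unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Assumed values, not taken from the original post; adjust for your host.
SYSTEM_VMS="${SYSTEM_VMS:-s-1-VM v-2-VM r-4-VM}"      # hypothetical libvirt domain names
PRIMARY_MOUNT="${PRIMARY_MOUNT:-/mnt/primary}"        # hypothetical mount point
SECONDARY_MOUNT="${SECONDARY_MOUNT:-/mnt/secondary}"  # hypothetical mount point

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

recover() {
    echo "# step 1: stop all guest instances first (UI or API), then:"
    run service cloudstack-management stop
    run service cloudstack-agent stop
    # Ask each system VM to shut down before libvirtd goes away.
    for domain in $SYSTEM_VMS; do
        run virsh shutdown "$domain"
    done
    run service libvirtd stop
    run umount "$PRIMARY_MOUNT"
    run umount "$SECONDARY_MOUNT"
    run reboot
}

recover
```

Run it once as-is to review the printed plan, and only then with DRY_RUN=0 as root. Note that "virsh shutdown" only requests a shutdown, so in practice you may want to wait for the domains to stop before continuing.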
The console proxy is working again, but I expect it will break again
in a day or two. I suspect this libvirtd bug is the cause, since I've
seen the "cannot acquire state change lock" error several times:
https://bugs.launchpad.net/nova/+bug/1254872
I might try building my own libvirt 1.0.3 for EL6.
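
To keep an eye on how often the lock error comes back, something like the following could be run periodically. It is only a sketch: the log path is an assumed default location and may differ on your host, and count_lock_errors is just a helper name chosen for this example.

```shell
#!/bin/sh
# Count libvirtd log lines that contain the lock error quoted above.
# LIBVIRTD_LOG is an assumed default location; override it if yours differs.
LIBVIRTD_LOG="${LIBVIRTD_LOG:-/var/log/libvirt/libvirtd.log}"

count_lock_errors() {
    # -c prints the number of matching lines, -F matches a fixed string.
    # grep exits non-zero when nothing matches, so that status is masked.
    grep -cF "cannot acquire state change lock" "$1" 2>/dev/null || true
}

if [ -r "$LIBVIRTD_LOG" ]; then
    echo "lock errors in $LIBVIRTD_LOG: $(count_lock_errors "$LIBVIRTD_LOG")"
else
    echo "cannot read $LIBVIRTD_LOG"
fi
```

A count that keeps growing between reboots would support the theory that the libvirtd bug is involved.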
On Tue, May 20, 2014 at 6:21 PM, Ian Young wrote:
> So I got the console proxy working via HTTPS (by managing my own
> "realhostip.com" DNS) last week and everything was working fine. Today,
> all of a sudden, the console proxy stopped working again. The browser
> says, "Connecting to 192-168-100-159.realhostip.com..." and eventually
> times out. I tried to restart it and it went into a "Stopping" state that
> never completed and the Agent State was "Disconnected." I could not shut
> down the VM using virsh or with "kill -9" because libvirtd kept saying,
> "cannot acquire state change lock," so I gracefully shut down the remaining
> instances and rebooted the entire management server/hypervisor. Start over.
>
> When it came back up, the SSVM and console proxy started but the virtual
> router was stopped. I was able to manually start it from the UI. The
> console proxy still times out when I try to access it from a browser. I
> don't see any errors in the management or agent logs, just this:
>
> 2014-05-20 18:04:27,632 DEBUG [c.c.a.t.Request] (catalina-exec-10:null)
> Seq 1-2130378876: Sending { Cmd , MgmtId: 55157049428734, via: 1(
> virthost1.redacted.com), Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.GetVncPortCommand":{"id":4,"name":"r-4-VM","wait":0}}]
> }
> 2014-05-20 18:04:27,684 DEBUG [c.c.a.t.Request]
> (AgentManager-Handler-3:null) Seq 1-2130378876: Processing: { Ans: ,
> MgmtId: 55157049428734, via: 1, Ver: v1, Flags: 10,
> [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"192.168.100.6","port":5902,"result":true,"wait":0}}]
> }
> 2014-05-20 18:04:27,684 DEBUG [c.c.a.t.Request] (catalina-exec-10:null)
> Seq 1-2130378876: Received: { Ans: , MgmtId: 55157049428734, via: 1, Ver:
> v1, Flags: 10, { GetVncPortAnswer } }
> 2014-05-20 18:04:27,684 DEBUG [c.c.s.ConsoleProxyServlet]
> (catalina-exec-10:null) Port info 192.168.100.6
> 2014-05-20 18:04:27,684 INFO [c.c.s.ConsoleProxyServlet]
> (catalina-exec-10:null) Parse host info returned from executing
> GetVNCPortCommand. host info: 192.168.100.6
> 2014-05-20 18:04:27,686 DEBUG [c.c.s.ConsoleProxyServlet]
> (catalina-exec-10:null) Compose console url:
> https://192-168-100-159.realhostip.com/ajax?token=CsPhU4m_R2ZoLIdXOtjo3y3humnQN20wt5fSPjbZOHtRh7nli7tiq0ZiWUuwCVIn7VwF6503ziEqCRejlRsVcsyQcUfemTRXlhAOpJUyRugyCuTjmbUIX3EY1cHnFMKwF8FXXZr_PgwyXGPEoOHhkdRgsyRiczbk_Unuh4KmRngATr0FPCLtqhwIMpnbLSYwpnFDz65k9lEJmK6IlXYKVpWXg2rpVEsQvaNlulrZdhMQ7qUbacn82EG43OY8nmwm1SYB8TrUFH5Btb1RHpJm9A
> 2014-05-20 18:04:27,686 DEBUG [c.c.s.ConsoleProxyServlet]
> (catalina-exec-10:null) the console url is ::
> <html><title>r-4-VM</title><frameset><frame src="
> https://192-168-100-159.realhostip.com/ajax?token=CsPhU4m_R2ZoLIdXOtjo3y3humnQN20wt5fSPjbZOHtRh7nli7tiq0ZiWUuwCVIn7VwF6503ziEqCRejlRsVcsyQcUfemTRXlhAOpJUyRugyCuTjmbUIX3EY1cHnFMKwF8FXXZr_PgwyXGPEoOHhkdRgsyRiczbk_Unuh4KmRngATr0FPCLtqhwIMpnbLSYwpnFDz65k9lEJmK6IlXYKVpWXg2rpVEsQvaNlulrZdhMQ7qUbacn82EG43OY8nmwm1SYB8TrUFH5Btb1RHpJm9A
> "></frame></frameset></html>
> 2014-05-20 18:04:29,216 DEBUG [c.c.a.m.AgentManagerImpl]
> (AgentManager-Handler-4:null) SeqA 2-545: Processing Seq 2-545: { Cmd ,
> MgmtId: -1, via: 2, Ver: v1, Flags: 11,
> [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":2,"_loadInfo":"{\n
> \"connections\": []\n}","wait":0}}] }
>
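
A note on the console URL in the log above: the first label of the hostname appears to be the console proxy's IP address with dashes in place of dots, so a self-managed "realhostip.com" zone has to resolve 192-168-100-159.realhostip.com to 192.168.100.159. A small sketch for working out the expected address (proxy_host_to_ip is a helper name invented for this example):

```shell
#!/bin/sh
# Turn a console proxy hostname such as 192-168-100-159.realhostip.com
# into the dotted IP address encoded in its first label.
proxy_host_to_ip() {
    # Keep the text before the first dot, then swap dashes for dots.
    printf '%s\n' "$1" | cut -d. -f1 | tr '-' '.'
}

proxy_host="192-168-100-159.realhostip.com"   # hostname taken from the log above
echo "$proxy_host should resolve to $(proxy_host_to_ip "$proxy_host")"
```

Comparing that address with what your resolver actually returns (for example with "dig +short" on the same hostname) is a quick way to rule DNS in or out before blaming the proxy VM itself.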
> If I try to restart the system VMs with cloudstack-sysvmadm, it says:
>
> Stopping and starting 1 secondary storage vm(s)...
> curl: (7) couldn't connect to host
> ERROR: Failed to stop secondary storage vm with id 1
>
> Done stopping and starting secondary storage vm(s)
>
> Stopping and starting 1 console proxy vm(s)...
> curl: (7) couldn't connect to host
> ERROR: Failed to stop console proxy vm with id 2
>
> Done stopping and starting console proxy vm(s) .
>
> Stopping and starting 1 running routing vm(s)...
> curl: (7) couldn't connect to host
> 2
> Done restarting router(s).
>
> I notice there are now four entries for the same management server in the
> mshost table, and they all are in an "Up" state and the "removed" field is
> NULL. What's wrong with this system?
>