Hi Somesh,

Thanks for your reply.

>Instead of rebooting the KVM hosts, you may want to try stopping the agent on all the hosts
>and then starting the agent service one by one.

We have done this. In fact, this is what we do every time we want to reconnect a CloudStack agent (in Alert or Disconnected state) to the management server. We are running Ubuntu 12.04 LTS on the agent hosts as well as on the management server:

    service cloudstack-agent stop
    killall jsvc                   (optional)
    service libvirt-bin restart    (optional)
    service cloudstack-agent start

But it didn't work on that particular occasion, and we are not sure why. We haven't had any further disconnections since that incident, so we don't know whether the problem would recur if a host got disconnected now. It would be very disruptive to have to reboot the hypervisor host (sacrificing all running VMs in the process) every time a host gets disconnected for any reason.

Thank you.

On Wed, Apr 6, 2016 at 8:53 PM, Somesh Naidu <somesh.na...@citrix.com> wrote:

> > Eventually, we could only reconnect the host after we rebooted it,
> > which means sacrificing all the VMs which were still up and running during
> > the disconnection.
>
> Instead of rebooting the KVM hosts, you may want to try stopping the agent on
> all the hosts and then starting the agent service one by one.
>
> > Will adding a new management server resolve the problem?
>
> That really depends on whether your existing management servers are
> optimally tuned and the resources are still getting maxed out. If not,
> adding another server will be more overhead than benefit.
>
> Regards,
> Somesh
>
> -----Original Message-----
> From: Indra Pramana [mailto:in...@sg.or.id]
> Sent: Sunday, April 03, 2016 7:44 AM
> To: users@cloudstack.apache.org
> Subject: Re: URGENT - CloudStack agent not able to connect to management server
>
> Hi Lucian,
>
> Good day to you, and thank you for your reply. Apologies for the delay in
> replying.
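Somesh's suggestion (stop the agent on every host first, and only then start the agents one at a time) can be scripted with the same commands listed above. This is a minimal sketch, not a tested procedure: it assumes key-based root SSH to each hypervisor, and the host names and the 60-second pause between starts are hypothetical. With `DRY_RUN=1` (the default) it only prints what it would run.

```bash
#!/usr/bin/env bash
# Sketch of "stop every agent, then start them one by one".
# ASSUMPTIONS (not from the thread): key-based root SSH to each KVM host,
# the host names passed in, and a fixed pause between agent starts.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = only print what would run; 0 = run it over SSH
PAUSE=${PAUSE:-60}      # hypothetical seconds to wait after each start

# Run a command on one host, or just print it in dry-run mode.
# A failing remote command is reported but does not abort the whole run.
run_on() {
  local host=$1 cmd=$2
  if [[ $DRY_RUN == 1 ]]; then
    echo "[$host] $cmd"
  else
    ssh "root@$host" "$cmd" || echo "[$host] command failed: $cmd" >&2
  fi
}

rolling_restart() {
  local h
  # Phase 1: stop the agent on ALL hosts before starting any of them.
  for h in "$@"; do
    run_on "$h" "service cloudstack-agent stop"
    run_on "$h" "killall jsvc || true"          # optional step from the thread
    run_on "$h" "service libvirt-bin restart"   # optional step from the thread
  done
  # Phase 2: start the agents one at a time, pausing between hosts.
  for h in "$@"; do
    run_on "$h" "service cloudstack-agent start"
    if [[ $DRY_RUN != 1 ]]; then sleep "$PAUSE"; fi
  done
}

# Example (hypothetical host names):
#   DRY_RUN=0 rolling_restart kvm01 kvm02 kvm03
```

Running it first with the default dry run shows the exact order of commands per host before anything is touched.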
>
> Yes, I can confirm that we can access the host and port specified. Based on
> the logs, the host can connect to the management server, but none of the
> follow-up log entries that usually come after it's connected appear. Eventually,
> we could only reconnect the host after we rebooted it, which means
> sacrificing all the VMs which were still up and running during the
> disconnection.
>
> At the time when the first hypervisor was disconnected, the logs show that the
> CloudStack management servers were very busy handling the disconnections,
> trying to fence the hosts and initiate HA for all the affected VMs. Could this
> have put a strain on the management server, causing it to disconnect all the
> remaining hosts? Will adding a new management server resolve the problem?
>
> Any advice is appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
> On Thu, Mar 31, 2016 at 5:28 PM, Nux! <n...@li.nux.ro> wrote:
>
> > Hello,
> >
> > Are you sure you can connect from the hypervisors to
> > cloudstack-management on the host and port specified in
> > agent.properties?
> >
> > --
> > Sent from the Delta quadrant using Borg technology!
> >
> > Nux!
> > www.nux.ro
> >
> > ----- Original Message -----
> > > From: "Indra Pramana" <in...@sg.or.id>
> > > To: users@cloudstack.apache.org
> > > Sent: Thursday, 31 March, 2016 03:14:59
> > > Subject: URGENT - CloudStack agent not able to connect to management server
> > >
> > > Dear all,
> > >
> > > We are using CloudStack 4.2.0, KVM hypervisor and Ceph RBD storage. All our
> > > agents got disconnected from the management server and are unable to connect
> > > again, despite rebooting the management server and stopping and restarting
> > > the cloudstack-agent many times.
> > >
> > > We even tried to physically reboot a hypervisor host (sacrificing all the
> > > running VMs inside) to see if it could reconnect after boot-up, but it is
> > > not able to reconnect (it stays in "Connecting" state).
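Nux's question can be checked directly from each hypervisor with something along these lines: read `host` and `port` from agent.properties and try to open a TCP connection using bash's `/dev/tcp` redirection. This is a sketch under assumptions: the file path is where CloudStack 4.x packages typically install it (adjust if yours differs), and the 8250 fallback only mirrors the port seen in the agent log. A successful TCP connect proves reachability only; the agent can still connect and never finish registering.

```bash
#!/usr/bin/env bash
# Sketch: can this hypervisor open a TCP connection to the management
# server configured in agent.properties?
# ASSUMPTIONS: the file path below (typical for CloudStack 4.x packages)
# and plain key=value lines for `host` and `port`.
set -uo pipefail

PROPS=${PROPS:-/etc/cloudstack/agent/agent.properties}

# Print the value of key $1 from properties file $2 (last match wins).
prop() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1 | tr -d '\r'
}

# Succeed if a TCP connection to host $1, port $2 opens within 5 seconds.
can_connect() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

check_mgmt() {
  local host port
  host=$(prop host "$PROPS")
  port=$(prop port "$PROPS")
  port=${port:-8250}   # fallback mirrors the port seen in the agent log
  if [[ -z $host ]]; then
    echo "no host= entry in $PROPS"
    return 2
  fi
  if can_connect "$host" "$port"; then
    echo "OK: $host:$port is reachable"
  else
    echo "FAIL: cannot open $host:$port"
    return 1
  fi
}

# Example: run on each hypervisor
#   check_mgmt
```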
> > > Here are the excerpts from the logs:
> > >
> > > ====
> > > 2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-11: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> > > [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> > > }
> > > 2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) Received response: Seq 0-11: { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010,
> > > [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> > > }
> > > 2016-03-31 10:08:49,271 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
> > > 2016-03-31 10:08:49,350 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Execution is successful.
> > > 2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-12: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> > > [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> > > }
> > > 2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received response: Seq 0-12: { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010,
> > > [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> > > }
> > > 2016-03-31 10:09:49,272 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
> > > 2016-03-31 10:09:49,345 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Execution is successful.
> > > 2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-13: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11,
> > > [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}]
> > > }
> > > 2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null) Received response: Seq 0-13: { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010,
> > > [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}]
> > > }
> > > ====
> > >
> > > On the existing hypervisor hosts, the agent would normally get stuck at the
> > > following stage, and in the CloudStack GUI we don't see the agent in
> > > "Connecting" state; it is in either "Disconnected" or "Alert" state.
> > >
> > > ====
> > > 2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null) Executing: /bin/bash -c uname -r
> > > 2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null) Execution is successful.
> > > 2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding shutdown hook
> > > 2016-03-31 07:37:09,833 INFO [cloud.agent.Agent] (main:null) Agent [id = 73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers = 5 : host = 10.x.x.x : port = 8250
> > > 2016-03-31 07:37:09,856 INFO [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.x.x.x:8250
> > > 2016-03-31 07:37:10,178 INFO [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> > > 2016-03-31 07:37:10,179 INFO [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.x.x.x:8250
> > > ====
> > >
> > > No other significant or useful entries were found in either the agent or
> > > the management server logs.
> > >
> > > Can anyone give a clue as to what the problem could be? We have been trying
> > > to reconnect for the past couple of hours without success. Any help is
> > > greatly appreciated.
> > >
> > > Looking forward to your reply, thank you.
> > >
> > > Cheers.
> > >
> > > -ip-
> > >
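The two excerpts show different symptoms: on the rebooted host every ping gets a response, while on the existing hosts the agent logs "Connected to" and then nothing more. A small helper can check both things across a full agent log. This is a sketch that assumes the line format of the excerpts in this thread; the log path in the usage comment is an assumption, not something stated in the thread.

```bash
#!/usr/bin/env bash
# Sketch: two quick checks on a CloudStack agent log, assuming the line
# format shown in the excerpts in this thread.
set -uo pipefail
export LC_ALL=C   # keep sort and comm using the same collation

# Print ping sequence numbers with no matching "Received response" line.
unanswered_pings() {
  local log=$1
  comm -23 \
    <(grep -o 'Sending ping: Seq [0-9-]*' "$log" | awk '{print $4}' | sort -u) \
    <(grep -o 'Received response: Seq [0-9-]*' "$log" | awk '{print $4}' | sort -u)
}

# Print how many lines were logged after the last NioClient "Connected to"
# entry; return 1 if there is no such entry.
lines_after_connect() {
  local log=$1 n
  n=$(grep -n 'NioClient.*Connected to' "$log" | tail -n 1 | cut -d: -f1)
  if [[ -z $n ]]; then
    echo "no 'Connected to' entry found"
    return 1
  fi
  tail -n +"$((n + 1))" "$log" | wc -l | tr -d ' '
}

# Example (the log path is an assumption; use wherever your agent logs):
#   unanswered_pings /var/log/cloudstack/agent/agent.log
#   lines_after_connect /var/log/cloudstack/agent/agent.log
```

An empty result from `unanswered_pings` together with a host that is still not Up points away from basic connectivity; a count of 0 from `lines_after_connect` matches the "stuck after connecting" pattern described above.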