Hi Somesh,

Thanks for your reply.

>Instead of rebooting the KVM hosts, you may want to try stopping the agent on
>all the hosts and then starting the agent service one by one.

We have done this; in fact, this is what we do every time we need to
reconnect a CloudStack agent (in Alert or Disconnected state) to the
management server. We are running Ubuntu 12.04 LTS on the agent hosts as
well as on the management server.

service cloudstack-agent stop
(optional:) killall jsvc
(optional:) service libvirt-bin restart
service cloudstack-agent start

But it didn't work on that particular occasion, and I'm not sure why. We
haven't had any further disconnection issues since that incident, so I
don't know whether the problem would still occur if a host got
disconnected now. It would be very disruptive to have to reboot the
hypervisor host (and sacrifice all running VMs in the process) every time
a host gets disconnected for any reason.
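
For reference, the sequence above can be written as a small script. This is
only a sketch: the DRY_RUN, KILL_JSVC and RESTART_LIBVIRT variables and the
function names are made up for illustration and are not CloudStack options.
By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the agent restart sequence listed above.
# DRY_RUN, KILL_JSVC and RESTART_LIBVIRT are hypothetical switches
# introduced for this example only.

run() {
    # In dry-run mode (the default) print the command, otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_agent() {
    run service cloudstack-agent stop
    # Optional step: kill any jsvc processes still running.
    if [ "${KILL_JSVC:-0}" = "1" ]; then
        run killall jsvc
    fi
    # Optional step: restart libvirt (service name as used on our Ubuntu hosts).
    if [ "${RESTART_LIBVIRT:-0}" = "1" ]; then
        run service libvirt-bin restart
    fi
    run service cloudstack-agent start
}
```

Calling restart_agent with DRY_RUN=0 (as root) would run the commands for
real; setting KILL_JSVC=1 and RESTART_LIBVIRT=1 enables the two optional
steps.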

Thank you.


On Wed, Apr 6, 2016 at 8:53 PM, Somesh Naidu <somesh.na...@citrix.com>
wrote:

> > Eventually, we could only connect back the host after we rebooted it,
> > which means sacrificing all the VMs which were still up and running during
> > the disconnection.
>
> Instead of rebooting the KVM hosts, you may want to try stopping the agent on
> all the hosts and then starting the agent service one by one.
>
> > Will adding a new management server be able to resolve the problem?
>
> That really depends on whether your existing management servers are
> optimally tuned and their resources are still getting maxed out. If not,
> adding another server will be more of an overhead than a benefit.
>
> Regards,
> Somesh
>
> -----Original Message-----
> From: Indra Pramana [mailto:in...@sg.or.id]
> Sent: Sunday, April 03, 2016 7:44 AM
> To: users@cloudstack.apache.org
> Subject: Re: URGENT - CloudStack agent not able to connect to management
> server
>
> Hi Lucian,
>
> Good day to you, and thank you for your reply. Apologies for the delay in
> my reply.
>
> Yes, I can confirm that we can access the host and port specified. Based on
> the logs, the host can connect to the management server, but there are no
> follow-up log entries of the kind that usually appear once it's connected.
> Eventually, we could only reconnect the host after we rebooted it, which
> means sacrificing all the VMs which were still up and running during the
> disconnection.
>
> At the time when the first hypervisor was disconnected, the CloudStack
> management servers were very busy handling the disconnections, trying to
> fence the hosts and initiate HA for all the affected VMs, based on the
> logs. Could this have put a strain on the management server, causing it to
> disconnect all the remaining hosts? Will adding a new management server be
> able to resolve the problem?
>
> Any advice is appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
> On Thu, Mar 31, 2016 at 5:28 PM, Nux! <n...@li.nux.ro> wrote:
>
> > Hello,
> >
> > Are you sure you can connect from the hypervisors to the
> > cloudstack-management on the host and port specified in the
> > agent.properties?
> >
> > --
> > Sent from the Delta quadrant using Borg technology!
> >
> > Nux!
> > www.nux.ro
> >
> > ----- Original Message -----
> > > From: "Indra Pramana" <in...@sg.or.id>
> > > To: users@cloudstack.apache.org
> > > Sent: Thursday, 31 March, 2016 03:14:59
> > > Subject: URGENT - CloudStack agent not able to connect to management server
> >
> > > Dear all,
> > >
> > > We are using CloudStack 4.2.0, KVM hypervisor and Ceph RBD storage. All
> > > our agents got disconnected from the management server and are unable to
> > > connect again, despite rebooting the management server and stopping and
> > > restarting the cloudstack-agent many times.
> > >
> > > We even tried to physically reboot a hypervisor host (sacrificing all
> > > the running VMs inside) to see if it could reconnect after boot-up, but
> > > it is not able to reconnect (it stays in the "Connecting" state). Here
> > > are the excerpts from the logs:
> > >
> > > ====
> > > 2016-03-31 10:07:49,346 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-11:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > 2016-03-31 10:07:49,395 DEBUG [cloud.agent.Agent] (Agent-Handler-2:null) Received response: Seq 0-11:  { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > 2016-03-31 10:08:49,271 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
> > > 2016-03-31 10:08:49,350 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Execution is successful.
> > > 2016-03-31 10:08:49,353 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-12:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > 2016-03-31 10:08:49,406 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received response: Seq 0-12:  { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > 2016-03-31 10:09:49,272 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py get_rule_logs_for_vms
> > > 2016-03-31 10:09:49,345 DEBUG [kvm.resource.LibvirtComputingResource] (UgentTask-5:null) Execution is successful.
> > > 2016-03-31 10:09:49,347 DEBUG [cloud.agent.Agent] (UgentTask-5:null) Sending ping: Seq 0-13:  { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"newStates":{},"_gatewayAccessible":true,"_vnetAccessible":true,"hostType":"Routing","hostId":0,"wait":0}}] }
> > > 2016-03-31 10:09:49,398 DEBUG [cloud.agent.Agent] (Agent-Handler-4:null) Received response: Seq 0-13:  { Ans: , MgmtId: 161342671900, via: 75, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.PingAnswer":{"_command":{"hostType":"Routing","hostId":0,"wait":0},"result":true,"wait":0}}] }
> > > ====
> > >
> > > On the existing hypervisor hosts, the agent normally gets stuck at this
> > > stage, and in the CloudStack GUI we don't see the agent in the
> > > "Connecting" state; it is in either the "Disconnected" or "Alert" state.
> > >
> > > ====
> > > 2016-03-31 07:37:09,819 DEBUG [utils.script.Script] (main:null) Executing: /bin/bash -c uname -r
> > > 2016-03-31 07:37:09,829 DEBUG [utils.script.Script] (main:null) Execution is successful.
> > > 2016-03-31 07:37:09,832 DEBUG [cloud.agent.Agent] (main:null) Adding shutdown hook
> > > 2016-03-31 07:37:09,833 INFO  [cloud.agent.Agent] (main:null) Agent [id = 73 : type = LibvirtComputingResource : zone = 6 : pod = 6 : workers = 5 : host = 10.x.x.x : port = 8250
> > > 2016-03-31 07:37:09,856 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connecting to 10.x.x.x:8250
> > > 2016-03-31 07:37:10,178 INFO  [utils.nio.NioClient] (Agent-Selector:null) SSL: Handshake done
> > > 2016-03-31 07:37:10,179 INFO  [utils.nio.NioClient] (Agent-Selector:null) Connected to 10.x.x.x:8250
> > > ====
> > >
> > > No other significant or useful entries were found in either the agent
> > > or the management server logs.
> > >
> > > Can anyone give a clue as to what the problem could be? We have been
> > > trying to reconnect for the past couple of hours without any success.
> > > Any help is greatly appreciated.
> > >
> > > Looking forward to your reply, thank you.
> > >
> > > Cheers.
> > >
> > > -ip-
> >
>
