Hi all,

I'm testing CloudStack on two of our servers and I'm having trouble starting the agent on the hypervisor node.

Here's a summary of my configuration:

0 - Network infrastructure:

  subnet: 172.16.0.0/16
  gateway 172.16.0.1
  DNS1: 192.168.1.204


1 - Cloud management server:
    HW: Dell PowerEdge R320
    OS: CentOS 6.4
    IP: 172.16.2.2/16
    hostname: morpheus  (morpheus.biometrics.local)

2 - Hypervisor node:
    HW: Dell PowerEdge R320
    OS: CentOS 6.4
    IP: 172.16.2.3/16
    Hypervisor type: KVM
    hostname: mnode-1 (mnode-1.biometrics.local)


Everything worked seamlessly on the management server's side, and I didn't see any errors while following the steps for the KVM hypervisor node described in the install manual. In the management console, I was able to add the zone, pod and cluster, but when I tried to add the host, it spent several minutes "creating" and then I got a popup alert with the error: "Unable to add the host".

management-server.log on morpheus:
=======================

2013-05-20 11:52:05,403 DEBUG [cloud.api.ApiServlet] (catalina-exec-25:null) ===START=== 192.168.1.187 -- GET command=addHost&zoneid=ac6c892c-8a9d-494b-8d3c-c612263059a0&podid=a4dad17d-f8dc-4734-a6b2-46954144b954&clusterid=b808e253-3aa2-4372-8520-84d8cda88c27&hypervisor=KVM&clustertype=CloudManaged&hosttags=&username=root&url=http%3A%2F%2Fmnode-1&response=json&sessionkey=%2BcU40ygVQMKpfKAI9il7SspVnBk%3D&_=1369039922245
2013-05-20 11:52:05,413 INFO [cloud.resource.ResourceManagerImpl] (catalina-exec-25:null) Trying to add a new host at http://mnode-1 in data center 2
2013-05-20 11:52:05,665 DEBUG [utils.ssh.SSHCmdHelper] (catalina-exec-25:null) Executing cmd: lsmod|grep kvm
2013-05-20 11:52:06,797 DEBUG [utils.ssh.SSHCmdHelper] (catalina-exec-25:null) lsmod|grep kvm output:kvm_intel 53484 0
kvm                   316602  1 kvm_intel

2013-05-20 11:52:07,812 DEBUG [utils.ssh.SSHCmdHelper] (catalina-exec-25:null) Executing cmd: cloud-setup-agent -m 172.16.2.2 -z 2 -p 2 -c 2 -g 8224e6a1-e640-3bd1-b5d0-dd43c74b8b08 -a --pubNic=cloudbr0 --prvNic=cloudbr0 --guestNic=cloudbr0
2013-05-20 11:52:31,388 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip capacity scan due to there is no Primary Storage UPintenance mode
2013-05-20 11:52:31,805 DEBUG [storage.snapshot.SnapshotSchedulerImpl] (SnapshotPollTask:null) Snapshot scheduler.poll is being called at 2013-05-20 09:52:31 GMT
2013-05-20 11:52:31,806 DEBUG [storage.snapshot.SnapshotSchedulerImpl] (SnapshotPollTask:null) Got 0 snapshots to be executed at 2013-05-20 09:52:31 GMT
2013-05-20 11:52:31,830 DEBUG [cloud.network.ExternalLoadBalancerUsageManagerImpl] (ExternalNetworkMonitor-1:null) External load balancer devices stats collector is running...
2013-05-20 11:52:31,868 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterMonitor-1:null) Found 0 running routers.
2013-05-20 11:52:31,870 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 0 routers.
2013-05-20 11:52:41,440 DEBUG [utils.ssh.SSHCmdHelper] (catalina-exec-25:null) cloud-setup-agent -m 172.16.2.2 -z 2 -p 2 -c 2 -g 8224e6a1-e640-3bd1-b5d0-dd43c74b8b08 -a --pubNic=cloudbr0 --prvNic=cloudbr0 --guestNic=cloudbr0 output:CloudStack Agent setup is done!
   Configure Cgroup ...
2013-05-20 11:52:46,100 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) HostStatsCollector is running...
2013-05-20 11:52:46,100 DEBUG [cloud.server.StatsCollector] (StatsCollector-2:null) VmStatsCollector is running...
2013-05-20 11:52:46,114 DEBUG [cloud.server.StatsCollector] (StatsCollector-3:null) StorageCollector is running...
2013-05-20 11:53:01,388 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip capacity scan due to there is no Primary Storage UPintenance mode
2013-05-20 11:53:01,793 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Running Capacity Checker ...
2013-05-20 11:53:01,793 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) recalculating system capacity
2013-05-20 11:53:01,793 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Executing cpu/ram capacity update
2013-05-20 11:53:01,794 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Done executing cpu/ram capacity update
2013-05-20 11:53:01,794 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Executing storage capacity update
2013-05-20 11:53:01,795 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Done executing storage capacity update
2013-05-20 11:53:01,795 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Executing capacity updates for public ip and Vlans
2013-05-20 11:53:01,803 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Done capacity updates for public ip and Vlans
2013-05-20 11:53:01,803 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Executing capacity updates for private ip
2013-05-20 11:53:01,807 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Done executing capacity updates for private ip
2013-05-20 11:53:01,807 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Done recalculating system capacity
2013-05-20 11:53:01,817 DEBUG [cloud.alert.AlertManagerImpl] (CapacityChecker:null) Done running Capacity Checker ...
2013-05-20 11:53:01,870 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 0 routers.
2013-05-20 11:53:31,388 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip capacity scan due to there is no Primary Storage UPintenance mode
2013-05-20 11:53:31,870 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 0 routers.
2013-05-20 11:53:46,102 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) HostStatsCollector is running...
2013-05-20 11:53:46,102 DEBUG [cloud.server.StatsCollector] (StatsCollector-2:null) VmStatsCollector is running...
2013-05-20 11:53:46,117 DEBUG [cloud.server.StatsCollector] (StatsCollector-3:null) StorageCollector is running...
( ... )
2013-05-20 11:57:31,806 DEBUG [storage.snapshot.SnapshotSchedulerImpl] (SnapshotPollTask:null) Snapshot scheduler.poll is being called at 2013-05-20 09:57:31 GMT
2013-05-20 11:57:31,807 DEBUG [storage.snapshot.SnapshotSchedulerImpl] (SnapshotPollTask:null) Got 0 snapshots to be executed at 2013-05-20 09:57:31 GMT
2013-05-20 11:57:31,830 DEBUG [cloud.network.ExternalLoadBalancerUsageManagerImpl] (ExternalNetworkMonitor-1:null) External load balancer devices stats collector is running...
2013-05-20 11:57:31,868 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterMonitor-1:null) Found 0 running routers.
2013-05-20 11:57:31,870 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 0 routers.
2013-05-20 11:57:42,458 DEBUG [kvm.discoverer.KvmServerDiscoverer] (catalina-exec-25:null) Timeout, to wait for the host connecting to mgt svr, assuming it is failed
2013-05-20 11:57:42,461 WARN [cloud.resource.ResourceManagerImpl] (catalina-exec-25:null) Unable to find the server resources at http://mnode-1
2013-05-20 11:57:42,464 WARN [api.commands.AddHostCmd] (catalina-exec-25:null) Exception:
com.cloud.exception.DiscoveryException: Unable to add the host
    at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:737)
    at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:544)
    at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:140)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
    at com.cloud.api.ApiServer.queueCommand(ApiServer.java:544)
    at com.cloud.api.ApiServer.handleRequest(ApiServer.java:423)
    at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:312)
    at com.cloud.api.ApiServlet.doGet(ApiServlet.java:64)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
    at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
2013-05-20 11:57:42,465 WARN [cloud.api.ApiDispatcher] (catalina-exec-25:null) class com.cloud.api.ServerApiException : Unable to add the host
2013-05-20 11:57:42,466 DEBUG [cloud.api.ApiServlet] (catalina-exec-25:null) ===END=== 192.168.1.187 -- GET command=addHost&zoneid=ac6c892c-8a9d-494b-8d3c-c612263059a0&podid=a4dad17d-f8dc-4734-a6b2-46954144b954&clusterid=b808e253-3aa2-4372-8520-84d8cda88c27&hypervisor=KVM&clustertype=CloudManaged&hosttags=&username=root&url=http%3A%2F%2Fmnode-1&response=json&sessionkey=%2BcU40ygVQMKpfKAI9il7SspVnBk%3D&_=1369039922245

===== END ====

agent.log in mnode-1:
==============
2013-05-20 10:48:55,603 ERROR [cloud.agent.AgentShell] (main:null) Unable to start agent: Failed to get public nic name
2013-05-20 11:04:28,473 INFO [utils.component.ComponentLocator] (main:null) Unable to find components.xml
2013-05-20 11:04:28,474 INFO [utils.component.ComponentLocator] (main:null) Skipping configuration using components.xml
2013-05-20 11:04:28,474 INFO [cloud.agent.AgentShell] (main:null) Implementation Version is 4.0.2.20130420145617
2013-05-20 11:04:28,475 INFO [cloud.agent.AgentShell] (main:null) agent.properties found at /etc/cloud/agent/agent.properties
2013-05-20 11:04:28,476 INFO [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage
2013-05-20 11:04:28,478 INFO [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm
2013-05-20 11:04:28,534 INFO  [cloud.agent.Agent] (main:null) id is
2013-05-20 11:04:28,544 ERROR [cloud.resource.ServerResourceBase] (main:null) Nics are not configured!
2013-05-20 11:04:28,550 INFO [cloud.resource.ServerResourceBase] (main:null) Designating private to be nic em1.100
2013-05-20 11:04:28,638 INFO [resource.virtualnetwork.VirtualRoutingResource] (main:null) VirtualRoutingResource _scriptDir to use: scripts/network/domr/kvm
2013-05-20 11:04:28,838 ERROR [cloud.agent.AgentShell] (main:null) Unable to start agent: Failed to get public nic name

2013-05-20 11:52:05,347 INFO [utils.component.ComponentLocator] (main:null) Unable to find components.xml
2013-05-20 11:52:05,348 INFO [utils.component.ComponentLocator] (main:null) Skipping configuration using components.xml
2013-05-20 11:52:05,348 INFO [cloud.agent.AgentShell] (main:null) Implementation Version is 4.0.2.20130420145617
2013-05-20 11:52:05,349 INFO [cloud.agent.AgentShell] (main:null) agent.properties found at /etc/cloud/agent/agent.properties
2013-05-20 11:52:05,350 INFO [cloud.agent.AgentShell] (main:null) Defaulting to using properties file for storage
2013-05-20 11:52:05,352 INFO [cloud.agent.AgentShell] (main:null) Defaulting to the constant time backoff algorithm
2013-05-20 11:52:05,409 INFO  [cloud.agent.Agent] (main:null) id is
2013-05-20 11:52:05,418 ERROR [cloud.resource.ServerResourceBase] (main:null) Nics are not configured!
2013-05-20 11:52:05,424 INFO [cloud.resource.ServerResourceBase] (main:null) Designating private to be nic em1.100
2013-05-20 11:52:05,513 INFO [resource.virtualnetwork.VirtualRoutingResource] (main:null) VirtualRoutingResource _scriptDir to use: scripts/network/domr/kvm
2013-05-20 11:52:05,712 ERROR [cloud.agent.AgentShell] (main:null) Unable to start agent: Failed to get public nic name

===== END ==

I have iptables disabled and SELinux set to permissive on both sides. I'm guessing there might be a problem with the bridge configuration on the KVM node (mnode-1). This is what I have at the moment:

network scripts:
===========

::::::::::::::
ifcfg-em1
::::::::::::::
DEVICE=em1
BOOTPROTO=none
BROADCAST=172.16.255.255
DNS1=192.168.1.204
GATEWAY=172.16.0.1
HWADDR=90:B1:1C:39:33:8A
IPADDR=172.16.2.3
NETMASK=255.255.0.0
NM_CONTROLLED=no
ONBOOT=yes
HOTPLUG=no
TYPE=Ethernet

::::::::::::::
ifcfg-em1.100
::::::::::::::
DEVICE=em1.100
HWADDR=90:B1:1C:39:33:8A
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
TYPE=Ethernet
VLAN=yes
IPADDR=172.16.5.1
GATEWAY=172.16.0.1
NETMASK=255.255.0.0

::::::::::::::
ifcfg-em1.200
::::::::::::::
DEVICE=em1.200
HWADDR=90:B1:1C:39:33:8A
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
TYPE=Ethernet
VLAN=yes
BRIDGE=cloudbr0

::::::::::::::
ifcfg-em1.300
::::::::::::::
DEVICE=em1.300
HWADDR=90:B1:1C:39:33:8A
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
TYPE=Ethernet
VLAN=yes
BRIDGE=cloudbr1

::::::::::::::
ifcfg-cloudbr0
::::::::::::::
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=yes

::::::::::::::
ifcfg-cloudbr1
::::::::::::::
DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=yes
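
To make the bridge layout in the scripts above easier to compare at a glance, here is a minimal sketch that prints which devices are attached to which bridge according to the ifcfg files. The helper name list_bridge_members is hypothetical (it is not a CloudStack or CentOS tool), and it only reads the DEVICE= and BRIDGE= lines; it says nothing about the live state of the bridges.

```shell
#!/bin/sh
# Print "bridge device" pairs declared in ifcfg-* files.
# list_bridge_members is a hypothetical helper name, not a CloudStack tool.
# Optional argument: directory holding the ifcfg files
# (defaults to the usual CentOS location).
list_bridge_members() {
    dir="${1:-/etc/sysconfig/network-scripts}"
    for f in "$dir"/ifcfg-*; do
        # Skip the literal pattern when the glob matches nothing.
        [ -f "$f" ] || continue
        dev=$(sed -n 's/^DEVICE=//p' "$f" | tr -d '"')
        br=$(sed -n 's/^BRIDGE=//p' "$f" | tr -d '"')
        # Only interfaces that declare a BRIDGE= line are reported.
        if [ -n "$br" ]; then
            echo "$br $dev"
        fi
    done
}
```

Running `list_bridge_members` with no argument on mnode-1 should list em1.200 under cloudbr0 and em1.300 under cloudbr1 if the files are exactly as pasted above. Neither em1 nor em1.100 declares a BRIDGE= line, so they would not appear.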


ifconfig output:
===========

cloudbr0  Link encap:Ethernet  HWaddr 90:B1:1C:39:33:8A
          inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:1188 (1.1 KiB)

cloudbr1  Link encap:Ethernet  HWaddr 90:B1:1C:39:33:8A
          inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:1188 (1.1 KiB)

em1       Link encap:Ethernet  HWaddr 90:B1:1C:39:33:8A
          inet addr:172.16.2.3  Bcast:172.16.255.255 Mask:255.255.0.0
          inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:89868 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6620 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:20755835 (19.7 MiB)  TX bytes:479334 (468.0 KiB)
          Interrupt:16

em1.100   Link encap:Ethernet  HWaddr 90:B1:1C:39:33:8A
          inet addr:172.16.5.1  Bcast:172.16.255.255 Mask:255.255.0.0
          inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:1440 (1.4 KiB)

em1.200   Link encap:Ethernet  HWaddr 90:B1:1C:39:33:8A
          inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3019 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:158774 (155.0 KiB)

em1.300   Link encap:Ethernet  HWaddr 90:B1:1C:39:33:8A
          inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3020 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:158844 (155.1 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:118 errors:0 dropped:0 overruns:0 frame:0
          TX packets:118 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:10544 (10.2 KiB)  TX bytes:10544 (10.2 KiB)

virbr0    Link encap:Ethernet  HWaddr 52:54:00:37:05:54
          inet addr:192.168.122.1  Bcast:192.168.122.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)



The current status of cloud-agent is "cloud-agent dead but subsys locked". Attempting to restart the service results in the same error(s) being printed out to the agent.log.

I ran the cloud-setup-agent command I found in management-server.log by hand, and it seems to work fine, but I'm still not able to bring the cloud-agent service up:

[root@mnode-1 network-scripts]# cloud-setup-agent -m 172.16.2.2 -z 2 -p 2 -c 2 -g 8224e6a1-e640-3bd1-b5d0-dd43c74b8b08 -a --pubNic=cloudbr0 --prvNic=cloudbr0 --guestNic=cloudbr0
Starting to configure your system:
Configure Cgroup ...          [OK]
Configure SElinux ...         [OK]
Configure Network ...         [OK]
Configure Libvirt ...         [OK]
Configure Firewall ...        [OK]
Configure Nfs ...             [OK]
Configure cloudAgent ...      [OK]
CloudStack Agent setup is done!
[root@mnode-1 network-scripts]# service cloud-agent restart
Stopping Cloud Agent:
Starting Cloud Agent:
[root@mnode-1 network-scripts]# service cloud-agent status
cloud-agent dead but subsys locked
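
Since the agent dies with "Failed to get public nic name", one more thing worth checking is which NIC names actually ended up in agent.properties after cloud-setup-agent ran. The following is only a sketch: I am assuming the agent reads keys named public.network.device, private.network.device and guest.network.device, and show_agent_nics is a hypothetical helper name. The default path is the one reported in agent.log.

```shell
#!/bin/sh
# Show the NIC-related settings in the agent's properties file.
# Assumption: the keys are named {public,private,guest}.network.device.
# show_agent_nics is a hypothetical helper, not part of CloudStack.
# Returns non-zero (grep's exit status) when no such key is present.
show_agent_nics() {
    props="${1:-/etc/cloud/agent/agent.properties}"
    grep -E '^(public|private|guest)\.network\.device=' "$props"
}
```

If `show_agent_nics` prints nothing, or names a device that does not show up in the ifconfig output, that would at least be consistent with the "Nics are not configured!" line in agent.log.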


What am I doing wrong? Thanks in advance for the help.


--
kind regards,

Javier Rodríguez
