Hi all,
I'm testing CloudStack on two of our servers and I'm having trouble
starting the agent on the hypervisor node.
Here's a summary of my configuration:
0 - Network infrastructure:
subnet: 172.16.0.0/16
gateway 172.16.0.1
DNS1: 192.168.1.204
1 - Cloud management server:
HW: Dell PowerEdge R320
OS: CentOS 6.4
IP: 172.16.2.2/16
hostname: morpheus (morpheus.biometrics.local)
2 - Hypervisor node:
HW: Dell PowerEdge R320
OS: CentOS 6.4
IP: 172.16.2.3/16
Hypervisor type: KVM
hostname: mnode-1 (mnode-1.biometrics.local)
Everything worked seamlessly on the cloud manager's side, and I didn't
see any errors while following the steps for the KVM hypervisor node
described in the install manual. In the manager console, I was able to
add the zone, pod and cluster, but when I tried to add the host, it
spent several minutes "creating" and then I got a popup alert with the
error: "Unable to add the host".
management-server.log on morpheus:
=======================
2013-05-20 11:52:05,403 DEBUG [cloud.api.ApiServlet]
(catalina-exec-25:null) ===START=== 192.168.1.187 -- GET
command=addHost&zoneid=ac6c892c-8a9d-494b-8d3c-c612263059a0&podid=a4dad17d-f8dc-4734-a6b2-46954144b954&clusterid=b808e253-3aa2-4372-8520-84d8cda88c27&hypervisor=KVM&clustertype=CloudManaged&hosttags=&username=root&url=http%3A%2F%2Fmnode-1&response=json&sessionkey=%2BcU40ygVQMKpfKAI9il7SspVnBk%3D&_=1369039922245
2013-05-20 11:52:05,413 INFO [cloud.resource.ResourceManagerImpl]
(catalina-exec-25:null) Trying to add a new host at http://mnode-1 in
data center 2
2013-05-20 11:52:05,665 DEBUG [utils.ssh.SSHCmdHelper]
(catalina-exec-25:null) Executing cmd: lsmod|grep kvm
2013-05-20 11:52:06,797 DEBUG [utils.ssh.SSHCmdHelper]
(catalina-exec-25:null) lsmod|grep kvm output:kvm_intel 53484 0
kvm 316602 1 kvm_intel
2013-05-20 11:52:07,812 DEBUG [utils.ssh.SSHCmdHelper]
(catalina-exec-25:null) Executing cmd: cloud-setup-agent -m 172.16.2.2
-z 2 -p 2 -c 2 -g 8224e6a1-e640-3bd1-b5d0-dd43c74b8b08 -a
--pubNic=cloudbr0 --prvNic=cloudbr0 --guestNic=cloudbr0
2013-05-20 11:52:31,388 DEBUG
[cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip
capacity scan due to there is no Primary Storage UPintenance mode
2013-05-20 11:52:31,805 DEBUG [storage.snapshot.SnapshotSchedulerImpl]
(SnapshotPollTask:null) Snapshot scheduler.poll is being called at
2013-05-20 09:52:31 GMT
2013-05-20 11:52:31,806 DEBUG [storage.snapshot.SnapshotSchedulerImpl]
(SnapshotPollTask:null) Got 0 snapshots to be executed at 2013-05-20
09:52:31 GMT
2013-05-20 11:52:31,830 DEBUG
[cloud.network.ExternalLoadBalancerUsageManagerImpl]
(ExternalNetworkMonitor-1:null) External load balancer devices stats
collector is running...
2013-05-20 11:52:31,868 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(RouterMonitor-1:null) Found 0 running routers.
2013-05-20 11:52:31,870 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(RouterStatusMonitor-1:null) Found 0 routers.
2013-05-20 11:52:41,440 DEBUG [utils.ssh.SSHCmdHelper]
(catalina-exec-25:null) cloud-setup-agent -m 172.16.2.2 -z 2 -p 2 -c 2
-g 8224e6a1-e640-3bd1-b5d0-dd43c74b8b08 -a --pubNic=cloudbr0
--prvNic=cloudbr0 --guestNic=cloudbr0 output:CloudStack Agent setup is done!
Configure Cgroup ...
2013-05-20 11:52:46,100 DEBUG [cloud.server.StatsCollector]
(StatsCollector-1:null) HostStatsCollector is running...
2013-05-20 11:52:46,100 DEBUG [cloud.server.StatsCollector]
(StatsCollector-2:null) VmStatsCollector is running...
2013-05-20 11:52:46,114 DEBUG [cloud.server.StatsCollector]
(StatsCollector-3:null) StorageCollector is running...
2013-05-20 11:53:01,388 DEBUG
[cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip
capacity scan due to there is no Primary Storage UPintenance mode
2013-05-20 11:53:01,793 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Running Capacity Checker ...
2013-05-20 11:53:01,793 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) recalculating system capacity
2013-05-20 11:53:01,793 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Executing cpu/ram capacity update
2013-05-20 11:53:01,794 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Done executing cpu/ram capacity update
2013-05-20 11:53:01,794 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Executing storage capacity update
2013-05-20 11:53:01,795 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Done executing storage capacity update
2013-05-20 11:53:01,795 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Executing capacity updates for public ip and Vlans
2013-05-20 11:53:01,803 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Done capacity updates for public ip and Vlans
2013-05-20 11:53:01,803 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Executing capacity updates for private ip
2013-05-20 11:53:01,807 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Done executing capacity updates for private ip
2013-05-20 11:53:01,807 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Done recalculating system capacity
2013-05-20 11:53:01,817 DEBUG [cloud.alert.AlertManagerImpl]
(CapacityChecker:null) Done running Capacity Checker ...
2013-05-20 11:53:01,870 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(RouterStatusMonitor-1:null) Found 0 routers.
2013-05-20 11:53:31,388 DEBUG
[cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Skip
capacity scan due to there is no Primary Storage UPintenance mode
2013-05-20 11:53:31,870 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(RouterStatusMonitor-1:null) Found 0 routers.
2013-05-20 11:53:46,102 DEBUG [cloud.server.StatsCollector]
(StatsCollector-1:null) HostStatsCollector is running...
2013-05-20 11:53:46,102 DEBUG [cloud.server.StatsCollector]
(StatsCollector-2:null) VmStatsCollector is running...
2013-05-20 11:53:46,117 DEBUG [cloud.server.StatsCollector]
(StatsCollector-3:null) StorageCollector is running...
( ... )
2013-05-20 11:57:31,806 DEBUG [storage.snapshot.SnapshotSchedulerImpl]
(SnapshotPollTask:null) Snapshot scheduler.poll is being called at
2013-05-20 09:57:31 GMT
2013-05-20 11:57:31,807 DEBUG [storage.snapshot.SnapshotSchedulerImpl]
(SnapshotPollTask:null) Got 0 snapshots to be executed at 2013-05-20
09:57:31 GMT
2013-05-20 11:57:31,830 DEBUG
[cloud.network.ExternalLoadBalancerUsageManagerImpl]
(ExternalNetworkMonitor-1:null) External load balancer devices stats
collector is running...
2013-05-20 11:57:31,868 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(RouterMonitor-1:null) Found 0 running routers.
2013-05-20 11:57:31,870 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl]
(RouterStatusMonitor-1:null) Found 0 routers.
2013-05-20 11:57:42,458 DEBUG [kvm.discoverer.KvmServerDiscoverer]
(catalina-exec-25:null) Timeout, to wait for the host connecting to mgt
svr, assuming it is failed
2013-05-20 11:57:42,461 WARN [cloud.resource.ResourceManagerImpl]
(catalina-exec-25:null) Unable to find the server resources at
http://mnode-1
2013-05-20 11:57:42,464 WARN [api.commands.AddHostCmd]
(catalina-exec-25:null) Exception:
com.cloud.exception.DiscoveryException: Unable to add the host
at
com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:737)
at
com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:544)
at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:140)
at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:138)
at com.cloud.api.ApiServer.queueCommand(ApiServer.java:544)
at com.cloud.api.ApiServer.handleRequest(ApiServer.java:423)
at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:312)
at com.cloud.api.ApiServlet.doGet(ApiServlet.java:64)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
2013-05-20 11:57:42,465 WARN [cloud.api.ApiDispatcher]
(catalina-exec-25:null) class com.cloud.api.ServerApiException : Unable
to add the host
2013-05-20 11:57:42,466 DEBUG [cloud.api.ApiServlet]
(catalina-exec-25:null) ===END=== 192.168.1.187 -- GET
command=addHost&zoneid=ac6c892c-8a9d-494b-8d3c-c612263059a0&podid=a4dad17d-f8dc-4734-a6b2-46954144b954&clusterid=b808e253-3aa2-4372-8520-84d8cda88c27&hypervisor=KVM&clustertype=CloudManaged&hosttags=&username=root&url=http%3A%2F%2Fmnode-1&response=json&sessionkey=%2BcU40ygVQMKpfKAI9il7SspVnBk%3D&_=1369039922245
===== END ====
agent.log on mnode-1:
==============
2013-05-20 10:48:55,603 ERROR [cloud.agent.AgentShell] (main:null)
Unable to start agent: Failed to get public nic name
2013-05-20 11:04:28,473 INFO [utils.component.ComponentLocator]
(main:null) Unable to find components.xml
2013-05-20 11:04:28,474 INFO [utils.component.ComponentLocator]
(main:null) Skipping configuration using components.xml
2013-05-20 11:04:28,474 INFO [cloud.agent.AgentShell] (main:null)
Implementation Version is 4.0.2.20130420145617
2013-05-20 11:04:28,475 INFO [cloud.agent.AgentShell] (main:null)
agent.properties found at /etc/cloud/agent/agent.properties
2013-05-20 11:04:28,476 INFO [cloud.agent.AgentShell] (main:null)
Defaulting to using properties file for storage
2013-05-20 11:04:28,478 INFO [cloud.agent.AgentShell] (main:null)
Defaulting to the constant time backoff algorithm
2013-05-20 11:04:28,534 INFO [cloud.agent.Agent] (main:null) id is
2013-05-20 11:04:28,544 ERROR [cloud.resource.ServerResourceBase]
(main:null) Nics are not configured!
2013-05-20 11:04:28,550 INFO [cloud.resource.ServerResourceBase]
(main:null) Designating private to be nic em1.100
2013-05-20 11:04:28,638 INFO
[resource.virtualnetwork.VirtualRoutingResource] (main:null)
VirtualRoutingResource _scriptDir to use: scripts/network/domr/kvm
2013-05-20 11:04:28,838 ERROR [cloud.agent.AgentShell] (main:null)
Unable to start agent: Failed to get public nic name
2013-05-20 11:52:05,347 INFO [utils.component.ComponentLocator]
(main:null) Unable to find components.xml
2013-05-20 11:52:05,348 INFO [utils.component.ComponentLocator]
(main:null) Skipping configuration using components.xml
2013-05-20 11:52:05,348 INFO [cloud.agent.AgentShell] (main:null)
Implementation Version is 4.0.2.20130420145617
2013-05-20 11:52:05,349 INFO [cloud.agent.AgentShell] (main:null)
agent.properties found at /etc/cloud/agent/agent.properties
2013-05-20 11:52:05,350 INFO [cloud.agent.AgentShell] (main:null)
Defaulting to using properties file for storage
2013-05-20 11:52:05,352 INFO [cloud.agent.AgentShell] (main:null)
Defaulting to the constant time backoff algorithm
2013-05-20 11:52:05,409 INFO [cloud.agent.Agent] (main:null) id is
2013-05-20 11:52:05,418 ERROR [cloud.resource.ServerResourceBase]
(main:null) Nics are not configured!
2013-05-20 11:52:05,424 INFO [cloud.resource.ServerResourceBase]
(main:null) Designating private to be nic em1.100
2013-05-20 11:52:05,513 INFO
[resource.virtualnetwork.VirtualRoutingResource] (main:null)
VirtualRoutingResource _scriptDir to use: scripts/network/domr/kvm
2013-05-20 11:52:05,712 ERROR [cloud.agent.AgentShell] (main:null)
Unable to start agent: Failed to get public nic name
===== END ==
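The ERROR entries in agent.log are easy to miss among the INFO lines, so here is a minimal Python sketch that rejoins the mail-wrapped lines and keeps only the ERROR entries. The helper names (join_wrapped, errors_only) are my own and the sample is just the last start attempt copied from the log above; it is an illustration, not a CloudStack tool.

```python
import re

# An entry starts with a timestamp such as "2013-05-20 11:52:05,418".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def join_wrapped(lines):
    """Merge mail-wrapped continuation lines back into whole log entries."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            # Not a new timestamp: treat it as the tail of the previous entry.
            entries[-1] += " " + line
    return entries


def errors_only(entries):
    """Keep entries whose level field (third whitespace-separated token) is ERROR."""
    return [e for e in entries if e.split()[2:3] == ["ERROR"]]


# Sample copied from the agent.log excerpt above (hypothetical inline data).
SAMPLE = """\
2013-05-20 11:52:05,409 INFO [cloud.agent.Agent] (main:null) id is
2013-05-20 11:52:05,418 ERROR [cloud.resource.ServerResourceBase]
(main:null) Nics are not configured!
2013-05-20 11:52:05,424 INFO [cloud.resource.ServerResourceBase]
(main:null) Designating private to be nic em1.100
2013-05-20 11:52:05,712 ERROR [cloud.agent.AgentShell] (main:null)
Unable to start agent: Failed to get public nic name
"""

ERROR_ENTRIES = errors_only(join_wrapped(SAMPLE.splitlines()))

if __name__ == "__main__":
    for entry in ERROR_ENTRIES:
        print(entry)
```

Every start attempt in the log shows the same two errors: "Nics are not configured!" followed by "Failed to get public nic name".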
I have iptables disabled and SELinux set to permissive on both sides. I'm
guessing there might be a problem with the bridge configuration on the
KVM node (mnode-1). This is what I have at the moment:
network scripts:
===========
::::::::::::::
ifcfg-em1
::::::::::::::
DEVICE=em1
BOOTPROTO=none
BROADCAST=172.16.255.255
DNS1=192.168.1.204
GATEWAY=172.16.0.1
HWADDR=90:B1:1C:39:33:8A
IPADDR=172.16.2.3
NETMASK=255.255.0.0
NM_CONTROLLED=no
ONBOOT=yes
HOTPLUG=no
TYPE=Ethernet
::::::::::::::
ifcfg-em1.100
::::::::::::::
DEVICE=em1.100
HWADDR=90:B1:1C:39:33:8A
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
TYPE=Ethernet
VLAN=yes
IPADDR=172.16.5.1
GATEWAY=172.16.0.1
NETMASK=255.255.0.0
::::::::::::::
ifcfg-em1.200
::::::::::::::
DEVICE=em1.200
HWADDR=90:B1:1C:39:33:8A
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
TYPE=Ethernet
VLAN=yes
BRIDGE=cloudbr0
::::::::::::::
ifcfg-em1.300
::::::::::::::
DEVICE=em1.300
HWADDR=90:B1:1C:39:33:8A
ONBOOT=yes
HOTPLUG=no
BOOTPROTO=none
TYPE=Ethernet
VLAN=yes
BRIDGE=cloudbr1
::::::::::::::
ifcfg-cloudbr0
::::::::::::::
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=yes
::::::::::::::
ifcfg-cloudbr1
::::::::::::::
DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=yes
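To make the layout of the scripts above easier to check at a glance, here is a minimal Python sketch that parses ifcfg-style KEY=VALUE text and reports, for each bridge, which devices are attached to it and whether it has an IPADDR. The helper names (parse_ifcfg, summarize) are my own, and the file contents are pasted inline in abridged form rather than read from /etc/sysconfig/network-scripts, so treat it as an illustration only.

```python
def parse_ifcfg(text):
    """Parse KEY=VALUE lines of an ifcfg-style file into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def summarize(configs):
    """Map each TYPE=Bridge device to its member devices and its IPADDR (or None)."""
    parsed = [parse_ifcfg(text) for text in configs]
    bridges = {
        cfg["DEVICE"]: {"members": [], "ip": cfg.get("IPADDR")}
        for cfg in parsed
        if cfg.get("TYPE") == "Bridge"
    }
    for cfg in parsed:
        bridge = cfg.get("BRIDGE")
        if bridge in bridges:
            bridges[bridge]["members"].append(cfg["DEVICE"])
    return bridges


# Abridged copies of the scripts listed above (inlined for the example).
CONFIGS = [
    "DEVICE=em1\nTYPE=Ethernet\nIPADDR=172.16.2.3\nNETMASK=255.255.0.0",
    "DEVICE=em1.100\nTYPE=Ethernet\nVLAN=yes\nIPADDR=172.16.5.1",
    "DEVICE=em1.200\nTYPE=Ethernet\nVLAN=yes\nBRIDGE=cloudbr0",
    "DEVICE=em1.300\nTYPE=Ethernet\nVLAN=yes\nBRIDGE=cloudbr1",
    "DEVICE=cloudbr0\nTYPE=Bridge\nBOOTPROTO=none\nSTP=yes",
    "DEVICE=cloudbr1\nTYPE=Bridge\nBOOTPROTO=none\nSTP=yes",
]

SUMMARY = summarize(CONFIGS)

if __name__ == "__main__":
    for name, info in sorted(SUMMARY.items()):
        print(name, "members:", info["members"], "ip:", info["ip"])
```

With these scripts, cloudbr0 only has em1.200 attached and cloudbr1 only has em1.300, and neither bridge carries an IP address; the addresses sit on em1 and em1.100. I don't know whether that matters, but the agent is being set up with --pubNic=cloudbr0.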
ifconfig output:
===========
cloudbr0 Link encap:Ethernet HWaddr 90:B1:1C:39:33:8A
inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:1188 (1.1 KiB)
cloudbr1 Link encap:Ethernet HWaddr 90:B1:1C:39:33:8A
inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:1188 (1.1 KiB)
em1 Link encap:Ethernet HWaddr 90:B1:1C:39:33:8A
inet addr:172.16.2.3 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:89868 errors:0 dropped:0 overruns:0 frame:0
TX packets:6620 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:20755835 (19.7 MiB) TX bytes:479334 (468.0 KiB)
Interrupt:16
em1.100 Link encap:Ethernet HWaddr 90:B1:1C:39:33:8A
inet addr:172.16.5.1 Bcast:172.16.255.255 Mask:255.255.0.0
inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:1440 (1.4 KiB)
em1.200 Link encap:Ethernet HWaddr 90:B1:1C:39:33:8A
inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:3019 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:158774 (155.0 KiB)
em1.300 Link encap:Ethernet HWaddr 90:B1:1C:39:33:8A
inet6 addr: fe80::92b1:1cff:fe39:338a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:3020 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:158844 (155.1 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:118 errors:0 dropped:0 overruns:0 frame:0
TX packets:118 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:10544 (10.2 KiB) TX bytes:10544 (10.2 KiB)
virbr0 Link encap:Ethernet HWaddr 52:54:00:37:05:54
inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
The current status of cloud-agent is "cloud-agent dead but subsys
locked". Attempting to restart the service results in the same errors
being written to agent.log.
I ran the cloud-setup-agent command I found in management-server.log,
and it seems to work fine, but I'm still not able to bring the
cloud-agent service up:
[root@mnode-1 network-scripts]# cloud-setup-agent -m 172.16.2.2 -z 2 -p
2 -c 2 -g 8224e6a1-e640-3bd1-b5d0-dd43c74b8b08 -a --pubNic=cloudbr0
--prvNic=cloudbr0 --guestNic=cloudbr0
Starting to configure your system:
Configure Cgroup ... [OK]
Configure SElinux ... [OK]
Configure Network ... [OK]
Configure Libvirt ... [OK]
Configure Firewall ... [OK]
Configure Nfs ... [OK]
Configure cloudAgent ... [OK]
CloudStack Agent setup is done!
[root@mnode-1 network-scripts]# service cloud-agent restart
Stopping Cloud Agent:
Starting Cloud Agent:
[root@mnode-1 network-scripts]# service cloud-agent status
cloud-agent dead but subsys locked
What am I doing wrong? Thanks in advance for the help.
--
kind regards,
Javier Rodríguez