Hey Francois,

here are some suggestions...

Do you have a load balancer in front of your 3 CSMAN servers? If so, is there any persistence defined in your configuration? Can you try setting it to SourceIP and fixing the timeout to something like 60 or 120 min?
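
For example, if you happen to use HAProxy, a rough sketch could look like this (the listen name, server names and IPs are just placeholders, adapt to your own setup):

    listen cloudstack-mgmt
        bind *:8080
        mode tcp
        balance source              # SourceIP persistence
        timeout client 120m
        timeout server 120m
        server man01 10.0.0.11:8080 check
        server man02 10.0.0.12:8080 check
        server man03 10.0.0.13:8080 check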

Also validate these points:

Under Global Settings / host, make sure your Xen hosts, VMs and System VMs can reach the IP defined there...

iptables: make sure these TCP ports are open on each of your CSMAN servers: 8080, 8096, 8250, 9090 (and also validate that these ports are open on your load balancer too)...
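
For example, something like this on each CSMAN server (just a sketch, adapt chain and rule order to your own firewall policy):

    iptables -I INPUT -p tcp --dport 8080 -j ACCEPT    # UI / API
    iptables -I INPUT -p tcp --dport 8096 -j ACCEPT    # unauthenticated API port (if used)
    iptables -I INPUT -p tcp --dport 8250 -j ACCEPT    # agent / cluster communication
    iptables -I INPUT -p tcp --dport 9090 -j ACCEPT    # cluster management
    service iptables save    # persist to /etc/sysconfig/iptables on RHEL/CentOS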

If your zone is set to Advanced mode, make sure each of your XenServers is running openvswitch (xe-switch-network-backend openvswitch); if not (Basic mode), set it to bridge (xe-switch-network-backend bridge). More info: http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/4.6/hypervisor/xenserver.html#install-cloudstack-xenserver-support-package-csp
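
To check which backend a host is currently using, you can for example look at /etc/xensource/network.conf on the XenServer:

    cat /etc/xensource/network.conf     # should print "openvswitch" or "bridge"
    xe-switch-network-backend bridge    # only if you need to switch; reboot the host afterwards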

Also check the iptables rules on each of your Xen servers; to test, flush all tables and check whether CloudStack can connect correctly to the host (iptables -F; the persistent rules live in /etc/sysconfig/iptables).
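
Roughly like this on the Xen host (for testing only, restore your rules afterwards):

    iptables -L -n                  # review the current rules first
    iptables -F                     # flush everything (temporary, test only!)
    # ...check whether CloudStack can now connect to the host...
    service iptables restart        # reload the persistent rules from /etc/sysconfig/iptables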

You can also try deleting one Xen host and re-adding it to CloudStack, then check in the CS logs whether you see some files being copied to the host...
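
While re-adding it, you can watch the management server log for that host, e.g.:

    tail -f /var/log/cloudstack/management/management-server.log | grep -i <xenhost-name>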

Try that and keep us posted!

Marcus


On 2016-07-21 10:50 AM, Scheurer François wrote:
Dear Stephan and Dag,

We also thought about that and checked it, but the host was already enabled on Xen.

Best Regards
Francois



EveryWare AG
François Scheurer
Senior Systems Engineer

-----Original Message-----
From: Dag Sonstebo [mailto:[email protected]]
Sent: Thursday, July 21, 2016 1:23 PM
To: [email protected]
Subject: Re: cs 4.5.1, hosts stuck in disconnected status

Hi Francois,

As pointed out by Stephan the problem is probably with your Xen cluster rather 
than your CloudStack management. On the disconnected host you may want to carry 
out a xe-toolstack-restart - this will restart Xapi without affecting running 
VMs. After that check your cluster with ‘xe host-list’ etc. If this doesn’t 
help you may have to consider restarting the host.

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue







On 21/07/2016, 11:25, "Francois Scheurer" <[email protected]> 
wrote:

Dear CS contributors


We could fix the issue without having to restart the disconnected Xen Hosts.
We suspect that the root cause was an interrupted agent transfer during
a restart of a Management Server (CSMAN).

We have 3 CSMANs running in a cluster: man01, man02 and man03.
The disconnected vh010 belongs to a Xen host cluster with 4 nodes:
vh009, vh010, vh011 and vh012.
See the chronological events from the logs, with our comments, regarding
the disconnection of vh010:

===>vh010 (host 19) was on agent 345049103441 (man02)
     vh010: Last Disconnected   2016-07-18T14:03:50+0200
     345049098498 = man01
     345049103441 = man02
     345049098122 = man03

     ewcstack-man02-prod:
         2016-07-18T14:00:34.878973+02:00 ewcstack-man02-prod [audit
root/10467 as root/10467 on
pts/1/192.168.252.77:36251->192.168.225.72:22] /root: service
cloudstack-management restart; service cloudstack-usage restart

     ewcstack-man02-prod:
         2016-07-18 14:02:15,797 DEBUG [c.c.s.StorageManagerImpl]
(StorageManager-Scavenger-1:ctx-ea98efd4) Storage pool garbage collector
found 0 templates to clean up in storage pool: ewcstack-vh010-prod Local
Storage
     !    2016-07-18 14:02:26,699 DEBUG
[c.c.a.m.ClusteredAgentManagerImpl] (StatsCollector-1:ctx-7da7a491) Host
19 has switched to another management server, need to update agent map
with a forwarding agent attache

     ewcstack-man01-prod:
         2016-07-18T14:02:47.317644+02:00 ewcstack-man01-prod [audit
root/11094 as root/11094 on
pts/0/192.168.252.77:40654->192.168.225.71:22] /root: service
cloudstack-management restart; service cloudstack-usage restart;

     ewcstack-man02-prod:
         2016-07-18 14:03:24,859 DEBUG [c.c.s.StorageManagerImpl]
(StorageManager-Scavenger-1:ctx-c39aaa53) Storage pool garbage collector
found 0 templates to clean up in storage pool: ewcstack-vh010-prod Local
Storage

     ewcstack-man02-prod:
         2016-07-18 14:03:26,260 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentManager-Handler-6:null) SeqA 256-29401: Sending Seq 256-29401:  {
Ans: , MgmtId: 345049103441, via: 256, Ver: v1, Flags: 100010,
[{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
         2016-07-18 14:03:28,535 DEBUG [c.c.s.StatsCollector]
(StatsCollector-1:ctx-814f1ae1) HostStatsCollector is running...
         2016-07-18 14:03:28,553 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 7-6771162039751540742: Forwarding
null to 345049098122
         2016-07-18 14:03:28,661 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentManager-Handler-7:null) SeqA 244-153489: Processing Seq
244-153489:  { Cmd , MgmtId: -1, via: 244, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":1456,"_loadInfo":"{\n
\"connections\": []\n}","wait":0}}] }
         2016-07-18 14:03:28,667 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentManager-Handler-7:null) SeqA 244-153489: Sending Seq 244-153489:
{ Ans: , MgmtId: 345049103441, via: 244, Ver: v1, Flags: 100010,
[{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
         2016-07-18 14:03:28,731 DEBUG [c.c.a.t.Request]
(StatsCollector-1:ctx-814f1ae1) Seq 7-6771162039751540742: Received:  {
Ans: , MgmtId: 345049103441, via: 7, Ver: v1, Flags: 10, {
GetHostStatsAnswer } }
===>11 = vh006, 345049098122 = man03, vh006 is transferred to man03:
         2016-07-18 14:03:28,744 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 11-5143110774457106438: Forwarding
null to 345049098122
         2016-07-18 14:03:28,838 DEBUG [c.c.a.t.Request]
(StatsCollector-1:ctx-814f1ae1) Seq 11-5143110774457106438: Received:  {
Ans: , MgmtId: 345049103441, via: 11, Ver: v1, Flags: 10, {
GetHostStatsAnswer } }
===>19 = vh010, 345049098498 = man01, vh010 is transferred to man01, but
man01 is stopping and starting at 14:02:47, so the transfer failed:
     !    2016-07-18 14:03:28,851 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 19-2009731333714083845: Forwarding
null to 345049098498
         2016-07-18 14:03:28,852 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 19-2009731333714083845: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:28,852 INFO [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) IOException Broken pipe when sending
data to peer 345049098498, close peer connection and let it re-open
         2016-07-18 14:03:28,856 WARN  [c.c.a.m.AgentManagerImpl]
(StatsCollector-1:ctx-814f1ae1) Exception while sending
         java.lang.NullPointerException
                 at
com.cloud.agent.manager.ClusteredAgentManagerImpl.connectToPeer(ClusteredAgentManagerImpl.java:527)
                 at
com.cloud.agent.manager.ClusteredAgentAttache.send(ClusteredAgentAttache.java:177)
                 at
com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:395)
                 at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:433)
                 at
com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:362)
                 at
com.cloud.agent.manager.AgentManagerImpl.easySend(AgentManagerImpl.java:919)
                 at
com.cloud.resource.ResourceManagerImpl.getHostStatistics(ResourceManagerImpl.java:2460)
                 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
                 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
                 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
                 at java.lang.reflect.Method.invoke(Method.java:606)
                 at
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
                 at
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
                 at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
                 at
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
                 at
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
                 at
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
                 at com.sun.proxy.$Proxy149.getHostStatistics(Unknown
Source)
                 at
com.cloud.server.StatsCollector$HostCollector.runInContext(StatsCollector.java:325)
                 at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
                 at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
                 at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
                 at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
                 at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
                 at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
                 at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
                 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
                 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
                 at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                 at java.lang.Thread.run(Thread.java:745)
         2016-07-18 14:03:28,857 WARN  [c.c.r.ResourceManagerImpl]
(StatsCollector-1:ctx-814f1ae1) Unable to obtain host 19 statistics.
         2016-07-18 14:03:28,857 WARN  [c.c.s.StatsCollector]
(StatsCollector-1:ctx-814f1ae1) Received invalid host stats for host: 19

         2016-07-18 14:03:28,870 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 21-6297439653947506693: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:28,887 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 25-2894407185515675660: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:28,903 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 29-4279264070932103175: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:28,919 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 33-123567514775977989: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:29,057 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 224-4524428775647084550: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:29,170 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 19-2009731333714083846: Error on
connecting to management node: null try = 1
===>vh010 is invalid and stays disconnected:
     !    2016-07-18 14:03:29,174 WARN  [c.c.r.ResourceManagerImpl]
(StatsCollector-1:ctx-814f1ae1) Unable to obtain GPU stats for host
ewcstack-vh010-prod
         2016-07-18 14:03:29,183 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 21-6297439653947506694: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:29,196 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 25-2894407185515675661: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:29,212 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 29-4279264070932103176: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:29,226 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 33-123567514775977990: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:29,282 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-1:ctx-814f1ae1) Seq 224-4524428775647084551: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:30,246 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-2:ctx-942dd66c) Seq 19-2009731333714083847: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:30,302 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-2:ctx-942dd66c) Seq 21-6297439653947506695: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:30,352 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-2:ctx-942dd66c) Seq 25-2894407185515675662: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:30,381 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-2:ctx-942dd66c) Seq 29-4279264070932103177: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:30,421 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-2:ctx-942dd66c) Seq 33-123567514775977991: Error on
connecting to management node: null try = 1
         2016-07-18 14:03:30,691 DEBUG [c.c.a.m.ClusteredAgentAttache]
(StatsCollector-2:ctx-942dd66c) Seq 224-4524428775647084552: Error on
connecting to management node: null try = 1

The table op_host_transfer shows 3 transfers that were not completed,
for ids 3, 15, 19 = vh007, vh011, vh010:

     mysql> select * from op_host_transfer;
     +-----+------------------------+-----------------------+-------------------+---------------------+
     | id  | initial_mgmt_server_id | future_mgmt_server_id | state             | created             |
     +-----+------------------------+-----------------------+-------------------+---------------------+
     |   3 |           345049103441 |          345049098122 | TransferRequested | 2016-07-13 14:46:57 |
     |  15 |           345049103441 |          345049098122 | TransferRequested | 2016-07-14 16:15:11 |
     |  19 |           345049098498 |          345049103441 | TransferRequested | 2016-07-18 12:03:39 |
     | 130 |           345049103441 |          345049098498 | TransferRequested | 2016-07-13 14:52:00 |
     | 134 |           345049103441 |          345049098498 | TransferRequested | 2016-07-03 08:54:40 |
     | 150 |           345049103441 |          345049098498 | TransferRequested | 2016-07-13 14:52:00 |
     | 158 |           345049103441 |          345049098498 | TransferRequested | 2016-07-03 08:54:41 |
     | 221 |           345049103441 |          345049098498 | TransferRequested | 2016-07-13 14:52:00 |
     | 232 |           345049103441 |          345049098498 | TransferRequested | 2016-07-03 08:54:41 |
     | 244 |           345049103441 |          345049098498 | TransferRequested | 2016-07-13 14:52:00 |
     | 248 |           345049103441 |          345049098498 | TransferRequested | 2016-07-03 08:54:41 |
     | 250 |           345049098122 |          345049103441 | TransferRequested | 2016-07-15 18:54:35 |
     | 251 |           345049103441 |          345049098122 | TransferRequested | 2016-07-16 09:06:12 |
     | 252 |           345049103441 |          345049098122 | TransferRequested | 2016-07-18 11:22:06 |
     | 253 |           345049103441 |          345049098122 | TransferRequested | 2016-07-16 09:06:13 |
     | 254 |           345049103441 |          345049098122 | TransferRequested | 2016-07-18 11:22:07 |
     | 255 |           345049098122 |          345049098498 | TransferRequested | 2016-07-18 12:05:40 |
     +-----+------------------------+-----------------------+-------------------+---------------------+


Analysis:
A rolling restart of all 3 CSMANs (one by one) seems to have caused
these 3 uncompleted transfers, which in turn seems to be the cause of
the hosts being stuck in Disconnected status.

If we stop all CSMANs and start a single one (e.g. man03), these
3 uncompleted transfers disappear and the hosts get connected
automatically.
It is probably also possible to delete them manually in
op_host_transfer. (Can you confirm this?)
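
Presumably something like this would do it (untested, ids taken from the
op_host_transfer output above):

     mysql> delete from op_host_transfer where id in (3,15,19) and state='TransferRequested';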

We also discovered an issue with loopback devices that are not removed
after a stop of the CSMAN.


Conclusion:

Problem: Xen hosts get disconnected and stay disconnected forever.
Solution:
     stop all CSMAN
         losetup -a
         losetup -d /dev/loop{0..7}
         mysql> update host set
status="Up",resource_state="Enabled",mgmt_server_id=<CSMAN-ID> where
id=<HOST-ID>;
         mysql> update op_host_capacity set capacity_state="Enabled"
where host_id=<HOST-ID>;
         mysql> delete from op_host_transfer where id=<HOST-ID>;
     optional:
         on xen server host:
             xe-toolstack-restart; sleep 60
             xe host-list params=enabled
             xe host-enable host=<hostname>
     start a single CSMAN
     restart all System VM's (Secondary Storage and Console Proxy)
     wait until all hosts are connected
     start all other CSMAN's
Useful:
     mysql> select id,name,uuid,status,type, mgmt_server_id from host
where removed is NULL;
     mysql> select * from mshost;
     mysql> select * from op_host_transfer;
     mysql> select * from mshost where removed is NULL;
     mysql> select * from host_tags;
     mysql> select * from mshost_peer;
     mysql> select * from op_host_capacity order by host_id;



Best regards
Francois Scheurer

On 21.07.2016 11:56, Francois Scheurer wrote:
Dear CS contributors


We use CS 4.5.1 with 3 clusters of XenServer 6.5.

One host in a cluster (and another host in a second cluster as well) got
into status "Disconnected" and stayed there.

We tried to unmanage/remanage the cluster to force a reconnection, we
also destroyed all System VMs (console proxy and secondary storage
VMs), and we restarted all management servers.
We verified on the Xen server that it is not disabled, and we restarted
the Xen toolstack.
We also updated the host table to set a mgmt_server_id: update host
set
status="Up",resource_state="Disabled",mgmt_server_id="345049103441"
where id=15;
Then we restarted the management servers again and also the System VMs.
We finally updated the table again, this time without a mgmt_server_id:
update host set status="Alert",resource_state="Disabled",mgmt_server_id=NULL
where id=15;
Then we restarted the management servers again and also the System VMs.
Nothing helped; the host does not reconnect.

Calling ForceReconnect shows this error:

2016-07-18 11:26:07,418 DEBUG [c.c.a.ApiServlet]
(catalina-exec-13:ctx-4e82fdce) ===START===  192.168.252.77 -- GET
command=reconnectHost&id=3490cfa0-b2a7-4a12-aa5e-7e351ce9df00&response=json&sessionkey=Tnc9l6aaSvc8J5SNy3Z71FLXgEI%3D&_=1468833953948

2016-07-18 11:26:07,450 INFO [o.a.c.f.j.i.AsyncJobMonitor]
(API-Job-Executor-23:ctx-fc340a8e job-148672) Add job-148672 into job
monitoring
2016-07-18 11:26:07,453 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(catalina-exec-13:ctx-4e82fdce ctx-9c696de2) submit async job-148672,
details: AsyncJobVO {id:148672, userId: 51, accountId: 51,
instanceType: Host, instanceId: 15, cmd:
org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd,
cmdInfo:
{"id":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","response":"json","sessionkey":"Tnc9l6aaSvc8J5SNy3Z71FLXgEI\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00\"}","cmdEventType":"HOST.RECONNECT","ctxUserId":"51","httpmethod":"GET","_":"1468833953948","uuid":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","ctxAccountId":"51","ctxStartEventId":"18026840"},
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
result: null, initMsid: 345049098122, completeMsid: null, lastUpdated:
null, lastPolled: null, created: null}
2016-07-18 11:26:07,454 DEBUG [c.c.a.ApiServlet]
(catalina-exec-13:ctx-4e82fdce ctx-9c696de2) ===END=== 192.168.252.77
-- GET
command=reconnectHost&id=3490cfa0-b2a7-4a12-aa5e-7e351ce9df00&response=json&sessionkey=Tnc9l6aaSvc8J5SNy3Z71FLXgEI%3D&_=1468833953948
2016-07-18 11:26:07,455 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(API-Job-Executor-23:ctx-fc340a8e job-148672) Executing AsyncJobVO
{id:148672, userId: 51, accountId: 51, instanceType: Host, instanceId:
15, cmd:
org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd,
cmdInfo:
{"id":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","response":"json","sessionkey":"Tnc9l6aaSvc8J5SNy3Z71FLXgEI\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00\"}","cmdEventType":"HOST.RECONNECT","ctxUserId":"51","httpmethod":"GET","_":"1468833953948","uuid":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","ctxAccountId":"51","ctxStartEventId":"18026840"},
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0,
result: null, initMsid: 345049098122, completeMsid: null, lastUpdated:
null, lastPolled: null, created: null}
2016-07-18 11:26:07,461 DEBUG [c.c.a.m.DirectAgentAttache]
(DirectAgent-495:ctx-77e68e88) Seq 213-6743858967010618892: Executing
request
2016-07-18 11:26:07,467 INFO  [c.c.a.m.AgentManagerImpl]
(API-Job-Executor-23:ctx-fc340a8e job-148672 ctx-0061c491) Unable to
disconnect host because it is not connected to this server: 15
2016-07-18 11:26:07,467 WARN [o.a.c.a.c.a.h.ReconnectHostCmd]
(API-Job-Executor-23:ctx-fc340a8e job-148672 ctx-0061c491) Exception:
org.apache.cloudstack.api.ServerApiException: Failed to reconnect host
     at
org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd.execute(ReconnectHostCmd.java:109)
     at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
     at
com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
     at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
     at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
     at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
     at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
     at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
     at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
     at
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
     at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
     at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
     at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
     at java.lang.Thread.run(Thread.java:745)

Connecting via SSH from the management server works fine, for example:
   [root@ewcstack-man03-prod ~]# ssh -i
/var/cloudstack/management/.ssh/id_rsa root@ewcstack-vh011-prod
"/opt/cloud/bin/router_proxy.sh netusage.sh 169.254.2.103 -g"
   root@ewcstack-vh011-prod's password:
   2592:0:0:0:[root@ewcstack-man03-prod ~]#


Any idea how to solve this issue and how to track down the reason for
the failure to reconnect?

Many thanks in advance for your help.



Best Regards
Francois






--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: [email protected]
web: http://www.everyware.ch
