Dear CS contributors,

We use CloudStack 4.5.1 with 3 clusters running XenServer 6.5.

One host in one cluster (and another host in a different cluster) went into the "Disconnected" state and has stayed there.

We tried to unmanage and re-manage the cluster to force a reconnection, destroyed all system VMs (console proxy and secondary storage VMs), and restarted all management servers. We verified on the XenServer host that it is not disabled, and we restarted the XenServer toolstack. We also updated the host table to set a mgmt_server_id:
  update host set status="Up",resource_state="Disabled",mgmt_server_id="345049103441" where id=15;
Then we restarted the management servers and the system VMs again.
Finally, we updated the table once more, this time clearing mgmt_server_id:
  update host set status="Alert",resource_state="Disabled",mgmt_server_id=NULL where id=15;
Then we restarted the management servers and the system VMs again.
Nothing helped; the host still does not reconnect.
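
The Force Reconnect action corresponds to the reconnectHost API command (it appears as command=reconnectHost in the log). To retry it from a script rather than the UI, the request can be signed with an API key pair. Below is a minimal Python sketch, assuming the usual CloudStack signing scheme (URL-encode each value, sort by parameter name, lower-case the whole query, HMAC-SHA1 with the secret key, Base64, then URL-encode). The function name signed_url, the endpoint host and both keys are hypothetical placeholders; only the host UUID comes from the log.

```python
import base64
import hashlib
import hmac
import urllib.parse


def signed_url(endpoint, params, api_key, secret_key):
    """Build a signed CloudStack API URL (sketch of the HMAC-SHA1 scheme)."""
    params = dict(params, apikey=api_key, response="json")
    # URL-encode every value and sort the pairs by parameter name.
    query = "&".join(
        f"{name}={urllib.parse.quote(str(value), safe='')}"
        for name, value in sorted(params.items())
    )
    # Sign the lower-cased query string, then Base64- and URL-encode the digest.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{endpoint}?{query}&signature={signature}"


# Hypothetical endpoint and keys; the host UUID is the one from the reconnect log.
url = signed_url(
    "http://mgmt.example.com:8080/client/api",
    {"command": "reconnectHost", "id": "3490cfa0-b2a7-4a12-aa5e-7e351ce9df00"},
    api_key="EXAMPLE_API_KEY",
    secret_key="EXAMPLE_SECRET_KEY",
)
print(url)
```

Issuing the resulting URL (e.g. with curl) should start the same async job as the UI button, so its outcome can be followed in the management server log.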

Calling ForceReconnect (API command reconnectHost) fails with the following error:

2016-07-18 11:26:07,418 DEBUG [c.c.a.ApiServlet] (catalina-exec-13:ctx-4e82fdce) ===START=== 192.168.252.77 -- GET command=reconnectHost&id=3490cfa0-b2a7-4a12-aa5e-7e351ce9df00&response=json&sessionkey=Tnc9l6aaSvc8J5SNy3Z71FLXgEI%3D&_=1468833953948
2016-07-18 11:26:07,450 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-23:ctx-fc340a8e job-148672) Add job-148672 into job monitoring
2016-07-18 11:26:07,453 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (catalina-exec-13:ctx-4e82fdce ctx-9c696de2) submit async job-148672, details: AsyncJobVO {id:148672, userId: 51, accountId: 51, instanceType: Host, instanceId: 15, cmd: org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd, cmdInfo: {"id":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","response":"json","sessionkey":"Tnc9l6aaSvc8J5SNy3Z71FLXgEI\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00\"}","cmdEventType":"HOST.RECONNECT","ctxUserId":"51","httpmethod":"GET","_":"1468833953948","uuid":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","ctxAccountId":"51","ctxStartEventId":"18026840"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049098122, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2016-07-18 11:26:07,454 DEBUG [c.c.a.ApiServlet] (catalina-exec-13:ctx-4e82fdce ctx-9c696de2) ===END=== 192.168.252.77 -- GET command=reconnectHost&id=3490cfa0-b2a7-4a12-aa5e-7e351ce9df00&response=json&sessionkey=Tnc9l6aaSvc8J5SNy3Z71FLXgEI%3D&_=1468833953948
2016-07-18 11:26:07,455 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-23:ctx-fc340a8e job-148672) Executing AsyncJobVO {id:148672, userId: 51, accountId: 51, instanceType: Host, instanceId: 15, cmd: org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd, cmdInfo: {"id":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","response":"json","sessionkey":"Tnc9l6aaSvc8J5SNy3Z71FLXgEI\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00\"}","cmdEventType":"HOST.RECONNECT","ctxUserId":"51","httpmethod":"GET","_":"1468833953948","uuid":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","ctxAccountId":"51","ctxStartEventId":"18026840"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049098122, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2016-07-18 11:26:07,461 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-495:ctx-77e68e88) Seq 213-6743858967010618892: Executing request
2016-07-18 11:26:07,467 INFO [c.c.a.m.AgentManagerImpl] (API-Job-Executor-23:ctx-fc340a8e job-148672 ctx-0061c491) Unable to disconnect host because it is not connected to this server: 15
2016-07-18 11:26:07,467 WARN [o.a.c.a.c.a.h.ReconnectHostCmd] (API-Job-Executor-23:ctx-fc340a8e job-148672 ctx-0061c491) Exception:
org.apache.cloudstack.api.ServerApiException: Failed to reconnect host
    at org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd.execute(ReconnectHostCmd.java:109)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
    at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
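
To narrow a busy management server log down to the failing reconnect, the entries belonging to one async job can be pulled out by their job tag (job-148672 in the log above). Below is a minimal Python 3.7+ sketch, assuming every entry starts with a timestamp in the format shown and that the log level is the third whitespace-separated field; entries_for_job is a hypothetical helper, not part of CloudStack.

```python
import re
from typing import Iterable, List

# Every log entry starts with a timestamp such as "2016-07-18 11:26:07,467 ".
ENTRY_START = re.compile(r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} )")


def entries_for_job(log_text: str, job_id: int,
                    levels: Iterable[str] = ("INFO", "WARN", "ERROR")) -> List[str]:
    """Return the entries tagged job-<job_id> whose level is in `levels`."""
    tag = re.compile(rf"\bjob-{job_id}\b")
    wanted = set(levels)
    matches = []
    # Splitting on a zero-width lookahead keeps multi-line entries
    # (such as a WARN followed by its stack trace) in one piece.
    for entry in ENTRY_START.split(log_text):
        entry = " ".join(entry.split())  # collapse line breaks and indentation
        fields = entry.split(" ", 3)     # date, time, LEVEL, rest
        if len(fields) >= 3 and fields[2] in wanted and tag.search(entry):
            matches.append(entry)
    return matches
```

Applied to the contents of /var/log/cloudstack/management/management-server.log with job id 148672, this should keep only the "Add job-148672 into job monitoring" line, the AgentManagerImpl message "Unable to disconnect host because it is not connected to this server: 15", and the WARN entry with its stack trace, dropping the DEBUG noise.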

Connecting via SSH from the management server works fine, for example:
[root@ewcstack-man03-prod ~]# ssh -i /var/cloudstack/management/.ssh/id_rsa root@ewcstack-vh011-prod "/opt/cloud/bin/router_proxy.sh netusage.sh 169.254.2.103 -g"
  root@ewcstack-vh011-prod's password:
  2592:0:0:0:[root@ewcstack-man03-prod ~]#

Any idea how to solve this issue, and how to track down why the reconnect fails?

Many thanks in advance for your help.

Best Regards
Francois

--

EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: [email protected]
web: http://www.everyware.ch
