Dear CS contributors,
We use CS 4.5.1 on 3 clusters with XenServer 6.5.
One host in one cluster (and another host in a second cluster) went into
status "Disconnected" and has stayed there.
We tried to unmanage/remanage the cluster to force a reconnection. We
also destroyed all system VMs (console proxy and secondary storage
VMs) and restarted all management servers.
We verified on the XenServer host that it is not disabled, and we
restarted the Xen toolstack.
We also updated the host table to set a mgmt_server_id:
update host set status="Up",resource_state="Disabled",mgmt_server_id="345049103441" where id=15;
Then we restarted the management servers again, and the system VMs as well.
Finally, we updated the table again to clear mgmt_server_id:
update host set status="Alert",resource_state="Disabled",mgmt_server_id=NULL where id=15;
Then we restarted the management servers and the system VMs once more.
Nothing helped; the host still does not reconnect.
Calling ForceReconnect shows this error:
2016-07-18 11:26:07,418 DEBUG [c.c.a.ApiServlet] (catalina-exec-13:ctx-4e82fdce) ===START=== 192.168.252.77 -- GET command=reconnectHost&id=3490cfa0-b2a7-4a12-aa5e-7e351ce9df00&response=json&sessionkey=Tnc9l6aaSvc8J5SNy3Z71FLXgEI%3D&_=1468833953948
2016-07-18 11:26:07,450 INFO [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-23:ctx-fc340a8e job-148672) Add job-148672 into job monitoring
2016-07-18 11:26:07,453 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (catalina-exec-13:ctx-4e82fdce ctx-9c696de2) submit async job-148672, details: AsyncJobVO {id:148672, userId: 51, accountId: 51, instanceType: Host, instanceId: 15, cmd: org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd, cmdInfo: {"id":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","response":"json","sessionkey":"Tnc9l6aaSvc8J5SNy3Z71FLXgEI\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00\"}","cmdEventType":"HOST.RECONNECT","ctxUserId":"51","httpmethod":"GET","_":"1468833953948","uuid":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","ctxAccountId":"51","ctxStartEventId":"18026840"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049098122, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2016-07-18 11:26:07,454 DEBUG [c.c.a.ApiServlet] (catalina-exec-13:ctx-4e82fdce ctx-9c696de2) ===END=== 192.168.252.77 -- GET command=reconnectHost&id=3490cfa0-b2a7-4a12-aa5e-7e351ce9df00&response=json&sessionkey=Tnc9l6aaSvc8J5SNy3Z71FLXgEI%3D&_=1468833953948
2016-07-18 11:26:07,455 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-23:ctx-fc340a8e job-148672) Executing AsyncJobVO {id:148672, userId: 51, accountId: 51, instanceType: Host, instanceId: 15, cmd: org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd, cmdInfo: {"id":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","response":"json","sessionkey":"Tnc9l6aaSvc8J5SNy3Z71FLXgEI\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00\"}","cmdEventType":"HOST.RECONNECT","ctxUserId":"51","httpmethod":"GET","_":"1468833953948","uuid":"3490cfa0-b2a7-4a12-aa5e-7e351ce9df00","ctxAccountId":"51","ctxStartEventId":"18026840"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 345049098122, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2016-07-18 11:26:07,461 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-495:ctx-77e68e88) Seq 213-6743858967010618892: Executing request
2016-07-18 11:26:07,467 INFO [c.c.a.m.AgentManagerImpl] (API-Job-Executor-23:ctx-fc340a8e job-148672 ctx-0061c491) Unable to disconnect host because it is not connected to this server: 15
2016-07-18 11:26:07,467 WARN [o.a.c.a.c.a.h.ReconnectHostCmd] (API-Job-Executor-23:ctx-fc340a8e job-148672 ctx-0061c491) Exception:
org.apache.cloudstack.api.ServerApiException: Failed to reconnect host
    at org.apache.cloudstack.api.command.admin.host.ReconnectHostCmd.execute(ReconnectHostCmd.java:109)
    at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
    at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:537)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:494)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
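
In case it helps anyone looking at a similar log, here is a minimal sketch of how the entries for a single async job (job-148672 above) could be pulled out of a management server log. It is only an illustration: the function name entries_for_job and the shortened sample list are hypothetical, and it assumes every log entry starts with a timestamp in the format shown above, with any stack trace lines following directly after it.

```python
import re

# Assumption: an entry starts with a timestamp such as "2016-07-18 11:26:07,418".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def entries_for_job(lines, job_id):
    """Group lines into log entries and return those mentioning job-<job_id>.

    Lines without a leading timestamp (for example stack trace lines) are
    attached to the entry that precedes them.
    """
    job_pattern = re.compile(r"\bjob-%d\b" % job_id)
    entries, current = [], []
    for line in lines:
        # A new timestamp closes the entry collected so far.
        if ENTRY_START.match(line) and current:
            entries.append("\n".join(current))
            current = []
        current.append(line.rstrip("\n"))
    if current:
        entries.append("\n".join(current))
    return [entry for entry in entries if job_pattern.search(entry)]


# Shortened sample taken from the log excerpt above (hypothetical selection).
sample = [
    "2016-07-18 11:26:07,461 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-495:ctx-77e68e88) Seq 213-6743858967010618892: Executing request",
    "2016-07-18 11:26:07,467 INFO [c.c.a.m.AgentManagerImpl] (API-Job-Executor-23:ctx-fc340a8e job-148672 ctx-0061c491) Unable to disconnect host because it is not connected to this server: 15",
    "2016-07-18 11:26:07,467 WARN [o.a.c.a.c.a.h.ReconnectHostCmd] (API-Job-Executor-23:ctx-fc340a8e job-148672 ctx-0061c491) Exception:",
    "org.apache.cloudstack.api.ServerApiException: Failed to reconnect host",
]

for entry in entries_for_job(sample, 148672):
    print(entry)
    print("-" * 40)
```

To run it against a real log, the sample list would be replaced by the lines read from the log file; the word boundary in the pattern keeps job-148672 from also matching longer job numbers that merely start with the same digits.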
Connecting via SSH from the management server is fine, for example:
[root@ewcstack-man03-prod ~]# ssh -i /var/cloudstack/management/.ssh/id_rsa root@ewcstack-vh011-prod "/opt/cloud/bin/router_proxy.sh netusage.sh 169.254.2.103 -g"
root@ewcstack-vh011-prod's password:
2592:0:0:0:
[root@ewcstack-man03-prod ~]#
Any idea how to solve this issue, and how to track down the reason for
the failure to reconnect?
Many thanks in advance for your help.
Best Regards
Francois
--
EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich
tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: [email protected]
web: http://www.everyware.ch