Hi Chen,

You say you just upgraded to 4.11 – did you destroy your system VMs and let them recreate after the upgrade?
Can you also check which version “cat /etc/cloudstack-release” reports on your SSVM/CPVM?

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 23/02/2018, 14:00, "Chen Zhang" <[email protected]> wrote:

Hello,

I am new to the list and I am stuck with a very annoying issue on the CPVM/SSVM. When I start cloudstack-management, everything is fine. After around 3-4 hours, the agent state of the CPVM/SSVM automatically turns to "Disconnected" and the secondary storage goes to "0kb/0kb", but the VM state is still "running". Once I manually reboot the CPVM/SSVM, the agent state turns back to "up" and the secondary storage comes back as well. After 3-4 hours, the issue repeats.

Here is the log when the SSVM/CPVM goes down:

----
2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the following agents behind on ping: [3]
2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent for CPVM/SSVM due to physical connection close. host: 3
2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is disconnecting with event ShutdownRequested
2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status of agent 3is Disconnected, current status is Up
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering link for 3 with state Disconnected
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing Disconnect.
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq 3-906630899985023222: Sending disconnect to class com.cloud.agent.manager.SynchronousListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222: Waiting some more time because this is the current command
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222: Waiting some more time because this is the current command
2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222: Timed out on null
2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222: Cancelling.
2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send command, due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands 906630899985023222 to Host 3 timed out after 3600
2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to retrieve storage stats
com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands 906630899985023222 to Host 3 timed out after 3600
    at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
    at com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:985)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected called on 3 with status Disconnected
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.capacity.StorageCapacityListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.capacity.ComputeCapacityListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.upload.UploadListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.download.DownloadListener
2018-02-21 15:57:47,527 DEBUG [c.c.h.Status] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Transition:[Resource state = Enabled, Agent event = ShutdownRequested, Host id = 3, name = s-1-VM]
2018-02-21 15:57:47,620 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other nodes of to disconnect
----

When the issue arises, all instances, hosts, and other resources are running fine. I just updated cloudstack-management and cloudstack-agent to 4.11, but the problem is still there.

Any ideas?

Thanks!
Chen

[email protected]
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue
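
One way to check whether the disconnects really recur every 3-4 hours is to pull the "behind on ping" entries out of the management-server log and compare their timestamps. The following is only a minimal sketch, assuming every entry follows the layout seen in the excerpt above; the regular expressions and the function name `behind_on_ping` are illustrative and not part of CloudStack.

```python
import re
from datetime import datetime

# Assumed layout of one log entry, taken from the excerpt in this thread:
# "<date> <time>,<ms> <LEVEL> [<logger>] (<thread>) (logid:<id>) <message>"
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+\(logid:(?P<logid>[0-9a-f]+)\)\s+"
    r"(?P<msg>.*)"
)
# Message text copied from the AgentMonitor entry in the excerpt.
BEHIND = re.compile(r"Found the following agents behind on ping: \[(?P<ids>[^\]]*)\]")


def behind_on_ping(lines):
    """Yield (timestamp, [host ids]) for each 'behind on ping' entry."""
    for line in lines:
        entry = ENTRY.match(line.strip())
        if not entry:
            continue  # stack-trace lines and anything else that is not an entry
        hit = BEHIND.search(entry.group("msg"))
        if hit:
            ts = datetime.strptime(entry.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            ids = [int(x) for x in hit.group("ids").split(",") if x.strip()]
            yield ts, ids


# Two entries copied from the excerpt; only the first should match.
sample = [
    "2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-81471e1e) "
    "(logid:d0bdac05) Found the following agents behind on ping: [3]",
    "2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentTaskPool-7:ctx-67ec16e3) "
    "(logid:d6a36e24) Processing Disconnect.",
]
events = list(behind_on_ping(sample))
for ts, ids in events:
    print(ts.isoformat(), ids)
```

To use it on a real log, pass an open file object instead of `sample`; the gaps between consecutive timestamps for the same host id should then show whether the interval is stable.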
