Hello,

I am new to the list and am stuck with a very annoying issue with the
CPVM/SSVM.


When I start cloudstack-management, everything works. After around 3-4
hours, the agent state of the CPVM/SSVM automatically turns to
"Disconnected" and the secondary storage shows "0kb/0kb", but the VM state
is still "running". After I manually reboot the CPVM/SSVM, the agent state
turns back to "up" and the secondary storage comes back as well. After
another 3-4 hours, the issue repeats.


Here is the log from when the SSVM/CPVM goes down:


----
2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl]
(AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the following agents
behind on ping: [3]
2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl]
(AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent for
CPVM/SSVM due to physical connection close. host: 3
2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is disconnecting
with event ShutdownRequested
2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status of agent
3is Disconnected, current status is Up
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering link for 3
with state Disconnected
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing Disconnect.
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq 3-906630899985023222:
Sending disconnect to class com.cloud.agent.manager.SynchronousListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.listener.StoragePoolMonitor
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.secondary.SecondaryStorageListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.security.SecurityGroupListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
Waiting some more time because this is the current command
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.deploy.DeploymentPlanningManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.SshKeysDistriMonitor
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.consoleproxy.ConsoleProxyListener
2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
Waiting some more time because this is the current command
2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find exception:
com.cloud.exception.OperationTimedoutException in error code list for
exceptions
2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
Timed out on null
2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
Cancelling.
2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send command,
due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
906630899985023222 to Host 3 timed out after 3600
2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector]
(StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to retrieve
storage stats
com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
906630899985023222 to Host 3 timed out after 3600
at
org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
at
com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:985)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener:
com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected called on 3
with status Disconnected
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.capacity.StorageCapacityListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.capacity.ComputeCapacityListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.SshKeysDistriMonitor
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.LocalStoragePoolListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.upload.UploadListener
2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
listener: com.cloud.storage.download.DownloadListener
2018-02-21 15:57:47,527 DEBUG [c.c.h.Status] (AgentTaskPool-7:ctx-67ec16e3)
(logid:d6a36e24) Transition:[Resource state = Enabled, Agent event =
ShutdownRequested, Host id = 3, name = s-1-VM]
2018-02-21 15:57:47,620 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
(AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other nodes of to
disconnect
----

When the issue arises, all instances, hosts, and other resources are
running fine. I just updated cloudstack-management and cloudstack-agent
to 4.11, but the problem is still there. Any ideas?


Thanks!

Chen
