Hi Dag,

Yes, I did recreate the system VMs. The version is "Cloudstack release
4.11.0".

Thanks!
Chen

On Fri, Feb 23, 2018 at 9:27 AM, Dag Sonstebo <dag.sonst...@shapeblue.com>
wrote:

> Hi Chen,
>
> You say you just upgraded to 4.11 – did you destroy your system VMs and
> let them recreate after the upgrade?
>
> Can you also check what version a “cat /etc/cloudstack-release” shows up
> with on your SSVM/CPVM?
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 23/02/2018, 14:00, "Chen Zhang" <iamczh...@gmail.com> wrote:
>
>     Hello,
>
>
>     I am new to the list and I am stuck with a very annoying issue on
>     CPVM/SSVM.
>
>
>     When I start cloudstack-management, everything is fine. After around
>     3-4 hours, the agent state of the CPVM/SSVM automatically turns to
>     "Disconnected" and the secondary storage goes to "0kb/0kb", but the VM
>     state is still "running". Once I manually reboot the CPVM/SSVM, the
>     agent state turns back to "up" and the secondary storage comes back as
>     well. After another 3-4 hours, the issue repeats.
>
>
>     Here is the log when SSVM/CPVM goes down:
>
>
>     ----
>     2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl]
>     (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the following
> agents
>     behind on ping: [3]
>     2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl]
>     (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent for
>     CPVM/SSVM due to physical connection close. host: 3
>     2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is disconnecting
>     with event ShutdownRequested
>     2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status of
> agent
>     3is Disconnected, current status is Up
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering link for
> 3
>     with state Disconnected
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing Disconnect.
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq 3-906630899985023222:
>     Sending disconnect to class com.cloud.agent.manager.SynchronousListener
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.listener.StoragePoolMonitor
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.secondary.SecondaryStorageListener
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.security.SecurityGroupListener
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
> 3-906630899985023222:
>     Waiting some more time because this is the current command
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.deploy.DeploymentPlanningManagerImpl
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.SshKeysDistriMonitor
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.consoleproxy.ConsoleProxyListener
>     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
> 3-906630899985023222:
>     Waiting some more time because this is the current command
>     2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find
> exception:
>     com.cloud.exception.OperationTimedoutException in error code list for
>     exceptions
>     2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
> 3-906630899985023222:
>     Timed out on null
>     2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
> 3-906630899985023222:
>     Cancelling.
>     2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send
> command,
>     due to Agent:3, com.cloud.exception.OperationTimedoutException:
> Commands
>     906630899985023222 to Host 3 timed out after 3600
>     2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector]
>     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to
> retrieve
>     storage stats
>     com.cloud.utils.exception.CloudRuntimeException: Failed to send
> command,
>     due to Agent:3, com.cloud.exception.OperationTimedoutException:
> Commands
>     906630899985023222 to Host 3 timed out after 3600
>     at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
>     at com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:985)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener:
>     com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected called on
> 3
>     with status Disconnected
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.capacity.StorageCapacityListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.capacity.ComputeCapacityListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.SshKeysDistriMonitor
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.LocalStoragePoolListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.upload.UploadListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     listener: com.cloud.storage.download.DownloadListener
>     2018-02-21 15:57:47,527 DEBUG [c.c.h.Status]
> (AgentTaskPool-7:ctx-67ec16e3)
>     (logid:d6a36e24) Transition:[Resource state = Enabled, Agent event =
>     ShutdownRequested, Host id = 3, name = s-1-VM]
>     2018-02-21 15:57:47,620 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
>     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other nodes
> of to
>     disconnect
>     ----
>
>     When the issue arises, all instances, hosts, and other resources are
>     running fine. I just updated cloudstack-management and cloudstack-agent
>     to 4.11, but the problem is still there. Any ideas?
>
>
>     Thanks!
>
>     Chen
>
>
>
> dag.sonst...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> @shapeblue
>
>
>
>

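To see whether the disconnects really recur every 3-4 hours, it may help to pull just the "behind on ping" entries out of the management-server log and compare their timestamps. The following is only a minimal sketch: the function name find_behind_on_ping is made up for illustration, the patterns are copied from the log excerpt quoted above, and it assumes each log entry sits on a single line in the actual file (the excerpt above is wrapped by the mail client).

```python
import re
from typing import Iterable, List, Tuple

# Message text copied from the quoted AgentManagerImpl log entry.
BEHIND_ON_PING = re.compile(
    r"Found the following agents behind on ping: \[([\d,\s]+)\]"
)
# Leading timestamp as it appears in the excerpt, e.g. 2018-02-21 15:57:47,517
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def find_behind_on_ping(lines: Iterable[str]) -> List[Tuple[str, List[int]]]:
    """Return (timestamp, [host ids]) for every 'behind on ping' entry.

    Hypothetical helper, not part of CloudStack. The timestamp is an
    empty string if the line does not start with one.
    """
    events = []
    for line in lines:
        match = BEHIND_ON_PING.search(line)
        if match is None:
            continue
        stamp = TIMESTAMP.match(line)
        host_ids = [int(part) for part in match.group(1).split(",") if part.strip()]
        events.append((stamp.group(1) if stamp else "", host_ids))
    return events
```

It can be fed an open file object for the management-server log (wherever that lives on the management host; the location is not shown in this thread). Host id 3 in the output corresponds to s-1-VM in the excerpt above.
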