Hi Chen,

You say you just upgraded to 4.11. Did you destroy your system VMs and let
them be recreated after the upgrade?

Can you also check which version “cat /etc/cloudstack-release” reports on
your SSVM/CPVM?
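It would also help to confirm how regular the drops are. Below is a rough Python sketch (my own helper, not something CloudStack ships) that pulls the timestamps of the "Disconnect agent for CPVM/SSVM" warnings out of the management server log and prints the gap between consecutive ones. It assumes the default log path /var/log/cloudstack/management/management-server.log (pass another file, such as a rotated log, as the first argument) and that each entry sits on a single line in the log file itself. Your excerpt below is wrapped by the mail client.

```python
import re
import sys
from datetime import datetime

# Assumption: default management server log location; override via argv[1].
DEFAULT_LOG = "/var/log/cloudstack/management/management-server.log"

# Assumption: entries start with "YYYY-MM-DD HH:MM:SS,mmm", as in the excerpt.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")

# Text of the WARN line AgentManagerImpl logged when the agent was dropped.
MARKER = "Disconnect agent for CPVM/SSVM"


def disconnect_times(lines):
    """Return timestamps of log lines reporting an SSVM/CPVM agent disconnect."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        match = TS_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def intervals_hours(times):
    """Hours elapsed between each pair of consecutive disconnects."""
    return [(later - earlier).total_seconds() / 3600
            for earlier, later in zip(times, times[1:])]


def main(path):
    try:
        with open(path, errors="replace") as log:
            times = disconnect_times(log)
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    for ts in times:
        print(ts)
    for hours in intervals_hours(times):
        print(f"{hours:.2f} h between disconnects")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG)
```

If the gaps come out close to the same value every time, that would point towards something periodic rather than random network loss, which narrows down where to look.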

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 23/02/2018, 14:00, "Chen Zhang" <iamczh...@gmail.com> wrote:

    Hello,
    
    
    I am new to the list and I am stuck with a very annoying issue on
    CPVM/SSVM.
    
    
    When I start cloudstack-management, everything is fine. After around 3-4
    hours, the agent state of CPVM/SSVM automatically changes to "Disconnected"
    and the secondary storage shows "0kb/0kb", but the VM state is still
    "Running". Once I manually reboot CPVM/SSVM, the agent state goes back to
    "Up" and the secondary storage returns as well. After another 3-4 hours,
    the issue repeats.
    
    
    Here is the log from when the SSVM/CPVM goes down:
    
    
    ----
    2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl]
    (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the following agents
    behind on ping: [3]
    2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl]
    (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent for
    CPVM/SSVM due to physical connection close. host: 3
    2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is disconnecting
    with event ShutdownRequested
    2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status of agent
    3is Disconnected, current status is Up
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering link for 3
    with state Disconnected
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing Disconnect.
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq 3-906630899985023222:
    Sending disconnect to class com.cloud.agent.manager.SynchronousListener
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.listener.StoragePoolMonitor
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.secondary.SecondaryStorageListener
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.security.SecurityGroupListener
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
    Waiting some more time because this is the current command
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.deploy.DeploymentPlanningManagerImpl
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.SshKeysDistriMonitor
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.consoleproxy.ConsoleProxyListener
    2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
    Waiting some more time because this is the current command
    2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find exception:
    com.cloud.exception.OperationTimedoutException in error code list for
    exceptions
    2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
    Timed out on null
    2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222:
    Cancelling.
    2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send command,
    due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
    906630899985023222 to Host 3 timed out after 3600
    2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector]
    (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to retrieve
    storage stats
    com.cloud.utils.exception.CloudRuntimeException: Failed to send command,
    due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands
    906630899985023222 to Host 3 timed out after 3600
    at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
    at com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:985)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener:
    com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
    2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected called on 3
    with status Disconnected
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.capacity.StorageCapacityListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.capacity.ComputeCapacityListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.SshKeysDistriMonitor
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.LocalStoragePoolListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.upload.UploadListener
    2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
    listener: com.cloud.storage.download.DownloadListener
    2018-02-21 15:57:47,527 DEBUG [c.c.h.Status] (AgentTaskPool-7:ctx-67ec16e3)
    (logid:d6a36e24) Transition:[Resource state = Enabled, Agent event =
    ShutdownRequested, Host id = 3, name = s-1-VM]
    2018-02-21 15:57:47,620 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
    (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other nodes of to
    disconnect
    ----
    
    When the issue arises, all instances, hosts, and other resources are
    running fine. I just upgraded cloudstack-management and cloudstack-agent
    to 4.11, but the problem persists. Any ideas?
    
    
    Thanks!
    
    Chen
    


dag.sonst...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS, UK
@shapeblue
  
 
