Hi Dag,

Yes, the VR is online all the time. I checked cloud.log inside the
system VM, and here is the problem:

2018-02-27 09:01:23,345 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-5:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:02:23,650 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-2:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:03:23,915 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-4:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:04:24,240 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-3:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:05:24,507 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-1:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:06:24,773 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-5:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:07:25,296 INFO
[storage.resource.NfsSecondaryStorageResource]
(agentRequest-Handler-2:null) Determined host 192.168.1.101 corresponds to
IP 192.168.1.101
2018-02-27 09:09:57,210 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Lost connection to the server. Dealing with the remaining commands...
2018-02-27 09:09:57,218 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
NioClient connection closed
2018-02-27 09:09:57,218 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Reconnecting to host:129.*.*.*
2018-02-27 09:09:57,219 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
Connecting to 129.*.*.*:8250
2018-02-27 09:09:57,228 ERROR [utils.nio.NioConnection]
(Agent-Handler-2:null) Unable to initialize the threads.
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:454)
at sun.nio.ch.Net.connect(Net.java:446)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at com.cloud.utils.nio.NioClient.init(NioClient.java:56)
at com.cloud.utils.nio.NioConnection.start(NioConnection.java:95)
at com.cloud.agent.Agent.reconnect(Agent.java:442)
at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1014)
at com.cloud.utils.nio.Task.call(Task.java:83)
at com.cloud.utils.nio.Task.call(Task.java:29)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-02-27 09:09:57,249 INFO  [utils.exception.CSExceptionErrorCode]
(Agent-Handler-2:null) Could not find exception:
com.cloud.utils.exception.NioConnectionException in error code list for
exceptions
2018-02-27 09:09:57,250 WARN  [cloud.agent.Agent] (Agent-Handler-2:null)
NIO Connection Exception  com.cloud.utils.exception.NioConnectionException:
No route to host
2018-02-27 09:09:57,250 INFO  [cloud.agent.Agent] (Agent-Handler-2:null)
Attempted to connect to the server, but received an unexpected exception,
trying again...
2018-02-27 09:09:57,250 INFO  [utils.nio.NioClient] (Agent-Handler-2:null)
NioClient connection closed
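
To see which address the agent tries on each reconnect, the targets can be pulled out of cloud.log. This is only a minimal sketch: it assumes every log entry sits on a single line in the real file (the excerpt above is wrapped by the mail client), and the function name connect_targets is made up for illustration.

```python
import re

# Matches the NioClient entry that names the reconnect target, for example
# "... (Agent-Handler-2:null) Connecting to 129.*.*.*:8250".
# "Reconnecting to host:..." entries are deliberately not matched.
CONNECT_RE = re.compile(r"Connecting to (?P<host>\S+):(?P<port>\d+)\s*$")


def connect_targets(lines):
    """Return (timestamp, host, port) for every 'Connecting to' log entry."""
    targets = []
    for line in lines:
        match = CONNECT_RE.search(line)
        if match:
            # The first 23 characters hold "YYYY-MM-DD HH:MM:SS,mmm".
            timestamp = line[:23]
            targets.append((timestamp, match.group("host"), int(match.group("port"))))
    return targets
```

Feeding it the lines of cloud.log (for example via open(path).read().splitlines()) would list every address the agent attempted, which should show when the switch from the local IP to the public IP happens.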

The management server has two IPs: the public IP (129.*.*.*) and a local IP
(192.168.1.101). Port 8250 is blocked on the public IP, so it cannot be
reached there. I use the local IP as the cluster node IP and host IP in all
agents, so I do not understand why the system VM keeps suddenly dropping its
connection to the local IP and trying to reconnect to the public IP. Is
there any way to pin it to the local IP?
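
The reachability of port 8250 on each address can be checked from inside the system VM. What follows is a rough sketch using only the Python standard library; the helper name can_connect and the 3-second timeout are assumptions, not anything taken from CloudStack.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "No route to host".
        return False


# Example call, using the local management IP from this thread:
#   can_connect("192.168.1.101", 8250)
```

If the local IP answers and the public IP does not, that would match the NoRouteToHostException in the log above.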

Thanks!
Chen

On Fri, Feb 23, 2018 at 11:01 AM, Dag Sonstebo <dag.sonst...@shapeblue.com>
wrote:

> Do VRs stay online and connected?
>
> What you need to do next is check your cloud.log on the system VMs,
> possibly also up the verbosity level in the logs to catch why they are
> dropping comms.
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 23/02/2018, 15:25, "Chen Zhang" <iamczh...@gmail.com> wrote:
>
>     Hi Dag,
>
>     Yes I did recreate the new system VMs. The version is "Cloudstack
>     release 4.11.0".
>
>     Thanks!
>     Chen
>
>     On Fri, Feb 23, 2018 at 9:27 AM, Dag Sonstebo <dag.sonst...@shapeblue.com>
>     wrote:
>
>     > Hi Chen,
>     >
>     > You say you just upgraded to 4.11 – did you destroy your system VMs
>     > and let them recreate after the upgrade?
>     >
>     > Can you also check what version a “cat /etc/cloudstack-release”
>     > shows up with on your SSVM/CPVM?
>     >
>     > Regards,
>     > Dag Sonstebo
>     > Cloud Architect
>     > ShapeBlue
>     >
>     > On 23/02/2018, 14:00, "Chen Zhang" <iamczh...@gmail.com> wrote:
>     >
>     >     Hello,
>     >
>     >
>     >     I am new to the list and I am stuck with a very annoying issue on
>     >     CPVM/SSVM.
>     >
>     >
>     >     When I start cloudstack-management, everything is fine. After
>     >     around 3-4 hours, the agent state of CPVM/SSVM automatically
>     >     turns to "Disconnected" and the secondary storage goes to
>     >     "0kb/0kb", but the VM state is still "running". Once I manually
>     >     reboot the CPVM/SSVM, the agent state turns back to "up" and the
>     >     secondary storage comes back as well. After 3-4 hours, the issue
>     >     repeats.
>     >
>     >
>     >     Here is the log when SSVM/CPVM goes down:
>     >
>     >
>     >     ----
>     >     2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl]
>     >     (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the
> following
>     > agents
>     >     behind on ping: [3]
>     >     2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl]
>     >     (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent
> for
>     >     CPVM/SSVM due to physical connection close. host: 3
>     >     2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is
> disconnecting
>     >     with event ShutdownRequested
>     >     2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status
> of
>     > agent
>     >     3is Disconnected, current status is Up
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering
> link for
>     > 3
>     >     with state Disconnected
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing
> Disconnect.
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq
>     > 3-906630899985023222:
>     >     Sending disconnect to class com.cloud.agent.manager.
>     > SynchronousListener
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.hypervisor.xenserver.discoverer.
>     > XcpServerDiscoverer
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.hypervisor.hyperv.discoverer.
>     > HypervServerDiscoverer
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.storage.listener.StoragePoolMonitor
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: org.apache.cloudstack.engine.orchestration.
>     > NetworkOrchestrator
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.storage.secondary.SecondaryStorageListener
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.network.security.SecurityGroupListener
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     >     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
>     > 3-906630899985023222:
>     >     Waiting some more time because this is the current command
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.deploy.DeploymentPlanningManagerImpl
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.network.SshKeysDistriMonitor
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.network.router.
> VirtualNetworkApplianceManagerImpl
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.consoleproxy.ConsoleProxyListener
>     >     2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache]
>     >     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
>     > 3-906630899985023222:
>     >     Waiting some more time because this is the current command
>     >     2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode]
>     >     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find
>     > exception:
>     >     com.cloud.exception.OperationTimedoutException in error code
> list for
>     >     exceptions
>     >     2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache]
>     >     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
>     > 3-906630899985023222:
>     >     Timed out on null
>     >     2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache]
>     >     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq
>     > 3-906630899985023222:
>     >     Cancelling.
>     >     2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint]
>     >     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send
>     > command,
>     >     due to Agent:3, com.cloud.exception.OperationTimedoutException:
>     > Commands
>     >     906630899985023222 to Host 3 timed out after 3600
>     >     2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector]
>     >     (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to
>     > retrieve
>     >     storage stats
>     >     com.cloud.utils.exception.CloudRuntimeException: Failed to send
>     > command,
>     >     due to Agent:3, com.cloud.exception.OperationTimedoutException:
>     > Commands
>     >     906630899985023222 to Host 3 timed out after 3600
>     >     at
>     >     org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(
>     > RemoteHostEndPoint.java:133)
>     >     at
>     >     com.cloud.server.StatsCollector$StorageCollector.runInContext(
>     > StatsCollector.java:985)
>     >     at
>     >     org.apache.cloudstack.managed.context.
> ManagedContextRunnable$1.run(
>     > ManagedContextRunnable.java:49)
>     >     at
>     >     org.apache.cloudstack.managed.context.impl.
>     > DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     >     at
>     >     org.apache.cloudstack.managed.context.impl.
> DefaultManagedContext.
>     > callWithContext(DefaultManagedContext.java:103)
>     >     at
>     >     org.apache.cloudstack.managed.context.impl.
> DefaultManagedContext.
>     > runWithContext(DefaultManagedContext.java:53)
>     >     at
>     >     org.apache.cloudstack.managed.context.
> ManagedContextRunnable.run(
>     > ManagedContextRunnable.java:46)
>     >     at java.util.concurrent.Executors$RunnableAdapter.
>     > call(Executors.java:511)
>     >     at java.util.concurrent.FutureTask.runAndReset(
> FutureTask.java:308)
>     >     at
>     >     java.util.concurrent.ScheduledThreadPoolExecutor$
>     > ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     >     at
>     >     java.util.concurrent.ScheduledThreadPoolExecutor$
>     > ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     >     at
>     >     java.util.concurrent.ThreadPoolExecutor.runWorker(
>     > ThreadPoolExecutor.java:1149)
>     >     at
>     >     java.util.concurrent.ThreadPoolExecutor$Worker.run(
>     > ThreadPoolExecutor.java:624)
>     >     at java.lang.Thread.run(Thread.java:748)
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener:
>     >     com.cloud.network.NetworkUsageManagerImpl$
> DirectNetworkStatsListener
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected
> called on
>     > 3
>     >     with status Disconnected
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.agent.manager.AgentManagerImpl$
>     > BehindOnPingListener
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.agent.manager.AgentManagerImpl$
>     > SetHostParamsListener
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.capacity.StorageCapacityListener
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.capacity.ComputeCapacityListener
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.network.SshKeysDistriMonitor
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to
>     >     listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.storage.LocalStoragePoolListener
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.storage.upload.UploadListener
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending
> Disconnect to
>     >     listener: com.cloud.storage.download.DownloadListener
>     >     2018-02-21 15:57:47,527 DEBUG [c.c.h.Status]
>     > (AgentTaskPool-7:ctx-67ec16e3)
>     >     (logid:d6a36e24) Transition:[Resource state = Enabled, Agent
> event =
>     >     ShutdownRequested, Host id = 3, name = s-1-VM]
>     >     2018-02-21 15:57:47,620 DEBUG [c.c.a.m.
> ClusteredAgentManagerImpl]
>     >     (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other
> nodes
>     > of to
>     >     disconnect
>     >     ----
>     >
>     >     When the issue arises, all instances, hosts, and other resources are
>     >     running fine. I just updated cloudstack-management and
>     >     cloudstack-agent to 4.11, but the problem is still there. Any ideas?
>     >
>     >
>     >     Thanks!
>     >
>     >     Chen
>     >
>     >
>     >
>     > dag.sonst...@shapeblue.com
>     > www.shapeblue.com
>     > 53 Chandos Place, Covent Garden, London  WC2N 4HS UK
>     > @shapeblue
>     >
>     >
>     >
>     >
