Hi Nick,

This might not be relevant for Xen, but we've had problems with memory leaks on 
the VRs on VMware when balloon memory was enabled.

A while ago, we built a custom router monitoring setup via SSH for our 
environment, because CloudStack doesn't give us enough information about router 
status. This caused the VR kernel to leak memory, and the router to reboot 
suddenly when memory was used up.

The issue was fixed by several memory management optimisations on the system VM 
template (done by René, Rohit and Angus, if I remember correctly) and by 
setting an OS type that would cause VMware to completely disable balloon memory.

It's possible that you have a similar issue. Can you monitor the affected VRs 
for a while and check whether the reboots are caused by a memory leak?
We still see memory usage creep up slowly, but once a critical threshold (~98%) 
is reached, the kernel reclaims it.
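
A minimal sketch of the kind of check suggested above, in Python. It assumes it runs inside the VR (for example from cron) and reads the standard Linux /proc/meminfo file. It uses MemAvailable, so reclaimable page cache is counted as free, and a steady climb in the reported figure points more towards a leak than towards normal cache growth. The 98% threshold is taken from the observation above, not from any CloudStack setting, and the function names are illustrative. Keep the polling infrequent: in our case the SSH-based monitoring was itself what triggered the leak.

```python
import os
import time

# Usage level (~98%) at which we saw the kernel start reclaiming memory.
CRITICAL_PERCENT = 98.0


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def used_percent(meminfo):
    """Percentage of RAM in use, treating reclaimable cache as available."""
    total = meminfo["MemTotal"]
    if "MemAvailable" in meminfo:
        available = meminfo["MemAvailable"]
    else:
        # Kernels older than 3.14 lack MemAvailable; approximate it.
        available = (meminfo.get("MemFree", 0) + meminfo.get("Buffers", 0)
                     + meminfo.get("Cached", 0))
    return 100.0 * (total - available) / total


def report(path="/proc/meminfo"):
    """Print one timestamped usage line; str.format keeps it Python 3.5 friendly."""
    with open(path) as f:
        pct = used_percent(parse_meminfo(f.read()))
    flag = " CRITICAL" if pct >= CRITICAL_PERCENT else ""
    print("{} mem_used={:.1f}%{}".format(
        time.strftime("%Y-%m-%d %H:%M:%S"), pct, flag))


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    report()
```

Appending these lines to a file every few minutes and comparing them with the reboot times should show whether memory was close to exhaustion just before each reboot.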

Regards,
Gregor
________________________________
From: Nick Thompson <nick.thomp...@neos.co.nz>
Sent: 06 August 2019 06:01
To: 'users@cloudstack.apache.org' <users@cloudstack.apache.org>
Subject: RE: Virtual routers randomly rebooting

Hey,

Thanks Andrija,

From what I have read, I didn't need to add the hypervisor mapping for 7.5, as 
CloudStack only looks at the version number rather than the name of the 
hypervisor at this stage (e.g. XenServer vs XCP-ng). Also, the issue was 
already happening on XenServer 6.5, and CloudStack isn't having any problems 
controlling XCP-ng.

From what I have seen so far, only some VPC routers are randomly rebooting (on 
different hosts, too); the secondary storage and console proxy VMs and the 
routers for standard networks seem to be fine (however, I have a lot more VPCs 
than any other network type). I'm not sure whether the VM is rebooting itself 
or whether the Management Server is having an issue communicating with the VM 
and is therefore shutting it down and restarting it.
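
One way to tell these two cases apart is to check management-server.log for a forced stop issued by the Management Server itself. The Python sketch below is a hypothetical helper, not part of CloudStack. It assumes the default log path from the original message and unwrapped log lines (one entry per line), and pairs each "Error while collecting network stats from router" warning with any later StopCommand carrying "forceStop":true for the same router under the same logid, as happens for r-2280-VM in the quoted log further down. A match only shows that the Management Server sent the forced stop within the same job; it does not show why that job was started, and the stats error may be a symptom rather than the cause.

```python
import os
import re
import sys

# Patterns matching the log lines quoted in the original message.
STATS_ERROR = re.compile(
    r"\(logid:(?P<logid>\w+)\) Error while collecting network stats from router: (?P<vm>\S+)")
LOGID = re.compile(r"\(logid:(\w+)\)")
VM_NAME = re.compile(r'"vmName":"([^"]+)"')
FORCE_STOP = re.compile(r'"forceStop":true')

DEFAULT_LOG = "/var/log/cloudstack/management/management-server.log"


def forced_stops_after_stats_errors(lines):
    """Return (router, logid) pairs where a network-stats error was later
    followed by a forced StopCommand for the same router and logid."""
    errors = set()  # (router, logid) pairs that logged a stats error
    found = []
    for line in lines:
        m = STATS_ERROR.search(line)
        if m:
            errors.add((m.group("vm"), m.group("logid")))
            continue
        if "StopCommand" in line and FORCE_STOP.search(line):
            vm, logid = VM_NAME.search(line), LOGID.search(line)
            if vm and logid:
                key = (vm.group(1), logid.group(1))
                # "Sending" and "Executing" lines repeat the command; report once.
                if key in errors and key not in found:
                    found.append(key)
    return found


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG
    if os.path.exists(path):
        with open(path, errors="replace") as f:
            for vm, logid in forced_stops_after_stats_errors(f):
                print("{}: forced stop sent by management server (logid {})".format(vm, logid))
```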

Is there a way to disable the checks on VPCs so that CloudStack doesn't try to 
restart the router VM? I have found and tried 
network.router.EnableServiceMonitoring = NO, but the documentation notes that 
VPC networks are not supported.

Any suggestions on what I could try or look into would be greatly appreciated.

Regards,
Nick Thompson


-----Original Message-----
From: Andrija Panic [mailto:andrija.pa...@gmail.com]
Sent: Wednesday, 17 July 2019 6:24 a.m.
To: users <users@cloudstack.apache.org>; Rohit Yadav <rohit.ya...@shapeblue.com>
Subject: Re: Virtual routers randomly rebooting

Have you added the OS/hypervisor mappings in the DB? I vaguely remember 7.5 
lacking the needed mapping and being treated as 6.5, so a manual fix was 
needed.

Perhaps Rohit can shed some light?

Anyway, a full log would be great (pastebin or another online service, please).

Regards

On Tue, 16 Jul 2019, 00:59 Nick Thompson, <nick.thomp...@neos.co.nz> wrote:

> Hey,
>
> Since we upgraded to the 4.11 branch (currently 4.11.3) and virtual
> routers have become HVM on XenServer/XCP-ng, we have had problems with
> the virtual routers randomly rebooting themselves. We still have some
> running in the older paravirtualized mode and they seem to be fine (it
> may be that the Management Server can't communicate with these virtual
> routers because they use an older template?). Other running Windows/Linux
> VMs are fine.
>
> CloudStack Cluster: XCP-ng 7.5 (previously XenServer 6.5, same issue was happening)
> CloudStack: 4.11.3 (same issue in 4.11.2, was working fine in 4.9.3)
>
> When digging through the management-server.log, I found the
> following:
>
> >grep -n "Error while collecting network stats from router" /var/log/cloudstack/management/management-server.log
> 919060:2019-07-16 10:28:43,756 WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterMonitor-1:ctx-1931d792) (logid:6e236728) Error while collecting network stats from router: r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 919328:2019-07-16 10:28:50,940 WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterMonitor-1:ctx-1931d792) (logid:6e236728) Error while collecting network stats from router: r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 920533:2019-07-16 10:29:54,768 WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697) (logid:2b94fd5d) Error while collecting network stats from router: r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 920621:2019-07-16 10:30:01,952 WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697) (logid:2b94fd5d) Error while collecting network stats from router: r-2280-VM from host: 71; details: Exception: java.lang.Exception
>
> >less +920621 /var/log/cloudstack/management/management-server.log
> 2019-07-16 10:30:01,952 DEBUG [c.c.a.t.Request] (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697) (logid:2b94fd5d) Seq 71-7505811728966836555: Received:  { Ans: , MgmtId: 226842157555374, via: 71(hostname), Ver: v1, Flags: 10, { NetworkUsageAnswer } }
> 2019-07-16 10:30:01,952 DEBUG [c.c.a.m.AgentManagerImpl] (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697) (logid:2b94fd5d) Details from executing class com.cloud.agent.api.NetworkUsageCommand: Exception: java.lang.Exception
> Message:  vpc network usage plugin call failed
> Stack: java.lang.Exception:  vpc network usage plugin call failed
>         at com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.executeNetworkUsage(XenServer56NetworkUsageCommandWrapper.java:84)
>         at com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:41)
>         at com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:33)
>         at com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)
>         at com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1737)
>         at com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> 2019-07-16 10:30:01,952 WARN [c.c.n.r.VirtualNetworkApplianceManagerImpl] (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697) (logid:2b94fd5d) Error while collecting network stats from router: r-2280-VM from host: 71; details: Exception: java.lang.Exception
> Message:  vpc network usage plugin call failed
> Stack: java.lang.Exception:  vpc network usage plugin call failed
>         at com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.executeNetworkUsage(XenServer56NetworkUsageCommandWrapper.java:84)
>         at com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:41)
>         at com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:33)
>         at com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)
>         at com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1737)
>         at com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.t.Request] (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697) (logid:2b94fd5d) Seq 71-7505811728966836557: Sending  { Cmd , MgmtId: 226842157555374, via: 71(hostname), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeCleanup":false,"controlIp":"169.254.0.200","forceStop":true,"volumesToDisconnect":[],"vmName":"r-2280-VM","executeInSequence":false,"wait":0}}] }
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.t.Request] (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697) (logid:2b94fd5d) Seq 71-7505811728966836557: Executing:  { Cmd , MgmtId: 226842157555374, via: 71(hostname), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeCleanup":false,"controlIp":"169.254.0.200","forceStop":true,"volumesToDisconnect":[],"vmName":"r-2280-VM","executeInSequence":false,"wait":0}}] }
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-430:ctx-77c57645) (logid:8733e444) Seq 71-7505811728966836557: Executing request
> 2019-07-16 10:30:01,995 DEBUG [c.c.h.x.r.w.x.CitrixStopCommandWrapper] (DirectAgent-430:ctx-77c57645) (logid:2b94fd5d) 9. The VM r-2280-VM is in Stopping state
> 2019-07-16 10:30:02,303 DEBUG [c.c.a.m.DirectAgentAttache] (DirectAgent-390:ctx-3b9a78cd) (logid:27f6ec94) Seq 67-8459448950062117779: Response Received
>
>
> Any thoughts would be greatly appreciated.
>
> Cheers,
> Nick.
>
