So it's been a week and we haven't had any virtual routers randomly reboot. 
Gregor hit the nail on the head with the memory issue. Since the latest HVM 
versions of the virtual router (4.11.X) I have noticed more swap activity 
(the kswapd process running at higher CPU than usual); performance monitoring 
confirms swapping is happening, with about 40-50MB of swap in use. The affected 
system VMs are the ones set up as VPCs or actively running HAProxy.

I have allocated an extra 128MB of RAM to the affected routers and the issue 
seems to have gone away (for the time being, anyway).
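For reference, the swap-used figure above can be read straight out of /proc/meminfo. A minimal sketch (the sample meminfo values below are fabricated for illustration, not taken from the routers):

```shell
# Report swap in use, in MB, from a meminfo-style file.
swap_used_mb() {
    awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2} END {printf "%d\n", (t-f)/1024}' "$1"
}

# Fabricated sample values (kB), purely to demonstrate the calculation.
printf 'SwapTotal:      262144 kB\nSwapFree:       212992 kB\n' > /tmp/meminfo.sample
swap_used_mb /tmp/meminfo.sample   # (262144-212992)/1024 = 48 MB
```

On a real router you would point it at /proc/meminfo instead of the sample file.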

Modified the service offering in SQL:
# Get the ID
select * from service_offering_view where name like "System Offering For Software Router";
# Update the service offering
update service_offering_view set ram_size = 384 where id = 7;
# Rebooted the test router and confirmed the new memory is available.


Is the currently allocated 256MB of memory no longer enough for system VMs?
If not, how much should be allocated? An extra 128MB does seem a bit excessive; 
maybe 64MB is enough?
What could be causing this extra usage?

Nothing really stands out as excessive memory usage in any of the 
system VMs.

# Top memory usage in a VPC virtual router (v4.11.3)
ps -o pid,user,%mem,command ax | sort -b -k3 -r
PID     USER    %MEM    COMMAND
1369    root    3.2     python /opt/cloud/bin/passwd_server_ip.py 10.30.0.1
3607    root    3.0     /usr/lib/ipsec/charon
354     root    2.1     /usr/sbin/xe-daemon
1416    root    1.8     /usr/sbin/apache2 -k start
1       root    1.8     /sbin/init
748     root    1.8     /lib/systemd/systemd-journald

# Top memory usage in a VPC virtual router (v4.6)
PID     USER        %MEM    COMMAND
4474    root        5.9     /usr/lib/ipsec/pluto
3180    root        3.3     python /opt/cloud/bin/passwd_server_ip.py 10.30.0.1
2370    root        2.4     /usr/sbin/rsyslogd -c5
3116    root        1.9     /usr/sbin/apache2 -k start
3122    www-data    1.3     /usr/sbin/apache2 -k start
3121    www-data    1.3     /usr/sbin/apache2 -k start
4526    root        1.3     pluto helper  #  0
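One caveat with the pipeline above: plain `sort -k3 -r` compares the %MEM column lexically, so a hypothetical 10.5% entry would sort below 3.2%. Adding `-n` fixes that. A small sketch on fabricated rows (the PIDs and commands are made up to show the difference):

```shell
# Numeric, descending sort on the %MEM column (field 3), keeping the header.
# Without -n, "10.5" would compare lexically and land below "3.2".
sort_by_mem() {
    printf '%s\n' "$1" | awk 'NR==1'
    printf '%s\n' "$1" | awk 'NR>1' | sort -b -k3 -nr
}

sample="PID USER %MEM COMMAND
10 root 2.0 apache2
20 root 10.5 pluto
30 root 3.2 python"
sort_by_mem "$sample"
```

On a live router the equivalent would be `ps -o pid,user,%mem,command ax | sort -b -k3 -nr`.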

Regards,
Nick Thompson

-----Original Message-----
From: Nick Thompson [mailto:nick.thomp...@neos.co.nz] 
Sent: Wednesday, 7 August 2019 12:34 p.m.
To: 'users@cloudstack.apache.org' <users@cloudstack.apache.org>
Subject: RE: Virtual routers randomly rebooting

Thanks Gregor,

I'll give that a go. I did notice a high load average on some VRs a while back, 
could be related.

Regards,

Nick Thompson


-----Original Message-----
From: Riepl, Gregor (SWISS TXT) [mailto:gregor.ri...@swisstxt.ch]
Sent: Wednesday, 7 August 2019 3:44 a.m.
To: users@cloudstack.apache.org
Subject: Re: Virtual routers randomly rebooting

Hi Nick,

This might not be relevant for Xen, but we've had problems with memory leaks on 
the VRs on VMware when balloon memory was enabled.

A while ago, we built a custom router monitoring setup via SSH for our 
environment, because CloudStack doesn't give us enough information about router 
status. This caused the VR kernel to leak memory, and the router to reboot 
suddenly when memory was used up.

The issue was fixed by several memory management optimisations on the system VM 
template (done by René, Rohit and Angus, if I remember correctly) and by 
setting an OS type that would cause VMware to completely disable balloon memory.

It's possible that you have a similar issue - can you monitor the affected VRs 
for a while and see if the reboots are caused by a memory leak?
We still see memory being used up slowly, but when a critical threshold (~98%) 
is reached, the kernel will garbage collect it.
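For anyone wanting to watch for that slow creep, the used-memory percentage can be derived from /proc/meminfo. A minimal sketch; the ~98% threshold comes from Gregor's observation above, but the sample values are fabricated for illustration:

```shell
# Percentage of memory in use, derived from MemTotal and MemAvailable.
mem_used_pct() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {printf "%d\n", (t-a)*100/t}' "$1"
}

# Fabricated sample (kB) to demonstrate the near-threshold case.
printf 'MemTotal:       262144 kB\nMemAvailable:     5242 kB\n' > /tmp/vr-meminfo.sample
pct="$(mem_used_pct /tmp/vr-meminfo.sample)"
[ "$pct" -ge 98 ] && echo "VR memory critical: ${pct}%"
```

To approximate the SSH-based monitoring described above, you could feed it the router's live data, e.g. `ssh root@<router-ip> cat /proc/meminfo > /tmp/vr-meminfo.sample` on a schedule (the host and path here are assumptions, not part of any CloudStack tooling).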

Regards,
Gregor
________________________________
From: Nick Thompson <nick.thomp...@neos.co.nz>
Sent: 06 August 2019 06:01
To: 'users@cloudstack.apache.org' <users@cloudstack.apache.org>
Subject: RE: Virtual routers randomly rebooting

Hey,

Thanks Andrija,

From what I have read, I didn't need to add the hypervisor mapping for 7.5, as 
CloudStack only looks at the version number rather than the name of the 
hypervisor at this stage (e.g. XenServer vs XCP-ng). The issue was also 
happening on XenServer 6.5 anyway, and CloudStack isn't having any problems 
controlling XCP-ng.

From what I have seen so far, only some VPC routers are randomly rebooting (on 
different hosts too); the Storage, Console, and standard network VMs seem to be 
fine (however, I have a lot more VPCs than any other network type). I'm not 
sure if the VM is rebooting itself, or if the Management Server is having an 
issue communicating with the VM and so is shutting it down and restarting it.

Is there a way to disable the checks on VPCs so CloudStack doesn't try to 
restart the router VM? I have found/tried network.router.EnableServiceMonitoring 
= NO, but the documentation notes that VPC networks are not supported.

Any suggestions on what I could try or look into would be greatly appreciated.

Regards,
Nick Thompson


-----Original Message-----
From: Andrija Panic [mailto:andrija.pa...@gmail.com]
Sent: Wednesday, 17 July 2019 6:24 a.m.
To: users <users@cloudstack.apache.org>; Rohit Yadav <rohit.ya...@shapeblue.com>
Subject: Re: Virtual routers randomly rebooting

Have you added the os/hypervisor mappings inside the DB? I vaguely remember 7.5 
not having the needed mapping and being treated as 6.5, so a manual fix was 
needed.

Perhaps Rohit can shed some light?

Anyways, a full log would be great (pastebin or another online service, please).

Regards

On Tue, 16 Jul 2019, 00:59 Nick Thompson, <nick.thomp...@neos.co.nz> wrote:

> Hey,
>
> Since we upgraded to the 4.11 branch (currently 4.11.3) and the virtual 
> routers have become HVM on XenServer/XCP-ng, we have had problems with 
> the virtual routers randomly rebooting themselves. We still have some 
> running in the older paravirtualized mode and they seem to be fine (it 
> may be that the Management Server can't communicate with these virtual 
> routers since they are on an older template?). Other running Windows/Linux 
> VMs are fine.
>
> CloudStack cluster: XCP-ng 7.5 (previously XenServer 6.5, where the same 
> issue was happening)
> CloudStack: 4.11.3 (same issue in 4.11.2; was working fine in v4.9.3)
>
> When digging through the management-server.log, I have found the 
> following;
>
> >grep -n "Error while collecting network stats from router"
> /var/log/cloudstack/management/management-server.log
> 919060:2019-07-16 10:28:43,756 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (RouterMonitor-1:ctx-1931d792)
> (logid:6e236728) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 919328:2019-07-16 10:28:50,940 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (RouterMonitor-1:ctx-1931d792)
> (logid:6e236728) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 920533:2019-07-16 10:29:54,768 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
> 920621:2019-07-16 10:30:01,952 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
>
> >less +920621 /var/log/cloudstack/management/management-server.log
> 2019-07-16 10:30:01,952 DEBUG [c.c.a.t.Request]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Seq 71-7505811728966836555: Received:  { Ans: , MgmtId:
> 226842157555374, via: 71(hostname), Ver: v1, Flags: 10, { 
> NetworkUsageAnswer } }
> 2019-07-16 10:30:01,952 DEBUG [c.c.a.m.AgentManagerImpl]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Details from executing class
> com.cloud.agent.api.NetworkUsageCommand: Exception:
> java.lang.Exception
> Message:  vpc network usage plugin call failed
> Stack: java.lang.Exception:  vpc network usage plugin call failed
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.executeNetworkUsage(XenServer56NetworkUsageCommandWrapper.java:84)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:41)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:33)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)
>         at
> com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1737)
>         at
> com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> 2019-07-16 10:30:01,952 WARN
> [c.c.n.r.VirtualNetworkApplianceManagerImpl]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Error while collecting network stats from router:
> r-2280-VM from host: 71; details: Exception: java.lang.Exception
> Message:  vpc network usage plugin call failed
> Stack: java.lang.Exception:  vpc network usage plugin call failed
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.executeNetworkUsage(XenServer56NetworkUsageCommandWrapper.java:84)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:41)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xen56.XenServer56NetworkUsageCommandWrapper.execute(XenServer56NetworkUsageCommandWrapper.java:33)
>         at
> com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)
>         at
> com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1737)
>         at
> com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.t.Request]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Seq 71-7505811728966836557: Sending  { Cmd , MgmtId:
> 226842157555374, via: 71(hostname), Ver: v1, Flags: 100011, 
> [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeClean
> up":false,"controlIp":"169.254.0.200","forceStop":true,"volumesToDisco
> nnect":[],"vmName":"r-2280-VM","executeInSequence":false,"wait":0}}]
> }
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.t.Request]
> (Work-Job-Executor-63:ctx-6d1f0235 job-24930/job-24944 ctx-a15a2697)
> (logid:2b94fd5d) Seq 71-7505811728966836557: Executing:  { Cmd , MgmtId:
> 226842157555374, via: 71(hostname), Ver: v1, Flags: 100011, 
> [{"com.cloud.agent.api.StopCommand":{"isProxy":false,"checkBeforeClean
> up":false,"controlIp":"169.254.0.200","forceStop":true,"volumesToDisco
> nnect":[],"vmName":"r-2280-VM","executeInSequence":false,"wait":0}}]
> }
> 2019-07-16 10:30:01,973 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-430:ctx-77c57645) (logid:8733e444) Seq 71-7505811728966836557:
> Executing request
> 2019-07-16 10:30:01,995 DEBUG [c.c.h.x.r.w.x.CitrixStopCommandWrapper]
> (DirectAgent-430:ctx-77c57645) (logid:2b94fd5d) 9. The VM r-2280-VM is 
> in Stopping state
> 2019-07-16 10:30:02,303 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-390:ctx-3b9a78cd) (logid:27f6ec94) Seq 67-8459448950062117779:
> Response Received
>
>
> Any thoughts would be greatly appreciated.
>
> Cheers,
> Nick.
>
