Stefan Priebe - Profihost AG <s.pri...@profihost.ag> wrote on Mon, Sep 17, 2018 at 9:00 AM:
>
> Hi,
>
> On 17.09.2018 at 08:38, Jack Wang wrote:
> > Stefan Priebe - Profihost AG <s.pri...@profihost.ag> wrote on Sun, Sep 16, 2018 at 3:31 PM:
> >>
> >> Hello,
> >>
> >> While overcommitting CPU I had several situations where all VMs went
> >> offline while two VMs saturated all cores.
> >>
> >> I believed all VMs would stay online but would just not be able to use
> >> all of their cores?
> >>
> >> My original idea was to automate live migration on high host load to
> >> move VMs to another node, but that only makes sense if all VMs stay
> >> online.
> >>
> >> Is this expected? Is anything special needed to achieve this?
> >>
> >> Greets,
> >> Stefan
> >>
> > Hi Stefan,
> >
> > Do you have any logs from when all VMs went offline?
> > Maybe the OOM killer played a role there?
>
> After reviewing, I think this is memory related, but OOM did not play a
> role. All kvm processes were spinning trying to get > 100% CPU, and I was
> not able to even log in via ssh. After 5-10 minutes I was able to log in.

So the VMs are not really offline. What is the result if you run
query-status via QMP? (A minimal example is sketched after the quoted
thread below.)

> There were about 150 GB of free memory.
>
> Relevant settings (no local storage involved):
> vm.dirty_background_ratio: 3
> vm.dirty_ratio: 10
> vm.min_free_kbytes: 10567004
>
> # cat /sys/kernel/mm/transparent_hugepage/defrag
> always defer [defer+madvise] madvise never
>
> # cat /sys/kernel/mm/transparent_hugepage/enabled
> [always] madvise never
>
> After that I had the following traces on the host node:
> https://pastebin.com/raw/0VhyQmAv
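To check whether the guests are actually still running while the host is
under that load, you can ask each QEMU process directly over its QMP
monitor socket. This is only a rough sketch, assuming the monitor is
exposed as a UNIX socket at /var/run/qemu/<vmid>.qmp (that path is a
guess; it depends on how your management stack starts QEMU):

  # QMP requires the capabilities handshake before any other command
  echo '{"execute":"qmp_capabilities"} {"execute":"query-status"}' \
      | socat -t 2 - UNIX-CONNECT:/var/run/qemu/<vmid>.qmp

If the reply contains "status": "running", the guests were only starved
for CPU time rather than actually offline, which would match the load you
describe.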
The call trace looks like a Ceph-related deadlock or hung task; a few
generic checks to narrow that down are sketched below.

> Thanks!
>
> Greets,
> Stefan
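If something on the Ceph path is blocking, the kernel's hung task detector
normally reports the stuck tasks, and processes sitting in uninterruptible
sleep can be listed directly. These are generic checks, nothing specific
to your setup:

  # hung task reports from khungtaskd ("blocked for more than N seconds")
  dmesg | grep -i "blocked for more than"

  # processes stuck in D state, with the kernel function they are waiting in
  ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'

  # kernel stack of one stuck process (replace <pid>)
  cat /proc/<pid>/stack

Comparing that output with the pastebin traces should show whether the kvm
processes are really blocked in the kernel or just runnable and starved
for CPU.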