On January 28, 2020 4:55:37 PM GMT+02:00, [email protected] wrote:
>Hi All,
>
>A question regarding memory management with oVirt. I know memory can
>be complicated, hence I'm asking the experts. :)
>
>Here are two examples where it looks - to me - like memory accounting
>from the oVirt perspective is incorrect. This is resulting in us not
>getting as much out of a host as we'd expect.
>
>## Example 1:
>
>host: dev-cluster-04
>
>I understand the memory on the host to be:
>128 GB total (physical)
>68 GB used
>53 GB available
>56 GB buff/cache
>
>I therefore understand that roughly 53 GB should still be available to
>allocate (minus a few things).
>
>```
>  DEV  [root@dev-cluster-04:~]  # free -m
>                total        used        free      shared  buff/cache   available
>  Mem:         128741       68295        4429        4078       56016       53422
>  Swap:         12111        1578       10533
>  DEV  [root@dev-cluster-04:~]  # cat /proc/meminfo
>  MemTotal:       131831292 kB
>  MemFree:         4540852 kB
>  MemAvailable:   54709832 kB
>  Buffers:            3104 kB
>  Cached:          5174136 kB
>  SwapCached:       835012 kB
>  Active:         66943552 kB
>  Inactive:        5980340 kB
>  Active(anon):   66236968 kB
>  Inactive(anon):  5713972 kB
>  Active(file):     706584 kB
>  Inactive(file):   266368 kB
>  Unevictable:       50036 kB
>  Mlocked:           54132 kB
>  SwapTotal:      12402684 kB
>  SwapFree:       10786688 kB
>  Dirty:               812 kB
>  Writeback:             0 kB
>  AnonPages:      67068548 kB
>  Mapped:           143880 kB
>  Shmem:           4176328 kB
>  Slab:           52183680 kB
>  SReclaimable:   49822156 kB
>  SUnreclaim:      2361524 kB
>  KernelStack:       20000 kB
>  PageTables:       213628 kB
>  NFS_Unstable:          0 kB
>  Bounce:                0 kB
>  WritebackTmp:          0 kB
>  CommitLimit:    78318328 kB
>  Committed_AS:   110589076 kB
>  VmallocTotal:   34359738367 kB
>  VmallocUsed:      859104 kB
>  VmallocChunk:   34291324976 kB
>  HardwareCorrupted:     0 kB
>  AnonHugePages:    583680 kB
>  CmaTotal:              0 kB
>  CmaFree:               0 kB
>  HugePages_Total:       0
>  HugePages_Free:        0
>  HugePages_Rsvd:        0
>  HugePages_Surp:        0
>  Hugepagesize:       2048 kB
>  DirectMap4k:      621088 kB
>  DirectMap2M:    44439552 kB
>  DirectMap1G:    91226112 kB
>```
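A quick sanity check of the numbers quoted above. The formula used below for the engine's "free" figure (MemFree + Buffers + Cached, ignoring reclaimable slab) is only a guess at what VDSM reports, not something verified against its source:

```python
# Sanity-check of the example-1 /proc/meminfo values quoted above (kB).
# Assumption (not verified against the VDSM source): the engine derives
# its "free" figure roughly as MemFree + Buffers + Cached, ignoring
# reclaimable slab, while the kernel's MemAvailable includes it.
MEM_FREE      =  4540852
BUFFERS       =     3104
CACHED        =  5174136
MEM_AVAILABLE = 54709832
S_RECLAIMABLE = 49822156

def mib(kib):
    """Convert kB (as printed by /proc/meminfo) to MB."""
    return kib // 1024

engine_style_free = mib(MEM_FREE + BUFFERS + CACHED)
kernel_available  = mib(MEM_AVAILABLE)

print(engine_style_free)   # 9490 MB, close to the 9012 MB the engine shows
print(kernel_available)    # 53427 MB, matching `free -m` (53422 MB)
print(mib(S_RECLAIMABLE))  # 48654 MB of reclaimable slab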
>
>The oVirt engine's Compute -> Hosts view shows dev-cluster-04 at 93%
>memory utilisation.
>
>Clicking on the node says:
>Physical Memory: 128741 MB total, 119729 MB used, 9012 MB free
>
>So the oVirt engine says 9 GB free, while the OS reports 4 GB free but
>53 GB available. Surely oVirt should be looking at available memory?
>
>This becomes a problem, for instance, when trying to run a VM
>(dev-cassandra-01, with memory size 24576 MB, max memory 24576 MB and
>memory guarantee 10240 MB) on this host; it fails with:
>
>```
>  Cannot run VM. There is no host that satisfies current scheduling
>  constraints. See below for details:
>
>  The host dev-cluster-04.fnb.co.za did not satisfy internal filter
>  Memory because its available memory is too low (19884 MB) to run the
>  VM.
>```
>
>To me this looks blatantly wrong. The host has 53G available according
>to free -m.
>
>Guessing I'm missing something, unless this is some sort of bug?
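One way to see this discrepancy on any host is to compare the kernel's MemAvailable estimate against a naive "total minus free" view, straight from /proc/meminfo. A small sketch, nothing oVirt-specific in it:

```python
# Compare the kernel's MemAvailable estimate with a naive
# "total minus free" view of /proc/meminfo on the current host.
def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into a dict of {field: value in kB}."""
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])
    return info

m = read_meminfo()
naive_used_mb = (m["MemTotal"] - m["MemFree"]) // 1024
avail_mb      = m["MemAvailable"] // 1024
slab_mb       = m["SReclaimable"] // 1024
print(f"naive used: {naive_used_mb} MB")
print(f"kernel MemAvailable: {avail_mb} MB")
print(f"reclaimable slab: {slab_mb} MB")
```

On a host with a lot of reclaimable slab, "naive used" will look alarming even though MemAvailable says most of it can be handed back on demand.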
>
>versions:
>
>```
>  engine: 4.3.7.2-1.el7
>
>  host:
>  OS Version: RHEL - 7 - 6.1810.2.el7.centos
>  OS Description: CentOS Linux 7 (Core)
>  Kernel Version: 3.10.0 - 957.12.1.el7.x86_64
>  KVM Version: 2.12.0 - 18.el7_6.3.1
>  LIBVIRT Version: libvirt-4.5.0-10.el7_6.7
>  VDSM Version: vdsm-4.30.13-1.el7
>  SPICE Version: 0.14.0 - 6.el7_6.1
>  GlusterFS Version: [N/A]
>  CEPH Version: librbd1-10.2.5-4.el7
>  Open vSwitch Version: openvswitch-2.10.1-3.el7
>  Kernel Features: PTI: 1, IBRS: 0, RETP: 1, SSBD: 3
>  VNC Encryption: Disabled
>```
>
>## Example 2:
>
>An oVirt host with two VMs:
>
>According to the host, it has 128 GB of physical memory, of which 56 GB
>is used, 69 GB is buff/cache and 65 GB is available, as shown here:
>
>```
>  LIVE  [root@prod-cluster-01:~]  # cat /proc/meminfo
>  MemTotal:       131326836 kB
>  MemFree:         2630812 kB
>  MemAvailable:   66573596 kB
>  Buffers:            2376 kB
>  Cached:          5670628 kB
>  SwapCached:       151072 kB
>  Active:         59106140 kB
>  Inactive:        2744176 kB
>  Active(anon):   58099732 kB
>  Inactive(anon):  2327428 kB
>  Active(file):    1006408 kB
>  Inactive(file):   416748 kB
>  Unevictable:       40004 kB
>  Mlocked:           42052 kB
>  SwapTotal:       4194300 kB
>  SwapFree:        3579492 kB
>  Dirty:                 0 kB
>  Writeback:             0 kB
>  AnonPages:      56085040 kB
>  Mapped:           121816 kB
>  Shmem:           4231808 kB
>  Slab:           65143868 kB
>  SReclaimable:   63145684 kB
>  SUnreclaim:      1998184 kB
>  KernelStack:       25296 kB
>  PageTables:       148336 kB
>  NFS_Unstable:          0 kB
>  Bounce:                0 kB
>  WritebackTmp:          0 kB
>  CommitLimit:    69857716 kB
>  Committed_AS:   76533164 kB
>  VmallocTotal:   34359738367 kB
>  VmallocUsed:      842296 kB
>  VmallocChunk:   34291404724 kB
>  HardwareCorrupted:     0 kB
>  AnonHugePages:     55296 kB
>  CmaTotal:              0 kB
>  CmaFree:               0 kB
>  HugePages_Total:       0
>  HugePages_Free:        0
>  HugePages_Rsvd:        0
>  HugePages_Surp:        0
>  Hugepagesize:       2048 kB
>  DirectMap4k:      722208 kB
>  DirectMap2M:    48031744 kB
>  DirectMap1G:    87031808 kB
>
>  LIVE  [root@prod-cluster-01:~]  # free -m
>                total        used        free      shared  buff/cache   available
>  Mem:         128248       56522        2569        4132       69157       65013
>  Swap:          4095         600        3495
>```
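The same sanity check applied to these numbers (again assuming, as a guess only, that the engine's "free" is roughly MemFree + Buffers + Cached):

```python
# Check of the example-2 figures: /proc/meminfo values in kB from the
# quote above, engine figures in MB from the Compute -> Hosts screen.
# The MemFree + Buffers + Cached formula is only a guess at what the
# engine reports as "free".
MEM_FREE = 2630812
BUFFERS  =    2376
CACHED   = 5670628

ENGINE_TOTAL_MB = 128248
ENGINE_USED_MB  = 120553

engine_style_free = (MEM_FREE + BUFFERS + CACHED) // 1024
print(engine_style_free)  # 8109 MB, near the 7695 MB the engine shows

used_pct = round(100 * ENGINE_USED_MB / ENGINE_TOTAL_MB)
print(used_pct)           # 94, matching the 94% in the UI
```

The small gap between 8109 and 7695 MB is plausible if the two readings were taken at slightly different times.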
>
>However the Compute -> Hosts screen in oVirt shows this node at 94%
>memory utilisation.
>
>Clicking compute -> hosts -> prod-cluster-01 -> general says:
>
>Physical Memory: 128248 MB total, 120553 MB used, 7695 MB free
>Swap Size: 4095 MB total, 600 MB used, 3495 MB free
>
>The physical memory figure above makes no sense to me, unless it
>includes caches, which I would think it shouldn't.
>
>This host has just two VMs:
>
>```
>LIVE  [root@prod-cluster-01:~]  # virsh -c
>qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf list
> Id    Name                           State
>----------------------------------------------------
> 35    prod-box-18                   running
> 36    prod-box-11                   running
>```
>
>Moreover, each VM has 32 GB of memory set in every possible place, from
>what I can see.
>
>```
>LIVE  [root@prod-cluster-01:~]  # virsh -c
>qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
>dumpxml prod-box-11|grep -i mem
><ovirt-vm:memGuaranteedSize
>type="int">32768</ovirt-vm:memGuaranteedSize>
><ovirt-vm:minGuaranteedMemoryMb
>type="int">32768</ovirt-vm:minGuaranteedMemoryMb>
>    <memory unit='KiB'>33554432</memory>
>    <currentMemory unit='KiB'>33554432</currentMemory>
>        <cell id='0' cpus='0-27' memory='33554432' unit='KiB'/>
>      <suspend-to-mem enabled='no'/>
><model type='qxl' ram='65536' vram='32768' vgamem='16384' heads='1'
>primary='yes'/>
>      <memballoon model='virtio'>
>      </memballoon>
>```
>
>prod-box-11 is, however, set as a high-performance VM, which could be a
>factor.
>
>Same for the other VM:
>
>```
>LIVE  [root@prod-cluster-01:~]  # virsh -c
>qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf
>dumpxml prod-box-18|grep -i mem
><ovirt-vm:memGuaranteedSize
>type="int">32768</ovirt-vm:memGuaranteedSize>
><ovirt-vm:minGuaranteedMemoryMb
>type="int">32768</ovirt-vm:minGuaranteedMemoryMb>
>    <memory unit='KiB'>33554432</memory>
>    <currentMemory unit='KiB'>33554432</currentMemory>
>        <cell id='0' cpus='0-27' memory='33554432' unit='KiB'/>
>      <suspend-to-mem enabled='no'/>
><model type='qxl' ram='65536' vram='32768' vgamem='16384' heads='1'
>primary='yes'/>
>      <memballoon model='virtio'>
>      </memballoon>
>```
>
>So I understand that two VMs, each allocated 32 GB of RAM, should
>consume approximately 64 GB on the host. The host has 128 GB of RAM, so
>usage should be around 50%. However oVirt is reporting 94% usage.
>
>Versions:
>
>```
>  engine: 4.3.5.5-1.el7
>
>  host:
>  OS Version: RHEL - 7 - 6.1810.2.el7.centos
>  OS Description: CentOS Linux 7 (Core)
>  Kernel Version: 3.10.0 - 957.10.1.el7.x86_64
>  KVM Version: 2.12.0 - 18.el7_6.3.1
>  LIBVIRT Version: libvirt-4.5.0-10.el7_6.6
>  VDSM Version: vdsm-4.30.11-1.el7
>  SPICE Version: 0.14.0 - 6.el7_6.1
>  GlusterFS Version: [N/A]
>  CEPH Version: librbd1-10.2.5-4.el7
>  Open vSwitch Version: openvswitch-2.10.1-3.el7
>  Kernel Features: PTI: 1, IBRS: 0, RETP: 1
>  VNC Encryption: Disabled
>```
>
>Thanks for any insights!
>
>--
>Divan Santana
>https://divansantana.com
>_______________________________________________
>Users mailing list -- [email protected]
>To unsubscribe send an email to [email protected]
>Privacy Statement: https://www.ovirt.org/site/privacy-policy/
>oVirt Code of Conduct:
>https://www.ovirt.org/community/about/community-guidelines/
>List Archives:
>https://lists.ovirt.org/archives/list/[email protected]/message/DDTOCVPEXOGT43UJ42CKEQJ6FSAZFFVQ/

I've seen similar behavior before.
Have you tried putting the host into maintenance and, once all VMs have
been migrated away, rebooting it?

Best Regards,
Strahil Nikolov
