[ovirt-users] Re: numa pinning and reserved hugepages (1G) Bug in scheduler calculation or decision ?

2019-08-24 Thread Ralf Schenk
Hello,

I reported it (again) as
https://bugzilla.redhat.com/show_bug.cgi?id=1745247 and referenced [1].

Thanks

On 23.08.2019 at 14:27, Andrej Krejcir wrote:
> Hi,
>
> this is a bug in the scheduler. Currently, it ignores hugepages when
> evaluating NUMA pinning.
>
> There is a bugzilla ticket[1] that was originally reported as a
> similar case, but then later the reporter changed it.
>
> Could you open a new bugzilla ticket and attach the details from this
> email?
>
> As a workaround, if you don't want to migrate the VM or you are sure
> that it can run on the target host, you can clone a cluster policy and
> remove the 'NUMA' filter. (In Administration -> Configure ->
> Scheduling Policies).
>
>
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1720558
>
>
> Best regards,
> Andrej
>
>
>
> On Wed, 21 Aug 2019 at 12:16, Ralf Schenk wrote:
>
> Hello List,
>
> I ran into problems using NUMA pinning and reserved hugepages.
>
> - My EPYC 7281 based servers (dual socket) have 8 NUMA nodes, each
> with 32 GB of memory, for a total of 256 GB of system memory.
>
> - I'm using 192 x 1 GB hugepages, reserved on the kernel cmdline with
> default_hugepagesz=1G hugepagesz=1G hugepages=192. This reserves 24
> hugepages on each NUMA node (see the quick check below).
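>
> A quick way to see how that boot-time pool got distributed (standard
> per-node sysfs files; the per-node count can also be adjusted at
> runtime by writing to nr_hugepages if ever needed):
>
> for n in /sys/devices/system/node/node*; do
>   echo "$n: $(cat $n/hugepages/hugepages-1048576kB/nr_hugepages) x 1G reserved"
> done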
>
> I wanted to pin a MariaDB VM using 32 GB (custom property
> hugepages=1048576) to NUMA nodes 0-3 of CPU socket 1. Setting up the
> pinning in the GUI etc. was no problem.
>
> When trying to start the VM, this fails because oVirt claims that
> the host can't fulfill the memory requirements - which is simply
> not correct, since there were > 164 hugepages free.
>
> It should have taken 8 hugepages from each of NUMA nodes 0-3 to
> fulfill the 32 GB memory requirement.
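>
> In numbers: 32 GB at 1 GiB per page is 32 pages, i.e. 8 per node
> across nodes 0-3. Summing the free 1G pages on exactly those nodes
> (same sysfs files as in the grep output below) shows the capacity
> is there:
>
> total=0
> for n in 0 1 2 3; do
>   total=$((total + $(cat /sys/devices/system/node/node$n/hugepages/hugepages-1048576kB/free_hugepages)))
> done
> echo "free 1G pages on nodes 0-3: $total (32 needed)"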
>
> I also freed the host completely of other VMs, but that didn't
> work either.
>
> Is it possible that the scheduler only takes into account the
> "free memory" (as seen in numactl -H below), i.e. the memory *not
> reserved* by hugepages, for its decisions? Since the host has only
> < 8 GB of free memory per NUMA node, I could understand that the VM
> was not able to start under that condition.
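>
> If that is the case, the per-node numbers the scheduler works with
> would be the small MemFree values rather than the reserved 1G pages.
> Both can be compared directly on the host (per-node meminfo and
> hugepage counters in sysfs):
>
> for n in 0 1 2 3; do
>   free_mb=$(awk '/MemFree/ {print int($4/1024)}' /sys/devices/system/node/node$n/meminfo)
>   hp=$(cat /sys/devices/system/node/node$n/hugepages/hugepages-1048576kB/free_hugepages)
>   echo "node$n: MemFree ${free_mb} MB, free 1G hugepages: ${hp}"
> done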
>
> The VM is running and using 32 hugepages without pinning, but a
> warning states: "VM dbserver01b does not fit to a single NUMA node
> on host myhost.mydomain.de. This may negatively impact its
> performance. Consider using vNUMA and NUMA pinning for this VM."
>
> This is the NUMA hardware layout and hugepage usage now, with other
> VMs running:
>
> From cat /proc/meminfo:
>
> HugePages_Total: 192
> HugePages_Free:  160
> HugePages_Rsvd:    0
> HugePages_Surp:    0
>
> I can confirm that also under the condition of running other VM's
> there are at least 8 hugepages free for each numa-node 0-3:
>
> grep "" /sys/devices/system/node/*/hugepages/hugepages-1048576kB/free_hugepages
> /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages:8
> /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages:23
> /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages:20
> /sys/devices/system/node/node3/hugepages/hugepages-1048576kB/free_hugepages:22
> /sys/devices/system/node/node4/hugepages/hugepages-1048576kB/free_hugepages:16
> /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/free_hugepages:5
> /sys/devices/system/node/node6/hugepages/hugepages-1048576kB/free_hugepages:19
> /sys/devices/system/node/node7/hugepages/hugepages-1048576kB/free_hugepages:24
>
> numactl -H:
>
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3 32 33 34 35
> node 0 size: 32673 MB
> node 0 free: 3779 MB
> node 1 cpus: 4 5 6 7 36 37 38 39
> node 1 size: 32767 MB
> node 1 free: 6162 MB
> node 2 cpus: 8 9 10 11 40 41 42 43
> node 2 size: 32767 MB
> node 2 free: 6698 MB
> node 3 cpus: 12 13 14 15 44 45 46 47
> node 3 size: 32767 MB
> node 3 free: 1589 MB
> node 4 cpus: 16 17 18 19 48 49 50 51
> node 4 size: 32767 MB
> node 4 free: 2630 MB
> node 5 cpus: 20 21 22 23 52 53 54 55
> node 5 size: 32767 MB
> node 5 free: 2487 MB
> node 6 cpus: 24 25 26 27 56 57 58 59
> node 6 size: 32767 MB
> node 6 free: 3279 MB
> node 7 cpus: 28 29 30 31 60 61 62 63
> node 7 size: 32767 MB
> node 7 free: 5513 MB
> node distances:
> node   0   1   2   3   4   5   6   7
>   0:  10  16  16  16  32  32  32  32
>   1:  16  10  16  16  32  32  32  32
>   2:  16  16  10  16  32  32  32  32
>   3:  16  16  16  10  32  32  32  32
>   4:  32  32  32  32  10  16  16  16
>   5:  32  32  32  32  16  10  16  16
>   6:  32  32  32  32  16  16  10  16
>   7:  32  32  32  32  16  16  16  10
>
> --
>
>
> *Ralf Schenk*
> fon +49 (0) 24 05 / 40 83 70
fax +49 (0) 24 05 / 40 83 759

[ovirt-users] Re: numa pinning and reserved hugepages (1G) Bug in scheduler calculation or decision ?

2019-08-23 Thread Andrej Krejcir
Hi,

this is a bug in the scheduler. Currently, it ignores hugepages when
evaluating NUMA pinning.

There is a bugzilla ticket[1] that was originally reported as a similar
case, but then later the reporter changed it.

Could you open a new bugzilla ticket and attach the details from this email?

As a workaround, if you don't want to migrate the VM or you are sure that
it can run on the target host, you can clone a cluster policy and remove
the 'NUMA' filter. (In Administration -> Configure -> Scheduling Policies).
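
If you prefer to script it, the same should be possible via the REST API;
the paths below reflect how I would expect the v4 API to expose it (a
schedulingpolicies collection with a filters sub-collection), so please
verify them against your engine's /ovirt-engine/api first. ENGINE,
PASSWORD, POLICY_ID and FILTER_ID are placeholders:

# find the id of the cloned policy
curl -s -k -u admin@internal:PASSWORD https://ENGINE/ovirt-engine/api/schedulingpolicies

# list the filter modules attached to it, note the id of the NUMA filter
curl -s -k -u admin@internal:PASSWORD https://ENGINE/ovirt-engine/api/schedulingpolicies/POLICY_ID/filters

# remove the NUMA filter from the cloned policy
curl -s -k -X DELETE -u admin@internal:PASSWORD https://ENGINE/ovirt-engine/api/schedulingpolicies/POLICY_ID/filters/FILTER_ID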


[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1720558


Best regards,
Andrej



On Wed, 21 Aug 2019 at 12:16, Ralf Schenk wrote:

> Hello List,
>
> I ran into problems using NUMA pinning and reserved hugepages.
>
> - My EPYC 7281 based servers (dual socket) have 8 NUMA nodes, each with
> 32 GB of memory, for a total of 256 GB of system memory.
>
> - I'm using 192 x 1 GB hugepages, reserved on the kernel cmdline with
> default_hugepagesz=1G hugepagesz=1G hugepages=192. This reserves 24
> hugepages on each NUMA node.
>
> I wanted to pin a MariaDB VM using 32 GB (custom property
> hugepages=1048576) to NUMA nodes 0-3 of CPU socket 1. Setting up the
> pinning in the GUI etc. was no problem.
>
> When trying to start the VM, this fails because oVirt claims that the
> host can't fulfill the memory requirements - which is simply not correct,
> since there were > 164 hugepages free.
>
> It should have taken 8 hugepages from each of NUMA nodes 0-3 to fulfill
> the 32 GB memory requirement.
>
> I also freed the host completely of other VMs, but that didn't work
> either.
>
> Is it possible that the scheduler only takes into account the "free
> memory" (as seen in numactl -H below), i.e. the memory *not reserved* by
> hugepages, for its decisions? Since the host has only < 8 GB of free
> memory per NUMA node, I could understand that the VM was not able to
> start under that condition.
>
> The VM is running and using 32 hugepages without pinning, but a warning
> states: "VM dbserver01b does not fit to a single NUMA node on host
> myhost.mydomain.de. This may negatively impact its performance. Consider
> using vNUMA and NUMA pinning for this VM."
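>
> For what it's worth, what the running VM actually got can be inspected
> on the host with read-only virsh (the domain name as libvirt sees it is
> listed by virsh -r list):
>
> virsh -r list
> virsh -r dumpxml dbserver01b | grep -A4 -E 'memoryBacking|numatune'
> # should show a <memoryBacking><hugepages> section with the 1G page size,
> # and a <numatune> element once NUMA pinning is applied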
>
> This is the NUMA hardware layout and hugepage usage now, with other VMs
> running:
>
> From cat /proc/meminfo:
>
> HugePages_Total: 192
> HugePages_Free:  160
> HugePages_Rsvd:    0
> HugePages_Surp:    0
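>
> The difference, 192 - 160 = 32 pages, matches the 32 hugepages the VM is
> using; a quick one-liner to compute it from /proc/meminfo:
>
> awk '/^HugePages_(Total|Free)/ {v[$1]=$2} END {print "1G pages in use:", v["HugePages_Total:"]-v["HugePages_Free:"]}' /proc/meminfo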
>
> I can confirm that also under the condition of running other VM's there
> are at least 8 hugepages free for each numa-node 0-3:
>
> grep "" /sys/devices/system/node/*/hugepages/hugepages-1048576kB/free_hugepages
> /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages:8
> /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages:23
> /sys/devices/system/node/node2/hugepages/hugepages-1048576kB/free_hugepages:20
> /sys/devices/system/node/node3/hugepages/hugepages-1048576kB/free_hugepages:22
> /sys/devices/system/node/node4/hugepages/hugepages-1048576kB/free_hugepages:16
> /sys/devices/system/node/node5/hugepages/hugepages-1048576kB/free_hugepages:5
> /sys/devices/system/node/node6/hugepages/hugepages-1048576kB/free_hugepages:19
> /sys/devices/system/node/node7/hugepages/hugepages-1048576kB/free_hugepages:24
>
> numactl -H:
>
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3 32 33 34 35
> node 0 size: 32673 MB
> node 0 free: 3779 MB
> node 1 cpus: 4 5 6 7 36 37 38 39
> node 1 size: 32767 MB
> node 1 free: 6162 MB
> node 2 cpus: 8 9 10 11 40 41 42 43
> node 2 size: 32767 MB
> node 2 free: 6698 MB
> node 3 cpus: 12 13 14 15 44 45 46 47
> node 3 size: 32767 MB
> node 3 free: 1589 MB
> node 4 cpus: 16 17 18 19 48 49 50 51
> node 4 size: 32767 MB
> node 4 free: 2630 MB
> node 5 cpus: 20 21 22 23 52 53 54 55
> node 5 size: 32767 MB
> node 5 free: 2487 MB
> node 6 cpus: 24 25 26 27 56 57 58 59
> node 6 size: 32767 MB
> node 6 free: 3279 MB
> node 7 cpus: 28 29 30 31 60 61 62 63
> node 7 size: 32767 MB
> node 7 free: 5513 MB
> node distances:
> node   0   1   2   3   4   5   6   7
>   0:  10  16  16  16  32  32  32  32
>   1:  16  10  16  16  32  32  32  32
>   2:  16  16  10  16  32  32  32  32
>   3:  16  16  16  10  32  32  32  32
>   4:  32  32  32  32  10  16  16  16
>   5:  32  32  32  32  16  10  16  16
>   6:  32  32  32  32  16  16  10  16
>   7:  32  32  32  32  16  16  16  10
>
> --
>
>
> *Ralf Schenk*
> fon +49 (0) 24 05 / 40 83 70
> fax +49 (0) 24 05 / 40 83 759
> mail *r...@databay.de* 
>
> *Databay AG*
> Jens-Otto-Krag-Straße 11
> D-52146 Würselen
> *www.databay.de* 
>
> Registered office/District Court Aachen • HRB: 8437 • VAT ID: DE 210844202
> Management Board: Ralf Schenk, Dipl.-Ing. Jens Conze, Aresch Yavari, Dipl.-Kfm.
> Philipp Hermanns
> Chairman of the Supervisory Board: Wilhelm Dohmen
> --
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/