On 6/6/2017 8:39 PM, Imre Pinter wrote:
Hi guys,
Thanks for the replies. See my comments inline.
-----Original Message-----
From: Tan, Jianfeng [mailto:[email protected]]
Sent: Friday, June 2, 2017 3:40 AM
To: Marco Varlese <[email protected]>; Imre Pinter
<[email protected]>; [email protected]
Cc: Gabor Halász <[email protected]>; Péter Suskovics
<[email protected]>
Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages
-----Original Message-----
From: Marco Varlese [mailto:[email protected]]
Sent: Thursday, June 1, 2017 6:12 PM
To: Tan, Jianfeng; Imre Pinter; [email protected]
Cc: Gabor Halász; Péter Suskovics
Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
-----Original Message-----
From: users [mailto:[email protected]] On Behalf Of Imre
Pinter
Sent: Thursday, June 1, 2017 3:55 PM
To: [email protected]
Cc: Gabor Halász; Péter Suskovics
Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
Hi,
We experience slow startup times in DPDK-OVS when backing memory with
1G hugepages instead of 2M hugepages.
Currently we're mapping 2M hugepages as the memory backend for DPDK OVS.
In the future we would like to allocate this memory from the 1G hugepage
pool. Currently in our deployments we have a significant amount of
1G hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M
hugepages.
Typical setup for 2M hugepages:
GRUB:
hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
$ grep hugetlbfs /proc/mounts
nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
Typical setup for 1GB hugepages:
GRUB:
hugepagesz=1G hugepages=56 default_hugepagesz=1G
$ grep hugetlbfs /proc/mounts
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
DPDK OVS startup times based on the ovs-vswitchd.log logs:
* 2M (2G memory allocated) - startup time ~3 sec:
2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
* 1G (56G memory allocated) - startup time ~13 sec:
2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04
with kernels 3.13.0-117-generic and 4.4.0-78-generic.
You can shorten the time by this:
(1) Mount 1 GB hugepages into two directories.
nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much you want to use in OVS> 0 0
nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
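For example, assuming 2G is enough for the OVS mount (the size value here is only an illustration), the fstab entry and the equivalent manual mount could look like this:
nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=2G 0 0
$ mount -t hugetlbfs -o pagesize=1G,size=2G nodev /mnt/huge_ovs_1G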
I understood (reading Imre) that this does not really work because of
non-deterministic allocation of hugepages in a NUMA architecture,
e.g. we would end up (potentially) using hugepages allocated on
different nodes even when accessing the OVS directory.
Did I understand this correctly?
Did you try step 2? Sergio also gives more options in another email in this
thread for your reference.
Thanks,
Jianfeng
@Jianfeng: Step (1) will not help in our case, because 'mount' will not allocate
hugepages from NUMA1 as long as the system still has free hugepages on NUMA0.
I have 56G allocated as 1G hugepages, i.e. 28G of hugepages available on each
NUMA node. If the mounts are performed via fstab, we end up in one of the
following scenarios at random.
First mount for OVS, then for VMs:
+---------------------------------------+---------------------------------------+
|                 NUMA0                 |                 NUMA1                 |
+---------------------------------------+---------------------------------------+
| OVS (2G)          | VMs (26G)         | VMs (28G)                             |
+---------------------------------------+---------------------------------------+
First mount for VMs, then OVS:
+---------------------------------------+---------------------------------------+
|                 NUMA0                 |                 NUMA1                 |
+---------------------------------------+---------------------------------------+
| VMs (28G)                             | VMs (26G)         | OVS (2G)          |
+---------------------------------------+---------------------------------------+
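Which NUMA node the pages actually came from can be verified in sysfs, for example (paths assume 1G pages on a two-socket system):
$ cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages
$ cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/free_hugepages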
This is why I suggested step 2, to allocate memory in an interleaved way.
Did you try that?
Thanks,
Jianfeng
@Marco: After the hugepages were allocated, the ones in the OVS directory were
either all from NUMA0 or all from NUMA1, but not from both (the outcome differs
after a reboot). This caused an error at DPDK startup, because one hugepage was
requested from each NUMA node, but no hugepages had been allocated on one of
them.
(2) Force the memory interleave policy:
$ numactl --interleave=all ovs-vswitchd ...
Note: keep the huge-dir and socket-mem options, "--huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024".
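For example, reusing the EAL arguments from the logs above (how your OVS build passes EAL arguments, and the remaining ovs-vswitchd daemon options, are deployment-specific and omitted here), the invocation could look roughly like:
$ numactl --interleave=all ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024 ...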
@Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all
ovs-vswitchd ...' cannot help, because all the hugepages mounted in the OVS
directory will be from one of the NUMA nodes. The DPDK application requires
one 1G hugepage from each of the NUMA nodes, so DPDK returns with an error.
I have also tried without Step (1), and we still see the slower startup.
Currently I'm looking into Sergio's mail.
Br,
Imre