> -----Original Message-----
> From: Tan, Jianfeng [mailto:[email protected]]
> Sent: June 6, 2017 16:32
> To: Imre Pinter <[email protected]>; Marco Varlese <[email protected]>; [email protected]
> Cc: Gabor Halász <[email protected]>; Péter Suskovics <[email protected]>
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
>
> On 6/6/2017 8:39 PM, Imre Pinter wrote:
> > Hi guys,
> >
> > Thanks for the replies. See my comments inline.
> >
> > -----Original Message-----
> > From: Tan, Jianfeng [mailto:[email protected]]
> > Sent: June 2, 2017 3:40
> > To: Marco Varlese <[email protected]>; Imre Pinter <[email protected]>; [email protected]
> > Cc: Gabor Halász <[email protected]>; Péter Suskovics <[email protected]>
> > Subject: RE: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >
> >> -----Original Message-----
> >> From: Marco Varlese [mailto:[email protected]]
> >> Sent: Thursday, June 1, 2017 6:12 PM
> >> To: Tan, Jianfeng; Imre Pinter; [email protected]
> >> Cc: Gabor Halász; Péter Suskovics
> >> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >>
> >> On Thu, 2017-06-01 at 08:50 +0000, Tan, Jianfeng wrote:
> >>>> -----Original Message-----
> >>>> From: users [mailto:[email protected]] On Behalf Of Imre Pinter
> >>>> Sent: Thursday, June 1, 2017 3:55 PM
> >>>> To: [email protected]
> >>>> Cc: Gabor Halász; Péter Suskovics
> >>>> Subject: [dpdk-users] Slow DPDK startup with many 1G hugepages
> >>>>
> >>>> Hi,
> >>>>
> >>>> We experience slow startup times in DPDK-OVS when backing memory with 1G hugepages instead of 2M hugepages.
> >>>> Currently we map 2M hugepages as the memory backend for DPDK OVS.
> >>>> In the future we would like to allocate this memory from the 1G hugepage pool. Currently our deployments have a significant amount of 1G hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M hugepages.
> >>>>
> >>>> Typical setup for 2M hugepages:
> >>>> GRUB:
> >>>> hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
> >>>>
> >>>> $ grep hugetlbfs /proc/mounts
> >>>> nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> >>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >>>>
> >>>> Typical setup for 1G hugepages:
> >>>> GRUB:
> >>>> hugepagesz=1G hugepages=56 default_hugepagesz=1G
> >>>>
> >>>> $ grep hugetlbfs /proc/mounts
> >>>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >>>>
> >>>> DPDK OVS startup times, based on the ovs-vswitchd.log logs:
> >>>>
> >>>> * 2M (2G memory allocated) - startup time ~3 sec:
> >>>> 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> >>>> 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> >>>>
> >>>> * 1G (56G memory allocated) - startup time ~13 sec:
> >>>> 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> >>>> 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> >>>>
> >>>> I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04 with kernels 3.13.0-117-generic and 4.4.0-78-generic.
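
For reference, a comparable startup-time comparison can be reproduced outside OVS with testpmd alone; the sketch below is illustrative only, with the core mask and --socket-mem taken from the EAL lines above and the remaining options (testpmd path, -n memory channels, piping "quit" to leave the interactive prompt) assumed:

$ time sh -c 'echo quit | ./testpmd -c 0x1 -n 4 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024 -- -i'
$ time sh -c 'echo quit | ./testpmd -c 0x1 -n 4 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024 -- -i'

Most of the measured difference should appear before the interactive prompt, i.e. during EAL hugepage mapping.
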
> >>>
> >>> You can shorten the time by this:
> >>>
> >>> (1) Mount 1 GB hugepages into two directories.
> >>> nodev /mnt/huge_ovs_1G hugetlbfs rw,relatime,pagesize=1G,size=<how much you want to use in OVS> 0 0
> >>> nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >>
> >> I understood (reading Imre) that this does not really work because of the non-deterministic allocation of hugepages in a NUMA architecture,
> >> e.g. we would end up (potentially) using hugepages allocated on different nodes even when accessing the OVS directory.
> >> Did I understand this correctly?
> >
> > Did you try step 2? And Sergio also gives more options in another email in this thread for your reference.
> >
> > Thanks,
> > Jianfeng
> >
> > @Jianfeng: Step (1) will not help in our case, since 'mount' will not allocate hugepages from NUMA1 until the system has free hugepages on NUMA0.
> > I have 56G of hugepages allocated with 1G page size, which means 28G of hugepages available on each NUMA node. If the mounts are performed via fstab, we end up in one of the following scenarios at random.
> >
> > First mount for OVS, then for VMs:
> > +---------------------------------------+---------------------------------------+
> > | NUMA0                                 | NUMA1                                 |
> > +---------------------------------------+---------------------------------------+
> > | OVS(2G) | VMs(26G)                    | VMs (28G)                             |
> > +---------------------------------------+---------------------------------------+
> >
> > First mount for VMs, then OVS:
> > +---------------------------------------+---------------------------------------+
> > | NUMA0                                 | NUMA1                                 |
> > +---------------------------------------+---------------------------------------+
> > | VMs (28G)                             | VMs(26G) | OVS(2G)                    |
> > +---------------------------------------+---------------------------------------+
>
> This is why I suggested step 2, to allocate memory in an interleaved way. Did you try that?
>
> Thanks,
> Jianfeng

I've double-checked it, and if I combine Step (1) and Step (2), then OVS startup ends with the following error:

EAL: Detected 32 lcore(s)
EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: Probing VFIO support...
EAL: Not enough memory available on socket 1! Requested: 1024MB, available: 0MB
PANIC in rte_eal_init(): Cannot init memory

I experienced the same behavior with testpmd. However, when they use hugepages from the 'huge_qemu_1G' mountpoint, both OVS and testpmd start properly.

Br,
Imre
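
As a side note on the NUMA behavior described above, the per-node split of the pre-allocated 1G pool can be inspected via sysfs; a quick sketch (node0/node1 numbering assumed, as in the tables above):

$ cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
$ cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
$ cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages

Comparing free_hugepages per node before and after starting OVS should show which node the pages behind the size-limited mount were actually taken from.
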
> > @Marco: After the hugepages were allocated, the ones in the OVS directory were either from NUMA0 or NUMA1, but not from both (the setup can differ after a reboot). This caused an error in DPDK startup, since one hugepage was requested from each NUMA node, but no hugepages were allocated on the other NUMA node.
> >
> >>> (2) Force to use the memory interleave policy:
> >>> $ numactl --interleave=all ovs-vswitchd ...
> >>>
> >>> Note: keep the huge-dir and socket-mem options, "--huge-dir /mnt/huge_ovs_1G --socket-mem 1024,1024".
> >>>
> > @Jianfeng: If I perform Step (1), then Step (2) 'numactl --interleave=all ovs-vswitchd ...' cannot help, because all the hugepages mounted to the OVS directory will be from one of the NUMA nodes. The DPDK application requires one 1G hugepage from each of the NUMA nodes, so DPDK returns with an error.
> > I have also tried without Step (1), and we still have the slower startup.
> > Currently I'm looking into Sergio's mail.
> >
> > Br,
> > Imre
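
For completeness, the Step (2) experiment without the separate OVS mount can also be tried with testpmd against the shared 1G mountpoint; a minimal sketch, using the same assumed options as above:

$ numactl --interleave=all ./testpmd -c 0x1 -n 4 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024 -- -i

With --socket-mem 1024,1024, EAL still needs at least one free 1G page on each NUMA node, so interleaving only influences where the pages come from; it does not reduce the time spent mapping the whole 56G pool, which matches the observation above that startup is still slow without Step (1).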
