> -----Original Message-----
> From: Sergio Gonzalez Monroy [mailto:[email protected]]
> Sent: 1 June 2017 11:03
> To: Imre Pinter <[email protected]>; [email protected]
> Cc: Gabor Halász <[email protected]>; Péter Suskovics <[email protected]>
> Subject: Re: [dpdk-users] Slow DPDK startup with many 1G hugepages
>
> On 01/06/2017 08:55, Imre Pinter wrote:
> > Hi,
> >
> > We experience slow startup times in DPDK-OVS when backing its memory with 1G hugepages instead of 2M hugepages.
> > Currently we map 2M hugepages as the memory backend for DPDK OVS. In the future we would like to allocate this memory from the 1G hugepage pool. In our current deployments we have a significant amount of 1G hugepages allocated (min. 54G) for VMs and only 2G of memory on 2M hugepages.
> >
> > Typical setup for 2M hugepages:
> > GRUB:
> > hugepagesz=2M hugepages=1024 hugepagesz=1G hugepages=54 default_hugepagesz=1G
> >
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_ovs_2M hugetlbfs rw,relatime,pagesize=2M 0 0
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >
> > Typical setup for 1G hugepages:
> > GRUB:
> > hugepagesz=1G hugepages=56 default_hugepagesz=1G
> >
> > $ grep hugetlbfs /proc/mounts
> > nodev /mnt/huge_qemu_1G hugetlbfs rw,relatime,pagesize=1G 0 0
> >
> > DPDK OVS startup times based on ovs-vswitchd.log:
> >
> > * 2M (2G memory allocated) - startup time ~3 sec:
> > 2017-05-03T08:13:50.177Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_ovs_2M --socket-mem 1024,1024
> > 2017-05-03T08:13:50.708Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> >
> > * 1G (56G memory allocated) - startup time ~13 sec:
> > 2017-05-03T08:09:22.114Z|00009|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x1 --huge-dir /mnt/huge_qemu_1G --socket-mem 1024,1024
> > 2017-05-03T08:09:32.706Z|00010|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation
> >
> > I used DPDK 16.11 for OVS and testpmd, and tested on Ubuntu 14.04 with kernels 3.13.0-117-generic and 4.4.0-78-generic.
> >
> > We had a discussion with Mark Gray (from Intel), and he came up with the following items:
> >
> > · The ~10 sec time difference is there with testpmd as well.
> >
> > · They believe it is kernel overhead (mmap is slow, perhaps it is zeroing the pages). The following code from eal_memory.c, with timing added around it, produces the timing printout during EAL startup:
> >
> > 469 /* map the segment, and populate page tables,
> > 470  * the kernel fills this segment with zeros */
> > 468 uint64_t start = rte_rdtsc();
> > 471 virtaddr = mmap(vma_addr, hugepage_sz, PROT_READ | PROT_WRITE,
> > 472                 MAP_SHARED | MAP_POPULATE, fd, 0);
> > 473 if (virtaddr == MAP_FAILED) {
> > 474     RTE_LOG(DEBUG, EAL, "%s(): mmap failed: %s\n", __func__,
> > 475             strerror(errno));
> > 476     close(fd);
> > 477     return i;
> > 478 }
> > 479
> > 480 if (orig) {
> > 481     hugepg_tbl[i].orig_va = virtaddr;
> > 482     printf("Original mapping of page %u took: %"PRIu64" ticks, %"PRIu64" ms\n",
> > 483            i, rte_rdtsc() - start,
> > 484            (rte_rdtsc() - start) * 1000 /
> > 485            rte_get_timer_hz());
> > 486 }
> >
> > A solution could be to mount 1G hugepages on two separate directories: 2G for OVS and the remainder for the VMs. However, the NUMA location of these hugepages is non-deterministic, since mount cannot handle NUMA-related parameters when mounting hugetlbfs, and fstab performs the mounts during boot.
> >
> > Do you have a solution on how to use 1G hugepages for VMs and still have a reasonable DPDK EAL startup time?
>
> In theory, one solution would be to use cgroup, as described here:
> http://dpdk.org/ml/archives/dev/2017-February/057742.html
> http://dpdk.org/ml/archives/dev/2017-April/063442.html
>
> Then use the 'numactl --interleave' policy.
>
> I said in theory because it does not seem to work as one would expect, so the patch proposed in the above threads would be a solution, by forcing allocation from a specific NUMA node for each page.
>
> Thanks,
> Sergio
>
Thanks for the reply, Sergio!

The following patch (v5) at the end of the mentioned mail thread seems to be solving the issue:
http://dpdk.org/dev/patchwork/patch/25069/

Thanks,
Imre

> > Thanks,
> > Imre
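
For reference, the per-page mapping cost discussed in the thread can be reproduced outside DPDK with a small standalone program. The sketch below is only an illustration, not DPDK code: the file path assumes the /mnt/huge_qemu_1G mount from the setup above and a 1G page size (point it at the 2M mount and size for comparison), and it simply times one mmap() with MAP_POPULATE on a hugetlbfs file.

/* time_hugepage_mmap.c - rough sketch, not part of DPDK: times how long
 * the kernel takes to populate (allocate and zero) one hugepage via
 * mmap() with MAP_POPULATE.  Path and page size are assumptions taken
 * from the setup in the thread; adjust them for your system.
 * Build: gcc -O2 -o time_hugepage_mmap time_hugepage_mmap.c
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const size_t page_sz = 1UL << 30;                   /* 1G page; use 2M to compare */
    const char *path = "/mnt/huge_qemu_1G/timing_test"; /* hugetlbfs mount from the thread */

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* MAP_POPULATE makes the kernel allocate and zero the page right here,
     * which is where the EAL startup time reportedly goes. */
    void *va = mmap(NULL, page_sz, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_POPULATE, fd, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (va == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("mmap+populate of one %zu-byte page took %.3f ms\n", page_sz, ms);

    munmap(va, page_sz);
    close(fd);
    unlink(path);
    return 0;
}

Running this against both mounts should show whether the extra ~10 seconds really is spent inside mmap() populating and zeroing the pages, independent of OVS and the EAL.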

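Similarly, the mechanism Sergio describes, forcing each page's allocation onto a chosen NUMA node, can be illustrated with mbind() at the syscall level. This is only a sketch of the idea, not the linked patch: the path, node number, and page size are placeholder assumptions, and it needs libnuma's numaif.h (link with -lnuma).

/* numa_bind_hugepage.c - rough sketch (not the DPDK patch): pin one
 * hugetlbfs-backed page to a specific NUMA node by setting a vma policy
 * with mbind() before the page is faulted in.
 * Build: gcc -O2 -o numa_bind_hugepage numa_bind_hugepage.c -lnuma
 */
#include <fcntl.h>
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t page_sz = 1UL << 30;                 /* 1G hugepage (placeholder) */
    const char *path = "/mnt/huge_qemu_1G/bind_test"; /* hugetlbfs mount (placeholder) */
    const int target_node = 0;                        /* NUMA node to allocate from */

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }

    /* Map WITHOUT MAP_POPULATE so the policy can be applied before the
     * kernel actually allocates the page. */
    void *va = mmap(NULL, page_sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (va == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Bind this mapping to the chosen node. */
    unsigned long nodemask = 1UL << target_node;
    if (mbind(va, page_sz, MPOL_BIND, &nodemask,
              sizeof(nodemask) * 8, 0) != 0)
        perror("mbind");

    /* First touch faults the hugepage in; with the vma policy above the
     * kernel should take it from target_node. */
    memset(va, 0, page_sz);

    munmap(va, page_sz);
    close(fd);
    unlink(path);
    return 0;
}

Note that the order matters here: with MAP_POPULATE the page would already be allocated by the time mbind() runs, which is why this sketch maps first and only touches the page afterwards.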