> -----Original Message-----
> From: Tobias Hofmann -T (tohofman - AAP3 INC at Cisco)
> [mailto:tohof...@cisco.com]
> Sent: Friday, April 5, 2019 9:39 PM
> To: b...@openvswitch.org; Burakov, Anatoly <anatoly.bura...@intel.com>
> Cc: Shriroop Joshi (shrirjos) <shrir...@cisco.com>; Stokes, Ian
> <ian.sto...@intel.com>
> Subject: Re: [ovs-discuss] OVS-DPDK fails after clearing buffer
>
> Hi Anatoly,
>
> I just wanted to follow up on the issue reported below. (It has already
> been two weeks.)
>
> I don't really understand the first solution you suggested, "use IOVA as
> VA mode". Does that mean I should load the vfio-pci driver before I set
> dpdk-init to true, i.e. do a 'modprobe vfio-pci'? I do use vfio-pci, but
> currently I don't load it until I actually bind an interface to it.

Hi Tobias,

As far as I can remember, in 18.08, IOVA as VA mode will be enabled if:

0) vfio-pci is loaded ('modprobe vfio-pci'), the IOMMU is enabled in the
   BIOS, and so on;
1) you have *at least one physical device* (otherwise EAL defaults to
   IOVA as PA mode); and
2) *all* of your *physical* devices are bound to vfio-pci.

Provided all of this is true, DPDK should run in IOVA as VA mode.
Alternatively, DPDK 17.11 and 18.11 will have an --iova-mode command-line
switch which will allow forcing IOVA as VA mode if possible, but I'm not
sure if 18.08 has it.
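As an illustration, a minimal sequence satisfying 0) and 2) could look like
the following. (The PCI address 0000:01:00.0 is just a placeholder for your
NIC, dpdk-devbind.py ships in DPDK's usertools directory, and the IOMMU must
also be enabled on the kernel command line, e.g. intel_iommu=on on Intel
systems.)

    # Load the VFIO driver before OVS-DPDK is initialized
    modprobe vfio-pci

    # Bind every physical NIC that DPDK will use to vfio-pci
    dpdk-devbind.py --bind=vfio-pci 0000:01:00.0

    # Confirm the device now appears under "Network devices using
    # DPDK-compatible driver"
    dpdk-devbind.py --status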
>
> Also, to answer your last question: Transparent HugePages are enabled.
> I've just disabled them and was still able to reproduce the issue.

Unfortunately, I can't be of much help here, as I haven't looked into how
the VM caches work on Linux, let alone what happens when hugepages end up
in said cache. I obviously don't know the specifics of your use case and
whether it's really necessary to drop caches; however, a cursory Google
search indicates that the general sentiment is that you shouldn't be
dropping caches in the first place, and that it is not good practice in
general.

> Regards
> Toby
>
> On 3/21/19, 12:19 PM, "Ian Stokes" <ian.sto...@intel.com> wrote:
>
> On 3/20/2019 10:37 PM, Tobias Hofmann -T (tohofman - AAP3 INC at Cisco)
> via discuss wrote:
> > Hello,
>
> Hi,
>
> I wasn't sure at first glance what was happening, so I discussed it with
> Anatoly (Cc'd), who has worked a considerable amount with DPDK memory
> models. Please see the response below for what the suspected issue is.
> Anatoly, thanks for your input on this.
>
> > I want to use Open vSwitch with DPDK enabled. For this purpose, I
> > first allocate 512 HugePages of size 2MB to have a total of 1GB of
> > HugePage memory available for OVS-DPDK. (I don't set any value for
> > dpdk-socket-mem, so the default value of 1GB is taken.) Then I set
> > dpdk-init=true. This normally works fine.
> >
> > However, I have realized that I can't allocate HugePages from memory
> > that is inside the buff/cache (visible through 'free -h'). To solve
> > this issue, I decided to clear the cache/buffer in Linux before
> > allocating HugePages by running 'echo 1 > /proc/sys/vm/drop_caches'.
> >
> > After that, allocation of the HugePages still works fine. However,
> > when I then run 'ovs-vsctl set open_vswitch
> > other_config:dpdk-init=true', the process crashes, and inside
> > ovs-vswitchd.log I observe the following:
> >
> > ovs-vswitchd log output:
> >
> > 2019-03-18T13:32:41.112Z|00015|dpdk|ERR|EAL: Can only reserve 270
> > pages from 512 requested
> >
> > Current CONFIG_RTE_MAX_MEMSEG=256 is not enough
>
> After you drop the cache, from the above log it is clear that, as a
> result, the hugepages' physical addresses get fragmented, and DPDK
> cannot concatenate pages into segments any more. This results in a
> 1-page-per-segment situation, which makes you run out of memseg
> structures, of which there are only 256. We have no control over what
> addresses we get from the OS, so there's really no way to "unfragment"
> the pages.
>
> So, the above only happens when:
>
> 1) you're running in IOVA as PA mode (so, using real physical
>    addresses), and
> 2) your hugepages are heavily fragmented.
>
> Possible solutions for this are:
>
> 1. Use IOVA as VA mode (so, use VFIO, not igb_uio). This way, the pages
>    will still be fragmented, but the IOMMU will remap them to be
>    contiguous. This is the recommended option: with VFIO being
>    available, it is the better choice than igb_uio.
>
> 2. Use bigger page sizes. Strictly speaking, this isn't a solution, as
>    memory would be fragmented too, but a standalone 1GB-long segment is
>    way more useful than a standalone 2MB-long segment.
>
> 3. Reboot (as you have done), and maybe try re-reserving all pages,
>    e.g.:
>    i.   Clean your hugetlbfs contents to free any leftover pages.
>    ii.  echo 0 > /sys/kernel/mm/hugepages/hugepages-<size>/nr_hugepages
>    iii. echo 512 > /sys/kernel/mm/hugepages/hugepages-<size>/nr_hugepages
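To make step 3 concrete for your setup (512 pages of 2MB), the
re-reservation could look like the following. I'm assuming hugetlbfs is
mounted at /dev/hugepages here; the mount point varies by distribution.

    # Remove leftover files on hugetlbfs so their pages can actually be freed
    rm -f /dev/hugepages/*

    # Release all 2MB hugepages back to the kernel...
    echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

    # ...then reserve 512 of them again (1GB total) and check how many
    # the kernel was actually able to set aside
    echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
    cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages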
>
> Alternatively, if you upgrade to OVS 2.11, it will use DPDK 18.11. This
> would make a difference, since as of DPDK 18.05+ we no longer require
> PA-contiguous segments.
>
> I would also question why these pages are in the regular page cache in
> the first place. Are transparent hugepages enabled?
>
> HTH
> Ian
>
> > Please either increase it or request less amount of memory.
> >
> > 2019-03-18T13:32:41.112Z|00016|dpdk|ERR|EAL: Cannot init memory
> >
> > 2019-03-18T13:32:41.128Z|00002|daemon_unix|ERR|fork child died before
> > signaling startup (killed (Aborted))
> >
> > 2019-03-18T13:32:41.128Z|00003|daemon_unix|EMER|could not detach from
> > foreground session
> >
> > Tech Details:
> >
> > * Open vSwitch version: 2.9.2
> > * DPDK version: 17.11
> > * System has only a single NUMA node.
> >
> > This problem is consistently reproducible when there is a relatively
> > high amount of memory in the buffer/cache (usually around 5GB) and the
> > buffer is then cleared with the command outlined above.
> >
> > On the Internet, I found some posts saying that this is due to memory
> > fragmentation, but normally I'm not even able to allocate HugePages in
> > the first place when my memory is already fragmented. In this
> > scenario, however, the allocation of HugePages works totally fine
> > after clearing the buffer, so why would they be fragmented?
> >
> > A workaround that I know of is a reboot.
> >
> > I'd be very grateful for any opinion on that.
> >
> > Thank you
> >
> > Tobias
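On the "why would they be fragmented?" question at the end: a successful
hugepage reservation only means the kernel found 512 free 2MB-aligned
blocks somewhere; it says nothing about whether those blocks sit next to
each other, which is what DPDK 17.11 needs in order to merge pages into
larger segments. One rough way to see how scarce physically contiguous
memory is (e.g. before and after dropping caches) is the buddy allocator's
statistics:

    # Each column is the number of free blocks of order N, i.e. blocks of
    # 4KB * 2^N; a 2MB hugepage corresponds to order 9 on x86. Few or no
    # high-order blocks means physically contiguous memory is fragmented.
    cat /proc/buddyinfo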