Update: After unbinding the ports on machine A, the program passes rte_eal_init() and works correctly with the primary process for any number of cores.
From the error message when binding the port, I think there are some
resource allocation issues with the bound port. But why is it related with
the number of cores ...

EAL: Cannot mmap device resource file
/sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
EAL: Error - exiting with code: 1
  Cause: Requested device 0000:02:00.0 cannot be used

Regards,
Kai

On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <[email protected]> wrote:

> Command line:
> primary:   sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
> secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
>
> The configurations are as follows:
>
> A) 1 x Intel E5-2650 v4, 12 cores [UMA], XL710 40GbE, bind 02:00.0,
>    2048 x 4k huge page
>
> 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller
>         XL710 for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
> 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller
>         XL710 for 40GbE QSFP+ (rev 02)
> 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
>         Connection (rev 03)
> 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
>         Connection (rev 03)
>
>          Socket 0
>          --------
> Core 0   [0, 12]
> Core 1   [1, 13]
> Core 2   [2, 14]
> Core 3   [3, 15]
> Core 4   [4, 16]
> Core 5   [5, 17]
> Core 8   [6, 18]
> Core 9   [7, 19]
> Core 10  [8, 20]
> Core 11  [9, 21]
> Core 12  [10, 22]
> Core 13  [11, 23]
>
> B) 2 x Intel E5-2640 v4, 10 cores [NUMA], No Port Bind,
>    2048 x 4k huge page
>
> 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
>         Connection (rev 03)
> 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
>         Connection (rev 03)
>
>          Socket 0   Socket 1
>          --------   --------
> Core 0   [0, 20]    [10, 30]
> Core 1   [1, 21]    [11, 31]
> Core 2   [2, 22]    [12, 32]
> Core 3   [3, 23]    [13, 33]
> Core 4   [4, 24]    [14, 34]
> Core 8   [5, 25]    [15, 35]
> Core 9   [6, 26]    [16, 36]
> Core 10  [7, 27]    [17, 37]
> Core 11  [8, 28]    [18, 38]
> Core 12  [9, 29]    [19, 39]
>
> Ah, as machine B does not have a 40GbE, I did not bind any NIC and run my
> program with locally generated packets. But I am using other DPDK
> features, such as memory sharing and message passing. Maybe that is the
> reason it works correctly? I can only access machine B remotely, so I am
> unable to install a NIC on it. I have another PC that is used as a client
> that only has four cores, which also cannot be used for verification...
>
> Regards,
> Kai
>
>
> On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <[email protected]> wrote:
>
>>
>> > On Mar 11, 2017, at 9:45 AM, Kai Zhang <[email protected]> wrote:
>> >
>> > Hi Keith,
>> >
>> > Thank you for your reply.
>> >
>> > I have tested my program on two machines
>> > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
>> > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
>> >
>> > I am very sure that the primary process uses different cores with the
>> > secondary process. The strange thing is that my program works
>> > correctly on machine B. But on machine A, the above issue happens with
>> > more than 4 cores assigned to the secondary process.
>> >
>> > I have tried to assign cores 1-5 to the secondary process and also
>> > tried other core assignment policies, but the error still happens
>> > rte_eal_init() with more than 4 cores.
>>
>> It would be nice to see both command lines. I am not sure I can help
>> more all I can do is suggest some ideas to look at.
>>
>> Does machine B have the same number and type of NICs? Use
>> ‘lspci | grep Ethernet’ to get a list of all Ethernet devices on both
>> machines.
>>
>> What is the number of hugepages you have allocated for both machines.
>>
>> Also look at the cpu_layout.py script to see why adding the 5th core
>> would be different on the two machines and try to make them the same.
>>
>> >
>> > Regards,
>> > Kai
>> >
>> > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <[email protected]>
>> > wrote:
>> >
>> > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <[email protected]> wrote:
>> > >
>> > > Hi, there
>> > >
>> > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611 with
>> > > Linux kernel version 3.8.0-30.
>> > >
>> > > I have a master process and a secondary process. When I run the
>> > > secondary process with less than or equal to 4 cores, it works
>> > > correctly. Such as:
>> > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
>> > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
>> > >
>> > > However, there will be error in the rte_eal_init if I assign more
>> > > than 4 cores.
>> > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
>> > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
>> > >
>> > > EAL: Cannot mmap device resource file
>> > > /sys/bus/pci/devices/0000:02:00.0/resource0 to address:
>> > > 0x7fff65bfc000
>> > > EAL: Error - exiting with code: 1
>> > > Cause: Requested device 0000:02:00.0 cannot be used
>> >
>> > I assume you have at least 8 cores. Have you tried -l 1-5 on the
>> > secondary process.
>> >
>> > You did not show the primary process command line, but the if you use
>> > 1-5 then you can only give primary process -l 6-7 or two cores. It is
>> > always a reasonable thing is to leave core zero for linux to use.
>> >
>> > Also it could be you ran out of memory or hugepages you allocated to
>> > the system.
>> >
>> > >
>> > > Anyone knows why this happens?
>> > >
>> > > Thanks a lot,
>> > > Kai Zhang
>> >
>> > Regards,
>> > Keith
>> >
>> >
>>
>> Regards,
>> Keith
>>
>>
>
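For anyone comparing the -l and -c command lines quoted in this thread: a coremask sets bit N for lcore N, so -c 0x0f selects lcores 0-3 (not 4-7 as in the -l 4,5,6,7 example), while -c 0x1f and -l 0,1,2,3,4 do select the same lcores. The sketch below is only an illustration in Python. The helper names are hypothetical and not part of DPDK, and it handles only plain lists and ranges such as "0,1,2,3" or "1-5", not the full EAL -l syntax.

```python
def parse_core_list(spec):
    """Expand a simple core list such as "0,1,2,3,4" or "1-5" into sorted lcore ids.

    Hypothetical helper: only comma-separated ids and "lo-hi" ranges are handled.
    """
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def core_list_to_mask(spec):
    """Return the coremask (bit N set for lcore N) equivalent to a simple core list."""
    mask = 0
    for core in parse_core_list(spec):
        mask |= 1 << core
    return mask


def mask_to_core_list(mask):
    """Return the lcore ids whose bits are set in a coremask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def overlap(spec_a, spec_b):
    """Return lcore ids that appear in both core lists (e.g. primary vs secondary)."""
    return sorted(set(parse_core_list(spec_a)) & set(parse_core_list(spec_b)))
```

For example, hex(core_list_to_mask("4,5,6,7")) gives '0xf0', and overlap("0,1,2,3", "4,5,6,7,8") returns an empty list for the primary and secondary core lists given above.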

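On the hugepage question raised in the thread, one way to compare the two machines is to read the HugePages_* and Hugepagesize fields from /proc/meminfo on each. The following is a minimal Python sketch. The function names are made up for illustration, and it assumes the usual Linux /proc/meminfo layout of "Key:   value [kB]" lines.

```python
def hugepage_summary(meminfo_text):
    """Extract hugepage counters from the text of /proc/meminfo.

    Returns a dict such as {"HugePages_Total": 2048, "Hugepagesize": 2048, ...};
    Hugepagesize is reported by the kernel in kB, the other fields are page counts.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key.startswith("HugePages_") or key == "Hugepagesize":
            fields[key] = int(value.split()[0])
    return fields


def read_hugepage_summary(path="/proc/meminfo"):
    """Read the given meminfo file (Linux only) and summarise its hugepage fields."""
    with open(path) as f:
        return hugepage_summary(f.read())
```

Calling read_hugepage_summary() on both machines before starting the primary process, and again before starting the secondary, shows whether HugePages_Free has dropped to zero.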