> On Mar 12, 2017, at 6:39 PM, Kai Zhang <[email protected]> wrote:
>
> > Your application may be attaching to the same port for each core. Normally
> > this means each core could be allocating memory and the 4th core just
> > goes over the amount of memory you have reserved.
>
> I don't think so, because the error is in rte_eal_init(), which is
> executed in the first line of the main() function. At that time, the other
> threads are not even launched.
>
> Is it possible to consider this as a bug in DPDK?

One more thing: I run Pktgen as two processes all of the time. The big
difference is that I do not run them in primary and secondary modes. I run
two independent instances of Pktgen at the same time without seeing this
type of problem.

If the failure is associated with the primary/secondary application model,
then it could be a bug in that code, as a lot of syncing up between the two
processes needs to be done because of memory/device sharing.

One problem with P/S applications is that memory needs to be mapped at the
same address in both processes, and Linux has random memory mapping built in
for security reasons. I forget the name of the mode in Linux to turn off the
random page mapping, and Google is not working for me ATM.

Does your application require running as a primary/secondary application?

>
> Regards,
> Kai
>
> > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > EAL: Error - exiting with code: 1
> > Cause: Requested device 0000:02:00.0 cannot be used
> >
> > Regards,
> > Kai
> >
> > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <[email protected]> wrote:
> >
> > Command line:
> > primary: sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
> > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
> >
> > The configurations are as follows:
> > A) 1 x Intel E5-2650 v4, 12 cores [UMA], XL710 40GbE, bind 02:00.0,
> > 2048 x 4k huge page
> > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02) [<<- Only bind this one]
> > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
> > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> >          Socket 0
> >          --------
> > Core 0   [0, 12]
> > Core 1   [1, 13]
> > Core 2   [2, 14]
> > Core 3   [3, 15]
> > Core 4   [4, 16]
> > Core 5   [5, 17]
> > Core 8   [6, 18]
> > Core 9   [7, 19]
> > Core 10  [8, 20]
> > Core 11  [9, 21]
> > Core 12  [10, 22]
> > Core 13  [11, 23]
> >
> > B) 2 x Intel E5-2640 v4, 10 cores [NUMA], No Port Bind,
> > 2048 x 4k huge page
> > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
> >          Socket 0    Socket 1
> >          --------    --------
> > Core 0   [0, 20]     [10, 30]
> > Core 1   [1, 21]     [11, 31]
> > Core 2   [2, 22]     [12, 32]
> > Core 3   [3, 23]     [13, 33]
> > Core 4   [4, 24]     [14, 34]
> > Core 8   [5, 25]     [15, 35]
> > Core 9   [6, 26]     [16, 36]
> > Core 10  [7, 27]     [17, 37]
> > Core 11  [8, 28]     [18, 38]
> > Core 12  [9, 29]     [19, 39]
> >
> > Ah, as machine B does not have a 40GbE NIC, I did not bind any NIC and ran my
> > program with locally generated packets. But I am using other DPDK features,
> > such as memory sharing and message passing. Maybe that is the reason it
> > works correctly? I can only access machine B remotely, so I am unable to
> > install a NIC on it. I have another PC that is used as a client, but it only
> > has four cores, so it cannot be used for verification either...
> >
> > Regards,
> > Kai
> >
> >
> > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <[email protected]> wrote:
> >
> > > On Mar 11, 2017, at 9:45 AM, Kai Zhang <[email protected]> wrote:
> > >
> > > Hi Keith,
> > >
> > > Thank you for your reply.
> > >
> > > I have tested my program on two machines:
> > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
> > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
> > >
> > > I am very sure that the primary process uses different cores from the
> > > secondary process. The strange thing is that my program works correctly
> > > on machine B. But on machine A, the above issue happens with more than 4
> > > cores assigned to the secondary process.
> > >
> > > I have tried to assign cores 1-5 to the secondary process and also tried
> > > other core assignment policies, but the error still happens in
> > > rte_eal_init() with more than 4 cores.
> >
> > It would be nice to see both command lines. I am not sure I can help more;
> > all I can do is suggest some ideas to look at.
> >
> > Does machine B have the same number and type of NICs? Use ‘lspci | grep
> > Ethernet’ to get a list of all Ethernet devices on both machines.
> >
> > What is the number of hugepages you have allocated on both machines?
> >
> > Also look at the cpu_layout.py script to see why adding the 5th core would
> > be different on the two machines and try to make them the same.
> >
> > >
> > > Regards,
> > > Kai
> > >
> > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <[email protected]>
> > > wrote:
> > >
> > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <[email protected]> wrote:
> > > >
> > > > Hi, there
> > > >
> > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611 with Linux
> > > > kernel version 3.8.0-30.
> > > >
> > > > I have a master process and a secondary process. When I run the
> > > > secondary process with 4 or fewer cores, it works correctly. Such as:
> > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
> > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
> > > >
> > > > However, there is an error in rte_eal_init() if I assign more than 4
> > > > cores.
> > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
> > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
> > > >
> > > > EAL: Cannot mmap device resource file /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > EAL: Error - exiting with code: 1
> > > > Cause: Requested device 0000:02:00.0 cannot be used
> > >
> > > I assume you have at least 8 cores. Have you tried -l 1-5 on the
> > > secondary process?
> > >
> > > You did not show the primary process command line, but if you use 1-5
> > > then you can only give the primary process -l 6-7, or two cores. It is
> > > always reasonable to leave core zero for Linux to use.
> > >
> > > Also it could be that you ran out of memory or of the hugepages you
> > > allocated to the system.
> > >
> > > >
> > > > Does anyone know why this happens?
> > > >
> > > > Thanks a lot,
> > > > Kai Zhang
> > >
> > > Regards,
> > > Keith
> > >
> >
> > Regards,
> > Keith
> >
>
> Regards,
> Keith

Regards,
Keith
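
P.S. The random memory mapping mentioned above is most likely address space layout randomization (ASLR), which Linux exposes through /proc/sys/kernel/randomize_va_space (0 = off, 1 = stack/mmap/VDSO randomized, 2 = heap randomized as well). Below is a minimal Python sketch, not taken from DPDK or from this thread, that only reports the current setting; the helper names describe_aslr and current_aslr are made up for illustration.

```python
from pathlib import Path

# Kernel knob that controls ASLR on Linux.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of each value, following the kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "enabled (stack, mmap base, VDSO)",
    2: "enabled (stack, mmap base, VDSO, heap)",
}


def describe_aslr(raw: str) -> str:
    """Turn the raw file contents (for example '2\\n') into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown mode {value}")


def current_aslr(path: Path = ASLR_PATH) -> str:
    """Read the knob; return 'unavailable' if the file cannot be read."""
    try:
        return describe_aslr(path.read_text())
    except OSError:
        return "unavailable"


if __name__ == "__main__":
    print("ASLR:", current_aslr())
```

If it reports "enabled", one thing to try while debugging is turning it off as root with `sysctl -w kernel.randomize_va_space=0` before starting the primary process. That weakens security system-wide, so it is a diagnostic step rather than a recommended permanent setting, and it may or may not be the cause of the mmap failure here.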
