> On Mar 12, 2017, at 6:39 PM, Kai Zhang <[email protected]> wrote:
> 
> 
> Your application may be attaching to the same port for each core. Normally 
> this means each core could be allocating memory, and the 4th core just 
> goes over the amount of memory you have reserved.
> 
> I don't think so, because the error is in rte_eal_init(), which is 
> executed on the first line of the main() function. At that time, the other 
> threads are not even launched.
> 
> Is it possible to consider this as a bug in DPDK?

One more thing: I run Pktgen as two processes all of the time. The big 
difference is that I do not run in primary and secondary modes. I run two 
independent instances of pktgen at the same time without seeing this type of 
problem. If the failure is associated with the primary/secondary application 
model, then it could be a bug in that code, as a lot of syncing up between the 
two processes needs to be done because of memory/device sharing. One problem 
with P/S applications is that memory needs to be mapped at the same address in 
both processes, and Linux has random memory mapping built in for security 
reasons. I forget the name of the mode in Linux to turn off the random page 
mapping, and Google is not working for me ATM.
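
The random mapping described above is most likely address space layout 
randomization (ASLR), which Linux exposes through the 
kernel.randomize_va_space sysctl. The sketch below only reports the current 
setting; the helper name aslr_describe is made up for illustration, and 
whether disabling ASLR resolves this particular mmap failure is an 
assumption that would need testing on machine A.

```shell
#!/bin/sh
# Kernel knob that controls ASLR on Linux.
ASLR_KNOB=/proc/sys/kernel/randomize_va_space

# Hypothetical helper: map the knob's numeric value to a short description.
aslr_describe() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial" ;;   # stack, mmap base, and VDSO are randomized
        2) echo "full" ;;      # as 1, plus the heap (brk) is randomized
        *) echo "unknown" ;;
    esac
}

if [ -r "$ASLR_KNOB" ]; then
    value=$(cat "$ASLR_KNOB")
    echo "ASLR setting: $value ($(aslr_describe "$value"))"
else
    echo "cannot read $ASLR_KNOB"
fi

# To turn ASLR off until the next reboot (requires root), one option is:
#   sudo sysctl -w kernel.randomize_va_space=0
```

If the setting is 1 or 2, turning it off before starting the primary and 
then the secondary process is a cheap experiment. It lowers system security, 
so it is best restored afterwards.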

Does your application require running as a primary/secondary application?

> 
> Regards,
> Kai
> 
>  
> >
> > EAL: Cannot mmap device resource file 
> > /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > EAL: Error - exiting with code: 1
> >   Cause: Requested device 0000:02:00.0 cannot be used
> >
> > Regards,
> > Kai
> >
> > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <[email protected]> wrote:
> >
> > Command line:
> > primary:      sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
> > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
> >
> > The configurations are as follows:
> > A) 1 x Intel E5-2650 v4, 12 cores [UMA],     XL710 40GbE, bind 02:00.0,    
> > 2048 x 4k huge page
> > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 
> > for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
> > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 
> > for 40GbE QSFP+ (rev 02)
> > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
> > Connection (rev 03)
> > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
> > Connection (rev 03)
> >         Socket 0
> >         --------
> > Core 0  [0, 12]
> > Core 1  [1, 13]
> > Core 2  [2, 14]
> > Core 3  [3, 15]
> > Core 4  [4, 16]
> > Core 5  [5, 17]
> > Core 8  [6, 18]
> > Core 9  [7, 19]
> > Core 10 [8, 20]
> > Core 11 [9, 21]
> > Core 12 [10, 22]
> > Core 13 [11, 23]
> >
> > B) 2 x Intel E5-2640 v4, 10 cores [NUMA],      No Port Bind,    2048 x 4k 
> > huge page
> > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
> > Connection (rev 03)
> > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
> > Connection (rev 03)
> >         Socket 0        Socket 1
> >         --------        --------
> > Core 0  [0, 20]         [10, 30]
> > Core 1  [1, 21]         [11, 31]
> > Core 2  [2, 22]         [12, 32]
> > Core 3  [3, 23]         [13, 33]
> > Core 4  [4, 24]         [14, 34]
> > Core 8  [5, 25]         [15, 35]
> > Core 9  [6, 26]         [16, 36]
> > Core 10 [7, 27]         [17, 37]
> > Core 11 [8, 28]         [18, 38]
> > Core 12 [9, 29]         [19, 39]
> >
> > Ah, as machine B does not have a 40GbE NIC, I did not bind any NIC and ran 
> > my program with locally generated packets. But I am using other DPDK features, 
> > such as memory sharing and message passing. Maybe that is the reason it 
> > works correctly? I can only access machine B remotely, so I am unable to 
> > install a NIC on it. I have another PC that is used as a client that only 
> > has four cores, which also cannot be used for verification...
> >
> > Regards,
> > Kai
> >
> >
> > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <[email protected]> wrote:
> >
> > > On Mar 11, 2017, at 9:45 AM, Kai Zhang <[email protected]> wrote:
> > >
> > > Hi Keith,
> > >
> > > Thank you for your reply.
> > >
> > > I have tested my program on two machines
> > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
> > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
> > >
> > > I am very sure that the primary process uses different cores with the 
> > > secondary process. The strange thing is that my program works correctly 
> > > on machine B. But on machine A, the above issue happens with more than 4 
> > > cores assigned to the secondary process.
> > >
> > > I have tried to assign cores 1-5 to the secondary process and also tried 
> > > other core assignment policies, but the error still happens in 
> > > rte_eal_init() with more than 4 cores.
> >
> > It would be nice to see both command lines. I am not sure I can help more; 
> > all I can do is suggest some ideas to look at.
> >
> > Does machine B have the same number and type of NICs? Use ‘lspci | grep 
> > Ethernet’ to get a list of all Ethernet devices on both machines.
> >
> > What is the number of hugepages you have allocated on both machines?
> >
> > Also look at the cpu_layout.py script to see why adding the 5th core would 
> > be different on the two machines and try to make them the same.
> >
> > >
> > > Regards,
> > > Kai
> > >
> > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <[email protected]> 
> > > wrote:
> > >
> > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <[email protected]> wrote:
> > > >
> > > > Hi, there
> > > >
> > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611 with Linux
> > > > kernel version 3.8.0-30.
> > > >
> > > > I have a master process and a secondary process. When I run the 
> > > > secondary
> > > > process with less than or equal to 4 cores, it works correctly. Such as:
> > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
> > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
> > > >
> > > > However, there is an error in rte_eal_init() if I assign more than 4
> > > > cores:
> > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
> > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
> > > >
> > > > EAL: Cannot mmap device resource file
> > > > /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > EAL: Error - exiting with code: 1
> > > >  Cause: Requested device 0000:02:00.0 cannot be used
> > >
> > > I assume you have at least 8 cores. Have you tried -l 1-5 on the 
> > > secondary process?
> > >
> > > You did not show the primary process command line, but if you use 1-5 
> > > then you can only give the primary process -l 6-7, or two cores. It is 
> > > always reasonable to leave core zero for Linux to use.
> > >
> > > Also it could be you ran out of memory or hugepages you allocated to the 
> > > system.
> > >
> > > >
> > > > Does anyone know why this happens?
> > > >
> > > > Thanks a lot,
> > > > Kai Zhang
> > >
> > > Regards,
> > > Keith
> > >
> > >
> >
> > Regards,
> > Keith
> >
> >
> >
> 
> Regards,
> Keith

Regards,
Keith
