> On Mar 12, 2017, at 6:39 PM, Kai Zhang <[email protected]> wrote:
> 
> 
> Your application may be attaching to the same port for each core. Normally 
> this means each core could be allocating memory and the 4th core just 
> goes over the amount of memory you have reserved.
> 
> I don't think so, because the error is in rte_eal_init(), which is 
> executed on the first line of the main() function. At that time, the other 
> threads are not even launched.
> 
> Is it possible to consider this as a bug in DPDK?

It is possible it is a bug, but we need to be able to reproduce it. Can 
you reproduce this problem with any of the DPDK example apps? If not, then we 
need to figure out a set of steps or code that produces the failure.
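
When writing up reproduction steps, it may help to double-check that the -l
lists and -c masks on each command line select the cores you intend, and that
the primary and secondary lists do not overlap. Below is a minimal sketch in
Python, assuming the -l argument is a plain comma-separated list with optional
ranges such as 1-5 (it does not cover the full EAL option syntax). The
function names are made up for illustration and are not part of DPDK.

```python
def parse_core_list(spec):
    """Expand a simple '-l' style list ('0,1,2,3' or '1-5,8') into sorted lcore ids.

    Assumption: only comma-separated integers and 'a-b' ranges are handled.
    """
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def cores_to_mask(cores):
    """Build the '-c' style bitmask: bit i is set when lcore i is selected."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def mask_to_cores(mask):
    """List the lcore ids whose bits are set in a '-c' style bitmask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def overlap(spec_a, spec_b):
    """Return the lcore ids that appear in both core lists."""
    return sorted(set(parse_core_list(spec_a)) & set(parse_core_list(spec_b)))


if __name__ == "__main__":
    print(hex(cores_to_mask(parse_core_list("4,5,6,7"))))  # 0xf0
    print(mask_to_cores(0x0F))                             # [0, 1, 2, 3]
    print(overlap("0,1,2,3", "4,5,6,7,8"))                 # []
```

Note that a mask of 0x0f selects lcores 0-3, while -l 4,5,6,7 corresponds to
0xf0, so those two command-line forms are not interchangeable.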

> 
> Regards,
> Kai
> 
>  
> >
> > EAL: Cannot mmap device resource file 
> > /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > EAL: Error - exiting with code: 1
> >   Cause: Requested device 0000:02:00.0 cannot be used
> >
> > Regards,
> > Kai
> >
> > On Sun, Mar 12, 2017 at 11:21 AM, Kai Zhang <[email protected]> wrote:
> >
> > Command line:
> > primary:      sudo ./primary -l 0,1,2,3 -n 4 --proc-type=primary
> > secondary: sudo ./secondary -l 4,5,6,7,8 -n 4 --proc-type=secondary
> >
> > The configurations are as follows:
> > A) 1 x Intel E5-2650 v4, 12 cores [UMA],     XL710 40GbE, bind 02:00.0,    
> > 2048 x 4k huge page
> > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 
> > for 40GbE QSFP+ (rev 02)   [<<- Only bind this one]
> > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 
> > for 40GbE QSFP+ (rev 02)
> > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
> > Connection (rev 03)
> > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
> > Connection (rev 03)
> >         Socket 0
> > --------
> > Core 0  [0, 12]
> > Core 1  [1, 13]
> > Core 2  [2, 14]
> > Core 3  [3, 15]
> > Core 4  [4, 16]
> > Core 5  [5, 17]
> > Core 8  [6, 18]
> > Core 9  [7, 19]
> > Core 10 [8, 20]
> > Core 11 [9, 21]
> > Core 12 [10, 22]
> > Core 13 [11, 23]
> >
> > B) 2 x Intel E5-2640 v4, 10 cores [NUMA],      No Port Bind,    2048 x 4k 
> > huge page
> > 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
> > Connection (rev 03)
> > 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network 
> > Connection (rev 03)
> >         Socket 0        Socket 1
> >         --------        --------
> > Core 0  [0, 20]         [10, 30]
> > Core 1  [1, 21]         [11, 31]
> > Core 2  [2, 22]         [12, 32]
> > Core 3  [3, 23]         [13, 33]
> > Core 4  [4, 24]         [14, 34]
> > Core 8  [5, 25]         [15, 35]
> > Core 9  [6, 26]         [16, 36]
> > Core 10 [7, 27]         [17, 37]
> > Core 11 [8, 28]         [18, 38]
> > Core 12 [9, 29]         [19, 39]
> >
> > Ah, as machine B does not have a 40GbE, I did not bind any NIC and ran my 
> > program with locally generated packets. But I am using other DPDK features, 
> > such as memory sharing and message passing. Maybe that is the reason it 
> > works correctly? I can only access machine B remotely, so I am unable to 
> > install a NIC on it. I have another PC that is used as a client that only 
> > has four cores, which also cannot be used for verification...
> >
> > Regards,
> > Kai
> >
> >
> > On Sun, Mar 12, 2017 at 2:59 AM, Wiles, Keith <[email protected]> wrote:
> >
> > > On Mar 11, 2017, at 9:45 AM, Kai Zhang <[email protected]> wrote:
> > >
> > > Hi Keith,
> > >
> > > Thank you for your reply.
> > >
> > > I have tested my program on two machines
> > > A) 1 x Intel E5-2650 v4, 12 cores [UMA]
> > > B) 2 x Intel E5-2640 v4, 10 cores [NUMA]
> > >
> > > I am very sure that the primary process uses different cores from the 
> > > secondary process. The strange thing is that my program works correctly 
> > > on machine B. But on machine A, the above issue happens with more than 4 
> > > cores assigned to the secondary process.
> > >
> > > I have tried to assign cores 1-5  to the secondary process and also tried 
> > > other core assignment policies, but the error still happens in 
> > > rte_eal_init() with more than 4 cores.
> >
> > It would be nice to see both command lines. I am not sure I can help more; 
> > all I can do is suggest some ideas to look at.
> >
> > Does machine B have the same number and type of NICs? Use ‘lspci | grep 
> > Ethernet’ to get a list of all Ethernet devices on both machines.
> >
> > What is the number of hugepages you have allocated on each machine?
> >
> > Also look at the cpu_layout.py script to see why adding the 5th core would 
> > be different on the two machines and try to make them the same.
> >
> > >
> > > Regards,
> > > Kai
> > >
> > > On Sat, Mar 11, 2017 at 10:52 PM, Wiles, Keith <[email protected]> 
> > > wrote:
> > >
> > > > On Mar 10, 2017, at 9:35 PM, Kai Zhang <[email protected]> wrote:
> > > >
> > > > Hi, there
> > > >
> > > > I am using DPDK-16.11 on XL710 40GbE NIC. OS: CentOS 7.3.1611 with Linux
> > > > kernel version 3.8.0-30.
> > > >
> > > > I have a master process and a secondary process. When I run the
> > > > secondary process with 4 cores or fewer, it works correctly, for example:
> > > > sudo ./program -l 4,5,6,7 -n 4 --proc-type=secondary
> > > > sudo ./program -c 0x0f -n 4 --proc-type=secondary
> > > >
> > > > However, there is an error in rte_eal_init() if I assign more than 4
> > > > cores:
> > > > sudo ./program -l 0,1,2,3,4 -n 4 --proc-type=secondary
> > > > sudo ./program -c 0x1f -n 4 --proc-type=secondary
> > > >
> > > > EAL: Cannot mmap device resource file
> > > > /sys/bus/pci/devices/0000:02:00.0/resource0 to address: 0x7fff65bfc000
> > > > EAL: Error - exiting with code: 1
> > > >  Cause: Requested device 0000:02:00.0 cannot be used
> > >
> > > I assume you have at least 8 cores. Have you tried -l 1-5 on the 
> > > secondary process?
> > >
> > > You did not show the primary process command line, but if you use 1-5 
> > > then you can only give the primary process -l 6-7, or two cores. It is 
> > > always reasonable to leave core zero for Linux to use.
> > >
> > > It could also be that you ran out of the memory or hugepages you 
> > > allocated to the system.
> > >
> > > >
> > > > Does anyone know why this happens?
> > > >
> > > > Thanks a lot,
> > > > Kai Zhang
> > >
> > > Regards,
> > > Keith
> > >
> > >
> >
> > Regards,
> > Keith
> >
> >
> >
> 
> Regards,
> Keith

Regards,
Keith
