I've done my homework with perf and the results show that the
iTLB-load-misses value is very high. In the test without socket operations
the processing lcore's iTLB-load-misses are 0.87% of all iTLB cache hits
and there is no packet loss. In the test WITH socket operations the
processing lcore's iTLB-load-misses are 31.09% of all iTLB cache hits and
there is about 10% packet loss. How should I interpret these results?
Google turns up very little about the iTLB. So far some web pages suggest
the following: "Try to minimize the size of the source code and improve
locality so that instructions span a minimum number of pages, and so that
the instruction span is less than the number of ITLB entries."
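For anyone who wants to reproduce the measurement, something like this is
enough to get per-core iTLB counters (a sketch; CPU 2 is just an example
of where the processing lcore is pinned, and exact event names can vary
between kernels and CPUs):

    # count iTLB activity on CPU 2 only, over a 10-second window
    perf stat -e iTLB-loads,iTLB-load-misses -C 2 -- sleep 10

perf stat reports iTLB-load-misses as a percentage of all iTLB cache hits
(misses divided by loads) - that ratio is where the 0.87% and 31.09%
figures above come from.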
Any ideas? One mitigation along the lines of the quoted advice is sketched
below.
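If that advice is right, one way to cut the number of pages the hot
instructions span is to back the program text with 2 MB hugepages instead
of 4 KB pages. A minimal sketch with libhugetlbfs (my_dpdk_app is a
placeholder; the binary has to be built with suitably aligned segments,
e.g. linked via ld.hugetlbfs, and hugepages must already be reserved):

    # remap the binary's text segment onto hugepages at startup
    hugectl --text ./my_dpdk_app

Each 2 MB page covers 512 times as much code as a 4 KB page, so the hot
instruction footprint needs far fewer iTLB entries.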
2016-04-14 23:43 GMT+03:00 Hu, Xuekun <xuekun.hu at intel.com>:

> Perf could. Or PCM, that is also a good tool.
> https://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
>
> From: Alexander Kiselev [mailto:kiselev99 at gmail.com]
> Sent: Friday, April 15, 2016 3:31 AM
> To: Hu, Xuekun
> Cc: Shawn Lewis; users at dpdk.org
> Subject: Re: [dpdk-users] Lcore impact
>
> 2016-04-14 20:49 GMT+03:00 Hu, Xuekun <xuekun.hu at intel.com>:
>
> Are the two lcores belonging to one processor, or two processors? What is
> the memory footprint of the system call threads? If the memory footprint
> is big (> LLC cache size) and the two lcores are in the same processor,
> then it could have an impact on the packet processing thread.
>
> Those two lcores belong to one processor and it's a single-processor
> machine.
>
> Both cores allocate a lot of memory and use the full DPDK arsenal: LPM,
> mempools, hashes, etc. But during the test the core doing socket data
> transferring uses only a small 16k buffer for sending, and sending is all
> it does during the test. It doesn't use any other allocated memory
> structures. The processing core in turn uses rte_lpm, which is big, but
> in my test there are only about 10 routes in it, so I think the amount of
> "hot" memory is not very big. But I can't say whether it's bigger than
> the L3 CPU cache or not. Should I use some profiler and see if socket
> operations cause a lot of cache misses in the processing lcore? Is there
> some tool that allows me to do that? perf maybe?
>
> -----Original Message-----
> From: users [mailto:users-bounces at dpdk.org] On Behalf Of Alexander Kiselev
> Sent: Friday, April 15, 2016 1:19 AM
> To: Shawn Lewis
> Cc: users at dpdk.org
> Subject: Re: [dpdk-users] Lcore impact
>
> I've already seen this document and have used these tricks many times.
> But this time I send data locally over localhost. There are not even any
> NICs bound to Linux on my machine, therefore there are no NIC interrupts
> I can pin to a CPU. So what do you propose?
>
> On 14 Apr 2016, at 20:06, Shawn Lewis <smlsr at tencara.com> wrote:
>
> > You have to work with IRQBalancer as well
> >
> > http://www.intel.com/content/dam/doc/application-note/82575-82576-82598-82599-ethernet-controllers-interrupts-appl-note.pdf
> >
> > It is just an example document which discusses this (not so much DPDK
> > related)... But the OS will attempt to balance the interrupts when you
> > actually want to remove or pin them down...
> >
> >> On Thu, Apr 14, 2016 at 1:02 PM, Alexander Kiselev <kiselev99 at gmail.com> wrote:
> >>
> >>> On 14 Apr 2016, at 19:35, Shawn Lewis <smlsr at tencara.com> wrote:
> >>>
> >>> Lots of things...
> >>>
> >>> One: just because you have a process running on an lcore does not
> >>> mean that's all that runs on it. Unless you have told the kernel at
> >>> boot NOT to use those specific cores, those cores will be used for
> >>> many things OS related.
> >>
> >> Generally yes, but unless I start sending data to the socket there is
> >> no packet loss. I did about 10 test runs in a row and everything was
> >> ok. And there is no other application running on that test machine
> >> that uses CPU cores.
> >>
> >> So the question is why these socket operations influence the other
> >> lcore?
> >>
> >>> IRQBalance
> >>> System OS operations.
> >>> Other applications.
> >>>
> >>> So by doing file I/O you are generating interrupts, and where those
> >>> interrupts get serviced is up to IRQBalancer. So it could be any one
> >>> of your cores.
> >>
> >> That is a good point. I can use the CPU affinity feature to bind the
> >> interrupt handler to a core not used in my test. But I send data
> >> locally over localhost. Is it possible to use CPU affinity in that
> >> case?
> >>
> >>>> On Thu, Apr 14, 2016 at 12:31 PM, Alexander Kiselev <kiselev99 at gmail.com> wrote:
> >>>>
> >>>> Could someone give me any hints about what could cause performance
> >>>> issues in a situation where one lcore doing a lot of Linux system
> >>>> calls (read/write on a socket) slows down the other lcore doing
> >>>> packet forwarding? In my test the forwarding lcore doesn't share any
> >>>> memory structures with the other lcore that sends test data to the
> >>>> socket. Both lcores are pinned to different processor cores, so
> >>>> theoretically they shouldn't have any impact on each other. But they
> >>>> do: once one lcore starts sending data to the socket, the other
> >>>> lcore starts dropping packets. Why?

--
Best regards,
Alexander Kiselev
