[dpdk-users] Service cores and multi-process

2019-01-16 Thread Pathak, Pravin
Hi All -

In the case of DPDK multi-process mode, do we need to give the same service
core mask for all process invocations?

Regards
Pravin



Re: [dpdk-users] Query on handling packets

2019-01-16 Thread Harsh Patel
Hi

We were able to optimise the DPDK version. There were a couple of things we
needed to do.

We were using a Tx timeout of 1s/2048 (about 488 us), which we found to be
too small. We then increased the timeout, but we were getting a lot of
retransmissions.

So we removed the timeout and now send each packet as soon as we get it.
This increased the throughput.

Then we used the DPDK facility to launch a function on a specific core, and
gave a dedicated core to Rx. This increased the throughput further.

The code is working really well for low bandwidth (< ~50 Mbps) and is
outperforming the raw socket version.
But for high bandwidth, we are getting packet length mismatches for some
reason, which we are investigating.

We really thank you for your suggestions and for your patience over the
last couple of months.

Thank you

Regards,
Harsh & Hrishikesh

On Fri, Jan 4, 2019, 11:27 Harsh Patel  wrote:

> Yes that would be helpful.
> It'd be ok for now to use the same dpdk version to overcome the build
> issues.
> We will look into updating the code for latest versions once we get past
> this problem.
>
> Thank you very much.
>
> Regards,
> Harsh & Hrishikesh
>
> On Fri, Jan 4, 2019, 04:13 Wiles, Keith  wrote:
>
>>
>>
>> > On Jan 3, 2019, at 12:12 PM, Harsh Patel 
>> wrote:
>> >
>> > Hi
>> >
>> > We applied your suggestion of removing the `IsLinkUp()` call. But the
>> performance is even worse. We could only get around 340 kbit/s.
>> >
>> > The Top Hotspots are:
>> >
>> > Function                          Module                              CPU Time
>> > eth_em_recv_pkts                  librte_pmd_e1000.so                 15.106s
>> > rte_delay_us_block                librte_eal.so.6.1                   7.372s
>> > ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so  5.080s
>> > rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so  3.558s
>> > ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so  3.364s
>> > [Others]                                                              4.760s
>>
>> Performance went down after removing that link status check; that is weird.
>> >
>> > Upon checking the callers of `rte_delay_us_block`, we got to know that
>> most of the time (92%) spent in this function is during initialization.
>> > So it does not cost us processing time during communication, and it's
>> a good start to our optimization.
>> >
>> > Callers                                 CPU Time: Total  CPU Time: Self
>> > rte_delay_us_block                      100.0%           7.372s
>> >   e1000_enable_ulp_lpt_lp               92.3%            6.804s
>> >   e1000_write_phy_reg_mdic              1.8%             0.136s
>> >   e1000_reset_hw_ich8lan                1.7%             0.128s
>> >   e1000_read_phy_reg_mdic               1.4%             0.104s
>> >   eth_em_link_update                    1.4%             0.100s
>> >   e1000_get_cfg_done_generic            0.7%             0.052s
>> >   e1000_post_phy_reset_ich8lan.part.18  0.7%             0.048s
>>
>> I guess you are having VTune start your application, and that is why you
>> have init-time items in your log. I normally start my application and then
>> attach VTune to it; one of the options in the VTune project configuration
>> is to attach to a running application. Maybe that would help here.
>>
>> Looking at the data you provided, it was OK. The problem is it would not
>> load the source files, as I did not have the same build or executable. I
>> tried to build the code, but it failed to build and I did not go further. I
>> guess I would need the full source tree and the executable you used to
>> really look at the problem. I have limited time, but I can try if you
>> like.
>> >
>> >
>> > Effective CPU Utilization: 21.4% (0.856 out of 4)
>> >
>> > Here is the link to vtune profiling results.
>> https://drive.google.com/open?id=1M6g2iRZq2JGPoDVPwZCxWBo7qzUhvWi5
>> >
>> > Thank you
>> >
>> > Regards
>> >
>> > On Sun, Dec 30, 2018, 06:00 Wiles, Keith  wrote:
>> >
>> >
>> > > On Dec 29, 2018, at 4:03 PM, Harsh Patel 
>> wrote:
>> > >
>> > > Hello,
>> > > As suggested, we tried profiling the application using Intel VTune
>> Amplifier. We aren't sure how to use these results, so we are attaching
>> them to this email.
>> > >
>> > > The things we understood were 'Top Hotspots' and 'Effective CPU
>> utilization'. Following are some of our understandings:
>> > >
>> > > Top Hotspots
>> > >
>> > > Function                          Module                              CPU Time
>> > > rte_delay_us_block                librte_eal.so.6.1                   15.042s
>> > > eth_em_recv_pkts                  librte_pmd_e1000.so                 9.544s
>> > > ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so  3.522s
>> > > ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so  2.470s
>> > > rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so  2.456s
>> > > [Others]                                                              6.656s
>> > >
>> > > We knew about all the other methods except `rte_delay_us_block`, so we
>> investigated its callers:
>> > >
>> > > Callers                   Effective Time  Spin Time  Overhead Time  Effective Time  Spin Time  Overhead Time  Wait Time: Total  Wait Time: Self
>> > > e1000_enable_ulp_lpt_lp   45.6%           0.0%       0.0%           6.860s          0usec      0usec
>> > > e1000_write_phy_reg_mdic  32.7%           0.0%       0.0%           4.916s