> On Feb 5, 2019, at 8:00 AM, Harsh Patel <[email protected]> wrote:
>
> Hi,
> One of the mistakes was the following: ns-3 frees the packet buffer as soon as it writes it to the socket, so we thought we should do the same. But when DPDK transmits, it places the packet buffer on the TX descriptor ring and performs the transmission on its own afterwards. We were freeing too early, so some packets were lost, i.e. freed before transmission.
>
> Another thing was that, as you suggested earlier, we compiled the whole of ns-3 in optimized mode. That improved the performance.
>
> These two things combined got us the desired results.
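A minimal sketch of the ownership rule described in the message above. The helper name and single-queue setup are illustrative assumptions, not code from this thread: rte_eth_tx_burst() takes ownership of every mbuf it accepts, and the caller frees only the mbufs that were not enqueued; the PMD frees accepted mbufs itself once their TX descriptors are recycled.

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    /* Sketch only: illustrates the mbuf ownership rule, not the ns-3 code. */
    static void
    send_burst(uint16_t port_id, uint16_t queue_id,
               struct rte_mbuf **pkts, uint16_t nb_pkts)
    {
        /* The PMD takes ownership of the first nb_sent mbufs; it frees them
         * itself after transmission, when the descriptors are reused. */
        uint16_t nb_sent = rte_eth_tx_burst(port_id, queue_id, pkts, nb_pkts);

        /* Freeing pkts[0..nb_sent-1] here would reproduce the bug described
         * above: those buffers still sit on the TX descriptor ring. */
        for (uint16_t i = nb_sent; i < nb_pkts; i++)
            rte_pktmbuf_free(pkts[i]);   /* free only what was not enqueued */
    }
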
Excellent, thanks.

>
> Regards,
> Harsh & Hrishikesh
>
> On Tue, Feb 5, 2019, 18:33 Wiles, Keith <[email protected]> wrote:
>
> > On Feb 5, 2019, at 12:37 AM, Harsh Patel <[email protected]> wrote:
> >
> > Hi,
> >
> > We would like to inform you that our code is working as expected and we are able to obtain a 95-98 Mbps data rate for a 100 Mbps application rate. We are now working on testing the code. Thanks a lot, especially to Keith, for all the help you provided.
> >
> > We have two main queries:
> > 1) We wanted to calculate the backlog at the NIC Tx descriptors but were not able to find anything in the documentation. Can you help us with how to calculate the backlog?
> > 2) We searched for how to use Byte Queue Limits (BQL) on the NIC queue but couldn't find anything like that in DPDK. Does DPDK support BQL? If so, can you help us with how to use it for our project?
>
> What was the last set of problems, if I may ask?
>
> > Thanks & Regards
> > Harsh & Hrishikesh
> >
> > On Thu, 31 Jan 2019 at 22:28, Wiles, Keith <[email protected]> wrote:
> >
> > Sent from my iPhone
> >
> > On Jan 30, 2019, at 5:36 PM, Harsh Patel <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> This mail is to inform you that the integration of DPDK with ns-3 is working at a basic level. The model is running.
> >> For UDP traffic we are getting throughput the same as or better than the raw socket version (around 100 Mbps).
> >> But unfortunately for TCP there are burst packet losses, due to which the throughput is drastically affected after some point in time. The bandwidth of the link used was 100 Mbps.
> >> We have obtained cwnd and ssthresh graphs which show that once the flow gets out of slow-start mode, there are so many packet losses that the congestion window and the slow-start threshold are not able to go above 4-5 packets.
> >
> > Can you determine where the packets are being dropped?
> >
> >> We have attached the graphs with this mail.
> >
> > I do not see the graphs attached, but that's OK.
> >
> >> We would like to know if there is any reason for this, or how we can fix it.
> >
> > I think we have to find out where the packets are being dropped; that is the only explanation for the case you are referring to.
> >>
> >> Thanks & Regards
> >> Harsh & Hrishikesh
> >>
> >> On Wed, 16 Jan 2019 at 19:25, Harsh Patel <[email protected]> wrote:
> >> Hi
> >>
> >> We were able to optimise the DPDK version. There were a couple of things we needed to do.
> >>
> >> We were using a tx timeout of 1s/2048, which we found out was far too small. When we increased the timeout, we were getting a lot of retransmissions.
> >>
> >> So we removed the timeout and now send each packet as soon as we get it. This increased the throughput.
> >>
> >> Then we used the DPDK feature to launch a function on a core, and gave a dedicated core to Rx. This increased the throughput further.
> >>
> >> The code is working really well for low bandwidth (<~50 Mbps) and is outperforming the raw socket version. But for high bandwidth, we are getting packet length mismatches for some reason. We are investigating it.
> >>
> >> We really thank you for the suggestions you have given, and also for your patience over the last couple of months.
> >>
> >> Thank you
> >>
> >> Regards,
> >> Harsh & Hrishikesh
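The "launch a function on a core" feature mentioned in the Jan 16 message above is DPDK's remote-launch facility. A minimal sketch of dedicating an lcore to an Rx polling loop follows; the function and variable names, port/queue numbers, burst size, and stop flag are illustrative assumptions, not code from this thread.

    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_launch.h>
    #include <rte_mbuf.h>

    static volatile int keep_running = 1;   /* illustrative stop flag */

    /* Busy-poll one Rx queue; intended to run on its own lcore. */
    static int
    rx_loop(void *arg)
    {
        uint16_t port_id = *(uint16_t *)arg;
        struct rte_mbuf *pkts[32];

        while (keep_running) {
            uint16_t n = rte_eth_rx_burst(port_id, 0, pkts, 32);
            for (uint16_t i = 0; i < n; i++) {
                /* Hand the packet to the application (here it is just
                 * freed); in the ns-3 case this would be the device's
                 * read path. */
                rte_pktmbuf_free(pkts[i]);
            }
        }
        return 0;
    }

    /* From the main lcore, after rte_eal_init() and port setup:
     *     rte_eal_remote_launch(rx_loop, &port_id, rx_lcore_id);
     *     ...
     *     keep_running = 0;
     *     rte_eal_wait_lcore(rx_lcore_id);
     */

Because the drivers are poll-mode, a loop like this keeps its core at 100% whether or not packets are arriving, which is consistent with the rx-burst functions dominating the CPU-time profiles quoted further down in this thread.
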
> >> On Fri, Jan 4, 2019, 11:27 Harsh Patel <[email protected]> wrote:
> >> Yes, that would be helpful.
> >> It'd be OK for now to use the same DPDK version to overcome the build issues.
> >> We will look into updating the code for the latest versions once we get past this problem.
> >>
> >> Thank you very much.
> >>
> >> Regards,
> >> Harsh & Hrishikesh
> >>
> >> On Fri, Jan 4, 2019, 04:13 Wiles, Keith <[email protected]> wrote:
> >>
> >> > On Jan 3, 2019, at 12:12 PM, Harsh Patel <[email protected]> wrote:
> >> >
> >> > Hi
> >> >
> >> > We applied your suggestion of removing the `IsLinkUp()` call, but the performance is even worse. We could only get around 340 kbit/s.
> >> >
> >> > The top hotspots are:
> >> >
> >> > Function                          Module                              CPU Time
> >> > eth_em_recv_pkts                  librte_pmd_e1000.so                 15.106s
> >> > rte_delay_us_block                librte_eal.so.6.1                    7.372s
> >> > ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so   5.080s
> >> > rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so   3.558s
> >> > ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so   3.364s
> >> > [Others]                                                               4.760s
> >>
> >> Performance reduced by removing that link status check; that is weird.
> >>
> >> > Upon checking the callers of `rte_delay_us_block`, we found that most of the time (92%) spent in this function is during initialization. This does not cost us processing time during communication, so it's a good start to our optimization.
> >> >
> >> > Callers                               CPU Time: Total  CPU Time: Self
> >> > rte_delay_us_block                    100.0%           7.372s
> >> > e1000_enable_ulp_lpt_lp                92.3%           6.804s
> >> > e1000_write_phy_reg_mdic                1.8%           0.136s
> >> > e1000_reset_hw_ich8lan                  1.7%           0.128s
> >> > e1000_read_phy_reg_mdic                 1.4%           0.104s
> >> > eth_em_link_update                      1.4%           0.100s
> >> > e1000_get_cfg_done_generic              0.7%           0.052s
> >> > e1000_post_phy_reset_ich8lan.part.18    0.7%           0.048s
> >>
> >> I guess you are having VTune start your application, and that is why you have init-time items in your log. I normally start my application and then attach VTune to it; one of the options in the VTune project configuration is to attach to a running application. Maybe it would help here.
> >>
> >> Looking at the data you provided, it was OK. The problem is that it would not load the source files, as I did not have the same build or executable. I tried to build the code, but it failed to build and I did not go further. I guess I would need to see the full source tree and the executable you used to really look at the problem. I have limited time, but I can try if you like.
> >>
> >> > Effective CPU utilization: 21.4% (0.856 out of 4)
> >> >
> >> > Here is the link to the VTune profiling results:
> >> > https://drive.google.com/open?id=1M6g2iRZq2JGPoDVPwZCxWBo7qzUhvWi5
> >> >
> >> > Thank you
> >> >
> >> > Regards
> >> >
> >> > On Sun, Dec 30, 2018, 06:00 Wiles, Keith <[email protected]> wrote:
> >> >
> >> > > On Dec 29, 2018, at 4:03 PM, Harsh Patel <[email protected]> wrote:
> >> > >
> >> > > Hello,
> >> > > As suggested, we tried profiling the application using Intel VTune Amplifier. We aren't sure how to use these results, so we are attaching them to this email.
> >> > >
> >> > > The things we understood were 'Top Hotspots' and 'Effective CPU utilization'.
> >> > > Following are some of our understandings:
> >> > >
> >> > > Top Hotspots
> >> > >
> >> > > Function                          Module                              CPU Time
> >> > > rte_delay_us_block                librte_eal.so.6.1                   15.042s
> >> > > eth_em_recv_pkts                  librte_pmd_e1000.so                  9.544s
> >> > > ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so   3.522s
> >> > > ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so   2.470s
> >> > > rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so   2.456s
> >> > > [Others]                                                               6.656s
> >> > >
> >> > > We knew about all the other methods in the list except `rte_delay_us_block`, so we investigated the callers of this method:
> >> > >
> >> > > Callers                               Effective Time (%)  Spin Time  Overhead Time  Effective Time  Wait Time: Total  Wait Time: Self
> >> > > e1000_enable_ulp_lpt_lp               45.6%               0.0%       0.0%           6.860s          0usec             0usec
> >> > > e1000_write_phy_reg_mdic              32.7%               0.0%       0.0%           4.916s          0usec             0usec
> >> > > e1000_read_phy_reg_mdic               19.4%               0.0%       0.0%           2.922s          0usec             0usec
> >> > > e1000_reset_hw_ich8lan                 1.0%               0.0%       0.0%           0.143s          0usec             0usec
> >> > > eth_em_link_update                     0.7%               0.0%       0.0%           0.100s          0usec             0usec
> >> > > e1000_post_phy_reset_ich8lan.part.18   0.4%               0.0%       0.0%           0.064s          0usec             0usec
> >> > > e1000_get_cfg_done_generic             0.2%               0.0%       0.0%           0.037s          0usec             0usec
> >> > >
> >> > > We lack sufficient knowledge to investigate further than this.
> >> > >
> >> > > Effective CPU utilization
> >> > >
> >> > > Interestingly, the effective CPU utilization was 20.8% (0.832 out of 4 logical CPUs). We thought this was low, so we compared it with the raw-socket version of the code, which was even lower at 8.0% (0.318 out of 4 logical CPUs), and yet that version performs much better.
> >> > >
> >> > > It would be helpful if you could give us some insight into how to use these results, or point us to some resources for doing so.
> >> > >
> >> > > Thank you
> >> >
> >> > BTW, I was able to build ns-3 with DPDK 18.11. It required a couple of changes in the DPDK init code in ns-3, plus one hack in the rte_mbuf.h file.
> >> >
> >> > I did have a problem including the rte_mbuf.h file in your code. It appears the g++ compiler did not like referencing the struct rte_mbuf_sched inside the rte_mbuf structure. The rte_mbuf_sched struct was inside the big union; as a hack I moved the struct outside of the rte_mbuf structure and replaced it in the union with 'struct rte_mbuf_sched sched;'. But I am guessing you are missing some compiler options in your build system, as DPDK builds just fine without that hack.
> >> >
> >> > The next place was the rxmode and the txq_flags. The rxmode structure has changed, so I commented out the inits in ns-3, and then commented out the txq_flags init code as these are now the defaults.
> >> >
> >> > Regards,
> >> > Keith
> >>
> >> Regards,
> >> Keith
> >>
> >> <Ssthresh.png>
> >> <Cwnd.png>
>
> Regards,
> Keith

Regards,
Keith
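For context on the rxmode/txq_flags remark in Keith's Dec 30 note above: in the offload API used by DPDK 18.11, the old rxmode bit-fields and the per-queue txq_flags are gone, and offloads are requested through rxmode.offloads/txmode.offloads, or simply left at the defaults. Below is a minimal sketch of an 18.11-style port setup; the port id, queue count, ring sizes, and mempool handling are illustrative assumptions, not the actual ns-3 code.

    #include <string.h>
    #include <rte_ethdev.h>
    #include <rte_mempool.h>

    /* Sketch of an 18.11-style single-queue port configuration. */
    static int
    configure_port(uint16_t port_id, struct rte_mempool *pool)
    {
        struct rte_eth_conf port_conf;
        memset(&port_conf, 0, sizeof(port_conf));
        /* Old bit-fields such as rxmode.hw_strip_crc and the txq_flags
         * field no longer exist; offloads are requested through these
         * masks (none requested here, i.e. the defaults). */
        port_conf.rxmode.offloads = 0;
        port_conf.txmode.offloads = 0;

        int ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
        if (ret < 0)
            return ret;

        ret = rte_eth_rx_queue_setup(port_id, 0, 512,
                                     rte_eth_dev_socket_id(port_id),
                                     NULL /* default rx conf */, pool);
        if (ret < 0)
            return ret;

        /* NULL uses the driver's default tx conf; there is no txq_flags
         * field to set any more. */
        ret = rte_eth_tx_queue_setup(port_id, 0, 512,
                                     rte_eth_dev_socket_id(port_id),
                                     NULL /* default tx conf */);
        if (ret < 0)
            return ret;

        return rte_eth_dev_start(port_id);
    }

Leaving the offload masks at zero corresponds to what the message above describes as "these are now the defaults".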
