> On Dec 29, 2018, at 4:03 PM, Harsh Patel <[email protected]> wrote:
>
> Hello,
> As suggested, we tried profiling the application using Intel VTune Amplifier.
> We aren't sure how to use these results, so we are attaching them to this
> email.
>
> The things we understood were 'Top Hotspots' and 'Effective CPU utilization'.
> Following are some of our understandings:
>
> Top Hotspots
>
> Function                          Module                              CPU Time
> rte_delay_us_block                librte_eal.so.6.1                   15.042s
> eth_em_recv_pkts                  librte_pmd_e1000.so                 9.544s
> ns3::DpdkNetDevice::Read          libns3.28.1-fd-net-device-debug.so  3.522s
> ns3::DpdkNetDeviceReader::DoRead  libns3.28.1-fd-net-device-debug.so  2.470s
> rte_eth_rx_burst                  libns3.28.1-fd-net-device-debug.so  2.456s
> [Others]                                                              6.656s
>
> We knew about other methods except `rte_delay_us_block`. So we investigated
> the callers of this method:
>
> Callers Effective Time Spin Time Overhead Time Effective Time Spin
> Time Overhead Time Wait Time: Total Wait Time: Self
> e1000_enable_ulp_lpt_lp               45.6%  0.0%  0.0%  6.860s  0usec  0usec
> e1000_write_phy_reg_mdic              32.7%  0.0%  0.0%  4.916s  0usec  0usec
> e1000_read_phy_reg_mdic               19.4%  0.0%  0.0%  2.922s  0usec  0usec
> e1000_reset_hw_ich8lan                1.0%   0.0%  0.0%  0.143s  0usec  0usec
> eth_em_link_update                    0.7%   0.0%  0.0%  0.100s  0usec  0usec
> e1000_post_phy_reset_ich8lan.part.18  0.4%   0.0%  0.0%  0.064s  0usec  0usec
> e1000_get_cfg_done_generic            0.2%   0.0%  0.0%  0.037s  0usec  0usec
>
> We lack sufficient knowledge to investigate more than this.
>
> Effective CPU utilization
>
> Interestingly, the effective CPU utilization was 20.8% (0.832 out of 4
> logical CPUs). We thought this is less. So we compared this with the
> raw-socket version of the code, which was even less, 8.0% (0.318 out of 4
> logical CPUs), and even then it is performing way better.
>
> It would be helpful if you give us insights on how to use these results or
> point us to some resources to do so.
>
> Thank you
>
BTW, I was able to build ns3 with DPDK 18.11. It required a couple of
changes in the DPDK init code in ns3, plus one hack in the rte_mbuf.h file.

I did have a problem including the rte_mbuf.h file in your code. It appears
the g++ compiler did not like referencing the struct rte_mbuf_sched inside
the rte_mbuf structure. The rte_mbuf_sched struct was declared inside the
big union; as a hack, I moved the struct outside of the rte_mbuf structure
and replaced the struct in the union with 'struct rte_mbuf_sched sched;'.
I am guessing you are missing some compiler options in your build system,
though, as DPDK builds just fine without that hack.

The next place was the rxmode and the txq_flags. The rxmode structure has
changed, so I commented out the rxmode inits in ns3. I also commented out
the txq_flags init code, as these are now the defaults.

Regards,
Keith
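
For reference, the rte_mbuf.h hack described above could look roughly like
the sketch below. It is an illustration only: the demo_sched and demo_mbuf
names and their fields are hypothetical stand-ins, not the real DPDK
definitions, and the exact g++ diagnostic was not captured here. The point
is that moving the struct declaration out of the union and leaving only a
member of that type behind should not change the memory layout.

```cpp
// Sketch only: demo_* types are hypothetical, not the DPDK rte_mbuf layout.
#include <cassert>
#include <cstddef>
#include <cstdint>

// Original shape (shown as a comment): the named struct is declared
// inside the union that lives inside the outer structure.
//
//   union {
//       uint32_t rss;
//       struct demo_sched { /* fields */ } sched;
//       uint32_t usr;
//   } hash;

// Hacked shape: the struct is declared on its own, outside the outer type.
struct demo_sched {
    uint32_t queue_id;
    uint8_t  traffic_class;
    uint8_t  color;
    uint16_t reserved;
};

// The union now only holds a member of the already-declared type.
struct demo_mbuf {
    uint16_t data_len;
    union {
        uint32_t rss;
        struct demo_sched sched;
        uint32_t usr;
    } hash;
};

// The union is exactly as large as its biggest member, as before the move.
static_assert(sizeof(demo_sched) == 8, "unexpected padding in demo_sched");
static_assert(sizeof(decltype(demo_mbuf::hash)) == sizeof(demo_sched),
              "union size should match its largest member");

// Writes a queue id through the hoisted struct and reads it back.
inline uint32_t demo_roundtrip(uint32_t queue_id) {
    demo_mbuf m{};
    m.hash.sched.queue_id = queue_id;
    assert(m.hash.sched.queue_id == queue_id);
    return m.hash.sched.queue_id;
}
```

Since only the place of declaration moves, code that accesses the member as
hash.sched keeps working unchanged.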
