> On Dec 29, 2018, at 4:03 PM, Harsh Patel <[email protected]> wrote:
> 
> Hello,
> As suggested, we tried profiling the application using Intel VTune Amplifier. 
> We aren't sure how to use these results, so we are attaching them to this 
> email.
> 
> The things we understood were 'Top Hotspots' and 'Effective CPU utilization'. 
> Following are some of our understandings:
> 
> Top Hotspots
> 
> Function        Module  CPU Time
> rte_delay_us_block      librte_eal.so.6.1       15.042s
> eth_em_recv_pkts        librte_pmd_e1000.so     9.544s
> ns3::DpdkNetDevice::Read        libns3.28.1-fd-net-device-debug.so      3.522s
> ns3::DpdkNetDeviceReader::DoRead        libns3.28.1-fd-net-device-debug.so      2.470s
> rte_eth_rx_burst        libns3.28.1-fd-net-device-debug.so      2.456s
> [Others]                6.656s
> 
> We knew about other methods except `rte_delay_us_block`. So we investigated 
> the callers of this method:
> 
> Callers Effective Time  Spin Time       Overhead Time   Effective Time  Spin Time       Overhead Time   Wait Time: Total        Wait Time: Self
> e1000_enable_ulp_lpt_lp 45.6%   0.0%    0.0%    6.860s  0usec   0usec
> e1000_write_phy_reg_mdic        32.7%   0.0%    0.0%    4.916s  0usec   0usec
> e1000_read_phy_reg_mdic 19.4%   0.0%    0.0%    2.922s  0usec   0usec
> e1000_reset_hw_ich8lan  1.0%    0.0%    0.0%    0.143s  0usec   0usec
> eth_em_link_update      0.7%    0.0%    0.0%    0.100s  0usec   0usec
> e1000_post_phy_reset_ich8lan.part.18    0.4%    0.0%    0.0%    0.064s  0usec   0usec
> e1000_get_cfg_done_generic      0.2%    0.0%    0.0%    0.037s  0usec   0usec
> 
> We lack sufficient knowledge to investigate more than this.
> 
> Effective CPU utilization
> 
> Interestingly, the effective CPU utilization was 20.8% (0.832 out of 4 
> logical CPUs). We thought this is less. So we compared this with the 
> raw-socket version of the code, which was even less, 8.0% (0.318 out of 4 
> logical CPUs), and even then it is performing way better.
> 
> It would be helpful if you give us insights on how to use these results or 
> point us to some resources to do so. 
> 
> Thank you 
> 

BTW, I was able to build ns3 with DPDK 18.11. It required a couple of changes 
in the DPDK init code in ns3, plus one hack in the rte_mbuf.h file.

I did have a problem including the rte_mbuf.h file in your code. It appears 
the g++ compiler did not like the reference to struct rte_mbuf_sched inside 
the rte_mbuf structure, where rte_mbuf_sched is defined inside the big union. 
As a hack, I moved the struct definition outside of the rte_mbuf structure and 
replaced the struct in the union with 'struct rte_mbuf_sched sched;'. I am 
guessing you are missing some compiler options in your build system, though, 
as DPDK builds just fine without that hack.
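
To illustrate the shape of that hack, here is a minimal sketch. It does not 
use the real DPDK headers: mbuf_sched_sketch, mbuf_sketch, set_sched and 
demo_sched_roundtrip are hypothetical stand-ins, and the field names and 
layout are assumptions made for the example, not the actual rte_mbuf 
definition. It only shows a struct defined at file scope and then referenced 
by name from inside the union.

```cpp
#include <cstdint>

// Hypothetical stand-in for the scheduler struct, defined at file scope
// rather than inside the union. Field names are illustrative only.
struct mbuf_sched_sketch {
    std::uint32_t queue_id;
    std::uint8_t  traffic_class;
    std::uint8_t  color;
    std::uint16_t reserved;
};

// Hypothetical stand-in for the mbuf; only the union of interest is shown.
struct mbuf_sketch {
    union {
        std::uint32_t rss;               // one view of the shared storage
        struct mbuf_sched_sketch sched;  // refers to the file-scope type
        std::uint32_t usr;               // another view of the same storage
    } hash;
};

// Code outside the structure can name the type directly, because it is
// declared at file scope and not nested in mbuf_sketch.
inline void set_sched(mbuf_sketch &m, const mbuf_sched_sketch &s) {
    m.hash.sched = s;
}

// Small helper for trying the sketch: store a value and read it back.
inline std::uint32_t demo_sched_roundtrip(std::uint32_t queue_id) {
    mbuf_sketch m{};
    set_sched(m, mbuf_sched_sketch{queue_id, 2, 1, 0});
    return m.hash.sched.queue_id;
}
```

Whether this matches the exact complaint g++ raised in your build is a guess 
on my part; the sketch is only meant to show the kind of change.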

The next place was the rxmode and txq_flags init code. The rxmode structure 
has changed, so I commented out its inits in ns3. I then commented out the 
txq_flags init code, as these are now the defaults.

Regards,
Keith
