On 19/05/2021 17:06, Stephen Hemminger wrote:
On Wed, 19 May 2021 15:14:38 +0000
"Van Haaren, Harry" <harry.van.haa...@intel.com> wrote:

-----Original Message-----
From: users <users-boun...@dpdk.org> On Behalf Of Filip Janiszewski
Sent: Wednesday, May 19, 2021 2:10 PM
To: users@dpdk.org
Subject: [dpdk-users] Performance of rte_eth_stats_get

Hi,

Is it safe to call rte_eth_stats_get while capturing from the port?

I'm mostly concerned about performance: will rte_eth_stats_get in any
way impact the port's performance? In the application I plan to call the
function from a thread that is not directly involved in the capture
(there's another worker responsible for rx bursting), but I wonder if the
NIC might get upset if I call it too frequently (say, 10 times per
second) and potentially cause some performance issues.

The question is really NIC agnostic, but if the NIC vendor is actually
relevant, I'm running Intel 700 series NICs and Mellanox ConnectX-4/5.
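
For concreteness, this minimal sketch is roughly what I have in mind
(it assumes EAL is already initialized and the port is configured and
started elsewhere):

```c
#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_cycles.h>

/* Poll rte_eth_stats_get() ~10 times per second from a thread that is
 * not on the datapath. */
static int
stats_poll_loop(void *arg)
{
	uint16_t port_id = *(uint16_t *)arg;
	struct rte_eth_stats stats;

	for (;;) {
		if (rte_eth_stats_get(port_id, &stats) == 0)
			printf("port %u: rx=%" PRIu64 " missed=%" PRIu64 "\n",
			       (unsigned)port_id, stats.ipackets,
			       stats.imissed);
		rte_delay_ms(100); /* ~10 reads per second */
	}
	return 0;
}
```

The loop would be launched on a spare lcore with
rte_eal_remote_launch(stats_poll_loop, &port_id, lcore_id).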

To understand what really goes on when getting stats, it might help to
list the steps involved in retrieving statistics from the NIC HW.

1) The CPU sends an MMIO read (Memory-Mapped I/O, sometimes referred to
as a "PCI read") to the NIC.
2) The PCI bus has to handle extra TLPs (PCI transactions) to satisfy the read.
3) The NIC has to send a reply based on accessing its internal counters.
4) The CPU gets a result from the PCI read.

Notice how elegantly this whole process is abstracted from SW? In code,
reading a stat value is just dereferencing a pointer that is mapped to
the NIC HW address. In practice, from a CPU performance point of view,
doing an MMIO read is one of the slowest things you can do. You say the
stats reads are occurring from a thread that is not handling
rx/datapath, so perhaps the CPU cycle cost itself isn't a concern.
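
As a rough illustration (the BAR pointer and register offset below are
hypothetical, not from any real PMD), the code side can be as small as
this:

```c
#include <stdint.h>
#include <rte_io.h>

#define EXAMPLE_RX_PKT_CNT 0x1234 /* hypothetical register offset */

static uint32_t
read_hw_counter(const volatile uint8_t *bar0)
{
	/* One line of C, but it triggers the full PCIe round-trip
	 * described in steps 1-4 above. */
	return rte_read32(bar0 + EXAMPLE_RX_PKT_CNT);
}
```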

Do note, however, that when reading a full set of extended stats from
the NIC, there could be many tens to hundreds of MMIO reads (depending
on the statistics requested, and on how the PMD itself implements stats
updates).
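
For reference, extended stats are usually fetched with the two-call
pattern sketched below; each such fetch may fan out into many of the
MMIO reads described above:

```c
#include <stdlib.h>
#include <rte_ethdev.h>

static int
dump_xstats(uint16_t port_id)
{
	/* First call with NULL asks only for the number of xstats. */
	int n = rte_eth_xstats_get(port_id, NULL, 0);
	if (n <= 0)
		return n;

	struct rte_eth_xstat *xstats = malloc(sizeof(*xstats) * n);
	if (xstats == NULL)
		return -1;

	/* Second call actually pulls the counters from the PMD/HW. */
	n = rte_eth_xstats_get(port_id, xstats, n);
	/* ... match xstats[i].id against rte_eth_xstats_get_names() ... */
	free(xstats);
	return n;
}
```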

The PCI bus does become busier with reads to the NIC HW when doing lots
of statistics updates, so some more contention/activity is to be
expected there. The PCM tool can be very useful for observing MMIO
traffic; with it you could measure how many extra PCI transactions occur
due to reading stats every X ms:
https://github.com/opcm/pcm

I can recommend measuring packet latency/jitter as a histogram, as then
outliers in performance can be identified. If you specifically want to
identify whether these are due to stats reads, compare with a "no stats
reads" latency/jitter histogram and graphically see the impact. In the
end, if it doesn't affect packet latency/jitter, then it has no impact,
right?
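
A minimal sketch of such a histogram (power-of-two buckets over TSC
deltas taken around each rx burst; not a complete measurement harness):

```c
#include <stdint.h>
#include <rte_cycles.h>

#define HIST_BUCKETS 32
static uint64_t hist[HIST_BUCKETS];

/* Call with rte_rdtsc() samples taken before and after each rx burst. */
static void
record_burst_cycles(uint64_t start_tsc, uint64_t end_tsc)
{
	uint64_t delta = end_tsc - start_tsc;
	unsigned int bucket = 0;

	/* Bucket N covers roughly [2^N, 2^(N+1)) cycles. */
	while (delta > 1 && bucket < HIST_BUCKETS - 1) {
		delta >>= 1;
		bucket++;
	}
	hist[bucket]++;
}
```

Collect one histogram with stats reads enabled and one without, and any
outliers caused by the reads should stand out visually.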

Ultimately, I can't give a generic answer: the best approach is to
measure carefully and find out!
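
For example, here is a rough way to put a cycle count on a single stats
read (assumes a valid, started port):

```c
#include <inttypes.h>
#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_cycles.h>

static void
measure_stats_cost(uint16_t port_id)
{
	struct rte_eth_stats stats;
	uint64_t start = rte_rdtsc();

	rte_eth_stats_get(port_id, &stats);

	uint64_t cycles = rte_rdtsc() - start;
	printf("rte_eth_stats_get: %" PRIu64 " cycles (%.2f us)\n",
	       cycles, (double)cycles * 1e6 / rte_get_tsc_hz());
}
```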

Thanks

Hope the above helps and doesn't add confusion :)  Regards, -Harry

Many drivers require transactions with the firmware via a mailbox, and
that transaction needs a spin-wait on the shared area.
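
Schematically, the pattern looks something like the sketch below; all
register names, offsets and the command value are hypothetical, not
taken from any real driver:

```c
#include <stdint.h>
#include <rte_cycles.h>
#include <rte_io.h>

#define MBX_CMD_REG       0x100 /* hypothetical */
#define MBX_DONE_REG      0x104 /* hypothetical */
#define MBX_CMD_GET_STATS 0x1   /* hypothetical */

static int
mailbox_get_stats(volatile uint8_t *bar, uint64_t timeout_cycles)
{
	uint64_t deadline = rte_rdtsc() + timeout_cycles;

	rte_write32(MBX_CMD_GET_STATS, bar + MBX_CMD_REG);

	/* Spin until the firmware flags completion in the shared area;
	 * this busy-wait is where the hidden cost lives. */
	while (rte_read32(bar + MBX_DONE_REG) == 0) {
		if (rte_rdtsc() > deadline)
			return -1; /* firmware did not respond in time */
	}
	return 0;
}
```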


Thank you for explaining the steps so nicely. I noticed this problem
too: calling `rte_eth_stats_get` on a PMD port once per batch almost
halves the throughput in a 10G setup, IIRC; the cost is prohibitively
high. This, however, doesn't show up when DPDK connects to a vhost PMD
port, since all of the port statistics are probably kept somewhere in
shared memory.
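
For what it's worth, one way to amortize that cost is to rate-limit the
call from the datapath instead of issuing it per batch (illustrative
names; the static state assumes a single calling lcore):

```c
#include <rte_ethdev.h>
#include <rte_cycles.h>

static void
maybe_read_stats(uint16_t port_id, struct rte_eth_stats *stats)
{
	static uint64_t next_tsc;
	uint64_t now = rte_rdtsc();

	if (now < next_tsc)
		return; /* fast path: no PCI traffic at all */

	next_tsc = now + rte_get_tsc_hz() / 10; /* at most ~10 reads/s */
	rte_eth_stats_get(port_id, stats);
}
```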
