On 22.10.2019 10:25, Sriram Vatala wrote:
Hi Ilya & Kevin,
Thanks for your suggestions. I am summarizing our discussion, please feel free
to correct me, if I am wrong.

1) All custom stats calculated in OVS will have the prefix , so that these
stats will not intersect with the names of other stats (driver/HW etc).
2) The prefix used for these custom stats is "cstat_".
3) Instead of documenting all the stats, I will add a generic note about the
prefix used for custom stats in "Documentation/topics/dpdk/vhost-user.rst
where "tx_retries" is documented.

Looks OK.
One thing is that prefix documentation should, probably, go to the
"Extended & Custom Statistics" section of Documentation/topics/dpdk/bridge.rst .
This looks more appropriate, because we'll have those stats not only for
vhostuser ports.

Best regards, Ilya Maximets.


Thanks & Regards,
Sriram.

-----Original Message-----
From: Kevin Traynor <ktray...@redhat.com>
Sent: 21 October 2019 20:22
To: Ilya Maximets <i.maxim...@ovn.org>; Sriram Vatala
<srira...@altencalsoftlabs.com>; ovs-dev@openvswitch.org
Cc: Stokes, Ian <ian.sto...@intel.com>
Subject: Re: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics

On 18/10/2019 13:59, Ilya Maximets wrote:
On 16.10.2019 19:48, Kevin Traynor wrote:
On 16/10/2019 13:16, Ilya Maximets wrote:
Hi Kevin,

Thanks for review. Some comments inline.

On 16.10.2019 12:15, Kevin Traynor wrote:
Hi Sriram,

Thanks for working on making more fine grained stats for debugging.
As mentioned yesterday, this patch needs rebase so I just reviewed
docs and netdev-dpdk.c which applied. Comments below.

On 21/09/2019 03:40, Sriram Vatala wrote:
OVS may be unable to transmit packets for multiple reasons and
today there is a single counter to track packets dropped due to
any of those reasons. The most common reason is that a VM is
unable to read packets fast enough causing the vhostuser port
transmit queue on the OVS side to become full. This manifests as a
problem with VNFs not receiving all packets. Having a separate
drop counter to track packets dropped because the transmit queue
is full will clearly indicate that the problem is on the VM side
and not in OVS. Similarly maintaining separate

It reads like a bit of a contradiction. On one hand the "The *most
common* reason is that a VM is unable to read packets fast enough".
While having a stat "*will clearly indicate* that the problem is on
the VM side".

counters for all possible drops helps in indicating sensible cause
for packet drops.

This patch adds custom software stats counters to track packets
dropped at port level and these counters are displayed along with
other stats in "ovs-vsctl get interface <iface> statistics"
command. The detailed stats will be available for both dpdk and
vhostuser ports.


I think the commit msg could be a bit clearer, suggest something
like below - feel free to modify (or reject):

OVS may be unable to transmit packets for multiple reasons on the
DPDK datapath and today there is a single counter to track packets
dropped due to any of those reasons.

This patch adds custom software stats for the different reasons
packets may be dropped during tx on the DPDK datapath in OVS.

- MTU drops : drops that occur due to a too large packet size
- Qos drops: drops that occur due to egress QOS
- Tx failures: drops as returned by the DPDK PMD send function

Note that the reason for tx failures is not specificied in OVS. In
practice for vhost ports it is most common that tx failures are
because there are not enough available descriptors, which is
usually caused by misconfiguration of the guest queues and/or
because the guest is not consuming packets fast enough from the queues.

These counters are displayed along with other stats in "ovs-vsctl
get interface <iface> statistics" command and are available for
dpdk and vhostuser/vhostuserclient ports.

Signed-off-by: Sriram Vatala <srira...@altencalsoftlabs.com>

---

v9:...
v8:...


Note that signed-off, --- and version info should be like this ^^^.
otherwise the review version comments will be in the git commit log
when it is applied.

--
Changes since v8:
Addressed comments given by Ilya.

Signed-off-by: Sriram Vatala <srira...@altencalsoftlabs.com>
---
    Documentation/topics/dpdk/vhost-user.rst      | 13 ++-
    lib/netdev-dpdk.c                             | 81
+++++++++++++++----
    utilities/bugtool/automake.mk                 |  3 +-
    utilities/bugtool/ovs-bugtool-get-port-stats  | 25 ++++++
    .../plugins/network-status/openvswitch.xml    |  1 +
    5 files changed, 105 insertions(+), 18 deletions(-)
    create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats

diff --git a/Documentation/topics/dpdk/vhost-user.rst
b/Documentation/topics/dpdk/vhost-user.rst
index 724aa62f6..89388a2bf 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -551,7 +551,18 @@ processing packets at the required rate.
    The amount of Tx retries on a vhost-user or vhost-user-client
interface can be
    shown with::

-  $ ovs-vsctl get Interface dpdkvhostclient0
statistics:tx_retries
+  $ ovs-vsctl get Interface dpdkvhostclient0
+ statistics:netdev_dpdk_tx_retries

I think the "netdev_dpdk_" prefixes should be removed for a few
reasons,

- These are custom stats that will only be displayed for the dpdk
ports so they don't need any extra prefix to say they are for dpdk
ports

- The 'tx_retries' stat is part of OVS 2.12, I don't like to change
its name to a different one in OVS 2.13 unless there is a very good
reason

- The existing stats don't have this type of prefixes (granted most
of them are general stats):

The main reason for the prefix for me is to distinguish those stats
from the stats that comest from the driver/HW.  Our new stats has
very generic names that could be named a lot like or even equal to
HW stats.  This might be not an issue for vhostuser, but for HW NICs
this may be really confusing for users.

I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a way
to clearly distinguish those stats.  We may use different prefixes
like 'sw_' or just 'ovs_'.


ok, I can see that we might want to distinguish custom stats to
indicate that they are specific for that device but that also has the
side effect of making them less self-descriptive and the user will
have to be told somewhere what the prefix means.

'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be
confusing too, as for example in DPDK case, Tx drops are calculated
in sw/ovs the same as qos/mtu/txfails but won't have that prefix. If
we really need a prefix 'cstat_' is the best I can think of.

What we're trying to solve here is to distinguish stats that counted
internally in OVS from the stats provided by the third parties.
Just to not have same/similar names

Usual stats from 'netdev_stats' are combined from various sources.
For example, 'rx_dropped' in netdev-dpdk is a combination of our QoS
dorps from OVS and HW counters like rte_stats.rx_nombuf.
For me it looks like for these stats we have kind of best-effort
policy to provide as full info as possible.  Trying to describe how
each of these counters calculated doesn't make sense because we do not
know the origin of many of the components that comes from DPDK or HW.


# ovs-vsctl get Interface dpdkvhost0 statistics
{"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0,
"rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176,
"rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0,
"rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0,
rx_errors=0, rx_packets=25622176, tx_bytes=3829825920,
tx_dropped=0, tx_packets=63830432, tx_retries=0}

Also, just to note that if there are changes to existing
interfaces/behaviour it should really mention that in the commit
message so it is highlighted.

+
+When the guest is not able to consume the packets fast enough,
+the transmit queue of the port gets filled up i.e queue runs out of
free descriptors.
+This is the most likely reason why dpdk transmit API will fail to
+send packets besides other reasons.
+
+The amount of tx failure drops on a dpdk vhost/physical interface
+can be shown with::
+
+  $ ovs-vsctl get Interface dpdkvhostclient0 \
+
+ statistics:netdev_dpdk_tx_failure_drops


The commit msg/docs are only focusing on one stat for vhost ports,
but there are other stats and dpdk ports, so they should all get some
mention.

I don't feel comfortable for documenting custom stats.  This is just
because we can't describe them all.  These are things to be changed over
time.
And we don't really know what some types of stats means and if they
means the same for different types of HW NICs (they are definitely
not) or OVS netdevs (even within netdev-dpdk).
And that is one more reason to have a prefix for OVS internal
statistics on which we have at least partial control.

I think, that user documentation could mention some special
statistics while describing the troubleshooting for some hard
special cases, but this should not be a glossary of all the possible
custom stats.  From my point of view, names of the stats should be
as possible self-descriptive and should not require additional
documentation.


For sure any prefix would need to be explained because that part
would not be intuitive.

We could add a note to the common DPDK guide about that in a single
sentence:
"There are custom statistics that OVS accumulates itself and these
stats has 'your_prefix_here_' as a prefix."


yes, that is what I had in mind. It clarifies that it is a custom stat which
explains why something like vhost tx_dropped does not have this prefix even
though it is also counted in ovs/sw/netdev_dpdk.

Anyway, I don't want Sriram to be held up over one line of documentation, so
i'm ok with whatever is decided.

thanks,
Kevin.

'cstat_' or 'sw_' might need this, but 'netdev_dpdk_' or 'ovs_' looks
self-descriptive enough to not do that.


thanks,
Kevin.

BTW, you will not find any description for statistics provided by
the linux or DPDK drivers.  You could only look at the driver code or
device spec.

Best regards, Ilya Maximets.


_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to