On 22/10/2019 09:25, Sriram Vatala wrote: > Hi Ilya & Kevin, > Thanks for your suggestions. I am summarizing our discussion, please feel > free > to correct me, if I am wrong. >
Hi Sriram, Will share my thought, Ilya may have different view. > 1) All custom stats calculated in OVS will have the prefix , so that these > stats will not intersect with the names of other stats (driver/HW etc). yes > 2) The prefix used for these custom stats is "cstat_". I think with 3) (note to explain the prefix) then 'sw_' or 'ovs_' would look neater in the logs but i'm ok with any of them. > 3) Instead of documenting all the stats, I will add a generic note about the > prefix used for custom stats in "Documentation/topics/dpdk/vhost-user.rst > where "tx_retries" is documented. > yes, but these stats are not vhost specific so I think they should go into topics/dpdk/bridge.rst in the 'Extended & Custom Statistics' section. Kevin. > Thanks & Regards, > Sriram. > > -----Original Message----- > From: Kevin Traynor <[email protected]> > Sent: 21 October 2019 20:22 > To: Ilya Maximets <[email protected]>; Sriram Vatala > <[email protected]>; [email protected] > Cc: Stokes, Ian <[email protected]> > Subject: Re: [PATCH v9 2/2] netdev-dpdk:Detailed packet drop statistics > > On 18/10/2019 13:59, Ilya Maximets wrote: >> On 16.10.2019 19:48, Kevin Traynor wrote: >>> On 16/10/2019 13:16, Ilya Maximets wrote: >>>> Hi Kevin, >>>> >>>> Thanks for review. Some comments inline. >>>> >>>> On 16.10.2019 12:15, Kevin Traynor wrote: >>>>> Hi Sriram, >>>>> >>>>> Thanks for working on making more fine grained stats for debugging. >>>>> As mentioned yesterday, this patch needs rebase so I just reviewed >>>>> docs and netdev-dpdk.c which applied. Comments below. >>>>> >>>>> On 21/09/2019 03:40, Sriram Vatala wrote: >>>>>> OVS may be unable to transmit packets for multiple reasons and >>>>>> today there is a single counter to track packets dropped due to >>>>>> any of those reasons. The most common reason is that a VM is >>>>>> unable to read packets fast enough causing the vhostuser port >>>>>> transmit queue on the OVS side to become full. This manifests as a >>>>>> problem with VNFs not receiving all packets. Having a separate >>>>>> drop counter to track packets dropped because the transmit queue >>>>>> is full will clearly indicate that the problem is on the VM side >>>>>> and not in OVS. Similarly maintaining separate >>>>> >>>>> It reads like a bit of a contradiction. On one hand the "The *most >>>>> common* reason is that a VM is unable to read packets fast enough". >>>>> While having a stat "*will clearly indicate* that the problem is on >>>>> the VM side". >>>>> >>>>>> counters for all possible drops helps in indicating sensible cause >>>>>> for packet drops. >>>>>> >>>>>> This patch adds custom software stats counters to track packets >>>>>> dropped at port level and these counters are displayed along with >>>>>> other stats in "ovs-vsctl get interface <iface> statistics" >>>>>> command. The detailed stats will be available for both dpdk and >>>>>> vhostuser ports. >>>>>> >>>>> >>>>> I think the commit msg could be a bit clearer, suggest something >>>>> like below - feel free to modify (or reject): >>>>> >>>>> OVS may be unable to transmit packets for multiple reasons on the >>>>> DPDK datapath and today there is a single counter to track packets >>>>> dropped due to any of those reasons. >>>>> >>>>> This patch adds custom software stats for the different reasons >>>>> packets may be dropped during tx on the DPDK datapath in OVS. >>>>> >>>>> - MTU drops : drops that occur due to a too large packet size >>>>> - Qos drops: drops that occur due to egress QOS >>>>> - Tx failures: drops as returned by the DPDK PMD send function >>>>> >>>>> Note that the reason for tx failures is not specificied in OVS. In >>>>> practice for vhost ports it is most common that tx failures are >>>>> because there are not enough available descriptors, which is >>>>> usually caused by misconfiguration of the guest queues and/or >>>>> because the guest is not consuming packets fast enough from the queues. >>>>> >>>>> These counters are displayed along with other stats in "ovs-vsctl >>>>> get interface <iface> statistics" command and are available for >>>>> dpdk and vhostuser/vhostuserclient ports. >>>>> >>>>> Signed-off-by: Sriram Vatala <[email protected]> >>>>> >>>>> --- >>>>> >>>>> v9:... >>>>> v8:... >>>>> >>>>> >>>>> Note that signed-off, --- and version info should be like this ^^^. >>>>> otherwise the review version comments will be in the git commit log >>>>> when it is applied. >>>>> >>>>>> -- >>>>>> Changes since v8: >>>>>> Addressed comments given by Ilya. >>>>>> >>>>>> Signed-off-by: Sriram Vatala <[email protected]> >>>>>> --- >>>>>> Documentation/topics/dpdk/vhost-user.rst | 13 ++- >>>>>> lib/netdev-dpdk.c | 81 >>>>>> +++++++++++++++---- >>>>>> utilities/bugtool/automake.mk | 3 +- >>>>>> utilities/bugtool/ovs-bugtool-get-port-stats | 25 ++++++ >>>>>> .../plugins/network-status/openvswitch.xml | 1 + >>>>>> 5 files changed, 105 insertions(+), 18 deletions(-) >>>>>> create mode 100755 utilities/bugtool/ovs-bugtool-get-port-stats >>>>>> >>>>>> diff --git a/Documentation/topics/dpdk/vhost-user.rst >>>>>> b/Documentation/topics/dpdk/vhost-user.rst >>>>>> index 724aa62f6..89388a2bf 100644 >>>>>> --- a/Documentation/topics/dpdk/vhost-user.rst >>>>>> +++ b/Documentation/topics/dpdk/vhost-user.rst >>>>>> @@ -551,7 +551,18 @@ processing packets at the required rate. >>>>>> The amount of Tx retries on a vhost-user or vhost-user-client >>>>>> interface can be >>>>>> shown with:: >>>>>> >>>>>> - $ ovs-vsctl get Interface dpdkvhostclient0 >>>>>> statistics:tx_retries >>>>>> + $ ovs-vsctl get Interface dpdkvhostclient0 >>>>>> + statistics:netdev_dpdk_tx_retries >>>>> >>>>> I think the "netdev_dpdk_" prefixes should be removed for a few >>>>> reasons, >>>>> >>>>> - These are custom stats that will only be displayed for the dpdk >>>>> ports so they don't need any extra prefix to say they are for dpdk >>>>> ports >>>>> >>>>> - The 'tx_retries' stat is part of OVS 2.12, I don't like to change >>>>> its name to a different one in OVS 2.13 unless there is a very good >>>>> reason >>>>> >>>>> - The existing stats don't have this type of prefixes (granted most >>>>> of them are general stats): >>>> >>>> The main reason for the prefix for me is to distinguish those stats >>>> from the stats that comest from the driver/HW. Our new stats has >>>> very generic names that could be named a lot like or even equal to >>>> HW stats. This might be not an issue for vhostuser, but for HW NICs >>>> this may be really confusing for users. >>>> >>>> I'm not insisting on 'netdev_dpdk_' prefix, but, IMHO, we need a way >>>> to clearly distinguish those stats. We may use different prefixes >>>> like 'sw_' or just 'ovs_'. >>>> >>> >>> ok, I can see that we might want to distinguish custom stats to >>> indicate that they are specific for that device but that also has the >>> side effect of making them less self-descriptive and the user will >>> have to be told somewhere what the prefix means. >>> >>> 'sw_' and 'ovs_' seem better than 'netdev_dpdk_' but it might be >>> confusing too, as for example in DPDK case, Tx drops are calculated >>> in sw/ovs the same as qos/mtu/txfails but won't have that prefix. If >>> we really need a prefix 'cstat_' is the best I can think of. >> >> What we're trying to solve here is to distinguish stats that counted >> internally in OVS from the stats provided by the third parties. >> Just to not have same/similar names >> >> Usual stats from 'netdev_stats' are combined from various sources. >> For example, 'rx_dropped' in netdev-dpdk is a combination of our QoS >> dorps from OVS and HW counters like rte_stats.rx_nombuf. >> For me it looks like for these stats we have kind of best-effort >> policy to provide as full info as possible. Trying to describe how >> each of these counters calculated doesn't make sense because we do not >> know the origin of many of the components that comes from DPDK or HW. >> >>> >>>>> # ovs-vsctl get Interface dpdkvhost0 statistics >>>>> {"rx_1024_to_1522_packets"=0, "rx_128_to_255_packets"=0, >>>>> "rx_1523_to_max_packets"=0, "rx_1_to_64_packets"=25622176, >>>>> "rx_256_to_511_packets"=0, "rx_512_to_1023_packets"=0, >>>>> "rx_65_to_127_packets"=0, rx_bytes=1537330560, rx_dropped=0, >>>>> rx_errors=0, rx_packets=25622176, tx_bytes=3829825920, >>>>> tx_dropped=0, tx_packets=63830432, tx_retries=0} >>>>> >>>>> Also, just to note that if there are changes to existing >>>>> interfaces/behaviour it should really mention that in the commit >>>>> message so it is highlighted. >>>>> >>>>>> + >>>>>> +When the guest is not able to consume the packets fast enough, >>>>>> +the transmit queue of the port gets filled up i.e queue runs out of >>>>>> free descriptors. >>>>>> +This is the most likely reason why dpdk transmit API will fail to >>>>>> +send packets besides other reasons. >>>>>> + >>>>>> +The amount of tx failure drops on a dpdk vhost/physical interface >>>>>> +can be shown with:: >>>>>> + >>>>>> + $ ovs-vsctl get Interface dpdkvhostclient0 \ >>>>>> + >>>>>> + statistics:netdev_dpdk_tx_failure_drops >>>>>> >>>>> >>>>> The commit msg/docs are only focusing on one stat for vhost ports, >>>>> but there are other stats and dpdk ports, so they should all get some >>>>> mention. >>>> >>>> I don't feel comfortable for documenting custom stats. This is just >>>> because we can't describe them all. These are things to be changed over >>>> time. >>>> And we don't really know what some types of stats means and if they >>>> means the same for different types of HW NICs (they are definitely >>>> not) or OVS netdevs (even within netdev-dpdk). >>>> And that is one more reason to have a prefix for OVS internal >>>> statistics on which we have at least partial control. >>>> >>>> I think, that user documentation could mention some special >>>> statistics while describing the troubleshooting for some hard >>>> special cases, but this should not be a glossary of all the possible >>>> custom stats. From my point of view, names of the stats should be >>>> as possible self-descriptive and should not require additional >>>> documentation. >>>> >>> >>> For sure any prefix would need to be explained because that part >>> would not be intuitive. >> >> We could add a note to the common DPDK guide about that in a single >> sentence: >> "There are custom statistics that OVS accumulates itself and these >> stats has 'your_prefix_here_' as a prefix." >> > > yes, that is what I had in mind. It clarifies that it is a custom stat which > explains why something like vhost tx_dropped does not have this prefix even > though it is also counted in ovs/sw/netdev_dpdk. > > Anyway, I don't want Sriram to be held up over one line of documentation, so > i'm ok with whatever is decided. > > thanks, > Kevin. > >> 'cstat_' or 'sw_' might need this, but 'netdev_dpdk_' or 'ovs_' looks >> self-descriptive enough to not do that. >> >>> >>> thanks, >>> Kevin. >>> >>>> BTW, you will not find any description for statistics provided by >>>> the linux or DPDK drivers. You could only look at the driver code or >>>> device spec. >>>> >>>> Best regards, Ilya Maximets. >>>> >>> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
