Hi Tom,

Wednesday, July 18, 2018 6:41 PM, tom.barbe...@uliege.be:
> Cc: katsi...@kth.se
> Subject: [dpdk-users] Mlx4/5 : Packets lost between phy and good counters
>
> Hi all,
>
> During a simple forwarding experiment using mlx4 (but we observed the
> same with mlx5) 100G NICs, we have a sender reporting more TX throughput
> than the receiver is receiving, yet the receiver does not report any
> packet loss. They are connected by a simple QSFP28 direct-attach cable,
> so where did the packets disappear?
>
> The only thing we could find is that rx_good_packets in xstats is lower
> than rx_packets_phy. rx_packets_phy is in line with what the sender is
> reporting, so I guess some of the "phy" packets are not "good". But no
> error counter (missed, mbuf_alloc, ...) gives us a clue as to why those
> packets are not "good".
>
> We tried with real traces and crafted UDP packets of various sizes; same
> problem.
>
> Any idea?
Yes, what you are experiencing is packet drop due to backpressure from the device.

rx_good_packets counts the good packets (i.e. without errors) received by the port (which can be either a PF or a VF). rx_packets_phy counts the packets received by the physical port (this is the aggregation of the PF and all of its VFs). A gap between the two means some packets have been lost, or, as you said, received with errors.

We are indeed missing one counter here, rx_discard_phy, which counts the number of received packets dropped due to lack of buffers on the physical port. This work is in progress.

There is another way to query this counter (and many others) on Mellanox devices: the Linux ethtool, "ethtool -S <ifname>" (Mellanox devices keep their kernel module). The statistics in DPDK are a shadow of the ethtool ones. You can read more about those counters in the community doc[1].

In the ethtool statistics, look for the discard counter and check whether it is increasing. Assuming it is, we need to understand why you have such backpressure. Things to check:
1. Is the PCI slot of your mlx5 device indeed x16?
2. Are you using scatter mode with a large max_rx_pkt_len?
3. Have you followed the mlx5 performance tuning guide[2]?
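If it helps, here is a small sketch of how you could spot the relevant counters in the "ethtool -S" output. The sample text and the counter names in it are illustrative only; the exact names vary with the mlx5_core driver version, so match loosely on "discard"/"drop":

```python
# Sketch: flag non-zero drop/discard counters in `ethtool -S <ifname>` output.
# The sample below is made up for illustration; real counter names depend on
# the mlx5_core driver version.
import re

def nonzero_drop_counters(text):
    """Return {name: value} for drop/discard counters with a non-zero value."""
    found = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\w.]+):\s*(\d+)\s*$", line)
        if not m:
            continue
        name, value = m.group(1), int(m.group(2))
        if value and ("discard" in name or "drop" in name):
            found[name] = value
    return found

if __name__ == "__main__":
    # In practice you would feed in the real output, e.g.:
    #   out = subprocess.check_output(["ethtool", "-S", "<ifname>"], text=True)
    sample = """\
     rx_packets: 36243270
     rx_discards_phy: 4256841
     tx_discards_phy: 0
    """
    print(nonzero_drop_counters(sample))
```

Run it twice a few seconds apart; a counter that is both non-zero and growing is the one to chase.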
> Below the detailed stats of the receiver (which is a forwarder, but that
> is not of importance in this context):
>
> stats.count: 31986429
> stats.missed: 0
> stats.error: 0
> fd0.xstats:
> rx_good_packets[0] = 31986429
> tx_good_packets[1] = 31986429
> rx_good_bytes[2] = 47979639204
> tx_good_bytes[3] = 47851693488
> rx_missed_errors[4] = 0
> rx_errors[5] = 0
> tx_errors[6] = 0
> rx_mbuf_allocation_errors[7] = 0
> rx_q0packets[8] = 4000025
> rx_q0bytes[9] = 6000036068
> rx_q0errors[10] = 0
> rx_q1packets[11] = 4002151
> rx_q1bytes[12] = 6003226500
> rx_q1errors[13] = 0
> rx_q2packets[14] = 3996758
> rx_q2bytes[15] = 5995137000
> rx_q2errors[16] = 0
> rx_q3packets[17] = 3993614
> rx_q3bytes[18] = 5990421000
> rx_q3errors[19] = 0
> rx_q4packets[20] = 3995758
> rx_q4bytes[21] = 5993637000
> rx_q4errors[22] = 0
> rx_q5packets[23] = 3992126
> rx_q5bytes[24] = 5988189000
> rx_q5errors[25] = 0
> rx_q6packets[26] = 4007488
> rx_q6bytes[27] = 6011230568
> rx_q6errors[28] = 0
> rx_q7packets[29] = 3998509
> rx_q7bytes[30] = 5997762068
> rx_q7errors[31] = 0
> tx_q0packets[32] = 4000025
> tx_q0bytes[33] = 5984035968
> tx_q1packets[34] = 4002151
> tx_q1bytes[35] = 5987217896
> tx_q2packets[36] = 3996758
> tx_q2bytes[37] = 5979149968
> tx_q3packets[38] = 3993614
> tx_q3bytes[39] = 5974446544
> tx_q4packets[40] = 3995758
> tx_q4bytes[41] = 5977653968
> tx_q5packets[42] = 3992126
> tx_q5bytes[43] = 5972220496
> tx_q6packets[44] = 4007488
> tx_q6bytes[45] = 5995200616
> tx_q7packets[46] = 3998509
> tx_q7bytes[47] = 5981768032
> rx_port_unicast_bytes[48] = 47851693488
> rx_port_multicast_bytes[49] = 0
> rx_port_broadcast_bytes[50] = 0
> rx_port_unicast_packets[51] = 31986429
> rx_port_multicast_packets[52] = 0
> rx_port_broadcast_packets[53] = 0
> tx_port_unicast_bytes[54] = 47851693488
> tx_port_multicast_bytes[55] = 0
> tx_port_broadcast_bytes[56] = 0
> tx_port_unicast_packets[57] = 31986429
> tx_port_multicast_packets[58] = 0
> tx_port_broadcast_packets[59] = 0
> rx_wqe_err[60] = 0
> rx_crc_errors_phy[61] = 0
> rx_in_range_len_errors_phy[62] = 0
> rx_symbol_err_phy[63] = 0
> tx_errors_phy[64] = 0
> rx_out_of_buffer[65] = 0
> tx_packets_phy[66] = 31986429
> rx_packets_phy[67] = 36243270
> tx_bytes_phy[68] = 47979639204
> rx_bytes_phy[69] = 54364900704
>
> Thanks,
> Tom

[1] https://community.mellanox.com/docs/DOC-2532
[2] https://doc.dpdk.org/guides/nics/mlx5.html
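For reference, the gap is easy to quantify from the xstats you posted; this is just arithmetic on your numbers, nothing driver-specific:

```python
# Gap between the physical-port counters and the good-packet counters,
# taken directly from the posted xstats.
rx_packets_phy = 36243270
rx_good_packets = 31986429
rx_bytes_phy = 54364900704
rx_good_bytes = 47979639204

lost_packets = rx_packets_phy - rx_good_packets  # packets seen on the wire but not delivered
lost_bytes = rx_bytes_phy - rx_good_bytes
avg_size = lost_bytes / lost_packets             # average size of a lost packet

print(lost_packets, lost_bytes, avg_size)  # → 4256841 6385261500 1500.0
```

So about 4.26M packets (roughly 12% of what hit the physical port) were dropped before reaching the PF, which is what the missing discard counter would have shown.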