Monday, July 23, 2018 2:14 PM, tom.barbe...@uliege.be:
> Subject: Re: Mlx4/5 : Packets lost between phy and good counters
>
> Hi Shahaf,
>
> Thank you for the help!
>
> I did not notice that ethtool showed more stats; indeed, it would be great to
> have them in DPDK. As you suggested, rx_discards_phy is increasing, so
> packets are dropped there.
>
> However, it is not due to a lack of buffers (if you meant queues/ring buffers
> as opposed to some Mellanox internals), as the CPU is starving for work on
> every queue. We also made sure the CPU was not the problem by 1) using more
> CPU cores, and 2) introducing on-purpose extra instructions and cache misses
> on the CPU; neither led to any performance loss.
I didn't say the backpressure comes from the CPU; it is probably triggered by
the NIC for some reason (the PCI and scatter checks I requested from you were
simple sanity checks for possible causes).

> 1) Both cards on both machines are on a PCIe Gen 3 x16 slot, confirmed as
> such by both lspci and the mlx5 driver.
> 2) Disabling/enabling scatter mode in ethtool does not change performance,

Not through ethtool, by DPDK APIs.

> but I don't think we're using it anyway (we do nothing special in DPDK for
> this? Packets are always one segment).
> 3) We followed the performance guide(s) among other things, with the
> exception of CQE_COMPRESSION, as we didn't find any "mst" reference.
>
> We noticed that when using only one side of a port, that is, one machine
> only doing TX and the other only doing RX (discarding packets, but still
> rewriting them), we do send/receive 100G (the numbers discussed before lead
> to a ~80G "bouncing" throughput cap).
>
> This is still true with ConnectX-4 or ConnectX-5, and with different (Intel)
> machines with different motherboards. Maybe the mlx5 performs slightly
> better (bouncing 84G), but there is still this cap, and it may be due to
> other parameters.
>
> Interestingly, we found that this cap somehow depends on the card and not
> the port: if we use the two ports of the same PCIe card, forwarding from A
> to B and B to A at full speed, the throughput goes down to ~40G per port
> (so 80G total forwarding throughput), but if we use two different PCI
> Express cards, it is back to ~80G per side, so ~160G total forwarding rate
> (which also leads to the conclusion that our problem is not CPU-bound, as
> with more PCIe cards we get better performance).

It looks like the bottleneck is on the PCIe bus, and the CQE_COMPRESSION
configuration can be the reason for that (it is a feature that saves PCIe
bandwidth and is critical to reach 100G with small frames).

As this looks like a NIC/system configuration issue, I suggest opening a
ticket with Mellanox Support so they can look at your system and advise.
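Regarding the "by DPDK APIs" remark above on scatter mode: below is a minimal
sketch of how scattered RX is typically left disabled (or enabled) at
configure time with the ethdev API of that era (DPDK 18.x). The port id and
queue counts are placeholders, not values from this thread.

#include <string.h>
#include <rte_ethdev.h>

#define NB_RXQ 8   /* illustrative queue counts, not from the thread */
#define NB_TXQ 8

/* Configure a port so every received frame fits in one mbuf segment:
 * do not request DEV_RX_OFFLOAD_SCATTER and keep the default frame size. */
static int
configure_port_single_segment(uint16_t port_id)
{
	struct rte_eth_conf conf;

	memset(&conf, 0, sizeof(conf));
	conf.rxmode.max_rx_pkt_len = ETHER_MAX_LEN;  /* 1518B, no jumbo frames */
	conf.rxmode.offloads = 0;                    /* no DEV_RX_OFFLOAD_SCATTER */

	return rte_eth_dev_configure(port_id, NB_RXQ, NB_TXQ, &conf);
}

Enabling scatter would instead mean OR-ing DEV_RX_OFFLOAD_SCATTER into
rxmode.offloads (provided the device reports it in rx_offload_capa) together
with a larger max_rx_pkt_len.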
> Thanks,
>
> Tom
>
> ----- Original Message -----
> From: "Shahaf Shuler" <shah...@mellanox.com>
> To: "tom barbette" <tom.barbe...@uliege.be>, users@dpdk.org
> Cc: katsi...@kth.se, "Erez Ferber" <er...@mellanox.com>
> Sent: Sunday, July 22, 2018 07:14:05
> Subject: RE: Mlx4/5 : Packets lost between phy and good counters
>
> > Hi Tom,
> >
> > Wednesday, July 18, 2018 6:41 PM, tom.barbe...@uliege.be:
> >> Cc: katsi...@kth.se
> >> Subject: [dpdk-users] Mlx4/5 : Packets lost between phy and good
> >> counters
> >>
> >> Hi all,
> >>
> >> During a simple forwarding experiment using mlx4 (but we observed the
> >> same with mlx5) 100G NICs, we have a sender reporting more TX
> >> throughput than what the receiver is receiving, but the receiver does
> >> not report any packet loss... They are connected by a simple QSFP28
> >> direct attach cable. So where did the packets disappear?
> >>
> >> The only thing we could find is that rx_good_packets in xstats is
> >> lower than rx_packets_phy. rx_packets_phy is in line with what the
> >> sender is reporting, so I guess some of the "phy" packets are not
> >> "good". But no error counter (missed, mbuf_alloc, ...) gives us a clue
> >> why those packets are not "good".
> >>
> >> We tried with real traces and crafted UDP packets of various sizes,
> >> same problem.
> >>
> >> Any idea?
> >
> > Yes, what you are experiencing is a packet drop due to backpressure
> > from the device.
> >
> > The rx_good_packets are the good packets (without errors) received by
> > the port (which can be either a PF or a VF).
> > The rx_packets_phy are the packets received by the physical port (this
> > is the aggregation of the PF and all of its VFs).
> > A gap between those means some packets have been lost or, as you said,
> > received with errors. We are indeed missing one counter here, which is
> > rx_discard_phy: the number of received packets dropped due to lack of
> > buffers on the physical port. This work is in progress.
> >
> > There is another way to query this counter (and many others) for
> > Mellanox devices, by using Linux ethtool: "ethtool -S <ifname>"
> > (Mellanox devices keep their kernel module).
> > The statistics in DPDK are a shadow of the ethtool ones. You can read
> > more about those counters in the community doc [1].
> > In the ethtool statistics, look for the discard counter and check
> > whether it is increasing.
> >
> > Assuming it does, we need to understand why you have such backpressure.
> > Things to check:
> > 1. Is the PCI slot for your mlx5 device indeed x16?
> > 2. Are you using scatter mode with a large max_rx_pkt_len?
> > 3. Have you followed the mlx5 performance tuning guide [2]?
> >
> >> Below are the detailed stats of the receiver (which is a forwarder,
> >> but that is not important in this context):
> >>
> >> stats.count: 31986429
> >> stats.missed: 0
> >> stats.error: 0
> >> fd0.xstats:
> >> rx_good_packets[0] = 31986429
> >> tx_good_packets[1] = 31986429
> >> rx_good_bytes[2] = 47979639204
> >> tx_good_bytes[3] = 47851693488
> >> rx_missed_errors[4] = 0
> >> rx_errors[5] = 0
> >> tx_errors[6] = 0
> >> rx_mbuf_allocation_errors[7] = 0
> >> rx_q0packets[8] = 4000025
> >> rx_q0bytes[9] = 6000036068
> >> rx_q0errors[10] = 0
> >> rx_q1packets[11] = 4002151
> >> rx_q1bytes[12] = 6003226500
> >> rx_q1errors[13] = 0
> >> rx_q2packets[14] = 3996758
> >> rx_q2bytes[15] = 5995137000
> >> rx_q2errors[16] = 0
> >> rx_q3packets[17] = 3993614
> >> rx_q3bytes[18] = 5990421000
> >> rx_q3errors[19] = 0
> >> rx_q4packets[20] = 3995758
> >> rx_q4bytes[21] = 5993637000
> >> rx_q4errors[22] = 0
> >> rx_q5packets[23] = 3992126
> >> rx_q5bytes[24] = 5988189000
> >> rx_q5errors[25] = 0
> >> rx_q6packets[26] = 4007488
> >> rx_q6bytes[27] = 6011230568
> >> rx_q6errors[28] = 0
> >> rx_q7packets[29] = 3998509
> >> rx_q7bytes[30] = 5997762068
> >> rx_q7errors[31] = 0
> >> tx_q0packets[32] = 4000025
> >> tx_q0bytes[33] = 5984035968
> >> tx_q1packets[34] = 4002151
> >> tx_q1bytes[35] = 5987217896
> >> tx_q2packets[36] = 3996758
> >> tx_q2bytes[37] = 5979149968
> >> tx_q3packets[38] = 3993614
> >> tx_q3bytes[39] = 5974446544
> >> tx_q4packets[40] = 3995758
> >> tx_q4bytes[41] = 5977653968
> >> tx_q5packets[42] = 3992126
> >> tx_q5bytes[43] = 5972220496
> >> tx_q6packets[44] = 4007488
> >> tx_q6bytes[45] = 5995200616
> >> tx_q7packets[46] = 3998509
> >> tx_q7bytes[47] = 5981768032
> >> rx_port_unicast_bytes[48] = 47851693488
> >> rx_port_multicast_bytes[49] = 0
> >> rx_port_broadcast_bytes[50] = 0
> >> rx_port_unicast_packets[51] = 31986429
> >> rx_port_multicast_packets[52] = 0
> >> rx_port_broadcast_packets[53] = 0
> >> tx_port_unicast_bytes[54] = 47851693488
> >> tx_port_multicast_bytes[55] = 0
> >> tx_port_broadcast_bytes[56] = 0
> >> tx_port_unicast_packets[57] = 31986429
> >> tx_port_multicast_packets[58] = 0
> >> tx_port_broadcast_packets[59] = 0
> >> rx_wqe_err[60] = 0
> >> rx_crc_errors_phy[61] = 0
> >> rx_in_range_len_errors_phy[62] = 0
> >> rx_symbol_err_phy[63] = 0
> >> tx_errors_phy[64] = 0
> >> rx_out_of_buffer[65] = 0
> >> tx_packets_phy[66] = 31986429
> >> rx_packets_phy[67] = 36243270
> >> tx_bytes_phy[68] = 47979639204
> >> rx_bytes_phy[69] = 54364900704
> >>
> >> Thanks,
> >> Tom
> >
> > [1] https://community.mellanox.com/docs/DOC-2532
> > [2] https://doc.dpdk.org/guides/nics/mlx5.html
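A note on the numbers above: rx_packets_phy (36243270) minus rx_good_packets
(31986429) is 4256841 packets, so roughly 12% of the traffic arriving at the
physical port never reached an RX queue. Below is a minimal sketch for
watching this gap from the application itself through the generic
rte_eth_xstats_get() API; the helper name and port id are illustrative, not
from the thread. Once the rx_discard_phy counter mentioned above lands in the
PMD, it would show up through the same call.

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Print every extended counter of a port so that rx_packets_phy,
 * rx_good_packets and the phy-level discard counters can be compared. */
static void
dump_xstats(uint16_t port_id)
{
	int i, n;
	struct rte_eth_xstat *stats = NULL;
	struct rte_eth_xstat_name *names = NULL;

	n = rte_eth_xstats_get(port_id, NULL, 0);  /* query the number of counters */
	if (n <= 0)
		return;

	stats = calloc(n, sizeof(*stats));
	names = calloc(n, sizeof(*names));
	if (stats == NULL || names == NULL)
		goto out;

	if (rte_eth_xstats_get_names(port_id, names, n) != n ||
	    rte_eth_xstats_get(port_id, stats, n) != n)
		goto out;

	for (i = 0; i < n; i++)
		printf("%s = %" PRIu64 "\n", names[i].name, stats[i].value);

out:
	free(stats);
	free(names);
}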