Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Mon, 12 Sep 2016 16:46:08 -0700 Eric Dumazet wrote:
> This XDP_TX thing was one of the XDP marketing stuff, but there is
> absolutely no documentation on it, warning users about possible
> limitations/outcomes.

I will take care of documentation for the XDP project.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Tue, Sep 13, 2016 at 10:37:32AM -0700, Eric Dumazet wrote:
> On Tue, 2016-09-13 at 10:13 -0700, Alexei Starovoitov wrote:
> > I'm afraid the point 'only for debugging' still didn't make it across.
> > xdp+e1k is for development (and debugging) of xdp-type of bpf
> > programs and _not_ for debugging of xdp itself, kernel or anything else.
> > The e1k provided interfaces and behavior need to match exactly
> > what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
> > Doing special hacks is not acceptable. Therefore your
> > 'proposed fix' misses the mark, since:
> > 1. ignoring bql/qdisc is not a bug, but the requirement
> > 2. such 'fix' goes against the goal above since behaviors will be
> > different and xdp developer won't be able to build something like
> > an xdp loadbalancer in kvm.
>
> Is e1k the only way a VM can receive and send packets ?
> Instead of adding more cruft to a legacy driver, risking breaking real
> old machines, I am sure we can find a modern alternative.

agree, that is the concern.
I've looked through qemu and it appears to emulate only e1k and tg3.
The latter is still used in the field, so the risk of touching it is
higher. The other alternative is virtio, but it doesn't have dma and/or
pages, so it looks to me like an even messier hack. The last alternative
considered was to invent an xdp-only fake 'hw' nic, but it's too much
work to get it into qemu and then ask the world to upgrade qemu.
At that point I ran out of ideas and settled on hacking e1k :(
Not proud of this hack at all.
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Tue, Sep 13, 2016 at 10:13 AM, Alexei Starovoitov wrote:
> On Tue, Sep 13, 2016 at 09:21:47AM -0700, Tom Herbert wrote:
>> On Mon, Sep 12, 2016 at 6:28 PM, Alexei Starovoitov wrote:
>> > On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
>> >> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
>> >> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>> >> >
>> >> >> yep. there are various ways to shoot yourself in the foot with xdp.
>> >> >> The simplest program that drops all the packets will make the box
>> >> >> unpingable.
>> >> >
>> >> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
>> >> > scooter on 101 highway ;)
>> >> >
>> >> > This XDP_TX thing was one of the XDP marketing stuff, but there is
>> >> > absolutely no documentation on it, warning users about possible
>> >> > limitations/outcomes.
>> >> >
>> >> > BTW, I am not sure mlx4 implementation even works, vs BQL :
>> >> >
>> >> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
>> >> > but tx completion will call netdev_tx_completed_queue() -> crash
>> >> >
>> >> > Do we have one test to validate that a XDP_TX implementation is actually
>> >> > correct ?
>> >> >
>> >> Obviously not for e1000 :-(. We really need some real test and
>> >> performance results and analysis on the interaction between the stack
>> >> data path and XDP data path.
>> >
>> > no. we don't need it for e1k and we cannot really do it.
>> > this patch is for debugging of xdp programs only.
>> >
>> You can say this "only for debugging" a thousand times and that
>> still won't justify putting bad code into the kernel. Material issues
>> have been raised with these patches, I have proposed a fix for one
>> core issue, and we have requested a lot more testing. So, please, if
>> you really want to move these patches forward, start addressing the
>> concerns being raised by reviewers.
>
> I'm afraid the point 'only for debugging' still didn't make it across.
> xdp+e1k is for development (and debugging) of xdp-type of bpf
> programs and _not_ for debugging of xdp itself, kernel or anything else.
> The e1k provided interfaces and behavior need to match exactly
> what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
> Doing special hacks is not acceptable. Therefore your
> 'proposed fix' misses the mark, since:
> 1. ignoring bql/qdisc is not a bug, but the requirement

You don't seem to understand the problem. In the shared queue scenario,
if one party (the stack) implements qdiscs, BQL, and such, and the other
(XDP) just throws packets onto the queue, then these are incompatible
behaviors and something will break. I suppose it's possible that somehow
this does not affect the stack path, but that remains to be proven. In
any case the patches under review look very much like they break things;
either a fix is needed or tests run to show it's not a problem. Until
this is resolved I am going to nack the patch.

Tom

> 2. such 'fix' goes against the goal above since behaviors will be
> different and xdp developer won't be able to build something like
> an xdp loadbalancer in kvm.
>
> If you have other concerns please raise them, or if you have
> suggestions on how to develop xdp programs without this e1k patch
> I would love to hear them.
> Alexander's review comments are discussed in a separate thread.
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Tue, 2016-09-13 at 10:13 -0700, Alexei Starovoitov wrote:
> I'm afraid the point 'only for debugging' still didn't make it across.
> xdp+e1k is for development (and debugging) of xdp-type of bpf
> programs and _not_ for debugging of xdp itself, kernel or anything else.
> The e1k provided interfaces and behavior need to match exactly
> what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
> Doing special hacks is not acceptable. Therefore your
> 'proposed fix' misses the mark, since:
> 1. ignoring bql/qdisc is not a bug, but the requirement
> 2. such 'fix' goes against the goal above since behaviors will be
> different and xdp developer won't be able to build something like
> an xdp loadbalancer in kvm.

Is e1k the only way a VM can receive and send packets ?

Instead of adding more cruft to a legacy driver, risking breaking real
old machines, I am sure we can find a modern alternative.
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Tue, Sep 13, 2016 at 09:21:47AM -0700, Tom Herbert wrote:
> On Mon, Sep 12, 2016 at 6:28 PM, Alexei Starovoitov wrote:
> > On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
> >> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
> >> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
> >> >
> >> >> yep. there are various ways to shoot yourself in the foot with xdp.
> >> >> The simplest program that drops all the packets will make the box
> >> >> unpingable.
> >> >
> >> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> >> > scooter on 101 highway ;)
> >> >
> >> > This XDP_TX thing was one of the XDP marketing stuff, but there is
> >> > absolutely no documentation on it, warning users about possible
> >> > limitations/outcomes.
> >> >
> >> > BTW, I am not sure mlx4 implementation even works, vs BQL :
> >> >
> >> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> >> > but tx completion will call netdev_tx_completed_queue() -> crash
> >> >
> >> > Do we have one test to validate that a XDP_TX implementation is actually
> >> > correct ?
> >> >
> >> Obviously not for e1000 :-(. We really need some real test and
> >> performance results and analysis on the interaction between the stack
> >> data path and XDP data path.
> >
> > no. we don't need it for e1k and we cannot really do it.
> > this patch is for debugging of xdp programs only.
> >
> You can say this "only for debugging" a thousand times and that
> still won't justify putting bad code into the kernel. Material issues
> have been raised with these patches, I have proposed a fix for one
> core issue, and we have requested a lot more testing. So, please, if
> you really want to move these patches forward, start addressing the
> concerns being raised by reviewers.

I'm afraid the point 'only for debugging' still didn't make it across.
xdp+e1k is for development (and debugging) of xdp-type of bpf
programs and _not_ for debugging of xdp itself, kernel or anything else.
The e1k provided interfaces and behavior need to match exactly
what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
Doing special hacks is not acceptable. Therefore your
'proposed fix' misses the mark, since:
1. ignoring bql/qdisc is not a bug, but the requirement
2. such 'fix' goes against the goal above since behaviors will be
different and xdp developer won't be able to build something like
an xdp loadbalancer in kvm.

If you have other concerns please raise them, or if you have
suggestions on how to develop xdp programs without this e1k patch
I would love to hear them.
Alexander's review comments are discussed in a separate thread.
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Mon, Sep 12, 2016 at 6:28 PM, Alexei Starovoitov wrote:
> On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
>> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
>> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>> >
>> >> yep. there are various ways to shoot yourself in the foot with xdp.
>> >> The simplest program that drops all the packets will make the box
>> >> unpingable.
>> >
>> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
>> > scooter on 101 highway ;)
>> >
>> > This XDP_TX thing was one of the XDP marketing stuff, but there is
>> > absolutely no documentation on it, warning users about possible
>> > limitations/outcomes.
>> >
>> > BTW, I am not sure mlx4 implementation even works, vs BQL :
>> >
>> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
>> > but tx completion will call netdev_tx_completed_queue() -> crash
>> >
>> > Do we have one test to validate that a XDP_TX implementation is actually
>> > correct ?
>> >
>> Obviously not for e1000 :-(. We really need some real test and
>> performance results and analysis on the interaction between the stack
>> data path and XDP data path.
>
> no. we don't need it for e1k and we cannot really do it.
> this patch is for debugging of xdp programs only.
>
You can say this "only for debugging" a thousand times and that
still won't justify putting bad code into the kernel. Material issues
have been raised with these patches, I have proposed a fix for one
core issue, and we have requested a lot more testing. So, please, if
you really want to move these patches forward, start addressing the
concerns being raised by reviewers.

Tom

>> The fact that these changes are being
>> passed off as something only needed for KVM is irrelevant; e1000 is a
>> well deployed NIC and there's no restriction that I see that would
>> prevent any users from enabling this feature on real devices.
>
> e1k is not even manufactured any more. Probably the only place
> where it can be found is a computer history museum.
> e1000e fares slightly better, but it's a different nic and this
> patch is not about it.
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
> >
> >> yep. there are various ways to shoot yourself in the foot with xdp.
> >> The simplest program that drops all the packets will make the box
> >> unpingable.
> >
> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> > scooter on 101 highway ;)
> >
> > This XDP_TX thing was one of the XDP marketing stuff, but there is
> > absolutely no documentation on it, warning users about possible
> > limitations/outcomes.
> >
> > BTW, I am not sure mlx4 implementation even works, vs BQL :
> >
> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> > but tx completion will call netdev_tx_completed_queue() -> crash
> >
> > Do we have one test to validate that a XDP_TX implementation is actually
> > correct ?
> >
> Obviously not for e1000 :-(. We really need some real test and
> performance results and analysis on the interaction between the stack
> data path and XDP data path.

no. we don't need it for e1k and we cannot really do it.
this patch is for debugging of xdp programs only.

> The fact that these changes are being
> passed off as something only needed for KVM is irrelevant; e1000 is a
> well deployed NIC and there's no restriction that I see that would
> prevent any users from enabling this feature on real devices.

e1k is not even manufactured any more. Probably the only place
where it can be found is a computer history museum.
e1000e fares slightly better, but it's a different nic and this
patch is not about it.
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Mon, Sep 12, 2016 at 04:46:08PM -0700, Eric Dumazet wrote:
> On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>
> > yep. there are various ways to shoot yourself in the foot with xdp.
> > The simplest program that drops all the packets will make the box
> > unpingable.
>
> Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> scooter on 101 highway ;)
>
> This XDP_TX thing was one of the XDP marketing stuff, but there is
> absolutely no documentation on it, warning users about possible
> limitations/outcomes.

that's a fair critique. there is no documentation for xdp in general.

> BTW, I am not sure mlx4 implementation even works, vs BQL :

it doesn't and it shouldn't. xdp and stack use different tx queues.

> mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> but tx completion will call netdev_tx_completed_queue() -> crash

not quite. netdev_tx_completed_queue() is not called for xdp queues.

> Do we have one test to validate that a XDP_TX implementation is actually
> correct ?

it's correct in the scope that it was defined for. There is no objective
to share the same tx queue with the stack in xdp. The queues must be
absolutely separate, otherwise performance will tank.
This e1k patch is really special, because the e1k has one tx queue, but
this is for debugging of the programs, so it's ok to cut corners.
The e1k xdp support doesn't need to interact nicely with the stack.
It's impossible in the first place. xdp is the layer before the stack.
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
> On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>
>> yep. there are various ways to shoot yourself in the foot with xdp.
>> The simplest program that drops all the packets will make the box
>> unpingable.
>
> Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> scooter on 101 highway ;)
>
> This XDP_TX thing was one of the XDP marketing stuff, but there is
> absolutely no documentation on it, warning users about possible
> limitations/outcomes.
>
> BTW, I am not sure mlx4 implementation even works, vs BQL :
>
> mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> but tx completion will call netdev_tx_completed_queue() -> crash
>
> Do we have one test to validate that a XDP_TX implementation is actually
> correct ?
>
Obviously not for e1000 :-(. We really need some real test and
performance results and analysis on the interaction between the stack
data path and XDP data path. The fact that these changes are being
passed off as something only needed for KVM is irrelevant; e1000 is a
well deployed NIC and there's no restriction that I see that would
prevent any users from enabling this feature on real devices. So these
patches need to be tested and justified. Eric's skepticism in all this
really doesn't surprise me...

Tom
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
> yep. there are various ways to shoot yourself in the foot with xdp.
> The simplest program that drops all the packets will make the box
> unpingable.

Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
scooter on 101 highway ;)

This XDP_TX thing was one of the XDP marketing stuff, but there is
absolutely no documentation on it, warning users about possible
limitations/outcomes.

BTW, I am not sure mlx4 implementation even works, vs BQL :

mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
but tx completion will call netdev_tx_completed_queue() -> crash

Do we have one test to validate that a XDP_TX implementation is actually
correct ?
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Mon, Sep 12, 2016 at 03:46:39PM -0700, Eric Dumazet wrote:
> On Mon, 2016-09-12 at 15:13 -0700, John Fastabend wrote:
> > From: Alexei Starovoitov
> >
> > +static void e1000_xmit_raw_frame(struct e1000_rx_buffer *rx_buffer_info,
> > +				 u32 len,
> > +				 struct net_device *netdev,
> > +				 struct e1000_adapter *adapter)
> > +{
> > +	struct netdev_queue *txq = netdev_get_tx_queue(netdev, 0);
> > +	struct e1000_hw *hw = &adapter->hw;
> > +	struct e1000_tx_ring *tx_ring;
> > +
> > +	if (len > E1000_MAX_DATA_PER_TXD)
> > +		return;
> > +
> > +	/* e1000 only supports a single txq at the moment so the queue is being
> > +	 * shared with stack. To support this requires locking to ensure the
> > +	 * stack and XDP are not running at the same time. Devices with
> > +	 * multiple queues should allocate a separate queue space.
> > +	 */
> > +	HARD_TX_LOCK(netdev, txq, smp_processor_id());
> > +
> > +	tx_ring = adapter->tx_ring;
> > +
> > +	if (E1000_DESC_UNUSED(tx_ring) < 2) {
> > +		HARD_TX_UNLOCK(netdev, txq);
> > +		return;
> > +	}
> > +
> > +	if (netif_xmit_frozen_or_stopped(txq))
> > +		return;
> > +
> > +	e1000_tx_map_rxpage(tx_ring, rx_buffer_info, len);
> > +	netdev_sent_queue(netdev, len);
> > +	e1000_tx_queue(adapter, tx_ring, 0/*tx_flags*/, 1);
> > +
> > +	writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
> > +	mmiowb();
> > +
> > +	HARD_TX_UNLOCK(netdev, txq);
> > +}
>
> e1000_tx_map() is full of workarounds.
>
> Have a look at last_tx_tso for example.
>
>	/* Workaround for Controller erratum --
>	 * descriptor for non-tso packet in a linear SKB that follows a
>	 * tso gets written back prematurely before the data is fully
>	 * DMA'd to the controller
>	 */
>	if (!skb->data_len && tx_ring->last_tx_tso &&
>	    !skb_is_gso(skb)) {
>		tx_ring->last_tx_tso = false;
>		size -= 4;
>	}
>
> Look, this XDP_TX thing is hard to properly implement and test on
> various NIC revisions.

my reading of these e1k workarounds is that they are for old versions
of the nic that don't even exist anymore. I believe none of them apply
to qemu e1k.

> Without proper queue management, high prio packets in qdisc won't be
> sent if NIC is under RX -> XDP_TX flood.

yep. there are various ways to shoot yourself in the foot with xdp.
The simplest program that drops all the packets will make the box
unpingable.

> Sounds a horrible feature to me.

yep. not pretty by any means. This whole thing is only to test xdp
programs under qemu, which realistically emulates only e1k. I simply
don't see any other options to test xdp under kvm.
Re: [net-next PATCH v3 2/3] e1000: add initial XDP support
On Mon, 2016-09-12 at 15:13 -0700, John Fastabend wrote:
> From: Alexei Starovoitov
>
> +static void e1000_xmit_raw_frame(struct e1000_rx_buffer *rx_buffer_info,
> +				 u32 len,
> +				 struct net_device *netdev,
> +				 struct e1000_adapter *adapter)
> +{
> +	struct netdev_queue *txq = netdev_get_tx_queue(netdev, 0);
> +	struct e1000_hw *hw = &adapter->hw;
> +	struct e1000_tx_ring *tx_ring;
> +
> +	if (len > E1000_MAX_DATA_PER_TXD)
> +		return;
> +
> +	/* e1000 only supports a single txq at the moment so the queue is being
> +	 * shared with stack. To support this requires locking to ensure the
> +	 * stack and XDP are not running at the same time. Devices with
> +	 * multiple queues should allocate a separate queue space.
> +	 */
> +	HARD_TX_LOCK(netdev, txq, smp_processor_id());
> +
> +	tx_ring = adapter->tx_ring;
> +
> +	if (E1000_DESC_UNUSED(tx_ring) < 2) {
> +		HARD_TX_UNLOCK(netdev, txq);
> +		return;
> +	}
> +
> +	if (netif_xmit_frozen_or_stopped(txq))
> +		return;
> +
> +	e1000_tx_map_rxpage(tx_ring, rx_buffer_info, len);
> +	netdev_sent_queue(netdev, len);
> +	e1000_tx_queue(adapter, tx_ring, 0/*tx_flags*/, 1);
> +
> +	writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
> +	mmiowb();
> +
> +	HARD_TX_UNLOCK(netdev, txq);
> +}

e1000_tx_map() is full of workarounds.

Have a look at last_tx_tso for example.

	/* Workaround for Controller erratum --
	 * descriptor for non-tso packet in a linear SKB that follows a
	 * tso gets written back prematurely before the data is fully
	 * DMA'd to the controller
	 */
	if (!skb->data_len && tx_ring->last_tx_tso &&
	    !skb_is_gso(skb)) {
		tx_ring->last_tx_tso = false;
		size -= 4;
	}

Look, this XDP_TX thing is hard to properly implement and test on
various NIC revisions.

Without proper queue management, high prio packets in qdisc won't be sent
if NIC is under RX -> XDP_TX flood.

Sounds a horrible feature to me.