Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-18 Thread Jesper Dangaard Brouer
On Mon, 12 Sep 2016 16:46:08 -0700
Eric Dumazet wrote:

> This XDP_TX thing was one of the XDP marketing stuff, but there is
> absolutely no documentation on it, warning users about possible
> limitations/outcomes.

I will take care of documentation for the XDP project.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Alexei Starovoitov
On Tue, Sep 13, 2016 at 10:37:32AM -0700, Eric Dumazet wrote:
> On Tue, 2016-09-13 at 10:13 -0700, Alexei Starovoitov wrote:
> 
> > I'm afraid the point 'only for debugging' still didn't make it across.
> > xdp+e1k is for development (and debugging) of xdp-type of bpf
> > programs and _not_ for debugging of xdp itself, kernel or anything else.
> > The e1k provided interfaces and behavior need to match exactly
> > what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
> > Doing special hacks is not acceptable. Therefore your
> > 'proposed fix' misses the mark, since:
> > 1. ignoring bql/qdisc is not a bug, but the requirement
> > 2. such 'fix' goes against the goal above since behaviors will be
> > different and xdp developer won't be able to build something like
> > xdp loadbalancer in the kvm.
> > 
> 
> 
> Is e1k the only way a VM can receive and send packets ?
> 
> Instead of adding more cruft to a legacy driver, risking breaking real
> old machines, 

agree that this is a concern.

> I am sure we can find modern alternative.

I've looked through qemu and it appears to only emulate e1k and tg3.
The latter is still used in the field, so the risk of touching
it is higher.
The other alternative is virtio, but it doesn't have dma and/or pages,
so it looks to me like an even messier hack.
The last alternative considered was to invent an xdp-only fake 'hw' nic,
but it's too much work to get it into qemu and then ask the world
to upgrade qemu.
At that point I ran out of ideas and settled on hacking e1k :(
Not proud of this hack at all.



Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Tom Herbert
On Tue, Sep 13, 2016 at 10:13 AM, Alexei Starovoitov wrote:
> On Tue, Sep 13, 2016 at 09:21:47AM -0700, Tom Herbert wrote:
>> On Mon, Sep 12, 2016 at 6:28 PM, Alexei Starovoitov wrote:
>> > On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
>> >> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
>> >> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>> >> >
>> >> >> yep. there are various ways to shoot yourself in the foot with xdp.
>> >> >> The simplest program that drops all the packets will make the box 
>> >> >> unpingable.
>> >> >
>> >> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
>> >> > scooter on 101 highway ;)
>> >> >
>> >> > This XDP_TX thing was one of the XDP marketing stuff, but there is
>> >> > absolutely no documentation on it, warning users about possible
>> >> > limitations/outcomes.
>> >> >
>> >> > BTW, I am not sure mlx4 implementation even works, vs BQL :
>> >> >
>> >> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
>> >> > but tx completion will call netdev_tx_completed_queue() -> crash
>> >> >
>> >> > Do we have one test to validate that a XDP_TX implementation is actually
>> >> > correct ?
>> >> >
>> >> Obviously not for e1000 :-(. We really need some real test and
>> >> performance results and analysis on the interaction between the stack
>> >> data path and XDP data path.
>> >
>> > no. we don't need it for e1k and we cannot really do it.
>> >  this patch is for debugging of xdp programs only.
>> >
>> You can say this "only for debugging" a thousand times and that
>> still won't justify putting bad code into the kernel. Material issues
>> have been raised with these patches, I have proposed a fix for one
>> core issue, and we have requested a lot more testing. So, please, if
>> you really want to move these patches forward start addressing the
>> concerns being raised by reviewers.
>
> I'm afraid the point 'only for debugging' still didn't make it across.
> xdp+e1k is for development (and debugging) of xdp-type of bpf
> programs and _not_ for debugging of xdp itself, kernel or anything else.
> The e1k provided interfaces and behavior need to match exactly
> what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
> Doing special hacks is not acceptable. Therefore your
> 'proposed fix' misses the mark, since:
> 1. ignoring bql/qdisc is not a bug, but the requirement

You don't seem to understand the problem. In the shared queue scenario
if one party (the stack) implements qdiscs, BQL, and such and the
other (XDP) just throws packets onto the queue then these are
incompatible behaviors and something will break. I suppose it's
possible that somehow this does not affect the stack path, but that
remains to be proven. In any case the patches under review look very
much like they break things; either a fix is needed or tests run to
show it's not a problem. Until this is resolved I am going to nack the
patch.

Tom

> 2. such 'fix' goes against the goal above since behaviors will be
> different and xdp developer won't be able to build something like
> xdp loadbalancer in the kvm.
>
> If you have other concerns please raise them or if you have
> suggestions on how to develop xdp programs without this e1k patch
> I would love to hear them.
> Alexander's review comments are discussed in a separate thread.
>


Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Eric Dumazet
On Tue, 2016-09-13 at 10:13 -0700, Alexei Starovoitov wrote:

> I'm afraid the point 'only for debugging' still didn't make it across.
> xdp+e1k is for development (and debugging) of xdp-type of bpf
> programs and _not_ for debugging of xdp itself, kernel or anything else.
> The e1k provided interfaces and behavior need to match exactly
> what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
> Doing special hacks is not acceptable. Therefore your
> 'proposed fix' misses the mark, since:
> 1. ignoring bql/qdisc is not a bug, but the requirement
> 2. such 'fix' goes against the goal above since behaviors will be
> different and xdp developer won't be able to build something like
> xdp loadbalancer in the kvm.
> 


Is e1k the only way a VM can receive and send packets ?

Instead of adding more cruft to a legacy driver, risking breaking real
old machines, I am sure we can find modern alternative.





Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Alexei Starovoitov
On Tue, Sep 13, 2016 at 09:21:47AM -0700, Tom Herbert wrote:
> On Mon, Sep 12, 2016 at 6:28 PM, Alexei Starovoitov wrote:
> > On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
> >> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
> >> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
> >> >
> >> >> yep. there are various ways to shoot yourself in the foot with xdp.
> >> >> The simplest program that drops all the packets will make the box 
> >> >> unpingable.
> >> >
> >> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> >> > scooter on 101 highway ;)
> >> >
> >> > This XDP_TX thing was one of the XDP marketing stuff, but there is
> >> > absolutely no documentation on it, warning users about possible
> >> > limitations/outcomes.
> >> >
> >> > BTW, I am not sure mlx4 implementation even works, vs BQL :
> >> >
> >> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> >> > but tx completion will call netdev_tx_completed_queue() -> crash
> >> >
> >> > Do we have one test to validate that a XDP_TX implementation is actually
> >> > correct ?
> >> >
> >> Obviously not for e1000 :-(. We really need some real test and
> >> performance results and analysis on the interaction between the stack
> >> data path and XDP data path.
> >
> > no. we don't need it for e1k and we cannot really do it.
> >  this patch is for debugging of xdp programs only.
> >
> You can say this "only for debugging" a thousand times and that
> still won't justify putting bad code into the kernel. Material issues
> have been raised with these patches, I have proposed a fix for one
> core issue, and we have requested a lot more testing. So, please, if
> you really want to move these patches forward start addressing the
> concerns being raised by reviewers.

I'm afraid the point 'only for debugging' still didn't make it across.
xdp+e1k is for development (and debugging) of xdp-type of bpf
programs and _not_ for debugging of xdp itself, kernel or anything else.
The e1k provided interfaces and behavior need to match exactly
what real hw nics (like mlx4, mlx5, ixgbe, i40e) will do.
Doing special hacks is not acceptable. Therefore your
'proposed fix' misses the mark, since:
1. ignoring bql/qdisc is not a bug, but the requirement
2. such 'fix' goes against the goal above since behaviors will be
different and xdp developer won't be able to build something like
xdp loadbalancer in the kvm.

If you have other concerns please raise them or if you have
suggestions on how to develop xdp programs without this e1k patch
I would love to hear them.
Alexander's review comments are discussed in a separate thread.



Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-13 Thread Tom Herbert
On Mon, Sep 12, 2016 at 6:28 PM, Alexei Starovoitov wrote:
> On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
>> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
>> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>> >
>> >> yep. there are various ways to shoot yourself in the foot with xdp.
>> >> The simplest program that drops all the packets will make the box 
>> >> unpingable.
>> >
>> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
>> > scooter on 101 highway ;)
>> >
>> > This XDP_TX thing was one of the XDP marketing stuff, but there is
>> > absolutely no documentation on it, warning users about possible
>> > limitations/outcomes.
>> >
>> > BTW, I am not sure mlx4 implementation even works, vs BQL :
>> >
>> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
>> > but tx completion will call netdev_tx_completed_queue() -> crash
>> >
>> > Do we have one test to validate that a XDP_TX implementation is actually
>> > correct ?
>> >
>> Obviously not for e1000 :-(. We really need some real test and
>> performance results and analysis on the interaction between the stack
>> data path and XDP data path.
>
> no. we don't need it for e1k and we cannot really do it.
>  this patch is for debugging of xdp programs only.
>
You can say this "only for debugging" a thousand times and that
still won't justify putting bad code into the kernel. Material issues
have been raised with these patches, I have proposed a fix for one
core issue, and we have requested a lot more testing. So, please, if
you really want to move these patches forward start addressing the
concerns being raised by reviewers.

Tom

>> The fact that these changes are being
>> passed off as something only needed for KVM is irrelevant, e1000 is a
>> well-deployed NIC and there's no restriction that I see that would
>> prevent any users from enabling this feature on real devices.
>
> e1k is not even manufactured any more. Probably the only place
> where it can be found is a computer history museum.
> e1000e fares slightly better, but it's a different nic and this
> patch is not about it.
>


Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-12 Thread Alexei Starovoitov
On Mon, Sep 12, 2016 at 05:03:25PM -0700, Tom Herbert wrote:
> On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
> > On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
> >
> >> yep. there are various ways to shoot yourself in the foot with xdp.
> >> The simplest program that drops all the packets will make the box 
> >> unpingable.
> >
> > Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> > scooter on 101 highway ;)
> >
> > This XDP_TX thing was one of the XDP marketing stuff, but there is
> > absolutely no documentation on it, warning users about possible
> > limitations/outcomes.
> >
> > BTW, I am not sure mlx4 implementation even works, vs BQL :
> >
> > mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> > but tx completion will call netdev_tx_completed_queue() -> crash
> >
> > Do we have one test to validate that a XDP_TX implementation is actually
> > correct ?
> >
> Obviously not for e1000 :-(. We really need some real test and
> performance results and analysis on the interaction between the stack
> data path and XDP data path.

no. we don't need it for e1k and we cannot really do it.
 this patch is for debugging of xdp programs only.

> The fact that these changes are being
> passed off as something only needed for KVM is irrelevant, e1000 is a
> well-deployed NIC and there's no restriction that I see that would
> prevent any users from enabling this feature on real devices.

e1k is not even manufactured any more. Probably the only place
where it can be found is a computer history museum.
e1000e fares slightly better, but it's a different nic and this
patch is not about it.



Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-12 Thread Alexei Starovoitov
On Mon, Sep 12, 2016 at 04:46:08PM -0700, Eric Dumazet wrote:
> On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
> 
> > yep. there are various ways to shoot yourself in the foot with xdp.
> > The simplest program that drops all the packets will make the box 
> > unpingable.
> 
> Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> scooter on 101 highway ;)
> 
> This XDP_TX thing was one of the XDP marketing stuff, but there is
> absolutely no documentation on it, warning users about possible
> limitations/outcomes.

that's a fair critique. there is no documentation for xdp in general.

> BTW, I am not sure mlx4 implementation even works, vs BQL :

it doesn't and it shouldn't. xdp and stack use different tx queues.

> mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> but tx completion will call netdev_tx_completed_queue() -> crash

not quite. netdev_tx_completed_queue() is not called for xdp queues.

> Do we have one test to validate that a XDP_TX implementation is actually
> correct ?

it's correct in the scope that it was defined for.
Sharing the same tx queue with the stack was never an objective for xdp.
The queues must be absolutely separate, otherwise performance will tank.

This e1k patch is really special, because the e1k has one tx queue,
but this is for debugging of the programs, so it's ok to cut corners.
The e1k xdp support doesn't need to interact nicely with stack.
It's impossible in the first place. xdp is the layer before the stack.



Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-12 Thread Tom Herbert
On Mon, Sep 12, 2016 at 4:46 PM, Eric Dumazet wrote:
> On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:
>
>> yep. there are various ways to shoot yourself in the foot with xdp.
>> The simplest program that drops all the packets will make the box unpingable.
>
> Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
> scooter on 101 highway ;)
>
> This XDP_TX thing was one of the XDP marketing stuff, but there is
> absolutely no documentation on it, warning users about possible
> limitations/outcomes.
>
> BTW, I am not sure mlx4 implementation even works, vs BQL :
>
> mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
> but tx completion will call netdev_tx_completed_queue() -> crash
>
> Do we have one test to validate that a XDP_TX implementation is actually
> correct ?
>
Obviously not for e1000 :-(. We really need some real test and
performance results and analysis on the interaction between the stack
data path and XDP data path. The fact that these changes are being
passed off as something only needed for KVM is irrelevant, e1000 is a
well-deployed NIC and there's no restriction that I see that would
prevent any users from enabling this feature on real devices. So these
patches need to be tested and justified. Eric's skepticism in all this
really doesn't surprise me...

Tom



Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-12 Thread Eric Dumazet
On Mon, 2016-09-12 at 16:07 -0700, Alexei Starovoitov wrote:

> yep. there are various ways to shoot yourself in the foot with xdp.
> The simplest program that drops all the packets will make the box unpingable.

Well, my comment was about XDP_TX only, not about XDP_DROP or driving a
scooter on 101 highway ;)

This XDP_TX thing was one of the XDP marketing stuff, but there is
absolutely no documentation on it, warning users about possible
limitations/outcomes.

BTW, I am not sure mlx4 implementation even works, vs BQL :

mlx4_en_xmit_frame() does not call netdev_tx_sent_queue(),
but tx completion will call netdev_tx_completed_queue() -> crash

Do we have one test to validate that a XDP_TX implementation is actually
correct ?





Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-12 Thread Alexei Starovoitov
On Mon, Sep 12, 2016 at 03:46:39PM -0700, Eric Dumazet wrote:
> On Mon, 2016-09-12 at 15:13 -0700, John Fastabend wrote:
> > From: Alexei Starovoitov 
> 
> > +static void e1000_xmit_raw_frame(struct e1000_rx_buffer *rx_buffer_info,
> > +u32 len,
> > +struct net_device *netdev,
> > +struct e1000_adapter *adapter)
> > +{
> > +   struct netdev_queue *txq = netdev_get_tx_queue(netdev, 0);
> > +   struct e1000_hw *hw = &adapter->hw;
> > +   struct e1000_tx_ring *tx_ring;
> > +
> > +   if (len > E1000_MAX_DATA_PER_TXD)
> > +   return;
> > +
> > +   /* e1000 only support a single txq at the moment so the queue is being
> > +* shared with stack. To support this requires locking to ensure the
> > +* stack and XDP are not running at the same time. Devices with
> > +* multiple queues should allocate a separate queue space.
> > +*/
> > +   HARD_TX_LOCK(netdev, txq, smp_processor_id());
> > +
> > +   tx_ring = adapter->tx_ring;
> > +
> > +   if (E1000_DESC_UNUSED(tx_ring) < 2) {
> > +   HARD_TX_UNLOCK(netdev, txq);
> > +   return;
> > +   }
> > +
> > +   if (netif_xmit_frozen_or_stopped(txq))
> > +   return;
> > +
> > +   e1000_tx_map_rxpage(tx_ring, rx_buffer_info, len);
> > +   netdev_sent_queue(netdev, len);
> > +   e1000_tx_queue(adapter, tx_ring, 0/*tx_flags*/, 1);
> > +
> > +   writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
> > +   mmiowb();
> > +
> > +   HARD_TX_UNLOCK(netdev, txq);
> > +}
> 
> 
> e1000_tx_map() is full of workarounds.
> 
> Have a look at last_tx_tso for example.
> 
> /* Workaround for Controller erratum --
>  * descriptor for non-tso packet in a linear SKB that follows a
>  * tso gets written back prematurely before the data is fully
>  * DMA'd to the controller
>  */
> if (!skb->data_len && tx_ring->last_tx_tso &&
>     !skb_is_gso(skb)) {
>         tx_ring->last_tx_tso = false;
>         size -= 4;
> }
> 
> Look, this XDP_TX thing is hard to properly implement and test on
> various NIC revisions.

my reading is that these e1k workarounds are for old versions of the nic that
don't even exist anymore. I believe none of them apply to qemu e1k.

> Without proper queue management, high prio packets in qdisc won't be sent
> if NIC is under RX -> XDP_TX flood.

yep. there are various ways to shoot yourself in the foot with xdp.
The simplest program that drops all the packets will make the box unpingable.

> Sounds a horrible feature to me.

yep. not pretty by any means.
This whole thing is only to test xdp programs under qemu which
realistically emulates only e1k.
I simply don't see any other options to test xdp under kvm.



Re: [net-next PATCH v3 2/3] e1000: add initial XDP support

2016-09-12 Thread Eric Dumazet
On Mon, 2016-09-12 at 15:13 -0700, John Fastabend wrote:
> From: Alexei Starovoitov 

> +static void e1000_xmit_raw_frame(struct e1000_rx_buffer *rx_buffer_info,
> +  u32 len,
> +  struct net_device *netdev,
> +  struct e1000_adapter *adapter)
> +{
> + struct netdev_queue *txq = netdev_get_tx_queue(netdev, 0);
> + struct e1000_hw *hw = &adapter->hw;
> + struct e1000_tx_ring *tx_ring;
> +
> + if (len > E1000_MAX_DATA_PER_TXD)
> + return;
> +
> + /* e1000 only support a single txq at the moment so the queue is being
> +  * shared with stack. To support this requires locking to ensure the
> +  * stack and XDP are not running at the same time. Devices with
> +  * multiple queues should allocate a separate queue space.
> +  */
> + HARD_TX_LOCK(netdev, txq, smp_processor_id());
> +
> + tx_ring = adapter->tx_ring;
> +
> + if (E1000_DESC_UNUSED(tx_ring) < 2) {
> + HARD_TX_UNLOCK(netdev, txq);
> + return;
> + }
> +
> + if (netif_xmit_frozen_or_stopped(txq))
> + return;
> +
> + e1000_tx_map_rxpage(tx_ring, rx_buffer_info, len);
> + netdev_sent_queue(netdev, len);
> + e1000_tx_queue(adapter, tx_ring, 0/*tx_flags*/, 1);
> +
> + writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
> + mmiowb();
> +
> + HARD_TX_UNLOCK(netdev, txq);
> +}


e1000_tx_map() is full of workarounds.

Have a look at last_tx_tso for example.

/* Workaround for Controller erratum --
 * descriptor for non-tso packet in a linear SKB that follows a
 * tso gets written back prematurely before the data is fully
 * DMA'd to the controller
 */
if (!skb->data_len && tx_ring->last_tx_tso &&
    !skb_is_gso(skb)) {
        tx_ring->last_tx_tso = false;
        size -= 4;
}

Look, this XDP_TX thing is hard to properly implement and test on
various NIC revisions.

Without proper queue management, high prio packets in qdisc won't be sent
if NIC is under RX -> XDP_TX flood.

Sounds a horrible feature to me.