Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
On Tue, Jun 12, 2018 at 8:18 AM, Alexander Duyck wrote: > This patch series is meant to allow support for the L2 forward offload, aka > MACVLAN offload without the need for using ndo_select_queue. > > The existing solution currently requires that we use ndo_select_queue in > the transmit path if we want to associate specific Tx queues with a given > MACVLAN interface. In order to get away from this we need to repurpose the > tc_to_txq array and XPS pointer for the MACVLAN interface and use those as > a means of accessing the queues on the lower device. As a result we cannot > offload a device that is configured as multiqueue, however it doesn't > really make sense to configure a macvlan interfaced as being multiqueue > anyway since it doesn't really have a qdisc of its own in the first place. > > I am submitting this as an RFC for the netdev mailing list, and officially > submitting it for testing to Jeff Kirsher's next-queue in order to validate > the ixgbe specific bits. > > The big changes in this set are: > Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN > Disable XPS for single queue devices > Replace accel_priv with sb_dev in ndo_select_queue > Add sb_dev parameter to fallback function for ndo_select_queue > Consolidated ndo_select_queue functions that appeared to be duplicates > > v2: Implement generic "select_queue" functions instead of "fallback" > functions. > Tweak last two patches to account for changes in dev_pick_tx_xxx > functions. > > --- > > Alexander Duyck (7): > net-sysfs: Drop support for XPS and traffic_class on single queue device > net: Add support for subordinate device traffic classes > ixgbe: Add code to populate and use macvlan tc to Tx queue map > net: Add support for subordinate traffic classes to netdev_pick_tx > net: Add generic ndo_select_queue functions > net: allow ndo_select_queue to pass netdev > net: allow fallback function to pass netdev > Alex, there were recent changes to Dave's net-next which caused a conflict with one or more of your patches. So I have removed this series from my tree for now, and will work with you on updating the series to work with the latest net-next tree.
Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
On Tue, Jun 12, 2018 at 10:50 AM, Stephen Hemminger wrote: > On Tue, 12 Jun 2018 11:18:25 -0400 > Alexander Duyck wrote: > >> This patch series is meant to allow support for the L2 forward offload, aka >> MACVLAN offload without the need for using ndo_select_queue. >> >> The existing solution currently requires that we use ndo_select_queue in >> the transmit path if we want to associate specific Tx queues with a given >> MACVLAN interface. In order to get away from this we need to repurpose the >> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as >> a means of accessing the queues on the lower device. As a result we cannot >> offload a device that is configured as multiqueue, however it doesn't >> really make sense to configure a macvlan interfaced as being multiqueue >> anyway since it doesn't really have a qdisc of its own in the first place. >> >> I am submitting this as an RFC for the netdev mailing list, and officially >> submitting it for testing to Jeff Kirsher's next-queue in order to validate >> the ixgbe specific bits. >> >> The big changes in this set are: >> Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN >> Disable XPS for single queue devices >> Replace accel_priv with sb_dev in ndo_select_queue >> Add sb_dev parameter to fallback function for ndo_select_queue >> Consolidated ndo_select_queue functions that appeared to be duplicates >> >> v2: Implement generic "select_queue" functions instead of "fallback" >> functions. >> Tweak last two patches to account for changes in dev_pick_tx_xxx >> functions. >> >> --- >> >> Alexander Duyck (7): >> net-sysfs: Drop support for XPS and traffic_class on single queue >> device >> net: Add support for subordinate device traffic classes >> ixgbe: Add code to populate and use macvlan tc to Tx queue map >> net: Add support for subordinate traffic classes to netdev_pick_tx >> net: Add generic ndo_select_queue functions >> net: allow ndo_select_queue to pass netdev >> net: allow fallback function to pass netdev >> >> >> drivers/infiniband/hw/hfi1/vnic_main.c|2 >> drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c |4 - >> drivers/net/bonding/bond_main.c |3 >> drivers/net/ethernet/amazon/ena/ena_netdev.c |5 - >> drivers/net/ethernet/broadcom/bcmsysport.c|6 - >> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |6 + >> drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |3 >> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |5 - >> drivers/net/ethernet/hisilicon/hns/hns_enet.c |5 - >> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 62 ++-- >> drivers/net/ethernet/lantiq_etop.c| 10 - >> drivers/net/ethernet/mellanox/mlx4/en_tx.c|7 + >> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |3 >> drivers/net/ethernet/mellanox/mlx5/core/en.h |3 >> drivers/net/ethernet/mellanox/mlx5/core/en_tx.c |5 - >> drivers/net/ethernet/renesas/ravb_main.c |3 >> drivers/net/ethernet/sun/ldmvsw.c |3 >> drivers/net/ethernet/sun/sunvnet.c|3 >> drivers/net/ethernet/ti/netcp_core.c |9 - >> drivers/net/hyperv/netvsc_drv.c |6 - >> drivers/net/macvlan.c | 10 - >> drivers/net/net_failover.c|7 + >> drivers/net/team/team.c |3 >> drivers/net/tun.c |3 >> drivers/net/wireless/marvell/mwifiex/main.c |3 >> drivers/net/xen-netback/interface.c |4 - >> drivers/net/xen-netfront.c|3 >> drivers/staging/netlogic/xlr_net.c|9 - >> drivers/staging/rtl8188eu/os_dep/os_intfs.c |3 >> drivers/staging/rtl8723bs/os_dep/os_intfs.c |7 - >> include/linux/netdevice.h | 34 - >> net/core/dev.c| 156 >> ++--- >> net/core/net-sysfs.c | 36 - >> net/mac80211/iface.c |4 - >> net/packet/af_packet.c|7 + >> 35 files changed, 312 insertions(+), 130 deletions(-) >> >> -- > > This makes sense. I thought you were hoping to get rid of select queue in > future? That would be nice, however there are still a bunch of corner cases that are not handled that have been dumped into select queue. For example in the case of ixgbe the issue is FCoE. There are a number of other places that are using it as well as I seem to recall netvsc and bonding both use it to store off the original Rx->Tx queue mapping when passing through the interface. For now I figure we can take this one hill at a time and I am just making it so we don't have to use ndo_select_queue in order to make vmdq
Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue
On Tue, Jun 12, 2018 at 10:56 AM, Florian Fainelli wrote: > On 06/12/2018 08:18 AM, Alexander Duyck wrote: >> This patch series is meant to allow support for the L2 forward offload, aka >> MACVLAN offload without the need for using ndo_select_queue. >> >> The existing solution currently requires that we use ndo_select_queue in >> the transmit path if we want to associate specific Tx queues with a given >> MACVLAN interface. In order to get away from this we need to repurpose the >> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as >> a means of accessing the queues on the lower device. As a result we cannot >> offload a device that is configured as multiqueue, however it doesn't >> really make sense to configure a macvlan interfaced as being multiqueue >> anyway since it doesn't really have a qdisc of its own in the first place. > > Interesting, so at some point I had came up with the following for > mapping queues between the DSA slave network devices and the DSA master > network device (doing the actual transmission). The DSA master network > device driver is just a normal network device driver. > > The set-up is as follows: 4 external Ethernet switch ports, each with 8 > egress queues and the DSA master (bcmsysport.c), aka CPU Ethernet > controller has 32 output queues, so you can do a 1:1 mapping of those, > that's actually what we want. A subsequent hardware generation only > provides 16 output queues, so we can still do 2:1 mapping. > > The implementation is done like this: > > - DSA slave network devices are always created after the DSA master > network device so we can leverage that > > - a specific notifier is running from the DSA core and tells the DSA > master about the switch position in the tree (position 0 = directly > attached), and the switch port number and a pointer to the slave network > device > > - we establish the mapping between the queues within the bcmsysport > driver as a simple array > > - when transmitting, DSA slave network devices set a specific queue/port > number within the 16-bits that skb->queue_mapping permits > > - this gets re-used by bcmsysport.c to extract the correct queue number > during ndo_select_queue such that the appropriate queue number gets used > and congestion works end-to-end. > > The reason why we do that is because there is some out of band HW that > monitors the queue depth of the switch port's egress queue and > back-pressure the Ethernet controller directly when trying to transmit > to a congested queue. > > I had initially considered establishing the mapping using tc and some > custom "bind" argument of some kind, but ended-up doing things the way > they are which are more automatic though they leave less configuration > to an user. This has a number of caveats though: > > - this is made generic within the context of DSA in that nothing is > switch driver or Ethernet MAC driver specific and the notifier > represents the contract between these two seemingly independent subsystems > > - the queue indicated between DSA slave and master is unfortunately > switch driver/controller specific (BRCM_TAG_SET_PORT_QUEUE, > BRCM_TAG_GET_PORT, BRCM_TAG_GET_QUEUE) > > What I like about your patchset is the mapping establishment, but as you > will read from my reply in patch 2, I think the (upper) 1:N (lower) > mapping might not work for my specific use case. > > Anyhow, not intended to be blocking this, as it seems to be going in the > right direction anyway. I think I am still not getting why the 1:N would be an issue. At least the way I have the code implemented here the lower queues all have a qdisc associated with them, just not the upper device. Generally I am using the macvlan as a bump in the wire to take care of filtering for the bridging mode. If I have to hairpin packets and send them back up on on of the the upper interfaces I want to do that in software rather than hardware so I try to take care of it there instead of routing it through the hardware. >> >> I am submitting this as an RFC for the netdev mailing list, and officially >> submitting it for testing to Jeff Kirsher's next-queue in order to validate >> the ixgbe specific bits. >> >> The big changes in this set are: >> Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN >> Disable XPS for single queue devices >> Replace accel_priv with sb_dev in ndo_select_queue >> Add sb_dev parameter to fallback function for ndo_select_queue >> Consolidated ndo_select_queue functions that appeared to be duplicates > > Interesting, turns out I had a possibly similar use case with DSA with > the slave network devices need to select an outgoing queue number for I was kind of assuming this could be applied to a number of possible use cases. As it was I was wondering if maybe we should look at adding this as an option for just a standard VLAN as we could perform the same kind of filtering and just deliver the packet directly to the VLAN interface instead of