Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue

2018-07-03 Thread Jeff Kirsher
On Tue, Jun 12, 2018 at 8:18 AM, Alexander Duyck
 wrote:
> This patch series is meant to allow support for the L2 forward offload, aka
> MACVLAN offload without the need for using ndo_select_queue.
>
> The existing solution currently requires that we use ndo_select_queue in
> the transmit path if we want to associate specific Tx queues with a given
> MACVLAN interface. In order to get away from this we need to repurpose the
> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
> a means of accessing the queues on the lower device. As a result we cannot
> offload a device that is configured as multiqueue, however it doesn't
> really make sense to configure a macvlan interfaced as being multiqueue
> anyway since it doesn't really have a qdisc of its own in the first place.
>
> I am submitting this as an RFC for the netdev mailing list, and officially
> submitting it for testing to Jeff Kirsher's next-queue in order to validate
> the ixgbe specific bits.
>
> The big changes in this set are:
>   Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
>   Disable XPS for single queue devices
>   Replace accel_priv with sb_dev in ndo_select_queue
>   Add sb_dev parameter to fallback function for ndo_select_queue
>   Consolidated ndo_select_queue functions that appeared to be duplicates
>
> v2: Implement generic "select_queue" functions instead of "fallback" 
> functions.
> Tweak last two patches to account for changes in dev_pick_tx_xxx 
> functions.
>
> ---
>
> Alexander Duyck (7):
>   net-sysfs: Drop support for XPS and traffic_class on single queue device
>   net: Add support for subordinate device traffic classes
>   ixgbe: Add code to populate and use macvlan tc to Tx queue map
>   net: Add support for subordinate traffic classes to netdev_pick_tx
>   net: Add generic ndo_select_queue functions
>   net: allow ndo_select_queue to pass netdev
>   net: allow fallback function to pass netdev
>

Alex, there were recent changes to Dave's net-next which caused a
conflict with one or more of your patches.  So I have removed this
series from my tree for now, and will work with you on updating the
series to work with the latest net-next tree.


Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue

2018-06-12 Thread Alexander Duyck
On Tue, Jun 12, 2018 at 10:50 AM, Stephen Hemminger
 wrote:
> On Tue, 12 Jun 2018 11:18:25 -0400
> Alexander Duyck  wrote:
>
>> This patch series is meant to allow support for the L2 forward offload, aka
>> MACVLAN offload without the need for using ndo_select_queue.
>>
>> The existing solution currently requires that we use ndo_select_queue in
>> the transmit path if we want to associate specific Tx queues with a given
>> MACVLAN interface. In order to get away from this we need to repurpose the
>> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
>> a means of accessing the queues on the lower device. As a result we cannot
>> offload a device that is configured as multiqueue, however it doesn't
>> really make sense to configure a macvlan interfaced as being multiqueue
>> anyway since it doesn't really have a qdisc of its own in the first place.
>>
>> I am submitting this as an RFC for the netdev mailing list, and officially
>> submitting it for testing to Jeff Kirsher's next-queue in order to validate
>> the ixgbe specific bits.
>>
>> The big changes in this set are:
>>   Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
>>   Disable XPS for single queue devices
>>   Replace accel_priv with sb_dev in ndo_select_queue
>>   Add sb_dev parameter to fallback function for ndo_select_queue
>>   Consolidated ndo_select_queue functions that appeared to be duplicates
>>
>> v2: Implement generic "select_queue" functions instead of "fallback" 
>> functions.
>> Tweak last two patches to account for changes in dev_pick_tx_xxx 
>> functions.
>>
>> ---
>>
>> Alexander Duyck (7):
>>   net-sysfs: Drop support for XPS and traffic_class on single queue 
>> device
>>   net: Add support for subordinate device traffic classes
>>   ixgbe: Add code to populate and use macvlan tc to Tx queue map
>>   net: Add support for subordinate traffic classes to netdev_pick_tx
>>   net: Add generic ndo_select_queue functions
>>   net: allow ndo_select_queue to pass netdev
>>   net: allow fallback function to pass netdev
>>
>>
>>  drivers/infiniband/hw/hfi1/vnic_main.c|2
>>  drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c |4 -
>>  drivers/net/bonding/bond_main.c   |3
>>  drivers/net/ethernet/amazon/ena/ena_netdev.c  |5 -
>>  drivers/net/ethernet/broadcom/bcmsysport.c|6 -
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |6 +
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h   |3
>>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c   |5 -
>>  drivers/net/ethernet/hisilicon/hns/hns_enet.c |5 -
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   62 ++--
>>  drivers/net/ethernet/lantiq_etop.c|   10 -
>>  drivers/net/ethernet/mellanox/mlx4/en_tx.c|7 +
>>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h  |3
>>  drivers/net/ethernet/mellanox/mlx5/core/en.h  |3
>>  drivers/net/ethernet/mellanox/mlx5/core/en_tx.c   |5 -
>>  drivers/net/ethernet/renesas/ravb_main.c  |3
>>  drivers/net/ethernet/sun/ldmvsw.c |3
>>  drivers/net/ethernet/sun/sunvnet.c|3
>>  drivers/net/ethernet/ti/netcp_core.c  |9 -
>>  drivers/net/hyperv/netvsc_drv.c   |6 -
>>  drivers/net/macvlan.c |   10 -
>>  drivers/net/net_failover.c|7 +
>>  drivers/net/team/team.c   |3
>>  drivers/net/tun.c |3
>>  drivers/net/wireless/marvell/mwifiex/main.c   |3
>>  drivers/net/xen-netback/interface.c   |4 -
>>  drivers/net/xen-netfront.c|3
>>  drivers/staging/netlogic/xlr_net.c|9 -
>>  drivers/staging/rtl8188eu/os_dep/os_intfs.c   |3
>>  drivers/staging/rtl8723bs/os_dep/os_intfs.c   |7 -
>>  include/linux/netdevice.h |   34 -
>>  net/core/dev.c|  156 
>> ++---
>>  net/core/net-sysfs.c  |   36 -
>>  net/mac80211/iface.c  |4 -
>>  net/packet/af_packet.c|7 +
>>  35 files changed, 312 insertions(+), 130 deletions(-)
>>
>> --
>
> This makes sense. I thought you were hoping to get rid of select queue in 
> future?

That would be nice, however there are still a bunch of corner cases
that are not handled that have been dumped into select queue. For
example in the case of ixgbe the issue is FCoE. There are a number of
other places that are using it as well as I seem to recall netvsc and
bonding both use it to store off the original Rx->Tx queue mapping
when passing through the interface.

For now I figure we can take this one hill at a time and I am just
making it so we don't have to use ndo_select_queue in order to make
vmdq

Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v2 0/7] Add support for L2 Fwd Offload w/o ndo_select_queue

2018-06-12 Thread Alexander Duyck
On Tue, Jun 12, 2018 at 10:56 AM, Florian Fainelli  wrote:
> On 06/12/2018 08:18 AM, Alexander Duyck wrote:
>> This patch series is meant to allow support for the L2 forward offload, aka
>> MACVLAN offload without the need for using ndo_select_queue.
>>
>> The existing solution currently requires that we use ndo_select_queue in
>> the transmit path if we want to associate specific Tx queues with a given
>> MACVLAN interface. In order to get away from this we need to repurpose the
>> tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
>> a means of accessing the queues on the lower device. As a result we cannot
>> offload a device that is configured as multiqueue, however it doesn't
>> really make sense to configure a macvlan interfaced as being multiqueue
>> anyway since it doesn't really have a qdisc of its own in the first place.
>
> Interesting, so at some point I had came up with the following for
> mapping queues between the DSA slave network devices and the DSA master
> network device (doing the actual transmission). The DSA master network
> device driver is just a normal network device driver.
>
> The set-up is as follows: 4 external Ethernet switch ports, each with 8
> egress queues and the DSA master (bcmsysport.c), aka CPU Ethernet
> controller has 32 output queues, so you can do a 1:1 mapping of those,
> that's actually what we want. A subsequent hardware generation only
> provides 16 output queues, so we can still do 2:1 mapping.
>
> The implementation is done like this:
>
> - DSA slave network devices are always created after the DSA master
> network device so we can leverage that
>
> - a specific notifier is running from the DSA core and tells the DSA
> master about the switch position in the tree (position 0 = directly
> attached), and the switch port number and a pointer to the slave network
> device
>
> - we establish the mapping between the queues within the bcmsysport
> driver as a simple array
>
> - when transmitting, DSA slave network devices set a specific queue/port
> number within the 16-bits that skb->queue_mapping permits
>
> - this gets re-used by bcmsysport.c to extract the correct queue number
> during ndo_select_queue such that the appropriate queue number gets used
> and congestion works end-to-end.
>
> The reason why we do that is because there is some out of band HW that
> monitors the queue depth of the switch port's egress queue and
> back-pressure the Ethernet controller directly when trying to transmit
> to a congested queue.
>
> I had initially considered establishing the mapping using tc and some
> custom "bind" argument of some kind, but ended-up doing things the way
> they are which are more automatic though they leave less configuration
> to an user. This has a number of caveats though:
>
> - this is made generic within the context of DSA in that nothing is
> switch driver or Ethernet MAC driver specific and the notifier
> represents the contract between these two seemingly independent subsystems
>
> - the queue indicated between DSA slave and master is unfortunately
> switch driver/controller specific (BRCM_TAG_SET_PORT_QUEUE,
> BRCM_TAG_GET_PORT, BRCM_TAG_GET_QUEUE)
>
> What I like about your patchset is the mapping establishment, but as you
> will read from my reply in patch 2, I think the (upper) 1:N (lower)
> mapping might not work for my specific use case.
>
> Anyhow, not intended to be blocking this, as it seems to be going in the
> right direction anyway.

I think I am still not getting why the 1:N would be an issue. At least
the way I have the code implemented here the lower queues all have a
qdisc associated with them, just not the upper device. Generally I am
using the macvlan as a bump in the wire to take care of filtering for
the bridging mode. If I have to hairpin packets and send them back up
on on of the the upper interfaces I want to do that in software rather
than hardware so I try to take care of it there instead of routing it
through the hardware.

>>
>> I am submitting this as an RFC for the netdev mailing list, and officially
>> submitting it for testing to Jeff Kirsher's next-queue in order to validate
>> the ixgbe specific bits.
>>
>> The big changes in this set are:
>>   Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
>>   Disable XPS for single queue devices
>>   Replace accel_priv with sb_dev in ndo_select_queue
>>   Add sb_dev parameter to fallback function for ndo_select_queue
>>   Consolidated ndo_select_queue functions that appeared to be duplicates
>
> Interesting, turns out I had a possibly similar use case with DSA with
> the slave network devices need to select an outgoing queue number for

I was kind of assuming this could be applied to a number of possible
use cases. As it was I was wondering if maybe we should look at adding
this as an option for just a standard VLAN as we could perform the
same kind of filtering and just deliver the packet directly to the
VLAN interface instead of