On 02/27/2013 02:04 PM, Tijs Van Buggenhout wrote:
> On Tuesday 26 February 2013 21:45:00 Tijs Van Buggenhout wrote:
>> This is a proposal to add QoS support to bgmac driver, allowing to use
>> (possibly) all of the tx queues supported by the hardware.
>>
>> The bgmac device is allocated as an ethernet multiqueue network device,
>> reporting the (maximum) tx queues supported by the hardware to the kernel.
>>
>> The default queue selection mechenism is replaced with the
>> bgmac_select_queue function, mapping the priorities of the skbs to a
>> specific tx queue number.

I would not replace the default algorithm.

>> Later this number is used in the bgmax_start_xmit function to select the
>> right tx queue, on which the new skb is added.
>> The hardware will generate an interrupt when the skb is handled, in which it
>> reports on what hardware queue (number/id) the event occured, allowing to
>> call bgmac_dma_tx_free on the right tx queue/ring.
>>
>> Mapping is according to 802.1d priority to traffic class mapping, as found
>> in et_linux.c :
> 
> Sorry, it resides in et/sys/etc.c, my mistake...
> 
>>         TX_Q1,          /* 0    BE    TC_BE    Best Effort */
>>         TX_Q0,          /* 1    BK    TC_BK    Background */
>>         TX_Q0,          /* 2    --    TC_BK    Background */
>>         TX_Q1,          /* 3    EE    TC_BE    Best Effort */
>>         TX_Q2,          /* 4    CL    TC_CL    Controlled Load */
>>         TX_Q2,          /* 5    VI    TC_CL    Controlled Load */
>>         TX_Q3,          /* 6    VO    TC_VO    Voice */
>>         TX_Q3           /* 7    NC    TC_VO    Voice */
>> Additionally any other value is mapped on TX_Q0..
>>
>> Impact on the source code is rather limited, some extra local variables for
>> netdev_queue, queue_id (u16), and all netif_.*_queue calls converted to
>> appropriate netdif_tx_.*_queue calls.
>>
>> I did not test extensively yet, but proof of concept works and is easily
>> tested by adding some debug code to the driver (in bgmac_dma_tx_add,
>> bgmac_poll and bgmac_select_queue functions) and using the ping command
>> on command line and the dropbear ssh (which uses a non-default TOS value
>> on packets: Minimize Delay). Console output shows the following:
>>
>> root@OpenWrt:/# ping 192.168.1.113
>> PING 192.168.1.113 (192.168.1.113): 56 data bytes
>> [   58.304000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
>> [   58.316000] bgmac bcma0:1: Select tx queue 1 for packet
>> [   58.320000] bgmac bcma0:1: Received interrupt for tx queue 1
>> [   58.328000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
>> [   58.332000] bgmac bcma0:1: Select tx queue 1 for packet
>> [   58.340000] bgmac bcma0:1: Received interrupt for tx queue 1
>> 64 bytes from 192.168.1.113: seq=0 ttl=64 time=40.513 ms
>> [   59.344000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
>> [   59.352000] bgmac bcma0:1: Select tx queue 1 for packet
>> [   59.356000] bgmac bcma0:1: Received interrupt for tx queue 1
>> 64 bytes from 192.168.1.113: seq=1 ttl=64 time=18.595 ms
>> [   60.364000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
>> [   60.372000] bgmac bcma0:1: Select tx queue 1 for packet
>> [   60.376000] bgmac bcma0:1: Received interrupt for tx queue 1
>> 64 bytes from 192.168.1.113: seq=2 ttl=64 time=18.550 ms
>> ^C
>> --- 192.168.1.113 ping statistics ---
>> 3 packets transmitted, 3 packets received, 0% packet loss
>> root@OpenWrt:/# ssh [email protected]
>> [  105.992000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
>> [  106.000000] bgmac bcma0:1: Select tx queue 1 for packet
>> [  106.004000] bgmac bcma0:1: Received interrupt for tx queue 1
>> [  106.012000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
>> [  106.016000] bgmac bcma0:1: Select tx queue 1 for packet
>> [  106.024000] bgmac bcma0:1: Received interrupt for tx queue 1
>> [  106.032000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
>> [  106.036000] bgmac bcma0:1: Select tx queue 3 for packet
>> [  106.044000] bgmac bcma0:1: Received interrupt for tx queue 3
>> [  106.048000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
>> [  106.056000] bgmac bcma0:1: Select tx queue 3 for packet
>> [  106.060000] bgmac bcma0:1: Received interrupt for tx queue 3
>> [  106.072000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
>> ...
>> Host '192.168.1.113' is not in the trusted hosts file.
>> (fingerprint md5 b7:12:75:ca:65:cb:c7:60:59:35:6a:5d:28:6d:27:83)
>> Do you want to continue connecting? (y/n)
>> [  106.400000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
>> [  106.404000] bgmac bcma0:1: Select tx queue 3 for packet
>> [  106.408000] bgmac bcma0:1: Received interrupt for tx queue 3
>> y
>> [  111.776000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
>> ...
>>
>> This patch is generated from git trunk.. Any comments/suggestions/ideas very
>> welcome...

This patch is whitespace damaged by your mail client. All the tabs are
replaced by spaces and so on this prevents this patch from applying.
Please read http://kerneltrap.org/Linux/Email_Clients_and_Patches or use
"git send-email".

Could you also send this for linux mainline inclusion to the netdev
mailing list.

>> Signed-off-by: Tijs Van Buggenhout <[email protected]>
>> ---
>>
>> diff --git a/drivers/net/ethernet/broadcom/bgmac.c
>> b/drivers/net/ethernet/broadcom/bgmac.c index 3fd3288..58b1d09 100644
>> --- a/drivers/net/ethernet/broadcom/bgmac.c
>> +++ b/drivers/net/ethernet/broadcom/bgmac.c
>> @@ -15,6 +15,8 @@
>>  #include <linux/mii.h>
>>  #include <linux/interrupt.h>
>>  #include <linux/dma-mapping.h>
>> +#include <linux/pkt_sched.h>          /* TC defines */
>> +#include <linux/skbuff.h>             /* skb_get_queue_mapping */
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
These comments are not needed.

>>  #include <asm/mach-bcm47xx/nvram.h>
>>
>>  static const struct bcma_device_id bgmac_bcma_tbl[] = {
>> @@ -102,15 +104,22 @@ static void bgmac_dma_tx_enable(struct bgmac *bgmac,
>>
>>  static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
>>                                     struct bgmac_dma_ring *ring,
>> -                                   struct sk_buff *skb)
>> +                                   struct sk_buff *skb,
>> +                                   u16 queue_id)
> 
> We could determine queue_id from (ring - &bgmac->tx_ring[0]) ofcourse,
> avoiding the extra function argument...

I like the current way more.

>>  {
>>         struct device *dma_dev = bgmac->core->dma_dev;
>>         struct net_device *net_dev = bgmac->net_dev;
>> +       struct netdev_queue *tx_queue;
>>         struct bgmac_dma_desc *dma_desc;
>>         struct bgmac_slot_info *slot;
>>         u32 ctl0, ctl1;
>>         int free_slots;
>>
>> +       if ((tx_queue = netdev_get_tx_queue(net_dev, queue_id)) == NULL) {
>> +               bgmac_err(bgmac, "Error mapping skb to TX queue (id %u)\n",
>> queue_id); +               goto err_drop;
>> +       }
>> +
>>         if (skb->len > BGMAC_DESC_CTL1_LEN) {
>>                 bgmac_err(bgmac, "Too long skb (%d)\n", skb->len);
>>                 goto err_stop_drop;
>> @@ -122,7 +131,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
>> free_slots = ring->start - ring->end;
>>         if (free_slots == 1) {
>>                 bgmac_err(bgmac, "TX ring is full, queue should be
>> stopped!\n"); -               netif_stop_queue(net_dev);
>> +               netif_tx_stop_queue(tx_queue);

This looks like an other fix.

>>                 return NETDEV_TX_BUSY;
>>         }
>>
>> @@ -160,20 +169,22 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac
>> *bgmac,
>>
>>         /* Always keep one slot free to allow detecting bugged calls. */
>>         if (--free_slots == 1)
>> -               netif_stop_queue(net_dev);
>> +               netif_tx_stop_queue(tx_queue);
>>
>>         return NETDEV_TX_OK;
>>
>>  err_stop_drop:
>> -       netif_stop_queue(net_dev);
>> +       netif_tx_stop_queue(tx_queue);
>> +err_drop:
>>         dev_kfree_skb(skb);
>>         return NETDEV_TX_OK;
>>  }
>>
>>  /* Free transmitted packets */
>> -static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring
>> *ring) +static void bgmac_dma_tx_free(struct bgmac *bgmac, struct
>> bgmac_dma_ring *ring, u16 queue_id) {
> 
> Also here the extra queue_id argument could be avoided..
> 
>>         struct device *dma_dev = bgmac->core->dma_dev;
>> +       struct netdev_queue *tx_queue = netdev_get_tx_queue(bgmac->net_dev,
>> queue_id); int empty_slot;
>>         bool freed = false;
>>
>> @@ -204,8 +215,8 @@ static void bgmac_dma_tx_free(struct bgmac *bgmac,
>> struct bgmac_dma_ring *ring) freed = true;
>>         }
>>
>> -       if (freed && netif_queue_stopped(bgmac->net_dev))
>> -               netif_wake_queue(bgmac->net_dev);
>> +       if (freed && netif_tx_queue_stopped(tx_queue))
>> +               netif_tx_wake_queue(tx_queue);
>>  }
>>
>>  static void bgmac_dma_rx_reset(struct bgmac *bgmac, struct bgmac_dma_ring
>> *ring) @@ -1096,14 +1107,23 @@ static irqreturn_t bgmac_interrupt(int irq,
>> void *dev_id)
>>
>>  static int bgmac_poll(struct napi_struct *napi, int weight)
>>  {
>> +       static const u32 tx_ring[BGMAC_MAX_TX_RINGS] = {
>> +               BGMAC_IS_TX0, BGMAC_IS_TX1,
>> +               BGMAC_IS_TX2, BGMAC_IS_TX3,
>> +       };
>>         struct bgmac *bgmac = container_of(napi, struct bgmac, napi);
>>         struct bgmac_dma_ring *ring;
>> -       int handled = 0;
>> -
>> -       if (bgmac->int_status & BGMAC_IS_TX0) {
>> -               ring = &bgmac->tx_ring[0];
>> -               bgmac_dma_tx_free(bgmac, ring);
>> -               bgmac->int_status &= ~BGMAC_IS_TX0;
>> +       int handled = 0, i;
>> +
>> +       if (bgmac->int_status & BGMAC_IS_TX_MASK) {
>> +               for (i = 0 ; (bgmac->int_status & BGMAC_IS_TX_MASK) &&
>> +                               i < BGMAC_MAX_TX_RINGS ; i++) {
>> +                       if (bgmac->int_status & tx_ring[i]) {
>> +                               ring = &bgmac->tx_ring[i];
>> +                               bgmac_dma_tx_free(bgmac, ring, (u16)i);
>> +                               bgmac->int_status &= ~tx_ring[i];
>> +                       }
>> +               }
>>         }
> 
> The if clause is redundant, as int_status is also checked in the condition of
> the for-loop, .. could be left out.

I would remove the (bgmac->int_status & BGMAC_IS_TX_MASK) from the for
loop. This is a hot path and if there is no tx ack we do not have to do
the for loop at all.

> 
>>         if (bgmac->int_status & BGMAC_IS_RX) {
>> @@ -1172,10 +1192,11 @@ static netdev_tx_t bgmac_start_xmit(struct sk_buff
>> *skb, {
>>         struct bgmac *bgmac = netdev_priv(net_dev);
>>         struct bgmac_dma_ring *ring;
>> +       u16 queue_id = 0;
>>
>> -       /* No QOS support yet */
>> -       ring = &bgmac->tx_ring[0];
>> -       return bgmac_dma_tx_add(bgmac, ring, skb);
>> +       queue_id = skb_get_queue_mapping(skb);
>> +       ring = &bgmac->tx_ring[queue_id];
>> +       return bgmac_dma_tx_add(bgmac, ring, skb, queue_id);
>>  }
>>
>>  static int bgmac_set_mac_address(struct net_device *net_dev, void *addr)
>> @@ -1217,6 +1238,32 @@ static int bgmac_ioctl(struct net_device *net_dev,
>> struct ifreq *ifr, int cmd) }
>>  }
>>
>> +static u16 bgmac_select_queue(struct net_device *net_dev, struct sk_buff
>> *skb) +{
>> +       u32 priority = skb->priority;
>> +
>> +       /* 802.1d priority to traffic class mapping for 4 queues */
>> +       switch (priority) {
>> +               case TC_PRIO_BESTEFFORT:            /* Best effort */
>> +               case 3:                             /* Excellent effort */
>> +                       /* Best effort queue TX_Q1 */
>> +                       return 1;
>> +               case TC_PRIO_INTERACTIVE_BULK:      /* Controlled load */
>> +               case 5:                             /* Video */
>> +                       /* Controlled Load queue TX_Q2 */
>> +                       return 2;
>> +               case TC_PRIO_INTERACTIVE:          /* Voice */
>> +               case TC_PRIO_CONTROL:              /* Network control */
>> +                       /* Interactive queue TX_Q3 */
>> +                       return 3;
>> +               case TC_PRIO_FILLER:
>> +               case TC_PRIO_BULK:
>> +               default:
>> +                       /* Background queue TX_Q0 */
>> +                       return 0;
>> +       }
>> +}

You should not use priority, no other ethernet driver does so, use
skb_get_queue_mapping() and see select_ring_by_priority() in
drivers/net/wireless/b43/dma.c

>> +
>>  static const struct net_device_ops bgmac_netdev_ops = {
>>         .ndo_open               = bgmac_open,
>>         .ndo_stop               = bgmac_stop,
>> @@ -1225,6 +1272,7 @@ static const struct net_device_ops bgmac_netdev_ops =
>> { .ndo_set_mac_address    = bgmac_set_mac_address,
>>         .ndo_validate_addr      = eth_validate_addr,
>>         .ndo_do_ioctl           = bgmac_ioctl,
>> +       .ndo_select_queue       = bgmac_select_queue,
>>  };
>>
>>  /**************************************************
>> @@ -1330,7 +1378,7 @@ static int bgmac_probe(struct bcma_device *core)
>>         }
>>
>>         /* Allocation and references */
>> -       net_dev = alloc_etherdev(sizeof(*bgmac));
>> +       net_dev = alloc_etherdev_mq(sizeof(*bgmac), BGMAC_MAX_TX_RINGS);
>>         if (!net_dev)
>>                 return -ENOMEM;
>>         net_dev->netdev_ops = &bgmac_netdev_ops;
>> _______________________________________________
>> openwrt-devel mailing list
>> [email protected]
>> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
> 
> We could keep some counters on the individual rings, to print statistics later
> e.g. with ethtool.

This driver currently does not generate many statics, but this should be
improved in general.

> We could have a qos boolean to en/disable qos, which can be checked in
> bgmac_select_queue to e.g always return TX_Q0 when disabled, or one of four
> tx queues, as shown above.

when you use skb_get_queue_mapping() it should be configurable from user
space.

Hauke
_______________________________________________
openwrt-devel mailing list
[email protected]
https://lists.openwrt.org/mailman/listinfo/openwrt-devel

Reply via email to