On Thursday 28 February 2013 15:02:26 Hauke Mehrtens wrote:
> On 02/27/2013 02:04 PM, Tijs Van Buggenhout wrote:
> > On Tuesday 26 February 2013 21:45:00 Tijs Van Buggenhout wrote:
> >> This is a proposal to add QoS support to bgmac driver, allowing to use
> >> (possibly) all of the tx queues supported by the hardware.
> >> 
> >> The bgmac device is allocated as an ethernet multiqueue network device,
> >> reporting the (maximum) tx queues supported by the hardware to the
> >> kernel.
> >> 
> >> The default queue selection mechenism is replaced with the
> >> bgmac_select_queue function, mapping the priorities of the skbs to a
> >> specific tx queue number.
> 
> I would not replace the default algorithm.

I would agree on that. I first tried with the default algorithm, but it maps 
the priorities to the queues in a very different way (other than found in 
broadcom driver, as displayed below). I don't know if it is intentional or not 
(by broadcom).. does the hardware treat TX queues differently in priority, 
e.g. broadcom driver prioritises TX_Q3 over TX_Q2, over TX_Q1, over TX_Q0.

When using the default algorithm, I saw normal packets (without priority 
explicitely set), like ICMP being mapped to TX_Q3, whereas ssh traffic was 
going to TX_Q2. As it seems, the default algorithm can map priorities 
differently every time the driver/kernel is initialised (random). Not really 
what we want.

> >> Later this number is used in the bgmax_start_xmit function to select the
> >> right tx queue, on which the new skb is added.
> >> The hardware will generate an interrupt when the skb is handled, in which
> >> it reports on what hardware queue (number/id) the event occured,
> >> allowing to call bgmac_dma_tx_free on the right tx queue/ring.
> >> 
> >> Mapping is according to 802.1d priority to traffic class mapping, as
> >> found
> > 
> >> in et_linux.c :
> > Sorry, it resides in et/sys/etc.c, my mistake...
> > 
> >>         TX_Q1,          /* 0    BE    TC_BE    Best Effort */
> >>         TX_Q0,          /* 1    BK    TC_BK    Background */
> >>         TX_Q0,          /* 2    --    TC_BK    Background */
> >>         TX_Q1,          /* 3    EE    TC_BE    Best Effort */
> >>         TX_Q2,          /* 4    CL    TC_CL    Controlled Load */
> >>         TX_Q2,          /* 5    VI    TC_CL    Controlled Load */
> >>         TX_Q3,          /* 6    VO    TC_VO    Voice */
> >>         TX_Q3           /* 7    NC    TC_VO    Voice */
> >> 
> >> Additionally any other value is mapped on TX_Q0..
> >> 
> >> Impact on the source code is rather limited, some extra local variables
> >> for
> >> netdev_queue, queue_id (u16), and all netif_.*_queue calls converted to
> >> appropriate netdif_tx_.*_queue calls.
> >> 
> >> I did not test extensively yet, but proof of concept works and is easily
> >> tested by adding some debug code to the driver (in bgmac_dma_tx_add,
> >> bgmac_poll and bgmac_select_queue functions) and using the ping command
> >> on command line and the dropbear ssh (which uses a non-default TOS value
> >> on packets: Minimize Delay). Console output shows the following:
> >> 
> >> root@OpenWrt:/# ping 192.168.1.113
> >> PING 192.168.1.113 (192.168.1.113): 56 data bytes
> >> [   58.304000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
> >> [   58.316000] bgmac bcma0:1: Select tx queue 1 for packet
> >> [   58.320000] bgmac bcma0:1: Received interrupt for tx queue 1
> >> [   58.328000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
> >> [   58.332000] bgmac bcma0:1: Select tx queue 1 for packet
> >> [   58.340000] bgmac bcma0:1: Received interrupt for tx queue 1
> >> 64 bytes from 192.168.1.113: seq=0 ttl=64 time=40.513 ms
> >> [   59.344000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
> >> [   59.352000] bgmac bcma0:1: Select tx queue 1 for packet
> >> [   59.356000] bgmac bcma0:1: Received interrupt for tx queue 1
> >> 64 bytes from 192.168.1.113: seq=1 ttl=64 time=18.595 ms
> >> [   60.364000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
> >> [   60.372000] bgmac bcma0:1: Select tx queue 1 for packet
> >> [   60.376000] bgmac bcma0:1: Received interrupt for tx queue 1
> >> 64 bytes from 192.168.1.113: seq=2 ttl=64 time=18.550 ms
> >> ^C
> >> --- 192.168.1.113 ping statistics ---
> >> 3 packets transmitted, 3 packets received, 0% packet loss
> >> root@OpenWrt:/# ssh [email protected]
> >> [  105.992000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
> >> [  106.000000] bgmac bcma0:1: Select tx queue 1 for packet
> >> [  106.004000] bgmac bcma0:1: Received interrupt for tx queue 1
> >> [  106.012000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0
> >> [  106.016000] bgmac bcma0:1: Select tx queue 1 for packet
> >> [  106.024000] bgmac bcma0:1: Received interrupt for tx queue 1
> >> [  106.032000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
> >> [  106.036000] bgmac bcma0:1: Select tx queue 3 for packet
> >> [  106.044000] bgmac bcma0:1: Received interrupt for tx queue 3
> >> [  106.048000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
> >> [  106.056000] bgmac bcma0:1: Select tx queue 3 for packet
> >> [  106.060000] bgmac bcma0:1: Received interrupt for tx queue 3
> >> [  106.072000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
> >> ...
> >> Host '192.168.1.113' is not in the trusted hosts file.
> >> (fingerprint md5 b7:12:75:ca:65:cb:c7:60:59:35:6a:5d:28:6d:27:83)
> >> Do you want to continue connecting? (y/n)
> >> [  106.400000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
> >> [  106.404000] bgmac bcma0:1: Select tx queue 3 for packet
> >> [  106.408000] bgmac bcma0:1: Received interrupt for tx queue 3
> >> y
> >> [  111.776000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6
> >> ...
> >> 
> >> This patch is generated from git trunk.. Any comments/suggestions/ideas
> >> very welcome...
> 
> This patch is whitespace damaged by your mail client. All the tabs are
> replaced by spaces and so on this prevents this patch from applying.
> Please read http://kerneltrap.org/Linux/Email_Clients_and_Patches or use
> "git send-email".
> 
> Could you also send this for linux mainline inclusion to the netdev
> mailing list.

I'll use inline attachment next time, sorry for that...

> >> Signed-off-by: Tijs Van Buggenhout <[email protected]>
> >> ---
> >> 
> >> diff --git a/drivers/net/ethernet/broadcom/bgmac.c
> >> b/drivers/net/ethernet/broadcom/bgmac.c index 3fd3288..58b1d09 100644
> >> --- a/drivers/net/ethernet/broadcom/bgmac.c
> >> +++ b/drivers/net/ethernet/broadcom/bgmac.c
> >> @@ -15,6 +15,8 @@
> >> 
> >>  #include <linux/mii.h>
> >>  #include <linux/interrupt.h>
> >>  #include <linux/dma-mapping.h>
> >> 
> >> +#include <linux/pkt_sched.h>          /* TC defines */
> >> +#include <linux/skbuff.h>             /* skb_get_queue_mapping */
> 
>                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> These comments are not needed.
> 
> >>  #include <asm/mach-bcm47xx/nvram.h>
> >>  
> >>  static const struct bcma_device_id bgmac_bcma_tbl[] = {
> >> 
> >> @@ -102,15 +104,22 @@ static void bgmac_dma_tx_enable(struct bgmac
> >> *bgmac,
> >> 
> >>  static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac,
> >>  
> >>                                     struct bgmac_dma_ring *ring,
> >> 
> >> -                                   struct sk_buff *skb)
> >> +                                   struct sk_buff *skb,
> >> +                                   u16 queue_id)
> > 
> > We could determine queue_id from (ring - &bgmac->tx_ring[0]) ofcourse,
> > avoiding the extra function argument...
> 
> I like the current way more.
> 
> >>  {
> >>  
> >>         struct device *dma_dev = bgmac->core->dma_dev;
> >>         struct net_device *net_dev = bgmac->net_dev;
> >> 
> >> +       struct netdev_queue *tx_queue;
> >> 
> >>         struct bgmac_dma_desc *dma_desc;
> >>         struct bgmac_slot_info *slot;
> >>         u32 ctl0, ctl1;
> >>         int free_slots;
> >> 
> >> +       if ((tx_queue = netdev_get_tx_queue(net_dev, queue_id)) == NULL)
> >> {
> >> +               bgmac_err(bgmac, "Error mapping skb to TX queue (id
> >> %u)\n",
> >> queue_id); +               goto err_drop;
> >> +       }
> >> +
> >> 
> >>         if (skb->len > BGMAC_DESC_CTL1_LEN) {
> >>         
> >>                 bgmac_err(bgmac, "Too long skb (%d)\n", skb->len);
> >>                 goto err_stop_drop;
> >> 
> >> @@ -122,7 +131,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac
> >> *bgmac, free_slots = ring->start - ring->end;
> >> 
> >>         if (free_slots == 1) {
> >>         
> >>                 bgmac_err(bgmac, "TX ring is full, queue should be
> >> 
> >> stopped!\n"); -               netif_stop_queue(net_dev);
> >> +               netif_tx_stop_queue(tx_queue);
> 
> This looks like an other fix.

Not really .. all netif_stop_queue's in dma_tx_add or dma_tx_free need to be 
converted to netif_tx_stop_queue, to only address that specific queue being 
manipulated?
 
> >>                 return NETDEV_TX_BUSY;
> >>         
> >>         }
> >> 
> >> @@ -160,20 +169,22 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac
> >> *bgmac,
> >> 
> >>         /* Always keep one slot free to allow detecting bugged calls. */
> >>         if (--free_slots == 1)
> >> 
> >> -               netif_stop_queue(net_dev);
> >> +               netif_tx_stop_queue(tx_queue);
> >> 
> >>         return NETDEV_TX_OK;
> >>  
> >>  err_stop_drop:
> >> -       netif_stop_queue(net_dev);
> >> +       netif_tx_stop_queue(tx_queue);
> >> 
> >> +err_drop:
> >>         dev_kfree_skb(skb);
> >>         return NETDEV_TX_OK;
> >>  
> >>  }
> >>  
> >>  /* Free transmitted packets */
> >> 
> >> -static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring
> >> *ring) +static void bgmac_dma_tx_free(struct bgmac *bgmac, struct
> >> bgmac_dma_ring *ring, u16 queue_id) {
> > 
> > Also here the extra queue_id argument could be avoided..
> > 
> >>         struct device *dma_dev = bgmac->core->dma_dev;
> >> 
> >> +       struct netdev_queue *tx_queue =
> >> netdev_get_tx_queue(bgmac->net_dev,
> >> queue_id); int empty_slot;
> >> 
> >>         bool freed = false;
> >> 
> >> @@ -204,8 +215,8 @@ static void bgmac_dma_tx_free(struct bgmac *bgmac,
> >> struct bgmac_dma_ring *ring) freed = true;
> >> 
> >>         }
> >> 
> >> -       if (freed && netif_queue_stopped(bgmac->net_dev))
> >> -               netif_wake_queue(bgmac->net_dev);
> >> +       if (freed && netif_tx_queue_stopped(tx_queue))
> >> +               netif_tx_wake_queue(tx_queue);
> >> 
> >>  }
> >>  
> >>  static void bgmac_dma_rx_reset(struct bgmac *bgmac, struct
> >>  bgmac_dma_ring
> >> 
> >> *ring) @@ -1096,14 +1107,23 @@ static irqreturn_t bgmac_interrupt(int
> >> irq,
> >> void *dev_id)
> >> 
> >>  static int bgmac_poll(struct napi_struct *napi, int weight)
> >>  {
> >> 
> >> +       static const u32 tx_ring[BGMAC_MAX_TX_RINGS] = {
> >> +               BGMAC_IS_TX0, BGMAC_IS_TX1,
> >> +               BGMAC_IS_TX2, BGMAC_IS_TX3,
> >> +       };
> >> 
> >>         struct bgmac *bgmac = container_of(napi, struct bgmac, napi);
> >>         struct bgmac_dma_ring *ring;
> >> 
> >> -       int handled = 0;
> >> -
> >> -       if (bgmac->int_status & BGMAC_IS_TX0) {
> >> -               ring = &bgmac->tx_ring[0];
> >> -               bgmac_dma_tx_free(bgmac, ring);
> >> -               bgmac->int_status &= ~BGMAC_IS_TX0;
> >> +       int handled = 0, i;
> >> +
> >> +       if (bgmac->int_status & BGMAC_IS_TX_MASK) {
> >> +               for (i = 0 ; (bgmac->int_status & BGMAC_IS_TX_MASK) &&
> >> +                               i < BGMAC_MAX_TX_RINGS ; i++) {
> >> +                       if (bgmac->int_status & tx_ring[i]) {
> >> +                               ring = &bgmac->tx_ring[i];
> >> +                               bgmac_dma_tx_free(bgmac, ring, (u16)i);
> >> +                               bgmac->int_status &= ~tx_ring[i];
> >> +                       }
> >> +               }
> >> 
> >>         }
> > 
> > The if clause is redundant, as int_status is also checked in the condition
> > of the for-loop, .. could be left out.
> 
> I would remove the (bgmac->int_status & BGMAC_IS_TX_MASK) from the for
> loop. This is a hot path and if there is no tx ack we do not have to do
> the for loop at all.

I used it in the for loop aswel, because we may not need to loop i over all tx 
queues. Every time a tx_ring is selected, int_status is updated. When masking 
the int_status in the for loop becomes zero, the loop may have ended before i 
becomes >= BGMAC_MAX_TX_RINGS.. Maybe not worth the extra condition? just loop 
over all rings every time?
I could replace the for loop with four if conditions too (one for every mask 
in tx_ring array)...

> >>         if (bgmac->int_status & BGMAC_IS_RX) {
> >> 
> >> @@ -1172,10 +1192,11 @@ static netdev_tx_t bgmac_start_xmit(struct
> >> sk_buff
> >> *skb, {
> >> 
> >>         struct bgmac *bgmac = netdev_priv(net_dev);
> >>         struct bgmac_dma_ring *ring;
> >> 
> >> +       u16 queue_id = 0;
> >> 
> >> -       /* No QOS support yet */
> >> -       ring = &bgmac->tx_ring[0];
> >> -       return bgmac_dma_tx_add(bgmac, ring, skb);
> >> +       queue_id = skb_get_queue_mapping(skb);
> >> +       ring = &bgmac->tx_ring[queue_id];
> >> +       return bgmac_dma_tx_add(bgmac, ring, skb, queue_id);
> >> 
> >>  }
> >>  
> >>  static int bgmac_set_mac_address(struct net_device *net_dev, void *addr)
> >> 
> >> @@ -1217,6 +1238,32 @@ static int bgmac_ioctl(struct net_device *net_dev,
> >> struct ifreq *ifr, int cmd) }
> >> 
> >>  }
> >> 
> >> +static u16 bgmac_select_queue(struct net_device *net_dev, struct sk_buff
> >> *skb) +{
> >> +       u32 priority = skb->priority;
> >> +
> >> +       /* 802.1d priority to traffic class mapping for 4 queues */
> >> +       switch (priority) {
> >> +               case TC_PRIO_BESTEFFORT:            /* Best effort */
> >> +               case 3:                             /* Excellent effort
> >> */
> >> +                       /* Best effort queue TX_Q1 */
> >> +                       return 1;
> >> +               case TC_PRIO_INTERACTIVE_BULK:      /* Controlled load */
> >> +               case 5:                             /* Video */
> >> +                       /* Controlled Load queue TX_Q2 */
> >> +                       return 2;
> >> +               case TC_PRIO_INTERACTIVE:          /* Voice */
> >> +               case TC_PRIO_CONTROL:              /* Network control */
> >> +                       /* Interactive queue TX_Q3 */
> >> +                       return 3;
> >> +               case TC_PRIO_FILLER:
> >> +               case TC_PRIO_BULK:
> >> +               default:
> >> +                       /* Background queue TX_Q0 */
> >> +                       return 0;
> >> +       }
> >> +}
> 
> You should not use priority, no other ethernet driver does so, use
> skb_get_queue_mapping() and see select_ring_by_priority() in
> drivers/net/wireless/b43/dma.c

I do use the skb_get_queue_mapping (in bgmac_start_xmit), but the mapping is 
first manipulated by bgmac_select_queue. The b43 driver gets the mapping by 
the kernel (__skb_tx_hash), and will then select the ring according to this 
mapping. The problem is (as far as I understand) that __skb_tx_hash is random 
(by default). Haven taken a closer look, the behaviour can be influenced by 
configuring the kernel for certain tc values and setting mappings for tc 
priorities accordingly. I'll have to take a better look at

http://thread.gmane.org/gmane.linux.network/181887
and
http://comments.gmane.org/gmane.linux.network/201366

> >> +
> >> 
> >>  static const struct net_device_ops bgmac_netdev_ops = {
> >>  
> >>         .ndo_open               = bgmac_open,
> >>         .ndo_stop               = bgmac_stop,
> >> 
> >> @@ -1225,6 +1272,7 @@ static const struct net_device_ops bgmac_netdev_ops
> >> =
> >> { .ndo_set_mac_address    = bgmac_set_mac_address,
> >> 
> >>         .ndo_validate_addr      = eth_validate_addr,
> >>         .ndo_do_ioctl           = bgmac_ioctl,
> >> 
> >> +       .ndo_select_queue       = bgmac_select_queue,
> >> 
> >>  };
> >>  
> >>  /**************************************************
> >> 
> >> @@ -1330,7 +1378,7 @@ static int bgmac_probe(struct bcma_device *core)
> >> 
> >>         }
> >>         
> >>         /* Allocation and references */
> >> 
> >> -       net_dev = alloc_etherdev(sizeof(*bgmac));
> >> +       net_dev = alloc_etherdev_mq(sizeof(*bgmac), BGMAC_MAX_TX_RINGS);
> >> 
> >>         if (!net_dev)
> >>         
> >>                 return -ENOMEM;
> >>         
> >>         net_dev->netdev_ops = &bgmac_netdev_ops;
> >> 
> >> _______________________________________________
> >> openwrt-devel mailing list
> >> [email protected]
> >> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
> > 
> > We could keep some counters on the individual rings, to print statistics
> > later e.g. with ethtool.
> 
> This driver currently does not generate many statics, but this should be
> improved in general.
> 
> > We could have a qos boolean to en/disable qos, which can be checked in
> > bgmac_select_queue to e.g always return TX_Q0 when disabled, or one of
> > four
> > tx queues, as shown above.
> 
> when you use skb_get_queue_mapping() it should be configurable from user
> space.
> 
> Hauke

I do use that function ;-) I tried to set a filter with tc to filter traffic 
for a certain ip and assign a strict queue mapping on it, but it failed thus 
far. Still need to find out why (maybe no .ndo_setup_tc defined?)...

Thanks very much.
Regards,
Tijs
_______________________________________________
openwrt-devel mailing list
[email protected]
https://lists.openwrt.org/mailman/listinfo/openwrt-devel

Reply via email to