On Thursday 28 February 2013 15:02:26 Hauke Mehrtens wrote: > On 02/27/2013 02:04 PM, Tijs Van Buggenhout wrote: > > On Tuesday 26 February 2013 21:45:00 Tijs Van Buggenhout wrote: > >> This is a proposal to add QoS support to bgmac driver, allowing to use > >> (possibly) all of the tx queues supported by the hardware. > >> > >> The bgmac device is allocated as an ethernet multiqueue network device, > >> reporting the (maximum) tx queues supported by the hardware to the > >> kernel. > >> > >> The default queue selection mechenism is replaced with the > >> bgmac_select_queue function, mapping the priorities of the skbs to a > >> specific tx queue number. > > I would not replace the default algorithm.
I would agree on that. I first tried with the default algorithm, but it maps the priorities to the queues in a very different way (other than found in broadcom driver, as displayed below). I don't know if it is intentional or not (by broadcom).. does the hardware treat TX queues differently in priority, e.g. broadcom driver prioritises TX_Q3 over TX_Q2, over TX_Q1, over TX_Q0. When using the default algorithm, I saw normal packets (without priority explicitely set), like ICMP being mapped to TX_Q3, whereas ssh traffic was going to TX_Q2. As it seems, the default algorithm can map priorities differently every time the driver/kernel is initialised (random). Not really what we want. > >> Later this number is used in the bgmax_start_xmit function to select the > >> right tx queue, on which the new skb is added. > >> The hardware will generate an interrupt when the skb is handled, in which > >> it reports on what hardware queue (number/id) the event occured, > >> allowing to call bgmac_dma_tx_free on the right tx queue/ring. > >> > >> Mapping is according to 802.1d priority to traffic class mapping, as > >> found > > > >> in et_linux.c : > > Sorry, it resides in et/sys/etc.c, my mistake... > > > >> TX_Q1, /* 0 BE TC_BE Best Effort */ > >> TX_Q0, /* 1 BK TC_BK Background */ > >> TX_Q0, /* 2 -- TC_BK Background */ > >> TX_Q1, /* 3 EE TC_BE Best Effort */ > >> TX_Q2, /* 4 CL TC_CL Controlled Load */ > >> TX_Q2, /* 5 VI TC_CL Controlled Load */ > >> TX_Q3, /* 6 VO TC_VO Voice */ > >> TX_Q3 /* 7 NC TC_VO Voice */ > >> > >> Additionally any other value is mapped on TX_Q0.. > >> > >> Impact on the source code is rather limited, some extra local variables > >> for > >> netdev_queue, queue_id (u16), and all netif_.*_queue calls converted to > >> appropriate netdif_tx_.*_queue calls. > >> > >> I did not test extensively yet, but proof of concept works and is easily > >> tested by adding some debug code to the driver (in bgmac_dma_tx_add, > >> bgmac_poll and bgmac_select_queue functions) and using the ping command > >> on command line and the dropbear ssh (which uses a non-default TOS value > >> on packets: Minimize Delay). Console output shows the following: > >> > >> root@OpenWrt:/# ping 192.168.1.113 > >> PING 192.168.1.113 (192.168.1.113): 56 data bytes > >> [ 58.304000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0 > >> [ 58.316000] bgmac bcma0:1: Select tx queue 1 for packet > >> [ 58.320000] bgmac bcma0:1: Received interrupt for tx queue 1 > >> [ 58.328000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0 > >> [ 58.332000] bgmac bcma0:1: Select tx queue 1 for packet > >> [ 58.340000] bgmac bcma0:1: Received interrupt for tx queue 1 > >> 64 bytes from 192.168.1.113: seq=0 ttl=64 time=40.513 ms > >> [ 59.344000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0 > >> [ 59.352000] bgmac bcma0:1: Select tx queue 1 for packet > >> [ 59.356000] bgmac bcma0:1: Received interrupt for tx queue 1 > >> 64 bytes from 192.168.1.113: seq=1 ttl=64 time=18.595 ms > >> [ 60.364000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0 > >> [ 60.372000] bgmac bcma0:1: Select tx queue 1 for packet > >> [ 60.376000] bgmac bcma0:1: Received interrupt for tx queue 1 > >> 64 bytes from 192.168.1.113: seq=2 ttl=64 time=18.550 ms > >> ^C > >> --- 192.168.1.113 ping statistics --- > >> 3 packets transmitted, 3 packets received, 0% packet loss > >> root@OpenWrt:/# ssh [email protected] > >> [ 105.992000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0 > >> [ 106.000000] bgmac bcma0:1: Select tx queue 1 for packet > >> [ 106.004000] bgmac bcma0:1: Received interrupt for tx queue 1 > >> [ 106.012000] bgmac bcma0:1: bgmac_select_queue based on priority 0x0 > >> [ 106.016000] bgmac bcma0:1: Select tx queue 1 for packet > >> [ 106.024000] bgmac bcma0:1: Received interrupt for tx queue 1 > >> [ 106.032000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6 > >> [ 106.036000] bgmac bcma0:1: Select tx queue 3 for packet > >> [ 106.044000] bgmac bcma0:1: Received interrupt for tx queue 3 > >> [ 106.048000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6 > >> [ 106.056000] bgmac bcma0:1: Select tx queue 3 for packet > >> [ 106.060000] bgmac bcma0:1: Received interrupt for tx queue 3 > >> [ 106.072000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6 > >> ... > >> Host '192.168.1.113' is not in the trusted hosts file. > >> (fingerprint md5 b7:12:75:ca:65:cb:c7:60:59:35:6a:5d:28:6d:27:83) > >> Do you want to continue connecting? (y/n) > >> [ 106.400000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6 > >> [ 106.404000] bgmac bcma0:1: Select tx queue 3 for packet > >> [ 106.408000] bgmac bcma0:1: Received interrupt for tx queue 3 > >> y > >> [ 111.776000] bgmac bcma0:1: bgmac_select_queue based on priority 0x6 > >> ... > >> > >> This patch is generated from git trunk.. Any comments/suggestions/ideas > >> very welcome... > > This patch is whitespace damaged by your mail client. All the tabs are > replaced by spaces and so on this prevents this patch from applying. > Please read http://kerneltrap.org/Linux/Email_Clients_and_Patches or use > "git send-email". > > Could you also send this for linux mainline inclusion to the netdev > mailing list. I'll use inline attachment next time, sorry for that... > >> Signed-off-by: Tijs Van Buggenhout <[email protected]> > >> --- > >> > >> diff --git a/drivers/net/ethernet/broadcom/bgmac.c > >> b/drivers/net/ethernet/broadcom/bgmac.c index 3fd3288..58b1d09 100644 > >> --- a/drivers/net/ethernet/broadcom/bgmac.c > >> +++ b/drivers/net/ethernet/broadcom/bgmac.c > >> @@ -15,6 +15,8 @@ > >> > >> #include <linux/mii.h> > >> #include <linux/interrupt.h> > >> #include <linux/dma-mapping.h> > >> > >> +#include <linux/pkt_sched.h> /* TC defines */ > >> +#include <linux/skbuff.h> /* skb_get_queue_mapping */ > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > These comments are not needed. > > >> #include <asm/mach-bcm47xx/nvram.h> > >> > >> static const struct bcma_device_id bgmac_bcma_tbl[] = { > >> > >> @@ -102,15 +104,22 @@ static void bgmac_dma_tx_enable(struct bgmac > >> *bgmac, > >> > >> static netdev_tx_t bgmac_dma_tx_add(struct bgmac *bgmac, > >> > >> struct bgmac_dma_ring *ring, > >> > >> - struct sk_buff *skb) > >> + struct sk_buff *skb, > >> + u16 queue_id) > > > > We could determine queue_id from (ring - &bgmac->tx_ring[0]) ofcourse, > > avoiding the extra function argument... > > I like the current way more. > > >> { > >> > >> struct device *dma_dev = bgmac->core->dma_dev; > >> struct net_device *net_dev = bgmac->net_dev; > >> > >> + struct netdev_queue *tx_queue; > >> > >> struct bgmac_dma_desc *dma_desc; > >> struct bgmac_slot_info *slot; > >> u32 ctl0, ctl1; > >> int free_slots; > >> > >> + if ((tx_queue = netdev_get_tx_queue(net_dev, queue_id)) == NULL) > >> { > >> + bgmac_err(bgmac, "Error mapping skb to TX queue (id > >> %u)\n", > >> queue_id); + goto err_drop; > >> + } > >> + > >> > >> if (skb->len > BGMAC_DESC_CTL1_LEN) { > >> > >> bgmac_err(bgmac, "Too long skb (%d)\n", skb->len); > >> goto err_stop_drop; > >> > >> @@ -122,7 +131,7 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac > >> *bgmac, free_slots = ring->start - ring->end; > >> > >> if (free_slots == 1) { > >> > >> bgmac_err(bgmac, "TX ring is full, queue should be > >> > >> stopped!\n"); - netif_stop_queue(net_dev); > >> + netif_tx_stop_queue(tx_queue); > > This looks like an other fix. Not really .. all netif_stop_queue's in dma_tx_add or dma_tx_free need to be converted to netif_tx_stop_queue, to only address that specific queue being manipulated? > >> return NETDEV_TX_BUSY; > >> > >> } > >> > >> @@ -160,20 +169,22 @@ static netdev_tx_t bgmac_dma_tx_add(struct bgmac > >> *bgmac, > >> > >> /* Always keep one slot free to allow detecting bugged calls. */ > >> if (--free_slots == 1) > >> > >> - netif_stop_queue(net_dev); > >> + netif_tx_stop_queue(tx_queue); > >> > >> return NETDEV_TX_OK; > >> > >> err_stop_drop: > >> - netif_stop_queue(net_dev); > >> + netif_tx_stop_queue(tx_queue); > >> > >> +err_drop: > >> dev_kfree_skb(skb); > >> return NETDEV_TX_OK; > >> > >> } > >> > >> /* Free transmitted packets */ > >> > >> -static void bgmac_dma_tx_free(struct bgmac *bgmac, struct bgmac_dma_ring > >> *ring) +static void bgmac_dma_tx_free(struct bgmac *bgmac, struct > >> bgmac_dma_ring *ring, u16 queue_id) { > > > > Also here the extra queue_id argument could be avoided.. > > > >> struct device *dma_dev = bgmac->core->dma_dev; > >> > >> + struct netdev_queue *tx_queue = > >> netdev_get_tx_queue(bgmac->net_dev, > >> queue_id); int empty_slot; > >> > >> bool freed = false; > >> > >> @@ -204,8 +215,8 @@ static void bgmac_dma_tx_free(struct bgmac *bgmac, > >> struct bgmac_dma_ring *ring) freed = true; > >> > >> } > >> > >> - if (freed && netif_queue_stopped(bgmac->net_dev)) > >> - netif_wake_queue(bgmac->net_dev); > >> + if (freed && netif_tx_queue_stopped(tx_queue)) > >> + netif_tx_wake_queue(tx_queue); > >> > >> } > >> > >> static void bgmac_dma_rx_reset(struct bgmac *bgmac, struct > >> bgmac_dma_ring > >> > >> *ring) @@ -1096,14 +1107,23 @@ static irqreturn_t bgmac_interrupt(int > >> irq, > >> void *dev_id) > >> > >> static int bgmac_poll(struct napi_struct *napi, int weight) > >> { > >> > >> + static const u32 tx_ring[BGMAC_MAX_TX_RINGS] = { > >> + BGMAC_IS_TX0, BGMAC_IS_TX1, > >> + BGMAC_IS_TX2, BGMAC_IS_TX3, > >> + }; > >> > >> struct bgmac *bgmac = container_of(napi, struct bgmac, napi); > >> struct bgmac_dma_ring *ring; > >> > >> - int handled = 0; > >> - > >> - if (bgmac->int_status & BGMAC_IS_TX0) { > >> - ring = &bgmac->tx_ring[0]; > >> - bgmac_dma_tx_free(bgmac, ring); > >> - bgmac->int_status &= ~BGMAC_IS_TX0; > >> + int handled = 0, i; > >> + > >> + if (bgmac->int_status & BGMAC_IS_TX_MASK) { > >> + for (i = 0 ; (bgmac->int_status & BGMAC_IS_TX_MASK) && > >> + i < BGMAC_MAX_TX_RINGS ; i++) { > >> + if (bgmac->int_status & tx_ring[i]) { > >> + ring = &bgmac->tx_ring[i]; > >> + bgmac_dma_tx_free(bgmac, ring, (u16)i); > >> + bgmac->int_status &= ~tx_ring[i]; > >> + } > >> + } > >> > >> } > > > > The if clause is redundant, as int_status is also checked in the condition > > of the for-loop, .. could be left out. > > I would remove the (bgmac->int_status & BGMAC_IS_TX_MASK) from the for > loop. This is a hot path and if there is no tx ack we do not have to do > the for loop at all. I used it in the for loop aswel, because we may not need to loop i over all tx queues. Every time a tx_ring is selected, int_status is updated. When masking the int_status in the for loop becomes zero, the loop may have ended before i becomes >= BGMAC_MAX_TX_RINGS.. Maybe not worth the extra condition? just loop over all rings every time? I could replace the for loop with four if conditions too (one for every mask in tx_ring array)... > >> if (bgmac->int_status & BGMAC_IS_RX) { > >> > >> @@ -1172,10 +1192,11 @@ static netdev_tx_t bgmac_start_xmit(struct > >> sk_buff > >> *skb, { > >> > >> struct bgmac *bgmac = netdev_priv(net_dev); > >> struct bgmac_dma_ring *ring; > >> > >> + u16 queue_id = 0; > >> > >> - /* No QOS support yet */ > >> - ring = &bgmac->tx_ring[0]; > >> - return bgmac_dma_tx_add(bgmac, ring, skb); > >> + queue_id = skb_get_queue_mapping(skb); > >> + ring = &bgmac->tx_ring[queue_id]; > >> + return bgmac_dma_tx_add(bgmac, ring, skb, queue_id); > >> > >> } > >> > >> static int bgmac_set_mac_address(struct net_device *net_dev, void *addr) > >> > >> @@ -1217,6 +1238,32 @@ static int bgmac_ioctl(struct net_device *net_dev, > >> struct ifreq *ifr, int cmd) } > >> > >> } > >> > >> +static u16 bgmac_select_queue(struct net_device *net_dev, struct sk_buff > >> *skb) +{ > >> + u32 priority = skb->priority; > >> + > >> + /* 802.1d priority to traffic class mapping for 4 queues */ > >> + switch (priority) { > >> + case TC_PRIO_BESTEFFORT: /* Best effort */ > >> + case 3: /* Excellent effort > >> */ > >> + /* Best effort queue TX_Q1 */ > >> + return 1; > >> + case TC_PRIO_INTERACTIVE_BULK: /* Controlled load */ > >> + case 5: /* Video */ > >> + /* Controlled Load queue TX_Q2 */ > >> + return 2; > >> + case TC_PRIO_INTERACTIVE: /* Voice */ > >> + case TC_PRIO_CONTROL: /* Network control */ > >> + /* Interactive queue TX_Q3 */ > >> + return 3; > >> + case TC_PRIO_FILLER: > >> + case TC_PRIO_BULK: > >> + default: > >> + /* Background queue TX_Q0 */ > >> + return 0; > >> + } > >> +} > > You should not use priority, no other ethernet driver does so, use > skb_get_queue_mapping() and see select_ring_by_priority() in > drivers/net/wireless/b43/dma.c I do use the skb_get_queue_mapping (in bgmac_start_xmit), but the mapping is first manipulated by bgmac_select_queue. The b43 driver gets the mapping by the kernel (__skb_tx_hash), and will then select the ring according to this mapping. The problem is (as far as I understand) that __skb_tx_hash is random (by default). Haven taken a closer look, the behaviour can be influenced by configuring the kernel for certain tc values and setting mappings for tc priorities accordingly. I'll have to take a better look at http://thread.gmane.org/gmane.linux.network/181887 and http://comments.gmane.org/gmane.linux.network/201366 > >> + > >> > >> static const struct net_device_ops bgmac_netdev_ops = { > >> > >> .ndo_open = bgmac_open, > >> .ndo_stop = bgmac_stop, > >> > >> @@ -1225,6 +1272,7 @@ static const struct net_device_ops bgmac_netdev_ops > >> = > >> { .ndo_set_mac_address = bgmac_set_mac_address, > >> > >> .ndo_validate_addr = eth_validate_addr, > >> .ndo_do_ioctl = bgmac_ioctl, > >> > >> + .ndo_select_queue = bgmac_select_queue, > >> > >> }; > >> > >> /************************************************** > >> > >> @@ -1330,7 +1378,7 @@ static int bgmac_probe(struct bcma_device *core) > >> > >> } > >> > >> /* Allocation and references */ > >> > >> - net_dev = alloc_etherdev(sizeof(*bgmac)); > >> + net_dev = alloc_etherdev_mq(sizeof(*bgmac), BGMAC_MAX_TX_RINGS); > >> > >> if (!net_dev) > >> > >> return -ENOMEM; > >> > >> net_dev->netdev_ops = &bgmac_netdev_ops; > >> > >> _______________________________________________ > >> openwrt-devel mailing list > >> [email protected] > >> https://lists.openwrt.org/mailman/listinfo/openwrt-devel > > > > We could keep some counters on the individual rings, to print statistics > > later e.g. with ethtool. > > This driver currently does not generate many statics, but this should be > improved in general. > > > We could have a qos boolean to en/disable qos, which can be checked in > > bgmac_select_queue to e.g always return TX_Q0 when disabled, or one of > > four > > tx queues, as shown above. > > when you use skb_get_queue_mapping() it should be configurable from user > space. > > Hauke I do use that function ;-) I tried to set a filter with tc to filter traffic for a certain ip and assign a strict queue mapping on it, but it failed thus far. Still need to find out why (maybe no .ndo_setup_tc defined?)... Thanks very much. Regards, Tijs _______________________________________________ openwrt-devel mailing list [email protected] https://lists.openwrt.org/mailman/listinfo/openwrt-devel
