date:20160831

Re: [PATCH v3 0/5] net/usb: asix driver improvements

2016-08-31 Thread David Miller

From: robert.f...@collabora.com
Date: Mon, 29 Aug 2016 09:32:14 -0400

> This is a resubmission of v3, since the previous submission
> was not sent to the netdev mailing list.
> 
> This series improves power management of the asix driver.
 ...

Series applied, thanks.

Re: [PATCH net 0/3] qed*: dcbx fix series.

2016-08-31 Thread David Miller

From: Sudarsana Reddy Kalluru 
Date: Mon, 29 Aug 2016 08:29:51 -0400

> The series contains several small fixes for qed* dcbx module.

Series applied, thanks.

Re: [PATCH] mISDN: mark symbols static where possible

2016-08-31 Thread David Miller


Three different patches all with the same Subject line, so I can't
apply this stuff.

You must make the subject lines unique so that someone reading
the "git shortlog" can tell what is different in each change.

Re: [Patch net] kcm: fix a socket double free

2016-08-31 Thread David Miller

From: Cong Wang 
Date: Sun, 28 Aug 2016 21:28:26 -0700

> Dmitry reported a double free on kcm socket, which could
> be easily reproduced by:
> 
>   #include 
>   #include 
> 
>   int main()
>   {
> int fd = syscall(SYS_socket, 0x29ul, 0x5ul, 0x0ul, 0, 0, 0);
> syscall(SYS_ioctl, fd, 0x89e2ul, 0x20a98000ul, 0, 0, 0);
> return 0;
>   }
> 
> This is because on the error path, after we install
> the new socket file, we call sock_release() to clean
> up the socket, which leaves the fd pointing to a freed
> socket. Fix this by calling sys_close() on that fd
> directly.
> 
> Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
> Reported-by: Dmitry Vyukov 
> Cc: Tom Herbert 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks.

Re: [PATCH v2] ipv6: Use inbound ifaddr as source addresses for ICMPv6 errors

2016-08-31 Thread David Miller

From: Eli Cooper 
Date: Sun, 28 Aug 2016 11:34:06 +0800

> According to RFC 1885 2.2(c), the source address of ICMPv6
> errors in response to forwarded packets should be set to the
> unicast address of the forwarding interface in order to be helpful
> in diagnosis. Currently the selection of source address is based
> on the default route, without respect to the inbound interface.
> 
> This patch sets the source address of ICMPv6 error messages to
> the address of inbound interface, with the exception of
> 'time exceeded' and 'packet too big' messages sent in ip6_forward(),
> where the address of OUTPUT device is forced as source address
> (however, it is NOT enforced as claimed without this patch).
> 
> Signed-off-by: Eli Cooper 

Please resubmit with an updated commit message describing
the use case.

Re: [PATCH net v4 0/9] net: ethernet: mediatek: a couple of fixes

2016-08-31 Thread David Miller

From: 
Date: Thu, 1 Sep 2016 10:47:26 +0800

> A couple of fixes that came out of integrating with Linux 4.8-rc1;
> all of them are verified and work on Linux 4.8-rc1.

Series applied.

Re: [PATCH 4/5] r8152: constify ethtool_ops structures

2016-08-31 Thread David Miller

From: Julia Lawall 
Date: Thu,  1 Sep 2016 00:21:22 +0200

> Check for ethtool_ops structures that are only stored in the ethtool_ops
> field of a net_device structure or passed as the second argument to
> netdev_set_default_ethtool_ops.  These contexts are declared const, so
> ethtool_ops structures that have these properties can be declared as const
> also.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
 ...
> Suggested-by: Stephen Hemminger 
> 
> Signed-off-by: Julia Lawall 

Applied.

Re: [PATCH 5/5] net: axienet: constify ethtool_ops structures

2016-08-31 Thread David Miller

From: Julia Lawall 
Date: Thu,  1 Sep 2016 00:21:23 +0200

> Check for ethtool_ops structures that are only stored in the ethtool_ops
> field of a net_device structure or passed as the second argument to
> netdev_set_default_ethtool_ops.  These contexts are declared const, so
> ethtool_ops structures that have these properties can be declared as const
> also.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
 ...
> Suggested-by: Stephen Hemminger 
> 
> Signed-off-by: Julia Lawall 

Applied.

Re: [PATCH 1/5] net: mediatek: constify ethtool_ops structures

2016-08-31 Thread David Miller

From: Julia Lawall 
Date: Thu,  1 Sep 2016 00:21:19 +0200

> Check for ethtool_ops structures that are only stored in the ethtool_ops
> field of a net_device structure or passed as the second argument to
> netdev_set_default_ethtool_ops.  These contexts are declared const, so
> ethtool_ops structures that have these properties can be declared as const
> also.
 ...
> Suggested-by: Stephen Hemminger 
> 
> Signed-off-by: Julia Lawall 

Applied.

Re: [PATCH net-next 00/12] net: Convert vrf from dst to tx hook

2016-08-31 Thread David Miller

From: David Ahern 
Date: Wed, 31 Aug 2016 17:14:13 -0600

> please drop this series. BGP smoke tests triggered a couple of
> problems I need to resolve.

Ok.

Re: [RFC] xgbe: constify get_netdev_ops and get_ethtool_ops

2016-08-31 Thread Tom Lendacky

On 08/31/2016 04:17 PM, David Miller wrote:
> From: Stephen Hemminger 
> Date: Wed, 31 Aug 2016 08:57:36 -0700
> 
>> Casting away const is bad practice. Since this is an ARM-specific driver,
>> I don't have hardware to actually test this.
>>
>> Having getter functions for ops is really unnecessary code bloat, but
>> not going to touch that.
>>
>> Signed-off-by: Stephen Hemminger 
> 
> I'll just apply this, let's see what happens.

I should be able to test this in the next few days. I don't expect there
to be an issue. I'll let you know what I find.

Thanks,
Tom

>

[PATCH net v4 5/9] net: ethernet: mediatek: fix logic unbalance between probe and remove

2016-08-31 Thread sean.wang

From: Sean Wang 

The original mtk_mdio_cleanup() call is not in the place symmetric to
mtk_mdio_init(), so move it from mtk_uninit() to mtk_remove().

Signed-off-by: Sean Wang 
Acked-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 1ffde91..bf5b7e1 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1511,7 +1511,6 @@ static void mtk_uninit(struct net_device *dev)
struct mtk_eth *eth = mac->hw;
 
phy_disconnect(mac->phy_dev);
-   mtk_mdio_cleanup(eth);
mtk_irq_disable(eth, ~0);
 }
 
@@ -1916,6 +1915,7 @@ static int mtk_remove(struct platform_device *pdev)
netif_napi_del(&eth->tx_napi);
netif_napi_del(&eth->rx_napi);
mtk_cleanup(eth);
+   mtk_mdio_cleanup(eth);
platform_set_drvdata(pdev, NULL);
 
return 0;
-- 
1.9.1

[PATCH net v4 9/9] net: ethernet: mediatek: fix error handling inside mtk_mdio_init

2016-08-31 Thread sean.wang

From: Sean Wang 

Return -ENODEV if the MDIO bus is disabled in the device tree.

Signed-off-by: Sean Wang 
Acked-by: John Crispin 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 0367f51..d919915 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -304,7 +304,7 @@ static int mtk_mdio_init(struct mtk_eth *eth)
}
 
if (!of_device_is_available(mii_np)) {
-   ret = 0;
+   ret = -ENODEV;
goto err_put_node;
}
 
-- 
1.9.1

[PATCH net v4 3/9] net: ethernet: mediatek: fix API usage with skb_free_frag

2016-08-31 Thread sean.wang

From: Sean Wang 

use skb_free_frag() instead of legacy put_page()

Signed-off-by: Sean Wang 
Acked-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index a5dcf57..c9e25a7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -870,7 +870,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
/* receive data */
skb = build_skb(data, ring->frag_size);
if (unlikely(!skb)) {
-   put_page(virt_to_head_page(new_data));
+   skb_free_frag(new_data);
netdev->stats.rx_dropped++;
goto release_desc;
}
-- 
1.9.1

[PATCH net v4 1/9] net: ethernet: mediatek: fix fails from TX housekeeping due to incorrect port setup

2016-08-31 Thread sean.wang

From: Sean Wang 

The net device for which an SKB is completed is determined by the forward
port in txd4 of the corresponding TX descriptor. That field is not set on
the descriptors used for SKB fragments, which leads to a watchdog timeout
from the upper layer. Set the forward port on those descriptors as well.

Signed-off-by: Sean Wang 
Acked-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index f160954..7fc2ff0 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -588,14 +588,15 @@ static int mtk_tx_map(struct sk_buff *skb, struct 
net_device *dev,
dma_addr_t mapped_addr;
unsigned int nr_frags;
int i, n_desc = 1;
-   u32 txd4 = 0;
+   u32 txd4 = 0, fport;
 
itxd = ring->next_free;
if (itxd == ring->last_free)
return -ENOMEM;
 
/* set the forward port */
-   txd4 |= (mac->id + 1) << TX_DMA_FPORT_SHIFT;
+   fport = (mac->id + 1) << TX_DMA_FPORT_SHIFT;
+   txd4 |= fport;
 
tx_buf = mtk_desc_to_tx_buf(ring, itxd);
memset(tx_buf, 0, sizeof(*tx_buf));
@@ -653,7 +654,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct 
net_device *dev,
WRITE_ONCE(txd->txd3, (TX_DMA_SWC |
   TX_DMA_PLEN0(frag_map_size) |
   last_frag * TX_DMA_LS0));
-   WRITE_ONCE(txd->txd4, 0);
+   WRITE_ONCE(txd->txd4, fport);
 
tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
tx_buf = mtk_desc_to_tx_buf(ring, txd);
-- 
1.9.1
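
As a rough user-space illustration of the fix above: every descriptor
that belongs to a packet carries the forward port, so the completion
path can map any of them back to a MAC. All sketch_* names, the shift
and the mask are assumptions made for this sketch and are not taken
from mtk_eth_soc.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field layout, chosen only for this sketch. */
#define SKETCH_FPORT_SHIFT 25
#define SKETCH_FPORT_MASK  0x7u

struct sketch_txd {
	uint32_t txd4;
};

/* Fill n descriptors for one packet: the head first, then the fragments.
 * Each descriptor gets the same forward port (MAC index + 1).
 */
static void sketch_tx_map(struct sketch_txd *ring, size_t n,
			  unsigned int mac_id)
{
	uint32_t fport = (uint32_t)(mac_id + 1) << SKETCH_FPORT_SHIFT;
	size_t i;

	for (i = 0; i < n; i++)
		ring[i].txd4 = fport;
}

/* Recover the MAC index from a descriptor; -1 means "no port set". */
static int sketch_txd_to_mac(const struct sketch_txd *txd)
{
	return (int)((txd->txd4 >> SKETCH_FPORT_SHIFT) & SKETCH_FPORT_MASK) - 1;
}
```

A descriptor whose txd4 is left at 0 decodes to -1 here, which models
the situation the fragment descriptors were in before the change.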

[PATCH net v4 6/9] net: ethernet: mediatek: fix issue of driver removal with interface is up

2016-08-31 Thread sean.wang

From: Sean Wang 

On module removal, mtk_stop() must be called first to free the DMA
resources acquired by mtk_open() and to restore the state it changed.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index bf5b7e1..556951e 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1906,6 +1906,14 @@ err_free_dev:
 static int mtk_remove(struct platform_device *pdev)
 {
struct mtk_eth *eth = platform_get_drvdata(pdev);
+   int i;
+
+   /* stop all devices to make sure that dma is properly shut down */
+   for (i = 0; i < MTK_MAC_COUNT; i++) {
+   if (!eth->netdev[i])
+   continue;
+   mtk_stop(eth->netdev[i]);
+   }
 
clk_disable_unprepare(eth->clks[MTK_CLK_ETHIF]);
clk_disable_unprepare(eth->clks[MTK_CLK_ESW]);
-- 
1.9.1

[PATCH net v4 7/9] net: ethernet: mediatek: fix the missing of_node_put() after node is used done inside mtk_mdio_init

2016-08-31 Thread sean.wang

From: Sean Wang 

This patch adds the missing of_node_put() after finishing the usage
of of_get_child_by_name.

Signed-off-by: Sean Wang 
Acked-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 556951e..409efcf 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -324,6 +324,7 @@ static int mtk_mdio_init(struct mtk_eth *eth)
err = of_mdiobus_register(eth->mii_bus, mii_np);
if (err)
goto err_free_bus;
+   of_node_put(mii_np);
 
return 0;
 
-- 
1.9.1

[PATCH net v4 8/9] net: ethernet: mediatek: use devm_mdiobus_alloc instead of mdiobus_alloc inside mtk_mdio_init

2016-08-31 Thread sean.wang

From: Sean Wang 

Many parts of the driver already use devm_* APIs to benefit from
device resource management, so use devm_mdiobus_alloc instead of
mdiobus_alloc for a cleaner code flow.

Using the common code provided by devm_* helps to
1) simplify the code flow, as [1] says
2) decrease the risk of incorrect manual error handling
3) promote the API, which only a few drivers have used since it was
introduced in Linux 3.16.

Ref:
[1] https://patchwork.ozlabs.org/patch/344093/

Signed-off-by: Sean Wang 
Reviewed-by: Andrew Lunn 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 409efcf..0367f51 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -295,7 +295,7 @@ err_phy:
 static int mtk_mdio_init(struct mtk_eth *eth)
 {
struct device_node *mii_np;
-   int err;
+   int ret;
 
mii_np = of_get_child_by_name(eth->dev->of_node, "mdio-bus");
if (!mii_np) {
@@ -304,13 +304,13 @@ static int mtk_mdio_init(struct mtk_eth *eth)
}
 
if (!of_device_is_available(mii_np)) {
-   err = 0;
+   ret = 0;
goto err_put_node;
}
 
-   eth->mii_bus = mdiobus_alloc();
+   eth->mii_bus = devm_mdiobus_alloc(eth->dev);
if (!eth->mii_bus) {
-   err = -ENOMEM;
+   ret = -ENOMEM;
goto err_put_node;
}
 
@@ -321,20 +321,11 @@ static int mtk_mdio_init(struct mtk_eth *eth)
eth->mii_bus->parent = eth->dev;
 
snprintf(eth->mii_bus->id, MII_BUS_ID_SIZE, "%s", mii_np->name);
-   err = of_mdiobus_register(eth->mii_bus, mii_np);
-   if (err)
-   goto err_free_bus;
-   of_node_put(mii_np);
-
-   return 0;
-
-err_free_bus:
-   mdiobus_free(eth->mii_bus);
+   ret = of_mdiobus_register(eth->mii_bus, mii_np);
 
 err_put_node:
of_node_put(mii_np);
-   eth->mii_bus = NULL;
-   return err;
+   return ret;
 }
 
 static void mtk_mdio_cleanup(struct mtk_eth *eth)
@@ -343,8 +334,6 @@ static void mtk_mdio_cleanup(struct mtk_eth *eth)
return;
 
mdiobus_unregister(eth->mii_bus);
-   of_node_put(eth->mii_bus->dev.of_node);
-   mdiobus_free(eth->mii_bus);
 }
 
 static inline void mtk_irq_disable(struct mtk_eth *eth, u32 mask)
-- 
1.9.1
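
The idea behind the devm_* helpers can be modelled in user space
roughly as follows: each managed allocation is recorded on the device
and released automatically, in reverse order, when the device goes
away. This is only a sketch under that assumption; the sketch_* names
are hypothetical and the kernel's devres implementation is more
involved.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_RES 8

/* A device that remembers what has to be released on detach. */
struct sketch_dev {
	void (*release[SKETCH_MAX_RES])(void *);
	void *data[SKETCH_MAX_RES];
	size_t count;
};

/* Allocate zeroed memory whose lifetime is tied to the device.
 * Returns NULL if the table is full or the allocation fails.
 */
static void *sketch_devm_alloc(struct sketch_dev *dev, size_t size)
{
	void *p;

	if (dev->count == SKETCH_MAX_RES)
		return NULL;
	p = calloc(1, size);
	if (!p)
		return NULL;
	dev->release[dev->count] = free;
	dev->data[dev->count] = p;
	dev->count++;
	return p;
}

/* On detach, release everything in reverse order of acquisition and
 * return how many resources were released.
 */
static size_t sketch_dev_detach(struct sketch_dev *dev)
{
	size_t released = 0;

	while (dev->count > 0) {
		dev->count--;
		dev->release[dev->count](dev->data[dev->count]);
		released++;
	}
	return released;
}
```

With this shape the caller has no per-resource free call on its error
paths, which is the simplification the commit message refers to.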

[PATCH net v4 4/9] net: ethernet: mediatek: remove redundant free_irq for devm_request_irq allocated irq

2016-08-31 Thread sean.wang

From: Sean Wang 

These IRQs are not shared and are disabled when the Ethernet device
stops, so IRQs requested by devm_request_irq() can safely be freed
automatically on driver detach.

Signed-off-by: Sean Wang 
Acked-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index c9e25a7..1ffde91 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1513,8 +1513,6 @@ static void mtk_uninit(struct net_device *dev)
phy_disconnect(mac->phy_dev);
mtk_mdio_cleanup(eth);
mtk_irq_disable(eth, ~0);
-   free_irq(eth->irq[1], dev);
-   free_irq(eth->irq[2], dev);
 }
 
 static int mtk_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
-- 
1.9.1

[PATCH net v4 2/9] net: ethernet: mediatek: fix incorrect return value of devm_clk_get with EPROBE_DEFER

2016-08-31 Thread sean.wang

From: Sean Wang 

1) If devm_clk_get returns -EPROBE_DEFER, we should defer probing the
driver. The change is verified and works on 4.8-rc1 together with the
latest clk-next code for MT7623.
2) Use a loop to check that all required clocks are available.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 39 -
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 22 ++--
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 7fc2ff0..a5dcf57 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -50,6 +50,10 @@ static const struct mtk_ethtool_stats {
MTK_ETHTOOL_STAT(rx_flow_control_packets),
 };
 
+static const char * const mtk_clks_source_name[] = {
+   "ethif", "esw", "gp1", "gp2"
+};
+
 void mtk_w32(struct mtk_eth *eth, u32 val, unsigned reg)
 {
__raw_writel(val, eth->base + reg);
@@ -1814,6 +1818,7 @@ static int mtk_probe(struct platform_device *pdev)
if (!eth)
return -ENOMEM;
 
+   eth->dev = &pdev->dev;
eth->base = devm_ioremap_resource(&pdev->dev, res);
if (IS_ERR(eth->base))
return PTR_ERR(eth->base);
@@ -1848,21 +1853,21 @@ static int mtk_probe(struct platform_device *pdev)
return -ENXIO;
}
}
+   for (i = 0; i < ARRAY_SIZE(eth->clks); i++) {
+   eth->clks[i] = devm_clk_get(eth->dev,
+   mtk_clks_source_name[i]);
+   if (IS_ERR(eth->clks[i])) {
+   if (PTR_ERR(eth->clks[i]) == -EPROBE_DEFER)
+   return -EPROBE_DEFER;
+   return -ENODEV;
+   }
+   }
 
-   eth->clk_ethif = devm_clk_get(&pdev->dev, "ethif");
-   eth->clk_esw = devm_clk_get(&pdev->dev, "esw");
-   eth->clk_gp1 = devm_clk_get(&pdev->dev, "gp1");
-   eth->clk_gp2 = devm_clk_get(&pdev->dev, "gp2");
-   if (IS_ERR(eth->clk_esw) || IS_ERR(eth->clk_gp1) ||
-   IS_ERR(eth->clk_gp2) || IS_ERR(eth->clk_ethif))
-   return -ENODEV;
-
-   clk_prepare_enable(eth->clk_ethif);
-   clk_prepare_enable(eth->clk_esw);
-   clk_prepare_enable(eth->clk_gp1);
-   clk_prepare_enable(eth->clk_gp2);
+   clk_prepare_enable(eth->clks[MTK_CLK_ETHIF]);
+   clk_prepare_enable(eth->clks[MTK_CLK_ESW]);
+   clk_prepare_enable(eth->clks[MTK_CLK_GP1]);
+   clk_prepare_enable(eth->clks[MTK_CLK_GP2]);
 
-   eth->dev = &pdev->dev;
eth->msg_enable = netif_msg_init(mtk_msg_level, MTK_DEFAULT_MSG_ENABLE);
INIT_WORK(&eth->pending_work, mtk_pending_work);
 
@@ -1905,10 +1910,10 @@ static int mtk_remove(struct platform_device *pdev)
 {
struct mtk_eth *eth = platform_get_drvdata(pdev);
 
-   clk_disable_unprepare(eth->clk_ethif);
-   clk_disable_unprepare(eth->clk_esw);
-   clk_disable_unprepare(eth->clk_gp1);
-   clk_disable_unprepare(eth->clk_gp2);
+   clk_disable_unprepare(eth->clks[MTK_CLK_ETHIF]);
+   clk_disable_unprepare(eth->clks[MTK_CLK_ESW]);
+   clk_disable_unprepare(eth->clks[MTK_CLK_GP1]);
+   clk_disable_unprepare(eth->clks[MTK_CLK_GP2]);
 
netif_napi_del(&eth->tx_napi);
netif_napi_del(&eth->rx_napi);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index f82e3ac..6e1ade7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -290,6 +290,17 @@ enum mtk_tx_flags {
MTK_TX_FLAGS_PAGE0  = 0x02,
 };
 
+/* This enum allows us to identify how the clock is defined on the array of the
+ * clock in the order
+ */
+enum mtk_clks_map {
+   MTK_CLK_ETHIF,
+   MTK_CLK_ESW,
+   MTK_CLK_GP1,
+   MTK_CLK_GP2,
+   MTK_CLK_MAX
+};
+
 /* struct mtk_tx_buf - This struct holds the pointers to the memory pointed at
  * by the TX descriptors
  * @skb:   The SKB pointer of the packet being sent
@@ -370,10 +381,7 @@ struct mtk_rx_ring {
  * @scratch_ring:  Newer SoCs need memory for a second HW managed TX ring
  * @phy_scratch_ring:  physical address of scratch_ring
  * @scratch_head:  The scratch memory that scratch_ring points to.
- * @clk_ethif: The ethif clock
- * @clk_esw:   The switch clock
- * @clk_gp1:   The gmac1 clock
- * @clk_gp2:   The gmac2 clock
+ * @clks:  clock array for all clocks required
  * @mii_bus:   If there is a bus we need to create an instance for it
  * @pending_work:  The workqueue used to reset the dma ring
  */
@@ -400,10 +408,8 @@ struct mtk_eth {
struct mtk_tx_dma   *scratch_ring;
dma_addr_t  phy_scratch_ring;

[PATCH net v4 0/9] net: ethernet: mediatek: a couple of fixes

2016-08-31 Thread sean.wang

From: Sean Wang 

A couple of fixes that came out of integrating with Linux 4.8-rc1;
all of them are verified and work on Linux 4.8-rc1.

Changes since v1:
- use a loop to check that all required clocks are ready instead
of tedious open coding
- remove redundant pinctrl setup that is already done by the core driver;
thanks to Andrew Lunn for careful and patient reviewing
- split distinct changes into separate patches
- rename the variable err to ret for readability

Changes since v2:
- restore the original clock disabling sequence that was changed
accidentally in the last version
- refine the commit log that could cause misunderstanding about what
the changes do
- refine the commit log whose footnote was lost due to improper
delimiter use

Changes since v3:
- fix git rejects caused by mixing in a change from net-next; the
patch set is now rebased on the current net branch.

Sean Wang (9):
  net: ethernet: mediatek: fix fails from TX housekeeping due to
incorrect port setup
  net: ethernet: mediatek: fix incorrect return value of devm_clk_get
with EPROBE_DEFER
  net: ethernet: mediatek: fix API usage with skb_free_frag
  net: ethernet: mediatek: remove redundant free_irq for
devm_request_irq allocated irq
  net: ethernet: mediatek: fix logic unbalance between probe and remove
  net: ethernet: mediatek: fix issue of driver removal with interface is
up
  net: ethernet: mediatek: fix the missing of_node_put() after node is
used done inside mtk_mdio_init
  net: ethernet: mediatek: use devm_mdiobus_alloc instead of
mdiobus_alloc inside mtk_mdio_init
  net: ethernet: mediatek: fix error handling inside mtk_mdio_init

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 82 +++--
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 22 +---
 2 files changed, 56 insertions(+), 48 deletions(-)

-- 
1.9.1
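
The clock lookup loop from patch 2/9 can be sketched in user space as
below. The clock names come from the patch; the callback type and the
sketch_* helpers are hypothetical stand-ins for devm_clk_get(), and
EPROBE_DEFER is defined locally because it is a kernel-internal errno
(517) that user-space headers do not provide.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal: "try probing again later" */
#endif

static const char * const sketch_clk_names[] = {
	"ethif", "esw", "gp1", "gp2"
};
#define SKETCH_CLK_COUNT \
	(sizeof(sketch_clk_names) / sizeof(sketch_clk_names[0]))

/* Hypothetical lookup: returns 0 on success or a negative errno. */
typedef int (*sketch_clk_get_fn)(const char *name);

/* Look up every clock; pass -EPROBE_DEFER through unchanged and map
 * any other failure to -ENODEV.
 */
static int sketch_get_all_clks(sketch_clk_get_fn get)
{
	size_t i;

	for (i = 0; i < SKETCH_CLK_COUNT; i++) {
		int err = get(sketch_clk_names[i]);

		if (err == -EPROBE_DEFER)
			return -EPROBE_DEFER;
		if (err)
			return -ENODEV;
	}
	return 0;
}

/* Sample providers used to exercise the loop. */
static int sketch_clk_ok(const char *name)
{
	(void)name;
	return 0;
}

static int sketch_clk_gp2_deferred(const char *name)
{
	return strcmp(name, "gp2") ? 0 : -EPROBE_DEFER;
}

static int sketch_clk_esw_missing(const char *name)
{
	return strcmp(name, "esw") ? 0 : -ENOENT;
}
```

Keeping the deferral code distinct from -ENODEV matters because the
driver core only retries a probe that returned -EPROBE_DEFER.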

Re: [PATCH] ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit()

2016-08-31 Thread Eli Cooper

Hello,

On 2016/9/1 4:56, David Miller wrote:
> From: Eli Cooper 
> Date: Fri, 26 Aug 2016 23:52:29 +0800
>
>> @@ -1174,6 +1174,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device 
>> *dev)
>>  encap_limit = t->parms.encap_limit;
>>  
>>  memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
>> +fl6.flowi6_proto = IPPROTO_IPIP;
> Let's just simply have t->fl have the proto setup properly, just like
> in GRE.
>
> Assigning it explicitly every packet transmit doesn't make much sense.

I doubt that. Unlike GRE, where the proto must be IPPROTO_GRE, the proto
here can be either IPPROTO_IPV6 or IPPROTO_IPIP for a single tunnel, and
t->fl is shared by them. Thus it has to be assigned for every packet.

Thanks,
Eli
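
The point about the shared t->fl can be sketched in user space like
this: the tunnel keeps one flow template, and each transmit copies it
and sets the protocol according to the inner packet. The structures
and the sketch_* names are hypothetical simplifications, not the
kernel's flowi6 or ip6_tnl definitions.

```c
#include <netinet/in.h>

/* Hypothetical, heavily reduced stand-ins for flowi6 and ip6_tnl. */
struct sketch_flow {
	int proto;
};

struct sketch_tunnel {
	struct sketch_flow fl;	/* template shared by IPv4 and IPv6 payloads */
};

/* Build the per-packet flow: copy the shared template, then pick the
 * protocol from the inner packet type. The template is left untouched.
 */
static struct sketch_flow sketch_xmit_flow(const struct sketch_tunnel *t,
					   int inner_is_ipv4)
{
	struct sketch_flow fl6 = t->fl;

	fl6.proto = inner_is_ipv4 ? IPPROTO_IPIP : IPPROTO_IPV6;
	return fl6;
}
```

Because one tunnel carries both payload types, no single value stored
in the template would be right for every packet.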

Re: [PATCH net-next 0/4] xps_flows: XPS flow steering when there is no socket

2016-08-31 Thread Eric Dumazet

On Wed, 2016-08-31 at 17:10 -0700, Tom Herbert wrote:

> Tested:
>   Manually forced all packets to go through the xps_flows path.
>   Observed that some flows were deferred to change queues because
>   packets were in flight with the flow bucket.

I did not realize you were ready to submit this new infra !

Please add performance tests and documentation.
( Documentation/networking/scaling.txt should be a nice place ) 

Unconnected UDP packets are candidates for this selection,
even locally generated ones, while the applications may be pinning their
thread(s) to cpu(s).
TX completion will then happen on multiple cpus.

Not sure about af_packet and/or pktgen ?

- The new hash table is vmalloc()ed on a single NUMA node. (in
comparison RFS table (per rx queue) can be properly accessed by a single
cpu servicing queue interrupts)

- Each packet will likely get an additional cache miss in a DDOS
forwarding workload.

Thanks.

Re: [PATCH v2 net-next 0/6] perf, bpf: add support for bpf in sw/hw perf_events

2016-08-31 Thread Brendan Gregg

On Wed, Aug 31, 2016 at 2:50 PM, Alexei Starovoitov  wrote:
> Hi Peter, Dave,
>
> this patch set is a follow up to the discussion:
> https://lkml.kernel.org/r/20160804142853.GO6862@twins.programming.kicks-ass.net
> It turned out to be simpler than what we discussed.
>
> Patches 1-3 are bpf-side prep for the main patch 4
> that adds bpf program as an overflow_handler to sw and hw perf_events.
> Peter, please review.
>
> Patches 5 and 6 are examples from myself and Brendan.
>
> v1-v2: fixed issues spotted by Peter and Daniel.

Thanks Alexei!

Tested-by: Brendan Gregg 

Brendan

[PATCH net-next 3/4] net: Add xps_dev_flow_table_cnt

2016-08-31 Thread Tom Herbert

Add infrastructure and definitions to create XPS flow tables. This
creates the new sys entry /sys/class/net/eth*/xps_dev_flow_table_cnt

Signed-off-by: Tom Herbert 
---
 include/linux/netdevice.h | 24 +
 net/core/net-sysfs.c  | 89 +++
 2 files changed, 113 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d1d748..0164c47 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -736,6 +736,27 @@ struct xps_dev_maps {
 (nr_cpu_ids * sizeof(struct xps_map *)))
 #endif /* CONFIG_XPS */
 
+#ifdef CONFIG_XPS_FLOWS
+struct xps_dev_flow {
+   union {
+   u64 v64;
+   struct {
+   int queue_index;
+   unsigned intqueue_ptr;
+   };
+   };
+};
+
+struct xps_dev_flow_table {
+   unsigned int mask;
+   struct rcu_head rcu;
+   struct xps_dev_flow flows[0];
+};
+#define XPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct xps_dev_flow_table) + \
+   ((_num) * sizeof(struct xps_dev_flow)))
+
+#endif /* CONFIG_XPS_FLOWS */
+
 #define TC_MAX_QUEUE   16
 #define TC_BITMASK 15
 /* HW offloaded queuing disciplines txq count and offset maps */
@@ -1809,6 +1830,9 @@ struct net_device {
 #ifdef CONFIG_XPS
struct xps_dev_maps __rcu *xps_maps;
 #endif
+#ifdef CONFIG_XPS_FLOWS
+   struct xps_dev_flow_table __rcu *xps_flow_table;
+#endif
 #ifdef CONFIG_NET_CLS_ACT
struct tcf_proto __rcu  *egress_cl_list;
 #endif
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index ab7b0b6..0d00b9c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -503,6 +503,92 @@ static ssize_t phys_switch_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_switch_id);
 
+#ifdef CONFIG_XPS_FLOWS
+static void xps_dev_flow_table_release(struct rcu_head *rcu)
+{
+   struct xps_dev_flow_table *table = container_of(rcu,
+   struct xps_dev_flow_table, rcu);
+   vfree(table);
+}
+
+static int change_xps_dev_flow_table_cnt(struct net_device *dev,
+unsigned long count)
+{
+   unsigned long mask;
+   struct xps_dev_flow_table *table, *old_table;
+   static DEFINE_SPINLOCK(xps_dev_flow_lock);
+
+   if (!capable(CAP_NET_ADMIN))
+   return -EPERM;
+
+   if (count) {
+   mask = count - 1;
+   /* mask = roundup_pow_of_two(count) - 1;
+* without overflows...
+*/
+   while ((mask | (mask >> 1)) != mask)
+   mask |= (mask >> 1);
+   /* On 64 bit arches, must check mask fits in table->mask (u32),
+* and on 32bit arches, must check
+* XPS_DEV_FLOW_TABLE_SIZE(mask + 1) doesn't overflow.
+*/
+#if BITS_PER_LONG > 32
+   if (mask > (unsigned long)(u32)mask)
+   return -EINVAL;
+#else
+   if (mask > (ULONG_MAX - XPS_DEV_FLOW_TABLE_SIZE(1))
+   / sizeof(struct xps_dev_flow)) {
+   /* Enforce a limit to prevent overflow */
+   return -EINVAL;
+   }
+#endif
+   table = vmalloc(XPS_DEV_FLOW_TABLE_SIZE(mask + 1));
+   if (!table)
+   return -ENOMEM;
+
+   table->mask = mask;
+   for (count = 0; count <= mask; count++)
+   table->flows[count].queue_index = -1;
+   } else
+   table = NULL;
+
+   spin_lock(&xps_dev_flow_lock);
+   old_table = rcu_dereference_protected(dev->xps_flow_table,
+                 lockdep_is_held(&xps_dev_flow_lock));
+   rcu_assign_pointer(dev->xps_flow_table, table);
+   spin_unlock(&xps_dev_flow_lock);
+
+   if (old_table)
+   call_rcu(&old_table->rcu, xps_dev_flow_table_release);
+
+   return 0;
+}
+
+static ssize_t xps_dev_flow_table_cnt_store(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf, size_t len)
+{
+   return netdev_store(dev, attr, buf, len, change_xps_dev_flow_table_cnt);
+}
+
+static ssize_t xps_dev_flow_table_cnt_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct net_device *netdev = to_net_dev(dev);
+   struct xps_dev_flow_table *table;
+   unsigned int cnt = 0;
+
+   rcu_read_lock();
+   table = rcu_dereference(netdev->xps_flow_table);
+   if (table)
+   cnt = table->mask + 1;
+   rcu_read_unlock();
+
+   return sprintf(buf, fmt_dec, cnt);
+}
+DEVICE_ATTR_RW(xps_dev_flow_table_cnt);
+#endif /* CONFIG_XPS_FLOWS */
+
 static struct attribute *net_class_attrs[] = {
&dev_attr_netdev_group.attr,
&dev_attr_type.attr,
@@ -531,6 +617,9
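
The mask computation in change_xps_dev_flow_table_cnt() can be tried
out in isolation with a small user-space sketch. The loop is the one
from the patch; the function name sketch_flow_mask is hypothetical.

```c
#include <limits.h>

/* For count >= 1, return roundup_pow_of_two(count) - 1 without ever
 * computing the power of two itself, so nothing overflows: smear the
 * highest set bit of (count - 1) into all lower bit positions.
 */
static unsigned long sketch_flow_mask(unsigned long count)
{
	unsigned long mask = count - 1;

	while ((mask | (mask >> 1)) != mask)
		mask |= (mask >> 1);
	return mask;
}
```

For a count of 5 the mask becomes 7, giving a table of 8 entries. For
a count of ULONG_MAX the mask is ULONG_MAX, and mask + 1 would wrap
to 0, which is why the patch range-checks the mask before sizing the
allocation.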

[PATCH net-next 2/4] dql: Add counters for number of queuing and completion operations

2016-08-31 Thread Tom Herbert

Add two new counters to struct dql that are num_enqueue_ops and
num_completed_ops. num_enqueue_ops is incremented by one in each call to
dql_queued. num_completed_ops is incremented in dql_completed, which takes
an argument indicating the number of operations completed. These counters
are only intended for statistics and do not impact the BQL algorithm.

We add a new sysfs entry in byte_queue_limits named inflight_pkts.
This provides the number of packets in flight for the queue by
dql->num_enqueue_ops - dql->num_completed_ops.

Signed-off-by: Tom Herbert 
---
 include/linux/dynamic_queue_limits.h |  7 ++-
 include/linux/netdevice.h|  2 +-
 lib/dynamic_queue_limits.c   |  3 ++-
 net/core/net-sysfs.c | 14 ++
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/dynamic_queue_limits.h 
b/include/linux/dynamic_queue_limits.h
index a4be703..b6a4804 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -43,6 +43,8 @@ struct dql {
unsigned intadj_limit;  /* limit + num_completed */
unsigned intlast_obj_cnt;   /* Count at last queuing */
 
+   unsigned intnum_enqueue_ops;/* Number of queue operations */
+
/* Fields accessed only by completion path (dql_completed) */
 
unsigned intlimit cacheline_aligned_in_smp; /* Current limit */
@@ -55,6 +57,8 @@ struct dql {
unsigned intlowest_slack;   /* Lowest slack found */
unsigned long   slack_start_time;   /* Time slacks seen */
 
+   unsigned intnum_completed_ops;  /* Number of complete ops */
+
/* Configuration */
unsigned intmax_limit;  /* Max limit */
unsigned intmin_limit;  /* Minimum limit */
@@ -83,6 +87,7 @@ static inline void dql_queued(struct dql *dql, unsigned int 
count)
barrier();
 
dql->num_queued += count;
+   dql->num_enqueue_ops++;
 }
 
 /* Returns how many objects can be queued, < 0 indicates over limit. */
@@ -92,7 +97,7 @@ static inline int dql_avail(const struct dql *dql)
 }
 
 /* Record number of completed objects and recalculate the limit. */
-void dql_completed(struct dql *dql, unsigned int count);
+void dql_completed(struct dql *dql, unsigned int count, unsigned int ops);
 
 /* Reset dql state */
 void dql_reset(struct dql *dql);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d122be9..0d1d748 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2999,7 +2999,7 @@ static inline void netdev_tx_completed_queue(struct 
netdev_queue *dev_queue,
if (unlikely(!bytes))
return;
 
-   dql_completed(&dev_queue->dql, bytes);
+   dql_completed(&dev_queue->dql, bytes, pkts);
 
/*
 * Without the memory barrier there is a small possiblity that
diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c
index f346715..d5e7a27 100644
--- a/lib/dynamic_queue_limits.c
+++ b/lib/dynamic_queue_limits.c
@@ -14,7 +14,7 @@
 #define AFTER_EQ(A, B) ((int)((A) - (B)) >= 0)
 
 /* Records completed count and recalculates the queue limit */
-void dql_completed(struct dql *dql, unsigned int count)
+void dql_completed(struct dql *dql, unsigned int count, unsigned int ops)
 {
unsigned int inprogress, prev_inprogress, limit;
unsigned int ovlimit, completed, num_queued;
@@ -108,6 +108,7 @@ void dql_completed(struct dql *dql, unsigned int count)
dql->prev_ovlimit = ovlimit;
dql->prev_last_obj_cnt = dql->last_obj_cnt;
dql->num_completed = completed;
+   dql->num_completed_ops += ops;
dql->prev_num_queued = num_queued;
 }
 EXPORT_SYMBOL(dql_completed);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 6e4f347..ab7b0b6 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1147,6 +1147,19 @@ static ssize_t bql_show_inflight(struct netdev_queue *queue,
 static struct netdev_queue_attribute bql_inflight_attribute =
__ATTR(inflight, S_IRUGO, bql_show_inflight, NULL);
 
+static ssize_t bql_show_inflight_pkts(struct netdev_queue *queue,
+ struct netdev_queue_attribute *attr,
+ char *buf)
+{
+   struct dql *dql = &queue->dql;
+
+   return sprintf(buf, "%u\n",
+  dql->num_enqueue_ops - dql->num_completed_ops);
+}
+
+static struct netdev_queue_attribute bql_inflight_pkts_attribute =
+   __ATTR(inflight_pkts, S_IRUGO, bql_show_inflight_pkts, NULL);
+
 #define BQL_ATTR(NAME, FIELD)  \
 static ssize_t bql_show_ ## NAME(struct netdev_queue *queue,   \
 struct netdev_queue_attribute *attr,   \
@@ -1176,6 +1189,7 @@ static struct attribute *dql_attrs[] = {
_limit_min_attribute.attr,

[PATCH net-next 4/4] xps_flows: XPS for packets that don't have a socket

2016-08-31 Thread Tom Herbert

xps_flows maintains a per device flow table that is indexed by the
skbuff hash. The table is only consulted when there is no queue saved in
a transmit socket for an skbuff.

Each entry in the flow table contains a queue index and a queue
pointer. The queue pointer is set when a queue is chosen using a
flow table entry. This pointer is set to the head pointer in the
transmit queue (which is maintained by BQL).

The new function get_xps_flows_index looks up flows in the
xps_flows table. The entry returned gives the last queue a matching flow
used. The returned queue is compared against the normal XPS queue. If
they are different, then we only switch if the tail pointer in the TX
queue has advanced past the pointer saved in the entry. In this
way OOO should be avoided when XPS wants to use a different queue.

Signed-off-by: Tom Herbert 
---
 net/Kconfig|  6 +
 net/core/dev.c | 85 +++---
 2 files changed, 76 insertions(+), 15 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index 7b6cd34..f77fad1 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -255,6 +255,12 @@ config XPS
depends on SMP
default y
 
+config XPS_FLOWS
+   bool
+   depends on XPS
+   depends on BQL
+   default y
+
 config HWBM
bool
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 34b5322..fc68d19 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3210,6 +3210,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 }
 #endif /* CONFIG_NET_EGRESS */
 
+/* Must be called with RCU read_lock */
 static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 {
 #ifdef CONFIG_XPS
@@ -3217,7 +3218,6 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
struct xps_map *map;
int queue_index = -1;
 
-   rcu_read_lock();
dev_maps = rcu_dereference(dev->xps_maps);
if (dev_maps) {
map = rcu_dereference(
@@ -3232,7 +3232,6 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
queue_index = -1;
}
}
-   rcu_read_unlock();
 
return queue_index;
 #else
@@ -3240,26 +3239,82 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
 #endif
 }
 
-static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb)
+/* Must be called with RCU read_lock */
+static int get_xps_flows_index(struct net_device *dev, struct sk_buff *skb)
 {
-   struct sock *sk = skb->sk;
-   int queue_index = sk_tx_queue_get(sk);
+#ifdef CONFIG_XPS_FLOWS
+   struct xps_dev_flow_table *flow_table;
+   struct xps_dev_flow ent;
+   int queue_index;
+   struct netdev_queue *txq;
+   u32 hash;
 
-   if (queue_index < 0 || skb->ooo_okay ||
-   queue_index >= dev->real_num_tx_queues) {
-   int new_index = get_xps_queue(dev, skb);
-   if (new_index < 0)
-   new_index = skb_tx_hash(dev, skb);
+   flow_table = rcu_dereference(dev->xps_flow_table);
+   if (!flow_table)
+   return -1;
 
-   if (queue_index != new_index && sk &&
-   sk_fullsock(sk) &&
-   rcu_access_pointer(sk->sk_dst_cache))
-   sk_tx_queue_set(sk, new_index);
+   queue_index = get_xps_queue(dev, skb);
+   if (queue_index < 0)
+   return -1;
 
-   queue_index = new_index;
+   hash = skb_get_hash(skb);
+   if (!hash)
+   return -1;
+
+   ent.v64 = flow_table->flows[hash & flow_table->mask].v64;
+
+   if (queue_index != ent.queue_index &&
+   ent.queue_index >= 0 &&
+   ent.queue_index < dev->real_num_tx_queues) {
+   txq = netdev_get_tx_queue(dev, ent.queue_index);
+   if ((int)(txq->dql.num_completed_ops - ent.queue_ptr) < 0)  {
+   /* The current queue's tail has not advanced beyond the
+* last packet that was enqueued using the table entry.
+* We can't change queues without risking OOO. Stick
+* with the queue listed in the flow table.
+*/
+   queue_index = ent.queue_index;
+   }
}
 
+   /* Save the updated entry */
+   txq = netdev_get_tx_queue(dev, queue_index);
+   ent.queue_index = queue_index;
+   ent.queue_ptr = txq->dql.num_enqueue_ops;
+   flow_table->flows[hash & flow_table->mask].v64 = ent.v64;
+
return queue_index;
+#else
+   return get_xps_queue(dev, skb);
+#endif
+}
+
+static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb)
+{
+   struct sock *sk = skb->sk;
+   int queue_index = sk_tx_queue_get(sk);
+   int new_index;
+
+   if (queue_index < 0) {
+   /* Socket

[PATCH net-next 0/4] xps_flows: XPS flow steering when there is no socket

2016-08-31 Thread Tom Herbert

This patch set introduces transmit flow steering for socketless packets.
The idea is that we record the transmit queues in a flow table that is
indexed by skbuff hash.  The flow table entries have two values: the
queue_index and the head cnt of packets from the TX queue. We only allow
a queue to change for a flow if the tail cnt in the TX queue advances
beyond the recorded head cnt. That is the condition that should indicate
that all outstanding packets for the flow have completed transmission so
the queue can change.
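
The queue-change rule described above can be sketched roughly as follows.
This is a simplified userspace model, not the code from the series: the
names txq_counters, flow_entry and pick_queue are hypothetical, and no
locking is shown (the real mechanism is best effort as well).

```c
#include <stdint.h>

/* Hypothetical model of one TX queue's operation counters. */
struct txq_counters {
	uint32_t num_enqueue_ops;   /* head: packets queued so far */
	uint32_t num_completed_ops; /* tail: packets completed so far */
};

/* Hypothetical flow table entry: the last queue used by the flow and the
 * head count that was recorded when the flow last picked that queue.
 */
struct flow_entry {
	int queue_index;
	uint32_t queue_ptr;
};

/* Returns the queue to use and updates the entry. 'wanted' is the queue
 * that XPS would choose for this packet right now.
 */
static int pick_queue(struct flow_entry *ent, struct txq_counters *txqs,
		      int num_queues, int wanted)
{
	int q = wanted;

	if (ent->queue_index != wanted &&
	    ent->queue_index >= 0 && ent->queue_index < num_queues) {
		const struct txq_counters *old = &txqs[ent->queue_index];

		/* The signed difference keeps the comparison meaningful
		 * when the 32-bit counters wrap around.
		 */
		if ((int32_t)(old->num_completed_ops - ent->queue_ptr) < 0)
			q = ent->queue_index; /* tail has not caught up */
	}

	ent->queue_index = q;
	ent->queue_ptr = txqs[q].num_enqueue_ops;
	return q;
}
```

In this model a flow that last used queue 0 with a recorded head of 5 stays
on queue 0 while that queue's completion count is below 5, and moves to the
wanted queue once the count reaches 5.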

Tracking the inflight queue is performed as part of DQL. Two fields are
added to the dql structure: num_enqueue_ops and num_completed_ops.
num_enqueue_ops is incremented in dql_queued and num_completed_ops is
incremented in dql_completed by the number of operations completed (a
new argument to the function).
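
As a rough illustration of how the two counters relate, here is a small
standalone model. The struct and helper names (dql_ops, ops_queued,
ops_completed, ops_inflight) are made up for this sketch; only the two
field names come from the series.

```c
#include <stdint.h>

/* Hypothetical subset of the dql structure: only the two new counters. */
struct dql_ops {
	uint32_t num_enqueue_ops;   /* bumped once per queuing operation */
	uint32_t num_completed_ops; /* bumped by the number of completions */
};

/* Models the increment done when one packet is queued. */
static void ops_queued(struct dql_ops *d)
{
	d->num_enqueue_ops++;
}

/* Models the increment done when 'ops' packets complete. */
static void ops_completed(struct dql_ops *d, uint32_t ops)
{
	d->num_completed_ops += ops;
}

/* Packets currently in flight. Unsigned subtraction stays correct even
 * after either counter has wrapped around.
 */
static uint32_t ops_inflight(const struct dql_ops *d)
{
	return d->num_enqueue_ops - d->num_completed_ops;
}
```

The subtraction in ops_inflight matches what the inflight_pkts sysfs
attribute in patch 2/4 prints.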

This patch set creates /sys/class/net/eth*/xps_dev_flow_table_cnt
which holds the number of entries in the XPS flow table.

Note that the functionality here is technically best effort (for
instance we don't obtain a lock while processing a flow table entry).
Under high load it is possible that OOO packets can still be generated
due to XPS if two threads are hammering on the same flow table entry.
The assumption of these patches is that OOO packets are not the end of
the world and these should prevent OOO in most common use cases with
XPS.
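
To make the lockless, best-effort access a bit more concrete, the sketch
below packs a queue index and a queue pointer into one 64-bit word, so an
entry is read and written as a single value. The layout is an assumption
for illustration (the real struct xps_dev_flow is defined in patch 3/4),
and flow_ent_model, flow_load and flow_store are hypothetical names.
Whether a 64-bit access is a single load or store depends on the platform.

```c
#include <stdint.h>

/* Assumed layout: two 32-bit fields overlaid on one 64-bit value. */
union flow_ent_model {
	struct {
		int32_t  queue_index;
		uint32_t queue_ptr;
	};
	uint64_t v64;
};

/* Take a snapshot of a table slot in one 64-bit read. */
static union flow_ent_model flow_load(const uint64_t *slot)
{
	union flow_ent_model e;

	e.v64 = *slot;
	return e;
}

/* Publish both fields with one 64-bit write. A concurrent writer can
 * still overwrite this update, which is why the scheme is best effort.
 */
static void flow_store(uint64_t *slot, int32_t queue_index, uint32_t queue_ptr)
{
	union flow_ent_model e;

	e.queue_index = queue_index;
	e.queue_ptr = queue_ptr;
	*slot = e.v64;
}
```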

This is a followup to previous RFC version. Fixes from RFC are:

  - Move counters to DQL
  - Fixed typo
  - Simplified get flow index function
  - Fixed sysfs flow_table_cnt to properly use DEVICE_ATTR_RW
  - Renamed the mechanism

Tested:
  Manually forced all packets to go through the xps_flows path.
  Observed that some flows were deferred to change queues because
  packets were in flight with the flow bucket.

Tom Herbert (4):
  net: Set SW hash in skb_set_hash_from_sk
  dql: Add counters for number of queuing and completion operations
  net: Add xps_dev_flow_table_cnt
  xps_flows: XPS for packets that don't have a socket

 include/linux/dynamic_queue_limits.h |   7 ++-
 include/linux/netdevice.h|  26 -
 include/net/sock.h   |   6 +-
 lib/dynamic_queue_limits.c   |   3 +-
 net/Kconfig  |   6 ++
 net/core/dev.c   |  85 -
 net/core/net-sysfs.c | 103 +++
 7 files changed, 214 insertions(+), 22 deletions(-)

-- 
2.8.0.rc2

[PATCH net-next 1/4] net: Set SW hash in skb_set_hash_from_sk

2016-08-31 Thread Tom Herbert

Use the __skb_set_sw_hash to set the hash in an skbuff from the socket
txhash.
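
For readers unfamiliar with the helper, the following standalone model
shows the effect of the change. It is a sketch with hypothetical stand-in
types (skb_model, sock_model), not kernel code. As far as I understand the
helper, the practical difference from the open-coded assignment is that the
software-hash flag is set as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the sk_buff and sock fields involved. */
struct skb_model {
	uint32_t hash;
	bool sw_hash; /* hash was computed in software */
	bool l4_hash; /* hash covers L4 information */
};

struct sock_model {
	uint32_t sk_txhash;
};

/* Models __skb_set_sw_hash(): store the hash and mark it as software. */
static void model_set_sw_hash(struct skb_model *skb, uint32_t hash, bool is_l4)
{
	skb->l4_hash = is_l4;
	skb->sw_hash = true;
	skb->hash = hash;
}

/* Models skb_set_hash_from_sk() after the patch: a zero txhash leaves
 * the skb untouched.
 */
static void model_set_hash_from_sk(struct skb_model *skb,
				   const struct sock_model *sk)
{
	if (sk->sk_txhash)
		model_set_sw_hash(skb, sk->sk_txhash, true);
}
```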

Signed-off-by: Tom Herbert 
---
 include/net/sock.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index c797c57..12e585c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1910,10 +1910,8 @@ static inline void sock_poll_wait(struct file *filp,
 
 static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
 {
-   if (sk->sk_txhash) {
-   skb->l4_hash = 1;
-   skb->hash = sk->sk_txhash;
-   }
+   if (sk->sk_txhash)
+   __skb_set_sw_hash(skb, sk->sk_txhash, true);
 }
 
 void skb_set_owner_w(struct sk_buff *skb, struct sock *sk);
-- 
2.8.0.rc2

[PATCH v3 4/5] arm64: dts: rockchip: add the gmac needed node for rk3399

2016-08-31 Thread Caesar Wang

The RK3399 GMAC Ethernet Controller provides a complete Ethernet interface
from processor to a Reduced Media Independent Interface (RMII) and Reduced
Gigabit Media Independent Interface (RGMII) compliant Ethernet PHY.

This patch adds the required device information,
e.g. interrupts, grf, clocks, pinctrl and so on.

The full details are in [0].

[0]:
Documentation/devicetree/bindings/net/rockchip-dwmac.txt

Signed-off-by: Caesar Wang 
---

Changes in v3:
- generate a patch from https://patchwork.kernel.org/patch/9306339/.

Changes in v2: None

 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 80 
 1 file changed, 80 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 2ab233f..092bb45 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -200,6 +200,26 @@
};
};
 
+   gmac: ethernet@fe30 {
+   compatible = "rockchip,rk3399-gmac";
+   reg = <0x0 0xfe30 0x0 0x1>;
+   interrupts = ;
+   interrupt-names = "macirq";
+   clocks = < SCLK_MAC>, < SCLK_MAC_RX>,
+< SCLK_MAC_TX>, < SCLK_MACREF>,
+< SCLK_MACREF_OUT>, < ACLK_GMAC>,
+< PCLK_GMAC>;
+   clock-names = "stmmaceth", "mac_clk_rx",
+ "mac_clk_tx", "clk_mac_ref",
+ "clk_mac_refout", "aclk_mac",
+ "pclk_mac";
+   power-domains = < RK3399_PD_GMAC>;
+   resets = < SRST_A_GMAC>;
+   reset-names = "stmmaceth";
+   rockchip,grf = <>;
+   status = "disabled";
+   };
+
sdio0: dwmmc@fe31 {
compatible = "rockchip,rk3399-dw-mshc",
 "rockchip,rk3288-dw-mshc";
@@ -1193,6 +1213,66 @@
drive-strength = <13>;
};
 
+   gmac {
+   rgmii_pins: rgmii-pins {
+   rockchip,pins =
+   /* mac_txclk */
+   <3 17 RK_FUNC_1 _pull_none_13ma>,
+   /* mac_rxclk */
+   <3 14 RK_FUNC_1 _pull_none>,
+   /* mac_mdio */
+   <3 13 RK_FUNC_1 _pull_none>,
+   /* mac_txen */
+   <3 12 RK_FUNC_1 _pull_none_13ma>,
+   /* mac_clk */
+   <3 11 RK_FUNC_1 _pull_none>,
+   /* mac_rxdv */
+   <3 9 RK_FUNC_1 _pull_none>,
+   /* mac_mdc */
+   <3 8 RK_FUNC_1 _pull_none>,
+   /* mac_rxd1 */
+   <3 7 RK_FUNC_1 _pull_none>,
+   /* mac_rxd0 */
+   <3 6 RK_FUNC_1 _pull_none>,
+   /* mac_txd1 */
+   <3 5 RK_FUNC_1 _pull_none_13ma>,
+   /* mac_txd0 */
+   <3 4 RK_FUNC_1 _pull_none_13ma>,
+   /* mac_rxd3 */
+   <3 3 RK_FUNC_1 _pull_none>,
+   /* mac_rxd2 */
+   <3 2 RK_FUNC_1 _pull_none>,
+   /* mac_txd3 */
+   <3 1 RK_FUNC_1 _pull_none_13ma>,
+   /* mac_txd2 */
+   <3 0 RK_FUNC_1 _pull_none_13ma>;
+   };
+
+   rmii_pins: rmii-pins {
+   rockchip,pins =
+   /* mac_mdio */
+   <3 13 RK_FUNC_1 _pull_none>,
+   /* mac_txen */
+   <3 12 RK_FUNC_1 _pull_none_13ma>,
+   /* mac_clk */
+   <3 11 RK_FUNC_1 _pull_none>,
+   /* mac_rxer */
+   <3 10 RK_FUNC_1 _pull_none>,
+   /* mac_rxdv */
+   <3 9 RK_FUNC_1 _pull_none>,
+   /* mac_mdc */
+   <3 8 RK_FUNC_1 _pull_none>,
+   /* mac_rxd1 */
+

[PATCH v3 5/5] arm64: dts: rockchip: enable the gmac for rk3399 evb board

2016-08-31 Thread Caesar Wang

Add the required and optional properties for the evb board.
See [0] for the details.

[0]:
Documentation/devicetree/bindings/net/rockchip-dwmac.txt

Signed-off-by: Roger Chen 
Signed-off-by: Caesar Wang 
---

Changes in v3: None
Changes in v2: None

 arch/arm64/boot/dts/rockchip/rk3399-evb.dts | 31 +
 1 file changed, 31 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
index d47b4e9..ed6f2e8 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts
@@ -94,12 +94,43 @@
regulator-always-on;
regulator-boot-on;
};
+
+   clkin_gmac: external-gmac-clock {
+   compatible = "fixed-clock";
+   clock-frequency = <12500>;
+   clock-output-names = "clkin_gmac";
+   #clock-cells = <0>;
+   };
+
+   vcc_phy: vcc-phy-regulator {
+   compatible = "regulator-fixed";
+   regulator-name = "vcc_phy";
+   regulator-always-on;
+   regulator-boot-on;
+   };
+
 };
 
 _phy {
status = "okay";
 };
 
+ {
+   phy-supply = <_phy>;
+   phy-mode = "rgmii";
+   clock_in_out = "input";
+   snps,reset-gpio = < 15 GPIO_ACTIVE_LOW>;
+   snps,reset-active-low;
+   snps,reset-delays-us = <0 1 5>;
+   assigned-clocks = < SCLK_RMII_SRC>;
+   assigned-clock-parents = <_gmac>;
+   pinctrl-names = "default";
+   pinctrl-0 = <_pins>;
+   tx_delay = <0x28>;
+   rx_delay = <0x11>;
+   status = "okay";
+};
+
  {
status = "okay";
 };
-- 
1.9.1

[PATCH v3 0/5] Support the rk3399 gmac pd function

2016-08-31 Thread Caesar Wang

This series adds handling of the gmac pd and supports
the rk3399 gmac in the devicetree.

Previous versions:

v1: https://lkml.org/lkml/2016/8/30/668
v2: https://lkml.org/lkml/2016/8/31/27


Changes in v3:
- split into two patches based on patch v2, and fix nits and the commit
  message, as commented on https://patchwork.kernel.org/patch/9306339/
- generate a patch from https://patchwork.kernel.org/patch/9306339/.

Changes in v2:
- rk_gmac_powerup instead of the rk_gmac_init.
- fixes the build error on next kernel.
- Fixes the order, as Heiko comments on
  https://patchwork.kernel.org/patch/9305991/

Caesar Wang (3):
  arm64: dts: rockchip: add the gmac power domain on rk3399
  arm64: dts: rockchip: add the gmac needed node for rk3399
  arm64: dts: rockchip: enable the gmac for rk3399 evb board

David Wu (1):
  net: stmmac: dwmac-rk: add pd_gmac support for rk3399

Roger Chen (1):
  net: stmmac: dwmac-rk: fixes the gmac resume after PD on/off

 arch/arm64/boot/dts/rockchip/rk3399-evb.dts| 31 +
 arch/arm64/boot/dts/rockchip/rk3399.dtsi   | 90 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 28 +---
 3 files changed, 140 insertions(+), 9 deletions(-)

-- 
1.9.1

[PATCH v3 3/5] arm64: dts: rockchip: add the gmac power domain on rk3399

2016-08-31 Thread Caesar Wang

This patch adds the gmac pd to reduce power consumption.
Even though some boards do not need Ethernet support, the driver
core can also take care of powering up the pd before probe.

Signed-off-by: Roger Chen 
Signed-off-by: Caesar Wang 
---

Changes in v3:
- split into two patches based on patch v2, and fix nits and the commit
  message, as commented on https://patchwork.kernel.org/patch/9306339/

Changes in v2:
- Fixes the order, as Heiko comments on
  https://patchwork.kernel.org/patch/9305991/

 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 32aebc8..2ab233f 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -611,6 +611,11 @@
status = "disabled";
};
 
+   qos_gmac: qos@ffa5c000 {
+   compatible = "syscon";
+   reg = <0x0 0xffa5c000 0x0 0x20>;
+   };
+
qos_hdcp: qos@ffa9 {
compatible = "syscon";
reg = <0x0 0xffa9 0x0 0x20>;
@@ -739,6 +744,11 @@
};
 
/* These power domains are grouped by VD_LOGIC */
+   pd_gmac@RK3399_PD_GMAC {
+   reg = ;
+   clocks = < ACLK_GMAC>;
+   pm_qos = <_gmac>;
+   };
pd_vio@RK3399_PD_VIO {
reg = ;
#address-cells = <1>;
-- 
1.9.1

[PATCH v3 2/5] net: stmmac: dwmac-rk: add pd_gmac support for rk3399

2016-08-31 Thread Caesar Wang

From: David Wu 

Add gmac power domain support for rk3399, in order to reduce
power consumption.

Signed-off-by: David Wu 
Signed-off-by: Caesar Wang 
---

Changes in v3: None
Changes in v2:
- fixes the build error on next kernel.

 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 289e7a6..406573d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "stmmac_platform.h"
 
@@ -659,11 +660,19 @@ static int rk_gmac_powerup(struct rk_priv_data *bsp_priv)
if (ret)
return ret;
 
+   pm_runtime_enable(dev);
+   pm_runtime_get_sync(dev);
+
return 0;
 }
 
 static void rk_gmac_powerdown(struct rk_priv_data *gmac)
 {
+   struct device *dev = &gmac->pdev->dev;
+
+   pm_runtime_put_sync(dev);
+   pm_runtime_disable(dev);
+
phy_power_on(gmac, false);
gmac_clk_enable(gmac, false);
 }
-- 
1.9.1

Re: [PATCH] softirq: let ksoftirqd do its job

2016-08-31 Thread Rick Jones


On 08/31/2016 04:11 PM, Eric Dumazet wrote:
> On Wed, 2016-08-31 at 15:47 -0700, Rick Jones wrote:
>> With regard to drops, are both of you sure you're using the same socket
>> buffer sizes?
>
> Does it really matter ?


At least at points in the past I have seen different drop counts at the 
SO_RCVBUF based on using (sometimes much) larger sizes.  The hypothesis 
I was operating under at the time was that this dealt with those 
situations where the netserver was held-off from running for "a little 
while" from time to time.  It didn't change things for a sustained 
overload situation though.



>> In the meantime, is anything interesting happening with TCP_RR or
>> TCP_STREAM?
>
> TCP_RR is driven by the network latency, we do not drop packets in the
> socket itself.


I've been of the opinion it (single stream) is driven by path length. 
Sometimes by NIC latency.  But then I'm almost always measuring in the 
LAN rather than across the WAN.


happy benchmarking,

rick

[PATCH v3 1/5] net: stmmac: dwmac-rk: fixes the gmac resume after PD on/off

2016-08-31 Thread Caesar Wang

From: Roger Chen 

The GMAC Power Domain (PD) will be disabled during suspend.
That will cause the GRF registers to be reset,
so the corresponding GRF registers for GMAC must be set up again.
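
The idea can be pictured with a toy model: settings that live inside the
power domain are lost when it is switched off, so they have to be applied
in the power-up path, which runs again on resume, rather than only once at
setup time. Everything below (gmac_model, model_pd_off, model_powerup) is
hypothetical and only mirrors the structure of the change.

```c
#include <stdbool.h>

/* Hypothetical model: the interface settings are part of a power domain
 * and are lost whenever that domain is switched off.
 */
struct gmac_model {
	bool pd_on;
	bool grf_configured;
};

/* Models suspend: the domain goes off and the registers are reset. */
static void model_pd_off(struct gmac_model *g)
{
	g->pd_on = false;
	g->grf_configured = false;
}

/* Models the power-up path after the patch: the interface setup is done
 * here, so it is repeated every time the device is powered up.
 */
static void model_powerup(struct gmac_model *g)
{
	g->pd_on = true;
	g->grf_configured = true;
}
```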

Signed-off-by: Roger Chen 
Signed-off-by: Caesar Wang 
---

Changes in v3: None
Changes in v2:
- rk_gmac_powerup instead of the rk_gmac_init.

 drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
index 9210591..289e7a6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c
@@ -629,6 +629,16 @@ static struct rk_priv_data *rk_gmac_setup(struct platform_device *pdev,
"rockchip,grf");
bsp_priv->pdev = pdev;
 
+   gmac_clk_init(bsp_priv);
+
+   return bsp_priv;
+}
+
+static int rk_gmac_powerup(struct rk_priv_data *bsp_priv)
+{
+   int ret;
+   struct device *dev = &bsp_priv->pdev->dev;
+
/*rmii or rgmii*/
if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RGMII) {
dev_info(dev, "init for RGMII\n");
@@ -641,15 +651,6 @@ static struct rk_priv_data *rk_gmac_setup(struct platform_device *pdev,
dev_err(dev, "NO interface defined!\n");
}
 
-   gmac_clk_init(bsp_priv);
-
-   return bsp_priv;
-}
-
-static int rk_gmac_powerup(struct rk_priv_data *bsp_priv)
-{
-   int ret;
-
ret = phy_power_on(bsp_priv, true);
if (ret)
return ret;
-- 
1.9.1

[PATCH] [v10] net: emac: emac gigabit ethernet controller driver

2016-08-31 Thread Timur Tabi

Add support for the Qualcomm Technologies, Inc. EMAC gigabit Ethernet
controller.

This driver supports the following features:
1) Checksum offload.
2) Interrupt coalescing support.
3) SGMII phy.
4) phylib interface for external phy

Based on original work by
Niranjana Vishwanathapura 
Gilad Avidov 

Signed-off-by: Timur Tabi 
---

v10:
 - removed superfluous acpi-related data
 - fix the Makefiles to allow module building
 - rename "jubbers" to "jabbers"
 - fix some function prototypes (found via sparse)
 - removed invalid __iomem (found via sparse)
 - don't print phy status unless phy is attached

 .../devicetree/bindings/net/qcom-emac.txt  |  112 ++
 MAINTAINERS|6 +
 drivers/net/ethernet/qualcomm/Kconfig  |   12 +
 drivers/net/ethernet/qualcomm/Makefile |2 +
 drivers/net/ethernet/qualcomm/emac/Makefile|7 +
 drivers/net/ethernet/qualcomm/emac/emac-mac.c  | 1528 
 drivers/net/ethernet/qualcomm/emac/emac-mac.h  |  248 
 drivers/net/ethernet/qualcomm/emac/emac-phy.c  |  204 +++
 drivers/net/ethernet/qualcomm/emac/emac-phy.h  |   33 +
 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c|  722 +
 drivers/net/ethernet/qualcomm/emac/emac-sgmii.h|   24 +
 drivers/net/ethernet/qualcomm/emac/emac.c  |  743 ++
 drivers/net/ethernet/qualcomm/emac/emac.h  |  335 +
 13 files changed, 3976 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/qcom-emac.txt
 create mode 100644 drivers/net/ethernet/qualcomm/emac/Makefile
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-mac.c
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-mac.h
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-phy.c
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-phy.h
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii.h
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac.c
 create mode 100644 drivers/net/ethernet/qualcomm/emac/emac.h

diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt b/Documentation/devicetree/bindings/net/qcom-emac.txt
new file mode 100644
index 000..90c3584
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qcom-emac.txt
@@ -0,0 +1,112 @@
+Qualcomm Technologies EMAC Gigabit Ethernet Controller
+
+This network controller consists of two devices: a MAC and an SGMII
+internal PHY.  Each device is represented by a device tree node.  A phandle
+connects the MAC node to its corresponding internal phy node.  Another
+phandle points to the external PHY node.
+
+Required properties:
+
+MAC node:
+- compatible : Should be "qcom,fsm9900-emac".
+- reg : Offset and length of the register regions for the device
+- interrupts : Interrupt number used by this controller
+- mac-address : The 6-byte MAC address. If present, it is the default
+   MAC address.
+- internal-phy : phandle to the internal PHY node
+- phy-handle : phandle to the external PHY node
+
+Internal PHY node:
+- compatible : Should be "qcom,fsm9900-emac-sgmii" or "qcom,qdf2432-emac-sgmii".
+- reg : Offset and length of the register region(s) for the device
+- interrupts : Interrupt number used by this controller
+
+The external phy child node:
+- reg : The phy address
+
+Example:
+
+FSM9900:
+
+soc {
+   #address-cells = <1>;
+   #size-cells = <1>;
+
+   emac0: ethernet@feb2 {
+   compatible = "qcom,fsm9900-emac";
+   reg = <0xfeb2 0x1>,
+ <0xfeb36000 0x1000>;
+   interrupts = <76>;
+
+   clocks = < 0>, < 1>, < 3>, < 4>, < 5>,
+   < 6>, < 7>;
+   clock-names = "axi_clk", "cfg_ahb_clk", "high_speed_clk",
+   "mdio_clk", "tx_clk", "rx_clk", "sys_clk";
+
+   internal-phy = <_sgmii>;
+
+   phy-handle = <>;
+
+   #address-cells = <1>;
+   #size-cells = <0>;
+   phy0: ethernet-phy@0 {
+   reg = <0>;
+   };
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_pins_a>;
+   };
+
+   emac_sgmii: ethernet@feb38000 {
+   compatible = "qcom,fsm9900-emac-sgmii";
+   reg = <0xfeb38000 0x1000>;
+   interrupts = <80>;
+   };
+
+   tlmm: pinctrl@fd51 {
+   compatible = "qcom,fsm9900-pinctrl";
+
+   mdio_pins_a: mdio {
+   state {
+   pins = "gpio123", "gpio124";
+   function = "mdio";
+   };
+   };
+   };
+
+
+QDF2432:
+
+soc {
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   emac0: ethernet@3880 {
+

Re: [PATCH] softirq: let ksoftirqd do its job

2016-08-31 Thread Eric Dumazet

On Wed, 2016-08-31 at 15:47 -0700, Rick Jones wrote:
> With regard to drops, are both of you sure you're using the same socket 
> buffer sizes?

Does it really matter ?

I used the standard /proc/sys/net/core/rmem_default, but under flood
receive queue is almost always full, even if you make it bigger.

By varying its size, you only make batches bigger and number of context
switches should be lower, if only two threads are competing for the cpu.

Exact 'optimal' size would depend on various factors, depending on
application and platform constraints.
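
For anyone who wants to try a different receive buffer size, a minimal
sketch of requesting one per socket and reading back what was granted is
shown below. The helper name udp_rcvbuf_granted is made up for this
example. On Linux the value read back is usually double the request and is
limited by net.core.rmem_max.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Requests a receive buffer of 'bytes' on a fresh UDP socket and returns
 * the size reported by the kernel afterwards, or -1 on error.
 */
static int udp_rcvbuf_granted(int bytes)
{
	int granted = 0;
	socklen_t len = sizeof(granted);
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0 ||
	    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
		granted = -1;

	close(fd);
	return granted;
}
```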

> 
> In the meantime, is anything interesting happening with TCP_RR or 
> TCP_STREAM?

TCP_RR is driven by the network latency, we do not drop packets in the
socket itself.

TCP_STREAM is normally paced by the ability of the receiver to send ACK
packets. TCP has this auto regulating mode, unless the sender violates
the RFC(s).

If your question is :

What happens if thousands of threads on the host want the cpu, and
ksoftirqd gets not enough cycles by virtue of being a normal thread ?

Then, you are back to typical provisioning problems, and normally people
play with priorities and containers/cgroups, and/or various techniques
like RPS/RFS

(You can change ksoftirqd priority if you like)

Re: [PATCH net-next 00/12] net: Convert vrf from dst to tx hook

2016-08-31 Thread David Ahern

On 8/30/16 11:34 AM, David Ahern wrote:
> This series fixes this problem by removing the output dst that points
> to the VRF and always doing the actual FIB lookup. This allows the real
> dst to be cached on sockets and used for MSS. Packets are diverted to
> the VRF device on Tx using an l3mdev hook in the output path similar to
> what is done for Rx.

Dave:

please drop this series. BGP smoke tests triggered a couple of problems I need 
to resolve.

Re: [PATCH] softirq: let ksoftirqd do its job

2016-08-31 Thread Rick Jones

With regard to drops, are both of you sure you're using the same socket 
buffer sizes?


In the meantime, is anything interesting happening with TCP_RR or 
TCP_STREAM?


happy benchmarking,

rick jones

[PATCH 0/5] constify ethtool_ops structures

2016-08-31 Thread Julia Lawall

Constify ethtool_ops structures.

---

 drivers/net/ethernet/mediatek/mtk_eth_soc.c   |2 +-
 drivers/net/ethernet/synopsys/dwc_eth_qos.c   |2 +-
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c |2 +-
 drivers/net/usb/r8152.c   |2 +-
 drivers/staging/netlogic/xlr_net.c|2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

[PATCH 5/5] net: axienet: constify ethtool_ops structures

2016-08-31 Thread Julia Lawall

Check for ethtool_ops structures that are only stored in the ethtool_ops
field of a net_device structure or passed as the second argument to
netdev_set_default_ethtool_ops.  These contexts are declared const, so
ethtool_ops structures that have these properties can be declared as const
also.
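
As a small illustration of why this is safe, the sketch below uses
hypothetical stand-in types (ethtool_ops_model, net_device_model): because
the pointer field is declared as pointer-to-const, a const-qualified ops
structure can be assigned to it without a cast, and nothing reachable
through that field can modify the structure.

```c
/* Hypothetical stand-in for an ops table of function pointers. */
struct ethtool_ops_model {
	int (*get_link)(void);
};

/* Hypothetical stand-in for the device: it only holds a pointer to
 * const ops, so it never writes to the table.
 */
struct net_device_model {
	const struct ethtool_ops_model *ethtool_ops;
};

static int model_get_link(void)
{
	return 1;
}

/* The table is only ever read, so it can be declared const. */
static const struct ethtool_ops_model model_ops = {
	.get_link = model_get_link,
};

static void model_attach(struct net_device_model *dev)
{
	dev->ethtool_ops = &model_ops;
}
```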

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// 
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct ethtool_ops i@p = { ... };

@ok1@
identifier r.i;
struct net_device e;
position p;
@@
e.ethtool_ops = &i@p;

@ok2@
identifier r.i;
expression e;
position p;
@@
netdev_set_default_ethtool_ops(e, &i@p)

@bad@
position p != {r.p,ok1.p,ok2.p};
identifier r.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct ethtool_ops i = { ... };
// 

Suggested-by: Stephen Hemminger 

Signed-off-by: Julia Lawall 

---
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index 36ee7ab..69e2a83 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -1297,7 +1297,7 @@ static int axienet_ethtools_set_coalesce(struct net_device *ndev,
return 0;
 }
 
-static struct ethtool_ops axienet_ethtool_ops = {
+static const struct ethtool_ops axienet_ethtool_ops = {
.get_drvinfo= axienet_ethtools_get_drvinfo,
.get_regs_len   = axienet_ethtools_get_regs_len,
.get_regs   = axienet_ethtools_get_regs,

[PATCH 3/5] dwc_eth_qos: constify ethtool_ops structures

2016-08-31 Thread Julia Lawall

Check for ethtool_ops structures that are only stored in the ethtool_ops
field of a net_device structure or passed as the second argument to
netdev_set_default_ethtool_ops.  These contexts are declared const, so
ethtool_ops structures that have these properties can be declared as const
also.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// 
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct ethtool_ops i@p = { ... };

@ok1@
identifier r.i;
struct net_device e;
position p;
@@
e.ethtool_ops = &i@p;

@ok2@
identifier r.i;
expression e;
position p;
@@
netdev_set_default_ethtool_ops(e, &i@p)

@bad@
position p != {r.p,ok1.p,ok2.p};
identifier r.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct ethtool_ops i = { ... };
// 

Suggested-by: Stephen Hemminger 

Signed-off-by: Julia Lawall 

---
 drivers/net/ethernet/synopsys/dwc_eth_qos.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
index 5a3941b..c25d971 100644
--- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -2743,7 +2743,7 @@ static void dwceqos_set_msglevel(struct net_device *ndev, u32 msglevel)
lp->msg_enable = msglevel;
 }
 
-static struct ethtool_ops dwceqos_ethtool_ops = {
+static const struct ethtool_ops dwceqos_ethtool_ops = {
.get_drvinfo= dwceqos_get_drvinfo,
.get_link   = ethtool_op_get_link,
.get_pauseparam = dwceqos_get_pauseparam,

[PATCH 4/5] r8152: constify ethtool_ops structures

2016-08-31 Thread Julia Lawall

Check for ethtool_ops structures that are only stored in the ethtool_ops
field of a net_device structure or passed as the second argument to
netdev_set_default_ethtool_ops.  These contexts are declared const, so
ethtool_ops structures that have these properties can be declared as const
also.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// 
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct ethtool_ops i@p = { ... };

@ok1@
identifier r.i;
struct net_device e;
position p;
@@
e.ethtool_ops = &i@p;

@ok2@
identifier r.i;
expression e;
position p;
@@
netdev_set_default_ethtool_ops(e, &i@p)

@bad@
position p != {r.p,ok1.p,ok2.p};
identifier r.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct ethtool_ops i = { ... };
// 

Suggested-by: Stephen Hemminger 

Signed-off-by: Julia Lawall 

---
 drivers/net/usb/r8152.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index f41a8ad..f72f807 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -4032,7 +4032,7 @@ static int rtl8152_set_coalesce(struct net_device *netdev,
return ret;
 }
 
-static struct ethtool_ops ops = {
+static const struct ethtool_ops ops = {
.get_drvinfo = rtl8152_get_drvinfo,
.get_settings = rtl8152_get_settings,
.set_settings = rtl8152_set_settings,

[PATCH 1/5] net: mediatek: constify ethtool_ops structures

2016-08-31 Thread Julia Lawall

Check for ethtool_ops structures that are only stored in the ethtool_ops
field of a net_device structure or passed as the second argument to
netdev_set_default_ethtool_ops.  These contexts are declared const, so
ethtool_ops structures that have these properties can be declared as const
also.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// 
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct ethtool_ops i@p = { ... };

@ok1@
identifier r.i;
struct net_device e;
position p;
@@
e.ethtool_ops = &i@p;

@ok2@
identifier r.i;
expression e;
position p;
@@
netdev_set_default_ethtool_ops(e, &i@p)

@bad@
position p != {r.p,ok1.p,ok2.p};
identifier r.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct ethtool_ops i = { ... };
// </smpl>

Suggested-by: Stephen Hemminger 

Signed-off-by: Julia Lawall 

---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 1801fd8..98f22cd 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1692,7 +1692,7 @@ static void mtk_get_ethtool_stats(struct net_device *dev,
} while (u64_stats_fetch_retry_irq(&hwstats->syncp, start));
 }
 
-static struct ethtool_ops mtk_ethtool_ops = {
+static const struct ethtool_ops mtk_ethtool_ops = {
.get_settings   = mtk_get_settings,
.set_settings   = mtk_set_settings,
.get_drvinfo    = mtk_get_drvinfo,

Re: [PATCH] softirq: let ksoftirqd do its job

2016-08-31 Thread Eric Dumazet

On Wed, 2016-08-31 at 23:51 +0200, Jesper Dangaard Brouer wrote:

> 
> This run handled 1,517,248 pps without any drops, with all
> processes pinned to the same CPU.
> 
>  $ nstat > /dev/null && sleep 1 && nstat
>  #kernel
>  IpInReceives                    1517225            0.0
>  IpInDelivers                    1517224            0.0
>  UdpInDatagrams                  1517248            0.0
>  IpExtInOctets                   69793408           0.0
>  IpExtInNoECTPkts                1517246            0.0
> 
> I'm acking this patch:
> 
> Acked-by: Jesper Dangaard Brouer 
> 

Thanks a lot for bringing the issue back to me again, and for all your
tests!

[PATCH net-next] net: dsa: remove ds_to_priv

2016-08-31 Thread Vivien Didelot

Access the priv member of the dsa_switch structure directly, instead of
having an unnecessary helper.
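
A small user-space sketch of the change, assuming the removed helper was
a plain accessor that returned the priv pointer (the types and names
below are hypothetical stand-ins, not the DSA structures):

```c
#include <assert.h>

/* Hypothetical stand-ins for struct dsa_switch and a driver's private data. */
struct demo_priv {
	int cpu_port;
};

struct demo_switch {
	void *priv;
};

/* The kind of trivial accessor the patch removes. */
static inline void *demo_to_priv(struct demo_switch *ds)
{
	return ds->priv;
}

/* Direct member access: a void pointer converts implicitly in C. */
static int demo_cpu_port(struct demo_switch *ds)
{
	struct demo_priv *priv = ds->priv;

	/* Both forms yield the same pointer, so the helper adds nothing. */
	assert(priv == demo_to_priv(ds));
	return priv->cpu_port;
}
```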

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/b53/b53_common.c | 42 +++
 drivers/net/dsa/bcm_sf2.h        |  2 +-
 drivers/net/dsa/mv88e6060.c      |  4 +--
 drivers/net/dsa/mv88e6xxx/chip.c | 72 
 include/net/dsa.h                |  5 ---
 5 files changed, 60 insertions(+), 65 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 1299104..0afc2e5 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -477,7 +477,7 @@ static int b53_fast_age_vlan(struct b53_device *dev, u16 vid)
 
 static void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
unsigned int i;
u16 pvlan;
 
@@ -495,7 +495,7 @@ static void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
 static int b53_enable_port(struct dsa_switch *ds, int port,
   struct phy_device *phy)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
unsigned int cpu_port = dev->cpu_port;
u16 pvlan;
 
@@ -520,7 +520,7 @@ static int b53_enable_port(struct dsa_switch *ds, int port,
 static void b53_disable_port(struct dsa_switch *ds, int port,
 struct phy_device *phy)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
u8 reg;
 
/* Disable Tx/Rx for the port */
@@ -629,7 +629,7 @@ static int b53_switch_reset(struct b53_device *dev)
 
 static int b53_phy_read16(struct dsa_switch *ds, int addr, int reg)
 {
-   struct b53_device *priv = ds_to_priv(ds);
+   struct b53_device *priv = ds->priv;
u16 value = 0;
int ret;
 
@@ -644,7 +644,7 @@ static int b53_phy_read16(struct dsa_switch *ds, int addr, int reg)
 
 static int b53_phy_write16(struct dsa_switch *ds, int addr, int reg, u16 val)
 {
-   struct b53_device *priv = ds_to_priv(ds);
+   struct b53_device *priv = ds->priv;
 
if (priv->ops->phy_write16)
return priv->ops->phy_write16(priv, addr, reg, val);
@@ -714,7 +714,7 @@ static unsigned int b53_get_mib_size(struct b53_device *dev)
 
 static void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
const struct b53_mib_desc *mibs = b53_get_mib(dev);
unsigned int mib_size = b53_get_mib_size(dev);
unsigned int i;
@@ -727,7 +727,7 @@ static void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
 static void b53_get_ethtool_stats(struct dsa_switch *ds, int port,
  uint64_t *data)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
const struct b53_mib_desc *mibs = b53_get_mib(dev);
unsigned int mib_size = b53_get_mib_size(dev);
const struct b53_mib_desc *s;
@@ -759,7 +759,7 @@ static void b53_get_ethtool_stats(struct dsa_switch *ds, int port,
 
 static int b53_get_sset_count(struct dsa_switch *ds)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
 
return b53_get_mib_size(dev);
 }
@@ -771,7 +771,7 @@ static int b53_set_addr(struct dsa_switch *ds, u8 *addr)
 
 static int b53_setup(struct dsa_switch *ds)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
unsigned int port;
int ret;
 
@@ -802,7 +802,7 @@ static int b53_setup(struct dsa_switch *ds)
 static void b53_adjust_link(struct dsa_switch *ds, int port,
struct phy_device *phydev)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
u8 rgmii_ctrl = 0, reg = 0, off;
 
if (!phy_is_pseudo_fixed_link(phydev))
@@ -936,7 +936,7 @@ static int b53_vlan_prepare(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan,
struct switchdev_trans *trans)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
 
if ((is5325(dev) || is5365(dev)) && vlan->vid_begin == 0)
return -EOPNOTSUPP;
@@ -953,7 +953,7 @@ static void b53_vlan_add(struct dsa_switch *ds, int port,
 const struct switchdev_obj_port_vlan *vlan,
 struct switchdev_trans *trans)
 {
-   struct b53_device *dev = ds_to_priv(ds);
+   struct b53_device *dev = ds->priv;
bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID;
unsigned int cpu_port =

RE: [PATCH net-next V4 00/10] liquidio CN23XX support

2016-08-31 Thread Vatsavayi, Raghu

Thanks Much. 

> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, August 31, 2016 2:13 PM
> To: Vatsavayi, Raghu
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next V4 00/10] liquidio CN23XX support
> 
> From: Raghu Vatsavayi 
> Date: Wed, 31 Aug 2016 11:03:19 -0700
> 
> > Following patchset adds support for new device "CN23XX" in liquidio
> > > family of adapters. As advised by you I have split the previous V3
> > patch of 18 patches into two halves. This first patchset has first 10
> > patches, which are tested against net-next. I will post the second
> > half after this one.
> >
> > This V4 patch also addressed all the comments from previous
> > submission:
> > 1) Avoid busy loop while reading registers.
> > 2) Other minor comments about debug messages and constants.
> >
> > Please apply patches in following order as some of the patches depend
> > on earlier patches.
> 
> Series applied, thanks.

[PATCH v2 net-next 4/6] perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs

2016-08-31 Thread Alexei Starovoitov

Allow attaching BPF_PROG_TYPE_PERF_EVENT programs to sw and hw perf events
via overflow_handler mechanism.
When the program is attached, the overflow_handlers become stacked.
The program acts as a filter.
Returning zero from the program means that the normal perf_event_output handler
will not be called and the sampling event won't be stored in the ring buffer.

The overflow_handler_context==NULL is an additional safety check
to make sure programs are not attached to hw breakpoints and watchdog
in case other checks (that prevent that now anyway) get accidentally
relaxed in the future.

The program refcnt is incremented in case perf_events are inherited
when the target task is forked.
Similar to kprobe and tracepoint programs, there is no ioctl to
detach the program or swap an already attached program. User space is
expected to close(perf_event_fd) like it does right now for kprobe+bpf.
That restriction simplifies the code quite a bit.

The invocation of overflow_handler in __perf_event_overflow() is now
done via READ_ONCE, since that pointer can be replaced when the program
is attached while perf_event itself could have been active already.
There is no need to do similar treatment for event->prog, since it's
assigned only once before it's accessed.
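
The stacking described above can be sketched in user space as follows.
This is an illustration only: all names are hypothetical, the filter
callback stands in for the BPF program, and the single-threaded sketch
leaves out the READ_ONCE/WRITE_ONCE handling the kernel needs.

```c
#include <assert.h>
#include <stddef.h>

struct demo_event;
typedef void (*demo_handler_t)(struct demo_event *ev);

/* Hypothetical, simplified stand-in for struct perf_event. */
struct demo_event {
	demo_handler_t handler;		/* currently installed handler */
	demo_handler_t orig_handler;	/* saved when a filter is attached */
	int (*filter)(const struct demo_event *ev); /* stands in for the program */
	int sample;
	int stored;			/* samples that reached the "ring buffer" */
};

/* Stands in for the normal output handler. */
static void demo_output(struct demo_event *ev)
{
	ev->stored++;
}

/* Stacked handler: run the filter first; zero means "drop the sample". */
static void demo_filter_handler(struct demo_event *ev)
{
	if (!ev->filter(ev))
		return;
	ev->orig_handler(ev);
}

static int demo_attach_filter(struct demo_event *ev,
			      int (*filter)(const struct demo_event *ev))
{
	if (ev->filter)
		return -1;	/* already attached; swapping is not supported */
	ev->filter = filter;
	ev->orig_handler = ev->handler;
	ev->handler = demo_filter_handler;
	return 0;
}

/* Example filter: keep only even sample values. */
static int demo_keep_even(const struct demo_event *ev)
{
	return ev->sample % 2 == 0;
}

static void demo_overflow(struct demo_event *ev, int sample)
{
	assert(ev->handler != NULL);
	ev->sample = sample;
	ev->handler(ev);
}
```

With demo_keep_even attached, an overflow carrying an odd sample never
reaches demo_output, which mirrors a program returning zero.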

Signed-off-by: Alexei Starovoitov 
---
 include/linux/bpf.h        |  4 +++
 include/linux/perf_event.h |  2 ++
 kernel/events/core.c       | 85 +-
 3 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 11134238417d..9a904f63f8c1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -297,6 +297,10 @@ static inline struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 static inline void bpf_prog_put(struct bpf_prog *prog)
 {
 }
+static inline struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
+{
+   return ERR_PTR(-EOPNOTSUPP);
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 /* verifier prototypes for helper functions called from eBPF programs */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 97bfe62f30d7..dcaaaf3ec8e6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -679,6 +679,8 @@ struct perf_event {
u64 (*clock)(void);
perf_overflow_handler_t overflow_handler;
void *overflow_handler_context;
+   perf_overflow_handler_t orig_overflow_handler;
+   struct bpf_prog *prog;
 
 #ifdef CONFIG_EVENT_TRACING
struct trace_event_call *tp_event;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3cfabdf7b942..305433ab2447 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7022,7 +7022,7 @@ static int __perf_event_overflow(struct perf_event *event,
irq_work_queue(&event->pending);
}
 
-   event->overflow_handler(event, data, regs);
+   READ_ONCE(event->overflow_handler)(event, data, regs);
 
if (*perf_event_fasync(event) && event->pending_kill) {
event->pending_wakeup = 1;
@@ -7637,11 +7637,75 @@ static void perf_event_free_filter(struct perf_event *event)
ftrace_profile_free_filter(event);
 }
 
+static void bpf_overflow_handler(struct perf_event *event,
+struct perf_sample_data *data,
+struct pt_regs *regs)
+{
+   struct bpf_perf_event_data_kern ctx = {
+   .data = data,
+   .regs = regs,
+   };
+   int ret = 0;
+
+#ifdef CONFIG_BPF_SYSCALL
+   preempt_disable();
+   if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
+   goto out;
+   rcu_read_lock();
+   ret = BPF_PROG_RUN(event->prog, (void *)&ctx);
+   rcu_read_unlock();
+ out:
+   __this_cpu_dec(bpf_prog_active);
+   preempt_enable();
+#endif
+   if (!ret)
+   return;
+
+   event->orig_overflow_handler(event, data, regs);
+}
+
+static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
+{
+   struct bpf_prog *prog;
+
+   if (event->overflow_handler_context)
+   /* hw breakpoint or kernel counter */
+   return -EINVAL;
+
+   if (event->prog)
+   return -EEXIST;
+
+   prog = bpf_prog_get_type(prog_fd, BPF_PROG_TYPE_PERF_EVENT);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+
+   event->prog = prog;
+   event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
+   WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
+   return 0;
+}
+
+static void perf_event_free_bpf_handler(struct perf_event *event)
+{
+   struct bpf_prog *prog = event->prog;
+
+   if (!prog)
+   return;
+
+   WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
+   event->prog = NULL;
+   bpf_prog_put(prog);
+}
+
 static int

[PATCH v2 net-next 0/6] perf, bpf: add support for bpf in sw/hw perf_events

2016-08-31 Thread Alexei Starovoitov

Hi Peter, Dave,

this patch set is a follow up to the discussion:
https://lkml.kernel.org/r/20160804142853.GO6862@twins.programming.kicks-ass.net
It turned out to be simpler than what we discussed.

Patches 1-3 are bpf-side prep for the main patch 4
that adds bpf program as an overflow_handler to sw and hw perf_events.
Peter, please review.

Patches 5 and 6 are examples from myself and Brendan.

v1-v2: fixed issues spotted by Peter and Daniel.

Thanks!

Alexei Starovoitov (5):
  bpf: support 8-byte metafield access
  bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type
  bpf: perf_event progs should only use preallocated maps
  perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT
programs
  samples/bpf: add perf_event+bpf example

Brendan Gregg (1):
  samples/bpf: add sampleip example

 include/linux/bpf.h                 |   4 +
 include/linux/perf_event.h          |   7 ++
 include/uapi/linux/Kbuild           |   1 +
 include/uapi/linux/bpf.h            |   1 +
 include/uapi/linux/bpf_perf_event.h |  18 +++
 kernel/bpf/verifier.c               |  31 +-
 kernel/events/core.c                |  85 +-
 kernel/trace/bpf_trace.c            |  60 ++
 samples/bpf/Makefile                |   8 ++
 samples/bpf/bpf_helpers.h           |   2 +
 samples/bpf/bpf_load.c              |   7 +-
 samples/bpf/sampleip_kern.c         |  38 +++
 samples/bpf/sampleip_user.c         | 196 +
 samples/bpf/trace_event_kern.c      |  65 +++
 samples/bpf/trace_event_user.c      | 213 
 15 files changed, 730 insertions(+), 6 deletions(-)
 create mode 100644 include/uapi/linux/bpf_perf_event.h
 create mode 100644 samples/bpf/sampleip_kern.c
 create mode 100644 samples/bpf/sampleip_user.c
 create mode 100644 samples/bpf/trace_event_kern.c
 create mode 100644 samples/bpf/trace_event_user.c

-- 
2.8.0

[PATCH v2 net-next 2/6] bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type

2016-08-31 Thread Alexei Starovoitov

Introduce BPF_PROG_TYPE_PERF_EVENT programs that can be attached to
HW and SW perf events (PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE
respectively, in uapi/linux/perf_event.h).

The program-visible context meta structure is
struct bpf_perf_event_data {
    struct pt_regs regs;
    __u64 sample_period;
};
which is accessible directly from the program:
int bpf_prog(struct bpf_perf_event_data *ctx)
{
  ... ctx->sample_period ...
  ... ctx->regs.ip ...
}

The bpf verifier rewrites the accesses into kernel internal
struct bpf_perf_event_data_kern which allows changing
struct perf_sample_data without affecting bpf programs.
New fields can be added to the end of struct bpf_perf_event_data
in the future.
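
The access rules this implies for the program (read-only, naturally
aligned, register-sized loads, and a u64 load for sample_period) can be
sketched in user space as below. The context layout here is a
hypothetical stand-in with two register slots, not the real
struct bpf_perf_event_data, and the function name is made up for the
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: two register slots followed by the sample period. */
struct demo_ctx {
	unsigned long regs[2];
	uint64_t sample_period;
};

/* Accept only read-only, aligned accesses of the expected width. */
static bool demo_is_valid_access(int off, int size, bool is_write)
{
	if (off < 0 || (size_t)off >= sizeof(struct demo_ctx))
		return false;
	if (is_write)
		return false;
	if (size <= 0 || off % size != 0)
		return false;
	/* sample_period must be read as a whole u64 ... */
	if ((size_t)off == offsetof(struct demo_ctx, sample_period))
		return (size_t)size == sizeof(uint64_t);
	/* ... and everything else as one register-sized word. */
	return (size_t)size == sizeof(unsigned long);
}
```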

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 include/linux/perf_event.h          |  5 
 include/uapi/linux/Kbuild           |  1 +
 include/uapi/linux/bpf.h            |  1 +
 include/uapi/linux/bpf_perf_event.h | 18 +++
 kernel/trace/bpf_trace.c            | 60 +
 5 files changed, 85 insertions(+)
 create mode 100644 include/uapi/linux/bpf_perf_event.h

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2b6b43cc0dd5..97bfe62f30d7 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -788,6 +788,11 @@ struct perf_output_handle {
int page;
 };
 
+struct bpf_perf_event_data_kern {
+   struct pt_regs *regs;
+   struct perf_sample_data *data;
+};
+
 #ifdef CONFIG_CGROUP_PERF
 
 /*
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 185f8ea2702f..d0352a971ebd 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -71,6 +71,7 @@ header-y += binfmts.h
 header-y += blkpg.h
 header-y += blktrace_api.h
 header-y += bpf_common.h
+header-y += bpf_perf_event.h
 header-y += bpf.h
 header-y += bpqether.h
 header-y += bsg.h
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e4c5a1baa993..f896dfac4ac0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -95,6 +95,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SCHED_ACT,
BPF_PROG_TYPE_TRACEPOINT,
BPF_PROG_TYPE_XDP,
+   BPF_PROG_TYPE_PERF_EVENT,
 };
 
 #define BPF_PSEUDO_MAP_FD  1
diff --git a/include/uapi/linux/bpf_perf_event.h b/include/uapi/linux/bpf_perf_event.h
new file mode 100644
index 000000000000..067427259820
--- /dev/null
+++ b/include/uapi/linux/bpf_perf_event.h
@@ -0,0 +1,18 @@
+/* Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _UAPI__LINUX_BPF_PERF_EVENT_H__
+#define _UAPI__LINUX_BPF_PERF_EVENT_H__
+
+#include 
+#include 
+
+struct bpf_perf_event_data {
+   struct pt_regs regs;
+   __u64 sample_period;
+};
+
+#endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ad35213b8405..0ac414abbf68 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1,4 +1,5 @@
 /* Copyright (c) 2011-2015 PLUMgrid, http://plumgrid.com
+ * Copyright (c) 2016 Facebook
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -8,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -552,10 +554,68 @@ static struct bpf_prog_type_list tracepoint_tl = {
.type   = BPF_PROG_TYPE_TRACEPOINT,
 };
 
+static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type,
+   enum bpf_reg_type *reg_type)
+{
+   if (off < 0 || off >= sizeof(struct bpf_perf_event_data))
+   return false;
+   if (type != BPF_READ)
+   return false;
+   if (off % size != 0)
+   return false;
+   if (off == offsetof(struct bpf_perf_event_data, sample_period)) {
+   if (size != sizeof(u64))
+   return false;
+   } else {
+   if (size != sizeof(long))
+   return false;
+   }
+   return true;
+}
+
+static u32 pe_prog_convert_ctx_access(enum bpf_access_type type, int dst_reg,
+ int src_reg, int ctx_off,
+ struct bpf_insn *insn_buf,
+ struct bpf_prog *prog)
+{
+   struct bpf_insn *insn = insn_buf;
+
+   BUILD_BUG_ON(FIELD_SIZEOF(struct perf_sample_data, period) != sizeof(u64));
+   switch (ctx_off) {
+   case offsetof(struct bpf_perf_event_data, sample_period):
+   *insn++ = BPF_LDX_MEM(bytes_to_bpf_size(FIELD_SIZEOF(struct bpf_perf_event_data_kern, data)),
+ dst_reg, src_reg,
+

[PATCH v2 net-next 3/6] bpf: perf_event progs should only use preallocated maps

2016-08-31 Thread Alexei Starovoitov

Make sure that BPF_PROG_TYPE_PERF_EVENT programs only use
preallocated hash maps, since doing memory allocation
in overflow_handler can crash depending on where the NMI got triggered.
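
The rule can be sketched in user space like this. The enums, the flag
value and the function name are hypothetical stand-ins for the kernel's
definitions; only the shape of the check is taken from the description
above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's enums and flag. */
enum demo_prog_type { DEMO_PROG_KPROBE, DEMO_PROG_PERF_EVENT };
enum demo_map_type { DEMO_MAP_ARRAY, DEMO_MAP_HASH, DEMO_MAP_PERCPU_HASH };
#define DEMO_F_NO_PREALLOC 0x1u

struct demo_map {
	enum demo_map_type type;
	unsigned int flags;
};

/* Reject hash maps that allocate elements on demand for perf_event programs. */
static int demo_check_map_prog_compat(const struct demo_map *map,
				      enum demo_prog_type prog_type)
{
	int is_hash = map->type == DEMO_MAP_HASH ||
		      map->type == DEMO_MAP_PERCPU_HASH;

	if (prog_type == DEMO_PROG_PERF_EVENT && is_hash &&
	    (map->flags & DEMO_F_NO_PREALLOC))
		return -EINVAL;
	return 0;
}
```

Other program types and non-hash maps pass through unchanged, so the
restriction stays limited to the case that can allocate from NMI context.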

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c1c9e441f0f5..48c2705db22c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2511,6 +2511,20 @@ process_bpf_exit:
return 0;
 }
 
+static int check_map_prog_compatibility(struct bpf_map *map,
+   struct bpf_prog *prog)
+
+{
+   if (prog->type == BPF_PROG_TYPE_PERF_EVENT &&
+   (map->map_type == BPF_MAP_TYPE_HASH ||
+map->map_type == BPF_MAP_TYPE_PERCPU_HASH) &&
+   (map->map_flags & BPF_F_NO_PREALLOC)) {
+   verbose("perf_event programs can only use preallocated hash map\n");
+   return -EINVAL;
+   }
+   return 0;
+}
+
 /* look for pseudo eBPF instructions that access map FDs and
  * replace them with actual map pointers
  */
@@ -2518,7 +2532,7 @@ static int replace_map_fd_with_map_ptr(struct verifier_env *env)
 {
struct bpf_insn *insn = env->prog->insnsi;
int insn_cnt = env->prog->len;
-   int i, j;
+   int i, j, err;
 
for (i = 0; i < insn_cnt; i++, insn++) {
if (BPF_CLASS(insn->code) == BPF_LDX &&
@@ -2562,6 +2576,12 @@ static int replace_map_fd_with_map_ptr(struct verifier_env *env)
return PTR_ERR(map);
}
 
+   err = check_map_prog_compatibility(map, env->prog);
+   if (err) {
+   fdput(f);
+   return err;
+   }
+
/* store map pointer inside BPF_LD_IMM64 instruction */
insn[0].imm = (u32) (unsigned long) map;
insn[1].imm = ((u64) (unsigned long) map) >> 32;
-- 
2.8.0

[PATCH v2 net-next 1/6] bpf: support 8-byte metafield access

2016-08-31 Thread Alexei Starovoitov

The verifier supported only 4-byte metafields in
struct __sk_buff and struct xdp_md. The metafields in the upcoming
struct bpf_perf_event_data are 8-byte to match the register width in
struct pt_regs. Teach the verifier to recognize 8-byte metafield access.
The patch doesn't affect the safety of socket and xdp programs:
they check for 4-byte-only ctx access before these conditions are hit.
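
A user-space sketch of the opcode test after this change. The DEMO_
constants repeat the encodings from the Linux uapi BPF headers so the
sketch builds without kernel headers; the enum and function name are
hypothetical.

```c
#include <assert.h>

/* Opcode encodings as in the Linux uapi BPF headers, repeated locally. */
#define DEMO_BPF_LDX	0x01
#define DEMO_BPF_STX	0x03
#define DEMO_BPF_W	0x00
#define DEMO_BPF_H	0x08
#define DEMO_BPF_DW	0x18
#define DEMO_BPF_MEM	0x60
#define DEMO_BPF_SIZE(code)	((code) & 0x18)

enum demo_access { DEMO_NONE, DEMO_READ, DEMO_WRITE };

/* Classify a ctx access: 4-byte and 8-byte loads/stores are both handled. */
static enum demo_access demo_ctx_access_type(unsigned char code)
{
	if (code == (DEMO_BPF_LDX | DEMO_BPF_MEM | DEMO_BPF_W) ||
	    code == (DEMO_BPF_LDX | DEMO_BPF_MEM | DEMO_BPF_DW))
		return DEMO_READ;
	if (code == (DEMO_BPF_STX | DEMO_BPF_MEM | DEMO_BPF_W) ||
	    code == (DEMO_BPF_STX | DEMO_BPF_MEM | DEMO_BPF_DW))
		return DEMO_WRITE;
	return DEMO_NONE;	/* other sizes are not ctx metafield accesses */
}
```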

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index abb61f3f6900..c1c9e441f0f5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2333,7 +2333,8 @@ static int do_check(struct verifier_env *env)
if (err)
return err;
 
-   if (BPF_SIZE(insn->code) != BPF_W) {
+   if (BPF_SIZE(insn->code) != BPF_W &&
+   BPF_SIZE(insn->code) != BPF_DW) {
insn_idx++;
continue;
}
@@ -2642,9 +2643,11 @@ static int convert_ctx_accesses(struct verifier_env *env)
for (i = 0; i < insn_cnt; i++, insn++) {
u32 insn_delta, cnt;
 
-   if (insn->code == (BPF_LDX | BPF_MEM | BPF_W))
+   if (insn->code == (BPF_LDX | BPF_MEM | BPF_W) ||
+   insn->code == (BPF_LDX | BPF_MEM | BPF_DW))
type = BPF_READ;
-   else if (insn->code == (BPF_STX | BPF_MEM | BPF_W))
+   else if (insn->code == (BPF_STX | BPF_MEM | BPF_W) ||
+insn->code == (BPF_STX | BPF_MEM | BPF_DW))
type = BPF_WRITE;
else
continue;
-- 
2.8.0

[PATCH v2 net-next 5/6] samples/bpf: add perf_event+bpf example

2016-08-31 Thread Alexei Starovoitov

The bpf program is called 50 times a second and does
hashmap[kern_stackid]++.
Its primary purpose is to check that key bpf helpers like map lookup, update,
get_stackid, trace_printk and ctx access are all working.
It checks:
- PERF_COUNT_HW_CPU_CYCLES on all cpus
- PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children
- PERF_COUNT_SW_CPU_CLOCK on all cpus
- PERF_COUNT_SW_CPU_CLOCK for current process

Signed-off-by: Alexei Starovoitov 
---
 samples/bpf/Makefile           |   4 +
 samples/bpf/bpf_helpers.h      |   2 +
 samples/bpf/bpf_load.c         |   7 +-
 samples/bpf/trace_event_kern.c |  65 +
 samples/bpf/trace_event_user.c | 213 +
 5 files changed, 290 insertions(+), 1 deletion(-)
 create mode 100644 samples/bpf/trace_event_kern.c
 create mode 100644 samples/bpf/trace_event_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index db3cb061bfcd..a69cf9045285 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -25,6 +25,7 @@ hostprogs-y += test_cgrp2_array_pin
 hostprogs-y += xdp1
 hostprogs-y += xdp2
 hostprogs-y += test_current_task_under_cgroup
+hostprogs-y += trace_event
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -52,6 +53,7 @@ xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
 xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
 test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \
   test_current_task_under_cgroup_user.o
+trace_event-objs := bpf_load.o libbpf.o trace_event_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -79,6 +81,7 @@ always += test_cgrp2_tc_kern.o
 always += xdp1_kern.o
 always += xdp2_kern.o
 always += test_current_task_under_cgroup_kern.o
+always += trace_event_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -103,6 +106,7 @@ HOSTLOADLIBES_test_overhead += -lelf -lrt
 HOSTLOADLIBES_xdp1 += -lelf
 HOSTLOADLIBES_xdp2 += -lelf
 HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
+HOSTLOADLIBES_trace_event += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index bbdf62a1e45e..90f44bd2045e 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -55,6 +55,8 @@ static int (*bpf_skb_get_tunnel_opt)(void *ctx, void *md, int size) =
(void *) BPF_FUNC_skb_get_tunnel_opt;
 static int (*bpf_skb_set_tunnel_opt)(void *ctx, void *md, int size) =
(void *) BPF_FUNC_skb_set_tunnel_opt;
+static unsigned long long (*bpf_get_prandom_u32)(void) =
+   (void *) BPF_FUNC_get_prandom_u32;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 0cfda2320320..97913e109b14 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -51,6 +51,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0;
bool is_xdp = strncmp(event, "xdp", 3) == 0;
+   bool is_perf_event = strncmp(event, "perf_event", 10) == 0;
enum bpf_prog_type prog_type;
char buf[256];
int fd, efd, err, id;
@@ -69,6 +70,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
prog_type = BPF_PROG_TYPE_TRACEPOINT;
} else if (is_xdp) {
prog_type = BPF_PROG_TYPE_XDP;
+   } else if (is_perf_event) {
+   prog_type = BPF_PROG_TYPE_PERF_EVENT;
} else {
printf("Unknown event '%s'\n", event);
return -1;
@@ -82,7 +85,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 
prog_fd[prog_cnt++] = fd;
 
-   if (is_xdp)
+   if (is_xdp || is_perf_event)
return 0;
 
if (is_socket) {
@@ -326,6 +329,7 @@ int load_bpf_file(char *path)
memcmp(shname_prog, "kretprobe/", 10) == 0 ||
memcmp(shname_prog, "tracepoint/", 11) == 0 ||
memcmp(shname_prog, "xdp", 3) == 0 ||
+   memcmp(shname_prog, "perf_event", 10) == 0 ||
memcmp(shname_prog, "socket", 6) == 0)
load_and_attach(shname_prog, insns, data_prog->d_size);
}
@@ -344,6 +348,7 @@ int load_bpf_file(char *path)
memcmp(shname, "kretprobe/", 10) == 0 ||
memcmp(shname, "tracepoint/", 11) == 0 ||
memcmp(shname, "xdp", 3) == 0 ||
+   memcmp(shname, "perf_event", 10) == 0 ||

Re: [PATCH] softirq: let ksoftirqd do its job

2016-08-31 Thread Jesper Dangaard Brouer

On Wed, 31 Aug 2016 13:42:30 -0700
Eric Dumazet  wrote:

> On Wed, 2016-08-31 at 21:40 +0200, Jesper Dangaard Brouer wrote:
> 
> > I can confirm the improvement of approx 900Kpps (no wonder people have
> > been complaining about DoS against UDP/DNS servers).
> > 
> > BUT during my extensive testing, of this patch, I also think that we
> > have not gotten to the bottom of this.  I was expecting to see a higher
> > (collective) PPS number as I add more UDP servers, but I don't.
> > 
> > Running many UDP netperf's with command:
> >  super_netperf 4 -H 198.18.50.3 -l 120 -t UDP_STREAM -T 0,0 -- -m 1472 -n -N
> 
> Are you sure sender can send fast enough ?

Yes, as I can see drops (overrun UDP limit UdpRcvbufErrors). Switching
to pktgen and udp_sink to be sure.

> > 
> > With 'top' I can see ksoftirq are still getting a higher %CPU time:
> > 
> >   PID  %CPU  TIME+    COMMAND
> >     3  36.5  2:28.98  ksoftirqd/0
> > 10724   9.6  0:01.05  netserver
> > 10722   9.3  0:01.05  netserver
> > 10723   9.3  0:01.05  netserver
> > 10725   9.3  0:01.05  netserver
> 
> Looks much better on my machine, with "udprcv -n 4" (using 4 threads,
> and 4 sockets using SO_REUSEPORT)
> 
> 10755 root  20   0   34948  4  0 S  79.7  0.0   0:33.66 udprcv 
> 3 root  20   0   0  0  0 R  19.9  0.0   0:25.49 ksoftirqd/0
> 
> Pressing 'H' in top gives :
> 
> 3 root  20   0   0  0  0 R 19.9  0.0   0:47.84 ksoftirqd/0
> 10756 root  20   0   34948  4  0 R 19.9  0.0   0:30.76 udprcv 
> 10757 root  20   0   34948  4  0 R 19.9  0.0   0:30.76 udprcv 
> 10758 root  20   0   34948  4  0 S 19.9  0.0   0:30.76 udprcv
> 10759 root  20   0   34948  4  0 S 19.9  0.0   0:30.76 udprcv

Yes, I'm seeing the same when running 5 instances of my own udp_sink[1]:
 sudo taskset -c 0 ./udp_sink --port 10003 --recvmsg --reuse-port --count $((10**10))

 PID  S  %CPU TIME+  COMMAND
3 R  21.6   2:21.33  ksoftirqd/0
 3838 R  15.9   0:02.18  udp_sink
 3856 R  15.6   0:02.16  udp_sink
 3862 R  15.6   0:02.16  udp_sink
 3844 R  15.3   0:02.15  udp_sink
 3850 S  15.3   0:02.15  udp_sink

This is the expected result: adding more userspace receivers scales
up.  I needed 5 udp_sinks before I stopped seeing drops.  Either this
says the job performed by ksoftirqd is 5 times faster, or the
collective queue size of the programs was large enough to absorb the
scheduling jitter.

This run handled 1,517,248 pps without any drops, with all processes
pinned to the same CPU.

 $ nstat > /dev/null && sleep 1 && nstat
 #kernel
 IpInReceives                    1517225            0.0
 IpInDelivers                    1517224            0.0
 UdpInDatagrams                  1517248            0.0
 IpExtInOctets                   69793408           0.0
 IpExtInNoECTPkts                1517246            0.0

I'm acking this patch:

Acked-by: Jesper Dangaard Brouer 

> 
> Patch was on top of commit 071e31e254e0e0c438eecba3dba1d6e2d0da36c2

Mine on top of commit 84fd1b191a9468

> > 
> >   
> > > Since the load runs in well identified threads context, an admin can
> > > more easily tune process scheduling parameters if needed.  
> > 
> > With this patch applied, I found that changing the UDP server process,
> > scheduler policy to SCHED_RR or SCHED_FIFO gave me a performance boost
> > from 900Kpps to 1.7Mpps, and not a single UDP packet dropped (even with
> > a single UDP stream, also tested with more)
> > 
> > Command used:
> >  sudo chrt --rr -p 20 $(pgrep netserver)  
> 
> 
> Sure, this is what I mentioned in my changelog : Once we properly
> schedule and rely on ksoftirqd, tuning is available.
> 
> > 
> > The scheduling picture also changes a lot:
> > 
> >PID  %CPU   TIME+   COMMAND
> >  10783  24.3  0:21.53  netserver
> >  10784  24.3  0:21.53  netserver
> >  10785  24.3  0:21.52  netserver
> >  10786  24.3  0:21.50  netserver
> >  3   2.7  3:12.18  ksoftirqd/0
> > 


[1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH] irda: Fix likely typo in output format string

2016-08-31 Thread David Miller

From: Oleg Drokin 
Date: Wed, 31 Aug 2016 17:36:44 -0400

> 
> On Aug 31, 2016, at 5:31 PM, David Miller wrote:
> 
>> From: Oleg Drokin 
>> Date: Fri, 26 Aug 2016 23:14:06 -0400
>> 
>>> %ul would print an unsigned with a letter l at the end which does
>>> not seem to be desired here, on the other hand the value being printed
>>> is u32 so just drop the l instead of converting to %lu
>>> 
>>> Signed-off-by: Oleg Drokin 
>> 
>> %u is for unsigned values, and these are "s32" thus signed.
> 
> Hm, you are right.
> I could swear I saw them as unsigned when I looked at it.
> 
> Anyway can they really be negative? they are seconds and usec,
> should I change them to u32 too?

If you're interested in continuing with this, it is your
area for exploration, not ours :-)
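
The format-string point is easy to check in user space. A small sketch
with hypothetical helper names: "%ul" is parsed as "%u" followed by a
literal 'l', while a signed 32-bit value wants a signed conversion
(PRId32 here; in the kernel plain "%d" is the usual choice for s32).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* "%ul": "%u" consumes the argument, the trailing 'l' is printed literally. */
static int demo_format_ul(char *buf, size_t len, unsigned int v)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%ul", v);
}

/* A signed 32-bit value printed with a signed conversion. */
static int demo_format_s32(char *buf, size_t len, int32_t v)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRId32, v);
}
```

Calling demo_format_ul() with 42 yields the string "42l", which is the
stray letter the original patch description was pointing at.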

[PATCH v2 net-next 6/6] samples/bpf: add sampleip example

2016-08-31 Thread Alexei Starovoitov

From: Brendan Gregg 

sample instruction pointer and frequency count in a BPF map

Signed-off-by: Brendan Gregg 
Signed-off-by: Alexei Starovoitov 
---
 samples/bpf/Makefile        |   4 +
 samples/bpf/sampleip_kern.c |  38 +
 samples/bpf/sampleip_user.c | 196 
 3 files changed, 238 insertions(+)
 create mode 100644 samples/bpf/sampleip_kern.c
 create mode 100644 samples/bpf/sampleip_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index a69cf9045285..12b7304d55dc 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -26,6 +26,7 @@ hostprogs-y += xdp1
 hostprogs-y += xdp2
 hostprogs-y += test_current_task_under_cgroup
 hostprogs-y += trace_event
+hostprogs-y += sampleip
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -54,6 +55,7 @@ xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
 test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \
   test_current_task_under_cgroup_user.o
 trace_event-objs := bpf_load.o libbpf.o trace_event_user.o
+sampleip-objs := bpf_load.o libbpf.o sampleip_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -82,6 +84,7 @@ always += xdp1_kern.o
 always += xdp2_kern.o
 always += test_current_task_under_cgroup_kern.o
 always += trace_event_kern.o
+always += sampleip_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -107,6 +110,7 @@ HOSTLOADLIBES_xdp1 += -lelf
 HOSTLOADLIBES_xdp2 += -lelf
 HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
 HOSTLOADLIBES_trace_event += -lelf
+HOSTLOADLIBES_sampleip += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/sampleip_kern.c b/samples/bpf/sampleip_kern.c
new file mode 100644
index ..774a681f374a
--- /dev/null
+++ b/samples/bpf/sampleip_kern.c
@@ -0,0 +1,38 @@
+/* Copyright 2016 Netflix, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+#define MAX_IPS   8192
+
+struct bpf_map_def SEC("maps") ip_map = {
+   .type = BPF_MAP_TYPE_HASH,
+   .key_size = sizeof(u64),
+   .value_size = sizeof(u32),
+   .max_entries = MAX_IPS,
+};
+
+SEC("perf_event")
+int do_sample(struct bpf_perf_event_data *ctx)
+{
+   u64 ip;
+   u32 *value, init_val = 1;
+
+   ip = ctx->regs.ip;
+   value = bpf_map_lookup_elem(&ip_map, &ip);
+   if (value)
+      *value += 1;
+   else
+      /* E2BIG not tested for this example only */
+      bpf_map_update_elem(&ip_map, &ip, &init_val, BPF_NOEXIST);
+
+   return 0;
+}
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c
new file mode 100644
index ..260a6bdd6413
--- /dev/null
+++ b/samples/bpf/sampleip_user.c
@@ -0,0 +1,196 @@
+/*
+ * sampleip: sample instruction pointer and frequency count in a BPF map.
+ *
+ * Copyright 2016 Netflix, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "libbpf.h"
+#include "bpf_load.h"
+
+#define DEFAULT_FREQ   99
+#define DEFAULT_SECS   5
+#define MAX_IPS   8192
+#define PAGE_OFFSET   0x8800
+
+static int nr_cpus;
+
+static void usage(void)
+{
+   printf("USAGE: sampleip [-F freq] [duration]\n");
+   printf("   -F freq    # sample frequency (Hertz), default 99\n");
+   printf("   duration   # sampling duration (seconds), default 5\n");
+}
+
+static int sampling_start(int *pmu_fd, int freq)
+{
+   int i;
+
+   struct perf_event_attr pe_sample_attr = {
+   .type = PERF_TYPE_SOFTWARE,
+   .freq = 1,
+   .sample_period = freq,
+   .config = PERF_COUNT_SW_CPU_CLOCK,
+   .inherit = 1,
+   };
+
+   for (i = 0; i < nr_cpus; i++) {
+   pmu_fd[i] = perf_event_open(&pe_sample_attr, -1 /* pid */, i,
+   -1 /* group_fd */, 0 /* flags */);
+   if (pmu_fd[i] < 0) {
+   fprintf(stderr, "ERROR: Initializing perf sampling\n");
+   return 1;
+   }
+   assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF,
+prog_fd[0]) == 0);
+   assert(ioctl(pmu_fd[i],

Re: [PATCH V2] dt: net: enhance DWC EQoS binding to support Tegra186

2016-08-31 Thread Stephen Warren


On 08/31/2016 03:15 AM, Lars Persson wrote:

On 08/30/2016 10:50 PM, Stephen Warren wrote:

On 08/30/2016 01:01 PM, Rob Herring wrote:

On Wed, Aug 24, 2016 at 03:20:46PM -0600, Stephen Warren wrote:

The Synopsys DWC EQoS is a configurable IP block which supports multiple
options for bus type, clocking and reset structure, and feature list.
Extend the DT binding to define a "compatible value" for the configuration
contained in NVIDIA's Tegra186 SoC, and define some new properties and
list property entries required by that configuration.



diff --git
a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt



+- clock-names: May contain any/all of the following depending on the IP
+  configuration, in any order:
+The EQOS transmit path clock. The HW signal name is clk_tx_i.
+In some configurations (e.g. GMII/RGMII), this clock also drives
the PHY TX
+path. In other configurations, other clocks (such as tx_125,
rmii) may
+drive the PHY TX path.
+  - "rx"
+The EQOS receive path clock. The HW signal name is clk_rx_i.
+In some configurations (e.g. GMII/RGMII), this clock also drives
the PHY RX
+path. In other configurations, other clocks (such as rx_125,
pmarx_0,
+pmarx_1, rmii) may drive the PHY RX path.


It is not correct that clk_rx_i drives the PHY rx path for GMII/RGMII.
The PHY is the source of the rx clock for these modes.


I think both of our statements are true.

There's a clock input to the EQOS module (clk_rx_i) that does drive the 
RX path in the EQOS module.


That clock also drives the PHY's RX path.

Those statements make no comment regarding the /source/ of that clock; 
either of the following might be true:


1) The PHY could generate the clock internally somehow, feed its own 
internal logic with that clock, and send the clock out to feed the EQOS 
RX path too.


or,

2)  SoC integration could drive the same clock into both the EQOS and 
PHY modules, so that both sets of logic are fed from the same external 
clock.


Perhaps the phrase "PHY RX path" is confusing; I was talking about the 
EQOS modules' RX path from the PHY more than the PHY itself, although 
given what I said above I believe either interpretation is valid and 
correct.



Will the driver need to make any clock ops on the "rx" clock ?


Yes. The EQOS driver needs to ensure that the clock is running before 
attempting to receive data from the PHY, otherwise the EQOS's own RX 
logic won't be clocked.


Whether the phandle for this clock points at a SoC-level provider (it 
will in Tegra) or a clock provider in the PHY (it might in other SoCs), 
shouldn't matter as far as the DT binding goes, although it might affect 
device probe ordering in some implementations.



+  Note: Support for additional IP configurations may require adding the
+  following clocks to this list in the future: clk_rx_125_i, clk_tx_125_i,
+  clk_pmarx_0_i, clk_pmarx1_i, clk_rmii_i, clk_revmii_rx_i, clk_revmii_tx_i.
+
+  The following compatible values require the following set of clocks:
+  - "nvidia,tegra186-eqos", "snps,dwc-qos-ethernet-4.10":
+- "slave_bus"
+- "master_bus"
+- "rx"
+- "tx"
+- "ptp_ref"
+  - "axis,artpec6-eqos", "snps,dwc-qos-ethernet-4.10":
+- "phy_ref_clk"
+- "apb_clk"


It would be good if this was marked deprecated and the full set of
clocks could be described and supported. Not sure if you can figure that
out. Is it really only 2 clocks, or these have multiple connections to
the same source.


Lars, can you answer here?

I deliberately didn't attempt to change the binding definition for the
existing use-case, since I'm not familiar with that SoC, and don't
relish changing DTs for a platform I can't test.


For the artpec-6 the clocks are like this:
apb_clk: It is both the master and slave bus clock.
phy_ref_clk: It corresponds to tx clock in the proposed new binding.

There is also a ptp reference clock that will map to the new ptp_ref
clock binding.

So the full set of clocks in a new artpec-6 binding is:
slave_bus
master_bus
tx
ptp_ref


Given the discussion above, I think we should represent the rx clock too.

Re: [RESEND PATCH 3/4] arm64: dts: rockchip: support gmac for rk3399

2016-08-31 Thread Doug Anderson

Hi,

On Wed, Aug 31, 2016 at 2:29 PM, Heiko Stübner  wrote:
>> IMHO it would be nice if this were broken into two patches.
>>
>> 1. First patch would be the power domain patch and that could land any
>> time.  You wouldn't actually be able to use the gmac but at least
>> you'd be able to turn off its power.  This would be a handy patch to
>> be able to backport if you happened to not need Ethernet support but
>> wanted to save power.
>>
>> 2. Second patch would actually add the gmac.
>
> according to my talk with Caesar in the real v1, the gmac even with power-
> domains should work just nicely even without the dts patches, as the driver
> core takes care of powering up the pd before probe.
>
> But I may miss some peculiarity of the dwmac?

Nothing that I'm terribly aware of.  I was just being selfish because:

1. I'm on a board where I don't need Ethernet.

2. I'm running a semi old kernel (4.4)

3. I don't want to pick back the various fixes that might be needed to
make gmac work on rk3399 to that old kernel.

4. I want it very obvious that gmac isn't really supported on this old
kernel on rk3399 (and having stmmac not in the device tree would make
it very obvious)

5. I do want the power savings of turning the power domains off for the gmac.

If this patch is broken in two then I can pick back just the power
domain patch.  :-P

-Doug

Re: [PATCH] irda: Fix likely typo in output format string

2016-08-31 Thread Oleg Drokin

On Aug 31, 2016, at 5:31 PM, David Miller wrote:

> From: Oleg Drokin 
> Date: Fri, 26 Aug 2016 23:14:06 -0400
> 
>> %ul would print an unsigned with a letter l at the end which does
>> not seem to be desired here, on the other hand the value being printed
>> is u32 so just drop the l instead of converting to %lu
>> 
>> Signed-off-by: Oleg Drokin 
> 
> %u is for unsigned values, and these are "s32" thus signed.

Hm, you are right.
I could swear I saw them as unsigned when I looked at it.

Anyway can they really be negative? they are seconds and usec,
should I change them to u32 too?

Re: [PATCH] irda: Fix likely typo in output format string

2016-08-31 Thread David Miller

From: Oleg Drokin 
Date: Fri, 26 Aug 2016 23:14:06 -0400

> %ul would print an unsigned with a letter l at the end which does
> not seem to be desired here, on the other hand the value being printed
> is u32 so just drop the l instead of converting to %lu
> 
> Signed-off-by: Oleg Drokin 

%u is for unsigned values, and these are "s32" thus signed.
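
The format-string point can be shown with a minimal userspace sketch. The helper names below are made up for illustration, and the kernel's s32 is approximated with int32_t: "%ul" is parsed as the "%u" conversion followed by a literal 'l', while a signed 32-bit value wants a signed conversion ("%d" in the kernel).

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: uses the mistaken "%ul" format.
 * printf parses this as "%u" followed by a literal 'l'. */
int format_with_ul(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%ul", v);
}

/* Hypothetical helper: a signed 32-bit value printed with a signed
 * conversion (PRId32 is the portable userspace spelling; kernel code
 * would simply use "%d" for an s32). */
int format_s32(char *buf, size_t len, int32_t v)
{
	return snprintf(buf, len, "%" PRId32, v);
}
```

Calling format_with_ul() with the value 5 produces the string "5l", which is the stray letter the patch description mentions; format_s32() with -7 produces "-7".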

Re: [RESEND PATCH 3/4] arm64: dts: rockchip: support gmac for rk3399

2016-08-31 Thread Heiko Stübner

Am Mittwoch, 31. August 2016, 13:42:17 schrieb Doug Anderson:
> Caesar,
> 
> On Tue, Aug 30, 2016 at 11:13 PM, Caesar Wang  wrote:
> > This patch adds needed gmac information for rk3399,
> > also support the gmac pd.
> > 
> > Signed-off-by: Roger Chen 
> > Signed-off-by: Caesar Wang 
> > ---
> > 
> >  arch/arm64/boot/dts/rockchip/rk3399.dtsi | 90
> >   1 file changed, 90 insertions(+)
> 
> I noticed that your subject for this patch contains "RESEND" and not
> "v2" even though there are changes between this version and the last
> one.  That's really confusing.  This should have been "v2" and the
> next version should be "v3".
> 
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> > b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index 32aebc8..abf27a4 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> > +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> > @@ -200,6 +200,26 @@
> > 
> > };
> > 
> > };
> > 
> > +   gmac: eth@fe30 {
> 
> nit: on rk3288 the node was "ethernet@" instead of "eth@".  Presumably
> "ethernet" is more correct?
> 
> > +   compatible = "rockchip,rk3399-gmac";
> > +   reg = <0x0 0xfe30 0x0 0x1>;
> > +   interrupts = ;
> > +   interrupt-names = "macirq";
> > +   clocks = < SCLK_MAC>, < SCLK_MAC_RX>,
> > +< SCLK_MAC_TX>, < SCLK_MACREF>,
> > +< SCLK_MACREF_OUT>, < ACLK_GMAC>,
> > +< PCLK_GMAC>;
> > +   clock-names = "stmmaceth", "mac_clk_rx",
> > + "mac_clk_tx", "clk_mac_ref",
> > + "clk_mac_refout", "aclk_mac",
> > + "pclk_mac";
> > +   power-domains = < RK3399_PD_GMAC>;
> > +   resets = < SRST_A_GMAC>;
> > +   reset-names = "stmmaceth";
> > +   rockchip,grf = <>;
> > +   status = "disabled";
> > +   };
> > +
> > 
> > sdio0: dwmmc@fe31 {
> > 
> > compatible = "rockchip,rk3399-dw-mshc",
> > 
> >  "rockchip,rk3288-dw-mshc";
> > 
> > @@ -611,6 +631,11 @@
> > 
> > status = "disabled";
> > 
> > };
> > 
> > +   qos_gmac: qos@ffa5c000 {
> > +   compatible = "syscon";
> > +   reg = <0x0 0xffa5c000 0x0 0x20>;
> > +   };
> > +
> > 
> > qos_hdcp: qos@ffa9 {
> > 
> > compatible = "syscon";
> > reg = <0x0 0xffa9 0x0 0x20>;
> > 
> > @@ -704,6 +729,11 @@
> > 
> > #size-cells = <0>;
> > 
> > /* These power domains are grouped by VD_CENTER */
> > 
> > +   pd_gmac@RK3399_PD_GMAC {
> 
> RK3399_PD_GMAC is not in VD_CENTER but in VD_LOGIC, right?  ...so this
> should move.
> 
> > +   reg = ;
> > +   clocks = < ACLK_GMAC>;
> > +   pm_qos = <_gmac>;
> > +   };
> 
> IMHO it would be nice if this were broken into two patches.
> 
> 1. First patch would be the power domain patch and that could land any
> time.  You wouldn't actually be able to use the gmac but at least
> you'd be able to turn off its power.  This would be a handy patch to
> be able to backport if you happened to not need Ethernet support but
> wanted to save power.
> 
> 2. Second patch would actually add the gmac.

according to my talk with Caesar in the real v1, the gmac even with power-
domains should work just nicely even without the dts patches, as the driver 
core takes care of powering up the pd before probe.

But I may miss some peculiarity of the dwmac?


Heiko

Re: [PATCH net v3 0/9] net: ethernet: mediatek: a couple of fixes

2016-08-31 Thread David Miller

From: 
Date: Tue, 30 Aug 2016 10:59:16 +0800

> a couple of fixes come out from integrating with linux-4.8 rc1
> they all are verified and workable on linux-4.8 rc1

I get rejects when I try to apply this to the current tree.

Re: [RFC] xgbe: constify get_netdev_ops and get_ethtool_ops

2016-08-31 Thread David Miller

From: Stephen Hemminger 
Date: Wed, 31 Aug 2016 08:57:36 -0700

> Casting away const is bad practice. Since this is an ARM-specific driver,
> I don't have hardware to actually test this.
> 
> Having getter functions for ops is really unnecessary code bloat, but
> not going to touch that.
> 
> Signed-off-by: Stephen Hemminger 

I'll just apply this, let's see what happens.
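
As a rough illustration of the constification idea (not the xgbe code itself), the sketch below uses a hypothetical demo_ops table and getter. The table is declared const, and the getter returns a pointer to const, so no caller has to cast the qualifier away.

```c
/* Hypothetical ops table standing in for something like ethtool_ops. */
struct demo_ops {
	int (*get_link)(void);
};

static int demo_get_link(void)
{
	return 1;	/* pretend the link is up */
}

/* Declared const so the table can be placed in read-only data. */
static const struct demo_ops demo_ops_table = {
	.get_link = demo_get_link,
};

/* The getter returns a pointer to const, so callers never need a cast. */
const struct demo_ops *demo_get_ops(void)
{
	return &demo_ops_table;
}
```

A caller would hold the result in a `const struct demo_ops *` and invoke `ops->get_link()`; any attempt to write through the pointer is then rejected at compile time.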

Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-31 Thread Andrew Lunn

> >You can just as easily find the child node called ethernet-phy.
> 
> As Andrew pointed out, using phy-handle allows me to place the phy
> node anywhere.
> 
> I've already made changes to this design, and every change has
> raised objections.  I don't see anything wrong with phy-handle.  A
> lot of drivers use it.

Agreed. You should do the same as what all other drivers do.

Andrew

Re: [PATCH net-next v2 0/3] net: dsa: add MDB support

2016-08-31 Thread David Miller

From: Andrew Lunn 
Date: Wed, 31 Aug 2016 18:04:05 +0200

> On Wed, Aug 31, 2016 at 11:50:02AM -0400, Vivien Didelot wrote:
>> This patchset adds the switchdev MDB object support to the DSA layer.
>> 
>> The MDB support for the mv88e6xxx driver is very similar to the FDB
>> support. The FDB operations care about unicast addresses while the MDB
>> operations care about multicast addresses.
>> 
>> Both operation set load/purge/dump the Address Translation Table (ATU),
>> thus common code is used.
> 
> Reviewed-by: Andrew Lunn 

Series applied, thanks everyone.

Re: [PATCH net-next V4 00/10] liquidio CN23XX support

2016-08-31 Thread David Miller

From: Raghu Vatsavayi 
Date: Wed, 31 Aug 2016 11:03:19 -0700

> Following patchset adds support for new device "CN23XX" in
> liquidio family of adapters. As advised by you I have split
> the previous V3 patch of 18 patches into two halves. This
> first patchset has first 10 patches, which are tested against
> net-next. I will post the second half after this one.
> 
> This V4 patch also addressed all the comments from previous
> submission:
> 1) Avoid busy loop while reading registers.
> 2) Other minor comments about debug messages and constants.
> 
> Please apply patches in following order as some of the
> patches depend on earlier patches.

Series applied, thanks.

Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

2016-08-31 Thread David Howells

Arnd Bergmann  wrote:

> Right, sorry about that. Do you want me to resend the fixed version,
> or do you apply and fix it yourself?

I can fix it up myself.  I'll pull it into my tree when I've finished doing
the fixing up I'm currently working on.

David

Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-31 Thread Timur Tabi

Rob Herring wrote:

It's not a generic phy.  It's a funky "internal phy" that differs among
>SOCs.  I call it the internal phy, but I could use another name. Internally,
>some people call it the "sgmii phy", but I don't think that's accurate.

Funky internal PHYs are precisely the types of PHYs this binding is
for. It is generic in that the type is not defined. It can be USB,
HDMI, DSI, LVDS, etc.

I don't understand what you're getting at.  There are two IP blocks that 
have a private interconnect.  One is the MAC, and the other is an 
internal PHY, but the driver programs them as one device.

If you want me to make some kind of change, you're going to have to be 
more specific.

>That's what I thought too, but without it, of_phy_find_device() won't work.
>I need a pointer to the phy node, and I use of_parse_phandle() to get it:
>
> struct device_node *phy_np;
>
> ret = of_mdiobus_register(mii_bus, np);
> if (ret) {
> dev_err(>dev, "could not register mdio bus\n");
> return ret;
> }
>
> phy_np = of_parse_phandle(np, "phy-handle", 0);

You can just as easily find the child node called ethernet-phy.

As Andrew pointed out, using phy-handle allows me to place the phy node 
anywhere.

I've already made changes to this design, and every change has raised 
objections.  I don't see anything wrong with phy-handle.  A lot of 
drivers use it.

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [PATCH 3/5] rxrpc: fix last_call processing

2016-08-31 Thread David Howells

Arnd Bergmann  wrote:

> I'll follow up with the fixes, both of which are rather
> straightforward.

Are they both in?

[PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

David

Re: [PATCH] ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit()

2016-08-31 Thread David Miller

From: Eli Cooper 
Date: Fri, 26 Aug 2016 23:52:29 +0800

> @@ -1174,6 +1174,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
>   encap_limit = t->parms.encap_limit;
>  
>   memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
> + fl6.flowi6_proto = IPPROTO_IPIP;

Let's just simply have t->fl have the proto setup properly, just like
in GRE.

Assigning it explicitly every packet transmit doesn't make much sense.

Re: possible circular locking dependency detected (bisected)

2016-08-31 Thread Rainer Weikusat

CAI Qian  writes:
> Reverted the patch below fixes this problem.
>
> c845acb324aa85a39650a14e7696982ceea75dc1
> af_unix: Fix splice-bind deadlock

Reverting a patch fixing one deadlock in order to avoid another deadlock
leaves the 'net situation' unchanged. The idea of the other patch was to
change unix_mknod such that it doesn't do __sb_start_write with
u->readlock held anymore. As far as I understand the output below,
overlayfs introduces an additional codepath where unix_mknod ends up doing
__sb_start_write again. That's already the original deadlock re-added,
cf,

B: splice() from a pipe to /mnt/regular_file
does sb_start_write() on /mnt
C: try to freeze /mnt
wait for B to finish with /mnt
A: bind() try to bind our socket to /mnt/new_socket_name
lock our socket, see it not bound yet
decide that it needs to create something in /mnt
try to do sb_start_write() on /mnt, block (it's
waiting for C).
D: splice() from the same pipe to our socket
lock the pipe, see that socket is connected
try to lock the socket, block waiting for A
B:  get around to actually feeding a chunk from
pipe to file, try to lock the pipe.  Deadlock.


as A will again acquire the readlock and then call __sb_start_write.

>
>CAI Qian
>
> - Original Message -
>> From: "CAI Qian" 
>> To: secur...@kernel.org
>> Cc: "Miklos Szeredi" , "Eric Sandeen" 
>> 
>> Sent: Tuesday, August 30, 2016 5:05:45 PM
>> Subject: Re: possible circular locking dependency detected
>> 
>> FYI, this one can only be reproduced using the overlayfs docker backend.
>> The device-mapper works fine. The XFS below has ftype=1.
>> 
>> # cp recvmsg01 /mnt
>> # docker run -it -v /mnt/:/mnt/ rhel7 bash
>> [root@c33c99aedd93 /]# mount
>> overlay on / type overlay
>> (rw,relatime,seclabel,lowerdir=l/I5VXL74ENBNAEARZ4M2SIN3XD6:l/KZGBKPXLDXUGHYWMERFUBM4FRP,upperdir=9a7c1f735166b1f63d220b4b6c59cc37f3922719ef810c97182b814c1ab336df/diff,workdir=9a7c1f735166b1f63d220b4b6c59cc37f3922719ef810c97182b814c1ab336df/work)
>> ...
>> [root@c33c99aedd93 /]# /mnt/recvmsg01
>> CAI Qian
>> 
>> - Original Message -
>> > From: "CAI Qian" 
>> > To: secur...@kernel.org
>> > Sent: Friday, August 26, 2016 10:50:57 AM
>> > Subject: possible circular locking dependency detected
>> > 
>> > FYI, just want to give a heads-up to see if there is anything obvious so
>> > we can avoid a possible DoS somehow.
>> > 
>> > Running the LTP syscalls tests inside a container until this test trigger
>> > below,
>> > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/recvmsg/recvmsg01.c
>> > 
>> > [ 4441.904103] open04 (42409) used greatest stack depth: 20552 bytes left
>> > [ 4605.419167]
>> > [ 4605.420831] ==
>> > [ 4605.427727] [ INFO: possible circular locking dependency detected ]
>> > [ 4605.434720] 4.8.0-rc3+ #3 Not tainted
>> > [ 4605.438803] ---
>> > [ 4605.445796] recvmsg01/42878 is trying to acquire lock:
>> > [ 4605.451528]  (sb_writers#8){.+.+.+}, at: []
>> > __sb_start_write+0xb4/0xf0
>> > [ 4605.460642]
>> > [ 4605.460642] but task is already holding lock:
>> > [ 4605.467150]  (>readlock){+.+.+.}, at: []
>> > unix_bind+0x299/0xdf0
>> > [ 4605.475749]
>> > [ 4605.475749] which lock already depends on the new lock.
>> > [ 4605.475749]
>> > [ 4605.484882]
>> > [ 4605.484882] the existing dependency chain (in reverse order) is:
>> > [ 4605.493234]
>> > [ 4605.493234] -> #2 (>readlock){+.+.+.}:
>> > [ 4605.497943][] lock_acquire+0x1fa/0x440
>> > [ 4605.504659][]
>> > mutex_lock_interruptible_nested+0xdd/0x920
>> > [ 4605.513119][] unix_bind+0x299/0xdf0
>> > [ 4605.519540][] SYSC_bind+0x1d8/0x240
>> > [ 4605.525964][] SyS_bind+0xe/0x10
>> > [ 4605.531998][] do_syscall_64+0x1a6/0x500
>> > [ 4605.538811][] return_from_SYSCALL_64+0x0/0x7a
>> > [ 4605.546203]
>> > [ 4605.546203] -> #1 (>i_mutex_dir_key#3/1){+.+.+.}:
>> > [ 4605.552292][] lock_acquire+0x1fa/0x440
>> > [ 4605.559002][] down_write_nested+0x5e/0xe0
>> > [ 4605.566008][] filename_create+0x155/0x470
>> > [ 4605.573013][] SyS_mkdir+0xaf/0x1f0
>> > [ 4605.579339][]
>> > entry_SYSCALL_64_fastpath+0x1f/0xbd
>> > [ 4605.587119]
>> > [ 4605.587119] -> #0 (sb_writers#8){.+.+.+}:
>> > [ 4605.591835][] __lock_acquire+0x3043/0x3dd0
>> > [ 4605.598935][] lock_acquire+0x1fa/0x440
>> > [ 4605.605646][] percpu_down_read+0x4f/0xa0
>> > [ 4605.612552][] __sb_start_write+0xb4/0xf0
>> > [ 4605.619459][] mnt_want_write+0x41/0xb0
>> > [ 4605.626173][] ovl_want_write+0x76/0xa0
>> > [overlay]
>> > [ 4605.633860][] ovl_create_object+0xa3/0x2d0
>> > [overlay]
>> > [

Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

2016-08-31 Thread David Miller

From: David Howells 
Date: Wed, 31 Aug 2016 21:25:46 +0100

> Is there a 1/2 somewhere?  I don't see it.

It was an NFSv4 patch.

Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-31 Thread Rob Herring

On Wed, Aug 31, 2016 at 10:11 AM, Timur Tabi  wrote:
> Rob Herring wrote:
>
>>> +   internal-phy = <_sgmii>;
>>
>>
>> Can't this use the standard generic phy binding (i.e. 'phys'). It's a
>> bit confusing as there's the ethernet phy binding (phy-handle) and the
>> generic one.
>
>
> It's not a generic phy.  It's a funky "internal phy" that differs among
> SOCs.  I call it the internal phy, but I could use another name. Internally,
> some people call it the "sgmii phy", but I don't think that's accurate.

Funky internal PHYs are precisely the types of PHYs this binding is
for. It is generic in that the type is not defined. It can be USB,
HDMI, DSI, LVDS, etc.

>
> I can call it "emac-phy", but I don't know if that's any better.
>
>>> +   phy-handle = <>;
>>
>>
>> This is bit redundant as the phy is the child node. I guess if you had
>> multiple devices on the mdio bus you would need it. I'd drop it if you
>> don't envision needing it and the kernel doesn't require it.
>
>
> That's what I thought too, but without it, of_phy_find_device() won't work.
> I need a pointer to the phy node, and I use of_parse_phandle() to get it:
>
> struct device_node *phy_np;
>
> ret = of_mdiobus_register(mii_bus, np);
> if (ret) {
> dev_err(>dev, "could not register mdio bus\n");
> return ret;
> }
>
> phy_np = of_parse_phandle(np, "phy-handle", 0);

You can just as easily find the child node called ethernet-phy.

> adpt->phydev = of_phy_find_device(phy_np);
>
>>> +
>>> +   #address-cells = <1>;
>>> +   #size-cells = <0>;
>>> +   phy0: ethernet-phy@0 {
>>
>>
>> It's just an example, but don't we require compatible strings for phys
>> now?
>
>
> Nope. I had a compatible property, but it broke of_mdiobus_child_is_phy().
> I don't want to specify what kind of phy it is.  I want to let phylib figure
> it out.

Okay, I'll defer to the mdio folks.

Rob

Re: [PATCH] softirq: let ksoftirqd do its job

2016-08-31 Thread Eric Dumazet

On Wed, 2016-08-31 at 21:40 +0200, Jesper Dangaard Brouer wrote:

> I can confirm the improvement of approx 900Kpps (no wonder people have
> been complaining about DoS against UDP/DNS servers).
> 
> BUT during my extensive testing, of this patch, I also think that we
> have not gotten to the bottom of this.  I was expecting to see a higher
> (collective) PPS number as I add more UDP servers, but I don't.
> 
> Running many UDP netperf's with command:
>  super_netperf 4 -H 198.18.50.3 -l 120 -t UDP_STREAM -T 0,0 -- -m 1472 -n -N

Are you sure sender can send fast enough ?

> 
> With 'top' I can see ksoftirq are still getting a higher %CPU time:
> 
>   PID   %CPU  TIME+    COMMAND
>     3   36.5  2:28.98  ksoftirqd/0
> 10724    9.6  0:01.05  netserver
> 10722    9.3  0:01.05  netserver
> 10723    9.3  0:01.05  netserver
> 10725    9.3  0:01.05  netserver

Looks much better on my machine, with "udprcv -n 4" (using 4 threads,
and 4 sockets using SO_REUSEPORT)

10755 root  20   0   34948  4  0 S  79.7  0.0   0:33.66 udprcv
    3 root  20   0       0  0  0 R  19.9  0.0   0:25.49 ksoftirqd/0

Pressing 'H' in top gives :

    3 root  20   0       0  0  0 R 19.9  0.0   0:47.84 ksoftirqd/0
10756 root  20   0   34948  4  0 R 19.9  0.0   0:30.76 udprcv
10757 root  20   0   34948  4  0 R 19.9  0.0   0:30.76 udprcv
10758 root  20   0   34948  4  0 S 19.9  0.0   0:30.76 udprcv
10759 root  20   0   34948  4  0 S 19.9  0.0   0:30.76 udprcv


Patch was on top of commit 071e31e254e0e0c438eecba3dba1d6e2d0da36c2
  
> 
> 
> > Since the load runs in well identified threads context, an admin can
> > more easily tune process scheduling parameters if needed.
> 
> With this patch applied, I found that changing the UDP server process,
> scheduler policy to SCHED_RR or SCHED_FIFO gave me a performance boost
> from 900Kpps to 1.7Mpps, and not a single UDP packet dropped (even with
> a single UDP stream, also tested with more)
> 
> Command used:
>  sudo chrt --rr -p 20 $(pgrep netserver)


Sure, this is what I mentioned in my changelog : Once we properly
schedule and rely on ksoftirqd, tuning is available.
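
The chrt invocation quoted above can also be done from a program. The following is a small sketch using the standard sched_setscheduler() call; the wrapper names are hypothetical, and changing another process's policy normally requires CAP_SYS_NICE or root.

```c
#include <sched.h>
#include <sys/types.h>

/* Return 1 if prio lies within the valid SCHED_RR priority range. */
int rr_priority_is_valid(int prio)
{
	int lo = sched_get_priority_min(SCHED_RR);
	int hi = sched_get_priority_max(SCHED_RR);

	return lo >= 0 && hi >= 0 && prio >= lo && prio <= hi;
}

/* Switch the process identified by pid (0 means the caller) to SCHED_RR
 * at the given priority. Returns 0 on success, -1 with errno set
 * (typically EPERM when the caller lacks the needed privilege). */
int set_round_robin(pid_t pid, int prio)
{
	struct sched_param param = { .sched_priority = prio };

	return sched_setscheduler(pid, SCHED_RR, &param);
}
```

Under these assumptions, set_round_robin(pid, 20) is the programmatic counterpart of "chrt --rr -p 20 pid" for a single process.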

> 
> The scheduling picture also changes a lot:
> 
>PID  %CPU   TIME+   COMMAND
>  10783  24.3  0:21.53  netserver
>  10784  24.3  0:21.53  netserver
>  10785  24.3  0:21.52  netserver
>  10786  24.3  0:21.50  netserver
>  3   2.7  3:12.18  ksoftirqd/0
> 
>

Re: [RESEND PATCH 3/4] arm64: dts: rockchip: support gmac for rk3399

2016-08-31 Thread Doug Anderson

Caesar,

On Tue, Aug 30, 2016 at 11:13 PM, Caesar Wang  wrote:
> This patch adds needed gmac information for rk3399,
> also support the gmac pd.
>
> Signed-off-by: Roger Chen 
> Signed-off-by: Caesar Wang 
> ---
>
>  arch/arm64/boot/dts/rockchip/rk3399.dtsi | 90 
> 
>  1 file changed, 90 insertions(+)

I noticed that your subject for this patch contains "RESEND" and not
"v2" even though there are changes between this version and the last
one.  That's really confusing.  This should have been "v2" and the
next version should be "v3".

> diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi 
> b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> index 32aebc8..abf27a4 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> @@ -200,6 +200,26 @@
> };
> };
>
> +   gmac: eth@fe30 {

nit: on rk3288 the node was "ethernet@" instead of "eth@".  Presumably
"ethernet" is more correct?

> +   compatible = "rockchip,rk3399-gmac";
> +   reg = <0x0 0xfe30 0x0 0x1>;
> +   interrupts = ;
> +   interrupt-names = "macirq";
> +   clocks = < SCLK_MAC>, < SCLK_MAC_RX>,
> +< SCLK_MAC_TX>, < SCLK_MACREF>,
> +< SCLK_MACREF_OUT>, < ACLK_GMAC>,
> +< PCLK_GMAC>;
> +   clock-names = "stmmaceth", "mac_clk_rx",
> + "mac_clk_tx", "clk_mac_ref",
> + "clk_mac_refout", "aclk_mac",
> + "pclk_mac";
> +   power-domains = < RK3399_PD_GMAC>;
> +   resets = < SRST_A_GMAC>;
> +   reset-names = "stmmaceth";
> +   rockchip,grf = <>;
> +   status = "disabled";
> +   };
> +
> sdio0: dwmmc@fe31 {
> compatible = "rockchip,rk3399-dw-mshc",
>  "rockchip,rk3288-dw-mshc";
> @@ -611,6 +631,11 @@
> status = "disabled";
> };
>
> +   qos_gmac: qos@ffa5c000 {
> +   compatible = "syscon";
> +   reg = <0x0 0xffa5c000 0x0 0x20>;
> +   };
> +
> qos_hdcp: qos@ffa9 {
> compatible = "syscon";
> reg = <0x0 0xffa9 0x0 0x20>;
> @@ -704,6 +729,11 @@
> #size-cells = <0>;
>
> /* These power domains are grouped by VD_CENTER */
> +   pd_gmac@RK3399_PD_GMAC {

RK3399_PD_GMAC is not in VD_CENTER but in VD_LOGIC, right?  ...so this
should move.

> +   reg = ;
> +   clocks = < ACLK_GMAC>;
> +   pm_qos = <_gmac>;
> +   };

IMHO it would be nice if this were broken into two patches.

1. First patch would be the power domain patch and that could land any
time.  You wouldn't actually be able to use the gmac but at least
you'd be able to turn off its power.  This would be a handy patch to
be able to backport if you happened to not need Ethernet support but
wanted to save power.

2. Second patch would actually add the gmac.

Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

2016-08-31 Thread Arnd Bergmann

On Wednesday, August 31, 2016 9:26:21 PM CEST David Howells wrote:
> Arnd Bergmann  wrote:
> 
> > + } else {
> > + sched = 0;
> 
> That should be false, not 0, btw.
> 

Right, sorry about that. Do you want me to resend the fixed version,
or do you apply and fix it yourself?

As patch 1/2 isn't actually meant for net-next anyway, the series
doesn't need to stay together.

Arnd

Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

2016-08-31 Thread Arnd Bergmann

On Wednesday, August 31, 2016 9:25:46 PM CEST David Howells wrote:
> Is there a 1/2 somewhere?  I don't see it.
> 
> David

Sorry, mixed up the Cc list. It only went to netdev and lkml and isn't
really related. That one was a workaround for a false-positive
 -Wmaybe-uninitialized warning in NFS, see https://lkml.org/lkml/2016/8/31/412

Arnd

Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

2016-08-31 Thread David Howells

Is there a 1/2 somewhere?  I don't see it.

David

Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

2016-08-31 Thread David Howells

Arnd Bergmann  wrote:

> + } else {
> + sched = 0;

That should be false, not 0, btw.

David

Re: [PATCH 0/6] constify ethtool_ops structures

2016-08-31 Thread Julia Lawall



On Wed, 31 Aug 2016, Stephen Hemminger wrote:

> On Wed, 31 Aug 2016 09:30:42 +0200
> Julia Lawall  wrote:
>
> > Constify ethtool_ops structures.
> >
> > ---
> >
> >  drivers/net/ethernet/agere/et131x.c  |2 +-
> >  drivers/net/ethernet/broadcom/bcmsysport.c   |2 +-
> >  drivers/net/ethernet/broadcom/genet/bcmgenet.c   |2 +-
> >  drivers/net/ethernet/hisilicon/hip04_eth.c   |2 +-
> >  drivers/net/ethernet/hisilicon/hisi_femac.c  |2 +-
> >  drivers/net/ethernet/hisilicon/hix5hd2_gmac.c|2 +-
> >  drivers/net/ethernet/hisilicon/hns/hns_ethtool.c |2 +-
> >  drivers/net/ethernet/synopsys/dwc_eth_qos.c  |2 +-
> >  drivers/staging/slicoss/slicoss.c|4 ++--
> >  9 files changed, 10 insertions(+), 10 deletions(-)
> > ___
> > devel mailing list
> > de...@linuxdriverproject.org
> > http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
>
> Other drivers with same type of issue
>
>
> drivers/net/ethernet/mediatek/mtk_eth_soc.c:static struct ethtool_ops mtk_ethtool_ops = {
> drivers/net/ethernet/synopsys/dwc_eth_qos.c:static struct ethtool_ops dwceqos_ethtool_ops = {
> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:static struct ethtool_ops axienet_ethtool_ops = {
> drivers/net/usb/r8152.c:static struct ethtool_ops ops = {
> drivers/staging/netlogic/xlr_net.c:static struct ethtool_ops xlr_ethtool_ops = {

Thanks.  Probably they don't compile for x86, or at least not with make
allyesconfig.  I can check on them.

julia

Re: [PATCH] softirq: let ksoftirqd do its job

2016-08-31 Thread Jesper Dangaard Brouer

On Wed, 31 Aug 2016 10:42:29 -0700
Eric Dumazet  wrote:

> From: Eric Dumazet 
> 
> A while back, Paolo and Hannes sent an RFC patch adding threaded-able
> napi poll loop support : (https://patchwork.ozlabs.org/patch/620657/) 
> 
> The problem seems to be that softirqs are very aggressive and are often
> handled by the current process, even if we are under stress and that
> ksoftirqd was scheduled, so that innocent threads would have more chance
> to make progress.
> 
> This patch makes sure that if ksoftirq is running, we let it
> perform the softirq work.
> 
> Jonathan Corbet summarized the issue in https://lwn.net/Articles/687617/
> 
> Tested:
> 
>  - NIC receiving traffic handled by CPU 0
>  - UDP receiver running on CPU 0, using a single UDP socket.
>  - Incoming flood of UDP packets targeting the UDP socket.
> 
> Before the patch, the UDP receiver could almost never get cpu cycles and
> could only receive ~2,000 packets per second.
> 
> After the patch, cpu cycles are split 50/50 between user application and
> ksoftirqd/0, and we can effectively read ~900,000 packets per second,
> a huge improvement in DOS situation. (Note that more packets are now
> dropped by the NIC itself, since the BH handlers get less cpu cycles to
> drain RX ring buffer)

I can confirm the improvement of approx 900Kpps (no wonder people have
been complaining about DoS against UDP/DNS servers).

BUT during my extensive testing of this patch, I also came to think that we
have not gotten to the bottom of this.  I was expecting to see a higher
(collective) PPS number as I add more UDP servers, but I don't.

Running many UDP netperf's with command:
 super_netperf 4 -H 198.18.50.3 -l 120 -t UDP_STREAM -T 0,0 -- -m 1472 -n -N

With 'top' I can see that ksoftirqd is still getting a higher %CPU time:

   PID  %CPU   TIME+   COMMAND
     3  36.5  2:28.98  ksoftirqd/0
 10724   9.6  0:01.05  netserver
 10722   9.3  0:01.05  netserver
 10723   9.3  0:01.05  netserver
 10725   9.3  0:01.05  netserver


> Since the load runs in well identified threads context, an admin can
> more easily tune process scheduling parameters if needed.

With this patch applied, I found that changing the UDP server process's
scheduler policy to SCHED_RR or SCHED_FIFO gave me a performance boost
from 900Kpps to 1.7Mpps, and not a single UDP packet dropped (even with
a single UDP stream, also tested with more).

Command used:
 sudo chrt --rr -p 20 $(pgrep netserver)
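
For reference, roughly the same policy change can be made from C instead
of via chrt. This is only a minimal sketch, assuming Linux and the POSIX
sched_setscheduler() interface; the helper names rr_priority_is_valid()
and set_round_robin() are made up for illustration and are not part of
netperf or of the patch:

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>

/* Return 1 if prio is a valid static priority for SCHED_RR on this system. */
static int rr_priority_is_valid(int prio)
{
	int lo = sched_get_priority_min(SCHED_RR);
	int hi = sched_get_priority_max(SCHED_RR);

	return lo != -1 && hi != -1 && prio >= lo && prio <= hi;
}

/*
 * Switch process "pid" (0 means the caller) to SCHED_RR at priority "prio".
 * Returns 0 on success and -1 on failure; the call usually needs root or
 * CAP_SYS_NICE, just like the chrt command does.
 */
static int set_round_robin(pid_t pid, int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (!rr_priority_is_valid(prio))
		return -1;
	return sched_setscheduler(pid, SCHED_RR, &param);
}
```

Calling set_round_robin(pid, 20) once per netserver PID would correspond
to the chrt invocation, under the assumptions stated.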

The scheduling picture also changes a lot:

   PID  %CPU   TIME+   COMMAND
 10783  24.3  0:21.53  netserver
 10784  24.3  0:21.53  netserver
 10785  24.3  0:21.52  netserver
 10786  24.3  0:21.50  netserver
     3   2.7  3:12.18  ksoftirqd/0

 
> Reported-by: Paolo Abeni 
> Reported-by: Hannes Frederic Sowa 
> Signed-off-by: Eric Dumazet 
> Cc: David Miller 
> Cc: Jesper Dangaard Brouer 
> Cc: Peter Zijlstra 
> Cc: Rik van Riel 
> ---
>  kernel/softirq.c |   16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 17caf4b63342..8ed90e3a88d6 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -78,6 +78,17 @@ static void wakeup_softirqd(void)
>  }
>  
>  /*
> + * If ksoftirqd is scheduled, we do not want to process pending softirqs
> + * right now. Let ksoftirqd handle this at its own rate, to get fairness.
> + */
> +static bool ksoftirqd_running(void)
> +{
> + struct task_struct *tsk = __this_cpu_read(ksoftirqd);
> +
> + return tsk && (tsk->state == TASK_RUNNING);
> +}
> +
> +/*
>   * preempt_count and SOFTIRQ_OFFSET usage:
>   * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
>   *   softirq processing.
> @@ -313,7 +324,7 @@ asmlinkage __visible void do_softirq(void)
>  
>   pending = local_softirq_pending();
>  
> - if (pending)
> + if (pending && !ksoftirqd_running())
>   do_softirq_own_stack();
>  
>   local_irq_restore(flags);
> @@ -340,6 +351,9 @@ void irq_enter(void)
>  
>  static inline void invoke_softirq(void)
>  {
> + if (ksoftirqd_running())
> + return;
> +
>   if (!force_irqthreads) {
>  #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
>   /*
> 
> 

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

2016-08-31 Thread Arnd Bergmann

On Wednesday, August 31, 2016 6:39:04 PM CEST David Howells wrote:
> Arnd Bergmann  wrote:
> 
> > gcc -Wmaybe-uninitialized correctly points out a newly introduced bug
> > through which we can end up calling rxrpc_queue_call() for a dead
> > connection:
> 
> How do you turn that on from within the Kbuild system?

You don't, my mistake. My build bot runs with 6e8d666e9253 ("Disable
"maybe-uninitialized" warning globally") disabled, and I had
assumed that Linus left the warning enabled with "make W=1", but
that was incorrect as Trond Myklebust also pointed out.

You still get the warning with "make EXTRA_CFLAGS=-Wmaybe-uninitialized",
which of course nobody normally does.

I'll try to come up with a patch to enable the warning in the W=1
level in the same conditions that used to be enabled up to v4.7.
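
For readers who have not run into this warning: it fires when a variable
is assigned on only some paths before it is read. The sketch below is a
hypothetical reduction of that pattern, not the actual rxrpc code; the
function demo_release() and its arguments are invented for illustration.
It shows the fixed shape, where the else branch assigns false:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * "sched" is consumed after the branch, so every branch has to assign it.
 * Dropping the else branch is what makes gcc report that "sched" may be
 * used uninitialized.
 */
static bool demo_release(bool already_released, int *queued)
{
	bool sched;

	if (!already_released) {
		sched = true;	/* first release: work has to be queued */
	} else {
		sched = false;	/* nothing to queue for an already released call */
	}

	if (sched)
		(*queued)++;
	return sched;
}
```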

Arnd

Re: possible circular locking dependency detected (bisected)

2016-08-31 Thread CAI Qian

Reverting the patch below fixes this problem.

c845acb324aa85a39650a14e7696982ceea75dc1
af_unix: Fix splice-bind deadlock

   CAI Qian

- Original Message -
> From: "CAI Qian" 
> To: secur...@kernel.org
> Cc: "Miklos Szeredi" , "Eric Sandeen" 
> 
> Sent: Tuesday, August 30, 2016 5:05:45 PM
> Subject: Re: possible circular locking dependency detected
> 
> FYI, this one can only be reproduced using the overlayfs docker backend.
> The device-mapper works fine. The XFS below has ftype=1.
> 
> # cp recvmsg01 /mnt
> # docker run -it -v /mnt/:/mnt/ rhel7 bash
> [root@c33c99aedd93 /]# mount
> overlay on / type overlay
> (rw,relatime,seclabel,lowerdir=l/I5VXL74ENBNAEARZ4M2SIN3XD6:l/KZGBKPXLDXUGHYWMERFUBM4FRP,upperdir=9a7c1f735166b1f63d220b4b6c59cc37f3922719ef810c97182b814c1ab336df/diff,workdir=9a7c1f735166b1f63d220b4b6c59cc37f3922719ef810c97182b814c1ab336df/work)
> ...
> [root@c33c99aedd93 /]# /mnt/recvmsg01
> CAI Qian
> 
> - Original Message -
> > From: "CAI Qian" 
> > To: secur...@kernel.org
> > Sent: Friday, August 26, 2016 10:50:57 AM
> > Subject: possible circular locking dependency detected
> > 
> > FYI, just want to give a heads-up to see if there is anything obvious so
> > we can avoid a possible DoS somehow.
> > 
> > Running the LTP syscalls tests inside a container until this test triggers
> > the report below,
> > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/recvmsg/recvmsg01.c
> > 
> > [ 4441.904103] open04 (42409) used greatest stack depth: 20552 bytes left
> > [ 4605.419167]
> > [ 4605.420831] ==
> > [ 4605.427727] [ INFO: possible circular locking dependency detected ]
> > [ 4605.434720] 4.8.0-rc3+ #3 Not tainted
> > [ 4605.438803] ---
> > [ 4605.445796] recvmsg01/42878 is trying to acquire lock:
> > [ 4605.451528]  (sb_writers#8){.+.+.+}, at: []
> > __sb_start_write+0xb4/0xf0
> > [ 4605.460642]
> > [ 4605.460642] but task is already holding lock:
> > [ 4605.467150]  (&u->readlock){+.+.+.}, at: []
> > unix_bind+0x299/0xdf0
> > [ 4605.475749]
> > [ 4605.475749] which lock already depends on the new lock.
> > [ 4605.475749]
> > [ 4605.484882]
> > [ 4605.484882] the existing dependency chain (in reverse order) is:
> > [ 4605.493234]
> > [ 4605.493234] -> #2 (&u->readlock){+.+.+.}:
> > [ 4605.497943][] lock_acquire+0x1fa/0x440
> > [ 4605.504659][]
> > mutex_lock_interruptible_nested+0xdd/0x920
> > [ 4605.513119][] unix_bind+0x299/0xdf0
> > [ 4605.519540][] SYSC_bind+0x1d8/0x240
> > [ 4605.525964][] SyS_bind+0xe/0x10
> > [ 4605.531998][] do_syscall_64+0x1a6/0x500
> > [ 4605.538811][] return_from_SYSCALL_64+0x0/0x7a
> > [ 4605.546203]
> > [ 4605.546203] -> #1 (&type->i_mutex_dir_key#3/1){+.+.+.}:
> > [ 4605.552292][] lock_acquire+0x1fa/0x440
> > [ 4605.559002][] down_write_nested+0x5e/0xe0
> > [ 4605.566008][] filename_create+0x155/0x470
> > [ 4605.573013][] SyS_mkdir+0xaf/0x1f0
> > [ 4605.579339][]
> > entry_SYSCALL_64_fastpath+0x1f/0xbd
> > [ 4605.587119]
> > [ 4605.587119] -> #0 (sb_writers#8){.+.+.+}:
> > [ 4605.591835][] __lock_acquire+0x3043/0x3dd0
> > [ 4605.598935][] lock_acquire+0x1fa/0x440
> > [ 4605.605646][] percpu_down_read+0x4f/0xa0
> > [ 4605.612552][] __sb_start_write+0xb4/0xf0
> > [ 4605.619459][] mnt_want_write+0x41/0xb0
> > [ 4605.626173][] ovl_want_write+0x76/0xa0
> > [overlay]
> > [ 4605.633860][] ovl_create_object+0xa3/0x2d0
> > [overlay]
> > [ 4605.641942][] ovl_mknod+0x31/0x40 [overlay]
> > [ 4605.649138][] vfs_mknod+0x34b/0x560
> > [ 4605.655570][] unix_bind+0x4ca/0xdf0
> > [ 4605.661991][] SYSC_bind+0x1d8/0x240
> > [ 4605.668412][] SyS_bind+0xe/0x10
> > [ 4605.674456][] do_syscall_64+0x1a6/0x500
> > [ 4605.681266][] return_from_SYSCALL_64+0x0/0x7a
> > [ 4605.688657]
> > [ 4605.688657] other info that might help us debug this:
> > [ 4605.688657]
> > [ 4605.697590] Chain exists of:
> > [ 4605.697590]   sb_writers#8 --> &type->i_mutex_dir_key#3/1 --> &u->readlock
> > [ 4605.697590]
> > [ 4605.707287]  Possible unsafe locking scenario:
> > [ 4605.707287]
> > [ 4605.713890]        CPU0                    CPU1
> > [ 4605.718943]        ----                    ----
> > [ 4605.723995]   lock(&u->readlock);
> > [ 4605.727708]                                lock(&type->i_mutex_dir_key#3/1);
> > [ 4605.735613]                                lock(&u->readlock);
> > [ 4605.742146]   lock(sb_writers#8);
> > [ 4605.745880]
> > [ 4605.745880]  *** DEADLOCK ***
> > [ 4605.745880]
> > [ 4605.752486] 3 locks held by recvmsg01/42878:
> > [ 4605.757247]  #0:  (sb_writers#13){.+.+.+}, at: []
> > __sb_start_write+0xb4/0xf0
> > [ 4605.766930]  #1:  (>s_type->i_mutex_key#16/1){+.+.+.}, at:
> > [] filename_create+0x155/0x470
> > [

Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-31 Thread Timur Tabi


Florian Fainelli wrote:
> if these are truly 64-bits stats, how come you are using a single
> readl_* to access them? Or is the u64 rx_err_addr just used as temporary
> storage and aligned to the largest size you need to deal with?

"*stats_itr += val;" takes the 32-bit val, zero-extends it to 64 bits, 
and then adds that to the corresponding 64-bit field in emac_stats.
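
To make the arithmetic concrete, here is a small userspace sketch of the
same accumulation pattern. It is not the driver code: struct demo_stats,
DEMO_NUM_REGS and demo_accumulate() are hypothetical names, and a plain
array stands in for the values readl_relaxed() would return. Like the
driver loop, it relies on the counters being consecutive 64-bit fields:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_REGS 3

/* Hypothetical block of consecutive 64-bit software counters. */
struct demo_stats {
	uint64_t rx_ok;
	uint64_t rx_bcast;
	uint64_t rx_mcast;
};

_Static_assert(sizeof(struct demo_stats) == DEMO_NUM_REGS * sizeof(uint64_t),
	       "counters must be contiguous 64-bit fields");

/* Fold one snapshot of 32-bit hardware counters into the 64-bit totals. */
static void demo_accumulate(struct demo_stats *stats, const uint32_t *regs)
{
	uint64_t *itr = (uint64_t *)stats;	/* points at rx_ok */
	size_t i;

	for (i = 0; i < DEMO_NUM_REGS; i++) {
		uint32_t val = regs[i];

		*itr += val;	/* val is zero-extended to 64 bits for the add */
		itr++;		/* step to the next 64-bit counter */
	}
}
```

Two snapshots of 0xFFFFFFFF therefore add up to 0x1FFFFFFFE in the
64-bit field rather than wrapping at 32 bits.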


--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-31 Thread Florian Fainelli

On 08/31/2016 11:57 AM, Timur Tabi wrote:
> Timur Tabi wrote:
>>
>>> Seems that there are several unused members in the emac_stats struct:
>>>
>>>> +struct emac_stats {
>>> ...
>>> ...
>>> Both rx_bcast_byte_cnt and rx_mcast_byte_cnt are not used anywhere/
>>>> +   u64 rx_bcast_byte_cnt;  /* broadcast packets byte count
>>>> (without FCS) */
>>>> +   u64 rx_mcast_byte_cnt;  /* multicast packets byte count
>>>> (without FCS) */
>>> ...
>>> rx_err_addr is not used
>>>> +   u64 rx_err_addr;/* packets dropped due to address
>>>> filtering */
>>
>> I'll go through the structure and remove the unused fields.
> 
> It turns out I cannot actually strip out those "unused" fields.  They
> are all indirectly used in emac_get_stats64:
> 
> u64 *stats_itr = &adpt->stats.rx_ok;
> 
> while (addr <= REG_MAC_RX_STATUS_END) {
> val = readl_relaxed(adpt->base + addr);
> *stats_itr += val;
> stats_itr++;
> addr += sizeof(u32);
> }

if these are truly 64-bits stats, how come you are using a single
readl_* to access them? Or is the u64 rx_err_addr just used as temporary
storage and aligned to the largest size you need to deal with?
-- 
Florian

Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver

2016-08-31 Thread Timur Tabi


Timur Tabi wrote:
>
>> Seems that there are several unused members in the emac_stats struct:
>>
>>> +struct emac_stats {
>> ...
>> ...
>> Both rx_bcast_byte_cnt and rx_mcast_byte_cnt are not used anywhere/
>>> +   u64 rx_bcast_byte_cnt;  /* broadcast packets byte count
>>> (without FCS) */
>>> +   u64 rx_mcast_byte_cnt;  /* multicast packets byte count
>>> (without FCS) */
>> ...
>> rx_err_addr is not used
>>> +   u64 rx_err_addr;/* packets dropped due to address
>>> filtering */
>
> I'll go through the structure and remove the unused fields.

It turns out I cannot actually strip out those "unused" fields.  They 
are all indirectly used in emac_get_stats64:

	u64 *stats_itr = &adpt->stats.rx_ok;

	while (addr <= REG_MAC_RX_STATUS_END) {
		val = readl_relaxed(adpt->base + addr);
		*stats_itr += val;
		stats_itr++;
		addr += sizeof(u32);
	}

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [PATCH net-next V4 4/4] net/sched: Introduce act_tunnel_key

2016-08-31 Thread Eric Dumazet

On Wed, Aug 31, 2016 at 5:46 AM, Hadar Hen Zion  wrote:
>
> From: Amir Vadai 
>
> This action could be used before redirecting packets to a shared tunnel
> device, or when redirecting packets arriving from such a device.
>
>
> +
> +struct tcf_tunnel_key_params {
> +   struct rcu_head rcu;
> +   int tcft_action;

Also add " int action;"

(see why later)

> +   struct metadata_dst *tcft_enc_metadata;
> +};
> +



> +
> +static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
> + struct tcf_result *res)
> +{
> +   struct tcf_tunnel_key *t = to_tunnel_key(a);
> +   struct tcf_tunnel_key_params *params;
> +   int action;
> +
> +   rcu_read_lock();
> +
> +   params = rcu_dereference(t->params);
> +
> +   tcf_lastuse_update(&t->tcf_tm);
> +   bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
> +   action = t->tcf_action;

Ideally, you should read param->action instead of t->tcf_action to be
completely clean.

> +
> +   switch (params->tcft_action) {
> +   case TCA_TUNNEL_KEY_ACT_RELEASE:
> +   skb_dst_drop(skb);
> +   break;
> +   case TCA_TUNNEL_KEY_ACT_SET:
> +   skb_dst_drop(skb);
> +   skb_dst_set(skb, dst_clone(&params->tcft_enc_metadata->dst));
> +   break;
> +   default:
> +   WARN_ONCE(1, "Bad tunnel_key action.\n");
> +   break;
> +   }
> +
> +   rcu_read_unlock();
> +
> +   return action;
> +}
>
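
The point about reading the action from the params block can be shown
with a small userspace model. This is only a sketch under stated
assumptions: C11 atomics stand in for rcu_assign_pointer() and
rcu_dereference(), grace periods and freeing are not modelled at all,
and the names demo_params, demo_act, demo_publish() and demo_act_run()
are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical parameter block; both fields are replaced together. */
struct demo_params {
	int action;	/* verdict returned to the caller */
	int mode;	/* what to do with the packet */
};

struct demo_act {
	_Atomic(struct demo_params *) params;
};

/* Writer side: publish a fully initialised block with release semantics. */
static void demo_publish(struct demo_act *a, struct demo_params *p)
{
	atomic_store_explicit(&a->params, p, memory_order_release);
}

/*
 * Reader side: load the pointer once, then take every field from that one
 * snapshot, so the verdict and the mode always belong to the same update.
 */
static int demo_act_run(struct demo_act *a, int *mode_out)
{
	struct demo_params *p =
		atomic_load_explicit(&a->params, memory_order_acquire);

	*mode_out = p->mode;
	return p->action;
}
```

If the verdict were instead read from a separate field outside the
block, a concurrent update could pair a new mode with an old verdict.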

Re: [PATCH RFC 4/4] xfs: Transmit flow steering

2016-08-31 Thread Chris Mason




On 08/30/2016 08:00 PM, Tom Herbert wrote:
> XFS maintains a per device flow table that is indexed by the skbuff
> hash. The XFS table is only consulted when there is no queue saved in
> a transmit socket for an skbuff.
>
> Each entry in the flow table contains a queue index and a queue
> pointer. The queue pointer is set when a queue is chosen using a
> flow table entry. This pointer is set to the head pointer in the
> transmit queue (which is maintained by BQL).
>
> The new function get_xfs_index that looks up flows in the XPS table.
> The entry returned gives the last queue a matching flow used. The
> returned queue is compared against the normal XPS queue. If they
> are different, then we only switch if the tail pointer in the TX
> queue has advanced past the pointer saved in the entry. In this
> way OOO should be avoided when XPS wants to use a different queue.
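
The switching rule described above can be sketched in plain C as follows.
This is an illustration of the description only, not the code from the
patch: struct demo_txq, struct demo_flow_ent and demo_pick_queue() are
hypothetical, and the head/tail counters are assumed to be free-running
32-bit values, hence the wrap-safe signed comparison:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue counters: packets enqueued and completed so far. */
struct demo_txq {
	uint32_t head;
	uint32_t tail;
};

/* Hypothetical flow entry: last queue used and that queue's head back then. */
struct demo_flow_ent {
	uint16_t queue_index;
	uint32_t queue_ptr;
};

/*
 * Pick a queue for a flow. Move to the queue XPS prefers only once the old
 * queue's tail has advanced past the saved pointer, i.e. once everything
 * this flow put on the old queue has completed.
 */
static uint16_t demo_pick_queue(struct demo_flow_ent *ent,
				const struct demo_txq *txqs,
				uint16_t xps_queue)
{
	uint16_t q = ent->queue_index;

	if (q != xps_queue &&
	    (int32_t)(txqs[q].tail - ent->queue_ptr) > 0)
		q = xps_queue;		/* old queue drained: safe to switch */

	ent->queue_index = q;
	ent->queue_ptr = txqs[q].head;	/* remember where this packet lands */
	return q;
}
```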



I'd love for Dave Chinner to get some networking bug reports, but maybe 
we shouldn't call it XFS?


At least CONFIG_XFS should be something else.  It doesn't conflict now 
because we have CONFIG_XFS_FS, but even CONFIG_XFS_NET sounds like it's 
related to the filesystem instead of transmit flows.


[ Sorry, four patches in and all I do is complain about the name ]

-chris


> Signed-off-by: Tom Herbert 
> ---
>  net/Kconfig|  6 
>  net/core/dev.c | 93 --
>  2 files changed, 84 insertions(+), 15 deletions(-)
>
> diff --git a/net/Kconfig b/net/Kconfig
> index 7b6cd34..5e3eddf 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -255,6 +255,12 @@ config XPS
> depends on SMP
> default y
>
> +config XFS
> +   bool
> +   depends on XPS
> +   depends on BQL
> +   default y
> +
>  config HWBM
> bool

...


> -static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb)
> +/* Must be called with RCU read_lock */
> +static int get_xfs_index(struct net_device *dev, struct sk_buff *skb)
>  {
> -   struct sock *sk = skb->sk;
> -   int queue_index = sk_tx_queue_get(sk);
> +#ifdef CONFIG_XFS
> +   struct xps_dev_flow_table *flow_table;
> +   struct xps_dev_flow ent;
> +   int queue_index;
> +   struct netdev_queue *txq;
> +   u32 hash;
Re: [PATCH] net/ethernet: Use ether_addr_copy rather than memcpy

2016-08-31 Thread Greg

On Wed, 2016-08-31 at 11:32 -0700, Eric Dumazet wrote:
> On Wed, 2016-08-31 at 09:32 -0700, Greg Rose wrote:
> > I'm not sure why this hasn't been done before because it seems obvious,
> > so maybe there is some reason that memcpy is used instead of
> > ether_addr_copy in this code.  But let's try this anyway.
> > 
> > Change memcpy to ether_addr_copy.
> 
> ...
> 
> >  
> > @@ -211,7 +211,7 @@ EXPORT_SYMBOL(eth_type_trans);
> >  int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr)
> >  {
> > const struct ethhdr *eth = eth_hdr(skb);
> > -   memcpy(haddr, eth->h_source, ETH_ALEN);
> > +   ether_addr_copy(haddr, eth->h_source);
> 
> 
> Please carefully read ether_addr_copy() comments.
> 
> Not all arches are like x86

Thanks Eric, Joe set me straight already.

- Greg

Re: [PATCH] net/ethernet: Use ether_addr_copy rather than memcpy

2016-08-31 Thread Eric Dumazet

On Wed, 2016-08-31 at 09:32 -0700, Greg Rose wrote:
> I'm not sure why this hasn't been done before because it seems obvious,
> so maybe there is some reason that memcpy is used instead of
> ether_addr_copy in this code.  But let's try this anyway.
> 
> Change memcpy to ether_addr_copy.

...

>  
> @@ -211,7 +211,7 @@ EXPORT_SYMBOL(eth_type_trans);
>  int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr)
>  {
>   const struct ethhdr *eth = eth_hdr(skb);
> - memcpy(haddr, eth->h_source, ETH_ALEN);
> + ether_addr_copy(haddr, eth->h_source);


Please carefully read ether_addr_copy() comments.

Not all arches are like x86
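
For context: the kernel-doc comment on ether_addr_copy() notes that dst
and src must both be aligned to u16, and eth_header_parse() receives
haddr as a bare unsigned char pointer. The userspace sketch below
illustrates the difference. It is not the kernel implementation; the
names demo_is_u16_aligned() and demo_mac_copy() are hypothetical, and
the may_alias attribute assumes gcc or clang:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* 16-bit type that is allowed to alias byte buffers (gcc/clang attribute). */
typedef uint16_t __attribute__((may_alias)) demo_u16;

/* Return 1 if p sits on a 2-byte boundary. */
static int demo_is_u16_aligned(const void *p)
{
	return ((uintptr_t)p & 1u) == 0;
}

/*
 * Copy a 6-byte MAC address. With both pointers 2-byte aligned the copy is
 * done as three 16-bit moves; otherwise fall back to memcpy(), which has no
 * alignment requirement. Returns 1 for the 16-bit path, 0 for the fallback.
 */
static int demo_mac_copy(uint8_t *dst, const uint8_t *src)
{
	if (demo_is_u16_aligned(dst) && demo_is_u16_aligned(src)) {
		demo_u16 *d = (demo_u16 *)dst;
		const demo_u16 *s = (const demo_u16 *)src;

		d[0] = s[0];
		d[1] = s[1];
		d[2] = s[2];
		return 1;
	}
	memcpy(dst, src, DEMO_ETH_ALEN);
	return 0;
}
```

On x86 the 16-bit path would happen to work for odd addresses as well,
but on architectures that trap on unaligned accesses it would not,
which is presumably the concern here.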

[PATCH net-next V4 10/10] liquidio: CN23XX firmware download

2016-08-31 Thread Raghu Vatsavayi

Add firmware download support for the cn23xx device.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c|  40 +++
 .../ethernet/cavium/liquidio/cn23xx_pf_device.h|   2 +
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 115 -
 3 files changed, 111 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index 2e78101..2d81206 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -214,6 +214,37 @@ void cn23xx_dump_pf_initialized_regs(struct octeon_device 
*oct)
CVM_CAST64(octeon_read_csr64(oct, CN23XX_SLI_PKT_CNT_INT)));
 }
 
+static int cn23xx_pf_soft_reset(struct octeon_device *oct)
+{
+   octeon_write_csr64(oct, CN23XX_WIN_WR_MASK_REG, 0xFF);
+
+   dev_dbg(&oct->pci_dev->dev, "OCTEON[%d]: BIST enabled for CN23XX soft reset\n",
+   oct->octeon_id);
+
+   octeon_write_csr64(oct, CN23XX_SLI_SCRATCH1, 0x1234ULL);
+
+   /* Initiate chip-wide soft reset */
+   lio_pci_readq(oct, CN23XX_RST_SOFT_RST);
+   lio_pci_writeq(oct, 1, CN23XX_RST_SOFT_RST);
+
+   /* Wait for 100ms as Octeon resets. */
+   mdelay(100);
+
+   if (octeon_read_csr64(oct, CN23XX_SLI_SCRATCH1) == 0x1234ULL) {
+   dev_err(&oct->pci_dev->dev, "OCTEON[%d]: Soft reset failed\n",
+   oct->octeon_id);
+   return 1;
+   }
+
+   dev_dbg(&oct->pci_dev->dev, "OCTEON[%d]: Reset completed\n",
+   oct->octeon_id);
+
+   /* restore the  reset value*/
+   octeon_write_csr64(oct, CN23XX_WIN_WR_MASK_REG, 0xFF);
+
+   return 0;
+}
+
 static void cn23xx_enable_error_reporting(struct octeon_device *oct)
 {
u32 regval;
@@ -1030,6 +1061,7 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device 
*oct)
oct->fn_list.process_interrupt_regs = cn23xx_interrupt_handler;
oct->fn_list.msix_interrupt_handler = cn23xx_pf_msix_interrupt_handler;
 
+   oct->fn_list.soft_reset = cn23xx_pf_soft_reset;
oct->fn_list.setup_device_regs = cn23xx_setup_pf_device_regs;
 
oct->fn_list.enable_interrupt = cn23xx_enable_pf_interrupt;
@@ -1129,3 +1161,11 @@ void cn23xx_dump_iq_regs(struct octeon_device *oct)
CVM_CAST64(octeon_read_csr64(
oct, CN23XX_SLI_S2M_PORTX_CTL(oct->pcie_port))));
 }
+
+int cn23xx_fw_loaded(struct octeon_device *oct)
+{
+   u64 val;
+
+   val = octeon_read_csr64(oct, CN23XX_SLI_SCRATCH1);
+   return (val >> 1) & 1ULL;
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h
index 36252e7..33b7589 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h
@@ -52,4 +52,6 @@ int validate_cn23xx_pf_config_info(struct octeon_device *oct,
   struct octeon_config *conf23xx);
 
 void cn23xx_dump_pf_initialized_regs(struct octeon_device *oct);
+
+int cn23xx_fw_loaded(struct octeon_device *oct);
 #endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 464d42b..866c075 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -1312,9 +1312,9 @@ static void octeon_destroy_resources(struct octeon_device 
*oct)
 
/* fallthrough */
case OCT_DEV_PCI_MAP_DONE:
-
/* Soft reset the octeon device before exiting */
-   oct->fn_list.soft_reset(oct);
+   if ((!OCTEON_CN23XX_PF(oct)) || !oct->octeon_id)
+   oct->fn_list.soft_reset(oct);
 
octeon_unmap_pci_barx(oct, 0);
octeon_unmap_pci_barx(oct, 1);
@@ -3823,6 +3823,7 @@ static void nic_starter(struct work_struct *work)
 static int octeon_device_init(struct octeon_device *octeon_dev)
 {
int j, ret;
+   int fw_loaded = 0;
char bootcmd[] = "\n";
struct octeon_device_priv *oct_priv =
(struct octeon_device_priv *)octeon_dev->priv;
@@ -3844,9 +3845,23 @@ static int octeon_device_init(struct octeon_device 
*octeon_dev)
 
octeon_dev->app_mode = CVM_DRV_INVALID_APP;
 
-   /* Do a soft reset of the Octeon device. */
-   if (octeon_dev->fn_list.soft_reset(octeon_dev))
+   if (OCTEON_CN23XX_PF(octeon_dev)) {
+   if (!cn23xx_fw_loaded(octeon_dev)) {
+   fw_loaded = 0;
+   /* Do a soft reset of the Octeon device. */
+

[PATCH net-next V4 01/10] liquidio: Consolidate common functionality

2016-08-31 Thread Raghu Vatsavayi

Consolidate common functionality of various devices
from different files into lio_core.c/octeon_console.c.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/Makefile  |  23 +-
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   |  31 ---
 .../net/ethernet/cavium/liquidio/cn66xx_device.h   |   1 -
 .../net/ethernet/cavium/liquidio/cn68xx_device.c   |   1 -
 drivers/net/ethernet/cavium/liquidio/lio_core.c| 261 +++
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c |  18 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 276 +
 .../net/ethernet/cavium/liquidio/octeon_console.c  | 117 -
 .../net/ethernet/cavium/liquidio/octeon_device.c   | 104 
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   1 -
 drivers/net/ethernet/cavium/liquidio/octeon_main.h |  24 +-
 .../net/ethernet/cavium/liquidio/octeon_mem_ops.c  |   1 -
 .../net/ethernet/cavium/liquidio/octeon_network.h  |   2 -
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  |   8 +-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  |   2 +-
 15 files changed, 426 insertions(+), 444 deletions(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_core.c

diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile 
b/drivers/net/ethernet/cavium/liquidio/Makefile
index 2f36680..d44111d 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -3,14 +3,15 @@
 #
 obj-$(CONFIG_LIQUIDIO) += liquidio.o
 
-liquidio-objs := lio_main.o  \
- lio_ethtool.o  \
- request_manager.o  \
- response_manager.o \
- octeon_device.o\
- cn66xx_device.o\
- cn68xx_device.o\
- octeon_mem_ops.o   \
- octeon_droq.o  \
- octeon_console.o   \
- octeon_nic.o
+liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \
+   lio_core.o \
+   request_manager.o  \
+   response_manager.o \
+   octeon_device.o\
+   cn66xx_device.o\
+   cn68xx_device.o\
+   octeon_mem_ops.o   \
+   octeon_droq.o  \
+   octeon_nic.o
+
+liquidio-objs := lio_main.o octeon_console.o $(liquidio-y)
diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
index c03d370..dc5d14a 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
@@ -418,36 +418,6 @@ void lio_cn6xxx_disable_io_queues(struct octeon_device 
*oct)
octeon_write_csr(oct, CN6XXX_SLI_PKT_TIME_INT, d32);
 }
 
-void lio_cn6xxx_reinit_regs(struct octeon_device *oct)
-{
-   int i;
-
-   for (i = 0; i < MAX_OCTEON_INSTR_QUEUES(oct); i++) {
-   if (!(oct->io_qmask.iq & (1ULL << i)))
-   continue;
-   oct->fn_list.setup_iq_regs(oct, i);
-   }
-
-   for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) {
-   if (!(oct->io_qmask.oq & (1ULL << i)))
-   continue;
-   oct->fn_list.setup_oq_regs(oct, i);
-   }
-
-   oct->fn_list.setup_device_regs(oct);
-
-   oct->fn_list.enable_interrupt(oct->chip);
-
-   oct->fn_list.enable_io_queues(oct);
-
-   /* for (i = 0; i < oct->num_oqs; i++) { */
-   for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) {
-   if (!(oct->io_qmask.oq & (1ULL << i)))
-   continue;
-   writel(oct->droq[i]->max_count, oct->droq[i]->pkts_credit_reg);
-   }
-}
-
 void
 lio_cn6xxx_bar1_idx_setup(struct octeon_device *oct,
  u64 core_addr,
@@ -714,7 +684,6 @@ int lio_setup_cn66xx_octeon_device(struct octeon_device 
*oct)
 
oct->fn_list.soft_reset = lio_cn6xxx_soft_reset;
oct->fn_list.setup_device_regs = lio_cn6xxx_setup_device_regs;
-   oct->fn_list.reinit_regs = lio_cn6xxx_reinit_regs;
oct->fn_list.update_iq_read_idx = lio_cn6xxx_update_read_index;
 
oct->fn_list.bar1_idx_setup = lio_cn6xxx_bar1_idx_setup;
diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h 
b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h
index 28c4722..2e4bc25 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h
@@ -83,7 +83,6 @@ void lio_cn6xxx_setup_oq_regs(struct octeon_device *oct, u32 
oq_no);
 void lio_cn6xxx_enable_io_queues(struct octeon_device *oct);
 void lio_cn6xxx_disable_io_queues(struct

[PATCH net-next V4 09/10] liquidio: MSIX support for CN23XX

2016-08-31 Thread Raghu Vatsavayi

This patch adds MSI-X interrupt support for the cn23xx device.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 166 +++--
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   |  10 +-
 .../net/ethernet/cavium/liquidio/cn66xx_device.h   |   4 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c| 269 +
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  39 +++
 .../net/ethernet/cavium/liquidio/octeon_device.h   |  33 ++-
 6 files changed, 452 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index 7e932a3..2e78101 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -567,10 +567,16 @@ static void cn23xx_setup_iq_regs(struct octeon_device 
*oct, u32 iq_no)
 */
pkt_in_done = readq(iq->inst_cnt_reg);
 
-   /* Clear the count by writing back what we read, but don't
-* enable interrupts
-*/
-   writeq(pkt_in_done, iq->inst_cnt_reg);
+   if (oct->msix_on) {
+   /* Set CINT_ENB to enable IQ interrupt   */
+   writeq((pkt_in_done | CN23XX_INTR_CINT_ENB),
+  iq->inst_cnt_reg);
+   } else {
+   /* Clear the count by writing back what we read, but don't
+* enable interrupts
+*/
+   writeq(pkt_in_done, iq->inst_cnt_reg);
+   }
 
iq->reset_instr_cnt = 0;
 }
@@ -579,6 +585,9 @@ static void cn23xx_setup_oq_regs(struct octeon_device *oct, 
u32 oq_no)
 {
u32 reg_val;
struct octeon_droq *droq = oct->droq[oq_no];
+   struct octeon_cn23xx_pf *cn23xx = (struct octeon_cn23xx_pf *)oct->chip;
+   u64 time_threshold;
+   u64 cnt_threshold;
 
oq_no += oct->sriov_info.pf_srn;
 
@@ -595,19 +604,31 @@ static void cn23xx_setup_oq_regs(struct octeon_device 
*oct, u32 oq_no)
droq->pkts_credit_reg =
(u8 *)oct->mmio[0].hw_addr + CN23XX_SLI_OQ_PKTS_CREDIT(oq_no);
 
-   /* Enable this output queue to generate Packet Timer Interrupt
-   */
-   reg_val = octeon_read_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no));
-   reg_val |= CN23XX_PKT_OUTPUT_CTL_TENB;
-   octeon_write_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no),
-reg_val);
+   if (!oct->msix_on) {
+   /* Enable this output queue to generate Packet Timer Interrupt
+*/
+   reg_val =
+   octeon_read_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no));
+   reg_val |= CN23XX_PKT_OUTPUT_CTL_TENB;
+   octeon_write_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no),
+reg_val);
 
-   /* Enable this output queue to generate Packet Count Interrupt
-   */
-   reg_val = octeon_read_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no));
-   reg_val |= CN23XX_PKT_OUTPUT_CTL_CENB;
-   octeon_write_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no),
-reg_val);
+   /* Enable this output queue to generate Packet Count Interrupt
+*/
+   reg_val =
+   octeon_read_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no));
+   reg_val |= CN23XX_PKT_OUTPUT_CTL_CENB;
+   octeon_write_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no),
+reg_val);
+   } else {
+   time_threshold = cn23xx_pf_get_oq_ticks(
+   oct, (u32)CFG_GET_OQ_INTR_TIME(cn23xx->conf));
+   cnt_threshold = (u32)CFG_GET_OQ_INTR_PKT(cn23xx->conf);
+
+   octeon_write_csr64(
+   oct, CN23XX_SLI_OQ_PKT_INT_LEVELS(oq_no),
+   ((time_threshold << 32 | cnt_threshold)));
+   }
 }
 
 static int cn23xx_enable_io_queues(struct octeon_device *oct)
@@ -762,6 +783,110 @@ static void cn23xx_disable_io_queues(struct octeon_device 
*oct)
}
 }
 
+static u64 cn23xx_pf_msix_interrupt_handler(void *dev)
+{
+   struct octeon_ioq_vector *ioq_vector = (struct octeon_ioq_vector *)dev;
+   struct octeon_device *oct = ioq_vector->oct_dev;
+   u64 pkts_sent;
+   u64 ret = 0;
+   struct octeon_droq *droq = oct->droq[ioq_vector->droq_index];
+
+   dev_dbg(&oct->pci_dev->dev, "In %s octeon_dev @ %p\n", __func__, oct);
+
+   if (!droq) {
+   dev_err(&oct->pci_dev->dev, "23XX bringup FIXME: oct pfnum:%d 
ioq_vector->ioq_num :%d droq is NULL\n",
+   oct->pf_num, ioq_vector->ioq_num);
+   return 0;
+   }
+
+   pkts_sent = readq(droq->pkts_sent_reg);
+
+   /* If our device

[PATCH net-next V4 04/10] liquidio: CN23XX register definitions

2016-08-31 Thread Raghu Vatsavayi

This patch adds register definitions and structures for new
device cn23xx.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 .../net/ethernet/cavium/liquidio/cn23xx_pf_regs.h  | 604 +
 1 file changed, 604 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h
new file mode 100644
index 000..03d79d9
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h
@@ -0,0 +1,604 @@
+/**
+* Author: Cavium, Inc.
+*
+* Contact: supp...@cavium.com
+*  Please include "LiquidIO" in the subject.
+*
+* Copyright (c) 2003-2015 Cavium, Inc.
+*
+* This file is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License, Version 2, as
+* published by the Free Software Foundation.
+*
+* This file is distributed in the hope that it will be useful, but
+* AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+* of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+* NONINFRINGEMENT.  See the GNU General Public License for more
+* details.
+*
+* This file may also be available under a different license from Cavium.
+* Contact Cavium, Inc. for more information
+**/
+
+/*! \file cn23xx_regs.h
+ * \brief Host Driver: Register Address and Register Mask values for
+ * Octeon CN23XX devices.
+*/
+
+#ifndef __CN23XX_PF_REGS_H__
+#define __CN23XX_PF_REGS_H__
+
+#define CN23XX_CONFIG_VENDOR_ID 0x00
+#define CN23XX_CONFIG_DEVICE_ID 0x02
+
+#define CN23XX_CONFIG_XPANSION_BAR 0x38
+
+#define CN23XX_CONFIG_MSIX_CAP0x50
+#define CN23XX_CONFIG_MSIX_LMSI   0x54
+#define CN23XX_CONFIG_MSIX_UMSI   0x58
+#define CN23XX_CONFIG_MSIX_MSIMD  0x5C
+#define CN23XX_CONFIG_MSIX_MSIMM  0x60
+#define CN23XX_CONFIG_MSIX_MSIMP  0x64
+
+#define CN23XX_CONFIG_PCIE_CAP 0x70
+#define CN23XX_CONFIG_PCIE_DEVCAP  0x74
+#define CN23XX_CONFIG_PCIE_DEVCTL  0x78
+#define CN23XX_CONFIG_PCIE_LINKCAP 0x7C
+#define CN23XX_CONFIG_PCIE_LINKCTL 0x80
+#define CN23XX_CONFIG_PCIE_SLOTCAP 0x84
+#define CN23XX_CONFIG_PCIE_SLOTCTL 0x88
+#define CN23XX_CONFIG_PCIE_DEVCTL2 0x98
+#define CN23XX_CONFIG_PCIE_LINKCTL2 0xA0
+#define CN23XX_CONFIG_PCIE_UNCORRECT_ERR_MASK  0x108
+#define CN23XX_CONFIG_PCIE_CORRECT_ERR_STATUS  0x110
+#define CN23XX_CONFIG_PCIE_DEVCTL_MASK 0x0004
+
+#define CN23XX_PCIE_SRIOV_FDL 0x188
+#define CN23XX_PCIE_SRIOV_FDL_BIT_POS 0x10
+#define CN23XX_PCIE_SRIOV_FDL_MASK 0xFF
+
+#define CN23XX_CONFIG_PCIE_FLTMSK  0x720
+
+#define CN23XX_CONFIG_SRIOV_VFDEVID 0x190
+
+#define CN23XX_CONFIG_SRIOV_BAR_START 0x19C
+#define CN23XX_CONFIG_SRIOV_BARX(i) \
+   (CN23XX_CONFIG_SRIOV_BAR_START + (i * 4))
+#define CN23XX_CONFIG_SRIOV_BAR_PF 0x08
+#define CN23XX_CONFIG_SRIOV_BAR_64BIT 0x04
+#define CN23XX_CONFIG_SRIOV_BAR_IO 0x01
+
+/* ##  BAR0 Registers  */
+
+#define CN23XX_SLI_CTL_PORT_START   0x286E0
+#define CN23XX_PORT_OFFSET  0x10
+
+#define CN23XX_SLI_CTL_PORT(p)  \
+   (CN23XX_SLI_CTL_PORT_START + ((p) * CN23XX_PORT_OFFSET))
+
+/* 2 scratch registers (64-bit)  */
+#define CN23XX_SLI_WINDOW_CTL   0x282E0
+#define CN23XX_SLI_SCRATCH1 0x283C0
+#define CN23XX_SLI_SCRATCH2 0x283D0
+#define CN23XX_SLI_WINDOW_CTL_DEFAULT   0x20ULL
+
+/* 1 register (64-bit)  - SLI_CTL_STATUS */
+#define CN23XX_SLI_CTL_STATUS   0x28570
+
+/* SLI Packet Input Jabber Register (64 bit register)
+ * <31:0> for Byte count for limiting sizes of packet sizes
+ * that are allowed for sli packet inbound packets.
+ * the default value is 0xFA00(=64000).
+ */
+#define CN23XX_SLI_PKT_IN_JABBER 0x29170
+/* The input jabber is used to determine the TSO max size.
+ * Due to H/W limitation, this needs to be reduced to 60000
+ * in order to do H/W TSO and avoid the WQE malformation
+ * PKO_BUG_24989_WQE_LEN
+ */
+#define CN23XX_DEFAULT_INPUT_JABBER 0xEA60 /*60000*/
+
+#define CN23XX_WIN_WR_ADDR_LO

[PATCH net-next V4 06/10] liquidio: CN23XX device init and sriov config

2016-08-31 Thread Raghu Vatsavayi

Add support for cn23xx device init and sriov queue config.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/Makefile  |   1 +
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 527 +
 .../ethernet/cavium/liquidio/cn23xx_pf_device.h|   7 +
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  10 +-
 4 files changed, 544 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c

diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile 
b/drivers/net/ethernet/cavium/liquidio/Makefile
index d44111d..5a27b2a 100644
--- a/drivers/net/ethernet/cavium/liquidio/Makefile
+++ b/drivers/net/ethernet/cavium/liquidio/Makefile
@@ -10,6 +10,7 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \
octeon_device.o\
cn66xx_device.o\
cn68xx_device.o\
+   cn23xx_pf_device.o \
octeon_mem_ops.o   \
octeon_droq.o  \
octeon_nic.o
diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
new file mode 100644
index 000..ccc3d5b
--- /dev/null
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -0,0 +1,527 @@
+/**
+* Author: Cavium, Inc.
+*
+* Contact: supp...@cavium.com
+*  Please include "LiquidIO" in the subject.
+*
+* Copyright (c) 2003-2015 Cavium, Inc.
+*
+* This file is free software; you can redistribute it and/or modify
+* it under the terms of the GNU General Public License, Version 2, as
+* published by the Free Software Foundation.
+*
+* This file is distributed in the hope that it will be useful, but
+* AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty
+* of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or
+* NONINFRINGEMENT.  See the GNU General Public License for more
+* details.
+*
+* This file may also be available under a different license from Cavium.
+* Contact Cavium, Inc. for more information
+**/
+
+#include 
+#include 
+#include 
+#include "liquidio_common.h"
+#include "octeon_droq.h"
+#include "octeon_iq.h"
+#include "response_manager.h"
+#include "octeon_device.h"
+#include "cn23xx_pf_device.h"
+#include "octeon_main.h"
+
+#define RESET_NOTDONE 0
+#define RESET_DONE 1
+
+/* Change the value of SLI Packet Input Jabber Register to allow
+ * VXLAN TSO packets which can be 64424 bytes, exceeding the
+ * MAX_GSO_SIZE we supplied to the kernel
+ */
+#define CN23XX_INPUT_JABBER 64600
+
+#define LIOLUT_RING_DISTRIBUTION 9
+const int liolut_num_vfs_to_rings_per_vf[LIOLUT_RING_DISTRIBUTION] = {
+   0, 8, 4, 2, 2, 2, 1, 1, 1
+};
+
+void cn23xx_dump_pf_initialized_regs(struct octeon_device *oct)
+{
+   int i = 0;
+   u32 regval = 0;
+   struct octeon_cn23xx_pf *cn23xx = (struct octeon_cn23xx_pf *)oct->chip;
+
+   /*In cn23xx_soft_reset*/
+   dev_dbg(&oct->pci_dev->dev, "%s[%llx] : 0x%llx\n",
+   "CN23XX_WIN_WR_MASK_REG", CVM_CAST64(CN23XX_WIN_WR_MASK_REG),
+   CVM_CAST64(octeon_read_csr64(oct, CN23XX_WIN_WR_MASK_REG)));
+   dev_dbg(&oct->pci_dev->dev, "%s[%llx] : 0x%016llx\n",
+   "CN23XX_SLI_SCRATCH1", CVM_CAST64(CN23XX_SLI_SCRATCH1),
+   CVM_CAST64(octeon_read_csr64(oct, CN23XX_SLI_SCRATCH1)));
+   dev_dbg(&oct->pci_dev->dev, "%s[%llx] : 0x%016llx\n",
+   "CN23XX_RST_SOFT_RST", CN23XX_RST_SOFT_RST,
+   lio_pci_readq(oct, CN23XX_RST_SOFT_RST));
+
+   /*In cn23xx_set_dpi_regs*/
+   dev_dbg(&oct->pci_dev->dev, "%s[%llx] : 0x%016llx\n",
+   "CN23XX_DPI_DMA_CONTROL", CN23XX_DPI_DMA_CONTROL,
+   lio_pci_readq(oct, CN23XX_DPI_DMA_CONTROL));
+
+   for (i = 0; i < 6; i++) {
+   dev_dbg(&oct->pci_dev->dev, "%s(%d)[%llx] : 0x%016llx\n",
+   "CN23XX_DPI_DMA_ENG_ENB", i,
+   CN23XX_DPI_DMA_ENG_ENB(i),
+   lio_pci_readq(oct, CN23XX_DPI_DMA_ENG_ENB(i)));
+   dev_dbg(&oct->pci_dev->dev, "%s(%d)[%llx] : 0x%016llx\n",
+   "CN23XX_DPI_DMA_ENG_BUF", i,
+   CN23XX_DPI_DMA_ENG_BUF(i),
+   lio_pci_readq(oct, CN23XX_DPI_DMA_ENG_BUF(i)));
+   }
+
+   dev_dbg(&oct->pci_dev->dev, "%s[%llx] : 0x%016llx\n", "CN23XX_DPI_CTL",
+   CN23XX_DPI_CTL, lio_pci_readq(oct, CN23XX_DPI_CTL));
+
+   /*In cn23xx_setup_pcie_mps and cn23xx_setup_pcie_mrrs */
+   pci_read_config_dword(oct->pci_dev, CN23XX_CONFIG_PCIE_DEVCTL, &regval);

[PATCH net-next V4 02/10] liquidio: Firmware version management

2016-08-31 Thread Raghu Vatsavayi

This patch contains changes for firmware version management.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c  | 12 ++--
 .../net/ethernet/cavium/liquidio/liquidio_common.h   | 20 +---
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 2abc110..1bbeae8 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -3230,8 +3230,9 @@ static int setup_nic_devices(struct octeon_device 
*octeon_dev)
union oct_nic_if_cfg if_cfg;
unsigned int base_queue;
unsigned int gmx_port_id;
-   u32 resp_size, ctx_size;
+   u32 resp_size, ctx_size, data_size;
u32 ifidx_or_pfnum;
+   struct lio_version *vdata;
 
/* This is to handle link status changes */
octeon_register_dispatch_fn(octeon_dev, OPCODE_NIC,
@@ -3253,11 +3254,18 @@ static int setup_nic_devices(struct octeon_device 
*octeon_dev)
for (i = 0; i < octeon_dev->ifcount; i++) {
resp_size = sizeof(struct liquidio_if_cfg_resp);
ctx_size = sizeof(struct liquidio_if_cfg_context);
+   data_size = sizeof(struct lio_version);
sc = (struct octeon_soft_command *)
-   octeon_alloc_soft_command(octeon_dev, 0,
+   octeon_alloc_soft_command(octeon_dev, data_size,
  resp_size, ctx_size);
resp = (struct liquidio_if_cfg_resp *)sc->virtrptr;
ctx  = (struct liquidio_if_cfg_context *)sc->ctxptr;
+   vdata = (struct lio_version *)sc->virtdptr;
+
+   *((u64 *)vdata) = 0;
+   vdata->major = cpu_to_be16(LIQUIDIO_BASE_MAJOR_VERSION);
+   vdata->minor = cpu_to_be16(LIQUIDIO_BASE_MINOR_VERSION);
+   vdata->micro = cpu_to_be16(LIQUIDIO_BASE_MICRO_VERSION);
 
num_iqueues =
CFG_GET_NUM_TXQS_NIC_IF(octeon_get_conf(octeon_dev), i);
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h 
b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 199a8b9..11df55a 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -30,10 +30,24 @@
 
 #include "octeon_config.h"
 
-#define LIQUIDIO_BASE_VERSION   "1.4"
-#define LIQUIDIO_MICRO_VERSION  ".1"
 #define LIQUIDIO_PACKAGE ""
-#define LIQUIDIO_VERSION  "1.4.1"
+#define LIQUIDIO_BASE_MAJOR_VERSION 1
+#define LIQUIDIO_BASE_MINOR_VERSION 4
+#define LIQUIDIO_BASE_MICRO_VERSION 1
+#define LIQUIDIO_BASE_VERSION   __stringify(LIQUIDIO_BASE_MAJOR_VERSION) "." \
+   __stringify(LIQUIDIO_BASE_MINOR_VERSION)
+#define LIQUIDIO_MICRO_VERSION  "." __stringify(LIQUIDIO_BASE_MICRO_VERSION)
+#define LIQUIDIO_VERSION LIQUIDIO_PACKAGE \
+   __stringify(LIQUIDIO_BASE_MAJOR_VERSION) "." \
+   __stringify(LIQUIDIO_BASE_MINOR_VERSION) \
+   "." __stringify(LIQUIDIO_BASE_MICRO_VERSION)
+
+struct lio_version {
+   u16  major;
+   u16  minor;
+   u16  micro;
+   u16  reserved;
+};
 
 #define CONTROL_IQ 0
 /** Tag types used by Octeon cores in its work. */
-- 
1.8.3.1

[PATCH net-next V4 00/10] liquidio CN23XX support

2016-08-31 Thread Raghu Vatsavayi

Dave,

The following patchset adds support for the new device "CN23XX" in
the liquidio family of adapters. As you advised, I have split
the previous V3 series of 18 patches into two halves. This
first patchset has the first 10 patches, which are tested against
net-next. I will post the second half after this one.

This V4 series also addresses all the comments from the previous
submission:
1) Avoid busy loop while reading registers.
2) Other minor comments about debug messages and constants.

Please apply the patches in the following order, as some of the
patches depend on earlier ones.


Raghu Vatsavayi (10):
  liquidio: Consolidate common functionality
  liquidio: Firmware version management
  liquidio: Common enable irq function
  liquidio: CN23XX register definitions
  liquidio: CN23XX queue definitions
  liquidio: CN23XX device init and sriov config
  liquidio: CN23XX register setup
  liquidio: CN23XX queue manipulation
  liquidio: MSIX support for CN23XX
  liquidio: CN23XX firmware download

 drivers/net/ethernet/cavium/Kconfig|2 +-
 drivers/net/ethernet/cavium/liquidio/Makefile  |   24 +-
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 1171 
 .../ethernet/cavium/liquidio/cn23xx_pf_device.h|   57 +
 .../net/ethernet/cavium/liquidio/cn23xx_pf_regs.h  |  604 ++
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   |   45 +-
 .../net/ethernet/cavium/liquidio/cn66xx_device.h   |7 +-
 .../net/ethernet/cavium/liquidio/cn68xx_device.c   |1 -
 drivers/net/ethernet/cavium/liquidio/lio_core.c|  261 +
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c |   18 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  766 +++--
 .../net/ethernet/cavium/liquidio/liquidio_common.h |   22 +-
 .../net/ethernet/cavium/liquidio/octeon_config.h   |   59 +-
 .../net/ethernet/cavium/liquidio/octeon_console.c  |  117 +-
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  302 +++--
 .../net/ethernet/cavium/liquidio/octeon_device.h   |  100 +-
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c |   33 +-
 drivers/net/ethernet/cavium/liquidio/octeon_droq.h |2 +
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |2 +
 drivers/net/ethernet/cavium/liquidio/octeon_main.h |   24 +-
 .../net/ethernet/cavium/liquidio/octeon_mem_ops.c  |1 -
 .../net/ethernet/cavium/liquidio/octeon_network.h  |2 -
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  |8 +-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  |2 +-
 .../net/ethernet/cavium/liquidio/request_manager.c |3 +
 25 files changed, 3022 insertions(+), 611 deletions(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_core.c

-- 
1.8.3.1

[PATCH net-next V4 03/10] liquidio: Common enable irq function

2016-08-31 Thread Raghu Vatsavayi

Add a common irq enable function for both the iq (instruction
queue) and the oq (output queue).

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  1 +
 .../net/ethernet/cavium/liquidio/liquidio_common.h |  2 +-
 .../net/ethernet/cavium/liquidio/octeon_device.c   | 17 +++
 .../net/ethernet/cavium/liquidio/octeon_device.h   |  2 ++
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 33 +-
 drivers/net/ethernet/cavium/liquidio/octeon_droq.h |  2 ++
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |  2 ++
 .../net/ethernet/cavium/liquidio/request_manager.c |  3 ++
 8 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 1bbeae8..8f11a0b 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -192,6 +192,7 @@ static void octeon_droq_bh(unsigned long pdev)
continue;
reschedule |= octeon_droq_process_packets(oct, oct->droq[q_no],
  MAX_PACKET_BUDGET);
+   lio_enable_irq(oct->droq[q_no], NULL);
}
 
if (reschedule)
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h 
b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 11df55a..8ffd3b8 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -846,7 +846,7 @@ struct oct_mdio_cmd {
 /* intrmod: max. packets to trigger interrupt */
 #define LIO_INTRMOD_RXMAXCNT_TRIGGER   384
 /* intrmod: min. packets to trigger interrupt */
-#define LIO_INTRMOD_RXMINCNT_TRIGGER   1
+#define LIO_INTRMOD_RXMINCNT_TRIGGER   0
 /* intrmod: max. time to trigger interrupt */
 #define LIO_INTRMOD_RXMAXTMR_TRIGGER   128
 /* 66xx:intrmod: min. time to trigger interrupt
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index cff845c..541137a 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -1122,3 +1122,20 @@ int lio_get_device_id(void *dev)
return octeon_dev->octeon_id;
return -1;
 }
+
+void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq)
+{
+   /* the whole thing needs to be atomic, ideally */
+   if (droq) {
+   spin_lock_bh(&droq->lock);
+   writel(droq->pkt_count, droq->pkts_sent_reg);
+   droq->pkt_count = 0;
+   spin_unlock_bh(&droq->lock);
+   }
+   if (iq) {
+   spin_lock_bh(&iq->lock);
+   writel(iq->pkt_in_done, iq->inst_cnt_reg);
+   iq->pkt_in_done = 0;
+   spin_unlock_bh(&iq->lock);
+   }
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h 
b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
index d1251f4..02e9854 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
@@ -660,6 +660,8 @@ void *oct_get_config_info(struct octeon_device *oct, u16 
card_type);
  */
 struct octeon_config *octeon_get_conf(struct octeon_device *oct);
 
+void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq);
+
 /* LiquidIO driver private flags */
 enum {
OCT_PRIV_FLAG_TX_BYTES = 0, /* Tx interrupts by pending byte count */
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
index e0afe4c..5dfc23d 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
@@ -92,22 +92,25 @@ static inline void *octeon_get_dispatch_arg(struct 
octeon_device *octeon_dev,
return fn_arg;
 }
 
-/** Check for packets on Droq. This function should be called with
- * lock held.
+/** Check for packets on Droq. This function should be called with lock held.
  *  @param  droq - Droq on which count is checked.
  *  @return Returns packet count.
  */
 u32 octeon_droq_check_hw_for_pkts(struct octeon_droq *droq)
 {
u32 pkt_count = 0;
+   u32 last_count;
 
pkt_count = readl(droq->pkts_sent_reg);
-   if (pkt_count) {
-   atomic_add(pkt_count, &droq->pkts_pending);
-   writel(pkt_count, droq->pkts_sent_reg);
-   }
 
-   return pkt_count;
+   last_count = pkt_count - droq->pkt_count;
+   droq->pkt_count = pkt_count;
+
+   /* we shall write to cnts  at napi irq enable or end of droq tasklet */
+   if (last_count)
+
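
The delta accounting introduced in octeon_droq_check_hw_for_pkts() above can be modelled in a few lines of standalone C. This is a rough sketch, not the driver code: `droq_sketch` and `droq_account()` are hypothetical names, and the hardware register read is replaced by a plain `hw_count` argument. It shows that subtracting the previously seen counter value from the current one, both as unsigned 32-bit values, yields the number of new packets even when the raw counter has wrapped past zero in between.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced model of the per-queue state used for accounting. */
struct droq_sketch {
	uint32_t pkt_count;    /* last raw counter value seen */
	uint32_t pkts_pending; /* packets waiting to be processed */
};

/* Returns the number of packets that arrived since the previous call. */
uint32_t droq_account(struct droq_sketch *q, uint32_t hw_count)
{
	/* Unsigned subtraction is modulo 2^32, so a wrapped counter still works. */
	uint32_t new_pkts = hw_count - q->pkt_count;

	q->pkt_count = hw_count;
	q->pkts_pending += new_pkts;
	return new_pkts;
}
```

For example, if the last value seen was 0xFFFFFFF0 and the next read returns 0x10, the function reports 0x20 new packets. The sketch assumes fewer than 2^32 packets arrive between two reads.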

[PATCH net-next V4 08/10] liquidio: CN23XX queue manipulation

2016-08-31 Thread Raghu Vatsavayi

This patch adds support for cn23xx queue manipulation.

Signed-off-by: Derek Chickles 
Signed-off-by: Satanand Burla 
Signed-off-by: Felix Manlunas 
Signed-off-by: Raghu Vatsavayi 
---
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 213 +
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   |   4 +-
 .../net/ethernet/cavium/liquidio/cn66xx_device.h   |   2 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  12 +-
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   2 +-
 5 files changed, 225 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index d614b0a..7e932a3 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -311,6 +311,61 @@ static void cn23xx_setup_global_mac_regs(struct 
octeon_device *oct)
(oct, CN23XX_SLI_PKT_MAC_RINFO64(mac_no, pf_num)));
 }
 
+static int cn23xx_reset_io_queues(struct octeon_device *oct)
+{
+   int ret_val = 0;
+   u64 d64;
+   u32 q_no, srn, ern;
+   u32 loop = 1000;
+
+   srn = oct->sriov_info.pf_srn;
+   ern = srn + oct->sriov_info.num_pf_rings;
+
+   /*As per HRM reg description, s/w can't write 0 to ENB. */
+   /*to make the queue off, need to set the RST bit. */
+
+   /* Reset the Enable bit for all the 64 IQs.  */
+   for (q_no = srn; q_no < ern; q_no++) {
+   /* set RST bit to 1. This bit applies to both IQ and OQ */
+   d64 = octeon_read_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
+   d64 = d64 | CN23XX_PKT_INPUT_CTL_RST;
+   octeon_write_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no), d64);
+   }
+
+   /*wait until the RST bit is clear or the RST and quiet bits are set*/
+   for (q_no = srn; q_no < ern; q_no++) {
+   u64 reg_val = octeon_read_csr64(oct,
+   CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
+   while ((READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) &&
+  !(READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_QUIET) &&
+  loop--) {
+   WRITE_ONCE(reg_val, octeon_read_csr64(
+   oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no)));
+   }
+   if (!loop) {
+   dev_err(&oct->pci_dev->dev,
+   "clearing the reset reg failed or setting the 
quiet reg failed for qno: %u\n",
+   q_no);
+   return -1;
+   }
+   WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
+   ~CN23XX_PKT_INPUT_CTL_RST);
+   octeon_write_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
+  READ_ONCE(reg_val));
+
+   WRITE_ONCE(reg_val, octeon_read_csr64(
+  oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no)));
+   if (READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) {
+   dev_err(&oct->pci_dev->dev,
+   "clearing the reset failed for qno: %u\n",
+   q_no);
+   ret_val = -1;
+   }
+   }
+
+   return ret_val;
+}
+
 static int cn23xx_pf_setup_global_input_regs(struct octeon_device *oct)
 {
u32 q_no, ern, srn;
@@ -324,6 +379,9 @@ static int cn23xx_pf_setup_global_input_regs(struct 
octeon_device *oct)
srn = oct->sriov_info.pf_srn;
ern = srn + oct->sriov_info.num_pf_rings;
 
+   if (cn23xx_reset_io_queues(oct))
+   return -1;
+
/** Set the MAC_NUM and PVF_NUM in IQ_PKT_CONTROL reg
* for all queues.Only PF can set these bits.
* bits 29:30 indicate the MAC num.
@@ -552,6 +610,158 @@ static void cn23xx_setup_oq_regs(struct octeon_device 
*oct, u32 oq_no)
 reg_val);
 }
 
+static int cn23xx_enable_io_queues(struct octeon_device *oct)
+{
+   u64 reg_val;
+   u32 srn, ern, q_no;
+   u32 loop = 1000;
+
+   srn = oct->sriov_info.pf_srn;
+   ern = srn + oct->num_iqs;
+
+   for (q_no = srn; q_no < ern; q_no++) {
+   /* set the corresponding IQ IS_64B bit */
+   if (oct->io_qmask.iq64B & BIT_ULL(q_no - srn)) {
+   reg_val = octeon_read_csr64(
+   oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
+   reg_val = reg_val | CN23XX_PKT_INPUT_CTL_IS_64B;
+   octeon_write_csr64(
+   oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no), reg_val);
+   }
+
+   /* set the corresponding IQ ENB bit */
+   if (oct->io_qmask.iq & BIT_ULL(q_no - srn)) {
+
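
The wait in cn23xx_reset_io_queues() above (poll until the RST bit clears or the QUIET bit is set, giving up after a fixed number of reads) can be sketched as a small standalone helper. Everything named here is an assumption made for illustration: `SKETCH_CTL_RST` and `SKETCH_CTL_QUIET` use made-up bit positions, `read_reg_fn` stands in for the CSR read, and `fake_reg` / `fake_read()` simulate a register that stays busy for a few reads. The sketch gives each wait its own poll budget and returns an explicit success flag, so the caller does not have to infer a timeout from the counter value left behind by a post-decrement in the loop condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real masks live in the driver headers. */
#define SKETCH_CTL_RST   (1ULL << 0)
#define SKETCH_CTL_QUIET (1ULL << 1)

/* Stand-in for a register read such as a 64-bit CSR access. */
typedef uint64_t (*read_reg_fn)(void *ctx);

/* Polls until RST is clear or QUIET is set; returns false on timeout. */
bool wait_reset_done(read_reg_fn read_reg, void *ctx, uint32_t max_polls)
{
	uint32_t i;

	for (i = 0; i < max_polls; i++) {
		uint64_t val = read_reg(ctx);

		if (!(val & SKETCH_CTL_RST) || (val & SKETCH_CTL_QUIET))
			return true;
	}
	return false;
}

/* Fake register: reports RST only for busy_reads reads, then RST | QUIET. */
struct fake_reg {
	uint32_t busy_reads;
};

uint64_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->busy_reads) {
		r->busy_reads--;
		return SKETCH_CTL_RST;
	}
	return SKETCH_CTL_RST | SKETCH_CTL_QUIET;
}
```

A caller would invoke `wait_reset_done()` once per queue with a fresh budget, for instance `wait_reset_done(fake_read, &reg, 1000)`, and treat a `false` return as the error path.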
