Re: [PATCH v3 0/5] net/usb: asix driver improvements
From: robert.f...@collabora.com
Date: Mon, 29 Aug 2016 09:32:14 -0400

> This is a resubmission of v3, since the netdev
> mailing list was not sent the previous submission.
>
> This series improves power management of the asix driver.
 ...

Series applied, thanks.
Re: [PATCH net 0/3] qed*: dcbx fix series.
From: Sudarsana Reddy Kalluru
Date: Mon, 29 Aug 2016 08:29:51 -0400

> The series contains several small fixes for qed* dcbx module.

Series applied, thanks.
Re: [PATCH] mISDN: mark symbols static where possible
Three different patches all with the same Subject line, so I can't apply this stuff. You must make the subject lines unique so that someone reading the "git shortlog" can tell what is different in each change.
Re: [Patch net] kcm: fix a socket double free
From: Cong Wang
Date: Sun, 28 Aug 2016 21:28:26 -0700

> Dmitry reported a double free on kcm socket, which could
> be easily reproduced by:
>
>   #include
>   #include
>
>   int main()
>   {
>     int fd = syscall(SYS_socket, 0x29ul, 0x5ul, 0x0ul, 0, 0, 0);
>     syscall(SYS_ioctl, fd, 0x89e2ul, 0x20a98000ul, 0, 0, 0);
>     return 0;
>   }
>
> This is because on the error path, after we install
> the new socket file, we call sock_release() to clean
> up the socket, which leaves the fd pointing to a freed
> socket. Fix this by calling sys_close() on that fd
> directly.
>
> Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
> Reported-by: Dmitry Vyukov
> Cc: Tom Herbert
> Signed-off-by: Cong Wang

Applied and queued up for -stable, thanks.
Re: [PATCH v2] ipv6: Use inbound ifaddr as source addresses for ICMPv6 errors
From: Eli Cooper
Date: Sun, 28 Aug 2016 11:34:06 +0800

> According to RFC 1885 2.2(c), the source address of ICMPv6
> errors in response to forwarded packets should be set to the
> unicast address of the forwarding interface in order to be helpful
> in diagnosis. Currently the selection of source address is based
> on the default route, without respect to the inbound interface.
>
> This patch sets the source address of ICMPv6 error messages to
> the address of inbound interface, with the exception of
> 'time exceeded' and 'packet too big' messages sent in ip6_forward(),
> where the address of OUTPUT device is forced as source address
> (however, it is NOT enforced as claimed without this patch).
>
> Signed-off-by: Eli Cooper

Please resubmit with an updated commit message describing the use case.
Re: [PATCH net v4 0/9] net: ethernet: mediatek: a couple of fixes
From:
Date: Thu, 1 Sep 2016 10:47:26 +0800

> a couple of fixes come out from integrating with linux-4.8 rc1
> they all are verified and workable on linux-4.8 rc1

Series applied.
Re: [PATCH 4/5] r8152: constify ethtool_ops structures
From: Julia Lawall
Date: Thu, 1 Sep 2016 00:21:22 +0200

> Check for ethtool_ops structures that are only stored in the ethtool_ops
> field of a net_device structure or passed as the second argument to
> netdev_set_default_ethtool_ops. These contexts are declared const, so
> ethtool_ops structures that have these properties can be declared as const
> also.
>
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
 ...
> Suggested-by: Stephen Hemminger
>
> Signed-off-by: Julia Lawall

Applied.
Re: [PATCH 5/5] net: axienet: constify ethtool_ops structures
From: Julia Lawall
Date: Thu, 1 Sep 2016 00:21:23 +0200

> Check for ethtool_ops structures that are only stored in the ethtool_ops
> field of a net_device structure or passed as the second argument to
> netdev_set_default_ethtool_ops. These contexts are declared const, so
> ethtool_ops structures that have these properties can be declared as const
> also.
>
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
 ...
> Suggested-by: Stephen Hemminger
>
> Signed-off-by: Julia Lawall

Applied.
Re: [PATCH 1/5] net: mediatek: constify ethtool_ops structures
From: Julia Lawall
Date: Thu, 1 Sep 2016 00:21:19 +0200

> Check for ethtool_ops structures that are only stored in the ethtool_ops
> field of a net_device structure or passed as the second argument to
> netdev_set_default_ethtool_ops. These contexts are declared const, so
> ethtool_ops structures that have these properties can be declared as const
> also.
 ...
> Suggested-by: Stephen Hemminger
>
> Signed-off-by: Julia Lawall

Applied.
Re: [PATCH net-next 00/12] net: Convert vrf from dst to tx hook
From: David Ahern
Date: Wed, 31 Aug 2016 17:14:13 -0600

> please drop this series. BGP smoke tests triggered a couple of
> problems I need to resolve.

Ok.
Re: [RFC] xgbe: constify get_netdev_ops and get_ethtool_ops
On 08/31/2016 04:17 PM, David Miller wrote:
> From: Stephen Hemminger
> Date: Wed, 31 Aug 2016 08:57:36 -0700
>
>> Casting away const is bad practice. Since this is ARM specific driver
>> don't have hardware actually test this.
>>
>> Having getter functions for ops is really unnecessary code bloat, but
>> not going to touch that.
>>
>> Signed-off-by: Stephen Hemminger
>
> I'll just apply this, let's see what happens.

I should be able to test this in the next few days. I don't expect
there to be an issue. I'll let you know what I find.

Thanks,
Tom
[PATCH net v4 5/9] net: ethernet: mediatek: fix logic unbalance between probe and remove
From: Sean Wang

original mdio_cleanup is not in the symmetric place against where
mdio_init is, so relocate mdio_cleanup to the right one.

Signed-off-by: Sean Wang
Acked-by: John Crispin
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 1ffde91..bf5b7e1 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1511,7 +1511,6 @@ static void mtk_uninit(struct net_device *dev)
 	struct mtk_eth *eth = mac->hw;
 
 	phy_disconnect(mac->phy_dev);
-	mtk_mdio_cleanup(eth);
 	mtk_irq_disable(eth, ~0);
 }
 
@@ -1916,6 +1915,7 @@ static int mtk_remove(struct platform_device *pdev)
 	netif_napi_del(&eth->tx_napi);
 	netif_napi_del(&eth->rx_napi);
 	mtk_cleanup(eth);
+	mtk_mdio_cleanup(eth);
 	platform_set_drvdata(pdev, NULL);
 
 	return 0;
--
1.9.1
[PATCH net v4 9/9] net: ethernet: mediatek: fix error handling inside mtk_mdio_init
From: Sean Wang

Return -ENODEV if the MDIO bus is disabled in the device tree.

Signed-off-by: Sean Wang
Acked-by: John Crispin
Reviewed-by: Andrew Lunn
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 0367f51..d919915 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -304,7 +304,7 @@ static int mtk_mdio_init(struct mtk_eth *eth)
 	}
 
 	if (!of_device_is_available(mii_np)) {
-		ret = 0;
+		ret = -ENODEV;
 		goto err_put_node;
 	}
 
--
1.9.1
[PATCH net v4 3/9] net: ethernet: mediatek: fix API usage with skb_free_frag
From: Sean Wang

use skb_free_frag() instead of legacy put_page()

Signed-off-by: Sean Wang
Acked-by: John Crispin
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index a5dcf57..c9e25a7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -870,7 +870,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		/* receive data */
 		skb = build_skb(data, ring->frag_size);
 		if (unlikely(!skb)) {
-			put_page(virt_to_head_page(new_data));
+			skb_free_frag(new_data);
 			netdev->stats.rx_dropped++;
 			goto release_desc;
 		}
--
1.9.1
[PATCH net v4 1/9] net: ethernet: mediatek: fix fails from TX housekeeping due to incorrect port setup
From: Sean Wang

which net device the SKB is complete for depends on the forward port
on txd4 on the corresponding TX descriptor, but the information isn't
set up well in case of SKB fragments that would lead to watchdog timeout
from the upper layer, so fix it up.

Signed-off-by: Sean Wang
Acked-by: John Crispin
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index f160954..7fc2ff0 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -588,14 +588,15 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	dma_addr_t mapped_addr;
 	unsigned int nr_frags;
 	int i, n_desc = 1;
-	u32 txd4 = 0;
+	u32 txd4 = 0, fport;
 
 	itxd = ring->next_free;
 	if (itxd == ring->last_free)
 		return -ENOMEM;
 
 	/* set the forward port */
-	txd4 |= (mac->id + 1) << TX_DMA_FPORT_SHIFT;
+	fport = (mac->id + 1) << TX_DMA_FPORT_SHIFT;
+	txd4 |= fport;
 
 	tx_buf = mtk_desc_to_tx_buf(ring, itxd);
 	memset(tx_buf, 0, sizeof(*tx_buf));
@@ -653,7 +654,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 			WRITE_ONCE(txd->txd3, (TX_DMA_SWC |
 					       TX_DMA_PLEN0(frag_map_size) |
 					       last_frag * TX_DMA_LS0));
-			WRITE_ONCE(txd->txd4, 0);
+			WRITE_ONCE(txd->txd4, fport);
 
 			tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
 			tx_buf = mtk_desc_to_tx_buf(ring, txd);
--
1.9.1
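
The effect of the fix above can be sketched in userspace C. This is an illustration only, not driver code: the field position (bit 25) and 3-bit mask are assumptions made for the example (the real TX_DMA_FPORT_* values live in mtk_eth_soc.h), and model_fill_txd4() and model_txd4_to_mac() are hypothetical helpers. The point is that every descriptor of a fragmented SKB carries the forward port, so a lookup on any of them resolves to the same MAC.

```c
#include <stdint.h>

/* Assumed layout, for illustration only; see mtk_eth_soc.h for real values. */
#define MODEL_FPORT_SHIFT 25
#define MODEL_FPORT_MASK  0x7u

/* Hypothetical helper: fill txd4 for a head descriptor and its fragments. */
void model_fill_txd4(uint32_t *txd4, int n_desc, int mac_id,
		     uint32_t head_flags)
{
	uint32_t fport = (uint32_t)(mac_id + 1) << MODEL_FPORT_SHIFT;
	int i;

	if (n_desc <= 0)
		return;

	/* Head descriptor: extra flags plus the forward port. */
	txd4[0] = head_flags | fport;

	/* Fragment descriptors: before the fix these were written as 0. */
	for (i = 1; i < n_desc; i++)
		txd4[i] = fport;
}

/* Hypothetical helper: recover the MAC index; -1 means no port was set. */
int model_txd4_to_mac(uint32_t txd4)
{
	return (int)((txd4 >> MODEL_FPORT_SHIFT) & MODEL_FPORT_MASK) - 1;
}
```

A descriptor left at 0 decodes to -1 in this model, which corresponds to the situation the commit message describes: completion work cannot be attributed to the right net device.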
[PATCH net v4 6/9] net: ethernet: mediatek: fix issue of driver removal with interface is up
From: Sean Wang

mtk_stop() must be called to stop for freeing DMA resources acquired
and restoring state changed by mtk_open() firstly when module removal.

Signed-off-by: Sean Wang
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index bf5b7e1..556951e 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1906,6 +1906,14 @@ err_free_dev:
 static int mtk_remove(struct platform_device *pdev)
 {
 	struct mtk_eth *eth = platform_get_drvdata(pdev);
+	int i;
+
+	/* stop all devices to make sure that dma is properly shut down */
+	for (i = 0; i < MTK_MAC_COUNT; i++) {
+		if (!eth->netdev[i])
+			continue;
+		mtk_stop(eth->netdev[i]);
+	}
 
 	clk_disable_unprepare(eth->clks[MTK_CLK_ETHIF]);
 	clk_disable_unprepare(eth->clks[MTK_CLK_ESW]);
--
1.9.1
[PATCH net v4 7/9] net: ethernet: mediatek: fix the missing of_node_put() after node is used done inside mtk_mdio_init
From: Sean Wang

This patch adds the missing of_node_put() after finishing the usage
of of_get_child_by_name.

Signed-off-by: Sean Wang
Acked-by: John Crispin
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 556951e..409efcf 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -324,6 +324,7 @@ static int mtk_mdio_init(struct mtk_eth *eth)
 	err = of_mdiobus_register(eth->mii_bus, mii_np);
 	if (err)
 		goto err_free_bus;
+	of_node_put(mii_np);
 
 	return 0;
 
--
1.9.1
[PATCH net v4 8/9] net: ethernet: mediatek: use devm_mdiobus_alloc instead of mdiobus_alloc inside mtk_mdio_init
From: Sean Wang

a lot of parts in the driver uses devm_* APIs to gain benefits from the
device resource management, so devm_mdiobus_alloc is also used instead
of mdiobus_alloc to have more elegant code flow.

Using common code provided by the devm_* helps to
1) have simplified the code flow as [1] says
2) decrease the risk of incorrect error handling by human
3) only a few drivers used it since it was proposed on linux 3.16,
   so just hope to promote for this.

Ref:
[1] https://patchwork.ozlabs.org/patch/344093/

Signed-off-by: Sean Wang
Reviewed-by: Andrew Lunn
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 409efcf..0367f51 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -295,7 +295,7 @@ err_phy:
 static int mtk_mdio_init(struct mtk_eth *eth)
 {
 	struct device_node *mii_np;
-	int err;
+	int ret;
 
 	mii_np = of_get_child_by_name(eth->dev->of_node, "mdio-bus");
 	if (!mii_np) {
@@ -304,13 +304,13 @@ static int mtk_mdio_init(struct mtk_eth *eth)
 	}
 
 	if (!of_device_is_available(mii_np)) {
-		err = 0;
+		ret = 0;
 		goto err_put_node;
 	}
 
-	eth->mii_bus = mdiobus_alloc();
+	eth->mii_bus = devm_mdiobus_alloc(eth->dev);
 	if (!eth->mii_bus) {
-		err = -ENOMEM;
+		ret = -ENOMEM;
 		goto err_put_node;
 	}
 
@@ -321,20 +321,11 @@ static int mtk_mdio_init(struct mtk_eth *eth)
 	eth->mii_bus->parent = eth->dev;
 
 	snprintf(eth->mii_bus->id, MII_BUS_ID_SIZE, "%s", mii_np->name);
-	err = of_mdiobus_register(eth->mii_bus, mii_np);
-	if (err)
-		goto err_free_bus;
-	of_node_put(mii_np);
-
-	return 0;
-
-err_free_bus:
-	mdiobus_free(eth->mii_bus);
+	ret = of_mdiobus_register(eth->mii_bus, mii_np);
 
 err_put_node:
 	of_node_put(mii_np);
-	eth->mii_bus = NULL;
-	return err;
+	return ret;
 }
 
 static void mtk_mdio_cleanup(struct mtk_eth *eth)
@@ -343,8 +334,6 @@ static void mtk_mdio_cleanup(struct mtk_eth *eth)
 		return;
 
 	mdiobus_unregister(eth->mii_bus);
-	of_node_put(eth->mii_bus->dev.of_node);
-	mdiobus_free(eth->mii_bus);
 }
 
 static inline void mtk_irq_disable(struct mtk_eth *eth, u32 mask)
--
1.9.1
[PATCH net v4 4/9] net: ethernet: mediatek: remove redundant free_irq for devm_request_irq allocated irq
From: Sean Wang

these irqs are not used for shared irq and disabled during ethernet stops.
irq requested by devm_request_irq is safe to be freed automatically on
driver detach.

Signed-off-by: Sean Wang
Acked-by: John Crispin
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index c9e25a7..1ffde91 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1513,8 +1513,6 @@ static void mtk_uninit(struct net_device *dev)
 	phy_disconnect(mac->phy_dev);
 	mtk_mdio_cleanup(eth);
 	mtk_irq_disable(eth, ~0);
-	free_irq(eth->irq[1], dev);
-	free_irq(eth->irq[2], dev);
 }
 
 static int mtk_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
--
1.9.1
[PATCH net v4 2/9] net: ethernet: mediatek: fix incorrect return value of devm_clk_get with EPROBE_DEFER
From: Sean Wang

1) If the return value of devm_clk_get is EPROBE_DEFER, we should
   defer probing the driver. The change is verified and works based
   on 4.8-rc1 staying with the latest clk-next code for MT7623.
2) Changing with the usage of loops to work out if all clocks
   required are fine

Signed-off-by: Sean Wang
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 39 -
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 22 ++--
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 7fc2ff0..a5dcf57 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -50,6 +50,10 @@ static const struct mtk_ethtool_stats {
 	MTK_ETHTOOL_STAT(rx_flow_control_packets),
 };
 
+static const char * const mtk_clks_source_name[] = {
+	"ethif", "esw", "gp1", "gp2"
+};
+
 void mtk_w32(struct mtk_eth *eth, u32 val, unsigned reg)
 {
 	__raw_writel(val, eth->base + reg);
@@ -1814,6 +1818,7 @@ static int mtk_probe(struct platform_device *pdev)
 	if (!eth)
 		return -ENOMEM;
 
+	eth->dev = &pdev->dev;
 	eth->base = devm_ioremap_resource(&pdev->dev, res);
 	if (IS_ERR(eth->base))
 		return PTR_ERR(eth->base);
@@ -1848,21 +1853,21 @@ static int mtk_probe(struct platform_device *pdev)
 			return -ENXIO;
 		}
 	}
+	for (i = 0; i < ARRAY_SIZE(eth->clks); i++) {
+		eth->clks[i] = devm_clk_get(eth->dev,
+					    mtk_clks_source_name[i]);
+		if (IS_ERR(eth->clks[i])) {
+			if (PTR_ERR(eth->clks[i]) == -EPROBE_DEFER)
+				return -EPROBE_DEFER;
+			return -ENODEV;
+		}
+	}
 
-	eth->clk_ethif = devm_clk_get(&pdev->dev, "ethif");
-	eth->clk_esw = devm_clk_get(&pdev->dev, "esw");
-	eth->clk_gp1 = devm_clk_get(&pdev->dev, "gp1");
-	eth->clk_gp2 = devm_clk_get(&pdev->dev, "gp2");
-	if (IS_ERR(eth->clk_esw) || IS_ERR(eth->clk_gp1) ||
-	    IS_ERR(eth->clk_gp2) || IS_ERR(eth->clk_ethif))
-		return -ENODEV;
-
-	clk_prepare_enable(eth->clk_ethif);
-	clk_prepare_enable(eth->clk_esw);
-	clk_prepare_enable(eth->clk_gp1);
-	clk_prepare_enable(eth->clk_gp2);
+	clk_prepare_enable(eth->clks[MTK_CLK_ETHIF]);
+	clk_prepare_enable(eth->clks[MTK_CLK_ESW]);
+	clk_prepare_enable(eth->clks[MTK_CLK_GP1]);
+	clk_prepare_enable(eth->clks[MTK_CLK_GP2]);
 
-	eth->dev = &pdev->dev;
 	eth->msg_enable = netif_msg_init(mtk_msg_level, MTK_DEFAULT_MSG_ENABLE);
 	INIT_WORK(&eth->pending_work, mtk_pending_work);
 
@@ -1905,10 +1910,10 @@ static int mtk_remove(struct platform_device *pdev)
 {
 	struct mtk_eth *eth = platform_get_drvdata(pdev);
 
-	clk_disable_unprepare(eth->clk_ethif);
-	clk_disable_unprepare(eth->clk_esw);
-	clk_disable_unprepare(eth->clk_gp1);
-	clk_disable_unprepare(eth->clk_gp2);
+	clk_disable_unprepare(eth->clks[MTK_CLK_ETHIF]);
+	clk_disable_unprepare(eth->clks[MTK_CLK_ESW]);
+	clk_disable_unprepare(eth->clks[MTK_CLK_GP1]);
+	clk_disable_unprepare(eth->clks[MTK_CLK_GP2]);
 
 	netif_napi_del(&eth->tx_napi);
 	netif_napi_del(&eth->rx_napi);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index f82e3ac..6e1ade7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -290,6 +290,17 @@ enum mtk_tx_flags {
 	MTK_TX_FLAGS_PAGE0	= 0x02,
 };
 
+/* This enum allows us to identify how the clock is defined on the array of the
+ * clock in the order
+ */
+enum mtk_clks_map {
+	MTK_CLK_ETHIF,
+	MTK_CLK_ESW,
+	MTK_CLK_GP1,
+	MTK_CLK_GP2,
+	MTK_CLK_MAX
+};
+
 /* struct mtk_tx_buf - This struct holds the pointers to the memory pointed at
  * by the TX descriptors
  * @skb:		The SKB pointer of the packet being sent
@@ -370,10 +381,7 @@ struct mtk_rx_ring {
  * @scratch_ring:	Newer SoCs need memory for a second HW managed TX ring
  * @phy_scratch_ring:	physical address of scratch_ring
  * @scratch_head:	The scratch memory that scratch_ring points to.
- * @clk_ethif:		The ethif clock
- * @clk_esw:		The switch clock
- * @clk_gp1:		The gmac1 clock
- * @clk_gp2:		The gmac2 clock
+ * @clks:		clock array for all clocks required
  * @mii_bus:		If there is a bus we need to create an instance for it
  * @pending_work:	The workqueue used to reset the dma ring
  */
@@ -400,10 +408,8 @@ struct mtk_eth {
 	struct mtk_tx_dma		*scratch_ring;
 	dma_addr_t			phy_scratch_ring;
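
The error mapping in the new clock loop can be modelled outside the kernel. The sketch below is an illustration under stated assumptions: MODEL_EPROBE_DEFER copies the kernel-internal value 517 because userspace <errno.h> does not define EPROBE_DEFER, and model_clk_probe() with its array of per-clock results is a hypothetical stand-in for the devm_clk_get()/PTR_ERR() pairs in mtk_probe().

```c
#include <errno.h>

/* Kernel-internal errno value, mirrored here because userspace lacks it. */
#define MODEL_EPROBE_DEFER 517

/*
 * Hypothetical model: clk_results[i] is 0 when clock i was found, or a
 * negative errno when the lookup failed. Returns what probe would return.
 */
int model_clk_probe(const int *clk_results, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (clk_results[i] < 0) {
			/* Provider not ready yet: ask to be probed again later. */
			if (clk_results[i] == -MODEL_EPROBE_DEFER)
				return -MODEL_EPROBE_DEFER;
			/* Any other failure is reported as a missing device. */
			return -ENODEV;
		}
	}
	return 0;
}
```

The first failing clock decides the result, so a deferral is only propagated if no earlier clock failed for a different reason. That matches the loop in the patch as posted.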
[PATCH net v4 0/9] net: ethernet: mediatek: a couple of fixes
From: Sean Wang

a couple of fixes come out from integrating with linux-4.8 rc1
they all are verified and workable on linux-4.8 rc1

Changes since v1:
- usage of loops to work out if all required clock are ready instead
  of tedious coding
- remove redundant pinctrl setup that is already done by core driver
  thanks for careful and patient reviewing by Andrew Lunn
- splitting distinct changes into the separate patches
- change variable naming from err to ret for readable coding

Changes since v2:
- restore to original clock disabling sequence that is changed
  accidentally in the last version
- refine the commit log that would cause misunderstanding what has
  been done in the changes
- refine the commit log that would cause footnote losing due to
  improper delimiter use

Changes since v3:
- fix git rejects caused by mixing a change from net-next, so remake
  the patch set based on the current net branch again.

Sean Wang (9):
  net: ethernet: mediatek: fix fails from TX housekeeping due to incorrect port setup
  net: ethernet: mediatek: fix incorrect return value of devm_clk_get with EPROBE_DEFER
  net: ethernet: mediatek: fix API usage with skb_free_frag
  net: ethernet: mediatek: remove redundant free_irq for devm_request_irq allocated irq
  net: ethernet: mediatek: fix logic unbalance between probe and remove
  net: ethernet: mediatek: fix issue of driver removal with interface is up
  net: ethernet: mediatek: fix the missing of_node_put() after node is used done inside mtk_mdio_init
  net: ethernet: mediatek: use devm_mdiobus_alloc instead of mdiobus_alloc inside mtk_mdio_init
  net: ethernet: mediatek: fix error handling inside mtk_mdio_init

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 82 +++--
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 22 +---
 2 files changed, 56 insertions(+), 48 deletions(-)

--
1.9.1
Re: [PATCH] ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit()
Hello,

On 2016/9/1 4:56, David Miller wrote:
> From: Eli Cooper
> Date: Fri, 26 Aug 2016 23:52:29 +0800
>
>> @@ -1174,6 +1174,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
>>  	encap_limit = t->parms.encap_limit;
>>
>>  	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
>> +	fl6.flowi6_proto = IPPROTO_IPIP;
> Let's just simply have t->fl have the proto setup properly, just like
> in GRE.
>
> Assigning it explicitly every packet transmit doesn't make much sense.

I doubt that. Unlike GRE, where the proto must be IPPROTO_GRE, the proto
here can be either IPPROTO_IPV6 or IPPROTO_IPIP for a single tunnel, and
t->fl is shared by them. Thus it has to be assigned for every packet.

Thanks,
Eli
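
The argument in the reply above can be sketched in a few lines of userspace C. This is an illustration only, not the ip6_tunnel code: struct flow_model and flow_for_packet() are hypothetical names, and the single proto field stands in for flowi6_proto. The shared template is copied per packet and the protocol is chosen from the inner packet type, so the template never has to commit to one value.

```c
#include <netinet/in.h>

/* Hypothetical stand-in for the flow key; only the protocol is modelled. */
struct flow_model {
	int proto;
};

/*
 * Hypothetical helper: build the per-packet flow from the tunnel's shared
 * template. inner_is_ipv4 is non-zero for an IPv4-in-IPv6 packet and zero
 * for an IPv6-in-IPv6 packet.
 */
struct flow_model flow_for_packet(const struct flow_model *tmpl,
				  int inner_is_ipv4)
{
	struct flow_model fl = *tmpl;	/* like memcpy() from t->fl */

	/* One tunnel carries both kinds, so this is decided per packet. */
	fl.proto = inner_is_ipv4 ? IPPROTO_IPIP : IPPROTO_IPV6;
	return fl;
}
```

Because the template is passed as const and copied, two packets of different inner types sent through the same tunnel get different protocol values without touching shared state.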
Re: [PATCH net-next 0/4] xps_flows: XPS flow steering when there is no socket
On Wed, 2016-08-31 at 17:10 -0700, Tom Herbert wrote:

> Tested:
>   Manually forced all packets to go through the xps_flows path.
>   Observed that some flows were deferred to change queues because
>   packets were in flight with the flow bucket.

I did not realize you were ready to submit this new infra !

Please add performance tests and documentation.
( Documentation/networking/scaling.txt should be a nice place )

Unconnected UDP packets are candidates to this selection,
even locally generated, while maybe the applications are pinning their
thread(s) to cpu(s).
TX completion will then happen on multiple cpus.

Not sure about af_packet and/or pktgen ?

- The new hash table is vmalloc()ed on a single NUMA node.
  (in comparison RFS table (per rx queue) can be properly accessed by a
  single cpu servicing queue interrupts)

- Each packet will likely get an additional cache miss in a DDOS
  forwarding workload.

Thanks.
Re: [PATCH v2 net-next 0/6] perf, bpf: add support for bpf in sw/hw perf_events
On Wed, Aug 31, 2016 at 2:50 PM, Alexei Starovoitov wrote:
> Hi Peter, Dave,
>
> this patch set is a follow up to the discussion:
> https://lkml.kernel.org/r/20160804142853.GO6862@twins.programming.kicks-ass.net
> It turned out to be simpler than what we discussed.
>
> Patches 1-3 is bpf-side prep for the main patch 4
> that adds bpf program as an overflow_handler to sw and hw perf_events.
> Peter, please review.
>
> Patches 5 and 6 are examples from myself and Brendan.
>
> v1-v2: fixed issues spotted by Peter and Daniel.

Thanks Alexei!

Tested-by: Brendan Gregg

Brendan
[PATCH net-next 3/4] net: Add xps_dev_flow_table_cnt
Add infrastructure and definitions to create XPS flow tables. This
creates the new sys entry /sys/class/net/eth*/xps_dev_flow_table_cnt

Signed-off-by: Tom Herbert
---
 include/linux/netdevice.h | 24 +
 net/core/net-sysfs.c      | 89 +++
 2 files changed, 113 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d1d748..0164c47 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -736,6 +736,27 @@ struct xps_dev_maps {
     (nr_cpu_ids * sizeof(struct xps_map *)))
 #endif /* CONFIG_XPS */
 
+#ifdef CONFIG_XPS_FLOWS
+struct xps_dev_flow {
+	union {
+		u64 v64;
+		struct {
+			int		queue_index;
+			unsigned int	queue_ptr;
+		};
+	};
+};
+
+struct xps_dev_flow_table {
+	unsigned int mask;
+	struct rcu_head rcu;
+	struct xps_dev_flow flows[0];
+};
+#define XPS_DEV_FLOW_TABLE_SIZE(_num) (sizeof(struct xps_dev_flow_table) + \
+	((_num) * sizeof(struct xps_dev_flow)))
+
+#endif /* CONFIG_XPS_FLOWS */
+
 #define TC_MAX_QUEUE	16
 #define TC_BITMASK	15
 /* HW offloaded queuing disciplines txq count and offset maps */
@@ -1809,6 +1830,9 @@ struct net_device {
 #ifdef CONFIG_XPS
 	struct xps_dev_maps __rcu *xps_maps;
 #endif
+#ifdef CONFIG_XPS_FLOWS
+	struct xps_dev_flow_table __rcu *xps_flow_table;
+#endif
 #ifdef CONFIG_NET_CLS_ACT
 	struct tcf_proto __rcu  *egress_cl_list;
 #endif
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index ab7b0b6..0d00b9c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -503,6 +503,92 @@ static ssize_t phys_switch_id_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(phys_switch_id);
 
+#ifdef CONFIG_XPS_FLOWS
+static void xps_dev_flow_table_release(struct rcu_head *rcu)
+{
+	struct xps_dev_flow_table *table = container_of(rcu,
+	    struct xps_dev_flow_table, rcu);
+	vfree(table);
+}
+
+static int change_xps_dev_flow_table_cnt(struct net_device *dev,
+					 unsigned long count)
+{
+	unsigned long mask;
+	struct xps_dev_flow_table *table, *old_table;
+	static DEFINE_SPINLOCK(xps_dev_flow_lock);
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (count) {
+		mask = count - 1;
+		/* mask = roundup_pow_of_two(count) - 1;
+		 * without overflows...
+		 */
+		while ((mask | (mask >> 1)) != mask)
+			mask |= (mask >> 1);
+		/* On 64 bit arches, must check mask fits in table->mask (u32),
+		 * and on 32bit arches, must check
+		 * XPS_DEV_FLOW_TABLE_SIZE(mask + 1) doesn't overflow.
+		 */
+#if BITS_PER_LONG > 32
+		if (mask > (unsigned long)(u32)mask)
+			return -EINVAL;
+#else
+		if (mask > (ULONG_MAX - XPS_DEV_FLOW_TABLE_SIZE(1))
+				/ sizeof(struct xps_dev_flow)) {
+			/* Enforce a limit to prevent overflow */
+			return -EINVAL;
+		}
+#endif
+		table = vmalloc(XPS_DEV_FLOW_TABLE_SIZE(mask + 1));
+		if (!table)
+			return -ENOMEM;
+
+		table->mask = mask;
+		for (count = 0; count <= mask; count++)
+			table->flows[count].queue_index = -1;
+	} else
+		table = NULL;
+
+	spin_lock(&xps_dev_flow_lock);
+	old_table = rcu_dereference_protected(dev->xps_flow_table,
+			lockdep_is_held(&xps_dev_flow_lock));
+	rcu_assign_pointer(dev->xps_flow_table, table);
+	spin_unlock(&xps_dev_flow_lock);
+
+	if (old_table)
+		call_rcu(&old_table->rcu, xps_dev_flow_table_release);
+
+	return 0;
+}
+
+static ssize_t xps_dev_flow_table_cnt_store(struct device *dev,
+			struct device_attribute *attr,
+			const char *buf, size_t len)
+{
+	return netdev_store(dev, attr, buf, len, change_xps_dev_flow_table_cnt);
+}
+
+static ssize_t xps_dev_flow_table_cnt_show(struct device *dev,
+			struct device_attribute *attr, char *buf)
+{
+	struct net_device *netdev = to_net_dev(dev);
+	struct xps_dev_flow_table *table;
+	unsigned int cnt = 0;
+
+	rcu_read_lock();
+	table = rcu_dereference(netdev->xps_flow_table);
+	if (table)
+		cnt = table->mask + 1;
+	rcu_read_unlock();
+
+	return sprintf(buf, fmt_dec, cnt);
+}
+DEVICE_ATTR_RW(xps_dev_flow_table_cnt);
+#endif /* CONFIG_XPS_FLOWS */
+
 static struct attribute *net_class_attrs[] = {
 	&dev_attr_netdev_group.attr,
 	&dev_attr_type.attr,
@@ -531,6 +617,9
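
The mask computation in change_xps_dev_flow_table_cnt() is easy to try out in userspace. The sketch below lifts just that loop into a hypothetical helper, model_flow_table_mask(); the zero case is handled separately here because the patch treats a count of 0 as "remove the table" and never reaches the loop.

```c
/*
 * Hypothetical helper: return roundup_pow_of_two(count) - 1 using the same
 * bit-smearing loop as the patch, which avoids overflowing on large counts.
 * A count of 0 returns 0 here; the patch frees the table in that case.
 */
unsigned long model_flow_table_mask(unsigned long count)
{
	unsigned long mask;

	if (count == 0)
		return 0;

	mask = count - 1;
	/* Copy every set bit into the bit below it until nothing changes. */
	while ((mask | (mask >> 1)) != mask)
		mask |= (mask >> 1);

	return mask;
}
```

For example, a requested count of 1000 yields a mask of 1023, so the table holds 1024 entries and a lookup can use `hash & mask` rather than a modulo.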
[PATCH net-next 2/4] dql: Add counters for number of queuing and completion operations
Add two new counters to struct dql that are num_enqueue_ops and
num_completed_ops. num_enqueue_ops is incremented by one in each call to
dql_queued. num_completed_ops is incremented in dql_completed which takes
an argument indicating number of operations completed. These counters are
only intended for statistics and do not impact the BQL algorithm.

We add a new sysfs entry in byte_queue_limits named inflight_pkts.
This provides the number of packets in flight for the queue by
dql->num_enqueue_ops - dql->num_completed_ops.

Signed-off-by: Tom Herbert
---
 include/linux/dynamic_queue_limits.h |  7 ++-
 include/linux/netdevice.h            |  2 +-
 lib/dynamic_queue_limits.c           |  3 ++-
 net/core/net-sysfs.c                 | 14 ++
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index a4be703..b6a4804 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -43,6 +43,8 @@ struct dql {
 	unsigned int	adj_limit;		/* limit + num_completed */
 	unsigned int	last_obj_cnt;		/* Count at last queuing */
 
+	unsigned int	num_enqueue_ops;	/* Number of queue operations */
+
 	/* Fields accessed only by completion path (dql_completed) */
 
 	unsigned int	limit ____cacheline_aligned_in_smp; /* Current limit */
@@ -55,6 +57,8 @@ struct dql {
 	unsigned int	lowest_slack;		/* Lowest slack found */
 	unsigned long	slack_start_time;	/* Time slacks seen */
 
+	unsigned int	num_completed_ops;	/* Number of complete ops */
+
 	/* Configuration */
 	unsigned int	max_limit;		/* Max limit */
 	unsigned int	min_limit;		/* Minimum limit */
@@ -83,6 +87,7 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
 	barrier();
 
 	dql->num_queued += count;
+	dql->num_enqueue_ops++;
 }
 
 /* Returns how many objects can be queued, < 0 indicates over limit. */
@@ -92,7 +97,7 @@ static inline int dql_avail(const struct dql *dql)
 }
 
 /* Record number of completed objects and recalculate the limit. */
-void dql_completed(struct dql *dql, unsigned int count);
+void dql_completed(struct dql *dql, unsigned int count, unsigned int ops);
 
 /* Reset dql state */
 void dql_reset(struct dql *dql);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d122be9..0d1d748 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2999,7 +2999,7 @@ static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue,
 	if (unlikely(!bytes))
 		return;
 
-	dql_completed(&dev_queue->dql, bytes);
+	dql_completed(&dev_queue->dql, bytes, pkts);
 
 	/*
 	 * Without the memory barrier there is a small possiblity that
diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c
index f346715..d5e7a27 100644
--- a/lib/dynamic_queue_limits.c
+++ b/lib/dynamic_queue_limits.c
@@ -14,7 +14,7 @@
 #define AFTER_EQ(A, B) ((int)((A) - (B)) >= 0)
 
 /* Records completed count and recalculates the queue limit */
-void dql_completed(struct dql *dql, unsigned int count)
+void dql_completed(struct dql *dql, unsigned int count, unsigned int ops)
 {
 	unsigned int inprogress, prev_inprogress, limit;
 	unsigned int ovlimit, completed, num_queued;
@@ -108,6 +108,7 @@ void dql_completed(struct dql *dql, unsigned int count)
 	dql->prev_ovlimit = ovlimit;
 	dql->prev_last_obj_cnt = dql->last_obj_cnt;
 	dql->num_completed = completed;
+	dql->num_completed_ops += ops;
 	dql->prev_num_queued = num_queued;
 }
 EXPORT_SYMBOL(dql_completed);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 6e4f347..ab7b0b6 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1147,6 +1147,19 @@ static ssize_t bql_show_inflight(struct netdev_queue *queue,
 static struct netdev_queue_attribute bql_inflight_attribute =
 	__ATTR(inflight, S_IRUGO, bql_show_inflight, NULL);
 
+static ssize_t bql_show_inflight_pkts(struct netdev_queue *queue,
+				      struct netdev_queue_attribute *attr,
+				      char *buf)
+{
+	struct dql *dql = &queue->dql;
+
+	return sprintf(buf, "%u\n",
+		       dql->num_enqueue_ops - dql->num_completed_ops);
+}
+
+static struct netdev_queue_attribute bql_inflight_pkts_attribute =
+	__ATTR(inflight_pkts, S_IRUGO, bql_show_inflight_pkts, NULL);
+
 #define BQL_ATTR(NAME, FIELD)						\
 static ssize_t bql_show_ ## NAME(struct netdev_queue *queue,		\
 				 struct netdev_queue_attribute *attr,	\
@@ -1176,6 +1189,7 @@ static struct attribute *dql_attrs[] = {
 	&bql_limit_min_attribute.attr,
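
A small userspace model shows why the inflight_pkts subtraction stays correct when the counters wrap. Everything here is a sketch: struct dql_ops_model and the model_* helpers are hypothetical and keep only the two counters the patch adds, relying on the modular arithmetic C guarantees for unsigned int.

```c
/* Hypothetical model holding only the two counters added by the patch. */
struct dql_ops_model {
	unsigned int num_enqueue_ops;
	unsigned int num_completed_ops;
};

/* One enqueue operation, as in dql_queued(). */
void model_queued(struct dql_ops_model *d)
{
	d->num_enqueue_ops++;
}

/* A batch of completions, as in dql_completed(..., ops). */
void model_completed(struct dql_ops_model *d, unsigned int ops)
{
	d->num_completed_ops += ops;
}

/* Packets in flight; unsigned subtraction is well defined across wrap. */
unsigned int model_inflight_pkts(const struct dql_ops_model *d)
{
	return d->num_enqueue_ops - d->num_completed_ops;
}
```

Starting both counters just below UINT_MAX and enqueueing three packets wraps num_enqueue_ops past zero, yet the difference still reads 3, which is what the sysfs attribute would print.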
[PATCH net-next 4/4] xps_flows: XPS for packets that don't have a socket
xps_flows maintains a per device flow table that is indexed by the skbuff hash. The table is only consulted when there is no queue saved in a transmit socket for an skbuff. Each entry in the flow table contains a queue index and a queue pointer. The queue pointer is set when a queue is chosen using a flow table entry. This pointer is set to the head pointer in the transmit queue (which is maintained by BQL). The new function get_xps_flows_index that looks up flows in the xps_flows table. The entry returned gives the last queue a matching flow used. The returned queue is compared against the normal XPS queue. If they are different, then we only switch if the tail pointer in the TX queue has advanced past the pointer saved in the entry. In this way OOO should be avoided when XPS wants to use a different queue. Signed-off-by: Tom Herbert--- net/Kconfig| 6 + net/core/dev.c | 85 +++--- 2 files changed, 76 insertions(+), 15 deletions(-) diff --git a/net/Kconfig b/net/Kconfig index 7b6cd34..f77fad1 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -255,6 +255,12 @@ config XPS depends on SMP default y +config XPS_FLOWS + bool + depends on XPS + depends on BQL + default y + config HWBM bool diff --git a/net/core/dev.c b/net/core/dev.c index 34b5322..fc68d19 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3210,6 +3210,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) } #endif /* CONFIG_NET_EGRESS */ +/* Must be called with RCU read_lock */ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb) { #ifdef CONFIG_XPS @@ -3217,7 +3218,6 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb) struct xps_map *map; int queue_index = -1; - rcu_read_lock(); dev_maps = rcu_dereference(dev->xps_maps); if (dev_maps) { map = rcu_dereference( @@ -3232,7 +3232,6 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb) queue_index = -1; } } - rcu_read_unlock(); return queue_index; #else @@ 
-3240,26 +3239,82 @@ static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb) #endif } -static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb) +/* Must be called with RCU read_lock */ +static int get_xps_flows_index(struct net_device *dev, struct sk_buff *skb) { - struct sock *sk = skb->sk; - int queue_index = sk_tx_queue_get(sk); +#ifdef CONFIG_XPS_FLOWS + struct xps_dev_flow_table *flow_table; + struct xps_dev_flow ent; + int queue_index; + struct netdev_queue *txq; + u32 hash; - if (queue_index < 0 || skb->ooo_okay || - queue_index >= dev->real_num_tx_queues) { - int new_index = get_xps_queue(dev, skb); - if (new_index < 0) - new_index = skb_tx_hash(dev, skb); + flow_table = rcu_dereference(dev->xps_flow_table); + if (!flow_table) + return -1; - if (queue_index != new_index && sk && - sk_fullsock(sk) && - rcu_access_pointer(sk->sk_dst_cache)) - sk_tx_queue_set(sk, new_index); + queue_index = get_xps_queue(dev, skb); + if (queue_index < 0) + return -1; - queue_index = new_index; + hash = skb_get_hash(skb); + if (!hash) + return -1; + + ent.v64 = flow_table->flows[hash & flow_table->mask].v64; + + if (queue_index != ent.queue_index && + ent.queue_index >= 0 && + ent.queue_index < dev->real_num_tx_queues) { + txq = netdev_get_tx_queue(dev, ent.queue_index); + if ((int)(txq->dql.num_completed_ops - ent.queue_ptr) < 0) { + /* The current queue's tail has not advanced beyond the +* last packet that was enqueued using the table entry. +* We can't change queues without risking OOO. Stick +* with the queue listed in the flow table. 
+*/ + queue_index = ent.queue_index; + } } + /* Save the updated entry */ + txq = netdev_get_tx_queue(dev, queue_index); + ent.queue_index = queue_index; + ent.queue_ptr = txq->dql.num_enqueue_ops; + flow_table->flows[hash & flow_table->mask].v64 = ent.v64; + return queue_index; +#else + return get_xps_queue(dev, skb); +#endif +} + +static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb) +{ + struct sock *sk = skb->sk; + int queue_index = sk_tx_queue_get(sk); + int new_index; + + if (queue_index < 0) { + /* Socket
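For readers who want to experiment with the idea outside the kernel, here is a minimal user-space sketch of the flow-table decision described in this patch. It is not the kernel code: every toy_* name, the table size and the queue count are made up for illustration. As a simplifying assumption, the sketch folds the enqueue into the pick and saves the head count after counting the packet itself, so "tail has reached the saved value" means the packet has completed.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TXQ    4
#define TABLE_SIZE 8            /* hypothetical size; must be a power of two */

/* Toy stand-in for the two per-queue operation counters. */
struct toy_txq {
	uint32_t num_enqueue_ops;    /* head: bumped when a packet is queued */
	uint32_t num_completed_ops;  /* tail: bumped when packets complete */
};

/* One flow-table slot: last queue used and the head count saved then. */
struct toy_flow {
	int      queue_index;        /* -1 means the slot is unused */
	uint32_t queue_ptr;
};

struct toy_dev {
	struct toy_txq  txq[NUM_TXQ];
	struct toy_flow flows[TABLE_SIZE];
};

static void toy_dev_init(struct toy_dev *dev)
{
	for (int i = 0; i < NUM_TXQ; i++)
		dev->txq[i] = (struct toy_txq){ 0, 0 };
	for (int i = 0; i < TABLE_SIZE; i++)
		dev->flows[i] = (struct toy_flow){ -1, 0 };
}

/*
 * Pick a queue for a packet with the given hash. xps_queue is the queue
 * that plain XPS would choose. The previous queue is kept while packets
 * recorded in the slot are still in flight on it.
 */
static int toy_pick_queue(struct toy_dev *dev, uint32_t hash, int xps_queue)
{
	struct toy_flow *ent = &dev->flows[hash & (TABLE_SIZE - 1)];
	int queue = xps_queue;

	assert(xps_queue >= 0 && xps_queue < NUM_TXQ);

	if (ent->queue_index >= 0 && ent->queue_index < NUM_TXQ &&
	    ent->queue_index != xps_queue) {
		struct toy_txq *old = &dev->txq[ent->queue_index];

		/* Signed difference stays valid when the counters wrap. */
		if ((int32_t)(old->num_completed_ops - ent->queue_ptr) < 0)
			queue = ent->queue_index;   /* still in flight: stay */
	}

	/* Count the enqueue, then save the new head count in the slot. */
	dev->txq[queue].num_enqueue_ops++;
	ent->queue_index = queue;
	ent->queue_ptr = dev->txq[queue].num_enqueue_ops;
	return queue;
}

/* Model TX completion of n packets on a queue. */
static void toy_complete(struct toy_dev *dev, int queue, uint32_t n)
{
	dev->txq[queue].num_completed_ops += n;
}
```

With this model, a flow that starts on queue 1 stays there while XPS asks for queue 2, and moves to queue 2 only after toy_complete() has caught up with the saved head count. Like the patch, the sketch takes no lock around the slot, so it is best effort in the same sense.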
[PATCH net-next 0/4] xps_flows: XPS flow steering when there is no socket
This patch set introduces transmit flow steering for socketless packets. The idea is that we record the transmit queues in a flow table that is indexed by skbuff hash. The flow table entries have two values: the queue_index and the head cnt of packets from the TX queue. We only allow a queue to change for a flow if the tail cnt in the TX queue advances beyond the recorded head cnt. That is the condition that should indicate that all outstanding packets for the flow have completed transmission so the queue can change. Tracking the inflight queue is performed as part of DQL. Two fields are added to the dql structure: num_enqueue_ops and num_completed_ops. num_enqueue_ops is incremented in dql_queued and num_completed_ops is incremented in dql_completed by the number of operations completed (a new argument to the function). This patch set creates /sys/class/net/eth*/xps_dev_flow_table_cnt, which is the number of entries in the XPS flow table. Note that the functionality here is technically best effort (for instance we don't obtain a lock while processing a flow table entry). Under high load it is possible that OOO packets can still be generated due to XPS if two threads are hammering on the same flow table entry. The assumption of these patches is that OOO packets are not the end of the world and these should prevent OOO in most common use cases with XPS. This is a followup to the previous RFC version. Fixes from RFC are: - Move counters to DQL - Fixed typo - Simplified get flow index function - Fixed sysfs flow_table_cnt to properly use DEVICE_ATTR_RW - Renamed the mechanism Tested: Manually forced all packets to go through the xps_flows path. Observed that some flows were deferred to change queues because packets were in flight with the flow bucket. 
Tom Herbert (4): net: Set SW hash in skb_set_hash_from_sk dql: Add counters for number of queuing and completion operations net: Add xps_dev_flow_table_cnt xps_flows: XPS for packets that don't have a socket include/linux/dynamic_queue_limits.h | 7 ++- include/linux/netdevice.h| 26 - include/net/sock.h | 6 +- lib/dynamic_queue_limits.c | 3 +- net/Kconfig | 6 ++ net/core/dev.c | 85 - net/core/net-sysfs.c | 103 +++ 7 files changed, 214 insertions(+), 22 deletions(-) -- 2.8.0.rc2
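The counter scheme from the cover letter can be sketched in user space as follows. This is only an illustration under stated assumptions: struct toy_dql and the toy_* functions are hypothetical and model just the two counters named above, not the real dql structure or its limit logic. The point of the sketch is that comparing the counters through a signed difference keeps working when the 32-bit values wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy subset of a DQL-like structure: only the two operation counters. */
struct toy_dql {
	uint32_t num_enqueue_ops;    /* incremented once per queued packet */
	uint32_t num_completed_ops;  /* advanced by the number of completions */
};

/* Model of the "queued" side: one more operation has been enqueued. */
static void toy_dql_queued(struct toy_dql *dql)
{
	dql->num_enqueue_ops++;
}

/* Model of the "completed" side: ops operations have finished. */
static void toy_dql_completed(struct toy_dql *dql, uint32_t ops)
{
	/* Cannot complete more operations than are outstanding. */
	assert(ops <= dql->num_enqueue_ops - dql->num_completed_ops);
	dql->num_completed_ops += ops;
}

/* True while the tail has not yet reached the saved head position. */
static bool toy_still_in_flight(const struct toy_dql *dql, uint32_t saved_head)
{
	return (int32_t)(dql->num_completed_ops - saved_head) < 0;
}
```

A caller would save num_enqueue_ops after queuing a packet and later ask toy_still_in_flight() with that saved value. Starting both counters close to UINT32_MAX is a quick way to check the wraparound case.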
[PATCH net-next 1/4] net: Set SW hash in skb_set_hash_from_sk
Use __skb_set_sw_hash to set the hash in an skbuff from the socket txhash. Signed-off-by: Tom Herbert --- include/net/sock.h | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index c797c57..12e585c 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1910,10 +1910,8 @@ static inline void sock_poll_wait(struct file *filp, static inline void skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk) { - if (sk->sk_txhash) { - skb->l4_hash = 1; - skb->hash = sk->sk_txhash; - } + if (sk->sk_txhash) + __skb_set_sw_hash(skb, sk->sk_txhash, true); } void skb_set_owner_w(struct sk_buff *skb, struct sock *sk); -- 2.8.0.rc2
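To make the effect of the change concrete, here is a small user-space model. It assumes that the software-hash helper records the hash value together with a "computed in software" flag and the L4 flag; the toy_* structures and functions are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; only the fields relevant to the hash are modelled. */
struct toy_skb {
	uint32_t hash;
	bool     sw_hash;   /* hash was computed in software */
	bool     l4_hash;   /* hash covers L4 information */
};

struct toy_sock {
	uint32_t sk_txhash;
};

/* Assumed helper behaviour: set value and both flags in one place. */
static void toy_set_sw_hash(struct toy_skb *skb, uint32_t hash, bool is_l4)
{
	skb->hash = hash;
	skb->sw_hash = true;
	skb->l4_hash = is_l4;
}

/* Copy the socket's TX hash into the packet if the socket has one. */
static void toy_set_hash_from_sk(struct toy_skb *skb, const struct toy_sock *sk)
{
	assert(skb != NULL && sk != NULL);
	if (sk->sk_txhash)
		toy_set_sw_hash(skb, sk->sk_txhash, true);
}
```

Compared with the open-coded version removed by the patch, the only difference in this model is that the software flag is set as well, which is what later code can use to tell the hash did not come from hardware.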
[PATCH v3 4/5] arm64: dts: rockchip: add the gmac needed node for rk3399
The RK3399 GMAC Ethernet Controller provides a complete Ethernet interface from processor to a Reduced Media Independent Interface (RMII) and Reduced Gigabit Media Independent Interface (RGMII) compliant Ethernet PHY. This patch adds the related needed device information. e.g.: interrupts, grf, clocks, pinctrl and so on. The full details are in [0]. [0]: Documentation/devicetree/bindings/net/rockchip-dwmac.txt Signed-off-by: Caesar Wang--- Changes in v3: - generate a patch from https://patchwork.kernel.org/patch/9306339/. Changes in v2: None arch/arm64/boot/dts/rockchip/rk3399.dtsi | 80 1 file changed, 80 insertions(+) diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index 2ab233f..092bb45 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi @@ -200,6 +200,26 @@ }; }; + gmac: ethernet@fe30 { + compatible = "rockchip,rk3399-gmac"; + reg = <0x0 0xfe30 0x0 0x1>; + interrupts = ; + interrupt-names = "macirq"; + clocks = < SCLK_MAC>, < SCLK_MAC_RX>, +< SCLK_MAC_TX>, < SCLK_MACREF>, +< SCLK_MACREF_OUT>, < ACLK_GMAC>, +< PCLK_GMAC>; + clock-names = "stmmaceth", "mac_clk_rx", + "mac_clk_tx", "clk_mac_ref", + "clk_mac_refout", "aclk_mac", + "pclk_mac"; + power-domains = < RK3399_PD_GMAC>; + resets = < SRST_A_GMAC>; + reset-names = "stmmaceth"; + rockchip,grf = <>; + status = "disabled"; + }; + sdio0: dwmmc@fe31 { compatible = "rockchip,rk3399-dw-mshc", "rockchip,rk3288-dw-mshc"; @@ -1193,6 +1213,66 @@ drive-strength = <13>; }; + gmac { + rgmii_pins: rgmii-pins { + rockchip,pins = + /* mac_txclk */ + <3 17 RK_FUNC_1 _pull_none_13ma>, + /* mac_rxclk */ + <3 14 RK_FUNC_1 _pull_none>, + /* mac_mdio */ + <3 13 RK_FUNC_1 _pull_none>, + /* mac_txen */ + <3 12 RK_FUNC_1 _pull_none_13ma>, + /* mac_clk */ + <3 11 RK_FUNC_1 _pull_none>, + /* mac_rxdv */ + <3 9 RK_FUNC_1 _pull_none>, + /* mac_mdc */ + <3 8 RK_FUNC_1 _pull_none>, + /* mac_rxd1 */ + <3 7 RK_FUNC_1 _pull_none>, + /* mac_rxd0 */ 
+ <3 6 RK_FUNC_1 _pull_none>, + /* mac_txd1 */ + <3 5 RK_FUNC_1 _pull_none_13ma>, + /* mac_txd0 */ + <3 4 RK_FUNC_1 _pull_none_13ma>, + /* mac_rxd3 */ + <3 3 RK_FUNC_1 _pull_none>, + /* mac_rxd2 */ + <3 2 RK_FUNC_1 _pull_none>, + /* mac_txd3 */ + <3 1 RK_FUNC_1 _pull_none_13ma>, + /* mac_txd2 */ + <3 0 RK_FUNC_1 _pull_none_13ma>; + }; + + rmii_pins: rmii-pins { + rockchip,pins = + /* mac_mdio */ + <3 13 RK_FUNC_1 _pull_none>, + /* mac_txen */ + <3 12 RK_FUNC_1 _pull_none_13ma>, + /* mac_clk */ + <3 11 RK_FUNC_1 _pull_none>, + /* mac_rxer */ + <3 10 RK_FUNC_1 _pull_none>, + /* mac_rxdv */ + <3 9 RK_FUNC_1 _pull_none>, + /* mac_mdc */ + <3 8 RK_FUNC_1 _pull_none>, + /* mac_rxd1 */ +
[PATCH v3 5/5] arm64: dts: rockchip: enable the gmac for rk3399 evb board
Add the required and optional properties for the evb board. See [0] for details. [0]: Documentation/devicetree/bindings/net/rockchip-dwmac.txt Signed-off-by: Roger Chen Signed-off-by: Caesar Wang --- Changes in v3: None Changes in v2: None arch/arm64/boot/dts/rockchip/rk3399-evb.dts | 31 + 1 file changed, 31 insertions(+) diff --git a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts index d47b4e9..ed6f2e8 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399-evb.dts +++ b/arch/arm64/boot/dts/rockchip/rk3399-evb.dts @@ -94,12 +94,43 @@ regulator-always-on; regulator-boot-on; }; + + clkin_gmac: external-gmac-clock { + compatible = "fixed-clock"; + clock-frequency = <12500>; + clock-output-names = "clkin_gmac"; + #clock-cells = <0>; + }; + + vcc_phy: vcc-phy-regulator { + compatible = "regulator-fixed"; + regulator-name = "vcc_phy"; + regulator-always-on; + regulator-boot-on; + }; + }; _phy { status = "okay"; }; + { + phy-supply = <_phy>; + phy-mode = "rgmii"; + clock_in_out = "input"; + snps,reset-gpio = < 15 GPIO_ACTIVE_LOW>; + snps,reset-active-low; + snps,reset-delays-us = <0 1 5>; + assigned-clocks = < SCLK_RMII_SRC>; + assigned-clock-parents = <_gmac>; + pinctrl-names = "default"; + pinctrl-0 = <_pins>; + tx_delay = <0x28>; + rx_delay = <0x11>; + status = "okay"; +}; + { status = "okay"; }; -- 1.9.1
[PATCH v3 0/5] Support the rk3399 gmac pd function
This patch set adds handling of the gmac power domain (pd) and device tree support for the rk3399 gmac. History: v1: https://lkml.org/lkml/2016/8/30/668 v2: https://lkml.org/lkml/2016/8/31/27 Changes in v3: - split into two patches based on patch v2, and fix nits and commit, as commented on https://patchwork.kernel.org/patch/9306339/ - generate a patch from https://patchwork.kernel.org/patch/9306339/. Changes in v2: - use rk_gmac_powerup instead of rk_gmac_init. - fix the build error on the next kernel. - fix the order, as Heiko commented on https://patchwork.kernel.org/patch/9305991/ Caesar Wang (3): arm64: dts: rockchip: add the gmac power domain on rk3399 arm64: dts: rockchip: add the gmac needed node for rk3399 arm64: dts: rockchip: enable the gmac for rk3399 evb board David Wu (1): net: stmmac: dwmac-rk: add pd_gmac support for rk3399 Roger Chen (1): net: stmmac: dwmac-rk: fixes the gmac resume after PD on/off arch/arm64/boot/dts/rockchip/rk3399-evb.dts | 31 + arch/arm64/boot/dts/rockchip/rk3399.dtsi | 90 ++ drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 28 +--- 3 files changed, 140 insertions(+), 9 deletions(-) -- 1.9.1
[PATCH v3 3/5] arm64: dts: rockchip: add the gmac power domain on rk3399
This patch adds the gmac pd to reduce power consumption. Even though some boards do not need Ethernet support, the driver core can also take care of powering up the pd before probe. Signed-off-by: Roger Chen Signed-off-by: Caesar Wang --- Changes in v3: - split into two patches based on patch v2, and fix nits and commit, as commented on https://patchwork.kernel.org/patch/9306339/ Changes in v2: - fix the order, as Heiko commented on https://patchwork.kernel.org/patch/9305991/ arch/arm64/boot/dts/rockchip/rk3399.dtsi | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index 32aebc8..2ab233f 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi @@ -611,6 +611,11 @@ status = "disabled"; }; + qos_gmac: qos@ffa5c000 { + compatible = "syscon"; + reg = <0x0 0xffa5c000 0x0 0x20>; + }; + qos_hdcp: qos@ffa9 { compatible = "syscon"; reg = <0x0 0xffa9 0x0 0x20>; @@ -739,6 +744,11 @@ }; /* These power domains are grouped by VD_LOGIC */ + pd_gmac@RK3399_PD_GMAC { + reg = ; + clocks = < ACLK_GMAC>; + pm_qos = <_gmac>; + }; pd_vio@RK3399_PD_VIO { reg = ; #address-cells = <1>; -- 1.9.1
[PATCH v3 2/5] net: stmmac: dwmac-rk: add pd_gmac support for rk3399
From: David Wu Add gmac power domain support for rk3399 in order to reduce power consumption. Signed-off-by: David Wu Signed-off-by: Caesar Wang --- Changes in v3: None Changes in v2: - fix the build error on the next kernel. drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c index 289e7a6..406573d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c @@ -30,6 +30,7 @@ #include #include #include +#include #include "stmmac_platform.h" @@ -659,11 +660,19 @@ static int rk_gmac_powerup(struct rk_priv_data *bsp_priv) if (ret) return ret; + pm_runtime_enable(dev); + pm_runtime_get_sync(dev); + return 0; } static void rk_gmac_powerdown(struct rk_priv_data *gmac) { + struct device *dev = >pdev->dev; + + pm_runtime_put_sync(dev); + pm_runtime_disable(dev); + phy_power_on(gmac, false); gmac_clk_enable(gmac, false); } -- 1.9.1
Re: [PATCH] softirq: let ksoftirqd do its job
On 08/31/2016 04:11 PM, Eric Dumazet wrote:
> On Wed, 2016-08-31 at 15:47 -0700, Rick Jones wrote:
>> With regard to drops, are both of you sure you're using the same socket buffer sizes?
> Does it really matter ?
At least at points in the past I have seen different drop counts at the SO_RCVBUF based on using (sometimes much) larger sizes. The hypothesis I was operating under at the time was that this dealt with those situations where the netserver was held-off from running for "a little while" from time to time. It didn't change things for a sustained overload situation though.
>> In the meantime, is anything interesting happening with TCP_RR or TCP_STREAM?
> TCP_RR is driven by the network latency, we do not drop packets in the socket itself.
I've been of the opinion it (single stream) is driven by path length. Sometimes by NIC latency. But then I'm almost always measuring in the LAN rather than across the WAN.
happy benchmarking, rick
[PATCH v3 1/5] net: stmmac: dwmac-rk: fixes the gmac resume after PD on/off
From: Roger Chen The GMAC power domain (PD) is disabled during suspend, which resets the GRF registers. The corresponding GRF registers for GMAC must therefore be set up again on resume. Signed-off-by: Roger Chen Signed-off-by: Caesar Wang --- Changes in v3: None Changes in v2: - use rk_gmac_powerup instead of rk_gmac_init. drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c index 9210591..289e7a6 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-rk.c @@ -629,6 +629,16 @@ static struct rk_priv_data *rk_gmac_setup(struct platform_device *pdev, "rockchip,grf"); bsp_priv->pdev = pdev; + gmac_clk_init(bsp_priv); + + return bsp_priv; +} + +static int rk_gmac_powerup(struct rk_priv_data *bsp_priv) +{ + int ret; + struct device *dev = _priv->pdev->dev; + /*rmii or rgmii*/ if (bsp_priv->phy_iface == PHY_INTERFACE_MODE_RGMII) { dev_info(dev, "init for RGMII\n"); @@ -641,15 +651,6 @@ static struct rk_priv_data *rk_gmac_setup(struct platform_device *pdev, dev_err(dev, "NO interface defined!\n"); } - gmac_clk_init(bsp_priv); - - return bsp_priv; -} - -static int rk_gmac_powerup(struct rk_priv_data *bsp_priv) -{ - int ret; - ret = phy_power_on(bsp_priv, true); if (ret) return ret; -- 1.9.1
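The reasoning behind moving the interface setup can be shown with a toy model. This is a sketch under assumptions, not driver code: struct toy_gmac, the grf_mode field and the toy_* functions are invented here to stand in for "a register whose contents are lost when the power domain goes off".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_iface { TOY_RMII = 1, TOY_RGMII = 2 };

struct toy_gmac {
	enum toy_iface phy_iface;  /* parsed once, survives power cycles */
	uint32_t grf_mode;         /* models a GRF register; 0 after reset */
	bool powered;
};

/* One-time setup: only records configuration, programs no register. */
static void toy_gmac_setup(struct toy_gmac *g, enum toy_iface iface)
{
	assert(iface == TOY_RMII || iface == TOY_RGMII);
	g->phy_iface = iface;
	g->grf_mode = 0;
	g->powered = false;
}

/* Runs on every power-up, so the register is reprogrammed each time. */
static void toy_gmac_powerup(struct toy_gmac *g)
{
	g->powered = true;
	g->grf_mode = (uint32_t)g->phy_iface;
}

/* Power domain off: register contents are lost. */
static void toy_gmac_powerdown(struct toy_gmac *g)
{
	g->powered = false;
	g->grf_mode = 0;
}
```

If the register were written in toy_gmac_setup() instead, it would read back as 0 after the first powerdown/powerup cycle. Programming it in the power-up path avoids that, which mirrors the motivation given in the commit message.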
[PATCH] [v10] net: emac: emac gigabit ethernet controller driver
Add support for the Qualcomm Technologies, Inc. EMAC gigabit Ethernet controller. This driver supports the following features: 1) Checksum offload. 2) Interrupt coalescing support. 3) SGMII phy. 4) phylib interface for external phy Based on original work by Niranjana Vishwanathapura and Gilad Avidov Signed-off-by: Timur Tabi --- v10: - removed superfluous acpi-related data - fix the Makefiles to allow module building - rename "jubbers" to "jabbers" - fix some function prototypes (found via sparse) - removed invalid __iomem (found via sparse) - don't print phy status unless phy is attached .../devicetree/bindings/net/qcom-emac.txt | 112 ++ MAINTAINERS | 6 + drivers/net/ethernet/qualcomm/Kconfig | 12 + drivers/net/ethernet/qualcomm/Makefile | 2 + drivers/net/ethernet/qualcomm/emac/Makefile | 7 + drivers/net/ethernet/qualcomm/emac/emac-mac.c | 1528 drivers/net/ethernet/qualcomm/emac/emac-mac.h | 248 drivers/net/ethernet/qualcomm/emac/emac-phy.c | 204 +++ drivers/net/ethernet/qualcomm/emac/emac-phy.h | 33 + drivers/net/ethernet/qualcomm/emac/emac-sgmii.c | 722 + drivers/net/ethernet/qualcomm/emac/emac-sgmii.h | 24 + drivers/net/ethernet/qualcomm/emac/emac.c | 743 ++ drivers/net/ethernet/qualcomm/emac/emac.h | 335 + 13 files changed, 3976 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/qcom-emac.txt create mode 100644 drivers/net/ethernet/qualcomm/emac/Makefile create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-mac.c create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-mac.h create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-phy.c create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-phy.h create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii.c create mode 100644 drivers/net/ethernet/qualcomm/emac/emac-sgmii.h create mode 100644 drivers/net/ethernet/qualcomm/emac/emac.c create mode 100644 drivers/net/ethernet/qualcomm/emac/emac.h diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt 
b/Documentation/devicetree/bindings/net/qcom-emac.txt new file mode 100644 index 000..90c3584 --- /dev/null +++ b/Documentation/devicetree/bindings/net/qcom-emac.txt @@ -0,0 +1,112 @@ +Qualcomm Technologies EMAC Gigabit Ethernet Controller + +This network controller consists of two devices: a MAC and an SGMII +internal PHY. Each device is represented by a device tree node. A phandle +connects the MAC node to its corresponding internal phy node. Another +phandle points to the external PHY node. + +Required properties: + +MAC node: +- compatible : Should be "qcom,fsm9900-emac". +- reg : Offset and length of the register regions for the device +- interrupts : Interrupt number used by this controller +- mac-address : The 6-byte MAC address. If present, it is the default + MAC address. +- internal-phy : phandle to the internal PHY node +- phy-handle : phandle the the external PHY node + +Internal PHY node: +- compatible : Should be "qcom,fsm9900-emac-sgmii" or "qcom,qdf2432-emac-sgmii". +- reg : Offset and length of the register region(s) for the device +- interrupts : Interrupt number used by this controller + +The external phy child node: +- reg : The phy address + +Example: + +FSM9900: + +soc { + #address-cells = <1>; + #size-cells = <1>; + + emac0: ethernet@feb2 { + compatible = "qcom,fsm9900-emac"; + reg = <0xfeb2 0x1>, + <0xfeb36000 0x1000>; + interrupts = <76>; + + clocks = < 0>, < 1>, < 3>, < 4>, < 5>, + < 6>, < 7>; + clock-names = "axi_clk", "cfg_ahb_clk", "high_speed_clk", + "mdio_clk", "tx_clk", "rx_clk", "sys_clk"; + + internal-phy = <_sgmii>; + + phy-handle = <>; + + #address-cells = <1>; + #size-cells = <0>; + phy0: ethernet-phy@0 { + reg = <0>; + }; + + pinctrl-names = "default"; + pinctrl-0 = <_pins_a>; + }; + + emac_sgmii: ethernet@feb38000 { + compatible = "qcom,fsm9900-emac-sgmii"; + reg = <0xfeb38000 0x1000>; + interrupts = <80>; + }; + + tlmm: pinctrl@fd51 { + compatible = "qcom,fsm9900-pinctrl"; + + mdio_pins_a: mdio { + state { + pins = "gpio123", 
"gpio124"; + function = "mdio"; + }; + }; + }; + + +QDF2432: + +soc { + #address-cells = <2>; + #size-cells = <2>; + + emac0: ethernet@3880 { +
Re: [PATCH] softirq: let ksoftirqd do its job
On Wed, 2016-08-31 at 15:47 -0700, Rick Jones wrote: > With regard to drops, are both of you sure you're using the same socket > buffer sizes? Does it really matter ? I used the standard /proc/sys/net/core/rmem_default, but under flood the receive queue is almost always full, even if you make it bigger. By varying its size, you only make batches bigger and number of context switches should be lower, if only two threads are competing for the cpu. Exact 'optimal' size would depend on various factors, depending on application and platform constraints. > > In the meantime, is anything interesting happening with TCP_RR or > TCP_STREAM? TCP_RR is driven by the network latency, we do not drop packets in the socket itself. TCP_STREAM is normally paced by the ability of the receiver to send ACK packets. TCP has this auto regulating mode, unless the sender violates the RFC(s). If your question is : What happens if thousands of threads on the host want the cpu, and ksoftirqd does not get enough cycles by virtue of being a normal thread ? Then, you are back to typical provisioning problems, and normally people play with priorities and containers/cgroups, and/or various techniques like RPS/RFS (You can change ksoftirqd priority if you like)
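For anyone who wants to check which receive buffer size a test is actually running with, a minimal sketch using the standard SO_RCVBUF socket option is given below. The function name udp_rcvbuf_size is made up for this example. Note that on Linux the value read back may differ from the value requested, since socket(7) documents that the kernel doubles it and caps it by rmem_max.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the effective receive buffer size of a fresh UDP socket,
 * or -1 on error. If requested > 0, ask for that size first.
 */
static int udp_rcvbuf_size(int requested)
{
	int fd;
	int size = -1;
	socklen_t len = sizeof(size);

	assert(requested >= 0);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	if (requested > 0 &&
	    setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
		       &requested, sizeof(requested)) < 0) {
		close(fd);
		return -1;
	}

	/* Read back what the kernel actually applied. */
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
		size = -1;

	close(fd);
	return size;
}
```

Calling udp_rcvbuf_size(0) reports the default that applies to a new socket, and calling it with a positive value shows what the kernel grants for that request. Comparing the two on both test machines is one way to answer the "same socket buffer sizes?" question.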
Re: [PATCH net-next 00/12] net: Convert vrf from dst to tx hook
On 8/30/16 11:34 AM, David Ahern wrote: > This series fixes this problem by removing the output dst that points > to the VRF and always doing the actual FIB lookup. This allows the real > dst to be cached on sockets and used for MSS. Packets are diverted to > the VRF device on Tx using an l3mdev hook in the output path similar to > what is done for Rx. Dave: please drop this series. BGP smoke tests triggered a couple of problems I need to resolve.
Re: [PATCH] softirq: let ksoftirqd do its job
With regard to drops, are both of you sure you're using the same socket buffer sizes? In the meantime, is anything interesting happening with TCP_RR or TCP_STREAM? happy benchmarking, rick jones
[PATCH 0/5] constify ethtool_ops structures
Constify ethtool_ops structures. --- drivers/net/ethernet/mediatek/mtk_eth_soc.c |2 +- drivers/net/ethernet/synopsys/dwc_eth_qos.c |2 +- drivers/net/ethernet/xilinx/xilinx_axienet_main.c |2 +- drivers/net/usb/r8152.c |2 +- drivers/staging/netlogic/xlr_net.c|2 +- 5 files changed, 5 insertions(+), 5 deletions(-)
[PATCH 5/5] net: axienet: constify ethtool_ops structures
Check for ethtool_ops structures that are only stored in the ethtool_ops field of a net_device structure or passed as the second argument to netdev_set_default_ethtool_ops. These contexts are declared const, so ethtool_ops structures that have these properties can be declared as const also. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @r disable optional_qualifier@ identifier i; position p; @@ static struct ethtool_ops i@p = { ... }; @ok1@ identifier r.i; struct net_device e; position p; @@ e.ethtool_ops = @p; @ok2@ identifier r.i; expression e; position p; @@ netdev_set_default_ethtool_ops(e, @p) @bad@ position p != {r.p,ok1.p,ok2.p}; identifier r.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r.i; @@ static +const struct ethtool_ops i = { ... }; // Suggested-by: Stephen Hemminger Signed-off-by: Julia Lawall --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index 36ee7ab..69e2a83 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -1297,7 +1297,7 @@ static int axienet_ethtools_set_coalesce(struct net_device *ndev, return 0; } -static struct ethtool_ops axienet_ethtool_ops = { +static const struct ethtool_ops axienet_ethtool_ops = { .get_drvinfo = axienet_ethtools_get_drvinfo, .get_regs_len = axienet_ethtools_get_regs_len, .get_regs = axienet_ethtools_get_regs,
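The pattern that the semantic patch looks for can be illustrated with a self-contained toy. The names below (struct toy_ops, struct toy_dev and so on) are hypothetical and only loosely modelled on an ops table that is stored through a pointer-to-const field; they are not the kernel's ethtool definitions.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;

/* Toy ops table of function pointers. */
struct toy_ops {
	int (*get_link)(const struct toy_dev *dev);
	int (*get_regs_len)(const struct toy_dev *dev);
};

struct toy_dev {
	const struct toy_ops *ops;   /* the field is already pointer-to-const */
	int link_up;
};

static int toy_get_link(const struct toy_dev *dev)
{
	return dev->link_up;
}

static int toy_get_regs_len(const struct toy_dev *dev)
{
	(void)dev;
	return 64;
}

/* Only ever read through dev->ops after this, so it can be const. */
static const struct toy_ops toy_ops_table = {
	.get_link     = toy_get_link,
	.get_regs_len = toy_get_regs_len,
};

static void toy_dev_init(struct toy_dev *dev)
{
	assert(dev != NULL);
	dev->ops = &toy_ops_table;
	dev->link_up = 1;
}
```

Because the only use of toy_ops_table is the assignment to a pointer-to-const field, adding const changes no behaviour. It lets the compiler reject accidental writes and allows the table to be placed in read-only data.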
[PATCH 3/5] dwc_eth_qos: constify ethtool_ops structures
Check for ethtool_ops structures that are only stored in the ethtool_ops field of a net_device structure or passed as the second argument to netdev_set_default_ethtool_ops. These contexts are declared const, so ethtool_ops structures that have these properties can be declared as const also. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @r disable optional_qualifier@ identifier i; position p; @@ static struct ethtool_ops i@p = { ... }; @ok1@ identifier r.i; struct net_device e; position p; @@ e.ethtool_ops = @p; @ok2@ identifier r.i; expression e; position p; @@ netdev_set_default_ethtool_ops(e, @p) @bad@ position p != {r.p,ok1.p,ok2.p}; identifier r.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r.i; @@ static +const struct ethtool_ops i = { ... }; // Suggested-by: Stephen Hemminger Signed-off-by: Julia Lawall --- drivers/net/ethernet/synopsys/dwc_eth_qos.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c b/drivers/net/ethernet/synopsys/dwc_eth_qos.c index 5a3941b..c25d971 100644 --- a/drivers/net/ethernet/synopsys/dwc_eth_qos.c +++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c @@ -2743,7 +2743,7 @@ static void dwceqos_set_msglevel(struct net_device *ndev, u32 msglevel) lp->msg_enable = msglevel; } -static struct ethtool_ops dwceqos_ethtool_ops = { +static const struct ethtool_ops dwceqos_ethtool_ops = { .get_drvinfo = dwceqos_get_drvinfo, .get_link = ethtool_op_get_link, .get_pauseparam = dwceqos_get_pauseparam,
[PATCH 4/5] r8152: constify ethtool_ops structures
Check for ethtool_ops structures that are only stored in the ethtool_ops field of a net_device structure or passed as the second argument to netdev_set_default_ethtool_ops. These contexts are declared const, so ethtool_ops structures that have these properties can be declared as const also. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @r disable optional_qualifier@ identifier i; position p; @@ static struct ethtool_ops i@p = { ... }; @ok1@ identifier r.i; struct net_device e; position p; @@ e.ethtool_ops = @p; @ok2@ identifier r.i; expression e; position p; @@ netdev_set_default_ethtool_ops(e, @p) @bad@ position p != {r.p,ok1.p,ok2.p}; identifier r.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r.i; @@ static +const struct ethtool_ops i = { ... }; // Suggested-by: Stephen Hemminger Signed-off-by: Julia Lawall --- drivers/net/usb/r8152.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index f41a8ad..f72f807 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -4032,7 +4032,7 @@ static int rtl8152_set_coalesce(struct net_device *netdev, return ret; } -static struct ethtool_ops ops = { +static const struct ethtool_ops ops = { .get_drvinfo = rtl8152_get_drvinfo, .get_settings = rtl8152_get_settings, .set_settings = rtl8152_set_settings,
[PATCH 1/5] net: mediatek: constify ethtool_ops structures
Check for ethtool_ops structures that are only stored in the ethtool_ops field of a net_device structure or passed as the second argument to netdev_set_default_ethtool_ops. These contexts are declared const, so ethtool_ops structures that have these properties can be declared as const also. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // @r disable optional_qualifier@ identifier i; position p; @@ static struct ethtool_ops i@p = { ... }; @ok1@ identifier r.i; struct net_device e; position p; @@ e.ethtool_ops = @p; @ok2@ identifier r.i; expression e; position p; @@ netdev_set_default_ethtool_ops(e, @p) @bad@ position p != {r.p,ok1.p,ok2.p}; identifier r.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r.i; @@ static +const struct ethtool_ops i = { ... }; // Suggested-by: Stephen Hemminger Signed-off-by: Julia Lawall --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index 1801fd8..98f22cd 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -1692,7 +1692,7 @@ static void mtk_get_ethtool_stats(struct net_device *dev, } while (u64_stats_fetch_retry_irq(>syncp, start)); } -static struct ethtool_ops mtk_ethtool_ops = { +static const struct ethtool_ops mtk_ethtool_ops = { .get_settings = mtk_get_settings, .set_settings = mtk_set_settings, .get_drvinfo = mtk_get_drvinfo,
Re: [PATCH] softirq: let ksoftirqd do its job
On Wed, 2016-08-31 at 23:51 +0200, Jesper Dangaard Brouer wrote: > > The result from this run were handling 1,517,248 pps, without any > drops, all processes pinned to the same CPU. > > $ nstat > /dev/null && sleep 1 && nstat > #kernel > IpInReceives 1517225 0.0 > IpInDelivers 1517224 0.0 > UdpInDatagrams 1517248 0.0 > IpExtInOctets 69793408 0.0 > IpExtInNoECTPkts 1517246 0.0 > > I'm acking this patch: > > Acked-by: Jesper Dangaard Brouer >
Thanks a lot for bringing back the issue to me again, and all your tests !
[PATCH net-next] net: dsa: remove ds_to_priv
Access the priv member of the dsa_switch structure directly, instead of having an unnecessary helper. Signed-off-by: Vivien Didelot--- drivers/net/dsa/b53/b53_common.c | 42 +++ drivers/net/dsa/bcm_sf2.h| 2 +- drivers/net/dsa/mv88e6060.c | 4 +-- drivers/net/dsa/mv88e6xxx/chip.c | 72 include/net/dsa.h| 5 --- 5 files changed, 60 insertions(+), 65 deletions(-) diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index 1299104..0afc2e5 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -477,7 +477,7 @@ static int b53_fast_age_vlan(struct b53_device *dev, u16 vid) static void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; unsigned int i; u16 pvlan; @@ -495,7 +495,7 @@ static void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port) static int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; unsigned int cpu_port = dev->cpu_port; u16 pvlan; @@ -520,7 +520,7 @@ static int b53_enable_port(struct dsa_switch *ds, int port, static void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; u8 reg; /* Disable Tx/Rx for the port */ @@ -629,7 +629,7 @@ static int b53_switch_reset(struct b53_device *dev) static int b53_phy_read16(struct dsa_switch *ds, int addr, int reg) { - struct b53_device *priv = ds_to_priv(ds); + struct b53_device *priv = ds->priv; u16 value = 0; int ret; @@ -644,7 +644,7 @@ static int b53_phy_read16(struct dsa_switch *ds, int addr, int reg) static int b53_phy_write16(struct dsa_switch *ds, int addr, int reg, u16 val) { - struct b53_device *priv = ds_to_priv(ds); + struct b53_device *priv = ds->priv; if (priv->ops->phy_write16) return priv->ops->phy_write16(priv, addr, reg, val); @@ -714,7 +714,7 @@ 
static unsigned int b53_get_mib_size(struct b53_device *dev) static void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; const struct b53_mib_desc *mibs = b53_get_mib(dev); unsigned int mib_size = b53_get_mib_size(dev); unsigned int i; @@ -727,7 +727,7 @@ static void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data) static void b53_get_ethtool_stats(struct dsa_switch *ds, int port, uint64_t *data) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; const struct b53_mib_desc *mibs = b53_get_mib(dev); unsigned int mib_size = b53_get_mib_size(dev); const struct b53_mib_desc *s; @@ -759,7 +759,7 @@ static void b53_get_ethtool_stats(struct dsa_switch *ds, int port, static int b53_get_sset_count(struct dsa_switch *ds) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; return b53_get_mib_size(dev); } @@ -771,7 +771,7 @@ static int b53_set_addr(struct dsa_switch *ds, u8 *addr) static int b53_setup(struct dsa_switch *ds) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; unsigned int port; int ret; @@ -802,7 +802,7 @@ static int b53_setup(struct dsa_switch *ds) static void b53_adjust_link(struct dsa_switch *ds, int port, struct phy_device *phydev) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; u8 rgmii_ctrl = 0, reg = 0, off; if (!phy_is_pseudo_fixed_link(phydev)) @@ -936,7 +936,7 @@ static int b53_vlan_prepare(struct dsa_switch *ds, int port, const struct switchdev_obj_port_vlan *vlan, struct switchdev_trans *trans) { - struct b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; if ((is5325(dev) || is5365(dev)) && vlan->vid_begin == 0) return -EOPNOTSUPP; @@ -953,7 +953,7 @@ static void b53_vlan_add(struct dsa_switch *ds, int port, const struct switchdev_obj_port_vlan *vlan, struct switchdev_trans *trans) { - struct 
b53_device *dev = ds_to_priv(ds); + struct b53_device *dev = ds->priv; bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED; bool pvid = vlan->flags & BRIDGE_VLAN_INFO_PVID; unsigned int cpu_port =
RE: [PATCH net-next V4 00/10] liquidio CN23XX support
Thanks much. > -----Original Message----- > From: David Miller [mailto:da...@davemloft.net] > Sent: Wednesday, August 31, 2016 2:13 PM > To: Vatsavayi, Raghu > Cc: netdev@vger.kernel.org > Subject: Re: [PATCH net-next V4 00/10] liquidio CN23XX support > > From: Raghu Vatsavayi > Date: Wed, 31 Aug 2016 11:03:19 -0700 > > > Following patchset adds support for new device "CN23XX" in liquidio > > family of adapters. As advised by you I have split the previous V3 > > patch of 18 patches into two halves. This first patchset has first 10 > > patches, which are tested against net-next. I will post the second > > half after this one. > > > > This V4 patch also addressed all the comments from previous > > submission: > > 1) Avoid busy loop while reading registers. > > 2) Other minor comments about debug messages and constants. > > > > Please apply patches in following order as some of the patches depend > > on earlier patches. > > Series applied, thanks.
[PATCH v2 net-next 4/6] perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs
Allow attaching BPF_PROG_TYPE_PERF_EVENT programs to sw and hw perf events via the overflow_handler mechanism. When a program is attached the overflow_handlers become stacked. The program acts as a filter. Returning zero from the program means that the normal perf_event_output handler will not be called and the sampling event won't be stored in the ring buffer. The overflow_handler_context==NULL is an additional safety check to make sure programs are not attached to hw breakpoints and watchdog in case other checks (that prevent that now anyway) get accidentally relaxed in the future. The program refcnt is incremented in case perf_events are inherited when the target task is forked. Similar to kprobe and tracepoint programs there is no ioctl to detach the program or swap an already attached program. User space is expected to close(perf_event_fd) like it does right now for kprobe+bpf. That restriction simplifies the code quite a bit. The invocation of overflow_handler in __perf_event_overflow() is now done via READ_ONCE, since that pointer can be replaced when the program is attached while perf_event itself could have been active already. There is no need to do similar treatment for event->prog, since it's assigned only once before it's accessed. 
Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 4 +++ include/linux/perf_event.h | 2 ++ kernel/events/core.c | 85 +- 3 files changed, 90 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 11134238417d..9a904f63f8c1 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -297,6 +297,10 @@ static inline struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i) static inline void bpf_prog_put(struct bpf_prog *prog) { } +static inline struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog) +{ + return ERR_PTR(-EOPNOTSUPP); +} #endif /* CONFIG_BPF_SYSCALL */ /* verifier prototypes for helper functions called from eBPF programs */ diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 97bfe62f30d7..dcaaaf3ec8e6 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -679,6 +679,8 @@ struct perf_event { u64 (*clock)(void); perf_overflow_handler_t overflow_handler; void *overflow_handler_context; + perf_overflow_handler_t orig_overflow_handler; + struct bpf_prog *prog; #ifdef CONFIG_EVENT_TRACING struct trace_event_call *tp_event; diff --git a/kernel/events/core.c b/kernel/events/core.c index 3cfabdf7b942..305433ab2447 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -7022,7 +7022,7 @@ static int __perf_event_overflow(struct perf_event *event, irq_work_queue(&event->pending); } - event->overflow_handler(event, data, regs); + READ_ONCE(event->overflow_handler)(event, data, regs); if (*perf_event_fasync(event) && event->pending_kill) { event->pending_wakeup = 1; @@ -7637,11 +7637,75 @@ static void perf_event_free_filter(struct perf_event *event) ftrace_profile_free_filter(event); } +static void bpf_overflow_handler(struct perf_event *event, + struct perf_sample_data *data, + struct pt_regs *regs) +{ + struct bpf_perf_event_data_kern ctx = { + .data = data, + .regs = regs, + }; + int ret = 0; + +#ifdef CONFIG_BPF_SYSCALL + preempt_disable(); + if 
(unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) + goto out; + rcu_read_lock(); + ret = BPF_PROG_RUN(event->prog, (void *)&ctx); + rcu_read_unlock(); + out: + __this_cpu_dec(bpf_prog_active); + preempt_enable(); +#endif + if (!ret) + return; + + event->orig_overflow_handler(event, data, regs); +} + +static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd) +{ + struct bpf_prog *prog; + + if (event->overflow_handler_context) + /* hw breakpoint or kernel counter */ + return -EINVAL; + + if (event->prog) + return -EEXIST; + + prog = bpf_prog_get_type(prog_fd, BPF_PROG_TYPE_PERF_EVENT); + if (IS_ERR(prog)) + return PTR_ERR(prog); + + event->prog = prog; + event->orig_overflow_handler = READ_ONCE(event->overflow_handler); + WRITE_ONCE(event->overflow_handler, bpf_overflow_handler); + return 0; +} + +static void perf_event_free_bpf_handler(struct perf_event *event) +{ + struct bpf_prog *prog = event->prog; + + if (!prog) + return; + + WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler); + event->prog = NULL; + bpf_prog_put(prog); +} + static int
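To make the stacking concrete, here is a minimal userspace sketch of the pattern described in the commit message above: a filter handler is installed in front of the original handler and forwards only the samples for which the filter returns non-zero. Everything in it (struct event, attach_filter, detach_filter, keep_even) is hypothetical and for illustration only. It does not model READ_ONCE/WRITE_ONCE, preemption, RCU or refcounting, and it is not the kernel code.

```c
#include <stddef.h>

struct event;
typedef void (*handler_t)(struct event *ev, int sample);

/* Hypothetical stand-in for struct perf_event. */
struct event {
	handler_t handler;        /* handler invoked on every sample */
	handler_t orig_handler;   /* saved handler, called when the filter passes */
	int (*prog)(int sample);  /* filter: returning 0 drops the sample */
	int stored;               /* samples that reached the output handler */
};

/* Plays the role of the normal output handler. */
static void output_handler(struct event *ev, int sample)
{
	(void)sample;
	ev->stored++;
}

/* Stacked handler: run the filter first, then the original handler. */
static void filter_handler(struct event *ev, int sample)
{
	if (!ev->prog(sample))
		return;
	ev->orig_handler(ev, sample);
}

/* Returns 0 on success, -1 if a filter is already attached. */
static int attach_filter(struct event *ev, int (*prog)(int sample))
{
	if (ev->prog)
		return -1;
	ev->prog = prog;
	ev->orig_handler = ev->handler;
	ev->handler = filter_handler;
	return 0;
}

/* Restore the original handler; a no-op if nothing is attached. */
static void detach_filter(struct event *ev)
{
	if (!ev->prog)
		return;
	ev->handler = ev->orig_handler;
	ev->prog = NULL;
}

/* Example filter: keep even samples only. */
static int keep_even(int sample)
{
	return sample % 2 == 0;
}
```

With output_handler installed and keep_even attached, feeding samples 0..9 through ev.handler leaves five of them counted in ev.stored; after detach_filter every sample is counted again.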
[PATCH v2 net-next 0/6] perf, bpf: add support for bpf in sw/hw perf_events
Hi Peter, Dave, this patch set is a follow-up to the discussion: https://lkml.kernel.org/r/20160804142853.GO6862@twins.programming.kicks-ass.net It turned out to be simpler than what we discussed. Patches 1-3 are bpf-side prep for the main patch 4 that adds bpf program as an overflow_handler to sw and hw perf_events. Peter, please review. Patches 5 and 6 are examples from myself and Brendan. v1-v2: fixed issues spotted by Peter and Daniel. Thanks! Alexei Starovoitov (5): bpf: support 8-byte metafield access bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type bpf: perf_event progs should only use preallocated maps perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs samples/bpf: add perf_event+bpf example Brendan Gregg (1): samples/bpf: add sampleip example include/linux/bpf.h | 4 + include/linux/perf_event.h | 7 ++ include/uapi/linux/Kbuild | 1 + include/uapi/linux/bpf.h | 1 + include/uapi/linux/bpf_perf_event.h | 18 +++ kernel/bpf/verifier.c | 31 +- kernel/events/core.c | 85 +- kernel/trace/bpf_trace.c | 60 ++ samples/bpf/Makefile | 8 ++ samples/bpf/bpf_helpers.h | 2 + samples/bpf/bpf_load.c | 7 +- samples/bpf/sampleip_kern.c | 38 +++ samples/bpf/sampleip_user.c | 196 + samples/bpf/trace_event_kern.c | 65 +++ samples/bpf/trace_event_user.c | 213 15 files changed, 730 insertions(+), 6 deletions(-) create mode 100644 include/uapi/linux/bpf_perf_event.h create mode 100644 samples/bpf/sampleip_kern.c create mode 100644 samples/bpf/sampleip_user.c create mode 100644 samples/bpf/trace_event_kern.c create mode 100644 samples/bpf/trace_event_user.c -- 2.8.0
[PATCH v2 net-next 2/6] bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type
Introduce BPF_PROG_TYPE_PERF_EVENT programs that can be attached to HW and SW perf events (PERF_TYPE_HARDWARE and PERF_TYPE_SOFTWARE respectively in uapi/linux/perf_event.h). The program visible context meta structure is struct bpf_perf_event_data { struct pt_regs regs; __u64 sample_period; }; which is accessible directly from the program: int bpf_prog(struct bpf_perf_event_data *ctx) { ... ctx->sample_period ... ... ctx->regs.ip ... } The bpf verifier rewrites the accesses into kernel internal struct bpf_perf_event_data_kern which allows changing struct perf_sample_data without affecting bpf programs. New fields can be added to the end of struct bpf_perf_event_data in the future. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- include/linux/perf_event.h | 5 include/uapi/linux/Kbuild | 1 + include/uapi/linux/bpf.h | 1 + include/uapi/linux/bpf_perf_event.h | 18 +++ kernel/trace/bpf_trace.c | 60 + 5 files changed, 85 insertions(+) create mode 100644 include/uapi/linux/bpf_perf_event.h diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 2b6b43cc0dd5..97bfe62f30d7 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -788,6 +788,11 @@ struct perf_output_handle { int page; }; +struct bpf_perf_event_data_kern { + struct pt_regs *regs; + struct perf_sample_data *data; +}; + #ifdef CONFIG_CGROUP_PERF /* diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index 185f8ea2702f..d0352a971ebd 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -71,6 +71,7 @@ header-y += binfmts.h header-y += blkpg.h header-y += blktrace_api.h header-y += bpf_common.h +header-y += bpf_perf_event.h header-y += bpf.h header-y += bpqether.h header-y += bsg.h diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index e4c5a1baa993..f896dfac4ac0 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -95,6 +95,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_SCHED_ACT, 
BPF_PROG_TYPE_TRACEPOINT, BPF_PROG_TYPE_XDP, + BPF_PROG_TYPE_PERF_EVENT, }; #define BPF_PSEUDO_MAP_FD 1 diff --git a/include/uapi/linux/bpf_perf_event.h b/include/uapi/linux/bpf_perf_event.h new file mode 100644 index ..067427259820 --- /dev/null +++ b/include/uapi/linux/bpf_perf_event.h @@ -0,0 +1,18 @@ +/* Copyright (c) 2016 Facebook + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ +#ifndef _UAPI__LINUX_BPF_PERF_EVENT_H__ +#define _UAPI__LINUX_BPF_PERF_EVENT_H__ + +#include +#include + +struct bpf_perf_event_data { + struct pt_regs regs; + __u64 sample_period; +}; + +#endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */ diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index ad35213b8405..0ac414abbf68 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1,4 +1,5 @@ /* Copyright (c) 2011-2015 PLUMgrid, http://plumgrid.com + * Copyright (c) 2016 Facebook * * This program is free software; you can redistribute it and/or * modify it under the terms of version 2 of the GNU General Public @@ -8,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -552,10 +554,68 @@ static struct bpf_prog_type_list tracepoint_tl = { .type = BPF_PROG_TYPE_TRACEPOINT, }; +static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type, + enum bpf_reg_type *reg_type) +{ + if (off < 0 || off >= sizeof(struct bpf_perf_event_data)) + return false; + if (type != BPF_READ) + return false; + if (off % size != 0) + return false; + if (off == offsetof(struct bpf_perf_event_data, sample_period)) { + if (size != sizeof(u64)) + return false; + } else { + if (size != sizeof(long)) + return false; + } + return true; +} + +static u32 pe_prog_convert_ctx_access(enum bpf_access_type type, int dst_reg, + int src_reg, int ctx_off, + struct bpf_insn *insn_buf, + struct bpf_prog *prog) +{ + 
struct bpf_insn *insn = insn_buf; + + BUILD_BUG_ON(FIELD_SIZEOF(struct perf_sample_data, period) != sizeof(u64)); + switch (ctx_off) { + case offsetof(struct bpf_perf_event_data, sample_period): + *insn++ = BPF_LDX_MEM(bytes_to_bpf_size(FIELD_SIZEOF(struct bpf_perf_event_data_kern, data)), + dst_reg, src_reg, +
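The access rules enforced by pe_prog_is_valid_access() in the patch above can be tried out in userspace with a small sketch. The context layout here (struct mock_ctx with a four-register struct mock_regs) is a hypothetical stand-in for struct bpf_perf_event_data, chosen only so the example compiles anywhere; the rules themselves follow the patch: read-only, in-bounds, naturally aligned, 8 bytes for sample_period and sizeof(long) for everything else.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct pt_regs and struct bpf_perf_event_data. */
struct mock_regs {
	unsigned long r[4];
};

struct mock_ctx {
	struct mock_regs regs;
	uint64_t sample_period;
};

/* Mirrors the checks in the patch: bounds, read-only, alignment, field size. */
static bool valid_access(int off, int size, bool is_read)
{
	if (off < 0 || (size_t)off >= sizeof(struct mock_ctx))
		return false;
	if (!is_read)
		return false;
	if (size <= 0 || off % size != 0)
		return false;
	if ((size_t)off == offsetof(struct mock_ctx, sample_period))
		return (size_t)size == sizeof(uint64_t);
	/* everything else is a register-sized field */
	return (size_t)size == sizeof(long);
}
```

A register-sized read at offset 0 is accepted, while a write, a misaligned offset or a one-byte read is rejected.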
[PATCH v2 net-next 3/6] bpf: perf_event progs should only use preallocated maps
Make sure that BPF_PROG_TYPE_PERF_EVENT programs only use preallocated hash maps, since doing memory allocation in overflow_handler can crash depending on where the NMI got triggered. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- kernel/bpf/verifier.c | 22 +- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c1c9e441f0f5..48c2705db22c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2511,6 +2511,20 @@ process_bpf_exit: return 0; } +static int check_map_prog_compatibility(struct bpf_map *map, + struct bpf_prog *prog) + +{ + if (prog->type == BPF_PROG_TYPE_PERF_EVENT && + (map->map_type == BPF_MAP_TYPE_HASH || + map->map_type == BPF_MAP_TYPE_PERCPU_HASH) && + (map->map_flags & BPF_F_NO_PREALLOC)) { + verbose("perf_event programs can only use preallocated hash map\n"); + return -EINVAL; + } + return 0; +} + /* look for pseudo eBPF instructions that access map FDs and * replace them with actual map pointers */ @@ -2518,7 +2532,7 @@ static int replace_map_fd_with_map_ptr(struct verifier_env *env) { struct bpf_insn *insn = env->prog->insnsi; int insn_cnt = env->prog->len; - int i, j; + int i, j, err; for (i = 0; i < insn_cnt; i++, insn++) { if (BPF_CLASS(insn->code) == BPF_LDX && @@ -2562,6 +2576,12 @@ static int replace_map_fd_with_map_ptr(struct verifier_env *env) return PTR_ERR(map); } + err = check_map_prog_compatibility(map, env->prog); + if (err) { + fdput(f); + return err; + } + /* store map pointer inside BPF_LD_IMM64 instruction */ insn[0].imm = (u32) (unsigned long) map; insn[1].imm = ((u64) (unsigned long) map) >> 32; -- 2.8.0
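As a quick way to see which program/map combinations the check above rejects, here is a userspace sketch of the same predicate. The enums and struct map are hypothetical stand-ins with illustrative values (they are not the uapi constants), and map_allowed() is an assumed name, not a kernel function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; values are illustrative, not the uapi ones. */
enum prog_type { PROG_KPROBE, PROG_PERF_EVENT };
enum map_type { MAP_HASH, MAP_PERCPU_HASH, MAP_ARRAY };

#define F_NO_PREALLOC (1U << 0)

struct map {
	enum map_type type;
	unsigned int flags;
};

/* Only perf_event programs are restricted, and only for hash maps
 * that were created without preallocation. */
static bool map_allowed(enum prog_type prog, const struct map *map)
{
	bool is_hash = map->type == MAP_HASH || map->type == MAP_PERCPU_HASH;

	if (prog == PROG_PERF_EVENT && is_hash && (map->flags & F_NO_PREALLOC))
		return false;
	return true;
}
```

Under these assumptions a kprobe program may still use a non-preallocated hash map, and a perf_event program may use any array map or preallocated hash map.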
[PATCH v2 net-next 1/6] bpf: support 8-byte metafield access
The verifier supported only 4-byte metafields in struct __sk_buff and struct xdp_md. The metafields in upcoming struct bpf_perf_event are 8-byte to match register width in struct pt_regs. Teach verifier to recognize 8-byte metafield access. The patch doesn't affect safety of sockets and xdp programs. They check for 4-byte only ctx access before these conditions are hit. Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann --- kernel/bpf/verifier.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index abb61f3f6900..c1c9e441f0f5 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2333,7 +2333,8 @@ static int do_check(struct verifier_env *env) if (err) return err; - if (BPF_SIZE(insn->code) != BPF_W) { + if (BPF_SIZE(insn->code) != BPF_W && + BPF_SIZE(insn->code) != BPF_DW) { insn_idx++; continue; } @@ -2642,9 +2643,11 @@ static int convert_ctx_accesses(struct verifier_env *env) for (i = 0; i < insn_cnt; i++, insn++) { u32 insn_delta, cnt; - if (insn->code == (BPF_LDX | BPF_MEM | BPF_W)) + if (insn->code == (BPF_LDX | BPF_MEM | BPF_W) || + insn->code == (BPF_LDX | BPF_MEM | BPF_DW)) type = BPF_READ; - else if (insn->code == (BPF_STX | BPF_MEM | BPF_W)) + else if (insn->code == (BPF_STX | BPF_MEM | BPF_W) || + insn->code == (BPF_STX | BPF_MEM | BPF_DW)) type = BPF_WRITE; else continue; -- 2.8.0
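The opcode test in convert_ctx_accesses() above is plain bit arithmetic, so it can be checked outside the kernel. The sketch below redefines the relevant opcode fields locally, with the values from include/uapi/linux/bpf_common.h and bpf.h, so that it builds without kernel headers; enum access and ctx_access_type() are made-up names for illustration.

```c
#include <stdint.h>

/* Opcode fields, same values as the uapi headers. */
#define BPF_LDX  0x01  /* instruction class: load into register */
#define BPF_STX  0x03  /* instruction class: store from register */
#define BPF_W    0x00  /* size: 4 bytes */
#define BPF_H    0x08  /* size: 2 bytes */
#define BPF_DW   0x18  /* size: 8 bytes */
#define BPF_MEM  0x60  /* mode: regular memory access */

enum access { ACCESS_NONE, ACCESS_READ, ACCESS_WRITE };

/* Classify an opcode the way the patched loop does: 4- and 8-byte
 * loads are reads, 4- and 8-byte stores are writes, the rest is skipped. */
static enum access ctx_access_type(uint8_t code)
{
	if (code == (BPF_LDX | BPF_MEM | BPF_W) ||
	    code == (BPF_LDX | BPF_MEM | BPF_DW))
		return ACCESS_READ;
	if (code == (BPF_STX | BPF_MEM | BPF_W) ||
	    code == (BPF_STX | BPF_MEM | BPF_DW))
		return ACCESS_WRITE;
	return ACCESS_NONE;
}
```

For example, opcode 0x79 (BPF_LDX | BPF_MEM | BPF_DW) is now classified as a read, whereas before the patch only 0x61 (the BPF_W load) was.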
[PATCH v2 net-next 5/6] samples/bpf: add perf_event+bpf example
The bpf program is called 50 times a second and does hashmap[kern_stackid]++ Its primary purpose is to check that key bpf helpers like map lookup, update, get_stackid, trace_printk and ctx access are all working. It checks: - PERF_COUNT_HW_CPU_CYCLES on all cpus - PERF_COUNT_HW_CPU_CYCLES for current process and inherited perf_events to children - PERF_COUNT_SW_CPU_CLOCK on all cpus - PERF_COUNT_SW_CPU_CLOCK for current process Signed-off-by: Alexei Starovoitov --- samples/bpf/Makefile | 4 + samples/bpf/bpf_helpers.h | 2 + samples/bpf/bpf_load.c | 7 +- samples/bpf/trace_event_kern.c | 65 + samples/bpf/trace_event_user.c | 213 + 5 files changed, 290 insertions(+), 1 deletion(-) create mode 100644 samples/bpf/trace_event_kern.c create mode 100644 samples/bpf/trace_event_user.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index db3cb061bfcd..a69cf9045285 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -25,6 +25,7 @@ hostprogs-y += test_cgrp2_array_pin hostprogs-y += xdp1 hostprogs-y += xdp2 hostprogs-y += test_current_task_under_cgroup +hostprogs-y += trace_event test_verifier-objs := test_verifier.o libbpf.o test_maps-objs := test_maps.o libbpf.o @@ -52,6 +53,7 @@ xdp1-objs := bpf_load.o libbpf.o xdp1_user.o xdp2-objs := bpf_load.o libbpf.o xdp1_user.o test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \ test_current_task_under_cgroup_user.o +trace_event-objs := bpf_load.o libbpf.o trace_event_user.o # Tell kbuild to always build the programs always := $(hostprogs-y) @@ -79,6 +81,7 @@ always += test_cgrp2_tc_kern.o always += xdp1_kern.o always += xdp2_kern.o always += test_current_task_under_cgroup_kern.o +always += trace_event_kern.o HOSTCFLAGS += -I$(objtree)/usr/include @@ -103,6 +106,7 @@ HOSTLOADLIBES_test_overhead += -lelf -lrt HOSTLOADLIBES_xdp1 += -lelf HOSTLOADLIBES_xdp2 += -lelf HOSTLOADLIBES_test_current_task_under_cgroup += -lelf +HOSTLOADLIBES_trace_event += -lelf # Allows pointing LLC/CLANG to a LLVM backend with bpf 
support, redefine on cmdline: # make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h index bbdf62a1e45e..90f44bd2045e 100644 --- a/samples/bpf/bpf_helpers.h +++ b/samples/bpf/bpf_helpers.h @@ -55,6 +55,8 @@ static int (*bpf_skb_get_tunnel_opt)(void *ctx, void *md, int size) = (void *) BPF_FUNC_skb_get_tunnel_opt; static int (*bpf_skb_set_tunnel_opt)(void *ctx, void *md, int size) = (void *) BPF_FUNC_skb_set_tunnel_opt; +static unsigned long long (*bpf_get_prandom_u32)(void) = + (void *) BPF_FUNC_get_prandom_u32; /* llvm builtin functions that eBPF C program may use to * emit BPF_LD_ABS and BPF_LD_IND instructions diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c index 0cfda2320320..97913e109b14 100644 --- a/samples/bpf/bpf_load.c +++ b/samples/bpf/bpf_load.c @@ -51,6 +51,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size) bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0; bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0; bool is_xdp = strncmp(event, "xdp", 3) == 0; + bool is_perf_event = strncmp(event, "perf_event", 10) == 0; enum bpf_prog_type prog_type; char buf[256]; int fd, efd, err, id; @@ -69,6 +70,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size) prog_type = BPF_PROG_TYPE_TRACEPOINT; } else if (is_xdp) { prog_type = BPF_PROG_TYPE_XDP; + } else if (is_perf_event) { + prog_type = BPF_PROG_TYPE_PERF_EVENT; } else { printf("Unknown event '%s'\n", event); return -1; @@ -82,7 +85,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size) prog_fd[prog_cnt++] = fd; - if (is_xdp) + if (is_xdp || is_perf_event) return 0; if (is_socket) { @@ -326,6 +329,7 @@ int load_bpf_file(char *path) memcmp(shname_prog, "kretprobe/", 10) == 0 || memcmp(shname_prog, "tracepoint/", 11) == 0 || memcmp(shname_prog, "xdp", 3) == 0 || + memcmp(shname_prog, "perf_event", 10) == 
0 || memcmp(shname_prog, "socket", 6) == 0) load_and_attach(shname_prog, insns, data_prog->d_size); } @@ -344,6 +348,7 @@ int load_bpf_file(char *path) memcmp(shname, "kretprobe/", 10) == 0 || memcmp(shname, "tracepoint/", 11) == 0 || memcmp(shname, "xdp", 3) == 0 || + memcmp(shname, "perf_event", 10) == 0 ||
Re: [PATCH] softirq: let ksoftirqd do its job
On Wed, 31 Aug 2016 13:42:30 -0700 Eric Dumazet wrote: > On Wed, 2016-08-31 at 21:40 +0200, Jesper Dangaard Brouer wrote: > > > I can confirm the improvement of approx 900Kpps (no wonder people have > > been complaining about DoS against UDP/DNS servers). > > > > BUT during my extensive testing, of this patch, I also think that we > > have not gotten to the bottom of this. I was expecting to see a higher > > (collective) PPS number as I add more UDP servers, but I don't. > > > > Running many UDP netperf's with command: > > super_netperf 4 -H 198.18.50.3 -l 120 -t UDP_STREAM -T 0,0 -- -m 1472 -n > > -N > > Are you sure sender can send fast enough ? Yes, as I can see drops (overrun UDP limit UdpRcvbufErrors). Switching to pktgen and udp_sink to be sure. > > > > With 'top' I can see ksoftirq are still getting a higher %CPU time: > > > > PID %CPU TIME+ COMMAND > > 3 36.5 2:28.98 ksoftirqd/0 > > 10724 9.6 0:01.05 netserver > > 10722 9.3 0:01.05 netserver > > 10723 9.3 0:01.05 netserver > > 10725 9.3 0:01.05 netserver > > Looks much better on my machine, with "udprcv -n 4" (using 4 threads, > and 4 sockets using SO_REUSEPORT) > > 10755 root 20 0 34948 4 0 S 79.7 0.0 0:33.66 udprcv > 3 root 20 0 0 0 0 R 19.9 0.0 0:25.49 > ksoftirqd/0 > > Pressing 'H' in top gives : > > 3 root 20 0 0 0 0 R 19.9 0.0 0:47.84 ksoftirqd/0 > 10756 root 20 0 34948 4 0 R 19.9 0.0 0:30.76 udprcv > 10757 root 20 0 34948 4 0 R 19.9 0.0 0:30.76 udprcv > 10758 root 20 0 34948 4 0 S 19.9 0.0 0:30.76 udprcv > 10759 root 20 0 34948 4 0 S 19.9 0.0 0:30.76 udprcv Yes, I'm seeing the same when running 5 instances of my own udp_sink[1]: sudo taskset -c 0 ./udp_sink --port 10003 --recvmsg --reuse-port --count $((10**10)) PID S %CPU TIME+ COMMAND 3 R 21.6 2:21.33 ksoftirqd/0 3838 R 15.9 0:02.18 udp_sink 3856 R 15.6 0:02.16 udp_sink 3862 R 15.6 0:02.16 udp_sink 3844 R 15.3 0:02.15 udp_sink 3850 S 15.3 0:02.15 udp_sink This is the expected result, that adding more userspace receivers scales up. 
I needed 5 udp_sink's before I stopped seeing drops; either this says the job performed by ksoftirqd is 5 times faster or the collective queue size of the programs was large enough to absorb the scheduling jitter. The results from this run were handling 1,517,248 pps, without any drops, all processes pinned to the same CPU. $ nstat > /dev/null && sleep 1 && nstat #kernel IpInReceives 1517225 0.0 IpInDelivers 1517224 0.0 UdpInDatagrams 1517248 0.0 IpExtInOctets 69793408 0.0 IpExtInNoECTPkts 1517246 0.0 I'm acking this patch: Acked-by: Jesper Dangaard Brouer > > Patch was on top of commit 071e31e254e0e0c438eecba3dba1d6e2d0da36c2 Mine on top of commit 84fd1b191a9468 > > > > > > > Since the load runs in well identified threads context, an admin can > > > more easily tune process scheduling parameters if needed. > > > > With this patch applied, I found that changing the UDP server process, > > scheduler policy to SCHED_RR or SCHED_FIFO gave me a performance boost > > from 900Kpps to 1.7Mpps, and not a single UDP packet dropped (even with > > a single UDP stream, also tested with more) > > > > Command used: > > sudo chrt --rr -p 20 $(pgrep netserver) > > > Sure, this is what I mentioned in my changelog : Once we properly > schedule and rely on ksoftirqd, tuning is available. > > > > > The scheduling picture also change a lot: > > > >PID %CPU TIME+ COMMAND > > 10783 24.3 0:21.53 netserver > > 10784 24.3 0:21.53 netserver > > 10785 24.3 0:21.52 netserver > > 10786 24.3 0:21.50 netserver > > 3 2.7 3:12.18 ksoftirqd/0 > > [1] https://github.com/netoptimizer/network-testing/blob/master/src/udp_sink.c -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH] irda: Fix likely typo in output format string
From: Oleg Drokin Date: Wed, 31 Aug 2016 17:36:44 -0400 > > On Aug 31, 2016, at 5:31 PM, David Miller wrote: > >> From: Oleg Drokin >> Date: Fri, 26 Aug 2016 23:14:06 -0400 >> >>> %ul would print an unsigned with a letter l at the end which does >>> not seem to be desired here, on the other hand the value being printed >>> is u32 so just drop the l instead of converting to %lu >>> >>> Signed-off-by: Oleg Drokin >> >> %u is for unsigned values, and these are "s32" thus signed. > > Hm, you are right. > I could swear I saw them as unsigned when I looked at it. > > Anyway can they really be negative? they are seconds and usec, > should I change them to u32 too? If you're interested in continuing with this, it is your area for exploration, not ours :-)
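The format-string point in this exchange is easy to reproduce in userspace. The sketch below uses snprintf() rather than the kernel's printk, and the helper names are made up for illustration: "%ul" is parsed as "%u" followed by a literal 'l', and a negative 32-bit value shows up as a large positive number once it is treated as unsigned.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* "%ul" is "%u" followed by the literal character 'l'. */
static int format_ul(char *buf, size_t len, unsigned int v)
{
	return snprintf(buf, len, "%ul", v);
}

/* Signed 32-bit value printed with a signed conversion. */
static int format_signed(char *buf, size_t len, int32_t v)
{
	return snprintf(buf, len, "%" PRId32, v);
}

/* The same value reinterpreted as unsigned, which is what an
 * unsigned conversion would show for a negative s32. */
static int format_as_unsigned(char *buf, size_t len, int32_t v)
{
	return snprintf(buf, len, "%" PRIu32, (uint32_t)v);
}
```

With these helpers, 42 formatted through format_ul() yields "42l", and -1 yields "-1" through format_signed() but "4294967295" through format_as_unsigned().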
[PATCH v2 net-next 6/6] samples/bpf: add sampleip example
From: Brendan Gregg sample instruction pointer and frequency count in a BPF map Signed-off-by: Brendan Gregg Signed-off-by: Alexei Starovoitov --- samples/bpf/Makefile | 4 + samples/bpf/sampleip_kern.c | 38 + samples/bpf/sampleip_user.c | 196 3 files changed, 238 insertions(+) create mode 100644 samples/bpf/sampleip_kern.c create mode 100644 samples/bpf/sampleip_user.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index a69cf9045285..12b7304d55dc 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -26,6 +26,7 @@ hostprogs-y += xdp1 hostprogs-y += xdp2 hostprogs-y += test_current_task_under_cgroup hostprogs-y += trace_event +hostprogs-y += sampleip test_verifier-objs := test_verifier.o libbpf.o test_maps-objs := test_maps.o libbpf.o @@ -54,6 +55,7 @@ xdp2-objs := bpf_load.o libbpf.o xdp1_user.o test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \ test_current_task_under_cgroup_user.o trace_event-objs := bpf_load.o libbpf.o trace_event_user.o +sampleip-objs := bpf_load.o libbpf.o sampleip_user.o # Tell kbuild to always build the programs always := $(hostprogs-y) @@ -82,6 +84,7 @@ always += xdp1_kern.o always += xdp2_kern.o always += test_current_task_under_cgroup_kern.o always += trace_event_kern.o +always += sampleip_kern.o HOSTCFLAGS += -I$(objtree)/usr/include @@ -107,6 +110,7 @@ HOSTLOADLIBES_xdp1 += -lelf HOSTLOADLIBES_xdp2 += -lelf HOSTLOADLIBES_test_current_task_under_cgroup += -lelf HOSTLOADLIBES_trace_event += -lelf +HOSTLOADLIBES_sampleip += -lelf # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: # make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang diff --git a/samples/bpf/sampleip_kern.c b/samples/bpf/sampleip_kern.c new file mode 100644 index ..774a681f374a --- /dev/null +++ b/samples/bpf/sampleip_kern.c @@ -0,0 +1,38 @@ +/* Copyright 2016 Netflix, Inc. 
+ * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ +#include +#include +#include +#include +#include "bpf_helpers.h" + +#define MAX_IPS 8192 + +struct bpf_map_def SEC("maps") ip_map = { + .type = BPF_MAP_TYPE_HASH, + .key_size = sizeof(u64), + .value_size = sizeof(u32), + .max_entries = MAX_IPS, +}; + +SEC("perf_event") +int do_sample(struct bpf_perf_event_data *ctx) +{ + u64 ip; + u32 *value, init_val = 1; + + ip = ctx->regs.ip; + value = bpf_map_lookup_elem(&ip_map, &ip); + if (value) + *value += 1; + else + /* E2BIG not tested for this example only */ + bpf_map_update_elem(&ip_map, &ip, &init_val, BPF_NOEXIST); + + return 0; +} +char _license[] SEC("license") = "GPL"; diff --git a/samples/bpf/sampleip_user.c b/samples/bpf/sampleip_user.c new file mode 100644 index ..260a6bdd6413 --- /dev/null +++ b/samples/bpf/sampleip_user.c @@ -0,0 +1,196 @@ +/* + * sampleip: sample instruction pointer and frequency count in a BPF map. + * + * Copyright 2016 Netflix, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. 
+ */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "libbpf.h" +#include "bpf_load.h" + +#define DEFAULT_FREQ 99 +#define DEFAULT_SECS 5 +#define MAX_IPS 8192 +#define PAGE_OFFSET 0x8800 + +static int nr_cpus; + +static void usage(void) +{ + printf("USAGE: sampleip [-F freq] [duration]\n"); + printf(" -F freq # sample frequency (Hertz), default 99\n"); + printf(" duration # sampling duration (seconds), default 5\n"); +} + +static int sampling_start(int *pmu_fd, int freq) +{ + int i; + + struct perf_event_attr pe_sample_attr = { + .type = PERF_TYPE_SOFTWARE, + .freq = 1, + .sample_period = freq, + .config = PERF_COUNT_SW_CPU_CLOCK, + .inherit = 1, + }; + + for (i = 0; i < nr_cpus; i++) { + pmu_fd[i] = perf_event_open(&pe_sample_attr, -1 /* pid */, i, + -1 /* group_fd */, 0 /* flags */); + if (pmu_fd[i] < 0) { + fprintf(stderr, "ERROR: Initializing perf sampling\n"); + return 1; + } + assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF, + prog_fd[0]) == 0); + assert(ioctl(pmu_fd[i],
Re: [PATCH V2] dt: net: enhance DWC EQoS binding to support Tegra186
On 08/31/2016 03:15 AM, Lars Persson wrote:
On 08/30/2016 10:50 PM, Stephen Warren wrote:
On 08/30/2016 01:01 PM, Rob Herring wrote:
On Wed, Aug 24, 2016 at 03:20:46PM -0600, Stephen Warren wrote:

The Synopsys DWC EQoS is a configurable IP block which supports multiple options for bus type, clocking and reset structure, and feature list. Extend the DT binding to define a "compatible value" for the configuration contained in NVIDIA's Tegra186 SoC, and define some new properties and list property entries required by that configuration.

diff --git a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt

+- clock-names: May contain any/all of the following depending on the IP
+ configuration, in any order:
+The EQOS transmit path clock. The HW signal name is clk_tx_i.
+In some configurations (e.g. GMII/RGMII), this clock also drives the PHY TX
+path. In other configurations, other clocks (such as tx_125, rmii) may
+drive the PHY TX path.
+ - "rx"
+The EQOS receive path clock. The HW signal name is clk_rx_i.
+In some configurations (e.g. GMII/RGMII), this clock also drives the PHY RX
+path. In other configurations, other clocks (such as rx_125, pmarx_0,
+pmarx_1, rmii) may drive the PHY RX path.

It is not correct that clk_rx_i drives the PHY rx path for GMII/RGMII. The PHY is the source of the rx clock for these modes.

I think both of our statements are true. There's a clock input to the EQOS module (clk_rx_i) that does drive the RX path in the EQOS module. That clock also drives the PHY's RX path. Those statements make no comment regarding the /source/ of that clock; either of the following might be true:

1) The PHY could generate the clock internally somehow, feed its own internal logic with that clock, and send the clock out to feed the EQOS RX path too.

or,

2) SoC integration could drive the same clock into both the EQOS and PHY modules, so that both sets of logic are fed from the same external clock.

Perhaps the phrase "PHY RX path" is confusing; I was talking about the EQOS module's RX path from the PHY more than the PHY itself, although given what I said above I believe either interpretation is valid and correct.

Will the driver need to make any clock ops on the "rx" clock?

Yes. The EQOS driver needs to ensure that the clock is running before attempting to receive data from the PHY, otherwise the EQOS's own RX logic won't be clocked. Whether the phandle for this clock points at a SoC-level provider (it will in Tegra) or a clock provider in the PHY (it might in other SoCs) shouldn't matter as far as the DT binding goes, although it might affect device probe ordering in some implementations.

+ Note: Support for additional IP configurations may require adding the
+ following clocks to this list in the future: clk_rx_125_i, clk_tx_125_i,
+ clk_pmarx_0_i, clk_pmarx1_i, clk_rmii_i, clk_revmii_rx_i, clk_revmii_tx_i.
+
+ The following compatible values require the following set of clocks:
+ - "nvidia,tegra186-eqos", "snps,dwc-qos-ethernet-4.10":
+- "slave_bus"
+- "master_bus"
+- "rx"
+- "tx"
+- "ptp_ref"
+ - "axis,artpec6-eqos", "snps,dwc-qos-ethernet-4.10":
+- "phy_ref_clk"
+- "apb_clk"

It would be good if this was marked deprecated and the full set of clocks could be described and supported. Not sure if you can figure that out. Is it really only 2 clocks, or do these have multiple connections to the same source?

Lars, can you answer here? I deliberately didn't attempt to change the binding definition for the existing use-case, since I'm not familiar with that SoC, and don't relish changing DTs for a platform I can't test.

For the artpec-6 the clocks are like this:

apb_clk: It is both the master and slave bus clock.
phy_ref_clk: It corresponds to the tx clock in the proposed new binding.

There is also a ptp reference clock that will map to the new ptp_ref clock binding. So the full set of clocks in a new artpec-6 binding is:

slave_bus
master_bus
tx
ptp_ref

Given the discussion above, I think we should represent the rx clock too.
Re: [RESEND PATCH 3/4] arm64: dts: rockchip: support gmac for rk3399
Hi,

On Wed, Aug 31, 2016 at 2:29 PM, Heiko Stübner wrote:
>> IMHO it would be nice if this were broken into two patches.
>>
>> 1. First patch would be the power domain patch and that could land any
>> time. You wouldn't actually be able to use the gmac but at least
>> you'd be able to turn off its power. This would be a handy patch to
>> be able to backport if you happened to not need Ethernet support but
>> wanted to save power.
>>
>> 2. Second patch would actually add the gmac.
>
> according to my talk with Caesar in the real v1, the gmac even with
> power-domains should work just nicely even without the dts patches, as the
> driver core takes care of powering up the pd before probe.
>
> But I may miss some peculiarity of the dwmac?

Nothing that I'm terribly aware of. I was just being selfish because:

1. I'm on a board where I don't need Ethernet.
2. I'm running a semi-old kernel (4.4).
3. I don't want to pick back the various fixes that might be needed to make gmac work on rk3399 to that old kernel.
4. I want it very obvious that gmac isn't really supported on this old kernel on rk3399 (and having stmmac not in the device tree would make it very obvious).
5. I do want the power savings of turning the power domains off for the gmac.

If this patch is broken in two then I can pick back just the power domain patch. :-P

-Doug
Re: [PATCH] irda: Fix likely typo in output format string
On Aug 31, 2016, at 5:31 PM, David Miller wrote:

> From: Oleg Drokin
> Date: Fri, 26 Aug 2016 23:14:06 -0400
>
>> %ul would print an unsigned with a letter l at the end which does
>> not seem to be desired here, on the other hand the value being printed
>> is u32 so just drop the l instead of converting to %lu
>>
>> Signed-off-by: Oleg Drokin
>
> %u is for unsigned values, and these are "s32" thus signed.

Hm, you are right. I could swear I saw them as unsigned when I looked at it. Anyway, can they really be negative? They are seconds and usec; should I change them to u32 too?
Re: [PATCH] irda: Fix likely typo in output format string
From: Oleg Drokin
Date: Fri, 26 Aug 2016 23:14:06 -0400

> %ul would print an unsigned with a letter l at the end which does
> not seem to be desired here, on the other hand the value being printed
> is u32 so just drop the l instead of converting to %lu
>
> Signed-off-by: Oleg Drokin

%u is for unsigned values, and these are "s32" thus signed.
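To make the format-string point concrete, here is a minimal user-space sketch. It assumes a platform where int is 32 bits, and it uses plain int and unsigned int as stand-ins for the kernel's s32 and u32. The helper names are hypothetical and are not taken from the irda code.

```c
#include <stddef.h>
#include <stdio.h>

/* "%ul" is parsed as the conversion "%u" followed by a literal 'l'. */
static int format_with_ul(char *buf, size_t len, unsigned int v)
{
    return snprintf(buf, len, "%ul", v);
}

/* A signed 32-bit quantity (s32 in the kernel, int here) wants "%d". */
static int format_signed(char *buf, size_t len, int v)
{
    return snprintf(buf, len, "%d", v);
}

/* Pushing the same value through "%u" shows how a negative number would be misprinted. */
static int format_as_unsigned(char *buf, size_t len, int v)
{
    return snprintf(buf, len, "%u", (unsigned int)v);
}
```

With these helpers, formatting 5 through "%ul" yields "5l", while -1 prints as "-1" with "%d" and as a large positive number with "%u". That is why simply dropping the l is not enough if the fields stay signed.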
Re: [RESEND PATCH 3/4] arm64: dts: rockchip: support gmac for rk3399
On Wednesday, 31 August 2016, 13:42:17, Doug Anderson wrote:
> Caesar,
>
> On Tue, Aug 30, 2016 at 11:13 PM, Caesar Wang wrote:
> > This patch adds needed gamc information for rk3399,
> > also support the gmac pd.
> >
> > Signed-off-by: Roger Chen
> > Signed-off-by: Caesar Wang
> > ---
> >
> > arch/arm64/boot/dts/rockchip/rk3399.dtsi | 90
> > 1 file changed, 90 insertions(+)
>
> I noticed that your subject for this patch contains "RESEND" and not
> "v2" event though there are changes between this version and the last
> one. That's really confusing. This should have been "v2" and the
> next version should be "v3".
>
> > diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> > b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index 32aebc8..abf27a4 100644
> > --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> > +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> > @@ -200,6 +200,26 @@
> >
> > };
> >
> > };
> >
> > + gmac: eth@fe30 {
>
> nit: on rk3288 the node was "ethernet@" instead of "eth@". Presumably
> "ethernet" is more correct?
>
> > + compatible = "rockchip,rk3399-gmac";
> > + reg = <0x0 0xfe30 0x0 0x1>;
> > + interrupts = ;
> > + interrupt-names = "macirq";
> > + clocks = < SCLK_MAC>, < SCLK_MAC_RX>,
> > +< SCLK_MAC_TX>, < SCLK_MACREF>,
> > +< SCLK_MACREF_OUT>, < ACLK_GMAC>,
> > +< PCLK_GMAC>;
> > + clock-names = "stmmaceth", "mac_clk_rx",
> > + "mac_clk_tx", "clk_mac_ref",
> > + "clk_mac_refout", "aclk_mac",
> > + "pclk_mac";
> > + power-domains = < RK3399_PD_GMAC>;
> > + resets = < SRST_A_GMAC>;
> > + reset-names = "stmmaceth";
> > + rockchip,grf = <>;
> > + status = "disabled";
> > + };
> > +
> >
> > sdio0: dwmmc@fe31 {
> >
> > compatible = "rockchip,rk3399-dw-mshc",
> >
> > "rockchip,rk3288-dw-mshc";
> >
> > @@ -611,6 +631,11 @@
> >
> > status = "disabled";
> >
> > };
> >
> > + qos_gmac: qos@ffa5c000 {
> > + compatible = "syscon";
> > + reg = <0x0 0xffa5c000 0x0 0x20>;
> > + };
> > +
> >
> > qos_hdcp: qos@ffa9 {
> >
> > compatible = "syscon";
> > reg = <0x0 0xffa9 0x0 0x20>;
> >
> > @@ -704,6 +729,11 @@
> >
> > #size-cells = <0>;
> >
> > /* These power domains are grouped by VD_CENTER */
> >
> > + pd_gmac@RK3399_PD_GMAC {
>
> RK3399_PD_GMAC is not in VD_CENTER but in VD_LOGIC, right? ...so this
> should move.
>
> > + reg = ;
> > + clocks = < ACLK_GMAC>;
> > + pm_qos = <_gmac>;
> > + };
>
> IMHO it would be nice if this were broken into two patches.
>
> 1. First patch would be the power domain patch and that could land any
> time. You wouldn't actually be able to use the gmac but at least
> you'd be able to turn off its power. This would be a handy patch to
> be able to backport if you happened to not need Ethernet support but
> wanted to save power.
>
> 2. Second patch would actually add the gmac.

According to my talk with Caesar in the real v1, the gmac even with power-domains should work just nicely even without the dts patches, as the driver core takes care of powering up the pd before probe.

But I may miss some peculiarity of the dwmac?

Heiko
Re: [PATCH net v3 0/9] net: ethernet: mediatek: a couple of fixes
From:
Date: Tue, 30 Aug 2016 10:59:16 +0800

> a couple of fixes come out from integrating with linux-4.8 rc1
> they all are verified and workable on linux-4.8 rc1

I get rejects when I try to apply this to the current tree.
Re: [RFC] xgbe: constify get_netdev_ops and get_ethtool_ops
From: Stephen Hemminger
Date: Wed, 31 Aug 2016 08:57:36 -0700

> Casting away const is bad practice. Since this is ARM specific driver
> don't have hardware actually test this.
>
> Having getter functions for ops is really unnecessary code bloat, but
> not going to touch that.
>
> Signed-off-by: Stephen Hemminger

I'll just apply this, let's see what happens.
Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver
> > You can just as easily find the child node called ethernet-phy.
>
> As Andrew pointed out, using phy-handle allows me to place the phy
> node anywhere.
>
> I've already made changes to this design, and every change has
> raised objections. I don't see anything wrong with phy-handle. A
> lot of drivers use it.

Agreed. You should do the same as what all other drivers do.

Andrew
Re: [PATCH net-next v2 0/3] net: dsa: add MDB support
From: Andrew Lunn
Date: Wed, 31 Aug 2016 18:04:05 +0200

> On Wed, Aug 31, 2016 at 11:50:02AM -0400, Vivien Didelot wrote:
>> This patchset adds the switchdev MDB object support to the DSA layer.
>>
>> The MDB support for the mv88e6xxx driver is very similar to the FDB
>> support. The FDB operations care about unicast addresses while the MDB
>> operations care about multicast addresses.
>>
>> Both operation set load/purge/dump the Address Translation Table (ATU),
>> thus common code is used.
>
> Reviewed-by: Andrew Lunn

Series applied, thanks everyone.
Re: [PATCH net-next V4 00/10] liquidio CN23XX support
From: Raghu Vatsavayi
Date: Wed, 31 Aug 2016 11:03:19 -0700

> Following patchset adds support for new device "CN23XX" in
> liquidio family of adapters. As adviced by you I have split
> the previous V3 patch of 18 patches into two halves. This
> first patchset has first 10 patches, which are tested against
> net-next. I will post the second half after this one.
>
> This V4 patch also addressed all the comments from previous
> submission:
> 1) Avoid busy loop while reading registers.
> 2) Other minor comments about debug messages and constants.
>
> Please apply patches in following order as some of the
> patches depend on earlier patches.

Series applied, thanks.
Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released
Arnd Bergmann wrote:

> Right, sorry about that. Do you want me to resend the fixed version,
> or do you apply and fix it yourself?

I can fix it up myself. I'll pull it into my tree when I've finished doing the fixing up I'm currently working on.

David
Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver
Rob Herring wrote:

>> It's not a generic phy. It's a funky "internal phy" that differs among
>> SOCs. I call it the internal phy, but I could use another name. Internally,
>> some people call it the "sgmii phy", but I don't think that's accurate.
>
> Funky internal PHYs are precisely the types of PHYs this binding is
> for. It is generic in that the type is not defined. It can be USB,
> HDMI, DSI, LVDS, etc.

I don't understand what you're getting at. There are two IP blocks that have a private interconnect. One is the MAC, and the other is an internal PHY, but the driver programs them as one device. If you want me to make some kind of change, you're going to have to be more specific.

>> That's what I thought to, but without it, of_phy_find_device() won't work.
>> I need a pointer to the phy node, and I use of_parse_phandle() to get it:
>>
>> struct device_node *phy_np;
>>
>> ret = of_mdiobus_register(mii_bus, np);
>> if (ret) {
>> dev_err(>dev, "could not register mdio bus\n");
>> return ret;
>> }
>>
>> phy_np = of_parse_phandle(np, "phy-handle", 0);
>
> You can just as easily find the child node called ethernet-phy.

As Andrew pointed out, using phy-handle allows me to place the phy node anywhere.

I've already made changes to this design, and every change has raised objections. I don't see anything wrong with phy-handle. A lot of drivers use it.

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH 3/5] rxrpc: fix last_call processing
Arnd Bergmann wrote:

> I'll follow up with the fixes, both of which are rather
> straightforward.

Are they both in?

[PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released

David
Re: [PATCH] ipv6: Don't unset flowi6_proto in ipxip6_tnl_xmit()
From: Eli Cooper
Date: Fri, 26 Aug 2016 23:52:29 +0800

> @@ -1174,6 +1174,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
> encap_limit = t->parms.encap_limit;
>
> memcpy(, >fl.u.ip6, sizeof(fl6));
> + fl6.flowi6_proto = IPPROTO_IPIP;

Let's just simply have t->fl have the proto set up properly, just like in GRE.

Assigning it explicitly on every packet transmit doesn't make much sense.
Re: possible circular locking dependency detected (bisected)
CAI Qian writes:

> Reverted the patch below fixes this problem.
>
> c845acb324aa85a39650a14e7696982ceea75dc1
> af_unix: Fix splice-bind deadlock

Reverting a patch fixing one deadlock in order to avoid another deadlock leaves the 'net situation' unchanged. The idea of the other patch was to change unix_mknod such that it doesn't do __sb_start_write with u->readlock held anymore. As far as I understand the output below, overlayfs introduces an additional codepath where unix_mknod ends up doing __sb_start_write again. That's already the original deadlock re-added, cf.

B: splice() from a pipe to /mnt/regular_file
   does sb_start_write() on /mnt
C: try to freeze /mnt
   wait for B to finish with /mnt
A: bind() try to bind our socket to /mnt/new_socket_name
   lock our socket, see it not bound yet
   decide that it needs to create something in /mnt
   try to do sb_start_write() on /mnt, block (it's waiting for C).
D: splice() from the same pipe to our socket
   lock the pipe, see that socket is connected
   try to lock the socket, block waiting for A
B: get around to actually feeding a chunk from pipe to file,
   try to lock the pipe. Deadlock.

as A will again acquire the readlock and then call __sb_start_write.

> >CAI Qian
>
> - Original Message -
>> From: "CAI Qian"
>> To: secur...@kernel.org
>> Cc: "Miklos Szeredi" , "Eric Sandeen"
>>
>> Sent: Tuesday, August 30, 2016 5:05:45 PM
>> Subject: Re: possible circular locking dependency detected
>>
>> FYI, this one can only be reproduced using the overlayfs docker backend.
>> The device-mapper works fine. The XFS below has ftype=1.
>>
>> # cp recvmsg01 /mnt
>> # docker run -it -v /mnt/:/mnt/ rhel7 bash
>> [root@c33c99aedd93 /]# mount
>> overlay on / type overlay
>> (rw,relatime,seclabel,lowerdir=l/I5VXL74ENBNAEARZ4M2SIN3XD6:l/KZGBKPXLDXUGHYWMERFUBM4FRP,upperdir=9a7c1f735166b1f63d220b4b6c59cc37f3922719ef810c97182b814c1ab336df/diff,workdir=9a7c1f735166b1f63d220b4b6c59cc37f3922719ef810c97182b814c1ab336df/work)
>> ...
>> [root@c33c99aedd93 /]# /mnt/recvmsg01 >> CAI Qian >> >> - Original Message - >> > From: "CAI Qian" >> > To: secur...@kernel.org >> > Sent: Friday, August 26, 2016 10:50:57 AM >> > Subject: possible circular locking dependency detected >> > >> > FYI, just want to give a head up to see if there is anything obvious so >> > we can avoid a possible DoS somehow. >> > >> > Running the LTP syscalls tests inside a container until this test trigger >> > below, >> > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/recvmsg/recvmsg01.c >> > >> > [ 4441.904103] open04 (42409) used greatest stack depth: 20552 bytes left >> > [ 4605.419167] >> > [ 4605.420831] == >> > [ 4605.427727] [ INFO: possible circular locking dependency detected ] >> > [ 4605.434720] 4.8.0-rc3+ #3 Not tainted >> > [ 4605.438803] --- >> > [ 4605.445796] recvmsg01/42878 is trying to acquire lock: >> > [ 4605.451528] (sb_writers#8){.+.+.+}, at: [] >> > __sb_start_write+0xb4/0xf0 >> > [ 4605.460642] >> > [ 4605.460642] but task is already holding lock: >> > [ 4605.467150] (>readlock){+.+.+.}, at: [] >> > unix_bind+0x299/0xdf0 >> > [ 4605.475749] >> > [ 4605.475749] which lock already depends on the new lock. 
>> > [ 4605.475749] >> > [ 4605.484882] >> > [ 4605.484882] the existing dependency chain (in reverse order) is: >> > [ 4605.493234] >> > [ 4605.493234] -> #2 (>readlock){+.+.+.}: >> > [ 4605.497943][] lock_acquire+0x1fa/0x440 >> > [ 4605.504659][] >> > mutex_lock_interruptible_nested+0xdd/0x920 >> > [ 4605.513119][] unix_bind+0x299/0xdf0 >> > [ 4605.519540][] SYSC_bind+0x1d8/0x240 >> > [ 4605.525964][] SyS_bind+0xe/0x10 >> > [ 4605.531998][] do_syscall_64+0x1a6/0x500 >> > [ 4605.538811][] return_from_SYSCALL_64+0x0/0x7a >> > [ 4605.546203] >> > [ 4605.546203] -> #1 (>i_mutex_dir_key#3/1){+.+.+.}: >> > [ 4605.552292][] lock_acquire+0x1fa/0x440 >> > [ 4605.559002][] down_write_nested+0x5e/0xe0 >> > [ 4605.566008][] filename_create+0x155/0x470 >> > [ 4605.573013][] SyS_mkdir+0xaf/0x1f0 >> > [ 4605.579339][] >> > entry_SYSCALL_64_fastpath+0x1f/0xbd >> > [ 4605.587119] >> > [ 4605.587119] -> #0 (sb_writers#8){.+.+.+}: >> > [ 4605.591835][] __lock_acquire+0x3043/0x3dd0 >> > [ 4605.598935][] lock_acquire+0x1fa/0x440 >> > [ 4605.605646][] percpu_down_read+0x4f/0xa0 >> > [ 4605.612552][] __sb_start_write+0xb4/0xf0 >> > [ 4605.619459][] mnt_want_write+0x41/0xb0 >> > [ 4605.626173][] ovl_want_write+0x76/0xa0 >> > [overlay] >> > [ 4605.633860][] ovl_create_object+0xa3/0x2d0 >> > [overlay] >> > [
Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released
From: David Howells
Date: Wed, 31 Aug 2016 21:25:46 +0100

> Is there a 1/2 somewhere? I don't see it.

It was an NFSv4 patch.
Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver
On Wed, Aug 31, 2016 at 10:11 AM, Timur Tabi wrote:
> Rob Herring wrote:
>
>>> + internal-phy = <_sgmii>;
>>
>> Can't this use the standard generic phy binding (i.e. 'phys'). It's a
>> bit confusing as there's the ethernet phy binding (phy-handle) and the
>> generic one.
>
> It's not a generic phy. It's a funky "internal phy" that differs among
> SOCs. I call it the internal phy, but I could use another name. Internally,
> some people call it the "sgmii phy", but I don't think that's accurate.

Funky internal PHYs are precisely the types of PHYs this binding is for. It is generic in that the type is not defined. It can be USB, HDMI, DSI, LVDS, etc.

> I can call it "emac-phy", but I don't know if that's any better.
>
>>> + phy-handle = <>;
>>
>> This is bit redundant as the phy is the child node. I guess if you had
>> multiple devices on the mdio bus you would need it. I'd drop it if you
>> don't envision needing it and the kernel doesn't require it.
>
> That's what I thought to, but without it, of_phy_find_device() won't work.
> I need a pointer to the phy node, and I use of_parse_phandle() to get it:
>
> struct device_node *phy_np;
>
> ret = of_mdiobus_register(mii_bus, np);
> if (ret) {
> dev_err(>dev, "could not register mdio bus\n");
> return ret;
> }
>
> phy_np = of_parse_phandle(np, "phy-handle", 0);

You can just as easily find the child node called ethernet-phy.

> adpt->phydev = of_phy_find_device(phy_np);
>
>>> +
>>> + #address-cells = <1>;
>>> + #size-cells = <0>;
>>> + phy0: ethernet-phy@0 {
>>
>> It's just an example, but don't we require compatible strings for phys
>> now?
>
> Nope. I had a compatible property, but it broke of_mdiobus_child_is_phy().
> I don't want to specify why kind of phy it is. I want to let phylib figure
> it out.

Okay, I'll defer to the mdio folks.

Rob
Re: [PATCH] softirq: let ksoftirqd do its job
On Wed, 2016-08-31 at 21:40 +0200, Jesper Dangaard Brouer wrote:

> I can confirm the improvement of approx 900Kpps (no wonder people have
> been complaining about DoS against UDP/DNS servers).
>
> BUT during my extensive testing, of this patch, I also think that we
> have not gotten to the bottom of this. I was expecting to see a higher
> (collective) PPS number as I add more UDP servers, but I don't.
>
> Running many UDP netperf's with command:
> super_netperf 4 -H 198.18.50.3 -l 120 -t UDP_STREAM -T 0,0 -- -m 1472 -n -N

Are you sure the sender can send fast enough?

> With 'top' I can see ksoftirq are still getting a higher %CPU time:
>
>   PID %CPU   TIME+ COMMAND
>     3 36.5 2:28.98 ksoftirqd/0
> 10724  9.6 0:01.05 netserver
> 10722  9.3 0:01.05 netserver
> 10723  9.3 0:01.05 netserver
> 10725  9.3 0:01.05 netserver

Looks much better on my machine, with "udprcv -n 4" (using 4 threads, and 4 sockets using SO_REUSEPORT):

10755 root 20 0 34948 4 0 S 79.7 0.0 0:33.66 udprcv
    3 root 20 0     0 0 0 R 19.9 0.0 0:25.49 ksoftirqd/0

Pressing 'H' in top gives:

    3 root 20 0     0 0 0 R 19.9 0.0 0:47.84 ksoftirqd/0
10756 root 20 0 34948 4 0 R 19.9 0.0 0:30.76 udprcv
10757 root 20 0 34948 4 0 R 19.9 0.0 0:30.76 udprcv
10758 root 20 0 34948 4 0 S 19.9 0.0 0:30.76 udprcv
10759 root 20 0 34948 4 0 S 19.9 0.0 0:30.76 udprcv

Patch was on top of commit 071e31e254e0e0c438eecba3dba1d6e2d0da36c2

> > Since the load runs in well identified threads context, an admin can
> > more easily tune process scheduling parameters if needed.
>
> With this patch applied, I found that changing the UDP server process,
> scheduler policy to SCHED_RR or SCHED_FIFO gave me a performance boost
> from 900Kpps to 1.7Mpps, and not a single UDP packet dropped (even with
> a single UDP stream, also tested with more)
>
> Command used:
> sudo chrt --rr -p 20 $(pgrep netserver)

Sure, this is what I mentioned in my changelog: once we properly schedule and rely on ksoftirqd, tuning is available.

> The scheduling picture also change a lot:
>
>   PID %CPU   TIME+ COMMAND
> 10783 24.3 0:21.53 netserver
> 10784 24.3 0:21.53 netserver
> 10785 24.3 0:21.52 netserver
> 10786 24.3 0:21.50 netserver
>     3  2.7 3:12.18 ksoftirqd/0
Re: [RESEND PATCH 3/4] arm64: dts: rockchip: support gmac for rk3399
Caesar,

On Tue, Aug 30, 2016 at 11:13 PM, Caesar Wang wrote:
> This patch adds needed gamc information for rk3399,
> also support the gmac pd.
>
> Signed-off-by: Roger Chen
> Signed-off-by: Caesar Wang
> ---
>
> arch/arm64/boot/dts/rockchip/rk3399.dtsi | 90
> 1 file changed, 90 insertions(+)

I noticed that your subject for this patch contains "RESEND" and not "v2" even though there are changes between this version and the last one. That's really confusing. This should have been "v2" and the next version should be "v3".

> diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> index 32aebc8..abf27a4 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
> @@ -200,6 +200,26 @@
> };
> };
>
> + gmac: eth@fe30 {

nit: on rk3288 the node was "ethernet@" instead of "eth@". Presumably "ethernet" is more correct?

> + compatible = "rockchip,rk3399-gmac";
> + reg = <0x0 0xfe30 0x0 0x1>;
> + interrupts = ;
> + interrupt-names = "macirq";
> + clocks = < SCLK_MAC>, < SCLK_MAC_RX>,
> +< SCLK_MAC_TX>, < SCLK_MACREF>,
> +< SCLK_MACREF_OUT>, < ACLK_GMAC>,
> +< PCLK_GMAC>;
> + clock-names = "stmmaceth", "mac_clk_rx",
> + "mac_clk_tx", "clk_mac_ref",
> + "clk_mac_refout", "aclk_mac",
> + "pclk_mac";
> + power-domains = < RK3399_PD_GMAC>;
> + resets = < SRST_A_GMAC>;
> + reset-names = "stmmaceth";
> + rockchip,grf = <>;
> + status = "disabled";
> + };
> +
> sdio0: dwmmc@fe31 {
> compatible = "rockchip,rk3399-dw-mshc",
> "rockchip,rk3288-dw-mshc";
> @@ -611,6 +631,11 @@
> status = "disabled";
> };
>
> + qos_gmac: qos@ffa5c000 {
> + compatible = "syscon";
> + reg = <0x0 0xffa5c000 0x0 0x20>;
> + };
> +
> qos_hdcp: qos@ffa9 {
> compatible = "syscon";
> reg = <0x0 0xffa9 0x0 0x20>;
> @@ -704,6 +729,11 @@
> #size-cells = <0>;
>
> /* These power domains are grouped by VD_CENTER */
> + pd_gmac@RK3399_PD_GMAC {

RK3399_PD_GMAC is not in VD_CENTER but in VD_LOGIC, right? ...so this should move.

> + reg = ;
> + clocks = < ACLK_GMAC>;
> + pm_qos = <_gmac>;
> + };

IMHO it would be nice if this were broken into two patches.

1. First patch would be the power domain patch and that could land any time. You wouldn't actually be able to use the gmac but at least you'd be able to turn off its power. This would be a handy patch to be able to backport if you happened to not need Ethernet support but wanted to save power.

2. Second patch would actually add the gmac.
Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released
On Wednesday, August 31, 2016 9:26:21 PM CEST David Howells wrote:
> Arnd Bergmann wrote:
>
> > + } else {
> > + sched = 0;
>
> That should be false, not 0, btw.

Right, sorry about that. Do you want me to resend the fixed version, or do you apply and fix it yourself?

As patch 1/2 isn't actually meant for net-next anyway, the series doesn't need to stay together.

Arnd
Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released
On Wednesday, August 31, 2016 9:25:46 PM CEST David Howells wrote:
> Is there a 1/2 somewhere? I don't see it.
>
> David

Sorry, mixed up the Cc list. It only went to netdev and lkml and isn't really related.

That one was a workaround for a false-positive -Wmaybe-uninitialized warning in NFS, see https://lkml.org/lkml/2016/8/31/412

Arnd
Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released
Is there a 1/2 somewhere? I don't see it. David
Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released
Arnd Bergmann wrote:

> + } else {
> + sched = 0;

That should be false, not 0, btw.

David
Re: [PATCH 0/6] constify ethtool_ops structures
On Wed, 31 Aug 2016, Stephen Hemminger wrote:

> On Wed, 31 Aug 2016 09:30:42 +0200
> Julia Lawall wrote:
>
> > Constify ethtool_ops structures.
> >
> > ---
> >
> > drivers/net/ethernet/agere/et131x.c | 2 +-
> > drivers/net/ethernet/broadcom/bcmsysport.c | 2 +-
> > drivers/net/ethernet/broadcom/genet/bcmgenet.c | 2 +-
> > drivers/net/ethernet/hisilicon/hip04_eth.c | 2 +-
> > drivers/net/ethernet/hisilicon/hisi_femac.c | 2 +-
> > drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 2 +-
> > drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 2 +-
> > drivers/net/ethernet/synopsys/dwc_eth_qos.c | 2 +-
> > drivers/staging/slicoss/slicoss.c | 4 ++--
> > 9 files changed, 10 insertions(+), 10 deletions(-)
> > ___
> > devel mailing list
> > de...@linuxdriverproject.org
> > http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
>
> Other drivers with same type of issue
>
> drivers/net/ethernet/mediatek/mtk_eth_soc.c:static struct ethtool_ops mtk_ethtool_ops = {
> drivers/net/ethernet/synopsys/dwc_eth_qos.c:static struct ethtool_ops dwceqos_ethtool_ops = {
> drivers/net/ethernet/xilinx/xilinx_axienet_main.c:static struct ethtool_ops axienet_ethtool_ops = {
> drivers/net/usb/r8152.c:static struct ethtool_ops ops = {
> drivers/staging/netlogic/xlr_net.c:static struct ethtool_ops xlr_ethtool_ops = {

Thanks. Probably they don't compile for x86, or at least not with make allyesconfig. I can check on them.

julia
Re: [PATCH] softirq: let ksoftirqd do its job
On Wed, 31 Aug 2016 10:42:29 -0700 Eric Dumazet wrote:

> From: Eric Dumazet
>
> A while back, Paolo and Hannes sent an RFC patch adding threaded-able
> napi poll loop support : (https://patchwork.ozlabs.org/patch/620657/)
>
> The problem seems to be that softirqs are very aggressive and are often
> handled by the current process, even if we are under stress and that
> ksoftirqd was scheduled, so that innocent threads would have more chance
> to make progress.
>
> This patch makes sure that if ksoftirq is running, we let it
> perform the softirq work.
>
> Jonathan Corbet summarized the issue in https://lwn.net/Articles/687617/
>
> Tested:
>
> - NIC receiving traffic handled by CPU 0
> - UDP receiver running on CPU 0, using a single UDP socket.
> - Incoming flood of UDP packets targeting the UDP socket.
>
> Before the patch, the UDP receiver could almost never get cpu cycles and
> could only receive ~2,000 packets per second.
>
> After the patch, cpu cycles are split 50/50 between user application and
> ksoftirqd/0, and we can effectively read ~900,000 packets per second,
> a huge improvement in DOS situation. (Note that more packets are now
> dropped by the NIC itself, since the BH handlers get less cpu cycles to
> drain RX ring buffer)

I can confirm the improvement of approx 900Kpps (no wonder people have been complaining about DoS against UDP/DNS servers).

BUT during my extensive testing of this patch, I also think that we have not gotten to the bottom of this. I was expecting to see a higher (collective) PPS number as I add more UDP servers, but I don't.

Running many UDP netperf's with command:
 super_netperf 4 -H 198.18.50.3 -l 120 -t UDP_STREAM -T 0,0 -- -m 1472 -n -N

With 'top' I can see ksoftirq are still getting a higher %CPU time:

  PID %CPU   TIME+ COMMAND
    3 36.5 2:28.98 ksoftirqd/0
10724  9.6 0:01.05 netserver
10722  9.3 0:01.05 netserver
10723  9.3 0:01.05 netserver
10725  9.3 0:01.05 netserver

> Since the load runs in well identified threads context, an admin can
> more easily tune process scheduling parameters if needed.

With this patch applied, I found that changing the UDP server process scheduler policy to SCHED_RR or SCHED_FIFO gave me a performance boost from 900Kpps to 1.7Mpps, and not a single UDP packet dropped (even with a single UDP stream, also tested with more).

Command used:
 sudo chrt --rr -p 20 $(pgrep netserver)

The scheduling picture also changes a lot:

  PID %CPU   TIME+ COMMAND
10783 24.3 0:21.53 netserver
10784 24.3 0:21.53 netserver
10785 24.3 0:21.52 netserver
10786 24.3 0:21.50 netserver
    3  2.7 3:12.18 ksoftirqd/0

> Reported-by: Paolo Abeni
> Reported-by: Hannes Frederic Sowa
> Signed-off-by: Eric Dumazet
> Cc: David Miller
> Cc: Jesper Dangaard Brouer
> Cc: Peter Zijlstra
> Cc: Rik van Riel
> ---
> kernel/softirq.c | 16 +++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 17caf4b63342..8ed90e3a88d6 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -78,6 +78,17 @@ static void wakeup_softirqd(void)
> }
>
> /*
> + * If ksoftirqd is scheduled, we do not want to process pending softirqs
> + * right now. Let ksoftirqd handle this at its own rate, to get fairness.
> + */
> +static bool ksoftirqd_running(void)
> +{
> + struct task_struct *tsk = __this_cpu_read(ksoftirqd);
> +
> + return tsk && (tsk->state == TASK_RUNNING);
> +}
> +
> +/*
> * preempt_count and SOFTIRQ_OFFSET usage:
> * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
> * softirq processing.
> @@ -313,7 +324,7 @@ asmlinkage __visible void do_softirq(void)
>
> pending = local_softirq_pending();
>
> - if (pending)
> + if (pending && !ksoftirqd_running())
> do_softirq_own_stack();
>
> local_irq_restore(flags);
> @@ -340,6 +351,9 @@ void irq_enter(void)
>
> static inline void invoke_softirq(void)
> {
> + if (ksoftirqd_running())
> + return;
> +
> if (!force_irqthreads) {
> #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
> /*

--
Best regards,
 Jesper Dangaard Brouer
 MSc.CS, Principal Kernel Engineer at Red Hat
 Author of http://www.iptv-analyzer.org
 LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH 2/2] rxrpc: fix undefined behavior in rxrpc_mark_call_released
On Wednesday, August 31, 2016 6:39:04 PM CEST David Howells wrote:
> Arnd Bergmann wrote:
>
> > gcc -Wmaybe-initialized correctly points out a newly introduced bug
> > through which we can end up calling rxrpc_queue_call() for a dead
> > connection:
>
> How do you turn that on from within the Kbuild system?

You don't, my mistake. My build bot runs with 6e8d666e9253 ("Disable "maybe-uninitialized" warning globally") disabled, and I had assumed that Linus left the warning enabled with "make W=1", but that was incorrect as Trond Myklebust also pointed out.

You still get the warning with "make EXTRA_CFLAGS=-Wmaybe-uninitialized", which of course nobody normally does.

I'll try to come up with a patch to enable the warning in the W=1 level in the same conditions that used to be enabled up to v4.7.

Arnd
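For readers unfamiliar with this class of warning, the following is a small illustrative sketch of the pattern that -Wmaybe-uninitialized is meant to catch. It is not the rxrpc code: the enum, the function name and the states are all hypothetical, chosen only to show a flag that must be assigned on every branch before it is read.

```c
#include <stdbool.h>

/* Hypothetical states, not taken from the rxrpc sources. */
enum demo_state { DEMO_LIVE, DEMO_IDLE, DEMO_DEAD };

/* Decide whether some work item should be queued for the given state. */
static bool demo_should_queue(enum demo_state state)
{
    bool sched;

    if (state == DEMO_LIVE) {
        sched = true;
    } else if (state == DEMO_IDLE) {
        sched = false;
    } else {
        /*
         * Without this assignment, sched would be read uninitialized
         * on this path, which is the kind of bug the warning reports.
         */
        sched = false;
    }

    return sched;
}
```

If the final else branch is removed, the return statement may read an indeterminate value. A compiler invoked with -Wmaybe-uninitialized can flag that, although whether it actually does depends on the optimization level and compiler version.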
Re: possible circular locking dependency detected (bisected)
Reverted the patch below fixes this problem. c845acb324aa85a39650a14e7696982ceea75dc1 af_unix: Fix splice-bind deadlock CAI Qian - Original Message - > From: "CAI Qian"> To: secur...@kernel.org > Cc: "Miklos Szeredi" , "Eric Sandeen" > > Sent: Tuesday, August 30, 2016 5:05:45 PM > Subject: Re: possible circular locking dependency detected > > FYI, this one can only be reproduced using the overlayfs docker backend. > The device-mapper works fine. The XFS below has ftype=1. > > # cp recvmsg01 /mnt > # docker run -it -v /mnt/:/mnt/ rhel7 bash > [root@c33c99aedd93 /]# mount > overlay on / type overlay > (rw,relatime,seclabel,lowerdir=l/I5VXL74ENBNAEARZ4M2SIN3XD6:l/KZGBKPXLDXUGHYWMERFUBM4FRP,upperdir=9a7c1f735166b1f63d220b4b6c59cc37f3922719ef810c97182b814c1ab336df/diff,workdir=9a7c1f735166b1f63d220b4b6c59cc37f3922719ef810c97182b814c1ab336df/work) > ... > [root@c33c99aedd93 /]# /mnt/recvmsg01 > CAI Qian > > - Original Message - > > From: "CAI Qian" > > To: secur...@kernel.org > > Sent: Friday, August 26, 2016 10:50:57 AM > > Subject: possible circular locking dependency detected > > > > FYI, just want to give a head up to see if there is anything obvious so > > we can avoid a possible DoS somehow. 
> > > > Running the LTP syscalls tests inside a container until this test trigger > > below, > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/recvmsg/recvmsg01.c > > > > [ 4441.904103] open04 (42409) used greatest stack depth: 20552 bytes left > > [ 4605.419167] > > [ 4605.420831] == > > [ 4605.427727] [ INFO: possible circular locking dependency detected ] > > [ 4605.434720] 4.8.0-rc3+ #3 Not tainted > > [ 4605.438803] --- > > [ 4605.445796] recvmsg01/42878 is trying to acquire lock: > > [ 4605.451528] (sb_writers#8){.+.+.+}, at: [] > > __sb_start_write+0xb4/0xf0 > > [ 4605.460642] > > [ 4605.460642] but task is already holding lock: > > [ 4605.467150] (>readlock){+.+.+.}, at: [] > > unix_bind+0x299/0xdf0 > > [ 4605.475749] > > [ 4605.475749] which lock already depends on the new lock. > > [ 4605.475749] > > [ 4605.484882] > > [ 4605.484882] the existing dependency chain (in reverse order) is: > > [ 4605.493234] > > [ 4605.493234] -> #2 (>readlock){+.+.+.}: > > [ 4605.497943][] lock_acquire+0x1fa/0x440 > > [ 4605.504659][] > > mutex_lock_interruptible_nested+0xdd/0x920 > > [ 4605.513119][] unix_bind+0x299/0xdf0 > > [ 4605.519540][] SYSC_bind+0x1d8/0x240 > > [ 4605.525964][] SyS_bind+0xe/0x10 > > [ 4605.531998][] do_syscall_64+0x1a6/0x500 > > [ 4605.538811][] return_from_SYSCALL_64+0x0/0x7a > > [ 4605.546203] > > [ 4605.546203] -> #1 (>i_mutex_dir_key#3/1){+.+.+.}: > > [ 4605.552292][] lock_acquire+0x1fa/0x440 > > [ 4605.559002][] down_write_nested+0x5e/0xe0 > > [ 4605.566008][] filename_create+0x155/0x470 > > [ 4605.573013][] SyS_mkdir+0xaf/0x1f0 > > [ 4605.579339][] > > entry_SYSCALL_64_fastpath+0x1f/0xbd > > [ 4605.587119] > > [ 4605.587119] -> #0 (sb_writers#8){.+.+.+}: > > [ 4605.591835][] __lock_acquire+0x3043/0x3dd0 > > [ 4605.598935][] lock_acquire+0x1fa/0x440 > > [ 4605.605646][] percpu_down_read+0x4f/0xa0 > > [ 4605.612552][] __sb_start_write+0xb4/0xf0 > > [ 4605.619459][] mnt_want_write+0x41/0xb0 > > [ 4605.626173][] 
ovl_want_write+0x76/0xa0 > > [overlay] > > [ 4605.633860][] ovl_create_object+0xa3/0x2d0 > > [overlay] > > [ 4605.641942][] ovl_mknod+0x31/0x40 [overlay] > > [ 4605.649138][] vfs_mknod+0x34b/0x560 > > [ 4605.655570][] unix_bind+0x4ca/0xdf0 > > [ 4605.661991][] SYSC_bind+0x1d8/0x240 > > [ 4605.668412][] SyS_bind+0xe/0x10 > > [ 4605.674456][] do_syscall_64+0x1a6/0x500 > > [ 4605.681266][] return_from_SYSCALL_64+0x0/0x7a > > [ 4605.688657] > > [ 4605.688657] other info that might help us debug this: > > [ 4605.688657] > > [ 4605.697590] Chain exists of: > > [ 4605.697590] sb_writers#8 --> >i_mutex_dir_key#3/1 --> > > >readlock > > [ 4605.697590] > > [ 4605.707287] Possible unsafe locking scenario: > > [ 4605.707287] > > [ 4605.713890]CPU0CPU1 > > [ 4605.718943] > > [ 4605.723995] lock(>readlock); > > [ 4605.727708] > > lock(>i_mutex_dir_key#3/1); > > [ 4605.735613]lock(>readlock); > > [ 4605.742146] lock(sb_writers#8); > > [ 4605.745880] > > [ 4605.745880] *** DEADLOCK *** > > [ 4605.745880] > > [ 4605.752486] 3 locks held by recvmsg01/42878: > > [ 4605.757247] #0: (sb_writers#13){.+.+.+}, at: [] > > __sb_start_write+0xb4/0xf0 > > [ 4605.766930] #1: (>s_type->i_mutex_key#16/1){+.+.+.}, at: > > [] filename_create+0x155/0x470 > > [
Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver
Florian Fainelli wrote:
> if these are truly 64-bits stats, how come you are using a single
> readl_* to access them? Or is the u64 rx_err_addr just used as
> temporary storage and aligned to the largest size you need to deal with?

"*stats_itr += val;" takes the 32-bit val, zero-extends it to 64 bits, and then adds that to the corresponding 64-bit field in emac_stats.

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
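For readers following the thread, the accumulation described here can be sketched in plain userspace C. This is only an illustration under stated assumptions: demo_stats, demo_accumulate and the DEMO_* indices are hypothetical names, an array of uint32_t stands in for the values returned by readl_relaxed(), and the counters are kept in an array rather than as named struct members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter indices, listed in the same order as the
 * (simulated) 32-bit hardware registers they mirror. */
enum { DEMO_RX_OK, DEMO_RX_BCAST, DEMO_RX_MCAST, DEMO_NUM_STATS };

struct demo_stats {
	uint64_t counter[DEMO_NUM_STATS];
};

/* Fold one snapshot of the 32-bit registers into the 64-bit counters. */
static void demo_accumulate(struct demo_stats *stats,
			    const uint32_t regs[DEMO_NUM_STATS])
{
	uint64_t *stats_itr = stats->counter;
	size_t i;

	assert(stats != NULL && regs != NULL);

	for (i = 0; i < DEMO_NUM_STATS; i++) {
		uint32_t val = regs[i];	/* stands in for readl_relaxed() */

		*stats_itr += val;	/* val is zero-extended to 64 bits */
		stats_itr++;
	}
}
```

Because the iterator simply advances one 64-bit slot per register, every slot has to exist even if nothing else reads it, which seems to be why the apparently unused fields cannot be dropped without shifting the mapping.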
Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver
On 08/31/2016 11:57 AM, Timur Tabi wrote:
> Timur Tabi wrote:
>>
>>> Seems that there are several unused members in the emac_stats struct:
>>> +struct emac_stats {
>>> ...
>>> ...
>>> Both rx_bcast_byte_cnt and rx_mcast_byte_cnt are not used anywhere:
>>> +	u64 rx_bcast_byte_cnt; /* broadcast packets byte count (without FCS) */
>>> +	u64 rx_mcast_byte_cnt; /* multicast packets byte count (without FCS) */
>>> ...
>>> rx_err_addr is not used
>>> +	u64 rx_err_addr; /* packets dropped due to address filtering */
>>
>> I'll go through the structure and remove the unused fields.
>
> It turns out I cannot actually strip out those "unused" fields. They
> are all indirectly used in emac_get_stats64:
>
> 	u64 *stats_itr = &adpt->stats.rx_ok;
>
> 	while (addr <= REG_MAC_RX_STATUS_END) {
> 		val = readl_relaxed(adpt->base + addr);
> 		*stats_itr += val;
> 		stats_itr++;
> 		addr += sizeof(u32);
> 	}

if these are truly 64-bits stats, how come you are using a single readl_* to access them? Or is the u64 rx_err_addr just used as temporary storage and aligned to the largest size you need to deal with?

--
Florian
Re: [PATCH] [v9] net: emac: emac gigabit ethernet controller driver
Timur Tabi wrote:
>> Seems that there are several unused members in the emac_stats struct:
>> +struct emac_stats {
>> ...
>> ...
>> Both rx_bcast_byte_cnt and rx_mcast_byte_cnt are not used anywhere:
>> +	u64 rx_bcast_byte_cnt; /* broadcast packets byte count (without FCS) */
>> +	u64 rx_mcast_byte_cnt; /* multicast packets byte count (without FCS) */
>> ...
>> rx_err_addr is not used
>> +	u64 rx_err_addr; /* packets dropped due to address filtering */
>
> I'll go through the structure and remove the unused fields.

It turns out I cannot actually strip out those "unused" fields. They are all indirectly used in emac_get_stats64:

	u64 *stats_itr = &adpt->stats.rx_ok;

	while (addr <= REG_MAC_RX_STATUS_END) {
		val = readl_relaxed(adpt->base + addr);
		*stats_itr += val;
		stats_itr++;
		addr += sizeof(u32);
	}

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH net-next V4 4/4] net/sched: Introduce act_tunnel_key
On Wed, Aug 31, 2016 at 5:46 AM, Hadar Hen Zion wrote:
>
> From: Amir Vadai
>
> This action could be used before redirecting packets to a shared tunnel
> device, or when redirecting packets arriving from such a device.
>
> +
> +struct tcf_tunnel_key_params {
> +	struct rcu_head rcu;
> +	int tcft_action;

Also add "int action;" (see why later)

> +	struct metadata_dst *tcft_enc_metadata;
> +};
> +
> +
> +static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
> +			  struct tcf_result *res)
> +{
> +	struct tcf_tunnel_key *t = to_tunnel_key(a);
> +	struct tcf_tunnel_key_params *params;
> +	int action;
> +
> +	rcu_read_lock();
> +
> +	params = rcu_dereference(t->params);
> +
> +	tcf_lastuse_update(&t->tcf_tm);
> +	bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
> +	action = t->tcf_action;

Ideally, you should read params->action instead of t->tcf_action to be completely clean.

> +
> +	switch (params->tcft_action) {
> +	case TCA_TUNNEL_KEY_ACT_RELEASE:
> +		skb_dst_drop(skb);
> +		break;
> +	case TCA_TUNNEL_KEY_ACT_SET:
> +		skb_dst_drop(skb);
> +		skb_dst_set(skb, dst_clone(&params->tcft_enc_metadata->dst));
> +		break;
> +	default:
> +		WARN_ONCE(1, "Bad tunnel_key action.\n");
> +		break;
> +	}
> +
> +	rcu_read_unlock();
> +
> +	return action;
> +}
>
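The point of the second review comment can be illustrated outside the kernel. The sketch below is an assumption-laden userspace analogue, not the patch itself: C11 atomics stand in for rcu_dereference() and rcu_assign_pointer(), and demo_params, demo_tunnel_key, demo_act and demo_update are hypothetical names. It shows the verdict and the tunnel action being read from one parameter snapshot, so a concurrent update cannot pair an old value with a new one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { DEMO_ACT_SET = 1, DEMO_ACT_RELEASE = 2 };

/* Hypothetical parameter block, replaced as a whole on reconfiguration. */
struct demo_params {
	int tunnel_action;	/* what to do with the tunnel metadata */
	int action;		/* verdict returned to the caller */
};

struct demo_tunnel_key {
	_Atomic(struct demo_params *) params;
};

/* Userspace stand-in for rcu_dereference(): one acquire load. */
static const struct demo_params *demo_deref(struct demo_tunnel_key *t)
{
	return atomic_load_explicit(&t->params, memory_order_acquire);
}

/* Both fields come from the same snapshot, so they always match. */
static int demo_act(struct demo_tunnel_key *t, int *tunnel_action_out)
{
	const struct demo_params *p = demo_deref(t);

	assert(p != NULL);
	*tunnel_action_out = p->tunnel_action;
	return p->action;
}

/* Publish a fully initialised replacement block
 * (stand-in for rcu_assign_pointer()). */
static void demo_update(struct demo_tunnel_key *t, struct demo_params *newp)
{
	atomic_store_explicit(&t->params, newp, memory_order_release);
}
```

Freeing the old block safely is what the rcu_head in the real structure is for; this sketch leaves that part out.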
Re: [PATCH RFC 4/4] xfs: Transmit flow steering
On 08/30/2016 08:00 PM, Tom Herbert wrote:
> XFS maintains a per device flow table that is indexed by the skbuff
> hash. The XFS table is only consulted when there is no queue saved in
> a transmit socket for an skbuff.
>
> Each entry in the flow table contains a queue index and a queue
> pointer. The queue pointer is set when a queue is chosen using a
> flow table entry. This pointer is set to the head pointer in the
> transmit queue (which is maintained by BQL).
>
> The new function get_xfs_index looks up flows in the XPS table.
> The entry returned gives the last queue a matching flow used. The
> returned queue is compared against the normal XPS queue. If they
> are different, then we only switch if the tail pointer in the TX
> queue has advanced past the pointer saved in the entry. In this
> way OOO should be avoided when XPS wants to use a different queue.

I'd love for Dave Chinner to get some networking bug reports, but maybe we shouldn't call it XFS?

At least CONFIG_XFS should be something else. It doesn't conflict now because we have CONFIG_XFS_FS, but even CONFIG_XFS_NET sounds like it's related to the filesystem instead of transmit flows.

[ Sorry, four patches in and all I do is complain about the name ]

-chris

> Signed-off-by: Tom Herbert
> ---
>  net/Kconfig    | 6
>  net/core/dev.c | 93 --
>  2 files changed, 84 insertions(+), 15 deletions(-)
>
> diff --git a/net/Kconfig b/net/Kconfig
> index 7b6cd34..5e3eddf 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -255,6 +255,12 @@ config XPS
>  	depends on SMP
>  	default y
>
> +config XFS
> +	bool
> +	depends on XPS
> +	depends on BQL
> +	default y
> +
>  config HWBM
>  	bool

...

> -static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb)
> +/* Must be called with RCU read_lock */
> +static int get_xfs_index(struct net_device *dev, struct sk_buff *skb)
>  {
> -	struct sock *sk = skb->sk;
> -	int queue_index = sk_tx_queue_get(sk);
> +#ifdef CONFIG_XFS
> +	struct xps_dev_flow_table *flow_table;
> +	struct xps_dev_flow ent;
> +	int queue_index;
> +	struct netdev_queue *txq;
> +	u32 hash;
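As a reading aid, the queue-switch rule in the commit message might look roughly like the userspace sketch below. Everything here is an assumption made for illustration: demo_txq, demo_flow and demo_pick_queue are hypothetical names, the head and tail pointers are modelled as free-running 32-bit counters, and "advanced past" is taken to mean strictly greater than the saved value. The real patch body is cut off above and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue counters: head_cnt counts packets queued,
 * tail_cnt counts packets completed (as BQL would track them). */
struct demo_txq {
	uint32_t head_cnt;
	uint32_t tail_cnt;
};

/* Hypothetical flow-table entry: last queue used and the head
 * counter of that queue when it was chosen. */
struct demo_flow {
	int queue_index;	/* -1 means no queue recorded yet */
	uint32_t queue_ptr;
};

/* Return the queue to transmit on, given the queue XPS would like.
 * Stay on the old queue while packets of this flow may still be in
 * flight there, so packets are not reordered. */
static int demo_pick_queue(struct demo_flow *ent, const struct demo_txq *txqs,
			   int xps_queue)
{
	int old = ent->queue_index;

	assert(ent != NULL && txqs != NULL && xps_queue >= 0);

	if (old >= 0 && old != xps_queue) {
		/* Signed difference keeps the comparison wrap-safe. */
		int32_t done = (int32_t)(txqs[old].tail_cnt - ent->queue_ptr);

		if (done <= 0)
			xps_queue = old;	/* not drained yet: do not switch */
	}

	ent->queue_index = xps_queue;
	ent->queue_ptr = txqs[xps_queue].head_cnt;
	return xps_queue;
}
```

The caller is expected to bump head_cnt when it actually queues the packet; the sketch only covers the selection step.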
Re: [PATCH] net/ethernet: Use ether_addr_copy rather than memcpy
On Wed, 2016-08-31 at 11:32 -0700, Eric Dumazet wrote:
> On Wed, 2016-08-31 at 09:32 -0700, Greg Rose wrote:
> > I'm not sure why this hasn't been done before because it seems obvious,
> > so maybe there is some reason that memcpy is used instead of
> > ether_addr_copy in this code. But let's try this anyway.
> >
> > Change memcpy to ether_addr_copy.
>
> ...
>
> > @@ -211,7 +211,7 @@ EXPORT_SYMBOL(eth_type_trans);
> >  int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr)
> >  {
> >  	const struct ethhdr *eth = eth_hdr(skb);
> > -	memcpy(haddr, eth->h_source, ETH_ALEN);
> > +	ether_addr_copy(haddr, eth->h_source);
>
> Please carefully read ether_addr_copy() comments.
>
> Not all arches are like x86

Thanks Eric, Joe set me straight already.

- Greg
Re: [PATCH] net/ethernet: Use ether_addr_copy rather than memcpy
On Wed, 2016-08-31 at 09:32 -0700, Greg Rose wrote:
> I'm not sure why this hasn't been done before because it seems obvious,
> so maybe there is some reason that memcpy is used instead of
> ether_addr_copy in this code. But let's try this anyway.
>
> Change memcpy to ether_addr_copy.

...

> @@ -211,7 +211,7 @@ EXPORT_SYMBOL(eth_type_trans);
>  int eth_header_parse(const struct sk_buff *skb, unsigned char *haddr)
>  {
>  	const struct ethhdr *eth = eth_hdr(skb);
> -	memcpy(haddr, eth->h_source, ETH_ALEN);
> +	ether_addr_copy(haddr, eth->h_source);

Please carefully read ether_addr_copy() comments.

Not all arches are like x86
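The comment being referred to states that both pointers passed to ether_addr_copy() must be aligned to u16. A caller-supplied "unsigned char *haddr" carries no such guarantee, which is presumably the objection. The userspace sketch below illustrates the distinction under stated assumptions: demo_mac, demo_mac_copy, demo_copy_to_bytes and demo_is_u16_aligned are hypothetical names, and a union is used so that the 16-bit copy is well defined in portable C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* A MAC address whose type guarantees 16-bit alignment. */
union demo_mac {
	uint8_t bytes[DEMO_ETH_ALEN];
	uint16_t words[DEMO_ETH_ALEN / 2];
};

/* True when p is suitably aligned for 16-bit accesses. */
static int demo_is_u16_aligned(const void *p)
{
	return ((uintptr_t)p % sizeof(uint16_t)) == 0;
}

/* Word-wise copy: fine here because the union type enforces alignment. */
static void demo_mac_copy(union demo_mac *dst, const union demo_mac *src)
{
	assert(demo_is_u16_aligned(dst) && demo_is_u16_aligned(src));
	dst->words[0] = src->words[0];
	dst->words[1] = src->words[1];
	dst->words[2] = src->words[2];
}

/* Destination is a bare byte pointer of unknown alignment, so copy
 * byte-wise; memcpy() makes no alignment assumption. */
static void demo_copy_to_bytes(uint8_t *haddr, const union demo_mac *src)
{
	memcpy(haddr, src->bytes, DEMO_ETH_ALEN);
}
```

On architectures that trap or penalise unaligned 16-bit accesses, only the second form is safe for a buffer like haddr.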
[PATCH net-next V4 10/10] liquidio: CN23XX firmware download
Add firmware download support for cn23xx device. Signed-off-by: Derek ChicklesSigned-off-by: Satanand Burla Signed-off-by: Felix Manlunas Signed-off-by: Raghu Vatsavayi --- .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 40 +++ .../ethernet/cavium/liquidio/cn23xx_pf_device.h| 2 + drivers/net/ethernet/cavium/liquidio/lio_main.c| 115 - 3 files changed, 111 insertions(+), 46 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c index 2e78101..2d81206 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c @@ -214,6 +214,37 @@ void cn23xx_dump_pf_initialized_regs(struct octeon_device *oct) CVM_CAST64(octeon_read_csr64(oct, CN23XX_SLI_PKT_CNT_INT))); } +static int cn23xx_pf_soft_reset(struct octeon_device *oct) +{ + octeon_write_csr64(oct, CN23XX_WIN_WR_MASK_REG, 0xFF); + + dev_dbg(>pci_dev->dev, "OCTEON[%d]: BIST enabled for CN23XX soft reset\n", + oct->octeon_id); + + octeon_write_csr64(oct, CN23XX_SLI_SCRATCH1, 0x1234ULL); + + /* Initiate chip-wide soft reset */ + lio_pci_readq(oct, CN23XX_RST_SOFT_RST); + lio_pci_writeq(oct, 1, CN23XX_RST_SOFT_RST); + + /* Wait for 100ms as Octeon resets. 
*/ + mdelay(100); + + if (octeon_read_csr64(oct, CN23XX_SLI_SCRATCH1) == 0x1234ULL) { + dev_err(>pci_dev->dev, "OCTEON[%d]: Soft reset failed\n", + oct->octeon_id); + return 1; + } + + dev_dbg(>pci_dev->dev, "OCTEON[%d]: Reset completed\n", + oct->octeon_id); + + /* restore the reset value*/ + octeon_write_csr64(oct, CN23XX_WIN_WR_MASK_REG, 0xFF); + + return 0; +} + static void cn23xx_enable_error_reporting(struct octeon_device *oct) { u32 regval; @@ -1030,6 +1061,7 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device *oct) oct->fn_list.process_interrupt_regs = cn23xx_interrupt_handler; oct->fn_list.msix_interrupt_handler = cn23xx_pf_msix_interrupt_handler; + oct->fn_list.soft_reset = cn23xx_pf_soft_reset; oct->fn_list.setup_device_regs = cn23xx_setup_pf_device_regs; oct->fn_list.enable_interrupt = cn23xx_enable_pf_interrupt; @@ -1129,3 +1161,11 @@ void cn23xx_dump_iq_regs(struct octeon_device *oct) CVM_CAST64(octeon_read_csr64( oct, CN23XX_SLI_S2M_PORTX_CTL(oct->pcie_port; } + +int cn23xx_fw_loaded(struct octeon_device *oct) +{ + u64 val; + + val = octeon_read_csr64(oct, CN23XX_SLI_SCRATCH1); + return (val >> 1) & 1ULL; +} diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h index 36252e7..33b7589 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h @@ -52,4 +52,6 @@ int validate_cn23xx_pf_config_info(struct octeon_device *oct, struct octeon_config *conf23xx); void cn23xx_dump_pf_initialized_regs(struct octeon_device *oct); + +int cn23xx_fw_loaded(struct octeon_device *oct); #endif diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c index 464d42b..866c075 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c @@ -1312,9 +1312,9 @@ static void octeon_destroy_resources(struct octeon_device *oct) 
/* fallthrough */ case OCT_DEV_PCI_MAP_DONE: - /* Soft reset the octeon device before exiting */ - oct->fn_list.soft_reset(oct); + if ((!OCTEON_CN23XX_PF(oct)) || !oct->octeon_id) + oct->fn_list.soft_reset(oct); octeon_unmap_pci_barx(oct, 0); octeon_unmap_pci_barx(oct, 1); @@ -3823,6 +3823,7 @@ static void nic_starter(struct work_struct *work) static int octeon_device_init(struct octeon_device *octeon_dev) { int j, ret; + int fw_loaded = 0; char bootcmd[] = "\n"; struct octeon_device_priv *oct_priv = (struct octeon_device_priv *)octeon_dev->priv; @@ -3844,9 +3845,23 @@ static int octeon_device_init(struct octeon_device *octeon_dev) octeon_dev->app_mode = CVM_DRV_INVALID_APP; - /* Do a soft reset of the Octeon device. */ - if (octeon_dev->fn_list.soft_reset(octeon_dev)) + if (OCTEON_CN23XX_PF(octeon_dev)) { + if (!cn23xx_fw_loaded(octeon_dev)) { + fw_loaded = 0; + /* Do a soft reset of the Octeon device. */ +
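The reset check in cn23xx_pf_soft_reset() relies on a sentinel: a known value is written to a scratch register before the reset, and if that value is still there afterwards the chip evidently did not reset. A minimal userspace sketch of that idea follows, together with the bit test used by cn23xx_fw_loaded(). The names demo_dev, demo_hw_reset, demo_soft_reset and demo_fw_loaded are hypothetical, and the "hardware" is a plain struct, so this only models the control flow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SENTINEL 0x1234ULL

/* Simulated device: one scratch register plus a flag saying whether
 * the simulated reset actually takes effect. */
struct demo_dev {
	uint64_t scratch1;
	int reset_works;
};

/* Stand-in for the chip-wide reset: a real reset clears the scratch
 * register, a failed one leaves it untouched. */
static void demo_hw_reset(struct demo_dev *d)
{
	if (d->reset_works)
		d->scratch1 = 0;
}

/* Returns 0 on success and 1 on failure, like the function in the patch. */
static int demo_soft_reset(struct demo_dev *d)
{
	assert(d != NULL);

	d->scratch1 = DEMO_SENTINEL;	/* plant the sentinel */
	demo_hw_reset(d);

	if (d->scratch1 == DEMO_SENTINEL)
		return 1;		/* sentinel survived: no reset happened */
	return 0;
}

/* Bit 1 of the scratch register flags loaded firmware, mirroring
 * "(val >> 1) & 1ULL" in cn23xx_fw_loaded(). */
static int demo_fw_loaded(uint64_t scratch1)
{
	return (int)((scratch1 >> 1) & 1ULL);
}
```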
[PATCH net-next V4 01/10] liquidio: Consolidate common functionality
Consolidate common functionality of various devices from different files into lio_core.c/octeon_console.c. Signed-off-by: Derek ChicklesSigned-off-by: Satanand Burla Signed-off-by: Felix Manlunas Signed-off-by: Raghu Vatsavayi --- drivers/net/ethernet/cavium/liquidio/Makefile | 23 +- .../net/ethernet/cavium/liquidio/cn66xx_device.c | 31 --- .../net/ethernet/cavium/liquidio/cn66xx_device.h | 1 - .../net/ethernet/cavium/liquidio/cn68xx_device.c | 1 - drivers/net/ethernet/cavium/liquidio/lio_core.c| 261 +++ drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 18 +- drivers/net/ethernet/cavium/liquidio/lio_main.c| 276 + .../net/ethernet/cavium/liquidio/octeon_console.c | 117 - .../net/ethernet/cavium/liquidio/octeon_device.c | 104 .../net/ethernet/cavium/liquidio/octeon_device.h | 1 - drivers/net/ethernet/cavium/liquidio/octeon_main.h | 24 +- .../net/ethernet/cavium/liquidio/octeon_mem_ops.c | 1 - .../net/ethernet/cavium/liquidio/octeon_network.h | 2 - drivers/net/ethernet/cavium/liquidio/octeon_nic.c | 8 +- drivers/net/ethernet/cavium/liquidio/octeon_nic.h | 2 +- 15 files changed, 426 insertions(+), 444 deletions(-) create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_core.c diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile index 2f36680..d44111d 100644 --- a/drivers/net/ethernet/cavium/liquidio/Makefile +++ b/drivers/net/ethernet/cavium/liquidio/Makefile @@ -3,14 +3,15 @@ # obj-$(CONFIG_LIQUIDIO) += liquidio.o -liquidio-objs := lio_main.o \ - lio_ethtool.o \ - request_manager.o \ - response_manager.o \ - octeon_device.o\ - cn66xx_device.o\ - cn68xx_device.o\ - octeon_mem_ops.o \ - octeon_droq.o \ - octeon_console.o \ - octeon_nic.o +liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \ + lio_core.o \ + request_manager.o \ + response_manager.o \ + octeon_device.o\ + cn66xx_device.o\ + cn68xx_device.o\ + octeon_mem_ops.o \ + octeon_droq.o \ + octeon_nic.o + +liquidio-objs := lio_main.o octeon_console.o 
$(liquidio-y) diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c index c03d370..dc5d14a 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c @@ -418,36 +418,6 @@ void lio_cn6xxx_disable_io_queues(struct octeon_device *oct) octeon_write_csr(oct, CN6XXX_SLI_PKT_TIME_INT, d32); } -void lio_cn6xxx_reinit_regs(struct octeon_device *oct) -{ - int i; - - for (i = 0; i < MAX_OCTEON_INSTR_QUEUES(oct); i++) { - if (!(oct->io_qmask.iq & (1ULL << i))) - continue; - oct->fn_list.setup_iq_regs(oct, i); - } - - for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) { - if (!(oct->io_qmask.oq & (1ULL << i))) - continue; - oct->fn_list.setup_oq_regs(oct, i); - } - - oct->fn_list.setup_device_regs(oct); - - oct->fn_list.enable_interrupt(oct->chip); - - oct->fn_list.enable_io_queues(oct); - - /* for (i = 0; i < oct->num_oqs; i++) { */ - for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct); i++) { - if (!(oct->io_qmask.oq & (1ULL << i))) - continue; - writel(oct->droq[i]->max_count, oct->droq[i]->pkts_credit_reg); - } -} - void lio_cn6xxx_bar1_idx_setup(struct octeon_device *oct, u64 core_addr, @@ -714,7 +684,6 @@ int lio_setup_cn66xx_octeon_device(struct octeon_device *oct) oct->fn_list.soft_reset = lio_cn6xxx_soft_reset; oct->fn_list.setup_device_regs = lio_cn6xxx_setup_device_regs; - oct->fn_list.reinit_regs = lio_cn6xxx_reinit_regs; oct->fn_list.update_iq_read_idx = lio_cn6xxx_update_read_index; oct->fn_list.bar1_idx_setup = lio_cn6xxx_bar1_idx_setup; diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h index 28c4722..2e4bc25 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h +++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.h @@ -83,7 +83,6 @@ void lio_cn6xxx_setup_oq_regs(struct octeon_device *oct, u32 oq_no); void lio_cn6xxx_enable_io_queues(struct 
octeon_device *oct); void lio_cn6xxx_disable_io_queues(struct
[PATCH net-next V4 09/10] liquidio: MSIX support for CN23XX
This patch adds support msix interrupt for cn23xx device. Signed-off-by: Derek ChicklesSigned-off-by: Satanand Burla Signed-off-by: Felix Manlunas Signed-off-by: Raghu Vatsavayi --- .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 166 +++-- .../net/ethernet/cavium/liquidio/cn66xx_device.c | 10 +- .../net/ethernet/cavium/liquidio/cn66xx_device.h | 4 +- drivers/net/ethernet/cavium/liquidio/lio_main.c| 269 + .../net/ethernet/cavium/liquidio/octeon_device.c | 39 +++ .../net/ethernet/cavium/liquidio/octeon_device.h | 33 ++- 6 files changed, 452 insertions(+), 69 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c index 7e932a3..2e78101 100644 --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c @@ -567,10 +567,16 @@ static void cn23xx_setup_iq_regs(struct octeon_device *oct, u32 iq_no) */ pkt_in_done = readq(iq->inst_cnt_reg); - /* Clear the count by writing back what we read, but don't -* enable interrupts -*/ - writeq(pkt_in_done, iq->inst_cnt_reg); + if (oct->msix_on) { + /* Set CINT_ENB to enable IQ interrupt */ + writeq((pkt_in_done | CN23XX_INTR_CINT_ENB), + iq->inst_cnt_reg); + } else { + /* Clear the count by writing back what we read, but don't +* enable interrupts +*/ + writeq(pkt_in_done, iq->inst_cnt_reg); + } iq->reset_instr_cnt = 0; } @@ -579,6 +585,9 @@ static void cn23xx_setup_oq_regs(struct octeon_device *oct, u32 oq_no) { u32 reg_val; struct octeon_droq *droq = oct->droq[oq_no]; + struct octeon_cn23xx_pf *cn23xx = (struct octeon_cn23xx_pf *)oct->chip; + u64 time_threshold; + u64 cnt_threshold; oq_no += oct->sriov_info.pf_srn; @@ -595,19 +604,31 @@ static void cn23xx_setup_oq_regs(struct octeon_device *oct, u32 oq_no) droq->pkts_credit_reg = (u8 *)oct->mmio[0].hw_addr + CN23XX_SLI_OQ_PKTS_CREDIT(oq_no); - /* Enable this output queue to generate Packet Timer Interrupt - */ - reg_val = 
octeon_read_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no)); - reg_val |= CN23XX_PKT_OUTPUT_CTL_TENB; - octeon_write_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no), -reg_val); + if (!oct->msix_on) { + /* Enable this output queue to generate Packet Timer Interrupt +*/ + reg_val = + octeon_read_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no)); + reg_val |= CN23XX_PKT_OUTPUT_CTL_TENB; + octeon_write_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no), +reg_val); - /* Enable this output queue to generate Packet Count Interrupt - */ - reg_val = octeon_read_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no)); - reg_val |= CN23XX_PKT_OUTPUT_CTL_CENB; - octeon_write_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no), -reg_val); + /* Enable this output queue to generate Packet Count Interrupt +*/ + reg_val = + octeon_read_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no)); + reg_val |= CN23XX_PKT_OUTPUT_CTL_CENB; + octeon_write_csr(oct, CN23XX_SLI_OQ_PKT_CONTROL(oq_no), +reg_val); + } else { + time_threshold = cn23xx_pf_get_oq_ticks( + oct, (u32)CFG_GET_OQ_INTR_TIME(cn23xx->conf)); + cnt_threshold = (u32)CFG_GET_OQ_INTR_PKT(cn23xx->conf); + + octeon_write_csr64( + oct, CN23XX_SLI_OQ_PKT_INT_LEVELS(oq_no), + ((time_threshold << 32 | cnt_threshold))); + } } static int cn23xx_enable_io_queues(struct octeon_device *oct) @@ -762,6 +783,110 @@ static void cn23xx_disable_io_queues(struct octeon_device *oct) } } +static u64 cn23xx_pf_msix_interrupt_handler(void *dev) +{ + struct octeon_ioq_vector *ioq_vector = (struct octeon_ioq_vector *)dev; + struct octeon_device *oct = ioq_vector->oct_dev; + u64 pkts_sent; + u64 ret = 0; + struct octeon_droq *droq = oct->droq[ioq_vector->droq_index]; + + dev_dbg(>pci_dev->dev, "In %s octeon_dev @ %p\n", __func__, oct); + + if (!droq) { + dev_err(>pci_dev->dev, "23XX bringup FIXME: oct pfnum:%d ioq_vector->ioq_num :%d droq is NULL\n", + oct->pf_num, ioq_vector->ioq_num); + return 0; + } + + pkts_sent = readq(droq->pkts_sent_reg); + + /* If our device
[PATCH net-next V4 04/10] liquidio: CN23XX register definitions
This patch adds register definitions and structures for new device cn23xx. Signed-off-by: Derek ChicklesSigned-off-by: Satanand Burla Signed-off-by: Felix Manlunas Signed-off-by: Raghu Vatsavayi --- .../net/ethernet/cavium/liquidio/cn23xx_pf_regs.h | 604 + 1 file changed, 604 insertions(+) create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h new file mode 100644 index 000..03d79d9 --- /dev/null +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h @@ -0,0 +1,604 @@ +/** +* Author: Cavium, Inc. +* +* Contact: supp...@cavium.com +* Please include "LiquidIO" in the subject. +* +* Copyright (c) 2003-2015 Cavium, Inc. +* +* This file is free software; you can redistribute it and/or modify +* it under the terms of the GNU General Public License, Version 2, as +* published by the Free Software Foundation. +* +* This file is distributed in the hope that it will be useful, but +* AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty +* of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or +* NONINFRINGEMENT. See the GNU General Public License for more +* details. +* +* This file may also be available under a different license from Cavium. +* Contact Cavium, Inc. for more information +**/ + +/*! \file cn23xx_regs.h + * \brief Host Driver: Register Address and Register Mask values for + * Octeon CN23XX devices. 
+*/ + +#ifndef __CN23XX_PF_REGS_H__ +#define __CN23XX_PF_REGS_H__ + +#define CN23XX_CONFIG_VENDOR_ID0x00 +#define CN23XX_CONFIG_DEVICE_ID0x02 + +#define CN23XX_CONFIG_XPANSION_BAR 0x38 + +#define CN23XX_CONFIG_MSIX_CAP0x50 +#define CN23XX_CONFIG_MSIX_LMSI 0x54 +#define CN23XX_CONFIG_MSIX_UMSI 0x58 +#define CN23XX_CONFIG_MSIX_MSIMD 0x5C +#define CN23XX_CONFIG_MSIX_MSIMM 0x60 +#define CN23XX_CONFIG_MSIX_MSIMP 0x64 + +#define CN23XX_CONFIG_PCIE_CAP 0x70 +#define CN23XX_CONFIG_PCIE_DEVCAP 0x74 +#define CN23XX_CONFIG_PCIE_DEVCTL 0x78 +#define CN23XX_CONFIG_PCIE_LINKCAP 0x7C +#define CN23XX_CONFIG_PCIE_LINKCTL 0x80 +#define CN23XX_CONFIG_PCIE_SLOTCAP 0x84 +#define CN23XX_CONFIG_PCIE_SLOTCTL 0x88 +#define CN23XX_CONFIG_PCIE_DEVCTL2 0x98 +#define CN23XX_CONFIG_PCIE_LINKCTL20xA0 +#define CN23XX_CONFIG_PCIE_UNCORRECT_ERR_MASK 0x108 +#define CN23XX_CONFIG_PCIE_CORRECT_ERR_STATUS 0x110 +#define CN23XX_CONFIG_PCIE_DEVCTL_MASK 0x0004 + +#define CN23XX_PCIE_SRIOV_FDL 0x188 +#define CN23XX_PCIE_SRIOV_FDL_BIT_POS 0x10 +#define CN23XX_PCIE_SRIOV_FDL_MASK0xFF + +#define CN23XX_CONFIG_PCIE_FLTMSK 0x720 + +#define CN23XX_CONFIG_SRIOV_VFDEVID0x190 + +#define CN23XX_CONFIG_SRIOV_BAR_START 0x19C +#define CN23XX_CONFIG_SRIOV_BARX(i)\ + (CN23XX_CONFIG_SRIOV_BAR_START + (i * 4)) +#define CN23XX_CONFIG_SRIOV_BAR_PF0x08 +#define CN23XX_CONFIG_SRIOV_BAR_64BIT 0x04 +#define CN23XX_CONFIG_SRIOV_BAR_IO0x01 + +/* ## BAR0 Registers */ + +#defineCN23XX_SLI_CTL_PORT_START 0x286E0 +#defineCN23XX_PORT_OFFSET 0x10 + +#defineCN23XX_SLI_CTL_PORT(p) \ + (CN23XX_SLI_CTL_PORT_START + ((p) * CN23XX_PORT_OFFSET)) + +/* 2 scatch registers (64-bit) */ +#defineCN23XX_SLI_WINDOW_CTL 0x282E0 +#defineCN23XX_SLI_SCRATCH1 0x283C0 +#defineCN23XX_SLI_SCRATCH2 0x283D0 +#defineCN23XX_SLI_WINDOW_CTL_DEFAULT 0x20ULL + +/* 1 registers (64-bit) - SLI_CTL_STATUS */ +#defineCN23XX_SLI_CTL_STATUS 0x28570 + +/* SLI Packet Input Jabber Register (64 bit register) + * <31:0> for Byte count for limiting sizes of packet sizes + * that 
are allowed for sli packet inbound packets. + * the default value is 0xFA00(=64000). + */ +#defineCN23XX_SLI_PKT_IN_JABBER0x29170 +/* The input jabber is used to determine the TSO max size. + * Due to H/W limitation, this need to be reduced to 6 + * in order to to H/W TSO and avoid the WQE malfarmation + * PKO_BUG_24989_WQE_LEN + */ +#defineCN23XX_DEFAULT_INPUT_JABBER 0xEA60 /*6*/ + +#defineCN23XX_WIN_WR_ADDR_LO
[PATCH net-next V4 06/10] liquidio: CN23XX device init and sriov config
Add support for cn23xx device init and sriov queue config. Signed-off-by: Derek ChicklesSigned-off-by: Satanand Burla Signed-off-by: Felix Manlunas Signed-off-by: Raghu Vatsavayi --- drivers/net/ethernet/cavium/liquidio/Makefile | 1 + .../ethernet/cavium/liquidio/cn23xx_pf_device.c| 527 + .../ethernet/cavium/liquidio/cn23xx_pf_device.h| 7 + drivers/net/ethernet/cavium/liquidio/lio_main.c| 10 +- 4 files changed, 544 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c diff --git a/drivers/net/ethernet/cavium/liquidio/Makefile b/drivers/net/ethernet/cavium/liquidio/Makefile index d44111d..5a27b2a 100644 --- a/drivers/net/ethernet/cavium/liquidio/Makefile +++ b/drivers/net/ethernet/cavium/liquidio/Makefile @@ -10,6 +10,7 @@ liquidio-$(CONFIG_LIQUIDIO) += lio_ethtool.o \ octeon_device.o\ cn66xx_device.o\ cn68xx_device.o\ + cn23xx_pf_device.o \ octeon_mem_ops.o \ octeon_droq.o \ octeon_nic.o diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c new file mode 100644 index 000..ccc3d5b --- /dev/null +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c @@ -0,0 +1,527 @@ +/** +* Author: Cavium, Inc. +* +* Contact: supp...@cavium.com +* Please include "LiquidIO" in the subject. +* +* Copyright (c) 2003-2015 Cavium, Inc. +* +* This file is free software; you can redistribute it and/or modify +* it under the terms of the GNU General Public License, Version 2, as +* published by the Free Software Foundation. +* +* This file is distributed in the hope that it will be useful, but +* AS-IS and WITHOUT ANY WARRANTY; without even the implied warranty +* of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, TITLE, or +* NONINFRINGEMENT. See the GNU General Public License for more +* details. +* +* This file may also be available under a different license from Cavium. +* Contact Cavium, Inc. 
for more information +**/ + +#include +#include +#include +#include "liquidio_common.h" +#include "octeon_droq.h" +#include "octeon_iq.h" +#include "response_manager.h" +#include "octeon_device.h" +#include "cn23xx_pf_device.h" +#include "octeon_main.h" + +#define RESET_NOTDONE 0 +#define RESET_DONE 1 + +/* Change the value of SLI Packet Input Jabber Register to allow + * VXLAN TSO packets which can be 64424 bytes, exceeding the + * MAX_GSO_SIZE we supplied to the kernel + */ +#define CN23XX_INPUT_JABBER 64600 + +#define LIOLUT_RING_DISTRIBUTION 9 +const int liolut_num_vfs_to_rings_per_vf[LIOLUT_RING_DISTRIBUTION] = { + 0, 8, 4, 2, 2, 2, 1, 1, 1 +}; + +void cn23xx_dump_pf_initialized_regs(struct octeon_device *oct) +{ + int i = 0; + u32 regval = 0; + struct octeon_cn23xx_pf *cn23xx = (struct octeon_cn23xx_pf *)oct->chip; + + /*In cn23xx_soft_reset*/ + dev_dbg(>pci_dev->dev, "%s[%llx] : 0x%llx\n", + "CN23XX_WIN_WR_MASK_REG", CVM_CAST64(CN23XX_WIN_WR_MASK_REG), + CVM_CAST64(octeon_read_csr64(oct, CN23XX_WIN_WR_MASK_REG))); + dev_dbg(>pci_dev->dev, "%s[%llx] : 0x%016llx\n", + "CN23XX_SLI_SCRATCH1", CVM_CAST64(CN23XX_SLI_SCRATCH1), + CVM_CAST64(octeon_read_csr64(oct, CN23XX_SLI_SCRATCH1))); + dev_dbg(>pci_dev->dev, "%s[%llx] : 0x%016llx\n", + "CN23XX_RST_SOFT_RST", CN23XX_RST_SOFT_RST, + lio_pci_readq(oct, CN23XX_RST_SOFT_RST)); + + /*In cn23xx_set_dpi_regs*/ + dev_dbg(>pci_dev->dev, "%s[%llx] : 0x%016llx\n", + "CN23XX_DPI_DMA_CONTROL", CN23XX_DPI_DMA_CONTROL, + lio_pci_readq(oct, CN23XX_DPI_DMA_CONTROL)); + + for (i = 0; i < 6; i++) { + dev_dbg(>pci_dev->dev, "%s(%d)[%llx] : 0x%016llx\n", + "CN23XX_DPI_DMA_ENG_ENB", i, + CN23XX_DPI_DMA_ENG_ENB(i), + lio_pci_readq(oct, CN23XX_DPI_DMA_ENG_ENB(i))); + dev_dbg(>pci_dev->dev, "%s(%d)[%llx] : 0x%016llx\n", + "CN23XX_DPI_DMA_ENG_BUF", i, + CN23XX_DPI_DMA_ENG_BUF(i), + lio_pci_readq(oct, CN23XX_DPI_DMA_ENG_BUF(i))); + } + + dev_dbg(>pci_dev->dev, "%s[%llx] : 0x%016llx\n", "CN23XX_DPI_CTL", + CN23XX_DPI_CTL, lio_pci_readq(oct, 
CN23XX_DPI_CTL)); + + /*In cn23xx_setup_pcie_mps and cn23xx_setup_pcie_mrrs */ + pci_read_config_dword(oct->pci_dev, CN23XX_CONFIG_PCIE_DEVCTL, );
[PATCH net-next V4 02/10] liquidio: Firmware version management
This patch contains changes for firmware version management.

Signed-off-by: Derek Chickles
Signed-off-by: Satanand Burla
Signed-off-by: Felix Manlunas
Signed-off-by: Raghu Vatsavayi
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c | 12 ++--
 .../net/ethernet/cavium/liquidio/liquidio_common.h | 20 +---
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 2abc110..1bbeae8 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -3230,8 +3230,9 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 	union oct_nic_if_cfg if_cfg;
 	unsigned int base_queue;
 	unsigned int gmx_port_id;
-	u32 resp_size, ctx_size;
+	u32 resp_size, ctx_size, data_size;
 	u32 ifidx_or_pfnum;
+	struct lio_version *vdata;

 	/* This is to handle link status changes */
 	octeon_register_dispatch_fn(octeon_dev, OPCODE_NIC,
@@ -3253,11 +3254,18 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 	for (i = 0; i < octeon_dev->ifcount; i++) {
 		resp_size = sizeof(struct liquidio_if_cfg_resp);
 		ctx_size = sizeof(struct liquidio_if_cfg_context);
+		data_size = sizeof(struct lio_version);
 		sc = (struct octeon_soft_command *)
-			octeon_alloc_soft_command(octeon_dev, 0,
+			octeon_alloc_soft_command(octeon_dev, data_size,
 						  resp_size, ctx_size);
 		resp = (struct liquidio_if_cfg_resp *)sc->virtrptr;
 		ctx = (struct liquidio_if_cfg_context *)sc->ctxptr;
+		vdata = (struct lio_version *)sc->virtdptr;
+
+		*((u64 *)vdata) = 0;
+		vdata->major = cpu_to_be16(LIQUIDIO_BASE_MAJOR_VERSION);
+		vdata->minor = cpu_to_be16(LIQUIDIO_BASE_MINOR_VERSION);
+		vdata->micro = cpu_to_be16(LIQUIDIO_BASE_MICRO_VERSION);

 		num_iqueues =
 			CFG_GET_NUM_TXQS_NIC_IF(octeon_get_conf(octeon_dev), i);
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 199a8b9..11df55a 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -30,10 +30,24 @@

 #include "octeon_config.h"

-#define LIQUIDIO_BASE_VERSION "1.4"
-#define LIQUIDIO_MICRO_VERSION ".1"
 #define LIQUIDIO_PACKAGE ""
-#define LIQUIDIO_VERSION "1.4.1"
+#define LIQUIDIO_BASE_MAJOR_VERSION 1
+#define LIQUIDIO_BASE_MINOR_VERSION 4
+#define LIQUIDIO_BASE_MICRO_VERSION 1
+#define LIQUIDIO_BASE_VERSION __stringify(LIQUIDIO_BASE_MAJOR_VERSION) "." \
+	__stringify(LIQUIDIO_BASE_MINOR_VERSION)
+#define LIQUIDIO_MICRO_VERSION "." __stringify(LIQUIDIO_BASE_MICRO_VERSION)
+#define LIQUIDIO_VERSION LIQUIDIO_PACKAGE \
+	__stringify(LIQUIDIO_BASE_MAJOR_VERSION) "." \
+	__stringify(LIQUIDIO_BASE_MINOR_VERSION) \
+	"." __stringify(LIQUIDIO_BASE_MICRO_VERSION)
+
+struct lio_version {
+	u16 major;
+	u16 minor;
+	u16 micro;
+	u16 reserved;
+};

 #define CONTROL_IQ 0
 /** Tag types used by Octeon cores in its work. */
--
1.8.3.1
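To make the wire layout of the version block concrete, here is a small userspace sketch of the fill step under stated assumptions. demo_lio_version, demo_cpu_to_be16 and demo_fill_version are hypothetical names; the helper stores the value big-endian byte by byte so the result is the same on any host, which is the effect cpu_to_be16() has in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same shape as struct lio_version in the patch: four 16-bit fields. */
struct demo_lio_version {
	uint16_t major;
	uint16_t minor;
	uint16_t micro;
	uint16_t reserved;
};

/* Return v laid out in memory as big-endian, whatever the host order. */
static uint16_t demo_cpu_to_be16(uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)(v & 0xff) };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Zero the whole 8-byte block, then store each field big-endian. */
static void demo_fill_version(struct demo_lio_version *v, uint16_t major,
			      uint16_t minor, uint16_t micro)
{
	assert(sizeof(*v) == 8);

	memset(v, 0, sizeof(*v));	/* the patch does *((u64 *)vdata) = 0 */
	v->major = demo_cpu_to_be16(major);
	v->minor = demo_cpu_to_be16(minor);
	v->micro = demo_cpu_to_be16(micro);
}
```

With version 1.4.1 the eight bytes handed to the firmware would then read 00 01 00 04 00 01 00 00.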
[PATCH net-next V4 00/10] liquidio CN23XX support
Dave,

The following patchset adds support for the new device "CN23XX" in the liquidio family of adapters. As you advised, I have split the previous V3 series of 18 patches into two halves. This first patchset has the first 10 patches, which are tested against net-next. I will post the second half after this one.

This V4 series also addresses all the comments from the previous submission:
1) Avoid busy loop while reading registers.
2) Other minor comments about debug messages and constants.

Please apply the patches in the following order, as some of them depend on earlier patches.

Raghu Vatsavayi (10):
  liquidio: Consolidate common functionality
  liquidio: Firmware version management
  liquidio: Common enable irq function
  liquidio: CN23XX register definitions
  liquidio: CN23XX queue definitions
  liquidio: CN23XX device init and sriov config
  liquidio: CN23XX register setup
  liquidio: CN23XX queue manipulation
  liquidio: MSIX support for CN23XX
  liquidio: CN23XX firmware download

 drivers/net/ethernet/cavium/Kconfig                |    2 +-
 drivers/net/ethernet/cavium/liquidio/Makefile      |   24 +-
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c    | 1171
 .../ethernet/cavium/liquidio/cn23xx_pf_device.h    |   57 +
 .../net/ethernet/cavium/liquidio/cn23xx_pf_regs.h  |  604 ++
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   |   45 +-
 .../net/ethernet/cavium/liquidio/cn66xx_device.h   |    7 +-
 .../net/ethernet/cavium/liquidio/cn68xx_device.c   |    1 -
 drivers/net/ethernet/cavium/liquidio/lio_core.c    |  261 +
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c |   18 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c    |  766 +++--
 .../net/ethernet/cavium/liquidio/liquidio_common.h |   22 +-
 .../net/ethernet/cavium/liquidio/octeon_config.h   |   59 +-
 .../net/ethernet/cavium/liquidio/octeon_console.c  |  117 +-
 .../net/ethernet/cavium/liquidio/octeon_device.c   |  302 +++--
 .../net/ethernet/cavium/liquidio/octeon_device.h   |  100 +-
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c |   33 +-
 drivers/net/ethernet/cavium/liquidio/octeon_droq.h |    2 +
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |    2 +
 drivers/net/ethernet/cavium/liquidio/octeon_main.h |   24 +-
 .../net/ethernet/cavium/liquidio/octeon_mem_ops.c  |    1 -
 .../net/ethernet/cavium/liquidio/octeon_network.h  |    2 -
 drivers/net/ethernet/cavium/liquidio/octeon_nic.c  |    8 +-
 drivers/net/ethernet/cavium/liquidio/octeon_nic.h  |    2 +-
 .../net/ethernet/cavium/liquidio/request_manager.c |    3 +
 25 files changed, 3022 insertions(+), 611 deletions(-)
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/cn23xx_pf_regs.h
 create mode 100644 drivers/net/ethernet/cavium/liquidio/lio_core.c

-- 
1.8.3.1
[PATCH net-next V4 03/10] liquidio: Common enable irq function
Add support for common IRQ enable functionality for both iq (instruction queue) and oq (output queue).

Signed-off-by: Derek Chickles
Signed-off-by: Satanand Burla
Signed-off-by: Felix Manlunas
Signed-off-by: Raghu Vatsavayi
---
 drivers/net/ethernet/cavium/liquidio/lio_main.c    |  1 +
 .../net/ethernet/cavium/liquidio/liquidio_common.h |  2 +-
 .../net/ethernet/cavium/liquidio/octeon_device.c   | 17 +++
 .../net/ethernet/cavium/liquidio/octeon_device.h   |  2 ++
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c | 33 +-
 drivers/net/ethernet/cavium/liquidio/octeon_droq.h |  2 ++
 drivers/net/ethernet/cavium/liquidio/octeon_iq.h   |  2 ++
 .../net/ethernet/cavium/liquidio/request_manager.c |  3 ++
 8 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index 1bbeae8..8f11a0b 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -192,6 +192,7 @@ static void octeon_droq_bh(unsigned long pdev)
 			continue;
 		reschedule |= octeon_droq_process_packets(oct, oct->droq[q_no],
 							  MAX_PACKET_BUDGET);
+		lio_enable_irq(oct->droq[q_no], NULL);
 	}
 
 	if (reschedule)
diff --git a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
index 11df55a..8ffd3b8 100644
--- a/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
+++ b/drivers/net/ethernet/cavium/liquidio/liquidio_common.h
@@ -846,7 +846,7 @@ struct oct_mdio_cmd {
 /* intrmod: max. packets to trigger interrupt */
 #define LIO_INTRMOD_RXMAXCNT_TRIGGER	384
 /* intrmod: min. packets to trigger interrupt */
-#define LIO_INTRMOD_RXMINCNT_TRIGGER	1
+#define LIO_INTRMOD_RXMINCNT_TRIGGER	0
 /* intrmod: max. time to trigger interrupt */
 #define LIO_INTRMOD_RXMAXTMR_TRIGGER	128
 /* 66xx:intrmod: min. time to trigger interrupt
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.c b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
index cff845c..541137a 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.c
@@ -1122,3 +1122,20 @@ int lio_get_device_id(void *dev)
 			return octeon_dev->octeon_id;
 	return -1;
 }
+
+void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq)
+{
+	/* the whole thing needs to be atomic, ideally */
+	if (droq) {
+		spin_lock_bh(&droq->lock);
+		writel(droq->pkt_count, droq->pkts_sent_reg);
+		droq->pkt_count = 0;
+		spin_unlock_bh(&droq->lock);
+	}
+	if (iq) {
+		spin_lock_bh(&iq->lock);
+		writel(iq->pkt_in_done, iq->inst_cnt_reg);
+		iq->pkt_in_done = 0;
+		spin_unlock_bh(&iq->lock);
+	}
+}
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_device.h b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
index d1251f4..02e9854 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_device.h
@@ -660,6 +660,8 @@ void *oct_get_config_info(struct octeon_device *oct, u16 card_type);
  */
 struct octeon_config *octeon_get_conf(struct octeon_device *oct);
 
+void lio_enable_irq(struct octeon_droq *droq, struct octeon_instr_queue *iq);
+
 /* LiquidIO driver pivate flags */
 enum {
 	OCT_PRIV_FLAG_TX_BYTES = 0, /* Tx interrupts by pending byte count */
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
index e0afe4c..5dfc23d 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_droq.c
@@ -92,22 +92,25 @@ static inline void *octeon_get_dispatch_arg(struct octeon_device *octeon_dev,
 	return fn_arg;
 }
 
-/** Check for packets on Droq. This function should be called with
- * lock held.
+/** Check for packets on Droq. This function should be called with lock held.
  * @param droq - Droq on which count is checked.
  * @return Returns packet count.
  */
 u32 octeon_droq_check_hw_for_pkts(struct octeon_droq *droq)
 {
 	u32 pkt_count = 0;
+	u32 last_count;
 
 	pkt_count = readl(droq->pkts_sent_reg);
-	if (pkt_count) {
-		atomic_add(pkt_count, &droq->pkts_pending);
-		writel(pkt_count, droq->pkts_sent_reg);
-	}
 
-	return pkt_count;
+	last_count = pkt_count - droq->pkt_count;
+	droq->pkt_count = pkt_count;
+
+	/* we shall write to cnts at napi irq enable or end of droq tasklet */
+	if (last_count)
+
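The counting scheme in this patch (report only the packets that arrived since the last check, and write the accumulated count back to the register when interrupts are re-enabled) can be modelled in plain C. The sketch below is hypothetical throughout: struct sketch_droq and both functions are illustration names, the register is a plain integer, and it assumes that writing a count to the register subtracts that count from the hardware total. Because the quoted diff is cut off after `if (last_count)`, adding the delta to a pending counter is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one output queue. */
struct sketch_droq {
	uint32_t hw_pkts_sent;	/* stands in for the pkts_sent register */
	uint32_t pkt_count;	/* register value seen at the last check */
	uint32_t pkts_pending;	/* packets waiting to be processed */
};

/* Return only the packets that arrived since the previous check. */
uint32_t sketch_check_hw_for_pkts(struct sketch_droq *q)
{
	uint32_t now, delta;

	assert(q != NULL);
	now = q->hw_pkts_sent;		/* models readl() */
	delta = now - q->pkt_count;	/* unsigned math survives wraparound */
	q->pkt_count = now;
	q->pkts_pending += delta;	/* assumption: the delta is queued here */
	return delta;
}

/* Acknowledge everything seen so far in a single register write. */
void sketch_enable_irq(struct sketch_droq *q)
{
	assert(q != NULL);
	/* Assumed semantics: writing N subtracts N from the hardware count. */
	q->hw_pkts_sent -= q->pkt_count;
	q->pkt_count = 0;
}
```

Deferring the write-back to one place keeps the register untouched while packets are being processed; the real function additionally takes the queue lock around the write and the reset.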
[PATCH net-next V4 08/10] liquidio: CN23XX queue manipulation
This patch adds support for cn23xx queue manipulation.

Signed-off-by: Derek Chickles
Signed-off-by: Satanand Burla
Signed-off-by: Felix Manlunas
Signed-off-by: Raghu Vatsavayi
---
 .../ethernet/cavium/liquidio/cn23xx_pf_device.c    | 213 +
 .../net/ethernet/cavium/liquidio/cn66xx_device.c   |   4 +-
 .../net/ethernet/cavium/liquidio/cn66xx_device.h   |   2 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c    |  12 +-
 .../net/ethernet/cavium/liquidio/octeon_device.h   |   2 +-
 5 files changed, 225 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index d614b0a..7e932a3 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -311,6 +311,61 @@ static void cn23xx_setup_global_mac_regs(struct octeon_device *oct)
 		(oct, CN23XX_SLI_PKT_MAC_RINFO64(mac_no, pf_num)));
 }
 
+static int cn23xx_reset_io_queues(struct octeon_device *oct)
+{
+	int ret_val = 0;
+	u64 d64;
+	u32 q_no, srn, ern;
+	u32 loop = 1000;
+
+	srn = oct->sriov_info.pf_srn;
+	ern = srn + oct->sriov_info.num_pf_rings;
+
+	/*As per HRM reg description, s/w cant write 0 to ENB. */
+	/*to make the queue off, need to set the RST bit. */
+
+	/* Reset the Enable bit for all the 64 IQs. */
+	for (q_no = srn; q_no < ern; q_no++) {
+		/* set RST bit to 1. This bit applies to both IQ and OQ */
+		d64 = octeon_read_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
+		d64 = d64 | CN23XX_PKT_INPUT_CTL_RST;
+		octeon_write_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no), d64);
+	}
+
+	/*wait until the RST bit is clear or the RST and quite bits are set*/
+	for (q_no = srn; q_no < ern; q_no++) {
+		u64 reg_val = octeon_read_csr64(oct,
+					CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
+		while ((READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) &&
+		       !(READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_QUIET) &&
+		       loop--) {
+			WRITE_ONCE(reg_val, octeon_read_csr64(
+			    oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no)));
+		}
+		if (!loop) {
+			dev_err(&oct->pci_dev->dev,
+				"clearing the reset reg failed or setting the quiet reg failed for qno: %u\n",
+				q_no);
+			return -1;
+		}
+		WRITE_ONCE(reg_val, READ_ONCE(reg_val) &
+			~CN23XX_PKT_INPUT_CTL_RST);
+		octeon_write_csr64(oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no),
+				   READ_ONCE(reg_val));
+
+		WRITE_ONCE(reg_val, octeon_read_csr64(
+		    oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no)));
+		if (READ_ONCE(reg_val) & CN23XX_PKT_INPUT_CTL_RST) {
+			dev_err(&oct->pci_dev->dev,
+				"clearing the reset failed for qno: %u\n",
+				q_no);
+			ret_val = -1;
+		}
+	}
+
+	return ret_val;
+}
+
 static int cn23xx_pf_setup_global_input_regs(struct octeon_device *oct)
 {
 	u32 q_no, ern, srn;
@@ -324,6 +379,9 @@ static int cn23xx_pf_setup_global_input_regs(struct octeon_device *oct)
 	srn = oct->sriov_info.pf_srn;
 	ern = srn + oct->sriov_info.num_pf_rings;
 
+	if (cn23xx_reset_io_queues(oct))
+		return -1;
+
 	/** Set the MAC_NUM and PVF_NUM in IQ_PKT_CONTROL reg
 	 * for all queues.Only PF can set these bits.
 	 * bits 29:30 indicate the MAC num.
@@ -552,6 +610,158 @@ static void cn23xx_setup_oq_regs(struct octeon_device *oct, u32 oq_no)
 			   reg_val);
 }
 
+static int cn23xx_enable_io_queues(struct octeon_device *oct)
+{
+	u64 reg_val;
+	u32 srn, ern, q_no;
+	u32 loop = 1000;
+
+	srn = oct->sriov_info.pf_srn;
+	ern = srn + oct->num_iqs;
+
+	for (q_no = srn; q_no < ern; q_no++) {
+		/* set the corresponding IQ IS_64B bit */
+		if (oct->io_qmask.iq64B & BIT_ULL(q_no - srn)) {
+			reg_val = octeon_read_csr64(
+			    oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no));
+			reg_val = reg_val | CN23XX_PKT_INPUT_CTL_IS_64B;
+			octeon_write_csr64(
+			    oct, CN23XX_SLI_IQ_PKT_CONTROL64(q_no), reg_val);
+		}
+
+		/* set the corresponding IQ ENB bit */
+		if (oct->io_qmask.iq & BIT_ULL(q_no - srn)) {
+
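The reset path in this patch waits, with a bounded number of reads, until the RST bit clears or the QUIET bit is set. A self-contained sketch of such a bounded poll follows. All names are hypothetical (the SKETCH_CTL_* bit positions, the callback type, and the fake register used to drive it are illustration only, not the CN23XX layout), and the function reports a timeout through its return value instead of testing the counter after the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions; the real CN23XX layout is not assumed. */
#define SKETCH_CTL_RST		(1ULL << 0)
#define SKETCH_CTL_QUIET	(1ULL << 1)

/* Callback that reads the control register once. */
typedef uint64_t (*sketch_read_fn)(void *ctx);

/*
 * Poll until RST is clear or QUIET is set, reading at most max_polls times.
 * Returns 0 when the queue is ready and -1 when the budget runs out.
 */
int sketch_wait_reset_done(sketch_read_fn read_reg, void *ctx,
			   uint32_t max_polls)
{
	uint32_t i;

	assert(read_reg != NULL);
	for (i = 0; i < max_polls; i++) {
		uint64_t val = read_reg(ctx);

		if (!(val & SKETCH_CTL_RST) || (val & SKETCH_CTL_QUIET))
			return 0;
	}
	return -1;
}

/* Fake register that raises QUIET after a fixed number of reads. */
struct sketch_fake_reg {
	uint64_t value;
	uint32_t reads_until_quiet;
};

uint64_t sketch_fake_read(void *ctx)
{
	struct sketch_fake_reg *r = ctx;

	if (r->reads_until_quiet > 0 && --r->reads_until_quiet == 0)
		r->value |= SKETCH_CTL_QUIET;
	return r->value;
}
```

One reason to return the status from inside the loop: when an unsigned counter is post-decremented in the loop condition, it wraps past zero once the budget is exhausted, so a later `!loop` test cannot tell a timeout from success. Giving each queue its own budget, rather than sharing one counter across the outer loop, is a second choice made in this sketch.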