date:20170122

[PATCH v1] net: phy: micrel: add KSZ8795 ethernet switch

2017-01-22 Thread Sean Nyekjaer

This adds support for the PHYs in the KSZ8795 5-port managed switch.

It allows the link between the switch and the SoC to be detected,
and uses the same read_status function as the KSZ8873MLL switch.

Unfortunately, this Ethernet switch has the same PHY ID as the KSZ8051.

Signed-off-by: Sean Nyekjaer 
---
 drivers/net/phy/micrel.c   | 14 ++
 include/linux/micrel_phy.h |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index ea92d524d5a8..fa158ae5115b 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -1014,6 +1014,20 @@ static struct phy_driver ksphy_driver[] = {
.get_stats  = kszphy_get_stats,
.suspend= genphy_suspend,
.resume = genphy_resume,
+}, {
+   .phy_id = PHY_ID_KSZ8795,
+   .phy_id_mask= MICREL_PHY_ID_MASK,
+   .name   = "Micrel KSZ8795 Switch",
+   .features   = (SUPPORTED_Pause | SUPPORTED_Asym_Pause),
+   .flags  = PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
+   .config_init= kszphy_config_init,
+   .config_aneg= ksz8873mll_config_aneg,
+   .read_status= ksz8873mll_read_status,
+   .get_sset_count = kszphy_get_sset_count,
+   .get_strings= kszphy_get_strings,
+   .get_stats  = kszphy_get_stats,
+   .suspend= genphy_suspend,
+   .resume = genphy_resume,
 } };
 
 module_phy_driver(ksphy_driver);
diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h
index 257173e0095e..f541da68d1e7 100644
--- a/include/linux/micrel_phy.h
+++ b/include/linux/micrel_phy.h
@@ -35,6 +35,8 @@
 #define PHY_ID_KSZ886X 0x00221430
 #define PHY_ID_KSZ8863 0x00221435
 
+#define PHY_ID_KSZ8795 0x00221550
+
 /* struct phy_device dev_flags definitions */
 #define MICREL_PHY_50MHZ_CLK   0x0001
 #define MICREL_PHY_FXEN0x0002
-- 
2.11.0
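
A note on the shared PHY ID mentioned in the commit message: the sketch below
shows, in plain user-space C, why two table entries with the same masked ID
cannot be told apart by ID alone. PHY_ID_KSZ8795 comes from the patch; the
KSZ8051 value follows from the commit message; the mask value and the two
helper functions are assumptions made for illustration, not the kernel's
matching code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PHY_ID_KSZ8795 is taken from the patch; the commit message says the
 * KSZ8051 reports the same ID. The mask value is an assumption about
 * include/linux/micrel_phy.h and should be checked against the tree. */
#define MICREL_PHY_ID_MASK 0x00fffff0u
#define PHY_ID_KSZ8051     0x00221550u
#define PHY_ID_KSZ8795     0x00221550u

/* Hypothetical helper: masked comparison of a hardware ID and a driver ID. */
static bool phy_id_matches(uint32_t hw_id, uint32_t drv_id, uint32_t mask)
{
	return (hw_id & mask) == (drv_id & mask);
}

/* Hypothetical helper: index of the first table entry that matches, or -1. */
static int first_match(uint32_t hw_id, const uint32_t *ids, int n, uint32_t mask)
{
	assert(ids != NULL && n >= 0);
	for (int i = 0; i < n; i++)
		if (phy_id_matches(hw_id, ids[i], mask))
			return i;
	return -1;
}
```

With a table holding both IDs, any hardware ID in the 0x0022155x range
resolves to whichever entry comes first, so the later entry is never reached
by ID matching alone.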

Re: [PATCH net-next v6 1/1] net sched actions: Add support for user cookies

2017-01-22 Thread Jiri Pirko

Sun, Jan 22, 2017 at 09:25:50PM CET, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim 
>
>Introduce optional 128-bit action cookie.
>Like all other cookie schemes in the networking world (eg in protocols
>like http or existing kernel fib protocol field, etc) the idea is to save
>user state that when retrieved serves as a correlator. The kernel
>_should not_ interpret it.  The user can store whatever they wish in the
>128 bits.
>
>Sample exercise(showing variable length use of cookie)
>
>.. create an accept action with cookie a1b2c3d4
>sudo $TC actions add action ok index 1 cookie a1b2c3d4
>
>.. dump all gact actions..
>sudo $TC -s actions ls action gact
>
>action order 0: gact action pass
> random type none pass val 0
> index 1 ref 1 bind 0 installed 5 sec used 5 sec
>Action statistics:
>Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>cookie a1b2c3d4
>
>.. bind the accept action to a filter..
>sudo $TC filter add dev lo parent : protocol ip prio 1 \
>u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
>
>... send some traffic..
>$ ping 127.0.0.1 -c 3
>PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
>64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
>64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
>64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
>
>--- 127.0.0.1 ping statistics ---
>3 packets transmitted, 3 received, 0% packet loss, time 2109ms
>rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1
>
>... show some stats
>$ sudo $TC -s actions get action gact index 1
>
>action order 1: gact action pass
> random type none pass val 0
> index 1 ref 2 bind 1 installed 204 sec used 5 sec
>Action statistics:
>Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>cookie a1b2c3d4
>
>.. try longer cookie...
>$ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef
>.. dump..
>$ sudo $TC -s actions ls action gact
>
>action order 1: gact action pass
> random type none pass val 0
> index 1 ref 2 bind 1 installed 204 sec used 5 sec
>Action statistics:
>Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>cookie 1234567890abcdef
>
>Signed-off-by: Jamal Hadi Salim 

Reviewed-by: Jiri Pirko
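
For readers following the cookie examples above, here is a minimal user-space
sketch of turning a variable-length hex string such as a1b2c3d4 into an
opaque byte blob of at most 128 bits. The function names and the
COOKIE_MAX_BYTES constant are hypothetical; this is not the tc or kernel
implementation, only an illustration of "variable length, up to 16 bytes,
never interpreted".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COOKIE_MAX_BYTES 16	/* 128 bits, per the commit message */

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical helper: parse a hex string into out[]. Returns the number of
 * bytes stored, or -1 if the string is empty, has an odd number of digits,
 * contains a non-hex character, or exceeds 128 bits. */
static int cookie_from_hex(const char *hex, uint8_t out[COOKIE_MAX_BYTES])
{
	assert(hex != NULL && out != NULL);
	size_t len = strlen(hex);

	if (len == 0 || len % 2 != 0 || len / 2 > COOKIE_MAX_BYTES)
		return -1;

	for (size_t i = 0; i < len / 2; i++) {
		int hi = hex_nibble(hex[2 * i]);
		int lo = hex_nibble(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (uint8_t)((hi << 4) | lo);
	}
	return (int)(len / 2);
}
```

The two cookies used in the sample session would parse to 4 and 8 bytes
respectively; anything longer than 32 hex digits is rejected.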

Re: [PATCH 2/3] sh_eth: add missing EESIPR bits

2017-01-22 Thread Geert Uytterhoeven

On Sun, Jan 22, 2017 at 8:18 PM, Sergei Shtylyov
 wrote:
> Renesas SH77{34|63} manuals  describe more EESIPR bits than the current
> driver. Declare the new bits with the end goal of using the bit names
> instead of the bare numbers  for  the 'sh_eth_cpu_data::eesipr_value'
> initializers...
>
> Signed-off-by: Sergei Shtylyov 

Reviewed-by: Geert Uytterhoeven 

> ---
>  drivers/net/ethernet/renesas/sh_eth.h |   10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
> ===
> --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
> +++ net-next/drivers/net/ethernet/renesas/sh_eth.h
> @@ -269,13 +269,17 @@ enum EESR_BIT {
>
>  /* EESIPR */
>  enum EESIPR_BIT {
> -   EESIPR_TWBIP= 0x4000,
> +   EESIPR_TWB1IP   = 0x8000,
> +   EESIPR_TWBIP= 0x4000,   /* same as TWB0IP */

Ah, you're adding it here ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH 1/3] sh_eth: rename EESIPR bits

2017-01-22 Thread Geert Uytterhoeven

Hi Sergei,

On Sun, Jan 22, 2017 at 8:18 PM, Sergei Shtylyov
 wrote:
> Since the  commit  b0ca2a21f769 ("sh_eth: Add support of SH7763 to sh_eth")
> the *enum* declaring the EESIPR bits (interrupt mask) went out of sync with
> the *enum* declaring the EESR bits (interrupt status) WRT  bit naming  and
> formatting. I'd like to restore the consistency by using EESIPR as the bit
> name prefix, renaming the *enum* to EESIPR_BIT, and (finally) renaming the
> bits according to the available  Renesas SH77{34|63} manuals...

Which versions of the SH77{34|63} manuals did you use?
Several registers are named slightly differently in mine, and also in my
r8a7740 manual.

> --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
> +++ net-next/drivers/net/ethernet/renesas/sh_eth.h
> @@ -268,19 +268,29 @@ enum EESR_BIT {
>  EESR_TFE | EESR_TDE)
>
>  /* EESIPR */
> -enum DMAC_IM_BIT {
> -   DMAC_M_TWB = 0x4000, DMAC_M_TABT = 0x0400,
> -   DMAC_M_RABT = 0x0200,
> -   DMAC_M_RFRMER = 0x0100, DMAC_M_ADF = 0x0080,
> -   DMAC_M_ECI = 0x0040, DMAC_M_FTC = 0x0020,
> -   DMAC_M_TDE = 0x0010, DMAC_M_TFE = 0x0008,
> -   DMAC_M_FRC = 0x0004, DMAC_M_RDE = 0x0002,
> -   DMAC_M_RFE = 0x0001, DMAC_M_TINT4 = 0x0800,
> -   DMAC_M_TINT3 = 0x0400, DMAC_M_TINT2 = 0x0200,
> -   DMAC_M_TINT1 = 0x0100, DMAC_M_RINT8 = 0x0080,
> -   DMAC_M_RINT5 = 0x0010, DMAC_M_RINT4 = 0x0008,
> -   DMAC_M_RINT3 = 0x0004, DMAC_M_RINT2 = 0x0002,
> -   DMAC_M_RINT1 = 0x0001,
> +enum EESIPR_BIT {
> +   EESIPR_TWBIP= 0x4000,

TWBIP is actually two bits in my manual: TWB1IP and TWB0IP

> +   EESIPR_ADEIP= 0x0080,

Nonexistent bit in my manual.

> +   EESIPR_CNDIP= 0x0800,

Nonexistent bit in my manual.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Re: [PATCH] net: phy: micrel: add KSZ8795 ethernet switch

2017-01-22 Thread Sean Nyekjær




On 2017-01-20 15:17, Andrew Lunn wrote:

On Fri, Jan 20, 2017 at 01:50:49PM +0100, Sean Nyekjaer wrote:

Unfortunately, this Ethernet switch has the same PHY ID as the KSZ8051.

Hi Sean

Please could you explain some more. You are adding PHY support here,
not switch support. So is this to enable the PHY driver for the PHYs
embedded in the switch?

 Andrew

Yes, of course :-)
The KSZ8795 is a 5-port managed Ethernet switch with integrated PHYs and an
MII/RMII interface on one port.

Through the MDIO interface it is possible to control the PHYs on ports 1-5.

I have just seen an issue with the reported speed and duplex, so I'm
going to submit a new version with a better description.


/Sean

[PATCH v4 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread Jisheng Zhang

From: Jingju Hou 

The mvneta itself does not support WOL, but the PHY might.
So pass the calls to the PHY

Signed-off-by: Jingju Hou 
Signed-off-by: Jisheng Zhang 
---
since v3:
 - really fix the build error

since v2,v1:
 - using phy_dev member in struct net_device
 - add commit msg

 drivers/net/ethernet/marvell/mvneta.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 6dcc951af0ff..02611fa1c3b8 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3929,6 +3929,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
*dev, u32 *indir, u8 *key,
return 0;
 }
 
+static void mvneta_ethtool_get_wol(struct net_device *dev,
+  struct ethtool_wolinfo *wol)
+{
+   wol->supported = 0;
+   wol->wolopts = 0;
+
+   if (dev->phydev)
+   return phy_ethtool_get_wol(dev->phydev, wol);
+}
+
+static int mvneta_ethtool_set_wol(struct net_device *dev,
+ struct ethtool_wolinfo *wol)
+{
+   if (!dev->phydev)
+   return -EOPNOTSUPP;
+
+   return phy_ethtool_set_wol(dev->phydev, wol);
+}
+
 static const struct net_device_ops mvneta_netdev_ops = {
.ndo_open= mvneta_open,
.ndo_stop= mvneta_stop,
@@ -3958,6 +3977,8 @@ const struct ethtool_ops mvneta_eth_tool_ops = {
.set_rxfh   = mvneta_ethtool_set_rxfh,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = mvneta_ethtool_set_link_ksettings,
+   .get_wol= mvneta_ethtool_get_wol,
+   .set_wol= mvneta_ethtool_set_wol,
 };
 
 /* Initialize hw */
-- 
2.11.0
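
The delegation pattern in this patch (the MAC has no WOL support of its own,
so it reports nothing unless a PHY is attached, and refuses to set anything
without one) can be sketched outside the kernel as follows. All struct and
function names here are hypothetical stand-ins, not the ethtool or phylib
API, and the -EINVAL check on unsupported options is an added assumption
rather than something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ethtool_wolinfo, phy_device and net_device. */
struct wolinfo { uint32_t supported; uint32_t wolopts; };
struct phy { uint32_t wol_supported; uint32_t wol_enabled; };
struct netdev { struct phy *phydev; };

/* Report "nothing supported" by default, then let the PHY fill in details. */
static void mac_get_wol(const struct netdev *dev, struct wolinfo *wol)
{
	assert(dev != NULL && wol != NULL);
	wol->supported = 0;
	wol->wolopts = 0;

	if (dev->phydev) {
		wol->supported = dev->phydev->wol_supported;
		wol->wolopts = dev->phydev->wol_enabled;
	}
}

/* Without a PHY there is nobody to forward the request to. */
static int mac_set_wol(struct netdev *dev, const struct wolinfo *wol)
{
	assert(dev != NULL && wol != NULL);
	if (!dev->phydev)
		return -EOPNOTSUPP;
	if (wol->wolopts & ~dev->phydev->wol_supported)
		return -EINVAL;	/* assumption: reject unsupported modes */

	dev->phydev->wol_enabled = wol->wolopts;
	return 0;
}
```

The getter never fails and the setter fails early, which matches the void
and int return types of the two callbacks in the patch.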

[PATCH net] r8152: don't execute runtime suspend if the tx is not empty

2017-01-22 Thread Hayes Wang

Runtime suspend shouldn't be executed if the tx queue is not empty,
because the device is not idle.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 0e99af0..e1466b4 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -32,7 +32,7 @@
 #define NETNEXT_VERSION"08"
 
 /* Information for net */
-#define NET_VERSION"6"
+#define NET_VERSION"7"
 
 #define DRIVER_VERSION "v1." NETNEXT_VERSION "." NET_VERSION
 #define DRIVER_AUTHOR "Realtek linux nic maintainers "
@@ -3574,6 +3574,8 @@ static bool delay_autosuspend(struct r8152 *tp)
 */
if (!sw_linking && tp->rtl_ops.in_nway(tp))
return true;
+   else if (!skb_queue_empty(&tp->tx_queue))
+   return true;
else
return false;
 }
-- 
2.7.4
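
The decision this hunk changes can be modelled in a few lines of user-space
C. The struct below is a hypothetical reduction of the driver state to the
three inputs visible in the hunk (sw_linking, the in_nway() result, and
whether the tx queue is empty); the real function has more context above
the hunk that is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the inputs visible in the hunk above. */
struct suspend_inputs {
	bool sw_linking;	/* software view of link state */
	bool in_nway;		/* auto-negotiation in progress */
	size_t tx_queue_len;	/* number of queued tx packets */
};

/* Returns true when runtime suspend should be postponed. */
static bool should_delay_autosuspend(const struct suspend_inputs *s)
{
	assert(s != NULL);
	if (!s->sw_linking && s->in_nway)
		return true;
	if (s->tx_queue_len != 0)	/* the case added by this patch */
		return true;
	return false;
}
```

Before the patch, a device with pending tx packets and no negotiation in
progress would have fallen through to "false" and been allowed to suspend.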

Re: [PATCH] net: stmicro: fix LS field mask in EEE configuration

2017-01-22 Thread Rayagond Kokatanur

Acked-by: Rayagond Kokatanur 

On Fri, Jan 20, 2017 at 9:30 PM, Joao Pinto  wrote:
> This patch fixes the LS mask when setting EEE timer.
> LS field is 10 bits long and not 11 as currently.
>
> Signed-off-by: Joao Pinto 
> Reported-By: Rayagond Kokatanur 
> ---
>  drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c 
> b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
> index 834f40f..202216c 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
> @@ -184,7 +184,7 @@ static void dwmac4_set_eee_pls(struct mac_device_info 
> *hw, int link)
>  static void dwmac4_set_eee_timer(struct mac_device_info *hw, int ls, int tw)
>  {
> void __iomem *ioaddr = hw->pcsr;
> -   int value = ((tw & 0x)) | ((ls & 0x7ff) << 16);
> +   int value = ((tw & 0x)) | ((ls & 0x3ff) << 16);
>
> /* Program the timers in the LPI timer control register:
>  * LS: minimum time (ms) for which the link
> --
> 2.9.3
>



-- 
wwr
Rayagond
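
The arithmetic behind the fix: a 10-bit field needs the mask 0x3ff, while
0x7ff covers 11 bits and lets one extra bit through. The sketch below uses
hypothetical helper names and only models the LS part of the value (shifted
to bit 16, as in the patch); it does not attempt to reproduce the tw mask.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set; valid for widths 1..31. */
static uint32_t field_mask(unsigned int width)
{
	assert(width > 0 && width < 32);
	return (UINT32_C(1) << width) - 1;
}

/* Hypothetical helper: place an LS value of the given width at bit 16. */
static uint32_t pack_ls(uint32_t ls, unsigned int width)
{
	return (ls & field_mask(width)) << 16;
}
```

With a width of 11, an out-of-range LS value of 0x400 ends up setting bit 26
of the register value; with the corrected width of 10 it is masked off.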

Re: [PATCH net-next] net: dsa: Fix inverted test for multiple CPU interface

2017-01-22 Thread Florian Fainelli



On 01/22/2017 01:16 PM, Andrew Lunn wrote:
> Remove the wrong !, otherwise we get false positives about having
> multiple CPU interfaces.
> 
> Fixes: b22de490869d ("net: dsa: store CPU switch structure in the tree")
> Signed-off-by: Andrew Lunn 

Reviewed-by: Florian Fainelli 
-- 
Florian

[PATCH net-next 0/2] net: couple mdio_module_driver changes

2017-01-22 Thread Florian Fainelli

Hi David,

Small patch series fixing a comment for mdio_module_driver and
finally utilizing it in b53_mdio.

Thanks!

Florian Fainelli (2):
  net: phy: Fix typo for MDIO module boilerplate comment
  net: dsa: b53: Utilize mdio_module_driver

 drivers/net/dsa/b53/b53_mdio.c | 13 +
 include/linux/mdio.h   |  2 +-
 2 files changed, 2 insertions(+), 13 deletions(-)

-- 
2.9.3

[PATCH net-next 1/2] net: phy: Fix typo for MDIO module boilerplate comment

2017-01-22 Thread Florian Fainelli

The module boilerplate macro is named mdio_module_driver and not
module_mdio_driver, fix that.

Fixes: a9049e0c513c ("mdio: Add support for mdio drivers.")
Signed-off-by: Florian Fainelli 
---
 include/linux/mdio.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mdio.h b/include/linux/mdio.h
index b6587a4b32e7..55a80d73cfc1 100644
--- a/include/linux/mdio.h
+++ b/include/linux/mdio.h
@@ -265,7 +265,7 @@ bool mdiobus_is_registered_device(struct mii_bus *bus, int 
addr);
 struct phy_device *mdiobus_get_phy(struct mii_bus *bus, int addr);
 
 /**
- * module_mdio_driver() - Helper macro for registering mdio drivers
+ * mdio_module_driver() - Helper macro for registering mdio drivers
  *
  * Helper macro for MDIO drivers which do not do anything special in module
  * init/exit. Each module may only use this macro once, and calling it
-- 
2.9.3

[PATCH net-next 2/2] net: dsa: b53: Utilize mdio_module_driver

2017-01-22 Thread Florian Fainelli

Eliminate a bit of boilerplate code.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_mdio.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_mdio.c b/drivers/net/dsa/b53/b53_mdio.c
index 477a16b5660a..fa7556f5d4fb 100644
--- a/drivers/net/dsa/b53/b53_mdio.c
+++ b/drivers/net/dsa/b53/b53_mdio.c
@@ -375,18 +375,7 @@ static struct mdio_driver b53_mdio_driver = {
.of_match_table = b53_of_match,
},
 };
-
-static int __init b53_mdio_driver_register(void)
-{
-   return mdio_driver_register(&b53_mdio_driver);
-}
-module_init(b53_mdio_driver_register);
-
-static void __exit b53_mdio_driver_unregister(void)
-{
-   mdio_driver_unregister(&b53_mdio_driver);
-}
-module_exit(b53_mdio_driver_unregister);
+mdio_module_driver(b53_mdio_driver);
 
 MODULE_DESCRIPTION("B53 MDIO access driver");
 MODULE_LICENSE("Dual BSD/GPL");
-- 
2.9.3
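
What the one-line macro replaces can be illustrated with a toy version in
user-space C. Everything below (toy_driver, toy_driver_register,
toy_module_driver) is hypothetical and only mimics the shape of the removed
boilerplate: a macro that expands to an init function calling register and
an exit function calling unregister. The real mdio_module_driver also wires
these into module_init/module_exit, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical driver object and registration functions. */
struct toy_driver {
	const char *name;
	int registered;
};

static int toy_driver_register(struct toy_driver *d)
{
	assert(d != NULL);
	d->registered = 1;
	return 0;
}

static void toy_driver_unregister(struct toy_driver *d)
{
	assert(d != NULL);
	d->registered = 0;
}

/* Expands to <drv>_init() and <drv>_exit(), the two functions that the
 * patch above deletes by hand-written name. */
#define toy_module_driver(drv)						\
static int drv##_init(void)						\
{									\
	return toy_driver_register(&(drv));				\
}									\
static void drv##_exit(void)						\
{									\
	toy_driver_unregister(&(drv));					\
}

static struct toy_driver b53_toy = { "b53-toy", 0 };
toy_module_driver(b53_toy)
```

The twelve removed lines in the diff correspond to the two functions this
macro generates, plus the module_init/module_exit hookups.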

Re: [PATCH net-next] net: ipv6: ignore null_entry on route dumps

2017-01-22 Thread David Miller


David, please slow down.

How is the NULL entry getting selected to be dumped and passed
down here in the first place?

The problem seems to be higher up in the chain here, don't
just special case check for this in rt6_dump_route().

Thanks.

[PATCH net-next] net: ipv6: ignore null_entry on route dumps

2017-01-22 Thread David Ahern

lkp-robot reported a BUG:
[   10.151226] BUG: unable to handle kernel NULL pointer dereference at 0198
[   10.152525] IP: rt6_fill_node+0x164/0x4b8
[   10.153307] *pdpt = 12ee5001 *pde = 
[   10.153309]
[   10.154492] Oops:  [#1]
[   10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 
4.10.0-rc4-00722-g41e8c70ee162-dirty #10
[   10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[   10.158254] task: d0deb000 task.stack: d0e0c000
[   10.159059] EIP: rt6_fill_node+0x164/0x4b8
[   10.159780] EFLAGS: 00010296 CPU: 0
[   10.160404] EAX:  EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[   10.161469] ESI:  EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[   10.162534]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[   10.163482] CR0: 80050033 CR2: 0198 CR3: 10d94660 CR4: 06b0
[   10.164535] Call Trace:
[   10.164993]  ? paravirt_sched_clock+0x9/0xd
[   10.165727]  ? sched_clock+0x9/0xc
[   10.166329]  ? sched_clock_cpu+0x19/0xe9
[   10.166991]  ? lock_release+0x13e/0x36c
[   10.167652]  rt6_dump_route+0x4c/0x56
[   10.168276]  fib6_dump_node+0x1d/0x3d
[   10.168913]  fib6_walk_continue+0xab/0x167
[   10.169611]  fib6_walk+0x2a/0x40
[   10.170182]  inet6_dump_fib+0xfb/0x1e0
[   10.170855]  netlink_dump+0xcd/0x21f

This happens when the loopback device is set down and an IPv6 FIB route
dump is requested.

The ipv6 route dump code passes ip6_null_entry to rt6_fill_node. This
route uses the loopback device but does not have idev set. When the
loopback is set down, the netif_running check added by a1a22c1206 fails
and the fill_node descends to checking rt->rt6i_idev for
ignore_routes_with_linkdown. Since idev is null for the ip6_null_entry
route it triggers the BUG.

The null_entry route should not be processed in a dump request. Catch
and ignore.

Fixes: a1a22c1206("net: ipv6: Keep nexthop of multipath route on admin down")
Signed-off-by: David Ahern 
---
 net/ipv6/route.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4b1f0f98a0e9..47499ed429da 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3320,6 +3320,10 @@ static int rt6_fill_node(struct net *net,
 int rt6_dump_route(struct rt6_info *rt, void *p_arg)
 {
struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg;
+   struct net *net = arg->net;
+
+   if (rt == net->ipv6.ip6_null_entry)
+   return 0;
 
if (nlmsg_len(arg->cb->nlh) >= sizeof(struct rtmsg)) {
struct rtmsg *rtm = nlmsg_data(arg->cb->nlh);
@@ -3332,7 +3336,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg)
}
}
 
-   return rt6_fill_node(arg->net,
+   return rt6_fill_node(net,
 arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
 NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq,
 NLM_F_MULTI);
-- 
2.1.4
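
The approach in this version of the patch, skipping the per-namespace
sentinel before any field of it is touched, can be sketched in user-space C
as below. The types are hypothetical reductions (a route is just an ID, a
namespace just holds its sentinel) and the "emit" step is a counter; the
point is only that the pointer comparison happens before the fill step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reductions of rt6_info, struct net and the dump argument. */
struct route { int id; };
struct net { struct route null_entry; };
struct dump_arg {
	const struct net *net;
	int emitted;	/* stands in for messages written to the skb */
};

/* Returns 0 in all cases; the sentinel is silently skipped. */
static int dump_route(const struct route *rt, struct dump_arg *arg)
{
	assert(rt != NULL && arg != NULL && arg->net != NULL);

	if (rt == &arg->net->null_entry)
		return 0;	/* marker entry, not a real route */

	arg->emitted++;		/* stand-in for the fill step */
	return 0;
}
```

A walker that visits the sentinel and two real routes would therefore emit
two entries, and the fill step never sees the entry with the missing idev.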

RE: [PATCH v3 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread YUAN Linyu



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Jingju Hou
> Sent: Monday, January 23, 2017 12:11 PM
> To: da...@davemloft.net
> Cc: jszh...@marvell.com; thomas.petazz...@free-electrons.com;
> netdev@vger.kernel.org; Jingju Hou
> Subject: [PATCH v3 net-next] net: mvneta: implement .set_wol and .get_wol
> 
> The mvneta itself does not support WOL, but the PHY might.
> So pass the calls to the PHY
> 
> Signed-off-by: Jingju Hou 
> ---
> Since v2:
> - it should be phydev member not phy_dev
> 
>  drivers/net/ethernet/marvell/mvneta.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c
> b/drivers/net/ethernet/marvell/mvneta.c
> index e05e227..fea4968 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct
> net_device *dev, u32 *indir, u8 *key,
>   return 0;
>  }
> 
> +static void
> +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo
> *wol)
> +{
> + wol->supported = 0;
> + wol->wolopts = 0;
> +
> + if (dev->phy_dev)
Not changed: this line still uses phy_dev.
> + return phy_ethtool_get_wol(dev->phydev, wol);
> +}
> +
> +static int
> +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +{
> + if (!dev->phydev)
> + return -EOPNOTSUPP;
> +
> + return phy_ethtool_set_wol(dev->phydev, wol);
> +}
> +
>  static const struct net_device_ops mvneta_netdev_ops = {
>   .ndo_open= mvneta_open,
>   .ndo_stop= mvneta_stop,
> @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct
> net_device *dev, u32 *indir, u8 *key,
>   .set_rxfh   = mvneta_ethtool_set_rxfh,
>   .get_link_ksettings = phy_ethtool_get_link_ksettings,
>   .set_link_ksettings = mvneta_ethtool_set_link_ksettings,
> + .get_wol= mvneta_ethtool_get_wol,
> + .set_wol= mvneta_ethtool_set_wol,
>  };
> 
>  /* Initialize hw */
> --
> 1.9.1

Re: [PATCH net-next] net: ipv6: Check that idev is non-NULL in rt6_fill_node

2017-01-22 Thread David Ahern

On 1/22/17 9:32 PM, David Miller wrote:
> From: David Ahern 
> Date: Sun, 22 Jan 2017 20:08:00 -0800
> 
>> The ipv6 route dump code passes ip6_null_entry to rt6_fill_node.
> 
> Doesn't this fact cause you to take a pause?

yes, it did.

> 
> I can't see a legitimate reason to dump the null entry, it's
> a marker rather than a real entry.
> 

neither do I. I was rather surprised to see it hit rt6_fill_node and that the 
rc is 0.

I can send a v2 that drops null_entry.

Re: [PATCH net-next] net: ipv6: Check that idev is non-NULL in rt6_fill_node

2017-01-22 Thread David Miller

From: David Ahern 
Date: Sun, 22 Jan 2017 20:08:00 -0800

> The ipv6 route dump code passes ip6_null_entry to rt6_fill_node.

Doesn't this fact cause you to take a pause?

I can't see a legitimate reason to dump the null entry, it's
a marker rather than a real entry.

Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2017-01-22 Thread Alexei Starovoitov

On Thu, Jan 19, 2017 at 08:04:59PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 19, 2017 at 6:39 PM, Alexei Starovoitov
>  wrote:
> > On Wed, Jan 18, 2017 at 06:29:22PM -0800, Andy Lutomirski wrote:
> >> I think it could work by making a single socket cgroup controller that
> >> handles all cgroup things that are bound to a socket.  Using
> >
> > Such 'socket cgroup controller' would limit usability of the feature
> > to sockets and force all other use cases like landlock to invent
> > their own wheel, which is undesirable. Everyone will be
> > inventing new 'foo cgroup controller', while all of them
> > are really bpf features. They are different bpf program
> > types that attach to different hooks and use cgroup for scoping.
> 
> Can you elaborate on why that would be a problem?  In a cgroup v1
> world, users who want different hierarchies for different types of
> control could easily want one hierarchy for socket hooks and a
> different hierarchy for lsm hooks.  In a cgroup v2 delegation world, I
> could easily imagine the decision to delegate socket hooks being
> different from the decision to delegate lsm hooks.  Almost all of the
> code would be shared between different bpf-using cgroup controllers.

how do you think it can be enforced when directory is chowned?

> >> Having thought about this some more, I think that making it would
> >> alleviate a bunch of my concerns, as it would make the semantics if
> >> the capable() check were relaxed to ns_capable() be sane.  Here's what
> >
> > here we're on the same page. For any meaningful discussion about
> > 'bpf cgroup controller' to happen bpf itself needs to become
> > delegatable in cgroup sense. In other words BPF_PROG_TYPE_CGROUP*
> > program types need to become available for unprivileged users.
> > The only unprivileged prog type today is BPF_PROG_TYPE_SOCKET_FILTER.
> > To make it secure we severely limited its functionality.
> > All bpf advances since then (like new map types and verifier extensions)
> > were done for root only. If early on the priv vs unpriv bpf features
> > were 80/20. Now it's close to 95/5. No work has been done to
> > make socket filter type more powerful. It still has to use
> > slow-ish ld_abs skb access while tc/xdp have direct packet access.
> > Things like register value tracking is root only as well and so on
> > and so forth.
> > We cannot just flip the switch and allow type_cgroup* to unpriv
> > and I don't see any volunteers willing to do this work.
> > Until that happens there is no point coming up with designs
> > for 'cgroup bpf controller'... whatever that means.
> 
> Sure there is.  If delegation can be turned on without changing the
> API, then the result will be easier to work with and have fewer
> compatibility issues.

... and open() of the directory done by the current api will preserve
cgroup delegation when and only when bpf_prog_type_cgroup_*
becomes unprivileged.
I'm not proposing creating new api here.

> >
> >> I currently should happen before bpf+cgroup is enabled in a release:
> >>
> >> 1. Make it netns-aware.  This could be as simple as making it only
> >> work in the root netns because then real netns awareness can be added
> >> later without breaking anything.  The current situation is bad in that
> >> network namespaces are just ignored and it's plausible that people
> >> will start writing user code that depends on having network namespaces
> >> be ignored.
> >
> > nothing in bpf today is netns-aware and frankly I don't see
> > how cgroup+bpf has anything to do with netns.
> > For regular sockets+bpf we don't check netns.
> > When tcpdump opens raw socket and attaches bpf there are no netns
> > checks, since socket itself gives a scope for the program to run.
> > Same thing applies to cgroup+bpf. cgroup gives a scope for the program.
> > But, say, we indeed add 'if !root ns' check to BPF_CGROUP_INET_*
> > hooks.
> 
> 
> Here I completely disagree with you.  tcpdump sees packets in its
> network namespace.  Regular sockets apply bpf filters to the packets
> seen by that socket, and the socket itself is scoped to a netns.
> 
> Meanwhile, cgroup+bpf actually appears to be buggy in this regard even
> regardless of what semantics you think are better.  sk_bound_dev_if is
> exposed as a u32 value, but sk_bound_dev_if only has meaning within a
> given netns.  The "ip vrf" stuff will straight-up malfunction if a
> process affected by its hook runs in a different netns from the netns
> that "ip vrf" was run in.

how is that any different from normal 'ip netns exec'?
that is expected user behavior.

> IOW, the current code is buggy.
> 
> > Then if the hooks are used for security, the process
> > only needs to do setns() to escape security sandbox. Obviously
> > broken semantics.
> 
> This could go both ways.  If the goal is to filter packets, then it's
> not really important to have the filter keep working if the sandboxed
> task unshares netns -- in the new netns, there isn't any access to the
> netwo

Re: [PATCH v3 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread David Miller

From: Jingju Hou 
Date: Mon, 23 Jan 2017 12:11:18 +0800

> The mvneta itself does not support WOL, but the PHY might.
> So pass the calls to the PHY
> 
> Signed-off-by: Jingju Hou 
> ---
> Since v2:
> - it should be phydev member not phy_dev
> 
>  drivers/net/ethernet/marvell/mvneta.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> b/drivers/net/ethernet/marvell/mvneta.c
> index e05e227..fea4968 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
> *dev, u32 *indir, u8 *key,
>   return 0;
>  }
>  
> +static void
> +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +{
> + wol->supported = 0;
> + wol->wolopts = 0;
> +
> + if (dev->phy_dev)

You are not testing the build of this patch, you are still using
phy_dev here.  Either that or your commit message is not accurate.

Either way this patch or its commit message is wrong.

I think you need to stop, take a deep breath, and take your time
fixing this.

Right now you are spitting out a new patch just minutes after
a previous submission, and these patches still have the same
bugs.

Re: [PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread David Miller


The same build error exists in all submissions of your patch.

At this point you must absolutely reproduce this build failure
yourself, and stop submitting this patch until you can test that the
build failure is fixed.

[PATCH v3 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread Jingju Hou

The mvneta itself does not support WOL, but the PHY might.
So pass the calls to the PHY

Signed-off-by: Jingju Hou 
---
Since v2:
- it should be phydev member not phy_dev

 drivers/net/ethernet/marvell/mvneta.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index e05e227..fea4968 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
*dev, u32 *indir, u8 *key,
return 0;
 }
 
+static void
+mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+   wol->supported = 0;
+   wol->wolopts = 0;
+
+   if (dev->phy_dev)
+   return phy_ethtool_get_wol(dev->phydev, wol);
+}
+
+static int
+mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+   if (!dev->phydev)
+   return -EOPNOTSUPP;
+
+   return phy_ethtool_set_wol(dev->phydev, wol);
+}
+
 static const struct net_device_ops mvneta_netdev_ops = {
.ndo_open= mvneta_open,
.ndo_stop= mvneta_stop,
@@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
*dev, u32 *indir, u8 *key,
.set_rxfh   = mvneta_ethtool_set_rxfh,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = mvneta_ethtool_set_link_ksettings,
+   .get_wol= mvneta_ethtool_get_wol,
+   .set_wol= mvneta_ethtool_set_wol,
 };
 
 /* Initialize hw */
-- 
1.9.1

[PATCH net-next] net: ipv6: Check that idev is non-NULL in rt6_fill_node

2017-01-22 Thread David Ahern

lkp-robot reported a BUG:
[   10.151226] BUG: unable to handle kernel NULL pointer dereference at 0198
[   10.152525] IP: rt6_fill_node+0x164/0x4b8
[   10.153307] *pdpt = 12ee5001 *pde = 
[   10.153309]
[   10.154492] Oops:  [#1]
[   10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 
4.10.0-rc4-00722-g41e8c70ee162-dirty #10
[   10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[   10.158254] task: d0deb000 task.stack: d0e0c000
[   10.159059] EIP: rt6_fill_node+0x164/0x4b8
[   10.159780] EFLAGS: 00010296 CPU: 0
[   10.160404] EAX:  EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[   10.161469] ESI:  EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[   10.162534]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[   10.163482] CR0: 80050033 CR2: 0198 CR3: 10d94660 CR4: 06b0
[   10.164535] Call Trace:
[   10.164993]  ? paravirt_sched_clock+0x9/0xd
[   10.165727]  ? sched_clock+0x9/0xc
[   10.166329]  ? sched_clock_cpu+0x19/0xe9
[   10.166991]  ? lock_release+0x13e/0x36c
[   10.167652]  rt6_dump_route+0x4c/0x56
[   10.168276]  fib6_dump_node+0x1d/0x3d
[   10.168913]  fib6_walk_continue+0xab/0x167
[   10.169611]  fib6_walk+0x2a/0x40
[   10.170182]  inet6_dump_fib+0xfb/0x1e0
[   10.170855]  netlink_dump+0xcd/0x21f

This happens when the loopback device is set down and an IPv6 FIB route
dump is requested.

The ipv6 route dump code passes ip6_null_entry to rt6_fill_node. This
route uses the loopback device but does not have idev set. When the
loopback is set down, the netif_running check added by a1a22c1206 fails
and the fill_node descends to checking rt->rt6i_idev for
ignore_routes_with_linkdown. Since idev is null for the ip6_null_entry
route it triggers the BUG.

Fixes: a1a22c1206("net: ipv6: Keep nexthop of multipath route on admin down")
Signed-off-by: David Ahern 
---
 net/ipv6/route.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5585c501a540..9a7cc7558104 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3218,7 +3218,8 @@ static int rt6_fill_node(struct net *net,
rtm->rtm_flags = 0;
if (!netif_carrier_ok(rt->dst.dev)) {
rtm->rtm_flags |= RTNH_F_LINKDOWN;
-   if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+   if (rt->rt6i_idev &&
+   rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
rtm->rtm_flags |= RTNH_F_DEAD;
}
rtm->rtm_scope = RT_SCOPE_UNIVERSE;
-- 
2.1.4
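
The guard added in the rt6_fill_node hunk above can be tried outside the kernel with a small model. This is only a sketch: the struct names, the route_flags() helper and the flag values are stand-ins invented for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types: simplified models, not the kernel structures. */
struct inet6_dev_model {
	int ignore_routes_with_linkdown;
};

struct rt6_model {
	bool carrier_ok;              /* models netif_carrier_ok(rt->dst.dev) */
	struct inet6_dev_model *idev; /* may be NULL, as for ip6_null_entry */
};

/* Illustrative flag values only. */
#define FLAG_LINKDOWN 0x1u
#define FLAG_DEAD     0x2u

static unsigned int route_flags(const struct rt6_model *rt)
{
	unsigned int flags = 0;

	if (!rt->carrier_ok) {
		flags |= FLAG_LINKDOWN;
		/* Test the pointer before dereferencing it. */
		if (rt->idev && rt->idev->ignore_routes_with_linkdown)
			flags |= FLAG_DEAD;
	}
	return flags;
}
```

With idev left NULL the function reports only the link-down flag instead of dereferencing a NULL pointer, which is the behaviour the patch aims for.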

Re: [PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread kbuild test robot

Hi Jingju,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jingju-Hou/net-mvneta-implement-set_wol-and-get_wol/20170123-105218
config: m68k-allyesconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_get_wol':
>> drivers/net/ethernet/marvell/mvneta.c:3938:9: error: 'struct net_device' has no member named 'phy_dev'
 if (dev->phy_dev)
^
   drivers/net/ethernet/marvell/mvneta.c:3939:33: error: 'struct net_device' has no member named 'phy_dev'
  return phy_ethtool_get_wol(dev->phy_dev, wol);
^
   drivers/net/ethernet/marvell/mvneta.c:3939:3: warning: 'return' with a value, in function returning void
  return phy_ethtool_get_wol(dev->phy_dev, wol);
  ^
   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_set_wol':
   drivers/net/ethernet/marvell/mvneta.c:3945:10: error: 'struct net_device' has no member named 'phy_dev'
 if (!dev->phy_dev)
 ^
   drivers/net/ethernet/marvell/mvneta.c:3948:32: error: 'struct net_device' has no member named 'phy_dev'
 return phy_ethtool_set_wol(dev->phy_dev, wol);
   ^
   drivers/net/ethernet/marvell/mvneta.c:3949:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^

vim +3938 drivers/net/ethernet/marvell/mvneta.c

  3932  static void
  3933  mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
  3934  {
  3935  wol->supported = 0;
  3936  wol->wolopts = 0;
  3937  
> 3938  if (dev->phy_dev)
  3939  return phy_ethtool_get_wol(dev->phy_dev, wol);
  3940  }
  3941  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation



Re: [PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread kbuild test robot

Hi Jingju,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jingju-Hou/net-mvneta-implement-set_wol-and-get_wol/20170123-105218
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_get_wol':
>> drivers/net/ethernet/marvell/mvneta.c:3938:9: error: 'struct net_device' has no member named 'phy_dev'; did you mean 'phydev'?
 if (dev->phy_dev)
^~
   drivers/net/ethernet/marvell/mvneta.c:3939:33: error: 'struct net_device' has no member named 'phy_dev'; did you mean 'phydev'?
  return phy_ethtool_get_wol(dev->phy_dev, wol);
^~
   drivers/net/ethernet/marvell/mvneta.c:3939:10: warning: 'return' with a value, in function returning void
  return phy_ethtool_get_wol(dev->phy_dev, wol);
 ^~~
   drivers/net/ethernet/marvell/mvneta.c:3933:1: note: declared here
mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
^~
   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_set_wol':
   drivers/net/ethernet/marvell/mvneta.c:3945:10: error: 'struct net_device' has no member named 'phy_dev'; did you mean 'phydev'?
 if (!dev->phy_dev)
 ^~
   drivers/net/ethernet/marvell/mvneta.c:3948:32: error: 'struct net_device' has no member named 'phy_dev'; did you mean 'phydev'?
 return phy_ethtool_set_wol(dev->phy_dev, wol);
   ^~
   drivers/net/ethernet/marvell/mvneta.c:3949:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^

vim +3938 drivers/net/ethernet/marvell/mvneta.c

  3932  static void
  3933  mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
  3934  {
  3935  wol->supported = 0;
  3936  wol->wolopts = 0;
  3937  
> 3938  if (dev->phy_dev)
  3939  return phy_ethtool_get_wol(dev->phy_dev, wol);
  3940  }
  3941  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
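
For reference, the two diagnostics reported above (the member is named 'phydev', and a void function must not return a value) could be addressed along the following lines. This is a hedged sketch, not the submitter's follow-up patch: the struct definitions and the two phy_ethtool_*_wol() bodies below are simplified stand-ins so that the snippet builds outside the kernel tree, and the wol_caps field is hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins so the sketch builds outside the kernel tree. */
struct ethtool_wolinfo { unsigned int supported; unsigned int wolopts; };
struct phy_device { unsigned int wol_caps; };     /* hypothetical field */
struct net_device { struct phy_device *phydev; }; /* member is 'phydev' */

/* Stubs standing in for the phylib helpers of the same name. */
static void phy_ethtool_get_wol(struct phy_device *phydev,
				struct ethtool_wolinfo *wol)
{
	wol->supported = phydev->wol_caps;
}

static int phy_ethtool_set_wol(struct phy_device *phydev,
			       struct ethtool_wolinfo *wol)
{
	return (wol->wolopts & ~phydev->wol_caps) ? -EOPNOTSUPP : 0;
}

static void mvneta_ethtool_get_wol(struct net_device *dev,
				   struct ethtool_wolinfo *wol)
{
	wol->supported = 0;
	wol->wolopts = 0;

	/* void function: call the helper, do not 'return' its result */
	if (dev->phydev)
		phy_ethtool_get_wol(dev->phydev, wol);
}

static int mvneta_ethtool_set_wol(struct net_device *dev,
				  struct ethtool_wolinfo *wol)
{
	if (!dev->phydev)
		return -EOPNOTSUPP;

	return phy_ethtool_set_wol(dev->phydev, wol);
}
```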



EPOLLERR on memory mapped netlink socket

2017-01-22 Thread prashantkumar dhotre

Hi experts,
I am new to netlink sockets.
In my app, I am continuously getting EPOLLERR from epoll_wait() on a
netlink socket.
epoll only notifies me that there is an event on the socket (it does not
tell me whether it is a normal read or an error).
What could cause this, and what does EPOLLERR on a memory mapped netlink
socket mean? Has the other side of the netlink socket (the kernel side)
closed the connection? Even if the kernel side closed the connection, why
do I get non-stop repeated EPOLLERRs on the netlink socket?
What action should we take in such cases? Just close the socket, or
call getsockopt(SO_ERROR) to retrieve the pending error state from the
socket and continue without closing it?

How do we detect whether the kernel side closed the connection?
My understanding is:
if we get a read event notification from epoll on a memory mapped netlink
socket and the frame in the RX ring is neither NL_MMAP_STATUS_VALID nor
NL_MMAP_STATUS_COPY, then we can conclude that this is a 'close()'
from the remote kernel socket, and I can close the connection by calling
close() on my netlink socket.
Is the above understanding correct?
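
One way to explore the getsockopt(SO_ERROR) option mentioned above is sketched below. It is only an illustration under stated assumptions: the helper names and the sock_action enum are made up for this example, and the policy (treat ENOBUFS as "messages were dropped, keep the socket", close on any other error) is an assumption rather than an answer confirmed in this thread. Reading SO_ERROR returns the pending error and clears it, so a level-triggered EPOLLERR caused by that error would normally stop repeating once it has been read.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/*
 * Fetch and clear the pending error on fd.
 * Returns 0 if none, a positive errno value if one was pending,
 * or -1 if getsockopt() itself failed.
 */
static int pending_socket_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;
	return err;
}

/* Hypothetical policy values for the caller's event loop. */
enum sock_action { SOCK_READ, SOCK_RETRY, SOCK_CLOSE, SOCK_IDLE };

static enum sock_action classify_event(int fd, uint32_t events)
{
	if (events & EPOLLERR) {
		int err = pending_socket_error(fd);

		/* Assumed policy: overrun means lost messages, not a dead socket. */
		if (err == ENOBUFS)
			return SOCK_RETRY;
		if (err != 0)
			return SOCK_CLOSE;
	}
	if (events & EPOLLIN)
		return SOCK_READ;
	return SOCK_IDLE;
}
```

The events argument is the events field of the struct epoll_event returned by epoll_wait(), so the caller can tell EPOLLIN and EPOLLERR apart before deciding what to do.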

Please

Re: [PATCH v2 net-next] net: phy: marvell: Add Wake from LAN support for 88E1510 PHY

2017-01-22 Thread Jisheng Zhang

On Mon, 23 Jan 2017 10:58:15 +0800 wrote:

> This was tested on the BG4CT platform with an 88E1518 Marvell PHY.
> 
> Signed-off-by: Jingju Hou 

Reviewed-by: Jisheng Zhang 

> ---
> Since v1:
> - add some commit messages
> 
>  drivers/net/phy/marvell.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index 0b78210..ed0d235 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -1679,6 +1679,8 @@ static int marvell_probe(struct phy_device *phydev)
>   .ack_interrupt = &marvell_ack_interrupt,
>   .config_intr = &marvell_config_intr,
>   .did_interrupt = &m88e1121_did_interrupt,
> + .get_wol = &m88e1318_get_wol,
> + .set_wol = &m88e1318_set_wol,
>   .resume = &marvell_resume,
>   .suspend = &marvell_suspend,
>   .get_sset_count = marvell_get_sset_count,

Re: [PATCH] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread Jisheng Zhang

Hi Jingju,

On Mon, 23 Jan 2017 10:43:08 +0800 wrote:

> The mvneta itself does not support WOL, but the PHY might.
> So pass the calls to the PHY
> 
> Signed-off-by: Jingju Hou 
> ---
> Since v1:
> - using phy_dev member in struct net_device

I noticed that you sent a new v2 patch, so this patch should be ignored.
Some tips:

* the v2 patch title should look like:

[PATCH v2] net: mvneta: implement .set_wol and .get_wol

* you also added a commit message in v2; you should mention that
in the changes since v1.

Thanks,
Jisheng

> 
>  drivers/net/ethernet/marvell/mvneta.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> b/drivers/net/ethernet/marvell/mvneta.c
> index e05e227..78869fa 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
> *dev, u32 *indir, u8 *key,
>   return 0;
>  }
>  
> +static void
> +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +{
> + wol->supported = 0;
> + wol->wolopts = 0;
> +
> + if (dev->phy_dev)
> + return phy_ethtool_get_wol(dev->phy_dev, wol);
> +}
> +
> +static int
> +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +{
> + if (!dev->phy_dev)
> + return -EOPNOTSUPP;
> +
> + return phy_ethtool_set_wol(dev->phy_dev, wol);
> +}
> +
>  static const struct net_device_ops mvneta_netdev_ops = {
>   .ndo_open= mvneta_open,
>   .ndo_stop= mvneta_stop,
> @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
> *dev, u32 *indir, u8 *key,
>   .set_rxfh   = mvneta_ethtool_set_rxfh,
>   .get_link_ksettings = phy_ethtool_get_link_ksettings,
>   .set_link_ksettings = mvneta_ethtool_set_link_ksettings,
> + .get_wol= mvneta_ethtool_get_wol,
> + .set_wol= mvneta_ethtool_set_wol,
>  };
>  
>  /* Initialize hw */

[PATCH v2 net-next] net: phy: marvell: Add Wake from LAN support for 88E1510 PHY

2017-01-22 Thread Jingju Hou

This was tested on the BG4CT platform with an 88E1518 Marvell PHY.

Signed-off-by: Jingju Hou 
---
Since v1:
- add some commit messages

 drivers/net/phy/marvell.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 0b78210..ed0d235 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -1679,6 +1679,8 @@ static int marvell_probe(struct phy_device *phydev)
.ack_interrupt = &marvell_ack_interrupt,
.config_intr = &marvell_config_intr,
.did_interrupt = &m88e1121_did_interrupt,
+   .get_wol = &m88e1318_get_wol,
+   .set_wol = &m88e1318_set_wol,
.resume = &marvell_resume,
.suspend = &marvell_suspend,
.get_sset_count = marvell_get_sset_count,
-- 
1.9.1

Re: [PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread Jisheng Zhang

On Mon, 23 Jan 2017 10:44:07 +0800
Jingju Hou  wrote:

> The mvneta itself does not support WOL, but the PHY might.
> So pass the calls to the PHY
> 
> Signed-off-by: Jingju Hou 

Reviewed-by: Jisheng Zhang 

> ---
> Since v1:
> - using phy_dev member in struct net_device
> 
>  drivers/net/ethernet/marvell/mvneta.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> b/drivers/net/ethernet/marvell/mvneta.c
> index e05e227..78869fa 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
> *dev, u32 *indir, u8 *key,
>   return 0;
>  }
>  
> +static void
> +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +{
> + wol->supported = 0;
> + wol->wolopts = 0;
> +
> + if (dev->phy_dev)
> + return phy_ethtool_get_wol(dev->phy_dev, wol);
> +}
> +
> +static int
> +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +{
> + if (!dev->phy_dev)
> + return -EOPNOTSUPP;
> +
> + return phy_ethtool_set_wol(dev->phy_dev, wol);
> +}
> +
>  static const struct net_device_ops mvneta_netdev_ops = {
>   .ndo_open= mvneta_open,
>   .ndo_stop= mvneta_stop,
> @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
> *dev, u32 *indir, u8 *key,
>   .set_rxfh   = mvneta_ethtool_set_rxfh,
>   .get_link_ksettings = phy_ethtool_get_link_ksettings,
>   .set_link_ksettings = mvneta_ethtool_set_link_ksettings,
> + .get_wol= mvneta_ethtool_get_wol,
> + .set_wol= mvneta_ethtool_set_wol,
>  };
>  
>  /* Initialize hw */

[PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread Jingju Hou

The mvneta itself does not support WOL, but the PHY might.
So pass the calls to the PHY

Signed-off-by: Jingju Hou 
---
Since v1:
- using phy_dev member in struct net_device

 drivers/net/ethernet/marvell/mvneta.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index e05e227..78869fa 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
*dev, u32 *indir, u8 *key,
return 0;
 }
 
+static void
+mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+   wol->supported = 0;
+   wol->wolopts = 0;
+
+   if (dev->phy_dev)
+   return phy_ethtool_get_wol(dev->phy_dev, wol);
+}
+
+static int
+mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+   if (!dev->phy_dev)
+   return -EOPNOTSUPP;
+
+   return phy_ethtool_set_wol(dev->phy_dev, wol);
+}
+
 static const struct net_device_ops mvneta_netdev_ops = {
.ndo_open= mvneta_open,
.ndo_stop= mvneta_stop,
@@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
*dev, u32 *indir, u8 *key,
.set_rxfh   = mvneta_ethtool_set_rxfh,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = mvneta_ethtool_set_link_ksettings,
+   .get_wol= mvneta_ethtool_get_wol,
+   .set_wol= mvneta_ethtool_set_wol,
 };
 
 /* Initialize hw */
-- 
1.9.1

[PATCH] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread Jingju Hou

The mvneta itself does not support WOL, but the PHY might.
So pass the calls to the PHY

Signed-off-by: Jingju Hou 
---
Since v1:
- using phy_dev member in struct net_device

 drivers/net/ethernet/marvell/mvneta.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index e05e227..78869fa 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
*dev, u32 *indir, u8 *key,
return 0;
 }
 
+static void
+mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+   wol->supported = 0;
+   wol->wolopts = 0;
+
+   if (dev->phy_dev)
+   return phy_ethtool_get_wol(dev->phy_dev, wol);
+}
+
+static int
+mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
+{
+   if (!dev->phy_dev)
+   return -EOPNOTSUPP;
+
+   return phy_ethtool_set_wol(dev->phy_dev, wol);
+}
+
 static const struct net_device_ops mvneta_netdev_ops = {
.ndo_open= mvneta_open,
.ndo_stop= mvneta_stop,
@@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device 
*dev, u32 *indir, u8 *key,
.set_rxfh   = mvneta_ethtool_set_rxfh,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = mvneta_ethtool_set_link_ksettings,
+   .get_wol= mvneta_ethtool_get_wol,
+   .set_wol= mvneta_ethtool_set_wol,
 };
 
 /* Initialize hw */
-- 
1.9.1

Re: [RESEND PATCH net] macsec: fix validation failed in asynchronous operation.

2017-01-22 Thread Ryder Lee

Sorry for forgetting to explain it. 

The original patch was incomplete, but I sent it out by mistake... 
So please ignore it.

On Sun, 2017-01-22 at 16:44 -0500, David Miller wrote:
> Why are you resending this?
> 
> The original posting on Jan 20th made it to the mailing list and is queued
> up in patchwork just fine.
> 
> Also, regardless of the reason, a "RESEND" patch should always contain an
> explanation of why it needs to be resent.  So that the maintainer doesn't
> need to ask questions like I am right now.

[lkp-robot] [net] a1a22c1206: BUG:unable_to_handle_kernel

2017-01-22 Thread kernel test robot


FYI, we noticed the following commit:

commit: a1a22c12060e4b9c52f45d4b3460f614e00162a2 ("net: ipv6: Keep nexthop of multipath route on admin down")
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-i386 -enable-kvm -m 320M

caused below changes:


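+-------------------------------------------------------+------------+------------+
|                                                       | dceeab0e52 | a1a22c1206 |
+-------------------------------------------------------+------------+------------+
| boot_successes                                        | 8          | 4          |
| boot_failures                                         | 0          | 4          |
| BUG:unable_to_handle_kernel                           | 0          | 4          |
| Oops:#[##]                                            | 0          | 4          |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0          | 4          |
+-------------------------------------------------------+------------+------------+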



[  150.634538] ubus (612) used greatest stack depth: 6716 bytes left
[  151.925694] ubus (647) used greatest stack depth: 6616 bytes left
[  154.978628] ubus (724) used greatest stack depth: 6604 bytes left
[  158.324778] BUG: unable to handle kernel NULL pointer dereference at 0198
[  158.334111] IP: rt6_fill_node+0x14e/0x4a6
[  158.339546] *pdpt = 0f789001 *pde =  
[  158.339554] 
[  158.349075] Oops:  [#1]
[  158.353060] CPU: 0 PID: 726 Comm: netifd Not tainted 4.10.0-rc4-00660-ga1a22c1 #1
[  158.362818] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[  158.375911] task: cf751000 task.stack: cf78c000
[  158.381810] EIP: rt6_fill_node+0x14e/0x4a6
[  158.386925] EFLAGS: 00010246 CPU: 0
[  158.392921] EAX:  EBX: d1f6c358 ECX: cec03f40 EDX: 
[  158.400763] ESI: cf78dbf8 EDI:  EBP: cf78dc54 ESP: cf78dbe8
[  158.408870]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[  158.415864] CR0: 80050033 CR2: 0198 CR3: 12e50220 CR4: 06b0
[  158.423771] Call Trace:
[  158.426955]  ? paravirt_sched_clock+0x9/0xd
[  158.432069]  ? sched_clock+0x9/0xc
[  158.436702]  ? sched_clock_cpu+0x1a/0xe1


To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k  job-script  # job-script is attached in this email



Thanks,
Xiaolong

[PATCHv2 perf/core 3/7] tools lib bpf: Add set/is helpers for all prog types

2017-01-22 Thread Joe Stringer

These bpf_prog_types were exposed in the uapi but there were no
corresponding functions to set these types for programs in libbpf.

Signed-off-by: Joe Stringer 
Acked-by: Wang Nan 
---
v2: Add ack.
---
 tools/lib/bpf/libbpf.c |  5 +
 tools/lib/bpf/libbpf.h | 10 ++
 2 files changed, 15 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 371cb40a2304..406838fa9c4f 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1448,8 +1448,13 @@ bool bpf_program__is_##NAME(struct bpf_program *prog)
\
return bpf_program__is_type(prog, TYPE);\
 }  \
 
+BPF_PROG_TYPE_FNS(socket_filter, BPF_PROG_TYPE_SOCKET_FILTER);
 BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE);
+BPF_PROG_TYPE_FNS(sched_cls, BPF_PROG_TYPE_SCHED_CLS);
+BPF_PROG_TYPE_FNS(sched_act, BPF_PROG_TYPE_SCHED_ACT);
 BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
+BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
+BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 
 int bpf_map__fd(struct bpf_map *map)
 {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index a5a8b86a06fe..2188ccdc0e2d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -174,11 +174,21 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n);
 /*
  * Adjust type of bpf program. Default is kprobe.
  */
+int bpf_program__set_socket_filter(struct bpf_program *prog);
 int bpf_program__set_tracepoint(struct bpf_program *prog);
 int bpf_program__set_kprobe(struct bpf_program *prog);
+int bpf_program__set_sched_cls(struct bpf_program *prog);
+int bpf_program__set_sched_act(struct bpf_program *prog);
+int bpf_program__set_xdp(struct bpf_program *prog);
+int bpf_program__set_perf_event(struct bpf_program *prog);
 
+bool bpf_program__is_socket_filter(struct bpf_program *prog);
 bool bpf_program__is_tracepoint(struct bpf_program *prog);
 bool bpf_program__is_kprobe(struct bpf_program *prog);
+bool bpf_program__is_sched_cls(struct bpf_program *prog);
+bool bpf_program__is_sched_act(struct bpf_program *prog);
+bool bpf_program__is_xdp(struct bpf_program *prog);
+bool bpf_program__is_perf_event(struct bpf_program *prog);
 
 /*
  * We don't need __attribute__((packed)) now since it is
-- 
2.11.0

[PATCHv2 perf/core 1/7] tools lib bpf: Fix map offsets in relocation

2017-01-22 Thread Joe Stringer

Commit 4708bbda5cb2 ("tools lib bpf: Fix maps resolution") attempted to
fix map resolution by identifying the number of symbols that point to
maps, and using this number to resolve each of the maps.

However, during relocation the original definition of the map size was
still in use. For up to two maps, the calculation was correct if there
was a small difference in size between the map definition in libbpf and
the one that the client library uses. However if the difference was
large, particularly if more than two maps were used in the BPF program,
the relocation would fail.

For example, when using a map definition with size 28, with three maps,
map relocation would count
(sym_offset / sizeof(struct bpf_map_def) => map_idx)
(0 / 16 => 0), ie map_idx = 0
(28 / 16 => 1), ie map_idx = 1
(56 / 16 => 3), ie map_idx = 3

So, libbpf reports:
libbpf: bpf relocation: map_idx 3 large than 2

Fix map relocation by checking the exact offset of maps when doing
relocation.

Fixes: 4708bbda5cb2 ("tools lib bpf: Fix maps resolution")
Signed-off-by: Joe Stringer 
Signed-off-by: Wang Nan 
[Allow different map size in an object]
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Arnaldo Carvalho de Melo 
---
v2: Use cached offsets of maps for relocation (Wang Nan)

This is a repost of the version Wang Nan posted on Jan 19.
---
 tools/lib/bpf/libbpf.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 84e6b35da4bd..671d5ad07cf1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -779,7 +779,7 @@ static int
 bpf_program__collect_reloc(struct bpf_program *prog,
   size_t nr_maps, GElf_Shdr *shdr,
   Elf_Data *data, Elf_Data *symbols,
-  int maps_shndx)
+  int maps_shndx, struct bpf_map *maps)
 {
int i, nrels;
 
@@ -829,7 +829,15 @@ bpf_program__collect_reloc(struct bpf_program *prog,
return -LIBBPF_ERRNO__RELOC;
}
 
-   map_idx = sym.st_value / sizeof(struct bpf_map_def);
+   /* TODO: 'maps' is sorted. We can use bsearch to make it 
faster. */
+   for (map_idx = 0; map_idx < nr_maps; map_idx++) {
+   if (maps[map_idx].offset == sym.st_value) {
+   pr_debug("relocation: find map %zd (%s) for 
insn %u\n",
+map_idx, maps[map_idx].name, insn_idx);
+   break;
+   }
+   }
+
if (map_idx >= nr_maps) {
pr_warning("bpf relocation: map_idx %d large than %d\n",
   (int)map_idx, (int)nr_maps - 1);
@@ -953,7 +961,8 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
err = bpf_program__collect_reloc(prog, nr_maps,
 shdr, data,
 obj->efile.symbols,
-obj->efile.maps_shndx);
+obj->efile.maps_shndx,
+obj->maps);
if (err)
return err;
}
-- 
2.11.0
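
The arithmetic in the commit message above (three maps of size 28, divided by a 16-byte definition) can be checked with a small standalone sketch. The helper names below are hypothetical and the linear search only mirrors the idea of the loop in the hunk; it is not the libbpf code itself.

```c
#include <stddef.h>

/* Old scheme: index derived from a fixed element size. */
static size_t map_idx_by_division(size_t sym_offset, size_t def_size)
{
	return sym_offset / def_size;
}

/*
 * New scheme: compare against each map's recorded offset.
 * Returns nr_maps when nothing matches, which the caller
 * treats as an out-of-range index.
 */
static size_t map_idx_by_offset(const size_t *offsets, size_t nr_maps,
				size_t sym_offset)
{
	size_t i;

	for (i = 0; i < nr_maps; i++)
		if (offsets[i] == sym_offset)
			break;
	return i;
}
```

With offsets {0, 28, 56}, dividing 56 by 16 gives index 3, which is out of range for three maps, while the offset lookup gives index 2.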

[PATCHv2 perf/core 0/7] Libbpf improvements

2017-01-22 Thread Joe Stringer

Patch 1 fixes an issue when using drastically different BPF map definitions
inside ELFs from a client using libbpf, vs the map definition libbpf uses.

Patches 2-4 add some simple, useful helper functions for setting prog type
and retrieving libbpf errors without depending on kernel headers from
userspace programs.

Patches 5-7 add a new pinning functionality for maps, programs, and objects.
Library users may call bpf_map__pin(map, path) or bpf_program__pin(prog, path)
to pin maps and programs separately, or use bpf_object__pin(obj, path) to
pin all maps and programs from the BPF object to the path. The map and program
variations require a full path where it will be pinned in the filesystem,
and the object variation will create directories "maps/" and "progs/" under
the specified path, then pin each map and program under those subdirectories.

---
v1: Initial post.
v2: Wang Nan provided improvements to patch 1.
Dropped patch 2 from v1.
Added acks for acked patches.
Split the bpf_obj__pin() to also provide map / program pinning APIs.
Allow users to provide full filesystem path (don't autodetect/mount BPFFS).

Joe Stringer (7):
  tools lib bpf: Fix map offsets in relocation
  tools lib bpf: Define prog_type fns with macro
  tools lib bpf: Add set/is helpers for all prog types
  tools lib bpf: Add libbpf_get_error()
  tools lib bpf: Add bpf_program__pin()
  tools lib bpf: Add bpf_map__pin()
  tools lib bpf: Add bpf_object__pin()

 tools/lib/bpf/libbpf.c  | 240 ++--
 tools/lib/bpf/libbpf.h  |  17 +++-
 tools/perf/tests/llvm.c |   2 +-
 3 files changed, 229 insertions(+), 30 deletions(-)

-- 
2.11.0

[PATCHv2 perf/core 2/7] tools lib bpf: Define prog_type fns with macro

2017-01-22 Thread Joe Stringer

Turning this into a macro allows future prog types to be added with a
single line per type.

Signed-off-by: Joe Stringer 
Acked-by: Wang Nan 
---
v2: Add ack.
---
 tools/lib/bpf/libbpf.c | 41 -
 1 file changed, 16 insertions(+), 25 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 671d5ad07cf1..371cb40a2304 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1428,37 +1428,28 @@ static void bpf_program__set_type(struct bpf_program 
*prog,
prog->type = type;
 }
 
-int bpf_program__set_tracepoint(struct bpf_program *prog)
-{
-   if (!prog)
-   return -EINVAL;
-   bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT);
-   return 0;
-}
-
-int bpf_program__set_kprobe(struct bpf_program *prog)
-{
-   if (!prog)
-   return -EINVAL;
-   bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE);
-   return 0;
-}
-
 static bool bpf_program__is_type(struct bpf_program *prog,
 enum bpf_prog_type type)
 {
return prog ? (prog->type == type) : false;
 }
 
-bool bpf_program__is_tracepoint(struct bpf_program *prog)
-{
-   return bpf_program__is_type(prog, BPF_PROG_TYPE_TRACEPOINT);
-}
-
-bool bpf_program__is_kprobe(struct bpf_program *prog)
-{
-   return bpf_program__is_type(prog, BPF_PROG_TYPE_KPROBE);
-}
+#define BPF_PROG_TYPE_FNS(NAME, TYPE)  \
+int bpf_program__set_##NAME(struct bpf_program *prog)  \
+{  \
+   if (!prog)  \
+   return -EINVAL; \
+   bpf_program__set_type(prog, TYPE);  \
+   return 0;   \
+}  \
+   \
+bool bpf_program__is_##NAME(struct bpf_program *prog)  \
+{  \
+   return bpf_program__is_type(prog, TYPE);\
+}  \
+
+BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE);
+BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 
 int bpf_map__fd(struct bpf_map *map)
 {
-- 
2.11.0
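
The token-pasting pattern used by BPF_PROG_TYPE_FNS can be tried in isolation with the sketch below. Everything prefixed demo_ or DEMO_ is a toy stand-in invented for this example (it is not libbpf's enum, struct or macro), and the invocations omit the trailing semicolon to avoid an empty declaration at file scope.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins: toy enum and struct, not libbpf's. */
enum demo_prog_type { DEMO_TYPE_KPROBE, DEMO_TYPE_XDP };
struct demo_program { enum demo_prog_type type; };

/* One invocation generates both a setter and a predicate for NAME. */
#define DEMO_PROG_TYPE_FNS(NAME, TYPE)				\
int demo_program__set_##NAME(struct demo_program *prog)		\
{								\
	if (!prog)						\
		return -EINVAL;					\
	prog->type = TYPE;					\
	return 0;						\
}								\
								\
bool demo_program__is_##NAME(struct demo_program *prog)		\
{								\
	return prog ? (prog->type == TYPE) : false;		\
}

DEMO_PROG_TYPE_FNS(kprobe, DEMO_TYPE_KPROBE)
DEMO_PROG_TYPE_FNS(xdp, DEMO_TYPE_XDP)
```

Adding another program type then costs one line, which is the motivation stated in the commit message.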

[PATCHv2 perf/core 5/7] tools lib bpf: Add bpf_program__pin()

2017-01-22 Thread Joe Stringer

Add a new API to pin a BPF program to the filesystem. The user can
specify the full path within a BPF filesystem to pin the program.
Programs with multiple instances are pinned as 'foo', 'foo_1', 'foo_2',
and so on.

Signed-off-by: Joe Stringer 
---
v2: Don't automount BPF filesystem
Split program, map, object pinning into separate APIs and separate
patches.
---
 tools/lib/bpf/libbpf.c | 76 ++
 tools/lib/bpf/libbpf.h |  1 +
 2 files changed, 77 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e6cd62b1264b..eea5c74808f7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4,6 +4,7 @@
  * Copyright (C) 2013-2015 Alexei Starovoitov 
  * Copyright (C) 2015 Wang Nan 
  * Copyright (C) 2015 Huawei Inc.
+ * Copyright (C) 2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU Lesser General Public
@@ -22,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -31,7 +33,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -1237,6 +1242,77 @@ int bpf_object__load(struct bpf_object *obj)
return err;
 }
 
+static int check_path(const char *path)
+{
+   struct statfs st_fs;
+   char *dname, *dir;
+   int err = 0;
+
+   if (path == NULL)
+   return -EINVAL;
+
+   dname = strdup(path);
+   dir = dirname(dname);
+   if (statfs(dir, &st_fs)) {
+   pr_warning("failed to statfs %s: %s\n", dir, strerror(errno));
+   err = -errno;
+   }
+   free(dname);
+
+   if (!err && st_fs.f_type != BPF_FS_MAGIC) {
+   pr_warning("specified path %s is not on BPF FS\n", path);
+   err = -EINVAL;
+   }
+
+   return err;
+}
+
+int bpf_program__pin(struct bpf_program *prog, const char *path)
+{
+   int i, err;
+
+   err = check_path(path);
+   if (err)
+   return err;
+
+   if (prog == NULL) {
+   pr_warning("invalid program pointer\n");
+   return -EINVAL;
+   }
+
+   if (prog->instances.nr <= 0) {
+   pr_warning("no instances of prog %s to pin\n",
+  prog->section_name);
+   return -EINVAL;
+   }
+
+   if (bpf_obj_pin(prog->instances.fds[0], path)) {
+   pr_warning("failed to pin program: %s\n", strerror(errno));
+   return -errno;
+   }
+   pr_debug("pinned program '%s'\n", path);
+
+   for (i = 1; i < prog->instances.nr; i++) {
+   char buf[PATH_MAX];
+   int len;
+
+   len = snprintf(buf, PATH_MAX, "%s_%d", path, i);
+   if (len < 0)
+   return -EINVAL;
+   else if (len > PATH_MAX)
+   return -ENAMETOOLONG;
+
+   if (bpf_obj_pin(prog->instances.fds[i], buf)) {
+   pr_warning("failed to pin program: %s\n",
+  strerror(errno));
+   return -errno;
+   }
+   pr_debug("pinned program '%s'\n", buf);
+   }
+
+   return 0;
+}
+
 void bpf_object__close(struct bpf_object *obj)
 {
size_t i;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 4014d1ba5e3d..7973087c377b 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -106,6 +106,7 @@ void *bpf_program__priv(struct bpf_program *prog);
 const char *bpf_program__title(struct bpf_program *prog, bool needs_copy);
 
 int bpf_program__fd(struct bpf_program *prog);
+int bpf_program__pin(struct bpf_program *prog, const char *path);
 
 struct bpf_insn;
 
-- 
2.11.0
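
The 'foo', 'foo_1', 'foo_2' naming described above can be sketched as a standalone helper. The function name and its return convention (length on success) are assumptions made for this example. Note that snprintf() returns the length the full string would have had, so the sketch treats a result equal to the buffer size as truncation as well.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: instance 0 is pinned at 'path' itself,
 * instance i > 0 at "<path>_<i>".
 * Returns the string length on success or a negative errno value.
 */
static int instance_pin_path(char *buf, size_t size, const char *path, int i)
{
	int len;

	if (!buf || !path || i < 0)
		return -EINVAL;

	len = i ? snprintf(buf, size, "%s_%d", path, i)
		: snprintf(buf, size, "%s", path);
	if (len < 0)
		return -EINVAL;
	/* len == size means the last character was already cut off. */
	if ((size_t)len >= size)
		return -ENAMETOOLONG;
	return len;
}
```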

[PATCHv2 perf/core 4/7] tools lib bpf: Add libbpf_get_error()

2017-01-22 Thread Joe Stringer

This function will turn a libbpf pointer into a standard error code (or
0 if the pointer is valid). This also allows removal of the dependency
on linux/err.h in the public header file, which causes problems in
userspace programs built against libbpf.

Signed-off-by: Joe Stringer 
Acked-by: Wang Nan 
---
v2: Add ack.
---
 tools/lib/bpf/libbpf.c  | 8 
 tools/lib/bpf/libbpf.h  | 4 +++-
 tools/perf/tests/llvm.c | 2 +-
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 406838fa9c4f..e6cd62b1264b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1542,3 +1543,10 @@ bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset)
}
return ERR_PTR(-ENOENT);
 }
+
+long libbpf_get_error(const void *ptr)
+{
+   if (IS_ERR(ptr))
+   return PTR_ERR(ptr);
+   return 0;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2188ccdc0e2d..4014d1ba5e3d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -22,8 +22,8 @@
 #define __BPF_LIBBPF_H
 
 #include 
+#include 
 #include 
-#include 
 #include   // for size_t
 
 enum libbpf_errno {
@@ -234,4 +234,6 @@ int bpf_map__set_priv(struct bpf_map *map, void *priv,
  bpf_map_clear_priv_t clear_priv);
 void *bpf_map__priv(struct bpf_map *map);
 
+long libbpf_get_error(const void *ptr);
+
 #endif
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index 02a33ebcd992..d357dab72e68 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -13,7 +13,7 @@ static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
struct bpf_object *obj;
 
obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, NULL);
-   if (IS_ERR(obj))
+   if (libbpf_get_error(obj))
return TEST_FAIL;
bpf_object__close(obj);
return TEST_OK;
-- 
2.11.0
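
For readers unfamiliar with the error-pointer convention that
libbpf_get_error() hides, here is a userspace sketch of the idea. The
sketch_* names are hypothetical stand-ins for the kernel-style ERR_PTR(),
IS_ERR() and PTR_ERR() helpers, and the 4095 limit mirrors the kernel's
MAX_ERRNO. It assumes a Linux-style ABI where long and pointers have the
same width.

```c
#include <errno.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (stand-in for ERR_PTR()). */
static void *sketch_err_ptr(long err)
{
	return (void *)err;
}

/* The top SKETCH_MAX_ERRNO addresses are reserved for error codes. */
static int sketch_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-SKETCH_MAX_ERRNO;
}

/* Same shape as libbpf_get_error(): 0 for a valid pointer, else -errno. */
static long sketch_get_error(const void *ptr)
{
	if (sketch_is_err(ptr))
		return (long)ptr;
	return 0;
}
```

Note that a NULL pointer is not an error pointer under this scheme, so the
sketch returns 0 for it; callers that can receive NULL need a separate check.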

[PATCHv2 perf/core 7/7] tools lib bpf: Add bpf_object__pin()

2017-01-22 Thread Joe Stringer

Add a new API to pin a BPF object to the filesystem. The user can
specify the full path within a BPF filesystem to pin the object.
Programs will be pinned under a subdirectory 'progs', and maps will be
pinned under a subdirectory 'maps'.

For example, with the directory '/sys/fs/bpf/foo':
/sys/fs/bpf/foo/progs/PROG_NAME
/sys/fs/bpf/foo/maps/MAP_NAME

Signed-off-by: Joe Stringer 
---
v2: Don't automount BPF filesystem
Split program, map, object pinning into separate APIs and separate
patches.
---
 tools/lib/bpf/libbpf.c | 73 ++
 tools/lib/bpf/libbpf.h |  1 +
 2 files changed, 74 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index c1d8b07e21d2..41645dc51fa1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1335,6 +1336,78 @@ int bpf_map__pin(struct bpf_map *map, const char *path)
return 0;
 }
 
+static int make_dir(const char *path, const char *dir)
+{
+   char buf[PATH_MAX];
+   int len, err = 0;
+
+   len = snprintf(buf, PATH_MAX, "%s/%s", path, dir);
+   if (len < 0)
+   err = -EINVAL;
+   else if (len >= PATH_MAX)
+   err = -ENAMETOOLONG;
+   if (!err && mkdir(buf, 0700) && errno != EEXIST)
+   err = -errno;
+
+   if (err)
+   pr_warning("failed to make dir %s/%s: %s\n", path, dir,
+  strerror(-err));
+   return err;
+}
+
+int bpf_object__pin(struct bpf_object *obj, const char *path)
+{
+   struct bpf_program *prog;
+   struct bpf_map *map;
+   int err;
+
+   if (!obj)
+   return -ENOENT;
+
+   if (!obj->loaded) {
+   pr_warning("object not yet loaded; load it first\n");
+   return -ENOENT;
+   }
+
+   err = make_dir(path, "maps");
+   if (err)
+   return err;
+
+   bpf_map__for_each(map, obj) {
+   char buf[PATH_MAX];
+   int len;
+
+   len = snprintf(buf, PATH_MAX, "%s/maps/%s", path,
+  bpf_map__name(map));
+   if (len < 0 || len >= PATH_MAX)
+   return -EINVAL;
+
+   err = bpf_map__pin(map, buf);
+   if (err)
+   return err;
+   }
+
+   err = make_dir(path, "progs");
+   if (err)
+   return err;
+
+   bpf_object__for_each_program(prog, obj) {
+   char buf[PATH_MAX];
+   int len;
+
+   len = snprintf(buf, PATH_MAX, "%s/progs/%s", path,
+  prog->section_name);
+   if (len < 0 || len >= PATH_MAX)
+   return -EINVAL;
+
+   err = bpf_program__pin(prog, buf);
+   if (err)
+   return err;
+   }
+
+   return 0;
+}
+
 void bpf_object__close(struct bpf_object *obj)
 {
size_t i;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 524247cfd205..8363ee6db4a0 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -65,6 +65,7 @@ struct bpf_object *bpf_object__open(const char *path);
 struct bpf_object *bpf_object__open_buffer(void *obj_buf,
   size_t obj_buf_sz,
   const char *name);
+int bpf_object__pin(struct bpf_object *object, const char *path);
 void bpf_object__close(struct bpf_object *object);
 
 /* Load/unload object into/from kernel */
-- 
2.11.0

[PATCHv2 perf/core 6/7] tools lib bpf: Add bpf_map__pin()

2017-01-22 Thread Joe Stringer

Add a new API to pin a BPF map to the filesystem. The user can
specify the full path within a BPF filesystem to pin the map.

Signed-off-by: Joe Stringer 
---
v2: Don't automount BPF filesystem
Split program, map, object pinning into separate APIs and separate
patches.
---
 tools/lib/bpf/libbpf.c | 22 ++
 tools/lib/bpf/libbpf.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index eea5c74808f7..c1d8b07e21d2 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1313,6 +1313,28 @@ int bpf_program__pin(struct bpf_program *prog, const char *path)
return 0;
 }
 
+int bpf_map__pin(struct bpf_map *map, const char *path)
+{
+   int err;
+
+   err = check_path(path);
+   if (err)
+   return err;
+
+   if (map == NULL) {
+   pr_warning("invalid map pointer\n");
+   return -EINVAL;
+   }
+
+   if (bpf_obj_pin(map->fd, path)) {
+   pr_warning("failed to pin map: %s\n", strerror(errno));
+   return -errno;
+   }
+
+   pr_debug("pinned map '%s'\n", path);
+   return 0;
+}
+
 void bpf_object__close(struct bpf_object *obj)
 {
size_t i;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 7973087c377b..524247cfd205 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -234,6 +234,7 @@ typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
 int bpf_map__set_priv(struct bpf_map *map, void *priv,
  bpf_map_clear_priv_t clear_priv);
 void *bpf_map__priv(struct bpf_map *map);
+int bpf_map__pin(struct bpf_map *map, const char *path);
 
 long libbpf_get_error(const void *ptr);
 
-- 
2.11.0

RE: [RFC PATCH net-next 4/5] bridge: vlan lwt and dst_metadata netlink support

2017-01-22 Thread Rosen, Rami

Hi, Roopa,

Two minor comments:

The parameter br is not used in the br_add_vlan_tunnel_info() method, so it 
should be removed:

+static int br_add_vlan_tunnel_info(struct net_bridge *br,
+  struct net_bridge_port *p, int cmd,
+  u16 vid, u32 tun_id)
+{
+   int err;
+
+   switch (cmd) {
+   case RTM_SETLINK:
+   if (p) {
+   /* if the MASTER flag is set this will act on the global
+* per-VLAN entry as well
+*/
+   err = nbp_vlan_tunnel_info_add(p, vid, tun_id);
+   if (err)
+   break;
+   } else {
+   return -EINVAL;
+   }
+
+   break;
+
+   case RTM_DELLINK:
+   if (p)
+   nbp_vlan_tunnel_info_delete(p, vid);
+   else
+   return -EINVAL;
+   break;
+   }
+
+   return 0;
+}
+

The parameter br is used inside br_process_vlan_tunnel_info() only in the two 
cases where br_add_vlan_tunnel_info() is invoked. Since we saw earlier that it 
should be removed from br_add_vlan_tunnel_info(), it should also be removed 
from br_process_vlan_tunnel_info(), as it is no longer needed:

+static int br_process_vlan_tunnel_info(struct net_bridge *br,
+  struct net_bridge_port *p, int cmd,
+  struct vtunnel_info *tinfo_curr,
+  struct vtunnel_info *tinfo_last) {
+   int t, v;
+   int err;
+
+   if (tinfo_curr->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
+   if (tinfo_last->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN)
+   return -EINVAL;
+   memcpy(tinfo_last, tinfo_curr, sizeof(struct vtunnel_info));
+   } else if (tinfo_curr->flags & BRIDGE_VLAN_INFO_RANGE_END) {
+   if (!(tinfo_last->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN))
+   return -EINVAL;
+   if ((tinfo_curr->vid - tinfo_last->vid) !=
+   (tinfo_curr->tunid - tinfo_last->tunid))
+   return -EINVAL;
+   /* XXX: tun id and vlan id attrs must be same
+*/
+   t = tinfo_last->tunid;
+   for (v = tinfo_last->vid; v <= tinfo_curr->vid; v++) {
+   err = br_add_vlan_tunnel_info(br, p, cmd,
+ v, t);
+   if (err)
+   return err;
+   t++;
+   }
+   memset(tinfo_last, 0, sizeof(struct vtunnel_info));
+   memset(tinfo_curr, 0, sizeof(struct vtunnel_info));
+   } else {
+   err = br_add_vlan_tunnel_info(br, p, cmd,
+ tinfo_curr->vid,
+ tinfo_curr->tunid);
+   if (err)
+   return err;
+   }
+
+   return 0;
+}
+

Regards,
Rami Rosen
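
The range handling in the quoted br_process_vlan_tunnel_info() pairs
consecutive VLAN IDs with consecutive tunnel IDs and rejects ranges whose
lengths differ. A minimal userspace sketch of that walk, written without the
unused br parameter, could look like the following. struct vid_tun and
map_vlan_tunnel_range() are hypothetical names, and the vid_end < vid_begin
check and the output buffer are additions of the sketch, not part of the
patch.

```c
#include <errno.h>
#include <stdint.h>

struct vid_tun {
	uint16_t vid;
	uint32_t tun_id;
};

/*
 * Fill out[] with (vid, tun_id) pairs for an inclusive range.
 * Returns the number of pairs, -EINVAL if the two ranges do not have
 * the same length, or -ENOSPC if out[] is too small.
 */
static int map_vlan_tunnel_range(uint16_t vid_begin, uint16_t vid_end,
				 uint32_t tun_begin, uint32_t tun_end,
				 struct vid_tun *out, int max)
{
	uint32_t t = tun_begin;
	unsigned int v;
	int n = 0;

	if (vid_end < vid_begin ||
	    (uint32_t)(vid_end - vid_begin) != tun_end - tun_begin)
		return -EINVAL;

	for (v = vid_begin; v <= vid_end; v++) {
		if (n >= max)
			return -ENOSPC;
		out[n].vid = (uint16_t)v;
		out[n].tun_id = t++;
		n++;
	}
	return n;
}
```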

Re: [PATCH v4 3/3] samples/bpf: add lpm-trie benchmark

2017-01-22 Thread Alexei Starovoitov

On Sat, Jan 21, 2017 at 05:26:13PM +0100, Daniel Mack wrote:
> From: David Herrmann 
> 
> Extend the map_perf_test_{user,kern}.c infrastructure to stress test
> lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
> the latency depending on trie size and lookup count.
> 
> On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
> bpf program takes roughly 6.5us on my system. Lookups in empty tries
> take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192
> entries take ~7.1us (on the first _and_ any subsequent try).
> 
> Signed-off-by: David Herrmann 
> Reviewed-by: Daniel Mack 

Acked-by: Alexei Starovoitov 

Thank you for all the hard work you've put into these patches.
All looks great to me.

Re: [patch] samples/bpf: silence shift wrapping warning

2017-01-22 Thread Alexei Starovoitov

On Sat, Jan 21, 2017 at 07:51:43AM +0300, Dan Carpenter wrote:
> max_key is a value in the 0-63 range, so on 32 bit systems the shift
> could wrap.
> 
> Signed-off-by: Dan Carpenter 

Looks fine. I think 'net-next' is ok.

Acked-by: Alexei Starovoitov 

> diff --git a/samples/bpf/lwt_len_hist_user.c b/samples/bpf/lwt_len_hist_user.c
> index ec8f3bb..bd06eef 100644
> --- a/samples/bpf/lwt_len_hist_user.c
> +++ b/samples/bpf/lwt_len_hist_user.c
> @@ -68,7 +68,7 @@ int main(int argc, char **argv)
>   for (i = 1; i <= max_key + 1; i++) {
>   stars(starstr, data[i - 1], max_value, MAX_STARS);
>   printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
> -(1l << i) >> 1, (1l << i) - 1, data[i - 1],
> +(1ULL << i) >> 1, (1ULL << i) - 1, data[i - 1],
>  MAX_STARS, starstr);
>   }
>
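
As a small illustration of the warning being silenced: where long is 32 bits
wide, 1l << i is undefined for i >= 32, whereas a 64-bit operand keeps the
bucket bounds well defined for shift counts below 64. The helper names below
are hypothetical and the sketch only covers 1 <= i <= 63.

```c
#include <stdint.h>

/* Lower bound of histogram bucket i, valid for 1 <= i <= 63. */
static uint64_t bucket_low(unsigned int i)
{
	return (1ULL << i) >> 1;
}

/* Upper bound of histogram bucket i, valid for 1 <= i <= 63. */
static uint64_t bucket_high(unsigned int i)
{
	return (1ULL << i) - 1;
}
```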

Re: [PATCH v4 2/3] bpf: Add tests for the lpm trie map

2017-01-22 Thread Alexei Starovoitov

On Sat, Jan 21, 2017 at 05:26:12PM +0100, Daniel Mack wrote:
> From: David Herrmann 
> 
> The first part of this program runs randomized tests against the
> lpm-bpf-map. It implements a "Trivial Longest Prefix Match" (tlpm)
> based on simple, linear, single linked lists. The implementation
> should be pretty straightforward.
> 
> Based on tlpm, this inserts randomized data into bpf-lpm-maps and
> verifies the trie-based bpf-map implementation behaves the same way
> as tlpm.
> 
> The second part uses 'real world' IPv4 and IPv6 addresses and tests
> the trie with those.
> 
> Signed-off-by: David Herrmann 
> Signed-off-by: Daniel Mack 

Acked-by: Alexei Starovoitov
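
To make the "Trivial Longest Prefix Match" idea from the quoted description
concrete, here is a rough sketch of a linear reference implementation of the
kind a trie can be checked against. It is not the code from the patch: it uses
a plain array instead of a singly linked list, handles only 32-bit (IPv4
sized) keys, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct tlpm_entry {
	uint32_t prefix;	/* address bits, host byte order */
	unsigned int prefixlen;	/* 0..32 */
};

/* Does addr fall inside the prefix described by e? */
static int prefix_matches(const struct tlpm_entry *e, uint32_t addr)
{
	uint32_t mask;

	if (e->prefixlen == 0)
		return 1;
	if (e->prefixlen > 32)
		return 0;
	mask = ~(uint32_t)0 << (32 - e->prefixlen);
	return (addr & mask) == (e->prefix & mask);
}

/* Linear scan: return the matching entry with the longest prefix. */
static const struct tlpm_entry *tlpm_lookup(const struct tlpm_entry *tab,
					    size_t n, uint32_t addr)
{
	const struct tlpm_entry *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!prefix_matches(&tab[i], addr))
			continue;
		if (!best || tab[i].prefixlen > best->prefixlen)
			best = &tab[i];
	}
	return best;
}
```

A randomized test can then insert the same entries into both structures and
compare the lookup results for random addresses.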

Re: [PATCH v4 1/3] bpf: add a longest prefix match trie map implementation

2017-01-22 Thread Alexei Starovoitov

On Sat, Jan 21, 2017 at 05:26:11PM +0100, Daniel Mack wrote:
> This trie implements a longest prefix match algorithm that can be used
> to match IP addresses to a stored set of ranges.
> 
> Internally, data is stored in an unbalanced trie of nodes that has a
> maximum height of n, where n is the prefixlen the trie was created
> with.
> 
> Tries may be created with prefix lengths that are multiples of 8, in
> the range from 8 to 2048. The key used for lookup and update operations
> is a struct bpf_lpm_trie_key, and the value is a uint64_t.
> 
> The code carries more information about the internal implementation.
> 
> Signed-off-by: Daniel Mack 
> Reviewed-by: David Herrmann 

Looks great to me.
Acked-by: Alexei Starovoitov

[PATCH net-next v5 2/2] net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable

2017-01-22 Thread Martin Blumenstingl

Prior to this patch we were using a hardcoded RGMII TX clock delay of
2ns (= 1/4 cycle of the 125MHz RGMII TX clock). This value works for
many boards, but unfortunately not for all (due to the way the actual
circuit is designed, sometimes because the TX delay is enabled in the
PHY, etc.). Making the TX delay on the MAC side configurable allows us
to support all possible hardware combinations.

This allows fixing a compatibility issue on some boards, where the
RTL8211F PHY is configured to generate the TX delay. We can now turn
off the TX delay in the MAC, because otherwise we would be applying the
delay twice (which results in non-working TX traffic).

Signed-off-by: Martin Blumenstingl 
Tested-by: Neil Armstrong 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index ffaed1f35efe..8840a360a0b7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -35,10 +35,6 @@
 
 #define PRG_ETH0_TXDLY_SHIFT   5
 #define PRG_ETH0_TXDLY_MASK    GENMASK(6, 5)
-#define PRG_ETH0_TXDLY_OFF (0x0 << PRG_ETH0_TXDLY_SHIFT)
-#define PRG_ETH0_TXDLY_QUARTER (0x1 << PRG_ETH0_TXDLY_SHIFT)
-#define PRG_ETH0_TXDLY_HALF    (0x2 << PRG_ETH0_TXDLY_SHIFT)
-#define PRG_ETH0_TXDLY_THREE_QUARTERS  (0x3 << PRG_ETH0_TXDLY_SHIFT)
 
 /* divider for the result of m250_sel */
 #define PRG_ETH0_CLK_M250_DIV_SHIFT    7
@@ -69,6 +65,8 @@ struct meson8b_dwmac {
 
struct clk_divider  m25_div;
struct clk  *m25_div_clk;
+
+   u32 tx_delay_ns;
 };
 
 static void meson8b_dwmac_mask_bits(struct meson8b_dwmac *dwmac, u32 reg,
@@ -179,6 +177,7 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
 {
int ret;
unsigned long clk_rate;
+   u8 tx_dly_val;
 
switch (dwmac->phy_mode) {
case PHY_INTERFACE_MODE_RGMII:
@@ -196,9 +195,13 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
PRG_ETH0_INVERTED_RMII_CLK, 0);
 
-   /* TX clock delay - all known boards use a 1/4 cycle delay */
+   /* TX clock delay in ns = "8ns / 4 * tx_dly_val" (where
+* 8ns are exactly one cycle of the 125MHz RGMII TX clock):
+* 0ns = 0x0, 2ns = 0x1, 4ns = 0x2, 6ns = 0x3
+*/
+   tx_dly_val = dwmac->tx_delay_ns >> 1;
meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_TXDLY_MASK,
-   PRG_ETH0_TXDLY_QUARTER);
+   tx_dly_val << PRG_ETH0_TXDLY_SHIFT);
break;
 
case PHY_INTERFACE_MODE_RMII:
@@ -284,6 +287,11 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
goto err_remove_config_dt;
}
 
+   /* use 2ns as fallback since this value was previously hardcoded */
+   if (of_property_read_u32(pdev->dev.of_node, "amlogic,tx-delay-ns",
+&dwmac->tx_delay_ns))
+   dwmac->tx_delay_ns = 2;
+
ret = meson8b_init_clk(dwmac);
if (ret)
goto err_remove_config_dt;
-- 
2.11.0
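
The conversion from nanoseconds to the register field described in the
comment above (0ns = 0x0, 2ns = 0x1, 4ns = 0x2, 6ns = 0x3, placed at bits 6:5)
can be checked with a small standalone sketch. The TXDLY_* macros mirror the
PRG_ETH0_TXDLY_SHIFT and PRG_ETH0_TXDLY_MASK definitions from the driver;
tx_delay_ns_to_field() is a hypothetical helper, and its rejection of values
other than 0, 2, 4 and 6 is an addition of the sketch (the patch as posted
does not validate the property).

```c
#include <stdint.h>

#define TXDLY_SHIFT	5
#define TXDLY_MASK	(0x3u << TXDLY_SHIFT)	/* same bits as GENMASK(6, 5) */

/*
 * Convert a TX delay in ns to the masked register field.
 * Returns 0 on success, -1 for values the hardware cannot express.
 */
static int tx_delay_ns_to_field(uint32_t ns, uint32_t *field)
{
	if (ns > 6 || (ns & 1))
		return -1;

	*field = ((ns >> 1) << TXDLY_SHIFT) & TXDLY_MASK;
	return 0;
}
```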

[PATCH net-next v5 1/2] net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac

2017-01-22 Thread Martin Blumenstingl

This allows configuring the RGMII TX clock delay. The RGMII clock is
generated by underlying hardware of the Meson 8b / GXBB DWMAC glue.
The configuration depends on the actual hardware (no delay may be
needed due to the design of the actual circuit, the PHY might add this
delay, etc.).

Signed-off-by: Martin Blumenstingl 
Tested-by: Neil Armstrong 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/net/meson-dwmac.txt | 16 
 1 file changed, 16 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/meson-dwmac.txt b/Documentation/devicetree/bindings/net/meson-dwmac.txt
index 89e62ddc69ca..0703ad3f3c1e 100644
--- a/Documentation/devicetree/bindings/net/meson-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/meson-dwmac.txt
@@ -25,6 +25,22 @@ Required properties on Meson8b and newer:
- "clkin0" - first parent clock of the internal mux
- "clkin1" - second parent clock of the internal mux
 
+Optional properties on Meson8b and newer:
+- amlogic,tx-delay-ns: The internal RGMII TX clock delay (provided
+   by this driver) in nanoseconds. Allowed values
+   are: 0ns, 2ns, 4ns, 6ns.
+   When phy-mode is set to "rgmii" then the TX
+   delay should be explicitly configured. When
+   not configured a fallback of 2ns is used.
+   When the phy-mode is set to either "rgmii-id"
+   or "rgmii-txid" the TX clock delay is already
+   provided by the PHY. In that case this
+   property should be set to 0ns (which disables
+   the TX clock delay in the MAC to prevent the
+   clock from going off because both PHY and MAC
+   are adding a delay).
+   Any configuration is ignored when the phy-mode
+   is set to "rmii".
 
 Example for Meson6:
 
-- 
2.11.0

[PATCH net-next v5 0/2] stmmac: dwmac-meson8b: configurable RGMII TX delay

2017-01-22 Thread Martin Blumenstingl

Currently the dwmac-meson8b stmmac glue driver uses a hardcoded 1/4
cycle (= 2ns) TX clock delay. This seems to work fine for many boards
(for example Odroid-C2 or Amlogic's reference boards) but there are
some others where TX traffic is simply broken.
There are probably multiple reasons why it's working on some boards
while it's broken on others:
- some of Amlogic's reference boards are using a Micrel PHY
- hardware circuit design
- maybe more...

iperf3 results on my Mecool BB2 board (Meson GXM, RTL8211F PHY) with
TX clock delay disabled on the MAC (as it's enabled in the PHY driver).
TX throughput was virtually zero before:
$ iperf3 -c 192.168.1.100 -R
Connecting to host 192.168.1.100, port 5201
Reverse mode, remote host 192.168.1.100 is sending
[  4] local 192.168.1.206 port 52828 connected to 192.168.1.100 port 5201
[ ID] Interval   Transfer Bandwidth
[  4]   0.00-1.00   sec   108 MBytes   901 Mbits/sec
[  4]   1.00-2.00   sec  94.2 MBytes   791 Mbits/sec
[  4]   2.00-3.00   sec  96.5 MBytes   810 Mbits/sec
[  4]   3.00-4.00   sec  96.2 MBytes   808 Mbits/sec
[  4]   4.00-5.00   sec  96.6 MBytes   810 Mbits/sec
[  4]   5.00-6.00   sec  96.5 MBytes   810 Mbits/sec
[  4]   6.00-7.00   sec  96.6 MBytes   810 Mbits/sec
[  4]   7.00-8.00   sec  96.5 MBytes   809 Mbits/sec
[  4]   8.00-9.00   sec   105 MBytes   884 Mbits/sec
[  4]   9.00-10.00  sec   111 MBytes   934 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-10.00  sec  1000 MBytes   839 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   998 MBytes   837 Mbits/sec  receiver

iperf Done.
$ iperf3 -c 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  4] local 192.168.1.206 port 52832 connected to 192.168.1.100 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-1.01   sec  99.5 MBytes   829 Mbits/sec  117    139 KBytes
[  4]   1.01-2.00   sec   105 MBytes   884 Mbits/sec  129   70.7 KBytes
[  4]   2.00-3.01   sec   107 MBytes   889 Mbits/sec  106    187 KBytes
[  4]   3.01-4.01   sec   105 MBytes   878 Mbits/sec   92    143 KBytes
[  4]   4.01-5.00   sec   105 MBytes   882 Mbits/sec  140    129 KBytes
[  4]   5.00-6.01   sec   106 MBytes   883 Mbits/sec  115    195 KBytes
[  4]   6.01-7.00   sec   102 MBytes   863 Mbits/sec  133   70.7 KBytes
[  4]   7.00-8.01   sec   106 MBytes   884 Mbits/sec  143   97.6 KBytes
[  4]   8.01-9.01   sec   104 MBytes   875 Mbits/sec  124    107 KBytes
[  4]   9.01-10.01  sec   105 MBytes   876 Mbits/sec   90    139 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-10.01  sec  1.02 GBytes   874 Mbits/sec  1189 sender
[  4]   0.00-10.01  sec  1.02 GBytes   873 Mbits/sec  receiver

iperf Done.

I get similar TX throughput on my Meson GXBB "MXQ Pro+" board when I
disable the PHY's TX-delay and configure a 4ns TX-delay on the MAC.
So changes to at least the RTL8211F PHY driver are needed to get it
working properly in all situations.

Changes since v4:
- add a fallback of 2ns (the value which was previously hardcoded) for
  the TX delay so we are backwards-compatible with older .dts'
- update the documentation with the new fallback value and add a small
  note that the "amlogic,tx-delay-ns" property is ignored when the
  phy-mode is "rmii".

Changes since v3:
- rebased to apply against current net-next branch (fixes a conflict
  with d2ed0a7755fe14c7 "net: ethernet: stmmac: fix of-node and
  fixed-link-phydev leaks")

Changes since v2:
- moved all .dts patches (3-7) to a separate series
- removed the default 2ns TX delay when phy-mode RGMII is specified
- (rebased against current net-next)

Changes since v1:
- renamed the devicetree property "amlogic,tx-delay" to
  "amlogic,tx-delay-ns", which makes the .dts easier to read as we can
  simply specify human-readable values instead of having "preprocessor
  defines and calculation in human brain". Thanks to Andrew Lunn for
  the suggestion!
- improved documentation to indicate when the MAC TX-delay should be
  configured and how to use the PHY's TX-delay
- changed the default TX-delay in the dwmac-meson8b driver from 2ns
  to 0ns when any of the rgmii-*id modes are used (the 2ns default
  value still applies for phy-mode "rgmii")
- added patches to properly reset the PHY on Meson GXBB devices and to
  use a configuration similar to the one we use on Meson GXL devices
  (by passing a phy-handle to stmmac and defining the PHY in the mdio0
  bus - patch 3-6)
- add the "amlogic,tx-delay-ns" property to all boards which are using
  the RGMII PHY (patch 7)


Martin Blumenstingl (2):
  net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac
  net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable

 .../devicetree/bindings/net/meson-dwmac.txt  | 16 
 drivers/net/ethernet/stm

Re: [PATCH net-next 0/7] net: dsa: bcm_sf2: Add support for BCM7278

2017-01-22 Thread David Miller

From: Florian Fainelli 
Date: Fri, 20 Jan 2017 12:36:27 -0800

> This patch series adds support for the Broadcom BCM7278 integrated switch
> which is a successor of the BCM7445 switch. We have a little bit of
> register shuffling going on, which is why most of the functional changes
> are to deal with that.

Applied, thanks.

Re: [PATCH net-next 0/2] net: systemport: Add support for SYSTEMPORT lite

2017-01-22 Thread David Miller

From: Florian Fainelli 
Date: Fri, 20 Jan 2017 11:08:25 -0800

> This patch series adds support for SYSTEMPORT Lite which is an evolution
> of the existing SYSTEMPORT adapter.
> 
> The two generations are largely identical as far as the transmit/receive
> path are concerned, and there were just a few control path changes here
> and there.

Series applied, thanks.

Re: [RESEND PATCH net] macsec: fix validation failed in asynchronous operation.

2017-01-22 Thread David Miller


Why are you resending this?

The original posting on Jan 20th made it to the mailing list and is queued
up in patchwork just fine.

Also, regardless of the reason, a "RESEND" patch should always contain an
explanation of why it needs to be resent.  So that the maintainer doesn't
need to ask questions like I am right now.

Re: [PATCH net v1 0/2] amd-xgbe: AMD XGBE driver fixes 2017-01-20

2017-01-22 Thread David Miller

From: Tom Lendacky 
Date: Fri, 20 Jan 2017 12:13:52 -0600

> This patch series addresses some issues in the AMD XGBE driver.
> 
> The following fixes are included in this driver update series:
> 
> - Add a fix for a version of the hardware that uses different register
>   offset values for a device with the same PCI device ID
> - Add support to check the return code from the xgbe_init() function
> 
> This patch series is based on net.

Series applied.

Re: [PATCH net-next] ipv6: add NUMA awareness to seg6_hmac_init_algo()

2017-01-22 Thread David Miller

From: Eric Dumazet 
Date: Fri, 20 Jan 2017 08:08:56 -0800

> From: Eric Dumazet 
> 
> Since we allocate per cpu storage, let's also use NUMA hints.
> 
> Signed-off-by: Eric Dumazet 

Applied.

Re: [PATCH] net: stmicro: fix LS field mask in EEE configuration

2017-01-22 Thread David Miller

From: Joao Pinto 
Date: Fri, 20 Jan 2017 16:00:26 +

> This patch fixes the LS mask when setting EEE timer.
> LS field is 10 bits long and not 11 as currently.
> 
> Signed-off-by: Joao Pinto 
> Reported-By: Rayagond Kokatanur 

Please indicate the appropriate target tree of your patch in the
subject line just like all other developers on this list do, don't
make me guess.

This time I figured out that this is meant for the net-next tree,
but I will not guess next time, I will just reject your patch
instead.

Thanks.

Re: [PATCH] net/mlx4: use rb_entry()

2017-01-22 Thread David Miller

From: Leon Romanovsky 
Date: Sun, 22 Jan 2017 09:48:39 +0200

> I don't understand completely the rationale behind this conversion.
> rb_entry == container_of, why do we need another name for it?

Because it's an annotation.

Either you agree that the macro exists and it should be used in
every spot where those types are being used, or you don't and
therefore argue for removing the macro and its usage completely.
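
For context on the "annotation" point: in the kernel's rbtree header,
rb_entry() is defined as container_of() with the same arguments, so the two
compile to the same code and differ only in what they tell the reader. A
userspace sketch of that relationship follows. The sketch_* names and
struct item are hypothetical, and the container_of here is a simplified
version without the kernel's type checking.

```c
#include <stddef.h>

/* Minimal stand-in for struct rb_node. */
struct sketch_rb_node {
	struct sketch_rb_node *left, *right;
};

/* Simplified container_of(): step back from a member to its container. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same expansion, but the name says "this is an rbtree node". */
#define sketch_rb_entry(ptr, type, member) \
	sketch_container_of(ptr, type, member)

struct item {
	int key;
	struct sketch_rb_node node;
};

static struct item *item_from_node(struct sketch_rb_node *n)
{
	return sketch_rb_entry(n, struct item, node);
}
```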

Re: [PATCH] 6lowpan: use rb_entry()

2017-01-22 Thread David Miller

From: Geliang Tang 
Date: Fri, 20 Jan 2017 22:36:53 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
> 
> Signed-off-by: Geliang Tang 

Applied.

Re: [PATCH] net/mlx4: use rb_entry()

2017-01-22 Thread David Miller

From: Geliang Tang 
Date: Fri, 20 Jan 2017 22:36:57 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
> 
> Signed-off-by: Geliang Tang 

Applied.

[PATCH net-next] net: dsa: Fix inverted test for multiple CPU interface

2017-01-22 Thread Andrew Lunn

Remove the wrong !, otherwise we get false positives about having
multiple CPU interfaces.

Fixes: b22de490869d ("net: dsa: store CPU switch structure in the tree")
Signed-off-by: Andrew Lunn 
---
 net/dsa/dsa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 77cb78767f1d..1f3afeb673d6 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -225,7 +225,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
continue;
 
if (!strcmp(name, "cpu")) {
-   if (!dst->cpu_switch) {
+   if (dst->cpu_switch) {
netdev_err(dst->master_netdev,
   "multiple cpu ports?!\n");
return -EINVAL;
-- 
2.11.0

[PATCH net-next v6 1/1] net sched actions: Add support for user cookies

2017-01-22 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it.  The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

--- 127.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2109ms
rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1

... show some stats
$ sudo $TC -s actions get action gact index 1

action order 1: gact action pass
 random type none pass val 0
 index 1 ref 2 bind 1 installed 204 sec used 5 sec
Action statistics:
Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. try longer cookie...
$ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef
.. dump..
$ sudo $TC -s actions ls action gact

action order 1: gact action pass
 random type none pass val 0
 index 1 ref 2 bind 1 installed 204 sec used 5 sec
Action statistics:
Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie 1234567890abcdef
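
To illustrate the variable-length use shown above, here is a userspace sketch
that turns a hex string such as a1b2c3d4 into a length-value cookie of at most
16 bytes (the TC_COOKIE_MAX_SIZE from this patch). How tc itself parses the
cookie argument is not shown in this posting, so the parsing rules are an
assumption. struct sketch_cookie and cookie_from_hex() are hypothetical names,
and the sketch uses an inline array where the kernel structure holds a pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COOKIE_MAX_SIZE 16	/* mirrors TC_COOKIE_MAX_SIZE */

struct sketch_cookie {
	uint8_t data[SKETCH_COOKIE_MAX_SIZE];
	uint32_t len;
};

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse an even-length hex string of 1..16 bytes; 0 on success, -1 on error. */
static int cookie_from_hex(struct sketch_cookie *ck, const char *hex)
{
	size_t n = strlen(hex);
	size_t i;

	if (n == 0 || n % 2 || n / 2 > SKETCH_COOKIE_MAX_SIZE)
		return -1;

	for (i = 0; i < n / 2; i++) {
		int hi = hex_val(hex[2 * i]);
		int lo = hex_val(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		ck->data[i] = (uint8_t)((hi << 4) | lo);
	}
	ck->len = (uint32_t)(n / 2);
	return 0;
}
```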

Signed-off-by: Jamal Hadi Salim 
---
Changes in v6:
 - fix mem leak caught by Florian

Changes in V5:
 - kill the stylistic changes
 - Adopt a new structure with length-valuepointer representation
 - rename some things

Changes in v4:
 - move stylistic changes out into a separate patch
   (and add more stylistic changes)

Changes in v3:
 - use TC_ prefix for the max size
 - move the cookie struct so visible only to kernel
 - remove unneeded void * cast

Changes in V2:
 -move from a union to a length-value representation

 include/net/act_api.h|  1 +
 include/net/pkt_cls.h|  8 
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/act_api.c  | 36 
 4 files changed, 48 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 1d71644..cfa2ae3 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -41,6 +41,7 @@ struct tc_action {
struct rcu_head tcfa_rcu;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
struct gnet_stats_queue __percpu *cpu_qstats;
+   struct tc_cookie *act_cookie;
 };
 #define tcf_head   common.tcfa_head
 #define tcf_index  common.tcfa_index
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f0a0514..b43077e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -515,4 +515,12 @@ struct tc_cls_bpf_offload {
u32 gen_flags;
 };
 
+
+/* This structure holds cookie structure that is passed from user
+ * to the kernel for actions and classifiers
+ */
+struct tc_cookie {
+   u8  *data;
+   u32 len;
+};
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index fd373eb..345551e 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -4,6 +4,8 @@
 #include 
 #include 
 
+#define TC_COOKIE_MAX_SIZE 16
+
 /* Action attributes */
 enum {
TCA_ACT_UNSPEC,
@@ -12,6 +14,7 @@ enum {
TCA_ACT_INDEX,
TCA_ACT_STATS,
TCA_ACT_PAD,
+   TCA_ACT_COOKIE,
__TCA_ACT_MAX
 };
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index cd08df9..58cf1c5 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -33,6 +34,8 @@ static void free_tcf(struct rcu_head *head)
 
free_percpu(p->cpu_bstats);
free_percpu(p->cpu_qstats);
+   kfree(p->act_cookie->data);
+   kfree(p->act_cookie);
kfree(p);
 }
 
@@ -475,6 +478,12 @@ int tcf_action_destroy(struct list_head *actions, int bind)
goto nla_put_failure;
if (tcf_action_copy_stats(skb, a, 0))
goto nla_put_failure;
+   if (a->act_cookie) {
+   if (nla_put(skb, TCA_ACT_COOKIE, a->act_co

Re: [PATCH net-next v5 1/1] net sched actions: Add support for user cookies

2017-01-22 Thread Jamal Hadi Salim


On 17-01-22 02:32 PM, Jiri Pirko wrote:
> Sun, Jan 22, 2017 at 07:57:17PM CET, j...@mojatatu.com wrote:
>> On 17-01-22 01:13 PM, Florian Fainelli wrote:
>>>
>>>> +   a->act_cookie->data = nla_memdup(tb[TCA_ACT_COOKIE],
>>>> +GFP_KERNEL);
>>>> +   if (!a->act_cookie->data) {
>>>> +   err = -ENOMEM;
>>>> +   tcf_hash_release(a, bind);
>>>> +   goto err_mod;
>>>> +   }
>>>
>>> Are not you leaking a->act_cookie here in case nla_memdup() fails here?
>>>
>>
>> yes, I am. Thanks for catching this. V6 coming up.
>
> Btw, you don't have to send cover letter for a single patch. In fact, you
> should not.

You can see i write small novels in my commit logs. Do you suggest i
put the git history there as well?

cheers,
jamal

Re: [PATCH net-next v5 1/1] net sched actions: Add support for user cookies

2017-01-22 Thread Jiri Pirko

Sun, Jan 22, 2017 at 07:57:17PM CET, j...@mojatatu.com wrote:
>On 17-01-22 01:13 PM, Florian Fainelli wrote:
>> 
>> 
>
>> 
>> > +  a->act_cookie->data = nla_memdup(tb[TCA_ACT_COOKIE],
>> > +   GFP_KERNEL);
>> > +  if (!a->act_cookie->data) {
>> > +  err = -ENOMEM;
>> > +  tcf_hash_release(a, bind);
>> > +  goto err_mod;
>> > +  }
>> 
>> Are not you leaking a->act_cookie here in case nla_memdup() fails here?
>> 
>
>yes, I am. Thanks for catching this. V6 coming up.

Btw, you don't have to send cover letter for a single patch. In fact, you
should not.

[PATCH 3/3] sh_eth: stop using bare numbers for EESIPR values

2017-01-22 Thread Sergei Shtylyov

Now  that we  have almost all EESIPR bits declared (and those that  are
still not are most probably reserved anyway) we can at last replace the
bare  numbers used for 'sh_eth_cpu_data::eesipr_value' initializers with
the bit names ORed together...

Signed-off-by: Sergei Shtylyov 

---
 drivers/net/ethernet/renesas/sh_eth.c |   89 +-
 1 file changed, 78 insertions(+), 11 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -518,7 +518,14 @@ static struct sh_eth_cpu_data r7s72100_d
 
.ecsr_value = ECSR_ICD,
.ecsipr_value   = ECSIPR_ICDIP,
-   .eesipr_value   = 0xe77f009f,
+   .eesipr_value   = EESIPR_TWB1IP | EESIPR_TWBIP | EESIPR_TC1IP |
+ EESIPR_TABTIP | EESIPR_RABTIP | EESIPR_RFCOFIP |
+ EESIPR_ECIIP |
+ EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+ EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+ EESIPR_RMAFIP | EESIPR_RRFIP |
+ EESIPR_RTLFIP | EESIPR_RTSFIP |
+ EESIPR_PREIP | EESIPR_CERFIP,
 
.tx_check   = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -556,7 +563,14 @@ static struct sh_eth_cpu_data r8a7740_da
 
.ecsr_value = ECSR_ICD | ECSR_MPD,
.ecsipr_value   = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP,
-   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP |
+ EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+ EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+ 0xf000 | EESIPR_CNDIP | EESIPR_DLCIP |
+ EESIPR_CDIP | EESIPR_TROIP | EESIPR_RMAFIP |
+ EESIPR_CEEFIP | EESIPR_CELFIP |
+ EESIPR_RRFIP | EESIPR_RTLFIP | EESIPR_RTSFIP |
+ EESIPR_PREIP | EESIPR_CERFIP,
 
.tx_check   = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -603,7 +617,12 @@ static struct sh_eth_cpu_data r8a777x_da
 
.ecsr_value = ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD,
.ecsipr_value   = ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP,
-   .eesipr_value   = 0x01ff009f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ADEIP | EESIPR_ECIIP |
+ EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+ EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+ EESIPR_RMAFIP | EESIPR_RRFIP |
+ EESIPR_RTLFIP | EESIPR_RTSFIP |
+ EESIPR_PREIP | EESIPR_CERFIP,
 
.tx_check   = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -626,7 +645,12 @@ static struct sh_eth_cpu_data r8a779x_da
.ecsr_value = ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD | ECSR_MPD,
.ecsipr_value   = ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP |
  ECSIPR_MPDIP,
-   .eesipr_value   = 0x01ff009f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ADEIP | EESIPR_ECIIP |
+ EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+ EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+ EESIPR_RMAFIP | EESIPR_RRFIP |
+ EESIPR_RTLFIP | EESIPR_RTSFIP |
+ EESIPR_PREIP | EESIPR_CERFIP,
 
.tx_check   = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -667,7 +691,12 @@ static struct sh_eth_cpu_data sh7724_dat
 
.ecsr_value = ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD,
.ecsipr_value   = ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP,
-   .eesipr_value   = 0x01ff009f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ADEIP | EESIPR_ECIIP |
+ EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+ EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+ EESIPR_RMAFIP | EESIPR_RRFIP |
+ EESIPR_RTLFIP | EESIPR_RTSFIP |
+ EESIPR_PREIP | EESIPR_CERFIP,
 
.tx_check   = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -702,7 +731,14 @@ static struct sh_eth_cpu_data sh7757_dat
 
.register_type  = SH_ETH_REG_FAST_SH4,
 
-   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP |
+

[PATCH 2/3] sh_eth: add missing EESIPR bits

2017-01-22 Thread Sergei Shtylyov

Renesas SH77{34|63} manuals  describe more EESIPR bits than the current
driver. Declare the new bits with the end goal of using the bit names
instead of the bare numbers  for  the 'sh_eth_cpu_data::eesipr_value'
initializers...

Signed-off-by: Sergei Shtylyov 

---
 drivers/net/ethernet/renesas/sh_eth.h |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
===
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net-next/drivers/net/ethernet/renesas/sh_eth.h
@@ -269,13 +269,17 @@ enum EESR_BIT {
 
 /* EESIPR */
 enum EESIPR_BIT {
-   EESIPR_TWBIP= 0x4000,
+   EESIPR_TWB1IP   = 0x8000,
+   EESIPR_TWBIP= 0x4000,   /* same as TWB0IP */
+   EESIPR_TC1IP= 0x2000,
+   EESIPR_TUCIP= 0x1000,
+   EESIPR_ROCIP= 0x0800,
EESIPR_TABTIP   = 0x0400,
EESIPR_RABTIP   = 0x0200,
EESIPR_RFCOFIP  = 0x0100,
EESIPR_ADEIP= 0x0080,
EESIPR_ECIIP= 0x0040,
-   EESIPR_FTCIP= 0x0020,
+   EESIPR_FTCIP= 0x0020,   /* same as TC0IP */
EESIPR_TDEIP= 0x0010,
EESIPR_TFUFIP   = 0x0008,
EESIPR_FRIP = 0x0004,
@@ -286,6 +290,8 @@ enum EESIPR_BIT {
EESIPR_CDIP = 0x0200,
EESIPR_TROIP= 0x0100,
EESIPR_RMAFIP   = 0x0080,
+   EESIPR_CEEFIP   = 0x0040,
+   EESIPR_CELFIP   = 0x0020,
EESIPR_RRFIP= 0x0010,
EESIPR_RTLFIP   = 0x0008,
EESIPR_RTSFIP   = 0x0004,

[PATCH 1/3] sh_eth: rename EESIPR bits

2017-01-22 Thread Sergei Shtylyov

Since the  commit  b0ca2a21f769 ("sh_eth: Add support of SH7763 to sh_eth")
the *enum* declaring the EESIPR bits (interrupt mask) went out of sync with
the *enum* declaring the EESR bits (interrupt status) WRT  bit naming  and
formatting. I'd like to restore the consistency by using EESIPR as the bit
name prefix, renaming the *enum* to EESIPR_BIT, and (finally) renaming the
bits according to the available  Renesas SH77{34|63} manuals...

Signed-off-by: Sergei Shtylyov 

---
 drivers/net/ethernet/renesas/sh_eth.c |   22 ++--
 drivers/net/ethernet/renesas/sh_eth.h |   36 +-
 2 files changed, 34 insertions(+), 24 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -556,7 +556,7 @@ static struct sh_eth_cpu_data r8a7740_da
 
.ecsr_value = ECSR_ICD | ECSR_MPD,
.ecsipr_value   = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP,
-   .eesipr_value   = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f,
 
.tx_check   = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -702,7 +702,7 @@ static struct sh_eth_cpu_data sh7757_dat
 
.register_type  = SH_ETH_REG_FAST_SH4,
 
-   .eesipr_value   = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f,
 
.tx_check   = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
.eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -769,7 +769,7 @@ static struct sh_eth_cpu_data sh7757_dat
 
.ecsr_value = ECSR_ICD | ECSR_MPD,
.ecsipr_value   = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP,
-   .eesipr_value   = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f,
 
.tx_check   = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -800,7 +800,7 @@ static struct sh_eth_cpu_data sh7734_dat
 
.ecsr_value = ECSR_ICD | ECSR_MPD,
.ecsipr_value   = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP,
-   .eesipr_value   = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f07ff,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f07ff,
 
.tx_check   = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -830,7 +830,7 @@ static struct sh_eth_cpu_data sh7763_dat
 
.ecsr_value = ECSR_ICD | ECSR_MPD,
.ecsipr_value   = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP,
-   .eesipr_value   = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f07ff,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f07ff,
 
.tx_check   = EESR_TC1 | EESR_FTC,
.eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -851,7 +851,7 @@ static struct sh_eth_cpu_data sh7763_dat
 static struct sh_eth_cpu_data sh7619_data = {
.register_type  = SH_ETH_REG_FAST_SH3_SH2,
 
-   .eesipr_value   = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f,
 
.apr= 1,
.mpr= 1,
@@ -862,7 +862,7 @@ static struct sh_eth_cpu_data sh7619_dat
 static struct sh_eth_cpu_data sh771x_data = {
.register_type  = SH_ETH_REG_FAST_SH3_SH2,
 
-   .eesipr_value   = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f,
+   .eesipr_value   = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f,
.tsu= 1,
 };
 
@@ -1547,10 +1547,10 @@ static void sh_eth_emac_interrupt(struct
sh_eth_rcv_snd_disable(ndev);
} else {
/* Link Up */
-   sh_eth_modify(ndev, EESIPR, DMAC_M_ECI, 0);
+   sh_eth_modify(ndev, EESIPR, EESIPR_ECIIP, 0);
/* clear int */
sh_eth_modify(ndev, ECSR, 0, 0);
-   sh_eth_modify(ndev, EESIPR, DMAC_M_ECI, DMAC_M_ECI);
+   sh_eth_modify(ndev, EESIPR, EESIPR_ECIIP, EESIPR_ECIIP);
/* enable tx and rx */
sh_eth_rcv_snd_enable(ndev);
}
@@ -1652,7 +1652,7 @@ static irqreturn_t sh_eth_interrupt(int
 * bit...
 */
intr_enable = sh_eth_read(ndev, EESIPR);
-   intr_status &= intr_enable | DMAC_M_ECI;
+   intr_status &= intr_enable | EESIPR_ECIIP;
if (intr_status & (EESR_RX_CHECK | cd->tx_check | EESR_ECI |
   cd->eesr_err_check))
ret = IRQ_HANDLED;
@@ -3199,7 +3199,7 @@ static int sh_eth_wol_setup(struct net_d
/* Only allow ECI interrupts */
synchronize_irq(ndev->irq);

[PATCH 0/3] sh_eth: E-DMAC interrupt mask cleanups

2017-01-22 Thread Sergei Shtylyov

Hello.

   Here's a set of 3 patches against DaveM's 'net-next.git' repo. The main goal
of this set is to stop using the bare numbers for the E-DMAC interrupt masks.

[1/3] sh_eth: rename EESIPR bits
[2/3] sh_eth: add missing EESIPR bits
[3/3] sh_eth: stop using bare numbers for EESIPR values

MBR, Sergei
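
As an illustration of the goal stated above, the sketch below replaces one bare mask with named bits ORed together and lets the compiler check that the value did not change. It is only a sketch: the bit values are assumptions (the constants in the archived diffs appear truncated), chosen as distinct single bits whose OR reproduces the literal 0x01ff009f seen in patch 3/3. `EESIPR_LEGACY_LITERAL` and `eesipr_named()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMED bit values, for illustration only. The names follow the
 * series; the positions are not taken from the driver header.
 */
enum eesipr_bit {
	EESIPR_RFCOFIP	= 0x01000000,
	EESIPR_ADEIP	= 0x00800000,
	EESIPR_ECIIP	= 0x00400000,
	EESIPR_FTCIP	= 0x00200000,
	EESIPR_TDEIP	= 0x00100000,
	EESIPR_TFUFIP	= 0x00080000,
	EESIPR_FRIP	= 0x00040000,
	EESIPR_RDEIP	= 0x00020000,
	EESIPR_RFOFIP	= 0x00010000,
	EESIPR_RMAFIP	= 0x00000080,
	EESIPR_RRFIP	= 0x00000010,
	EESIPR_RTLFIP	= 0x00000008,
	EESIPR_RTSFIP	= 0x00000004,
	EESIPR_PREIP	= 0x00000002,
	EESIPR_CERFIP	= 0x00000001,
};

/* The bare number being replaced. */
#define EESIPR_LEGACY_LITERAL	0x01ff009f

/* The same mask spelled out with bit names. */
#define EESIPR_NAMED_MASK \
	(EESIPR_RFCOFIP | EESIPR_ADEIP | EESIPR_ECIIP | \
	 EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP | \
	 EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP | \
	 EESIPR_RMAFIP | EESIPR_RRFIP | \
	 EESIPR_RTLFIP | EESIPR_RTSFIP | \
	 EESIPR_PREIP | EESIPR_CERFIP)

/* Build fails if the named form ever drifts from the old literal. */
_Static_assert(EESIPR_NAMED_MASK == EESIPR_LEGACY_LITERAL,
	       "named EESIPR mask must equal the legacy literal");

/* Runtime accessor for the named mask. */
static uint32_t eesipr_named(void)
{
	return (uint32_t)EESIPR_NAMED_MASK;
}
```

A compile-time check of this kind is one way to confirm that such a conversion is purely cosmetic.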

Re: [PATCH] net/mlx4: use rb_entry()

2017-01-22 Thread Leon Romanovsky

On Sun, Jan 22, 2017 at 10:42:25PM +0800, Geliang Tang wrote:
> On Sun, Jan 22, 2017 at 09:48:39AM +0200, Leon Romanovsky wrote:
> > On Fri, Jan 20, 2017 at 10:36:57PM +0800, Geliang Tang wrote:
> > > To make the code clearer, use rb_entry() instead of container_of() to
> > > deal with rbtree.
> > >
> > > Signed-off-by: Geliang Tang 
> > > ---
> > >  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 8 
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > I don't understand completely the rationale behind this conversion.
> > rb_entry == container_of, why do we need another name for it?
> >
>
> There are several *_entry macros which are defined in kernel data
> structures, like list_entry, hlist_entry, rb_entry, etc. Each of them is
> just another name for container_of. We use different *_entry so that we
> could identify the specific type of data structure that we are dealing
> with.

Your proposed patch doesn't support the importance of such knowledge for
rb_entry. The list_entry case is totally different, because you perform
operation on it.

Anyway, It doesn't matter.
Reviewed-by: Leon Romanovsky 
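
For readers following the rb_entry() versus container_of() point, here is a minimal userspace sketch of how the *_entry helpers alias container_of(). The container_of() below is a simplified version without the kernel's type checking, the node structs are reduced stand-ins, and `struct res` with its helper functions is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Each *_entry helper is just another name for container_of. */
#define rb_entry(ptr, type, member)	container_of(ptr, type, member)
#define list_entry(ptr, type, member)	container_of(ptr, type, member)

/* Reduced stand-ins for the kernel node types. */
struct rb_node {
	struct rb_node *left, *right;
};

struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical tracked resource that embeds both kinds of node. */
struct res {
	int id;
	struct rb_node node;
	struct list_head list;
};

/* Recover the resource from its rbtree node. */
static struct res *res_from_rb(struct rb_node *n)
{
	return rb_entry(n, struct res, node);
}

/* Recover the resource from its list node. */
static struct res *res_from_list(struct list_head *l)
{
	return list_entry(l, struct res, list);
}
```

Both helpers expand to the same pointer arithmetic; the different names only tell the reader which data structure the node belongs to.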



Re: [PATCH net-next v5 1/1] net sched actions: Add support for user cookies

2017-01-22 Thread Jamal Hadi Salim


On 17-01-22 01:13 PM, Florian Fainelli wrote:
>
>> +   a->act_cookie->data = nla_memdup(tb[TCA_ACT_COOKIE],
>> +GFP_KERNEL);
>> +   if (!a->act_cookie->data) {
>> +   err = -ENOMEM;
>> +   tcf_hash_release(a, bind);
>> +   goto err_mod;
>> +   }
>
> Are not you leaking a->act_cookie here in case nla_memdup() fails here?
>

yes, I am. Thanks for catching this. V6 coming up.

cheers,
jamal

Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve

2017-01-22 Thread Tom Herbert

On Sat, Jan 21, 2017 at 11:12 AM, kernel netdev  wrote:
>
>
> Den 21. jan. 2017 7.10 PM skrev "Tom Herbert" :
>
> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
>  wrote:
>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet 
>> wrote:
>>> From: Eric Dumazet 
>>>
>>> A driver using dev_alloc_page() must not reuse a page allocated from
>>> emergency memory reserve.
>>>
>>> Otherwise all packets using this page will be immediately dropped,
>>> unless for very specific sockets having SOCK_MEMALLOC bit set.
>>>
>>> This issue might be hard to debug, because only a fraction of received
>>> packets would be dropped.
>>
>> Hi Eric,
>>
>> When you say reuse, you mean point to the same page from several SKBs ?
>>
>> Because in our page cache implementation we don't reuse pages that
>> already passed to the stack,
>> we just keep them in the page cache until the ref count drop back to
>> one, so we recycle them (i,e they will be re-used only when no one
>> else is using them).
>>
> Saeed,
>
> Speaking of the mlx page cache can we remove this or a least make it
> optional to use. It is another example of complex functionality being
> put into drivers that makes things like backports more complicated and
> provide at best some marginal value. In the case of the mlx5e cache
> code the results from pktgen really weren't very impressive in the
> first place. Also, the cache suffers from HOL blocking where we can
> block the whole cache due to an outstanding reference on just one page
> (something that you wouldn't see in pktgen but is likely to happen in
> real applications).
>
>
> (Send from phone in car)
>
> To Tom, have you measured the effect of this page cache? Before claiming it
> is ineffective.

No, I have not. TBH, I have spent most of the past few weeks trying to debug
a backport of the code from 4.9 to 4.6. Until we have a working
backport performance is immaterial for our purposes. Unfortunately, we
are seeing some issues: the checksum faults I described previously and
crashes on bad page refcnts which are presumably being caused by the
logic in RX buffer processing. This is why I am now having to dissect
the code and trying to disable things like the page cache that are not
essential functionality.

In any case the HOL blocking issue is obvious from reading the code,
and it implies bimodal behavior. We don't need a test to know that is
bad, as we've seen the bad effects of that in many other contexts.
>
> My previous measurements show approx 20% speedup on a UDP test with delivery
> to remote CPU.
>
> Removing the cache would of cause be a good usecase for speeding up the page
> allocator (PCP). Which Mel Gorman and me are working on. AFAIK current page
> order0 cost 240 cycles. Mel have reduced til to 180, and without NUMA 150
> cycles. And with bulking this can be amortized to 80 cycles.
>
That would be great. If only I had a nickel for every time someone
started working on a driver and came to the conclusion that they need to
do a custom memory allocator because the kernel allocator is so
inefficient!

Tom

> --Jesper
>
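
The two behaviours argued about in this thread (not recycling pages from the emergency reserve, and head-of-line blocking when only the oldest cached page is checked) can be shown with a toy FIFO. This is a sketch under stated assumptions, not the mlx5e implementation: `struct toy_page`, `struct toy_cache`, `cache_put()`, `cache_get()` and `CACHE_SIZE` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 4	/* hypothetical capacity; must be a power of two */

/* Toy page: just the two properties the discussion is about. */
struct toy_page {
	int refcount;		/* 1 means nobody but the driver holds it */
	bool pfmemalloc;	/* page came from the emergency reserve */
};

/* Toy FIFO cache; head and tail are free-running counters. */
struct toy_cache {
	struct toy_page *slot[CACHE_SIZE];
	unsigned int head;	/* next page to hand out */
	unsigned int tail;	/* next free slot */
};

/* Try to park a page for later reuse. Returns false if it is not cached. */
static bool cache_put(struct toy_cache *c, struct toy_page *p)
{
	assert(c && p);

	if (p->pfmemalloc)
		return false;	/* never recycle emergency-reserve pages */
	if (c->tail - c->head == CACHE_SIZE)
		return false;	/* cache is full */

	c->slot[c->tail % CACHE_SIZE] = p;
	c->tail++;
	return true;
}

/* Take the oldest page, but only if nobody else still references it. */
static struct toy_page *cache_get(struct toy_cache *c)
{
	struct toy_page *p;

	assert(c);

	if (c->head == c->tail)
		return NULL;	/* cache is empty */

	p = c->slot[c->head % CACHE_SIZE];
	if (p->refcount != 1)
		return NULL;	/* head still in use: pages behind it are stuck */

	c->head++;
	return p;
}
```

With one busy page at the head, cache_get() fails even when every page behind it is idle, which is the head-of-line blocking described above.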

Re: [PATCH net-next v5 1/1] net sched actions: Add support for user cookies

2017-01-22 Thread Florian Fainelli



On 01/22/2017 04:51 AM, Jamal Hadi Salim wrote:
> From: Jamal Hadi Salim 
> 
> Introduce optional 128-bit action cookie.
> Like all other cookie schemes in the networking world (eg in protocols
> like http or existing kernel fib protocol field, etc) the idea is to save
> user state that when retrieved serves as a correlator. The kernel
> _should not_ intepret it.  The user can store whatever they wish in the
> 128 bits.
> 
> Sample exercise(showing variable length use of cookie)
> 
> .. create an accept action with cookie a1b2c3d4
> sudo $TC actions add action ok index 1 cookie a1b2c3d4
> 
> .. dump all gact actions..
> sudo $TC -s actions ls action gact
> 
> action order 0: gact action pass
>  random type none pass val 0
>  index 1 ref 1 bind 0 installed 5 sec used 5 sec
> Action statistics:
> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> cookie a1b2c3d4
> 
> .. bind the accept action to a filter..
> sudo $TC filter add dev lo parent : protocol ip prio 1 \
> u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
> 
> ... send some traffic..
> $ ping 127.0.0.1 -c 3
> PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
> 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
> 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
> 
> --- 127.0.0.1 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2109ms
> rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1
> 
> ... show some stats
> $ sudo $TC -s actions get action gact index 1
> 
> action order 1: gact action pass
>  random type none pass val 0
>  index 1 ref 2 bind 1 installed 204 sec used 5 sec
> Action statistics:
> Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> cookie a1b2c3d4
> 
> .. try longer cookie...
> $ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef
> .. dump..
> $ sudo $TC -s actions ls action gact
> 
> action order 1: gact action pass
>  random type none pass val 0
>  index 1 ref 2 bind 1 installed 204 sec used 5 sec
> Action statistics:
> Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> cookie 1234567890abcdef
> 
> Signed-off-by: Jamal Hadi Salim 

> + a->act_cookie->data = nla_memdup(tb[TCA_ACT_COOKIE],
> +  GFP_KERNEL);
> + if (!a->act_cookie->data) {
> + err = -ENOMEM;
> + tcf_hash_release(a, bind);
> + goto err_mod;
> + }

Are not you leaking a->act_cookie here in case nla_memdup() fails here?
-- 
Florian

Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve

2017-01-22 Thread Tom Herbert

On Sat, Jan 21, 2017 at 12:31 PM, Saeed Mahameed
 wrote:
> On Sat, Jan 21, 2017 at 9:12 PM, kernel netdev  wrote:
>>
>>
>> Den 21. jan. 2017 7.10 PM skrev "Tom Herbert" :
>>
>> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed
>>  wrote:
>>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet 
>>> wrote:
>>>> From: Eric Dumazet 
>>>>
>>>> A driver using dev_alloc_page() must not reuse a page allocated from
>>>> emergency memory reserve.
>>>>
>>>> Otherwise all packets using this page will be immediately dropped,
>>>> unless for very specific sockets having SOCK_MEMALLOC bit set.
>>>>
>>>> This issue might be hard to debug, because only a fraction of received
>>>> packets would be dropped.
>>>
>>> Hi Eric,
>>>
>>> When you say reuse, you mean point to the same page from several SKBs ?
>>>
>>> Because in our page cache implementation we don't reuse pages that
>>> already passed to the stack,
>>> we just keep them in the page cache until the ref count drop back to
>>> one, so we recycle them (i,e they will be re-used only when no one
>>> else is using them).
>>>
>> Saeed,
>>
>> Speaking of the mlx page cache can we remove this or a least make it
>> optional to use. It is another example of complex functionality being
>> put into drivers that makes things like backports more complicated and
>
> Re complexity, I am not sure the mlx page cache is that complex,
> we just wrap alloc_page/put_page with our own page cache calls.
> Roughly the page cache implementation is 200-300 LOC tops all concentrated
> in one place in the code.
>
Taken as part of the RX buffer management code the whole thing is very
complicated and seems to be completely bereft of any comments in the
code as to how things are supposed to work.

>> provide at best some marginal value. In the case of the mlx5e cache
>> code the results from pktgen really weren't very impressive in the
>> first place. Also, the cache suffers from HOL blocking where we can
>
> Well, with pktgen you won't notice a huge improvement since the pages are
> freed
> in the stack directly from our rx receive handler (gro_receive), those
> pages will go back to the page allocator and get requested immediately
> again from the driver (no stress).
>
> The real improvements are seen when you really stress the page allocator with
> real uses cases such as  TCP/UDP with user applications, where the
> pages are held longer than the driver needs, the page cache in this
> case will play an important role of reducing the stress
> on the page allocater since with those use cases we are juggling with
> more pages than pktgen could.
>
> With more stress (#cores TCP streams) our humble page cache some times
> can't hold ! and it will get full fast enough and will fail to recycle
> for a huge percentage of the driver pages requests.
>
> Before our own page cache we used dev_alloc_skb which used its own
> cache "page_frag_cache" and it worked nice enough. So i don't really
> recommend removing the page cache, until we have
> a generic RX page cache solution for all device drivers.
>
>> block the whole cache due to an outstanding reference on just one page
>> (something that you wouldn't see in pktgen but is likely to happen in
>> real applications).
>>
>
> Re the HOL issue, we have some upcoming patches that would drastically
> improve the HOL blocking issue (we will simple swap the HOL on every
> sample).
>
>>
>> (Send from phone in car)
>>
>
> Driver Safe :) ..
>
>> To Tom, have you measured the effect of this page cache? Before claiming it
>> is ineffective.
>>
>> My previous measurements show approx 20% speedup on a UDP test with delivery
>> to remote CPU.
>>
>> Removing the cache would of cause be a good usecase for speeding up the page
>> allocator (PCP). Which Mel Gorman and me are working on. AFAIK current page
>> order0 cost 240 cycles. Mel have reduced til to 180, and without NUMA 150
>> cycles. And with bulking this can be amortized to 80 cycles.
>>
>
> Are you trying to say that we won't need the cache if you manage to
> deliver those optimizations ?
> can you compare those optimizations with the page_frag_cache from
> dev_alloc_skb ?
>
>> --Jesper
>>

Re: [PATCH] Documentation: net: phy: improve explanation when to specify the PHY ID

2017-01-22 Thread Andrew Lunn

On Sun, Jan 22, 2017 at 05:41:32PM +0100, Martin Blumenstingl wrote:
> The old description basically read like "ethernet-phy-id." can
> be specified when you know the actual PHY ID. However, specifying this
> has a side-effect: it forces Linux to bind to a certain PHY driver (the
> one that matches the ID given in the compatible string), ignoring the ID
> which is reported by the actual PHY.
> Whenever a device is shipped with (multiple) different PHYs during it's
> production lifetime then explicitly specifying
> "ethernet-phy-id." could break certain revisions of that device.
> 
> Signed-off-by: Martin Blumenstingl 

Reviewed-by: Andrew Lunn 

Thanks
Andrew

Re: [PATCH] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread Andrew Lunn

On Sun, Jan 22, 2017 at 06:06:30PM +0800, Jingju Hou wrote:
> Signed-off-by: Jingju Hou 

Hi Jingju

Please include a real comment here. Something like:

The mvneta itself does not support WOL, but the PHY might. So pass the
calls to the PHY.

It also looks like you are patching an old kernel. Network patches
like this need to be against net-next. You should also include
net-next in the subject line.

Thanks
Andrew

[PATCH] Documentation: net: phy: improve explanation when to specify the PHY ID

2017-01-22 Thread Martin Blumenstingl

The old description basically read like "ethernet-phy-id." can
be specified when you know the actual PHY ID. However, specifying this
has a side-effect: it forces Linux to bind to a certain PHY driver (the
one that matches the ID given in the compatible string), ignoring the ID
which is reported by the actual PHY.
Whenever a device is shipped with (multiple) different PHYs during its
production lifetime then explicitly specifying
"ethernet-phy-id." could break certain revisions of that device.

Signed-off-by: Martin Blumenstingl 
---
Thanks to Andrew Lunn for pointing the documentation issue out to me in:
http://lists.infradead.org/pipermail/linux-amlogic/2017-January/002141.html


 Documentation/devicetree/bindings/net/phy.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
index ff1bc4b1bb3b..fb5056b22685 100644
--- a/Documentation/devicetree/bindings/net/phy.txt
+++ b/Documentation/devicetree/bindings/net/phy.txt
@@ -19,8 +19,9 @@ Optional Properties:
   specifications. If neither of these are specified, the default is to
   assume clause 22.
 
-  If the phy's identifier is known then the list may contain an entry
-  of the form: "ethernet-phy-id." where
+  If the PHY reports an incorrect ID (or none at all) then the
+  "compatible" list may contain an entry with the correct PHY ID in the
+  form: "ethernet-phy-id." where
   - The value of the 16 bit Phy Identifier 1 register as
 4 hex digits. This is the chip vendor OUI bits 3:18
   - The value of the 16 bit Phy Identifier 2 register as
-- 
2.11.0
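
To make the binding text above concrete, the sketch below builds such a compatible entry from the two 16-bit identifier registers. It assumes the entry is the "ethernet-phy-id" prefix followed by two groups of 4 hex digits separated by a dot, as the description reads. The function names are hypothetical and the lower-case hex formatting is a choice made here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Combine PHY Identifier 1 (upper half) and 2 (lower half) into one ID. */
static uint32_t phy_id_from_regs(uint16_t id1, uint16_t id2)
{
	return ((uint32_t)id1 << 16) | id2;
}

/*
 * Format a compatible entry for phy_id into buf.
 * Returns what snprintf() returns: the length the full string needs.
 */
static int phy_id_compatible(char *buf, size_t len, uint32_t phy_id)
{
	assert(buf && len > 0);

	return snprintf(buf, len, "ethernet-phy-id%04x.%04x",
			(unsigned int)(phy_id >> 16),
			(unsigned int)(phy_id & 0xffff));
}
```

For example, register values 0x0022 and 0x1550 combine to 0x00221550 and format as "ethernet-phy-id0022.1550". Per the commit message, an entry like this forces the driver match and is meant for PHYs that report a wrong ID or none at all.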

RE: [PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to extended statistics

2017-01-22 Thread Nogah Frankel


> -Original Message-
> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Thursday, January 19, 2017 9:11 PM
> To: Roopa Prabhu 
> Cc: Nogah Frankel ; netdev@vger.kernel.org;
> roszenr...@gmail.com; Jiri Pirko ; Ido Schimmel
> ; Elad Raz ; Yotam Gigi
> ; Or Gerlitz 
> Subject: Re: [PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to 
> extended statistics
> 
> On Thu, 19 Jan 2017 08:06:21 -0800
> Roopa Prabhu  wrote:
> 
> > On 1/19/17, 7:21 AM, Nogah Frankel wrote:
> > >> -Original Message-
> > >> From: Nogah Frankel
> > >> Sent: Sunday, January 15, 2017 3:55 PM
> > >> To: 'Stephen Hemminger' 
> > >> Cc: netdev@vger.kernel.org; roszenr...@gmail.com;
> ro...@cumulusnetworks.com; Jiri
> > >> Pirko ; Ido Schimmel ; Elad Raz
> > >> ; Yotam Gigi ; Or Gerlitz
> > >> 
> > >> Subject: RE: [PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to 
> > >> extended
> statistics
> > >>
> > >>
> > >>
> > >>> -Original Message-
> > >>> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> > >>> Sent: Friday, January 13, 2017 3:44 AM
> > >>> To: Nogah Frankel 
> > >>> Cc: netdev@vger.kernel.org; roszenr...@gmail.com;
> ro...@cumulusnetworks.com;
> > >> Jiri
> > >>> Pirko ; Ido Schimmel ; Elad Raz
> > >>> ; Yotam Gigi ; Or Gerlitz
> > >>> 
> > >>> Subject: Re: [PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to 
> > >>> extended
> statistics
> > >>>
> > >>> On Thu, 12 Jan 2017 15:49:50 +0200
> > >>> Nogah Frankel  wrote:
> > >>>
> > >>>> The default stats for ifstat are 32 bits based.
> > >>>> The kernel supports 64 bits based stats. (They are returned in struct
> > >>>> rtnl_link_stats64 which is an exact copy of struct rtnl_link_stats, in
> > >>>> which the "normal" stats are returned, but with fields of u64 instead
> > >>>> of u32). This patch adds them as an extended stats.
> > >>>>
> > >>>> It is read with filter type IFLA_STATS_LINK_64 and no sub type.
> > >>>>
> > >>>> It is under the name 64bits
> > >>>> (or any shorten of it as "64")
> > >>>>
> > >>>> For example:
> > >>>> ifstat -x 64bit
> > >>>>
> > >>>> Signed-off-by: Nogah Frankel 
> > >>>> Reviewed-by: Jiri Pirko 
> > >>> available
> > >>> from the device. I see no reason that ifstat needs to be different.
> > >>>
> > >> Do you mean to change the default ifstat results to be 64 bits based?
> > >> I tried it in the first version, but Roopa commented that it was not a 
> > >> good idea.
> > >> She said they tried it in the past and it caused backward 
> > >> compatibilities problems.
> > >> (Or maybe I didn't understand correctly)
> > > So, can I leave the default ifstat results to be 32 bits based, for the 
> > > time being?
> > >
> > From past discussions: Moving the default to 64bit has compat issues with
> > the old history file.
> > There is a way to make it work by using a new file header (to indicate that
> > it is 64 bit) in a freshly created history file and also check this header
> > before dumping stats into the history file.
> > ie maintain backward compat without introducing a new option. It is doable.
> >
> > One approach is, you can drop the 64bit option from this series and
> > try updating the default to 64 bit (with compat handling code) in a later
> > series.

I think I will take your suggestion to drop the 64 bits from this series.
Hopefully, I'll return to it in some later series in the future.
Thanks

> 
> The ifstat code could do conversion based on file size.
> 
>   if (history_file_is_32bit()) {
>   printf("converting to 64 bit format\n");
> ...
>   }
> 
>
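
A rough sketch of the header-based variant of this idea follows: a freshly written history file starts with a marker line, and any file without it is treated as the old 32-bit format and converted. Everything here is an assumption made for illustration. The marker string is invented and is not something ifstat writes, and the functions take the first line of the file as a string rather than reading the file themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical marker for the first line of a 64-bit history file. */
#define HIST_64BIT_MARKER "#fmt=64bit"

/* True if the first line of the history file carries the 64-bit marker. */
static bool history_header_is_64bit(const char *first_line)
{
	assert(first_line);

	return strncmp(first_line, HIST_64BIT_MARKER,
		       strlen(HIST_64BIT_MARKER)) == 0;
}

/* Files without the marker are assumed to be in the old 32-bit format. */
static bool history_file_is_32bit(const char *first_line)
{
	return !history_header_is_64bit(first_line);
}
```

Checking a header rather than the file size keeps the test independent of how many interfaces the history file happens to contain.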

Re: [RFC PATCH net-next 4/5] bridge: vlan lwt and dst_metadata netlink support

2017-01-22 Thread Roopa Prabhu

On 1/22/17, 4:05 AM, Nikolay Aleksandrov wrote:
> On 21/01/17 06:46, Roopa Prabhu wrote:
>> From: Roopa Prabhu 
>>
>> This patch adds support to attach per vlan tunnel info dst
>> metadata. This enables bridge driver to map vlan to tunnel_info
>> at ingress and egress
>>
>> The initial use case is vlan to vni bridging, but the api is generic
>> to extend to any tunnel_info in the future:
>> - Uapi to configure/unconfigure/dump per vlan tunnel data
>> - netlink functions to configure vlan and tunnel_info mapping
>> - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
>> dst_metadata to bridged packets on ports.
>>
>> Use case:
>> example use for this is a vxlan bridging gateway or vtep
>> which maps vlans to vn-segments (or vnis). User can configure
>> per-vlan tunnel information which the bridge driver can use
>> to bridge vlan into the corresponding tunnel.
>>
>> CC: Nikolay Aleksandrov 
>> Signed-off-by: Roopa Prabhu 
>> ---
>> CC'ing Nikolay for some more eyes as he has been trying to keep the
>> bridge driver fast path lite.
>>
>>  include/linux/if_bridge.h |1 +
>>  net/bridge/br_input.c |1 +
>>  net/bridge/br_netlink.c   |  410 ++---
>>  net/bridge/br_private.h   |   18 ++
>>  net/bridge/br_vlan.c  |  138 ++-
>>  5 files changed, 507 insertions(+), 61 deletions(-)
>>
>> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
>> index c6587c0..36ff611 100644
>> --- a/include/linux/if_bridge.h
>> +++ b/include/linux/if_bridge.h
>> @@ -46,6 +46,7 @@ struct br_ip_list {
>>  #define BR_LEARNING_SYNCBIT(9)
>>  #define BR_PROXYARP_WIFIBIT(10)
>>  #define BR_MCAST_FLOOD  BIT(11)
>> +#define BR_LWT_VLAN BIT(12)
>>  
>>  #define BR_DEFAULT_AGEING_TIME  (300 * HZ)
>>  
>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>> index 855b72f..83f356f 100644
>> --- a/net/bridge/br_input.c
>> +++ b/net/bridge/br_input.c
>> @@ -20,6 +20,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include "br_private.h"
>>  
>>  /* Hook for brouter */
>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>> index 71c7453..df997ad 100644
>> --- a/net/bridge/br_netlink.c
>> +++ b/net/bridge/br_netlink.c
>> @@ -17,17 +17,30 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include "br_private.h"
>>  #include "br_private_stp.h"
>>  
>> -static int __get_num_vlan_infos(struct net_bridge_vlan_group *vg,
>> -u32 filter_mask)
>> +static size_t br_get_vlan_tinfo_size(void)
>>  {
>> +return nla_total_size(0) + /* nest IFLA_BRIDGE_VLAN_TUNNEL_INFO */
>> +  nla_total_size(sizeof(u32)) + /* IFLA_BRIDGE_VLAN_TUNNEL_ID */
>> +  nla_total_size(sizeof(u16)) + /* IFLA_BRIDGE_VLAN_TUNNEL_VID */
>> +  nla_total_size(sizeof(u16)); /* IFLA_BRIDGE_VLAN_TUNNEL_FLAGS */
>> +}
>> +
>> +static int __get_num_vlan_infos(struct net_bridge_port *p,
>> +struct net_bridge_vlan_group *vg,
>> +u32 filter_mask, int *num_vtinfos)
>> +{
>> +struct net_bridge_vlan *vbegin = NULL, *vend = NULL;
>> +struct net_bridge_vlan *vtbegin = NULL, *vtend = NULL;
>>  struct net_bridge_vlan *v;
>> -u16 vid_range_start = 0, vid_range_end = 0, vid_range_flags = 0;
>> +bool get_tinfos = (p && p->flags & BR_LWT_VLAN) ? true: false;
>> +bool vcontinue, vtcontinue;
>> +int num_vinfos = 0;
>>  u16 flags, pvid;
>> -int num_vlans = 0;
>>  
>>  if (!(filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED))
>>  return 0;
>> @@ -36,6 +49,8 @@ static int __get_num_vlan_infos(struct 
>> net_bridge_vlan_group *vg,
>>  /* Count number of vlan infos */
>>  list_for_each_entry_rcu(v, &vg->vlan_list, vlist) {
>>  flags = 0;
>> +vcontinue = false;
>> +vtcontinue = false;
>>  /* only a context, bridge vlan not activated */
>>  if (!br_vlan_should_use(v))
>>  continue;
>> @@ -45,47 +60,79 @@ static int __get_num_vlan_infos(struct 
>> net_bridge_vlan_group *vg,
>>  if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
>>  flags |= BRIDGE_VLAN_INFO_UNTAGGED;
>>  
>> -if (vid_range_start == 0) {
>> -goto initvars;
>> -} else if ((v->vid - vid_range_end) == 1 &&
>> -flags == vid_range_flags) {
>> -vid_range_end = v->vid;
>> +if (!vbegin) {
>> +vbegin = v;
>> +vend = v;
>> +vcontinue = true;
>> +} else if ((v->vid - vend->vid) == 1 &&
>> +flags == vbegin->flags) {
>> +vend = v;
>> +vcontinue = true;
>> +}
>> +
>> +if (!vcontinue) {
>> +if ((vend->vid - vbegin->vid) > 0

Re: [RFC PATCH net-next 5/5] bridge: vlan lwt dst_metadata hooks in ingress and egress paths

2017-01-22 Thread Roopa Prabhu

On 1/22/17, 4:15 AM, Nikolay Aleksandrov wrote:
> On 21/01/17 06:46, Roopa Prabhu wrote:
>> From: Roopa Prabhu 
>>
>> - ingress hook:
>> - if port is a lwt tunnel port, use tunnel info in
>>   attached dst_metadata to map it to a local vlan
>> - egress hook:
>> - if port is a lwt tunnel port, use tunnel info attached to
>>   vlan to set dst_metadata on the skb
>>
>> CC: Nikolay Aleksandrov 
>> Signed-off-by: Roopa Prabhu 
>> ---
>> CC'ing Nikolay for some more eyes as he has been trying to keep the
>> bridge driver fast path lite.
>>
>>  net/bridge/br_input.c   |4 
>>  net/bridge/br_private.h |4 
>>  net/bridge/br_vlan.c|   55 
>> +++
>>  3 files changed, 63 insertions(+)
>>
>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>> index 83f356f..96602a1 100644
>> --- a/net/bridge/br_input.c
>> +++ b/net/bridge/br_input.c
>> @@ -262,6 +262,10 @@ rx_handler_result_t br_handle_frame(struct sk_buff 
>> **pskb)
>>  return RX_HANDLER_CONSUMED;
>>  
>>  p = br_port_get_rcu(skb->dev);
>> +if (p->flags & BR_LWT_VLAN) {
>> +if (br_handle_ingress_vlan_tunnel(skb, p, 
>> nbp_vlan_group_rcu(p)))
>> +goto drop;
>> +}
> Is there any reason to do this so early (perhaps netfilter?)? If not, you
> can push it to the vlan __allowed_ingress (and rename that function to
> something else, it does a hundred additional things) and avoid this check
> for all packets if vlans are disabled, so people using a non-vlan-filtering
> bridge won't have an additional test in their fast path.
>
>
Yes, I forgot to mention it in the commit log. I had it close to
__allowed_ingress in my first version, but had to move it up here because
br_nf_pre_routing/br_nf_pre_routing_finish reset the dst, so by the time
__allowed_ingress runs it is already too late.

Re: [RFC PATCH net-next 2/5] vxlan: make COLLECT_METADATA mode bridge friendly

2017-01-22 Thread Roopa Prabhu

On 1/22/17, 3:40 AM, Nikolay Aleksandrov wrote:
> On 21/01/17 06:46, Roopa Prabhu wrote:
>> From: Roopa Prabhu 
>>
>> This patch series makes vxlan COLLECT_METADATA mode bridge
>> and layer2 network friendly. Vxlan COLLECT_METADATA mode today
>> solves the per-vni netdev scalability problem in l3 networks.
>> When vxlan collect metadata device participates in bridging
>> vlan to vn-segments, It can only get the vlan mapped vni in
>> the xmit tunnel dst metadata. It will need the vxlan driver to
>> continue learn, hold forwarding state and remote destination
>> information similar to how it already does for non COLLECT_METADATA
>> vxlan netdevices today.
>>
>> Changes introduced by this patch:
>> - allow learning and forwarding database state to vxlan netdev in
>>   COLLECT_METADATA mode. Current behaviour is not changed
>>   by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
>>   to support the new bridge friendly mode.
>> - A single fdb table hashed by (mac, vni) to allow fdb entries with
>>   multiple vnis in the same fdb table
>> - rx path already has the vni
>> - tx path expects a vni in the packet with dst_metadata
>> - prior to this series, fdb remote_dsts carried remote vni and
>>   the vxlan device carrying the fdb table represented the
>>   source vni. With the vxlan device now representing multiple vnis,
>>   this patch adds a src vni attribute to the fdb entry. The remote
>>   vni already uses NDA_VNI attribute. This patch introduces
>>   NDA_SRC_VNI netlink attribute to represent the src vni in a multi
>>   vni fdb table.
>>
>> Signed-off-by: Roopa Prabhu 
>> ---
> [snip]
>> @@ -2173,23 +2221,29 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
>> struct net_device *dev)
>>  bool did_rsc = false;
>>  struct vxlan_rdst *rdst, *fdst = NULL;
>>  struct vxlan_fdb *f;
>> +__be32 vni = 0;
>>  
>>  info = skb_tunnel_info(skb);
>>  
>>  skb_reset_mac_header(skb);
>>  
>>  if (vxlan->flags & VXLAN_F_COLLECT_METADATA) {
>> -if (info && info->mode & IP_TUNNEL_INFO_TX)
>> -vxlan_xmit_one(skb, dev, NULL, false);
>> -else
>> -kfree_skb(skb);
>> -return NETDEV_TX_OK;
>> +if (info && info->mode & IP_TUNNEL_INFO_BRIDGE &&
>> +info->mode & IP_TUNNEL_INFO_TX) {
> nit: parentheses around the IP_TUNNEL_INFO_TX check
>
>> +vni = tunnel_id_to_key32(info->key.tun_id);
>> +} else {
>> +if (info && info->mode & IP_TUNNEL_INFO_TX)
> nit: parentheses around the IP_TUNNEL_INFO_TX check

ack
>> +vxlan_xmit_one(skb, dev, vni, NULL, false);
>> +else
>> +kfree_skb(skb);
>> +return NETDEV_TX_OK;
>> +}
>>  }
>>  
>>  if (vxlan->flags & VXLAN_F_PROXY) {
>>  eth = eth_hdr(skb);
>>  if (ntohs(eth->h_proto) == ETH_P_ARP)
>> -return arp_reduce(dev, skb);
>> +return arp_reduce(dev, skb, vni);
>>  #if IS_ENABLED(CONFIG_IPV6)
>>  else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
>>   pskb_may_pull(skb, sizeof(struct ipv6hdr)
>> @@ -2200,13 +2254,13 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
>> struct net_device *dev)
>>  msg = (struct nd_msg 
>> *)skb_transport_header(skb);
>>  if (msg->icmph.icmp6_code == 0 &&
>>  msg->icmph.icmp6_type == 
>> NDISC_NEIGHBOUR_SOLICITATION)
>> -return neigh_reduce(dev, skb);
>> +return neigh_reduce(dev, skb, vni);
>>  }
>>  #endif
>>  }
>>  
>>  eth = eth_hdr(skb);
>> -f = vxlan_find_mac(vxlan, eth->h_dest);
>> +f = vxlan_find_mac(vxlan, eth->h_dest, vni);
>>  did_rsc = false;
>>  
>>  if (f && (f->flags & NTF_ROUTER) && (vxlan->flags & VXLAN_F_RSC) &&
>> @@ -2214,11 +2268,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
>> struct net_device *dev)
>>   ntohs(eth->h_proto) == ETH_P_IPV6)) {
>>  did_rsc = route_shortcircuit(dev, skb);
>>  if (did_rsc)
>> -f = vxlan_find_mac(vxlan, eth->h_dest);
>> +f = vxlan_find_mac(vxlan, eth->h_dest, vni);
>>  }
>>  
>>  if (f == NULL) {
>> -f = vxlan_find_mac(vxlan, all_zeros_mac);
>> +f = vxlan_find_mac(vxlan, all_zeros_mac, vni);
>>  if (f == NULL) {
>>  if ((vxlan->flags & VXLAN_F_L2MISS) &&
>>  !is_multicast_ether_addr(eth->h_dest))
>> @@ -2239,11 +2293,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
>> struct net_device *dev)
>>  }
>>  skb1 = skb_clone(skb, GFP_ATOMIC);
>>  if (skb1)
>> -

Re: [PATCH] net/mlx4: use rb_entry()

2017-01-22 Thread Geliang Tang

On Sun, Jan 22, 2017 at 09:48:39AM +0200, Leon Romanovsky wrote:
> On Fri, Jan 20, 2017 at 10:36:57PM +0800, Geliang Tang wrote:
> > To make the code clearer, use rb_entry() instead of container_of() to
> > deal with rbtree.
> >
> > Signed-off-by: Geliang Tang 
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> I don't understand completely the rationale behind this conversion.
> rb_entry == container_of, why do we need another name for it?
> 

There are several *_entry macros which are defined in kernel data
structures, like list_entry, hlist_entry, rb_entry, etc. Each of them is
just another name for container_of. We use different *_entry so that we
could identify the specific type of data structure that we are dealing
with.
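
As a rough userspace illustration of the point above (a sketch only: the
container_of() here is a simplified stand-in for the kernel macro, and the
demo_* types and helper are hypothetical, not taken from mlx4):

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Per-structure aliases: same expansion, but the name tells the reader
 * which kind of data structure is being walked. */
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define rb_entry(ptr, type, member)   container_of(ptr, type, member)

/* Hypothetical stand-ins for struct rb_node and struct list_head. */
struct demo_rb_node { struct demo_rb_node *left, *right; };
struct demo_list_head { struct demo_list_head *next, *prev; };

/* Hypothetical resource embedded in both an rbtree and a list. */
struct demo_res {
	int id;
	struct demo_rb_node node;
	struct demo_list_head link;
};

/* Recover the enclosing resource from its embedded rbtree node. */
static struct demo_res *demo_res_from_node(struct demo_rb_node *n)
{
	return rb_entry(n, struct demo_res, node);
}
```

Both aliases compute the same pointer; only the name at the call site
differs.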

-Geliang

Re: [patch net-next 2/4] net/sched: Introduce sample tc action

2017-01-22 Thread Roman Mashak

Jiri Pirko  writes:

> From: Yotam Gigi 
>
> This action allows the user to sample traffic matched by tc classifier.
> The sampling consists of choosing packets randomly and sampling them using
> the psample module. The user can configure the psample group number, the
> sampling rate and the packet's truncation (to save kernel-user traffic).
>

[skip]
> diff --git a/include/uapi/linux/tc_act/tc_sample.h 
> b/include/uapi/linux/tc_act/tc_sample.h
> new file mode 100644
> index 000..21378bc
> --- /dev/null
> +++ b/include/uapi/linux/tc_act/tc_sample.h
> @@ -0,0 +1,26 @@
> +#ifndef __LINUX_TC_SAMPLE_H
> +#define __LINUX_TC_SAMPLE_H
> +
> +#include 
> +#include 
> +#include 
> +
> +#define TCA_ACT_SAMPLE 26
> +
> +struct tc_sample {
> + tc_gen;
> +};
> +
> +enum {
> + TCA_SAMPLE_UNSPEC,
> + TCA_SAMPLE_PARMS,
> + TCA_SAMPLE_TM,
> + TCA_SAMPLE_RATE,
> + TCA_SAMPLE_TRUNC_SIZE,
> + TCA_SAMPLE_PSAMPLE_GROUP,
> + TCA_SAMPLE_PAD,
> + __TCA_SAMPLE_MAX
> +};

Most action implementations define the TCA_X_TM attribute as 1 and
TCA_X_PARMS as 2, followed by action-specific TLVs; it is better to
adhere to this style in newly designed actions.
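
For illustration, the ordering suggested above might look like the sketch
below. The attribute names are the ones from the quoted patch; the
reordering itself is only this suggestion and is not part of the patch as
posted.

```c
/* Sketch of the suggested attribute order, not the posted patch. */
enum {
	TCA_SAMPLE_UNSPEC,
	TCA_SAMPLE_TM,			/* 1: timestamps, as in other actions */
	TCA_SAMPLE_PARMS,		/* 2: generic action parameters */
	TCA_SAMPLE_RATE,		/* action-specific TLVs follow */
	TCA_SAMPLE_TRUNC_SIZE,
	TCA_SAMPLE_PSAMPLE_GROUP,
	TCA_SAMPLE_PAD,
	__TCA_SAMPLE_MAX
};

/* Compile-time check that the convention holds. */
_Static_assert(TCA_SAMPLE_TM == 1, "TM is expected to be attribute 1");
_Static_assert(TCA_SAMPLE_PARMS == 2, "PARMS is expected to be attribute 2");
```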

[skip]

-- 
Roman Mashak

[PATCH iproute2 1/1] tc: distinguish Add/Replace action operations.

2017-01-22 Thread Roman Mashak

Signed-off-by: Roman Mashak 
Signed-off-by: Jamal Hadi Salim 
---
 tc/m_action.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index bb19df8..05ef07e 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -365,12 +365,18 @@ int print_action(const struct sockaddr_nl *who,
fprintf(fp, "Flushed table ");
tab_flush = 1;
} else {
-   fprintf(fp, "deleted action ");
+   fprintf(fp, "Deleted action ");
}
}
 
-   if (n->nlmsg_type == RTM_NEWACTION)
-   fprintf(fp, "Added action ");
+   if (n->nlmsg_type == RTM_NEWACTION) {
+   if ((n->nlmsg_flags & NLM_F_CREATE) &&
+   !(n->nlmsg_flags & NLM_F_REPLACE)) {
+   fprintf(fp, "Added action ");
+   } else if (n->nlmsg_flags & NLM_F_REPLACE) {
+   fprintf(fp, "Replaced action ");
+   }
+   }
tc_print_action(fp, tb[TCA_ACT_TAB]);
 
return 0;
-- 
1.9.1
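
The flag test in the patch above can be sketched in isolation as follows.
This is an illustration only: newaction_verb() is a hypothetical helper
that does not exist in iproute2, and the DEMO_* constants are local copies
of the NLM_F_REPLACE and NLM_F_CREATE values from <linux/netlink.h> so
that the sketch builds without kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the netlink flag values (see <linux/netlink.h>). */
#define DEMO_NLM_F_REPLACE 0x100
#define DEMO_NLM_F_CREATE  0x400

/* Hypothetical helper: the prefix printed for an RTM_NEWACTION message
 * with the given nlmsg_flags, or NULL when neither branch applies. */
static const char *newaction_verb(uint16_t nlmsg_flags)
{
	if ((nlmsg_flags & DEMO_NLM_F_CREATE) &&
	    !(nlmsg_flags & DEMO_NLM_F_REPLACE))
		return "Added action ";
	if (nlmsg_flags & DEMO_NLM_F_REPLACE)
		return "Replaced action ";
	return NULL;	/* neither flag set: no prefix is printed */
}
```

Note that a message carrying both flags is reported as a replace, and a
message carrying neither prints no prefix at all, which matches the
branches in the diff.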

RE: [patch net-next 2/4] net/sched: Introduce sample tc action

2017-01-22 Thread Yotam Gigi

>-Original Message-
>From: Jamal Hadi Salim [mailto:j...@mojatatu.com]
>Sent: Sunday, January 22, 2017 3:17 PM
>To: Jiri Pirko ; netdev@vger.kernel.org
>Cc: da...@davemloft.net; Yotam Gigi ; Ido Schimmel
>; Elad Raz ; Nogah Frankel
>; Or Gerlitz ;
>geert+rene...@glider.be; step...@networkplumber.org;
>xiyou.wangc...@gmail.com; li...@roeck-us.net; ro...@cumulusnetworks.com;
>john.fastab...@gmail.com; simon.hor...@netronome.com; m...@mojatatu.com
>Subject: Re: [patch net-next 2/4] net/sched: Introduce sample tc action
>
>On 17-01-22 06:44 AM, Jiri Pirko wrote:
>
>> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
>> new file mode 100644
>> index 000..24e20e4
>> --- /dev/null
>> +++ b/net/sched/act_sample.c
>> @@ -0,0 +1,274 @@
>> +/*
>> + * net/sched/act_sample.c - Packet samplig tc action
>
>typo: "Sampling"

It took me a while to see it. Will fix :)

>
>
>> +static int tcf_sample(struct sk_buff *skb, const struct tc_action *a,
>> +  struct tcf_result *res)
>
>
>Can you rename this function because it is also the name of the data
>structure? It makes it easier to grep.
>I know we have this all over the place in other actions (and I hope
>those are cleaned up at some point).

OK, makes sense.

>
>otherwise:
>Acked-by: Jamal Hadi Salim 

Thanks! 

>
>cheers,
>jamal

Re: [net, 6/6] net: korina: version bump

2017-01-22 Thread Felix Fietkau

On 2017-01-22 13:10, Roman Yeryomin wrote:
> On 17 January 2017 at 21:19, Roman Yeryomin  wrote:
>> On 17 January 2017 at 20:55, Felix Fietkau  wrote:
>>> On 2017-01-17 18:33, Roman Yeryomin wrote:
>>>> Signed-off-by: Roman Yeryomin 
>>>> ---
>>>>  drivers/net/ethernet/korina.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
>>>> index 83c994f..c8fed01 100644
>>>> --- a/drivers/net/ethernet/korina.c
>>>> +++ b/drivers/net/ethernet/korina.c
>>>> @@ -66,8 +66,8 @@
>>>>  #include 
>>>>
>>>>  #define DRV_NAME "korina"
>>>> -#define DRV_VERSION  "0.10"
>>>> -#define DRV_RELDATE  "04Mar2008"
>>>> +#define DRV_VERSION  "0.20"
>>>> +#define DRV_RELDATE  "15Jan2017"
>>> I think it would make more sense to remove this version instead of
>>> bumping it. Individual driver versions are rather pointless, the kernel
>>> version is more meaningful anyway.
>>
>> OK, makes sense
> 
> Actually, after thinking a bit more about this, not really...
> How about ethtool, which uses driver name and version?
> I see most ethernet drivers define some version. And it's pretty
> useful, when using backports.
> IMO, it should be kept and bumped.
I don't really care, I just wanted to point out that the exact kernel
version is a much more useful indicator, especially since not all patch
submitters do the useless version bump dance.

- Felix

Re: [patch net-next 1/4] net: Introduce psample, a new genetlink channel for packet sampling

2017-01-22 Thread Jamal Hadi Salim


On 17-01-22 06:44 AM, Jiri Pirko wrote:

> From: Yotam Gigi 
>
> Add a general way for kernel modules to sample packets, without being tied
> to any specific subsystem. This netlink channel can be used by tc,
> iptables, etc. and allows packet sampling in the kernel to be standardized.
>
> For every sampled packet, the psample module adds the following metadata
> fields:
>
> PSAMPLE_ATTR_IIFINDEX - the packet's input ifindex, if applicable
>
> PSAMPLE_ATTR_OIFINDEX - the packet's output ifindex, if applicable
>
> PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been
>    truncated during sampling
>
> PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the
>    user who initiated the sampling. This field allows the user to
>    differentiate between several samplers working simultaneously and
>    filter packets relevant to him
>
> PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The
>    sequence is kept for each group
>
> PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets
>
> PSAMPLE_ATTR_DATA - the actual packet bits
>
> In addition, add the GET_GROUPS netlink command which allows the user to
> see the current sample groups, their refcount and sequence number. This
> command currently supports only netlink dump mode.
>
> Signed-off-by: Yotam Gigi 
> Signed-off-by: Jiri Pirko 


It would be useful to describe in the commit log that one needs to listen
to PSAMPLE_NL_MCGRP_SAMPLE to see the samples.

Reviewed-by: Jamal Hadi Salim 

cheers,
jamal

Re: [patch net-next 2/4] net/sched: Introduce sample tc action

2017-01-22 Thread Jamal Hadi Salim


On 17-01-22 06:44 AM, Jiri Pirko wrote:


> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
> new file mode 100644
> index 000..24e20e4
> --- /dev/null
> +++ b/net/sched/act_sample.c
> @@ -0,0 +1,274 @@
> +/*
> + * net/sched/act_sample.c - Packet samplig tc action


typo: "Sampling"



> +static int tcf_sample(struct sk_buff *skb, const struct tc_action *a,
> + struct tcf_result *res)



Can you rename this function because it is also the name of the data
structure? It makes it easier to grep.
I know we have this all over the place in other actions (and I hope
those are cleaned up at some point).

otherwise:
Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH net-next v5 0/1] Add support for tc cookies

2017-01-22 Thread Jamal Hadi Salim


I removed people's Reviewed-by/Acked-by tags because I changed the data
structure per Daniel's suggestions.

cheers,
jamal
On 17-01-22 07:51 AM, Jamal Hadi Salim wrote:

> From: Jamal Hadi Salim 
>
> Changes in V5:
>  - kill the stylistic changes
>  - Adopt a new structure with length-valuepointer representation
>  - rename some things
>
> Changes in v4:
>  - move stylistic changes out into a separate patch
>    (and add more stylistic changes)
>
> Changes in v3:
>  - use TC_ prefix for the max size
>  - move the cookie struct so visible only to kernel
>  - remove unneeded void * cast
>
> Changes in V2:
>  -move from a union to a length-value representation
>
> Jamal Hadi Salim (1):
>   net sched actions: Add support for user cookies
>
>  include/net/act_api.h|  1 +
>  include/net/pkt_cls.h|  8 
>  include/uapi/linux/pkt_cls.h |  3 +++
>  net/sched/act_api.c  | 35 +++
>  4 files changed, 47 insertions(+)

[PATCH net-next v5 1/1] net sched actions: Add support for user cookies

2017-01-22 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it.  The user can store whatever they wish in the
128 bits.
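
A userspace sketch of the length/value idea, for illustration only: the
demo_* names are hypothetical and this is not the kernel code in the
patch. It assumes that an empty cookie is rejected, which is a choice made
for this sketch rather than something stated above.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_COOKIE_MAX_SIZE 16	/* 128 bits */

/* Userspace analogue of a length/value cookie (hypothetical names). */
struct demo_cookie {
	uint8_t  *data;
	uint32_t  len;
};

/* Copy an opaque user blob; the bytes are stored but never interpreted. */
static struct demo_cookie *demo_cookie_new(const void *blob, uint32_t len)
{
	struct demo_cookie *c;

	/* Assumption of this sketch: empty and oversized cookies are invalid. */
	if (len == 0 || len > DEMO_COOKIE_MAX_SIZE) {
		errno = EINVAL;
		return NULL;
	}
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->data = malloc(len);
	if (!c->data) {
		free(c);
		return NULL;
	}
	memcpy(c->data, blob, len);
	c->len = len;
	return c;
}

/* The cookie is optional, so freeing has to tolerate a NULL pointer. */
static void demo_cookie_free(struct demo_cookie *c)
{
	if (!c)
		return;
	free(c->data);
	free(c);
}
```

Storing the length next to the pointer is what lets a 4-byte cookie such
as a1b2c3d4 and an 8-byte one such as 1234567890abcdef share one
representation.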

Sample exercise (showing variable length use of cookie):

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

--- 127.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2109ms
rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1

... show some stats
$ sudo $TC -s actions get action gact index 1

action order 1: gact action pass
 random type none pass val 0
 index 1 ref 2 bind 1 installed 204 sec used 5 sec
Action statistics:
Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. try longer cookie...
$ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef
.. dump..
$ sudo $TC -s actions ls action gact

action order 1: gact action pass
 random type none pass val 0
 index 1 ref 2 bind 1 installed 204 sec used 5 sec
Action statistics:
Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie 1234567890abcdef

Signed-off-by: Jamal Hadi Salim 
---
 include/net/act_api.h|  1 +
 include/net/pkt_cls.h|  8 
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/act_api.c  | 35 +++
 4 files changed, 47 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 1d71644..cfa2ae3 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -41,6 +41,7 @@ struct tc_action {
struct rcu_head tcfa_rcu;
struct gnet_stats_basic_cpu __percpu *cpu_bstats;
struct gnet_stats_queue __percpu *cpu_qstats;
+   struct tc_cookie*act_cookie;
 };
 #define tcf_head   common.tcfa_head
 #define tcf_index  common.tcfa_index
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f0a0514..b43077e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -515,4 +515,12 @@ struct tc_cls_bpf_offload {
u32 gen_flags;
 };
 
+
+/* This structure holds cookie structure that is passed from user
+ * to the kernel for actions and classifiers
+ */
+struct tc_cookie {
+   u8  *data;
+   u32 len;
+};
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index fd373eb..345551e 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -4,6 +4,8 @@
 #include 
 #include 
 
+#define TC_COOKIE_MAX_SIZE 16
+
 /* Action attributes */
 enum {
TCA_ACT_UNSPEC,
@@ -12,6 +14,7 @@ enum {
TCA_ACT_INDEX,
TCA_ACT_STATS,
TCA_ACT_PAD,
+   TCA_ACT_COOKIE,
__TCA_ACT_MAX
 };
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index cd08df9..84052630 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -33,6 +34,8 @@ static void free_tcf(struct rcu_head *head)
 
free_percpu(p->cpu_bstats);
free_percpu(p->cpu_qstats);
+   kfree(p->act_cookie->data);
+   kfree(p->act_cookie);
kfree(p);
 }
 
@@ -475,6 +478,12 @@ int tcf_action_destroy(struct list_head *actions, int bind)
goto nla_put_failure;
if (tcf_action_copy_stats(skb, a, 0))
goto nla_put_failure;
+   if (a->act_cookie) {
+   if (nla_put(skb, TCA_ACT_COOKIE, a->act_cookie->len,
+   a->act_cookie->data))
+   goto nla_put_failure;
+   }
+
nest = nla_nest_start(skb, TCA_OPTIONS);
if (nest == NULL)
goto nla_put_failure;
@@ -575,6 +584,32 @@ struct tc_action *tcf_action_init_1(struct net *net, 
struct nlattr *nla,
if (err < 0)
goto err_mod;
 
+   if (tb[TCA_ACT_COOKIE]) {
+   int cklen = nla_len(tb[TCA_ACT_COOKIE]);
+
+   i

[PATCH net-next v5 0/1] Add support for tc cookies

2017-01-22 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Changes in V5:
 - kill the stylistic changes
 - Adopt a new structure with length-valuepointer representation
 - rename some things

Changes in v4:
 - move stylistic changes out into a separate patch
   (and add more stylistic changes)

Changes in v3:
 - use TC_ prefix for the max size
 - move the cookie struct so visible only to kernel
 - remove unneeded void * cast

Changes in V2:
 -move from a union to a length-value representation

Jamal Hadi Salim (1):
  net sched actions: Add support for user cookies

 include/net/act_api.h|  1 +
 include/net/pkt_cls.h|  8 
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/act_api.c  | 35 +++
 4 files changed, 47 insertions(+)

-- 
1.9.1

Re: [PATCH net 1/1] net sched actions: fix refcnt when GETing of action after bind

2017-01-22 Thread Jamal Hadi Salim


On 17-01-20 01:20 AM, Cong Wang wrote:

On Wed, Jan 18, 2017 at 3:33 AM, Jamal Hadi Salim  wrote:

On 17-01-17 01:17 PM, Cong Wang wrote:





I did.
The issue there (after your original patch) was destroy() would
decrement the refcount to zero and a GET was essentially translated
to a DEL. Incrementing the refcount earlier protected against that
assuming destroy was going to decrement it.
However, when an action is bound, destroy() doesn't decrement
the refcnt. So the refcnt keeps going up forever (and therefore
deleting fails in the future). So we can't use destroy() as is.
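
A toy model of the accounting described above, as a hedged sketch: the
demo_* names are hypothetical and the logic is deliberately reduced to the
two behaviours mentioned (GET takes a reference up front, destroy skips the
decrement for bound actions). It is not the kernel code.

```c
/* Toy action with just the two counters under discussion. */
struct demo_action {
	int refcnt;
	int bindcnt;
};

/* Models a destroy() that leaves bound actions untouched. */
static void demo_destroy(struct demo_action *a)
{
	if (a->bindcnt > 0)
		return;		/* bound: no decrement happens */
	a->refcnt--;
}

/* Models GET: take a reference first, then rely on destroy() to drop it. */
static int demo_get(struct demo_action *a)
{
	a->refcnt++;
	demo_destroy(a);
	return a->refcnt;
}
```

In this model an unbound action comes back from demo_get() with its
refcnt unchanged, while a bound one gains one reference per GET.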


Hmm, tcf_action_destroy() should not touch the refcnt at all in this case,
right? Since the refcnt here is not for readers in kernel code but for
user-space. We mix the use of this refcnt, which leads to problems.

Your patch is not correct either for DEL, tcf_action_destroy() is not
needed to call again after tcf_del_notify() fails, right? Probably
it is not needed at all:


Cong,
Please proceed to separate del from get. The trickery is biting us.
Also, run the tests I had in my patch. There is a difference
between the bound vs not-bound use cases.

cheers,
jamal

Re: [RFC PATCH net-next 5/5] bridge: vlan lwt dst_metadata hooks in ingress and egress paths

2017-01-22 Thread Nikolay Aleksandrov

On 21/01/17 06:46, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> - ingress hook:
> - if port is a lwt tunnel port, use tunnel info in
>   attached dst_metadata to map it to a local vlan
> - egress hook:
> - if port is a lwt tunnel port, use tunnel info attached to
>   vlan to set dst_metadata on the skb
> 
> CC: Nikolay Aleksandrov 
> Signed-off-by: Roopa Prabhu 
> ---
> CC'ing Nikolay for some more eyes as he has been trying to keep the
> bridge driver fast path lite.
> 
>  net/bridge/br_input.c   |4 
>  net/bridge/br_private.h |4 
>  net/bridge/br_vlan.c|   55 
> +++
>  3 files changed, 63 insertions(+)
> 
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 83f356f..96602a1 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -262,6 +262,10 @@ rx_handler_result_t br_handle_frame(struct sk_buff 
> **pskb)
>   return RX_HANDLER_CONSUMED;
>  
>   p = br_port_get_rcu(skb->dev);
> + if (p->flags & BR_LWT_VLAN) {
> + if (br_handle_ingress_vlan_tunnel(skb, p, 
> nbp_vlan_group_rcu(p)))
> + goto drop;
> + }

Is there any reason to do this so early (perhaps netfilter?)? If not, you
can push it to the vlan __allowed_ingress (and rename that function to
something else, it does a hundred additional things) and avoid this check
for all packets if vlans are disabled, so people using a non-vlan-filtering
bridge won't have an additional test in their fast path.

>  
>   if (unlikely(is_link_local_ether_addr(dest))) {
>   u16 fwd_mask = p->br->group_fwd_mask_required;
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index f68e360..68a23c5 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -804,6 +804,10 @@ int __vlan_tunnel_info_del(struct net_bridge_vlan_group 
> *vg,
>  int nbp_vlan_tunnel_info_add(struct net_bridge_port *port, u16 vid, u32 
> tun_id);
>  bool vlan_tunnel_id_isrange(struct net_bridge_vlan *v_end,
>   struct net_bridge_vlan *v);
> +int br_handle_ingress_vlan_tunnel(struct sk_buff *skb, struct 
> net_bridge_port *p,
> +   struct net_bridge_vlan_group *vg);
> +int br_handle_egress_vlan_tunnel(struct sk_buff *skb,
> +  struct net_bridge_vlan *vlan);
>  
>  static inline struct net_bridge_vlan_group *br_vlan_group(
>   const struct net_bridge *br)
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 2040f08..6cf2344 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -405,6 +405,11 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
>  
>   if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
>   skb->vlan_tci = 0;
> +
> + if (br_handle_egress_vlan_tunnel(skb, v)) {
> + kfree_skb(skb);
> + return NULL;
> + }
>  out:
>   return skb;
>  }
> @@ -1213,3 +1218,53 @@ int nbp_vlan_tunnel_info_delete(struct net_bridge_port 
> *port, u16 vid)
>  
>   return 0;
>  }
> +
> +int br_handle_ingress_vlan_tunnel(struct sk_buff *skb,
> +   struct net_bridge_port *p,
> +   struct net_bridge_vlan_group *vg)
> +{
> + struct ip_tunnel_info *tinfo = skb_tunnel_info(skb);
> + struct net_bridge_vlan *vlan;
> +
> + if (!vg || !tinfo)
> + return 0;
> +
> + /* if already tagged, ignore */
> + if (skb_vlan_tagged(skb))
> + return 0;
> +
> + /* lookup vid, given tunnel id */
> + vlan = br_vlan_tunnel_lookup(&vg->tunnel_hash, tinfo->key.tun_id);
> + if (!vlan)
> + return 0;
> +
> + skb_dst_drop(skb);
> +
> + __vlan_hwaccel_put_tag(skb, p->br->vlan_proto, vlan->vid);
> +
> + return 0;
> +}
> +
> +int br_handle_egress_vlan_tunnel(struct sk_buff *skb,
> +  struct net_bridge_vlan *vlan)
> +{
> + __be32 tun_id;
> + int err;
> +
> + if (!vlan || !vlan->tinfo.tunnel_id)
> + return 0;
> +
> + if (unlikely(!skb_vlan_tag_present(skb)))
> + return 0;
> +
> + skb_dst_drop(skb);
> + tun_id = tunnel_id_to_key32(vlan->tinfo.tunnel_id);
> +
> + err = skb_vlan_pop(skb);
> + if (err)
> + return err;
> +
> + skb_dst_set(skb, dst_clone(&vlan->tinfo.tunnel_dst->dst));
> +
> + return 0;
> +}
>

Re: [RFC PATCH net-next 4/5] bridge: vlan lwt and dst_metadata netlink support

2017-01-22 Thread Nikolay Aleksandrov

On 21/01/17 06:46, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> This patch adds support to attach per vlan tunnel info dst
> metadata. This enables bridge driver to map vlan to tunnel_info
> at ingress and egress
> 
> The initial use case is vlan to vni bridging, but the api is generic
> to extend to any tunnel_info in the future:
> - Uapi to configure/unconfigure/dump per vlan tunnel data
> - netlink functions to configure vlan and tunnel_info mapping
> - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach
> dst_metadata to bridged packets on ports.
> 
> Use case:
> example use for this is a vxlan bridging gateway or vtep
> which maps vlans to vn-segments (or vnis). User can configure
> per-vlan tunnel information which the bridge driver can use
> to bridge vlan into the corresponding tunnel.
> 
> CC: Nikolay Aleksandrov 
> Signed-off-by: Roopa Prabhu 
> ---
> CC'ing Nikolay for some more eyes as he has been trying to keep the
> bridge driver fast path lite.
> 
>  include/linux/if_bridge.h |1 +
>  net/bridge/br_input.c |1 +
>  net/bridge/br_netlink.c   |  410 
> ++---
>  net/bridge/br_private.h   |   18 ++
>  net/bridge/br_vlan.c  |  138 ++-
>  5 files changed, 507 insertions(+), 61 deletions(-)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index c6587c0..36ff611 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -46,6 +46,7 @@ struct br_ip_list {
>  #define BR_LEARNING_SYNC BIT(9)
>  #define BR_PROXYARP_WIFI BIT(10)
>  #define BR_MCAST_FLOOD   BIT(11)
> +#define BR_LWT_VLAN  BIT(12)
>  
>  #define BR_DEFAULT_AGEING_TIME   (300 * HZ)
>  
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 855b72f..83f356f 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "br_private.h"
>  
>  /* Hook for brouter */
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index 71c7453..df997ad 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -17,17 +17,30 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "br_private.h"
>  #include "br_private_stp.h"
>  
> -static int __get_num_vlan_infos(struct net_bridge_vlan_group *vg,
> - u32 filter_mask)
> +static size_t br_get_vlan_tinfo_size(void)
>  {
> + return nla_total_size(0) + /* nest IFLA_BRIDGE_VLAN_TUNNEL_INFO */
> +   nla_total_size(sizeof(u32)) + /* IFLA_BRIDGE_VLAN_TUNNEL_ID */
> +   nla_total_size(sizeof(u16)) + /* IFLA_BRIDGE_VLAN_TUNNEL_VID 
> */
> +   nla_total_size(sizeof(u16)); /* IFLA_BRIDGE_VLAN_TUNNEL_FLAGS 
> */
> +}
> +
> +static int __get_num_vlan_infos(struct net_bridge_port *p,
> + struct net_bridge_vlan_group *vg,
> + u32 filter_mask, int *num_vtinfos)
> +{
> + struct net_bridge_vlan *vbegin = NULL, *vend = NULL;
> + struct net_bridge_vlan *vtbegin = NULL, *vtend = NULL;
>   struct net_bridge_vlan *v;
> - u16 vid_range_start = 0, vid_range_end = 0, vid_range_flags = 0;
> + bool get_tinfos = (p && p->flags & BR_LWT_VLAN) ? true: false;
> + bool vcontinue, vtcontinue;
> + int num_vinfos = 0;
>   u16 flags, pvid;
> - int num_vlans = 0;
>  
>   if (!(filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED))
>   return 0;
> @@ -36,6 +49,8 @@ static int __get_num_vlan_infos(struct 
> net_bridge_vlan_group *vg,
>   /* Count number of vlan infos */
>   list_for_each_entry_rcu(v, &vg->vlan_list, vlist) {
>   flags = 0;
> + vcontinue = false;
> + vtcontinue = false;
>   /* only a context, bridge vlan not activated */
>   if (!br_vlan_should_use(v))
>   continue;
> @@ -45,47 +60,79 @@ static int __get_num_vlan_infos(struct 
> net_bridge_vlan_group *vg,
>   if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED)
>   flags |= BRIDGE_VLAN_INFO_UNTAGGED;
>  
> - if (vid_range_start == 0) {
> - goto initvars;
> - } else if ((v->vid - vid_range_end) == 1 &&
> - flags == vid_range_flags) {
> - vid_range_end = v->vid;
> + if (!vbegin) {
> + vbegin = v;
> + vend = v;
> + vcontinue = true;
> + } else if ((v->vid - vend->vid) == 1 &&
> + flags == vbegin->flags) {
> + vend = v;
> + vcontinue = true;
> + }
> +
> + if (!vcontinue) {
> + if ((vend->vid - vbegin->vid) > 0)
> + num_vinfos += 2;
> + else
> + num_vinfos +=

Re: [net, 6/6] net: korina: version bump

2017-01-22 Thread Roman Yeryomin

On 17 January 2017 at 21:19, Roman Yeryomin  wrote:
> On 17 January 2017 at 20:55, Felix Fietkau  wrote:
>> On 2017-01-17 18:33, Roman Yeryomin wrote:
>>> Signed-off-by: Roman Yeryomin 
>>> ---
>>>  drivers/net/ethernet/korina.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
>>> index 83c994f..c8fed01 100644
>>> --- a/drivers/net/ethernet/korina.c
>>> +++ b/drivers/net/ethernet/korina.c
>>> @@ -66,8 +66,8 @@
>>>  #include 
>>>
>>>  #define DRV_NAME "korina"
>>> -#define DRV_VERSION  "0.10"
>>> -#define DRV_RELDATE  "04Mar2008"
>>> +#define DRV_VERSION  "0.20"
>>> +#define DRV_RELDATE  "15Jan2017"
>> I think it would make more sense to remove this version instead of
>> bumping it. Individual driver versions are rather pointless, the kernel
>> version is more meaningful anyway.
>
> OK, makes sense

Actually, after thinking a bit more about this, not really...
How about ethtool, which uses driver name and version?
I see most ethernet drivers define some version. And it's pretty
useful, when using backports.
IMO, it should be kept and bumped.
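
To illustrate the ethtool point, here is a minimal userspace sketch of how a driver's name and version strings typically end up in the fields that `ethtool -i` prints. The struct and function names are hypothetical stand-ins, not the korina code; only DRV_NAME and DRV_VERSION come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DRV_NAME    "korina"
#define DRV_VERSION "0.20"

/* Hypothetical stand-in for the driver/version fields reported to ethtool. */
struct drvinfo_sketch {
	char driver[32];
	char version[32];
};

/* Copy the compile-time strings into the fixed-size fields.
 * snprintf() always NUL-terminates, so an overlong string is truncated.
 */
static void sketch_get_drvinfo(struct drvinfo_sketch *info)
{
	assert(info != NULL);
	snprintf(info->driver, sizeof(info->driver), "%s", DRV_NAME);
	snprintf(info->version, sizeof(info->version), "%s", DRV_VERSION);
}
```

If the version define were removed, a driver would have to leave this field empty or fill it from something else, which is the trade-off being discussed.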

Regards,
Roman

Re: [patch net-next 4/4] mlxsw: spectrum: Add packet sample offloading support

2017-01-22 Thread Ido Schimmel

On Sun, Jan 22, 2017 at 12:44:47PM +0100, Jiri Pirko wrote:
> From: Yotam Gigi 
> 
> Using the MPSC register, add the functions that configure port-based
> packet sampling in hardware and the necessary datatypes in the
> mlxsw_sp_port struct. In addition, add the necessary trap for sampled
> packets and integrate with matchall offloading to allow offloading of the
> sample tc action.
> 
> The current offload support is for the tc command:
> 
> tc filter add dev  parent : \
> matchall skip_sw \
> action sample rate  group  [trunc ]
> 
> Where only ingress qdiscs are supported, and only a combination of
> matchall classifier and sample action will lead to activating hardware
> packet sampling.
> 
> Signed-off-by: Yotam Gigi 
> Signed-off-by: Jiri Pirko 

Reviewed-by: Ido Schimmel

[patch net-next 4/4] mlxsw: spectrum: Add packet sample offloading support

2017-01-22 Thread Jiri Pirko

From: Yotam Gigi 

Using the MPSC register, add the functions that configure port-based
packet sampling in hardware and the necessary datatypes in the
mlxsw_sp_port struct. In addition, add the necessary trap for sampled
packets and integrate with matchall offloading to allow offloading of the
sample tc action.

The current offload support is for the tc command:

tc filter add dev  parent : \
  matchall skip_sw \
  action sample rate  group  [trunc ]

Where only ingress qdiscs are supported, and only a combination of
matchall classifier and sample action will lead to activating hardware
packet sampling.

Signed-off-by: Yotam Gigi 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 111 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  10 +++
 drivers/net/ethernet/mellanox/mlxsw/trap.h |   1 +
 3 files changed, 122 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 3dbd82e..467aa52 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "spectrum.h"
 #include "pci.h"
@@ -469,6 +470,16 @@ static void mlxsw_sp_span_mirror_remove(struct 
mlxsw_sp_port *from,
mlxsw_sp_span_inspected_port_unbind(from, span_entry, type);
 }
 
+static int mlxsw_sp_port_sample_set(struct mlxsw_sp_port *mlxsw_sp_port,
+   bool enable, u32 rate)
+{
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+   char mpsc_pl[MLXSW_REG_MPSC_LEN];
+
+   mlxsw_reg_mpsc_pack(mpsc_pl, mlxsw_sp_port->local_port, enable, rate);
+   return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mpsc), mpsc_pl);
+}
+
 static int mlxsw_sp_port_admin_status_set(struct mlxsw_sp_port *mlxsw_sp_port,
  bool is_up)
 {
@@ -1218,6 +1229,51 @@ mlxsw_sp_port_del_cls_matchall_mirror(struct 
mlxsw_sp_port *mlxsw_sp_port,
mlxsw_sp_span_mirror_remove(mlxsw_sp_port, to_port, span_type);
 }
 
+static int
+mlxsw_sp_port_add_cls_matchall_sample(struct mlxsw_sp_port *mlxsw_sp_port,
+ struct tc_cls_matchall_offload *cls,
+ const struct tc_action *a,
+ bool ingress)
+{
+   int err;
+
+   if (!mlxsw_sp_port->sample)
+   return -EOPNOTSUPP;
+   if (rtnl_dereference(mlxsw_sp_port->sample->psample_group)) {
+   netdev_err(mlxsw_sp_port->dev, "sample already active\n");
+   return -EEXIST;
+   }
+   if (tcf_sample_rate(a) > MLXSW_REG_MPSC_RATE_MAX) {
+   netdev_err(mlxsw_sp_port->dev, "sample rate not supported\n");
+   return -EOPNOTSUPP;
+   }
+
+   rcu_assign_pointer(mlxsw_sp_port->sample->psample_group,
+  tcf_sample_psample_group(a));
+   mlxsw_sp_port->sample->truncate = tcf_sample_truncate(a);
+   mlxsw_sp_port->sample->trunc_size = tcf_sample_trunc_size(a);
+   mlxsw_sp_port->sample->rate = tcf_sample_rate(a);
+
+   err = mlxsw_sp_port_sample_set(mlxsw_sp_port, true, tcf_sample_rate(a));
+   if (err)
+   goto err_port_sample_set;
+   return 0;
+
+err_port_sample_set:
+   RCU_INIT_POINTER(mlxsw_sp_port->sample->psample_group, NULL);
+   return err;
+}
+
+static void
+mlxsw_sp_port_del_cls_matchall_sample(struct mlxsw_sp_port *mlxsw_sp_port)
+{
+   if (!mlxsw_sp_port->sample)
+   return;
+
+   mlxsw_sp_port_sample_set(mlxsw_sp_port, false, 1);
+   RCU_INIT_POINTER(mlxsw_sp_port->sample->psample_group, NULL);
+}
+
 static int mlxsw_sp_port_add_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
  __be16 protocol,
  struct tc_cls_matchall_offload *cls,
@@ -1248,6 +1304,10 @@ static int mlxsw_sp_port_add_cls_matchall(struct 
mlxsw_sp_port *mlxsw_sp_port,
mirror = &mall_tc_entry->mirror;
err = mlxsw_sp_port_add_cls_matchall_mirror(mlxsw_sp_port,
mirror, a, ingress);
+   } else if (is_tcf_sample(a) && protocol == htons(ETH_P_ALL)) {
+   mall_tc_entry->type = MLXSW_SP_PORT_MALL_SAMPLE;
+   err = mlxsw_sp_port_add_cls_matchall_sample(mlxsw_sp_port, cls,
+   a, ingress);
} else {
err = -EOPNOTSUPP;
}
@@ -1281,6 +1341,9 @@ static void mlxsw_sp_port_del_cls_matchall(struct 
mlxsw_sp_port *mlxsw_sp_port,
mlxsw_sp_port_del_cls_matchall_mirror(mlxsw_sp_port,
  &mall_tc_entry->mirror);
break;
+   case MLXSW_SP_PORT_MALL_SAMPLE:
+   mlxsw_sp_port

[patch net-next 2/4] net/sched: Introduce sample tc action

2017-01-22 Thread Jiri Pirko

From: Yotam Gigi 

This action allows the user to sample traffic matched by a tc classifier.
The sampling consists of choosing packets randomly and sampling them using
the psample module. The user can configure the psample group number, the
sampling rate and the packet's truncation (to save kernel-user traffic).

Example:
To sample ingress traffic from interface eth1, one may use the commands:

tc qdisc add dev eth1 handle : ingress

tc filter add dev eth1 parent : \
   matchall action sample rate 12 group 4

Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets on
dev eth1 to psample group 4.
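
As a rough illustration of "one sampled packet per 12 packets on average", the per-packet decision can be sketched in userspace C as below. The function name and the modulo test are assumptions made for this sketch; the action in net/sched/act_sample.c may decide differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether to sample one packet, given a uniformly distributed
 * 32-bit random value. With rate == 12, about one value in 12 matches.
 * Assumption: a plain modulo test stands in for the real sampling logic.
 */
static bool sketch_should_sample(uint32_t rate, uint32_t random_value)
{
	assert(rate != 0);	/* a rate of 0 has no meaning here */
	return random_value % rate == 0;
}
```

With rate 1 every packet is sampled; larger rates sample proportionally fewer packets.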

Signed-off-by: Yotam Gigi 
Signed-off-by: Jiri Pirko 
---
 include/net/tc_act/tc_sample.h|  50 +++
 include/uapi/linux/tc_act/Kbuild  |   1 +
 include/uapi/linux/tc_act/tc_sample.h |  26 
 net/sched/Kconfig |  12 ++
 net/sched/Makefile|   1 +
 net/sched/act_sample.c| 274 ++
 6 files changed, 364 insertions(+)
 create mode 100644 include/net/tc_act/tc_sample.h
 create mode 100644 include/uapi/linux/tc_act/tc_sample.h
 create mode 100644 net/sched/act_sample.c

diff --git a/include/net/tc_act/tc_sample.h b/include/net/tc_act/tc_sample.h
new file mode 100644
index 000..89e9305
--- /dev/null
+++ b/include/net/tc_act/tc_sample.h
@@ -0,0 +1,50 @@
+#ifndef __NET_TC_SAMPLE_H
+#define __NET_TC_SAMPLE_H
+
+#include 
+#include 
+#include 
+
+struct tcf_sample {
+   struct tc_action common;
+   u32 rate;
+   bool truncate;
+   u32 trunc_size;
+   struct psample_group __rcu *psample_group;
+   u32 psample_group_num;
+   struct list_head tcfm_list;
+   struct rcu_head rcu;
+};
+#define to_sample(a) ((struct tcf_sample *)a)
+
+static inline bool is_tcf_sample(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+   return a->ops && a->ops->type == TCA_ACT_SAMPLE;
+#else
+   return false;
+#endif
+}
+
+static inline __u32 tcf_sample_rate(const struct tc_action *a)
+{
+   return to_sample(a)->rate;
+}
+
+static inline bool tcf_sample_truncate(const struct tc_action *a)
+{
+   return to_sample(a)->truncate;
+}
+
+static inline int tcf_sample_trunc_size(const struct tc_action *a)
+{
+   return to_sample(a)->trunc_size;
+}
+
+static inline struct psample_group *
+tcf_sample_psample_group(const struct tc_action *a)
+{
+   return rcu_dereference(to_sample(a)->psample_group);
+}
+
+#endif /* __NET_TC_SAMPLE_H */
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index e3db740..ba62ddf 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -4,6 +4,7 @@ header-y += tc_defact.h
 header-y += tc_gact.h
 header-y += tc_ipt.h
 header-y += tc_mirred.h
+header-y += tc_sample.h
 header-y += tc_nat.h
 header-y += tc_pedit.h
 header-y += tc_skbedit.h
diff --git a/include/uapi/linux/tc_act/tc_sample.h 
b/include/uapi/linux/tc_act/tc_sample.h
new file mode 100644
index 000..21378bc
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_sample.h
@@ -0,0 +1,26 @@
+#ifndef __LINUX_TC_SAMPLE_H
+#define __LINUX_TC_SAMPLE_H
+
+#include 
+#include 
+#include 
+
+#define TCA_ACT_SAMPLE 26
+
+struct tc_sample {
+   tc_gen;
+};
+
+enum {
+   TCA_SAMPLE_UNSPEC,
+   TCA_SAMPLE_PARMS,
+   TCA_SAMPLE_TM,
+   TCA_SAMPLE_RATE,
+   TCA_SAMPLE_TRUNC_SIZE,
+   TCA_SAMPLE_PSAMPLE_GROUP,
+   TCA_SAMPLE_PAD,
+   __TCA_SAMPLE_MAX
+};
+#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a9aa38d..72cfa3a 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -650,6 +650,18 @@ config NET_ACT_MIRRED
  To compile this code as a module, choose M here: the
  module will be called act_mirred.
 
+config NET_ACT_SAMPLE
+tristate "Traffic Sampling"
+depends on NET_CLS_ACT
+select PSAMPLE
+---help---
+ Say Y here to allow packet sampling tc action. The packet sample
+ action consists of statistically choosing packets and sampling
+ them using the psample module.
+
+ To compile this code as a module, choose M here: the
+ module will be called act_sample.
+
 config NET_ACT_IPT
 tristate "IPtables targets"
 depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 4bdda36..7b915d2 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_NET_CLS_ACT) += act_api.o
 obj-$(CONFIG_NET_ACT_POLICE)   += act_police.o
 obj-$(CONFIG_NET_ACT_GACT) += act_gact.o
 obj-$(CONFIG_NET_ACT_MIRRED)   += act_mirred.o
+obj-$(CONFIG_NET_ACT_SAMPLE)   += act_sample.o
 obj-$(CONFIG_NET_ACT_IPT)  += act_ipt.o
 obj-$(CONFIG_NET_ACT_NAT)  += act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)+= act_

[patch net-next 3/4] mlxsw: reg: add the Monitoring Packet Sampling Configuration Register

2017-01-22 Thread Jiri Pirko

From: Yotam Gigi 

The MPSC register allows configuring ingress packet sampling on a specific
port of the mlxsw device. The sampled packets are then trapped via the
PKT_SAMPLE trap.
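
For readers unfamiliar with the register macros, the following userspace sketch shows one way the three fields defined in this patch could be laid out in the 0x1C-byte payload: local_port in bits 16..23 of the word at 0x00, the enable bit at bit 30 of the word at 0x04, and the rate as the full word at 0x08. The sketch_* names are hypothetical, and big-endian word storage is an assumption of this sketch rather than something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MPSC_LEN 0x1C

/* Store a 32-bit value big-endian at a byte offset of the payload. */
static void sketch_put_be32(uint8_t *payload, size_t off, uint32_t v)
{
	payload[off]     = (uint8_t)(v >> 24);
	payload[off + 1] = (uint8_t)(v >> 16);
	payload[off + 2] = (uint8_t)(v >> 8);
	payload[off + 3] = (uint8_t)v;
}

/* Zero the payload, then set local_port, enable and rate. */
static void sketch_mpsc_pack(uint8_t *payload, uint8_t local_port, bool e,
			     uint32_t rate)
{
	assert(payload != NULL);
	memset(payload, 0, SKETCH_MPSC_LEN);
	sketch_put_be32(payload, 0x00, (uint32_t)local_port << 16);
	sketch_put_be32(payload, 0x04, (uint32_t)(e ? 1 : 0) << 30);
	sketch_put_be32(payload, 0x08, rate);
}
```

In the driver itself the MLXSW_ITEM32() accessors generate the setters, so none of this is written by hand.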

Signed-off-by: Yotam Gigi 
Signed-off-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 41 +++
 1 file changed, 41 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h 
b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 1357fe0..9fb0316 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -4965,6 +4965,46 @@ static inline void mlxsw_reg_mlcr_pack(char *payload, u8 
local_port,
   MLXSW_REG_MLCR_DURATION_MAX : 0);
 }
 
+/* MPSC - Monitoring Packet Sampling Configuration Register
+ * 
+ * MPSC Register is used to configure the Packet Sampling mechanism.
+ */
+#define MLXSW_REG_MPSC_ID 0x9080
+#define MLXSW_REG_MPSC_LEN 0x1C
+
+MLXSW_REG_DEFINE(mpsc, MLXSW_REG_MPSC_ID, MLXSW_REG_MPSC_LEN);
+
+/* reg_mpsc_local_port
+ * Local port number
+ * Not supported for CPU port
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mpsc, local_port, 0x00, 16, 8);
+
+/* reg_mpsc_e
+ * Enable sampling on port local_port
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mpsc, e, 0x04, 30, 1);
+
+#define MLXSW_REG_MPSC_RATE_MAX 35UL
+
+/* reg_mpsc_rate
+ * Sampling rate = 1 out of rate packets (with randomization around
+ * the point). Valid values are: 1 to MLXSW_REG_MPSC_RATE_MAX
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mpsc, rate, 0x08, 0, 32);
+
+static inline void mlxsw_reg_mpsc_pack(char *payload, u8 local_port, bool e,
+  u32 rate)
+{
+   MLXSW_REG_ZERO(mpsc, payload);
+   mlxsw_reg_mpsc_local_port_set(payload, local_port);
+   mlxsw_reg_mpsc_e_set(payload, e);
+   mlxsw_reg_mpsc_rate_set(payload, rate);
+}
+
 /* SBPR - Shared Buffer Pools Register
  * ---
  * The SBPR configures and retrieves the shared buffer pools and configuration.
@@ -5429,6 +5469,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
MLXSW_REG(mpat),
MLXSW_REG(mpar),
MLXSW_REG(mlcr),
+   MLXSW_REG(mpsc),
MLXSW_REG(sbpr),
MLXSW_REG(sbcm),
MLXSW_REG(sbpm),
-- 
2.7.4

[patch net-next 1/4] net: Introduce psample, a new genetlink channel for packet sampling

2017-01-22 Thread Jiri Pirko

From: Yotam Gigi 

Add a general way for kernel modules to sample packets, without being tied
to any specific subsystem. This netlink channel can be used by tc,
iptables, etc. and allows packet sampling in the kernel to be standardized.

For every sampled packet, the psample module adds the following metadata
fields:

PSAMPLE_ATTR_IIFINDEX - the packet's input ifindex, if applicable

PSAMPLE_ATTR_OIFINDEX - the packet output ifindex, if applicable

PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been
   truncated during sampling

PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the
   user who initiated the sampling. This field allows the user to
   differentiate between several samplers working simultaneously and
   filter the packets relevant to them

PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The
   sequence is kept for each group

PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets

PSAMPLE_ATTR_DATA - the actual packet bits

In addition, add the GET_GROUPS netlink command which allows the user to
see the current sample groups, their refcount and sequence number. This
command currently supports only netlink dump mode.

Signed-off-by: Yotam Gigi 
Signed-off-by: Jiri Pirko 
---
 MAINTAINERS  |   7 +
 include/net/psample.h|  36 ++
 include/uapi/linux/Kbuild|   1 +
 include/uapi/linux/psample.h |  35 +
 net/Kconfig  |   1 +
 net/Makefile |   1 +
 net/psample/Kconfig  |  15 +++
 net/psample/Makefile |   5 +
 net/psample/psample.c| 301 +++
 9 files changed, 402 insertions(+)
 create mode 100644 include/net/psample.h
 create mode 100644 include/uapi/linux/psample.h
 create mode 100644 net/psample/Kconfig
 create mode 100644 net/psample/Makefile
 create mode 100644 net/psample/psample.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3c84a8f..d76fccd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9957,6 +9957,13 @@ L:   linuxppc-...@lists.ozlabs.org
 S: Maintained
 F: drivers/block/ps3vram.c
 
+PSAMPLE PACKET SAMPLING SUPPORT:
+M: Yotam Gigi 
+S: Maintained
+F: net/psample
+F: include/net/psample.h
+F: include/uapi/linux/psample.h
+
 PSTORE FILESYSTEM
 M: Anton Vorontsov 
 M: Colin Cross 
diff --git a/include/net/psample.h b/include/net/psample.h
new file mode 100644
index 000..b0e
--- /dev/null
+++ b/include/net/psample.h
@@ -0,0 +1,36 @@
+#ifndef __NET_PSAMPLE_H
+#define __NET_PSAMPLE_H
+
+#include 
+#include 
+#include 
+
+struct psample_group {
+   struct list_head list;
+   struct net *net;
+   u32 group_num;
+   u32 refcount;
+   u32 seq;
+};
+
+struct psample_group *psample_group_get(struct net *net, u32 group_num);
+void psample_group_put(struct psample_group *group);
+
+#if IS_ENABLED(CONFIG_PSAMPLE)
+
+void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
+  u32 trunc_size, int in_ifindex, int out_ifindex,
+  u32 sample_rate);
+
+#else
+
+static inline void psample_sample_packet(struct psample_group *group,
+struct sk_buff *skb, u32 trunc_size,
+int in_ifindex, int out_ifindex,
+u32 sample_rate)
+{
+}
+
+#endif
+
+#endif /* __NET_PSAMPLE_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index e600b50..80ad741 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -305,6 +305,7 @@ header-y += netrom.h
 header-y += net_namespace.h
 header-y += net_tstamp.h
 header-y += nfc.h
+header-y += psample.h
 header-y += nfs2.h
 header-y += nfs3.h
 header-y += nfs4.h
diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
new file mode 100644
index 000..ed48996
--- /dev/null
+++ b/include/uapi/linux/psample.h
@@ -0,0 +1,35 @@
+#ifndef __UAPI_PSAMPLE_H
+#define __UAPI_PSAMPLE_H
+
+enum {
+   /* sampled packet metadata */
+   PSAMPLE_ATTR_IIFINDEX,
+   PSAMPLE_ATTR_OIFINDEX,
+   PSAMPLE_ATTR_ORIGSIZE,
+   PSAMPLE_ATTR_SAMPLE_GROUP,
+   PSAMPLE_ATTR_GROUP_SEQ,
+   PSAMPLE_ATTR_SAMPLE_RATE,
+   PSAMPLE_ATTR_DATA,
+
+   /* commands attributes */
+   PSAMPLE_ATTR_GROUP_REFCOUNT,
+
+   __PSAMPLE_ATTR_MAX
+};
+
+enum psample_command {
+   PSAMPLE_CMD_SAMPLE,
+   PSAMPLE_CMD_GET_GROUP,
+   PSAMPLE_CMD_NEW_GROUP,
+   PSAMPLE_CMD_DEL_GROUP,
+};
+
+/* Can be overridden at runtime by module option */
+#define PSAMPLE_ATTR_MAX (__PSAMPLE_ATTR_MAX - 1)
+
+#define PSAMPLE_NL_MCGRP_CONFIG_NAME "config"
+#define PSAMPLE_NL_MCGRP_SAMPLE_NAME "packets"
+#define PSAMPLE_GENL_NAME "psample"
+#define PSAMPLE_GENL_VERSION 1
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 92ae150..ce4aee6 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -390,6 +390,7 @@ source

[patch net-next 0/4] Add support for offloading packet-sampling

2017-01-22 Thread Jiri Pirko

From: Jiri Pirko 

Yotam says:

The first patch introduces the psample module, a netlink channel dedicated
to packet sampling implemented using generic netlink. This module provides
a generic way for kernel modules to sample packets, while not being tied
to any specific subsystem like NFLOG.

The second patch adds the sample tc action, which uses psample to randomly
sample packets that match a classifier. The user can configure the psample
group number, the sampling rate and the packet's truncation (to save
kernel-user traffic).

The last two patches add the support for offloading the matchall-sample
tc command in the mlxsw driver, for ingress qdiscs.

An example for psample usage can be found in the libpsample project at:
https://github.com/Mellanox/libpsample

Yotam Gigi (4):
  net: Introduce psample, a new genetlink channel for packet sampling
  net/sched: Introduce sample tc action
  mlxsw: reg: add the Monitoring Packet Sampling Configuration Register
  mlxsw: spectrum: Add packet sample offloading support

 MAINTAINERS|   7 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h  |  41 
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 111 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  10 +
 drivers/net/ethernet/mellanox/mlxsw/trap.h |   1 +
 include/net/psample.h  |  36 +++
 include/net/tc_act/tc_sample.h |  50 
 include/uapi/linux/Kbuild  |   1 +
 include/uapi/linux/psample.h   |  35 +++
 include/uapi/linux/tc_act/Kbuild   |   1 +
 include/uapi/linux/tc_act/tc_sample.h  |  26 +++
 net/Kconfig|   1 +
 net/Makefile   |   1 +
 net/psample/Kconfig|  15 ++
 net/psample/Makefile   |   5 +
 net/psample/psample.c  | 301 +
 net/sched/Kconfig  |  12 +
 net/sched/Makefile |   1 +
 net/sched/act_sample.c | 274 ++
 19 files changed, 929 insertions(+)
 create mode 100644 include/net/psample.h
 create mode 100644 include/net/tc_act/tc_sample.h
 create mode 100644 include/uapi/linux/psample.h
 create mode 100644 include/uapi/linux/tc_act/tc_sample.h
 create mode 100644 net/psample/Kconfig
 create mode 100644 net/psample/Makefile
 create mode 100644 net/psample/psample.c
 create mode 100644 net/sched/act_sample.c

-- 
2.7.4

Re: [RFC PATCH net-next 2/5] vxlan: make COLLECT_METADATA mode bridge friendly

2017-01-22 Thread Nikolay Aleksandrov

On 21/01/17 06:46, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> This patch series makes vxlan COLLECT_METADATA mode bridge
> and layer2 network friendly. Vxlan COLLECT_METADATA mode today
> solves the per-vni netdev scalability problem in l3 networks.
> When a vxlan collect metadata device participates in bridging
> vlan to vn-segments, it can only get the vlan mapped vni in
> the xmit tunnel dst metadata. It will need the vxlan driver to
> continue to learn, and to hold forwarding state and remote destination
> information, similar to how it already does for non COLLECT_METADATA
> vxlan netdevices today.
> 
> Changes introduced by this patch:
> - allow learning and forwarding database state to vxlan netdev in
>   COLLECT_METADATA mode. Current behaviour is not changed
>   by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
>   to support the new bridge friendly mode.
> - A single fdb table hashed by (mac, vni) to allow fdb entries with
>   multiple vnis in the same fdb table
> - rx path already has the vni
> - tx path expects a vni in the packet with dst_metadata
> - prior to this series, fdb remote_dsts carried remote vni and
>   the vxlan device carrying the fdb table represented the
>   source vni. With the vxlan device now representing multiple vnis,
>   this patch adds a src vni attribute to the fdb entry. The remote
>   vni already uses NDA_VNI attribute. This patch introduces
>   NDA_SRC_VNI netlink attribute to represent the src vni in a multi
>   vni fdb table.
> 
> Signed-off-by: Roopa Prabhu 
> ---
[snip]
> @@ -2173,23 +2221,29 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   bool did_rsc = false;
>   struct vxlan_rdst *rdst, *fdst = NULL;
>   struct vxlan_fdb *f;
> + __be32 vni = 0;
>  
>   info = skb_tunnel_info(skb);
>  
>   skb_reset_mac_header(skb);
>  
>   if (vxlan->flags & VXLAN_F_COLLECT_METADATA) {
> - if (info && info->mode & IP_TUNNEL_INFO_TX)
> - vxlan_xmit_one(skb, dev, NULL, false);
> - else
> - kfree_skb(skb);
> - return NETDEV_TX_OK;
> + if (info && info->mode & IP_TUNNEL_INFO_BRIDGE &&
> + info->mode & IP_TUNNEL_INFO_TX) {

nit: parentheses around the IP_TUNNEL_INFO_TX check
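
For context on this nit and the identical one further down: in C, `&` binds tighter than `&&`, so the unparenthesized condition is already correct and the parentheses only help the reader. A userspace sketch, using made-up flag values and hypothetical sketch_* names rather than the kernel definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; assumptions, not copied from kernel headers. */
#define SKETCH_INFO_TX     0x01
#define SKETCH_INFO_BRIDGE 0x04

struct sketch_tunnel_info {
	unsigned char mode;
};

/* The condition as written in the patch. */
static bool sketch_cond_plain(const struct sketch_tunnel_info *info)
{
	return info && info->mode & SKETCH_INFO_BRIDGE &&
	       info->mode & SKETCH_INFO_TX;
}

/* The same condition with the suggested parentheses. */
static bool sketch_cond_parens(const struct sketch_tunnel_info *info)
{
	return info && (info->mode & SKETCH_INFO_BRIDGE) &&
	       (info->mode & SKETCH_INFO_TX);
}

/* Check that both forms agree for a NULL pointer and every mode value. */
static void sketch_check_equivalent(void)
{
	struct sketch_tunnel_info info;
	unsigned int mode;

	assert(sketch_cond_plain(NULL) == sketch_cond_parens(NULL));
	for (mode = 0; mode < 8; mode++) {
		info.mode = (unsigned char)mode;
		assert(sketch_cond_plain(&info) == sketch_cond_parens(&info));
	}
}
```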

> + vni = tunnel_id_to_key32(info->key.tun_id);
> + } else {
> + if (info && info->mode & IP_TUNNEL_INFO_TX)

nit: parentheses around the IP_TUNNEL_INFO_TX check

> + vxlan_xmit_one(skb, dev, vni, NULL, false);
> + else
> + kfree_skb(skb);
> + return NETDEV_TX_OK;
> + }
>   }
>  
>   if (vxlan->flags & VXLAN_F_PROXY) {
>   eth = eth_hdr(skb);
>   if (ntohs(eth->h_proto) == ETH_P_ARP)
> - return arp_reduce(dev, skb);
> + return arp_reduce(dev, skb, vni);
>  #if IS_ENABLED(CONFIG_IPV6)
>   else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
>pskb_may_pull(skb, sizeof(struct ipv6hdr)
> @@ -2200,13 +2254,13 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   msg = (struct nd_msg 
> *)skb_transport_header(skb);
>   if (msg->icmph.icmp6_code == 0 &&
>   msg->icmph.icmp6_type == 
> NDISC_NEIGHBOUR_SOLICITATION)
> - return neigh_reduce(dev, skb);
> + return neigh_reduce(dev, skb, vni);
>   }
>  #endif
>   }
>  
>   eth = eth_hdr(skb);
> - f = vxlan_find_mac(vxlan, eth->h_dest);
> + f = vxlan_find_mac(vxlan, eth->h_dest, vni);
>   did_rsc = false;
>  
>   if (f && (f->flags & NTF_ROUTER) && (vxlan->flags & VXLAN_F_RSC) &&
> @@ -2214,11 +2268,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>ntohs(eth->h_proto) == ETH_P_IPV6)) {
>   did_rsc = route_shortcircuit(dev, skb);
>   if (did_rsc)
> - f = vxlan_find_mac(vxlan, eth->h_dest);
> + f = vxlan_find_mac(vxlan, eth->h_dest, vni);
>   }
>  
>   if (f == NULL) {
> - f = vxlan_find_mac(vxlan, all_zeros_mac);
> + f = vxlan_find_mac(vxlan, all_zeros_mac, vni);
>   if (f == NULL) {
>   if ((vxlan->flags & VXLAN_F_L2MISS) &&
>   !is_multicast_ether_addr(eth->h_dest))
> @@ -2239,11 +2293,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   }
>   skb1 = skb_clone(skb, GFP_ATOMIC);
>   if (skb1)
> - vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> + vxlan_xmit_one(skb1, dev,

Re: [PATCH] net: mvneta: implement .set_wol and .get_wol

2017-01-22 Thread kbuild test robot

Hi Jingju,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.10-rc4 next-20170120]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Jingju-Hou/net-mvneta-implement-set_wol-and-get_wol/20170122-181651
config: sparc64-allmodconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_get_wol':
>> drivers/net/ethernet/marvell/mvneta.c:3940:8: error: 'struct mvneta_port' 
>> has no member named 'phy_dev'; did you mean 'phy_node'?
 if (pp->phy_dev)
   ^~
   drivers/net/ethernet/marvell/mvneta.c:3941:32: error: 'struct mvneta_port' 
has no member named 'phy_dev'; did you mean 'phy_node'?
  return phy_ethtool_get_wol(pp->phy_dev, wol);
   ^~
   drivers/net/ethernet/marvell/mvneta.c:3941:10: warning: 'return' with a 
value, in function returning void
  return phy_ethtool_get_wol(pp->phy_dev, wol);
 ^~~
   drivers/net/ethernet/marvell/mvneta.c:3933:1: note: declared here
mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
^~
   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_set_wol':
   drivers/net/ethernet/marvell/mvneta.c:3949:9: error: 'struct mvneta_port' 
has no member named 'phy_dev'; did you mean 'phy_node'?
 if (!pp->phy_dev)
^~
   drivers/net/ethernet/marvell/mvneta.c:3952:31: error: 'struct mvneta_port' 
has no member named 'phy_dev'; did you mean 'phy_node'?
 return phy_ethtool_set_wol(pp->phy_dev, wol);
  ^~
   drivers/net/ethernet/marvell/mvneta.c:3953:1: warning: control reaches end 
of non-void function [-Wreturn-type]
}
^

vim +3940 drivers/net/ethernet/marvell/mvneta.c

  3934  {
  3935  struct mvneta_port *pp = netdev_priv(dev);
  3936  
  3937  wol->supported = 0;
  3938  wol->wolopts = 0;
  3939  
> 3940  if (pp->phy_dev)
  3941  return phy_ethtool_get_wol(pp->phy_dev, wol);
  3942  }
  3943  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


