[PATCH v1] net: phy: micrel: add KSZ8795 ethernet switch
This adds support for the PHYs in the KSZ8795 5-port managed switch. It
allows detecting the link between the switch and the SoC, and it uses the
same read_status functions as the KSZ8873MLL switch.

Unfortunately, this Ethernet switch has the same PHY ID as the KSZ8051.

Signed-off-by: Sean Nyekjaer
---
 drivers/net/phy/micrel.c   | 14 ++++++++++++++
 include/linux/micrel_phy.h |  2 ++
 2 files changed, 16 insertions(+)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index ea92d524d5a8..fa158ae5115b 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -1014,6 +1014,20 @@ static struct phy_driver ksphy_driver[] = {
 	.get_stats	= kszphy_get_stats,
 	.suspend	= genphy_suspend,
 	.resume		= genphy_resume,
+}, {
+	.phy_id		= PHY_ID_KSZ8795,
+	.phy_id_mask	= MICREL_PHY_ID_MASK,
+	.name		= "Micrel KSZ8795 Switch",
+	.features	= (SUPPORTED_Pause | SUPPORTED_Asym_Pause),
+	.flags		= PHY_HAS_MAGICANEG | PHY_HAS_INTERRUPT,
+	.config_init	= kszphy_config_init,
+	.config_aneg	= ksz8873mll_config_aneg,
+	.read_status	= ksz8873mll_read_status,
+	.get_sset_count = kszphy_get_sset_count,
+	.get_strings	= kszphy_get_strings,
+	.get_stats	= kszphy_get_stats,
+	.suspend	= genphy_suspend,
+	.resume		= genphy_resume,
 } };
 
 module_phy_driver(ksphy_driver);
diff --git a/include/linux/micrel_phy.h b/include/linux/micrel_phy.h
index 257173e0095e..f541da68d1e7 100644
--- a/include/linux/micrel_phy.h
+++ b/include/linux/micrel_phy.h
@@ -35,6 +35,8 @@
 #define PHY_ID_KSZ886X		0x00221430
 #define PHY_ID_KSZ8863		0x00221435
 
+#define PHY_ID_KSZ8795		0x00221550
+
 /* struct phy_device dev_flags definitions */
 #define MICREL_PHY_50MHZ_CLK	0x0001
 #define MICREL_PHY_FXEN		0x0002
-- 
2.11.0
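The ID collision mentioned in the commit message follows from how PHY drivers are matched: a driver claims a device when both IDs agree under the driver's `phy_id_mask`. Below is a minimal user-space sketch of that comparison. It assumes `PHY_ID_KSZ8051` is 0x00221550 and `MICREL_PHY_ID_MASK` is 0x00fffff0, as in `micrel_phy.h`; neither value appears in this patch. It also simplifies driver lookup to "the first matching table entry wins". `drv_entry`, `table`, `id_matches` and `first_match` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values from include/linux/micrel_phy.h (not shown in this patch). */
#define MICREL_PHY_ID_MASK 0x00fffff0u
#define PHY_ID_KSZ8051     0x00221550u
#define PHY_ID_KSZ8795     0x00221550u /* as added by the patch */

/* Hypothetical, cut-down stand-in for struct phy_driver. */
struct drv_entry {
	const char *name;
	uint32_t phy_id;
	uint32_t phy_id_mask;
};

/* The KSZ8051 entry precedes the new entry, which is appended at the end. */
static const struct drv_entry table[] = {
	{ "Micrel KSZ8051", PHY_ID_KSZ8051, MICREL_PHY_ID_MASK },
	{ "Micrel KSZ8795 Switch", PHY_ID_KSZ8795, MICREL_PHY_ID_MASK },
};

/* Compare driver ID and device ID under the driver's mask. */
static int id_matches(const struct drv_entry *d, uint32_t id)
{
	return (d->phy_id & d->phy_id_mask) == (id & d->phy_id_mask);
}

/* Simplified lookup: return the first entry whose masked ID matches. */
static const char *first_match(uint32_t id)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (id_matches(&table[i], id))
			return table[i].name;
	return NULL;
}
```

With identical masked IDs, both entries match any device reporting 0x0022155x, so table order alone decides which driver the sketch picks.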
Re: [PATCH net-next v6 1/1] net sched actions: Add support for user cookies
Sun, Jan 22, 2017 at 09:25:50PM CET, j...@mojatatu.com wrote:
>From: Jamal Hadi Salim
>
>Introduce optional 128-bit action cookie.
>Like all other cookie schemes in the networking world (eg in protocols
>like http or existing kernel fib protocol field, etc) the idea is to save
>user state that when retrieved serves as a correlator. The kernel
>_should not_ intepret it. The user can store whatever they wish in the
>128 bits.
>
>Sample exercise(showing variable length use of cookie)
>
>.. create an accept action with cookie a1b2c3d4
>sudo $TC actions add action ok index 1 cookie a1b2c3d4
>
>.. dump all gact actions..
>sudo $TC -s actions ls action gact
>
>action order 0: gact action pass
> random type none pass val 0
> index 1 ref 1 bind 0 installed 5 sec used 5 sec
>Action statistics:
>Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>cookie a1b2c3d4
>
>.. bind the accept action to a filter..
>sudo $TC filter add dev lo parent : protocol ip prio 1 \
>u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
>
>... send some traffic..
>$ ping 127.0.0.1 -c 3
>PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
>64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
>64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
>64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
>
>--- 127.0.0.1 ping statistics ---
>3 packets transmitted, 3 received, 0% packet loss, time 2109ms
>rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1
>
>... show some stats
>$ sudo $TC -s actions get action gact index 1
>
>action order 1: gact action pass
> random type none pass val 0
> index 1 ref 2 bind 1 installed 204 sec used 5 sec
>Action statistics:
>Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>cookie a1b2c3d4
>
>.. try longer cookie...
>$ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef
>.. dump..
>$ sudo $TC -s actions ls action gact
>
>action order 1: gact action pass
> random type none pass val 0
> index 1 ref 2 bind 1 installed 204 sec used 5 sec
>Action statistics:
>Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>cookie 1234567890abcdef
>
>Signed-off-by: Jamal Hadi Salim

Reviewed-by: Jiri Pirko
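The sample session sets the cookie as a hex string of variable length (4 bytes, then 8 bytes), bounded by the 128-bit limit. The kernel treats it as opaque. Below is a minimal sketch of turning such a string into at most 16 raw bytes. It assumes hex input with an even number of digits. `parse_cookie`, `hexval` and `COOKIE_MAX_LEN` are hypothetical names, not the iproute2 implementation.

```c
#include <stddef.h>
#include <string.h>

#define COOKIE_MAX_LEN 16 /* 128 bits, per the commit message */

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Hypothetical parser: returns the number of cookie bytes written to out,
 * or -1 if the string is empty, has an odd length, is longer than 128 bits,
 * or contains a non-hex character. */
static int parse_cookie(const char *hex, unsigned char out[COOKIE_MAX_LEN])
{
	size_t n = strlen(hex);

	if (n == 0 || n % 2 != 0 || n / 2 > COOKIE_MAX_LEN)
		return -1;
	for (size_t i = 0; i < n / 2; i++) {
		int hi = hexval(hex[2 * i]);
		int lo = hexval(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return (int)(n / 2);
}
```

Returning the byte count keeps the variable-length property: "a1b2c3d4" yields 4 bytes and "1234567890abcdef" yields 8.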
Re: [PATCH 2/3] sh_eth: add missing EESIPR bits
On Sun, Jan 22, 2017 at 8:18 PM, Sergei Shtylyov wrote:
> Renesas SH77{34|63} manuals describe more EESIPR bits than the current
> driver. Declare the new bits with the end goal of using the bit names
> instead of the bare numbers for the 'sh_eth_cpu_data::eesipr_value'
> initializers...
>
> Signed-off-by: Sergei Shtylyov

Reviewed-by: Geert Uytterhoeven

> ---
>  drivers/net/ethernet/renesas/sh_eth.h |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
> ===
> --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
> +++ net-next/drivers/net/ethernet/renesas/sh_eth.h
> @@ -269,13 +269,17 @@ enum EESR_BIT {
>
>  /* EESIPR */
>  enum EESIPR_BIT {
> -	EESIPR_TWBIP	= 0x4000,
> +	EESIPR_TWB1IP	= 0x8000,
> +	EESIPR_TWBIP	= 0x4000,	/* same as TWB0IP */

Ah, you're adding it here ;-)

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds
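The stated end goal is to build `eesipr_value` from named bits rather than bare numbers. Below is a minimal sketch of that style. It assumes, for illustration only, that TWB1IP and TWB0IP are bits 31 and 30 (check the SoC manual). The `SKETCH_` names and `sketch_eesipr_value` are hypothetical, not the driver's identifiers.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define SKETCH_EESIPR_TWB1IP ((uint32_t)1 << 31)
#define SKETCH_EESIPR_TWB0IP ((uint32_t)1 << 30)
/* Old single-bit name kept as an alias, mirroring "same as TWB0IP". */
#define SKETCH_EESIPR_TWBIP  SKETCH_EESIPR_TWB0IP

/* A mask initializer that spells out which interrupts it enables,
 * instead of an opaque constant such as 0xc0000000. */
static const uint32_t sketch_eesipr_value =
	SKETCH_EESIPR_TWB1IP | SKETCH_EESIPR_TWB0IP;
```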
Re: [PATCH 1/3] sh_eth: rename EESIPR bits
Hi Sergei,

On Sun, Jan 22, 2017 at 8:18 PM, Sergei Shtylyov wrote:
> Since the commit b0ca2a21f769 ("sh_eth: Add support of SH7763 to sh_eth")
> the *enum* declaring the EESIPR bits (interrupt mask) went out of sync with
> the *enum* declaring the EESR bits (interrupt status) WRT bit naming and
> formatting. I'd like to restore the consistency by using EESIPR as the bit
> name prefix, renaming the *enum* to EESIPR_BIT, and (finally) renaming the
> bits according to the available Renesas SH77{34|63} manuals...

Which versions of the SH77{34|63} manuals did you use? Several registers are
named slightly differently in mine, and also in my r8a7740 manual.

> --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
> +++ net-next/drivers/net/ethernet/renesas/sh_eth.h
> @@ -268,19 +268,29 @@ enum EESR_BIT {
>  				 EESR_TFE | EESR_TDE)
>
>  /* EESIPR */
> -enum DMAC_IM_BIT {
> -	DMAC_M_TWB = 0x4000, DMAC_M_TABT = 0x0400,
> -	DMAC_M_RABT = 0x0200,
> -	DMAC_M_RFRMER = 0x0100, DMAC_M_ADF = 0x0080,
> -	DMAC_M_ECI = 0x0040, DMAC_M_FTC = 0x0020,
> -	DMAC_M_TDE = 0x0010, DMAC_M_TFE = 0x0008,
> -	DMAC_M_FRC = 0x0004, DMAC_M_RDE = 0x0002,
> -	DMAC_M_RFE = 0x0001, DMAC_M_TINT4 = 0x0800,
> -	DMAC_M_TINT3 = 0x0400, DMAC_M_TINT2 = 0x0200,
> -	DMAC_M_TINT1 = 0x0100, DMAC_M_RINT8 = 0x0080,
> -	DMAC_M_RINT5 = 0x0010, DMAC_M_RINT4 = 0x0008,
> -	DMAC_M_RINT3 = 0x0004, DMAC_M_RINT2 = 0x0002,
> -	DMAC_M_RINT1 = 0x0001,
> +enum EESIPR_BIT {
> +	EESIPR_TWBIP	= 0x4000,

TWBIP is actually two bits in my manual: TWB1IP and TWB0IP.

> +	EESIPR_ADEIP	= 0x0080,

Nonexistent bit in my manual.

> +	EESIPR_CNDIP	= 0x0800,

Nonexistent bit in my manual.

Gr{oetje,eeting}s,

						Geert
Re: [PATCH] net: phy: micrel: add KSZ8795 ethernet switch
On 2017-01-20 15:17, Andrew Lunn wrote:
> On Fri, Jan 20, 2017 at 01:50:49PM +0100, Sean Nyekjaer wrote:
>> This ethernet switch have unfortunately the same phy id as KSZ8051.
>
> Hi Sean
>
> Please could you explain some more. You are adding PHY support here,
> not switch support. So is this to enable the PHY driver for the PHYs
> embedded in the switch?
>
> Andrew

Yes, of course :-)
The KSZ8795 is a 5-port managed Ethernet switch with integrated PHYs and an
MII/RMII interface on one port. Through the MDIO interface, it is possible
to control the PHYs on ports 1-5.

I have just seen an issue with the reported speed and duplex, so I'm going
to submit a new version with a better description.

/Sean
[PATCH v4 net-next] net: mvneta: implement .set_wol and .get_wol
From: Jingju Hou

The mvneta itself does not support WOL, but the PHY might,
so pass the calls to the PHY.

Signed-off-by: Jingju Hou
Signed-off-by: Jisheng Zhang
---
since v3:
 - really fix the build error
since v2,v1:
 - using phy_dev member in struct net_device
 - add commit msg

 drivers/net/ethernet/marvell/mvneta.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 6dcc951af0ff..02611fa1c3b8 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3929,6 +3929,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
 	return 0;
 }
 
+static void mvneta_ethtool_get_wol(struct net_device *dev,
+				   struct ethtool_wolinfo *wol)
+{
+	wol->supported = 0;
+	wol->wolopts = 0;
+
+	if (dev->phydev)
+		return phy_ethtool_get_wol(dev->phydev, wol);
+}
+
+static int mvneta_ethtool_set_wol(struct net_device *dev,
+				  struct ethtool_wolinfo *wol)
+{
+	if (!dev->phydev)
+		return -EOPNOTSUPP;
+
+	return phy_ethtool_set_wol(dev->phydev, wol);
+}
+
 static const struct net_device_ops mvneta_netdev_ops = {
 	.ndo_open            = mvneta_open,
 	.ndo_stop            = mvneta_stop,
@@ -3958,6 +3977,8 @@ const struct ethtool_ops mvneta_eth_tool_ops = {
 	.set_rxfh	= mvneta_ethtool_set_rxfh,
 	.get_link_ksettings = phy_ethtool_get_link_ksettings,
 	.set_link_ksettings = mvneta_ethtool_set_link_ksettings,
+	.get_wol        = mvneta_ethtool_get_wol,
+	.set_wol        = mvneta_ethtool_set_wol,
 };
 
 /* Initialize hw */
-- 
2.11.0
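The pass-through pattern in this patch has three parts. `get_wol` reports "nothing supported" by default and lets an attached PHY overwrite that. `set_wol` fails with `-EOPNOTSUPP` when there is no PHY. Otherwise both calls defer to the PHY. Below is a minimal user-space sketch of that pattern. The structs and the `mock_phy_*` helpers are hypothetical stand-ins for `struct net_device`, `struct phy_device` and the phylib helpers. Here the void helper is called without `return`, because both the wrapper and the helper return void.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types involved. */
struct ethtool_wolinfo { uint32_t supported, wolopts; };
struct phy_device { uint32_t wol_supported, wol_enabled; };
struct net_device { struct phy_device *phydev; };

/* Mock PHY helpers (not the phylib implementation). */
static void mock_phy_get_wol(struct phy_device *phy, struct ethtool_wolinfo *wol)
{
	wol->supported = phy->wol_supported;
	wol->wolopts = phy->wol_enabled;
}

static int mock_phy_set_wol(struct phy_device *phy, struct ethtool_wolinfo *wol)
{
	if (wol->wolopts & ~phy->wol_supported)
		return -EOPNOTSUPP;   /* requested a mode the PHY lacks */
	phy->wol_enabled = wol->wolopts;
	return 0;
}

/* MAC-level wrappers: the MAC has no WOL logic of its own. */
static void mac_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
{
	wol->supported = 0;           /* default: nothing supported */
	wol->wolopts = 0;
	if (dev->phydev)
		mock_phy_get_wol(dev->phydev, wol); /* void call, no 'return' */
}

static int mac_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
{
	if (!dev->phydev)
		return -EOPNOTSUPP;
	return mock_phy_set_wol(dev->phydev, wol);
}
```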
[PATCH net] r8152: don't execute runtime suspend if the tx is not empty
Runtime suspend shouldn't be executed if the tx queue is not empty, because the device is not idle. Signed-off-by: Hayes Wang --- drivers/net/usb/r8152.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 0e99af0..e1466b4 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -32,7 +32,7 @@ #define NETNEXT_VERSION"08" /* Information for net */ -#define NET_VERSION"6" +#define NET_VERSION"7" #define DRIVER_VERSION "v1." NETNEXT_VERSION "." NET_VERSION #define DRIVER_AUTHOR "Realtek linux nic maintainers " @@ -3574,6 +3574,8 @@ static bool delay_autosuspend(struct r8152 *tp) */ if (!sw_linking && tp->rtl_ops.in_nway(tp)) return true; + else if (!skb_queue_empty(&tp->tx_queue)) + return true; else return false; } -- 2.7.4
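The hunk extends the tail of `delay_autosuspend()` so that a non-empty tx queue also postpones runtime suspend. Below is a minimal sketch of only that tail of the decision. The checks earlier in the real function are omitted. `struct dev_state` and `delay_autosuspend_sketch` are hypothetical, and a counter stands in for `skb_queue_empty(&tp->tx_queue)`.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs to the tail of delay_autosuspend(). */
struct dev_state {
	bool carrier_ok;     /* "sw_linking" in the driver */
	bool in_nway;        /* result of tp->rtl_ops.in_nway(tp) */
	unsigned tx_queued;  /* stand-in for the length of tp->tx_queue */
};

/* Returns true when runtime suspend should be postponed. */
static bool delay_autosuspend_sketch(const struct dev_state *s)
{
	if (!s->carrier_ok && s->in_nway)
		return true;    /* existing check, unchanged by the patch */
	else if (s->tx_queued != 0)
		return true;    /* new check: pending tx means the device is not idle */
	else
		return false;
}
```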
Re: [PATCH] net: stmicro: fix LS field mask in EEE configuration
Acked-by:Rayagond Kokatanur On Fri, Jan 20, 2017 at 9:30 PM, Joao Pinto wrote: > This patch fixes the LS mask when setting EEE timer. > LS field is 10 bits long and not 11 as currently. > > Signed-off-by: Joao Pinto > Reported-By: Rayagond Kokatanur > --- > drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c > b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c > index 834f40f..202216c 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c > @@ -184,7 +184,7 @@ static void dwmac4_set_eee_pls(struct mac_device_info > *hw, int link) > static void dwmac4_set_eee_timer(struct mac_device_info *hw, int ls, int tw) > { > void __iomem *ioaddr = hw->pcsr; > - int value = ((tw & 0x)) | ((ls & 0x7ff) << 16); > + int value = ((tw & 0x)) | ((ls & 0x3ff) << 16); > > /* Program the timers in the LPI timer control register: > * LS: minimum time (ms) for which the link > -- > 2.9.3 > -- wwr Rayagond
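The fix narrows the LS mask from 11 to 10 bits before LS is shifted to bit 16. With the old mask, an LS value of 0x400 or more also set bit 26, outside the field. Below is a minimal sketch of the packing. It assumes TW occupies bits 15:0 with a 16-bit mask; the TW mask is garbled in the quoted diff. It uses `uint32_t` instead of `int` so the shift stays well defined. `lpi_timer_value` and the `LPI_*` macros are hypothetical names.

```c
#include <stdint.h>

#define LPI_TW_MASK  0xffffu  /* assumption: TW is 16 bits, bits 15:0 */
#define LPI_LS_MASK  0x3ffu   /* 10-bit LS field, per the fix */
#define LPI_LS_SHIFT 16       /* LS starts at bit 16 */

/* Pack LS and TW; ls_mask is a parameter so the old mask can be compared. */
static uint32_t lpi_timer_value(uint32_t ls, uint32_t tw, uint32_t ls_mask)
{
	return (tw & LPI_TW_MASK) | ((ls & ls_mask) << LPI_LS_SHIFT);
}
```

For example, `lpi_timer_value(0x7ff, 0, 0x7ff)` gives 0x07ff0000, which includes bit 26. With `LPI_LS_MASK` the result is 0x03ff0000. In-range values such as 1000 (0x3e8) pack the same way under both masks.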
Re: [PATCH net-next] net: dsa: Fix inverted test for multiple CPU interface
On 01/22/2017 01:16 PM, Andrew Lunn wrote: > Remove the wrong !, otherwise we get false positives about having > multiple CPU interfaces. > > Fixes: b22de490869d ("net: dsa: store CPU switch structure in the tree") > Signed-off-by: Andrew Lunn Reviewed-by: Florian Fainelli -- Florian
[PATCH net-next 0/2] net: couple mdio_module_driver changes
Hi David, Small patch series fixing a comment for mdio_module_driver and finally utilizing it in b53_mdio. Thanks! Florian Fainelli (2): net: phy: Fix typo for MDIO module boilerplate comment net: dsa: b53: Utilize mdio_module_driver drivers/net/dsa/b53/b53_mdio.c | 13 + include/linux/mdio.h | 2 +- 2 files changed, 2 insertions(+), 13 deletions(-) -- 2.9.3
[PATCH net-next 1/2] net: phy: Fix typo for MDIO module boilerplate comment
The module boilerplate macro is named mdio_module_driver and not module_mdio_driver, fix that. Fixes: a9049e0c513c ("mdio: Add support for mdio drivers.") Signed-off-by: Florian Fainelli --- include/linux/mdio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/mdio.h b/include/linux/mdio.h index b6587a4b32e7..55a80d73cfc1 100644 --- a/include/linux/mdio.h +++ b/include/linux/mdio.h @@ -265,7 +265,7 @@ bool mdiobus_is_registered_device(struct mii_bus *bus, int addr); struct phy_device *mdiobus_get_phy(struct mii_bus *bus, int addr); /** - * module_mdio_driver() - Helper macro for registering mdio drivers + * mdio_module_driver() - Helper macro for registering mdio drivers * * Helper macro for MDIO drivers which do not do anything special in module * init/exit. Each module may only use this macro once, and calling it -- 2.9.3
[PATCH net-next 2/2] net: dsa: b53: Utilize mdio_module_driver
Eliminate a bit of boilerplate code. Signed-off-by: Florian Fainelli --- drivers/net/dsa/b53/b53_mdio.c | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/drivers/net/dsa/b53/b53_mdio.c b/drivers/net/dsa/b53/b53_mdio.c index 477a16b5660a..fa7556f5d4fb 100644 --- a/drivers/net/dsa/b53/b53_mdio.c +++ b/drivers/net/dsa/b53/b53_mdio.c @@ -375,18 +375,7 @@ static struct mdio_driver b53_mdio_driver = { .of_match_table = b53_of_match, }, }; - -static int __init b53_mdio_driver_register(void) -{ - return mdio_driver_register(&b53_mdio_driver); -} -module_init(b53_mdio_driver_register); - -static void __exit b53_mdio_driver_unregister(void) -{ - mdio_driver_unregister(&b53_mdio_driver); -} -module_exit(b53_mdio_driver_unregister); +mdio_module_driver(b53_mdio_driver); MODULE_DESCRIPTION("B53 MDIO access driver"); MODULE_LICENSE("Dual BSD/GPL"); -- 2.9.3
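The removed `module_init()`/`module_exit()` pair is exactly the code that a helper macro can generate from the driver name plus register and unregister functions. As far as I know, `mdio_module_driver()` is layered on the generic `module_driver()` helper in this way. Below is a user-space analogue that shows the token-pasting pattern. `MOCK_MODULE_DRIVER`, `struct mock_driver` and the `mock_driver_*` functions are hypothetical, and no kernel init/exit sections are involved.

```c
/* Hypothetical stand-ins: a driver object and its register/unregister hooks. */
struct mock_driver {
	const char *name;
	int registered;
};

static int mock_driver_register(struct mock_driver *drv)
{
	drv->registered = 1;
	return 0;
}

static void mock_driver_unregister(struct mock_driver *drv)
{
	drv->registered = 0;
}

/* Generates <drv>_init() and <drv>_exit(), the pair b53_mdio.c used to
 * write by hand. */
#define MOCK_MODULE_DRIVER(drv, reg_fn, unreg_fn)			\
	static int drv##_init(void) { return reg_fn(&(drv)); }		\
	static void drv##_exit(void) { unreg_fn(&(drv)); }

static struct mock_driver b53_mock_driver = { "b53-mdio-mock", 0 };

MOCK_MODULE_DRIVER(b53_mock_driver, mock_driver_register, mock_driver_unregister)
```

In the kernel version, the generated functions are also handed to `module_init()`/`module_exit()`. That step is what "each module may only use this macro once" refers to.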
Re: [PATCH net-next] net: ipv6: ignore null_entry on route dumps
David, please slow down. How is the NULL entry getting selected to be dumped and passed down here in the first place? The problem seems to be higher up in the chain here, don't just special case check for this in rt6_dump_route(). Thanks.
[PATCH net-next] net: ipv6: ignore null_entry on route dumps
lkp-robot reported a BUG:
[ 10.151226] BUG: unable to handle kernel NULL pointer dereference at 0198
[ 10.152525] IP: rt6_fill_node+0x164/0x4b8
[ 10.153307] *pdpt = 12ee5001 *pde =
[ 10.153309]
[ 10.154492] Oops: [#1]
[ 10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70ee162-dirty #10
[ 10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 10.158254] task: d0deb000 task.stack: d0e0c000
[ 10.159059] EIP: rt6_fill_node+0x164/0x4b8
[ 10.159780] EFLAGS: 00010296 CPU: 0
[ 10.160404] EAX: EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[ 10.161469] ESI: EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[ 10.162534] DS: 007b ES: 007b FS: GS: 0033 SS: 0068
[ 10.163482] CR0: 80050033 CR2: 0198 CR3: 10d94660 CR4: 06b0
[ 10.164535] Call Trace:
[ 10.164993] ? paravirt_sched_clock+0x9/0xd
[ 10.165727] ? sched_clock+0x9/0xc
[ 10.166329] ? sched_clock_cpu+0x19/0xe9
[ 10.166991] ? lock_release+0x13e/0x36c
[ 10.167652] rt6_dump_route+0x4c/0x56
[ 10.168276] fib6_dump_node+0x1d/0x3d
[ 10.168913] fib6_walk_continue+0xab/0x167
[ 10.169611] fib6_walk+0x2a/0x40
[ 10.170182] inet6_dump_fib+0xfb/0x1e0
[ 10.170855] netlink_dump+0xcd/0x21f

This happens when the loopback device is set down and an IPv6 FIB route
dump is requested.

The IPv6 route dump code passes ip6_null_entry to rt6_fill_node. This
route uses the loopback device but does not have idev set. When the
loopback device is set down, the netif_running check added by a1a22c1206
fails, and rt6_fill_node descends to checking rt->rt6i_idev for
ignore_routes_with_linkdown. Since idev is NULL for the ip6_null_entry
route, this triggers the BUG.

The null_entry route should not be processed in a dump request. Catch
and ignore it.
Fixes: a1a22c1206("net: ipv6: Keep nexthop of multipath route on admin down") Signed-off-by: David Ahern --- net/ipv6/route.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 4b1f0f98a0e9..47499ed429da 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3320,6 +3320,10 @@ static int rt6_fill_node(struct net *net, int rt6_dump_route(struct rt6_info *rt, void *p_arg) { struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg; + struct net *net = arg->net; + + if (rt == net->ipv6.ip6_null_entry) + return 0; if (nlmsg_len(arg->cb->nlh) >= sizeof(struct rtmsg)) { struct rtmsg *rtm = nlmsg_data(arg->cb->nlh); @@ -3332,7 +3336,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg) } } - return rt6_fill_node(arg->net, + return rt6_fill_node(net, arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE, NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq, NLM_F_MULTI); -- 2.1.4
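This version of the fix treats `ip6_null_entry` as a marker: the per-route dump callback compares the route pointer against the sentinel and returns 0 without emitting anything. Below is a minimal sketch of that skip-the-sentinel pattern. It replaces the fib6 tree with a flat array. `struct route`, `struct dump_arg`, `dump_route` and `walk_routes` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical route record; only what the sketch needs. */
struct route {
	const char *dst;
};

struct dump_arg {
	const struct route *null_entry;  /* stand-in for net->ipv6.ip6_null_entry */
	int emitted;                      /* routes that would reach user space */
};

/* Per-route callback, in the spirit of rt6_dump_route(). */
static int dump_route(const struct route *rt, void *p_arg)
{
	struct dump_arg *arg = p_arg;

	if (rt == arg->null_entry)
		return 0;        /* marker, not a real route: skip it */
	arg->emitted++;      /* real code would fill a netlink message here */
	return 0;
}

/* Walk a flat array instead of the fib6 tree; stop on the first error. */
static int walk_routes(const struct route *const *routes, size_t n,
		       struct dump_arg *arg)
{
	for (size_t i = 0; i < n; i++) {
		int err = dump_route(routes[i], arg);

		if (err)
			return err;
	}
	return 0;
}
```

Comparing pointers rather than route contents matches the patch, which tests `rt == net->ipv6.ip6_null_entry`.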
RE: [PATCH v3 net-next] net: mvneta: implement .set_wol and .get_wol
> -Original Message- > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] > On Behalf Of Jingju Hou > Sent: Monday, January 23, 2017 12:11 PM > To: da...@davemloft.net > Cc: jszh...@marvell.com; thomas.petazz...@free-electrons.com; > netdev@vger.kernel.org; Jingju Hou > Subject: [PATCH v3 net-next] net: mvneta: implement .set_wol and .get_wol > > The mvneta itself does not support WOL, but the PHY might. > So pass the calls to the PHY > > Signed-off-by: Jingju Hou > --- > Since v2: > - it should be phydev member not phy_dev > > drivers/net/ethernet/marvell/mvneta.c | 21 + > 1 file changed, 21 insertions(+) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index e05e227..fea4968 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct > net_device *dev, u32 *indir, u8 *key, > return 0; > } > > +static void > +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo > *wol) > +{ > + wol->supported = 0; > + wol->wolopts = 0; > + > + if (dev->phy_dev) Not changed, > + return phy_ethtool_get_wol(dev->phydev, wol); > +} > + > +static int > +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) > +{ > + if (!dev->phydev) > + return -EOPNOTSUPP; > + > + return phy_ethtool_set_wol(dev->phydev, wol); > +} > + > static const struct net_device_ops mvneta_netdev_ops = { > .ndo_open= mvneta_open, > .ndo_stop= mvneta_stop, > @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct > net_device *dev, u32 *indir, u8 *key, > .set_rxfh = mvneta_ethtool_set_rxfh, > .get_link_ksettings = phy_ethtool_get_link_ksettings, > .set_link_ksettings = mvneta_ethtool_set_link_ksettings, > + .get_wol= mvneta_ethtool_get_wol, > + .set_wol= mvneta_ethtool_set_wol, > }; > > /* Initialize hw */ > -- > 1.9.1
Re: [PATCH net-next] net: ipv6: Check that idev is non-NULL in rt6_fill_node
On 1/22/17 9:32 PM, David Miller wrote: > From: David Ahern > Date: Sun, 22 Jan 2017 20:08:00 -0800 > >> The ipv6 route dump code passes ip6_null_entry to rt6_fill_node. > > Doesn't this fact cause you to take a pause? yes, it did. > > I can't see a legitimate reason to dump the null entry, it's > a marker rather than a real entry. > neither do I. I was rather surprised to see it hit rt6_fill_node and that the rc is 0. I can send a v2 that drops null_entry.
Re: [PATCH net-next] net: ipv6: Check that idev is non-NULL in rt6_fill_node
From: David Ahern Date: Sun, 22 Jan 2017 20:08:00 -0800 > The ipv6 route dump code passes ip6_null_entry to rt6_fill_node. Doesn't this fact cause you to take a pause? I can't see a legitimate reason to dump the null entry, it's a marker rather than a real entry.
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Thu, Jan 19, 2017 at 08:04:59PM -0800, Andy Lutomirski wrote: > On Thu, Jan 19, 2017 at 6:39 PM, Alexei Starovoitov > wrote: > > On Wed, Jan 18, 2017 at 06:29:22PM -0800, Andy Lutomirski wrote: > >> I think it could work by making a single socket cgroup controller that > >> handles all cgroup things that are bound to a socket. Using > > > > Such 'socket cgroup controller' would limit usability of the feature > > to sockets and force all other use cases like landlock to invent > > their own wheel, which is undesirable. Everyone will be > > inventing new 'foo cgroup controller', while all of them > > are really bpf features. They are different bpf program > > types that attach to different hooks and use cgroup for scoping. > > Can you elaborate on why that would be a problem? In a cgroup v1 > world, users who want different hierarchies for different types of > control could easily want one hierarchy for socket hooks and a > different hierarchy for lsm hooks. In a cgroup v2 delegation world, I > could easily imagine the decision to delegate socket hooks being > different from the decision to delegate lsm hooks. Almost all of the > code would be shared between different bpf-using cgroup controllers. how do you think it can be enforced when directory is chowned? > >> Having thought about this some more, I think that making it would > >> alleviate a bunch of my concerns, as it would make the semantics if > >> the capable() check were relaxed to ns_capable() be sane. Here's what > > > > here we're on the same page. For any meaningful discussion about > > 'bpf cgroup controller' to happen bpf itself needs to become > > delegatable in cgroup sense. In other words BPF_PROG_TYPE_CGROUP* > > program types need to become available for unprivileged users. > > The only unprivileged prog type today is BPF_PROG_TYPE_SOCKET_FILTER. > > To make it secure we severely limited its functionality. 
> > All bpf advances since then (like new map types and verifier extensions) > > were done for root only. If early on the priv vs unpriv bpf features > > were 80/20. Now it's close to 95/5. No work has been done to > > make socket filter type more powerful. It still has to use > > slow-ish ld_abs skb access while tc/xdp have direct packet access. > > Things like register value tracking is root only as well and so on > > and so forth. > > We cannot just flip the switch and allow type_cgroup* to unpriv > > and I don't see any volunteers willing to do this work. > > Until that happens there is no point coming up with designs > > for 'cgroup bpf controller'... whatever that means. > > Sure there is. If delegation can be turned on without changing the > API, then the result will be easier to work with and have fewer > compatibility issues. ... and open() of the directory done by the current api will preserve cgroup delegation when and only when bpf_prog_type_cgroup_* becomes unprivileged. I'm not proposing creating new api here. > > > >> I currently should happen before bpf+cgroup is enabled in a release: > >> > >> 1. Make it netns-aware. This could be as simple as making it only > >> work in the root netns because then real netns awareness can be added > >> later without breaking anything. The current situation is bad in that > >> network namespaces are just ignored and it's plausible that people > >> will start writing user code that depends on having network namespaces > >> be ignored. > > > > nothing in bpf today is netns-aware and frankly I don't see > > how cgroup+bpf has anything to do with netns. > > For regular sockets+bpf we don't check netns. > > When tcpdump opens raw socket and attaches bpf there are no netns > > checks, since socket itself gives a scope for the program to run. > > Same thing applies to cgroup+bpf. cgroup gives a scope for the program. > > But, say, we indeed add 'if !root ns' check to BPF_CGROUP_INET_* > > hooks. 
> > > Here I completely disagree with you. tcpdump sees packets in its > network namespace. Regular sockets apply bpf filters to the packets > seen by that socket, and the socket itself is scoped to a netns. > > Meanwhile, cgroup+bpf actually appears to be buggy in this regard even > regardless of what semantics you think are better. sk_bound_dev_if is > exposed as a u32 value, but sk_bound_dev_if only has meaning within a > given netns. The "ip vrf" stuff will straight-up malfunction if a > process affected by its hook runs in a different netns from the netns > that "ip vrf" was run in. how is that any different from normal 'ip netns exec'? that is expected user behavior. > IOW, the current code is buggy. > > > Then if the hooks are used for security, the process > > only needs to do setns() to escape security sandbox. Obviously > > broken semantics. > > This could go both ways. If the goal is to filter packets, then it's > not really important to have the filter keep working if the sandboxed > task unshares netns -- in the new netns, there isn't any access to the > netwo
Re: [PATCH v3 net-next] net: mvneta: implement .set_wol and .get_wol
From: Jingju Hou
Date: Mon, 23 Jan 2017 12:11:18 +0800

> The mvneta itself does not support WOL, but the PHY might.
> So pass the calls to the PHY
>
> Signed-off-by: Jingju Hou
> ---
> Since v2:
> - it should be phydev member not phy_dev
>
>  drivers/net/ethernet/marvell/mvneta.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index e05e227..fea4968 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key,
>  	return 0;
>  }
>
> +static void
> +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
> +{
> +	wol->supported = 0;
> +	wol->wolopts = 0;
> +
> +	if (dev->phy_dev)

You are not testing the build of this patch; you are still using
phy_dev here.

Either that or your commit message is not accurate. Either way, this
patch or its commit message is wrong.

I think you need to stop, take a deep breath, and take your time
fixing this. Right now you are spitting out a new patch just minutes
after a previous submission, and these patches still have the same
bugs.
Re: [PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol
The same build error exists in all submissions of your patch. At this point you must absolutely reproduce this build failure yourself, and stop submitting this patch until you can test that the build failure is fixed.
[PATCH v3 net-next] net: mvneta: implement .set_wol and .get_wol
The mvneta itself does not support WOL, but the PHY might. So pass the calls to the PHY Signed-off-by: Jingju Hou --- Since v2: - it should be phydev member not phy_dev drivers/net/ethernet/marvell/mvneta.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index e05e227..fea4968 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, return 0; } +static void +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) +{ + wol->supported = 0; + wol->wolopts = 0; + + if (dev->phy_dev) + return phy_ethtool_get_wol(dev->phydev, wol); +} + +static int +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) +{ + if (!dev->phydev) + return -EOPNOTSUPP; + + return phy_ethtool_set_wol(dev->phydev, wol); +} + static const struct net_device_ops mvneta_netdev_ops = { .ndo_open= mvneta_open, .ndo_stop= mvneta_stop, @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, .set_rxfh = mvneta_ethtool_set_rxfh, .get_link_ksettings = phy_ethtool_get_link_ksettings, .set_link_ksettings = mvneta_ethtool_set_link_ksettings, + .get_wol= mvneta_ethtool_get_wol, + .set_wol= mvneta_ethtool_set_wol, }; /* Initialize hw */ -- 1.9.1
[PATCH net-next] net: ipv6: Check that idev is non-NULL in rt6_fill_node
lkp-robot reported a BUG:
[ 10.151226] BUG: unable to handle kernel NULL pointer dereference at 0198
[ 10.152525] IP: rt6_fill_node+0x164/0x4b8
[ 10.153307] *pdpt = 12ee5001 *pde =
[ 10.153309]
[ 10.154492] Oops: [#1]
[ 10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70ee162-dirty #10
[ 10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 10.158254] task: d0deb000 task.stack: d0e0c000
[ 10.159059] EIP: rt6_fill_node+0x164/0x4b8
[ 10.159780] EFLAGS: 00010296 CPU: 0
[ 10.160404] EAX: EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44
[ 10.161469] ESI: EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4
[ 10.162534] DS: 007b ES: 007b FS: GS: 0033 SS: 0068
[ 10.163482] CR0: 80050033 CR2: 0198 CR3: 10d94660 CR4: 06b0
[ 10.164535] Call Trace:
[ 10.164993] ? paravirt_sched_clock+0x9/0xd
[ 10.165727] ? sched_clock+0x9/0xc
[ 10.166329] ? sched_clock_cpu+0x19/0xe9
[ 10.166991] ? lock_release+0x13e/0x36c
[ 10.167652] rt6_dump_route+0x4c/0x56
[ 10.168276] fib6_dump_node+0x1d/0x3d
[ 10.168913] fib6_walk_continue+0xab/0x167
[ 10.169611] fib6_walk+0x2a/0x40
[ 10.170182] inet6_dump_fib+0xfb/0x1e0
[ 10.170855] netlink_dump+0xcd/0x21f

This happens when the loopback device is set down and an IPv6 FIB route
dump is requested.

The IPv6 route dump code passes ip6_null_entry to rt6_fill_node. This
route uses the loopback device but does not have idev set. When the
loopback device is set down, the netif_running check added by a1a22c1206
fails, and rt6_fill_node descends to checking rt->rt6i_idev for
ignore_routes_with_linkdown. Since idev is NULL for the ip6_null_entry
route, this triggers the BUG.
Fixes: a1a22c1206("net: ipv6: Keep nexthop of multipath route on admin down") Signed-off-by: David Ahern --- net/ipv6/route.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 5585c501a540..9a7cc7558104 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3218,7 +3218,8 @@ static int rt6_fill_node(struct net *net, rtm->rtm_flags = 0; if (!netif_carrier_ok(rt->dst.dev)) { rtm->rtm_flags |= RTNH_F_LINKDOWN; - if (rt->rt6i_idev->cnf.ignore_routes_with_linkdown) + if (rt->rt6i_idev && + rt->rt6i_idev->cnf.ignore_routes_with_linkdown) rtm->rtm_flags |= RTNH_F_DEAD; } rtm->rtm_scope = RT_SCOPE_UNIVERSE; -- 2.1.4
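This earlier version of the fix guards the dereference instead of skipping the entry. It relies on `&&` short-circuiting, so `rt6i_idev->cnf` is only read when `rt6i_idev` is non-NULL. Below is a minimal sketch of the flag computation with that guard. `SKETCH_F_LINKDOWN`/`SKETCH_F_DEAD` are hypothetical stand-ins for `RTNH_F_LINKDOWN`/`RTNH_F_DEAD`. `struct route6`, `struct idev_cnf` and `route_flags` are also hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, stand-ins for RTNH_F_LINKDOWN / RTNH_F_DEAD. */
#define SKETCH_F_LINKDOWN 0x1u
#define SKETCH_F_DEAD     0x2u

struct idev_cnf {
	int ignore_routes_with_linkdown;
};

struct route6 {
	bool carrier_ok;               /* netif_carrier_ok(rt->dst.dev) */
	const struct idev_cnf *idev;   /* NULL for an ip6_null_entry-like route */
};

static unsigned route_flags(const struct route6 *rt)
{
	unsigned flags = 0;

	if (!rt->carrier_ok) {
		flags |= SKETCH_F_LINKDOWN;
		/* && short-circuits, so a NULL idev is never dereferenced */
		if (rt->idev && rt->idev->ignore_routes_with_linkdown)
			flags |= SKETCH_F_DEAD;
	}
	return flags;
}
```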
Re: [PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol
Hi Jingju, [auto build test ERROR on net-next/master] url: https://github.com/0day-ci/linux/commits/Jingju-Hou/net-mvneta-implement-set_wol-and-get_wol/20170123-105218 config: m68k-allyesconfig (attached as .config) compiler: m68k-linux-gcc (GCC) 4.9.0 reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=m68k All errors (new ones prefixed by >>): drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_get_wol': >> drivers/net/ethernet/marvell/mvneta.c:3938:9: error: 'struct net_device' has >> no member named 'phy_dev' if (dev->phy_dev) ^ drivers/net/ethernet/marvell/mvneta.c:3939:33: error: 'struct net_device' has no member named 'phy_dev' return phy_ethtool_get_wol(dev->phy_dev, wol); ^ drivers/net/ethernet/marvell/mvneta.c:3939:3: warning: 'return' with a value, in function returning void return phy_ethtool_get_wol(dev->phy_dev, wol); ^ drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_set_wol': drivers/net/ethernet/marvell/mvneta.c:3945:10: error: 'struct net_device' has no member named 'phy_dev' if (!dev->phy_dev) ^ drivers/net/ethernet/marvell/mvneta.c:3948:32: error: 'struct net_device' has no member named 'phy_dev' return phy_ethtool_set_wol(dev->phy_dev, wol); ^ drivers/net/ethernet/marvell/mvneta.c:3949:1: warning: control reaches end of non-void function [-Wreturn-type] } ^ vim +3938 drivers/net/ethernet/marvell/mvneta.c 3932 static void 3933 mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) 3934 { 3935 wol->supported = 0; 3936 wol->wolopts = 0; 3937 > 3938 if (dev->phy_dev) 3939 return phy_ethtool_get_wol(dev->phy_dev, wol); 3940 } 3941 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol
Hi Jingju,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jingju-Hou/net-mvneta-implement-set_wol-and-get_wol/20170123-105218
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64

All errors (new ones prefixed by >>):

   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_get_wol':
>> drivers/net/ethernet/marvell/mvneta.c:3938:9: error: 'struct net_device' has no member named 'phy_dev'; did you mean 'phydev'?
     if (dev->phy_dev)
            ^~
   drivers/net/ethernet/marvell/mvneta.c:3939:33: error: 'struct net_device' has no member named 'phy_dev'; did you mean 'phydev'?
      return phy_ethtool_get_wol(dev->phy_dev, wol);
                                    ^~
   drivers/net/ethernet/marvell/mvneta.c:3939:10: warning: 'return' with a value, in function returning void
      return phy_ethtool_get_wol(dev->phy_dev, wol);
             ^~~
   drivers/net/ethernet/marvell/mvneta.c:3933:1: note: declared here
    mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
    ^~
   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_set_wol':
   drivers/net/ethernet/marvell/mvneta.c:3945:10: error: 'struct net_device' has no member named 'phy_dev'; did you mean 'phydev'?
     if (!dev->phy_dev)
             ^~
   drivers/net/ethernet/marvell/mvneta.c:3948:32: error: 'struct net_device' has no member named 'phy_dev'; did you mean 'phydev'?
     return phy_ethtool_set_wol(dev->phy_dev, wol);
                                   ^~
   drivers/net/ethernet/marvell/mvneta.c:3949:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +3938 drivers/net/ethernet/marvell/mvneta.c

  3932  static void
  3933  mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
  3934  {
  3935          wol->supported = 0;
  3936          wol->wolopts = 0;
  3937  
> 3938          if (dev->phy_dev)
  3939                  return phy_ethtool_get_wol(dev->phy_dev, wol);
  3940  }
  3941  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
.config.gz
Description: application/gzip
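Both build reports point at the same two problems: struct net_device names its PHY pointer phydev (as the ia64 compiler suggests), and mvneta_ethtool_get_wol() is a void function, so it should call phy_ethtool_get_wol() (which itself returns void) rather than return its result. The userspace sketch below shows the corrected delegation. It assumes stub versions of the kernel types and helpers: the struct layouts, the stub PHY behaviour and the fields wol_supported/wol_enabled are hypothetical stand-ins, not the kernel definitions. Only the WAKE_MAGIC value and the helper return types mirror the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define WAKE_MAGIC (1 << 5)	/* same bit as in linux/ethtool.h */

/* Hypothetical stand-ins for the kernel types: only the fields used here. */
struct ethtool_wolinfo {
	uint32_t supported;
	uint32_t wolopts;
};

struct phy_device {
	uint32_t wol_supported;	/* stub: wake modes this fake PHY can do */
	uint32_t wol_enabled;	/* stub: wake modes currently armed */
};

struct net_device {
	struct phy_device *phydev;	/* the member is phydev, not phy_dev */
};

/* Stubs mirroring the kernel return types: get is void, set returns int. */
static void phy_ethtool_get_wol(struct phy_device *phydev,
				struct ethtool_wolinfo *wol)
{
	wol->supported = phydev->wol_supported;
	wol->wolopts = phydev->wol_enabled;
}

static int phy_ethtool_set_wol(struct phy_device *phydev,
			       struct ethtool_wolinfo *wol)
{
	if (wol->wolopts & ~phydev->wol_supported)
		return -EINVAL;
	phydev->wol_enabled = wol->wolopts;
	return 0;
}

/* MAC-level hooks: report "no WOL" unless a PHY is attached. */
static void mvneta_ethtool_get_wol(struct net_device *dev,
				   struct ethtool_wolinfo *wol)
{
	wol->supported = 0;
	wol->wolopts = 0;

	if (dev->phydev)
		phy_ethtool_get_wol(dev->phydev, wol);	/* plain call, no 'return' */
}

static int mvneta_ethtool_set_wol(struct net_device *dev,
				  struct ethtool_wolinfo *wol)
{
	if (!dev->phydev)
		return -EOPNOTSUPP;

	return phy_ethtool_set_wol(dev->phydev, wol);
}
```

The get hook clears the fields first, so a MAC without an attached PHY still reports a well-defined "nothing supported" state.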
EPOLLERR on memory mapped netlink socket
Hi experts, I am new to netlink sockets. In my app, epoll_wait() continuously reports EPOLLERR on a netlink socket. epoll just notifies me that there is a read event on the socket (it does not tell me whether it is a read or an EPOLLERR). What could cause this, and what does EPOLLERR on a memory-mapped netlink socket mean? Does it mean that the other side of the netlink connection (the kernel side) closed the connection? Even if the kernel side closed the connection, why do I get non-stop, repeated EPOLLERRs on the netlink socket? What action should we take in such cases: just close the socket, or call getsockopt(SO_ERROR) to retrieve the pending error state from the socket and continue without closing it? How do we detect that the kernel side closed the connection? My understanding is: if we get a read event notification from epoll on a memory-mapped netlink socket, and the frame in the RX ring is neither NL_MMAP_STATUS_VALID nor NL_MMAP_STATUS_COPY, then we can conclude that this is a 'close()' from the remote kernel socket, and I can close the connection by calling close() on my netlink socket. Is the above understanding correct? Please
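One of the options raised above, reading the pending error with getsockopt(SO_ERROR), can be sketched generically. Per socket(7), SO_ERROR gets and clears the pending socket error, and the events mask that epoll_wait() fills in reports EPOLLERR separately from EPOLLIN. netlink(7) also notes that a receive-buffer overrun surfaces as ENOBUFS, meaning messages were dropped and the application should resynchronize rather than assume the peer went away. This is a hedged sketch with hypothetical helper names; it does not touch the memory-mapped RX ring or the NL_MMAP_STATUS_* frame states, and it does not settle whether the socket should be closed afterwards.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the pending socket error (0 if none), or -errno if getsockopt fails.
 * SO_ERROR reads and clears the error in one step. */
static int take_socket_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -errno;
	return err;
}

/* Wait once on epfd and classify the event:
 *   1  -> the ready fd is readable,
 *   0  -> timeout (or nothing actionable),
 *  <0  -> negative errno (epoll failure or pending socket error). */
static int wait_socket_event(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n = epoll_wait(epfd, &ev, 1, timeout_ms);

	if (n < 0)
		return -errno;
	if (n == 0)
		return 0;

	/* epoll does say which condition fired: check EPOLLERR explicitly. */
	if (ev.events & EPOLLERR) {
		int err = take_socket_error(ev.data.fd);

		if (err != 0)
			return err > 0 ? -err : err;
	}
	return (ev.events & EPOLLIN) ? 1 : 0;
}
```

The helpers assume the fd was registered with data.fd set to the socket itself; for a netlink socket an ENOBUFS result would then come back as -ENOBUFS.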
Re: [PATCH v2 net-next] net: phy: marvell: Add Wake from LAN support for 88E1510 PHY
On Mon, 23 Jan 2017 10:58:15 +0800 wrote: > This is test on BG4CT platform with 88E1518 marvell PHY. > > Signed-off-by: Jingju Hou Reviewed-by: Jisheng Zhang > --- > Since v1: > - add some commit messages > > drivers/net/phy/marvell.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c > index 0b78210..ed0d235 100644 > --- a/drivers/net/phy/marvell.c > +++ b/drivers/net/phy/marvell.c > @@ -1679,6 +1679,8 @@ static int marvell_probe(struct phy_device *phydev) > .ack_interrupt = &marvell_ack_interrupt, > .config_intr = &marvell_config_intr, > .did_interrupt = &m88e1121_did_interrupt, > + .get_wol = &m88e1318_get_wol, > + .set_wol = &m88e1318_set_wol, > .resume = &marvell_resume, > .suspend = &marvell_suspend, > .get_sset_count = marvell_get_sset_count,
Re: [PATCH] net: mvneta: implement .set_wol and .get_wol
Hi Jingju, On Mon, 23 Jan 2017 10:43:08 +0800 wrote: > The mvneta itself does not support WOL, but the PHY might. > So pass the calls to the PHY > > Signed-off-by: Jingju Hou > --- > Since v1: > - using phy_dev member in struct net_device I noticed that you sent a new v2 patch, so this patch should be ignored. Some tips: *the v2 patch title should be like: [PATCH v2] net: mvneta: implement .set_wol and .get_wol *you also added a commit msg in v2; you'd better mention it in the changes since v1. Thanks, Jisheng > > drivers/net/ethernet/marvell/mvneta.c | 21 + > 1 file changed, 21 insertions(+) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index e05e227..78869fa 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device > *dev, u32 *indir, u8 *key, > return 0; > } > > +static void > +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) > +{ > + wol->supported = 0; > + wol->wolopts = 0; > + > + if (dev->phy_dev) > + return phy_ethtool_get_wol(dev->phy_dev, wol); > +} > + > +static int > +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) > +{ > + if (!dev->phy_dev) > + return -EOPNOTSUPP; > + > + return phy_ethtool_set_wol(dev->phy_dev, wol); > +} > + > static const struct net_device_ops mvneta_netdev_ops = { > .ndo_open= mvneta_open, > .ndo_stop= mvneta_stop, > @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device > *dev, u32 *indir, u8 *key, > .set_rxfh = mvneta_ethtool_set_rxfh, > .get_link_ksettings = phy_ethtool_get_link_ksettings, > .set_link_ksettings = mvneta_ethtool_set_link_ksettings, > + .get_wol= mvneta_ethtool_get_wol, > + .set_wol= mvneta_ethtool_set_wol, > }; > > /* Initialize hw */
[PATCH v2 net-next] net: phy: marvell: Add Wake from LAN support for 88E1510 PHY
This is test on BG4CT platform with 88E1518 marvell PHY. Signed-off-by: Jingju Hou --- Since v1: - add some commit messages drivers/net/phy/marvell.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index 0b78210..ed0d235 100644 --- a/drivers/net/phy/marvell.c +++ b/drivers/net/phy/marvell.c @@ -1679,6 +1679,8 @@ static int marvell_probe(struct phy_device *phydev) .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, .did_interrupt = &m88e1121_did_interrupt, + .get_wol = &m88e1318_get_wol, + .set_wol = &m88e1318_set_wol, .resume = &marvell_resume, .suspend = &marvell_suspend, .get_sset_count = marvell_get_sset_count, -- 1.9.1
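For context on what the new .get_wol/.set_wol hooks enable: with magic-packet wake armed (for example with `ethtool -s <iface> wol g`), a PHY that supports it wakes the host on a "magic packet", a payload of six 0xFF bytes followed by the target MAC address repeated 16 times. The sketch below builds that 102-byte payload. The function and macro names are hypothetical, and which wake modes m88e1318_get_wol()/m88e1318_set_wol() actually advertise on the 88E1510 family is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WOL_SYNC_LEN    6	/* leading 0xFF bytes */
#define WOL_MAC_REPEATS 16	/* copies of the target MAC */
#define WOL_MAGIC_LEN   (WOL_SYNC_LEN + WOL_MAC_REPEATS * 6)	/* 102 bytes */

/* Fill buf with a Wake-on-LAN magic packet payload for mac.
 * Returns the payload length, or 0 if buf is too small. */
static size_t build_wol_magic(uint8_t *buf, size_t size, const uint8_t mac[6])
{
	if (size < WOL_MAGIC_LEN)
		return 0;

	memset(buf, 0xff, WOL_SYNC_LEN);	/* synchronization stream */
	for (int i = 0; i < WOL_MAC_REPEATS; i++)
		memcpy(buf + WOL_SYNC_LEN + i * 6, mac, 6);

	return WOL_MAGIC_LEN;
}
```

How the payload reaches the sleeping NIC (which transport, which destination) is outside this sketch.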
Re: [PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol
On Mon, 23 Jan 2017 10:44:07 +0800 Jingju Hou wrote: > The mvneta itself does not support WOL, but the PHY might. > So pass the calls to the PHY > > Signed-off-by: Jingju Hou Reviewed-by: Jisheng Zhang > --- > Since v1: > - using phy_dev member in struct net_device > > drivers/net/ethernet/marvell/mvneta.c | 21 + > 1 file changed, 21 insertions(+) > > diff --git a/drivers/net/ethernet/marvell/mvneta.c > b/drivers/net/ethernet/marvell/mvneta.c > index e05e227..78869fa 100644 > --- a/drivers/net/ethernet/marvell/mvneta.c > +++ b/drivers/net/ethernet/marvell/mvneta.c > @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device > *dev, u32 *indir, u8 *key, > return 0; > } > > +static void > +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) > +{ > + wol->supported = 0; > + wol->wolopts = 0; > + > + if (dev->phy_dev) > + return phy_ethtool_get_wol(dev->phy_dev, wol); > +} > + > +static int > +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) > +{ > + if (!dev->phy_dev) > + return -EOPNOTSUPP; > + > + return phy_ethtool_set_wol(dev->phy_dev, wol); > +} > + > static const struct net_device_ops mvneta_netdev_ops = { > .ndo_open= mvneta_open, > .ndo_stop= mvneta_stop, > @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device > *dev, u32 *indir, u8 *key, > .set_rxfh = mvneta_ethtool_set_rxfh, > .get_link_ksettings = phy_ethtool_get_link_ksettings, > .set_link_ksettings = mvneta_ethtool_set_link_ksettings, > + .get_wol= mvneta_ethtool_get_wol, > + .set_wol= mvneta_ethtool_set_wol, > }; > > /* Initialize hw */
[PATCHv2 net-next] net: mvneta: implement .set_wol and .get_wol
The mvneta itself does not support WOL, but the PHY might. So pass the calls to the PHY Signed-off-by: Jingju Hou --- Since v1: - using phy_dev member in struct net_device drivers/net/ethernet/marvell/mvneta.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index e05e227..78869fa 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, return 0; } +static void +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) +{ + wol->supported = 0; + wol->wolopts = 0; + + if (dev->phy_dev) + return phy_ethtool_get_wol(dev->phy_dev, wol); +} + +static int +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) +{ + if (!dev->phy_dev) + return -EOPNOTSUPP; + + return phy_ethtool_set_wol(dev->phy_dev, wol); +} + static const struct net_device_ops mvneta_netdev_ops = { .ndo_open= mvneta_open, .ndo_stop= mvneta_stop, @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, .set_rxfh = mvneta_ethtool_set_rxfh, .get_link_ksettings = phy_ethtool_get_link_ksettings, .set_link_ksettings = mvneta_ethtool_set_link_ksettings, + .get_wol= mvneta_ethtool_get_wol, + .set_wol= mvneta_ethtool_set_wol, }; /* Initialize hw */ -- 1.9.1
[PATCH] net: mvneta: implement .set_wol and .get_wol
The mvneta itself does not support WOL, but the PHY might. So pass the calls to the PHY Signed-off-by: Jingju Hou --- Since v1: - using phy_dev member in struct net_device drivers/net/ethernet/marvell/mvneta.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index e05e227..78869fa 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -3908,6 +3908,25 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, return 0; } +static void +mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) +{ + wol->supported = 0; + wol->wolopts = 0; + + if (dev->phy_dev) + return phy_ethtool_get_wol(dev->phy_dev, wol); +} + +static int +mvneta_ethtool_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) +{ + if (!dev->phy_dev) + return -EOPNOTSUPP; + + return phy_ethtool_set_wol(dev->phy_dev, wol); +} + static const struct net_device_ops mvneta_netdev_ops = { .ndo_open= mvneta_open, .ndo_stop= mvneta_stop, @@ -3937,6 +3956,8 @@ static int mvneta_ethtool_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, .set_rxfh = mvneta_ethtool_set_rxfh, .get_link_ksettings = phy_ethtool_get_link_ksettings, .set_link_ksettings = mvneta_ethtool_set_link_ksettings, + .get_wol= mvneta_ethtool_get_wol, + .set_wol= mvneta_ethtool_set_wol, }; /* Initialize hw */ -- 1.9.1
Re: [RESEND PATCH net] macsec: fix validation failed in asynchronous operation.
Sorry for forgetting to explain it. The original patch was incomplete, but I sent it out by mistake... So please ignore it. On Sun, 2017-01-22 at 16:44 -0500, David Miller wrote: > Why are you resending this? > > The original posting on Jan 20th made it to the mailing list and is queued > up in patchwork just fine. > > Also, regardless of the reason, a "RESEND" patch should always contain an > explanation of why it needs to be resent. So that the maintainer doesn't > need to ask questions like I am right now.
[lkp-robot] [net] a1a22c1206: BUG:unable_to_handle_kernel
FYI, we noticed the following commit:

commit: a1a22c12060e4b9c52f45d4b3460f614e00162a2 ("net: ipv6: Keep nexthop of multipath route on admin down")
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master

in testcase: trinity
with following parameters:

        runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/

on test machine: qemu-system-i386 -enable-kvm -m 320M

caused below changes:

+-------------------------------------------------------+------------+------------+
|                                                       | dceeab0e52 | a1a22c1206 |
+-------------------------------------------------------+------------+------------+
| boot_successes                                        | 8          | 4          |
| boot_failures                                         | 0          | 4          |
| BUG:unable_to_handle_kernel                           | 0          | 4          |
| Oops:#[##]                                            | 0          | 4          |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0          | 4          |
+-------------------------------------------------------+------------+------------+

[ 150.634538] ubus (612) used greatest stack depth: 6716 bytes left
[ 151.925694] ubus (647) used greatest stack depth: 6616 bytes left
[ 154.978628] ubus (724) used greatest stack depth: 6604 bytes left
[ 158.324778] BUG: unable to handle kernel NULL pointer dereference at 0198
[ 158.334111] IP: rt6_fill_node+0x14e/0x4a6
[ 158.339546] *pdpt = 0f789001 *pde = 
[ 158.339554] 
[ 158.349075] Oops: [#1]
[ 158.353060] CPU: 0 PID: 726 Comm: netifd Not tainted 4.10.0-rc4-00660-ga1a22c1 #1
[ 158.362818] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 158.375911] task: cf751000 task.stack: cf78c000
[ 158.381810] EIP: rt6_fill_node+0x14e/0x4a6
[ 158.386925] EFLAGS: 00010246 CPU: 0
[ 158.392921] EAX: EBX: d1f6c358 ECX: cec03f40 EDX: 
[ 158.400763] ESI: cf78dbf8 EDI: EBP: cf78dc54 ESP: cf78dbe8
[ 158.408870] DS: 007b ES: 007b FS: GS: 0033 SS: 0068
[ 158.415864] CR0: 80050033 CR2: 0198 CR3: 12e50220 CR4: 06b0
[ 158.423771] Call Trace:
[ 158.426955] ? paravirt_sched_clock+0x9/0xd
[ 158.432069] ? sched_clock+0x9/0xc
[ 158.436702] ? sched_clock_cpu+0x1a/0xe1

To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k job-script  # job-script is attached in this email

Thanks,
Xiaolong
[PATCHv2 perf/core 3/7] tools lib bpf: Add set/is helpers for all prog types
These bpf_prog_types were exposed in the uapi but there were no corresponding functions to set these types for programs in libbpf. Signed-off-by: Joe Stringer Acked-by: Wang Nan --- v2: Add ack. --- tools/lib/bpf/libbpf.c | 5 + tools/lib/bpf/libbpf.h | 10 ++ 2 files changed, 15 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 371cb40a2304..406838fa9c4f 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -1448,8 +1448,13 @@ bool bpf_program__is_##NAME(struct bpf_program *prog) \ return bpf_program__is_type(prog, TYPE);\ } \ +BPF_PROG_TYPE_FNS(socket_filter, BPF_PROG_TYPE_SOCKET_FILTER); BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE); +BPF_PROG_TYPE_FNS(sched_cls, BPF_PROG_TYPE_SCHED_CLS); +BPF_PROG_TYPE_FNS(sched_act, BPF_PROG_TYPE_SCHED_ACT); BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT); +BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP); +BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT); int bpf_map__fd(struct bpf_map *map) { diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index a5a8b86a06fe..2188ccdc0e2d 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -174,11 +174,21 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n); /* * Adjust type of bpf program. Default is kprobe. 
*/ +int bpf_program__set_socket_filter(struct bpf_program *prog); int bpf_program__set_tracepoint(struct bpf_program *prog); int bpf_program__set_kprobe(struct bpf_program *prog); +int bpf_program__set_sched_cls(struct bpf_program *prog); +int bpf_program__set_sched_act(struct bpf_program *prog); +int bpf_program__set_xdp(struct bpf_program *prog); +int bpf_program__set_perf_event(struct bpf_program *prog); +bool bpf_program__is_socket_filter(struct bpf_program *prog); bool bpf_program__is_tracepoint(struct bpf_program *prog); bool bpf_program__is_kprobe(struct bpf_program *prog); +bool bpf_program__is_sched_cls(struct bpf_program *prog); +bool bpf_program__is_sched_act(struct bpf_program *prog); +bool bpf_program__is_xdp(struct bpf_program *prog); +bool bpf_program__is_perf_event(struct bpf_program *prog); /* * We don't need __attribute__((packed)) now since it is -- 2.11.0
[PATCHv2 perf/core 1/7] tools lib bpf: Fix map offsets in relocation
Commit 4708bbda5cb2 ("tools lib bpf: Fix maps resolution") attempted to fix map resolution by identifying the number of symbols that point to maps, and using this number to resolve each of the maps. However, during relocation the original definition of the map size was still in use. For up to two maps, the calculation was correct if there was a small difference in size between the map definition in libbpf and the one that the client library uses. However if the difference was large, particularly if more than two maps were used in the BPF program, the relocation would fail. For example, when using a map definition with size 28, with three maps, map relocation would count (sym_offset / sizeof(struct bpf_map_def) => map_idx) (0 / 16 => 0), ie map_idx = 0 (28 / 16 => 1), ie map_idx = 1 (56 / 16 => 3), ie map_idx = 3 So, libbpf reports: libbpf: bpf relocation: map_idx 3 large than 2 Fix map relocation by checking the exact offset of maps when doing relocation. Fixes: 4708bbda5cb2 ("tools lib bpf: Fix maps resolution") Signed-off-by: Joe Stringer Signed-off-by: Wang Nan [Allow different map size in an object] Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Arnaldo Carvalho de Melo --- v2: Use cached offsets of maps for relocation (Wang Nan) This is a repost of the version Wang Nan posted on Jan 19. --- tools/lib/bpf/libbpf.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 84e6b35da4bd..671d5ad07cf1 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -779,7 +779,7 @@ static int bpf_program__collect_reloc(struct bpf_program *prog, size_t nr_maps, GElf_Shdr *shdr, Elf_Data *data, Elf_Data *symbols, - int maps_shndx) + int maps_shndx, struct bpf_map *maps) { int i, nrels; @@ -829,7 +829,15 @@ bpf_program__collect_reloc(struct bpf_program *prog, return -LIBBPF_ERRNO__RELOC; } - map_idx = sym.st_value / sizeof(struct bpf_map_def); + /* TODO: 'maps' is sorted. 
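The difference between the old and the new relocation lookup can be reproduced with plain arithmetic. The sketch below assumes the example from the commit message: a 28-byte map definition in the ELF, a 16-byte struct bpf_map_def in libbpf, and three maps at offsets 0, 28 and 56. The struct and function names are hypothetical stand-ins for libbpf's internal bookkeeping.

```c
#include <stddef.h>

/* Hypothetical stand-in for libbpf's per-map record: only the offset. */
struct map_entry {
	size_t offset;	/* st_value of the map symbol in the maps section */
};

/* Old behaviour: assumes every map definition is exactly def_size bytes. */
static size_t map_idx_by_division(size_t sym_offset, size_t def_size)
{
	return sym_offset / def_size;
}

/* New behaviour: find the map whose recorded offset matches exactly.
 * Returns nr_maps when nothing matches, which the caller treats as an error. */
static size_t map_idx_by_offset(const struct map_entry *maps, size_t nr_maps,
				size_t sym_offset)
{
	size_t i;

	for (i = 0; i < nr_maps; i++)
		if (maps[i].offset == sym_offset)
			break;
	return i;
}
```

With these inputs the division yields indices 0, 1 and 3 (the "map_idx 3 large than 2" failure), while the offset lookup yields 0, 1 and 2. Because the maps array is sorted by offset, the TODO in the patch is right that bsearch() could replace the linear scan.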
We can use bsearch to make it faster. */ + for (map_idx = 0; map_idx < nr_maps; map_idx++) { + if (maps[map_idx].offset == sym.st_value) { + pr_debug("relocation: find map %zd (%s) for insn %u\n", +map_idx, maps[map_idx].name, insn_idx); + break; + } + } + if (map_idx >= nr_maps) { pr_warning("bpf relocation: map_idx %d large than %d\n", (int)map_idx, (int)nr_maps - 1); @@ -953,7 +961,8 @@ static int bpf_object__collect_reloc(struct bpf_object *obj) err = bpf_program__collect_reloc(prog, nr_maps, shdr, data, obj->efile.symbols, -obj->efile.maps_shndx); +obj->efile.maps_shndx, +obj->maps); if (err) return err; } -- 2.11.0
[PATCHv2 perf/core 0/7] Libbpf improvements
Patch 1 fixes an issue when using drastically different BPF map definitions inside ELFs from a client using libbpf, vs the map definition libbpf uses. Patches 2-4 add some simple, useful helper functions for setting prog type and retrieving libbpf errors without depending on kernel headers from userspace programs. Patches 5-7 add new pinning functionality for maps, programs, and objects. Library users may call bpf_map__pin(map, path) or bpf_program__pin(prog, path) to pin maps and programs separately, or use bpf_object__pin(obj, path) to pin all maps and programs from the BPF object to the path. The map and program variations require the full path at which the map or program will be pinned in the filesystem, and the object variation will create directories "maps/" and "progs/" under the specified path, then pin each map and program under those subdirectories. --- v1: Initial post. v2: Wang Nan provided improvements to patch 1. Dropped patch 2 from v1. Added acks for acked patches. Split the bpf_obj__pin() to also provide map / program pinning APIs. Allow users to provide full filesystem path (don't autodetect/mount BPFFS). Joe Stringer (7): tools lib bpf: Fix map offsets in relocation tools lib bpf: Define prog_type fns with macro tools lib bpf: Add set/is helpers for all prog types tools lib bpf: Add libbpf_get_error() tools lib bpf: Add bpf_program__pin() tools lib bpf: Add bpf_map__pin() tools lib bpf: Add bpf_object__pin() tools/lib/bpf/libbpf.c | 240 ++-- tools/lib/bpf/libbpf.h | 17 +++- tools/perf/tests/llvm.c | 2 +- 3 files changed, 229 insertions(+), 30 deletions(-) -- 2.11.0
[PATCHv2 perf/core 2/7] tools lib bpf: Define prog_type fns with macro
Turning this into a macro allows future prog types to be added with a single line per type. Signed-off-by: Joe Stringer Acked-by: Wang Nan --- v2: Add ack. --- tools/lib/bpf/libbpf.c | 41 - 1 file changed, 16 insertions(+), 25 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 671d5ad07cf1..371cb40a2304 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -1428,37 +1428,28 @@ static void bpf_program__set_type(struct bpf_program *prog, prog->type = type; } -int bpf_program__set_tracepoint(struct bpf_program *prog) -{ - if (!prog) - return -EINVAL; - bpf_program__set_type(prog, BPF_PROG_TYPE_TRACEPOINT); - return 0; -} - -int bpf_program__set_kprobe(struct bpf_program *prog) -{ - if (!prog) - return -EINVAL; - bpf_program__set_type(prog, BPF_PROG_TYPE_KPROBE); - return 0; -} - static bool bpf_program__is_type(struct bpf_program *prog, enum bpf_prog_type type) { return prog ? (prog->type == type) : false; } -bool bpf_program__is_tracepoint(struct bpf_program *prog) -{ - return bpf_program__is_type(prog, BPF_PROG_TYPE_TRACEPOINT); -} - -bool bpf_program__is_kprobe(struct bpf_program *prog) -{ - return bpf_program__is_type(prog, BPF_PROG_TYPE_KPROBE); -} +#define BPF_PROG_TYPE_FNS(NAME, TYPE) \ +int bpf_program__set_##NAME(struct bpf_program *prog) \ +{ \ + if (!prog) \ + return -EINVAL; \ + bpf_program__set_type(prog, TYPE); \ + return 0; \ +} \ + \ +bool bpf_program__is_##NAME(struct bpf_program *prog) \ +{ \ + return bpf_program__is_type(prog, TYPE);\ +} \ + +BPF_PROG_TYPE_FNS(kprobe, BPF_PROG_TYPE_KPROBE); +BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT); int bpf_map__fd(struct bpf_map *map) { -- 2.11.0
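The token-pasting pattern behind BPF_PROG_TYPE_FNS() can be tried outside libbpf. This is a minimal sketch that assumes a stub struct bpf_program and stub enum values (hypothetical, not the uapi numbering). Each invocation emits both bpf_program__set_<name>() and bpf_program__is_<name>(), so adding a program type costs one line. The invocations here omit the trailing semicolon, since the expansion already ends in a function body and a stray file-scope ';' is only accepted as a compiler extension.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libbpf types the macro relies on. */
enum bpf_prog_type {
	STUB_PROG_TYPE_KPROBE = 1,
	STUB_PROG_TYPE_TRACEPOINT,
	STUB_PROG_TYPE_XDP,
};

struct bpf_program {
	enum bpf_prog_type type;
};

static void bpf_program__set_type(struct bpf_program *prog,
				  enum bpf_prog_type type)
{
	prog->type = type;
}

static bool bpf_program__is_type(struct bpf_program *prog,
				 enum bpf_prog_type type)
{
	return prog ? (prog->type == type) : false;
}

/* One invocation defines a setter and a predicate for one program type. */
#define BPF_PROG_TYPE_FNS(NAME, TYPE)				\
int bpf_program__set_##NAME(struct bpf_program *prog)		\
{								\
	if (!prog)						\
		return -EINVAL;					\
	bpf_program__set_type(prog, TYPE);			\
	return 0;						\
}								\
								\
bool bpf_program__is_##NAME(struct bpf_program *prog)		\
{								\
	return bpf_program__is_type(prog, TYPE);		\
}

BPF_PROG_TYPE_FNS(kprobe, STUB_PROG_TYPE_KPROBE)
BPF_PROG_TYPE_FNS(tracepoint, STUB_PROG_TYPE_TRACEPOINT)
BPF_PROG_TYPE_FNS(xdp, STUB_PROG_TYPE_XDP)
```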
[PATCHv2 perf/core 5/7] tools lib bpf: Add bpf_program__pin()
Add a new API to pin a BPF program to the filesystem. The user can specify the full path within a BPF filesystem at which to pin the program. Programs with multiple instances are pinned as 'foo', 'foo_1', 'foo_2', and so on. Signed-off-by: Joe Stringer --- v2: Don't automount BPF filesystem Split program, map, object pinning into separate APIs and separate patches. --- tools/lib/bpf/libbpf.c | 76 ++ tools/lib/bpf/libbpf.h | 1 + 2 files changed, 77 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index e6cd62b1264b..eea5c74808f7 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -4,6 +4,7 @@ * Copyright (C) 2013-2015 Alexei Starovoitov * Copyright (C) 2015 Wang Nan * Copyright (C) 2015 Huawei Inc. + * Copyright (C) 2017 Nicira, Inc. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -22,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -31,7 +33,10 @@ #include #include #include +#include #include +#include +#include #include #include @@ -1237,6 +1242,77 @@ int bpf_object__load(struct bpf_object *obj) return err; } +static int check_path(const char *path) +{ + struct statfs st_fs; + char *dname, *dir; + int err = 0; + + if (path == NULL) + return -EINVAL; + + dname = strdup(path); + dir = dirname(dname); + if (statfs(dir, &st_fs)) { + pr_warning("failed to statfs %s: %s\n", dir, strerror(errno)); + err = -errno; + } + free(dname); + + if (!err && st_fs.f_type != BPF_FS_MAGIC) { + pr_warning("specified path %s is not on BPF FS\n", path); + err = -EINVAL; + } + + return err; +} + +int bpf_program__pin(struct bpf_program *prog, const char *path) +{ + int i, err; + + err = check_path(path); + if (err) + return err; + + if (prog == NULL) { + pr_warning("invalid program pointer\n"); + return -EINVAL; + } + + if (prog->instances.nr <= 0) { + pr_warning("no instances of prog %s to pin\n", + prog->section_name); + return -EINVAL; +
} + + if (bpf_obj_pin(prog->instances.fds[0], path)) { + pr_warning("failed to pin program: %s\n", strerror(errno)); + return -errno; + } + pr_debug("pinned program '%s'\n", path); + + for (i = 1; i < prog->instances.nr; i++) { + char buf[PATH_MAX]; + int len; + + len = snprintf(buf, PATH_MAX, "%s_%d", path, i); + if (len < 0) + return -EINVAL; + else if (len > PATH_MAX) + return -ENAMETOOLONG; + + if (bpf_obj_pin(prog->instances.fds[i], buf)) { + pr_warning("failed to pin program: %s\n", + strerror(errno)); + return -errno; + } + pr_debug("pinned program '%s'\n", buf); + } + + return 0; +} + void bpf_object__close(struct bpf_object *obj) { size_t i; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 4014d1ba5e3d..7973087c377b 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -106,6 +106,7 @@ void *bpf_program__priv(struct bpf_program *prog); const char *bpf_program__title(struct bpf_program *prog, bool needs_copy); int bpf_program__fd(struct bpf_program *prog); +int bpf_program__pin(struct bpf_program *prog, const char *path); struct bpf_insn; -- 2.11.0
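bpf_program__pin() names additional instances '<path>_1', '<path>_2', and so on. Because snprintf() returns the length it would have written excluding the terminating NUL, a return value equal to the buffer size already means the name was truncated; the `len >= PATH_MAX` test used by make_dir() in patch 7/7 catches that case, whereas the `len > PATH_MAX` checks here would let a name of exactly PATH_MAX characters through truncated. A minimal sketch of the naming step with the stricter check (the helper name is hypothetical):

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format the pin path for instance i of a program pinned at path.
 * Instance 0 uses path itself; instance i > 0 uses "<path>_<i>".
 * Returns 0 on success, -EINVAL on an encoding error, or -ENAMETOOLONG
 * if the name plus its terminating NUL would not fit in size bytes. */
static int instance_pin_path(char *buf, size_t size, const char *path, int i)
{
	int len;

	if (i == 0)
		len = snprintf(buf, size, "%s", path);
	else
		len = snprintf(buf, size, "%s_%d", path, i);

	if (len < 0)
		return -EINVAL;
	if ((size_t)len >= size)	/* len == size already means truncation */
		return -ENAMETOOLONG;
	return 0;
}
```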
[PATCHv2 perf/core 4/7] tools lib bpf: Add libbpf_get_error()
This function will turn a libbpf pointer into a standard error code (or 0 if the pointer is valid). This also allows removal of the dependency on linux/err.h in the public header file, which causes problems in userspace programs built against libbpf. Signed-off-by: Joe Stringer Acked-by: Wang Nan --- v2: Add ack. --- tools/lib/bpf/libbpf.c | 8 tools/lib/bpf/libbpf.h | 4 +++- tools/perf/tests/llvm.c | 2 +- 3 files changed, 12 insertions(+), 2 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 406838fa9c4f..e6cd62b1264b 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -1542,3 +1543,10 @@ bpf_object__find_map_by_offset(struct bpf_object *obj, size_t offset) } return ERR_PTR(-ENOENT); } + +long libbpf_get_error(const void *ptr) +{ + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + return 0; +} diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 2188ccdc0e2d..4014d1ba5e3d 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -22,8 +22,8 @@ #define __BPF_LIBBPF_H #include +#include #include -#include #include // for size_t enum libbpf_errno { @@ -234,4 +234,6 @@ int bpf_map__set_priv(struct bpf_map *map, void *priv, bpf_map_clear_priv_t clear_priv); void *bpf_map__priv(struct bpf_map *map); +long libbpf_get_error(const void *ptr); + #endif diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c index 02a33ebcd992..d357dab72e68 100644 --- a/tools/perf/tests/llvm.c +++ b/tools/perf/tests/llvm.c @@ -13,7 +13,7 @@ static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz) struct bpf_object *obj; obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, NULL); - if (IS_ERR(obj)) + if (libbpf_get_error(obj)) return TEST_FAIL; bpf_object__close(obj); return TEST_OK; -- 2.11.0
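libbpf_get_error() relies on the kernel-style convention of encoding a negative errno in the top MAX_ERRNO values of the pointer range, which is what IS_ERR()/PTR_ERR() test. The userspace sketch below shows that convention. MAX_ERRNO = 4095 matches the kernel's linux/err.h, but the lower-case helpers are local re-implementations for illustration, not the tools/include headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* same bound as the kernel's linux/err.h */

/* Encode a negative errno as a pointer in the last MAX_ERRNO addresses. */
static inline void *err_ptr(long error)
{
	return (void *)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Same contract as the patch's libbpf_get_error(): the negative errno
 * for an error pointer, 0 for any other pointer (including NULL). */
static long get_error(const void *ptr)
{
	return is_err(ptr) ? ptr_err(ptr) : 0;
}
```

NULL is not an error pointer under this convention, so callers that can receive NULL still need their own check.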
[PATCHv2 perf/core 7/7] tools lib bpf: Add bpf_object__pin()
Add a new API to pin a BPF object to the filesystem. The user can specify the full path within a BPF filesystem at which to pin the object. Programs will be pinned under a subdirectory 'progs', and maps will be pinned under a subdirectory 'maps'. For example, with the directory '/sys/fs/bpf/foo': /sys/fs/bpf/foo/progs/PROG_NAME /sys/fs/bpf/foo/maps/MAP_NAME Signed-off-by: Joe Stringer --- v2: Don't automount BPF filesystem Split program, map, object pinning into separate APIs and separate patches. --- tools/lib/bpf/libbpf.c | 73 ++ tools/lib/bpf/libbpf.h | 1 + 2 files changed, 74 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index c1d8b07e21d2..41645dc51fa1 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -1335,6 +1336,78 @@ int bpf_map__pin(struct bpf_map *map, const char *path) return 0; } +static int make_dir(const char *path, const char *dir) +{ + char buf[PATH_MAX]; + int len, err = 0; + + len = snprintf(buf, PATH_MAX, "%s/%s", path, dir); + if (len < 0) + err = -EINVAL; + else if (len >= PATH_MAX) + err = -ENAMETOOLONG; + if (!err && mkdir(buf, 0700) && errno != EEXIST) + err = -errno; + + if (err) + pr_warning("failed to make dir %s/%s: %s\n", path, dir, + strerror(-err)); + return err; +} + +int bpf_object__pin(struct bpf_object *obj, const char *path) +{ + struct bpf_program *prog; + struct bpf_map *map; + int err; + + if (!obj) + return -ENOENT; + + if (!obj->loaded) { + pr_warning("object not yet loaded; load it first\n"); + return -ENOENT; + } + + err = make_dir(path, "maps"); + if (err) + return err; + + bpf_map__for_each(map, obj) { + char buf[PATH_MAX]; + int len; + + len = snprintf(buf, PATH_MAX, "%s/maps/%s", path, + bpf_map__name(map)); + if (len < 0 || len > PATH_MAX) + return -EINVAL; + + err = bpf_map__pin(map, buf); + if (err) + return err; + } + + err = make_dir(path, "progs"); + if (err) + return err; + +
bpf_object__for_each_program(prog, obj) { + char buf[PATH_MAX]; + int len; + + len = snprintf(buf, PATH_MAX, "%s/progs/%s", path, + prog->section_name); + if (len < 0 || len > PATH_MAX) + return -EINVAL; + + err = bpf_program__pin(prog, buf); + if (err) + return err; + } + + return 0; +} + void bpf_object__close(struct bpf_object *obj) { size_t i; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 524247cfd205..8363ee6db4a0 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -65,6 +65,7 @@ struct bpf_object *bpf_object__open(const char *path); struct bpf_object *bpf_object__open_buffer(void *obj_buf, size_t obj_buf_sz, const char *name); +int bpf_object__pin(struct bpf_object *object, const char *path); void bpf_object__close(struct bpf_object *object); /* Load/unload object into/from kernel */ -- 2.11.0
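In the v2 code above, make_dir() treats `len >= PATH_MAX` as truncation, but the two loops in bpf_object__pin() only reject `len > PATH_MAX`. snprintf() returns the length the untruncated string would have had, so a return value of exactly PATH_MAX also means the name was cut short. The following is a minimal sketch of the stricter check. build_pin_path() is a hypothetical helper, not part of libbpf, and the error codes simply mirror the ones make_dir() already uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build "<path>/<subdir>/<name>" into buf.
 * snprintf() reports the length the full string would have had
 * (excluding the NUL), so any value >= size means truncation.
 */
static int build_pin_path(char *buf, size_t size, const char *path,
			  const char *subdir, const char *name)
{
	int len = snprintf(buf, size, "%s/%s/%s", path, subdir, name);

	if (len < 0)
		return -EINVAL;		/* output or encoding error */
	if ((size_t)len >= size)
		return -ENAMETOOLONG;	/* truncated, including len == size */
	return 0;
}
```

For example, "/bpf/maps/abcdef" is 16 characters long. With a 16-byte buffer, snprintf() returns 16 and drops the last character, which the `len > PATH_MAX` form of the check would accept.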
[PATCHv2 perf/core 6/7] tools lib bpf: Add bpf_map__pin()
Add a new API to pin a BPF map to the filesystem. The user can specify
the full path within a BPF filesystem to pin the map.

Signed-off-by: Joe Stringer
---
v2: Don't automount BPF filesystem
    Split program, map, object pinning into separate APIs and separate
    patches.
---
 tools/lib/bpf/libbpf.c | 22 ++
 tools/lib/bpf/libbpf.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index eea5c74808f7..c1d8b07e21d2 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1313,6 +1313,28 @@ int bpf_program__pin(struct bpf_program *prog, const char *path)
 	return 0;
 }
 
+int bpf_map__pin(struct bpf_map *map, const char *path)
+{
+	int err;
+
+	err = check_path(path);
+	if (err)
+		return err;
+
+	if (map == NULL) {
+		pr_warning("invalid map pointer\n");
+		return -EINVAL;
+	}
+
+	if (bpf_obj_pin(map->fd, path)) {
+		pr_warning("failed to pin map: %s\n", strerror(errno));
+		return -errno;
+	}
+
+	pr_debug("pinned map '%s'\n", path);
+	return 0;
+}
+
 void bpf_object__close(struct bpf_object *obj)
 {
 	size_t i;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 7973087c377b..524247cfd205 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -234,6 +234,7 @@ typedef void (*bpf_map_clear_priv_t)(struct bpf_map *, void *);
 int bpf_map__set_priv(struct bpf_map *map, void *priv,
 		      bpf_map_clear_priv_t clear_priv);
 void *bpf_map__priv(struct bpf_map *map);
+int bpf_map__pin(struct bpf_map *map, const char *path);
 
 long libbpf_get_error(const void *ptr);
-- 
2.11.0
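bpf_map__pin() above reads errno a second time, in `return -errno`, after pr_warning() and strerror() have already run. The C standard allows library functions such as fprintf() to change errno even when they succeed, so a common defensive pattern is to copy errno into a local variable immediately after the failing call. The sketch below uses open() as a stand-in for bpf_obj_pin(). open_or_report() is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open() stands in for bpf_obj_pin(). errno is
 * captured before any logging call gets a chance to overwrite it.
 */
static int open_or_report(const char *path)
{
	int fd = open(path, O_RDONLY);
	int err;

	if (fd < 0) {
		err = -errno;	/* save first, then log */
		fprintf(stderr, "failed to open %s: %s\n", path, strerror(-err));
		return err;
	}
	close(fd);
	return 0;
}
```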
RE: [RFC PATCH net-next 4/5] bridge: vlan lwt and dst_metadata netlink support
Hi, Roopa,

Two minor comments:

The parameter br is not used in the br_add_vlan_tunnel_info() method, so
it should be removed:

+static int br_add_vlan_tunnel_info(struct net_bridge *br,
+				   struct net_bridge_port *p, int cmd,
+				   u16 vid, u32 tun_id)
+{
+	int err;
+
+	switch (cmd) {
+	case RTM_SETLINK:
+		if (p) {
+			/* if the MASTER flag is set this will act on the global
+			 * per-VLAN entry as well
+			 */
+			err = nbp_vlan_tunnel_info_add(p, vid, tun_id);
+			if (err)
+				break;
+		} else {
+			return -EINVAL;
+		}
+
+		break;
+
+	case RTM_DELLINK:
+		if (p)
+			nbp_vlan_tunnel_info_delete(p, vid);
+		else
+			return -EINVAL;
+		break;
+	}
+
+	return 0;
+}
+

The parameter br is used inside br_process_vlan_tunnel_info() only in the
two cases where br_add_vlan_tunnel_info() is invoked. Since we saw earlier
that it should be removed from br_add_vlan_tunnel_info(), it should also be
removed from br_process_vlan_tunnel_info(), as it is not needed anymore:

+static int br_process_vlan_tunnel_info(struct net_bridge *br,
+				       struct net_bridge_port *p, int cmd,
+				       struct vtunnel_info *tinfo_curr,
+				       struct vtunnel_info *tinfo_last)
+{
+	int t, v;
+	int err;
+
+	if (tinfo_curr->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
+		if (tinfo_last->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN)
+			return -EINVAL;
+		memcpy(tinfo_last, tinfo_curr, sizeof(struct vtunnel_info));
+	} else if (tinfo_curr->flags & BRIDGE_VLAN_INFO_RANGE_END) {
+		if (!(tinfo_last->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN))
+			return -EINVAL;
+		if ((tinfo_curr->vid - tinfo_last->vid) !=
+		    (tinfo_curr->tunid - tinfo_last->tunid))
+			return -EINVAL;
+		/* XXX: tun id and vlan id attrs must be same
+		 */
+		t = tinfo_last->tunid;
+		for (v = tinfo_last->vid; v <= tinfo_curr->vid; v++) {
+			err = br_add_vlan_tunnel_info(br, p, cmd,
+						      v, t);
+			if (err)
+				return err;
+			t++;
+		}
+		memset(tinfo_last, 0, sizeof(struct vtunnel_info));
+		memset(tinfo_curr, 0, sizeof(struct vtunnel_info));
+	} else {
+		err = br_add_vlan_tunnel_info(br, p, cmd,
+					      tinfo_curr->vid,
+					      tinfo_curr->tunid);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+

Regards,
Rami Rosen
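The range arithmetic in br_process_vlan_tunnel_info() depends only on the two vtunnel_info endpoints, not on the bridge. The rule is that the VLAN range and the tunnel-id range must span the same number of ids, and VLAN `vid_begin + i` then maps to tunnel id `tun_begin + i`. The following is a user-space sketch of that rule. The function name and its output-array interface are hypothetical and are not the bridge code's API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: expand [vid_begin, vid_end] paired with a
 * tunnel-id range starting at tun_begin. tun_out[i] receives the
 * tunnel id for VLAN vid_begin + i. Returns the entry count or -errno.
 */
static int expand_vlan_tunnel_range(uint16_t vid_begin, uint32_t tun_begin,
				    uint16_t vid_end, uint32_t tun_end,
				    uint32_t *tun_out, size_t out_len)
{
	size_t n, i;

	/* both ranges must be ordered and cover the same number of ids */
	if (vid_end < vid_begin || tun_end < tun_begin ||
	    (uint32_t)(vid_end - vid_begin) != tun_end - tun_begin)
		return -EINVAL;

	n = (size_t)(vid_end - vid_begin) + 1;
	if (n > out_len)
		return -ENOSPC;

	for (i = 0; i < n; i++)
		tun_out[i] = tun_begin + (uint32_t)i;
	return (int)n;
}
```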
Re: [PATCH v4 3/3] samples/bpf: add lpm-trie benchmark
On Sat, Jan 21, 2017 at 05:26:13PM +0100, Daniel Mack wrote:
> From: David Herrmann
>
> Extend the map_perf_test_{user,kern}.c infrastructure to stress test
> lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
> the latency depending on trie size and lookup count.
>
> On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
> bpf program takes roughly 6.5us on my system. Lookups in empty tries
> take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192
> entries take ~7.1us (on the first _and_ any subsequent try).
>
> Signed-off-by: David Herrmann
> Reviewed-by: Daniel Mack

Acked-by: Alexei Starovoitov

Thank you for all the hard work you've put into these patches.
All looks great to me.
Re: [patch] samples/bpf: silence shift wrapping warning
On Sat, Jan 21, 2017 at 07:51:43AM +0300, Dan Carpenter wrote:
> max_key is a value in the 0-63 range, so on 32 bit systems the shift
> could wrap.
>
> Signed-off-by: Dan Carpenter

Looks fine. I think 'net-next' is ok.
Acked-by: Alexei Starovoitov

> diff --git a/samples/bpf/lwt_len_hist_user.c b/samples/bpf/lwt_len_hist_user.c
> index ec8f3bb..bd06eef 100644
> --- a/samples/bpf/lwt_len_hist_user.c
> +++ b/samples/bpf/lwt_len_hist_user.c
> @@ -68,7 +68,7 @@ int main(int argc, char **argv)
>  	for (i = 1; i <= max_key + 1; i++) {
>  		stars(starstr, data[i - 1], max_value, MAX_STARS);
>  		printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
> -		       (1l << i) >> 1, (1l << i) - 1, data[i - 1],
> +		       (1ULL << i) >> 1, (1ULL << i) - 1, data[i - 1],
>  		       MAX_STARS, starstr);
>  	}
>
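Two details remain in the quoted loop. First, if max_key can reach 63, then i can reach max_key + 1 = 64, and shifting a 64-bit `1ULL` by 64 is still undefined in C. Second, the first two printf() arguments become `unsigned long long` while the format keeps `%ld`. On a 32-bit build, where `long` is 32 bits, the format and the argument types no longer match. The following hedged sketch shows one portable way to compute and print the bucket bounds with `uint64_t` and `PRIu64`. The helper names are made up for illustration, and the count is assumed to fit a `uint64_t`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Log2 bucket i (1-based) covers [2^(i-1), 2^i - 1]. */
static uint64_t bucket_lo(unsigned int i)
{
	assert(i >= 1 && i <= 64);
	return UINT64_C(1) << (i - 1);
}

static uint64_t bucket_hi(unsigned int i)
{
	assert(i >= 1 && i <= 64);
	/* 1 << 64 would be undefined, so the last bucket is special-cased */
	return i == 64 ? UINT64_MAX : (UINT64_C(1) << i) - 1;
}

static void print_bucket(unsigned int i, uint64_t count)
{
	/* PRIu64 matches uint64_t whatever the size of long */
	printf("%8" PRIu64 " -> %-8" PRIu64 " : %-8" PRIu64 "\n",
	       bucket_lo(i), bucket_hi(i), count);
}
```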
Re: [PATCH v4 2/3] bpf: Add tests for the lpm trie map
On Sat, Jan 21, 2017 at 05:26:12PM +0100, Daniel Mack wrote:
> From: David Herrmann
>
> The first part of this program runs randomized tests against the
> lpm-bpf-map. It implements a "Trivial Longest Prefix Match" (tlpm)
> based on simple, linear, single linked lists. The implementation
> should be pretty straightforward.
>
> Based on tlpm, this inserts randomized data into bpf-lpm-maps and
> verifies the trie-based bpf-map implementation behaves the same way
> as tlpm.
>
> The second part uses 'real world' IPv4 and IPv6 addresses and tests
> the trie with those.
>
> Signed-off-by: David Herrmann
> Signed-off-by: Daniel Mack

Acked-by: Alexei Starovoitov
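A "trivial longest prefix match" reference like the one described above can be as simple as a linear scan that keeps the longest matching entry. The following sketch is restricted to IPv4 addresses in host byte order. The struct and function names are illustrative only and are not taken from the actual test program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative list node: one IPv4 prefix. */
struct tlpm_entry {
	struct tlpm_entry *next;
	uint32_t addr;		/* prefix, host byte order */
	unsigned int prefixlen;	/* 0..32 */
};

/* True if the first prefixlen bits of a and b are equal. */
static int prefix_matches(uint32_t a, uint32_t b, unsigned int prefixlen)
{
	uint32_t mask = prefixlen ? ~UINT32_C(0) << (32 - prefixlen) : 0;

	return (a & mask) == (b & mask);
}

/* Linear scan: return the matching entry with the longest prefix. */
static const struct tlpm_entry *tlpm_lookup(const struct tlpm_entry *list,
					    uint32_t addr)
{
	const struct tlpm_entry *best = NULL;

	for (; list; list = list->next)
		if (prefix_matches(list->addr, addr, list->prefixlen) &&
		    (!best || list->prefixlen > best->prefixlen))
			best = list;
	return best;
}
```

Because every lookup touches every entry, this reference is O(n). That makes it easy to trust as an oracle for the randomized comparison, even though it would be far too slow as the real map.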
Re: [PATCH v4 1/3] bpf: add a longest prefix match trie map implementation
On Sat, Jan 21, 2017 at 05:26:11PM +0100, Daniel Mack wrote:
> This trie implements a longest prefix match algorithm that can be used
> to match IP addresses to a stored set of ranges.
>
> Internally, data is stored in an unbalanced trie of nodes that has a
> maximum height of n, where n is the prefixlen the trie was created
> with.
>
> Tries may be created with prefix lengths that are multiples of 8, in
> the range from 8 to 2048. The key used for lookup and update operations
> is a struct bpf_lpm_trie_key, and the value is a uint64_t.
>
> The code carries more information about the internal implementation.
>
> Signed-off-by: Daniel Mack
> Reviewed-by: David Herrmann

Looks great to me.
Acked-by: Alexei Starovoitov
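Walking such a trie comes down to asking how many leading bits a lookup key shares with a node's prefix, capped at the shorter of the two prefix lengths. The following sketch performs that comparison over raw key bytes. It assumes the bytes are in network (big-endian) order, as the data of a struct bpf_lpm_trie_key would be for an IP address. common_prefix_bits() is a hypothetical name, not the kernel's helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of leading bits shared by two big-endian byte strings,
 * capped at limit bits. Reads only the first ceil(limit / 8) bytes.
 */
static size_t common_prefix_bits(const uint8_t *a, const uint8_t *b,
				 size_t limit)
{
	size_t bits = 0;

	for (size_t i = 0; bits < limit; i++) {
		uint8_t diff = a[i] ^ b[i];

		if (!diff) {
			bits += 8;	/* whole byte matches */
			continue;
		}
		/* count identical high-order bits in the first differing byte */
		while (!(diff & 0x80)) {
			diff <<= 1;
			bits++;
		}
		break;
	}
	return bits < limit ? bits : limit;
}
```

For 192.168.1.0 and 192.168.2.0, the third bytes 0x01 and 0x02 differ from the 7th bit onward. The two keys therefore share 16 + 6 = 22 leading bits.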
[PATCH net-next v5 2/2] net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable
Prior to this patch we were using a hardcoded RGMII TX clock delay of
2ns (= 1/4 cycle of the 125MHz RGMII TX clock). This value works for
many boards, but unfortunately not for all (due to the way the actual
circuit is designed, sometimes because the TX delay is enabled in the
PHY, etc.). Making the TX delay on the MAC side configurable allows us
to support all possible hardware combinations.

This allows fixing a compatibility issue on some boards, where the
RTL8211F PHY is configured to generate the TX delay. We can now turn
off the TX delay in the MAC, because otherwise we would be applying the
delay twice (which results in non-working TX traffic).

Signed-off-by: Martin Blumenstingl
Tested-by: Neil Armstrong
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
index ffaed1f35efe..8840a360a0b7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-meson8b.c
@@ -35,10 +35,6 @@
 #define PRG_ETH0_TXDLY_SHIFT		5
 #define PRG_ETH0_TXDLY_MASK		GENMASK(6, 5)
-#define PRG_ETH0_TXDLY_OFF		(0x0 << PRG_ETH0_TXDLY_SHIFT)
-#define PRG_ETH0_TXDLY_QUARTER		(0x1 << PRG_ETH0_TXDLY_SHIFT)
-#define PRG_ETH0_TXDLY_HALF		(0x2 << PRG_ETH0_TXDLY_SHIFT)
-#define PRG_ETH0_TXDLY_THREE_QUARTERS	(0x3 << PRG_ETH0_TXDLY_SHIFT)
 
 /* divider for the result of m250_sel */
 #define PRG_ETH0_CLK_M250_DIV_SHIFT	7
@@ -69,6 +65,8 @@ struct meson8b_dwmac {
 	struct clk_divider	m25_div;
 	struct clk		*m25_div_clk;
+
+	u32			tx_delay_ns;
 };
 
 static void meson8b_dwmac_mask_bits(struct meson8b_dwmac *dwmac, u32 reg,
@@ -179,6 +177,7 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
 {
 	int ret;
 	unsigned long clk_rate;
+	u8 tx_dly_val;
 
 	switch (dwmac->phy_mode) {
 	case PHY_INTERFACE_MODE_RGMII:
@@ -196,9 +195,13 @@ static int meson8b_init_prg_eth(struct meson8b_dwmac *dwmac)
 		meson8b_dwmac_mask_bits(dwmac, PRG_ETH0,
 					PRG_ETH0_INVERTED_RMII_CLK, 0);
 
-		/* TX clock delay - all known boards use a 1/4 cycle delay */
+		/* TX clock delay in ns = "8ns / 4 * tx_dly_val" (where
+		 * 8ns are exactly one cycle of the 125MHz RGMII TX clock):
+		 * 0ns = 0x0, 2ns = 0x1, 4ns = 0x2, 6ns = 0x3
+		 */
+		tx_dly_val = dwmac->tx_delay_ns >> 1;
 		meson8b_dwmac_mask_bits(dwmac, PRG_ETH0, PRG_ETH0_TXDLY_MASK,
-					PRG_ETH0_TXDLY_QUARTER);
+					tx_dly_val << PRG_ETH0_TXDLY_SHIFT);
 		break;
 
 	case PHY_INTERFACE_MODE_RMII:
@@ -284,6 +287,11 @@ static int meson8b_dwmac_probe(struct platform_device *pdev)
 		goto err_remove_config_dt;
 	}
 
+	/* use 2ns as fallback since this value was previously hardcoded */
+	if (of_property_read_u32(pdev->dev.of_node, "amlogic,tx-delay-ns",
+				 &dwmac->tx_delay_ns))
+		dwmac->tx_delay_ns = 2;
+
 	ret = meson8b_init_clk(dwmac);
 	if (ret)
 		goto err_remove_config_dt;
-- 
2.11.0
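Per the comment in the patch, one step of the 2-bit field is 8 ns / 4 = 2 ns. The register value is therefore tx_delay_ns / 2, placed at bits 6:5. The following sketch reproduces that arithmetic in user space. The posted code does not range-check the property, so as an added assumption the sketch rejects any value other than the 0, 2, 4 and 6 ns listed in the binding. txdly_field() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TXDLY_SHIFT	5
#define TXDLY_MASK	(UINT32_C(0x3) << TXDLY_SHIFT)	/* like GENMASK(6, 5) */

/*
 * Hypothetical helper: convert "amlogic,tx-delay-ns" into the
 * register field value; each step is 2 ns (one quarter of 8 ns).
 */
static int txdly_field(uint32_t tx_delay_ns, uint32_t *field)
{
	if (tx_delay_ns % 2 || tx_delay_ns > 6)
		return -EINVAL;	/* would not fit the 2-bit field */
	*field = (tx_delay_ns >> 1) << TXDLY_SHIFT;
	return 0;
}
```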
[PATCH net-next v5 1/2] net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac
This allows configuring the RGMII TX clock delay. The RGMII clock is
generated by the underlying hardware of the Meson 8b / GXBB DWMAC glue.
The configuration depends on the actual hardware (no delay may be
needed due to the design of the actual circuit, the PHY might add this
delay, etc.).

Signed-off-by: Martin Blumenstingl
Tested-by: Neil Armstrong
Acked-by: Rob Herring
---
 Documentation/devicetree/bindings/net/meson-dwmac.txt | 16 
 1 file changed, 16 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/meson-dwmac.txt b/Documentation/devicetree/bindings/net/meson-dwmac.txt
index 89e62ddc69ca..0703ad3f3c1e 100644
--- a/Documentation/devicetree/bindings/net/meson-dwmac.txt
+++ b/Documentation/devicetree/bindings/net/meson-dwmac.txt
@@ -25,6 +25,22 @@ Required properties on Meson8b and newer:
 		- "clkin0" - first parent clock of the internal mux
 		- "clkin1" - second parent clock of the internal mux
 
+Optional properties on Meson8b and newer:
+- amlogic,tx-delay-ns:	The internal RGMII TX clock delay (provided
+			by this driver) in nanoseconds. Allowed values
+			are: 0ns, 2ns, 4ns, 6ns.
+			When phy-mode is set to "rgmii" then the TX
+			delay should be explicitly configured. When
+			not configured a fallback of 2ns is used.
+			When the phy-mode is set to either "rgmii-id"
+			or "rgmii-txid" the TX clock delay is already
+			provided by the PHY. In that case this
+			property should be set to 0ns (which disables
+			the TX clock delay in the MAC to prevent the
+			clock from going off because both PHY and MAC
+			are adding a delay).
+			Any configuration is ignored when the phy-mode
+			is set to "rmii".
 
 Example for Meson6:
-- 
2.11.0
[PATCH net-next v5 0/2] stmmac: dwmac-meson8b: configurable RGMII TX delay
Currently the dwmac-meson8b stmmac glue driver uses a hardcoded 1/4
cycle (= 2ns) TX clock delay. This seems to work fine for many boards
(for example Odroid-C2 or Amlogic's reference boards) but there are
some others where TX traffic is simply broken.
There are probably multiple reasons why it's working on some boards
while it's broken on others:
- some of Amlogic's reference boards are using a Micrel PHY
- hardware circuit design
- maybe more...

iperf3 results on my Mecool BB2 board (Meson GXM, RTL8211F PHY) with
TX clock delay disabled on the MAC (as it's enabled in the PHY driver).
TX throughput was virtually zero before:

$ iperf3 -c 192.168.1.100 -R
Connecting to host 192.168.1.100, port 5201
Reverse mode, remote host 192.168.1.100 is sending
[  4] local 192.168.1.206 port 52828 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   108 MBytes   901 Mbits/sec
[  4]   1.00-2.00   sec  94.2 MBytes   791 Mbits/sec
[  4]   2.00-3.00   sec  96.5 MBytes   810 Mbits/sec
[  4]   3.00-4.00   sec  96.2 MBytes   808 Mbits/sec
[  4]   4.00-5.00   sec  96.6 MBytes   810 Mbits/sec
[  4]   5.00-6.00   sec  96.5 MBytes   810 Mbits/sec
[  4]   6.00-7.00   sec  96.6 MBytes   810 Mbits/sec
[  4]   7.00-8.00   sec  96.5 MBytes   809 Mbits/sec
[  4]   8.00-9.00   sec   105 MBytes   884 Mbits/sec
[  4]   9.00-10.00  sec   111 MBytes   934 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1000 MBytes   839 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   998 MBytes   837 Mbits/sec                  receiver

iperf Done.
$ iperf3 -c 192.168.1.100
Connecting to host 192.168.1.100, port 5201
[  4] local 192.168.1.206 port 52832 connected to 192.168.1.100 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.01   sec  99.5 MBytes   829 Mbits/sec  117    139 KBytes
[  4]   1.01-2.00   sec   105 MBytes   884 Mbits/sec  129   70.7 KBytes
[  4]   2.00-3.01   sec   107 MBytes   889 Mbits/sec  106    187 KBytes
[  4]   3.01-4.01   sec   105 MBytes   878 Mbits/sec   92    143 KBytes
[  4]   4.01-5.00   sec   105 MBytes   882 Mbits/sec  140    129 KBytes
[  4]   5.00-6.01   sec   106 MBytes   883 Mbits/sec  115    195 KBytes
[  4]   6.01-7.00   sec   102 MBytes   863 Mbits/sec  133   70.7 KBytes
[  4]   7.00-8.01   sec   106 MBytes   884 Mbits/sec  143   97.6 KBytes
[  4]   8.01-9.01   sec   104 MBytes   875 Mbits/sec  124    107 KBytes
[  4]   9.01-10.01  sec   105 MBytes   876 Mbits/sec   90    139 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.01  sec  1.02 GBytes   874 Mbits/sec  1189             sender
[  4]   0.00-10.01  sec  1.02 GBytes   873 Mbits/sec                  receiver

iperf Done.

I get similar TX throughput on my Meson GXBB "MXQ Pro+" board when I
disable the PHY's TX-delay and configure a 4ns TX-delay on the MAC.
So changes to at least the RTL8211F PHY driver are needed to get it
working properly in all situations.

Changes since v4:
- add a fallback of 2ns (the value which was previously hardcoded) for
  the TX delay so we are backwards-compatible with older .dts'
- update the documentation with the new fallback value and add a small
  note that the "amlogic,tx-delay-ns" property is ignored when the
  phy-mode is "rmii".

Changes since v3:
- rebased to apply against current net-next branch (fixes a conflict
  with d2ed0a7755fe14c7 "net: ethernet: stmmac: fix of-node and
  fixed-link-phydev leaks")

Changes since v2:
- moved all .dts patches (3-7) to a separate series
- removed the default 2ns TX delay when phy-mode RGMII is specified
- (rebased against current net-next)

Changes since v1:
- renamed the devicetree property "amlogic,tx-delay" to
  "amlogic,tx-delay-ns", which makes the .dts easier to read as we can
  simply specify human-readable values instead of having "preprocessor
  defines and calculation in human brain". Thanks to Andrew Lunn for the
  suggestion!
- improved documentation to indicate when the MAC TX-delay should be
  configured and how to use the PHY's TX-delay
- changed the default TX-delay in the dwmac-meson8b driver from 2ns to
  0ns when any of the rgmii-*id modes are used (the 2ns default value
  still applies for phy-mode "rgmii")
- added patches to properly reset the PHY on Meson GXBB devices and to
  use a similar configuration to the one we use on Meson GXL devices
  (by passing a phy-handle to stmmac and defining the PHY in the mdio0
  bus - patch 3-6)
- add the "amlogic,tx-delay-ns" property to all boards which are using
  the RGMII PHY (patch 7)

Martin Blumenstingl (2):
  net: dt-bindings: add RGMII TX delay configuration to meson8b-dwmac
  net: stmmac: dwmac-meson8b: make the RGMII TX delay configurable

 .../devicetree/bindings/net/meson-dwmac.txt         | 16 
 drivers/net/ethernet/stm
Re: [PATCH net-next 0/7] net: dsa: bcm_sf2: Add support for BCM7278
From: Florian Fainelli
Date: Fri, 20 Jan 2017 12:36:27 -0800

> This patch series adds support for the Broadcom BCM7278 integrated switch
> which is a successor of the BCM7445 switch. We have a little bit of
> register shuffling going on, which is why most of the functional changes
> are to deal with that.

Applied, thanks.
Re: [PATCH net-next 0/2] net: systemport: Add support for SYSTEMPORT lite
From: Florian Fainelli
Date: Fri, 20 Jan 2017 11:08:25 -0800

> This patch series adds support for SYSTEMPORT Lite which is an evolution
> of the existing SYSTEMPORT adapter.
>
> The two generations are largely identical as far as the transmit/receive
> path are concerned, and there were just a few control path changes here
> and there.

Series applied, thanks.
Re: [RESEND PATCH net] macsec: fix validation failed in asynchronous operation.
Why are you resending this? The original posting on Jan 20th made it to the mailing list and is queued up in patchwork just fine. Also, regardless of the reason, a "RESEND" patch should always contain an explanation of why it needs to be resent. So that the maintainer doesn't need to ask questions like I am right now.
Re: [PATCH net v1 0/2] amd-xgbe: AMD XGBE driver fixes 2017-01-20
From: Tom Lendacky
Date: Fri, 20 Jan 2017 12:13:52 -0600

> This patch series addresses some issues in the AMD XGBE driver.
>
> The following fixes are included in this driver update series:
>
> - Add a fix for a version of the hardware that uses different register
>   offset values for a device with the same PCI device ID
> - Add support to check the return code from the xgbe_init() function
>
> This patch series is based on net.

Series applied.
Re: [PATCH net-next] ipv6: add NUMA awareness to seg6_hmac_init_algo()
From: Eric Dumazet
Date: Fri, 20 Jan 2017 08:08:56 -0800

> From: Eric Dumazet
>
> Since we allocate per cpu storage, let's also use NUMA hints.
>
> Signed-off-by: Eric Dumazet

Applied.
Re: [PATCH] net: stmicro: fix LS field mask in EEE configuration
From: Joao Pinto
Date: Fri, 20 Jan 2017 16:00:26 +0000

> This patch fixes the LS mask when setting EEE timer.
> LS field is 10 bits long and not 11 as currently.
>
> Signed-off-by: Joao Pinto
> Reported-By: Rayagond Kokatanur

Please indicate the appropriate target tree of your patch in the subject
line just like all other developers on this list do, don't make me guess.

This time I figured out that this is meant for the net-next tree, but I
will not guess next time, I will just reject your patch instead.

Thanks.
Re: [PATCH] net/mlx4: use rb_entry()
From: Leon Romanovsky
Date: Sun, 22 Jan 2017 09:48:39 +0200

> I don't understand completely the rationale behind this conversion.
> rb_entry == container_of, why do we need another name for it?

Because it's an annotation.

Either you agree that the macro exists and it should be used in every
spot where those types are being used, or you don't and therefore argue
for removing the macro and its usage completely.
Re: [PATCH] 6lowpan: use rb_entry()
From: Geliang Tang
Date: Fri, 20 Jan 2017 22:36:53 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
>
> Signed-off-by: Geliang Tang

Applied.
Re: [PATCH] net/mlx4: use rb_entry()
From: Geliang Tang
Date: Fri, 20 Jan 2017 22:36:57 +0800

> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.
>
> Signed-off-by: Geliang Tang

Applied.
[PATCH net-next] net: dsa: Fix inverted test for multiple CPU interface
Remove the wrong !, otherwise we get false positives about having
multiple CPU interfaces.

Fixes: b22de490869d ("net: dsa: store CPU switch structure in the tree")
Signed-off-by: Andrew Lunn
---
 net/dsa/dsa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 77cb78767f1d..1f3afeb673d6 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -225,7 +225,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 			continue;
 
 		if (!strcmp(name, "cpu")) {
-			if (!dst->cpu_switch) {
+			if (dst->cpu_switch) {
 				netdev_err(dst->master_netdev,
 					   "multiple cpu ports?!\n");
 				return -EINVAL;
-- 
2.11.0
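The corrected test implies that the first "cpu" port claims the tree's CPU switch and only a later one is rejected. The following is a self-contained sketch of that logic. The struct and function are simplified, hypothetical stand-ins for the DSA types. The sketch also assumes that the surrounding code records the switch right after the check, which the hunk does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct dsa_switch_tree. */
struct tree {
	const void *cpu_switch;
};

/*
 * First "cpu" port: record its switch. Any later "cpu" port, seen
 * while cpu_switch is already set, is the error case.
 */
static int note_port(struct tree *dst, const void *ds, const char *name)
{
	if (!strcmp(name, "cpu")) {
		if (dst->cpu_switch)
			return -EINVAL;	/* "multiple cpu ports?!" */
		dst->cpu_switch = ds;
	}
	return 0;
}
```

With the inverted `!dst->cpu_switch` test, the very first "cpu" port would hit the error path instead. That is the false positive the patch describes.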
[PATCH net-next v6 1/1] net sched actions: Add support for user cookies
From: Jamal Hadi Salim

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it. The user can store whatever they wish in the
128 bits.

Sample exercise (showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

--- 127.0.0.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2109ms
rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1

... show some stats
$ sudo $TC -s actions get action gact index 1

action order 1: gact action pass
 random type none pass val 0
 index 1 ref 2 bind 1 installed 204 sec used 5 sec
Action statistics:
Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. try longer cookie...
$ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef
.. dump..
$ sudo $TC -s actions ls action gact

action order 1: gact action pass
 random type none pass val 0
 index 1 ref 2 bind 1 installed 204 sec used 5 sec
Action statistics:
Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie 1234567890abcdef

Signed-off-by: Jamal Hadi Salim
---
Changes in v6:
- fix mem leak caught by Florian

Changes in V5:
- kill the stylistic changes
- Adopt a new structure with length-valuepointer representation
- rename some things

Changes in v4:
- move stylistic changes out into a separate patch (and add more
  stylistic changes)

Changes in v3:
- use TC_ prefix for the max size
- move the cookie struct so visible only to kernel
- remove unneeded void * cast

Changes in V2:
- move from a union to a length-value representation

 include/net/act_api.h        |  1 +
 include/net/pkt_cls.h        |  8 
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/act_api.c          | 36 
 4 files changed, 48 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 1d71644..cfa2ae3 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -41,6 +41,7 @@ struct tc_action {
 	struct rcu_head			tcfa_rcu;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue __percpu *cpu_qstats;
+	struct tc_cookie	*act_cookie;
 };
 #define tcf_head	common.tcfa_head
 #define tcf_index	common.tcfa_index
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f0a0514..b43077e 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -515,4 +515,12 @@ struct tc_cls_bpf_offload {
 	u32 gen_flags;
 };
 
+
+/* This structure holds cookie structure that is passed from user
+ * to the kernel for actions and classifiers
+ */
+struct tc_cookie {
+	u8  *data;
+	u32 len;
+};
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index fd373eb..345551e 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -4,6 +4,8 @@
 #include
 #include
 
+#define TC_COOKIE_MAX_SIZE 16
+
 /* Action attributes */
 enum {
 	TCA_ACT_UNSPEC,
@@ -12,6 +14,7 @@ enum {
 	TCA_ACT_INDEX,
 	TCA_ACT_STATS,
 	TCA_ACT_PAD,
+	TCA_ACT_COOKIE,
 	__TCA_ACT_MAX
 };
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index cd08df9..58cf1c5 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -33,6 +34,8 @@ static void free_tcf(struct rcu_head *head)
 
 	free_percpu(p->cpu_bstats);
 	free_percpu(p->cpu_qstats);
+	kfree(p->act_cookie->data);
+	kfree(p->act_cookie);
 	kfree(p);
 }
 
@@ -475,6 +478,12 @@ int tcf_action_destroy(struct list_head *actions, int bind)
 		goto nla_put_failure;
 	if (tcf_action_copy_stats(skb, a, 0))
 		goto nla_put_failure;
+	if (a->act_cookie) {
+		if (nla_put(skb, TCA_ACT_COOKIE, a->act_co
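The cookie is stored as a length-value pair capped at TC_COOKIE_MAX_SIZE (16) bytes. It needs two allocations, one for the struct and one for the data, so every failure path has to release whatever was already allocated. That was the v5 leak pointed out in review. As posted, free_tcf() also appears to dereference p->act_cookie without a NULL check, even though the cookie is optional. The following is a user-space sketch of both points, with malloc()/free() standing in for the kernel allocators. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define COOKIE_MAX_SIZE 16	/* mirrors TC_COOKIE_MAX_SIZE */

/* User-space mirror of struct tc_cookie. */
struct cookie {
	uint8_t *data;
	uint32_t len;
};

static struct cookie *cookie_dup(const void *payload, uint32_t len)
{
	struct cookie *c;

	if (len == 0 || len > COOKIE_MAX_SIZE)
		return NULL;

	c = malloc(sizeof(*c));
	if (!c)
		return NULL;

	c->data = malloc(len);
	if (!c->data) {
		free(c);	/* release the struct too, so nothing leaks */
		return NULL;
	}
	memcpy(c->data, payload, len);
	c->len = len;
	return c;
}

/* Safe for actions that never had a cookie attached. */
static void cookie_free(struct cookie *c)
{
	if (c)
		free(c->data);
	free(c);
}
```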
Re: [PATCH net-next v5 1/1] net sched actions: Add support for user cookies
On 17-01-22 02:32 PM, Jiri Pirko wrote:
> Sun, Jan 22, 2017 at 07:57:17PM CET, j...@mojatatu.com wrote:
>> On 17-01-22 01:13 PM, Florian Fainelli wrote:
>>>
>>> > +		a->act_cookie->data = nla_memdup(tb[TCA_ACT_COOKIE],
>>> > +						 GFP_KERNEL);
>>> > +		if (!a->act_cookie->data) {
>>> > +			err = -ENOMEM;
>>> > +			tcf_hash_release(a, bind);
>>> > +			goto err_mod;
>>> > +		}
>>>
>>> Are not you leaking a->act_cookie here in case nla_memdup() fails here?
>>
>> yes, I am. Thanks for catching this. V6 coming up.
>
> Btw, you don't have to send cover letter for a single patch. In fact,
> you should not.

You can see i write small novels in my commit logs. Do you suggest i put
the git history there as well?

cheers,
jamal
Re: [PATCH net-next v5 1/1] net sched actions: Add support for user cookies
Sun, Jan 22, 2017 at 07:57:17PM CET, j...@mojatatu.com wrote:
>On 17-01-22 01:13 PM, Florian Fainelli wrote:
>>
>> > +		a->act_cookie->data = nla_memdup(tb[TCA_ACT_COOKIE],
>> > +						 GFP_KERNEL);
>> > +		if (!a->act_cookie->data) {
>> > +			err = -ENOMEM;
>> > +			tcf_hash_release(a, bind);
>> > +			goto err_mod;
>> > +		}
>>
>> Are not you leaking a->act_cookie here in case nla_memdup() fails here?
>>
>
>yes, I am. Thanks for catching this. V6 coming up.

Btw, you don't have to send cover letter for a single patch. In fact, you
should not.
[PATCH 3/3] sh_eth: stop using bare numbers for EESIPR values
Now that we have almost all EESIPR bits declared (and those that are still
not are most probably reserved anyway) we can at last replace the bare
numbers used for 'sh_eth_cpu_data::eesipr_value' initializers with the bit
names ORed together...

Signed-off-by: Sergei Shtylyov

---
 drivers/net/ethernet/renesas/sh_eth.c | 89 +-
 1 file changed, 78 insertions(+), 11 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -518,7 +518,14 @@ static struct sh_eth_cpu_data r7s72100_d
 
 	.ecsr_value	= ECSR_ICD,
 	.ecsipr_value	= ECSIPR_ICDIP,
-	.eesipr_value	= 0xe77f009f,
+	.eesipr_value	= EESIPR_TWB1IP | EESIPR_TWBIP | EESIPR_TC1IP |
+			  EESIPR_TABTIP | EESIPR_RABTIP | EESIPR_RFCOFIP |
+			  EESIPR_ECIIP |
+			  EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+			  EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+			  EESIPR_RMAFIP | EESIPR_RRFIP |
+			  EESIPR_RTLFIP | EESIPR_RTSFIP |
+			  EESIPR_PREIP | EESIPR_CERFIP,
 
 	.tx_check	= EESR_TC1 | EESR_FTC,
 	.eesr_err_check	= EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -556,7 +563,14 @@ static struct sh_eth_cpu_data r8a7740_da
 
 	.ecsr_value	= ECSR_ICD | ECSR_MPD,
 	.ecsipr_value	= ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP,
-	.eesipr_value	= EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003fffff,
+	.eesipr_value	= EESIPR_RFCOFIP | EESIPR_ECIIP |
+			  EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+			  EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+			  0x0000f000 | EESIPR_CNDIP | EESIPR_DLCIP |
+			  EESIPR_CDIP | EESIPR_TROIP | EESIPR_RMAFIP |
+			  EESIPR_CEEFIP | EESIPR_CELFIP |
+			  EESIPR_RRFIP | EESIPR_RTLFIP | EESIPR_RTSFIP |
+			  EESIPR_PREIP | EESIPR_CERFIP,
 
 	.tx_check	= EESR_TC1 | EESR_FTC,
 	.eesr_err_check	= EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
@@ -603,7 +617,12 @@ static struct sh_eth_cpu_data r8a777x_da
 
 	.ecsr_value	= ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD,
 	.ecsipr_value	= ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP,
-	.eesipr_value	= 0x01ff009f,
+	.eesipr_value	= EESIPR_RFCOFIP | EESIPR_ADEIP | EESIPR_ECIIP |
+			  EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+			  EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+			  EESIPR_RMAFIP | EESIPR_RRFIP |
+			  EESIPR_RTLFIP | EESIPR_RTSFIP |
+			  EESIPR_PREIP | EESIPR_CERFIP,
 
 	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
 	.eesr_err_check	= EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -626,7 +645,12 @@ static struct sh_eth_cpu_data r8a779x_da
 
 	.ecsr_value	= ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD | ECSR_MPD,
 	.ecsipr_value	= ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP |
 			  ECSIPR_MPDIP,
-	.eesipr_value	= 0x01ff009f,
+	.eesipr_value	= EESIPR_RFCOFIP | EESIPR_ADEIP | EESIPR_ECIIP |
+			  EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+			  EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+			  EESIPR_RMAFIP | EESIPR_RRFIP |
+			  EESIPR_RTLFIP | EESIPR_RTSFIP |
+			  EESIPR_PREIP | EESIPR_CERFIP,
 
 	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
 	.eesr_err_check	= EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -667,7 +691,12 @@ static struct sh_eth_cpu_data sh7724_dat
 
 	.ecsr_value	= ECSR_PSRTO | ECSR_LCHNG | ECSR_ICD,
 	.ecsipr_value	= ECSIPR_PSRTOIP | ECSIPR_LCHNGIP | ECSIPR_ICDIP,
-	.eesipr_value	= 0x01ff009f,
+	.eesipr_value	= EESIPR_RFCOFIP | EESIPR_ADEIP | EESIPR_ECIIP |
+			  EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
+			  EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
+			  EESIPR_RMAFIP | EESIPR_RRFIP |
+			  EESIPR_RTLFIP | EESIPR_RTSFIP |
+			  EESIPR_PREIP | EESIPR_CERFIP,
 
 	.tx_check	= EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO,
 	.eesr_err_check	= EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE |
@@ -702,7 +731,14 @@ static struct sh_eth_cpu_data sh7757_dat
 
 	.register_type	= SH_ETH_REG_FAST_SH4,
 
-	.eesipr_value	= EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003fffff,
+	.eesipr_value	= EESIPR_RFCOFIP | EESIPR_ECIIP |
+
[PATCH 2/3] sh_eth: add missing EESIPR bits
Renesas SH77{34|63} manuals describe more EESIPR bits than the current
driver. Declare the new bits with the end goal of using the bit names
instead of the bare numbers for the 'sh_eth_cpu_data::eesipr_value'
initializers...

Signed-off-by: Sergei Shtylyov

---
 drivers/net/ethernet/renesas/sh_eth.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.h
===
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.h
+++ net-next/drivers/net/ethernet/renesas/sh_eth.h
@@ -269,13 +269,17 @@ enum EESR_BIT {
 
 /* EESIPR */
 enum EESIPR_BIT {
-	EESIPR_TWBIP	= 0x40000000,
+	EESIPR_TWB1IP	= 0x80000000,
+	EESIPR_TWBIP	= 0x40000000,	/* same as TWB0IP */
+	EESIPR_TC1IP	= 0x20000000,
+	EESIPR_TUCIP	= 0x10000000,
+	EESIPR_ROCIP	= 0x08000000,
 	EESIPR_TABTIP	= 0x04000000,
 	EESIPR_RABTIP	= 0x02000000,
 	EESIPR_RFCOFIP	= 0x01000000,
 	EESIPR_ADEIP	= 0x00800000,
 	EESIPR_ECIIP	= 0x00400000,
-	EESIPR_FTCIP	= 0x00200000,
+	EESIPR_FTCIP	= 0x00200000,	/* same as TC0IP */
 	EESIPR_TDEIP	= 0x00100000,
 	EESIPR_TFUFIP	= 0x00080000,
 	EESIPR_FRIP	= 0x00040000,
@@ -286,6 +290,8 @@ enum EESIPR_BIT {
 	EESIPR_CDIP	= 0x00000200,
 	EESIPR_TROIP	= 0x00000100,
 	EESIPR_RMAFIP	= 0x00000080,
+	EESIPR_CEEFIP	= 0x00000040,
+	EESIPR_CELFIP	= 0x00000020,
 	EESIPR_RRFIP	= 0x00000010,
 	EESIPR_RTLFIP	= 0x00000008,
 	EESIPR_RTSFIP	= 0x00000004,
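Patch 3/3 in this series is meant to be a pure renaming. Each ORed list of bit names should therefore reproduce exactly the bare constant it replaces, and the check below confirms this for two of the initializers. The bit values come from the enum above. RDEIP, RFOFIP, PREIP and CERFIP are not visible in these hunks, so they are assumed to continue the same descending bit order (0x00020000, 0x00010000, 0x00000002, 0x00000001). Plain unsigned macros stand in for the kernel enum because 0x80000000 does not fit in a standard C `int` enumerator.

```c
#include <assert.h>
#include <stdint.h>

/* EESIPR bits as 32-bit unsigned values (subset used below). */
#define EESIPR_TWB1IP	0x80000000u
#define EESIPR_TWBIP	0x40000000u
#define EESIPR_TC1IP	0x20000000u
#define EESIPR_TABTIP	0x04000000u
#define EESIPR_RABTIP	0x02000000u
#define EESIPR_RFCOFIP	0x01000000u
#define EESIPR_ADEIP	0x00800000u
#define EESIPR_ECIIP	0x00400000u
#define EESIPR_FTCIP	0x00200000u
#define EESIPR_TDEIP	0x00100000u
#define EESIPR_TFUFIP	0x00080000u
#define EESIPR_FRIP	0x00040000u
#define EESIPR_RDEIP	0x00020000u	/* assumed */
#define EESIPR_RFOFIP	0x00010000u	/* assumed */
#define EESIPR_RMAFIP	0x00000080u
#define EESIPR_RRFIP	0x00000010u
#define EESIPR_RTLFIP	0x00000008u
#define EESIPR_RTSFIP	0x00000004u
#define EESIPR_PREIP	0x00000002u	/* assumed */
#define EESIPR_CERFIP	0x00000001u	/* assumed */

/* r8a777x / r8a779x / sh7724 list from patch 3/3 (was 0x01ff009f) */
static const uint32_t eesipr_r8a777x =
	EESIPR_RFCOFIP | EESIPR_ADEIP | EESIPR_ECIIP |
	EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
	EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
	EESIPR_RMAFIP | EESIPR_RRFIP |
	EESIPR_RTLFIP | EESIPR_RTSFIP |
	EESIPR_PREIP | EESIPR_CERFIP;

/* r7s72100 list from patch 3/3 (was 0xe77f009f) */
static const uint32_t eesipr_r7s72100 =
	EESIPR_TWB1IP | EESIPR_TWBIP | EESIPR_TC1IP |
	EESIPR_TABTIP | EESIPR_RABTIP | EESIPR_RFCOFIP |
	EESIPR_ECIIP |
	EESIPR_FTCIP | EESIPR_TDEIP | EESIPR_TFUFIP |
	EESIPR_FRIP | EESIPR_RDEIP | EESIPR_RFOFIP |
	EESIPR_RMAFIP | EESIPR_RRFIP |
	EESIPR_RTLFIP | EESIPR_RTSFIP |
	EESIPR_PREIP | EESIPR_CERFIP;
```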
[PATCH 1/3] sh_eth: rename EESIPR bits
Since the commit b0ca2a21f769 ("sh_eth: Add support of SH7763 to sh_eth") the *enum* declaring the EESIPR bits (interrupt mask) went out of sync with the *enum* declaring the EESR bits (interrupt status) WRT bit naming and formatting. I'd like to restore the consistency by using EESIPR as the bit name prefix, renaming the *enum* to EESIPR_BIT, and (finally) renaming the bits according to the available Renesas SH77{34|63} manuals... Signed-off-by: Sergei Shtylyov --- drivers/net/ethernet/renesas/sh_eth.c | 22 ++-- drivers/net/ethernet/renesas/sh_eth.h | 36 +- 2 files changed, 34 insertions(+), 24 deletions(-) Index: net-next/drivers/net/ethernet/renesas/sh_eth.c === --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c +++ net-next/drivers/net/ethernet/renesas/sh_eth.c @@ -556,7 +556,7 @@ static struct sh_eth_cpu_data r8a7740_da .ecsr_value = ECSR_ICD | ECSR_MPD, .ecsipr_value = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP, - .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f, + .eesipr_value = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f, .tx_check = EESR_TC1 | EESR_FTC, .eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT | @@ -702,7 +702,7 @@ static struct sh_eth_cpu_data sh7757_dat .register_type = SH_ETH_REG_FAST_SH4, - .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f, + .eesipr_value = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f, .tx_check = EESR_FTC | EESR_CND | EESR_DLC | EESR_CD | EESR_RTO, .eesr_err_check = EESR_TWB | EESR_TABT | EESR_RABT | EESR_RFE | @@ -769,7 +769,7 @@ static struct sh_eth_cpu_data sh7757_dat .ecsr_value = ECSR_ICD | ECSR_MPD, .ecsipr_value = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP, - .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f, + .eesipr_value = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f, .tx_check = EESR_TC1 | EESR_FTC, .eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT | @@ -800,7 +800,7 @@ static struct sh_eth_cpu_data sh7734_dat .ecsr_value = ECSR_ICD | ECSR_MPD, .ecsipr_value = ECSIPR_LCHNGIP | 
ECSIPR_ICDIP | ECSIPR_MPDIP, - .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f07ff, + .eesipr_value = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f07ff, .tx_check = EESR_TC1 | EESR_FTC, .eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT | @@ -830,7 +830,7 @@ static struct sh_eth_cpu_data sh7763_dat .ecsr_value = ECSR_ICD | ECSR_MPD, .ecsipr_value = ECSIPR_LCHNGIP | ECSIPR_ICDIP | ECSIPR_MPDIP, - .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f07ff, + .eesipr_value = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f07ff, .tx_check = EESR_TC1 | EESR_FTC, .eesr_err_check = EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT | @@ -851,7 +851,7 @@ static struct sh_eth_cpu_data sh7763_dat static struct sh_eth_cpu_data sh7619_data = { .register_type = SH_ETH_REG_FAST_SH3_SH2, - .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f, + .eesipr_value = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f, .apr= 1, .mpr= 1, @@ -862,7 +862,7 @@ static struct sh_eth_cpu_data sh7619_dat static struct sh_eth_cpu_data sh771x_data = { .register_type = SH_ETH_REG_FAST_SH3_SH2, - .eesipr_value = DMAC_M_RFRMER | DMAC_M_ECI | 0x003f, + .eesipr_value = EESIPR_RFCOFIP | EESIPR_ECIIP | 0x003f, .tsu= 1, }; @@ -1547,10 +1547,10 @@ static void sh_eth_emac_interrupt(struct sh_eth_rcv_snd_disable(ndev); } else { /* Link Up */ - sh_eth_modify(ndev, EESIPR, DMAC_M_ECI, 0); + sh_eth_modify(ndev, EESIPR, EESIPR_ECIIP, 0); /* clear int */ sh_eth_modify(ndev, ECSR, 0, 0); - sh_eth_modify(ndev, EESIPR, DMAC_M_ECI, DMAC_M_ECI); + sh_eth_modify(ndev, EESIPR, EESIPR_ECIIP, EESIPR_ECIIP); /* enable tx and rx */ sh_eth_rcv_snd_enable(ndev); } @@ -1652,7 +1652,7 @@ static irqreturn_t sh_eth_interrupt(int * bit... 
*/ intr_enable = sh_eth_read(ndev, EESIPR); - intr_status &= intr_enable | DMAC_M_ECI; + intr_status &= intr_enable | EESIPR_ECIIP; if (intr_status & (EESR_RX_CHECK | cd->tx_check | EESR_ECI | cd->eesr_err_check)) ret = IRQ_HANDLED; @@ -3199,7 +3199,7 @@ static int sh_eth_wol_setup(struct net_d /* Only allow ECI interrupts */ synchronize_irq(ndev->irq);
[PATCH 0/3] sh_eth: E-DMAC interrupt mask cleanups
Hello.

Here's a set of 3 patches against DaveM's 'net-next.git' repo. The main goal of this set is to stop using the bare numbers for the E-DMAC interrupt masks.

[1/3] sh_eth: rename EESIPR bits
[2/3] sh_eth: add missing EESIPR bits
[3/3] sh_eth: stop using bare numbers for EESIPR values

MBR, Sergei
Re: [PATCH] net/mlx4: use rb_entry()
On Sun, Jan 22, 2017 at 10:42:25PM +0800, Geliang Tang wrote: > On Sun, Jan 22, 2017 at 09:48:39AM +0200, Leon Romanovsky wrote: > > On Fri, Jan 20, 2017 at 10:36:57PM +0800, Geliang Tang wrote: > > > To make the code clearer, use rb_entry() instead of container_of() to > > > deal with rbtree. > > > > > > Signed-off-by: Geliang Tang > > > --- > > > drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 8 > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > I don't understand completely the rationale behind this conversion. > > rb_entry == container_of, why do we need another name for it? > > > > There are several *_entry macros which are defined in kernel data > structures, like list_entry, hlist_entry, rb_entry, etc. Each of them is > just another name for container_of. We use different *_entry so that we > could identify the specific type of data structure that we are dealing > with. Your proposed patch doesn't support the importance of such knowledge for rb_entry. The list_entry case is totally different, because you perform operations on it. Anyway, it doesn't matter. Reviewed-by: Leon Romanovsky
Re: [PATCH net-next v5 1/1] net sched actions: Add support for user cookies
On 17-01-22 01:13 PM, Florian Fainelli wrote: + a->act_cookie->data = nla_memdup(tb[TCA_ACT_COOKIE], +GFP_KERNEL); + if (!a->act_cookie->data) { + err = -ENOMEM; + tcf_hash_release(a, bind); + goto err_mod; + } Aren't you leaking a->act_cookie here if nla_memdup() fails? Yes, I am. Thanks for catching this. V6 coming up. cheers, jamal
Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
On Sat, Jan 21, 2017 at 11:12 AM, kernel netdev wrote: > > > On 21 Jan 2017 7.10 PM, "Tom Herbert" wrote: > > On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed > wrote: >> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet >> wrote: >>> From: Eric Dumazet >>> >>> A driver using dev_alloc_page() must not reuse a page allocated from >>> emergency memory reserve. >>> >>> Otherwise all packets using this page will be immediately dropped, >>> unless for very specific sockets having SOCK_MEMALLOC bit set. >>> >>> This issue might be hard to debug, because only a fraction of received >>> packets would be dropped. >> >> Hi Eric, >> >> When you say reuse, do you mean pointing to the same page from several SKBs? >> >> Because in our page cache implementation we don't reuse pages that >> were already passed to the stack, >> we just keep them in the page cache until the ref count drops back to >> one, so we recycle them (i.e. they will be re-used only when no one >> else is using them). >> > Saeed, > > Speaking of the mlx page cache, can we remove this or at least make it > optional to use? It is another example of complex functionality being > put into drivers that makes things like backports more complicated and > provides at best some marginal value. In the case of the mlx5e cache > code the results from pktgen really weren't very impressive in the > first place. Also, the cache suffers from HOL blocking where we can > block the whole cache due to an outstanding reference on just one page > (something that you wouldn't see in pktgen but is likely to happen in > real applications). > > > (Sent from phone in car) > > To Tom, have you measured the effect of this page cache before claiming it > is ineffective? No, I have not. TBH, I have spent most of the past few weeks trying to debug a backport of the code from 4.9 to 4.6. Until we have a working backport, performance is immaterial for our purposes.
Unfortunately, we are seeing some issues: the checksum faults I described previously, and crashes on bad page refcnts, which are presumably caused by the logic in RX buffer processing. This is why I am now having to dissect the code and trying to disable things like the page cache that are not essential functionality. In any case, the HOL blocking issue is obvious from reading the code and implies bimodal behavior; we don't need a test to know it's bad, as we've seen its bad effects in many other contexts. > > My previous measurements show approx 20% speedup on a UDP test with delivery > to remote CPU. > > Removing the cache would of course be a good use case for speeding up the page > allocator (PCP), which Mel Gorman and I are working on. AFAIK an order-0 page > currently costs 240 cycles. Mel has reduced it to 180, and to 150 cycles without > NUMA. With bulking this can be amortized to 80 cycles. > That would be great. If only I had a nickel for every time someone started working on a driver and came to the conclusion that they need to write a custom memory allocator because the kernel allocator is so inefficient! Tom > --Jesper >
Re: [PATCH net-next v5 1/1] net sched actions: Add support for user cookies
On 01/22/2017 04:51 AM, Jamal Hadi Salim wrote: > From: Jamal Hadi Salim > > Introduce optional 128-bit action cookie. > Like all other cookie schemes in the networking world (eg in protocols > like http or existing kernel fib protocol field, etc) the idea is to save > user state that when retrieved serves as a correlator. The kernel > _should not_ interpret it. The user can store whatever they wish in the > 128 bits. > > Sample exercise (showing variable length use of cookie) > > .. create an accept action with cookie a1b2c3d4 > sudo $TC actions add action ok index 1 cookie a1b2c3d4 > > .. dump all gact actions.. > sudo $TC -s actions ls action gact > > action order 0: gact action pass > random type none pass val 0 > index 1 ref 1 bind 0 installed 5 sec used 5 sec > Action statistics: > Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > cookie a1b2c3d4 > > .. bind the accept action to a filter.. > sudo $TC filter add dev lo parent : protocol ip prio 1 \ > u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1 > > ... send some traffic.. > $ ping 127.0.0.1 -c 3 > PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. > 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms > 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms > 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms > > --- 127.0.0.1 ping statistics --- > 3 packets transmitted, 3 received, 0% packet loss, time 2109ms > rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1 > > ... show some stats > $ sudo $TC -s actions get action gact index 1 > > action order 1: gact action pass > random type none pass val 0 > index 1 ref 2 bind 1 installed 204 sec used 5 sec > Action statistics: > Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > cookie a1b2c3d4 > > .. try longer cookie... > $ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef > .. dump..
> $ sudo $TC -s actions ls action gact > > action order 1: gact action pass > random type none pass val 0 > index 1 ref 2 bind 1 installed 204 sec used 5 sec > Action statistics: > Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0) > backlog 0b 0p requeues 0 > cookie 1234567890abcdef > > Signed-off-by: Jamal Hadi Salim > + a->act_cookie->data = nla_memdup(tb[TCA_ACT_COOKIE], > + GFP_KERNEL); > + if (!a->act_cookie->data) { > + err = -ENOMEM; > + tcf_hash_release(a, bind); > + goto err_mod; > + } Aren't you leaking a->act_cookie here if nla_memdup() fails? -- Florian
Re: [PATCH net] net/mlx5e: Do not recycle pages from emergency reserve
On Sat, Jan 21, 2017 at 12:31 PM, Saeed Mahameed wrote: > On Sat, Jan 21, 2017 at 9:12 PM, kernel netdev wrote: >> >> >> On 21 Jan 2017 7.10 PM, "Tom Herbert" wrote: >> >> On Thu, Jan 19, 2017 at 11:14 AM, Saeed Mahameed >> wrote: >>> On Thu, Jan 19, 2017 at 9:03 AM, Eric Dumazet >>> wrote: From: Eric Dumazet A driver using dev_alloc_page() must not reuse a page allocated from emergency memory reserve. Otherwise all packets using this page will be immediately dropped, unless for very specific sockets having SOCK_MEMALLOC bit set. This issue might be hard to debug, because only a fraction of received packets would be dropped. >>> >>> Hi Eric, >>> >>> When you say reuse, do you mean pointing to the same page from several SKBs? >>> >>> Because in our page cache implementation we don't reuse pages that >>> were already passed to the stack, >>> we just keep them in the page cache until the ref count drops back to >>> one, so we recycle them (i.e. they will be re-used only when no one >>> else is using them). >>> >> Saeed, >> >> Speaking of the mlx page cache, can we remove this or at least make it >> optional to use? It is another example of complex functionality being >> put into drivers that makes things like backports more complicated and > > Re complexity, I am not sure the mlx page cache is that complex, > we just wrap alloc_page/put_page with our own page cache calls. > Roughly, the page cache implementation is 200-300 LOC tops, all concentrated > in one place in the code. > Taken as part of the RX buffer management code, the whole thing is very complicated and seems to be completely bereft of any comments in the code as to how things are supposed to work. >> provides at best some marginal value. In the case of the mlx5e cache >> code the results from pktgen really weren't very impressive in the >> first place.
Also, the cache suffers from HOL blocking where we can > > Well, with pktgen you won't notice a huge improvement since the pages are > freed > in the stack directly from our rx receive handler (gro_receive), those > pages will go back to the page allocator and get requested immediately > again from the driver (no stress). > > The real improvements are seen when you really stress the page allocator with > real use cases such as TCP/UDP with user applications, where the > pages are held longer than the driver needs them; in this case the page cache > plays an important role in reducing the stress > on the page allocator, since with those use cases we are juggling > more pages than pktgen could. > > With more stress (#cores TCP streams) our humble page cache sometimes > can't keep up: it fills up fast and fails to recycle > a huge percentage of the driver's page requests. > > Before our own page cache we used dev_alloc_skb, which used its own > cache "page_frag_cache", and it worked well enough. So I don't really > recommend removing the page cache until we have > a generic RX page cache solution for all device drivers. > >> block the whole cache due to an outstanding reference on just one page >> (something that you wouldn't see in pktgen but is likely to happen in >> real applications). >> > > Re the HOL issue, we have some upcoming patches that would drastically > improve the HOL blocking issue (we will simply swap the HOL on every > sample). > >> >> (Sent from phone in car) >> > > Drive safe :) .. > >> To Tom, have you measured the effect of this page cache before claiming it >> is ineffective? >> >> My previous measurements show approx 20% speedup on a UDP test with delivery >> to remote CPU. >> >> Removing the cache would of course be a good use case for speeding up the page >> allocator (PCP), which Mel Gorman and I are working on. AFAIK an order-0 page >> currently costs 240 cycles.
Mel has reduced it to 180, and to 150 >> cycles without NUMA. With bulking this can be amortized to 80 cycles. >> > > Are you trying to say that we won't need the cache if you manage to > deliver those optimizations? > Can you compare those optimizations with the page_frag_cache from > dev_alloc_skb? > >> --Jesper >>
Re: [PATCH] Documentation: net: phy: improve explanation when to specify the PHY ID
On Sun, Jan 22, 2017 at 05:41:32PM +0100, Martin Blumenstingl wrote: > The old description basically read like "ethernet-phy-id." can > be specified when you know the actual PHY ID. However, specifying this > has a side-effect: it forces Linux to bind to a certain PHY driver (the > one that matches the ID given in the compatible string), ignoring the ID > which is reported by the actual PHY. > Whenever a device is shipped with (multiple) different PHYs during its > production lifetime, explicitly specifying > "ethernet-phy-id." could break certain revisions of that device. > > Signed-off-by: Martin Blumenstingl Reviewed-by: Andrew Lunn Thanks Andrew
Re: [PATCH] net: mvneta: implement .set_wol and .get_wol
On Sun, Jan 22, 2017 at 06:06:30PM +0800, Jingju Hou wrote: > Signed-off-by: Jingju Hou Hi Jingju Please include a real comment here. Something like: The mvneta itself does not support WOL, but the PHY might. So pass the calls to the PHY. It also looks like you are patching an old kernel. Network patches like this need to be against net-next. You should also include net-next in the subject line. Thanks Andrew
[PATCH] Documentation: net: phy: improve explanation when to specify the PHY ID
The old description basically read like "ethernet-phy-id." can be specified when you know the actual PHY ID. However, specifying this has a side-effect: it forces Linux to bind to a certain PHY driver (the one that matches the ID given in the compatible string), ignoring the ID which is reported by the actual PHY. Whenever a device is shipped with (multiple) different PHYs during its production lifetime, explicitly specifying "ethernet-phy-id." could break certain revisions of that device. Signed-off-by: Martin Blumenstingl --- Thanks to Andrew Lunn for pointing the documentation issue out to me in: http://lists.infradead.org/pipermail/linux-amlogic/2017-January/002141.html Documentation/devicetree/bindings/net/phy.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt index ff1bc4b1bb3b..fb5056b22685 100644 --- a/Documentation/devicetree/bindings/net/phy.txt +++ b/Documentation/devicetree/bindings/net/phy.txt @@ -19,8 +19,9 @@ Optional Properties: specifications. If neither of these are specified, the default is to assume clause 22. - If the phy's identifier is known then the list may contain an entry - of the form: "ethernet-phy-id." where + If the PHY reports an incorrect ID (or none at all) then the + "compatible" list may contain an entry with the correct PHY ID in the + form: "ethernet-phy-id." where - The value of the 16 bit Phy Identifier 1 register as 4 hex digits. This is the chip vendor OUI bits 3:18 - The value of the 16 bit Phy Identifier 2 register as -- 2.11.0
RE: [PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to extended statistics
> -Original Message- > From: Stephen Hemminger [mailto:step...@networkplumber.org] > Sent: Thursday, January 19, 2017 9:11 PM > To: Roopa Prabhu > Cc: Nogah Frankel ; netdev@vger.kernel.org; > roszenr...@gmail.com; Jiri Pirko ; Ido Schimmel > ; Elad Raz ; Yotam Gigi > ; Or Gerlitz > Subject: Re: [PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to > extended statistics > > On Thu, 19 Jan 2017 08:06:21 -0800 > Roopa Prabhu wrote: > > > On 1/19/17, 7:21 AM, Nogah Frankel wrote: > > >> -Original Message- > > >> From: Nogah Frankel > > >> Sent: Sunday, January 15, 2017 3:55 PM > > >> To: 'Stephen Hemminger' > > >> Cc: netdev@vger.kernel.org; roszenr...@gmail.com; > ro...@cumulusnetworks.com; Jiri > > >> Pirko ; Ido Schimmel ; Elad Raz > > >> ; Yotam Gigi ; Or Gerlitz > > >> > > >> Subject: RE: [PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to > > >> extended > statistics > > >> > > >> > > >> > > >>> -Original Message- > > >>> From: Stephen Hemminger [mailto:step...@networkplumber.org] > > >>> Sent: Friday, January 13, 2017 3:44 AM > > >>> To: Nogah Frankel > > >>> Cc: netdev@vger.kernel.org; roszenr...@gmail.com; > ro...@cumulusnetworks.com; > > >> Jiri > > >>> Pirko ; Ido Schimmel ; Elad Raz > > >>> ; Yotam Gigi ; Or Gerlitz > > >>> > > >>> Subject: Re: [PATCH iproute2 v4 3/4] ifstat: Add 64 bits based stats to > > >>> extended > statistics > > >>> > > >>> On Thu, 12 Jan 2017 15:49:50 +0200 > > >>> Nogah Frankel wrote: > > >>> > > The default stats for ifstat are 32 bits based. > > The kernel supports 64 bits based stats. (They are returned in struct > > rtnl_link_stats64 which is an exact copy of struct rtnl_link_stats, in > > which the "normal" stats are returned, but with fields of u64 instead > > of > > u32). This patch adds them as an extended stats. > > > > It is read with filter type IFLA_STATS_LINK_64 and no sub type. 
> > > > It is under the name 64bits > > (or any abbreviation of it, such as "64") > > > > For example: > > ifstat -x 64bit > > > > Signed-off-by: Nogah Frankel > > Reviewed-by: Jiri Pirko > > >>> Other commands (like ip link) always use the 64 bit statistics if > > >>> available > > >>> from the device. I see no reason that ifstat needs to be different. > > >>> > > >> Do you mean to change the default ifstat results to be 64 bits based? > > >> I tried it in the first version, but Roopa commented that it was not a > > >> good idea. > > >> She said they tried it in the past and it caused backward > > >> compatibility problems. > > >> (Or maybe I didn't understand correctly) > > > So, can I leave the default ifstat results to be 32 bits based, for the > > > time being? > > > > > From past discussions: Moving the default to 64bit has compat issues with > > the old > history file. > > There is a way to make it work by using a new file header (to indicate that > > it is 64 bit) in > > a freshly created history file and also check this header before dumping > > stats into the > history file. > > ie maintain backward compat without introducing a new option. It is doable. > > > > One approach is, you can drop the 64bit option from this series and > > try updating the default to 64 bit (with compat handling code) in a later > > series. I think I will take your suggestion to drop the 64-bit option from this series. Hopefully, I'll return to it in some later series in the future. Thanks > > The ifstat code could do conversion based on file size. > > if (history_file_is_32bit()) { > printf("converting to 64 bit format\n"); > ... > } > >
Re: [RFC PATCH net-next 4/5] bridge: vlan lwt and dst_metadata netlink support
On 1/22/17, 4:05 AM, Nikolay Aleksandrov wrote: > On 21/01/17 06:46, Roopa Prabhu wrote: >> From: Roopa Prabhu >> >> This patch adds support to attach per vlan tunnel info dst >> metadata. This enables bridge driver to map vlan to tunnel_info >> at ingress and egress >> >> The initial use case is vlan to vni bridging, but the api is generic >> to extend to any tunnel_info in the future: >> - Uapi to configure/unconfigure/dump per vlan tunnel data >> - netlink functions to configure vlan and tunnel_info mapping >> - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach >> dst_metadata to bridged packets on ports. >> >> Use case: >> example use for this is a vxlan bridging gateway or vtep >> which maps vlans to vn-segments (or vnis). User can configure >> per-vlan tunnel information which the bridge driver can use >> to bridge vlan into the corresponding tunnel. >> >> CC: Nikolay Aleksandrov >> Signed-off-by: Roopa Prabhu >> --- >> CC'ing Nikolay for some more eyes as he has been trying to keep the >> bridge driver fast path lite. 
>> >> include/linux/if_bridge.h |1 + >> net/bridge/br_input.c |1 + >> net/bridge/br_netlink.c | 410 >> ++--- >> net/bridge/br_private.h | 18 ++ >> net/bridge/br_vlan.c | 138 ++- >> 5 files changed, 507 insertions(+), 61 deletions(-) >> >> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h >> index c6587c0..36ff611 100644 >> --- a/include/linux/if_bridge.h >> +++ b/include/linux/if_bridge.h >> @@ -46,6 +46,7 @@ struct br_ip_list { >> #define BR_LEARNING_SYNCBIT(9) >> #define BR_PROXYARP_WIFIBIT(10) >> #define BR_MCAST_FLOOD BIT(11) >> +#define BR_LWT_VLAN BIT(12) >> >> #define BR_DEFAULT_AGEING_TIME (300 * HZ) >> >> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c >> index 855b72f..83f356f 100644 >> --- a/net/bridge/br_input.c >> +++ b/net/bridge/br_input.c >> @@ -20,6 +20,7 @@ >> #include >> #include >> #include >> +#include >> #include "br_private.h" >> >> /* Hook for brouter */ >> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c >> index 71c7453..df997ad 100644 >> --- a/net/bridge/br_netlink.c >> +++ b/net/bridge/br_netlink.c >> @@ -17,17 +17,30 @@ >> #include >> #include >> #include >> +#include >> >> #include "br_private.h" >> #include "br_private_stp.h" >> >> -static int __get_num_vlan_infos(struct net_bridge_vlan_group *vg, >> -u32 filter_mask) >> +static size_t br_get_vlan_tinfo_size(void) >> { >> +return nla_total_size(0) + /* nest IFLA_BRIDGE_VLAN_TUNNEL_INFO */ >> + nla_total_size(sizeof(u32)) + /* IFLA_BRIDGE_VLAN_TUNNEL_ID */ >> + nla_total_size(sizeof(u16)) + /* IFLA_BRIDGE_VLAN_TUNNEL_VID >> */ >> + nla_total_size(sizeof(u16)); /* IFLA_BRIDGE_VLAN_TUNNEL_FLAGS >> */ >> +} >> + >> +static int __get_num_vlan_infos(struct net_bridge_port *p, >> +struct net_bridge_vlan_group *vg, >> +u32 filter_mask, int *num_vtinfos) >> +{ >> +struct net_bridge_vlan *vbegin = NULL, *vend = NULL; >> +struct net_bridge_vlan *vtbegin = NULL, *vtend = NULL; >> struct net_bridge_vlan *v; >> -u16 vid_range_start = 0, vid_range_end = 0, 
vid_range_flags = 0; >> +bool get_tinfos = (p && p->flags & BR_LWT_VLAN) ? true: false; >> +bool vcontinue, vtcontinue; >> +int num_vinfos = 0; >> u16 flags, pvid; >> -int num_vlans = 0; >> >> if (!(filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED)) >> return 0; >> @@ -36,6 +49,8 @@ static int __get_num_vlan_infos(struct >> net_bridge_vlan_group *vg, >> /* Count number of vlan infos */ >> list_for_each_entry_rcu(v, &vg->vlan_list, vlist) { >> flags = 0; >> +vcontinue = false; >> +vtcontinue = false; >> /* only a context, bridge vlan not activated */ >> if (!br_vlan_should_use(v)) >> continue; >> @@ -45,47 +60,79 @@ static int __get_num_vlan_infos(struct >> net_bridge_vlan_group *vg, >> if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED) >> flags |= BRIDGE_VLAN_INFO_UNTAGGED; >> >> -if (vid_range_start == 0) { >> -goto initvars; >> -} else if ((v->vid - vid_range_end) == 1 && >> -flags == vid_range_flags) { >> -vid_range_end = v->vid; >> +if (!vbegin) { >> +vbegin = v; >> +vend = v; >> +vcontinue = true; >> +} else if ((v->vid - vend->vid) == 1 && >> +flags == vbegin->flags) { >> +vend = v; >> +vcontinue = true; >> +} >> + >> +if (!vcontinue) { >> +if ((vend->vid - vbegin->vid) > 0
Re: [RFC PATCH net-next 5/5] bridge: vlan lwt dst_metadata hooks in ingress and egress paths
On 1/22/17, 4:15 AM, Nikolay Aleksandrov wrote: > On 21/01/17 06:46, Roopa Prabhu wrote: >> From: Roopa Prabhu >> >> - ingress hook: >> - if port is a lwt tunnel port, use tunnel info in >> attached dst_metadata to map it to a local vlan >> - egress hook: >> - if port is a lwt tunnel port, use tunnel info attached to >> vlan to set dst_metadata on the skb >> >> CC: Nikolay Aleksandrov >> Signed-off-by: Roopa Prabhu >> --- >> CC'ing Nikolay for some more eyes as he has been trying to keep the >> bridge driver fast path lite. >> >> net/bridge/br_input.c |4 >> net/bridge/br_private.h |4 >> net/bridge/br_vlan.c| 55 >> +++ >> 3 files changed, 63 insertions(+) >> >> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c >> index 83f356f..96602a1 100644 >> --- a/net/bridge/br_input.c >> +++ b/net/bridge/br_input.c >> @@ -262,6 +262,10 @@ rx_handler_result_t br_handle_frame(struct sk_buff >> **pskb) >> return RX_HANDLER_CONSUMED; >> >> p = br_port_get_rcu(skb->dev); >> +if (p->flags & BR_LWT_VLAN) { >> +if (br_handle_ingress_vlan_tunnel(skb, p, >> nbp_vlan_group_rcu(p))) >> +goto drop; >> +} > Is there any reason to do this so early (perhaps netfilter?) ? If not, you > can push it to the vlan __allowed_ingress > (and rename that function to something else, it does a hundred additional > things) > and avoid this check for all packets if vlans are disabled, thus people using > non-vlan filtering > bridge won't have an additional test in their fast path > > Yes, I forgot to mention it in the commit log. I had it close to __allowed_ingress in my first version, but had to move it up here because br_nf_pre_routing/br_nf_pre_routing_finish reset the dst, so by the time __allowed_ingress runs it is already too late.
Re: [RFC PATCH net-next 2/5] vxlan: make COLLECT_METADATA mode bridge friendly
On 1/22/17, 3:40 AM, Nikolay Aleksandrov wrote: > On 21/01/17 06:46, Roopa Prabhu wrote: >> From: Roopa Prabhu >> >> This patch series makes vxlan COLLECT_METADATA mode bridge >> and layer2 network friendly. Vxlan COLLECT_METADATA mode today >> solves the per-vni netdev scalability problem in l3 networks. >> When a vxlan collect metadata device participates in bridging >> vlan to vn-segments, it can only get the vlan mapped vni in >> the xmit tunnel dst metadata. It will need the vxlan driver to >> continue to learn, hold forwarding state and remote destination >> information similar to how it already does for non COLLECT_METADATA >> vxlan netdevices today. >> >> Changes introduced by this patch: >> - allow learning and forwarding database state to vxlan netdev in >> COLLECT_METADATA mode. Current behaviour is not changed >> by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used >> to support the new bridge friendly mode. >> - A single fdb table hashed by (mac, vni) to allow fdb entries with >> multiple vnis in the same fdb table >> - rx path already has the vni >> - tx path expects a vni in the packet with dst_metadata >> - prior to this series, fdb remote_dsts carried remote vni and >> the vxlan device carrying the fdb table represented the >> source vni. With the vxlan device now representing multiple vnis, >> this patch adds a src vni attribute to the fdb entry. The remote >> vni already uses NDA_VNI attribute. This patch introduces >> NDA_SRC_VNI netlink attribute to represent the src vni in a multi >> vni fdb table.
>> >> Signed-off-by: Roopa Prabhu >> --- > [snip] >> @@ -2173,23 +2221,29 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, >> struct net_device *dev) >> bool did_rsc = false; >> struct vxlan_rdst *rdst, *fdst = NULL; >> struct vxlan_fdb *f; >> +__be32 vni = 0; >> >> info = skb_tunnel_info(skb); >> >> skb_reset_mac_header(skb); >> >> if (vxlan->flags & VXLAN_F_COLLECT_METADATA) { >> -if (info && info->mode & IP_TUNNEL_INFO_TX) >> -vxlan_xmit_one(skb, dev, NULL, false); >> -else >> -kfree_skb(skb); >> -return NETDEV_TX_OK; >> +if (info && info->mode & IP_TUNNEL_INFO_BRIDGE && >> +info->mode & IP_TUNNEL_INFO_TX) { > nit: parentheses around the IP_TUNNEL_INFO_TX check > >> +vni = tunnel_id_to_key32(info->key.tun_id); >> +} else { >> +if (info && info->mode & IP_TUNNEL_INFO_TX) > nit: parentheses around the IP_TUNNEL_INFO_TX check ack >> +vxlan_xmit_one(skb, dev, vni, NULL, false); >> +else >> +kfree_skb(skb); >> +return NETDEV_TX_OK; >> +} >> } >> >> if (vxlan->flags & VXLAN_F_PROXY) { >> eth = eth_hdr(skb); >> if (ntohs(eth->h_proto) == ETH_P_ARP) >> -return arp_reduce(dev, skb); >> +return arp_reduce(dev, skb, vni); >> #if IS_ENABLED(CONFIG_IPV6) >> else if (ntohs(eth->h_proto) == ETH_P_IPV6 && >> pskb_may_pull(skb, sizeof(struct ipv6hdr) >> @@ -2200,13 +2254,13 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, >> struct net_device *dev) >> msg = (struct nd_msg >> *)skb_transport_header(skb); >> if (msg->icmph.icmp6_code == 0 && >> msg->icmph.icmp6_type == >> NDISC_NEIGHBOUR_SOLICITATION) >> -return neigh_reduce(dev, skb); >> +return neigh_reduce(dev, skb, vni); >> } >> #endif >> } >> >> eth = eth_hdr(skb); >> -f = vxlan_find_mac(vxlan, eth->h_dest); >> +f = vxlan_find_mac(vxlan, eth->h_dest, vni); >> did_rsc = false; >> >> if (f && (f->flags & NTF_ROUTER) && (vxlan->flags & VXLAN_F_RSC) && >> @@ -2214,11 +2268,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, >> struct net_device *dev) >> ntohs(eth->h_proto) == ETH_P_IPV6)) { >> did_rsc = 
route_shortcircuit(dev, skb); >> if (did_rsc) >> -f = vxlan_find_mac(vxlan, eth->h_dest); >> +f = vxlan_find_mac(vxlan, eth->h_dest, vni); >> } >> >> if (f == NULL) { >> -f = vxlan_find_mac(vxlan, all_zeros_mac); >> +f = vxlan_find_mac(vxlan, all_zeros_mac, vni); >> if (f == NULL) { >> if ((vxlan->flags & VXLAN_F_L2MISS) && >> !is_multicast_ether_addr(eth->h_dest)) >> @@ -2239,11 +2293,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, >> struct net_device *dev) >> } >> skb1 = skb_clone(skb, GFP_ATOMIC); >> if (skb1) >> -
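The key change in this series is that one fdb table is keyed by (mac, vni) instead of by mac alone, and vxlan_find_mac() gains a vni argument (callers outside the bridge mode pass vni = 0, as in the vxlan_xmit() hunk). The following is a minimal userspace sketch of such a table, assuming linear probing and an FNV-1a hash; both are arbitrary choices for illustration, not what the vxlan driver uses, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FDB_HASH_SIZE 256

/* Hypothetical fdb entry: the source vni is part of the lookup key. */
struct fdb_entry {
	uint8_t mac[6];
	uint32_t vni;
	uint32_t remote_ip;	/* stand-in for the remote destination */
	int in_use;
};

static struct fdb_entry fdb[FDB_HASH_SIZE];

/* Mix both mac and vni into the hash (FNV-1a, chosen only for the sketch). */
static unsigned int fdb_hash(const uint8_t mac[6], uint32_t vni)
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < 6; i++) {
		h ^= mac[i];
		h *= 16777619u;
	}
	for (int i = 0; i < 4; i++) {
		h ^= (vni >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h % FDB_HASH_SIZE;
}

/* Look up by (mac, vni); linear probing stops at the first empty slot. */
static struct fdb_entry *fdb_find(const uint8_t mac[6], uint32_t vni)
{
	unsigned int idx = fdb_hash(mac, vni);

	for (unsigned int n = 0; n < FDB_HASH_SIZE; n++) {
		struct fdb_entry *e = &fdb[(idx + n) % FDB_HASH_SIZE];

		if (!e->in_use)
			return NULL;
		if (e->vni == vni && memcmp(e->mac, mac, 6) == 0)
			return e;
	}
	return NULL;
}

/* Insert or update an entry; returns -1 if the table is full. */
static int fdb_add(const uint8_t mac[6], uint32_t vni, uint32_t remote_ip)
{
	unsigned int idx = fdb_hash(mac, vni);

	for (unsigned int n = 0; n < FDB_HASH_SIZE; n++) {
		struct fdb_entry *e = &fdb[(idx + n) % FDB_HASH_SIZE];

		if (!e->in_use || (e->vni == vni && memcmp(e->mac, mac, 6) == 0)) {
			memcpy(e->mac, mac, 6);
			e->vni = vni;
			e->remote_ip = remote_ip;
			e->in_use = 1;
			return 0;
		}
	}
	return -1;
}
```

With this keying, the same MAC learned in vni 100 and vni 200 yields two independent entries, which is what lets a single COLLECT_METADATA device hold forwarding state for many vnis.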
Re: [PATCH] net/mlx4: use rb_entry()
On Sun, Jan 22, 2017 at 09:48:39AM +0200, Leon Romanovsky wrote: > On Fri, Jan 20, 2017 at 10:36:57PM +0800, Geliang Tang wrote: > > To make the code clearer, use rb_entry() instead of container_of() to > > deal with rbtree. > > > > Signed-off-by: Geliang Tang > > --- > > drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 8 > > 1 file changed, 4 insertions(+), 4 deletions(-) > > I don't completely understand the rationale behind this conversion. > rb_entry == container_of, why do we need another name for it? > There are several *_entry macros defined for kernel data structures, like list_entry, hlist_entry, rb_entry, etc. Each of them is just another name for container_of. We use the different *_entry macros so that we can identify the specific type of data structure we are dealing with. -Geliang
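To make the equivalence concrete, here is a userspace sketch. The container_of below is a simplified version (the kernel's adds type checking), and struct rb_node and struct res are minimal stand-ins, not the kernel definitions. In the kernel, rb_entry() is defined directly as container_of(); the only difference is the name a reader sees.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry is the same operation, named for rbtree nodes. */
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Minimal stand-in for the kernel's struct rb_node. */
struct rb_node {
	struct rb_node *left, *right;
};

/* Hypothetical tracked resource embedding an rbtree node. */
struct res {
	int id;
	struct rb_node node;
};
```

Both macros return the same pointer, so the conversion is behavior-neutral. The benefit Geliang describes is readability: rb_entry(n, struct res, node) tells the reader that n is an rbtree node.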
Re: [patch net-next 2/4] net/sched: Introduce sample tc action
Jiri Pirko writes: > From: Yotam Gigi > > This action allows the user to sample traffic matched by tc classifier. > The sampling consists of choosing packets randomly and sampling them using > the psample module. The user can configure the psample group number, the > sampling rate and the packet's truncation (to save kernel-user traffic). > [skip] > diff --git a/include/uapi/linux/tc_act/tc_sample.h > b/include/uapi/linux/tc_act/tc_sample.h > new file mode 100644 > index 000..21378bc > --- /dev/null > +++ b/include/uapi/linux/tc_act/tc_sample.h > @@ -0,0 +1,26 @@ > +#ifndef __LINUX_TC_SAMPLE_H > +#define __LINUX_TC_SAMPLE_H > + > +#include > +#include > +#include > + > +#define TCA_ACT_SAMPLE 26 > + > +struct tc_sample { > + tc_gen; > +}; > + > +enum { > + TCA_SAMPLE_UNSPEC, > + TCA_SAMPLE_PARMS, > + TCA_SAMPLE_TM, > + TCA_SAMPLE_RATE, > + TCA_SAMPLE_TRUNC_SIZE, > + TCA_SAMPLE_PSAMPLE_GROUP, > + TCA_SAMPLE_PAD, > + __TCA_SAMPLE_MAX > +}; Most action implementations define the TCA_X_TM attribute as 1 and TCA_X_PARMS as 2, followed by action-specific TLVs; it is better to adhere to this style in newly designed actions. [skip] -- Roman Mashak
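A sketch of the ordering Roman suggests: TM first (value 1), PARMS second (value 2), then the sample-specific attributes. This follows the layout used by existing actions such as gact (TCA_GACT_TM before TCA_GACT_PARMS). It is offered as an illustration, not as the header that was eventually merged. The static assertions pin the convention so that a later edit cannot silently reorder it. Reordering is only harmless before the uapi header ships; once it is released, the values are ABI.

```c
#include <assert.h>

/* Suggested layout: common TM/PARMS attributes first, then action-specific ones. */
enum {
	TCA_SAMPLE_UNSPEC,
	TCA_SAMPLE_TM,
	TCA_SAMPLE_PARMS,
	TCA_SAMPLE_RATE,
	TCA_SAMPLE_TRUNC_SIZE,
	TCA_SAMPLE_PSAMPLE_GROUP,
	TCA_SAMPLE_PAD,
	__TCA_SAMPLE_MAX
};
#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)

/* Compile-time checks that the common attributes keep their conventional slots. */
_Static_assert(TCA_SAMPLE_TM == 1, "TM attribute should be 1");
_Static_assert(TCA_SAMPLE_PARMS == 2, "PARMS attribute should be 2");
```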
[PATCH iproute2 1/1] tc: distinguish Add/Replace action operations.
Signed-off-by: Roman Mashak Signed-off-by: Jamal Hadi Salim --- tc/m_action.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/tc/m_action.c b/tc/m_action.c index bb19df8..05ef07e 100644 --- a/tc/m_action.c +++ b/tc/m_action.c @@ -365,12 +365,18 @@ int print_action(const struct sockaddr_nl *who, fprintf(fp, "Flushed table "); tab_flush = 1; } else { - fprintf(fp, "deleted action "); + fprintf(fp, "Deleted action "); } } - if (n->nlmsg_type == RTM_NEWACTION) - fprintf(fp, "Added action "); + if (n->nlmsg_type == RTM_NEWACTION) { + if ((n->nlmsg_flags & NLM_F_CREATE) && + !(n->nlmsg_flags & NLM_F_REPLACE)) { + fprintf(fp, "Added action "); + } else if (n->nlmsg_flags & NLM_F_REPLACE) { + fprintf(fp, "Replaced action "); + } + } tc_print_action(fp, tb[TCA_ACT_TAB]); return 0; -- 1.9.1
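The patch picks the verb from the nlmsg_flags of the RTM_NEWACTION message. Below, that decision is pulled out into a small helper with the same branch structure. The helper name is hypothetical. The flag values are the ones from <linux/netlink.h>, defined locally only so the sketch builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as <linux/netlink.h>. */
#ifndef NLM_F_REPLACE
#define NLM_F_REPLACE 0x100
#endif
#ifndef NLM_F_CREATE
#define NLM_F_CREATE 0x400
#endif

/* Hypothetical helper mirroring the patch: CREATE without REPLACE means
 * "Added", any REPLACE means "Replaced", anything else prints no verb. */
static const char *new_action_verb(uint16_t nlmsg_flags)
{
	if ((nlmsg_flags & NLM_F_CREATE) && !(nlmsg_flags & NLM_F_REPLACE))
		return "Added action ";
	if (nlmsg_flags & NLM_F_REPLACE)
		return "Replaced action ";
	return "";
}
```

Note that, as in the patch, a message carrying both CREATE and REPLACE is reported as a replace.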
RE: [patch net-next 2/4] net/sched: Introduce sample tc action
>-Original Message- >From: Jamal Hadi Salim [mailto:j...@mojatatu.com] >Sent: Sunday, January 22, 2017 3:17 PM >To: Jiri Pirko ; netdev@vger.kernel.org >Cc: da...@davemloft.net; Yotam Gigi ; Ido Schimmel >; Elad Raz ; Nogah Frankel >; Or Gerlitz ; >geert+rene...@glider.be; step...@networkplumber.org; >xiyou.wangc...@gmail.com; li...@roeck-us.net; ro...@cumulusnetworks.com; >john.fastab...@gmail.com; simon.hor...@netronome.com; m...@mojatatu.com >Subject: Re: [patch net-next 2/4] net/sched: Introduce sample tc action > >On 17-01-22 06:44 AM, Jiri Pirko wrote: > >> diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c >> new file mode 100644 >> index 000..24e20e4 >> --- /dev/null >> +++ b/net/sched/act_sample.c >> @@ -0,0 +1,274 @@ >> +/* >> + * net/sched/act_sample.c - Packet samplig tc action > >typo: "Sampling" It took me a while to see it. Will fix :) > > >> +static int tcf_sample(struct sk_buff *skb, const struct tc_action *a, >> + struct tcf_result *res) > > >Can you rename this function because it is also the name of the data >structure? It makes it easier to grep. >I know we have this all over the place in other actions (and i hope >those are cleaned up at some point). OK, makes sense. > >otherwise: >Acked-by: Jamal Hadi Salim Thanks! > >cheers, >jamal
Re: [net, 6/6] net: korina: version bump
On 2017-01-22 13:10, Roman Yeryomin wrote: > On 17 January 2017 at 21:19, Roman Yeryomin wrote: >> On 17 January 2017 at 20:55, Felix Fietkau wrote: >>> On 2017-01-17 18:33, Roman Yeryomin wrote: Signed-off-by: Roman Yeryomin --- drivers/net/ethernet/korina.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c index 83c994f..c8fed01 100644 --- a/drivers/net/ethernet/korina.c +++ b/drivers/net/ethernet/korina.c @@ -66,8 +66,8 @@ #include #define DRV_NAME "korina" -#define DRV_VERSION "0.10" -#define DRV_RELDATE "04Mar2008" +#define DRV_VERSION "0.20" +#define DRV_RELDATE "15Jan2017" >>> I think it would make more sense to remove this version instead of >>> bumping it. Individual driver versions are rather pointless, the kernel >>> version is more meaningful anyway. >> >> OK, makes sense > > Actually, after thinking a bit more about this, not really... > How about ethtool, which uses driver name and version? > I see most ethernet drivers define some version. And it's pretty > useful, when using backports. > IMO, it should be kept and bumped. I don't really care, I just wanted to point out that the exact kernel version is a much more useful indicator, especially since not all patch submitters do the useless version bump dance. - Felix
Re: [patch net-next 1/4] net: Introduce psample, a new genetlink channel for packet sampling
On 17-01-22 06:44 AM, Jiri Pirko wrote:
From: Yotam Gigi
Add a general way for kernel modules to sample packets, without being tied to any specific subsystem. This netlink channel can be used by tc, iptables, etc., and allows packet sampling in the kernel to be standardized.
For every sampled packet, the psample module adds the following metadata fields:
PSAMPLE_ATTR_IIFINDEX - the packet's input ifindex, if applicable
PSAMPLE_ATTR_OIFINDEX - the packet's output ifindex, if applicable
PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been truncated during sampling
PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the user who initiated the sampling. This field allows the user to differentiate between several samplers working simultaneously and to filter the packets relevant to them
PSAMPLE_ATTR_GROUP_SEQ - sequence counter of the last sent packet. The sequence is kept per group
PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets
PSAMPLE_ATTR_DATA - the actual packet bits
In addition, add the GET_GROUPS netlink command, which allows the user to see the current sample groups, their refcount and sequence number. This command currently supports only netlink dump mode.
Signed-off-by: Yotam Gigi
Signed-off-by: Jiri Pirko
It would be useful to describe in the commit log that one needs to listen to PSAMPLE_NL_MCGRP_SAMPLE to see the samples.
Reviewed-by: Jamal Hadi Salim
cheers,
jamal
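To show how ORIGSIZE, truncation and the per-group sequence counter relate, here is a userspace model. Everything in it is hypothetical: the struct names, the convention that trunc_size == 0 means no truncation, and the choice that the first sample carries seq 1. It is not the psample code. The receiver-side gap helper shows one plausible use of GROUP_SEQ: detecting lost notifications.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-sample record; fields mirror the PSAMPLE_ATTR_* list. */
struct sample_meta {
	uint32_t iifindex;	/* PSAMPLE_ATTR_IIFINDEX */
	uint32_t origsize;	/* PSAMPLE_ATTR_ORIGSIZE: length before truncation */
	uint32_t group;		/* PSAMPLE_ATTR_SAMPLE_GROUP */
	uint32_t group_seq;	/* PSAMPLE_ATTR_GROUP_SEQ */
	uint32_t rate;		/* PSAMPLE_ATTR_SAMPLE_RATE */
	uint32_t data_len;	/* bytes carried in PSAMPLE_ATTR_DATA */
};

/* Hypothetical sample group: one sequence counter per group. */
struct sample_group {
	uint32_t num;
	uint32_t seq;
};

/* Fill metadata for one sampled packet (trunc_size == 0: no truncation). */
static void fill_sample(struct sample_group *g, struct sample_meta *m,
			uint32_t iifindex, uint32_t pkt_len,
			uint32_t trunc_size, uint32_t rate)
{
	memset(m, 0, sizeof(*m));
	m->iifindex = iifindex;
	m->origsize = pkt_len;
	m->data_len = (trunc_size && trunc_size < pkt_len) ? trunc_size : pkt_len;
	m->group = g->num;
	m->group_seq = ++g->seq;	/* seq of the last packet sent to this group */
	m->rate = rate;
}

/* Receiver side: samples missed between two consecutive notifications
 * of the same group (unsigned arithmetic tolerates wraparound). */
static uint32_t seq_gap(uint32_t prev_seq, uint32_t cur_seq)
{
	return cur_seq - prev_seq - 1;
}
```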
Re: [patch net-next 2/4] net/sched: Introduce sample tc action
On 17-01-22 06:44 AM, Jiri Pirko wrote: diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c new file mode 100644 index 000..24e20e4 --- /dev/null +++ b/net/sched/act_sample.c @@ -0,0 +1,274 @@ +/* + * net/sched/act_sample.c - Packet samplig tc action typo: "Sampling" +static int tcf_sample(struct sk_buff *skb, const struct tc_action *a, + struct tcf_result *res) Can you rename this function because it is also the name of the data structure? It makes it easier to grep. I know we have this all over the place in other actions (and i hope those are cleaned up at some point). otherwise: Acked-by: Jamal Hadi Salim cheers, jamal
Re: [PATCH net-next v5 0/1] Add support for tc cookies
I removed people's Reviewed-by/Acked-by tags because I changed the data structure per Daniel's suggestions.
cheers,
jamal
On 17-01-22 07:51 AM, Jamal Hadi Salim wrote:
From: Jamal Hadi Salim
Changes in V5:
- kill the stylistic changes
- adopt a new structure with a length-value pointer representation
- rename some things
Changes in v4:
- move stylistic changes out into a separate patch (and add more stylistic changes)
Changes in v3:
- use TC_ prefix for the max size
- move the cookie struct so it is visible only to the kernel
- remove unneeded void * cast
Changes in V2:
- move from a union to a length-value representation
Jamal Hadi Salim (1):
net sched actions: Add support for user cookies
include/net/act_api.h | 1 +
include/net/pkt_cls.h | 8
include/uapi/linux/pkt_cls.h | 3 +++
net/sched/act_api.c | 35 +++
4 files changed, 47 insertions(+)
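With the length-value representation, a cookie is just 1 to TC_COOKIE_MAX_SIZE (16) bytes. The sample session in the patch passes "a1b2c3d4" (4 bytes) and "1234567890abcdef" (8 bytes). Below is a sketch of how a user-space tool might turn such a hex argument into the byte blob carried in TCA_ACT_COOKIE. It assumes hex input, as the dumped output suggests. The helper names are hypothetical and are not taken from iproute2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TC_COOKIE_MAX_SIZE 16	/* value from the patch's uapi hunk */

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse a hex string into out[]; returns the cookie length in bytes,
 * or -1 for empty, odd-length, non-hex or oversized input. */
static int cookie_from_hex(const char *hex, uint8_t out[TC_COOKIE_MAX_SIZE])
{
	size_t n = strlen(hex);

	if (n == 0 || n % 2 != 0 || n / 2 > TC_COOKIE_MAX_SIZE)
		return -1;
	for (size_t i = 0; i < n / 2; i++) {
		int hi = hexval(hex[2 * i]);
		int lo = hexval(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (uint8_t)(hi << 4 | lo);
	}
	return (int)(n / 2);
}
```

Sending only the parsed length (rather than a fixed 16-byte field) is what lets the kernel store and dump back exactly what the user supplied.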
[PATCH net-next v5 1/1] net sched actions: Add support for user cookies
From: Jamal Hadi Salim Introduce optional 128-bit action cookie. Like all other cookie schemes in the networking world (e.g. in protocols like http or existing kernel fib protocol field, etc) the idea is to save user state that when retrieved serves as a correlator. The kernel _should not_ interpret it. The user can store whatever they wish in the 128 bits. Sample exercise (showing variable length use of cookie) .. create an accept action with cookie a1b2c3d4 sudo $TC actions add action ok index 1 cookie a1b2c3d4 .. dump all gact actions.. sudo $TC -s actions ls action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 0 installed 5 sec used 5 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie a1b2c3d4 .. bind the accept action to a filter.. sudo $TC filter add dev lo parent : protocol ip prio 1 \ u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1 ... send some traffic.. $ ping 127.0.0.1 -c 3 PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms 64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms 64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms --- 127.0.0.1 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2109ms rtt min/avg/max/mdev = 0.020/0.028/0.038/0.008 ms 1 ... show some stats $ sudo $TC -s actions get action gact index 1 action order 1: gact action pass random type none pass val 0 index 1 ref 2 bind 1 installed 204 sec used 5 sec Action statistics: Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie a1b2c3d4 .. try longer cookie... $ sudo $TC actions replace action ok index 1 cookie 1234567890abcdef .. dump.. 
$ sudo $TC -s actions ls action gact action order 1: gact action pass random type none pass val 0 index 1 ref 2 bind 1 installed 204 sec used 5 sec Action statistics: Sent 12168 bytes 164 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 cookie 1234567890abcdef Signed-off-by: Jamal Hadi Salim --- include/net/act_api.h| 1 + include/net/pkt_cls.h| 8 include/uapi/linux/pkt_cls.h | 3 +++ net/sched/act_api.c | 35 +++ 4 files changed, 47 insertions(+) diff --git a/include/net/act_api.h b/include/net/act_api.h index 1d71644..cfa2ae3 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -41,6 +41,7 @@ struct tc_action { struct rcu_head tcfa_rcu; struct gnet_stats_basic_cpu __percpu *cpu_bstats; struct gnet_stats_queue __percpu *cpu_qstats; + struct tc_cookie*act_cookie; }; #define tcf_head common.tcfa_head #define tcf_index common.tcfa_index diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h index f0a0514..b43077e 100644 --- a/include/net/pkt_cls.h +++ b/include/net/pkt_cls.h @@ -515,4 +515,12 @@ struct tc_cls_bpf_offload { u32 gen_flags; }; + +/* This structure holds cookie structure that is passed from user + * to the kernel for actions and classifiers + */ +struct tc_cookie { + u8 *data; + u32 len; +}; #endif diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h index fd373eb..345551e 100644 --- a/include/uapi/linux/pkt_cls.h +++ b/include/uapi/linux/pkt_cls.h @@ -4,6 +4,8 @@ #include #include +#define TC_COOKIE_MAX_SIZE 16 + /* Action attributes */ enum { TCA_ACT_UNSPEC, @@ -12,6 +14,7 @@ enum { TCA_ACT_INDEX, TCA_ACT_STATS, TCA_ACT_PAD, + TCA_ACT_COOKIE, __TCA_ACT_MAX }; diff --git a/net/sched/act_api.c b/net/sched/act_api.c index cd08df9..84052630 100644 --- a/net/sched/act_api.c +++ b/net/sched/act_api.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include @@ -33,6 +34,8 @@ static void free_tcf(struct rcu_head *head) free_percpu(p->cpu_bstats); free_percpu(p->cpu_qstats); + 
kfree(p->act_cookie->data); + kfree(p->act_cookie); kfree(p); } @@ -475,6 +478,12 @@ int tcf_action_destroy(struct list_head *actions, int bind) goto nla_put_failure; if (tcf_action_copy_stats(skb, a, 0)) goto nla_put_failure; + if (a->act_cookie) { + if (nla_put(skb, TCA_ACT_COOKIE, a->act_cookie->len, + a->act_cookie->data)) + goto nla_put_failure; + } + nest = nla_nest_start(skb, TCA_OPTIONS); if (nest == NULL) goto nla_put_failure; @@ -575,6 +584,32 @@ struct tc_action *tcf_action_init_1(struct net *net, struct nlattr *nla, if (err < 0) goto err_mod; + if (tb[TCA_ACT_COOKIE]) { + int cklen = nla_len(tb[TCA_ACT_COOKIE]); + + i
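One detail in the quoted free_tcf() hunk deserves attention. It calls kfree(p->act_cookie->data) unconditionally, and the cookie is optional, so p->act_cookie appears to be NULL for any action created without one. kfree(NULL) is harmless, but reading p->act_cookie->data through a NULL pointer is not. The following userspace model (malloc/free standing in for kmalloc/kfree; struct action and both helpers are hypothetical) guards the pointer itself. It also frees an old cookie only after a replacement has been built successfully.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TC_COOKIE_MAX_SIZE 16

/* Same shape as the patch's struct tc_cookie. */
struct tc_cookie {
	uint8_t *data;
	uint32_t len;
};

/* Stand-in for struct tc_action: only the cookie pointer matters here. */
struct action {
	struct tc_cookie *act_cookie;
};

/* Free the cookie if present; safe to call on an action without one. */
static void action_free_cookie(struct action *a)
{
	if (a->act_cookie) {
		free(a->act_cookie->data);
		free(a->act_cookie);
		a->act_cookie = NULL;
	}
}

/* Attach a copy of buf as the cookie; returns -1 (think -EINVAL/-ENOMEM)
 * on bad length or allocation failure, leaving any old cookie intact. */
static int action_set_cookie(struct action *a, const void *buf, uint32_t len)
{
	struct tc_cookie *c;

	if (len == 0 || len > TC_COOKIE_MAX_SIZE)
		return -1;
	c = malloc(sizeof(*c));
	if (!c)
		return -1;
	c->data = malloc(len);
	if (!c->data) {
		free(c);
		return -1;
	}
	memcpy(c->data, buf, len);
	c->len = len;

	action_free_cookie(a);	/* drop the old cookie only after success */
	a->act_cookie = c;
	return 0;
}
```

The same NULL check is already present on the dump side of the patch (if (a->act_cookie) before nla_put()), so mirroring it in the free path keeps both paths consistent.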
[PATCH net-next v5 0/1] Add support for tc cookies
From: Jamal Hadi Salim
Changes in V5:
- kill the stylistic changes
- adopt a new structure with a length-value pointer representation
- rename some things
Changes in v4:
- move stylistic changes out into a separate patch (and add more stylistic changes)
Changes in v3:
- use TC_ prefix for the max size
- move the cookie struct so it is visible only to the kernel
- remove unneeded void * cast
Changes in V2:
- move from a union to a length-value representation
Jamal Hadi Salim (1):
net sched actions: Add support for user cookies
include/net/act_api.h | 1 +
include/net/pkt_cls.h | 8
include/uapi/linux/pkt_cls.h | 3 +++
net/sched/act_api.c | 35 +++
4 files changed, 47 insertions(+)
-- 
1.9.1
Re: [PATCH net 1/1] net sched actions: fix refcnt when GETing of action after bind
On 17-01-20 01:20 AM, Cong Wang wrote: On Wed, Jan 18, 2017 at 3:33 AM, Jamal Hadi Salim wrote: On 17-01-17 01:17 PM, Cong Wang wrote: I did. The issue there (after your original patch) was that destroy() would decrement the refcount to zero, so a GET was essentially translated into a DEL. Incrementing the refcount earlier protected against that, assuming destroy() was going to decrement it. However, when an action is bound, destroy() doesn't decrement the refcnt. So the refcnt keeps going up forever (and therefore deleting fails in the future). So we can't use destroy() as is. Hmm, tcf_action_destroy() should not touch the refcnt at all in this case, right? The refcnt here is not for readers in kernel code but for user space. We mix the uses of this refcnt, which leads to problems. Your patch is not correct for DEL either: tcf_action_destroy() does not need to be called again after tcf_del_notify() fails, right? Probably it is not needed at all: Cong, Please proceed to separate del from get. The trickery is biting us. Also, run those tests I had in my patch. There is a difference between the bound and not-bound use cases. cheers, jamal
Re: [RFC PATCH net-next 5/5] bridge: vlan lwt dst_metadata hooks in ingress and egress paths
On 21/01/17 06:46, Roopa Prabhu wrote: > From: Roopa Prabhu > > - ingress hook: > - if port is a lwt tunnel port, use tunnel info in > attached dst_metadata to map it to a local vlan > - egress hook: > - if port is a lwt tunnel port, use tunnel info attached to > vlan to set dst_metadata on the skb > > CC: Nikolay Aleksandrov > Signed-off-by: Roopa Prabhu > --- > CC'ing Nikolay for some more eyes as he has been trying to keep the > bridge driver fast path lite. > > net/bridge/br_input.c |4 > net/bridge/br_private.h |4 > net/bridge/br_vlan.c| 55 > +++ > 3 files changed, 63 insertions(+) > > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c > index 83f356f..96602a1 100644 > --- a/net/bridge/br_input.c > +++ b/net/bridge/br_input.c > @@ -262,6 +262,10 @@ rx_handler_result_t br_handle_frame(struct sk_buff > **pskb) > return RX_HANDLER_CONSUMED; > > p = br_port_get_rcu(skb->dev); > + if (p->flags & BR_LWT_VLAN) { > + if (br_handle_ingress_vlan_tunnel(skb, p, > nbp_vlan_group_rcu(p))) > + goto drop; > + } Is there any reason to do this so early (perhaps netfilter?) ? 
If not, you can push it to the vlan __allowed_ingress (and rename that function to something else, it does a hundred additional things) and avoid this check for all packets if vlans are disabled, thus people using non-vlan filtering bridge won't have an additional test in their fast path > > if (unlikely(is_link_local_ether_addr(dest))) { > u16 fwd_mask = p->br->group_fwd_mask_required; > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > index f68e360..68a23c5 100644 > --- a/net/bridge/br_private.h > +++ b/net/bridge/br_private.h > @@ -804,6 +804,10 @@ int __vlan_tunnel_info_del(struct net_bridge_vlan_group > *vg, > int nbp_vlan_tunnel_info_add(struct net_bridge_port *port, u16 vid, u32 > tun_id); > bool vlan_tunnel_id_isrange(struct net_bridge_vlan *v_end, > struct net_bridge_vlan *v); > +int br_handle_ingress_vlan_tunnel(struct sk_buff *skb, struct > net_bridge_port *p, > + struct net_bridge_vlan_group *vg); > +int br_handle_egress_vlan_tunnel(struct sk_buff *skb, > + struct net_bridge_vlan *vlan); > > static inline struct net_bridge_vlan_group *br_vlan_group( > const struct net_bridge *br) > diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c > index 2040f08..6cf2344 100644 > --- a/net/bridge/br_vlan.c > +++ b/net/bridge/br_vlan.c > @@ -405,6 +405,11 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br, > > if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED) > skb->vlan_tci = 0; > + > + if (br_handle_egress_vlan_tunnel(skb, v)) { > + kfree_skb(skb); > + return NULL; > + } > out: > return skb; > } > @@ -1213,3 +1218,53 @@ int nbp_vlan_tunnel_info_delete(struct net_bridge_port > *port, u16 vid) > > return 0; > } > + > +int br_handle_ingress_vlan_tunnel(struct sk_buff *skb, > + struct net_bridge_port *p, > + struct net_bridge_vlan_group *vg) > +{ > + struct ip_tunnel_info *tinfo = skb_tunnel_info(skb); > + struct net_bridge_vlan *vlan; > + > + if (!vg || !tinfo) > + return 0; > + > + /* if already tagged, ignore */ > + if (skb_vlan_tagged(skb)) > + 
return 0; > + > + /* lookup vid, given tunnel id */ > + vlan = br_vlan_tunnel_lookup(&vg->tunnel_hash, tinfo->key.tun_id); > + if (!vlan) > + return 0; > + > + skb_dst_drop(skb); > + > + __vlan_hwaccel_put_tag(skb, p->br->vlan_proto, vlan->vid); > + > + return 0; > +} > + > +int br_handle_egress_vlan_tunnel(struct sk_buff *skb, > + struct net_bridge_vlan *vlan) > +{ > + __be32 tun_id; > + int err; > + > + if (!vlan || !vlan->tinfo.tunnel_id) > + return 0; > + > + if (unlikely(!skb_vlan_tag_present(skb))) > + return 0; > + > + skb_dst_drop(skb); > + tun_id = tunnel_id_to_key32(vlan->tinfo.tunnel_id); > + > + err = skb_vlan_pop(skb); > + if (err) > + return err; > + > + skb_dst_set(skb, dst_clone(&vlan->tinfo.tunnel_dst->dst)); > + > + return 0; > +} >
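The two hooks are inverse mappings on a port: on ingress, a tunnel id from the dst metadata selects a vid (unless the frame is already tagged), and on egress a vid with a configured tunnel id selects the tunnel id. The following sketch models just that mapping. A tunnel id of 0 means "no tunnel", matching the !vlan->tinfo.tunnel_id test in the patch. The linear scan and all names are hypothetical; the patch uses a hash (br_vlan_tunnel_lookup() on vg->tunnel_hash).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-port vlan <-> tunnel id mapping. */
struct vlan_tunnel {
	uint16_t vid;
	uint32_t tun_id;
};

struct port_map {
	struct vlan_tunnel map[8];
	int n;
};

/* Ingress: tunnel id -> vid to tag with; 0 means leave the frame alone. */
static uint16_t ingress_vid(const struct port_map *p, uint32_t tun_id,
			    int already_tagged)
{
	if (already_tagged)	/* mirrors "if already tagged, ignore" */
		return 0;
	for (int i = 0; i < p->n; i++)
		if (p->map[i].tun_id == tun_id)
			return p->map[i].vid;
	return 0;
}

/* Egress: vid -> tunnel id for the dst metadata; 0 means no tunnel. */
static uint32_t egress_tun_id(const struct port_map *p, uint16_t vid)
{
	for (int i = 0; i < p->n; i++)
		if (p->map[i].vid == vid)
			return p->map[i].tun_id;
	return 0;
}
```

Nikolay's placement comment is orthogonal to the mapping itself. Doing the ingress lookup inside the vlan-filtering path, instead of at the top of br_handle_frame(), would keep the extra flag test off the fast path of bridges that do not filter vlans.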
Re: [RFC PATCH net-next 4/5] bridge: vlan lwt and dst_metadata netlink support
On 21/01/17 06:46, Roopa Prabhu wrote: > From: Roopa Prabhu > > This patch adds support to attach per vlan tunnel info dst > metadata. This enables bridge driver to map vlan to tunnel_info > at ingress and egress > > The initial use case is vlan to vni bridging, but the api is generic > to extend to any tunnel_info in the future: > - Uapi to configure/unconfigure/dump per vlan tunnel data > - netlink functions to configure vlan and tunnel_info mapping > - Introduces bridge port flag BR_LWT_VLAN to enable attach/detach > dst_metadata to bridged packets on ports. > > Use case: > example use for this is a vxlan bridging gateway or vtep > which maps vlans to vn-segments (or vnis). User can configure > per-vlan tunnel information which the bridge driver can use > to bridge vlan into the corresponding tunnel. > > CC: Nikolay Aleksandrov > Signed-off-by: Roopa Prabhu > --- > CC'ing Nikolay for some more eyes as he has been trying to keep the > bridge driver fast path lite. > > include/linux/if_bridge.h |1 + > net/bridge/br_input.c |1 + > net/bridge/br_netlink.c | 410 > ++--- > net/bridge/br_private.h | 18 ++ > net/bridge/br_vlan.c | 138 ++- > 5 files changed, 507 insertions(+), 61 deletions(-) > > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h > index c6587c0..36ff611 100644 > --- a/include/linux/if_bridge.h > +++ b/include/linux/if_bridge.h > @@ -46,6 +46,7 @@ struct br_ip_list { > #define BR_LEARNING_SYNC BIT(9) > #define BR_PROXYARP_WIFI BIT(10) > #define BR_MCAST_FLOOD BIT(11) > +#define BR_LWT_VLAN BIT(12) > > #define BR_DEFAULT_AGEING_TIME (300 * HZ) > > diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c > index 855b72f..83f356f 100644 > --- a/net/bridge/br_input.c > +++ b/net/bridge/br_input.c > @@ -20,6 +20,7 @@ > #include > #include > #include > +#include > #include "br_private.h" > > /* Hook for brouter */ > diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c > index 71c7453..df997ad 100644 > --- a/net/bridge/br_netlink.c > 
+++ b/net/bridge/br_netlink.c > @@ -17,17 +17,30 @@ > #include > #include > #include > +#include > > #include "br_private.h" > #include "br_private_stp.h" > > -static int __get_num_vlan_infos(struct net_bridge_vlan_group *vg, > - u32 filter_mask) > +static size_t br_get_vlan_tinfo_size(void) > { > + return nla_total_size(0) + /* nest IFLA_BRIDGE_VLAN_TUNNEL_INFO */ > + nla_total_size(sizeof(u32)) + /* IFLA_BRIDGE_VLAN_TUNNEL_ID */ > + nla_total_size(sizeof(u16)) + /* IFLA_BRIDGE_VLAN_TUNNEL_VID > */ > + nla_total_size(sizeof(u16)); /* IFLA_BRIDGE_VLAN_TUNNEL_FLAGS > */ > +} > + > +static int __get_num_vlan_infos(struct net_bridge_port *p, > + struct net_bridge_vlan_group *vg, > + u32 filter_mask, int *num_vtinfos) > +{ > + struct net_bridge_vlan *vbegin = NULL, *vend = NULL; > + struct net_bridge_vlan *vtbegin = NULL, *vtend = NULL; > struct net_bridge_vlan *v; > - u16 vid_range_start = 0, vid_range_end = 0, vid_range_flags = 0; > + bool get_tinfos = (p && p->flags & BR_LWT_VLAN) ? true: false; > + bool vcontinue, vtcontinue; > + int num_vinfos = 0; > u16 flags, pvid; > - int num_vlans = 0; > > if (!(filter_mask & RTEXT_FILTER_BRVLAN_COMPRESSED)) > return 0; > @@ -36,6 +49,8 @@ static int __get_num_vlan_infos(struct > net_bridge_vlan_group *vg, > /* Count number of vlan infos */ > list_for_each_entry_rcu(v, &vg->vlan_list, vlist) { > flags = 0; > + vcontinue = false; > + vtcontinue = false; > /* only a context, bridge vlan not activated */ > if (!br_vlan_should_use(v)) > continue; > @@ -45,47 +60,79 @@ static int __get_num_vlan_infos(struct > net_bridge_vlan_group *vg, > if (v->flags & BRIDGE_VLAN_INFO_UNTAGGED) > flags |= BRIDGE_VLAN_INFO_UNTAGGED; > > - if (vid_range_start == 0) { > - goto initvars; > - } else if ((v->vid - vid_range_end) == 1 && > - flags == vid_range_flags) { > - vid_range_end = v->vid; > + if (!vbegin) { > + vbegin = v; > + vend = v; > + vcontinue = true; > + } else if ((v->vid - vend->vid) == 1 && > + flags == vbegin->flags) { > + vend = v; > 
+ vcontinue = true; > + } > + > + if (!vcontinue) { > + if ((vend->vid - vbegin->vid) > 0) > + num_vinfos += 2; > + else > + num_vinfos +=
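Two calculations in this hunk can be checked in isolation. First, br_get_vlan_tinfo_size() sums nla_total_size() terms. Each term is a 4-byte attribute header plus the payload, rounded up to 4 bytes, so the nest, u32, u16 and u16 terms come to 4 + 8 + 8 + 8 = 28 bytes. Second, __get_num_vlan_infos() compresses runs of consecutive vids with identical flags into a BRIDGE_VLAN_INFO_RANGE_BEGIN/RANGE_END pair (2 entries), while an isolated vid costs 1 entry. The sketch below models both in userspace. The helper names and struct vinfo are hypothetical, the alignment macros follow the uapi <linux/netlink.h> definitions, and the vid list is assumed to be sorted as it is on vg->vlan_list.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute sizing as in <linux/netlink.h>. */
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN 4	/* NLA_ALIGN(sizeof(struct nlattr)) */

static int nla_total_size(int payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Same terms as br_get_vlan_tinfo_size() in the patch. */
static int vlan_tinfo_size(void)
{
	return nla_total_size(0) +			/* nest */
	       nla_total_size((int)sizeof(uint32_t)) +	/* TUNNEL_ID */
	       nla_total_size((int)sizeof(uint16_t)) +	/* TUNNEL_VID */
	       nla_total_size((int)sizeof(uint16_t));	/* TUNNEL_FLAGS */
}

struct vinfo {
	uint16_t vid;
	uint16_t flags;
};

/* Count compressed vlan infos for a vid-sorted list. */
static int count_vlan_infos(const struct vinfo *v, int n)
{
	int num = 0;
	int i = 0;

	while (i < n) {
		int j = i;

		/* extend the run while vids are consecutive and flags match */
		while (j + 1 < n && v[j + 1].vid == v[j].vid + 1 &&
		       v[j + 1].flags == v[i].flags)
			j++;
		num += (j > i) ? 2 : 1;	/* range: BEGIN + END; single vid: 1 */
		i = j + 1;
	}
	return num;
}
```

For example, vids 10-12 (same flags), 20, and 30/31 with different flags yield 2 + 1 + 1 + 1 = 5 entries.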
Re: [net, 6/6] net: korina: version bump
On 17 January 2017 at 21:19, Roman Yeryomin wrote: > On 17 January 2017 at 20:55, Felix Fietkau wrote: >> On 2017-01-17 18:33, Roman Yeryomin wrote: >>> Signed-off-by: Roman Yeryomin >>> --- >>> drivers/net/ethernet/korina.c | 4 ++-- >>> 1 file changed, 2 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c >>> index 83c994f..c8fed01 100644 >>> --- a/drivers/net/ethernet/korina.c >>> +++ b/drivers/net/ethernet/korina.c >>> @@ -66,8 +66,8 @@ >>> #include >>> >>> #define DRV_NAME "korina" >>> -#define DRV_VERSION "0.10" >>> -#define DRV_RELDATE "04Mar2008" >>> +#define DRV_VERSION "0.20" >>> +#define DRV_RELDATE "15Jan2017" >> I think it would make more sense to remove this version instead of >> bumping it. Individual driver versions are rather pointless, the kernel >> version is more meaningful anyway. > > OK, makes sense Actually, after thinking a bit more about this, not really... How about ethtool, which uses driver name and version? I see most ethernet drivers define some version. And it's pretty useful, when using backports. IMO, it should be kept and bumped. Regards, Roman
Re: [patch net-next 4/4] mlxsw: spectrum: Add packet sample offloading support
On Sun, Jan 22, 2017 at 12:44:47PM +0100, Jiri Pirko wrote: > From: Yotam Gigi > > Using the MPSC register, add the functions that configure port-based > packet sampling in hardware and the necessary datatypes in the > mlxsw_sp_port struct. In addition, add the necessary trap for sampled > packets and integrate with matchall offloading to allow offloading of the > sample tc action. > > The current offload support is for the tc command: > > tc filter add dev parent : \ > matchall skip_sw \ > action sample rate group [trunc ] > > Where only ingress qdiscs are supported, and only a combination of > matchall classifier and sample action will lead to activating hardware > packet sampling. > > Signed-off-by: Yotam Gigi > Signed-off-by: Jiri Pirko Reviewed-by: Ido Schimmel
[patch net-next 4/4] mlxsw: spectrum: Add packet sample offloading support
From: Yotam Gigi Using the MPSC register, add the functions that configure port-based packet sampling in hardware and the necessary datatypes in the mlxsw_sp_port struct. In addition, add the necessary trap for sampled packets and integrate with matchall offloading to allow offloading of the sample tc action. The current offload support is for the tc command: tc filter add dev parent : \ matchall skip_sw \ action sample rate group [trunc ] Where only ingress qdiscs are supported, and only a combination of matchall classifier and sample action will lead to activating hardware packet sampling. Signed-off-by: Yotam Gigi Signed-off-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 111 + drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 10 +++ drivers/net/ethernet/mellanox/mlxsw/trap.h | 1 + 3 files changed, 122 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index 3dbd82e..467aa52 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -57,6 +57,7 @@ #include #include #include +#include #include "spectrum.h" #include "pci.h" @@ -469,6 +470,16 @@ static void mlxsw_sp_span_mirror_remove(struct mlxsw_sp_port *from, mlxsw_sp_span_inspected_port_unbind(from, span_entry, type); } +static int mlxsw_sp_port_sample_set(struct mlxsw_sp_port *mlxsw_sp_port, + bool enable, u32 rate) +{ + struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp; + char mpsc_pl[MLXSW_REG_MPSC_LEN]; + + mlxsw_reg_mpsc_pack(mpsc_pl, mlxsw_sp_port->local_port, enable, rate); + return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mpsc), mpsc_pl); +} + static int mlxsw_sp_port_admin_status_set(struct mlxsw_sp_port *mlxsw_sp_port, bool is_up) { @@ -1218,6 +1229,51 @@ mlxsw_sp_port_del_cls_matchall_mirror(struct mlxsw_sp_port *mlxsw_sp_port, mlxsw_sp_span_mirror_remove(mlxsw_sp_port, to_port, span_type); } +static int +mlxsw_sp_port_add_cls_matchall_sample(struct 
mlxsw_sp_port *mlxsw_sp_port, + struct tc_cls_matchall_offload *cls, + const struct tc_action *a, + bool ingress) +{ + int err; + + if (!mlxsw_sp_port->sample) + return -EOPNOTSUPP; + if (rtnl_dereference(mlxsw_sp_port->sample->psample_group)) { + netdev_err(mlxsw_sp_port->dev, "sample already active\n"); + return -EEXIST; + } + if (tcf_sample_rate(a) > MLXSW_REG_MPSC_RATE_MAX) { + netdev_err(mlxsw_sp_port->dev, "sample rate not supported\n"); + return -EOPNOTSUPP; + } + + rcu_assign_pointer(mlxsw_sp_port->sample->psample_group, + tcf_sample_psample_group(a)); + mlxsw_sp_port->sample->truncate = tcf_sample_truncate(a); + mlxsw_sp_port->sample->trunc_size = tcf_sample_trunc_size(a); + mlxsw_sp_port->sample->rate = tcf_sample_rate(a); + + err = mlxsw_sp_port_sample_set(mlxsw_sp_port, true, tcf_sample_rate(a)); + if (err) + goto err_port_sample_set; + return 0; + +err_port_sample_set: + RCU_INIT_POINTER(mlxsw_sp_port->sample->psample_group, NULL); + return err; +} + +static void +mlxsw_sp_port_del_cls_matchall_sample(struct mlxsw_sp_port *mlxsw_sp_port) +{ + if (!mlxsw_sp_port->sample) + return; + + mlxsw_sp_port_sample_set(mlxsw_sp_port, false, 1); + RCU_INIT_POINTER(mlxsw_sp_port->sample->psample_group, NULL); +} + static int mlxsw_sp_port_add_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port, __be16 protocol, struct tc_cls_matchall_offload *cls, @@ -1248,6 +1304,10 @@ static int mlxsw_sp_port_add_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port, mirror = &mall_tc_entry->mirror; err = mlxsw_sp_port_add_cls_matchall_mirror(mlxsw_sp_port, mirror, a, ingress); + } else if (is_tcf_sample(a) && protocol == htons(ETH_P_ALL)) { + mall_tc_entry->type = MLXSW_SP_PORT_MALL_SAMPLE; + err = mlxsw_sp_port_add_cls_matchall_sample(mlxsw_sp_port, cls, + a, ingress); } else { err = -EOPNOTSUPP; } @@ -1281,6 +1341,9 @@ static void mlxsw_sp_port_del_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port, mlxsw_sp_port_del_cls_matchall_mirror(mlxsw_sp_port, &mall_tc_entry->mirror); break; + 
case MLXSW_SP_PORT_MALL_SAMPLE: + mlxsw_sp_port
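mlxsw_sp_port_add_cls_matchall_sample() follows a validate, commit, write-hardware, roll-back pattern. It returns -EOPNOTSUPP if the port has no sample state, -EEXIST if a sample is already active, and -EOPNOTSUPP if the rate exceeds the hardware maximum. It then records the software state, writes the MPSC register, and resets psample_group to NULL if the write fails. The userspace model below reproduces that ordering. DEMO_RATE_MAX is a hypothetical stand-in for MLXSW_REG_MPSC_RATE_MAX, whose value is not shown here, and hw_sample_set() is a fake register write with an injectable failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RATE_MAX 1000000u	/* hypothetical hardware limit */

struct port_sample {
	const void *psample_group;	/* NULL while sampling is inactive */
	uint32_t rate;
};

/* Fake register write; 'fail' exercises the rollback path. */
static int hw_sample_set(int enable, uint32_t rate, int fail)
{
	(void)enable;
	(void)rate;
	return fail ? -EIO : 0;
}

static int sample_add(struct port_sample *s, const void *group,
		      uint32_t rate, int hw_fail)
{
	int err;

	if (!s)
		return -EOPNOTSUPP;
	if (s->psample_group)
		return -EEXIST;
	if (rate > DEMO_RATE_MAX)
		return -EOPNOTSUPP;

	s->psample_group = group;
	s->rate = rate;
	err = hw_sample_set(1, rate, hw_fail);
	if (err) {
		s->psample_group = NULL;	/* undo software state */
		return err;
	}
	return 0;
}

static void sample_del(struct port_sample *s)
{
	if (!s)
		return;
	hw_sample_set(0, 1, 0);
	s->psample_group = NULL;
}
```

The driver uses rcu_assign_pointer()/RCU_INIT_POINTER() for psample_group because the trap handler reads it under RCU. The plain stores here are only adequate for this single-threaded model.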
[patch net-next 2/4] net/sched: Introduce sample tc action
From: Yotam Gigi

This action allows the user to sample traffic matched by a tc classifier.
The sampling consists of choosing packets randomly and sampling them using
the psample module. The user can configure the psample group number, the
sampling rate and the packet's truncation (to save kernel-user traffic).

Example:
To sample ingress traffic from interface eth1, one may use the commands:

tc qdisc add dev eth1 handle : ingress

tc filter add dev eth1 parent : \
	matchall action sample rate 12 group 4

Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets on
dev eth1 to psample group 4.

Signed-off-by: Yotam Gigi
Signed-off-by: Jiri Pirko
---
 include/net/tc_act/tc_sample.h| 50 +++
 include/uapi/linux/tc_act/Kbuild | 1 +
 include/uapi/linux/tc_act/tc_sample.h | 26
 net/sched/Kconfig | 12 ++
 net/sched/Makefile| 1 +
 net/sched/act_sample.c| 274 ++
 6 files changed, 364 insertions(+)
 create mode 100644 include/net/tc_act/tc_sample.h
 create mode 100644 include/uapi/linux/tc_act/tc_sample.h
 create mode 100644 net/sched/act_sample.c

diff --git a/include/net/tc_act/tc_sample.h b/include/net/tc_act/tc_sample.h
new file mode 100644
index 000..89e9305
--- /dev/null
+++ b/include/net/tc_act/tc_sample.h
@@ -0,0 +1,50 @@
+#ifndef __NET_TC_SAMPLE_H
+#define __NET_TC_SAMPLE_H
+
+#include
+#include
+#include
+
+struct tcf_sample {
+	struct tc_action common;
+	u32 rate;
+	bool truncate;
+	u32 trunc_size;
+	struct psample_group __rcu *psample_group;
+	u32 psample_group_num;
+	struct list_head tcfm_list;
+	struct rcu_head rcu;
+};
+#define to_sample(a) ((struct tcf_sample *)a)
+
+static inline bool is_tcf_sample(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	return a->ops && a->ops->type == TCA_ACT_SAMPLE;
+#else
+	return false;
+#endif
+}
+
+static inline __u32 tcf_sample_rate(const struct tc_action *a)
+{
+	return to_sample(a)->rate;
+}
+
+static inline bool tcf_sample_truncate(const struct tc_action *a)
+{
+	return to_sample(a)->truncate;
+}
+
+static inline int tcf_sample_trunc_size(const struct tc_action *a)
+{
+	return to_sample(a)->trunc_size;
+}
+
+static inline struct psample_group *
+tcf_sample_psample_group(const struct tc_action *a)
+{
+	return rcu_dereference(to_sample(a)->psample_group);
+}
+
+#endif /* __NET_TC_SAMPLE_H */
diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index e3db740..ba62ddf 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -4,6 +4,7 @@ header-y += tc_defact.h
 header-y += tc_gact.h
 header-y += tc_ipt.h
 header-y += tc_mirred.h
+header-y += tc_sample.h
 header-y += tc_nat.h
 header-y += tc_pedit.h
 header-y += tc_skbedit.h
diff --git a/include/uapi/linux/tc_act/tc_sample.h b/include/uapi/linux/tc_act/tc_sample.h
new file mode 100644
index 000..21378bc
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_sample.h
@@ -0,0 +1,26 @@
+#ifndef __LINUX_TC_SAMPLE_H
+#define __LINUX_TC_SAMPLE_H
+
+#include
+#include
+#include
+
+#define TCA_ACT_SAMPLE 26
+
+struct tc_sample {
+	tc_gen;
+};
+
+enum {
+	TCA_SAMPLE_UNSPEC,
+	TCA_SAMPLE_PARMS,
+	TCA_SAMPLE_TM,
+	TCA_SAMPLE_RATE,
+	TCA_SAMPLE_TRUNC_SIZE,
+	TCA_SAMPLE_PSAMPLE_GROUP,
+	TCA_SAMPLE_PAD,
+	__TCA_SAMPLE_MAX
+};
+#define TCA_SAMPLE_MAX (__TCA_SAMPLE_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index a9aa38d..72cfa3a 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -650,6 +650,18 @@ config NET_ACT_MIRRED
 	  To compile this code as a module, choose M here: the
 	  module will be called act_mirred.
 
+config NET_ACT_SAMPLE
+	tristate "Traffic Sampling"
+	depends on NET_CLS_ACT
+	select PSAMPLE
+	---help---
+	  Say Y here to allow packet sampling tc action. The packet sample
+	  action consists of statistically choosing packets and sampling
+	  them using the psample module.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called act_sample.
+
 config NET_ACT_IPT
 	tristate "IPtables targets"
 	depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 4bdda36..7b915d2 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_NET_CLS_ACT)	+= act_api.o
 obj-$(CONFIG_NET_ACT_POLICE)	+= act_police.o
 obj-$(CONFIG_NET_ACT_GACT)	+= act_gact.o
 obj-$(CONFIG_NET_ACT_MIRRED)	+= act_mirred.o
+obj-$(CONFIG_NET_ACT_SAMPLE)	+= act_sample.o
 obj-$(CONFIG_NET_ACT_IPT)	+= act_ipt.o
 obj-$(CONFIG_NET_ACT_NAT)	+= act_nat.o
 obj-$(CONFIG_NET_ACT_PEDIT)+= act_
[patch net-next 3/4] mlxsw: reg: add the Monitoring Packet Sampling Configuration Register
From: Yotam Gigi

The MPSC register allows configuring ingress packet sampling on a specific
port of the mlxsw device. The sampled packets are then trapped via the
PKT_SAMPLE trap.

Signed-off-by: Yotam Gigi
Signed-off-by: Jiri Pirko
Reviewed-by: Ido Schimmel
Signed-off-by: Jiri Pirko
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 41 +++
 1 file changed, 41 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 1357fe0..9fb0316 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -4965,6 +4965,46 @@ static inline void mlxsw_reg_mlcr_pack(char *payload, u8 local_port,
 					   MLXSW_REG_MLCR_DURATION_MAX : 0);
 }
 
+/* MPSC - Monitoring Packet Sampling Configuration Register
+ *
+ * MPSC Register is used to configure the Packet Sampling mechanism.
+ */
+#define MLXSW_REG_MPSC_ID 0x9080
+#define MLXSW_REG_MPSC_LEN 0x1C
+
+MLXSW_REG_DEFINE(mpsc, MLXSW_REG_MPSC_ID, MLXSW_REG_MPSC_LEN);
+
+/* reg_mpsc_local_port
+ * Local port number
+ * Not supported for CPU port
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mpsc, local_port, 0x00, 16, 8);
+
+/* reg_mpsc_e
+ * Enable sampling on port local_port
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mpsc, e, 0x04, 30, 1);
+
+#define MLXSW_REG_MPSC_RATE_MAX 35UL
+
+/* reg_mpsc_rate
+ * Sampling rate = 1 out of rate packets (with randomization around
+ * the point). Valid values are: 1 to MLXSW_REG_MPSC_RATE_MAX
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mpsc, rate, 0x08, 0, 32);
+
+static inline void mlxsw_reg_mpsc_pack(char *payload, u8 local_port, bool e,
+				       u32 rate)
+{
+	MLXSW_REG_ZERO(mpsc, payload);
+	mlxsw_reg_mpsc_local_port_set(payload, local_port);
+	mlxsw_reg_mpsc_e_set(payload, e);
+	mlxsw_reg_mpsc_rate_set(payload, rate);
+}
+
 /* SBPR - Shared Buffer Pools Register
  * ---
  * The SBPR configures and retrieves the shared buffer pools and configuration.
@@ -5429,6 +5469,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(mpat),
 	MLXSW_REG(mpar),
 	MLXSW_REG(mlcr),
+	MLXSW_REG(mpsc),
 	MLXSW_REG(sbpr),
 	MLXSW_REG(sbcm),
 	MLXSW_REG(sbpm),
-- 
2.7.4
[patch net-next 1/4] net: Introduce psample, a new genetlink channel for packet sampling
From: Yotam Gigi

Add a general way for kernel modules to sample packets, without being tied
to any specific subsystem. This netlink channel can be used by tc,
iptables, etc. and allows packet sampling in the kernel to be standardized.

For every sampled packet, the psample module adds the following metadata
fields:

PSAMPLE_ATTR_IIFINDEX - the packet's input ifindex, if applicable

PSAMPLE_ATTR_OIFINDEX - the packet's output ifindex, if applicable

PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been
   truncated during sampling

PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the
   user who initiated the sampling. This field allows the user to
   differentiate between several samplers working simultaneously and
   filter the packets relevant to them

PSAMPLE_ATTR_GROUP_SEQ - sequence counter of the last sent packet. The
   sequence is kept per group

PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets

PSAMPLE_ATTR_DATA - the actual packet bits

In addition, add the GET_GROUPS netlink command, which allows the user to
see the current sample groups, their refcount and sequence number. This
command currently supports only netlink dump mode.
Signed-off-by: Yotam Gigi
Signed-off-by: Jiri Pirko
---
 MAINTAINERS | 7 +
 include/net/psample.h| 36 ++
 include/uapi/linux/Kbuild| 1 +
 include/uapi/linux/psample.h | 35 +
 net/Kconfig | 1 +
 net/Makefile | 1 +
 net/psample/Kconfig | 15 +++
 net/psample/Makefile | 5 +
 net/psample/psample.c| 301 +++
 9 files changed, 402 insertions(+)
 create mode 100644 include/net/psample.h
 create mode 100644 include/uapi/linux/psample.h
 create mode 100644 net/psample/Kconfig
 create mode 100644 net/psample/Makefile
 create mode 100644 net/psample/psample.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 3c84a8f..d76fccd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9957,6 +9957,13 @@ L:	linuxppc-...@lists.ozlabs.org
 S:	Maintained
 F:	drivers/block/ps3vram.c
 
+PSAMPLE PACKET SAMPLING SUPPORT:
+M:	Yotam Gigi
+S:	Maintained
+F:	net/psample
+F:	include/net/psample.h
+F:	include/uapi/linux/psample.h
+
 PSTORE FILESYSTEM
 M:	Anton Vorontsov
 M:	Colin Cross
diff --git a/include/net/psample.h b/include/net/psample.h
new file mode 100644
index 000..b0e
--- /dev/null
+++ b/include/net/psample.h
@@ -0,0 +1,36 @@
+#ifndef __NET_PSAMPLE_H
+#define __NET_PSAMPLE_H
+
+#include
+#include
+#include
+
+struct psample_group {
+	struct list_head list;
+	struct net *net;
+	u32 group_num;
+	u32 refcount;
+	u32 seq;
+};
+
+struct psample_group *psample_group_get(struct net *net, u32 group_num);
+void psample_group_put(struct psample_group *group);
+
+#if IS_ENABLED(CONFIG_PSAMPLE)
+
+void psample_sample_packet(struct psample_group *group, struct sk_buff *skb,
+			   u32 trunc_size, int in_ifindex, int out_ifindex,
+			   u32 sample_rate);
+
+#else
+
+static inline void psample_sample_packet(struct psample_group *group,
+					 struct sk_buff *skb, u32 trunc_size,
+					 int in_ifindex, int out_ifindex,
+					 u32 sample_rate)
+{
+}
+
+#endif
+
+#endif /* __NET_PSAMPLE_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index e600b50..80ad741 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -305,6 +305,7 @@ header-y += netrom.h
 header-y += net_namespace.h
 header-y += net_tstamp.h
 header-y += nfc.h
+header-y += psample.h
 header-y += nfs2.h
 header-y += nfs3.h
 header-y += nfs4.h
diff --git a/include/uapi/linux/psample.h b/include/uapi/linux/psample.h
new file mode 100644
index 000..ed48996
--- /dev/null
+++ b/include/uapi/linux/psample.h
@@ -0,0 +1,35 @@
+#ifndef __UAPI_PSAMPLE_H
+#define __UAPI_PSAMPLE_H
+
+enum {
+	/* sampled packet metadata */
+	PSAMPLE_ATTR_IIFINDEX,
+	PSAMPLE_ATTR_OIFINDEX,
+	PSAMPLE_ATTR_ORIGSIZE,
+	PSAMPLE_ATTR_SAMPLE_GROUP,
+	PSAMPLE_ATTR_GROUP_SEQ,
+	PSAMPLE_ATTR_SAMPLE_RATE,
+	PSAMPLE_ATTR_DATA,
+
+	/* commands attributes */
+	PSAMPLE_ATTR_GROUP_REFCOUNT,
+
+	__PSAMPLE_ATTR_MAX
+};
+
+enum psample_command {
+	PSAMPLE_CMD_SAMPLE,
+	PSAMPLE_CMD_GET_GROUP,
+	PSAMPLE_CMD_NEW_GROUP,
+	PSAMPLE_CMD_DEL_GROUP,
+};
+
+/* Can be overridden at runtime by module option */
+#define PSAMPLE_ATTR_MAX (__PSAMPLE_ATTR_MAX - 1)
+
+#define PSAMPLE_NL_MCGRP_CONFIG_NAME "config"
+#define PSAMPLE_NL_MCGRP_SAMPLE_NAME "packets"
+#define PSAMPLE_GENL_NAME "psample"
+#define PSAMPLE_GENL_VERSION 1
+
+#endif
diff --git a/net/Kconfig b/net/Kconfig
index 92ae150..ce4aee6 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -390,6 +390,7 @@ source
[patch net-next 0/4] Add support for offloading packet-sampling
From: Jiri Pirko

Yotam says:

The first patch introduces the psample module, a netlink channel dedicated
to packet sampling, implemented using generic netlink. This module provides
a generic way for kernel modules to sample packets, while not being tied to
any specific subsystem like NFLOG.

The second patch adds the sample tc action, which uses psample to randomly
sample packets that match a classifier. The user can configure the psample
group number, the sampling rate and the packet's truncation (to save
kernel-user traffic).

The last two patches add support for offloading the matchall-sample tc
command in the mlxsw driver, for ingress qdiscs.

An example of psample usage can be found in the libpsample project at:
https://github.com/Mellanox/libpsample

Yotam Gigi (4):
  net: Introduce psample, a new genetlink channel for packet sampling
  net/sched: Introduce sample tc action
  mlxsw: reg: add the Monitoring Packet Sampling Configuration Register
  mlxsw: spectrum: Add packet sample offloading support

 MAINTAINERS| 7 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 41 
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 111 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 10 +
 drivers/net/ethernet/mellanox/mlxsw/trap.h | 1 +
 include/net/psample.h | 36 +++
 include/net/tc_act/tc_sample.h | 50 
 include/uapi/linux/Kbuild | 1 +
 include/uapi/linux/psample.h | 35 +++
 include/uapi/linux/tc_act/Kbuild | 1 +
 include/uapi/linux/tc_act/tc_sample.h | 26 +++
 net/Kconfig| 1 +
 net/Makefile | 1 +
 net/psample/Kconfig| 15 ++
 net/psample/Makefile | 5 +
 net/psample/psample.c | 301 +
 net/sched/Kconfig | 12 +
 net/sched/Makefile | 1 +
 net/sched/act_sample.c | 274 ++
 19 files changed, 929 insertions(+)
 create mode 100644 include/net/psample.h
 create mode 100644 include/net/tc_act/tc_sample.h
 create mode 100644 include/uapi/linux/psample.h
 create mode 100644 include/uapi/linux/tc_act/tc_sample.h
 create mode 100644 net/psample/Kconfig
 create mode 100644 net/psample/Makefile
 create mode 100644 net/psample/psample.c
 create mode 100644 net/sched/act_sample.c

-- 
2.7.4
Re: [RFC PATCH net-next 2/5] vxlan: make COLLECT_METADATA mode bridge friendly
On 21/01/17 06:46, Roopa Prabhu wrote:
> From: Roopa Prabhu
>
> This patch series makes vxlan COLLECT_METADATA mode bridge
> and layer2 network friendly. Vxlan COLLECT_METADATA mode today
> solves the per-vni netdev scalability problem in l3 networks.
> When vxlan collect metadata device participates in bridging
> vlan to vn-segments, It can only get the vlan mapped vni in
> the xmit tunnel dst metadata. It will need the vxlan driver to
> continue learn, hold forwarding state and remote destination
> information similar to how it already does for non COLLECT_METADATA
> vxlan netdevices today.
>
> Changes introduced by this patch:
> - allow learning and forwarding database state to vxlan netdev in
>   COLLECT_METADATA mode. Current behaviour is not changed
>   by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
>   to support the new bridge friendly mode.
> - A single fdb table hashed by (mac, vni) to allow fdb entries with
>   multiple vnis in the same fdb table
> - rx path already has the vni
> - tx path expects a vni in the packet with dst_metadata
> - prior to this series, fdb remote_dsts carried remote vni and
>   the vxlan device carrying the fdb table represented the
>   source vni. With the vxlan device now representing multiple vnis,
>   this patch adds a src vni attribute to the fdb entry. The remote
>   vni already uses NDA_VNI attribute. This patch introduces
>   NDA_SRC_VNI netlink attribute to represent the src vni in a multi
>   vni fdb table.
>
> Signed-off-by: Roopa Prabhu
> ---

[snip]

> @@ -2173,23 +2221,29 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
>  	bool did_rsc = false;
>  	struct vxlan_rdst *rdst, *fdst = NULL;
>  	struct vxlan_fdb *f;
> +	__be32 vni = 0;
>  
>  	info = skb_tunnel_info(skb);
>  
>  	skb_reset_mac_header(skb);
>  
>  	if (vxlan->flags & VXLAN_F_COLLECT_METADATA) {
> -		if (info && info->mode & IP_TUNNEL_INFO_TX)
> -			vxlan_xmit_one(skb, dev, NULL, false);
> -		else
> -			kfree_skb(skb);
> -		return NETDEV_TX_OK;
> +		if (info && info->mode & IP_TUNNEL_INFO_BRIDGE &&
> +		    info->mode & IP_TUNNEL_INFO_TX) {

nit: parentheses around the IP_TUNNEL_INFO_TX check

> +			vni = tunnel_id_to_key32(info->key.tun_id);
> +		} else {
> +			if (info && info->mode & IP_TUNNEL_INFO_TX)

nit: parentheses around the IP_TUNNEL_INFO_TX check

> +				vxlan_xmit_one(skb, dev, vni, NULL, false);
> +			else
> +				kfree_skb(skb);
> +			return NETDEV_TX_OK;
> +		}
>  	}
>  
>  	if (vxlan->flags & VXLAN_F_PROXY) {
>  		eth = eth_hdr(skb);
>  		if (ntohs(eth->h_proto) == ETH_P_ARP)
> -			return arp_reduce(dev, skb);
> +			return arp_reduce(dev, skb, vni);
>  #if IS_ENABLED(CONFIG_IPV6)
>  		else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
>  			 pskb_may_pull(skb, sizeof(struct ipv6hdr)
> @@ -2200,13 +2254,13 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
>  				msg = (struct nd_msg *)skb_transport_header(skb);
>  				if (msg->icmph.icmp6_code == 0 &&
>  				    msg->icmph.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION)
> -					return neigh_reduce(dev, skb);
> +					return neigh_reduce(dev, skb, vni);
>  		}
>  #endif
>  	}
>  
>  	eth = eth_hdr(skb);
> -	f = vxlan_find_mac(vxlan, eth->h_dest);
> +	f = vxlan_find_mac(vxlan, eth->h_dest, vni);
>  	did_rsc = false;
>  
>  	if (f && (f->flags & NTF_ROUTER) && (vxlan->flags & VXLAN_F_RSC) &&
> @@ -2214,11 +2268,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
>  	     ntohs(eth->h_proto) == ETH_P_IPV6)) {
>  		did_rsc = route_shortcircuit(dev, skb);
>  		if (did_rsc)
> -			f = vxlan_find_mac(vxlan, eth->h_dest);
> +			f = vxlan_find_mac(vxlan, eth->h_dest, vni);
>  	}
>  
>  	if (f == NULL) {
> -		f = vxlan_find_mac(vxlan, all_zeros_mac);
> +		f = vxlan_find_mac(vxlan, all_zeros_mac, vni);
>  		if (f == NULL) {
>  			if ((vxlan->flags & VXLAN_F_L2MISS) &&
>  			    !is_multicast_ether_addr(eth->h_dest))
> @@ -2239,11 +2293,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
>  		}
>  		skb1 = skb_clone(skb, GFP_ATOMIC);
>  		if (skb1)
> -			vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> +			vxlan_xmit_one(skb1, dev,
Re: [PATCH] net: mvneta: implement .set_wol and .get_wol
Hi Jingju,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.10-rc4 next-20170120]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Jingju-Hou/net-mvneta-implement-set_wol-and-get_wol/20170122-181651
config: sparc64-allmodconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64

All errors (new ones prefixed by >>):

   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_get_wol':
>> drivers/net/ethernet/marvell/mvneta.c:3940:8: error: 'struct mvneta_port' has no member named 'phy_dev'; did you mean 'phy_node'?
     if (pp->phy_dev)
           ^~
   drivers/net/ethernet/marvell/mvneta.c:3941:32: error: 'struct mvneta_port' has no member named 'phy_dev'; did you mean 'phy_node'?
      return phy_ethtool_get_wol(pp->phy_dev, wol);
                                   ^~
   drivers/net/ethernet/marvell/mvneta.c:3941:10: warning: 'return' with a value, in function returning void
      return phy_ethtool_get_wol(pp->phy_dev, wol);
             ^~~
   drivers/net/ethernet/marvell/mvneta.c:3933:1: note: declared here
    mvneta_ethtool_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
    ^~
   drivers/net/ethernet/marvell/mvneta.c: In function 'mvneta_ethtool_set_wol':
   drivers/net/ethernet/marvell/mvneta.c:3949:9: error: 'struct mvneta_port' has no member named 'phy_dev'; did you mean 'phy_node'?
     if (!pp->phy_dev)
            ^~
   drivers/net/ethernet/marvell/mvneta.c:3952:31: error: 'struct mvneta_port' has no member named 'phy_dev'; did you mean 'phy_node'?
     return phy_ethtool_set_wol(pp->phy_dev, wol);
                                  ^~
   drivers/net/ethernet/marvell/mvneta.c:3953:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +3940 drivers/net/ethernet/marvell/mvneta.c

  3934	{
  3935		struct mvneta_port *pp = netdev_priv(dev);
  3936	
  3937		wol->supported = 0;
  3938		wol->wolopts = 0;
  3939	
> 3940		if (pp->phy_dev)
  3941			return phy_ethtool_get_wol(pp->phy_dev, wol);
  3942	}
  3943	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip