Re: [PATCH net-next] net: bcmgenet: Do not return from void function
From: Florian FainelliDate: Tue, 29 Aug 2017 21:48:51 -0700 > A stray return was added in the macro bcmgenet_##name##_writel where it > should not, drop it. > > Reported-by: kbuild test robot > Fixes: 69d2ea9c7989 ("net: bcmgenet: Use correct I/O accessors") > Signed-off-by: Florian Fainelli Applied, thanks.
Re: [PATCH net-next v3 0/3] NCSI VLAN Filtering Support
From: Samuel Mendoza-JonasDate: Wed, 30 Aug 2017 14:37:21 +1000 > On Mon, 2017-08-28 at 16:50 -0700, David Miller wrote: >> From: Samuel Mendoza-Jonas >> Date: Mon, 28 Aug 2017 16:18:40 +1000 >> >> > This series (mainly patch 2) adds VLAN filtering to the NCSI >> > implementation. >> > A fair amount of code already exists in the NCSI stack for VLAN filtering >> > but >> > none of it is actually hooked up. This goes the final mile and fixes a few >> > bugs in the existing code found along the way (patch 1). >> > >> > Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to >> > enable filtering as it's a large consumer of NCSI (and what I've been >> > testing on). >> > >> > v3: - Add comment describing change to ncsi_find_filter() >> > - Catch NULL in clear_one_vid() from ncsi_get_filter() >> > - Simplify state changes when kicking updated channel >> >> Series applied. > > Thanks David, > > The kbuild bot caught a build error where the add/kill callbacks aren't > defined without CONFIG_NET_NCSI: > >>> ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] >>> undefined! >>> ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] >>> undefined! > > It's a quick fixup to patch 3 as below, would you like me to send it as a v4? You must submit a formal fixup patch to fix bugs if I've said that I've already applied your patch.
Re: Fwd: DA850-evm MAC Address is random
On Wednesday 30 August 2017 06:19 AM, Adam Ford wrote: > On Tue, Aug 29, 2017 at 10:20 AM, Adam Fordwrote: >> On Tue, Aug 29, 2017 at 10:16 AM, Sekhar Nori wrote: >>> On Tuesday 29 August 2017 05:32 PM, Adam Ford wrote: On Tue, Aug 29, 2017 at 6:42 AM, Sekhar Nori wrote: > On Tuesday 29 August 2017 03:53 PM, Adam Ford wrote: >> On Tue, Aug 29, 2017 at 3:23 AM, Sekhar Nori wrote: >>> On Tuesday 29 August 2017 02:42 AM, Tony Lindgren wrote: * Adam Ford [170828 13:33]: > On Mon, Aug 28, 2017 at 1:54 PM, Grygorii Strashko > wrote: >> Cc: Sekhar >> >> On 08/28/2017 10:32 AM, Adam Ford wrote: >>> >>> The davinvi_emac MAC address seems to attempt a call to >>> ti_cm_get_macid in cpsw-common.c but it returns the message >>> 'davinci_emac davinci_emac.1: incompatible machine/device type for >>> reading mac address ' and then generates a random MAC address. >>> >>> The function appears to lookup varions boards using >>> 'of_machine_is_compaible' and supports dm8148, am33xx, am3517, >>> dm816, >>> am4372 and dra7. I don't see the ti,davinci-dm6467-emac which is >>> what's shown in the da850 device tree. >>> >>> Is there a patch somewhere for supporting the da850-evm? >> >> >> Not sure if MAC address can be read from Control module. >> May be Sekhar can say more? > > My understanding is that the MAC address is programmed by Logic PD > into the SPI flash. The Bootloader reads this from either SPI or its > env variables. Looking at the partition info listed in the > da850-evm.dts file, it appears as if they've reserved space for it. > Unfortunately, I don't see any code that reads it out. I was hoping >>> >>> This code is present in U-Boot sources at >>> board/davinci/da8xxevm/da850evm.c. See the function get_mac_addr() and >>> its usage in misc_init_r(). >>> > there might be a way to just pass cmdline parameter from the > bootloader to the kernel to accept the MAC address. > >> >>> >>> If not, is there a way to pass the MAC address from U-Boot to the >>> driver so it doesn't generate a random MAC? >> >> >> "local-mac-address" dt porp > > The downside here, is that we'd have to have the Bootloader modify the > device tree. That piece of code exists somewhere in u-boot already. Note how >>> >>> Yes, it is fdt_fixup_ethernet() and its usage is in common/image-fdt.c. >>> we are populating the mac address for USB Ethernet drivers in u-boot and then the Ethernet driver code parses it. See commit 055d31de7158 ("ARM: omap3: beagleboard-xm: dt: Add ethernet to the device tree") for some more information. I think u-boot needs the ethernet alias for finding the interface. >>> >>> That's exactly what was missing. I have sent a patch for fixing that and >>> copied you there. >> >> Thanks for doing that. >> >>> >>> Adam, if I can get your Tested-by, I will make an attempt to send it for >>> v4.13 itself. >> >> I will test it. Do need to run some instruction or do something >> special in U-Boot to pass this in the proper place for the kernel to >> pull it? Tony's patch reference showed >> command for fdt set, but I am not sure I fully understand the >> parameters that went along with that. > > Nope, just applying the patch and booting the with the new dtb should > result in the random mac address going away. Unfortunately, I am not seeing any change with the patch (at least with Kernel 4.12.9 from stable). netconsole: network logging started davinci_emac davinci_emac.1: incompatible machine/device type for reading mac address davinci_emac davinci_emac.1: using random MAC addr: ee:74:c3:3a:15:be Looking at the source for cpsw-common.c function ti_cm_get_macid() doesn't have a case for the ti,davinci-dm6467-emac so I wonder if there might be more to it. >>> >>> Hmm, it did solve the issue for me when I tried latest -next. And >>> reverting the patch brought back the random mac address usage. Could you >>> try latest mainline or -next? >>> >>> Meanwhile let me see whats going on with the observations you have. >> >> I will try again with -next this afternoon and see what I can find. >> Can you tell me which U-Boot version you're using? I want to match >> your setup. I want to see if something is missing during the hand-off >> between the Bootloader and Linux. >> > > I wonder if U-Boot isn't pushing something to Linux because it
Re: [PATCH net-next 0/4] nsh: headers, GSO
On Wed, Aug 30, 2017 at 06:17:07AM +0800, David Miller wrote: > From: Jiri Benc> Date: Mon, 28 Aug 2017 21:43:20 +0200 > > > This adds header structs and helpers for NSH together with GSO support. > > > > Note there is no code in this patchset that actually manipulates the NSH > > headers. That was sent to netdev by Yi Yang ("[PATCH net-next v6 0/3] > > openvswitch: add NSH support"). The aim of this series is to lay the > > groundwork and ease the implementation for him. > > > > In addition to openvswitch, the NSH support should be added to tc (flower to > > match, act_nsh to push/pop NSH headers). That will come later. There's > > currently no plan to support NSH by other means than those two. > > > > The patch 3 in this patchset was written by Yi Yang, I took it from the > > aforementioned series and slightly modified it - see the note in the patch. > > Series applied, thanks Jiri. Hi, Jiri David has merged your NSH GSO series, do you have one more patch series to fix VxLAN-gpe+NSH GSO issue? I think we still need do something in skb_udp_tunnel_segment in net/ipv4/udp_offload.c to support VxLAN-gpe+NSH GSO. I'll post new OVS NSH kernel datapath patch series if NSH GSO is done.
Re: [PATCH net-next v3 02/13] phy: add the mvebu cp110 comphy driver
Hi, On Tuesday 29 August 2017 06:42 PM, Antoine Tenart wrote: > Hi Kishon, > > On Tue, Aug 29, 2017 at 05:55:06PM +0530, Kishon Vijay Abraham I wrote: >> On Tuesday 29 August 2017 04:53 PM, Antoine Tenart wrote: >>> On Tue, Aug 29, 2017 at 04:34:17PM +0530, Kishon Vijay Abraham I wrote: On Monday 28 August 2017 08:27 PM, Antoine Tenart wrote: > +static const struct mvebu_comhy_conf mvebu_comphy_cp110_modes[] = { > + /* lane 0 */ > + MVEBU_COMPHY_CONF(0, 1, PHY_MODE_SGMII, 0x1), > + /* lane 1 */ > + MVEBU_COMPHY_CONF(1, 2, PHY_MODE_SGMII, 0x1), > + /* lane 2 */ > + MVEBU_COMPHY_CONF(2, 0, PHY_MODE_SGMII, 0x1), > + MVEBU_COMPHY_CONF(2, 0, PHY_MODE_10GKR, 0x1), > + /* lane 3 */ > + MVEBU_COMPHY_CONF(3, 1, PHY_MODE_SGMII, 0x2), > + /* lane 4 */ > + MVEBU_COMPHY_CONF(4, 0, PHY_MODE_SGMII, 0x2), > + MVEBU_COMPHY_CONF(4, 0, PHY_MODE_10GKR, 0x2), > + MVEBU_COMPHY_CONF(4, 1, PHY_MODE_SGMII, 0x1), > + /* lane 5 */ > + MVEBU_COMPHY_CONF(5, 2, PHY_MODE_SGMII, 0x1), > +}; IMHO all the lane and mode configuration should come from dt. That would make it more reusable when comphy is configured differently. >>> >>> These connexions between engines and the comphy lanes are inside the >>> SoC. They won't change for a given SoC, and the actual configuration is >>> at the board level to know what is connected to the output of a given >>> lane, which is already described into the dt (the lane phandle). >>> >>> So I think we can keep this inside the driver, and we'll had other >>> tables if the same comphy is ever used in another SoC. >>> >>> What do you think? >> >> I'd like to avoid adding tables for every SoC. These are configuration >> details >> and can come from dt. > > Actually this is per CP design, not SoC (this one is used in both 7k and > 8k SoCs from Marvell, and probably others). I'm still not convinced this > is a good idea to put this into the dt. First of all we would end up with > something like (and this is only for a single lane, out of *6*): > > cpm_comphy: phy@phy@12 { > compatible = "marvell,comphy-cp110"; > reg = <0x12 0x6000>; > marvell,system-controller = <_syscon0>; > #address-cells = <1>; > #size-cells = <0>; > > cpm_comphy0: phy@0 { > reg = <0>; > #phy-cells = <1>; > > mode@0 { > phy-mode = PHY_MODE_SGMII; > selector = <0x1>; > pipe-selector = <0x0>; > port = <0>; > }; > > mode@1 { > phy-mode = PHY_MODE_HS_SGMII; > selector = <0x1>; > pipe-selector = <0x0>; > port = <0>; > }; > > mode@2 { > phy-mode = PHY_MODE_RXAUI; > selector = <0x2>; > pipe-selector = <0x0>; > port = <0>; > }; > > mode@3 { > phy-mode = PHY_MODE_10GKR; > selector = <0x2>; > pipe-selector = <0x0>; > port = <0>; > }; > > mode@4 { > phy-mode = PHY_MODE_SATA; > selector = <0x4>; > pipe-selector = <0x0>; > port = <1>; > }; > > mode@5 { > phy-mode = PHY_MODE_USB; > selector = <0x0>; > pipe-selector = <0x1>; > port = <2>; > }; > > mode@6 { > phy-mode = PHY_MODE_USB; > selector = <0x0>; > pipe-selector = <0x2>; > port = <0>; > }; I think we should just select the mode that a particular lane has been configured here instead of populating all the modes. But I think that doesn't make sense since the mode is set by the consumer and the initial mode is INVALID. So ignore my comment on having it in dt. > > ... + PCIe, other ports ... > }; > > cpm_comphy1: phy@1 { > ... > }; > > ... > }; > > And this "configuration" makes us think the comphy driver would be > somehow generic with only these parameters to update when using a > different CP. In reality we do have one other comphy in another CP, and > it requires more than just updating the above parameters: the init > functions also are SoC specific. So the table used in the patch proposed > only is a small part of this "configuration". In fact it's not a > configuration at all, but only a mode-to-bit indirection, used in the > comphy init functions. > > What is proposed instead, is very close to what actually changes a lot, > and what a designer can change: the CP
[net-next:master 427/429] arch/sh/include/asm/io.h:30:43: warning: 'return' with a value, in function returning void
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: eaa72dc47488d599439cd0fd0f8c4f1bcb3906bb commit: 69d2ea9c798983c4a7157278ec84ff969d1cd8e8 [427/429] net: bcmgenet: Use correct I/O accessors config: sh-allmodconfig (attached as .config) compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout 69d2ea9c798983c4a7157278ec84ff969d1cd8e8 # save the attached .config to linux build tree make.cross ARCH=sh All warnings (new ones prefixed by >>): In file included from include/linux/io.h:25:0, from include/linux/irq.h:24, from arch/sh/include/asm/hardirq.h:5, from include/linux/hardirq.h:8, from include/linux/interrupt.h:12, from drivers/net//ethernet/broadcom/genet/bcmgenet.c:18: drivers/net//ethernet/broadcom/genet/bcmgenet.h: In function 'bcmgenet_ext_writel': >> arch/sh/include/asm/io.h:30:43: warning: 'return' with a value, in function >> returning void #define __raw_writel(v,a) (__chk_io_ptr(a), *(volatile u32 __force *)(a) = (v)) ^ drivers/net//ethernet/broadcom/genet/bcmgenet.h:687:10: note: in expansion of macro '__raw_writel' return __raw_writel(val, priv->base + offset + off); \ ^~~~ drivers/net//ethernet/broadcom/genet/bcmgenet.h:692:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(ext, GENET_EXT_OFF); ^~ In file included from drivers/net//ethernet/broadcom/genet/bcmgenet.c:49:0: drivers/net//ethernet/broadcom/genet/bcmgenet.h:683:20: note: declared here static inline void bcmgenet_##name##_writel(struct bcmgenet_priv *priv, \ ^ drivers/net//ethernet/broadcom/genet/bcmgenet.h:692:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(ext, GENET_EXT_OFF); ^~ In file included from include/linux/io.h:25:0, from include/linux/irq.h:24, from arch/sh/include/asm/hardirq.h:5, from include/linux/hardirq.h:8, from include/linux/interrupt.h:12, from drivers/net//ethernet/broadcom/genet/bcmgenet.c:18: drivers/net//ethernet/broadcom/genet/bcmgenet.h: In function 'bcmgenet_umac_writel': >> arch/sh/include/asm/io.h:30:43: warning: 'return' with a value, in function >> returning void #define __raw_writel(v,a) (__chk_io_ptr(a), *(volatile u32 __force *)(a) = (v)) ^ drivers/net//ethernet/broadcom/genet/bcmgenet.h:687:10: note: in expansion of macro '__raw_writel' return __raw_writel(val, priv->base + offset + off); \ ^~~~ drivers/net//ethernet/broadcom/genet/bcmgenet.h:693:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(umac, GENET_UMAC_OFF); ^~ In file included from drivers/net//ethernet/broadcom/genet/bcmgenet.c:49:0: drivers/net//ethernet/broadcom/genet/bcmgenet.h:683:20: note: declared here static inline void bcmgenet_##name##_writel(struct bcmgenet_priv *priv, \ ^ drivers/net//ethernet/broadcom/genet/bcmgenet.h:693:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(umac, GENET_UMAC_OFF); ^~ In file included from include/linux/io.h:25:0, from include/linux/irq.h:24, from arch/sh/include/asm/hardirq.h:5, from include/linux/hardirq.h:8, from include/linux/interrupt.h:12, from drivers/net//ethernet/broadcom/genet/bcmgenet.c:18: drivers/net//ethernet/broadcom/genet/bcmgenet.h: In function 'bcmgenet_sys_writel': >> arch/sh/include/asm/io.h:30:43: warning: 'return' with a value, in function >> returning void #define __raw_writel(v,a) (__chk_io_ptr(a), *(volatile u32 __force *)(a) = (v)) ^ drivers/net//ethernet/broadcom/genet/bcmgenet.h:687:10: note: in expansion of macro '__raw_writel' return __raw_writel(val, priv->base + offset + off); \ ^~~~ drivers/net//ethernet/broadcom/genet/bcmgenet.h:694:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(sys, GENET_SYS_OFF); ^~ In file included from drivers/net//ethernet/broadcom/genet/bcmgenet.c:49:0: drivers/net//ethernet/broadcom/genet/bcmgenet.h:683:20: note: declared here static inline void bcmgenet_##name##_writel(struct bcmgenet_priv *priv, \ ^
Re: [PATCH net-next v3 02/13] phy: add the mvebu cp110 comphy driver
Hi Antoine, On Monday 28 August 2017 08:27 PM, Antoine Tenart wrote: > On the CP110 unit, which can be found on various Marvell platforms such > as the 7k and 8k (currently), a comphy (common PHYs) hardware block can > be found. This block provides a number of PHYs which can be used in > various modes by other controllers (network, SATA ...). These common > PHYs must be configured for the controllers using them to work correctly > either at boot time, or when the system runs to switch the mode used. > This patch adds a driver for this comphy hardware block, providing > callbacks for the its PHYs so that consumers can configure the modes > used. > > As of this commit, two modes are supported by the comphy driver: sgmii > and 10gkr. Have one more minor comment in addition to my previous comments.. > > Signed-off-by: Antoine Tenart> --- > drivers/phy/marvell/Kconfig | 10 + > drivers/phy/marvell/Makefile | 1 + > drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 656 > +++ > 3 files changed, 667 insertions(+) > create mode 100644 drivers/phy/marvell/phy-mvebu-cp110-comphy.c > > diff --git a/drivers/phy/marvell/Kconfig b/drivers/phy/marvell/Kconfig > index 048d8893bc2e..26755f3d1a9a 100644 > --- a/drivers/phy/marvell/Kconfig > +++ b/drivers/phy/marvell/Kconfig > @@ -21,6 +21,16 @@ config PHY_BERLIN_USB > help > Enable this to support the USB PHY on Marvell Berlin SoCs. > > +config PHY_MVEBU_CP110_COMPHY > + tristate "Marvell CP110 comphy driver" > + depends on ARCH_MVEBU && OF > + select GENERIC_PHY > + help > + This driver allows to control the comphy, an hardware block providing > + shared serdes PHYs on Marvell Armada 7k/8k (in the CP110). Its serdes > + lanes can be used by various controllers (Ethernet, sata, usb, > + PCIe...). > + > config PHY_MVEBU_SATA > def_bool y > depends on ARCH_DOVE || MACH_DOVE || MACH_KIRKWOOD > diff --git a/drivers/phy/marvell/Makefile b/drivers/phy/marvell/Makefile > index 3fc188f59118..0cf6a7cbaf9f 100644 > --- a/drivers/phy/marvell/Makefile > +++ b/drivers/phy/marvell/Makefile > @@ -1,6 +1,7 @@ > obj-$(CONFIG_ARMADA375_USBCLUSTER_PHY) += phy-armada375-usb2.o > obj-$(CONFIG_PHY_BERLIN_SATA)+= phy-berlin-sata.o > obj-$(CONFIG_PHY_BERLIN_USB) += phy-berlin-usb.o > +obj-$(CONFIG_PHY_MVEBU_CP110_COMPHY) += phy-mvebu-cp110-comphy.o > obj-$(CONFIG_PHY_MVEBU_SATA) += phy-mvebu-sata.o > obj-$(CONFIG_PHY_PXA_28NM_HSIC) += phy-pxa-28nm-hsic.o > obj-$(CONFIG_PHY_PXA_28NM_USB2) += phy-pxa-28nm-usb2.o > diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c > b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c > new file mode 100644 > index ..41556e790856 > --- /dev/null > +++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c > @@ -0,0 +1,656 @@ > +/* > + * Copyright (C) 2017 Marvell > + * > + * Antoine Tenart > + * > + * This file is licensed under the terms of the GNU General Public > + * License version 2. This program is licensed "as is" without any > + * warranty of any kind, whether express or implied. > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +/* Relative to priv->base */ > +#define MVEBU_COMPHY_SERDES_CFG0(n) (0x0 + (n) * 0x1000) > +#define MVEBU_COMPHY_SERDES_CFG0_PU_PLL BIT(1) > +#define MVEBU_COMPHY_SERDES_CFG0_GEN_RX(n) ((n) << 3) > +#define MVEBU_COMPHY_SERDES_CFG0_GEN_TX(n) ((n) << 7) > +#define MVEBU_COMPHY_SERDES_CFG0_PU_RX BIT(11) > +#define MVEBU_COMPHY_SERDES_CFG0_PU_TX BIT(12) > +#define MVEBU_COMPHY_SERDES_CFG0_HALF_BUSBIT(14) > +#define MVEBU_COMPHY_SERDES_CFG1(n) (0x4 + (n) * 0x1000) > +#define MVEBU_COMPHY_SERDES_CFG1_RESET BIT(3) > +#define MVEBU_COMPHY_SERDES_CFG1_RX_INIT BIT(4) > +#define MVEBU_COMPHY_SERDES_CFG1_CORE_RESET BIT(5) > +#define MVEBU_COMPHY_SERDES_CFG1_RF_RESETBIT(6) > +#define MVEBU_COMPHY_SERDES_CFG2(n) (0x8 + (n) * 0x1000) > +#define MVEBU_COMPHY_SERDES_CFG2_DFE_EN BIT(4) > +#define MVEBU_COMPHY_SERDES_STATUS0(n) (0x18 + (n) * 0x1000) > +#define MVEBU_COMPHY_SERDES_STATUS0_TX_PLL_RDY BIT(2) > +#define MVEBU_COMPHY_SERDES_STATUS0_RX_PLL_RDY BIT(3) > +#define MVEBU_COMPHY_SERDES_STATUS0_RX_INIT BIT(4) > +#define MVEBU_COMPHY_PWRPLL_CTRL(n) (0x804 + (n) * 0x1000) > +#define MVEBU_COMPHY_PWRPLL_CTRL_RFREQ(n)((n) << 0) > +#define MVEBU_COMPHY_PWRPLL_PHY_MODE(n) ((n) << 5) > +#define MVEBU_COMPHY_IMP_CAL(n) (0x80c + (n) * 0x1000) > +#define MVEBU_COMPHY_IMP_CAL_TX_EXT(n) ((n) << 10) > +#define MVEBU_COMPHY_IMP_CAL_TX_EXT_EN BIT(15) > +#define MVEBU_COMPHY_DFE_RES(n) (0x81c +
[PATCH net-next] net: bcmgenet: Do not return from void function
A stray return was added in the macro bcmgenet_##name##_writel where it should not, drop it. Reported-by: kbuild test robotFixes: 69d2ea9c7989 ("net: bcmgenet: Use correct I/O accessors") Signed-off-by: Florian Fainelli --- drivers/net/ethernet/broadcom/genet/bcmgenet.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h index 2bfeaefcca0f..4c49d0b97748 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h @@ -684,7 +684,7 @@ static inline void bcmgenet_##name##_writel(struct bcmgenet_priv *priv, \ u32 val, u32 off) \ { \ if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) \ - return __raw_writel(val, priv->base + offset + off);\ + __raw_writel(val, priv->base + offset + off); \ else\ writel_relaxed(val, priv->base + offset + off); \ } -- 2.11.0
[PATCH net-next 3/3 v11] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
RmNet driver provides a transport agnostic MAP (multiplexing and aggregation protocol) support in embedded module. Module provides virtual network devices which can be attached to any IP-mode physical device. This will be used to provide all MAP functionality on future hardware in a single consistent location. Signed-off-by: Subash Abhinov Kasiviswanathan--- Documentation/networking/rmnet.txt | 82 drivers/net/ethernet/qualcomm/Kconfig | 2 + drivers/net/ethernet/qualcomm/Makefile | 2 + drivers/net/ethernet/qualcomm/rmnet/Kconfig| 12 + drivers/net/ethernet/qualcomm/rmnet/Makefile | 10 + drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 419 + drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h | 56 +++ .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 271 + .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h | 26 ++ drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h| 88 + .../ethernet/qualcomm/rmnet/rmnet_map_command.c| 107 ++ .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 105 ++ .../net/ethernet/qualcomm/rmnet/rmnet_private.h| 45 +++ drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 170 + drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h| 29 ++ 15 files changed, 1424 insertions(+) create mode 100644 Documentation/networking/rmnet.txt create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Kconfig create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Makefile create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h diff --git a/Documentation/networking/rmnet.txt b/Documentation/networking/rmnet.txt new file mode 100644 index 000..6b341ea --- /dev/null +++ b/Documentation/networking/rmnet.txt @@ -0,0 +1,82 @@ +1. Introduction + +rmnet driver is used for supporting the Multiplexing and aggregation +Protocol (MAP). This protocol is used by all recent chipsets using Qualcomm +Technologies, Inc. modems. + +This driver can be used to register onto any physical network device in +IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator. + +Multiplexing allows for creation of logical netdevices (rmnet devices) to +handle multiple private data networks (PDN) like a default internet, tethering, +multimedia messaging service (MMS) or IP media subsystem (IMS). Hardware sends +packets with MAP headers to rmnet. Based on the multiplexer id, rmnet +routes to the appropriate PDN after removing the MAP header. + +Aggregation is required to achieve high data rates. This involves hardware +sending aggregated bunch of MAP frames. rmnet driver will de-aggregate +these MAP frames and send them to appropriate PDN's. + +2. Packet format + +a. MAP packet (data / control) + +MAP header has the same endianness of the IP packet. + +Packet format - + +Bit 0 1 2-7 8 - 15 16 - 31 +Function Command / Data Reserved Pad Multiplexer IDPayload length +Bit32 - x +Function Raw Bytes + +Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command +or data packet. Control packet is used for transport level flow control. Data +packets are standard IP packets. + +Reserved bits are usually zeroed out and to be ignored by receiver. + +Padding is number of bytes to be added for 4 byte alignment if required by +hardware. + +Multiplexer ID is to indicate the PDN on which data has to be sent. + +Payload length includes the padding length but does not include MAP header +length. + +b. MAP packet (command specific) + +Bit 0 1 2-7 8 - 15 16 - 31 +Function Command Reserved Pad Multiplexer IDPayload length +Bit 32 - 3940 - 4546 - 47 48 - 63 +Function Command nameReserved Command Type Reserved +Bit 64 - 95 +Function Transaction ID +Bit 96 - 127 +Function Command data + +Command 1 indicates disabling flow while 2 is enabling flow + +Command types - +0 for MAP command request +1 is to acknowledge the receipt of a command +2 is for unsupported commands +3 is for error during processing of commands + +c. Aggregation + +Aggregation is multiple MAP packets (can be data or command) delivered
[PATCH net-next 2/3 v11] net: arp: Add support for raw IP device
Define the raw IP type. This is needed for raw IP net devices like rmnet. Signed-off-by: Subash Abhinov Kasiviswanathan--- include/uapi/linux/if_arp.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h index cf73510..a2a6356 100644 --- a/include/uapi/linux/if_arp.h +++ b/include/uapi/linux/if_arp.h @@ -59,6 +59,7 @@ #define ARPHRD_LAPB516 /* LAPB */ #define ARPHRD_DDCMP517/* Digital's DDCMP protocol */ #define ARPHRD_RAWHDLC 518 /* Raw HDLC */ +#define ARPHRD_RAWIP519/* Raw IP */ #define ARPHRD_TUNNEL 768 /* IPIP tunnel */ #define ARPHRD_TUNNEL6 769 /* IP6IP6 tunnel*/ -- 1.9.1
[PATCH net-next 0/3 v11] Add support for rmnet driver
This patch series adds support for the rmnet driver which is required to support recent chipsets using Qualcomm Technologies, Inc. modems. The data from hardware follows the multiplexing and aggregation protocol (MAP). This driver can be used to register onto any physical network device in IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator. rmnet driver helps to decode these packets and queue them to network stack (and encode and transmit it to the physical device). v1: Same as the RFC patch with some minor fixes for issues reported by kbuild test robot. v1->v2: Change datatypes and remove config IOCTL as mentioned by David. Also fix checkpatch issues and remove some unused code. v2->v3: Move location to drivers/net and rename to rmnet. Change the userspace - netlink communication from custom netlink to rtnl_link_ops. Refactor some code. Use a fixed config for ingress and egress. v3->v4: Move location to drivers/net/ethernet/qualcomm/. Fix comments from Stephen and Jiri - Split the ether and arp type changes into seperate patches. Remove debug and custom logging and switch to standard netdevice log. Remove module parameters. Refactor and change some code style issues. v4->v5: Rename some structs and variables. Move the initializer before the for loop start. Put the arp type in correct sequence. v5->v6: Fix comments from Dan - Use the upper link API. As a result, remove all the refcounting logic. Device refcount is explicitly held on real_dev on rx_handler registration only. Modifiy the flow control struct. Remove the unused ethernet mode handling. v6->v7: Fix comments from David - Add newline to end of Makefile. Remove inline from .c files. Move the module init/exit to rmnet config. Fix an error reported by kbuild test robot for an unused file. v7->v8: Use a smaller value for ETH_P_MAP as mentioned by David. Change netdev_info to netdev_dbg as mentioned by Andew. Fix comments from Stephen regarding netdev_priv and sparse related errors of using 0 as NULL v8->v9: Fix comments from David - Remove the CFLAG rule. Change the way rmnet devices are freed. Instead of using a workqueue to unregister devices individually, go through the list and free all devices within the rtnl_lock(). v9->v10: Actually fix the locking as mentioned by David. The locking scheme is mentioned in a comment in rmnet_config.c. Change comment near MAP type definition as mentioned by Dan. Refactor some code. v10->v11: Allow RMNET to compile as a module as mentioned by David Subash Abhinov Kasiviswanathan (3): net: ether: Add support for multiplexing and aggregation type net: arp: Add support for raw IP device drivers: net: ethernet: qualcomm: rmnet: Initial implementation Documentation/networking/rmnet.txt | 82 drivers/net/ethernet/qualcomm/Kconfig | 2 + drivers/net/ethernet/qualcomm/Makefile | 2 + drivers/net/ethernet/qualcomm/rmnet/Kconfig| 12 + drivers/net/ethernet/qualcomm/rmnet/Makefile | 10 + drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 419 + drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h | 56 +++ .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 271 + .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h | 26 ++ drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h| 88 + .../ethernet/qualcomm/rmnet/rmnet_map_command.c| 107 ++ .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 105 ++ .../net/ethernet/qualcomm/rmnet/rmnet_private.h| 45 +++ drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 170 + drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h| 29 ++ include/uapi/linux/if_arp.h| 1 + include/uapi/linux/if_ether.h | 3 + 17 files changed, 1428 insertions(+) create mode 100644 Documentation/networking/rmnet.txt create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Kconfig create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Makefile create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h -- 1.9.1
[PATCH net-next 1/3 v11] net: ether: Add support for multiplexing and aggregation type
Define the Qualcomm multiplexing and aggregation (MAP) ether type 0x00F9. This is needed for receiving data in the MAP protocol like RMNET. This is not an officially registered ID. Signed-off-by: Subash Abhinov Kasiviswanathan--- include/uapi/linux/if_ether.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h index 5bc9bfd..30526db 100644 --- a/include/uapi/linux/if_ether.h +++ b/include/uapi/linux/if_ether.h @@ -137,6 +137,9 @@ #define ETH_P_IEEE802154 0x00F6/* IEEE802.15.4 frame */ #define ETH_P_CAIF 0x00F7 /* ST-Ericsson CAIF protocol*/ #define ETH_P_XDSA 0x00F8 /* Multiplexed DSA protocol */ +#define ETH_P_MAP 0x00F9 /* Qualcomm multiplexing and +* aggregation protocol +*/ /* * This is an Ethernet frame header. -- 1.9.1
Re: [PATCH net-next v3 0/3] NCSI VLAN Filtering Support
On Mon, 2017-08-28 at 16:50 -0700, David Miller wrote: > From: Samuel Mendoza-Jonas> Date: Mon, 28 Aug 2017 16:18:40 +1000 > > > This series (mainly patch 2) adds VLAN filtering to the NCSI implementation. > > A fair amount of code already exists in the NCSI stack for VLAN filtering > > but > > none of it is actually hooked up. This goes the final mile and fixes a few > > bugs in the existing code found along the way (patch 1). > > > > Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to > > enable filtering as it's a large consumer of NCSI (and what I've been > > testing on). > > > > v3: - Add comment describing change to ncsi_find_filter() > > - Catch NULL in clear_one_vid() from ncsi_get_filter() > > - Simplify state changes when kicking updated channel > > Series applied. Thanks David, The kbuild bot caught a build error where the add/kill callbacks aren't defined without CONFIG_NET_NCSI: >> ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] >> undefined! >> ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] >> undefined! It's a quick fixup to patch 3 as below, would you like me to send it as a v4? diff --git a/include/net/ncsi.h b/include/net/ncsi.h index 1f96af46df49..2b13b6b91a4d 100644 --- a/include/net/ncsi.h +++ b/include/net/ncsi.h @@ -36,6 +36,14 @@ int ncsi_start_dev(struct ncsi_dev *nd); void ncsi_stop_dev(struct ncsi_dev *nd); void ncsi_unregister_dev(struct ncsi_dev *nd); #else /* !CONFIG_NET_NCSI */ +int ncsi_vlan_rx_add_vid(struct net_device *dev, __be16 proto, u16 vid) +{ + return -ENOTTY; +} +int ncsi_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid) +{ + return -ENOTTY; +} static inline struct ncsi_dev *ncsi_register_dev(struct net_device *dev, void (*notifier)(struct ncsi_dev *nd)) {
Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
On Tue, Aug 29, 2017 at 09:38:16PM -0600, David Ahern wrote: > On 8/29/17 8:58 PM, Alexei Starovoitov wrote: > > On Tue, Aug 29, 2017 at 07:03:43PM -0600, David Ahern wrote: > >> On 8/28/17 10:11 PM, Alexei Starovoitov wrote: > >>> > >>> Agree on the above, but you're mixing semantics of the new recurse > >>> flag and implementation of it. Ex: we don't have to copy this flag > >>> from prog->attr into cgroup. So this reset or non-reset discussion > >>> only makes sense in the context of your current implementation. > >>> We can implement the logic differently. Like don't copy that flag > >>> at all and at attach time walk parent->parent->parent and see > >>> what programs are attached. All of them should have prog->attr & > >>> recurse_bit set > >>> In such implementation detach from 'b' is a nop from reset/non-reset > >>> point of view. When socket creation in 'c' is invoked the program > >>> 'c' is called first then the code keeps walking parents until root > >>> invoking 'a' along the way. > >> > >> So you are suggesting there is no recursive flag per cgroup? How do you > >> know you need to walk cgroups? How do you know when to stop running > >> programs? > > > > you're talking about implementation, right? > > My 'proposed' implemenation of walking from cgroup all the way to the root > > is just an example. It's not efficient. More below... > > > >>> I'm not saying it will be an efficient implementation. The point > >>> is to discuss UAPI independent of implementation. > >>> > ### > > Also, let's agree on this intention. Based on the new ground rule, I > want to point out this example: > > If 'a' gets a program installed with no recurse flag set, ONLY processes > in 'a' have the 'a' program run. Processes in groups 'b', 'c' and 'd' > all stop at cgroup 'b' program. > >>> > >>> I'm proposing that such situation should not be allowed to happen. > >>> In a->b->c->d cgroup scenario if override+recurse prog attached to 'b' > >>> then only the same override+recurse can be attached to c, d, a. > >>> So at detach time there can be gaps (like only 'b' and 'd' have > >>> override+recurse progs), but walking up until root from any point > >>> will guarantee that only override+recurse programs are seen. > >>> > >> > >> That seems very limiting to me. Seems like you are suggesting the entire > >> cgroup tree is recursive or non-recursive, but never a mix. > > > > Entire cgroup subtree. Yes. It's the simplest uapi I could think of. > > So 10 email exchanges later you agree on the UAPI I implemented in this > patch: user opts in to recursive behavior via a new flag at attach time, > and once a recursive program is installed at some point in the cgroup > tree it applies to all descendant cgroups. > > So all of these exchanges weren't about the UAPI, but your disagreement > in my implementation. The only user visible change here is only programs > marked recursive are run versus going back to the first cgroup marked > non-recursive. you cannot be serious. Your implemention is not at all what i'm proposing above as a simplest uapi. Should we all go back and re-read from the beginning? > > Easy to understand and argue about and I think it's solving your use case. > > It's also easily extendable. New combination and features won't break > > the users. It feels you're in rush to get this stuff for this merge > > window, therefore I want to agree on something that is simple, > > non-controversial and extensible. > > I am in no-rush, but this does not to fall by the wayside like the net > namespace specification. It's more than that! I think you only looking at it from vrf perspective whereas cgroup-bpf became a corner stone feature and enabler for tcp-bpf which in turn became a stepping stone for bpf_sk_redirect. So no, I'm not going to let something half baked that touches the core idea of cgroup-bpf slide in. Tejun and Andy also need to take a look, so yes it will take long time for everyone to agree on this core uapi. > Given the pending release of 4.13 net-next will close which gives a 2+ > week window to work on v3 before the next merge window. Plenty of time > for me to work it into the many other things on my plate. As I proposed several emails ago, please repost patches 2 and 3 that I already acked and we can continue discussing this patch without the agitation.
Re: multi-queue over IFF_NO_QUEUE "virtual" devices
Le 08/07/17 à 15:26, Florian Fainelli a écrit : > Hi, > > Most DSA supported Broadcom switches have multiple queues per ports > (usually 8) and each of these queues can be configured with different > pause, drop, hysteresis thresholds and so on in order to make use of the > switch's internal buffering scheme and have some queues achieve some > kind of lossless behavior (e.g: LAN to LAN traffic for Q7 has a higher > priority than LAN to WAN for Q0). > > This is obviously very workload specific, so I'd want maximum > programmability as much as possible. > > This brings me to a few questions: > > 1) If we have the DSA slave network devices currently flagged with > IFF_NO_QUEUE becoming multi-queue (on TX) aware such that an application > can control exactly which switch egress queue is used on a per-flow > basis, would that be a problem (this is the dynamic selection of the TX > queue)? So I have this part figured out, with a bunch of changes network devices created by DSA are now multiqueue aware and the Broadcom tag layer is capable of extracting the queue index, passing it in the tag where expected and having the switch forward to the appropriate switch port and queue within that port. It also sets the queue mapping in the SKB for later consumption by the master network device driver: bcmsysport.c because of 2). > > 2) The conduit interface (CPU) port network interface has a congestion > control scheme which requires each of its TX queues (32 or 16) to be > statically mapped to each of the underlying switch port queues because > the congestion/ HW needs to inspect the queue depths of the switch to > accept/reject a packet at the CPU's TX ring level. Do we have a good way > with tc to map a virtual/stacked device's queue(s) on-top of its > physical/underlying device's queues (this is the static queue mapping > necessary for congestion to work)? That part I have not figured out yet, with some static mapping I can obtain the results that I want and was even considering the possibility of doing something like this: - register a network device notifier with bcmsysport.c (master network device) for this setup - expose a helper function allowing me to obtain a given DSA network device port index - whenever DSA creates network devices reconfigure the ring and queue mapping of the TX queues managed by bcmsysport.c with the DSA network device port index that has just been registered and just do a 1-1 mapping of the 8 queues You would end-up with something like: gphy (port 0) queues 0-7 mapped to systemport queues 0-7 rgmii_1 (port 1) queues 0-7 mapped to systemport queues 8-15 rgmii_2 (port 2) queues 0-7 mapped to systemport queues 16 through 23 moca (port 7) queues 0-7 mapped to systemport queues 24-31 This should be working because bcmsysport's TX queues are not under direct control by the user, they are used via DSA created network devices which indicate the queue they want to use. When the DSA interfaces are brought down, their respective systemport queues now become unused. This also works because the number of physical ports of the switch times the number of queues is matching the number of TX queues from systemport (like if someone designed it with that exact purpose in mind ;)). The only problem with that approach of course is that it embeds a policy within the systemport driver. Ideally I would really like to configure this via tc by setting up a mapping between queues of one network devices to queues of another network device, is that a possible thing, Jamal, Cong, Jiri, do you know? -- Florian
[net-next:master 427/429] drivers/net/ethernet/broadcom/genet/bcmgenet.h:687:10: note: in expansion of macro '__raw_writel'
tree: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: eaa72dc47488d599439cd0fd0f8c4f1bcb3906bb commit: 69d2ea9c798983c4a7157278ec84ff969d1cd8e8 [427/429] net: bcmgenet: Use correct I/O accessors config: blackfin-allyesconfig (attached as .config) compiler: bfin-uclinux-gcc (GCC) 6.2.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross git checkout 69d2ea9c798983c4a7157278ec84ff969d1cd8e8 # save the attached .config to linux build tree make.cross ARCH=blackfin All error/warnings (new ones prefixed by >>): In file included from arch/blackfin/mach-bf533/include/mach/blackfin.h:15:0, from arch/blackfin/include/asm/irqflags.h:11, from include/linux/irqflags.h:15, from arch/blackfin/include/asm/bitops.h:33, from include/linux/bitops.h:36, from include/linux/kernel.h:10, from drivers/net/ethernet/broadcom/genet/bcmgenet.c:13: drivers/net/ethernet/broadcom/genet/bcmgenet.h: In function 'bcmgenet_ext_writel': >> arch/blackfin/include/asm/def_LPBlackfin.h:38:2: error: expected expression >> before '__asm__' __asm__ __volatile__( \ ^ >> arch/blackfin/include/asm/def_LPBlackfin.h:51:33: note: in expansion of >> macro '_bfin_writeX' #define bfin_write32(addr, val) _bfin_writeX(addr, val, 32, ) ^~~~ >> arch/blackfin/include/asm/io.h:20:33: note: in expansion of macro >> 'bfin_write32' #define __raw_writel(val, addr) bfin_write32(addr, val) ^~~~ >> drivers/net/ethernet/broadcom/genet/bcmgenet.h:687:10: note: in expansion of >> macro '__raw_writel' return __raw_writel(val, priv->base + offset + off); \ ^~~~ drivers/net/ethernet/broadcom/genet/bcmgenet.h:692:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(ext, GENET_EXT_OFF); ^~ >> arch/blackfin/include/asm/def_LPBlackfin.h:38:2: warning: 'return' with a >> value, in function returning void __asm__ __volatile__( \ ^ >> arch/blackfin/include/asm/def_LPBlackfin.h:51:33: note: in expansion of >> macro '_bfin_writeX' #define bfin_write32(addr, val) _bfin_writeX(addr, val, 32, ) ^~~~ >> arch/blackfin/include/asm/io.h:20:33: note: in expansion of macro >> 'bfin_write32' #define __raw_writel(val, addr) bfin_write32(addr, val) ^~~~ >> drivers/net/ethernet/broadcom/genet/bcmgenet.h:687:10: note: in expansion of >> macro '__raw_writel' return __raw_writel(val, priv->base + offset + off); \ ^~~~ drivers/net/ethernet/broadcom/genet/bcmgenet.h:692:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(ext, GENET_EXT_OFF); ^~ In file included from drivers/net/ethernet/broadcom/genet/bcmgenet.c:49:0: drivers/net/ethernet/broadcom/genet/bcmgenet.h:683:20: note: declared here static inline void bcmgenet_##name##_writel(struct bcmgenet_priv *priv, \ ^ drivers/net/ethernet/broadcom/genet/bcmgenet.h:692:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(ext, GENET_EXT_OFF); ^~ In file included from arch/blackfin/mach-bf533/include/mach/blackfin.h:15:0, from arch/blackfin/include/asm/irqflags.h:11, from include/linux/irqflags.h:15, from arch/blackfin/include/asm/bitops.h:33, from include/linux/bitops.h:36, from include/linux/kernel.h:10, from drivers/net/ethernet/broadcom/genet/bcmgenet.c:13: drivers/net/ethernet/broadcom/genet/bcmgenet.h: In function 'bcmgenet_umac_writel': >> arch/blackfin/include/asm/def_LPBlackfin.h:38:2: error: expected expression >> before '__asm__' __asm__ __volatile__( \ ^ >> arch/blackfin/include/asm/def_LPBlackfin.h:51:33: note: in expansion of >> macro '_bfin_writeX' #define bfin_write32(addr, val) _bfin_writeX(addr, val, 32, ) ^~~~ >> arch/blackfin/include/asm/io.h:20:33: note: in expansion of macro >> 'bfin_write32' #define __raw_writel(val, addr) bfin_write32(addr, val) ^~~~ >> drivers/net/ethernet/broadcom/genet/bcmgenet.h:687:10: note: in expansion of >> macro '__raw_writel' return __raw_writel(val, priv->base + offset + off); \ ^~~~ drivers/net/ethernet/broadcom/genet/bcmgenet.h:693:1: note: in expansion of macro 'GENET_IO_MACRO' GENET_IO_MACRO(umac, GENET_UMAC_OFF); ^~ >> arch/blackfin/include/asm/def_LPBlackfin.h:38:2: warning: 'return' with a >> value, in function returning
Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
On 8/29/17 8:58 PM, Alexei Starovoitov wrote: > On Tue, Aug 29, 2017 at 07:03:43PM -0600, David Ahern wrote: >> On 8/28/17 10:11 PM, Alexei Starovoitov wrote: >>> >>> Agree on the above, but you're mixing semantics of the new recurse >>> flag and implementation of it. Ex: we don't have to copy this flag >>> from prog->attr into cgroup. So this reset or non-reset discussion >>> only makes sense in the context of your current implementation. >>> We can implement the logic differently. Like don't copy that flag >>> at all and at attach time walk parent->parent->parent and see >>> what programs are attached. All of them should have prog->attr & >>> recurse_bit set >>> In such implementation detach from 'b' is a nop from reset/non-reset >>> point of view. When socket creation in 'c' is invoked the program >>> 'c' is called first then the code keeps walking parents until root >>> invoking 'a' along the way. >> >> So you are suggesting there is no recursive flag per cgroup? How do you >> know you need to walk cgroups? How do you know when to stop running >> programs? > > you're talking about implementation, right? > My 'proposed' implemenation of walking from cgroup all the way to the root > is just an example. It's not efficient. More below... > >>> I'm not saying it will be an efficient implementation. The point >>> is to discuss UAPI independent of implementation. >>> ### Also, let's agree on this intention. Based on the new ground rule, I want to point out this example: If 'a' gets a program installed with no recurse flag set, ONLY processes in 'a' have the 'a' program run. Processes in groups 'b', 'c' and 'd' all stop at cgroup 'b' program. >>> >>> I'm proposing that such situation should not be allowed to happen. >>> In a->b->c->d cgroup scenario if override+recurse prog attached to 'b' >>> then only the same override+recurse can be attached to c, d, a. >>> So at detach time there can be gaps (like only 'b' and 'd' have >>> override+recurse progs), but walking up until root from any point >>> will guarantee that only override+recurse programs are seen. >>> >> >> That seems very limiting to me. Seems like you are suggesting the entire >> cgroup tree is recursive or non-recursive, but never a mix. > > Entire cgroup subtree. Yes. It's the simplest uapi I could think of. So 10 email exchanges later you agree on the UAPI I implemented in this patch: user opts in to recursive behavior via a new flag at attach time, and once a recursive program is installed at some point in the cgroup tree it applies to all descendant cgroups. So all of these exchanges weren't about the UAPI, but your disagreement in my implementation. The only user visible change here is only programs marked recursive are run versus going back to the first cgroup marked non-recursive. > Easy to understand and argue about and I think it's solving your use case. > It's also easily extendable. New combination and features won't break > the users. It feels you're in rush to get this stuff for this merge > window, therefore I want to agree on something that is simple, > non-controversial and extensible. I am in no-rush, but this does not to fall by the wayside like the net namespace specification. Given the pending release of 4.13 net-next will close which gives a 2+ week window to work on v3 before the next merge window. Plenty of time for me to work it into the many other things on my plate.
Re: [PATCH net-next] ipv6: Use rt6i_idev index for echo replies to a local address
On Mon, 2017-08-28 at 13:53 -0700, David Ahern wrote: > Tariq repored local pings to linklocal address is failing: > $ ifconfig ens8 > ens8: flags=4163mtu 1500 > inet 11.141.16.6 netmask 255.255.0.0 broadcast 11.141.255.255 > inet6 fe80::7efe:90ff:fecb:7502 prefixlen 64 scopeid 0x20 > ether 7c:fe:90:cb:75:02 txqueuelen 1000 (Ethernet) > RX packets 12 bytes 1164 (1.1 KiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 30 bytes 2484 (2.4 KiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > $ /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8 > PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes > > --- fe80::7efe:90ff:fecb:7502%ens8 ping statistics --- Note that the presence of this leading --- had the effect of truncating the merged patch from this point. (all tags were ignored) > 3 packets transmitted, 0 received, 100% packet loss, time 2043ms > > icmpv6_echo_reply needs to use the rt6i_idev dev index for local traffic > similar to how icmp6_send does. Convert the change for icmp6_send into a > helper that can be used in both places. Add the long over due > skb_rt6_info helper to convert dst on an skb to rt6_info similar to > skb_rtable for ipv4. > > Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on >device with address") > Reported-by: Tariq Toukan > Signed-off-by: David Ahern > ---
Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
On Tue, Aug 29, 2017 at 9:45 PM, Jason Wangwrote: > > > On 2017年08月30日 03:35, Willem de Bruijn wrote: >> >> On Fri, Aug 25, 2017 at 9:03 PM, Willem de Bruijn >> wrote: >>> >>> On Fri, Aug 25, 2017 at 7:32 PM, Michael S. Tsirkin >>> wrote: On Fri, Aug 25, 2017 at 06:44:36PM -0400, Willem de Bruijn wrote: >> >> We don't enable network watchdog on virtio but we could and maybe >> should. > > Can you elaborate? The issue is that holding onto buffers for very long times makes guests think they are stuck. This is funamentally because from guest point of view this is a NIC, so it is supposed to transmit things out in a timely manner. If host backs the virtual NIC by something that is not a NIC, with traffic shaping etc introducing unbounded latencies, guest will be confused. >>> >>> That assumes that guests are fragile in this regard. A linux guest >>> does not make such assumptions. >> >> Yes it does. Examples above: >> > > - a single slow flow can occupy the whole ring, you will >> not >> > > be able to make any new buffers available for the fast >> flow > > Oh, right. Though those are due to vring_desc pool exhaustion > rather than an upper bound on latency of any single packet. > > Limiting the number of zerocopy packets in flight to some fraction > of the ring ensures that fast flows can always grab a slot. > Running > out of ubuf_info slots reverts to copy, so indirectly does this. But > I read it correclty the zerocopy pool may be equal to or larger than > the descriptor pool. Should we refine the zcopy_used test > > (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx > > to also return false if the number of outstanding ubuf_info is greater > than, say, vq->num >> 1? We'll need to think about where to put the threshold, but I think it's a good idea. Maybe even a fixed number, e.g. max(vq->num >> 1, X) to limit host resources. In a sense it still means once you run out of slots zcopt gets disabled possibly permanently. Need to experiment with some numbers. >>> >>> I can take a stab with two flows, one delayed in a deep host qdisc >>> queue. See how this change affects the other flow and also how >>> sensitive that is to the chosen threshold value. >> >> Incomplete results at this stage, but I do see this correlation between >> flows. It occurs even while not running out of zerocopy descriptors, >> which I cannot yet explain. >> >> Running two threads in a guest, each with a udp socket, each >> sending up to 100 datagrams, or until EAGAIN, every msec. >> >> Sender A sends 1B datagrams. >> Sender B sends VHOST_GOODCOPY_LEN, which is enough >> to trigger zcopy_used in vhost net. >> >> A local receive process on the host receives both flows. To avoid >> a deep copy when looping the packet onto the receive path, >> changed skb_orphan_frags_rx to always return false (gross hack). >> >> The flow with the larger packets is redirected through netem on ifb0: >> >>modprobe ifb >>ip link set dev ifb0 up >>tc qdisc add dev ifb0 root netem limit $LIMIT rate 1MBit >> >>tc qdisc add dev tap0 ingress >>tc filter add dev tap0 parent : protocol ip \ >>u32 match ip dport 8000 0x \ >>action mirred egress redirect dev ifb0 >> >> For 10 second run, packet count with various ifb0 queue lengths $LIMIT: >> >> no filter >>rx.A: ~840,000 >>rx.B: ~840,000 > > > Just to make sure I understand the case here. What did rx.B mean here? I > thought all traffic sent by Sender B has been redirected to ifb0? It has been, but the packet still arrives at the destination socket. IFB is a special virtual device that applies traffic shaping and then reinjects it back at the point it was intercept by mirred. rx.B is indeed arrival rate at the receiver, similar to rx.A. >> >> limit 1 >>rx.A: ~500,000 >>rx.B: ~3100 >>ifb0: 3273 sent, 371141 dropped >> >> limit 100 >>rx.A: ~9000 >>rx.B: ~4200 >>ifb0: 4630 sent, 1491 dropped >> >> limit 1000 >>rx.A: ~6800 >>rx.B: ~4200 >>ifb0: 4651 sent, 0 dropped >> >> Sender B is always correctly rate limited to 1 MBps or less. With a >> short queue, it ends up dropping a lot and sending even less. >> >> When a queue builds up for sender B, sender A throughput is strongly >> correlated with queue length. With queue length 1, it can send almost >> at unthrottled speed. But even at limit 100 its throughput is on the >> same order as sender B. >> >> What is surprising to me is that this happens even though the number >> of ubuf_info in use at limit 100 is around 100 at all times. In other >> words, >> it does not exhaust the pool. >> >> When
Re: [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions
On Tue, Aug 29, 2017 at 5:58 PM, David Millerwrote: > From: Tom Herbert > Date: Tue, 29 Aug 2017 16:27:07 -0700 > >> For UDP socket lookup functions, and associateed functions that take an >> skbuf as argument, declare the skb argument as constant. >> >> One caveat is that reuseport_select_sock can be called from the UDP >> lookup functions with an skb argument. This function temporarily >> modifies the skbuff data pointer (in bpf_run via a pull/push sequence). >> To resolve compiler warning I added a local skbuf declaration that is >> not const and assigned to the skb argument with an explicit cast. >> >> Signed-off-by: Tom Herbert > > Please don't do this. > > If reuseport_select_sock() modifies anything in the SKB, especially > skb->data, it infects the entire call chain. So you can't mark it > const in this family of calls. > reuseport_select_sock calls run_bpf that calls pskb_pull to "temporarily advance data past protocol header" and it calls bpf_prog_run_save_cb which takes non-constant skb argument. This is the only instance in all the udp lookup functions where non-constant is needed. It's logical that constant skbuf makes sense for socket lookup-- I doubt any caller would expect the skbuf to be modified as a side effect. It's also an implicit characteristic since reuseport_select_sock may just clone the socket before calling BPF. The problem is that all the flow dissector functions operate on const skbs (again that's logical :-) ). So if we want to be able to call lookup functions or even BPF to do flow dissection, then I think something needs to change. I really don't want to unconsitify the flow dissector functions. We could just always do the skb before calling BPF, but I suppose that is a potential performance hit. Is there a better way to resolve this? Thanks, Tom
Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
On Tue, Aug 29, 2017 at 07:03:43PM -0600, David Ahern wrote: > On 8/28/17 10:11 PM, Alexei Starovoitov wrote: > > > > Agree on the above, but you're mixing semantics of the new recurse > > flag and implementation of it. Ex: we don't have to copy this flag > > from prog->attr into cgroup. So this reset or non-reset discussion > > only makes sense in the context of your current implementation. > > We can implement the logic differently. Like don't copy that flag > > at all and at attach time walk parent->parent->parent and see > > what programs are attached. All of them should have prog->attr & > > recurse_bit set > > In such implementation detach from 'b' is a nop from reset/non-reset > > point of view. When socket creation in 'c' is invoked the program > > 'c' is called first then the code keeps walking parents until root > > invoking 'a' along the way. > > So you are suggesting there is no recursive flag per cgroup? How do you > know you need to walk cgroups? How do you know when to stop running > programs? you're talking about implementation, right? My 'proposed' implemenation of walking from cgroup all the way to the root is just an example. It's not efficient. More below... > > I'm not saying it will be an efficient implementation. The point > > is to discuss UAPI independent of implementation. > > > >> ### > >> > >> Also, let's agree on this intention. Based on the new ground rule, I > >> want to point out this example: > >> > >> If 'a' gets a program installed with no recurse flag set, ONLY processes > >> in 'a' have the 'a' program run. Processes in groups 'b', 'c' and 'd' > >> all stop at cgroup 'b' program. > > > > I'm proposing that such situation should not be allowed to happen. > > In a->b->c->d cgroup scenario if override+recurse prog attached to 'b' > > then only the same override+recurse can be attached to c, d, a. > > So at detach time there can be gaps (like only 'b' and 'd' have > > override+recurse progs), but walking up until root from any point > > will guarantee that only override+recurse programs are seen. > > > > That seems very limiting to me. Seems like you are suggesting the entire > cgroup tree is recursive or non-recursive, but never a mix. Entire cgroup subtree. Yes. It's the simplest uapi I could think of. Easy to understand and argue about and I think it's solving your use case. It's also easily extendable. New combination and features won't break the users. It feels you're in rush to get this stuff for this merge window, therefore I want to agree on something that is simple, non-controversial and extensible. If you're not in rush (I'm not), we can come up with more flexible uapi. For example: another way of thinking about your 'recursive' requirement is to think that all 'program to be run' should be present as a link list in a given cgroup. So no walking a chain of parents. Instead of 'recursive' let's call this new flag 'multiprog'. Now in a->b->c->d scenario. We can install 'multiprog' prog in 'b'. The kernel will automatically propage it (like it does right now with css_for_each_descendant_pre() loop) to 'c' and to 'd'. Now we allow users to attach another 'multiprog' program to 'c'. The kernel will maintain a link list of programs in every cgroup, so there will be a link list of two programs in 'c' and 'd' and invocation of the programs will be faster than walking cgroup->parent->parent and checking some flags at every step, since there will be less pointer dereferences and no flags to check. Just invoke all programs in the current cgroup. Kernel took care of ordering at the time of attach/detach. I believe Andy proposed something like this back in Jan/Feb.
RE: Question about ip_defrag
Best Regards, liujian > -Original Message- > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] > On Behalf Of Florian Westphal > Sent: Tuesday, August 29, 2017 9:47 PM > To: liujian (CE) > Cc: Florian Westphal; Jesper Dangaard Brouer; netdev@vger.kernel.org; > Wangkefeng (Kevin); weiyongjun (A) > Subject: Re: Question about ip_defrag > > liujian (CE)wrote: > > [ trimming cc list ] > > > Now, I have not the real environment. > > I use iperf generate fragment packets; and I always change NIC rx > > irq's affinity cpu, to make sure frag_mem_limit reach to thresh. > > my test machine, CPU num is 384. > > Oh well, that explains it. > > > > > + if (frag_mem_limit(nf) > nf->low_thresh) { > > > > inet_frag_schedule_worker(f); > > > > + update_frag_mem_limit(nf, SKB_TRUESIZE(1500) * 16); > > > > + } > > You need to reduce this to a lower value. > Your cpu count * batch_value needs to be less than low_thresh to avoid > problems. > > Wtih 384 cpus its close to 12 mbyte... > > Perhaps do this: > > update_frag_mem_limit(nf, 2 * 1024*1024 / NR_CPUS); > > > However, I think its better to revert the percpu counter change and move back > to a single atomic_t count. Ok. Florian and Jesper, many thanks for this issue.
Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
On 2017年08月30日 03:35, Willem de Bruijn wrote: On Fri, Aug 25, 2017 at 9:03 PM, Willem de Bruijnwrote: On Fri, Aug 25, 2017 at 7:32 PM, Michael S. Tsirkin wrote: On Fri, Aug 25, 2017 at 06:44:36PM -0400, Willem de Bruijn wrote: We don't enable network watchdog on virtio but we could and maybe should. Can you elaborate? The issue is that holding onto buffers for very long times makes guests think they are stuck. This is funamentally because from guest point of view this is a NIC, so it is supposed to transmit things out in a timely manner. If host backs the virtual NIC by something that is not a NIC, with traffic shaping etc introducing unbounded latencies, guest will be confused. That assumes that guests are fragile in this regard. A linux guest does not make such assumptions. Yes it does. Examples above: > > - a single slow flow can occupy the whole ring, you will not > > be able to make any new buffers available for the fast flow Oh, right. Though those are due to vring_desc pool exhaustion rather than an upper bound on latency of any single packet. Limiting the number of zerocopy packets in flight to some fraction of the ring ensures that fast flows can always grab a slot. Running out of ubuf_info slots reverts to copy, so indirectly does this. But I read it correclty the zerocopy pool may be equal to or larger than the descriptor pool. Should we refine the zcopy_used test (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx to also return false if the number of outstanding ubuf_info is greater than, say, vq->num >> 1? We'll need to think about where to put the threshold, but I think it's a good idea. Maybe even a fixed number, e.g. max(vq->num >> 1, X) to limit host resources. In a sense it still means once you run out of slots zcopt gets disabled possibly permanently. Need to experiment with some numbers. I can take a stab with two flows, one delayed in a deep host qdisc queue. See how this change affects the other flow and also how sensitive that is to the chosen threshold value. Incomplete results at this stage, but I do see this correlation between flows. It occurs even while not running out of zerocopy descriptors, which I cannot yet explain. Running two threads in a guest, each with a udp socket, each sending up to 100 datagrams, or until EAGAIN, every msec. Sender A sends 1B datagrams. Sender B sends VHOST_GOODCOPY_LEN, which is enough to trigger zcopy_used in vhost net. A local receive process on the host receives both flows. To avoid a deep copy when looping the packet onto the receive path, changed skb_orphan_frags_rx to always return false (gross hack). The flow with the larger packets is redirected through netem on ifb0: modprobe ifb ip link set dev ifb0 up tc qdisc add dev ifb0 root netem limit $LIMIT rate 1MBit tc qdisc add dev tap0 ingress tc filter add dev tap0 parent : protocol ip \ u32 match ip dport 8000 0x \ action mirred egress redirect dev ifb0 For 10 second run, packet count with various ifb0 queue lengths $LIMIT: no filter rx.A: ~840,000 rx.B: ~840,000 Just to make sure I understand the case here. What did rx.B mean here? I thought all traffic sent by Sender B has been redirected to ifb0? limit 1 rx.A: ~500,000 rx.B: ~3100 ifb0: 3273 sent, 371141 dropped limit 100 rx.A: ~9000 rx.B: ~4200 ifb0: 4630 sent, 1491 dropped limit 1000 rx.A: ~6800 rx.B: ~4200 ifb0: 4651 sent, 0 dropped Sender B is always correctly rate limited to 1 MBps or less. With a short queue, it ends up dropping a lot and sending even less. When a queue builds up for sender B, sender A throughput is strongly correlated with queue length. With queue length 1, it can send almost at unthrottled speed. But even at limit 100 its throughput is on the same order as sender B. What is surprising to me is that this happens even though the number of ubuf_info in use at limit 100 is around 100 at all times. In other words, it does not exhaust the pool. When forcing zcopy_used to be false for all packets, this effect of sender A throughput being correlated with sender B does not happen. no filter rx.A: ~850,000 rx.B: ~850,000 limit 100 rx.A: ~850,000 rx.B: ~4200 ifb0: 4518 sent, 876182 dropped Also relevant is that with zerocopy, the sender processes back off and report the same count as the receiver. Without zerocopy, both senders send at full speed, even if only 4200 packets from flow B arrive at the receiver. This is with the default virtio_net driver, so without napi-tx. What kind of qdisc do you use in guest? I suspect we should use something which could do fair queueing (e.g sfq). It appears that the zerocopy notifications are pausing the guest. Will look at that now. Another factor is the tx interrupt coalescing parameters of ifb0, maybe we should disable it during the test. Thanks By the way, I have
Re: [PATCH net] sch_hhf: fix null pointer dereference on init failure
On Tue, Aug 29, 2017 at 12:02 PM, Nikolay Aleksandrovwrote: > First I did it with the check in the for () conditional, but this is more > visible and explicit. Let me know if you'd like the shorter version. :-) Or, if you want to make the patch size smaller, just check NULL before for(): if (!q->hh_flows) return; for (...) Up to you, I have no strong opinion here, slightly prefer a smaller one for backport.
Re: [PATCH net-next 0/3 v10] Add support for rmnet driver
On 2017-08-29 19:12, David Miller wrote: Sigh, I had to revert. You only allow RMNET to take on the values 'y' and 'n'. You must allow for it to be 'm' and modular as well. Hi David I'll fix this now. Sorry about the cover letter. I'll explain it better in subsequent submission. -- Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH net-next 0/3 v10] Add support for rmnet driver
Sigh, I had to revert. You only allow RMNET to take on the values 'y' and 'n'. You must allow for it to be 'm' and modular as well.
Re: [PATCH net-next 0/3 v10] Add support for rmnet driver
From: Subash Abhinov KasiviswanathanDate: Tue, 29 Aug 2017 18:47:55 -0600 > I have updated the locking scheme as follows - Series applied, but this is not how you write a header posting for a patch set. This posting is where you say at a high level what the patch series is doing, how it is doing it, and why it is doing it that way. You can explain what changes happened, and why, but that belongs in the changelog at the end of this posting. Here you've made an explaination for one change the entire content of the text. You not even saying what rmnet is, why we would want to add it to the kernel, and what it's all about. So now when someone tries to read the merge commit that contains this text, they will have no context about you and me talking about locking and they will thus ask themselves "what is this person talking about here? it's not explaining the patch series at all"
Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters
On 8/28/17 10:11 PM, Alexei Starovoitov wrote: > > Agree on the above, but you're mixing semantics of the new recurse > flag and implementation of it. Ex: we don't have to copy this flag > from prog->attr into cgroup. So this reset or non-reset discussion > only makes sense in the context of your current implementation. > We can implement the logic differently. Like don't copy that flag > at all and at attach time walk parent->parent->parent and see > what programs are attached. All of them should have prog->attr & recurse_bit > set > In such implementation detach from 'b' is a nop from reset/non-reset > point of view. When socket creation in 'c' is invoked the program > 'c' is called first then the code keeps walking parents until root > invoking 'a' along the way. So you are suggesting there is no recursive flag per cgroup? How do you know you need to walk cgroups? How do you know when to stop running programs? > I'm not saying it will be an efficient implementation. The point > is to discuss UAPI independent of implementation. > >> ### >> >> Also, let's agree on this intention. Based on the new ground rule, I >> want to point out this example: >> >> If 'a' gets a program installed with no recurse flag set, ONLY processes >> in 'a' have the 'a' program run. Processes in groups 'b', 'c' and 'd' >> all stop at cgroup 'b' program. > > I'm proposing that such situation should not be allowed to happen. > In a->b->c->d cgroup scenario if override+recurse prog attached to 'b' > then only the same override+recurse can be attached to c, d, a. > So at detach time there can be gaps (like only 'b' and 'd' have > override+recurse progs), but walking up until root from any point > will guarantee that only override+recurse programs are seen. > That seems very limiting to me. Seems like you are suggesting the entire cgroup tree is recursive or non-recursive, but never a mix.
Re: [PATCH v2 net-next 3/6] flow_dissector: Add protocol specific flow dissection offload
From: Tom HerbertDate: Tue, 29 Aug 2017 16:27:08 -0700 > +#define GOTO_BY_RESULT(ret) do { \ > + switch (ret) { \ > + case FLOW_DISSECT_RET_OUT_GOOD: \ > + goto out_good; \ > + case FLOW_DISSECT_RET_PROTO_AGAIN: \ > + goto proto_again; \ > + case FLOW_DISSECT_RET_IPPROTO_AGAIN:\ > + goto ip_proto_again;\ > + case FLOW_DISSECT_RET_OUT_BAD: \ > + default:\ > + goto out_bad; \ > + } \ > +} while (0) > + > +#define GOTO_OR_CONT_BY_RESULT(ret) do { \ > + enum flow_dissect_ret __ret = (ret);\ > + \ > + if (__ret != FLOW_DISSECT_RET_CONTINUE) \ > + GOTO_BY_RESULT(__ret); \ > +} while (0) Please don't hide major control flow changes inside of a macro. This means returns and gotos. It makes code impossible to audit. Yes, this applies even if the macro has the word "GOTO" in it :-)
Re: [PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions
From: Tom HerbertDate: Tue, 29 Aug 2017 16:27:07 -0700 > For UDP socket lookup functions, and associateed functions that take an > skbuf as argument, declare the skb argument as constant. > > One caveat is that reuseport_select_sock can be called from the UDP > lookup functions with an skb argument. This function temporarily > modifies the skbuff data pointer (in bpf_run via a pull/push sequence). > To resolve compiler warning I added a local skbuf declaration that is > not const and assigned to the skb argument with an explicit cast. > > Signed-off-by: Tom Herbert Please don't do this. If reuseport_select_sock() modifies anything in the SKB, especially skb->data, it infects the entire call chain. So you can't mark it const in this family of calls.
[PATCH net-next 3/3 v10] drivers: net: ethernet: qualcomm: rmnet: Initial implementation
RmNet driver provides a transport agnostic MAP (multiplexing and aggregation protocol) support in embedded module. Module provides virtual network devices which can be attached to any IP-mode physical device. This will be used to provide all MAP functionality on future hardware in a single consistent location. Signed-off-by: Subash Abhinov Kasiviswanathan--- Documentation/networking/rmnet.txt | 82 drivers/net/ethernet/qualcomm/Kconfig | 2 + drivers/net/ethernet/qualcomm/Makefile | 2 + drivers/net/ethernet/qualcomm/rmnet/Kconfig| 12 + drivers/net/ethernet/qualcomm/rmnet/Makefile | 10 + drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 419 + drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h | 56 +++ .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 271 + .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h | 26 ++ drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h| 88 + .../ethernet/qualcomm/rmnet/rmnet_map_command.c| 107 ++ .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 105 ++ .../net/ethernet/qualcomm/rmnet/rmnet_private.h| 45 +++ drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 170 + drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h| 29 ++ 15 files changed, 1424 insertions(+) create mode 100644 Documentation/networking/rmnet.txt create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Kconfig create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Makefile create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h diff --git a/Documentation/networking/rmnet.txt b/Documentation/networking/rmnet.txt new file mode 100644 index 000..6b341ea --- /dev/null +++ b/Documentation/networking/rmnet.txt @@ -0,0 +1,82 @@ +1. Introduction + +rmnet driver is used for supporting the Multiplexing and aggregation +Protocol (MAP). This protocol is used by all recent chipsets using Qualcomm +Technologies, Inc. modems. + +This driver can be used to register onto any physical network device in +IP mode. Physical transports include USB, HSIC, PCIe and IP accelerator. + +Multiplexing allows for creation of logical netdevices (rmnet devices) to +handle multiple private data networks (PDN) like a default internet, tethering, +multimedia messaging service (MMS) or IP media subsystem (IMS). Hardware sends +packets with MAP headers to rmnet. Based on the multiplexer id, rmnet +routes to the appropriate PDN after removing the MAP header. + +Aggregation is required to achieve high data rates. This involves hardware +sending aggregated bunch of MAP frames. rmnet driver will de-aggregate +these MAP frames and send them to appropriate PDN's. + +2. Packet format + +a. MAP packet (data / control) + +MAP header has the same endianness of the IP packet. + +Packet format - + +Bit 0 1 2-7 8 - 15 16 - 31 +Function Command / Data Reserved Pad Multiplexer IDPayload length +Bit32 - x +Function Raw Bytes + +Command (1)/ Data (0) bit value is to indicate if the packet is a MAP command +or data packet. Control packet is used for transport level flow control. Data +packets are standard IP packets. + +Reserved bits are usually zeroed out and to be ignored by receiver. + +Padding is number of bytes to be added for 4 byte alignment if required by +hardware. + +Multiplexer ID is to indicate the PDN on which data has to be sent. + +Payload length includes the padding length but does not include MAP header +length. + +b. MAP packet (command specific) + +Bit 0 1 2-7 8 - 15 16 - 31 +Function Command Reserved Pad Multiplexer IDPayload length +Bit 32 - 3940 - 4546 - 47 48 - 63 +Function Command nameReserved Command Type Reserved +Bit 64 - 95 +Function Transaction ID +Bit 96 - 127 +Function Command data + +Command 1 indicates disabling flow while 2 is enabling flow + +Command types - +0 for MAP command request +1 is to acknowledge the receipt of a command +2 is for unsupported commands +3 is for error during processing of commands + +c. Aggregation + +Aggregation is multiple MAP packets (can be data or command) delivered
[PATCH net-next 0/3 v10] Add support for rmnet driver
Hi David I have updated the locking scheme as follows - The shared resource which needs to be protected is realdev->rx_handler_data. For the writer path, this is using rtnl_lock(). The writer paths are rmnet_newlink(), rmnet_dellink() and rmnet_force_unassociate_device(). These paths are already called with rtnl_lock() acquired in. There is also an ASSERT_RTNL() to ensure that we are calling with rtnl acquired. For dereference here, we will need to use rtnl_dereference(). Dev list writing needs to happen with rtnl_lock() acquired for netdev_master_upper_dev_link(). For the reader path, the real_dev->rx_handler_data is called in the TX / RX path. We only need rcu_read_lock() for these scenarios. In these cases, the rcu_read_lock() is held in __dev_queue_xmit() and netif_receive_skb_internal(), so readers need to use rcu_dereference_rtnl() to get the relevant information. For dev list reading, we again acquire rcu_read_lock() in rmnet_dellink() for netdev_master_upper_dev_get_rcu(). We also use unregister_netdevice_many() to free all rmnet devices in rmnet_force_unassociate_device() so we dont lose the rtnl_lock() and free in same context. I have also added this as a comment in rmnet_config.c. -- v1: Same as the RFC patch with some minor fixes for issues reported by kbuild test robot. v1->v2: Change datatypes and remove config IOCTL as mentioned by David. Also fix checkpatch issues and remove some unused code. v2->v3: Move location to drivers/net and rename to rmnet. Change the userspace - netlink communication from custom netlink to rtnl_link_ops. Refactor some code. Use a fixed config for ingress and egress. v3->v4: Move location to drivers/net/ethernet/qualcomm/. Fix comments from Stephen and Jiri - Split the ether and arp type changes into seperate patches. Remove debug and custom logging and switch to standard netdevice log. Remove module parameters. Refactor and change some code style issues. v4->v5: Rename some structs and variables. Move the initializer before the for loop start. Put the arp type in correct sequence. v5->v6: Fix comments from Dan - Use the upper link API. As a result, remove all the refcounting logic. Device refcount is explicitly held on real_dev on rx_handler registration only. Modifiy the flow control struct. Remove the unused ethernet mode handling. v6->v7: Fix comments from David - Add newline to end of Makefile. Remove inline from .c files. Move the module init/exit to rmnet config. Fix an error reported by kbuild test robot for an unused file. v7->v8: Use a smaller value for ETH_P_MAP as mentioned by David. Change netdev_info to netdev_dbg as mentioned by Andew. Fix comments from Stephen regarding netdev_priv and sparse related errors of using 0 as NULL v8->v9: Fix comments from David - Remove the CFLAG rule. Change the way rmnet devices are freed. Instead of using a workqueue to unregister devices individually, go through the list and free all devices within the rtnl_lock(). v9->v10: Fix the locking scheme as mentioned by David. Change comment near MAP type definition as mentioned by Dan. Refactor some code. Subash Abhinov Kasiviswanathan (3): net: ether: Add support for multiplexing and aggregation type net: arp: Add support for raw IP device drivers: net: ethernet: qualcomm: rmnet: Initial implementation Documentation/networking/rmnet.txt | 82 drivers/net/ethernet/qualcomm/Kconfig | 2 + drivers/net/ethernet/qualcomm/Makefile | 2 + drivers/net/ethernet/qualcomm/rmnet/Kconfig| 12 + drivers/net/ethernet/qualcomm/rmnet/Makefile | 10 + drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 419 + drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h | 56 +++ .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 271 + .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h | 26 ++ drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h| 88 + .../ethernet/qualcomm/rmnet/rmnet_map_command.c| 107 ++ .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 105 ++ .../net/ethernet/qualcomm/rmnet/rmnet_private.h| 45 +++ drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 170 + drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h| 29 ++ include/uapi/linux/if_arp.h| 1 + include/uapi/linux/if_ether.h | 3 + 17 files changed, 1428 insertions(+) create mode 100644 Documentation/networking/rmnet.txt create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Kconfig create mode 100644 drivers/net/ethernet/qualcomm/rmnet/Makefile create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h create mode 100644 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h create mode
[PATCH net-next 1/3 v10] net: ether: Add support for multiplexing and aggregation type
Define the Qualcomm multiplexing and aggregation (MAP) ether type 0x00F9. This is needed for receiving data in the MAP protocol like RMNET. This is not an officially registered ID. Signed-off-by: Subash Abhinov Kasiviswanathan--- include/uapi/linux/if_ether.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h index 5bc9bfd..30526db 100644 --- a/include/uapi/linux/if_ether.h +++ b/include/uapi/linux/if_ether.h @@ -137,6 +137,9 @@ #define ETH_P_IEEE802154 0x00F6/* IEEE802.15.4 frame */ #define ETH_P_CAIF 0x00F7 /* ST-Ericsson CAIF protocol*/ #define ETH_P_XDSA 0x00F8 /* Multiplexed DSA protocol */ +#define ETH_P_MAP 0x00F9 /* Qualcomm multiplexing and +* aggregation protocol +*/ /* * This is an Ethernet frame header. -- 1.9.1
[PATCH net-next 2/3 v10] net: arp: Add support for raw IP device
Define the raw IP type. This is needed for raw IP net devices like rmnet. Signed-off-by: Subash Abhinov Kasiviswanathan--- include/uapi/linux/if_arp.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/if_arp.h b/include/uapi/linux/if_arp.h index cf73510..a2a6356 100644 --- a/include/uapi/linux/if_arp.h +++ b/include/uapi/linux/if_arp.h @@ -59,6 +59,7 @@ #define ARPHRD_LAPB516 /* LAPB */ #define ARPHRD_DDCMP517/* Digital's DDCMP protocol */ #define ARPHRD_RAWHDLC 518 /* Raw HDLC */ +#define ARPHRD_RAWIP519/* Raw IP */ #define ARPHRD_TUNNEL 768 /* IPIP tunnel */ #define ARPHRD_TUNNEL6 769 /* IP6IP6 tunnel*/ -- 1.9.1
Re: Fwd: DA850-evm MAC Address is random
t; v4.13 itself. >>>>> >>>>> I will test it. Do need to run some instruction or do something >>>>> special in U-Boot to pass this in the proper place for the kernel to >>>>> pull it? Tony's patch reference showed >>>>> command for fdt set, but I am not sure I fully understand the >>>>> parameters that went along with that. >>>> >>>> Nope, just applying the patch and booting the with the new dtb should >>>> result in the random mac address going away. >>> >>> Unfortunately, I am not seeing any change with the patch (at least >>> with Kernel 4.12.9 from stable). >>> >>> netconsole: network logging started >>> davinci_emac davinci_emac.1: incompatible machine/device type for >>> reading mac address >>> davinci_emac davinci_emac.1: using random MAC addr: ee:74:c3:3a:15:be >>> >>> Looking at the source for cpsw-common.c function ti_cm_get_macid() >>> doesn't have a case for the ti,davinci-dm6467-emac so I wonder if >>> there might be more to it. >> >> Hmm, it did solve the issue for me when I tried latest -next. And >> reverting the patch brought back the random mac address usage. Could you >> try latest mainline or -next? >> >> Meanwhile let me see whats going on with the observations you have. > > I will try again with -next this afternoon and see what I can find. > Can you tell me which U-Boot version you're using? I want to match > your setup. I want to see if something is missing during the hand-off > between the Bootloader and Linux. > I wonder if U-Boot isn't pushing something to Linux because it doesn't appear to be running some of the da850 specific code even when I run linux-next. Can you tell me what verision of U-Boot you're using? Other than using davinci_all_defconfig, did you change the configuration at all? [1.411107] netconsole: network logging started [1.416237] davinci_emac davinci_emac.1: incompatible machine/device type for reading mac address [1.424496] davinci_emac davinci_emac.1: using random MAC addr: be:e2:84:ed:3 c:87 I also confirmed the SPI-flash has the MAC programmed: # hexdump /dev/mtd5ro 000 0800 04ee 8e32 010 Here is my full log: [0.00] Booting Linux on physical CPU 0x0 [0.00] Linux version 4.13.0-rc7-next-20170829 (aford@ubuntu16) (gcc vers ion 4.8.3 20140320 (prerelease) (Sourcery CodeBench Lite 2014.05-29)) #1 PREEMPT Tue Aug 29 19:11:06 CDT 2017 [0.00] CPU: ARM926EJ-S [41069265] revision 5 (ARMv5TEJ), cr=0005317f [0.00] CPU: VIVT data cache, VIVT instruction cache [0.00] OF: fdt: Machine model: DA850/AM1808/OMAP-L138 EVM [0.00] Memory policy: Data cache writethrough [0.00] DaVinci da850/omap-l138 variant 0x0 [0.00] On node 0 totalpages: 8192 [0.00] free_area_init_node: node 0, pgdat c05dfec8, node_mem_map c1fb900 0 [0.00] DMA zone: 64 pages used for memmap [0.00] DMA zone: 0 pages reserved [0.00] DMA zone: 8192 pages, LIFO batch:0 [0.00] random: fast init done [0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768 [0.00] pcpu-alloc: [0] 0 [0.00] Built 1 zonelists, mobility grouping on. Total pages: 8128 [0.00] Kernel command line: mem=32M console=ttyS2,115200n8 root=/dev/mmc blk0p2 rw wait noinitrd [0.00] PID hash table entries: 128 (order: -3, 512 bytes) [0.00] Dentry cache hash table entries: 4096 (order: 2, 16384 bytes) [0.00] Inode-cache hash table entries: 2048 (order: 1, 8192 bytes) [0.00] Memory: 26196K/32768K available (4412K kernel code, 330K rwdata, 1012K rodata, 224K init, 146K bss, 6572K reserved, 0K cma-reserved) [0.00] Virtual kernel memory layout: vector : 0x - 0x1000 ( 4 kB) fixmap : 0xffc0 - 0xfff0 (3072 kB) vmalloc : 0xc280 - 0xff80 ( 976 MB) lowmem : 0xc000 - 0xc200 ( 32 MB) modules : 0xbf00 - 0xc000 ( 16 MB) .text : 0xc0008000 - 0xc04574e8 (4414 kB) .init : 0xc0556000 - 0xc058e000 ( 224 kB) .data : 0xc058e000 - 0xc05e09e0 ( 331 kB) .bss : 0xc05e4ebc - 0xc06097b4 ( 147 kB)\x00c - 0xc06097 b4 ( 147 kB) [0.00] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 [0.00] Preemptible hierarchical RCU implementation. [0.00] Tasks RCU enabled.\x00d. [0.00] NR_IRQS: 245 [0.00] clocksource: timer0_1: mask: 0x max_cycles: 0x, m ax_idle_ns: 79635851949 ns [0.00] sched_clock: 32 bits at 24MHz, resolution 4
Re: [RESEND PATCH] Allow passing tid or pid in SCM_CREDENTIALS without CAP_SYS_ADMIN
"prakash.sangappa"writes: > On 08/29/2017 04:02 PM, David Miller wrote: >> From: Prakash Sangappa >> Date: Mon, 28 Aug 2017 17:12:20 -0700 >> >>> Currently passing tid(gettid(2)) of a thread in struct ucred in >>> SCM_CREDENTIALS message requires CAP_SYS_ADMIN capability otherwise >>> it fails with EPERM error. Some applications deal with thread id >>> of a thread(tid) and so it would help to allow tid in SCM_CREDENTIALS >>> message. Basically, either tgid(pid of the process) or the tid of >>> the thread should be allowed without the need for CAP_SYS_ADMIN capability. >>> >>> SCM_CREDENTIALS will be used to determine the global id of a process or >>> a thread running inside a pid namespace. >>> >>> This patch adds necessary check to accept tid in SCM_CREDENTIALS >>> struct ucred. >>> >>> Signed-off-by: Prakash Sangappa >> I'm pretty sure that by the descriptions in previous changes to this >> function, what you are proposing is basically a minor form of PID >> spoofing which we only want someone with CAP_SYS_ADMIN over the >> PID namespace to be able to do. > > The fix is to allow passing tid of the calling thread itself not of any > other thread or process. Curious why would this be considered > as pid spoofing? > > This change would enable a thread in a multi threaded process, running > inside a pid namespace to be identified by the recipient of the > message easily. I think a more practical problem is that change, changes what is being passed in the SCM_CREDENTIALS from a pid of a process to a tid of a thread. That could be confusing and that confusion could be exploited. It is definitely confusing because in some instances a value can be both a tgid and a tid. I definitely think this needs to be talked about in terms of changing what is passed in that field and what the consequences could be. I suspect you are ok. As nothing allows passing a tid today. But I don't see any analysis on why passing a tid instead of a tgid will not confuse the receiving application, and in such confusion introduce a security hole. Eric
Re: [RESEND PATCH] Allow passing tid or pid in SCM_CREDENTIALS without CAP_SYS_ADMIN
On 08/29/2017 04:02 PM, David Miller wrote: From: Prakash SangappaDate: Mon, 28 Aug 2017 17:12:20 -0700 Currently passing tid(gettid(2)) of a thread in struct ucred in SCM_CREDENTIALS message requires CAP_SYS_ADMIN capability otherwise it fails with EPERM error. Some applications deal with thread id of a thread(tid) and so it would help to allow tid in SCM_CREDENTIALS message. Basically, either tgid(pid of the process) or the tid of the thread should be allowed without the need for CAP_SYS_ADMIN capability. SCM_CREDENTIALS will be used to determine the global id of a process or a thread running inside a pid namespace. This patch adds necessary check to accept tid in SCM_CREDENTIALS struct ucred. Signed-off-by: Prakash Sangappa I'm pretty sure that by the descriptions in previous changes to this function, what you are proposing is basically a minor form of PID spoofing which we only want someone with CAP_SYS_ADMIN over the PID namespace to be able to do. The fix is to allow passing tid of the calling thread itself not of any other thread or process. Curious why would this be considered as pid spoofing? This change would enable a thread in a multi threaded process, running inside a pid namespace to be identified by the recipient of the message easily. Sorry, I'm not applying this.
[PATCH v2 net-next 2/6] udp: Constify skb argument in lookup functions
For UDP socket lookup functions, and associateed functions that take an skbuf as argument, declare the skb argument as constant. One caveat is that reuseport_select_sock can be called from the UDP lookup functions with an skb argument. This function temporarily modifies the skbuff data pointer (in bpf_run via a pull/push sequence). To resolve compiler warning I added a local skbuf declaration that is not const and assigned to the skb argument with an explicit cast. Signed-off-by: Tom Herbert--- include/net/ip.h | 2 +- include/net/sock_reuseport.h | 2 +- include/net/udp.h| 11 ++- net/core/sock_reuseport.c| 5 +++-- net/ipv4/udp.c | 11 ++- net/ipv6/udp.c | 10 +- 6 files changed, 22 insertions(+), 19 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 9896f46cbbf1..8c0d84ffc659 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -79,7 +79,7 @@ struct ipcm_cookie { #define PKTINFO_SKB_CB(skb) ((struct in_pktinfo *)((skb)->cb)) /* return enslaved device index if relevant */ -static inline int inet_sdif(struct sk_buff *skb) +static inline int inet_sdif(const struct sk_buff *skb) { #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV) if (skb && ipv4_l3mdev_skb(IPCB(skb)->flags)) diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h index aecd30308d50..d25352a848d9 100644 --- a/include/net/sock_reuseport.h +++ b/include/net/sock_reuseport.h @@ -20,7 +20,7 @@ extern int reuseport_add_sock(struct sock *sk, struct sock *sk2); extern void reuseport_detach_sock(struct sock *sk); extern struct sock *reuseport_select_sock(struct sock *sk, u32 hash, - struct sk_buff *skb, + const struct sk_buff *skb, int hdr_len); extern struct bpf_prog *reuseport_attach_prog(struct sock *sk, struct bpf_prog *prog); diff --git a/include/net/udp.h b/include/net/udp.h index 4e5f23fec35e..f3d1de6f0983 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -167,7 +167,7 @@ static inline void udp_csum_pull_header(struct sk_buff *skb) UDP_SKB_CB(skb)->cscov -= sizeof(struct udphdr); } -typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport, +typedef struct sock *(*udp_lookup_t)(const struct sk_buff *skb, __be16 sport, __be16 dport); struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb, @@ -288,8 +288,9 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport, __be32 daddr, __be16 dport, int dif); struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport, __be32 daddr, __be16 dport, int dif, int sdif, - struct udp_table *tbl, struct sk_buff *skb); -struct sock *udp4_lib_lookup_skb(struct sk_buff *skb, + struct udp_table *tbl, + const struct sk_buff *skb); +struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); struct sock *udp6_lib_lookup(struct net *net, const struct in6_addr *saddr, __be16 sport, @@ -299,8 +300,8 @@ struct sock *__udp6_lib_lookup(struct net *net, const struct in6_addr *saddr, __be16 sport, const struct in6_addr *daddr, __be16 dport, int dif, int sdif, struct udp_table *tbl, - struct sk_buff *skb); -struct sock *udp6_lib_lookup_skb(struct sk_buff *skb, + const struct sk_buff *skb); +struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport); /* UDP uses skb->dev_scratch to cache as much information as possible and avoid diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c index eed1ebf7f29d..a17f13b33189 100644 --- a/net/core/sock_reuseport.c +++ b/net/core/sock_reuseport.c @@ -164,9 +164,10 @@ void reuseport_detach_sock(struct sock *sk) EXPORT_SYMBOL(reuseport_detach_sock); static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks, - struct bpf_prog *prog, struct sk_buff *skb, + struct bpf_prog *prog, const struct sk_buff *_skb, int hdr_len) { + struct sk_buff *skb = (struct sk_buff *)_skb; /* Override const */ struct sk_buff *nskb = NULL; u32 index; @@ -205,7 +206,7 @@ static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks, */ struct sock *reuseport_select_sock(struct sock *sk, u32 hash, -
[PATCH v2 net-next 4/6] udp: flow dissector offload
Add support to perform UDP specific flow dissection. This is primarily intended for dissecting encapsulated packets in UDP encapsulation. This patch adds a flow_dissect offload for UDP4 and UDP6. The backend function performs a socket lookup and calls the flow_dissect function if a socket is found. Signed-off-by: Tom Herbert--- include/linux/udp.h | 8 include/net/udp.h| 8 include/net/udp_tunnel.h | 8 net/ipv4/udp_offload.c | 45 + net/ipv4/udp_tunnel.c| 1 + net/ipv6/udp_offload.c | 13 + 6 files changed, 83 insertions(+) diff --git a/include/linux/udp.h b/include/linux/udp.h index eaea63bc79bb..2e90b189ef6a 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -79,6 +79,14 @@ struct udp_sock { int (*gro_complete)(struct sock *sk, struct sk_buff *skb, int nhoff); + /* Flow dissector function for a UDP socket */ + enum flow_dissect_ret (*flow_dissect)(struct sock *sk, + const struct sk_buff *skb, + struct flow_dissector_key_control *key_control, + struct flow_dissector *flow_dissector, + void *target_container, void *data, + __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff, + int *p_hlen, unsigned int flags); /* udp_recvmsg try to use this before splicing sk_receive_queue */ struct sk_buff_head reader_queue cacheline_aligned_in_smp; diff --git a/include/net/udp.h b/include/net/udp.h index f3d1de6f0983..499e4faf8b14 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -174,6 +174,14 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb, struct udphdr *uh, udp_lookup_t lookup); int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup); +enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb, + udp_lookup_t lookup, + struct flow_dissector_key_control *key_control, + struct flow_dissector *flow_dissector, + void *target_container, void *data, + __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff, + int *p_hlen, unsigned int flags); + static inline struct udphdr *udp_gro_udphdr(struct sk_buff *skb) { struct udphdr *uh; diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h index 10cce0dd4450..b7102e0f41a9 100644 --- a/include/net/udp_tunnel.h +++ b/include/net/udp_tunnel.h @@ -69,6 +69,13 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk, struct sk_buff *skb); typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb, int nhoff); +typedef enum flow_dissect_ret (*udp_tunnel_flow_dissect_t)(struct sock *sk, + const struct sk_buff *skb, + struct flow_dissector_key_control *key_control, + struct flow_dissector *flow_dissector, + void *target_container, void *data, + __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff, + int *p_hlen, unsigned int flags); struct udp_tunnel_sock_cfg { void *sk_user_data; /* user data used by encap_rcv call back */ @@ -78,6 +85,7 @@ struct udp_tunnel_sock_cfg { udp_tunnel_encap_destroy_t encap_destroy; udp_tunnel_gro_receive_t gro_receive; udp_tunnel_gro_complete_t gro_complete; + udp_tunnel_flow_dissect_t flow_dissect; }; /* Setup the given (UDP) sock to receive UDP encapsulated packets */ diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 97658bfc1b58..7f0a7ed4a6f7 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -328,11 +328,56 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff) return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb); } +enum flow_dissect_ret udp_flow_dissect(const struct sk_buff *skb, + udp_lookup_t lookup, + struct flow_dissector_key_control *key_control, + struct flow_dissector *flow_dissector, + void *target_container, void *data, + __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff, + int *p_hlen, unsigned int flags) +{ + enum flow_dissect_ret ret = FLOW_DISSECT_RET_CONTINUE; + struct udphdr *uh, _uh; + struct sock *sk; + + uh = __skb_header_pointer(skb, *p_nhoff, sizeof(_uh), data, + *p_hlen, &_uh); + if (!uh) + return
[PATCH v2 net-next 6/6] vxlan: support flow dissect
Populate offload flow_dissect callback appropriately for VXLAN and VXLAN-GPE. Signed-off-by: Tom Herbert--- drivers/net/vxlan.c | 50 ++ 1 file changed, 50 insertions(+) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index ae3a1da703c2..41e50de40af4 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1336,6 +1336,55 @@ static bool vxlan_ecn_decapsulate(struct vxlan_sock *vs, void *oiph, return err <= 1; } +static enum flow_dissect_ret vxlan_flow_dissect(struct sock *sk, + const struct sk_buff *skb, + struct flow_dissector_key_control *key_control, + struct flow_dissector *flow_dissector, + void *target_container, void *data, + __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff, + int *p_hlen, unsigned int flags) +{ + __be16 protocol = htons(ETH_P_TEB); + struct vxlanhdr *vhdr, _vhdr; + struct vxlan_sock *vs; + + vhdr = __skb_header_pointer(skb, *p_nhoff + sizeof(struct udphdr), + sizeof(_vhdr), data, *p_hlen, &_vhdr); + if (!vhdr) + return FLOW_DISSECT_RET_OUT_BAD; + + vs = rcu_dereference_sk_user_data(sk); + if (!vs) + return FLOW_DISSECT_RET_OUT_BAD; + + if (vs->flags & VXLAN_F_GPE) { + struct vxlanhdr_gpe *gpe = (struct vxlanhdr_gpe *)vhdr; + + /* Need to have Next Protocol set for interfaces in GPE mode. */ + if (gpe->version != 0 || !gpe->np_applied || gpe->oam_flag) + return FLOW_DISSECT_RET_CONTINUE; + + switch (gpe->next_protocol) { + case VXLAN_GPE_NP_IPV4: + protocol = htons(ETH_P_IP); + break; + case VXLAN_GPE_NP_IPV6: + protocol = htons(ETH_P_IPV6); + break; + case VXLAN_GPE_NP_ETHERNET: + protocol = htons(ETH_P_TEB); + break; + default: + return FLOW_DISSECT_RET_CONTINUE; + } + } + + *p_nhoff += sizeof(struct udphdr) + sizeof(_vhdr); + *p_proto = protocol; + + return FLOW_DISSECT_RET_PROTO_AGAIN; +} + /* Callback from net/ipv4/udp.c to receive packets */ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb) { @@ -2864,6 +2913,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, bool ipv6, tunnel_cfg.encap_destroy = NULL; tunnel_cfg.gro_receive = vxlan_gro_receive; tunnel_cfg.gro_complete = vxlan_gro_complete; + tunnel_cfg.flow_dissect = vxlan_flow_dissect; setup_udp_tunnel_sock(net, sock, _cfg); -- 2.11.0
[PATCH v2 net-next 5/6] fou: Support flow dissection
Populate offload flow_dissect callabck appropriately for fou and gue. Signed-off-by: Tom Herbert--- net/ipv4/fou.c | 63 ++ 1 file changed, 63 insertions(+) diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c index 1540db65241a..a831dd49fb28 100644 --- a/net/ipv4/fou.c +++ b/net/ipv4/fou.c @@ -282,6 +282,20 @@ static int fou_gro_complete(struct sock *sk, struct sk_buff *skb, return err; } +static enum flow_dissect_ret fou_flow_dissect(struct sock *sk, + const struct sk_buff *skb, + struct flow_dissector_key_control *key_control, + struct flow_dissector *flow_dissector, + void *target_container, void *data, + __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff, + int *p_hlen, unsigned int flags) +{ + *p_ip_proto = fou_from_sock(sk)->protocol; + *p_nhoff += sizeof(struct udphdr); + + return FLOW_DISSECT_RET_IPPROTO_AGAIN; +} + static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off, struct guehdr *guehdr, void *data, size_t hdrlen, struct gro_remcsum *grc, @@ -500,6 +514,53 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff) return err; } +static enum flow_dissect_ret gue_flow_dissect(struct sock *sk, + const struct sk_buff *skb, + struct flow_dissector_key_control *key_control, + struct flow_dissector *flow_dissector, + void *target_container, void *data, + __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff, + int *p_hlen, unsigned int flags) +{ + struct guehdr *guehdr, _guehdr; + + guehdr = __skb_header_pointer(skb, *p_nhoff + sizeof(struct udphdr), + sizeof(_guehdr), data, *p_hlen, &_guehdr); + if (!guehdr) + return FLOW_DISSECT_RET_OUT_BAD; + + switch (guehdr->version) { + case 0: + if (unlikely(guehdr->control)) + return FLOW_DISSECT_RET_CONTINUE; + + *p_ip_proto = guehdr->proto_ctype; + *p_nhoff += sizeof(struct udphdr) + + sizeof(*guehdr) + (guehdr->hlen << 2); + + break; + case 1: + switch (((struct iphdr *)guehdr)->version) { + case 4: + *p_ip_proto = IPPROTO_IPIP; + break; + case 6: + *p_ip_proto = IPPROTO_IPV6; + break; + default: + return FLOW_DISSECT_RET_CONTINUE; + } + + *p_nhoff += sizeof(struct udphdr); + + break; + default: + return FLOW_DISSECT_RET_CONTINUE; + } + + return FLOW_DISSECT_RET_IPPROTO_AGAIN; +} + static int fou_add_to_port_list(struct net *net, struct fou *fou) { struct fou_net *fn = net_generic(net, fou_net_id); @@ -570,12 +631,14 @@ static int fou_create(struct net *net, struct fou_cfg *cfg, tunnel_cfg.encap_rcv = fou_udp_recv; tunnel_cfg.gro_receive = fou_gro_receive; tunnel_cfg.gro_complete = fou_gro_complete; + tunnel_cfg.flow_dissect = fou_flow_dissect; fou->protocol = cfg->protocol; break; case FOU_ENCAP_GUE: tunnel_cfg.encap_rcv = gue_udp_recv; tunnel_cfg.gro_receive = gue_gro_receive; tunnel_cfg.gro_complete = gue_gro_complete; + tunnel_cfg.flow_dissect = gue_flow_dissect; break; default: err = -EINVAL; -- 2.11.0
[PATCH v2 net-next 3/6] flow_dissector: Add protocol specific flow dissection offload
Add offload capability for performing protocol specific flow dissection (either by EtherType or IP protocol). Specifically: - Add flow_dissect to offload callbacks - Move flow_dissect_ret enum to flow_dissector.h, cleanup names and add a couple of values - Create GOTO_BY_RESULT macro to use in the main flow dissector switch to simplify handling of functions that return flow_dissect_ret enum - In __skb_flow_dissect, add default case for switch(proto) as well as switch(ip_proto) that looks up and calls protocol specific flow dissection Signed-off-by: Tom Herbert--- include/linux/netdevice.h| 7 +++ include/net/flow_dissector.h | 9 +++ net/core/dev.c | 14 + net/core/flow_dissector.c| 132 +++ net/ipv4/route.c | 4 +- 5 files changed, 128 insertions(+), 38 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c5475b37a631..90ccb434e127 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2208,6 +2208,12 @@ struct offload_callbacks { struct sk_buff **(*gro_receive)(struct sk_buff **head, struct sk_buff *skb); int (*gro_complete)(struct sk_buff *skb, int nhoff); + enum flow_dissect_ret (*flow_dissect)(const struct sk_buff *skb, + struct flow_dissector_key_control *key_control, + struct flow_dissector *flow_dissector, + void *target_container, void *data, + __be16 *p_proto, u8 *p_ip_proto, int *p_nhoff, + int *p_hlen, unsigned int flags); }; struct packet_offload { @@ -3253,6 +3259,7 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi); gro_result_t napi_gro_frags(struct napi_struct *napi); struct packet_offload *gro_find_receive_by_type(__be16 type); struct packet_offload *gro_find_complete_by_type(__be16 type); +struct packet_offload *flow_dissect_find_by_type(__be16 type); static inline void napi_free_frags(struct napi_struct *napi) { diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h index e2663e900b0a..ad75bbfd1c9c 100644 --- a/include/net/flow_dissector.h +++ b/include/net/flow_dissector.h @@ -19,6 +19,14 @@ struct flow_dissector_key_control { #define FLOW_DIS_FIRST_FRAGBIT(1) #define FLOW_DIS_ENCAPSULATION BIT(2) +enum flow_dissect_ret { + FLOW_DISSECT_RET_OUT_GOOD, + FLOW_DISSECT_RET_OUT_BAD, + FLOW_DISSECT_RET_PROTO_AGAIN, + FLOW_DISSECT_RET_IPPROTO_AGAIN, + FLOW_DISSECT_RET_CONTINUE, +}; + /** * struct flow_dissector_key_basic: * @thoff: Transport header offset @@ -205,6 +213,7 @@ enum flow_dissector_key_id { #define FLOW_DISSECTOR_F_STOP_AT_L3BIT(1) #define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABELBIT(2) #define FLOW_DISSECTOR_F_STOP_AT_ENCAP BIT(3) +#define FLOW_DISSECTOR_F_STOP_AT_L4BIT(4) struct flow_dissector_key { enum flow_dissector_key_id key_id; diff --git a/net/core/dev.c b/net/core/dev.c index 270b54754821..22ea8daa930c 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4860,6 +4860,20 @@ struct packet_offload *gro_find_receive_by_type(__be16 type) } EXPORT_SYMBOL(gro_find_receive_by_type); +struct packet_offload *flow_dissect_find_by_type(__be16 type) +{ + struct list_head *offload_head = _base; + struct packet_offload *ptype; + + list_for_each_entry_rcu(ptype, offload_head, list) { + if (ptype->type != type || !ptype->callbacks.flow_dissect) + continue; + return ptype; + } + return NULL; +} +EXPORT_SYMBOL(flow_dissect_find_by_type); + struct packet_offload *gro_find_complete_by_type(__be16 type) { struct list_head *offload_head = _base; diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 12302acdb073..6a2cf240069a 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -115,12 +116,6 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto, } EXPORT_SYMBOL(__skb_flow_get_ports); -enum flow_dissect_ret { - FLOW_DISSECT_RET_OUT_GOOD, - FLOW_DISSECT_RET_OUT_BAD, - FLOW_DISSECT_RET_OUT_PROTO_AGAIN, -}; - static enum flow_dissect_ret __skb_flow_dissect_mpls(const struct sk_buff *skb, struct flow_dissector *flow_dissector, @@ -322,7 +317,7 @@ __skb_flow_dissect_gre(const struct sk_buff *skb, if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP) return FLOW_DISSECT_RET_OUT_GOOD; - return FLOW_DISSECT_RET_OUT_PROTO_AGAIN; + return FLOW_DISSECT_RET_PROTO_AGAIN; } static void @@ -383,6 +378,27 @@ __skb_flow_dissect_ipv6(const struct sk_buff *skb, key_ip->ttl =
[PATCH v2 net-next 1/6] flow_dissector: Move ETH_P_TEB processing to main switch
Support for processing TEB is currently in GRE flow dissection as a special case. This can be moved to be a case the main proto switch in __skb_flow_dissect. Signed-off-by: Tom Herbert--- net/core/flow_dissector.c | 44 +++- 1 file changed, 23 insertions(+), 21 deletions(-) diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index e2eaa1ff948d..12302acdb073 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -288,27 +288,8 @@ __skb_flow_dissect_gre(const struct sk_buff *skb, if (hdr->flags & GRE_SEQ) offset += sizeof(((struct pptp_gre_header *) 0)->seq); - if (gre_ver == 0) { - if (*p_proto == htons(ETH_P_TEB)) { - const struct ethhdr *eth; - struct ethhdr _eth; - - eth = __skb_header_pointer(skb, *p_nhoff + offset, - sizeof(_eth), - data, *p_hlen, &_eth); - if (!eth) - return FLOW_DISSECT_RET_OUT_BAD; - *p_proto = eth->h_proto; - offset += sizeof(*eth); - - /* Cap headers that we access via pointers at the -* end of the Ethernet header as our maximum alignment -* at that point is only 2 bytes. -*/ - if (NET_IP_ALIGN) - *p_hlen = *p_nhoff + offset; - } - } else { /* version 1, must be PPTP */ + /* version 1, must be PPTP */ + if (gre_ver == 1) { u8 _ppp_hdr[PPP_HDRLEN]; u8 *ppp_hdr; @@ -573,6 +554,27 @@ bool __skb_flow_dissect(const struct sk_buff *skb, break; } + case htons(ETH_P_TEB): { + const struct ethhdr *eth; + struct ethhdr _eth; + + eth = __skb_header_pointer(skb, nhoff, sizeof(_eth), + data, hlen, &_eth); + if (!eth) + goto out_bad; + + proto = eth->h_proto; + nhoff += sizeof(*eth); + + /* Cap headers that we access via pointers at the +* end of the Ethernet header as our maximum alignment +* at that point is only 2 bytes. +*/ + if (NET_IP_ALIGN) + hlen = nhoff; + + goto proto_again; + } case htons(ETH_P_8021AD): case htons(ETH_P_8021Q): { const struct vlan_hdr *vlan; -- 2.11.0
[PATCH v2 net-next 0/6] flow_dissector: Protocol specific flow dissector offload
This patch set adds a new offload type to perform flow dissection for specific protocols (either by EtherType or by IP protocol). This is primary useful to crack open UDP encapsulations (like VXLAN, GUE) for the purposes of parsing the encapsulated packet. Items in this patch set: - Constify skb argument to UDP lookup functions - Create new protocol case in __skb_dissect for ETH_P_TEB. This is based on the code in the GRE dissect function and the special handling in GRE can now be removed (it sets protocol to ETH_P_TEB and returns so goto proto_again is done) - Add infrastructure for protocol specific flow dissection offload - Add infrastructure to perform UDP flow dissection. Uses same model of GRO where a flow_dissect callback can be associated with a UDP socket - Use the infrastructure to support flow dissection of VXLAN and GUE Tested: Forced RPS to call flow dissection for VXLAN, FOU, and GUE. Observed that inner packet was being properly dissected. v2: Add signed off Tom Herbert (6): flow_dissector: Move ETH_P_TEB processing to main switch udp: Constify skb argument in lookup functions flow_dissector: Add protocol specific flow dissection offload udp: flow dissector offload fou: Support flow dissection vxlan: support flow dissect drivers/net/vxlan.c | 50 include/linux/netdevice.h| 7 ++ include/linux/udp.h | 8 ++ include/net/flow_dissector.h | 9 +++ include/net/ip.h | 2 +- include/net/sock_reuseport.h | 2 +- include/net/udp.h| 19 +++-- include/net/udp_tunnel.h | 8 ++ net/core/dev.c | 14 net/core/flow_dissector.c| 176 +-- net/core/sock_reuseport.c| 5 +- net/ipv4/fou.c | 63 net/ipv4/route.c | 4 +- net/ipv4/udp.c | 11 +-- net/ipv4/udp_offload.c | 45 +++ net/ipv4/udp_tunnel.c| 1 + net/ipv6/udp.c | 10 +-- net/ipv6/udp_offload.c | 13 18 files changed, 369 insertions(+), 78 deletions(-) -- 2.11.0
Re: [PATCH net-next] neigh: increase queue_len_bytes to match wmem_default
On Tue, 2017-08-29 at 15:16 -0700, Eric Dumazet wrote: > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index > 568ccfd6dd371d88136ffabe5cfcc36f099786b6..7616cd76f6f6a62f395da897baef2c66c0098193 > 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -6086,9 +6086,9 @@ int tcp_conn_request(struct request_sock_ops > *rsk_ops, > struct tcp_sock *tp = tcp_sk(sk); > struct net *net = sock_net(sk); > struct sock *fastopen_sk = NULL; > - struct dst_entry *dst = NULL; > struct request_sock *req; > bool want_cookie = false; > + struct dst_entry *dst; > struct flowi fl; > > /* TW buckets are converted to open req This part was meant to belong to a separate patch :/ No big deal, this was also one of your suggestion David.
Re: [PATCH net-next] neigh: increase queue_len_bytes to match wmem_default
From: Eric DumazetDate: Tue, 29 Aug 2017 16:15:28 -0700 > On Tue, 2017-08-29 at 15:16 -0700, Eric Dumazet wrote: >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c >> index >> 568ccfd6dd371d88136ffabe5cfcc36f099786b6..7616cd76f6f6a62f395da897baef2c66c0098193 >> 100644 >> --- a/net/ipv4/tcp_input.c >> +++ b/net/ipv4/tcp_input.c >> @@ -6086,9 +6086,9 @@ int tcp_conn_request(struct request_sock_ops >> *rsk_ops, >> struct tcp_sock *tp = tcp_sk(sk); >> struct net *net = sock_net(sk); >> struct sock *fastopen_sk = NULL; >> - struct dst_entry *dst = NULL; >> struct request_sock *req; >> bool want_cookie = false; >> + struct dst_entry *dst; >> struct flowi fl; >> >> /* TW buckets are converted to open req > > This part was meant to belong to a separate patch :/ > > No big deal, this was also one of your suggestion David. Yeah, no big deal. But thanks for pointing it out.
Re: [PATCH] drivers: net: xgene: Correct probe sequence handling
From: Iyappan SubramanianDate: Tue, 29 Aug 2017 15:43:12 -0700 > From: Quan Nguyen > > The phy is connected at early stage of probe but not properly > disconnected if error occurs. This patch fixes the issue. > > Also changing the return type of xgene_enet_check_phy_handle(), > since this function always returns success. > > Signed-off-by: Quan Nguyen > Signed-off-by: Iyappan Subramanian Applied, thank you.
Re: [PATCH net] sch_htb: fix crash on init failure
On 30.08.2017 02:06, David Miller wrote: > > I expect that you will resubmit all of these similar fixes as a patch > series after you have sorted everything out. > > Correct? > Yes, I will. There are a few more places that need fixing, I'll resubmit them all as a set.
Re: [PATCH net-next] neigh: increase queue_len_bytes to match wmem_default
From: Eric DumazetDate: Tue, 29 Aug 2017 15:16:01 -0700 > From: Eric Dumazet > > Florian reported UDP xmit drops that could be root caused to the > too small neigh limit. > > Current limit is 64 KB, meaning that even a single UDP socket would hit > it, since its default sk_sndbuf comes from net.core.wmem_default > (~212992 bytes on 64bit arches). > > Once ARP/ND resolution is in progress, we should allow a little more > packets to be queued, at least for one producer. > > Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf > limit and either block in sendmsg() or return -EAGAIN. > > Signed-off-by: Eric Dumazet > Reported-by: Florian Fainelli Applied, thanks for following up on this.
Re: [PATCH net-next v2] net: bcmgenet: Use correct I/O accessors
From: Florian FainelliDate: Tue, 29 Aug 2017 12:25:31 -0700 > The GENET driver currently uses __raw_{read,write}l which means > native I/O endian. This works correctly for an ARM LE kernel (default) > but fails miserably on an ARM BE (BE8) kernel where registers are kept > little endian, so replace uses with {read,write}l_relaxed here which is > what we want because this is all performance sensitive code. > > Signed-off-by: Florian Fainelli Applied.
Re: [PATCH] net: remove dmaengine.h inclusion from netdevice.h
From: Dave JiangDate: Tue, 29 Aug 2017 13:17:51 -0700 > Since the removal of NET_DMA, dmaengine.h header file shouldn't be needed > by netdevice.h anymore. > > Signed-off-by: Dave Jiang Applied to net-next, but it would have been really great for you to have provided a proper "Fixes: " tag referencing the NET_DMA removal change. Thanks.
Re: [PATCH V2 net-next] liquidio: show NIC's U-Boot version in a dev_info() message
From: Felix ManlunasDate: Tue, 29 Aug 2017 12:19:57 -0700 > From: Weilin Chang > > Signed-off-by: Weilin Chang > Signed-off-by: Felix Manlunas > --- > Patch Change Log: > V1 -> V2: > * Move octeon_get_uboot_version() to a proper place to avoid forward > declaration. > * Remove complicated for-loops that search for substrings; replace them > with calls to strstr(). > * Don't add unnecessary fields to struct octeon_device. Applied.
Re: [PATCH net] nfp: double free on error in probe
From: Dan CarpenterDate: Tue, 29 Aug 2017 22:15:16 +0300 > Both the nfp_net_pf_app_start() and the nfp_net_pci_probe() functions > call nfp_net_pf_app_stop_ctrl(pf) so there is a double free. The free > should be done from the probe function because it's allocated there so > I have removed the call from nfp_net_pf_app_start(). > > Fixes: 02082701b974 ("nfp: create control vNICs and wire up rx/tx") > Signed-off-by: Dan Carpenter Applied.
Re: [PATCH net] sch_htb: fix crash on init failure
I expect that you will resubmit all of these similar fixes as a patch series after you have sorted everything out. Correct?
Re: [PATCH] net: dsa: make some structures const
From: Bhumika GoyalDate: Tue, 29 Aug 2017 22:17:52 +0530 > Make these const as they are not modified anywhere. > > Signed-off-by: Bhumika Goyal Applied.
Re: [PATCH ] net: frag: print frag_mem_limit value in sockstat proc file
From:Date: Tue, 29 Aug 2017 21:05:34 +0800 > From: liujian > > From 6d7b857d5( net: use lib/percpu_counter API for fragmentation mem > accounting), > frag_mem_limit and sum_frag_mem_limit have different value if there are > multiple NIC RX CPU. > Print frag_mem_limit value, then we can get more debug info. > > Signed-off-by: liujian Sorry, I don't think it is justified to change the deprecated procfs output just for this. Use tracepoints and other similar facilities to debug situations like this if you like.
Re: [RESEND PATCH] Allow passing tid or pid in SCM_CREDENTIALS without CAP_SYS_ADMIN
From: Prakash SangappaDate: Mon, 28 Aug 2017 17:12:20 -0700 > Currently passing tid(gettid(2)) of a thread in struct ucred in > SCM_CREDENTIALS message requires CAP_SYS_ADMIN capability otherwise > it fails with EPERM error. Some applications deal with thread id > of a thread(tid) and so it would help to allow tid in SCM_CREDENTIALS > message. Basically, either tgid(pid of the process) or the tid of > the thread should be allowed without the need for CAP_SYS_ADMIN capability. > > SCM_CREDENTIALS will be used to determine the global id of a process or > a thread running inside a pid namespace. > > This patch adds necessary check to accept tid in SCM_CREDENTIALS > struct ucred. > > Signed-off-by: Prakash Sangappa I'm pretty sure that by the descriptions in previous changes to this function, what you are proposing is basically a minor form of PID spoofing which we only want someone with CAP_SYS_ADMIN over the PID namespace to be able to do. Sorry, I'm not applying this.
Re: [PATCH net-next v3 2/2] tcp_diag: report TCP MD5 signing keys and addresses
On Tue, 2017-08-29 at 15:29 -0700, Ivan Delalande wrote: > Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to > processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is > not possible to retrieve these from the kernel once they have been > configured on sockets. > > Signed-off-by: Ivan Delalande> --- > include/uapi/linux/inet_diag.h | 1 + > net/ipv4/tcp_diag.c| 115 > ++--- > 2 files changed, 110 insertions(+), 6 deletions(-) > > diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h > index 678496897a68..f52ff62bfabe 100644 > --- a/include/uapi/linux/inet_diag.h > +++ b/include/uapi/linux/inet_diag.h > @@ -143,6 +143,7 @@ enum { > INET_DIAG_MARK, > INET_DIAG_BBRINFO, > INET_DIAG_CLASS_ID, > + INET_DIAG_MD5SIG, > __INET_DIAG_MAX, > }; > > diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c > index a748c74aa8b7..f972f9f7eae4 100644 > --- a/net/ipv4/tcp_diag.c > +++ b/net/ipv4/tcp_diag.c > @@ -16,6 +16,7 @@ > > #include > > +#include > #include > > static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r, > @@ -36,6 +37,106 @@ static void tcp_diag_get_info(struct sock *sk, struct > inet_diag_msg *r, > tcp_get_info(sk, info); > } > > +#ifdef CONFIG_TCP_MD5SIG > +static void inet_diag_md5sig_fill(struct tcp_md5sig *info, > + const struct tcp_md5sig_key *key) > +{ > + #if IS_ENABLED(CONFIG_IPV6) > + if (key->family == AF_INET6) { > + struct sockaddr_in6 *sin6 = > + (struct sockaddr_in6 *)>tcpm_addr; > + > + memcpy(>sin6_addr, >addr.a6, > +sizeof(struct in6_addr)); > + } else > + #endif > + { > + struct sockaddr_in *sin = > + (struct sockaddr_in *)>tcpm_addr; > + > + memcpy(>sin_addr, >addr.a4, sizeof(struct in_addr)); You probably need a memset(... 0 ...) to clear the XX bytes that are not used by IPv4 address. Otherwise your patch is going to 'leak' 124 bytes of kernel memory to user space and offer yet another way for attackers to build exploits. AFAIK, struct tcp_md5sig has a huge hole, since it uses "struct __kernel_sockaddr_storage tcpm_addr " So a similar problem exists for IPv6. Maybe it is time to define a much smaller structure, using only 16 bytes instead of 128 for the address.
Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
On Tue, Aug 29, 2017 at 4:40 PM, Michael S. Tsirkinwrote: > On Tue, Aug 29, 2017 at 03:53:08PM -0400, Willem de Bruijn wrote: >> On Tue, Aug 29, 2017 at 3:42 PM, Michael S. Tsirkin wrote: >> > On Tue, Aug 29, 2017 at 03:35:38PM -0400, Willem de Bruijn wrote: >> >> By the way, I have had an unrelated patch outstanding for a while >> >> to have virtio-net support the VIRTIO_CONFIG_S_NEEDS_RESET >> >> command. Will send that as RFC. >> > >> > Oh nice. >> >> Great :) >> >> > One needs to be careful about locking there which is why >> > no devices support that yet. >> >> I originally wrote it based on the virtnet_reset function introduced >> for xdp. Calling this from virtnet_config_changed_work is non trivial, >> as virtnet_freeze_down waits until no config worker is running. >> >> Otherwise, I could not find any constraints on when freeze may be >> called, and it largely follows the same path. I hope I didn't miss anything. > > The issue is that on freeze processes are not running so we > generally know no new packets will arrive (might be wrong > for bridging, then it's a bug). On device error you must > prevent new skbs from coming in, etc. Thanks a lot for the quick review. I had indeed not yet figured out which invariants freeze can depend on that are not universal. Same for the virtnet_reset call from the .ndo_xdp. Will need to take a much better look at that.
[PATCH] drivers: net: xgene: Correct probe sequence handling
From: Quan NguyenThe phy is connected at early stage of probe but not properly disconnected if error occurs. This patch fixes the issue. Also changing the return type of xgene_enet_check_phy_handle(), since this function always returns success. Signed-off-by: Quan Nguyen Signed-off-by: Iyappan Subramanian --- drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 27 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c index 1d307f2..6e253d9 100644 --- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c @@ -1661,21 +1661,21 @@ static int xgene_enet_get_irqs(struct xgene_enet_pdata *pdata) return 0; } -static int xgene_enet_check_phy_handle(struct xgene_enet_pdata *pdata) +static void xgene_enet_check_phy_handle(struct xgene_enet_pdata *pdata) { int ret; if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) - return 0; + return; if (!IS_ENABLED(CONFIG_MDIO_XGENE)) - return 0; + return; ret = xgene_enet_phy_connect(pdata->ndev); if (!ret) pdata->mdio_driver = true; - return 0; + return; } static void xgene_enet_gpiod_get(struct xgene_enet_pdata *pdata) @@ -1779,10 +1779,6 @@ static int xgene_enet_get_resources(struct xgene_enet_pdata *pdata) if (ret) return ret; - ret = xgene_enet_check_phy_handle(pdata); - if (ret) - return ret; - xgene_enet_gpiod_get(pdata); pdata->clk = devm_clk_get(>dev, NULL); @@ -2097,9 +2093,11 @@ static int xgene_enet_probe(struct platform_device *pdev) goto err; } + xgene_enet_check_phy_handle(pdata); + ret = xgene_enet_init_hw(pdata); if (ret) - goto err; + goto err2; link_state = pdata->mac_ops->link_state; if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) { @@ -2117,29 +2115,30 @@ static int xgene_enet_probe(struct platform_device *pdev) spin_lock_init(>stats_lock); ret = xgene_extd_stats_init(pdata); if (ret) - goto err2; + goto err1; xgene_enet_napi_add(pdata); ret = register_netdev(ndev); if (ret) { netdev_err(ndev, "Failed to register netdev\n"); - goto err2; + goto err1; } return 0; -err2: +err1: /* * If necessary, free_netdev() will call netif_napi_del() and undo * the effects of xgene_enet_napi_add()'s calls to netif_napi_add(). */ + xgene_enet_delete_desc_rings(pdata); + +err2: if (pdata->mdio_driver) xgene_enet_phy_disconnect(pdata); else if (phy_interface_mode_is_rgmii(pdata->phy_mode)) xgene_enet_mdio_remove(pdata); -err1: - xgene_enet_delete_desc_rings(pdata); err: free_netdev(ndev); return ret; -- 2.7.4
Re: [PATCH RFC 2/2] virtio_net: enable probing for NEEDS_RESET support
On Tue, Aug 29, 2017 at 5:13 PM, Michael S. Tsirkinwrote: > On Tue, Aug 29, 2017 at 05:02:29PM -0400, Willem de Bruijn wrote: >> + virtio-dev >> >> On Tue, Aug 29, 2017 at 4:38 PM, Michael S. Tsirkin wrote: >> > On Tue, Aug 29, 2017 at 04:27:41PM -0400, Willem de Bruijn wrote: >> >> On Tue, Aug 29, 2017 at 4:16 PM, Michael S. Tsirkin >> >> wrote: >> >> > On Tue, Aug 29, 2017 at 04:07:59PM -0400, Willem de Bruijn wrote: >> >> >> From: Willem de Bruijn >> >> >> >> >> >> Implement a mechanism to signal that a virtio device implements the >> >> >> VIRTIO_CONFIG_S_NEEDS_RESET command. >> >> >> >> >> >> Testing for VIRTIO_CONFIG_S_NEEDS_RESET support by issuing the request >> >> >> and verifying the reset state would require an expensive state change. >> >> >> >> >> >> To avoid that, add a status bit to the feature register used during >> >> >> feature negotiation between host and guest. >> >> >> >> >> >> Set the bit for virtio-net, which supports the feature. >> >> >> >> >> >> Signed-off-by: Willem de Bruijn >> >> > >> >> > All virtio 1 devices have the reset feature >> >> >> >> I don't quite follow. No device drivers implement this request currently? >> > >> > Depends. Spec 1.0 describes the bit and that driver can respond >> > by reseting the device. You seem to do something else >> > in this patchset, but as designed in 1.0 it does not seem to need >> > a feature bit. >> >> I see. So support is designed to be best effort? >> >> The feature bit is only needed if driver support is optional. >> >> The ack response is necessary if the device acts differently >> depending on whether the reset happened. The device has >> to reset its local state, too, so I think that this is needed. > > That reset should only happen when guest driver resets the device. > And spec already has a mechanism for that anyway. And the device can read the ring state to see whether it has reset, perhaps. >> >> >> > so maybe guest does >> >> > not need this flag. Does device need it? Does device really >> >> > care that guest can't recover? >> >> >> >> If all device drivers support it, then the flag is not needed. >> >> >> >> But as long as legacy device drivers can exist that do not support >> >> this feature, it has to have a way of differentiating between the two. >> > >> > Why? Device won't set this unless it's in a bad state. In that case, >> > setting the bit is harmless even if drivers ignore it. >> >> The goal is for the device to be able to rely on the driver reset to get >> to a good state even if it gets it into a bad state. >> >> That allows it to implement optimizations that could get it into that bad >> state. > > I see. I don't think this is what the need reset was designed for. > > We can extend it to cover this case but let's add a bit more > documentation then. Okay. > And in particular won't it be better if we could just reset one ring, > and not the whole device state? This might be nicer so flows on other > rings aren't disrupted. Indeed. But that would require a different request, then? It also depends on the use case. A full device reset is a big hammer, but if used only to get out of rare edge cases, it is good enough. >> >> In particular, in the edge case where the device performs backpressure, >> takes the descriptor out of the avail ring and does not immediately post >> it to the used ring. >> >> A reset will make the guest free all delayed packets and treat any >> unsent and unacknowledged as network drops. This allows the >> device to indeed drop long delayed packets when they eventually >> surface (e.g., leave a qdisc queue). > > In this particular case, won't it be easier for device to just > report all packets as used, without involving the guest? When the device can just iterate over the outstanding packet, yeah, that's actually simpler. >> This is of course not safe with >> zerocopy packets. > > > I wonder if we can teach kernel to drop zero copy packets too. Your point about changing frags[] underneath a cloned skb really does make that hard. We might be able to mitigate individual specific cases of high latency, such as TC queue occupancy. > > -- > MST
Re: [PATCH net-next v3 1/2] inet_diag: allow protocols to provide additional data
On Tue, 29 Aug 2017 15:29:53 -0700 Ivan Delalandewrote: > @@ -452,13 +466,14 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo > *hashinfo, > struct net *net = sock_net(in_skb->sk); > struct sk_buff *rep; > struct sock *sk; > + bool net_admin = netlink_net_capable(in_skb, CAP_NET_ADMIN); Please keep declarations in Christmas tree order if possible. int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, struct sk_buff *skb, const struct inet_diag_req_v2 *req, struct user_namespace *user_ns, u32 portid, u32 seq, u16 nlmsg_flags, const struct nlmsghdr *unlh, bool net_admin) { bool net_admin = netlink_net_capable(in_skb, CAP_NET_ADMIN); const struct tcp_congestion_ops *ca_ops; const struct inet_diag_handler *handler; int ext = req->idiag_ext; struct inet_diag_msg *r; struct nlmsghdr *nlh; struct nlattr *attr; void *info = NULL;
Re: [PATCH net-next 0/6] flow_dissector: Protocol specific flow dissector offload
Please add proper signoffs to your patches. Thanks.
Re: [PATCH net-next] ipv6: Use rt6i_idev index for echo replies to a local address
From: David AhernDate: Mon, 28 Aug 2017 13:53:34 -0700 > Tariq repored local pings to linklocal address is failing: > $ ifconfig ens8 > ens8: flags=4163 mtu 1500 > inet 11.141.16.6 netmask 255.255.0.0 broadcast 11.141.255.255 > inet6 fe80::7efe:90ff:fecb:7502 prefixlen 64 scopeid 0x20 > ether 7c:fe:90:cb:75:02 txqueuelen 1000 (Ethernet) > RX packets 12 bytes 1164 (1.1 KiB) > RX errors 0 dropped 0 overruns 0 frame 0 > TX packets 30 bytes 2484 (2.4 KiB) > TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 > > $ /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8 > PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes > > --- fe80::7efe:90ff:fecb:7502%ens8 ping statistics --- > 3 packets transmitted, 0 received, 100% packet loss, time 2043ms > > icmpv6_echo_reply needs to use the rt6i_idev dev index for local traffic > similar to how icmp6_send does. Convert the change for icmp6_send into a > helper that can be used in both places. Add the long over due > skb_rt6_info helper to convert dst on an skb to rt6_info similar to > skb_rtable for ipv4. > > Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on >device with address") > Reported-by: Tariq Toukan > Signed-off-by: David Ahern Applied but please do not break up Fixes: tags even if the line is very long. Otherwise it's a pain to grep the logs for specific multi-word string sequences in the Fixes tag. Thanks.
Re: [PATCH net-next v1] amd-xgbe: Interrupt summary bits are h/w version dependent
From: Tom LendackyDate: Mon, 28 Aug 2017 15:29:34 -0500 > There is a difference in the bit position of the normal interrupt summary > enable (NIE) and abnormal interrupt summary enable (AIE) between revisions > of the hardware. For older revisions the NIE and AIE bits are positions > 16 and 15 respectively. For newer revisions the NIE and AIE bits are > positions 15 and 14. The effect in changing the bit position is that > newer hardware won't receive AIE interrupts in the current version of the > driver. Specifically, the driver uses this interrupt to collect > statistics on when a receive buffer unavailable event occurs and to > restart the driver/device when a fatal bus error occurs. > > Update the driver to set the interrupt enable bit based on the reported > version of the hardware. > > Signed-off-by: Tom Lendacky Applied.
[PATCH net-next v3 0/2] report TCP MD5 signing keys and addresses
Allow userspace to retrieve MD5 signature keys and addresses configured on TCP sockets through inet_diag. Thank you Eric Dumazet for the useful explanations and feedback. v3: - rename inet_diag_*md5sig in tcp_diag.c to tcp_diag_* for consistency, - don't lock the socket tcp_diag_put_md5sig, - add checks on md5sig_count in tcp_diag_put_md5sig to not create the netlink attribute if the list is empty, and to avoid overflows or memory leaks if the list has changed in the meantime. v2: - move changes to tcp_diag.c and extend inet_diag_handler to allow protocols to provide additional data on INET_DIAG_INFO, - lock socket before calling tcp_diag_put_md5sig. I also have a patch for iproute2/ss to test this change, making it print this new attribute. I'm planning to polish and send it if this series gets applied. Ivan Delalande (2): inet_diag: allow protocols to provide additional data tcp_diag: report TCP MD5 signing keys and addresses include/linux/inet_diag.h | 7 +++ include/uapi/linux/inet_diag.h | 1 + net/ipv4/inet_diag.c | 22 ++-- net/ipv4/tcp_diag.c| 115 ++--- 4 files changed, 135 insertions(+), 10 deletions(-) -- 2.14.1
[PATCH net-next v3 2/2] tcp_diag: report TCP MD5 signing keys and addresses
Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is not possible to retrieve these from the kernel once they have been configured on sockets. Signed-off-by: Ivan Delalande--- include/uapi/linux/inet_diag.h | 1 + net/ipv4/tcp_diag.c| 115 ++--- 2 files changed, 110 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h index 678496897a68..f52ff62bfabe 100644 --- a/include/uapi/linux/inet_diag.h +++ b/include/uapi/linux/inet_diag.h @@ -143,6 +143,7 @@ enum { INET_DIAG_MARK, INET_DIAG_BBRINFO, INET_DIAG_CLASS_ID, + INET_DIAG_MD5SIG, __INET_DIAG_MAX, }; diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c index a748c74aa8b7..f972f9f7eae4 100644 --- a/net/ipv4/tcp_diag.c +++ b/net/ipv4/tcp_diag.c @@ -16,6 +16,7 @@ #include +#include #include static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r, @@ -36,6 +37,106 @@ static void tcp_diag_get_info(struct sock *sk, struct inet_diag_msg *r, tcp_get_info(sk, info); } +#ifdef CONFIG_TCP_MD5SIG +static void inet_diag_md5sig_fill(struct tcp_md5sig *info, + const struct tcp_md5sig_key *key) +{ + #if IS_ENABLED(CONFIG_IPV6) + if (key->family == AF_INET6) { + struct sockaddr_in6 *sin6 = + (struct sockaddr_in6 *)>tcpm_addr; + + memcpy(>sin6_addr, >addr.a6, + sizeof(struct in6_addr)); + } else + #endif + { + struct sockaddr_in *sin = + (struct sockaddr_in *)>tcpm_addr; + + memcpy(>sin_addr, >addr.a4, sizeof(struct in_addr)); + } + + info->tcpm_addr.ss_family = key->family; + info->tcpm_prefixlen = key->prefixlen; + info->tcpm_keylen = key->keylen; + memcpy(info->tcpm_key, key->key, key->keylen); +} + +static int inet_diag_put_md5sig(struct sk_buff *skb, + const struct tcp_md5sig_info *md5sig) +{ + const struct tcp_md5sig_key *key; + struct nlattr *attr; + struct tcp_md5sig *info; + int md5sig_count = 0; + + hlist_for_each_entry_rcu(key, >head, node) + md5sig_count++; + if (md5sig_count == 0) + return 0; + + attr = nla_reserve(skb, INET_DIAG_MD5SIG, + md5sig_count * sizeof(struct tcp_md5sig)); + if (!attr) + return -EMSGSIZE; + + info = nla_data(attr); + hlist_for_each_entry_rcu(key, >head, node) { + inet_diag_md5sig_fill(info++, key); + if (--md5sig_count == 0) + break; + } + if (md5sig_count > 0) + memset(info, 0, md5sig_count * sizeof(struct tcp_md5sig)); + + return 0; +} +#endif + +static int tcp_diag_get_aux(struct sock *sk, bool net_admin, + struct sk_buff *skb) +{ +#ifdef CONFIG_TCP_MD5SIG + if (net_admin) { + struct tcp_md5sig_info *md5sig; + int err = 0; + + rcu_read_lock(); + md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info); + if (md5sig) + err = inet_diag_put_md5sig(skb, md5sig); + rcu_read_unlock(); + if (err < 0) + return err; + } +#endif + + return 0; +} + +static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin) +{ + size_t size = 0; + +#ifdef CONFIG_TCP_MD5SIG + if (sk_fullsock(sk)) { + const struct tcp_md5sig_info *md5sig; + const struct tcp_md5sig_key *key; + + rcu_read_lock(); + md5sig = rcu_dereference(tcp_sk(sk)->md5sig_info); + if (md5sig) { + hlist_for_each_entry_rcu(key, >head, node) + size += sizeof(struct tcp_md5sig); + } + rcu_read_unlock(); + } +#endif + + return size; +} + static void tcp_diag_dump(struct sk_buff *skb, struct netlink_callback *cb, const struct inet_diag_req_v2 *r, struct nlattr *bc) { @@ -68,13 +169,15 @@ static int tcp_diag_destroy(struct sk_buff *in_skb, #endif static const struct inet_diag_handler tcp_diag_handler = { - .dump= tcp_diag_dump, - .dump_one= tcp_diag_dump_one, - .idiag_get_info = tcp_diag_get_info, - .idiag_type = IPPROTO_TCP, - .idiag_info_size = sizeof(struct tcp_info), + .dump = tcp_diag_dump, + .dump_one = tcp_diag_dump_one, + .idiag_get_info = tcp_diag_get_info, + .idiag_get_aux = tcp_diag_get_aux, + .idiag_get_aux_size =
[PATCH net-next v3 1/2] inet_diag: allow protocols to provide additional data
Extend inet_diag_handler to allow individual protocols to report additional data on INET_DIAG_INFO through idiag_get_aux. The size can be dynamic and is computed by idiag_get_aux_size. Signed-off-by: Ivan Delalande--- include/linux/inet_diag.h | 7 +++ net/ipv4/inet_diag.c | 22 ++ 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/include/linux/inet_diag.h b/include/linux/inet_diag.h index 65da430e260f..ee251c585854 100644 --- a/include/linux/inet_diag.h +++ b/include/linux/inet_diag.h @@ -25,6 +25,13 @@ struct inet_diag_handler { struct inet_diag_msg *r, void *info); + int (*idiag_get_aux)(struct sock *sk, +bool net_admin, +struct sk_buff *skb); + + size_t (*idiag_get_aux_size)(struct sock *sk, + bool net_admin); + int (*destroy)(struct sk_buff *in_skb, const struct inet_diag_req_v2 *req); diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 67325d5832d7..8a88ef373395 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -93,8 +93,17 @@ void inet_diag_msg_common_fill(struct inet_diag_msg *r, struct sock *sk) } EXPORT_SYMBOL_GPL(inet_diag_msg_common_fill); -static size_t inet_sk_attr_size(void) +static size_t inet_sk_attr_size(struct sock *sk, + const struct inet_diag_req_v2 *req, + bool net_admin) { + const struct inet_diag_handler *handler; + size_t aux = 0; + + handler = inet_diag_table[req->sdiag_protocol]; + if (handler && handler->idiag_get_aux_size) + aux = handler->idiag_get_aux_size(sk, net_admin); + returnnla_total_size(sizeof(struct tcp_info)) + nla_total_size(1) /* INET_DIAG_SHUTDOWN */ + nla_total_size(1) /* INET_DIAG_TOS */ @@ -105,6 +114,7 @@ static size_t inet_sk_attr_size(void) + nla_total_size(SK_MEMINFO_VARS * sizeof(u32)) + nla_total_size(TCP_CA_NAME_MAX) + nla_total_size(sizeof(struct tcpvegas_info)) + + nla_total_size(aux) + 64; } @@ -260,6 +270,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, handler->idiag_get_info(sk, r, info); + if (ext & (1 << (INET_DIAG_INFO - 1)) && handler->idiag_get_aux) + if (handler->idiag_get_aux(sk, net_admin, skb) < 0) + goto errout; + if (sk->sk_state < TCP_TIME_WAIT) { union tcp_cc_info info; size_t sz = 0; @@ -452,13 +466,14 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo, struct net *net = sock_net(in_skb->sk); struct sk_buff *rep; struct sock *sk; + bool net_admin = netlink_net_capable(in_skb, CAP_NET_ADMIN); int err; sk = inet_diag_find_one_icsk(net, hashinfo, req); if (IS_ERR(sk)) return PTR_ERR(sk); - rep = nlmsg_new(inet_sk_attr_size(), GFP_KERNEL); + rep = nlmsg_new(inet_sk_attr_size(sk, req, net_admin), GFP_KERNEL); if (!rep) { err = -ENOMEM; goto out; @@ -467,8 +482,7 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo, err = sk_diag_fill(sk, rep, req, sk_user_ns(NETLINK_CB(in_skb).sk), NETLINK_CB(in_skb).portid, - nlh->nlmsg_seq, 0, nlh, - netlink_net_capable(in_skb, CAP_NET_ADMIN)); + nlh->nlmsg_seq, 0, nlh, net_admin); if (err < 0) { WARN_ON(err == -EMSGSIZE); nlmsg_free(rep); -- 2.14.1
Re: Permissions for eBPF objects
On 29/08/2017 03:44, Chenbo Feng wrote: > On Mon, Aug 28, 2017 at 6:15 PM, Alexei Starovoitov >wrote: >> On Mon, Aug 28, 2017 at 05:47:19PM -0700, Chenbo Feng wrote: >>> On Fri, Aug 25, 2017 at 6:03 PM, Alexei Starovoitov >>> wrote: On Fri, Aug 25, 2017 at 10:07:27PM +0200, Daniel Borkmann wrote: > On 08/25/2017 09:52 PM, Chenbo Feng wrote: >> On Fri, Aug 25, 2017 at 12:45 PM, Jeffrey Vander Stoep >> wrote: >>> On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley >>> wrote: On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux wrote: > I’d like to get your thoughts on adding LSM permission checks on BPF > objects. before reinventing the wheel please take a look at landlock work. Everything that was discussed in this thread is covered by it. The patches have been in development for more than a year and most of the early issues have been resolved. It will be presented again during security summit in LA in September. >>> I am not very familiar with landlock lsm, isn't this module also >>> depend on the lsm hooks to do >>> the landlock check? If so then adding lsm hooks for eBPF object seems >>> not conflict with the >>> work on progress. >> >> I see. I got it the other way around. What lsm checks are you proposing? >> and why unprivileged_bpf_disabled is not enough? >> you want to allow unpriv only for specific user(s) ? >> > Exactly, the proposal patch I am currently working on will add checks > before map creation, > map read, and map modify, since all these functionalities will be > available to all users when > unprivileged_bpf_disabled is turned off. And eBPF prog_load may also > need a check as well > since loading some types of program is not restricted either. > It would be interesting to be able to check a wide range of actions performed with the BPF syscall: the command and the union bpf_attr argument. Because it is a multiplexer, that may be challenging, though. signature.asc Description: OpenPGP digital signature
Re: [PATCH net-next 0/4] nsh: headers, GSO
From: Jiri BencDate: Mon, 28 Aug 2017 21:43:20 +0200 > This adds header structs and helpers for NSH together with GSO support. > > Note there is no code in this patchset that actually manipulates the NSH > headers. That was sent to netdev by Yi Yang ("[PATCH net-next v6 0/3] > openvswitch: add NSH support"). The aim of this series is to lay the > groundwork and ease the implementation for him. > > In addition to openvswitch, the NSH support should be added to tc (flower to > match, act_nsh to push/pop NSH headers). That will come later. There's > currently no plan to support NSH by other means than those two. > > The patch 3 in this patchset was written by Yi Yang, I took it from the > aforementioned series and slightly modified it - see the note in the patch. Series applied, thanks Jiri.
[PATCH net-next] neigh: increase queue_len_bytes to match wmem_default
From: Eric DumazetFlorian reported UDP xmit drops that could be root caused to the too small neigh limit. Current limit is 64 KB, meaning that even a single UDP socket would hit it, since its default sk_sndbuf comes from net.core.wmem_default (~212992 bytes on 64bit arches). Once ARP/ND resolution is in progress, we should allow a little more packets to be queued, at least for one producer. Once neigh arp_queue is filled, a rogue socket should hit its sk_sndbuf limit and either block in sendmsg() or return -EAGAIN. Signed-off-by: Eric Dumazet Reported-by: Florian Fainelli --- Documentation/networking/ip-sysctl.txt |7 +-- include/net/sock.h | 10 ++ net/core/sock.c| 10 -- net/decnet/dn_neigh.c |2 +- net/ipv4/arp.c |2 +- net/ipv4/tcp_input.c |2 +- net/ipv6/ndisc.c |2 +- 7 files changed, 19 insertions(+), 16 deletions(-) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 6b0bc0f715346a097a6df46e2ba2771359abcd23..b3345d0fe0a67e477a6754848e7fc7be144322d5 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -109,7 +109,10 @@ neigh/default/unres_qlen_bytes - INTEGER queued for each unresolved address by other network layers. (added in linux 3.3) Setting negative value is meaningless and will return error. - Default: 65536 Bytes(64KB) + Default: SK_WMEM_MAX, (same as net.core.wmem_default). + Exact value depends on architecture and kernel options, + but should be enough to allow queuing 256 packets + of medium size. neigh/default/unres_qlen - INTEGER The maximum number of packets which may be queued for each @@ -119,7 +122,7 @@ neigh/default/unres_qlen - INTEGER unexpected packet loss. The current default value is calculated according to default value of unres_qlen_bytes and true size of packet. - Default: 31 + Default: 101 mtu_expires - INTEGER Time, in seconds, that cached PMTU information is kept. diff --git a/include/net/sock.h b/include/net/sock.h index 1c2912d433e81b10f3fdc87bcfcbb091570edc03..03a362568357acc7278a318423dd3873103f90ca 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2368,6 +2368,16 @@ bool sk_net_capable(const struct sock *sk, int cap); void sk_get_meminfo(const struct sock *sk, u32 *meminfo); +/* Take into consideration the size of the struct sk_buff overhead in the + * determination of these values, since that is non-constant across + * platforms. This makes socket queueing behavior and performance + * not depend upon such differences. + */ +#define _SK_MEM_PACKETS256 +#define _SK_MEM_OVERHEAD SKB_TRUESIZE(256) +#define SK_WMEM_MAX(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) +#define SK_RMEM_MAX(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) + extern __u32 sysctl_wmem_max; extern __u32 sysctl_rmem_max; diff --git a/net/core/sock.c b/net/core/sock.c index dfdd14cac775e9bfcee0085ee32ffcd0ab28b67b..9b7b6bbb2a23e7652a1f34a305f29d49de00bc8c 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -307,16 +307,6 @@ static struct lock_class_key af_wlock_keys[AF_MAX]; static struct lock_class_key af_elock_keys[AF_MAX]; static struct lock_class_key af_kern_callback_keys[AF_MAX]; -/* Take into consideration the size of the struct sk_buff overhead in the - * determination of these values, since that is non-constant across - * platforms. This makes socket queueing behavior and performance - * not depend upon such differences. - */ -#define _SK_MEM_PACKETS256 -#define _SK_MEM_OVERHEAD SKB_TRUESIZE(256) -#define SK_WMEM_MAX(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) -#define SK_RMEM_MAX(_SK_MEM_OVERHEAD * _SK_MEM_PACKETS) - /* Run time adjustable parameters. */ __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX; EXPORT_SYMBOL(sysctl_wmem_max); diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c index 21dedf6fd0f76dec22b2b3685beb89cfefea7ded..22bf0b95d6edc3c27ef3a99d27cb70a1551e3e0e 100644 --- a/net/decnet/dn_neigh.c +++ b/net/decnet/dn_neigh.c @@ -94,7 +94,7 @@ struct neigh_table dn_neigh_table = { [NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ, [NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ, [NEIGH_VAR_GC_STALETIME] = 60 * HZ, - [NEIGH_VAR_QUEUE_LEN_BYTES] = 64*1024, + [NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX, [NEIGH_VAR_PROXY_QLEN] = 0, [NEIGH_VAR_ANYCAST_DELAY] = 0, [NEIGH_VAR_PROXY_DELAY] = 0, diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index
Re: [PATCH net-next 0/3] tc: act_ife: handle IEEE IFE ethertype as default
From: Alexander AringDate: Mon, 28 Aug 2017 15:03:12 -0400 > this patch series will introduce the IFE ethertype which is registered by > IEEE. If the netlink act_ife type netlink attribute is not given it will > use this value by default now. > At least it will introduce some UAPI testcases to check if the default type > is used if not specified and vice versa. Series applied, thank you.
Re: [PATCH net-next] Documentation: networking: Add blurb about patches in patchwork
From: Florian FainelliDate: Tue, 29 Aug 2017 15:07:51 -0700 > Explain that the patch queue in patchwork should not be touched by patch > submitters. > > Signed-off-by: Florian Fainelli Applied.
Re: [PATCH net-next 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
From: Florian FainelliDate: Tue, 29 Aug 2017 15:08:00 -0700 > On 08/29/2017 02:52 PM, David Miller wrote: >> From: Florian Fainelli >> Date: Tue, 29 Aug 2017 14:45:30 -0700 >> >>> On 08/29/2017 02:42 PM, David Miller wrote: From: Florian Fainelli Date: Tue, 29 Aug 2017 11:39:41 -0700 > While trying an ARM BE kernel for kinks, the 3 drivers below started not > working and the reasons why became pretty obvious because the register > space > remains LE (hardwired), except for Broadcom MIPS where it follows the > CPU's > native endian (let's call that a feature). Series applied, thanks Florian. >>> >>> If you have not pushed yet, seems like not, can you check you applied v2 >>> of this patch series, in particular this patch: >>> >>> http://patchwork.ozlabs.org/patch/807296/ >> >> If you didn't keep messing with the patchwork state of patches in my >> queue this confusion would not have happened. >> >> I did happen to apply v2, but because of all of the confusion you keep >> creating, I used the header message from v1. >> >> It's already pushed out so there is nothing we can do about it. >> > > It's all good, thanks! /me goes updating netdev-FAQ.txt to mention not > touching patchwork. Thank you.
Re: [PATCH net v2 0/6] net:ethernet:aquantia: Atlantic driver Update 2017-08-23
From: Pavel BelousDate: Mon, 28 Aug 2017 21:52:07 +0300 > From: Pavel Belous > > This series contains updates for aQuantia Atlantic driver. > > It has bugfixes and some improvements. > > Changes in v2: > - "MCP state change" fix removed (will be sent as > a separate fix after further investigation.) Series applied, thanks.
Re: [PATCH] packet: Don't write vnet header beyond end of buffer
From: Benjamin PoirierDate: Mon, 28 Aug 2017 14:29:41 -0400 > ... which may happen with certain values of tp_reserve and maclen. > > Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") > Signed-off-by: Benjamin Poirier > Cc: Willem de Bruijn Applied and queued up for -stable.
Re: [PATCH net v1 1/1] tipc: permit bond slave as bearer
From: Parthasarathy BhuvaraganDate: Mon, 28 Aug 2017 17:57:02 +0200 > For a bond slave device as a tipc bearer, the dev represents the bond > interface and orig_dev represents the slave in tipc_l2_rcv_msg(). > Since we decode the tipc_ptr from bonding device (dev), we fail to > find the bearer and thus tipc links are not established. > > In this commit, we register the tipc protocol callback per device and > look for tipc bearer from both the devices. > > Signed-off-by: Parthasarathy Bhuvaragan > Applied.
Re: [PATCH net-next 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
On 08/29/2017 02:52 PM, David Miller wrote: > From: Florian Fainelli> Date: Tue, 29 Aug 2017 14:45:30 -0700 > >> On 08/29/2017 02:42 PM, David Miller wrote: >>> From: Florian Fainelli >>> Date: Tue, 29 Aug 2017 11:39:41 -0700 >>> While trying an ARM BE kernel for kinks, the 3 drivers below started not working and the reasons why became pretty obvious because the register space remains LE (hardwired), except for Broadcom MIPS where it follows the CPU's native endian (let's call that a feature). >>> >>> Series applied, thanks Florian. >> >> If you have not pushed yet, seems like not, can you check you applied v2 >> of this patch series, in particular this patch: >> >> http://patchwork.ozlabs.org/patch/807296/ > > If you didn't keep messing with the patchwork state of patches in my > queue this confusion would not have happened. > > I did happen to apply v2, but because of all of the confusion you keep > creating, I used the header message from v1. > > It's already pushed out so there is nothing we can do about it. > It's all good, thanks! /me goes updating netdev-FAQ.txt to mention not touching patchwork. -- Florian
[PATCH net-next] Documentation: networking: Add blurb about patches in patchwork
Explain that the patch queue in patchwork should not be touched by patch submitters. Signed-off-by: Florian Fainelli--- Documentation/networking/netdev-FAQ.txt | 8 1 file changed, 8 insertions(+) diff --git a/Documentation/networking/netdev-FAQ.txt b/Documentation/networking/netdev-FAQ.txt index 247a30ba8e17..cfc66ea72329 100644 --- a/Documentation/networking/netdev-FAQ.txt +++ b/Documentation/networking/netdev-FAQ.txt @@ -111,6 +111,14 @@ A: Generally speaking, the patches get triaged quickly (in less than 48h). patch is a good way to ensure your patch is ignored or pushed to the bottom of the priority list. +Q: I submitted multiple versions of the patch series, should I directly update + patchwork for the previous versions of these patch series? + +A: No, please don't interfere with the patch status on patchwork, leave it to + the maintainer to figure out what is the most recent and current version that + should be applied. If there is any doubt, the maintainer will reply and ask + what should be done. + Q: How can I tell what patches are queued up for backporting to the various stable releases? -- 2.9.3
Re: [PATCH net-next 0/4] mlx4 misc patches
From: Tariq ToukanDate: Mon, 28 Aug 2017 16:38:19 +0300 > This patchset contains misc patches from the team > to the mlx4 Core and Eth drivers. > > Patch 1 by Eran replaces large static allocations by dynamic ones. > Patch 2 by Leon makes an explicit conversion and solves a smatch warning. > In patch 3 I fix a misplaced brackets of the sizeof operation. > Patch 4 by Moshe adds the ability to inform the FW regarding user mac updates. > > Series generated against net-next commit: > 901c5d2fbfcd ARM: dts: rk3228-evb: Fix the compiling error Series applied, thanks.
Re: [PATCH v2 net-next 0/8] bpf: Add option to set mark and priority in cgroup sock programs
From: David AhernDate: Fri, 25 Aug 2017 12:05:33 -0700 > Add option to set mark and priority in addition to bound device for newly > created sockets. Also, allow the bpf programs to use the get_current_uid_gid > helper meaning socket marks, priority and device can be set base on the > uid/gid of the running process. > > For flexbility in deploying these programs, option is added to allow cgroups > to be walked from current to root running any program attached. This allows > one cgroup level to control the device a socket is bound to (e.g, a VRF) while > cgroups can be used to set socket marks and priority. > > Sample programs are updated to demonstrate the new options. > > v2 > - added flag to control recursive behavior as requested by Alexei > - added comment to sock_filter_func_proto regarding use of > get_current_uid_gid helper > - updated test programs for recursive option I'm marking this patch series as "deferred" while the semantic issues keep getting discussed. Thanks.
Re: [PATCH net-next 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
From: Florian FainelliDate: Tue, 29 Aug 2017 14:45:30 -0700 > On 08/29/2017 02:42 PM, David Miller wrote: >> From: Florian Fainelli >> Date: Tue, 29 Aug 2017 11:39:41 -0700 >> >>> While trying an ARM BE kernel for kinks, the 3 drivers below started not >>> working and the reasons why became pretty obvious because the register space >>> remains LE (hardwired), except for Broadcom MIPS where it follows the CPU's >>> native endian (let's call that a feature). >> >> Series applied, thanks Florian. > > If you have not pushed yet, seems like not, can you check you applied v2 > of this patch series, in particular this patch: > > http://patchwork.ozlabs.org/patch/807296/ If you didn't keep messing with the patchwork state of patches in my queue this confusion would not have happened. I did happen to apply v2, but because of all of the confusion you keep creating, I used the header message from v1. It's already pushed out so there is nothing we can do about it.
Re: [PATCH net-next v3] bridge: fdb add and delete tracepoints
From: Roopa PrabhuDate: Tue, 29 Aug 2017 13:16:57 -0700 > From: Roopa Prabhu > > A few useful tracepoints to trace bridge forwarding > database updates. > > Signed-off-by: Roopa Prabhu > --- > v2: address comments from florian > v3: remove stray character '=' in print (pointed out by florian) Applied, thanks.
Re: [PATCH net-next 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
On 08/29/2017 02:42 PM, David Miller wrote: > From: Florian Fainelli> Date: Tue, 29 Aug 2017 11:39:41 -0700 > >> While trying an ARM BE kernel for kinks, the 3 drivers below started not >> working and the reasons why became pretty obvious because the register space >> remains LE (hardwired), except for Broadcom MIPS where it follows the CPU's >> native endian (let's call that a feature). > > Series applied, thanks Florian. If you have not pushed yet, seems like not, can you check you applied v2 of this patch series, in particular this patch: http://patchwork.ozlabs.org/patch/807296/ Thanks! -- Florian
Re: [PATCH net-next 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
From: Florian FainelliDate: Tue, 29 Aug 2017 11:39:41 -0700 > While trying an ARM BE kernel for kinks, the 3 drivers below started not > working and the reasons why became pretty obvious because the register space > remains LE (hardwired), except for Broadcom MIPS where it follows the CPU's > native endian (let's call that a feature). Series applied, thanks Florian.
Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it
On Tuesday 29 August 2017 01:42:08 David Miller wrote: > From: Greg Kroah-Hartman> Date: Sun, 27 Aug 2017 17:03:30 +0200 > > > The IRDA code has long been obsolete and broken. So, to keep people > > from trying to use it, and to prevent people from having to maintain it, > > let's move it to drivers/staging/ so that we can delete it entirely from > > the kernel in a few releases. > > No objection, I'll apply this to net-next, thanks Greg. IRDA works fine in Debian 9 (kernel 4.9) and I use it for simple file transfer. Hope I'm not the only one... # irattach /dev/ttyS0 -d tekram -s # irdadump 21:28:52.830350 xid:cmd aed8eb79 > S=6 s=0 (14) 21:28:52.922368 xid:cmd aed8eb79 > S=6 s=1 (14) 21:28:53.014350 xid:cmd aed8eb79 > S=6 s=2 (14) 21:28:53.106338 xid:cmd aed8eb79 > S=6 s=3 (14) 21:28:53.190276 xid:rsp aed8eb79 < 35d1 S=6 s=3 Nokia 6230i hint=b125 [ PnP Modem Fax Telephony IrCOMM IrOBEX ] (28) 21:28:53.198384 xid:cmd aed8eb79 > S=6 s=4 (14) 21:28:53.290382 xid:cmd aed8eb79 > S=6 s=5 (14) 21:28:53.382341 xid:cmd aed8eb79 > S=6 s=* pentium hint=0400 [ Computer ] (23) ^C 8 packets received by filter $ obexftp -i -l MMC Connecting..\done Receiving "MMC".../ ]> $ obexftp -i -c MMC -g Image004.jpg Connecting..\done Sending "MMC"...|done Receiving "Image004.jpg"...-done Disconnecting..\done -- Ondrej Zary
Re: [PATCH] DSA support for Micrel KSZ8895
On 08/29/2017 02:15 PM, Pavel Machek wrote: > On Tue 2017-08-29 14:26:04, Andrew Lunn wrote: >>> But the MDIO emaulation code is from their driver, after lots of >>> deletions. >> >> Is this driver supposed to run on lots of different OSs? That would >> explain why they ignored the Linux MDIO and PHY layers. > > It did not look particulary portable. Part of the problem is that they need to duplicate the standard MII definitions, whereas we could re-use those from include/linux/mii.h and/or mdio.h. > >> If possible, please make use of the Linux infrastructure. > > I did not find any infrastructure I could use instead > ksz_mdio_emulation. fixed PHY/swphy.c is as close as it could get, but it is a highly simplified version of this. > > Now, drivers/net/phy/spi_ks8995.c can access the PHY registers, and I > can not see any translation there, so there may be something I'm > missing. I don't see anything in that driver that seems to access PHY registers what makes you think it does? There's got to be a way to perform indirect accesses through SPI, Woojung, do you know? > > Pointers would be welcome at this point. > > Thanks, > Pavel > -- Florian
Re: mlxsw and rtnl lock
On 08/29/2017 11:04 PM, David Ahern wrote: > On 8/29/17 12:10 AM, Arkadi Sharshevsky wrote: >> >> >> On 08/28/2017 09:00 PM, David Ahern wrote: >>> On 8/26/17 11:04 AM, Ido Schimmel wrote: Regarding the silent abort, that's intentional. You can look at the same code in v4.9 - when the chain was still blocking - and you'll see that we didn't propagate the error even then. This was discussed in the past and the conclusion was that user doesn't expect to operation to fail. If hardware resources are exceeded, we let the kernel take care of the forwarding instead. >>> >>> In addition to Roopa's comments... The silent abort is not a good user >>> experience. Right now it's add a network address or route, cross fingers >>> and hope it does not overflow some limit (nexthop, ecmp, neighbor, >>> prefix, etc) that triggers the offload abort. >>> >>> The mlxsw driver queries for some limits (e.g., max rifs) but I don't >>> see any query related to current usage, and there is no API to pass any >>> of that data to user space so user space has no programmatic way to >>> handle this. I realize you are aware of this limitation. The point is to >>> emphasize the need to resolve this. >>> >> >> We actually thought about providing he user some tools to understand >> the ASIC's limitations by introducing the 'resource' object to devlink. >> >> By linking dpipe tables to resources the user can understand which >> hardware processes share a common resource, furthermore this resources >> usage could be observed. By this more visibility can be obtained. >> >> Its not a remedy for the silent abort, but, maybe a notification >> can be sent from devlink in case of abort that some resources is >> full. >> >> This proposition was sent as RFC several weeks ago. >> > > Do you have patches (kernel and devlink) for the proposal? > No, only the design RFC which describe the UAPI, devlink commands and the devlink/driver interactions. I wanted to receive some feedback before the coding.
Re: [PATCH] DSA support for Micrel KSZ8895
On Tue 2017-08-29 14:26:04, Andrew Lunn wrote: > > But the MDIO emaulation code is from their driver, after lots of > > deletions. > > Is this driver supposed to run on lots of different OSs? That would > explain why they ignored the Linux MDIO and PHY layers. It did not look particulary portable. > If possible, please make use of the Linux infrastructure. I did not find any infrastructure I could use instead ksz_mdio_emulation. Now, drivers/net/phy/spi_ks8995.c can access the PHY registers, and I can not see any translation there, so there may be something I'm missing. Pointers would be welcome at this point. Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature
Re: [PATCH RFC 2/2] virtio_net: enable probing for NEEDS_RESET support
On Tue, Aug 29, 2017 at 05:02:29PM -0400, Willem de Bruijn wrote: > + virtio-dev > > On Tue, Aug 29, 2017 at 4:38 PM, Michael S. Tsirkinwrote: > > On Tue, Aug 29, 2017 at 04:27:41PM -0400, Willem de Bruijn wrote: > >> On Tue, Aug 29, 2017 at 4:16 PM, Michael S. Tsirkin > >> wrote: > >> > On Tue, Aug 29, 2017 at 04:07:59PM -0400, Willem de Bruijn wrote: > >> >> From: Willem de Bruijn > >> >> > >> >> Implement a mechanism to signal that a virtio device implements the > >> >> VIRTIO_CONFIG_S_NEEDS_RESET command. > >> >> > >> >> Testing for VIRTIO_CONFIG_S_NEEDS_RESET support by issuing the request > >> >> and verifying the reset state would require an expensive state change. > >> >> > >> >> To avoid that, add a status bit to the feature register used during > >> >> feature negotiation between host and guest. > >> >> > >> >> Set the bit for virtio-net, which supports the feature. > >> >> > >> >> Signed-off-by: Willem de Bruijn > >> > > >> > All virtio 1 devices have the reset feature > >> > >> I don't quite follow. No device drivers implement this request currently? > > > > Depends. Spec 1.0 describes the bit and that driver can respond > > by reseting the device. You seem to do something else > > in this patchset, but as designed in 1.0 it does not seem to need > > a feature bit. > > I see. So support is designed to be best effort? > > The feature bit is only needed if driver support is optional. > > The ack response is necessary if the device acts differently > depending on whether the reset happened. The device has > to reset its local state, too, so I think that this is needed. That reset should only happen when guest driver resets the device. And spec already has a mechanism for that anyway. > > >> > so maybe guest does > >> > not need this flag. Does device need it? Does device really > >> > care that guest can't recover? > >> > >> If all device drivers support it, then the flag is not needed. > >> > >> But as long as legacy device drivers can exist that do not support > >> this feature, it has to have a way of differentiating between the two. > > > > Why? Device won't set this unless it's in a bad state. In that case, > > setting the bit is harmless even if drivers ignore it. > > The goal is for the device to be able to rely on the driver reset to get > to a good state even if it gets it into a bad state. > > That allows it to implement optimizations that could get it into that bad > state. I see. I don't think this is what the need reset was designed for. We can extend it to cover this case but let's add a bit more documentation then. And in particular won't it be better if we could just reset one ring, and not the whole device state? This might be nicer so flows on other rings aren't disrupted. > > In particular, in the edge case where the device performs backpressure, > takes the descriptor out of the avail ring and does not immediately post > it to the used ring. > > A reset will make the guest free all delayed packets and treat any > unsent and unacknowledged as network drops. This allows the > device to indeed drop long delayed packets when they eventually > surface (e.g., leave a qdisc queue). In this particular case, won't it be easier for device to just report all packets as used, without involving the guest? > This is of course not safe with > zerocopy packets. I wonder if we can teach kernel to drop zero copy packets too. -- MST
Re: [PATCH RFC 2/2] virtio_net: enable probing for NEEDS_RESET support
+ virtio-dev On Tue, Aug 29, 2017 at 4:38 PM, Michael S. Tsirkinwrote: > On Tue, Aug 29, 2017 at 04:27:41PM -0400, Willem de Bruijn wrote: >> On Tue, Aug 29, 2017 at 4:16 PM, Michael S. Tsirkin wrote: >> > On Tue, Aug 29, 2017 at 04:07:59PM -0400, Willem de Bruijn wrote: >> >> From: Willem de Bruijn >> >> >> >> Implement a mechanism to signal that a virtio device implements the >> >> VIRTIO_CONFIG_S_NEEDS_RESET command. >> >> >> >> Testing for VIRTIO_CONFIG_S_NEEDS_RESET support by issuing the request >> >> and verifying the reset state would require an expensive state change. >> >> >> >> To avoid that, add a status bit to the feature register used during >> >> feature negotiation between host and guest. >> >> >> >> Set the bit for virtio-net, which supports the feature. >> >> >> >> Signed-off-by: Willem de Bruijn >> > >> > All virtio 1 devices have the reset feature >> >> I don't quite follow. No device drivers implement this request currently? > > Depends. Spec 1.0 describes the bit and that driver can respond > by reseting the device. You seem to do something else > in this patchset, but as designed in 1.0 it does not seem to need > a feature bit. I see. So support is designed to be best effort? The feature bit is only needed if driver support is optional. The ack response is necessary if the device acts differently depending on whether the reset happened. The device has to reset its local state, too, so I think that this is needed. >> > so maybe guest does >> > not need this flag. Does device need it? Does device really >> > care that guest can't recover? >> >> If all device drivers support it, then the flag is not needed. >> >> But as long as legacy device drivers can exist that do not support >> this feature, it has to have a way of differentiating between the two. > > Why? Device won't set this unless it's in a bad state. In that case, > setting the bit is harmless even if drivers ignore it. The goal is for the device to be able to rely on the driver reset to get to a good state even if it gets it into a bad state. That allows it to implement optimizations that could get it into that bad state. In particular, in the edge case where the device performs backpressure, takes the descriptor out of the avail ring and does not immediately post it to the used ring. A reset will make the guest free all delayed packets and treat any unsent and unacknowledged as network drops. This allows the device to indeed drop long delayed packets when they eventually surface (e.g., leave a qdisc queue). This is of course not safe with zerocopy packets.
[PATCH net-next v2 2/4] net: dsa: bcm_sf2: Use correct I/O accessors
The Starfigther 2 driver currently uses __raw_{read,write}l which means native I/O endian. This works correctly for an ARM LE kernel (default) but fails miserably on an ARM BE (BE8) kernel where registers are kept little endian, so replace uses with {read,write}l_relaxed here which is what we want because this is all performance sensitive code. Signed-off-by: Florian Fainelli--- drivers/net/dsa/bcm_sf2.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h index 7d3030e04f11..d9c96b281fc0 100644 --- a/drivers/net/dsa/bcm_sf2.h +++ b/drivers/net/dsa/bcm_sf2.h @@ -130,12 +130,12 @@ static inline u32 bcm_sf2_mangle_addr(struct bcm_sf2_priv *priv, u32 off) #define SF2_IO_MACRO(name) \ static inline u32 name##_readl(struct bcm_sf2_priv *priv, u32 off) \ { \ - return __raw_readl(priv->name + off); \ + return readl_relaxed(priv->name + off); \ } \ static inline void name##_writel(struct bcm_sf2_priv *priv,\ u32 val, u32 off) \ { \ - __raw_writel(val, priv->name + off);\ + writel_relaxed(val, priv->name + off); \ } \ /* Accesses to 64-bits register requires us to latch the hi/lo pairs @@ -179,23 +179,23 @@ static inline u32 bcm_sf2_mangle_addr(struct bcm_sf2_priv *priv, u32 off) static inline u32 core_readl(struct bcm_sf2_priv *priv, u32 off) { u32 tmp = bcm_sf2_mangle_addr(priv, off); - return __raw_readl(priv->core + tmp); + return readl_relaxed(priv->core + tmp); } static inline void core_writel(struct bcm_sf2_priv *priv, u32 val, u32 off) { u32 tmp = bcm_sf2_mangle_addr(priv, off); - __raw_writel(val, priv->core + tmp); + writel_relaxed(val, priv->core + tmp); } static inline u32 reg_readl(struct bcm_sf2_priv *priv, u16 off) { - return __raw_readl(priv->reg + priv->reg_offsets[off]); + return readl_relaxed(priv->reg + priv->reg_offsets[off]); } static inline void reg_writel(struct bcm_sf2_priv *priv, u32 val, u16 off) { - __raw_writel(val, priv->reg + priv->reg_offsets[off]); + writel_relaxed(val, priv->reg + priv->reg_offsets[off]); } SF2_IO64_MACRO(core); -- 1.9.1
[PATCH net-next v2 0/4] Endian fixes for SYSTEMPORT/SF2/MDIO
Hi David, While trying an ARM BE kernel for kinks, the 3 drivers below started not working and the reasons why became pretty obvious because the register space remains LE (hardwired), except for Broadcom MIPS where it follows the CPU's native endian (let's call that a feature). Thanks! Changes in v2: - correctly set RSB_SWAP1 and RSB_SWAP0 for all combinations, now properly tested on SYSTEMPORT (BCM7445), SYSTEMPORT Lite (BCM7278) under both endian Florian Fainelli (4): net: systemport: Use correct I/O accessors net: dsa: bcm_sf2: Use correct I/O accessors net: systemport: Set correct RSB endian bits based on host net: phy: mdio-bcm-unimac: Use correct I/O accessors drivers/net/dsa/bcm_sf2.h | 12 +-- drivers/net/ethernet/broadcom/bcmsysport.c | 20 +++ drivers/net/phy/mdio-bcm-unimac.c | 32 -- 3 files changed, 44 insertions(+), 20 deletions(-) -- 1.9.1
Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi
On Tue, Aug 29, 2017 at 03:53:08PM -0400, Willem de Bruijn wrote: > On Tue, Aug 29, 2017 at 3:42 PM, Michael S. Tsirkinwrote: > > On Tue, Aug 29, 2017 at 03:35:38PM -0400, Willem de Bruijn wrote: > >> By the way, I have had an unrelated patch outstanding for a while > >> to have virtio-net support the VIRTIO_CONFIG_S_NEEDS_RESET > >> command. Will send that as RFC. > > > > Oh nice. > > Great :) > > > One needs to be careful about locking there which is why > > no devices support that yet. > > I originally wrote it based on the virtnet_reset function introduced > for xdp. Calling this from virtnet_config_changed_work is non trivial, > as virtnet_freeze_down waits until no config worker is running. > > Otherwise, I could not find any constraints on when freeze may be > called, and it largely follows the same path. I hope I didn't miss anything. The issue is that on freeze processes are not running so we generally know no new packets will arrive (might be wrong for bridging, then it's a bug). On device error you must prevent new skbs from coming in, etc. -- MST
[PATCH net-next v2 1/4] net: systemport: Use correct I/O accessors
The SYSTEMPORT driver currently uses __raw_{read,write}l which means native I/O endian. This works correctly for an ARM LE kernel (default) but fails miserably on an ARM BE (BE8) kernel where registers are kept little endian, so replace uses with {read,write}l_relaxed here which is what we want because this is all performance sensitive code. Signed-off-by: Florian Fainelli--- drivers/net/ethernet/broadcom/bcmsysport.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c index b3a21418f511..a7e84292af50 100644 --- a/drivers/net/ethernet/broadcom/bcmsysport.c +++ b/drivers/net/ethernet/broadcom/bcmsysport.c @@ -32,13 +32,13 @@ #define BCM_SYSPORT_IO_MACRO(name, offset) \ static inline u32 name##_readl(struct bcm_sysport_priv *priv, u32 off) \ { \ - u32 reg = __raw_readl(priv->base + offset + off); \ + u32 reg = readl_relaxed(priv->base + offset + off); \ return reg; \ } \ static inline void name##_writel(struct bcm_sysport_priv *priv, \ u32 val, u32 off) \ { \ - __raw_writel(val, priv->base + offset + off); \ + writel_relaxed(val, priv->base + offset + off); \ } \ BCM_SYSPORT_IO_MACRO(intrl2_0, SYS_PORT_INTRL2_0_OFFSET); @@ -59,14 +59,14 @@ static inline u32 rdma_readl(struct bcm_sysport_priv *priv, u32 off) { if (priv->is_lite && off >= RDMA_STATUS) off += 4; - return __raw_readl(priv->base + SYS_PORT_RDMA_OFFSET + off); + return readl_relaxed(priv->base + SYS_PORT_RDMA_OFFSET + off); } static inline void rdma_writel(struct bcm_sysport_priv *priv, u32 val, u32 off) { if (priv->is_lite && off >= RDMA_STATUS) off += 4; - __raw_writel(val, priv->base + SYS_PORT_RDMA_OFFSET + off); + writel_relaxed(val, priv->base + SYS_PORT_RDMA_OFFSET + off); } static inline u32 tdma_control_bit(struct bcm_sysport_priv *priv, u32 bit) @@ -110,10 +110,10 @@ static inline void dma_desc_set_addr(struct bcm_sysport_priv *priv, dma_addr_t addr) { #ifdef CONFIG_PHYS_ADDR_T_64BIT - __raw_writel(upper_32_bits(addr) & DESC_ADDR_HI_MASK, + writel_relaxed(upper_32_bits(addr) & DESC_ADDR_HI_MASK, d + DESC_ADDR_HI_STATUS_LEN); #endif - __raw_writel(lower_32_bits(addr), d + DESC_ADDR_LO); + writel_relaxed(lower_32_bits(addr), d + DESC_ADDR_LO); } static inline void tdma_port_write_desc_addr(struct bcm_sysport_priv *priv, -- 1.9.1
[PATCH net-next v2 3/4] net: systemport: Set correct RSB endian bits based on host
RSB_SWAP0 needs to match the host CPU endian, and it needs to be set for LE and clear for BE. RSB_SWAP1 must always be cleared for SYSTEMPORT Lite. With these settings, we have the Receive Status Block always match the host endian and we do not need to perform any conversion. Since there is not necessarily a CONFIG_CPU_LITTLE_ENDIAN option defined, we test for !CONFIG_CPU_BIG_ENDIAN which is guaranteed to be set. Signed-off-by: Florian Fainelli--- drivers/net/ethernet/broadcom/bcmsysport.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c b/drivers/net/ethernet/broadcom/bcmsysport.c index a7e84292af50..931751e4f369 100644 --- a/drivers/net/ethernet/broadcom/bcmsysport.c +++ b/drivers/net/ethernet/broadcom/bcmsysport.c @@ -1762,10 +1762,14 @@ static void rbuf_init(struct bcm_sysport_priv *priv) reg = rbuf_readl(priv, RBUF_CONTROL); reg |= RBUF_4B_ALGN | RBUF_RSB_EN; /* Set a correct RSB format on SYSTEMPORT Lite */ - if (priv->is_lite) { + if (priv->is_lite) reg &= ~RBUF_RSB_SWAP1; + + /* Set a correct RSB format based on host endian */ + if (!IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) reg |= RBUF_RSB_SWAP0; - } + else + reg &= ~RBUF_RSB_SWAP0; rbuf_writel(priv, reg, RBUF_CONTROL); } -- 1.9.1
[PATCH net-next v2 4/4] net: phy: mdio-bcm-unimac: Use correct I/O accessors
The driver currently uses __raw_{read,write}l which works for all platforms supported: Broadcom MIPS LE/BE (native endian), ARM LE (native endian) but not ARM BE (registers are still LE). Switch to using the proper accessors for all platforms and explain why Broadcom MIPS BE is special here, in doing so, we introduce a couple of helper functions to abstract these differences. Signed-off-by: Florian Fainelli--- drivers/net/phy/mdio-bcm-unimac.c | 32 ++-- 1 file changed, 26 insertions(+), 6 deletions(-) diff --git a/drivers/net/phy/mdio-bcm-unimac.c b/drivers/net/phy/mdio-bcm-unimac.c index 73c5267a11fd..08e0647b85e2 100644 --- a/drivers/net/phy/mdio-bcm-unimac.c +++ b/drivers/net/phy/mdio-bcm-unimac.c @@ -47,18 +47,38 @@ struct unimac_mdio_priv { void*wait_func_data; }; +static inline u32 unimac_mdio_readl(struct unimac_mdio_priv *priv, u32 offset) +{ + /* MIPS chips strapped for BE will automagically configure the +* peripheral registers for CPU-native byte order. +*/ + if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) + return __raw_readl(priv->base + offset); + else + return readl_relaxed(priv->base + offset); +} + +static inline void unimac_mdio_writel(struct unimac_mdio_priv *priv, u32 val, + u32 offset) +{ + if (IS_ENABLED(CONFIG_MIPS) && IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) + __raw_writel(val, priv->base + offset); + else + writel_relaxed(val, priv->base + offset); +} + static inline void unimac_mdio_start(struct unimac_mdio_priv *priv) { u32 reg; - reg = __raw_readl(priv->base + MDIO_CMD); + reg = unimac_mdio_readl(priv, MDIO_CMD); reg |= MDIO_START_BUSY; - __raw_writel(reg, priv->base + MDIO_CMD); + unimac_mdio_writel(priv, reg, MDIO_CMD); } static inline unsigned int unimac_mdio_busy(struct unimac_mdio_priv *priv) { - return __raw_readl(priv->base + MDIO_CMD) & MDIO_START_BUSY; + return unimac_mdio_readl(priv, MDIO_CMD) & MDIO_START_BUSY; } static int unimac_mdio_poll(void *wait_func_data) @@ -87,7 +107,7 @@ static int unimac_mdio_read(struct mii_bus *bus, int phy_id, int reg) /* Prepare the read operation */ cmd = MDIO_RD | (phy_id << MDIO_PMD_SHIFT) | (reg << MDIO_REG_SHIFT); - __raw_writel(cmd, priv->base + MDIO_CMD); + unimac_mdio_writel(priv, cmd, MDIO_CMD); /* Start MDIO transaction */ unimac_mdio_start(priv); @@ -96,7 +116,7 @@ static int unimac_mdio_read(struct mii_bus *bus, int phy_id, int reg) if (ret) return ret; - cmd = __raw_readl(priv->base + MDIO_CMD); + cmd = unimac_mdio_readl(priv, MDIO_CMD); /* Some broken devices are known not to release the line during * turn-around, e.g: Broadcom BCM53125 external switches, so check for @@ -118,7 +138,7 @@ static int unimac_mdio_write(struct mii_bus *bus, int phy_id, /* Prepare the write operation */ cmd = MDIO_WR | (phy_id << MDIO_PMD_SHIFT) | (reg << MDIO_REG_SHIFT) | (0x & val); - __raw_writel(cmd, priv->base + MDIO_CMD); + unimac_mdio_writel(priv, cmd, MDIO_CMD); unimac_mdio_start(priv); -- 1.9.1
Re: [PATCH RFC 2/2] virtio_net: enable probing for NEEDS_RESET support
On Tue, Aug 29, 2017 at 04:27:41PM -0400, Willem de Bruijn wrote: > On Tue, Aug 29, 2017 at 4:16 PM, Michael S. Tsirkinwrote: > > On Tue, Aug 29, 2017 at 04:07:59PM -0400, Willem de Bruijn wrote: > >> From: Willem de Bruijn > >> > >> Implement a mechanism to signal that a virtio device implements the > >> VIRTIO_CONFIG_S_NEEDS_RESET command. > >> > >> Testing for VIRTIO_CONFIG_S_NEEDS_RESET support by issuing the request > >> and verifying the reset state would require an expensive state change. > >> > >> To avoid that, add a status bit to the feature register used during > >> feature negotiation between host and guest. > >> > >> Set the bit for virtio-net, which supports the feature. > >> > >> Signed-off-by: Willem de Bruijn > > > > All virtio 1 devices have the reset feature > > I don't quite follow. No device drivers implement this request currently? Depends. Spec 1.0 describes the bit and that driver can respond by reseting the device. You seem to do something else in this patchset, but as designed in 1.0 it does not seem to need a feature bit. > > so maybe guest does > > not need this flag. Does device need it? Does device really > > care that guest can't recover? > > If all device drivers support it, then the flag is not needed. > > But as long as legacy device drivers can exist that do not support > this feature, it has to have a way of differentiating between the two. Why? Device won't set this unless it's in a bad state. In that case, setting the bit is harmless even if drivers ignore it. -- MST
Re: [PATCH net-next v3] bridge: fdb add and delete tracepoints
On 29/08/17 23:16, Roopa Prabhu wrote: > From: Roopa Prabhu> > A few useful tracepoints to trace bridge forwarding > database updates. > > Signed-off-by: Roopa Prabhu > --- > v2: address comments from florian > v3: remove stray character '=' in print (pointed out by florian) > > include/trace/events/bridge.h | 98 > + > net/bridge/br_fdb.c |7 +++ > net/core/net-traces.c |6 +++ > 3 files changed, 111 insertions(+) > create mode 100644 include/trace/events/bridge.h > Acked-by: Nikolay Aleksandrov
Re: [PATCH] Documentation: fix little inconsistencies
On Tue 2017-08-29 13:09:45, Jonathan Corbet wrote: > On Tue, 29 Aug 2017 09:50:57 -0700 > "Darrick J. Wong"wrote: > > > > Anything wrong here? It is fixing extra '+' in the ascii art... > > > > There's nothing incorrect here, I merely thought it odd to send a fix > > for networking documentation to the ext4 list, but not netdev? > > In fact, davem likes to handle networking documentation patches himself, > so resending this one to netdev would be a good idea. Its a typo in documentation, do we really need to waste any more time with it? Dave, can you just ack it? > > index 5e40e1f..6309e90 100644 > > --- a/Documentation/networking/switchdev.txt > > +++ b/Documentation/networking/switchdev.txt > > @@ -29,7 +29,7 @@ with SR-IOV or soft switches, such as OVS, are > > possible. > > sw1p1 + sw1p3 + sw1p5 + eth1 > > +|+|+|+ > > ||||||| > > - +--++++-+--++---+ +-+-+ > > + +--++++++---+ +-+-+ Besides, if documentation fixes should go to Davem/netdev, getmaintainer.pl should say so: pavel@duo:/data/l/linux-n900$ scripts/get_maintainer.pl -f Documentation/networking Jonathan Corbet (maintainer:DOCUMENTATION) "David S. Miller" (commit_signer:52/87=60%) David Howells (commit_signer:7/87=8%,authored:8/87=9%) Florian Fainelli (commit_signer:6/87=7%) Mauro Carvalho Chehab (commit_signer:6/87=7%,authored:5/87=6%) Yuchung Cheng (commit_signer:6/87=7%) Hangbin Liu (authored:5/87=6%) linux-...@vger.kernel.org (open list:DOCUMENTATION) linux-ker...@vger.kernel.org (open list) pavel@duo:/data/l/linux-n900$ Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html signature.asc Description: Digital signature