Re: [PATCH net-next 2/3] net: Add FIB table id to rtable
[Cc: linux...@vger.kernel.org]

On Tue, Sep 15, 2015 at 12:01:59PM -0700, David Miller wrote:
> From: David Ahern <d...@cumulusnetworks.com>
> Date: Wed, 2 Sep 2015 13:58:35 -0700
>
> > Add the FIB table id to rtable to make the information available for
> > IPv4 as it is for IPv6.
> >
> > Signed-off-by: David Ahern <d...@cumulusnetworks.com>
>
> Applied.

Unfortunately I have observed the following when booting the koelsch board
which is based on the Renesas ARM r8a7791 SoC. The kernel was compiled using
the shmobile_defconfig.

I also see this problem in net-next (37d2dbcdcca8) and next-20150917.

Booting Linux on physical CPU 0x0
Linux version 4.2.0-11171-gb7503e0cdb5d (ho...@ayumi.isobedori.kobe.vergenet.net) (gcc version 4.6.3 (GCC) ) #6130 SMP Thu Sep 17 15:33:06 JST 2015
CPU: ARMv7 Processor [413fc0f2] revision 2 (ARMv7), cr=10c5307d
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Machine model: Koelsch
Ignoring memory block 0x2 - 0x24000
debug: ignoring loglevel setting.
Memory policy: Data cache writealloc
On node 0 totalpages: 262144
free_area_init_node: node 0, pgdat c06b5a40, node_mem_map eeff9000
Normal zone: 1520 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 194560 pages, LIFO batch:31
HighMem zone: 67584 pages, LIFO batch:15
PERCPU: Embedded 10 pages/cpu @eefc s17984 r0 d22976 u40960
pcpu-alloc: s17984 r0 d22976 u40960 alloc=10*4096
pcpu-alloc: [0] 0 [0] 1
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 260624
Kernel command line: ignore_loglevel rw root=/dev/nfs ip=dhcp
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1032232K/1048576K available (4935K kernel code, 244K rwdata, 1356K rodata, 304K init, 204K bss, 16344K reserved, 0K cma-reserved, 270336K highmem)
Virtual kernel memory layout:
vector : 0x - 0x1000 ( 4 kB)
fixmap : 0xffc0 - 0xfff0 (3072 kB)
vmalloc : 0xf000 - 0xff00 ( 240 MB)
lowmem : 0xc000 - 0xef80 ( 760 MB)
pkmap : 0xbfe0 - 0xc000 ( 2 MB)
.text : 0xc0008000 - 0xc062dfec (6296 kB)
.init : 0xc062e000 - 0xc067a000 ( 304 kB)
.data : 0xc067a000 - 0xc06b7340 ( 245 kB)
.bss : 0xc06ba000 - 0xc06ed314 ( 205 kB)
Hierarchical RCU implementation.
Build-time adjustment of leaf fanout to 32.
RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
NR_IRQS:16 nr_irqs:16 16
Architected cp15 timer(s) running at 10.00MHz (virt).
clocksource: arch_sys_counter: mask: 0xff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
sched_clock: 56 bits at 10MHz, resolution 100ns, wraps every 4398046511100ns
Switching to timer-based delay loop, resolution 100ns
Console: colour dummy device 80x30
console [tty0] enabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 20.00 BogoMIPS (lpj=10)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
CPU: Testing write buffer coherency: ok
CPU0: update cpu_capacity 1024
CPU0: thread -1, cpu 0, socket 0, mpidr 8000
Setting up static identity map for 0x40009000 - 0x40009058
Unable to boot CPU1 when MD21 is set
CPU1: failed to boot: -524
Brought up 1 CPUs
SMP: Total of 1 processors activated (20.00 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
devtmpfs: initialized
VFP support v0.3: implementor 41 architecture 4 part 30 variant f rev 0
clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 1911260446275 ns
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
renesas_irqc e61c.interrupt-controller: driving 10 irqs
sh-pfc e606.pfc: r8a77910_pfc support registered
No ATAGs?
hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 8 bytes.
IRQ2 is asserted, installing da9063/da9210 regulator quirk
gpio-regulator regulator@1: Could not obtain regulator setting GPIOs: -517
gpio-regulator regulator@3: Could not obtain regulator setting GPIOs: -517
gpio-regulator regulator@5: Could not obtain regulator setting GPIOs: -517
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
i2c 6-0058: Masking da9063 interrupt sources
i2c 6-0068: Masking da9210 interrupt sources
i2c 6-0068: IRQ2 is not asserted, removing quirk
i2c-sh_mobile e60b.i2c: I2C adapter 6, bus speed 10 Hz
media: Linux media interface: v0.10
Linux video captur
Re: [linux-next] oops in ip_route_input_noref
On 17 September 2015 at 01:47, Sergey Senozhatsky wrote:
> On (09/16/15 07:07), David Ahern wrote:
>> Hi Sergey:
>>
>
> Hi,
>
> sorry for long reply. Baremetal. So grabbing the backtrace is
> a bit complicated. But it looks very close to what Richard Alpe
> has posted.

Hi,

in this boot log you will find a backtrace:

https://lava.collabora.co.uk/scheduler/job/67404/log_file

(ip_route_input_noref) from [] (ip_rcv+0x39c/0x6e8)
(ip_rcv) from [] (__netif_receive_skb_core+0x5ec/0x7c0)
(__netif_receive_skb_core) from [] (netif_receive_skb_internal+0x34/0xa4)
(netif_receive_skb_internal) from [] (napi_gro_receive+0x78/0xa4)
(napi_gro_receive) from [] (rtl8169_poll+0x2dc/0x5dc)
(rtl8169_poll) from [] (net_rx_action+0x1d4/0x2d0)
(net_rx_action) from [] (__do_softirq+0xfc/0x214)
(__do_softirq) from [] (irq_exit+0xb0/0x118)
(irq_exit) from [] (__handle_domain_irq+0x60/0xb4)
(__handle_domain_irq) from [] (gic_handle_irq+0x54/0x94)
(gic_handle_irq) from [] (__irq_svc+0x54/0x70)

This is on a jetson-tk1 booting a multi_v7_defconfig kernel. I expect
this issue to appear in today's kernelci.org boots.

I don't see this or any other boot error after applying David's patch.

Regards,

Tomeu

> in IRQ
>
> RIP is at ip_route_input_noref
>
> [0.877597] [] arp_process+0x39c/0x690
> [0.877597] [] arp_rcv+0x13e/0x170
>
>
> -ss
>
>
>> Is this with KVM or baremetal?
>>
>> -8<-
>> thanks for the analysis
>>
>> >>addr2line -e vmlinux -i 0x8146c0b1
>> >>net/ipv4/route.c:1815
>> >>net/ipv4/route.c:1905
>> >>
>> >>
>> >>which seems to be this line ip_route_input_noref()->ip_route_input_slow():
>> >>...
>> >>1813 rth->rt_is_input = 1;
>> >>1814 if (res.table)
>> >>1815 rth->rt_table_id = res.table->tb_id;
>> >>1816
>> >>...
>> >>
>> >>
>> >>added by b7503e0cdb5dbec5d201aa69dc14679b5ae8
>> >>
>> >> net: Add FIB table id to rtable
>> >>
>> >> Add the FIB table id to rtable to make the information available for
>> >> IPv4 as it is for IPv6.
>> >>
>> >>
>> >>-ss
>>
>> Hi Richard:
>>
>> >I too get an Oops in ip_route_input_noref(). It happens occasionally during
>> >bootup.
>> >KVM environment using virtio driver. Let me know if you need any additional
>> >info or
>> >if you want me to try to bisect it.
>> >
>> >Starting network...
>> >...
>> >[0.877040] BUG: unable to handle kernel NULL pointer dereference at
>> >0056
>> >[0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00
>>
>> Can you send me your kernel config and qemu command line? KVM with virtio
>> networking is a primary test vehicle, and I did not encounter this at all.
>>
>> Thanks,
>> David
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/3] can: Allwinner A10/A20 CAN Controller support - Devicetree bindings
On 09/16/2015 01:21 PM, Gerhard Bertelsmann wrote:
> Devicetree bindings for Allwinner A10/A20 CAN
>
> Signed-off-by: Gerhard Bertelsmann
> ---
>
> .../devicetree/bindings/net/can/sun4i_can.txt | 38 +
> 1 files changed, 389 insertions(+)
>
>
> diff --git a/Documentation/devicetree/bindings/net/can/sun4i_can.txt
> b/Documentation/devicetree/bindings/net/can/sun4i_can.txt
> new file mode 100644
> index 000..cd0f50c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/can/sun4i_can.txt
> @@ -0,0 +1,38 @@
> +Allwinner A10/A20 CAN controller Device Tree Bindings
> +-
> +
> +Required properties:
> +- compatible: "allwinner,sun4i-a10-can"
> +- reg: physical base address and size of the Allwinner A10/A20 CAN register
> map.
> +- interrupts: interrupt specifier for the sole interrupt.
> +- clock: phandle and clock specifier.
> +
> +
> +Example
> +---
> +
> +SoC common .dtsi file:
> +
> + can0_pins_a: can0@0 {
> + allwinner,pins = "PH20","PH21";
> + allwinner,function = "can";
> + allwinner,drive = <0>;
> + allwinner,pull = <0>;
> + };
> +...
> + can0: can@01c2bc00 {
> + compatible = "allwinner,sun4i-a10-can";
> + reg = <0x01c2bc00 0x400>;
> + interrupts = <0 26 4>;
> + clocks = <_gates 4>;
> + status = "disabled";
> + };

What about adding this snippet to the SoCs where the CAN core is available?
Maxime, what's the policy on sunxi?

If you give me an Ack I'd like to take the series via linux-can-next
(and to net-next) upstream.

Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[PATCH v2 1/5] net: add Hisilicon Network Subsystem support (config and documents)
The Hisilicon Network Subsystem is a long term evolution IP which is supposed to be used in Hisilicon ICT SoC. The IP, which is called hns for short, is a TCP/IP acceleration engine, which can directly decode TCP/IP stream and distribute them to different ring buffers. HNS can be configured to work on different mode for different scenario. This patch make use only some of the mode to make it as standard ethernet NIC. The other mode will be added soon. The whole function has 4 kernel sub-modules: hnae: the HNS acceleration engine framework. It provides a abstract interface between the engine and the upper layers which make use of the engine by ring buffer. hns_enet_drv: a standard ethernet driver that base on the ring buffer. hns_dsaf: one of the implementation of HNS acceleration engine, which is applied on Hililicon hip05, Hi1610 and other later-on SoCs hns_mdio: the mdio control to the PHY, used by acceleration engine This submit add basic config and documents Signed-off-by: huangdaodeSigned-off-by: Kenneth Lee Signed-off-by: Yisen Zhuang --- .../bindings/net/hisilicon-hip04-net.txt | 4 +- .../devicetree/bindings/net/hisilicon-hns-dsaf.txt | 49 ++ .../devicetree/bindings/net/hisilicon-hns-mdio.txt | 22 +++ .../devicetree/bindings/net/hisilicon-hns-nic.txt | 47 + arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi | 193 + 5 files changed, 313 insertions(+), 2 deletions(-) create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt create mode 100644 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt index 988fc69..d1df8a0 100644 --- a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt +++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt @@ 
-32,13 +32,13 @@ Required properties: Required properties: -- compatible: should be "hisilicon,hip04-mdio". +- compatible: should be "hisilicon,mdio". - Inherits from MDIO bus node binding [2] [2] Documentation/devicetree/bindings/net/phy.txt Example: mdio { - compatible = "hisilicon,hip04-mdio"; + compatible = "hisilicon,mdio"; reg = <0x28f1000 0x1000>; #address-cells = <1>; #size-cells = <0>; diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt new file mode 100644 index 000..80411b2 --- /dev/null +++ b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt @@ -0,0 +1,49 @@ +Hisilicon DSA Fabric device controller + +Required properties: +- compatible: should be "hisilicon,hns-dsaf-v1" or "hisilicon,hns-dsaf-v2". + "hisilicon,hns-dsaf-v1" is for hip05. + "hisilicon,hns-dsaf-v2" is for Hi1610 and Hi1612. +- dsa-name: dsa fabric name who provide this interface. + should be "dsafX", X is the dsaf id. +- mode: dsa fabric mode string. only support one of dsaf modes like these: + "2port-64vf", + "6port-16rss", + "6port-16vf". +- interrupt-parent: the interrupt parent of this device. +- interrupts: should contain the DSA Fabric and rcb interrupt. +- reg: specifies base physical address(es) and size of the device registers. + The first region is external interface control register base and size. + The second region is SerDes base register and size. + The third region is the PPE register base and size. + The fourth region is dsa fabric base register and size. + The fifth region is cpld base register and size, it is not required if do not use cpld. +- phy-handle: phy handle of physicl port, 0 if not any phy device. see ethernet.txt [1]. +- buf-size: rx buffer size, should be 16-1024. +- desc-num: number of description in TX and RX queue, should be 512, 1024, 2048 or 4096. 
+ +[1] Documentation/devicetree/bindings/net/phy.txt + +Example: + +dsa: dsa@c700 { + compatible = "hisilicon,hns-dsaf-v1"; + dsa_name = "dsaf0"; + mode = "6port-16rss"; + interrupt-parent = <_dsa>; + reg = <0x0 0xC000 0x0 0x42 + 0x0 0xC200 0x0 0x30 + 0x0 0xc500 0x0 0x89 + 0x0 0xc700 0x0 0x6>; + phy-handle = <0 0 0 0 _phy4 _phy5 0 0>; + interrupts = <131 4>,<132 4>, <133 4>,<134 4>, +<135 4>,<136 4>, <137 4>,<138 4>, +<139 4>,<140 4>, <141 4>,<142 4>, +<143 4>,<144 4>, <145 4>,<146 4>, +<147 4>,<148 4>, <384 1>,<385 1>, +<386 1>,<387 1>, <388 1>,<389 1>, +<390 1>,<391 1>,
[PATCH v2 5/5] net: add Hisilicon Network Subsystem basic ethernet support
This is to add basic ethernet support for HNS. It is one of the way to use the HNS acceleration engine. But most of the decoding/encoding capability of the AE cannot be used in this way. This submit contains the basic feature as a ethernet driver. More will be added later. Signed-off-by: huangdaodeSigned-off-by: Kenneth Lee Signed-off-by: Yisen Zhuang --- drivers/net/ethernet/hisilicon/Kconfig |8 + drivers/net/ethernet/hisilicon/hns/Makefile |3 + drivers/net/ethernet/hisilicon/hns/hns_enet.c| 1646 ++ drivers/net/ethernet/hisilicon/hns/hns_enet.h| 84 ++ drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 1230 5 files changed, 2971 insertions(+) create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig index aae2c47..165b5a8 100644 --- a/drivers/net/ethernet/hisilicon/Kconfig +++ b/drivers/net/ethernet/hisilicon/Kconfig @@ -55,4 +55,12 @@ config HNS_DSAF acceleration engine support. The engine is used in Hisilicon hip05, Hi1610 and further ICT SoC +config HNS_ENET + tristate "Hisilicon HNS Ethernet Device Support" + select PHYLIB + select HNS + ---help--- + This selects the general ethernet driver for HNS. 
This module make + use of any HNS AE driver, such as HNS_DSAF + endif # NET_VENDOR_HISILICON diff --git a/drivers/net/ethernet/hisilicon/hns/Makefile b/drivers/net/ethernet/hisilicon/hns/Makefile index 0516af7..6010c83 100644 --- a/drivers/net/ethernet/hisilicon/hns/Makefile +++ b/drivers/net/ethernet/hisilicon/hns/Makefile @@ -7,3 +7,6 @@ obj-$(CONFIG_HNS) += hnae.o obj-$(CONFIG_HNS_DSAF) += hns_dsaf.o hns_dsaf-objs = hns_ae_adapt.o hns_dsaf_gmac.o hns_dsaf_mac.o hns_dsaf_misc.o \ hns_dsaf_main.o hns_dsaf_ppe.o hns_dsaf_rcb.o hns_dsaf_xgmac.o + +obj-$(CONFIG_HNS_ENET) += hns_enet_drv.o +hns_enet_drv-objs = hns_enet.o hns_ethtool.o diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c new file mode 100644 index 000..0713ced --- /dev/null +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c @@ -0,0 +1,1646 @@ +/* + * Copyright (c) 2014-2015 Hisilicon Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "hnae.h" +#include "hns_enet.h" + +#define NIC_MAX_Q_PER_VF 16 +#define HNS_NIC_TX_TIMEOUT (5 * HZ) + +#define SERVICE_TIMER_HZ (1 * HZ) + +#define NIC_TX_CLEAN_MAX_NUM 256 +#define NIC_RX_CLEAN_MAX_NUM 64 + +#define RCB_ERR_PRINT_CYCLE 1000 + +#define RCB_IRQ_NOT_INITED 0 +#define RCB_IRQ_INITED 1 + +static void fill_desc(struct hnae_ring *ring, void *priv, + int size, dma_addr_t dma, int frag_end, + int buf_num, enum hns_desc_type type) +{ + struct hnae_desc *desc = >desc[ring->next_to_use]; + struct hnae_desc_cb *desc_cb = >desc_cb[ring->next_to_use]; + struct sk_buff *skb; + __be16 protocol; + u32 ip_offset; + u32 asid_bufnum_pid = 0; + u32 flag_ipoffset = 0; + + desc_cb->priv = priv; + desc_cb->length = size; + desc_cb->dma = dma; + desc_cb->type = type; + + desc->addr = cpu_to_le64(dma); + desc->tx.send_size = cpu_to_le16((u16)size); + + /*config bd buffer end */ + flag_ipoffset |= 1 << HNS_TXD_VLD_B; + + asid_bufnum_pid |= buf_num << HNS_TXD_BUFNUM_S; + + if (type == DESC_TYPE_SKB) { + skb = (struct sk_buff *)priv; + + if (skb->ip_summed == CHECKSUM_PARTIAL) { + protocol = skb->protocol; + ip_offset = ETH_HLEN; + + /*if it is a SW VLAN check the next protocol*/ + if (protocol == htons(ETH_P_8021Q)) { + ip_offset += VLAN_HLEN; + protocol = vlan_get_protocol(skb); + skb->protocol = protocol; + } + + if (skb->protocol == htons(ETH_P_IP)) { + flag_ipoffset |= 1 << HNS_TXD_L3CS_B; + /* check for tcp/udp header */ + flag_ipoffset |= 1 << HNS_TXD_L4CS_B; + + } else if (skb->protocol == htons(ETH_P_IPV6)) { + /*
Re: [PATCH net-next v2] net: Initialize table in fib result
On 2015-09-16 18:19, Nikolay Aleksandrov wrote:
> The root cause is use of res.table uninitialized.
>>
>> Thanks to Nikolay for noticing the uninitialized use amongst the maze of
>> gotos.
>>
>> As Nikolay pointed out the second initialization is not required to fix
>> the oops, but rather to fix a related problem where a valid lookup should
>> be invalidated before creating the rth entry.
>>
>> Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable")
>> Reported-by: Sergey Senozhatsky
>> Reported-by: Richard Alpe

Works for me as well. Thanks!
(Tested-by: Richard Alpe)

Regards
Richard
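
For readers following the thread, the failure mode described above can be reduced to a small userspace sketch. Everything here is a hypothetical simplification: the `*_sketch` structs and `route_input_sketch()` are illustrative stand-ins, not the kernel's definitions, and `skip_lookup` merely models a goto that jumps past the FIB lookup. It only shows why giving `res.table` a defined value up front makes the later `if (res.table)` check meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fib_table_sketch { uint32_t tb_id; };
struct fib_result_sketch { struct fib_table_sketch *table; };
struct rtable_sketch { int rt_is_input; uint32_t rt_table_id; };

/*
 * Mimics the shape of the input-route slow path: some callers
 * (skip_lookup != 0) jump past the lookup straight to the code that
 * fills in the route entry.
 */
static void route_input_sketch(struct rtable_sketch *rth,
                               struct fib_table_sketch *found,
                               int skip_lookup)
{
    struct fib_result_sketch res;

    /* The fix: res.table gets a defined value before any goto. */
    res.table = NULL;

    if (skip_lookup)
        goto fill;

    res.table = found; /* stands in for a successful lookup */

fill:
    rth->rt_is_input = 1;
    rth->rt_table_id = 0;
    if (res.table) /* safe only because res.table was initialized */
        rth->rt_table_id = res.table->tb_id;
}
```

Without the `res.table = NULL;` line, the `skip_lookup` path would test and possibly dereference whatever happened to be on the stack, which matches the intermittent nature of the reports quoted in this thread.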
[PATCH v2 0/5] net: Hisilicon Network Subsystem support
This is V2 of the Hisilicon Network Subsystem (HNS) patchset, taking care of
the LKML review comments. The changes are listed in the change log below.

This patchset is rebased on the mainline kernel Linux 4.3-rc1 branch.

[PATCH v2 1/5] Device Tree Binding Documentation
[PATCH v2 2/5] Merge MDIO Module
[PATCH v2 3/5] Hisilicon Network Acceleration Engine Framework
[PATCH v2 4/5] Distributed System Area Fabric Module
[PATCH v2 5/5] Basic Ethernet Driver Module

Changes from V1:
1. Remove "inline" in C files (according to LKML comment, same below).
2. Fix a bug about class_find_device.
3. Change the DTS pattern on hnae, restructure it to be compatible with the Hi1610 SoC.
4. Unify hip04_mdio and hip05_mdio into hns_mdio, which is more usual for later SoCs.

V1 Patches Reference: https://lkml.org/lkml/2015/8/14/165

Thanks

huangdaode (5):
net: add Hisilicon Network Subsystem support (config and documents)
net: add Hisilicon Network Subsystem MDIO support
net: add Hisilicon Network Subsystem hnae framework support
net: add Hisilicon Network Subsystem DSAF support
net: add Hisilicon Network Subsystem basic ethernet support

.../bindings/net/hisilicon-hip04-net.txt | 4 +-
.../devicetree/bindings/net/hisilicon-hns-dsaf.txt | 49 +
.../devicetree/bindings/net/hisilicon-hns-mdio.txt | 22 +
.../devicetree/bindings/net/hisilicon-hns-nic.txt | 47 +
arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi | 193 ++
drivers/net/ethernet/hisilicon/Kconfig | 34 +-
drivers/net/ethernet/hisilicon/Makefile | 4 +-
drivers/net/ethernet/hisilicon/hip04_mdio.c | 185 --
drivers/net/ethernet/hisilicon/hns/Makefile | 12 +
drivers/net/ethernet/hisilicon/hns/hnae.c | 507
drivers/net/ethernet/hisilicon/hns/hnae.h | 583 +
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c | 777 +++
drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c | 704 ++
drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.h | 45 +
drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 900 +++
drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h | 456
drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 2445 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h | 427 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 317 +++ drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.h | 43 + drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 583 + drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h | 105 + drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 1023 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 137 ++ drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h | 972 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c| 836 +++ .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.h| 15 + drivers/net/ethernet/hisilicon/hns/hns_enet.c | 1646 + drivers/net/ethernet/hisilicon/hns/hns_enet.h | 84 + drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 1230 ++ drivers/net/ethernet/hisilicon/hns_mdio.c | 520 + 31 files changed, 14716 insertions(+), 189 deletions(-) create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt create mode 100644 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi delete mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c create mode 100644 drivers/net/ethernet/hisilicon/hns/Makefile create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h create mode 100644 
drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.h create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.c create mode 100644
[PATCH v2 2/5] net: add Hisilicon Network Subsystem MDIO support
The MDIO support for Hisilicon Network Subsystem. It is used in Hislicon hip04, hip05 and Hi1610 SoC to control the external PHY Signed-off-by: huangdaodeSigned-off-by: Yisen Zhuang Signed-off-by: Kenneth Lee --- drivers/net/ethernet/hisilicon/Kconfig | 10 +- drivers/net/ethernet/hisilicon/Makefile | 3 +- drivers/net/ethernet/hisilicon/hip04_mdio.c | 185 -- drivers/net/ethernet/hisilicon/hns_mdio.c | 520 4 files changed, 531 insertions(+), 187 deletions(-) delete mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c create mode 100644 drivers/net/ethernet/hisilicon/hns_mdio.c diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig index dead17b..9184f1d 100644 --- a/drivers/net/ethernet/hisilicon/Kconfig +++ b/drivers/net/ethernet/hisilicon/Kconfig @@ -5,7 +5,7 @@ config NET_VENDOR_HISILICON bool "Hisilicon devices" default y - depends on ARM + depends on ARM || ARM64 ---help--- If you have a network (Ethernet) card belonging to this class, say Y. @@ -27,8 +27,16 @@ config HIP04_ETH select PHYLIB select MARVELL_PHY select MFD_SYSCON + select HNS_MDIO ---help--- If you wish to compile a kernel for a hardware with hisilicon p04 SoC and want to use the internal ethernet then you should answer Y to this. +config HNS_MDIO + tristate "Hisilicon HNS MDIO device Support" + select MDIO + ---help--- + This selects the HNS MDIO support. 
It is needed by HNS_DSAF to access + the PHY + endif # NET_VENDOR_HISILICON diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile index 6c14540..04b4b21 100644 --- a/drivers/net/ethernet/hisilicon/Makefile +++ b/drivers/net/ethernet/hisilicon/Makefile @@ -3,4 +3,5 @@ # obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o -obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o +obj-$(CONFIG_HIP04_ETH) += hip04_eth.o +obj-$(CONFIG_HNS_MDIO) += hns_mdio.o diff --git a/drivers/net/ethernet/hisilicon/hip04_mdio.c b/drivers/net/ethernet/hisilicon/hip04_mdio.c deleted file mode 100644 index fca0a5b..000 --- a/drivers/net/ethernet/hisilicon/hip04_mdio.c +++ /dev/null @@ -1,185 +0,0 @@ -/* Copyright (c) 2014 Linaro Ltd. - * Copyright (c) 2014 Hisilicon Limited. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. 
- */ - -#include -#include -#include -#include -#include - -#define MDIO_CMD_REG 0x0 -#define MDIO_ADDR_REG 0x4 -#define MDIO_WDATA_REG 0x8 -#define MDIO_RDATA_REG 0xc -#define MDIO_STA_REG 0x10 - -#define MDIO_START BIT(14) -#define MDIO_R_VALID BIT(1) -#define MDIO_READ (BIT(12) | BIT(11) | MDIO_START) -#define MDIO_WRITE (BIT(12) | BIT(10) | MDIO_START) - -struct hip04_mdio_priv { - void __iomem *base; -}; - -#define WAIT_TIMEOUT 10 -static int hip04_mdio_wait_ready(struct mii_bus *bus) -{ - struct hip04_mdio_priv *priv = bus->priv; - int i; - - for (i = 0; readl_relaxed(priv->base + MDIO_CMD_REG) & MDIO_START; i++) { - if (i == WAIT_TIMEOUT) - return -ETIMEDOUT; - msleep(20); - } - - return 0; -} - -static int hip04_mdio_read(struct mii_bus *bus, int mii_id, int regnum) -{ - struct hip04_mdio_priv *priv = bus->priv; - u32 val; - int ret; - - ret = hip04_mdio_wait_ready(bus); - if (ret < 0) - goto out; - - val = regnum | (mii_id << 5) | MDIO_READ; - writel_relaxed(val, priv->base + MDIO_CMD_REG); - - ret = hip04_mdio_wait_ready(bus); - if (ret < 0) - goto out; - - val = readl_relaxed(priv->base + MDIO_STA_REG); - if (val & MDIO_R_VALID) { - dev_err(bus->parent, "SMI bus read not valid\n"); - ret = -ENODEV; - goto out; - } - - val = readl_relaxed(priv->base + MDIO_RDATA_REG); - ret = val & 0x; -out: - return ret; -} - -static int hip04_mdio_write(struct mii_bus *bus, int mii_id, - int regnum, u16 value) -{ - struct hip04_mdio_priv *priv = bus->priv; - u32 val; - int ret; - - ret = hip04_mdio_wait_ready(bus); - if (ret < 0) - goto out; - - writel_relaxed(value, priv->base + MDIO_WDATA_REG); - val = regnum | (mii_id << 5) | MDIO_WRITE; - writel_relaxed(val, priv->base + MDIO_CMD_REG); -out: - return ret; -} - -static int hip04_mdio_reset(struct mii_bus *bus) -{ - int temp, i; - -
[PATCH v2 3/5] net: add Hisilicon Network Subsystem hnae framework support
HNAE (Hisilicon Network Acceleration Engine) is a framework to provide a unified ring buffer interface for Hisilicon Network Acceleration Engines. With the interface, upper layer can work as ethernet driver, ODP driver or other service driver on purpose. Signed-off-by: huangdaodeSigned-off-by: Kenneth Lee Signed-off-by: Yisen Zhuang --- drivers/net/ethernet/hisilicon/Kconfig | 7 + drivers/net/ethernet/hisilicon/Makefile | 1 + drivers/net/ethernet/hisilicon/hns/Makefile | 5 + drivers/net/ethernet/hisilicon/hns/hnae.c | 507 drivers/net/ethernet/hisilicon/hns/hnae.h | 583 5 files changed, 1103 insertions(+) create mode 100644 drivers/net/ethernet/hisilicon/hns/Makefile create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.c create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.h diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig index 9184f1d..85a2609 100644 --- a/drivers/net/ethernet/hisilicon/Kconfig +++ b/drivers/net/ethernet/hisilicon/Kconfig @@ -39,4 +39,11 @@ config HNS_MDIO This selects the HNS MDIO support. It is needed by HNS_DSAF to access the PHY +config HNS + tristate "Hisilicon Network Subsystem Support (Framework)" + ---help--- + This selects the framework support for Hisilicon Network Subsystem. 
It + is needed by any driver which provides HNS acceleration engine or make + use of the engine + endif # NET_VENDOR_HISILICON diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile index 04b4b21..390b71f 100644 --- a/drivers/net/ethernet/hisilicon/Makefile +++ b/drivers/net/ethernet/hisilicon/Makefile @@ -5,3 +5,4 @@ obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o obj-$(CONFIG_HIP04_ETH) += hip04_eth.o obj-$(CONFIG_HNS_MDIO) += hns_mdio.o +obj-$(CONFIG_HNS) += hns/ diff --git a/drivers/net/ethernet/hisilicon/hns/Makefile b/drivers/net/ethernet/hisilicon/hns/Makefile new file mode 100644 index 000..8a5f1e7 --- /dev/null +++ b/drivers/net/ethernet/hisilicon/hns/Makefile @@ -0,0 +1,5 @@ +# +# Makefile for the HISILICON network device drivers. +# + +obj-$(CONFIG_HNS) += hnae.o diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c b/drivers/net/ethernet/hisilicon/hns/hnae.c new file mode 100644 index 000..0a0a9e8 --- /dev/null +++ b/drivers/net/ethernet/hisilicon/hns/hnae.c @@ -0,0 +1,507 @@ +/* + * Copyright (c) 2014-2015 Hisilicon Limited. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ */ + +#include +#include +#include +#include + +#include "hnae.h" + +#define cls_to_ae_dev(dev) container_of(dev, struct hnae_ae_dev, cls_dev) + +static struct class *hnae_class; + +static void +hnae_list_add(spinlock_t *lock, struct list_head *node, struct list_head *head) +{ + unsigned long flags; + + spin_lock_irqsave(lock, flags); + list_add_tail_rcu(node, head); + spin_unlock_irqrestore(lock, flags); +} + +static void hnae_list_del(spinlock_t *lock, struct list_head *node) +{ + unsigned long flags; + + spin_lock_irqsave(lock, flags); + list_del_rcu(node); + spin_unlock_irqrestore(lock, flags); +} + +static int hnae_alloc_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb) +{ + unsigned int order = hnae_page_order(ring); + struct page *p = dev_alloc_pages(order); + + if (!p) + return -ENOMEM; + + cb->priv = p; + cb->page_offset = 0; + cb->reuse_flag = 0; + cb->buf = page_address(p); + cb->length = hnae_page_size(ring); + cb->type = DESC_TYPE_PAGE; + + return 0; +} + +static void hnae_free_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb) +{ + if (cb->type == DESC_TYPE_SKB) + dev_kfree_skb_any((struct sk_buff *)cb->priv); + else if (unlikely(is_rx_ring(ring))) + put_page((struct page *)cb->priv); + memset(cb, 0, sizeof(*cb)); +} + +static int hnae_map_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb) +{ + cb->dma = dma_map_page(ring_to_dev(ring), cb->priv, 0, + cb->length, ring_to_dma_dir(ring)); + + if (dma_mapping_error(ring_to_dev(ring), cb->dma)) + return -EIO; + + return 0; +} + +static void hnae_unmap_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb) +{ + if (cb->type == DESC_TYPE_SKB) + dma_unmap_single(ring_to_dev(ring), cb->dma, cb->length, +ring_to_dma_dir(ring)); + else + dma_unmap_page(ring_to_dev(ring), cb->dma, cb->length, +
RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
From: KY Srinivasan > Sent: 16 September 2015 23:58 ... > > I think we get that. The question is does the Remote NDIS header and > > packet info actually need to be a part of the header data? I would > > argue that it probably doesn't. > > > > So for example in netvsc_start_xmit it looks like you are calling > > init_page_array in order to populate a set of page buffers, but the > > first buffer for the Remote NDIS protocol is populated as a separate > > page and offset. As such it doesn't seem like it necessarily needs to > > be a part of the header data but could be maintained perhaps in a > > separate ring buffer, or perhaps just be a separate page that you break > > up to use for each header. > > You are right; the rndis header can be built as a separate fragment and sent. > Indeed this is what we were doing earlier - on the outgoing path we would > allocate > memory for the rndis header. My goal was to avoid this allocation on every > packet being > sent and I decided to use the headroom instead. If we can completely avoid > all memory > allocation for rndis header, it makes a significant perf difference: ... So just preallocate the header space as a fixed buffer for each ring entry (or tx frame). If you allocate a fixed buffer for each ring entry you may find there are performance gains from copying small fragments into the buffer instead of doing whatever mapping operations are required. David -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
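David's suggestion of a fixed header buffer per ring entry can be sketched in plain C. This is a userspace illustration under stated assumptions, not netvsc code: `tx_ring`, `tx_slot`, `ring_claim_header()`, `RING_SIZE` and `HDR_ROOM` are hypothetical names, and the sizes are arbitrary. The point it shows is that each ring entry owns header space that is set up once, so the transmit path does no per-packet allocation for the header.

```c
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8	/* hypothetical number of tx ring entries */
#define HDR_ROOM  128	/* hypothetical fixed header space per entry */

struct tx_slot {
	unsigned char hdr[HDR_ROOM];	/* preallocated header buffer */
	size_t hdr_len;			/* bytes of hdr[] currently in use */
};

struct tx_ring {
	struct tx_slot slots[RING_SIZE];
	unsigned int next;		/* index of the next slot to hand out */
};

/* Copy a link-level header into the next slot's fixed buffer.
 * Returns the slot, or NULL if the header does not fit.
 * No allocation happens here; a real driver would also have to
 * track which slots are still owned by the hardware. */
struct tx_slot *ring_claim_header(struct tx_ring *ring,
				  const void *hdr, size_t len)
{
	struct tx_slot *slot;

	if (len > HDR_ROOM)
		return NULL;

	slot = &ring->slots[ring->next];
	ring->next = (ring->next + 1) % RING_SIZE;

	memcpy(slot->hdr, hdr, len);
	slot->hdr_len = len;
	return slot;
}
```

Small payload fragments could be copied into the same fixed buffer after the header, which is the copy-instead-of-map trade-off mentioned in the message.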
Re: list of all network namespaces
On Wed, 16 Sep 2015 17:54:34 -0700, Rick Jones wrote:
> On 09/16/2015 05:46 PM, Ani Sinha wrote:
> > just a stupid question. Is it possible to get a list of all active
> > network namespaces in the kernel through /proc or some other
> > interface?

Not reliably and not efficiently. You can look at what plotnetcfg does:
https://github.com/jbenc/plotnetcfg/blob/master/netns.c

> Presumably you could copy what "ip netns" does, which appears to be to
> look in /var/run/netns . At least that is what an strace of that
> command suggests.

That only works for namespaces added by the ip tool (and presumably a few
other tools which leave a symlink in /var/run/netns as a courtesy).
Depending on what you need, it may be enough. Be aware that you won't
find all net namespaces in the system this way, though.

 Jiri

-- 
Jiri Benc
Re: list of all network namespaces
On 17/09/2015 02:54, Rick Jones wrote:
> On 09/16/2015 05:46 PM, Ani Sinha wrote:
>> Hi guys
>>
>> just a stupid question. Is it possible to get a list of all active
>> network namespaces in the kernel through /proc or some other
>> interface?
>
> Presumably you could copy what "ip netns" does, which appears to be to
> look in /var/run/netns . At least that is what an strace of that
> command suggests.

This will only list netns referenced in '/var/run/netns', which is not
'all' existing netns (most probably only netns created by iproute2).

Regards,
Nicolas
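One partial alternative to reading /var/run/netns is to walk /proc and look at each task's ns/net link. The sketch below assumes Linux with /proc mounted; `collect_netns_inodes()` is a hypothetical helper name. It only sees namespaces that have at least one process the caller is allowed to inspect, so it is subject to the same "not reliably and not efficiently" caveat raised in this thread.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Collect the distinct network-namespace inode numbers reachable through
 * /proc/<pid>/ns/net. Returns how many were stored in out[] (at most max).
 * Namespaces kept alive only by a bind mount or an open file descriptor
 * are NOT found this way. */
size_t collect_netns_inodes(ino_t *out, size_t max)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	size_t n = 0;

	if (!proc)
		return 0;

	while (n < max && (de = readdir(proc)) != NULL) {
		char path[300];
		struct stat st;
		size_t i;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;	/* not a PID directory */

		snprintf(path, sizeof(path), "/proc/%s/ns/net", de->d_name);
		if (stat(path, &st) != 0)
			continue;	/* task exited or permission denied */

		for (i = 0; i < n; i++)
			if (out[i] == st.st_ino)
				break;
		if (i == n)
			out[n++] = st.st_ino;	/* first time this netns is seen */
	}

	closedir(proc);
	return n;
}
```

A caller would pass an array and print the returned inode numbers; running it as root sees more tasks, and therefore potentially more namespaces, than running it unprivileged.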
RE: [PATCH 27/31] net/tipc: use kmemdup rather than duplicating its implementation
Acked-by: Jon Maloy///jon > -Original Message- > From: Andrzej Hajda [mailto:a.ha...@samsung.com] > Sent: Wednesday, 16 September, 2015 06:07 > To: Jon Maloy; Ying Xue > Cc: Bartlomiej Zolnierkiewicz; Marek Szyprowski; linux- > ker...@vger.kernel.org; David S. Miller; netdev@vger.kernel.org > Subject: Re: [PATCH 27/31] net/tipc: use kmemdup rather than duplicating its > implementation > > Ping. > > Regards > Andrzej > > On 08/07/2015 09:59 AM, Andrzej Hajda wrote: > > The patch was generated using fixed coccinelle semantic patch > > scripts/coccinelle/api/memdup.cocci [1]. > > > > [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320 > > > > Signed-off-by: Andrzej Hajda > > --- > > net/tipc/server.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/net/tipc/server.c b/net/tipc/server.c index > > 922e04a..c187cad 100644 > > --- a/net/tipc/server.c > > +++ b/net/tipc/server.c > > @@ -411,13 +411,12 @@ static struct outqueue_entry > *tipc_alloc_entry(void *data, int len) > > if (!entry) > > return NULL; > > > > - buf = kmalloc(len, GFP_ATOMIC); > > + buf = kmemdup(data, len, GFP_ATOMIC); > > if (!buf) { > > kfree(entry); > > return NULL; > > } > > > > - memcpy(buf, data, len); > > entry->iov.iov_base = buf; > > entry->iov.iov_len = len; > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
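The change in this patch replaces a kmalloc() followed by memcpy() with a single kmemdup() call. A userspace sketch of the same pattern is shown below; `memdup_sketch()` is a hypothetical stand-in built on malloc(), not the kernel function, and it takes no GFP flags.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kernel's kmemdup(): allocate len bytes
 * and copy src into them. Returns NULL on allocation failure; the
 * caller releases the result with free(). */
void *memdup_sketch(const void *src, size_t len)
{
	void *buf = malloc(len);

	if (buf)
		memcpy(buf, src, len);
	return buf;
}
```

As in the patched tipc_alloc_entry(), the caller needs only one NULL check, and the copy can no longer be forgotten or given a mismatched length.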
Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches
Hi Florian, On 09/16/2015 11:21 PM, Florian Westphal wrote: > Daniel Mackwrote: >> I'm re-addressing the issue of matching socket meta information for >> non-established sockets that has been discussed a while ago: >> >> >> http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877 >> >> Being able to reliably match on net_cls cgroup ids is crucial in >> order to build a per-application or per-container firewall rules >> which don't leak ingress packets. Such a feature would be very >> useful to have. > > Could you clarify what 'which don't leak ingress packets' means? Well, currently, the existing cgroups matches only filter packets that are sent to an established socket. All other packets are ignored. So when users install such matches as advertised by the documented examples, and the chain policy is permissive, the firewall 'leaks' packets, which is unexpected. >> The patch set is obviously not yet finished, because a lot more >> protocol handlers need to be patched. Right now, I only addressed >> tcp_ipv4. Before I do that, I want to get some feedback on the >> approach, so please let me know what you think. > > I think there are several issues. > > implementation problems: > - i'm not sure its legal to call the hook input with skb->sk locked, > some matches might want to aquire it. In the code as it stands after my patch set, I don't see where skb->sk is locked? After all, skb->sk is NULL, even on the 2nd iteration, which is why I patched the newly looked up socket to be available in the nf hook. > - what makes NFT_META_CGROUP special? (or was that just an example?) It's what I want to get working, but other 'meta' hooks can be made working in a similar fashion. > design issues: > The assumption seems to be that a given skb can always be mapped to a > particular socket, and hence a cgroup. > > Thats not necessarily the case, e.g. with broad-/multicasting or when > the socket is e.g. in timewait state. Yes, that's true. 
The idea for multicast would be to just drop the cloned skb instead of delivering it to the final socket. > Some skbs will now travel INPUT hooks twice. > > And once you'd extend this so that we re-invoke nf hooks for mcast > packets, for each socket they've been received on, you change netfilter > behaviour again (one skb, one traversal -> n traversals of ruleset, one > for each sk). > > I think that this makes it a non-starter, sorry. Hmm, I see your point. > I would much rather see nft_demux_{udp,tcp,sctp,dccp,...}.c which moves > early-demux-esque code into the nft ruleset. > > Then you could do something like > > nft add rule ip filter input meta l4proto tcp demux meta cgroup 42 Ok, but how would that be different from the unconditional demuxing patches we've kicked around earlier, especially when it comes to multicast sockets? Could you explain what you have in mind here? > The caveat being that even in this case we cannot guarantee > that skb->sk is set afterwards, or that a cgroup can be derived from it. > > Iff you absolutely need this, I'd seriously entertain the idea of adding > NFPROTO_L4_TCP, etc, ... or, maybe better, allow to attach nft ruleset > as a socket filter. That would be a new netfilter hook then, something that is called after LOCAL_IN, for ingress only? In a sense, it would be called from the protocol handlers, just as my patches do right now, but instead of conditionally re-iterating the same rules again, we would walk a different chain? > But really, at that point, a much better question would be wheter net > cgroups are the answer to whatever the question was, or what problem we > are attempting to address here... The idea is simply to have a packet filter which is based on information derived from the task that sends or will eventually handle the packet. IOW: We want to be able to install netfilter rules that apply to all packets received or sent by tasks that match a certain criteria, without modifying the sources of those tasks. 
As we already have net_cls hooked up in netfilter rules, it seems easiest to just get this working. But with the multiple approaches we already had, it appears the real fix needs more thinking. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 1/3] can: Allwinner A10/A20 CAN Controller support - Devicetree bindings
On Wed, Sep 16, 2015 at 01:21:19PM +0200, Gerhard Bertelsmann wrote:
> Devicetree bindings for Allwinner A10/A20 CAN
>
> Signed-off-by: Gerhard Bertelsmann

Acked-by: Maxime Ripard

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
Re: [PATCH v8 1/3] can: Allwinner A10/A20 CAN Controller support - Devicetree bindings
On Thu, Sep 17, 2015 at 10:04:56AM +0200, Marc Kleine-Budde wrote: > On 09/16/2015 01:21 PM, Gerhard Bertelsmann wrote: > > Devicetree bindings for Allwinner A10/A20 CAN > > > > Signed-off-by: Gerhard Bertelsmann> > --- > > > > .../devicetree/bindings/net/can/sun4i_can.txt | 38 + > > 1 files changed, 389 insertions(+) > > > > > > diff --git a/Documentation/devicetree/bindings/net/can/sun4i_can.txt > > b/Documentation/devicetree/bindings/net/can/sun4i_can.txt > > new file mode 100644 > > index 000..cd0f50c > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/net/can/sun4i_can.txt > > @@ -0,0 +1,38 @@ > > +Allwinner A10/A20 CAN controller Device Tree Bindings > > +- > > + > > +Required properties: > > +- compatible: "allwinner,sun4i-a10-can" > > +- reg: physical base address and size of the Allwinner A10/A20 CAN > > register map. > > +- interrupts: interrupt specifier for the sole interrupt. > > +- clock: phandle and clock specifier. > > + > > + > > +Example > > +--- > > + > > +SoC common .dtsi file: > > + > > + can0_pins_a: can0@0 { > > + allwinner,pins = "PH20","PH21"; > > + allwinner,function = "can"; > > + allwinner,drive = <0>; > > + allwinner,pull = <0>; > > + }; > > +... > > + can0: can@01c2bc00 { > > + compatible = "allwinner,sun4i-a10-can"; > > + reg = <0x01c2bc00 0x400>; > > + interrupts = <0 26 4>; > > + clocks = <_gates 4>; > > + status = "disabled"; > > + }; > > What about adding this snippet to SoC where the CAN core is available? > Maxime, what's the policy on sinxi? It would be great, but it can come as a second step. > If you give me an Ack I'd like to take the series via linux-can-next > (and to net-next) upstream. I just did so for this patch, I'll review the driver when I'll have a bit of time. Thanks! Maxime -- Maxime Ripard, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com signature.asc Description: Digital signature
Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared
On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote: > > [...] > > [308309.574551] 8139cp :00:0b.0 eth1: Transmit timeout, status > c 2b0 80ff > > Rx and Tx are enabled. > > Instant (untested) hack below. Thanks; I'll try that. In fact since updating to 4.2 the problem has got worse — now the whole machine dies: [ 232.064630] [ cut here ] [ 232.069282] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x1e5/0x200() [ 232.077840] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out [ 232.084380] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586 [ 232.157787] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-gx+ #25 [ 232.163982] c10313eb dee95000 ff32 0258 c1031446 0009 [ 232.171988] dec0bf74 c13c3afc dec0bf8c c1272ef5 c13bfe82 012f c13c3afc dee95000 [ 232.179978] e04cfd3c dee95000 dee95240 0258 8100 c1272d10 dee95000 [ 232.188012] Call Trace: [ 232.190482] [] ? warn_slowpath_common+0x5b/0x90 [ 232.196063] [] ? warn_slowpath_fmt+0x26/0x30 [ 232.201307] [] ? dev_watchdog+0x1e5/0x200 [ 232.206317] [] ? qdisc_rcu_free+0x30/0x30 [ 232.211307] [] ? call_timer_fn.isra.7+0xe/0x60 [ 232.216811] [] ? qdisc_rcu_free+0x30/0x30 [ 232.221794] [] ? run_timer_softirq+0xfd/0x1b0 [ 232.227221] [] ? 
__do_softirq+0xa7/0x190 [ 232.232117] [] ? __hrtimer_tasklet_trampoline+0x20/0x20 [ 232.238395] [] ? do_softirq_own_stack+0x1b/0x20 [ 232.243881][] ? do_IRQ+0x35/0xa0 [ 232.248904] [] ? common_interrupt+0x29/0x30 [ 232.254141] [] ? put_unbound_pool+0x17b/0x1a0 [ 232.259470] [] ? default_idle+0x2/0x10 [ 232.264213] [] ? arch_cpu_idle+0x6/0x10 [ 232.269026] [] ? cpu_startup_entry+0xf5/0x190 [ 232.274459] [] ? start_kernel+0x2e5/0x2e8 [ 232.279432] ---[ end trace 30ae4e701c36b431 ]--- [ 232.284167] 8139cp :00:0b.0 eth1: Transmit timeout, status c 2b1 80ac [ 260.106382] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:0] [ 260.113515] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586 [ 260.116369] CPU: 0 PID: 0 Comm: swapper Tainted: GW 4.2.0-gx+ #25 [ 260.116369] task: c13f7540 ti: c13f task.ti: c13f [ 260.116369] EIP: 0060:[] EFLAGS: 00200292 CPU: 0 [ 260.116369] EIP is at _raw_spin_unlock_irqrestore+0xa/0x10
Re: Possible netlink autobind regression
On 09/17/15 at 01:15pm, Herbert Xu wrote:
> On Wed, Sep 16, 2015 at 10:02:00PM -0700, Cong Wang wrote:
> >
> > This part doesn't look correct, seems it is checking if this is a kernel
> > netlink socket rather than if it is bound. But I am not sure...
>
> Good point. I've changed it so that bound is only set for non-kernel
> sockets.
>
> ---8<---
> netlink: Fix autobind race condition that leads to zero port ID
>
> The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
> Reset portid after netlink_insert failure") introduced a race
> condition where if two threads tried to autobind the same socket
> one of them may end up with a zero port ID.
>
> This patch reverts that commit and instead fixes it by introducing
> a separate "bound" variable to indicate whether a user-space socket
> has been bound.
>
> Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
> Reported-by: Tejun Heo
> Reported-by: Linus Torvalds
> Signed-off-by: Herbert Xu
> Reviewed-by: Cong Wang

Acked-by: Thomas Graf
Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared
On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote: > Instant (untested) hack below. That seems to trigger a lot, but ultimately doesn't help... [ 250.998980] 8139cp :00:0b.0 eth1: Timeout head=000b, tail=000a [ 252.637287] net_ratelimit: 5 callbacks suppressed [ 252.642022] 8139cp :00:0b.0 eth1: Timeout head=003f, tail=003e [ 252.973255] 8139cp :00:0b.0 eth1: Timeout head=0028, tail=0027 [ 253.911945] 8139cp :00:0b.0 eth1: Timeout head=0010, tail=000f [ 254.151013] 8139cp :00:0b.0 eth1: Timeout head=000e, tail=000d [ 255.551730] 8139cp :00:0b.0 eth1: Timeout head=0025, tail=0024 [ 255.568070] 8139cp :00:0b.0 eth1: Timeout head=0027, tail=0024 [ 255.575717] 8139cp :00:0b.0 eth1: Timeout head=002a, tail=0024 [ 255.583035] 8139cp :00:0b.0 eth1: Timeout head=002b, tail=0024 [ 255.590361] 8139cp :00:0b.0 eth1: Timeout head=002c, tail=0024 [ 255.598080] 8139cp :00:0b.0 eth1: Timeout head=002e, tail=0024 [ 267.066384] [ cut here ] [ 267.071053] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x1e5/0x200() [ 267.079526] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out [ 267.086051] Modules linked in: 8139cp sch_teql mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586 [last unloaded: 8139cp] [ 267.161698] CPU: 0 PID: 0 Comm: swapper Not 
tainted 4.2.0-gx+ #26 [ 267.167800] c10313eb ddc53000 fde1 0258 c1031446 0009 [ 267.171408] dec0bf74 c13c3afc dec0bf8c c1272ef5 c13bfe82 012f c13c3afc ddc53000 [ 267.183847] e06f9dec ddc53000 ddc53240 0258 8100 c1272d10 ddc53000 [ 267.191812] Call Trace: [ 267.194376] [] ? warn_slowpath_common+0x5b/0x90 [ 267.199874] [] ? warn_slowpath_fmt+0x26/0x30 [ 267.205200] [] ? dev_watchdog+0x1e5/0x200 [ 267.210179] [] ? qdisc_rcu_free+0x30/0x30 [ 267.215250] [] ? call_timer_fn.isra.7+0xe/0x60 [ 267.220661] [] ? qdisc_rcu_free+0x30/0x30 [ 267.225739] [] ? run_timer_softirq+0xfd/0x1b0 [ 267.231071] [] ? __do_softirq+0xa7/0x190 [ 267.236054] [] ? __hrtimer_tasklet_trampoline+0x20/0x20 [ 267.242274] [] ? do_softirq_own_stack+0x1b/0x20 [ 267.247768][] ? do_IRQ+0x35/0xa0 [ 267.252248] [] ? common_interrupt+0x29/0x30 [ 267.258062] [] ? put_unbound_pool+0x17b/0x1a0 [ 267.263391] [] ? default_idle+0x2/0x10 [ 267.268186] [] ? arch_cpu_idle+0x6/0x10 [ 267.272999] [] ? cpu_startup_entry+0xf5/0x190 [ 267.278410] [] ? start_kernel+0x2e5/0x2e8 [ 267.283378] ---[ end trace a08600e9030733fc ]--- [ 267.288100] cp_tx_timeout [ 267.290750] 8139cp :00:0b.0 eth1: Transmit timeout, status c 2b1 c0ac [ 267.298166] will lock... [ 267.300709] Handling tx timeout, flags 200286 [ 267.305281] Will wake queue... [ 267.308153] Will unlock... flags 200286 [ 292.120424] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:0] [ 292.127561] Modules linked in: 8139cp sch_teql mii iptable_nat pppoe
Re: [fw filter]: Broken! fw mark based tc class selection not working
On 09/14/15 18:04, Cong Wang wrote: That is exactly the original code. But it is not readable at all, at least I still missed it when I touched the tp->init() part. :( Having a boolean doesn't harm anything. The default should really be no head alloced (given that is the main use case). The other part you can make more readable. cheers, jamal
Re: Possible netlink autobind regression
Hello, Herbert.

On Thu, Sep 17, 2015 at 01:15:03PM +0800, Herbert Xu wrote:
> netlink: Fix autobind race condition that leads to zero port ID
>
> The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
> Reset portid after netlink_insert failure") introduced a race
> condition where if two threads tried to autobind the same socket
> one of them may end up with a zero port ID.
>
> This patch reverts that commit and instead fixes it by introducing
> a separate "bound" variable to indicate whether a user-space socket
> has been bound.
>
> Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
> Reported-by: Tejun Heo
> Reported-by: Linus Torvalds
> Signed-off-by: Herbert Xu
> Reviewed-by: Cong Wang

Maybe add that this led to a deadlock and add a Link tag to this thread?

> @@ -1083,10 +1083,12 @@ static int netlink_insert(struct sock *sk, u32 portid)
>  	if (err) {
>  		if (err == -EEXIST)
>  			err = -EADDRINUSE;
> -		nlk_sk(sk)->portid = 0;
>  		sock_put(sk);
> +		goto err;
>  	}
>
> +	nlk_sk(sk)->bound = !!portid;

!! isn't necessary and this creates ordering between two stores.
->bound must be visible only after ->portid is visible, so this should
be smp_store_release().

> @@ -2371,7 +2373,7 @@ static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
>  		dst_group = nlk->dst_group;
>  	}
>
> -	if (!nlk->portid) {
> +	if (!nlk->bound) {

And all unlocked reads should be smp_load_acquire().

>  		err = netlink_autobind(sock);
>  		if (err)
>  			goto out;

Thanks.

-- 
tejun
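The ordering requirement described in this review can be illustrated in userspace with C11 atomics. This is an analogue under stated assumptions, not the kernel patch: `sock_sketch`, `sketch_bind()` and `sketch_get_portid()` are hypothetical names, and `memory_order_release` / `memory_order_acquire` are assumed here as the closest userspace counterparts of smp_store_release() / smp_load_acquire(). Unlike the quoted patch, the sketch always sets the flag and does not distinguish kernel sockets.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct sock_sketch {
	uint32_t portid;	/* plain field, written before publication */
	atomic_bool bound;	/* set last, with release semantics */
};

/* Writer: store portid first, then publish it via a release store. */
void sketch_bind(struct sock_sketch *sk, uint32_t portid)
{
	sk->portid = portid;
	atomic_store_explicit(&sk->bound, true, memory_order_release);
}

/* Lockless reader: the acquire load pairs with the release store, so
 * seeing bound == true guarantees the portid written before it is
 * visible as well. Returns false if the socket is not bound yet. */
bool sketch_get_portid(struct sock_sketch *sk, uint32_t *portid)
{
	if (!atomic_load_explicit(&sk->bound, memory_order_acquire))
		return false;
	*portid = sk->portid;
	return true;
}
```

With a plain store and load of the flag, a reader on another CPU could in principle observe the flag as set while still reading a stale port ID of zero, which is the situation the review comment is guarding against.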
Re: [linux-next] oops in ip_route_input_noref
On Wed, Sep 16, 2015 at 09:04:15AM -0600, David Ahern wrote:
> On 9/16/15 9:00 AM, Fabio Estevam wrote:
> >On Wed, Sep 16, 2015 at 6:24 AM, Sergey Senozhatsky
> >wrote:
> >
> >>added by b7503e0cdb5dbec5d201aa69dc14679b5ae8
> >>
> >> net: Add FIB table id to rtable
> >>
> >> Add the FIB table id to rtable to make the information available for
> >> IPv4 as it is for IPv6.
> >
> >I see the same issue here when booting a mx25 ARM processor via NFS.
> >
> >defconfig is arch/arm/configs/imx_v4_v5_defconfig.
> >
>
> I am still not able to reproduce. While I work on a full Cumulus image for
> other test cases here's a patch to try; eagle eye Nikolay noted a potential
> use without init in the maze of goto's.
>
> Thanks,
> David

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index da427a4a33fe..80f7c5b7b832 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1712,6 +1712,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  		goto martian_source;
>
>  	res.fi = NULL;
> +	res.table = NULL;
>  	if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
>  		goto brd_input;
>
> @@ -1834,6 +1835,7 @@ out:	return err;
>  	RT_CACHE_STAT_INC(in_no_route);
>  	res.type = RTN_UNREACHABLE;
>  	res.fi = NULL;
> +	res.table = NULL;
>  	goto local_input;
>
>  	/*

I was seeing the same oops as Fabio (except that the faulting address was
0xb instead of 0x7) and after applying this patch I no longer see it:

Tested-by: Thierry Reding
Re: [PATCH iproute2] man ip-link: Fix wording in VLAN reorder_hdr explanation
On 16/09/15 17:55, Vadim Kochan wrote:
> Signed-off-by: Vadim Kochan
> ---
>  man/man8/ip-link.8.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
> index 1896eb6..4928249 100644
> --- a/man/man8/ip-link.8.in
> +++ b/man/man8/ip-link.8.in
> @@ -327,7 +327,7 @@ physical device (if this device does not support VLAN offloading), the similar
>  on the RX direction - by default the packet will be untagged before being
>  received by VLAN device. Reordering allows to accelerate tagging on egress and
>  to hide VLAN header on ingress so the packet looks like regular Ethernet packet,
> -at the same time it might be confusing while the packet sniffing as the VLAN header
> +at the same time it might be confusing for packet capture as the VLAN header
>  does not exist within the packet.
>
>  VLAN offloading can be checked by
>

Acked-by: Jeremy Harris
Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
David Miller writes:

> From: David Laight
> Date: Wed, 16 Sep 2015 16:25:03 +
>
>> Am I right in thinking this is adding an extra 96 unused bytes to the front
>> of almost all skb just so that hyper-v can make its link level header
>> contiguous with whatever follows (IP header ?).
>>
>> Doesn't sound ideal.
>
> Agreed, this is ridiculous, and the entire stack will incur this cost
> just because hyperv is enabled in the kernel config.

That's what 'RFC' in the subject was about :-) We already have a
precedent of increasing LL_MAX_HEADER globally because of a config
option (CONFIG_MAC80211_MESH) but Hyper-V needs more.

-- 
 Vitaly
Re: [PATCH net] ipv6: ip6_fragment: fix headroom tests and skb leak
On Wed, 2015-09-16 at 17:26 +0200, Florian Westphal wrote:
> I tested this e1000 driver hacked to not allocate additional headroom
> (we end up in slowpath, since LL_RESERVED_SPACE is 16).

And it works on the originally-offending setup too; thanks.

Tested-by: David Woodhouse

> Reported-by: David Woodhouse
> Diagnosed-by: David Woodhouse

They generally prefer me to use @intel.com for those too, if you would.
I draw the line at using it for actual email communication though :)

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation
RE: list of all network namespaces
Hi,

>Presumably you could copy what "ip netns" does, which appears to be to look in
>/var/run/netns . At least that is what an strace of that command suggests.

This is true, but keep in mind that the output of "ip netns", as well as
listing the contents of /var/run/netns, reflects only network namespaces
which were created with the "ip netns" command. The "ip netns" userspace
implementation consists of code which enables this, by creating
/var/run/netns, bind mounting it, etc. Network namespaces which were
created in other ways (like userspace applications using the clone()
system call) will *not* be reflected by either of them.

Regards,
Rami Rosen
Intel Corporation
pull request: bluetooth 2015-09-17
Hi Dave,

Here's one important patch for the 4.3-rc series that fixes an issue
with Bluetooth LE encryption failing because of a too early check for
the SMP context.

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 20471ed4d403a5f4de6aa0c10cd1e446f7f2b3c7:

  dccp: drop null test before destroy functions (2015-09-15 16:49:43 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git for-upstream

for you to fetch changes up to d8949aad3eab5d396f4fefcd581773bf07b9a79e:

  Bluetooth: Delay check for conn->smp in smp_conn_security() (2015-09-17 12:28:27 +0200)

Johan Hedberg (1):
      Bluetooth: Delay check for conn->smp in smp_conn_security()

 net/bluetooth/smp.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)
Re: [PATCH v2 net-next 1/2] cls_bpf: introduce integrated actions
On 09/16/15 02:05, Alexei Starovoitov wrote:
> From: Daniel Borkmann
>
> Often cls_bpf classifier is used with single action drop attached.
> Optimize this use case and let cls_bpf return both classid and action.
> For backwards compatibility reasons enable this feature under
> TCA_BPF_FLAG_ACT_DIRECT flag.

This is going off in a different direction really. You are replicating
the infrastructure inside bpf.

cheers,
jamal
[PATCH net-next] igb: assume MSI-X interrupts during initialization
In igb_sw_init() the sequence of calls was changed from

    igb_init_queue_configuration()
    igb_init_interrupt_scheme()
    igb_probe_vfs()

to

    igb_probe_vfs()
    igb_init_queue_configuration()
    igb_init_interrupt_scheme()

This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not get
enabled properly and we run into a NULL pointer if the max_vfs module
parameter is specified (adapter->vf_data does not get allocated, crash on
accessing the structure).

[7.419348] BUG: unable to handle kernel NULL pointer dereference at 0048
[7.419367] IP: [] igb_reset+0xe6/0x5d0 [igb]
[7.419370] PGD 0
[7.419373] Oops: 0002 [#1] SMP
[7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
[7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
[7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
[...]
[7.419431] Call Trace:
[7.419442] [] igb_probe+0x8b6/0x1340 [igb]
[7.419447] [] local_pci_probe+0x45/0xa0

Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
igb_probe_vfs(). The real interrupt capabilities will be checked during
igb_init_interrupt_scheme() so this is safe to do.

Signed-off-by: Stefan Assmann
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index e174fbb..ba019fc 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2986,6 +2986,9 @@ static int igb_sw_init(struct igb_adapter *adapter)
 	}
 #endif /* CONFIG_PCI_IOV */
 
+	/* Assume MSI-X interrupts, will be checked during IRQ allocation */
+	adapter->flags |= IGB_FLAG_HAS_MSIX;
+
 	igb_probe_vfs(adapter);
 
 	igb_init_queue_configuration(adapter);
-- 
2.4.3
Re: [PATCH next 16/30] ipv6: Only compute net once in ip6mr_forward2_finish
On 16/09/2015 03:04, Eric W. Biederman wrote:
> Signed-off-by: "Eric W. Biederman"
> ---
>  net/ipv6/ip6mr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index e95f6b6281de..3e3085b37a91 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -1987,9 +1987,10 @@ int ip6mr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg)
>
>  static inline int ip6mr_forward2_finish(struct sock *sk, struct sk_buff *skb)
>  {
> -	IP6_INC_STATS_BH(dev_net(skb_dst(skb)->dev), ip6_dst_idev(skb_dst(skb)),
> +	struct net *net = dev_net(skb_dst(skb)->dev);

nit: a blank line is needed after this declaration.

> +	IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
>  			 IPSTATS_MIB_OUTFORWDATAGRAMS);
> -	IP6_ADD_STATS_BH(dev_net(skb_dst(skb)->dev), ip6_dst_idev(skb_dst(skb)),
> +	IP6_ADD_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
>  			 IPSTATS_MIB_OUTOCTETS, skb->len);
>  	return dst_output(sk, skb);
>  }
[RFC v2 net-next 02/10] qed: Add basic L2 interface
From: Manish ChopraThis patch adds a public API for a network driver to work on top of QED. The interface itself is very minimal - it's mostly infrastructure, as the only content it has after this patch is a query for HW-based information required for the creation of a network interface [I.e., no actual protocol-specific configurations are supported]. Signed-off-by: Manish Chopra Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior --- drivers/net/ethernet/qlogic/qed/Makefile | 2 +- drivers/net/ethernet/qlogic/qed/qed.h | 14 ++ drivers/net/ethernet/qlogic/qed/qed_dev.c | 62 +++ drivers/net/ethernet/qlogic/qed/qed_hsi.h | 1 + drivers/net/ethernet/qlogic/qed/qed_l2.c | 87 ++ include/linux/qed/eth_common.h| 278 ++ include/linux/qed/qed_eth_if.h| 38 7 files changed, 481 insertions(+), 1 deletion(-) create mode 100644 drivers/net/ethernet/qlogic/qed/qed_l2.c create mode 100644 include/linux/qed/eth_common.h create mode 100644 include/linux/qed/qed_eth_if.h diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile index 5bbe0c7..dbe6938 100644 --- a/drivers/net/ethernet/qlogic/qed/Makefile +++ b/drivers/net/ethernet/qlogic/qed/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_QED) := qed.o -qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o +qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_l2.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index a44407c..ab87526 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -24,6 +24,7 @@ #include #include "qed_hsi.h" +extern const struct qed_common_ops qed_common_ops_pass; #define DRV_MODULE_VERSION "8.4.0.0" #define MAX_HWFNS_PER_DEVICE(4) @@ -94,13 +95,22 @@ struct qed_qm_iids { enum QED_RESOURCES { QED_SB, + QED_L2_QUEUE, QED_VPORT, + 
QED_RSS_ENG, QED_PQ, QED_RL, + QED_MAC, + QED_VLAN, QED_ILT, QED_MAX_RESC, }; +enum QED_FEATURE { + QED_PF_L2_QUE, + QED_MAX_FEATURES, +}; + struct qed_hw_info { /* PCI personality */ enum qed_pci_personalitypersonality; @@ -108,6 +118,7 @@ struct qed_hw_info { /* Resource Allocation scheme results */ u32 resc_start[QED_MAX_RESC]; u32 resc_num[QED_MAX_RESC]; + u32 feat_num[QED_MAX_FEATURES]; #define RESC_START(_p_hwfn, resc) ((_p_hwfn)->hw_info.resc_start[resc]) #define RESC_NUM(_p_hwfn, resc) ((_p_hwfn)->hw_info.resc_num[resc]) @@ -269,6 +280,9 @@ struct qed_hwfn { struct qed_mcp_info *mcp_info; + struct qed_hw_cid_data *p_tx_cids; + struct qed_hw_cid_data *p_rx_cids; + struct qed_dmae_infodmae_info; /* QM init */ diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 7769720..1053388d 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -94,6 +94,15 @@ void qed_resc_free(struct qed_dev *cdev) for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; + kfree(p_hwfn->p_tx_cids); + p_hwfn->p_tx_cids = NULL; + kfree(p_hwfn->p_rx_cids); + p_hwfn->p_rx_cids = NULL; + } + + for_each_hwfn(cdev, i) { + struct qed_hwfn *p_hwfn = >hwfns[i]; + qed_cxt_mngr_free(p_hwfn); qed_qm_info_free(p_hwfn); qed_spq_free(p_hwfn); @@ -204,6 +213,29 @@ int qed_resc_alloc(struct qed_dev *cdev) if (!cdev->fw_data) return -ENOMEM; + /* Allocate Memory for the Queue->CID mapping */ + for_each_hwfn(cdev, i) { + struct qed_hwfn *p_hwfn = >hwfns[i]; + int tx_size = sizeof(struct qed_hw_cid_data) * + RESC_NUM(p_hwfn, QED_L2_QUEUE); + int rx_size = sizeof(struct qed_hw_cid_data) * + RESC_NUM(p_hwfn, QED_L2_QUEUE); + + p_hwfn->p_tx_cids = kzalloc(tx_size, GFP_KERNEL); + if (!p_hwfn->p_tx_cids) { + DP_NOTICE(p_hwfn, + "Failed to allocate memory for Tx Cids\n"); + goto alloc_err; + } + + p_hwfn->p_rx_cids = kzalloc(rx_size, GFP_KERNEL); + if (!p_hwfn->p_rx_cids) { + DP_NOTICE(p_hwfn, +
Re: [PATCH v2 net-next 1/2] cls_bpf: introduce integrated actions
On 9/17/15 6:13 AM, Daniel Borkmann wrote:
> Hi Jamal,
>
> On 09/17/2015 02:37 PM, Jamal Hadi Salim wrote:
>> On 09/16/15 02:05, Alexei Starovoitov wrote:
>>> From: Daniel Borkmann
>>>
>>> Often cls_bpf classifier is used with single action drop attached.
>>> Optimize this use case and let cls_bpf return both classid and action.
>>> For backwards compatibility reasons enable this feature under
>>> TCA_BPF_FLAG_ACT_DIRECT flag.
>>
>> This is going off in a different direction really. You are replicating
>> the infrastructure inside bpf.
>
> Hmm, I don't really agree. With cls_bpf you have non-linear
> classifications as opposed to walking a chain of classifiers: worst
> case, I have to walk through N classifiers just to find out that the
> last one matches that I need to drop - this doesn't scale at all.
> Given that we can make this decision right here, we can use this fact
> and have simple return codes provided as well. It only supplements
> non-linear classification that was from the very beginning of cls_bpf
> a core part of it. I don't see the replication either.

Maybe the commit log was misread as bpf program now executes the
actions and bypasses tcf_exts_exec()? Well, that may be an interesting
idea for the future, but that's not what the patch is doing.

With this patch cls_bpf can return a single integer like
TC_ACT_SHOT/TC_ACT_OK, which gact/act_bpf can already do, as an
_optimization_ to avoid extra hops. To do full-fledged action chaining
the tcf_exts_exec() is used.
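To make the "single return code" point concrete, here is a minimal userspace sketch. It is not the cls_bpf code: only the TC_ACT_OK and TC_ACT_SHOT values come from linux/pkt_cls.h, while struct verdict, classify_direct() and the port-based rules are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Action codes with the values used in linux/pkt_cls.h. */
#define TC_ACT_OK   0
#define TC_ACT_SHOT 2

/* Hypothetical result type: a class id plus an action code. */
struct verdict {
	uint32_t classid;
	int action;
};

/*
 * Hypothetical one-pass classifier: the program that decides the class
 * also decides whether to drop, so no chain of N classifiers has to be
 * walked just to reach a final "drop" rule.
 */
static struct verdict classify_direct(uint16_t dport)
{
	struct verdict v = { .classid = 0, .action = TC_ACT_OK };

	if (dport == 23)		/* illustrative rule: drop telnet */
		v.action = TC_ACT_SHOT;
	else if (dport == 80)		/* illustrative rule: class 1:1 */
		v.classid = 0x10001;

	return v;
}
```

Anything beyond a plain verdict (action chaining) would still go through the regular action infrastructure, as described in the reply above.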
[PATCH] Replace get_seconds with ktime_get_seconds
Replace the time_t type and the get_seconds() function, which are not
y2038 safe on 32-bit systems. ktime_get_seconds() uses monotonic time
instead of real time and therefore will not overflow.

Signed-off-by: Ksenija Stanojevic
Reviewed-by: Arnd Bergmann
---
 net/rxrpc/ar-connection.c | 4 ++--
 net/rxrpc/ar-internal.h   | 4 ++--
 net/rxrpc/ar-transport.c  | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/rxrpc/ar-connection.c b/net/rxrpc/ar-connection.c
index 6631f4f..692b3e6 100644
--- a/net/rxrpc/ar-connection.c
+++ b/net/rxrpc/ar-connection.c
@@ -808,7 +808,7 @@ void rxrpc_put_connection(struct rxrpc_connection *conn)

 	ASSERTCMP(atomic_read(&conn->usage), >, 0);

-	conn->put_time = get_seconds();
+	conn->put_time = ktime_get_seconds();
 	if (atomic_dec_and_test(&conn->usage)) {
 		_debug("zombie");
 		rxrpc_queue_delayed_work(&rxrpc_connection_reap, 0);
@@ -852,7 +852,7 @@ static void rxrpc_connection_reaper(struct work_struct *work)

 	_enter("");

-	now = get_seconds();
+	now = ktime_get_seconds();
 	earliest = ULONG_MAX;

 	write_lock_bh(&rxrpc_connection_lock);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index aef1bd2..2934a73 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -208,7 +208,7 @@ struct rxrpc_transport {
 	struct rb_root		server_conns;	/* server connections on this transport */
 	struct list_head	link;		/* link in master session list */
 	struct sk_buff_head	error_queue;	/* error packets awaiting processing */
-	time_t			put_time;	/* time at which to reap */
+	unsigned long		put_time;	/* time at which to reap */
 	spinlock_t		client_lock;	/* client connection allocation lock */
 	rwlock_t		conn_lock;	/* lock for active/dead connections */
 	atomic_t		usage;
@@ -256,7 +256,7 @@ struct rxrpc_connection {
 	struct rxrpc_crypt	csum_iv;	/* packet checksum base */
 	unsigned long		events;
 #define RXRPC_CONN_CHALLENGE	0		/* send challenge packet */
-	time_t			put_time;	/* time at which to reap */
+	unsigned long		put_time;	/* time at which to reap */
 	rwlock_t		lock;		/* access lock */
 	spinlock_t		state_lock;	/* state-change lock */
 	atomic_t		usage;
diff --git a/net/rxrpc/ar-transport.c b/net/rxrpc/ar-transport.c
index 1976dec..9946467 100644
--- a/net/rxrpc/ar-transport.c
+++ b/net/rxrpc/ar-transport.c
@@ -189,7 +189,7 @@ void rxrpc_put_transport(struct rxrpc_transport *trans)

 	ASSERTCMP(atomic_read(&trans->usage), >, 0);

-	trans->put_time = get_seconds();
+	trans->put_time = ktime_get_seconds();
 	if (unlikely(atomic_dec_and_test(&trans->usage))) {
 		_debug("zombie");
 		/* let the reaper determine the timeout to avoid a race with
@@ -226,7 +226,7 @@ static void rxrpc_transport_reaper(struct work_struct *work)

 	_enter("");

-	now = get_seconds();
+	now = ktime_get_seconds();
 	earliest = ULONG_MAX;

 	/* extract all the transports that have been dead too long */
--
1.9.1
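As a rough illustration of why a monotonic seconds counter suits a reaper, here is a small userspace sketch. It is not the rxrpc code: reap_due() and its parameters are hypothetical, and the caller is assumed to take both put_time and now from the same monotonic clock. A monotonic counter starts near zero at boot, so even a 32-bit value only wraps after 2^32 seconds, roughly 136 years of uptime, and it never jumps when the wall clock is adjusted.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: an object was released at put_time (seconds on a
 * monotonic clock). Report whether it has been dead for at least expiry
 * seconds at time now. The unsigned subtraction is only meaningful
 * because a monotonic clock guarantees now is not before put_time.
 */
static bool reap_due(unsigned long put_time, unsigned long expiry,
		     unsigned long now)
{
	return now - put_time >= expiry;
}
```

With a wall-clock source the same comparison could misfire whenever the system time is set backwards, which is one reason the monotonic source is the safer choice for this kind of timeout.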
Re: [PATCH next 28/30] netfilter: Pass struct net into the netfilter hooks
On 16/09/2015 03:04, Eric W. Biederman wrote:
> Pass a network namespace parameter into the netfilter hooks. At the call
> site of the netfilter hooks the path a packet is taking through the
> network stack is well known, which allows the network namespace to be
> determined easily and reliably.
>
> This allows the replacement of magic code like
> "dev_net(state->in?:state->out)" that appears at the start of most
> netfilter hooks with "state->net".
>
> In almost all cases the network namespace passed in is derived from the
> first network device passed in, guaranteeing those paths will not see
> any changes in practice.
>
> The exceptions are:
> xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
> ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
> ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
> ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
> ipv6/ip6_output.c:ip6_xmit()                    sock_net(sk)
> ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
> ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
> br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
>
> In all cases these exceptions seem to be a better expression for the
> network namespace the packet is being processed in than the historic
> "dev_net(in?in:out)". I am documenting them in case something odd
> pops up and someone starts trying to track down what happened.
>
> Signed-off-by: "Eric W. Biederman"
> ---

[snip]

>  int br_forward_finish(struct sock *sk, struct sk_buff *skb)
>  {
> -	return NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING, sk, skb,
> -		       NULL, skb->dev,
> +	struct net *net = dev_net(skb->dev);

nit: blank line after the declaration

> +	return NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING,
> +		       net, sk, skb, NULL, skb->dev,
>  		       br_dev_queue_push_xmit);
>  }

[snip]

>  int xfrm4_output(struct sock *sk, struct sk_buff *skb)
>  {
> -	return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, sk, skb,
> -			    NULL, skb_dst(skb)->dev, __xfrm4_output,
> +	struct net *net = dev_net(skb_dst(skb)->dev);

nit: same here

> +	return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> +			    net, sk, skb, NULL, skb_dst(skb)->dev,
> +			    __xfrm4_output,
>  			    !(IPCB(skb)->flags & IPSKB_REROUTED));
>  }

[snip]

>  int xfrm6_output(struct sock *sk, struct sk_buff *skb)
>  {
> -	return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, sk, skb,
> -			    NULL, skb_dst(skb)->dev, __xfrm6_output,
> +	struct net *net = dev_net(skb_dst(skb)->dev);

nit: same here

> +	return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING,
> +			    net, sk, skb, NULL, skb_dst(skb)->dev,
> +			    __xfrm6_output,
>  			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
>  }

[snip]

>  int xfrm_output_resume(struct sk_buff *skb, int err)
>  {
> +	struct net *net = xs_net(skb_dst(skb)->xfrm);

nit: same here

>  	while (likely((err = xfrm_output_one(skb, err)) == 0)) {
>  		nf_reset(skb);
Re: [PATCH next 22/30] ipv6: Cache net in ip6_output
On 16/09/2015 03:04, Eric W. Biederman wrote:
> Keep net in a local variable so I can use it in NF_HOOK_COND
> when I pass struct net to all of the netfilter hooks.
>
> Signed-off-by: "Eric W. Biederman"
> ---
>  net/ipv6/ip6_output.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 12d0166a64cd..8cab909b181e 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -135,9 +135,9 @@ int ip6_output(struct sock *sk, struct sk_buff *skb)
>  {
>  	struct net_device *dev = skb_dst(skb)->dev;
>  	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
> +	struct net *net = dev_net(dev);

nit: same here for the blank line.

>  	if (unlikely(idev->cnf.disable_ipv6)) {
> -		IP6_INC_STATS(dev_net(dev), idev,
> -			      IPSTATS_MIB_OUTDISCARDS);
> +		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
>  		kfree_skb(skb);
>  		return 0;
>  	}
Re: [PATCH next 0/30] Passing net through the netfilter hooks
On 16/09/2015 02:59, Eric W. Biederman wrote:
> My primary goal with this patchset and its follow-ups is to clean up the
> network routing paths so that we do not look at the output device to
> derive the network namespace. My plan is to pass the network namespace
> of the transmitting socket through the output path, to replace code that
> looks at the output network device today.
>
> Once that is done we can have routes with output devices outside of the
> current network namespace. That should allow reception and transmission
> of packets in network namespaces to be as fast as normal packet
> reception and transmission with early demux disabled, because it will
> be the same code path.
>
> Once skb_dst(skb)->dev is a little better under control I think it will
> also be possible to use rcu to clean up the ancient hack that sets
> dst->dev to loopback_dev when a network device is removed.
>
> The work to get there is a series of code cleanups. I am starting with
> passing net into the netfilter hooks and into the functions that are
> called after the netfilter hooks. This removes from netfilter the need
> to guess which network namespace it is working on.
>
> To get there I perform a series of minor prep patches so the big changes
> at the end are possible to audit without getting lost in the noise. In
> particular I have a lot of patches computing net into a local variable
> and then using it throughout the function.
>
> So this patchset encompasses:
> - Removing dead code.
> - Sorting out the _sk functions that were added last time someone pushed
>   a prototype change through the post netfilter functions.
> - Cleaning up individual functions' use of the network namespace.
> - Passing net into the netfilter hooks.
> - Passing net into the post netfilter functions.
> - Using state->net in the netfilter code where it is available and
>   trivially usable.

LGTM (except some minor comments).

Acked-by: Nicolas Dichtel
Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches
Daniel Mackwrote: > Hi Florian, > > On 09/16/2015 11:21 PM, Florian Westphal wrote: > > Daniel Mack wrote: > >> I'm re-addressing the issue of matching socket meta information for > >> non-established sockets that has been discussed a while ago: > >> > >> > >> http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877 > >> > >> Being able to reliably match on net_cls cgroup ids is crucial in > >> order to build a per-application or per-container firewall rules > >> which don't leak ingress packets. Such a feature would be very > >> useful to have. > > > > Could you clarify what 'which don't leak ingress packets' means? > > Well, currently, the existing cgroups matches only filter packets that > are sent to an established socket. All other packets are ignored. So > when users install such matches as advertised by the documented > examples, and the chain policy is permissive, the firewall 'leaks' > packets, which is unexpected. Then 'the documentation' needs fixing. cgroup (and anything related to sk data, including uid, etc.) is not guaranteed to work. We can only match what is available in the packet payload, and some extra info that the stack can make available to us (e.g. VLAN id, or skb->sk in some cases on output) and conntrack state plus whatever extra data conntrack allows to attach. > >> The patch set is obviously not yet finished, because a lot more > >> protocol handlers need to be patched. Right now, I only addressed > >> tcp_ipv4. Before I do that, I want to get some feedback on the > >> approach, so please let me know what you think. > > > > I think there are several issues. > > > > implementation problems: > > - i'm not sure its legal to call the hook input with skb->sk locked, > > some matches might want to aquire it. > > In the code as it stands after my patch set, I don't see where skb->sk > is locked? True. [..] 
> > design issues: > > The assumption seems to be that a given skb can always be mapped to a > > particular socket, and hence a cgroup. > > > > Thats not necessarily the case, e.g. with broad-/multicasting or when > > the socket is e.g. in timewait state. > > Yes, that's true. The idea for multicast would be to just drop the > cloned skb instead of delivering it to the final socket. -v please. > > I would much rather see nft_demux_{udp,tcp,sctp,dccp,...}.c which moves > > early-demux-esque code into the nft ruleset. > > > > Then you could do something like > > > > nft add rule ip filter input meta l4proto tcp demux meta cgroup 42 > > Ok, but how would that be different from the unconditional demuxing > patches we've kicked around earlier, especially when it comes to > multicast sockets? Could you explain what you have in mind here? Two things: - keep it out of core network stack - make it explicit so we can document that 'demux' keyword is fishy and will not work reliably. F.e. I don't see how mcast could ever be made to work except by adding an entirely new filtering mechanism/new hooks in core stack. > > The caveat being that even in this case we cannot guarantee > > that skb->sk is set afterwards, or that a cgroup can be derived from it. > > > > Iff you absolutely need this, I'd seriously entertain the idea of adding > > NFPROTO_L4_TCP, etc, ... or, maybe better, allow to attach nft ruleset > > as a socket filter. > > That would be a new netfilter hook then, something that is called after > LOCAL_IN, for ingress only? In a sense, it would be called from the > protocol handlers, just as my patches do right now, but instead of > conditionally re-iterating the same rules again, we would walk a > different chain? Yes, something like that. Obviously, you'll need to dru^W brib^W convince a LOT of people before that could ever fly. I think we should not do this and that this 'match on ingress sk properties' is just bad[tm]. f.e. 
you'd also have to move all of the stuff you want into sock_common ... 8-( > > But really, at that point, a much better question would be wheter net > > cgroups are the answer to whatever the question was, or what problem we > > are attempting to address here... > > The idea is simply to have a packet filter which is based on information > derived from the task that sends or will eventually handle the packet. Starts to smell like snet (https://lwn.net/Articles/441587/) > IOW: We want to be able to install netfilter rules that apply to all > packets received or sent by tasks that match a certain criteria, without > modifying the sources of those tasks. Sorry, I think netfilter is wrong tool for this, at least for ingress. You could use conntrack to stash net_cls id in the connmark, though (for inbound reply packets). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net] tcp_cubic: do not set epoch_start in the future
From: Eric Dumazet

Tracking idle time in bictcp_cwnd_event() is imprecise, as epoch_start
is normally set at ACK processing time, not at send time.

Doing a proper fix would need to add an additional state variable,
and does not seem worth the trouble, given CUBIC bug has been there
forever before Jana noticed it.

Let's simply not set epoch_start in the future, otherwise
bictcp_update() could overflow and CUBIC would again
grow cwnd too fast.

This was detected thanks to a packetdrill test Neal wrote that was flaky
before applying this fix.

Fixes: 30927520dbae ("tcp_cubic: better follow cubic curve after idle period")
Signed-off-by: Eric Dumazet
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Cc: Jana Iyengar
---
 net/ipv4/tcp_cubic.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index c6ded6b2a79f..448c2615fece 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -154,14 +154,20 @@ static void bictcp_init(struct sock *sk)
 static void bictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event)
 {
 	if (event == CA_EVENT_TX_START) {
-		s32 delta = tcp_time_stamp - tcp_sk(sk)->lsndtime;
 		struct bictcp *ca = inet_csk_ca(sk);
+		u32 now = tcp_time_stamp;
+		s32 delta;
+
+		delta = now - tcp_sk(sk)->lsndtime;

 		/* We were application limited (idle) for a while.
 		 * Shift epoch_start to keep cwnd growth to cubic curve.
 		 */
-		if (ca->epoch_start && delta > 0)
+		if (ca->epoch_start && delta > 0) {
 			ca->epoch_start += delta;
+			if (after(ca->epoch_start, now))
+				ca->epoch_start = now;
+		}
 		return;
 	}
 }
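The clamping logic of this patch can be tried out in userspace with the sketch below. It is only an approximation under stated assumptions: shift_epoch() and time_after32() are hypothetical names, the timestamps are plain 32-bit tick counters standing in for tcp_time_stamp and lsndtime, and time_after32() imitates a wraparound-safe "after" comparison by looking at the sign of the 32-bit difference.

```c
#include <stdint.h>

/* Wraparound-safe "a is later than b" for 32-bit tick counters. */
static int time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Hypothetical stand-in for the patched branch: shift epoch_start by
 * the idle time, but never let it end up later than now.
 * An epoch_start of 0 means "not set" and is left alone.
 */
static uint32_t shift_epoch(uint32_t epoch_start, uint32_t now,
			    uint32_t lsndtime)
{
	int32_t delta = (int32_t)(now - lsndtime);

	if (epoch_start && delta > 0) {
		epoch_start += (uint32_t)delta;
		if (time_after32(epoch_start, now))
			epoch_start = now;
	}
	return epoch_start;
}
```

For example, with now = 1000 and lsndtime = 900, an epoch_start of 950 would be shifted to 1050 without the clamp; with it, the result is capped at 1000.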
Re: NFS/TCP/IPv6 acting strangely in 4.2
On Thu, 17 Sep 2015, Trond Myklebust wrote: > Hi Russell, > > On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: > > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux > > wrote: > > > Following that idea, I just tried the patch below, and it seems to > > > work. > > > I don't know whether it handles all cases after a call to > > > kernel_connect(), > > > but it stops the multiple connection attempts: > > > > > > 1 0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 > > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 > > > WS=128 > > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] > > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 > > > TSval=870318939 TSecr=15712 WS=64 > > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 > > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0x905379cc, [Check: RD LU MD XT DL] > > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 > > > Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0x905379cc, [Check: RD LU MD XT DL] > > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 > > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), > > > [Allowed: RD LU MD XT DL] > > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 > > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), > > > [Allowed: RD LU MD XT DL] > > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), > > > 
[Allowed: RD LU MD XT DL] > > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: > > > 0xe15fc9c9 > > > ... > > > > NFS people - any comments on this patch? Is it the correct way to > > solve > > this problem (please see the first message in this thread for the > > problem.) > > Without this patch, NFS is unusable as it tries to launch multiple > > new > > connections from the same port to the NFS server without giving the > > NFS > > server time to respond and establish the TCP connection. > > I agree that it addresses a real problem here, however there are a > couple of issues with the patch itself: > > AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and > TCP_CLOSE, so if the connection attempt fails, this patch leaves the > XPRT_CONNECTING flag set. > There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, > TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection > attempt by canceling the XPRT_CONNECTING state. > > How about the following? It is based on your patch, but adds a check to > ensure that xs_tcp_state_change() doesn't clear the 'connecting' state > more than once (which could otherwise still happen in the TCP_CLOSE > case). > > 8<--- > From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 > From: Trond Myklebust> Date: Wed, 16 Sep 2015 23:43:17 -0400 > Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete > before retrying > > Commit 718ba5b87343, moved the responsibility for unlocking the socket to > xs_tcp_setup_socket, meaning that the socket will be unlocked before we > know that it has finished trying to connect. The following patch is based on > an initial patch by Russell King to ensure that we delay clearing the > XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate > a connection attempt, or the connection attempt itself failed. 
> > Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from > racing") > Reported-by: Russell King > Signed-off-by: Trond Myklebust This fixes up my network segmentation problem, tested on top of your "Fix races between socket connection and destroy code". Tested-by: Benjamin Coddington Ben > --- > include/linux/sunrpc/xprtsock.h | 3 +++ > net/sunrpc/xprtsock.c | 11 --- > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h > index 7591788e9fbf..357e44c1a46b 100644 > --- a/include/linux/sunrpc/xprtsock.h > +++ b/include/linux/sunrpc/xprtsock.h > @@ -42,6 +42,7 @@ struct sock_xprt { > /* >* Connection of transports >*/ > + unsigned long sock_state; > struct delayed_work connect_worker; >
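The quoted patch is cut off above, so its actual code is not visible here. Purely as an illustration of the "don't clear the connecting state more than once" idea from the discussion, a test-and-clear on a flag word could look like the following userspace sketch using C11 atomics. SOCK_CONNECTING_BIT and clear_connecting_once() are hypothetical names and are not taken from the SUNRPC code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SOCK_CONNECTING_BIT 0x1UL	/* hypothetical flag bit */

/*
 * Atomically clear the "connecting" bit and report whether this call
 * was the one that cleared it. Only the caller that gets true should
 * go on to finish the connection bookkeeping; later state changes see
 * false and leave any newer connection attempt alone.
 */
static bool clear_connecting_once(atomic_ulong *state)
{
	unsigned long old = atomic_fetch_and(state, ~SOCK_CONNECTING_BIT);

	return (old & SOCK_CONNECTING_BIT) != 0;
}
```

The design point is that the check and the clear happen in one atomic step, so two state-change callbacks racing on the same socket cannot both believe they ended the connection attempt.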
[ANNOUNCE] libnftnl 1.0.5 release
Hi!

The Netfilter project proudly presents:

        libnftnl 1.0.5

libnftnl is a userspace library providing a low-level netlink
programming interface (API) to the in-kernel nf_tables subsystem. The
library libnftnl has been previously known as libnftables. This
library is currently used by the nft command line tool.

This release resolves problems with LIBVERSION and symbol versioning
with regards to 1.0.4.

You can download this library from:

http://www.netfilter.org/projects/libnftnl/downloads.html
ftp://ftp.netfilter.org/pub/libnftnl/

Thanks!

Jan Engelhardt (1):
      build: bump library versioning

Pablo Neira Ayuso (1):
      bump version to 1.0.5
[PATCH][RESEND] ARCNET: fix hard_header_len limit
For arcnet the bare minimum header only contains the 4 bytes to
specify source, dest and offset (1, 1 and 2 bytes respectively).
The corresponding struct is struct arc_hardware.

The struct archdr contains additionally a union of possible soft
headers. When doing $insertusecasehere packets might well
include short (or even no?) soft headers.

For this reason only use arc_hardware instead of archdr to
determine the hard_header_len for an arcnet device.

Signed-off-by: Michael Grzeschik
---
 drivers/net/arcnet/arcnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/arcnet/arcnet.c b/drivers/net/arcnet/arcnet.c
index 10f71c73..816d0e9 100644
--- a/drivers/net/arcnet/arcnet.c
+++ b/drivers/net/arcnet/arcnet.c
@@ -326,7 +326,7 @@ static void arcdev_setup(struct net_device *dev)
 	dev->type = ARPHRD_ARCNET;
 	dev->netdev_ops = &arcnet_netdev_ops;
 	dev->header_ops = &arcnet_header_ops;
-	dev->hard_header_len = sizeof(struct archdr);
+	dev->hard_header_len = sizeof(struct arc_hardware);
 	dev->mtu = choose_mtu();

 	dev->addr_len = ARCNET_ALEN;
--
2.5.0
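The size difference behind this change can be checked with a small userspace sketch. The structs below are hypothetical stand-ins, not the definitions from if_arcnet.h: hw_hdr follows the 1 + 1 + 2 byte layout given in the commit message, and soft_example is an invented soft header whose only purpose is to give the union a non-zero size.

```c
#include <stdint.h>

/* Stand-in for the bare hardware header: source, dest, 2-byte offset. */
struct hw_hdr {
	uint8_t source;
	uint8_t dest;
	uint8_t offset[2];
};

/* Invented soft header, present only to give the union some size. */
struct soft_example {
	uint8_t proto;
	uint8_t split_flag;
	uint16_t sequence;
};

/* Stand-in for the full header: hardware part plus a soft-header union. */
struct full_hdr {
	struct hw_hdr hard;
	union {
		struct soft_example example;
		uint8_t raw[8];
	} soft;
};
```

Sizing hard_header_len from the full structure reserves room for the largest soft header even when a packet carries a short one or none at all, which is the mismatch the patch addresses.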
[PATCH][RESEND] MAINTAINERS: add arcnet and take maintainership
Add entry for arcnet to MAINTAINERS file and add myself as the
maintainer of the subsystem.

Signed-off-by: Michael Grzeschik
Cc: da...@davemloft.net
Cc: j...@perches.com
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7ba7ab7..0a015f7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -808,6 +808,13 @@ S:	Maintained
 F:	drivers/video/fbdev/arcfb.c
 F:	drivers/video/fbdev/core/fb_defio.c

+ARCNET NETWORK LAYER
+M:	Michael Grzeschik
+L:	netdev@vger.kernel.org
+S:	Maintained
+F:	drivers/net/arcnet/
+F:	include/uapi/linux/if_arcnet.h
+
 ARM MFM AND FLOPPY DRIVERS
 M:	Ian Molton
 S:	Maintained
--
2.5.1
[PATCH v4] net: Fix behaviour of unreachable, blackhole and prohibit routes
The ip-route(8) man page says the following about route types:

  unreachable - these destinations are unreachable. Packets are
  discarded and the ICMP message host unreachable is generated.
  The local senders get an EHOSTUNREACH error.

  blackhole - these destinations are unreachable. Packets are
  discarded silently. The local senders get an EINVAL error.

  prohibit - these destinations are unreachable. Packets are discarded
  and the ICMP message communication administratively prohibited is
  generated. The local senders get an EACCES error.

In the inet6 address family, this was correct, except the local senders
got an ENETUNREACH error instead of EHOSTUNREACH in case of an
unreachable route. In the inet address family, all three route types
generated the ICMP message net unreachable, and the local senders got an
ENETUNREACH error.

In both address families all three route types now behave consistently
with the documentation.

Signed-off-by: Nikola Forró
---
 include/net/ip_fib.h | 30 +++---
 net/ipv4/route.c     |  6 --
 net/ipv6/route.c     |  4 +++-
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index a37d043..727d6e9 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -236,8 +236,11 @@ static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
 	rcu_read_lock();
 
 	tb = fib_get_table(net, RT_TABLE_MAIN);
-	if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
-		err = 0;
+	if (tb)
+		err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
+
+	if (err == -EAGAIN)
+		err = -ENETUNREACH;
 
 	rcu_read_unlock();
 
@@ -258,7 +261,7 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 			     struct fib_result *res, unsigned int flags)
 {
 	struct fib_table *tb;
-	int err;
+	int err = -ENETUNREACH;
 
 	flags |= FIB_LOOKUP_NOREF;
 	if (net->ipv4.fib_has_custom_rules)
@@ -268,15 +271,20 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 
 	res->tclassid = 0;
 
-	for (err = 0; !err; err = -ENETUNREACH) {
-		tb = rcu_dereference_rtnl(net->ipv4.fib_main);
-		if (tb && !fib_table_lookup(tb, flp, res, flags))
-			break;
+	tb = rcu_dereference_rtnl(net->ipv4.fib_main);
+	if (tb)
+		err = fib_table_lookup(tb, flp, res, flags);
+
+	if (!err)
+		goto out;
+
+	tb = rcu_dereference_rtnl(net->ipv4.fib_default);
+	if (tb)
+		err = fib_table_lookup(tb, flp, res, flags);
 
-		tb = rcu_dereference_rtnl(net->ipv4.fib_default);
-		if (tb && !fib_table_lookup(tb, flp, res, flags))
-			break;
-	}
+out:
+	if (err == -EAGAIN)
+		err = -ENETUNREACH;
 
 	rcu_read_unlock();
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f4a556..c6ad99a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2045,6 +2045,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 	struct fib_result res;
 	struct rtable *rth;
 	int orig_oif;
+	int err = -ENETUNREACH;
 
 	res.tclassid= 0;
 	res.fi = NULL;
@@ -2153,7 +2154,8 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 		goto make_route;
 	}
 
-	if (fib_lookup(net, fl4, &res, 0)) {
+	err = fib_lookup(net, fl4, &res, 0);
+	if (err) {
 		res.fi = NULL;
 		res.table = NULL;
 		if (fl4->flowi4_oif) {
@@ -2181,7 +2183,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
 			res.type = RTN_UNICAST;
 			goto make_route;
 		}
-		rth = ERR_PTR(-ENETUNREACH);
+		rth = ERR_PTR(err);
 		goto out;
 	}
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3d3c1b2..a608ace 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1885,9 +1885,11 @@ int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
 			rt->dst.input = ip6_pkt_prohibit;
 			break;
 		case RTN_THROW:
+		case RTN_UNREACHABLE:
 		default:
 			rt->dst.error = (cfg->fc_type == RTN_THROW) ? -EAGAIN
-					: -ENETUNREACH;
+					: (cfg->fc_type == RTN_UNREACHABLE)
+					? -EHOSTUNREACH : -ENETUNREACH;
 			rt->dst.output = ip6_pkt_discard_out;
 			rt->dst.input = ip6_pkt_discard;
 			break;
-- 
2.4.3
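For readers skimming the thread, the behaviour the patch description aims for can be sketched in plain userspace C. This is only an illustration under stated assumptions: the enum and function names below are hypothetical stand-ins (the kernel uses the RTN_* constants and the fib lookup paths shown in the diff), and the mapping simply restates the ip-route(8) text plus the -EAGAIN handling visible in the patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical route types for illustration; the kernel uses RTN_*. */
enum route_type {
	ROUTE_UNICAST,
	ROUTE_UNREACHABLE,
	ROUTE_BLACKHOLE,
	ROUTE_PROHIBIT,
	ROUTE_THROW,
};

/* Error seen by a local sender for each special route type,
 * following the ip-route(8) description quoted in the patch. */
int route_type_to_errno(enum route_type type)
{
	switch (type) {
	case ROUTE_UNREACHABLE:
		return -EHOSTUNREACH;
	case ROUTE_BLACKHOLE:
		return -EINVAL;
	case ROUTE_PROHIBIT:
		return -EACCES;
	case ROUTE_THROW:
		return -EAGAIN;	/* lookup should continue in the next table */
	default:
		return 0;	/* ordinary route, no error */
	}
}

/* -EAGAIN is only meaningful between tables. Once the last table has
 * been consulted it is reported to the caller as "network unreachable",
 * which is what the "if (err == -EAGAIN)" lines in the diff do. */
int finish_lookup(int err)
{
	assert(err <= 0);	/* kernel-style: zero or a negative errno */
	return err == -EAGAIN ? -ENETUNREACH : err;
}
```

The point of the sketch is that the per-route error is propagated unchanged, and only the internal "try the next table" code is translated at the end of the lookup.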
[PATCH net] MAINTAINERS: remove bouncing email address for qlcnic
I got this automated message when submitting a qlcnic patch:

> Shahed Shaikh is no longer with QLogic. If you require assistance please
> contact Ariel Elior ariel.el...@qlogic.com

There's no point in having a bouncing address in MAINTAINERS.

CC: dept-gelinuxnic...@qlogic.com
CC: Ariel Elior
Signed-off-by: Jiri Benc
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 310da4295c70..0f0dcfd2d68d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8490,7 +8490,6 @@ F:	Documentation/networking/LICENSE.qla3xxx
 F:	drivers/net/ethernet/qlogic/qla3xxx.*
 
 QLOGIC QLCNIC (1/10)Gb ETHERNET DRIVER
-M:	Shahed Shaikh
 M:	dept-gelinuxnic...@qlogic.com
 L:	netdev@vger.kernel.org
 S:	Supported
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 net-next 1/2] cls_bpf: introduce integrated actions
Hi Jamal,

On 09/17/2015 02:37 PM, Jamal Hadi Salim wrote:
> On 09/16/15 02:05, Alexei Starovoitov wrote:
>> From: Daniel Borkmann
>>
>> Often cls_bpf classifier is used with single action drop attached.
>> Optimize this use case and let cls_bpf return both classid and action.
>> For backwards compatibility reasons enable this feature under
>> TCA_BPF_FLAG_ACT_DIRECT flag.
>
> This is going off in a different direction really. You are replicating
> the infrastructure inside bpf.

Hmm, I don't really agree. With cls_bpf you have non-linear
classifications as opposed to walking a chain of classifiers: worst
case, I have to walk through N classifiers just to find out that the
last one matches that I need to drop - this doesn't scale at all.

Given that we can make this decision right here, we can use this fact
and have simple return codes provided as well. It only supplements
non-linear classification that was from the very beginning of cls_bpf
a core part of it.

Thanks,
Daniel
[RFC v2 net-next 10/10] qede: Add basic ethtool support
From: Sudarsana KalluruThis adds basic ethtool operations to the qed driver, allowing support in: - Statistics gathering [ethtool -S] - Setting of debug level [ethtool -s msglvl] - Getting basic information [ethtool, ethtool -i] In addition it adds the ability to change the MTU. Signed-off-by: Sudarsana Kalluru Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior --- drivers/net/ethernet/qlogic/qede/Makefile | 2 +- drivers/net/ethernet/qlogic/qede/qede.h | 74 + drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 385 drivers/net/ethernet/qlogic/qede/qede_main.c| 137 - 4 files changed, 596 insertions(+), 2 deletions(-) create mode 100644 drivers/net/ethernet/qlogic/qede/qede_ethtool.c diff --git a/drivers/net/ethernet/qlogic/qede/Makefile b/drivers/net/ethernet/qlogic/qede/Makefile index bedfe9f..06ff90d 100644 --- a/drivers/net/ethernet/qlogic/qede/Makefile +++ b/drivers/net/ethernet/qlogic/qede/Makefile @@ -1,3 +1,3 @@ obj-$(CONFIG_QEDE) := qede.o -qede-y := qede_main.o +qede-y := qede_main.o qede_ethtool.o diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index 5729128..239f3e5 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -38,6 +38,70 @@ #define QEDE_NAPI_WEIGHT (NAPI_POLL_WEIGHT) +struct qede_stats { + u64 no_buff_discards; + u64 rx_ucast_bytes; + u64 rx_mcast_bytes; + u64 rx_bcast_bytes; + u64 rx_ucast_pkts; + u64 rx_mcast_pkts; + u64 rx_bcast_pkts; + u64 mftag_filter_discards; + u64 mac_filter_discards; + u64 tx_ucast_bytes; + u64 tx_mcast_bytes; + u64 tx_bcast_bytes; + u64 tx_ucast_pkts; + u64 tx_mcast_pkts; + u64 tx_bcast_pkts; + u64 tx_err_drop_pkts; + u64 coalesced_pkts; + u64 coalesced_events; + u64 coalesced_aborts_num; + u64 non_coalesced_pkts; + u64 coalesced_bytes; + + /* port */ + u64 rx_64_byte_packets; + u64 rx_127_byte_packets; + u64 rx_255_byte_packets; + u64 rx_511_byte_packets; + u64 rx_1023_byte_packets; + u64 rx_1518_byte_packets; + u64 
rx_1522_byte_packets; + u64 rx_2047_byte_packets; + u64 rx_4095_byte_packets; + u64 rx_9216_byte_packets; + u64 rx_16383_byte_packets; + u64 rx_crc_errors; + u64 rx_mac_crtl_frames; + u64 rx_pause_frames; + u64 rx_pfc_frames; + u64 rx_align_errors; + u64 rx_carrier_errors; + u64 rx_oversize_packets; + u64 rx_jabbers; + u64 rx_undersize_packets; + u64 rx_fragments; + u64 tx_64_byte_packets; + u64 tx_65_to_127_byte_packets; + u64 tx_128_to_255_byte_packets; + u64 tx_256_to_511_byte_packets; + u64 tx_512_to_1023_byte_packets; + u64 tx_1024_to_1518_byte_packets; + u64 tx_1519_to_2047_byte_packets; + u64 tx_2048_to_4095_byte_packets; + u64 tx_4096_to_9216_byte_packets; + u64 tx_9217_to_16383_byte_packets; + u64 tx_pause_frames; + u64 tx_pfc_frames; + u64 tx_lpi_entry_count; + u64 tx_total_collisions; + u64 brb_truncates; + u64 brb_discards; + u64 tx_mac_ctrl_frames; +}; + struct qede_dev { struct qed_dev *cdev; struct net_device *ndev; @@ -86,6 +150,7 @@ struct qede_dev { max_t(u64, 1UL << QEDE_RX_ALIGN_SHIFT, \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) + struct qede_stats stats; struct qed_update_vport_rss_params rss_params; u16 q_num_rx_buffers; /* Must be a power of two */ u16 q_num_tx_buffers; /* Must be a power of two */ @@ -198,6 +263,15 @@ union qede_reload_args { u16 mtu; }; +void qede_config_debug(uint debug, u32 *p_dp_module, u8 *p_dp_level); +void qede_set_ethtool_ops(struct net_device *netdev); +void qede_reload(struct qede_dev *edev, +void (*func)(struct qede_dev *edev, + union qede_reload_args *args), +union qede_reload_args *args); +int qede_change_mtu(struct net_device *dev, int new_mtu); +void qede_fill_by_demand_stats(struct qede_dev *edev); + #define RX_RING_SIZE_POW 13 #define RX_RING_SIZE BIT(RX_RING_SIZE_POW) #define NUM_RX_BDS_MAX (RX_RING_SIZE - 1) diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c new file mode 100644 index 000..3a36247 --- /dev/null +++ 
b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c @@ -0,0 +1,385 @@ +/* QLogic qede NIC Driver +* Copyright (c) 2015 QLogic Corporation +* +* This software is available under the terms of the
[RFC v2 net-next 00/10] Add new drivers: qed & qede
From: Ariel Elior

This series implements the driver set for QLogic's new 579xx series.
These are 10/20/25/40/50/100 Gig capable converged nics, supporting
ethernet (obviously), iscsi, fcoe, roce and iwarp protocols.

The overall driver design includes a common module ('qed') and protocol
specific dependent modules for ethernet ('qede'), fcoe ('qedf'),
iscsi ('qedi') and roce ('qedr'). The common module contains all of the
common logic, e.g. initialization, cleanup, infrastructure for interrupt
handling, link management, slowpath etc. as well as protocol agnostic
features, and supplying an abstraction layer for other modules.

The protocol specific modules can be compiled and operated independently
of each other, with the exception of the rdma modules which are
dependent on the ethernet module, in accordance with the kernel rdma
stack design.

This series only adds the core and ethernet modules, with basic L2
capabilities. Future series will add the rest of the modules and
enhance the L2 functionality.

This patch series is constructed of the following patches:

  qed: Add module with basic common support
  qed: Add basic L2 interface
  qede: Add basic Network driver
  qed: Add slowpath L2 support
  qede: Add basic network device support
  qede: Add classification configuration
  qed: Add link support
  qede: Add support for link
  qed: Add statistics support
  qede: Add basic ethtool support

We don't expect the series to be accepted as is. We are looking for
upstream community feedback and guidance. Although the series is quite
large, it is what we viewed as the minimal set of patches to constitute
a basic L2 driver.

This project is a team effort, thanks go to Yuval Mintz, Dmitry Kravkov,
Michal Kalderon, Tomer Tayar, Manish Chopra, Sudarsana Kalluru,
Rajesh Borundia, Sony Chacko, Artum Zolotushko, Harish Patil,
Rasesh Mody, Sergey Ukhterov and Elad Manela, as well as former team
members, Eilon Greenstein and Shmulik Ravid.

Changes from previous version:

From Version 1:
 - Removed private license file; instead revised comments at source
   headers.

Thanks,
Ariel Elior
Re: NFS/TCP/IPv6 acting strangely in 4.2
Hi Russell, On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux > wrote: > > Following that idea, I just tried the patch below, and it seems to > > work. > > I don't know whether it handles all cases after a call to > > kernel_connect(), > > but it stops the multiple connection attempts: > > > > 1 0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 > > WS=128 > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 > > TSval=870318939 TSecr=15712 WS=64 > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > 0x905379cc, [Check: RD LU MD XT DL] > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 > > Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > 0x905379cc, [Check: RD LU MD XT DL] > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), > > [Allowed: RD LU MD XT DL] > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), > > [Allowed: RD LU MD XT DL] > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), > > [Allowed: RD LU MD XT DL] > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: > > 0xe15fc9c9 > > ... 
> > NFS people - any comments on this patch? Is it the correct way to > solve > this problem (please see the first message in this thread for the > problem.) > Without this patch, NFS is unusable as it tries to launch multiple > new > connections from the same port to the NFS server without giving the > NFS > server time to respond and establish the TCP connection. I agree that it addresses a real problem here, however there are a couple of issues with the patch itself: AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and TCP_CLOSE, so if the connection attempt fails, this patch leaves the XPRT_CONNECTING flag set. There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection attempt by canceling the XPRT_CONNECTING state. How about the following? It is based on your patch, but adds a check to ensure that xs_tcp_state_change() doesn't clear the 'connecting' state more than once (which could otherwise still happen in the TCP_CLOSE case). 8<--- >From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 From: Trond MyklebustDate: Wed, 16 Sep 2015 23:43:17 -0400 Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete before retrying Commit 718ba5b87343, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. 
Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: Russell King Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprtsock.h | 3 +++ net/sunrpc/xprtsock.c | 11 --- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index 7591788e9fbf..357e44c1a46b 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -42,6 +42,7 @@ struct sock_xprt { /* * Connection of transports */ + unsigned long sock_state; struct delayed_work connect_worker; struct sockaddr_storage srcaddr; unsigned short srcport; @@ -76,6 +77,8 @@ struct sock_xprt { */ #define TCP_RPC_REPLY (1UL << 6) +#define XPRT_SOCK_CONNECTING 1U + #endif /* __KERNEL__ */ #endif /* _LINUX_SUNRPC_XPRTSOCK_H */ diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 7be90bc1a7c2..5bac27983e2a 100644 --- a/net/sunrpc/xprtsock.c +++
RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> -Original Message- > From: David Laight [mailto:david.lai...@aculab.com] > Sent: Thursday, September 17, 2015 1:38 AM > To: KY Srinivasan; Alexander Duyck > ; Haiyang Zhang ; > Vitaly Kuznetsov ; netdev@vger.kernel.org > Cc: David S. Miller ; linux-ker...@vger.kernel.org; > Jason Wang > Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V > > From: KY Srinivasan > > Sent: 16 September 2015 23:58 > ... > > > I think we get that. The question is does the Remote NDIS header and > > > packet info actually need to be a part of the header data? I would > > > argue that it probably doesn't. > > > > > > So for example in netvsc_start_xmit it looks like you are calling > > > init_page_array in order to populate a set of page buffers, but the > > > first buffer for the Remote NDIS protocol is populated as a separate > > > page and offset. As such it doesn't seem like it necessarily needs to > > > be a part of the header data but could be maintained perhaps in a > > > separate ring buffer, or perhaps just be a separate page that you break > > > up to use for each header. > > > > You are right; the rndis header can be built as a separate fragment and > > sent. > > Indeed this is what we were doing earlier - on the outgoing path we would > allocate > > memory for the rndis header. My goal was to avoid this allocation on every > packet being > > sent and I decided to use the headroom instead. If we can completely avoid > > all > memory > > allocation for rndis header, it makes a significant perf difference: > ... > > > So just preallocate the header space as a fixed buffer for each ring entry > (or tx frame). > > If you allocate a fixed buffer for each ring entry you may find there are > performance gains from copying small fragments into the buffer instead > of doing whatever mapping operations are required. > > David Yes; I could do that. My original goal of asking for additional head room was to avoid having any allocation in the transmit path. 
I did not realize that all I had done was push the allocation to a
different spot, since the head room I was asking for was greater than
the default head room on skb allocation. I think I can achieve my
original goal of not having any allocation in the send path by carefully
using the memory available in the skb:

1. I am going to separately handle the rndis header and this can be
   packed in the default headroom available in the skb.
2. I will use the scratch area in the skb to stash away the state that
   needs to persist. This is the state needed to clean up the guest
   state after we get the send_complete packet.

Regards,

K. Y
Re: NFS/TCP/IPv6 acting strangely in 4.2
On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux wrote: > Following that idea, I just tried the patch below, and it seems to work. > I don't know whether it handles all cases after a call to kernel_connect(), > but it stops the multiple connection attempts: > > 1 0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 Win=28560 > Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 WS=128 > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] Seq=1884476522 > Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=870318939 > TSecr=15712 WS=64 > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: > RD LU MD XT DL] > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 > Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712 > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: > RD LU MD XT DL] > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), [Allowed: > RD LU MD XT DL] > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: > RD LU MD XT DL] > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), [Allowed: > RD LU MD XT DL] > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: > RD LU MD XT DL] > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), [Allowed: > RD LU MD XT DL] > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: 0xe15fc9c9 > ... NFS people - any comments on this patch? Is it the correct way to solve this problem (please see the first message in this thread for the problem.) 
Without this patch, NFS is unusable as it tries to launch multiple new connections from the same port to the NFS server without giving the NFS server time to respond and establish the TCP connection. > > net/sunrpc/xprtsock.c | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index ff5b6a2e62c3..c456d6e51c56 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -1450,6 +1450,7 @@ static void xs_tcp_state_change(struct sock *sk) > switch (sk->sk_state) { > case TCP_ESTABLISHED: > spin_lock(>transport_lock); > + xprt_clear_connecting(xprt); > if (!xprt_test_and_set_connected(xprt)) { > struct sock_xprt *transport = container_of(xprt, > struct sock_xprt, xprt); > @@ -1474,12 +1475,14 @@ static void xs_tcp_state_change(struct sock *sk) > smp_mb__before_atomic(); > clear_bit(XPRT_CONNECTED, >state); > clear_bit(XPRT_CLOSE_WAIT, >state); > + clear_bit(XPRT_CONNECTING, >state); > smp_mb__after_atomic(); > break; > case TCP_CLOSE_WAIT: > /* The server initiated a shutdown of the socket */ > xprt->connect_cookie++; > clear_bit(XPRT_CONNECTED, >state); > + clear_bit(XPRT_CONNECTING, >state); > xs_tcp_force_close(xprt); > case TCP_CLOSING: > /* > @@ -1493,6 +1496,7 @@ static void xs_tcp_state_change(struct sock *sk) > set_bit(XPRT_CLOSING, >state); > smp_mb__before_atomic(); > clear_bit(XPRT_CONNECTED, >state); > + clear_bit(XPRT_CONNECTING, >state); > smp_mb__after_atomic(); > break; > case TCP_CLOSE: > @@ -2237,11 +2241,13 @@ static void xs_tcp_setup_socket(struct work_struct > *work) > xs_tcp_force_close(xprt); > break; > case 0: > - case -EINPROGRESS: > case -EALREADY: > xprt_unlock_connect(xprt, transport); > xprt_clear_connecting(xprt); > return; > + case -EINPROGRESS: > + xprt_unlock_connect(xprt, transport); > + return; > case -EINVAL: > /* Happens, for instance, if the user specified a link >* local IPv6 address without a scope-id. 
> > > -- > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net. > > ___ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. -- To unsubscribe from this list: send the line
[PATCH net 1/5] vxlan: set needed headroom correctly
vxlan_setup is called when allocating the net_device, i.e. way before
vxlan_newlink (or vxlan_dev_configure) is called. This means
vxlan->default_dst is actually unset in vxlan_setup and the condition
that sets needed_headroom always takes the else branch.

Set the needed_headroom at the point when we have the information about
the address family available.

Fixes: e4c7ed415387c ("vxlan: add ipv6 support")
Fixes: 2853af6a2ea1a ("vxlan: use dev->needed_headroom instead of dev->hard_header_len")
CC: Cong Wang
Signed-off-by: Jiri Benc
---
 drivers/net/vxlan.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cf8b7f0473b3..6ebe562af04e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2392,10 +2392,6 @@ static void vxlan_setup(struct net_device *dev)
 
 	eth_hw_addr_random(dev);
 	ether_setup(dev);
-	if (vxlan->default_dst.remote_ip.sa.sa_family == AF_INET6)
-		dev->needed_headroom = ETH_HLEN + VXLAN6_HEADROOM;
-	else
-		dev->needed_headroom = ETH_HLEN + VXLAN_HEADROOM;
 
 	dev->netdev_ops = &vxlan_netdev_ops;
 	dev->destructor = free_netdev;
@@ -2670,8 +2666,12 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 
 		dev->needed_headroom = lowerdev->hard_header_len +
 				       (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
-	} else if (use_ipv6)
+	} else if (use_ipv6) {
 		vxlan->flags |= VXLAN_F_IPV6;
+		dev->needed_headroom = ETH_HLEN + VXLAN6_HEADROOM;
+	} else {
+		dev->needed_headroom = ETH_HLEN + VXLAN_HEADROOM;
+	}
 
 	memcpy(&vxlan->cfg, conf, sizeof(*conf));
 	if (!vxlan->cfg.dst_port)
-- 
1.8.3.1
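As a rough illustration of why the address family has to be known before needed_headroom can be set, the computation can be sketched in standalone C. The header sizes below are assumptions for option-free headers (they are what I believe VXLAN_HEADROOM and VXLAN6_HEADROOM expand to, but that is not shown in this patch), and the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed header sizes in bytes, without IP options or extension headers. */
enum {
	SKETCH_ETH_HDR   = 14,
	SKETCH_IPV4_HDR  = 20,
	SKETCH_IPV6_HDR  = 40,
	SKETCH_UDP_HDR   = 8,
	SKETCH_VXLAN_HDR = 8,
};

/* Headroom to reserve so the outer headers can be prepended without
 * reallocating. lower_hard_header_len is the link-layer header length
 * of the underlying device (the patch uses ETH_HLEN when no lower
 * device is configured). */
int vxlan_needed_headroom(int lower_hard_header_len, bool use_ipv6)
{
	int ip_hdr = use_ipv6 ? SKETCH_IPV6_HDR : SKETCH_IPV4_HDR;

	assert(lower_hard_header_len >= 0);
	return lower_hard_header_len + ip_hdr + SKETCH_UDP_HDR +
	       SKETCH_VXLAN_HDR + SKETCH_ETH_HDR;
}
```

Under these assumptions an IPv6 tunnel needs 20 bytes more than an IPv4 one, so evaluating the family check while default_dst is still unset would under-reserve headroom for every IPv6 tunnel.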
[PATCH net 5/5] bnx2x: track vxlan port count
The callback for adding vxlan port can be called with the same port for
both IPv4 and IPv6. Do not disable the offloading when the same port for
both protocols is added and later one of them removed.

Signed-off-by: Jiri Benc
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h      |  1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 --
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index ba936635322a..b5e64b02200c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1946,6 +1946,7 @@ struct bnx2x {
 	u16 vlan_cnt;
 	u16 vlan_credit;
 	u16 vxlan_dst_port;
+	u8 vxlan_dst_port_count;
 	bool accept_any_vlan;
 };
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 89a174fa1300..f1d62d5dbaff 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -10108,12 +10108,18 @@ static void __bnx2x_add_vxlan_port(struct bnx2x *bp, u16 port)
 	if (!netif_running(bp->dev))
 		return;
 
-	if (bp->vxlan_dst_port || !IS_PF(bp)) {
+	if (bp->vxlan_dst_port_count && bp->vxlan_dst_port == port) {
+		bp->vxlan_dst_port_count++;
+		return;
+	}
+
+	if (bp->vxlan_dst_port_count || !IS_PF(bp)) {
 		DP(BNX2X_MSG_SP, "Vxlan destination port limit reached\n");
 		return;
 	}
 
 	bp->vxlan_dst_port = port;
+	bp->vxlan_dst_port_count = 1;
 	bnx2x_schedule_sp_rtnl(bp, BNX2X_SP_RTNL_ADD_VXLAN_PORT, 0);
 }
 
@@ -10128,10 +10134,14 @@ static void bnx2x_add_vxlan_port(struct net_device *netdev,
 
 static void __bnx2x_del_vxlan_port(struct bnx2x *bp, u16 port)
 {
-	if (!bp->vxlan_dst_port || bp->vxlan_dst_port != port || !IS_PF(bp)) {
+	if (!bp->vxlan_dst_port_count || bp->vxlan_dst_port != port ||
+	    !IS_PF(bp)) {
 		DP(BNX2X_MSG_SP, "Invalid vxlan port\n");
 		return;
 	}
+	bp->vxlan_dst_port_count--;
+	if (bp->vxlan_dst_port_count)
+		return;
 
 	if (netif_running(bp->dev)) {
 		bnx2x_schedule_sp_rtnl(bp, BNX2X_SP_RTNL_DEL_VXLAN_PORT, 0);
-- 
1.8.3.1
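The counting scheme in this patch can be tried out in isolation with a small userspace model. This is a sketch only: the struct and function names are hypothetical, the hw_enabled flag stands in for the scheduled add/del slowpath work, and the running-device and PF checks from the driver are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a device that can offload one UDP port. */
struct port_offload {
	uint16_t port;		/* currently offloaded port */
	uint8_t count;		/* users of that port (e.g. IPv4 + IPv6) */
	bool hw_enabled;	/* stands in for the hardware state */
};

/* Returns true if the port is offloaded after the call. */
bool offload_add(struct port_offload *o, uint16_t port)
{
	assert(o != NULL);
	if (o->count && o->port == port) {
		o->count++;		/* same port again: just count it */
		return true;
	}
	if (o->count)
		return false;		/* a different port is already in use */
	o->port = port;
	o->count = 1;
	o->hw_enabled = true;
	return true;
}

/* Returns true if the request matched the offloaded port. */
bool offload_del(struct port_offload *o, uint16_t port)
{
	assert(o != NULL);
	if (!o->count || o->port != port)
		return false;
	if (--o->count == 0)
		o->hw_enabled = false;	/* last user gone: disable offload */
	return true;
}
```

Adding the same port twice and removing it once leaves the offload enabled, which is the IPv4 plus IPv6 case described in the commit message. Note that the counter, not the port number, is what gets decremented and tested on removal.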
[PATCH net 2/5] vxlan: reject IPv6 addresses if IPv6 is not configured
When an IPv6 address is set without IPv6 configured, the vxlan socket is
mostly treated as an IPv4 one but various lookups in fdb etc. still take
the AF_INET6 into account. This creates inconsistencies with weird
consequences.

Just reject IPv6 addresses in such case.

Signed-off-by: Jiri Benc
---
 drivers/net/vxlan.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6ebe562af04e..bbac1d35ed4e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2636,8 +2636,11 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 		dst->remote_ip.sa.sa_family = AF_INET;
 
 	if (dst->remote_ip.sa.sa_family == AF_INET6 ||
-	    vxlan->cfg.saddr.sa.sa_family == AF_INET6)
+	    vxlan->cfg.saddr.sa.sa_family == AF_INET6) {
+		if (!IS_ENABLED(CONFIG_IPV6))
+			return -EPFNOSUPPORT;
 		use_ipv6 = true;
+	}
 
 	if (conf->remote_ifindex) {
 		struct net_device *lowerdev
-- 
1.8.3.1
[PATCH net 4/5] be2net: allow offloading with the same port for IPv4 and IPv6
The callback for adding vxlan port can be called with the same port for
both IPv4 and IPv6. Do not disable the offloading if this occurs.

Signed-off-by: Jiri Benc
---
 drivers/net/ethernet/emulex/benet/be.h      |  1 +
 drivers/net/ethernet/emulex/benet/be_main.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index 0a27805cbbbd..821540913343 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -582,6 +582,7 @@ struct be_adapter {
 	u16 pvid;
 	__be16 vxlan_port;
 	int vxlan_port_count;
+	int vxlan_port_aliases;
 	struct phy_info phy;
 	u8 wol_cap;
 	bool wol_en;
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 12687bf52b95..7bf51a1a0a77 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5176,6 +5176,11 @@ static void be_add_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
 	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
 		return;
 
+	if (adapter->vxlan_port == port && adapter->vxlan_port_count) {
+		adapter->vxlan_port_aliases++;
+		return;
+	}
+
 	if (adapter->flags & BE_FLAGS_VXLAN_OFFLOADS) {
 		dev_info(dev,
 			 "Only one UDP port supported for VxLAN offloads\n");
@@ -5226,6 +5231,11 @@ static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
 	if (adapter->vxlan_port != port)
 		goto done;
 
+	if (adapter->vxlan_port_aliases) {
+		adapter->vxlan_port_aliases--;
+		return;
+	}
+
 	be_disable_vxlan_offloads(adapter);
 
 	dev_info(&adapter->pdev->dev,
-- 
1.8.3.1
Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared
On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote: > > Thanks; I'll try that. In fact since updating to 4.2 the problem has > got worse — now the whole machine dies: There is something very strange going on here. I've found two ways to make it stop crashing when cp_tx_timeout() hits the 'popf' when unlocking the spinlock. The first is to comment out the whole of cp_tx_timeout() and let it happen once. Then put that code *back* again and reload the module. Then it can work fine. The second way is to comment out the WARN_ONCE in dev_watchdog(). I remain utterly bemused; I have no idea what's going on there. But that aside, even when it survives running cp_tx_timeout(), it still doesn't *work* — it looks like TX is indeed working and has recovered, but we are not *receiving* any packets. I can't actually trigger the TX timeout at all with debugging enabled; I've hacked things so that cp_set_wol() will also call cp_tx_timeout() and simulate it. And now I see this... [ 4358.499474] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b [ 4358.499488] 8139cp :00:0b.0 eth1: tx done, slot 35 [ 4358.513663] 8139cp :00:0b.0 eth1: tx queued, slot 37, skblen 54 [ 4358.513692] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b [ 4358.513705] 8139cp :00:0b.0 eth1: tx done, slot 36 [ 4358.518880] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b [ 4358.518900] 8139cp :00:0b.0 eth1: rx slot 1 status 0x32014040 len 60 [ 4358.523601] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b [ 4358.526910] 8139cp :00:0b.0 eth1: rx slot 2 status 0x32036052 len 78 [ 4358.547898] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b [ 4358.547996] 8139cp :00:0b.0 eth1: rx slot 3 status 0x32036052 len 78 [ 4358.580526] 8139cp :00:0b.0 eth1: tx queued, slot 38, skblen 70 [ 4358.580555] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b [ 4358.580569] 8139cp :00:0b.0 eth1: tx done, 
slot 37 [ 4358.601912] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b [ 4358.601932] 8139cp :00:0b.0 eth1: rx slot 4 status 0x32036052 len 78 [ 4358.650678] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b [ 4358.650698] 8139cp :00:0b.0 eth1: rx slot 5 status 0x320145a5 len 1441 [ 4358.665572] will lock... [ 4358.668222] Handling tx timeout, flags 282 [ 4358.672494] nway_reset [ 4358.674858] Will wake queue... [ 4358.677919] Will unlock... flags 282 [ 4358.681525] did unlock... [ 4358.684198] 8139cp :00:0b.0 eth1: Transmit timeout handled, status c 2b0 80ff [ 4358.708234] 8139cp :00:0b.0 eth1: tx queued, slot 1, skblen 92 [ 4358.714567] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b [ 4358.722405] 8139cp :00:0b.0 eth1: tx done, slot 0 [ 4358.747412] 8139cp :00:0b.0 eth1: tx queued, slot 2, skblen 106 [ 4358.753736] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b [ 4358.756824] 8139cp :00:0b.0 eth1: tx done, slot 1 [ 4358.814961] 8139cp :00:0b.0 eth1: tx queued, slot 3, skblen 173 [ 4358.821291] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b [ 4358.824186] 8139cp :00:0b.0 eth1: tx done, slot 2 [ 4358.834352] 8139cp :00:0b.0 eth1: tx queued, slot 4, skblen 86 [ 4358.840579] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b [ 4358.844216] 8139cp :00:0b.0 eth1: tx done, slot 3 [ 4358.853615] 8139cp :00:0b.0 eth1: tx queued, slot 5, skblen 54 [ 4358.859822] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b [ 4358.863497] 8139cp :00:0b.0 eth1: tx done, slot 4 [ 4358.873111] 8139cp :00:0b.0 eth1: tx queued, slot 6, skblen 66 -- -- David WoodhouseOpen Source Technology Centre david.woodho...@intel.com Intel Corporation smime.p7s Description: S/MIME cryptographic signature
Re: NFS/TCP/IPv6 acting strangely in 4.2
On Wed, Sep 16, 2015 at 06:53:57AM +, Damien Thébault wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > I have a recent Marvell Armada 388 board here which uses the mvneta
> > driver. I'm seeing some weird effects with NFS with it acting as a
> > client.
> 
> Hello,
> 
> I'm upgrading a Marvell Armada 370 board using the mvneta driver from
> 4.0 to 4.2 and noticed issues with NFS booting.
> Basically, most of the time init returns with an error code, or
> programs segfault or throw illegal instructions.
> 
> Since it worked fine on 4.0 I bisected until I found commit
> a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
> for Rx DMA buffers".
> 
> If I revert this commit, everything seems to get back to normal.
> Could you try it ? The two issues look very similar.

If you look at my original problem report, you'll see that this has
nothing to do with the problem I'm seeing.

My problem is:

- TCP disconnects
- NFS tries to establish a new connection with the server, sending a SYN
- NFS server replies with a SYNACK
- NFS client immediately sends another SYN with a different sequence
  number, so it's a _new_ attempt to connect to the NFS server. At this
  point, the socket for the previous SYNACK'd connection has been
  destroyed mid-setup.

This is because the sunrpc code is horribly racy - it doesn't block a
second attempt to call kernel_connect() on a socket which is already in
the process of connecting to the NFS server.

Even if the SYNACK had been corrupted (due to mvneta's rx code), that
has no bearing on the race in the sunrpc layer that destroys the
previous socket before the TCP SYN/SYNACK/ACK handshake has had a
chance to complete.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
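To make the race described above concrete, here is a minimal userspace
sketch of the kind of guard that would refuse a second connect attempt
while one is still in flight. It is not the sunrpc code and not a
proposed fix: struct demo_xprt and the demo_* functions are
hypothetical names, and the counter merely stands in for a call to
kernel_connect().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical transport state; NOT the real sunrpc structures. */
struct demo_xprt {
	atomic_flag connecting;	/* set while a connect attempt is in flight */
	int connect_calls;	/* counts simulated connect attempts */
};

static void demo_xprt_init(struct demo_xprt *x)
{
	atomic_flag_clear(&x->connecting);
	x->connect_calls = 0;
}

/*
 * Start a connect only if none is in progress.
 * Returns true if this caller started one, false if it was refused.
 */
static bool demo_try_connect(struct demo_xprt *x)
{
	/* test-and-set is atomic: exactly one caller sees "was clear" */
	if (atomic_flag_test_and_set(&x->connecting))
		return false;
	x->connect_calls++;	/* stand-in for the real connect call */
	return true;
}

/* To be called once the handshake has completed or failed. */
static void demo_connect_done(struct demo_xprt *x)
{
	atomic_flag_clear(&x->connecting);
}
```

With such a guard, the second SYN in the sequence above would not be
sent until demo_connect_done() had run for the first attempt.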
[PATCH net 0/5] vxlan fixes
This fixes various issues with vxlan related to IPv6.

Jiri Benc (5):
  vxlan: set needed headroom correctly
  vxlan: reject IPv6 addresses if IPv6 is not configured
  qlcnic: track vxlan port count
  be2net: allow offloading with the same port for IPv4 and IPv6
  bnx2x: track vxlan port count

 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h      |  1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 --
 drivers/net/ethernet/emulex/benet/be.h           |  1 +
 drivers/net/ethernet/emulex/benet/be_main.c      | 10 ++
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h      |  1 +
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 18 +
 drivers/net/vxlan.c                              | 15 +--
 7 files changed, 47 insertions(+), 13 deletions(-)

-- 
1.8.3.1
[PATCH net 3/5] qlcnic: track vxlan port count
The callback for adding vxlan port can be called with the same port for
both IPv4 and IPv6. Do not disable the offloading when the same port for
both protocols is added and later one of them removed.

Signed-off-by: Jiri Benc
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h      |  1 +
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 18 +++++++++++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
index 06bcc734fe8d..d6696cfa11d2 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
@@ -536,6 +536,7 @@ struct qlcnic_hardware_context {
 	u8 extend_lb_time;
 	u8 phys_port_id[ETH_ALEN];
 	u8 lb_mode;
+	u8 vxlan_port_count;
 	u16 vxlan_port;
 	struct device *hwmon_dev;
 	u32 post_mode;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 8b08b20e8b30..d4481454b5f8 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -483,11 +483,17 @@ static void qlcnic_add_vxlan_port(struct net_device *netdev,
 	/* Adapter supports only one VXLAN port. Use very first port
 	 * for enabling offload
 	 */
-	if (!qlcnic_encap_rx_offload(adapter) || ahw->vxlan_port)
+	if (!qlcnic_encap_rx_offload(adapter))
 		return;
+	if (!ahw->vxlan_port_count) {
+		ahw->vxlan_port_count = 1;
+		ahw->vxlan_port = ntohs(port);
+		adapter->flags |= QLCNIC_ADD_VXLAN_PORT;
+		return;
+	}
+	if (ahw->vxlan_port == ntohs(port))
+		ahw->vxlan_port_count++;
 
-	ahw->vxlan_port = ntohs(port);
-	adapter->flags |= QLCNIC_ADD_VXLAN_PORT;
 }
 
 static void qlcnic_del_vxlan_port(struct net_device *netdev,
@@ -496,11 +502,13 @@ static void qlcnic_del_vxlan_port(struct net_device *netdev,
 	struct qlcnic_adapter *adapter = netdev_priv(netdev);
 	struct qlcnic_hardware_context *ahw = adapter->ahw;
 
-	if (!qlcnic_encap_rx_offload(adapter) || !ahw->vxlan_port ||
+	if (!qlcnic_encap_rx_offload(adapter) || !ahw->vxlan_port_count ||
 	    (ahw->vxlan_port != ntohs(port)))
 		return;
 
-	adapter->flags |= QLCNIC_DEL_VXLAN_PORT;
+	ahw->vxlan_port_count--;
+	if (!ahw->vxlan_port_count)
+		adapter->flags |= QLCNIC_DEL_VXLAN_PORT;
 }
 
 static netdev_features_t qlcnic_features_check(struct sk_buff *skb,
-- 
1.8.3.1
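The counting scheme in this patch can be tried out in isolation. The
following is a userspace sketch only, assuming a single offloadable
port as the driver comment says; struct demo_hw, its offload_enabled
field and the demo_* functions are hypothetical stand-ins for the
adapter state and the QLCNIC_ADD/DEL_VXLAN_PORT flags, and ports are
taken in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant qlcnic hardware context fields. */
struct demo_hw {
	uint8_t  vxlan_port_count;	/* how many users added the port */
	uint16_t vxlan_port;		/* the one port being offloaded */
	bool     offload_enabled;	/* stand-in for the ADD/DEL flags */
};

static void demo_add_port(struct demo_hw *hw, uint16_t port)
{
	if (!hw->vxlan_port_count) {
		/* first user: remember the port and enable offload */
		hw->vxlan_port_count = 1;
		hw->vxlan_port = port;
		hw->offload_enabled = true;
		return;
	}
	/* same port added again (e.g. IPv4 then IPv6): just count it */
	if (hw->vxlan_port == port)
		hw->vxlan_port_count++;
}

static void demo_del_port(struct demo_hw *hw, uint16_t port)
{
	if (!hw->vxlan_port_count || hw->vxlan_port != port)
		return;
	/* disable offload only when the last user goes away */
	if (--hw->vxlan_port_count == 0)
		hw->offload_enabled = false;
}
```

Adding the same port twice and removing it once leaves the offload
enabled, which is the behaviour the commit message asks for.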
[RFC v2 net-next 06/10] qede: classification configuration
From: Sudarsana KalluruAdd the ability to configure basic classification in driver by implementing ndo_set_mac_address() and ndo_set_rx_mode(). Signed-off-by: Sudarsana Kalluru Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior --- drivers/net/ethernet/qlogic/qede/qede.h | 10 ++ drivers/net/ethernet/qlogic/qede/qede_main.c | 241 +++ 2 files changed, 251 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index 7680106..5729128 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -89,6 +89,9 @@ struct qede_dev { struct qed_update_vport_rss_params rss_params; u16 q_num_rx_buffers; /* Must be a power of two */ u16 q_num_tx_buffers; /* Must be a power of two */ + + struct delayed_work sp_task; + unsigned long sp_flags; }; enum QEDE_STATE { @@ -188,6 +191,13 @@ struct qede_fastpath { #define QEDE_CSUM_ERRORBIT(0) #define QEDE_CSUM_UNNECESSARY BIT(1) + +#define QEDE_SP_RX_MODE1 + +union qede_reload_args { + u16 mtu; +}; + #define RX_RING_SIZE_POW 13 #define RX_RING_SIZE BIT(RX_RING_SIZE_POW) #define NUM_RX_BDS_MAX (RX_RING_SIZE - 1) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 14e0b09..7b3c3d8 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -1030,10 +1030,31 @@ static irqreturn_t qede_msix_fp_int(int irq, void *fp_cookie) static int qede_open(struct net_device *ndev); static int qede_close(struct net_device *ndev); +static int qede_set_mac_addr(struct net_device *ndev, void *p); +static void qede_set_rx_mode(struct net_device *ndev); +static void qede_config_rx_mode(struct net_device *ndev); + +static int qede_set_ucast_rx_mac(struct qede_dev *edev, +enum qed_filter_xcast_params_type opcode, +unsigned char mac[ETH_ALEN]) +{ + struct qed_filter_params filter_cmd; + + memset(_cmd, 0, sizeof(filter_cmd)); + filter_cmd.type = 
QED_FILTER_TYPE_UCAST; + filter_cmd.filter.ucast.type = opcode; + filter_cmd.filter.ucast.mac_valid = 1; + ether_addr_copy(filter_cmd.filter.ucast.mac, mac); + + return edev->ops->filter_config(edev->cdev, _cmd); +} + static const struct net_device_ops qede_netdev_ops = { .ndo_open = qede_open, .ndo_stop = qede_close, .ndo_start_xmit = qede_start_xmit, + .ndo_set_rx_mode = qede_set_rx_mode, + .ndo_set_mac_address = qede_set_mac_addr, .ndo_validate_addr = eth_validate_addr, }; @@ -1198,6 +1219,20 @@ err: return -ENOMEM; } +static void qede_sp_task(struct work_struct *work) +{ + struct qede_dev *edev = container_of(work, struct qede_dev, +sp_task.work); + mutex_lock(>qede_lock); + + if (edev->state == QEDE_STATE_OPEN) { + if (test_and_clear_bit(QEDE_SP_RX_MODE, >sp_flags)) + qede_config_rx_mode(edev->ndev); + } + + mutex_unlock(>qede_lock); +} + static void qede_update_pf_params(struct qed_dev *cdev) { struct qed_pf_params pf_params; @@ -1269,6 +1304,9 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level, edev->ops->common->set_id(cdev, edev->ndev->name, DRV_MODULE_VERSION); + INIT_DELAYED_WORK(>sp_task, qede_sp_task); + mutex_init(>qede_lock); + DP_INFO(edev, "Ending successfully qede probe\n"); return 0; @@ -1306,6 +1344,7 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode) DP_INFO(edev, "Starting qede_remove\n"); + cancel_delayed_work_sync(>sp_task); unregister_netdev(ndev); edev->ops->common->set_power_state(cdev, PCI_D0); @@ -2025,6 +2064,24 @@ static int qede_start_queues(struct qede_dev *edev) return 0; } +static int qede_set_mcast_rx_mac(struct qede_dev *edev, +enum qed_filter_xcast_params_type opcode, +unsigned char *mac, int num_macs) +{ + struct qed_filter_params filter_cmd; + int i; + + memset(_cmd, 0, sizeof(filter_cmd)); + filter_cmd.type = QED_FILTER_TYPE_MCAST; + filter_cmd.filter.mcast.type = opcode; + filter_cmd.filter.mcast.num = num_macs; + + for (i = 0; i < num_macs; i++, mac += ETH_ALEN) + 
ether_addr_copy(filter_cmd.filter.mcast.mac[i], mac); + + return edev->ops->filter_config(edev->cdev, _cmd); +} + enum qede_unload_mode {
[RFC v2 net-next 08/10] qede: Add support for link
From: Sudarsana KalluruThis adds basic link functionality to qede - driver still doesn't provide users with an API to change any link property, but it does request qed to initialize the link using default configuration, and registers a callback that allows it to get link notifications. This patch adds the ability of the driver to set the carrier as active and to enable traffic as a result of async. link notifications. Following this patch, driver should be capable of running traffic. Signed-off-by: Sudarsana Kalluru Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior --- drivers/net/ethernet/qlogic/qede/qede_main.c | 47 1 file changed, 47 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 7b3c3d8..8cb1bb5 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -87,6 +87,7 @@ static int qede_probe(struct pci_dev *pdev, const struct pci_device_id *id); static void qede_remove(struct pci_dev *pdev); static int qede_alloc_rx_buffer(struct qede_dev *edev, struct qede_rx_queue *rxq); +static void qede_link_update(void *dev, struct qed_link_output *link); static struct pci_driver qede_pci_driver = { .name = "qede", @@ -95,6 +96,12 @@ static struct pci_driver qede_pci_driver = { .remove = qede_remove, }; +static struct qed_eth_cb_ops qede_ll_ops = { + { + .link_update = qede_link_update, + }, +}; + static int qede_netdev_event(struct notifier_block *this, unsigned long event, void *ptr) { @@ -1304,6 +1311,8 @@ static int __qede_probe(struct pci_dev *pdev, u32 dp_module, u8 dp_level, edev->ops->common->set_id(cdev, edev->ndev->name, DRV_MODULE_VERSION); + edev->ops->register_ops(cdev, _ll_ops, edev); + INIT_DELAYED_WORK(>sp_task, qede_sp_task); mutex_init(>qede_lock); @@ -2088,6 +2097,7 @@ enum qede_unload_mode { static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode) { + struct qed_link_params link_params; int rc; 
DP_INFO(edev, "Starting qede unload\n"); @@ -2099,6 +2109,10 @@ static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode) netif_tx_disable(edev->ndev); netif_carrier_off(edev->ndev); + /* Reset the link */ + memset(_params, 0, sizeof(link_params)); + link_params.link_up = false; + edev->ops->common->set_link(edev->cdev, _params); rc = qede_stop_queues(edev); if (rc) { qede_sync_free_irqs(edev); @@ -2129,6 +2143,8 @@ enum qede_load_mode { static int qede_load(struct qede_dev *edev, enum qede_load_mode mode) { + struct qed_link_params link_params; + struct qed_link_output link_output; int rc; DP_INFO(edev, "Starting qede load\n"); @@ -2172,6 +2188,17 @@ static int qede_load(struct qede_dev *edev, enum qede_load_mode mode) mutex_lock(>qede_lock); edev->state = QEDE_STATE_OPEN; mutex_unlock(>qede_lock); + + /* Ask for link-up using current configuration */ + memset(_params, 0, sizeof(link_params)); + link_params.link_up = true; + edev->ops->common->set_link(edev->cdev, _params); + + /* Query whether link is already-up */ + memset(_output, 0, sizeof(link_output)); + edev->ops->common->get_link(edev->cdev, _output); + qede_link_update(edev, _output); + DP_INFO(edev, "Ending successfully qede load\n"); return 0; @@ -2217,6 +2244,26 @@ static int qede_close(struct net_device *ndev) return 0; } +static void qede_link_update(void *dev, struct qed_link_output *link) +{ + struct qede_dev *edev = dev; + + if (!netif_running(edev->ndev)) { + DP_VERBOSE(edev, NETIF_MSG_LINK, "Interface is not running\n"); + return; + } + + if (link->link_up) { + DP_NOTICE(edev, "Link is up\n"); + netif_tx_start_all_queues(edev->ndev); + netif_carrier_on(edev->ndev); + } else { + DP_NOTICE(edev, "Link is down\n"); + netif_tx_disable(edev->ndev); + netif_carrier_off(edev->ndev); + } +} + static int qede_set_mac_addr(struct net_device *ndev, void *p) { struct qede_dev *edev = netdev_priv(ndev); -- 1.9.3 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body 
of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
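For readers who want the callback registration described in this patch
reduced to its shape, here is a userspace sketch. It only assumes what
the commit message states: the protocol driver hands the core a table
of callbacks plus a cookie, and the core calls back on link changes.
All demo_* names are hypothetical; none of this is the qed/qede API.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_link_output {
	bool link_up;
};

/* Callback table the protocol driver passes to the core. */
struct demo_cb_ops {
	void (*link_update)(void *cookie, const struct demo_link_output *link);
};

/* Plays the role of the core module that owns the link. */
struct demo_core {
	const struct demo_cb_ops *ops;
	void *cookie;
};

/* Plays the role of the network device state. */
struct demo_netdev {
	bool running;
	bool carrier_on;
};

static void demo_register_ops(struct demo_core *core,
			      const struct demo_cb_ops *ops, void *cookie)
{
	core->ops = ops;
	core->cookie = cookie;
}

/* Core side: deliver an asynchronous link notification, if anyone listens. */
static void demo_notify_link(struct demo_core *core, bool up)
{
	struct demo_link_output out = { .link_up = up };

	if (core->ops && core->ops->link_update)
		core->ops->link_update(core->cookie, &out);
}

/* Driver side: ignore events while the interface is not running. */
static void demo_link_update(void *cookie, const struct demo_link_output *link)
{
	struct demo_netdev *nd = cookie;

	if (!nd->running)
		return;
	nd->carrier_on = link->link_up;
}

static const struct demo_cb_ops demo_ops = {
	.link_update = demo_link_update,
};
```

The cookie keeps the core ignorant of the driver's private structure,
which appears to be the point of passing edev at registration time.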
[RFC v2 net-next 03/10] qede: Add basic Network driver
The Qlogic Everest Driver for Ethernet is the Ethernet specifc module for 579xx ethernet products by Qlogic. This patch adds a very minimal PCI driver, one that doesn't yet register a network device, but one that does interact with qed and does a basic initialization of the HW. Signed-off-by: Yuval MintzSigned-off-by: Ariel Elior --- drivers/net/ethernet/qlogic/Kconfig | 5 + drivers/net/ethernet/qlogic/Makefile | 1 + drivers/net/ethernet/qlogic/qede/Makefile| 3 + drivers/net/ethernet/qlogic/qede/qede.h | 73 ++ drivers/net/ethernet/qlogic/qede/qede_main.c | 354 +++ 5 files changed, 436 insertions(+) create mode 100644 drivers/net/ethernet/qlogic/qede/Makefile create mode 100644 drivers/net/ethernet/qlogic/qede/qede.h create mode 100644 drivers/net/ethernet/qlogic/qede/qede_main.c diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig index 58c3fb3..30a6f24 100644 --- a/drivers/net/ethernet/qlogic/Kconfig +++ b/drivers/net/ethernet/qlogic/Kconfig @@ -97,4 +97,9 @@ config QED ---help--- This enables the support for ... +config QEDE + tristate "QLogic QED 25/40/100Gb Ethernet NIC" + depends on QED + ---help--- + This enables the support for ... 
endif # NET_VENDOR_QLOGIC diff --git a/drivers/net/ethernet/qlogic/Makefile b/drivers/net/ethernet/qlogic/Makefile index 7600138..cee90e0 100644 --- a/drivers/net/ethernet/qlogic/Makefile +++ b/drivers/net/ethernet/qlogic/Makefile @@ -7,3 +7,4 @@ obj-$(CONFIG_QLCNIC) += qlcnic/ obj-$(CONFIG_QLGE) += qlge/ obj-$(CONFIG_NETXEN_NIC) += netxen/ obj-$(CONFIG_QED) += qed/ +obj-$(CONFIG_QEDE)+= qede/ diff --git a/drivers/net/ethernet/qlogic/qede/Makefile b/drivers/net/ethernet/qlogic/qede/Makefile new file mode 100644 index 000..bedfe9f --- /dev/null +++ b/drivers/net/ethernet/qlogic/qede/Makefile @@ -0,0 +1,3 @@ +obj-$(CONFIG_QEDE) := qede.o + +qede-y := qede_main.o diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h new file mode 100644 index 000..7e2bcfa --- /dev/null +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -0,0 +1,73 @@ +/* QLogic qede NIC Driver +* Copyright (c) 2015 QLogic Corporation +* +* This software is available under the terms of the GNU General Public License +* (GPL) Version 2, available from the file COPYING in the main directory of +* this source tree. +*/ + +#ifndef _QEDE_H_ +#define _QEDE_H_ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define QEDE_MAJOR_VERSION 8 +#define QEDE_MINOR_VERSION 4 +#define QEDE_REVISION_VERSION 0 +#define QEDE_ENGINEERING_VERSION 0 +#define DRV_MODULE_VERSION __stringify(QEDE_MAJOR_VERSION) "." \ + __stringify(QEDE_MINOR_VERSION) "." \ + __stringify(QEDE_REVISION_VERSION) "." 
\ + __stringify(QEDE_ENGINEERING_VERSION) + +#define QEDE_ETH_INTERFACE_VERSION 300 + +#define DRV_MODULE_SYM qede + +struct qede_dev { + struct qed_dev *cdev; + struct net_device *ndev; + struct pci_dev *pdev; + + u32 dp_module; + u8 dp_level; + + const struct qed_eth_ops*ops; + + struct qed_dev_eth_info dev_info; +#define QEDE_MAX_RSS_CNT(edev) ((edev)->dev_info.num_queues) +#define QEDE_MAX_TSS_CNT(edev) ((edev)->dev_info.num_queues * \ +(edev)->dev_info.num_tc) + + u16 num_rss; + u8 num_tc; +#define QEDE_RSS_CNT(edev) ((edev)->num_rss) +#define QEDE_TSS_CNT(edev) ((edev)->num_rss * \ +(edev)->num_tc) +#define QEDE_TSS_IDX(edev, txqidx) ((txqidx) % (edev)->num_rss) +#define QEDE_TC_IDX(edev, txqidx) ((txqidx) / (edev)->num_rss) + + struct qed_int_info int_info; + unsigned char primary_mac[ETH_ALEN]; + + /* Smaller private varaiant of the RTNL lock */ + struct mutexqede_lock; + u32 state; /* Protected by qede_lock */ +}; + +/* Debug print definitions */ +#define DP_NAME(edev) ((edev)->ndev->name) + +#endif /* _QEDE_H_ */ diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c new file mode 100644 index 000..35065dc --- /dev/null +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -0,0 +1,354 @@ +/* QLogic qede NIC Driver +* Copyright (c) 2015 QLogic Corporation +* +* This software is available under the terms of the GNU General Public License +* (GPL) Version 2, available from the
[RFC v2 net-next 09/10] qed: Add statistics support
From: Manish ChopraDevice statistics can be gathered on-demand. This adds the qed support for reading the statistics [both function and port] from the device, and adds to the public API a method for requesting the current statistics. Signed-off-by: Manish Chopra Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior --- drivers/net/ethernet/qlogic/qed/qed.h | 14 ++ drivers/net/ethernet/qlogic/qed/qed_dev.c | 244 +- drivers/net/ethernet/qlogic/qed/qed_dev_api.h | 3 + drivers/net/ethernet/qlogic/qed/qed_hsi.h | 30 drivers/net/ethernet/qlogic/qed/qed_l2.c | 3 + include/linux/qed/qed_eth_if.h| 3 + 6 files changed, 296 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index 965b728..f809f7b 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -215,7 +215,20 @@ struct qed_qm_info { u32 pf_rl; }; +struct storm_stats { + u32 address; + u32 len; +}; + +struct qed_storm_stats { + struct storm_stats mstats; + struct storm_stats pstats; + struct storm_stats tstats; + struct storm_stats ustats; +}; + struct qed_fw_data { + struct fw_ver_info *fw_ver_info; const u8*modes_tree_buf; union init_op *init_ops; const u32 *arr_data; @@ -299,6 +312,7 @@ struct qed_hwfn { /* QM init */ struct qed_qm_info qm_info; + struct qed_storm_stats storm_stats; /* Buffer for unzipping firmware data */ void*unzip_buf; diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index cde72e2..3993584 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -647,8 +647,10 @@ int qed_hw_init(struct qed_dev *cdev, bool allow_npar_tx_switch, const u8 *bin_fw_data) { - u32 load_code, param; + struct qed_storm_stats *p_stat; + u32 load_code, param, *p_address; int rc, mfw_rc, i; + u8 fw_vport = 0; rc = qed_init_fw_data(cdev, bin_fw_data); if (rc != 0) @@ -657,6 +659,10 @@ int qed_hw_init(struct qed_dev 
*cdev, for_each_hwfn(cdev, i) { struct qed_hwfn *p_hwfn = >hwfns[i]; + rc = qed_fw_vport(p_hwfn, 0, _vport); + if (rc != 0) + return rc; + /* Enable DMAE in PXP */ rc = qed_change_pci_hwfn(p_hwfn, p_hwfn->p_main_ptt, true); @@ -723,6 +729,25 @@ int qed_hw_init(struct qed_dev *cdev, } p_hwfn->hw_init_done = true; + + /* init PF stats */ + p_stat = _hwfn->storm_stats; + p_stat->mstats.address = BAR0_MAP_REG_MSDM_RAM + +MSTORM_QUEUE_STAT_OFFSET(fw_vport); + p_stat->mstats.len = sizeof(struct eth_mstorm_per_queue_stat); + + p_stat->ustats.address = BAR0_MAP_REG_USDM_RAM + +USTORM_QUEUE_STAT_OFFSET(fw_vport); + p_stat->ustats.len = sizeof(struct eth_ustorm_per_queue_stat); + + p_stat->pstats.address = BAR0_MAP_REG_PSDM_RAM + +PSTORM_QUEUE_STAT_OFFSET(fw_vport); + p_stat->pstats.len = sizeof(struct eth_pstorm_per_queue_stat); + + p_address = _stat->tstats.address; + *p_address = BAR0_MAP_REG_TSDM_RAM + +TSTORM_PORT_STAT_OFFSET(MFW_PORT(p_hwfn)); + p_stat->tstats.len = sizeof(struct tstorm_per_port_stat); } return 0; @@ -1503,6 +1528,223 @@ void qed_chain_free(struct qed_dev *cdev, p_chain->p_phys_addr); } +static void __qed_get_vport_stats(struct qed_dev *cdev, + struct qed_eth_stats *stats) +{ + int i, j; + + memset(stats, 0, sizeof(*stats)); + + for_each_hwfn(cdev, i) { + struct qed_hwfn *p_hwfn = >hwfns[i]; + struct eth_mstorm_per_queue_stat mstats; + struct eth_ustorm_per_queue_stat ustats; + struct eth_pstorm_per_queue_stat pstats; + struct tstorm_per_port_stat tstats; + struct port_stats port_stats; + struct qed_ptt *p_ptt = qed_ptt_acquire(p_hwfn); + + if (!p_ptt) { + DP_ERR(p_hwfn, "Failed to acquire ptt\n"); + continue; + } + + memset(, 0, sizeof(mstats)); + qed_memcpy_from(p_hwfn, p_ptt, , +
[RFC v2 net-next 07/10] qed: Add link support
Physical link is handled by the management Firmware. This patch lays the infrastructure for attention handling in the driver, as link change notifications arrive via async. attentions, as well the handling of such notifications. This patch also extends the API with the protocol drivers by adding registered callbacks which the protocol driver passes to qed in order to be notified of async. events originating from the FW/HW. Signed-off-by: Yuval MintzSigned-off-by: Ariel Elior --- drivers/net/ethernet/qlogic/qed/qed.h | 20 ++ drivers/net/ethernet/qlogic/qed/qed_dev.c | 106 - drivers/net/ethernet/qlogic/qed/qed_int.c | 340 - drivers/net/ethernet/qlogic/qed/qed_l2.c | 9 + drivers/net/ethernet/qlogic/qed/qed_main.c | 212 ++ drivers/net/ethernet/qlogic/qed/qed_mcp.c | 300 + drivers/net/ethernet/qlogic/qed/qed_mcp.h | 126 ++- include/linux/qed/qed_eth_if.h | 4 + 8 files changed, 1112 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h index ab87526..965b728 100644 --- a/drivers/net/ethernet/qlogic/qed/qed.h +++ b/drivers/net/ethernet/qlogic/qed/qed.h @@ -111,6 +111,18 @@ enum QED_FEATURE { QED_MAX_FEATURES, }; +enum QED_PORT_MODE { + QED_PORT_MODE_DE_2X40G, + QED_PORT_MODE_DE_2X50G, + QED_PORT_MODE_DE_1X100G, + QED_PORT_MODE_DE_4X10G_F, + QED_PORT_MODE_DE_4X10G_E, + QED_PORT_MODE_DE_4X20G, + QED_PORT_MODE_DE_1X40G, + QED_PORT_MODE_DE_2X25G, + QED_PORT_MODE_DE_1X25G +}; + struct qed_hw_info { /* PCI personality */ enum qed_pci_personalitypersonality; @@ -407,6 +419,13 @@ struct qed_dev { u8 protocol; #define IS_QED_ETH_IF(cdev) ((cdev)->protocol == QED_PROTOCOL_ETH) + /* Callbacks to protocol driver */ + union { + struct qed_common_cb_ops*common; + struct qed_eth_cb_ops *eth; + } protocol_ops; + void*ops_cookie; + const struct firmware *firmware; }; @@ -456,6 +475,7 @@ static inline u8 qed_concrete_to_sw_fid(struct qed_dev *cdev, /* Prototypes */ int qed_fill_dev_info(struct qed_dev *cdev, struct 
qed_dev_info *dev_info); +void qed_link_update(struct qed_hwfn *hwfn); u32 qed_unzip_data(struct qed_hwfn *p_hwfn, u32 input_len, u8 *input_buf, u32 max_size, u8 *unzip_buf); diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 30408b7..cde72e2 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -1040,8 +1040,9 @@ static void qed_hw_get_resc(struct qed_hwfn *p_hwfn) static int qed_hw_get_nvm_info(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt) { - u32 nvm_cfg1_offset, mf_mode, addr, generic_cont0; - u32 val; + u32 nvm_cfg1_offset, mf_mode, addr, generic_cont0, core_cfg; + struct qed_mcp_link_params *link; + u32 port_cfg_addr, link_temp, val; /* Read global nvm_cfg address */ u32 nvm_cfg_addr = qed_rd(p_hwfn, p_ptt, MISC_REG_GEN_PURP_CR0); @@ -1061,6 +1062,48 @@ static int qed_hw_get_nvm_info(struct qed_hwfn *p_hwfn, offsetof(struct nvm_cfg1_glob, pci_id); p_hwfn->hw_info.vendor_id = qed_rd(p_hwfn, p_ptt, addr) & NVM_CFG1_GLOB_VENDOR_ID_MASK; + + addr = MCP_REG_SCRATCH + nvm_cfg1_offset + + offsetof(struct nvm_cfg1, glob) + + offsetof(struct nvm_cfg1_glob, core_cfg); + + core_cfg = qed_rd(p_hwfn, p_ptt, addr); + + switch ((core_cfg & NVM_CFG1_GLOB_NETWORK_PORT_MODE_MASK) >> + NVM_CFG1_GLOB_NETWORK_PORT_MODE_OFFSET) { + case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_2X40G: + p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_2X40G; + break; + case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_2X50G: + p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_2X50G; + break; + case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_1X100G: + p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_1X100G; + break; + case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X10G_F: + p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_4X10G_F; + break; + case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X10G_E: + p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_4X10G_E; + break; + case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X20G: + p_hwfn->hw_info.port_mode = 
QED_PORT_MODE_DE_4X20G; + break; + case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_1X40G: +
[RFC v2 net-next 05/10] qede: Add basic network device support
From: Sudarsana KalluruThis patch includes the basic Rx/Tx support for the driver [although carrier will still never be turned on]. Following this patch the driver registers a network device, initializes it and prepares it for traffic. Signed-off-by: Sudarsana Kalluru Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior --- drivers/net/ethernet/qlogic/qede/qede.h | 132 ++ drivers/net/ethernet/qlogic/qede/qede_main.c | 1801 ++ 2 files changed, 1933 insertions(+) diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h index 7e2bcfa..7680106 100644 --- a/drivers/net/ethernet/qlogic/qede/qede.h +++ b/drivers/net/ethernet/qlogic/qede/qede.h @@ -36,6 +36,8 @@ #define DRV_MODULE_SYM qede +#define QEDE_NAPI_WEIGHT (NAPI_POLL_WEIGHT) + struct qede_dev { struct qed_dev *cdev; struct net_device *ndev; @@ -51,6 +53,7 @@ struct qede_dev { #define QEDE_MAX_TSS_CNT(edev) ((edev)->dev_info.num_queues * \ (edev)->dev_info.num_tc) + struct qede_fastpath*fp_array; u16 num_rss; u8 num_tc; #define QEDE_RSS_CNT(edev) ((edev)->num_rss) @@ -58,6 +61,9 @@ struct qede_dev { (edev)->num_tc) #define QEDE_TSS_IDX(edev, txqidx) ((txqidx) % (edev)->num_rss) #define QEDE_TC_IDX(edev, txqidx) ((txqidx) / (edev)->num_rss) +#define QEDE_TX_QUEUE(edev, txqidx)\ + (&(edev)->fp_array[QEDE_TSS_IDX((edev), (txqidx))].txqs[QEDE_TC_IDX( \ + (edev), (txqidx))]) struct qed_int_info int_info; unsigned char primary_mac[ETH_ALEN]; @@ -65,9 +71,135 @@ struct qede_dev { /* Smaller private varaiant of the RTNL lock */ struct mutexqede_lock; u32 state; /* Protected by qede_lock */ + u16 rx_buf_size; + /* L2 header size + 2*VLANs (8 bytes) + LLC SNAP (8 bytes) */ +#define ETH_OVERHEAD (ETH_HLEN + 8 + 8) + /* Max supported alignment is 256 (8 shift) +* minimal alignment shift 6 is optimal for 57xxx HW performance +*/ +#define QEDE_RX_ALIGN_SHIFTmax(6, min(8, L1_CACHE_SHIFT)) + /* We assume skb_build() uses sizeof(struct skb_shared_info) bytes +* at the end of skb->data, to 
avoid wasting a full cache line. +* This reduces memory use (skb->truesize). +*/ +#define QEDE_FW_RX_ALIGN_END \ + max_t(u64, 1UL << QEDE_RX_ALIGN_SHIFT, \ + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) + + struct qed_update_vport_rss_params rss_params; + u16 q_num_rx_buffers; /* Must be a power of two */ + u16 q_num_tx_buffers; /* Must be a power of two */ +}; + +enum QEDE_STATE { + QEDE_STATE_CLOSED, + QEDE_STATE_OPEN, +}; + +#define U64_LO(x) ((u32)(((u64)(x)) & 0x)) +#define U64_HI(x) ((u32)(((u64)(x)) >> 32)) +#define HILO_U64(hi, lo) u64)(hi)) << 32) + (lo)) + +#defineMAX_NUM_TC 8 +#defineMAX_NUM_PRI 8 + +/* The driver supports the new build_skb() API: + * RX ring buffer contains pointer to kmalloc() data only, + * skb are built only after the frame was DMA-ed. + */ +struct sw_rx_data { + u8 *data; + + DEFINE_DMA_UNMAP_ADDR(mapping); +}; + +struct qede_rx_queue { + __le16 *hw_cons_ptr; + struct sw_rx_data *sw_rx_ring; + u16 sw_rx_cons; + u16 sw_rx_prod; + struct qed_chainrx_bd_ring; + struct qed_chainrx_comp_ring; + void __iomem*hw_rxq_prod_addr; + + int rx_buf_size; + + u16 num_rx_buffers; + u16 rxq_id; + + u64 rx_hw_errors; + u64 rx_alloc_errors; +}; + +union db_prod { + struct eth_db_data data; + u32 raw; +}; + +struct sw_tx_bd { + struct sk_buff *skb; + u8 flags; +/* Set on the first BD descriptor when there is a split BD */ +#define QEDE_TSO_SPLIT_BD BIT(0) +}; + +struct qede_tx_queue { + int index; /* Queue index */ + __le16 *hw_cons_ptr; + struct sw_tx_bd *sw_tx_ring; + u16 sw_tx_cons; + u16 sw_tx_prod; + struct qed_chaintx_pbl; + void __iomem
Re: [PATCH net-next 2/2] net: bcmgenet: Implement RX coalescing control knobs
On 17/09/15 10:58, Florian Fainelli wrote:
> On 16/09/15 16:47, Florian Fainelli wrote:
>> Add support for the ethtool rx-frames coalescing parameter which allows
>> defining the number of RX interrupts per frames received. The RDMA
>> engine supports a configurable timeout with a resolution of
>> approximately 8.192 us.
>>
>> We can no longer enable the BDONE/PDONE interrupts as those would
>> fire for each packet/buffer received, which would defeat the MBDONE
>> interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a
>> PDONE/BDONE interrupt when the threshold is set to 1.
> 
> *sigh*, I missed the initialization of the INTR_THRESHOLD register, so
> right now, we just have no interrupts configured properly for RX, will
> re-submit shortly.
> 
> Meanwhile, please send feedback if you have any, thanks!
> 

Actually, no that version of the patch is just fine, since we already
programmed the DMA_MBUF_DONE_THRESH since commit
6f5a272c99108d9f8450c454a4baede9e7cc643f ("net: bcmgenet: rework Rx
queue init")

Sorry about the noise, -ENOCOFFEE.
-- 
Florian
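As a side note on the 8.192 us timeout resolution quoted above, the
arithmetic for turning a microsecond request into timer ticks can be
sketched as follows. This is an illustration only: the helper names
and the round-up policy are assumptions, not taken from the patch.

```c
#include <stdint.h>

/* Approximate timer resolution quoted in the thread: 8.192 us = 8192 ns. */
#define DEMO_TICK_NS 8192u

/* Round up so that a non-zero request never collapses to a zero timeout. */
static uint32_t demo_usecs_to_ticks(uint32_t usecs)
{
	uint64_t ns = (uint64_t)usecs * 1000u;

	return (uint32_t)((ns + DEMO_TICK_NS - 1) / DEMO_TICK_NS);
}

/* Reverse direction, truncating to whole microseconds. */
static uint32_t demo_ticks_to_usecs(uint32_t ticks)
{
	return (uint32_t)(((uint64_t)ticks * DEMO_TICK_NS) / 1000u);
}
```

For example, a 50 us request needs 7 ticks (6 ticks would only cover
about 49.2 us), and 7 ticks read back as 57 us.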
[RFC] net: fix data race on sk_buff after re-cloning
KernelThreadSanitizer (KTSAN) reported the following race (on 4.2 rc2):

ThreadSanitizer: data-race in __copy_skb_header

Write at 0x8800bb158f48 of size 8 by thread 3146 on CPU 5:
 [] __copy_skb_header+0xee/0x1d0 net/core/skbuff.c:765
 [] __skb_clone+0x5c/0x320 net/core/skbuff.c:820
 [] skb_clone+0xd0/0x130 net/core/skbuff.c:962
 [] tcp_transmit_skb+0xb5/0x1750 net/ipv4/tcp_output.c:932
 [] __tcp_retransmit_skb+0x244/0xb10 net/ipv4/tcp_output.c:2638
 [] tcp_retransmit_skb+0x2b/0x240 net/ipv4/tcp_output.c:2655
 [] tcp_retransmit_timer+0x579/0xb70 net/ipv4/tcp_timer.c:433
 [] tcp_write_timer_handler+0x109/0x320 net/ipv4/tcp_timer.c:514
 [] tcp_write_timer+0xc0/0xe0 net/ipv4/tcp_timer.c:532
 [] call_timer_fn+0x4c/0x1b0 kernel/time/timer.c:1155
 [< inline >] __run_timers kernel/time/timer.c:1231
 [] run_timer_softirq+0x313/0x500 kernel/time/timer.c:1414
 [] __do_softirq+0xbe/0x2f0 kernel/softirq.c:273
 [] apic_timer_interrupt+0x8a/0xa0 arch/x86/entry/entry_64.S:790

Previous read at 0x8800bb158f48 of size 8 by thread 3168 on CPU 0:
 [] skb_release_head_state+0x4b/0x120 net/core/skbuff.c:640
 [] skb_release_all+0x1d/0x50 net/core/skbuff.c:657
 [< inline >] __kfree_skb net/core/skbuff.c:673
 [] consume_skb+0x60/0x100 net/core/skbuff.c:746
 [] __dev_kfree_skb_any+0x4d/0x60 net/core/dev.c:2312
 [< inline >] dev_kfree_skb_any include/linux/netdevice.h:2933
 [] e1000_unmap_and_free_tx_resource.isra.42+0xd3/0x120 drivers/net/ethernet/intel/e1000/e1000_main.c:1973
 [< inline >] e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3881
 [] e1000_clean+0x24d/0x11e0 drivers/net/ethernet/intel/e1000/e1000_main.c:3818
 [< inline >] napi_poll net/core/dev.c:4744
 [] net_rx_action+0x489/0x690 net/core/dev.c:4809
 [] __do_softirq+0xbe/0x2f0 kernel/softirq.c:273
 [] apic_timer_interrupt+0x8a/0xa0 arch/x86/entry/entry_64.S:790

Mutexes locked by thread 3146:
Mutex 436586 is locked here:
 [< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
 [] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
 [< inline >] spin_lock include/linux/spinlock.h:312
 [] tcp_write_timer+0x25/0xe0 net/ipv4/tcp_timer.c:530
 [] call_timer_fn+0x4c/0x1b0 kernel/time/timer.c:1155
 [< inline >] __run_timers kernel/time/timer.c:1231
 [] run_timer_softirq+0x313/0x500 kernel/time/timer.c:1414
 [] __do_softirq+0xbe/0x2f0 kernel/softirq.c:273
 [] apic_timer_interrupt+0x8a/0xa0 arch/x86/entry/entry_64.S:790

The only way I can see this happening is as follows:
- sk_buff_fclones is allocated
- then it is cloned, which returns fclones->skb2
- then fclones->skb2 is freed, which drops fclones->fclone_ref to 1
- then the original skb is cloned again
- at this point skb_clone sees that fclones->fclone_ref = 1
  and returns fclones->skb2 again

Now initialization of fclones->skb2 races with the previous use,
because the refcounting lacks proper memory barriers.

I am looking at the skb code for the first time, so I can't conclude
whether such a scenario is possible or not. But the refcount handling,
at least in kfree_skbmem(), looks broken. For example, kfree_skb()
properly inserts an rmb after the fast-path check:

	if (likely(atomic_read(&skb->users) == 1))
		smp_rmb();

The patch contains a proposed fix.
If it looks good to you and the scenario looks sane,
then I will update the description and resend it.
---
 net/core/skbuff.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index dad4dd3..4c89bac 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -618,8 +618,9 @@ static void kfree_skbmem(struct sk_buff *skb)
 		/* We usually free the clone (TX completion) before original skb
 		 * This test would have no chance to be true for the clone,
 		 * while here, branch prediction will be good.
+		 * Paired with atomic_dec_and_test() below.
 		 */
-		if (atomic_read(&fclones->fclone_ref) == 1)
+		if (atomic_read_acquire(&fclones->fclone_ref) == 1)
 			goto fastpath;
 		break;
 
@@ -944,7 +945,8 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t gfp_mask)
 		return NULL;
 
 	if (skb->fclone == SKB_FCLONE_ORIG &&
-	    atomic_read(&fclones->fclone_ref) == 1) {
+	    /* Paired with atomic_dec_and_test() in kfree_skbmem(). */
+	    atomic_read_acquire(&fclones->fclone_ref) == 1) {
 		n = &fclones->skb2;
 		atomic_set(&fclones->fclone_ref, 2);
 	} else {
-- 
2.6.0.rc0.131.gf624c3d

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
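The ordering argument in the RFC above can be illustrated outside the kernel. What follows is a minimal userspace sketch using C11 atomics, not the kernel's atomic_t API and not code from skbuff.c. The names struct obj, obj_put() and obj_try_reuse() are hypothetical. It assumes the decrement is at least a release operation (the kernel's atomic_dec_and_test() is a full barrier, which is stronger) and shows why the "count == 1" check wants acquire ordering before the slot is written again.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a clone slot guarded by a reference count. */
struct obj {
	atomic_int ref;		/* plays the role of fclone_ref */
	int payload;		/* plays the role of the skb2 fields */
};

/*
 * Drop one reference. Release ordering makes every earlier access to
 * o->payload by this thread visible before the decremented count is.
 * Returns true if this call dropped the last reference.
 */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub_explicit(&o->ref, 1,
					 memory_order_release) == 1;
}

/*
 * Reuse the slot only if the caller holds the last reference. The
 * acquire load pairs with the release in obj_put(), so the write to
 * o->payload below cannot be reordered before the previous user's
 * final accesses.
 */
static bool obj_try_reuse(struct obj *o, int new_payload)
{
	if (atomic_load_explicit(&o->ref, memory_order_acquire) != 1)
		return false;
	o->payload = new_payload;
	atomic_store_explicit(&o->ref, 2, memory_order_relaxed);
	return true;
}
```

With a relaxed load in obj_try_reuse(), nothing would order the payload write after the other thread's last read, which is the shape of the race KTSAN reports here.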
Re: list of all network namespaces
On Thu, Sep 17, 2015 at 10:39 AM, Ani Sinha wrote:
> On Thu, Sep 17, 2015 at 2:51 AM, Rosen, Rami wrote:
>
>> Network namespaces which were created by other ways (like userspace
>> applications using the clone() system call) will *not* be reflected
>> by neither of them.
>
> Will there be any interest if I cook up a kernel patch that lists all
> network namespaces through /proc?

How would you list them? They don't have names in the kernel; names are
given in user space.
Re: [PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary
Hi Maxime,

On 17.09.2015 20:27, Maxime Ripard wrote:
> On Thu, Sep 17, 2015 at 08:12:31PM +0200, Oliver Hartkopp wrote:
>> New CAN drivers go via can-next and net-next into mainline.
>
> Hmmm, actually, I meant 2 and 3, the two defconfig patches.
>
> The driver and bindings should of course go through Marc's tree.

Ok. Thanks for the fix :-)

Regards,
Oliver
Re: [PATCH] geneve: restore vlan bits in xmit path
On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville wrote:
> These seem to have been accidentally dropped in commit 371bd1061d29
> ("geneve: Consolidate Geneve functionality in single module.").
>
Geneve should not export vxlan feature. So that it never sees vxlan
tagged packets. Can you turn off the vlan feature?
Re: [PATCH net-next 2/2] net: bcmgenet: Implement RX coalescing control knobs
On 16/09/15 16:47, Florian Fainelli wrote:
> Add support for the ethtool rx-frames coalescing parameter which allows
> defining the number of RX interrupts per frames received. The RDMA
> engine supports a configurable timeout with a resolution of
> approximately 8.192 us.
>
> We can no longer enable the BDONE/PDONE interrupts as those would
> fire for each packet/buffer received, which would defeat the MBDONE
> interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a
> PDONE/BDONE interrupt when the threshold is set to 1.

*sigh*, I missed the initialization of the INTR_THRESHOLD register, so
right now we have no interrupts configured properly for RX. I will
re-submit shortly. Meanwhile, please send feedback if you have any,
thanks!
--
Florian
Re: list of all network namespaces
On Thu, 17 Sep 2015 10:39:57 -0700, Ani Sinha wrote:
> Will there be any interest if I cook up a kernel patch that lists all
> network namespaces through /proc?

/proc is the wrong interface for this; enumerating all net namespaces
has nothing to do with processes. Each process already has its
corresponding namespaces listed in /proc, which is as much as belongs
there.

Dumping all net namespaces should probably be netlink based but,
obviously, you'll have a hard time sending file descriptors over
netlink. You can dump their netnsids, but that won't help you much with
accessing the namespace contents.

This is not as easy as it seems. But I'd love to have such a feature.

 Jiri

-- 
Jiri Benc
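As a concrete illustration of the per-process view mentioned above: on Linux, /proc/<pid>/ns/net is a symbolic link whose target identifies the process's network namespace, in the form "net:[<inode>]". The sketch below reads that link; the helper name netns_id_of() is hypothetical. Walking every PID this way only finds namespaces that currently have a process in them, so namespaces kept alive solely by a bind mount or an open file descriptor are missed, which is why this does not amount to a full enumeration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Read the network namespace identifier of a process from procfs.
 * pid_str is "self" or a decimal PID. On success the link target
 * (for example "net:[4026531956]") is stored NUL-terminated in buf
 * and 0 is returned; on any error -1 is returned.
 */
static int netns_id_of(const char *pid_str, char *buf, size_t len)
{
	char path[64];
	ssize_t n;
	int written;

	if (len == 0)
		return -1;

	written = snprintf(path, sizeof(path), "/proc/%s/ns/net", pid_str);
	if (written < 0 || (size_t)written >= sizeof(path))
		return -1;

	/* readlink() does not NUL-terminate, so leave room for it. */
	n = readlink(path, buf, len - 1);
	if (n < 0)
		return -1;

	buf[n] = '\0';
	return 0;
}
```

Two processes share a network namespace exactly when their link targets are equal, so comparing the returned strings is enough to group processes by namespace.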
Re: [PATCH net] netlink: make sure -EBUSY won't escape from netlink_insert
On Wed, Sep 16, 2015 at 10:41 PM, Christoph Paasch wrote:
>
> can this patch get queued up for 4.1 as well?
> It seems to fix a similar issue in 4.1.6.

I think Herbert has an additional patch for this issue. But yes, I
think it should be scheduled for stable.

Herbert?

 Linus
Re: list of all network namespaces
On Thu, Sep 17, 2015 at 2:51 AM, Rosen, Rami wrote:
> Network namespaces which were created by other ways (like userspace
> applications using the clone() system call) will *not* be reflected
> by neither of them.

Will there be any interest if I cook up a kernel patch that lists all
network namespaces through /proc?
Re: [PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary
On Wed, Sep 16, 2015 at 01:21:18PM +0200, Gerhard Bertelsmann wrote:
> Hi,
>
> please find attached the next version of my patch set. I have
> taken all remarks from Maxime Ripard into the new version
>
> Please review, test and report bugs if exists.
>
> The patchset applies to all recent Kernel versions (4.x, next etc.).
>
> [PATCH v8 1/4] Device Tree Binding Documentation
> [PATCH v8 2/4] Defconfig multi_v7
> [PATCH v8 3/4] Defconfig sunxi
> [PATCH v8 4/4] Kernel Module

Applied 3 and 4.

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
Re: [PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary
On 17.09.2015 19:54, Maxime Ripard wrote:
> On Wed, Sep 16, 2015 at 01:21:18PM +0200, Gerhard Bertelsmann wrote:
>> Hi,
>>
>> please find attached the next version of my patch set. I have
>> taken all remarks from Maxime Ripard into the new version
>>
>> Please review, test and report bugs if exists.
>>
>> The patchset applies to all recent Kernel versions (4.x, next etc.).
>>
>> [PATCH v8 1/4] Device Tree Binding Documentation
>> [PATCH v8 2/4] Defconfig multi_v7
>> [PATCH v8 3/4] Defconfig sunxi
>> [PATCH v8 4/4] Kernel Module
>
> Applied 3 and 4.

Applied to what tree?

That's not the friendly way when Marc asks you about the documentation
about the device tree (patch 1) and you commit the CAN driver and the
sunxi defconfig (patch 3 & 4) that he mainly reviewed to whatever tree.

New CAN drivers go via can-next and net-next into mainline.

So please answer Marcs question and let him queue up the CAN driver via
can-next himself.

Thanks,
Oliver
Re: [PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary
On Thu, Sep 17, 2015 at 08:12:31PM +0200, Oliver Hartkopp wrote:
>
>
> On 17.09.2015 19:54, Maxime Ripard wrote:
> >On Wed, Sep 16, 2015 at 01:21:18PM +0200, Gerhard Bertelsmann wrote:
> >>Hi,
> >>
> >>please find attached the next version of my patch set. I have
> >>taken all remarks from Maxime Ripard into the new version
> >>
> >>Please review, test and report bugs if exists.
> >>
> >>The patchset applies to all recent Kernel versions (4.x, next etc.).
> >>
> >>[PATCH v8 1/4] Device Tree Binding Documentation
> >>[PATCH v8 2/4] Defconfig multi_v7
> >>[PATCH v8 3/4] Defconfig sunxi
> >>[PATCH v8 4/4] Kernel Module
> >
> >Applied 3 and 4.
>
> Applied to what tree?
>
> That's not the friendly way when Marc asks you about the documentation about
> the device tree (patch 1) and you commit the CAN driver and the sunxi
> defconfig (patch 3 & 4) that he mainly reviewed to whatever tree.
>
> New CAN drivers go via can-next and net-next into mainline.
>
> So please answer Marcs question and let him queue up the CAN driver via
> can-next himself.

Hmmm, actually, I meant 2 and 3, the two defconfig patches.

The driver and bindings should of course go through Marc's tree.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
From: KY Srinivasan
Date: Thu, 17 Sep 2015 15:14:05 +

> I think I can achieve my original goal of not having any allocation
> in the send path by carefully using the memory available in the skb:

Please stop flat-out ignoring David L.'s suggestion.

Have a pre-cooked ring of buffers for these descriptors that you can
point the chip at. No per-packet allocation is necessary at all.

If you play games with SKBs you will get burned.
[PATCH] geneve: remove use of internal IP header when calling IP_ECN_decapsulate
This seems to have been a "thinko". IP_ECN_decapsulate needs info from
both internal and external headers.

Signed-off-by: John W. Linville
---
 drivers/net/geneve.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index da3259ce7c8d..a917ae1cfbf3 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -121,10 +121,10 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 	struct metadata_dst *tun_dst = NULL;
 	struct geneve_dev *geneve = NULL;
 	struct pcpu_sw_netstats *stats;
-	struct iphdr *iph;
+	struct iphdr *iph = NULL;
 	u8 *vni;
 	__be32 addr;
-	int err;
+	int err = 0;
 
 	if (gs->collect_md) {
 		static u8 zero_vni[3];
@@ -178,13 +178,15 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 
-	iph = ip_hdr(skb); /* Now inner IP header... */
-	err = IP_ECN_decapsulate(iph, skb);
+	if (iph)
+		err = IP_ECN_decapsulate(iph, skb);
 
 	if (unlikely(err)) {
 		if (log_ecn_error)
-			net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
-					     &iph->saddr, iph->tos);
+			if (iph)
+				net_info_ratelimited("non-ECT from %pI4 "
+						     "with TOS=%#x\n",
+						     &iph->saddr, iph->tos);
 		if (err > 1) {
 			++geneve->dev->stats.rx_frame_errors;
 			++geneve->dev->stats.rx_errors;
-- 
2.4.3
[PATCH] geneve: restore vlan bits in xmit path
These seem to have been accidentally dropped in commit 371bd1061d29
("geneve: Consolidate Geneve functionality in single module.").

Signed-off-by: John W. Linville
---
 drivers/net/geneve.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a917ae1cfbf3..0aaf302cc31b 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -532,13 +533,20 @@ static int geneve_build_skb(struct rtable *rt, struct sk_buff *skb,
 	int err;
 
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
-			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr);
+			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr)
+			+ (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
 	err = skb_cow_head(skb, min_headroom);
 	if (unlikely(err)) {
 		kfree_skb(skb);
 		goto free_rt;
 	}
 
+	skb = vlan_hwaccel_push_inside(skb);
+	if (unlikely(!skb)) {
+		err = -ENOMEM;
+		goto free_rt;
+	}
+
 	skb = udp_tunnel_handle_offloads(skb, csum);
 	if (IS_ERR(skb)) {
 		err = PTR_ERR(skb);
-- 
2.4.3
iproute2 tunnel name parsing
Hi,

I'm trying to create a sit tunnel called "hel":

  ip tun add hel mode sit remote 10.200.0.2 local 10.200.1.2 ttl 255

However, it seems like this is interpreted as the help argument and I
get the usage text.

Is there a way to escape names that I've missed, or is this an error
somewhere in argv parsing?

(I'm not subscribed, so a cc would be appreciated)

Thanks,
Wilhelm
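One plausible explanation, stated here as an assumption and not as a confirmed diagnosis of iproute2: the ip tool accepts abbreviated keywords ("tun" for "tunnel" in the command above is one), and it does so by prefix matching. If the argument is compared against "help" before it is taken as a tunnel name, then "hel" matches as an abbreviation. The sketch below shows that kind of prefix test; is_prefix_of() is a hypothetical helper, not iproute2's own function.

```c
#include <string.h>

/*
 * Return 1 if arg is a non-empty prefix of keyword, 0 otherwise.
 * This is how an abbreviation such as "tun" can select "tunnel".
 */
static int is_prefix_of(const char *arg, const char *keyword)
{
	size_t len = strlen(arg);

	if (len == 0 || len > strlen(keyword))
		return 0;
	return memcmp(keyword, arg, len) == 0;
}
```

Under this rule every one of "h", "he", "hel" and "help" selects the help keyword, so none of them could be used as a bare tunnel name at that position.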
Re: [PATCH] geneve: remove use of internal IP header when calling IP_ECN_decapsulate
On Thu, Sep 17, 2015 at 12:46:48PM -0700, Jesse Gross wrote:
> On Thu, Sep 17, 2015 at 10:17 AM, John W. Linville wrote:
> > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> > index da3259ce7c8d..a917ae1cfbf3 100644
> > --- a/drivers/net/geneve.c
> > +++ b/drivers/net/geneve.c
> > @@ -178,13 +178,15 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
> >
> >         skb_reset_network_header(skb);
> >
> > -       iph = ip_hdr(skb); /* Now inner IP header... */
> > -       err = IP_ECN_decapsulate(iph, skb);
> > +       if (iph)
> > +               err = IP_ECN_decapsulate(iph, skb);
>
> It looks like this is now conditional based on !collect_md. I'm not
> sure that we want to have a difference in behavior between the two.

Sure, I can move the iph assignment higher-up and keep the other
bits unconditional.

John
-- 
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com			might be all we have.  Be ready.
[PATCH v2] geneve: remove use of internal IP header when calling IP_ECN_decapsulate
This seems to have been a "thinko". IP_ECN_decapsulate needs info from
both internal and external headers.

Signed-off-by: John W. Linville
---
v2 -- ensure the collect_md path still calls IP_ECN_decapsulate

 drivers/net/geneve.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index da3259ce7c8d..549febac0579 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -126,6 +126,8 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 	__be32 addr;
 	int err;
 
+	iph = ip_hdr(skb); /* outer IP header... */
+
 	if (gs->collect_md) {
 		static u8 zero_vni[3];
 
@@ -133,7 +135,6 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 		addr = 0;
 	} else {
 		vni = gnvh->vni;
-		iph = ip_hdr(skb); /* Still outer IP header... */
 		addr = iph->saddr;
 	}
 
@@ -178,7 +179,6 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 
-	iph = ip_hdr(skb); /* Now inner IP header... */
 	err = IP_ECN_decapsulate(iph, skb);
 
 	if (unlikely(err)) {
-- 
2.4.3
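To see why decapsulation needs both headers, here is a small userspace sketch of the decision, modeled on the ECN tunnel decapsulation rules of RFC 6040. It is not a copy of the kernel helper; ecn_decapsulate() and the enum names are hypothetical. The return convention is chosen to match the caller in the patch, where any non-zero value is logged and a value above 1 counts as a frame error and drops the packet.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECN codepoints: the low two bits of the IPv4 TOS byte (RFC 3168). */
enum {
	ECN_NOT_ECT = 0,
	ECN_ECT_1   = 1,
	ECN_ECT_0   = 2,
	ECN_CE      = 3,
	ECN_MASK    = 3,
};

/*
 * Combine the outer (tunnel) and inner (payload) ECN fields.
 * Returns 0 if the packet is fine, 1 if the combination is suspicious
 * but the packet can be forwarded, 2 if it should be dropped.
 * *set_ce tells the caller to copy Congestion Experienced from the
 * outer header into the inner one.
 */
static int ecn_decapsulate(uint8_t outer_tos, uint8_t inner_tos, bool *set_ce)
{
	uint8_t outer = outer_tos & ECN_MASK;
	uint8_t inner = inner_tos & ECN_MASK;

	*set_ce = false;

	if (inner == ECN_NOT_ECT) {
		/* The inner flow is not ECN capable. */
		switch (outer) {
		case ECN_NOT_ECT:
			return 0;
		case ECN_ECT_0:
		case ECN_ECT_1:
			return 1;	/* odd marking, log only */
		case ECN_CE:
			return 2;	/* congestion mark would be lost */
		}
	}

	*set_ce = (outer == ECN_CE);
	return 0;
}
```

Passing the inner header for both arguments, as the code did before the fix, makes the outer and inner fields always equal, so a CE mark set on the outer header along the tunnel path is never propagated or detected.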
Re: [RFC] net: fix data race on sk_buff after re-cloning
On Thu, 2015-09-17 at 20:44 +0200, Dmitry Vyukov wrote:
> KernelThreadSanitizer (KTSAN) reported the following race (on 4.2 rc2):
>
> ThreadSanitizer: data-race in __copy_skb_header

...

> if (likely(atomic_read(&skb->users) == 1))
> 	smp_rmb();
>
> The patch contains a proposed fix.
> If it looks good to you and the scenario looks sane,
> then I will update the description and resend it.
> ---
>  net/core/skbuff.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

I have to double check this patch, but in the case it is needed, it
would be better to not use fancy new atomic_read_acquire(), as
backporting the fix up to 3.19 (where the bug was probably added) will
require extra hassle.

atomic_read_acquire() would be fine for cleanups and new code, in next
branch.

Thanks !
[ANNOUNCE] nftables 0.5 release
Hi!

The Netfilter project proudly presents:

        nftables 0.5

This release contains bug fixes and new features contained up to the
4.2 kernel release.

New features
============

* Concatenations: You can combine two or more selectors to build a
  tuple, then use it to look up a match in sets, e.g.

  % nft add rule ip filter input ip saddr . tcp dport { \
        1.1.1.1 . 22 , \
        1.1.1.1 . 80 \
    } counter accept

  So nft will check if the source IP address AND the TCP destination
  port match what you have in the literal set above; if so, it will
  update the rule counter and accept the packet.

  You can also combine concatenations with verdict maps:

  % nft add rule ip filter input ether saddr . ip saddr . meta iif vmap { \
        3c:71:0e:39:bb:20 . 192.168.1.120 . "wlan0" : accept, \
        3c:77:e0:39:aa:21 . 192.168.1.204 . "wlan0" : drop }

  You can declare a set using concatenations, to dynamically update
  its content instead:

  % nft add map filter accesslist { \
        type ether_addr . ipv4_addr . iface_index : verdict \; }
  % nft add rule filter input ether saddr . ip saddr . meta iif vmap @accesslist

  Then, add elements to the set:

  % nft add element filter accesslist { \
        3c:71:0e:39:bb:20 . 192.168.1.120 . wlan0 : accept }

  On a different front, you can also combine concatenations with maps:

  % nft add rule ip nat prerouting dnat ip saddr . tcp dport map { \
        192.168.1.120 . 80 : 1.2.3.4, \
        192.168.1.204 . 22 : 4.3.2.1 }

  In the example above, the destination address that is used in DNAT
  depends on the source IP address and the destination port of the
  packet.

  You require a Linux kernel >= 4.1 to use this new concatenation
  feature and nftables 0.5 of course.

* Add timeout support for sets: You can specify a lifetime for
  elements in your set declarations, e.g.

  % nft add set filter whitelist { type ipv4_addr\; timeout 1h\; }
  % nft add element filter whitelist { 192.168.1.234 }
  % nft list ruleset
  table ip filter {
        set whitelist {
                type ipv4_addr
                timeout 1h
                elements = { 1.2.3.4 expires 59m56s}
        }
  }

  You can also create the set with no specific timeout:

  % nft add set filter whitelist { type ipv4_addr\; flags timeout\; }

  So you can indicate the timeout when adding the element:

  % nft add element filter whitelist { 192.168.2.123 timeout 1h }

  You can still mix this with elements that reside permanently:

  % nft add element filter whitelist { 192.168.2.180 }

* Add comments per set element, e.g.

  % nft add element filter whitelist { 192.168.0.1 comment \"some host\" }

* Support for mini-gmp: If you're running nft from embedded devices,
  you may want to skip the libgmp dependency via:

  % ./configure --with-mini-gmp

  This compiles nft using the minimal gmp implementation that comes in
  the nftables tarball. Note that your nft binary avoids the libgmp
  dependency at the cost of getting a slightly larger binary.

* Dormant tables: You can disable the entire ruleset that is contained
  in a table by setting the dormant flag:

  % nft add table filter { flags dormant\; }

  You can reenable it by typing:

  % nft add table filter

* Allow to specify default chain policy: You can specify the default
  chain policy when you create the chain:

  % nft add chain filter input { \
        type filter hook input priority 0\; policy drop\; }

  You can also change it for an existing chain anytime by updating it
  via:

  % nft add chain filter input { policy accept\; }

Bug fixes
=========

* Command per line ruleset representation: According to what I can find
  on the Internet, it seems some people like to maintain their ruleset
  in scripts so they can add comments and annotate things there.

  However, this is a problem for two reasons: There is no atomic update
  since rules are published to the packet path one after another and
  this increases the time that nft takes to reload your ruleset
  significantly.

  So, the solution to this problem consists of keeping your ruleset
  like this:

  % cat my-ruleset-file
  flush ruleset
  add table filter
  add set filter whitelist { type ipv4_addr; }
  add chain filter input { type filter hook input priority 0; }
  add rule filter input iif lo accept
  add rule filter input ct state established,related counter accept
  add rule filter input tcp dport { 22, 80 } counter accept
  add rule filter input ip saddr @whitelist counter accept
  add element filter whitelist { 192.168.1.120 }
  add element filter whitelist { 192.168.1.121 }
  add element filter whitelist { 192.168.1.204 }

  You can also insert comments in the file through '#'.

  Then, you can atomically restore it via:

  % nft -f my-ruleset-file

  You can also use this command per line representation to apply
  incremental ruleset updates atomically:

  % cat
Re: Experiences with slub bulk use-case for network stack
On Wed, 16 Sep 2015 10:13:25 -0500 (CDT) Christoph Lameter wrote:

> On Wed, 16 Sep 2015, Jesper Dangaard Brouer wrote:
>
> >
> > Hint, this leads up to discussing if current bulk *ALLOC* API need to
> > be changed...
> >
> > Alex and I have been working hard on practical use-case for SLAB
> > bulking (mostly slUb), in the network stack. Here is a summary of
> > what we have learned so far.
>
> SLAB refers to the SLAB allocator which is one slab allocator and SLUB is
> another slab allocator.
>
> Please keep that consistent otherwise things get confusing

This naming scheme is really confusing. I'll try to be more
consistent. So, you want capital letters SLAB and SLUB when talking
about a specific slab allocator implementation.

> > Bulk free'ing SKBs during TX completion is a big and easy win.
> >
> > Specifically for slUb, normal path for freeing these objects (which
> > are not on c->freelist) require a locked double_cmpxchg per object.
> > The bulk free (via detached freelist patch) allow to free all objects
> > belonging to the same slab-page, to be free'ed with a single locked
> > double_cmpxchg. Thus, the bulk free speedup is quite an improvement.
>
> Yep.
>
> > Alex and I had the idea of bulk alloc returns an "allocator specific
> > cache" data-structure (and we add some helpers to access this).
>
> Maybe add some Macros to handle this?

Yes, helpers will likely turn out to be macros.

> > In the slUb case, the freelist is a single linked pointer list. In
> > the network stack the skb objects have a skb->next pointer, which is
> > located at the same position as freelist pointer. Thus, simply
> > returning the freelist directly, could be interpreted as a skb-list.
> > The helper API would then do the prefetching, when pulling out
> > objects.
>
> The problem with the SLUB case is that the objects must be on the same
> slab page.

Yes, I'm aware of that; that is what we are trying to take advantage of.

> > For the slUb case, we would simply cmpxchg either c->freelist or
> > page->freelist with a NULL ptr, and then own all objects on the
> > freelist. This also reduce the time we keep IRQs disabled.
>
> You dont need to disable interrupts for the cmpxchges. There is
> additional state in the page struct though so the updates must be
> done carefully.

Yes, I'm aware that cmpxchg does not need interrupts disabled. And I
plan to take advantage of this in this new approach for bulk alloc.

Our current bulk alloc disables interrupts for the full period (of
collecting the number of requested objects).

What I'm proposing is keeping interrupts on, and then simply cmpxchg
e.g. 2 slab-pages out of the SLUB allocator (which the SLUB code calls
freelists). The bulk call now owns these freelists, and returns them
to the caller. The API caller gets some helpers/macros to access
objects, to shield him from the details (of SLUB freelists).

The pitfall with this API is we don't know how many objects are on a
SLUB freelist. And we cannot walk the freelist and count them, because
then we hit the problem of memory/cache stalls (that we are trying so
hard to avoid).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
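The "take the whole freelist in one atomic operation" idea from this thread can be sketched in userspace with C11 atomics. This is only an illustration under simplifying assumptions: struct object, struct freelist and the three helpers are hypothetical, and the real SLUB freelists carry additional per-page state (as Christoph points out) that a plain pointer swap ignores.

```c
#include <stdatomic.h>
#include <stddef.h>

/* A free object; the link sits where the object's own "next" would. */
struct object {
	struct object *next;
};

/* Hypothetical lock-free freelist: a single atomic head pointer. */
struct freelist {
	_Atomic(struct object *) head;
};

/* Return one object to the list with a compare-and-swap loop. */
static void freelist_push(struct freelist *fl, struct object *obj)
{
	struct object *old = atomic_load_explicit(&fl->head,
						  memory_order_relaxed);

	do {
		obj->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&fl->head, &old, obj,
							memory_order_release,
							memory_order_relaxed));
}

/*
 * Detach the entire list by swapping the head with NULL. The caller
 * then owns every object on the returned chain; no interrupts need
 * to be disabled and no per-object atomic operation is required.
 */
static struct object *freelist_detach_all(struct freelist *fl)
{
	return atomic_exchange_explicit(&fl->head, (struct object *)NULL,
					memory_order_acquire);
}

/*
 * Count objects on a detached chain. This touches every object, which
 * is exactly the cache-stall cost the proposal wants to avoid paying
 * up front.
 */
static size_t freelist_count(const struct object *obj)
{
	size_t n = 0;

	for (; obj; obj = obj->next)
		n++;
	return n;
}
```

The sketch also makes the stated pitfall visible: freelist_detach_all() returns only a head pointer, so the caller cannot know how many objects it received without calling something like freelist_count().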
Re: [PATCH] geneve: restore vlan bits in xmit path
On Thu, Sep 17, 2015 at 12:25 PM, John W. Linville wrote:
> On Thu, Sep 17, 2015 at 11:45:58AM -0700, Pravin Shelar wrote:
>> On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville wrote:
>> > These seem to have been accidentally dropped in commit 371bd1061d29
>> > ("geneve: Consolidate Geneve functionality in single module.").
>> >
>> Geneve should not export vxlan feature. So that it never sees vxlan
>> tagged packets. Can you turn off the vlan feature?
>
> I'm not sure I understand...? This is vlan, not vxlan.

I think he just mean vlan. If you remove the line where
dev->vlan_features are set then the core stack will handle this and we
don't need to do anything special here.
Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
From: KY Srinivasan Date: Thu, 17 Sep 2015 19:52:01 + > > >> -Original Message- >> Have a pre-cooked ring of buffers for these descriptors that you can >> point the chip at. No per-packet allocation is necessary at all. > > Even if I had a ring of buffers, I would still need to manage the life cycle > of these buffers - selecting an unused one on the transmit path and marking > it used (atomically). Have one per TX ring entry, then the lifetime matches the lifetime of the TX entry itself and therefore you need do nothing. That's the whole idea.
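The "one buffer per TX ring entry" suggestion can be illustrated with a small userspace C sketch. Everything here is an assumption made for illustration: the names `tx_ring`, `tx_entry`, `tx_ring_post` and `tx_ring_complete`, the sizes, and in particular the premise that completions arrive in the order packets were posted. It is not the Hyper-V driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TX_RING_SIZE  4   /* hypothetical ring size, a power of two */
#define DESC_BUF_SIZE 64  /* hypothetical per-packet descriptor size */

/* Each ring slot carries its own descriptor buffer, so claiming the
 * slot is the same act as claiming the buffer. */
struct tx_entry {
	const void *pkt;                   /* packet using this slot */
	unsigned char desc[DESC_BUF_SIZE]; /* pre-allocated descriptor */
};

struct tx_ring {
	struct tx_entry entries[TX_RING_SIZE];
	unsigned int head; /* next slot to fill on transmit */
	unsigned int tail; /* next slot to retire on completion */
};

/* Transmit path: take the next slot and return its built-in buffer,
 * or NULL when the ring is full. No separate allocator is consulted. */
static unsigned char *tx_ring_post(struct tx_ring *r, const void *pkt)
{
	struct tx_entry *e;

	assert(pkt != NULL);
	if (r->head - r->tail == TX_RING_SIZE)
		return NULL;
	e = &r->entries[r->head++ % TX_RING_SIZE];
	e->pkt = pkt;
	memset(e->desc, 0, sizeof(e->desc));
	return e->desc;
}

/* Completion path (assumed in-order): retiring the slot implicitly
 * frees its buffer; there is no separate "mark free" step. */
static const void *tx_ring_complete(struct tx_ring *r)
{
	struct tx_entry *e;
	const void *pkt;

	if (r->head == r->tail)
		return NULL;
	e = &r->entries[r->tail++ % TX_RING_SIZE];
	pkt = e->pkt;
	e->pkt = NULL;
	return pkt;
}
```

The point of the design is that the only bookkeeping is the head and tail indices the ring needs anyway; a driver whose completions can arrive out of order would need extra per-slot state that this sketch does not model.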
Re: [PATCH] geneve: restore vlan bits in xmit path
On Thu, Sep 17, 2015 at 12:48:56PM -0700, Jesse Gross wrote: > On Thu, Sep 17, 2015 at 12:25 PM, John W. Linville > wrote: > > On Thu, Sep 17, 2015 at 11:45:58AM -0700, Pravin Shelar wrote: > >> On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville > >> wrote: > >> > These seem to have been accidentally dropped in commit 371bd1061d29 > >> > ("geneve: Consolidate Geneve functionality in single module."). > >> > > >> Geneve should not export vxlan feature. So that it never sees vxlan > >> tagged packets. Can you turn off the vlan feature? > > > > I'm not sure I understand...? This is vlan, not vxlan. > > I think he just mean vlan. If you remove the line where > dev->vlan_features are set then the core stack will handle this and we > don't need to do anything special here. Is that preferable to this patch? Tunneling vlan-tagged frames seems weird, but I would hate to disallow it if some crazy person wanted to do that... I guess the other way would slightly improve performance, and this could be added back later. What about the VLAN-related bits in dev->features and ->hw_features? Should they go as well? John -- John W. Linville linvi...@tuxdriver.com Someday the world will need a hero, and you might be all we have. Be ready.
Re: [PATCH] net: smc91x: convert pxa dma to dmaengine
From: Robert Jarzmik Date: Wed, 16 Sep 2015 11:41:54 +0200 > David Miller writes: > >> From: Robert Jarzmik >> Date: Thu, 10 Sep 2015 21:26:04 +0200 >> >>> Convert the dma transfers to be dmaengine based, now pxa has a dmaengine >>> slave driver. This makes this driver a bit more PXA agnostic. >>> >>> The driver was tested on pxa27x (mainstone) and pxa310 (zylonite), >>> ie. only pxa platforms. >>> >>> Signed-off-by: Robert Jarzmik >>> Cc: Russell King >>> Cc: Arnd Bergmann >>> --- >>> This has potential to break other platform such as Neponset, Idp, >>> halibut and qsd8x50, so I added Russell and Arnd as they were discussing >>> smc91x support last February. >> > >> Is someone testing whether such platforms break or not? I'm waiting for >> that before I consider applying this patch. > > My understanding is that Russell is the only one left testing them, or at > least > he was the only one complaining about a breakage lately on neponset. > > I can wait several weeks for Russell to have a bit of time to try : I know it > will compile correctly at least for neponset, and I know almost all the code > is > under #ifdef CONFIG_ARCH_PXA. And still I would feel far more comfortable if > it > was tested, just as you. Oh well, I've waited long enough; patch applied, thanks.
Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared
David Woodhouse: > On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote: > > > > Thanks; I'll try that. In fact since updating to 4.2 the problem has > > got worse — now the whole machine dies: > > There is something very strange going on here. I've found two ways to > make it stop crashing when cp_tx_timeout() hits the 'popf' when > unlocking the spinlock. cp_tx_timeout() takes the lock with IRQs disabled and then calls cp_clean_rings(), which uses plain dev_kfree_skb() on any skb still referenced in one of the rx/tx rings. You may replace it with dev_kfree_skb_any(). -- Ueimor
Re: [PATCH] geneve: restore vlan bits in xmit path
On Thu, Sep 17, 2015 at 11:45:58AM -0700, Pravin Shelar wrote: > On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville > wrote: > > These seem to have been accidentally dropped in commit 371bd1061d29 > > ("geneve: Consolidate Geneve functionality in single module."). > > > Geneve should not export vxlan feature. So that it never sees vxlan > tagged packets. Can you turn off the vlan feature? I'm not sure I understand...? This is vlan, not vxlan. John -- John W. Linville linvi...@tuxdriver.com Someday the world will need a hero, and you might be all we have. Be ready.
[PATCH] iplink_geneve: add UDP destination port configuration at link creation
Signed-off-by: John W. Linville
---
I didn't see an iproute2 patch posted for this option, so here is my
version...

 ip/iplink_geneve.c    | 13 +
 man/man8/ip-link.8.in |  6 ++
 2 files changed, 19 insertions(+)

diff --git a/ip/iplink_geneve.c b/ip/iplink_geneve.c
index 331240a6a3d9..0a45647844f5 100644
--- a/ip/iplink_geneve.c
+++ b/ip/iplink_geneve.c
@@ -19,6 +19,7 @@ static void print_explain(FILE *f)
 {
 	fprintf(f, "Usage: ... geneve id VNI remote ADDR\n");
 	fprintf(f, " [ ttl TTL ] [ tos TOS ]\n");
+	fprintf(f, " [ dstport PORT ]\n");
 	fprintf(f, "\n");
 	fprintf(f, "Where: VNI := 0-16777215\n");
 	fprintf(f, " ADDR := IP_ADDRESS\n");
@@ -40,6 +41,7 @@ static int geneve_parse_opt(struct link_util *lu, int argc, char **argv,
 	struct in6_addr daddr6 = IN6ADDR_ANY_INIT;
 	__u8 ttl = 0;
 	__u8 tos = 0;
+	__u16 dstport = 0;
 
 	while (argc > 0) {
 		if (!matches(*argv, "id") ||
@@ -80,6 +82,10 @@ static int geneve_parse_opt(struct link_util *lu, int argc, char **argv,
 				tos = uval;
 			} else
 				tos = 1;
+		} else if (!matches(*argv, "dstport")){
+			NEXT_ARG();
+			if (get_u16(&dstport, *argv, 0))
+				invarg("dst port", *argv);
 		} else if (matches(*argv, "help") == 0) {
 			explain();
 			return -1;
@@ -111,6 +117,9 @@ static int geneve_parse_opt(struct link_util *lu, int argc, char **argv,
 
 	addattr8(n, 1024, IFLA_GENEVE_TTL, ttl);
 	addattr8(n, 1024, IFLA_GENEVE_TOS, tos);
 
+	if (dstport)
+		addattr16(n, 1024, IFLA_GENEVE_PORT, htons(dstport));
+
 	return 0;
 }
@@ -150,6 +159,10 @@ static void geneve_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 		else
 			fprintf(f, "tos %#x ", tos);
 	}
+
+	if (tb[IFLA_GENEVE_PORT])
+		fprintf(f, "dstport %u ",
+			ntohs(rta_getattr_u16(tb[IFLA_GENEVE_PORT])));
 }
 
 static void geneve_print_help(struct link_util *lu, int argc, char **argv,
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 1896eb6f185e..2e1889af650e 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -747,6 +747,8 @@ the following additional arguments are supported:
 .BI ttl " TTL "
 .R " ] [ "
 .BI tos " TOS "
+.R " ] [ "
+.BI dstport " PORT "
 .R " ]"
 
 .in +8
@@ -766,6 +768,10 @@ the following additional arguments are supported:
 .BI tos " TOS"
 - specifies the TOS value to use in outgoing packets.
 
+.sp
+.BI dstport " PORT "
+- specifies the UDP destination port to communicate at both ends of the GENEVE tunnel.
+
 .in -8
 
 .SS ip link delete - delete virtual link
-- 
2.4.3
Re: [PATCH] geneve: remove use of internal IP header when calling IP_ECN_decapsulate
On Thu, Sep 17, 2015 at 10:17 AM, John W. Linville wrote: > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c > index da3259ce7c8d..a917ae1cfbf3 100644 > --- a/drivers/net/geneve.c > +++ b/drivers/net/geneve.c > @@ -178,13 +178,15 @@ static void geneve_rx(struct geneve_sock *gs, struct > sk_buff *skb) > > skb_reset_network_header(skb); > > - iph = ip_hdr(skb); /* Now inner IP header... */ > - err = IP_ECN_decapsulate(iph, skb); > + if (iph) > + err = IP_ECN_decapsulate(iph, skb); It looks like this is now conditional based on !collect_md. I'm not sure that we want to have a difference in behavior between the two.
RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> -----Original Message----- > From: David Miller [mailto:da...@davemloft.net] > Sent: Thursday, September 17, 2015 11:52 AM > To: KY Srinivasan > Cc: david.lai...@aculab.com; alexander.du...@gmail.com; Haiyang Zhang > ; vkuzn...@redhat.com; netdev@vger.kernel.org; > linux-ker...@vger.kernel.org; jasow...@redhat.com > Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V > > From: KY Srinivasan > Date: Thu, 17 Sep 2015 15:14:05 + > > > I think I can achieve my original goal of not having any allocation > > in the send path by carefully using the memory available in the skb: > > Please stop flat-out ignoring David L.'s suggestion. I am sorry; I did not mean to convey that impression. > > Have a pre-cooked ring of buffers for these descriptors that you can > point the chip at. No per-packet allocation is necessary at all. Even if I had a ring of buffers, I would still need to manage the life cycle of these buffers - selecting an unused one on the transmit path and marking it used (atomically). Once the transmit completes (as indicated by the transmit complete callback) this buffer needs to be marked free. I can certainly make these operations efficient and lock-free, but they are still at some level an allocation/free operation albeit potentially more efficient than having the kernel allocate the memory. > > If you play games with SKBs you will get burned. I will implement Dave L's suggestion. However, I am curious as to why you would consider my proposed usage of the skb headroom and the control buffer area in skb as non-standard usage. Regards, K. Y
Re: [PATCH] geneve: restore vlan bits in xmit path
On Thu, Sep 17, 2015 at 12:48 PM, Jesse Gross wrote: > On Thu, Sep 17, 2015 at 12:25 PM, John W. Linville > wrote: >> On Thu, Sep 17, 2015 at 11:45:58AM -0700, Pravin Shelar wrote: >>> On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville >>> wrote: >>> > These seem to have been accidentally dropped in commit 371bd1061d29 >>> > ("geneve: Consolidate Geneve functionality in single module."). >>> > >>> Geneve should not export vxlan feature. So that it never sees vxlan >>> tagged packets. Can you turn off the vlan feature? >> >> I'm not sure I understand...? This is vlan, not vxlan. > > I think he just mean vlan. If you remove the line where > dev->vlan_features are set then the core stack will handle this and we > don't need to do anything special here. Yes, I meant vlan, sorry for the confusion.
Re: iproute2 tunnel name parsing
On Thu, Sep 17, 2015 at 09:55:29PM +0200, Wilhelm Wijkander wrote: > Hi, > > I'm trying to create a sit tunnel called "hel": ip tun add hel mode > sit remote 10.200.0.2 local 10.200.1.2 ttl 255, however it seems like > this is interpreted as the help argument and I get the usage text. Is > there a way to escape names that I've missed, or is this an error > somewhere in argv parsing? > > (I'm not subscribed, so a cc would be appreciated) > Thanks, > Wilhelm Hi Wilhelm, You can put 'name' before 'hel', like: $ ip tun add name hel mode sit remote 10.200.0.2 local 10.200.1.2 ttl 255 and it should work; I just tried it and it does. Regards, Vadim Kochan
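Why "hel" ends up treated as "help" can be sketched as follows, assuming the tool compares keywords by prefix (which is how I understand iproute2's matches() helper to behave). The functions `prefix_matches` and `parse_tunnel_name` are hypothetical simplifications written for this note, not the actual ip tunnel parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix comparison: returns 0 when 'arg' is a prefix
 * of 'keyword', non-zero otherwise (strcmp-style convention). */
static int prefix_matches(const char *arg, const char *keyword)
{
	size_t len;

	assert(arg != NULL && keyword != NULL);
	len = strlen(arg);
	if (len > strlen(keyword))
		return -1;
	return memcmp(keyword, arg, len);
}

/* Hypothetical, heavily simplified argument handling: returns the
 * tunnel name, or NULL when the argument is taken as a help request. */
static const char *parse_tunnel_name(int argc, const char *const *argv)
{
	if (argc < 1)
		return NULL;
	/* An explicit "name" keyword takes the next word verbatim. */
	if (strcmp(argv[0], "name") == 0)
		return argc > 1 ? argv[1] : NULL;
	/* Otherwise any prefix of "help" ("h", "he", "hel", "help") wins. */
	if (prefix_matches(argv[0], "help") == 0)
		return NULL;
	return argv[0];
}
```

Under these assumptions a bare `hel` is swallowed by the help check, while `name hel` bypasses it because the word after `name` is never compared against keywords.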
pull-request: can-next 2015-09-17
Hello David,

this is a pull request of two patches for net-next/master.

Gerhard Bertelsmann adds support for the CAN controller found on the Allwinner A10/A20 SoC.

Marc

---

The following changes since commit 37d2dbcdcca88e392009d7cbe8617d5af0ebcb32:

  net: fix cdc-phonet.c dependency and build error (2015-09-16 11:51:19 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git tags/linux-can-next-for-4.4-20150917

for you to fetch changes up to 0738eff14d817a02ab082c392c96a1613006f158:

  can: Allwinner A10/A20 CAN Controller support - Kernel module (2015-09-17 22:39:08 +0200)

linux-can-next-for-4.4-20150917

Gerhard Bertelsmann (2):
      can: Allwinner A10/A20 CAN Controller support - Devicetree bindings
      can: Allwinner A10/A20 CAN Controller support - Kernel module

 .../devicetree/bindings/net/can/sun4i_can.txt |  36 +
 drivers/net/can/Kconfig                       |  10 +
 drivers/net/can/Makefile                      |   1 +
 drivers/net/can/sun4i_can.c                   | 857 +
 4 files changed, 904 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/can/sun4i_can.txt
 create mode 100644 drivers/net/can/sun4i_can.c

-- 
Pengutronix e.K.                 | Marc Kleine-Budde         |
Industrial Linux Solutions       | Phone: +49-231-2826-924   |
Vertretung West/Dortmund         | Fax: +49-5121-206917-     |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |