Re: [PATCH net-next 2/3] net: Add FIB table id to rtable

2015-09-17 Thread Simon Horman
[Cc: linux...@vger.kernel.org]

On Tue, Sep 15, 2015 at 12:01:59PM -0700, David Miller wrote:
> From: David Ahern <d...@cumulusnetworks.com>
> Date: Wed,  2 Sep 2015 13:58:35 -0700
> 
> > Add the FIB table id to rtable to make the information available for
> > IPv4 as it is for IPv6.
> > 
> > Signed-off-by: David Ahern <d...@cumulusnetworks.com>
> 
> Applied.

Unfortunately I have observed the following when booting the koelsch board
which is based on the Renesas ARM r8a7791 SoC. The kernel was compiled
using the shmobile_defconfig.

I also see this problem in net-next (37d2dbcdcca8) and next-20150917.

Booting Linux on physical CPU 0x0
Linux version 4.2.0-11171-gb7503e0cdb5d 
(ho...@ayumi.isobedori.kobe.vergenet.net) (gcc version 4.6.3 (GCC) ) #6130 SMP 
Thu Sep 17 15:33:06 JST 2015
CPU: ARMv7 Processor [413fc0f2] revision 2 (ARMv7), cr=10c5307d
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Machine model: Koelsch
Ignoring memory block 0x2 - 0x24000
debug: ignoring loglevel setting.
Memory policy: Data cache writealloc
On node 0 totalpages: 262144
free_area_init_node: node 0, pgdat c06b5a40, node_mem_map eeff9000
  Normal zone: 1520 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 194560 pages, LIFO batch:31
  HighMem zone: 67584 pages, LIFO batch:15
PERCPU: Embedded 10 pages/cpu @eefc s17984 r0 d22976 u40960
pcpu-alloc: s17984 r0 d22976 u40960 alloc=10*4096
pcpu-alloc: [0] 0 [0] 1 
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 260624
Kernel command line: ignore_loglevel rw root=/dev/nfs ip=dhcp
PID hash table entries: 4096 (order: 2, 16384 bytes)
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1032232K/1048576K available (4935K kernel code, 244K rwdata, 1356K 
rodata, 304K init, 204K bss, 16344K reserved, 0K cma-reserved, 270336K highmem)
Virtual kernel memory layout:
vector  : 0x - 0x1000   (   4 kB)
fixmap  : 0xffc0 - 0xfff0   (3072 kB)
vmalloc : 0xf000 - 0xff00   ( 240 MB)
lowmem  : 0xc000 - 0xef80   ( 760 MB)
pkmap   : 0xbfe0 - 0xc000   (   2 MB)
  .text : 0xc0008000 - 0xc062dfec   (6296 kB)
  .init : 0xc062e000 - 0xc067a000   ( 304 kB)
  .data : 0xc067a000 - 0xc06b7340   ( 245 kB)
   .bss : 0xc06ba000 - 0xc06ed314   ( 205 kB)
Hierarchical RCU implementation.
Build-time adjustment of leaf fanout to 32.
RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
RCU: Adjusting geometry for rcu_fanout_leaf=32, nr_cpu_ids=2
NR_IRQS:16 nr_irqs:16 16
Architected cp15 timer(s) running at 10.00MHz (virt).
clocksource: arch_sys_counter: mask: 0xff max_cycles: 0x24e6a1710, 
max_idle_ns: 440795202120 ns
sched_clock: 56 bits at 10MHz, resolution 100ns, wraps every 4398046511100ns
Switching to timer-based delay loop, resolution 100ns
Console: colour dummy device 80x30
console [tty0] enabled
Calibrating delay loop (skipped), value calculated using timer frequency.. 
20.00 BogoMIPS (lpj=10)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes)
CPU: Testing write buffer coherency: ok
CPU0: update cpu_capacity 1024
CPU0: thread -1, cpu 0, socket 0, mpidr 8000
Setting up static identity map for 0x40009000 - 0x40009058
Unable to boot CPU1 when MD21 is set
CPU1: failed to boot: -524
Brought up 1 CPUs
SMP: Total of 1 processors activated (20.00 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
devtmpfs: initialized
VFP support v0.3: implementor 41 architecture 4 part 30 variant f rev 0
clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 
1911260446275 ns
pinctrl core: initialized pinctrl subsystem
NET: Registered protocol family 16
DMA: preallocated 256 KiB pool for atomic coherent allocations
renesas_irqc e61c.interrupt-controller: driving 10 irqs
sh-pfc e606.pfc: r8a77910_pfc support registered
No ATAGs?
hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers.
hw-breakpoint: maximum watchpoint size is 8 bytes.
IRQ2 is asserted, installing da9063/da9210 regulator quirk
gpio-regulator regulator@1: Could not obtain regulator setting GPIOs: -517
gpio-regulator regulator@3: Could not obtain regulator setting GPIOs: -517
gpio-regulator regulator@5: Could not obtain regulator setting GPIOs: -517
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
i2c 6-0058: Masking da9063 interrupt sources
i2c 6-0068: Masking da9210 interrupt sources
i2c 6-0068: IRQ2 is not asserted, removing quirk
i2c-sh_mobile e60b.i2c: I2C adapter 6, bus speed 10 Hz
media: Linux media interface: v0.10
Linux video captur

Re: [linux-next] oops in ip_route_input_noref

2015-09-17 Thread Tomeu Vizoso
On 17 September 2015 at 01:47, Sergey Senozhatsky
 wrote:
> On (09/16/15 07:07), David Ahern wrote:
>> Hi Sergey:
>>
>
> Hi,
>
> sorry for long reply. Baremetal. So grabbing the backtrace is
> a bit complicated. But it looks very close to what Richard Alpe
> has posted.

Hi,

in this boot log you will find a backtrace:
https://lava.collabora.co.uk/scheduler/job/67404/log_file

(ip_route_input_noref) from [] (ip_rcv+0x39c/0x6e8)
(ip_rcv) from [] (__netif_receive_skb_core+0x5ec/0x7c0)
(__netif_receive_skb_core) from [] (netif_receive_skb_internal+0x34/0xa4)
(netif_receive_skb_internal) from [] (napi_gro_receive+0x78/0xa4)
(napi_gro_receive) from [] (rtl8169_poll+0x2dc/0x5dc)
(rtl8169_poll) from [] (net_rx_action+0x1d4/0x2d0)
(net_rx_action) from [] (__do_softirq+0xfc/0x214)
(__do_softirq) from [] (irq_exit+0xb0/0x118)
(irq_exit) from [] (__handle_domain_irq+0x60/0xb4)
(__handle_domain_irq) from [] (gic_handle_irq+0x54/0x94)
(gic_handle_irq) from [] (__irq_svc+0x54/0x70)

This is on a jetson-tk1 booting a multi_v7_defconfig kernel.

I expect this issue to appear in today's kernelci.org boots.

I don't see this or any other boot error after applying David's patch.

Regards,

Tomeu

> in IRQ
>
> RIP is at ip_route_input_noref
>
> [0.877597]  [] arp_process+0x39c/0x690
> [0.877597]  [] arp_rcv+0x13e/0x170
>
>
> -ss
>
>
>> Is this with KVM or baremetal?
>>
>> -8<-
>> thanks for the analysis
>>
>> >>addr2line -e vmlinux -i 0x8146c0b1
>> >>net/ipv4/route.c:1815
>> >>net/ipv4/route.c:1905
>> >>
>> >>
>> >>which seems to be this line ip_route_input_noref()->ip_route_input_slow():
>> >>...
>> >>1813 rth->rt_is_input = 1;
>> >>1814 if (res.table)
>> >>1815 rth->rt_table_id = res.table->tb_id;
>> >>1816
>> >>...
>> >>
>> >>
>> >>added by b7503e0cdb5dbec5d201aa69dc14679b5ae8
>> >>
>> >> net: Add FIB table id to rtable
>> >>
>> >> Add the FIB table id to rtable to make the information available for
>> >> IPv4 as it is for IPv6.
>> >>
>> >>
>> >>-ss
>>
>> Hi Richard:
>>
>> >I too get an Oops in ip_route_input_noref(). It happens occasionally
>> >during bootup. KVM environment using virtio driver. Let me know if you
>> >need any additional info or if you want me to try to bisect it.
>> >
>> >Starting network...
>> >...
>> >[0.877040] BUG: unable to handle kernel NULL pointer dereference at 
>> >0056
>> >[0.877597] IP: [] ip_route_input_noref+0x1a2/0xb00
>>
>> Can you send me your kernel config and qemu command line? KVM with virtio
>> networking is a primary test vehicle, and I did not encounter this at all.
>>
>> Thanks,
>> David
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/3] can: Allwinner A10/A20 CAN Controller support - Devicetree bindings

2015-09-17 Thread Marc Kleine-Budde
On 09/16/2015 01:21 PM, Gerhard Bertelsmann wrote:
> Devicetree bindings for Allwinner A10/A20 CAN
> 
> Signed-off-by: Gerhard Bertelsmann 
> ---
> 
>  .../devicetree/bindings/net/can/sun4i_can.txt  |  38 +
>  1 files changed, 38 insertions(+)
> 
> 
> diff --git a/Documentation/devicetree/bindings/net/can/sun4i_can.txt 
> b/Documentation/devicetree/bindings/net/can/sun4i_can.txt
> new file mode 100644
> index 000..cd0f50c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/can/sun4i_can.txt
> @@ -0,0 +1,38 @@
> +Allwinner A10/A20 CAN controller Device Tree Bindings
> +-
> +
> +Required properties:
> +- compatible: "allwinner,sun4i-a10-can"
> +- reg: physical base address and size of the Allwinner A10/A20 CAN register 
> map.
> +- interrupts: interrupt specifier for the sole interrupt.
> +- clocks: phandle and clock specifier.
> +
> +
> +Example
> +---
> +
> +SoC common .dtsi file:
> +
> + can0_pins_a: can0@0 {
> + allwinner,pins = "PH20","PH21";
> + allwinner,function = "can";
> + allwinner,drive = <0>;
> + allwinner,pull = <0>;
> + };
> +...
> + can0: can@01c2bc00 {
> + compatible = "allwinner,sun4i-a10-can";
> + reg = <0x01c2bc00 0x400>;
> + interrupts = <0 26 4>;
> + clocks = <_gates 4>;
> + status = "disabled";
> + };

What about adding this snippet to the SoCs where the CAN core is available?
Maxime, what's the policy on sunxi? If you give me an Ack, I'd like to
take the series upstream via linux-can-next (and on to net-next).

Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |





[PATCH v2 1/5] net: add Hisilicon Network Subsystem support (config and documents)

2015-09-17 Thread huangdaode
The Hisilicon Network Subsystem is a long-term evolution IP which is
intended for use in Hisilicon ICT SoCs. The IP, called HNS for short,
is a TCP/IP acceleration engine that can directly decode TCP/IP
streams and distribute them to different ring buffers.

HNS can be configured to work in different modes for different
scenarios. This patch uses only some of the modes, so that the device
works as a standard ethernet NIC. The other modes will be added soon.

The whole function consists of 4 kernel sub-modules:

hnae: the HNS acceleration engine framework. It provides an abstract
interface between the engine and the upper layers, which use the
engine through ring buffers.

hns_enet_drv: a standard ethernet driver based on the ring buffers.

hns_dsaf: one implementation of the HNS acceleration engine, which is
used on Hisilicon hip05, Hi1610 and later SoCs.

hns_mdio: the MDIO control for the PHY, used by the acceleration engine.

This submission adds the basic config and documents.

Signed-off-by: huangdaode 
Signed-off-by: Kenneth Lee 
Signed-off-by: Yisen Zhuang 
---
 .../bindings/net/hisilicon-hip04-net.txt   |   4 +-
 .../devicetree/bindings/net/hisilicon-hns-dsaf.txt |  49 ++
 .../devicetree/bindings/net/hisilicon-hns-mdio.txt |  22 +++
 .../devicetree/bindings/net/hisilicon-hns-nic.txt  |  47 +
 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi   | 193 +
 5 files changed, 313 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
 create mode 100644 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt 
b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
index 988fc69..d1df8a0 100644
--- a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
+++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
@@ -32,13 +32,13 @@ Required properties:
 
 Required properties:
 
-- compatible: should be "hisilicon,hip04-mdio".
+- compatible: should be "hisilicon,mdio".
 - Inherits from MDIO bus node binding [2]
 [2] Documentation/devicetree/bindings/net/phy.txt
 
 Example:
mdio {
-   compatible = "hisilicon,hip04-mdio";
+   compatible = "hisilicon,mdio";
reg = <0x28f1000 0x1000>;
#address-cells = <1>;
#size-cells = <0>;
diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt 
b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
new file mode 100644
index 000..80411b2
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
@@ -0,0 +1,49 @@
+Hisilicon DSA Fabric device controller
+
+Required properties:
+- compatible: should be "hisilicon,hns-dsaf-v1" or "hisilicon,hns-dsaf-v2".
+  "hisilicon,hns-dsaf-v1" is for hip05.
+  "hisilicon,hns-dsaf-v2" is for Hi1610 and Hi1612.
+- dsa-name: dsa fabric name who provide this interface.
+  should be "dsafX", X is the dsaf id.
+- mode: dsa fabric mode string. only support one of dsaf modes like these:
+   "2port-64vf",
+   "6port-16rss",
+   "6port-16vf".
+- interrupt-parent: the interrupt parent of this device.
+- interrupts: should contain the DSA Fabric and rcb interrupt.
+- reg: specifies base physical address(es) and size of the device registers.
+  The first region is external interface control register base and size.
+  The second region is SerDes base register and size.
+  The third region is the PPE register base and size.
+  The fourth region is dsa fabric base register and size.
+  The fifth region is cpld base register and size, it is not required if cpld is not used.
+- phy-handle: phy handle of physical port, 0 if not any phy device. see ethernet.txt [1].
+- buf-size: rx buffer size, should be 16-1024.
+- desc-num: number of descriptors in TX and RX queue, should be 512, 1024, 2048 or 4096.
+
+[1] Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+
+dsa: dsa@c700 {
+   compatible = "hisilicon,hns-dsaf-v1";
+   dsa_name = "dsaf0";
+   mode = "6port-16rss";
+   interrupt-parent = <_dsa>;
+   reg = <0x0 0xC000 0x0 0x42
+  0x0 0xC200 0x0 0x30
+  0x0 0xc500 0x0 0x89
+  0x0 0xc700 0x0 0x6>;
+   phy-handle = <0 0 0 0 _phy4 _phy5 0 0>;
+   interrupts = <131 4>,<132 4>, <133 4>,<134 4>,
+<135 4>,<136 4>, <137 4>,<138 4>,
+<139 4>,<140 4>, <141 4>,<142 4>,
+<143 4>,<144 4>, <145 4>,<146 4>,
+<147 4>,<148 4>, <384 1>,<385 1>,
+<386 1>,<387 1>, <388 1>,<389 1>,
+<390 1>,<391 1>,

[PATCH v2 5/5] net: add Hisilicon Network Subsystem basic ethernet support

2015-09-17 Thread huangdaode
This adds basic ethernet support for HNS. It is one of the ways to
use the HNS acceleration engine, but most of the decoding/encoding
capability of the AE cannot be used this way.

This submission contains the basic features of an ethernet driver. More
will be added later.

Signed-off-by: huangdaode 
Signed-off-by: Kenneth Lee 
Signed-off-by: Yisen Zhuang 
---
 drivers/net/ethernet/hisilicon/Kconfig   |8 +
 drivers/net/ethernet/hisilicon/hns/Makefile  |3 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c| 1646 ++
 drivers/net/ethernet/hisilicon/hns/hns_enet.h|   84 ++
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 1230 
 5 files changed, 2971 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c

diff --git a/drivers/net/ethernet/hisilicon/Kconfig 
b/drivers/net/ethernet/hisilicon/Kconfig
index aae2c47..165b5a8 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -55,4 +55,12 @@ config HNS_DSAF
  acceleration engine support. The engine is used in Hisilicon hip05,
  Hi1610 and further ICT SoC
 
+config HNS_ENET
+   tristate "Hisilicon HNS Ethernet Device Support"
+   select PHYLIB
+   select HNS
+   ---help---
+ This selects the general ethernet driver for HNS.  This module makes
+ use of any HNS AE driver, such as HNS_DSAF
+
 endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/hns/Makefile 
b/drivers/net/ethernet/hisilicon/hns/Makefile
index 0516af7..6010c83 100644
--- a/drivers/net/ethernet/hisilicon/hns/Makefile
+++ b/drivers/net/ethernet/hisilicon/hns/Makefile
@@ -7,3 +7,6 @@ obj-$(CONFIG_HNS) += hnae.o
 obj-$(CONFIG_HNS_DSAF) += hns_dsaf.o
 hns_dsaf-objs = hns_ae_adapt.o hns_dsaf_gmac.o hns_dsaf_mac.o hns_dsaf_misc.o \
hns_dsaf_main.o hns_dsaf_ppe.o hns_dsaf_rcb.o hns_dsaf_xgmac.o
+
+obj-$(CONFIG_HNS_ENET) += hns_enet_drv.o
+hns_enet_drv-objs = hns_enet.o hns_ethtool.o
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
new file mode 100644
index 000..0713ced
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -0,0 +1,1646 @@
+/*
+ * Copyright (c) 2014-2015 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "hnae.h"
+#include "hns_enet.h"
+
+#define NIC_MAX_Q_PER_VF 16
+#define HNS_NIC_TX_TIMEOUT (5 * HZ)
+
+#define SERVICE_TIMER_HZ (1 * HZ)
+
+#define NIC_TX_CLEAN_MAX_NUM 256
+#define NIC_RX_CLEAN_MAX_NUM 64
+
+#define RCB_ERR_PRINT_CYCLE 1000
+
+#define RCB_IRQ_NOT_INITED 0
+#define RCB_IRQ_INITED 1
+
+static void fill_desc(struct hnae_ring *ring, void *priv,
+ int size, dma_addr_t dma, int frag_end,
+ int buf_num, enum hns_desc_type type)
+{
+   struct hnae_desc *desc = &ring->desc[ring->next_to_use];
+   struct hnae_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use];
+   struct sk_buff *skb;
+   __be16 protocol;
+   u32 ip_offset;
+   u32 asid_bufnum_pid = 0;
+   u32 flag_ipoffset = 0;
+
+   desc_cb->priv = priv;
+   desc_cb->length = size;
+   desc_cb->dma = dma;
+   desc_cb->type = type;
+
+   desc->addr = cpu_to_le64(dma);
+   desc->tx.send_size = cpu_to_le16((u16)size);
+
+   /*config bd buffer end */
+   flag_ipoffset |= 1 << HNS_TXD_VLD_B;
+
+   asid_bufnum_pid |= buf_num << HNS_TXD_BUFNUM_S;
+
+   if (type == DESC_TYPE_SKB) {
+   skb = (struct sk_buff *)priv;
+
+   if (skb->ip_summed == CHECKSUM_PARTIAL) {
+   protocol = skb->protocol;
+   ip_offset = ETH_HLEN;
+
+   /*if it is a SW VLAN check the next protocol*/
+   if (protocol == htons(ETH_P_8021Q)) {
+   ip_offset += VLAN_HLEN;
+   protocol = vlan_get_protocol(skb);
+   skb->protocol = protocol;
+   }
+
+   if (skb->protocol == htons(ETH_P_IP)) {
+   flag_ipoffset |= 1 << HNS_TXD_L3CS_B;
+   /* check for tcp/udp header */
+   flag_ipoffset |= 1 << HNS_TXD_L4CS_B;
+
+   } else if (skb->protocol == htons(ETH_P_IPV6)) {
+   /* 
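
The quoted hunk is cut off here in the archive. As a rough illustration of
the header-offset step in fill_desc() above, the following user-space sketch
computes where the L3 header starts for an untagged and for a VLAN-tagged
frame. The helper name demo_ip_offset() is hypothetical and not part of the
patch; the constants repeat the usual kernel values. Unlike the driver, which
compares skb->protocol against htons() values, the sketch takes the EtherType
in host byte order to stay self-contained.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HLEN	14	/* Ethernet header length in bytes */
#define VLAN_HLEN	4	/* length of one 802.1Q tag in bytes */
#define ETH_P_IP	0x0800	/* IPv4 EtherType */
#define ETH_P_8021Q	0x8100	/* 802.1Q VLAN EtherType */

/*
 * Hypothetical helper (not in the patch): byte offset of the L3 header,
 * given the outer EtherType in host byte order. A software VLAN tag
 * pushes the L3 header back by VLAN_HLEN, as in fill_desc().
 */
uint32_t demo_ip_offset(uint16_t outer_proto)
{
	uint32_t ip_offset = ETH_HLEN;

	assert(outer_proto != 0);	/* sketch-only sanity check */

	if (outer_proto == ETH_P_8021Q)
		ip_offset += VLAN_HLEN;

	return ip_offset;
}
```

With these values an untagged IPv4 frame has its IP header at byte 14 and a
single-tagged frame at byte 18.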

Re: [PATCH net-next v2] net: Initialize table in fib result

2015-09-17 Thread Richard Alpe
On 2015-09-16 18:19, Nikolay Aleksandrov wrote:
> The root cause is use of res.table uninitialized.
>> 
>> Thanks to Nikolay for noticing the uninitialized use amongst the maze of
>> gotos.
>> 
>> As Nikolay pointed out the second initialization is not required to fix
>> the oops, but rather to fix a related problem where a valid lookup should
>> be invalidated before creating the rth entry.
>> 
>> Fixes: b7503e0cdb5d ("net: Add FIB table id to rtable")
>> Reported-by: Sergey Senozhatsky 
>> Reported-by: Richard Alpe 
Works for me as well. Thanks!

(Tested-by: Richard Alpe )

Regards
Richard
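
For readers following the thread, here is a minimal user-space sketch of the
bug class described above: a result structure whose pointer member is tested
with "if (res.table)" but was never initialized on some code paths. All
demo_* names are hypothetical stand-ins, not the kernel's structures, and the
actual fix is in the patch itself, which is not quoted in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FIB table, lookup result and route entry. */
struct demo_fib_table {
	uint32_t tb_id;
};

struct demo_fib_result {
	struct demo_fib_table *table;
};

struct demo_rtable {
	uint32_t rt_table_id;
};

/* Same shape as the guarded assignment quoted earlier in the thread. */
void demo_set_table_id(struct demo_rtable *rth,
		       const struct demo_fib_result *res)
{
	assert(rth != NULL && res != NULL);

	rth->rt_table_id = 0;
	if (res->table)
		rth->rt_table_id = res->table->tb_id;
}

/*
 * A path that never performs a lookup. Without the explicit NULL below,
 * res.table would hold stack garbage, the NULL check would not protect
 * anything, and the dereference could oops.
 */
uint32_t demo_route_without_lookup(void)
{
	struct demo_fib_result res;
	struct demo_rtable rth;

	res.table = NULL;	/* the kind of initialization the fix adds */
	demo_set_table_id(&rth, &res);
	return rth.rt_table_id;
}

/* A path where the lookup succeeded and filled in the table. */
uint32_t demo_route_with_lookup(struct demo_fib_table *tb)
{
	struct demo_fib_result res;
	struct demo_rtable rth;

	res.table = tb;
	demo_set_table_id(&rth, &res);
	return rth.rt_table_id;
}
```

The point of the sketch is only that a NULL check is meaningful when every
path reaching it has set the pointer first.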


[PATCH v2 0/5] net: Hisilicon Network Subsystem support

2015-09-17 Thread huangdaode
This is V2 of the Hisilicon Network Subsystem (HNS) patchset, addressing
the LKML review comments.

The changes are listed in the change log below.
This patchset is rebased on the mainline Linux 4.3-rc1 kernel.

[PATCH v2 1/5] Device Tree Binding Documentation
[PATCH v2 2/5] Merge MDIO Module
[PATCH v2 3/5] Hisilicon Network Acceleration Engine Framework
[PATCH v2 4/5] Distributed System Area Fabric Module
[PATCH v2 5/5] Basic Ethernet Driver Module

Changes from V1:
1. Remove "inline" in C files (according to LKML comments, same for the items below).
2. Fix a bug related to class_find_device.
3. Change the DTS pattern for hnae, restructuring it to be compatible with
   the Hi1610 SoC.
4. Unify hip04_mdio and hip05_mdio into hns_mdio, which is more useful for
   later SoCs.

V1 Patches Reference: https://lkml.org/lkml/2015/8/14/165

Thanks

huangdaode (5):
  net: add Hisilicon Network Subsystem support (config and documents)
  net: add Hisilicon Network Subsystem MDIO support
  net: add Hisilicon Network Subsystem hnae framework support
  net: add Hisilicon Network Subsystem DSAF support
  net: add Hisilicon Network Subsystem basic ethernet support

 .../bindings/net/hisilicon-hip04-net.txt   |4 +-
 .../devicetree/bindings/net/hisilicon-hns-dsaf.txt |   49 +
 .../devicetree/bindings/net/hisilicon-hns-mdio.txt |   22 +
 .../devicetree/bindings/net/hisilicon-hns-nic.txt  |   47 +
 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi   |  193 ++
 drivers/net/ethernet/hisilicon/Kconfig |   34 +-
 drivers/net/ethernet/hisilicon/Makefile|4 +-
 drivers/net/ethernet/hisilicon/hip04_mdio.c|  185 --
 drivers/net/ethernet/hisilicon/hns/Makefile|   12 +
 drivers/net/ethernet/hisilicon/hns/hnae.c  |  507 
 drivers/net/ethernet/hisilicon/hns/hnae.h  |  583 +
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |  777 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c |  704 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.h |   45 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  |  900 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |  456 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 2445 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |  427 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c |  317 +++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.h |   43 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |  583 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h  |  105 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  | 1023 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h  |  137 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  |  972 
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c|  836 +++
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.h|   15 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c  | 1646 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.h  |   84 +
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   | 1230 ++
 drivers/net/ethernet/hisilicon/hns_mdio.c  |  520 +
 31 files changed, 14716 insertions(+), 189 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-dsaf.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-mdio.txt
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
 create mode 100644 arch/arm64/boot/dts/hisilicon/hip05_hns.dtsi
 delete mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.h
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hns_enet.c
 create mode 100644 

[PATCH v2 2/5] net: add Hisilicon Network Subsystem MDIO support

2015-09-17 Thread huangdaode
This adds MDIO support for the Hisilicon Network Subsystem. It is used in
Hisilicon hip04, hip05 and Hi1610 SoCs to control the external PHY.

Signed-off-by: huangdaode 
Signed-off-by: Yisen Zhuang 
Signed-off-by: Kenneth Lee 
---
 drivers/net/ethernet/hisilicon/Kconfig  |  10 +-
 drivers/net/ethernet/hisilicon/Makefile |   3 +-
 drivers/net/ethernet/hisilicon/hip04_mdio.c | 185 --
 drivers/net/ethernet/hisilicon/hns_mdio.c   | 520 
 4 files changed, 531 insertions(+), 187 deletions(-)
 delete mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns_mdio.c

diff --git a/drivers/net/ethernet/hisilicon/Kconfig 
b/drivers/net/ethernet/hisilicon/Kconfig
index dead17b..9184f1d 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_HISILICON
bool "Hisilicon devices"
default y
-   depends on ARM
+   depends on ARM || ARM64
---help---
  If you have a network (Ethernet) card belonging to this class, say Y.
 
@@ -27,8 +27,16 @@ config HIP04_ETH
select PHYLIB
select MARVELL_PHY
select MFD_SYSCON
+   select HNS_MDIO
---help---
  If you wish to compile a kernel for a hardware with hisilicon p04 SoC 
and
  want to use the internal ethernet then you should answer Y to this.
 
+config HNS_MDIO
+   tristate "Hisilicon HNS MDIO device Support"
+   select MDIO
+   ---help---
+ This selects the HNS MDIO support. It is needed by HNS_DSAF to access
+ the PHY
+
 endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/Makefile 
b/drivers/net/ethernet/hisilicon/Makefile
index 6c14540..04b4b21 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -3,4 +3,5 @@
 #
 
 obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
+obj-$(CONFIG_HIP04_ETH) += hip04_eth.o
+obj-$(CONFIG_HNS_MDIO) += hns_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_mdio.c 
b/drivers/net/ethernet/hisilicon/hip04_mdio.c
deleted file mode 100644
index fca0a5b..000
--- a/drivers/net/ethernet/hisilicon/hip04_mdio.c
+++ /dev/null
@@ -1,185 +0,0 @@
-/* Copyright (c) 2014 Linaro Ltd.
- * Copyright (c) 2014 Hisilicon Limited.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define MDIO_CMD_REG   0x0
-#define MDIO_ADDR_REG  0x4
-#define MDIO_WDATA_REG 0x8
-#define MDIO_RDATA_REG 0xc
-#define MDIO_STA_REG   0x10
-
-#define MDIO_START BIT(14)
-#define MDIO_R_VALID   BIT(1)
-#define MDIO_READ  (BIT(12) | BIT(11) | MDIO_START)
-#define MDIO_WRITE (BIT(12) | BIT(10) | MDIO_START)
-
-struct hip04_mdio_priv {
-   void __iomem *base;
-};
-
-#define WAIT_TIMEOUT 10
-static int hip04_mdio_wait_ready(struct mii_bus *bus)
-{
-   struct hip04_mdio_priv *priv = bus->priv;
-   int i;
-
-   for (i = 0; readl_relaxed(priv->base + MDIO_CMD_REG) & MDIO_START; i++) 
{
-   if (i == WAIT_TIMEOUT)
-   return -ETIMEDOUT;
-   msleep(20);
-   }
-
-   return 0;
-}
-
-static int hip04_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
-{
-   struct hip04_mdio_priv *priv = bus->priv;
-   u32 val;
-   int ret;
-
-   ret = hip04_mdio_wait_ready(bus);
-   if (ret < 0)
-   goto out;
-
-   val = regnum | (mii_id << 5) | MDIO_READ;
-   writel_relaxed(val, priv->base + MDIO_CMD_REG);
-
-   ret = hip04_mdio_wait_ready(bus);
-   if (ret < 0)
-   goto out;
-
-   val = readl_relaxed(priv->base + MDIO_STA_REG);
-   if (val & MDIO_R_VALID) {
-   dev_err(bus->parent, "SMI bus read not valid\n");
-   ret = -ENODEV;
-   goto out;
-   }
-
-   val = readl_relaxed(priv->base + MDIO_RDATA_REG);
-   ret = val & 0x;
-out:
-   return ret;
-}
-
-static int hip04_mdio_write(struct mii_bus *bus, int mii_id,
-   int regnum, u16 value)
-{
-   struct hip04_mdio_priv *priv = bus->priv;
-   u32 val;
-   int ret;
-
-   ret = hip04_mdio_wait_ready(bus);
-   if (ret < 0)
-   goto out;
-
-   writel_relaxed(value, priv->base + MDIO_WDATA_REG);
-   val = regnum | (mii_id << 5) | MDIO_WRITE;
-   writel_relaxed(val, priv->base + MDIO_CMD_REG);
-out:
-   return ret;
-}
-
-static int hip04_mdio_reset(struct mii_bus *bus)
-{
-   int temp, i;
-
-   
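
The quoted diff is cut off here in the archive. As a small illustration of
how the removed hip04_mdio_read() and hip04_mdio_write() build the word
written to MDIO_CMD_REG, the following user-space sketch reuses the bit
definitions from the hunk above. The helper demo_mdio_cmd() is hypothetical,
and the range check assumes 5-bit PHY and register addresses as in Clause 22
MDIO; the driver itself performs no such check.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)		(1u << (n))

/* Bit definitions copied from the removed hip04_mdio.c. */
#define MDIO_START	BIT(14)
#define MDIO_READ	(BIT(12) | BIT(11) | MDIO_START)
#define MDIO_WRITE	(BIT(12) | BIT(10) | MDIO_START)

/*
 * Hypothetical helper (not in the driver): compose the command word
 * from the operation, the PHY address and the register number, using
 * the same expression as the driver: regnum | (mii_id << 5) | op.
 */
uint32_t demo_mdio_cmd(uint32_t op, unsigned int mii_id, unsigned int regnum)
{
	/* Assumption: both fields are 5 bits wide. */
	assert(mii_id < 32 && regnum < 32);

	return regnum | (mii_id << 5) | op;
}
```

For example, reading register 2 of PHY 1 yields 0x5822, and writing register
0x1f of PHY 3 yields 0x547f. MDIO_START is set in both, which is the bit
hip04_mdio_wait_ready() polls until the hardware clears it.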

[PATCH v2 3/5] net: add Hisilicon Network Subsystem hnae framework support

2015-09-17 Thread huangdaode
HNAE (Hisilicon Network Acceleration Engine) is a framework that provides
a unified ring buffer interface for Hisilicon Network Acceleration
Engines.

With this interface, the upper layer can work as an ethernet driver, an ODP
driver or another service driver as needed.

Signed-off-by: huangdaode 
Signed-off-by: Kenneth Lee 
Signed-off-by: Yisen Zhuang 
---
 drivers/net/ethernet/hisilicon/Kconfig  |   7 +
 drivers/net/ethernet/hisilicon/Makefile |   1 +
 drivers/net/ethernet/hisilicon/hns/Makefile |   5 +
 drivers/net/ethernet/hisilicon/hns/hnae.c   | 507 
 drivers/net/ethernet/hisilicon/hns/hnae.h   | 583 
 5 files changed, 1103 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/hns/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.c
 create mode 100644 drivers/net/ethernet/hisilicon/hns/hnae.h

diff --git a/drivers/net/ethernet/hisilicon/Kconfig 
b/drivers/net/ethernet/hisilicon/Kconfig
index 9184f1d..85a2609 100644
--- a/drivers/net/ethernet/hisilicon/Kconfig
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -39,4 +39,11 @@ config HNS_MDIO
  This selects the HNS MDIO support. It is needed by HNS_DSAF to access
  the PHY
 
+config HNS
+   tristate "Hisilicon Network Subsystem Support (Framework)"
+   ---help---
+ This selects the framework support for Hisilicon Network Subsystem. It
+ is needed by any driver which provides HNS acceleration engine or makes
+ use of the engine
+
 endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/Makefile 
b/drivers/net/ethernet/hisilicon/Makefile
index 04b4b21..390b71f 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -5,3 +5,4 @@
 obj-$(CONFIG_HIX5HD2_GMAC) += hix5hd2_gmac.o
 obj-$(CONFIG_HIP04_ETH) += hip04_eth.o
 obj-$(CONFIG_HNS_MDIO) += hns_mdio.o
+obj-$(CONFIG_HNS) += hns/
diff --git a/drivers/net/ethernet/hisilicon/hns/Makefile 
b/drivers/net/ethernet/hisilicon/hns/Makefile
new file mode 100644
index 000..8a5f1e7
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the HISILICON network device drivers.
+#
+
+obj-$(CONFIG_HNS) += hnae.o
diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c 
b/drivers/net/ethernet/hisilicon/hns/hnae.c
new file mode 100644
index 000..0a0a9e8
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -0,0 +1,507 @@
+/*
+ * Copyright (c) 2014-2015 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "hnae.h"
+
+#define cls_to_ae_dev(dev) container_of(dev, struct hnae_ae_dev, cls_dev)
+
+static struct class *hnae_class;
+
+static void
+hnae_list_add(spinlock_t *lock, struct list_head *node, struct list_head *head)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(lock, flags);
+   list_add_tail_rcu(node, head);
+   spin_unlock_irqrestore(lock, flags);
+}
+
+static void hnae_list_del(spinlock_t *lock, struct list_head *node)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(lock, flags);
+   list_del_rcu(node);
+   spin_unlock_irqrestore(lock, flags);
+}
+
+static int hnae_alloc_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb)
+{
+   unsigned int order = hnae_page_order(ring);
+   struct page *p = dev_alloc_pages(order);
+
+   if (!p)
+   return -ENOMEM;
+
+   cb->priv = p;
+   cb->page_offset = 0;
+   cb->reuse_flag = 0;
+   cb->buf  = page_address(p);
+   cb->length = hnae_page_size(ring);
+   cb->type = DESC_TYPE_PAGE;
+
+   return 0;
+}
+
+static void hnae_free_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb)
+{
+   if (cb->type == DESC_TYPE_SKB)
+   dev_kfree_skb_any((struct sk_buff *)cb->priv);
+   else if (unlikely(is_rx_ring(ring)))
+   put_page((struct page *)cb->priv);
+   memset(cb, 0, sizeof(*cb));
+}
+
+static int hnae_map_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb)
+{
+   cb->dma = dma_map_page(ring_to_dev(ring), cb->priv, 0,
+  cb->length, ring_to_dma_dir(ring));
+
+   if (dma_mapping_error(ring_to_dev(ring), cb->dma))
+   return -EIO;
+
+   return 0;
+}
+
+static void hnae_unmap_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb)
+{
+   if (cb->type == DESC_TYPE_SKB)
+   dma_unmap_single(ring_to_dev(ring), cb->dma, cb->length,
+ring_to_dma_dir(ring));
+   else
+   dma_unmap_page(ring_to_dev(ring), cb->dma, cb->length,
+  

RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V

2015-09-17 Thread David Laight
From: KY Srinivasan
> Sent: 16 September 2015 23:58
...
> > I think we get that.  The question is does the Remote NDIS header and
> > packet info actually need to be a part of the header data?  I would
> > argue that it probably doesn't.
> >
> > So for example in netvsc_start_xmit it looks like you are calling
> > init_page_array in order to populate a set of page buffers, but the
> > first buffer for the Remote NDIS protocol is populated as a separate
> > page and offset.  As such it doesn't seem like it necessarily needs to
> > be a part of the header data but could be maintained perhaps in a
> > separate ring buffer, or perhaps just be a separate page that you break
> > up to use for each header.
> 
> You are right; the rndis header can be built as a separate fragment and sent.
> Indeed this is what we were doing earlier - on the outgoing path we would
> allocate memory for the rndis header. My goal was to avoid this allocation
> on every packet being sent and I decided to use the headroom instead. If we
> can completely avoid all memory allocation for rndis header, it makes a
> significant perf difference:
...


So just preallocate the header space as a fixed buffer for each ring entry
(or tx frame).

If you allocate a fixed buffer for each ring entry you may find there are
performance gains from copying small fragments into the buffer instead
of doing whatever mapping operations are required.

David

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
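
The preallocation suggested in the message above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the names
(tx_ring, tx_slot, tx_ring_queue) and all sizes are hypothetical and are not
taken from the netvsc driver, and the sketch does not track ring occupancy.
Each ring entry owns a fixed buffer for the link-level header, and small
payloads are copied in behind the header instead of being mapped.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen only for illustration. */
#define TX_RING_SIZE  4
#define TX_HDR_SPACE  128   /* room reserved for the link-level header */
#define TX_COPY_SPACE 256   /* payloads up to this size are copied */

struct tx_slot {
	uint8_t buf[TX_HDR_SPACE + TX_COPY_SPACE]; /* allocated once with the ring */
	size_t hdr_len;
	size_t copied_len;   /* payload bytes copied into buf */
	const void *mapped;  /* large payload left in place (would be mapped) */
	size_t mapped_len;
};

struct tx_ring {
	struct tx_slot slots[TX_RING_SIZE];
	unsigned int next;
};

/*
 * Fill the next slot without any per-packet allocation.
 * Returns the slot index, or -1 if the header does not fit.
 */
static int tx_ring_queue(struct tx_ring *ring, const void *hdr, size_t hdr_len,
			 const void *payload, size_t payload_len)
{
	struct tx_slot *slot;
	int idx;

	if (hdr_len > TX_HDR_SPACE)
		return -1;

	idx = (int)(ring->next % TX_RING_SIZE);
	slot = &ring->slots[idx];

	memcpy(slot->buf, hdr, hdr_len);
	slot->hdr_len = hdr_len;

	if (payload_len <= TX_COPY_SPACE) {
		/* Small fragment: copy it right behind the header. */
		memcpy(slot->buf + hdr_len, payload, payload_len);
		slot->copied_len = payload_len;
		slot->mapped = NULL;
		slot->mapped_len = 0;
	} else {
		/* Large fragment: keep a reference; a driver would map it. */
		slot->copied_len = 0;
		slot->mapped = payload;
		slot->mapped_len = payload_len;
	}

	ring->next++;
	return idx;
}
```

The copy threshold is the tunable part: whether copying beats mapping for a
given size would have to be measured on the actual hardware.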


Re: list of all network namespaces

2015-09-17 Thread Jiri Benc
On Wed, 16 Sep 2015 17:54:34 -0700, Rick Jones wrote:
> On 09/16/2015 05:46 PM, Ani Sinha wrote:
> > just a stupid question. Is it possible to get a list of all active
> > network namespaces in the kernel through /proc or some other
> > interface?

Not reliably and not efficiently. You can look at what plotnetcfg does:
https://github.com/jbenc/plotnetcfg/blob/master/netns.c

> Presumably you could copy what "ip netns" does, which appears to be to 
> look in /var/run/netns .  At least that is what an strace of that 
> command suggests.

That only works for namespaces added by the ip tool (and presumably a
few other tools which leave a symlink in /var/run/netns as a courtesy).
Depending on what you need, it may be enough. Be aware that you won't
find all net namespaces in the system this way, though.

 Jiri

-- 
Jiri Benc
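
One common way to get beyond /var/run/netns is to walk /proc and read the
/proc/<pid>/ns/net symlinks, whose targets look like "net:[4026531992]". The
sketch below does that; it is a minimal illustration, not necessarily what
plotnetcfg does, and the function names are made up here. As noted above, it
is neither reliable nor efficient: it only sees namespaces that currently
have a process in them and that the caller is allowed to inspect.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Parse a link target such as "net:[4026531992]".
 * Returns 0 and stores the inode number on success, -1 otherwise.
 */
static int parse_netns_link(const char *target, unsigned long *inode)
{
	char *end;

	if (strncmp(target, "net:[", 5) != 0)
		return -1;
	if (!isdigit((unsigned char)target[5]))
		return -1;

	*inode = strtoul(target + 5, &end, 10);
	return (end[0] == ']' && end[1] == '\0') ? 0 : -1;
}

/*
 * Collect the distinct netns inodes of all visible processes.
 * Returns the number of entries stored in out, or -1 if /proc is unavailable.
 */
static int list_process_netns(unsigned long *out, int max)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int n = 0;

	if (!proc)
		return -1;

	while ((de = readdir(proc)) != NULL) {
		char path[288], target[64];
		unsigned long ino;
		ssize_t len;
		int i;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;	/* not a pid directory */

		snprintf(path, sizeof(path), "/proc/%s/ns/net", de->d_name);
		len = readlink(path, target, sizeof(target) - 1);
		if (len < 0)
			continue;	/* no permission, or the process exited */
		target[len] = '\0';

		if (parse_netns_link(target, &ino) != 0)
			continue;

		for (i = 0; i < n && out[i] != ino; i++)
			;
		if (i == n && n < max)
			out[n++] = ino;	/* first time this namespace is seen */
	}

	closedir(proc);
	return n;
}
```

Calling list_process_netns() with a small array and printing the result gives
one inode per namespace that has at least one visible process.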


Re: list of all network namespaces

2015-09-17 Thread Nicolas Dichtel

On 17/09/2015 02:54, Rick Jones wrote:
> On 09/16/2015 05:46 PM, Ani Sinha wrote:
>> Hi guys
>>
>> just a stupid question. Is it possible to get a list of all active
>> network namespaces in the kernel through /proc or some other
>> interface?
>
> Presumably you could copy what "ip netns" does, which appears to be to look in
> /var/run/netns .  At least that is what an strace of that command suggests.

This will only list netns referenced in '/var/run/netns', which is not 'all'
existing netns (most probably only netns created by iproute2).

Regards,
Nicolas


RE: [PATCH 27/31] net/tipc: use kmemdup rather than duplicating its implementation

2015-09-17 Thread Jon Maloy
Acked-by: Jon Maloy 

///jon

> -----Original Message-----
> From: Andrzej Hajda [mailto:a.ha...@samsung.com]
> Sent: Wednesday, 16 September, 2015 06:07
> To: Jon Maloy; Ying Xue
> Cc: Bartlomiej Zolnierkiewicz; Marek Szyprowski; linux-
> ker...@vger.kernel.org; David S. Miller; netdev@vger.kernel.org
> Subject: Re: [PATCH 27/31] net/tipc: use kmemdup rather than duplicating its
> implementation
> 
> Ping.
> 
> Regards
> Andrzej
> 
> On 08/07/2015 09:59 AM, Andrzej Hajda wrote:
> > The patch was generated using fixed coccinelle semantic patch
> > scripts/coccinelle/api/memdup.cocci [1].
> >
> > [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320
> >
> > Signed-off-by: Andrzej Hajda 
> > ---
> >  net/tipc/server.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/net/tipc/server.c b/net/tipc/server.c index
> > 922e04a..c187cad 100644
> > --- a/net/tipc/server.c
> > +++ b/net/tipc/server.c
> > @@ -411,13 +411,12 @@ static struct outqueue_entry
> *tipc_alloc_entry(void *data, int len)
> > if (!entry)
> > return NULL;
> >
> > -   buf = kmalloc(len, GFP_ATOMIC);
> > +   buf = kmemdup(data, len, GFP_ATOMIC);
> > if (!buf) {
> > kfree(entry);
> > return NULL;
> > }
> >
> > -   memcpy(buf, data, len);
> > entry->iov.iov_base = buf;
> > entry->iov.iov_len = len;
> >
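
For readers outside the kernel, the pattern this patch collapses can be
sketched in user space. This is an analogy only: demo_memdup is a made-up
name, and the kernel's kmemdup() additionally takes GFP allocation flags,
which have no counterpart here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * User-space stand-in for the allocate-then-copy pattern:
 * returns a freshly allocated copy of src, or NULL on allocation failure.
 */
static void *demo_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}
```

The caller keeps a single NULL check, which is why the patch can drop the
separate memcpy() without touching the error path.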



Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches

2015-09-17 Thread Daniel Mack
Hi Florian,

On 09/16/2015 11:21 PM, Florian Westphal wrote:
> Daniel Mack  wrote:
>> I'm re-addressing the issue of matching socket meta information for
>> non-established sockets that has been discussed a while ago:
>>
>>   http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877
>>
>> Being able to reliably match on net_cls cgroup ids is crucial in
>> order to build a per-application or per-container firewall rules
>> which don't leak ingress packets. Such a feature would be very
>> useful to have.
> 
> Could you clarify what 'which don't leak ingress packets' means?

Well, currently, the existing cgroups matches only filter packets that
are sent to an established socket. All other packets are ignored. So
when users install such matches as advertised by the documented
examples, and the chain policy is permissive, the firewall 'leaks'
packets, which is unexpected.
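
A toy model of that leak, with entirely hypothetical types and names (this is
not netfilter code): a rule that matches on the socket's class id simply
cannot match while the packet has no socket attached, so the packet falls
through to the chain policy.

```c
#include <stddef.h>

enum demo_verdict { DEMO_DROP, DEMO_ACCEPT };

struct demo_nf_sock { unsigned int classid; };
struct demo_nf_pkt  { const struct demo_nf_sock *sk; /* NULL before lookup */ };
struct demo_nf_rule { unsigned int classid; enum demo_verdict verdict; };

/* Walk the rules; fall back to the chain policy if nothing matches. */
static enum demo_verdict demo_run_chain(const struct demo_nf_rule *rules,
					size_t n, enum demo_verdict policy,
					const struct demo_nf_pkt *pkt)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!pkt->sk)
			continue;	/* no socket yet: the match cannot evaluate */
		if (pkt->sk->classid == rules[i].classid)
			return rules[i].verdict;
	}
	return policy;
}
```

With a rule "classid 42 -> drop" and an ACCEPT policy, a packet for an
established socket in class 42 is dropped, while a packet with no socket
attached is accepted.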

>> The patch set is obviously not yet finished, because a lot more
>> protocol handlers need to be patched. Right now, I only addressed
>> tcp_ipv4. Before I do that, I want to get some feedback on the
>> approach, so please let me know what you think.
> 
> I think there are several issues.
> 
> implementation problems:
> - I'm not sure it's legal to call the hook input with skb->sk locked,
>   some matches might want to acquire it.

In the code as it stands after my patch set, I don't see where skb->sk
is locked? After all, skb->sk is NULL, even on the 2nd iteration, which
is why I patched the newly looked up socket to be available in the nf hook.

> - what makes NFT_META_CGROUP special? (or was that just an example?)

It's what I want to get working, but other 'meta' hooks can be made
working in a similar fashion.

> design issues:
> The assumption seems to be that a given skb can always be mapped to a
> particular socket, and hence a cgroup.
>
> That's not necessarily the case, e.g. with broad-/multicasting or when
> the socket is e.g. in timewait state.

Yes, that's true. The idea for multicast would be to just drop the
cloned skb instead of delivering it to the final socket.

> Some skbs will now travel INPUT hooks twice.
>
> And once you'd extend this so that we re-invoke nf hooks for mcast
> packets, for each socket they've been received on, you change netfilter
> behaviour again (one skb, one traversal -> n traversals of ruleset, one
> for each sk).
> 
> I think that this makes it a non-starter, sorry.

Hmm, I see your point.

> I would much rather see nft_demux_{udp,tcp,sctp,dccp,...}.c which moves
> early-demux-esque code into the nft ruleset.
> 
> Then you could do something like
> 
> nft add rule ip filter input meta l4proto tcp demux meta cgroup 42

Ok, but how would that be different from the unconditional demuxing
patches we've kicked around earlier, especially when it comes to
multicast sockets? Could you explain what you have in mind here?

> The caveat being that even in this case we cannot guarantee
> that skb->sk is set afterwards, or that a cgroup can be derived from it.
> 
> Iff you absolutely need this, I'd seriously entertain the idea of adding
> NFPROTO_L4_TCP, etc, ... or, maybe better, allow to attach nft ruleset
> as a socket filter.

That would be a new netfilter hook then, something that is called after
LOCAL_IN, for ingress only? In a sense, it would be called from the
protocol handlers, just as my patches do right now, but instead of
conditionally re-iterating the same rules again, we would walk a
different chain?

> But really, at that point, a much better question would be whether net
> cgroups are the answer to whatever the question was, or what problem we
> are attempting to address here...

The idea is simply to have a packet filter which is based on information
derived from the task that sends or will eventually handle the packet.
IOW: We want to be able to install netfilter rules that apply to all
packets received or sent by tasks that match certain criteria, without
modifying the sources of those tasks.

As we already have net_cls hooked up in netfilter rules, it seems
easiest to just get this working. But with the multiple approaches we
already had, it appears the real fix needs more thinking.


Thanks,
Daniel


Re: [PATCH v8 1/3] can: Allwinner A10/A20 CAN Controller support - Devicetree bindings

2015-09-17 Thread Maxime Ripard
On Wed, Sep 16, 2015 at 01:21:19PM +0200, Gerhard Bertelsmann wrote:
> Devicetree bindings for Allwinner A10/A20 CAN
> 
> Signed-off-by: Gerhard Bertelsmann 

Acked-by: Maxime Ripard 

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


signature.asc
Description: Digital signature


Re: [PATCH v8 1/3] can: Allwinner A10/A20 CAN Controller support - Devicetree bindings

2015-09-17 Thread Maxime Ripard
On Thu, Sep 17, 2015 at 10:04:56AM +0200, Marc Kleine-Budde wrote:
> On 09/16/2015 01:21 PM, Gerhard Bertelsmann wrote:
> > Devicetree bindings for Allwinner A10/A20 CAN
> > 
> > Signed-off-by: Gerhard Bertelsmann 
> > ---
> > 
> >  .../devicetree/bindings/net/can/sun4i_can.txt  |  38 +
> >  1 file changed, 38 insertions(+)
> > 
> > 
> > diff --git a/Documentation/devicetree/bindings/net/can/sun4i_can.txt 
> > b/Documentation/devicetree/bindings/net/can/sun4i_can.txt
> > new file mode 100644
> > index 000..cd0f50c
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/can/sun4i_can.txt
> > @@ -0,0 +1,38 @@
> > +Allwinner A10/A20 CAN controller Device Tree Bindings
> > +-
> > +
> > +Required properties:
> > +- compatible: "allwinner,sun4i-a10-can"
> > +- reg: physical base address and size of the Allwinner A10/A20 CAN 
> > register map.
> > +- interrupts: interrupt specifier for the sole interrupt.
> > +- clock: phandle and clock specifier.
> > +
> > +
> > +Example
> > +---
> > +
> > +SoC common .dtsi file:
> > +
> > +   can0_pins_a: can0@0 {
> > +   allwinner,pins = "PH20","PH21";
> > +   allwinner,function = "can";
> > +   allwinner,drive = <0>;
> > +   allwinner,pull = <0>;
> > +   };
> > +...
> > +   can0: can@01c2bc00 {
> > +   compatible = "allwinner,sun4i-a10-can";
> > +   reg = <0x01c2bc00 0x400>;
> > +   interrupts = <0 26 4>;
> > +   clocks = <_gates 4>;
> > +   status = "disabled";
> > +   };
> 
> What about adding this snippet to SoC where the CAN core is available?
> Maxime, what's the policy on sunxi?

It would be great, but it can come as a second step.

> If you give me an Ack I'd like to take the series via linux-can-next
> (and to net-next) upstream.

I just did so for this patch, I'll review the driver when I have a
bit of time.

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com




Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread David Woodhouse
On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote:
> 
> [...]
> > [308309.574551] 8139cp :00:0b.0 eth1: Transmit timeout, status 
> c   2b0 80ff
> 
> Rx and Tx are enabled.
> 
> Instant (untested) hack below.

Thanks; I'll try that. In fact since updating to 4.2 the problem has
got worse — now the whole machine dies:

[  232.064630] [ cut here ] 
[  232.069282] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 
dev_watchdog+0x1e5/0x200()  
   
[  232.077840] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out   
[  232.084380] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe 
nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE 
xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit 
xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox 
ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp 
nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack 
iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit 
act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc 
sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 
nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter 
ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 
aes_i586   
[  232.157787] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-gx+ #25
[  232.163982]   c10313eb dee95000  ff32 0258 c1031446 
0009
 
[  232.171988]  dec0bf74 c13c3afc dec0bf8c c1272ef5 c13bfe82 012f c13c3afc 
dee95000
 
[  232.179978]  e04cfd3c  dee95000 dee95240 0258 8100 c1272d10 
dee95000
 
[  232.188012] Call Trace:  
[  232.190482]  [] ? warn_slowpath_common+0x5b/0x90   
[  232.196063]  [] ? warn_slowpath_fmt+0x26/0x30  
[  232.201307]  [] ? dev_watchdog+0x1e5/0x200 
[  232.206317]  [] ? qdisc_rcu_free+0x30/0x30 
[  232.211307]  [] ? call_timer_fn.isra.7+0xe/0x60
[  232.216811]  [] ? qdisc_rcu_free+0x30/0x30 
[  232.221794]  [] ? run_timer_softirq+0xfd/0x1b0 
[  232.227221]  [] ? __do_softirq+0xa7/0x190  
[  232.232117]  [] ? __hrtimer_tasklet_trampoline+0x20/0x20   
[  232.238395]  [] ? do_softirq_own_stack+0x1b/0x20   
[  232.243881][] ? do_IRQ+0x35/0xa0  
[  232.248904]  [] ? common_interrupt+0x29/0x30   
[  232.254141]  [] ? put_unbound_pool+0x17b/0x1a0 
[  232.259470]  [] ? default_idle+0x2/0x10
[  232.264213]  [] ? arch_cpu_idle+0x6/0x10   
[  232.269026]  [] ? cpu_startup_entry+0xf5/0x190 
[  232.274459]  [] ? start_kernel+0x2e5/0x2e8 
[  232.279432] ---[ end trace 30ae4e701c36b431 ]--- 
[  232.284167] 8139cp :00:0b.0 eth1: Transmit timeout, status  c   2b1 
80ac
 
[  260.106382] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:0]
[  260.113515] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe 
nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE 
xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit 
xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox 
ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp 
nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack 
iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit 
act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc 
sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 
nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter 
ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 
aes_i586   
[  260.116369] CPU: 0 PID: 0 Comm: swapper Tainted: GW   4.2.0-gx+ 
#25 
 
[  260.116369] task: c13f7540 ti: c13f task.ti: c13f
[  260.116369] EIP: 0060:[] EFLAGS: 00200292 CPU: 0   
[  260.116369] EIP is at _raw_spin_unlock_irqrestore+0xa/0x10 

Re: Possible netlink autobind regression

2015-09-17 Thread Thomas Graf
On 09/17/15 at 01:15pm, Herbert Xu wrote:
> On Wed, Sep 16, 2015 at 10:02:00PM -0700, Cong Wang wrote:
> >
> > This part doesn't look correct, seems it is checking if this is a kernel
> > netlink socket rather than if it is bound. But I am not sure...
> 
> Good point.  I've changed it so that bound is only set for non-kernel
> sockets.
> 
> ---8<---
> netlink: Fix autobind race condition that leads to zero port ID
> 
> The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
> Reset portid after netlink_insert failure") introduced a race
> condition where if two threads tried to autobind the same socket
> one of them may end up with a zero port ID.
> 
> This patch reverts that commit and instead fixes it by introducing
> a separate "bound" variable to indicate whether a user-space socket
> has been bound.
> 
> Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
> Reported-by: Tejun Heo 
> Reported-by: Linus Torvalds 
> Signed-off-by: Herbert Xu 
> Reviewed-by: Cong Wang 

Acked-by: Thomas Graf 


Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread David Woodhouse
On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote:
> Instant (untested) hack below.

That seems to trigger a lot, but ultimately doesn't help...

[  250.998980] 8139cp :00:0b.0 eth1: Timeout head=000b, tail=000a   
[  252.637287] net_ratelimit: 5 callbacks suppressed
[  252.642022] 8139cp :00:0b.0 eth1: Timeout head=003f, tail=003e   
[  252.973255] 8139cp :00:0b.0 eth1: Timeout head=0028, tail=0027   
[  253.911945] 8139cp :00:0b.0 eth1: Timeout head=0010, tail=000f   
[  254.151013] 8139cp :00:0b.0 eth1: Timeout head=000e, tail=000d   
[  255.551730] 8139cp :00:0b.0 eth1: Timeout head=0025, tail=0024   
[  255.568070] 8139cp :00:0b.0 eth1: Timeout head=0027, tail=0024   
[  255.575717] 8139cp :00:0b.0 eth1: Timeout head=002a, tail=0024   
[  255.583035] 8139cp :00:0b.0 eth1: Timeout head=002b, tail=0024   
[  255.590361] 8139cp :00:0b.0 eth1: Timeout head=002c, tail=0024   
[  255.598080] 8139cp :00:0b.0 eth1: Timeout head=002e, tail=0024   
[  267.066384] [ cut here ] 
[  267.071053] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 
dev_watchdog+0x1e5/0x200()  
   
[  267.079526] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out   
[  267.086051] Modules linked in: 8139cp sch_teql mii iptable_nat pppoe 
nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE 
xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit 
xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox 
ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp 
nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack 
iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit 
act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc 
sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 
nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter 
ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 
aes_i586 [last unloaded: 8139cp]   
[  267.161698] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-gx+ #26
[  267.167800]   c10313eb ddc53000  fde1 0258 c1031446 
0009
 
[  267.171408]  dec0bf74 c13c3afc dec0bf8c c1272ef5 c13bfe82 012f c13c3afc 
ddc53000
 
[  267.183847]  e06f9dec  ddc53000 ddc53240 0258 8100 c1272d10 
ddc53000
 
[  267.191812] Call Trace:  
[  267.194376]  [] ? warn_slowpath_common+0x5b/0x90   
[  267.199874]  [] ? warn_slowpath_fmt+0x26/0x30  
[  267.205200]  [] ? dev_watchdog+0x1e5/0x200 
[  267.210179]  [] ? qdisc_rcu_free+0x30/0x30 
[  267.215250]  [] ? call_timer_fn.isra.7+0xe/0x60
[  267.220661]  [] ? qdisc_rcu_free+0x30/0x30 
[  267.225739]  [] ? run_timer_softirq+0xfd/0x1b0 
[  267.231071]  [] ? __do_softirq+0xa7/0x190  
[  267.236054]  [] ? __hrtimer_tasklet_trampoline+0x20/0x20   
[  267.242274]  [] ? do_softirq_own_stack+0x1b/0x20   
[  267.247768][] ? do_IRQ+0x35/0xa0  
[  267.252248]  [] ? common_interrupt+0x29/0x30   
[  267.258062]  [] ? put_unbound_pool+0x17b/0x1a0 
[  267.263391]  [] ? default_idle+0x2/0x10
[  267.268186]  [] ? arch_cpu_idle+0x6/0x10   
[  267.272999]  [] ? cpu_startup_entry+0xf5/0x190 
[  267.278410]  [] ? start_kernel+0x2e5/0x2e8 
[  267.283378] ---[ end trace a08600e9030733fc ]--- 
[  267.288100] cp_tx_timeout
[  267.290750] 8139cp :00:0b.0 eth1: Transmit timeout, status  c   2b1 
c0ac
 
[  267.298166] will lock... 
[  267.300709] Handling tx timeout, flags 200286
[  267.305281] Will wake queue...   
[  267.308153] Will unlock... flags 200286  
[  292.120424] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:0]
[  292.127561] Modules linked in: 8139cp sch_teql mii iptable_nat pppoe 

Re: [fw filter]: Broken! fw mark based tc class selection not working

2015-09-17 Thread Jamal Hadi Salim

On 09/14/15 18:04, Cong Wang wrote:

> That is exactly the original code. But it is not readable at all,
> at least I still missed it when I touched the tp->init() part. :(
> Having a boolean doesn't harm anything.

The default should really be no head alloced (given that is the
main use case).
The other part you can make more readable.

cheers,
jamal




Re: Possible netlink autobind regression

2015-09-17 Thread Tejun Heo
Hello, Herbert.

On Thu, Sep 17, 2015 at 01:15:03PM +0800, Herbert Xu wrote:
> netlink: Fix autobind race condition that leads to zero port ID
> 
> The commit c0bb07df7d981e4091432754e30c9c720e2c0c78 ("netlink:
> Reset portid after netlink_insert failure") introduced a race
> condition where if two threads tried to autobind the same socket
> one of them may end up with a zero port ID.
>
> This patch reverts that commit and instead fixes it by introducing
> a separate "bound" variable to indicate whether a user-space socket
> has been bound.
> 
> Fixes: c0bb07df7d98 ("netlink: Reset portid after netlink_insert failure")
> Reported-by: Tejun Heo 
> Reported-by: Linus Torvalds 
> Signed-off-by: Herbert Xu 
> Reviewed-by: Cong Wang 

Maybe add that this led to a deadlock and add a Link tag to this
thread?

> @@ -1083,10 +1083,12 @@ static int netlink_insert(struct sock *sk, u32 portid)
>   if (err) {
>   if (err == -EEXIST)
>   err = -EADDRINUSE;
> - nlk_sk(sk)->portid = 0;
>   sock_put(sk);
> + goto err;
>   }
>  
> + nlk_sk(sk)->bound = !!portid;

!! isn't necessary and this creates ordering between two stores.
->bound must be visible only after ->portid is visible, so this should
be smp_store_release().

> @@ -2371,7 +2373,7 @@ static int netlink_sendmsg(struct socket *sock, struct 
> msghdr *msg, size_t len)
>   dst_group = nlk->dst_group;
>   }
>  
> - if (!nlk->portid) {
> + if (!nlk->bound) {

And all unlocked reads should be smp_load_acquire().

>   err = netlink_autobind(sock);
>   if (err)
>   goto out;

Thanks.

-- 
tejun
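
The ordering requirement described above can be sketched in user space with
C11 atomics. This is an analogy under stated assumptions, not the netlink
code: the struct and function names are hypothetical, and the kernel would
use smp_store_release() and smp_load_acquire() rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_nl_sock {
	uint32_t portid;	/* plain field, written before "bound" is published */
	atomic_bool bound;
};

/* Writer: make portid visible first, then publish bound with release. */
static void demo_bind(struct demo_nl_sock *sk, uint32_t portid)
{
	sk->portid = portid;
	atomic_store_explicit(&sk->bound, true, memory_order_release);
}

/*
 * Unlocked reader: the acquire load pairs with the release store, so a
 * reader that observes bound == true also observes the portid written
 * before it.
 */
static bool demo_get_portid(struct demo_nl_sock *sk, uint32_t *portid)
{
	if (!atomic_load_explicit(&sk->bound, memory_order_acquire))
		return false;
	*portid = sk->portid;
	return true;
}
```

Without the release/acquire pair, a reader on another CPU could see bound set
while still reading a stale portid of zero.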


Re: [linux-next] oops in ip_route_input_noref

2015-09-17 Thread Thierry Reding
On Wed, Sep 16, 2015 at 09:04:15AM -0600, David Ahern wrote:
> On 9/16/15 9:00 AM, Fabio Estevam wrote:
> >On Wed, Sep 16, 2015 at 6:24 AM, Sergey Senozhatsky
> > wrote:
> >
> >>added by b7503e0cdb5dbec5d201aa69dc14679b5ae8
> >>
> >> net: Add FIB table id to rtable
> >>
> >> Add the FIB table id to rtable to make the information available for
> >> IPv4 as it is for IPv6.
> >
> >I see the same issue here when booting a mx25 ARM processor via NFS.
> >
> >defconfig is arch/arm/configs/imx_v4_v5_defconfig.
> >
> 
> I am still not able to reproduce. While I work on a full Cumulus image for
> other test cases here's a patch to try; eagle eye Nikolay noted a potential
> use without init in the maze of goto's.
> 
> Thanks,
> David

> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index da427a4a33fe..80f7c5b7b832 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1712,6 +1712,7 @@ static int ip_route_input_slow(struct sk_buff *skb, 
> __be32 daddr, __be32 saddr,
>   goto martian_source;
>  
>   res.fi = NULL;
> + res.table = NULL;
>   if (ipv4_is_lbcast(daddr) || (saddr == 0 && daddr == 0))
>   goto brd_input;
>  
> @@ -1834,6 +1835,7 @@ out:return err;
>   RT_CACHE_STAT_INC(in_no_route);
>   res.type = RTN_UNREACHABLE;
>   res.fi = NULL;
> + res.table = NULL;
>   goto local_input;
>  
>   /*

I was seeing the same oops as Fabio (except that the faulting address
was 0xb instead of 0x7) and after applying this patch I no longer see
it:

Tested-by: Thierry Reding 
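
The class of bug the patch addresses, a field read on a goto path that skips
the code which would have set it, can be shown in a small stand-alone sketch.
The types and names below are hypothetical and much simpler than
ip_route_input_slow(); the value 254 is used only as a sample table id.

```c
#include <stddef.h>

struct demo_table  { unsigned int id; };
struct demo_result { int type; struct demo_table *table; };

static struct demo_table demo_main_table = { 254 };

/* Returns the table id used for the packet, or 0 if no table was selected. */
static unsigned int demo_route_table_id(int is_broadcast)
{
	struct demo_result res;	/* on the stack, not zero-initialised */

	/*
	 * Initialise before any goto that bypasses the lookup; without this
	 * line the broadcast path would read an indeterminate pointer.
	 */
	res.table = NULL;

	if (is_broadcast)
		goto brd_input;

	/* Normal path: the lookup fills in the table. */
	res.table = &demo_main_table;

brd_input:
	return res.table ? res.table->id : 0;
}
```

A small bogus faulting address such as 0x7 or 0xb is consistent with
dereferencing a field through a garbage near-NULL pointer of this kind,
though that link is an inference rather than something stated in the thread.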




Re: [PATCH iproute2] man ip-link: Fix wording in VLAN reorder_hdr explanation

2015-09-17 Thread Jeremy Harris
On 16/09/15 17:55, Vadim Kochan wrote:
> Signed-off-by: Vadim Kochan 
> ---
>  man/man8/ip-link.8.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
> index 1896eb6..4928249 100644
> --- a/man/man8/ip-link.8.in
> +++ b/man/man8/ip-link.8.in
> @@ -327,7 +327,7 @@ physical device (if this device does not support VLAN 
> offloading), the similar
>  on the RX direction - by default the packet will be untagged before being
>  received by VLAN device. Reordering allows to accelerate tagging on egress 
> and
>  to hide VLAN header on ingress so the packet looks like regular Ethernet 
> packet,
> -at the same time it might be confusing while the packet sniffing as the VLAN 
> header
> +at the same time it might be confusing for packet capture as the VLAN header
>  does not exist within the packet.
>  
>  VLAN offloading can be checked by
> 

Acked-by: Jeremy Harris 



Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V

2015-09-17 Thread Vitaly Kuznetsov
David Miller  writes:

> From: David Laight 
> Date: Wed, 16 Sep 2015 16:25:03 +
>
>> Am I right in thinking this is adding an extra 96 unused bytes to the front
>> of almost all skb just so that hyper-v can make its link level header
>> contiguous with whatever follows (IP header ?).
>> 
>> Doesn't sound ideal.
>
> Agreed, this is ridiculous, and the entire stack will incur this cost
> just because hyperv is enabled in the kernel config.

That's what 'RFC' in the subject was about :-)

We already have a precedent of increasing LL_MAX_HEADER globally because
of a config option (CONFIG_MAC80211_MESH) but Hyper-V needs more.

-- 
  Vitaly


Re: [PATCH net] ipv6: ip6_fragment: fix headroom tests and skb leak

2015-09-17 Thread David Woodhouse
On Wed, 2015-09-16 at 17:26 +0200, Florian Westphal wrote:
> I tested this e1000 driver hacked to not allocate additional headroom
> (we end up in slowpath, since LL_RESERVED_SPACE is 16).

And it works on the originally-offending setup too; thanks.

Tested-by: David Woodhouse 

> Reported-by: David Woodhouse 
> Diagnosed-by: David Woodhouse 

They generally prefer me to use @intel.com for those too, if you would.
I draw the line at using it for actual email communication though :)

-- 
David WoodhouseOpen Source Technology Centre
david.woodho...@intel.com  Intel Corporation





RE: list of all network namespaces

2015-09-17 Thread Rosen, Rami
Hi,

> Presumably you could copy what "ip netns" does, which appears to be to look in
> /var/run/netns .  At least that is what an strace of that command suggests.

This is true, but keep in mind that the output of "ip netns", as well as 
listing the contents of /var/run/netns, reflects only network namespaces
which were created with the "ip netns" command. The "ip netns" userspace 
implementation consists of code which enables this,
by creating /var/run/netns, bind mounting it, etc.

Network namespaces which were created in other ways (like userspace applications
using the clone() system call) will *not* be reflected by either of them.

Regards,
Rami Rosen
Intel Corporation



pull request: bluetooth 2015-09-17

2015-09-17 Thread Johan Hedberg
Hi Dave,

Here's one important patch for the 4.3-rc series that fixes an issue
with Bluetooth LE encryption failing because of a too early check for
the SMP context.

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 20471ed4d403a5f4de6aa0c10cd1e446f7f2b3c7:

  dccp: drop null test before destroy functions (2015-09-15 16:49:43 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git 
for-upstream

for you to fetch changes up to d8949aad3eab5d396f4fefcd581773bf07b9a79e:

  Bluetooth: Delay check for conn->smp in smp_conn_security() (2015-09-17 
12:28:27 +0200)


Johan Hedberg (1):
  Bluetooth: Delay check for conn->smp in smp_conn_security()

 net/bluetooth/smp.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)




Re: [PATCH v2 net-next 1/2] cls_bpf: introduce integrated actions

2015-09-17 Thread Jamal Hadi Salim

On 09/16/15 02:05, Alexei Starovoitov wrote:

> From: Daniel Borkmann
>
> Often cls_bpf classifier is used with single action drop attached.
> Optimize this use case and let cls_bpf return both classid and action.
> For backwards compatibility reasons enable this feature under
> TCA_BPF_FLAG_ACT_DIRECT flag.



This is going off in a different direction really.
You are replicating the infrastructure inside bpf.

cheers,
jamal
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] igb: assume MSI-X interrupts during initialization

2015-09-17 Thread Stefan Assmann
In igb_sw_init() the sequence of calls was changed from
	igb_init_queue_configuration()
	igb_init_interrupt_scheme()
	igb_probe_vfs()
to
	igb_probe_vfs()
	igb_init_queue_configuration()
	igb_init_interrupt_scheme()

This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
get enabled properly and we run into a NULL pointer dereference if the
max_vfs module parameter is specified (adapter->vf_data does not get
allocated, so accessing the structure crashes).

[7.419348] BUG: unable to handle kernel NULL pointer dereference at 0048
[7.419367] IP: [] igb_reset+0xe6/0x5d0 [igb]
[7.419370] PGD 0
[7.419373] Oops: 0002 [#1] SMP
[7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
[7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
[7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
[...]
[7.419431] Call Trace:
[7.419442]  [] igb_probe+0x8b6/0x1340 [igb]
[7.419447]  [] local_pci_probe+0x45/0xa0

Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
igb_probe_vfs(). The real interrupt capabilities will be checked during
igb_init_interrupt_scheme() so this is safe to do.
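
The ordering problem can be illustrated with a toy model in plain C. This is
only a sketch under stated assumptions, not driver code: the demo_* names,
the struct layout and the msix_available input are all made up, and the model
simply assumes that the interrupt-scheme step both sets and clears the flag
according to what the hardware reports.

```c
#include <stdlib.h>

#define DEMO_FLAG_HAS_MSIX (1u << 0)	/* stand-in for IGB_FLAG_HAS_MSIX */

struct demo_adapter {
	unsigned int flags;
	unsigned int max_vfs;	/* models the max_vfs module parameter */
	int *vf_data;		/* stays NULL unless SR-IOV setup ran */
};

/* Models igb_probe_vfs(): VF state is only allocated if MSI-X is flagged. */
void demo_probe_vfs(struct demo_adapter *a)
{
	if (!(a->flags & DEMO_FLAG_HAS_MSIX) || a->max_vfs == 0)
		return;
	a->vf_data = calloc(a->max_vfs, sizeof(*a->vf_data));
}

/* Models igb_init_interrupt_scheme(): the real capability check.
 * msix_available is a hypothetical input standing in for the hardware. */
void demo_init_interrupt_scheme(struct demo_adapter *a, int msix_available)
{
	if (msix_available)
		a->flags |= DEMO_FLAG_HAS_MSIX;
	else
		a->flags &= ~DEMO_FLAG_HAS_MSIX;
}

/* Runs the init sequence in the new order; returns 1 if vf_data was
 * allocated. assume_msix selects whether the flag is set up front. */
int demo_sw_init(struct demo_adapter *a, int assume_msix)
{
	if (assume_msix)
		a->flags |= DEMO_FLAG_HAS_MSIX;	/* the idea of this patch */
	demo_probe_vfs(a);
	demo_init_interrupt_scheme(a, 1);
	return a->vf_data != NULL;
}
```

With max_vfs set and assume_msix == 0 the model leaves vf_data NULL, which
corresponds to the crash described above; with assume_msix == 1 the
allocation happens before the later capability check.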

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index e174fbb..ba019fc 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2986,6 +2986,9 @@ static int igb_sw_init(struct igb_adapter *adapter)
 	}
 #endif /* CONFIG_PCI_IOV */
 
+	/* Assume MSI-X interrupts, will be checked during IRQ allocation */
+	adapter->flags |= IGB_FLAG_HAS_MSIX;
+
 	igb_probe_vfs(adapter);
 
 	igb_init_queue_configuration(adapter);
-- 
2.4.3



Re: [PATCH next 16/30] ipv6: Only compute net once in ip6mr_forward2_finish

2015-09-17 Thread Nicolas Dichtel

On 16/09/2015 03:04, Eric W. Biederman wrote:

> Signed-off-by: "Eric W. Biederman"
> ---
>   net/ipv6/ip6mr.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
> index e95f6b6281de..3e3085b37a91 100644
> --- a/net/ipv6/ip6mr.c
> +++ b/net/ipv6/ip6mr.c
> @@ -1987,9 +1987,10 @@ int ip6mr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg)
>
>   static inline int ip6mr_forward2_finish(struct sock *sk, struct sk_buff *skb)
>   {
> -	IP6_INC_STATS_BH(dev_net(skb_dst(skb)->dev), ip6_dst_idev(skb_dst(skb)),
> +	struct net *net = dev_net(skb_dst(skb)->dev);

nit: a blank line is needed after this declaration.

> +	IP6_INC_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
>   			 IPSTATS_MIB_OUTFORWDATAGRAMS);
> -	IP6_ADD_STATS_BH(dev_net(skb_dst(skb)->dev), ip6_dst_idev(skb_dst(skb)),
> +	IP6_ADD_STATS_BH(net, ip6_dst_idev(skb_dst(skb)),
>   			 IPSTATS_MIB_OUTOCTETS, skb->len);
>   	return dst_output(sk, skb);
>   }





[RFC v2 net-next 02/10] qed: Add basic L2 interface

2015-09-17 Thread Yuval Mintz
From: Manish Chopra 

This patch adds a public API for a network driver to work on top of QED.
The interface itself is very minimal - it's mostly infrastructure, as the
only content it has after this patch is a query for HW-based information
required for the creation of a network interface [I.e., no actual
protocol-specific configurations are supported].

Signed-off-by: Manish Chopra 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/Makefile  |   2 +-
 drivers/net/ethernet/qlogic/qed/qed.h |  14 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c |  62 +++
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |   1 +
 drivers/net/ethernet/qlogic/qed/qed_l2.c  |  87 ++
 include/linux/qed/eth_common.h| 278 ++
 include/linux/qed/qed_eth_if.h|  38 
 7 files changed, 481 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_l2.c
 create mode 100644 include/linux/qed/eth_common.h
 create mode 100644 include/linux/qed/qed_eth_if.h

diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
index 5bbe0c7..dbe6938 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_QED) := qed.o
 
-qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o
+qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o qed_int.o qed_l2.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index a44407c..ab87526 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -24,6 +24,7 @@
 #include 
 #include "qed_hsi.h"
 
+extern const struct qed_common_ops qed_common_ops_pass;
 #define DRV_MODULE_VERSION "8.4.0.0"
 
 #define MAX_HWFNS_PER_DEVICE	(4)
@@ -94,13 +95,22 @@ struct qed_qm_iids {
 
 enum QED_RESOURCES {
 	QED_SB,
+	QED_L2_QUEUE,
 	QED_VPORT,
+	QED_RSS_ENG,
 	QED_PQ,
 	QED_RL,
+	QED_MAC,
+	QED_VLAN,
 	QED_ILT,
 	QED_MAX_RESC,
 };
 
+enum QED_FEATURE {
+	QED_PF_L2_QUE,
+	QED_MAX_FEATURES,
+};
+
 struct qed_hw_info {
 	/* PCI personality */
 	enum qed_pci_personality	personality;
@@ -108,6 +118,7 @@ struct qed_hw_info {
 	/* Resource Allocation scheme results */
 	u32	resc_start[QED_MAX_RESC];
 	u32	resc_num[QED_MAX_RESC];
+	u32	feat_num[QED_MAX_FEATURES];
 
 #define RESC_START(_p_hwfn, resc) ((_p_hwfn)->hw_info.resc_start[resc])
 #define RESC_NUM(_p_hwfn, resc) ((_p_hwfn)->hw_info.resc_num[resc])
@@ -269,6 +280,9 @@ struct qed_hwfn {
 
 	struct qed_mcp_info	*mcp_info;
 
+	struct qed_hw_cid_data	*p_tx_cids;
+	struct qed_hw_cid_data	*p_rx_cids;
+
 	struct qed_dmae_info	dmae_info;
 
 	/* QM init */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 7769720..1053388d 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -94,6 +94,15 @@ void qed_resc_free(struct qed_dev *cdev)
 	for_each_hwfn(cdev, i) {
 		struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
 
+		kfree(p_hwfn->p_tx_cids);
+		p_hwfn->p_tx_cids = NULL;
+		kfree(p_hwfn->p_rx_cids);
+		p_hwfn->p_rx_cids = NULL;
+	}
+
+	for_each_hwfn(cdev, i) {
+		struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+
 		qed_cxt_mngr_free(p_hwfn);
 		qed_qm_info_free(p_hwfn);
 		qed_spq_free(p_hwfn);
@@ -204,6 +213,29 @@ int qed_resc_alloc(struct qed_dev *cdev)
 	if (!cdev->fw_data)
 		return -ENOMEM;
 
+	/* Allocate Memory for the Queue->CID mapping */
+	for_each_hwfn(cdev, i) {
+		struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+		int tx_size = sizeof(struct qed_hw_cid_data) *
+				     RESC_NUM(p_hwfn, QED_L2_QUEUE);
+		int rx_size = sizeof(struct qed_hw_cid_data) *
+				     RESC_NUM(p_hwfn, QED_L2_QUEUE);
+
+		p_hwfn->p_tx_cids = kzalloc(tx_size, GFP_KERNEL);
+		if (!p_hwfn->p_tx_cids) {
+			DP_NOTICE(p_hwfn,
+				  "Failed to allocate memory for Tx Cids\n");
+			goto alloc_err;
+		}
+
+		p_hwfn->p_rx_cids = kzalloc(rx_size, GFP_KERNEL);
+		if (!p_hwfn->p_rx_cids) {
+			DP_NOTICE(p_hwfn,
+   



Re: [PATCH v2 net-next 1/2] cls_bpf: introduce integrated actions

2015-09-17 Thread Alexei Starovoitov

On 9/17/15 6:13 AM, Daniel Borkmann wrote:
> Hi Jamal,
>
> On 09/17/2015 02:37 PM, Jamal Hadi Salim wrote:
> > On 09/16/15 02:05, Alexei Starovoitov wrote:
> > > From: Daniel Borkmann
> > >
> > > Often cls_bpf classifier is used with single action drop attached.
> > > Optimize this use case and let cls_bpf return both classid and action.
> > > For backwards compatibility reasons enable this feature under
> > > TCA_BPF_FLAG_ACT_DIRECT flag.
> >
> > This is going off in a different direction really.
> > You are replicating the infrastructure inside bpf.
>
> Hmm, I don't really agree. With cls_bpf you have non-linear
> classifications as opposed to walking a chain of classifiers:
> worst case, I have to walk through N classifiers just to find
> out that the last one matches that I need to drop - this doesn't
> scale at all. Given that we can make this decision right here,
> we can use this fact and have simple return codes provided as
> well. It only supplements non-linear classification that was
> from the very beginning of cls_bpf a core part of it.


I don't see the replication either. Maybe the commit log was
misread as saying that the bpf program now executes the actions and
bypasses tcf_exts_exec()? Well, that may be an interesting idea for
the future, but that's not what the patch is doing.
With this patch cls_bpf can return a single integer like
TC_ACT_SHOT/TC_ACT_OK, which gact/act_bpf can already do, as
an _optimization_ to avoid extra hops. To do full-fledged
action chaining, tcf_exts_exec() is still used.



[PATCH] Replace get_seconds with ktime_get_seconds

2015-09-17 Thread Ksenija Stanojevic
Replace the time_t type and the get_seconds() function, which are not
y2038-safe on 32-bit systems. ktime_get_seconds() uses monotonic time
instead of real time and therefore will not overflow.
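
For readers unfamiliar with the y2038 limit: a signed 32-bit seconds counter
tops out at 2147483647, which as wall-clock time is 2038-01-19 03:14:07 UTC,
while a monotonic clock counts from an arbitrary recent start point and stays
far below that. A minimal userspace sketch in C follows; the function names
are made up, and CLOCK_MONOTONIC is used here only as a userspace analogue of
the kernel's monotonic clock, not as the kernel API itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* True if a seconds count can be stored in a signed 32-bit time value. */
int fits_in_time32(int64_t seconds)
{
	return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/* Seconds since an arbitrary start point (typically boot), read from the
 * userspace monotonic clock; returns -1 if the clock cannot be read. */
int64_t monotonic_seconds(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	return (int64_t)ts.tv_sec;
}
```

fits_in_time32() accepts 2147483647 and rejects 2147483648, the first second
past the limit; monotonic_seconds() would need roughly 68 years of uptime to
reach the same value.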

Signed-off-by: Ksenija Stanojevic 
Reviewed-by: Arnd Bergmann 
---
 net/rxrpc/ar-connection.c | 4 ++--
 net/rxrpc/ar-internal.h   | 4 ++--
 net/rxrpc/ar-transport.c  | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/rxrpc/ar-connection.c b/net/rxrpc/ar-connection.c
index 6631f4f..692b3e6 100644
--- a/net/rxrpc/ar-connection.c
+++ b/net/rxrpc/ar-connection.c
@@ -808,7 +808,7 @@ void rxrpc_put_connection(struct rxrpc_connection *conn)
 
 	ASSERTCMP(atomic_read(&conn->usage), >, 0);
 
-	conn->put_time = get_seconds();
+	conn->put_time = ktime_get_seconds();
 	if (atomic_dec_and_test(&conn->usage)) {
 		_debug("zombie");
 		rxrpc_queue_delayed_work(&rxrpc_connection_reap, 0);
@@ -852,7 +852,7 @@ static void rxrpc_connection_reaper(struct work_struct *work)
 
 	_enter("");
 
-	now = get_seconds();
+	now = ktime_get_seconds();
 	earliest = ULONG_MAX;
 
 	write_lock_bh(&rxrpc_connection_lock);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index aef1bd2..2934a73 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -208,7 +208,7 @@ struct rxrpc_transport {
 	struct rb_root		server_conns;	/* server connections on this transport */
 	struct list_head	link;		/* link in master session list */
 	struct sk_buff_head	error_queue;	/* error packets awaiting processing */
-	time_t			put_time;	/* time at which to reap */
+	unsigned long		put_time;	/* time at which to reap */
 	spinlock_t		client_lock;	/* client connection allocation lock */
 	rwlock_t		conn_lock;	/* lock for active/dead connections */
 	atomic_t		usage;
@@ -256,7 +256,7 @@ struct rxrpc_connection {
 	struct rxrpc_crypt	csum_iv;	/* packet checksum base */
 	unsigned long		events;
 #define RXRPC_CONN_CHALLENGE	0		/* send challenge packet */
-	time_t			put_time;	/* time at which to reap */
+	unsigned long		put_time;	/* time at which to reap */
 	rwlock_t		lock;		/* access lock */
 	spinlock_t		state_lock;	/* state-change lock */
 	atomic_t		usage;
diff --git a/net/rxrpc/ar-transport.c b/net/rxrpc/ar-transport.c
index 1976dec..9946467 100644
--- a/net/rxrpc/ar-transport.c
+++ b/net/rxrpc/ar-transport.c
@@ -189,7 +189,7 @@ void rxrpc_put_transport(struct rxrpc_transport *trans)
 
 	ASSERTCMP(atomic_read(&trans->usage), >, 0);
 
-	trans->put_time = get_seconds();
+	trans->put_time = ktime_get_seconds();
 	if (unlikely(atomic_dec_and_test(&trans->usage))) {
 		_debug("zombie");
 		/* let the reaper determine the timeout to avoid a race with
@@ -226,7 +226,7 @@ static void rxrpc_transport_reaper(struct work_struct *work)
 
 	_enter("");
 
-	now = get_seconds();
+	now = ktime_get_seconds();
 	earliest = ULONG_MAX;
 
 	/* extract all the transports that have been dead too long */
-- 
1.9.1



Re: [PATCH next 28/30] netfilter: Pass struct net into the netfilter hooks

2015-09-17 Thread Nicolas Dichtel

On 16/09/2015 03:04, Eric W. Biederman wrote:

> Pass a network namespace parameter into the netfilter hooks.  At the
> call site of the netfilter hooks the path a packet is taking through
> the network stack is well known which allows the network namespace to
> be easily and reliably derived.
>
> This allows the replacement of magic code like
> "dev_net(state->in?:state->out)" that appears at the start of most
> netfilter hooks with "state->net".
>
> In almost all cases the network namespace passed in is derived
> from the first network device passed in, guaranteeing those
> paths will not see any changes in practice.
>
> The exceptions are:
> xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
> ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
> ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
> ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
> ipv6/ip6_output.c:ip6_xmit()                    sock_net(sk)
> ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
> ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
> br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
>
> In all cases these exceptions seem to be a better expression for the
> network namespace the packet is being processed in than the historic
> "dev_net(in?in:out)".  I am documenting them in case something odd
> pops up and someone starts trying to track down what happened.
>
> Signed-off-by: "Eric W. Biederman"
> ---

[snip]

>   int br_forward_finish(struct sock *sk, struct sk_buff *skb)
>   {
> -	return NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING, sk, skb,
> -		       NULL, skb->dev,
> +	struct net *net = dev_net(skb->dev);

nit: blank line after the declaration

> +	return NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING,
> +		       net, sk, skb, NULL, skb->dev,
>   		       br_dev_queue_push_xmit);
>
>   }

[snip]

>   int xfrm4_output(struct sock *sk, struct sk_buff *skb)
>   {
> -	return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, sk, skb,
> -			    NULL, skb_dst(skb)->dev, __xfrm4_output,
> +	struct net *net = dev_net(skb_dst(skb)->dev);

nit: same here

> +	return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING,
> +			    net, sk, skb, NULL, skb_dst(skb)->dev,
> +			    __xfrm4_output,
>   			    !(IPCB(skb)->flags & IPSKB_REROUTED));
>   }

[snip]

>   int xfrm6_output(struct sock *sk, struct sk_buff *skb)
>   {
> -	return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, sk, skb,
> -			    NULL, skb_dst(skb)->dev, __xfrm6_output,
> +	struct net *net = dev_net(skb_dst(skb)->dev);

nit: same here

> +	return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING,
> +			    net, sk, skb,  NULL, skb_dst(skb)->dev,
> +			    __xfrm6_output,
>   			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
>   }

[snip]

>   int xfrm_output_resume(struct sk_buff *skb, int err)
>   {
> +	struct net *net = xs_net(skb_dst(skb)->xfrm);

nit: same here

>   	while (likely((err = xfrm_output_one(skb, err)) == 0)) {
>   		nf_reset(skb);



Re: [PATCH next 22/30] ipv6: Cache net in ip6_output

2015-09-17 Thread Nicolas Dichtel

On 16/09/2015 03:04, Eric W. Biederman wrote:

> Keep net in a local variable so I can use it in NF_HOOK_COND
> when I pass struct net to all of the netfilter hooks.
>
> Signed-off-by: "Eric W. Biederman"
> ---
>   net/ipv6/ip6_output.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 12d0166a64cd..8cab909b181e 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -135,9 +135,9 @@ int ip6_output(struct sock *sk, struct sk_buff *skb)
>   {
>   	struct net_device *dev = skb_dst(skb)->dev;
>   	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
> +	struct net *net = dev_net(dev);

nit: same here for the blank line.

>   	if (unlikely(idev->cnf.disable_ipv6)) {
> -		IP6_INC_STATS(dev_net(dev), idev,
> -			      IPSTATS_MIB_OUTDISCARDS);
> +		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
>   		kfree_skb(skb);
>   		return 0;
>   	}





Re: [PATCH next 0/30] Passing net through the netfilter hooks

2015-09-17 Thread Nicolas Dichtel

On 16/09/2015 02:59, Eric W. Biederman wrote:


> My primary goal with this patchset and its follow-ups is to clean up the
> network routing paths so that we do not look at the output device to
> derive the network namespace.  My plan is to pass the network namespace
> of the transmitting socket through the output path, to replace code that
> looks at the output network device today.  Once that is done we can have
> routes with output devices outside of the current network namespace,
> which should allow reception and transmission of packets in network
> namespaces to be as fast as normal packet reception and transmission
> with early demux disabled, because it will be the same code path.
>
> Once skb_dst(skb)->dev is a little better under control I think it will
> also be possible to use rcu to clean up the ancient hack that sets
> dst->dev to loopback_dev when a network device is removed.
>
> The work to get there is a series of code cleanups.  I am starting with
> passing net into the netfilter hooks and into the functions that are
> called after the netfilter hooks.  This removes from netfilter the
> need to guess which network namespace it is working on.
>
> To get there I perform a series of minor prep patches so the big changes
> at the end are possible to audit without getting lost in the noise.  In
> particular I have a lot of patches computing net into a local variable
> and then using it throughout the function.
>
> So this patchset encompasses removing dead code; sorting out the _sk
> functions that were added last time someone pushed a prototype change
> through the post netfilter functions; cleaning up individual functions'
> use of the network namespace; passing net into the netfilter hooks;
> passing net into the post netfilter functions; and using state->net in
> the netfilter code where it is available and trivially usable.

LGTM (except some minor comments).

Acked-by: Nicolas Dichtel 


Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches

2015-09-17 Thread Florian Westphal
Daniel Mack  wrote:
> Hi Florian,
> 
> On 09/16/2015 11:21 PM, Florian Westphal wrote:
> > Daniel Mack  wrote:
> >> I'm re-addressing the issue of matching socket meta information for
> >> non-established sockets that has been discussed a while ago:
> >>
> >>   http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877
> >>
> >> Being able to reliably match on net_cls cgroup ids is crucial in
> >> order to build a per-application or per-container firewall rules
> >> which don't leak ingress packets. Such a feature would be very
> >> useful to have.
> > 
> > Could you clarify what 'which don't leak ingress packets' means?
> 
> Well, currently, the existing cgroups matches only filter packets that
> are sent to an established socket. All other packets are ignored. So
> when users install such matches as advertised by the documented
> examples, and the chain policy is permissive, the firewall 'leaks'
> packets, which is unexpected.

Then 'the documentation' needs fixing.
cgroup (and anything related to sk data, including uid, etc.) is
not guaranteed to work.

We can only match what is available in the packet payload, and
some extra info that the stack can make available to us (e.g.
VLAN id, or skb->sk in some cases on output) and conntrack state plus
whatever extra data conntrack allows to attach.

> >> The patch set is obviously not yet finished, because a lot more
> >> protocol handlers need to be patched. Right now, I only addressed
> >> tcp_ipv4. Before I do that, I want to get some feedback on the
> >> approach, so please let me know what you think.
> > 
> > I think there are several issues.
> > 
> > implementation problems:
> > - I'm not sure it's legal to call the hook input with skb->sk locked,
> >   some matches might want to acquire it.
> 
> In the code as it stands after my patch set, I don't see where skb->sk
> is locked?

True.

[..]

> > design issues:
> > The assumption seems to be that a given skb can always be mapped to a
> > particular socket, and hence a cgroup.
> >
> > That's not necessarily the case, e.g. with broad-/multicasting or when
> > the socket is e.g. in timewait state.
> 
> Yes, that's true. The idea for multicast would be to just drop the
> cloned skb instead of delivering it to the final socket.

-v please.

> > I would much rather see nft_demux_{udp,tcp,sctp,dccp,...}.c which moves
> > early-demux-esque code into the nft ruleset.
> > 
> > Then you could do something like
> > 
> > nft add rule ip filter input meta l4proto tcp demux meta cgroup 42
> 
> Ok, but how would that be different from the unconditional demuxing
> patches we've kicked around earlier, especially when it comes to
> multicast sockets? Could you explain what you have in mind here?

Two things:
- keep it out of core network stack
- make it explicit so we can document that 'demux' keyword is fishy and
will not work reliably.

F.e. I don't see how mcast could ever be made to work except by adding
an entirely new filtering mechanism/new hooks in core stack.

> > The caveat being that even in this case we cannot guarantee
> > that skb->sk is set afterwards, or that a cgroup can be derived from it.
> > 
> > Iff you absolutely need this, I'd seriously entertain the idea of adding
> > NFPROTO_L4_TCP, etc, ... or, maybe better, allow to attach nft ruleset
> > as a socket filter.
> 
> That would be a new netfilter hook then, something that is called after
> LOCAL_IN, for ingress only? In a sense, it would be called from the
> protocol handlers, just as my patches do right now, but instead of
> conditionally re-iterating the same rules again, we would walk a
> different chain?

Yes, something like that.  Obviously, you'll need to dru^W brib^W
convince a LOT of people before that could ever fly.

I think we should not do this and that this 'match on ingress sk
properties' is just bad[tm].

f.e. you'd also have to move all of the stuff you want into
sock_common ... 8-(

> > But really, at that point, a much better question would be whether net
> > cgroups are the answer to whatever the question was, or what problem we
> > are attempting to address here...
> 
> The idea is simply to have a packet filter which is based on information
> derived from the task that sends or will eventually handle the packet.

Starts to smell like snet (https://lwn.net/Articles/441587/)

> IOW: We want to be able to install netfilter rules that apply to all
> packets received or sent by tasks that match a certain criteria, without
> modifying the sources of those tasks.

Sorry, I think netfilter is the wrong tool for this, at least for ingress.
You could use conntrack to stash net_cls id in the connmark, though (for
inbound reply packets).


[PATCH net] tcp_cubic: do not set epoch_start in the future

2015-09-17 Thread Eric Dumazet
From: Eric Dumazet 

Tracking idle time in bictcp_cwnd_event() is imprecise, as epoch_start
is normally set at ACK processing time, not at send time.

Doing a proper fix would need an additional state variable,
and does not seem worth the trouble, given that the CUBIC bug has been
there forever before Jana noticed it.

Let's simply not set epoch_start in the future, otherwise
bictcp_update() could overflow and CUBIC would again
grow cwnd too fast.
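
The clamp can be sketched outside the kernel as follows. This is a userspace
illustration in C with made-up names (ts_after, shift_epoch); it assumes
32-bit wrapping timestamps and a wraparound-safe comparison based on the sign
of the 32-bit difference, in the style of the kernel's after() helper.

```c
#include <stdint.h>

/* Wraparound-safe "a is later than b" for 32-bit timestamps:
 * the sign of the 32-bit difference decides. */
int ts_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Shift epoch_start forward by the idle time (now - lsndtime),
 * but never let it end up later than 'now'. */
uint32_t shift_epoch(uint32_t epoch_start, uint32_t now, uint32_t lsndtime)
{
	int32_t delta = (int32_t)(now - lsndtime);

	if (epoch_start && delta > 0) {
		epoch_start += (uint32_t)delta;
		if (ts_after(epoch_start, now))
			epoch_start = now;	/* clamp: not in the future */
	}
	return epoch_start;
}
```

For example, with epoch_start = 100, now = 200 and lsndtime = 50 the
unclamped result would be 250, which is later than now, so the function
returns 200; with lsndtime = 150 the shift is only 50 and the result 150 is
left alone.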

This was detected thanks to a packetdrill test Neal wrote that was flaky
before applying this fix.

Fixes: 30927520dbae ("tcp_cubic: better follow cubic curve after idle period")
Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Cc: Jana Iyengar 
---
 net/ipv4/tcp_cubic.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index c6ded6b2a79f..448c2615fece 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -154,14 +154,20 @@ static void bictcp_init(struct sock *sk)
 static void bictcp_cwnd_event(struct sock *sk, enum tcp_ca_event event)
 {
 	if (event == CA_EVENT_TX_START) {
-		s32 delta = tcp_time_stamp - tcp_sk(sk)->lsndtime;
 		struct bictcp *ca = inet_csk_ca(sk);
+		u32 now = tcp_time_stamp;
+		s32 delta;
+
+		delta = now - tcp_sk(sk)->lsndtime;
 
 		/* We were application limited (idle) for a while.
 		 * Shift epoch_start to keep cwnd growth to cubic curve.
 		 */
-		if (ca->epoch_start && delta > 0)
+		if (ca->epoch_start && delta > 0) {
 			ca->epoch_start += delta;
+			if (after(ca->epoch_start, now))
+				ca->epoch_start = now;
+		}
 		return;
 	}
 }




Re: NFS/TCP/IPv6 acting strangely in 4.2

2015-09-17 Thread Benjamin Coddington
On Thu, 17 Sep 2015, Trond Myklebust wrote:

> Hi Russell,
>
> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote:
> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux wrote:
> > > Following that idea, I just tried the patch below, and it seems to work.
> > > I don't know whether it handles all cases after a call to kernel_connect(),
> > > but it stops the multiple connection attempts:
> > >
> > >   1   0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 WS=128
> > >   2   0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=870318939 TSecr=15712 WS=64
> > >   3   0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
> > >   4   0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
> > >   5   0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712
> > >   6   0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
> > >   7   0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
> > >   8   0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), [Allowed: RD LU MD XT DL]
> > >   9   0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
> > >  10   0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
> > >  11   0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), [Allowed: RD LU MD XT DL]
> > >  12   0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
> > >  13   0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), [Allowed: RD LU MD XT DL]
> > >  14   0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: 0xe15fc9c9
> > > ...
> >
> > NFS people - any comments on this patch?  Is it the correct way to solve
> > this problem (please see the first message in this thread for the problem.)
> > Without this patch, NFS is unusable as it tries to launch multiple new
> > connections from the same port to the NFS server without giving the NFS
> > server time to respond and establish the TCP connection.
>
> I agree that it addresses a real problem here, however there are a
> couple of issues with the patch itself:
>
> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and
> TCP_CLOSE, so if the connection attempt fails, this patch leaves the
> XPRT_CONNECTING flag set.
> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1,
> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection
> attempt by canceling the XPRT_CONNECTING state.
>
> How about the following? It is based on your patch, but adds a check to
> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state
> more than once (which could otherwise still happen in the TCP_CLOSE
> case).
>
> 8<---
> From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001
> From: Trond Myklebust 
> Date: Wed, 16 Sep 2015 23:43:17 -0400
> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete
>  before retrying
>
> Commit 718ba5b87343 moved the responsibility for unlocking the socket to
> xs_tcp_setup_socket, meaning that the socket will be unlocked before we
> know that it has finished trying to connect. The following patch is based on
> an initial patch by Russell King to ensure that we delay clearing the
> XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate
> a connection attempt, or the connection attempt itself failed.
>
> Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from 
> racing")
> Reported-by: Russell King 
> Signed-off-by: Trond Myklebust 

This fixes up my network segmentation problem, tested on top of your "Fix
races between socket connection and destroy code".

Tested-by: Benjamin Coddington 

Ben

> ---
>  include/linux/sunrpc/xprtsock.h |  3 +++
>  net/sunrpc/xprtsock.c   | 11 ---
>  2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
> index 7591788e9fbf..357e44c1a46b 100644
> --- a/include/linux/sunrpc/xprtsock.h
> +++ b/include/linux/sunrpc/xprtsock.h
> @@ -42,6 +42,7 @@ struct sock_xprt {
>   /*
>* Connection of transports
>*/
> + unsigned long   sock_state;
>   struct delayed_work connect_worker;
>  

[ANNOUNCE] libnftnl 1.0.5 release

2015-09-17 Thread Pablo Neira Ayuso
Hi!

The Netfilter project proudly presents:

libnftnl 1.0.5

libnftnl is a userspace library providing a low-level netlink
programming interface (API) to the in-kernel nf_tables subsystem. The
library libnftnl has been previously known as libnftables. This
library is currently used by the nft command line tool.

This release resolves problems with LIBVERSION and symbol versioning
with regards to 1.0.4.

You can download this library from:

http://www.netfilter.org/projects/libnftnl/downloads.html
ftp://ftp.netfilter.org/pub/libnftnl/

Thanks!
Jan Engelhardt (1):
  build: bump library versioning

Pablo Neira Ayuso (1):
  bump version to 1.0.5



[PATCH][RESEND] ARCNET: fix hard_header_len limit

2015-09-17 Thread Michael Grzeschik
For arcnet the bare minimum header only contains the 4 bytes to
specify source, dest and offset (1, 1 and 2 bytes respectively).
The corresponding struct is struct arc_hardware.

The struct archdr contains additionally a union of possible soft
headers. When doing $insertusecasehere packets might well
include short (or even no?) soft headers.

For this reason only use arc_hardware instead of archdr to
determine the hard_header_len for an arcnet device.

Signed-off-by: Michael Grzeschik 
---
 drivers/net/arcnet/arcnet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/arcnet/arcnet.c b/drivers/net/arcnet/arcnet.c
index 10f71c73..816d0e9 100644
--- a/drivers/net/arcnet/arcnet.c
+++ b/drivers/net/arcnet/arcnet.c
@@ -326,7 +326,7 @@ static void arcdev_setup(struct net_device *dev)
dev->type = ARPHRD_ARCNET;
dev->netdev_ops = &arcnet_netdev_ops;
dev->header_ops = &arcnet_header_ops;
-   dev->hard_header_len = sizeof(struct archdr);
+   dev->hard_header_len = sizeof(struct arc_hardware);
dev->mtu = choose_mtu();
 
dev->addr_len = ARCNET_ALEN;
-- 
2.5.0
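
To make the size difference concrete, here is a minimal userspace sketch.
The struct layouts are simplified stand-ins built from the description in
the commit message (1 byte source, 1 byte dest, 2 bytes offset). The soft
header union and the demo_len_ok() check are hypothetical illustrations,
not the definitions from if_arcnet.h or any check the kernel performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hard header as described in the commit message: 1 + 1 + 2 bytes. */
struct demo_arc_hardware {
	uint8_t source;
	uint8_t dest;
	uint8_t offset[2];
};

/* Hypothetical soft headers of different sizes (illustration only). */
struct demo_soft_short {
	uint8_t proto;
};

struct demo_soft_long {
	uint8_t proto;
	uint8_t split_flag;
	uint8_t sequence[2];
};

/* Hard header plus a union of the possible soft headers. */
struct demo_archdr {
	struct demo_arc_hardware hard;
	union {
		struct demo_soft_short s;
		struct demo_soft_long l;
	} soft;
};

static_assert(sizeof(struct demo_arc_hardware) == 4,
	      "hard header must be exactly 4 bytes");

static size_t demo_hard_header_len(void)
{
	return sizeof(struct demo_arc_hardware);
}

static size_t demo_full_header_len(void)
{
	return sizeof(struct demo_archdr);
}

/* Hypothetical minimum-length check against a given header length. */
static bool demo_len_ok(size_t pkt_len, size_t hard_header_len)
{
	return pkt_len >= hard_header_len;
}
```

In this model a 5-byte packet (hard header plus a 1-byte soft header)
passes the check against the 4-byte hard header but fails it against the
8-byte full struct, which is the kind of short packet the commit message
is concerned about.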

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH][RESEND] MAINTAINERS: add arcnet and take maintainership

2015-09-17 Thread Michael Grzeschik
Add entry for arcnet to MAINTAINERS file and add myself as the
maintainer of the subsystem.

Signed-off-by: Michael Grzeschik 
Cc: da...@davemloft.net
Cc: j...@perches.com
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7ba7ab7..0a015f7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -808,6 +808,13 @@ S: Maintained
 F: drivers/video/fbdev/arcfb.c
 F: drivers/video/fbdev/core/fb_defio.c
 
+ARCNET NETWORK LAYER
+M: Michael Grzeschik 
+L: netdev@vger.kernel.org
+S: Maintained
+F: drivers/net/arcnet/
+F: include/uapi/linux/if_arcnet.h
+
 ARM MFM AND FLOPPY DRIVERS
 M: Ian Molton 
 S: Maintained
-- 
2.5.1



[PATCH v4] net: Fix behaviour of unreachable, blackhole and prohibit routes

2015-09-17 Thread Nikola Forró
The ip-route(8) man page says the following about route types:

  unreachable - these destinations are unreachable.  Packets are
  discarded and the ICMP message host unreachable is generated.  The
  local senders get an EHOSTUNREACH error.

  blackhole - these destinations are unreachable.  Packets are
  discarded silently.  The local senders get an EINVAL error.

  prohibit - these destinations are unreachable.  Packets are discarded
  and the ICMP message communication administratively prohibited is
  generated.  The local senders get an EACCES error.

In the inet6 address family, this was already correct, except that local
senders got an ENETUNREACH error instead of EHOSTUNREACH for an
unreachable route. In the inet address family, all three route types
generated the ICMP message net unreachable, and local senders got an
ENETUNREACH error.

With this patch, all three route types behave consistently with the
documentation in both address families.
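
As a quick reference, the documented mapping from route type to the error
seen by local senders can be written as a small table. This is a userspace
sketch only: the enum and function names are hypothetical and do not
correspond to the kernel's own data structures.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical route types, mirroring the three man page entries. */
enum demo_route_type {
	DEMO_RTN_UNREACHABLE,
	DEMO_RTN_BLACKHOLE,
	DEMO_RTN_PROHIBIT,
	DEMO_RTN_COUNT
};

/* Error code each route type should report, per ip-route(8). */
static const int demo_route_errno[DEMO_RTN_COUNT] = {
	[DEMO_RTN_UNREACHABLE] = EHOSTUNREACH,
	[DEMO_RTN_BLACKHOLE]   = EINVAL,
	[DEMO_RTN_PROHIBIT]    = EACCES,
};

/* Returns the negative errno a local sender would see. */
static int demo_route_error(enum demo_route_type type)
{
	assert(type < DEMO_RTN_COUNT);
	return -demo_route_errno[type];
}
```

For example, demo_route_error(DEMO_RTN_PROHIBIT) yields -EACCES, where the
old inet behaviour described above would have produced -ENETUNREACH.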

Signed-off-by: Nikola Forró 
---
 include/net/ip_fib.h | 30 +++---
 net/ipv4/route.c |  6 --
 net/ipv6/route.c |  4 +++-
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index a37d043..727d6e9 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -236,8 +236,11 @@ static inline int fib_lookup(struct net *net, const struct flowi4 *flp,
rcu_read_lock();
 
tb = fib_get_table(net, RT_TABLE_MAIN);
-   if (tb && !fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF))
-   err = 0;
+   if (tb)
+   err = fib_table_lookup(tb, flp, res, flags | FIB_LOOKUP_NOREF);
+
+   if (err == -EAGAIN)
+   err = -ENETUNREACH;
 
rcu_read_unlock();
 
@@ -258,7 +261,7 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 struct fib_result *res, unsigned int flags)
 {
struct fib_table *tb;
-   int err;
+   int err = -ENETUNREACH;
 
flags |= FIB_LOOKUP_NOREF;
if (net->ipv4.fib_has_custom_rules)
@@ -268,15 +271,20 @@ static inline int fib_lookup(struct net *net, struct flowi4 *flp,
 
res->tclassid = 0;
 
-   for (err = 0; !err; err = -ENETUNREACH) {
-   tb = rcu_dereference_rtnl(net->ipv4.fib_main);
-   if (tb && !fib_table_lookup(tb, flp, res, flags))
-   break;
+   tb = rcu_dereference_rtnl(net->ipv4.fib_main);
+   if (tb)
+   err = fib_table_lookup(tb, flp, res, flags);
+
+   if (!err)
+   goto out;
+
+   tb = rcu_dereference_rtnl(net->ipv4.fib_default);
+   if (tb)
+   err = fib_table_lookup(tb, flp, res, flags);
 
-   tb = rcu_dereference_rtnl(net->ipv4.fib_default);
-   if (tb && !fib_table_lookup(tb, flp, res, flags))
-   break;
-   }
+out:
+   if (err == -EAGAIN)
+   err = -ENETUNREACH;
 
rcu_read_unlock();
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 5f4a556..c6ad99a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2045,6 +2045,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
struct fib_result res;
struct rtable *rth;
int orig_oif;
+   int err = -ENETUNREACH;
 
res.tclassid= 0;
res.fi  = NULL;
@@ -2153,7 +2154,8 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
goto make_route;
}
 
-   if (fib_lookup(net, fl4, &res, 0)) {
+   err = fib_lookup(net, fl4, &res, 0);
+   if (err) {
res.fi = NULL;
res.table = NULL;
if (fl4->flowi4_oif) {
@@ -2181,7 +2183,7 @@ struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *fl4)
res.type = RTN_UNICAST;
goto make_route;
}
-   rth = ERR_PTR(-ENETUNREACH);
+   rth = ERR_PTR(err);
goto out;
}
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3d3c1b2..a608ace 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1885,9 +1885,11 @@ int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
rt->dst.input = ip6_pkt_prohibit;
break;
case RTN_THROW:
+   case RTN_UNREACHABLE:
default:
rt->dst.error = (cfg->fc_type == RTN_THROW) ? -EAGAIN
-   : -ENETUNREACH;
+   : (cfg->fc_type == RTN_UNREACHABLE)
+   ? -EHOSTUNREACH : -ENETUNREACH;
rt->dst.output = ip6_pkt_discard_out;
rt->dst.input = ip6_pkt_discard;
break;
-- 
2.4.3
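
The control flow of the multi-table fib_lookup() hunk above can be
modelled in userspace as follows. This is only a sketch of the error
handling: struct demo_table and demo_table_lookup() are hypothetical
stand-ins that return a canned result, not the real fib_table_lookup().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table: holds the result its lookup would return. */
struct demo_table {
	int lookup_result;	/* 0 on match, negative errno otherwise */
};

static int demo_table_lookup(const struct demo_table *tb)
{
	return tb->lookup_result;
}

/*
 * Try the main table, then the default table. A final -EAGAIN
 * ("no match, keep looking") is reported as -ENETUNREACH; any other
 * error from a table is passed through unchanged.
 */
static int demo_fib_lookup(const struct demo_table *main_tb,
			   const struct demo_table *default_tb)
{
	int err = -ENETUNREACH;

	if (main_tb)
		err = demo_table_lookup(main_tb);

	if (!err)
		goto out;

	if (default_tb)
		err = demo_table_lookup(default_tb);

out:
	if (err == -EAGAIN)
		err = -ENETUNREACH;

	assert(err <= 0);	/* result is 0 or a negative errno */
	return err;
}
```

The point of the change is visible in the pass-through case: a table that
reports -EHOSTUNREACH now yields -EHOSTUNREACH to the caller instead of
being collapsed into -ENETUNREACH.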



[PATCH net] MAINTAINERS: remove bouncing email address for qlcnic

2015-09-17 Thread Jiri Benc
I got this automated message from  when submitting
a qlcnic patch:

> Shahed Shaikh is no longer with QLogic. If you require assistance please
> contact Ariel Elior ariel.el...@qlogic.com

There's no point in having a bouncing address in MAINTAINERS.

CC: dept-gelinuxnic...@qlogic.com
CC: Ariel Elior 
Signed-off-by: Jiri Benc 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 310da4295c70..0f0dcfd2d68d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8490,7 +8490,6 @@ F:Documentation/networking/LICENSE.qla3xxx
 F: drivers/net/ethernet/qlogic/qla3xxx.*
 
 QLOGIC QLCNIC (1/10)Gb ETHERNET DRIVER
-M: Shahed Shaikh 
 M: dept-gelinuxnic...@qlogic.com
 L: netdev@vger.kernel.org
 S: Supported
-- 
1.8.3.1



Re: [PATCH v2 net-next 1/2] cls_bpf: introduce integrated actions

2015-09-17 Thread Daniel Borkmann

Hi Jamal,

On 09/17/2015 02:37 PM, Jamal Hadi Salim wrote:
> On 09/16/15 02:05, Alexei Starovoitov wrote:
> > From: Daniel Borkmann
> >
> > Often cls_bpf classifier is used with single action drop attached.
> > Optimize this use case and let cls_bpf return both classid and action.
> > For backwards compatibility reasons enable this feature under
> > TCA_BPF_FLAG_ACT_DIRECT flag.
>
> This is going off in a different direction really.
> You are replicating the infrastructure inside bpf.

Hmm, I don't really agree. With cls_bpf you have non-linear
classifications as opposed to walking a chain of classifiers:
worst case, I have to walk through N classifiers just to find
out that the last one matches that I need to drop - this doesn't
scale at all. Given that we can make this decision right here,
we can use this fact and have simple return codes provided as
well. It only supplements non-linear classification that was
from the very beginning of cls_bpf a core part of it.

Thanks,
Daniel


[RFC v2 net-next 10/10] qede: Add basic ethtool support

2015-09-17 Thread Yuval Mintz
From: Sudarsana Kalluru 

This adds basic ethtool operations to the qede driver, adding support for:
 - Statistics gathering [ethtool -S]
 - Setting of debug level [ethtool -s  msglvl]
 - Getting basic information [ethtool, ethtool -i]

In addition it adds the ability to change the MTU.
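
For readers unfamiliar with how a flat block of u64 counters is usually
exported through an interface like "ethtool -S", the following userspace
sketch shows one common pattern: a table of (name, offset) pairs walked to
fill an output array. The qede_ethtool.c diff is truncated in this
posting, so this is not the driver's code; every name below is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few counters, loosely modelled on the stats struct in this patch. */
struct demo_stats {
	uint64_t rx_ucast_pkts;
	uint64_t tx_ucast_pkts;
	uint64_t rx_crc_errors;
};

/* One exported statistic: its string name and where it lives. */
struct demo_stat_desc {
	const char *name;
	size_t offset;
};

#define DEMO_STAT(field) { #field, offsetof(struct demo_stats, field) }

static const struct demo_stat_desc demo_stat_table[] = {
	DEMO_STAT(rx_ucast_pkts),
	DEMO_STAT(tx_ucast_pkts),
	DEMO_STAT(rx_crc_errors),
};

#define DEMO_NUM_STATS (sizeof(demo_stat_table) / sizeof(demo_stat_table[0]))

/* Copy every described counter into buf; returns the number written. */
static size_t demo_get_stats(const struct demo_stats *stats,
			     uint64_t *buf, size_t buf_len)
{
	size_t i;

	assert(buf_len >= DEMO_NUM_STATS);
	for (i = 0; i < DEMO_NUM_STATS; i++)
		buf[i] = *(const uint64_t *)((const char *)stats +
					     demo_stat_table[i].offset);
	return DEMO_NUM_STATS;
}
```

Keeping names and offsets in one table means the string list and the
value list cannot drift out of order when a counter is added.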

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qede/Makefile   |   2 +-
 drivers/net/ethernet/qlogic/qede/qede.h |  74 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 385 
 drivers/net/ethernet/qlogic/qede/qede_main.c| 137 -
 4 files changed, 596 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_ethtool.c

diff --git a/drivers/net/ethernet/qlogic/qede/Makefile b/drivers/net/ethernet/qlogic/qede/Makefile
index bedfe9f..06ff90d 100644
--- a/drivers/net/ethernet/qlogic/qede/Makefile
+++ b/drivers/net/ethernet/qlogic/qede/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_QEDE) := qede.o
 
-qede-y := qede_main.o
+qede-y := qede_main.o qede_ethtool.o
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 5729128..239f3e5 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -38,6 +38,70 @@
 
 #define QEDE_NAPI_WEIGHT   (NAPI_POLL_WEIGHT)
 
+struct qede_stats {
+   u64 no_buff_discards;
+   u64 rx_ucast_bytes;
+   u64 rx_mcast_bytes;
+   u64 rx_bcast_bytes;
+   u64 rx_ucast_pkts;
+   u64 rx_mcast_pkts;
+   u64 rx_bcast_pkts;
+   u64 mftag_filter_discards;
+   u64 mac_filter_discards;
+   u64 tx_ucast_bytes;
+   u64 tx_mcast_bytes;
+   u64 tx_bcast_bytes;
+   u64 tx_ucast_pkts;
+   u64 tx_mcast_pkts;
+   u64 tx_bcast_pkts;
+   u64 tx_err_drop_pkts;
+   u64 coalesced_pkts;
+   u64 coalesced_events;
+   u64 coalesced_aborts_num;
+   u64 non_coalesced_pkts;
+   u64 coalesced_bytes;
+
+   /* port */
+   u64 rx_64_byte_packets;
+   u64 rx_127_byte_packets;
+   u64 rx_255_byte_packets;
+   u64 rx_511_byte_packets;
+   u64 rx_1023_byte_packets;
+   u64 rx_1518_byte_packets;
+   u64 rx_1522_byte_packets;
+   u64 rx_2047_byte_packets;
+   u64 rx_4095_byte_packets;
+   u64 rx_9216_byte_packets;
+   u64 rx_16383_byte_packets;
+   u64 rx_crc_errors;
+   u64 rx_mac_crtl_frames;
+   u64 rx_pause_frames;
+   u64 rx_pfc_frames;
+   u64 rx_align_errors;
+   u64 rx_carrier_errors;
+   u64 rx_oversize_packets;
+   u64 rx_jabbers;
+   u64 rx_undersize_packets;
+   u64 rx_fragments;
+   u64 tx_64_byte_packets;
+   u64 tx_65_to_127_byte_packets;
+   u64 tx_128_to_255_byte_packets;
+   u64 tx_256_to_511_byte_packets;
+   u64 tx_512_to_1023_byte_packets;
+   u64 tx_1024_to_1518_byte_packets;
+   u64 tx_1519_to_2047_byte_packets;
+   u64 tx_2048_to_4095_byte_packets;
+   u64 tx_4096_to_9216_byte_packets;
+   u64 tx_9217_to_16383_byte_packets;
+   u64 tx_pause_frames;
+   u64 tx_pfc_frames;
+   u64 tx_lpi_entry_count;
+   u64 tx_total_collisions;
+   u64 brb_truncates;
+   u64 brb_discards;
+   u64 tx_mac_ctrl_frames;
+};
+
 struct qede_dev {
struct qed_dev  *cdev;
struct net_device   *ndev;
@@ -86,6 +150,7 @@ struct qede_dev {
max_t(u64, 1UL << QEDE_RX_ALIGN_SHIFT,  \
  SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
+   struct qede_stats   stats;
struct qed_update_vport_rss_params  rss_params;
u16 q_num_rx_buffers; /* Must be a power of two */
u16 q_num_tx_buffers; /* Must be a power of two */
@@ -198,6 +263,15 @@ union qede_reload_args {
u16 mtu;
 };
 
+void qede_config_debug(uint debug, u32 *p_dp_module, u8 *p_dp_level);
+void qede_set_ethtool_ops(struct net_device *netdev);
+void qede_reload(struct qede_dev *edev,
+void (*func)(struct qede_dev *edev,
+ union qede_reload_args *args),
+union qede_reload_args *args);
+int qede_change_mtu(struct net_device *dev, int new_mtu);
+void qede_fill_by_demand_stats(struct qede_dev *edev);
+
 #define RX_RING_SIZE_POW   13
 #define RX_RING_SIZE   BIT(RX_RING_SIZE_POW)
 #define NUM_RX_BDS_MAX (RX_RING_SIZE - 1)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
new file mode 100644
index 000..3a36247
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -0,0 +1,385 @@
+/* QLogic qede NIC Driver
+* Copyright (c) 2015 QLogic Corporation
+*
+* This software is available under the terms of the 

[RFC v2 net-next 00/10] Add new drivers: qed & qede

2015-09-17 Thread Yuval Mintz
From: Ariel Elior 

This series implements the driver set for Qlogic's new 579xx series.
These are 10/20/25/40/50/100 Gig capable converged nics, supporting
ethernet (obviously), iscsi, fcoe, roce and iwarp protocols.

The overall driver design includes a common module ('qed') and protocol
specific dependent modules for ethernet ('qede'), fcoe ('qedf'),
iscsi ('qedi') and roce ('qedr').
The common module contains all of the common logic, e.g. initialization,
cleanup, infrastructure for interrupt handling, link management, slowpath
etc. as well as protocol agnostic features, and supplying an abstraction
layer for other modules.
The protocol specific modules can be compiled and operated independently
of each other, with the exception of the rdma modules which are dependent
on the ethernet module, in accordance with the kernel rdma stack design.

This series only adds the core and ethernet modules, with basic L2
capabilities. Future series will add the rest of the modules and enhance
the L2 functionality.

This patch series is constructed of the following patches:
qed:  Add module with basic common support
qed:  Add basic L2 interface
qede: Add basic Network driver
qed:  Add slowpath L2 support
qede: Add basic network device support
qede: Add classification configuration
qed:  Add link support
qede: Add support for link
qed:  Add statistics support
qede: Add basic ethtool support

We don't expect the series to be accepted as is. We are looking for
upstream community feedback and guidance. Although the series is quite
large, it is what we viewed as the minimal set of patches to constitute
a basic L2 driver.

This project is a team effort, thanks go to Yuval Mintz, Dmitry Kravkov,
Michal Kalderon, Tomer Tayar, Manish Chopra, Sudarsana Kalluru,
Rajesh Borundia, Sony Chacko, Artum Zolotushko, Harish Patil, Rasesh Mody,
Sergey Ukhterov and Elad Manela, as well as former team members,
Eilon Greenstein and Shmulik Ravid.

Changes from previous version:
-

From Version 1:
  - Removed private license file; Instead revised comments at source headers.

Thanks,
Ariel Elior


Re: NFS/TCP/IPv6 acting strangely in 4.2

2015-09-17 Thread Trond Myklebust
Hi Russell,

On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux
> wrote:
> > Following that idea, I just tried the patch below, and it seems to
> > work.
> > I don't know whether it handles all cases after a call to
> > kernel_connect(),
> > but it stops the multiple connection attempts:
> > 
> >   1   0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539
> > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691
> > WS=128
> >   2   0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK]
> > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1
> > TSval=870318939 TSecr=15712 WS=64
> >   3   0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540
> > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
> >   4   0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > 0x905379cc, [Check: RD LU MD XT DL]
> >   5   0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
> > Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712
> >   6   0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > 0x905379cc, [Check: RD LU MD XT DL]
> >   7   0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523
> > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
> >   8   0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4),
> > [Allowed: RD LU MD XT DL]
> >   9   0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780
> > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
> >  10   0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > 0xe15fc9c9, [Check: RD LU MD XT DL]
> >  11   0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6),
> > [Allowed: RD LU MD XT DL]
> >  12   0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH:
> > 0xe15fc9c9, [Check: RD LU MD XT DL]
> >  13   0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10),
> > [Allowed: RD LU MD XT DL]
> >  14   0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH:
> > 0xe15fc9c9
> > ...
> 
> NFS people - any comments on this patch?  Is it the correct way to
> solve
> this problem (please see the first message in this thread for the
> problem.)
> Without this patch, NFS is unusable as it tries to launch multiple
> new
> connections from the same port to the NFS server without giving the
> NFS
> server time to respond and establish the TCP connection.

I agree that it addresses a real problem here, however there are a
couple of issues with the patch itself:

AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and
TCP_CLOSE, so if the connection attempt fails, this patch leaves the
XPRT_CONNECTING flag set.
There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1,
TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection
attempt by canceling the XPRT_CONNECTING state.

How about the following? It is based on your patch, but adds a check to
ensure that xs_tcp_state_change() doesn't clear the 'connecting' state
more than once (which could otherwise still happen in the TCP_CLOSE
case).
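
The "clear the connecting state at most once" idea can be sketched in
userspace with C11 atomics. This is an illustration of the technique
only: the kernel patch uses its own bit operations on sock_state, and all
demo_* names here are hypothetical. Note also that the sketch uses a bit
mask, whereas kernel bit helpers take a bit number.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SOCK_CONNECTING 1UL	/* bit mask, hypothetical */

struct demo_transport {
	atomic_ulong sock_state;
	int connect_done_calls;	/* how often completion really ran */
};

static void demo_init(struct demo_transport *t)
{
	assert(t != NULL);
	atomic_init(&t->sock_state, 0UL);
	t->connect_done_calls = 0;
}

/* Mark a connection attempt as in flight. */
static void demo_start_connect(struct demo_transport *t)
{
	assert(t != NULL);
	atomic_fetch_or(&t->sock_state, DEMO_SOCK_CONNECTING);
}

/*
 * Called from every state change that ends a connection attempt.
 * Only the caller that actually clears the flag runs the completion
 * step, so a later TCP_CLOSE-style event cannot cancel a newer attempt.
 */
static bool demo_finish_connect(struct demo_transport *t)
{
	unsigned long old;

	assert(t != NULL);
	old = atomic_fetch_and(&t->sock_state, ~DEMO_SOCK_CONNECTING);
	if (!(old & DEMO_SOCK_CONNECTING))
		return false;	/* flag was already clear: do nothing */

	t->connect_done_calls++;
	return true;
}
```

Calling demo_finish_connect() twice after one demo_start_connect() runs
the completion step once; the second call sees the flag already clear and
returns false.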

8<---
From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Wed, 16 Sep 2015 23:43:17 -0400
Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete
 before retrying

Commit 718ba5b87343, moved the responsibility for unlocking the socket to
xs_tcp_setup_socket, meaning that the socket will be unlocked before we
know that it has finished trying to connect. The following patch is based on
an initial patch by Russell King to ensure that we delay clearing the
XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate
a connection attempt, or the connection attempt itself failed.

Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing")
Reported-by: Russell King 
Signed-off-by: Trond Myklebust 
---
 include/linux/sunrpc/xprtsock.h |  3 +++
 net/sunrpc/xprtsock.c   | 11 ---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
index 7591788e9fbf..357e44c1a46b 100644
--- a/include/linux/sunrpc/xprtsock.h
+++ b/include/linux/sunrpc/xprtsock.h
@@ -42,6 +42,7 @@ struct sock_xprt {
/*
 * Connection of transports
 */
+   unsigned long   sock_state;
struct delayed_work connect_worker;
struct sockaddr_storage srcaddr;
unsigned short  srcport;
@@ -76,6 +77,8 @@ struct sock_xprt {
  */
 #define TCP_RPC_REPLY  (1UL << 6)
 
+#define XPRT_SOCK_CONNECTING   1U
+
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_SUNRPC_XPRTSOCK_H */
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7be90bc1a7c2..5bac27983e2a 100644
--- a/net/sunrpc/xprtsock.c
+++ 

RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V

2015-09-17 Thread KY Srinivasan


> -Original Message-
> From: David Laight [mailto:david.lai...@aculab.com]
> Sent: Thursday, September 17, 2015 1:38 AM
> To: KY Srinivasan ; Alexander Duyck
> ; Haiyang Zhang ;
> Vitaly Kuznetsov ; netdev@vger.kernel.org
> Cc: David S. Miller ; linux-ker...@vger.kernel.org;
> Jason Wang 
> Subject: RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> 
> From: KY Srinivasan
> > Sent: 16 September 2015 23:58
> ...
> > > I think we get that.  The question is does the Remote NDIS header and
> > > packet info actually need to be a part of the header data?  I would
> > > argue that it probably doesn't.
> > >
> > > So for example in netvsc_start_xmit it looks like you are calling
> > > init_page_array in order to populate a set of page buffers, but the
> > > first buffer for the Remote NDIS protocol is populated as a separate
> > > page and offset.  As such it doesn't seem like it necessarily needs to
> > > be a part of the header data but could be maintained perhaps in a
> > > separate ring buffer, or perhaps just be a separate page that you break
> > > up to use for each header.
> >
> > You are right; the rndis header can be built as a separate fragment and 
> > sent.
> > Indeed this is what we were doing earlier - on the outgoing path we would
> allocate
> > memory for the rndis header. My goal was to avoid this allocation on every
> packet being
> > sent and I decided to use the headroom instead. If we can completely avoid 
> > all
> memory
> > allocation for rndis header, it makes a significant perf difference:
> ...
> 
> 
> So just preallocate the header space as a fixed buffer for each ring entry
> (or tx frame).
> 
> If you allocate a fixed buffer for each ring entry you may find there are
> performance gains from copying small fragments into the buffer instead
> of doing whatever mapping operations are required.
> 
>   David

Yes; I could do that. My original goal in asking for additional headroom
was to avoid having any allocation in the transmit path. I did not
realize that all I had done was push the allocation to a different spot,
since the headroom I was asking for was greater than the default
headroom on skb allocation.

I think I can achieve my original goal of not having any allocation in
the send path by carefully using the memory available in the skb:

1. I am going to handle the rndis header separately; it can be packed
into the default headroom available in the skb.
2. I will use the scratch area in the skb to stash away the state that
needs to persist. This is the state needed to clean up the guest state
after we get the send_complete packet.
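
A rough userspace model of the two steps above: prepend a header into
existing headroom only when it fits, and keep per-packet state in a small
scratch area carried with the buffer. This is not skb code; struct
demo_buf, its sizes and the function names are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet buffer with headroom in front of the payload. */
struct demo_buf {
	unsigned char storage[256];
	size_t data_off;		/* payload start; bytes before it are headroom */
	size_t len;			/* payload length */
	unsigned char scratch[48];	/* per-packet state; size is an assumption */
};

static size_t demo_headroom(const struct demo_buf *b)
{
	return b->data_off;
}

/*
 * Prepend hdr in place. Returns 0 on success, or -1 when the headroom
 * is too small, which is the case where a real driver would have to
 * fall back to allocating memory.
 */
static int demo_push_header(struct demo_buf *b, const void *hdr,
			    size_t hdr_len)
{
	assert(b->data_off + b->len <= sizeof(b->storage));

	if (hdr_len > demo_headroom(b))
		return -1;

	b->data_off -= hdr_len;
	b->len += hdr_len;
	memcpy(b->storage + b->data_off, hdr, hdr_len);
	return 0;
}

/* Stash state that must survive until the send completion arrives. */
static int demo_stash_state(struct demo_buf *b, const void *state,
			    size_t state_len)
{
	if (state_len > sizeof(b->scratch))
		return -1;
	memcpy(b->scratch, state, state_len);
	return 0;
}
```

With 32 bytes of headroom, pushing a 16-byte header succeeds and leaves 16
bytes, while a 64-byte header is refused, so nothing is allocated as long
as the header fits the headroom that is already there.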

Regards,

K. Y 
 


Re: NFS/TCP/IPv6 acting strangely in 4.2

2015-09-17 Thread Russell King - ARM Linux
On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux wrote:
> Following that idea, I just tried the patch below, and it seems to work.
> I don't know whether it handles all cases after a call to kernel_connect(),
> but it stops the multiple connection attempts:
> 
>   1   0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 Win=28560 
> Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 WS=128
>   2   0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] Seq=1884476522 
> Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=870318939 
> TSecr=15712 WS=64
>   3   0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 
> Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
>   4   0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: 
> RD LU MD XT DL]
>   5   0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 
> Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712
>   6   0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: 
> RD LU MD XT DL]
>   7   0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 
> Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
>   8   0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), [Allowed: 
> RD LU MD XT DL]
>   9   0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 
> Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
>  10   0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: 
> RD LU MD XT DL]
>  11   0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), [Allowed: 
> RD LU MD XT DL]
>  12   0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: 
> RD LU MD XT DL]
>  13   0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), [Allowed: 
> RD LU MD XT DL]
>  14   0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: 0xe15fc9c9
> ...

NFS people - any comments on this patch?  Is it the correct way to solve
this problem (please see the first message in this thread for the problem.)
Without this patch, NFS is unusable as it tries to launch multiple new
connections from the same port to the NFS server without giving the NFS
server time to respond and establish the TCP connection.

> 
>  net/sunrpc/xprtsock.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index ff5b6a2e62c3..c456d6e51c56 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -1450,6 +1450,7 @@ static void xs_tcp_state_change(struct sock *sk)
>   switch (sk->sk_state) {
>   case TCP_ESTABLISHED:
>           spin_lock(&xprt->transport_lock);
> + xprt_clear_connecting(xprt);
>   if (!xprt_test_and_set_connected(xprt)) {
>   struct sock_xprt *transport = container_of(xprt,
>   struct sock_xprt, xprt);
> @@ -1474,12 +1475,14 @@ static void xs_tcp_state_change(struct sock *sk)
>   smp_mb__before_atomic();
>           clear_bit(XPRT_CONNECTED, &xprt->state);
>           clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
> +         clear_bit(XPRT_CONNECTING, &xprt->state);
>   smp_mb__after_atomic();
>   break;
>   case TCP_CLOSE_WAIT:
>   /* The server initiated a shutdown of the socket */
>   xprt->connect_cookie++;
>           clear_bit(XPRT_CONNECTED, &xprt->state);
> +         clear_bit(XPRT_CONNECTING, &xprt->state);
>   xs_tcp_force_close(xprt);
>   case TCP_CLOSING:
>   /*
> @@ -1493,6 +1496,7 @@ static void xs_tcp_state_change(struct sock *sk)
>           set_bit(XPRT_CLOSING, &xprt->state);
>           smp_mb__before_atomic();
>           clear_bit(XPRT_CONNECTED, &xprt->state);
> +         clear_bit(XPRT_CONNECTING, &xprt->state);
>   smp_mb__after_atomic();
>   break;
>   case TCP_CLOSE:
> @@ -2237,11 +2241,13 @@ static void xs_tcp_setup_socket(struct work_struct *work)
>   xs_tcp_force_close(xprt);
>   break;
>   case 0:
> - case -EINPROGRESS:
>   case -EALREADY:
>   xprt_unlock_connect(xprt, transport);
>   xprt_clear_connecting(xprt);
>   return;
> + case -EINPROGRESS:
> + xprt_unlock_connect(xprt, transport);
> + return;
>   case -EINVAL:
>   /* Happens, for instance, if the user specified a link
>* local IPv6 address without a scope-id.
> 
> 
> -- 
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

[PATCH net 1/5] vxlan: set needed headroom correctly

2015-09-17 Thread Jiri Benc
vxlan_setup is called when allocating the net_device, i.e. way before
vxlan_newlink (or vxlan_dev_configure) is called. This means
vxlan->default_dst is actually unset in vxlan_setup and the condition that
sets needed_headroom always takes the else branch.

Set the needed_headroom at the point when we have the information about
the address family available.

Fixes: e4c7ed415387c ("vxlan: add ipv6 support")
Fixes: 2853af6a2ea1a ("vxlan: use dev->needed_headroom instead of dev->hard_header_len")
CC: Cong Wang 
Signed-off-by: Jiri Benc 
---
 drivers/net/vxlan.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cf8b7f0473b3..6ebe562af04e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2392,10 +2392,6 @@ static void vxlan_setup(struct net_device *dev)
 
eth_hw_addr_random(dev);
ether_setup(dev);
-   if (vxlan->default_dst.remote_ip.sa.sa_family == AF_INET6)
-   dev->needed_headroom = ETH_HLEN + VXLAN6_HEADROOM;
-   else
-   dev->needed_headroom = ETH_HLEN + VXLAN_HEADROOM;
 
dev->netdev_ops = &vxlan_netdev_ops;
dev->destructor = free_netdev;
@@ -2670,8 +2666,12 @@ static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
 
dev->needed_headroom = lowerdev->hard_header_len +
   (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
-   } else if (use_ipv6)
+   } else if (use_ipv6) {
vxlan->flags |= VXLAN_F_IPV6;
+   dev->needed_headroom = ETH_HLEN + VXLAN6_HEADROOM;
+   } else {
+   dev->needed_headroom = ETH_HLEN + VXLAN_HEADROOM;
+   }
 
memcpy(&vxlan->cfg, conf, sizeof(*conf));
if (!vxlan->cfg.dst_port)
-- 
1.8.3.1
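
The resulting headroom selection can be summarised with a small userspace
function. The numeric values of the DEMO_* constants are assumptions
(outer IP header + UDP + VXLAN header + inner Ethernet header) and are not
quoted from this patch; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_ETH_HLEN		14
/* Assumed encapsulation overhead: IP + UDP + VXLAN + Ethernet. */
#define DEMO_VXLAN_HEADROOM	(20 + 8 + 8 + 14)
#define DEMO_VXLAN6_HEADROOM	(40 + 8 + 8 + 14)

/*
 * Compute the headroom at configure time, when the address family is
 * known. With a lower device its hard_header_len is the base;
 * otherwise a plain Ethernet header length is assumed.
 */
static int demo_needed_headroom(bool has_lowerdev,
				int lower_hard_header_len, bool use_ipv6)
{
	int encap = use_ipv6 ? DEMO_VXLAN6_HEADROOM : DEMO_VXLAN_HEADROOM;
	int base;

	assert(lower_hard_header_len >= 0);
	base = has_lowerdev ? lower_hard_header_len : DEMO_ETH_HLEN;
	return base + encap;
}
```

Under these assumed values an IPv6 device without a lower device needs 84
bytes rather than the 64 it would get if the IPv4 branch were always
taken, which is the 20-byte shortfall the commit message describes.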



[PATCH net 5/5] bnx2x: track vxlan port count

2015-09-17 Thread Jiri Benc
The callback for adding vxlan port can be called with the same port for
both IPv4 and IPv6. Do not disable the offloading when the same port for
both protocols is added and later one of them removed.

Signed-off-by: Jiri Benc 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h  |  1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 --
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index ba936635322a..b5e64b02200c 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1946,6 +1946,7 @@ struct bnx2x {
u16 vlan_cnt;
u16 vlan_credit;
u16 vxlan_dst_port;
+   u8 vxlan_dst_port_count;
bool accept_any_vlan;
 };
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 89a174fa1300..f1d62d5dbaff 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -10108,12 +10108,18 @@ static void __bnx2x_add_vxlan_port(struct bnx2x *bp, 
u16 port)
if (!netif_running(bp->dev))
return;
 
-   if (bp->vxlan_dst_port || !IS_PF(bp)) {
+   if (bp->vxlan_dst_port_count && bp->vxlan_dst_port == port) {
+   bp->vxlan_dst_port_count++;
+   return;
+   }
+
+   if (bp->vxlan_dst_port_count || !IS_PF(bp)) {
DP(BNX2X_MSG_SP, "Vxlan destination port limit reached\n");
return;
}
 
bp->vxlan_dst_port = port;
+   bp->vxlan_dst_port_count = 1;
bnx2x_schedule_sp_rtnl(bp, BNX2X_SP_RTNL_ADD_VXLAN_PORT, 0);
 }
 
@@ -10128,10 +10134,14 @@ static void bnx2x_add_vxlan_port(struct net_device 
*netdev,
 
 static void __bnx2x_del_vxlan_port(struct bnx2x *bp, u16 port)
 {
-   if (!bp->vxlan_dst_port || bp->vxlan_dst_port != port || !IS_PF(bp)) {
+   if (!bp->vxlan_dst_port_count || bp->vxlan_dst_port != port ||
+   !IS_PF(bp)) {
DP(BNX2X_MSG_SP, "Invalid vxlan port\n");
return;
}
+   bp->vxlan_dst_port_count--;
+   if (bp->vxlan_dst_port_count)
+   return;
 
if (netif_running(bp->dev)) {
bnx2x_schedule_sp_rtnl(bp, BNX2X_SP_RTNL_DEL_VXLAN_PORT, 0);
-- 
1.8.3.1



[PATCH net 2/5] vxlan: reject IPv6 addresses if IPv6 is not configured

2015-09-17 Thread Jiri Benc
When an IPv6 address is set without IPv6 configured, the vxlan socket is mostly
treated as an IPv4 one but various lookups in fdb etc. still take the
AF_INET6 into account. This creates inconsistencies with weird consequences.

Just reject IPv6 addresses in such a case.
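
A minimal user-space sketch of that check, under stated assumptions: `HAVE_IPV6` is a hypothetical build-time switch standing in for `IS_ENABLED(CONFIG_IPV6)`, and `check_families()` is an illustrative helper, not a function in vxlan.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical build-time switch standing in for IS_ENABLED(CONFIG_IPV6). */
#ifndef HAVE_IPV6
#define HAVE_IPV6 0
#endif

/*
 * Decide whether a tunnel should use IPv6, given the address families of
 * its remote and local addresses. Returns 0 and stores the decision in
 * *use_ipv6, or -EPFNOSUPPORT if an IPv6 address was supplied while IPv6
 * support is not built in.
 */
static int check_families(sa_family_t remote, sa_family_t local,
                          bool *use_ipv6)
{
    *use_ipv6 = false;
    if (remote == AF_INET6 || local == AF_INET6) {
        if (!HAVE_IPV6)
            return -EPFNOSUPPORT;
        *use_ipv6 = true;
    }
    return 0;
}
```

Failing early keeps the rest of the configuration path from ever seeing an AF_INET6 address it cannot handle.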

Signed-off-by: Jiri Benc 
---
 drivers/net/vxlan.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6ebe562af04e..bbac1d35ed4e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2636,8 +2636,11 @@ static int vxlan_dev_configure(struct net *src_net, 
struct net_device *dev,
dst->remote_ip.sa.sa_family = AF_INET;
 
if (dst->remote_ip.sa.sa_family == AF_INET6 ||
-   vxlan->cfg.saddr.sa.sa_family == AF_INET6)
+   vxlan->cfg.saddr.sa.sa_family == AF_INET6) {
+   if (!IS_ENABLED(CONFIG_IPV6))
+   return -EPFNOSUPPORT;
use_ipv6 = true;
+   }
 
if (conf->remote_ifindex) {
struct net_device *lowerdev
-- 
1.8.3.1



[PATCH net 4/5] be2net: allow offloading with the same port for IPv4 and IPv6

2015-09-17 Thread Jiri Benc
The callback for adding vxlan port can be called with the same port for both
IPv4 and IPv6. Do not disable the offloading if this occurs.

Signed-off-by: Jiri Benc 
---
 drivers/net/ethernet/emulex/benet/be.h  |  1 +
 drivers/net/ethernet/emulex/benet/be_main.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 0a27805cbbbd..821540913343 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -582,6 +582,7 @@ struct be_adapter {
u16 pvid;
__be16 vxlan_port;
int vxlan_port_count;
+   int vxlan_port_aliases;
struct phy_info phy;
u8 wol_cap;
bool wol_en;
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 12687bf52b95..7bf51a1a0a77 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5176,6 +5176,11 @@ static void be_add_vxlan_port(struct net_device *netdev, 
sa_family_t sa_family,
if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
return;
 
+   if (adapter->vxlan_port == port && adapter->vxlan_port_count) {
+   adapter->vxlan_port_aliases++;
+   return;
+   }
+
if (adapter->flags & BE_FLAGS_VXLAN_OFFLOADS) {
dev_info(dev,
 "Only one UDP port supported for VxLAN offloads\n");
@@ -5226,6 +5231,11 @@ static void be_del_vxlan_port(struct net_device *netdev, 
sa_family_t sa_family,
if (adapter->vxlan_port != port)
goto done;
 
+   if (adapter->vxlan_port_aliases) {
+   adapter->vxlan_port_aliases--;
+   return;
+   }
+
be_disable_vxlan_offloads(adapter);
 
dev_info(&adapter->pdev->dev,
-- 
1.8.3.1



Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread David Woodhouse
On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote:
> 
> Thanks; I'll try that. In fact since updating to 4.2 the problem has
> got worse — now the whole machine dies:

There is something very strange going on here. I've found two ways to
make it stop crashing when cp_tx_timeout() hits the 'popf' when
unlocking the spinlock.

The first is to comment out the whole of cp_tx_timeout() and let it
happen once. Then put that code *back* again and reload the module.
Then it can work fine.

The second way is to comment out the WARN_ONCE in dev_watchdog().
I remain utterly bemused; I have no idea what's going on there.

But that aside, even when it survives running cp_tx_timeout(), it still
doesn't *work* — it looks like TX is indeed working and has recovered,
but we are not *receiving* any packets.

I can't actually trigger the TX timeout at all with debugging enabled;
I've hacked things so that cp_set_wol() will also call cp_tx_timeout()
and simulate it. And now I see this...

[ 4358.499474] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b
[ 4358.499488] 8139cp :00:0b.0 eth1: tx done, slot 35
[ 4358.513663] 8139cp :00:0b.0 eth1: tx queued, slot 37, skblen 54
[ 4358.513692] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b
[ 4358.513705] 8139cp :00:0b.0 eth1: tx done, slot 36
[ 4358.518880] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b
[ 4358.518900] 8139cp :00:0b.0 eth1: rx slot 1 status 0x32014040 len 60
[ 4358.523601] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b
[ 4358.526910] 8139cp :00:0b.0 eth1: rx slot 2 status 0x32036052 len 78
[ 4358.547898] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b
[ 4358.547996] 8139cp :00:0b.0 eth1: rx slot 3 status 0x32036052 len 78
[ 4358.580526] 8139cp :00:0b.0 eth1: tx queued, slot 38, skblen 70
[ 4358.580555] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b
[ 4358.580569] 8139cp :00:0b.0 eth1: tx done, slot 37
[ 4358.601912] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b
[ 4358.601932] 8139cp :00:0b.0 eth1: rx slot 4 status 0x32036052 len 78
[ 4358.650678] 8139cp :00:0b.0 eth1: intr, status 0001 enable 80ff cmd 0c cpcmd 002b
[ 4358.650698] 8139cp :00:0b.0 eth1: rx slot 5 status 0x320145a5 len 1441
[ 4358.665572] will lock...
[ 4358.668222] Handling tx timeout, flags 282
[ 4358.672494] nway_reset
[ 4358.674858] Will wake queue...
[ 4358.677919] Will unlock... flags 282
[ 4358.681525] did unlock...
[ 4358.684198] 8139cp :00:0b.0 eth1: Transmit timeout handled, status  c   2b0 80ff
[ 4358.708234] 8139cp :00:0b.0 eth1: tx queued, slot 1, skblen 92
[ 4358.714567] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b
[ 4358.722405] 8139cp :00:0b.0 eth1: tx done, slot 0
[ 4358.747412] 8139cp :00:0b.0 eth1: tx queued, slot 2, skblen 106
[ 4358.753736] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b
[ 4358.756824] 8139cp :00:0b.0 eth1: tx done, slot 1
[ 4358.814961] 8139cp :00:0b.0 eth1: tx queued, slot 3, skblen 173
[ 4358.821291] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b
[ 4358.824186] 8139cp :00:0b.0 eth1: tx done, slot 2
[ 4358.834352] 8139cp :00:0b.0 eth1: tx queued, slot 4, skblen 86
[ 4358.840579] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b
[ 4358.844216] 8139cp :00:0b.0 eth1: tx done, slot 3
[ 4358.853615] 8139cp :00:0b.0 eth1: tx queued, slot 5, skblen 54
[ 4358.859822] 8139cp :00:0b.0 eth1: intr, status 0484 enable 80ff cmd 0c cpcmd 002b
[ 4358.863497] 8139cp :00:0b.0 eth1: tx done, slot 4
[ 4358.873111] 8139cp :00:0b.0 eth1: tx queued, slot 6, skblen 66

-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation





Re: NFS/TCP/IPv6 acting strangely in 4.2

2015-09-17 Thread Russell King - ARM Linux
On Wed, Sep 16, 2015 at 06:53:57AM +, Damien Thébault wrote:
> On Fri, 2015-09-11 at 12:38 +0100, Russell King - ARM Linux wrote:
> > I have a recent Marvell Armada 388 board here which uses the mvneta
> > driver.  I'm seeing some weird effects with NFS with it acting as a
> > client.
> 
> Hello,
> 
> I'm upgrading a Marvell Armada 370 board using the mvneta driver from
> 4.0 to 4.2 and noticed issues with NFS booting.
> Basically, most of the time init returns with an error code, or
> programs segfault or throw illegal instructions.
> 
> Since it worked fine on 4.0 I bisected until I found commit
> a84e32894191cfcbffa54180d78d7d4654d56c20 "net: mvneta: fix refilling
> for Rx DMA buffers".
> 
> If I revert this commit, everything seems to get back to normal.
> Could you try it ? The two issues look very similar.

If you look at my original problem report, you'll see that has nothing
to do with the problem I'm seeing.

My problem is:

- TCP disconnects
- NFS tries to establish a new connection with the server, sending a SYN
- NFS server replies with a SYNACK
- NFS client immediately sends another SYN with a different sequence
   number, so it's a _new_ attempt to connect to the NFS server.

At this point, the socket for the previous SYNACK'd connection has been
destroyed mid-setup.

This is because the sunrpc code is horribly racy - it doesn't block a
second attempt to call kernel_connect() on a socket which is already in
the process of connecting to the NFS server.

Even if the SYNACK had been corrupted (due to mvneta's rx code), that
has no bearing on the race in the sunrpc layer that destroys the previous
socket before the TCP SYN/SYNACK/ACK handshake has had a chance to
complete.
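
To illustrate the kind of guard that is missing, here is a single-threaded sketch of a connect path that refuses to start a second handshake while one is in flight. It is not the sunrpc code: `struct xprt_sketch`, `try_connect()` and `handshake_done()` are hypothetical names, and a real implementation would need an atomic flag or a lock around the state check.

```c
#include <errno.h>
#include <stdbool.h>

enum xprt_state { XPRT_IDLE, XPRT_CONNECTING, XPRT_CONNECTED };

/* Hypothetical transport state; not the real struct rpc_xprt. */
struct xprt_sketch {
    enum xprt_state state;
    unsigned int syn_sent;  /* counts simulated SYNs */
};

/* Start a connection attempt unless one is already pending or complete. */
static int try_connect(struct xprt_sketch *x)
{
    if (x->state == XPRT_CONNECTING)
        return -EALREADY;   /* handshake in flight: do not restart it */
    if (x->state == XPRT_CONNECTED)
        return -EISCONN;
    x->state = XPRT_CONNECTING;
    x->syn_sent++;          /* stands in for sending the SYN */
    return 0;
}

/* Called when the handshake completes or fails. */
static void handshake_done(struct xprt_sketch *x, bool ok)
{
    if (x->state == XPRT_CONNECTING)
        x->state = ok ? XPRT_CONNECTED : XPRT_IDLE;
}
```

With such a guard, a second caller gets `-EALREADY` and only one SYN is sent per attempt, so the half-open socket is left alone until the SYN/SYNACK/ACK exchange finishes.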

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


[PATCH net 0/5] vxlan fixes

2015-09-17 Thread Jiri Benc
This fixes various issues with vxlan related to IPv6.

Jiri Benc (5):
  vxlan: set needed headroom correctly
  vxlan: reject IPv6 addresses if IPv6 is not configured
  qlcnic: track vxlan port count
  be2net: allow offloading with the same port for IPv4 and IPv6
  bnx2x: track vxlan port count

 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h  |  1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 --
 drivers/net/ethernet/emulex/benet/be.h   |  1 +
 drivers/net/ethernet/emulex/benet/be_main.c  | 10 ++
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h  |  1 +
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 18 +-
 drivers/net/vxlan.c  | 15 +--
 7 files changed, 47 insertions(+), 13 deletions(-)

-- 
1.8.3.1



[PATCH net 3/5] qlcnic: track vxlan port count

2015-09-17 Thread Jiri Benc
The callback for adding vxlan port can be called with the same port for
both IPv4 and IPv6. Do not disable the offloading when the same port for
both protocols is added and later one of them removed.

Signed-off-by: Jiri Benc 
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h  |  1 +
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 18 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
index 06bcc734fe8d..d6696cfa11d2 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
@@ -536,6 +536,7 @@ struct qlcnic_hardware_context {
u8 extend_lb_time;
u8 phys_port_id[ETH_ALEN];
u8 lb_mode;
+   u8 vxlan_port_count;
u16 vxlan_port;
struct device *hwmon_dev;
u32 post_mode;
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c 
b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 8b08b20e8b30..d4481454b5f8 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -483,11 +483,17 @@ static void qlcnic_add_vxlan_port(struct net_device 
*netdev,
/* Adapter supports only one VXLAN port. Use very first port
 * for enabling offload
 */
-   if (!qlcnic_encap_rx_offload(adapter) || ahw->vxlan_port)
+   if (!qlcnic_encap_rx_offload(adapter))
return;
+   if (!ahw->vxlan_port_count) {
+   ahw->vxlan_port_count = 1;
+   ahw->vxlan_port = ntohs(port);
+   adapter->flags |= QLCNIC_ADD_VXLAN_PORT;
+   return;
+   }
+   if (ahw->vxlan_port == ntohs(port))
+   ahw->vxlan_port_count++;
 
-   ahw->vxlan_port = ntohs(port);
-   adapter->flags |= QLCNIC_ADD_VXLAN_PORT;
 }
 
 static void qlcnic_del_vxlan_port(struct net_device *netdev,
@@ -496,11 +502,13 @@ static void qlcnic_del_vxlan_port(struct net_device 
*netdev,
struct qlcnic_adapter *adapter = netdev_priv(netdev);
struct qlcnic_hardware_context *ahw = adapter->ahw;
 
-   if (!qlcnic_encap_rx_offload(adapter) || !ahw->vxlan_port ||
+   if (!qlcnic_encap_rx_offload(adapter) || !ahw->vxlan_port_count ||
(ahw->vxlan_port != ntohs(port)))
return;
 
-   adapter->flags |= QLCNIC_DEL_VXLAN_PORT;
+   ahw->vxlan_port_count--;
+   if (!ahw->vxlan_port_count)
+   adapter->flags |= QLCNIC_DEL_VXLAN_PORT;
 }
 
 static netdev_features_t qlcnic_features_check(struct sk_buff *skb,
-- 
1.8.3.1



[RFC v2 net-next 06/10] qede: classification configuration

2015-09-17 Thread Yuval Mintz
From: Sudarsana Kalluru 

Add the ability to configure basic classification in driver by
implementing ndo_set_mac_address() and ndo_set_rx_mode().

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qede/qede.h  |  10 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 241 +++
 2 files changed, 251 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h 
b/drivers/net/ethernet/qlogic/qede/qede.h
index 7680106..5729128 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -89,6 +89,9 @@ struct qede_dev {
struct qed_update_vport_rss_params  rss_params;
u16 q_num_rx_buffers; /* Must be a power of two */
u16 q_num_tx_buffers; /* Must be a power of two */
+
+   struct delayed_work sp_task;
+   unsigned long   sp_flags;
 };
 
 enum QEDE_STATE {
@@ -188,6 +191,13 @@ struct qede_fastpath {
 
 #define QEDE_CSUM_ERROR BIT(0)
 #define QEDE_CSUM_UNNECESSARY  BIT(1)
+
+#define QEDE_SP_RX_MODE 1
+
+union qede_reload_args {
+   u16 mtu;
+};
+
 #define RX_RING_SIZE_POW   13
 #define RX_RING_SIZE   BIT(RX_RING_SIZE_POW)
 #define NUM_RX_BDS_MAX (RX_RING_SIZE - 1)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c 
b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 14e0b09..7b3c3d8 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1030,10 +1030,31 @@ static irqreturn_t qede_msix_fp_int(int irq, void 
*fp_cookie)
 
 static int qede_open(struct net_device *ndev);
 static int qede_close(struct net_device *ndev);
+static int qede_set_mac_addr(struct net_device *ndev, void *p);
+static void qede_set_rx_mode(struct net_device *ndev);
+static void qede_config_rx_mode(struct net_device *ndev);
+
+static int qede_set_ucast_rx_mac(struct qede_dev *edev,
+enum qed_filter_xcast_params_type opcode,
+unsigned char mac[ETH_ALEN])
+{
+   struct qed_filter_params filter_cmd;
+
+   memset(&filter_cmd, 0, sizeof(filter_cmd));
+   filter_cmd.type = QED_FILTER_TYPE_UCAST;
+   filter_cmd.filter.ucast.type = opcode;
+   filter_cmd.filter.ucast.mac_valid = 1;
+   ether_addr_copy(filter_cmd.filter.ucast.mac, mac);
+
+   return edev->ops->filter_config(edev->cdev, &filter_cmd);
+}
+
 static const struct net_device_ops qede_netdev_ops = {
.ndo_open = qede_open,
.ndo_stop = qede_close,
.ndo_start_xmit = qede_start_xmit,
+   .ndo_set_rx_mode = qede_set_rx_mode,
+   .ndo_set_mac_address = qede_set_mac_addr,
.ndo_validate_addr = eth_validate_addr,
 };
 
@@ -1198,6 +1219,20 @@ err:
return -ENOMEM;
 }
 
+static void qede_sp_task(struct work_struct *work)
+{
+   struct qede_dev *edev = container_of(work, struct qede_dev,
+sp_task.work);
+   mutex_lock(&edev->qede_lock);
+
+   if (edev->state == QEDE_STATE_OPEN) {
+   if (test_and_clear_bit(QEDE_SP_RX_MODE, &edev->sp_flags))
+   qede_config_rx_mode(edev->ndev);
+   }
+
+   mutex_unlock(&edev->qede_lock);
+}
+
 static void qede_update_pf_params(struct qed_dev *cdev)
 {
struct qed_pf_params pf_params;
@@ -1269,6 +1304,9 @@ static int __qede_probe(struct pci_dev *pdev, u32 
dp_module, u8 dp_level,
 
edev->ops->common->set_id(cdev, edev->ndev->name, DRV_MODULE_VERSION);
 
+   INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
+   mutex_init(&edev->qede_lock);
+
DP_INFO(edev, "Ending successfully qede probe\n");
 
return 0;
@@ -1306,6 +1344,7 @@ static void __qede_remove(struct pci_dev *pdev, enum 
qede_remove_mode mode)
 
DP_INFO(edev, "Starting qede_remove\n");
 
+   cancel_delayed_work_sync(&edev->sp_task);
unregister_netdev(ndev);
 
edev->ops->common->set_power_state(cdev, PCI_D0);
@@ -2025,6 +2064,24 @@ static int qede_start_queues(struct qede_dev *edev)
return 0;
 }
 
+static int qede_set_mcast_rx_mac(struct qede_dev *edev,
+enum qed_filter_xcast_params_type opcode,
+unsigned char *mac, int num_macs)
+{
+   struct qed_filter_params filter_cmd;
+   int i;
+
+   memset(&filter_cmd, 0, sizeof(filter_cmd));
+   filter_cmd.type = QED_FILTER_TYPE_MCAST;
+   filter_cmd.filter.mcast.type = opcode;
+   filter_cmd.filter.mcast.num = num_macs;
+
+   for (i = 0; i < num_macs; i++, mac += ETH_ALEN)
+   ether_addr_copy(filter_cmd.filter.mcast.mac[i], mac);
+
+   return edev->ops->filter_config(edev->cdev, &filter_cmd);
+}
+
 enum qede_unload_mode {

[RFC v2 net-next 08/10] qede: Add support for link

2015-09-17 Thread Yuval Mintz
From: Sudarsana Kalluru 

This adds basic link functionality to qede - driver still doesn't provide
users with an API to change any link property, but it does request qed to
initialize the link using default configuration, and registers a callback
that allows it to get link notifications.

This patch adds the ability of the driver to set the carrier as active and
to enable traffic as a result of async. link notifications.
Following this patch, driver should be capable of running traffic.
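
The callback flow described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct core_dev`, `struct nic_dev`, `core_register_ops()` and `core_link_event()` are hypothetical stand-ins for the qed/qede objects, and the carrier state is a plain boolean rather than a netdev flag.

```c
#include <stdbool.h>
#include <stddef.h>

struct link_output { bool link_up; };

/* Callback table the protocol driver hands to the core module. */
struct link_cb_ops {
    void (*link_update)(void *cookie, const struct link_output *link);
};

/* Hypothetical core-side state: registered callbacks plus an opaque cookie. */
struct core_dev {
    const struct link_cb_ops *ops;
    void *cookie;
};

/* Hypothetical protocol-driver state. */
struct nic_dev {
    bool running;
    bool carrier_on;
};

static void core_register_ops(struct core_dev *core,
                              const struct link_cb_ops *ops, void *cookie)
{
    core->ops = ops;
    core->cookie = cookie;
}

/* Called by the core when an asynchronous link event arrives. */
static void core_link_event(struct core_dev *core, bool up)
{
    struct link_output out = { .link_up = up };

    if (core->ops && core->ops->link_update)
        core->ops->link_update(core->cookie, &out);
}

/* Protocol-driver callback: ignore events while the interface is down. */
static void nic_link_update(void *cookie, const struct link_output *link)
{
    struct nic_dev *nic = cookie;

    if (!nic->running)
        return;
    nic->carrier_on = link->link_up;
}
```

The cookie lets the core stay ignorant of the protocol driver's private structure; it simply hands back whatever pointer was registered.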

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 47 
 1 file changed, 47 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c 
b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 7b3c3d8..8cb1bb5 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -87,6 +87,7 @@ static int qede_probe(struct pci_dev *pdev, const struct 
pci_device_id *id);
 static void qede_remove(struct pci_dev *pdev);
 static int qede_alloc_rx_buffer(struct qede_dev *edev,
struct qede_rx_queue *rxq);
+static void qede_link_update(void *dev, struct qed_link_output *link);
 
 static struct pci_driver qede_pci_driver = {
.name = "qede",
@@ -95,6 +96,12 @@ static struct pci_driver qede_pci_driver = {
.remove = qede_remove,
 };
 
+static struct qed_eth_cb_ops qede_ll_ops = {
+   {
+   .link_update = qede_link_update,
+   },
+};
+
 static int qede_netdev_event(struct notifier_block *this, unsigned long event,
 void *ptr)
 {
@@ -1304,6 +1311,8 @@ static int __qede_probe(struct pci_dev *pdev, u32 
dp_module, u8 dp_level,
 
edev->ops->common->set_id(cdev, edev->ndev->name, DRV_MODULE_VERSION);
 
+   edev->ops->register_ops(cdev, &qede_ll_ops, edev);
+
INIT_DELAYED_WORK(&edev->sp_task, qede_sp_task);
mutex_init(&edev->qede_lock);
 
@@ -2088,6 +2097,7 @@ enum qede_unload_mode {
 
 static void qede_unload(struct qede_dev *edev, enum qede_unload_mode mode)
 {
+   struct qed_link_params link_params;
int rc;
 
DP_INFO(edev, "Starting qede unload\n");
@@ -2099,6 +2109,10 @@ static void qede_unload(struct qede_dev *edev, enum 
qede_unload_mode mode)
netif_tx_disable(edev->ndev);
netif_carrier_off(edev->ndev);
 
+   /* Reset the link */
+   memset(&link_params, 0, sizeof(link_params));
+   link_params.link_up = false;
+   edev->ops->common->set_link(edev->cdev, &link_params);
rc = qede_stop_queues(edev);
if (rc) {
qede_sync_free_irqs(edev);
@@ -2129,6 +2143,8 @@ enum qede_load_mode {
 
 static int qede_load(struct qede_dev *edev, enum qede_load_mode mode)
 {
+   struct qed_link_params link_params;
+   struct qed_link_output link_output;
int rc;
 
DP_INFO(edev, "Starting qede load\n");
@@ -2172,6 +2188,17 @@ static int qede_load(struct qede_dev *edev, enum 
qede_load_mode mode)
mutex_lock(&edev->qede_lock);
edev->state = QEDE_STATE_OPEN;
mutex_unlock(&edev->qede_lock);
+
+   /* Ask for link-up using current configuration */
+   memset(&link_params, 0, sizeof(link_params));
+   link_params.link_up = true;
+   edev->ops->common->set_link(edev->cdev, &link_params);
+
+   /* Query whether link is already-up */
+   memset(&link_output, 0, sizeof(link_output));
+   edev->ops->common->get_link(edev->cdev, &link_output);
+   qede_link_update(edev, &link_output);
+
DP_INFO(edev, "Ending successfully qede load\n");
 
return 0;
@@ -2217,6 +2244,26 @@ static int qede_close(struct net_device *ndev)
return 0;
 }
 
+static void qede_link_update(void *dev, struct qed_link_output *link)
+{
+   struct qede_dev *edev = dev;
+
+   if (!netif_running(edev->ndev)) {
+   DP_VERBOSE(edev, NETIF_MSG_LINK, "Interface is not running\n");
+   return;
+   }
+
+   if (link->link_up) {
+   DP_NOTICE(edev, "Link is up\n");
+   netif_tx_start_all_queues(edev->ndev);
+   netif_carrier_on(edev->ndev);
+   } else {
+   DP_NOTICE(edev, "Link is down\n");
+   netif_tx_disable(edev->ndev);
+   netif_carrier_off(edev->ndev);
+   }
+}
+
 static int qede_set_mac_addr(struct net_device *ndev, void *p)
 {
struct qede_dev *edev = netdev_priv(ndev);
-- 
1.9.3



[RFC v2 net-next 03/10] qede: Add basic Network driver

2015-09-17 Thread Yuval Mintz
The Qlogic Everest Driver for Ethernet is the Ethernet-specific module for
579xx Ethernet products by Qlogic.

This patch adds a very minimal PCI driver, one that doesn't yet register
a network device, but one that does interact with qed and does a basic
initialization of the HW.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/Kconfig  |   5 +
 drivers/net/ethernet/qlogic/Makefile |   1 +
 drivers/net/ethernet/qlogic/qede/Makefile|   3 +
 drivers/net/ethernet/qlogic/qede/qede.h  |  73 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 354 +++
 5 files changed, 436 insertions(+)
 create mode 100644 drivers/net/ethernet/qlogic/qede/Makefile
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede.h
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_main.c

diff --git a/drivers/net/ethernet/qlogic/Kconfig 
b/drivers/net/ethernet/qlogic/Kconfig
index 58c3fb3..30a6f24 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -97,4 +97,9 @@ config QED
---help---
  This enables the support for ...
 
+config QEDE
+   tristate "QLogic QED 25/40/100Gb Ethernet NIC"
+   depends on QED
+   ---help---
+ This enables the support for ...
 endif # NET_VENDOR_QLOGIC
diff --git a/drivers/net/ethernet/qlogic/Makefile 
b/drivers/net/ethernet/qlogic/Makefile
index 7600138..cee90e0 100644
--- a/drivers/net/ethernet/qlogic/Makefile
+++ b/drivers/net/ethernet/qlogic/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_QLCNIC) += qlcnic/
 obj-$(CONFIG_QLGE) += qlge/
 obj-$(CONFIG_NETXEN_NIC) += netxen/
 obj-$(CONFIG_QED) += qed/
+obj-$(CONFIG_QEDE)+= qede/
diff --git a/drivers/net/ethernet/qlogic/qede/Makefile 
b/drivers/net/ethernet/qlogic/qede/Makefile
new file mode 100644
index 000..bedfe9f
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_QEDE) := qede.o
+
+qede-y := qede_main.o
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h 
b/drivers/net/ethernet/qlogic/qede/qede.h
new file mode 100644
index 000..7e2bcfa
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -0,0 +1,73 @@
+/* QLogic qede NIC Driver
+* Copyright (c) 2015 QLogic Corporation
+*
+* This software is available under the terms of the GNU General Public License
+* (GPL) Version 2, available from the file COPYING in the main directory of
+* this source tree.
+*/
+
+#ifndef _QEDE_H_
+#define _QEDE_H_
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define QEDE_MAJOR_VERSION 8
+#define QEDE_MINOR_VERSION 4
+#define QEDE_REVISION_VERSION  0
+#define QEDE_ENGINEERING_VERSION   0
+#define DRV_MODULE_VERSION __stringify(QEDE_MAJOR_VERSION) "." \
+   __stringify(QEDE_MINOR_VERSION) "." \
+   __stringify(QEDE_REVISION_VERSION) "."  \
+   __stringify(QEDE_ENGINEERING_VERSION)
+
+#define QEDE_ETH_INTERFACE_VERSION 300
+
+#define DRV_MODULE_SYM qede
+
+struct qede_dev {
+   struct qed_dev  *cdev;
+   struct net_device   *ndev;
+   struct pci_dev  *pdev;
+
+   u32 dp_module;
+   u8  dp_level;
+
+   const struct qed_eth_ops*ops;
+
+   struct qed_dev_eth_info dev_info;
+#define QEDE_MAX_RSS_CNT(edev) ((edev)->dev_info.num_queues)
+#define QEDE_MAX_TSS_CNT(edev) ((edev)->dev_info.num_queues * \
+(edev)->dev_info.num_tc)
+
+   u16 num_rss;
+   u8  num_tc;
+#define QEDE_RSS_CNT(edev) ((edev)->num_rss)
+#define QEDE_TSS_CNT(edev) ((edev)->num_rss *  \
+(edev)->num_tc)
+#define QEDE_TSS_IDX(edev, txqidx) ((txqidx) % (edev)->num_rss)
+#define QEDE_TC_IDX(edev, txqidx)  ((txqidx) / (edev)->num_rss)
+
+   struct qed_int_info int_info;
+   unsigned char   primary_mac[ETH_ALEN];
+
+   /* Smaller private variant of the RTNL lock */
+   struct mutex qede_lock;
+   u32 state; /* Protected by qede_lock */
+};
+
+/* Debug print definitions */
+#define DP_NAME(edev) ((edev)->ndev->name)
+
+#endif /* _QEDE_H_ */
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c 
b/drivers/net/ethernet/qlogic/qede/qede_main.c
new file mode 100644
index 000..35065dc
--- /dev/null
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -0,0 +1,354 @@
+/* QLogic qede NIC Driver
+* Copyright (c) 2015 QLogic Corporation
+*
+* This software is available under the terms of the GNU General Public License
+* (GPL) Version 2, available from the 

[RFC v2 net-next 09/10] qed: Add statistics support

2015-09-17 Thread Yuval Mintz
From: Manish Chopra 

Device statistics can be gathered on-demand. This adds the qed support for
reading the statistics [both function and port] from the device, and adds
to the public API a method for requesting the current statistics.
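
As a rough illustration of on-demand reading, the sketch below records where each statistics block lives as an address/length pair and copies it out only when asked. It assumes a flat, simulated device memory image; `struct stats_region`, `struct queue_stats` and `read_stats()` are hypothetical names, and the real firmware layouts are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Location of one statistics block inside device memory. */
struct stats_region {
    uint32_t address;
    uint32_t len;
};

/* Hypothetical counters; the real firmware structures differ. */
struct queue_stats {
    uint32_t rx_packets;
    uint32_t rx_bytes;
};

/*
 * Copy one region out of a simulated device memory image. Returns 0 on
 * success, or -1 if the region is out of bounds or does not fit in dst.
 */
static int read_stats(const uint8_t *dev_mem, size_t dev_size,
                      const struct stats_region *r,
                      void *dst, size_t dst_len)
{
    if (r->len > dst_len || r->address > dev_size ||
        r->len > dev_size - r->address)
        return -1;
    memcpy(dst, dev_mem + r->address, r->len);
    return 0;
}
```

Computing the regions once at init time and reading them later keeps the request path short: it only needs the stored address and length.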

Signed-off-by: Manish Chopra 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed.h |  14 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 244 +-
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h |   3 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  30 
 drivers/net/ethernet/qlogic/qed/qed_l2.c  |   3 +
 include/linux/qed/qed_eth_if.h|   3 +
 6 files changed, 296 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h 
b/drivers/net/ethernet/qlogic/qed/qed.h
index 965b728..f809f7b 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -215,7 +215,20 @@ struct qed_qm_info {
u32 pf_rl;
 };
 
+struct storm_stats {
+   u32 address;
+   u32 len;
+};
+
+struct qed_storm_stats {
+   struct storm_stats mstats;
+   struct storm_stats pstats;
+   struct storm_stats tstats;
+   struct storm_stats ustats;
+};
+
 struct qed_fw_data {
+   struct fw_ver_info  *fw_ver_info;
const u8*modes_tree_buf;
union init_op   *init_ops;
const u32   *arr_data;
@@ -299,6 +312,7 @@ struct qed_hwfn {
 
/* QM init */
struct qed_qm_info  qm_info;
+   struct qed_storm_stats  storm_stats;
 
/* Buffer for unzipping firmware data */
void*unzip_buf;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c 
b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index cde72e2..3993584 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -647,8 +647,10 @@ int qed_hw_init(struct qed_dev *cdev,
bool allow_npar_tx_switch,
const u8 *bin_fw_data)
 {
-   u32 load_code, param;
+   struct qed_storm_stats *p_stat;
+   u32 load_code, param, *p_address;
int rc, mfw_rc, i;
+   u8 fw_vport = 0;
 
rc = qed_init_fw_data(cdev, bin_fw_data);
if (rc != 0)
@@ -657,6 +659,10 @@ int qed_hw_init(struct qed_dev *cdev,
for_each_hwfn(cdev, i) {
struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
 
+   rc = qed_fw_vport(p_hwfn, 0, &fw_vport);
+   if (rc != 0)
+   return rc;
+
/* Enable DMAE in PXP */
rc = qed_change_pci_hwfn(p_hwfn, p_hwfn->p_main_ptt, true);
 
@@ -723,6 +729,25 @@ int qed_hw_init(struct qed_dev *cdev,
}
 
p_hwfn->hw_init_done = true;
+
+   /* init PF stats */
+   p_stat = &p_hwfn->storm_stats;
+   p_stat->mstats.address = BAR0_MAP_REG_MSDM_RAM +
+MSTORM_QUEUE_STAT_OFFSET(fw_vport);
+   p_stat->mstats.len = sizeof(struct eth_mstorm_per_queue_stat);
+
+   p_stat->ustats.address = BAR0_MAP_REG_USDM_RAM +
+USTORM_QUEUE_STAT_OFFSET(fw_vport);
+   p_stat->ustats.len = sizeof(struct eth_ustorm_per_queue_stat);
+
+   p_stat->pstats.address = BAR0_MAP_REG_PSDM_RAM +
+PSTORM_QUEUE_STAT_OFFSET(fw_vport);
+   p_stat->pstats.len = sizeof(struct eth_pstorm_per_queue_stat);
+
+   p_address = &p_stat->tstats.address;
+   *p_address = BAR0_MAP_REG_TSDM_RAM +
+TSTORM_PORT_STAT_OFFSET(MFW_PORT(p_hwfn));
+   p_stat->tstats.len = sizeof(struct tstorm_per_port_stat);
}
 
return 0;
@@ -1503,6 +1528,223 @@ void qed_chain_free(struct qed_dev *cdev,
  p_chain->p_phys_addr);
 }
 
+static void __qed_get_vport_stats(struct qed_dev   *cdev,
+ struct qed_eth_stats  *stats)
+{
+   int i, j;
+
+   memset(stats, 0, sizeof(*stats));
+
+   for_each_hwfn(cdev, i) {
+   struct qed_hwfn *p_hwfn = &cdev->hwfns[i];
+   struct eth_mstorm_per_queue_stat mstats;
+   struct eth_ustorm_per_queue_stat ustats;
+   struct eth_pstorm_per_queue_stat pstats;
+   struct tstorm_per_port_stat tstats;
+   struct port_stats port_stats;
+   struct qed_ptt *p_ptt = qed_ptt_acquire(p_hwfn);
+
+   if (!p_ptt) {
+   DP_ERR(p_hwfn, "Failed to acquire ptt\n");
+   continue;
+   }
+
+   memset(&mstats, 0, sizeof(mstats));
+   qed_memcpy_from(p_hwfn, p_ptt, ,
+

[RFC v2 net-next 07/10] qed: Add link support

2015-09-17 Thread Yuval Mintz
Physical link is handled by the management firmware.
This patch lays the infrastructure for attention handling in the driver,
as link change notifications arrive via async. attentions,
as well as the handling of such notifications.

This patch also extends the API with the protocol drivers by adding registered
callbacks which the protocol driver passes to qed in order to be notified
of async. events originating from the FW/HW.

Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qed/qed.h  |  20 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c  | 106 -
 drivers/net/ethernet/qlogic/qed/qed_int.c  | 340 -
 drivers/net/ethernet/qlogic/qed/qed_l2.c   |   9 +
 drivers/net/ethernet/qlogic/qed/qed_main.c | 212 ++
 drivers/net/ethernet/qlogic/qed/qed_mcp.c  | 300 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.h  | 126 ++-
 include/linux/qed/qed_eth_if.h |   4 +
 8 files changed, 1112 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h 
b/drivers/net/ethernet/qlogic/qed/qed.h
index ab87526..965b728 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -111,6 +111,18 @@ enum QED_FEATURE {
QED_MAX_FEATURES,
 };
 
+enum QED_PORT_MODE {
+   QED_PORT_MODE_DE_2X40G,
+   QED_PORT_MODE_DE_2X50G,
+   QED_PORT_MODE_DE_1X100G,
+   QED_PORT_MODE_DE_4X10G_F,
+   QED_PORT_MODE_DE_4X10G_E,
+   QED_PORT_MODE_DE_4X20G,
+   QED_PORT_MODE_DE_1X40G,
+   QED_PORT_MODE_DE_2X25G,
+   QED_PORT_MODE_DE_1X25G
+};
+
 struct qed_hw_info {
/* PCI personality */
enum qed_pci_personality personality;
@@ -407,6 +419,13 @@ struct qed_dev {
u8  protocol;
 #define IS_QED_ETH_IF(cdev) ((cdev)->protocol == QED_PROTOCOL_ETH)
 
+   /* Callbacks to protocol driver */
+   union {
+   struct qed_common_cb_ops*common;
+   struct qed_eth_cb_ops   *eth;
+   } protocol_ops;
+   void*ops_cookie;
+
const struct firmware   *firmware;
 };
 
@@ -456,6 +475,7 @@ static inline u8 qed_concrete_to_sw_fid(struct qed_dev 
*cdev,
 /* Prototypes */
 int qed_fill_dev_info(struct qed_dev   *cdev,
  struct qed_dev_info   *dev_info);
+void qed_link_update(struct qed_hwfn *hwfn);
 u32 qed_unzip_data(struct qed_hwfn *p_hwfn,
   u32 input_len, u8 *input_buf,
   u32 max_size, u8 *unzip_buf);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c 
b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 30408b7..cde72e2 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -1040,8 +1040,9 @@ static void qed_hw_get_resc(struct qed_hwfn *p_hwfn)
 static int qed_hw_get_nvm_info(struct qed_hwfn *p_hwfn,
   struct qed_ptt   *p_ptt)
 {
-   u32 nvm_cfg1_offset, mf_mode, addr, generic_cont0;
-   u32 val;
+   u32 nvm_cfg1_offset, mf_mode, addr, generic_cont0, core_cfg;
+   struct qed_mcp_link_params *link;
+   u32 port_cfg_addr, link_temp, val;
 
/* Read global nvm_cfg address */
u32 nvm_cfg_addr = qed_rd(p_hwfn, p_ptt, MISC_REG_GEN_PURP_CR0);
@@ -1061,6 +1062,48 @@ static int qed_hw_get_nvm_info(struct qed_hwfn   *p_hwfn,
   offsetof(struct nvm_cfg1_glob, pci_id);
p_hwfn->hw_info.vendor_id = qed_rd(p_hwfn, p_ptt, addr) &
NVM_CFG1_GLOB_VENDOR_ID_MASK;
+
+   addr = MCP_REG_SCRATCH + nvm_cfg1_offset +
+  offsetof(struct nvm_cfg1, glob) +
+  offsetof(struct nvm_cfg1_glob, core_cfg);
+
+   core_cfg = qed_rd(p_hwfn, p_ptt, addr);
+
+   switch ((core_cfg & NVM_CFG1_GLOB_NETWORK_PORT_MODE_MASK) >>
+   NVM_CFG1_GLOB_NETWORK_PORT_MODE_OFFSET) {
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_2X40G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_2X40G;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_2X50G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_2X50G;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_1X100G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_1X100G;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X10G_F:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_4X10G_F;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X10G_E:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_4X10G_E;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_4X20G:
+   p_hwfn->hw_info.port_mode = QED_PORT_MODE_DE_4X20G;
+   break;
+   case NVM_CFG1_GLOB_NETWORK_PORT_MODE_DE_1X40G:
+

[RFC v2 net-next 05/10] qede: Add basic network device support

2015-09-17 Thread Yuval Mintz
From: Sudarsana Kalluru 

This patch includes the basic Rx/Tx support for the driver [although
carrier will still never be turned on].
Following this patch the driver registers a network device, initializes
it and prepares it for traffic.

Signed-off-by: Sudarsana Kalluru 
Signed-off-by: Yuval Mintz 
Signed-off-by: Ariel Elior 
---
 drivers/net/ethernet/qlogic/qede/qede.h  |  132 ++
 drivers/net/ethernet/qlogic/qede/qede_main.c | 1801 ++
 2 files changed, 1933 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h 
b/drivers/net/ethernet/qlogic/qede/qede.h
index 7e2bcfa..7680106 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -36,6 +36,8 @@
 
 #define DRV_MODULE_SYM qede
 
+#define QEDE_NAPI_WEIGHT   (NAPI_POLL_WEIGHT)
+
 struct qede_dev {
struct qed_dev  *cdev;
struct net_device   *ndev;
@@ -51,6 +53,7 @@ struct qede_dev {
 #define QEDE_MAX_TSS_CNT(edev) ((edev)->dev_info.num_queues * \
 (edev)->dev_info.num_tc)
 
+   struct qede_fastpath*fp_array;
u16 num_rss;
u8  num_tc;
 #define QEDE_RSS_CNT(edev) ((edev)->num_rss)
@@ -58,6 +61,9 @@ struct qede_dev {
 (edev)->num_tc)
 #define QEDE_TSS_IDX(edev, txqidx) ((txqidx) % (edev)->num_rss)
 #define QEDE_TC_IDX(edev, txqidx)  ((txqidx) / (edev)->num_rss)
+#define QEDE_TX_QUEUE(edev, txqidx)\
+   (&(edev)->fp_array[QEDE_TSS_IDX((edev), (txqidx))].txqs[QEDE_TC_IDX( \
+   (edev), (txqidx))])
 
struct qed_int_info int_info;
unsigned char   primary_mac[ETH_ALEN];
@@ -65,9 +71,135 @@ struct qede_dev {
/* Smaller private varaiant of the RTNL lock */
struct mutex qede_lock;
u32 state; /* Protected by qede_lock */
+   u16 rx_buf_size;
+   /* L2 header size + 2*VLANs (8 bytes) + LLC SNAP (8 bytes) */
+#define ETH_OVERHEAD   (ETH_HLEN + 8 + 8)
+   /* Max supported alignment is 256 (8 shift)
+* minimal alignment shift 6 is optimal for 57xxx HW performance
+*/
+#define QEDE_RX_ALIGN_SHIFT max(6, min(8, L1_CACHE_SHIFT))
+   /* We assume skb_build() uses sizeof(struct skb_shared_info) bytes
+* at the end of skb->data, to avoid wasting a full cache line.
+* This reduces memory use (skb->truesize).
+*/
+#define QEDE_FW_RX_ALIGN_END   \
+   max_t(u64, 1UL << QEDE_RX_ALIGN_SHIFT,  \
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
+   struct qed_update_vport_rss_params  rss_params;
+   u16 q_num_rx_buffers; /* Must be a power of two */
+   u16 q_num_tx_buffers; /* Must be a power of two */
+};
+
+enum QEDE_STATE {
+   QEDE_STATE_CLOSED,
+   QEDE_STATE_OPEN,
+};
+
+#define U64_LO(x)  ((u32)(((u64)(x)) & 0xffffffff))
+#define U64_HI(x)  ((u32)(((u64)(x)) >> 32))
+#define HILO_U64(hi, lo)   ((((u64)(hi)) << 32) + (lo))
+
+#define MAX_NUM_TC  8
+#define MAX_NUM_PRI 8
+
+/* The driver supports the new build_skb() API:
+ * RX ring buffer contains pointer to kmalloc() data only,
+ * skb are built only after the frame was DMA-ed.
+ */
+struct sw_rx_data {
+   u8 *data;
+
+   DEFINE_DMA_UNMAP_ADDR(mapping);
+};
+
+struct qede_rx_queue {
+   __le16  *hw_cons_ptr;
+   struct sw_rx_data   *sw_rx_ring;
+   u16 sw_rx_cons;
+   u16 sw_rx_prod;
+   struct qed_chain rx_bd_ring;
+   struct qed_chain rx_comp_ring;
+   void __iomem*hw_rxq_prod_addr;
+
+   int rx_buf_size;
+
+   u16 num_rx_buffers;
+   u16 rxq_id;
+
+   u64 rx_hw_errors;
+   u64 rx_alloc_errors;
+};
+
+union db_prod {
+   struct eth_db_data data;
+   u32 raw;
+};
+
+struct sw_tx_bd {
+   struct sk_buff *skb;
+   u8 flags;
+/* Set on the first BD descriptor when there is a split BD */
+#define QEDE_TSO_SPLIT_BD  BIT(0)
+};
+
+struct qede_tx_queue {
+   int index; /* Queue index */
+   __le16  *hw_cons_ptr;
+   struct sw_tx_bd *sw_tx_ring;
+   u16 sw_tx_cons;
+   u16 sw_tx_prod;
+   struct qed_chain tx_pbl;
+   void __iomem

Re: [PATCH net-next 2/2] net: bcmgenet: Implement RX coalescing control knobs

2015-09-17 Thread Florian Fainelli
On 17/09/15 10:58, Florian Fainelli wrote:
> On 16/09/15 16:47, Florian Fainelli wrote:
>> Add support for the ethtool rx-frames coalescing parameter which allows
>> defining the number of RX interrupts per frames received. The RDMA
>> engine supports a configurable timeout with a resolution of
>> approximately 8.192 us.
>>
>> We can no longer enable the BDONE/PDONE interrupts as those would
>> fire for each packet/buffer received, which would defeat the MBDONE
>> interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a
>> PDONE/BDONE interrupt when the threshold is set to 1.
> 
> *sigh*, I missed the initialization of the INTR_THRESHOLD register, so
> right now, we just have no interrupts configured properly for RX, will
> re-submit shortly.
> 
> Meanwhile, please send feedback if you have any, thanks!
> 

Actually, no, that version of the patch is just fine, since we have already
programmed DMA_MBUF_DONE_THRESH since commit
6f5a272c99108d9f8450c454a4baede9e7cc643f ("net: bcmgenet: rework Rx
queue init").

Sorry about the noise, -ENOCOFFEE.
-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] net: fix data race on sk_buff after re-cloning

2015-09-17 Thread Dmitry Vyukov
KernelThreadSanitizer (KTSAN) reported the following race (on 4.2 rc2):

ThreadSanitizer: data-race in __copy_skb_header

Write at 0x8800bb158f48 of size 8 by thread 3146 on CPU 5:
 [] __copy_skb_header+0xee/0x1d0 net/core/skbuff.c:765
 [] __skb_clone+0x5c/0x320 net/core/skbuff.c:820
 [] skb_clone+0xd0/0x130 net/core/skbuff.c:962
 [] tcp_transmit_skb+0xb5/0x1750 net/ipv4/tcp_output.c:932
 [] __tcp_retransmit_skb+0x244/0xb10 
net/ipv4/tcp_output.c:2638
 [] tcp_retransmit_skb+0x2b/0x240 net/ipv4/tcp_output.c:2655
 [] tcp_retransmit_timer+0x579/0xb70 net/ipv4/tcp_timer.c:433
 [] tcp_write_timer_handler+0x109/0x320 
net/ipv4/tcp_timer.c:514
 [] tcp_write_timer+0xc0/0xe0 net/ipv4/tcp_timer.c:532
 [] call_timer_fn+0x4c/0x1b0 kernel/time/timer.c:1155
 [< inline >] __run_timers kernel/time/timer.c:1231
 [] run_timer_softirq+0x313/0x500 kernel/time/timer.c:1414
 [] __do_softirq+0xbe/0x2f0 kernel/softirq.c:273
 [] apic_timer_interrupt+0x8a/0xa0 
arch/x86/entry/entry_64.S:790

Previous read at 0x8800bb158f48 of size 8 by thread 3168 on CPU 0:
 [] skb_release_head_state+0x4b/0x120 net/core/skbuff.c:640
 [] skb_release_all+0x1d/0x50 net/core/skbuff.c:657
 [< inline >] __kfree_skb net/core/skbuff.c:673
 [] consume_skb+0x60/0x100 net/core/skbuff.c:746
 [] __dev_kfree_skb_any+0x4d/0x60 net/core/dev.c:2312
 [< inline >] dev_kfree_skb_any include/linux/netdevice.h:2933
 [] e1000_unmap_and_free_tx_resource.isra.42+0xd3/0x120 
drivers/net/ethernet/intel/e1000/e1000_main.c:1973
 [< inline >] e1000_clean_tx_irq 
drivers/net/ethernet/intel/e1000/e1000_main.c:3881
 [] e1000_clean+0x24d/0x11e0 
drivers/net/ethernet/intel/e1000/e1000_main.c:3818
 [< inline >] napi_poll net/core/dev.c:4744
 [] net_rx_action+0x489/0x690 net/core/dev.c:4809
 [] __do_softirq+0xbe/0x2f0 kernel/softirq.c:273
 [] apic_timer_interrupt+0x8a/0xa0 
arch/x86/entry/entry_64.S:790

Mutexes locked by thread 3146:
Mutex 436586 is locked here:
 [< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:158
 [] _raw_spin_lock+0x50/0x70 kernel/locking/spinlock.c:151
 [< inline >] spin_lock include/linux/spinlock.h:312
 [] tcp_write_timer+0x25/0xe0 net/ipv4/tcp_timer.c:530
 [] call_timer_fn+0x4c/0x1b0 kernel/time/timer.c:1155
 [< inline >] __run_timers kernel/time/timer.c:1231
 [] run_timer_softirq+0x313/0x500 kernel/time/timer.c:1414
 [] __do_softirq+0xbe/0x2f0 kernel/softirq.c:273
 [] apic_timer_interrupt+0x8a/0xa0 
arch/x86/entry/entry_64.S:790

The only way I can see this happening is as follows:
 - sk_buff_fclones is allocated
 - then it is cloned which returns fclones->skb2
 - then fclones->skb2 is freed, which drops fclones->fclone_ref to 1
 - then the original skb is cloned again
 - at this point skb_clone sees that fclones->fclone_ref = 1
   and returns fclones->skb2 again
Now initialization of fclones->skb2 races with the previous use,
because refcounting lacks proper memory barriers.

I am looking at the skb code for the first time, so I can't conclude
whether such a scenario is possible or not. But the refcounting, at least
in kfree_skbmem(), looks broken. For example, kfree_skb() properly
inserts an rmb after the fast-path check:

    if (likely(atomic_read(&skb->users) == 1))
        smp_rmb();

The patch contains a proposed fix.
If it looks good to you and the scenario looks sane,
then I will update the description and resend it.
---
 net/core/skbuff.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index dad4dd3..4c89bac 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -618,8 +618,9 @@ static void kfree_skbmem(struct sk_buff *skb)
/* We usually free the clone (TX completion) before original skb
 * This test would have no chance to be true for the clone,
 * while here, branch prediction will be good.
+* Paired with atomic_dec_and_test() below.
 */
-   if (atomic_read(&fclones->fclone_ref) == 1)
+   if (atomic_read_acquire(&fclones->fclone_ref) == 1)
goto fastpath;
break;
 
@@ -944,7 +945,8 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t 
gfp_mask)
return NULL;
 
if (skb->fclone == SKB_FCLONE_ORIG &&
-   atomic_read(&fclones->fclone_ref) == 1) {
+   /* Paired with atomic_dec_and_test() in kfree_skbmem(). */
+   atomic_read_acquire(&fclones->fclone_ref) == 1) {
n = &fclones->skb2;
atomic_set(&fclones->fclone_ref, 2);
} else {
-- 
2.6.0.rc0.131.gf624c3d



Re: list of all network namespaces

2015-09-17 Thread Cong Wang
On Thu, Sep 17, 2015 at 10:39 AM, Ani Sinha  wrote:
> On Thu, Sep 17, 2015 at 2:51 AM, Rosen, Rami  wrote:
>
>> Network namespaces which were created by other ways (like userspace 
>> applications
>> using the clone() system call) will *not* be reflected by neither of them.
>
> Will there be any interest if I cook up a kernel patch that lists all
> network namespaces through /proc?

How would you list them? They don't have names in the kernel; names
are given in user space.


Re: [PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary

2015-09-17 Thread Oliver Hartkopp

Hi Maxime,

On 17.09.2015 20:27, Maxime Ripard wrote:
> On Thu, Sep 17, 2015 at 08:12:31PM +0200, Oliver Hartkopp wrote:
>> New CAN drivers go via can-next and net-next into mainline.
>
> Hmmm, actually, I meant 2 and 3, the two defconfig patches.
>
> The driver and bindings should of course go through Marc's tree.

Ok. Thanks for the fix :-)

Regards,
Oliver



Re: [PATCH] geneve: restore vlan bits in xmit path

2015-09-17 Thread Pravin Shelar
On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville
 wrote:
> These seem to have been accidentally dropped in commit 371bd1061d29
> ("geneve: Consolidate Geneve functionality in single module.").
>
Geneve should not export the vxlan feature, so that it never sees vxlan
tagged packets. Can you turn off the vlan feature?


Re: [PATCH net-next 2/2] net: bcmgenet: Implement RX coalescing control knobs

2015-09-17 Thread Florian Fainelli
On 16/09/15 16:47, Florian Fainelli wrote:
> Add support for the ethtool rx-frames coalescing parameter which allows
> defining the number of RX interrupts per frames received. The RDMA
> engine supports a configurable timeout with a resolution of
> approximately 8.192 us.
> 
> We can no longer enable the BDONE/PDONE interrupts as those would
> fire for each packet/buffer received, which would defeat the MBDONE
> interrupt purpose. The MBDONE interrupt is guaranteed to correspond to a
> PDONE/BDONE interrupt when the threshold is set to 1.

*sigh*, I missed the initialization of the INTR_THRESHOLD register, so
right now, we just have no interrupts configured properly for RX, will
re-submit shortly.

Meanwhile, please send feedback if you have any, thanks!
-- 
Florian


Re: list of all network namespaces

2015-09-17 Thread Jiri Benc
On Thu, 17 Sep 2015 10:39:57 -0700, Ani Sinha wrote:
> Will there be any interest if I cook up a kernel patch that lists all
> network namespaces through /proc?

/proc is the wrong interface for this; enumerating all net namespaces has
nothing to do with processes. Each process already has its corresponding
namespaces listed in /proc, which is as much as belongs
to /proc.

Dumping all net namespaces should probably be netlink based but,
obviously, you'll have a hard time sending file descriptors over netlink.
You can dump their netnsids but that won't help you much in accessing the
namespace contents.

This is not as easy as it seems. But I'd love to have such a feature.

 Jiri

-- 
Jiri Benc


Re: [PATCH net] netlink: make sure -EBUSY won't escape from netlink_insert

2015-09-17 Thread Linus Torvalds
On Wed, Sep 16, 2015 at 10:41 PM, Christoph Paasch
 wrote:
>
> can this patch get queued up for 4.1 as well?
> It seems to fix a similar issue in 4.1.6.

I think Herbert has an additional patch for this issue. But yes, I
think it should be scheduled for stable. Herbert?

Linus


Re: list of all network namespaces

2015-09-17 Thread Ani Sinha
On Thu, Sep 17, 2015 at 2:51 AM, Rosen, Rami  wrote:

> Network namespaces which were created by other ways (like userspace 
> applications
> using the clone() system call) will *not* be reflected by neither of them.

Will there be any interest if I cook up a kernel patch that lists all
network namespaces through /proc?


Re: [PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary

2015-09-17 Thread Maxime Ripard
On Wed, Sep 16, 2015 at 01:21:18PM +0200, Gerhard Bertelsmann wrote:
> Hi,
> 
> please find attached the next version of my patch set. I have 
> taken all remarks from Maxime Ripard into the new version
> 
> Please review, test and report bugs if exists.
> 
> The patchset applies to all recent Kernel versions (4.x, next etc.).
> 
> [PATCH v8 1/4] Device Tree Binding Documentation
> [PATCH v8 2/4] Defconfig multi_v7
> [PATCH v8 3/4] Defconfig sunxi
> [PATCH v8 4/4] Kernel Module

Applied 3 and 4.

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



Re: [PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary

2015-09-17 Thread Oliver Hartkopp



On 17.09.2015 19:54, Maxime Ripard wrote:
> On Wed, Sep 16, 2015 at 01:21:18PM +0200, Gerhard Bertelsmann wrote:
>> Hi,
>>
>> please find attached the next version of my patch set. I have
>> taken all remarks from Maxime Ripard into the new version
>>
>> Please review, test and report bugs if exists.
>>
>> The patchset applies to all recent Kernel versions (4.x, next etc.).
>>
>> [PATCH v8 1/4] Device Tree Binding Documentation
>> [PATCH v8 2/4] Defconfig multi_v7
>> [PATCH v8 3/4] Defconfig sunxi
>> [PATCH v8 4/4] Kernel Module
>
> Applied 3 and 4.

Applied to what tree?

That's not the friendly way when Marc asks you about the documentation about
the device tree (patch 1) and you commit the CAN driver and the sunxi
defconfig (patch 3 & 4) that he mainly reviewed to whatever tree.

New CAN drivers go via can-next and net-next into mainline.

So please answer Marcs question and let him queue up the CAN driver via
can-next himself.


Thanks,
Oliver



Re: [PATCH v8 0/4] can: Allwinner A10/A20 CAN Controller support - Summary

2015-09-17 Thread Maxime Ripard
On Thu, Sep 17, 2015 at 08:12:31PM +0200, Oliver Hartkopp wrote:
> 
> 
> On 17.09.2015 19:54, Maxime Ripard wrote:
> >On Wed, Sep 16, 2015 at 01:21:18PM +0200, Gerhard Bertelsmann wrote:
> >>Hi,
> >>
> >>please find attached the next version of my patch set. I have
> >>taken all remarks from Maxime Ripard into the new version
> >>
> >>Please review, test and report bugs if exists.
> >>
> >>The patchset applies to all recent Kernel versions (4.x, next etc.).
> >>
> >>[PATCH v8 1/4] Device Tree Binding Documentation
> >>[PATCH v8 2/4] Defconfig multi_v7
> >>[PATCH v8 3/4] Defconfig sunxi
> >>[PATCH v8 4/4] Kernel Module
> >
> >Applied 3 and 4.
> 
> Applied to what tree?
> 
> That's not the friendly way when Marc asks you about the documentation about
> the device tree (patch 1) and you commit the CAN driver and the sunxi
> defconfig (patch 3 & 4) that he mainly reviewed to whatever tree.
> 
> New CAN drivers go via can-next and net-next into mainline.
> 
> So please answer Marcs question and let him queue up the CAN driver via
> can-next himself.

Hmmm, actually, I meant 2 and 3, the two defconfig patches.

The driver and bindings should of course go through Marc's tree.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com



Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V

2015-09-17 Thread David Miller
From: KY Srinivasan 
Date: Thu, 17 Sep 2015 15:14:05 +

> I think I can achieve my original goal of not having any allocation
> in the send path by carefully using the memory available in the skb:

Please stop flat-out ignoring David L.'s suggestion.

Have a pre-cooked ring of buffers for these descriptors that you can
point the chip at.  No per-packet allocation is necessary at all.

If you play games with SKBs you will get burned.


[PATCH] geneve: remove use of internal IP header when calling IP_ECN_decapsulate

2015-09-17 Thread John W. Linville
This seems to have been a "thinko".  IP_ECN_decapsulate needs info
from both internal and external headers.

Signed-off-by: John W. Linville 
---
 drivers/net/geneve.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index da3259ce7c8d..a917ae1cfbf3 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -121,10 +121,10 @@ static void geneve_rx(struct geneve_sock *gs, struct 
sk_buff *skb)
struct metadata_dst *tun_dst = NULL;
struct geneve_dev *geneve = NULL;
struct pcpu_sw_netstats *stats;
-   struct iphdr *iph;
+   struct iphdr *iph = NULL;
u8 *vni;
__be32 addr;
-   int err;
+   int err = 0;
 
if (gs->collect_md) {
static u8 zero_vni[3];
@@ -178,13 +178,15 @@ static void geneve_rx(struct geneve_sock *gs, struct 
sk_buff *skb)
 
skb_reset_network_header(skb);
 
-   iph = ip_hdr(skb); /* Now inner IP header... */
-   err = IP_ECN_decapsulate(iph, skb);
+   if (iph)
+   err = IP_ECN_decapsulate(iph, skb);
 
if (unlikely(err)) {
if (log_ecn_error)
-   net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
-   &iph->saddr, iph->tos);
+   if (iph)
+   net_info_ratelimited("non-ECT from %pI4 "
+"with TOS=%#x\n",
+   &iph->saddr, iph->tos);
if (err > 1) {
++geneve->dev->stats.rx_frame_errors;
++geneve->dev->stats.rx_errors;
-- 
2.4.3



[PATCH] geneve: restore vlan bits in xmit path

2015-09-17 Thread John W. Linville
These seem to have been accidentally dropped in commit 371bd1061d29
("geneve: Consolidate Geneve functionality in single module.").

Signed-off-by: John W. Linville 
---
 drivers/net/geneve.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a917ae1cfbf3..0aaf302cc31b 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -532,13 +533,20 @@ static int geneve_build_skb(struct rtable *rt, struct 
sk_buff *skb,
int err;
 
min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
-   + GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr);
+   + GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr)
+   + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
err = skb_cow_head(skb, min_headroom);
if (unlikely(err)) {
kfree_skb(skb);
goto free_rt;
}
 
+   skb = vlan_hwaccel_push_inside(skb);
+   if (unlikely(!skb)) {
+   err = -ENOMEM;
+   goto free_rt;
+   }
+
skb = udp_tunnel_handle_offloads(skb, csum);
if (IS_ERR(skb)) {
err = PTR_ERR(skb);
-- 
2.4.3



iproute2 tunnel name parsing

2015-09-17 Thread Wilhelm Wijkander
Hi,

I'm trying to create a sit tunnel called "hel": ip tun add hel mode
sit remote 10.200.0.2 local 10.200.1.2 ttl 255, however it seems like
this is interpreted as the help argument and I get the usage text. Is
there a way to escape names that I've missed, or is this an error
somewhere in argv parsing?

(I'm not subscribed, so a cc would be appreciated)
Thanks,
Wilhelm


Re: [PATCH] geneve: remove use of internal IP header when calling IP_ECN_decapsulate

2015-09-17 Thread John W. Linville
On Thu, Sep 17, 2015 at 12:46:48PM -0700, Jesse Gross wrote:
> On Thu, Sep 17, 2015 at 10:17 AM, John W. Linville
>  wrote:
> > diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> > index da3259ce7c8d..a917ae1cfbf3 100644
> > --- a/drivers/net/geneve.c
> > +++ b/drivers/net/geneve.c
> > @@ -178,13 +178,15 @@ static void geneve_rx(struct geneve_sock *gs, struct 
> > sk_buff *skb)
> >
> > skb_reset_network_header(skb);
> >
> > -   iph = ip_hdr(skb); /* Now inner IP header... */
> > -   err = IP_ECN_decapsulate(iph, skb);
> > +   if (iph)
> > +   err = IP_ECN_decapsulate(iph, skb);
> 
> It looks like this is now conditional based on !collect_md. I'm not
> sure that we want to have a difference in behavior between the two.

Sure, I can move the iph assignment higher-up and keep the other bits 
unconditional.

John
-- 
John W. Linville                Someday the world will need a hero, and you
linvi...@tuxdriver.com          might be all we have.  Be ready.


[PATCH v2] geneve: remove use of internal IP header when calling IP_ECN_decapsulate

2015-09-17 Thread John W. Linville
This seems to have been a "thinko".  IP_ECN_decapsulate needs info
from both internal and external headers.

Signed-off-by: John W. Linville 
---
v2 -- ensure the collect_md path still calls IP_ECN_decapsulate

 drivers/net/geneve.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index da3259ce7c8d..549febac0579 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -126,6 +126,8 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
__be32 addr;
int err;
 
+   iph = ip_hdr(skb); /* outer IP header... */
+
if (gs->collect_md) {
static u8 zero_vni[3];
 
@@ -133,7 +135,6 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
addr = 0;
} else {
vni = gnvh->vni;
-   iph = ip_hdr(skb); /* Still outer IP header... */
addr = iph->saddr;
}
 
@@ -178,7 +179,6 @@ static void geneve_rx(struct geneve_sock *gs, struct sk_buff *skb)
 
skb_reset_network_header(skb);
 
-   iph = ip_hdr(skb); /* Now inner IP header... */
err = IP_ECN_decapsulate(iph, skb);
 
if (unlikely(err)) {
-- 
2.4.3
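
The reason the decapsulation step needs the outer header as well as the
inner packet can be sketched in plain userspace C. This is only an
illustration loosely modelled on the RFC 6040 decapsulation rules, not the
kernel's implementation; ecn_decapsulate(), ecn_verdict() and
ecn_inner_after() are hypothetical names, and the TOS bytes are passed in
directly instead of being read from an skb.

```c
#include <stdint.h>

/* ECN codepoints live in the two low bits of the IPv4 TOS byte (RFC 3168). */
enum {
	ECN_NOT_ECT = 0,
	ECN_ECT_1   = 1,
	ECN_ECT_0   = 2,
	ECN_CE      = 3,
	ECN_MASK    = 3,
};

/*
 * Hypothetical helper (assumption, not a kernel API).
 * outer_tos: TOS byte of the outer (tunnel) IP header.
 * inner_tos: TOS byte of the inner (payload) IP header, updated in place.
 * Returns 0 if the packet is fine, 1 if it is suspicious, 2 if it
 * should be dropped.
 */
int ecn_decapsulate(uint8_t outer_tos, uint8_t *inner_tos)
{
	uint8_t outer = outer_tos & ECN_MASK;
	uint8_t inner = *inner_tos & ECN_MASK;

	if (inner == ECN_NOT_ECT) {
		/* The inner sender did not negotiate ECN. */
		switch (outer) {
		case ECN_NOT_ECT:
			return 0;
		case ECN_ECT_0:
		case ECN_ECT_1:
			return 1;
		default:		/* ECN_CE */
			return 2;
		}
	}

	/* Congestion seen on the tunnel path is copied to the inner header. */
	if (outer == ECN_CE)
		*inner_tos |= ECN_CE;
	return 0;
}

/* Convenience wrapper: only the verdict. */
int ecn_verdict(uint8_t outer_tos, uint8_t inner_tos)
{
	return ecn_decapsulate(outer_tos, &inner_tos);
}

/* Convenience wrapper: only the resulting inner TOS byte. */
uint8_t ecn_inner_after(uint8_t outer_tos, uint8_t inner_tos)
{
	(void)ecn_decapsulate(outer_tos, &inner_tos);
	return inner_tos;
}
```

If both arguments came from the same (inner) header, the CE mark set by a
congested router on the tunnel path would never be seen, which is the
"thinko" the patch description refers to.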



Re: [RFC] net: fix data race on sk_buff after re-cloning

2015-09-17 Thread Eric Dumazet
On Thu, 2015-09-17 at 20:44 +0200, Dmitry Vyukov wrote:
> KernelThreadSanitizer (KTSAN) reported the following race (on 4.2 rc2):
> 
> ThreadSanitizer: data-race in __copy_skb_header
...

>   if (likely(atomic_read(&skb->users) == 1))
>   smp_rmb();
> 
> The patch contains a proposed fix.
> If it looks good to you and the scenario looks sane,
> then I will update the description and resend it.
> ---
>  net/core/skbuff.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

I have to double-check this patch, but in case it is needed,
it would be better not to use the fancy new atomic_read_acquire(),
as backporting the fix back to 3.19 (where the bug was probably added)
would require extra hassle.

atomic_read_acquire() would be fine for cleanups and new code, in the next
branch.

Thanks !
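
The two spellings under discussion (a plain read followed by a barrier,
versus a single acquire read) can be compared with a rough userspace
analogue in C11 atomics. This is a sketch under stated assumptions: struct
obj and the function names are hypothetical, and C11 fences are not the
same thing as the kernel's smp_rmb() or atomic_read_acquire(), whose exact
semantics are defined by the kernel memory model.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a reference-counted object. */
struct obj {
	atomic_int users;
	int payload;
};

/* Variant 1: relaxed load, then an explicit acquire fence on the fast path. */
int users_is_one_fence(struct obj *o)
{
	int one = atomic_load_explicit(&o->users, memory_order_relaxed) == 1;

	if (one)
		atomic_thread_fence(memory_order_acquire);
	return one;
}

/* Variant 2: a single load that carries acquire semantics itself. */
int users_is_one_acquire(struct obj *o)
{
	return atomic_load_explicit(&o->users, memory_order_acquire) == 1;
}

/* Demo helper: both variants must report the same result for a given count. */
int demo_variants_agree(int users)
{
	struct obj o;

	atomic_init(&o.users, users);
	o.payload = 0;
	return users_is_one_fence(&o) == users_is_one_acquire(&o);
}
```

Both variants give the same answer in a single thread; the difference is
only in which primitives a stable tree has to provide, which is the
backporting concern raised above.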




[ANNOUNCE] nftables 0.5 release

2015-09-17 Thread Pablo Neira Ayuso
Hi!

The Netfilter project proudly presents:

nftables 0.5

This release includes bug fixes and the new features available up to the
4.2 kernel release.

New features
============

* Concatenations: You can combine two or more selectors to build a
  tuple, then use it to look up a match in sets, eg.

  % nft add rule ip filter input ip saddr . tcp dport { \
1.1.1.1 . 22 , \
1.1.1.1 . 80 \
} counter accept

  nft checks whether the source IP address AND the TCP destination port
  match an entry in the literal set above; if so, it updates the rule
  counter and accepts the packet.

  You can also combine concatenations with verdict maps:

  % nft add rule ip filter input ether saddr . ip saddr . meta iif vmap { \
3c:71:0e:39:bb:20 . 192.168.1.120 . "wlan0" : accept, \
3c:77:e0:39:aa:21 . 192.168.1.204 . "wlan0" : drop }

  Alternatively, you can declare a named map keyed by a concatenation, so
  that its content can be updated dynamically:

  % nft add map filter accesslist { \
type ether_addr . ipv4_addr . iface_index : verdict \; }
  % nft add rule filter input ether saddr . ip saddr . meta iif vmap @accesslist

  Then, add elements to the map:

  % nft add element filter accesslist { \
3c:71:0e:39:bb:20 . 192.168.1.120 . wlan0 : accept }

  On a different front, you can also combine concatenations with maps:

  % nft add rule ip nat prerouting dnat ip saddr . tcp dport map { \
192.168.1.120 . 80 : 1.2.3.4, \
192.168.1.204 . 22 : 4.3.2.1 }

  In the example above, the destination address that is used in DNAT depends
  on the source IP address and the destination port of the packet.

  This concatenation feature requires a Linux kernel >= 4.1 and, of course,
  nftables 0.5.

* Add timeout support for sets: You can specify a lifetime for elements in your
  set declarations, eg.

  % nft add set filter whitelist { type ipv4_addr\; timeout 1h\; }
  % nft add element filter whitelist { 192.168.1.234 }
  % nft list ruleset
  table ip filter {
set whitelist {
type ipv4_addr
timeout 1h
elements = { 192.168.1.234 expires 59m56s}
}
  }

  You can also create the set without a default timeout:

  % nft add set filter whitelist { type ipv4_addr\; flags timeout\; }

  In that case, you specify the timeout when adding the element:

  % nft add element filter whitelist { 192.168.2.123 timeout 1h }

  You can still mix this with elements that reside permanently:

  % nft add element filter whitelist { 192.168.2.180 }

* Add comments per set element, eg.

  % nft add element filter whitelist { 192.168.0.1 comment \"some host\" }

* Support for mini-gmp: If you're running nft from embedded devices,
  you may want to skip the libgmp dependency via:

  % ./configure --with-mini-gmp

  This compiles nft using the minimal gmp implementation that comes in
  the nftables tarball. Note that your nft binary avoids the libgmp
  dependency at the cost of getting a slightly larger binary.

* Dormant tables: You can disable the entire ruleset that is contained in a
  table by setting the dormant flag:

  % nft add table filter { flags dormant\; }

  You can reenable it by typing:

  % nft add table filter

* Allow to specify default chain policy: You can specify the default chain
  policy when you create the chain:

  % nft add chain filter input { \
type filter hook input priority 0\; policy drop\; }

  You can also change it for an existing chain anytime by updating it via:

  % nft add chain filter input { policy accept\; }

Bug fixes
=========

* Command per line ruleset representation: According to what I can find on the
  Internet, it seems some people like to maintain their ruleset in scripts so
  they can add comments and annotate things there. However, this is a problem
  for two reasons: there is no atomic update, since rules are published to the
  packet path one after another, and it significantly increases the time that
  nft takes to reload your ruleset.

  So, the solution to this problem consists of keeping your ruleset like this:

  % cat my-ruleset-file
  flush ruleset
  add table filter
  add set filter whitelist { type ipv4_addr; }
  add chain filter input { type filter hook input priority 0; }
  add rule filter input iif lo accept
  add rule filter input ct state established,related counter accept
  add rule filter input tcp dport { 22, 80 } counter accept
  add rule filter input ip saddr @whitelist counter accept
  add element filter whitelist { 192.168.1.120 }
  add element filter whitelist { 192.168.1.121 }
  add element filter whitelist { 192.168.1.204 }

  You can also insert comments in the file through '#'.

  Then, you can atomically restore it via:

  % nft -f my-ruleset-file

  You can also use this command per line representation to apply
  incremental ruleset updates atomically:

  % cat 



Re: Experiences with slub bulk use-case for network stack

2015-09-17 Thread Jesper Dangaard Brouer
On Wed, 16 Sep 2015 10:13:25 -0500 (CDT)
Christoph Lameter  wrote:

> On Wed, 16 Sep 2015, Jesper Dangaard Brouer wrote:
> 
> >
> > Hint, this leads up to discussing if current bulk *ALLOC* API need to
> > be changed...
> >
> > Alex and I have been working hard on practical use-case for SLAB
> > bulking (mostly slUb), in the network stack.  Here is a summary of
> > what we have learned so far.
> 
> SLAB refers to the SLAB allocator which is one slab allocator and SLUB is
> another slab allocator.
> 
> Please keep that consistent otherwise things get confusing

This naming scheme is really confusing.  I'll try to be more
consistent.  So, you want capital letters SLAB and SLUB when talking
about a specific slab allocator implementation.


> > Bulk free'ing SKBs during TX completion is a big and easy win.
> >
> > Specifically for slUb, normal path for freeing these objects (which
> > are not on c->freelist) require a locked double_cmpxchg per object.
> > The bulk free (via detached freelist patch) allow to free all objects
> > belonging to the same slab-page, to be free'ed with a single locked
> > double_cmpxchg. Thus, the bulk free speedup is quite an improvement.
> 
> Yep.
> 
> > Alex and I had the idea of bulk alloc returns an "allocator specific
> > cache" data-structure (and we add some helpers to access this).
> 
> Maybe add some Macros to handle this?

Yes, helpers will likely turn out to be macros.


> > In the slUb case, the freelist is a single linked pointer list.  In
> > the network stack the skb objects have a skb->next pointer, which is
> > located at the same position as freelist pointer.  Thus, simply
> > returning the freelist directly, could be interpreted as a skb-list.
> > The helper API would then do the prefetching, when pulling out
> > objects.
> 
> The problem with the SLUB case is that the objects must be on the same
> slab page.

Yes, I'm aware of that; that is what we are trying to take advantage of.


> > For the slUb case, we would simply cmpxchg either c->freelist or
> > page->freelist with a NULL ptr, and then own all objects on the
> > freelist. This also reduce the time we keep IRQs disabled.
> 
> You dont need to disable interrupts for the cmpxchges. There is
> additional state in the page struct though so the updates must be
> done carefully.

Yes, I'm aware that cmpxchg does not require interrupts to be disabled, and
I plan to take advantage of this in the new approach for bulk alloc.

Our current bulk alloc disables interrupts for the full period (of
collecting the requested number of objects).

What I'm proposing is keeping interrupts on, and then simply cmpxchg'ing
e.g. 2 slab-pages out of the SLUB allocator (which the SLUB code calls
freelists). The bulk call now owns these freelists, and returns them
to the caller.  The API caller gets some helpers/macros to access
objects, to shield the caller from the details (of SLUB freelists).

The pitfall with this API is we don't know how many objects are on a
SLUB freelist.  And we cannot walk the freelist and count them, because
then we hit the problem of memory/cache stalls (that we are trying so
hard to avoid).
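
The idea of taking ownership of a whole freelist in one atomic step can be
sketched in userspace C11. This is a minimal illustration, not SLUB code:
struct slab_page, struct object and all function names are hypothetical,
an atomic exchange is used in place of the cmpxchg mentioned above, and
the additional page-struct state that must be updated in the real
allocator is ignored.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object: the link sits where a freelist pointer would be. */
struct object {
	struct object *next;
};

/* Hypothetical stand-in for a slab page with a lock-free freelist head. */
struct slab_page {
	_Atomic(struct object *) freelist;
};

/* Take the whole list in one atomic step; the page is left empty. */
struct object *detach_freelist(struct slab_page *page)
{
	return atomic_exchange(&page->freelist, (struct object *)NULL);
}

/* Pop one object from a detached list, which is now private to the caller. */
struct object *detached_pop(struct object **list)
{
	struct object *obj = *list;

	if (obj)
		*list = obj->next;
	return obj;
}

/*
 * Demo helper: put n objects (at most 16) on a page freelist, detach the
 * list, and count the objects by popping them. Returns -1 if the page
 * freelist is not empty after the detach.
 */
int demo_detach_count(int n)
{
	static struct object pool[16];
	struct slab_page page;
	struct object *head = NULL;
	struct object *list;
	int count = 0;

	if (n > 16)
		n = 16;
	for (int i = 0; i < n; i++) {
		pool[i].next = head;
		head = &pool[i];
	}
	atomic_init(&page.freelist, head);

	list = detach_freelist(&page);
	while (detached_pop(&list))
		count++;

	return atomic_load(&page.freelist) == NULL ? count : -1;
}
```

Note that the caller only learns how many objects it received by popping
them one at a time, which mirrors the pitfall described above: the length
of a detached freelist is not known up front.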

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH] geneve: restore vlan bits in xmit path

2015-09-17 Thread Jesse Gross
On Thu, Sep 17, 2015 at 12:25 PM, John W. Linville wrote:
> On Thu, Sep 17, 2015 at 11:45:58AM -0700, Pravin Shelar wrote:
>> On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville
>>  wrote:
>> > These seem to have been accidentally dropped in commit 371bd1061d29
>> > ("geneve: Consolidate Geneve functionality in single module.").
>> >
>> Geneve should not export vxlan feature. So that it never sees vxlan
>> tagged packets. Can you turn off the vlan feature?
>
> I'm not sure I understand...?  This is vlan, not vxlan.

I think he just means vlan. If you remove the line where
dev->vlan_features is set then the core stack will handle this and we
don't need to do anything special here.


Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V

2015-09-17 Thread David Miller
From: KY Srinivasan 
Date: Thu, 17 Sep 2015 19:52:01 +

> 
> 
>> -Original Message-
>> Have a pre-cooked ring of buffers for these descriptors that you can
>> point the chip at.  No per-packet allocation is necessary at all.
> 
> Even if I had a ring of buffers, I would still need to manage the life cycle
> of these buffers - selecting an unused one on the transmit path and marking
> it used (atomically).

Have one per TX ring entry, then the lifetime matches the lifetime of the
TX entry itself and therefore you need do nothing.

That's the whole idea.
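
A minimal userspace sketch of the "one buffer per TX ring entry" idea
follows. It assumes a hypothetical ring layout (struct tx_ring,
TX_RING_SIZE and DESC_BYTES are invented for illustration and are not
taken from any driver): because each slot carries its own descriptor
buffer, claiming a slot also claims the buffer, and completing the slot
releases it, with no separate allocation or used/free marking.

```c
#include <stddef.h>

#define TX_RING_SIZE 8	/* hypothetical number of ring entries */
#define DESC_BYTES   64	/* hypothetical per-packet descriptor size */

struct tx_ring {
	unsigned int head;	/* next slot to fill */
	unsigned int tail;	/* oldest slot still in flight */
	unsigned char desc[TX_RING_SIZE][DESC_BYTES];	/* one buffer per slot */
};

/* Claim the next slot; returns its descriptor buffer, or NULL if full. */
unsigned char *tx_ring_claim(struct tx_ring *r)
{
	if (r->head - r->tail == TX_RING_SIZE)
		return NULL;
	return r->desc[r->head++ % TX_RING_SIZE];
}

/* Completing the oldest in-flight slot implicitly releases its buffer. */
void tx_ring_complete(struct tx_ring *r)
{
	if (r->head != r->tail)
		r->tail++;
}

/*
 * Demo helper: fill the ring, check that it reports full, complete one
 * entry, and check that the next claim reuses the first slot's buffer.
 * Returns 1 on success, 0 otherwise.
 */
int demo_slot_reuse(void)
{
	static struct tx_ring r;
	unsigned char *first;

	r.head = r.tail = 0;
	first = tx_ring_claim(&r);
	for (int i = 1; i < TX_RING_SIZE; i++)
		if (!tx_ring_claim(&r))
			return 0;
	if (tx_ring_claim(&r) != NULL)
		return 0;
	tx_ring_complete(&r);
	return tx_ring_claim(&r) == first;
}
```

The sketch is single-threaded; a real driver would still need whatever
synchronisation its TX path and completion path already use for the ring
indices themselves.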


Re: [PATCH] geneve: restore vlan bits in xmit path

2015-09-17 Thread John W. Linville
On Thu, Sep 17, 2015 at 12:48:56PM -0700, Jesse Gross wrote:
> On Thu, Sep 17, 2015 at 12:25 PM, John W. Linville
>  wrote:
> > On Thu, Sep 17, 2015 at 11:45:58AM -0700, Pravin Shelar wrote:
> >> On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville
> >>  wrote:
> >> > These seem to have been accidentally dropped in commit 371bd1061d29
> >> > ("geneve: Consolidate Geneve functionality in single module.").
> >> >
> >> Geneve should not export vxlan feature. So that it never sees vxlan
> >> tagged packets. Can you turn off the vlan feature?
> >
> > I'm not sure I understand...?  This is vlan, not vxlan.
> 
> I think he just mean vlan. If you remove the line where
> dev->vlan_features are set then the core stack will handle this and we
> don't need to do anything special here.

Is that preferable to this patch?  Tunneling vlan-tagged frames
seems weird, but I would hate to disallow it if some crazy person
wanted to do that...

I guess the other way would slightly improve performance, and this
could be added back later.  What about the VLAN-related bits in
dev->features and ->hw_features?  Should they go as well?

John
-- 
John W. Linville                Someday the world will need a hero, and you
linvi...@tuxdriver.com          might be all we have.  Be ready.


Re: [PATCH] net: smc91x: convert pxa dma to dmaengine

2015-09-17 Thread David Miller
From: Robert Jarzmik 
Date: Wed, 16 Sep 2015 11:41:54 +0200

> David Miller  writes:
> 
>> From: Robert Jarzmik 
>> Date: Thu, 10 Sep 2015 21:26:04 +0200
>>
>>> Convert the dma transfers to be dmaengine based, now pxa has a dmaengine
>>> slave driver. This makes this driver a bit more PXA agnostic.
>>> 
>>> The driver was tested on pxa27x (mainstone) and pxa310 (zylonite),
>>> ie. only pxa platforms.
>>> 
>>> Signed-off-by: Robert Jarzmik 
>>> Cc: Russell King 
>>> Cc: Arnd Bergmann 
>>> ---
>>> This has potential to break other platform such as Neponset, Idp,
>>> halibut and qsd8x50, so I added Russell and Arnd as they were discussing
>>> smc91x support last February.
>>
> 
>> Is someone testing whether such platforms break or not?  I'm waiting for
>> that before I consider applying this patch.
> 
> My understanding is that Russell is the only one left testing them, or at least
> he was the only one complaining about a breakage lately on neponset.
> 
> I can wait several weeks for Russell to have a bit of time to try: I know it
> will compile correctly at least for neponset, and I know almost all the code is
> under #ifdef CONFIG_ARCH_PXA. And still I would feel far more comfortable if it
> was tested, just as you.

Oh well, I've waited long enough; patch applied, thanks.


Re: [PATCH net 2/2] 8139cp: reset BQL when ring tx ring cleared

2015-09-17 Thread Francois Romieu
David Woodhouse  :
> On Thu, 2015-09-17 at 12:36 +0100, David Woodhouse wrote:
> > 
> > Thanks; I'll try that. In fact since updating to 4.2 the problem has
> > got worse — now the whole machine dies:
> 
> There is something very strange going on here. I've found two ways to
> make it stop crashing when cp_tx_timeout() hits the 'popf' when
> unlocking the spinlock.

cp_tx_timeout takes the lock, disables irqs, and calls cp_clean_rings,
which means plain dev_kfree_skb is used if an skb is still referenced in
one of the rx/tx rings. You may replace it with dev_kfree_skb_any.

-- 
Ueimor


Re: [PATCH] geneve: restore vlan bits in xmit path

2015-09-17 Thread John W. Linville
On Thu, Sep 17, 2015 at 11:45:58AM -0700, Pravin Shelar wrote:
> On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville
>  wrote:
> > These seem to have been accidentally dropped in commit 371bd1061d29
> > ("geneve: Consolidate Geneve functionality in single module.").
> >
> Geneve should not export vxlan feature. So that it never sees vxlan
> tagged packets. Can you turn off the vlan feature?

I'm not sure I understand...?  This is vlan, not vxlan.

John
-- 
John W. Linville                Someday the world will need a hero, and you
linvi...@tuxdriver.com          might be all we have.  Be ready.


[PATCH] iplink_geneve: add UDP destination port configuration at link creation

2015-09-17 Thread John W. Linville
Signed-off-by: John W. Linville 
---
I didn't see an iproute2 patch posted for this option, so here is my version...

 ip/iplink_geneve.c| 13 +
 man/man8/ip-link.8.in |  6 ++
 2 files changed, 19 insertions(+)

diff --git a/ip/iplink_geneve.c b/ip/iplink_geneve.c
index 331240a6a3d9..0a45647844f5 100644
--- a/ip/iplink_geneve.c
+++ b/ip/iplink_geneve.c
@@ -19,6 +19,7 @@ static void print_explain(FILE *f)
 {
fprintf(f, "Usage: ... geneve id VNI remote ADDR\n");
fprintf(f, " [ ttl TTL ] [ tos TOS ]\n");
+   fprintf(f, " [ dstport PORT ]\n");
fprintf(f, "\n");
fprintf(f, "Where: VNI  := 0-16777215\n");
fprintf(f, "   ADDR := IP_ADDRESS\n");
@@ -40,6 +41,7 @@ static int geneve_parse_opt(struct link_util *lu, int argc, char **argv,
struct in6_addr daddr6 = IN6ADDR_ANY_INIT;
__u8 ttl = 0;
__u8 tos = 0;
+   __u16 dstport = 0;
 
while (argc > 0) {
if (!matches(*argv, "id") ||
@@ -80,6 +82,10 @@ static int geneve_parse_opt(struct link_util *lu, int argc, char **argv,
tos = uval;
} else
tos = 1;
+   } else if (!matches(*argv, "dstport")){
+   NEXT_ARG();
+   if (get_u16(&dstport, *argv, 0))
+   invarg("dst port", *argv);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -111,6 +117,9 @@ static int geneve_parse_opt(struct link_util *lu, int argc, char **argv,
addattr8(n, 1024, IFLA_GENEVE_TTL, ttl);
addattr8(n, 1024, IFLA_GENEVE_TOS, tos);
 
+   if (dstport)
+   addattr16(n, 1024, IFLA_GENEVE_PORT, htons(dstport));
+
return 0;
 }
 
@@ -150,6 +159,10 @@ static void geneve_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
else
fprintf(f, "tos %#x ", tos);
}
+
+   if (tb[IFLA_GENEVE_PORT])
+   fprintf(f, "dstport %u ",
+   ntohs(rta_getattr_u16(tb[IFLA_GENEVE_PORT])));
 }
 
 static void geneve_print_help(struct link_util *lu, int argc, char **argv,
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 1896eb6f185e..2e1889af650e 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -747,6 +747,8 @@ the following additional arguments are supported:
 .BI ttl " TTL "
 .R " ] [ "
 .BI tos " TOS "
+.R " ] [ "
+.BI dstport " PORT "
 .R " ]"
 
 .in +8
@@ -766,6 +768,10 @@ the following additional arguments are supported:
 .BI tos " TOS"
 - specifies the TOS value to use in outgoing packets.
 
+.sp
+.BI dstport " PORT "
+- specifies the UDP destination port to communicate at both ends of the GENEVE tunnel.
+
 .in -8
 
 .SS ip link delete - delete virtual link
-- 
2.4.3
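
The parse-then-convert flow in the patch above (read a 16-bit port from
the command line, store it in network byte order, convert back when
printing) can be sketched standalone. parse_u16() and port_round_trip()
are hypothetical helpers written for this illustration; they only
approximate what the patch does with get_u16(), htons() and ntohs(), and
are not the iproute2 implementation.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: returns 0 on success, -1 on invalid input. */
int parse_u16(uint16_t *val, const char *arg)
{
	char *end;
	unsigned long res;

	if (!arg || !*arg)
		return -1;
	errno = 0;
	res = strtoul(arg, &end, 0);
	if (errno || *end != '\0' || res > 0xFFFF)
		return -1;
	*val = (uint16_t)res;
	return 0;
}

/*
 * Parse a port, convert it to network byte order as it would be placed
 * in the netlink attribute, then convert it back as the print path does.
 * Returns the port in host byte order, or -1 if the argument is invalid.
 */
long port_round_trip(const char *arg)
{
	uint16_t port, wire;

	if (parse_u16(&port, arg))
		return -1;
	wire = htons(port);	/* value as carried in the attribute */
	return ntohs(wire);	/* value as shown to the user */
}
```

For example, port_round_trip("6081") yields 6081 (the IANA-assigned
Geneve port), while out-of-range or non-numeric input is rejected before
any byte-order conversion takes place.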



Re: [PATCH] geneve: remove use of internal IP header when calling IP_ECN_decapsulate

2015-09-17 Thread Jesse Gross
On Thu, Sep 17, 2015 at 10:17 AM, John W. Linville wrote:
> diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
> index da3259ce7c8d..a917ae1cfbf3 100644
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -178,13 +178,15 @@ static void geneve_rx(struct geneve_sock *gs, struct 
> sk_buff *skb)
>
> skb_reset_network_header(skb);
>
> -   iph = ip_hdr(skb); /* Now inner IP header... */
> -   err = IP_ECN_decapsulate(iph, skb);
> +   if (iph)
> +   err = IP_ECN_decapsulate(iph, skb);

It looks like this is now conditional based on !collect_md. I'm not
sure that we want to have a difference in behavior between the two.


RE: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V

2015-09-17 Thread KY Srinivasan


> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, September 17, 2015 11:52 AM
> To: KY Srinivasan 
> Cc: david.lai...@aculab.com; alexander.du...@gmail.com; Haiyang Zhang
> ; vkuzn...@redhat.com; netdev@vger.kernel.org;
> linux-ker...@vger.kernel.org; jasow...@redhat.com
> Subject: Re: [PATCH net-next RFC] net: increase LL_MAX_HEADER for Hyper-V
> 
> From: KY Srinivasan 
> Date: Thu, 17 Sep 2015 15:14:05 +
> 
> > I think I can achieve my original goal of not having any allocation
> > in the send path by carefully using the memory available in the skb:
> 
> Please stop flat-out ignoring David L.'s suggestion.

I am sorry; I did not mean to convey that impression.

> 
> Have a pre-cooked ring of buffers for these descriptors that you can
> point the chip at.  No per-packet allocation is necessary at all.

Even if I had a ring of buffers, I would still need to manage the life cycle
of these buffers - selecting an unused one on the transmit path and marking
it used (atomically). Once the transmit completes (as indicated by the
transmit complete callback) this buffer needs to be marked free. I can
certainly make these operations efficient and lock-free, but they are still
at some level an allocation/free operation, albeit potentially more efficient
than having the kernel allocate the memory.
 
> 
> If you play games with SKBs you will get burned.

I will implement Dave L's suggestion. However, I am curious as to why you
would consider my proposed usage of the skb headroom and the control buffer
area in skb as non-standard usage.

Regards,

K. Y 


  


Re: [PATCH] geneve: restore vlan bits in xmit path

2015-09-17 Thread Pravin Shelar
On Thu, Sep 17, 2015 at 12:48 PM, Jesse Gross  wrote:
> On Thu, Sep 17, 2015 at 12:25 PM, John W. Linville
>  wrote:
>> On Thu, Sep 17, 2015 at 11:45:58AM -0700, Pravin Shelar wrote:
>>> On Thu, Sep 17, 2015 at 10:18 AM, John W. Linville
>>>  wrote:
>>> > These seem to have been accidentally dropped in commit 371bd1061d29
>>> > ("geneve: Consolidate Geneve functionality in single module.").
>>> >
>>> Geneve should not export vxlan feature. So that it never sees vxlan
>>> tagged packets. Can you turn off the vlan feature?
>>
>> I'm not sure I understand...?  This is vlan, not vxlan.
>
> I think he just mean vlan. If you remove the line where
> dev->vlan_features are set then the core stack will handle this and we
> don't need to do anything special here.

Yes, I meant vlan; sorry for the confusion.


Re: iproute2 tunnel name parsing

2015-09-17 Thread Vadim Kochan
On Thu, Sep 17, 2015 at 09:55:29PM +0200, Wilhelm Wijkander wrote:
> Hi,
> 
> I'm trying to create a sit tunnel called "hel": ip tun add hel mode
> sit remote 10.200.0.2 local 10.200.1.2 ttl 255, however it seems like
> this is interpreted as the help argument and I get the usage text. Is
> there a way to escape names that I've missed, or is this an error
> somewhere in argv parsing?
> 
> (I'm not subscribed, so a cc would be appreciated)
> Thanks,
> Wilhelm

Hi Wilhelm,

You can use 'name' before 'hel' like:

$ ip tun add name hel mode sit remote 10.200.0.2 local 10.200.1.2 ttl 255

and it should work; I just tried it and it does.

Regards,
Vadim Kochan
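
Why "hel" is taken for "help" can be illustrated with a small prefix
matcher. This is a sketch that assumes the keyword check accepts any
unambiguous abbreviation, in the spirit of iproute2's matches() helper;
prefix_matches() and is_taken_as_name() are hypothetical names and the
real parser has more branches than shown here.

```c
#include <string.h>

/* Returns 0 when cmd is a (possibly complete) prefix of pattern. */
int prefix_matches(const char *cmd, const char *pattern)
{
	size_t len = strlen(cmd);

	if (len > strlen(pattern))
		return -1;
	return memcmp(pattern, cmd, len);
}

/*
 * Hypothetical classifier for one argument: returns 1 if the argument
 * would be treated as a tunnel name, 0 if it would be swallowed as the
 * "help" keyword. An argument that follows an explicit "name" keyword is
 * never compared against keywords.
 */
int is_taken_as_name(const char *arg, int after_name_keyword)
{
	if (after_name_keyword)
		return 1;
	return prefix_matches(arg, "help") != 0;
}
```

With this model, "hel" on its own is an abbreviation of "help", whereas
"name hel" skips the keyword comparison, which matches the behaviour
reported in this thread.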


pull-request: can-next 2015-09-17

2015-09-17 Thread Marc Kleine-Budde
Hello David,

this is a pull request of two patches for net-next/master.

Gerhard Bertelsmann adds support for the CAN controller found on the
Allwinner A10/A20 SoC.

Marc

---

The following changes since commit 37d2dbcdcca88e392009d7cbe8617d5af0ebcb32:

  net: fix cdc-phonet.c dependency and build error (2015-09-16 11:51:19 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git tags/linux-can-next-for-4.4-20150917

for you to fetch changes up to 0738eff14d817a02ab082c392c96a1613006f158:

  can: Allwinner A10/A20 CAN Controller support - Kernel module (2015-09-17 22:39:08 +0200)


linux-can-next-for-4.4-20150917


Gerhard Bertelsmann (2):
  can: Allwinner A10/A20 CAN Controller support - Devicetree bindings
  can: Allwinner A10/A20 CAN Controller support - Kernel module

 .../devicetree/bindings/net/can/sun4i_can.txt  |  36 +
 drivers/net/can/Kconfig                        |  10 +
 drivers/net/can/Makefile                       |   1 +
 drivers/net/can/sun4i_can.c                    | 857 +
 4 files changed, 904 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/can/sun4i_can.txt
 create mode 100644 drivers/net/can/sun4i_can.c

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |




