date:20150728

On Tue, 2015-07-28 at 15:36 +0800, Hayes Wang wrote:
 The device reset is necessary if the hw becomes abnormal and stops
 transmitting packets.

You are not the first one to face this problem. Hence there
is a helper:

 * usb_queue_reset_device - Reset a USB device from an atomic context
 * @iface: USB interface belonging to the device to reset
 *
 * This function can be used to reset a USB device from an atomic
 * context, where usb_reset_device() won't work (as it blocks).

Please use it if you can. Your version for example is buggy.
It will oops if you unplug the device while a reset is scheduled.

Regards
Oliver


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net 1/2] r8152: add pre_reset and post_reset

On Tue, 2015-07-28 at 15:36 +0800, Hayes Wang wrote:
 Add rtl8152_pre_reset() and rtl8152_post_reset() which are used when
 calling usb_reset_device(). The two functions could reduce the time
 of reset when calling usb_reset_device() after probe().
 
 Signed-off-by: Hayes Wang hayesw...@realtek.com
 ---
  drivers/net/usb/r8152.c | 68 
 +
  1 file changed, 68 insertions(+)
 
 diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
 index 144dc64..a6caa60 100644
 --- a/drivers/net/usb/r8152.c
 +++ b/drivers/net/usb/r8152.c
 @@ -3342,6 +3342,72 @@ static void r8153_init(struct r8152 *tp)
   r8153_u2p3en(tp, true);
  }
  
 +static int rtl8152_pre_reset(struct usb_interface *intf)
 +{
 + struct r8152 *tp = usb_get_intfdata(intf);
 + struct net_device *netdev;
 + int ret;
 +
 + if (intf->condition != USB_INTERFACE_BOUND || !tp)

If the interface weren't bound, you wouldn't be called.

 + return 0;
 +
 + netdev = tp->netdev;
 + if (!netif_running(netdev))
 + return 0;
 +
 + ret = usb_autopm_get_interface(intf);
 + if (ret < 0)
 + return ret;

What sense does this make?

 +
 + napi_disable(&tp->napi);
 + clear_bit(WORK_ENABLE, &tp->flags);
 + usb_kill_urb(tp->intr_urb);
 + cancel_delayed_work_sync(&tp->schedule);
 + if (netif_carrier_ok(netdev)) {
 + netif_stop_queue(netdev);
 + mutex_lock(&tp->control);
 + tp->rtl_ops.disable(tp);
 + mutex_unlock(&tp->control);
 + }
 +
 + usb_autopm_put_interface(intf);
 +
 + return 0;
 +}
 +
 +static int rtl8152_post_reset(struct usb_interface *intf)
 +{
 + struct r8152 *tp = usb_get_intfdata(intf);
 + struct net_device *netdev;
 + int ret;
 +
 + if (intf->condition != USB_INTERFACE_BOUND || !tp)

Again unnecessary

 + return 0;
 +
 + netdev = tp->netdev;
 + if (!netif_running(netdev))
 + return 0;
 +
 + ret = usb_autopm_get_interface(intf);

The device will be awake.

 + if (ret < 0)
 + return ret;
 +
 + set_bit(WORK_ENABLE, &tp->flags);
 + if (netif_carrier_ok(netdev)) {
 + mutex_lock(&tp->control);
 + tp->rtl_ops.enable(tp);
 + rtl8152_set_rx_mode(netdev);
 + mutex_unlock(&tp->control);
 + netif_wake_queue(netdev);
 + }
 +
 + napi_enable(&tp->napi);
 +
 + usb_autopm_put_interface(intf);
 +
 + return ret;
 +}
 +

HTH
Oliver



[PATCH] Drivers: isdn: Drop unnecessary continue

2015-07-28 Thread Shraddha Barke

The semantic patch used to make this change is :

@@
@@
for (...;...;...) {
  ...
  if (...) {
...
-   continue;
  }
}

Signed-off-by: Shraddha Barke shraddha.6...@gmail.com
---
 drivers/isdn/hardware/mISDN/hfcsusb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c 
b/drivers/isdn/hardware/mISDN/hfcsusb.c
index 114f3bc..34e4b6c 100644
--- a/drivers/isdn/hardware/mISDN/hfcsusb.c
+++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
@@ -1923,7 +1923,6 @@ hfcsusb_probe(struct usb_interface *intf, const struct 
usb_device_id *id)
(le16_to_cpu(dev->descriptor.idProduct)
 == hfcsusb_idtab[i].idProduct)) {
vend_idx = i;
-   continue;
}
}
 
-- 
2.1.0
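
The semantic patch above relies on the fact that a continue placed as the last statement of a loop body has no effect. A minimal userspace sketch, assuming a simplified ID table (struct id and both function names are hypothetical and not taken from hfcsusb.c), compares the loop shape before and after the change:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a USB vendor/product ID table entry. */
struct id {
	unsigned short vendor;
	unsigned short product;
};

/* Loop shape before the patch: the continue is the last statement
 * of the loop body, so it changes nothing. */
static int last_match_with_continue(const struct id *tab, size_t n,
				    struct id want)
{
	int idx = -1;
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tab[i].vendor == want.vendor &&
		    tab[i].product == want.product) {
			idx = (int)i;
			continue;
		}
	}
	return idx;
}

/* Loop shape after the patch: no continue and, with a single
 * statement left in the branch, no braces around it. */
static int last_match(const struct id *tab, size_t n, struct id want)
{
	int idx = -1;
	size_t i;

	assert(tab != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tab[i].vendor == want.vendor &&
		    tab[i].product == want.product)
			idx = (int)i;
	}
	return idx;
}
```

Both functions return the index of the last matching entry, or -1 if nothing matches, because neither loop breaks out early.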


[PATCH net-next 3/4] dwc_eth_qos: Add the synopsys folder to the build system.

Signed-off-by: Lars Persson lar...@axis.com
---
 drivers/net/ethernet/Kconfig  | 1 +
 drivers/net/ethernet/Makefile | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index f3bb178..05aa759 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -167,6 +167,7 @@ source drivers/net/ethernet/sgi/Kconfig
 source drivers/net/ethernet/smsc/Kconfig
 source drivers/net/ethernet/stmicro/Kconfig
 source drivers/net/ethernet/sun/Kconfig
+source drivers/net/ethernet/synopsys/Kconfig
 source drivers/net/ethernet/tehuti/Kconfig
 source drivers/net/ethernet/ti/Kconfig
 source drivers/net/ethernet/tile/Kconfig
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index c51014b..f42177b 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -77,6 +77,7 @@ obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
 obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
 obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
 obj-$(CONFIG_NET_VENDOR_SUN) += sun/
+obj-$(CONFIG_NET_VENDOR_SYNOPSYS) += synopsys/
 obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
 obj-$(CONFIG_NET_VENDOR_TI) += ti/
 obj-$(CONFIG_TILE_NET) += tile/
-- 
2.1.4


Re: [RFC PATCH 1/1] net/ipv4: Enable flow-based ECMP

2015-07-28 Thread Michal Kubecek

On Tue, Jul 28, 2015 at 02:27:57AM +, Richard Laing wrote:
 From: Richard Laing richard.la...@alliedtelesis.co.nz
 
 Enable flow-based ECMP.
 
 Currently if equal-cost multipath is enabled the kernel chooses between
 equal cost paths for each matching packet, essentially packets are
 round-robined between the routes. This means that packets from a single
 flow can traverse different routes. If one of the routes experiences
 congestion this can result in delayed or out of order packets arriving
 at the destination.
 
 This patch allows packets to be routed based on their
 flow - packets in the same flow will always use the same route. This
 prevents out of order packets. There are other issues with round-robin
 based ECMP routing related to variable path MTU handling and debugging.
 See RFC2991 for more details on the problems associated with packet
 based ECMP routing.
 
 This patch relies on the skb hash value to select between routes. The
 selection uses a hash-threshold algorithm (see RFC2992).
 
 Signed-off-by: Richard Laing richard.la...@alliedtelesis.co.nz

The patch looks corrupted (long lines split, tabs converted to (four?)
spaces, etc.).

Michal Kubecek
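
The hash-threshold selection mentioned in the quoted description (RFC 2992) splits the hash space into equal regions, one per next hop, so every packet of a flow maps to the same path. A minimal userspace sketch, assuming a 32-bit flow hash; ecmp_select() is a hypothetical name and this is not the code from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit flow hash onto one of n_paths equal-cost next hops.
 * The hash space [0, 2^32) is divided into n_paths equal regions and
 * the index of the region containing flow_hash is returned, so the
 * result is always in the range [0, n_paths - 1].
 */
static unsigned int ecmp_select(uint32_t flow_hash, unsigned int n_paths)
{
	assert(n_paths > 0);
	return (unsigned int)(((uint64_t)flow_hash * n_paths) >> 32);
}
```

Because the result depends only on the flow hash and the number of paths, two packets with the same hash always pick the same next hop as long as the set of paths is unchanged.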


[PATCH] Drivers: isdn: Drop unnecessary continue

2015-07-28 Thread Shraddha Barke

The semantic patch used to make this change is :

@@
@@
for (...;...;...) {
  ...
  if (...) {
...
-   continue;
  }
}

Signed-off-by: Shraddha Barke shraddha.6...@gmail.com
---
 drivers/isdn/hardware/mISDN/hfcsusb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c 
b/drivers/isdn/hardware/mISDN/hfcsusb.c
index 114f3bc..91beb83 100644
--- a/drivers/isdn/hardware/mISDN/hfcsusb.c
+++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
@@ -1921,10 +1921,9 @@ hfcsusb_probe(struct usb_interface *intf, const struct 
usb_device_id *id)
if ((le16_to_cpu(dev->descriptor.idVendor)
 == hfcsusb_idtab[i].idVendor) &&
(le16_to_cpu(dev->descriptor.idProduct)
-== hfcsusb_idtab[i].idProduct)) {
+== hfcsusb_idtab[i].idProduct))
vend_idx = i;
-   continue;
-   }
+
}
 
printk(KERN_DEBUG
-- 
2.1.0


RE: [PATCH v1 net-next 1/1] net: fec: add stop mode request on/off implemention

2015-07-28 Thread Duan Andy

Hi, David,

From: Duan Fugang-B38611 Sent: Monday, July 27, 2015 9:28 AM
 To: 'David Miller'
 Cc: netdev@vger.kernel.org; Li Frank-B20596; step...@networkplumber.org
 Subject: RE: [PATCH v1 net-next 1/1] net: fec: add stop mode request
 on/off implemention

 From: David Miller da...@davemloft.net Sent: Monday, July 27, 2015 7:27
 AM
  To: Duan Fugang-B38611
  Cc: netdev@vger.kernel.org; Li Frank-B20596;
  step...@networkplumber.org
  Subject: Re: [PATCH v1 net-next 1/1] net: fec: add stop mode request
  on/off implemention

  From: Fugang Duan b38...@freescale.com
  Date: Wed, 22 Jul 2015 18:13:43 +0800

   The current driver depends on platform data to implement stop mode
   request on/off, which calls the API pdata->sleep_mode_enable().

   To reduce redundant arch platform code, and since the function only sets
   a SoC GPR register bit to request stop mode on/off, we can move the
   function into the driver. The specific GPR register offset and mask
   bit can be passed in from the DTS.

   Signed-off-by: Fugang Duan b38...@freescale.com

  Doesn't this break stop mode on those devices until the DTS is updated?

  That's really unfortunate, because you're leaving all of the platform
  data and implementation there, yet it's going to be unused.

  I really think you need to keep the code using the platform data bits
  around until all the DTSs are updated.

  No matter what you tell me about how DTSs are updated (don't even
  mention the details, I do not care) you simply cannot keep the
  platform data code around and not use it.  It is completely
  nonsensible to have code that would properly function and properly
  support a feature for the device in the kernel, yet not use it.

  Period.

 Thanks for your comments.

 First, I will send some board DTS patches (and test them).
 Second, the net-next tree has no platform data for stop mode, because
 others suggested that we use DTS rather than platform data, and no
 boards in net-next support stop mode, so this doesn't break any boards
 in net-next.

 Regards,
 Andy

I removed the platform data callback because no platform uses the stop
mode function; removing it drops redundant code.

I tested the patch on 4.1 with extra patches (the i.MX PM support patches),
and it works fine.
However, linux-next still lacks the i.MX power management patches, so the
wakeup source cannot work in next.

So the patch itself has no problem. You can accept it now, or wait until the
i.MX PM patches enter next, and then I will send it again with imx6x/7x support.

Regards,
Andy

RE: [PATCH net 2/2] r8152: reset device when tx timeout

Oliver Neukum [mailto:oneu...@suse.com]
 Sent: Tuesday, July 28, 2015 4:57 PM
[...]
  * usb_queue_reset_device - Reset a USB device from an atomic context
  * @iface: USB interface belonging to the device to reset
  *
  * This function can be used to reset a USB device from an atomic
  * context, where usb_reset_device() won't work (as it blocks).
 
 Please use it if you can. Your version for example is buggy.
 It will oops if you unplug the device while a reset is scheduled.

Thanks for your suggestion. I would replace it. 

Best Regards,
Hayes


[PATCH net-next 4/4] dwc_eth_qos: Add maintainer info

Add maintainer information for the Synopsys DWC Ethernet QOS driver.

Signed-off-by: Lars Persson lar...@axis.com
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index a226416..0c78766 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8903,6 +8903,13 @@ F:   include/linux/dma/dw.h
 F: include/linux/platform_data/dma-dw.h
 F: drivers/dma/dw/
 
+SYNOPSYS DESIGNWARE ETHERNET QOS 4.10a driver
+M: Lars Persson lars.pers...@axis.com
+L: netdev@vger.kernel.org
+S: Supported
+F: Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
+F: drivers/net/ethernet/synopsys/dwc_eth_qos.c
+
 SYNOPSYS DESIGNWARE MMC/SD/SDIO DRIVER
 M: Seungwon Jeon tgih@samsung.com
 M: Jaehoon Chung jh80.ch...@samsung.com
-- 
2.1.4


[PATCH net-next 2/4] dwc_eth_qos: Add support for Synopsys DWC Ethernet QoS

This patch adds a platform driver for the new generation of the
gigabit ethernet IP from Synopsys. It is developed for version 4.10a
of the IP core.

Signed-off-by: Lars Persson lar...@axis.com
---
 drivers/net/ethernet/synopsys/Kconfig   |   27 +
 drivers/net/ethernet/synopsys/Makefile  |5 +
 drivers/net/ethernet/synopsys/dwc_eth_qos.c | 3019 +++
 3 files changed, 3051 insertions(+)
 create mode 100644 drivers/net/ethernet/synopsys/Kconfig
 create mode 100644 drivers/net/ethernet/synopsys/Makefile
 create mode 100644 drivers/net/ethernet/synopsys/dwc_eth_qos.c

diff --git a/drivers/net/ethernet/synopsys/Kconfig 
b/drivers/net/ethernet/synopsys/Kconfig
new file mode 100644
index 000..a8f3151
--- /dev/null
+++ b/drivers/net/ethernet/synopsys/Kconfig
@@ -0,0 +1,27 @@
+#
+# Synopsys network device configuration
+#
+
+config NET_VENDOR_SYNOPSYS
+	bool "Synopsys devices"
+	default y
+	---help---
+	  If you have a network (Ethernet) device belonging to this class, say Y.
+
+	  Note that the answer to this question doesn't directly affect the
+	  kernel: saying N will just cause the configurator to skip all
+	  the questions about Synopsys devices. If you say Y, you will be asked
+	  for your specific device in the following questions.
+
+if NET_VENDOR_SYNOPSYS
+
+config SYNOPSYS_DWC_ETH_QOS
+	tristate "Synopsys DWC Ethernet QOS v4.10a support"
+	select PHYLIB
+	select CRC32
+	select MII
+	depends on OF
+	---help---
+	  This driver supports the DWC Ethernet QoS from Synopsys
+
+endif # NET_VENDOR_SYNOPSYS
diff --git a/drivers/net/ethernet/synopsys/Makefile 
b/drivers/net/ethernet/synopsys/Makefile
new file mode 100644
index 000..7a37572
--- /dev/null
+++ b/drivers/net/ethernet/synopsys/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the Synopsys network device drivers.
+#
+
+obj-$(CONFIG_SYNOPSYS_DWC_ETH_QOS) += dwc_eth_qos.o
diff --git a/drivers/net/ethernet/synopsys/dwc_eth_qos.c 
b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
new file mode 100644
index 000..85b3326
--- /dev/null
+++ b/drivers/net/ethernet/synopsys/dwc_eth_qos.c
@@ -0,0 +1,3019 @@
+/*  Synopsys DWC Ethernet Quality-of-Service v4.10a linux driver
+ *
+ *  This is a driver for the Synopsys DWC Ethernet QoS IP version 4.10a (GMAC).
+ *  This version introduced a lot of changes which break backwards
+ *  compatibility with the non-QoS IP from Synopsys (used in the ST Micro drivers).
+ *  Some fields differ between version 4.00a and 4.10a, mainly the interrupt
+ *  bit fields. The driver could be made compatible with 4.00, if all relevant
+ *  HW errata are handled.
+ *
+ *  The GMAC is highly configurable at synthesis time. This driver has been
+ *  developed for a subset of the total available feature set. Currently
+ *  it supports:
+ *  - TSO
+ *  - Checksum offload for RX and TX.
+ *  - Energy efficient ethernet.
+ *  - GMII phy interface.
+ *  - The statistics module.
+ *  - Single RX and TX queue.
+ *
+ *  Copyright (C) 2015 Axis Communications AB.
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms and conditions of the GNU General Public License,
+ *  version 2, as published by the Free Software Foundation.
+ */
+
+#include <linux/clk.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/ethtool.h>
+#include <linux/stat.h>
+#include <linux/types.h>
+
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
+#include <linux/mm.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+
+#include <linux/phy.h>
+#include <linux/mii.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/vmalloc.h>
+#include <linux/version.h>
+
+#include <linux/device.h>
+#include <linux/bitrev.h>
+#include <linux/crc32.h>
+
+#include <linux/of.h>
+#include <linux/interrupt.h>
+#include <linux/clocksource.h>
+#include <linux/net_tstamp.h>
+#include <linux/pm_runtime.h>
+#include <linux/of_net.h>
+#include <linux/of_address.h>
+#include <linux/of_mdio.h>
+#include <linux/timer.h>
+#include <linux/tcp.h>
+
+#define DRIVER_NAME "dwceqos"
+#define DRIVER_DESCRIPTION "Synopsys DWC Ethernet QoS driver"
+#define DRIVER_VERSION "0.9"
+
+#define DWCEQOS_MSG_DEFAULT (NETIF_MSG_DRV | NETIF_MSG_PROBE | \
+	NETIF_MSG_LINK | NETIF_MSG_IFDOWN | NETIF_MSG_IFUP)
+
+#define DWCEQOS_TX_TIMEOUT 5 /* Seconds */
+
+#define DWCEQOS_LPI_TIMER_MIN  8
+#define DWCEQOS_LPI_TIMER_MAX  ((1 << 20) - 1)
+
+#define DWCEQOS_RX_BUF_SIZE 2048
+
+#define DWCEQOS_RX_DCNT 256
+#define DWCEQOS_TX_DCNT 256
+
+#define DWCEQOS_HASH_TABLE_SIZE 64
+
+/* The size field in the DMA descriptor is 14 bits */
+#define BYTES_PER_DMA_DESC 16376
+
+/* Hardware registers */
+#define START_MAC_REG_OFFSET0x
+#define MAX_MAC_REG_OFFSET  0x0bd0
+#define START_MTL_REG_OFFSET0x0c00
+#define

[PATCH net-next 1/4] dwc_eth_qos: Add Synopsys DWC Ethernet QoS bindings

Add device tree binding documentation for the Synopsys DWC Ethernet
QoS driver supporting revision 4.10a of the hardware IP.

Signed-off-by: Lars Persson lar...@axis.com
---
 .../bindings/net/snps,dwc-qos-ethernet.txt | 75 ++
 1 file changed, 75 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt

diff --git a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt 
b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
new file mode 100644
index 000..51f8d2e
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
@@ -0,0 +1,75 @@
+* Synopsys DWC Ethernet QoS IP version 4.10 driver (GMAC)
+
+
+Required properties:
+- compatible: Should be "snps,dwc-qos-ethernet-4.10"
+- reg: Address and length of the register set for the device
+- clocks: Phandles to the reference clock and the bus clock
+- clock-names: Should be "phy_ref_clk" for the reference clock and "apb_pclk"
+  for the bus clock.
+- interrupt-parent: Should be the phandle for the interrupt controller
+  that services interrupts for this device
+- interrupts: Should contain the core's combined interrupt signal
+- phy-mode: See ethernet.txt file in the same directory
+
+Optional properties:
+- dma-coherent: Present if dma operations are coherent
+- mac-address: See ethernet.txt in the same directory
+- local-mac-address: See ethernet.txt in the same directory
+- snps,en-lpi: If present it enables use of the AXI low-power interface
+- snps,write-requests: Number of write requests that the AXI port can issue.
+  It depends on the SoC configuration.
+- snps,read-requests: Number of read requests that the AXI port can issue.
+  It depends on the SoC configuration.
+- snps,burst-map: Bitmap of allowed AXI burst lengths, with the LSB
+  representing 4, then 8 etc.
+- snps,txpbl: DMA Programmable burst length for the TX DMA
+- snps,rxpbl: DMA Programmable burst length for the RX DMA
+- snps,en-tx-lpi-clockgating: Enable gating of the MAC TX clock during
+  TX low-power mode.
+- phy-handle: See ethernet.txt file in the same directory
+- mdio device tree subnode: When the GMAC has a phy connected to its local
+mdio, there must be device tree subnode with the following
+required properties:
+- compatible: Must be "snps,dwc-qos-ethernet-mdio".
+- #address-cells: Must be 1.
+- #size-cells: Must be 0.
+
+For each phy on the mdio bus, there must be a node with the following
+fields:
+
+- reg: phy id used to communicate to phy.
+- device_type: Must be "ethernet-phy".
+- fixed-mode device tree subnode: see fixed-link.txt in the same directory
+
+Examples:
+ethernet2@4001 {
+	clock-names = "phy_ref_clk", "apb_pclk";
+	clocks = <&clkc 17>, <&clkc 15>;
+	compatible = "snps,dwc-qos-ethernet-4.10";
+	interrupt-parent = <&intc>;
+	interrupts = <0x0 0x1e 0x4>;
+	reg = <0x4001 0x4000>;
+	phy-handle = <&phy2>;
+	phy-mode = "gmii";
+
+	snps,en-tx-lpi-clockgating;
+	snps,en-lpi;
+	snps,write-requests = <2>;
+	snps,read-requests = <16>;
+	snps,burst-map = <0x7>;
+	snps,txpbl = <8>;
+	snps,rxpbl = <2>;
+
+	dma-coherent;
+
+	mdio {
+		#address-cells = <0x1>;
+		#size-cells = <0x0>;
+		phy2: phy@1 {
+			compatible = "ethernet-phy-ieee802.3-c22";
+			device_type = "ethernet-phy";
+			reg = <0x1>;
+		};
+	};
+};
-- 
2.1.4
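
The snps,burst-map description in the binding above can be read as: bit 0 allows bursts of 4, bit 1 bursts of 8, and so on, so the example value 0x7 would allow burst lengths 4, 8 and 16. A small userspace sketch of that decoding; burst_map_decode() is a hypothetical helper, and the cut-off at 30 bits only keeps the arithmetic in range, it is not something the binding states:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a burst-map value into the burst lengths it allows.
 * Bit 0 stands for a burst length of 4, bit 1 for 8, and so on.
 * Writes at most cap entries to out and returns how many were written.
 */
static size_t burst_map_decode(uint32_t map, unsigned int *out, size_t cap)
{
	size_t n = 0;
	unsigned int bit;

	assert(out != NULL || cap == 0);
	/* Stop at bit 29 so that 4u << bit still fits in 32 bits. */
	for (bit = 0; bit < 30 && n < cap; bit++) {
		if (map & (UINT32_C(1) << bit))
			out[n++] = 4u << bit;
	}
	return n;
}
```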


Re: [PATCH] Drivers: isdn: Drop unnecessary continue

2015-07-28 Thread Julia Lawall



On Tue, 28 Jul 2015, Shraddha Barke wrote:

 The semantic patch used to make this change is :

 @@
 @@
 for (...;...;...) {
   ...
   if (...) {
 ...
 -   continue;
   }
 }

 Signed-off-by: Shraddha Barke shraddha.6...@gmail.com
 ---
  drivers/isdn/hardware/mISDN/hfcsusb.c | 1 -
  1 file changed, 1 deletion(-)

 diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c 
 b/drivers/isdn/hardware/mISDN/hfcsusb.c
 index 114f3bc..34e4b6c 100644
 --- a/drivers/isdn/hardware/mISDN/hfcsusb.c
 +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
 @@ -1923,7 +1923,6 @@ hfcsusb_probe(struct usb_interface *intf, const struct 
 usb_device_id *id)
   (le16_to_cpu(dev->descriptor.idProduct)
== hfcsusb_idtab[i].idProduct)) {
   vend_idx = i;
 - continue;

Now there is only one statement in the branch, so the {} should go as
well.

julia

   }
   }

 --
 2.1.0



Re: [PATCH net-next 0/4] net/mlx4_en: Hardware accelerated 802.1ad

2015-07-28 Thread Or Gerlitz


On 7/28/2015 1:00 AM, David Miller wrote:
Series applied, thanks. 


Hi Dave,

I don't see this on your kernel.org clone.. maybe forgot to press on the 
push button?


Or.

[PATCH net 2/2] r8152: reset device when tx timeout

The device reset is necessary if the hw becomes abnormal and stops
transmitting packets.

Signed-off-by: Hayes Wang hayesw...@realtek.com
---
 drivers/net/usb/r8152.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index a6caa60..9bf6e0c 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -27,7 +27,7 @@
 #include <linux/usb/cdc.h>
 
 /* Version Information */
-#define DRIVER_VERSION "v1.08.0 (2015/01/13)"
+#define DRIVER_VERSION "v1.08.1 (2015/07/28)"
 #define DRIVER_AUTHOR "Realtek linux nic maintainers <nic_s...@realtek.com>"
 #define DRIVER_DESC "Realtek RTL8152/RTL8153 Based USB Ethernet Adapters"
 #define MODULENAME "r8152"
@@ -591,6 +591,7 @@ struct r8152 {
 	struct sk_buff_head tx_queue, rx_queue;
 	spinlock_t rx_lock, tx_lock;
 	struct delayed_work schedule;
+	struct delayed_work work_reset;
 	struct mii_if_info mii;
 	struct mutex control;	/* use for hw setting */
 
@@ -1902,11 +1903,11 @@ static void rtl_drop_queued_tx(struct r8152 *tp)
 static void rtl8152_tx_timeout(struct net_device *netdev)
 {
 	struct r8152 *tp = netdev_priv(netdev);
-	int i;
 
 	netif_warn(tp, tx_err, netdev, "Tx timeout\n");
-	for (i = 0; i < RTL8152_MAX_TX; i++)
-		usb_unlink_urb(tp->tx_info[i].urb);
+
+	schedule_delayed_work(&tp->work_reset, 0);
+	cancel_delayed_work(&tp->schedule);
 }
 
 static void rtl8152_set_rx_mode(struct net_device *netdev)
@@ -3408,6 +3409,18 @@ static int rtl8152_post_reset(struct usb_interface *intf)
 	return ret;
 }
 
+static void rtl_hw_reset(struct work_struct *work)
+{
+	struct r8152 *tp = container_of(work, struct r8152, work_reset.work);
+
+	netif_info(tp, drv, tp->netdev, "usb reset device\n");
+
+	if (test_bit(RTL8152_UNPLUG, &tp->flags))
+		return;
+
+	usb_reset_device(tp->udev);
+}
+
 static int rtl8152_suspend(struct usb_interface *intf, pm_message_t message)
 {
 	struct r8152 *tp = usb_get_intfdata(intf);
@@ -4102,6 +4115,7 @@ static int rtl8152_probe(struct usb_interface *intf,
 
 	mutex_init(&tp->control);
 	INIT_DELAYED_WORK(&tp->schedule, rtl_work_func_t);
+	INIT_DELAYED_WORK(&tp->work_reset, rtl_hw_reset);
 
 	netdev->netdev_ops = &rtl8152_netdev_ops;
 	netdev->watchdog_timeo = RTL8152_TX_TIMEOUT;
-- 
2.4.2


[PATCH net 1/2] r8152: add pre_reset and post_reset

Add rtl8152_pre_reset() and rtl8152_post_reset() which are used when
calling usb_reset_device(). The two functions could reduce the time
of reset when calling usb_reset_device() after probe().

Signed-off-by: Hayes Wang hayesw...@realtek.com
---
 drivers/net/usb/r8152.c | 68 +
 1 file changed, 68 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 144dc64..a6caa60 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3342,6 +3342,72 @@ static void r8153_init(struct r8152 *tp)
r8153_u2p3en(tp, true);
 }
 
+static int rtl8152_pre_reset(struct usb_interface *intf)
+{
+	struct r8152 *tp = usb_get_intfdata(intf);
+	struct net_device *netdev;
+	int ret;
+
+	if (intf->condition != USB_INTERFACE_BOUND || !tp)
+		return 0;
+
+	netdev = tp->netdev;
+	if (!netif_running(netdev))
+		return 0;
+
+	ret = usb_autopm_get_interface(intf);
+	if (ret < 0)
+		return ret;
+
+	napi_disable(&tp->napi);
+	clear_bit(WORK_ENABLE, &tp->flags);
+	usb_kill_urb(tp->intr_urb);
+	cancel_delayed_work_sync(&tp->schedule);
+	if (netif_carrier_ok(netdev)) {
+		netif_stop_queue(netdev);
+		mutex_lock(&tp->control);
+		tp->rtl_ops.disable(tp);
+		mutex_unlock(&tp->control);
+	}
+
+	usb_autopm_put_interface(intf);
+
+	return 0;
+}
+
+static int rtl8152_post_reset(struct usb_interface *intf)
+{
+	struct r8152 *tp = usb_get_intfdata(intf);
+	struct net_device *netdev;
+	int ret;
+
+	if (intf->condition != USB_INTERFACE_BOUND || !tp)
+		return 0;
+
+	netdev = tp->netdev;
+	if (!netif_running(netdev))
+		return 0;
+
+	ret = usb_autopm_get_interface(intf);
+	if (ret < 0)
+		return ret;
+
+	set_bit(WORK_ENABLE, &tp->flags);
+	if (netif_carrier_ok(netdev)) {
+		mutex_lock(&tp->control);
+		tp->rtl_ops.enable(tp);
+		rtl8152_set_rx_mode(netdev);
+		mutex_unlock(&tp->control);
+		netif_wake_queue(netdev);
+	}
+
+	napi_enable(&tp->napi);
+
+	usb_autopm_put_interface(intf);
+
+	return ret;
+}
+
 static int rtl8152_suspend(struct usb_interface *intf, pm_message_t message)
 {
struct r8152 *tp = usb_get_intfdata(intf);
@@ -4164,6 +4230,8 @@ static struct usb_driver rtl8152_driver = {
.suspend =  rtl8152_suspend,
.resume =   rtl8152_resume,
.reset_resume = rtl8152_resume,
+   .pre_reset =rtl8152_pre_reset,
+   .post_reset =   rtl8152_post_reset,
.supports_autosuspend = 1,
.disable_hub_initiated_lpm = 1,
 };
-- 
2.4.2


[PATCH net 0/2] r8152: device reset

Although the driver works normally, we find the device may get all 0xff data
when transmitting packets on certain platforms. This breaks the device, and no
packets can be transmitted. A reset is necessary to recover the hw in this
situation.

Hayes Wang (2):
  r8152: add pre_reset and post_reset
  r8152: reset device when tx timeout

 drivers/net/usb/r8152.c | 90 ++---
 1 file changed, 86 insertions(+), 4 deletions(-)

-- 
2.4.2


[PATCH] mac80211: fix invalid read in minstrel_sort_best_tp_rates()

2015-07-28 Thread Adrien Schildknecht

At the last iteration of the loop, j may equal zero and thus
tp_list[j - 1] causes an invalid read.
Changed the logic of the loop so that j - 1 is always >= 0.

Signed-off-by: Adrien Schildknecht adrien+...@schischi.me
---
 net/mac80211/rc80211_minstrel.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/mac80211/rc80211_minstrel.c b/net/mac80211/rc80211_minstrel.c
index 247552a..3ece7d1 100644
--- a/net/mac80211/rc80211_minstrel.c
+++ b/net/mac80211/rc80211_minstrel.c
@@ -92,14 +92,15 @@ int minstrel_get_tp_avg(struct minstrel_rate *mr, int prob_ewma)
 static inline void
 minstrel_sort_best_tp_rates(struct minstrel_sta_info *mi, int i, u8 *tp_list)
 {
-	int j = MAX_THR_RATES;
-	struct minstrel_rate_stats *tmp_mrs = &mi->r[j - 1].stats;
+	int j;
+	struct minstrel_rate_stats *tmp_mrs;
 	struct minstrel_rate_stats *cur_mrs = &mi->r[i].stats;
 
-	while (j > 0 && (minstrel_get_tp_avg(&mi->r[i], cur_mrs->prob_ewma) >
-	       minstrel_get_tp_avg(&mi->r[tp_list[j - 1]], tmp_mrs->prob_ewma))) {
-		j--;
+	for (j = MAX_THR_RATES; j > 0; --j) {
 		tmp_mrs = &mi->r[tp_list[j - 1]].stats;
+		if (minstrel_get_tp_avg(&mi->r[i], cur_mrs->prob_ewma) <=
+		    minstrel_get_tp_avg(&mi->r[tp_list[j - 1]], tmp_mrs->prob_ewma))
+			break;
 	}
 
 	if (j < MAX_THR_RATES - 1)
-- 
2.4.6
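
The invalid read described in the commit message can be reproduced in a simplified userspace model of the two loop shapes. This is only a sketch: plain int throughputs stand in for minstrel_get_tp_avg(), and list_at(), bad_read, pick_slot_old() and pick_slot_new() are hypothetical names. The accessor records an attempt to read index -1 instead of actually performing it:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THR_RATES 4

/* Set when the code under test tries to read list[-1]. */
static bool bad_read;

/* Checked accessor: records the out-of-range read instead of doing it. */
static int list_at(const int *list, int idx)
{
	if (idx < 0) {
		bad_read = true;
		return 0;
	}
	return list[idx];
}

/* Old loop shape: the body reloads list[j - 1] right after j--, so when
 * rate i beats every entry, the last pass runs with j == 0. */
static int pick_slot_old(const int *tp, const int *list, int i)
{
	int j = MAX_THR_RATES;
	int tmp = list_at(list, j - 1);

	assert(i >= 0);
	while (j > 0 && tp[i] > tp[tmp]) {
		j--;
		tmp = list_at(list, j - 1);
	}
	return j;
}

/* New loop shape: list[j - 1] is only read while j > 0 holds. */
static int pick_slot_new(const int *tp, const int *list, int i)
{
	int j;

	assert(i >= 0);
	for (j = MAX_THR_RATES; j > 0; --j) {
		int tmp = list_at(list, j - 1);

		if (tp[i] <= tp[tmp])
			break;
	}
	return j;
}
```

In this model both shapes return the same slot; only the old one attempts the out-of-range read, and only when the new rate is better than every rate already in the list.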


Re: [PATCH net-next 0/4] net/mlx4_en: Hardware accelerated 802.1ad

2015-07-28 Thread David Miller

From: Or Gerlitz ogerl...@mellanox.com
Date: Tue, 28 Jul 2015 10:51:08 +0300

 On 7/28/2015 1:00 AM, David Miller wrote:
 Series applied, thanks. 

 Hi Dave,

 I don't see this on your kernel.org clone.. maybe forgot to press on
 the push button?

It should be there now.

Re: [PATCH] Drivers: isdn: Drop unnecessary continue

2015-07-28 Thread Julia Lawall

The patch should have v2 in the subject line, and should have a
description of the change since the previous version under the ---

On Tue, 28 Jul 2015, Shraddha Barke wrote:

 The semantic patch used to make this change is :

 @@
 @@
 for (...;...;...) {
   ...
   if (...) {
 ...
 -   continue;
   }
 }

 Signed-off-by: Shraddha Barke shraddha.6...@gmail.com
 ---
  drivers/isdn/hardware/mISDN/hfcsusb.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

 diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c 
 b/drivers/isdn/hardware/mISDN/hfcsusb.c
 index 114f3bc..91beb83 100644
 --- a/drivers/isdn/hardware/mISDN/hfcsusb.c
 +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
 @@ -1921,10 +1921,9 @@ hfcsusb_probe(struct usb_interface *intf, const struct 
 usb_device_id *id)
   if ((le16_to_cpu(dev->descriptor.idVendor)
== hfcsusb_idtab[i].idVendor) &&
   (le16_to_cpu(dev->descriptor.idProduct)
 -  == hfcsusb_idtab[i].idProduct)) {
 +  == hfcsusb_idtab[i].idProduct))
   vend_idx = i;
 - continue;
 - }
 +

There is no need to add a blank line here.

julia

   }

   printk(KERN_DEBUG
 --
 2.1.0



[PATCH net-next 0/4] dwc_eth_qos: Add support for Synopsys DWC Ethernet QoS

This is a driver supporting version 4.10a of the Synopsys DWC Ethernet QoS
gigabit ethernet controller. The IP has changed significantly compared to the
dwmac1000 so a separate driver is justified.

The IP is highly configurable at synthesis time. This driver has been
developed for a subset of the total available feature set. Currently
it supports:
* TSO
* Checksum offload for RX and TX.
* Energy efficient ethernet.
* GMII phy interface.
* The statistics module.
* Single RX and TX queue.

Lars Persson (4):
  dwc_eth_qos: Add Synopsys DWC Ethernet QoS bindings
  dwc_eth_qos: Add support for Synopsys DWC Ethernet QoS
  dwc_eth_qos: Add the synopsys folder to the build system.
  dwc_eth_qos: Add maintainer info

 .../bindings/net/snps,dwc-qos-ethernet.txt |   75 +
 MAINTAINERS|7 +
 drivers/net/ethernet/Kconfig   |1 +
 drivers/net/ethernet/Makefile  |1 +
 drivers/net/ethernet/synopsys/Kconfig  |   27 +
 drivers/net/ethernet/synopsys/Makefile |5 +
 drivers/net/ethernet/synopsys/dwc_eth_qos.c| 3019 
 7 files changed, 3135 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
 create mode 100644 drivers/net/ethernet/synopsys/Kconfig
 create mode 100644 drivers/net/ethernet/synopsys/Makefile
 create mode 100644 drivers/net/ethernet/synopsys/dwc_eth_qos.c

-- 
2.1.4


[PATCH net-next 0/2] net: Initialize sk_hash to random value and reset for failing cnxs

This patch set implements a common function to simply set sk_txhash to
a random number instead of going through the trouble to call flow
dissector. From dst_negative_advice we now reset the sk_txhash in hopes
of finding a better ECMP path through the network. Changing sk_txhash
affects:
  - IPv6 flow label and UDP source port which affect ECMP in the network
  - Local ECMP route selection (pending changes to use sk_txhash)

Tom Herbert (2):
  net: Set sk_txhash from a random number
  net: Recompute sk_txhash on negative routing advice

 include/net/ip.h| 16 
 include/net/ipv6.h  | 19 ---
 include/net/sock.h  | 16 
 net/ipv4/datagram.c |  2 +-
 net/ipv4/tcp_ipv4.c |  4 ++--
 net/ipv6/datagram.c |  2 +-
 net/ipv6/tcp_ipv6.c |  4 ++--
 7 files changed, 22 insertions(+), 41 deletions(-)

-- 
1.8.1
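
For readers following the series, the idea in this cover letter (pick a random, non-zero sk_txhash instead of running the flow dissector) can be sketched in userspace C. This is only an illustration under stated assumptions: struct demo_sock, demo_nonzero_hash() and demo_set_txhash() are hypothetical names that do not appear in the patches, and rand() stands in for the kernel's prandom_u32().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the one struct sock field that matters here. */
struct demo_sock {
	uint32_t txhash;
};

/* Map a raw random value to a non-zero hash; zero is replaced by 1,
 * mirroring the "if (!hash) hash = 1" step in the patch. */
static uint32_t demo_nonzero_hash(uint32_t raw)
{
	return raw ? raw : 1;
}

/* May be called repeatedly on the same socket; each call simply
 * re-rolls the hash, which is what a re-hash on negative routing
 * advice would rely on. */
static void demo_set_txhash(struct demo_sock *sk)
{
	sk->txhash = demo_nonzero_hash((uint32_t)rand());
}
```

Because the value no longer depends on addresses or ports, calling the setter again is enough to steer the flow onto a potentially different path.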


[PATCH net-next 1/2] net: Set sk_txhash from a random number

This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txhash
may also be called multiple times for the same socket;
we'll need this when redoing the hash for negative routing advice.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 include/net/ip.h| 16 
 include/net/ipv6.h  | 19 ---
 include/net/sock.h  |  8 
 net/ipv4/datagram.c |  2 +-
 net/ipv4/tcp_ipv4.c |  4 ++--
 net/ipv6/datagram.c |  2 +-
 net/ipv6/tcp_ipv6.c |  4 ++--
 7 files changed, 14 insertions(+), 41 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index d5fe9f2..bee5f35 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -370,22 +370,6 @@ static inline void iph_to_flow_copy_v4addrs(struct 
flow_keys *flow,
flow->control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
 }
 
-static inline void inet_set_txhash(struct sock *sk)
-{
-   struct inet_sock *inet = inet_sk(sk);
-   struct flow_keys keys;
-
-   memset(&keys, 0, sizeof(keys));
-
-   keys.addrs.v4addrs.src = inet->inet_saddr;
-   keys.addrs.v4addrs.dst = inet->inet_daddr;
-   keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV4_ADDRS;
-   keys.ports.src = inet->inet_sport;
-   keys.ports.dst = inet->inet_dport;
-
-   sk->sk_txhash = flow_hash_from_keys(&keys);
-}
-
 static inline __wsum inet_gro_compute_pseudo(struct sk_buff *skb, int proto)
 {
const struct iphdr *iph = skb_gro_network_header(skb);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 82dbdb0..7c79798 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -707,25 +707,6 @@ static inline void iph_to_flow_copy_v6addrs(struct 
flow_keys *flow,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static inline void ip6_set_txhash(struct sock *sk)
-{
-   struct inet_sock *inet = inet_sk(sk);
-   struct ipv6_pinfo *np = inet6_sk(sk);
-   struct flow_keys keys;
-
-   memset(&keys, 0, sizeof(keys));
-
-   memcpy(&keys.addrs.v6addrs.src, &np->saddr,
-  sizeof(keys.addrs.v6addrs.src));
-   memcpy(&keys.addrs.v6addrs.dst, &sk->sk_v6_daddr,
-  sizeof(keys.addrs.v6addrs.dst));
-   keys.control.addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
-   keys.ports.src = inet->inet_sport;
-   keys.ports.dst = inet->inet_dport;
-
-   sk->sk_txhash = flow_hash_from_keys(&keys);
-}
-
 static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb,
__be32 flowlabel, bool autolabel)
 {
diff --git a/include/net/sock.h b/include/net/sock.h
index 4353ef7..fe735c4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1687,6 +1687,14 @@ static inline void sock_graft(struct sock *sk, struct 
socket *parent)
 kuid_t sock_i_uid(struct sock *sk);
 unsigned long sock_i_ino(struct sock *sk);
 
+static inline void sk_set_txhash(struct sock *sk)
+{
+   sk->sk_txhash = prandom_u32();
+
+   if (unlikely(!sk->sk_txhash))
+       sk->sk_txhash = 1;
+}
+
 static inline struct dst_entry *
 __sk_dst_get(struct sock *sk)
 {
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 574fad9..f915abf 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -74,7 +74,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr 
*uaddr, int addr_len
inet->inet_daddr = fl4->daddr;
inet->inet_dport = usin->sin_port;
sk->sk_state = TCP_ESTABLISHED;
-   inet_set_txhash(sk);
+   sk_set_txhash(sk);
inet->inet_id = jiffies;
 
sk_dst_set(sk, &rt->dst);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 486ba96..d27eb54 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -222,7 +222,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, 
int addr_len)
if (err)
goto failure;
 
-   inet_set_txhash(sk);
+   sk_set_txhash(sk);
 
rt = ip_route_newports(fl4, rt, orig_sport, orig_dport,
   inet->inet_sport, inet->inet_dport, sk);
@@ -1277,7 +1277,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct 
sk_buff *skb,
newinet->mc_ttl   = ip_hdr(skb)->ttl;
newinet->rcv_tos  = ip_hdr(skb)->tos;
inet_csk(newsk)->icsk_ext_hdr_len = 0;
-   inet_set_txhash(newsk);
+   sk_set_txhash(newsk);
if (inet_opt)
inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
newinet->inet_id = newtp->write_seq ^ jiffies;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2572a32..9aadd57 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -199,7 +199,7 @@ ipv4_connected:
  NULL);
 
sk->sk_state = TCP_ESTABLISHED;
-   ip6_set_txhash(sk);
+   sk_set_txhash(sk);
 out:
fl6_sock_release(flowlabel);
return err;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index

RE: [PATCH 1/5] Add functions producing system time given a backing counter value

2015-07-28 Thread Hall, Christopher S

-Original Message-
From: John Stultz [mailto:john.stu...@linaro.org]
Sent: Monday, July 27, 2015 8:44 PM
To: Hall, Christopher S
Cc: Thomas Gleixner; Richard Cochran; Ingo Molnar; Kirsher, Jeffrey T;
Ronciak, John; H. Peter Anvin; x...@kernel.org; lkml;
netdev@vger.kernel.org
Subject: Re: [PATCH 1/5] Add functions producing system time given a
backing counter value

On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
christopher.s.h...@intel.com wrote:
* counter_to_rawmono64
* counter_to_mono64
* counter_to_realtime64

Enables drivers to translate a captured system clock counter to system
time. This is useful for network and audio devices that capture
timestamps
in terms of both the system clock and device clock.

Huh. So for counter_to_realtime64 and mono64, this seems to ignore the
fact that the multiplier is constantly adjusted and corrected, so that
calling the function twice with the same counter value may result in
different returned values.

I've not yet grokked the whole patchset, but it seems like there needs
to be some mechanism that ensures the counter value is captured and
used in the same (or at least close) interval that the timekeeper data
is valid for.

The ART (and derived TSC) values are always in the past. There's no
chance that we could exceed the interval. I don't think any similar
usage would be a problem either.

Are you suggesting that, for completeness, this be enforced by the
conversion function?

I do a check here to make sure that the current counter value isn't before
the beginning of the current interval:

timekeeping_get_delta()
...
if (cycle_now < tkr->cycle_last &&
    tkr->cycle_last - cycle_now < ROLLOVER_THRESHOLD)
        return -EAGAIN;

If tkr->cycle_last - cycle_now is large, the assumption is that
rollover occurred. Otherwise, the caller should re-read the counter
so that it falls within the current interval. In my normal use
testing, re-read never occurred.
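
The check described above can be sketched as a standalone function. This is a minimal illustration, not the code from the patch: demo_check_cycle() and DEMO_ROLLOVER_THRESHOLD are hypothetical names, and the threshold value is an arbitrary assumption chosen only to make the example concrete.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed value for illustration; the real threshold is not given
 * in this thread. */
#define DEMO_ROLLOVER_THRESHOLD (1ULL << 31)

/* Returns -EAGAIN when the captured counter value lies slightly
 * before the start of the current interval (caller should re-read),
 * and 0 otherwise. A very large backwards distance is treated as a
 * counter rollover rather than a stale reading. */
static int demo_check_cycle(uint64_t cycle_now, uint64_t cycle_last)
{
	if (cycle_now < cycle_last &&
	    cycle_last - cycle_now < DEMO_ROLLOVER_THRESHOLD)
		return -EAGAIN;

	return 0;
}
```

A caller would loop on -EAGAIN, re-reading the counter until the value falls inside the current interval.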

Thanks for your input.

Chris

thanks
-john

Re: [PATCH iproute2 net-next] bridge: mdb: add support for vlans

2015-07-28 Thread Stephen Hemminger

On Tue, 28 Jul 2015 13:17:35 +0200
Nikolay Aleksandrov niko...@cumulusnetworks.com wrote:

 On 07/15/2015 05:45 PM, Nikolay Aleksandrov wrote:
  This patch allows the user to specify the vlan of the mdb group being
  added or deleted and adds support for displaying the vlan when
  dumping mdb information or monitoring it. It also updates the man page
  to reflect the new vid argument for mdb.
  
  Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
  ---
  note: the cast in print_mdb_entry() was necessary to shut the compiler
  
   bridge/mdb.c  | 31 +++
   include/linux/if_bridge.h |  1 +
   man/man8/bridge.8 |  8 +++-
   3 files changed, 27 insertions(+), 13 deletions(-)
  
 
 Hi Stephen,
 Just wondering what's the state of this patch because I'd like to submit some
 improvements in the same area and I'm wondering if I should do them on top
 of this patch or if I need to change something in it ?
 
 Thanks,
  Nik
 

Now on net-next branch of iproute2 since support is not in 4.2 kernel.

RE: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

2015-07-28 Thread Hall, Christopher S

 -Original Message-
 From: John Stultz [mailto:john.stu...@linaro.org]
 Sent: Monday, July 27, 2015 9:11 PM
 To: Hall, Christopher S
 Cc: Thomas Gleixner; Richard Cochran; Ingo Molnar; Kirsher, Jeffrey T;
 Ronciak, John; H. Peter Anvin; x...@kernel.org; lkml;
 netdev@vger.kernel.org
 Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
 (ART) to system time

 On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
 christopher.s.h...@intel.com wrote:
  +static bool checked_art_to_tsc(cycle_t *tsc)
  +{
  +   if (!has_art())
  +   return false;
  +   *tsc = art_to_tsc(*tsc);
  +   return true;
  +}
  +
  +static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
  +{
  +   if (!checked_art_to_tsc(&art))
  +   return -ENXIO;
  +   return tsc_to_rawmono64(rawmono, art);
  +}
  +EXPORT_SYMBOL(art_to_rawmono64);

 This all seems to assume the TSC is the current clocksource, which it
 may not be if the user has overridden it.

I don't make that assumption.  The counter_to_* functions take a
pointer to a clocksource struct.  They return -ENXIO if that clocksource
doesn’t match the current clocksource.

The tsc_to_* functions pass the tsc clocksource pointer to the counter_to_*
functions.  These tsc conversion functions are called by the art_to_*
functions.
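
As a rough sketch of the behaviour described here (the conversion refuses to run unless the caller's clocksource is the one currently in use), consider the userspace model below. All names (struct demo_clocksource, demo_current_cs, demo_counter_to_ns()) are hypothetical and are not the functions from the patch set; the cycles-to-nanoseconds step uses the usual mult/shift scaling and ignores the timekeeper's base time and any overflow handling.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, stripped-down clocksource description. */
struct demo_clocksource {
	const char *name;
	uint32_t mult;
	uint32_t shift;
};

/* The clocksource the (imaginary) timekeeper is currently using. */
static const struct demo_clocksource *demo_current_cs;

/* Convert a cycle count to nanoseconds, but only if the caller's
 * clocksource is the one currently in use; otherwise -ENXIO. */
static int demo_counter_to_ns(const struct demo_clocksource *cs,
			      uint64_t cycles, uint64_t *ns)
{
	if (cs != demo_current_cs)
		return -ENXIO;

	*ns = (cycles * cs->mult) >> cs->shift;
	return 0;
}
```

Comparing pointers rather than names keeps the check cheap, which is one way to address the concern about the cost of a string comparison raised in the quoted text.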

 If instead there were a counter_to_rawmono64() which took the counter
 value and maybe the name of the clocksource (if the strncmp is
 affordable for your use), it might be easier for the core to provide
 an error if the current timekeeping clocksource isn't the one the
 counter value is based on. This would also allow the tsc_to_*()
 midlayers to be dropped (since they don't seem to do much).

 thanks
 -john

Again, thanks for your input.

Chris

Re: [PATCH net-next v4] af_mpls: fix undefined reference to ip6_route_output

Hi roopa,

On Tue, Jul 28, 2015, at 21:28, roopa wrote:
 On 7/28/15, 6:04 AM, Hannes Frederic Sowa wrote:
  Can't you simply use ipv6_stub_impl.ipv6_dst_lookup with sk=NULL to do
  that and don't have a run-time dependency on IPv6 at all (for the cost
  of a function pointer).
   ipv6_stub_impl.ipv6_dst_lookup seems to require sk today.
 But it only needs it to get 'net' in the beginning and sk is optional 
 afterwards.
 I will submit a patch to add 'net' as an arg  to ipv6_dst_lookup.
 Users of ipv6_dst_lookup are few and that seems like an easy change and 
 helps my patch.
 If you or others think otherwise, pls let me know.

No need to extend this function at any cost. Simply add your own
function pointer to the struct if needed.

Probably you have to move the ipv6_stub = ipv6_stub_impl;
initialization in inet6_init down so you don't expose the function
pointer too early and thus it races with initialization (and error
handling seems to be incorrect in this function, too).

Thanks,
Hannes
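
The two suggestions above (add a dedicated function pointer to the stub struct, and publish the stub only once initialization has finished) can be modelled in userspace as follows. This is a sketch under stated assumptions: every demo_* name is hypothetical, the lookup is a dummy, and a real kernel implementation would also have to consider memory ordering and error unwinding, which are omitted here.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stub table with one lookup operation. */
struct demo_ipv6_stub {
	int (*dst_lookup)(int net_id, int *dev_out);
};

/* NULL until the IPv6 side has finished initializing. */
static const struct demo_ipv6_stub *demo_stub;

/* Dummy lookup: pretends the output device index is net_id + 100. */
static int demo_real_dst_lookup(int net_id, int *dev_out)
{
	*dev_out = net_id + 100;
	return 0;
}

static const struct demo_ipv6_stub demo_stub_impl = {
	.dst_lookup = demo_real_dst_lookup,
};

/* All other setup would run first; the stub is published last so
 * that callers never see a half-initialized implementation. */
static int demo_inet6_init(void)
{
	demo_stub = &demo_stub_impl;
	return 0;
}

/* Caller side: no link-time dependency, only a pointer check. */
static int demo_lookup_dev(int net_id, int *dev_out)
{
	if (!demo_stub)
		return -EAFNOSUPPORT;

	return demo_stub->dst_lookup(net_id, dev_out);
}
```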


Re: [PATCH net-next v4] af_mpls: fix undefined reference to ip6_route_output

2015-07-28 Thread roopa


On 7/28/15, 3:22 PM, Hannes Frederic Sowa wrote:

Hi roopa,

On Tue, Jul 28, 2015, at 21:28, roopa wrote:

   ipv6_stub_impl.ipv6_dst_lookup seems to require sk today.
But it only needs it to get 'net' in the beginning and sk is optional
afterwards.
I will submit a patch to add 'net' as an arg  to ipv6_dst_lookup.
Users of ipv6_dst_lookup are few and that seems like an easy change and
helps my patch.
If you or others think otherwise, pls let me know.

No need to extend this function at any cost. Simply add your own
function pointer to the struct if needed.


I saw this email of yours after I hit send on the series. Since the new
function pointer would be exactly like ipv6_dst_lookup
with just one additional argument, a new function pointer does not seem
necessary. But I can certainly change it
to a new function pointer and resend if that is more acceptable.


Probably you have to move the ipv6_stub = ipv6_stub_impl;
initialization in inet6_init down so you don't expose the function
pointer too early and thus it races with initialization (and error
handling seems to be incorrect in this function, too).


ok, will look.

thanks,
Roopa



Re: [PATCH net] bridge: mdb: fix delmdb state in the notification

2015-07-28 Thread Cong Wang

On Tue, Jul 28, 2015 at 4:10 AM, Nikolay Aleksandrov
ra...@blackwall.org wrote:
 From: Nikolay Aleksandrov niko...@cumulusnetworks.com

 Since mdb states were introduced when deleting an entry the state was
 left as it was set in the delete request from the user which leads to
 the following output when doing a monitor (for example):
 $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
 (monitor) dev br0 port eth3 grp 239.0.0.1 permanent
 $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
 (monitor) dev br0 port eth3 grp 239.0.0.1 temp
 ^^^
 Note the temp state in the delete notification which is wrong since
 the entry was permanent, the state in a delete is always reported as
 temp regardless of the real state of the entry.


Hmm?

I think it is iproute2 that forgets to set entry->state when deleting the entry?

} else if (strcmp(*argv, "permanent") == 0) {
        if (cmd == RTM_NEWMDB)
                entry.state |= MDB_PERMANENT;

Kernel simply returns what you pass to it.

Please fix iproute2.

RE: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

2015-07-28 Thread Hall, Christopher S



 -Original Message-
 From: Andy Lutomirski [mailto:l...@kernel.org]
 Sent: Monday, July 27, 2015 6:32 PM
 To: Hall, Christopher S; john.stu...@linaro.org; t...@linutronix.de;
 richardcoch...@gmail.com; mi...@redhat.com; Kirsher, Jeffrey T; Ronciak,
 John; h...@zytor.com; x...@kernel.org
 Cc: linux-ker...@vger.kernel.org; netdev@vger.kernel.org; Borislav
 Petkov
 Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
 (ART) to system time
 
 On 07/27/2015 05:46 PM, Christopher Hall wrote:
  * art_to_mono64
  * art_to_rawmono64
  * art_to_realtime64
 
  Intel audio and PCH ethernet devices use the Always Running Timer
 (ART) to
  relate their device clock to system time
 
  Signed-off-by: Christopher Hall christopher.s.h...@intel.com
  ---
arch/x86/Kconfig   |  12 
arch/x86/include/asm/art.h |  42 ++
arch/x86/kernel/Makefile   |   1 +
arch/x86/kernel/art.c  | 134
 +
arch/x86/kernel/tsc.c  |   4 ++
5 files changed, 193 insertions(+)
create mode 100644 arch/x86/include/asm/art.h
create mode 100644 arch/x86/kernel/art.c
 
  diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
  index b3a1a5d..1ef9985 100644
  --- a/arch/x86/Kconfig
  +++ b/arch/x86/Kconfig
  @@ -1175,6 +1175,18 @@ config X86_CPUID
with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
/dev/cpu/31/cpuid.
 
  +config X86_ART
  +   bool Always Running Timer
  +   default y
  +   depends on X86_TSC
  +   ---help---
  + This option provides functionality to drivers and devices that
 use
  + the always-running-timer (ART) to correlate their device clock
  + counter with the system clock counter. The TSC is *exactly*
 related
  + to the ART by a ratio m/n specified by CPUID leaf 0x15
  + (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
  + performance impact. It's safe to say Y.
  +
 
 Is there a good reason to make this optional?

If there aren't any objections, that sounds OK to me.  So no, I don't know
of any good reason to keep it optional.

 
 Also, is there *still* no way to ask the thing for its nominal
 frequency?  Or can we expect CPUID leaf 16H to work on CPUs that
 support this and can we expect it to actually work?  

There isn't any way to query nominal frequency.  CPUID leaf 0x15 only
exposes the relationship between ART and TSC.  CPUID leaf 0x16 stays
more or less the same and isn't related to ART.

The SDM says The
 returned information should not be used for any other purpose as the
 returned information does not accurately correlate to information /
 counters returned by other processor interfaces.
 
 Also, does this thing let us learn the real time base?  SDM 17.14.4
 suggests that the ART value isn't affected by privileged software (aka
 buggy/malicious firmware).  Or, alternatively, how do we learn the
 offset K between ART and scaled TSC?

ART isn't affected by software.  The determination of K used to convert ART to
TSC is in a footnote (2) in that section of the SDM.  I'm not going to risk
repeating it here and possibly altering its meaning.
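
For orientation only, the ratio described in the quoted Kconfig text (TSC related to ART by m/n from CPUID leaf 0x15, with n in EAX and m in EBX) can be applied as in the sketch below. demo_art_to_tsc() is a hypothetical helper, not the art_to_tsc() from the patch, and the offset k is simply taken as a caller-supplied parameter; how K is determined is left to the SDM footnote mentioned above.

```c
#include <errno.h>
#include <stdint.h>

/* Scale an ART value to TSC units: tsc = art * m / n + k.
 * The multiplication is split into quotient and remainder parts so
 * that art * m does not have to fit in 64 bits. Returns -ENXIO when
 * the ratio is unusable (either term is zero). */
static int demo_art_to_tsc(uint64_t art, uint32_t m, uint32_t n,
			   uint64_t k, uint64_t *tsc)
{
	uint64_t whole, frac;

	if (m == 0 || n == 0)
		return -ENXIO;

	whole = (art / n) * m;        /* contribution of full multiples of n */
	frac = ((art % n) * m) / n;   /* contribution of the remainder */

	*tsc = whole + frac + k;
	return 0;
}
```

The result is rounded down, so two ART values that differ by less than n/m ticks can map to the same TSC value.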

 
choice
  prompt High Memory Support
  default HIGHMEM4G
  diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
  new file mode 100644
  index 000..da58ce4
  --- /dev/null
  +++ b/arch/x86/include/asm/art.h
  @@ -0,0 +1,42 @@
  +/*
  + * x86 ART related functions
  + */
  +#ifndef _ASM_X86_ART_H
  +#define _ASM_X86_ART_H
  +
  +#ifndef CONFIG_X86_ART
  +
  +static inline int setup_art(void)
  +{
  +   return 0;
  +}
  +
  +static inline bool has_art(void)
  +{
  +   return false;
  +}
  +
  +static inline int art_to_rawmono64(struct timespec64 *rawmono,
 cycle_t art)
  +{
  +   return -ENXIO;
  +}
  +static inline int art_to_realtime64(struct timespec64 *realtime,
 cycle_t art)
  +{
  +   return -ENXIO;
  +}
  +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
  +{
  +   return -ENXIO;
  +}
  +
  +#else
  +
  +extern int setup_art(void);
  +extern bool has_art(void);
  +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
  +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t
 art);
  +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
  +
  +#endif
  +
  +#endif/*_ASM_X86_ART_H*/
  diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
  index 0f15af4..0908311 100644
  --- a/arch/x86/kernel/Makefile
  +++ b/arch/x86/kernel/Makefile
  @@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS)   +=
 perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_IOSF_MBI)+= iosf_mbi.o
obj-$(CONFIG_PMC_ATOM)+= pmc_atom.o
  +obj-$(CONFIG_X86_ART)  += art.o
 
###
# 64 bit specific files
  diff --git a/arch/x86/kernel/art.c b/arch/x86/kernel/art.c
  new file mode 100644
  index 000..1906cf0
  --- /dev/null
  +++

[PATCH 0/3] net: netcp: bug fixes for dynamic module support

2015-07-28 Thread Murali Karicheri

This series fixes a few bugs to allow keystone netcp modules to be
dynamically loaded and removed. Currently it allows the following
sequence to be repeated multiple times:
  
 insmod cpsw_ale.ko
 insmod davinci_mdio.ko
 insmod keystone_netcp.ko
 insmod keystone_netcp_ethss.ko
 ifup eth0
 ifup eth1
 ping hosts on eth0
 ping hosts on eth1
 ifdown eth1
 ifdown eth0
 rmmod keystone_netcp_ethss.ko
 rmmod keystone_netcp.ko
 rmmod davinci_mdio.ko
 rmmod cpsw_ale.ko

Murali Karicheri (3):
  net: netcp: fix cleanup interface list in netcp_remove()
  net: netcp: ethss: fix up incorrect use of list api
  net: netcp: ethss: cleanup gbe_probe() and gbe_remove() functions

 drivers/net/ethernet/ti/netcp_core.c  | 14 +++---
 drivers/net/ethernet/ti/netcp_ethss.c | 49 ++-
 2 files changed, 30 insertions(+), 33 deletions(-)

-- 
1.9.1


[PATCH 2/3] net: netcp: ethss: fix up incorrect use of list api

2015-07-28 Thread Murali Karicheri

The code assumes that first_sec_slave() returns NULL when the list is
empty and uses that to break out of the loop, which is incorrect. Fix the
code by using list_empty().

Signed-off-by: Murali Karicheri m-kariche...@ti.com
---
 drivers/net/ethernet/ti/netcp_ethss.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_ethss.c 
b/drivers/net/ethernet/ti/netcp_ethss.c
index 9b7e0a3..77bcfca 100644
--- a/drivers/net/ethernet/ti/netcp_ethss.c
+++ b/drivers/net/ethernet/ti/netcp_ethss.c
@@ -2490,10 +2490,9 @@ static void free_secondary_ports(struct gbe_priv 
*gbe_dev)
 {
struct gbe_slave *slave;
 
-   for (;;) {
+   while (!list_empty(&gbe_dev->secondary_slaves)) {
slave = first_sec_slave(gbe_dev);
-   if (!slave)
-   break;
+
if (slave->phy)
phy_disconnect(slave->phy);
list_del(&slave->slave_list);
-- 
1.9.1
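
To illustrate why the NULL test in the old loop could never fire, here is a small userspace model of a circular doubly linked list in the style of the kernel's list API. It is a sketch, not kernel code: all demo_* names are hypothetical. In such a list the head's next pointer refers back to the head when the list is empty, so a "first entry" helper built on it yields a non-NULL (and bogus) pointer rather than NULL; the drain loop therefore has to test for emptiness explicitly.

```c
#include <stddef.h>

/* Minimal circular doubly linked list node. */
struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_list_init(struct demo_list *h)
{
	h->next = h;
	h->prev = h;
}

/* Empty means the head points back at itself. */
static int demo_list_empty(const struct demo_list *h)
{
	return h->next == h;
}

static void demo_list_add(struct demo_list *n, struct demo_list *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void demo_list_del(struct demo_list *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = e->prev = e;
}

struct demo_slave {
	int id;
	struct demo_list node;
};

/* Only meaningful on a non-empty list: on an empty one this would
 * compute a pointer derived from the head itself, never NULL. */
static struct demo_slave *demo_first_slave(struct demo_list *head)
{
	return (struct demo_slave *)((char *)head->next -
				     offsetof(struct demo_slave, node));
}

/* Drain the list, checking emptiness before taking the first entry.
 * Returns the number of entries removed. */
static int demo_free_all(struct demo_list *head)
{
	int n = 0;

	while (!demo_list_empty(head)) {
		struct demo_slave *s = demo_first_slave(head);

		demo_list_del(&s->node);
		n++;
	}
	return n;
}
```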


[PATCH 1/3] net: netcp: fix cleanup interface list in netcp_remove()

2015-07-28 Thread Murali Karicheri

Currently, if the user runs rmmod keystone_netcp.ko, the following warning is
seen:

[   59.035891] [ cut here ]
[   59.040535] WARNING: CPU: 2 PID: 1619 at drivers/net/ethernet/ti/
netcp_core.c:2127 netcp_remove)

This is because the interface list is not cleaned up in netcp_remove.
This patch fixes this. Also fix some checkpatch related warnings.

Signed-off-by: Murali Karicheri m-kariche...@ti.com
---
 drivers/net/ethernet/ti/netcp_core.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/netcp_core.c 
b/drivers/net/ethernet/ti/netcp_core.c
index ec8ed30..a1c6961 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -2112,6 +2112,7 @@ probe_quit:
 static int netcp_remove(struct platform_device *pdev)
 {
struct netcp_device *netcp_device = platform_get_drvdata(pdev);
+   struct netcp_intf *netcp_intf, *netcp_tmp;
struct netcp_inst_modpriv *inst_modpriv, *tmp;
struct netcp_module *module;
 
@@ -2123,8 +2124,16 @@ static int netcp_remove(struct platform_device *pdev)
list_del(&inst_modpriv->inst_list);
kfree(inst_modpriv);
}
-   WARN(!list_empty(&netcp_device->interface_head), "%s interface list not empty!\n",
-pdev->name);
+
+   /* now that all modules are removed, clean up the interfaces */
+   list_for_each_entry_safe(netcp_intf, netcp_tmp,
+&netcp_device->interface_head,
+interface_list) {
+   netcp_delete_interface(netcp_device, netcp_intf->ndev);
+   }
+
+   WARN(!list_empty(&netcp_device->interface_head),
+"%s interface list not empty!\n", pdev->name);
 
devm_kfree(&pdev->dev, netcp_device);
pm_runtime_put_sync(&pdev->dev);
-- 
1.9.1


[PATCH net-next v5 2/2] af_mpls: fix undefined reference to ip6_route_output

2015-07-28 Thread Roopa Prabhu

From: Roopa Prabhu ro...@cumulusnetworks.com

Undefined reference to ip6_route_output and ip_route_output
was reported with CONFIG_INET=n and CONFIG_IPV6=n.

This patch uses ipv6_stub_impl.ipv6_dst_lookup instead of
ip6_route_output. And wraps affected code under
IS_ENABLED(CONFIG_INET) and IS_ENABLED(CONFIG_IPV6).

Reported-by: kbuild test robot fengguang...@intel.com
Reported-by: Thomas Graf tg...@suug.ch
Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/mpls/af_mpls.c |   39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 49f1b0e..1c82888 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -15,7 +15,10 @@
 #include <net/ip_fib.h>
 #include <net/netevent.h>
 #include <net/netns/generic.h>
-#include <net/ip6_route.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/ipv6.h>
+#include <net/addrconf.h>
+#endif
 #include "internal.h"
 
 #define LABEL_NOT_SPECIFIED (1<<20)
@@ -331,6 +334,7 @@ static unsigned find_free_label(struct net *net)
return LABEL_NOT_SPECIFIED;
 }
 
+#if IS_ENABLED(CONFIG_INET)
 static struct net_device *inet_fib_lookup_dev(struct net *net, void *addr)
 {
struct net_device *dev = NULL;
@@ -347,30 +351,47 @@ static struct net_device *inet_fib_lookup_dev(struct net 
*net, void *addr)
 
ip_rt_put(rt);
 
-errout:
return dev;
+errout:
+   return ERR_PTR(-ENODEV);
 }
+#else
+static struct net_device *inet_fib_lookup_dev(struct net *net, void *addr)
+{
+   return ERR_PTR(-EAFNOSUPPORT);
+}
+#endif
 
+#if IS_ENABLED(CONFIG_IPV6)
 static struct net_device *inet6_fib_lookup_dev(struct net *net, void *addr)
 {
struct net_device *dev = NULL;
struct dst_entry *dst;
struct flowi6 fl6;
 
+   if (!ipv6_stub)
+   return ERR_PTR(-EAFNOSUPPORT);
+
memset(&fl6, 0, sizeof(fl6));
memcpy(&fl6.daddr, addr, sizeof(struct in6_addr));
-   dst = ip6_route_output(net, NULL, &fl6);
-   if (dst->error)
+   if (ipv6_stub->ipv6_dst_lookup(net, NULL, &dst, &fl6))
goto errout;
 
dev = dst->dev;
dev_hold(dev);
-
-errout:
dst_release(dst);
 
return dev;
+
+errout:
+   return ERR_PTR(-ENODEV);
 }
+#else
+static struct net_device *inet6_fib_lookup_dev(struct net *net, void *addr)
+{
+   return ERR_PTR(-EAFNOSUPPORT);
+}
+#endif
 
 static struct net_device *find_outdev(struct net *net,
  struct mpls_route_config *cfg)
@@ -425,10 +446,12 @@ static int mpls_route_add(struct mpls_route_config *cfg)
if (cfg->rc_output_labels > MAX_NEW_LABELS)
goto errout;
 
-   err = -ENODEV;
dev = find_outdev(net, cfg);
-   if (!dev)
+   if (IS_ERR(dev)) {
+   err = PTR_ERR(dev);
+   dev = NULL;
goto errout;
+   }
 
/* Ensure this is a supported device */
err = -EINVAL;
-- 
1.7.10.4
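
The patch above switches the lookup helpers from returning NULL to returning an error encoded in the pointer, so that the caller can distinguish "no such device" from "address family not supported". A userspace model of that convention might look like the sketch below. The demo_* helpers are hypothetical re-creations written for illustration and are not the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() definitions; the limit of 4095 is an assumption made for this example.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed upper bound on errno values for this sketch. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Decode the errno value again. */
static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Error pointers occupy the top DEMO_MAX_ERRNO addresses. */
static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_dev {
	int ifindex;
};

static struct demo_dev demo_eth0 = { .ifindex = 2 };

/* Lookup that reports failure through the returned pointer. */
static struct demo_dev *demo_find_outdev(int have_route)
{
	if (!have_route)
		return demo_err_ptr(-ENODEV);

	return &demo_eth0;
}

/* Caller propagates whatever error the lookup encoded. */
static int demo_route_add(int have_route)
{
	struct demo_dev *dev = demo_find_outdev(have_route);

	if (demo_is_err(dev))
		return (int)demo_ptr_err(dev);

	return 0;
}
```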


[PATCH net-next v5 0/2] af_mpls: fix undefined reference to ip6_route_output with CONFIG_IPV6=n

2015-07-28 Thread Roopa Prabhu

From: Roopa Prabhu ro...@cumulusnetworks.com

This patch series uses ipv6_stub_impl.ipv6_dst_lookup instead of
ip6_route_output, following the vxlan driver's usage of
ipv6_stub_impl.ipv6_dst_lookup.

There is no sk in the af_mpls context from where
ipv6_stub_impl.ipv6_dst_lookup is used. sk appears to be needed
to get the namespace 'net' and is optional otherwise. This patch series
changes ipv6_stub_impl.ipv6_dst_lookup to take net argument. sk remains
optional.

The case of CONFIG_IPV6=m and MPLS_ROUTING=y is covered by checking
that ipv6_stub is not NULL. I have tested this case for proper return
values to the user. (I don't see an ipv6_stub NULL check in
the vxlan driver. I will test it separately and submit a patch
for the vxlan driver if needed.)

v1 -> v2: use IS_BUILTIN

v2 -> v3: Use new Kconfig option that depends on (IPV6 || IPV6=n) as
 suggested by Dave. Also uses IS_ERR as suggested by Thomas.

v3 -> v4: Include missed case of (MPLS_ROUTING=y && IPV6=m) reported by
 Dave.

v4 -> v5: Use ipv6_stub_impl.ipv6_dst_lookup as suggested by Hannes


Dave, v4 uses a new Kconfig option and v5 uses ipv6_stub_impl.ipv6_dst_lookup,
which looks like it was added for the vxlan driver for a similar use case. Thanks, and
apologies for the iterations on this.

Roopa Prabhu (2):
  ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument
  af_mpls: fix undefined reference to ip6_route_output

 drivers/net/vxlan.c|2 +-
 include/net/addrconf.h |4 ++--
 include/net/ipv6.h |3 ++-
 net/ipv6/icmp.c|6 +++---
 net/ipv6/ip6_output.c  |   15 ---
 net/mpls/af_mpls.c |   39 +++
 net/tipc/udp_media.c   |3 ++-
 7 files changed, 49 insertions(+), 23 deletions(-)

-- 
1.7.10.4


[PATCH net-next v5 1/2] ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument

2015-07-28 Thread Roopa Prabhu

From: Roopa Prabhu ro...@cumulusnetworks.com

This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup
for use cases where sk is not available (like mpls).
sk appears to be needed to get the namespace 'net' and is optional
otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup
to take net argument. sk remains optional.

All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified
to pass net. I have modified them to use the 'net' already available
in the scope of the call. I can change them to
sock_net(sk) to avoid any unintended change in behaviour if the sock
namespace is different. From code inspection, they don't seem to differ.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 drivers/net/vxlan.c|2 +-
 include/net/addrconf.h |4 ++--
 include/net/ipv6.h |3 ++-
 net/ipv6/icmp.c|6 +++---
 net/ipv6/ip6_output.c  |   12 ++--
 net/tipc/udp_media.c   |3 ++-
 6 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 81f0f24..beed5d4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -2034,7 +2034,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
fl6.flowi6_mark = skb->mark;
fl6.flowi6_proto = IPPROTO_UDP;
 
-   if (ipv6_stub->ipv6_dst_lookup(sk, &ndst, &fl6)) {
+   if (ipv6_stub->ipv6_dst_lookup(vxlan->net, sk, &ndst, &fl6)) {
netdev_dbg(dev, "no route to %pI6\n",
   &dst->sin6.sin6_addr);
dev->stats.tx_carrier_errors++;
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index def59d3..0c3ac5a 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -158,8 +158,8 @@ struct ipv6_stub {
 const struct in6_addr *addr);
int (*ipv6_sock_mc_drop)(struct sock *sk, int ifindex,
 const struct in6_addr *addr);
-   int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
-   struct flowi6 *fl6);
+   int (*ipv6_dst_lookup)(struct net *net, struct sock *sk,
+  struct dst_entry **dst, struct flowi6 *fl6);
void (*udpv6_encap_enable)(void);
void (*ndisc_send_na)(struct net_device *dev, struct neighbour *neigh,
  const struct in6_addr *daddr,
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 82dbdb0..09d0ea4 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -832,7 +832,8 @@ static inline struct sk_buff *ip6_finish_skb(struct sock *sk)
  &inet6_sk(sk)->cork);
 }
 
-int ip6_dst_lookup(struct sock *sk, struct dst_entry **dst, struct flowi6 *fl6);
+int ip6_dst_lookup(struct net *net, struct sock *sk, struct dst_entry **dst,
+  struct flowi6 *fl6);
 struct dst_entry *ip6_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
  const struct in6_addr *final_dst);
 struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 713d743..6c2b213 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -329,7 +329,7 @@ static struct dst_entry *icmpv6_route_lookup(struct net *net,
struct flowi6 fl2;
int err;
 
-   err = ip6_dst_lookup(sk, &dst, fl6);
+   err = ip6_dst_lookup(net, sk, &dst, fl6);
if (err)
return ERR_PTR(err);
 
@@ -361,7 +361,7 @@ static struct dst_entry *icmpv6_route_lookup(struct net *net,
if (err)
goto relookup_failed;
 
-   err = ip6_dst_lookup(sk, &dst2, &fl2);
+   err = ip6_dst_lookup(net, sk, &dst2, &fl2);
if (err)
goto relookup_failed;
 
@@ -591,7 +591,7 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
else if (!fl6.flowi6_oif)
fl6.flowi6_oif = np->ucast_oif;
 
-   err = ip6_dst_lookup(sk, &dst, &fl6);
+   err = ip6_dst_lookup(net, sk, &dst, &fl6);
if (err)
goto out;
dst = xfrm_lookup(net, dst, flowi6_to_flowi(&fl6), sk, 0);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c5fc852..92b7cf0 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -881,10 +881,9 @@ out:
return dst;
 }
 
-static int ip6_dst_lookup_tail(struct sock *sk,
+static int ip6_dst_lookup_tail(struct net *net, struct sock *sk,
   struct dst_entry **dst, struct flowi6 *fl6)
 {
-   struct net *net = sock_net(sk);
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
struct neighbour *n;
struct rt6_info *rt;
@@ -994,10 +993,11 @@ out_err_release:
  *
  * It returns zero on success, or a standard errno code on error.
  */
-int ip6_dst_lookup(struct sock *sk, struct dst_entry **dst, struct flowi6 *fl6)
+int ip6_dst_lookup(struct net *net, struct sock *sk, struct dst_entry **dst,
+

Re: [PATCH net] bridge: mdb: fix delmdb state in the notification

On 07/29/2015 12:38 AM, Cong Wang wrote:
 On Tue, Jul 28, 2015 at 4:10 AM, Nikolay Aleksandrov
 ra...@blackwall.org wrote:
 From: Nikolay Aleksandrov niko...@cumulusnetworks.com

 Since mdb states were introduced when deleting an entry the state was
 left as it was set in the delete request from the user which leads to
 the following output when doing a monitor (for example):
 $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
 (monitor) dev br0 port eth3 grp 239.0.0.1 permanent
 $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
 (monitor) dev br0 port eth3 grp 239.0.0.1 temp
 ^^^
 Note the temp state in the delete notification which is wrong since
 the entry was permanent, the state in a delete is always reported as
 temp regardless of the real state of the entry.

 
 Hmm?
 
 I think it is iproute2 who forgets to set entry->state when deleting it?
 
 } else if (strcmp(*argv, "permanent") == 0) {
 if (cmd == RTM_NEWMDB)
 entry.state |= MDB_PERMANENT;
 
 Kernel simply returns what you pass to it.
 
 Please fix iproute2.
 

Hi Cong,
Please read the full commit log. I've explained that the kernel does not
honor the state in a delete request, so it doesn't matter whether iproute2
passes along the state given on the command line. If I give it temp and the
entry is permanent, the entry still gets deleted and the notification carries
the wrong state (temp), because that is what I set. With this patch the
notification at least returns the real state of the entry being deleted.
Again, I chose this solution over checking the entry state because such a
check may break user-space tools that rely on the kernel not checking the
state.

Cheers,
 Nik
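
To make the behaviour discussed in this reply concrete, here is a small
user-space model. It is only a sketch under stated assumptions: struct
mdb_entry, its fields and mdb_del() are invented for illustration and are
not the bridge code, and the two state values simply mirror the temp and
permanent states mentioned in the thread. The delete matches on the group
only, ignores the state in the request, and fills the notification from the
stored entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MDB_TEMPORARY 0
#define MDB_PERMANENT 1

/* Hypothetical, simplified mdb entry: group address as text plus a state. */
struct mdb_entry {
	char group[16];
	unsigned char state;
	bool in_use;
};

/*
 * Delete the entry whose group matches req->group. req->state is
 * deliberately not compared, matching the "state is not checked" behaviour
 * described above. The notification is copied from the stored entry, so it
 * reports the real state rather than the state in the request.
 */
bool mdb_del(struct mdb_entry *table, size_t n,
	     const struct mdb_entry *req, struct mdb_entry *notify)
{
	assert(table && req && notify);
	for (size_t i = 0; i < n; i++) {
		if (!table[i].in_use ||
		    strcmp(table[i].group, req->group) != 0)
			continue;
		*notify = table[i];
		table[i].in_use = false;
		return true;
	}
	return false;
}
```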

[PATCH net-next 2/2] net: Recompute sk_txhash on negative routing advice

When a connection is failing a transport protocol calls
dst_negative_advice to try to get a better route. This patch includes
changing the sk_txhash in that function. This provides a rudimentary
method to try to find a different path in the network since sk_txhash
affects ECMP on the local host and through the network (via flow labels
or UDP source port in encapsulation).

Signed-off-by: Tom Herbert t...@herbertland.com
---
 include/net/sock.h | 8 
 1 file changed, 8 insertions(+)
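
For illustration, a user-space sketch of the idea in this patch: when
negative routing advice arrives, a socket that already has a transmit hash
picks a new one, and a socket without one is left alone. struct sock_model
and both function names are hypothetical, and rand() stands in for whatever
random source the kernel uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the one field that matters here. */
struct sock_model { uint32_t txhash; };

/* Pick a new hash; 0 is reserved to mean "no hash", so never store it. */
void model_set_txhash(struct sock_model *sk)
{
	uint32_t h = (uint32_t)rand();

	assert(sk != NULL);
	sk->txhash = h ? h : 1;
}

/* Only sockets that already use a hash get a new one. */
void model_rethink_txhash(struct sock_model *sk)
{
	assert(sk != NULL);
	if (sk->txhash)
		model_set_txhash(sk);
}
```

Because the hash feeds ECMP selection, a new value gives the flow a chance,
not a guarantee, of taking a different path.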

diff --git a/include/net/sock.h b/include/net/sock.h
index fe735c4..24aa75c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1695,6 +1695,12 @@ static inline void sk_set_txhash(struct sock *sk)
sk->sk_txhash = 1;
 }
 
+static inline void sk_rethink_txhash(struct sock *sk)
+{
+   if (sk->sk_txhash)
+   sk_set_txhash(sk);
+}
+
 static inline struct dst_entry *
 __sk_dst_get(struct sock *sk)
 {
@@ -1719,6 +1725,8 @@ static inline void dst_negative_advice(struct sock *sk)
 {
struct dst_entry *ndst, *dst = __sk_dst_get(sk);
 
+   sk_rethink_txhash(sk);
+
if (dst && dst->ops->negative_advice) {
ndst = dst->ops->negative_advice(dst);
 
-- 
1.8.1


Re: [PATCH iproute2 v7 4/4] ip link: proto_down config and display.

2015-07-28 Thread Stephen Hemminger

On Tue, 14 Jul 2015 13:43:22 -0700
anurad...@cumulusnetworks.com wrote:

 From: Anuradha Karuppiah anurad...@cumulusnetworks.com
 
 This patch adds support to set and display protodown on a switch port. The
 switch driver can handle this error state by doing a phys down on the port.
 
 One example user space application setting this flag is a multi-chassis
 LAG application to handle split-brain situation on peer-link failure.
 
 Example:
 root@net-next:~# ip link set eth1 protodown on
 root@net-next:~/iproute2# ip link show eth1
 4: eth1: BROADCAST,MULTICAST mtu 1500 qdisc noop state DOWN mode DEFAULT 
 group default qlen 1000
 link/ether 52:54:00:12:35:01 brd ff:ff:ff:ff:ff:ff protodown on
 root@net-next:~/iproute2# ip link set eth1 protodown off
 root@net-next:~/iproute2# ip link show eth1
 4: eth1: BROADCAST,MULTICAST mtu 1500 qdisc noop state DOWN mode DEFAULT 
 group default qlen 1000
 link/ether 52:54:00:12:35:01 brd ff:ff:ff:ff:ff:ff
 root@net-next:~/iproute2#
 
 Signed-off-by: Anuradha Karuppiah anurad...@cumulusnetworks.com
 Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
 Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
 Signed-off-by: Wilson Kok w...@cumulusnetworks.com

Applied to net-next branch.

Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

2015-07-28 Thread Andy Lutomirski

On Tue, Jul 28, 2015 at 6:18 PM, Hall, Christopher S
christopher.s.h...@intel.com wrote:


 -Original Message-
 From: Andy Lutomirski [mailto:l...@kernel.org]
 Sent: Monday, July 27, 2015 6:32 PM
 To: Hall, Christopher S; john.stu...@linaro.org; t...@linutronix.de;
 richardcoch...@gmail.com; mi...@redhat.com; Kirsher, Jeffrey T; Ronciak,
 John; h...@zytor.com; x...@kernel.org
 Cc: linux-ker...@vger.kernel.org; netdev@vger.kernel.org; Borislav
 Petkov
 Subject: Re: [PATCH 3/5] Add calls to translate Always Running Timer
 (ART) to system time

 On 07/27/2015 05:46 PM, Christopher Hall wrote:
  * art_to_mono64
  * art_to_rawmono64
  * art_to_realtime64
 
  Intel audio and PCH ethernet devices use the Always Running Timer
 (ART) to
  relate their device clock to system time
 
  Signed-off-by: Christopher Hall christopher.s.h...@intel.com
  ---
arch/x86/Kconfig   |  12 
arch/x86/include/asm/art.h |  42 ++
arch/x86/kernel/Makefile   |   1 +
arch/x86/kernel/art.c  | 134
 +
arch/x86/kernel/tsc.c  |   4 ++
5 files changed, 193 insertions(+)
create mode 100644 arch/x86/include/asm/art.h
create mode 100644 arch/x86/kernel/art.c
 
  diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
  index b3a1a5d..1ef9985 100644
  --- a/arch/x86/Kconfig
  +++ b/arch/x86/Kconfig
  @@ -1175,6 +1175,18 @@ config X86_CPUID
with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
/dev/cpu/31/cpuid.
 
  +config X86_ART
  +   bool "Always Running Timer"
  +   default y
  +   depends on X86_TSC
  +   ---help---
  + This option provides functionality to drivers and devices that
 use
  + the always-running-timer (ART) to correlate their device clock
  + counter with the system clock counter. The TSC is *exactly*
 related
  + to the ART by a ratio m/n specified by CPUID leaf 0x15
  + (n=EAX,m=EBX). If ART is unused or unavailable there isn't any
  + performance impact. It's safe to say Y.
  +

 Is there a good reason to make this optional?

 If there aren't any objections, it sounds OK to me.  So no, I don't know
 of any good reasons.


 Also, is there *still* no way to ask the thing for its nominal
 frequency?  Or can we expect CPUID leaf 16H to work on CPUs that
 support this and can we expect it to actually work?

 There isn't any way to query nominal frequency.  CPUID leaf 0x15 only
 exposes the relationship between ART and TSC.  CPUID leaf 0x16 stays
 more or less the same and isn't related to ART.

 The SDM says "The
 returned information should not be used for any other purpose as the
 returned information does not accurately correlate to information /
 counters returned by other processor interfaces."

 Also, does this thing let us learn the real time base?  SDM 17.14.4
 suggests that the ART value isn't affected by privileged software (aka
 buggy/malicious firmware).  Or, alternatively, how do we learn the
 offset K between ART and scaled TSC?

 ART isn't affected by software.  The determination of K used to convert ART to
 TSC is in a footnote (2) in that section of the SDM.  I'm not going to risk
 repeating it here and possibly altering its meaning.
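
As a worked illustration of the relationship discussed above, the sketch
below scales an ART value by the ratio m/n and adds an offset K. The formula
shape (TSC = ART * m / n + K) is an assumption pieced together from the
Kconfig text (n=EAX, m=EBX from CPUID leaf 0x15) and the question about the
offset K; the SDM remains the authority on how K is determined. The function
name art_to_tsc is made up. Splitting the division into quotient and
remainder keeps the intermediate product within 64 bits when m and n are
32-bit values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute floor(art * m / n) + k without a 128-bit intermediate.
 * art * m / n == (art / n) * m + ((art % n) * m) / n, and the second
 * product cannot overflow because both factors are below 2^32.
 */
uint64_t art_to_tsc(uint64_t art, uint32_t m, uint32_t n, uint64_t k)
{
	uint64_t quot, rem;

	assert(n != 0);
	quot = art / n;
	rem = art % n;
	return quot * m + (rem * m) / n + k;
}
```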


choice
  prompt High Memory Support
  default HIGHMEM4G
  diff --git a/arch/x86/include/asm/art.h b/arch/x86/include/asm/art.h
  new file mode 100644
  index 000..da58ce4
  --- /dev/null
  +++ b/arch/x86/include/asm/art.h
  @@ -0,0 +1,42 @@
  +/*
  + * x86 ART related functions
  + */
  +#ifndef _ASM_X86_ART_H
  +#define _ASM_X86_ART_H
  +
  +#ifndef CONFIG_X86_ART
  +
  +static inline int setup_art(void)
  +{
  +   return 0;
  +}
  +
  +static inline bool has_art(void)
  +{
  +   return false;
  +}
  +
  +static inline int art_to_rawmono64(struct timespec64 *rawmono,
 cycle_t art)
  +{
  +   return -ENXIO;
  +}
  +static inline int art_to_realtime64(struct timespec64 *realtime,
 cycle_t art)
  +{
  +   return -ENXIO;
  +}
  +static inline int art_to_mono64(struct timespec64 *mono, cycle_t art)
  +{
  +   return -ENXIO;
  +}
  +
  +#else
  +
  +extern int setup_art(void);
  +extern bool has_art(void);
  +extern int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art);
  +extern int art_to_realtime64(struct timespec64 *realtime, cycle_t
 art);
  +extern int art_to_mono64(struct timespec64 *mono, cycle_t art);
  +
  +#endif
  +
  +#endif/*_ASM_X86_ART_H*/
  diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
  index 0f15af4..0908311 100644
  --- a/arch/x86/kernel/Makefile
  +++ b/arch/x86/kernel/Makefile
  @@ -109,6 +109,7 @@ obj-$(CONFIG_PERF_EVENTS)   +=
 perf_regs.o
obj-$(CONFIG_TRACING) += tracepoint.o
obj-$(CONFIG_IOSF_MBI)+= iosf_mbi.o
obj-$(CONFIG_PMC_ATOM)+= pmc_atom.o
  +obj-$(CONFIG_X86_ART)  += art.o
 
###
# 64 bit specific files
  diff --git a/arch/x86/kernel/art.c

Re: [PATCHv2] net/ipv6: add sysctl option accept_ra_hop_limit

2015-07-28 Thread Hangbin Liu

2015-07-28 11:58 GMT+08:00 YOSHIFUJI Hideaki
hideaki.yoshif...@miraclelinux.com:
 Hi,

 Hangbin Liu wrote:
 2015-07-28 7:50 GMT+08:00 YOSHIFUJI Hideaki/吉藤英明
 hideaki.yoshif...@miraclelinux.com:
 Hi,

 Hangbin Liu wrote:
 Commit 6fd99094de2b (ipv6: Don't reduce hop limit for an interface)
 disabled accept hop limit from RA if it is higher than the current hop
 limit for security stuff. But this behavior kind of break the RFC 
 definition.

 RFC 4861, 6.3.4.  Processing Received Router Advertisements
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.

 So add sysctl option accept_ra_hop_limit to let user choose whether accept
 hop limit info in RA.


 How about introducing minimum hop limit, instead?

 Hi Yoshifuji,

 This is a good idea. Maybe this can be another sysctl option?

 The minimum hop limit can be an enhancement of the security issue, then we 
 will
 not only increase the hop limit, but also could decrease it in the
 range of values we
 accept.

 On the other hand, with this patch, we can enable, disable or partly
 enable accept
 hop limit. If we only use minimum hop limit, people could not use a static 
 hop
 limit value.

 Maybe we use a "hop limit range" instead? What do you think?

 I think name of sysctl is the same as you suggested and change the
 semantics.  default value is 0 to accept all hoplimit value
 as before and people can set it to 32 (for example) to reject
 too-small hoplimit (0-31).

OK, then I will try submit a minimum hop limit, thanks for your suggestion :)

Regards
Hangbin

 --yoshfuji


 Thanks
 Hangbin


 |commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a
 |Author: D.S. Ljungmark ljungm...@modio.se
 |Date:   Wed Mar 25 09:28:15 2015 +0100
 |
 |ipv6: Don't reduce hop limit for an interface
 :
 |RFC 3756, Section 4.2.7, Parameter Spoofing
 |
 :
 |  As an example, one possible approach to mitigate this threat is to
 |   ignore very small hop limits.  The nodes could implement a
 |   configurable minimum hop limit, and ignore attempts to set it below
 |   said limit.

 --
 Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
 Technical Division, MIRACLE LINUX CORPORATION

Re: [net-next PATCH 2/2] drivers: net: cpsw: add separate napi for tx packet handling for performance improvment

2015-07-28 Thread Francois Romieu

Mugunthan V N mugunthan...@ti.com :
 On Tuesday 28 July 2015 02:52 AM, Francois Romieu wrote:
  Mugunthan V N mugunthan...@ti.com :
[...]
  @@ -752,13 +753,22 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
 struct cpsw_priv *priv = dev_id;
   
 cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_TX);
  -  cpdma_chan_process(priv->txch, 128);
  +  writel(0, &priv->wr_regs->tx_en);
  +
  +  if (netif_running(priv->ndev)) {
  +  napi_schedule(&priv->napi_tx);
  +  return IRQ_HANDLED;
  +  }
  
  
  cpsw_ndo_stop calls napi_disable: you can remove netif_running.
  
 
 This netif_running check is to find which interface is up as the
 interrupt is shared by both the interfaces. When first interface is down
 and second interface is active then napi_schedule for first interface
 will fail and second interface napi needs to be scheduled.
 
 So I don't think netif_running needs to be removed.

Each interface has its own napi tx (resp. rx) context: I would have expected
two unconditional napi_schedule per tx (resp. rx) shared irq, not one.

I'll read it again after some sleep.

-- 
Ueimor

Re: [PATCHv2] net/ipv6: add sysctl option accept_ra_hop_limit

2015-07-28 Thread YOSHIFUJI Hideaki

Hangbin Liu wrote:
 2015-07-28 11:58 GMT+08:00 YOSHIFUJI Hideaki
 hideaki.yoshif...@miraclelinux.com:
 Hi,

 Hangbin Liu wrote:
 2015-07-28 7:50 GMT+08:00 YOSHIFUJI Hideaki/吉藤英明
 hideaki.yoshif...@miraclelinux.com:
 Hi,

 Hangbin Liu wrote:
 Commit 6fd99094de2b (ipv6: Don't reduce hop limit for an interface)
 disabled accept hop limit from RA if it is higher than the current hop
 limit for security stuff. But this behavior kind of break the RFC 
 definition.

 RFC 4861, 6.3.4.  Processing Received Router Advertisements
If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.

 So add sysctl option accept_ra_hop_limit to let user choose whether accept
 hop limit info in RA.


 How about introducing minimum hop limit, instead?

 Hi Yoshifuji,

 This is a good idea. Maybe this can be another sysctl option?

 The minimum hop limit can be an enhancement of the security issue, then we 
 will
 not only increase the hop limit, but also could decrease it in the
 range of values we
 accept.

 On the other hand, with this patch, we can enable, disable or partly
 enable accept
 hop limit. If we only use minimum hop limit, people could not use a 
 static hop
 limit value.

 Maybe we use a "hop limit range" instead? What do you think?

 I think name of sysctl is the same as you suggested and change the
 semantics.  default value is 0 to accept all hoplimit value
 as before and people can set it to 32 (for example) to reject
 too-small hoplimit (0-31).
 
 OK, then I will try submit a minimum hop limit, thanks for your suggestion 
 :)

accept_ra_min_hop_limit would be better as we have
accept_ra_rt_info_max_plen.
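
The semantics agreed on above can be sketched as a single predicate. This is
a user-space illustration only; the function name and parameter names are
hypothetical, and it assumes the behaviour described in the thread: a minimum
of 0 accepts every advertised value, a minimum of 32 rejects 0-31, and an
advertised value of 0 is never applied because RFC 4861 only asks the host to
update on a non-zero Cur Hop Limit.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the Cur Hop Limit from a Router Advertisement should be
 * applied, given a configurable minimum (the proposed sysctl value).
 */
bool ra_hop_limit_acceptable(unsigned int ra_hop_limit,
			     unsigned int min_hop_limit)
{
	assert(ra_hop_limit <= 255 && min_hop_limit <= 255);
	if (ra_hop_limit == 0)	/* unspecified by the router: keep ours */
		return false;
	return ra_hop_limit >= min_hop_limit;
}
```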

 
 Regards
 Hangbin

 --yoshfuji


 Thanks
 Hangbin


 |commit 6fd99094de2b83d1d4c8457f2c83483b2828e75a
 |Author: D.S. Ljungmark ljungm...@modio.se
 |Date:   Wed Mar 25 09:28:15 2015 +0100
 |
 |ipv6: Don't reduce hop limit for an interface
 :
 |RFC 3756, Section 4.2.7, Parameter Spoofing
 |
 :
 |  As an example, one possible approach to mitigate this threat is to
 |   ignore very small hop limits.  The nodes could implement a
 |   configurable minimum hop limit, and ignore attempts to set it 
 below
 |   said limit.

 --
 Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
 Technical Division, MIRACLE LINUX CORPORATION

-- 
Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
Technical Division, MIRACLE LINUX CORPORATION

[PATCH net] bridge: Fix network header pointer for vlan tagged packets

2015-07-28 Thread Toshiaki Makita

There are several devices that can receive vlan tagged packets with
CHECKSUM_PARTIAL like tap, possibly veth and xennet.
When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded
by bridge to a device with the IP_CSUM feature, they end up with checksum
error because before entering bridge, the network header is set to
ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(),
get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later.

Since the network header is expected to be ETH_HLEN in flow-dissection
and hash-calculation in RPS in rx path, and since the header pointer fix
is needed only in tx path, set the appropriate network header on forwarding
packets.

Signed-off-by: Toshiaki Makita makita.toshi...@lab.ntt.co.jp
---
 net/bridge/br_forward.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)
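
To illustrate what "the appropriate network header" means for a tagged
frame, here is a stand-alone sketch that walks the VLAN tags of a raw
Ethernet frame and returns the offset where the network header starts. It is
not the kernel helper used in the patch; network_header_offset() is a
hypothetical name, and the constants are redefined locally with the standard
Ethernet and VLAN sizes and the 802.1Q / 802.1ad EtherTypes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN	14	/* dst MAC + src MAC + EtherType */
#define VLAN_HLEN	4	/* TCI + encapsulated EtherType */
#define ETH_P_8021Q	0x8100
#define ETH_P_8021AD	0x88A8

/*
 * Return the offset of the network header, skipping any number of stacked
 * VLAN tags. Returns 0 if the frame is too short to parse.
 */
size_t network_header_offset(const uint8_t *frame, size_t len)
{
	size_t depth = ETH_HLEN;
	uint16_t proto;

	assert(frame != NULL);
	if (len < ETH_HLEN)
		return 0;
	proto = (uint16_t)((frame[12] << 8) | frame[13]);
	while (proto == ETH_P_8021Q || proto == ETH_P_8021AD) {
		if (len < depth + VLAN_HLEN)
			return 0;
		/* The next EtherType sits after the 2-byte TCI. */
		proto = (uint16_t)((frame[depth + 2] << 8) | frame[depth + 3]);
		depth += VLAN_HLEN;
	}
	return depth;
}
```

For an untagged frame this gives ETH_HLEN, which is the value the rx path
already sets; each tag adds four bytes.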

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 0ff6e1b..fa7bfce 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -37,15 +37,30 @@ static inline int should_deliver(const struct net_bridge_port *p,
 
 int br_dev_queue_push_xmit(struct sock *sk, struct sk_buff *skb)
 {
-   if (!is_skb_forwardable(skb->dev, skb)) {
-   kfree_skb(skb);
-   } else {
-   skb_push(skb, ETH_HLEN);
-   br_drop_fake_rtable(skb);
-   skb_sender_cpu_clear(skb);
-   dev_queue_xmit(skb);
+   if (!is_skb_forwardable(skb->dev, skb))
+   goto drop;
+
+   skb_push(skb, ETH_HLEN);
+   br_drop_fake_rtable(skb);
+   skb_sender_cpu_clear(skb);
+
+   if (skb->ip_summed == CHECKSUM_PARTIAL &&
+   (skb->protocol == htons(ETH_P_8021Q) ||
+    skb->protocol == htons(ETH_P_8021AD))) {
+   int depth;
+
+   if (!__vlan_get_protocol(skb, skb->protocol, &depth))
+   goto drop;
+
+   skb_set_network_header(skb, depth);
}
 
+   dev_queue_xmit(skb);
+
+   return 0;
+
+drop:
+   kfree_skb(skb);
return 0;
 }
 EXPORT_SYMBOL_GPL(br_dev_queue_push_xmit);
-- 
1.8.1.2



[PATCH v4 0/7] introduce Hyper-V VM Sockets(hvsock)


Changes since v1:
- updated [PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: default m -> default m if HYPERV
- MODULE_LICENSE: Dual MIT/GPL -> Dual BSD/GPL

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables.
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of multi-line comment: vmbus_get_hvsock_rw_status()

Hyper-V VM Sockets (hvsock) is a byte-stream based communication mechanism
between a Windows 10 (or later) host and a guest. It's kind of TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.
With Hyper-V VM Sockets, applications between the host and a guest can
talk with each other directly by the traditional BSD-style socket APIs.

The patchset implements the necessary support in the guest side by adding
the necessary new APIs in the vmbus driver, and introducing a new driver
hv_sock.ko, which implements a new socket address family AF_HYPERV.

I know the kernel already has a VM Sockets driver (AF_VSOCK) based
on VMware's VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing a virtio version of AF_VSOCK:
http://thread.gmane.org/gmane.linux.network/365205.

However, though Hyper-V VM Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transportation layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is: <u32 ContextID, u32 Port>, but in
AF_HYPERV, the endpoint type is: <GUID VM_ID, GUID ServiceID>. Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to
AF_HYPERV/VMBus, like
.notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.
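
To make the first difference in the list above concrete, here is a small
sketch of the two endpoint shapes. The struct and field names are invented
for illustration and are not the definitions from either driver; the sizes
follow directly from "two u32 values" versus "two 128-bit GUIDs".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* AF_VSOCK-style endpoint as described above: two 32-bit values. */
struct vsock_endpoint {
	uint32_t cid;
	uint32_t port;
};

/* A 128-bit GUID held as raw bytes. */
struct guid128 {
	uint8_t b[16];
};

/* AF_HYPERV-style endpoint as described above: two 128-bit GUIDs. */
struct hvsock_endpoint {
	struct guid128 vm_id;
	struct guid128 service_id;
};

/* Endpoints match only if both GUIDs match byte for byte. */
int hvsock_endpoint_equal(const struct hvsock_endpoint *a,
			  const struct hvsock_endpoint *b)
{
	assert(a != NULL && b != NULL);
	return memcmp(&a->vm_id, &b->vm_id, sizeof(a->vm_id)) == 0 &&
	       memcmp(&a->service_id, &b->service_id,
		      sizeof(a->service_id)) == 0;
}
```

The hvsock endpoint is four times the size of the vsock one, so an address
structure laid out for the latter cannot carry the former unchanged.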

Please review the patchset.

Looking forward to your comments!
Dexuan Cui (7):
  Drivers: hv: vmbus: define the new offer type for Hyper-V socket
(hvsock)
  Drivers: hv: vmbus: define a new VMBus message type for hvsock
  Drivers: hv: vmbus: add APIs to send/recv hvsock packet and get the
r/w-ability
  Drivers: hv: vmbus: add APIs to register callbacks to process hvsock
connection
  Drivers: hv: vmbus: add a helper function to set a channel's pending
send size
  hvsock: introduce Hyper-V VM Sockets feature
  Drivers: hv: vmbus: disable local interrupt when hvsock's callback is
running

 MAINTAINERS   |2 +
 drivers/hv/Makefile   |4 +-
 drivers/hv/channel.c  |  149 +
 drivers/hv/channel_mgmt.c |   13 +
 drivers/hv/connection.c   |   15 +-
 drivers/hv/hvsock_callbacks.c |   71 ++
 drivers/hv/hyperv_vmbus.h |4 +
 drivers/hv/ring_buffer.c  |   14 +
 include/linux/hyperv.h|   68 ++
 include/linux/socket.h|4 +-
 include/net/af_hvsock.h   |   44 ++
 include/uapi/linux/hyperv.h   |   16 +
 net/Kconfig   |1 +
 net/Makefile  |1 +
 net/hv_sock/Kconfig   |   10 +
 net/hv_sock/Makefile  |3 +
 net/hv_sock/af_hvsock.c   | 1430 +
 17 files changed, 1846 insertions(+), 3 deletions(-)
 create mode 100644 drivers/hv/hvsock_callbacks.c
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0


Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct

2015-07-28 Thread Eric W. Biederman

Hannes Frederic Sowa han...@stressinduktion.org writes:

 Hello Eric,

 On Mon, 2015-07-27 at 15:33 -0500, Eric W. Biederman wrote:
 David Ahern d...@cumulusnetworks.com writes:
 
  Allow tasks to have a default device index for binding sockets. If 
  set
  the value is passed to all AF_INET/AF_INET6 sockets when they are
  created.
  
  The task setting is passed parent to child on fork, but can be set 
  or
  changed after task creation using prctl (if task has CAP_NET_ADMIN
  permissions). The setting for a socket can be retrieved using 
  prctl().
  This option allows an administrator to restrict a task to only 
  send/receive
  packets through the specified device. In the case of VRF devices 
  this
  option restricts tasks to a specific VRF.
  
  Correlation of the device index to a specific VRF, ie.,
  ifindex --> VRF device --> VRF id
  is left to userspace.
 
 Nacked-by: Eric W. Biederman ebied...@xmission.com
 
 Because it is broken by design.  Your routing device is only safe for
 programs that know its limitations; it is not appropriate for general
 applications.
 
 Since you don't even seem to know its limitations I think this is a
 bad path to walk down.

 Can you please elaborate about the broken by design?

 Different operating systems are already using this approach with good
 success. I read your other mail regarding isolation of different VRFs
 and I agree that all code which persists state depending solely on the
 IP address is affected by this and this must be dealt with and fixed
 (actually, there aren't too many).

The size of struct net would tend to disagree with the assertion that
there are not too many.

 But I wouldn't call that broken by design. This stuff will get fixed
 like e.g. cross-talk between fragmentation queues, icmp rate limiters
 etc, which could already happen in the past.

 What is your opinion on the fundamental approach only from a user
 perspective? Do you think that is broken, too?

I think promising something to userspace that a design can not deliver
is a fundamental problem.

Eric

[PATCH] net: rfkill-regulator: fix compiler warning

2015-07-28 Thread Robert ABEL

pdata char* name -> const char* name

Signed-off-by: Robert ABEL ra...@cit-ec.uni-bielefeld.de
---
 include/linux/rfkill-regulator.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/rfkill-regulator.h b/include/linux/rfkill-regulator.h
index aca36bc..594d8e7 100644
--- a/include/linux/rfkill-regulator.h
+++ b/include/linux/rfkill-regulator.h
@@ -41,7 +41,7 @@
 #include <linux/rfkill.h>
 
 struct rfkill_regulator_platform_data {
-   char *name; /* the name for the rfkill switch */
+   const char *name;   /* the name for the rfkill switch */
enum rfkill_type type;  /* the type as specified in rfkill.h */
 };
 
-- 
2.5.0


[PATCH net-next 5/5] s390/bpf: recache skb->data/hlen for skb_vlan_push/pop

Allow eBPF programs attached to TC qdiscs to call skb_vlan_push/pop
via helper functions. These functions may change skb->data/hlen.
This data is cached by s390 JIT to improve performance of ld_abs/ld_ind
instructions. Therefore after a change we have to reload the data.

In case of usage of skb_vlan_push/pop, in the prologue we store
the SKB pointer on the stack and restore it after BPF_JMP_CALL
to skb_vlan_push/pop.

Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
---
 arch/s390/net/bpf_jit.h  |  5 +++-
 arch/s390/net/bpf_jit_comp.c | 55 ++--
 2 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/arch/s390/net/bpf_jit.h b/arch/s390/net/bpf_jit.h
index f6498ee..f010c93 100644
--- a/arch/s390/net/bpf_jit.h
+++ b/arch/s390/net/bpf_jit.h
@@ -36,6 +36,8 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  *   |   BPF stack   | |
  *   |   | |
  *   +---+ |
+ *   | 8 byte skbp   | |
+ * R15+170 - +---+ |
  *   | 8 byte hlen   | |
  * R15+168 - +---+ |
  *   | 4 byte align  | |
@@ -51,11 +53,12 @@ extern u8 sk_load_word[], sk_load_half[], sk_load_byte[];
  * We get 160 bytes stack space from calling function, but only use
  * 12 * 8 byte for old backchain, r15..r6, and tail_call_cnt.
  */
-#define STK_SPACE  (MAX_BPF_STACK + 8 + 4 + 4 + 160)
+#define STK_SPACE  (MAX_BPF_STACK + 8 + 8 + 4 + 4 + 160)
 #define STK_160_UNUSED (160 - 12 * 8)
 #define STK_OFF (STK_SPACE - STK_160_UNUSED)
 #define STK_OFF_TMP 160 /* Offset of tmp buffer on stack */
 #define STK_OFF_HLEN   168 /* Offset of SKB header length on stack */
+#define STK_OFF_SKBP   170 /* Offset of SKB pointer on stack */
 
 #define STK_OFF_R6 (160 - 11 * 8)  /* Offset of r6 on stack */
 #define STK_OFF_TCCNT  (160 - 12 * 8)  /* Offset of tail_call_cnt on stack */
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index a025ddc..ece46d4 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -53,6 +53,7 @@ struct bpf_jit {
 #define SEEN_LITERAL   8   /* code uses literals */
 #define SEEN_FUNC  16  /* calls C functions */
 #define SEEN_TAIL_CALL 32  /* code uses tail calls */
+#define SEEN_SKB_CHANGE 64  /* code changes skb data */
 #define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB)
 
 /*
@@ -382,6 +383,26 @@ static void save_restore_regs(struct bpf_jit *jit, int op)
 }
 
 /*
+ * For SKB access %b1 contains the SKB pointer. For bpf_jit.S
+ * we store the SKB header length on the stack and the SKB data
+ * pointer in REG_SKB_DATA.
+ */
+static void emit_load_skb_data_hlen(struct bpf_jit *jit)
+{
+   /* Header length: llgf %w1,len(%b1) */
+   EMIT6_DISP_LH(0xe300, 0x0016, REG_W1, REG_0, BPF_REG_1,
+ offsetof(struct sk_buff, len));
+   /* s %w1,data_len(%b1) */
+   EMIT4_DISP(0x5b00, REG_W1, BPF_REG_1,
+  offsetof(struct sk_buff, data_len));
+   /* stg %w1,ST_OFF_HLEN(%r0,%r15) */
+   EMIT6_DISP_LH(0xe300, 0x0024, REG_W1, REG_0, REG_15, STK_OFF_HLEN);
+   /* lg %skb_data,data_off(%b1) */
+   EMIT6_DISP_LH(0xe300, 0x0004, REG_SKB_DATA, REG_0,
+ BPF_REG_1, offsetof(struct sk_buff, data));
+}
+
+/*
  * Emit function prologue
  *
  * Save registers and create stack frame if necessary.
@@ -421,25 +442,12 @@ static void bpf_jit_prologue(struct bpf_jit *jit, bool is_classic)
EMIT6_DISP_LH(0xe300, 0x0024, REG_W1, REG_0,
  REG_15, 152);
}
-   /*
-* For SKB access %b1 contains the SKB pointer. For bpf_jit.S
-* we store the SKB header length on the stack and the SKB data
-* pointer in REG_SKB_DATA.
-*/
-   if (jit->seen & SEEN_SKB) {
-   /* Header length: llgf %w1,len(%b1) */
-   EMIT6_DISP_LH(0xe300, 0x0016, REG_W1, REG_0, BPF_REG_1,
- offsetof(struct sk_buff, len));
-   /* s %w1,data_len(%b1) */
-   EMIT4_DISP(0x5b00, REG_W1, BPF_REG_1,
-  offsetof(struct sk_buff, data_len));
-   /* stg %w1,ST_OFF_HLEN(%r0,%r15) */
+   if (jit->seen & SEEN_SKB)
+       emit_load_skb_data_hlen(jit);
+   if (jit->seen & SEEN_SKB_CHANGE)
+   /* stg %b1,ST_OFF_SKBP(%r0,%r15) */
EMIT6_DISP_LH(0xe300, 0x0024, REG_W1, REG_0, REG_15,
- STK_OFF_HLEN);
-   /* lg %skb_data,data_off(%b1) */
-   EMIT6_DISP_LH(0xe300, 0x0004, REG_SKB_DATA, REG_0,
- BPF_REG_1, offsetof(struct sk_buff, data));
-   }
+ STK_OFF_SKBP);
/* Clear A (%b0) and X (%b7) registers for converted BPF programs */
if

[PATCH net-next 2/5] s390/bpf: Fix multiple macro expansions

The EMIT6_DISP_LH macro passes its disp parameter to the _EMIT6_DISP_LH
macro, which uses that parameter twice:

 unsigned int __disp_h = ((u32)disp) & 0xff000;
 unsigned int __disp_l = ((u32)disp) & 0x00fff;

EMIT6_DISP_LH is used several times with EMIT_CONST_U64() as the disp
parameter, so each such use expands EMIT_CONST_U64() twice and creates
two constants.

Fix this by adding a variable __disp to avoid multiple expansions.

Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
---
 arch/s390/net/bpf_jit_comp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 01ad166..de0f0bc 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -221,8 +221,9 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
 
 #define EMIT6_DISP_LH(op1, op2, b1, b2, b3, disp)  \
 ({ \
+   int __disp = (disp);\
    _EMIT6_DISP_LH(op1 | reg(b1, b2) << 16 |        \
-                  reg_high(b3) << 8, op2, disp);   \
+                  reg_high(b3) << 8, op2, __disp); \
REG_SET_SEEN(b1);   \
REG_SET_SEEN(b2);   \
REG_SET_SEEN(b3);   \
-- 
2.3.8
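
The double expansion fixed by this patch can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: emit_const(), SPLIT_DISP(), WRAP_BROKEN() and wrap_fixed() are hypothetical stand-ins for EMIT_CONST_U64(), _EMIT6_DISP_LH() and the old and new EMIT6_DISP_LH(). It assumes only that the argument has a side effect every time it is evaluated, and it uses a plain function where the patch uses a local variable inside the macro.

```c
#include <assert.h>

/* Hypothetical literal pool: counts how many constants were emitted. */
static int pool_entries;

/*
 * Stand-in for EMIT_CONST_U64(): each evaluation adds one pool entry
 * and returns the displacement of that entry (8 bytes per entry).
 */
static int emit_const(void)
{
	return pool_entries++ * 8;
}

/* Stand-in for _EMIT6_DISP_LH(): uses its argument twice. */
#define SPLIT_DISP(disp) \
	((((unsigned int)(disp)) & 0xff000) | (((unsigned int)(disp)) & 0x00fff))

/* Before the fix: the argument expression itself is forwarded. */
#define WRAP_BROKEN(disp)	SPLIT_DISP(disp)

/* After the fix: the argument is evaluated once into a local variable. */
static unsigned int wrap_fixed(int disp)
{
	int d = disp;

	return SPLIT_DISP(d);
}

/* Number of pool entries created by one use of the broken wrapper. */
static int entries_created_broken(void)
{
	pool_entries = 0;
	(void)WRAP_BROKEN(emit_const());
	return pool_entries;
}

/* Number of pool entries created by one use of the fixed wrapper. */
static int entries_created_fixed(void)
{
	pool_entries = 0;
	(void)wrap_fixed(emit_const());
	return pool_entries;
}

/* Usage: the broken wrapper emits the constant twice, the fixed one once. */
static void check_expansions(void)
{
	assert(entries_created_broken() == 2);
	assert(entries_created_fixed() == 1);
}
```

Copying the argument into a local first is the usual way to make a macro safe for arguments with side effects.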


[PATCH net-next 1/5] s390/bpf: clear correct BPF accumulator register

So far we have assumed the following BPF to eBPF register mapping:

 - BPF_REG_A -> BPF_REG_7
 - BPF_REG_X -> BPF_REG_8

Unfortunately this mapping is wrong. The correct mapping is:

 - BPF_REG_A -> BPF_REG_0
 - BPF_REG_X -> BPF_REG_7

So clear the correct registers and use the BPF_REG_A and BPF_REG_X
macros instead of BPF_REG_0/7.

Fixes: 054623105728 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Cc: sta...@vger.kernel.org # 4.0+
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
---
 arch/s390/net/bpf_jit_comp.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 79c731e..01ad166 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -448,13 +448,13 @@ static void bpf_jit_prologue(struct bpf_jit *jit)
EMIT6_DISP_LH(0xe300, 0x0004, REG_SKB_DATA, REG_0,
  BPF_REG_1, offsetof(struct sk_buff, data));
}
-   /* BPF compatibility: clear A (%b7) and X (%b8) registers */
-   if (REG_SEEN(BPF_REG_7))
-   /* lghi %b7,0 */
-   EMIT4_IMM(0xa709, BPF_REG_7, 0);
-   if (REG_SEEN(BPF_REG_8))
-   /* lghi %b8,0 */
-   EMIT4_IMM(0xa709, BPF_REG_8, 0);
+   /* BPF compatibility: clear A (%b0) and X (%b7) registers */
+   if (REG_SEEN(BPF_REG_A))
+   /* lghi %ba,0 */
+   EMIT4_IMM(0xa709, BPF_REG_A, 0);
+   if (REG_SEEN(BPF_REG_X))
+   /* lghi %bx,0 */
+   EMIT4_IMM(0xa709, BPF_REG_X, 0);
 }
 
 /*
-- 
2.3.8


Re: ARP response with link local IP, why not broadcast

2015-07-28 Thread Sebastian Fett


Just a quick update on the subject.

Thanks for the input. It's good to see that I am not the only one that
has this problem.

For now we are going with our initial approach and bcast our ARP responses.
We have a very local network built for only one purpose. Other devices
in that network use the same approach, and the master control software
will eventually ARP request every address.
It's not ideal and may take a couple of minutes to resolve
every conflict, but it's the best compromise between effort and benefit.

I'll let you know about our test results. Maybe somebody is interested.

Btw, I still wonder if I can partially keep the kernel from answering
ARP packets?



On Wed, Jul 22, 2015 at 9:49 AM, Sebastian Fett db_ext...@gmx.de wrote:


what is your use case?



My problem is a local network of audio devices. It is a valid possibility
that two halves of the setup are set up individually (stage left and stage
right). Both local networks will auto-configure themselves via link-local
addressing and will be stable. But there can always be two devices with the
same IP, one in each network.
At some point those two networks will be connected. With the current
behaviour the conflicting devices will never learn of each other or of the
address conflict.


Ah yes, this is a valid problem (Partition-Join tolerance) and one that is
being discussed in the IPv6 context on 6man:
http://www.ietf.org/mail-archive/web/ipv6/current/msg22712.html

FWIW, when Solaris implemented ACD (RFC 5227), the compromise
that was made to avoid bcasting *every* ARP response while still solving
the type of issue that you describe was to use a periodic ARP announce,
advertising the IP address (a Grat ARP) with exponential backoff.
If a duplicate address is detected (as would happen in the scenario
that you describe), the system falls into the aggressive defend mode.

ARP announcements were bcast, but the noise is mitigated by tunable
exponential backoff.
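
To give a feel for how much the backoff reduces broadcast noise, here is a minimal sketch of a doubling announce schedule. The tunables and function names are hypothetical, chosen only for illustration; they are not taken from Solaris or from RFC 5227.

```c
#include <assert.h>

/* Hypothetical tunables, in seconds. */
#define ANNOUNCE_INITIAL_SECS	1
#define ANNOUNCE_MAX_SECS	256

/* Next wait between two announcements: double the last one, up to a cap. */
static unsigned int next_announce_interval(unsigned int cur)
{
	if (cur == 0)
		return ANNOUNCE_INITIAL_SECS;
	if (cur >= ANNOUNCE_MAX_SECS / 2)
		return ANNOUNCE_MAX_SECS;
	return cur * 2;
}

/* Number of announcements sent in the first window_secs seconds. */
static unsigned int announcements_within(unsigned int window_secs)
{
	unsigned int t = 0, n = 0, iv = 0;

	for (;;) {
		iv = next_announce_interval(iv);
		if (t + iv > window_secs)
			break;
		t += iv;
		n++;
	}
	return n;
}

/* Usage: 1, 2, 4, ... 256, 256, ... gives 21 broadcasts in one hour. */
static void check_backoff(void)
{
	assert(next_announce_interval(0) == 1);
	assert(next_announce_interval(64) == 128);
	assert(next_announce_interval(256) == 256);
	assert(announcements_within(3600) == 21);
}
```

With these assumed values a host sends 21 announcements in its first hour, compared with 3600 if it announced once per second.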

Of course, all of this only helps to *detect* the duplicate- eventually
some other entity has to jump in and arbitrate on which one should
own the address.


The devices are controlled by a central PC using avahi/bonjour. It will know
of all conflicting devices, but will only be able to talk to the one that
happens to be in its ARP cache. And renewing that cache will not change
anything, because it will happen with unicast messages.

I looked at a Dante Controller (an audio data streaming device). And here
all ARP messages are answered with broadcasts.

I think that behaviour is acceptable because it only happens in local
networks. Waking up sleeping devices will not be a concern there.


I don't know if a short-term solution that makes sense here is to have
a tunable for this.

But even the always bcast arp response will fail if you have a silent
rejoin of the partitioned network- there is a reliance on the owner
of an address bcasting their ARP resp at some point right?

(there's also a DoS vector here- I can create a lot of bcast traffic
by arping for an address..)


That brings me to another question. When I react to an ARP packet in a
userspace program, can I keep that packet from reaching the kernel as well?
I would like to avoid handling ARP completely in userspace.



Re: [PATCH] Drivers: isdn: Drop unnecessary continue

On Tue, 2015-07-28 at 14:11 +0530, Shraddha Barke wrote:
 The semantic patch used to make this change is :
 
 @@
 @@
 for (...;...;...) {
   ...
   if (...) {
 ...
 -   continue;
   }
 }
 
 Signed-off-by: Shraddha Barke shraddha.6...@gmail.com
 ---
  drivers/isdn/hardware/mISDN/hfcsusb.c | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/isdn/hardware/mISDN/hfcsusb.c b/drivers/isdn/hardware/mISDN/hfcsusb.c
 index 114f3bc..91beb83 100644
 --- a/drivers/isdn/hardware/mISDN/hfcsusb.c
 +++ b/drivers/isdn/hardware/mISDN/hfcsusb.c
 @@ -1921,10 +1921,9 @@ hfcsusb_probe(struct usb_interface *intf, const struct usb_device_id *id)
   if ((le16_to_cpu(dev->descriptor.idVendor)
      == hfcsusb_idtab[i].idVendor) &&
   (le16_to_cpu(dev->descriptor.idProduct)
 -  == hfcsusb_idtab[i].idProduct)) {
 +  == hfcsusb_idtab[i].idProduct))
   vend_idx = i;
 - continue;
 - }
 +
   }
  
   printk(KERN_DEBUG


Well, it seems the author's intent was to use a break instead of a continue.

Not a big deal...
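
The difference only shows if the table has more than one matching entry. A minimal sketch with a hypothetical id table (struct id_entry and the -1 "not found" value are assumptions for illustration, not the hfcsusb definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced id table entry. */
struct id_entry {
	unsigned short vendor;
	unsigned short product;
};

/* No break (what the patched loop does): the last match wins. */
static int find_last_match(const struct id_entry *tab, size_t n,
			   unsigned short vendor, unsigned short product)
{
	int idx = -1;
	size_t i;

	for (i = 0; i < n; i++)
		if (tab[i].vendor == vendor && tab[i].product == product)
			idx = (int)i;
	return idx;
}

/* With break: the scan stops at the first match. */
static int find_first_match(const struct id_entry *tab, size_t n,
			    unsigned short vendor, unsigned short product)
{
	int idx = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].vendor == vendor && tab[i].product == product) {
			idx = (int)i;
			break;
		}
	}
	return idx;
}

/* Usage: the two only disagree when an id appears twice in the table. */
static void check_first_vs_last(void)
{
	static const struct id_entry tab[] = {
		{ 1, 1 }, { 2, 2 }, { 1, 1 },
	};

	assert(find_first_match(tab, 3, 1, 1) == 0);
	assert(find_last_match(tab, 3, 1, 1) == 2);
	assert(find_first_match(tab, 3, 2, 2) == 1);
	assert(find_last_match(tab, 3, 2, 2) == 1);
	assert(find_first_match(tab, 3, 9, 9) == -1);
}
```

If every id in the table is unique, both variants return the same index and the break only saves iterations.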



[PATCH net-next 3/5] s390/bpf: increase BPF_SIZE_MAX

Currently jitted BPF programs are restricted to a maximum size of one
page, because we use short displacements for the literal pool.

20-bit displacements have been available since z990, and BPF requires
z196 as a minimum. We can therefore remove this restriction and use
20-bit signed long displacements everywhere.

Acked-by: Martin Schwidefsky schwidef...@de.ibm.com
Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
---
 arch/s390/net/bpf_jit_comp.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index de0f0bc..bea5cfc 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -45,7 +45,7 @@ struct bpf_jit {
int labels[1];  /* Labels for local jumps */
 };
 
-#define BPF_SIZE_MAX   4096/* Max size for program */
+#define BPF_SIZE_MAX   0x7ffff /* Max size for program (20 bit signed displ) */
 
 #define SEEN_SKB   1   /* skb access */
 #define SEEN_MEM   2   /* use mem[] for temporary storage */
@@ -203,15 +203,6 @@ static inline void reg_set_seen(struct bpf_jit *jit, u32 b1)
_EMIT6(op1 | __disp, op2);  \
 })
 
-#define EMIT6_DISP(op1, op2, b1, b2, b3, disp) \
-({ \
-   _EMIT6_DISP(op1 | reg(b1, b2) << 16 |   \
-   reg_high(b3) << 8, op2, disp);  \
-   REG_SET_SEEN(b1);   \
-   REG_SET_SEEN(b2);   \
-   REG_SET_SEEN(b3);   \
-})
-
 #define _EMIT6_DISP_LH(op1, op2, disp) \
 ({ \
    unsigned int __disp_h = ((u32)disp) & 0xff000;  \
@@ -981,8 +972,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i
    REG_SET_SEEN(BPF_REG_5);
    jit->seen |= SEEN_FUNC;
/* lg %w1,d(imm)(%l) */
-   EMIT6_DISP(0xe300, 0x0004, REG_W1, REG_0, REG_L,
-  EMIT_CONST_U64(func));
+   EMIT6_DISP_LH(0xe300, 0x0004, REG_W1, REG_0, REG_L,
+ EMIT_CONST_U64(func));
/* basr %r14,%w1 */
EMIT2(0x0d00, REG_14, REG_W1);
/* lgr %b0,%r2: load return value into %b0 */
-- 
2.3.8
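
For readers unfamiliar with the two displacement formats, here is a minimal sketch of the arithmetic behind the old and new limits. The masks are the ones used by _EMIT6_DISP_LH in this series; the function names are hypothetical, and the 12-bit unsigned range for short displacements is the general z/Architecture format rather than something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Short displacement: 12 bits, unsigned, so at most one 4 KB page. */
static bool disp12_fits(int32_t disp)
{
	return disp >= 0 && disp <= 0xfff;
}

/* Long displacement: 20 bits, signed. */
static bool disp20_fits(int32_t disp)
{
	return disp >= -0x80000 && disp <= 0x7ffff;
}

/* Low 12 bits of a long displacement. */
static uint32_t disp20_low(int32_t disp)
{
	return (uint32_t)disp & 0x00fff;
}

/* High 8 bits of a long displacement, shifted down to bits 0..7. */
static uint32_t disp20_high(int32_t disp)
{
	return ((uint32_t)disp & 0xff000) >> 12;
}

/* Usage: offset 4096 needs the long format; 0x7ffff is the largest one. */
static void check_displacements(void)
{
	assert(disp12_fits(4095));
	assert(!disp12_fits(4096));
	assert(disp20_fits(4096));
	assert(disp20_fits(0x7ffff));
	assert(!disp20_fits(0x80000));
	assert(disp20_low(4096) == 0 && disp20_high(4096) == 1);
	assert(disp20_low(0x7ffff) == 0xfff && disp20_high(0x7ffff) == 0x7f);
	assert(disp20_low(-1) == 0xfff && disp20_high(-1) == 0xff);
}
```

The largest positive value that fits is 0x7ffff, which matches the comment on the new BPF_SIZE_MAX.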


[PATCH net-next 0/5] s390/bpf: recache skb->data/hlen for skb_vlan_push/pop

Hi Dave,

Here is the s390 backend for Alexei's patch 4e10df9a60d9 ("bpf: introduce
bpf_skb_vlan_push/pop() helpers"), plus two bugfixes and two minor
improvements.

The first patch, "s390/bpf: clear correct BPF accumulator register", will
also go upstream via Martin's fixes branch.

Ok for you?

Regards,
Michael

Michael Holzheu (5):
  s390/bpf: clear correct BPF accumulator register
  s390/bpf: Fix multiple macro expansions
  s390/bpf: increase BPF_SIZE_MAX
  s390/bpf: Only clear A and X for converted BPF programs
  s390/bpf: recache skb->data/hlen for skb_vlan_push/pop

 arch/s390/net/bpf_jit.h  |  5 ++-
 arch/s390/net/bpf_jit_comp.c | 91 +++-
 2 files changed, 52 insertions(+), 44 deletions(-)

-- 
2.3.8


[PATCH net-next 4/5] s390/bpf: Only clear A and X for converted BPF programs

Only classic BPF programs that have been converted to eBPF need to clear
the A and X registers. We can check for converted programs with:

  bpf_prog->type == BPF_PROG_TYPE_UNSPEC

So add the check and skip initialization for real eBPF programs.

Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com
---
 arch/s390/net/bpf_jit_comp.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index bea5cfc..a025ddc 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -387,7 +387,7 @@ static void save_restore_regs(struct bpf_jit *jit, int op)
  * Save registers and create stack frame if necessary.
  * See stack frame layout desription in bpf_jit.h!
  */
-static void bpf_jit_prologue(struct bpf_jit *jit)
+static void bpf_jit_prologue(struct bpf_jit *jit, bool is_classic)
 {
    if (jit->seen & SEEN_TAIL_CALL) {
/* xc STK_OFF_TCCNT(4,%r15),STK_OFF_TCCNT(%r15) */
@@ -440,13 +440,15 @@ static void bpf_jit_prologue(struct bpf_jit *jit)
EMIT6_DISP_LH(0xe300, 0x0004, REG_SKB_DATA, REG_0,
  BPF_REG_1, offsetof(struct sk_buff, data));
}
-   /* BPF compatibility: clear A (%b0) and X (%b7) registers */
-   if (REG_SEEN(BPF_REG_A))
-   /* lghi %ba,0 */
-   EMIT4_IMM(0xa709, BPF_REG_A, 0);
-   if (REG_SEEN(BPF_REG_X))
-   /* lghi %bx,0 */
-   EMIT4_IMM(0xa709, BPF_REG_X, 0);
+   /* Clear A (%b0) and X (%b7) registers for converted BPF programs */
+   if (is_classic) {
+   if (REG_SEEN(BPF_REG_A))
+   /* lghi %ba,0 */
+   EMIT4_IMM(0xa709, BPF_REG_A, 0);
+   if (REG_SEEN(BPF_REG_X))
+   /* lghi %bx,0 */
+   EMIT4_IMM(0xa709, BPF_REG_X, 0);
+   }
 }
 
 /*
@@ -1232,7 +1234,7 @@ static int bpf_jit_prog(struct bpf_jit *jit, struct bpf_prog *fp)
    jit->lit = jit->lit_start;
    jit->prg = 0;
 
-   bpf_jit_prologue(jit);
+   bpf_jit_prologue(jit, fp->type == BPF_PROG_TYPE_UNSPEC);
    for (i = 0; i < fp->len; i += insn_count) {
        insn_count = bpf_jit_insn(jit, fp, i);
        if (insn_count < 0)
-- 
2.3.8
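
The decision this patch adds can be modelled in a few lines of standalone C. This is only a sketch: struct prog, struct prologue and the function names are hypothetical reductions, and the only thing taken from the patch is the rule that a program whose type is "unspecified" is treated as a converted classic BPF program.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-ins for the kernel's program descriptor. */
enum prog_type {
	PROG_TYPE_UNSPEC = 0,		/* converted classic BPF */
	PROG_TYPE_SOCKET_FILTER = 1,	/* a "real" eBPF program */
};

struct prog {
	enum prog_type type;
};

/* Which clearing instructions the prologue would contain. */
struct prologue {
	bool clears_a;
	bool clears_x;
};

static bool prog_is_converted_classic(const struct prog *p)
{
	return p->type == PROG_TYPE_UNSPEC;
}

/* A and X are only cleared for converted programs that actually use them. */
static struct prologue build_prologue(const struct prog *p,
				      bool a_seen, bool x_seen)
{
	struct prologue out = { false, false };

	if (prog_is_converted_classic(p)) {
		out.clears_a = a_seen;
		out.clears_x = x_seen;
	}
	return out;
}

/* Usage: same register usage, different program type. */
static void check_prologue(void)
{
	struct prog classic = { PROG_TYPE_UNSPEC };
	struct prog native = { PROG_TYPE_SOCKET_FILTER };

	assert(build_prologue(&classic, true, false).clears_a);
	assert(!build_prologue(&classic, true, false).clears_x);
	assert(!build_prologue(&native, true, true).clears_a);
	assert(!build_prologue(&native, true, true).clears_x);
}
```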


Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct

On Tue, 2015-07-28 at 08:54 -0500, Eric W. Biederman wrote:
 Hannes Frederic Sowa han...@stressinduktion.org writes:
 
  Hello Eric,
  
  On Mon, 2015-07-27 at 15:33 -0500, Eric W. Biederman wrote:
   David Ahern d...@cumulusnetworks.com writes:
   
Allow tasks to have a default device index for binding sockets. If set
the value is passed to all AF_INET/AF_INET6 sockets when they are
created.

The task setting is passed parent to child on fork, but can be set or
changed after task creation using prctl (if task has CAP_NET_ADMIN
permissions). The setting for a socket can be retrieved using prctl().
This option allows an administrator to restrict a task to only
send/receive packets through the specified device. In the case of VRF
devices this option restricts tasks to a specific VRF.

Correlation of the device index to a specific VRF, ie.,
   ifindex --> VRF device --> VRF id
is left to userspace.
   
   Nacked-by: Eric W. Biederman ebied...@xmission.com
   
   Because it is broken by design.  Your routing device is only safe for
   programs that know its limitations; it is not appropriate for general
   applications.
   
   Since you don't even seem to know its limitations I think this is a
   bad path to walk down.
  
  Can you please elaborate about the broken by design?
  
  Different operating systems are already using this approach with good
  success. I read your other mail regarding isolation of different VRFs
  and I agree that all code which persists state depending solely on the
  IP address is affected by this and this must be dealt with and fixed
  (actually, there aren't too many).
 
 The size of struct net would tend to disagree with the assertion that
 there are not too many.

netns_frags and inet_peer come to mind first.

All those data structures simply need to have an opaque id added to the
hash and comparison functions to deal with this problem. And we will
need this in the future anyway, as openvswitch will get connection tracking
support and thus the fragmentation engine and icmp rate limiter will
need to be taught about zones in OVS.

  But I wouldn't call that broken by design. This stuff will get fixed
  like e.g. cross-talk between fragmentation queues, icmp rate limiters
  etc, which could already happen in the past.
  
  What is your opinion on the fundamental approach only from a user
  perspective? Do you think that is broken, too?
 
 I think promising something to userspace that a design can not deliver
 is a fundamental problem.

You are still talking about the isolation aspect, right?

Bye,
Hannes



[PATCH net] ebpf, x86: fix general protection fault when tail call is invoked

2015-07-28 Thread Daniel Borkmann

With eBPF JIT compiler enabled on x86_64, I was able to reliably trigger
the following general protection fault out of an eBPF program with a simple
tail call, f.e. tracex5 (or a stripped down version of it):

  [  927.097918] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
  [...]
  [  927.100870] task: 8801f228b780 ti: 880016a64000 task.ti: 
880016a64000
  [  927.102096] RIP: 0010:[a002440d]  [a002440d] 
0xa002440d
  [  927.103390] RSP: 0018:880016a67a68  EFLAGS: 00010006
  [  927.104683] RAX: 5a5a5a5a5a5a5a5a RBX:  RCX: 
0001
  [  927.105921] RDX:  RSI: 88014e438000 RDI: 
880016a67e00
  [  927.107137] RBP: 880016a67c90 R08:  R09: 
0001
  [  927.108351] R10:  R11:  R12: 
880016a67e00
  [  927.109567] R13:  R14: 88026500e460 R15: 
880220a81520
  [  927.110787] FS:  7fe7d5c1f740() GS:88026500() 
knlGS:
  [  927.112021] CS:  0010 DS:  ES:  CR0: 80050033
  [  927.113255] CR2: 003e7bbb91a0 CR3: 6e04b000 CR4: 
001407e0
  [  927.114500] Stack:
  [  927.115737]  c90008cdb000 880016a67e00 88026500e460 
880220a81520
  [  927.117005]  0001 001b 880016a67aa8 
8106c548
  [  927.118276]  7ffcdaf22e58   
880016a67ff0
  [  927.119543] Call Trace:
  [  927.120797]  [8106c548] ? lookup_address+0x28/0x30
  [  927.122058]  [8113d176] ? __module_text_address+0x16/0x70
  [  927.123314]  [8117bf0e] ? is_ftrace_trampoline+0x3e/0x70
  [  927.124562]  [810c1a0f] ? __kernel_text_address+0x5f/0x80
  [  927.125806]  [8102086f] ? print_context_stack+0x7f/0xf0
  [  927.127033]  [810f7852] ? __lock_acquire+0x572/0x2050
  [  927.128254]  [810f7852] ? __lock_acquire+0x572/0x2050
  [  927.129461]  [8119edfa] ? trace_call_bpf+0x3a/0x140
  [  927.130654]  [8119ee4a] trace_call_bpf+0x8a/0x140
  [  927.131837]  [8119edfa] ? trace_call_bpf+0x3a/0x140
  [  927.133015]  [8119f008] kprobe_perf_func+0x28/0x220
  [  927.134195]  [811a1668] kprobe_dispatcher+0x38/0x60
  [  927.135367]  [81174b91] ? seccomp_phase1+0x1/0x230
  [  927.136523]  [81061400] kprobe_ftrace_handler+0xf0/0x150
  [  927.137666]  [81174b95] ? seccomp_phase1+0x5/0x230
  [  927.138802]  [8117950c] ftrace_ops_recurs_func+0x5c/0xb0
  [  927.139934]  [a022b0d5] 0xa022b0d5
  [  927.141066]  [81174b91] ? seccomp_phase1+0x1/0x230
  [  927.142199]  [81174b95] seccomp_phase1+0x5/0x230
  [  927.143323]  [8102c0a4] syscall_trace_enter_phase1+0xc4/0x150
  [  927.144450]  [81174b95] ? seccomp_phase1+0x5/0x230
  [  927.145572]  [8102c0a4] ? syscall_trace_enter_phase1+0xc4/0x150
  [  927.14]  [817f9a9f] tracesys+0xd/0x44
  [  927.147723] Code: 48 8b 46 10 48 39 d0 76 2c 8b 85 fc fd ff ff 83 f8 20 77 
21 83
   c0 01 89 85 fc fd ff ff 48 8d 44 d6 80 48 8b 00 48 83 f8 
00 74
   0a 48 8b 40 20 48 83 c0 33 ff e0 48 89 d8 48 8b 9d d8 
fd ff
   ff 4c
  [  927.150046] RIP  [a002440d] 0xa002440d

The code section with the instructions that trap points into the eBPF JIT
image of the root program (the one invoking the tail call instruction).

Using bpf_jit_disasm -o on the eBPF root program image:

  [...]
  4e:   mov    -0x204(%rbp),%eax
        8b 85 fc fd ff ff
  54:   cmp    $0x20,%eax               <--- if (tail_call_cnt > MAX_TAIL_CALL_CNT)
        83 f8 20
  57:   ja     0x007a
        77 21
  59:   add    $0x1,%eax                <--- tail_call_cnt++
        83 c0 01
  5c:   mov    %eax,-0x204(%rbp)
        89 85 fc fd ff ff
  62:   lea    -0x80(%rsi,%rdx,8),%rax  <--- prog = array->prog[index]
        48 8d 44 d6 80
  67:   mov    (%rax),%rax
        48 8b 00
  6a:   cmp    $0x0,%rax                <--- check for NULL
        48 83 f8 00
  6e:   je     0x007a
        74 0a
  70:   mov    0x20(%rax),%rax          <--- GPF triggered here! fetch of bpf_func
        48 8b 40 20                          [ matches 48 8b 40 20 ... from above ]
  74:   add    $0x33,%rax               <--- prologue skip of new prog
        48 83 c0 33
  78:   jmpq   *%rax                    <--- jump to new prog insns
        ff e0
  [...]

The problem is that rax has 5a5a5a5a5a5a5a5a, which suggests a tail call
jump to map slot 0 is pointing to a poisoned page. The issue is the following:

lea instruction has a wrong offset, i.e. it should be ...

  lea    0x80(%rsi,%rdx,8),%rax

... but it actually seems to be ...

  lea   -0x80(%rsi,%rdx,8),%rax

... where 0x80 is offsetof(struct bpf_array, prog), thus the offset needs
to be positive instead of negative. Disassembling the
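
One way to read the listing: the lea is encoded as 48 8d 44 d6 80, where the ModRM byte 0x44 selects a one-byte displacement, and x86 sign-extends a one-byte displacement. The sketch below only illustrates that arithmetic; the helper names are hypothetical and it is not the JIT's emitter code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value the CPU uses for a one-byte displacement: sign-extended. */
static int32_t disp8_decode(uint8_t byte)
{
	return (int32_t)(byte ^ 0x80) - 0x80;
}

/* True if an offset can be encoded in a one-byte displacement. */
static bool fits_disp8(int32_t off)
{
	return off >= -128 && off <= 127;
}

/* Offset actually applied if 'off' is blindly truncated to one byte. */
static int32_t offset_if_truncated(int32_t off)
{
	return disp8_decode((uint8_t)off);
}

/* Usage: 0x80 does not fit in one signed byte and flips to -0x80. */
static void check_disp8(void)
{
	assert(fits_disp8(0x50));
	assert(offset_if_truncated(0x50) == 0x50);
	assert(!fits_disp8(0x80));
	assert(offset_if_truncated(0x80) == -0x80);
}
```

Under this reading any offset of 0x80 or more would need the four-byte displacement form, while a smaller offset such as 0x50 would be encoded correctly.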

[PATCH net] sctp: fix sockopt size check

2015-07-28 Thread Marcelo Ricardo Leitner

The problem is not optlen being bigger than what we want, but being
smaller, as that causes a read of invalid memory.

Note that the struct changes in commit 7e8616d8e773 didn't affect the
one used by sctp_setsockopt_events, but that's where this check was flipped.

Fixes: 7e8616d8e773 ("[SCTP]: Update AUTH structures to match
declarations in draft-16.")
Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
---
 net/sctp/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1425ec2bbd5ae359a8e0408a89a6da6bb60bd87e..6c4f0dac2104d38ba6420ce6740224866a2ece82 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2195,7 +2195,7 @@ static int sctp_setsockopt_events(struct sock *sk, char __user *optval,
struct sctp_association *asoc;
struct sctp_ulpevent *event;
 
-   if (optlen > sizeof(struct sctp_event_subscribe))
+   if (optlen < sizeof(struct sctp_event_subscribe))
        return -EINVAL;
    if (copy_from_user(&sctp_sk(sk)->subscribe, optval, optlen))
return -EFAULT;
-- 
2.4.3


Re: [PATCH net-next v4] af_mpls: fix undefined reference to ip6_route_output

2015-07-28 Thread Robert Shearman


On 28/07/15 07:40, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

Undefined reference to ip6_route_output and ip_route_output
was reported with CONFIG_INET=n and CONFIG_IPV6=n.

This patch adds new CONFIG_MPLS_NEXTHOP_DEVLOOKUP
to lookup nexthop device if user has not specified it
in RTA_OIF attribute. Make CONFIG_MPLS_NEXTHOP_DEVLOOKUP
depend on INET and (IPV6 || IPV6=n) because it
uses ip6_route_output and ip_route_output.

Reported-by: kbuild test robot fengguang...@intel.com
Reported-by: Thomas Graf tg...@suug.ch
Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com


Is there a compelling reason to allow the user/applications to not 
specify the output interface and to derive it from the nexthop? If the 
user/application intends to treat this as a recursive route then it has 
to make sure to trigger route updates to the kernel anyway, and an 
application should have the output interface and real nexthop close to 
hand in that case.


If there isn't a compelling reason, then perhaps the best course of 
action is to revert the commit, instead of introducing a level of config 
complexity that means that users/applications may not be able to rely on 
this capability anyway?


Thanks,
Rob

Re: [PATCH net-next v4] af_mpls: fix undefined reference to ip6_route_output

2015-07-28 Thread roopa


On 7/28/15, 7:17 AM, Robert Shearman wrote:

On 28/07/15 07:40, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

Undefined reference to ip6_route_output and ip_route_output
was reported with CONFIG_INET=n and CONFIG_IPV6=n.

This patch adds new CONFIG_MPLS_NEXTHOP_DEVLOOKUP
to lookup nexthop device if user has not specified it
in RTA_OIF attribute. Make CONFIG_MPLS_NEXTHOP_DEVLOOKUP
depend on INET and (IPV6 || IPV6=n) because it
uses ip6_route_output and ip_route_output.

Reported-by: kbuild test robot fengguang...@intel.com
Reported-by: Thomas Graf tg...@suug.ch
Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com


Is there a compelling reason to allow the user/applications to not 
specify the output interface and to derive it from the nexthop? If the 
user/application intends to treat this as a recursive route then it 
has to make sure to trigger route updates to the kernel anyway, and an 
application should have the output interface and real nexthop close to 
hand in that case.


RTA_OIF is optional for ipv4 and ipv6 routes and we wanted to keep it
that way for mpls routes as well (Quagga is the application in our use
case).
It was a simple patch... until I realized the IPV6 dependency issues (I
will be sure to remember this next time).




If there isn't a compelling reason, then perhaps the best course of 
action is to revert the commit, instead of introducing a level of 
config complexity that means that users/applications may not be able 
to rely on this capability anyway?

The config option, though it looks complex, should not introduce any
complexity for the user. It is on by default and always on for the
default case.
Only in the cases where IPV6 is loaded as a module and
MPLS_ROUTING is not, the app may get family not supported errors.
I did suggest a revert the first time, mainly so that I could fix the
mistake I made and resubmit after proper IPV6 dependency testing.


I am in the process of trying the option that Hannes suggested.

Thanks,
Roopa



Re: [net-next 0/16] Proposal for VRF-lite - v3


On 7/27/15 2:30 PM, Eric W. Biederman wrote:

This paragraph is false when it comes to sockets, as I have already
pointed out.

- VPN Routing and Forwarding (RFC4364 and its kin) implies isolation
   strong enough to allow using the same ip on different machines
   in different VPN instances and not have confusion.

- The routing table is not the only table in the kernel that uses
   an ip address as a key.

   The result is that you can combine packet fragments that come in
   on different interfaces (irrespective of your VPN), confuse tcp
   parameters between interfaces, scramble your ipsec connections and I
   don't know what else.


The duplicate IP address is a problem with the networking stack today;
the VRF device does not introduce it. The VRF device does allow
duplicate IP addresses within a namespace but in separate VRFs, and yes,
various places that rely solely on the source address, like IP
fragmentation, do need to be fixed.


I looked at the IPv4 fragmentation code yesterday and will continue
today. So help me with the history: is there any reason why the device
index is not used today? It seems like a straightforward change.


1. simple netdevices with the same IP address
-- no problem using index in the lookup

2. 2 ipsec tunnels -- different netdevices, same IP address
-- no problem using index

3. stacked devices like bonding and team interfaces appear to the stack 
as a single device

-- no problem using index of stacked device

4. If an interface is deleted and a new one is created with the same IP 
address then we want to fail the lookup

-- no problem using index

5. other???

Is there a use case where I can't add ifindex of the incoming device (or 
higher level device if skb-dev is changed) to the hash and lookup for 
fragments?
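
What "adding ifindex to the hash and lookup" means can be sketched in standalone C. Everything here is hypothetical and simplified for illustration: struct frag_key is not the kernel's fragment queue, and the mixing function is an FNV-style stand-in for whatever hash the kernel actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4 reassembly key. */
struct frag_key {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t id;
	uint8_t protocol;
	int ifindex;		/* proposed addition: incoming device */
};

/* Lookup comparison: fragments match only if the device matches too. */
static bool frag_key_equal(const struct frag_key *a, const struct frag_key *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->id == b->id && a->protocol == b->protocol &&
	       a->ifindex == b->ifindex;
}

/* Hash over all key fields, including the device index. */
static uint32_t frag_key_hash(const struct frag_key *k)
{
	const uint32_t words[4] = {
		k->saddr,
		k->daddr,
		((uint32_t)k->id << 8) | k->protocol,
		(uint32_t)k->ifindex,
	};
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return h;
}

/* Usage: same addresses and IP id, but different incoming devices. */
static void check_frag_key(void)
{
	struct frag_key a = { 0x0a000001, 0x0a000002, 42, 17, 3 };
	struct frag_key b = a;
	struct frag_key c = a;

	b.ifindex = 4;
	assert(frag_key_equal(&a, &c));
	assert(frag_key_hash(&a) == frag_key_hash(&c));
	assert(!frag_key_equal(&a, &b));
	assert(frag_key_hash(&a) != frag_key_hash(&b));
}
```

With the index in both functions, two fragments that share addresses, protocol and IP id but arrive on different devices land in different queues instead of being reassembled together.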




Version 3
- addressed comments from first 2 RFCs with the exception of the name
   Nicolas: We will do the name conversion once we agree on what the
correct name should be (vrf, mrf or something else)


Not so.  I described the deep problems between your goals and your
implementation and they are not even mentioned let alone addressed.


I have addressed comments to the extent that I can. As I stated in my
last followup to you, Eric, I did not understand your point. I asked for
clarification, a --verbose if you will. I can't read your mind, so I
need you to elaborate on your points for me to be able to respond and
address your concerns.





-  packets flow through the VRF device in both directions allowing the
following:
- tcpdump -i vrfn
- tc rules on vrf device
- netfilter rules on vrf device

Ingo/Andy: I added you two as a start point for the proposed task related
changes. Not sure who should be the reviewer; please let me know
if someone else is more appropriate. Thanks.


It looks like you are trying to implement a namespace that isn't a
namespace.  Given that it is broken by design you have my nack.


This is an L3 separation within a namespace, not a device level 
separation which is what namespaces provide.


David

Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct


On 7/28/15 10:01 AM, Eric Dumazet wrote:

On Tue, 2015-07-28 at 14:19 +0200, Hannes Frederic Sowa wrote:

Hello Eric,

On Mon, 2015-07-27 at 15:33 -0500, Eric W. Biederman wrote:

David Ahern d...@cumulusnetworks.com writes:


Allow tasks to have a default device index for binding sockets. If set
the value is passed to all AF_INET/AF_INET6 sockets when they are
created.

The task setting is passed parent to child on fork, but can be set or
changed after task creation using prctl (if task has CAP_NET_ADMIN
permissions). The setting for a socket can be retrieved using prctl().
This option allows an administrator to restrict a task to only
send/receive packets through the specified device. In the case of VRF
devices this option restricts tasks to a specific VRF.

Correlation of the device index to a specific VRF, ie.,
ifindex --> VRF device --> VRF id
is left to userspace.


Nacked-by: Eric W. Biederman ebied...@xmission.com

Because it is broken by design.  Your routing device is only safe for
programs that know its limitations; it is not appropriate for general
applications.

Since you don't even seem to know its limitations I think this is a
bad path to walk down.


Can you please elaborate about the broken by design?

Different operating systems are already using this approach with good
success. I read your other mail regarding isolation of different VRFs
and I agree that all code which persists state depending solely on the
IP address is affected by this and this must be dealt with and fixed
(actually, there aren't too many).

But I wouldn't call that broken by design. This stuff will get fixed
like e.g. cross-talk between fragmentation queues, icmp rate limiters
etc, which could already happen in the past.

What is your opinion on the fundamental approach only from a user
perspective? Do you think that is broken, too?


I agree with Eric here.

This sk_bind_dev_if on task_struct is quite a hack.

What will be added next ? An array of dev_if ? netfilter support ?
af_packet support ? What about /proc files and netlink dumps ?


It could just as easily be a pointer to a struct (e.g., struct net_ctx) 
such that the intrusion to task_struct is simply 8 bytes -- very similar 
to the nsproxy used for the assorted namespaces. The struct can then 
contain whatever network config is imposed on the task.
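
A rough userspace sketch of that idea follows. It is only a model under stated assumptions: struct net_ctx, struct task and every helper name are hypothetical and do not come from the posted series; the point is just that the task carries one pointer to a shared, refcounted context, much like nsproxy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-task network context; not part of the posted patch. */
struct net_ctx {
	int refcount;        /* number of tasks sharing this context */
	int sk_bind_dev_if;  /* default device index for new sockets, 0 = none */
};

/* Minimal stand-in for task_struct: the only cost is one pointer. */
struct task {
	struct net_ctx *net_ctx;
};

/* Allocate a context holding a single reference. */
struct net_ctx *net_ctx_alloc(int ifindex)
{
	struct net_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->refcount = 1;
	ctx->sk_bind_dev_if = ifindex;
	return ctx;
}

/* On fork the child shares the parent's context and takes a reference. */
void task_fork_net_ctx(struct task *child, const struct task *parent)
{
	child->net_ctx = parent->net_ctx;
	if (child->net_ctx)
		child->net_ctx->refcount++;
}

/* Drop the task's reference; free the context when the last user goes. */
void task_put_net_ctx(struct task *t)
{
	assert(t->net_ctx == NULL || t->net_ctx->refcount > 0);
	if (t->net_ctx && --t->net_ctx->refcount == 0)
		free(t->net_ctx);
	t->net_ctx = NULL;
}

/* Device index a newly created socket would inherit (0 = unbound). */
int task_default_ifindex(const struct task *t)
{
	return t->net_ctx ? t->net_ctx->sk_bind_dev_if : 0;
}
```

Further settings could later be added to the context struct without touching the task structure again.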




We already have network namespaces. Extend this if needed, instead of
bypassing them.


Problems with using network namespaces for VRFs have been discussed in
the past, e.g.,

http://www.spinics.net/lists/netdev/msg298368.html

David



No need to add something else (with lack of proper reporting for various
tools)





Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct

2015-07-28 Thread Andy Lutomirski

On Jul 27, 2015 11:33 AM, David Ahern d...@cumulusnetworks.com wrote:

 Allow tasks to have a default device index for binding sockets. If set
 the value is passed to all AF_INET/AF_INET6 sockets when they are created.


This is not intended to be a review of the concept.  I haven't thought
about whether the concept is a good idea, broken by design, or
whatever.  FWIW, if this were added to the kernel and didn't require
excessive privilege, I'd probably use it.  (I still don't really
understand why binding to a device requires privilege in the first
place, but, again, I haven't thought about it very much.)

 +#ifdef CONFIG_NET
 +   case PR_SET_SK_BIND_DEV_IF:
 +   {
 +   struct net_device *dev;
 +   int idx = (int) arg2;
 +
 +   if (!capable(CAP_NET_ADMIN))
 +   return -EPERM;
 +

Can you either use ns_capable or add a comment as to why not?

Also, please return -EINVAL if unused args are nonzero.

 +   if (idx) {
 +   dev = dev_get_by_index(me->nsproxy->net_ns, idx);
 +   if (!dev)
 +   return -EINVAL;
 +   dev_put(dev);
 +   }
 +   me->sk_bind_dev_if = idx;
 +   break;
 +   }
 +   case PR_GET_SK_BIND_DEV_IF:
 +   {
 +   struct task_struct *tsk;
 +   int sk_bind_dev_if = -EINVAL;
 +
 +   rcu_read_lock();
 +   tsk = find_task_by_vpid(arg2);
 +   if (tsk)
 +   sk_bind_dev_if = tsk->sk_bind_dev_if;

Why do you support different tasks here?  Could this use proc instead?

The same -EINVAL issue applies.

Also, I think you need to hook setns and unshare to do something
reasonable when the task is bound to a device.
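
To make the requested checks concrete, here is a minimal userspace model, not kernel code. The function name, the has_cap flag (standing in for a ns_capable(ns, CAP_NET_ADMIN) check) and the dev_exists callback (standing in for a successful dev_get_by_index() lookup) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the "set" path with the extra validation:
 *  - refuse callers without the capability,
 *  - reject nonzero unused arguments so they can be given meaning later,
 *  - reject a nonzero index that does not name an existing device.
 */
long demo_set_bind_dev(unsigned long arg2, unsigned long arg3,
		       unsigned long arg4, unsigned long arg5,
		       bool has_cap, bool (*dev_exists)(int idx),
		       int *sk_bind_dev_if)
{
	int idx = (int)arg2;

	assert(sk_bind_dev_if != NULL);

	if (!has_cap)
		return -EPERM;
	if (arg3 || arg4 || arg5)
		return -EINVAL;
	if (idx && !dev_exists(idx))
		return -EINVAL;

	*sk_bind_dev_if = idx;	/* 0 clears the binding */
	return 0;
}

/* Toy device table for the example: only ifindex 2 exists. */
bool demo_dev_exists(int idx)
{
	return idx == 2;
}
```

Rejecting nonzero unused arguments up front keeps the door open for extending the interface without breaking callers that passed garbage.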

--Andy

Re: [PATCH 1/3] net: mdio-octeon: Modify driver to work on both ThunderX and Octeon

2015-07-28 Thread David Daney


On 07/27/2015 07:14 PM, mohun...@gmail.com wrote:

From: Radha Mohan Chintakuntla rchintakun...@cavium.com

This patch modifies the mdio-octeon driver to work on both ThunderX and
Octeon SoCs from Cavium Inc.

Signed-off-by: Sunil Goutham sgout...@cavium.com
Signed-off-by: Radha Mohan Chintakuntla rchintakun...@cavium.com
Signed-off-by: David Daney david.da...@cavium.com
---
  drivers/net/phy/Kconfig   |9 ++-
  drivers/net/phy/mdio-octeon.c |  122 +++-
  2 files changed, 111 insertions(+), 20 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index cf18940..0d6af19 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -145,13 +145,14 @@ config MDIO_GPIO
  will be called mdio-gpio.

  config MDIO_OCTEON
-   tristate "Support for MDIO buses on Octeon SOCs"
-   depends on CAVIUM_OCTEON_SOC
+   tristate "Support for MDIO buses on Octeon and ThunderX SOCs"
+   depends on 64BIT
default y


If it now depends only on 64BIT, we should probably remove the 
default.  People building for x86 are not interested in this driver.



[...]


+#ifdef __BIG_ENDIAN_BITFIELD
+#define OCT_MDIO_BITFIELD_FIELD(field, more)   \
+   field;  \
+   more
+
+#else
+#define OCT_MDIO_BITFIELD_FIELD(field, more)   \
+   more\
+   field;
+
+#endif
+
+union cvmx_smix_clk {
+   uint64_t u64;


Perhaps: s/uint64_t/u64/

There are several of these.



+   struct cvmx_smix_clk_s {
+ OCT_MDIO_BITFIELD_FIELD(u64 reserved_25_63:39,
+ OCT_MDIO_BITFIELD_FIELD(u64 mode:1,
+ OCT_MDIO_BITFIELD_FIELD(u64 reserved_21_23:3,
+ OCT_MDIO_BITFIELD_FIELD(u64 sample_hi:5,
+ OCT_MDIO_BITFIELD_FIELD(u64 sample_mode:1,
+ OCT_MDIO_BITFIELD_FIELD(u64 reserved_14_14:1,
+ OCT_MDIO_BITFIELD_FIELD(u64 clk_idle:1,
+ OCT_MDIO_BITFIELD_FIELD(u64 preamble:1,
+ OCT_MDIO_BITFIELD_FIELD(u64 sample:4,
+ OCT_MDIO_BITFIELD_FIELD(u64 phase:8,
+ ;))))))))))
+   } s;
+};
+

[...]
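
For readers unfamiliar with the nested-macro idiom quoted above, a small standalone sketch may help. It assumes GCC or Clang (64-bit bitfield types and the __BYTE_ORDER__ macros are compiler extensions), and the DEMO_ names and the two-field layout are made up for the example. The macro emits the fields in listed order on big-endian builds and in reverse order on little-endian builds, so the same source describes the same register bits either way.

```c
#include <assert.h>
#include <stdint.h>

/* Emit "field; more" on big-endian, "more field;" on little-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_BITFIELD_FIELD(field, more) field; more
#else
#define DEMO_BITFIELD_FIELD(field, more) more field;
#endif

/* Hypothetical register: bits 63..8 reserved, bits 7..0 "phase". */
union demo_clk {
	uint64_t u64;
	struct {
		/* Fields are listed most-significant first. The innermost
		 * call passes an empty "more" argument to end the chain. */
		DEMO_BITFIELD_FIELD(uint64_t reserved:56,
		DEMO_BITFIELD_FIELD(uint64_t phase:8,
		))
	} s;
};

/* Build a register value with only the phase field set. */
uint64_t demo_clk_with_phase(unsigned int phase)
{
	union demo_clk clk;

	assert(sizeof(clk) == sizeof(uint64_t));
	clk.u64 = 0;
	clk.s.phase = phase & 0xffu;
	return clk.u64;
}
```

With this layout the phase field always lands in the low 8 bits of u64, whichever branch of the #if was taken.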

Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct

On Tue, 2015-07-28 at 14:19 +0200, Hannes Frederic Sowa wrote:
 Hello Eric,
 
 On Mon, 2015-07-27 at 15:33 -0500, Eric W. Biederman wrote:
  David Ahern d...@cumulusnetworks.com writes:
  
   Allow tasks to have a default device index for binding sockets. If 
   set
   the value is passed to all AF_INET/AF_INET6 sockets when they are
   created.
   
   The task setting is passed parent to child on fork, but can be set 
   or
   changed after task creation using prctl (if task has CAP_NET_ADMIN
   permissions). The setting for a socket can be retrieved using 
   prctl().
   This option allows an administrator to restrict a task to only 
   send/receive
   packets through the specified device. In the case of VRF devices 
   this
   option restricts tasks to a specific VRF.
   
   Correlation of the device index to a specific VRF, ie.,
  ifindex --> VRF device --> VRF id
   is left to userspace.
  
  Nacked-by: Eric W. Biederman ebied...@xmission.com
  
  Because it is broken by design.  Your routing device is only safe for
  programs that know its limitations; it is not appropriate for general
  applications.
  
  Since you don't even seem to know its limitations I think this is a
  bad path to walk down.
 
 Can you please elaborate about the broken by design?
 
 Different operating systems are already using this approach with good
 success. I read your other mail regarding isolation of different VRFs
 and I agree that all code which persists state depending solely on the
 IP address is affected by this and this must be dealt with and fixed
 (actually, there aren't too many).
 
 But I wouldn't call that broken by design. This stuff will get fixed
 like e.g. cross-talk between fragmentation queues, icmp rate limiters
 etc, which could already happen in the past.
 
 What is your opinion on the fundamental approach only from a user
 perspective? Do you think that is broken, too?

I agree with Eric here.

This sk_bind_dev_if on task_struct is quite a hack.

What will be added next ? An array of dev_if ? netfilter support ?
af_packet support ? What about /proc files and netlink dumps ?

We already have network namespaces. Extend this if needed, instead of
bypassing them.

No need to add something else (with lack of proper reporting for various
tools)



Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct


On 7/28/15 9:25 AM, Andy Lutomirski wrote:

On Jul 27, 2015 11:33 AM, David Ahern d...@cumulusnetworks.com wrote:


Allow tasks to have a default device index for binding sockets. If set
the value is passed to all AF_INET/AF_INET6 sockets when they are created.



This is not intended to be a review of the concept.  I haven't thought
about whether the concept is a good idea, broken by design, or
whatever.  FWIW, if this were added to the kernel and didn't require
excessive privilege, I'd probably use it.  (I still don't really
understand why binding to a device requires privilege in the first
place, but, again, I haven't thought about it very much.)


The intent here is to restrict a task to only sending and receiving 
packets from a single network device. The device can be single ethernet 
interface, a stacked device (e.g, bond) or in our case a VRF device 
which restricts a task to interfaces (and hence network paths) 
associated with the VRF.





+#ifdef CONFIG_NET
+   case PR_SET_SK_BIND_DEV_IF:
+   {
+   struct net_device *dev;
+   int idx = (int) arg2;
+
+   if (!capable(CAP_NET_ADMIN))
+   return -EPERM;
+


Can you either use ns_capable or add a comment as to why not?


will do.



Also, please return -EINVAL if unused args are nonzero.


ok.




+   if (idx) {
+   dev = dev_get_by_index(me->nsproxy->net_ns, idx);
+   if (!dev)
+   return -EINVAL;
+   dev_put(dev);
+   }
+   me->sk_bind_dev_if = idx;
+   break;
+   }
+   case PR_GET_SK_BIND_DEV_IF:
+   {
+   struct task_struct *tsk;
+   int sk_bind_dev_if = -EINVAL;
+
+   rcu_read_lock();
+   tsk = find_task_by_vpid(arg2);
+   if (tsk)
+   sk_bind_dev_if = tsk->sk_bind_dev_if;


Why do you support different tasks here?  Could this use proc instead?


In this case we want to allow a separate process to determine if a task 
is restricted to a device.




The same -EINVAL issue applies.

Also, I think you need to hook setns and unshare to do something
reasonable when the task is bound to a device.


ack on both.

Thanks for the review,
David

[PATCH 0/8] Use correctly the Xen memory terminologies in Linux

2015-07-28 Thread Julien Grall

Hi all,

This patch series aims to use the memory terminologies described in
include/linux/mm.h [1] for Linux xen code.

Linux mistakenly uses MFN where GFN is meant; I suspect this is because the
first Xen support was for PV. This has led to some incorrect implementations
of memory helpers on ARM and has confused developers about the expected
behavior.

For instance, based on its name, we expect pfn_to_mfn to return an MFN.
However, if we look at the implementation on x86, it returns a GFN.
Most of the callers are also using it this way.

The first 2 patches of this series are ARM related: they remove
PV-specific helpers which should not be used and fix the implementation of
pfn_to_mfn.

The rest of the series renames most uses of MFN in the common code
to GFN. I also took the opportunity to replace most calls to
pfn_to_gfn in the common code with page_to_gfn, to avoid constructions such
as pfn_to_gfn(page_to_pfn(...)).
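
As an illustration of that last point, a standalone sketch of such a wrapper is shown here. It is a userspace model: the demo_ names, the flat memory map and the identity PFN-to-GFN mapping are assumptions for the example, not the helpers from the series.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct page. */
struct page {
	int unused;
};

#define DEMO_NR_PAGES 16

/* Flat memory map: a page's PFN is its index in this array. */
static struct page demo_mem_map[DEMO_NR_PAGES];

unsigned long demo_page_to_pfn(const struct page *page)
{
	assert(page >= demo_mem_map && page < demo_mem_map + DEMO_NR_PAGES);
	return (unsigned long)(page - demo_mem_map);
}

/* Assumption for this sketch: guest frame numbers equal PFNs. */
unsigned long demo_pfn_to_gfn(unsigned long pfn)
{
	return pfn;
}

/* The wrapper callers use instead of nesting the two helpers. */
unsigned long demo_page_to_gfn(const struct page *page)
{
	return demo_pfn_to_gfn(demo_page_to_pfn(page));
}
```

Callers that already hold a struct page can then ask for the GFN in one step.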

Note that the xen-blkfront change will be dropped by the 64K series [2]; I can
include it if necessary.

This series is based on Linux 4.2-rc4. A branch with all the patches
can be found here:
git://xenbits.xen.org/people/julieng/linux-arm.git branch page-renaming-v1

Sincerely yours,

[1] Xen tree: e758ed14f390342513405dd766e874934573e6cb
[2] https://lkml.org/lkml/2015/7/9/628

Cc: Boris Ostrovsky boris.ostrov...@oracle.com
Cc: David Vrabel david.vra...@citrix.com
Cc: Dmitry Torokhov dmitry.torok...@gmail.com
Cc: Greg Kroah-Hartman gre...@linuxfoundation.org
Cc: H. Peter Anvin h...@zytor.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: Ingo Molnar mi...@redhat.com
Cc: James E.J. Bottomley jbottom...@odin.com
Cc: Jean-Christophe Plagniol-Villard plagn...@jcrosoft.com
Cc: Jiri Slaby jsl...@suse.com
Cc: Juergen Gross jgr...@suse.com
Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
Cc: linux-...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-fb...@vger.kernel.org
Cc: linux-in...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: Roger Pau Monné roger@citrix.com
Cc: Russell King li...@arm.linux.org.uk
Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
Cc: Thomas Gleixner t...@linutronix.de
Cc: Tomi Valkeinen tomi.valkei...@ti.com
Cc: Wei Liu wei.l...@citrix.com
Cc: x...@kernel.org

Julien Grall (8):
  arm/xen: Remove helpers which are PV specific
  xen: Make clear that swiotlb and biomerge are dealing with DMA address
  arm/xen: implement correctly pfn_to_mfn
  xen: Use the correctly the Xen memory terminologies
  xen/tmem: Use page_to_gfn rather than pfn_to_gfn
  video/xen-fbfront: Further s/MFN/GFN clean-up
  hvc/xen: Further s/MFN/GFN clean-up
  xen/privcmd: Further s/MFN/GFN/ clean-up

 arch/arm/include/asm/xen/page.h         | 44 +++---
 arch/arm/xen/enlighten.c                | 18 ++---
 arch/arm/xen/mm.c                       |  4 +--
 arch/x86/include/asm/xen/page.h         | 34 +--
 arch/x86/xen/enlighten.c                |  4 +--
 arch/x86/xen/mmu.c                      | 48 -
 arch/x86/xen/p2m.c                      | 32 +++---
 arch/x86/xen/setup.c                    | 12 -
 arch/x86/xen/smp.c                      |  4 +--
 arch/x86/xen/suspend.c                  |  8 +++---
 drivers/block/xen-blkfront.c            |  6 ++---
 drivers/input/misc/xen-kbdfront.c       |  4 +--
 drivers/net/xen-netback/netback.c       |  4 +--
 drivers/net/xen-netfront.c              |  8 +++---
 drivers/scsi/xen-scsifront.c            |  8 +++---
 drivers/tty/hvc/hvc_xen.c               | 18 +
 drivers/video/fbdev/xen-fbfront.c       | 20 +++---
 drivers/xen/balloon.c                   |  2 +-
 drivers/xen/biomerge.c                  |  6 ++---
 drivers/xen/events/events_base.c        |  2 +-
 drivers/xen/events/events_fifo.c        |  4 +--
 drivers/xen/gntalloc.c                  |  3 ++-
 drivers/xen/manage.c                    |  2 +-
 drivers/xen/privcmd.c                   | 44 +++---
 drivers/xen/swiotlb-xen.c               | 16 +--
 drivers/xen/tmem.c                      | 21 +--
 drivers/xen/xenbus/xenbus_client.c      |  2 +-
 drivers/xen/xenbus/xenbus_dev_backend.c |  2 +-
 drivers/xen/xenbus/xenbus_probe.c       |  8 +++---
 drivers/xen/xlate_mmu.c                 | 18 ++---
 include/uapi/xen/privcmd.h              |  4 +++
 include/xen/page.h                      |  4 +--
 include/xen/xen-ops.h                   | 10 +++
 33 files changed, 210 insertions(+), 214 deletions(-)

-- 
2.1.4


Re: [PATCH net-next v4] af_mpls: fix undefined reference to ip6_route_output

2015-07-28 Thread roopa


On 7/28/15, 6:04 AM, Hannes Frederic Sowa wrote:

On Mon, 2015-07-27 at 23:40 -0700, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

Undefined reference to ip6_route_output and ip_route_output
was reported with CONFIG_INET=n and CONFIG_IPV6=n.

This patch adds new CONFIG_MPLS_NEXTHOP_DEVLOOKUP
to lookup nexthop device if user has not specified it
in RTA_OIF attribute. Make CONFIG_MPLS_NEXTHOP_DEVLOOKUP
depend on INET and (IPV6 || IPV6=n) because it
uses ip6_route_output and ip_route_output.

Reported-by: kbuild test robot fengguang...@intel.com
Reported-by: Thomas Graf tg...@suug.ch
Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---

v1 -> v2: use IS_BUILTIN

v2 -> v3: Use new Kconfig option that depends on (IPV6 || IPV6=n) as
 suggested by Dave. Also uses IS_ERR as suggested by Thomas.

v3 -> v4: Include missed case of (MPLS_ROUTING=y && IPV6=m) reported by
  Dave.

  net/mpls/Kconfig   |8 
  net/mpls/af_mpls.c |   19 ++-
  2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 5c467ef..134764e 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -33,4 +33,12 @@ config MPLS_IPTUNNEL
---help---
 mpls ip tunnel support.
  
+config MPLS_NEXTHOP_DEVLOOKUP

+   bool "MPLS: nexthop oif dev lookup"
+   depends on MPLS_ROUTING && INET && \
+   ((IPV6 && !(MPLS_ROUTING=y && IPV6=m)) || IPV6=n)
+   ---help---
+This enables mpls route nexthop dev lookup when oif is not
+specified by user
+

Urks.

Can't you simply use ipv6_stub_impl.ipv6_dst_lookup with sk=NULL to do
that and don't have a run-time dependency on IPv6 at all (for the cost
of a function pointer).

I did not realize that this could be an option. I now see vxlan using it.
I will try it out.
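
For context, the stub approach boils down to calling through a table of function pointers that always exists, with the real implementation plugged in only when the IPv6 code is present. The following is a userspace sketch of that pattern; all demo_ names, the simplified lookup signature and the choice of -EAFNOSUPPORT for the default are assumptions for illustration, not the kernel's actual ipv6_stub definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Table of operations the "IPv6 module" may provide. */
struct demo_ipv6_ops {
	int (*dst_lookup)(int oif, int *out_ifindex);
};

/* Default used while no implementation is loaded. */
static int demo_dst_lookup_unsupported(int oif, int *out_ifindex)
{
	(void)oif;
	(void)out_ifindex;
	return -EAFNOSUPPORT;
}

/* Toy "real" lookup: echo the requested oif, or pick device 1. */
static int demo_dst_lookup_real(int oif, int *out_ifindex)
{
	assert(out_ifindex != NULL);
	*out_ifindex = oif ? oif : 1;
	return 0;
}

static const struct demo_ipv6_ops demo_ops_default = {
	.dst_lookup = demo_dst_lookup_unsupported,
};

static const struct demo_ipv6_ops demo_ops_loaded = {
	.dst_lookup = demo_dst_lookup_real,
};

/* Callers only ever dereference this pointer; it is never NULL. */
static const struct demo_ipv6_ops *demo_stub = &demo_ops_default;

void demo_ipv6_module_load(void)
{
	demo_stub = &demo_ops_loaded;
}

void demo_ipv6_module_unload(void)
{
	demo_stub = &demo_ops_default;
}

/* Caller side: no link-time dependency on the IPv6 code. */
int demo_find_nexthop_dev(int oif, int *out_ifindex)
{
	return demo_stub->dst_lookup(oif, out_ifindex);
}
```

The caller pays for one indirect call and gets a clean error when the implementation is absent, which is what removes the need for a separate config option.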


Maybe same for IPv4?
I would prefer leaving IPV4 alone with CONFIG_INET. IPV6 was my problem 
case.

Let me see if i can fix that first without introducing a config option.

Thanks,
Roopa






RE: [PATCH net 1/2] r8152: add pre_reset and post_reset

Oliver Neukum [mailto:oneu...@suse.com]
 Sent: Tuesday, July 28, 2015 4:53 PM
[...]
  +   return 0;
  +
  +   netdev = tp->netdev;
  +   if (!netif_running(netdev))
  +   return 0;
  +
  +   ret = usb_autopm_get_interface(intf);
  +   if (ret < 0)
  +   return ret;
 
 What sense does this make?
 
[...]
  +   return 0;
  +
  +   netdev = tp->netdev;
  +   if (!netif_running(netdev))
  +   return 0;
  +
  +   ret = usb_autopm_get_interface(intf);
 
 The device will be awake.

I wasn't sure whether the device would be in runtime suspend, so I woke it up
myself.
I think you mean I don't have to do this. I will remove those calls and resend the
patch. Thanks.

Best Regards,
Hayes

[PATCH V4 2/7] Drivers: hv: vmbus: define a new VMBus message type for hvsock

A function to send the type of message is also added.

The coming net/hvsock driver will use this function to proactively request
the host to offer a VMBus channel for a new hvsock connection.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 drivers/hv/channel.c  | 15 +++
 drivers/hv/channel_mgmt.c |  4 
 include/linux/hyperv.h| 13 +
 3 files changed, 32 insertions(+)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 603ce97..b09d1b7 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -218,6 +218,21 @@ error0:
 }
 EXPORT_SYMBOL_GPL(vmbus_open);
 
+/* Used for Hyper-V Socket: a guest client's connect() to the host */
+int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
+ const uuid_le *shv_host_servie_id)
+{
+   struct vmbus_channel_tl_connect_request conn_msg;
+
+   memset(&conn_msg, 0, sizeof(conn_msg));
+   conn_msg.header.msgtype = CHANNELMSG_TL_CONNECT_REQUEST;
+   conn_msg.guest_endpoint_id = *shv_guest_servie_id;
+   conn_msg.host_service_id = *shv_host_servie_id;
+
+   return vmbus_post_msg(&conn_msg, sizeof(conn_msg));
+}
+EXPORT_SYMBOL_GPL(vmbus_send_tl_connect_request);
+
 /*
  * create_gpadl_header - Creates a gpadl for the specified buffer
  */
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 4506a66..7018c53 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -772,6 +772,10 @@ struct vmbus_channel_message_table_entry
{CHANNELMSG_VERSION_RESPONSE,   1, vmbus_onversion_response},
{CHANNELMSG_UNLOAD, 0, NULL},
{CHANNELMSG_UNLOAD_RESPONSE,1, vmbus_unload_response},
+   {CHANNELMSG_18, 0, NULL},
+   {CHANNELMSG_19, 0, NULL},
+   {CHANNELMSG_20, 0, NULL},
+   {CHANNELMSG_TL_CONNECT_REQUEST, 0, NULL},
 };
 
 /*
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2ca3ac1..264093a 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -393,6 +393,10 @@ enum vmbus_channel_message_type {
CHANNELMSG_VERSION_RESPONSE = 15,
CHANNELMSG_UNLOAD   = 16,
CHANNELMSG_UNLOAD_RESPONSE  = 17,
+   CHANNELMSG_18   = 18,
+   CHANNELMSG_19   = 19,
+   CHANNELMSG_20   = 20,
+   CHANNELMSG_TL_CONNECT_REQUEST   = 21,
CHANNELMSG_COUNT
 };
 
@@ -563,6 +567,13 @@ struct vmbus_channel_initiate_contact {
u64 monitor_page2;
 } __packed;
 
+/* Hyper-V socket: guest's connect()-ing to host */
+struct vmbus_channel_tl_connect_request {
+   struct vmbus_channel_message_header header;
+   uuid_le guest_endpoint_id;
+   uuid_le host_service_id;
+} __packed;
+
 struct vmbus_channel_version_response {
struct vmbus_channel_message_header header;
u8 version_supported;
@@ -1248,4 +1259,6 @@ extern struct resource hyperv_mmio;
 
 extern __u32 vmbus_proto_version;
 
+int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
+ const uuid_le *shv_host_servie_id);
 #endif /* _HYPERV_H */
-- 
2.1.0


[PATCH V4 3/7] Drivers: hv: vmbus: add APIs to send/recv hvsock packet and get the r/w-ability

This will be used by the coming net/hvsock driver.
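
For orientation, the send path in the diff below pads each packet to an 8-byte boundary and records offsets and lengths in 8-byte units. The arithmetic can be sketched on its own as follows. The header sizes (16 bytes for the packet descriptor, 8 bytes for the pipe header) are assumptions for this example, since HVSOCK_HEADER_LEN is not defined in the visible part of the patch, and the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DESC_LEN      16u  /* assumed size of the packet descriptor */
#define DEMO_PIPE_HDR_LEN   8u  /* assumed size of the pipe header */
#define DEMO_HEADER_LEN    (DEMO_DESC_LEN + DEMO_PIPE_HDR_LEN)

/* Round x up to a multiple of a, where a is a power of two. */
#define DEMO_ALIGN(x, a)   (((x) + ((a) - 1u)) & ~((a) - 1u))

struct demo_pkt_layout {
	uint16_t offset8;  /* where the data starts, in 8-byte units */
	uint16_t len8;     /* padded packet length, in 8-byte units */
	uint32_t pad;      /* zero bytes appended after the payload */
};

/* Compute the descriptor fields for a payload of len bytes. */
struct demo_pkt_layout demo_hvsock_layout(uint32_t len)
{
	struct demo_pkt_layout l;
	uint32_t packetlen, aligned;

	/* len8 is 16 bits wide, so the padded length must fit in it. */
	assert(len <= 65535u * 8u - DEMO_HEADER_LEN);

	packetlen = DEMO_HEADER_LEN + len;
	aligned = DEMO_ALIGN(packetlen, 8u);

	l.offset8 = (uint16_t)(DEMO_DESC_LEN >> 3);
	l.len8 = (uint16_t)(aligned >> 3);
	l.pad = aligned - packetlen;
	return l;
}
```

Under these assumed sizes, a 5-byte payload gives a 29-byte packet that is padded by 3 bytes to 32, so len8 is 4 and offset8 is 2.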

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 drivers/hv/channel.c  | 134 ++
 drivers/hv/hyperv_vmbus.h |   4 ++
 drivers/hv/ring_buffer.c  |  14 +
 include/linux/hyperv.h|  32 +++
 4 files changed, 184 insertions(+)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index b09d1b7..531a142 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -758,6 +758,53 @@ int vmbus_sendpacket_pagebuffer_ctl(struct vmbus_channel 
*channel,
 EXPORT_SYMBOL_GPL(vmbus_sendpacket_pagebuffer_ctl);
 
 /*
+ * vmbus_sendpacket_hvsock - Send the hvsock payload 'buf' into the vmbus
+ * ringbuffer
+ */
+int vmbus_sendpacket_hvsock(struct vmbus_channel *channel, void *buf, u32 len)
+{
+   struct vmpipe_proto_header pipe_hdr;
+   struct vmpacket_descriptor desc;
+   struct kvec bufferlist[4];
+   u32 packetlen_aligned;
+   u32 packetlen;
+   u64 aligned_data = 0;
+   bool signal = false;
+   int ret;
+
+   packetlen = HVSOCK_HEADER_LEN + len;
+   packetlen_aligned = ALIGN(packetlen, sizeof(u64));
+
+   /* Setup the descriptor */
+   desc.type = VM_PKT_DATA_INBAND;
+   /* in 8-bytes granularity */
+   desc.offset8 = sizeof(struct vmpacket_descriptor) >> 3;
+   desc.len8 = (u16)(packetlen_aligned >> 3);
+   desc.flags = 0;
+   desc.trans_id = 0;
+
+   pipe_hdr.pkt_type = 1;
+   pipe_hdr.data_size = len;
+
+   bufferlist[0].iov_base = &desc;
+   bufferlist[0].iov_len  = sizeof(struct vmpacket_descriptor);
+   bufferlist[1].iov_base = &pipe_hdr;
+   bufferlist[1].iov_len  = sizeof(struct vmpipe_proto_header);
+   bufferlist[2].iov_base = buf;
+   bufferlist[2].iov_len  = len;
+   bufferlist[3].iov_base = &aligned_data;
+   bufferlist[3].iov_len  = packetlen_aligned - packetlen;
+
+   ret = hv_ringbuffer_write(&channel->outbound, bufferlist, 4, &signal);
+
+   if (ret == 0 && signal)
+   vmbus_setevent(channel);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(vmbus_sendpacket_hvsock);
+
+/*
  * vmbus_sendpacket_pagebuffer - Send a range of single-page buffer
  * packets using a GPADL Direct packet type.
  */
@@ -978,3 +1025,90 @@ int vmbus_recvpacket_raw(struct vmbus_channel *channel, 
void *buffer,
return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_recvpacket_raw);
+
+/*
+ * vmbus_recvpacket_hvsock - Receive the hvsock payload from the vmbus
+ * ringbuffer into the 'buffer'.
+ */
+int vmbus_recvpacket_hvsock(struct vmbus_channel *channel, void *buffer,
+   u32 bufferlen, u32 *buffer_actual_len)
+{
+   struct vmpipe_proto_header *pipe_hdr;
+   struct vmpacket_descriptor *desc;
+   u32 packet_len, payload_len;
+   bool signal = false;
+   int ret;
+
+   *buffer_actual_len = 0;
+
+   if (bufferlen < HVSOCK_HEADER_LEN)
+   return -ENOBUFS;
+
+   ret = hv_ringbuffer_peek(&channel->inbound, buffer,
+HVSOCK_HEADER_LEN);
+   if (ret != 0)
+   return ret;
+
+   desc = (struct vmpacket_descriptor *)buffer;
+   packet_len = desc->len8 << 3;
+   if (desc->type != VM_PKT_DATA_INBAND ||
+   desc->offset8 != (sizeof(*desc) / 8) ||
+   packet_len < HVSOCK_HEADER_LEN)
+   return -EIO;
+
+   pipe_hdr = (struct vmpipe_proto_header *)(desc + 1);
+   payload_len = pipe_hdr->data_size;
+
+   if (pipe_hdr->pkt_type != 1 || payload_len == 0)
+   return -EIO;
+
+   if (HVSOCK_PKT_LEN(payload_len) != packet_len + PREV_INDICES_LEN)
+   return -EIO;
+
+   if (bufferlen < packet_len - HVSOCK_HEADER_LEN)
+   return -ENOBUFS;
+
+   /* Copy over the hvsock payload to the user buffer */
+   ret = hv_ringbuffer_read(&channel->inbound, buffer,
+packet_len - HVSOCK_HEADER_LEN,
+HVSOCK_HEADER_LEN, &signal);
+   if (ret != 0)
+   return ret;
+
+   *buffer_actual_len = payload_len;
+
+   if (signal)
+   vmbus_setevent(channel);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vmbus_recvpacket_hvsock);
+
+/*
+ * vmbus_get_hvsock_rw_status - can the ringbuffer be read/written?
+ */
+void vmbus_get_hvsock_rw_status(struct vmbus_channel *channel,
+   bool *can_read, bool *can_write)
+{
+   u32 avl_read_bytes, avl_write_bytes, dummy;
+
+   if (can_read != NULL) {
+   hv_get_ringbuffer_available_space(&channel->inbound,
+ &avl_read_bytes,
+ &dummy);
+   *can_read = avl_read_bytes >= HVSOCK_MIN_PKT_LEN;
+   }
+
+   /*
+* We write into the ringbuffer only when we're able to write a
+* payload of 4096 bytes (the actual written payload's length may be
+* less than

[PATCH V4 6/7] hvsock: introduce Hyper-V VM Sockets feature

Hyper-V VM sockets (hvsock) supplies a byte-stream based communication
mechanism between the host and a guest. It's kind of TCP over VMBus, but
the transportation layer (VMBus) is much simpler than IP. With Hyper-V VM
Sockets, applications between the host and a guest can talk with each
other directly by the traditional BSD-style socket APIs.

Hyper-V VM Sockets is only available on Windows 10 host and later. The
patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui de...@microsoft.com
---

Changes since v1:
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: default m -> default m if HYPERV
- MODULE_LICENSE: Dual MIT/GPL -> Dual BSD/GPL

Changes since v2:
- fixed indentation issues
- removed pr_debug

I know the kernel has already had a VM Sockets driver (AF_VSOCK) based
on VMware's VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing AF_VSOCK of virtio version:
http://thread.gmane.org/gmane.linux.network/365205.

However, though Hyper-V VM Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transportation layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is: <u32 ContextID, u32 Port>, but in
AF_HYPERV, the endpoint type is: <GUID VM_ID, GUID ServiceID>. Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus,
like
.notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.
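
To make difference 1 concrete, the two endpoint shapes can be sketched as plain C structs. These definitions are illustrative only: the demo_ type and field names are hypothetical and are not the real sockaddr layouts of either address family.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* AF_VSOCK-style endpoint: two 32-bit numbers. */
struct demo_vsock_addr {
	uint32_t cid;   /* context ID */
	uint32_t port;
};

/* 128-bit GUID stored as raw bytes. */
struct demo_guid {
	uint8_t b[16];
};

/* AF_HYPERV-style endpoint: two 128-bit GUIDs. */
struct demo_hv_addr {
	struct demo_guid vm_id;
	struct demo_guid service_id;
};

/* Two Hyper-V endpoints match only if both GUIDs match byte for byte. */
int demo_hv_addr_equal(const struct demo_hv_addr *a,
		       const struct demo_hv_addr *b)
{
	assert(a != NULL && b != NULL);
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

An 8-byte numeric pair and a 32-byte GUID pair cannot share one address format without a translation layer, which is the mismatch the list above starts from.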

 MAINTAINERS                 |    2 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   44 ++
 include/uapi/linux/hyperv.h |   16 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1430 +++
 9 files changed, 1510 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e7bdbac..a4a7e03 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4941,7 +4941,9 @@ F:drivers/input/serio/hyperv-keyboard.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 
 I2C OVER PARALLEL PORT
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 5bf59c8..d5ef612 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -200,7 +200,8 @@ struct ucred {
 #define AF_ALG 38  /* Algorithm sockets*/
 #define AF_NFC 39  /* NFC sockets  */
 #define AF_VSOCK   40  /* vSockets */
-#define AF_MAX 41  /* For now.. */
+#define AF_HYPERV  41  /* Hyper-V virtual sockets  */
+#define AF_MAX 42  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -246,6 +247,7 @@ struct ucred {
 #define PF_ALG AF_ALG
 #define PF_NFC AF_NFC
 #define PF_VSOCK   AF_VSOCK
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
new file mode 100644
index 000..9951658
--- /dev/null
+++ b/include/net/af_hvsock.h
@@ -0,0 +1,44 @@
+#ifndef __AF_HVSOCK_H__
+#define __AF_HVSOCK_H__
+
+#include <linux/kernel.h>
+#include <linux/hyperv.h>
+#include <net/sock.h>
+
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
+
+#define HVSOCK_RCV_BUF_SZ  VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
+#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
+
+#define sk_to_hvsock(__sk)((struct hvsock_sock *)(__sk))
+#define hvsock_to_sk(__hvsk)   ((struct sock *)(__hvsk))
+
+struct hvsock_sock {
+   /* sk must be the first member. */
+   struct sock sk;
+
+   struct sockaddr_hv local_addr;
+   struct sockaddr_hv remote_addr;
+
+   /* protected by the global hvsock_mutex */
+   struct list_head bound_list;
+   struct list_head connected_list;
+
+   struct list_head accept_queue;
+   /* used by enqueue and

[PATCH V4 7/7] Drivers: hv: vmbus: disable local interrupt when hvsock's callback is running

In the SMP guest case, when the per-channel callback hvsock_events() is
running on virtual CPU A, if the guest tries to close the connection on
virtual CPU B: we invoke vmbus_close() -> vmbus_close_internal(),
then we can have trouble: on B, vmbus_close_internal() will send IPI
reset_channel_cb() to A, trying to set channel->onchannel_callback to NULL;
on A, if the IPI handler happens between
if (channel->onchannel_callback != NULL) and invoking
channel->onchannel_callback, we'll invoke a function pointer of NULL.

This is why the patch is necessary.
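
The window described above is a check-then-call on a pointer that another CPU can clear in between. As a userspace illustration only, the sketch below shows a different way to close that particular window than the one this patch takes: reading the pointer once into a local and calling through the local. It uses C11 atomics, all demo_ names are hypothetical, and it does not address whether the callback may still run after the channel has been torn down, which the interrupt-disabling approach in the patch is meant to cover.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*demo_cb_t)(void *ctx);

struct demo_channel {
	_Atomic(demo_cb_t) onchannel_callback;  /* may be cleared concurrently */
	void *ctx;
};

/*
 * Read the callback exactly once. Checking the channel field and then
 * calling through the field again would leave a gap in which another
 * thread could set it to NULL.
 * Returns 1 if the callback ran, 0 if it had already been cleared.
 */
int demo_process_event(struct demo_channel *ch)
{
	demo_cb_t cb;

	assert(ch != NULL);
	cb = atomic_load(&ch->onchannel_callback);
	if (cb == NULL)
		return 0;
	cb(ch->ctx);
	return 1;
}

/* Close path: stop further invocations of the callback. */
void demo_reset_channel_cb(struct demo_channel *ch)
{
	atomic_store(&ch->onchannel_callback, (demo_cb_t)0);
}

/* Example callback: count how often it was invoked. */
void demo_count_cb(void *ctx)
{
	++*(int *)ctx;
}
```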

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 drivers/hv/connection.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 4fc2e88..4766fd8 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -319,6 +319,9 @@ static void process_chn_event(u32 relid)
void *arg;
bool read_state;
u32 bytes_to_read;
+   bool is_hvsock = false;
+
+   local_irq_disable();
 
/*
 * Find the channel based on this relid and invokes the
@@ -327,7 +330,11 @@ static void process_chn_event(u32 relid)
channel = pcpu_relid2channel(relid);
 
if (!channel)
-   return;
+   goto out;
+
+   is_hvsock = is_hvsock_channel(channel);
+   if (!is_hvsock)
+   local_irq_enable();
 
/*
 * A channel once created is persistent even when there
@@ -363,6 +370,12 @@ static void process_chn_event(u32 relid)
bytes_to_read = 0;
} while (read_state && (bytes_to_read != 0));
}
+
+   /* local_irq_enable() is already invoked above */
+   if (!is_hvsock)
+   return;
+out:
+   local_irq_enable();
 }
 
 /*
-- 
2.1.0


[PATCH V4 4/7] Drivers: hv: vmbus: add APIs to register callbacks to process hvsock connection

With the 2 APIs supplied by the VMBus driver, the coming net/hvsock driver
can register 2 callbacks and can know when a new hvsock connection is
offered by the host, and when a hvsock connection is being closed by the
host.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 drivers/hv/Makefile   |  4 ++-
 drivers/hv/channel_mgmt.c |  9 ++
 drivers/hv/hvsock_callbacks.c | 71 +++
 include/linux/hyperv.h| 10 ++
 4 files changed, 93 insertions(+), 1 deletion(-)
 create mode 100644 drivers/hv/hvsock_callbacks.c

diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index 39c9b2c..ef6f8a8 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -4,5 +4,7 @@ obj-$(CONFIG_HYPERV_BALLOON)+= hv_balloon.o
 
 hv_vmbus-y := vmbus_drv.o \
 hv.o connection.o channel.o \
-channel_mgmt.o ring_buffer.o
+channel_mgmt.o ring_buffer.o \
+hvsock_callbacks.o
+
 hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_fcopy.o hv_utils_transport.o
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 7018c53..a8b1e61 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -300,6 +300,12 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
return;
}
 
+   if (is_hvsock_channel(newchannel)) {
+   if (hvsock_process_offer(newchannel) != 0)
+   goto err_deq_chan;
+   return;
+   }
+
/*
 * Start the process of binding this offer to the driver
 * We need to set the DeviceObject field before calling
@@ -564,7 +570,10 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr)
vmbus_device_unregister(channel->device_obj);
put_device(dev);
}
+   } else if (is_hvsock_channel(channel)) {
+   hvsock_process_offer_rescind(channel);
} else {
+   /* it is a sub-channel. */
hv_process_channel_removal(channel,
channel->offermsg.child_relid);
}
diff --git a/drivers/hv/hvsock_callbacks.c b/drivers/hv/hvsock_callbacks.c
new file mode 100644
index 000..28f7b75
--- /dev/null
+++ b/drivers/hv/hvsock_callbacks.c
@@ -0,0 +1,71 @@
+/*
+ * Copyright (c) 2015, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/hyperv.h>
+
+/* We should hold the mutex when getting/setting the function pointers */
+static DEFINE_MUTEX(hvsock_cb_mutex);
+static int (*__process_offer)(struct vmbus_channel *channel);
+static void (*__process_offer_rescind)(struct vmbus_channel *channel);
+
+int hvsock_process_offer(struct vmbus_channel *channel)
+{
+   int ret = -ENODEV;
+
+   mutex_lock(&hvsock_cb_mutex);
+
+   if (__process_offer != NULL)
+       ret = __process_offer(channel);
+
+   mutex_unlock(&hvsock_cb_mutex);
+
+   return ret;
+}
+
+void hvsock_process_offer_rescind(struct vmbus_channel *channel)
+{
+   mutex_lock(&hvsock_cb_mutex);
+
+   if (__process_offer_rescind != NULL)
+       __process_offer_rescind(channel);
+   else
+       hv_process_channel_removal(channel,
+                                  channel->offermsg.child_relid);
+
+   mutex_unlock(&hvsock_cb_mutex);
+}
+
+void vmbus_register_hvsock_callbacks(
+   int (*process_offer)(struct vmbus_channel *),
+   void (*process_offer_rescind)(struct vmbus_channel *))
+{
+   mutex_lock(&hvsock_cb_mutex);
+
+   __process_offer = process_offer;
+   __process_offer_rescind = process_offer_rescind;
+
+   mutex_unlock(&hvsock_cb_mutex);
+}
+EXPORT_SYMBOL_GPL(vmbus_register_hvsock_callbacks);
+
+void vmbus_unregister_hvsock_callbacks(void)
+{
+   mutex_lock(&hvsock_cb_mutex);
+
+   __process_offer = NULL;
+   __process_offer_rescind = NULL;
+
+   mutex_unlock(&hvsock_cb_mutex);
+}
+EXPORT_SYMBOL_GPL(vmbus_unregister_hvsock_callbacks);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index c8e27da..fda9790 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1269,6 +1269,16 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
  const uuid_le *shv_host_servie_id);
+
+extern int hvsock_process_offer(struct vmbus_channel *channel);
+extern void hvsock_process_offer_rescind(struct vmbus_channel

Re: [PATCH net 1/2] r8152: add pre_reset and post_reset

On Tue, 2015-07-28 at 09:52 +, Hayes Wang wrote:
 Oliver Neukum [mailto:oneu...@suse.com]
  Sent: Tuesday, July 28, 2015 4:53 PM
 [...]
   + return 0;
   +
   + netdev = tp->netdev;
   + if (!netif_running(netdev))
   + return 0;
   +
   + ret = usb_autopm_get_interface(intf);
   + if (ret < 0)
   + return ret;
  
  What sense does this make?
  
 [...]
   + return 0;
   +
   + netdev = tp->netdev;
   + if (!netif_running(netdev))
   + return 0;
   +
   + ret = usb_autopm_get_interface(intf);
  
  The device will be awake.
 
 I wasn't sure whether the device would be in runtime suspend, so I woke it up
 myself.
 I think you mean I don't have to do this. I will remove them and resend the
 patch. Thanks.

Usbcore will resume the device.

HTH
Oliver

 A. 
/* Prevent autosuspend during the reset */
usb_autoresume_device(udev);

if (config) {
	for (i = 0; i < config->desc.bNumInterfaces; ++i) {
		struct usb_interface *cintf = config->interface[i];
		struct usb_driver *drv;
		int unbind = 0;

		if (cintf->dev.driver) {
			drv = to_usb_driver(cintf->dev.driver);
			if (drv->pre_reset && drv->post_reset)
				unbind = (drv->pre_reset)(cintf);





[PATCH net-next] net/mlx4_en: Hardware accelerated 802.1ad works only on the first port

2015-07-28 Thread Amir Vadai

Fix mistakenly used, hard coded, port number in get_phv_bit()

Fixes: 77fc29c ("net/mlx4_core: Preparations for 802.1ad VLAN support")
Signed-off-by: Amir Vadai am...@mellanox.com
---
Hi Dave,

Because of my mistake I've sent a version [1] without some internal review
fixes. This patch fixes the only code issue that was missing. The rest were only
improvements to the commit messages, which unfortunately are too late to fix
now.

[1] - http://www.spinics.net/lists/netdev/msg337148.html

Thanks,
Amir


 drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 5a1c3d2..e8ec1de 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -2815,7 +2815,7 @@ int get_phv_bit(struct mlx4_dev *dev, u8 port, int *phv)
struct mlx4_func_cap func_cap;
 
memset(&func_cap, 0, sizeof(func_cap));
-   err = mlx4_QUERY_FUNC_CAP(dev, 1, &func_cap);
+   err = mlx4_QUERY_FUNC_CAP(dev, port, &func_cap);
if (!err)
*phv = func_cap.flags & QUERY_FUNC_CAP_PHV_BIT;
return err;
-- 
2.4.3.413.ga5fe668
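
The effect of the hard-coded port can be shown with a small user-space model.
This is only a sketch: demo_dev, demo_query_func_cap() and DEMO_PHV_BIT are
hypothetical stand-ins, not the mlx4 structures or the real flag value.

```c
#include <assert.h>

#define DEMO_PHV_BIT 0x1u	/* hypothetical flag value */
#define DEMO_PORTS   2

/* Hypothetical device: one capability word per port, 1-based index. */
struct demo_dev {
	unsigned int port_flags[DEMO_PORTS + 1];
};

/* Stand-in for the per-port capability query. */
static int demo_query_func_cap(const struct demo_dev *dev, int port,
			       unsigned int *flags)
{
	if (port < 1 || port > DEMO_PORTS)
		return -1;
	*flags = dev->port_flags[port];
	return 0;
}

/* Before the fix: the port argument is ignored and port 1 is queried. */
static int get_phv_bit_buggy(const struct demo_dev *dev, int port, int *phv)
{
	unsigned int flags = 0;
	int err = demo_query_func_cap(dev, 1, &flags);

	(void)port;
	if (!err)
		*phv = (flags & DEMO_PHV_BIT) != 0;
	return err;
}

/* After the fix: the caller's port is passed through. */
static int get_phv_bit_fixed(const struct demo_dev *dev, int port, int *phv)
{
	unsigned int flags = 0;
	int err = demo_query_func_cap(dev, port, &flags);

	if (!err)
		*phv = (flags & DEMO_PHV_BIT) != 0;
	return err;
}
```

With the bit set only on port 1, the buggy variant reports it for port 2 as
well, while the fixed variant does not.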


[PATCH V4 1/7] Drivers: hv: vmbus: define the new offer type for Hyper-V socket (hvsock)

A helper function is also added.

Signed-off-by: Dexuan Cui de...@microsoft.com
---
 include/linux/hyperv.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 30d3a1f..2ca3ac1 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -236,6 +236,7 @@ struct vmbus_channel_offer {
 #define VMBUS_CHANNEL_LOOPBACK_OFFER   0x100
 #define VMBUS_CHANNEL_PARENT_OFFER 0x200
 #define VMBUS_CHANNEL_REQUEST_MONITORED_NOTIFICATION   0x400
+#define VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER 0x2000
 
 struct vmpacket_descriptor {
u16 type;
@@ -758,6 +759,12 @@ struct vmbus_channel {
struct list_head percpu_list;
 };
 
+static inline bool is_hvsock_channel(const struct vmbus_channel *c)
+{
+   return !!(c->offermsg.offer.chn_flags &
+ VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
+}
+
 static inline void set_channel_read_state(struct vmbus_channel *c, bool state)
 {
c->batched_reading = state;
-- 
2.1.0
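
For readers outside the kernel tree, the flag test done by the helper can be
tried in user space. This is a minimal sketch: struct demo_offer is a
hypothetical reduction that keeps only a 16-bit flags field, and only the two
flag values quoted in the patch are reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMBUS_CHANNEL_PARENT_OFFER		0x200
#define VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER	0x2000

/* Hypothetical reduced offer: only the channel flags are modelled. */
struct demo_offer {
	uint16_t chn_flags;
};

/* True when the hvsock offer bit is set, whatever other bits are present. */
static bool demo_is_hvsock(const struct demo_offer *o)
{
	return !!(o->chn_flags & VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER);
}
```

The `!!` normalises the masked value to 0 or 1. With a bool return type the
conversion would happen anyway, so it is a readability choice here.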


Re: [PATCH iproute2 net-next] bridge: mdb: add support for router add/del notifications monitoring

On 07/27/2015 11:49 PM, Nikolay Aleksandrov wrote:
 
 On 27 Jul 2015, at 23:40, Stephen Hemminger step...@networkplumber.org 
 wrote:

 On Mon, 27 Jul 2015 13:44:05 +0200
 Nikolay Aleksandrov ra...@blackwall.org wrote:

 From: Nikolay Aleksandrov niko...@cumulusnetworks.com

 This patch adds support for ADDMDB/DELMDB notifications about router ports
 which have been added or deleted/expired respectively.

 Example output:
 $ bridge -s monitor mdb
 Deleted router port dev eth3 master br0
 router port dev eth3 master br0

 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com

 Looks useful, applied.

 Does usage or manual page need to be updated as well?


 
 Good question :-) I'll look into it.
 Thanks!
 

I've looked into it and we don't need any documentation/man changes:
the mdb monitoring command is the same and the description doesn't specify
what exactly is being returned, so we're good.

Cheers,
 Nik

Re: [PATCH] packet: tpacket_snd(): fix signed/unsigned comparison

On Tue, 2015-07-28 at 13:07 +0200, Daniel Borkmann wrote:
 On 07/28/2015 12:57 PM, Alexander Drozdov wrote:
  tpacket_fill_skb() can return a negative value (-errno) which
  is stored in tp_len variable. In that case the following
  condition will be (but shouldn't be) true:
 
  tp_len > dev->mtu + dev->hard_header_len
 
  as dev->mtu and dev->hard_header_len are both unsigned.
 
  That may lead to just returning an incorrect EMSGSIZE errno
  to the user.
 
  Signed-off-by: Alexander Drozdov al.droz...@gmail.com
 
 Looks good to me, thanks!
 
 Acked-by: Daniel Borkmann dan...@iogearbox.net
 --

Fixes: 52f1454f629fa ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")



[PATCH net] bridge: mdb: fix delmdb state in the notification

From: Nikolay Aleksandrov niko...@cumulusnetworks.com

Since mdb states were introduced, when deleting an entry the state was
left as it was set in the delete request from the user. This leads to
the following output when doing a monitor (for example):
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 temp
^^^
Note the "temp" state in the delete notification, which is wrong since
the entry was permanent: the state in a delete is always reported as
"temp" regardless of the real state of the entry.

After this patch:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent

There's one important note to make here: the state is actually not
matched when doing a delete, so one can delete a permanent entry by
stating "temp" at the end of the command. I've chosen this fix in order
not to break user-space tools which rely on this (incorrect) behaviour.

So to give an example after this patch and using the wrong state:
$ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent
$ bridge mdb del dev br0 port eth3 grp 239.0.0.1 temp
(monitor) dev br0 port eth3 grp 239.0.0.1 permanent

Note the state of the entry that got deleted is correct in the
notification.

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: ccb1c31a7a87 ("bridge: add flags to distinguish permanent mdb entires")
---
I propose to fix the state matching in net-next but we may risk breaking
some user-space tools which rely on this behaviour.

 net/bridge/br_mdb.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 1198a3dbad95..c94321955db7 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -445,6 +445,7 @@ static int __br_mdb_del(struct net_bridge *br, struct br_mdb_entry *entry)
if (p->port->state == BR_STATE_DISABLED)
goto unlock;
 
+   entry->state = p->state;
rcu_assign_pointer(*pp, p->next);
hlist_del_init(&p->mglist);
del_timer(&p->timer);
-- 
2.4.3
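
The one-line change can be pictured with a tiny user-space model. It is only a
sketch: the demo_* names are hypothetical, and the two state values are
assumed to mirror the temporary/permanent distinction discussed above.

```c
#include <assert.h>

enum { DEMO_MDB_TEMPORARY = 0, DEMO_MDB_PERMANENT = 1 };

struct demo_request { int state; };	/* state named in the delete request */
struct demo_group   { int state; };	/* state stored with the entry */

/*
 * Model of the delete path: copy the stored state into the request so
 * that the notification built from the request reports the real state.
 * Returns the state the notification would carry.
 */
static int demo_mdb_del(struct demo_request *entry, const struct demo_group *p)
{
	entry->state = p->state;	/* the one-line fix */
	/* ... unlinking of p would happen here ... */
	return entry->state;
}
```

Deleting a permanent entry with a request that says "temp" then yields a
notification carrying the permanent state, as in the last example above.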


Re: [PATCH net v2 2/2] r8152: reset device when tx timeout

On Tue, 2015-07-28 at 20:08 +0800, Hayes Wang wrote:
  static void rtl8152_tx_timeout(struct net_device *netdev)
  {
 struct r8152 *tp = netdev_priv(netdev);
 -   int i;
  
 netif_warn(tp, tx_err, netdev, "Tx timeout\n");
 -   for (i = 0; i < RTL8152_MAX_TX; i++)
 -   usb_unlink_urb(tp->tx_info[i].urb);
 +
 +   usb_queue_reset_device(tp->intf);
 +   cancel_delayed_work(&tp->schedule);

Sorry to bother you again, but this looks wrong.
You want to cancel first. There is no point in
running any work before the reset is done. It will
undo any progress anyway.

Regards
Oliver



RE: [PATCH net v2 2/2] r8152: reset device when tx timeout

Oliver Neukum [mailto:oneu...@suse.com]
 Sent: Tuesday, July 28, 2015 8:14 PM
[...]
   static void rtl8152_tx_timeout(struct net_device *netdev)  {
  struct r8152 *tp = netdev_priv(netdev);
  -   int i;
 
  netif_warn(tp, tx_err, netdev, "Tx timeout\n");
  -   for (i = 0; i < RTL8152_MAX_TX; i++)
  -   usb_unlink_urb(tp->tx_info[i].urb);
  +
  +   usb_queue_reset_device(tp->intf);
  +   cancel_delayed_work(&tp->schedule);
 
 Sorry to bother you again, but this looks wrong.
 You want to cancel first. There is no point in running any work before the 
 reset is
 done. It will undo any progress anyway.

Excuse me. Do you mean I don't need to cancel the other work because it wouldn't
be run before the reset is finished?

Best Regards,
Hayes

[PATCH net] bridge: mcast: give fast leave precedence over multicast router and querier

From: Satish Ashok sas...@cumulusnetworks.com

When fast leave is configured on a bridge port and an IGMP leave is
received for a group, the group is not deleted immediately if there is
a router detected or if multicast querier is configured.
Ideally the group should be deleted immediately when fast leave is
configured.

Signed-off-by: Satish Ashok sas...@cumulusnetworks.com
---
 net/bridge/br_multicast.c | 50 ---
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 79db489cdade..0b39dcc65b94 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1416,8 +1416,7 @@ br_multicast_leave_group(struct net_bridge *br,
 
spin_lock(&br->multicast_lock);
if (!netif_running(br->dev) ||
-   (port && port->state == BR_STATE_DISABLED) ||
-   timer_pending(&other_query->timer))
+   (port && port->state == BR_STATE_DISABLED))
goto out;
 
mdb = mlock_dereference(br->mdb, br);
@@ -1425,6 +1424,31 @@ br_multicast_leave_group(struct net_bridge *br,
if (!mp)
goto out;
 
+   if (port && (port->flags & BR_MULTICAST_FAST_LEAVE)) {
+       struct net_bridge_port_group __rcu **pp;
+
+       for (pp = &mp->ports;
+            (p = mlock_dereference(*pp, br)) != NULL;
+            pp = &p->next) {
+           if (p->port != port)
+               continue;
+
+           rcu_assign_pointer(*pp, p->next);
+           hlist_del_init(&p->mglist);
+           del_timer(&p->timer);
+           call_rcu_bh(&p->rcu, br_multicast_free_pg);
+           br_mdb_notify(br->dev, port, group, RTM_DELMDB);
+
+           if (!mp->ports && !mp->mglist &&
+               netif_running(br->dev))
+               mod_timer(&mp->timer, jiffies);
+       }
+       goto out;
+   }
+
+   if (timer_pending(&other_query->timer))
+       goto out;
+
if (br->multicast_querier) {
__br_multicast_send_query(br, port, &mp->addr);
 
@@ -1450,28 +1474,6 @@ br_multicast_leave_group(struct net_bridge *br,
}
}
 
-   if (port && (port->flags & BR_MULTICAST_FAST_LEAVE)) {
-       struct net_bridge_port_group __rcu **pp;
-
-       for (pp = &mp->ports;
-            (p = mlock_dereference(*pp, br)) != NULL;
-            pp = &p->next) {
-           if (p->port != port)
-               continue;
-
-           rcu_assign_pointer(*pp, p->next);
-           hlist_del_init(&p->mglist);
-           del_timer(&p->timer);
-           call_rcu_bh(&p->rcu, br_multicast_free_pg);
-           br_mdb_notify(br->dev, port, group, RTM_DELMDB);
-
-           if (!mp->ports && !mp->mglist &&
-               netif_running(br->dev))
-               mod_timer(&mp->timer, jiffies);
-       }
-       goto out;
-   }
-
now = jiffies;
time = now + br->multicast_last_member_count *
 br->multicast_last_member_interval;
-- 
2.4.3
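
The fast-leave block walks the port list through a pointer to the link being
examined, so an entry can be unlinked without tracking a "previous" node. A
user-space sketch of that idiom follows. It is not the bridge code: there is
no RCU or locking, struct demo_pg is a hypothetical reduction, and the loop is
written so that the cursor stays in place after a removal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced port group: a port id and the next link. */
struct demo_pg {
	int port;
	struct demo_pg *next;
};

/* Unlink every entry that belongs to 'port'; return how many were removed. */
static int demo_remove_port(struct demo_pg **head, int port)
{
	struct demo_pg **pp = head;	/* address of the link under inspection */
	struct demo_pg *p;
	int removed = 0;

	while ((p = *pp) != NULL) {
		if (p->port != port) {
			pp = &p->next;	/* keep p, advance to its link */
			continue;
		}
		*pp = p->next;		/* bypass p; pp itself does not move */
		p->next = NULL;
		removed++;
	}
	return removed;
}
```

Because pp always points at the link that leads to the current entry, removing
the head and removing an inner entry are the same operation.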


[PATCH] packet: tpacket_snd(): fix signed/unsigned comparison

2015-07-28 Thread Alexander Drozdov

tpacket_fill_skb() can return a negative value (-errno) which
is stored in tp_len variable. In that case the following
condition will be (but shouldn't be) true:

tp_len > dev->mtu + dev->hard_header_len

as dev->mtu and dev->hard_header_len are both unsigned.

That may lead to just returning an incorrect EMSGSIZE errno
to the user.

Signed-off-by: Alexander Drozdov al.droz...@gmail.com
---
 net/packet/af_packet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9e8741..d1d3625 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2403,7 +2403,8 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
}
tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
  addr, hlen);
-   if (tp_len > dev->mtu + dev->hard_header_len) {
+   if (likely(tp_len >= 0) &&
+   tp_len > dev->mtu + dev->hard_header_len) {
struct ethhdr *ehdr;
/* Earlier code assumed this would be a VLAN pkt,
 * double-check this now that we have the actual
-- 
1.9.1
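
The comparison problem described in the commit message can be reproduced in
user space. The sketch below is not the kernel code: the function names are
made up, the parameter types are assumed (unsigned int for the MTU, unsigned
short for the header length), and the implicit conversion is written as an
explicit cast so it is visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Old check: a negative tp_len is converted to a huge unsigned value
 * before the comparison, so an error code looks like an oversized packet.
 */
static bool demo_too_long_buggy(int tp_len, unsigned int mtu,
				unsigned short hard_header_len)
{
	return (unsigned int)tp_len > mtu + hard_header_len;
}

/* New check: negative values (errors) are excluded first. */
static bool demo_too_long_fixed(int tp_len, unsigned int mtu,
				unsigned short hard_header_len)
{
	return tp_len >= 0 &&
	       (unsigned int)tp_len > mtu + hard_header_len;
}
```

With an MTU of 1500 and a 14-byte header, passing -EINVAL makes the old check
fire, which is how an unrelated error could be reported as EMSGSIZE.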


Re: [PATCH] packet: tpacket_snd(): fix signed/unsigned comparison

2015-07-28 Thread Daniel Borkmann


On 07/28/2015 12:57 PM, Alexander Drozdov wrote:

tpacket_fill_skb() can return a negative value (-errno) which
is stored in tp_len variable. In that case the following
condition will be (but shouldn't be) true:

tp_len > dev->mtu + dev->hard_header_len

as dev->mtu and dev->hard_header_len are both unsigned.

That may lead to just returning an incorrect EMSGSIZE errno
to the user.

Signed-off-by: Alexander Drozdov al.droz...@gmail.com


Looks good to me, thanks!

Acked-by: Daniel Borkmann dan...@iogearbox.net

[PATCH net v2 1/2] r8152: add pre_reset and post_reset

Add rtl8152_pre_reset() and rtl8152_post_reset() which are used when
calling usb_reset_device(). The two functions could reduce the time
of reset when calling usb_reset_device() after probe().

Signed-off-by: Hayes Wang hayesw...@realtek.com
---
 drivers/net/usb/r8152.c | 54 +
 1 file changed, 54 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 144dc64..e1b6d6d 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3342,6 +3342,58 @@ static void r8153_init(struct r8152 *tp)
r8153_u2p3en(tp, true);
 }
 
+static int rtl8152_pre_reset(struct usb_interface *intf)
+{
+   struct r8152 *tp = usb_get_intfdata(intf);
+   struct net_device *netdev;
+
+   if (!tp)
+       return 0;
+
+   netdev = tp->netdev;
+   if (!netif_running(netdev))
+       return 0;
+
+   napi_disable(&tp->napi);
+   clear_bit(WORK_ENABLE, &tp->flags);
+   usb_kill_urb(tp->intr_urb);
+   cancel_delayed_work_sync(&tp->schedule);
+   if (netif_carrier_ok(netdev)) {
+       netif_stop_queue(netdev);
+       mutex_lock(&tp->control);
+       tp->rtl_ops.disable(tp);
+       mutex_unlock(&tp->control);
+   }
+
+   return 0;
+}
+
+static int rtl8152_post_reset(struct usb_interface *intf)
+{
+   struct r8152 *tp = usb_get_intfdata(intf);
+   struct net_device *netdev;
+
+   if (!tp)
+       return 0;
+
+   netdev = tp->netdev;
+   if (!netif_running(netdev))
+       return 0;
+
+   set_bit(WORK_ENABLE, &tp->flags);
+   if (netif_carrier_ok(netdev)) {
+       mutex_lock(&tp->control);
+       tp->rtl_ops.enable(tp);
+       rtl8152_set_rx_mode(netdev);
+       mutex_unlock(&tp->control);
+       netif_wake_queue(netdev);
+   }
+
+   napi_enable(&tp->napi);
+
+   return 0;
+}
+
 static int rtl8152_suspend(struct usb_interface *intf, pm_message_t message)
 {
struct r8152 *tp = usb_get_intfdata(intf);
@@ -4164,6 +4216,8 @@ static struct usb_driver rtl8152_driver = {
.suspend =  rtl8152_suspend,
.resume =   rtl8152_resume,
.reset_resume = rtl8152_resume,
+   .pre_reset =rtl8152_pre_reset,
+   .post_reset =   rtl8152_post_reset,
.supports_autosuspend = 1,
.disable_hub_initiated_lpm = 1,
 };
-- 
2.4.2


[PATCH net v2 2/2] r8152: reset device when tx timeout

The device reset is necessary if the hw becomes abnormal and stops
transmitting packets.

Signed-off-by: Hayes Wang hayesw...@realtek.com
---
 drivers/net/usb/r8152.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index e1b6d6d..6af299f 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -27,7 +27,7 @@
 #include <linux/usb/cdc.h>
 
 /* Version Information */
-#define DRIVER_VERSION "v1.08.0 (2015/01/13)"
+#define DRIVER_VERSION "v1.08.1 (2015/07/28)"
 #define DRIVER_AUTHOR "Realtek linux nic maintainers <nic_s...@realtek.com>"
 #define DRIVER_DESC "Realtek RTL8152/RTL8153 Based USB Ethernet Adapters"
 #define MODULENAME "r8152"
@@ -1902,11 +1902,11 @@ static void rtl_drop_queued_tx(struct r8152 *tp)
 static void rtl8152_tx_timeout(struct net_device *netdev)
 {
struct r8152 *tp = netdev_priv(netdev);
-   int i;
 
netif_warn(tp, tx_err, netdev, "Tx timeout\n");
-   for (i = 0; i < RTL8152_MAX_TX; i++)
-   usb_unlink_urb(tp->tx_info[i].urb);
+
+   usb_queue_reset_device(tp->intf);
+   cancel_delayed_work(&tp->schedule);
 }
 
 static void rtl8152_set_rx_mode(struct net_device *netdev)
-- 
2.4.2


[PATCH net v2 0/2] r8152: device reset

v2:
For patch #1, remove usb_autopm_get_interface(), usb_autopm_put_interface(), and
the checking of intf->condition.

For patch #2, replace the original method with usb_queue_reset_device() to reset
the device. 

v1:
Although the driver works normally, we find the device may get all 0xff data
when transmitting packets on certain platforms. It would break the device and no
packet could be transmitted. The reset is necessary to recover the hw for this
situation.

Hayes Wang (2):
  r8152: add pre_reset and post_reset
  r8152: reset device when tx timeout

 drivers/net/usb/r8152.c | 90 ++---
 1 file changed, 86 insertions(+), 4 deletions(-)

-- 
2.4.2


Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct

Hello Eric,

On Mon, 2015-07-27 at 15:33 -0500, Eric W. Biederman wrote:
 David Ahern d...@cumulusnetworks.com writes:
 
  Allow tasks to have a default device index for binding sockets. If set
  the value is passed to all AF_INET/AF_INET6 sockets when they are
  created.
  
  The task setting is passed parent to child on fork, but can be set or
  changed after task creation using prctl (if task has CAP_NET_ADMIN
  permissions). The setting for a socket can be retrieved using prctl().
  This option allows an administrator to restrict a task to only send/receive
  packets through the specified device. In the case of VRF devices this
  option restricts tasks to a specific VRF.
  
  Correlation of the device index to a specific VRF, ie.,
     ifindex --> VRF device --> VRF id
  is left to userspace.
 
 Nacked-by: Eric W. Biederman ebied...@xmission.com
 
 Because it is broken by design.  Your routing device is only safe for
 programs that know its limitations; it is not appropriate for general
 applications.
 
 Since you don't even seem to know its limitations I think this is a
 bad path to walk down.

Can you please elaborate on the "broken by design"?

Different operating systems are already using this approach with good
success. I read your other mail regarding isolation of different VRFs
and I agree that all code which persists state depending solely on the
IP address is affected by this and this must be dealt with and fixed
(actually, there aren't too many).

But I wouldn't call that broken by design. This stuff will get fixed
like e.g. cross-talk between fragmentation queues, icmp rate limiters
etc, which could already happen in the past.

What is your opinion on the fundamental approach only from a user
perspective? Do you think that is broken, too?

Thanks,
Hannes


[PATCH net-next] packet: remove handling of tx_ring from prb_shutdown_retire_blk_timer()

2015-07-28 Thread Tobias Klauser

Follow e8e85cc5eb57 ("packet: remove handling of tx_ring") and remove
the tx_ring parameter from prb_shutdown_retire_blk_timer() as it is only
called with tx_ring = 0.

Signed-off-by: Tobias Klauser tklau...@distanz.ch
---
 net/packet/af_packet.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9e8741..2af8590 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -518,13 +518,11 @@ static void prb_del_retire_blk_timer(struct tpacket_kbdq_core *pkc)
 }
 
 static void prb_shutdown_retire_blk_timer(struct packet_sock *po,
-   int tx_ring,
struct sk_buff_head *rb_queue)
 {
struct tpacket_kbdq_core *pkc;
 
-   pkc = tx_ring ? GET_PBDQC_FROM_RB(&po->tx_ring) :
-   GET_PBDQC_FROM_RB(&po->rx_ring);
+   pkc = GET_PBDQC_FROM_RB(&po->rx_ring);
 
spin_lock_bh(&rb_queue->lock);
pkc->delete_blk_timer = 1;
@@ -4044,7 +4042,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
if (closing && (po->tp_version > TPACKET_V2)) {
/* Because we don't support block-based V3 on tx-ring */
if (!tx_ring)
-   prb_shutdown_retire_blk_timer(po, tx_ring, rb_queue);
+   prb_shutdown_retire_blk_timer(po, rb_queue);
}
release_sock(sk);
 
-- 
2.2.2



Re: [PATCH iproute2 net-next] bridge: mdb: add support for vlans

On 07/15/2015 05:45 PM, Nikolay Aleksandrov wrote:
 This patch allows the user to specify the vlan of the mdb group being
 added or deleted and adds support for displaying the vlan when
 dumping mdb information or monitoring it. It also updates the man page
 to reflect the new vid argument for mdb.
 
 Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
 ---
 note: the cast in print_mdb_entry() was necessary to shut the compiler
 
  bridge/mdb.c  | 31 +++
  include/linux/if_bridge.h |  1 +
  man/man8/bridge.8 |  8 +++-
  3 files changed, 27 insertions(+), 13 deletions(-)
 

Hi Stephen,
Just wondering what's the state of this patch because I'd like to submit some
improvements in the same area and I'm wondering if I should do them on top
of this patch or if I need to change something in it ?

Thanks,
 Nik


Re: [PATCH net v2 2/2] r8152: reset device when tx timeout

On Tue, 2015-07-28 at 12:31 +, Hayes Wang wrote:
 Oliver Neukum [mailto:oneu...@suse.com]
  Sent: Tuesday, July 28, 2015 8:14 PM
 [...]
static void rtl8152_tx_timeout(struct net_device *netdev)  {
   struct r8152 *tp = netdev_priv(netdev);
   -   int i;
  
   netif_warn(tp, tx_err, netdev, "Tx timeout\n");
   -   for (i = 0; i < RTL8152_MAX_TX; i++)
   -   usb_unlink_urb(tp->tx_info[i].urb);
   +
   +   usb_queue_reset_device(tp->intf);
   +   cancel_delayed_work(&tp->schedule);
  
  Sorry to bother you again, but this looks wrong.
  You want to cancel first. There is no point in running any work before the 
  reset is
  done. It will undo any progress anyway.
 
 Excuse me. Do you mean I don't need to cancel the other work because it
 wouldn't be run before the reset is finished?

No, whatever the other work will do, the reset will undo.

Regards
Oliver



Re: [PATCH net-next v4] af_mpls: fix undefined reference to ip6_route_output

On Mon, 2015-07-27 at 23:40 -0700, Roopa Prabhu wrote:
 From: Roopa Prabhu ro...@cumulusnetworks.com
 
 Undefined reference to ip6_route_output and ip_route_output
 was reported with CONFIG_INET=n and CONFIG_IPV6=n.
 
 This patch adds new CONFIG_MPLS_NEXTHOP_DEVLOOKUP
 to lookup nexthop device if user has not specified it
 in RTA_OIF attribute. Make CONFIG_MPLS_NEXTHOP_DEVLOOKUP
 depend on INET and (IPV6 || IPV6=n) because it
 uses ip6_route_output and ip_route_output.
 
 Reported-by: kbuild test robot fengguang...@intel.com
 Reported-by: Thomas Graf tg...@suug.ch
 Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
 ---
 
 v1 -> v2: use IS_BUILTIN
 
 v2 -> v3: Use new Kconfig option that depends on (IPV6 || IPV6=n) as
 suggested by Dave. Also uses IS_ERR as suggested by Thomas.
 
 v3 -> v4: Include missed case of (MPLS_ROUTING=y && IPV6=m) reported by
  Dave.
 
  net/mpls/Kconfig   |8 
  net/mpls/af_mpls.c |   19 ++-
  2 files changed, 26 insertions(+), 1 deletion(-)
 
 diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
 index 5c467ef..134764e 100644
 --- a/net/mpls/Kconfig
 +++ b/net/mpls/Kconfig
 @@ -33,4 +33,12 @@ config MPLS_IPTUNNEL
   ---help---
mpls ip tunnel support.
  
 +config MPLS_NEXTHOP_DEVLOOKUP
 + bool "MPLS: nexthop oif dev lookup"
 + depends on MPLS_ROUTING && INET && \
 + ((IPV6 && !(MPLS_ROUTING=y && IPV6=m)) || IPV6=n)
 + ---help---
 +  This enables mpls route nexthop dev lookup when oif is not
 +  specified by user
 +

Urks.

Can't you simply use ipv6_stub_impl.ipv6_dst_lookup with sk=NULL to do
that and don't have a run-time dependency on IPv6 at all (for the cost
of a function pointer). Maybe same for IPv4?

If builtin you can inline those calls anyway.

Bye,
Hannes



Re: [net-next PATCH 2/2] drivers: net: cpsw: add separate napi for tx packet handling for performance improvment

2015-07-28 Thread Mugunthan V N

On Wednesday 29 July 2015 04:00 AM, Francois Romieu wrote:
 Mugunthan V N mugunthan...@ti.com :
 On Tuesday 28 July 2015 02:52 AM, Francois Romieu wrote:
 Mugunthan V N mugunthan...@ti.com :
 [...]
 @@ -752,13 +753,22 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id)
struct cpsw_priv *priv = dev_id;
  
cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_TX);
 -  cpdma_chan_process(priv->txch, 128);
 +  writel(0, &priv->wr_regs->tx_en);
 +
 +  if (netif_running(priv->ndev)) {
 +  napi_schedule(&priv->napi_tx);
 +  return IRQ_HANDLED;
 +  }


 cpsw_ndo_stop calls napi_disable: you can remove netif_running.


 This netif_running check is to find which interface is up as the
 interrupt is shared by both the interfaces. When first interface is down
 and second interface is active then napi_schedule for first interface
 will fail and second interface napi needs to be scheduled.

 So I don't think netif_running needs to be removed.
 
 Each interface has its own napi tx (resp. rx) context: I would have expected
 two unconditional napi_schedule per tx (resp. rx) shared irq, not one.
 
 I'll read it again after some sleep.
 

For each interrupt only one napi is scheduled: when the first
interface is down, only the second interface's napi is scheduled, in
both the tx and rx irqs.
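
As a rough user-space model of that rule (not the driver code; `model_if`, `NUM_SLAVES` and `shared_irq_schedule` are hypothetical names made up for the sketch), the shared handler walks the interfaces in order and schedules the poll context of the first one that is up:

```c
#include <stdbool.h>

#define NUM_SLAVES 2

/* Hypothetical per-interface state. */
struct model_if {
	bool running;		/* models netif_running() */
	int napi_scheduled;	/* counts how often the poll was scheduled */
};

/*
 * Models the shared interrupt handler: exactly one poll context is
 * scheduled per interrupt, the one of the first interface that is up.
 * Returns the index of that interface, or -1 if none is running.
 */
static int shared_irq_schedule(struct model_if ifs[NUM_SLAVES])
{
	for (int i = 0; i < NUM_SLAVES; i++) {
		if (ifs[i].running) {
			ifs[i].napi_scheduled++;
			return i;
		}
	}
	return -1;
}
```

With both interfaces up the first one takes the interrupt; once it is brought down, the same interrupt schedules the second interface instead.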

Regards
Mugunthan V N

Re: [PATCH net-next 5/5] s390/bpf: recache skb->data/hlen for skb_vlan_push/pop

2015-07-28 Thread Alexei Starovoitov


On 7/28/15 7:10 AM, Michael Holzheu wrote:

Allow eBPF programs attached to TC qdiscs to call skb_vlan_push/pop
via helper functions. These functions may change skb->data/hlen.
This data is cached by the s390 JIT to improve performance of ld_abs/ld_ind
instructions. Therefore after a change we have to reload the data.

In case of usage of skb_vlan_push/pop, in the prologue we store
the SKB pointer on the stack and restore it after BPF_JMP_CALL
to skb_vlan_push/pop.

Signed-off-by: Michael Holzheu holz...@linux.vnet.ibm.com


Thanks!
Acked-by: Alexei Starovoitov a...@plumgrid.com


Re: [PATCH net] ebpf, x86: fix general protection fault when tail call is invoked

2015-07-28 Thread Alexei Starovoitov


On 7/28/15 6:26 AM, Daniel Borkmann wrote:

After patch, disassembly:

   [...]
   9e:   lea    0x80(%rsi,%rdx,8),%rax   <--- CONFIG_LOCKDEP/CONFIG_LOCK_STAT
         48 8d 84 d6 80 00 00 00
   a6:   mov    (%rax),%rax
         48 8b 00
   [...]

   [...]
   9e:   lea    0x50(%rsi,%rdx,8),%rax   <--- No CONFIG_LOCKDEP
         48 8d 84 d6 50 00 00 00
   a6:   mov    (%rax),%rax
         48 8b 00
   [...]

Fixes: b52f00e6a715 ("x86: bpf_jit: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann dan...@iogearbox.net


Thanks for fixing it.
Most of my development is actually with LOCKDEP on, but I never
turn LOCK_STAT on, so I sadly missed this 48-byte increase of an 80-byte
structure :(
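
The failure mode can be illustrated with a small user-space sketch. The struct layouts below are invented so that the offsets come out as the 0x50 and 0x80 seen in the disassembly; they are not the kernel's real definitions, and all names are hypothetical. The point is only that a hard-coded offset silently goes stale when a config option grows an embedded member, while offsetof() follows the layout.

```c
#include <stddef.h>

/* Hypothetical lock without debug fields. */
struct lock_plain {
	unsigned char raw[8];
};

/* Same lock with 48 bytes of (invented) debug state appended. */
struct lock_debug {
	unsigned char raw[8];
	unsigned char debug_state[48];
};

/* A map header followed by a pointer array, in both configurations. */
struct map_plain {
	struct lock_plain lock;
	unsigned char hdr[72];
	void *slots[4];
};

struct map_debug {
	struct lock_debug lock;
	unsigned char hdr[72];
	void *slots[4];
};

/* What a code generator would emit if it assumed one fixed layout. */
#define SLOT_OFFSET_HARDCODED 0x50

/* What it should emit: the offset taken from the actual structure. */
static size_t slot_offset_plain(void)
{
	return offsetof(struct map_plain, slots);
}

static size_t slot_offset_debug(void)
{
	return offsetof(struct map_debug, slots);
}
```

In this model the hard-coded 0x50 only matches the plain layout; with the debug member present the array starts 48 bytes later, so a load through the fixed offset would read header bytes instead of a slot pointer.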

Acked-by: Alexei Starovoitov a...@plumgrid.com


Re: [Xen-devel] [PATCH 4/8] xen: Use the correctly the Xen memory terminologies

2015-07-28 Thread David Vrabel

On 28/07/15 16:02, Julien Grall wrote:
 Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
 is meant. I suspect this is because the first support for Xen was for
 PV. This brought some misimplementation of helpers on ARM and left
 developers confused about the expected behavior.

For the benefit of other subsystem maintainers, this is a purely
mechanical change in Xen-specific terminology.  It doesn't need reviews
or acks from non-Xen people (IMO).

 For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
 However, if we look at the implementation on x86, it's returning a GFN.
 
 For clarity and to avoid new confusion, replace any reference to mfn with
 gfn in any helpers used by PV drivers.
 
 Also take the opportunity to simplify simple constructions such
 as pfn_to_mfn(page_to_pfn(page)) into page_to_gfn. More complex clean up
 will come in follow-up patches.
 
 I think it may be possible to do further clean up in the x86 code to
 ensure that helpers returning machine address (such as virt_address) is
 not used by no auto-translated guests. I will let x86 xen expert doing
 it.

Reviewed-by: David Vrabel david.vra...@citrix.com

It looks a bit odd to use GFN in some of the PV code where the
hypervisor API uses MFN but overall I think using the correct
terminology where possible is best.  But I'd like to have Boris's or
Konrad's opinion on this.

David

Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct

On Tue, Jul 28, 2015 at 9:11 AM, David Ahern d...@cumulusnetworks.com wrote:
 On 7/28/15 9:25 AM, Andy Lutomirski wrote:

 On Jul 27, 2015 11:33 AM, David Ahern d...@cumulusnetworks.com wrote:


 Allow tasks to have a default device index for binding sockets. If set
 the value is passed to all AF_INET/AF_INET6 sockets when they are
 created.


 This is not intended to be a review of the concept.  I haven't thought
 about whether the concept is a good idea, broken by design, or
 whatever.  FWIW, if this were added to the kernel and didn't require
 excessive privilege, I'd probably use it.  (I still don't really
 understand why binding to a device requires privilege in the first
 place, but, again, I haven't thought about it very much.)


 The intent here is to restrict a task to only sending and receiving packets
 from a single network device. The device can be a single Ethernet interface, a
 stacked device (e.g., bond) or in our case a VRF device which restricts a
 task to interfaces (and hence network paths) associated with the VRF.

We are also intending to implement similar functionality for ILA to
restrict tasks (probably from a cgroup) to binding to their assigned
addresses. This seems most easily accomplished by adding a binding
interface which is only checked at bind time. After binding, a
connection should be processed no differently than any other;
additional plumbing in the data path for network namespaces just
seems like overhead.

Tom


 +#ifdef CONFIG_NET
 +   case PR_SET_SK_BIND_DEV_IF:
 +   {
 +   struct net_device *dev;
 +   int idx = (int) arg2;
 +
 +   if (!capable(CAP_NET_ADMIN))
 +   return -EPERM;
 +


 Can you either use ns_capable or add a comment as to why not?


 will do.


 Also, please return -EINVAL if unused args are nonzero.


 ok.


 +   if (idx) {
 +           dev = dev_get_by_index(me->nsproxy->net_ns, idx);
 +           if (!dev)
 +                   return -EINVAL;
 +           dev_put(dev);
 +   }
 +   me->sk_bind_dev_if = idx;
 +   break;
 +   }
 +   case PR_GET_SK_BIND_DEV_IF:
 +   {
 +   struct task_struct *tsk;
 +   int sk_bind_dev_if = -EINVAL;
 +
 +   rcu_read_lock();
 +   tsk = find_task_by_vpid(arg2);
 +   if (tsk)
 +           sk_bind_dev_if = tsk->sk_bind_dev_if;


 Why do you support different tasks here?  Could this use proc instead?


 In this case we want to allow a separate process to determine if a task is
 restricted to a device.


 The same -EINVAL issue applies.

 Also, I think you need to hook setns and unshare to do something
 reasonable when the task is bound to a device.


 ack on both.
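
The review points on the set path (privilege check, and -EINVAL when unused args are nonzero) can be sketched as a user-space model. Everything here is hypothetical: `model_task`, `model_dev_exists` and `model_set_bind_dev_if` are invented for the illustration, the capability check is reduced to a flag rather than a real ns_capable() call, and the order of the checks is one possible choice, not necessarily what the final patch does.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-task state. */
struct model_task {
	int sk_bind_dev_if;	/* 0 means "not bound to a device" */
	bool has_net_admin;	/* stands in for the capability check */
};

/* Hypothetical device table: only indexes 1..3 exist in this model. */
static bool model_dev_exists(int idx)
{
	return idx >= 1 && idx <= 3;
}

/*
 * Models the "set" operation: reject nonzero unused arguments, require
 * privilege, and accept either 0 (clear) or an existing device index.
 */
static long model_set_bind_dev_if(struct model_task *me, unsigned long arg2,
				  unsigned long arg3, unsigned long arg4,
				  unsigned long arg5)
{
	int idx = (int)arg2;

	if (arg3 || arg4 || arg5)
		return -EINVAL;
	if (!me->has_net_admin)
		return -EPERM;
	if (idx && !model_dev_exists(idx))
		return -EINVAL;

	me->sk_bind_dev_if = idx;
	return 0;
}
```

Rejecting nonzero unused arguments up front keeps them available for future extensions, which is the usual reason for that request in review.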

 Thanks for the review,
 David

Re: [PATCH net-next 14/16] net: Add sk_bind_dev_if to task_struct

On Tue, 2015-07-28 at 10:07 -0600, David Ahern wrote:

 Problems with using network namespaces for VRFs have been discussed in 
 the past. e.g.,
  http://www.spinics.net/lists/netdev/msg298368.html

Great. Are you suggesting to get rid of network namespaces ?

If not, your proposal only increases bloat and maintenance burden.

If namespaces can't be fixed, they are the wrong design and we should
remove them.

If they can be fixed, they must be fixed.




Re: [PATCH net-next 13/16] net: Introduce VRF device driver - v2