RCU hang on 4.2-rc5

2015-08-08 Thread Steinar H. Gunderson
Hi,

I have an issue with 4.2-rc5 where my machine hangs approximately once a day
with RCU problems. I initially didn't think it was networking related, but
it seems to be (weakly) correlated with network activity, and there was
IPv6 stuff in one of the backtraces, so I've reassigned it to the networking
component. However, I'm not sure if that generates an email to netdev. Could
anyone please have a look?

  https://bugzilla.kernel.org/show_bug.cgi?id=102291

/* Steinar */
-- 
Homepage: http://www.sesse.net/
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] net/ipv4: inconsistent routing table

2015-08-08 Thread Zang MingJie
A few days ago I mistakenly assigned the gateway's address to my own box and
then added the default route. After I deleted the address, the box could no
longer reach the Internet, even though everything looked fine. It took me
several hours to figure out that this is a kernel bug.

On Sat, Aug 8, 2015, 1:00 AM Hannes Frederic Sowa han...@stressinduktion.org 
wrote:
If we could rewind time, we could make local nexthops -EINVAL.

I don't think this is the proper solution. Almost every network OS treats
the routing table as recursive, so a route's next hop can be any unicast
IP address.

When the next hop is unreachable, the entry won't be installed.

I suggest adding a new sysconf entry: when it is not set, the behavior stays
the same as now; when it is set, the fib is recalculated when necessary.

BTW is there any way to check the fib table?


Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.

2015-08-08 Thread Arnd Bergmann
On Friday 07 August 2015 12:43:20 Robert Richter wrote:
 
 I would not pollute bgx_probe() with acpi and dt specifics, and instead
 keep bgx_init_phy(). The typical design pattern for this is:
 
 static int bgx_init_phy(struct bgx *bgx)
 {
 #ifdef CONFIG_ACPI
 	if (!acpi_disabled)
 		return bgx_init_acpi_phy(bgx);
 #endif
 	return bgx_init_of_phy(bgx);
 }
 
 This adds acpi runtime detection (acpi=no), does not call dt code in
 case of acpi, and saves the #else for bgx_init_acpi_phy().
 

What you should really do is to use the same function for both,
using the generic device properties API. If that is not possible,
explain in a comment why not.

Aside from that, if you do have to use compile-time conditionals,
use 'if (IS_ENABLED(CONFIG_ACPI) && !acpi_disabled)' instead of
#ifdef, for readability. The compiler will produce the same binary,
but also give helpful warnings about incorrect code that you don't
get with #ifdef.
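
A minimal user-space sketch of the IS_ENABLED() form, for illustration only.
The IS_ENABLED() macro here is a simplified stand-in for the kernel's (it only
handles options defined to 1), CONFIG_ACPI is forced on as an assumption, and
acpi_disabled, struct bgx and the two init helpers are hypothetical stubs so
that the file compiles on its own:

```c
/* Simplified stand-in for the kernel's IS_ENABLED(): expands to 1 when
 * the option is defined to 1, and to 0 when it is not defined at all.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) IS_DEFINED_1(option)

#define CONFIG_ACPI 1		/* assumption: pretend ACPI is built in */

struct bgx { int id; };		/* hypothetical, trimmed-down struct */

static int acpi_disabled;	/* stand-in for the kernel global */

/* Stubs: return distinct values so the chosen path is visible. */
static int bgx_init_acpi_phy(struct bgx *bgx) { (void)bgx; return 1; }
static int bgx_init_of_phy(struct bgx *bgx) { (void)bgx; return 2; }

static int bgx_init_phy(struct bgx *bgx)
{
	/* Both calls are always parsed and type-checked. When CONFIG_ACPI
	 * is off, the condition is a constant 0 and the compiler can drop
	 * the ACPI call as dead code.
	 */
	if (IS_ENABLED(CONFIG_ACPI) && !acpi_disabled)
		return bgx_init_acpi_phy(bgx);
	return bgx_init_of_phy(bgx);
}
```

With the #ifdef form, a typo inside the ACPI branch would go unnoticed in any
build that has CONFIG_ACPI disabled; here it is a compile error in every build.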

Arnd



Re: [PATCH v3 01/20] net/xen-netback: xenvif_gop_frag_copy: move GSO check out of the loop

2015-08-08 Thread Wei Liu
On Fri, Aug 07, 2015 at 05:46:40PM +0100, Julien Grall wrote:
 The skb doesn't change within the function. Therefore it's only
 necessary to check if we need GSO once at the beginning.
 
 Signed-off-by: Julien Grall julien.gr...@citrix.com
 

Acked-by: Wei Liu wei.l...@citrix.com


Re: [PATCH v3 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-08-08 Thread Wei Liu
On Fri, Aug 07, 2015 at 05:46:57PM +0100, Julien Grall wrote:
 The PV network protocol is using 4KB page granularity. The goal of this
 patch is to allow a Linux using 64KB page granularity working as a
 network backend on a non-modified Xen.
 
 It's only necessary to adapt the ring size and break skb data in small
 chunk of 4KB. The rest of the code is relying on the grant table code.
 
 Signed-off-by: Julien Grall julien.gr...@citrix.com
 
 ---
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Wei Liu wei.l...@citrix.com
 Cc: netdev@vger.kernel.org
 
[...]
 +#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
 +#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, XEN_PAGE_SIZE)
  
  struct xenvif_rx_meta {
   int id;
 @@ -80,16 +81,18 @@ struct xenvif_rx_meta {
  /* Discriminate from any valid pending_idx value. */
  #define INVALID_PENDING_IDX 0x
  
 -#define MAX_BUFFER_OFFSET PAGE_SIZE
 +#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE
  
  #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE
  
 +#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
 +

It might be clearer if you add a comment saying the maximum number of
frags is derived from the page size of the grant page, which happens to
be XEN_PAGE_SIZE at the moment. 

In the future we need to figure out the page size of grant page in a
dynamic way. We shall cross the bridge when we get there.
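
A small worked example of the arithmetic behind MAX_XEN_SKB_FRAGS, as a sketch
only. It assumes a 4 KB grant page (the patch description says the PV protocol
uses 4KB granularity) and reads the 65536 in the macro as the largest payload
to be split; xen_chunks() is a hypothetical helper, not part of the patch:

```c
/* Assumption: the grant page is 4 KB. */
#define XEN_PAGE_SIZE 4096UL

/* 65536 bytes cut into grant-page-sized pieces is 65536 / XEN_PAGE_SIZE
 * chunks, plus one more because the data need not start on a grant-page
 * boundary.
 */
#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)

/* Number of grant-page-sized chunks touched by len bytes that start
 * offset bytes into a grant page (hypothetical helper).
 */
static unsigned long xen_chunks(unsigned long offset, unsigned long len)
{
	return (offset + len + XEN_PAGE_SIZE - 1) / XEN_PAGE_SIZE;
}
```

With these assumptions MAX_XEN_SKB_FRAGS evaluates to 17: an aligned 64 KB
payload touches 16 chunks, and shifting its start by a single byte makes it
touch 17.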

  /* It's possible for an skb to have a maximal number of frags
   * but still be less than MAX_BUFFER_OFFSET in size. Thus the
 - * worst-case number of copy operations is MAX_SKB_FRAGS per
 + * worst-case number of copy operations is MAX_XEN_SKB_FRAGS per
   * ring slot.
   */
 -#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 +#define MAX_GRANT_COPY_OPS (MAX_XEN_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
  
  #define NETBACK_INVALID_HANDLE -1
  
 @@ -203,7 +206,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
  /* Maximum number of Rx slots a to-guest packet may use, including the
   * slot needed for GSO meta-data.
   */
 -#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
 +#define XEN_NETBK_RX_SLOTS_MAX ((MAX_XEN_SKB_FRAGS + 1))
  
  enum state_bit_shift {
   /* This bit marks that the vif is connected */
 diff --git a/drivers/net/xen-netback/netback.c 
 b/drivers/net/xen-netback/netback.c
 index 66f1780..c32a9f2 100644
 --- a/drivers/net/xen-netback/netback.c
 +++ b/drivers/net/xen-netback/netback.c
 @@ -263,6 +263,80 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct 
 xenvif_queue *queue,
   return meta;
  }
  
[...]
   * Set up the grant operations for this fragment. If it's a flipping
   * interface, we also set up the unmap request from here.
 @@ -272,83 +346,52 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
 *queue, struct sk_buff *skb
struct page *page, unsigned long size,
unsigned long offset, int *head)
  {
 - struct gnttab_copy *copy_gop;
 - struct xenvif_rx_meta *meta;
 + struct gop_frag_copy info = {
 + .queue = queue,
 + .npo = npo,
 + .head = *head,
 + .gso_type = XEN_NETIF_GSO_TYPE_NONE,
 + };
   unsigned long bytes;
 - int gso_type = XEN_NETIF_GSO_TYPE_NONE;
  
   if (skb_is_gso(skb)) {
   if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
 - gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
 + info.gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
   else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
 - gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
 + info.gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
   }
  
   /* Data must not cross a page boundary. */
   BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
  
 - meta = npo->meta + npo->meta_prod - 1;
 + info.meta = npo->meta + npo->meta_prod - 1;
  
   /* Skip unused frames from start of page */
   page += offset >> PAGE_SHIFT;
   offset &= ~PAGE_MASK;
  
   while (size > 0) {
 - struct xen_page_foreign *foreign;
 -
   BUG_ON(offset >= PAGE_SIZE);
 - BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
 -
 - if (npo->copy_off == MAX_BUFFER_OFFSET)
 - meta = get_next_rx_buffer(queue, npo);
  
   bytes = PAGE_SIZE - offset;
   if (bytes > size)
   bytes = size;
  
 - if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
 - bytes = MAX_BUFFER_OFFSET - npo->copy_off;
 -
 - copy_gop = npo->copy + npo->copy_prod++;
 - copy_gop->flags = GNTCOPY_dest_gref;
 - copy_gop->len = bytes;
 -
 - foreign = xen_page_foreign(page);
 - if (foreign) {
 - copy_gop->source.domid = foreign->domid;
 - copy_gop->source.u.ref = foreign->gref;
 - copy_gop->flags |= GNTCOPY_source_gref;
 

[PATCH net-next] net: dsa: mv88e6352: Use mnemonics for EEPROM registers and bits

2015-08-08 Thread Andrew Lunn
Add register definitions #defines for accessing the EEPROM.

Signed-off-by: Andrew Lunn and...@lunn.ch
---
 drivers/net/dsa/mv88e6352.c | 18 ++
 drivers/net/dsa/mv88e6xxx.h |  8 ++--
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index af210efecc55..7e935852e192 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -123,8 +123,9 @@ static int mv88e6352_read_eeprom_word(struct dsa_switch *ds, int addr)
 
 	mutex_lock(&ps->eeprom_mutex);
 
-	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL2, 0x14,
-				  0xc000 | (addr & 0xff));
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_EEPROM_OP,
+				  GLOBAL2_EEPROM_OP_READ |
+				  (addr & GLOBAL2_EEPROM_OP_ADDR_MASK));
 	if (ret < 0)
 		goto error;
 
@@ -132,7 +133,7 @@ static int mv88e6352_read_eeprom_word(struct dsa_switch *ds, int addr)
 	if (ret < 0)
 		goto error;
 
-	ret = mv88e6xxx_reg_read(ds, REG_GLOBAL2, 0x15);
+	ret = mv88e6xxx_reg_read(ds, REG_GLOBAL2, GLOBAL2_EEPROM_DATA);
 error:
 	mutex_unlock(&ps->eeprom_mutex);
 	return ret;
@@ -205,11 +206,11 @@ static int mv88e6352_eeprom_is_readonly(struct dsa_switch *ds)
 {
 	int ret;
 
-	ret = mv88e6xxx_reg_read(ds, REG_GLOBAL2, 0x14);
+	ret = mv88e6xxx_reg_read(ds, REG_GLOBAL2, GLOBAL2_EEPROM_OP);
 	if (ret < 0)
 		return ret;
 
-	if (!(ret & 0x0400))
+	if (!(ret & GLOBAL2_EEPROM_OP_WRITE_EN))
 		return -EROFS;
 
 	return 0;
@@ -223,12 +224,13 @@ static int mv88e6352_write_eeprom_word(struct dsa_switch *ds, int addr,
 
 	mutex_lock(&ps->eeprom_mutex);
 
-	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL2, 0x15, data);
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_EEPROM_DATA, data);
 	if (ret < 0)
 		goto error;
 
-	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL2, 0x14,
-				  0xb000 | (addr & 0xff));
+	ret = mv88e6xxx_reg_write(ds, REG_GLOBAL2, GLOBAL2_EEPROM_OP,
+				  GLOBAL2_EEPROM_OP_WRITE |
+				  (addr & GLOBAL2_EEPROM_OP_ADDR_MASK));
 	if (ret < 0)
 		goto error;
 
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 78e37226a37d..8b017d65b691 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -285,8 +285,12 @@
 #define GLOBAL2_PRIO_OVERRIDE_FORCE_ARP	BIT(3)
 #define GLOBAL2_PRIO_OVERRIDE_ARP_SHIFT	0
 #define GLOBAL2_EEPROM_OP		0x14
-#define GLOBAL2_EEPROM_OP_BUSY	BIT(15)
-#define GLOBAL2_EEPROM_OP_LOAD	BIT(11)
+#define GLOBAL2_EEPROM_OP_BUSY		BIT(15)
+#define GLOBAL2_EEPROM_OP_WRITE		((3 << 12) | GLOBAL2_EEPROM_OP_BUSY)
+#define GLOBAL2_EEPROM_OP_READ		((4 << 12) | GLOBAL2_EEPROM_OP_BUSY)
+#define GLOBAL2_EEPROM_OP_LOAD		BIT(11)
+#define GLOBAL2_EEPROM_OP_WRITE_EN	BIT(10)
+#define GLOBAL2_EEPROM_OP_ADDR_MASK	0xff
 #define GLOBAL2_EEPROM_DATA		0x15
 #define GLOBAL2_PTP_AVB_OP		0x16
 #define GLOBAL2_PTP_AVB_DATA		0x17
-- 
2.5.0
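
The mnemonics in this patch can be checked against the magic numbers they
replace (0xc000, 0xb000 and 0x0400). The sketch below does that in user space.
BIT() is redefined locally, and eeprom_read_cmd() and eeprom_write_cmd() are
hypothetical helpers added for illustration; they are not part of the patch:

```c
#define BIT(n) (1U << (n))	/* local stand-in for the kernel macro */

#define GLOBAL2_EEPROM_OP_BUSY		BIT(15)
#define GLOBAL2_EEPROM_OP_WRITE		((3 << 12) | GLOBAL2_EEPROM_OP_BUSY)
#define GLOBAL2_EEPROM_OP_READ		((4 << 12) | GLOBAL2_EEPROM_OP_BUSY)
#define GLOBAL2_EEPROM_OP_WRITE_EN	BIT(10)
#define GLOBAL2_EEPROM_OP_ADDR_MASK	0xff

/* Value written to the EEPROM operation register to read one word
 * (hypothetical helper).
 */
static unsigned int eeprom_read_cmd(int addr)
{
	return GLOBAL2_EEPROM_OP_READ | (addr & GLOBAL2_EEPROM_OP_ADDR_MASK);
}

/* Value written to the EEPROM operation register to write one word
 * (hypothetical helper).
 */
static unsigned int eeprom_write_cmd(int addr)
{
	return GLOBAL2_EEPROM_OP_WRITE | (addr & GLOBAL2_EEPROM_OP_ADDR_MASK);
}
```

Here (4 << 12) | BIT(15) is 0x4000 | 0x8000 = 0xc000 and (3 << 12) | BIT(15)
is 0x3000 | 0x8000 = 0xb000, so the generated register values are unchanged.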



[PATCH net-next] dsa: Support multiple MDIO busses

2015-08-08 Thread Andrew Lunn
When using a cluster of switches, some topologies will have an MDIO
bus per switch, not one for the whole cluster. Allow this to be
represented in the device tree, by adding an optional mii-bus property
at the switch level. The old platform_device method of instantiation
supports this already, so only the device tree binding needs extending
with an additional optional phandle.

Signed-off-by: Andrew Lunn and...@lunn.ch
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt |  5 +
 net/dsa/dsa.c | 12 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index f0b4cd72411d..9cf9a0ec333c 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -58,6 +58,10 @@ Optionnal property:
 			  Documentation/devicetree/bindings/net/ethernet.txt
 			  for details.
 
+- mii-bus		: Should be a phandle to a valid MDIO bus device node.
+			  This mii-bus will be used in preference to the
+			  global dsa,mii-bus defined above, for this switch.
+
 Optional subnodes:
 - fixed-link		: Fixed-link subnode describing a link to a non-MDIO
 			  managed entity. See
@@ -107,6 +111,7 @@ Example:
 			#address-cells = <1>;
 			#size-cells = <0>;
 			reg = <17 1>;	/* MDIO address 17, switch 1 in tree */
+			mii-bus = <&mii_bus1>;
 
 			switch1uplink: port@0 {
 				reg = <0>;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index b445d492c115..78d4ac97aae3 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -574,7 +574,7 @@ static int dsa_of_probe(struct device *dev)
 {
 	struct device_node *np = dev->of_node;
 	struct device_node *child, *mdio, *ethernet, *port, *link;
-	struct mii_bus *mdio_bus;
+	struct mii_bus *mdio_bus, *mdio_bus_switch;
 	struct net_device *ethernet_dev;
 	struct dsa_platform_data *pd;
 	struct dsa_chip_data *cd;
@@ -636,6 +636,16 @@ static int dsa_of_probe(struct device *dev)
 		if (!of_property_read_u32(child, "eeprom-length", &eeprom_len))
 			cd->eeprom_len = eeprom_len;
 
+		mdio = of_parse_phandle(child, "mii-bus", 0);
+		if (mdio) {
+			mdio_bus_switch = of_mdio_find_bus(mdio);
+			if (!mdio_bus_switch) {
+				ret = -EPROBE_DEFER;
+				goto out_free_chip;
+			}
+			cd->host_dev = &mdio_bus_switch->dev;
+		}
+
 		for_each_available_child_of_node(child, port) {
 			port_reg = of_get_property(port, "reg", NULL);
 			if (!port_reg)
-- 
2.5.0
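
The selection rule this patch adds can be modelled in a few lines of
user-space C. This is only a sketch of the decision, assuming the global bus
is the default for every switch; struct switch_node, its fields and
pick_mdio_bus() are hypothetical names, and the real code works on device tree
nodes through of_parse_phandle() and of_mdio_find_bus():

```c
#define EPROBE_DEFER 517	/* same value as the kernel's errno */

struct mii_bus { const char *name; };	/* hypothetical, trimmed down */

struct switch_node {
	int has_mii_bus_phandle;	/* "mii-bus" property present? */
	struct mii_bus *registered_bus;	/* NULL until that bus has probed */
};

/* Pick the MDIO bus for one switch: a per-switch "mii-bus" wins over the
 * global bus, and a phandle whose bus is not registered yet defers the
 * probe. On error *out is left untouched.
 */
static int pick_mdio_bus(const struct switch_node *sw,
			 struct mii_bus *global_bus, struct mii_bus **out)
{
	if (!sw->has_mii_bus_phandle) {
		*out = global_bus;
		return 0;
	}
	if (!sw->registered_bus)
		return -EPROBE_DEFER;
	*out = sw->registered_bus;
	return 0;
}
```

Returning -EPROBE_DEFER rather than silently falling back to the global bus
means a switch is never bound to the wrong bus just because its own bus driver
has not loaded yet.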



Re: [PATCH 1/2] netfilter: ip6t_SYNPROXY: fix NULL pointer dereference

2015-08-08 Thread Patrick McHardy
On 06.08, Phil Sutter wrote:
 This happens when networking namespaces are enabled.

Thanks, just one minor request:

  synproxy_send_tcp(const struct sk_buff *skb, struct sk_buff *nskb,
 struct nf_conntrack *nfct, enum ip_conntrack_info ctinfo,
 struct ipv6hdr *niph, struct tcphdr *nth,
 -   unsigned int tcp_hdr_size)
 +   unsigned int tcp_hdr_size, struct synproxy_net *snet)

Logically the synproxy_net pointer should come before all other arguments,
since it's the container for a lot of the following arguments.


Re: [PATCH net-next] net: dsa: mv88e6352: Use mnemonics for EEPROM registers and bits

2015-08-08 Thread Guenter Roeck

On 08/08/2015 08:04 AM, Andrew Lunn wrote:

Add register definitions #defines for accessing the EEPROM.

Signed-off-by: Andrew Lunn and...@lunn.ch


Acked-by: Guenter Roeck li...@roeck-us.net



Re: rtnl_mutex deadlock?

2015-08-08 Thread Thomas Graf
On 08/07/15 at 08:00am, Herbert Xu wrote:
 On Fri, Aug 07, 2015 at 01:58:15AM +0200, Daniel Borkmann wrote:
 
  Looks like we had a WARN_ON() in rhashtable_insert_rehash() before, but
  was removed in a87b9ebf1709 (rhashtable: Do not schedule more than one
  rehash if we can't grow further). Do you want to re-add a WARN_ON_ONCE()?
 
 I think so.  Thomas?

Makes sense. I removed it because I thought it was not possible to
reach.
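
For readers unfamiliar with the macro, here is a user-space approximation of
what a WARN_ON_ONCE() in a "cannot happen" path buys. It is a sketch, not the
kernel implementation: it relies on GNU C statement expressions (GCC, Clang),
and warn_count, insert_rehash() and the -EBUSY return are hypothetical
stand-ins chosen for the demonstration:

```c
#include <errno.h>
#include <stdio.h>

static int warn_count;	/* counts emitted warnings, for demonstration */

/* Evaluates to the truth value of cond on every call, but reports only
 * the first hit at each call site.
 */
#define WARN_ON_ONCE(cond) ({						\
	static int warned_;						\
	int ret_ = !!(cond);						\
	if (ret_ && !warned_) {						\
		warned_ = 1;						\
		warn_count++;						\
		fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
	}								\
	ret_;								\
})

/* Hypothetical insert path that believed this state was unreachable. */
static int insert_rehash(int cannot_grow)
{
	if (WARN_ON_ONCE(cannot_grow))
		return -EBUSY;
	return 0;
}
```

The error is still returned on every call; only the log output is limited to
one report, so a path that turns out to be reachable is noticed without
flooding the log.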


Re: [PATCH net] netlink: make sure -EBUSY won't escape from netlink_insert

2015-08-08 Thread Thomas Graf
On 08/07/15 at 12:26am, Daniel Borkmann wrote:
 Linus reports the following deadlock on rtnl_mutex; triggered only
 once so far (extract):
 
[...] 
 Reference: http://thread.gmane.org/gmane.linux.network/372676
 Reported-by: Linus Torvalds torva...@linux-foundation.org
 Signed-off-by: Daniel Borkmann dan...@iogearbox.net

Acked-by: Thomas Graf tg...@suug.ch


Re: [PATCH 0/2] of: fsl/fman: reuse the fixed node parsing code

2015-08-08 Thread Florian Fainelli
CC'ing Stas,

Le 08/05/15 07:42, Madalin Bucur a écrit :
 The FMan MAC configuration code needs the speed and duplex information
 for fixed-link interfaces that is parsed now by the of function
 of_phy_register_fixed_link(). This parses the fixed-link parameters but
 does not expose to the caller neither the phy_device pointer nor the
 status struct where it loads the fixed-link params. By extracting the
 fixed-link parsing code from of_phy_register_fixed_link() into a
 separate function the parsed values are made available without changing
 the existing API. This change also removes a small redundancy in the
 previous code calling fixed_phy_register().

I will look into this shortly, sorry for the delay.

 
 The FMan patch relies on the latest FMan driver v4 submission by Igal 
 Liberman:
 https://patchwork.ozlabs.org/project/netdev/list/?submitter=Igal.Liberman&state=*&q=v4
 
 Madalin Bucur (2):
   of: separate fixed link parsing from registration
   fsl_fman: use fixed_phy_status for MEMAC
 
  .../ethernet/freescale/fman/flib/fsl_fman_memac.h  |  6 ++-
  drivers/net/ethernet/freescale/fman/inc/mac.h  |  2 +-
  drivers/net/ethernet/freescale/fman/mac/fm_memac.c | 42 -
  drivers/net/ethernet/freescale/fman/mac/fm_memac.h |  3 +-
  drivers/net/ethernet/freescale/fman/mac/mac.c  | 18 ++--
  drivers/of/of_mdio.c   | 52 
 ++
  include/linux/of_mdio.h|  9 
  7 files changed, 94 insertions(+), 38 deletions(-)
 


-- 
Florian


Re: [PATCH net-next] dsa: Support multiple MDIO busses

2015-08-08 Thread Florian Fainelli
Le 08/08/15 08:09, Andrew Lunn a écrit :
 When using a cluster of switches, some topologies will have an MDIO
 bus per switch, not one for the whole cluster. Allow this to be
 represented in the device tree, by adding an optional mii-bus property
 at the switch level. The old platform_device method of instantiation
 supports this already, so only the device tree binding needs extending
 with an additional optional phandle.
 
 Signed-off-by: Andrew Lunn and...@lunn.ch

Reviewed-by: Florian Fainelli f.faine...@gmail.com
-- 
Florian


Re: [v2 2/9] dpaa_eth: add support for DPAA Ethernet

2015-08-08 Thread Florian Fainelli
Le 08/05/15 08:41, Madalin Bucur a écrit :
 This introduces the Freescale Data Path Acceleration Architecture
 (DPAA) Ethernet driver (dpaa_eth) that builds upon the DPAA QMan,
 BMan, PAMU and FMan drivers to deliver Ethernet connectivity on
 the Freescale DPAA QorIQ platforms.
 
 Signed-off-by: Madalin Bucur madalin.bu...@freescale.com
 ---
[snip]
 +
 +if FSL_DPAA_ETH
 +
 +config FSL_DPAA_CS_THRESHOLD_1G
 + hex Egress congestion threshold on 1G ports
 + range 0x1000 0x1000
 + default 0x0600

This sounds like something you would want to be able to configure at
runtime, either via private sysfs attributes, or better, using ethtool
and either a newly introduced set of tunables, or creating a private
driver API for this.

 + ---help---
 +   The size in bytes of the egress Congestion State notification 
 threshold on 1G ports.
 +   The 1G dTSECs can quite easily be flooded by cores doing Tx in a 
 tight loop
 +   (e.g. by sending UDP datagrams at while(1) speed),
 +   and the larger the frame size, the more acute the problem.
 +   So we have to find a balance between these factors:
 +- avoiding the device staying congested for a prolonged time 
 (risking
 + the netdev watchdog to fire - see also the tx_timeout 
 module param);
 +   - affecting performance of protocols such as TCP, which 
 otherwise
 +  behave well under the congestion notification mechanism;
 +- preventing the Tx cores from tightly-looping (as if the 
 congestion
 +  threshold was too low to be effective);
 +- running out of memory if the CS threshold is set too high.
 +
 +config FSL_DPAA_CS_THRESHOLD_10G
 + hex Egress congestion threshold on 10G ports
 + range 0x1000 0x2000
 + default 0x1000
 + ---help ---
 +   The size in bytes of the egress Congestion State notification 
 threshold on 10G ports.
 +
 +config FSL_DPAA_INGRESS_CS_THRESHOLD
 + hex Ingress congestion threshold on FMan ports
 + default 0x1000
 + ---help---
 +   The size in bytes of the ingress tail-drop threshold on FMan ports.
 +   Traffic piling up above this value will be rejected by QMan and 
 discarded by FMan.

Same here.
-- 
Florian


[patch 1/2 -mainline] cxgb4: missing curly braces in t4_setup_debugfs()

2015-08-08 Thread Dan Carpenter
There were missing curly braces so it means we call add_debugfs_mem()
unintentionally.

Fixes: 3ccc6cf74d8c ('cxgb4: Adds support for T6 adapter')
Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index b657734..9e0b670 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -2665,10 +2665,11 @@ int t4_setup_debugfs(struct adapter *adap)
 					EXT_MEM1_SIZE_G(size));
 		}
 	} else {
-		if (i & EXT_MEM_ENABLE_F)
+		if (i & EXT_MEM_ENABLE_F) {
 			size = t4_read_reg(adap, MA_EXT_MEMORY_BAR_A);
 			add_debugfs_mem(adap, "mc", MEM_MC,
 					EXT_MEM_SIZE_G(size));
+		}
 	}
 
 	de = debugfs_create_file_size("flash", S_IRUSR, adap->debugfs_root, adap,
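
The effect of the missing braces can be reproduced in isolation. In this
user-space sketch, add_entry() is a hypothetical stand-in for
add_debugfs_mem(), and the buggy function is laid out the way the compiler
parses the unbraced code rather than the way it was indented in the driver:

```c
static int mem_entries;	/* how many entries have been added */

/* Hypothetical stand-in for add_debugfs_mem(). */
static void add_entry(void)
{
	mem_entries++;
}

/* Without braces, only the first statement belongs to the if. */
static int setup_buggy(int enabled)
{
	int size = 0;

	if (enabled)
		size = 1;
	add_entry();	/* runs unconditionally */
	return size;
}

/* With braces, both statements are guarded by the condition. */
static int setup_fixed(int enabled)
{
	int size = 0;

	if (enabled) {
		size = 1;
		add_entry();
	}
	return size;
}
```

Calling setup_buggy(0) still adds an entry, while setup_fixed(0) adds none.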


[patch 2/2] cxgb4: cleanup some indenting

2015-08-08 Thread Dan Carpenter
Add or remove some tabs so that statements line up correctly.

Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
index 9e0b670..b83ca7f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c
@@ -346,11 +346,11 @@ static int cim_qcfg_show(struct seq_file *seq, void *v)
if (is_t4(adap->params.chip)) {
i = t4_cim_read(adap, UP_OBQ_0_REALADDR_A,
ARRAY_SIZE(obq_wr_t4), obq_wr_t4);
-   wr = obq_wr_t4;
+   wr = obq_wr_t4;
} else {
i = t4_cim_read(adap, UP_OBQ_0_SHADOW_REALADDR_A,
ARRAY_SIZE(obq_wr_t5), obq_wr_t5);
-   wr = obq_wr_t5;
+   wr = obq_wr_t5;
}
}
if (i)
@@ -2095,7 +2095,7 @@ do { \
 #undef T
 #undef S
 #undef S3
-return 0;
+   return 0;
 }
 
 static int sge_queue_entries(const struct adapter *adap)


[PATCH net] net: dsa: Do not override PHY interface if already configured

2015-08-08 Thread Florian Fainelli
In case we need to divert reads/writes using the slave MII bus, we may have
already fetched a valid PHY interface property from Device Tree, and that
mode is used by the PHY driver to make configuration decisions.

If we could not fetch the phy-mode property, we will assign p->phy_interface
to PHY_INTERFACE_MODE_NA, so that we can check for that condition to decide
whether or not we should override the interface value.

Fixes: 19334920eaf7 (net: dsa: Set valid phy interface type)
Signed-off-by: Florian Fainelli f.faine...@gmail.com
---
Hi Guenter,

Could you verify this does not break what you were trying to fix with your
change? I am fairly confident it will not, because for PHYs built into the
switch port we will not be able to fetch a phy-mode property from DT, so we
will use PHY_INTERFACE_MODE_NA, but here we will re-assign them to
PHY_INTERFACE_MODE_GMII as before.

Thanks!

 net/dsa/slave.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0917123790ea..35c47ddd04f0 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -756,7 +756,8 @@ static int dsa_slave_phy_connect(struct dsa_slave_priv *p,
 		return -ENODEV;
 
 	/* Use already configured phy mode */
-	p->phy_interface = p->phy->interface;
+	if (p->phy_interface == PHY_INTERFACE_MODE_NA)
+		p->phy_interface = p->phy->interface;
 	phy_connect_direct(slave_dev, p->phy, dsa_slave_adjust_link,
 			   p->phy_interface);
 
 
-- 
2.1.0
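
The rule the patch implements, reduced to a stand-alone sketch. The enum is a
hypothetical three-value subset of the kernel's phy_interface_t and
resolve_mode() is an illustrative name, not a kernel function:

```c
/* Hypothetical subset of phy_interface_t; the real enum has many more. */
enum phy_mode { MODE_NA, MODE_GMII, MODE_RGMII };

/* Keep a mode that was already read from the device tree; fall back to
 * the PHY's own mode only when nothing was configured.
 */
static enum phy_mode resolve_mode(enum phy_mode from_dt,
				  enum phy_mode from_phy)
{
	if (from_dt == MODE_NA)
		return from_phy;
	return from_dt;
}
```

For a built-in PHY with no phy-mode property, resolve_mode(MODE_NA, MODE_GMII)
yields MODE_GMII as before, while a port that was explicitly configured as
RGMII keeps MODE_RGMII.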



Re: [RFC PATCH 1/4] vhost: Introduce a universal thread to serve all users

2015-08-08 Thread Bandan Das
Eyal Moscovici eya...@il.ibm.com writes:

 Hi, 

 Do you know what is the overhead of switching the vhost thread from one 
 cgroup to another?

I misinterpreted this question earlier. I think what you are asking here is
that when the vm process is moved from one cgroup to another, what is the
overhead of moving the vhost thread to the new cgroup.

This design does not provide any hooks for the vhost thread to move to
a new cgroup. Rather, I think a better approach is to create a new vhost
thread and bind the process to it if the process is migrated to a new
cgroup. This is much less complicated, and there's a good chance that
it's impossible to migrate the vhost thread since it's serving other guests.
I will address this in v2.

 Eyal Moscovici
 HL-Cloud Infrastructure Solutions
 IBM Haifa Research Lab



 From:   Bandan Das b...@redhat.com
 To: k...@vger.kernel.org
 Cc: netdev@vger.kernel.org, linux-ker...@vger.kernel.org, 
 m...@redhat.com, Eyal Moscovici/Haifa/IBM@IBMIL, Razya 
 Ladelsky/Haifa/IBM@IBMIL, cgro...@vger.kernel.org, jasow...@redhat.com
 Date:   07/13/2015 07:08 AM
 Subject:[RFC PATCH 1/4] vhost: Introduce a universal thread to 
 serve all users



 vhost threads are per-device, but in most cases a single thread
 is enough. This change creates a single thread that is used to
 serve all guests.

 However, this complicates cgroups associations. The current policy
 is to attach the per-device thread to all cgroups of the parent process
 that the device is associated with. This is no longer possible if we
 have a single thread. So, we end up moving the thread around to
 cgroups of whichever device that needs servicing. This is a very
 inefficient protocol but seems to be the only way to integrate
 cgroups support.

 Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  drivers/vhost/scsi.c  |  15 +++--
  drivers/vhost/vhost.c | 150 
 --
  drivers/vhost/vhost.h |  19 +--
  3 files changed, 97 insertions(+), 87 deletions(-)

 diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
 index ea32b38..6c42936 100644
 --- a/drivers/vhost/scsi.c
 +++ b/drivers/vhost/scsi.c
 @@ -535,7 +535,7 @@ static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd)
  
  	llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
  
 -	vhost_work_queue(&vs->dev, &vs->vs_completion_work);
 +	vhost_work_queue(vs->dev.worker, &vs->vs_completion_work);
  }
  
  static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd)
 @@ -1282,7 +1282,7 @@ vhost_scsi_send_evt(struct vhost_scsi *vs,
  	}
  
  	llist_add(&evt->list, &vs->vs_event_list);
 -	vhost_work_queue(&vs->dev, &vs->vs_event_work);
 +	vhost_work_queue(vs->dev.worker, &vs->vs_event_work);
  }
  
  static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
 @@ -1335,8 +1335,8 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
  	/* Flush both the vhost poll and vhost work */
  	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
  		vhost_scsi_flush_vq(vs, i);
 -	vhost_work_flush(&vs->dev, &vs->vs_completion_work);
 -	vhost_work_flush(&vs->dev, &vs->vs_event_work);
 +	vhost_work_flush(vs->dev.worker, &vs->vs_completion_work);
 +	vhost_work_flush(vs->dev.worker, &vs->vs_event_work);
  
  	/* Wait for all reqs issued before the flush to be finished */
  	for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
 @@ -1584,8 +1584,11 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
  	if (!vqs)
  		goto err_vqs;
  
 -	vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
 -	vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
 +	vhost_work_init(&vs->dev, &vs->vs_completion_work,
 +			vhost_scsi_complete_cmd_work);
 +
 +	vhost_work_init(&vs->dev, &vs->vs_event_work,
 +			vhost_scsi_evt_work);
  
  	vs->vs_events_nr = 0;
  	vs->vs_events_missed = false;
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 2ee2826..951c96b 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -11,6 +11,8 @@
   * Generic code for virtio server in host kernel.
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 +
  #include <linux/eventfd.h>
  #include <linux/vhost.h>
  #include <linux/uio.h>
 @@ -28,6 +30,9 @@
  
  #include "vhost.h"
  
 +/* Just one worker thread to service all devices */
 +static struct vhost_worker *worker;
 +
  enum {
  	VHOST_MEMORY_MAX_NREGIONS = 64,
  	VHOST_MEMORY_F_LOG = 0x1,
 @@ -58,13 +63,15 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
  

Re: [RFC PATCH 0/4] Shared vhost design

2015-08-08 Thread Bandan Das
Hi Michael,

Michael S. Tsirkin m...@redhat.com writes:

 On Mon, Jul 13, 2015 at 12:07:31AM -0400, Bandan Das wrote:
 Hello,
 
 There have been discussions on improving the current vhost design. The first
 attempt, to my knowledge was Shirley Ma's patch to create a dedicated vhost
 worker per cgroup.
 
 http://comments.gmane.org/gmane.linux.network/224730
 
 Later, I posted a cmwq based approach for performance comparisions
 http://comments.gmane.org/gmane.linux.network/286858
 
 More recently was the Elvis work that was presented in KVM Forum 2013
 http://www.linux-kvm.org/images/a/a3/Kvm-forum-2013-elvis.pdf
 
 The Elvis patches rely on common vhost thread design for scalability
 along with polling for performance. Since there are two major changes
 being proposed, we decided to split up the work. The first (this RFC),
 proposing a re-design of the vhost threading model and the second part
 (not posted yet) to focus more on improving performance. 
 
 I am posting this with the hope that we can have a meaningful discussion
 on the proposed new architecture. We have run some tests to show that the new
 design is scalable and in terms of performance, is comparable to the current
 stable design. 
 
 Test Setup:
 The testing is based on the setup described in the Elvis proposal.
 The initial tests are just an aggregate of Netperf STREAM and MAERTS but
 as we progress, I am happy to run more tests. The hosts are two identical
 16 core Haswell systems with point to point network links. For the first 10
 runs, with n=1 up to n=10 guests running in parallel, I booted the target
 system with nr_cpus=8 and mem=12G. The purpose was to do a comparison of
 resource utilization and how it affects performance. Finally, with the
 number of guests set at 14, I didn't limit the number of CPUs booted on the
 host or limit memory seen by the kernel but booted the kernel with
 isolcpus=14,15 that will be used to run the vhost threads. The guests are
 pinned to cpus 0-13 and based on which cpu the guest is running on, the
 corresponding I/O thread is either pinned to cpu 14 or 15.
 Results
 # X axis is number of guests
 # Y axis is netperf number
 # nr_cpus=8 and mem=12G
 #Number of Guests    #Baseline    #ELVIS
 1                    1119.3       [...].0
 2                    1135.6       1130.2
 3                    1135.5       1131.6
 4                    1136.0       1127.1
 5                    1118.6       1129.3
 6                    1123.4       1129.8
 7                    1128.7       1135.4
 8                    1129.9       1137.5
 9                    1130.6       1135.1
 10                   1129.3       1138.9
 14*                  1173.8       1216.9

 I'm a bit too busy now, with 2.4 and related stuff, will review once we
 finish 2.4.  But I'd like to ask two things:
 - did you actually test a config where cgroups were used?

Here are some numbers with a simple cgroup setup.

Three cgroups with cpusets cpu=0,2,4 for cgroup1, cpu=1,3,5 for cgroup2 and
cpu=6,7 for cgroup3 (even though cpus 6 and 7 are on different numa nodes).

I ran netperf for 1 to 9 guests, assigning the first guest to cgroup1, the
second to cgroup2 and the third to cgroup3, and repeating this sequence
up to 9 guests.

The numbers are (TCP_STREAM + TCP_MAERTS)/2:

 #Number of Guests    #ELVIS (Mbps)
 1                    1056.9
 2                    1122.5
 3                    1122.8
 4                    1123.2
 5                    1122.6
 6                    1110.3
 7                    1116.3
 8                    1121.8
 9                    1118.5
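
For reference, the guest-to-cgroup assignment order and the reported figure
described above can be sketched as follows. This is only an illustration:
cgroup_for_guest() and netperf_score() are hypothetical helper names, not
part of the test scripts, and the sketch does not say how throughput is
aggregated across guests, since that is not stated here.

```c
#include <assert.h>

/* Number of cgroups in the setup described above (cgroup1..cgroup3). */
#define NUM_CGROUPS 3

/*
 * Hypothetical helper: map a 1-based guest index to a 1-based cgroup
 * number, cycling cgroup1, cgroup2, cgroup3, cgroup1, ...
 */
int cgroup_for_guest(int guest)
{
	assert(guest >= 1);
	return (guest - 1) % NUM_CGROUPS + 1;
}

/*
 * Hypothetical helper: the figure reported per run, i.e. the mean of
 * the TCP_STREAM and TCP_MAERTS throughput (same unit for both).
 */
double netperf_score(double tcp_stream, double tcp_maerts)
{
	return (tcp_stream + tcp_maerts) / 2.0;
}
```

With 9 guests, each cgroup ends up hosting three guests (1,4,7 / 2,5,8 /
3,6,9).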

Maybe my cgroup setup was too simple, but these numbers are comparable
to the no-cgroups results above. I wrote some tracing code to trace
cgroup_match_groups() and measure the cgroup search overhead, but it seemed
unnecessary for this particular test.


 - does the design address the issue of VM 1 being blocked
   (e.g. because it hits swap) and blocking VM 2?
Good question. I haven't thought of this yet. But IIUC,
the worker thread will complete VM1's job and then move on to
executing VM2's scheduled work. It doesn't matter if VM1 is
blocked currently. I think it would be a problem though if/when
polling is introduced.
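
To make the ordering argument concrete, here is a toy model of one shared
worker draining a FIFO of work items queued by different VMs. It is a sketch
under my own assumptions, not the vhost code: struct work_item, struct
work_queue, queue_work() and run_worker() are made-up names, and the queue
is a fixed-size array that is never reused once drained.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Toy work item: records which VM queued it (not a vhost type). */
struct work_item {
	int vm_id;
};

/* Toy FIFO shared by all VMs that are served by one worker. */
struct work_queue {
	struct work_item items[QUEUE_CAP];
	size_t head;	/* next item to run */
	size_t tail;	/* next free slot */
};

/* Queue one item; returns 0 on success, -1 if the queue is full. */
int queue_work(struct work_queue *q, int vm_id)
{
	if (q->tail == QUEUE_CAP)
		return -1;
	q->items[q->tail++].vm_id = vm_id;
	return 0;
}

/*
 * Drain the queue as a single shared worker would: each item runs to
 * completion before the next one starts. The VM ids are stored in
 * 'order' in the sequence they were served; returns the item count.
 */
size_t run_worker(struct work_queue *q, int *order, size_t max)
{
	size_t n = 0;

	while (q->head < q->tail && n < max)
		order[n++] = q->items[q->head++].vm_id;
	return n;
}
```

In this model, an item queued by VM2 behind an item from VM1 waits exactly
as long as VM1's item takes to run, which is the serialization the question
is about.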

 
 #* Last run with the vCPU and I/O thread(s) pinned, no CPU/memory limit imposed.
 #  I/O thread runs on CPU 14 or 15 depending on which guest it's serving
 
 There's a simple graph at
 http://people.redhat.com/~bdas/elvis/data/results.png
 that shows how task affinity results in a jump and, even without it,
 as the number of guests increases, the shared vhost design performs
 slightly better.
 
 Observations:
 1. In terms of stock performance, the results are comparable.
 2. However, with a tuned setup, even without polling, we see an improvement
 with the new design.
 3. Making the new design 

[PATCHv2 net-next] dsa: Support multiple MDIO busses

2015-08-08 Thread Andrew Lunn
When using a cluster of switches, some topologies will have an MDIO
bus per switch, not one for the whole cluster. Allow this to be
represented in the device tree, by adding an optional mii-bus property
at the switch level.

Signed-off-by: Andrew Lunn and...@lunn.ch
Reviewed-by: Florian Fainelli f.faine...@gmail.com
---

v2: Fix documentation, which placed the properties documentation in
the wrong place.
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt |  5 +++++
 net/dsa/dsa.c                                     | 12 +++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index f0b4cd72411d..fc06f4a7c788 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -32,6 +32,10 @@ A switch child node has the following optional property:
  the presence and/or size of a connected EEPROM,
  otherwise optional.
 
+- mii-bus  : Should be a phandle to a valid MDIO bus device node.
+ This mii-bus will be used in preference to the
+ global dsa,mii-bus defined above, for this switch.
+
 A switch may have multiple port children nodes
 
 Each port children node must have the following mandatory properties:
@@ -107,6 +111,7 @@ Example:
 			#address-cells = <1>;
 			#size-cells = <0>;
 			reg = <17 1>;	/* MDIO address 17, switch 1 in tree */
+			mii-bus = <&mii_bus1>;
 
 			switch1uplink: port@0 {
 				reg = <0>;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index b445d492c115..78d4ac97aae3 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -574,7 +574,7 @@ static int dsa_of_probe(struct device *dev)
 {
 	struct device_node *np = dev->of_node;
 	struct device_node *child, *mdio, *ethernet, *port, *link;
-	struct mii_bus *mdio_bus;
+	struct mii_bus *mdio_bus, *mdio_bus_switch;
 	struct net_device *ethernet_dev;
 	struct dsa_platform_data *pd;
 	struct dsa_chip_data *cd;
@@ -636,6 +636,16 @@ static int dsa_of_probe(struct device *dev)
 		if (!of_property_read_u32(child, "eeprom-length", &eeprom_len))
 			cd->eeprom_len = eeprom_len;
 
+		mdio = of_parse_phandle(child, "mii-bus", 0);
+		if (mdio) {
+			mdio_bus_switch = of_mdio_find_bus(mdio);
+			if (!mdio_bus_switch) {
+				ret = -EPROBE_DEFER;
+				goto out_free_chip;
+			}
+			cd->host_dev = &mdio_bus_switch->dev;
+		}
+
 		for_each_available_child_of_node(child, port) {
 			port_reg = of_get_property(port, "reg", NULL);
 			if (!port_reg)
-- 
2.5.0
