[PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Chris J Arges
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

For example, if node_possible_map is 0x30003, this patch will map node to
node_cnt as follows:
0,1,16,17 => 0,1,2,3

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 net/openvswitch/flow.c   | 10 ++
 net/openvswitch/flow_table.c | 18 +++---
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index bc7b0ab..425d45d 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -134,14 +134,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
struct ovs_flow_stats *ovs_stats,
unsigned long *used, __be16 *tcp_flags)
 {
-   int node;
+   int node, node_cnt = 0;
 
*used = 0;
*tcp_flags = 0;
memset(ovs_stats, 0, sizeof(*ovs_stats));
 
for_each_node(node) {
-   struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node]);
+   struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node_cnt]);
 
if (stats) {
/* Local CPU may write on non-local stats, so we must
@@ -155,16 +155,17 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
ovs_stats->n_bytes += stats->byte_count;
spin_unlock_bh(&stats->lock);
}
+   node_cnt++;
}
 }
 
 /* Called with ovs_mutex. */
 void ovs_flow_stats_clear(struct sw_flow *flow)
 {
-   int node;
+   int node, node_cnt = 0;
 
for_each_node(node) {
-   struct flow_stats *stats = ovsl_dereference(flow->stats[node]);
+   struct flow_stats *stats = ovsl_dereference(flow->stats[node_cnt]);
 
if (stats) {
spin_lock_bh(&stats->lock);
@@ -174,6 +175,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
stats->tcp_flags = 0;
spin_unlock_bh(&stats->lock);
}
+   node_cnt++;
}
 }
 
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4613df8..5d10c54 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -77,7 +77,7 @@ struct sw_flow *ovs_flow_alloc(void)
 {
struct sw_flow *flow;
struct flow_stats *stats;
-   int node;
+   int node, node_cnt = 0;
 
flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
if (!flow)
@@ -99,9 +99,11 @@ struct sw_flow *ovs_flow_alloc(void)
 
RCU_INIT_POINTER(flow->stats[0], stats);
 
-   for_each_node(node)
+   for_each_node(node) {
if (node != 0)
-   RCU_INIT_POINTER(flow->stats[node], NULL);
+   RCU_INIT_POINTER(flow->stats[node_cnt], NULL);
+   node_cnt++;
+   }
 
return flow;
 err:
@@ -139,15 +141,17 @@ static struct flex_array *alloc_buckets(unsigned int n_buckets)
 
 static void flow_free(struct sw_flow *flow)
 {
-   int node;
+   int node, node_cnt = 0;
 
if (ovs_identifier_is_key(&flow->id))
kfree(flow->id.unmasked_key);
kfree((struct sw_flow_actions __force *)flow->sf_acts);
-   for_each_node(node)
-   if (flow->stats[node])
+   for_each_node(node) {
+   if (flow->stats[node_cnt])
kmem_cache_free(flow_stats_cache,
-   (struct flow_stats __force *)flow->stats[node]);
+   (struct flow_stats __force *)flow->stats[node_cnt]);
+   node_cnt++;
+   }
kmem_cache_free(flow_cache, flow);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] ath10k: fixing wrong initialization of struct channel

2015-07-21 Thread Kalle Valo
Maninder Singh maninder...@samsung.com writes:

 chandef is initialized with NULL and on the very next line,
 we are using it to get channel, which is not correct.

 channel should be initialized after obtaining chandef.

 Signed-off-by: Maninder Singh maninder...@samsung.com

How did you find this bug?

 Static analysis tools such as Coverity or cppcheck report this bug:

 [drivers/net/wireless/ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Thanks. This is always good to add to the commit log so I did that:

ath10k: fix wrong initialization of struct channel

chandef is initialized with NULL and on the very next line, we are using it to
get channel, which is not correct. Channel should be initialized after
obtaining chandef.

Found by cppcheck:

[ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Signed-off-by: Maninder Singh maninder...@samsung.com
Signed-off-by: Kalle Valo kv...@qca.qualcomm.com


-- 
Kalle Valo


[PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring

2015-07-21 Thread Lucas Stach
So it gets freed when the device is going away.
This fixes a DMA memory leak on driver probe() fail and driver
remove().

Signed-off-by: Lucas Stach l.st...@pengutronix.de
---
v2: Fix indentation of second line to fix alignment with opening bracket.
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 349365d85b92..a7f1bdf718f8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3142,8 +3142,8 @@ static int fec_enet_init(struct net_device *ndev)
fep->bufdesc_size;
 
/* Allocate memory for buffer descriptors. */
-   cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
- GFP_KERNEL);
+   cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
+  GFP_KERNEL);
if (!cbd_base) {
return -ENOMEM;
}
-- 
2.1.4
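
The idea behind the managed allocation in this patch can be pictured with a small userspace sketch. This is an analogy only, not the kernel's devres implementation: toy_device, toy_managed_alloc() and toy_device_release() are hypothetical names, and plain calloc()/free() stand in for the DMA allocator. Each allocation is recorded on the device, so releasing the device frees everything without an explicit free in every error path.

```c
#include <stdlib.h>

/* One tracked allocation; hypothetical stand-in for a devres entry. */
struct managed_res {
	void *ptr;
	struct managed_res *next;
};

/* Hypothetical device object that owns its managed allocations. */
struct toy_device {
	struct managed_res *resources;	/* newest allocation first */
};

/* Allocate zeroed memory and record it on the device. */
static void *toy_managed_alloc(struct toy_device *dev, size_t size)
{
	struct managed_res *res = malloc(sizeof(*res));

	if (!res)
		return NULL;
	res->ptr = calloc(1, size);
	if (!res->ptr) {
		free(res);
		return NULL;
	}
	res->next = dev->resources;
	dev->resources = res;
	return res->ptr;
}

/* Free everything recorded on the device; returns the number released. */
static int toy_device_release(struct toy_device *dev)
{
	int freed = 0;

	while (dev->resources) {
		struct managed_res *res = dev->resources;

		dev->resources = res->next;
		free(res->ptr);
		free(res);
		freed++;
	}
	return freed;
}
```

With this shape, a failing probe path only has to release the device once; no allocation made through toy_managed_alloc() can be forgotten.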



[PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path

2015-07-21 Thread Lucas Stach
This function frees resources and cancels delayed work item that
have been initialized in fec_ptp_init().

Use this to do proper error handling if something goes wrong in
probe function after fec_ptp_init has been called.

Signed-off-by: Lucas Stach l.st...@pengutronix.de
---
 drivers/net/ethernet/freescale/fec.h  |  1 +
 drivers/net/ethernet/freescale/fec_main.c |  5 ++---
 drivers/net/ethernet/freescale/fec_ptp.c  | 10 ++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 1eee73cccdf5..99d33e2d35e6 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -562,6 +562,7 @@ struct fec_enet_private {
 };
 
 void fec_ptp_init(struct platform_device *pdev);
+void fec_ptp_stop(struct platform_device *pdev);
 void fec_ptp_start_cyclecounter(struct net_device *ndev);
 int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
 int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index a7f1bdf718f8..32e3807c650e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3494,6 +3494,7 @@ failed_register:
 failed_mii_init:
 failed_irq:
 failed_init:
+   fec_ptp_stop(pdev);
if (fep->reg_phy)
regulator_disable(fep->reg_phy);
 failed_regulator:
@@ -3515,14 +3516,12 @@ fec_drv_remove(struct platform_device *pdev)
struct net_device *ndev = platform_get_drvdata(pdev);
struct fec_enet_private *fep = netdev_priv(ndev);
 
-   cancel_delayed_work_sync(&fep->time_keep);
cancel_work_sync(&fep->tx_timeout_work);
+   fec_ptp_stop(pdev);
unregister_netdev(ndev);
fec_enet_mii_remove(fep);
if (fep->reg_phy)
regulator_disable(fep->reg_phy);
-   if (fep->ptp_clock)
-   ptp_clock_unregister(fep->ptp_clock);
of_node_put(fep->phy_node);
free_netdev(ndev);
 
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index a15663ad7f5e..f457a23d0bfb 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -604,6 +604,16 @@ void fec_ptp_init(struct platform_device *pdev)
schedule_delayed_work(&fep->time_keep, HZ);
 }
 
+void fec_ptp_stop(struct platform_device *pdev)
+{
+   struct net_device *ndev = platform_get_drvdata(pdev);
+   struct fec_enet_private *fep = netdev_priv(ndev);
+
+   cancel_delayed_work_sync(&fep->time_keep);
+   if (fep->ptp_clock)
+   ptp_clock_unregister(fep->ptp_clock);
+}
+
 /**
  * fec_ptp_check_pps_event
  * @fep: the fec_enet_private structure handle
-- 
2.1.4



Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Chris J Arges
On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote:
 On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
  Some architectures like POWER can have a NUMA node_possible_map that
  contains sparse entries. This causes memory corruption with openvswitch
  since it allocates flow_cache with a multiple of num_possible_nodes() and
 
 Couldn't this also be fixed by just allocating with a multiple of
 nr_node_ids (which seems to have been the original intent all along)?
 You could then make your stats array be sparse or not.
 

Yea originally this is what I did, but I thought it would be wasting memory.

  assumes the node variable returned by for_each_node will index into
  flow->stats[node].
  
  For example, if node_possible_map is 0x30003, this patch will map node to
  node_cnt as follows:
  0,1,16,17 => 0,1,2,3
  
  The crash was noticed after 3af229f2 was applied as it changed the
  node_possible_map to match node_online_map on boot.
  Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
 
 My concern with this version of the fix is that you're relying on,
 implicitly, the order of for_each_node's iteration corresponding to the
 entries in stats 1:1. But what about node hotplug? It seems better to
 have the enumeration of the stats array match the topology accurately,
 rather, or to maintain some sort of internal map in the OVS code between
 the NUMA node and the entry in the stats array?
 
 I'm willing to be convinced otherwise, though :)
 
 -Nish


Nish,

The method I described should work for hotplug since it uses the possible map,
which AFAIK is static, rather than the online map.

Regardless, the simpler solution to this issue would be to just
allocate nr_node_ids entries and use up the extra memory.

I'll send a v2 after testing it.

--chris
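
The trade-off discussed in this thread can be worked through in userspace for the 0x30003 example. This is a sketch under stated assumptions: possible_nodes(), node_ids() and dense_index() are hypothetical helpers that only mimic what num_possible_nodes(), nr_node_ids and the node_cnt counter compute, using a plain 64-bit mask instead of the kernel's nodemask type.

```c
#include <stdint.h>

/* Count of set bits: stand-in for num_possible_nodes(). */
static unsigned int possible_nodes(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Highest set bit plus one: stand-in for nr_node_ids. */
static unsigned int node_ids(uint64_t mask)
{
	unsigned int n = 0;

	for (; mask; mask >>= 1)
		n++;
	return n;
}

/*
 * Dense slot for a node id: the number of possible nodes with a smaller id,
 * or -1 if the node is not in the mask. This is the value node_cnt holds
 * when for_each_node reaches that node.
 */
static int dense_index(uint64_t mask, unsigned int node)
{
	if (node >= 64 || !(mask & (UINT64_C(1) << node)))
		return -1;
	return (int)possible_nodes(mask & ((UINT64_C(1) << node) - 1));
}
```

For mask 0x30003 the dense scheme needs 4 slots while indexing by node id needs 18, which is the memory cost weighed above against the simpler indexing.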


Re: [PATCH] netcp:Fix error handling in the function netcp_xgbe_serdes_config

2015-07-21 Thread Murali Karicheri

On 07/20/2015 11:54 AM, Nicholas Krause wrote:

This fixes error handling in the function netcp_xgbe_serdes_config
by storing the return value of netcp_xgbe_serdes_check_lane in
the variable ret and returning it to the caller, since that function
can fail with the error code -ETIMEDOUT.

Signed-off-by: Nicholas Krause xerofo...@gmail.com
---
  drivers/net/ethernet/ti/netcp_xgbepcsr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_xgbepcsr.c b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
index 33571ac..0c79e3d 100644
--- a/drivers/net/ethernet/ti/netcp_xgbepcsr.c
+++ b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
@@ -483,7 +483,7 @@ static int netcp_xgbe_serdes_config(void __iomem *serdes_regs,
return ret;

netcp_xgbe_serdes_enable_xgmii_port(sw_regs);
-   netcp_xgbe_serdes_check_lane(serdes_regs, sw_regs);
+   ret = netcp_xgbe_serdes_check_lane(serdes_regs, sw_regs);
return ret;
  }



Nicholas,

Thanks for the patch.

Acked-by: Murali Karicheri m-kariche...@ti.com

--
Murali Karicheri
Linux Kernel, Keystone
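
The effect of the one-line change can be seen in a reduced userspace sketch. The function names below are hypothetical stand-ins, not the driver code; check_lane() simply fails with -ETIMEDOUT when asked to.

```c
#include <errno.h>

/* Stand-in for the lane check: fails with -ETIMEDOUT when lane_ok is 0. */
static int check_lane(int lane_ok)
{
	return lane_ok ? 0 : -ETIMEDOUT;
}

/* Before the patch: the result is dropped, so the caller always sees 0. */
static int config_ignoring_result(int lane_ok)
{
	int ret = 0;

	check_lane(lane_ok);
	return ret;
}

/* After the patch: the result is stored in ret and returned to the caller. */
static int config_propagating_result(int lane_ok)
{
	int ret = 0;

	ret = check_lane(lane_ok);
	return ret;
}
```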


Re: [2/2] iwlegacy: convert hex_dump_to_buffer() to %*ph

2015-07-21 Thread Kalle Valo

 There is no need to use hex_dump_to_buffer() in cases like this:
 
   hex_dump_to_buffer(buf, len, 16, 1, outbuf, outlen, false);  /* len <= 16 */
   sprintf("%s\n", outbuf);
 
 since it may easily be converted to a simple:
 
   sprintf("%*ph\n", len, buf);
 
 Note: it seems in this case the output is grouped by 2 bytes, which looks like a
 typo. Thus, the patch changes that to a plain byte stream.
 
 Signed-off-by: Andy Shevchenko andriy.shevche...@linux.intel.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
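
For readers outside the kernel, %*ph is a printk extension that prints a small buffer as space-separated hex bytes. A userspace approximation might look like the sketch below; format_hex_bytes() is a hypothetical helper, since standard printf has no such specifier.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Write len bytes of buf into out as lowercase hex separated by single
 * spaces, e.g. "00 1f ab". Returns the string length, or -1 if out is
 * too small to hold the result.
 */
static int format_hex_bytes(char *out, size_t outlen,
			    const unsigned char *buf, size_t len)
{
	size_t pos = 0;
	size_t i;

	if (outlen == 0)
		return -1;
	out[0] = '\0';
	for (i = 0; i < len; i++) {
		int n = snprintf(out + pos, outlen - pos,
				 i ? " %02x" : "%02x", buf[i]);

		if (n < 0 || (size_t)n >= outlen - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```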


[PATCH] packet: Allow packets with only a header (but no payload)

2015-07-21 Thread Martin Blumenstingl
9c70776 added validation for the packet size in packet_snd. This change
enforced that every packet needs a long enough header and at least one
byte payload.

However, when trying to establish a PPPoE connection the following message
is printed every time a PPPoE discovery packet is sent:
pppd: packet size is too short (24 <= 24)

From what I can see in the PPPoE code the PADI discovery packet can
consist of only a header with no payload (when there is neither a service
name nor a Host-Uniq configured).

Signed-off-by: Martin Blumenstingl martin.blumensti...@googlemail.com
---
 net/packet/af_packet.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9e8741..d983f8f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2199,18 +2199,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
sock_wfree(skb);
 }
 
-static bool ll_header_truncated(const struct net_device *dev, int len)
-{
-   /* net device doesn't like empty head */
-   if (unlikely(len <= dev->hard_header_len)) {
-   net_warn_ratelimited("%s: packet size is too short (%d <= %d)\n",
-current->comm, len, dev->hard_header_len);
-   return true;
-   }
-
-   return false;
-}
-
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
void *frame, struct net_device *dev, int size_max,
__be16 proto, unsigned char *addr, int hlen)
@@ -2286,8 +2274,14 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
if (unlikely(err < 0))
return -EINVAL;
} else if (dev->hard_header_len) {
-   if (ll_header_truncated(dev, tp_len))
+   /* net device doesn't like empty head */
+   if (unlikely(len <= dev->hard_header_len)) {
+   net_warn_ratelimited("%s: packet size is too short "
+   "(%d <= %d)\n",
+   current->comm, len,
+   dev->hard_header_len);
return -EINVAL;
+   }
 
skb_push(skb, dev->hard_header_len);
err = skb_store_bits(skb, 0, data,
@@ -2624,8 +2618,13 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
if (unlikely(offset < 0))
goto out_free;
} else {
-   if (ll_header_truncated(dev, len))
+   if (unlikely(len < dev->hard_header_len)) {
+   net_warn_ratelimited("%s: packet size is shorter than "
+   "minimum header size (%d < %d)\n",
+   current->comm, len,
+   dev->hard_header_len);
goto out_free;
+   }
}
 
/* Returns -EFAULT on error */
-- 
2.4.6



Re: [PATCH] packet: Allow packets with only a header (but no payload)

2015-07-21 Thread Willem de Bruijn
On Tue, Jul 21, 2015 at 12:14 PM, Martin Blumenstingl
martin.blumensti...@googlemail.com wrote:
 9c70776 added validation for the packet size in packet_snd. This change
 enforced that every packet needs a long enough header and at least one
 byte payload.

 However, when trying to establish a PPPoE connection the following message
 is printed every time a PPPoE discovery packet is sent:
 pppd: packet size is too short (24 <= 24)

 From what I can see in the PPPoE code the PADI discovery packet can
 consist of only a header with no payload (when there is neither a service
 name nor a Host-Uniq configured).

Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
packet_snd to make the two paths equivalent. The existing check had the
ominous statement

/* net device doesn't like empty head */

so allowing a header-only packet while correct in your case may not be
safe in some edge cases (specific device drivers?).

This was also discussed previously

  http://www.spinics.net/lists/netdev/msg309677.html

In any case, I don't think that reverting the patch and restoring the old
inconsistent state is a fix.


Re: rtlwifi: rtl8821ae: Fix an expression that is always false

2015-07-21 Thread Kalle Valo

 In routine _rtl8821ae_set_media_status(), an incorrect mask results in a test
 for AP status to always be false. Similar bugs were fixed in rtl8192cu and
 rtl8192de, but this instance was missed at that time.
 
 Reported-by: David Binderman dcb...@hotmail.com
 Signed-off-by: Larry Finger larry.fin...@lwfinger.net
 Cc: Stable sta...@vger.kernel.org [3.18+]
 Cc: David Binderman dcb...@hotmail.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo


Re: [PATCH V2 net-next 0/3] ARM BPF JIT features

2015-07-21 Thread Alexei Starovoitov

On 7/21/15 5:16 AM, Nicolas Schichan wrote:

This series adds support for more instructions to the ARM BPF JIT
namely skb netdevice type retrieval, skb payload offset retrieval, and
skb packet type retrieval.

This allows 35 tests to use the JIT instead of 29 before.

This series depends on the BPF JIT fixes for ARM series sent earlier.


Actually in these patches I don't see a strong dependency on 'net' set,
but since you're saying there is, you'd need to resubmit this set after
your 'net' set is merged, whole 'net' sent to Linus and merged
into net-next.


Re: [PATCH] packet: Allow packets with only a header (but no payload)

2015-07-21 Thread Martin Blumenstingl
Hi Willem,

On Tue, Jul 21, 2015 at 6:28 PM, Willem de Bruijn will...@google.com wrote:
 Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
 packet_snd to make the two paths equivalent. The existing check had the
 ominous statement

 /* net device doesn't like empty head */
OK, I guess it's best to find out what the purpose of this comment is.

 so allowing a header-only packet while correct in your case may not be
 safe in some edge cases (specific device drivers?).
I'm wondering what a good fix would look like (I can think of a few
things, like renaming hard_header_len to something like min_packet_size)?
I am open to suggestions since I have zero knowledge about the inner
workings of the packet framework.

 This was also discussed previously

   http://www.spinics.net/lists/netdev/msg309677.html

 In any case, I don't think that reverting the patch and restoring the old
 inconsistent state is a fix.
I totally agree with you that it's a bad fix if this means that we
could break other drivers.
My primary goal was to fix PPPoE connections - I guess I should have
simply added RFC to the subject.


Regards,
Martin


Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
 Some architectures like POWER can have a NUMA node_possible_map that
 contains sparse entries. This causes memory corruption with openvswitch
 since it allocates flow_cache with a multiple of num_possible_nodes() and

Couldn't this also be fixed by just allocating with a multiple of
nr_node_ids (which seems to have been the original intent all along)?
You could then make your stats array be sparse or not.

 assumes the node variable returned by for_each_node will index into
 flow->stats[node].
 
 For example, if node_possible_map is 0x30003, this patch will map node to
 node_cnt as follows:
 0,1,16,17 => 0,1,2,3
 
 The crash was noticed after 3af229f2 was applied as it changed the
 node_possible_map to match node_online_map on boot.
 Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

My concern with this version of the fix is that you're relying on,
implicitly, the order of for_each_node's iteration corresponding to the
entries in stats 1:1. But what about node hotplug? It seems better to
have the enumeration of the stats array match the topology accurately,
rather, or to maintain some sort of internal map in the OVS code between
the NUMA node and the entry in the stats array?

I'm willing to be convinced otherwise, though :)

-Nish


[net-next:master 187/208] net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces)

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: e3e4712ec0961ed586a8db340bd994c4ad7f5dba [187/208] mpls: ip tunnel support
reproduce:
  # apt-get install sparse
  git checkout e3e4712ec0961ed586a8db340bd994c4ad7f5dba
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces)

vim +73 net/mpls/mpls_iptunnel.c

57  /* Obtain the ttl */
58  if (skb->protocol == htons(ETH_P_IP)) {
59  ttl = ip_hdr(skb)->ttl;
60  rt = (struct rtable *)dst;
61  lwtstate = rt->rt_lwtstate;
62  } else if (skb->protocol == htons(ETH_P_IPV6)) {
63  ttl = ipv6_hdr(skb)->hop_limit;
64  rt6 = (struct rt6_info *)dst;
65  lwtstate = rt6->rt6i_lwtstate;
66  } else {
67  goto drop;
68  }
69  
70  skb_orphan(skb);
71  
72  /* Find the output device */
 > 73  out_dev = rcu_dereference(dst->dev);
74  if (!mpls_output_possible(out_dev) ||
75  !lwtstate || skb_warn_if_lro(skb))
76  goto drop;
77  
78  skb_forward_csum(skb);
79  
80  tun_encap_info = mpls_lwtunnel_encap(lwtstate);
81  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-21 Thread YOSHIFUJI Hideaki
Hi,

Martin KaFai Lau wrote:
 The patch checks neigh->nud_state before acquiring the writer lock.
 Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

Theoretically, you have to take some lock when accessing
neigh->nud_state.

 
 I also take this chance to re-arrange the code.

No, please do not mix multiple changes.

 
 40 udpflood processes and a /64 gateway route are used.
 The gateway has NUD_PERMANENT.  Each of them is run for 30s.
 At the end, the total number of finished sendto():
 
 Before    After
 55M       95M
 
 Signed-off-by: Martin KaFai Lau ka...@fb.com
 Cc: Hannes Frederic Sowa han...@stressinduktion.org
 ---
  net/ipv6/route.c | 41 -
  1 file changed, 20 insertions(+), 21 deletions(-)
 
 diff --git a/net/ipv6/route.c b/net/ipv6/route.c
 index 6090969..a6c6b5a 100644
 --- a/net/ipv6/route.c
 +++ b/net/ipv6/route.c
 @@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w)
  
  static void rt6_probe(struct rt6_info *rt)
  {
 + struct __rt6_probe_work *work;
   struct neighbour *neigh;
   /*
* Okay, this does not seem to be appropriate
 @@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt)
   rcu_read_lock_bh();
   neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
   if (neigh) {
 - write_lock(&neigh->lock);
   if (neigh->nud_state & NUD_VALID)
   goto out;
 - }
 -
 - if (!neigh ||
 - time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
 - struct __rt6_probe_work *work;
  
 + work = NULL;
 + write_lock(&neigh->lock);
 + if (!(neigh->nud_state & NUD_VALID) &&
 + time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
 + work = kmalloc(sizeof(*work), GFP_ATOMIC);
 + if (work) {
 + __neigh_set_probe_once(neigh);
 + }
 + }
 + write_unlock(&neigh->lock);
 + } else {
   work = kmalloc(sizeof(*work), GFP_ATOMIC);
 + }
  
 - if (neigh && work)
 - __neigh_set_probe_once(neigh);
 -
 - if (neigh)
 - write_unlock(&neigh->lock);
 + if (work) {
 + INIT_WORK(&work->work, rt6_probe_deferred);
 + work->target = rt->rt6i_gateway;
 + dev_hold(rt->dst.dev);
 + work->dev = rt->dst.dev;
 + schedule_work(&work->work);
 + }
  
 - if (work) {
 - INIT_WORK(&work->work, rt6_probe_deferred);
 - work->target = rt->rt6i_gateway;
 - dev_hold(rt->dst.dev);
 - work->dev = rt->dst.dev;
 - schedule_work(&work->work);
 - }
 - } else {
  out:
 - write_unlock(&neigh->lock);
 - }
   rcu_read_unlock_bh();
  }
  #else
 

-- 
Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
Technical Division, MIRACLE LINUX CORPORATION


Re: [RFC PATCH net-next] ebpf: Allow dereferences of PTR_TO_STACK registers

2015-07-21 Thread Alexei Starovoitov
On Tue, Jul 21, 2015 at 07:00:40PM -0700, Alex Gartrell wrote:
 mov %rsp, %r1   ; r1 = rsp
 add $-8, %r1; r1 = rsp - 8
 store_q $123, -8(%rsp)  ; *(u64*)r1 = 123  - valid
 store_q $123, (%r1) ; *(u64*)r1 = 123  - previously invalid
 mov $0, %r0
 exit; Always need to exit

Is this your new eBPF assembler syntax? :)
imo gnu style looks ugly... ;)

It's great to see such in-depth understanding of verifier!!

 And we'd get the following error:
 
   0: (bf) r1 = r10
   1: (07) r1 += -8
   2: (7a) *(u64 *)(r10 -8) = 999
   3: (7a) *(u64 *)(r1 +0) = 999
   R1 invalid mem access 'fp'
 
   Unable to load program
 
 We already know that a register is a stack address and the appropriate
 offset, so we should be able to validate those references as well.

yes, we can teach verifier to do that.
Though llvm doesn't generate such code. It's small enough change.

 Signed-off-by: Alex Gartrell agartr...@fb.com
 ---
  kernel/bpf/verifier.c | 9 +
  1 file changed, 9 insertions(+)
 
 diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
 index 039d866..5dfbece 100644
 --- a/kernel/bpf/verifier.c
 +++ b/kernel/bpf/verifier.c
 @@ -676,6 +676,15 @@ static int check_mem_access(struct verifier_env *env, 
 u32 regno, int off,
   err = check_stack_write(state, off, size, value_regno);
   else
   err = check_stack_read(state, off, size, value_regno);
 + } else if (state->regs[regno].type == PTR_TO_STACK) {
 + int real_off = state->regs[regno].imm + off;

real_off is missing alignment and bounds checks.
Something like:
if (state->regs[regno].type == PTR_TO_STACK)
	off += state->regs[regno].imm;
if (off % size != 0)
	...
else if (state->regs[regno].type == FRAME_PTR || == PTR_TO_STACK)
	.. as-is here ...

would fix it.

Please add a few accept and reject tests for this to test_verifier.c as well.
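
The alignment and bounds checks asked for above can be sketched in user space as follows. This is only an illustration, not the verifier's code: the helper name stack_access_ok() is hypothetical, and the 512-byte limit is the kernel's MAX_BPF_STACK. The offset already folded into the PTR_TO_STACK register is added to the instruction offset first, and the sum is then checked exactly as a frame-pointer access would be.

```c
#include <stdbool.h>

#define MAX_BPF_STACK 512	/* eBPF stack size in bytes */

/*
 * Hypothetical helper. 'imm' is the constant already added to the stack
 * pointer register (0 for the frame pointer itself), 'off' is the offset
 * encoded in the load/store instruction, 'size' is the access width.
 */
static bool stack_access_ok(int imm, int off, int size)
{
	int real_off = imm + off;

	if (size <= 0)
		return false;
	/* Alignment: the effective offset must be a multiple of the width. */
	if (real_off % size != 0)
		return false;
	/* Bounds: the stack grows down from the frame pointer. */
	if (real_off >= 0 || real_off < -MAX_BPF_STACK)
		return false;
	return true;
}
```

With r1 = r10 - 8, the store *(u64 *)(r1 + 0) corresponds to stack_access_ok(-8, 0, 8) and is accepted, while *(u64 *)(r1 + 4) is rejected as misaligned.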



[PATCH] mac80211_hwsim: unregister genetlink family properly

2015-07-21 Thread Su Kang Yin
In hwsim_init_netlink(), call genl_unregister_family() if
netlink_register_notifier() fails, since the genetlink family is
already registered at that point.

Signed-off-by: Su Kang Yin cant...@cantona.net
---
 drivers/net/wireless/mac80211_hwsim.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mac80211_hwsim.c 
b/drivers/net/wireless/mac80211_hwsim.c
index 99e873d..16d953e 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -3120,8 +3120,10 @@ static int hwsim_init_netlink(void)
goto failure;
 
rc = netlink_register_notifier(&hwsim_netlink_notifier);
-   if (rc)
+   if (rc) {
+   genl_unregister_family(&hwsim_genl_family);
goto failure;
+   }
 
return 0;
 
-- 
1.9.1
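
The unwinding pattern this patch applies can be sketched outside the kernel as below. All names are hypothetical stand-ins (the two flags model the genetlink family and the netlink notifier); the only point is that a failure in the second registration must undo the first before returning.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the genetlink family and netlink notifier. */
static bool family_registered;
static bool notifier_registered;

static int register_family(void)
{
	family_registered = true;
	return 0;
}

static void unregister_family(void)
{
	family_registered = false;
}

static int register_notifier(bool fail)
{
	if (fail)
		return -1;
	notifier_registered = true;
	return 0;
}

/* Two-step init: undo step one when step two fails. */
static int init_netlink_sketch(bool notifier_fails)
{
	int rc;

	rc = register_family();
	if (rc)
		return rc;

	rc = register_notifier(notifier_fails);
	if (rc) {
		unregister_family();
		return rc;
	}
	return 0;
}
```

After init_netlink_sketch(true) returns an error, family_registered is false again, which is the state the patch restores for the real driver.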



[PATCH nf] netfilter: Support expectations in different zones

2015-07-21 Thread Joe Stringer
When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.

Fixes: 5d0aa2ccd4 ("netfilter: nf_conntrack: add support for conntrack zones")

Signed-off-by: Joe Stringer joestrin...@nicira.com
---
 net/netfilter/nf_conntrack_expect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_expect.c 
b/net/netfilter/nf_conntrack_expect.c
index 7a17070..b45a422 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -219,7 +219,8 @@ static inline int expect_clash(const struct 
nf_conntrack_expect *a,
a->mask.src.u3.all[count] & b->mask.src.u3.all[count];
}
 
-   return nf_ct_tuple_mask_cmp(&a->tuple, &b->tuple, &intersect_mask);
+   return nf_ct_tuple_mask_cmp(&a->tuple, &b->tuple, &intersect_mask) &&
+  nf_ct_zone(a->master) == nf_ct_zone(b->master);
 }
 
 static inline int expect_matches(const struct nf_conntrack_expect *a,
-- 
2.1.4
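
A reduced model of the clash test may help show the effect of the change. The struct below is a hypothetical simplification (one tuple word, one mask word and a zone id), not the real nf_conntrack_expect layout: two expectations clash only if their tuples agree on the bits both masks cover and they belong to the same zone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an expectation. */
struct expect_sketch {
	uint32_t tuple;	/* e.g. an IPv4 address */
	uint32_t mask;	/* bits of 'tuple' that matter */
	uint16_t zone;	/* zone of the master connection */
};

static bool expect_clash_sketch(const struct expect_sketch *a,
				const struct expect_sketch *b)
{
	/* Compare only the bits that both expectations care about. */
	uint32_t intersect_mask = a->mask & b->mask;

	return ((a->tuple ^ b->tuple) & intersect_mask) == 0 &&
	       a->zone == b->zone;
}
```

Two entries with the same tuple and mask but zones 1 and 2 no longer clash in this model, so both can be inserted.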



[net-next:master 200/208] drivers/net/vxlan.c:1739:21: sparse: incorrect type in assignment (different base types)

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: 614732eaa12dd462c0ab274700bed14f36afea5e [200/208] openvswitch: Use 
regular VXLAN net_device device
reproduce:
  # apt-get install sparse
  git checkout 614732eaa12dd462c0ab274700bed14f36afea5e
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   include/net/checksum.h:166:35: sparse: incorrect type in argument 1 (different base types)
   include/net/checksum.h:166:35:    expected restricted __wsum [usertype] csum
   include/net/checksum.h:166:35:    got restricted __sum16
   include/net/checksum.h:166:43: sparse: incorrect type in argument 2 (different base types)
   include/net/checksum.h:166:43:    expected restricted __wsum [usertype] addend
   include/net/checksum.h:166:43:    got restricted __sum16 [usertype] <noident>
   include/net/checksum.h:174:43: sparse: incorrect type in argument 2 (different base types)
   include/net/checksum.h:174:43:    expected restricted __wsum [usertype] addend
   include/net/checksum.h:174:43:    got restricted __sum16 [usertype] <noident>
   include/net/checksum.h:166:35: sparse: incorrect type in argument 1 (different base types)
   include/net/checksum.h:166:35:    expected restricted __wsum [usertype] csum
   include/net/checksum.h:166:35:    got restricted __sum16
   include/net/checksum.h:166:43: sparse: incorrect type in argument 2 (different base types)
   include/net/checksum.h:166:43:    expected restricted __wsum [usertype] addend
   include/net/checksum.h:166:43:    got restricted __sum16 [usertype] <noident>
>> drivers/net/vxlan.c:1739:21: sparse: incorrect type in assignment (different base types)
   drivers/net/vxlan.c:1739:21:    expected restricted __be32 [usertype] vx_vni
   drivers/net/vxlan.c:1739:21:    got unsigned int [unsigned] [usertype] vni
   drivers/net/vxlan.c:1818:21: sparse: incorrect type in assignment (different base types)
   drivers/net/vxlan.c:1818:21:    expected restricted __be32 [usertype] vx_vni
   drivers/net/vxlan.c:1818:21:    got unsigned int [unsigned] [usertype] vni
>> drivers/net/vxlan.c:2014:58: sparse: incorrect type in argument 11 (different base types)
   drivers/net/vxlan.c:2014:58:    expected unsigned int [unsigned] [usertype] vni
   drivers/net/vxlan.c:2014:58:    got restricted __be32 [usertype] <noident>
   drivers/net/vxlan.c:2072:67: sparse: incorrect type in argument 11 (different base types)
   drivers/net/vxlan.c:2072:67:    expected unsigned int [unsigned] [usertype] vni
   drivers/net/vxlan.c:2072:67:    got restricted __be32 [usertype] <noident>

vim +1739 drivers/net/vxlan.c

  1723  }
  1724  
  1725  skb = vlan_hwaccel_push_inside(skb);
  1726  if (WARN_ON(!skb)) {
  1727  err = -ENOMEM;
  1728  goto err;
  1729  }
  1730  
  1731  skb = iptunnel_handle_offloads(skb, udp_sum, type);
  1732  if (IS_ERR(skb)) {
  1733  err = -EINVAL;
  1734  goto err;
  1735  }
  1736  
  1737  vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
  1738  vxh->vx_flags = htonl(VXLAN_HF_VNI);
> 1739  vxh->vx_vni = vni;
  1740  
  1741  if (type & SKB_GSO_TUNNEL_REMCSUM) {
  1742  u32 data = (skb_checksum_start_offset(skb) - hdrlen) >>
  1743 VXLAN_RCO_SHIFT;
  1744  
  1745  if (skb->csum_offset == offsetof(struct udphdr, check))
  1746  data |= VXLAN_RCO_UDP;
  1747  
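
For readers unfamiliar with the type mismatch sparse flags at line 1739: vx_vni is a big-endian 32-bit field whose upper three bytes carry the 24-bit VNI, while a plain unsigned int carries no byte-order information. The user-space sketch below shows the conversion between the two representations. The helper names are hypothetical, and the report itself does not say whether the eventual fix is a conversion or only a change of type annotation.

```c
#include <arpa/inet.h>
#include <stdint.h>

/*
 * Hypothetical helpers. The 24-bit VNI sits in the upper three bytes of
 * a 32-bit big-endian word; the low byte is reserved and left as zero.
 */
static uint32_t vni_to_wire(uint32_t vni)
{
	return htonl(vni << 8);
}

static uint32_t vni_from_wire(uint32_t vx_vni)
{
	return ntohl(vx_vni) >> 8;
}
```

Annotating the wire form as __be32 in kernel code lets sparse catch any place where the two forms are mixed without such a conversion.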

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


Re: [PATCH] cgroup: net_cls: fix false-positive suspicious RCU usage

2015-07-21 Thread David Miller
From: Konstantin Khlebnikov khlebni...@yandex-team.ru
Date: Tue, 21 Jul 2015 19:46:29 +0300

 @@ -23,7 +23,8 @@ static inline struct cgroup_cls_state *css_cls_state(struct 
 cgroup_subsys_state
  
  struct cgroup_cls_state *task_cls_state(struct task_struct *p)
  {
 - return css_cls_state(task_css(p, net_cls_cgrp_id));
 +   return css_cls_state(task_css_check(p, net_cls_cgrp_id,
 +rcu_read_lock_bh_held()));

You've made a serious mess of the indentation here.

First of all, you've changed the correct plain TAB before the 'return' line
into a TAB and two SPACE characters.

Secondly, the second line needs to be precisely indented to the exact column
following the opening parenthesis of the task_css_check() call on the
previous line.


Re: [PATCH] net: phy: dp83867: Fix warning check for setting the internal delay

2015-07-21 Thread David Miller
From: Dan Murphy dmur...@ti.com
Date: Tue, 21 Jul 2015 12:06:45 -0500

 Fix warning: logical ‘or’ of collectively exhaustive tests is always true
 
 Change the internal delay check from an 'or' condition to an 'and'
 condition.
 
 Reported-by: David Binderman dcb...@hotmail.com
 Signed-off-by: Dan Murphy dmur...@ti.com

Applied, thanks.


Re: [PATCH net v2] macvtap: fix network header pointer for VLAN tagged pkts

2015-07-21 Thread Toshiaki Makita
On 15/07/21 (火) 16:18, Ivan Vecera wrote:
 Network header is set with offset ETH_HLEN but it is not true for VLAN
 (multiple-)tagged and results in checksum issues in lower devices.
 
 v2: leave skb->protocol untouched (thx Vlad), comment added
 
 Signed-off-by: Ivan Vecera ivec...@redhat.com
 ---
   drivers/net/macvtap.c | 7 +++
   1 file changed, 7 insertions(+)
 
 diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
 index 3b933bb..b75776b 100644
 --- a/drivers/net/macvtap.c
 +++ b/drivers/net/macvtap.c
 @@ -796,6 +796,13 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, 
 struct msghdr *m,
   skb_reset_mac_header(skb);
   skb->protocol = eth_hdr(skb)->h_proto;
   
 + /* Move network header to the right position for VLAN tagged packets */
 + if (skb_vlan_tagged(skb)) {

I guess you don't need the condition skb_vlan_tag_present(skb), i.e.,

if (skb->protocol == htons(ETH_P_8021Q) ||
skb->protocol == htons(ETH_P_8021AD))

 + int depth;
 + __vlan_get_protocol(skb, skb->protocol, &depth);

__vlan_get_protocol() can fail, and then, depth will not be initialized.

 + skb_set_network_header(skb, depth);

I think you should set network_header after
skb_probe_transport_header(). It calls skb_flow_dissect_flow_keys(),
which seems to expect network_header to be ETH_HLEN.

Toshiaki Makita


Re: [v2] brcmsmac: Use kstrdup to simplify code

2015-07-21 Thread Kalle Valo

 Replace a kmalloc+strcpy by an equivalent kstrdup in order to improve
 readability.
 
 Signed-off-by: Christophe JAILLET christophe.jail...@wanadoo.fr
 Acked-by: Arend van Spriel ar...@broadcom.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo


[PATCH net-next] mpls: make RTA_OIF optional

2015-07-21 Thread Roopa Prabhu
From: Roopa Prabhu ro...@cumulusnetworks.com

If the user did not specify an oif, try to derive it from the via address.
If no device can be found, return -ENODEV.

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/mpls/af_mpls.c |   67 +++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1f93a59..4cd3789 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -15,6 +15,7 @@
 #include <net/ip_fib.h>
 #include <net/netevent.h>
 #include <net/netns/generic.h>
+#include <net/ip6_route.h>
 #include "internal.h"
 
 #define LABEL_NOT_SPECIFIED (1<<20)
@@ -327,6 +328,70 @@ static unsigned find_free_label(struct net *net)
return LABEL_NOT_SPECIFIED;
 }
 
+static struct net_device *inet_fib_lookup_dev(struct net *net, void *addr)
+{
+   struct net_device *dev = NULL;
+   struct rtable *rt;
+   struct in_addr daddr;
+
+   memcpy(&daddr, addr, sizeof(struct in_addr));
+   rt = ip_route_output(net, daddr.s_addr, 0, 0, 0);
+   if (IS_ERR(rt))
+   goto errout;
+
+   dev = rt->dst.dev;
+   dev_hold(dev);
+
+   ip_rt_put(rt);
+
+errout:
+   return dev;
+}
+
+static struct net_device *inet6_fib_lookup_dev(struct net *net, void *addr)
+{
+   struct net_device *dev = NULL;
+   struct dst_entry *dst;
+   struct flowi6 fl6;
+
+   memset(&fl6, 0, sizeof(fl6));
+   memcpy(&fl6.daddr, addr, sizeof(struct in6_addr));
+   dst = ip6_route_output(net, NULL, &fl6);
+   if (dst->error)
+   goto errout;
+
+   dev = dst->dev;
+   dev_hold(dev);
+
+errout:
+   dst_release(dst);
+
+   return dev;
+}
+
+static struct net_device *find_outdev(struct net *net,
+ struct mpls_route_config *cfg)
+{
+   struct net_device *dev = NULL;
+
+   if (!cfg->rc_ifindex) {
+   switch (cfg->rc_via_table) {
+   case NEIGH_ARP_TABLE:
+   dev = inet_fib_lookup_dev(net, cfg->rc_via);
+   break;
+   case NEIGH_ND_TABLE:
+   dev = inet6_fib_lookup_dev(net, cfg->rc_via);
+   break;
+   case NEIGH_LINK_TABLE:
+   break;
+   }
+   } else {
+   dev = dev_get_by_index(net, cfg->rc_ifindex);
+   }
+
+   return dev;
+}
+
 static int mpls_route_add(struct mpls_route_config *cfg)
 {
struct mpls_route __rcu **platform_label;
@@ -358,7 +423,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
goto errout;
 
err = -ENODEV;
-   dev = dev_get_by_index(net, cfg->rc_ifindex);
+   dev = find_outdev(net, cfg);
if (!dev)
goto errout;
 
-- 
1.7.10.4



Re: [PATCH] packet: Allow packets with only a header (but no payload)

2015-07-21 Thread Willem de Bruijn
On Tue, Jul 21, 2015 at 12:38 PM, Martin Blumenstingl
martin.blumensti...@googlemail.com wrote:
 Hi Willem,

 On Tue, Jul 21, 2015 at 6:28 PM, Willem de Bruijn will...@google.com wrote:
 Interesting. 9c7077622dd9 only extended the check from tpacket_snd to
 packet_snd to make the two paths equivalent. The existing check had the
 ominous statement

 /* net device doesn't like empty head */
 OK, I guess it's best to find out what the purpose of this comment is.

 so allowing a header-only packet while correct in your case may not be
 safe in some edge cases (specific device drivers?).
 I'm wondering how a good fix would look like (I can think of a few
 things, like renaming hard_header_len to something min_packet_size)?
 I am open for suggestions since I have zero knowledge about the inner
 workings of the packet framework.

I don't see a simple way of verifying the safety of allowing packets
without data short of a code audit, which would be huge, especially
when taking device driver logic into account. Perhaps someone
remembers why that statement was added and what edge case(s)
it refers to. I'm afraid that I don't. It was added in 69e3c75f4d54. I
added the author to this thread.

 This was also discussed previously

   http://www.spinics.net/lists/netdev/msg309677.html

 In any case, I don't think that reverting the patch and restoring the old
 inconsistent state is a fix.
 I totally agree with you that it's a bad fix if this means that we
 could break other drivers.
 My primary goal was to fix PPPoE connections - I guess I should have
 simply added RFC to the subject.


 Regards,
 Martin


[PATCH] net: phy: dp83867: Fix warning check for setting the internal delay

2015-07-21 Thread Dan Murphy
Fix warning: logical ‘or’ of collectively exhaustive tests is always true

Change the internal delay check from an 'or' condition to an 'and'
condition.

Reported-by: David Binderman dcb...@hotmail.com
Signed-off-by: Dan Murphy dmur...@ti.com
---
 drivers/net/phy/dp83867.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index c7a12e2..8a3bf54 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -164,7 +164,7 @@ static int dp83867_config_init(struct phy_device *phydev)
return ret;
}
 
-   if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) ||
+   if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
(phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
val = phy_read_mmd_indirect(phydev, DP83867_RGMIICTL,
DP83867_DEVADDR, phydev->addr);
-- 
1.9.1
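
Why the compiler calls the original test "collectively exhaustive" can be seen from a small sketch. The function names are hypothetical and the bounds are passed in as plain integers; only their ordering (lo <= hi) matters. With 'or', every value satisfies at least one of the two comparisons, so the range check never excludes anything.

```c
#include <stdbool.h>

/* Buggy form: true for every v whenever lo <= hi. */
static bool in_range_or(int v, int lo, int hi)
{
	return v >= lo || v <= hi;
}

/* Intended form: true only for lo <= v <= hi. */
static bool in_range_and(int v, int lo, int hi)
{
	return v >= lo && v <= hi;
}
```

For bounds 8 and 9, in_range_or() accepts 0 and 100 as well as 8, whereas in_range_and() accepts only 8 and 9.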



Re: [PATCH] net: phy: dp83867: Fix warning check for setting the internal delay

2015-07-21 Thread Florian Fainelli
On 21/07/15 10:06, Dan Murphy wrote:
 Fix warning: logical ‘or’ of collectively exhaustive tests is always true
 
 Change the internal delay check from an 'or' condition to an 'and'
 condition.
 
 Reported-by: David Binderman dcb...@hotmail.com
 Signed-off-by: Dan Murphy dmur...@ti.com

Acked-by: Florian Fainelli f.faine...@gmail.com

 ---
  drivers/net/phy/dp83867.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
 index c7a12e2..8a3bf54 100644
 --- a/drivers/net/phy/dp83867.c
 +++ b/drivers/net/phy/dp83867.c
 @@ -164,7 +164,7 @@ static int dp83867_config_init(struct phy_device *phydev)
   return ret;
   }
  
 - if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) ||
 + if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
   (phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
   val = phy_read_mmd_indirect(phydev, DP83867_RGMIICTL,
   DP83867_DEVADDR, phydev->addr);
 


-- 
Florian


Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Cong Wang
On Tue, Jul 21, 2015 at 3:52 AM, Eric Dumazet eric.duma...@gmail.com wrote:
 On Tue, 2015-07-21 at 06:04 -0400, Jamal Hadi Salim wrote:

 It is worrisome to fix the core code for this. The root cause seems to
 be codel. Dont have time but in general, reset would be something like:

 struct fq_codel_sched_data *q = qdisc_priv(sch);
 qdisc_reset(q)

 This only works for very simple qdisc with one queue.


 or something along those lines...
 But certainly dequeue semantics dont seem right there..

 Well, reset() is trivial to implement like this

 while (skb = local_dequeue(sch)) {
 kfree_skb(skb);
 }

 And I guess I copy/pasted sfq code here, because I was lazy.

 But yes, qdisc_tree_decrease_qlen() would have to be not called.


Hmm, so the semantics are that each qdisc resets its own qlen
and calls qdisc_reset() to reset its leaf qdiscs. That makes sense
to me.


 It seems I coded fq_reset() differently.

 Alex, please try instead :

 diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
 index 21ca33c9f036..3f0320ab6029 100644
 --- a/net/sched/sch_fq_codel.c
 +++ b/net/sched/sch_fq_codel.c
 @@ -288,10 +288,21 @@ begin:

  static void fq_codel_reset(struct Qdisc *sch)
  {
 -   struct sk_buff *skb;
 +   struct fq_codel_sched_data *q = qdisc_priv(sch);
 +   int i;

 -   while ((skb = fq_codel_dequeue(sch)) != NULL)
 -   kfree_skb(skb);
 +   INIT_LIST_HEAD(&q->new_flows);
 +   INIT_LIST_HEAD(&q->old_flows);
 +   for (i = 0; i < q->flows_cnt; i++) {
 +   struct fq_codel_flow *flow = q->flows + i;
 +
 +   while (flow->head)
 +   kfree_skb(dequeue_head(flow));
 +
 +   INIT_LIST_HEAD(&flow->flowchain);


You probably need to call codel_vars_init(&flow->cvars) as well.

 +   }
 +   memset(q->backlogs, 0, q->flows_cnt * sizeof(u32));
 +   sch->q.qlen = 0;
  }

  static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {




Thanks.


Re: [PATCH v2 net] inet: frags: fix defragmented packet's IP header for af_packet

2015-07-21 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Tue, 21 Jul 2015 09:43:59 +0200

 From: Edward Hyunkoo Jee ed...@google.com
 
 When ip_frag_queue() computes positions, it assumes that the passed
 sk_buff does not contain L2 headers.
 
 However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
 functions can be called on outgoing packets that contain L2 headers. 
 
 Also, IPv4 checksum is not corrected after reassembly.
 
 Fixes: 7736d33f4262 (packet: Add pre-defragmentation support for ipv4 
 fanouts.)
 Signed-off-by: Edward Hyunkoo Jee ed...@google.com
 Signed-off-by: Eric Dumazet eduma...@google.com

Applied and queued up for -stable, thanks.


Re: [PATCH net-next 14/22] vxlan: Flow based tunneling

2015-07-21 Thread Thomas Graf
On 07/21/15 at 10:30am, Alexei Starovoitov wrote:
 RX:
 +info->mode = IP_TUNNEL_INFO_RX;
 +info->key.tun_flags = TUNNEL_KEY;
 +info->key.tun_id = cpu_to_be64(vni >> 8);
 ...
 TX:
 +dst_port = info->key.tp_dst ? : vxlan->dst_port;
 +vni = be64_to_cpu(info->key.tun_id);
 
 I think the copy paste of ovs_tunnel_info into ip_tunnel_info
 can be improved. In particular instead of '__be64 tun_id'
 we can use '__u64 tun_id' which will avoid extra byteswaps for rx/tx
 paths.
 
 netlink for this part also seems inconsistent.
 In the patch 16:
 +static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = {
 + [IP_TUN_ID] = { .type = NLA_U64 },
 ...
 + if (tb[IP_TUN_ID])
 + tun_info->key.tun_id = nla_get_u64(tb[IP_TUN_ID]);
 
 I think nla_get_be64 should be there?
 and with my suggestion we can add be64_to_cpu() here instead
 of doing it per packet.
 Thoughts?

I like this. The be64 originates from how OVS stores the tun_id in the
flow key. I agree that it makes sense to limit and delay the byteswaps
to when OVS inherits the flow key from the ip_tunnel_info. I will send
a follow-up.


[PATCH] cgroup: net_cls: fix false-positive suspicious RCU usage

2015-07-21 Thread Konstantin Khlebnikov
In dev_queue_xmit(), net_cls is protected by rcu-bh.

[  270.730026] ===
[  270.730029] [ INFO: suspicious RCU usage. ]
[  270.730033] 4.2.0-rc3+ #2 Not tainted
[  270.730036] ---
[  270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() 
usage!
[  270.730041] other info that might help us debug this:
[  270.730043] rcu_scheduler_active = 1, debug_locks = 1
[  270.730045] 2 locks held by dhclient/748:
[  270.730046]  #0:  (rcu_read_lock_bh){..}, at: [81682b70] 
__dev_queue_xmit+0x50/0x960
[  270.730085]  #1:  (qdisc_tx_lock){+.}, at: [81682d60] 
__dev_queue_xmit+0x240/0x960
[  270.730090] stack backtrace:
[  270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
[  270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 
01/01/2011
[  270.730100]  0001 8800bafeba58 817ad487 
0007
[  270.730103]  880232a0a780 8800bafeba88 810ca4f2 
88022fb23e00
[  270.730105]  880232a0a780 8800bafebb68 8800bafebb68 
8800bafebaa8
[  270.730108] Call Trace:
[  270.730121]  [817ad487] dump_stack+0x4c/0x65
[  270.730148]  [810ca4f2] lockdep_rcu_suspicious+0xe2/0x120
[  270.730153]  [816a62d2] task_cls_state+0x92/0xa0
[  270.730158]  [a00b534f] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[  270.730164]  [816aac74] tc_classify_compat+0x74/0xc0
[  270.730166]  [816ab573] tc_classify+0x33/0x90
[  270.730170]  [a00bcb0a] htb_enqueue+0xaa/0x4a0 [sch_htb]
[  270.730172]  [81682e26] __dev_queue_xmit+0x306/0x960
[  270.730174]  [81682b70] ? __dev_queue_xmit+0x50/0x960
[  270.730176]  [816834a3] dev_queue_xmit_sk+0x13/0x20
[  270.730185]  [81787770] dev_queue_xmit+0x10/0x20
[  270.730187]  [8178b91c] packet_snd.isra.62+0x54c/0x760
[  270.730190]  [8178be25] packet_sendmsg+0x2f5/0x3f0
[  270.730203]  [81665245] ? sock_def_readable+0x5/0x190
[  270.730210]  [817b64bb] ? _raw_spin_unlock+0x2b/0x40
[  270.730216]  [8173bcbc] ? unix_dgram_sendmsg+0x5cc/0x640
[  270.730219]  [8165f367] sock_sendmsg+0x47/0x50
[  270.730221]  [8165f42f] sock_write_iter+0x7f/0xd0
[  270.730232]  [811fd4c7] __vfs_write+0xa7/0xf0
[  270.730234]  [811fe5b8] vfs_write+0xb8/0x190
[  270.730236]  [811fe8c2] SyS_write+0x52/0xb0
[  270.730239]  [817b6bae] entry_SYSCALL_64_fastpath+0x12/0x76

Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru
---
 net/core/netclassid_cgroup.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c
index 1f2a126f4ffa..515939034298 100644
--- a/net/core/netclassid_cgroup.c
+++ b/net/core/netclassid_cgroup.c
@@ -23,7 +23,8 @@ static inline struct cgroup_cls_state *css_cls_state(struct 
cgroup_subsys_state
 
 struct cgroup_cls_state *task_cls_state(struct task_struct *p)
 {
-   return css_cls_state(task_css(p, net_cls_cgrp_id));
+   return css_cls_state(task_css_check(p, net_cls_cgrp_id,
+  rcu_read_lock_bh_held()));
 }
 EXPORT_SYMBOL_GPL(task_cls_state);
 



Identifying underlying interface from struct sock

2015-07-21 Thread Guru Prasad
Hi,

First, I apologize for posting on the netdev forum. Majordomo did not
list any other network related mailing list.

Is there a way to identify the underlying network interface from an
instance of struct sock? I realize that the socket is abstract and
shouldn't/doesn't necessarily depend on the underlying interface, but
say, with TCP, where the connection is endpoint oriented, shouldn't
this mean that the socket maintains a reference to the interface to
which it is associated?

I tried
dev = dev_get_by_index(sock_net(sk), skb->skb_iif);
and
dev = skb->dev;
but in both cases, dev was NULL.

I'm trying to reference the underlying interface to determine whether
the conditions present in that interface are acceptable for
transmission.


Re: [PATCH net-next 14/22] vxlan: Flow based tunneling

2015-07-21 Thread Alexei Starovoitov

On 7/21/15 1:43 AM, Thomas Graf wrote:

This prepares the VXLAN device to be steered by the routing and other
subsystems which allows to support encapsulation for a large number
of tunnel endpoints and tunnel ids through a single net_device which
improves the scalability.


+1. looks very useful.

RX:

+   info->mode = IP_TUNNEL_INFO_RX;
+   info->key.tun_flags = TUNNEL_KEY;
+   info->key.tun_id = cpu_to_be64(vni >> 8);

...
TX:

+   dst_port = info->key.tp_dst ? : vxlan->dst_port;
+   vni = be64_to_cpu(info->key.tun_id);


I think the copy paste of ovs_tunnel_info into ip_tunnel_info
can be improved. In particular instead of '__be64 tun_id'
we can use '__u64 tun_id' which will avoid extra byteswaps for rx/tx
paths.

netlink for this part also seems inconsistent.
In the patch 16:
+static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = {
+   [IP_TUN_ID] = { .type = NLA_U64 },
...
+   if (tb[IP_TUN_ID])
+   tun_info->key.tun_id = nla_get_u64(tb[IP_TUN_ID]);

I think nla_get_be64 should be there?
and with my suggestion we can add be64_to_cpu() here instead
of doing it per packet.
Thoughts?



[PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread Chris J Arges
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 net/openvswitch/flow_table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4613df8..6552394 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -752,7 +752,7 @@ int ovs_flow_init(void)
BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
-  + (num_possible_nodes()
+  + (nr_node_ids
  * sizeof(struct flow_stats *)),
   0, 0, NULL);
if (flow_cache == NULL)
-- 
1.9.1
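
The difference between the two sizes can be illustrated in user space with a possible-node bitmap such as 0x30003 (nodes 0, 1, 16 and 17). The helper names below are hypothetical and the map is modelled as a plain 64-bit word: counting the set bits corresponds to num_possible_nodes(), while the highest set bit plus one corresponds to nr_node_ids, which is the size an array indexed directly by node id needs.

```c
#include <stdint.h>

/* Number of set bits: how many nodes are possible. */
static unsigned int count_possible_nodes(uint64_t possible_map)
{
	unsigned int n = 0;

	for (; possible_map; possible_map >>= 1)
		n += (unsigned int)(possible_map & 1);
	return n;
}

/* Highest set bit plus one: array length needed when indexing by id. */
static unsigned int highest_node_id_plus_one(uint64_t possible_map)
{
	unsigned int ids = 0;

	for (; possible_map; possible_map >>= 1)
		ids++;
	return ids;
}
```

For 0x30003 the first helper gives 4 and the second gives 18, so an array of four pointers indexed with node 16 or 17 is overrun by a wide margin.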



Re: [PATCH 1/1] drivers: net: cpsw: remove tx event processing in rx napi poll

2015-07-21 Thread David Miller
From: Mugunthan V N mugunthan...@ti.com
Date: Tue, 21 Jul 2015 16:00:42 +0530

 With commit c03abd84634d (net: ethernet: cpsw: don't requests IRQs
 we don't use) common isr and napi are separated into separate tx isr
 and rx isr/napi, but still in rx napi tx events are handled. So removing
 the tx event handling in rx napi.
 
 Signed-off-by: Mugunthan V N mugunthan...@ti.com

Applied, thanks.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] tcp: don't extend RTO on failed loss probe attempts

2015-07-21 Thread Yuchung Cheng
On Fri, Jul 17, 2015 at 10:27 PM, Eric Dumazet eric.duma...@gmail.com wrote:

 On Fri, 2015-07-17 at 14:22 -0700, Yuchung Cheng wrote:
  If TLP was unable to send a probe, it extended the RTO to
  now + icsk_rto. But extending the RTO makes little sense
  if no TLP probe went out. With this commit, instead of
  extending the RTO we re-arm it relative to the transmit time
  of the write queue head.

 But what was the reason the probe could not be sent ?

 If it is local congestion or memory allocation error, it does make sense
 to not add fuel to the fire.
Good point. We can identify those so we don't attempt to
retransmit on these errors, but will retransmit on receive-window
limit. I'll re-spin the patch.






Re: [PATCH net-next 00/22 v2] Lightweight flow based encapsulation

2015-07-21 Thread David Miller
From: Thomas Graf tg...@suug.ch
Date: Tue, 21 Jul 2015 10:43:44 +0200

 This series combines the work previously posted by Roopa, Robert and
 myself. It's according to what we discussed at NFWS. The motivation
 of this series is to:
 
  * Consolidate code between OVS and the rest of the kernel and get
rid of OVS vports and instead represent them as pure net_devices.
  * Introduce a lightweight tunneling mechanism which enables flow
based encapsulation to improve scalability on both RX and TX.
  * Do the above in an encapsulation unspecific way so that the
encapsulation type is eventually abstracted away from the user.
  * Use the same forwarding decision for both native forwarding and
encapsulation thus allowing to switch between native IPv6 and
UDP encapsulation based on endpoint without requiring additional
logic
 
 The fundamental changes introduced in this series are:
  * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
instructions. Depending on the specified type, the instructions
apply to UDP encapsulations, MPLS, and possibly others in the future.
  * Depending on the encapsulation type, the output function of the
dst is directly overwritten or the dst merely attaches metadata and
relies on a subsequent net_device to apply it to the packet. The
latter is typically used if an inner and outer IP header exist which
require two subsequent routing lookups to be performed.
  * A new metadata_dst structure which can be attached to skbs to
carry metadata in between subsystems. This new metadata transport
is used to provide a single interface for VXLAN, routing and OVS
to communicate through metadata.

Series applied, but please take Alexei's endianness feedback into
consideration.

Thanks!


Re: [PATCH iproute2 v2] ss: Fix crash when dump stats from /proc with '-p'

2015-07-21 Thread Stephen Hemminger
On Tue, 21 Jul 2015 16:18:36 +0300
Vadim Kochan vadi...@gmail.com wrote:

 From: Vadim Kochan vadi...@gmail.com
 
 It really partially reverts:
 
 ec4d0d8a9def35 (ss: Replace unixstat struct by new sockstat struct)
 
 but adds a few fields (name & peer_name) from the removed unixstat struct to
 the sockstat struct, which makes it easy to restore the original code.
 
 Fixes: ec4d0d8a9def35 (ss: Replace unixstat struct by new sockstat struct)
 Reported-by: Marc Dietrich marvi...@gmx.de
 Signed-off-by: Vadim Kochan vadi...@gmail.com

I applied this one after resolving merge conflicts.



[PATCH net] tcp: suppress a division by zero warning

2015-07-21 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

Andrew Morton reported the following warning on one ARM build
with gcc-4.4:

net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero

Even when guarded with a test on sizeof(spinlock_t), the compiler does not
like the current construct on a !CONFIG_SMP build.

Remove the warning by using a temporary variable.
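
The sizing logic touched by this patch can be sketched in userspace as
follows. This is only an illustration, not the kernel function: the helper
names (roundup_pow2, ehash_lock_count) and the parameters standing in for
L1_CACHE_BYTES, num_possible_cpus() and the bucket count are assumptions
made for the example. The point is that the division is done on a variable
(locksz) that is tested first, instead of on a literal sizeof() expression.

```c
/* Round v (v >= 1) up to the next power of two.
 * Hypothetical stand-in for the kernel's roundup_pow_of_two().
 */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Number of lock slots to allocate.
 * locksz      stands in for sizeof(spinlock_t), which may be 0
 * cache_bytes stands in for L1_CACHE_BYTES
 * ncpus       stands in for num_possible_cpus()
 * nbuckets    stands in for ehash_mask + 1
 */
static unsigned int ehash_lock_count(unsigned int locksz,
				     unsigned int cache_bytes,
				     unsigned int ncpus,
				     unsigned int nbuckets)
{
	unsigned int nblocks = 1;

	if (locksz != 0) {
		/* two cache lines worth of locks, or at least one, per cpu */
		nblocks = 2U * cache_bytes / locksz;
		if (nblocks < 1U)
			nblocks = 1U;
		nblocks = roundup_pow2(nblocks * ncpus);

		/* no more locks than hash buckets */
		if (nblocks > nbuckets)
			nblocks = nbuckets;
	}
	return nblocks;
}
```

With locksz == 0 the division is never evaluated and a single slot is
reported, which mirrors the nblocks = 1 default in the patch.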

Fixes: 095dc8e0c368 (tcp: fix/cleanup inet_ehash_locks_alloc())
Reported-by: Andrew Morton a...@linux-foundation.org
Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/ipv4/inet_hashtables.c |   11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 5f9b063bbe8a..0cb9165421d4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -624,22 +624,21 @@ EXPORT_SYMBOL_GPL(inet_hashinfo_init);
 
 int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 {
+	unsigned int locksz = sizeof(spinlock_t);
 	unsigned int i, nblocks = 1;
 
-	if (sizeof(spinlock_t) != 0) {
+	if (locksz != 0) {
 		/* allocate 2 cache lines or at least one spinlock per cpu */
-		nblocks = max_t(unsigned int,
-				2 * L1_CACHE_BYTES / sizeof(spinlock_t),
-				1);
+		nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U);
 		nblocks = roundup_pow_of_two(nblocks * num_possible_cpus());
 
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks = kmalloc_array(nblocks, sizeof(spinlock_t),
+		hashinfo->ehash_locks = kmalloc_array(nblocks, locksz,
 						      GFP_KERNEL | __GFP_NOWARN);
 		if (!hashinfo->ehash_locks)
-			hashinfo->ehash_locks = vmalloc(nblocks * sizeof(spinlock_t));
+			hashinfo->ehash_locks = vmalloc(nblocks * locksz);
 
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;




Re: [PATCH net] tcp: suppress a division by zero warning

2015-07-21 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Wed, 22 Jul 2015 07:02:00 +0200

 From: Eric Dumazet eduma...@google.com
 
 Andrew Morton reported the following warning on one ARM build
 with gcc-4.4:
 
 net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
 net/ipv4/inet_hashtables.c:617: warning: division by zero
 
 Even when guarded with a test on sizeof(spinlock_t), the compiler does not
 like the current construct on a !CONFIG_SMP build.
 
 Remove the warning by using a temporary variable.
 
 Fixes: 095dc8e0c368 (tcp: fix/cleanup inet_ehash_locks_alloc())
 Reported-by: Andrew Morton a...@linux-foundation.org
 Signed-off-by: Eric Dumazet eduma...@google.com

Applied, thanks.


Re: [PATCHv2 net-next] cxgb4: Add debugfs entry to enable backdoor access

2015-07-21 Thread David Miller
From: Hariprasad Shenai haripra...@chelsio.com
Date: Tue, 21 Jul 2015 22:39:40 +0530

 Add debugfs entry 'use_backdoor' to enable backdoor access to read sge
 context. By default, we read sge contexts via firmware. In case of FW
 issues, one can enable backdoor access via debugfs to dump sge context
 for debugging purposes.
 
 Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
 ---
 V2: Remove unnecessary braces as per comments by Sergei Shtylyov 
 sergei.shtyl...@cogentembedded.com

Applied.


RE: [PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path

2015-07-21 Thread Duan Andy
From: Lucas Stach l.st...@pengutronix.de
 Sent: Tuesday, July 21, 2015 11:11 PM
 To: David S. Miller
 Cc: Duan Fugang-B38611; Li Frank-B20596; netdev@vger.kernel.org;
 ker...@pengutronix.de; patchwork-...@pengutronix.de
 Subject: [PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe
 fail path
 
 This function frees resources and cancels the delayed work item that has
 been initialized in fec_ptp_init().
 
 Use this to do proper error handling if something goes wrong in probe
 function after fec_ptp_init has been called.
 
 Signed-off-by: Lucas Stach l.st...@pengutronix.de
 ---
  drivers/net/ethernet/freescale/fec.h  |  1 +
  drivers/net/ethernet/freescale/fec_main.c |  5 ++---
 drivers/net/ethernet/freescale/fec_ptp.c  | 10 ++
  3 files changed, 13 insertions(+), 3 deletions(-)
 
 diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
 index 1eee73cccdf5..99d33e2d35e6 100644
 --- a/drivers/net/ethernet/freescale/fec.h
 +++ b/drivers/net/ethernet/freescale/fec.h
 @@ -562,6 +562,7 @@ struct fec_enet_private {
  };
  
  void fec_ptp_init(struct platform_device *pdev);
 +void fec_ptp_stop(struct platform_device *pdev);
  void fec_ptp_start_cyclecounter(struct net_device *ndev);
  int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
  int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
 diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
 index a7f1bdf718f8..32e3807c650e 100644
 --- a/drivers/net/ethernet/freescale/fec_main.c
 +++ b/drivers/net/ethernet/freescale/fec_main.c
 @@ -3494,6 +3494,7 @@ failed_register:
  failed_mii_init:
  failed_irq:
  failed_init:
 +	fec_ptp_stop(pdev);
  	if (fep->reg_phy)
  		regulator_disable(fep->reg_phy);
  failed_regulator:
 @@ -3515,14 +3516,12 @@ fec_drv_remove(struct platform_device *pdev)
  	struct net_device *ndev = platform_get_drvdata(pdev);
  	struct fec_enet_private *fep = netdev_priv(ndev);
  
 -	cancel_delayed_work_sync(&fep->time_keep);
  	cancel_work_sync(&fep->tx_timeout_work);
 +	fec_ptp_stop(pdev);
  	unregister_netdev(ndev);
  	fec_enet_mii_remove(fep);
  	if (fep->reg_phy)
  		regulator_disable(fep->reg_phy);
 -	if (fep->ptp_clock)
 -		ptp_clock_unregister(fep->ptp_clock);
  	of_node_put(fep->phy_node);
  	free_netdev(ndev);
  
 diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
 index a15663ad7f5e..f457a23d0bfb 100644
 --- a/drivers/net/ethernet/freescale/fec_ptp.c
 +++ b/drivers/net/ethernet/freescale/fec_ptp.c
 @@ -604,6 +604,16 @@ void fec_ptp_init(struct platform_device *pdev)
  	schedule_delayed_work(&fep->time_keep, HZ);
  }
  
 +void fec_ptp_stop(struct platform_device *pdev)
 +{
 +	struct net_device *ndev = platform_get_drvdata(pdev);
 +	struct fec_enet_private *fep = netdev_priv(ndev);
 +
 +	cancel_delayed_work_sync(&fep->time_keep);
 +	if (fep->ptp_clock)
 +		ptp_clock_unregister(fep->ptp_clock);
 +}
 +
  /**
   * fec_ptp_check_pps_event
   * @fep: the fec_enet_private structure handle
 --
 2.1.4

Acked-by: Fugang Duan b38...@freescale.com


Re: [PATCH] e1000e: Move e1000e_disable_aspm_locked() inside CONFIG_PM

2015-07-21 Thread Michael Ellerman
On Wed, 2015-07-15 at 03:30 -0700, Jeff Kirsher wrote:
 On Tue, 2015-07-14 at 13:54 +1000, Michael Ellerman wrote:
  e1000e_disable_aspm_locked() is only used in __e1000_resume() which is
  inside CONFIG_PM. So when CONFIG_PM=n we get a defined but not used
  warning for e1000e_disable_aspm_locked().
  
  Move it inside the existing CONFIG_PM block to avoid the warning.
  
  Signed-off-by: Michael Ellerman m...@ellerman.id.au
  ---
   drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
 
 NACK, this is already fixed in my next-queue tree.  Raanan submitted a
 patch back on July 6th to resolve this issue, see commit id
 a75787d2246a93d256061db602f252703559af65 in my dev-queue branch of my
 next-queue tree.

OK. I take it your next-queue is destined for 4.3, so we'll just have to suck
on the warning until then?

cheers




Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Cong Wang
On Tue, Jul 21, 2015 at 1:57 PM, Eric Dumazet eric.duma...@gmail.com wrote:
 On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:

  -   kfree_skb(skb);
  +   INIT_LIST_HEAD(&q->new_flows);
  +   INIT_LIST_HEAD(&q->old_flows);
  +   for (i = 0; i < q->flows_cnt; i++) {
  +           struct fq_codel_flow *flow = q->flows + i;
  +
  +           while (flow->head)
  +                   kfree_skb(dequeue_head(flow));
  +
  +           INIT_LIST_HEAD(&flow->flowchain);


 You probably need to call codel_vars_init(&flow->cvars) as well.

 It is not necessary : flow->cvars only matter in the event of a dequeue,
 but whole qdisc is dismantled and no packet will be dequeued.


But it will affect the next dequeue _after_ reset? which is not supposed
to happen as we expect a fresh start after reset?


[RFC PATCH v2 net-next 2/3] tcp: add in_flight to tcp_skb_cb

2015-07-21 Thread Lawrence Brakmo
Based on comments by Neal Cardwell to tcp_nv patch:

  AFAICT this patch would not require an increase in the size of sk_buff
  cb[] if it were to take advantage of the fact that the tcp_skb_cb
  header.h4 and header.h6 fields are only used in the packet reception
  code path, and this in_flight field is only used on the transmit
  side. So the in_flight field could be placed in a struct that is
  itself placed in a union with the header union.

  That way the sender code can remember the in_flight value
  without requiring any extra space. And in the future other
  sender-side info could be stored in the tx struct, if needed.

Signed-off-by: Lawrence Brakmo bra...@fb.com
---
 include/net/tcp.h | 13 ++---
 net/ipv4/tcp_input.c  |  5 -
 net/ipv4/tcp_output.c |  4 +++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 26e7651..2e62efe 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -755,11 +755,17 @@ struct tcp_skb_cb {
 	/* 1 byte hole */
 	__u32		ack_seq;	/* Sequence number ACK'd	*/
 	union {
-		struct inet_skb_parm	h4;
+		struct {
+			/* bytes in flight when this packet was sent */
+			__u32 in_flight;
+		} tx;   /* only used for outgoing skbs */
+		union {
+			struct inet_skb_parm	h4;
 #if IS_ENABLED(CONFIG_IPV6)
-		struct inet6_skb_parm	h6;
+			struct inet6_skb_parm	h6;
 #endif
-	} header;	/* For incoming frames		*/
+		} header;	/* For incoming skbs */
+	};
 };
 
 #define TCP_SKB_CB(__skb)	((struct tcp_skb_cb *)&((__skb)->cb[0]))
@@ -837,6 +843,7 @@ union tcp_cc_info;
 struct ack_sample {
 	u32 pkts_acked;
 	s32 rtt_us;
+	u32 in_flight;
 };
 
 struct tcp_congestion_ops {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4f641f6..aca4ae5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3068,6 +3068,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 	long ca_rtt_us = -1L;
 	struct sk_buff *skb;
 	u32 pkts_acked = 0;
+	u32 last_in_flight = 0;
 	bool rtt_update;
 	int flag = 0;
 
@@ -3107,6 +3108,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 		if (!first_ackt.v64)
 			first_ackt = last_ackt;
 
+		last_in_flight = TCP_SKB_CB(skb)->tx.in_flight;
 		reord = min(pkts_acked, reord);
 		if (!after(scb->end_seq, tp->high_seq))
 			flag |= FLAG_ORIG_SACK_ACKED;
@@ -3196,7 +3198,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 	}
 
 	if (icsk->icsk_ca_ops->pkts_acked) {
-		struct ack_sample sample = {pkts_acked, ca_rtt_us};
+		struct ack_sample sample = {pkts_acked, ca_rtt_us,
+					    last_in_flight};
 
 		icsk->icsk_ca_ops->pkts_acked(sk, sample);
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7105784..e9deab5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	int err;
 
 	BUG_ON(!skb || !tcp_skb_pcount(skb));
+	tp = tcp_sk(sk);
 
 	if (clone_it) {
 		skb_mstamp_get(&skb->skb_mstamp);
+		TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq
+			- tp->snd_una;
 
 		if (unlikely(skb_cloned(skb)))
 			skb = pskb_copy(skb, gfp_mask);
@@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	}
 
 	inet = inet_sk(sk);
-	tp = tcp_sk(sk);
 	tcb = TCP_SKB_CB(skb);
 	memset(&opts, 0, sizeof(opts));
 
-- 
1.8.1
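
The space argument behind this patch can be checked with a small userspace
sketch. Everything here is an assumption made for illustration: struct
rx_parm is a hypothetical 24-byte stand-in for inet_skb_parm and
inet6_skb_parm, and cb_before/cb_after only mimic the tail of tcp_skb_cb.
The sketch shows that a tx struct overlaid on the receive-only header union
does not grow the control block.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct inet_skb_parm / inet6_skb_parm. */
struct rx_parm {
	unsigned char bytes[24];
};

/* Layout before the patch: the tail union is only used on receive. */
struct cb_before {
	uint32_t seq;
	uint32_t ack_seq;
	union {
		struct rx_parm h4;
		struct rx_parm h6;
	} header;
};

/* Layout after the patch: transmit-only data shares the same bytes. */
struct cb_after {
	uint32_t seq;
	uint32_t ack_seq;
	union {
		struct {
			uint32_t in_flight;	/* bytes in flight at send time */
		} tx;				/* outgoing packets only */
		union {
			struct rx_parm h4;
			struct rx_parm h6;
		} header;			/* incoming packets only */
	};
};

_Static_assert(sizeof(struct cb_after) == sizeof(struct cb_before),
	       "tx data must not grow the control block");

/* Record bytes in flight the way the patch does: end_seq - snd_una.
 * Unsigned 32-bit arithmetic keeps this correct across sequence wrap.
 */
static uint32_t record_in_flight(struct cb_after *cb,
				 uint32_t end_seq, uint32_t snd_una)
{
	cb->tx.in_flight = end_seq - snd_una;
	return cb->tx.in_flight;
}
```

The anonymous union requires C11. The overlay is only safe as long as no
code path reads header on an outgoing skb or tx on an incoming one.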



[RFC PATCH v2 net-next 0/3] tcp: add NV congestion control

2015-07-21 Thread Lawrence Brakmo
This patchset adds support for NV congestion control.

The first patch replaces two arguments in the pkts_acked() function
of the congestion control modules with a struct, making it easier to
add more parameters later without modifying the existing congestion
control modules.

The second patch adds the number of bytes in_flight when a packet is sent
to the tcp_skb_cb without increasing its size.

The third patch adds NV congestion control support.

[RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()
[RFC PATCH v2 net-next 2/3] tcp: add in_flight to tcp_skb_cb
[RFC PATCH v2 net-next 3/3] tcp: add NV congestion control

Signed-off-by: Lawrence Brakmo bra...@fb.com

 include/net/tcp.h  |  21 ++-
 net/ipv4/Kconfig   |  16 ++
 net/ipv4/Makefile  |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_bic.c |   6 +-
 net/ipv4/tcp_cdg.c |  14 +-
 net/ipv4/tcp_cubic.c   |   6 +-
 net/ipv4/tcp_htcp.c|  10 +-
 net/ipv4/tcp_illinois.c|  20 +-
 net/ipv4/tcp_input.c   |  12 +-
 net/ipv4/tcp_lp.c  |   6 +-
 net/ipv4/tcp_nv.c  | 479 

 net/ipv4/tcp_output.c  |   4 +-
 net/ipv4/tcp_vegas.c   |   6 +-
 net/ipv4/tcp_vegas.h   |   2 +-
 net/ipv4/tcp_veno.c|   6 +-
 net/ipv4/tcp_westwood.c|   6 +-
 net/ipv4/tcp_yeah.c|   6 +-
 18 files changed, 579 insertions(+), 51 deletions(-)



Re: [PATCH net-next 0/9] sfc: support for cascaded multicast filtering

2015-07-21 Thread David Miller
From: Edward Cree ec...@solarflare.com
Date: Tue, 21 Jul 2015 15:07:44 +0100

 Recent versions of firmware for SFC9100 adapters add support for filter
  chaining, in which packets matching multiple filters are delivered to all
  filters' recipients, rather than only the highest match-priority filter as was
  previously the case.
 This patch series enables this feature and redesigns the filter handling code
  to make use of it; in particular, subscribing to a multicast address on one
  function no longer prevents traffic to that address reaching another function
  which is in promiscuous or allmulti mode.
 If the firmware does not support filter chaining, the driver will fall back to
  the old behaviour.

Series applied, thanks.


[RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()

2015-07-21 Thread Lawrence Brakmo
Replace 2 arguments (cnt and rtt) in the congestion control modules'
pkts_acked() function with a struct. This will allow adding more
information without having to modify existing congestion control
modules (tcp_nv in particular needs bytes in flight when packet
was sent).

This was proposed by Neal Cardwell in his comments to the tcp_nv patch.
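
A minimal userspace sketch of the calling convention described above.
struct cc_state, struct cc_ops and bic_like_acked are hypothetical names
introduced for the example, and ACK_RATIO_SHIFT is assumed to be 4, which
is what the "ratio = (15*ratio + sample) / 16" comment in tcp_bic.c
implies. Because the sample is passed as a struct, a later field can be
added without touching the callback signature.

```c
#include <stdint.h>

#define ACK_RATIO_SHIFT 4	/* assumed: delayed_ack is kept scaled by 16 */

/* Mirrors the struct added by the patch (userspace types). */
struct ack_sample {
	uint32_t pkts_acked;
	int32_t rtt_us;
};

/* Hypothetical per-connection state for a BIC-like module. */
struct cc_state {
	uint32_t delayed_ack;	/* ACK ratio estimate << ACK_RATIO_SHIFT */
};

/* Hypothetical, cut-down version of the congestion ops table. */
struct cc_ops {
	void (*pkts_acked)(struct cc_state *ca, struct ack_sample sample);
};

/* Sliding-window update: with r = delayed_ack / 16 this computes
 * r' = (15 * r + sample) / 16, written without a temporary because
 * the by-value struct field is not modified.
 */
static void bic_like_acked(struct cc_state *ca, struct ack_sample sample)
{
	ca->delayed_ack += sample.pkts_acked -
		(ca->delayed_ack >> ACK_RATIO_SHIFT);
}

static const struct cc_ops bic_like_ops = {
	.pkts_acked = bic_like_acked,
};
```

A module that ignores a field, as this one ignores rtt_us, needs no change
when struct ack_sample grows.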

Signed-off-by: Lawrence Brakmo lawre...@brakmo.org
---
 include/net/tcp.h   |  7 ++-
 net/ipv4/tcp_bic.c  |  6 +++---
 net/ipv4/tcp_cdg.c  | 14 +++---
 net/ipv4/tcp_cubic.c|  6 +++---
 net/ipv4/tcp_htcp.c | 10 +-
 net/ipv4/tcp_illinois.c | 20 ++--
 net/ipv4/tcp_input.c|  7 +--
 net/ipv4/tcp_lp.c   |  6 +++---
 net/ipv4/tcp_vegas.c|  6 +++---
 net/ipv4/tcp_vegas.h|  2 +-
 net/ipv4/tcp_veno.c |  6 +++---
 net/ipv4/tcp_westwood.c |  6 +++---
 net/ipv4/tcp_yeah.c |  6 +++---
 13 files changed, 55 insertions(+), 47 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 364426a..26e7651 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -834,6 +834,11 @@ enum tcp_ca_ack_event_flags {
 
 union tcp_cc_info;
 
+struct ack_sample {
+   u32 pkts_acked;
+   s32 rtt_us;
+};
+
 struct tcp_congestion_ops {
struct list_head	list;
u32 key;
@@ -857,7 +862,7 @@ struct tcp_congestion_ops {
/* new value of cwnd after loss (optional) */
u32  (*undo_cwnd)(struct sock *sk);
/* hook for packet ack accounting (optional) */
-   void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
+   void (*pkts_acked)(struct sock *sk, struct ack_sample);
/* get info for inet_diag (optional) */
size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index fd1405d..6a873f7 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -197,15 +197,15 @@ static void bictcp_state(struct sock *sk, u8 new_state)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt)
+static void bictcp_acked(struct sock *sk, struct ack_sample sample)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 
 	if (icsk->icsk_ca_state == TCP_CA_Open) {
 		struct bictcp *ca = inet_csk_ca(sk);
 
-		cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
-		ca->delayed_ack += cnt;
+		ca->delayed_ack += sample.pkts_acked -
+			(ca->delayed_ack >> ACK_RATIO_SHIFT);
 	}
 }
 
diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index 167b6a3..ef64106 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -294,12 +294,12 @@ static void tcp_cdg_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 	ca->shadow_wnd = max(ca->shadow_wnd, ca->shadow_wnd + incr);
 }
 
-static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
+static void tcp_cdg_acked(struct sock *sk, struct ack_sample sample)
 {
 	struct cdg *ca = inet_csk_ca(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (rtt_us <= 0)
+	if (sample.rtt_us <= 0)
 		return;
 
 	/* A heuristic for filtering delayed ACKs, adapted from:
@@ -307,20 +307,20 @@ static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
 	 * delay and rate based TCP mechanisms." TR 100219A. CAIA, 2010.
 	 */
 	if (tp->sacked_out == 0) {
-		if (num_acked == 1 && ca->delack) {
+		if (sample.pkts_acked == 1 && ca->delack) {
 			/* A delayed ACK is only used for the minimum if it is
 			 * provenly lower than an existing non-zero minimum.
 			 */
-			ca->rtt.min = min(ca->rtt.min, rtt_us);
+			ca->rtt.min = min(ca->rtt.min, sample.rtt_us);
 			ca->delack--;
 			return;
-		} else if (num_acked > 1 && ca->delack < 5) {
+		} else if (sample.pkts_acked > 1 && ca->delack < 5) {
 			ca->delack++;
 		}
 	}
 
-	ca->rtt.min = min_not_zero(ca->rtt.min, rtt_us);
-	ca->rtt.max = max(ca->rtt.max, rtt_us);
+	ca->rtt.min = min_not_zero(ca->rtt.min, sample.rtt_us);
+	ca->rtt.max = max(ca->rtt.max, sample.rtt_us);
 }
 
 static u32 tcp_cdg_ssthresh(struct sock *sk)
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 28011fb..070d629 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -416,21 +416,21 @@ static void hystart_update(struct sock *sk, u32 delay)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us)
+static void bictcp_acked(struct sock *sk, struct ack_sample sample)
 {
const struct tcp_sock *tp = tcp_sk(sk);
   

[RFC PATCH v2 net-next 3/3] tcp: add NV congestion control

2015-07-21 Thread Lawrence Brakmo
This is a request for comments.

TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of
NV was presented at LPC 2010 (slides). It is a delay-based congestion
avoidance algorithm for the data center. This version has been tested
within a 10G rack where the HW RTTs are 20-50us.

A description of TCP-NV, including implementation and experimental
results, can be found at:
http://www.brakmo.org/networking/tcp-nv/TCPNV.html

The current version includes many module parameters to support
experimentation with the parameters.

Signed-off-by: Lawrence Brakmo bra...@fb.com
---
 include/net/tcp.h  |   1 +
 net/ipv4/Kconfig   |  16 ++
 net/ipv4/Makefile  |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_input.c   |   2 +
 net/ipv4/tcp_nv.c  | 479 +
 6 files changed, 508 insertions(+)
 create mode 100644 net/ipv4/tcp_nv.c

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2e62efe..c0690ae 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
 extern int sysctl_tcp_min_tso_segs;
 extern int sysctl_tcp_autocorking;
 extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_nv_enable;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6fb3c90..c37b374 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
window. TCP Vegas should provide less packet loss, but it is
not as aggressive as TCP Reno.
 
+config TCP_CONG_NV
+	tristate "TCP NV"
+	default m
+	---help---
+	TCP NV is a follow up to TCP Vegas. It has been modified to deal with
+	10G networks, measurement noise introduced by LRO, GRO and interrupt
+	coalescence. In addition, it will decrease its cwnd multiplicatively
+	instead of linearly.
+
+	Note that in general congestion avoidance (cwnd decreased when # packets
+	queued grows) cannot coexist with congestion control (cwnd decreased only
+	when there is packet loss) due to fairness issues. One scenario when they
+	can coexist safely is when the CA flows have RTTs << CC flows RTTs.
+
+	For further details see http://www.brakmo.org/networking/tcp-nv/
+
+
 config TCP_CONG_SCALABLE
tristate Scalable TCP
default n
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index efc43f3..06f335f 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
 obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o
 obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
 obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 433231c..31846d5 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -730,6 +730,15 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec_ms_jiffies,
 	},
 	{
+		.procname	= "tcp_nv_enable",
+		.data		= &sysctl_tcp_nv_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname	= "icmp_msgs_per_sec",
 		.data		= &sysctl_icmp_msgs_per_sec,
 		.maxlen		= sizeof(int),
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index aca4ae5..87560d9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_early_retrans __read_mostly = 3;
 int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_nv_enable __read_mostly = 1;
+EXPORT_SYMBOL(sysctl_tcp_nv_enable);
 
 #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
 #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
new file mode 100644
index 000..af451b6
--- /dev/null
+++ b/net/ipv4/tcp_nv.c
@@ -0,0 +1,479 @@
+/*
+ * TCP NV: TCP with Congestion Avoidance
+ *
+ * TCP-NV is a successor of TCP-Vegas that has been developed to
+ * deal with the issues that occur in modern networks. 
+ * Like TCP-Vegas, TCP-NV supports true congestion avoidance,
+ * the ability to detect congestion before packet losses occur.
+ * When congestion (queue buildup) starts to occur, TCP-NV
+ * predicts what the cwnd size should be for the current
+ * throughput and it 

Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread David Miller
From: Chris J Arges chris.j.ar...@canonical.com
Date: Tue, 21 Jul 2015 12:36:33 -0500

 Some architectures like POWER can have a NUMA node_possible_map that
 contains sparse entries. This causes memory corruption with openvswitch
 since it allocates flow_cache with a multiple of num_possible_nodes() and
 assumes the node variable returned by for_each_node will index into
 flow-stats[node].
 
 Use nr_node_ids to allocate a maximal sparse array instead of
 num_possible_nodes().
 
 The crash was noticed after 3af229f2 was applied as it changed the
 node_possible_map to match node_online_map on boot.
 Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
 
 Signed-off-by: Chris J Arges chris.j.ar...@canonical.com

Applied, thanks.
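
For readers following the thread, the difference between the two sizes can
be sketched in userspace with a plain 32-bit mask. The helpers below are
illustrative stand-ins, not the kernel's nodemask API:
count_possible_nodes() plays the role of num_possible_nodes(), and
highest_node_id_plus_one() plays the role of nr_node_ids. The mask 0x30003
is the example from the first version of the patch (possible nodes 0, 1,
16 and 17).

```c
#include <stdint.h>

/* Number of set bits: what num_possible_nodes() counts. */
static unsigned int count_possible_nodes(uint32_t node_mask)
{
	unsigned int n = 0;

	for (; node_mask; node_mask &= node_mask - 1)
		n++;
	return n;
}

/* Highest possible node id plus one: the role nr_node_ids plays. */
static unsigned int highest_node_id_plus_one(uint32_t node_mask)
{
	unsigned int n = 0;

	for (; node_mask; node_mask >>= 1)
		n++;
	return n;
}

/* Return 1 if every possible node id is a valid index into an array
 * of `slots` entries, 0 if some node id would index past the end.
 */
static int all_nodes_fit(uint32_t node_mask, unsigned int slots)
{
	unsigned int node;

	for (node = 0; node < 32; node++)
		if ((node_mask & (UINT32_C(1) << node)) && node >= slots)
			return 0;
	return 1;
}
```

For 0x30003 the popcount is 4 while the highest id plus one is 18, so an
array sized by the popcount is overrun when indexed with node 16 or 17.
Sizing by the highest id trades a few unused pointer slots for direct
indexing by node id.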


Re: [PATCH net-next] mpls: make RTA_OIF optional

2015-07-21 Thread David Miller
From: Roopa Prabhu ro...@cumulusnetworks.com
Date: Tue, 21 Jul 2015 09:16:24 -0700

 From: Roopa Prabhu ro...@cumulusnetworks.com
 
 If the user did not specify an oif, try to get it from the via address.
 If the device cannot be found, return -ENODEV.
 
 Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com

Applied, thanks.


[PATCH net-next] be2net: support ndo_get_phys_port_id()

2015-07-21 Thread Sriharsha Basavapatna
From: Sriharsha Basavapatna sriharsha.basavapa...@avagotech.com

Add be_get_phys_port_id() function to report physical port id. The port id
should be unique across different be2net devices in the system. We use the
chip serial number along with the physical port number for this.
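
The id layout described above can be sketched in userspace as follows. The
names build_phys_port_id, SERIAL_WORDS, SERIAL_WORD_SZ and MAX_ID_LEN are
hypothetical; MAX_ID_LEN = 32 is an assumption standing in for the kernel's
MAX_PHYS_ITEM_ID_LEN. As in the patch, byte 0 carries the port number plus
one and the serial number words follow from the last word to the first,
copied in host byte order.

```c
#include <stdint.h>
#include <string.h>

#define SERIAL_WORDS	8			/* serial number words */
#define SERIAL_WORD_SZ	sizeof(uint16_t)	/* bytes per word */
#define MAX_ID_LEN	32			/* assumed id buffer size */

/* Fill `out` with <port + 1><serial words, last word first>.
 * Returns the id length in bytes, or -1 if it would not fit.
 */
static int build_phys_port_id(uint8_t port_num,
			      const uint16_t serial[SERIAL_WORDS],
			      uint8_t out[MAX_ID_LEN])
{
	int id_len = SERIAL_WORDS * SERIAL_WORD_SZ + 1;
	uint8_t *p = &out[1];
	int i;

	if (MAX_ID_LEN < id_len)
		return -1;

	out[0] = port_num + 1;
	for (i = SERIAL_WORDS - 1; i >= 0; i--, p += SERIAL_WORD_SZ)
		memcpy(p, &serial[i], SERIAL_WORD_SZ);

	return id_len;
}
```

Two ports on the same chip differ only in byte 0, and two chips differ in
the serial bytes, which is what makes the id unique across devices.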

Signed-off-by: Sriharsha Basavapatna sriharsha.basavapa...@avagotech.com
---
 drivers/net/ethernet/emulex/benet/be.h  |  3 +++
 drivers/net/ethernet/emulex/benet/be_cmds.c |  7 ++-
 drivers/net/ethernet/emulex/benet/be_cmds.h |  8 +---
 drivers/net/ethernet/emulex/benet/be_main.c | 22 ++
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index cb5777b..8cd384d 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -105,6 +105,8 @@
 
 #define MAX_VFS			30 /* Max VFs supported by BE3 FW */
 #define FW_VER_LEN		32
+#define	CNTL_SERIAL_NUM_WORDS	8  /* Controller serial number words */
+#define	CNTL_SERIAL_NUM_WORD_SZ	(sizeof(u16)) /* Byte-sz of serial num word */
 
 #define	RSS_INDIR_TABLE_LEN	128
 #define RSS_HASH_KEY_LEN	40
@@ -590,6 +592,7 @@ struct be_adapter {
 	struct rss_info rss_info;
 	/* Filters for packets that need to be sent to BMC */
 	u32 bmc_filt_mask;
+	u16 serial_num[CNTL_SERIAL_NUM_WORDS];
 };
 
 #define be_physfn(adapter)		(!adapter->virtfn)
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c 
b/drivers/net/ethernet/emulex/benet/be_cmds.c
index ecad46f..3be1fbd 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -2852,10 +2852,11 @@ int be_cmd_get_cntl_attributes(struct be_adapter 
*adapter)
struct be_mcc_wrb *wrb;
struct be_cmd_req_cntl_attribs *req;
struct be_cmd_resp_cntl_attribs *resp;
-   int status;
+   int status, i;
int payload_len = max(sizeof(*req), sizeof(*resp));
struct mgmt_controller_attrib *attribs;
struct be_dma_mem attribs_cmd;
+   u32 *serial_num;
 
if (mutex_lock_interruptible(adapter-mbox_lock))
return -1;
@@ -2886,6 +2887,10 @@ int be_cmd_get_cntl_attributes(struct be_adapter 
*adapter)
if (!status) {
attribs = attribs_cmd.va + sizeof(struct be_cmd_resp_hdr);
adapter->hba_port_num = attribs->hba_attribs.phy_port;
+   serial_num = attribs->hba_attribs.controller_serial_number;
+   for (i = 0; i < CNTL_SERIAL_NUM_WORDS; i++)
+       adapter->serial_num[i] = le32_to_cpu(serial_num[i]) &
+           (BIT_MASK(16) - 1);
}
 
 err:
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h 
b/drivers/net/ethernet/emulex/benet/be_cmds.h
index a4479f7..36d835b 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1637,10 +1637,12 @@ struct be_cmd_req_set_qos {
 struct mgmt_hba_attribs {
u32 rsvd0[24];
u8 controller_model_number[32];
-   u32 rsvd1[79];
-   u8 rsvd2[3];
+   u32 rsvd1[16];
+   u32 controller_serial_number[8];
+   u32 rsvd2[55];
+   u8 rsvd3[3];
u8 phy_port;
-   u32 rsvd3[13];
+   u32 rsvd4[13];
 } __packed;
 
 struct mgmt_controller_attrib {
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index c996dd7..5e92db8 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5219,6 +5219,27 @@ static netdev_features_t be_features_check(struct 
sk_buff *skb,
 }
 #endif
 
+static int be_get_phys_port_id(struct net_device *dev,
+  struct netdev_phys_item_id *ppid)
+{
+   int i, id_len = CNTL_SERIAL_NUM_WORDS * CNTL_SERIAL_NUM_WORD_SZ + 1;
+   struct be_adapter *adapter = netdev_priv(dev);
+   u8 *id;
+
+   if (MAX_PHYS_ITEM_ID_LEN < id_len)
+       return -ENOSPC;
+
+   ppid->id[0] = adapter->hba_port_num + 1;
+   id = &ppid->id[1];
+   for (i = CNTL_SERIAL_NUM_WORDS - 1; i >= 0;
+        i--, id += CNTL_SERIAL_NUM_WORD_SZ)
+       memcpy(id, &adapter->serial_num[i], CNTL_SERIAL_NUM_WORD_SZ);
+
+   ppid->id_len = id_len;
+
+   return 0;
+}
+
 static const struct net_device_ops be_netdev_ops = {
.ndo_open   = be_open,
.ndo_stop   = be_close,
@@ -5249,6 +5270,7 @@ static const struct net_device_ops be_netdev_ops = {
.ndo_del_vxlan_port = be_del_vxlan_port,
.ndo_features_check = be_features_check,
 #endif
+   .ndo_get_phys_port_id   = be_get_phys_port_id,
 };
 
 static void be_netdev_init(struct net_device *netdev)
-- 
1.8.3.1
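
As a rough illustration of the ID layout described in this patch, the following userspace sketch builds a port ID from a port number and eight 16-bit serial words. The struct phys_id type, the build_phys_port_id() name and the 32-byte limit are assumptions made for the example; this is not the driver's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SERIAL_WORDS   8                  /* mirrors CNTL_SERIAL_NUM_WORDS */
#define SERIAL_WORD_SZ sizeof(uint16_t)   /* mirrors CNTL_SERIAL_NUM_WORD_SZ */
#define MAX_ID_LEN     32                 /* assumed to match MAX_PHYS_ITEM_ID_LEN */

/* Hypothetical userspace stand-in for struct netdev_phys_item_id. */
struct phys_id {
	uint8_t id[MAX_ID_LEN];
	uint8_t id_len;
};

/* Byte 0 holds port + 1; the serial words follow, highest index first. */
static int build_phys_port_id(struct phys_id *out, uint8_t port,
			      const uint16_t serial[SERIAL_WORDS])
{
	size_t id_len = SERIAL_WORDS * SERIAL_WORD_SZ + 1;
	uint8_t *p;
	int i;

	assert(out && serial);
	if (id_len > MAX_ID_LEN)
		return -1;

	out->id[0] = (uint8_t)(port + 1);
	p = &out->id[1];
	for (i = SERIAL_WORDS - 1; i >= 0; i--, p += SERIAL_WORD_SZ)
		memcpy(p, &serial[i], SERIAL_WORD_SZ);

	out->id_len = (uint8_t)id_len;
	return 0;
}
```

In this sketch two ports on the same chip share the 16 serial bytes and differ only in byte 0, which is what makes the ID unique per physical port.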

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to 

Re: [RFC PATCH v2 net-next 1/3] tcp: replace cnt rtt with struct in pkts_acked()

2015-07-21 Thread Eric Dumazet
On Tue, 2015-07-21 at 21:21 -0700, Lawrence Brakmo wrote:
> Replace 2 arguments (cnt and rtt) in the congestion control modules'
> pkts_acked() function with a struct. This will allow adding more
> information without having to modify existing congestion control
> modules (tcp_nv in particular needs bytes in flight when packet
> was sent).
>
> This was proposed by Neal Cardwell in his comments to the tcp_nv patch.

Are you sure Neal suggested passing a struct as an argument?

It was probably a struct pointer instead.
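
To make the distinction concrete, here is a minimal userspace sketch of a callback that takes a pointer to a sample struct. The struct ack_sample and struct cc_state types, their field names and cc_pkts_acked() are hypothetical and only illustrate the idea; they are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample type; field names are illustrative only. */
struct ack_sample {
	uint32_t pkts_acked;
	int32_t  rtt_us;
	uint32_t in_flight;   /* a later addition that callbacks may ignore */
};

/* Hypothetical module state tracking the smallest RTT seen (0 = unset). */
struct cc_state {
	int32_t min_rtt_us;
};

/* The callback receives a pointer, so no struct copy is made per ACK. */
static void cc_pkts_acked(struct cc_state *cc, const struct ack_sample *sample)
{
	assert(cc && sample);
	if (sample->rtt_us < 0)         /* negative means "no RTT sample" */
		return;
	if (cc->min_rtt_us == 0 || sample->rtt_us < cc->min_rtt_us)
		cc->min_rtt_us = sample->rtt_us;
}
```

With a pointer argument, adding a field such as in_flight changes neither the function signature nor the cost of the call, and modules that do not care about the new field need no changes.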


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().

2015-07-21 Thread roopa

On 7/21/15, 9:57 PM, Rami Rosen wrote:

This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range()
method. The assignment vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no
effect and is unneeded, as the vinfo.flags value is overridden by the
immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END
assignment.

Signed-off-by: Rami Rosen rami.ro...@intel.com


Acked-by: Roopa Prabhu ro...@cumulusnetworks.com

Thanks.


Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-21 Thread Julian Anastasov

Hello,

On Tue, 21 Jul 2015, Martin KaFai Lau wrote:

> The patch checks neigh->nud_state before acquiring the writer lock.
> Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

Locking usage is absolutely correct.

> + if (!(neigh->nud_state & NUD_VALID) &&
> + time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {

but this line is too long...

> + work = kmalloc(sizeof(*work), GFP_ATOMIC);
> + if (work) {
> + __neigh_set_probe_once(neigh);
> + }

scripts/checkpatch.pl --strict /tmp/file.patch

Regards

--
Julian Anastasov j...@ssi.bg


Re: [PATCH V2 net 0/3] BPF JIT fixes for ARM

2015-07-21 Thread David Miller
From: Nicolas Schichan nschic...@freebox.fr
Date: Tue, 21 Jul 2015 14:14:11 +0200

> These patches are fixing bugs in the ARM JIT and should probably find
> their way to a stable kernel. All 60 test_bpf tests in Linux 4.1 release
> are now passing OK (was 54 out of 60 before).

Series applied, thanks.


[PATCH net-next] mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereference

2015-07-21 Thread Roopa Prabhu
From: Roopa Prabhu ro...@cumulusnetworks.com

fix for:
net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison
expression (different address spaces)

remove incorrect rcu_dereference possibly left over from
earlier revisions of the code.

Reported-by: kbuild test robot fengguang...@intel.com
Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/mpls/mpls_iptunnel.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index eea096f..276f8c9 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -70,7 +70,7 @@ int mpls_output(struct sock *sk, struct sk_buff *skb)
skb_orphan(skb);
 
/* Find the output device */
-   out_dev = rcu_dereference(dst->dev);
+   out_dev = dst->dev;
if (!mpls_output_possible(out_dev) ||
!lwtstate || skb_warn_if_lro(skb))
goto drop;
-- 
1.7.10.4



[RFC PATCH net-next] ebpf: Allow dereferences of PTR_TO_STACK registers

2015-07-21 Thread Alex Gartrell
mov %rsp, %r1            ; r1 = rsp
add $-8, %r1             ; r1 = rsp - 8
store_q $123, -8(%rsp)   ; *(u64*)r1 = 123  <- valid
store_q $123, (%r1)      ; *(u64*)r1 = 123  <- previously invalid
mov $0, %r0
exit                     ; Always need to exit

And we'd get the following error:

0: (bf) r1 = r10
1: (07) r1 += -8
2: (7a) *(u64 *)(r10 -8) = 999
3: (7a) *(u64 *)(r1 +0) = 999
R1 invalid mem access 'fp'

Unable to load program

We already know that a register is a stack address and the appropriate
offset, so we should be able to validate those references as well.

Signed-off-by: Alex Gartrell agartr...@fb.com
---
 kernel/bpf/verifier.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 039d866..5dfbece 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -676,6 +676,15 @@ static int check_mem_access(struct verifier_env *env, u32 
regno, int off,
err = check_stack_write(state, off, size, value_regno);
else
err = check_stack_read(state, off, size, value_regno);
+   } else if (state->regs[regno].type == PTR_TO_STACK) {
+       int real_off = state->regs[regno].imm + off;
+
+       if (t == BPF_WRITE)
+           err = check_stack_write(
+               state, real_off, size, value_regno);
+       else
+           err = check_stack_read(
+               state, real_off, size, value_regno);
} else {
verbose("R%d invalid mem access '%s'\n",
regno, reg_type_str[state->regs[regno].type]);
-- 
Alex Gartrell agartr...@fb.com
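
The idea in this RFC can be sketched outside the kernel as follows. This is a toy model under stated assumptions: struct reg_state, stack_range_ok() and mem_access_ok() are hypothetical names, and the bounds rule is a simplification rather than the verifier's real logic. Only the 512-byte stack limit (MAX_BPF_STACK) is taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BPF_STACK 512   /* stack size used by the kernel verifier */

enum reg_type { NOT_INIT, FRAME_PTR, PTR_TO_STACK };

/* Hypothetical, reduced register state: type plus a known constant offset. */
struct reg_state {
	enum reg_type type;
	int imm;            /* offset from the frame pointer for PTR_TO_STACK */
};

/* An access is in bounds if [off, off + size) lies inside the stack frame. */
static bool stack_range_ok(int off, int size)
{
	return size > 0 && off < 0 && off >= -MAX_BPF_STACK && off + size <= 0;
}

static bool mem_access_ok(const struct reg_state *reg, int off, int size)
{
	assert(reg);
	switch (reg->type) {
	case FRAME_PTR:
		return stack_range_ok(off, size);
	case PTR_TO_STACK:
		/* Fold the register's known offset into the instruction offset. */
		return stack_range_ok(reg->imm + off, size);
	default:
		return false;
	}
}
```

In this model, a store through r1 after "r1 = r10; r1 += -8" is checked exactly like a store to r10 - 8, because the register's constant offset is added before the bounds check.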



Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Eric Dumazet
On Tue, 2015-07-21 at 19:03 -0700, Cong Wang wrote:
 On Tue, Jul 21, 2015 at 1:57 PM, Eric Dumazet eric.duma...@gmail.com wrote:
  On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:
 
   -   kfree_skb(skb);
   +   INIT_LIST_HEAD(&q->new_flows);
   +   INIT_LIST_HEAD(&q->old_flows);
   +   for (i = 0; i < q->flows_cnt; i++) {
   +   struct fq_codel_flow *flow = q->flows + i;
   +
   +   while (flow->head)
   +   kfree_skb(dequeue_head(flow));
   +
   +   INIT_LIST_HEAD(&flow->flowchain);
 
 
  You probably need to call codel_vars_init(&flow->cvars) as well.
 
  It is not necessary : flow->cvars only matter in the event of a dequeue,
  but whole qdisc is dismantled and no packet will be dequeued.
 
 
 But it will affect the next dequeue _after_ reset? which is not supposed
 to happen as we expect a fresh start after reset?

Hmm... I thought reset() was only done at queue dismantle, so no new
packet should be added later, and since no packet should be left after
reset, no dequeue should happen.

For completeness, we still can add the codel_vars_init(), no problem.

Thanks.





[PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().

2015-07-21 Thread Rami Rosen
This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range()
method. The assignment vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no
effect and is unneeded, as the vinfo.flags value is overridden by the
immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END
assignment.

Signed-off-by: Rami Rosen rami.ro...@intel.com
---
 net/bridge/br_netlink.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 364bdc9..793d247 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -164,8 +164,6 @@ static int br_fill_ifvlaninfo_range(struct sk_buff *skb, 
u16 vid_start,
sizeof(vinfo), &vinfo))
goto nla_put_failure;
 
-   vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN;
-
vinfo.vid = vid_end;
vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END;
if (nla_put(skb, IFLA_BRIDGE_VLAN_INFO,
-- 
1.9.3



Re: [PATCH net] netlink: don't hold mutex in rcu callback when releasing mmapd ring

2015-07-21 Thread David Miller
From: Florian Westphal f...@strlen.de
Date: Tue, 21 Jul 2015 16:33:50 +0200

 Kirill A. Shutemov says:
 
 This simple test-case triggers few locking asserts in kernel:
 ...
 Cong Wang says:
 
 We can't hold mutex lock in a rcu callback, [..]
 
 Thomas Graf says:
 
 The socket should be dead at this point. It might be simpler to
 add a netlink_release_ring() function which doesn't require
 locking at all.
 
 Reported-by: Kirill A. Shutemov kir...@shutemov.name
 Diagnosed-by: Cong Wang cw...@twopensource.com
 Suggested-by: Thomas Graf tg...@suug.ch
 Signed-off-by: Florian Westphal f...@strlen.de

Applied, thanks everyone.


Re: [PATCH net-next] net: track success and failure of TCP PMTU probing

2015-07-21 Thread David Miller
From: r...@tardy.usa.hp.com (Rick Jones)
Date: Tue, 21 Jul 2015 16:14:13 -0700 (PDT)

 From: Rick Jones rick.jon...@hp.com
 
 Track success and failure of TCP PMTU probing.
 
 Signed-off-by: Rick Jones rick.jon...@hp.com

Seems reasonable, applied, thanks Rick.


Re: [PATCH] ravb: fix ring memory allocation

2015-07-21 Thread David Miller
From: Sergei Shtylyov sergei.shtyl...@cogentembedded.com
Date: Wed, 22 Jul 2015 01:31:59 +0300

 The driver is written as if it can adapt to a low memory situation by
 allocating fewer RX skbs and TX aligned buffers than the respective RX/TX
 ring sizes. In reality, though, the driver would malfunction in this case.
 Stop being overly smart and just fail in such a situation; this is achieved
 by moving the memory allocation from ravb_ring_format() to ravb_ring_init().
 
 We leave dma_map_single() calls in place but make their failure non-fatal
 by marking the corresponding RX descriptors with zero data size, which should
 prevent DMA to an invalid address.
 
 Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com

Applied.

But the real way to handle this is to allocate all of the necessary
resources for the replacement RX SKB before unmapping and passing the
original SKB up into the stack.

That way you _NEVER_ starve the device of RX packets to receive into,
since if you fail the memory allocation or the DMA mapping, you just
put the original SKB back into the ring.
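
The allocate-before-handing-up pattern described above can be sketched in userspace roughly as follows. This is an illustration under assumptions, not the ravb driver: struct rx_ring, rx_take(), the alloc_fn type and alloc_fail() are hypothetical, and plain malloc() stands in for skb allocation and DMA mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4
#define BUF_SIZE  2048

/* Hypothetical RX ring: every slot must always own a buffer. */
struct rx_ring {
	void *buf[RING_SIZE];
};

/* The allocator is a parameter so that failure can be simulated. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Take the received buffer out of slot idx. Returns the buffer to hand up
 * the stack, or NULL if no replacement could be allocated, in which case
 * the original stays in the ring and the packet is dropped.
 */
static void *rx_take(struct rx_ring *ring, unsigned int idx, alloc_fn alloc)
{
	void *fresh, *full;

	assert(ring && idx < RING_SIZE && ring->buf[idx]);

	fresh = alloc(BUF_SIZE);        /* allocate the replacement first */
	if (!fresh)
		return NULL;            /* recycle the original on failure */

	full = ring->buf[idx];
	ring->buf[idx] = fresh;         /* the slot is never left empty */
	return full;
}

/* Helper that always fails, standing in for memory pressure. */
static void *alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}
```

The cost of this design is one dropped packet under memory pressure; the benefit is that the ring always stays fully populated, so the device is never starved of receive buffers.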


Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread Pravin Shelar
On Tue, Jul 21, 2015 at 10:36 AM, Chris J Arges
chris.j.ar...@canonical.com wrote:
 Some architectures like POWER can have a NUMA node_possible_map that
 contains sparse entries. This causes memory corruption with openvswitch
 since it allocates flow_cache with a multiple of num_possible_nodes() and
 assumes the node variable returned by for_each_node will index into
 flow->stats[node].

 Use nr_node_ids to allocate a maximal sparse array instead of
 num_possible_nodes().

 The crash was noticed after 3af229f2 was applied as it changed the
 node_possible_map to match node_online_map on boot.
 Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

 Signed-off-by: Chris J Arges chris.j.ar...@canonical.com

Acked-by: Pravin B Shelar pshe...@nicira.com

Thanks.
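
For readers following the sizing argument, this small userspace sketch contrasts the two counts for a sparse map such as 0x30003. The helper names count_nodes(), node_id_limit() and all_ids_fit() are made up for the example, and a 32-bit mask stands in for the kernel's nodemask.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits: the analogue of num_possible_nodes(). */
static unsigned int count_nodes(uint32_t possible_map)
{
	unsigned int n = 0;

	for (; possible_map; possible_map &= possible_map - 1)
		n++;
	return n;
}

/* Highest set bit plus one: the analogue of nr_node_ids. */
static unsigned int node_id_limit(uint32_t possible_map)
{
	unsigned int n = 0;

	for (; possible_map; possible_map >>= 1)
		n++;
	return n;
}

/* True if every possible node ID is a valid index into `slots` entries. */
static int all_ids_fit(uint32_t possible_map, unsigned int slots)
{
	assert(slots > 0);
	return node_id_limit(possible_map) <= slots;
}
```

For 0x30003 the possible nodes are 0, 1, 16 and 17, so an array sized by the node count has 4 slots while node 17 needs index 17; sizing by the ID limit (18 slots) makes every node ID a valid index at the cost of some unused entries.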


Re: ARP response with link local IP, why not broadcast

2015-07-21 Thread Sowmini Varadhan
On Tue, Jul 21, 2015 at 4:38 PM, Sebastian Fett db_ext...@gmx.de wrote:
 Hello!

 According to RFC3927 every ARP packet (reply and request) should be sent as
 link layer broadcast as long as the sender IP is a link local address. (see
 chapter 2.5).

Because broadcast replies are noisy and should be avoided if possible.
They create a broadcast flood that wakes up all receivers, which is
especially undesirable in today's world, where a broadcast would wake up
sleepy devices or require other inefficient processing in a cloud environment.
See also https://www.ietf.org/id/draft-nordmark-6man-dad-approaches-01.txt

 That functionality would help me a lot with a use case I have with our
 application.

what is your use case?


 But it is not implemented in the kernel that way.
 Does anyone know why?

--Sowmini


Re: [PATCH V4 0/2] pci: Provide a flag to access VPD through function 0

2015-07-21 Thread Bjorn Helgaas
[+cc Alex]

On Mon, Jul 13, 2015 at 11:39:54AM -0700, Mark D Rustad wrote:
 Many multi-function devices provide shared registers in extended
 config space for accessing VPD. The behavior of these registers
 means that the state must be tracked and access locked correctly
 for accesses not to hang or worse. One way to meet these needs is
 to always perform the accesses through function 0, thereby using
 the state tracking and mutex that already exists.
 
 To provide this behavior, add a dev_flags bit to indicate that this
 should be done. This bit can then be set for any non-zero function
 that needs to redirect such VPD access to function 0. Do not set
 this bit on the zero function or there will be an infinite recursion.
 
 The second patch uses this new flag to invoke this behavior on all
 multi-function Intel Ethernet devices.
 
 Any hardware that shares VPD registers with multiple functions has
 been suffering these problems forever. The hangs result in the log
 message:
 
 vpd r/w failed.  This is likely a firmware bug on this device.
 
 Both read and write data corruption are also possible during
 overlapping accesses in addition to hangs.
 
 Signed-off-by: Mark Rustad mark.d.rus...@intel.com
 
 ---
 Changes in V2:
 - Corrected a spelling error in a log message
 - Added checks to see that the referenced function 0 is reasonable
 Changes in V3:
 - Don't leak a device reference
 - Check that function 0 has VPD
 - Make a helper for the function 0 checks
 - Moved a multifunction check to the quirk patch
 Changes in V4:
 - Provide a more extensive commit log for patch 1

I applied these to pci/misc for v4.3 with changelogs as follows.  I added
Alex's ack, since he acked v3 and the only difference here is the
changelog.  I also added a stable tag.  Thanks!

Bjorn


commit 932c435caba8a2ce473a91753bad0173269ef334
Author: Mark Rustad mark.d.rus...@intel.com
Date:   Mon Jul 13 11:40:02 2015 -0700

PCI: Add dev_flags bit to access VPD through function 0

Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through
function 0 to provide VPD access on other functions.  This is for hardware
devices that provide copies of the same VPD capability registers in
multiple functions.  Because the kernel expects that each function has its
own registers, both the locking and the state tracking are affected by VPD
accesses to different functions.

On such devices for example, if a VPD write is performed on function 0,
*any* later attempt to read VPD from any other function of that device will
hang.  This has to do with how the kernel tracks the expected value of the
F bit per function.

Concurrent accesses to different functions of the same device can not only
hang but also corrupt both read and write VPD data.

When hangs occur, typically the error message:

  vpd r/w failed.  This is likely a firmware bug on this device.

will be seen.

Never set this bit on function 0 or there will be an infinite recursion.

Signed-off-by: Mark Rustad mark.d.rus...@intel.com
Signed-off-by: Bjorn Helgaas bhelg...@google.com
Acked-by: Alexander Duyck alexander.h.du...@redhat.com
CC: sta...@vger.kernel.org

commit 7aa6ca4d39edf01f997b9e02cf6d2fdeb224f351
Author: Mark Rustad mark.d.rus...@intel.com
Date:   Mon Jul 13 11:40:07 2015 -0700

PCI: Add VPD function 0 quirk for Intel Ethernet devices

Set the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel Ethernet device
functions other than function 0, so that on multi-function devices, we will
always read VPD from function 0 instead of from the other functions.

[bhelgaas: changelog]
Signed-off-by: Mark Rustad mark.d.rus...@intel.com
Signed-off-by: Bjorn Helgaas bhelg...@google.com
Acked-by: Alexander Duyck alexander.h.du...@redhat.com
CC: sta...@vger.kernel.org
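
The redirection described in these changelogs can be modelled with a short userspace sketch. The struct func type, FLAG_VPD_REF_F0, quirk_f0_vpd() and vpd_owner() are hypothetical stand-ins for the example and do not reproduce the PCI core code.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_VPD_REF_F0 0x1   /* stand-in for PCI_DEV_FLAGS_VPD_REF_F0 */
#define MAX_FUNCS 8

/* Hypothetical, reduced device model: one slot with up to 8 functions. */
struct func {
	unsigned int fn;          /* function number, 0..7 */
	unsigned int flags;
	struct func *slot;        /* array of MAX_FUNCS functions in this slot */
};

/* Set the flag on non-zero functions only; function 0 must stay unflagged. */
static void quirk_f0_vpd(struct func *f)
{
	assert(f);
	if (f->fn != 0)
		f->flags |= FLAG_VPD_REF_F0;
}

/*
 * Return the function whose VPD state and lock should be used. The extra
 * fn != 0 test guards against a flag wrongly set on function 0, which
 * would otherwise redirect to itself forever.
 */
static struct func *vpd_owner(struct func *f)
{
	assert(f);
	if ((f->flags & FLAG_VPD_REF_F0) && f->fn != 0)
		return &f->slot[0];
	return f;
}
```

Because every function resolves to the same owner, all VPD accesses in the slot share one lock and one copy of the tracked state.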


Re: reproducable panic eviction work queue

2015-07-21 Thread Florian Westphal
Frank Schreuder fschreu...@transip.nl wrote:

[ inet frag evictor crash ]

We believe we found the bug.  This patch should fix it.

We cannot share one list node between the buckets and the evictor; the flags
member is subject to race conditions, so the flags & INET_FRAG_EVICTED test
is not reliable.

It would be great if you could confirm that this fixes the problem
for you; we'll then make a formal patch submission.

Please apply this on a kernel without the previous test patches. Whether you
use an affected -stable or net-next kernel shouldn't matter since those are
similar enough.

Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
spinlock_t  lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
__u8 flags;
u16 max_size;
struct netns_frags  *net;
+   struct hlist_node   list_evictor;
 };
 
 #define INETFRAGS_HASHSZ   1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
}
 
fq->flags |= INET_FRAG_EVICTED;
-   hlist_del(&fq->list);
-   hlist_add_head(&fq->list, &expired);
+   hlist_add_head(&fq->list_evictor, &expired);
++evicted;
}
 
spin_unlock(&hb->chain_lock);
 
-   hlist_for_each_entry_safe(fq, n, &expired, list)
+   hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
f->frag_expire((unsigned long) fq);
 
return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, 
struct inet_frags *f)
struct inet_frag_bucket *hb;
 
hb = get_frag_bucket_locked(fq, f);
-   if (!(fq->flags & INET_FRAG_EVICTED))
-       hlist_del(&fq->list);
+   hlist_del(&fq->list);
spin_unlock(&hb->chain_lock);
 }
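
The core of the fix, giving each queue a second link so it can sit on the eviction list without leaving its hash bucket, can be sketched like this. The example uses simple singly linked lists and hypothetical names (struct fq, collect_for_eviction(), bucket_unlink()); it is not the hlist-based kernel code and it ignores locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue with one link per list it can be on. */
struct fq {
	int id;
	struct fq *bucket_next;   /* hash-bucket chain membership */
	struct fq *evict_next;    /* private eviction-list membership */
};

/* Collect queues for eviction without touching the bucket chain. */
static struct fq *collect_for_eviction(struct fq *bucket)
{
	struct fq *expired = NULL, *q;

	for (q = bucket; q; q = q->bucket_next) {
		q->evict_next = expired;   /* push onto the eviction list */
		expired = q;
	}
	return expired;
}

/* Unlink from the bucket chain; always valid because the chain is intact. */
static struct fq *bucket_unlink(struct fq *bucket, struct fq *victim)
{
	struct fq **pp;

	assert(victim);
	for (pp = &bucket; *pp; pp = &(*pp)->bucket_next) {
		if (*pp == victim) {
			*pp = victim->bucket_next;
			victim->bucket_next = NULL;
			break;
		}
	}
	return bucket;
}
```

Since bucket membership no longer depends on whether a queue was picked for eviction, the unlink path does not need to consult a flag that another CPU may be changing.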
 


Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [12:36:33 -0500], Chris J Arges wrote:
 Some architectures like POWER can have a NUMA node_possible_map that
 contains sparse entries. This causes memory corruption with openvswitch
 since it allocates flow_cache with a multiple of num_possible_nodes() and
 assumes the node variable returned by for_each_node will index into
 flow->stats[node].
 
 Use nr_node_ids to allocate a maximal sparse array instead of
 num_possible_nodes().
 
 The crash was noticed after 3af229f2 was applied as it changed the
 node_possible_map to match node_online_map on boot.
 Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
 
 Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
Acked-by: Nishanth Aravamudan n...@linux.vnet.ibm.com
 ---
  net/openvswitch/flow_table.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
 index 4613df8..6552394 100644
 --- a/net/openvswitch/flow_table.c
 +++ b/net/openvswitch/flow_table.c
 @@ -752,7 +752,7 @@ int ovs_flow_init(void)
   BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
   flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
 -     + (num_possible_nodes()
 +     + (nr_node_ids
  * sizeof(struct flow_stats *)),
  0, 0, NULL);
   if (flow_cache == NULL)
 -- 
 1.9.1
 



Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems

2015-07-21 Thread Nishanth Aravamudan
On 21.07.2015 [11:30:58 -0500], Chris J Arges wrote:
 On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote:
  On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
   Some architectures like POWER can have a NUMA node_possible_map that
   contains sparse entries. This causes memory corruption with openvswitch
   since it allocates flow_cache with a multiple of num_possible_nodes() and
  
  Couldn't this also be fixed by just allocating with a multiple of
  nr_node_ids (which seems to have been the original intent all along)?
  You could then make your stats array be sparse or not.
  
 
 Yea originally this is what I did, but I thought it would be wasting memory.
 
   assumes the node variable returned by for_each_node will index into
   flow->stats[node].
   
   For example, if node_possible_map is 0x30003, this patch will map node to
   node_cnt as follows:
   0,1,16,17 = 0,1,2,3
   
   The crash was noticed after 3af229f2 was applied as it changed the
   node_possible_map to match node_online_map on boot.
   Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
  
  My concern with this version of the fix is that you're relying on,
  implicitly, the order of for_each_node's iteration corresponding to the
  entries in stats 1:1. But what about node hotplug? It seems better to
  have the enumeration of the stats array match the topology accurately,
  rather, or to maintain some sort of internal map in the OVS code between
  the NUMA node and the entry in the stats array?
  
  I'm willing to be convinced otherwise, though :)
  
  -Nish
 
 
 Nish,
 
 The method I described should work for hotplug since it's using possible map
 which AFAIK is static rather than the online map. 

Oh you're right, I'm sorry!



Re: [PATCH iproute2] Use PATH_MAX instead of MAXPATHLEN

2015-07-21 Thread Yegor Yefremov
On Wed, Apr 29, 2015 at 6:52 PM, Felix Janda felix.ja...@posteo.de wrote:
 Florian Fainelli wrote:
 On 27/04/15 09:13, Stephen Hemminger wrote:
  On Sat, 25 Apr 2015 22:33:28 +0200
  Felix Janda felix.ja...@posteo.de wrote:
 
  They are equivalent but the former is more common. PATH_MAX is
  specified by POSIX and needs limits.h while MAXPATHLEN has BSD
  origin and needs sys/param.h.
 
  PATH_MAX has already been in use in misc/lnstat.h.
 
  Signed-off-by: Felix Janda felix.ja...@posteo.de
 
  Iproute2 is intended for use on Linux.
  It makes more sense to align with Posix than using leftover
  BSD stuff. Therefore I don't see any point in doing this.

 My reading from Felix's commit message is that he is attempting to do
 exactly that: conform to POSIX rather than BSD, which seems to be the
 direction you are also suggesting here.
 --
 Florian

 This is correct. (In fact I misread the end of Stephen's message,
 thought that the patch was merged and wanted to thank for that.)

What's the status of this patch? This is one of the reasons iproute2
cannot be compiled against musl C library. After fixing this I get
tons of redefine errors:

In file included from ../include/linux/xfrm.h:4:0,
 from xfrm_state.c:31:
../include/linux/in6.h:32:8: error: redefinition of ‘struct in6_addr’
 struct in6_addr {
^
In file included from
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netdb.h:9:0,
 from xfrm_state.c:30:
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netinet/in.h:24:8:
note: originally defined here
 struct in6_addr
^
In file included from ../include/linux/xfrm.h:4:0,
 from xfrm_state.c:31:
../include/linux/in6.h:40:0: warning: s6_addr redefined
 #define s6_addr   in6_u.u6_addr8
 ^
In file included from
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netdb.h:9:0,
 from xfrm_state.c:30:
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netinet/in.h:32:0:
note: this is the location of the previous definition
 #define s6_addr __in6_union.__s6_addr
 ^

Yegor
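
For context, the change under discussion amounts to sizing path buffers with PATH_MAX from limits.h instead of MAXPATHLEN from sys/param.h. A minimal sketch of that usage follows; join_path() is a made-up helper for illustration, not an iproute2 function, and the feature-test macro is an assumption that keeps PATH_MAX visible under strict compiler modes.

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: expose PATH_MAX in strict modes */

#include <assert.h>
#include <limits.h>   /* PATH_MAX (POSIX); replaces <sys/param.h> MAXPATHLEN */
#include <stdio.h>

/* Join dir and name into dst; returns 0 on success, -1 if it would not fit. */
static int join_path(char dst[PATH_MAX], const char *dir, const char *name)
{
	int n;

	assert(dst && dir && name);
	n = snprintf(dst, PATH_MAX, "%s/%s", dir, name);
	return (n < 0 || n >= PATH_MAX) ? -1 : 0;
}
```

Only the header and the macro name change; the buffer size is the same on systems where both macros are defined.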


Re: Why return E2BIG from bpf map update?

2015-07-21 Thread Alexei Starovoitov

On 7/21/15 3:13 AM, Alex Gartrell wrote:

But, the EINVAL errno has similarly been
abused to death


There was a thread a few months ago trying to come up with
a generic solution for aliased error codes, but unfortunately
nothing concrete came out of it.
The proposal I liked was that the kernel could extend the
syscall interface to return a string together with errno,
but that is quite hard to do at present.
Maybe extensions to vdso data writable by the kernel could
improve the situation.



Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested

2015-07-21 Thread Arnaud Ebalard
Hi guys,

Florian Fainelli f.faine...@gmail.com writes:

 Changes in v5:

 - removed an invalid use of the link_update callback in the SF2 driver
   that appeared after merging "net: phy: fixed_phy: handle link-down case"

 - reworded the commit message for patch 2 to make it clear what it fixes and
   why this is required

 Initial cover letter from Stas:

 Hello.

 Currently the link status auto-negotiation is enabled
 for any SGMII link with fixed-link DT binding.
 The regression was reported:
 https://lkml.org/lkml/2015/7/8/865
 Apparently not all HW that implements SGMII protocol, generates the
 inband status for the auto-negotiation to work.
 More details here:
 https://lkml.org/lkml/2015/7/10/206

 The following patches revert to the old behavior by default,
 which is to not enable the auto-negotiation for fixed-link.
 A new DT property is added that allows the auto-negotiation to be
 requested explicitly.

FWIW, I tested this v5 series on mirabox (2 mvneta interfaces using
RGMII); both interfaces still work as expected, i.e. no regression
on my side.

a+


Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen

2015-07-21 Thread Eric Dumazet
On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:

  -   kfree_skb(skb);
  +   INIT_LIST_HEAD(&q->new_flows);
  +   INIT_LIST_HEAD(&q->old_flows);
  +   for (i = 0; i < q->flows_cnt; i++) {
  +   struct fq_codel_flow *flow = q->flows + i;
  +
  +   while (flow->head)
  +   kfree_skb(dequeue_head(flow));
  +
  +   INIT_LIST_HEAD(&flow->flowchain);
 
 
 You probably need to call codel_vars_init(&flow->cvars) as well.

It is not necessary : flow->cvars only matter in the event of a dequeue,
but whole qdisc is dismantled and no packet will be dequeued.




RE: [PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring

2015-07-21 Thread Duan Andy
From: Lucas Stach l.st...@pengutronix.de Sent: Tuesday, July 21, 2015 11:11 PM
 To: David S. Miller
 Cc: Duan Fugang-B38611; Li Frank-B20596; netdev@vger.kernel.org;
 ker...@pengutronix.de; patchwork-...@pengutronix.de
 Subject: [PATCH v2 1/2] net: fec: use managed DMA API functions to
 allocate BD ring
 
 So it gets freed when the device is going away.
 This fixes a DMA memory leak on driver probe() fail and driver remove().
 
 Signed-off-by: Lucas Stach l.st...@pengutronix.de
 ---
 v2: Fix indentation of second line to fix alignment with opening bracket.
 ---
  drivers/net/ethernet/freescale/fec_main.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/net/ethernet/freescale/fec_main.c
 b/drivers/net/ethernet/freescale/fec_main.c
 index 349365d85b92..a7f1bdf718f8 100644
 --- a/drivers/net/ethernet/freescale/fec_main.c
 +++ b/drivers/net/ethernet/freescale/fec_main.c
 @@ -3142,8 +3142,8 @@ static int fec_enet_init(struct net_device *ndev)
   fep->bufdesc_size;
 
   /* Allocate memory for buffer descriptors. */
 - cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
 -   GFP_KERNEL);
 + cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
 +GFP_KERNEL);
   if (!cbd_base) {
   return -ENOMEM;
   }
 --

Can you also replace the dma_alloc_coherent() call below with dmam_alloc_coherent()?
txq->tso_hdrs = dma_alloc_coherent(NULL,
txq->tx_ring_size * TSO_HEADER_SIZE,
&txq->tso_hdrs_dma,
GFP_KERNEL);


Regards,
Andy


[PATCH] ravb: fix ring memory allocation

2015-07-21 Thread Sergei Shtylyov
The driver is written as if it can adapt to a low memory situation by allocating
fewer RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In
reality, though, the driver would malfunction in this case. Stop being overly
smart and just fail in such a situation; this is achieved by moving the memory
allocation from ravb_ring_format() to ravb_ring_init().

We leave the dma_map_single() calls in place but make their failure non-fatal
by marking the corresponding RX descriptors with zero data size, which should
prevent DMA to an invalid address.

Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com

---
The patch is against Dave Miller's 'net.git' repo.

drivers/net/ethernet/renesas/ravb_main.c |   59 +--
 1 file changed, 34 insertions(+), 25 deletions(-)

Index: net/drivers/net/ethernet/renesas/ravb_main.c
===
--- net.orig/drivers/net/ethernet/renesas/ravb_main.c
+++ net/drivers/net/ethernet/renesas/ravb_main.c
@@ -228,9 +228,7 @@ static void ravb_ring_format(struct net_
struct ravb_desc *desc = NULL;
int rx_ring_size = sizeof(*rx_desc) * priv->num_rx_ring[q];
int tx_ring_size = sizeof(*tx_desc) * priv->num_tx_ring[q];
-   struct sk_buff *skb;
dma_addr_t dma_addr;
-   void *buffer;
int i;
 
priv->cur_rx[q] = 0;
@@ -241,41 +239,28 @@ static void ravb_ring_format(struct net_
memset(priv->rx_ring[q], 0, rx_ring_size);
/* Build RX ring buffer */
for (i = 0; i < priv->num_rx_ring[q]; i++) {
-   priv->rx_skb[q][i] = NULL;
-   skb = netdev_alloc_skb(ndev, PKT_BUF_SZ + RAVB_ALIGN - 1);
-   if (!skb)
-   break;
-   ravb_set_buffer_align(skb);
/* RX descriptor */
rx_desc = &priv->rx_ring[q][i];
/* The size of the buffer should be on 16-byte boundary. */
rx_desc->ds_cc = cpu_to_le16(ALIGN(PKT_BUF_SZ, 16));
-   dma_addr = dma_map_single(&ndev->dev, skb->data,
+   dma_addr = dma_map_single(&ndev->dev, priv->rx_skb[q][i]->data,
  ALIGN(PKT_BUF_SZ, 16),
  DMA_FROM_DEVICE);
-   if (dma_mapping_error(&ndev->dev, dma_addr)) {
-   dev_kfree_skb(skb);
-   break;
-   }
-   priv->rx_skb[q][i] = skb;
+   /* We just set the data size to 0 for a failed mapping which
+* should prevent DMA from happening...
+*/
+   if (dma_mapping_error(&ndev->dev, dma_addr))
+   rx_desc->ds_cc = cpu_to_le16(0);
rx_desc->dptr = cpu_to_le32(dma_addr);
rx_desc->die_dt = DT_FEMPTY;
}
rx_desc = &priv->rx_ring[q][i];
rx_desc->dptr = cpu_to_le32((u32)priv->rx_desc_dma[q]);
rx_desc->die_dt = DT_LINKFIX; /* type */
-   priv->dirty_rx[q] = (u32)(i - priv->num_rx_ring[q]);
 
memset(priv->tx_ring[q], 0, tx_ring_size);
/* Build TX ring buffer */
for (i = 0; i < priv->num_tx_ring[q]; i++) {
-   priv->tx_skb[q][i] = NULL;
-   priv->tx_buffers[q][i] = NULL;
-   buffer = kmalloc(PKT_BUF_SZ + RAVB_ALIGN - 1, GFP_KERNEL);
-   if (!buffer)
-   break;
-   /* Aligned TX buffer */
-   priv->tx_buffers[q][i] = buffer;
tx_desc = &priv->tx_ring[q][i];
tx_desc->die_dt = DT_EEMPTY;
}
}
@@ -298,7 +283,10 @@ static void ravb_ring_format(struct net_
 static int ravb_ring_init(struct net_device *ndev, int q)
 {
struct ravb_private *priv = netdev_priv(ndev);
+   struct sk_buff *skb;
int ring_size;
+   void *buffer;
+   int i;
 
/* Allocate RX and TX skb rings */
priv->rx_skb[q] = kcalloc(priv->num_rx_ring[q],
@@ -308,12 +296,28 @@ static int ravb_ring_init(struct net_dev
if (!priv->rx_skb[q] || !priv->tx_skb[q])
goto error;
 
+   for (i = 0; i < priv->num_rx_ring[q]; i++) {
+   skb = netdev_alloc_skb(ndev, PKT_BUF_SZ + RAVB_ALIGN - 1);
+   if (!skb)
+   goto error;
+   ravb_set_buffer_align(skb);
+   priv->rx_skb[q][i] = skb;
+   }
+
/* Allocate rings for the aligned buffers */
priv->tx_buffers[q] = kcalloc(priv->num_tx_ring[q],
  sizeof(*priv->tx_buffers[q]), GFP_KERNEL);
if (!priv->tx_buffers[q])
goto error;
 
+   for (i = 0; i < priv->num_tx_ring[q]; i++) {
+   buffer = kmalloc(PKT_BUF_SZ + RAVB_ALIGN - 1, GFP_KERNEL);
+   if (!buffer)
+   goto error;
+   /* Aligned TX buffer */
+   priv->tx_buffers[q][i] = buffer;
+   }
+   }
+
/* Allocate all RX 
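
The "allocate everything up front or fail" approach that the ravb patch above describes can be sketched in plain user-space C. This is only an illustration under stated assumptions: struct ring, ring_init(), ring_free() and the fail_at test hook are hypothetical names, and malloc()/calloc() stand in for the kernel allocators used by the driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-queue ring; not the ravb structures. */
struct ring {
	size_t count;
	void **bufs;
};

/* Free whatever was allocated so far; safe on a partially built ring. */
void ring_free(struct ring *r)
{
	size_t i;

	if (r->bufs) {
		for (i = 0; i < r->count; i++)
			free(r->bufs[i]);	/* free(NULL) is a no-op */
		free(r->bufs);
	}
	r->bufs = NULL;
	r->count = 0;
}

/*
 * Allocate every buffer up front.  Returns 0 on success and -1 if any
 * single allocation fails, in which case nothing is left allocated.
 * fail_at is a test hook (an assumption of this sketch): the index at
 * which to simulate an allocation failure, or (size_t)-1 for never.
 */
int ring_init(struct ring *r, size_t count, size_t buf_size, size_t fail_at)
{
	size_t i;

	r->count = count;
	r->bufs = calloc(count, sizeof(*r->bufs));
	if (!r->bufs)
		goto error;

	for (i = 0; i < count; i++) {
		r->bufs[i] = (i == fail_at) ? NULL : malloc(buf_size);
		if (!r->bufs[i])
			goto error;
	}
	return 0;

error:
	ring_free(r);
	return -1;
}
```

The point of the design is that the caller never sees a partially populated ring: either every slot holds a buffer, or the init call reports failure and the ring is empty.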

Re: [PATCHv2 net-next] net: #ifdefify sk_classid member of struct sock

2015-07-21 Thread David Miller
From: Mathias Krause mini...@googlemail.com
Date: Sun, 19 Jul 2015 22:21:13 +0200

 The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is
 enabled. #ifdefify it to reduce the size of struct sock on 32 bit
 systems, at least.
 
 Signed-off-by: Mathias Krause mini...@googlemail.com
 ---
 v2:
 - ensure we'll error out in nft_meta_get_init() if CONFIG_CGROUP_NET_CLASSID
   is not set

Applied, thanks.


Re: pull-request: wireless-drivers 2015-07-20

2015-07-21 Thread David Miller
From: Kalle Valo kv...@codeaurora.org
Date: Mon, 20 Jul 2015 18:36:30 +0300

 here are few fixes for 4.2, should not have anything out of ordinary.
 Please let me know if there are any issues.

Pulled, thanks.


[PATCH net-next] net: track success and failure of TCP PMTU probing

2015-07-21 Thread Rick Jones
From: Rick Jones rick.jon...@hp.com

Track success and failure of TCP PMTU probing.

Signed-off-by: Rick Jones rick.jon...@hp.com

---

Tested by loading-up into an OpenStack instance and kicking the MTU
out from under it in the corresponding router namespace.

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index eee8968..25a9ad8 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -278,6 +278,8 @@ enum
LINUX_MIB_TCPACKSKIPPEDCHALLENGE,   /* TCPACKSkippedChallenge */
LINUX_MIB_TCPWINPROBE,  /* TCPWinProbe */
LINUX_MIB_TCPKEEPALIVE, /* TCPKeepAlive */
+   LINUX_MIB_TCPMTUPFAIL,  /* TCPMTUPFail */
+   LINUX_MIB_TCPMTUPSUCCESS,   /* TCPMTUPSuccess */
__LINUX_MIB_MAX
 };
 
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index da5d483..3abd9d7 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -300,6 +300,8 @@ static const struct snmp_mib snmp4_net_list[] = {
SNMP_MIB_ITEM(TCPACKSkippedChallenge, LINUX_MIB_TCPACKSKIPPEDCHALLENGE),
SNMP_MIB_ITEM(TCPWinProbe, LINUX_MIB_TCPWINPROBE),
SNMP_MIB_ITEM(TCPKeepAlive, LINUX_MIB_TCPKEEPALIVE),
+   SNMP_MIB_ITEM(TCPMTUPFail, LINUX_MIB_TCPMTUPFAIL),
+   SNMP_MIB_ITEM(TCPMTUPSuccess, LINUX_MIB_TCPMTUPSUCCESS),
SNMP_MIB_SENTINEL
 };
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1578fc2..cda3ffe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2593,6 +2593,7 @@ static void tcp_mtup_probe_failed(struct sock *sk)
 
icsk->icsk_mtup.search_high = icsk->icsk_mtup.probe_size - 1;
icsk->icsk_mtup.probe_size = 0;
+   NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPMTUPFAIL);
 }
 
 static void tcp_mtup_probe_success(struct sock *sk)
@@ -2612,6 +2613,7 @@ static void tcp_mtup_probe_success(struct sock *sk)
icsk->icsk_mtup.search_low = icsk->icsk_mtup.probe_size;
icsk->icsk_mtup.probe_size = 0;
tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
+   NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPMTUPSUCCESS);
 }
 
 /* Do a simple retransmit without using the backoff mechanisms in


Re: [PATCH net-next v4] ipv6: sysctl to restrict candidate source addresses

2015-07-21 Thread David Miller
From: Erik Kline e...@google.com
Date: Mon, 20 Jul 2015 16:06:34 +0200

 I thought perhaps use_oif_addr_only was a slightly clearer sysctl name.
 
 (Maybe it should be plural, use_oif_addrs_only?)

I think plural would be better too, please respin with that change.

Thanks.


Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested

2015-07-21 Thread David Miller
From: Florian Fainelli f.faine...@gmail.com
Date: Mon, 20 Jul 2015 17:49:54 -0700

 Changes in v5:
 
 - removed an invalid use of the link_update callback in the SF2 driver
   which appeared after merging "net: phy: fixed_phy: handle link-down case"
 
 - reworded the commit message for patch 2 to make it clear what it fixes and
   why this is required

Series applied, thanks Florian.


Re: [PATCH net-next 1/1] tipc: fix compatibility bug

2015-07-21 Thread David Miller
From: Jon Maloy jon.ma...@ericsson.com
Date: Tue, 21 Jul 2015 06:42:28 -0400

 In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6
 (tipc: reduce locking scope during packet reception) we introduced
 a new function tipc_link_proto_rcv(). This function contains a bug,
 so that it sometimes by error sends out a non-zero link priority value
 in created protocol messages.
 
 The bug may lead to an extra link reset at initial link establishing
 with older nodes. This will never happen more than once, whereafter
 the link will work as intended.
 
 We fix this bug in this commit.
 
 Signed-off-by: Jon Maloy jon.ma...@ericsson.com

Applied.


[Patch net] sch_plug: purge buffered packets during reset

2015-07-21 Thread Cong Wang
Otherwise the skbuff related structures are not correctly
refcount'ed.

Cc: Jamal Hadi Salim j...@mojatatu.com
Signed-off-by: Cong Wang xiyou.wangc...@gmail.com
---
 net/sched/sch_plug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index 89f8fcf..ade9445 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -216,6 +216,7 @@ static struct Qdisc_ops plug_qdisc_ops __read_mostly = {
.peek=   qdisc_peek_head,
.init=   plug_init,
.change  =   plug_change,
+   .reset   =   qdisc_reset_queue,
.owner   =   THIS_MODULE,
 };
 
-- 
1.8.3.1



[net-next:master 194/208] include/net/dst_metadata.h:39:4: error: implicit declaration of function 'lwt_tun_info'

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: 3093fbe7ff4bc7d1571fc217dade1cf80330a714 [194/208] route: Per route IP 
tunnel metadata via lightweight tunnel
config: i386-randconfig-i0-201529 (attached as .config)
reproduce:
  git checkout 3093fbe7ff4bc7d1571fc217dade1cf80330a714
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from net/core/dst.c:25:0:
   include/net/dst_metadata.h: In function 'skb_tunnel_info':
>> include/net/dst_metadata.h:39:4: error: implicit declaration of function 'lwt_tun_info' [-Werror=implicit-function-declaration]
       return lwt_tun_info(rt->rt_lwtstate);
       ^
>> include/net/dst_metadata.h:39:4: warning: return makes pointer from integer without a cast
   cc1: some warnings being treated as errors

vim +/lwt_tun_info +39 include/net/dst_metadata.h

    33  return &md_dst->u.tun_info;
    34  
    35  switch (family) {
    36  case AF_INET:
    37  rt = (struct rtable *)skb_dst(skb);
    38  if (rt && rt->rt_lwtstate)
  > 39  return lwt_tun_info(rt->rt_lwtstate);
    40  break;
    41  }
    42  
42  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.2.0-rc2 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT=elf32-i386
CONFIG_ARCH_DEFCONFIG=arch/x86/configs/i386_defconfig
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS=-fcall-saved-ecx -fcall-saved-edx
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_KERNEL_LZ4=y
CONFIG_DEFAULT_HOSTNAME=(none)
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_DEBUG=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
# CONFIG_RCU_STALL_COMMON is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
# CONFIG_IKCONFIG_PROC is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_MEMCG is not set
# CONFIG_CGROUP_PERF is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
CONFIG_BLK_CGROUP=y
CONFIG_DEBUG_BLK_CGROUP=y
# CONFIG_CHECKPOINT_RESTORE is not set

[net-next:master 195/208] net/core/fib_rules.c:418:3: error: implicit declaration of function 'ip_tunnel_need_metadata'

2015-07-21 Thread kbuild test robot
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: e7030878fc8448492b6e5cecd574043f63271298 [195/208] fib: Add fib rule 
match on tunnel id
config: i386-randconfig-r0-201529 (attached as .config)
reproduce:
  git checkout e7030878fc8448492b6e5cecd574043f63271298
  # save the attached .config to linux build tree
  make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   net/core/fib_rules.c: In function 'fib_nl_newrule':
>> net/core/fib_rules.c:418:3: error: implicit declaration of function 'ip_tunnel_need_metadata' [-Werror=implicit-function-declaration]
      ip_tunnel_need_metadata();
      ^
   net/core/fib_rules.c: In function 'fib_nl_delrule':
>> net/core/fib_rules.c:505:4: error: implicit declaration of function 'ip_tunnel_unneed_metadata' [-Werror=implicit-function-declaration]
       ip_tunnel_unneed_metadata();
       ^
   cc1: some warnings being treated as errors

vim +/ip_tunnel_need_metadata +418 net/core/fib_rules.c

   412  ops->nr_goto_rules++;
   413  
   414  if (unresolved)
   415  ops->unresolved_rules++;
   416  
   417  if (rule->tun_id)
 > 418  ip_tunnel_need_metadata();
   419  
   420  notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).portid);
   421  flush_route_cache(ops);
   422  rules_ops_put(ops);
   423  return 0;
   424  
   425  errout_free:
   426  kfree(rule);
   427  errout:
   428  rules_ops_put(ops);
   429  return err;
   430  }
   431  
   432  static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
   433  {
   434  struct net *net = sock_net(skb->sk);
   435  struct fib_rule_hdr *frh = nlmsg_data(nlh);
   436  struct fib_rules_ops *ops = NULL;
   437  struct fib_rule *rule, *tmp;
   438  struct nlattr *tb[FRA_MAX+1];
   439  int err = -EINVAL;
   440  
   441  if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
   442  goto errout;
   443  
   444  ops = lookup_rules_ops(net, frh->family);
   445  if (ops == NULL) {
   446  err = -EAFNOSUPPORT;
   447  goto errout;
   448  }
   449  
   450  err = nlmsg_parse(nlh, sizeof(*frh), tb, FRA_MAX, ops->policy);
   451  if (err < 0)
   452  goto errout;
   453  
   454  err = validate_rulemsg(frh, tb, ops);
   455  if (err < 0)
   456  goto errout;
   457  
   458  list_for_each_entry(rule, &ops->rules_list, list) {
   459  if (frh->action && (frh->action != rule->action))
   460  continue;
   461  
   462  if (frh_get_table(frh, tb) &&
   463  (frh_get_table(frh, tb) != rule->table))
   464  continue;
   465  
   466  if (tb[FRA_PRIORITY] &&
   467  (rule->pref != nla_get_u32(tb[FRA_PRIORITY])))
   468  continue;
   469  
   470  if (tb[FRA_IIFNAME] &&
   471  nla_strcmp(tb[FRA_IIFNAME], rule->iifname))
   472  continue;
   473  
   474  if (tb[FRA_OIFNAME] &&
   475  nla_strcmp(tb[FRA_OIFNAME], rule->oifname))
   476  continue;
   477  
   478  if (tb[FRA_FWMARK] &&
   479  (rule->mark != nla_get_u32(tb[FRA_FWMARK])))
   480  continue;
   481  
   482  if (tb[FRA_FWMASK] &&
   483  (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
   484  continue;
   485  
   486  if (tb[FRA_TUN_ID] &&
   487  (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
   488  continue;
   489  
   490  if (!ops->compare(rule, frh, tb))
   491  continue;
   492  
   493  if (rule->flags & FIB_RULE_PERMANENT) {
   494  err = -EPERM;
   495  goto errout;
   496  }
   497  
   498  if (ops->delete) {
   499  err = ops->delete(rule);
   500  if (err)
   501  goto errout;
   502  }
   503  
   504  if (rule->tun_id)
 > 505  ip_tunnel_unneed_metadata();
   506  
   507  list_del_rcu(&rule->list);
   508  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.2.0-rc2 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y

[PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-21 Thread Martin KaFai Lau
The patch checks neigh->nud_state before acquiring the writer lock.
Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

I also take this chance to re-arrange the code.

40 udpflood processes and a /64 gateway route are used.
The gateway has NUD_PERMANENT.  Each of them is run for 30s.
At the end, the total number of finished sendto():

Before    After
55M       95M

Signed-off-by: Martin KaFai Lau ka...@fb.com
Cc: Hannes Frederic Sowa han...@stressinduktion.org
---
 net/ipv6/route.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6090969..a6c6b5a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w)
 
 static void rt6_probe(struct rt6_info *rt)
 {
+   struct __rt6_probe_work *work;
struct neighbour *neigh;
/*
 * Okay, this does not seem to be appropriate
@@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt)
rcu_read_lock_bh();
neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
if (neigh) {
-   write_lock(&neigh->lock);
if (neigh->nud_state & NUD_VALID)
goto out;
-   }
-
-   if (!neigh ||
-   time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
-   struct __rt6_probe_work *work;
 
+   work = NULL;
+   write_lock(&neigh->lock);
+   if (!(neigh->nud_state & NUD_VALID) &&
+   time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
+   work = kmalloc(sizeof(*work), GFP_ATOMIC);
+   if (work) {
+   __neigh_set_probe_once(neigh);
+   }
+   }
+   write_unlock(&neigh->lock);
+   } else {
work = kmalloc(sizeof(*work), GFP_ATOMIC);
+   }
 
-   if (neigh && work)
-   __neigh_set_probe_once(neigh);
-
-   if (neigh)
-   write_unlock(&neigh->lock);
+   if (work) {
+   INIT_WORK(&work->work, rt6_probe_deferred);
+   work->target = rt->rt6i_gateway;
+   dev_hold(rt->dst.dev);
+   work->dev = rt->dst.dev;
+   schedule_work(&work->work);
+   }
 
-   if (work) {
-   INIT_WORK(&work->work, rt6_probe_deferred);
-   work->target = rt->rt6i_gateway;
-   dev_hold(rt->dst.dev);
-   work->dev = rt->dst.dev;
-   schedule_work(&work->work);
-   }
-   } else {
 out:
-   write_unlock(&neigh->lock);
-   }
rcu_read_unlock_bh();
 }
 #else
-- 
1.8.1
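
The idea in the patch above (test the neighbour state before taking the writer lock, then re-check under the lock) can be modelled in a few lines of user-space C. This is a single-threaded sketch under stated assumptions: struct neigh_model, should_probe() and the lock_taken counter are hypothetical, the lock is only counted rather than taken, and the plain comparison does not handle jiffies wrap-around the way time_after() does.

```c
#include <assert.h>
#include <stdbool.h>

#define NUD_VALID 0x1u		/* stand-in flag, not the kernel's bitmask */

struct neigh_model {
	unsigned int nud_state;
	unsigned long updated;		/* time of the last update */
	unsigned int lock_taken;	/* counts simulated write_lock() calls */
};

/* Returns true when a probe should be scheduled. */
bool should_probe(struct neigh_model *n, unsigned long now,
		  unsigned long interval)
{
	bool probe = false;

	/* Fast path: a valid neighbour needs no probe and no lock. */
	if (n->nud_state & NUD_VALID)
		return false;

	n->lock_taken++;		/* write_lock() would go here */
	/* Re-check under the lock; the state may have changed meanwhile. */
	if (!(n->nud_state & NUD_VALID) && now > n->updated + interval)
		probe = true;
	/* write_unlock() would go here */

	return probe;
}
```

With a permanently valid gateway, as in the benchmark described in the patch, every call returns on the fast path and the lock counter stays at zero.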



[Patch net] sch_choke: drop all packets in queue during reset

2015-07-21 Thread Cong Wang
Signed-off-by: Cong Wang xiyou.wangc...@gmail.com
---
 net/sched/sch_choke.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 93d5742..6a783af 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -385,6 +385,19 @@ static void choke_reset(struct Qdisc *sch)
 {
struct choke_sched_data *q = qdisc_priv(sch);
 
+   while (q->head != q->tail) {
+   struct sk_buff *skb = q->tab[q->head];
+
+   q->head = (q->head + 1) & q->tab_mask;
+   if (!skb)
+   continue;
+   qdisc_qstats_backlog_dec(sch, skb);
+   --sch->q.qlen;
+   qdisc_drop(skb, sch);
+   }
+
+   memset(q->tab, 0, (q->tab_mask + 1) * sizeof(struct sk_buff *));
+   q->head = q->tail = 0;
red_restart(&q->vars);
 }
 
-- 
1.8.3.1
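
The drain loop in the choke patch above walks a power-of-two table from head to tail, wrapping with a mask and skipping empty slots. A minimal user-space sketch of the same walk follows; struct ring, ring_reset() and the drop callback are hypothetical names chosen for illustration, not the qdisc API.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_SIZE 8u			/* must be a power of two */

struct ring {
	void *tab[TAB_SIZE];
	unsigned int head, tail;
	unsigned int tab_mask;		/* TAB_SIZE - 1 */
	unsigned int qlen;		/* number of non-NULL slots */
};

/* Drain every slot between head and tail; returns how many items were dropped. */
unsigned int ring_reset(struct ring *q, void (*drop)(void *))
{
	unsigned int dropped = 0;

	while (q->head != q->tail) {
		void *item = q->tab[q->head];

		q->tab[q->head] = NULL;
		q->head = (q->head + 1) & q->tab_mask;	/* wrap around */
		if (!item)
			continue;	/* hole left by an earlier removal */
		q->qlen--;
		dropped++;
		if (drop)
			drop(item);
	}
	q->head = q->tail = 0;
	return dropped;
}
```

Masking instead of using a modulo only works because the table size is a power of two, which is why the sketch keeps tab_mask equal to TAB_SIZE - 1.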



Re: [PATCH net] caif: fix leaks and race in caif_queue_rcv_skb()

2015-07-21 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Fri, 17 Jul 2015 10:19:23 +0200

 From: Eric Dumazet eduma...@google.com
 
 1) If sk_filter() is applied, skb was leaked (not freed)
 2) Testing SOCK_DEAD twice is racy :
packet could be freed while already queued.
 3) Remove obsolete comment about caching skb->len
 
 Signed-off-by: Eric Dumazet eduma...@google.com

Applied, thanks Eric.


Re: [PATCH] sctp: fix cut and paste issue in comment

2015-07-21 Thread David Miller
From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
Date: Fri, 17 Jul 2015 13:50:21 -0300

 Cookie ACK is always received by the association initiator, so fix the
 comment to avoid confusion.
 
 Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com

Applied, thanks.


Re: [PATCH 1/2] net: fec: use managed DMA API functions to allocate BD ring

2015-07-21 Thread David Miller
From: Lucas Stach l.st...@pengutronix.de
Date: Mon, 20 Jul 2015 15:51:37 +0200

 So it gets freed when the device is going away.
 This fixes a DMA memory leak on driver probe() fail and driver
 remove().
 
 Signed-off-by: Lucas Stach l.st...@pengutronix.de
 ---
  drivers/net/ethernet/freescale/fec_main.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/net/ethernet/freescale/fec_main.c 
 b/drivers/net/ethernet/freescale/fec_main.c
 index 349365d85b92..b3287c6b069b 100644
 --- a/drivers/net/ethernet/freescale/fec_main.c
 +++ b/drivers/net/ethernet/freescale/fec_main.c
 @@ -3142,7 +3142,7 @@ static int fec_enet_init(struct net_device *ndev)
   fep->bufdesc_size;
  
   /* Allocate memory for buffer descriptors. */
 - cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
 + cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
 GFP_KERNEL);

When you change the column of the opening parenthesis of a function call, you
must fix up the indentation of the second and subsequent lines so that they all
properly start at the first column after that opening parenthesis.


Re: [PATCH] net/mdio: fix mdio_bus_match for c45 PHY

2015-07-21 Thread David Miller
From: shh@gmail.com
Date: Fri, 17 Jul 2015 18:07:19 +0800

 From: Shaohui Xie shaohui@freescale.com
 
 We store c45 PHY's id information in c45_ids, so it should be used to
 check the matching between PHY driver and PHY device for c45 PHY.
 
 Signed-off-by: Shaohui Xie shaohui@freescale.com

Applied, thanks.


Re: [PATCH v2] net: mvneta: fix refilling for Rx DMA buffers

2015-07-21 Thread David Miller
From: Simon Guinot simon.gui...@sequanux.org
Date: Sun, 19 Jul 2015 13:00:53 +0200

 With the current code, if a memory allocation error happens while
 refilling a Rx descriptor, then the original Rx buffer is both passed
 to the networking stack (in a SKB) and left in the Rx ring. This leads
 to various kernel oopses and crashes.
 
 As a fix, this patch moves Rx descriptor refilling ahead of building
 SKB with the associated Rx buffer. In case of a memory allocation
 failure, data is dropped and the original DMA buffer is put back into
 the Rx ring.
 
 Signed-off-by: Simon Guinot simon.gui...@sequanux.org
 Fixes: c5aff18204da (net: mvneta: driver for Marvell Armada 370/XP network 
 unit)
 Cc: sta...@vger.kernel.org # v3.8+
 Tested-by: Yoann Sculo yo...@sculo.fr

Applied, thanks.
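
The ordering described in the mvneta patch above (refill first, hand the filled buffer up only if the refill succeeded) can be sketched in user-space C. All names here (rx_one(), alloc_fail(), enum rx_result) are hypothetical, and the allocator is passed in only so that a failure can be simulated; this is not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

enum rx_result { RX_DELIVERED, RX_DROPPED };

/*
 * Process one received buffer held in *slot.  A replacement is allocated
 * first; only if that succeeds is the filled buffer handed to the caller
 * through *delivered.  alloc is injectable so failure can be simulated.
 */
enum rx_result rx_one(void **slot, size_t buf_size,
		      void *(*alloc)(size_t), void **delivered)
{
	void *fresh = alloc(buf_size);

	if (!fresh) {
		/* Out of memory: drop the data and keep reusing the old buffer. */
		*delivered = NULL;
		return RX_DROPPED;
	}

	*delivered = *slot;	/* the stack now owns the filled buffer */
	*slot = fresh;		/* the ring owns the fresh one */
	return RX_DELIVERED;
}

/* Allocator that always fails, for exercising the drop path. */
void *alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}
```

Because ownership changes only after the replacement exists, a buffer is never owned by both the ring and the stack at once, which is the double-ownership situation the commit message describes.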


Re: [PATCH] inet: frags: fix defragmented packet's IP header for af_packet

2015-07-21 Thread Eric Dumazet
 This doesn't compile.

 net/ipv4/ip_fragment.c: In function ‘ip_frag_reasm’:
 net/ipv4/ip_fragment.c:644:23: error: ‘skb’ undeclared (first use in this 
 function)
   ip_send_check(ip_hdr(skb));

This was meant to be
  ip_send_check(iph);

Sorry, will send a v2


Re: [PATCH] phylib: add driver for Teranetics TN2020

2015-07-21 Thread David Miller
From: shh@gmail.com
Date: Fri, 17 Jul 2015 11:19:46 +0800

 From: Shaohui Xie shaohui@freescale.com
 
 Teranetics TN2020 is compliant with IEEE 802.3an 10 Gigabit.
 
 Signed-off-by: Shaohui Xie shaohui@freescale.com

Applied to net-next, thanks.


Re: [PATCH net-next v2] rhashtable: Allow other tasks to be scheduled in large lookup loops

2015-07-21 Thread David Miller
From: Thomas Graf tg...@suug.ch
Date: Fri, 17 Jul 2015 10:52:48 +0200

 Depending on system speed, the large lookup/insert/delete loops of the 
 testsuite can
 take a considerable amount of time to complete causing watchdog warnings to 
 appear.
 Allow other tasks to be scheduled throughout the loops.
 
 Reported-by: Meelis Roos mr...@linux.ee
 Signed-off-by: Thomas Graf tg...@suug.ch
 ---
 v2: Use cond_resched() instead schedule()

Applied, thanks Thomas.


Re: [PATCH net-next] inet: Always increment refcount in inet_twsk_schedule

2015-07-21 Thread Eric Dumazet
On Mon, 2015-07-20 at 19:14 +, subas...@codeaurora.org wrote:
  //Initialize time wait socket and setup timer
  inet_twsk_alloc() tw_refcnt = 0
  __inet_twsk_hashdance() tw_refcnt = 3
  inet_twsk_schedule() tw_refcnt = 4
  inet_twsk_put() tw_refcnt = 3
 
  //Receive packet 1 in timewait state
  tcp_timewait_state_process() - inet_twsk_schedule tw_refcnt = 3 (no
  change)
 
  This is obviously wrong.
 
  If a timewait socket is found, do we increment its refcnt before
  proceeding.
 We do not increment refcount currently when we find a timewait socket.

Actually we do increment refcnt, for every socket found in ehash.

Carefully read again __inet_lookup_established()

This code is generic for ESTABLISHED and TIME-WAIT sockets.

If you found a code that performed the lookup without taking the refcnt,
please point me at it, this would be a serious bug.
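
For readers following the refcount discussion, a common way for a lookup to take a reference safely is an "increment unless zero" operation. The following is a user-space C11 sketch of that general pattern, not the kernel's implementation; struct obj, get_unless_zero() and put() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct obj {
	atomic_int refcnt;	/* 0 means the object is being freed */
};

/* Take a reference unless the count has already dropped to zero. */
bool get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller released the last one. */
bool put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcnt, 1) == 1;
}
```

Under this pattern a lookup that finds an object with a count of 3 leaves it at 4 while the caller works on it, and the matching put brings it back to 3.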

 
  I've received some private mails about tw issues, that turned to be
  caused by buggy drivers or buggy arch specific code.
 
  Are your crashes observed on x86?
 
 This is observed on ARM devices. In the current debug, all time wait
 socket refcount changes were happening in TCP stack only and there was no
 platform / driver code involved.
 
 According to my understanding, we would need to increment the time wait
 socket refcount first before proceeding with any subsequent operations.
 However, I request your expert opinion on this.

Is it some Android kernel ?

Android had private modules that needed an update in 3.18





Re: [PATCH] bonding: correct the MAC address for follow fail_over_mac policy

2015-07-21 Thread Ding Tianhong
On 2015/7/21 11:30, David Miller wrote:
 From: Ding Tianhong dingtianh...@huawei.com
 Date: Thu, 16 Jul 2015 16:30:02 +0800
 
 The follow fail_over_mac policy is useful for multiport devices that
 either become confused or incur a performance penalty when multiple
 ports are programmed with the same MAC address, but two ports can
 still end up with the same MAC address under this policy through the
 following steps:

 1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
    bond0 has the same MAC address as eth0; call it MAC1.

 2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
    eth1 is the backup; eth1 has MAC2.

 3) ifconfig eth0 down
    eth1 becomes the active slave; the bond swaps the MACs of eth0 and eth1,
    so eth1 has MAC1 and eth0 has MAC2.

 4) ifconfig eth1 down
    There is no active slave; eth1 still has MAC1 and eth0 has MAC2.

 5) ifconfig eth0 up
    eth0 becomes the active slave again, and the bond sets eth0 to MAC1.

 Something is wrong here: if you then set eth1 up, eth0 and eth1 will have
 the same MAC address, which breaks this policy for ACTIVE_BACKUP mode.

 This patch fixes the problem by finding the old active slave and
 swapping the MAC addresses before changing the active slave.

 Signed-off-by: Ding Tianhong dingtianh...@huawei.com
 
 Applied and queued up for -stable, thanks.

Thanks David.

Hi Zefan,

Could you please apply this patch to the 3.4 stable tree? I think it will fix
the same problem in that version.

Ding
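
The five steps in the quoted commit message can be replayed with a toy model. This is a sketch under stated assumptions, not the bonding driver: MAC addresses are small integers, struct model_slave and the model_* functions are hypothetical, and the search_old_holder switch only illustrates the idea of finding the old active slave before swapping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bonding slave; MACs are small integers here. */
struct model_slave {
    int mac;
};

/*
 * Make new_active the active slave under a follow-style policy.
 * old_active is NULL when no slave was active (the state after step 4).
 * search_old_holder is a hypothetical switch for the fix idea: look for
 * the slave that still carries the bond's MAC and swap with it.
 */
static void model_set_active(struct model_slave *slaves, size_t n, int bond_mac,
                             struct model_slave *old_active,
                             struct model_slave *new_active,
                             bool search_old_holder)
{
    if (!old_active && search_old_holder) {
        for (size_t i = 0; i < n; i++) {
            if (&slaves[i] != new_active && slaves[i].mac == bond_mac)
                old_active = &slaves[i];
        }
    }
    if (old_active)
        old_active->mac = new_active->mac; /* hand the spare MAC back */
    new_active->mac = bond_mac;            /* active slave follows the bond */
}

/* Replay steps 1-5; returns true if eth0 and eth1 end with the same MAC. */
static bool model_replay_has_duplicate(bool search_old_holder)
{
    enum { MAC1 = 1, MAC2 = 2 };
    struct model_slave s[2] = { { MAC1 }, { MAC2 } }; /* eth0, eth1 */
    const int bond_mac = MAC1;

    /* Step 3: eth0 goes down, eth1 takes over from eth0. */
    model_set_active(s, 2, bond_mac, &s[0], &s[1], search_old_holder);
    /* Steps 4-5: eth1 goes down, then eth0 comes up with no active slave. */
    model_set_active(s, 2, bond_mac, NULL, &s[0], search_old_holder);
    return s[0].mac == s[1].mac;
}
```

Calling model_replay_has_duplicate(false) reproduces the duplicate MAC from step 5, while model_replay_has_duplicate(true) leaves eth0 with MAC1 and eth1 with MAC2.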


Re: [PATCH net] macvtap: fix network header pointer for VLAN tagged pkts

2015-07-21 Thread Ivan Vecera

On 07/20/2015 06:42 PM, Vlad Yasevich wrote:

On 07/20/2015 11:44 AM, Ivan Vecera wrote:

The network header is set at offset ETH_HLEN, but that is not correct for
(multiply) VLAN-tagged packets and results in checksum issues in lower devices.

Signed-off-by: Ivan Vecera ivec...@redhat.com
---
  drivers/net/macvtap.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 3b933bb..cdcbab4 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -796,6 +796,12 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
 	skb_reset_mac_header(skb);
 	skb->protocol = eth_hdr(skb)->h_proto;
 
+	if (skb_vlan_tagged(skb)) {
+		int depth;
+		skb->protocol = __vlan_get_protocol(skb, skb->protocol, &depth);


I don't think this is right.  This would reset the protocol to the encapsulated
protocol which isn't really the case since you are not really stripping vlan
encapsulations.

-vlad

Yup, you are right, skb->protocol should be untouched. Will post v2.

Ivan
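
To illustrate why ETH_HLEN is the wrong network header offset for tagged frames, here is a minimal userspace sketch that walks the VLAN tags of a raw Ethernet frame and returns where the network header starts. It is not the macvtap code: the model_* names and constants are hypothetical, and the sketch only computes the offset, leaving the outer protocol alone as agreed in the review.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_HLEN     14      /* dst MAC + src MAC + EtherType */
#define MODEL_VLAN_HLEN    4       /* TCI + encapsulated EtherType */
#define MODEL_ETH_P_8021Q  0x8100
#define MODEL_ETH_P_8021AD 0x88A8

static uint16_t model_read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the offset of the network header in an Ethernet frame,
 * skipping any number of stacked VLAN tags. Returns 0 if the frame
 * is too short to hold the headers it announces.
 */
static size_t model_network_offset(const uint8_t *frame, size_t len)
{
    size_t off = MODEL_ETH_HLEN;
    uint16_t proto;

    if (len < off)
        return 0;
    proto = model_read_be16(frame + 12);

    while (proto == MODEL_ETH_P_8021Q || proto == MODEL_ETH_P_8021AD) {
        if (len < off + MODEL_VLAN_HLEN)
            return 0;
        /* The encapsulated EtherType is the last two bytes of the tag. */
        proto = model_read_be16(frame + off + 2);
        off += MODEL_VLAN_HLEN;
    }
    return off;
}
```

For an untagged frame the result equals MODEL_ETH_HLEN, and each VLAN tag moves the network header a further four bytes into the frame.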



Re: [PATCH v2] net: ratelimit warnings about dst entry refcount underflow or overflow

2015-07-21 Thread David Miller
From: Konstantin Khlebnikov khlebni...@yandex-team.ru
Date: Fri, 17 Jul 2015 14:01:11 +0300

 The kernel generates a lot of warnings when a dst entry reference counter
 overflows and becomes negative. That bug was seen several times on
 machines with outdated 3.10.y kernels. Most likely it's already fixed
 upstream. In any case, that flood completely kills the machine and makes
 further debugging impossible.
 
 Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru

Applied.
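
As a rough illustration of what rate limiting a warning means, the sketch below allows at most a fixed burst of messages per time window and counts the rest as suppressed. It is a userspace model with hypothetical names (struct model_ratelimit, model_ratelimit_ok) and abstract time ticks, not the kernel's ratelimit implementation and not the code from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: at most burst messages per interval ticks. */
struct model_ratelimit {
    uint64_t interval;     /* window length in ticks */
    unsigned int burst;    /* messages allowed per window */
    uint64_t begin;        /* start of the current window */
    unsigned int printed;  /* messages allowed in the current window */
    unsigned int missed;   /* messages suppressed so far */
};

/* Returns true if the caller may emit its warning at time now. */
static bool model_ratelimit_ok(struct model_ratelimit *rs, uint64_t now)
{
    if (now - rs->begin >= rs->interval) {
        /* A new window starts; the burst budget is available again. */
        rs->begin = now;
        rs->printed = 0;
    }
    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}
```

A caller would wrap the warning in a check such as `if (model_ratelimit_ok(&rs, now))`, so a refcount that underflows thousands of times per second produces only a handful of messages per window.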


Re: [PATCH v2 net-next] net/vxlan: Fix kernel unaligned access in __vxlan_find_mac

2015-07-21 Thread David Miller
From: Sowmini Varadhan sowmini.varad...@oracle.com
Date: Mon, 20 Jul 2015 09:54:50 +0200

 
 __vxlan_find_mac invokes ether_addr_equal on the eth_addr field,
 which triggers unaligned access messages, so rearrange vxlan_fdb
 to avoid this in the most non-intrusive way.
 
 Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
 ---
 v2: Alexander Duyck comments: place eth_addr[] to be 64b aligned

Applied, thanks.
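
The effect of moving eth_addr[] within a structure can be checked with offsetof(). The two layouts below are hypothetical simplifications for illustration, not the real vxlan_fdb definition; the sketch assumes, as the kernel documentation for ether_addr_equal() states, that both addresses must be at least 16-bit aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with the address after two small fields. */
struct model_fdb_before {
    uint64_t links[2];
    uint16_t state;
    uint8_t flags;
    uint8_t eth_addr[6];   /* lands on an odd offset */
};

/* Hypothetical layout with the address placed first among the small fields. */
struct model_fdb_after {
    uint64_t links[2];
    uint8_t eth_addr[6];   /* starts right after the 8-byte members */
    uint16_t state;
    uint8_t flags;
};

/* True if a field at this offset can be read with 16-bit loads. */
static int model_u16_aligned(size_t offset)
{
    return offset % 2 == 0;
}

/* Compile-time check that the rearranged layout keeps the address on an
 * 8-byte boundary relative to the start of the structure. */
_Static_assert(offsetof(struct model_fdb_after, eth_addr) % 8 == 0,
               "eth_addr should start on an 8-byte boundary");
```

With these definitions, offsetof(struct model_fdb_before, eth_addr) is odd, so a comparison that uses 16-bit or wider loads would touch unaligned memory, while the rearranged layout avoids that without adding padding fields.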


Re: [PATCH net] stmmac: fix setting of driver data in stmmac_dvr_probe

2015-07-21 Thread David Miller
From: Joachim Eastwood manab...@gmail.com
Date: Fri, 17 Jul 2015 23:48:17 +0200

 Commit 803f8fc46274b ("stmmac: move driver data setting into
 stmmac_dvr_probe") mistakenly set priv and not priv->dev as
 driver data. This meant that the remove, resume and suspend
 callbacks that fetched and tried to use this data would most
 likely explode. Fix the issue by using the correct variable.
 
 Fixes: 803f8fc46274b ("stmmac: move driver data setting into stmmac_dvr_probe")
 Signed-off-by: Joachim Eastwood manab...@gmail.com

Applied, thanks.
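
The class of bug described in this commit message can be modeled in a few lines. The sketch is a userspace illustration only: the model_* structures and functions are hypothetical and merely mirror the relationship between a private structure, its net device, and an untyped driver data pointer.

```c
#include <stddef.h>

struct model_priv;

/* Hypothetical stand-in for a network device. */
struct model_netdev {
    const char *name;
    struct model_priv *priv;
};

/* Hypothetical stand-in for the driver's private data. */
struct model_priv {
    struct model_netdev *dev;
};

/* Hypothetical stand-in for the bus device that carries driver data. */
struct model_device {
    void *driver_data;   /* untyped, so the compiler cannot catch a mix-up */
};

/* Probe stores the net device, which is what the callbacks expect. */
static void model_probe(struct model_device *device, struct model_priv *priv)
{
    /* Storing priv here instead would make model_resume() read a
     * model_priv as if it were a model_netdev. */
    device->driver_data = priv->dev;
}

/* A callback in the style of remove, suspend or resume. */
static const char *model_resume(struct model_device *device)
{
    struct model_netdev *ndev = device->driver_data;

    return ndev->name;
}
```

Because driver_data is a void pointer, storing the wrong object compiles cleanly and only fails when a callback dereferences it, which matches the "would most likely explode" description above.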