date:20150813

Re: [PATCH iproute2 2/3] ip-link: fix and extend documentation

2015-08-13 Thread vkochan

On Wed, Aug 12, 2015 at 10:04:07PM +0200, Pavel Šimerda wrote:
 From: Pavel Šimerda psime...@redhat.com
 
  * Add `can` to list of supported link types
  * Document `addrgenmode`
  * Document `link-netnsid`
  * Document VLAN link type
  * Improve VXLAN link type documentation
 - Fix VXLAN srcport/dstport docs
 - Document `udpcsum`, `udp6zerocsumtx` and `udp6zerocsumrx`
 ---
  man/man8/ip-link.8.in | 112 
 --
  1 file changed, 108 insertions(+), 4 deletions(-)
 
 diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
 index 372e6c6..9ff9a23 100644
 --- a/man/man8/ip-link.8.in
 +++ b/man/man8/ip-link.8.in
 @@ -143,9 +143,13 @@ ip-link \- network device configuration
  ] |
  .br
  .B master
 -.IR DEVICE
 +.IR DEVICE  |
  .br
 -.B nomaster
 +.B nomaster  |
 +.br
 +.B addrgenmode { eui64 | none }
 +.br
 +.B link-netnsid ID
  .BR  }
  
  
 @@ -185,6 +189,8 @@ Link types:
  .sp
  .B bond
  - Bonding device
 +.B can
 +- Controller Area Network interface
  .sp
  .B dummy
  - Dummy network interface
 @@ -266,6 +272,66 @@ specifies the number of receive queues for new device.
  specifies the desired index of the new virtual device. The link creation 
 fails, if the index is busy.
  
  .TP
 +VLAN Type Support
 +For a link of type
 +.I VLAN
 +the following additional arguments are supported:
 +
 +.BI ip link add
 +.BI link  DEVICE 
 +.BI name  NAME 
 +.BI type  vlan 
 +.R  [ 
 +.BI protocol  VLAN_PROTO 
 +.R  ] 
 +.BI id  VLANID 
 +.R  [ 
 +.BR reorder_hdr  {  on  |  off  } 
 +.R  ] 
 +.R  [ 
 +.BR gvrp  {  on  |  off  } 
 +.R  ] 
 +.R  [ 
 +.BR mvrp  {  on  |  off  } 
 +.R  ] 
 +.R  [ 
 +.BR loose_binding  {  on  |  off  } 
 +.R  ] 
 +.R  [ 
 +.BI ingress-qos-map  QOS-MAP 
 +.R  ] 
 +.R  [ 
 +.BI egress-qos-map  QOS-MAP 
 +.R  ] 
 +
 +.in +8
 +.sp
 +.BI protocol  VLAN_PROTO 
 +- either 802.1Q or 802.1ad.
 +
 +.BI id  VLANID 
 +- specifies the VLAN Identifier to use. Note that numbers with a leading  0 
  or  0x  are interpreted as octal or hexadecimal, respectively.
 +
 +.BR reorder_hdr  {  on  |  off  } 
 +- specifies whether ethernet headers are reordered or not.
Maybe it should have a more detailed explanation, e.g. that this feature is ON
by default and what it affects?

 +
 +.BR gvrp  {  on  |  off  } 
 +- specifies whether this VLAN should be registered using GARP VLAN 
 Registration Protocol.
 +
 +.BR mvrp  {  on  |  off  } 
 +- specifies whether this VLAN should be registered using Multiple VLAN 
 Registration Protocol.
 +
 +.BR loose_binding  {  on  |  off  } 
 +- specifies whether the VLAN device state is bound to the physical device 
 state.
Maybe add a short explanation that this means that if the physical device goes
DOWN then the VLAN device does the same?

 +
 +.BI ingress-qos-map  QOS-MAP 
 +- defines a mapping between priority code points on incoming frames.  The 
 format is FROM:TO with multiple mappings separated by spaces.
Would it be useful to add an explanation here that priority means
skb->priority, and maybe an example of how it can be set by iptables CLASSIFY?
 +
 +.BI egress-qos-map  QOS-MAP 
 +- the same as ingress-qos-map but for outgoing frames.
 +.in -8
 +
 +.TP
  VXLAN Type Support
  For a link of type
  .I VXLAN
 @@ -284,7 +350,9 @@ the following additional arguments are supported:
  .R  ] [ 
  .BI tos  TOS 
  .R  ] [ 
 -.BI port  MIN MAX 
 +.BI dstport  PORT 
 +.R  ] [ 
 +.BI srcport  MIN MAX 
  .R  ] [ 
  .I [no]learning 
  .R  ] [ 
 @@ -296,6 +364,12 @@ the following additional arguments are supported:
  .R  ] [ 
  .I [no]l3miss 
  .R  ] [ 
 +.I [no]udpcsum 
 +.R  ] [ 
 +.I [no]udp6zerocsumtx 
 +.R  ] [ 
 +.I [no]udp6zerocsumrx 
 +.R  ] [ 
  .BI ageing  SECONDS 
  .R  ] [ 
  .BI maxaddress  NUMBER 
 @@ -340,7 +414,11 @@ parameter.
  - specifies the TOS value to use in outgoing packets.
  
  .sp
 -.BI port  MIN MAX
 +.BI dstport  PORT
 +- specifies the UDP destination port to communicate to the remote VXLAN 
 tunnel endpoint.
 +
 +.sp
 +.BI srcport  MIN MAX
  - specifies the range of port numbers to use as UDP
  source ports to communicate to the remote VXLAN tunnel endpoint.
  
 @@ -366,6 +444,18 @@ are entered into the VXLAN device forwarding database.
  - specifies if netlink IP ADDR miss notifications are generated.
  
  .sp
 +.I [no]udpcsum
 +- specifies if UDP checksum is filled in
 +
 +.sp
 +.I [no]udp6zerocsumtx
 +- specifies if UDP checksum is filled in
 +
 +.sp
 +.I [no]udp6zerocsumrx
 +- specifies if UDP checksum is received
 +
 +.sp
  .BI ageing  SECONDS
  - specifies the lifetime in seconds of FDB entries learnt by the kernel.
  
 @@ -751,6 +841,12 @@ tool can be used. But it allows to change network 
 namespace only for physical de
  give the device a symbolic name for easy reference.
  
  .TP
 +.BI group  GROUP
 +specify the group the device belongs to.
 +The available groups are listed in file
 +.BR @SYSCONFDIR@/group .
 +
 +.TP
  .BI vf  NUM
  specify a Virtual Function device to be configured. The

[PATCH RFC net 0/3] ipv6: Fix potential deadlock when creating pcpu rt

2015-08-13 Thread Martin KaFai Lau

This patch series fixes a potential deadlock when creating a pcpu rt.
It happens when dst_alloc() decided to run gc. Something like this:

read_lock(&table->tb6_lock);
ip6_rt_pcpu_alloc()
=> dst_alloc()
=> ip6_dst_gc()
=> write_lock(&table->tb6_lock); /* oops */

Patch 1 and 2 are some prep works.
Patch 3 is the fix.

Original report: https://bugzilla.kernel.org/show_bug.cgi?id=102291

Steinar, the patches can also be applied to 4.2-rc5 (I just tried).
Can you help to test them? Thanks!

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH RFC net 3/3] ipv6: Fix potential deadlock when creating pcpu rt

2015-08-13 Thread Martin KaFai Lau

rt6_make_pcpu_route() is called under read_lock(&table->tb6_lock).
rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then
calls dst_alloc().  dst_alloc() _may_ call ip6_dst_gc() which takes
the write_lock(&table->tb6_lock).  A visualized version:

read_lock(&table->tb6_lock);
rt6_make_pcpu_route();
=> ip6_rt_pcpu_alloc();
=> dst_alloc();
=> ip6_dst_gc();
=> write_lock(&table->tb6_lock); /* oops */

The fix is to do a read_unlock first before calling ip6_rt_pcpu_alloc().

A reported stack:

[141625.537638] INFO: rcu_sched self-detected stall on CPU { 27}  (t=6 
jiffies g=4159086 c=4159085 q=2139)
[141625.547469] Task dump for CPU 27:
[141625.550881] mtr R  running task0 22121  22081 0x0008
[141625.558069]   88103f363d98 8106e488 
001b
[141625.565641]  81684900 88103f363db8 810702b0 
0800
[141625.573220]  81684900 88103f363de8 8108df9f 
88103f375a00
[141625.580803] Call Trace:
[141625.583345]  IRQ  [8106e488] sched_show_task+0xc1/0xc6
[141625.589650]  [810702b0] dump_cpu_task+0x35/0x39
[141625.595144]  [8108df9f] rcu_dump_cpu_stacks+0x6a/0x8c
[141625.601320]  [81090606] rcu_check_callbacks+0x1f6/0x5d4
[141625.607669]  [810940c8] update_process_times+0x2a/0x4f
[141625.613925]  [8109fbee] tick_sched_handle+0x32/0x3e
[141625.619923]  [8109fc2f] tick_sched_timer+0x35/0x5c
[141625.625830]  [81094a1f] __hrtimer_run_queues+0x8f/0x18d
[141625.632171]  [81094c9e] hrtimer_interrupt+0xa0/0x166
[141625.638258]  [8102bf2a] local_apic_timer_interrupt+0x4e/0x52
[141625.645036]  [8102c36f] smp_apic_timer_interrupt+0x39/0x4a
[141625.651643]  [8140b9e8] apic_timer_interrupt+0x68/0x70
[141625.657895]  EOI  [81346ee8] ? dst_destroy+0x7c/0xb5
[141625.664188]  [813d45b5] ? fib6_flush_trees+0x20/0x20
[141625.670272]  [81082b45] ? queue_write_lock_slowpath+0x60/0x6f
[141625.677140]  [8140aa33] _raw_write_lock_bh+0x23/0x25
[141625.683218]  [813d4553] __fib6_clean_all+0x40/0x82
[141625.689124]  [813d45b5] ? fib6_flush_trees+0x20/0x20
[141625.695207]  [813d6058] fib6_clean_all+0xe/0x10
[141625.700854]  [813d60d3] fib6_run_gc+0x79/0xc8
[141625.706329]  [813d0510] ip6_dst_gc+0x85/0xf9
[141625.711718]  [81346d68] dst_alloc+0x55/0x159
[141625.717105]  [813d09b5] __ip6_dst_alloc.isra.32+0x19/0x63
[141625.723620]  [813d1830] ip6_pol_route+0x36a/0x3e8
[141625.729441]  [813d18d6] ip6_pol_route_output+0x11/0x13
[141625.735700]  [813f02c8] fib6_rule_action+0xa7/0x1bf
[141625.741698]  [813d18c5] ? ip6_pol_route_input+0x17/0x17
[141625.748043]  [81357c48] fib_rules_lookup+0xb5/0x12a
[141625.754050]  [81141628] ? poll_select_copy_remaining+0xf9/0xf9
[141625.761002]  [813f0535] fib6_rule_lookup+0x37/0x5c
[141625.766914]  [813d18c5] ? ip6_pol_route_input+0x17/0x17
[141625.773260]  [813d008c] ip6_route_output+0x7a/0x82
[141625.779177]  [813c44c8] ip6_dst_lookup_tail+0x53/0x112
[141625.785437]  [813c45c3] ip6_dst_lookup_flow+0x2a/0x6b
[141625.791604]  [813ddaab] rawv6_sendmsg+0x407/0x9b6
[141625.797423]  [813d7914] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2
[141625.804464]  [8139d4b4] inet_sendmsg+0x57/0x8e
[141625.810028]  [81329ba3] sock_sendmsg+0x2e/0x3c
[141625.815588]  [8132be57] SyS_sendto+0xfe/0x143
[141625.821063]  [813dd551] ? rawv6_setsockopt+0x5e/0x67
[141625.827146]  [8132c9f8] ? sock_common_setsockopt+0xf/0x11
[141625.833660]  [8132c08c] ? SyS_setsockopt+0x81/0xa2
[141625.839565]  [8140ac17] entry_SYSCALL_64_fastpath+0x12/0x6a

Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info")
Signed-off-by: Martin KaFai Lau ka...@fb.com
CC: Hannes Frederic Sowa han...@stressinduktion.org
Reported-by: Steinar H. Gunderson sgunder...@bigfoot.com
---
 net/ipv6/ip6_fib.c |  2 ++
 net/ipv6/route.c   | 44 +---
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 55d1986..548c623 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -172,6 +172,8 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
*ppcpu_rt = NULL;
}
}
+
+   non_pcpu_rt-rt6i_pcpu = NULL;
 }
 
 static void rt6_release(struct rt6_info *rt)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 0a82653..d155864 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1007,27 +1007,39 @@ static struct rt6_info *rt6_get_pcpu_route(struct 
rt6_info *rt)
 
 static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt)
 {
+   struct fib6_table *table = rt-rt6i_table;
struct rt6_info *pcpu_rt, *prev, **p;
 
pcpu_rt = ip6_rt_pcpu_alloc(rt);
if (!pcpu_rt) {

[PATCH RFC net 2/3] ipv6: Add rt6_make_pcpu_route()

2015-08-13 Thread Martin KaFai Lau

It is a prep work for the potential deadlock.  The current
rt6_get_pcpu_route() will also create a pcpu rt if one does not exist.
This patch moves the pcpu rt creation logic into another function,
rt6_make_pcpu_route().

Signed-off-by: Martin KaFai Lau ka...@fb.com
CC: Hannes Frederic Sowa han...@stressinduktion.org
---
 net/ipv6/route.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c95c319..0a82653 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -993,13 +993,21 @@ static struct rt6_info *ip6_rt_pcpu_alloc(struct rt6_info 
*rt)
 /* It should be called with read_lock_bh(tb6_lock) acquired */
 static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt)
 {
-   struct rt6_info *pcpu_rt, *prev, **p;
+   struct rt6_info *pcpu_rt, **p;
 
p = this_cpu_ptr(rt-rt6i_pcpu);
pcpu_rt = *p;
 
-   if (pcpu_rt)
-   goto done;
+   if (pcpu_rt) {
+   dst_hold(pcpu_rt-dst);
+   rt6_dst_from_metrics_check(pcpu_rt);
+   }
+   return pcpu_rt;
+}
+
+static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt)
+{
+   struct rt6_info *pcpu_rt, *prev, **p;
 
pcpu_rt = ip6_rt_pcpu_alloc(rt);
if (!pcpu_rt) {
@@ -1009,6 +1017,7 @@ static struct rt6_info *rt6_get_pcpu_route(struct 
rt6_info *rt)
goto done;
}
 
+   p = this_cpu_ptr(rt-rt6i_pcpu);
prev = cmpxchg(p, NULL, pcpu_rt);
if (prev) {
/* If someone did it before us, return prev instead */
@@ -1093,8 +1102,11 @@ redo_rt6_select:
rt-dst.lastuse = jiffies;
rt-dst.__use++;
pcpu_rt = rt6_get_pcpu_route(rt);
-   read_unlock_bh(table-tb6_lock);
 
+   if (!pcpu_rt)
+   pcpu_rt = rt6_make_pcpu_route(rt);
+
+   read_unlock_bh(table-tb6_lock);
return pcpu_rt;
}
 }
-- 
1.8.1


[PATCH RFC net 1/3] ipv6: Remove un-used argument from ip6_dst_alloc()

2015-08-13 Thread Martin KaFai Lau

After 4b32b5ad31a6 (ipv6: Stop rt6_info from using inet_peer's metrics),
ip6_dst_alloc() does not need the 'table' argument.  This patch
cleans it up.

Signed-off-by: Martin KaFai Lau ka...@fb.com
CC: Hannes Frederic Sowa han...@stressinduktion.org
---
 net/ipv6/route.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9de4d2b..c95c319 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -318,8 +318,7 @@ static const struct rt6_info ip6_blk_hole_entry_template = {
 /* allocate dst with ip6_dst_ops */
 static struct rt6_info *__ip6_dst_alloc(struct net *net,
struct net_device *dev,
-   int flags,
-   struct fib6_table *table)
+   int flags)
 {
struct rt6_info *rt = dst_alloc(net-ipv6.ip6_dst_ops, dev,
0, DST_OBSOLETE_FORCE_CHK, flags);
@@ -336,10 +335,9 @@ static struct rt6_info *__ip6_dst_alloc(struct net *net,
 
 static struct rt6_info *ip6_dst_alloc(struct net *net,
  struct net_device *dev,
- int flags,
- struct fib6_table *table)
+ int flags)
 {
-   struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags, table);
+   struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags);
 
if (rt) {
rt-rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC);
@@ -950,8 +948,7 @@ static struct rt6_info *ip6_rt_cache_alloc(struct rt6_info 
*ort,
if (ort-rt6i_flags  (RTF_CACHE | RTF_PCPU))
ort = (struct rt6_info *)ort-dst.from;
 
-   rt = __ip6_dst_alloc(dev_net(ort-dst.dev), ort-dst.dev,
-0, ort-rt6i_table);
+   rt = __ip6_dst_alloc(dev_net(ort-dst.dev), ort-dst.dev, 0);
 
if (!rt)
return NULL;
@@ -983,8 +980,7 @@ static struct rt6_info *ip6_rt_pcpu_alloc(struct rt6_info 
*rt)
struct rt6_info *pcpu_rt;
 
pcpu_rt = __ip6_dst_alloc(dev_net(rt-dst.dev),
- rt-dst.dev, rt-dst.flags,
- rt-rt6i_table);
+ rt-dst.dev, rt-dst.flags);
 
if (!pcpu_rt)
return NULL;
@@ -1555,7 +1551,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
if (unlikely(!idev))
return ERR_PTR(-ENODEV);
 
-   rt = ip6_dst_alloc(net, dev, 0, NULL);
+   rt = ip6_dst_alloc(net, dev, 0);
if (unlikely(!rt)) {
in6_dev_put(idev);
dst = ERR_PTR(-ENOMEM);
@@ -1742,7 +1738,8 @@ int ip6_route_add(struct fib6_config *cfg)
if (!table)
goto out;
 
-   rt = ip6_dst_alloc(net, NULL, (cfg-fc_flags  RTF_ADDRCONF) ? 0 : 
DST_NOCOUNT, table);
+   rt = ip6_dst_alloc(net, NULL,
+  (cfg-fc_flags  RTF_ADDRCONF) ? 0 : DST_NOCOUNT);
 
if (!rt) {
err = -ENOMEM;
@@ -2399,7 +2396,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev 
*idev,
 {
struct net *net = dev_net(idev-dev);
struct rt6_info *rt = ip6_dst_alloc(net, net-loopback_dev,
-   DST_NOCOUNT, NULL);
+   DST_NOCOUNT);
if (!rt)
return ERR_PTR(-ENOMEM);
 
-- 
1.8.1


[PATCH 2/2] mac80211: use DECLARE_EWMA

2015-08-13 Thread Johannes Berg

From: Johannes Berg johannes.b...@intel.com

Instead of using the out-of-line average calculation, use the new
DECLARE_EWMA() macro to declare a signal EWMA, and use that.

This actually *reduces* the code size slightly (on x86-64) while
also reducing the station info size by 80 bytes.
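
For readers unfamiliar with the approach, a macro-generated EWMA type can be
sketched in userspace as below. This is a hedged illustration, not the kernel
header: SKETCH_DECLARE_EWMA and sketch_blend() are hypothetical names, and
unlike DECLARE_EWMA() the sketch takes log2 of the factor and weight
directly, so 1024 and 8 become 10 and 3. The update rule is assumed to mirror
the usual fixed-point form new = (old * (weight - 1) + val * factor) / weight.

```c
#include <assert.h>

/*
 * Userspace sketch only. The parameters are compile-time constants, so the
 * averaging state is a single scaled integer per instance.
 */
#define SKETCH_DECLARE_EWMA(name, factor_log2, weight_log2)		\
	static_assert((factor_log2) + (weight_log2) <			\
		      8 * sizeof(unsigned long), "shift too large");	\
	struct ewma_##name {						\
		unsigned long internal;	/* average scaled by factor */	\
	};								\
	static inline void ewma_##name##_init(struct ewma_##name *e)	\
	{								\
		e->internal = 0;					\
	}								\
	static inline unsigned long					\
	ewma_##name##_read(const struct ewma_##name *e)			\
	{								\
		return e->internal >> (factor_log2);			\
	}								\
	static inline void						\
	ewma_##name##_add(struct ewma_##name *e, unsigned long val)	\
	{								\
		unsigned long old = e->internal;			\
		/* First sample seeds the average; later ones blend. */	\
		e->internal = old ?					\
			(((old << (weight_log2)) - old) +		\
			 (val << (factor_log2))) >> (weight_log2) :	\
			(val << (factor_log2));				\
	}

/* factor 1024 = 2^10 and weight 8 = 2^3, as in DECLARE_EWMA(signal, 1024, 8) */
SKETCH_DECLARE_EWMA(signal, 10, 3)

/* Usage: feed two samples and read back the blended value. */
static inline unsigned long sketch_blend(unsigned long first,
					 unsigned long second)
{
	struct ewma_signal avg;

	ewma_signal_init(&avg);
	ewma_signal_add(&avg, first);
	ewma_signal_add(&avg, second);
	return ewma_signal_read(&avg);
}
```

Because the factor and weight are baked into the generated functions, each
instance only needs to store the running value, which is consistent with the
size reduction mentioned above.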

Signed-off-by: Johannes Berg johannes.b...@intel.com
---
 net/mac80211/Kconfig  | 1 -
 net/mac80211/mesh_plink.c | 2 +-
 net/mac80211/rx.c | 4 ++--
 net/mac80211/sta_info.c   | 9 +
 net/mac80211/sta_info.h   | 6 --
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig
index 086de496a4c1..3891cbd2adea 100644
--- a/net/mac80211/Kconfig
+++ b/net/mac80211/Kconfig
@@ -7,7 +7,6 @@ config MAC80211
select CRYPTO_CCM
select CRYPTO_GCM
select CRC32
-   select AVERAGE
---help---
  This option enables the hardware independent IEEE 802.11
  networking stack.
diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c
index 1a7d98398626..6fa6606fce55 100644
--- a/net/mac80211/mesh_plink.c
+++ b/net/mac80211/mesh_plink.c
@@ -64,7 +64,7 @@ static bool rssi_threshold_check(struct ieee80211_sub_if_data 
*sdata,
 {
s32 rssi_threshold = sdata-u.mesh.mshcfg.rssi_threshold;
return rssi_threshold == 0 ||
-  (sta  (s8) -ewma_read(sta-avg_signal)  rssi_threshold);
+  (sta  (s8) -ewma_signal_read(sta-avg_signal)  
rssi_threshold);
 }
 
 /**
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 3a1462810c8e..e3cff02bde07 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1428,7 +1428,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx)
sta-rx_bytes += rx-skb-len;
if (!(status-flag  RX_FLAG_NO_SIGNAL_VAL)) {
sta-last_signal = status-signal;
-   ewma_add(sta-avg_signal, -status-signal);
+   ewma_signal_add(sta-avg_signal, -status-signal);
}
 
if (status-chains) {
@@ -1440,7 +1440,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx)
continue;
 
sta-chain_signal_last[i] = signal;
-   ewma_add(sta-chain_signal_avg[i], -signal);
+   ewma_signal_add(sta-chain_signal_avg[i], -signal);
}
}
 
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 9da7d2bc271a..dd9541c51de1 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -341,9 +341,9 @@ struct sta_info *sta_info_alloc(struct 
ieee80211_sub_if_data *sdata,
 
ktime_get_ts(uptime);
sta-last_connected = uptime.tv_sec;
-   ewma_init(sta-avg_signal, 1024, 8);
+   ewma_signal_init(sta-avg_signal);
for (i = 0; i  ARRAY_SIZE(sta-chain_signal_avg); i++)
-   ewma_init(sta-chain_signal_avg[i], 1024, 8);
+   ewma_signal_init(sta-chain_signal_avg[i]);
 
if (local-ops-wake_tx_queue) {
void *txq_data;
@@ -1899,7 +1899,8 @@ void sta_set_sinfo(struct sta_info *sta, struct 
station_info *sinfo)
}
 
if (!(sinfo-filled  BIT(NL80211_STA_INFO_SIGNAL_AVG))) {
-   sinfo-signal_avg = (s8) -ewma_read(sta-avg_signal);
+   sinfo-signal_avg =
+   (s8) -ewma_signal_read(sta-avg_signal);
sinfo-filled |= BIT(NL80211_STA_INFO_SIGNAL_AVG);
}
}
@@ -1914,7 +1915,7 @@ void sta_set_sinfo(struct sta_info *sta, struct 
station_info *sinfo)
for (i = 0; i  ARRAY_SIZE(sinfo-chain_signal); i++) {
sinfo-chain_signal[i] = sta-chain_signal_last[i];
sinfo-chain_signal_avg[i] =
-   (s8) -ewma_read(sta-chain_signal_avg[i]);
+   (s8) 
-ewma_signal_read(sta-chain_signal_avg[i]);
}
}
 
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 6dcb33484eac..1c5333448bde 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -318,6 +318,8 @@ struct mesh_sta {
unsigned int fail_avg;
 };
 
+DECLARE_EWMA(signal, 1024, 8)
+
 /**
  * struct sta_info - STA information
  *
@@ -460,12 +462,12 @@ struct sta_info {
unsigned long rx_fragments;
unsigned long rx_dropped;
int last_signal;
-   struct ewma avg_signal;
+   struct ewma_signal avg_signal;
int last_ack_signal;
 
u8 chains;
s8 chain_signal_last[IEEE80211_MAX_CHAINS];
-   struct ewma chain_signal_avg[IEEE80211_MAX_CHAINS];
+   struct ewma_signal chain_signal_avg[IEEE80211_MAX_CHAINS];
 
/* Plus 1 for non-QoS frames */
__le16 last_seq_ctrl[IEEE80211_NUM_TIDS + 1];
-- 
2.1.4


[PATCH iproute2] ip-link: remove unnecessary return

2015-08-13 Thread Zhang Shengju

Remove unnecessary return, because invarg() exits.
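
A minimal sketch of why the return is dead code, assuming an error helper
that never returns. sketch_invarg(), sketch_get_u32() and
parse_forward_delay() are hypothetical stand-ins written for this example;
they only imitate the calling convention seen in the diff (0 on success,
non-zero on a parse error) and are not the iproute2 implementations.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for invarg(): reports the bad argument and exits. */
static _Noreturn void sketch_invarg(const char *msg, const char *arg)
{
	assert(msg && arg);
	fprintf(stderr, "Error: argument \"%s\" is wrong: %s\n", arg, msg);
	exit(-1);
}

/* Hypothetical stand-in for get_u32(): 0 on success, -1 on a parse error. */
static int sketch_get_u32(unsigned int *val, const char *arg, int base)
{
	char *end;
	unsigned long res;

	if (!arg || !*arg)
		return -1;
	errno = 0;
	res = strtoul(arg, &end, base);
	if (errno || *end != '\0' || res > 0xFFFFFFFFUL)
		return -1;
	*val = (unsigned int)res;
	return 0;
}

static unsigned int parse_forward_delay(const char *arg)
{
	unsigned int val;

	if (sketch_get_u32(&val, arg, 0))
		sketch_invarg("invalid forward_delay", arg);
	/* Only reached on success; a "return -1" after the helper is dead. */
	return val;
}
```

Marking the helper _Noreturn also lets the compiler see that val is always
initialized on the path that reaches the final return.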

Signed-off-by: Zhang Shengju zhangshen...@cmss.chinamobile.com
---
 ip/iplink_bridge.c | 30 --
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index e704e29..61e4cda 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -42,47 +42,41 @@ static int bridge_parse_opt(struct link_util *lu, int argc, 
char **argv,
while (argc  0) {
if (matches(*argv, forward_delay) == 0) {
NEXT_ARG();
-   if (get_u32(val, *argv, 0)) {
+   if (get_u32(val, *argv, 0))
invarg(invalid forward_delay, *argv);
-   return -1;
-   }
+
addattr32(n, 1024, IFLA_BR_FORWARD_DELAY, val);
} else if (matches(*argv, hello_time) == 0) {
NEXT_ARG();
-   if (get_u32(val, *argv, 0)) {
+   if (get_u32(val, *argv, 0))
invarg(invalid hello_time, *argv);
-   return -1;
-   }
+
addattr32(n, 1024, IFLA_BR_HELLO_TIME, val);
} else if (matches(*argv, max_age) == 0) {
NEXT_ARG();
-   if (get_u32(val, *argv, 0)) {
+   if (get_u32(val, *argv, 0))
invarg(invalid max_age, *argv);
-   return -1;
-   }
+
addattr32(n, 1024, IFLA_BR_MAX_AGE, val);
} else if (matches(*argv, ageing_time) == 0) {
NEXT_ARG();
-   if (get_u32(val, *argv, 0)) {
+   if (get_u32(val, *argv, 0))
invarg(invalid ageing_time, *argv);
-   return -1;
-   }
+
addattr32(n, 1024, IFLA_BR_AGEING_TIME, val);
} else if (matches(*argv, stp_state) == 0) {
NEXT_ARG();
-   if (get_u32(val, *argv, 0)) {
+   if (get_u32(val, *argv, 0))
invarg(invalid stp_state, *argv);
-   return -1;
-   }
+
addattr32(n, 1024, IFLA_BR_STP_STATE, val);
} else if (matches(*argv, priority) == 0) {
__u16 prio;
 
NEXT_ARG();
-   if (get_u16(prio, *argv, 0)) {
+   if (get_u16(prio, *argv, 0))
invarg(invalid priority, *argv);
-   return -1;
-   }
+
addattr16(n, 1024, IFLA_BR_PRIORITY, prio);
} else if (matches(*argv, help) == 0) {
explain();
-- 
1.8.3.1




Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-08-13 Thread Graeme Gregory

On Wed, Aug 12, 2015 at 05:06:27PM -0500, Jeremy Linton wrote:
 Add ACPI bindings for the smsc911x driver. Convert the DT specific calls
 to nonspecific device* calls. This allows the driver to work
 with both ACPI and DT configurations. Ethernet should now work when using
 ACPI on ARM Juno.
 
 Signed-off-by: Jeremy Linton jeremy.lin...@arm.com

The code looks fine to me.

Currently the compulsory DT properties seem to match the approved ACPI
NIC properties from here

http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf

Reviewed-by: Graeme Gregory graeme.greg...@linaro.org

Thanks

 ---
  drivers/net/ethernet/smsc/smsc911x.c | 48 
 +---
  1 file changed, 22 insertions(+), 26 deletions(-)
 
 diff --git a/drivers/net/ethernet/smsc/smsc911x.c 
 b/drivers/net/ethernet/smsc/smsc911x.c
 index 959aeea..0f21aa3 100644
 --- a/drivers/net/ethernet/smsc/smsc911x.c
 +++ b/drivers/net/ethernet/smsc/smsc911x.c
 @@ -59,7 +59,9 @@
  #include linux/of_device.h
  #include linux/of_gpio.h
  #include linux/of_net.h
 +#include linux/acpi.h
  #include linux/pm_runtime.h
 +#include linux/property.h
  
  #include smsc911x.h
  
 @@ -2362,59 +2364,46 @@ static const struct smsc911x_ops shifted_smsc911x_ops 
 = {
   .tx_writefifo = smsc911x_tx_writefifo_shift,
  };
  
 -#ifdef CONFIG_OF
 -static int smsc911x_probe_config_dt(struct smsc911x_platform_config *config,
 - struct device_node *np)
 +static int smsc911x_probe_config(struct smsc911x_platform_config *config,
 +  struct device *dev)
  {
 - const char *mac;
   u32 width = 0;
  
 - if (!np)
 + if (!dev)
   return -ENODEV;
  
 - config-phy_interface = of_get_phy_mode(np);
 + config-phy_interface = device_get_phy_mode(dev);
  
 - mac = of_get_mac_address(np);
 - if (mac)
 - memcpy(config-mac, mac, ETH_ALEN);
 + device_get_mac_address(dev, config-mac, ETH_ALEN);
  
 - of_property_read_u32(np, reg-shift, config-shift);
 + device_property_read_u32(dev, reg-shift, config-shift);
  
 - of_property_read_u32(np, reg-io-width, width);
 + device_property_read_u32(dev, reg-io-width, width);
   if (width == 4)
   config-flags |= SMSC911X_USE_32BIT;
   else
   config-flags |= SMSC911X_USE_16BIT;
  
 - if (of_get_property(np, smsc,irq-active-high, NULL))
 + if (device_property_present(dev, smsc,irq-active-high))
   config-irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH;
  
 - if (of_get_property(np, smsc,irq-push-pull, NULL))
 + if (device_property_present(dev, smsc,irq-push-pull))
   config-irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL;
  
 - if (of_get_property(np, smsc,force-internal-phy, NULL))
 + if (device_property_present(dev, smsc,force-internal-phy))
   config-flags |= SMSC911X_FORCE_INTERNAL_PHY;
  
 - if (of_get_property(np, smsc,force-external-phy, NULL))
 + if (device_property_present(dev, smsc,force-external-phy))
   config-flags |= SMSC911X_FORCE_EXTERNAL_PHY;
  
 - if (of_get_property(np, smsc,save-mac-address, NULL))
 + if (device_property_present(dev, smsc,save-mac-address))
   config-flags |= SMSC911X_SAVE_MAC_ADDRESS;
  
   return 0;
  }
 -#else
 -static inline int smsc911x_probe_config_dt(
 - struct smsc911x_platform_config *config,
 - struct device_node *np)
 -{
 - return -ENODEV;
 -}
 -#endif /* CONFIG_OF */
  
  static int smsc911x_drv_probe(struct platform_device *pdev)
  {
 - struct device_node *np = pdev-dev.of_node;
   struct net_device *dev;
   struct smsc911x_data *pdata;
   struct smsc911x_platform_config *config = dev_get_platdata(pdev-dev);
 @@ -2478,7 +2467,7 @@ static int smsc911x_drv_probe(struct platform_device 
 *pdev)
   goto out_disable_resources;
   }
  
 - retval = smsc911x_probe_config_dt(pdata-config, np);
 + retval = smsc911x_probe_config(pdata-config, pdev-dev);
   if (retval  config) {
   /* copy config parameters across to pdata */
   memcpy(pdata-config, config, sizeof(pdata-config));
 @@ -2654,6 +2643,12 @@ static const struct of_device_id smsc911x_dt_ids[] = {
  MODULE_DEVICE_TABLE(of, smsc911x_dt_ids);
  #endif
  
 +static const struct acpi_device_id smsc911x_acpi_match[] = {
 + { ARMH9118, 0 },
 + { }
 +};
 +MODULE_DEVICE_TABLE(acpi, smsc911x_acpi_match);
 +
  static struct platform_driver smsc911x_driver = {
   .probe = smsc911x_drv_probe,
   .remove = smsc911x_drv_remove,
 @@ -2661,6 +2656,7 @@ static struct platform_driver smsc911x_driver = {
   .name   = SMSC_CHIPNAME,
   .pm = SMSC911X_PM_OPS,
   .of_match_table = of_match_ptr(smsc911x_dt_ids),
 + .acpi_match_table = ACPI_PTR(smsc911x_acpi_match),
   },
  };
  
 -- 
 2.4.3

Re: [PATCH v2 0/2] net: thunder: Add ACPI support.

2015-08-13 Thread Hanjun Guo


On 08/12/2015 11:36 PM, David Daney wrote:

On 08/12/2015 08:23 AM, Catalin Marinas wrote:

On Tue, Aug 11, 2015 at 01:04:55PM -0700, David Daney wrote:

On 08/11/2015 11:49 AM, David Miller wrote:

From: David Daney ddaney.c...@gmail.com
Date: Mon, 10 Aug 2015 17:58:35 -0700


Change from v1:  Drop PHY binding part, use fwnode_property* APIs.

The first patch (1/2) rearranges the existing code a little with no
functional change to get ready for the second.  The second (2/2) does
the actual work of adding support to extract the needed information

from the ACPI tables.

Series applied.


Thank you very much.


In the future it might be better structured to try and get the OF
node, and if that fails then try and use the ACPI method to obtain
these values.


Our current approach, as you can see in the patch, is the opposite.
If ACPI
is being used, prefer that over the OF device tree.

You seem to be recommending precedence for OF.  It should be consistent
across all drivers/sub-systems, so do you really think that OF before
ACPI
is the way to go?


On arm64 (unless you use a vendor kernel), DT takes precedence over ACPI
if both are provided to the kernel (and it's a fair assumption given
that ACPI on ARM is still in the early days). You could also force ACPI
with acpi=force on the kernel cmd line and the arch code will not
unflatten the DT even if it is provided, so is_of_node(fwnode)
returns false.


Yes. On the other hand, if no DT is provided, the kernel will try ACPI even
without acpi=force on the kernel cmd line.



I haven't looked at your driver in detail but something like AMD's
xgbe_probe() uses a single function for both DT and ACPI with
device_property_read_*() functions getting the information from the
correct place in either case. The ACPI vs DT precedence is handled by
the arch boot code, we never mix the two and confuse the drivers.
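
The single-probe-function idea can be modelled in userspace as below. This is
a sketch under loud assumptions: every name in it (struct sketch_prop,
struct sketch_device, sketch_property_read_u32(), sketch_probe_use_32bit())
is hypothetical and none of it is a kernel API. It only shows the shape of a
driver reading a property through one accessor without knowing whether DT or
ACPI supplied the data; the "reg-io-width" property name is borrowed from the
smsc911x patch earlier in this digest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct sketch_prop {
	const char *name;
	uint32_t value;
};

/* One property table per device, filled from DT or ACPI by boot code. */
struct sketch_device {
	const struct sketch_prop *props;
	size_t nprops;
};

/* Backend-agnostic read: returns 0 and stores the value, or -1 if absent. */
static int sketch_property_read_u32(const struct sketch_device *dev,
				    const char *name, uint32_t *out)
{
	for (size_t i = 0; i < dev->nprops; i++) {
		if (strcmp(dev->props[i].name, name) == 0) {
			*out = dev->props[i].value;
			return 0;
		}
	}
	return -1;
}

/* Single probe path: the driver never asks which firmware supplied it. */
static int sketch_probe_use_32bit(const struct sketch_device *dev)
{
	uint32_t width = 0;

	assert(dev != NULL);
	/* A missing property leaves width at 0, which selects 16-bit access. */
	sketch_property_read_u32(dev, "reg-io-width", &width);
	return width == 4;
}
```

The precedence decision lives entirely in whatever fills the property table,
which is the separation argued for above.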



My long term plan is to create something like
firmware_get_mac_address(), that would encapsulate  of_get_mac_address()
and the ACPI equivalent.

Same for firmware_phy_find_device().


I'm very keen on seeing that happen :)

Thanks
Hanjun

Re: [PATCH v2 11/11] net: rfkill: Allow compile test of GPIO consumers if !GPIOLIB

2015-08-13 Thread Johannes Berg

On Sun, 2015-08-02 at 11:09 +0200, Geert Uytterhoeven wrote:
 The GPIO subsystem provides dummy GPIO consumer functions if GPIOLIB 
 is
 not enabled. Hence drivers that depend on GPIOLIB, but use GPIO 
 consumer
 functionality only, can still be compiled if GPIOLIB is not enabled.
 
 Relax the dependency on GPIOLIB if COMPILE_TEST is enabled, where
 appropriate.
 
Applied.

johannes

Re: Problem with fragmented packets on tun/tap interface

2015-08-13 Thread Prashant Upadhyaya

On Fri, Jul 31, 2015 at 4:51 PM, Eric Dumazet eric.duma...@gmail.com wrote:
 On Fri, 2015-07-31 at 16:42 +0530, Prashant Upadhyaya wrote:
 On Fri, Jul 31, 2015 at 1:26 PM, Eric Dumazet eric.duma...@gmail.com wrote:
  On Fri, 2015-07-31 at 12:30 +0530, Prashant Upadhyaya wrote:
 
  The delays work for me but is clearly not good for the performance of
  the slow path. And more importantly, I was looking for a fundamental
  reason regarding why it works with delays and why not without it. The
  issue is reproducible with a big ping (3.11.10-301.fc20.x86_64)
 
  How big ping needs to be to reproduce the problem ?
 
 

 If the MTU is 1500, I start getting problems anywhere starting from
 2900 bytes, and they reliably occur when even bigger pings are used, e.g. 10 K.
 (ping IP -s Size eg. ping 10.3.10.244 -s 1)
 And the big pings do work, as I said, with the delay hack.

 It might help trying this while you receive such frags :

 perf record -a -g skb:kfree_skb sleep 10

 ...

 perf report




Hi,

I think I have a clue to the root cause of my issue, but I do not know
a solution.
Let me describe what I think is the problem.

Fragmented packets enter into the kernel through eth0 and the kernel
starts assembling them.
Simultaneously, my packet socket implementation also injects the very
same packets into the kernel via the tap. The kernel sees them as
overlapped packets during assembly and drops the packets injected via
the tap.
Eventually when the assembly gets complete inside kernel for all the
packets which entered via eth0, the whole packet gets dropped due to
the iptables rules that I have set on eth0.
So naturally there is no response to the bigger ping, because
everything got dropped one way or the other.

When I do introduce the delays (and it turns out that the delay that
matters is when injecting via tap), the kernel has already completed
the assembly of the packets via eth0 (during the delay I introduce for
submission on tap), and then the submission via tap works well because
it undergoes a fresh assembly (and of course it does not get dropped
because the iptables drop rule is only on eth0).

Now then, the question is: how do I prevent the kernel from trying
to assemble the packets arriving on eth0, and instead drop them right away
before assembly is attempted? This way the same packet injected via
the tap would be the only one undergoing assembly and hopefully it
would work.

Regards
-Prashant

Re: Question on behavior of tg3_self_test() (ethtool -t on tg3 driver)

2015-08-13 Thread Siva Reddy (Siva) Kallam




On 8/12/2015 6:02 PM, Douglas Miller wrote:
Oh, I had missed the extra if condition on tg3_test_link(). So 
external_lb is not a true superset of offline.


So you are not surprised by the (about) 20 second link down period 
after this test? If this is expected (albeit undocumented) behavior we 
can change the test scenario to work around it. It seems as though not 
all adapters exhibit this same symptom. From a testing standpoint, it 
is a long delay to add that may only be needed for this one adapter 
(Broadcom BCM5719, or adapter family).


We executed the ethtool -t <dev> offline in a loop on our local test 
machine with 5719 and linkup time is <= 5 secs.


Script:
#!/bin/bash
echo -OS Information-
uname -a
echo --Card Information--
lspci | grep 5719
echo --Interface information--
ethtool -i p4p4
echo -Offline test start--
for i in 1 2 3
do
date
ethtool -t p4p4 offline
done

Output:

-OS Information-
Linux siva-dev 4.2.0-rc4+ #1 SMP Thu Aug 13 20:24:11 IST 2015 x86_64 
x86_64 x86_64 GNU/Linux

--Card Information--
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
03:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
03:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)

--Interface information--
driver: tg3
version: 3.137
firmware-version: 5719-v1.41 NCSI v1.3.6.0
bus-info: :03:00.3
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
-Offline test start--
Thu Aug 13 22:05:59 IST 2015
The test result is PASS
The test extra info:
nvram test(online)  0
link test (online)  0
register test (offline) 0
memory test   (offline) 0
mac loopback test (offline) 0
phy loopback test (offline) 0
ext loopback test (offline) 0
interrupt test(offline) 0

Thu Aug 13 22:06:00 IST 2015
The test result is PASS
The test extra info:
nvram test(online)  0
link test (online)  0
register test (offline) 0
memory test   (offline) 0
mac loopback test (offline) 0
phy loopback test (offline) 0
ext loopback test (offline) 0
interrupt test(offline) 0

Thu Aug 13 22:06:05 IST 2015
The test result is PASS
The test extra info:
nvram test(online)  0
link test (online)  0
register test (offline) 0
memory test   (offline) 0
mac loopback test (offline) 0
phy loopback test (offline) 0
ext loopback test (offline) 0
interrupt test(offline) 0

Please check your test environment.


Thanks,
Doug

On 08/11/2015 03:31 PM, Michael Chan wrote:

On Tue, 2015-08-11 at 14:24 -0500, Douglas Miller wrote:

Yes, the wrap plugs are the loopback cables/plugs. It is my
understanding that the offline tests do not require anything to be
plugged into the ports, as they do not in any way touch the external
port. They perform an internal loopback test which does not depend on
any external connection.

Correct.


  From what I can tell, the only difference between offline and
external_lb is that external_lb performs the external loopback
tests, *in addition to* all the tests done for offline.

Correct.


This would
imply that the only tests that depend on anything connected to the
physical port is external_lb, and there is no requirement that the
wrap plugs be removed/replaced in order to run offline tests.

When you do external loopback test, we skip the link test because you no
longer have normal connection to the network.  You now use a special
loopback cable, which will fail the link up test because the link up
test assumes connection to the network using normal cable.

In the case I was debugging, wrap plugs were installed because the ports
were, later, being tested in an external loopback way.

What I am observing is that it takes about 20 seconds for the kernel to
declare that the link is up, after running the offline or
external_lb test. In the case of offline I cannot run the test again
until the kernel declares the link up. In the case of external_lb I
can run the test again immediately and it passes.

As stated earlier, because we skip the link test when we are performing
external_lb.

So, you should always do ethtool -t <dev> external_lb if you have a
loopback cable connected.  We will perform the external loopback test
and skip the link test.

If you don't have an external loopback cable connected, you should run
ethtool -t <dev> offline.  It will not do the external loopback test and
will do the link test for proper link up with the network.


This suggests to me
that the external_lb case (again, it is a superset of offline) is
performing

Re: [PATCH] mm: make page pfmemalloc check more robust

2015-08-13 Thread Vlastimil Babka


On 08/13/2015 10:58 AM, mho...@kernel.org wrote:

From: Michal Hocko mho...@suse.com

The patch c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb")
added the checks for page->pfmemalloc to __skb_fill_page_desc():

 if (page->pfmemalloc && !page->mapping)
         skb->pfmemalloc = true;

It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set page->mapping
to NULL and leave page->index value alone. Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.

So the assumption is invalid if the networking code can see such a
page. And it seems it can. We have encountered this with a NFS over
loopback setup when such a page is attached to a new skbuf. There is no
copying going on in this case so the page confuses __skb_fill_page_desc
which interprets the index as pfmemalloc flag and the network stack
drops packets that have been allocated using the reserves unless they
are to be queued on sockets handling the swapping which is the case here


^ not ?

The full story (according to Jiri Bohac and my understanding, I don't 
know much about netdev) is that the __skb_fill_page_desc() is invoked 
here during *sending* and normally the skb->pfmemalloc would be ignored 
in the end. But because it is a localhost connection, the receiving code 
will think it was a memalloc allocation during receive, and then do the 
socket restriction.


Given that this apparently isn't the first case of this localhost issue, 
I wonder if network code should just clear skb->pfmemalloc during send 
(or maybe just send over localhost). That would be probably easier than 
distinguishing the __skb_fill_page_desc() callers for send vs receive.



and that leads to hangs when the nfs client waits for a response from
the server which has been dropped and thus never arrive.



[PATCH iproute2] ip-link: enhance prompt message

2015-08-13 Thread Zhang Shengju

Enhance prompt message for the 'spoofchk' and 'query_rss' flags, and fix a
typo.

Signed-off-by: Zhang Shengju zhangshen...@cmss.chinamobile.com
---
 ip/iplink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 1836889..520f750 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -329,7 +329,7 @@ static int iplink_parse_vf(int vf, int *argcp, char ***argvp,
 			else if (matches(*argv, "off") == 0)
 				ivs.setting = 0;
 			else
-				invarg("Invalid \"spoofchk\" value\n", *argv);
+				return on_off("spoofchk", *argv);
 			ivs.vf = vf;
 			addattr_l(&req->n, sizeof(*req), IFLA_VF_SPOOFCHK, &ivs, sizeof(ivs));
 
@@ -341,7 +341,7 @@ static int iplink_parse_vf(int vf, int *argcp, char ***argvp,
 			else if (matches(*argv, "off") == 0)
 				ivs.setting = 0;
 			else
-				invarg("Invalid \"query_rss\" value\n", *argv);
+				return on_off("query_rss", *argv);
 			ivs.vf = vf;
 			addattr_l(&req->n, sizeof(*req), IFLA_VF_RSS_QUERY_EN, &ivs, sizeof(ivs));
 
@@ -1092,7 +1092,7 @@ static int do_set(int argc, char **argv)
 			} else if (strcmp(*argv, "off") == 0) {
 				flags |= IFF_NOARP;
 			} else
-				return on_off("noarp", *argv);
+				return on_off("arp", *argv);
 		} else if (matches(*argv, "dynamic") == 0) {
 			NEXT_ARG();
 			mask |= IFF_DYNAMIC;
-- 
1.8.3.1




[PATCH] mm: make page pfmemalloc check more robust

2015-08-13 Thread mhocko

From: Michal Hocko mho...@suse.com

The patch c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb")
added the checks for page->pfmemalloc to __skb_fill_page_desc():

        if (page->pfmemalloc && !page->mapping)
                skb->pfmemalloc = true;

It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set page->mapping
to NULL and leave page->index value alone. Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.

So the assumption is invalid if the networking code can see such a
page. And it seems it can. We have encountered this with a NFS over
loopback setup when such a page is attached to a new skbuf. There is no
copying going on in this case so the page confuses __skb_fill_page_desc
which interprets the index as pfmemalloc flag and the network stack
drops packets that have been allocated using the reserves unless they
are to be queued on sockets handling the swapping which is the case here
and that leads to hangs when the nfs client waits for a response from
the server which has been dropped and thus never arrive.

The struct page is already heavily packed so rather than finding
another hole to put it in, let's do a trick instead. We can reuse the
index again but define it to an impossible value (-1UL). This is the
page index so it should never see the value that large. Replace all
direct users of page->pfmemalloc by page_is_pfmemalloc which will
hide this nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page->index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g. what SLAB and SLUB do).

Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb")
Cc: stable # 3.6+
Debugged-by: Vlastimil Babka vba...@suse.com
Debugged-by: Jiri Bohac jbo...@suse.com
Signed-off-by: Michal Hocko mho...@suse.com
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  2 +-
 include/linux/mm.h| 28 +++
 include/linux/mm_types.h  |  9 
 include/linux/skbuff.h| 14 
 mm/page_alloc.c   |  9 +---
 mm/slab.c |  4 ++--
 mm/slub.c |  2 +-
 net/core/skbuff.c |  2 +-
 11 files changed, 47 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 982fdcdc795b..b5b2925103ec 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -216,7 +216,7 @@ static void fm10k_reuse_rx_page(struct fm10k_ring *rx_ring,
 
 static inline bool fm10k_page_is_reserved(struct page *page)
 {
-   return (page_to_nid(page) != numa_mem_id()) || page->pfmemalloc;
+   return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
 }
 
 static bool fm10k_can_reuse_rx_page(struct fm10k_rx_buffer *rx_buffer,
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 2f70a9b152bd..830466c49987 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6566,7 +6566,7 @@ static void igb_reuse_rx_page(struct igb_ring *rx_ring,
 
 static inline bool igb_page_is_reserved(struct page *page)
 {
-   return (page_to_nid(page) != numa_mem_id()) || page->pfmemalloc;
+   return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
 }
 
 static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9aa6104e34ea..ae21e0b06c3a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1832,7 +1832,7 @@ static void ixgbe_reuse_rx_page(struct ixgbe_ring 
*rx_ring,
 
 static inline bool ixgbe_page_is_reserved(struct page *page)
 {
-   return (page_to_nid(page) != numa_mem_id()) || page->pfmemalloc;
+   return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index e71cdde9cb01..1d7b00b038a2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -765,7 +765,7 @@ static void ixgbevf_reuse_rx_page(struct ixgbevf_ring 
*rx_ring,
 
 static inline bool ixgbevf_page_is_reserved(struct page *page)
 {
-   return (page_to_nid(page) !=

Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-08-13 Thread Lorenzo Pieralisi

On Thu, Aug 13, 2015 at 09:27:59AM +0100, Graeme Gregory wrote:
 On Wed, Aug 12, 2015 at 05:06:27PM -0500, Jeremy Linton wrote:
  Add ACPI bindings for the smsc911x driver. Convert the DT specific calls
  to nonspecific device* calls, This allows the driver to work
  with both ACPI and DT configurations. Ethernet should now work when using
  ACPI on ARM Juno.

Last sentence does not belong in the commit log.

  
  Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
 
 The code looks fine to me.
 
 Currently the compulsory DT properties seem to match the approved ACPI
 NIC properties from here
 
 http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf

What about _DSD device specific properties (eg smsc,save-mac-address) ?
Are we taking 1:1 translation between DT and ACPI for granted ?

I thought some process must be put in place to define the corresponding
bindings in ACPI before starting this mechanical translation or maybe
I missed something, I would like to understand.

How do the device-specific _DSD definitions work in the ACPI world? Where
are they published? How will we translate those to DT bindings if there
is a need?

Lorenzo

 Reviewed-by: Graeme Gregory graeme.greg...@linaro.org
 
 Thanks
 
  ---
   drivers/net/ethernet/smsc/smsc911x.c | 48 
  +---
   1 file changed, 22 insertions(+), 26 deletions(-)
  
  diff --git a/drivers/net/ethernet/smsc/smsc911x.c 
  b/drivers/net/ethernet/smsc/smsc911x.c
  index 959aeea..0f21aa3 100644
  --- a/drivers/net/ethernet/smsc/smsc911x.c
  +++ b/drivers/net/ethernet/smsc/smsc911x.c
  @@ -59,7 +59,9 @@
   #include <linux/of_device.h>
   #include <linux/of_gpio.h>
   #include <linux/of_net.h>
  +#include <linux/acpi.h>
   #include <linux/pm_runtime.h>
  +#include <linux/property.h>
   
   #include "smsc911x.h"
   
  @@ -2362,59 +2364,46 @@ static const struct smsc911x_ops 
  shifted_smsc911x_ops = {
  .tx_writefifo = smsc911x_tx_writefifo_shift,
   };
   
  -#ifdef CONFIG_OF
  -static int smsc911x_probe_config_dt(struct smsc911x_platform_config 
  *config,
  -   struct device_node *np)
  +static int smsc911x_probe_config(struct smsc911x_platform_config *config,
  +struct device *dev)
   {
  -   const char *mac;
  u32 width = 0;
   
  -   if (!np)
  +   if (!dev)
  return -ENODEV;
   
  -   config->phy_interface = of_get_phy_mode(np);
  +   config->phy_interface = device_get_phy_mode(dev);
   
  -   mac = of_get_mac_address(np);
  -   if (mac)
  -   memcpy(config->mac, mac, ETH_ALEN);
  +   device_get_mac_address(dev, config->mac, ETH_ALEN);
   
  -   of_property_read_u32(np, "reg-shift", &config->shift);
  +   device_property_read_u32(dev, "reg-shift", &config->shift);
   
  -   of_property_read_u32(np, "reg-io-width", &width);
  +   device_property_read_u32(dev, "reg-io-width", &width);
  if (width == 4)
  config->flags |= SMSC911X_USE_32BIT;
  else
  config->flags |= SMSC911X_USE_16BIT;
   
  -   if (of_get_property(np, "smsc,irq-active-high", NULL))
  +   if (device_property_present(dev, "smsc,irq-active-high"))
  config->irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH;
   
  -   if (of_get_property(np, "smsc,irq-push-pull", NULL))
  +   if (device_property_present(dev, "smsc,irq-push-pull"))
  config->irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL;
   
  -   if (of_get_property(np, "smsc,force-internal-phy", NULL))
  +   if (device_property_present(dev, "smsc,force-internal-phy"))
  config->flags |= SMSC911X_FORCE_INTERNAL_PHY;
   
  -   if (of_get_property(np, "smsc,force-external-phy", NULL))
  +   if (device_property_present(dev, "smsc,force-external-phy"))
  config->flags |= SMSC911X_FORCE_EXTERNAL_PHY;
   
  -   if (of_get_property(np, "smsc,save-mac-address", NULL))
  +   if (device_property_present(dev, "smsc,save-mac-address"))
  config->flags |= SMSC911X_SAVE_MAC_ADDRESS;
   
  return 0;
   }
  -#else
  -static inline int smsc911x_probe_config_dt(
  -   struct smsc911x_platform_config *config,
  -   struct device_node *np)
  -{
  -   return -ENODEV;
  -}
  -#endif /* CONFIG_OF */
   
   static int smsc911x_drv_probe(struct platform_device *pdev)
   {
  -   struct device_node *np = pdev->dev.of_node;
  struct net_device *dev;
  struct smsc911x_data *pdata;
  struct smsc911x_platform_config *config = dev_get_platdata(&pdev->dev);
  @@ -2478,7 +2467,7 @@ static int smsc911x_drv_probe(struct platform_device 
  *pdev)
  goto out_disable_resources;
  }
   
  -   retval = smsc911x_probe_config_dt(&pdata->config, np);
  +   retval = smsc911x_probe_config(&pdata->config, &pdev->dev);
  if (retval && config) {
  /* copy config parameters across to pdata */
  memcpy(&pdata->config, config, sizeof(pdata->config));
  @@ -2654,6 +2643,12 @@ static const struct of_device_id smsc911x_dt_ids[] = 
  {

[PATCH 1/2] average: provide macro to create static EWMA

2015-08-13 Thread Johannes Berg

From: Johannes Berg johannes.b...@intel.com

Having the EWMA parameters stored in the runtime struct imposes
memory requirements for the constant values that could just be
inlined in the code. This particularly makes sense if there are
a lot of such structs, for example in mac80211 in the station
table where each station has a number of these in an array, and
there can be many stations.

Provide a macro DECLARE_EWMA() that declares the necessary struct
and inline functions to access it with the parameters hard-coded;
using this also means the user no longer needs to 'select AVERAGE'
as it's entirely self-contained.

In the mac80211 case, on x86-64, this actually slightly *reduces*
code size, while also saving 80 bytes of runtime memory per sta.

Signed-off-by: Johannes Berg johannes.b...@intel.com
---
As the next patch relies on this, I'll take this through my tree
unless I hear objections.
---
 include/linux/average.h | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/include/linux/average.h b/include/linux/average.h
index c6028fd742c1..802adeab7037 100644
--- a/include/linux/average.h
+++ b/include/linux/average.h
@@ -27,4 +27,43 @@ static inline unsigned long ewma_read(const struct ewma *avg)
 	return avg->internal >> avg->factor;
 }
 
+#define DECLARE_EWMA(name, _factor, _weight)				\
+	struct ewma_##name {						\
+		unsigned long internal;					\
+	};								\
+	static inline void ewma_##name##_init(struct ewma_##name *e)	\
+	{								\
+		BUILD_BUG_ON(!__builtin_constant_p(_factor));		\
+		BUILD_BUG_ON(!__builtin_constant_p(_weight));		\
+		BUILD_BUG_ON(!is_power_of_2(_factor));			\
+		BUILD_BUG_ON(!is_power_of_2(_weight));			\
+		e->internal = 0;					\
+	}								\
+	static inline unsigned long					\
+	ewma_##name##_read(struct ewma_##name *e)			\
+	{								\
+		BUILD_BUG_ON(!__builtin_constant_p(_factor));		\
+		BUILD_BUG_ON(!__builtin_constant_p(_weight));		\
+		BUILD_BUG_ON(!is_power_of_2(_factor));			\
+		BUILD_BUG_ON(!is_power_of_2(_weight));			\
+		return e->internal >> ilog2(_factor);			\
+	}								\
+	static inline void ewma_##name##_add(struct ewma_##name *e,	\
+					     unsigned long val)		\
+	{								\
+		unsigned long internal = ACCESS_ONCE(e->internal);	\
+		unsigned long weight = ilog2(_weight);			\
+		unsigned long factor = ilog2(_factor);			\
+									\
+		BUILD_BUG_ON(!__builtin_constant_p(_factor));		\
+		BUILD_BUG_ON(!__builtin_constant_p(_weight));		\
+		BUILD_BUG_ON(!is_power_of_2(_factor));			\
+		BUILD_BUG_ON(!is_power_of_2(_weight));			\
+									\
+		ACCESS_ONCE(e->internal) = internal ?			\
+			(((internal << weight) - internal) +		\
+				(val << factor)) >> weight :		\
+			(val << factor);				\
+	}
+
 #endif /* _LINUX_AVERAGE_H */
-- 
2.1.4
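
For readers who want to try the arithmetic outside the kernel, here is a
minimal user-space sketch of the same fixed-point update. It is not part of
the patch: the ewma_demo names and the two shift constants (factor 1024,
weight 8) are assumptions picked for illustration, and the BUILD_BUG_ON and
ACCESS_ONCE parts are left out.

```c
#include <assert.h>

/* Hypothetical stand-ins: factor 1024 and weight 8, both powers of two. */
#define DEMO_FACTOR_SHIFT 10	/* ilog2(1024) */
#define DEMO_WEIGHT_SHIFT 3	/* ilog2(8) */

struct ewma_demo {
	unsigned long internal;	/* running average, scaled by the factor */
};

static void ewma_demo_init(struct ewma_demo *e)
{
	e->internal = 0;
}

static unsigned long ewma_demo_read(const struct ewma_demo *e)
{
	/* drop the fixed-point scaling again */
	return e->internal >> DEMO_FACTOR_SHIFT;
}

static void ewma_demo_add(struct ewma_demo *e, unsigned long val)
{
	unsigned long internal = e->internal;

	/*
	 * The first sample seeds the average; afterwards
	 * avg = (avg * (weight - 1) + val) / weight, done with shifts.
	 */
	e->internal = internal ?
		(((internal << DEMO_WEIGHT_SHIFT) - internal) +
			(val << DEMO_FACTOR_SHIFT)) >> DEMO_WEIGHT_SHIFT :
		(val << DEMO_FACTOR_SHIFT);
}
```

Feeding 100 and then 200 into a freshly initialised struct gives a reading
of 112, which is (7 * 100 + 200) / 8 = 112.5 truncated by the final shift.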


Re: [PATCH] mm: make page pfmemalloc check more robust

2015-08-13 Thread Michal Hocko

On Thu 13-08-15 11:13:04, Vlastimil Babka wrote:
 On 08/13/2015 10:58 AM, mho...@kernel.org wrote:
 From: Michal Hocko mho...@suse.com
 
 The patch c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb")
 added the checks for page->pfmemalloc to __skb_fill_page_desc():
 
  if (page->pfmemalloc && !page->mapping)
          skb->pfmemalloc = true;
 
 It assumes page->mapping == NULL implies that page->pfmemalloc can be
 trusted.  However, __delete_from_page_cache() can set page->mapping
 to NULL and leave page->index value alone. Due to being in union, a
 non-zero page->index will be interpreted as true page->pfmemalloc.
 
 So the assumption is invalid if the networking code can see such a
 page. And it seems it can. We have encountered this with a NFS over
 loopback setup when such a page is attached to a new skbuf. There is no
 copying going on in this case so the page confuses __skb_fill_page_desc
 which interprets the index as pfmemalloc flag and the network stack
 drops packets that have been allocated using the reserves unless they
 are to be queued on sockets handling the swapping which is the case here
 
 ^ not ?

Dohh, you are right of course, updated...

 The full story (according to Jiri Bohac and my understanding, I don't know
 much about netdev) is that the __skb_fill_page_desc() is invoked here during
 *sending* and normally the skb->pfmemalloc would be ignored in the end. But
 because it is a localhost connection, the receiving code will think it was a
 memalloc allocation during receive, and then do the socket restriction.
 
 Given that this apparently isn't the first case of this localhost issue, I
 wonder if network code should just clear skb->pfmemalloc during send (or
 maybe just send over localhost). That would be probably easier than
 distinguish the __skb_fill_page_desc() callers for send vs receive.

Maybe the networking code can behave better in this particular case
but the core thing remains though. As you have properly found out
during the debugging, page->mapping cannot be relied on for the
reliable detection of pfmemalloc. So I would argue that a more robust
detection is really worthwhile. Note there are other places which do
not even bother to test for mapping - maybe they are safe but I got lost
quickly when trying to track the allocation source to be clear that
nothing could have stepped in in the meantime.
-- 
Michal Hocko
SUSE Labs
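
To make the difference between the two detection schemes concrete, here is
a small user-space sketch. It is only a model of the idea discussed in this
thread, not the kernel code: struct demo_page and all demo_* helpers are
hypothetical names, and the old check is modelled as "any non-zero index
with a NULL mapping", which is how the aliased flag behaves according to the
patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct page fields that matter here. */
struct demo_page {
	void *mapping;
	unsigned long index;	/* also carries the pfmemalloc marker */
};

/* Old scheme (modelled): a non-zero index with NULL mapping reads as "pfmemalloc". */
static bool demo_old_check(const struct demo_page *p)
{
	return p->index != 0 && p->mapping == NULL;
}

/* New scheme: only the impossible index value -1UL marks the page. */
static void demo_set_pfmemalloc(struct demo_page *p)
{
	p->index = -1UL;
}

static void demo_clear_pfmemalloc(struct demo_page *p)
{
	p->index = 0;
}

static bool demo_is_pfmemalloc(const struct demo_page *p)
{
	return p->index == -1UL;
}
```

A page that was just removed from the page cache (mapping NULL, stale index
such as 42) is a false positive for demo_old_check() but not for
demo_is_pfmemalloc(), which only reports pages explicitly marked with
demo_set_pfmemalloc().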

Re: [net-next PATCH 2/3] ARM: dts: dra7: update cpsw compatible

2015-08-13 Thread Tony Lindgren

* Mugunthan V N mugunthan...@ti.com [150812 02:56]:
 CPSW driver has been updated with compatibles for enabling errata
 workarounds. So updating cpsw compatibles.
 
 Signed-off-by: Mugunthan V N mugunthan...@ti.com

Please feel free to merge this one via net tree once the changes
are reviewed:

Acked-by: Tony Lindgren t...@atomide.com

 ---
  arch/arm/boot/dts/dra7.dtsi | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
 index 8f1e25b..b4fdd10 100644
 --- a/arch/arm/boot/dts/dra7.dtsi
 +++ b/arch/arm/boot/dts/dra7.dtsi
 @@ -1398,7 +1398,7 @@
   };
  
   mac: ethernet@4a10 {
 - compatible = "ti,cpsw";
 + compatible = "ti,dra7-cpsw","ti,cpsw";
   ti,hwmods = "gmac";
   clocks = <&dpll_gmac_ck>, <&gmac_gmii_ref_clk_div>;
   clock-names = "fck", "cpts";
 -- 
 2.5.0.234.gefc8a62
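
As a rough illustration of why the generic "ti,cpsw" string is kept after
the SoC-specific one, the following user-space sketch walks a node's
compatible list from most to least specific and returns the first entry a
driver table knows about. This is an assumption-laden model, not the
kernel's matching code: struct demo_match, demo_match_node() and the quirks
field are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver match entry: a compatible string plus quirk flags. */
struct demo_match {
	const char *compatible;
	int quirks;
};

/*
 * Walk the node's compatible strings in order (most specific first) and
 * return the first driver table entry that matches, or NULL if none does.
 */
static const struct demo_match *
demo_match_node(const char *const *node_compat, size_t n_compat,
		const struct demo_match *table, size_t n_table)
{
	size_t i, j;

	for (i = 0; i < n_compat; i++)
		for (j = 0; j < n_table; j++)
			if (!strcmp(node_compat[i], table[j].compatible))
				return &table[j];
	return NULL;
}
```

With a node listing "ti,dra7-cpsw" then "ti,cpsw", a driver table that knows
the specific string gets the entry carrying the workaround flags, while a
table that only knows "ti,cpsw" still matches through the fallback.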
 

Re: [net-next PATCH 3/3] ARM: dts: am33xx: update cpsw compatible

2015-08-13 Thread Tony Lindgren

* Mugunthan V N mugunthan...@ti.com [150812 02:56]:
 CPSW driver has been updated with compatibles for enabling errata
 workarounds. So updating cpsw compatibles.
 
 Signed-off-by: Mugunthan V N mugunthan...@ti.com

This too:

Acked-by: Tony Lindgren t...@atomide.com

 ---
  arch/arm/boot/dts/am33xx.dtsi | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
 index 21fcc44..8b59c86 100644
 --- a/arch/arm/boot/dts/am33xx.dtsi
 +++ b/arch/arm/boot/dts/am33xx.dtsi
 @@ -700,7 +700,7 @@
   };
  
   mac: ethernet@4a10 {
 - compatible = "ti,cpsw";
 + compatible = "ti,am335x-cpsw","ti,cpsw";
   ti,hwmods = "cpgmac0";
   clocks = <&cpsw_125mhz_gclk>, <&cpsw_cpts_rft_clk>;
   clock-names = "fck", "cpts";
 -- 
 2.5.0.234.gefc8a62
 

Re: [PATCH] mm: make page pfmemalloc check more robust

2015-08-13 Thread Mel Gorman

On Thu, Aug 13, 2015 at 10:58:54AM +0200, mho...@kernel.org wrote:
 From: Michal Hocko mho...@suse.com
 
 The patch c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb")
 added the checks for page->pfmemalloc to __skb_fill_page_desc():
 
         if (page->pfmemalloc && !page->mapping)
                 skb->pfmemalloc = true;
 
 It assumes page->mapping == NULL implies that page->pfmemalloc can be
 trusted.  However, __delete_from_page_cache() can set page->mapping
 to NULL and leave page->index value alone. Due to being in union, a
 non-zero page->index will be interpreted as true page->pfmemalloc.
 
 So the assumption is invalid if the networking code can see such a
 page. And it seems it can. We have encountered this with a NFS over
 loopback setup when such a page is attached to a new skbuf. There is no
 copying going on in this case so the page confuses __skb_fill_page_desc
 which interprets the index as pfmemalloc flag and the network stack
 drops packets that have been allocated using the reserves unless they
 are to be queued on sockets handling the swapping which is the case here
 and that leads to hangs when the nfs client waits for a response from
 the server which has been dropped and thus never arrive.
 
 The struct page is already heavily packed so rather than finding
 another hole to put it in, let's do a trick instead. We can reuse the
 index again but define it to an impossible value (-1UL). This is the
 page index so it should never see the value that large. Replace all
 direct users of page->pfmemalloc by page_is_pfmemalloc which will
 hide this nastiness from unspoiled eyes.
 
 The information will get lost if somebody wants to use page->index
 obviously but that was the case before and the original code expected
 that the information should be persisted somewhere else if that is
 really needed (e.g. what SLAB and SLUB do).
 
 Fixes: c48a11c7ad26 ("netvm: propagate page->pfmemalloc to skb")
 Cc: stable # 3.6+
 Debugged-by: Vlastimil Babka vba...@suse.com
 Debugged-by: Jiri Bohac jbo...@suse.com
 Signed-off-by: Michal Hocko mho...@suse.com

Acked-by: Mel Gorman mgor...@suse.de

-- 
Mel Gorman
SUSE Labs

Re: [PATCH 2/5] net: rfkill: add rfkill_find_type function

2015-08-13 Thread Johannes Berg

On Wed, 2015-08-05 at 16:39 +0300, Heikki Krogerus wrote:
 
 +static const char *rfkill_types[NUM_RFKILL_TYPES] = {
 + [RFKILL_TYPE_WLAN]  = "wlan",
 + [RFKILL_TYPE_BLUETOOTH] = "bluetooth",
 + [RFKILL_TYPE_UWB]   = "ultrawideband",
 + [RFKILL_TYPE_WIMAX] = "wimax",
 + [RFKILL_TYPE_WWAN]  = "wwan",
 + [RFKILL_TYPE_GPS]   = "gps",
 + [RFKILL_TYPE_FM]= "fm",
 + [RFKILL_TYPE_NFC]   = "nfc",
 +};
 +
 +enum rfkill_type rfkill_find_type(const char *name)
 +{
 + int i;
 +
 + BUILD_BUG_ON(NUM_RFKILL_TYPES != RFKILL_TYPE_NFC + 1);
 
That BUILD_BUG_ON() is now less useful - previously it pointed to the
code that needed to change, now you're left wondering if you don't look
up since it isn't quite that obvious from the code what this does.

Something like

BUILD_BUG_ON(rfkill_types[NUM_RFKILL_TYPES - 1] == NULL);

would be better. As we only add here, that would be safe enough - I've
done something similar in the past that was a bit more complicated.

With that and the static inline fixed (which maybe you could even
remove) I'm fine with all these rfkill patches, but I'm not sure how to
merge them since they affect all kinds of other trees. If desired, I
can apply them, but an ACK from the tegra maintainer would be good :)

johannes
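
A user-space sketch of the table lookup under discussion, with a
compile-time check tied to the table itself, could look like the following.
This is a variation on the idea above rather than the code from the patch:
the demo_* names and the shortened enum are assumptions, and C11
_Static_assert on the array size stands in for BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened type list; entry 0 means "all / not found". */
enum demo_rfkill_type {
	DEMO_TYPE_ALL = 0,
	DEMO_TYPE_WLAN,
	DEMO_TYPE_BLUETOOTH,
	DEMO_TYPE_NFC,
	NUM_DEMO_TYPES,
};

/* Unsized array: its length is the highest designated index plus one. */
static const char *const demo_types[] = {
	[DEMO_TYPE_WLAN]	= "wlan",
	[DEMO_TYPE_BLUETOOTH]	= "bluetooth",
	[DEMO_TYPE_NFC]		= "nfc",
};

/* Fails to compile if a type is appended to the enum but not to the table. */
_Static_assert(sizeof(demo_types) / sizeof(demo_types[0]) == NUM_DEMO_TYPES,
	       "demo_types[] must cover every demo_rfkill_type");

static enum demo_rfkill_type demo_find_type(const char *name)
{
	int i;

	if (!name)
		return DEMO_TYPE_ALL;

	/* index 0 has no name, so start at the first real type */
	for (i = 1; i < NUM_DEMO_TYPES; i++)
		if (demo_types[i] && !strcmp(name, demo_types[i]))
			return (enum demo_rfkill_type)i;

	return DEMO_TYPE_ALL;
}
```

Like the check suggested above, this only catches a missing entry at the
end of the table, which is enough as long as new types are only ever
appended.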

Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-08-13 Thread Graeme Gregory

On Thu, Aug 13, 2015 at 10:01:17AM +0100, Lorenzo Pieralisi wrote:
 On Thu, Aug 13, 2015 at 09:27:59AM +0100, Graeme Gregory wrote:
  On Wed, Aug 12, 2015 at 05:06:27PM -0500, Jeremy Linton wrote:
   Add ACPI bindings for the smsc911x driver. Convert the DT specific calls
   to nonspecific device* calls, This allows the driver to work
   with both ACPI and DT configurations. Ethernet should now work when using
   ACPI on ARM Juno.
 
 Last sentence does not belong in the commit log.
 
   
   Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
  
  The code looks fine to me.
  
  Currently the compulsory DT properties seem to match the approved ACPI
  NIC properties from here
  
  http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf
 
 What about _DSD device specific properties (eg smsc,save-mac-address) ?
 Are we taking 1:1 translation between DT and ACPI for granted ?
 
 I thought some process must be put in place to define the corresponding
 bindings in ACPI before starting this mechanical translation or maybe
 I missed something, I would like to understand.
 
 How do the device-specific _DSD definitions work in the ACPI world? Where
 are they published? How will we translate those to DT bindings if there
 is a need?
 

This I do not know as that discussion is still ongoing. But those
options are currently marked as optional so driver should not fail if
they are not present in ACPI.

Graeme

 Lorenzo
 
  Reviewed-by: Graeme Gregory graeme.greg...@linaro.org
  
  Thanks
  
   ---
drivers/net/ethernet/smsc/smsc911x.c | 48 
   +---
1 file changed, 22 insertions(+), 26 deletions(-)
   
   diff --git a/drivers/net/ethernet/smsc/smsc911x.c 
   b/drivers/net/ethernet/smsc/smsc911x.c
   index 959aeea..0f21aa3 100644
   --- a/drivers/net/ethernet/smsc/smsc911x.c
   +++ b/drivers/net/ethernet/smsc/smsc911x.c
   @@ -59,7 +59,9 @@
#include linux/of_device.h
#include linux/of_gpio.h
#include linux/of_net.h
   +#include linux/acpi.h
#include linux/pm_runtime.h
   +#include linux/property.h

#include smsc911x.h

   @@ -2362,59 +2364,46 @@ static const struct smsc911x_ops 
   shifted_smsc911x_ops = {
 .tx_writefifo = smsc911x_tx_writefifo_shift,
};

   -#ifdef CONFIG_OF
   -static int smsc911x_probe_config_dt(struct smsc911x_platform_config 
   *config,
   - struct device_node *np)
   +static int smsc911x_probe_config(struct smsc911x_platform_config *config,
   +  struct device *dev)
{
   - const char *mac;
 u32 width = 0;

   - if (!np)
   + if (!dev)
 return -ENODEV;

   - config->phy_interface = of_get_phy_mode(np);
   + config->phy_interface = device_get_phy_mode(dev);

   - mac = of_get_mac_address(np);
   - if (mac)
   - memcpy(config->mac, mac, ETH_ALEN);
   + device_get_mac_address(dev, config->mac, ETH_ALEN);

   - of_property_read_u32(np, "reg-shift", &config->shift);
   + device_property_read_u32(dev, "reg-shift", &config->shift);

   - of_property_read_u32(np, "reg-io-width", &width);
   + device_property_read_u32(dev, "reg-io-width", &width);
 if (width == 4)
 config->flags |= SMSC911X_USE_32BIT;
 else
 config->flags |= SMSC911X_USE_16BIT;

   - if (of_get_property(np, "smsc,irq-active-high", NULL))
   + if (device_property_present(dev, "smsc,irq-active-high"))
 config->irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH;

   - if (of_get_property(np, "smsc,irq-push-pull", NULL))
   + if (device_property_present(dev, "smsc,irq-push-pull"))
 config->irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL;

   - if (of_get_property(np, "smsc,force-internal-phy", NULL))
   + if (device_property_present(dev, "smsc,force-internal-phy"))
 config->flags |= SMSC911X_FORCE_INTERNAL_PHY;

   - if (of_get_property(np, "smsc,force-external-phy", NULL))
   + if (device_property_present(dev, "smsc,force-external-phy"))
 config->flags |= SMSC911X_FORCE_EXTERNAL_PHY;

   - if (of_get_property(np, "smsc,save-mac-address", NULL))
   + if (device_property_present(dev, "smsc,save-mac-address"))
 config->flags |= SMSC911X_SAVE_MAC_ADDRESS;

 return 0;
}
   -#else
   -static inline int smsc911x_probe_config_dt(
   - struct smsc911x_platform_config *config,
   - struct device_node *np)
   -{
   - return -ENODEV;
   -}
   -#endif /* CONFIG_OF */

static int smsc911x_drv_probe(struct platform_device *pdev)
{
   - struct device_node *np = pdev->dev.of_node;
 struct net_device *dev;
 struct smsc911x_data *pdata;
 struct smsc911x_platform_config *config = dev_get_platdata(&pdev->dev);
   @@ -2478,7 +2467,7 @@ static int smsc911x_drv_probe(struct 
   platform_device *pdev)
 goto out_disable_resources;
 }

   - retval = smsc911x_probe_config_dt(&pdata->config, np);
   + retval =

Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API

2015-08-13 Thread Steven Whitehouse


Hi,

On 12/08/15 17:42, Marcelo Ricardo Leitner wrote:

On 12-08-2015 12:33, David Laight wrote:

From: Marcelo Ricardo Leitner

Sent: 12 August 2015 14:16
On 12-08-2015 07:23, David Laight wrote:

From: Marcelo Ricardo Leitner

Sent: 11 August 2015 23:22
DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not
needed but this causes it to use sctp_do_peeloff() to mimic a
kernel_accept() and this causes a symbol dependency on sctp module.

By switching it to 1-to-1 API we can avoid this dependency and also
reduce quite a lot of SCTP-specific code in lowcomms.c.

...

You still need to enable sctp notifications (I think the patch deleted
that code).
Otherwise you don't get any kind of indication if the remote system
'resets' (ie sends a new INIT chunk) on an existing connection.


Right, it would miss the restart event and could corrupt the tx/rx
buffers by gluing parts of old messages onto new ones.


Except that it is SCTP so you'd expect DATA chunks to contain entire
messages and so get unexpected message sequences rather than corrupt
messages.


I was thinking of cases where the recvmsg buffer is too small to
hold the chunk, so that the remainder is left for another attempt
(sctp_recvmsg, around line 2130), but it sounds like we won't purge the rx
buffer when the reset happens so that doesn't matter. The association
is replaced, but the buffers are kept.


Out of order messages aren't a problem for dlm. It can recover from
that just fine: it doesn't have a specific handshake at the beginning
or anything like that, and the upper layers are agnostic to that state
transition (disconnect/reconnect/...), so this should be fine.


I'm not sure that's true - DLM does rely on message ordering in some
cases in order to ensure correct functioning. So depending on how SCTP
is interfaced to DLM, it might potentially be an issue.


Steve.

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1] ipv4: off-by-one in continuation handling in /proc/net/route

2015-08-13 Thread Andy Whitcroft

When generating /proc/net/route we emit a header followed by a line for
each route.  When a short read is performed we will restart this process
based on the open file descriptor.  When calculating the start point we
fail to take into account that the 0th entry is the header.  This leads
us to skip the first entry when doing a continuation read.

This can be easily seen with the comparison below:

  while read l; do echo "$l"; done </proc/net/route >A
  cat /proc/net/route >B
  diff -bu A B | grep '^[+-]'

On my example machine I have approximately 10KB of route output.  There we
see the very first non-title element is lost in the while read case,
and an entry around the 8K mark in the cat case:

  +wlan0  02021EAC 0003 0 0 400  0 0 0
  -tun1  00C0AC0A  0001 0 0 950 00C0 0 0 0

Fix up the off-by-one when reacquiring position on continuation.

BugLink: http://bugs.launchpad.net/bugs/1483440
Signed-off-by: Andy Whitcroft a...@canonical.com
---
 net/ipv4/fib_trie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

From code inspection I believe this was introduced by the Fixes
below, but I have not tested this to confirm.

Fixes: 8be33e955cb9 (ipv4: off-by-one in continuation handling in 
/proc/net/route)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 37c4bb8..b0c6258 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2465,7 +2465,7 @@ static struct key_vector *fib_route_get_idx(struct 
fib_route_iter *iter,
key = l->key + 1;
iter->pos++;
 
-   if (pos-- <= 0)
+   if (--pos <= 0)
break;
 
l = NULL;
-- 
2.5.0
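
For readers following along, the off-by-one described in the patch above can be sketched outside the kernel. This is a minimal model only, not the fib_trie code: the function names, the flat 0..n-1 entry numbering and the "position 0 is the header" convention are assumptions made for illustration.

```c
/*
 * Hypothetical model, not the fib_trie code: route entries are numbered
 * 0..n-1 and the saved file position counts the header as position 0,
 * so position p (p >= 1) refers to entry p - 1.
 */

/* Post-decrement: tests the old value of pos, so it lands one entry late. */
static int resume_index_postdec(int n, long pos)
{
	for (int i = 0; i < n; i++) {
		if (pos-- <= 0)
			return i;
	}
	return -1;	/* ran off the end of the table */
}

/* Pre-decrement: accounts for the header line before testing. */
static int resume_index_predec(int n, long pos)
{
	for (int i = 0; i < n; i++) {
		if (--pos <= 0)
			return i;
	}
	return -1;	/* ran off the end of the table */
}
```

With five entries and a saved position of 1, the post-decrement variant resumes at entry 1 (skipping entry 0), while the pre-decrement variant resumes at entry 0, which matches the "first entry is lost" symptom in the commit message.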


Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT

2015-08-13 Thread Lorenzo Pieralisi

On Thu, Aug 13, 2015 at 10:38:38AM +0100, Graeme Gregory wrote:
 On Thu, Aug 13, 2015 at 10:01:17AM +0100, Lorenzo Pieralisi wrote:
  On Thu, Aug 13, 2015 at 09:27:59AM +0100, Graeme Gregory wrote:
   On Wed, Aug 12, 2015 at 05:06:27PM -0500, Jeremy Linton wrote:
Add ACPI bindings for the smsc911x driver. Convert the DT specific calls
to nonspecific device* calls. This allows the driver to work
with both ACPI and DT configurations. Ethernet should now work when 
using
ACPI on ARM Juno.
  
  Last sentence does not belong in the commit log.
  

Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
   
   The code looks fine to me.
   
   Currently the compulsory DT properties seem to match the approved ACPI
   NIC properties from here
   
   http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf
  
  What about _DSD device specific properties (eg smsc,save-mac-address) ?
  Are we taking 1:1 translation between DT and ACPI for granted ?
  
  I thought some process must be put in place to define the corresponding
  bindings in ACPI before starting this mechanical translation or maybe
  I missed something, I would like to understand.
  
  How do the device specific _DSD definitions work in ACPI world ? Where
  are they published ? How will we translate those to DT bindings if there
  is need ?
  
 
 This I do not know as that discussion is still ongoing. But those
 options are currently marked as optional so driver should not fail if
 they are not present in ACPI.

Well yes, but if they are present this patch uses them when booting
with ACPI and that's what I am arguing about, because they are
undocumented.

This discussion should be brought to completion, and the _DSD usage and
the sharing of bindings between DT and ACPI well defined, before we
enable drivers to rely on properties that have no binding definition at
all in the ACPI world. If we go down this route we might end up with
drivers reusing DT properties that have no reason whatsoever to exist
in the ACPI world.

So, to make this clear, the _DSD usage must be documented and there
has to be a way to define the related bindings in ACPI world so that
driver code can be reviewed accordingly.

Lorenzo

 Graeme
 
  Lorenzo
  
   Reviewed-by: Graeme Gregory graeme.greg...@linaro.org
   
   Thanks
   
---
 drivers/net/ethernet/smsc/smsc911x.c | 48 
+---
 1 file changed, 22 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c 
b/drivers/net/ethernet/smsc/smsc911x.c
index 959aeea..0f21aa3 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -59,7 +59,9 @@
 #include <linux/of_device.h>
 #include <linux/of_gpio.h>
 #include <linux/of_net.h>
+#include <linux/acpi.h>
 #include <linux/pm_runtime.h>
+#include <linux/property.h>
 
 #include "smsc911x.h"
 
@@ -2362,59 +2364,46 @@ static const struct smsc911x_ops 
shifted_smsc911x_ops = {
.tx_writefifo = smsc911x_tx_writefifo_shift,
 };
 
-#ifdef CONFIG_OF
-static int smsc911x_probe_config_dt(struct smsc911x_platform_config 
*config,
-   struct device_node *np)
+static int smsc911x_probe_config(struct smsc911x_platform_config 
*config,
+struct device *dev)
 {
-   const char *mac;
u32 width = 0;
 
-   if (!np)
+   if (!dev)
return -ENODEV;
 
-   config->phy_interface = of_get_phy_mode(np);
+   config->phy_interface = device_get_phy_mode(dev);
 
-   mac = of_get_mac_address(np);
-   if (mac)
-   memcpy(config->mac, mac, ETH_ALEN);
+   device_get_mac_address(dev, config->mac, ETH_ALEN);
 
-   of_property_read_u32(np, "reg-shift", &config->shift);
+   device_property_read_u32(dev, "reg-shift", &config->shift);
 
-   of_property_read_u32(np, "reg-io-width", &width);
+   device_property_read_u32(dev, "reg-io-width", &width);
if (width == 4)
config->flags |= SMSC911X_USE_32BIT;
else
config->flags |= SMSC911X_USE_16BIT;
 
-   if (of_get_property(np, "smsc,irq-active-high", NULL))
+   if (device_property_present(dev, "smsc,irq-active-high"))
config->irq_polarity = 
SMSC911X_IRQ_POLARITY_ACTIVE_HIGH;
 
-   if (of_get_property(np, "smsc,irq-push-pull", NULL))
+   if (device_property_present(dev, "smsc,irq-push-pull"))
config->irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL;
 
-   if (of_get_property(np, "smsc,force-internal-phy", NULL))
+   if (device_property_present(dev, "smsc,force-internal-phy"))
config->flags |= SMSC911X_FORCE_INTERNAL_PHY;
 
-   if (of_get_property(np, "smsc,force-external-phy",

[GIT] Networking

2015-08-13 Thread David Miller


1) Workaround hw bug when acquiring PCI bus ownership of iwlwifi
   devices, from Emmanuel Grumbach.

2) Falling back to vmalloc in conntrack should not emit a warning,
   from Pablo Neira Ayuso.

3) Fix NULL deref when rtlwifi driver is used as an AP, from Luis
   Felipe Dominguez Vega.

4) Rocker doesn't free netdev on device removal, from Ido Schimmel.

5) UDP multicast early sock demux has route handling races, from
   Eric Dumazet.

6) Fix L4 checksum handling in openvswitch, from Glenn Griffin.

7) Fix use-after-free in skb_set_peeked, from Herbert Xu.

8) Don't advertize NETIF_F_FRAGLIST in virtio_net driver, this can
   lead to fraglists longer than the driver can support.  From Jason
   Wang.

9) Fix mlx5 on non-4k-pagesize systems, from Carol L Soto.

10) Fix interrupt storm in bna driver, from Ivan Vecera.

11) Don't propagate -EBUSY from netlink_insert(), from Daniel
Borkmann.

12) Fix inet request sock leak, from Eric Dumazet.

13) Fix TX interrupt masking and marking in TX descriptors of fs_enet
driver, from LEROY Christophe.

14) Get rid of rule optimizer in gianfar driver, it's buggy and unlikely
to get fixed any time soon.  From Jakub Kicinski.

Please pull, thanks a lot!

The following changes since commit 7c764cec3703583247c4ab837c652975a3d41f4b:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-07-31 
17:10:56 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to e6d006938c9bda7ffd22af9d3e1257fd75941fb7:

  cosa: missing error code on failure in probe() (2015-08-12 16:53:11 -0700)


Antonio Quartulli (1):
  batman-adv: avoid DAT to mess up LAN state

Avraham Stern (1):
  iwlwifi: mvm: Fix regular scan priority

Benjamin Poirier (1):
  net-timestamp: Update skb_complete_tx_timestamp comment

Carol L Soto (1):
  net/mlx5_core: Set log_uar_page_sz for non 4K page size architecture

Dan Carpenter (4):
  netfilter: nf_conntrack: checking for IS_ERR() instead of NULL
  rds: fix an integer overflow test in rds_info_getsockopt()
  cxgb4: missing curly braces in t4_setup_debugfs()
  cosa: missing error code on failure in probe()

Daniel Borkmann (1):
  netlink: make sure -EBUSY won't escape from netlink_insert

David S. Miller (8):
  Merge tag 'wireless-drivers-for-davem-2015-08-04' of 
git://git.kernel.org/.../kvalo/wireless-drivers
  Merge branch 'be2net-fixes'
  Merge tag 'batman-adv-fix-for-davem' of 
git://git.open-mesh.org/linux-merge
  Merge branch 'mvpp2-fixes'
  Merge branch 'bnx2x-fixes'
  Merge git://git.kernel.org/.../pablo/nf
  Merge branch 'for-upstream' of 
git://git.kernel.org/.../bluetooth/bluetooth
  Merge branch 'gianfar-fixes'

Emmanuel Grumbach (2):
  iwlwifi: pcie: fix prepare card flow
  iwlwifi: pcie: fix stuck queue detection for sleeping clients

Eric Dumazet (4):
  fq_codel: explicitly reset flows in -reset()
  udp: fix dst races with multicast early demux
  inet: fix races with reqsk timers
  inet: fix possible request socket leak

Fabio Estevam (1):
  mkiss: Fix error handling in mkiss_open()

Florian Fainelli (1):
  net: dsa: Do not override PHY interface if already configured

Florian Westphal (1):
  ipv6: don't reject link-local nexthop on other interface

Glenn Griffin (1):
  openvswitch: Fix L4 checksum handling when dealing with IP fragments

Hauke Mehrtens (1):
  b43: fix extpa_gain check for 2GHz

Herbert Xu (1):
  net: Fix skb_set_peeked use-after-free bug

Ian Campbell (1):
  net: thunderx: remove effective default y from Kconfig if ARCH_THUNDER=y

Ido Schimmel (1):
  rocker: free netdevice during netdevice removal

Ivan Vecera (2):
  r8169: enforce RX_MULTI_EN on rtl8168ep/8111ep chips
  bna: fix interrupts storm caused by erroneous packets

Jakub Kicinski (3):
  gianfar: correct filer table writing
  gianfar: correct list membership accounting
  gianfar: remove faulty filer optimizer

Jakub Pawlowski (1):
  Bluetooth: fix MGMT_EV_NEW_LONG_TERM_KEY event

Jason Wang (1):
  virtio-net: drop NETIF_F_FRAGLIST

Jia-Ju Bai (1):
  3c59x: Fix resource leaks in vortex_open

Joe Stringer (1):
  netfilter: conntrack: Use flags in nf_ct_tmpl_alloc()

Kalesh AP (3):
  be2net: enable IFACE filters only after creating RXQs
  be2net: post buffers before destroying RXQs in Lancer
  be2net: protect eqo-affinity_mask from getting freed twice

Kalle Valo (1):
  Merge tag 'iwlwifi-for-kalle-2015-07-30' of 
https://git.kernel.org/.../iwlwifi/iwlwifi-fixes

LEROY Christophe (2):
  net: fs_enet: explicitly remove I flag on TX partial frames
  net: fs_enet: mask interrupts for TX partial frames.

Larry Finger (1):
  rtlwifi: rtl8723be: Add module parameter for MSI interrupts

Lucas Stach (1):
  net: fec: fix

Re: [PATCH 1/2] Add a matching set of device_ functions for determining mac/phy

2015-08-13 Thread Robin Murphy


Hi Jeremy,

On 12/08/15 23:06, Jeremy Linton wrote:
[...]

+static void *device_get_mac_addr(struct device *dev,
+const char *name, char *addr,
+int alen)
+{
+   int ret = device_property_read_u8_array(dev, name, addr, alen);
+
+   if (ret == 0 && is_valid_ether_addr(addr))
+   return addr;
+   return NULL;
+}


Not sure I understand the logic here - return the same thing we were 
given if we updated it, or null if we didn't. It's only indicating 
success/failure (the caller can perfectly well cast its own buffer to a 
void * if it needs to), so why wouldn't you just return a normal int 
error code?



+/**
+ * Search the device tree for the best MAC address to use.  'mac-address' is
+ * checked first, because that is supposed to contain to most recent MAC
+ * address. If that isn't set, then 'local-mac-address' is checked next,
+ * because that is the default address.  If that isn't set, then the obsolete
+ * 'address' is checked, just in case we're using an old device tree.
+ *
+ * Note that the 'address' property is supposed to contain a virtual address of
+ * the register set, but some DTS files have redefined that property to be the
+ * MAC address.
+ *
+ * All-zero MAC addresses are rejected, because those could be properties that
+ * exist in the device tree, but were not set by U-Boot.  For example, the
+ * DTS could define 'mac-address' and 'local-mac-address', with zero MAC
+ * addresses.  Some older U-Boots only initialized 'local-mac-address'.  In
+ * this case, the real MAC is in 'local-mac-address', and 'mac-address' exists
+ * but is all zeros.
+*/
+void *device_get_mac_address(struct device *dev, char *addr, int alen)
+{
+   addr = device_get_mac_addr(dev, "mac-address", addr, alen);
+   if (addr)
+   return addr;
+
+   addr = device_get_mac_addr(dev, "local-mac-address", addr, alen);
+   if (addr)
+   return addr;
+
+   return device_get_mac_addr(dev, "address", addr, alen);
+}
+EXPORT_SYMBOL(device_get_mac_address);


Same here, it's not at all apparent why this should return a void * 
instead of an int (or even possibly bool). of_get_mac_address is giving 
its caller back a _new_ pointer they didn't know about before; this isn't.


Robin.
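
As an illustration of the review comment above, here is a minimal userspace sketch of an int-returning lookup. It is not the patch and not a kernel API: `struct sk_prop`, `sk_read_mac()` and `sk_get_mac_address()` are hypothetical names, and the property table stands in for whatever the firmware would provide. The validity test (unicast and not all zeros) follows what the quoted comment says about rejecting all-zero addresses.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SK_ETH_ALEN 6

/* Hypothetical stand-in for a firmware property: a name plus raw bytes. */
struct sk_prop {
	const char *name;
	const unsigned char *data;
	size_t len;
};

/* Usable means unicast (low bit of first octet clear) and not all zeros. */
static int sk_mac_is_valid(const unsigned char *a)
{
	static const unsigned char zero[SK_ETH_ALEN];

	return !(a[0] & 1) && memcmp(a, zero, SK_ETH_ALEN) != 0;
}

/* Copy one named property into addr; 0 on success, negative errno otherwise. */
static int sk_read_mac(const struct sk_prop *props, size_t nprops,
		       const char *name, unsigned char *addr)
{
	for (size_t i = 0; i < nprops; i++) {
		if (strcmp(props[i].name, name) != 0)
			continue;
		if (props[i].len != SK_ETH_ALEN || !sk_mac_is_valid(props[i].data))
			return -EINVAL;
		memcpy(addr, props[i].data, SK_ETH_ALEN);
		return 0;
	}
	return -ENOENT;
}

/* Try the property names in the order the quoted comment describes. */
static int sk_get_mac_address(const struct sk_prop *props, size_t nprops,
			      unsigned char *addr)
{
	static const char *const names[] = {
		"mac-address", "local-mac-address", "address",
	};
	int ret = -ENOENT;

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		ret = sk_read_mac(props, nprops, names[i], addr);
		if (ret == 0)
			return 0;
	}
	return ret;
}
```

Returning an error code keeps the caller's buffer pointer untouched, so each fallback lookup receives the original buffer. In the quoted code, addr is overwritten with the helper's return value, which is NULL after a failed lookup.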


Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API

2015-08-13 Thread Marcelo Ricardo Leitner


On 13-08-2015 06:37, Steven Whitehouse wrote:

Hi,

On 12/08/15 17:42, Marcelo Ricardo Leitner wrote:

On 12-08-2015 12:33, David Laight wrote:

From: Marcelo Ricardo Leitner

Sent: 12 August 2015 14:16
On 12-08-2015 07:23, David Laight wrote:

From: Marcelo Ricardo Leitner

Sent: 11 August 2015 23:22
DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's not
needed but this causes it to use sctp_do_peeloff() to mimic a
kernel_accept() and this causes a symbol dependency on sctp module.

By switching it to 1-to-1 API we can avoid this dependency and also
reduce quite a lot of SCTP-specific code in lowcomms.c.

...

You still need to enable sctp notifications (I think the patch deleted
that code).
Otherwise you don't get any kind of indication if the remote system
'resets' (ie sends a new INIT chunk) on an existing connection.


Right, it would miss the restart event and could corrupt the tx/rx
buffers by gluing parts of old messages onto new ones.


Except that it is SCTP so you'd expect DATA chunks to contain entire
messages and so get unexpected message sequences rather than corrupt
messages.


I was thinking of cases where the recvmsg buffer is too small to
hold the chunk, so that the remainder is left for another attempt
(sctp_recvmsg, around line 2130), but it sounds like we won't purge the rx
buffer when the reset happens so that doesn't matter. The association
is replaced, but the buffers are kept.

Out of order messages aren't a problem for dlm. It can recover from
that just fine: it doesn't have a specific handshake at the beginning
or anything like that, and the upper layers are agnostic to that state
transition (disconnect/reconnect/...), so this should be fine.


I'm not sure that's true - DLM does rely on message ordering in some
cases in order to ensure correct functioning. So depending on how SCTP
is interfaced to DLM, it might potentially be an issue.


Yes, that ordering is still kept. Like, it won't flip a newer message to 
a first position. It's just that if DLM had its own handshake exposing 
its version and features, one peer (the old one) would get it out of the 
blue and the other (the new one) would never get it. Or if its messages 
would depend on a previous state, meaning LockMsgC is only acceptable if 
LockMsgA was already performed on that connection. That is my 
understanding from what David pointed out and what I checked here.


Then as lowcomms previously allowed connection closing without telling 
anyone above it that it happened, it should be fine, right? It will just 
finish processing the old messages and then start on the new ones, just 
like before.


Thanks,
Marcelo


Re: ath9k_htc: drv_init: match wait_for_completion_timeout return type

2015-08-13 Thread Kalle Valo


 Return type of wait_for_completion_timeout is unsigned long not int.
 As time_left is exclusively used for wait_for_completion_timeout here its
 type is simply changed to unsigned long.
 
 API conformance testing for completions with coccinelle spatches are being
 used to locate API usage inconsistencies:
 ./drivers/net/wireless/ath/ath9k/htc_drv_init.c:81
   int return assigned to unsigned long
 
 Patch was compile tested with x86_64_defconfig + CONFIG_ATH_CARDS=m,
 CONFIG_ATH9K_HTC=m
 
 Patch is against 4.1-rc3 (localversion-next is -next-20150514)
 
 Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo

Re: [PATCH 2/5] net: rfkill: add rfkill_find_type function

2015-08-13 Thread Heikki Krogerus

Hi,

On Thu, Aug 13, 2015 at 11:27:46AM +0200, Johannes Berg wrote:
 On Wed, 2015-08-05 at 16:39 +0300, Heikki Krogerus wrote:
  
  +static const char *rfkill_types[NUM_RFKILL_TYPES] = {
  +   [RFKILL_TYPE_WLAN]  = "wlan",
  +   [RFKILL_TYPE_BLUETOOTH] = "bluetooth",
  +   [RFKILL_TYPE_UWB]   = "ultrawideband",
  +   [RFKILL_TYPE_WIMAX] = "wimax",
  +   [RFKILL_TYPE_WWAN]  = "wwan",
  +   [RFKILL_TYPE_GPS]   = "gps",
  +   [RFKILL_TYPE_FM]= "fm",
  +   [RFKILL_TYPE_NFC]   = "nfc",
  +};
  +
  +enum rfkill_type rfkill_find_type(const char *name)
  +{
  +   int i;
  +
  +   BUILD_BUG_ON(NUM_RFKILL_TYPES != RFKILL_TYPE_NFC + 1);
  
 That BUILD_BUG_ON() is now less useful - previously it pointed to the
 code that needed to change, now you're left wondering if you don't look
 up since it isn't quite that obvious from the code what this does.
 
 Something like
 
   BUILD_BUG_ON(rfkill_types[NUM_RFKILL_TYPES - 1] == NULL);
 
 would be better. As we only add here, that would be safe enough - I've
 done something similar in the past that was a bit more complicated.

OK, I'll change it.
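
For anyone reading the archive later, the idea of tying the build-time check to the name table itself can be sketched in plain C11. Note this is a swapped-in userspace analogue, not the BUILD_BUG_ON() suggested above: it checks the table's length with _Static_assert rather than testing the last element, and the `sk_` enum and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical enum mirroring the shape of the one in the quoted patch. */
enum sk_type {
	SK_TYPE_ALL = 0,
	SK_TYPE_WLAN,
	SK_TYPE_BLUETOOTH,
	SK_TYPE_NFC,
	SK_NUM_TYPES,
};

/* No explicit size: the array ends at the highest index initialised. */
static const char *const sk_type_names[] = {
	[SK_TYPE_WLAN]		= "wlan",
	[SK_TYPE_BLUETOOTH]	= "bluetooth",
	[SK_TYPE_NFC]		= "nfc",
};

/* Fails to compile if a type is appended to the enum without a name here. */
_Static_assert(sizeof(sk_type_names) / sizeof(sk_type_names[0]) == SK_NUM_TYPES,
	       "sk_type_names[] is missing the newest type");

/* Map a name to its type; unknown or NULL names fall back to SK_TYPE_ALL. */
static enum sk_type sk_find_type(const char *name)
{
	if (!name)
		return SK_TYPE_ALL;

	for (int i = 1; i < SK_NUM_TYPES; i++) {
		if (sk_type_names[i] && strcmp(name, sk_type_names[i]) == 0)
			return (enum sk_type)i;
	}
	return SK_TYPE_ALL;
}
```

Like the suggestion in the thread, this only catches additions at the end of the enum, which is enough if types are only ever appended.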

 With that and the static inline fixed (which maybe you could even
 remove) I'm fine with all these rfkill patches, but I'm not sure how to
 merge them since they affect all kinds of other trees. If desired, I
 can apply them, but an ACK from the tegra maintainer would be good :)

Andy and Mika are preparing some changes to the device property
handling. I'll wait for their proposal and prepare the next version of
these after that.


Thanks,

-- 
heikki

Re: [PATCH] net/wireless: enable wiphy device to suspend/resume asynchronously

2015-08-13 Thread Johannes Berg

On Thu, 2015-07-30 at 08:55 +0300, Emmanuel Grumbach wrote:
 On Thu, Jul 30, 2015 at 8:18 AM, Fu, Zhonghui
 zhonghui...@linux.intel.com wrote:
  Enable wiphy device to suspend/resume asynchronously. This can 
  improve
  system suspend/resume speed.
  
 
 How will that impact the timing with respect to the suspend call
 coming from the bus?
 I think that a few drivers rely on the suspend call of the wiphy
 device happening before the suspend call to the bus device.
 

Yes, we can't do this for precisely this reason unless we have a way to
somehow keep the dependency between the two - possibly by also marking
the other one as async (although I don't know if the async framework in
general has any FIFO guarantees, which would be required for this.)

I've dropped the patch.

johannes

[PATCH net] ppp: fix device unregistration upon netns deletion

2015-08-13 Thread Guillaume Nault

PPP devices may get automatically unregistered when their network
namespace is getting removed. This happens if the ppp control plane
daemon (e.g. pppd) exits while it is the last user of this namespace.

This leads to several races:

  * ppp_exit_net() may destroy the per namespace idr (pn->units_idr)
before all file descriptors were released. Successive ppp_release()
calls may then cleanup PPP devices with ppp_shutdown_interface() and
try to use the already destroyed idr.

  * Automatic device unregistration may also happen before the
ppp_release() call for that device gets executed. Once called on
the file owning the device, ppp_release() will then clean it up and
try to unregister it a second time.

To fix these issues, operations defined in ppp_shutdown_interface() are
moved to the PPP device's ndo_uninit() callback. This allows PPP
devices to be properly cleaned up by unregister_netdev() and friends.
So checking for ppp->owner is now an accurate test to decide if a PPP
device should be unregistered.

Setting ppp->owner is done in ppp_create_interface(), before device
registration, in order to avoid unprotected modification of this field.

Finally ppp_exit_net() now starts by unregistering all remaining PPP
devices to ensure that none will get unregistered after the call to
idr_destroy().

Signed-off-by: Guillaume Nault g.na...@alphalink.fr
---
 drivers/net/ppp/ppp_generic.c | 79 +++
 1 file changed, 43 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 9d15566..1dc478a 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -269,9 +269,9 @@ static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff 
*skb, int inbound);
 static void ppp_ccp_closed(struct ppp *ppp);
 static struct compressor *find_compressor(int type);
 static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st);
-static struct ppp *ppp_create_interface(struct net *net, int unit, int *retp);
+static struct ppp *ppp_create_interface(struct net *net, int unit,
+   struct file *file, int *retp);
 static void init_ppp_file(struct ppp_file *pf, int kind);
-static void ppp_shutdown_interface(struct ppp *ppp);
 static void ppp_destroy_interface(struct ppp *ppp);
 static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit);
 static struct channel *ppp_find_channel(struct ppp_net *pn, int unit);
@@ -392,8 +392,10 @@ static int ppp_release(struct inode *unused, struct file 
*file)
file->private_data = NULL;
if (pf->kind == INTERFACE) {
ppp = PF_TO_PPP(pf);
+   rtnl_lock();
if (file == ppp->owner)
-   ppp_shutdown_interface(ppp);
+   unregister_netdevice(ppp->dev);
+   rtnl_unlock();
}
if (atomic_dec_and_test(&pf->refcnt)) {
switch (pf->kind) {
@@ -593,8 +595,10 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
mutex_lock(&ppp_mutex);
if (pf->kind == INTERFACE) {
ppp = PF_TO_PPP(pf);
+   rtnl_lock();
if (file == ppp->owner)
-   ppp_shutdown_interface(ppp);
+   unregister_netdevice(ppp->dev);
+   rtnl_unlock();
}
if (atomic_long_read(&file->f_count) < 2) {
ppp_release(NULL, file);
@@ -838,11 +842,10 @@ static int ppp_unattached_ioctl(struct net *net, struct 
ppp_file *pf,
/* Create a new ppp unit */
if (get_user(unit, p))
break;
-   ppp = ppp_create_interface(net, unit, &err);
+   ppp = ppp_create_interface(net, unit, file, &err);
if (!ppp)
break;
file->private_data = &ppp->file;
-   ppp->owner = file;
err = -EFAULT;
if (put_user(ppp->file.index, p))
break;
@@ -916,6 +919,17 @@ static __net_init int ppp_init_net(struct net *net)
 static __net_exit void ppp_exit_net(struct net *net)
 {
struct ppp_net *pn = net_generic(net, ppp_net_id);
+   struct ppp *ppp;
+   LIST_HEAD(list);
+   int id;
+
+   rtnl_lock();
+   idr_for_each_entry(&pn->units_idr, ppp, id) {
+   unregister_netdevice_queue(ppp->dev, &list);
+   }
+
+   unregister_netdevice_many(&list);
+   rtnl_unlock();
 
idr_destroy(&pn->units_idr);
 }
@@ -1088,8 +1102,28 @@ static int ppp_dev_init(struct net_device *dev)
return 0;
 }
 
+static void ppp_dev_uninit(struct net_device *dev)
+{
+   struct ppp *ppp = netdev_priv(dev);
+   struct ppp_net *pn =

Re: [PATCH] mac80211: fix invalid read in minstrel_sort_best_tp_rates()

2015-08-13 Thread Johannes Berg

On Tue, 2015-07-28 at 10:30 +0200, Adrien Schildknecht wrote:
 At the last iteration of the loop, j may equal zero and thus
 tp_list[j - 1] causes an invalid read.
 Changed the logic of the loop so that j - 1 is always >= 0.
 
 Signed-off-by: Adrien Schildknecht adrien+...@schischi.me
 
Applied, I added Cc stable.

johannes

Re: [V2] rtlwifi: misspelled code and comments corrected.

2015-08-13 Thread Kalle Valo


 Signed-off-by: Cheolhyun Park pch851...@gmail.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo

Re: ath9k_htc: match wait_for_completion_timeout return type

2015-08-13 Thread Kalle Valo


 Return type of wait_for_completion_timeout is unsigned long not int.
 As time_left is exclusively used for wait_for_completion_timeout here its
 type is simply changed to unsigned long.
 
 API conformance testing for completions with coccinelle spatches are being
 used to locate API usage inconsistencies:
 ./drivers/net/wireless/ath/ath9k/htc_hst.c:171
   int return assigned to unsigned long
 ./drivers/net/wireless/ath/ath9k/htc_hst.c:277
   int return assigned to unsigned long
 ./drivers/net/wireless/ath/ath9k/htc_hst.c:206
   int return assigned to unsigned long
 
 Patch was compile tested with x86_64_defconfig + CONFIG_ATH_CARDS=m,
 CONFIG_ATH9K_HTC=m
 
 Patch is against 4.1-rc3 (localversion-next is -next-20150514)
 
 
 Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo

[PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging

2015-08-13 Thread Igor Plyatov

* Due to a HW bug, the LAN8700 sometimes does not detect the presence of energy
  in the Ethernet cable in Energy Detect Power-Down mode (e.g. while the
  EDPWRDOWN bit is set, the ENERGYON bit is sometimes not asserted). This is a
  common bug of the LAN87xx family of PHY chips.
* lan87xx_read_status() was improved to poll the ENERGYON bit. Its previous
  algorithm was still not 100% reliable and sometimes missed cable plugging.

Signed-off-by: Igor Plyatov plya...@gmail.com
---
 drivers/net/phy/smsc.c |   18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index c0f6479..a380958 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
 static int lan87xx_read_status(struct phy_device *phydev)
 {
 	int err = genphy_read_status(phydev);
+	int rc;
+	int i;
 
 	if (!phydev->link) {
 		/* Disable EDPD to wake up PHY */
-		int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
+		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
 		if (rc < 0)
 			return rc;
 
@@ -116,8 +118,16 @@ static int lan87xx_read_status(struct phy_device *phydev)
 		if (rc < 0)
 			return rc;
 
-		/* Sleep 64 ms to allow ~5 link test pulses to be sent */
-		msleep(64);
+		/* Wait max 640 ms to detect energy */
+		for (i = 0; i < 64; i++) {
+			/* Sleep to allow link test pulses to be sent */
+			msleep(10);
+			rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
+			if (rc < 0)
+				return rc;
+			if (rc & MII_LAN83C185_ENERGYON)
+				break;
+		};
 
 		/* Re-enable EDPD */
 		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
@@ -191,7 +201,7 @@ static struct phy_driver smsc_phy_driver[] = {
 
 	/* basic functions */
 	.config_aneg	= genphy_config_aneg,
-	.read_status	= genphy_read_status,
+	.read_status	= lan87xx_read_status,
 	.config_init	= smsc_phy_config_init,
 	.soft_reset	= smsc_phy_reset,
 
-- 
1.7.9.5
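
The bounded polling loop in the patch above can be sketched in user space as follows. This is only an illustration under stated assumptions: read_reg_fn, sleep_ms_fn, struct fake_phy and the ENERGYON_BIT value are hypothetical names introduced here, not taken from smsc.c. The idea is to sleep in short steps, re-read the status register each time, and stop on a read error or as soon as the energy bit shows up.

```c
#include <assert.h>
#include <stddef.h>

#define ENERGYON_BIT	(1 << 1)	/* illustrative bit position only */
#define POLL_ATTEMPTS	64
#define POLL_STEP_MS	10

/* Hypothetical accessors: a register read that may return a negative
 * error code, and a sleep primitive. */
typedef int (*read_reg_fn)(void *ctx);
typedef void (*sleep_ms_fn)(unsigned int ms);

/* Returns 1 if energy was seen, 0 if all attempts were used up,
 * or the negative error code returned by the register read. */
static int wait_for_energy(read_reg_fn read_reg, void *ctx,
			   sleep_ms_fn sleep_ms)
{
	int i;

	assert(read_reg != NULL && sleep_ms != NULL);

	for (i = 0; i < POLL_ATTEMPTS; i++) {
		int rc;

		sleep_ms(POLL_STEP_MS);
		rc = read_reg(ctx);
		if (rc < 0)
			return rc;
		if (rc & ENERGYON_BIT)
			return 1;
	}
	return 0;
}

/* Fake PHY for demonstration: reports energy from the Nth read on. */
struct fake_phy {
	int reads;
	int energy_after;
};

static int fake_read(void *ctx)
{
	struct fake_phy *phy = ctx;

	phy->reads++;
	return phy->reads >= phy->energy_after ? ENERGYON_BIT : 0;
}

static void fake_sleep(unsigned int ms)
{
	(void)ms;	/* no real delay needed in the sketch */
}
```

Compared with a single fixed sleep, the loop returns early when the bit is set quickly and waits up to 64 * 10 ms otherwise.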


Re: [2/3] brcmfmac: dhd_sdio.c: use existing atomic_or primitive

2015-08-13 Thread Kalle Valo


 There's already a generic implementation so use that instead.

Thanks, applied to wireless-drivers-next.git.

Kalle Valo

Re: Question on behavior of tg3_self_test() (ethtool -t on tg3 driver)

2015-08-13 Thread Douglas Miller

Very interesting. I was running a RHEL 7.1 kernel 
3.10.0-229.ael7b.ppc64le (PowerPC). tg3 version 3.137, firmware 
5719-v1.24i,  but unknown what patches were added to either of our modules.


We will investigate the environment more, under the assumption that we 
should not be required to insert any delay between runs of ethtool -t 
... offline.


Thanks Siva,
Doug

On 08/13/2015 03:40 AM, Siva Reddy (Siva) Kallam wrote:



On 8/12/2015 6:02 PM, Douglas Miller wrote:
Oh, I had missed the extra if condition on tg3_test_link(). So 
external_lb is not a true superset of offline.


So you are not surprised by the (about) 20 second link down period 
after this test? If this is expected (albeit undocumented) behavior 
we can change the test scenario to work around it. It seems as though 
not all adapters exhibit this same symptom. From a testing 
standpoint, it is a long delay to add that may only be needed for 
this one adapter (Broadcom BCM5719, or adapter family).


We executed the ethtool -t dev offline in a loop on our local test 
machine with 5719 and linkup time is <= 5 secs.


Script:
#!/bin/bash
echo -OS Information-
uname -a
echo --Card Information--
lspci | grep 5719
echo --Interface information--
ethtool -i p4p4
echo -Offline test start--
for i in 1 2 3
do
date
ethtool -t p4p4 offline
done

Output:

-OS Information-
Linux siva-dev 4.2.0-rc4+ #1 SMP Thu Aug 13 20:24:11 IST 2015 x86_64 
x86_64 x86_64 GNU/Linux

--Card Information--
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
03:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
03:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)

--Interface information--
driver: tg3
version: 3.137
firmware-version: 5719-v1.41 NCSI v1.3.6.0
bus-info: :03:00.3
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
-Offline test start--
Thu Aug 13 22:05:59 IST 2015
The test result is PASS
The test extra info:
nvram test(online)  0
link test (online)  0
register test (offline) 0
memory test   (offline) 0
mac loopback test (offline) 0
phy loopback test (offline) 0
ext loopback test (offline) 0
interrupt test(offline) 0

Thu Aug 13 22:06:00 IST 2015
The test result is PASS
The test extra info:
nvram test(online)  0
link test (online)  0
register test (offline) 0
memory test   (offline) 0
mac loopback test (offline) 0
phy loopback test (offline) 0
ext loopback test (offline) 0
interrupt test(offline) 0

Thu Aug 13 22:06:05 IST 2015
The test result is PASS
The test extra info:
nvram test(online)  0
link test (online)  0
register test (offline) 0
memory test   (offline) 0
mac loopback test (offline) 0
phy loopback test (offline) 0
ext loopback test (offline) 0
interrupt test(offline) 0

Please check your test environment.


Thanks,
Doug

On 08/11/2015 03:31 PM, Michael Chan wrote:

On Tue, 2015-08-11 at 14:24 -0500, Douglas Miller wrote:

Yes, the wrap plugs are the loopback cables/plugs. It is my
understanding that the offline tests do not require anything to be
plugged into the ports, as they do not in any way touch the external
port. They perform an internal loopback test which does not depend on
any external connection.

Correct.


  From what I can tell, the only difference between offline and
external_lb is that external_lb performs the external loopback
tests, *in addition to* all the tests done for offline.

Correct.


This would
imply that the only tests that depend on anything connected to the
physical port is external_lb, and there is no requirement that the
wrap plugs be removed/replaced in order to run offline tests.
When you do external loopback test, we skip the link test because you no
longer have normal connection to the network.  You now use a special
loopback cable, which will fail the link up test because the link up
test assumes connection to the network using normal cable.

In the case I was debugging, wrap plugs were installed because the ports
were, later, being tested in an external loopback way.

What I am observing is that it takes about 20 seconds for the kernel to
declare that the link is up, after running the offline or
external_lb test. In the case of offline I cannot run the test again
until the kernel declares the link up. In the case of external_lb I
can run the test again immediately and it passes.

As stated earlier, because we skip the link test when we are performing
external_lb.

So, you should always do

Re: [PATCH] mac80211_hwsim: unregister genetlink family properly

2015-08-13 Thread Johannes Berg

On Fri, 2015-08-07 at 16:54 +0800, Su Kang Yin wrote:
 During hwsim_init_netlink(), we should call genl_unregister_family()
 if failed on netlink_register_notifier() since the genetlink is
 already registered.
 
Applied.

johannes

[PATCH net-next 2/2] ppp: implement x-netns support

2015-08-13 Thread Guillaume Nault

Let packets move from one netns to the other at PPP encapsulation and
decapsulation time.

PPP units and channels remain in the netns in which they were
originally created. Only the net_device may move to a different
namespace. Cross netns handling is thus transparent to lower PPP
layers (PPPoE, L2TP, etc.).

PPP devices are automatically unregistered when their netns gets
removed. So read() and poll() on the unit file descriptor will
respectively receive EOF and POLLHUP. Channels aren't affected.

Signed-off-by: Guillaume Nault g.na...@alphalink.fr
---
 drivers/net/ppp/ppp_generic.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 1dc478a..bdde5d8 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -283,6 +283,8 @@ static int unit_set(struct idr *p, void *ptr, int n);
 static void unit_put(struct idr *p, int n);
 static void *unit_find(struct idr *p, int n);
 
+static const struct net_device_ops ppp_netdev_ops;
+
 static struct class *ppp_class;
 
 /* per net-namespace data */
@@ -919,13 +921,22 @@ static __net_init int ppp_init_net(struct net *net)
 static __net_exit void ppp_exit_net(struct net *net)
 {
 	struct ppp_net *pn = net_generic(net, ppp_net_id);
+	struct net_device *dev;
+	struct net_device *aux;
 	struct ppp *ppp;
 	LIST_HEAD(list);
 	int id;
 
 	rtnl_lock();
+	for_each_netdev_safe(net, dev, aux) {
+		if (dev->netdev_ops == &ppp_netdev_ops)
+			unregister_netdevice_queue(dev, &list);
+	}
+
 	idr_for_each_entry(&pn->units_idr, ppp, id) {
-		unregister_netdevice_queue(ppp->dev, &list);
+		/* Skip devices already unregistered by previous loop */
+		if (!net_eq(dev_net(ppp->dev), net))
+			unregister_netdevice_queue(ppp->dev, &list);
 	}
 
 	unregister_netdevice_many(&list);
@@ -1018,6 +1029,7 @@ ppp_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	proto = npindex_to_proto[npi];
 	put_unaligned_be16(proto, pp);
 
+	skb_scrub_packet(skb, !net_eq(ppp->ppp_net, dev_net(dev)));
 	skb_queue_tail(&ppp->file.xq, skb);
 	ppp_xmit_process(ppp);
 	return NETDEV_TX_OK;
@@ -1138,7 +1150,6 @@ static void ppp_setup(struct net_device *dev)
 	dev->tx_queue_len = 3;
 	dev->type = ARPHRD_PPP;
 	dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
-	dev->features |= NETIF_F_NETNS_LOCAL;
 	netif_keep_dst(dev);
 }
 
@@ -1901,6 +1912,8 @@ ppp_receive_nonmp_frame(struct ppp *ppp, struct sk_buff *skb)
 			skb->dev = ppp->dev;
 			skb->protocol = htons(npindex_to_ethertype[npi]);
 			skb_reset_mac_header(skb);
+			skb_scrub_packet(skb, !net_eq(ppp->ppp_net,
+						      dev_net(ppp->dev)));
 			netif_rx(skb);
 		}
 	}
-- 
2.5.0
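
The two unregistration loops in ppp_exit_net() above can be modelled with a toy sketch. Everything here is hypothetical (struct toy_ppp_dev, toy_exit_net(), integer netns ids); it only mirrors the logic described in the patch: first take every PPP net_device that currently lives in the dying netns, then take units created in the dying netns whose net_device has moved elsewhere, so that nothing is queued twice.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: all names here are illustrative, not kernel structures. */
struct toy_ppp_dev {
	int dev_netns;		/* netns the net_device currently lives in */
	int unit_netns;		/* netns the PPP unit was created in */
	int unregistered;	/* times queued for unregistration */
};

/* Returns the number of devices queued for unregistration. */
static size_t toy_exit_net(struct toy_ppp_dev *devs, size_t n, int dying)
{
	size_t i, queued = 0;

	assert(devs != NULL || n == 0);

	/* Loop 1: PPP net_devices that currently live in the dying netns. */
	for (i = 0; i < n; i++) {
		if (devs[i].dev_netns == dying) {
			devs[i].unregistered++;
			queued++;
		}
	}

	/* Loop 2: units created in the dying netns whose net_device moved
	 * elsewhere; devices still in the dying netns were handled above. */
	for (i = 0; i < n; i++) {
		if (devs[i].unit_netns == dying &&
		    devs[i].dev_netns != dying) {
			devs[i].unregistered++;
			queued++;
		}
	}

	return queued;
}
```

With four devices covering every combination of (device netns, unit netns) in {1, 2} and netns 1 going away, three are queued exactly once and the one fully in netns 2 is left alone.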


[PATCH net-next 1/2] ppp: fix device unregistration upon netns deletion

2015-08-13 Thread Guillaume Nault

PPP devices may get automatically unregistered when their network
namespace is getting removed. This happens if the ppp control plane
daemon (e.g. pppd) exits while it is the last user of this namespace.

This leads to several races:

  * ppp_exit_net() may destroy the per namespace idr (pn-units_idr)
before all file descriptors were released. Successive ppp_release()
calls may then cleanup PPP devices with ppp_shutdown_interface() and
try to use the already destroyed idr.

  * Automatic device unregistration may also happen before the
ppp_release() call for that device gets executed. Once called on
the file owning the device, ppp_release() will then clean it up and
try to unregister it a second time.

To fix these issues, operations defined in ppp_shutdown_interface() are
moved to the PPP device's ndo_uninit() callback. This allows PPP
devices to be properly cleaned up by unregister_netdev() and friends.
So checking for ppp-owner is now an accurate test to decide if a PPP
device should be unregistered.

Setting ppp-owner is done in ppp_create_interface(), before device
registration, in order to avoid unprotected modification of this field.

Finally ppp_exit_net() now starts by unregistering all remaining PPP
devices to ensure that none will get unregistered after the call to
idr_destroy().

Signed-off-by: Guillaume Nault g.na...@alphalink.fr
---
 drivers/net/ppp/ppp_generic.c | 79 +++
 1 file changed, 43 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 9d15566..1dc478a 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -269,9 +269,9 @@ static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
 static void ppp_ccp_closed(struct ppp *ppp);
 static struct compressor *find_compressor(int type);
 static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st);
-static struct ppp *ppp_create_interface(struct net *net, int unit, int *retp);
+static struct ppp *ppp_create_interface(struct net *net, int unit,
+					struct file *file, int *retp);
 static void init_ppp_file(struct ppp_file *pf, int kind);
-static void ppp_shutdown_interface(struct ppp *ppp);
 static void ppp_destroy_interface(struct ppp *ppp);
 static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit);
 static struct channel *ppp_find_channel(struct ppp_net *pn, int unit);
@@ -392,8 +392,10 @@ static int ppp_release(struct inode *unused, struct file *file)
 		file->private_data = NULL;
 		if (pf->kind == INTERFACE) {
 			ppp = PF_TO_PPP(pf);
+			rtnl_lock();
 			if (file == ppp->owner)
-				ppp_shutdown_interface(ppp);
+				unregister_netdevice(ppp->dev);
+			rtnl_unlock();
 		}
 		if (atomic_dec_and_test(&pf->refcnt)) {
 			switch (pf->kind) {
@@ -593,8 +595,10 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		mutex_lock(&ppp_mutex);
 		if (pf->kind == INTERFACE) {
 			ppp = PF_TO_PPP(pf);
+			rtnl_lock();
 			if (file == ppp->owner)
-				ppp_shutdown_interface(ppp);
+				unregister_netdevice(ppp->dev);
+			rtnl_unlock();
 		}
 		if (atomic_long_read(&file->f_count) < 2) {
 			ppp_release(NULL, file);
@@ -838,11 +842,10 @@ static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
 		/* Create a new ppp unit */
 		if (get_user(unit, p))
 			break;
-		ppp = ppp_create_interface(net, unit, &err);
+		ppp = ppp_create_interface(net, unit, file, &err);
 		if (!ppp)
 			break;
 		file->private_data = &ppp->file;
-		ppp->owner = file;
 		err = -EFAULT;
 		if (put_user(ppp->file.index, p))
 			break;
@@ -916,6 +919,17 @@ static __net_init int ppp_init_net(struct net *net)
 static __net_exit void ppp_exit_net(struct net *net)
 {
 	struct ppp_net *pn = net_generic(net, ppp_net_id);
+	struct ppp *ppp;
+	LIST_HEAD(list);
+	int id;
+
+	rtnl_lock();
+	idr_for_each_entry(&pn->units_idr, ppp, id) {
+		unregister_netdevice_queue(ppp->dev, &list);
+	}
+
+	unregister_netdevice_many(&list);
+	rtnl_unlock();
 
 	idr_destroy(&pn->units_idr);
 }
@@ -1088,8 +1102,28 @@ static int ppp_dev_init(struct net_device *dev)
 	return 0;
 }
 
+static void ppp_dev_uninit(struct net_device *dev)
+{
+	struct ppp *ppp = netdev_priv(dev);
+	struct ppp_net *pn =

[PATCH net-next 0/2] ppp: implement x-netns support

2015-08-13 Thread Guillaume Nault

This series allows PPP devices to reside in a different netns from the
PPP unit/channels. Packets only cross netns boundaries when they're
transmitted between the net_device and the PPP unit (units and channels
always remain in their creation namespace).
So only PPP units need to handle cross namespace operations. Channels
and lower layer protocols aren't affected.

Patch #1 is a bug fix for an existing namespace deletion bug and has
been separately sent to net.
Patch #2 is the actual x-netns implementation.


Guillaume Nault (2):
  ppp: fix device unregistration upon netns deletion
  ppp: implement x-netns support

 drivers/net/ppp/ppp_generic.c | 94 ++-
 1 file changed, 57 insertions(+), 37 deletions(-)

-- 
2.5.0


Re: ath9k_htc: wmi: match wait_for_completion_timeout return type

2015-08-13 Thread Kalle Valo


 Return type of wait_for_completion_timeout is unsigned long not int.
 As time_left is exclusively used for wait_for_completion_timeout here its
 type is simply changed to unsigned long.
 
 API conformance testing for completions with coccinelle spatches are being
 used to locate API usage inconsistencies:
 ./drivers/net/wireless/ath/ath9k/wmi.c:331
   int return assigned to unsigned long
 
 Patch was compile tested with x86_64_defconfig + CONFIG_ATH_CARDS=m,
 CONFIG_ATH9K_HTC=m
 
 Patch is against 4.1-rc3 (localversion-next is -next-20150514)
 
 Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo

Re: ath9k: match wait_for_completion_timeout return type

2015-08-13 Thread Kalle Valo


 Return type of wait_for_completion_timeout is unsigned long not int.
 As time_left is exclusively used for wait_for_completion_timeout here its
 type is simply changed to unsigned long.
 
 API conformance testing for completions with coccinelle spatches are being
 used to locate API usage inconsistencies:
 ./drivers/net/wireless/ath/ath9k/link.c:197
 int return assigned to unsigned long
 
 Patch was compile tested with x86_64_defconfig + CONFIG_ATH_CARDS=m,
 
 Patch is against 4.1-rc3 (localversion-next is -next-20150514)
 
 Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo

Re: [net-next PATCH 1/3] net: make default tx_queue_len configurable

2015-08-13 Thread Jesper Dangaard Brouer


On Thu, 13 Aug 2015 03:13:40 +0200 Phil Sutter p...@nwl.cc wrote:

 On Tue, Aug 11, 2015 at 06:13:49PM -0700, Alexei Starovoitov wrote:

  In general 'changing the default' may be an acceptable thing, but then
  it needs to be strongly justified. How much performance does it bring?
 
 A quick test on my local VM with veth and netperf (netserver and veth
 peer in different netns) I see an increase of about 5% of throughput
 when using noqueue instead of the default pfifo_fast.

Good that you can show 5% improvement with a single netperf flow.  We
are saving approx 6 atomic operations avoiding the qdisc code path.

This fixes a scalability issue with veth. Thus, the real performance
boost will happen with multiple flows and multiple CPU cores in
action.  You can try with a multi core VM and use super_netperf.

https://github.com/borkmann/stuff/blob/master/super_netperf

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

[PATCH net] gianfar: Restore link state settings after MAC reset

2015-08-13 Thread Claudiu Manoil

There are some MAC registers that need to be kept in sync
with the link state parameters, see adjust_link().
However, after a MAC soft reset these registers revert to
their default values.  In some cases (other than
ifdown/ifup, for example) adjust_link() does not see
that these values were reset to default because the
priv->old* link parameters were left unchanged.
So, reset the priv->old* link params as well during a
MAC reset to let adjust_link() restore the MAC link
settings to the actual link state values.

Fixes the following case, for example:
Setting link to 100M, changing MTU (implies MAC reset),
link state remains unchanged at 100M but MAC registers
were reset to default (1G), breaking the connectivity with
the PHY.  Closing and re-opening the interface would
restore the MAC link parameters to the correct values.

Signed-off-by: Claudiu Manoil claudiu.man...@freescale.com
---
 drivers/net/ethernet/freescale/gianfar.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 2b7610f..10b3 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2102,6 +2102,11 @@ int startup_gfar(struct net_device *ndev)
 	/* Start Rx/Tx DMA and enable the interrupts */
 	gfar_start(priv);
 
+	/* force link state update after mac reset */
+	priv->oldlink = 0;
+	priv->oldspeed = 0;
+	priv->oldduplex = -1;
+
 	phy_start(priv->phydev);
 
 	enable_napi(priv);
-- 
1.7.11.7


Re: [PATCH net-next 0/3] mpls: multipath support

2015-08-13 Thread Robert Shearman


On 13/08/15 03:07, roopa wrote:

On 8/12/15, 10:30 AM, Robert Shearman wrote:

On 11/08/15 22:45, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

This patch series adds multipath support to mpls routes.

resembles ipv4 multipath support. The multipath route nexthop
selection algorithm is the same code as in ipv4 fib code.

I understand that the multipath algorithm in ipv4 is undergoing
some changes and will move mpls to similar algo if applicable once
those get merged.


Is it necessary for the mpls path selection algorithm to closely
resemble the ipv4 one?

No, it is not necessary. I picked that because it was already there. And
I see that ipv4 is also getting some new multipath algorithms
(https://marc.info/?l=linux-api&m=143457208315573&w=2). I wanted to move
to the new RT_MP infra if that becomes applicable in the future.


The MPLS code doesn't have the binary compatibility requirement that the 
IPv4 path does, so there isn't so much of a need for the algorithm to be 
configurable, provided the default is reasonable. Unless you have a use 
case in mind that would be particularly suited to the round-robin algorithm?





A flow based algorithm would be much better for traffic that is
sensitive to re-ordering (e.g TCP, L2VPN) and IMHO we should do this
from the start for MPLS.

I've also been looking at implementing this functionality. I've got a
set of patches for this that I can send if you'd like.


Definitely. But it seems like you can also submit incremental patches
to mine. You can replace the current algo with a hash-based one with your
patches.


With a flow-based algorithm if there's no need to support weighted paths 
then there's no need to iterate through the nexthops to work out which 
one should be used and, therefore, there is a performance benefit.


So my patches implement a flow-based path selection without support for 
weighted paths. This is similar to how the IPv6 path selection 
works. The user can still do UCMP with this mechanism, but they have to 
add the same nexthop multiple times. I don't know if this trade-off is 
worth it, but the benefit is that we can always add support for weighted 
paths in the future, whereas removing support for weighted paths would 
be harder due to compatibility concerns.


Therefore, if I rebased my patches on top of yours I would be removing 
code managing the weighting that you will have just added. Not sure if 
that is desirable.



If that does not work for you and if you want me to merge with this
series that works too.


I think that would work better. I'll send you a patch against your 
current series.


Thanks,
Rob
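
A minimal sketch of the flow-based selection discussed above, under stated assumptions: toy_flow_hash() is an arbitrary illustrative mix and not the hash used by the kernel, and toy_select_nexthop() is a hypothetical name. It shows why no walk over the nexthops is needed without weights: the choice is a single modulo over the nexthop count, and packets of the same flow always map to the same index.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative hash mix only; not the hash used by the kernel. */
static uint32_t toy_flow_hash(uint32_t label, uint32_t src, uint32_t dst)
{
	uint32_t h = label * 2654435761u;

	h ^= src + 0x9e3779b9u + (h << 6) + (h >> 2);
	h ^= dst + 0x9e3779b9u + (h << 6) + (h >> 2);
	return h;
}

/* Flow-based pick: one modulo, no walk over per-nexthop weights. */
static unsigned int toy_select_nexthop(uint32_t hash, unsigned int nhn)
{
	assert(nhn > 0);
	return hash % nhn;
}
```

Unequal-cost sharing then falls out of listing a nexthop more than once: with a table of { A, A, B }, two thirds of the hash space lands on A.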

Re: [PATCH net-next 1/3] mpls: move mpls_route nexthop fields to a new nhlfe struct

2015-08-13 Thread Robert Shearman


On 13/08/15 04:16, roopa wrote:

On 8/12/15, 12:15 PM, Robert Shearman wrote:

On 11/08/15 22:45, Roopa Prabhu wrote:

From: Roopa Prabhu ro...@cumulusnetworks.com

moves mpls_route nexthop fields to a new mpls_nhlfe
struct. mpls_nhlfe represents a mpls nexthop label forwarding entry.
It prepares mpls route structure for multipath support.

In the process moves mpls_route structure into internal.h.


Is there a requirement for moving this and the new datastructures into
internal.h? I may have missed it, but I don't see any dependency on
this in this patch series.


No dependency really. In my initial implementation of iptunnels I had
some shared code and it had been in internal.h since then.
I don't share any of this with iptunnels now. But, if you see patch
3/3, there is a lot more macros I add with struct nhlfe etc and it is
cleaner
to move all this to a header file than keeping it in the .c file.


Ok, I have no strong preference.




Moves some of the code from mpls_route_add into a separate mpls
nhlfe build function. changed mpls_rt_alloc to take number of
nexthops as argument.

A mpls route can point to multiple mpls_nhlfe. This patch
does not support multipath yet, hence the rest of the changes
assume that a mpls route points to a single mpls_nhlfe

Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
  net/mpls/af_mpls.c  |  225
---
  net/mpls/internal.h |   35 
  2 files changed, 158 insertions(+), 102 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 8c5707d..cf86e9d 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -21,35 +21,6 @@
  #endif
  #include "internal.h"

-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-	MPT_UNSPEC, /* IPv4 or IPv6 */
-	MPT_IPV4 = 4,
-	MPT_IPV6 = 6,
-
-	/* Other types not implemented:
-	 *  - Pseudo-wire with or without control word (RFC4385)
-	 *  - GAL (RFC5586)
-	 */
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-	struct net_device __rcu *rt_dev;
-	struct rcu_head rt_rcu;
-	u32 rt_label[MAX_NEW_LABELS];
-	u8 rt_protocol; /* routing protocol that set this entry */
-	u8 rt_payload_type;
-	u8 rt_labels;
-	u8 rt_via_alen;
-	u8 rt_via_table;
-	u8 rt_via[0];
-};
-
  static int zero = 0;
  static int label_limit = (1 << 20) - 1;


...

@@ -281,13 +254,15 @@ struct mpls_route_config {
  struct nl_info rc_nlinfo;
  };

-static struct mpls_route *mpls_rt_alloc(size_t alen)
+static struct mpls_route *mpls_rt_alloc(int num_nh)
  {
  struct mpls_route *rt;

-rt = kzalloc(sizeof(*rt) + alen, GFP_KERNEL);
+rt = kzalloc(sizeof(*rt) + (num_nh * sizeof(struct mpls_nhlfe)),


How about this instead:
  offsetof(typeof(*rt), rt_nh[num_nh])
?

That way, you don't need to write out the type of rt_nh here.

I don't mind, but I followed existing convention for this (especially
the fib code).
I would prefer keeping it the current way.


I don't think we have to follow the ipv4 convention here, but again I 
have no strong preference.





+ GFP_KERNEL);
  if (rt)
-rt->rt_via_alen = alen;
+rt->rt_nhn = num_nh;
+
  return rt;
  }




Thanks for the review.


Thank you for implementing this functionality.

Rob
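
The two ways of sizing the allocation discussed in this review can be compared with a small user-space sketch. The struct layouts and names (toy_nhlfe, toy_route, toy_rt_alloc()) are hypothetical and much smaller than the real ones. The offsetof variant is written in a portable spelling, offsetof(struct, array) plus n elements, which is assumed to be what the GNU C expression offsetof(typeof(*rt), rt_nh[num_nh]) evaluates to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy layouts for illustration; not the kernel definitions. */
struct toy_nhlfe {
	unsigned int label;
	unsigned char via[8];
};

struct toy_route {
	unsigned char rt_protocol;
	unsigned int rt_nhn;
	struct toy_nhlfe rt_nh[];	/* flexible array of nexthops */
};

/* Size as written in the patch: header plus n array elements. */
static size_t size_by_sizeof(unsigned int n)
{
	return sizeof(struct toy_route) + n * sizeof(struct toy_nhlfe);
}

/* Size as suggested in review: offset of the element one past the end.
 * The element type is taken from the member, not spelled out. */
static size_t size_by_offsetof(unsigned int n)
{
	return offsetof(struct toy_route, rt_nh) +
	       n * sizeof(((struct toy_route *)0)->rt_nh[0]);
}

static struct toy_route *toy_rt_alloc(unsigned int num_nh)
{
	struct toy_route *rt = calloc(1, size_by_offsetof(num_nh));

	if (rt)
		rt->rt_nhn = num_nh;
	return rt;
}
```

The offsetof form never exceeds the sizeof form (any trailing padding of the struct is counted only by sizeof), and it keeps working if the element type of rt_nh is later changed.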

Re: Problem with fragmented packets on tun/tap interface

2015-08-13 Thread Eric Dumazet

On Thu, 2015-08-13 at 12:52 +0530, Prashant Upadhyaya wrote:

 
 Hi,
 
 I think I have a clue to the root cause of my issue, but I do not know
 a solution.
 Let me describe what I think is the problem.
 
 Fragmented packets enter into the kernel through eth0 and the kernel
 starts assembling them.
 Simultaneously, my packet socket implementation also injects the very
 same packets into the kernel via the tap. The kernel sees them as
 overlapped packets during assembly and drops the packets injected via
 the tap.
 Eventually when the assembly gets complete inside kernel for all the
 packets which entered via eth0, the whole packet gets dropped due to
 the iptables rules that I have set on eth0.
 So naturally there is no response to the bigger ping, because
 everything got dropped one way or the other.
 
 When I do introduce the delays (and it turns out that the delay that
 matters is when injecting via tap), the kernel has already completed
 the assembly of the packets via eth0 (during the delay I introduce for
 submission on tap), and then the submission via tap works well because
 it undergoes a fresh assembly (and of course it does not get dropped
 because iptables drop rule is only on eth0)
 
 Now then, the question is -- how do I prevent the kernel from trying
 to assemble the packets arriving on eth0 and drop them right away even
 before assembly is attempted. This way the same packet injected via
 the tap would be the only one undergoing assembly and hopefully it
 would work.
 

Nice theory ! 

What kind of iptables rule do you have to drop packets coming on eth0 ?

Have you tried to install this rule in raw table, PREROUTING hook ?

This should work, because the defrag is attempted from
ip_local_deliver() [ after raw table has given its verdict] , not from
ip_rcv().

iptables -t raw -I PREROUTING -i eth0 -j DROP





Re: [PATCH 1/2] Add a matching set of device_ functions for determining mac/phy

2015-08-13 Thread Jeremy Linton


Hello Robin,

On 08/13/2015 06:57 AM, Robin Murphy wrote:

+static void *device_get_mac_addr(struct device *dev,
+const char *name, char *addr,
+int alen)
+{
+   int ret = device_property_read_u8_array(dev, name, addr, alen);
+
+   if (ret == 0 && is_valid_ether_addr(addr))
+   return addr;
+   return NULL;
+}


Not sure I understand the logic here - return the same thing we were
given if we updated it, or null if we didn't. It's only indicating
success/failure (the caller can perfectly well cast its own buffer to a
void * if it needs to), so why wouldn't you just return a normal int
error code?



	No particular reason, other than initially I was trying to keep the 
function as similar as possible to the one in of_net. AKA copy paste 
job. I can convert the return types, but I was trying for a simple 
function rename. That way the users of the of version could be converted 
with relative ease, and the drivers which invented their own version of 
these functions could be changed to use this instead. Of course, that 
plan took a blow, when I added the addr/alen parameters.


Same thing applies for the other function.





[PATCH net-next v2 1/2] net: track link status of ipv6 nexthops

2015-08-13 Thread Andy Gospodarek

Add support to track current link status of ipv6 nexthops to match
recent changes that added support for ipv4 nexthops.  This takes a
simple approach to track linkdown status for next-hops and simply
checks the dev for the dst entry and sets the proper flags to be used
in the netlink message.

v2: drop use of rt6i_nhflags since it is not needed right now

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
---

I realize this patch might be a bit different than expected based on
conversations on netdev yesterday, but I got a few off-list
communications that indicated a preference to not expand rt6_info at
this time -- despite the fact that expansion will likely be needed for
switchdev offload of ipv6 fib entries in the near future.

 net/ipv6/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 54fccf0..26b51e1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2885,6 +2885,8 @@ static int rt6_fill_node(struct net *net,
 	else
 		rtm->rtm_type = RTN_UNICAST;
 	rtm->rtm_flags = 0;
+	if (!netif_carrier_ok(rt->dst.dev))
+		rtm->rtm_flags |= RTNH_F_LINKDOWN;
 	rtm->rtm_scope = RT_SCOPE_UNIVERSE;
 	rtm->rtm_protocol = rt->rt6i_protocol;
 	if (rt->rt6i_flags & RTF_DYNAMIC)
-- 
1.9.3


[PATCH net-next v2 2/2] net: ipv6 sysctl option to ignore routes when nexthop link is down

2015-08-13 Thread Andy Gospodarek

Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link.  The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:

net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.

1000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
1000::/8 via 8000::2 dev p8p1  metric 1024  pref medium
7000::/8 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
8000::/8 dev p8p1  proto kernel  metric 256  pref medium
9000::/8 via 8000::2 dev p8p1  metric 2048  pref medium
9000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
fe80::/64 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
fe80::/64 dev p8p1  proto kernel  metric 256  pref medium
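
The "dead linkdown" markers in the listing above can be modelled with a small
userspace sketch. This is not the kernel code: the struct and function names
are hypothetical, the flag values are assumed to mirror RTNH_F_DEAD and
RTNH_F_LINKDOWN from the rtnetlink uapi header, and the rule that "dead" is
only reported when the sysctl is non-zero is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror RTNH_F_DEAD / RTNH_F_LINKDOWN; redefined locally so the
 * sketch builds without kernel headers. */
#define SKETCH_RTNH_F_DEAD      1u
#define SKETCH_RTNH_F_LINKDOWN  16u

/* Hypothetical stand-in for a nexthop: carrier state of its device plus the
 * per-device ignore_routes_with_linkdown setting. */
struct sketch_nexthop {
    bool carrier_ok;
    int  ignore_routes_with_linkdown;
};

/* Flags that would be reported to userspace for this nexthop. */
uint32_t sketch_nexthop_flags(const struct sketch_nexthop *nh)
{
    uint32_t flags = 0;

    assert(nh != NULL);
    if (!nh->carrier_ok) {
        /* Link status is always reported. */
        flags |= SKETCH_RTNH_F_LINKDOWN;
        /* The nexthop is only treated as dead when the sysctl is set. */
        if (nh->ignore_routes_with_linkdown)
            flags |= SKETCH_RTNH_F_DEAD;
    }
    return flags;
}

/* A nexthop stays selectable unless it is marked dead. */
bool sketch_nexthop_usable(const struct sketch_nexthop *nh)
{
    return !(sketch_nexthop_flags(nh) & SKETCH_RTNH_F_DEAD);
}
```

With the sysctl left at its default of 0, a nexthop on a down link would still
be usable in this model and only carry the linkdown flag.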

This also adds devconf support and notification when sysctl values
change.

v2: drop use of rt6i_nhflags since it is not needed right now

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
---
 include/linux/ipv6.h  |   1 +
 include/uapi/linux/ipv6.h |   1 +
 net/ipv6/addrconf.c   | 105 +-
 net/ipv6/route.c  |  11 -
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index cb9dcad..f1f32af 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -31,6 +31,7 @@ struct ipv6_devconf {
__s32   accept_ra_defrtr;
__s32   accept_ra_min_hop_limit;
__s32   accept_ra_pinfo;
+   __s32   ignore_routes_with_linkdown;
 #ifdef CONFIG_IPV6_ROUTER_PREF
__s32   accept_ra_rtr_pref;
__s32   rtr_probe_interval;
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 80f3b74..38b4fef 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -173,6 +173,7 @@ enum {
DEVCONF_STABLE_SECRET,
DEVCONF_USE_OIF_ADDRS_ONLY,
DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
+   DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 53e3a9d..5dfbac7 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -214,6 +214,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.initialized = false,
},
.use_oif_addrs_only = 0,
+   .ignore_routes_with_linkdown = 0,
 };
 
 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -257,6 +258,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly 
= {
.initialized = false,
},
.use_oif_addrs_only = 0,
+   .ignore_routes_with_linkdown = 0,
 };
 
 /* Check if a valid qdisc is available */
@@ -472,6 +474,9 @@ static int inet6_netconf_msgsize_devconf(int type)
if (type == -1 || type == NETCONFA_PROXY_NEIGH)
size += nla_total_size(4);
 
+   if (type == -1 || type == NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN)
+   size += nla_total_size(4);
+
return size;
 }
 
@@ -508,6 +513,11 @@ static int inet6_netconf_fill_devconf(struct sk_buff *skb, 
int ifindex,
nla_put_s32(skb, NETCONFA_PROXY_NEIGH, devconf->proxy_ndp) < 0)
goto nla_put_failure;
 
+   if ((type == -1 || type == NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN) &&
+   nla_put_s32(skb, NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN,
+   devconf->ignore_routes_with_linkdown) < 0)
+   goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -544,6 +554,7 @@ static const struct nla_policy 
devconf_ipv6_policy[NETCONFA_MAX+1] = {
[NETCONFA_IFINDEX]  = { .len = sizeof(int) },
[NETCONFA_FORWARDING]   = { .len = sizeof(int) },
[NETCONFA_PROXY_NEIGH]  = { .len = sizeof(int) },
+   [NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN]  = { .len = sizeof(int) },
 };
 
 static int inet6_netconf_get_devconf(struct sk_buff *in_skb,
@@ -766,6 +777,63 @@ static int addrconf_fixup_forwarding(struct ctl_table 
*table, int *p, int newf)
rt6_purge_dflt_routers(net);
return 1;
 }
+
+static void addrconf_linkdown_change(struct net *net, __s32 newf)
+{
+   struct net_device *dev;
+   struct inet6_dev *idev;
+
+   for_each_netdev(net, dev) {
+   idev = __in6_dev_get(dev);
+   if (idev) {
+   int changed = (!idev->cnf.ignore_routes_with_linkdown) ^ (!newf);
+
+   idev->cnf.ignore_routes_with_linkdown

RE: [PATCH] IGMP: Inhibit reports for local multicast groups

2015-08-13 Thread Philip Downey

Hi David
Thanks for taking the time to review and comment.
This is my first upstream request so please forgive any ignorance on my part.   
I have added a new proposed commit wording below with a view to agreeing the 
content before resubmitting the patch.
I hope it is sufficient to address your concerns.

   IGMP: Inhibit reports for local multicast groups

The range of addresses between 224.0.0.0 and 224.0.0.255
inclusive, is reserved for the use of routing protocols and other
low-level topology discovery or maintenance protocols, such as
gateway discovery and group membership reporting.  Multicast
routers should not forward any multicast datagram with destination
addresses in this range, regardless of its TTL.

Currently, IGMP reports are generated for this reserved range of
addresses even though a router will ignore this information since
it has no purpose.  However, the presence of reserved group
addresses in an IGMP membership report uses up network bandwidth
and can also obscure addresses of interest when inspecting
membership reports using packet inspection or debug messages.

IGMP reports for local multicast groups can now be inhibited by means
of a system control variable (setting the value to zero).

To retain backwards compatibility the previous behaviour is retained by
default on system boot.

Signed-off-by: Philip Downey pdow...@brocade.com
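
The range check described in the proposed wording can be sketched in plain C.
This is only an illustration, not the patch itself: the function names and the
report_local parameter (standing in for the system control variable) are
hypothetical, and addresses are assumed to be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from its four octets. */
uint32_t sketch_ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* True for 224.0.0.0 through 224.0.0.255 inclusive, i.e. 224.0.0.0/24. */
bool sketch_is_local_multicast(uint32_t addr)
{
    return (addr & 0xFFFFFF00u) == 0xE0000000u;
}

/* report_local models the control variable: non-zero keeps the previous
 * behaviour, zero inhibits reports for the reserved range. */
bool sketch_should_send_report(uint32_t group, int report_local)
{
    if (sketch_is_local_multicast(group) && !report_local)
        return false;
    return true;
}
```

Groups outside the reserved range are unaffected by the setting in this model.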


Regards

Philip

 -----Original Message-----
 From: David Miller [mailto:da...@davemloft.net]
 Sent: Thursday, August 13, 2015 12:45 AM
 To: Philip Downey
 Cc: kuz...@ms2.inr.ac.ru; jmor...@namei.org; yoshf...@linux-ipv6.org;
 ka...@trash.net; linux-ker...@vger.kernel.org; netdev@vger.kernel.org
 Subject: Re: [PATCH] IGMP: Inhibit reports for local multicast groups
 
 From: Philip Downey pdow...@brocade.com
 Date: Wed, 12 Aug 2015 17:13:53 +0100
 
  IGMP reports are generated for link local multicast groups (224.0.0.1
  - 224.0.0.255) used by the routing protocols such as RIP, OSPF etc.
  In general routers do not generate reports for local multicast groups.
 
  IGMP reports for local multicast groups can now be inhibited by means
  of a system control variable (setting the value to zero).
 
  To retain backwards compatibility the previous behaviour is retained
  by default on system boot.
 
  Signed-off-by: Philip Downey pdow...@brocade.com
 
 I'm always hesitant to apply patches like this.
 
 I can't even understand from your explanation:
 
 1) what about local reporting behavior is so bad
 
 2) why you want to inhibit them at all
 
 For example, this:
 
  In general routers do not generate reports for local multicast groups.
 
 Doesn't tell me anything.  You need to go into more detail about this, and
 explain the situation sufficiently.

Re: [PATCH] mm: make page pfmemalloc check more robust

2015-08-13 Thread Eric Dumazet

On Thu, 2015-08-13 at 11:13 +0200, Vlastimil Babka wrote:

 Given that this apparently isn't the first case of this localhost issue, 
 I wonder if network code should just clear skb->pfmemalloc during send 
 (or maybe just send over localhost). That would be probably easier than 
 distinguish the __skb_fill_page_desc() callers for send vs receive.

Would this still be needed after this patch?

It is sad we do not have an SNMP counter to at least count how often we
drop skb because pfmemalloc is set.

I'll provide such a patch.



Re: [net-next PATCH 1/3] net: make default tx_queue_len configurable

2015-08-13 Thread Phil Sutter

On Thu, Aug 13, 2015 at 03:10:33PM +0200, Jesper Dangaard Brouer wrote:
 
 On Thu, 13 Aug 2015 03:13:40 +0200 Phil Sutter p...@nwl.cc wrote:
 
  On Tue, Aug 11, 2015 at 06:13:49PM -0700, Alexei Starovoitov wrote:
 
   In general 'changing the default' may be an acceptable thing, but then
   it needs to be strongly justified. How much performance does it bring?
  
  In a quick test on my local VM with veth and netperf (netserver and veth
  peer in different netns), I see an increase of about 5% in throughput
  when using noqueue instead of the default pfifo_fast.
 
 Good that you can show 5% improvement with a single netperf flow.  We
 are saving approx 6 atomic operations avoiding the qdisc code path.
 
 This fixes a scalability issue with veth. Thus, the real performance
 boost will happen with multiple flows and multiple CPU cores in
 action.  You can try with a multi core VM and use super_netperf.
 
 https://github.com/borkmann/stuff/blob/master/super_netperf

I actually used that on my VM as well, but the difference between a
single and ten streams in parallel was negligible. To avoid skewing
the results, I tested again on a physical system with four cores, ran
each benchmark ten times and averaged the results. This showed an
increase in throughput of about 35% with a single stream and about 10%
with ten streams in parallel. Not sure
though why the improvement is bigger in the first case if there really
is a scalability problem as you say.

Cheers, Phil

Re: [PATCH 1/1] ipv4: off-by-one in continuation handling in /proc/net/route

2015-08-13 Thread Eric Dumazet

On Thu, 2015-08-13 at 11:21 +0100, Andy Whitcroft wrote:
 When generating /proc/net/route we emit a header followed by a line for
 each route.  When a short read is performed we will restart this process
 based on the open file descriptor.  When calculating the start point we
 fail to take into account that the 0th entry is the header.  This leads
 us to skip the first entry when doing a continuation read.
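
 The position arithmetic described above can be illustrated with a small
 model. This is not the fib_trie code: the function names are hypothetical,
 and the output is assumed to be a header at position 0 followed by one line
 per route at positions 1..nroutes.

```c
#include <assert.h>
#include <stddef.h>

/* Correct mapping: position 0 is the header, so a read that resumes at
 * position pos must continue with route index pos - 1. Returns nroutes when
 * nothing is left to emit. */
size_t sketch_resume_index(size_t pos, size_t nroutes)
{
    size_t idx = (pos == 0) ? 0 : pos - 1;

    if (idx > nroutes)
        idx = nroutes;
    assert(idx <= nroutes);
    return idx;
}

/* Off-by-one variant: treats pos as a route index and forgets the header,
 * so every continuation read skips one route. */
size_t sketch_resume_index_buggy(size_t pos, size_t nroutes)
{
    return pos < nroutes ? pos : nroutes;
}
```

 Resuming at position 1 (header already emitted) yields route index 0 with the
 corrected mapping, while the buggy variant starts at index 1 and loses the
 first route.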
 
 This can be easily seen with the comparison below:
 
   while read l; do echo $l; done </proc/net/route >A
   cat /proc/net/route >B
   diff -bu A B | grep '^[+-]'
 
 On my example machine I have approximately 10KB of route output.  There we
 see the very first non-title element is lost in the while read case,
 and an entry around the 8K mark in the cat case:
 
   +wlan0  02021EAC 0003 0 0 400  0 0 0
   -tun1  00C0AC0A  0001 0 0 950 00C0 0 0 0
 
 Fix up the off-by-one when reacquiring position on continuation.
 
 BugLink: http://bugs.launchpad.net/bugs/1483440
 Signed-off-by: Andy Whitcroft a...@canonical.com
 ---
  net/ipv4/fib_trie.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
   From code inspection I believe this was introduced by the Fixes
   below, but I have not tested this to confirm.
 
   Fixes: 8be33e955cb9 ("ipv4: off-by-one in continuation handling in /proc/net/route")

You probably meant

Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")

CC Alexander for review/comment



RE: [PATCH] IGMP: Inhibit reports for local multicast groups

2015-08-13 Thread Philip Downey

Hi Andrew
IGMP snooping is designed to prevent hosts on a local network from receiving 
traffic for a multicast group they have not explicitly joined.   Link-Local 
multicast traffic should not have an IGMP client since it is reserved for 
routing protocols.  One would expect that IGMP snooping needs to ignore local 
multicast traffic in the reserved range intended for routers since there should 
be no IGMP client to make join requests.

Regards

Philip

 -----Original Message-----
 From: Andrew Lunn [mailto:and...@lunn.ch]
 Sent: Thursday, August 13, 2015 5:06 PM
 To: Philip Downey
 Cc: David Miller; kuz...@ms2.inr.ac.ru; jmor...@namei.org; yoshfuji@linux-
 ipv6.org; ka...@trash.net; linux-ker...@vger.kernel.org;
 netdev@vger.kernel.org
 Subject: Re: [PATCH] IGMP: Inhibit reports for local multicast groups
 
 On Thu, Aug 13, 2015 at 02:48:23PM +, Philip Downey wrote:
  Hi David
  Thanks for taking the time to review and comment.
  This is my first upstream request so please forgive any ignorance on my
 part.   I have added a new proposed commit wording below with a view to
 agreeing the content before resubmitting the patch.
  I hope it is sufficient to address your concerns.
 
 IGMP: Inhibit reports for local multicast groups
 
  The range of addresses between 224.0.0.0 and 224.0.0.255
  inclusive, is reserved for the use of routing protocols and other
  low-level topology discovery or maintenance protocols, such as
  gateway discovery and group membership reporting.  Multicast
  routers should not forward any multicast datagram with   destination
  addresses in this range, regardless of its TTL.
 
  Currently, IGMP reports are generated for this reserved range of
  addresses even though a router will ignore this information since
  it has no purpose.
 
 Hi Philip
 
 What about switches which are doing IGMP snooping?
 
  Andrew

[PATCH net-next 3/7] net: dsa: mv88e6xxx: add VLAN Get Next support

2015-08-13 Thread Vivien Didelot

Implement the port_pvid_get and vlan_getnext driver functions required
to dump VLAN entries from the hardware, with the VTU Get Next operation.

Some functions and structures will be shared with STU operations, since
their table formats are similar (e.g. STU data entries are accessible
with the same registers as VTU entries, except with an offset of 2).
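
The shared register layout mentioned above can be sketched outside the driver.
This is only a model under stated assumptions: three 16-bit data registers,
one nibble per port, a 2-bit field per table (offset 0 for VTU, offset 2 for
STU), and a mask of 0x3. The names are hypothetical and are not the driver's.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DATA_MASK 0x3u  /* assumed width of one per-port field */
#define SKETCH_MAX_PORTS 12u   /* 3 registers x 4 nibbles */

/* Unpack per-port fields from three data registers. nibble_offset selects
 * which half of each port's nibble is read: 0 for VTU data, 2 for STU data. */
void sketch_unpack(const uint16_t regs[3], unsigned nibble_offset,
                   unsigned num_ports, uint8_t out[])
{
    assert(num_ports <= SKETCH_MAX_PORTS);
    assert(nibble_offset == 0 || nibble_offset == 2);

    for (unsigned i = 0; i < num_ports; ++i) {
        /* Four ports per register, four bits per port. */
        unsigned shift = (i % 4) * 4 + nibble_offset;

        out[i] = (uint8_t)((regs[i / 4] >> shift) & SKETCH_DATA_MASK);
    }
}
```

Calling the same helper with offset 0 and then offset 2 on the same registers
yields the VTU and STU views of each port.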

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6352.c |   2 +
 drivers/net/dsa/mv88e6xxx.c | 138 
 drivers/net/dsa/mv88e6xxx.h |  27 +
 3 files changed, 167 insertions(+)

diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index a18f7c8..e6767ce 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -343,6 +343,8 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.port_join_bridge   = mv88e6xxx_join_bridge,
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
+   .port_pvid_get  = mv88e6xxx_port_pvid_get,
+   .vlan_getnext   = mv88e6xxx_vlan_getnext,
.port_fdb_add   = mv88e6xxx_port_fdb_add,
.port_fdb_del   = mv88e6xxx_port_fdb_del,
.port_fdb_getnext   = mv88e6xxx_port_fdb_getnext,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 175353a..ecdd9da 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2,6 +2,9 @@
  * net/dsa/mv88e6xxx.c - Marvell 88e6xxx switch chip support
  * Copyright (c) 2008 Marvell Semiconductor
  *
+ * Copyright (c) 2015 CMC Electronics, Inc.
+ * Added support for VLAN Table Unit operations
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -1182,6 +1185,19 @@ int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int 
port, u8 state)
return 0;
 }
 
+int mv88e6xxx_port_pvid_get(struct dsa_switch *ds, int port, u16 *pvid)
+{
+   int ret;
+
+   ret = mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_DEFAULT_VLAN);
+   if (ret < 0)
+   return ret;
+
+   *pvid = ret & PORT_DEFAULT_VLAN_MASK;
+
+   return 0;
+}
+
 static int _mv88e6xxx_vtu_wait(struct dsa_switch *ds)
 {
return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_VTU_OP,
@@ -1210,6 +1226,128 @@ static int _mv88e6xxx_vtu_stu_flush(struct dsa_switch 
*ds)
return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_FLUSH_ALL);
 }
 
+static int _mv88e6xxx_vtu_stu_data_read(struct dsa_switch *ds,
+   struct mv88e6xxx_vtu_stu_entry *entry,
+   unsigned int nibble_offset)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   u16 regs[3];
+   int i;
+   int ret;
+
+   for (i = 0; i < 3; ++i) {
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+ GLOBAL_VTU_DATA_0_3 + i);
+   if (ret < 0)
+   return ret;
+
+   regs[i] = ret;
+   }
+
+   for (i = 0; i < ps->num_ports; ++i) {
+   unsigned int shift = (i % 4) * 4 + nibble_offset;
+   u16 reg = regs[i / 4];
+
+   entry->data[i] = (reg >> shift) & GLOBAL_VTU_STU_DATA_MASK;
+   }
+
+   return 0;
+}
+
+static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, u16 vid,
+ struct mv88e6xxx_vtu_stu_entry *entry)
+{
+   struct mv88e6xxx_vtu_stu_entry next = { 0 };
+   int ret;
+
+   ret = _mv88e6xxx_vtu_wait(ds);
+   if (ret < 0)
+   return ret;
+
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID,
+  vid & GLOBAL_VTU_VID_MASK);
+   if (ret < 0)
+   return ret;
+
+   ret = _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_GET_NEXT);
+   if (ret < 0)
+   return ret;
+
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_VID);
+   if (ret < 0)
+   return ret;
+
+   next.vid = ret & GLOBAL_VTU_VID_MASK;
+   next.valid = !!(ret & GLOBAL_VTU_VID_VALID);
+
+   if (next.valid) {
+   ret = _mv88e6xxx_vtu_stu_data_read(ds, &next, 0);
+   if (ret < 0)
+   return ret;
+
+   if (mv88e6xxx_6097_family(ds) || mv88e6xxx_6165_family(ds) ||
+   mv88e6xxx_6351_family(ds) || mv88e6xxx_6352_family(ds)) {
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+ GLOBAL_VTU_FID);
+   if (ret < 0)
+   return ret;
+
+   next.fid = ret & GLOBAL_VTU_FID_MASK;
+
+   ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+

[PATCH net-next 4/7] net: dsa: mv88e6xxx: add VLAN support to FDB dump

2015-08-13 Thread Vivien Didelot

Add a helper function to read the next valid VLAN entry for a given
port. It is used in the VID to FID conversion function to retrieve the
forwarding database assigned to a given VLAN port.

Finally update the FDB getnext operation to iterate on the next valid
port VLAN when the end of the current database is reached.
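
The per-port lookup described above can be sketched without hardware access.
This is a simplified model, not the driver code: the hardware Get Next
operation is replaced by a scan over a table sorted by ascending VID, and all
names, the two-port table size and the membership encoding are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PORTS 2u

/* Hypothetical per-port membership values for one VLAN entry. */
enum sketch_member {
    SKETCH_UNMODIFIED,
    SKETCH_UNTAGGED,
    SKETCH_TAGGED,
    SKETCH_NON_MEMBER
};

struct sketch_vlan {
    uint16_t vid;
    uint8_t  data[SKETCH_PORTS];
};

/* Return the first entry with a VID greater than vid in which port is a
 * tagged or untagged member, or NULL when no such entry exists. The table
 * must be sorted by ascending VID. */
const struct sketch_vlan *sketch_port_vlan_next(const struct sketch_vlan *table,
                                                size_t n, unsigned port,
                                                uint16_t vid)
{
    assert(port < SKETCH_PORTS);

    for (size_t i = 0; i < n; ++i) {
        if (table[i].vid <= vid)
            continue;
        if (table[i].data[port] == SKETCH_TAGGED ||
            table[i].data[port] == SKETCH_UNTAGGED)
            return &table[i];
    }
    return NULL;
}
```

A VID to FID style lookup can then ask for the entry after vid - 1 and check
that the returned VID equals vid, which is the pattern the patch uses.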

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6xxx.c | 42 --
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ecdd9da..6c86bad 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1307,6 +1307,29 @@ static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, 
u16 vid,
return 0;
 }
 
+static int _mv88e6xxx_port_vtu_getnext(struct dsa_switch *ds, int port, u16 
vid,
+  struct mv88e6xxx_vtu_stu_entry *entry)
+{
+   int err;
+
+   do {
+   if (vid == 4095)
+   return -ENOENT;
+
+   err = _mv88e6xxx_vtu_getnext(ds, vid, entry);
+   if (err)
+   return err;
+
+   if (!entry->valid)
+   return -ENOENT;
+
+   vid = entry->vid;
+   } while (entry->data[port] != GLOBAL_VTU_DATA_MEMBER_TAG_TAGGED &&
+entry->data[port] != GLOBAL_VTU_DATA_MEMBER_TAG_UNTAGGED);
+
+   return 0;
+}
+
 int mv88e6xxx_vlan_getnext(struct dsa_switch *ds, u16 *vid,
   unsigned long *ports, unsigned long *untagged)
 {
@@ -1421,10 +1444,19 @@ static int _mv88e6xxx_atu_load(struct dsa_switch *ds,
 static int _mv88e6xxx_port_vid_to_fid(struct dsa_switch *ds, int port, u16 vid)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   struct mv88e6xxx_vtu_stu_entry vlan;
+   int err;
 
if (vid == 0)
return ps->fid[port];
 
+   err = _mv88e6xxx_port_vtu_getnext(ds, port, vid - 1, &vlan);
+   if (err)
+   return err;
+
+   if (vlan.vid == vid)
+   return vlan.fid;
+
return -ENOENT;
 }
 
@@ -1548,8 +1580,14 @@ int mv88e6xxx_port_fdb_getnext(struct dsa_switch *ds, 
int port,
 
do {
if (is_broadcast_ether_addr(addr)) {
-   ret = -ENOENT;
-   goto unlock;
+   struct mv88e6xxx_vtu_stu_entry vtu;
+
+   ret = _mv88e6xxx_port_vtu_getnext(ds, port, *vid, &vtu);
+   if (ret < 0)
+   goto unlock;
+
+   *vid = vtu.vid;
+   fid = vtu.fid;
}
 
ret = _mv88e6xxx_atu_getnext(ds, fid, addr, &next);
-- 
2.5.0


Re: [PATCH] IGMP: Inhibit reports for local multicast groups

2015-08-13 Thread Andrew Lunn

On Thu, Aug 13, 2015 at 04:52:32PM +, Philip Downey wrote:
 Hi Andrew
 IGMP snooping is designed to prevent hosts on a local network from receiving 
 traffic for a multicast group they have not explicitly joined.   Link-Local 
 multicast traffic should not have an IGMP client since it is reserved for 
 routing protocols.  One would expect that IGMP snooping needs to ignore local 
 multicast traffic in the reserved range intended for routers since there 
 should be no IGMP client to make join requests.

The point of this patch is that Linux is sending out group membership
reports for these addresses; it is acting as a client. What happens with a
switch which is applying IGMP snooping to link-local multicast groups?
You turn on this feature, and you no longer get your routing protocol
messages.

I had a quick look at RFC 3376. The only mention i spotted for not
sending IGMP messages is:

   The all-systems multicast address, 224.0.0.1, is handled as a special
   case.  On all systems -- that is all hosts and routers, including
   multicast routers -- reception of packets destined to the all-systems
   multicast address, from all sources, is permanently enabled on all
   interfaces on which multicast reception is supported.  No IGMP
   messages are ever sent regarding the all-systems multicast address.

IGMP v2 has something similar:

   The all-systems group (address 224.0.0.1) is handled as a special
   case.  The host starts in Idle Member state for that group on every
   interface, never transitions to another state, and never sends a
   report for that group.

But i did not find anything which says all other link-local addresses
don't need member reports. Did i miss something?

  Andrew

[PATCH 2/2] net: sch_generic: react upon IFF_NO_QUEUE flag

2015-08-13 Thread Phil Sutter

Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero.

Signed-off-by: Phil Sutter p...@nwl.cc
---
 net/sched/sch_generic.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6efca30..942fea8 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -735,7 +735,7 @@ static void attach_one_default_qdisc(struct net_device *dev,
 {
struct Qdisc *qdisc = noqueue_qdisc;
 
-   if (dev->tx_queue_len) {
+   if (dev->tx_queue_len && !(dev->priv_flags & IFF_NO_QUEUE)) {
qdisc = qdisc_create_dflt(dev_queue,
  default_qdisc_ops, TC_H_ROOT);
if (!qdisc) {
@@ -755,7 +755,9 @@ static void attach_default_qdiscs(struct net_device *dev)
 
txq = netdev_get_tx_queue(dev, 0);
 
-   if (!netif_is_multiqueue(dev) || dev->tx_queue_len == 0) {
+   if (!netif_is_multiqueue(dev) ||
+   dev->tx_queue_len == 0 ||
+   dev->priv_flags & IFF_NO_QUEUE) {
netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
dev->qdisc = txq->qdisc_sleeping;
atomic_inc(&dev->qdisc->refcnt);
-- 
2.1.2


[PATCH 0/2] net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len

2015-08-13 Thread Phil Sutter

This series adds a new private net_device flag indicating that a device may
(and probably should) be used without a queueing discipline attached to it.
This is already common practice for many virtual device types like e.g.
loopback, VLAN (802.1Q) or bridges (802.1D). The reason for this is that these
devices lack an underlying layer which could impose back pressure and thereby
make a TX queue necessary to avoid slowing down senders.

Up to now, drivers aware that the above applies to them set
dev->tx_queue_len to zero to indicate that no qdisc should be attached to the
interface they drive, and the kernel reacts to this by assigning the noqueue
qdisc instead of the default pfifo_fast. This implicit agreement, though, leads
to an inconvenient situation once a user tries to attach a real qdisc to these
devices, as the formerly special tx_queue_len value becomes a regular one,
limiting the queue to zero packets and thus preventing any TX from happening. To
overcome this, practically all qdisc implementations intercept and sanitize the
offending value.

With this series applied, drivers may signal the lack of need for a qdisc
without having to tamper with tx_queue_len, making fallbacks in qdiscs and
caveats in userspace unnecessary.
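
The resulting decision can be summarised in a small userspace sketch. It is
not the sch_generic code: the struct is a hypothetical stand-in for struct
net_device, and the flag is assumed to occupy bit 25 of priv_flags as declared
in patch 1/2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed bit position of the new private flag. */
#define SKETCH_IFF_NO_QUEUE (1u << 25)

/* Hypothetical stand-in carrying only the two fields the decision needs. */
struct sketch_netdev {
    unsigned long tx_queue_len;
    unsigned int  priv_flags;
};

/* True when the default qdisc should be created for the device; false when
 * the device should keep the noqueue qdisc. */
bool sketch_wants_default_qdisc(const struct sketch_netdev *dev)
{
    assert(dev != NULL);
    return dev->tx_queue_len != 0 &&
           !(dev->priv_flags & SKETCH_IFF_NO_QUEUE);
}
```

A driver that sets the flag can keep a non-zero tx_queue_len, so a qdisc
attached later by the user still sees a usable queue limit.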

Upon upstream acceptance, this series will be followed up by a set of patches
converting device drivers, adding a warning so out-of-tree driver authors
become aware of this change, and dropping all special handling of tx_queue_len in
net/sched/.

Phil Sutter (2):
  net: declare new net_device priv_flag IFF_NO_QUEUE
  net: sch_generic: react upon IFF_NO_QUEUE flag

 include/linux/netdevice.h | 3 +++
 net/sched/sch_generic.c   | 6 --
 2 files changed, 7 insertions(+), 2 deletions(-)

-- 
2.1.2


[PATCH 1/2] net: declare new net_device priv_flag IFF_NO_QUEUE

2015-08-13 Thread Phil Sutter

This private net_device flag can be set by drivers to inform that a
device runs fine without a qdisc attached. This was formerly done by
setting tx_queue_len to zero.

Signed-off-by: Phil Sutter p...@nwl.cc
---
 include/linux/netdevice.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 607b5f4..7ed6fb0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1262,6 +1262,7 @@ struct net_device_ops {
  * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
  * change when it's running
  * @IFF_MACVLAN: Macvlan device
+ * @IFF_NO_QUEUE: device can run without qdisc attached
  */
 enum netdev_priv_flags {
IFF_802_1Q_VLAN = 1<<0,
@@ -1289,6 +1290,7 @@ enum netdev_priv_flags {
IFF_XMIT_DST_RELEASE_PERM   = 1<<22,
IFF_IPVLAN_MASTER   = 1<<23,
IFF_IPVLAN_SLAVE= 1<<24,
+   IFF_NO_QUEUE= 1<<25,
 };
 
 #define IFF_802_1Q_VLAN    IFF_802_1Q_VLAN
@@ -1316,6 +1318,7 @@ enum netdev_priv_flags {
 #define IFF_XMIT_DST_RELEASE_PERM  IFF_XMIT_DST_RELEASE_PERM
 #define IFF_IPVLAN_MASTER  IFF_IPVLAN_MASTER
 #define IFF_IPVLAN_SLAVE   IFF_IPVLAN_SLAVE
+#define IFF_NO_QUEUE   IFF_NO_QUEUE
 
 /**
  * struct net_device - The DEVICE structure.
-- 
2.1.2


[PATCH net-next 7/7] net: dsa: mv88e6xxx: use port 802.1Q mode Secure

2015-08-13 Thread Vivien Didelot

This commit changes the 802.1Q mode of each port from Disabled to
Secure. This enables the VLAN support, by checking the VTU entries on
ingress.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6xxx.c | 14 +++---
 drivers/net/dsa/mv88e6xxx.h |  5 +
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ca867e4..332f2c8 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2005,13 +2005,11 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, 
int port)
goto abort;
}
 
-   /* Port Control 2: don't force a good FCS, set the maximum
-* frame size to 10240 bytes, don't let the switch add or
-* strip 802.1q tags, don't discard tagged or untagged frames
-* on this port, do a destination address lookup on all
-* received packets as usual, disable ARP mirroring and don't
-* send a copy of all transmitted/received frames on this port
-* to the CPU.
+   /* Port Control 2: don't force a good FCS, set the maximum frame size to
+* 10240 bytes, enable secure 802.1q tags, don't discard tagged or
+* untagged frames on this port, do a destination address lookup on all
+* received packets as usual, disable ARP mirroring and don't send a
+* copy of all transmitted/received frames on this port to the CPU.
 */
reg = 0;
if (mv88e6xxx_6352_family(ds) || mv88e6xxx_6351_family(ds) ||
@@ -2033,6 +2031,8 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, 
int port)
reg |= PORT_CONTROL_2_FORWARD_UNKNOWN;
}
 
+   reg |= PORT_CONTROL_2_8021Q_SECURE;
+
if (reg) {
ret = _mv88e6xxx_reg_write(ds, REG_PORT(port),
   PORT_CONTROL_2, reg);
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index ca3268f..72ca887 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -140,6 +140,11 @@
 #define PORT_CONTROL_2_JUMBO_1522  (0x00 << 12)
 #define PORT_CONTROL_2_JUMBO_2048  (0x01 << 12)
 #define PORT_CONTROL_2_JUMBO_10240 (0x02 << 12)
+#define PORT_CONTROL_2_8021Q_MASK  (0x03 << 10)
+#define PORT_CONTROL_2_8021Q_DISABLED  (0x00 << 10)
+#define PORT_CONTROL_2_8021Q_FALLBACK  (0x01 << 10)
+#define PORT_CONTROL_2_8021Q_CHECK (0x02 << 10)
+#define PORT_CONTROL_2_8021Q_SECURE    (0x03 << 10)
 #define PORT_CONTROL_2_DISCARD_TAGGED  BIT(9)
 #define PORT_CONTROL_2_DISCARD_UNTAGGED    BIT(8)
 #define PORT_CONTROL_2_MAP_DA  BIT(7)
-- 
2.5.0


[PATCH net-next 6/7] net: dsa: mv88e6xxx: add VLAN Load support

2015-08-13 Thread Vivien Didelot

Implement port_pvid_set and port_vlan_add to add new entries in the VLAN
hardware table, and join ports to them.

The patch also implements the STU Get Next and Load Purge operations,
since it is required to have a valid STU entry for at least all VLANs.

Each VLAN has its own forwarding database, with FID num_ports+1 to 4095.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6352.c |   2 +
 drivers/net/dsa/mv88e6xxx.c | 169 
 drivers/net/dsa/mv88e6xxx.h |   9 +++
 3 files changed, 180 insertions(+)

diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index cec38bb..14b7177 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -344,6 +344,8 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
.port_pvid_get  = mv88e6xxx_port_pvid_get,
+   .port_pvid_set  = mv88e6xxx_port_pvid_set,
+   .port_vlan_add  = mv88e6xxx_port_vlan_add,
.port_vlan_del  = mv88e6xxx_port_vlan_del,
.vlan_getnext   = mv88e6xxx_vlan_getnext,
.port_fdb_add   = mv88e6xxx_port_fdb_add,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 8423924..ca867e4 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1198,6 +1198,12 @@ int mv88e6xxx_port_pvid_get(struct dsa_switch *ds, int port, u16 *pvid)
 	return 0;
 }
 
+int mv88e6xxx_port_pvid_set(struct dsa_switch *ds, int port, u16 pvid)
+{
+	return mv88e6xxx_reg_write(ds, REG_PORT(port), PORT_DEFAULT_VLAN,
+				   pvid & PORT_DEFAULT_VLAN_MASK);
+}
+
 static int _mv88e6xxx_vtu_wait(struct dsa_switch *ds)
 {
return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_VTU_OP,
@@ -1374,6 +1380,169 @@ loadpurge:
return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_LOAD_PURGE);
 }
 
+static int _mv88e6xxx_stu_getnext(struct dsa_switch *ds, u8 sid,
+				  struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	struct mv88e6xxx_vtu_stu_entry next = { 0 };
+	int ret;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_SID,
+				   sid & GLOBAL_VTU_SID_MASK);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_STU_GET_NEXT);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_SID);
+	if (ret < 0)
+		return ret;
+
+	next.sid = ret & GLOBAL_VTU_SID_MASK;
+
+	ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_VID);
+	if (ret < 0)
+		return ret;
+
+	next.valid = !!(ret & GLOBAL_VTU_VID_VALID);
+
+	if (next.valid) {
+		ret = _mv88e6xxx_vtu_stu_data_read(ds, &next, 2);
+		if (ret < 0)
+			return ret;
+	}
+
+	*entry = next;
+	return 0;
+}
+
+static int _mv88e6xxx_stu_loadpurge(struct dsa_switch *ds,
+				    struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	u16 reg = 0;
+	int ret;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	if (!entry->valid)
+		goto loadpurge;
+
+	/* Write port states */
+	ret = _mv88e6xxx_vtu_stu_data_write(ds, entry, 2);
+	if (ret < 0)
+		return ret;
+
+	reg = GLOBAL_VTU_VID_VALID;
+loadpurge:
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID, reg);
+	if (ret < 0)
+		return ret;
+
+	reg = entry->sid & GLOBAL_VTU_SID_MASK;
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_SID, reg);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_STU_LOAD_PURGE);
+}
+
+static int _mv88e6xxx_vlan_init(struct dsa_switch *ds, u16 vid,
+				struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	struct mv88e6xxx_vtu_stu_entry vlan = {
+		.valid = true,
+		.vid = vid,
+	};
+	int i;
+
+	/* exclude all ports except the CPU */
+	for (i = 0; i < ps->num_ports; ++i)
+		vlan.data[i] = dsa_is_cpu_port(ds, i) ?
+			GLOBAL_VTU_DATA_MEMBER_TAG_TAGGED :
+			GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER;
+
+	if (mv88e6xxx_6097_family(ds) || mv88e6xxx_6165_family(ds) ||
+	    mv88e6xxx_6351_family(ds) || mv88e6xxx_6352_family(ds)) {
+		struct mv88e6xxx_vtu_stu_entry vstp;
+		int err;
+
+   /* Adding a VTU entry requires a valid STU entry. As VSTP is not
+*

[PATCH net-next 1/7] net: dsa: add support for switchdev VLAN objects

2015-08-13 Thread Vivien Didelot

Add new functions in DSA drivers to access hardware VLAN entries through
SWITCHDEV_OBJ_PORT_VLAN objects:

 - port_pvid_get() and vlan_getnext() to dump a VLAN
 - port_vlan_del() to exclude a port from a VLAN
 - port_pvid_set() and port_vlan_add() to join a port to a VLAN

The DSA infrastructure will ensure that each VLAN of the given range
does not already belong to another bridge. If it does, it will fall back
to software VLAN and won't program the hardware.
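
The range check described above can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: struct
hw_vlan, vlan_range_usable() and the integer bridge identifiers are
hypothetical stand-ins for illustration, not the DSA or switchdev API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one hardware VLAN entry: its VID and the bridge
 * that owns its member ports (0 means the entry has no members).
 */
struct hw_vlan {
	uint16_t vid;
	int bridge_id;
};

/* Return 1 if every programmed VLAN in [vid_begin, vid_end] either has no
 * members or already belongs to bridge_id. Return 0 if another bridge uses
 * one of them, in which case the caller would fall back to software VLANs.
 */
static int vlan_range_usable(const struct hw_vlan *table, size_t n,
			     int bridge_id, uint16_t vid_begin,
			     uint16_t vid_end)
{
	size_t i;

	assert(table != NULL || n == 0);
	assert(vid_begin <= vid_end);

	for (i = 0; i < n; i++) {
		if (table[i].vid < vid_begin || table[i].vid > vid_end)
			continue;	/* outside the requested range */
		if (table[i].bridge_id != 0 && table[i].bridge_id != bridge_id)
			return 0;	/* in use by a different bridge */
	}
	return 1;
}
```

With a table holding VLAN 10 on bridge 1 and VLAN 20 on bridge 2, a
request from bridge 1 for range 5-15 is usable while range 10-20 is not.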

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 include/net/dsa.h |  11 
 net/dsa/slave.c   | 158 ++
 2 files changed, 169 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 6356f43..bd9b765 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -298,6 +298,17 @@ struct dsa_switch_driver {
   u8 state);
 
/*
+* VLAN support
+*/
+   int (*port_pvid_get)(struct dsa_switch *ds, int port, u16 *pvid);
+   int (*port_pvid_set)(struct dsa_switch *ds, int port, u16 pvid);
+   int (*port_vlan_add)(struct dsa_switch *ds, int port, u16 vid,
+bool untagged);
+   int (*port_vlan_del)(struct dsa_switch *ds, int port, u16 vid);
+   int (*vlan_getnext)(struct dsa_switch *ds, u16 *vid,
+   unsigned long *ports, unsigned long *untagged);
+
+   /*
 * Forwarding database
 */
int (*port_fdb_add)(struct dsa_switch *ds, int port,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 2767584..880ead7 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -200,6 +200,152 @@ out:
return 0;
 }
 
+static int dsa_bridge_check_vlan_range(struct dsa_switch *ds,
+				       const struct net_device *bridge,
+				       u16 vid_begin, u16 vid_end)
+{
+	struct dsa_slave_priv *p;
+	struct net_device *dev, *vlan_br;
+	DECLARE_BITMAP(members, DSA_MAX_PORTS);
+	DECLARE_BITMAP(untagged, DSA_MAX_PORTS);
+	u16 vid;
+	int member, err;
+
+	if (!ds->drv->vlan_getnext || !vid_begin)
+		return -EOPNOTSUPP;
+
+	vid = vid_begin - 1;
+
+	do {
+		err = ds->drv->vlan_getnext(ds, &vid, members, untagged);
+		if (err)
+			break;
+
+		if (vid > vid_end)
+			break;
+
+		member = find_first_bit(members, DSA_MAX_PORTS);
+		if (member == DSA_MAX_PORTS)
+			continue;
+
+		dev = ds->ports[member];
+		p = netdev_priv(dev);
+		vlan_br = p->bridge_dev;
+		if (vlan_br == bridge)
+			continue;
+
+		netdev_dbg(vlan_br, "hardware VLAN %d already in use\n", vid);
+		return -EOPNOTSUPP;
+	} while (vid < vid_end);
+
+	return err == -ENOENT ? 0 : err;
+}
+
+static int dsa_slave_port_vlan_add(struct net_device *dev,
+				   struct switchdev_obj *obj)
+{
+	struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+	u16 vid;
+	int err;
+
+	switch (obj->trans) {
+	case SWITCHDEV_TRANS_PREPARE:
+		if (!ds->drv->port_vlan_add || !ds->drv->port_pvid_set)
+			return -EOPNOTSUPP;
+
+		/* If the requested port doesn't belong to the same bridge as
+		 * the VLAN members, fallback to software VLAN (hopefully).
+		 */
+		err = dsa_bridge_check_vlan_range(ds, p->bridge_dev,
+						  vlan->vid_begin,
+						  vlan->vid_end);
+		if (err)
+			return err;
+		break;
+	case SWITCHDEV_TRANS_COMMIT:
+		for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
+			err = ds->drv->port_vlan_add(ds, p->port, vid,
+						     vlan->flags &
+						     BRIDGE_VLAN_INFO_UNTAGGED);
+			if (!err && vlan->flags & BRIDGE_VLAN_INFO_PVID)
+				err = ds->drv->port_pvid_set(ds, p->port, vid);
+			if (err)
+				return err;
+		}
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int dsa_slave_port_vlan_del(struct net_device *dev,
+				   struct switchdev_obj *obj)
+{
+	struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+	u16 vid;
+	int err;
+
+	if (!ds->drv->port_vlan_del)
+

[PATCH] net: allow sleeping when modifying store_rps_map

2015-08-13 Thread Sasha Levin

Commit 10e4ea751 ("net: Fix race condition in store_rps_map") has moved the
manipulation of the rps_needed jump label under a spinlock. Since changing
the state of a jump label may sleep, this is incorrect and causes warnings
during runtime.

Make rps_map_lock a mutex to allow sleeping under it.

Fixes: 10e4ea751 ("net: Fix race condition in store_rps_map")
Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 net/core/net-sysfs.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 39ec694..b279077 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -689,7 +689,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
struct rps_map *old_map, *map;
cpumask_var_t mask;
int err, cpu, i;
-   static DEFINE_SPINLOCK(rps_map_lock);
+   static DEFINE_MUTEX(rps_map_mutex);
 
if (!capable(CAP_NET_ADMIN))
return -EPERM;
@@ -722,9 +722,9 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
map = NULL;
}
 
-	spin_lock(&rps_map_lock);
+	mutex_lock(&rps_map_mutex);
 	old_map = rcu_dereference_protected(queue->rps_map,
-					    lockdep_is_held(&rps_map_lock));
+					    mutex_is_locked(&rps_map_mutex));
 	rcu_assign_pointer(queue->rps_map, map);
 
 	if (map)
@@ -732,7 +732,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	if (old_map)
 		static_key_slow_dec(&rps_needed);
 
-	spin_unlock(&rps_map_lock);
+	mutex_unlock(&rps_map_mutex);
 
if (old_map)
kfree_rcu(old_map, rcu);
-- 
1.7.10.4


[PATCH 6/6] ethernet/s2io: advertise what hw supports in vlan_features

2015-08-13 Thread Jarod Wilson

For some reason, the s2io driver has never filled in vlan_features. If
that's fully intentional, then this patch should be dropped. If it's not,
then this patch is necessary to maintain some functionality of slave s2io
devices in a bonding group.

Without this, the presence of an s2io device in a bond will not trigger
LRO support to be enabled at the bond level, even while it is enabled on
the slave itself.

This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.
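
One way to picture the "one for all" behaviour this series relies on is
the simplified user-space model below. It is a sketch under assumptions:
the F_* bit values and aggregate_vlan_features() are hypothetical, and
the kernel's real feature computation handles more cases (for example
"all for all" features and checksum offloads) than this shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the real NETIF_F_* values differ. */
#define F_SG	(1u << 0)
#define F_TSO	(1u << 1)
#define F_LRO	(1u << 2)

/* Assumed "one for all" set: a feature in this set is enabled on the
 * aggregate device as soon as one lower device advertises it.
 */
#define F_ONE_FOR_ALL	(F_SG | F_TSO | F_LRO)

/* Combine the vlan_features of all lower devices into the value the
 * aggregate (bond) device would advertise in this simplified model.
 */
static uint32_t aggregate_vlan_features(const uint32_t *slave_vlan_features,
					size_t n)
{
	uint32_t all = 0;
	size_t i;

	assert(slave_vlan_features != NULL || n == 0);

	for (i = 0; i < n; i++)
		all |= slave_vlan_features[i] & F_ONE_FOR_ALL;

	return all;
}
```

In this model the aggregate only gains F_LRO when at least one lower
device lists it in its own vlan_features, which is the gap the driver
patches in this series close.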

CC: Jon Mason jdma...@kudzu.us
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/neterion/s2io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index 2d1b942..8bbf540 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -7922,6 +7922,7 @@ s2io_init_nic(struct pci_dev *pdev, const struct pci_device_id *pre)
 		NETIF_F_RXCSUM | NETIF_F_LRO;
 	dev->features |= dev->hw_features |
 		NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
+	dev->vlan_features |= dev->hw_features;
 	if (sp->device_type & XFRAME_II_DEVICE) {
 		dev->hw_features |= NETIF_F_UFO;
 		if (ufo)
-- 
1.8.3.1


Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging

2015-08-13 Thread Joe Perches

On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
> * Due to HW bug, LAN8700 sometimes does not detect presence of energy in the
>   Ethernet cable in Energy Detect Power-Down mode (e.g while EDPWRDOWN bit is
>   set, the ENERGYON bit does not asserted sometimes). This is a common bug of
>   LAN87xx family of PHY chips.
> * The lan87xx_read_status() was improved to acquire ENERGYON bit. Its previous
>   algorythm still not reliable on 100 % and sometimes skip cable plugging.
[]
> diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
[]
> @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
>  static int lan87xx_read_status(struct phy_device *phydev)
>  {
>  	int err = genphy_read_status(phydev);
> +	int rc;

Is there a reason to move this declaration?

> +	int i;
>  
>  	if (!phydev->link) {
>  		/* Disable EDPD to wake up PHY */
> -		int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
> +		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
>  		if (rc < 0)
>  			return rc;
  



[PATCH 2/6] ethernet/bnx2x: advertise LRO support in vlan_features

2015-08-13 Thread Jarod Wilson

Without this, the presence of a bnx2x device in a bond will not trigger
LRO support to be enabled at the bond level, even while it is enabled on
the slave itself.

This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Ariel Elior ariel.el...@qlogic.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index ad73a60..41dc066 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13083,7 +13083,8 @@ static int bnx2x_init_dev(struct bnx2x *bp, struct pci_dev *pdev,
 	}
 
 	dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_HIGHDMA;
+		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_HIGHDMA |
+		NETIF_F_LRO;
 
 	/* VF with OLD Hypervisor or old PF do not support filtering */
 	if (IS_PF(bp)) {
-- 
1.8.3.1


[PATCH 5/6] ethernet/qlcnic: advertise LRO support in vlan_features

2015-08-13 Thread Jarod Wilson

Without this, the presence of a qlcnic device in a bond will not trigger
LRO support to be enabled at the bond level, even while it is enabled on
the slave itself.

This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Shahed Shaikh shahed.sha...@qlogic.com
CC: dept-gelinuxnic...@qlogic.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 8b08b20..5a798ab 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -2314,8 +2314,10 @@ qlcnic_setup_netdev(struct qlcnic_adapter *adapter, struct net_device *netdev,
 	if (qlcnic_sriov_vf_check(adapter))
 		netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
 
-	if (adapter->ahw->capabilities & QLCNIC_FW_CAPABILITY_HW_LRO)
+	if (adapter->ahw->capabilities & QLCNIC_FW_CAPABILITY_HW_LRO) {
 		netdev->features |= NETIF_F_LRO;
+		netdev->vlan_features |= NETIF_F_LRO;
+	}
 
 	if (qlcnic_encap_tx_offload(adapter)) {
 		netdev->features |= NETIF_F_GSO_UDP_TUNNEL;
-- 
1.8.3.1


[PATCH net-next 2/7] net: dsa: mv88e6xxx: flush VTU and STU entries

2015-08-13 Thread Vivien Didelot

Implement the VTU Flush operation (which also flushes the STU), so that
warm boots won't preserve old entries.
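
The command pattern used here (wait for idle, write the operation with
the busy bit set, then poll until the busy bit clears) can be tried out
against a simulated register. The sketch below is an illustration only:
struct fake_switch, fake_write_op(), fake_read_op(), vtu_wait() and
vtu_flush() are hypothetical names, and the "hardware" simply clears its
busy bit after two polls. Only the BUSY and FLUSH_ALL bit layouts are
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define VTU_OP_BUSY		(1u << 15)
#define VTU_OP_FLUSH_ALL	((0x01u << 12) | VTU_OP_BUSY)

/* Hypothetical simulated switch: one operation register plus a count of
 * VTU entries, and a counter for how many reads still report busy.
 */
struct fake_switch {
	uint16_t vtu_op;
	int vtu_entries;
	int busy_polls;
};

static void fake_write_op(struct fake_switch *sw, uint16_t op)
{
	sw->vtu_op = op;
	sw->busy_polls = 2;	/* pretend the operation takes two polls */
}

static uint16_t fake_read_op(struct fake_switch *sw)
{
	if (sw->vtu_op & VTU_OP_BUSY) {
		if (sw->busy_polls > 0) {
			sw->busy_polls--;
		} else {
			/* operation done: apply it and clear the busy bit */
			if (sw->vtu_op == VTU_OP_FLUSH_ALL)
				sw->vtu_entries = 0;
			sw->vtu_op &= (uint16_t)~VTU_OP_BUSY;
		}
	}
	return sw->vtu_op;
}

/* Poll until the busy bit clears; give up after max_polls reads. */
static int vtu_wait(struct fake_switch *sw, int max_polls)
{
	int i;

	assert(max_polls > 0);
	for (i = 0; i < max_polls; i++)
		if (!(fake_read_op(sw) & VTU_OP_BUSY))
			return 0;
	return -1;
}

static int vtu_flush(struct fake_switch *sw)
{
	int ret = vtu_wait(sw, 16);	/* no earlier operation pending */

	if (ret < 0)
		return ret;

	fake_write_op(sw, VTU_OP_FLUSH_ALL);
	return vtu_wait(sw, 16);	/* wait for the flush to complete */
}
```

Waiting before issuing the command as well as after it mirrors the
structure of _mv88e6xxx_vtu_stu_flush() in the diff below.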

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6xxx.c | 34 ++
 drivers/net/dsa/mv88e6xxx.h |  2 ++
 2 files changed, 36 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 9978245..175353a 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1182,6 +1182,34 @@ int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int port, u8 state)
 	return 0;
 }
 
+static int _mv88e6xxx_vtu_wait(struct dsa_switch *ds)
+{
+	return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_VTU_OP,
+			       GLOBAL_VTU_OP_BUSY);
+}
+
+static int _mv88e6xxx_vtu_cmd(struct dsa_switch *ds, u16 op)
+{
+	int ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_OP, op);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_wait(ds);
+}
+
+static int _mv88e6xxx_vtu_stu_flush(struct dsa_switch *ds)
+{
+	int ret;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_FLUSH_ALL);
+}
+
 static int _mv88e6xxx_atu_mac_write(struct dsa_switch *ds,
const unsigned char *addr)
 {
@@ -2071,6 +2099,12 @@ int mv88e6xxx_setup_global(struct dsa_switch *ds)
 	/* Wait for the flush to complete. */
 	mutex_lock(&ps->smi_mutex);
 	ret = _mv88e6xxx_stats_wait(ds);
+	if (ret < 0)
+		goto unlock;
+
+	/* Clear all the VTU and STU entries */
+	ret = _mv88e6xxx_vtu_stu_flush(ds);
+unlock:
 	mutex_unlock(&ps->smi_mutex);
 
 	return ret;
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 10fae32..76139ea 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -188,6 +188,8 @@
 #define GLOBAL_CONTROL_TCAM_EN		BIT(1)
 #define GLOBAL_CONTROL_EEPROM_DONE_EN	BIT(0)
 #define GLOBAL_VTU_OP		0x05
+#define GLOBAL_VTU_OP_BUSY	BIT(15)
+#define GLOBAL_VTU_OP_FLUSH_ALL		((0x01 << 12) | GLOBAL_VTU_OP_BUSY)
 #define GLOBAL_VTU_VID		0x06
 #define GLOBAL_VTU_DATA_0_3	0x07
 #define GLOBAL_VTU_DATA_4_7	0x08
-- 
2.5.0


Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging

2015-08-13 Thread Igor Plyatov


Dear Joe,


> On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
>> * Due to HW bug, LAN8700 sometimes does not detect presence of energy in the
>>   Ethernet cable in Energy Detect Power-Down mode (e.g while EDPWRDOWN bit is
>>   set, the ENERGYON bit does not asserted sometimes). This is a common bug of
>>   LAN87xx family of PHY chips.
>> * The lan87xx_read_status() was improved to acquire ENERGYON bit. Its previous
>>   algorythm still not reliable on 100 % and sometimes skip cable plugging.
> []
>> diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
> []
>> @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
>>  static int lan87xx_read_status(struct phy_device *phydev)
>>  {
>>  	int err = genphy_read_status(phydev);
>> +	int rc;
>
> Is there a reason to move this declaration?

There is no strict requirement to move the declaration of rc.
It was moved only to keep all declarations easily visible.

>> +	int i;
>>  
>>  	if (!phydev->link) {
>>  		/* Disable EDPD to wake up PHY */
>> -		int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
>> +		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
>>  		if (rc < 0)
>>  			return rc;
  




Best wishes
--
Igor Plyatov

Re: [PATCH 0/2] net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len

2015-08-13 Thread Stephen Hemminger

On Thu, 13 Aug 2015 19:01:05 +0200
Phil Sutter p...@nwl.cc wrote:

> Up to now, drivers being aware of the above applying to them set
> dev->tx_queue_len to zero to indicate no qdisc should be attached to the
> interface they drive and the kernel reacts upon this by assigning the noop
> qdisc instead of the default pfifo_fast. This implicit agreement though leads
> to an inconvenient situation once a user tries to attach a real qdisc to these
> devices, as the formerly special tx_queue_len value becomes a regular one,

So this is a workaround for user ignorance by introducing kernel API complexity.
Before user sets qdisc, why don't they set tx queue length?

[PATCH 4/6] ethernet/netxen: advertise LRO support in vlan_features

2015-08-13 Thread Jarod Wilson

Without this, the presence of a netxen device in a bond will not trigger
LRO support to be enabled at the bond level, even while it is enabled on
the slave itself.

This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Manish Chopra manish.cho...@qlogic.com
CC: Sony Chacko sony.cha...@qlogic.com
CC: Rajesh Borundia rajesh.borun...@qlogic.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index 6409a06..0fd5ada54 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -1387,8 +1387,10 @@ netxen_setup_netdev(struct netxen_adapter *adapter,
 	if (adapter->capabilities & NX_FW_CAPABILITY_FVLANTX)
 		netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX;
 
-	if (adapter->capabilities & NX_FW_CAPABILITY_HW_LRO)
+	if (adapter->capabilities & NX_FW_CAPABILITY_HW_LRO) {
 		netdev->hw_features |= NETIF_F_LRO;
+		netdev->vlan_features |= NETIF_F_LRO;
+	}
 
 	netdev->features |= netdev->hw_features;
 
-- 
1.8.3.1


[PATCH 3/6] ethernet/ixgbe: advertise LRO support in vlan_features

2015-08-13 Thread Jarod Wilson

Without this, the presence of an ixgbe device in a bond will not trigger
LRO support to be enabled at the bond level, even while it is enabled on
the slave itself.

This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Jeff Kirsher jeffrey.t.kirs...@intel.com
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3e6a931..0a6e4e1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8659,8 +8659,10 @@ skip_sriov:
 
 	if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE)
 		netdev->hw_features |= NETIF_F_LRO;
-	if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
+	if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) {
 		netdev->features |= NETIF_F_LRO;
+		netdev->vlan_features |= NETIF_F_LRO;
+	}
 
 	/* make sure the EEPROM is good */
 	if (hw->eeprom.ops.validate_checksum(hw, NULL) < 0) {
-- 
1.8.3.1


Re: [PATCH] IGMP: Inhibit reports for local multicast groups

2015-08-13 Thread Thadeu Lima de Souza Cascardo

On Thu, Aug 13, 2015 at 07:01:37PM +0200, Andrew Lunn wrote:
> On Thu, Aug 13, 2015 at 04:52:32PM +, Philip Downey wrote:
>> Hi Andrew
>> IGMP snooping is designed to prevent hosts on a local network from
>> receiving traffic for a multicast group they have not explicitly joined.
>> Link-Local multicast traffic should not have an IGMP client since it is
>> reserved for routing protocols.  One would expect that IGMP snooping needs
>> to ignore local multicast traffic in the reserved range intended for
>> routers since there should be no IGMP client to make join requests.
>
> The point of this patch is that Linux is sending out group membership
> for these addresses, it is acting as a client. What happens with a
> switch which is applying IGMP snooping to link-local multicast groups?
> You turn on this feature, and you no longer get your routing protocol
> messages.
>
> I had a quick look at RFC 3376. The only mention i spotted for not
> sending IGMP messages is:
>
>    The all-systems multicast address, 224.0.0.1, is handled as a special
>    case.  On all systems -- that is all hosts and routers, including
>    multicast routers -- reception of packets destined to the all-systems
>    multicast address, from all sources, is permanently enabled on all
>    interfaces on which multicast reception is supported.  No IGMP
>    messages are ever sent regarding the all-systems multicast address.
>
> IGMP v2 has something similar:
>
>    The all-systems group (address 224.0.0.1) is handled as a special
>    case.  The host starts in Idle Member state for that group on every
>    interface, never transitions to another state, and never sends a
>    report for that group.
>
> But i did not find anything which says all other link-local addresses
> don't need member reports. Did i miss something?
>
>    Andrew

From RFC 4541 (Considerations for Internet Group Management Protocol (IGMP) and
Multicast Listener Discovery (MLD) Snooping Switches):

 2) Packets with a destination IP (DIP) address in the 224.0.0.X range
  which are not IGMP must be forwarded on all ports.

  This recommendation is based on the fact that many host systems do
  not send Join IP multicast addresses in this range before sending
  or listening to IP multicast packets.  Furthermore, since the
  224.0.0.X address range is defined as link-local (not to be
  routed), it seems unnecessary to keep the state for each address
  in this range.  Additionally, some routers operate in the
  224.0.0.X address range without issuing IGMP Joins, and these
  applications would break if the switch were to prune them due to
  not having seen a Join Group message from the router.

So, it looks like some hosts and routers out there in the field do not send
joins for those local addresses. In fact, IPv4 local multicast addresses are
ignored when Linux bridge multicast snooping adds a new group.

static int br_ip4_multicast_add_group(struct net_bridge *br,
...
	if (ipv4_is_local_multicast(group))
		return 0;
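
For reference, the 224.0.0.X test amounts to a /24 prefix match on
224.0.0.0. A minimal user-space sketch follows; it assumes addresses in
host byte order, and ipv4() and is_local_multicast() are hypothetical
helper names (the kernel helper operates on network-order values).

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order IPv4 address from its four dotted-quad octets. */
static uint32_t ipv4(unsigned int a, unsigned int b,
		     unsigned int c, unsigned int d)
{
	assert(a <= 255 && b <= 255 && c <= 255 && d <= 255);
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* True if addr (host byte order) lies in 224.0.0.0/24, the link-local
 * multicast range discussed above.
 */
static int is_local_multicast(uint32_t addr)
{
	return (addr & 0xffffff00u) == 0xe0000000u;
}
```

So 224.0.0.1 and 224.0.0.251 match, while 224.0.1.1 and 239.255.255.250
do not.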

Cascardo.

[PATCH net-next 0/3] net: Identifier Locator Addressing - Part I

2015-08-13 Thread Tom Herbert

This patch set provides rudimentary support for Identifier Locator
Addressing or ILA. The basic concept of ILA is that we split an IPv6
address into a 64 bit locator and 64 bit identifier. The identifier is
the identity of an entity in communication (who), and the locator
expresses the location of the entity (where). Applications
use an externally visible address that contains the identifier.
When a packet is actually sent, a translation is done that
overwrites the first 64 bits of the address with a locator.
The packet can then be forwarded over the network to the host where
the addressed entity is located. At the receiver, the reverse
translation is done so that the application sees the original,
untranslated address. Presumably an external control plane will
provide identifier-locator mappings.

The data path for ILA is a simple NAT translation that only operates
on the upper 64 bits of a destination address in IPv6 packets. The
basic process is:

   1) Lookup 64 bit identifier (lower 64 bits of destination)
   2) If a match is found
  a) Overwrite locator (upper 64 bits of destination) with
 the new locator
  b) Adjust any checksum that has destination address included in
 pseudo header
   3) Send or receive packet
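
As a rough user-space illustration of steps 1 and 2, the sketch below
looks up the identifier, overwrites the locator, and patches a 16-bit
Internet checksum incrementally (RFC 1624, eqn. 3). It is not the code
from this series: struct ila_map, ila_translate(), fold() and
csum_full() are hypothetical names, and the checksum here covers only
the 16 address bytes to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping entry, both fields in network byte order. */
struct ila_map {
	uint8_t identifier[8];	/* lower 64 bits of the address */
	uint8_t locator[8];	/* replacement for the upper 64 bits */
};

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over len bytes (len must be even here). */
static uint16_t csum_full(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len % 2 == 0);
	for (i = 0; i < len; i += 2)
		sum += (uint32_t)((p[i] << 8) | p[i + 1]);
	return (uint16_t)~fold(sum);
}

/* Returns 1 and rewrites addr/check if the identifier matches an entry,
 * 0 (leaving both untouched) otherwise.
 */
static int ila_translate(uint8_t addr[16], uint16_t *check,
			 const struct ila_map *maps, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		uint32_t sum;

		/* 1) match on the identifier (lower 64 bits) */
		if (memcmp(addr + 8, maps[i].identifier, 8) != 0)
			continue;

		/* 2b) HC' = ~(~HC + ~m + m') for each changed 16-bit word */
		sum = (uint16_t)~*check;
		for (j = 0; j < 8; j += 2) {
			uint16_t old_w = (uint16_t)((addr[j] << 8) | addr[j + 1]);
			uint16_t new_w = (uint16_t)((maps[i].locator[j] << 8) |
						    maps[i].locator[j + 1]);

			sum += (uint16_t)~old_w;
			sum += new_w;
		}
		*check = (uint16_t)~fold(sum);

		/* 2a) overwrite the locator (upper 64 bits) */
		memcpy(addr, maps[i].locator, 8);
		return 1;
	}
	return 0;
}
```

Updating the checksum from the difference between old and new locator
avoids re-summing the payload, which is the point of step 2b.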

ILA is a means to implement tunnels or network virtualization without
encapsulation. Since there is no encapsulation involved, we assume that
stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
just works. Also, since we're minimally changing the packet many of
the worries about encapsulation (MTU, checksum, fragmentation) are
not relevant. The downside is that ILA is not extensible like other
encapsulations (GUE for instance) so it might not be appropriate for
all use cases. Also, this only makes sense to do in IPv6!

A key aspect of ILA is performance. The intent is that ILA would be
used in data centers in virtualizing tasks or jobs. In the fullest
incarnation all intra data center communications might be targeted to
virtual ILA addresses. This is basically adding a new virtualization
capability to the existing services in a datacenter, so there is a
strong expectation that this does not degrade performance for
existing applications.

Performance seems to be dependent on how ILA is hooked into the kernel.
ILA can be implemented under some different models:

  - Mechanically it is a form of stateless DNAT
  - It can be thought of as a type of (source) routing
  - As a functional replacement of encapsulation

In this patch set we hook into the data path using Light Weight
Tunnels (LWT) infrastructure. As part of that, we add support in LWT
to redirect dst input. iproute will be modified to take a new ila encap
type. ILA can be configured like:

# ILA to destination
ip route add :0:0:1::0:2:0/128 \
   encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0

# Configure local address
ip -6 addr add :0:0:1::0:1:0/128 dev eth0

# ILA translation for local address on input
ip route add table local local 2001:0:0:1::0:1:0/128 \
   encap ila :0:0:1 dev lo

So sending to destination :0:0:1::0:2:0 will have destination
of 2001:0:0:2::0:2:0 on the wire.

Performance results are below. With ILA we see about a 10% drop in
pps compared to non-ILA. Much of this drop can be attributed to the
loss of early demux on input (translation occurs after it is attempted).
We will address this in the next patch set. Also, IPvlan input path
does not work with ILA since the routing is bypassed-- this will
be addressed in a future patch.

Performance testing:

Performing netperf TCP_RR with 200 clients:

Non-ILA baseline
  84.92% CPU utilization
  1861922.9 tps
  93/163/330 50/90/99% latencies

ILA single destination
  83.16% CPU utilization
  1679683.4 tps 
  105/180/332 50/90/99% latencies

References:

Slides from netconf:
http://vger.kernel.org/netconf2015Herbert-ILA.pdf

Slides from presentation at IETF:
https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf

I-D:
https://tools.ietf.org/html/draft-herbert-nvo3-ila-00

Tom Herbert (3):
  lwt: Add support to redirect dst.input
  net: Add inet_proto_csum_replace_by_diff utility function
  net: Identifier Locator Addressing module

 include/net/checksum.h|   2 +
 include/net/lwtunnel.h|  25 +++-
 include/uapi/linux/ila.h  |  15 +
 include/uapi/linux/lwtunnel.h |   1 +
 net/core/lwtunnel.c   |  55 +
 net/core/utils.c  |  13 +
 net/ipv4/route.c  |   8 ++-
 net/ipv6/Kconfig  |  18 ++
 net/ipv6/Makefile |   1 +
 net/ipv6/ila/Makefile |   7 +++
 net/ipv6/ila/ila.h|  50 
 net/ipv6/ila/ila_lwt.c| 133 ++
 net/ipv6/ila/ila_main.c   |  69 ++
 net/ipv6/route.c  |   8 ++-
 14 files changed, 402 insertions(+), 3 deletions(-)
 create mode 100644

[PATCH net-next 1/3] lwt: Add support to redirect dst.input

2015-08-13 Thread Tom Herbert

This patch adds the capability to redirect dst input in the same way
that dst output is redirected by LWT.

Also, save the original dst.input and dst.output when setting up
lwtunnel redirection. These can be called by the client as a
pass-through.
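
The save-and-redirect idea can be shown with plain function pointers.
The following is a user-space sketch only: struct route, struct packet,
redirect_input(), tunnel_input() and plain_input() are hypothetical
names chosen for illustration and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

struct packet {
	int translated;
	int delivered;
};

struct route;
typedef int (*input_fn)(struct route *rt, struct packet *pkt);

struct route {
	input_fn input;		/* handler invoked on receive */
	input_fn orig_input;	/* saved handler, used as pass-through */
};

/* The handler that was installed before any redirection. */
static int plain_input(struct route *rt, struct packet *pkt)
{
	(void)rt;
	pkt->delivered = 1;
	return 0;
}

/* Redirected handler: do the tunnel-specific work, then pass through to
 * the handler that was saved when redirection was set up.
 */
static int tunnel_input(struct route *rt, struct packet *pkt)
{
	assert(rt->orig_input != NULL);
	pkt->translated = 1;
	return rt->orig_input(rt, pkt);
}

/* Install handler on rt, remembering the previous one. */
static void redirect_input(struct route *rt, input_fn handler)
{
	rt->orig_input = rt->input;
	rt->input = handler;
}
```

After redirect_input(&rt, tunnel_input), calling rt.input() runs the
tunnel handler first and the original handler second.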

Signed-off-by: Tom Herbert t...@herbertland.com
---
 include/net/lwtunnel.h | 25 ++-
 net/core/lwtunnel.c| 55 ++
 net/ipv4/route.c   |  8 +++-
 net/ipv6/route.c   |  8 +++-
 4 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 33bd309..3db87d7 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -11,12 +11,15 @@
 #define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
 
 /* lw tunnel state flags */
-#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT	BIT(0)
+#define LWTUNNEL_STATE_INPUT_REDIRECT	BIT(1)
 
 struct lwtunnel_state {
 	__u16		type;
 	__u16		flags;
 	atomic_t	refcnt;
+	int		(*orig_output)(struct sock *sk, struct sk_buff *skb);
+	int		(*orig_input)(struct sk_buff *);
 	int		len;
 	__u8		data[0];
 };
@@ -25,6 +28,7 @@ struct lwtunnel_encap_ops {
int (*build_state)(struct net_device *dev, struct nlattr *encap,
   struct lwtunnel_state **ts);
int (*output)(struct sock *sk, struct sk_buff *skb);
+   int (*input)(struct sk_buff *skb);
int (*fill_encap)(struct sk_buff *skb,
  struct lwtunnel_state *lwtstate);
int (*get_encap_size)(struct lwtunnel_state *lwtstate);
@@ -58,6 +62,13 @@ static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate)
 	return false;
 }
 
+static inline bool lwtunnel_input_redirect(struct lwtunnel_state *lwtstate)
+{
+	if (lwtstate && (lwtstate->flags & LWTUNNEL_STATE_INPUT_REDIRECT))
+		return true;
+
+	return false;
+}
 int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
   unsigned int num);
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
@@ -72,6 +83,8 @@ struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
 int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
 int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
 int lwtunnel_output6(struct sock *sk, struct sk_buff *skb);
+int lwtunnel_input(struct sk_buff *skb);
+int lwtunnel_input6(struct sk_buff *skb);
 
 #else
 
@@ -142,6 +155,16 @@ static inline int lwtunnel_output6(struct sock *sk, struct 
sk_buff *skb)
return -EOPNOTSUPP;
 }
 
+static inline int lwtunnel_input(struct sock *sk, struct sk_buff *skb)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int lwtunnel_input6(struct sock *sk, struct sk_buff *skb)
+{
+   return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __NET_LWTUNNEL_H */
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index 5d6d8e3..3331585 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -241,3 +241,58 @@ int lwtunnel_output(struct sock *sk, struct sk_buff *skb)
return __lwtunnel_output(sk, skb, lwtstate);
 }
 EXPORT_SYMBOL(lwtunnel_output);
+
+int __lwtunnel_input(struct sk_buff *skb,
+struct lwtunnel_state *lwtstate)
+{
+   const struct lwtunnel_encap_ops *ops;
+   int ret = -EINVAL;
+
+   if (!lwtstate)
+   goto drop;
+
+   if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
+   lwtstate->type > LWTUNNEL_ENCAP_MAX)
+   return 0;
+
+   ret = -EOPNOTSUPP;
+   rcu_read_lock();
+   ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
+   if (likely(ops && ops->input))
+   ret = ops->input(skb);
+   rcu_read_unlock();
+
+   if (ret == -EOPNOTSUPP)
+   goto drop;
+
+   return ret;
+
+drop:
+   kfree_skb(skb);
+
+   return ret;
+}
+
+int lwtunnel_input6(struct sk_buff *skb)
+{
+   struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
+   struct lwtunnel_state *lwtstate = NULL;
+
+   if (rt)
+   lwtstate = rt->rt6i_lwtstate;
+
+   return __lwtunnel_input(skb, lwtstate);
+}
+EXPORT_SYMBOL(lwtunnel_input6);
+
+int lwtunnel_input(struct sk_buff *skb)
+{
+   struct rtable *rt = (struct rtable *)skb_dst(skb);
+   struct lwtunnel_state *lwtstate = NULL;
+
+   if (rt)
+   lwtstate = rt->rt_lwtstate;
+
+   return __lwtunnel_input(skb, lwtstate);
+}
+EXPORT_SYMBOL(lwtunnel_input);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 18fd7c9..051d834 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1630,8 +1630,14 @@ static int __mkroute_input(struct sk_buff *skb,
rth->dst.output = ip_output;
 
rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag);
-   if (lwtunnel_output_redirect(rth->rt_lwtstate))
+   if

[PATCH net-next 3/3] net: Identifier Locator Addressing module

2015-08-13 Thread Tom Herbert

Add a new module named ila. This implements ILA translation. Lightweight
tunnel redirection is used to perform the translation in
the data path. This is configured by the "ip -6 route" command
using the "encap ila <locator>" option, where <locator> is the
value to set in the destination locator of the packet. e.g.

ip -6 route add :0:0:1::0:1:0/128 \
  encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0

Sets a route where :0:0:1 will be overwritten by
2001:0:0:1 on output.
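
As a rough illustration of the translation described above, the following
user-space sketch overwrites the upper 64 bits (the locator) of a 128-bit
address and leaves the lower 64 bits (the identifier) alone. The struct and
function names are hypothetical and are not taken from the patch, and the
checksum adjustment that the patch performs is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* An IPv6 address viewed ILA-style: 64-bit locator, then 64-bit identifier.
 * Hypothetical type for illustration only, not the kernel's layout.
 */
struct ila_addr_sketch {
	uint8_t locator[8];
	uint8_t identifier[8];
};

/* Overwrite the locator (upper 64 bits); the identifier is left untouched. */
static void ila_set_locator(struct ila_addr_sketch *addr,
			    const uint8_t locator[8])
{
	memcpy(addr->locator, locator, sizeof(addr->locator));
}
```

With a locator of 2001:0:0:1, the bytes passed in would be
20 01 00 00 00 00 00 01.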

Signed-off-by: Tom Herbert t...@herbertland.com
---
 include/uapi/linux/ila.h  |  15 +
 include/uapi/linux/lwtunnel.h |   1 +
 net/ipv6/Kconfig  |  18 ++
 net/ipv6/Makefile |   1 +
 net/ipv6/ila/Makefile |   7 +++
 net/ipv6/ila/ila.h|  50 
 net/ipv6/ila/ila_lwt.c| 133 ++
 net/ipv6/ila/ila_main.c   |  69 ++
 8 files changed, 294 insertions(+)
 create mode 100644 include/uapi/linux/ila.h
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_main.c

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
new file mode 100644
index 000..7ed9e67
--- /dev/null
+++ b/include/uapi/linux/ila.h
@@ -0,0 +1,15 @@
+/* ila.h - ILA Interface */
+
+#ifndef _UAPI_LINUX_ILA_H
+#define _UAPI_LINUX_ILA_H
+
+enum {
+   ILA_ATTR_UNSPEC,
+   ILA_ATTR_LOCATOR,   /* u64 */
+
+   __ILA_ATTR_MAX,
+};
+
+#define ILA_ATTR_MAX   (__ILA_ATTR_MAX - 1)
+
+#endif /* _UAPI_LINUX_ILA_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index 31377bb..04bac3b 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -7,6 +7,7 @@ enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_NONE,
LWTUNNEL_ENCAP_MPLS,
LWTUNNEL_ENCAP_IP,
+   LWTUNNEL_ENCAP_ILA,
__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 643f613..c732e27 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -92,6 +92,24 @@ config IPV6_MIP6
 
  If unsure, say N.
 
+config IPV6_ILA
+   tristate "IPv6: Identifier Locator Addressing (ILA)"
+   ---help---
+ Support for IPv6 Identifier Locator Addressing (ILA).
+
+ ILA is a mechanism to do network virtualization without
+ encapsulation. The basic concept of ILA is that we split an
+ IPv6 address into a 64 bit locator and 64 bit identifier. The
+ identifier is the identity of an entity in communication
+ (who) and the locator expresses the location of the
+ entity (where).
+
+ ILA can be configured using the "encap ila" option with the
+ "ip -6 route" command. ILA is described in
+ https://tools.ietf.org/html/draft-herbert-nvo3-ila-00.
+
+ If unsure, say N.
+
 config INET6_XFRM_TUNNEL
tristate
select INET6_TUNNEL
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 0f3f199..2fbd90b 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
 obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
+obj-$(CONFIG_IPV6_ILA) += ila/
 obj-$(CONFIG_NETFILTER)+= netfilter/
 
 obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
new file mode 100644
index 000..cc0c202
--- /dev/null
+++ b/net/ipv6/ila/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for ILA module
+#
+
+obj-$(CONFIG_IPV6_ILA) += ila.o
+
+ila-objs := ila_main.o ila_lwt.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
new file mode 100644
index 000..d2298b3
--- /dev/null
+++ b/net/ipv6/ila/ila.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2015 Tom Herbert t...@herbertland.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ */
+
+#ifndef __ILA_H
+#define __ILA_H
+
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/protocol.h>
+#include <uapi/linux/ila.h>
+
+struct ila_params {
+   __be64 locator;
+};
+
+static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
+{
+   __be32 diff[] = {
+   ~from[0], ~from[1], to[0], to[1],
+   };
+
+   return csum_partial(diff, sizeof(diff), 0);
+}
+
+static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
+{
+   return compute_csum_diff8((__be32 *)&ip6h->daddr,
+

[PATCH net-next 2/3] net: Add inet_proto_csum_replace_by_diff utility function

2015-08-13 Thread Tom Herbert

This function updates a checksum field value and skb->csum based on
a value which is the difference between the old and new checksum.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 include/net/checksum.h |  2 ++
 net/core/utils.c   | 13 +
 2 files changed, 15 insertions(+)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 2d1d73c..0e0c987 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -144,6 +144,8 @@ void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff 
*skb,
 void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
   const __be32 *from, const __be32 *to,
   int pseudohdr);
+void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
+__wsum diff, int pseudohdr);
 
 static inline void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
__be16 from, __be16 to,
diff --git a/net/core/utils.c b/net/core/utils.c
index a7732a0..89ccfb1 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -336,6 +336,19 @@ void inet_proto_csum_replace16(__sum16 *sum, struct 
sk_buff *skb,
 }
 EXPORT_SYMBOL(inet_proto_csum_replace16);
 
+void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
+__wsum diff, int pseudohdr)
+{
+   if (skb->ip_summed != CHECKSUM_PARTIAL) {
+   *sum = csum_fold(csum_add(diff, ~csum_unfold(*sum)));
+   if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
+   skb->csum = ~csum_add(diff, ~skb->csum);
+   } else if (pseudohdr) {
+   *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
+   }
+}
+EXPORT_SYMBOL(inet_proto_csum_replace_by_diff);
+
 struct __net_random_once_work {
struct work_struct work;
struct static_key *key;
-- 
1.8.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
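
The idea behind the inet_proto_csum_replace_by_diff() patch above can be
tried outside the kernel. The sketch below is a simplified user-space model,
assuming 16-bit words and plain ones-complement arithmetic; the helper names
are made up for illustration and are not the kernel's csum_* API. It computes
a "diff" from the old and new words (complement of the old, plus the new) and
applies it to an existing checksum, which should give the same result as
recomputing the checksum over the modified data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones-complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones-complement sum of n 16-bit words (not inverted). */
static uint16_t sum16(const uint16_t *words, size_t n)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < n; i++)
		acc += words[i];
	return fold16(acc);
}

/* Internet-style checksum: inverted ones-complement sum. */
static uint16_t checksum(const uint16_t *words, size_t n)
{
	return (uint16_t)~sum16(words, n);
}

/* diff = sum(~old words) + sum(new words), folded to 16 bits. */
static uint16_t csum_diff(const uint16_t *from, const uint16_t *to, size_t n)
{
	uint32_t acc = 0;
	size_t i;

	for (i = 0; i < n; i++)
		acc += (uint32_t)(uint16_t)~from[i] + to[i];
	return fold16(acc);
}

/* New checksum = ~(~old checksum + diff). */
static uint16_t csum_replace_by_diff(uint16_t check, uint16_t diff)
{
	return (uint16_t)~fold16((uint32_t)(uint16_t)~check + diff);
}
```

The appeal of passing a precomputed diff is that a caller which rewrites the
same fields in the same way for many packets only has to compute the diff
once.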

[PATCH net-next 5/7] net: dsa: mv88e6xxx: add VLAN Purge support

2015-08-13 Thread Vivien Didelot

Add support for the VTU Load Purge operation and implement the
port_vlan_del driver function to remove a port from a VLAN entry, and
delete the VLAN if the given port was its last member.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6352.c |   1 +
 drivers/net/dsa/mv88e6xxx.c | 113 
 drivers/net/dsa/mv88e6xxx.h |   2 +
 3 files changed, 116 insertions(+)

diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index e6767ce..cec38bb 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -344,6 +344,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.port_leave_bridge  = mv88e6xxx_leave_bridge,
.port_stp_update= mv88e6xxx_port_stp_update,
.port_pvid_get  = mv88e6xxx_port_pvid_get,
+   .port_vlan_del  = mv88e6xxx_port_vlan_del,
.vlan_getnext   = mv88e6xxx_vlan_getnext,
.port_fdb_add   = mv88e6xxx_port_fdb_add,
.port_fdb_del   = mv88e6xxx_port_fdb_del,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 6c86bad..8423924 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1254,6 +1254,32 @@ static int _mv88e6xxx_vtu_stu_data_read(struct 
dsa_switch *ds,
return 0;
 }
 
+static int _mv88e6xxx_vtu_stu_data_write(struct dsa_switch *ds,
+struct mv88e6xxx_vtu_stu_entry *entry,
+unsigned int nibble_offset)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   u16 regs[3] = { 0 };
+   int i;
+   int ret;
+
+   for (i = 0; i < ps->num_ports; ++i) {
+   unsigned int shift = (i % 4) * 4 + nibble_offset;
+   u8 data = entry->data[i];
+
+   regs[i / 4] |= (data & GLOBAL_VTU_STU_DATA_MASK) << shift;
+   }
+
+   for (i = 0; i < 3; ++i) {
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL,
+  GLOBAL_VTU_DATA_0_3 + i, regs[i]);
+   if (ret < 0)
+   return ret;
+   }
+
+   return 0;
+}
+
 static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, u16 vid,
  struct mv88e6xxx_vtu_stu_entry *entry)
 {
@@ -1307,6 +1333,93 @@ static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, 
u16 vid,
return 0;
 }
 
+static int _mv88e6xxx_vtu_loadpurge(struct dsa_switch *ds,
+   struct mv88e6xxx_vtu_stu_entry *entry)
+{
+   u16 reg = 0;
+   int ret;
+
+   ret = _mv88e6xxx_vtu_wait(ds);
+   if (ret < 0)
+   return ret;
+
+   if (!entry->valid)
+   goto loadpurge;
+
+   /* Write port member tags */
+   ret = _mv88e6xxx_vtu_stu_data_write(ds, entry, 0);
+   if (ret < 0)
+   return ret;
+
+   if (mv88e6xxx_6097_family(ds) || mv88e6xxx_6165_family(ds) ||
+   mv88e6xxx_6351_family(ds) || mv88e6xxx_6352_family(ds)) {
+   reg = entry->sid & GLOBAL_VTU_SID_MASK;
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_SID, reg);
+   if (ret < 0)
+   return ret;
+
+   reg = entry->fid & GLOBAL_VTU_FID_MASK;
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_FID, reg);
+   if (ret < 0)
+   return ret;
+   }
+
+   reg = GLOBAL_VTU_VID_VALID;
+loadpurge:
+   reg |= entry->vid & GLOBAL_VTU_VID_MASK;
+   ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID, reg);
+   if (ret < 0)
+   return ret;
+
+   return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_LOAD_PURGE);
+}
+
+int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port, u16 vid)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   struct mv88e6xxx_vtu_stu_entry vlan;
+   bool keep = false;
+   int i, err;
+
+   mutex_lock(&ps->smi_mutex);
+
+   err = _mv88e6xxx_vtu_getnext(ds, vid - 1, &vlan);
+   if (err)
+   goto unlock;
+
+   if (vlan.vid != vid || !vlan.valid ||
+   vlan.data[port] == GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER) {
+   err = -ENOENT;
+   goto unlock;
+   }
+
+   vlan.data[port] = GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER;
+
+   /* keep the VLAN unless all ports are excluded */
+   for (i = 0; i < ps->num_ports; ++i) {
+   if (dsa_is_cpu_port(ds, i))
+   continue;
+
+   if (vlan.data[i] != GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER) {
+   keep = true;
+   break;
+   }
+   }
+
+   vlan.valid = keep;
+   err = _mv88e6xxx_vtu_loadpurge(ds, &vlan);
+   if (err)
+   goto unlock;
+
+   if (!keep)
+   clear_bit(vlan.fid, ps->fid_bitmap);
+
+unlock:
+

[PATCH net-next 0/7] net: dsa: mv88e6xxx: add hardware VLAN support

2015-08-13 Thread Vivien Didelot

Hi All,

This patchset brings support to access hardware VLAN entries in DSA and
mv88e6xxx, through switchdev VLAN objects.

In the following example, ports swp[0-2] belong to bridge br0, and ports
swp[3-4] belong to bridge br1. Here's an example of what can be achieved
after this patchset:

# bridge vlan add dev swp1 vid 100 master
# bridge vlan add dev swp2 vid 100 master
# bridge vlan add dev swp3 vid 100 master
# bridge vlan add dev swp4 vid 100 master
# bridge vlan del dev swp1 vid 100 master

The above commands correctly programmed hardware VLAN 100 for port swp2,
while ports swp3 and swp4 use software VLAN 100, as shown with:

# bridge vlan
port    vlan ids
swp0    None
swp0
swp1    None
swp1
swp2     100

swp2     100

swp3     100

swp3
swp4     100

swp4
br0     None
br1     None

Assuming that port 5 is the CPU port, the hardware VLAN table would
contain the following data:

VID  FID  SID  0  1  2  3  4  5  6
100  8    0    x  x  t  x  x  t  x

Where 'x' means excluded, and 't' means tagged.
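
The table above, together with the purge rule from patch 5/7 (remove a port
from a VLAN entry and delete the VLAN once its last member is gone), can be
modeled in a few lines of user-space C. This is only a sketch: the type and
function names are hypothetical, the port count and CPU port are taken from
the example above, and no hardware access is modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS 7
#define CPU_PORT  5	/* assumed CPU port, as in the example table */

/* 'x' and 't' in the table above */
enum member_tag { TAG_NON_MEMBER, TAG_TAGGED };

struct vtu_entry_sketch {
	unsigned int vid;
	bool valid;
	enum member_tag data[NUM_PORTS];
};

/*
 * Exclude a port from the VLAN. The entry stays valid only while some
 * non-CPU port is still a member. Returns -1 if the entry is not valid
 * or the port was not a member, 0 otherwise.
 */
static int vlan_del_port(struct vtu_entry_sketch *e, int port)
{
	bool keep = false;
	int i;

	if (!e->valid || e->data[port] == TAG_NON_MEMBER)
		return -1;

	e->data[port] = TAG_NON_MEMBER;

	for (i = 0; i < NUM_PORTS; i++) {
		if (i == CPU_PORT)
			continue;
		if (e->data[i] != TAG_NON_MEMBER) {
			keep = true;
			break;
		}
	}

	e->valid = keep;
	return 0;
}
```

Skipping the CPU port in the membership scan matters: it is tagged in every
hardware VLAN, so counting it would keep an otherwise empty entry alive.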

Also, adding an FDB entry to VLAN 100 for port swp2 like this:

# bridge fdb add 3c:97:0e:11:6e:30 dev swp2 vlan 100

Would result in the following example output:

# bridge fdb
# 01:00:5e:00:00:01 dev eth0 self permanent
# 01:00:5e:00:00:01 dev eth1 self permanent
# 00:50:d2:10:78:15 dev swp0 master br0 permanent
# 00:50:d2:10:78:15 dev swp2 vlan 100 master br0 permanent
# 3c:97:0e:11:6e:30 dev swp2 vlan 100 self static
# 00:50:d2:10:78:15 dev swp3 master br1 permanent
# 00:50:d2:10:78:15 dev swp3 vlan 100 master br1 permanent

And the Address Translation Unit would contain:

DB   T/P  Vec State Addr
008  Port 004   e   3c:97:0e:11:6e:30

Cheers,
-v

Vivien Didelot (7):
  net: dsa: add support for switchdev VLAN objects
  net: dsa: mv88e6xxx: flush VTU and STU entries
  net: dsa: mv88e6xxx: add VLAN Get Next support
  net: dsa: mv88e6xxx: add VLAN support to FDB dump
  net: dsa: mv88e6xxx: add VLAN Purge support
  net: dsa: mv88e6xxx: add VLAN Load support
  net: dsa: mv88e6xxx: use port 802.1Q mode Secure

 drivers/net/dsa/mv88e6352.c |   5 +
 drivers/net/dsa/mv88e6xxx.c | 510 +++-
 drivers/net/dsa/mv88e6xxx.h |  45 
 include/net/dsa.h   |  11 +
 net/dsa/slave.c | 158 ++
 5 files changed, 720 insertions(+), 9 deletions(-)

-- 
2.5.0


Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging

2015-08-13 Thread David Miller

From: Joe Perches j...@perches.com
Date: Thu, 13 Aug 2015 10:15:15 -0700

 On Thu, 2015-08-13 at 20:11 +0300, Igor Plyatov wrote:
  On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
  * Due to a HW bug, LAN8700 sometimes does not detect the presence of energy
    in the Ethernet cable in Energy Detect Power-Down mode (e.g. while the
    EDPWRDOWN bit is set, the ENERGYON bit is sometimes not asserted). This
    is a common bug of the LAN87xx family of PHY chips.
  * The lan87xx_read_status() was improved to acquire the ENERGYON bit. Its
    previous algorithm is still not 100% reliable and sometimes skips cable
    plugging.
  []
  diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
  []
  @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device 
  *phydev)
static int lan87xx_read_status(struct phy_device *phydev)
{
int err = genphy_read_status(phydev);
  + int rc;
  Is there a reason to move this declaration?
 
 There is no strict requirement to move declaration of the rc.
 It was made just to have all declarations easily visible.
 
 Generally it's better to have declarations
 in the minimal/narrowest scope possible.

Agreed, and it's 100% unrelated to the purpose of this patch, so it
should not be included for that reason as well.

You will need to respin this patch with the variable moving elided.

Re: [PATCH] net: allow sleeping when modifying store_rps_map

2015-08-13 Thread Tom Herbert

On Thu, Aug 13, 2015 at 11:03 AM, Sasha Levin sasha.le...@oracle.com wrote:
 Commit 10e4ea751 (net: Fix race condition in store_rps_map) has moved the
 manipulation of the rps_needed jump label under a spinlock. Since changing
 the state of a jump label may sleep this is incorrect and causes warnings
 during runtime.

 Make rps_map_lock a mutex to allow sleeping under it.

 Fixes: 10e4ea751 (net: Fix race condition in store_rps_map)
 Signed-off-by: Sasha Levin sasha.le...@oracle.com
 ---
  net/core/net-sysfs.c |8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
 index 39ec694..b279077 100644
 --- a/net/core/net-sysfs.c
 +++ b/net/core/net-sysfs.c
 @@ -689,7 +689,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue 
 *queue,
 struct rps_map *old_map, *map;
 cpumask_var_t mask;
 int err, cpu, i;
 -   static DEFINE_SPINLOCK(rps_map_lock);
 +   static DEFINE_MUTEX(rps_map_mutex);

 if (!capable(CAP_NET_ADMIN))
 return -EPERM;
 @@ -722,9 +722,9 @@ static ssize_t store_rps_map(struct netdev_rx_queue 
 *queue,
 map = NULL;
 }

 -   spin_lock(&rps_map_lock);
 +   mutex_lock(&rps_map_mutex);
 old_map = rcu_dereference_protected(queue->rps_map,
 -   lockdep_is_held(&rps_map_lock));
 +   mutex_is_locked(&rps_map_mutex));
 rcu_assign_pointer(queue->rps_map, map);

 if (map)
 @@ -732,7 +732,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue 
 *queue,
 if (old_map)
 static_key_slow_dec(&rps_needed);

 -   spin_unlock(&rps_map_lock);
 +   mutex_unlock(&rps_map_mutex);

 if (old_map)
 kfree_rcu(old_map, rcu);
 --
 1.7.10.4


Thanks Sasha!

Acked-by: Tom Herbert t...@herbertland.com

Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging

2015-08-13 Thread Joe Perches

On Thu, 2015-08-13 at 20:11 +0300, Igor Plyatov wrote:
  On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
  * Due to a HW bug, LAN8700 sometimes does not detect the presence of energy
    in the Ethernet cable in Energy Detect Power-Down mode (e.g. while the
    EDPWRDOWN bit is set, the ENERGYON bit is sometimes not asserted). This
    is a common bug of the LAN87xx family of PHY chips.
  * The lan87xx_read_status() was improved to acquire the ENERGYON bit. Its
    previous algorithm is still not 100% reliable and sometimes skips cable
    plugging.
  []
  diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
  []
  @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device 
  *phydev)
static int lan87xx_read_status(struct phy_device *phydev)
{
 int err = genphy_read_status(phydev);
  +  int rc;
  Is there a reason to move this declaration?
 
 There is no strict requirement to move declaration of the rc.
 It was made just to have all declarations easily visible.

Generally it's better to have declarations
in the minimal/narrowest scope possible.



Re: [PATCH 0/2] net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len

2015-08-13 Thread Jesper Dangaard Brouer

On Thu, 13 Aug 2015 10:49:50 -0700
Stephen Hemminger step...@networkplumber.org wrote:

 On Thu, 13 Aug 2015 19:01:05 +0200
 Phil Sutter p...@nwl.cc wrote:
 
  Up to now, drivers being aware of the above applying to them set
  dev-tx_queue_len to zero to indicate no qdisc should be attached to the
  interface they drive and the kernel reacts upon this by assigning the noop
  qdisc instead of the default pfifo_fast. This implicit agreement though 
  leads
  to an inconvenient situation once a user tries to attach a real qdisc to 
  these
  devices, as the formerly special tx_queue_len value becomes a regular one,
 
 So this is a workaround for user ignorance by introducing kernel API 
 complexity.
 Before user sets qdisc, why don't they set tx queue length?

Please don't insist on keeping this broken interface... how should users
know that BEFORE adding a qdisc they MUST change the _device_ tx queue
length (not zero).  Getting back to the original state, they MUST
change the device tx queue len back to zero BEFORE deleting the qdisc,
such that when assigning the default queue qdisc the system detects
this device can work without a qdisc.  Changing the tx queue len to
zero after the qdisc is deleted will have no effect.

Listen to the description, that interface is broken. The kernel really
needs to hide these details from userspace.

It even allows you to misconfigure the kernel, by tricking the kernel
into assigning noqueue to physical devices that really need it.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

[PATCH 0/6] bonding: only advertise LRO if underlying hardware can LRO

2015-08-13 Thread Jarod Wilson

At present, you can create a bond, containing only underlying slaves that do
not support LRO, and the bond will happily claim to support LRO, and allow
LRO to be toggled on and off by ethtool. While things actually do function
fine in the scenario, and this is merely cosmetic, it's a bit misleading to
users, and it's something we can fix.

If we add NETIF_F_LRO to the NETIF_F_ONE_FOR_ALL flags in
netdev_features.h, then netdev_features_increment() will only enable LRO
if 1) it's listed in the device's feature mask and 2) if there's actually a
slave present that supports the feature.

However, the bnx2x, ixgbe, netxen, qlcnic and s2io drivers all fail to report
support for LRO in their vlan_features, which requires some minor fixups to
these drivers to keep LRO working in cases where it should have been before
this set. The mellanox mlx5 and cavium liquidio drivers already properly set
the LRO flag in their vlan_features.

Note: I've only tested explicitly with bnx2x, as well as some non-LRO hw,
to confirm that:

1) if all slaves support LRO, the bond enables LRO
2) if some slaves support LRO, the bond enables LRO
3) if no slaves support LRO, the bond disables LRO
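
The three outcomes listed above follow from how a "one for all" feature is
combined across slaves: the bit ends up set if at least one slave advertises
it and the master's mask permits it. The sketch below is a simplified
user-space model of that rule; the flag values and function names are
stand-ins chosen for illustration and do not match the kernel's definitions
or its full feature-merging logic.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in feature bits, not the kernel's values. */
#define F_SG   (1u << 0)
#define F_LRO  (1u << 1)

/* Features that are enabled if at least one slave supports them. */
#define ONE_FOR_ALL (F_SG | F_LRO)

/*
 * Fold one slave's features into the running set: a "one for all" bit is
 * turned on if this slave has it and the master's mask allows it.
 */
static uint32_t increment_features(uint32_t all, uint32_t one, uint32_t mask)
{
	return all | (one & ONE_FOR_ALL & mask);
}

/* Combine n slaves, starting with every "one for all" bit cleared. */
static uint32_t bond_features(const uint32_t *slaves, int n, uint32_t mask)
{
	uint32_t all = 0;
	int i;

	for (i = 0; i < n; i++)
		all = increment_features(all, slaves[i], mask);
	return all;
}
```

In this model a bond whose slaves all lack F_LRO never gets the bit, which
corresponds to case 3 above.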

This set was generated against net-next master, it applies to 4.2.0-rc6 with
a bit of fuzz.

Jarod Wilson (6):
  net/bonding: enable LRO if one device supports it
  ethernet/bnx2x: advertise LRO support in vlan_features
  ethernet/ixgbe: advertise LRO support in vlan_features
  ethernet/netxen: advertise LRO support in vlan_features
  ethernet/qlcnic: advertise LRO support in vlan_features
  ethernet/s2io: advertise what hw supports in vlan_features

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 4 +++-
 drivers/net/ethernet/neterion/s2io.c | 1 +
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 4 +++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 4 +++-
 include/linux/netdev_features.h  | 3 ++-
 6 files changed, 14 insertions(+), 5 deletions(-)

CC: David S. Miller da...@davemloft.net
CC: Ariel Elior ariel.el...@qlogic.com
CC: Manish Chopra manish.cho...@qlogic.com
CC: Rajesh Borundia rajesh.borun...@qlogic.com
CC: Shahed Shaikh shahed.sha...@qlogic.com
CC: Sony Chacko sony.cha...@qlogic.com
CC: dept-gelinuxnic...@qlogic.com
CC: Jiri Pirko j...@resnulli.us
CC: Jon Mason jdma...@kudzu.us
CC: Scott Feldman sfel...@gmail.com
CC: Tom Herbert therb...@google.com
CC: Jeff Kirsher jeffrey.t.kirs...@intel.com
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
-- 
1.8.3.1


Re: [PATCH net-next v2 1/2] tcp: don't extend RTO on failed loss probe attempts

2015-08-13 Thread Eric Dumazet

On Wed, 2015-08-12 at 11:18 -0700, Yuchung Cheng wrote:
 If TLP was unable to send a probe, it extended the RTO to
 now + icsk_rto. But extending the RTO makes little sense
 if no TLP probe went out. With this commit, instead of
 extending the RTO we re-arm it relative to the transmit time
 of the write queue head.
 
 Signed-off-by: Yuchung Cheng ych...@google.com
 Signed-off-by: Neal Cardwell ncardw...@google.com
 Signed-off-by: Nandita Dukkipati nandi...@google.com
 ---

Acked-by: Eric Dumazet eduma...@google.com



[PATCH 1/6] net/bonding: enable LRO if one device supports it

2015-08-13 Thread Jarod Wilson

Currently, all bonding devices come up, and claim to have LRO support,
which ethtool will let you toggle on and off, even if none of the
underlying hardware devices actually support it. While the bonding driver
takes precautions for slaves that don't support all features, this is at
least a little bit misleading to users.

If we add NETIF_F_LRO to the NETIF_F_ONE_FOR_ALL flags in
netdev_features.h, then netdev_features_increment() will only enable LRO
if 1) it's listed in the device's feature mask and 2) if there's actually a
slave present that supports the feature.

Note that this is going to require some follow-up patches, as not all LRO
capable device drivers are currently properly reporting LRO support in
their vlan_features, which is where the bonding driver picks up
device-specific features.

CC: David S. Miller da...@davemloft.net
CC: Jiri Pirko j...@resnulli.us
CC: Tom Herbert therb...@google.com
CC: Scott Feldman sfel...@gmail.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 include/linux/netdev_features.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 9672781..6440bf1 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -159,7 +159,8 @@ enum {
  */
 #define NETIF_F_ONE_FOR_ALL(NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ROBUST | \
 NETIF_F_SG | NETIF_F_HIGHDMA | \
-NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED)
+NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED | \
+NETIF_F_LRO)
 
 /*
  * If one device doesn't support one of these features, then disable it
-- 
1.8.3.1


Re: [PATCHv3 net-next 06/10] openvswitch: Allow matching on conntrack mark

2015-08-13 Thread Pravin Shelar

On Wed, Aug 12, 2015 at 4:41 PM, Joe Stringer joestrin...@nicira.com wrote:
 On 12 August 2015 at 16:00, Pravin Shelar pshe...@nicira.com wrote:
 On Tue, Aug 11, 2015 at 3:59 PM, Joe Stringer joestrin...@nicira.com wrote:
 From: Justin Pettit jpet...@nicira.com

 Allow matching and setting the conntrack mark field. As with conntrack
 state and zone, these are populated by executing the ct() action. Unlike
 these, the ct_mark is also a writable field. The set_field() action may
 be used to modify the mark, which will take effect on the most recent
 conntrack entry.

 E.g.: actions:ct(zone=0),ct(zone=1),set_field(1->ct_mark)

 This will perform conntrack lookup in zone 0, then lookup in zone 1,
 then modify the mark for the entry in zone 1. The mark for the entry in
 zone 0 is unchanged. The conntrack entry itself must be committed using
 the commit flag in the conntrack action flags for this change to persist.

 Signed-off-by: Justin Pettit jpet...@nicira.com
 Signed-off-by: Joe Stringer joestrin...@nicira.com
 ---
 ...


 +int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
 +   u32 ct_mark, u32 mask)
 +{
 +#ifdef CONFIG_NF_CONNTRACK_MARK
 +   enum ip_conntrack_info ctinfo;
 +   struct nf_conn *ct;
 +   u32 new_mark;
 +
 +   /* This must happen directly after lookup/commit. */
 +   ct = nf_ct_get(skb, &ctinfo);
 +   if (!ct)
 +   return -EINVAL;
 +
 +   new_mark = ct_mark | (ct->mark & ~(mask));
 +   if (ct->mark != new_mark) {
 +   ct->mark = new_mark;
 +   nf_conntrack_event_cache(IPCT_MARK, ct);
 +   key->ct.mark = ct_mark;
 +   }
 +

 Is it fine to just set mark and not initialize the rest of key->ct members?

 I don't quite follow. This action acts upon the current connection,
 and modifies its metadata. key->ct should already be populated with
 the existing connection info.

I had an offline discussion with Joe. The fields are initialized in the prior
conntrack action. So now he is exploring whether we can bring the conntrack,
set mark and set label actions under one single conntrack action using
parameters.
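
For readers following the thread, the masked update in the quoted patch,
new_mark = ct_mark | (ct->mark & ~mask), can be tried in isolation. The
helper below is a hypothetical user-space stand-in, not code from the patch.
It keeps the bits of the old mark that fall outside the mask and ORs in the
new value, so it behaves as a masked write only if ct_mark has no bits set
outside the mask, which the sketch takes as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bits of old_mark outside the mask are kept and ct_mark is ORed in.
 * Assumes the caller has already limited ct_mark to the masked bits.
 */
static uint32_t masked_mark(uint32_t old_mark, uint32_t ct_mark, uint32_t mask)
{
	return ct_mark | (old_mark & ~mask);
}
```

For example, writing 0x11 under mask 0xff into an existing mark of 0xaabbccdd
leaves the upper 24 bits as they were.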

[PATCH net] inet: fix potential deadlock in reqsk_queue_unlink()

2015-08-13 Thread Eric Dumazet

From: Eric Dumazet eduma...@google.com

When replacing del_timer() with del_timer_sync(), I introduced
a deadlock condition :

reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()

inet_csk_reqsk_queue_drop() can be called from many contexts,
one being the timer handler itself (reqsk_timer_handler()).

In this case, del_timer_sync() loops forever.

Simple fix is to test if timer is pending.

Fixes: 2235f2ac75fd (inet: fix races with reqsk timers)
Signed-off-by: Eric Dumazet eduma...@google.com
---
Sorry for this very embarrassing bug, I should have caught it in my
tests :(

 net/ipv4/inet_connection_sock.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 05e3145f7dc3..134957159c27 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -593,7 +593,7 @@ static bool reqsk_queue_unlink(struct request_sock_queue 
*queue,
}
 
spin_unlock(&queue->syn_wait_lock);
-   if (del_timer_sync(&req->rsk_timer))
+   if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
reqsk_put(req);
return found;
 }
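
The deadlock described in this patch can be modelled in user space. The
following is an illustrative sketch only, not kernel code: every toy_* name is
hypothetical, the model reports the self-wait instead of actually spinning, and
it assumes the timer is no longer pending while its handler runs the drop path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a kernel timer; NOT the real struct timer_list. */
struct toy_timer {
	bool pending;		/* armed and waiting to fire */
	bool handler_running;	/* its handler is executing right now */
};

enum toy_del_result {
	TOY_DEL_INACTIVE,	/* nothing to delete */
	TOY_DEL_REMOVED,	/* a pending timer was removed */
	TOY_DEL_WOULD_DEADLOCK	/* caller would wait for itself forever */
};

/* Models timer_pending(): true only while the timer is still queued. */
static bool toy_timer_pending(const struct toy_timer *t)
{
	return t->pending;
}

/*
 * Models del_timer_sync(): it has to wait until a running handler has
 * finished. If the caller is that handler, the wait can never end; the
 * model returns a marker value instead of looping.
 */
static enum toy_del_result toy_del_timer_sync(struct toy_timer *t,
					      bool from_handler)
{
	if (t->handler_running && from_handler)
		return TOY_DEL_WOULD_DEADLOCK;
	if (!t->pending)
		return TOY_DEL_INACTIVE;
	t->pending = false;
	return TOY_DEL_REMOVED;
}

/* Guarded form, as in the patch: only delete synchronously if pending. */
static enum toy_del_result toy_unlink_guarded(struct toy_timer *t,
					      bool from_handler)
{
	if (!toy_timer_pending(t))
		return TOY_DEL_INACTIVE;
	return toy_del_timer_sync(t, from_handler);
}

/* Walks through the two contexts discussed in the commit message. */
static void toy_demo(void)
{
	struct toy_timer firing = { .pending = false, .handler_running = true };
	struct toy_timer armed = { .pending = true, .handler_running = false };

	/* Unguarded call from the handler itself: would never return. */
	assert(toy_del_timer_sync(&firing, true) == TOY_DEL_WOULD_DEADLOCK);
	/* Guarded call from the handler: the synchronous delete is skipped. */
	assert(toy_unlink_guarded(&firing, true) == TOY_DEL_INACTIVE);
	/* Guarded call from another context: the timer is removed as before. */
	assert(toy_unlink_guarded(&armed, false) == TOY_DEL_REMOVED);
}
```

Calling toy_demo() exercises both contexts. In the model, as in the patch, the
guard changes nothing for callers outside the handler.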



Re: [PATCH 0/2] Enable smsc911x for use with ACPI

2015-08-13 Thread David Miller

From: Jeremy Linton jeremy.lin...@arm.com
Date: Wed, 12 Aug 2015 17:06:25 -0500

 This set of patches enables the front Ethernet port on the
 ARM Juno development platform when used with an ACPI enabled kernel.

 These patches convert the of_property* calls in the driver to the
 DT/ACPI agnostic device_property* calls, and add the arm hardware
 id to the acpi_match_table.

 To support the above changes I copied a couple routines from
 of_net into the properties.c file, and modified them to
 be ACPI/DT agnostic. I'm not 100% sure this is the correct location
 for these functions. But I think they are required to avoid having
 a dozen different implementations scattered across assorted Ethernet
 adapters that are being enabled to use ACPI properties.

I realize that there are still some wrinkles to work out, but I applied
this patch series as-is to net-next, and any fixups should be
submitted as followups.

Thanks.


Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging

2015-08-13 Thread Igor Plyatov


Dear David and Joe,

thank you for the patch review!

Please look at the email with the subject
[PATCH v2] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging


Best wishes.
--
Igor Plyatov


From: Joe Perches j...@perches.com
Date: Thu, 13 Aug 2015 10:15:15 -0700


On Thu, 2015-08-13 at 20:11 +0300, Igor Plyatov wrote:

On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:

* Due to a HW bug, the LAN8700 sometimes does not detect the presence of energy
in the Ethernet cable in Energy Detect Power-Down mode (e.g. while the EDPWRDOWN
bit is set, the ENERGYON bit is sometimes not asserted). This is a common bug of
the LAN87xx family of PHY chips.
* lan87xx_read_status() was improved to acquire the ENERGYON bit. Its previous
algorithm was still not 100% reliable and sometimes missed cable plugging.

[]

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c

[]

@@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
   static int lan87xx_read_status(struct phy_device *phydev)
   {
int err = genphy_read_status(phydev);
+   int rc;

Is there a reason to move this declaration?

There is no strict requirement to move the declaration of rc.
It was done just to have all declarations easily visible.

Generally it's better to have declarations
in the minimal/narrowest scope possible.

Agreed, and it's 100% unrelated to the purpose of this patch, so it
should not be included for that reason as well.

You will need to respin this patch with the variable moving elided.
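
To illustrate the narrowest-scope point made above, here is a minimal sketch
that is unrelated to the smsc driver. The function and variable names are
hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add up the magnitudes of the negative entries. */
static int sum_negative_magnitudes(const int *values, size_t count)
{
	int total = 0;	/* used across the whole function, so declared here */

	for (size_t i = 0; i < count; i++) {
		if (values[i] < 0) {
			/* Needed only in this branch, so declared only here. */
			int magnitude = -values[i];

			total += magnitude;
		}
	}
	return total;
}

/* Small self-check showing the helper in use. */
static void scope_demo(void)
{
	const int samples[] = { 3, -2, -5, 7 };

	assert(sum_negative_magnitudes(samples, 4) == 7);
}
```

Keeping magnitude inside the branch makes it obvious that nothing outside that
block reads or writes it.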



Re: kernel warning in tcp_fragment

2015-08-13 Thread Jovi Zhangwei

Hi,

On Wed, Aug 12, 2015 at 8:45 PM, Martin KaFai Lau ka...@fb.com wrote:
 On Mon, Aug 10, 2015 at 02:35:37PM -0400, Neal Cardwell wrote:
 On Mon, Aug 10, 2015 at 2:10 PM, Jovi Zhangwei j...@cloudflare.com wrote:
 
  Ping?
 
  We saw a lot of these warnings in our production system. It would be
  greatly appreciated if someone could give us a fix for these warnings. :)

 What is your net.ipv4.tcp_mtu_probing setting? If 1, have you tried
 setting it to 0?

 Hi Jovi, If setting net.ipv4.tcp_mtu_probing=0 helps, can you give the
 patch we posted earlier a try: https://patchwork.ozlabs.org/patch/481609/
 It is the same patch that I pointed out earlier. You can click
 on the download link.

 We are currently using a similar patch while keeping 
 net.ipv4.tcp_mtu_probing=1.

Our system needs net.ipv4.tcp_mtu_probing, so we cannot set it to 0.
We are testing the previous patch given by Neal; I will let you know the result.

Thanks.

[PATCH] net/fsl: simplify Kconfig dependency list for fsl networking

2015-08-13 Thread Stuart Yoder

Make the list of Kconfig dependencies for Freescale
networking more general. Simplify it to the supported
architectures: ARM, ARM64, PPC, M68K.

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
 drivers/net/ethernet/freescale/Kconfig | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index ff76d4e..70782d7 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -5,9 +5,7 @@
 config NET_VENDOR_FREESCALE
 	bool "Freescale devices"
 	default y
-	depends on FSL_SOC || QUICC_ENGINE || CPM1 || CPM2 || PPC_MPC512x || \
-		   M523x || M527x || M5272 || M528x || M520x || M532x || \
-		   ARCH_MXC || ARCH_MXS || (PPC_MPC52xx && PPC_BESTCOMM)
+	depends on M68K || PPC || ARM || ARM64
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y.
 
-- 
2.3.3


Re: [PATCH net-next 5/6] geneve: Consolidate Geneve functionality in single module.

2015-08-13 Thread Pravin Shelar

On Thu, Aug 13, 2015 at 2:24 PM, Jesse Gross je...@nicira.com wrote:
 On Wed, Aug 12, 2015 at 4:33 PM, Pravin Shelar pshe...@nicira.com wrote:
 On Wed, Aug 12, 2015 at 2:55 PM, Jesse Gross je...@nicira.com wrote:
 The farther I go in the series, the more that I hope that we can avoid
 the use of collect_md_tun. It really seems to add a lot of special
 cases.

 Use of collect_md_tun allows us to avoid a hash table lookup. That's why
 I did it. Anyway, we need a flag or pointer in the geneve-sock structure
 to locate tunnel metadata. I don't see how it is simpler if
 collect_md_tun is replaced with a flag.

 It seems to me that this requires more bookkeeping to keep consistent
 (a little in the code but mostly mentally) because collect_md_tun is a
 separate concept that doesn't necessarily follow the same rules as
 other tunnels. With VXLAN, I feel like I can mostly ignore this tunnel
 since it isn't special in most ways with the exception of places where
 it actually needs to do something different like allocate metadata.

ok, I will update the patch.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v2 0/2] minor tail loss probe improvements

2015-08-13 Thread David Miller

From: Yuchung Cheng ych...@google.com
Date: Wed, 12 Aug 2015 11:18:17 -0700

 This patch series enhances the tail loss probe (TLP) under some error
 conditions. When TLP fails to send a probe, it will no longer
 extend the RTO. When it fails to send a new packet because of
 receiver window limit, it'll try to retransmit the last packet.

Series applied, thanks.

Re: [PATCH] Revert net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN

2015-08-13 Thread Eric Dumazet

On Thu, 2015-08-13 at 14:21 -0700, Calvin Owens wrote:
 Commit 8133534c760d4083 ("net: limit tcp/udp rmem/wmem to
 SOCK_{RCV,SND}BUF_MIN") modified four sysctls to enforce that the values
 written to them are not less than SOCK_MIN_{RCV,SND}BUF.
 
...
 
 This reverts commit 8133534c760d4083f79d2cde42c636ccc0b2792e.
 
 Fixes: 8133534c760d4083 ("net: limit tcp/udp rmem/wmem to SOCK_MIN...")
 Cc: Eric Dumazet eric.duma...@gmail.com
 Cc: Sorin Dumitru so...@returnze.ro
 Signed-off-by: Calvin Owens calvinow...@fb.com

Acked-by: Eric Dumazet eduma...@google.com


