Re: [PATCH iproute2 2/3] ip-link: fix and extend documentation
On Wed, Aug 12, 2015 at 10:04:07PM +0200, Pavel Šimerda wrote:
From: Pavel Šimerda psime...@redhat.com

* Add `can` to list of supported link types
* Document `addrgenmode`
* Document `link-netnsid`
* Document VLAN link type
* Improve VXLAN link type documentation
  - Fix VXLAN srcport/dstport docs
  - Document `udpcsum`, `udp6zerocsumtx` and `udp6zerocsumrx`
---
 man/man8/ip-link.8.in | 112 --
 1 file changed, 108 insertions(+), 4 deletions(-)

diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 372e6c6..9ff9a23 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -143,9 +143,13 @@ ip-link \- network device configuration
 ] |
 .br
 .B master
-.IR DEVICE
+.IR DEVICE |
 .br
-.B nomaster
+.B nomaster |
+.br
+.B addrgenmode { eui64 | none }
+.br
+.B link-netnsid ID
 .BR }
@@ -185,6 +189,8 @@ Link types:
 .sp
 .B bond
 - Bonding device
+.B can
+- Controller Area Network interface
 .sp
 .B dummy
 - Dummy network interface
@@ -266,6 +272,66 @@ specifies the number of receive queues for new device.
 specifies the desired index of the new virtual device. The link creation fails, if the index is busy.
 .TP
+VLAN Type Support
+For a link of type
+.I VLAN
+the following additional arguments are supported:
+
+.BI ip link add
+.BI link DEVICE
+.BI name NAME
+.BI type vlan
+.R [
+.BI protocol VLAN_PROTO
+.R ]
+.BI id VLANID
+.R [
+.BR reorder_hdr { on | off }
+.R ]
+.R [
+.BR gvrp { on | off }
+.R ]
+.R [
+.BR mvrp { on | off }
+.R ]
+.R [
+.BR loose_binding { on | off }
+.R ]
+.R [
+.BI ingress-qos-map QOS-MAP
+.R ]
+.R [
+.BI egress-qos-map QOS-MAP
+.R ]
+
+.in +8
+.sp
+.BI protocol VLAN_PROTO
+- either 802.1Q or 802.1ad.
+
+.BI id VLANID
+- specifies the VLAN Identifier to use. Note that numbers with a leading 0 or 0x are interpreted as octal or hexadecimal, respectively.
+
+.BR reorder_hdr { on | off }
+- specifies whether ethernet headers are reordered or not.

Maybe this should have a more detailed explanation, e.g. that this feature is ON by default and what it affects?

+
+.BR gvrp { on | off }
+- specifies whether this VLAN should be registered using GARP VLAN Registration Protocol.
+
+.BR mvrp { on | off }
+- specifies whether this VLAN should be registered using Multiple VLAN Registration Protocol.
+
+.BR loose_binding { on | off }
+- specifies whether the VLAN device state is bound to the physical device state.

Maybe add a short explanation that this means that if the physical device goes DOWN, the VLAN device does the same?

+
+.BI ingress-qos-map QOS-MAP
+- defines a mapping between priority code points on incoming frames. The format is FROM:TO with multiple mappings separated by spaces.

Would it be useful to explain here that "priority" means skb->priority, and maybe give an example of how it can be set with the iptables CLASSIFY target?

+
+.BI egress-qos-map QOS-MAP
+- the same as ingress-qos-map but for outgoing frames.
+.in -8
+
+.TP
 VXLAN Type Support
 For a link of type
 .I VXLAN
@@ -284,7 +350,9 @@ the following additional arguments are supported:
 .R ] [
 .BI tos TOS
 .R ] [
-.BI port MIN MAX
+.BI dstport PORT
+.R ] [
+.BI srcport MIN MAX
 .R ] [
 .I [no]learning
 .R ] [
@@ -296,6 +364,12 @@ the following additional arguments are supported:
 .R ] [
 .I [no]l3miss
 .R ] [
+.I [no]udpcsum
+.R ] [
+.I [no]udp6zerocsumtx
+.R ] [
+.I [no]udp6zerocsumrx
+.R ] [
 .BI ageing SECONDS
 .R ] [
 .BI maxaddress NUMBER
@@ -340,7 +414,11 @@ parameter.
 - specifies the TOS value to use in outgoing packets.
 .sp
-.BI port MIN MAX
+.BI dstport PORT
+- specifies the UDP destination port to communicate to the remote VXLAN tunnel endpoint.
+
+.sp
+.BI srcport MIN MAX
 - specifies the range of port numbers to use as UDP source ports to communicate to the remote VXLAN tunnel endpoint.
@@ -366,6 +444,18 @@ are entered into the VXLAN device forwarding database.
 - specifies if netlink IP ADDR miss notifications are generated.
 .sp
+.I [no]udpcsum
+- specifies if UDP checksum is filled in
+
+.sp
+.I [no]udp6zerocsumtx
+- specifies if UDP checksum is filled in
+
+.sp
+.I [no]udp6zerocsumrx
+- specifies if UDP checksum is received
+
+.sp
 .BI ageing SECONDS
 - specifies the lifetime in seconds of FDB entries learnt by the kernel.
@@ -751,6 +841,12 @@ tool can be used. But it allows to change network namespace only for physical de
 give the device a symbolic name for easy reference.
 .TP
+.BI group GROUP
+specify the group the device belongs to.
+The available groups are listed in file
+.BR @SYSCONFDIR@/group .
+
+.TP
 .BI vf NUM
 specify a Virtual Function device to be configured. The
[PATCH RFC net 0/3] ipv6: Fix potential deadlock when creating pcpu rt
This patch series fixes a potential deadlock when creating a pcpu rt.

It happens when dst_alloc() decides to run gc. Something like this:

read_lock(&table->tb6_lock);
ip6_rt_pcpu_alloc()
=> dst_alloc()
=> ip6_dst_gc()
=> write_lock(&table->tb6_lock); /* oops */

Patches 1 and 2 are prep work. Patch 3 is the fix.

Original report: https://bugzilla.kernel.org/show_bug.cgi?id=102291

Steinar, the patches can also be applied to 4.2-rc5 (I just tried). Can you help to test them? Thanks!
-- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC net 3/3] ipv6: Fix potential deadlock when creating pcpu rt
rt6_make_pcpu_route() is called under read_lock(&table->tb6_lock). rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then calls dst_alloc(). dst_alloc() _may_ call ip6_dst_gc() which takes the write_lock(&table->tb6_lock).

A visualized version:

read_lock(&table->tb6_lock);
rt6_make_pcpu_route();
=> ip6_rt_pcpu_alloc();
=> dst_alloc();
=> ip6_dst_gc();
=> write_lock(&table->tb6_lock); /* oops */

The fix is to do a read_unlock first before calling ip6_rt_pcpu_alloc().

A reported stack:

[141625.537638] INFO: rcu_sched self-detected stall on CPU { 27} (t=6 jiffies g=4159086 c=4159085 q=2139)
[141625.547469] Task dump for CPU 27:
[141625.550881] mtr R running task0 22121 22081 0x0008
[141625.558069] 88103f363d98 8106e488 001b
[141625.565641] 81684900 88103f363db8 810702b0 0800
[141625.573220] 81684900 88103f363de8 8108df9f 88103f375a00
[141625.580803] Call Trace:
[141625.583345] <IRQ> [8106e488] sched_show_task+0xc1/0xc6
[141625.589650] [810702b0] dump_cpu_task+0x35/0x39
[141625.595144] [8108df9f] rcu_dump_cpu_stacks+0x6a/0x8c
[141625.601320] [81090606] rcu_check_callbacks+0x1f6/0x5d4
[141625.607669] [810940c8] update_process_times+0x2a/0x4f
[141625.613925] [8109fbee] tick_sched_handle+0x32/0x3e
[141625.619923] [8109fc2f] tick_sched_timer+0x35/0x5c
[141625.625830] [81094a1f] __hrtimer_run_queues+0x8f/0x18d
[141625.632171] [81094c9e] hrtimer_interrupt+0xa0/0x166
[141625.638258] [8102bf2a] local_apic_timer_interrupt+0x4e/0x52
[141625.645036] [8102c36f] smp_apic_timer_interrupt+0x39/0x4a
[141625.651643] [8140b9e8] apic_timer_interrupt+0x68/0x70
[141625.657895] <EOI> [81346ee8] ? dst_destroy+0x7c/0xb5
[141625.664188] [813d45b5] ? fib6_flush_trees+0x20/0x20
[141625.670272] [81082b45] ? queue_write_lock_slowpath+0x60/0x6f
[141625.677140] [8140aa33] _raw_write_lock_bh+0x23/0x25
[141625.683218] [813d4553] __fib6_clean_all+0x40/0x82
[141625.689124] [813d45b5] ?
fib6_flush_trees+0x20/0x20 [141625.695207] [813d6058] fib6_clean_all+0xe/0x10 [141625.700854] [813d60d3] fib6_run_gc+0x79/0xc8 [141625.706329] [813d0510] ip6_dst_gc+0x85/0xf9 [141625.711718] [81346d68] dst_alloc+0x55/0x159 [141625.717105] [813d09b5] __ip6_dst_alloc.isra.32+0x19/0x63 [141625.723620] [813d1830] ip6_pol_route+0x36a/0x3e8 [141625.729441] [813d18d6] ip6_pol_route_output+0x11/0x13 [141625.735700] [813f02c8] fib6_rule_action+0xa7/0x1bf [141625.741698] [813d18c5] ? ip6_pol_route_input+0x17/0x17 [141625.748043] [81357c48] fib_rules_lookup+0xb5/0x12a [141625.754050] [81141628] ? poll_select_copy_remaining+0xf9/0xf9 [141625.761002] [813f0535] fib6_rule_lookup+0x37/0x5c [141625.766914] [813d18c5] ? ip6_pol_route_input+0x17/0x17 [141625.773260] [813d008c] ip6_route_output+0x7a/0x82 [141625.779177] [813c44c8] ip6_dst_lookup_tail+0x53/0x112 [141625.785437] [813c45c3] ip6_dst_lookup_flow+0x2a/0x6b [141625.791604] [813ddaab] rawv6_sendmsg+0x407/0x9b6 [141625.797423] [813d7914] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2 [141625.804464] [8139d4b4] inet_sendmsg+0x57/0x8e [141625.810028] [81329ba3] sock_sendmsg+0x2e/0x3c [141625.815588] [8132be57] SyS_sendto+0xfe/0x143 [141625.821063] [813dd551] ? rawv6_setsockopt+0x5e/0x67 [141625.827146] [8132c9f8] ? sock_common_setsockopt+0xf/0x11 [141625.833660] [8132c08c] ? SyS_setsockopt+0x81/0xa2 [141625.839565] [8140ac17] entry_SYSCALL_64_fastpath+0x12/0x6a Fixes: d52d3997f843 (pv6: Create percpu rt6_info) Signed-off-by: Martin KaFai Lau ka...@fb.com CC: Hannes Frederic Sowa han...@stressinduktion.org Reported-by: Steinar H. 
Gunderson sgunder...@bigfoot.com --- net/ipv6/ip6_fib.c | 2 ++ net/ipv6/route.c | 44 +--- 2 files changed, 35 insertions(+), 11 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 55d1986..548c623 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -172,6 +172,8 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt) *ppcpu_rt = NULL; } } + + non_pcpu_rt-rt6i_pcpu = NULL; } static void rt6_release(struct rt6_info *rt) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 0a82653..d155864 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1007,27 +1007,39 @@ static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt) static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt) { + struct fib6_table *table = rt-rt6i_table; struct rt6_info *pcpu_rt, *prev, **p; pcpu_rt = ip6_rt_pcpu_alloc(rt); if (!pcpu_rt) {
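For readers following the thread, the lock ordering described in this patch can be illustrated in userspace. This is only a rough single-threaded sketch: `demo_rwlock` and every `demo_*` function are hypothetical stand-ins, not kernel APIs, and the toy "trylock" reports failure at the point where a real write_lock() would spin forever.

```c
#include <stdbool.h>

/* Toy stand-in for rwlock_t: it only tracks who currently holds the lock. */
struct demo_rwlock {
	int readers;
	bool writer;
};

static void demo_read_lock(struct demo_rwlock *l)   { l->readers++; }
static void demo_read_unlock(struct demo_rwlock *l) { l->readers--; }

/* A real write_lock() would spin here; the toy returns false instead. */
static bool demo_write_trylock(struct demo_rwlock *l)
{
	if (l->readers > 0 || l->writer)
		return false;
	l->writer = true;
	return true;
}

static void demo_write_unlock(struct demo_rwlock *l) { l->writer = false; }

/* Stand-in for an allocation that decides to run gc under the write lock. */
static bool demo_alloc_with_gc(struct demo_rwlock *l)
{
	if (!demo_write_trylock(l))
		return false;	/* the kernel would be stuck at this point */
	demo_write_unlock(l);
	return true;
}

/* Problematic order: allocate while the read lock is still held. */
static bool demo_alloc_under_read_lock(struct demo_rwlock *l)
{
	bool ok;

	demo_read_lock(l);
	ok = demo_alloc_with_gc(l);
	demo_read_unlock(l);
	return ok;
}

/* Order described as the fix: drop the read lock first, then allocate. */
static bool demo_alloc_after_unlock(struct demo_rwlock *l)
{
	demo_read_lock(l);
	/* ... route lookup would happen here ... */
	demo_read_unlock(l);
	return demo_alloc_with_gc(l);
}
```

With a zero-initialised `struct demo_rwlock`, `demo_alloc_under_read_lock()` fails while `demo_alloc_after_unlock()` succeeds, which mirrors the "oops" in the call chain quoted in the commit message.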
[PATCH RFC net 2/3] ipv6: Add rt6_make_pcpu_route()
It is prep work for fixing the potential deadlock. The current rt6_get_pcpu_route() will also create a pcpu rt if one does not exist. This patch moves the pcpu rt creation logic into another function, rt6_make_pcpu_route().

Signed-off-by: Martin KaFai Lau ka...@fb.com
CC: Hannes Frederic Sowa han...@stressinduktion.org
---
 net/ipv6/route.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c95c319..0a82653 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -993,13 +993,21 @@ static struct rt6_info *ip6_rt_pcpu_alloc(struct rt6_info *rt)
 /* It should be called with read_lock_bh(&tb6_lock) acquired */
 static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt)
 {
-	struct rt6_info *pcpu_rt, *prev, **p;
+	struct rt6_info *pcpu_rt, **p;
 
 	p = this_cpu_ptr(rt->rt6i_pcpu);
 	pcpu_rt = *p;
 
-	if (pcpu_rt)
-		goto done;
+	if (pcpu_rt) {
+		dst_hold(&pcpu_rt->dst);
+		rt6_dst_from_metrics_check(pcpu_rt);
+	}
+	return pcpu_rt;
+}
+
+static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt)
+{
+	struct rt6_info *pcpu_rt, *prev, **p;
 
 	pcpu_rt = ip6_rt_pcpu_alloc(rt);
 	if (!pcpu_rt) {
@@ -1009,6 +1017,7 @@ static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt)
 		goto done;
 	}
 
+	p = this_cpu_ptr(rt->rt6i_pcpu);
 	prev = cmpxchg(p, NULL, pcpu_rt);
 	if (prev) {
 		/* If someone did it before us, return prev instead */
@@ -1093,8 +1102,11 @@ redo_rt6_select:
 		rt->dst.lastuse = jiffies;
 		rt->dst.__use++;
 		pcpu_rt = rt6_get_pcpu_route(rt);
-		read_unlock_bh(&table->tb6_lock);
 
+		if (!pcpu_rt)
+			pcpu_rt = rt6_make_pcpu_route(rt);
+
+		read_unlock_bh(&table->tb6_lock);
 		return pcpu_rt;
 	}
 }
--
1.8.1
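The "install unless someone beat us to it" step around cmpxchg() in rt6_make_pcpu_route() can be sketched in plain C11. This is an illustration under assumptions, not the kernel code: `demo_route`, `demo_slot` and `demo_install()` are hypothetical names, and a single global slot stands in for the per-CPU pointer.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached per-CPU route entry. */
struct demo_route {
	int id;
};

/* One slot only; the kernel keeps one such slot per CPU. */
static _Atomic(struct demo_route *) demo_slot;

/*
 * Try to publish 'candidate' in the slot if it is still empty.
 * Returns the entry that ends up installed: 'candidate' if we won
 * the race, or the entry that was installed before us otherwise.
 */
static struct demo_route *demo_install(struct demo_route *candidate)
{
	struct demo_route *expected = NULL;

	if (atomic_compare_exchange_strong(&demo_slot, &expected, candidate))
		return candidate;

	/* Someone did it before us; 'expected' now holds their entry. */
	return expected;
}
```

In this sketch the loser simply gets the earlier entry back; a real caller would also have to release the candidate it allocated but could not install.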
[PATCH RFC net 1/3] ipv6: Remove un-used argument from ip6_dst_alloc()
After 4b32b5ad31a6 (ipv6: Stop rt6_info from using inet_peer's metrics), ip6_dst_alloc() does not need the 'table' argument. This patch cleans it up. Signed-off-by: Martin KaFai Lau ka...@fb.com CC: Hannes Frederic Sowa han...@stressinduktion.org --- net/ipv6/route.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 9de4d2b..c95c319 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -318,8 +318,7 @@ static const struct rt6_info ip6_blk_hole_entry_template = { /* allocate dst with ip6_dst_ops */ static struct rt6_info *__ip6_dst_alloc(struct net *net, struct net_device *dev, - int flags, - struct fib6_table *table) + int flags) { struct rt6_info *rt = dst_alloc(net-ipv6.ip6_dst_ops, dev, 0, DST_OBSOLETE_FORCE_CHK, flags); @@ -336,10 +335,9 @@ static struct rt6_info *__ip6_dst_alloc(struct net *net, static struct rt6_info *ip6_dst_alloc(struct net *net, struct net_device *dev, - int flags, - struct fib6_table *table) + int flags) { - struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags, table); + struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags); if (rt) { rt-rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC); @@ -950,8 +948,7 @@ static struct rt6_info *ip6_rt_cache_alloc(struct rt6_info *ort, if (ort-rt6i_flags (RTF_CACHE | RTF_PCPU)) ort = (struct rt6_info *)ort-dst.from; - rt = __ip6_dst_alloc(dev_net(ort-dst.dev), ort-dst.dev, -0, ort-rt6i_table); + rt = __ip6_dst_alloc(dev_net(ort-dst.dev), ort-dst.dev, 0); if (!rt) return NULL; @@ -983,8 +980,7 @@ static struct rt6_info *ip6_rt_pcpu_alloc(struct rt6_info *rt) struct rt6_info *pcpu_rt; pcpu_rt = __ip6_dst_alloc(dev_net(rt-dst.dev), - rt-dst.dev, rt-dst.flags, - rt-rt6i_table); + rt-dst.dev, rt-dst.flags); if (!pcpu_rt) return NULL; @@ -1555,7 +1551,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev, if (unlikely(!idev)) return ERR_PTR(-ENODEV); - rt = ip6_dst_alloc(net, dev, 0, NULL); + rt = ip6_dst_alloc(net, dev, 0); if 
(unlikely(!rt)) {
 		in6_dev_put(idev);
 		dst = ERR_PTR(-ENOMEM);
@@ -1742,7 +1738,8 @@ int ip6_route_add(struct fib6_config *cfg)
 	if (!table)
 		goto out;
 
-	rt = ip6_dst_alloc(net, NULL, (cfg->fc_flags & RTF_ADDRCONF) ? 0 : DST_NOCOUNT, table);
+	rt = ip6_dst_alloc(net, NULL,
+			   (cfg->fc_flags & RTF_ADDRCONF) ? 0 : DST_NOCOUNT);
 
 	if (!rt) {
 		err = -ENOMEM;
@@ -2399,7 +2396,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 {
 	struct net *net = dev_net(idev->dev);
 	struct rt6_info *rt = ip6_dst_alloc(net, net->loopback_dev,
-					    DST_NOCOUNT, NULL);
+					    DST_NOCOUNT);
 	if (!rt)
 		return ERR_PTR(-ENOMEM);
 
--
1.8.1
[PATCH 2/2] mac80211: use DECLARE_EWMA
From: Johannes Berg johannes.b...@intel.com Instead of using the out-of-line average calculation, use the new DECLARE_EWMA() macro to declare a signal EWMA, and use that. This actually *reduces* the code size slightly (on x86-64) while also reducing the station info size by 80 bytes. Signed-off-by: Johannes Berg johannes.b...@intel.com --- net/mac80211/Kconfig | 1 - net/mac80211/mesh_plink.c | 2 +- net/mac80211/rx.c | 4 ++-- net/mac80211/sta_info.c | 9 + net/mac80211/sta_info.h | 6 -- 5 files changed, 12 insertions(+), 10 deletions(-) diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig index 086de496a4c1..3891cbd2adea 100644 --- a/net/mac80211/Kconfig +++ b/net/mac80211/Kconfig @@ -7,7 +7,6 @@ config MAC80211 select CRYPTO_CCM select CRYPTO_GCM select CRC32 - select AVERAGE ---help--- This option enables the hardware independent IEEE 802.11 networking stack. diff --git a/net/mac80211/mesh_plink.c b/net/mac80211/mesh_plink.c index 1a7d98398626..6fa6606fce55 100644 --- a/net/mac80211/mesh_plink.c +++ b/net/mac80211/mesh_plink.c @@ -64,7 +64,7 @@ static bool rssi_threshold_check(struct ieee80211_sub_if_data *sdata, { s32 rssi_threshold = sdata-u.mesh.mshcfg.rssi_threshold; return rssi_threshold == 0 || - (sta (s8) -ewma_read(sta-avg_signal) rssi_threshold); + (sta (s8) -ewma_signal_read(sta-avg_signal) rssi_threshold); } /** diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index 3a1462810c8e..e3cff02bde07 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -1428,7 +1428,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx) sta-rx_bytes += rx-skb-len; if (!(status-flag RX_FLAG_NO_SIGNAL_VAL)) { sta-last_signal = status-signal; - ewma_add(sta-avg_signal, -status-signal); + ewma_signal_add(sta-avg_signal, -status-signal); } if (status-chains) { @@ -1440,7 +1440,7 @@ ieee80211_rx_h_sta_process(struct ieee80211_rx_data *rx) continue; sta-chain_signal_last[i] = signal; - ewma_add(sta-chain_signal_avg[i], -signal); + 
ewma_signal_add(sta-chain_signal_avg[i], -signal); } } diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 9da7d2bc271a..dd9541c51de1 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -341,9 +341,9 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata, ktime_get_ts(uptime); sta-last_connected = uptime.tv_sec; - ewma_init(sta-avg_signal, 1024, 8); + ewma_signal_init(sta-avg_signal); for (i = 0; i ARRAY_SIZE(sta-chain_signal_avg); i++) - ewma_init(sta-chain_signal_avg[i], 1024, 8); + ewma_signal_init(sta-chain_signal_avg[i]); if (local-ops-wake_tx_queue) { void *txq_data; @@ -1899,7 +1899,8 @@ void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo) } if (!(sinfo-filled BIT(NL80211_STA_INFO_SIGNAL_AVG))) { - sinfo-signal_avg = (s8) -ewma_read(sta-avg_signal); + sinfo-signal_avg = + (s8) -ewma_signal_read(sta-avg_signal); sinfo-filled |= BIT(NL80211_STA_INFO_SIGNAL_AVG); } } @@ -1914,7 +1915,7 @@ void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo) for (i = 0; i ARRAY_SIZE(sinfo-chain_signal); i++) { sinfo-chain_signal[i] = sta-chain_signal_last[i]; sinfo-chain_signal_avg[i] = - (s8) -ewma_read(sta-chain_signal_avg[i]); + (s8) -ewma_signal_read(sta-chain_signal_avg[i]); } } diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h index 6dcb33484eac..1c5333448bde 100644 --- a/net/mac80211/sta_info.h +++ b/net/mac80211/sta_info.h @@ -318,6 +318,8 @@ struct mesh_sta { unsigned int fail_avg; }; +DECLARE_EWMA(signal, 1024, 8) + /** * struct sta_info - STA information * @@ -460,12 +462,12 @@ struct sta_info { unsigned long rx_fragments; unsigned long rx_dropped; int last_signal; - struct ewma avg_signal; + struct ewma_signal avg_signal; int last_ack_signal; u8 chains; s8 chain_signal_last[IEEE80211_MAX_CHAINS]; - struct ewma chain_signal_avg[IEEE80211_MAX_CHAINS]; + struct ewma_signal chain_signal_avg[IEEE80211_MAX_CHAINS]; /* Plus 1 for non-QoS frames */ __le16 
last_seq_ctrl[IEEE80211_NUM_TIDS + 1]; -- 2.1.4
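To make the idea behind a per-type EWMA declaration more concrete, here is a small userspace sketch. It is modeled loosely on what the patch uses and is not the kernel macro itself: `DEMO_DECLARE_EWMA` and the generated `demo_ewma_*` names are hypothetical, and the factor and weight are passed as log2 values (10 and 3 for 1024 and 8) to keep the macro short. Presumably the size reduction mentioned above comes from factor and weight becoming compile-time constants instead of per-instance fields, but that is an assumption on my side.

```c
/*
 * Generate a struct and init/read/add helpers for one EWMA type.
 * factor_log2: log2 of the fixed-point scaling factor.
 * weight_log2: log2 of the weight given to the old average.
 */
#define DEMO_DECLARE_EWMA(name, factor_log2, weight_log2)		\
struct demo_ewma_##name {						\
	unsigned long internal;	/* average, scaled by the factor */	\
};									\
static inline void demo_ewma_##name##_init(struct demo_ewma_##name *e)	\
{									\
	e->internal = 0;						\
}									\
static inline unsigned long						\
demo_ewma_##name##_read(const struct demo_ewma_##name *e)		\
{									\
	return e->internal >> (factor_log2);				\
}									\
static inline void							\
demo_ewma_##name##_add(struct demo_ewma_##name *e, unsigned long val)	\
{									\
	unsigned long old = e->internal;				\
	/* new = old * (w - 1) / w + val / w; first sample is taken as is */ \
	e->internal = old ?						\
		((((old << (weight_log2)) - old) +			\
		  (val << (factor_log2))) >> (weight_log2)) :		\
		(val << (factor_log2));					\
}

/* Factor 1024 (2^10) and weight 8 (2^3), matching the values in the patch. */
DEMO_DECLARE_EWMA(signal, 10, 3)
```

The generated helpers would be used like the ewma_signal_* calls in the diff: `demo_ewma_signal_init(&avg)`, then `demo_ewma_signal_add(&avg, sample)` per sample and `demo_ewma_signal_read(&avg)` to get the current average.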
[PATCH iproute2] ip-link: remove unnecessary return
Remove the unnecessary return statements, because invarg() exits.

Signed-off-by: Zhang Shengju zhangshen...@cmss.chinamobile.com
---
 ip/iplink_bridge.c | 30 ++++++++++++------------------
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index e704e29..61e4cda 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -42,47 +42,41 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
 	while (argc > 0) {
 		if (matches(*argv, "forward_delay") == 0) {
 			NEXT_ARG();
-			if (get_u32(&val, *argv, 0)) {
+			if (get_u32(&val, *argv, 0))
 				invarg("invalid forward_delay", *argv);
-				return -1;
-			}
+
 			addattr32(n, 1024, IFLA_BR_FORWARD_DELAY, val);
 		} else if (matches(*argv, "hello_time") == 0) {
 			NEXT_ARG();
-			if (get_u32(&val, *argv, 0)) {
+			if (get_u32(&val, *argv, 0))
 				invarg("invalid hello_time", *argv);
-				return -1;
-			}
+
 			addattr32(n, 1024, IFLA_BR_HELLO_TIME, val);
 		} else if (matches(*argv, "max_age") == 0) {
 			NEXT_ARG();
-			if (get_u32(&val, *argv, 0)) {
+			if (get_u32(&val, *argv, 0))
 				invarg("invalid max_age", *argv);
-				return -1;
-			}
+
 			addattr32(n, 1024, IFLA_BR_MAX_AGE, val);
 		} else if (matches(*argv, "ageing_time") == 0) {
 			NEXT_ARG();
-			if (get_u32(&val, *argv, 0)) {
+			if (get_u32(&val, *argv, 0))
 				invarg("invalid ageing_time", *argv);
-				return -1;
-			}
+
 			addattr32(n, 1024, IFLA_BR_AGEING_TIME, val);
 		} else if (matches(*argv, "stp_state") == 0) {
 			NEXT_ARG();
-			if (get_u32(&val, *argv, 0)) {
+			if (get_u32(&val, *argv, 0))
 				invarg("invalid stp_state", *argv);
-				return -1;
-			}
+
 			addattr32(n, 1024, IFLA_BR_STP_STATE, val);
 		} else if (matches(*argv, "priority") == 0) {
 			__u16 prio;
 
 			NEXT_ARG();
-			if (get_u16(&prio, *argv, 0)) {
+			if (get_u16(&prio, *argv, 0))
 				invarg("invalid priority", *argv);
-				return -1;
-			}
+
 			addattr16(n, 1024, IFLA_BR_PRIORITY, prio);
 		} else if (matches(*argv, "help") == 0) {
 			explain();
--
1.8.3.1
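The reasoning behind this cleanup (an error helper that never returns makes a following "return -1" dead code) can be shown with a standalone sketch. Everything here is an assumption for illustration: `demo_invarg()`, `demo_get_u32()` and `demo_parse_forward_delay()` are hypothetical stand-ins, not the iproute2 functions, and only mirror the calling convention seen in the diff (non-zero return on a parse failure, base 0 so that a leading 0 or 0x selects octal or hex).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for invarg(): report the bad argument and exit. */
static _Noreturn void demo_invarg(const char *msg, const char *arg)
{
	fprintf(stderr, "Error: argument \"%s\" is wrong: %s\n", arg, msg);
	exit(EXIT_FAILURE);
}

/* Hypothetical parser: returns 0 on success, -1 on failure. */
static int demo_get_u32(unsigned int *val, const char *arg, int base)
{
	unsigned long res;
	char *end;

	if (!arg || !*arg)
		return -1;

	errno = 0;
	res = strtoul(arg, &end, base);
	/* Reject trailing garbage, overflow, and values wider than 32 bits. */
	if (*end != '\0' || errno == ERANGE || res > 0xFFFFFFFFUL)
		return -1;

	*val = (unsigned int)res;
	return 0;
}

/* Parse one option value; nothing is needed after demo_invarg(). */
static unsigned int demo_parse_forward_delay(const char *arg)
{
	unsigned int val;

	if (demo_get_u32(&val, arg, 0))
		demo_invarg("invalid forward_delay", arg);

	return val;
}
```

Marking the helper `_Noreturn` also tells the compiler that `val` is always initialised on the path that reaches the final return.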
Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT
On Wed, Aug 12, 2015 at 05:06:27PM -0500, Jeremy Linton wrote: Add ACPI bindings for the smsc911x driver. Convert the DT specific calls to nonspecific device* calls, This allows the driver to work with both ACPI and DT configurations. Ethernet should now work when using ACPI on ARM Juno. Signed-off-by: Jeremy Linton jeremy.lin...@arm.com The code looks fine to me. Currently the compulsary DT properties seem to match the approved ACPI NIC properties from here http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf Reviewed-by: Graeme Gregory graeme.greg...@linaro.org Thanks --- drivers/net/ethernet/smsc/smsc911x.c | 48 +--- 1 file changed, 22 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c index 959aeea..0f21aa3 100644 --- a/drivers/net/ethernet/smsc/smsc911x.c +++ b/drivers/net/ethernet/smsc/smsc911x.c @@ -59,7 +59,9 @@ #include linux/of_device.h #include linux/of_gpio.h #include linux/of_net.h +#include linux/acpi.h #include linux/pm_runtime.h +#include linux/property.h #include smsc911x.h @@ -2362,59 +2364,46 @@ static const struct smsc911x_ops shifted_smsc911x_ops = { .tx_writefifo = smsc911x_tx_writefifo_shift, }; -#ifdef CONFIG_OF -static int smsc911x_probe_config_dt(struct smsc911x_platform_config *config, - struct device_node *np) +static int smsc911x_probe_config(struct smsc911x_platform_config *config, + struct device *dev) { - const char *mac; u32 width = 0; - if (!np) + if (!dev) return -ENODEV; - config-phy_interface = of_get_phy_mode(np); + config-phy_interface = device_get_phy_mode(dev); - mac = of_get_mac_address(np); - if (mac) - memcpy(config-mac, mac, ETH_ALEN); + device_get_mac_address(dev, config-mac, ETH_ALEN); - of_property_read_u32(np, reg-shift, config-shift); + device_property_read_u32(dev, reg-shift, config-shift); - of_property_read_u32(np, reg-io-width, width); + device_property_read_u32(dev, reg-io-width, width); if (width == 4) config-flags |= 
SMSC911X_USE_32BIT; else config-flags |= SMSC911X_USE_16BIT; - if (of_get_property(np, smsc,irq-active-high, NULL)) + if (device_property_present(dev, smsc,irq-active-high)) config-irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH; - if (of_get_property(np, smsc,irq-push-pull, NULL)) + if (device_property_present(dev, smsc,irq-push-pull)) config-irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL; - if (of_get_property(np, smsc,force-internal-phy, NULL)) + if (device_property_present(dev, smsc,force-internal-phy)) config-flags |= SMSC911X_FORCE_INTERNAL_PHY; - if (of_get_property(np, smsc,force-external-phy, NULL)) + if (device_property_present(dev, smsc,force-external-phy)) config-flags |= SMSC911X_FORCE_EXTERNAL_PHY; - if (of_get_property(np, smsc,save-mac-address, NULL)) + if (device_property_present(dev, smsc,save-mac-address)) config-flags |= SMSC911X_SAVE_MAC_ADDRESS; return 0; } -#else -static inline int smsc911x_probe_config_dt( - struct smsc911x_platform_config *config, - struct device_node *np) -{ - return -ENODEV; -} -#endif /* CONFIG_OF */ static int smsc911x_drv_probe(struct platform_device *pdev) { - struct device_node *np = pdev-dev.of_node; struct net_device *dev; struct smsc911x_data *pdata; struct smsc911x_platform_config *config = dev_get_platdata(pdev-dev); @@ -2478,7 +2467,7 @@ static int smsc911x_drv_probe(struct platform_device *pdev) goto out_disable_resources; } - retval = smsc911x_probe_config_dt(pdata-config, np); + retval = smsc911x_probe_config(pdata-config, pdev-dev); if (retval config) { /* copy config parameters across to pdata */ memcpy(pdata-config, config, sizeof(pdata-config)); @@ -2654,6 +2643,12 @@ static const struct of_device_id smsc911x_dt_ids[] = { MODULE_DEVICE_TABLE(of, smsc911x_dt_ids); #endif +static const struct acpi_device_id smsc911x_acpi_match[] = { + { ARMH9118, 0 }, + { } +}; +MODULE_DEVICE_TABLE(acpi, smsc911x_acpi_match); + static struct platform_driver smsc911x_driver = { .probe = smsc911x_drv_probe, .remove = 
smsc911x_drv_remove, @@ -2661,6 +2656,7 @@ static struct platform_driver smsc911x_driver = { .name = SMSC_CHIPNAME, .pm = SMSC911X_PM_OPS, .of_match_table = of_match_ptr(smsc911x_dt_ids), + .acpi_match_table = ACPI_PTR(smsc911x_acpi_match), }, }; -- 2.4.3
Re: [PATCH v2 0/2] net: thunder: Add ACPI support.
On 08/12/2015 11:36 PM, David Daney wrote:
On 08/12/2015 08:23 AM, Catalin Marinas wrote:
On Tue, Aug 11, 2015 at 01:04:55PM -0700, David Daney wrote:
On 08/11/2015 11:49 AM, David Miller wrote:
From: David Daney ddaney.c...@gmail.com
Date: Mon, 10 Aug 2015 17:58:35 -0700

Change from v1: Drop PHY binding part, use fwnode_property* APIs. The first patch (1/2) rearranges the existing code a little with no functional change to get ready for the second. The second (2/2) does the actual work of adding support to extract the needed information from the ACPI tables.

Series applied.

Thank you very much.

In the future it might be better structured to try and get the OF node, and if that fails then try and use the ACPI method to obtain these values.

Our current approach, as you can see in the patch, is the opposite. If ACPI is being used, prefer that over the OF device tree. You seem to be recommending precedence for OF. It should be consistent across all drivers/sub-systems, so do you really think that OF before ACPI is the way to go?

On arm64 (unless you use a vendor kernel), DT takes precedence over ACPI if both are provided to the kernel (and it's a fair assumption given that ACPI on ARM is still in the early days). You could also force ACPI with acpi=force on the kernel cmd line and the arch code will not unflatten the DT even if it is provided, so is_of_node(fwnode) returns false.

Yes. On the other hand, if no DT is provided, the kernel will try ACPI even without acpi=force on the kernel cmd line.

I haven't looked at your driver in detail but something like AMD's xgbe_probe() uses a single function for both DT and ACPI with device_property_read_*() functions getting the information from the correct place in either case. The ACPI vs DT precedence is handled by the arch boot code, we never mix the two and confuse the drivers.

My long term plan is to create something like firmware_get_mac_address(), that would encapsulate of_get_mac_address() and the ACPI equivalent. Same for firmware_phy_find_device().

I'm very keen on seeing that happen :)

Thanks
Hanjun
Re: [PATCH v2 11/11] net: rfkill: Allow compile test of GPIO consumers if !GPIOLIB
On Sun, 2015-08-02 at 11:09 +0200, Geert Uytterhoeven wrote: The GPIO subsystem provides dummy GPIO consumer functions if GPIOLIB is not enabled. Hence drivers that depend on GPIOLIB, but use GPIO consumer functionality only, can still be compiled if GPIOLIB is not enabled. Relax the dependency on GPIOLIB if COMPILE_TEST is enabled, where appropriate. Applied. johannes
Re: Problem with fragmented packets on tun/tap interface
On Fri, Jul 31, 2015 at 4:51 PM, Eric Dumazet eric.duma...@gmail.com wrote:
On Fri, 2015-07-31 at 16:42 +0530, Prashant Upadhyaya wrote:
On Fri, Jul 31, 2015 at 1:26 PM, Eric Dumazet eric.duma...@gmail.com wrote:
On Fri, 2015-07-31 at 12:30 +0530, Prashant Upadhyaya wrote:

The delays work for me but are clearly not good for the performance of the slow path. More importantly, I was looking for a fundamental reason why it works with the delays and not without them. The issue is reproducible with a big ping (3.11.10-301.fc20.x86_64)

How big does the ping need to be to reproduce the problem?

If the MTU is 1500, I start getting problems from about 2900 bytes, and they certainly occur with even bigger pings, e.g. 10 K. (ping IP -s Size eg. ping 10.3.10.244 -s 1) And the big pings do work, as I said, with the delay hack.

It might help trying this while you receive such frags:

perf record -a -g -e skb:kfree_skb sleep 10
...
perf report

Hi,

I think I have a clue to the root cause of my issue, but I do not know a solution. Let me describe what I think is the problem.

Fragmented packets enter the kernel through eth0 and the kernel starts reassembling them. Simultaneously, my packet socket implementation also injects the very same packets into the kernel via the tap. The kernel sees them as overlapping fragments during reassembly and drops the packets injected via the tap. Eventually, when reassembly completes inside the kernel for all the packets which entered via eth0, the whole packet gets dropped due to the iptables rules that I have set on eth0. So naturally there is no response to the bigger ping, because everything got dropped one way or the other.

When I do introduce the delays (and it turns out that the delay that matters is the one when injecting via tap), the kernel has already completed the reassembly of the packets from eth0 (during the delay I introduce before submission on tap), and then the submission via tap works well because it undergoes a fresh reassembly (and of course it does not get dropped, because the iptables drop rule is only on eth0).

So the question is: how do I prevent the kernel from trying to reassemble the packets arriving on eth0, and instead drop them right away, before reassembly is even attempted? This way the same packet injected via the tap would be the only one undergoing reassembly and hopefully it would work.

Regards
-Prashant
Re: Question on behavior of tg3_self_test() (ethtool -t on tg3 driver)
On 8/12/2015 6:02 PM, Douglas Miller wrote:

Oh, I had missed the extra if condition on tg3_test_link(). So external_lb is not a true superset of offline. So you are not surprised by the (about) 20 second link down period after this test? If this is expected (albeit undocumented) behavior we can change the test scenario to work around it. It seems as though not all adapters exhibit this same symptom. From a testing standpoint, it is a long delay to add that may only be needed for this one adapter (Broadcom BCM5719, or adapter family).

We executed the ethtool -t dev offline in a loop on our local test machine with 5719 and linkup time is <= 5 secs.

Script:

#!/bin/bash
echo "-OS Information-"
uname -a
echo "--Card Information--"
lspci | grep 5719
echo "--Interface information--"
ethtool -i p4p4
echo "-Offline test start--"
for i in 1 2 3
do
    date
    ethtool -t p4p4 offline
done

Output:

-OS Information-
Linux siva-dev 4.2.0-rc4+ #1 SMP Thu Aug 13 20:24:11 IST 2015 x86_64 x86_64 x86_64 GNU/Linux
--Card Information--
03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
03:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
03:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
--Interface information--
driver: tg3
version: 3.137
firmware-version: 5719-v1.41 NCSI v1.3.6.0
bus-info: :03:00.3
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
-Offline test start--
Thu Aug 13 22:05:59 IST 2015
The test result is PASS
The test extra info:
nvram test(online) 0
link test (online) 0
register test (offline) 0
memory test (offline) 0
mac loopback test (offline) 0
phy loopback test (offline) 0
ext loopback test (offline) 0
interrupt test(offline) 0
Thu Aug 13 22:06:00 IST 2015
The test result
is PASS The test extra info: nvram test(online) 0 link test (online) 0 register test (offline) 0 memory test (offline) 0 mac loopback test (offline) 0 phy loopback test (offline) 0 ext loopback test (offline) 0 interrupt test(offline) 0 Thu Aug 13 22:06:05 IST 2015 The test result is PASS The test extra info: nvram test(online) 0 link test (online) 0 register test (offline) 0 memory test (offline) 0 mac loopback test (offline) 0 phy loopback test (offline) 0 ext loopback test (offline) 0 interrupt test(offline) 0 Please check your test environment. Thanks, Doug On 08/11/2015 03:31 PM, Michael Chan wrote: On Tue, 2015-08-11 at 14:24 -0500, Douglas Miller wrote: Yes, the wrap plugs are the loopback cables/plugs. It is my understanding that the offline tests do not require anything to be plugged into the ports, as they do not in any way touch the external port. They perform an internal loopback test which does not depend on any external connection. Correct. From what I can tell, the only difference between offline and external_lb is that external_lb performs the external loopback tests, *in addition to* all the tests done for offline. Correct. This would imply that the only tests that depend on anything connected to the physical port is external_lb, and there is no requirement that the wrap plugs be removed/replaced in order to run offline tests. When you do external loopback test, we skip the link test because you no longer have normal connection to the network. You now use a special loopback cable, which will fail the link up test because the link up test assumes connection to the network using normal cable. In the case I was debugging, wrap plugs were installed because the ports were, later, being tested in an external loopback way. What I am observing is that it takes about 20 seconds for the kernel to declare that the link is up, after running the offline or external_lb test. 
In the case of offline I cannot run the test again until the kernel declares the link up. In the case of external_lb I can run the test again immediately and it passes. As stated earlier, because we skip the link test when we are performing external_lb. So, you should always do ethtool -t dev external_lb if you have a loopback cable connected. We will perform the external loopback test and skip the link test. If you don't have an external loopback cable connected, you should run ethtool -t dev offline. It will not do the external loopback test and will do the link test for proper link up with the network. This suggests to me that the external_lb case (again, it is a superset of offline) is performing
Re: [PATCH] mm: make page pfmemalloc check more robust
On 08/13/2015 10:58 AM, mho...@kernel.org wrote:

From: Michal Hocko mho...@suse.com

The patch c48a11c7ad26 (netvm: propagate page->pfmemalloc to skb) added the checks for page->pfmemalloc to __skb_fill_page_desc():

        if (page->pfmemalloc && !page->mapping)
                skb->pfmemalloc = true;

It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set page->mapping to NULL and leave the page->index value alone. Due to being in a union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with an NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case, so the page confuses __skb_fill_page_desc(), which interprets the index as the pfmemalloc flag, and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here
                                                                                  ^ not ?

The full story (according to Jiri Bohac and my understanding, I don't know much about netdev) is that __skb_fill_page_desc() is invoked here during *sending* and normally the skb->pfmemalloc would be ignored in the end. But because it is a localhost connection, the receiving code will think it was a memalloc allocation during receive, and then do the socket restriction. Given that this apparently isn't the first case of this localhost issue, I wonder if network code should just clear skb->pfmemalloc during send (or maybe just send over localhost). That would probably be easier than distinguishing the __skb_fill_page_desc() callers for send vs receive.

and that leads to hangs when the nfs client waits for a response from the server which has been dropped and thus never arrives.
-- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH iproute2] ip-link: enhance prompt message
Enhance prompt message for 'spoofchk' and 'query_rss' flag, and fix a typo.

Signed-off-by: Zhang Shengju zhangshen...@cmss.chinamobile.com
---
 ip/iplink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 1836889..520f750 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -329,7 +329,7 @@ static int iplink_parse_vf(int vf, int *argcp, char ***argvp,
 			else if (matches(*argv, "off") == 0)
 				ivs.setting = 0;
 			else
-				invarg("Invalid \"spoofchk\" value\n", *argv);
+				return on_off("spoofchk", *argv);
 			ivs.vf = vf;
 			addattr_l(&req->n, sizeof(*req), IFLA_VF_SPOOFCHK, &ivs, sizeof(ivs));
@@ -341,7 +341,7 @@ static int iplink_parse_vf(int vf, int *argcp, char ***argvp,
 			else if (matches(*argv, "off") == 0)
 				ivs.setting = 0;
 			else
-				invarg("Invalid \"query_rss\" value\n", *argv);
+				return on_off("query_rss", *argv);
 			ivs.vf = vf;
 			addattr_l(&req->n, sizeof(*req), IFLA_VF_RSS_QUERY_EN, &ivs, sizeof(ivs));
@@ -1092,7 +1092,7 @@ static int do_set(int argc, char **argv)
 			} else if (strcmp(*argv, "off") == 0) {
 				flags |= IFF_NOARP;
 			} else
-				return on_off("noarp", *argv);
+				return on_off("arp", *argv);
 		} else if (matches(*argv, "dynamic") == 0) {
 			NEXT_ARG();
 			mask |= IFF_DYNAMIC;
-- 
1.8.3.1
[PATCH] mm: make page pfmemalloc check more robust
From: Michal Hocko mho...@suse.com

The patch c48a11c7ad26 (netvm: propagate page->pfmemalloc to skb) added the checks for page->pfmemalloc to __skb_fill_page_desc():

        if (page->pfmemalloc && !page->mapping)
                skb->pfmemalloc = true;

It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set page->mapping to NULL and leave the page->index value alone. Due to being in a union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with an NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case, so the page confuses __skb_fill_page_desc(), which interprets the index as the pfmemalloc flag, and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here and that leads to hangs when the nfs client waits for a response from the server which has been dropped and thus never arrives.

The struct page is already heavily packed so rather than finding another hole to put it in, let's do a trick instead. We can reuse the index again but define it to an impossible value (-1UL). This is the page index so it should never see a value that large. Replace all direct users of page->pfmemalloc by page_is_pfmemalloc which will hide this nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page->index obviously but that was the case before and the original code expected that the information should be persisted somewhere else if that is really needed (e.g. what SLAB and SLUB do).
Fixes: c48a11c7ad26 (netvm: propagate page-pfmemalloc to skb) Cc: stable # 3.6+ Debugged-by: Vlastimil Babka vba...@suse.com Debugged-by: Jiri Bohac jbo...@suse.com Signed-off-by: Michal Hocko mho...@suse.com --- drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +- drivers/net/ethernet/intel/igb/igb_main.c | 2 +- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +- drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +- include/linux/mm.h| 28 +++ include/linux/mm_types.h | 9 include/linux/skbuff.h| 14 mm/page_alloc.c | 9 +--- mm/slab.c | 4 ++-- mm/slub.c | 2 +- net/core/skbuff.c | 2 +- 11 files changed, 47 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c index 982fdcdc795b..b5b2925103ec 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c @@ -216,7 +216,7 @@ static void fm10k_reuse_rx_page(struct fm10k_ring *rx_ring, static inline bool fm10k_page_is_reserved(struct page *page) { - return (page_to_nid(page) != numa_mem_id()) || page-pfmemalloc; + return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page); } static bool fm10k_can_reuse_rx_page(struct fm10k_rx_buffer *rx_buffer, diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 2f70a9b152bd..830466c49987 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6566,7 +6566,7 @@ static void igb_reuse_rx_page(struct igb_ring *rx_ring, static inline bool igb_page_is_reserved(struct page *page) { - return (page_to_nid(page) != numa_mem_id()) || page-pfmemalloc; + return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page); } static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer, diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index 9aa6104e34ea..ae21e0b06c3a 100644 --- 
a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -1832,7 +1832,7 @@ static void ixgbe_reuse_rx_page(struct ixgbe_ring *rx_ring, static inline bool ixgbe_page_is_reserved(struct page *page) { - return (page_to_nid(page) != numa_mem_id()) || page-pfmemalloc; + return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page); } /** diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c index e71cdde9cb01..1d7b00b038a2 100644 --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c @@ -765,7 +765,7 @@ static void ixgbevf_reuse_rx_page(struct ixgbevf_ring *rx_ring, static inline bool ixgbevf_page_is_reserved(struct page *page) { - return (page_to_nid(page) !=
Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT
On Thu, Aug 13, 2015 at 09:27:59AM +0100, Graeme Gregory wrote:
On Wed, Aug 12, 2015 at 05:06:27PM -0500, Jeremy Linton wrote:

Add ACPI bindings for the smsc911x driver. Convert the DT specific calls to nonspecific device* calls. This allows the driver to work with both ACPI and DT configurations. Ethernet should now work when using ACPI on ARM Juno.

Last sentence does not belong in the commit log.

Signed-off-by: Jeremy Linton jeremy.lin...@arm.com

The code looks fine to me. Currently the compulsory DT properties seem to match the approved ACPI NIC properties from here http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf

What about _DSD device specific properties (eg smsc,save-mac-address)? Are we taking 1:1 translation between DT and ACPI for granted? I thought some process must be put in place to define the corresponding bindings in ACPI before starting this mechanical translation, or maybe I missed something; I would like to understand. How do the device specific _DSD definitions work in the ACPI world? Where are they published? How will we translate those to DT bindings if there is need?
Lorenzo Reviewed-by: Graeme Gregory graeme.greg...@linaro.org Thanks --- drivers/net/ethernet/smsc/smsc911x.c | 48 +--- 1 file changed, 22 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c index 959aeea..0f21aa3 100644 --- a/drivers/net/ethernet/smsc/smsc911x.c +++ b/drivers/net/ethernet/smsc/smsc911x.c @@ -59,7 +59,9 @@ #include linux/of_device.h #include linux/of_gpio.h #include linux/of_net.h +#include linux/acpi.h #include linux/pm_runtime.h +#include linux/property.h #include smsc911x.h @@ -2362,59 +2364,46 @@ static const struct smsc911x_ops shifted_smsc911x_ops = { .tx_writefifo = smsc911x_tx_writefifo_shift, }; -#ifdef CONFIG_OF -static int smsc911x_probe_config_dt(struct smsc911x_platform_config *config, - struct device_node *np) +static int smsc911x_probe_config(struct smsc911x_platform_config *config, +struct device *dev) { - const char *mac; u32 width = 0; - if (!np) + if (!dev) return -ENODEV; - config-phy_interface = of_get_phy_mode(np); + config-phy_interface = device_get_phy_mode(dev); - mac = of_get_mac_address(np); - if (mac) - memcpy(config-mac, mac, ETH_ALEN); + device_get_mac_address(dev, config-mac, ETH_ALEN); - of_property_read_u32(np, reg-shift, config-shift); + device_property_read_u32(dev, reg-shift, config-shift); - of_property_read_u32(np, reg-io-width, width); + device_property_read_u32(dev, reg-io-width, width); if (width == 4) config-flags |= SMSC911X_USE_32BIT; else config-flags |= SMSC911X_USE_16BIT; - if (of_get_property(np, smsc,irq-active-high, NULL)) + if (device_property_present(dev, smsc,irq-active-high)) config-irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH; - if (of_get_property(np, smsc,irq-push-pull, NULL)) + if (device_property_present(dev, smsc,irq-push-pull)) config-irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL; - if (of_get_property(np, smsc,force-internal-phy, NULL)) + if (device_property_present(dev, smsc,force-internal-phy)) config-flags |= 
SMSC911X_FORCE_INTERNAL_PHY; - if (of_get_property(np, smsc,force-external-phy, NULL)) + if (device_property_present(dev, smsc,force-external-phy)) config-flags |= SMSC911X_FORCE_EXTERNAL_PHY; - if (of_get_property(np, smsc,save-mac-address, NULL)) + if (device_property_present(dev, smsc,save-mac-address)) config-flags |= SMSC911X_SAVE_MAC_ADDRESS; return 0; } -#else -static inline int smsc911x_probe_config_dt( - struct smsc911x_platform_config *config, - struct device_node *np) -{ - return -ENODEV; -} -#endif /* CONFIG_OF */ static int smsc911x_drv_probe(struct platform_device *pdev) { - struct device_node *np = pdev-dev.of_node; struct net_device *dev; struct smsc911x_data *pdata; struct smsc911x_platform_config *config = dev_get_platdata(pdev-dev); @@ -2478,7 +2467,7 @@ static int smsc911x_drv_probe(struct platform_device *pdev) goto out_disable_resources; } - retval = smsc911x_probe_config_dt(pdata-config, np); + retval = smsc911x_probe_config(pdata-config, pdev-dev); if (retval config) { /* copy config parameters across to pdata */ memcpy(pdata-config, config, sizeof(pdata-config)); @@ -2654,6 +2643,12 @@ static const struct of_device_id smsc911x_dt_ids[] = {
[PATCH 1/2] average: provide macro to create static EWMA
From: Johannes Berg johannes.b...@intel.com

Having the EWMA parameters stored in the runtime struct imposes memory requirements for the constant values that could just be inlined in the code. This particularly makes sense if there are a lot of such structs, for example in mac80211 in the station table where each station has a number of these in an array, and there can be many stations.

Provide a macro DECLARE_EWMA() that declares the necessary struct and inline functions to access it with the parameters hard-coded; using this also means the user no longer needs to 'select AVERAGE' as it's entirely self-contained.

In the mac80211 case, on x86-64, this actually slightly *reduces* code size, while also saving 80 bytes of runtime memory per sta.

Signed-off-by: Johannes Berg johannes.b...@intel.com
---
As the next patch relies on this, I'll take this through my tree unless I hear objections.
---
 include/linux/average.h | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/include/linux/average.h b/include/linux/average.h
index c6028fd742c1..802adeab7037 100644
--- a/include/linux/average.h
+++ b/include/linux/average.h
@@ -27,4 +27,43 @@ static inline unsigned long ewma_read(const struct ewma *avg)
 	return avg->internal >> avg->factor;
 }
 
+#define DECLARE_EWMA(name, _factor, _weight) \
+	struct ewma_##name { \
+		unsigned long internal; \
+	}; \
+	static inline void ewma_##name##_init(struct ewma_##name *e) \
+	{ \
+		BUILD_BUG_ON(!__builtin_constant_p(_factor)); \
+		BUILD_BUG_ON(!__builtin_constant_p(_weight)); \
+		BUILD_BUG_ON(!is_power_of_2(_factor)); \
+		BUILD_BUG_ON(!is_power_of_2(_weight)); \
+		e->internal = 0; \
+	} \
+	static inline unsigned long \
+	ewma_##name##_read(struct ewma_##name *e) \
+	{ \
+		BUILD_BUG_ON(!__builtin_constant_p(_factor)); \
+		BUILD_BUG_ON(!__builtin_constant_p(_weight)); \
+		BUILD_BUG_ON(!is_power_of_2(_factor)); \
+		BUILD_BUG_ON(!is_power_of_2(_weight)); \
+		return e->internal >> ilog2(_factor); \
+	} \
+	static inline void ewma_##name##_add(struct ewma_##name *e, \
+					     unsigned long val) \
+	{ \
+		unsigned long internal = ACCESS_ONCE(e->internal); \
+		unsigned long weight = ilog2(_weight); \
+		unsigned long factor = ilog2(_factor); \
+ \
+		BUILD_BUG_ON(!__builtin_constant_p(_factor)); \
+		BUILD_BUG_ON(!__builtin_constant_p(_weight)); \
+		BUILD_BUG_ON(!is_power_of_2(_factor)); \
+		BUILD_BUG_ON(!is_power_of_2(_weight)); \
+ \
+		ACCESS_ONCE(e->internal) = internal ? \
+			(((internal << weight) - internal) + \
+				(val << factor)) >> weight : \
+			(val << factor); \
+	}
+
 #endif /* _LINUX_AVERAGE_H */
-- 
2.1.4
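For readers who want to see the arithmetic outside the kernel, here is a minimal user-space sketch of what such a generated EWMA might do with the parameters hard-coded. The names ewma_demo, DEMO_FACTOR_SHIFT and DEMO_WEIGHT_SHIFT are made up for illustration and are not part of the patch; the shifts stand in for ilog2() of a factor of 1024 and a weight of 8, and the ACCESS_ONCE()/BUILD_BUG_ON() parts are deliberately left out.

```c
/* Hypothetical user-space illustration, not the kernel macro itself. */

/* Assumed parameters: factor 1024 and weight 8, stored as their log2. */
#define DEMO_FACTOR_SHIFT 10UL	/* ilog2(1024) */
#define DEMO_WEIGHT_SHIFT 3UL	/* ilog2(8) */

struct ewma_demo {
	unsigned long internal;	/* average scaled by the factor */
};

static void ewma_demo_init(struct ewma_demo *e)
{
	e->internal = 0;
}

/* Return the current average, scaled back down by the factor. */
static unsigned long ewma_demo_read(const struct ewma_demo *e)
{
	return e->internal >> DEMO_FACTOR_SHIFT;
}

/*
 * Fold a new sample into the average:
 *   new = (old * (weight - 1) + val * factor) / weight
 * The first sample simply seeds the average.
 */
static void ewma_demo_add(struct ewma_demo *e, unsigned long val)
{
	unsigned long internal = e->internal;

	e->internal = internal ?
		(((internal << DEMO_WEIGHT_SHIFT) - internal) +
			(val << DEMO_FACTOR_SHIFT)) >> DEMO_WEIGHT_SHIFT :
		(val << DEMO_FACTOR_SHIFT);
}
```

With these assumed parameters, adding 100 to a fresh average and then 200 moves the reading only one eighth of the way towards the new sample, and the struct carries nothing but the running value.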
Re: [PATCH] mm: make page pfmemalloc check more robust
On Thu 13-08-15 11:13:04, Vlastimil Babka wrote:
On 08/13/2015 10:58 AM, mho...@kernel.org wrote:

From: Michal Hocko mho...@suse.com

The patch c48a11c7ad26 (netvm: propagate page->pfmemalloc to skb) added the checks for page->pfmemalloc to __skb_fill_page_desc():

        if (page->pfmemalloc && !page->mapping)
                skb->pfmemalloc = true;

It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set page->mapping to NULL and leave the page->index value alone. Due to being in a union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with an NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case, so the page confuses __skb_fill_page_desc(), which interprets the index as the pfmemalloc flag, and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here
                                                                                  ^ not ?

Dohh, you are right of course, updated...

The full story (according to Jiri Bohac and my understanding, I don't know much about netdev) is that __skb_fill_page_desc() is invoked here during *sending* and normally the skb->pfmemalloc would be ignored in the end. But because it is a localhost connection, the receiving code will think it was a memalloc allocation during receive, and then do the socket restriction. Given that this apparently isn't the first case of this localhost issue, I wonder if network code should just clear skb->pfmemalloc during send (or maybe just send over localhost). That would probably be easier than distinguishing the __skb_fill_page_desc() callers for send vs receive.

Maybe the networking code can behave better in this particular case, but the core thing remains. As you have properly found out during the debugging, page->mapping cannot be relied on for the reliable detection of pfmemalloc. So I would argue that a more robust detection is really worthwhile. Note there are other places which do not even bother to test for mapping - maybe they are safe, but I got lost quickly when trying to track the allocation source to be sure that nothing could have stepped in in the meantime.

-- 
Michal Hocko
SUSE Labs
Re: [net-next PATCH 2/3] ARM: dts: dra7: update cpsw compatible
* Mugunthan V N mugunthan...@ti.com [150812 02:56]:

CPSW driver has been updated with compatibles for enabling errata workarounds. So updating cpsw compatibles.

Signed-off-by: Mugunthan V N mugunthan...@ti.com

Please feel free to merge this one via net tree once the changes are reviewed:

Acked-by: Tony Lindgren t...@atomide.com

---
 arch/arm/boot/dts/dra7.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index 8f1e25b..b4fdd10 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -1398,7 +1398,7 @@
 		};
 
 		mac: ethernet@4a10 {
-			compatible = "ti,cpsw";
+			compatible = "ti,dra7-cpsw","ti,cpsw";
 			ti,hwmods = "gmac";
 			clocks = <&dpll_gmac_ck>, <&gmac_gmii_ref_clk_div>;
 			clock-names = "fck", "cpts";
-- 
2.5.0.234.gefc8a62
Re: [net-next PATCH 3/3] ARM: dts: am33xx: update cpsw compatible
* Mugunthan V N mugunthan...@ti.com [150812 02:56]:

CPSW driver has been updated with compatibles for enabling errata workarounds. So updating cpsw compatibles.

Signed-off-by: Mugunthan V N mugunthan...@ti.com

This too:

Acked-by: Tony Lindgren t...@atomide.com

---
 arch/arm/boot/dts/am33xx.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
index 21fcc44..8b59c86 100644
--- a/arch/arm/boot/dts/am33xx.dtsi
+++ b/arch/arm/boot/dts/am33xx.dtsi
@@ -700,7 +700,7 @@
 		};
 
 		mac: ethernet@4a10 {
-			compatible = "ti,cpsw";
+			compatible = "ti,am335x-cpsw","ti,cpsw";
 			ti,hwmods = "cpgmac0";
 			clocks = <&cpsw_125mhz_gclk>, <&cpsw_cpts_rft_clk>;
 			clock-names = "fck", "cpts";
-- 
2.5.0.234.gefc8a62
Re: [PATCH] mm: make page pfmemalloc check more robust
On Thu, Aug 13, 2015 at 10:58:54AM +0200, mho...@kernel.org wrote:

From: Michal Hocko mho...@suse.com

The patch c48a11c7ad26 (netvm: propagate page->pfmemalloc to skb) added the checks for page->pfmemalloc to __skb_fill_page_desc():

        if (page->pfmemalloc && !page->mapping)
                skb->pfmemalloc = true;

It assumes page->mapping == NULL implies that page->pfmemalloc can be trusted. However, __delete_from_page_cache() can set page->mapping to NULL and leave the page->index value alone. Due to being in a union, a non-zero page->index will be interpreted as true page->pfmemalloc. So the assumption is invalid if the networking code can see such a page. And it seems it can. We have encountered this with an NFS over loopback setup when such a page is attached to a new skbuf. There is no copying going on in this case, so the page confuses __skb_fill_page_desc(), which interprets the index as the pfmemalloc flag, and the network stack drops packets that have been allocated using the reserves unless they are to be queued on sockets handling the swapping which is the case here and that leads to hangs when the nfs client waits for a response from the server which has been dropped and thus never arrives.

The struct page is already heavily packed so rather than finding another hole to put it in, let's do a trick instead. We can reuse the index again but define it to an impossible value (-1UL). This is the page index so it should never see a value that large. Replace all direct users of page->pfmemalloc by page_is_pfmemalloc which will hide this nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page->index obviously but that was the case before and the original code expected that the information should be persisted somewhere else if that is really needed (e.g. what SLAB and SLUB do).

Fixes: c48a11c7ad26 (netvm: propagate page->pfmemalloc to skb)
Cc: stable # 3.6+
Debugged-by: Vlastimil Babka vba...@suse.com
Debugged-by: Jiri Bohac jbo...@suse.com
Signed-off-by: Michal Hocko mho...@suse.com

Acked-by: Mel Gorman mgor...@suse.de

-- 
Mel Gorman
SUSE Labs
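As a rough illustration of the index-sentinel idea described in the quoted patch, the sketch below models a page with only the two fields that matter here. Everything in it is hypothetical: struct demo_page and the demo_* helpers are made-up names, and the "old" check is modelled, following the description above, as "any non-zero index with a NULL mapping".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct page. */
struct demo_page {
	void *mapping;		/* NULL once removed from the page cache */
	unsigned long index;	/* page offset, reused to carry the flag */
};

/*
 * Model of the old behaviour: the flag shared storage with the index,
 * so any stale non-zero index looked like "pfmemalloc" once the
 * mapping had been cleared.
 */
static bool demo_old_check(const struct demo_page *p)
{
	return p->index != 0 && p->mapping == NULL;
}

/* New scheme: only the impossible index value -1UL means pfmemalloc. */
static bool demo_page_is_pfmemalloc(const struct demo_page *p)
{
	return p->index == -1UL;
}

static void demo_set_page_pfmemalloc(struct demo_page *p)
{
	p->index = -1UL;
}

static void demo_clear_page_pfmemalloc(struct demo_page *p)
{
	p->index = 0;
}
```

A page that has just left the page cache with index 42 trips the modelled old check but not the sentinel check, which is the false positive the patch is trying to rule out.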
Re: [PATCH 2/5] net: rfkill: add rfkill_find_type function
On Wed, 2015-08-05 at 16:39 +0300, Heikki Krogerus wrote:

+static const char *rfkill_types[NUM_RFKILL_TYPES] = {
+	[RFKILL_TYPE_WLAN]	= "wlan",
+	[RFKILL_TYPE_BLUETOOTH]	= "bluetooth",
+	[RFKILL_TYPE_UWB]	= "ultrawideband",
+	[RFKILL_TYPE_WIMAX]	= "wimax",
+	[RFKILL_TYPE_WWAN]	= "wwan",
+	[RFKILL_TYPE_GPS]	= "gps",
+	[RFKILL_TYPE_FM]	= "fm",
+	[RFKILL_TYPE_NFC]	= "nfc",
+};
+
+enum rfkill_type rfkill_find_type(const char *name)
+{
+	int i;
+
+	BUILD_BUG_ON(NUM_RFKILL_TYPES != RFKILL_TYPE_NFC + 1);

That BUILD_BUG_ON() is now less useful - previously it pointed to the code that needed to change, now you're left wondering if you don't look up, since it isn't quite that obvious from the code what this does. Something like

	BUILD_BUG_ON(rfkill_types[NUM_RFKILL_TYPES - 1] == NULL);

would be better. As we only add here, that would be safe enough - I've done something similar in the past that was a bit more complicated.

With that and the static inline fixed (which maybe you could even remove) I'm fine with all these rfkill patches, but I'm not sure how to merge them since they affect all kinds of other trees. If desired, I can apply them, but an ACK from the tegra maintainer would be good :)

johannes
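To make the suggestion above a bit more concrete, here is a small user-space sketch of a name table with a "last entry is filled in" check. The enum, table and function names (demo_type, demo_types, demo_find_type, demo_table_complete) are invented for illustration and only loosely mirror the rfkill code; the completeness check is done at run time here because standard C does not treat an array element as a constant expression, whereas the kernel's BUILD_BUG_ON() relies on the compiler folding it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type list; index 0 doubles as "not found". */
enum demo_type {
	DEMO_TYPE_ALL = 0,
	DEMO_TYPE_WLAN,
	DEMO_TYPE_BLUETOOTH,
	DEMO_TYPE_NFC,
	NUM_DEMO_TYPES,
};

static const char *const demo_types[NUM_DEMO_TYPES] = {
	[DEMO_TYPE_WLAN]	= "wlan",
	[DEMO_TYPE_BLUETOOTH]	= "bluetooth",
	[DEMO_TYPE_NFC]		= "nfc",
};

/*
 * Since entries are only ever appended, a missing name for the newest
 * type shows up as a NULL in the last slot.
 */
static int demo_table_complete(void)
{
	return demo_types[NUM_DEMO_TYPES - 1] != NULL;
}

/* Map a name to its type; unknown or NULL names yield DEMO_TYPE_ALL. */
static enum demo_type demo_find_type(const char *name)
{
	int i;

	if (!name)
		return DEMO_TYPE_ALL;

	for (i = 1; i < NUM_DEMO_TYPES; i++)
		if (demo_types[i] && !strcmp(name, demo_types[i]))
			return (enum demo_type)i;

	return DEMO_TYPE_ALL;
}
```

The point of checking the last slot rather than the enum count is that the failure then points at the table that needs a new string, which is what the review comment is asking for.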
Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT
On Thu, Aug 13, 2015 at 10:01:17AM +0100, Lorenzo Pieralisi wrote:
On Thu, Aug 13, 2015 at 09:27:59AM +0100, Graeme Gregory wrote:
On Wed, Aug 12, 2015 at 05:06:27PM -0500, Jeremy Linton wrote:

Add ACPI bindings for the smsc911x driver. Convert the DT specific calls to nonspecific device* calls. This allows the driver to work with both ACPI and DT configurations. Ethernet should now work when using ACPI on ARM Juno.

Last sentence does not belong in the commit log.

Signed-off-by: Jeremy Linton jeremy.lin...@arm.com

The code looks fine to me. Currently the compulsory DT properties seem to match the approved ACPI NIC properties from here http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf

What about _DSD device specific properties (eg smsc,save-mac-address)? Are we taking 1:1 translation between DT and ACPI for granted? I thought some process must be put in place to define the corresponding bindings in ACPI before starting this mechanical translation, or maybe I missed something; I would like to understand. How do the device specific _DSD definitions work in the ACPI world? Where are they published? How will we translate those to DT bindings if there is need?

This I do not know, as that discussion is still ongoing. But those options are currently marked as optional, so the driver should not fail if they are not present in ACPI.
Graeme Lorenzo Reviewed-by: Graeme Gregory graeme.greg...@linaro.org Thanks --- drivers/net/ethernet/smsc/smsc911x.c | 48 +--- 1 file changed, 22 insertions(+), 26 deletions(-) diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c index 959aeea..0f21aa3 100644 --- a/drivers/net/ethernet/smsc/smsc911x.c +++ b/drivers/net/ethernet/smsc/smsc911x.c @@ -59,7 +59,9 @@ #include linux/of_device.h #include linux/of_gpio.h #include linux/of_net.h +#include linux/acpi.h #include linux/pm_runtime.h +#include linux/property.h #include smsc911x.h @@ -2362,59 +2364,46 @@ static const struct smsc911x_ops shifted_smsc911x_ops = { .tx_writefifo = smsc911x_tx_writefifo_shift, }; -#ifdef CONFIG_OF -static int smsc911x_probe_config_dt(struct smsc911x_platform_config *config, - struct device_node *np) +static int smsc911x_probe_config(struct smsc911x_platform_config *config, + struct device *dev) { - const char *mac; u32 width = 0; - if (!np) + if (!dev) return -ENODEV; - config-phy_interface = of_get_phy_mode(np); + config-phy_interface = device_get_phy_mode(dev); - mac = of_get_mac_address(np); - if (mac) - memcpy(config-mac, mac, ETH_ALEN); + device_get_mac_address(dev, config-mac, ETH_ALEN); - of_property_read_u32(np, reg-shift, config-shift); + device_property_read_u32(dev, reg-shift, config-shift); - of_property_read_u32(np, reg-io-width, width); + device_property_read_u32(dev, reg-io-width, width); if (width == 4) config-flags |= SMSC911X_USE_32BIT; else config-flags |= SMSC911X_USE_16BIT; - if (of_get_property(np, smsc,irq-active-high, NULL)) + if (device_property_present(dev, smsc,irq-active-high)) config-irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH; - if (of_get_property(np, smsc,irq-push-pull, NULL)) + if (device_property_present(dev, smsc,irq-push-pull)) config-irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL; - if (of_get_property(np, smsc,force-internal-phy, NULL)) + if (device_property_present(dev, smsc,force-internal-phy)) config-flags |= 
SMSC911X_FORCE_INTERNAL_PHY; - if (of_get_property(np, smsc,force-external-phy, NULL)) + if (device_property_present(dev, smsc,force-external-phy)) config-flags |= SMSC911X_FORCE_EXTERNAL_PHY; - if (of_get_property(np, smsc,save-mac-address, NULL)) + if (device_property_present(dev, smsc,save-mac-address)) config-flags |= SMSC911X_SAVE_MAC_ADDRESS; return 0; } -#else -static inline int smsc911x_probe_config_dt( - struct smsc911x_platform_config *config, - struct device_node *np) -{ - return -ENODEV; -} -#endif /* CONFIG_OF */ static int smsc911x_drv_probe(struct platform_device *pdev) { - struct device_node *np = pdev-dev.of_node; struct net_device *dev; struct smsc911x_data *pdata; struct smsc911x_platform_config *config = dev_get_platdata(pdev-dev); @@ -2478,7 +2467,7 @@ static int smsc911x_drv_probe(struct platform_device *pdev) goto out_disable_resources; } - retval = smsc911x_probe_config_dt(pdata-config, np); + retval =
Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API
Hi,

On 12/08/15 17:42, Marcelo Ricardo Leitner wrote:
On 12-08-2015 12:33, David Laight wrote:
From: Marcelo Ricardo Leitner Sent: 12 August 2015 14:16
On 12-08-2015 07:23, David Laight wrote:
From: Marcelo Ricardo Leitner Sent: 11 August 2015 23:22

DLM is using the 1-to-many API but in a 1-to-1 fashion. That is, it's not needed, but this causes it to use sctp_do_peeloff() to mimic a kernel_accept() and this causes a symbol dependency on the sctp module. By switching it to the 1-to-1 API we can avoid this dependency and also reduce quite a lot of SCTP-specific code in lowcomms.c.

...

You still need to enable sctp notifications (I think the patch deleted that code). Otherwise you don't get any kind of indication if the remote system 'resets' (ie sends a new INIT chunk) on an existing connection.

Right, it would miss the restart event and could generate corrupted tx/rx buffers by gluing parts of old messages with new ones.

Except that it is SCTP so you'd expect DATA chunks to contain entire messages and so get unexpected message sequences rather than corrupt messages.

I was thinking of cases where the buf for recvmsg is not enough to hold the chunk, so that the remainder is left for another attempt (sctp_recvmsg, around line 2130), but it sounds like we won't purge the rx buffer when the reset happens so that doesn't matter. The association is replaced, but the buffers are kept. Out of order messages aren't a problem for dlm. It can recover from that just fine, as it doesn't have a specific handshake at the beginning or something like that, and upper layers are agnostic to that state transition (disconnect/reconnect/...), so this should be fine.

I'm not sure that's true - DLM does rely on message ordering in some cases in order to ensure correct functioning. So depending on how SCTP is interfaced to DLM, it might potentially be an issue,

Steve.
-- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] ipv4: off-by-one in continuation handling in /proc/net/route
When generating /proc/net/route we emit a header followed by a line for
each route. When a short read is performed we will restart this process
based on the open file descriptor. When calculating the start point we
fail to take into account that the 0th entry is the header. This leads
us to skip the first entry when doing a continuation read.

This can be easily seen with the comparison below:

  while read l; do echo $l; done </proc/net/route >A
  cat /proc/net/route >B
  diff -bu A B | grep '^[+-]'

On my example machine I have approximately 10KB of route output. There
we see the very first non-title element is lost in the while read case,
and an entry around the 8K mark in the cat case:

  +wlan0 02021EAC 0003 0 0 400 0 0 0
  -tun1 00C0AC0A 0001 0 0 950 00C0 0 0 0

Fix up the off-by-one when reacquiring position on continuation.

BugLink: http://bugs.launchpad.net/bugs/1483440
Signed-off-by: Andy Whitcroft a...@canonical.com
---
 net/ipv4/fib_trie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

From code inspection I believe this was introduced by the Fixes below,
but I have not tested this to confirm.

Fixes: 8be33e955cb9 (ipv4: off-by-one in continuation handling in /proc/net/route)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 37c4bb8..b0c6258 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2465,7 +2465,7 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
 		key = l->key + 1;
 		iter->pos++;
-		if (pos-- <= 0)
+		if (--pos <= 0)
 			break;
 		l = NULL;
-- 
2.5.0
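The effect of the one-character change is easiest to see in isolation. The following is a minimal userspace sketch, not the kernel code: walk_post() and walk_pre() are hypothetical helpers that model only the loop exit test in fib_route_get_idx() and count how many entries are stepped over before the loop breaks.

```c
/* Userspace model of the loop exit test only; not the kernel code.
 * Each function returns how many entries are visited before the
 * break fires, for a requested position pos. */

/* Old test: compare first, then decrement. */
int walk_post(long pos)
{
	int visited = 0;

	for (;;) {
		visited++;
		if (pos-- <= 0)
			break;
	}
	return visited;
}

/* New test: decrement first, then compare. */
int walk_pre(long pos)
{
	int visited = 0;

	for (;;) {
		visited++;
		if (--pos <= 0)
			break;
	}
	return visited;
}
```

For any pos of 1 or more, walk_post() visits one entry more than walk_pre(), which corresponds to the single route that goes missing on a continuation read.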
Re: [PATCH 2/2] Convert smsc911x to use ACPI as well as DT
On Thu, Aug 13, 2015 at 10:38:38AM +0100, Graeme Gregory wrote:
> On Thu, Aug 13, 2015 at 10:01:17AM +0100, Lorenzo Pieralisi wrote:
>> On Thu, Aug 13, 2015 at 09:27:59AM +0100, Graeme Gregory wrote:
>>> On Wed, Aug 12, 2015 at 05:06:27PM -0500, Jeremy Linton wrote:
>>>> Add ACPI bindings for the smsc911x driver. Convert the DT specific
>>>> calls to nonspecific device* calls, This allows the driver to work
>>>> with both ACPI and DT configurations. Ethernet should now work when
>>>> using ACPI on ARM Juno.
>>
>> Last sentence does not belong in the commit log.
>>
>>>> Signed-off-by: Jeremy Linton jeremy.lin...@arm.com
>>>
>>> The code looks fine to me.
>>>
>>> Currently the compulsory DT properties seem to match the approved
>>> ACPI NIC properties from here
>>>
>>> http://www.uefi.org/sites/default/files/resources/nic-request-v2.pdf
>>
>> What about _DSD device specific properties (eg smsc,save-mac-address) ?
>>
>> Are we taking 1:1 translation between DT and ACPI for granted ?
>>
>> I thought some process must be put in place to define the
>> corresponding bindings in ACPI before starting this mechanical
>> translation or maybe I missed something, I would like to understand.
>>
>> How does the device specific _DSD definitions work in ACPI world ?
>> Where are they published ? How will we translate those to DT bindings
>> if there is need ?
>
> This I do not know as that discussion is still ongoing. But those
> options are currently marked as optional so driver should not fail if
> they are not present in ACPI.

Well yes, but if they are present this patch uses them when booting
with ACPI and that's what I am arguing about, because they are
undocumented.

This discussion should be brought to completion, the _DSD usage and
bindings sharing between DT and ACPI well defined before we can enable
drivers to rely on properties that in ACPI world have no binding
definition at all, if we go by this route we might end up having
drivers reusing DT properties that have no reason whatsoever to exist
in ACPI world.

So, to make this clear, the _DSD usage must be documented and there has
to be a way to define the related bindings in ACPI world so that driver
code can be reviewed accordingly.

Lorenzo

> Graeme
>
>> Lorenzo
>>
>>> Reviewed-by: Graeme Gregory graeme.greg...@linaro.org
>>>
>>> Thanks
>>>
>>>> ---
>>>>  drivers/net/ethernet/smsc/smsc911x.c | 48 +---
>>>>  1 file changed, 22 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
>>>> index 959aeea..0f21aa3 100644
>>>> --- a/drivers/net/ethernet/smsc/smsc911x.c
>>>> +++ b/drivers/net/ethernet/smsc/smsc911x.c
>>>> @@ -59,7 +59,9 @@
>>>>  #include <linux/of_device.h>
>>>>  #include <linux/of_gpio.h>
>>>>  #include <linux/of_net.h>
>>>> +#include <linux/acpi.h>
>>>>  #include <linux/pm_runtime.h>
>>>> +#include <linux/property.h>
>>>>  #include "smsc911x.h"
>>>> @@ -2362,59 +2364,46 @@ static const struct smsc911x_ops shifted_smsc911x_ops = {
>>>>  	.tx_writefifo = smsc911x_tx_writefifo_shift,
>>>>  };
>>>> -#ifdef CONFIG_OF
>>>> -static int smsc911x_probe_config_dt(struct smsc911x_platform_config *config,
>>>> -				    struct device_node *np)
>>>> +static int smsc911x_probe_config(struct smsc911x_platform_config *config,
>>>> +				 struct device *dev)
>>>>  {
>>>> -	const char *mac;
>>>>  	u32 width = 0;
>>>> -	if (!np)
>>>> +	if (!dev)
>>>>  		return -ENODEV;
>>>> -	config->phy_interface = of_get_phy_mode(np);
>>>> +	config->phy_interface = device_get_phy_mode(dev);
>>>> -	mac = of_get_mac_address(np);
>>>> -	if (mac)
>>>> -		memcpy(config->mac, mac, ETH_ALEN);
>>>> +	device_get_mac_address(dev, config->mac, ETH_ALEN);
>>>> -	of_property_read_u32(np, "reg-shift", &config->shift);
>>>> +	device_property_read_u32(dev, "reg-shift", &config->shift);
>>>> -	of_property_read_u32(np, "reg-io-width", &width);
>>>> +	device_property_read_u32(dev, "reg-io-width", &width);
>>>>  	if (width == 4)
>>>>  		config->flags |= SMSC911X_USE_32BIT;
>>>>  	else
>>>>  		config->flags |= SMSC911X_USE_16BIT;
>>>> -	if (of_get_property(np, "smsc,irq-active-high", NULL))
>>>> +	if (device_property_present(dev, "smsc,irq-active-high"))
>>>>  		config->irq_polarity = SMSC911X_IRQ_POLARITY_ACTIVE_HIGH;
>>>> -	if (of_get_property(np, "smsc,irq-push-pull", NULL))
>>>> +	if (device_property_present(dev, "smsc,irq-push-pull"))
>>>>  		config->irq_type = SMSC911X_IRQ_TYPE_PUSH_PULL;
>>>> -	if (of_get_property(np, "smsc,force-internal-phy", NULL))
>>>> +	if (device_property_present(dev, "smsc,force-internal-phy"))
>>>>  		config->flags |= SMSC911X_FORCE_INTERNAL_PHY;
>>>> -	if (of_get_property(np, "smsc,force-external-phy",
[GIT] Networking
1) Workaround hw bug when acquiring PCI bus ownership of iwlwifi
   devices, from Emmanuel Grumbach.

2) Falling back to vmalloc in conntrack should not emit a warning,
   from Pablo Neira Ayuso.

3) Fix NULL deref when rtlwifi driver is used as an AP, from Luis
   Felipe Dominguez Vega.

4) Rocker doesn't free netdev on device removal, from Ido Schimmel.

5) UDP multicast early sock demux has route handling races, from
   Eric Dumazet.

6) Fix L4 checksum handling in openvswitch, from Glenn Griffin.

7) Fix use-after-free in skb_set_peeked, from Herbert Xu.

8) Don't advertize NETIF_F_FRAGLIST in virtio_net driver, this can
   lead to fraglists longer than the driver can support. From Jason
   Wang.

9) Fix mlx5 on non-4k-pagesize systems, from Carol L Soto.

10) Fix interrupt storm in bna driver, from Ivan Vecera.

11) Don't propagate -EBUSY from netlink_insert(), from Daniel
    Borkmann.

12) Fix inet request sock leak, from Eric Dumazet.

13) Fix TX interrupt masking and marking in TX descriptors of
    fs_enet driver, from LEROY Christophe.

14) Get rid of rule optimizer in gianfar driver, it's buggy and
    unlikely to get fixed any time soon. From Jakub Kicinski.

Please pull, thanks a lot!
The following changes since commit 7c764cec3703583247c4ab837c652975a3d41f4b:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-07-31 17:10:56 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

for you to fetch changes up to e6d006938c9bda7ffd22af9d3e1257fd75941fb7:

  cosa: missing error code on failure in probe() (2015-08-12 16:53:11 -0700)

Antonio Quartulli (1):
      batman-adv: avoid DAT to mess up LAN state

Avraham Stern (1):
      iwlwifi: mvm: Fix regular scan priority

Benjamin Poirier (1):
      net-timestamp: Update skb_complete_tx_timestamp comment

Carol L Soto (1):
      net/mlx5_core: Set log_uar_page_sz for non 4K page size architecture

Dan Carpenter (4):
      netfilter: nf_conntrack: checking for IS_ERR() instead of NULL
      rds: fix an integer overflow test in rds_info_getsockopt()
      cxgb4: missing curly braces in t4_setup_debugfs()
      cosa: missing error code on failure in probe()

Daniel Borkmann (1):
      netlink: make sure -EBUSY won't escape from netlink_insert

David S. Miller (8):
      Merge tag 'wireless-drivers-for-davem-2015-08-04' of git://git.kernel.org/.../kvalo/wireless-drivers
      Merge branch 'be2net-fixes'
      Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
      Merge branch 'mvpp2-fixes'
      Merge branch 'bnx2x-fixes'
      Merge git://git.kernel.org/.../pablo/nf
      Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge branch 'gianfar-fixes'

Emmanuel Grumbach (2):
      iwlwifi: pcie: fix prepare card flow
      iwlwifi: pcie: fix stuck queue detection for sleeping clients

Eric Dumazet (4):
      fq_codel: explicitly reset flows in ->reset()
      udp: fix dst races with multicast early demux
      inet: fix races with reqsk timers
      inet: fix possible request socket leak

Fabio Estevam (1):
      mkiss: Fix error handling in mkiss_open()

Florian Fainelli (1):
      net: dsa: Do not override PHY interface if already configured

Florian Westphal (1):
      ipv6: don't reject link-local nexthop on other interface

Glenn Griffin (1):
      openvswitch: Fix L4 checksum handling when dealing with IP fragments

Hauke Mehrtens (1):
      b43: fix extpa_gain check for 2GHz

Herbert Xu (1):
      net: Fix skb_set_peeked use-after-free bug

Ian Campbell (1):
      net: thunderx: remove effective default y from Kconfig if ARCH_THUNDER=y

Ido Schimmel (1):
      rocker: free netdevice during netdevice removal

Ivan Vecera (2):
      r8169: enforce RX_MULTI_EN on rtl8168ep/8111ep chips
      bna: fix interrupts storm caused by erroneous packets

Jakub Kicinski (3):
      gianfar: correct filer table writing
      gianfar: correct list membership accounting
      gianfar: remove faulty filer optimizer

Jakub Pawlowski (1):
      Bluetooth: fix MGMT_EV_NEW_LONG_TERM_KEY event

Jason Wang (1):
      virtio-net: drop NETIF_F_FRAGLIST

Jia-Ju Bai (1):
      3c59x: Fix resource leaks in vortex_open

Joe Stringer (1):
      netfilter: conntrack: Use flags in nf_ct_tmpl_alloc()

Kalesh AP (3):
      be2net: enable IFACE filters only after creating RXQs
      be2net: post buffers before destroying RXQs in Lancer
      be2net: protect eqo->affinity_mask from getting freed twice

Kalle Valo (1):
      Merge tag 'iwlwifi-for-kalle-2015-07-30' of https://git.kernel.org/.../iwlwifi/iwlwifi-fixes

LEROY Christophe (2):
      net: fs_enet: explicitly remove I flag on TX partial frames
      net: fs_enet: mask interrupts for TX partial frames.

Larry Finger (1):
      rtlwifi: rtl8723be: Add module parameter for MSI interrupts

Lucas Stach (1):
      net: fec: fix
Re: [PATCH 1/2] Add a matching set of device_ functions for determining mac/phy
Hi Jeremy,

On 12/08/15 23:06, Jeremy Linton wrote:
[...]
> +static void *device_get_mac_addr(struct device *dev,
> +				 const char *name, char *addr,
> +				 int alen)
> +{
> +	int ret = device_property_read_u8_array(dev, name, addr, alen);
> +
> +	if (ret == 0 && is_valid_ether_addr(addr))
> +		return addr;
> +	return NULL;
> +}

Not sure I understand the logic here - return the same thing we were
given if we updated it, or null if we didn't. It's only indicating
success/failure (the caller can perfectly well cast its own buffer to a
void * if it needs to), so why wouldn't you just return a normal int
error code?

> +/**
> + * Search the device tree for the best MAC address to use. 'mac-address' is
> + * checked first, because that is supposed to contain to most recent MAC
> + * address. If that isn't set, then 'local-mac-address' is checked next,
> + * because that is the default address. If that isn't set, then the obsolete
> + * 'address' is checked, just in case we're using an old device tree.
> + *
> + * Note that the 'address' property is supposed to contain a virtual address of
> + * the register set, but some DTS files have redefined that property to be the
> + * MAC address.
> + *
> + * All-zero MAC addresses are rejected, because those could be properties that
> + * exist in the device tree, but were not set by U-Boot. For example, the
> + * DTS could define 'mac-address' and 'local-mac-address', with zero MAC
> + * addresses. Some older U-Boots only initialized 'local-mac-address'. In
> + * this case, the real MAC is in 'local-mac-address', and 'mac-address' exists
> + * but is all zeros.
> +*/
> +void *device_get_mac_address(struct device *dev, char *addr, int alen)
> +{
> +	addr = device_get_mac_addr(dev, "mac-address", addr, alen);
> +	if (addr)
> +		return addr;
> +
> +	addr = device_get_mac_addr(dev, "local-mac-address", addr, alen);
> +	if (addr)
> +		return addr;
> +
> +	return device_get_mac_addr(dev, "address", addr, alen);
> +}
> +EXPORT_SYMBOL(device_get_mac_address);

Same here, it's not at all apparent why this should return a void *
instead of an int (or even possibly bool). of_get_mac_address is giving
its caller back a _new_ pointer they didn't know about before; this
isn't.

Robin.
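For illustration only, the int-returning shape suggested in the review could look like the userspace sketch below. Everything in it is hypothetical (struct prop, mac_is_valid(), get_mac_address()); it models firmware properties as a plain array rather than using the kernel's device property API. Because the caller's buffer pointer is never overwritten with a lookup result, a failed first lookup cannot turn it into NULL before the next one, which the quoted version appears to risk.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for a firmware property: a name and six bytes. */
struct prop {
	const char *name;
	unsigned char val[MAC_LEN];
};

/* Usable unicast address: not all-zero and multicast bit clear. */
static bool mac_is_valid(const unsigned char *a)
{
	static const unsigned char zero[MAC_LEN];

	return memcmp(a, zero, MAC_LEN) != 0 && !(a[0] & 0x01);
}

/* Copy the first valid address into out, trying the names in order.
 * Returns 0 on success or -ENOENT if no usable property exists. */
int get_mac_address(const struct prop *props, size_t n, unsigned char *out)
{
	static const char *const names[] = {
		"mac-address", "local-mac-address", "address",
	};
	size_t i, j;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		for (j = 0; j < n; j++) {
			if (strcmp(props[j].name, names[i]) != 0)
				continue;
			if (!mac_is_valid(props[j].val))
				continue;
			memcpy(out, props[j].val, MAC_LEN);
			return 0;
		}
	}
	return -ENOENT;
}
```

A caller then only checks the return value and keeps using its own buffer, which is the point made above about success/failure being the only information conveyed.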
Re: [Cluster-devel] [PATCH 4/6] dlm: use sctp 1-to-1 API
On 13-08-2015 06:37, Steven Whitehouse wrote:
> Hi,
>
> On 12/08/15 17:42, Marcelo Ricardo Leitner wrote:
>> On 12-08-2015 12:33, David Laight wrote:
>>> From: Marcelo Ricardo Leitner
>>> Sent: 12 August 2015 14:16
>>>> On 12-08-2015 07:23, David Laight wrote:
>>>>> From: Marcelo Ricardo Leitner
>>>>> Sent: 11 August 2015 23:22
>>>>>> DLM is using 1-to-many API but in a 1-to-1 fashion. That is, it's
>>>>>> not needed but this causes it to use sctp_do_peeloff() to mimic a
>>>>>> kernel_accept() and this causes a symbol dependency on sctp
>>>>>> module.
>>>>>>
>>>>>> By switching it to 1-to-1 API we can avoid this dependency and
>>>>>> also reduce quite a lot of SCTP-specific code in lowcomms.c.
>>>>> ...
>>>>>
>>>>> You still need to enable sctp notifications (I think the patch
>>>>> deleted that code). Otherwise you don't get any kind of indication
>>>>> if the remote system 'resets' (ie sends a new INIT chunk) on an
>>>>> existing connection.
>>>>
>>>> Right, it would miss the restart event and could generate corrupted
>>>> tx/rx buffers by glueing parts of old messages with new ones.
>>>
>>> Except that it is SCTP so you'd expect DATA chunks to contain entire
>>> messages and so get unexpected message sequences rather than corrupt
>>> messages.
>>
>> I was thinking on cases where the buf for recvmsg is not enough to
>> hold the chunk, so that the remaining is left for another attempt
>> (sctp_recvmsg, around line 2130), but sounds like we won't purge rx
>> buffer when the reset happens so that doesn't matter. The association
>> is replaced, but the buffers are kept.
>>
>> Out of order messages aren't a problem for dlm. It can recover from
>> that just fine, as it doesn't have a specific handshake at beginning
>> or something like that and upper layers are agnostic to that state
>> transition (disconnect/reconnect/...), this should be fine.
>
> I'm not sure that's true - DLM does rely on message ordering in some
> cases in order to ensure correct functioning. So depending on how SCTP
> is interfaced to DLM, it might potentially be an issue,

Yes, that ordering is still kept. Like, it won't flip a newer message
to a first position. It's just that if DLM had its own handshake
exposing its version and features, one peer (the old one) would get it
out of the blue and the other (the new one) would never get it. Or if
its messages would depend on a previous state, meaning LockMsgC is only
acceptable if LockMsgA was already performed on that connection.

That is my understanding from what David pointed out and what I checked
here. Then as lowcomms previously allowed connection closing without
telling anyone above it that it happened, it should be fine, right? It
will just finish processing the old messages and then start on the new
ones, just like before.

Thanks,
Marcelo
Re: ath9k_htc: drv_init: match wait_for_completion_timeout return type
> Return type of wait_for_completion_timeout is unsigned long not int.
> As time_left is exclusively used for wait_for_completion_timeout here
> its type is simply changed to unsigned long.
>
> API conformance testing for completions with coccinelle spatches are
> being used to locate API usage inconsistencies:
> ./drivers/net/wireless/ath/ath9k/htc_drv_init.c:81
> 	int return assigned to unsigned long
>
> Patch was compile tested with x86_64_defconfig + CONFIG_ATH_CARDS=m,
> CONFIG_ATH9K_HTC=m
>
> Patch is against 4.1-rc3 (localversion-next is -next-20150514)
>
> Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
Re: [PATCH 2/5] net: rfkill: add rfkill_find_type function
Hi,

On Thu, Aug 13, 2015 at 11:27:46AM +0200, Johannes Berg wrote:
> On Wed, 2015-08-05 at 16:39 +0300, Heikki Krogerus wrote:
>> +static const char *rfkill_types[NUM_RFKILL_TYPES] = {
>> +	[RFKILL_TYPE_WLAN]	= "wlan",
>> +	[RFKILL_TYPE_BLUETOOTH]	= "bluetooth",
>> +	[RFKILL_TYPE_UWB]	= "ultrawideband",
>> +	[RFKILL_TYPE_WIMAX]	= "wimax",
>> +	[RFKILL_TYPE_WWAN]	= "wwan",
>> +	[RFKILL_TYPE_GPS]	= "gps",
>> +	[RFKILL_TYPE_FM]	= "fm",
>> +	[RFKILL_TYPE_NFC]	= "nfc",
>> +};
>> +
>> +enum rfkill_type rfkill_find_type(const char *name)
>> +{
>> +	int i;
>> +
>> +	BUILD_BUG_ON(NUM_RFKILL_TYPES != RFKILL_TYPE_NFC + 1);
>
> That BUILD_BUG_ON() is now less useful - previously it pointed to the
> code that needed to change, now you're left wondering if you don't
> look up since it isn't quite that obvious from the code what this
> does.
>
> Something like
>
> 	BUILD_BUG_ON(rfkill_types[NUM_RFKILL_TYPES - 1] == NULL);
>
> would be better. As we only add here, that would be safe enough - I've
> done something similar in the past that was a bit more complicated.

OK, I'll change it.

> With that and the static inline fixed (which maybe you could even
> remove) I'm fine with all these rfkill patches, but I'm not sure how
> to merge them since they affect all kinds of other trees. If desired,
> I can apply them, but an ACK from the tegra maintainer would be
> good :)

Andy and Mika are preparing some changes to the device property
handling. I'll wait for their proposal and prepare the next version of
these after that.

Thanks,

-- 
heikki
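As a rough illustration of the kind of compile-time check being discussed, here is a userspace sketch. It swaps in a different technique from the one proposed above: the table is left unsized and C11 _Static_assert compares its length with NUM_RFKILL_TYPES. The enum layout and the choice to return RFKILL_TYPE_ALL for an unknown name are assumptions made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: index 0 is the catch-all, named types follow. */
enum rfkill_type {
	RFKILL_TYPE_ALL = 0,
	RFKILL_TYPE_WLAN,
	RFKILL_TYPE_BLUETOOTH,
	RFKILL_TYPE_UWB,
	RFKILL_TYPE_WIMAX,
	RFKILL_TYPE_WWAN,
	RFKILL_TYPE_GPS,
	RFKILL_TYPE_FM,
	RFKILL_TYPE_NFC,
	NUM_RFKILL_TYPES,
};

/* No explicit size: the array ends at the highest index initialised. */
static const char *const rfkill_types[] = {
	[RFKILL_TYPE_WLAN]	= "wlan",
	[RFKILL_TYPE_BLUETOOTH]	= "bluetooth",
	[RFKILL_TYPE_UWB]	= "ultrawideband",
	[RFKILL_TYPE_WIMAX]	= "wimax",
	[RFKILL_TYPE_WWAN]	= "wwan",
	[RFKILL_TYPE_GPS]	= "gps",
	[RFKILL_TYPE_FM]	= "fm",
	[RFKILL_TYPE_NFC]	= "nfc",
};

/* Breaks the build if a type is appended to the enum without a name. */
_Static_assert(sizeof(rfkill_types) / sizeof(rfkill_types[0]) ==
	       NUM_RFKILL_TYPES, "rfkill_types[] is missing an entry");

/* Map a name to its type; unknown or NULL names give RFKILL_TYPE_ALL. */
enum rfkill_type rfkill_find_type(const char *name)
{
	int i;

	if (!name)
		return RFKILL_TYPE_ALL;

	/* Start at 1: index 0 has no name in the table. */
	for (i = 1; i < NUM_RFKILL_TYPES; i++)
		if (strcmp(name, rfkill_types[i]) == 0)
			return (enum rfkill_type)i;

	return RFKILL_TYPE_ALL;
}
```

The assertion sits next to the table, so a failed build points at the place that needs the new string.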
Re: [PATCH] net/wireless: enable wiphy device to suspend/resume asynchronously
On Thu, 2015-07-30 at 08:55 +0300, Emmanuel Grumbach wrote:
> On Thu, Jul 30, 2015 at 8:18 AM, Fu, Zhonghui zhonghui...@linux.intel.com wrote:
>> Enable wiphy device to suspend/resume asynchronously. This can
>> improve system suspend/resume speed.
>
> How will that impact the timing with respect to the suspend call
> coming from the bus? I think that a few drivers rely on the suspend
> call of the wiphy device happening before the suspend call to the bus
> device.

Yes, we can't do this for precisely this reason unless we have a way to
somehow keep the dependency between the two - possibly by also marking
the other one as async (although I don't know if the async framework in
general has any FIFO guarantees, which would be required for this.)

I've dropped the patch.

johannes
[PATCH net] ppp: fix device unregistration upon netns deletion
PPP devices may get automatically unregistered when their network
namespace is getting removed. This happens if the ppp control plane
daemon (e.g. pppd) exits while it is the last user of this namespace.

This leads to several races:

  * ppp_exit_net() may destroy the per namespace idr (pn->units_idr)
    before all file descriptors were released. Successive ppp_release()
    calls may then cleanup PPP devices with ppp_shutdown_interface() and
    try to use the already destroyed idr.

  * Automatic device unregistration may also happen before the
    ppp_release() call for that device gets executed. Once called on
    the file owning the device, ppp_release() will then clean it up and
    try to unregister it a second time.

To fix these issues, operations defined in ppp_shutdown_interface() are
moved to the PPP device's ndo_uninit() callback. This allows PPP
devices to be properly cleaned up by unregister_netdev() and friends.
So checking for ppp->owner is now an accurate test to decide if a PPP
device should be unregistered.

Setting ppp->owner is done in ppp_create_interface(), before device
registration, in order to avoid unprotected modification of this field.

Finally ppp_exit_net() now starts by unregistering all remaining PPP
devices to ensure that none will get unregistered after the call to
idr_destroy().

Signed-off-by: Guillaume Nault g.na...@alphalink.fr
---
 drivers/net/ppp/ppp_generic.c | 79 +++
 1 file changed, 43 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 9d15566..1dc478a 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -269,9 +269,9 @@ static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
 static void ppp_ccp_closed(struct ppp *ppp);
 static struct compressor *find_compressor(int type);
 static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st);
-static struct ppp *ppp_create_interface(struct net *net, int unit, int *retp);
+static struct ppp *ppp_create_interface(struct net *net, int unit,
+					struct file *file, int *retp);
 static void init_ppp_file(struct ppp_file *pf, int kind);
-static void ppp_shutdown_interface(struct ppp *ppp);
 static void ppp_destroy_interface(struct ppp *ppp);
 static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit);
 static struct channel *ppp_find_channel(struct ppp_net *pn, int unit);
@@ -392,8 +392,10 @@ static int ppp_release(struct inode *unused, struct file *file)
 		file->private_data = NULL;
 		if (pf->kind == INTERFACE) {
 			ppp = PF_TO_PPP(pf);
+			rtnl_lock();
 			if (file == ppp->owner)
-				ppp_shutdown_interface(ppp);
+				unregister_netdevice(ppp->dev);
+			rtnl_unlock();
 		}
 		if (atomic_dec_and_test(&pf->refcnt)) {
 			switch (pf->kind) {
@@ -593,8 +595,10 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		mutex_lock(&ppp_mutex);
 		if (pf->kind == INTERFACE) {
 			ppp = PF_TO_PPP(pf);
+			rtnl_lock();
 			if (file == ppp->owner)
-				ppp_shutdown_interface(ppp);
+				unregister_netdevice(ppp->dev);
+			rtnl_unlock();
 		}
 		if (atomic_long_read(&file->f_count) < 2) {
 			ppp_release(NULL, file);
@@ -838,11 +842,10 @@ static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
 		/* Create a new ppp unit */
 		if (get_user(unit, p))
 			break;
-		ppp = ppp_create_interface(net, unit, &err);
+		ppp = ppp_create_interface(net, unit, file, &err);
 		if (!ppp)
 			break;
 		file->private_data = &ppp->file;
-		ppp->owner = file;
 		err = -EFAULT;
 		if (put_user(ppp->file.index, p))
 			break;
@@ -916,6 +919,17 @@ static __net_init int ppp_init_net(struct net *net)
 static __net_exit void ppp_exit_net(struct net *net)
 {
 	struct ppp_net *pn = net_generic(net, ppp_net_id);
+	struct ppp *ppp;
+	LIST_HEAD(list);
+	int id;
+
+	rtnl_lock();
+	idr_for_each_entry(&pn->units_idr, ppp, id) {
+		unregister_netdevice_queue(ppp->dev, &list);
+	}
+
+	unregister_netdevice_many(&list);
+	rtnl_unlock();
 	idr_destroy(&pn->units_idr);
 }
@@ -1088,8 +1102,28 @@ static int ppp_dev_init(struct net_device *dev)
 	return 0;
 }
+static void ppp_dev_uninit(struct net_device *dev)
+{
+	struct ppp *ppp = netdev_priv(dev);
+	struct ppp_net *pn =
Re: [PATCH] mac80211: fix invalid read in minstrel_sort_best_tp_rates()
On Tue, 2015-07-28 at 10:30 +0200, Adrien Schildknecht wrote:
> At the last iteration of the loop, j may equal zero and thus
> tp_list[j - 1] causes an invalid read.
> Changed the logic of the loop so that j - 1 is always >= 0.
>
> Signed-off-by: Adrien Schildknecht adrien+...@schischi.me

Applied, I added Cc stable.

johannes
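The class of bug described here can be sketched outside the kernel. The code below is not the mac80211 function; insert_sorted_desc() and MAX_RATES are hypothetical names for a small top-N list kept in descending order. The point it illustrates is ordering the loop condition so that j > 0 is tested before list[j - 1] is read.

```c
#include <string.h>

#define MAX_RATES 4

/* Insert val into list[], which holds MAX_RATES values in descending
 * order. The smallest value drops off if val belongs in the list. */
void insert_sorted_desc(unsigned int *list, unsigned int val)
{
	int j = MAX_RATES;

	/* j > 0 is checked first, so list[-1] is never read. */
	while (j > 0 && val > list[j - 1])
		j--;

	/* Shift the tail down by one slot to make room at index j. */
	if (j < MAX_RATES - 1)
		memmove(&list[j + 1], &list[j],
			(MAX_RATES - (j + 1)) * sizeof(list[0]));
	if (j < MAX_RATES)
		list[j] = val;
}
```

With the two operands of && swapped, inserting a value larger than every entry would read one element before the start of the array on the final iteration.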
Re: [V2] rtlwifi: misspelled code and comments corrected.
> Signed-off-by: Cheolhyun Park pch851...@gmail.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
Re: ath9k_htc: match wait_for_completion_timeout return type
> Return type of wait_for_completion_timeout is unsigned long not int.
> As time_left is exclusively used for wait_for_completion_timeout here
> its type is simply changed to unsigned long.
>
> API conformance testing for completions with coccinelle spatches are
> being used to locate API usage inconsistencies:
> ./drivers/net/wireless/ath/ath9k/htc_hst.c:171
> 	int return assigned to unsigned long
> ./drivers/net/wireless/ath/ath9k/htc_hst.c:277
> 	int return assigned to unsigned long
> ./drivers/net/wireless/ath/ath9k/htc_hst.c:206
> 	int return assigned to unsigned long
>
> Patch was compile tested with x86_64_defconfig + CONFIG_ATH_CARDS=m,
> CONFIG_ATH9K_HTC=m
>
> Patch is against 4.1-rc3 (localversion-next is -next-20150514)
>
> Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
[PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
* Due to a HW bug, LAN8700 sometimes does not detect the presence of
  energy in the Ethernet cable in Energy Detect Power-Down mode (e.g.
  while the EDPWRDOWN bit is set, the ENERGYON bit is sometimes not
  asserted). This is a common bug of the LAN87xx family of PHY chips.

* The lan87xx_read_status() was improved to poll the ENERGYON bit. Its
  previous algorithm was still not 100% reliable and sometimes missed a
  cable being plugged in.

Signed-off-by: Igor Plyatov plya...@gmail.com
---
 drivers/net/phy/smsc.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index c0f6479..a380958 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
 static int lan87xx_read_status(struct phy_device *phydev)
 {
 	int err = genphy_read_status(phydev);
+	int rc;
+	int i;
 	if (!phydev->link) {
 		/* Disable EDPD to wake up PHY */
-		int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
+		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
 		if (rc < 0)
 			return rc;
@@ -116,8 +118,16 @@ static int lan87xx_read_status(struct phy_device *phydev)
 		if (rc < 0)
 			return rc;
-		/* Sleep 64 ms to allow ~5 link test pulses to be sent */
-		msleep(64);
+		/* Wait max 640 ms to detect energy */
+		for (i = 0; i < 64; i++) {
+			/* Sleep to allow link test pulses to be sent */
+			msleep(10);
+			rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
+			if (rc < 0)
+				return rc;
+			if (rc & MII_LAN83C185_ENERGYON)
+				break;
+		};
 		/* Re-enable EDPD */
 		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
@@ -191,7 +201,7 @@ static struct phy_driver smsc_phy_driver[] = {
 	/* basic functions */
 	.config_aneg	= genphy_config_aneg,
-	.read_status	= genphy_read_status,
+	.read_status	= lan87xx_read_status,
 	.config_init	= smsc_phy_config_init,
 	.soft_reset	= smsc_phy_reset,
-- 
1.7.9.5
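The patch above replaces one fixed sleep with a bounded poll. A minimal userspace sketch of that pattern follows; it is not driver code. struct fake_phy, fake_read(), fake_sleep() and wait_for_bit() are hypothetical stand-ins, with register reads scripted from an array and sleeping replaced by a counter so the timing can be checked.

```c
/* Hypothetical stand-in for a PHY: a scripted list of register reads. */
struct fake_phy {
	const int *reads;	/* values returned by successive reads */
	int n;			/* number of scripted values, at least 1 */
	int pos;		/* index of the next value to return */
	int slept_ms;		/* total time "slept" so far */
};

/* Return the next scripted value; repeat the last one when exhausted. */
static int fake_read(struct fake_phy *phy)
{
	if (phy->pos < phy->n - 1)
		return phy->reads[phy->pos++];
	return phy->reads[phy->n - 1];
}

/* Account for a sleep instead of really waiting. */
static void fake_sleep(struct fake_phy *phy, int ms)
{
	phy->slept_ms += ms;
}

/*
 * Poll until a bit in mask is set, a read fails, or tries run out.
 * Returns a negative error from the read, otherwise the last value read.
 */
int wait_for_bit(struct fake_phy *phy, int mask, int tries, int interval_ms)
{
	int rc = 0;
	int i;

	for (i = 0; i < tries; i++) {
		fake_sleep(phy, interval_ms);
		rc = fake_read(phy);
		if (rc < 0)
			return rc;
		if (rc & mask)
			break;
	}
	return rc;
}
```

With 64 tries at 10 ms the worst case is 640 ms, matching the comment in the patch, while a link that shows energy early returns after only a few intervals.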
Re: [2/3] brcmfmac: dhd_sdio.c: use existing atomic_or primitive
> There's already a generic implementation so use that instead.

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
Re: Question on behavior of tg3_self_test() (ethtool -t on tg3 driver)
Very interesting. I was running a RHEL 7.1 kernel
3.10.0-229.ael7b.ppc64le (PowerPC). tg3 version 3.137, firmware
5719-v1.24i, but unknown what patches were added to either of our
modules.

We will investigate the environment more, under the assumption that we
should not be required to insert any delay between runs of
ethtool -t ... offline.

Thanks Siva,
Doug

On 08/13/2015 03:40 AM, Siva Reddy (Siva) Kallam wrote:
> On 8/12/2015 6:02 PM, Douglas Miller wrote:
>> Oh, I had missed the extra if condition on tg3_test_link(). So
>> external_lb is not a true superset of offline.
>>
>> So you are not surprised by the (about) 20 second link down period
>> after this test? If this is expected (albeit undocumented) behavior
>> we can change the test scenario to work around it. It seems as though
>> not all adapters exhibit this same symptom. From a testing
>> standpoint, it is a long delay to add that may only be needed for
>> this one adapter (Broadcom BCM5719, or adapter family).
>
> We executed the ethtool -t dev offline in a loop on our local test
> machine with 5719 and linkup time is <= 5 secs.
>
> Script:
>
> #!/bin/bash
> echo -OS Information-
> uname -a
> echo --Card Information--
> lspci | grep 5719
> echo --Interface information--
> ethtool -i p4p4
> echo -Offline test start--
> for i in 1 2 3
> do
> 	date
> 	ethtool -t p4p4 offline
> done
>
> Output:
>
> -OS Information-
> Linux siva-dev 4.2.0-rc4+ #1 SMP Thu Aug 13 20:24:11 IST 2015 x86_64 x86_64 x86_64 GNU/Linux
> --Card Information--
> 03:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> 03:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> 03:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> 03:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
> --Interface information--
> driver: tg3
> version: 3.137
> firmware-version: 5719-v1.41 NCSI v1.3.6.0
> bus-info: :03:00.3
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: no
> -Offline test start--
> Thu Aug 13 22:05:59 IST 2015
> The test result is PASS
> The test extra info:
> nvram test(online) 0
> link test (online) 0
> register test (offline) 0
> memory test (offline) 0
> mac loopback test (offline) 0
> phy loopback test (offline) 0
> ext loopback test (offline) 0
> interrupt test(offline) 0
> Thu Aug 13 22:06:00 IST 2015
> The test result is PASS
> The test extra info:
> nvram test(online) 0
> link test (online) 0
> register test (offline) 0
> memory test (offline) 0
> mac loopback test (offline) 0
> phy loopback test (offline) 0
> ext loopback test (offline) 0
> interrupt test(offline) 0
> Thu Aug 13 22:06:05 IST 2015
> The test result is PASS
> The test extra info:
> nvram test(online) 0
> link test (online) 0
> register test (offline) 0
> memory test (offline) 0
> mac loopback test (offline) 0
> phy loopback test (offline) 0
> ext loopback test (offline) 0
> interrupt test(offline) 0
>
> Please check your test environment.
>
>> Thanks,
>> Doug
>>
>> On 08/11/2015 03:31 PM, Michael Chan wrote:
>>> On Tue, 2015-08-11 at 14:24 -0500, Douglas Miller wrote:
>>>> Yes, the wrap plugs are the loopback cables/plugs.
>>>>
>>>> It is my understanding that the offline tests do not require
>>>> anything to be plugged into the ports, as they do not in any way
>>>> touch the external port. They perform an internal loopback test
>>>> which does not depend on any external connection.
>>>
>>> Correct.
>>>
>>>> From what I can tell, the only difference between offline and
>>>> external_lb is that external_lb performs the external loopback
>>>> tests, *in addition to* all the tests done for offline.
>>>
>>> Correct.
>>>
>>>> This would imply that the only tests that depend on anything
>>>> connected to the physical port is external_lb, and there is no
>>>> requirement that the wrap plugs be removed/replaced in order to run
>>>> offline tests.
>>>
>>> When you do external loopback test, we skip the link test because
>>> you no longer have normal connection to the network. You now use a
>>> special loopback cable, which will fail the link up test because the
>>> link up test assumes connection to the network using normal cable.
>>>
>>>> In the case I was debugging, wrap plugs were installed because the
>>>> ports were, later, being tested in an external loopback way.
>>>>
>>>> What I am observing is that it takes about 20 seconds for the
>>>> kernel to declare that the link is up, after running the offline or
>>>> external_lb test. In the case of offline I cannot run the test
>>>> again until the kernel declares the link up. In the case of
>>>> external_lb I can run the test again immediately and it passes.
>>>
>>> As stated earlier, because we skip the link test when we are
>>> performing external_lb. So, you should always do
Re: [PATCH] mac80211_hwsim: unregister genetlink family properly
On Fri, 2015-08-07 at 16:54 +0800, Su Kang Yin wrote:
> During hwsim_init_netlink(), we should call genl_unregister_family() if
> failed on netlink_register_notifier() since the genetlink is already
> registered.

Applied.

johannes
--
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 2/2] ppp: implement x-netns support
Let packets move from one netns to the other at PPP encapsulation and
decapsulation time.

PPP units and channels remain in the netns in which they were
originally created. Only the net_device may move to a different
namespace. Cross netns handling is thus transparent to lower PPP
layers (PPPoE, L2TP, etc.).

PPP devices are automatically unregistered when their netns gets
removed. So read() and poll() on the unit file descriptor will
respectively receive EOF and POLLHUP. Channels aren't affected.

Signed-off-by: Guillaume Nault g.na...@alphalink.fr
---
 drivers/net/ppp/ppp_generic.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 1dc478a..bdde5d8 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -283,6 +283,8 @@ static int unit_set(struct idr *p, void *ptr, int n);
 static void unit_put(struct idr *p, int n);
 static void *unit_find(struct idr *p, int n);

+static const struct net_device_ops ppp_netdev_ops;
+
 static struct class *ppp_class;

 /* per net-namespace data */
@@ -919,13 +921,22 @@ static __net_init int ppp_init_net(struct net *net)
 static __net_exit void ppp_exit_net(struct net *net)
 {
 	struct ppp_net *pn = net_generic(net, ppp_net_id);
+	struct net_device *dev;
+	struct net_device *aux;
 	struct ppp *ppp;
 	LIST_HEAD(list);
 	int id;

 	rtnl_lock();
+	for_each_netdev_safe(net, dev, aux) {
+		if (dev->netdev_ops == &ppp_netdev_ops)
+			unregister_netdevice_queue(dev, &list);
+	}
+
 	idr_for_each_entry(&pn->units_idr, ppp, id) {
-		unregister_netdevice_queue(ppp->dev, &list);
+		/* Skip devices already unregistered by previous loop */
+		if (!net_eq(dev_net(ppp->dev), net))
+			unregister_netdevice_queue(ppp->dev, &list);
 	}

 	unregister_netdevice_many(&list);
@@ -1018,6 +1029,7 @@ ppp_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	proto = npindex_to_proto[npi];
 	put_unaligned_be16(proto, pp);

+	skb_scrub_packet(skb, !net_eq(ppp->ppp_net, dev_net(dev)));
 	skb_queue_tail(&ppp->file.xq, skb);
 	ppp_xmit_process(ppp);
 	return NETDEV_TX_OK;
@@ -1138,7 +1150,6 @@ static void ppp_setup(struct net_device *dev)
 	dev->tx_queue_len = 3;
 	dev->type = ARPHRD_PPP;
 	dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
-	dev->features |= NETIF_F_NETNS_LOCAL;
 	netif_keep_dst(dev);
 }

@@ -1901,6 +1912,8 @@ ppp_receive_nonmp_frame(struct ppp *ppp, struct sk_buff *skb)
 			skb->dev = ppp->dev;
 			skb->protocol = htons(npindex_to_ethertype[npi]);
 			skb_reset_mac_header(skb);
+			skb_scrub_packet(skb, !net_eq(ppp->ppp_net,
+						      dev_net(ppp->dev)));
 			netif_rx(skb);
 		}
 	}
--
2.5.0
[PATCH net-next 1/2] ppp: fix device unregistration upon netns deletion
PPP devices may get automatically unregistered when their network
namespace is getting removed. This happens if the ppp control plane
daemon (e.g. pppd) exits while it is the last user of this namespace.

This leads to several races:

  * ppp_exit_net() may destroy the per namespace idr (pn->units_idr)
    before all file descriptors were released. Successive ppp_release()
    calls may then cleanup PPP devices with ppp_shutdown_interface() and
    try to use the already destroyed idr.

  * Automatic device unregistration may also happen before the
    ppp_release() call for that device gets executed. Once called on
    the file owning the device, ppp_release() will then clean it up and
    try to unregister it a second time.

To fix these issues, operations defined in ppp_shutdown_interface() are
moved to the PPP device's ndo_uninit() callback. This allows PPP
devices to be properly cleaned up by unregister_netdev() and friends.
So checking for ppp->owner is now an accurate test to decide if a PPP
device should be unregistered.

Setting ppp->owner is done in ppp_create_interface(), before device
registration, in order to avoid unprotected modification of this field.

Finally ppp_exit_net() now starts by unregistering all remaining PPP
devices to ensure that none will get unregistered after the call to
idr_destroy().
Signed-off-by: Guillaume Nault g.na...@alphalink.fr
---
 drivers/net/ppp/ppp_generic.c | 79 +++
 1 file changed, 43 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 9d15566..1dc478a 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -269,9 +269,9 @@ static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
 static void ppp_ccp_closed(struct ppp *ppp);
 static struct compressor *find_compressor(int type);
 static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st);
-static struct ppp *ppp_create_interface(struct net *net, int unit, int *retp);
+static struct ppp *ppp_create_interface(struct net *net, int unit,
+					struct file *file, int *retp);
 static void init_ppp_file(struct ppp_file *pf, int kind);
-static void ppp_shutdown_interface(struct ppp *ppp);
 static void ppp_destroy_interface(struct ppp *ppp);
 static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit);
 static struct channel *ppp_find_channel(struct ppp_net *pn, int unit);
@@ -392,8 +392,10 @@ static int ppp_release(struct inode *unused, struct file *file)
 		file->private_data = NULL;
 		if (pf->kind == INTERFACE) {
 			ppp = PF_TO_PPP(pf);
+			rtnl_lock();
 			if (file == ppp->owner)
-				ppp_shutdown_interface(ppp);
+				unregister_netdevice(ppp->dev);
+			rtnl_unlock();
 		}
 		if (atomic_dec_and_test(&pf->refcnt)) {
 			switch (pf->kind) {
@@ -593,8 +595,10 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		mutex_lock(&ppp_mutex);
 		if (pf->kind == INTERFACE) {
 			ppp = PF_TO_PPP(pf);
+			rtnl_lock();
 			if (file == ppp->owner)
-				ppp_shutdown_interface(ppp);
+				unregister_netdevice(ppp->dev);
+			rtnl_unlock();
 		}
 		if (atomic_long_read(&file->f_count) < 2) {
 			ppp_release(NULL, file);
@@ -838,11 +842,10 @@ static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
 		/* Create a new ppp unit */
 		if (get_user(unit, p))
 			break;
-		ppp = ppp_create_interface(net, unit, &err);
+		ppp = ppp_create_interface(net, unit, file, &err);
 		if (!ppp)
 			break;
 		file->private_data = &ppp->file;
-		ppp->owner = file;
 		err = -EFAULT;
 		if (put_user(ppp->file.index, p))
 			break;
@@ -916,6 +919,17 @@ static __net_init int ppp_init_net(struct net *net)
 static __net_exit void ppp_exit_net(struct net *net)
 {
 	struct ppp_net *pn = net_generic(net, ppp_net_id);
+	struct ppp *ppp;
+	LIST_HEAD(list);
+	int id;
+
+	rtnl_lock();
+	idr_for_each_entry(&pn->units_idr, ppp, id) {
+		unregister_netdevice_queue(ppp->dev, &list);
+	}
+
+	unregister_netdevice_many(&list);
+	rtnl_unlock();

 	idr_destroy(&pn->units_idr);
 }
@@ -1088,8 +1102,28 @@ static int ppp_dev_init(struct net_device *dev)
 	return 0;
 }

+static void ppp_dev_uninit(struct net_device *dev)
+{
+	struct ppp *ppp = netdev_priv(dev);
+	struct ppp_net *pn =
[PATCH net-next 0/2] ppp: implement x-netns support
This series allows PPP devices to reside in a different netns from the
PPP unit/channels. Packets only cross netns boundaries when they're
transmitted between the net_device and the PPP unit (units and channels
always remain in their creation namespace). So only PPP units need to
handle cross namespace operations. Channels and lower layer protocols
aren't affected.

Patch #1 is a bug fix for an existing namespace deletion bug and has
been separately sent to net.
Patch #2 is the actual x-netns implementation.

Guillaume Nault (2):
  ppp: fix device unregistration upon netns deletion
  ppp: implement x-netns support

 drivers/net/ppp/ppp_generic.c | 94 ++-
 1 file changed, 57 insertions(+), 37 deletions(-)

--
2.5.0
Re: ath9k_htc: wmi: match wait_for_completion_timeout return type
> Return type of wait_for_completion_timeout is unsigned long not int.
> As time_left is exclusively used for wait_for_completion_timeout here its
> type is simply changed to unsigned long.
>
> API conformance testing for completions with coccinelle spatches are being
> used to locate API usage inconsistencies:
> ./drivers/net/wireless/ath/ath9k/wmi.c:331
> 	int return assigned to unsigned long
>
> Patch was compile tested with x86_64_defconfig + CONFIG_ATH_CARDS=m,
> CONFIG_ATH9K_HTC=m
>
> Patch is against 4.1-rc3 (localversion-next is -next-20150514)
>
> Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
Re: ath9k: match wait_for_completion_timeout return type
> Return type of wait_for_completion_timeout is unsigned long not int.
> As time_left is exclusively used for wait_for_completion_timeout here its
> type is simply changed to unsigned long.
>
> API conformance testing for completions with coccinelle spatches are being
> used to locate API usage inconsistencies:
> ./drivers/net/wireless/ath/ath9k/link.c:197
> 	int return assigned to unsigned long
>
> Patch was compile tested with x86_64_defconfig + CONFIG_ATH_CARDS=m,
>
> Patch is against 4.1-rc3 (localversion-next is -next-20150514)
>
> Signed-off-by: Nicholas Mc Guire hof...@osadl.org

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
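For readers following the two ath9k patches above, here is a minimal userspace sketch of the pattern they settle on: keep the result in an unsigned long and treat 0 as "timed out". It assumes the usual completion convention (0 on timeout, otherwise at least 1). fake_wait_for_completion_timeout() and command_timed_out() are hypothetical stand-ins, not the driver code.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the kernel's wait_for_completion_timeout():
 * returns 0 on timeout, otherwise the time left (at least 1).
 */
static unsigned long fake_wait_for_completion_timeout(bool completed,
						      unsigned long timeout)
{
	if (!completed)
		return 0;
	return timeout ? timeout : 1;
}

/* Keep the result in an unsigned long, as the patches do, and test for 0. */
static bool command_timed_out(bool completed, unsigned long timeout)
{
	unsigned long time_left;

	time_left = fake_wait_for_completion_timeout(completed, timeout);
	return time_left == 0;
}
```

Since the value is only ever compared against zero, matching the declared return type costs nothing and avoids a narrowing conversion into int.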
Re: [net-next PATCH 1/3] net: make default tx_queue_len configurable
On Thu, 13 Aug 2015 03:13:40 +0200 Phil Sutter p...@nwl.cc wrote:
> On Tue, Aug 11, 2015 at 06:13:49PM -0700, Alexei Starovoitov wrote:
>> In general 'changing the default' may be an acceptable thing, but then
>> it needs to strongly justified. How much performance does it bring?
>
> A quick test on my local VM with veth and netperf (netserver and veth
> peer in different netns) I see an increase of about 5% of throughput
> when using noqueue instead of the default pfifo_fast.

Good that you can show 5% improvement with a single netperf flow. We
are saving approx 6 atomic operations avoiding the qdisc code path.

This fixes a scalability issue with veth. Thus, the real performance
boost will happen with multiple flows and multiple CPU cores in action.
You can try with a multi core VM and use super_netperf.

https://github.com/borkmann/stuff/blob/master/super_netperf

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
[PATCH net] gianfar: Restore link state settings after MAC reset
There are some MAC registers that need to be kept in sync with the link
state parameters, see adjust_link(). However, after a MAC soft reset
default values for these registers are assumed. In some cases
(excepting if down/ if up for example) adjust_link() does not see that
these values were reset to default because the priv->old* link
parameters were left unchanged. So, reset the priv->old* link params as
well during a MAC reset to let adjust_link() restore the MAC link
settings to the actual link state values.

Fixes following case, for example:
Setting link to 100M, changing MTU (implies MAC reset), link state
remains unchanged to 100M but MAC registers were reset to default (1G)
breaking the connectivity w/ the PHY. Closing and re-opening the
interface would restore the MAC link parameters to the correct values.

Signed-off-by: Claudiu Manoil claudiu.man...@freescale.com
---
 drivers/net/ethernet/freescale/gianfar.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 2b7610f..10b3 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -2102,6 +2102,11 @@ int startup_gfar(struct net_device *ndev)
 	/* Start Rx/Tx DMA and enable the interrupts */
 	gfar_start(priv);

+	/* force link state update after mac reset */
+	priv->oldlink = 0;
+	priv->oldspeed = 0;
+	priv->oldduplex = -1;
+
 	phy_start(priv->phydev);

 	enable_napi(priv);
--
1.7.11.7
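The stale-cache problem described in the gianfar patch above can be illustrated with a small userspace model. This is only a sketch under assumptions: struct mac_model, mac_reset() and adjust_link_model() are hypothetical names, and a single speed field stands in for the link/speed/duplex triple the driver tracks.

```c
#include <stdbool.h>

#define MAC_DEFAULT_SPEED 1000	/* assumed reset default (1G), as in the commit text */

/* Hypothetical model: one "register" plus the driver's cached copy. */
struct mac_model {
	int hw_speed;	/* what the MAC registers currently hold */
	int oldspeed;	/* value the driver believes it last programmed */
};

/* A soft reset puts the register back to its default. */
static void mac_reset(struct mac_model *m, bool invalidate_cache)
{
	m->hw_speed = MAC_DEFAULT_SPEED;
	if (invalidate_cache)
		m->oldspeed = 0;	/* the idea of the fix: forget the cached state */
}

/* Like adjust_link(): only touch the hardware when the PHY state changed. */
static void adjust_link_model(struct mac_model *m, int phy_speed)
{
	if (phy_speed != m->oldspeed) {
		m->hw_speed = phy_speed;
		m->oldspeed = phy_speed;
	}
}
```

Without the invalidation, a reset followed by adjust_link_model(&m, 100) leaves hw_speed at the 1G default because the cached value still says 100. With it, the register is reprogrammed.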
Re: [PATCH net-next 0/3] mpls: multipath support
On 13/08/15 03:07, roopa wrote:
> On 8/12/15, 10:30 AM, Robert Shearman wrote:
>> On 11/08/15 22:45, Roopa Prabhu wrote:
>>> From: Roopa Prabhu ro...@cumulusnetworks.com
>>>
>>> This patch series adds multipath support to mpls routes.
>>>
>>> resembles ipv4 multipath support. The multipath route nexthop
>>> selection algorithm is the same code as in ipv4 fib code. I understand
>>> that the multipath algorithm in ipv4 is undergoing some changes and
>>> will move mpls to similar algo if applicable once those get merged.
>>
>> Is it necessary for the mpls patch selection algorithm to closely
>> resemble the ipv4 one?
>
> No, It is not necessary. I picked that because it was already there.
> And I see that ipv4 is also getting some new multipath algorithms
> (https://marc.info/?l=linux-api&m=143457208315573&w=2).
> I wanted to move to the new RT_MP infra if that becomes applicable in
> the future.

The MPLS code doesn't have the binary compatibility requirement that the
IPv4 path does, so there isn't so much of a need for the algorithm to be
configurable, provided the default is reasonable. Unless you have a use
case in mind that would particularly suited to the round-robin
algorithm?

>> A flow based algorithm would be much better for traffic that is
>> sensitive to re-ordering (e.g TCP, L2VPN) and IMHO we should do this
>> from the start for MPLS.
>>
>> I've also been looking at implementing this functionality. I've got a
>> set of patches for this that I can send if you'd like.
>
> Definitely. But, It seems like you can also submit incremental patches
> to mine. You can replace the current algo with a hash based with your
> patches.

With a flow-based algorithm if there's no need to support weighted paths
then there's no need to iterate through the nexthops to work out which
one should be used and, therefore, there is a performance benefit.

So my patches implement a flow-based path selection without support for
weighted paths. This is similar way to how the IPv6 path selection
works. The user can still do UCMP with this mechanism, but they have to
add the same nexthop multiple times. I don't know if this trade-off is
worth it, but the benefit is that we can always add support for weighted
paths in the future, whereas removing support for weighted paths would
be harder due to compatibility concerns.

Therefore, if I rebased my patches on top of yours I would be removing
code managing the weighting that you will have just added. Not sure if
that is desirable.

> If that does not work for you and if you want me to merge with this
> series that works too.

I think that would work better. I'll send you a patch against your
current series.

Thanks,
Rob
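To make the trade-off in the reply above concrete, here is a minimal sketch of flow-based nexthop selection without weights, where unequal-cost sharing is obtained by listing a nexthop more than once. Everything here is an assumption for illustration: nh_table, select_nexthop() and count_hits() are hypothetical names, and the flow hash is taken as given rather than computed from packet headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nexthop table: listing id 1 twice gives it twice the share. */
static const int nh_table[] = { 1, 1, 2 };
#define NH_COUNT (sizeof(nh_table) / sizeof(nh_table[0]))

/* Flow-based choice: one modulo on the flow hash, no walk over weights. */
static int select_nexthop(uint32_t flow_hash)
{
	return nh_table[flow_hash % NH_COUNT];
}

/* Count how many of the hashes 0..n-1 land on the given nexthop id. */
static unsigned int count_hits(int id, uint32_t n)
{
	unsigned int hits = 0;
	uint32_t h;

	for (h = 0; h < n; h++)
		if (select_nexthop(h) == id)
			hits++;
	return hits;
}
```

A given flow hash always maps to the same entry, which is what keeps packets of one flow in order. The lookup cost does not depend on how many nexthops carry weight.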
Re: [PATCH net-next 1/3] mpls: move mpls_route nexthop fields to a new nhlfe struct
On 13/08/15 04:16, roopa wrote:
> On 8/12/15, 12:15 PM, Robert Shearman wrote:
>> On 11/08/15 22:45, Roopa Prabhu wrote:
>>> From: Roopa Prabhu ro...@cumulusnetworks.com
>>>
>>> moves mpls_route nexthop fields to a new mpls_nhlfe
>>> struct. mpls_nhlfe represents a mpls nexthop label forwarding entry.
>>> It prepares mpls route structure for multipath support.
>>>
>>> In the process moves mpls_route structure into internal.h.
>>
>> Is there a requirement for moving this and the new datastructures into
>> internal.h? I may have missed it, but I don't see any dependency on
>> this in this patch series.
>
> No dependency really. In my initial implementation of iptunnels I had
> some shared code and it had been in internal.h since then. i don't
> share any of this with iptunnels now. But, if you see patch 3/3, there
> is a lot more macros I add with struct nhlfe etc and it is cleaner to
> move all this to a header file than keeping it in the .c file.

Ok, I have no strong preference.

>>> Moves some of the code from mpls_route_add into a separate mpls nhlfe
>>> build function. changed mpls_rt_alloc to take number of nexthops as
>>> argument.
>>>
>>> A mpls route can point to multiple mpls_nhlfe. This patch
>>> does not support multipath yet, hence the rest of the changes
>>> assume that a mpls route points to a single mpls_nhlfe
>>>
>>> Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
>>> ---
>>>  net/mpls/af_mpls.c | 225 ---
>>>  net/mpls/internal.h | 35
>>>  2 files changed, 158 insertions(+), 102 deletions(-)
>>>
>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>> index 8c5707d..cf86e9d 100644
>>> --- a/net/mpls/af_mpls.c
>>> +++ b/net/mpls/af_mpls.c
>>> @@ -21,35 +21,6 @@
>>>  #endif
>>>  #include "internal.h"
>>>
>>> -#define LABEL_NOT_SPECIFIED (1<<20)
>>> -#define MAX_NEW_LABELS 2
>>> -
>>> -/* This maximum ha length copied from the definition of struct neighbour */
>>> -#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
>>> -
>>> -enum mpls_payload_type {
>>> -	MPT_UNSPEC, /* IPv4 or IPv6 */
>>> -	MPT_IPV4 = 4,
>>> -	MPT_IPV6 = 6,
>>> -
>>> -	/* Other types not implemented:
>>> -	 *  - Pseudo-wire with or without control word (RFC4385)
>>> -	 *  - GAL (RFC5586)
>>> -	 */
>>> -};
>>> -
>>> -struct mpls_route { /* next hop label forwarding entry */
>>> -	struct net_device __rcu *rt_dev;
>>> -	struct rcu_head		rt_rcu;
>>> -	u32			rt_label[MAX_NEW_LABELS];
>>> -	u8			rt_protocol; /* routing protocol that set this entry */
>>> -	u8			rt_payload_type;
>>> -	u8			rt_labels;
>>> -	u8			rt_via_alen;
>>> -	u8			rt_via_table;
>>> -	u8			rt_via[0];
>>> -};
>>> -
>>>  static int zero = 0;
>>>  static int label_limit = (1 << 20) - 1;
>>
>> ...
>>
>>> @@ -281,13 +254,15 @@ struct mpls_route_config {
>>>  	struct nl_info		rc_nlinfo;
>>>  };
>>>
>>> -static struct mpls_route *mpls_rt_alloc(size_t alen)
>>> +static struct mpls_route *mpls_rt_alloc(int num_nh)
>>>  {
>>>  	struct mpls_route *rt;
>>>
>>> -	rt = kzalloc(sizeof(*rt) + alen, GFP_KERNEL);
>>> +	rt = kzalloc(sizeof(*rt) + (num_nh * sizeof(struct mpls_nhlfe)),
>>
>> How about this instead:
>> 	offsetof(typeof(*rt), rt_nh[num_nh])
>> ?
>>
>> That way, you don't need to write out the type of rt_nh here.
>
> I don't mind, but i followed existing convention for this (especially
> the fib code). would prefer keeping it the current way.

I don't think we have to follow the ipv4 convention here, but again I
have no strong preference.

>>> +		     GFP_KERNEL);
>>>  	if (rt)
>>> -		rt->rt_via_alen = alen;
>>> +		rt->rt_nhn = num_nh;
>>> +
>>>  	return rt;
>>>  }
>
> Thanks for the review.

Thank you for implementing this functionality.

Rob
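The two allocation-size idioms discussed in the review above can be compared in a small userspace sketch. The struct layouts are hypothetical stand-ins (struct route_model and struct nhlfe_model are not the kernel definitions). The offsetof form below avoids a runtime array index inside offsetof(), which, as far as I know, relies on a compiler extension. It still does not name the element type, which was the point of the suggestion.

```c
#include <stddef.h>

/* Hypothetical stand-ins for mpls_nhlfe and mpls_route. */
struct nhlfe_model {
	int ifindex;
	unsigned int label;
};

struct route_model {
	unsigned char rt_nhn;
	struct nhlfe_model rt_nh[];	/* flexible array member */
};

/* Convention used in the patch: header size plus n elements, type written out. */
static size_t size_by_sizeof(size_t num_nh)
{
	return sizeof(struct route_model) + num_nh * sizeof(struct nhlfe_model);
}

/* Offset-based variant: the element size is derived from the member itself. */
static size_t size_by_offsetof(size_t num_nh)
{
	return offsetof(struct route_model, rt_nh) +
	       num_nh * sizeof(((struct route_model *)0)->rt_nh[0]);
}
```

The two results agree whenever the struct has no trailing padding after the array offset. Otherwise the sizeof form is the larger of the two, which is harmless for an allocation.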
Re: Problem with fragmented packets on tun/tap interface
On Thu, 2015-08-13 at 12:52 +0530, Prashant Upadhyaya wrote:
> Hi,
>
> I think I have a clue to the root cause of my issue, but I do not know
> a solution. Let me describe what I think is the problem.
>
> Fragmented packets enter into the kernel through eth0 and the kernel
> starts assembling them. Simultaneously, my packet socket implementation
> also injects the very same packets into the kernel via the tap. The
> kernel sees them as overlapped packets during assembly and drops the
> packets injected via the tap. Eventually when the assembly gets
> complete inside kernel for all the packets which entered via eth0, the
> whole packet gets dropped due to the iptables rules that I have set on
> eth0. So naturally there is no response to the bigger ping, because
> everything got dropped one way or the other.
>
> When I do introduce the delays (and it turns out that the delay that
> matters is when injecting via tap), the kernel has already completed
> the assembly of the packets via eth0 (during the delay I introduce for
> submission on tap), and then the submission via tap works well because
> it undergoes a fresh assembly (and ofcourse it does not get dropped
> because iptables drop rule is only on eth0)
>
> Now then, the question is -- how do I prevent the kernel from trying to
> assemble the packets arriving on eth0 and drop them right away even
> before assembly is attempted. This way the same packet injected via the
> tap would be the only one undergoing assembly and hopefully it would
> work.

Nice theory !

What kind of iptables rule do you have to drop packets coming on eth0 ?

Have you tried to install this rule in raw table, PREROUTING hook ?

This should work, because the defrag is attempted from
ip_local_deliver() [ after raw table has given its verdict] , not from
ip_rcv().

iptables -t raw -I PREROUTING -i eth0 -j DROP
Re: [PATCH 1/2] Add a matching set of device_ functions for determining mac/phy
Hello Robin,

On 08/13/2015 06:57 AM, Robin Murphy wrote:
>> +static void *device_get_mac_addr(struct device *dev,
>> +				 const char *name, char *addr,
>> +				 int alen)
>> +{
>> +	int ret = device_property_read_u8_array(dev, name, addr, alen);
>> +
>> +	if (ret == 0 && is_valid_ether_addr(addr))
>> +		return addr;
>> +	return NULL;
>> +}
>
> Not sure I understand the logic here - return the same thing we were
> given if we updated it, or null if we didn't. It's only indicating
> success/failure (the caller can perfectly well cast its own buffer to
> a void * if it needs to), so why wouldn't you just return a normal int
> error code?

No particular reason, other than initially I was trying to keep the
function as similar as possible to the one in of_net. AKA copy paste
job. I can convert the return types, but I was trying for a simple
function rename. That way the users of the of version could be
converted with relative ease, and the drivers which invented their own
version of these functions could be changed to use this instead.

Of course, that plan took a blow, when I added the addr/alen
parameters.

Same thing applies for the other function.
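As an illustration of the alternative raised in the review above, here is a userspace sketch of a MAC lookup helper that reports success or failure through an int error code instead of echoing the caller's buffer back. It is not the proposed kernel function: get_mac_addr_model() and mac_is_valid() are hypothetical, the property read is replaced by a plain byte array, and the particular errno values are an arbitrary choice.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Same rule as is_valid_ether_addr(): not multicast and not all zeroes. */
static bool mac_is_valid(const uint8_t *a)
{
	static const uint8_t zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/*
 * Hypothetical helper: copy the property into addr and return 0,
 * or return a negative errno and leave addr untouched.
 */
static int get_mac_addr_model(const uint8_t *prop, uint8_t *addr, size_t alen)
{
	if (!prop || !addr || alen != ETH_ALEN)
		return -EINVAL;
	if (!mac_is_valid(prop))
		return -EADDRNOTAVAIL;
	memcpy(addr, prop, alen);
	return 0;
}
```

The caller already owns the buffer, so a plain status code carries all the information the pointer return did, and it can also say why the lookup failed.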
[PATCH net-next v2 1/2] net: track link status of ipv6 nexthops
Add support to track current link status of ipv6 nexthops to match
recent changes that added support for ipv4 nexthops. This takes a
simple approach to track linkdown status for next-hops and simply
checks the dev for the dst entry and sets the proper flags to be used
in the netlink message.

v2: drop use of rt6i_nhflags since it is not needed right now

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
---
I realize this patch might be a bit different than expected based on
conversations on netdev yesterday, but I got a few off-list
communications that indicated a preference to not expand rt6_info at
this time -- despite the fact that expansion will likely be needed for
switchdev offload of ipv6 fib entries in the near future.

 net/ipv6/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 54fccf0..26b51e1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2885,6 +2885,8 @@ static int rt6_fill_node(struct net *net,
 	else
 		rtm->rtm_type = RTN_UNICAST;
 	rtm->rtm_flags = 0;
+	if (!netif_carrier_ok(rt->dst.dev))
+		rtm->rtm_flags |= RTNH_F_LINKDOWN;
 	rtm->rtm_scope = RT_SCOPE_UNIVERSE;
 	rtm->rtm_protocol = rt->rt6i_protocol;
 	if (rt->rt6i_flags & RTF_DYNAMIC)
--
1.9.3
[PATCH net-next v2 2/2] net: ipv6 sysctl option to ignore routes when nexthop link is down
Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link. The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:

net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, not only will link status be reported
to userspace, but an indication that a nexthop is dead and will not be
used is also reported.

1000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
1000::/8 via 8000::2 dev p8p1 metric 1024 pref medium
7000::/8 dev p7p1 proto kernel metric 256 dead linkdown pref medium
8000::/8 dev p8p1 proto kernel metric 256 pref medium
9000::/8 via 8000::2 dev p8p1 metric 2048 pref medium
9000::/8 via 7000::2 dev p7p1 metric 1024 dead linkdown pref medium
fe80::/64 dev p7p1 proto kernel metric 256 dead linkdown pref medium
fe80::/64 dev p8p1 proto kernel metric 256 pref medium

This also adds devconf support and notification when sysctl values
change.
v2: drop use of rt6i_nhflags since it is not needed right now

Signed-off-by: Andy Gospodarek go...@cumulusnetworks.com
Signed-off-by: Dinesh Dutt dd...@cumulusnetworks.com
---
 include/linux/ipv6.h | 1 +
 include/uapi/linux/ipv6.h | 1 +
 net/ipv6/addrconf.c | 105 +-
 net/ipv6/route.c | 11 -
 4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index cb9dcad..f1f32af 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -31,6 +31,7 @@ struct ipv6_devconf {
 	__s32		accept_ra_defrtr;
 	__s32		accept_ra_min_hop_limit;
 	__s32		accept_ra_pinfo;
+	__s32		ignore_routes_with_linkdown;
 #ifdef CONFIG_IPV6_ROUTER_PREF
 	__s32		accept_ra_rtr_pref;
 	__s32		rtr_probe_interval;
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 80f3b74..38b4fef 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -173,6 +173,7 @@ enum {
 	DEVCONF_STABLE_SECRET,
 	DEVCONF_USE_OIF_ADDRS_ONLY,
 	DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT,
+	DEVCONF_IGNORE_ROUTES_WITH_LINKDOWN,
 	DEVCONF_MAX
 };

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 53e3a9d..5dfbac7 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -214,6 +214,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
 		.initialized = false,
 	},
 	.use_oif_addrs_only = 0,
+	.ignore_routes_with_linkdown = 0,
 };

 static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
@@ -257,6 +258,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 		.initialized = false,
 	},
 	.use_oif_addrs_only = 0,
+	.ignore_routes_with_linkdown = 0,
 };

 /* Check if a valid qdisc is available */
@@ -472,6 +474,9 @@ static int inet6_netconf_msgsize_devconf(int type)
 	if (type == -1 || type == NETCONFA_PROXY_NEIGH)
 		size += nla_total_size(4);

+	if (type == -1 || type == NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN)
+		size += nla_total_size(4);
+
 	return size;
 }

@@ -508,6 +513,11 @@ static int inet6_netconf_fill_devconf(struct sk_buff *skb, int ifindex,
 	    nla_put_s32(skb, NETCONFA_PROXY_NEIGH, devconf->proxy_ndp) < 0)
 		goto nla_put_failure;

+	if ((type == -1 || type == NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN) &&
+	    nla_put_s32(skb, NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN,
+			devconf->ignore_routes_with_linkdown) < 0)
+		goto nla_put_failure;
+
 	nlmsg_end(skb, nlh);
 	return 0;

@@ -544,6 +554,7 @@ static const struct nla_policy devconf_ipv6_policy[NETCONFA_MAX+1] = {
 	[NETCONFA_IFINDEX]	= { .len = sizeof(int) },
 	[NETCONFA_FORWARDING]	= { .len = sizeof(int) },
 	[NETCONFA_PROXY_NEIGH]	= { .len = sizeof(int) },
+	[NETCONFA_IGNORE_ROUTES_WITH_LINKDOWN]	= { .len = sizeof(int) },
 };

 static int inet6_netconf_get_devconf(struct sk_buff *in_skb,
@@ -766,6 +777,63 @@ static int addrconf_fixup_forwarding(struct ctl_table *table, int *p, int newf)
 		rt6_purge_dflt_routers(net);
 	return 1;
 }
+
+static void addrconf_linkdown_change(struct net *net, __s32 newf)
+{
+	struct net_device *dev;
+	struct inet6_dev *idev;
+
+	for_each_netdev(net, dev) {
+		idev = __in6_dev_get(dev);
+		if (idev) {
+			int changed = (!idev->cnf.ignore_routes_with_linkdown) ^ (!newf);
+
+			idev->cnf.ignore_routes_with_linkdown
RE: [PATCH] IGMP: Inhibit reports for local multicast groups
Hi David

Thanks for taking the time to review and comment. This is my first
upstream request so please forgive any ignorance on my part. I have
added a new proposed commit wording below with a view to agreeing the
content before resubmitting the patch. I hope it is sufficient to
address your concerns.

IGMP: Inhibit reports for local multicast groups

The range of addresses between 224.0.0.0 and 224.0.0.255 inclusive, is
reserved for the use of routing protocols and other low-level topology
discovery or maintenance protocols, such as gateway discovery and group
membership reporting. Multicast routers should not forward any
multicast datagram with destination addresses in this range, regardless
of its TTL.

Currently, IGMP reports are generated for this reserved range of
addresses even though a router will ignore this information since it
has no purpose. However, the presence of reserved group addresses in an
IGMP membership report uses up network bandwidth and can also obscure
addresses of interest when inspecting membership reports using packet
inspection or debug messages.

IGMP reports for local multicast groups can now be inhibited by means
of a system control variable (setting the value to zero). To retain
backwards compatibility the previous behaviour is retained by default
on system boot.

Signed-off-by: Philip Downey pdow...@brocade.com

Regards
Philip

-----Original Message-----
From: David Miller [mailto:da...@davemloft.net]
Sent: Thursday, August 13, 2015 12:45 AM
To: Philip Downey
Cc: kuz...@ms2.inr.ac.ru; jmor...@namei.org; yoshf...@linux-ipv6.org; ka...@trash.net; linux-ker...@vger.kernel.org; netdev@vger.kernel.org
Subject: Re: [PATCH] IGMP: Inhibit reports for local multicast groups

From: Philip Downey pdow...@brocade.com
Date: Wed, 12 Aug 2015 17:13:53 +0100

> IGMP reports are generated for link local multicast groups
> (224.0.0.1 - 224.0.0.255) used by the routing protocols such as RIP,
> OSPF etc. In general routers do not generate reports for local
> multicast groups.
>
> IGMP reports for local multicast groups can now be inhibited by means
> of a system control variable (setting the value to zero). To retain
> backwards compatibility the previous behaviour is retained by default
> on system boot.
>
> Signed-off-by: Philip Downey pdow...@brocade.com

I'm always hesitant to apply patches like this.

I can't even understand from your explanation:

1) what about local reporting behavior is so bad

2) why you want to inhibit them at all

For example, this:

	In general routers do not generate reports for local multicast groups.

Doesn't tell me anything.

You need to go into more detail about this, and explain the situation
sufficiently.
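For reference, the reserved range named in the proposed commit text (224.0.0.0 to 224.0.0.255) is a single /24, so membership can be tested with one mask. The following is a userspace sketch only: addresses are taken in host byte order, and is_local_multicast() and should_send_report() are hypothetical names. The thread does not name the actual control variable, so the report_local parameter is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for 224.0.0.0 through 224.0.0.255 (address in host byte order). */
static bool is_local_multicast(uint32_t addr)
{
	return (addr & 0xffffff00u) == 0xe0000000u;
}

/*
 * Hypothetical policy check: report_local models the control variable
 * from the commit text, where zero inhibits reports for the local range.
 */
static bool should_send_report(uint32_t group, int report_local)
{
	if (is_local_multicast(group) && !report_local)
		return false;
	return true;
}
```

With report_local left at a non-zero default, every group is reported as before. Setting it to zero suppresses only the groups inside the reserved /24.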
Re: [PATCH] mm: make page pfmemalloc check more robust
On Thu, 2015-08-13 at 11:13 +0200, Vlastimil Babka wrote:
> Given that this apparently isn't the first case of this localhost
> issue, I wonder if network code should just clear skb->pfmemalloc
> during send (or maybe just send over localhost). That would be
> probably easier than distinguish the __skb_fill_page_desc() callers
> for send vs receive.

Would this still be needed after this patch ?

It is sad we do not have a SNMP counter to at least count how often we
drop skb because pfmemalloc is set.

I'll provide such a patch.
Re: [net-next PATCH 1/3] net: make default tx_queue_len configurable
On Thu, Aug 13, 2015 at 03:10:33PM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 13 Aug 2015 03:13:40 +0200 Phil Sutter p...@nwl.cc wrote:
> > On Tue, Aug 11, 2015 at 06:13:49PM -0700, Alexei Starovoitov wrote:
> > > In general 'changing the default' may be an acceptable thing, but then it needs to be strongly justified. How much performance does it bring?
> >
> > In a quick test on my local VM with veth and netperf (netserver and veth peer in different netns) I see an increase of about 5% in throughput when using noqueue instead of the default pfifo_fast.
>
> Good that you can show a 5% improvement with a single netperf flow. We are saving approx 6 atomic operations by avoiding the qdisc code path.
>
> This fixes a scalability issue with veth. Thus, the real performance boost will happen with multiple flows and multiple CPU cores in action. You can try with a multi core VM and use super_netperf.
>
> https://github.com/borkmann/stuff/blob/master/super_netperf

I actually used that on my VM as well, but the difference between a single stream and ten streams in parallel was negligible. To avoid skewing the results, I tested again on a physical system with four cores, ran each benchmark ten times and averaged the results. This showed an increase in throughput of about 35% with a single stream and about 10% with ten streams in parallel. I am not sure, though, why the improvement is bigger in the first case if there really is a scalability problem as you say.

Cheers, Phil
Re: [PATCH 1/1] ipv4: off-by-one in continuation handling in /proc/net/route
On Thu, 2015-08-13 at 11:21 +0100, Andy Whitcroft wrote:
> When generating /proc/net/route we emit a header followed by a line for each route. When a short read is performed we will restart this process based on the open file descriptor. When calculating the start point we fail to take into account that the 0th entry is the header. This leads us to skip the first entry when doing a continuation read.
>
> This can be easily seen with the comparison below:
>
>     while read l; do echo $l; done </proc/net/route >A
>     cat /proc/net/route >B
>     diff -bu A B | grep '^[+-]'
>
> On my example machine I have approximately 10KB of route output. There we see the very first non-title element is lost in the while read case, and an entry around the 8K mark in the cat case:
>
>     +wlan0 02021EAC 0003 0 0 400 0 0 0
>     -tun1 00C0AC0A 0001 0 0 950 00C0 0 0 0
>
> Fix up the off-by-one when reacquiring position on continuation.
>
> BugLink: http://bugs.launchpad.net/bugs/1483440
> Signed-off-by: Andy Whitcroft a...@canonical.com
> ---
>  net/ipv4/fib_trie.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> From code inspection I believe this was introduced by the Fixes below, but I have not tested this to confirm.
>
> Fixes: 8be33e955cb9 (ipv4: off-by-one in continuation handling in /proc/net/route)

You probably meant

Fixes: 8be33e955cb9 (fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf)

CC Alexander for review/comment
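The off-by-one described in the quoted commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the fib_trie code: line_at(), HEADER and the routes array are hypothetical names, and the listing is modelled as position 0 holding the header with route i at position i + 1.

```c
#include <stddef.h>

#define HEADER "Iface\tDestination\tGateway"	/* placeholder header text */

/*
 * Return the text for listing position pos, or NULL past the end.
 * Position 0 is the header, so route i lives at position i + 1.
 */
static const char *line_at(const char *const *routes, size_t nroutes,
			   size_t pos)
{
	if (pos == 0)
		return HEADER;
	if (pos - 1 >= nroutes)
		return NULL;
	/* Indexing with pos instead of pos - 1 would skip one route
	 * every time a read is resumed from a saved position.
	 */
	return routes[pos - 1];
}
```

A reader that stops after the header and resumes at position 1 gets the first route back only because the header slot is subtracted before indexing.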
RE: [PATCH] IGMP: Inhibit reports for local multicast groups
Hi Andrew,

IGMP snooping is designed to prevent hosts on a local network from receiving traffic for a multicast group they have not explicitly joined. Link-Local multicast traffic should not have an IGMP client since it is reserved for routing protocols. One would expect that IGMP snooping needs to ignore local multicast traffic in the reserved range intended for routers since there should be no IGMP client to make join requests.

Regards
Philip

-----Original Message-----
From: Andrew Lunn [mailto:and...@lunn.ch]
Sent: Thursday, August 13, 2015 5:06 PM
To: Philip Downey
Cc: David Miller; kuz...@ms2.inr.ac.ru; jmor...@namei.org; yoshfuji@linux-ipv6.org; ka...@trash.net; linux-ker...@vger.kernel.org; netdev@vger.kernel.org
Subject: Re: [PATCH] IGMP: Inhibit reports for local multicast groups

On Thu, Aug 13, 2015 at 02:48:23PM +, Philip Downey wrote:
> Hi David
>
> Thanks for taking the time to review and comment. This is my first upstream request so please forgive any ignorance on my part. I have added a new proposed commit wording below with a view to agreeing the content before resubmitting the patch. I hope it is sufficient to address your concerns.
>
> IGMP: Inhibit reports for local multicast groups
>
> The range of addresses between 224.0.0.0 and 224.0.0.255 inclusive, is reserved for the use of routing protocols and other low-level topology discovery or maintenance protocols, such as gateway discovery and group membership reporting. Multicast routers should not forward any multicast datagram with destination addresses in this range, regardless of its TTL.
>
> Currently, IGMP reports are generated for this reserved range of addresses even though a router will ignore this information since it has no purpose.

Hi Philip

What about switches which are doing IGMP snooping?

Andrew
[PATCH net-next 3/7] net: dsa: mv88e6xxx: add VLAN Get Next support
Implement the port_pvid_get and vlan_getnext driver functions required to dump VLAN entries from the hardware, with the VTU Get Next operation.

Some functions and structures will be shared with STU operations, since their table formats are similar (e.g. STU data entries are accessible with the same registers as VTU entries, except with an offset of 2).

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6352.c | 2 +
 drivers/net/dsa/mv88e6xxx.c | 138
 drivers/net/dsa/mv88e6xxx.h | 27 +
 3 files changed, 167 insertions(+)

diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index a18f7c8..e6767ce 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -343,6 +343,8 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
 	.port_join_bridge	= mv88e6xxx_join_bridge,
 	.port_leave_bridge	= mv88e6xxx_leave_bridge,
 	.port_stp_update	= mv88e6xxx_port_stp_update,
+	.port_pvid_get		= mv88e6xxx_port_pvid_get,
+	.vlan_getnext		= mv88e6xxx_vlan_getnext,
 	.port_fdb_add		= mv88e6xxx_port_fdb_add,
 	.port_fdb_del		= mv88e6xxx_port_fdb_del,
 	.port_fdb_getnext	= mv88e6xxx_port_fdb_getnext,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 175353a..ecdd9da 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2,6 +2,9 @@
  * net/dsa/mv88e6xxx.c - Marvell 88e6xxx switch chip support
  * Copyright (c) 2008 Marvell Semiconductor
  *
+ * Copyright (c) 2015 CMC Electronics, Inc.
+ *	Added support for VLAN Table Unit operations
+ *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
  * the Free Software Foundation; either version 2 of the License, or
@@ -1182,6 +1185,19 @@ int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int port, u8 state)
 	return 0;
 }

+int mv88e6xxx_port_pvid_get(struct dsa_switch *ds, int port, u16 *pvid)
+{
+	int ret;
+
+	ret = mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_DEFAULT_VLAN);
+	if (ret < 0)
+		return ret;
+
+	*pvid = ret & PORT_DEFAULT_VLAN_MASK;
+
+	return 0;
+}
+
 static int _mv88e6xxx_vtu_wait(struct dsa_switch *ds)
 {
 	return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_VTU_OP,
@@ -1210,6 +1226,128 @@ static int _mv88e6xxx_vtu_stu_flush(struct dsa_switch *ds)
 	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_FLUSH_ALL);
 }

+static int _mv88e6xxx_vtu_stu_data_read(struct dsa_switch *ds,
+					struct mv88e6xxx_vtu_stu_entry *entry,
+					unsigned int nibble_offset)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	u16 regs[3];
+	int i;
+	int ret;
+
+	for (i = 0; i < 3; ++i) {
+		ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+					  GLOBAL_VTU_DATA_0_3 + i);
+		if (ret < 0)
+			return ret;
+
+		regs[i] = ret;
+	}
+
+	for (i = 0; i < ps->num_ports; ++i) {
+		unsigned int shift = (i % 4) * 4 + nibble_offset;
+		u16 reg = regs[i / 4];
+
+		entry->data[i] = (reg >> shift) & GLOBAL_VTU_STU_DATA_MASK;
+	}
+
+	return 0;
+}
+
+static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, u16 vid,
+				  struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	struct mv88e6xxx_vtu_stu_entry next = { 0 };
+	int ret;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID,
+				   vid & GLOBAL_VTU_VID_MASK);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_GET_NEXT);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_VID);
+	if (ret < 0)
+		return ret;
+
+	next.vid = ret & GLOBAL_VTU_VID_MASK;
+	next.valid = !!(ret & GLOBAL_VTU_VID_VALID);
+
+	if (next.valid) {
+		ret = _mv88e6xxx_vtu_stu_data_read(ds, &next, 0);
+		if (ret < 0)
+			return ret;
+
+		if (mv88e6xxx_6097_family(ds) || mv88e6xxx_6165_family(ds) ||
+		    mv88e6xxx_6351_family(ds) || mv88e6xxx_6352_family(ds)) {
+			ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+						  GLOBAL_VTU_FID);
+			if (ret < 0)
+				return ret;
+
+			next.fid = ret & GLOBAL_VTU_FID_MASK;
+
+			ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL,
+
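The per-port unpacking done by the data read helper in this patch can be tried out in isolation. The following is a minimal userspace sketch, assuming three 16-bit data registers holding four ports each (one nibble per port) and a two-bit field mask; port_field() and DATA_MASK are hypothetical names, and the mask value is an assumption rather than something stated in the patch.

```c
#include <stdint.h>

#define DATA_MASK 0x03u	/* assumed width of one per-port field */

/*
 * Extract the field for one port. Each register carries four ports,
 * one nibble per port; nibble_offset selects which half of the nibble
 * is read (0 for VTU member tags, 2 for STU port states, following
 * the patch description).
 */
static uint8_t port_field(const uint16_t regs[3], unsigned int port,
			  unsigned int nibble_offset)
{
	unsigned int shift = (port % 4) * 4 + nibble_offset;

	return (uint8_t)((regs[port / 4] >> shift) & DATA_MASK);
}
```

With regs[0] = 0x00D6, port 0 reads 2 at offset 0 and 1 at offset 2, which shows how the VTU and STU views share the same registers.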
[PATCH net-next 4/7] net: dsa: mv88e6xxx: add VLAN support to FDB dump
Add a helper function to read the next valid VLAN entry for a given port. It is used in the VID to FID conversion function to retrieve the forwarding database assigned to a given VLAN port.

Finally, update the FDB getnext operation to iterate on the next valid port VLAN when the end of the current database is reached.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6xxx.c | 42 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ecdd9da..6c86bad 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1307,6 +1307,29 @@ static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, u16 vid,
 	return 0;
 }

+static int _mv88e6xxx_port_vtu_getnext(struct dsa_switch *ds, int port, u16 vid,
+				       struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	int err;
+
+	do {
+		if (vid == 4095)
+			return -ENOENT;
+
+		err = _mv88e6xxx_vtu_getnext(ds, vid, entry);
+		if (err)
+			return err;
+
+		if (!entry->valid)
+			return -ENOENT;
+
+		vid = entry->vid;
+	} while (entry->data[port] != GLOBAL_VTU_DATA_MEMBER_TAG_TAGGED &&
+		 entry->data[port] != GLOBAL_VTU_DATA_MEMBER_TAG_UNTAGGED);
+
+	return 0;
+}
+
 int mv88e6xxx_vlan_getnext(struct dsa_switch *ds, u16 *vid,
 			   unsigned long *ports, unsigned long *untagged)
 {
@@ -1421,10 +1444,19 @@ static int _mv88e6xxx_atu_load(struct dsa_switch *ds,
 static int _mv88e6xxx_port_vid_to_fid(struct dsa_switch *ds, int port, u16 vid)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	struct mv88e6xxx_vtu_stu_entry vlan;
+	int err;

 	if (vid == 0)
 		return ps->fid[port];

+	err = _mv88e6xxx_port_vtu_getnext(ds, port, vid - 1, &vlan);
+	if (err)
+		return err;
+
+	if (vlan.vid == vid)
+		return vlan.fid;
+
 	return -ENOENT;
 }

@@ -1548,8 +1580,14 @@ int mv88e6xxx_port_fdb_getnext(struct dsa_switch *ds, int port,

 	do {
 		if (is_broadcast_ether_addr(addr)) {
-			ret = -ENOENT;
-			goto unlock;
+			struct mv88e6xxx_vtu_stu_entry vtu;
+
+			ret = _mv88e6xxx_port_vtu_getnext(ds, port, *vid, &vtu);
+			if (ret < 0)
+				goto unlock;
+
+			*vid = vtu.vid;
+			fid = vtu.fid;
 		}

 		ret = _mv88e6xxx_atu_getnext(ds, fid, addr, next);
--
2.5.0
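The "next valid VLAN for a given port" lookup added by this patch can be modelled against an in-memory table instead of the hardware VTU. This is a sketch under stated assumptions: the table is sorted by VID, four ports are assumed, and the enum, struct vlan_entry and port_vlan_getnext() are hypothetical names, not the driver's types.

```c
#include <stddef.h>
#include <stdint.h>

enum member { NON_MEMBER, TAGGED, UNTAGGED };	/* hypothetical encoding */

struct vlan_entry {
	uint16_t vid;
	enum member data[4];	/* per-port membership, 4 ports assumed */
};

/*
 * Return the first entry with a VID greater than vid in which port is a
 * tagged or untagged member, or NULL if there is none. Entries where
 * the port is not a member are skipped, as in the patch's do/while loop.
 */
static const struct vlan_entry *
port_vlan_getnext(const struct vlan_entry *tbl, size_t n, int port,
		  uint16_t vid)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].vid <= vid)
			continue;
		if (tbl[i].data[port] == TAGGED ||
		    tbl[i].data[port] == UNTAGGED)
			return &tbl[i];
	}
	return NULL;
}
```

Calling it with vid - 1 and comparing the returned VID against vid gives the exact-match lookup that the VID to FID conversion relies on.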
Re: [PATCH] IGMP: Inhibit reports for local multicast groups
On Thu, Aug 13, 2015 at 04:52:32PM +, Philip Downey wrote:
> Hi Andrew
>
> IGMP snooping is designed to prevent hosts on a local network from receiving traffic for a multicast group they have not explicitly joined. Link-Local multicast traffic should not have an IGMP client since it is reserved for routing protocols. One would expect that IGMP snooping needs to ignore local multicast traffic in the reserved range intended for routers since there should be no IGMP client to make join requests.

The point of this patch is that Linux is sending out group membership for these addresses; it is acting as a client. What happens with a switch which is applying IGMP snooping to link-local multicast groups? You turn on this feature, and you no longer get your routing protocol messages.

I had a quick look at RFC 3376. The only mention I spotted for not sending IGMP messages is:

    The all-systems multicast address, 224.0.0.1, is handled as a special case. On all systems -- that is all hosts and routers, including multicast routers -- reception of packets destined to the all-systems multicast address, from all sources, is permanently enabled on all interfaces on which multicast reception is supported. No IGMP messages are ever sent regarding the all-systems multicast address.

IGMP v2 has something similar:

    The all-systems group (address 224.0.0.1) is handled as a special case. The host starts in Idle Member state for that group on every interface, never transitions to another state, and never sends a report for that group.

But I did not find anything which says all other link-local addresses don't need member reports. Did I miss something?

Andrew
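To make the address range under discussion concrete, here is a minimal sketch in plain userspace C; it is not the patch itself. It assumes addresses in host byte order. The function names and the report_link_local parameter are hypothetical stand-ins for the proposed system control variable, and the 224.0.0.1 case follows the RFC text quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALL_SYSTEMS_GROUP 0xE0000001u	/* 224.0.0.1 */

/* True if addr (host byte order) lies in 224.0.0.0 - 224.0.0.255. */
static bool is_link_local_mcast(uint32_t addr)
{
	return (addr & 0xFFFFFF00u) == 0xE0000000u;
}

/*
 * Hypothetical policy check. report_link_local models the proposed
 * control variable: nonzero keeps the previous behaviour, zero
 * inhibits reports for the reserved range.
 */
static bool should_send_report(uint32_t group, int report_link_local)
{
	if (group == ALL_SYSTEMS_GROUP)
		return false;	/* never reported, per the quoted RFC text */
	if (is_link_local_mcast(group) && !report_link_local)
		return false;
	return true;
}
```

Whether inhibiting the remaining 224.0.0.x groups is safe with snooping switches is exactly the open question in this thread; the sketch only shows where such a check would sit.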
[PATCH 2/2] net: sch_generic: react upon IFF_NO_QUEUE flag
Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero.

Signed-off-by: Phil Sutter p...@nwl.cc
---
 net/sched/sch_generic.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 6efca30..942fea8 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -735,7 +735,7 @@ static void attach_one_default_qdisc(struct net_device *dev,
 {
 	struct Qdisc *qdisc = &noqueue_qdisc;

-	if (dev->tx_queue_len) {
+	if (dev->tx_queue_len && !(dev->priv_flags & IFF_NO_QUEUE)) {
 		qdisc = qdisc_create_dflt(dev_queue,
 					  default_qdisc_ops, TC_H_ROOT);
 		if (!qdisc) {
@@ -755,7 +755,9 @@ static void attach_default_qdiscs(struct net_device *dev)

 	txq = netdev_get_tx_queue(dev, 0);

-	if (!netif_is_multiqueue(dev) || dev->tx_queue_len == 0) {
+	if (!netif_is_multiqueue(dev) ||
+	    dev->tx_queue_len == 0 ||
+	    dev->priv_flags & IFF_NO_QUEUE) {
 		netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL);
 		dev->qdisc = txq->qdisc_sleeping;
 		atomic_inc(&dev->qdisc->refcnt);
--
2.1.2
[PATCH 0/2] net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len
This series adds a new private net_device flag indicating that a device may (and probably should) be used without a queueing discipline attached to it. This is already common practice for many virtual device types, e.g. loopback, VLAN (802.1Q) or bridges (802.1D). The reason for this is that these devices lack an underlying layer which could impose back pressure and thereby make a TX queue necessary to not slow down senders.

Up to now, drivers aware that the above applies to them set dev->tx_queue_len to zero to indicate that no qdisc should be attached to the interface they drive, and the kernel reacts to this by assigning the noqueue qdisc instead of the default pfifo_fast. This implicit agreement, though, leads to an inconvenient situation once a user tries to attach a real qdisc to these devices: the formerly special tx_queue_len value becomes a regular one, limiting the queue to zero packets and thus preventing any TX from happening. To overcome this, practically all qdisc implementations intercept and sanitize the offending value.

With this series applied, drivers may signal the lack of need for a qdisc without having to tamper with tx_queue_len, making fallbacks in qdiscs and caveats in userspace unnecessary.

Upon upstream acceptance, this series will be followed up by a set of patches converting device drivers, adding a warning so that out-of-tree driver authors become aware of this change, and dropping all special handling of tx_queue_len in net/sched/.

Phil Sutter (2):
  net: declare new net_device priv_flag IFF_NO_QUEUE
  net: sch_generic: react upon IFF_NO_QUEUE flag

 include/linux/netdevice.h | 3 +++
 net/sched/sch_generic.c   | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

--
2.1.2
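The difference between the old convention and the proposed flag can be sketched outside the kernel. This is a minimal model under stated assumptions, not the series' code: struct dev_model, NO_QUEUE_FLAG and wants_default_qdisc() are hypothetical names, and the bit position is picked purely for illustration.

```c
#include <stdbool.h>

#define NO_QUEUE_FLAG (1u << 25)	/* illustrative bit, assumed */

/* Reduced stand-in for the two net_device fields that matter here. */
struct dev_model {
	unsigned int tx_queue_len;
	unsigned int priv_flags;
};

/*
 * True if a default qdisc should be attached. A zero tx_queue_len still
 * means "no qdisc" for unconverted drivers, while the flag lets a driver
 * say the same thing and keep a usable queue length.
 */
static bool wants_default_qdisc(const struct dev_model *dev)
{
	return dev->tx_queue_len != 0 && !(dev->priv_flags & NO_QUEUE_FLAG);
}
```

A device that sets the flag and keeps tx_queue_len at, say, 1000 gets no default qdisc, yet a qdisc attached later by the user sees a sane queue limit instead of zero.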
[PATCH 1/2] net: declare new net_device priv_flag IFF_NO_QUEUE
This private net_device flag can be set by drivers to inform that a device runs fine without a qdisc attached. This was formerly done by setting tx_queue_len to zero.

Signed-off-by: Phil Sutter p...@nwl.cc
---
 include/linux/netdevice.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 607b5f4..7ed6fb0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1262,6 +1262,7 @@ struct net_device_ops {
  * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
  *	change when it's running
  * @IFF_MACVLAN: Macvlan device
+ * @IFF_NO_QUEUE: device can run without qdisc attached
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1289,6 +1290,7 @@ enum netdev_priv_flags {
 	IFF_XMIT_DST_RELEASE_PERM	= 1<<22,
 	IFF_IPVLAN_MASTER		= 1<<23,
 	IFF_IPVLAN_SLAVE		= 1<<24,
+	IFF_NO_QUEUE			= 1<<25,
 };

 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1316,6 +1318,7 @@ enum netdev_priv_flags {
 #define IFF_XMIT_DST_RELEASE_PERM	IFF_XMIT_DST_RELEASE_PERM
 #define IFF_IPVLAN_MASTER		IFF_IPVLAN_MASTER
 #define IFF_IPVLAN_SLAVE		IFF_IPVLAN_SLAVE
+#define IFF_NO_QUEUE			IFF_NO_QUEUE

 /**
  *	struct net_device - The DEVICE structure.
--
2.1.2
[PATCH net-next 7/7] net: dsa: mv88e6xxx: use port 802.1Q mode Secure
This commit changes the 802.1Q mode of each port from Disabled to Secure. This enables the VLAN support, by checking the VTU entries on ingress.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6xxx.c | 14 +++++++-------
 drivers/net/dsa/mv88e6xxx.h |  5 +++++
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index ca867e4..332f2c8 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -2005,13 +2005,11 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 			goto abort;
 	}

-	/* Port Control 2: don't force a good FCS, set the maximum
-	 * frame size to 10240 bytes, don't let the switch add or
-	 * strip 802.1q tags, don't discard tagged or untagged frames
-	 * on this port, do a destination address lookup on all
-	 * received packets as usual, disable ARP mirroring and don't
-	 * send a copy of all transmitted/received frames on this port
-	 * to the CPU.
+	/* Port Control 2: don't force a good FCS, set the maximum frame size to
+	 * 10240 bytes, enable secure 802.1q tags, don't discard tagged or
+	 * untagged frames on this port, do a destination address lookup on all
+	 * received packets as usual, disable ARP mirroring and don't send a
+	 * copy of all transmitted/received frames on this port to the CPU.
 	 */
 	reg = 0;
 	if (mv88e6xxx_6352_family(ds) || mv88e6xxx_6351_family(ds) ||
@@ -2033,6 +2031,8 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
 		reg |= PORT_CONTROL_2_FORWARD_UNKNOWN;
 	}

+	reg |= PORT_CONTROL_2_8021Q_SECURE;
+
 	if (reg) {
 		ret = _mv88e6xxx_reg_write(ds, REG_PORT(port),
 					   PORT_CONTROL_2, reg);
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index ca3268f..72ca887 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -140,6 +140,11 @@
 #define PORT_CONTROL_2_JUMBO_1522	(0x00 << 12)
 #define PORT_CONTROL_2_JUMBO_2048	(0x01 << 12)
 #define PORT_CONTROL_2_JUMBO_10240	(0x02 << 12)
+#define PORT_CONTROL_2_8021Q_MASK	(0x03 << 10)
+#define PORT_CONTROL_2_8021Q_DISABLED	(0x00 << 10)
+#define PORT_CONTROL_2_8021Q_FALLBACK	(0x01 << 10)
+#define PORT_CONTROL_2_8021Q_CHECK	(0x02 << 10)
+#define PORT_CONTROL_2_8021Q_SECURE	(0x03 << 10)
 #define PORT_CONTROL_2_DISCARD_TAGGED	BIT(9)
 #define PORT_CONTROL_2_DISCARD_UNTAGGED	BIT(8)
 #define PORT_CONTROL_2_MAP_DA		BIT(7)
--
2.5.0
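The 802.1Q mode added here is a multi-bit field rather than a single flag, so changing it on a live register value needs a mask-and-set rather than a plain OR. The sketch below assumes the mode occupies a two-bit field starting at bit 10, as the PORT_CONTROL_2_8021Q_* definitions suggest; the Q_MODE_* names and both helpers are hypothetical and not part of the driver.

```c
#include <stdint.h>

#define Q_MODE_SHIFT	10			/* assumed field position */
#define Q_MODE_MASK	(0x03u << Q_MODE_SHIFT)

enum q_mode { Q_DISABLED = 0, Q_FALLBACK = 1, Q_CHECK = 2, Q_SECURE = 3 };

/* Replace the mode field in reg, leaving all other bits untouched. */
static uint16_t set_q_mode(uint16_t reg, enum q_mode mode)
{
	return (uint16_t)((reg & ~Q_MODE_MASK) |
			  ((unsigned int)mode << Q_MODE_SHIFT));
}

/* Read the mode field back out of reg. */
static enum q_mode get_q_mode(uint16_t reg)
{
	return (enum q_mode)((reg & Q_MODE_MASK) >> Q_MODE_SHIFT);
}
```

In the patch a plain OR is sufficient because reg starts from zero; the masking only matters when a previously written mode has to be changed.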
[PATCH net-next 6/7] net: dsa: mv88e6xxx: add VLAN Load support
Implement port_pvid_set and port_vlan_add to add new entries in the VLAN hardware table, and join ports to them.

The patch also implements the STU Get Next and Load Purge operations, since it is required to have a valid STU entry for at least all VLANs.

Each VLAN has its own forwarding database, with FID num_ports+1 to 4095.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6352.c | 2 +
 drivers/net/dsa/mv88e6xxx.c | 169
 drivers/net/dsa/mv88e6xxx.h | 9 +++
 3 files changed, 180 insertions(+)

diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index cec38bb..14b7177 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -344,6 +344,8 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
 	.port_leave_bridge	= mv88e6xxx_leave_bridge,
 	.port_stp_update	= mv88e6xxx_port_stp_update,
 	.port_pvid_get		= mv88e6xxx_port_pvid_get,
+	.port_pvid_set		= mv88e6xxx_port_pvid_set,
+	.port_vlan_add		= mv88e6xxx_port_vlan_add,
 	.port_vlan_del		= mv88e6xxx_port_vlan_del,
 	.vlan_getnext		= mv88e6xxx_vlan_getnext,
 	.port_fdb_add		= mv88e6xxx_port_fdb_add,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 8423924..ca867e4 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1198,6 +1198,12 @@ int mv88e6xxx_port_pvid_get(struct dsa_switch *ds, int port, u16 *pvid)
 	return 0;
 }

+int mv88e6xxx_port_pvid_set(struct dsa_switch *ds, int port, u16 pvid)
+{
+	return mv88e6xxx_reg_write(ds, REG_PORT(port), PORT_DEFAULT_VLAN,
+				   pvid & PORT_DEFAULT_VLAN_MASK);
+}
+
 static int _mv88e6xxx_vtu_wait(struct dsa_switch *ds)
 {
 	return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_VTU_OP,
@@ -1374,6 +1380,169 @@ loadpurge:
 	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_LOAD_PURGE);
 }

+static int _mv88e6xxx_stu_getnext(struct dsa_switch *ds, u8 sid,
+				  struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	struct mv88e6xxx_vtu_stu_entry next = { 0 };
+	int ret;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_SID,
+				   sid & GLOBAL_VTU_SID_MASK);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_STU_GET_NEXT);
+	if (ret < 0)
+		return ret;
+
+	ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_SID);
+	if (ret < 0)
+		return ret;
+
+	next.sid = ret & GLOBAL_VTU_SID_MASK;
+
+	ret = _mv88e6xxx_reg_read(ds, REG_GLOBAL, GLOBAL_VTU_VID);
+	if (ret < 0)
+		return ret;
+
+	next.valid = !!(ret & GLOBAL_VTU_VID_VALID);
+
+	if (next.valid) {
+		ret = _mv88e6xxx_vtu_stu_data_read(ds, &next, 2);
+		if (ret < 0)
+			return ret;
+	}
+
+	*entry = next;
+	return 0;
+}
+
+static int _mv88e6xxx_stu_loadpurge(struct dsa_switch *ds,
+				    struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	u16 reg = 0;
+	int ret;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	if (!entry->valid)
+		goto loadpurge;
+
+	/* Write port states */
+	ret = _mv88e6xxx_vtu_stu_data_write(ds, entry, 2);
+	if (ret < 0)
+		return ret;
+
+	reg = GLOBAL_VTU_VID_VALID;
+loadpurge:
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID, reg);
+	if (ret < 0)
+		return ret;
+
+	reg = entry->sid & GLOBAL_VTU_SID_MASK;
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_SID, reg);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_STU_LOAD_PURGE);
+}
+
+static int _mv88e6xxx_vlan_init(struct dsa_switch *ds, u16 vid,
+				struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	struct mv88e6xxx_vtu_stu_entry vlan = {
+		.valid = true,
+		.vid = vid,
+	};
+	int i;
+
+	/* exclude all ports except the CPU */
+	for (i = 0; i < ps->num_ports; ++i)
+		vlan.data[i] = dsa_is_cpu_port(ds, i) ?
+			GLOBAL_VTU_DATA_MEMBER_TAG_TAGGED :
+			GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER;
+
+	if (mv88e6xxx_6097_family(ds) || mv88e6xxx_6165_family(ds) ||
+	    mv88e6xxx_6351_family(ds) || mv88e6xxx_6352_family(ds)) {
+		struct mv88e6xxx_vtu_stu_entry vstp;
+		int err;
+
+		/* Adding a VTU entry requires a valid STU entry. As VSTP is not
+		 *
[PATCH net-next 1/7] net: dsa: add support for switchdev VLAN objects
Add new functions in DSA drivers to access hardware VLAN entries through SWITCHDEV_OBJ_PORT_VLAN objects:

- port_pvid_get() and vlan_getnext() to dump a VLAN
- port_vlan_del() to exclude a port from a VLAN
- port_pvid_set() and port_vlan_add() to join a port to a VLAN

The DSA infrastructure will ensure that each VLAN of the given range does not already belong to another bridge. If it does, it will fall back to software VLAN and won't program the hardware.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 include/net/dsa.h | 11
 net/dsa/slave.c | 158 ++
 2 files changed, 169 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 6356f43..bd9b765 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -298,6 +298,17 @@ struct dsa_switch_driver {
 				 u8 state);

 	/*
+	 * VLAN support
+	 */
+	int	(*port_pvid_get)(struct dsa_switch *ds, int port, u16 *pvid);
+	int	(*port_pvid_set)(struct dsa_switch *ds, int port, u16 pvid);
+	int	(*port_vlan_add)(struct dsa_switch *ds, int port, u16 vid,
+				 bool untagged);
+	int	(*port_vlan_del)(struct dsa_switch *ds, int port, u16 vid);
+	int	(*vlan_getnext)(struct dsa_switch *ds, u16 *vid,
+				unsigned long *ports, unsigned long *untagged);
+
+	/*
 	 * Forwarding database
 	 */
 	int	(*port_fdb_add)(struct dsa_switch *ds, int port,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 2767584..880ead7 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -200,6 +200,152 @@ out:
 	return 0;
 }

+static int dsa_bridge_check_vlan_range(struct dsa_switch *ds,
+				       const struct net_device *bridge,
+				       u16 vid_begin, u16 vid_end)
+{
+	struct dsa_slave_priv *p;
+	struct net_device *dev, *vlan_br;
+	DECLARE_BITMAP(members, DSA_MAX_PORTS);
+	DECLARE_BITMAP(untagged, DSA_MAX_PORTS);
+	u16 vid;
+	int member, err;
+
+	if (!ds->drv->vlan_getnext || !vid_begin)
+		return -EOPNOTSUPP;
+
+	vid = vid_begin - 1;
+
+	do {
+		err = ds->drv->vlan_getnext(ds, &vid, members, untagged);
+		if (err)
+			break;
+
+		if (vid > vid_end)
+			break;
+
+		member = find_first_bit(members, DSA_MAX_PORTS);
+		if (member == DSA_MAX_PORTS)
+			continue;
+
+		dev = ds->ports[member];
+		p = netdev_priv(dev);
+		vlan_br = p->bridge_dev;
+		if (vlan_br == bridge)
+			continue;
+
+		netdev_dbg(vlan_br, "hardware VLAN %d already in use\n", vid);
+		return -EOPNOTSUPP;
+	} while (vid < vid_end);
+
+	return err == -ENOENT ? 0 : err;
+}
+
+static int dsa_slave_port_vlan_add(struct net_device *dev,
+				   struct switchdev_obj *obj)
+{
+	struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+	u16 vid;
+	int err;
+
+	switch (obj->trans) {
+	case SWITCHDEV_TRANS_PREPARE:
+		if (!ds->drv->port_vlan_add || !ds->drv->port_pvid_set)
+			return -EOPNOTSUPP;
+
+		/* If the requested port doesn't belong to the same bridge as
+		 * the VLAN members, fallback to software VLAN (hopefully).
+		 */
+		err = dsa_bridge_check_vlan_range(ds, p->bridge_dev,
+						  vlan->vid_begin,
+						  vlan->vid_end);
+		if (err)
+			return err;
+		break;
+	case SWITCHDEV_TRANS_COMMIT:
+		for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
+			err = ds->drv->port_vlan_add(ds, p->port, vid,
+						     vlan->flags &
+						     BRIDGE_VLAN_INFO_UNTAGGED);
+			if (!err && vlan->flags & BRIDGE_VLAN_INFO_PVID)
+				err = ds->drv->port_pvid_set(ds, p->port, vid);
+			if (err)
+				return err;
+		}
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}
+
+static int dsa_slave_port_vlan_del(struct net_device *dev,
+				   struct switchdev_obj *obj)
+{
+	struct switchdev_obj_vlan *vlan = &obj->u.vlan;
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->parent;
+	u16 vid;
+	int err;
+
+	if (!ds->drv->port_vlan_del)
+
[PATCH] net: allow sleeping when modifying store_rps_map
Commit 10e4ea751 (net: Fix race condition in store_rps_map) has moved the manipulation of the rps_needed jump label under a spinlock. Since changing the state of a jump label may sleep this is incorrect and causes warnings during runtime.

Make rps_map_lock a mutex to allow sleeping under it.

Fixes: 10e4ea751 (net: Fix race condition in store_rps_map)
Signed-off-by: Sasha Levin sasha.le...@oracle.com
---
 net/core/net-sysfs.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 39ec694..b279077 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -689,7 +689,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	struct rps_map *old_map, *map;
 	cpumask_var_t mask;
 	int err, cpu, i;
-	static DEFINE_SPINLOCK(rps_map_lock);
+	static DEFINE_MUTEX(rps_map_mutex);

 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -722,9 +722,9 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 		map = NULL;
 	}

-	spin_lock(&rps_map_lock);
+	mutex_lock(&rps_map_mutex);
 	old_map = rcu_dereference_protected(queue->rps_map,
-					    lockdep_is_held(&rps_map_lock));
+					    mutex_is_locked(&rps_map_mutex));
 	rcu_assign_pointer(queue->rps_map, map);

 	if (map)
@@ -732,7 +732,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
 	if (old_map)
 		static_key_slow_dec(&rps_needed);

-	spin_unlock(&rps_map_lock);
+	mutex_unlock(&rps_map_mutex);

 	if (old_map)
 		kfree_rcu(old_map, rcu);
--
1.7.10.4
[PATCH 6/6] ethernet/s2io: advertise what hw supports in vlan_features
For some reason, the s2io driver has never filled in vlan_features. If that's fully intentional, then this patch should be dropped. If it's not, then this patch is necessary to maintain some functionality of slave s2io devices in a bonding group. Without this, the presence of an s2io device in a bond will not trigger LRO support to be enabled at the bond level, even while it is enabled on the slave itself. This change becomes necessary when NETIF_F_LRO is added to netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Jon Mason jdma...@kudzu.us
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/neterion/s2io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index 2d1b942..8bbf540 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -7922,6 +7922,7 @@ s2io_init_nic(struct pci_dev *pdev, const struct pci_device_id *pre)
 		NETIF_F_RXCSUM | NETIF_F_LRO;
 	dev->features |= dev->hw_features |
 		NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
+	dev->vlan_features |= dev->hw_features;
 	if (sp->device_type & XFRAME_II_DEVICE) {
 		dev->hw_features |= NETIF_F_UFO;
 		if (ufo)
--
1.8.3.1
Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
> * Due to HW bug, LAN8700 sometimes does not detect presence of energy in the
>   Ethernet cable in Energy Detect Power-Down mode (e.g while EDPWRDOWN bit is
>   set, the ENERGYON bit does not asserted sometimes). This is a common bug of
>   LAN87xx family of PHY chips.
> * The lan87xx_read_status() was improved to acquire ENERGYON bit.
>   Its previous algorythm still not reliable on 100 % and sometimes skip cable
>   plugging.

[]

> diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c

[]

> @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
>  static int lan87xx_read_status(struct phy_device *phydev)
>  {
>  	int err = genphy_read_status(phydev);
> +	int rc;

Is there a reason to move this declaration?

> +	int i;
>
>  	if (!phydev->link) {
>  		/* Disable EDPD to wake up PHY */
> -		int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
> +		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
>  		if (rc < 0)
>  			return rc;
[PATCH 2/6] ethernet/bnx2x: advertise LRO support in vlan_features
Without this, the presence of a bnx2x device in a bond will not trigger LRO
support to be enabled at the bond level, even while it is enabled on the
slave itself. This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Ariel Elior ariel.el...@qlogic.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index ad73a60..41dc066 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13083,7 +13083,8 @@ static int bnx2x_init_dev(struct bnx2x *bp, struct pci_dev *pdev,
 	}

 	dev->vlan_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_HIGHDMA;
+		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 | NETIF_F_HIGHDMA |
+		NETIF_F_LRO;

 	/* VF with OLD Hypervisor or old PF do not support filtering */
 	if (IS_PF(bp)) {
--
1.8.3.1
[PATCH 5/6] ethernet/qlcnic: advertise LRO support in vlan_features
Without this, the presence of a qlcnic device in a bond will not trigger LRO
support to be enabled at the bond level, even while it is enabled on the
slave itself. This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Shahed Shaikh shahed.sha...@qlogic.com
CC: dept-gelinuxnic...@qlogic.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 8b08b20..5a798ab 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -2314,8 +2314,10 @@ qlcnic_setup_netdev(struct qlcnic_adapter *adapter, struct net_device *netdev,
 	if (qlcnic_sriov_vf_check(adapter))
 		netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER;

-	if (adapter->ahw->capabilities & QLCNIC_FW_CAPABILITY_HW_LRO)
+	if (adapter->ahw->capabilities & QLCNIC_FW_CAPABILITY_HW_LRO) {
 		netdev->features |= NETIF_F_LRO;
+		netdev->vlan_features |= NETIF_F_LRO;
+	}

 	if (qlcnic_encap_tx_offload(adapter)) {
 		netdev->features |= NETIF_F_GSO_UDP_TUNNEL;
--
1.8.3.1
[PATCH net-next 2/7] net: dsa: mv88e6xxx: flush VTU and STU entries
Implement the VTU Flush operation (which also flushes the STU), so that
warm boots won't preserve old entries.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6xxx.c | 34 ++
 drivers/net/dsa/mv88e6xxx.h | 2 ++
 2 files changed, 36 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 9978245..175353a 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1182,6 +1182,34 @@ int mv88e6xxx_port_stp_update(struct dsa_switch *ds, int port, u8 state)
 	return 0;
 }

+static int _mv88e6xxx_vtu_wait(struct dsa_switch *ds)
+{
+	return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_VTU_OP,
+			       GLOBAL_VTU_OP_BUSY);
+}
+
+static int _mv88e6xxx_vtu_cmd(struct dsa_switch *ds, u16 op)
+{
+	int ret;
+
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_OP, op);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_wait(ds);
+}
+
+static int _mv88e6xxx_vtu_stu_flush(struct dsa_switch *ds)
+{
+	int ret;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_FLUSH_ALL);
+}
+
 static int _mv88e6xxx_atu_mac_write(struct dsa_switch *ds,
 				    const unsigned char *addr)
 {
@@ -2071,6 +2099,12 @@ int mv88e6xxx_setup_global(struct dsa_switch *ds)
 	/* Wait for the flush to complete. */
 	mutex_lock(&ps->smi_mutex);
 	ret = _mv88e6xxx_stats_wait(ds);
+	if (ret < 0)
+		goto unlock;
+
+	/* Clear all the VTU and STU entries */
+	ret = _mv88e6xxx_vtu_stu_flush(ds);
+unlock:
 	mutex_unlock(&ps->smi_mutex);

 	return ret;
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index 10fae32..76139ea 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -188,6 +188,8 @@
 #define GLOBAL_CONTROL_TCAM_EN		BIT(1)
 #define GLOBAL_CONTROL_EEPROM_DONE_EN	BIT(0)
 #define GLOBAL_VTU_OP		0x05
+#define GLOBAL_VTU_OP_BUSY	BIT(15)
+#define GLOBAL_VTU_OP_FLUSH_ALL		((0x01 << 12) | GLOBAL_VTU_OP_BUSY)
 #define GLOBAL_VTU_VID		0x06
 #define GLOBAL_VTU_DATA_0_3	0x07
 #define GLOBAL_VTU_DATA_4_7	0x08
--
2.5.0
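The flush in this patch follows a "wait until idle, write the opcode with the busy bit set, wait until the busy bit clears" pattern. The following is a minimal user-space sketch of that pattern only. The `fake_switch` structure, the `reg_read`/`reg_write` helpers and the poll limit of 16 are hypothetical stand-ins for illustration and are not part of the driver; only the two bit definitions mirror the values added to mv88e6xxx.h.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the defines added by the patch. */
#define VTU_OP_BUSY      (1u << 15)
#define VTU_OP_FLUSH_ALL ((0x01u << 12) | VTU_OP_BUSY)

/* Hypothetical stand-in for the switch's VTU operation register. */
struct fake_switch {
	uint16_t vtu_op;     /* last value written to the op register */
	int busy_polls_left; /* how many more reads leave BUSY untouched */
};

static void reg_write(struct fake_switch *sw, uint16_t val)
{
	sw->vtu_op = val;
}

/* Each read consumes one poll; once none are left, BUSY is cleared. */
static uint16_t reg_read(struct fake_switch *sw)
{
	if (sw->busy_polls_left > 0)
		sw->busy_polls_left--;
	else
		sw->vtu_op &= (uint16_t)~VTU_OP_BUSY;
	return sw->vtu_op;
}

/* Poll until BUSY clears; give up after a bounded number of reads. */
static int vtu_wait(struct fake_switch *sw)
{
	for (int i = 0; i < 16; i++)
		if (!(reg_read(sw) & VTU_OP_BUSY))
			return 0;
	return -1;
}

/* Wait for any pending operation, issue the flush, wait for completion. */
static int vtu_flush(struct fake_switch *sw)
{
	if (vtu_wait(sw) < 0)
		return -1;
	reg_write(sw, VTU_OP_FLUSH_ALL);
	return vtu_wait(sw);
}
```

Setting the busy bit as part of the opcode is what starts the operation, which is why the flush opcode is defined with GLOBAL_VTU_OP_BUSY already OR-ed in.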
Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
Dear Joe,

> On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
> > * Due to HW bug, LAN8700 sometimes does not detect presence of energy in the
> >   Ethernet cable in Energy Detect Power-Down mode (e.g while EDPWRDOWN bit is
> >   set, the ENERGYON bit does not asserted sometimes). This is a common bug of
> >   LAN87xx family of PHY chips.
> > * The lan87xx_read_status() was improved to acquire ENERGYON bit.
> >   Its previous algorythm still not reliable on 100 % and sometimes skip cable
> >   plugging.
>
> []
>
> > diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
>
> []
>
> > @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
> >  static int lan87xx_read_status(struct phy_device *phydev)
> >  {
> >  	int err = genphy_read_status(phydev);
> > +	int rc;
>
> Is there a reason to move this declaration?

There is no strict requirement to move the declaration of rc.
It was done just to keep all declarations easily visible.

> > +	int i;
> >
> >  	if (!phydev->link) {
> >  		/* Disable EDPD to wake up PHY */
> > -		int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
> > +		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
> >  		if (rc < 0)
> >  			return rc;

Best wishes
--
Igor Plyatov
Re: [PATCH 0/2] net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len
On Thu, 13 Aug 2015 19:01:05 +0200
Phil Sutter p...@nwl.cc wrote:

> Up to now, drivers being aware of the above applying to them set
> dev->tx_queue_len to zero to indicate no qdisc should be attached to the
> interface they drive and the kernel reacts upon this by assigning the noop
> qdisc instead of the default pfifo_fast. This implicit agreement though
> leads to an inconvenient situation once a user tries to attach a real
> qdisc to these devices, as the formerly special tx_queue_len value becomes
> a regular one,

So this is a workaround for user ignorance by introducing kernel API complexity.
Before user sets qdisc, why don't they set tx queue length?
[PATCH 4/6] ethernet/netxen: advertise LRO support in vlan_features
Without this, the presence of a netxen device in a bond will not trigger LRO
support to be enabled at the bond level, even while it is enabled on the
slave itself. This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Manish Chopra manish.cho...@qlogic.com
CC: Sony Chacko sony.cha...@qlogic.com
CC: Rajesh Borundia rajesh.borun...@qlogic.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index 6409a06..0fd5ada54 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -1387,8 +1387,10 @@ netxen_setup_netdev(struct netxen_adapter *adapter,
 	if (adapter->capabilities & NX_FW_CAPABILITY_FVLANTX)
 		netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX;

-	if (adapter->capabilities & NX_FW_CAPABILITY_HW_LRO)
+	if (adapter->capabilities & NX_FW_CAPABILITY_HW_LRO) {
 		netdev->hw_features |= NETIF_F_LRO;
+		netdev->vlan_features |= NETIF_F_LRO;
+	}

 	netdev->features |= netdev->hw_features;

--
1.8.3.1
[PATCH 3/6] ethernet/ixgbe: advertise LRO support in vlan_features
Without this, the presence of an ixgbe device in a bond will not trigger LRO
support to be enabled at the bond level, even while it is enabled on the
slave itself. This change becomes necessary when NETIF_F_LRO is added to
netdev_features.h's NETIF_F_ONE_FOR_ALL.

CC: Jeff Kirsher jeffrey.t.kirs...@intel.com
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3e6a931..0a6e4e1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8659,8 +8659,10 @@ skip_sriov:

 	if (adapter->flags2 & IXGBE_FLAG2_RSC_CAPABLE)
 		netdev->hw_features |= NETIF_F_LRO;
-	if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED)
+	if (adapter->flags2 & IXGBE_FLAG2_RSC_ENABLED) {
 		netdev->features |= NETIF_F_LRO;
+		netdev->vlan_features |= NETIF_F_LRO;
+	}

 	/* make sure the EEPROM is good */
 	if (hw->eeprom.ops.validate_checksum(hw, NULL) < 0) {
--
1.8.3.1
Re: [PATCH] IGMP: Inhibit reports for local multicast groups
On Thu, Aug 13, 2015 at 07:01:37PM +0200, Andrew Lunn wrote:
> On Thu, Aug 13, 2015 at 04:52:32PM +, Philip Downey wrote:
> > Hi Andrew
> > IGMP snooping is designed to prevent hosts on a local network from
> > receiving traffic for a multicast group they have not explicitly
> > joined. Link-Local multicast traffic should not have an IGMP client
> > since it is reserved for routing protocols. One would expect that
> > IGMP snooping needs to ignore local multicast traffic in the reserved
> > range intended for routers since there should be no IGMP client to
> > make join requests.
>
> The point of this patch is that Linux is sending out group membership
> for these addresses, it is acting as a client. What happens with a
> switch which is applying IGMP snooping to link-local multicast groups?
> You turn on this feature, and you no longer get your routing protocol
> messages.
>
> I had a quick look at RFC 3376. The only mention i spotted for not
> sending IGMP messages is:
>
>    The all-systems multicast address, 224.0.0.1, is handled as a special
>    case. On all systems -- that is all hosts and routers, including
>    multicast routers -- reception of packets destined to the all-systems
>    multicast address, from all sources, is permanently enabled on all
>    interfaces on which multicast reception is supported. No IGMP
>    messages are ever sent regarding the all-systems multicast address.
>
> IGMP v2 has something similar:
>
>    The all-systems group (address 224.0.0.1) is handled as a special
>    case. The host starts in Idle Member state for that group on every
>    interface, never transitions to another state, and never sends a
>    report for that group.
>
> But i did not find anything which says all other link-local addresses
> don't need member reports. Did i miss something?
>
>     Andrew

From RFC 4541 (Considerations for Internet Group Management Protocol (IGMP)
and Multicast Listener Discovery (MLD) Snooping Switches):

   2) Packets with a destination IP (DIP) address in the 224.0.0.X range
      which are not IGMP must be forwarded on all ports.

      This recommendation is based on the fact that many host systems do
      not send Join IP multicast addresses in this range before sending
      or listening to IP multicast packets. Furthermore, since the
      224.0.0.X address range is defined as link-local (not to be
      routed), it seems unnecessary to keep the state for each address
      in this range. Additionally, some routers operate in the 224.0.0.X
      address range without issuing IGMP Joins, and these applications
      would break if the switch were to prune them due to not having
      seen a Join Group message from the router.

So, it looks like some hosts and routers out there in the field do not send
joins for those local addresses.

In fact, IPv4 local multicast addresses are ignored when Linux bridge
multicast snooping adds a new group.

static int br_ip4_multicast_add_group(struct net_bridge *br,
...
	if (ipv4_is_local_multicast(group))
		return 0;

Cascardo.
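For readers unfamiliar with the check quoted above: the 224.0.0.X range discussed in this thread is 224.0.0.0/24, so membership is a single mask-and-compare on the address. Below is a small user-space sketch of such a test. The function names are made up for illustration and this is not the kernel helper itself, only an equivalent check written against the standard sockets headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* True if addr_be (network byte order) lies in 224.0.0.0/24. */
static bool is_local_multicast(uint32_t addr_be)
{
	return (addr_be & htonl(0xffffff00)) == htonl(0xe0000000);
}

/* Convenience wrapper for dotted-quad strings; false on parse errors. */
static bool is_local_multicast_str(const char *dotted)
{
	struct in_addr a;

	if (inet_pton(AF_INET, dotted, &a) != 1)
		return false;
	return is_local_multicast(a.s_addr);
}
```

With this check, 224.0.0.1 (all-systems) and 224.0.0.251 fall inside the range, while 224.0.1.1 and 239.1.2.3 do not.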
[PATCH net-next 0/3] net: Identifier Locator Addressing - Part I
This patch set provides rudimentary support for Identifier Locator
Addressing or ILA. The basic concept of ILA is that we split an IPv6
address into a 64 bit locator and 64 bit identifier. The identifier is
the identity of an entity in communication (who), and the locator
expresses the location of the entity (where). Applications use an
externally visible address that contains the identifier. When a packet
is actually sent, a translation is done that overwrites the first 64
bits of the address with a locator. The packet can then be forwarded
over the network to the host where the addressed entity is located. At
the receiver, the reverse translation is done so that the application
sees the original, untranslated address. Presumably an external control
plane will provide identifier-locator mappings.

The data path for ILA is a simple NAT translation that only operates
on the upper 64 bits of a destination address in IPv6 packets. The
basic process is:

  1) Lookup 64 bit identifier (lower 64 bits of destination)
  2) If a match is found
     a) Overwrite locator (upper 64 bits of destination) with
        the new locator
     b) Adjust any checksum that has destination address included in
        pseudo header
  3) Send or receive packet

ILA is a means to implement tunnels or network virtualization without
encapsulation. Since there is no encapsulation involved, we assume that
stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
just works. Also, since we're minimally changing the packet many of
the worries about encapsulation (MTU, checksum, fragmentation) are
not relevant. The downside is that ILA is not extensible like other
encapsulations (GUE for instance) so it might not be appropriate for
all use cases. Also, this only makes sense to do in IPv6!

A key aspect of ILA is performance. The intent is that ILA would be
used in data centers in virtualizing tasks or jobs. In the fullest
incarnation all intra data center communications might be targeted to
virtual ILA addresses. This is basically adding a new virtualization
capability to the existing services in a datacenter, so there is a
strong expectation that this does not degrade performance for
existing applications.

Performance seems to be dependent on how ILA is hooked into kernel.
ILA can be implemented under some different models:

  - Mechanically it is a form of stateless DNAT
  - It can be thought of as a type of (source) routing
  - As a functional replacement of encapsulation

In this patch set we hook into the data path using Light Weight
Tunnels (LWT) infrastructure. As part of that, we add support in LWT
to redirect dst input. iproute will be modified to take a new ila encap
type. ILA can be configured like:

# ILA to destination
ip route add :0:0:1::0:2:0/128 \
   encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0

# Configure local address
ip -6 addr add :0:0:1::0:1:0/128 dev eth0

# ILA translation for local address on input
ip route add table local local 2001:0:0:1::0:1:0/128 encap ila :0:0:1 dev lo

So sending to destination :0:0:1::0:2:0 will have destination of
2001:0:0:2::0:2:0 on the wire.

Performance results are below. With ILA we see about a 10% drop in
pps compared to non-ILA. Much of this drop can be attributed to the
loss of early demux on input (translation occurs after it is
attempted). We will address this in the next patch set. Also, IPvlan
input path does not work with ILA since the routing is bypassed-- this
will be addressed in a future patch.
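Steps 1 and 2a of the basic process above (match on the lower 64 bits, overwrite the upper 64 bits) can be sketched in user space as follows. This is an illustration under stated assumptions, not the code from this patch set: `struct ila_map` and `ila_translate` are hypothetical names, the table holds a single entry, and the checksum adjustment of step 2b is deliberately left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical one-entry mapping: identifier -> locator. */
struct ila_map {
	unsigned char identifier[8]; /* lower 64 bits to match on */
	unsigned char locator[8];    /* upper 64 bits to write */
};

/*
 * Rewrite the locator of dst if its identifier matches the mapping.
 * Returns 1 if the address was translated, 0 if it was left untouched.
 */
static int ila_translate(struct in6_addr *dst, const struct ila_map *m)
{
	if (memcmp(dst->s6_addr + 8, m->identifier, 8) != 0)
		return 0;
	memcpy(dst->s6_addr, m->locator, 8);
	return 1;
}
```

For example, with a mapping whose identifier is `::2:0` and whose locator is `2001:db8:0:2` (documentation-prefix addresses chosen for this sketch), the destination `2001:db8:0:1::2:0` becomes `2001:db8:0:2::2:0`, while an address with a different identifier passes through unchanged.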
Performance testing:

Performing netperf TCP_RR with 200 clients:

Non-ILA baseline
  84.92% CPU utilization
  1861922.9 tps
  93/163/330 50/90/99% latencies

ILA single destination
  83.16% CPU utilization
  1679683.4 tps
  105/180/332 50/90/99% latencies

References:

Slides from netconf:
http://vger.kernel.org/netconf2015Herbert-ILA.pdf

Slides from presentation at IETF:
https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf

I-D:
https://tools.ietf.org/html/draft-herbert-nvo3-ila-00

Tom Herbert (3):
  lwt: Add support to redirect dst.input
  net: Add inet_proto_csum_replace_by_diff utility function
  net: Identifier Locator Addressing module

 include/net/checksum.h | 2 +
 include/net/lwtunnel.h | 25 +++-
 include/uapi/linux/ila.h | 15 +
 include/uapi/linux/lwtunnel.h | 1 +
 net/core/lwtunnel.c | 55 +
 net/core/utils.c | 13 +
 net/ipv4/route.c | 8 ++-
 net/ipv6/Kconfig | 18 ++
 net/ipv6/Makefile | 1 +
 net/ipv6/ila/Makefile | 7 +++
 net/ipv6/ila/ila.h | 50
 net/ipv6/ila/ila_lwt.c | 133 ++
 net/ipv6/ila/ila_main.c | 69 ++
 net/ipv6/route.c | 8 ++-
 14 files changed, 402 insertions(+), 3 deletions(-)
 create mode 100644
[PATCH net-next 1/3] lwt: Add support to redirect dst.input
This patch adds the capability to redirect dst input in the same way
that dst output is redirected by LWT.

Also, save the original dst.input and dst.out when setting up
lwtunnel redirection. These can be called by the client as a
pass-through.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 include/net/lwtunnel.h | 25 ++-
 net/core/lwtunnel.c | 55 ++
 net/ipv4/route.c | 8 +++-
 net/ipv6/route.c | 8 +++-
 4 files changed, 93 insertions(+), 3 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 33bd309..3db87d7 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -11,12 +11,15 @@
 #define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)

 /* lw tunnel state flags */
-#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT	BIT(0)
+#define LWTUNNEL_STATE_INPUT_REDIRECT	BIT(1)

 struct lwtunnel_state {
 	__u16		type;
 	__u16		flags;
 	atomic_t	refcnt;
+	int		(*orig_output)(struct sock *sk, struct sk_buff *skb);
+	int		(*orig_input)(struct sk_buff *);
 	int		len;
 	__u8		data[0];
 };
@@ -25,6 +28,7 @@ struct lwtunnel_encap_ops {
 	int (*build_state)(struct net_device *dev, struct nlattr *encap,
 			   struct lwtunnel_state **ts);
 	int (*output)(struct sock *sk, struct sk_buff *skb);
+	int (*input)(struct sk_buff *skb);
 	int (*fill_encap)(struct sk_buff *skb,
 			  struct lwtunnel_state *lwtstate);
 	int (*get_encap_size)(struct lwtunnel_state *lwtstate);
@@ -58,6 +62,13 @@ static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate)
 	return false;
 }

+static inline bool lwtunnel_input_redirect(struct lwtunnel_state *lwtstate)
+{
+	if (lwtstate && (lwtstate->flags & LWTUNNEL_STATE_INPUT_REDIRECT))
+		return true;
+
+	return false;
+}
 int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
 			   unsigned int num);
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
@@ -72,6 +83,8 @@ struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
 int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
 int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
 int lwtunnel_output6(struct sock *sk, struct sk_buff *skb);
+int lwtunnel_input(struct sk_buff *skb);
+int lwtunnel_input6(struct sk_buff *skb);

 #else

@@ -142,6 +155,16 @@ static inline int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
 	return -EOPNOTSUPP;
 }

+static inline int lwtunnel_input(struct sock *sk, struct sk_buff *skb)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int lwtunnel_input6(struct sock *sk, struct sk_buff *skb)
+{
+	return -EOPNOTSUPP;
+}
+
 #endif

 #endif /* __NET_LWTUNNEL_H */
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index 5d6d8e3..3331585 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -241,3 +241,58 @@ int lwtunnel_output(struct sock *sk, struct sk_buff *skb)
 	return __lwtunnel_output(sk, skb, lwtstate);
 }
 EXPORT_SYMBOL(lwtunnel_output);
+
+int __lwtunnel_input(struct sk_buff *skb,
+		     struct lwtunnel_state *lwtstate)
+{
+	const struct lwtunnel_encap_ops *ops;
+	int ret = -EINVAL;
+
+	if (!lwtstate)
+		goto drop;
+
+	if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
+	    lwtstate->type > LWTUNNEL_ENCAP_MAX)
+		return 0;
+
+	ret = -EOPNOTSUPP;
+	rcu_read_lock();
+	ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
+	if (likely(ops && ops->input))
+		ret = ops->input(skb);
+	rcu_read_unlock();
+
+	if (ret == -EOPNOTSUPP)
+		goto drop;
+
+	return ret;
+
+drop:
+	kfree_skb(skb);
+
+	return ret;
+}
+
+int lwtunnel_input6(struct sk_buff *skb)
+{
+	struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
+	struct lwtunnel_state *lwtstate = NULL;
+
+	if (rt)
+		lwtstate = rt->rt6i_lwtstate;
+
+	return __lwtunnel_input(skb, lwtstate);
+}
+EXPORT_SYMBOL(lwtunnel_input6);
+
+int lwtunnel_input(struct sk_buff *skb)
+{
+	struct rtable *rt = (struct rtable *)skb_dst(skb);
+	struct lwtunnel_state *lwtstate = NULL;
+
+	if (rt)
+		lwtstate = rt->rt_lwtstate;
+
+	return __lwtunnel_input(skb, lwtstate);
+}
+EXPORT_SYMBOL(lwtunnel_input);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 18fd7c9..051d834 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1630,8 +1630,14 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->dst.output = ip_output;

 	rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag);
-	if (lwtunnel_output_redirect(rth->rt_lwtstate))
+	if
[PATCH net-next 3/3] net: Identifier Locator Addressing module
Adding new module named ila. This implements ILA translation. Light
weight tunnel redirection is used to perform the translation in
the data path. This is configured by the "ip -6 route" command
using the "encap ila <locator>" option, where <locator> is the
value to set in destination locator of the packet. e.g.

ip -6 route add :0:0:1::0:1:0/128 \
      encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0

Sets a route where :0:0:1 will be overwritten by
2001:0:0:1 on output.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 include/uapi/linux/ila.h | 15 +
 include/uapi/linux/lwtunnel.h | 1 +
 net/ipv6/Kconfig | 18 ++
 net/ipv6/Makefile | 1 +
 net/ipv6/ila/Makefile | 7 +++
 net/ipv6/ila/ila.h | 50
 net/ipv6/ila/ila_lwt.c | 133 ++
 net/ipv6/ila/ila_main.c | 69 ++
 8 files changed, 294 insertions(+)
 create mode 100644 include/uapi/linux/ila.h
 create mode 100644 net/ipv6/ila/Makefile
 create mode 100644 net/ipv6/ila/ila.h
 create mode 100644 net/ipv6/ila/ila_lwt.c
 create mode 100644 net/ipv6/ila/ila_main.c

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
new file mode 100644
index 0000000..7ed9e67
--- /dev/null
+++ b/include/uapi/linux/ila.h
@@ -0,0 +1,15 @@
+/* ila.h - ILA Interface */
+
+#ifndef _UAPI_LINUX_ILA_H
+#define _UAPI_LINUX_ILA_H
+
+enum {
+	ILA_ATTR_UNSPEC,
+	ILA_ATTR_LOCATOR,			/* u64 */
+
+	__ILA_ATTR_MAX,
+};
+
+#define ILA_ATTR_MAX		(__ILA_ATTR_MAX - 1)
+
+#endif /* _UAPI_LINUX_ILA_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index 31377bb..04bac3b 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -7,6 +7,7 @@ enum lwtunnel_encap_types {
 	LWTUNNEL_ENCAP_NONE,
 	LWTUNNEL_ENCAP_MPLS,
 	LWTUNNEL_ENCAP_IP,
+	LWTUNNEL_ENCAP_ILA,
 	__LWTUNNEL_ENCAP_MAX,
 };

diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 643f613..c732e27 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -92,6 +92,24 @@ config IPV6_MIP6

 	  If unsure, say N.

+config IPV6_ILA
+	tristate "IPv6: Identifier Locator Addressing (ILA)"
+	---help---
+	  Support for IPv6 Identifier Locator Addressing (ILA).
+
+	  ILA is a mechanism to do network virtualization without
+	  encapsulation. The basic concept of ILA is that we split an
+	  IPv6 address into a 64 bit locator and 64 bit identifier. The
+	  identifier is the identity of an entity in communication
+	  (who) and the locator expresses the location of the
+	  entity (where).
+
+	  ILA can be configured using the encap ila option with
+	  ip -6 route command. ILA is described in
+	  https://tools.ietf.org/html/draft-herbert-nvo3-ila-00.
+
+	  If unsure, say N.
+
 config INET6_XFRM_TUNNEL
 	tristate
 	select INET6_TUNNEL
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 0f3f199..2fbd90b 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_INET6_XFRM_MODE_TUNNEL) += xfrm6_mode_tunnel.o
 obj-$(CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION) += xfrm6_mode_ro.o
 obj-$(CONFIG_INET6_XFRM_MODE_BEET) += xfrm6_mode_beet.o
 obj-$(CONFIG_IPV6_MIP6) += mip6.o
+obj-$(CONFIG_IPV6_ILA) += ila/
 obj-$(CONFIG_NETFILTER)	+= netfilter/

 obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
diff --git a/net/ipv6/ila/Makefile b/net/ipv6/ila/Makefile
new file mode 100644
index 0000000..cc0c202
--- /dev/null
+++ b/net/ipv6/ila/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for ILA module
+#
+
+obj-$(CONFIG_IPV6_ILA) += ila.o
+
+ila-objs := ila_main.o ila_lwt.o
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
new file mode 100644
index 0000000..d2298b3
--- /dev/null
+++ b/net/ipv6/ila/ila.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright (c) 2015 Tom Herbert t...@herbertland.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ */
+
+#ifndef __ILA_H
+#define __ILA_H
+
+#include <linux/errno.h>
+#include <linux/ip.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/types.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/protocol.h>
+#include <uapi/linux/ila.h>
+
+struct ila_params {
+	__be64 locator;
+};
+
+static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
+{
+	__be32 diff[] = {
+		~from[0], ~from[1], to[0], to[1],
+	};
+
+	return csum_partial(diff, sizeof(diff), 0);
+}
+
+static inline __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
+{
+	return compute_csum_diff8((__be32 *)&ip6h->daddr,
+
[PATCH net-next 2/3] net: Add inet_proto_csum_replace_by_diff utility function
This function updates a checksum field value and skb->csum based on
a value which is the difference between the old and new checksum.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 include/net/checksum.h | 2 ++
 net/core/utils.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 2d1d73c..0e0c987 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -144,6 +144,8 @@ void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb,
 void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
 			       const __be32 *from, const __be32 *to,
 			       int pseudohdr);
+void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
+				     __wsum diff, int pseudohdr);

 static inline void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
 					    __be16 from, __be16 to,
diff --git a/net/core/utils.c b/net/core/utils.c
index a7732a0..89ccfb1 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -336,6 +336,19 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
 }
 EXPORT_SYMBOL(inet_proto_csum_replace16);

+void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
+				     __wsum diff, int pseudohdr)
+{
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		*sum = csum_fold(csum_add(diff, ~csum_unfold(*sum)));
+		if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
+			skb->csum = ~csum_add(diff, ~skb->csum);
+	} else if (pseudohdr) {
+		*sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
+	}
+}
+EXPORT_SYMBOL(inet_proto_csum_replace_by_diff);
+
 struct __net_random_once_work {
 	struct work_struct work;
 	struct static_key *key;
--
1.8.1
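The idea behind updating a checksum "by diff" is the incremental update of the Internet checksum: precompute the one's-complement sum of the inverted old words plus the new words, then fold that difference into the existing checksum instead of re-summing the whole packet. The sketch below shows only the arithmetic of the first branch (the non-CHECKSUM_PARTIAL case) in plain user-space C. All helper names are made up for illustration, and for simplicity the 16-bit words are treated as host-order integers, which is an assumption of this sketch and not how the kernel types work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit one's-complement add with end-around carry. */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
	uint32_t s = a + b;

	return s + (s < b);
}

/* Fold a 32-bit accumulator down to 16 bits. */
static uint16_t csum_fold16(uint32_t s)
{
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Full checksum over n 16-bit words: complement of the folded sum. */
static uint16_t csum16(const uint16_t *w, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++)
		s = csum_add32(s, w[i]);
	return (uint16_t)~csum_fold16(s);
}

/* Difference between old and new words: sum of ~old[i] and new[i]. */
static uint32_t csum_diff(const uint16_t *from, const uint16_t *to, size_t n)
{
	uint32_t s = 0;

	for (size_t i = 0; i < n; i++) {
		s = csum_add32(s, (uint16_t)~from[i]);
		s = csum_add32(s, to[i]);
	}
	return s;
}

/* New checksum from the old checksum field and the difference. */
static uint16_t csum_replace_by_diff(uint16_t check, uint32_t diff)
{
	return (uint16_t)~csum_fold16(csum_add32(diff, (uint16_t)~check));
}
```

The benefit for a translation like ILA is that the difference depends only on the old and new locator, so it can be computed once per mapping and applied to each packet's checksum with a couple of additions.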
[PATCH net-next 5/7] net: dsa: mv88e6xxx: add VLAN Purge support
Add support for the VTU Load Purge operation and implement the
port_vlan_del driver function to remove a port from a VLAN entry, and
delete the VLAN if the given port was its last member.

Signed-off-by: Vivien Didelot vivien.dide...@savoirfairelinux.com
---
 drivers/net/dsa/mv88e6352.c |   1 +
 drivers/net/dsa/mv88e6xxx.c | 113
 drivers/net/dsa/mv88e6xxx.h |   2 +
 3 files changed, 116 insertions(+)

diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index e6767ce..cec38bb 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -344,6 +344,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
 	.port_leave_bridge	= mv88e6xxx_leave_bridge,
 	.port_stp_update	= mv88e6xxx_port_stp_update,
 	.port_pvid_get		= mv88e6xxx_port_pvid_get,
+	.port_vlan_del		= mv88e6xxx_port_vlan_del,
 	.vlan_getnext		= mv88e6xxx_vlan_getnext,
 	.port_fdb_add		= mv88e6xxx_port_fdb_add,
 	.port_fdb_del		= mv88e6xxx_port_fdb_del,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 6c86bad..8423924 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1254,6 +1254,32 @@ static int _mv88e6xxx_vtu_stu_data_read(struct dsa_switch *ds,
 	return 0;
 }
 
+static int _mv88e6xxx_vtu_stu_data_write(struct dsa_switch *ds,
+					 struct mv88e6xxx_vtu_stu_entry *entry,
+					 unsigned int nibble_offset)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	u16 regs[3] = { 0 };
+	int i;
+	int ret;
+
+	for (i = 0; i < ps->num_ports; ++i) {
+		unsigned int shift = (i % 4) * 4 + nibble_offset;
+		u8 data = entry->data[i];
+
+		regs[i / 4] |= (data & GLOBAL_VTU_STU_DATA_MASK) << shift;
+	}
+
+	for (i = 0; i < 3; ++i) {
+		ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL,
+					   GLOBAL_VTU_DATA_0_3 + i, regs[i]);
+		if (ret < 0)
+			return ret;
+	}
+
+	return 0;
+}
+
 static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, u16 vid,
 				  struct mv88e6xxx_vtu_stu_entry *entry)
 {
@@ -1307,6 +1333,93 @@ static int _mv88e6xxx_vtu_getnext(struct dsa_switch *ds, u16 vid,
 	return 0;
 }
 
+static int _mv88e6xxx_vtu_loadpurge(struct dsa_switch *ds,
+				    struct mv88e6xxx_vtu_stu_entry *entry)
+{
+	u16 reg = 0;
+	int ret;
+
+	ret = _mv88e6xxx_vtu_wait(ds);
+	if (ret < 0)
+		return ret;
+
+	if (!entry->valid)
+		goto loadpurge;
+
+	/* Write port member tags */
+	ret = _mv88e6xxx_vtu_stu_data_write(ds, entry, 0);
+	if (ret < 0)
+		return ret;
+
+	if (mv88e6xxx_6097_family(ds) || mv88e6xxx_6165_family(ds) ||
+	    mv88e6xxx_6351_family(ds) || mv88e6xxx_6352_family(ds)) {
+		reg = entry->sid & GLOBAL_VTU_SID_MASK;
+		ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_SID, reg);
+		if (ret < 0)
+			return ret;
+
+		reg = entry->fid & GLOBAL_VTU_FID_MASK;
+		ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_FID, reg);
+		if (ret < 0)
+			return ret;
+	}
+
+	reg = GLOBAL_VTU_VID_VALID;
+loadpurge:
+	reg |= entry->vid & GLOBAL_VTU_VID_MASK;
+	ret = _mv88e6xxx_reg_write(ds, REG_GLOBAL, GLOBAL_VTU_VID, reg);
+	if (ret < 0)
+		return ret;
+
+	return _mv88e6xxx_vtu_cmd(ds, GLOBAL_VTU_OP_VTU_LOAD_PURGE);
+}
+
+int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port, u16 vid)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	struct mv88e6xxx_vtu_stu_entry vlan;
+	bool keep = false;
+	int i, err;
+
+	mutex_lock(&ps->smi_mutex);
+
+	err = _mv88e6xxx_vtu_getnext(ds, vid - 1, &vlan);
+	if (err)
+		goto unlock;
+
+	if (vlan.vid != vid || !vlan.valid ||
+	    vlan.data[port] == GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER) {
+		err = -ENOENT;
+		goto unlock;
+	}
+
+	vlan.data[port] = GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER;
+
+	/* keep the VLAN unless all ports are excluded */
+	for (i = 0; i < ps->num_ports; ++i) {
+		if (dsa_is_cpu_port(ds, i))
+			continue;
+
+		if (vlan.data[i] != GLOBAL_VTU_DATA_MEMBER_TAG_NON_MEMBER) {
+			keep = true;
+			break;
+		}
+	}
+
+	vlan.valid = keep;
+	err = _mv88e6xxx_vtu_loadpurge(ds, &vlan);
+	if (err)
+		goto unlock;
+
+	if (!keep)
+		clear_bit(vlan.fid, ps->fid_bitmap);
+
+unlock:
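The nibble packing done by _mv88e6xxx_vtu_stu_data_write() in the patch above can be tried outside the kernel. The following is only a minimal userspace sketch: pack_port_data() and DATA_MASK are hypothetical names, the 2-bit mask value is an assumption standing in for GLOBAL_VTU_STU_DATA_MASK, and the register writes are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 2-bit field mask; stands in for GLOBAL_VTU_STU_DATA_MASK. */
#define DATA_MASK 0x03u

/*
 * Pack one small per-port field into three 16-bit registers:
 * four ports per register, one nibble per port, with the field
 * placed at bit nibble_offset inside each nibble.
 * Ports beyond the 12 that fit in three registers are ignored.
 */
static void pack_port_data(const uint8_t *data, size_t num_ports,
			   unsigned int nibble_offset, uint16_t regs[3])
{
	size_t i;

	regs[0] = regs[1] = regs[2] = 0;

	for (i = 0; i < num_ports && i < 12; ++i) {
		unsigned int shift = (unsigned int)(i % 4) * 4 + nibble_offset;

		regs[i / 4] |= (uint16_t)((data[i] & DATA_MASK) << shift);
	}
}
```

With seven ports and nibble_offset 0, ports 0 to 3 land in regs[0] and ports 4 to 6 in regs[1]; a non-zero nibble_offset moves each field up inside its own nibble without disturbing its neighbours.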
[PATCH net-next 0/7] net: dsa: mv88e6xxx: add hardware VLAN support
Hi All,

This patchset brings support to access hardware VLAN entries in DSA and
mv88e6xxx, through switchdev VLAN objects.

In the following example, ports swp[0-2] belong to bridge br0, and ports
swp[3-4] belong to bridge br1. Here's an example of what can be achieved
after this patchset:

    # bridge vlan add dev swp1 vid 100 master
    # bridge vlan add dev swp2 vid 100 master
    # bridge vlan add dev swp3 vid 100 master
    # bridge vlan add dev swp4 vid 100 master
    # bridge vlan del dev swp1 vid 100 master

The above commands correctly programmed hardware VLAN 100 for port swp2,
while ports swp3 and swp4 use software VLAN 100, as shown with:

    # bridge vlan
    port    vlan ids
    swp0    None
    swp0
    swp1    None
    swp1
    swp2     100
    swp2     100
    swp3     100
    swp3
    swp4     100
    swp4
    br0     None
    br1     None

Assuming that port 5 is the CPU port, the hardware VLAN table would
contain the following data:

    VID  FID  SID  0 1 2 3 4 5 6
    100    8    0  x x t x x t x

Where 'x' means excluded, and 't' means tagged.

Also, adding an FDB entry to VLAN 100 for port swp2 like this:

    # bridge fdb add 3c:97:0e:11:6e:30 dev swp2 vlan 100

Would result in the following example output:

    # bridge fdb
    # 01:00:5e:00:00:01 dev eth0 self permanent
    # 01:00:5e:00:00:01 dev eth1 self permanent
    # 00:50:d2:10:78:15 dev swp0 master br0 permanent
    # 00:50:d2:10:78:15 dev swp2 vlan 100 master br0 permanent
    # 3c:97:0e:11:6e:30 dev swp2 vlan 100 self static
    # 00:50:d2:10:78:15 dev swp3 master br1 permanent
    # 00:50:d2:10:78:15 dev swp3 vlan 100 master br1 permanent

And the Address Translation Unit would contain:

    DB   T/P   Vec  State  Addr
    008  Port  004    e    3c:97:0e:11:6e:30

Cheers,
-v

Vivien Didelot (7):
  net: dsa: add support for switchdev VLAN objects
  net: dsa: mv88e6xxx: flush VTU and STU entries
  net: dsa: mv88e6xxx: add VLAN Get Next support
  net: dsa: mv88e6xxx: add VLAN support to FDB dump
  net: dsa: mv88e6xxx: add VLAN Purge support
  net: dsa: mv88e6xxx: add VLAN Load support
  net: dsa: mv88e6xxx: use port 802.1Q mode Secure

 drivers/net/dsa/mv88e6352.c |   5 +
 drivers/net/dsa/mv88e6xxx.c | 510 +++-
 drivers/net/dsa/mv88e6xxx.h |  45
 include/net/dsa.h           |  11 +
 net/dsa/slave.c             | 158 ++
 5 files changed, 720 insertions(+), 9 deletions(-)

--
2.5.0

--
To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
From: Joe Perches j...@perches.com
Date: Thu, 13 Aug 2015 10:15:15 -0700

> On Thu, 2015-08-13 at 20:11 +0300, Igor Plyatov wrote:
>>> On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
>>>> * Due to a HW bug, LAN8700 sometimes does not detect the presence of
>>>>   energy in the Ethernet cable in Energy Detect Power-Down mode (e.g.
>>>>   while the EDPWRDOWN bit is set, the ENERGYON bit is sometimes not
>>>>   asserted). This is a common bug of the LAN87xx family of PHY chips.
>>>> * The lan87xx_read_status() was improved to acquire the ENERGYON bit.
>>>>   Its previous algorithm was still not 100% reliable and sometimes
>>>>   missed cable plugging.
>>> []
>>>> diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
>>> []
>>>> @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
>>>>  static int lan87xx_read_status(struct phy_device *phydev)
>>>>  {
>>>>  	int err = genphy_read_status(phydev);
>>>> +	int rc;
>>>
>>> Is there a reason to move this declaration?
>>
>> There is no strict requirement to move the declaration of rc.
>> It was made just to have all declarations easily visible.
>
> Generally it's better to have declarations in the minimal/narrowest
> scope possible.

Agreed, and it's 100% unrelated to the purpose of this patch, so it
should not be included for that reason as well.

You will need to respin this patch with the variable moving elided.
Re: [PATCH] net: allow sleeping when modifying store_rps_map
On Thu, Aug 13, 2015 at 11:03 AM, Sasha Levin sasha.le...@oracle.com wrote:
> Commit 10e4ea751 (net: Fix race condition in store_rps_map) has moved the
> manipulation of the rps_needed jump label under a spinlock. Since changing
> the state of a jump label may sleep this is incorrect and causes warnings
> during runtime.
>
> Make rps_map_lock a mutex to allow sleeping under it.
>
> Fixes: 10e4ea751 (net: Fix race condition in store_rps_map)
> Signed-off-by: Sasha Levin sasha.le...@oracle.com
> ---
>  net/core/net-sysfs.c |    8
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 39ec694..b279077 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -689,7 +689,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
>  	struct rps_map *old_map, *map;
>  	cpumask_var_t mask;
>  	int err, cpu, i;
> -	static DEFINE_SPINLOCK(rps_map_lock);
> +	static DEFINE_MUTEX(rps_map_mutex);
>
>  	if (!capable(CAP_NET_ADMIN))
>  		return -EPERM;
> @@ -722,9 +722,9 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
>  		map = NULL;
>  	}
>
> -	spin_lock(&rps_map_lock);
> +	mutex_lock(&rps_map_mutex);
>  	old_map = rcu_dereference_protected(queue->rps_map,
> -					    lockdep_is_held(&rps_map_lock));
> +					    mutex_is_locked(&rps_map_mutex));
>  	rcu_assign_pointer(queue->rps_map, map);
>
>  	if (map)
> @@ -732,7 +732,7 @@ static ssize_t store_rps_map(struct netdev_rx_queue *queue,
>  	if (old_map)
>  		static_key_slow_dec(&rps_needed);
>
> -	spin_unlock(&rps_map_lock);
> +	mutex_unlock(&rps_map_mutex);
>
>  	if (old_map)
>  		kfree_rcu(old_map, rcu);
> --
> 1.7.10.4

Thanks Sasha!

Acked-by: Tom Herbert t...@herbertland.com
Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
On Thu, 2015-08-13 at 20:11 +0300, Igor Plyatov wrote:
>> On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
>>> * Due to a HW bug, LAN8700 sometimes does not detect the presence of
>>>   energy in the Ethernet cable in Energy Detect Power-Down mode (e.g.
>>>   while the EDPWRDOWN bit is set, the ENERGYON bit is sometimes not
>>>   asserted). This is a common bug of the LAN87xx family of PHY chips.
>>> * The lan87xx_read_status() was improved to acquire the ENERGYON bit.
>>>   Its previous algorithm was still not 100% reliable and sometimes
>>>   missed cable plugging.
>> []
>>> diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
>> []
>>> @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
>>>  static int lan87xx_read_status(struct phy_device *phydev)
>>>  {
>>>  	int err = genphy_read_status(phydev);
>>> +	int rc;
>>
>> Is there a reason to move this declaration?
>
> There is no strict requirement to move the declaration of rc.
> It was made just to have all declarations easily visible.

Generally it's better to have declarations in the minimal/narrowest
scope possible.
Re: [PATCH 0/2] net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len
On Thu, 13 Aug 2015 10:49:50 -0700 Stephen Hemminger step...@networkplumber.org wrote:
> On Thu, 13 Aug 2015 19:01:05 +0200 Phil Sutter p...@nwl.cc wrote:
>> Up to now, drivers being aware of the above applying to them set
>> dev->tx_queue_len to zero to indicate no qdisc should be attached to
>> the interface they drive and the kernel reacts upon this by assigning
>> the noop qdisc instead of the default pfifo_fast. This implicit
>> agreement though leads to an inconvenient situation once a user tries
>> to attach a real qdisc to these devices, as the formerly special
>> tx_queue_len value becomes a regular one,
>
> So this is a workaround for user ignorance by introducing kernel API
> complexity. Before user sets qdisc, why don't they set tx queue length?

Please don't insist on keeping this broken interface... how should users
know that BEFORE adding a qdisc they MUST change the _device_ tx queue
length (to something other than zero)?

To get back to the original state, they MUST change the device tx queue
len back to zero BEFORE deleting the qdisc, so that when the default
qdisc is assigned the system detects that this device can work without a
qdisc. Changing the tx queue len to zero after the qdisc is deleted will
have no effect.

Listen to the description, that interface is broken. The kernel really
needs to hide these details from userspace. It even allows you to
misconfigure the kernel, by tricking it into assigning noqueue to
physical devices that really need a queue.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
[PATCH 0/6] bonding: only advertise LRO if underlying hardware can LRO
At present, you can create a bond containing only underlying slaves that
do not support LRO, and the bond will happily claim to support LRO, and
allow LRO to be toggled on and off by ethtool. While things actually do
function fine in this scenario, and this is merely cosmetic, it's a bit
misleading to users, and it's something we can fix.

If we add NETIF_F_LRO to the NETIF_F_ONE_FOR_ALL flags in
netdev_features.h, then netdev_features_increment() will only enable LRO
if 1) it's listed in the device's feature mask and 2) there's actually a
slave present that supports the feature.

However, the bnx2x, ixgbe, netxen, qlcnic and s2io drivers all fail to
report support for LRO in their vlan_features, which requires some minor
fixups to these drivers to keep LRO working in cases where it should have
been before this set. The mellanox mlx5 and cavium liquidio drivers
already properly set the LRO flag in their vlan_features.

Note: I've only tested explicitly with bnx2x, as well as some non-LRO hw,
to confirm that:
1) if all slaves support LRO, the bond enables LRO
2) if some slaves support LRO, the bond enables LRO
3) if no slaves support LRO, the bond disables LRO

This set was generated against net-next master; it applies to 4.2.0-rc6
with a bit of fuzz.

Jarod Wilson (6):
  net/bonding: enable LRO if one device supports it
  ethernet/bnx2x: advertise LRO support in vlan_features
  ethernet/ixgbe: advertise LRO support in vlan_features
  ethernet/netxen: advertise LRO support in vlan_features
  ethernet/qlcnic: advertise LRO support in vlan_features
  ethernet/s2io: advertise what hw supports in vlan_features

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c     | 3 ++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c        | 4 +++-
 drivers/net/ethernet/neterion/s2io.c                 | 1 +
 drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c | 4 +++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c     | 4 +++-
 include/linux/netdev_features.h                      | 3 ++-
 6 files changed, 14 insertions(+), 5 deletions(-)

CC: David S. Miller da...@davemloft.net
CC: Ariel Elior ariel.el...@qlogic.com
CC: Manish Chopra manish.cho...@qlogic.com
CC: Rajesh Borundia rajesh.borun...@qlogic.com
CC: Shahed Shaikh shahed.sha...@qlogic.com
CC: Sony Chacko sony.cha...@qlogic.com
CC: dept-gelinuxnic...@qlogic.com
CC: Jiri Pirko j...@resnulli.us
CC: Jon Mason jdma...@kudzu.us
CC: Scott Feldman sfel...@gmail.com
CC: Tom Herbert therb...@google.com
CC: Jeff Kirsher jeffrey.t.kirs...@intel.com
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org

--
1.8.3.1
Re: [PATCH net-next v2 1/2] tcp: don't extend RTO on failed loss probe attempts
On Wed, 2015-08-12 at 11:18 -0700, Yuchung Cheng wrote:
> If TLP was unable to send a probe, it extended the RTO to
> now + icsk_rto. But extending the RTO makes little sense
> if no TLP probe went out. With this commit, instead of
> extending the RTO we re-arm it relative to the transmit time
> of the write queue head.
>
> Signed-off-by: Yuchung Cheng ych...@google.com
> Signed-off-by: Neal Cardwell ncardw...@google.com
> Signed-off-by: Nandita Dukkipati nandi...@google.com
> ---

Acked-by: Eric Dumazet eduma...@google.com
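The difference between extending the RTO to now + icsk_rto and re-arming it relative to the transmit time of the write queue head comes down to a little timestamp arithmetic. The sketch below is a userspace illustration only, not the code from the patch: rearm_rto_ticks() and its parameters are hypothetical, and falling back to the full RTO when the computed deadline has already passed is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Return the timeout, in ticks from `now`, that makes the RTO fire at
 * head_xmit_time + rto instead of at now + rto.
 *
 * Assumption of this sketch: if that deadline is already in the past,
 * fall back to the full rto.
 */
static uint32_t rearm_rto_ticks(uint32_t now, uint32_t head_xmit_time,
				uint32_t rto)
{
	/* Signed difference so that tick-counter wraparound is tolerated. */
	int32_t delta = (int32_t)(head_xmit_time + rto - now);

	return delta > 0 ? (uint32_t)delta : rto;
}
```

For a head packet sent at tick 900 with an RTO of 300 ticks, a failed probe at tick 1000 re-arms the timer for 200 ticks, so it still fires at tick 1200; extending it would have pushed expiry out to tick 1300.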
[PATCH 1/6] net/bonding: enable LRO if one device supports it
Currently, all bonding devices come up, and claim to have LRO support,
which ethtool will let you toggle on and off, even if none of the
underlying hardware devices actually support it. While the bonding driver
takes precautions for slaves that don't support all features, this is at
least a little bit misleading to users.

If we add NETIF_F_LRO to the NETIF_F_ONE_FOR_ALL flags in
netdev_features.h, then netdev_features_increment() will only enable LRO
if 1) it's listed in the device's feature mask and 2) there's actually a
slave present that supports the feature.

Note that this is going to require some follow-up patches, as not all LRO
capable device drivers are currently properly reporting LRO support in
their vlan_features, which is where the bonding driver picks up
device-specific features.

CC: David S. Miller da...@davemloft.net
CC: Jiri Pirko j...@resnulli.us
CC: Tom Herbert therb...@google.com
CC: Scott Feldman sfel...@gmail.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 include/linux/netdev_features.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 9672781..6440bf1 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -159,7 +159,8 @@ enum {
  */
 #define NETIF_F_ONE_FOR_ALL	(NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ROBUST | \
 				 NETIF_F_SG | NETIF_F_HIGHDMA |		\
-				 NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED)
+				 NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED | \
+				 NETIF_F_LRO)
 
 /*
  * If one device doesn't support one of these features, then disable it
--
1.8.3.1
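The two conditions named in the changelog (listed in the device's feature mask, and at least one slave supporting it) can be modelled in a few lines. This is a simplified userspace sketch, not the kernel's feature-merging code: the F_* bit values and merge_one_for_all() are hypothetical, and only the "one for all" half of the merge is shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits for this sketch; not the kernel's values. */
#define F_SG		(1u << 0)
#define F_LRO		(1u << 1)
#define F_ONE_FOR_ALL	(F_SG | F_LRO)

/*
 * A "one for all" feature is enabled on the aggregate device only if
 * the aggregate's own mask lists it AND at least one lower device
 * advertises it.
 */
static uint32_t merge_one_for_all(const uint32_t *lower, size_t n,
				  uint32_t mask)
{
	uint32_t all = 0;
	size_t i;

	for (i = 0; i < n; ++i)
		all |= lower[i] & F_ONE_FOR_ALL & mask;

	return all;
}
```

With lower devices { F_SG, F_SG | F_LRO } the result includes F_LRO; with { F_SG, 0 } it does not, which is the behaviour the changelog is after for a bond with no LRO-capable slaves.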
Re: [PATCHv3 net-next 06/10] openvswitch: Allow matching on conntrack mark
On Wed, Aug 12, 2015 at 4:41 PM, Joe Stringer joestrin...@nicira.com wrote:
> On 12 August 2015 at 16:00, Pravin Shelar pshe...@nicira.com wrote:
>> On Tue, Aug 11, 2015 at 3:59 PM, Joe Stringer joestrin...@nicira.com wrote:
>>> From: Justin Pettit jpet...@nicira.com
>>>
>>> Allow matching and setting the conntrack mark field. As with conntrack
>>> state and zone, these are populated by executing the ct() action. Unlike
>>> these, the ct_mark is also a writable field. The set_field() action may
>>> be used to modify the mark, which will take effect on the most recent
>>> conntrack entry.
>>>
>>> E.g.: actions:ct(zone=0),ct(zone=1),set_field(1->ct_mark)
>>>
>>> This will perform conntrack lookup in zone 0, then lookup in zone 1,
>>> then modify the mark for the entry in zone 1. The mark for the entry in
>>> zone 0 is unchanged. The conntrack entry itself must be committed using
>>> the commit flag in the conntrack action flags for this change to
>>> persist.
>>>
>>> Signed-off-by: Justin Pettit jpet...@nicira.com
>>> Signed-off-by: Joe Stringer joestrin...@nicira.com
>>> ---
>> ...
>>> +int ovs_ct_set_mark(struct sk_buff *skb, struct sw_flow_key *key,
>>> +		    u32 ct_mark, u32 mask)
>>> +{
>>> +#ifdef CONFIG_NF_CONNTRACK_MARK
>>> +	enum ip_conntrack_info ctinfo;
>>> +	struct nf_conn *ct;
>>> +	u32 new_mark;
>>> +
>>> +	/* This must happen directly after lookup/commit. */
>>> +	ct = nf_ct_get(skb, &ctinfo);
>>> +	if (!ct)
>>> +		return -EINVAL;
>>> +
>>> +	new_mark = ct_mark | (ct->mark & ~(mask));
>>> +	if (ct->mark != new_mark) {
>>> +		ct->mark = new_mark;
>>> +		nf_conntrack_event_cache(IPCT_MARK, ct);
>>> +		key->ct.mark = ct_mark;
>>> +	}
>>> +
>>
>> Is it fine to just set the mark and not initialize the rest of the
>> key->ct members?
>
> I don't quite follow. This action acts upon the current connection,
> and modifies its metadata. key->ct should already be populated with
> the existing connection info.

I had an offline discussion with Joe. The fields are initialized in the
prior conntrack action. So now he is exploring whether we can bring the
conntrack, set mark and set label actions under one single conntrack
action using parameters.
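The masked update in the quoted ovs_ct_set_mark(), new_mark = ct_mark | (ct->mark & ~mask), is easy to check in isolation. The helper below is a standalone sketch with a hypothetical name (masked_update); it assumes, as the quoted expression appears to, that the caller passes a value already restricted to the mask.

```c
#include <stdint.h>

/*
 * Replace only the bits selected by `mask`, keeping every other bit of
 * `old`. Assumption: `value` has no bits set outside `mask`.
 */
static uint32_t masked_update(uint32_t old, uint32_t value, uint32_t mask)
{
	return value | (old & ~mask);
}
```

For example, masked_update(0xAABBCCDD, 0x11, 0xFF) rewrites only the low byte, and a mask of 0 leaves the old mark untouched.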
[PATCH net] inet: fix potential deadlock in reqsk_queue_unlink()
From: Eric Dumazet eduma...@google.com

When replacing del_timer() with del_timer_sync(), I introduced
a deadlock condition:

reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()

inet_csk_reqsk_queue_drop() can be called from many contexts,
one being the timer handler itself (reqsk_timer_handler()).

In this case, del_timer_sync() loops forever.

Simple fix is to test if timer is pending.

Fixes: 2235f2ac75fd (inet: fix races with reqsk timers)
Signed-off-by: Eric Dumazet eduma...@google.com
---
Sorry for this very embarrassing bug, I should have caught it in my
tests :(

 net/ipv4/inet_connection_sock.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 05e3145f7dc3..134957159c27 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -593,7 +593,7 @@ static bool reqsk_queue_unlink(struct request_sock_queue *queue,
 	}
 
 	spin_unlock(&queue->syn_wait_lock);
-	if (del_timer_sync(&req->rsk_timer))
+	if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
 		reqsk_put(req);
 	return found;
 }
Re: [PATCH 0/2] Enable smsc911x for use with ACPI
From: Jeremy Linton jeremy.lin...@arm.com
Date: Wed, 12 Aug 2015 17:06:25 -0500

> This set of patches enables the front Ethernet port on the ARM Juno
> development platform when used with an ACPI enabled kernel.
>
> These patches convert the of_property* calls in the driver to the
> DT/ACPI agnostic device_property* calls, and add the arm hardware id
> to the acpi_match_table.
>
> To support the above changes I copied a couple of routines from of_net
> into the properties.c file, and modified them to be ACPI/DT agnostic.
> I'm not 100% sure this is the correct location for these functions.
> But I think they are required to avoid having a dozen different
> implementations scattered across assorted Ethernet adapters that
> are being enabled to use ACPI properties.

I realize that there are still some wrinkles to work out, but I applied
this patch series as-is to net-next, and any fixups should be submitted
as followups.

Thanks.
Re: [PATCH] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
Dear David and Joe,

thank you for the patch review!

Please look at the email with subject
[PATCH v2] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging

Best wishes.
--
Igor Plyatov

> From: Joe Perches j...@perches.com
> Date: Thu, 13 Aug 2015 10:15:15 -0700
>
>> On Thu, 2015-08-13 at 20:11 +0300, Igor Plyatov wrote:
>>>> On Thu, 2015-08-13 at 16:12 +0300, Igor Plyatov wrote:
>>>>> * Due to a HW bug, LAN8700 sometimes does not detect the presence of
>>>>>   energy in the Ethernet cable in Energy Detect Power-Down mode (e.g.
>>>>>   while the EDPWRDOWN bit is set, the ENERGYON bit is sometimes not
>>>>>   asserted). This is a common bug of the LAN87xx family of PHY chips.
>>>>> * The lan87xx_read_status() was improved to acquire the ENERGYON bit.
>>>>>   Its previous algorithm was still not 100% reliable and sometimes
>>>>>   missed cable plugging.
>>>> []
>>>>> diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
>>>> []
>>>>> @@ -104,10 +104,12 @@ static int lan911x_config_init(struct phy_device *phydev)
>>>>>  static int lan87xx_read_status(struct phy_device *phydev)
>>>>>  {
>>>>>  	int err = genphy_read_status(phydev);
>>>>> +	int rc;
>>>>
>>>> Is there a reason to move this declaration?
>>>
>>> There is no strict requirement to move the declaration of rc.
>>> It was made just to have all declarations easily visible.
>>
>> Generally it's better to have declarations in the minimal/narrowest
>> scope possible.
>
> Agreed, and it's 100% unrelated to the purpose of this patch, so it
> should not be included for that reason as well.
>
> You will need to respin this patch with the variable moving elided.
Re: kernel warning in tcp_fragment
Hi,

On Wed, Aug 12, 2015 at 8:45 PM, Martin KaFai Lau ka...@fb.com wrote:
> On Mon, Aug 10, 2015 at 02:35:37PM -0400, Neal Cardwell wrote:
>> On Mon, Aug 10, 2015 at 2:10 PM, Jovi Zhangwei j...@cloudflare.com wrote:
>>> Ping?
>>>
>>> We saw a lot of these warnings in our production system. It would be
>>> greatly appreciated if someone could give us a fix for these
>>> warnings. :)
>>
>> What is your net.ipv4.tcp_mtu_probing setting? If 1, have you tried
>> setting it to 0?
>
> Hi Jovi,
>
> If setting net.ipv4.tcp_mtu_probing=0 helps, can you give the patch we
> posted earlier a try:
> https://patchwork.ozlabs.org/patch/481609/
>
> It is the same patch that I pointed out earlier. You can click on the
> download link. We are currently using a similar patch while keeping
> net.ipv4.tcp_mtu_probing=1.

Our system needs net.ipv4.tcp_mtu_probing, so we cannot set it to 0. We
are testing the previous patch given by Neal; I will let you know the
result.

Thanks.
[PATCH] net/fsl: simplify Kconfig dependency list for fsl networking
Make the list of Kconfig dependencies for Freescale networking
more general. Simplify to supported architectures: ARM, ARM64,
PPC, M68K

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
 drivers/net/ethernet/freescale/Kconfig | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index ff76d4e..70782d7 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -5,9 +5,7 @@
 config NET_VENDOR_FREESCALE
 	bool "Freescale devices"
 	default y
-	depends on FSL_SOC || QUICC_ENGINE || CPM1 || CPM2 || PPC_MPC512x || \
-		   M523x || M527x || M5272 || M528x || M520x || M532x || \
-		   ARCH_MXC || ARCH_MXS || (PPC_MPC52xx && PPC_BESTCOMM)
+	depends on M68K || PPC || ARM || ARM64
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y.
--
2.3.3
Re: [PATCH net-next 5/6] geneve: Consolidate Geneve functionality in single module.
On Thu, Aug 13, 2015 at 2:24 PM, Jesse Gross je...@nicira.com wrote:
> On Wed, Aug 12, 2015 at 4:33 PM, Pravin Shelar pshe...@nicira.com wrote:
>> On Wed, Aug 12, 2015 at 2:55 PM, Jesse Gross je...@nicira.com wrote:
>>> The farther I go in the series, the more that I hope that we can avoid
>>> the use of collect_md_tun. It really seems to add a lot of special
>>> cases.
>>
>> Use of collect_md_tun allows us to avoid a hash table lookup; that's
>> why I did it. Anyway, we need a flag or pointer in the geneve-sock
>> structure to locate tunnel metadata. I don't see how it is simpler if
>> collect_md_tun is replaced with a flag.
>
> It seems to me that this requires more bookkeeping to keep consistent
> (a little in the code but mostly mentally) because collect_md_tun is a
> separate concept that doesn't necessarily follow the same rules as
> other tunnels. With VXLAN, I feel like I can mostly ignore this tunnel
> since it isn't special in most ways with the exception of places where
> it actually needs to do something different like allocate metadata.

OK, I will update the patch.
Re: [PATCH net-next v2 0/2] minor tail loss probe improvements
From: Yuchung Cheng ych...@google.com
Date: Wed, 12 Aug 2015 11:18:17 -0700

> This patch series enhance the tail loss probe (TLP) on some
> error conditions. When TLP fails to send a probe, it will no
> longer extend the RTO. When it fails to send a new packet
> because of receiver window limit, it'll try to retransmit
> the last packet.

Series applied, thanks.
Re: [PATCH] Revert net: limit tcp/udp rmem/wmem to SOCK_{RCV,SND}BUF_MIN
On Thu, 2015-08-13 at 14:21 -0700, Calvin Owens wrote:
> Commit 8133534c760d4083 (net: limit tcp/udp rmem/wmem to
> SOCK_{RCV,SND}BUF_MIN) modified four sysctls to enforce that the values
> written to them are not less than SOCK_MIN_{RCV,SND}BUF.
...
> This reverts commit 8133534c760d4083f79d2cde42c636ccc0b2792e.
>
> Fixes: 8133534c760d4083 (net: limit tcp/udp rmem/wmem to SOCK_MIN...)
> Cc: Eric Dumazet eric.duma...@gmail.com
> Cc: Sorin Dumitru so...@returnze.ro
> Signed-off-by: Calvin Owens calvinow...@fb.com

Acked-by: Eric Dumazet eduma...@google.com