date:20151006

[PATCH net-next] cxgb4: Enhance driver to update FW, when FW is too old

2015-10-06 Thread Hariprasad Shenai

t4_check_fw_version() can return several error codes (-EINVAL, -EBUSY,
-EAGAIN). The present code sets the adapter state to UNINIT only if the
error is -EFAULT. Set the adapter to the uninitialized state in all error
cases.

In t4_check_fw_version() if call to t4_read_flash() fails, repeat the
operation a few times before returning failure.

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +-
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c  | 6 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 9f1f5b2..c29227e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3698,7 +3698,7 @@ static int adap_init0(struct adapter *adap)
t4_get_tp_version(adap, &adap->params.tp_vers);
ret = t4_check_fw_version(adap);
/* If firmware is too old (not supported by driver) force an update. */
-   if (ret == -EFAULT)
+   if (ret)
state = DEV_STATE_UNINIT;
if ((adap->flags & MASTER_PF) && state != DEV_STATE_INIT) {
struct fw_info *fw_info;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c 
b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index dc6ce31..874b599 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -2981,11 +2981,15 @@ int t4_get_exprom_version(struct adapter *adap, u32 
*vers)
  */
 int t4_check_fw_version(struct adapter *adap)
 {
-   int ret, major, minor, micro;
+   int i, ret, major, minor, micro;
int exp_major, exp_minor, exp_micro;
unsigned int chip_version = CHELSIO_CHIP_VERSION(adap->params.chip);
 
ret = t4_get_fw_version(adap, &adap->params.fw_vers);
+   /* Try multiple times before returning error */
+   for (i = 0; ret && i < 3; i++)
+   ret = t4_get_fw_version(adap, &adap->params.fw_vers);
+
if (ret)
return ret;
 
-- 
2.3.4
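
The retry loop in the t4_hw.c hunk can be modeled in user space. This is
only a sketch under assumptions, not driver code: read_with_retry(),
flaky_read() and struct flaky are hypothetical names, and the error value
-5 is arbitrary. As in the hunk, one initial attempt is followed by up to
max_retries further attempts for as long as the call keeps failing.

```c
#include <stddef.h>

/* Hypothetical read callback: returns 0 on success, negative on error. */
typedef int (*read_fn)(void *ctx);

/* One initial attempt plus at most max_retries further attempts. */
static int read_with_retry(read_fn fn, void *ctx, int max_retries)
{
	int ret = fn(ctx);
	int i;

	for (i = 0; ret && i < max_retries; i++)
		ret = fn(ctx);

	return ret;
}

/* Hypothetical stand-in for a flash read that fails a few times. */
struct flaky {
	int failures_left;	/* how many calls will still fail */
	int calls;		/* how many times the reader was invoked */
};

static int flaky_read(void *ctx)
{
	struct flaky *f = ctx;

	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return -5;	/* arbitrary error code for the sketch */
	}
	return 0;
}
```

With max_retries set to 3, the reader is invoked at most four times in
total, which matches the bound in the patch.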

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[GIT PULL nf-next 0/2] Fourth Round of IPVS Updates for v4.4

2015-10-06 Thread Simon Horman

Hi Pablo,

please consider these build warning cleanups from David Ahern and myself.
They resolve some minor side effects of Eric Biederman's heroic work to
clean up IPVS, which you recently pulled. It is queued up for v4.4, so
there is no need to worry about earlier kernel versions.

The following changes since commit a29a9a585b2840a205f085a34dfd65c75e86f7c3:

  netfilter: nfnetlink_log: allow to attach conntrack (2015-10-05 17:32:14 
+0200)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git 
tags/ipvs4-for-v4.4

for you to fetch changes up to 92240e8dc0d1a94ea3dde7cb19aace113dcc6cb9:

  ipvs: Remove possibly unused variables from ip_vs_conn_net_{init,cleanup} 
(2015-10-07 10:12:00 +0900)


David Ahern (1):
  ipvs: Remove possibly unused variable from ip_vs_out

Simon Horman (1):
  ipvs: Remove possibly unused variables from ip_vs_conn_net_{init,cleanup}

 net/netfilter/ipvs/ip_vs_conn.c | 13 +
 net/netfilter/ipvs/ip_vs_core.c |  3 +--
 2 files changed, 6 insertions(+), 10 deletions(-)

[PATCH nf-next 2/2] ipvs: Remove possibly unused variables from ip_vs_conn_net_{init,cleanup}

2015-10-06 Thread Simon Horman

If CONFIG_PROC_FS is undefined then the arguments of proc_create()
and remove_proc_entry() are unused. As a result the net variables of
ip_vs_conn_net_{init,cleanup} are unused.

net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_init’:
net/netfilter/ipvs//ip_vs_conn.c:1350:14: warning: unused variable ‘net’ 
[-Wunused-variable]
net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_cleanup’:
net/netfilter/ipvs//ip_vs_conn.c:1361:14: warning: unused variable ‘net’ 
[-Wunused-variable]
...

Resolve this by dereferencing net as needed rather than storing it
in a variable.

Fixes: 3d99376689ee ("ipvs: Pass ipvs not net into 
ip_vs_control_net_(init|cleanup)")
Signed-off-by: Simon Horman 
Acked-by: Julian Anastasov 
---
 net/netfilter/ipvs/ip_vs_conn.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index d1d168c7fc68..85ca189bdc3d 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1347,23 +1347,20 @@ flush_again:
  */
 int __net_init ip_vs_conn_net_init(struct netns_ipvs *ipvs)
 {
-   struct net *net = ipvs->net;
-
atomic_set(&ipvs->conn_count, 0);
 
-   proc_create("ip_vs_conn", 0, net->proc_net, &ip_vs_conn_fops);
-   proc_create("ip_vs_conn_sync", 0, net->proc_net, &ip_vs_conn_sync_fops);
+   proc_create("ip_vs_conn", 0, ipvs->net->proc_net, &ip_vs_conn_fops);
+   proc_create("ip_vs_conn_sync", 0, ipvs->net->proc_net,
+   &ip_vs_conn_sync_fops);
return 0;
 }
 
 void __net_exit ip_vs_conn_net_cleanup(struct netns_ipvs *ipvs)
 {
-   struct net *net = ipvs->net;
-
/* flush all the connection entries first */
ip_vs_conn_flush(ipvs);
-   remove_proc_entry("ip_vs_conn", net->proc_net);
-   remove_proc_entry("ip_vs_conn_sync", net->proc_net);
+   remove_proc_entry("ip_vs_conn", ipvs->net->proc_net);
+   remove_proc_entry("ip_vs_conn_sync", ipvs->net->proc_net);
 }
 
 int __init ip_vs_conn_init(void)
-- 
2.1.4
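
The warning and its fix can be reproduced in miniature. The sketch below
assumes, as the description says, that the proc helper ignores its
arguments when the config option is off. Every name in it
(SKETCH_PROC_FS, sketch_proc_create(), struct sketch_net, struct
sketch_ipvs) is hypothetical and only stands in for the kernel
counterpart.

```c
struct sketch_net  { int proc_net; };
struct sketch_ipvs { struct sketch_net *net; int conn_count; };

#ifdef SKETCH_PROC_FS
/* "Real" variant: the arguments are used. */
static int sketch_entries;
#define sketch_proc_create(name, parent) \
	((void)(name), (void)(parent), sketch_entries++)
#else
/* Stub variant: the arguments vanish during preprocessing. */
#define sketch_proc_create(name, parent) ((void)0)
#endif

static int sketch_conn_net_init(struct sketch_ipvs *ipvs)
{
	ipvs->conn_count = 0;

	/*
	 * ipvs->net is dereferenced inline, so no local 'net' variable
	 * is left without a reader when the stub variant is selected.
	 */
	sketch_proc_create("ip_vs_conn", ipvs->net->proc_net);
	sketch_proc_create("ip_vs_conn_sync", ipvs->net->proc_net);
	return 0;
}
```

Had the function kept a local "struct sketch_net *net = ipvs->net;", the
stub variant would leave that local without any reader, which is the
situation the compiler warned about.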


[PATCH nf-next 1/2] ipvs: Remove possibly unused variable from ip_vs_out

2015-10-06 Thread Simon Horman

From: David Ahern 

Eric's net namespace changes in 1b75097dd7a26 leave net unreferenced if
CONFIG_IP_VS_IPV6 is not enabled:

../net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_out’:
../net/netfilter/ipvs/ip_vs_core.c:1177:14: warning: unused variable ‘net’ 
[-Wunused-variable]

After the net refactoring there is only 1 user; push the reference to the
1 user. While the line length slightly exceeds 80 it seems to be the
best change.

Fixes: 1b75097dd7a26("ipvs: Pass ipvs into ip_vs_out")
Signed-off-by: David Ahern 
Acked-by: Julian Anastasov 
[horms: updated subject]
Signed-off-by: Simon Horman 
---
 net/netfilter/ipvs/ip_vs_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index d08df435c2aa..3773154d9b71 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1172,7 +1172,6 @@ drop:
 static unsigned int
 ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, 
int af)
 {
-   struct net *net = ipvs->net;
struct ip_vs_iphdr iph;
struct ip_vs_protocol *pp;
struct ip_vs_proto_data *pd;
@@ -1272,7 +1271,7 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, 
struct sk_buff *skb, in
 #ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6) {
if (!skb->dev)
-   skb->dev = net->loopback_dev;
+   skb->dev = ipvs->net->loopback_dev;
icmpv6_send(skb,
ICMPV6_DEST_UNREACH,
ICMPV6_PORT_UNREACH,
-- 
2.1.4


Re: netpoll_send_skb_on_dev warning with bnx2

2015-10-06 Thread Vinson Lee

On Fri, Oct 2, 2015 at 6:10 AM, Neil Horman  wrote:
> On Thu, Oct 01, 2015 at 08:25:46PM -0700, Vinson Lee wrote:
>> Hi.
>>
>> I am seeing a netpoll_send_skb_on_dev warning with bnx2. It happens on
>> Linux 4.1 and I am able to reproduce the warning with Linux 4.3-rc3.
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 11 PID: 3110 at net/core/netpoll.c:368
>> netpoll_send_skb_on_dev+0x183/0x201()
>> netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll
>> (bnx2_start_xmit+0x0/0x5d4 [bnx2])
>> Modules linked in: netconsole(+) configfs ipv6 ppdev parport_pc lp
>> parport tcp_diag inet_diag ipmi_devintf serio_raw iTCO_wdt
>> iTCO_vendor_support ipmi_si ipmi_msghandler hpilo hpwdt bnx2 coretemp
>> kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode lpc_ich
>> mfd_core i7core_edac edac_core pcc_cpufreq acpi_cpufreq shpchp
>> sch_fq_codel hpsa radeon ttm drm_kms_helper drm fb_sys_fops sysimgblt
>> sysfillrect syscopyarea i2c_algo_bit i2c_core
>> CPU: 11 PID: 3110 Comm: modprobe Not tainted 4.3.0-rc3 #1
>>  0170 88060be17a18 812bc84c 818b1a78
>>  88060be17a68 88060be17a58 81064712 0be17a68
>>  814ec12e 88060bc10900  8800d941c000
>> Call Trace:
>>  [] dump_stack+0x48/0x5c
>>  [] warn_slowpath_common+0xa1/0xbb
>>  [] ? netpoll_send_skb_on_dev+0x183/0x201
>>  [] warn_slowpath_fmt+0x46/0x48
>>  [] ? bnx2_run_loopback+0x391/0x391 [bnx2]
>>  [] netpoll_send_skb_on_dev+0x183/0x201
>>  [] netpoll_send_udp+0x3df/0x3f1
>>  [] write_msg+0xaf/0xe9 [netconsole]
>>  [] call_console_drivers.clone.2+0xd1/0xe9
>>  [] console_unlock+0x30e/0x3a5
>>  [] register_console+0x2af/0x322
>>  [] init_netconsole+0x1b4/0x224 [netconsole]
>>  [] ? 0xa0497000
>>  [] do_one_initcall+0xf7/0x182
>>  [] ? kmem_cache_alloc_trace+0xb6/0xf0
>>  [] ? do_init_module+0x31/0x1e4
>>  [] do_init_module+0x69/0x1e4
>>  [] load_module+0x1451/0x160b
>>  [] ? mod_kobject_put+0x4d/0x4d
>>  [] ? __vmalloc_node+0x3e/0x40
>>  [] SyS_init_module+0x14f/0x155
>>  [] entry_SYSCALL_64_fastpath+0x12/0x6a
>> ---[ end trace 878215466e581776 ]---
>>
>> Cheers,
>> Vinson
>
> Hmm, that would suggest that someone called local_irq_enable in either the
> xmit, poll_controller or napi poll paths (or one of its always-disable
> brethren), but I don't see where that might be happening.  Are you able to
> instrument the kernel (either by cloning that WARN_ON_ONCE call or via a
> stap script), so we can further isolate which call path the problem is
> happening in?
>
> Neil
>

I bisected the regression; it was introduced by this Linux 3.19-rc1 commit.

e22b886a8a43b147e1994a9f970f678fc0df2033 is the first bad commit
commit e22b886a8a43b147e1994a9f970f678fc0df2033
Author: Peter Zijlstra 
Date:   Wed Sep 24 10:18:48 2014 +0200

sched/wait: Add might_sleep() checks

Add more might_sleep() checks, suppose someone put a wait_event() like
thing in a wait loop..

Can't put might_sleep() in ___wait_event() because there's the locked
primitives which call ___wait_event() with locks held.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: t...@linutronix.de
Cc: ilya.dryo...@inktank.com
Cc: umgwanakikb...@gmail.com
Cc: Oleg Nesterov 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/20140924082242.119255...@infradead.org
Signed-off-by: Ingo Molnar 

:040000 040000 1536c4cc3c706b4129452ce023c69733b46a23e4 22c894c6ae02be75e5f772d39fde178f036e906f M	include

Vinson

Re: [PATCH 1/2] regmap: only call custom reg_update_bits() if reg is marked volatile

2015-10-06 Thread Mark Brown

On Tue, Oct 06, 2015 at 06:25:08AM -0700, David Miller wrote:

> > 4) David should then merge the regmap for-next branch into net-next

> Nope, this doesn't work at all.

> It is my tree which people depend upon, not the other way around.

Yes, it does work - this is the way we normally handle cross tree
issues.  There is nothing about pulling code from other trees into your
tree which will stop other people depending on your tree, obviously
anything you merge in needs to stay fast forward only and ideally not
make any resulting pull requests look terrible but that's really the
only restriction.


Re: linux-next: Tree for Oct 6

2015-10-06 Thread Hans-Peter Nilsson

> From: Sudip Mukherjee 
> Date: Tue, 6 Oct 2015 14:33:46 +0200

> On Tue, Oct 06, 2015 at 06:25:22PM +1100, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Changes since 20151002:
> 
> The build for cris allmodconfig is failing with the error:
> 
> net/sched/sch_dsmark.c: In function 'dsmark_dequeue':
> net/sched/sch_dsmark.c:316:1: error: unrecognizable insn:
> (insn 245 244 119 15 (set (reg:QI 11 r11 [179])
> (and:QI (mem/s/j:QI (reg/f:SI 2 r2 [orig:48 D.44939 ] [48]) [0 
> D.44939_34->mask+0 S1 A8])
> (reg:QI 11 r11 [179]))) include/net/dsfield.h:33 -1
>  (nil))
> net/sched/sch_dsmark.c:316:1: internal compiler error: in extract_insn, at 
> recog.c:2109
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.
> make[2]: *** [net/sched/sch_dsmark.o] Error 1
> make[1]: *** [net/sched] Error 2
> make[1]: *** Waiting for unfinished jobs
> 
> It says compiler error, but with 4.3-rc4 it compiled properly.
> Build log of v4.3-rc4 is at
> https://travis-ci.org/sudipm-mukherjee/parport/jobs/83633972
> 
> Build log of linux-next is at:
> https://travis-ci.org/sudipm-mukherjee/parport/jobs/83839470

Thanks for the heads-up, but any chance of a bug report as per
the instructions in the message?  It'll only speed up matters.

brgds, H-P

Re: HW communication debugging interface - ideas?

2015-10-06 Thread Jiri Pirko

Tue, Oct 06, 2015 at 04:54:09PM CEST, john.fastab...@gmail.com wrote:
>On 15-10-06 01:14 AM, Jiri Pirko wrote:
>> Mon, Oct 05, 2015 at 05:47:09PM CEST, john.fastab...@gmail.com wrote:
>>> On 15-10-05 08:35 AM, Jiri Pirko wrote:
>>>> Mon, Oct 05, 2015 at 05:29:09PM CEST, john.fastab...@gmail.com wrote:
>>>>> On 15-10-05 08:18 AM, Jiri Pirko wrote:
>>>>>> Mon, Oct 05, 2015 at 04:58:42PM CEST, and...@lunn.ch wrote:
>>>>>>> On Mon, Oct 05, 2015 at 04:55:42PM +0200, Jiri Pirko wrote:
>>>>>>>> Mon, Oct 05, 2015 at 04:49:41PM CEST, and...@lunn.ch wrote:
>>>>>>>>>>> Are you referring here to messages of the EMAD protocol ?
>>>>>>>>>
>>>>>>>>> I know nothing about this protocol.
>>>>>>>>>
>>>>>>>>> Does it at least use standard Ethernet framing? Source and Destination
>>>>>>>>> header and an EtherType which mean EMAD?
>>>>>>>>
>>>>>>>> Yep, but that does not really matter. I believe we should find
>>>>>>>> debugging interface which is protocol agnostic. Just arbitrary
>>>>>>>> messages monitoring.
>>>>>>>
>>>>>>> Hi Jiri
>>>>>>>
>>>>>>> O.K, it is just that you mentioned wireshark. Passing the frames to
>>>>>>> network interface taps would make this trivial.
>>>>>>
>>>>>> That is true. But using netlink+nlmon would do the same.
>>>>>
>>>>> Also I guess if you go this direction you want to make it generic
>>>>> enough for any drivers to use it to snoop software/firmware msgs. This
>>>>> is common across many devices.
>>>>
>>>> Yes, definitely, this should be something generic to be usable for
>>>> every device type.
>>>>
>>>>
>>>>>
>>>>> In the past though I've just used ethtool dump commands and some
>>>>> "scripts" on top of this to debug devices. And when it got really
>>>>> bad wrote some throw away code to debug my issue. I guess it might
>>>>> be nice to have something in the kernel to improve this but have
>>>>> you considered using the tracing features that already exist?
>>>>
>>>> Which ones do you have in mind?
>>>>
>>>
>>> I was thinking something like kprobes+bpf to dump a trace and
>>> then a lua script in wireshark to parse the input and pretty
>>> print it for users. This might get you good-enough support without
>>> having to carry it around in the kernel just so we can debug
>>> the devices. We could build some libs/pkgs around it in userspace
>>> and get it published somewhere so we can all work on it together.
>>
>> Well, I was thinking rather about some standard interface, not dependent
>> on actual kernel internals.
>>
>
>
>Sure just throwing out an idea. I suspect whatever interface you have
>will include the vendor-id or some other identifier and a set of
>parsers in user space to pretty print the msg.

Yes, that is the plan.

[PATCH v3 0/5] net: dsa: complete and fix the dsa unbinding

2015-10-06 Thread Neil Armstrong

In order to cleanly unbind the dsa core, either as a module removal
or a platform device unbind, switch the allocations to their devm_
counterparts and complete the destroy functions.

First, the missing kfree calls were added and the remove functions were
completed; then the kfree calls were removed in favor of devm_ calls.

The last patch is a way to cleanly exit the probe when no
switch is found in the discovery process.

The patches are based on the current net.

v3:
 - make checkpatch happy with 1/5 & 5/5
 - fix 5/5 exit path with a goto

Neil Armstrong (5):
  net: dsa: add missing kfree on remove
  net: dsa: add missing dsa_switch mdiobus remove
  net: dsa: complete dsa_switch_destroy
  net: dsa: switch to devm_ calls and remove kfree calls
  net: dsa: exit probe if no switch were found

 net/dsa/dsa.c | 70 +--
 1 file changed, 59 insertions(+), 11 deletions(-)

-- 
1.9.1


Re: HW communication debugging interface - ideas?

2015-10-06 Thread John Fastabend

On 15-10-06 01:14 AM, Jiri Pirko wrote:
> Mon, Oct 05, 2015 at 05:47:09PM CEST, john.fastab...@gmail.com wrote:
>> On 15-10-05 08:35 AM, Jiri Pirko wrote:
>>> Mon, Oct 05, 2015 at 05:29:09PM CEST, john.fastab...@gmail.com wrote:
>>>> On 15-10-05 08:18 AM, Jiri Pirko wrote:
>>>>> Mon, Oct 05, 2015 at 04:58:42PM CEST, and...@lunn.ch wrote:
>>>>>> On Mon, Oct 05, 2015 at 04:55:42PM +0200, Jiri Pirko wrote:
>>>>>>> Mon, Oct 05, 2015 at 04:49:41PM CEST, and...@lunn.ch wrote:
>>>>>>>>>> Are you referring here to messages of the EMAD protocol ?
>>>>>>>>
>>>>>>>> I know nothing about this protocol.
>>>>>>>>
>>>>>>>> Does it at least use standard Ethernet framing? Source and Destination
>>>>>>>> header and an EtherType which mean EMAD?
>>>>>>>
>>>>>>> Yep, but that does not really matter. I believe we should find debugging
>>>>>>> interface which is protocol agnostic. Just arbitrary messages
>>>>>>> monitoring.
>>>>>>
>>>>>> Hi Jiri
>>>>>>
>>>>>> O.K, it is just that you mentioned wireshark. Passing the frames to
>>>>>> network interface taps would make this trivial.
>>>>>
>>>>> That is true. But using netlink+nlmon would do the same.
>>>>
>>>> Also I guess if you go this direction you want to make it generic
>>>> enough for any drivers to use it to snoop software/firmware msgs. This
>>>> is common across many devices.
>>>
>>> Yes, definitely, this should be something generic to be usable for
>>> every device type.
>>>
>>>
>>>>
>>>> In the past though I've just used ethtool dump commands and some
>>>> "scripts" on top of this to debug devices. And when it got really
>>>> bad wrote some throw away code to debug my issue. I guess it might
>>>> be nice to have something in the kernel to improve this but have
>>>> you considered using the tracing features that already exist?
>>>
>>> Which ones do you have in mind?
>>>
>>
>> I was thinking something like kprobes+bpf to dump a trace and
>> then a lua script in wireshark to parse the input and pretty
>> print it for users. This might get you good-enough support without
>> having to carry it around in the kernel just so we can debug
>> the devices. We could build some libs/pkgs around it in userspace
>> and get it published somewhere so we can all work on it together.
>
> Well, I was thinking rather about some standard interface, not dependent
> on actual kernel internals.
>


Sure just throwing out an idea. I suspect whatever interface you have
will include the vendor-id or some other identifier and a set of
parsers in user space to pretty print the msg.

.John

[PATCH v3 2/5] net: dsa: add missing dsa_switch mdiobus remove

2015-10-06 Thread Neil Armstrong

To prevent a memory leak on unbinding, add the missing mdiobus unregister
and free calls.

Reviewed-by: Florian Fainelli 
Signed-off-by: Neil Armstrong 
---
 net/dsa/dsa.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index ed9d43f..14fac4e 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -424,6 +424,8 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
if (ds->hwmon_dev)
hwmon_device_unregister(ds->hwmon_dev);
 #endif
+   mdiobus_unregister(ds->slave_mii_bus);
+   mdiobus_free(ds->slave_mii_bus);
 }

 #ifdef CONFIG_PM_SLEEP
-- 
1.9.1


[PATCH v3 4/5] net: dsa: switch to devm_ calls and remove kfree calls

2015-10-06 Thread Neil Armstrong

Now that the kfree calls exist in the remove functions, remove them in all
places except the of_probe functions and replace the allocation calls
with their devm_ counterparts.

Reviewed-by: Florian Fainelli 
Signed-off-by: Neil Armstrong 
---
 net/dsa/dsa.c | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 6155923..d5a162c 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -306,7 +306,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
if (ret < 0)
goto out;

-   ds->slave_mii_bus = mdiobus_alloc();
+   ds->slave_mii_bus = devm_mdiobus_alloc(parent);
if (ds->slave_mii_bus == NULL) {
ret = -ENOMEM;
goto out;
@@ -315,7 +315,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)

ret = mdiobus_register(ds->slave_mii_bus);
if (ret < 0)
-   goto out_free;
+   goto out;


/*
@@ -368,10 +368,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)

return ret;

-out_free:
-   mdiobus_free(ds->slave_mii_bus);
 out:
-   kfree(ds);
return ret;
 }

@@ -401,7 +398,7 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
/*
 * Allocate and initialise switch state.
 */
-   ds = kzalloc(sizeof(*ds) + drv->priv_size, GFP_KERNEL);
+   ds = devm_kzalloc(parent, sizeof(*ds) + drv->priv_size, GFP_KERNEL);
if (ds == NULL)
return ERR_PTR(-ENOMEM);

@@ -462,7 +459,6 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
}

mdiobus_unregister(ds->slave_mii_bus);
-   mdiobus_free(ds->slave_mii_bus);
 }

 #ifdef CONFIG_PM_SLEEP
@@ -922,7 +918,7 @@ static int dsa_probe(struct platform_device *pdev)
goto out;
}

-   dst = kzalloc(sizeof(*dst), GFP_KERNEL);
+   dst = devm_kzalloc(&pdev->dev, sizeof(*dst), GFP_KERNEL);
if (dst == NULL) {
dev_put(dev);
ret = -ENOMEM;
@@ -953,10 +949,8 @@ static void dsa_remove_dst(struct dsa_switch_tree *dst)
for (i = 0; i < dst->pd->nr_chips; i++) {
struct dsa_switch *ds = dst->ds[i];

-   if (ds) {
+   if (ds)
dsa_switch_destroy(ds);
-   kfree(ds);
-   }
}
 }

@@ -965,7 +959,6 @@ static int dsa_remove(struct platform_device *pdev)
struct dsa_switch_tree *dst = platform_get_drvdata(pdev);

dsa_remove_dst(dst);
-   kfree(dst);
dsa_of_remove(&pdev->dev);

return 0;
-- 
1.9.1
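
The idea behind the devm_ conversion, allocations that are tied to a
device and released together when the device goes away, can be shown with
a toy user-space model. This is not how the kernel implements devres; it
is only a sketch, and toy_device, toy_res, toy_devm_zalloc() and
toy_device_release() are hypothetical names.

```c
#include <stdlib.h>

/* One tracked allocation; the list head lives in the toy device. */
struct toy_res {
	struct toy_res *next;
	void *mem;
};

struct toy_device {
	struct toy_res *resources;	/* everything allocated on behalf of the device */
	int released;			/* count of allocations freed at release time */
};

/* Allocate zeroed memory and remember it on the device's resource list. */
static void *toy_devm_zalloc(struct toy_device *dev, size_t size)
{
	struct toy_res *res = malloc(sizeof(*res));

	if (!res)
		return NULL;

	res->mem = calloc(1, size);
	if (!res->mem) {
		free(res);
		return NULL;
	}

	res->next = dev->resources;
	dev->resources = res;
	return res->mem;
}

/* Free every tracked allocation; callers never free them individually. */
static void toy_device_release(struct toy_device *dev)
{
	while (dev->resources) {
		struct toy_res *res = dev->resources;

		dev->resources = res->next;
		free(res->mem);
		free(res);
		dev->released++;
	}
}
```

In this model the error paths and remove functions no longer need their
own free calls, which is the same simplification the patch makes by
dropping kfree() and mdiobus_free().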


Re: HW communication debugging interface - ideas?

2015-10-06 Thread Andrew Lunn

> Sure just throwing out an idea. I suspect whatever interface you have
> will include the vendor-id or some other identifier and a set of
> parsers in user space to pretty print the msg.

If you are going to use wireshark, in this case, all you need to do is
mark the stream as being Ethernet frames. The destination and
Ethertype tell you all you need to know to identify the protocol.

  Andrew
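
As a rough illustration of the point above, a consumer only has to look
at bytes 12 and 13 of an untagged Ethernet frame to find the EtherType.
The helper below is a sketch: frame_ethertype() is a hypothetical name,
and it deliberately ignores VLAN-tagged frames, where the type field
sits further into the frame.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HDR_LEN 14	/* 6 dst + 6 src + 2 EtherType */

/*
 * Return the EtherType of an untagged Ethernet frame in host order,
 * or -1 if the buffer is too short to hold an Ethernet header.
 */
static int frame_ethertype(const uint8_t *frame, size_t len)
{
	if (len < SKETCH_ETH_HDR_LEN)
		return -1;

	/* The field is big-endian on the wire. */
	return (frame[12] << 8) | frame[13];
}
```

A debugging stream could, for instance, carry one of the IEEE local
experimental EtherTypes such as 0x88B5, and a dissector would dispatch
on that value.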

Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Jiri Benc

Some more feedback after doing a deeper review.

On Mon,  5 Oct 2015 10:58:17 -0700, Pravin B Shelar wrote:
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -703,6 +703,32 @@ err:
>   return NETDEV_TX_OK;
>  }
>  
> +static int geneve_egress_tun_info(struct net_device *dev, struct sk_buff 
> *skb,
> +   struct ip_tunnel_info *egress_tun_info,
> +   const void **egress_tun_opts)
> +{
> + struct geneve_dev *geneve = netdev_priv(dev);
> + struct ip_tunnel_info *info;
> + struct rtable *rt;
> + struct flowi4 fl4;
> + __be16 sport;
> +
> + info = skb_tunnel_info(skb);
> + if (ip_tunnel_info_af(info) != AF_INET)
> + return -EINVAL;
> +
> + rt = geneve_get_rt(skb, dev, &fl4, info);

This will increase dev tx error stats in case the lookup fails which is
probably something we don't want.

[...]
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -60,6 +60,7 @@ struct wireless_dev;
>  /* 802.15.4 specific */
>  struct wpan_dev;
>  struct mpls_dev;
> +struct ip_tunnel_info;
>  
>  void netdev_set_default_ethtool_ops(struct net_device *dev,
>   const struct ethtool_ops *ops);
> @@ -1054,6 +1055,11 @@ typedef u16 (*select_queue_fallback_t)(struct 
> net_device *dev,
>   *   This function is used to pass protocol port error state information
>   *   to the switch driver. The switch driver can react to the proto_down
>   *  by doing a phys down on the associated switch port.
> + * int (*ndo_get_egress_info)(struct net_device *dev, struct sk_buff *skb,
> + * __be32 *saddr, __be16 *sport, __be16 *dport);
> + *   This function is used to get egress tunnel information for given skb.
> + *   This is useful for retrieving outer tunnel header parameters while
> + *   sampling packet.
>   *
>   */
>  struct net_device_ops {
> @@ -1227,6 +1233,10 @@ struct net_device_ops {
>   int (*ndo_get_iflink)(const struct net_device *dev);
>   int (*ndo_change_proto_down)(struct net_device *dev,
>bool proto_down);
> + int (*ndo_get_egress_info)(struct net_device *dev,
> +struct sk_buff *skb,
> +struct ip_tunnel_info 
> *egress_tun_info,
> +const void 
> **egress_tun_opts);

This should have at least a better name to reflect it is about IP
tunnels.

But I don't like having an IP tunnel specific ndo, that doesn't sound
right. The real thing that is wanted here is to complete the dst
metadata. What about:

int (*ndo_fill_metadata_dst)(struct net_device *dev, struct sk_buff *skb);

The function will use skb_tunnel_info to get the template info, then
skb_dst_drop and allocate and attach a fully populated metadata_dst.
The egress_tun_info in struct dp_upcall_info then can be completely
dropped, as all the necessary tunnel information will be available
through skb_tunnel_info(skb). Also, when implemented correctly, such
skb will be just sent out without route lookups etc. if afterwards
handed to ndo_start_xmit.

[...]
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -337,6 +337,11 @@ void __init ip_tunnel_core_init(void);
>  void ip_tunnel_need_metadata(void);
>  void ip_tunnel_unneed_metadata(void);
>  
> +void ipv4_egress_info_init(struct ip_tunnel_info *egress_tun_info,
> +const void **egress_tun_opts,
> +struct ip_tunnel_info *info, __be32 saddr,
> +__be16 sport, __be16 dport);

Please use the ip_tunnel prefix as the rest of the functions, this is
not ipv4 egress info but ip *tunnel* egress info.

Also, it's not clear what the difference between "egress_tun_info" and
"info" is. I'd suggest to use "dst_info" and "src_info" or something
similar.

[...]
> --- a/net/ipv4/ip_tunnel_core.c
> +++ b/net/ipv4/ip_tunnel_core.c
> @@ -424,3 +424,40 @@ void ip_tunnel_unneed_metadata(void)
>   static_key_slow_dec(&ip_tunnel_metadata_cnt);
>  }
>  EXPORT_SYMBOL_GPL(ip_tunnel_unneed_metadata);
> +
> +static void tnl_egress_opts_init(struct ip_tunnel_info *egress_tun_info,
> +  const void **egress_tun_opts,
> +  struct ip_tunnel_info *info)
> +{
> + egress_tun_info->options_len = info->options_len;
> + egress_tun_info->mode = info->mode;
> +
> + /* Tunnel options. */
> + if (info->options_len)
> + *egress_tun_opts = ip_tunnel_info_opts(info);
> + else
> + *egress_tun_opts = NULL;
> +}
> +
> +void ipv4_egress_info_init(struct ip_tunnel_info *egress_tun_info,
> +const void **egress_tun_opts,
> +struct ip_tunnel_info *info, __be32 saddr,
> +

Re: [PATCH RFC v1 7/7] net/faraday: Enable offload checksum according to device-tree

2015-10-06 Thread Sergei Shtylyov


Hello.

On 10/6/2015 6:09 AM, Gavin Shan wrote:


> This enables IP/UDP/TCP offload checksum according to information
> passed on from bootloader through device-tree. The offload doesn't
> work properly when the interface works in NCSI mode.
>
> Signed-off-by: Gavin Shan
> ---
>   drivers/net/ethernet/faraday/ftgmac100.c | 6 +-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index 1b13fd4..8caed35 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1377,7 +1377,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
>
> netdev->ethtool_ops = &ftgmac100_ethtool_ops;
> netdev->netdev_ops = &ftgmac100_netdev_ops;
> -   netdev->features = NETIF_F_IP_CSUM | NETIF_F_GRO;
> +   if (pdev->dev.of_node &&
> +   of_get_property(pdev->dev.of_node, "no-hw-checksum", NULL))

   Why not of_property_read_bool()?

> +   netdev->features = NETIF_F_GRO;
> +   else
> +   netdev->features = NETIF_F_IP_CSUM | NETIF_F_GRO;

   Why not set NETIF_F_GRO outside of *if*?

[...]

MBR, Sergei
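
The second remark could look roughly like the following outside the
kernel. This is a sketch only: SKETCH_F_IP_CSUM, SKETCH_F_GRO and
pick_features() are hypothetical stand-ins, and the boolean argument
plays the role of the "no-hw-checksum" property lookup.

```c
#include <stdbool.h>

/* Hypothetical feature bits standing in for the NETIF_F_* flags. */
#define SKETCH_F_IP_CSUM (1u << 0)
#define SKETCH_F_GRO     (1u << 1)

/*
 * GRO is set unconditionally; checksum offload is added only when
 * the "no-hw-checksum" property is absent.
 */
static unsigned int pick_features(bool no_hw_checksum)
{
	unsigned int features = SKETCH_F_GRO;

	if (!no_hw_checksum)
		features |= SKETCH_F_IP_CSUM;

	return features;
}
```

Setting the common flag once and OR-ing in the optional one keeps the
two branches from repeating each other.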


[PATCH v3 5/5] net: dsa: exit probe if no switch were found

2015-10-06 Thread Neil Armstrong

If no switch was found in dsa_setup_dst, return -ENODEV and
exit the dsa_probe cleanly.

Signed-off-by: Neil Armstrong 
---
 net/dsa/dsa.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index d5a162c..adb5325 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -837,10 +837,11 @@ static inline void dsa_of_remove(struct device *dev)
 }
 #endif

-static void dsa_setup_dst(struct dsa_switch_tree *dst, struct net_device *dev,
- struct device *parent, struct dsa_platform_data *pd)
+static int dsa_setup_dst(struct dsa_switch_tree *dst, struct net_device *dev,
+struct device *parent, struct dsa_platform_data *pd)
 {
int i;
+   unsigned configured = 0;

dst->pd = pd;
dst->master_netdev = dev;
@@ -860,9 +861,17 @@ static void dsa_setup_dst(struct dsa_switch_tree *dst, 
struct net_device *dev,
dst->ds[i] = ds;
if (ds->drv->poll_link != NULL)
dst->link_poll_needed = 1;
+
+   ++configured;
}

/*
+* If no switch was found, exit cleanly
+*/
+   if (!configured)
+   return -EPROBE_DEFER;
+
+   /*
 * If we use a tagging format that doesn't have an ethertype
 * field, make sure that all packets from this point on get
 * sent to the tag format's receive function.
@@ -878,6 +887,8 @@ static void dsa_setup_dst(struct dsa_switch_tree *dst, 
struct net_device *dev,
dst->link_poll_timer.expires = round_jiffies(jiffies + HZ);
add_timer(&dst->link_poll_timer);
}
+
+   return 0;
 }

 static int dsa_probe(struct platform_device *pdev)
@@ -927,7 +938,9 @@ static int dsa_probe(struct platform_device *pdev)

platform_set_drvdata(pdev, dst);

-   dsa_setup_dst(dst, dev, &pdev->dev, pd);
+   ret = dsa_setup_dst(dst, dev, &pdev->dev, pd);
+   if (ret)
+   goto out;

return 0;

-- 
1.9.1


[PATCH v3 3/5] net: dsa: complete dsa_switch_destroy

2015-10-06 Thread Neil Armstrong

When unbinding dsa, complete dsa_switch_destroy() so that it unregisters the
fixed-link PHYs, then cleanly unregisters and destroys the net devices.

Reviewed-by: Florian Fainelli 
Signed-off-by: Neil Armstrong 
---
 net/dsa/dsa.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 14fac4e..6155923 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dsa_priv.h"

 char dsa_driver_version[] = "0.1";
@@ -420,10 +421,46 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,

 static void dsa_switch_destroy(struct dsa_switch *ds)
 {
+   struct device_node *port_dn;
+   struct phy_device *phydev;
+   struct dsa_chip_data *cd = ds->pd;
+   int port;
+
 #ifdef CONFIG_NET_DSA_HWMON
if (ds->hwmon_dev)
hwmon_device_unregister(ds->hwmon_dev);
 #endif
+
+   /* Disable configuration of the CPU and DSA ports */
+   for (port = 0; port < DSA_MAX_PORTS; port++) {
+   if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
+   continue;
+
+   port_dn = cd->port_dn[port];
+   if (of_phy_is_fixed_link(port_dn)) {
+   phydev = of_phy_find_device(port_dn);
+   if (phydev) {
+   int addr = phydev->addr;
+
+   phy_device_free(phydev);
+   of_node_put(port_dn);
+   fixed_phy_del(addr);
+   }
+   }
+   }
+
+   /* Destroy network devices for physical switch ports. */
+   for (port = 0; port < DSA_MAX_PORTS; port++) {
+   if (!(ds->phys_port_mask & (1 << port)))
+   continue;
+
+   if (!ds->ports[port])
+   continue;
+
+   unregister_netdev(ds->ports[port]);
+   free_netdev(ds->ports[port]);
+   }
+
mdiobus_unregister(ds->slave_mii_bus);
mdiobus_free(ds->slave_mii_bus);
 }
-- 
1.9.1


[PATCH net-next] bridge: allow adding of fdb entries pointing to the bridge device

2015-10-06 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch enables adding fdb entries that point to the bridge device.
This can be used to propagate the MAC addresses of VLAN interfaces
configured on top of a VLAN filtering bridge.

Before:
$bridge fdb add 44:38:39:00:27:9f dev bridge
RTNETLINK answers: Invalid argument

After:
$bridge fdb add 44:38:39:00:27:9f dev bridge

Signed-off-by: Roopa Prabhu 
---
 net/bridge/br_fdb.c | 106 
 1 file changed, 83 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 7f7d551..5d0f6f9 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -608,13 +608,14 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
}
 }
 
-static int fdb_to_nud(const struct net_bridge_fdb_entry *fdb)
+static int fdb_to_nud(const struct net_bridge *br,
+ const struct net_bridge_fdb_entry *fdb)
 {
if (fdb->is_local)
return NUD_PERMANENT;
else if (fdb->is_static)
return NUD_NOARP;
-   else if (has_expired(fdb->dst->br, fdb))
+   else if (has_expired(br, fdb))
return NUD_STALE;
else
return NUD_REACHABLE;
@@ -640,7 +641,7 @@ static int fdb_fill_info(struct sk_buff *skb, const struct net_bridge *br,
ndm->ndm_flags   = fdb->added_by_external_learn ? NTF_EXT_LEARNED : 0;
ndm->ndm_type= 0;
ndm->ndm_ifindex = fdb->dst ? fdb->dst->dev->ifindex : br->dev->ifindex;
-   ndm->ndm_state   = fdb_to_nud(fdb);
+   ndm->ndm_state   = fdb_to_nud(br, fdb);
 
if (nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->addr))
goto nla_put_failure;
@@ -785,7 +786,7 @@ static int fdb_add_entry(struct net_bridge_port *source, const __u8 *addr,
}
}
 
-   if (fdb_to_nud(fdb) != state) {
+   if (fdb_to_nud(br, fdb) != state) {
if (state & NUD_PERMANENT) {
fdb->is_local = 1;
if (!fdb->is_static) {
@@ -848,6 +849,7 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
struct net_bridge_vlan_group *vg;
struct net_bridge_port *p;
struct net_bridge_vlan *v;
+   struct net_bridge *br = NULL;
int err = 0;
 
if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
@@ -860,14 +862,19 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
return -EINVAL;
}
 
-   p = br_port_get_rtnl(dev);
-   if (p == NULL) {
-   pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
-   dev->name);
-   return -EINVAL;
+   if (dev->priv_flags & IFF_EBRIDGE) {
+   br = netdev_priv(dev);
+   vg = br_vlan_group(br);
+   } else {
+   p = br_port_get_rtnl(dev);
+   if (!p) {
+   pr_info("bridge: RTM_NEWNEIGH %s not a bridge port\n",
+   dev->name);
+   return -EINVAL;
+   }
+   vg = nbp_vlan_group(p);
}
 
-   vg = nbp_vlan_group(p);
if (vid) {
v = br_vlan_find(vg, vid);
if (!v) {
@@ -877,9 +884,15 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
}
 
/* VID was specified, so use it. */
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, vid);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, vid);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags, vid);
} else {
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, 0);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, 0);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags, 0);
if (err || !vg || !vg->num_vlans)
goto out;
 
@@ -888,7 +901,11 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 * vlan on this port.
 */
list_for_each_entry(v, &vg->vlan_list, vlist) {
-   err = __br_fdb_add(ndm, p, addr, nlh_flags, v->vid);
+   if (dev->priv_flags & IFF_EBRIDGE)
+   err = br_fdb_insert(br, NULL, addr, v->vid);
+   else
+   err = __br_fdb_add(ndm, p, addr, nlh_flags,
+  v->vid);
if (err)
goto out;
}
@@ -898,6 +915,32 @@ out:
return err;
 }
 
+static int fdb_delete_by_addr(struct net_bridge *br, const u8 *addr,
+ u16 vid)
+{
+   struct hlist_head *head =

[PATCH] netfilter: fix bad checksum on IPv6 when NAT is performed

2015-10-06 Thread Maxime Bizon


With this setup:

* non IPv6 checksumming capable network hardware
* GRO off
* IPv6 SNAT

I get this when I receive a UDPv6 reply: ": hw csum failure"

Call trace:

* nf_ip6_checksum() calls __skb_checksum_complete()
* nf_nat_ipv6_csum_update() & nf_nat_ipv6_manip_pkt()
* __udp6_lib_rcv() => udp6_csum_init()
* __skb_checksum_validate_complete() "fastpath" fails because
  skb->csum is incorrect.
* udpv6_recvmsg() => skb_copy_and_csum_datagram_msg()

The last call computes a valid checksum despite CHECKSUM_COMPLETE and
triggers the warning.

When we perform NAT on IPv4, we also update the IPv4 checksum, so
there is no side effect on skb->csum (since the csum over a valid IPv4
header area is zero).

But IPv6 has no such header checksum, so when performing NAT we need to
update skb->csum.

Signed-off-by: Maxime Bizon 
---
 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
index 70fbaed..e9917d74 100644
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -81,6 +81,8 @@ static bool nf_nat_ipv6_manip_pkt(struct sk_buff *skb,
  enum nf_nat_manip_type maniptype)
 {
struct ipv6hdr *ipv6h;
+   const __be32 *to;
+   __be32 *from;
__be16 frag_off;
int hdroff;
u8 nexthdr;
@@ -100,11 +102,24 @@ static bool nf_nat_ipv6_manip_pkt(struct sk_buff *skb,
target, maniptype))
return false;
 manip_addr:
-   if (maniptype == NF_NAT_MANIP_SRC)
-   ipv6h->saddr = target->src.u3.in6;
-   else
-   ipv6h->daddr = target->dst.u3.in6;
+   if (maniptype == NF_NAT_MANIP_SRC) {
+   from = ipv6h->saddr.s6_addr32;
+   to = target->src.u3.in6.s6_addr32;
+   } else {
+   from = ipv6h->daddr.s6_addr32;
+   to = target->dst.u3.in6.s6_addr32;
+   }
+
+   if (skb->ip_summed == CHECKSUM_COMPLETE) {
+   __be32 diff[] = {
+   ~from[0], ~from[1], ~from[2], ~from[3],
+   to[0], to[1], to[2], to[3],
+   };
+
+   skb->csum = ~csum_partial(diff, sizeof(diff), ~skb->csum);
+   }
 
+   memcpy(from, to, sizeof (struct in6_addr));
return true;
 }
 
-- 
1.9.1




-- 
Maxime
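
As a rough illustration of the arithmetic behind this kind of fix: a one's-complement sum over a buffer can be updated incrementally when some words change, by adding the complement of each old word and then each new word. The user-space sketch below checks that against a full recomputation. All names (sum_words, fold16, replace_addr, incremental_matches_full) and the packet values are hypothetical; this is not the kernel's csum_partial(), and it does not model the sign conventions of skb->csum.

```c
#include <stddef.h>
#include <stdint.h>

/* Add n 16-bit words into a 32-bit accumulator. */
static uint32_t sum_words(const uint16_t *w, size_t n, uint32_t sum)
{
    size_t i;

    for (i = 0; i < n; i++)
        sum += w[i];
    return sum;
}

/* Fold the carries back in (one's-complement addition). */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Incremental update when 8 words (one IPv6 address) change from -> to. */
static uint16_t replace_addr(uint16_t old_sum, const uint16_t *from,
                             const uint16_t *to)
{
    uint32_t sum = old_sum;
    size_t i;

    for (i = 0; i < 8; i++) {
        sum += (uint16_t)~from[i];  /* take the old word out */
        sum += to[i];               /* put the new word in */
    }
    return fold16(sum);
}

/* Returns 1 if the incremental result equals a full recomputation. */
int incremental_matches_full(void)
{
    /* 8 address words followed by 4 payload words (made-up values) */
    uint16_t pkt[12] = {
        0x2001, 0x0db8, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0001,
        0x1234, 0xabcd, 0x0035, 0xc000,
    };
    const uint16_t to[8] = {
        0xfd00, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x00aa, 0x0002,
    };
    uint16_t from[8];
    uint16_t old_sum, incr;
    size_t i;

    old_sum = fold16(sum_words(pkt, 12, 0));
    for (i = 0; i < 8; i++)
        from[i] = pkt[i];

    incr = replace_addr(old_sum, from, to);

    /* Rewrite the address in place and recompute from scratch. */
    for (i = 0; i < 8; i++)
        pkt[i] = to[i];

    return incr == fold16(sum_words(pkt, 12, 0));
}
```

The point of the sketch is only that a sum covering the address words goes stale when those words are rewritten, unless something compensates for the change.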



Re: [PATCH net-next 10/15] ipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmit

2015-10-06 Thread Eric W. Biederman

Eric Dumazet  writes:

> On Tue, 2015-10-06 at 13:53 -0500, Eric W. Biederman wrote:
>> Compute net and store it in a variable in the functions
>> ip_build_and_send_pkt and ip_queue_xmit so that it does not need to be
>> recomputed next time it is needed.
>> 
>> Signed-off-by: "Eric W. Biederman" 
>> ---
>>  net/ipv4/ip_output.c | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index 10366ee03bec..a7012f2fa68a 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -139,6 +139,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
>>  {
>>  struct inet_sock *inet = inet_sk(sk);
>>  struct rtable *rt = skb_rtable(skb);
>> +struct net *net = sock_net(sk);
>>  struct iphdr *iph;
>>  
>>  /* Build the IP header. */
>> @@ -157,7 +158,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
>>  iph->id = 0;
>>  } else {
>>  iph->frag_off = 0;
>> -__ip_select_ident(sock_net(sk), iph, 1);
>> +__ip_select_ident(net, iph, 1);
>>  }
>>  
>
> Note that under normal SYNACK processing, we do not read sock_net(sk)
> here.
>
> This patch would slow down the SYNACK path under stress, unless compiler
> is smart enough to not care of what you wrote.
>
> Generally speaking, I do not see why storing 'struct net' pointer into a
> variable in the stack is very different from sk->sk_net access (sk being
> a register in most cases)
>
> Note that I am about to submit following patch, so that you understand
> the context : the listener socket is cold in cpu cache at the time we
> transmit a SYNACK. It is better to get net from the request_sock which
> is very hot at this point.

So what I am really reading it for is ip_local_out which I change to
take a struct net a few patches later in the series.  The patch that
changes everything is noticeably cleaner and easier to review with these
couple of patches pulling struct net into its own variable ahead of
time, and ip_build_and_send_pkt does call ip_local_out unconditionally.

I am in the process of figuring out how to compute net once in the
output path and just passing it through so I don't need to compute net
from dst->dev.  As when the dust settles I hope to allow for a dst->dev
in another network namespace.  So that routes with a destination device
in another network namespace will allow for something simpler and faster
than ipvlan that achieves a very similar effect.

In this case, to achieve the cache line friendliness you are looking
for, I believe we would need to pass net in from tcp_v4_send_synack and
its cousins in dccp.

skc_net does seem firmly in the first cache line of sockets so it does
look like any of the reads to inet_sock that we do perform would
hit the same cache line.

To recap.  I store net in a variable because I start using it
unconditionally a few patches later. The only way I can see to avoid
hitting the cold cache line is to pass net into ip_build_and_send_pkt.

Do you think passing net into ip_build_and_send_pkt is the sensible way
to address your performance concern?  Or do you have issues with my
passing of net through the output path?

> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 55ed3266b05f..93277bde8dd9 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3026,7 +3026,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
>   th->window = htons(min(req->rcv_wnd, 65535U));
>   tcp_options_write((__be32 *)(th + 1), NULL, &opts);
>   th->doff = (tcp_header_size >> 2);
> - TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_OUTSEGS);
> + TCP_INC_STATS_BH(sock_net(req_to_sk(req)), TCP_MIB_OUTSEGS);
>  
>  #ifdef CONFIG_TCP_MD5SIG
>   /* Okay, we have all we need - do the md5 hash if needed */
> @@ -3519,9 +3519,11 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
>  
>   tcp_rsk(req)->txhash = net_tx_rndhash();
>   res = af_ops->send_synack(sk, NULL, &fl, req, 0, NULL, true);
> - if (!res) {
> - TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_RETRANSSEGS);
> - NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
> + if (likely(!res)) {
> + struct net *net = sock_net(req_to_sk(req));
> +
> + TCP_INC_STATS_BH(net, TCP_MIB_RETRANSSEGS);
> + NET_INC_STATS_BH(net, LINUX_MIB_TCPSYNRETRANS);
>   }
>   return res;
>  }

Re: [PATCH net-next v2 1/2] mpls: multipath support

2015-10-06 Thread Eric W. Biederman

roopa  writes:

> On 10/6/15, 12:44 PM, Eric W. Biederman wrote:
>> Roopa Prabhu  writes:
>>
>>> From: Roopa Prabhu 
>>>
>>> This patch adds support for MPLS multipath routes.
>>>
>>> Includes following changes to support multipath:
>>> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
>>>
>>> - 'struct mpls_nh' represents a mpls nexthop label forwarding entry
>>>
>>> - moves mpls route and nexthop structures into internal.h
>>>
>>> - A mpls_route can point to multiple mpls_nh structs
>>>
>>> - the nexthops are maintained as a list
>> So I am not certain I like nexthops being a list.  In the practical case
>> introducing this list guarantees that everyone will see at least an
>> extra cache line miss in the forwarding path.
>>
>> In the more abstract sense a list is the wrong data structure.  If the
>> list is so short we can afford to walk it an array is a better data
>> structure.  If we need enough entries to make the memory consumption
>> of an array a concern we want some kind of hash table or tree data
>> structure, because a list will be too long in that case.
>>
>> So can we please not use a list?
> sure, I used arrays the first time. 
> http://marc.info/?l=linux-netdev&m=143932956719398&w=2
> And i am very much ok with an array.  I used list in v2 by following the ipv6 
> fib code following comments from v1.
>
>
> The only place the lookup is sensitive is in the nexthop selection in 
> datapath. And depending
> on how the selection algorithm works, i am not sure if using a hash table 
> will help there.
> I will look though.
>
> I did prefer an array and If you are ok with an array, I will respin.

Please.  And let's cut out any fields we are not using yet.  If nothing
else lean and mean keeps this code more understandable and reviewable as
at the end of the day there is less of it.

>> I expect we can simplify the data structures by noting that rt_via must
>> be an ethernet mac today so that 6 bytes are enough and 8 bytes gives us
>> a bit extra and aligns things nicely.
>>
>> Also I know it goes away in the next patch but a spinlock taken for
>> every transit through the forwarding path really bugs me.
> yes, agree. I picked that from ipv4 fib. since it goes away with Roberts 
> patch I did not spend any time on it.
>
> thanks for the review.

Eric
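
As a rough sketch of the array layout discussed above: the nexthops can live in a flexible array at the end of the route, so a lookup walks no list pointers. All names here (demo_nh, demo_route, demo_select_nh and so on) are hypothetical and are not the structures from the patch. Selecting by `hash % nh_count` is an assumption made for the example, since the thread leaves the selection algorithm open.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical nexthop entry; the fields are made up for illustration. */
struct demo_nh {
    uint32_t label;
    int ifindex;
};

/* Route with its nexthops stored inline, directly after the header. */
struct demo_route {
    unsigned int nh_count;
    struct demo_nh nh[];    /* flexible array member */
};

static struct demo_route *demo_route_alloc(unsigned int n)
{
    struct demo_route *rt;

    rt = calloc(1, sizeof(*rt) + n * sizeof(rt->nh[0]));
    if (rt)
        rt->nh_count = n;
    return rt;
}

/* Assumed selection policy: index the array by a flow hash. */
static const struct demo_nh *demo_select_nh(const struct demo_route *rt,
                                            uint32_t hash)
{
    return &rt->nh[hash % rt->nh_count];
}

/* Builds a 3-nexthop route and returns the label chosen for hash. */
int demo_selected_label(uint32_t hash)
{
    struct demo_route *rt = demo_route_alloc(3);
    int label;

    if (!rt)
        return -1;

    rt->nh[0].label = 100;
    rt->nh[1].label = 200;
    rt->nh[2].label = 300;

    label = (int)demo_select_nh(rt, hash)->label;
    free(rt);
    return label;
}
```

With this layout the route header and its first nexthops sit in adjacent memory, which is the property the cache-line argument above is after.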

[PATCH net] bpf: clear sender_cpu before xmit

2015-10-06 Thread Alexei Starovoitov

Similar to commit c29390c6dfee ("xps: must clear sender_cpu before forwarding")
the skb->sender_cpu needs to be cleared before xmit.

Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
Signed-off-by: Alexei Starovoitov 
---
That is similar to pending patches for xps:
http://patchwork.ozlabs.org/patch/526952/
and for act_mirred:
http://patchwork.ozlabs.org/patch/527066/

though Fixes tag is different, since bpf_clone_redirect() came in
after commit 2bd82484bb4c ("xps: fix xps for stacked devices")
---
 net/core/filter.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 87b78ef0c3d4..bb18c3680001 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1415,6 +1415,7 @@ static u64 bpf_clone_redirect(u64 r1, u64 ifindex, u64 flags, u64 r4, u64 r5)
return dev_forward_skb(dev, skb2);
 
skb2->dev = dev;
+   skb_sender_cpu_clear(skb2);
return dev_queue_xmit(skb2);
 }
 
-- 
1.7.9.5


Re: [PATCH net-next 10/15] ipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmit

2015-10-06 Thread Eric Dumazet

On Tue, 2015-10-06 at 22:26 -0500, Eric W. Biederman wrote:

> So what I am really reading it for is ip_local_out which I change to
> take a struct net a few patches later in the series.  The patch that
> changes everything is noticeably cleaner and easier to review with these
> couple of patches pulling struct net into its own variable ahead of
> time, and ip_build_and_send_pkt does call ip_local_out unconditionally.
> 
> I am in the process of figuring out how to compute net once in the
> output path and just passing it through so I don't need to compute net
> from dst->dev.  As when the dust settles I hope to allow for a dst->dev
> in another network namespace.  So that routes with a destination device
> in another network namespace will allow for something simpler and faster
> than ipvlan that achieves a very similar effect.
> 
> In this case, to achieve the cache line friendliness you are looking
> for, I believe we would need to pass net in from tcp_v4_send_synack and
> its cousins in dccp.

Yes, something that can be done later.

> 
> skc_net does seem firmly in the first cache line of sockets so it does
> look like any of the reads to inet_sock that we do perform would
> hit the same cache line.
> 
> To recap.  I store net in a variable because I start using it
> unconditionally a few patches later. The only way I can see to avoid
> hitting the cold cache line is to pass net into ip_build_and_send_pkt.
> 
> Do you think passing net into ip_build_and_send_pkt is the sensible way
> to address your performance concern?  Or do you have issues with my
> passing of net through the output path?

I have no issues, but was pointing out this particular path, that might
be optimized later, no worries.

Thanks.



Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Alexei Starovoitov


On 10/6/15 1:39 AM, Daniel Borkmann wrote:

[...] Also classic BPF would then need to test for it, since a socket filter
doesn't really know whether native eBPF is loaded there or a classic-to-eBPF
transformed one, and classic never makes use of this. Anyway, it could be done
by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF
verification phase, and test it inside sk_filter() if I see it correctly.


That could also be done in an unlikely() branch, to keep the cost to the
non-eBPF case near zero.


Yes, agreed. For the time being, the majority of users are coming from the
classic BPF side anyway and the unlikely() could still be changed later on
if it should not be the case anymore. The flag and bpf_func would share the
same cacheline as well.


was also thinking that we can do it only in paths that actually
have multiple protocol layers, since today bpf is mainly used with
tcpdump(raw_socket) and new af_packet fanout both have cb cleared
on RX, because it just came out of alloc_skb and no layers were called,
and on TX we can clear 20 bytes in dev_queue_xmit_nit().
af_unix/netlink also have clean skb. Need to analyze tun and sctp...
but it feels overly fragile to save a branch in sk_filter,
so planning to go with
if(unlikely(prog->cb_access)) memset in sk_filter().
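
A minimal user-space sketch of that plan, under stated assumptions: the struct layouts, demo_unlikely() and demo_filter_prepare() are hypothetical stand-ins, not the real bpf_prog, sk_buff or sk_filter(). The 20-byte length is borrowed from the TX-path remark above; whether the same length applies in sk_filter() is an assumption.

```c
#include <string.h>

/* Branch hint; falls back to a plain test on non-GCC-style compilers. */
#if defined(__GNUC__)
#define demo_unlikely(x) __builtin_expect(!!(x), 0)
#else
#define demo_unlikely(x) (x)
#endif

#define DEMO_CB_LEN 20  /* assumed number of cb[] bytes a program may read */

struct demo_prog {
    unsigned int cb_access:1;   /* would be set by the verifier */
};

struct demo_skb {
    char cb[48];
};

/* Clear cb[] only for programs that were flagged as reading it. */
void demo_filter_prepare(const struct demo_prog *prog, struct demo_skb *skb)
{
    if (demo_unlikely(prog->cb_access))
        memset(skb->cb, 0, DEMO_CB_LEN);
}

/* Returns cb[0] after the prepare step, starting from a stale value. */
int demo_cb_byte_after_filter(int cb_access)
{
    struct demo_prog prog = { .cb_access = cb_access ? 1u : 0u };
    struct demo_skb skb;

    memset(skb.cb, 0x5a, sizeof(skb.cb));   /* leftover data from another layer */
    demo_filter_prepare(&prog, &skb);
    return skb.cb[0];
}
```

Programs that never touch cb[] pay only for the predicted-not-taken branch, which is the trade-off discussed in the quoted text.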


Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Pravin Shelar

On Tue, Oct 6, 2015 at 11:45 AM, Jiri Benc  wrote:
> On Tue, 6 Oct 2015 11:28:08 -0700, Pravin Shelar wrote:
>> What do you have in mind? I do not see way to fix this issue in vport-*.c.
>
> It seems to me that e.g. the code you have in vxlan_egress_tun_info in
> drivers/net/vxlan.c can be put into vxlan_get_egress_tun_info in
> net/openvswitch/vport-vxlan.c. vport->dev is guaranteed to be vxlan,
> and the current code accesses netdev_priv(dev) as struct vxlan_dev
> anyway.
>
> This would of course not work if we created lwtunnel interface from the
> ovs user space. But that's not going to happen with kernel 4.3, we'll
> need a way to query datapath for features it supports for this to work

This issue exists for lwtunnel-based devices; for vport-based tunnels
there is no bug.

> - there's currently no useful way to determine whether the kernel
> supports metadata based vxlan or not. I'm working on a patch to query
> the datapath for the supported features but that's for net-next. Thus
> I think we're safe.
>
We should be able to use lwtunnel devices on 4.3. How about using
tunnel device parameters to detect lwtunnel support? For example, in the
case of a vxlan tunnel, the IFLA_VXLAN_COLLECT_METADATA flag can confirm
lwtunnel support.

Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Jiri Benc

On Tue, 6 Oct 2015 11:26:31 -0700, Pravin Shelar wrote:
> I do not see need to drop and reallocate dst in this operation. I just
> need to set source IP address and source and dst port in the metadata
> dst already set in skb.

If I'm looking at the code correctly, metadata_dst is stored in the
action and each skb gets only a reference to it. Modifying it would
modify the shared metadata_dst (see execute_set_action).

> This fill_metadata function is not called for every packet so
> ndo_start_xmit() still needs to do route lookup.

Yes. I meant that in those cases where the fill_metadata function was
called, we may skip the lookup. Just an optimization and not an
important one. I'm not even sure it can currently happen as the skb is
cloned for each action.

> egress_tun_info in dp_upcall_info is required to check for failure. It
> would be only set on successful fill_metadata operation.

Or you can just set dst to NULL on failure.

 Jiri

-- 
Jiri Benc

Re: [PATCH v2 4/5] net: phy: Broadcom Cygnus internal Etherent PHY driver

2015-10-06 Thread Arun Parameswaran



On 15-09-30 09:04 PM, kbuild test robot wrote:
> Hi Arun,
>
> [auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
> ignore]
>
> config: um-allyesconfig (attached as .config)
> reproduce:
> git checkout 0560b94805aa3bb38439b4f72b776d85d2aac394
> # save the attached .config to linux build tree
> make ARCH=um 
>
> All warnings (new ones prefixed by >>):
>
> warning: (BCM_CYGNUS_PHY) selects MDIO_BCM_IPROC which has unmet direct 
> dependencies (NETDEVICES && PHYLIB && (ARCH_BCM_IPROC || COMPILE_TEST) && 
> HAS_IOMEM && OF_MDIO)

I will fix this warning by changing 'select MDIO_BCM_IPROC' to
'depends on MDIO_BCM_IPROC' in the Cygnus PHY driver.

> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


[PATCH net-next 02/15] xfrm: Only compute net once in xfrm_policy_queue_process

2015-10-06 Thread Eric W. Biederman

Signed-off-by: "Eric W. Biederman" 
---
 net/xfrm/xfrm_policy.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 418daa038edf..be1776bc5673 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1887,6 +1887,7 @@ static void xfrm_policy_queue_process(unsigned long arg)
struct sock *sk;
struct dst_entry *dst;
struct xfrm_policy *pol = (struct xfrm_policy *)arg;
+   struct net *net = xp_net(pol);
struct xfrm_policy_queue *pq = &pol->polq;
struct flowi fl;
struct sk_buff_head list;
@@ -1903,8 +1904,7 @@ static void xfrm_policy_queue_process(unsigned long arg)
spin_unlock(&pq->hold_queue.lock);
 
dst_hold(dst->path);
-   dst = xfrm_lookup(xp_net(pol), dst->path, &fl,
- sk, 0);
+   dst = xfrm_lookup(net, dst->path, &fl, sk, 0);
if (IS_ERR(dst))
goto purge_queue;
 
@@ -1934,8 +1934,7 @@ static void xfrm_policy_queue_process(unsigned long arg)
 
xfrm_decode_session(skb, &fl, skb_dst(skb)->ops->family);
dst_hold(skb_dst(skb)->path);
-   dst = xfrm_lookup(xp_net(pol), skb_dst(skb)->path,
- &fl, skb->sk, 0);
+   dst = xfrm_lookup(net, skb_dst(skb)->path, &fl, skb->sk, 0);
if (IS_ERR(dst)) {
kfree_skb(skb);
continue;
-- 
2.2.1


Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Pravin Shelar

On Tue, Oct 6, 2015 at 12:03 PM, Jiri Benc  wrote:
> On Tue, 6 Oct 2015 11:55:35 -0700, Pravin Shelar wrote:
>> We should be able to use lwtunnel devices on 4.3. How about using
>> tunnel device parameters to detect lwtunnel support? For example, in the
>> case of a vxlan tunnel, the IFLA_VXLAN_COLLECT_METADATA flag can confirm
>> lwtunnel support.
>
> You would have to create the interface first using that flag and then
> check whether the created interface has the flag set or not. If not,
> delete the interface.
>
> Unfortunately, old kernels will just ignore the flag when creating the
> interface. That's why I wrote there's no useful way to check it.
>

It should report the flag when device parameters are requested by
userspace if there is support for lwtunnel. So checking for the flag
should be a good enough test for lwtunnel support. Why is that not a
reliable check?

[PATCH net] xps: clear skb sender_cpu before tee

2015-10-06 Thread dho

From: "Devon H. O'Dell" 

The netfilter tee code does not clear skb->sender_cpu after copying the
skb. When both CONFIG_XPS and CONFIG_NET_RX_BUSY_POLL are active, it is
possible for a tee rule to duplicate a skb from input, leaving its
napi queue id set. Because this field is shared in a union with
sender_cpu, we can get an invalid offset in __netdev_pick_tx when the
napi_id exceeds the number of logical CPUs in the system. This yields
the following panic:

BUG: unable to handle kernel paging request at 
IP: [] __netdev_pick_tx+0x6d/0x150
PGD 0
Oops:  [#1] SMP
Call Trace:
 
 [] ixgbe_select_queue+0xe2/0x190 [ixgbe]
 [] netdev_pick_tx+0x6b/0x100
 [] __dev_queue_xmit+0x84/0x540
 [] ? ipt_do_table+0x208/0x5f0
 [] dev_queue_xmit_sk+0x13/0x20
 [] macvlan_start_xmit+0xb1/0x150 [macvlan]
 [] dev_hard_start_xmit+0x22b/0x3d0
 [] ? validate_xmit_skb.isra.98.part.99+0x29/0x2c0
 [] __dev_queue_xmit+0x461/0x540
 [] dev_queue_xmit_sk+0x13/0x20
 [] ip_finish_output+0x258/0x8c0
 [] ip_output+0x6b/0xc0
 [] ? ip_finish_output2+0x370/0x370
 [] ip_local_out_sk+0x3a/0x50
 [] tee_tg4+0x186/0x208 [xt_TEE]
 [] ipt_do_table+0x2fb/0x5f0
 [] ? tcp_rcv_established+0x4b2/0x800
 [] ? ipt_do_table+0x208/0x5f0
 [] ? ixgbe_xmit_frame_ring+0x415/0xe20 [ixgbe]
 [] iptable_mangle_hook+0x4b/0x140
 [] nf_iterate+0x7f/0xb0
 [] nf_hook_slow+0xa4/0x110
 [] ip_rcv+0x2d1/0x3b0
 ...

Investigation shows that Eric Dumazet fixed a similar issue in
commit c29390c6dfeee094 ("xps: must clear sender_cpu before forwarding"),
which was introduced by his commit 2bd82484bb4c5db1 ("xps: fix xps for
stacked devices").

Thanks-to: Eric Hoffman 
Thanks-to: Grant Zhang 
Tested-by: Jonathan Steinert 
Signed-off-by: Devon H. O'Dell 

---
 net/ipv4/netfilter/nf_dup_ipv4.c | 2 ++
 net/ipv6/netfilter/nf_dup_ipv6.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
index 2d79e6e..2f2a79b 100644
--- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -81,6 +81,8 @@ void nf_dup_ipv4(struct sk_buff *skb, unsigned int hooknum,
if (skb == NULL)
return;
 
+   skb_sender_cpu_clear(skb);
+
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
/* Avoid counting cloned packets towards the original connection. */
nf_conntrack_put(skb->nfct);
diff --git a/net/ipv6/netfilter/nf_dup_ipv6.c b/net/ipv6/netfilter/nf_dup_ipv6.c
index c8ab626..03f0a15 100644
--- a/net/ipv6/netfilter/nf_dup_ipv6.c
+++ b/net/ipv6/netfilter/nf_dup_ipv6.c
@@ -70,6 +70,8 @@ void nf_dup_ipv6(struct sk_buff *skb, unsigned int hooknum,
if (skb == NULL)
return;
 
+   skb_sender_cpu_clear(skb);
+
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
nf_conntrack_put(skb->nfct);
skb->nfct = &nf_ct_untracked_get()->ct_general;
-- 
2.1.4
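
As a rough illustration of the failure mode described in the commit message, here is a simplified user-space model of two fields sharing storage. The struct, the four-entry table size and every demo_* name are hypothetical. The "cpu + 1" encoding and the table lookup are assumptions about how the transmit path uses sender_cpu, kept only to show why a stale napi_id is dangerous.

```c
#define DEMO_NR_CPUS 4  /* hypothetical number of logical CPUs */

struct demo_skb {
    union {
        unsigned int napi_id;       /* written on receive */
        unsigned int sender_cpu;    /* written on transmit, as cpu + 1 */
    } u;
};

/* Stand-in for the clearing helper added by the patch. */
static void demo_sender_cpu_clear(struct demo_skb *skb)
{
    skb->u.sender_cpu = 0;
}

/*
 * Returns the per-CPU table index the transmit path would use, or -1
 * when the index falls outside the table (the case that panics above).
 */
static int demo_pick_index(struct demo_skb *skb, unsigned int this_cpu)
{
    unsigned int idx;

    if (skb->u.sender_cpu == 0)
        skb->u.sender_cpu = this_cpu + 1;

    idx = skb->u.sender_cpu - 1;
    return idx < DEMO_NR_CPUS ? (int)idx : -1;
}

/* A duplicated skb that still carries the receive-side napi_id. */
int demo_index_without_clear(void)
{
    struct demo_skb skb;

    skb.u.napi_id = 300;
    return demo_pick_index(&skb, 2);
}

/* The same skb after the field is cleared before transmit. */
int demo_index_with_clear(void)
{
    struct demo_skb skb;

    skb.u.napi_id = 300;
    demo_sender_cpu_clear(&skb);
    return demo_pick_index(&skb, 2);
}
```

In the model the stale value is caught by a bounds check; the panic in the trace above suggests the real lookup had no such guard, which is why the field has to be cleared before the duplicate is transmitted.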


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Daniel Borkmann


On 10/06/2015 07:50 PM, Alexei Starovoitov wrote:

On 10/6/15 1:39 AM, Daniel Borkmann wrote:

[...] Also classic BPF would then need to test for it, since a socket filter
doesn't really know whether native eBPF is loaded there or a classic-to-eBPF
transformed one, and classic never makes use of this. Anyway, it could be done
by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF
verification phase, and test it inside sk_filter() if I see it correctly.


That could also be done in an unlikely() branch, to keep the cost to the
non-eBPF case near zero.


Yes, agreed. For the time being, the majority of users are coming from the
classic BPF side anyway and the unlikely() could still be changed later on
if it should not be the case anymore. The flag and bpf_func would share the
same cacheline as well.


was also thinking that we can do it only in paths that actually
have multiple protocol layers, since today bpf is mainly used with
tcpdump(raw_socket) and new af_packet fanout both have cb cleared
on RX, because it just came out of alloc_skb and no layers were called,
and on TX we can clear 20 bytes in dev_queue_xmit_nit().
af_unix/netlink also have clean skb. Need to analyze tun and sctp...
but it feels overly fragile to save a branch in sk_filter,
so planning to go with
if(unlikely(prog->cb_access)) memset in sk_filter().


I was also thinking that for dev_queue_xmit_nit(), since we do the skb_clone()
there, to have a clone version (w/o affecting performance of the current one)
that instead of copying cb[] over, it would just do a memset(). But that would
just be limited to AF_PACKET, and doesn't catch all sk_filter() users.

Thanks,
Daniel

[PATCH net-next 06/15] ipv4: Merge ip_local_out and ip_local_out_sk

2015-10-06 Thread Eric W. Biederman

It is confusing and silly to hide a parameter, so modify all of
the callers to pass in the appropriate socket, or skb->sk if
no socket is known.

Signed-off-by: "Eric W. Biederman" 
---
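
Note: a minimal user-space sketch of why the hidden parameter matters. All
mock_* names are hypothetical and only mimic the shape of the real functions;
the assumption illustrated is that a caller (for example a tunnel) may want to
send with a socket other than the one attached to the skb.

```c
#include <assert.h>
#include <stddef.h>

struct mock_sock { int id; };
struct mock_skb  { struct mock_sock *sk; };

/* Shape of the merged function: the socket is an explicit argument.
 * Returns the id of the socket that was used, or -1 if there is none. */
static int mock_local_out(struct mock_sock *sk, struct mock_skb *skb)
{
	assert(skb != NULL);
	return sk ? sk->id : -1;
}

/* Shape of the old inline helper: the socket is silently taken from
 * the skb, so the call site does not show which socket is used. */
static int mock_local_out_hidden(struct mock_skb *skb)
{
	return mock_local_out(skb->sk, skb);
}
```

With the explicit form, passing skb->sk is a visible choice at every call
site, and a caller that owns a different socket can pass that one instead.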
 drivers/net/ipvlan/ipvlan_core.c|  2 +-
 drivers/net/ppp/pptp.c  |  2 +-
 drivers/net/vrf.c   |  4 ++--
 include/net/ip.h|  6 +-
 net/ipv4/igmp.c |  4 ++--
 net/ipv4/ip_output.c| 10 +-
 net/ipv4/ip_tunnel_core.c   |  2 +-
 net/ipv4/netfilter/ipt_SYNPROXY.c   |  2 +-
 net/ipv4/netfilter/nf_dup_ipv4.c|  2 +-
 net/ipv4/netfilter/nf_reject_ipv4.c |  2 +-
 net/netfilter/ipvs/ip_vs_xmit.c |  2 +-
 11 files changed, 17 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 207f62e8de9a..c75ad39c752f 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -364,7 +364,7 @@ static int ipvlan_process_v4_outbound(struct sk_buff *skb)
}
skb_dst_drop(skb);
skb_dst_set(skb, >dst);
-   err = ip_local_out(skb);
+   err = ip_local_out(skb->sk, skb);
if (unlikely(net_xmit_eval(err)))
dev->stats.tx_errors++;
else
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 686f37daa262..6bef7be10671 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -282,7 +282,7 @@ static int pptp_xmit(struct ppp_channel *chan, struct sk_buff *skb)
ip_select_ident(sock_net(sk), skb, NULL);
ip_send_check(iph);
 
-   ip_local_out(skb);
+   ip_local_out(skb->sk, skb);
return 1;
 
 tx_error:
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 2a02cee0bf95..e3a89257e4b7 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -75,7 +75,7 @@ static struct dst_entry *vrf_ip_check(struct dst_entry *dst, u32 cookie)
 
 static int vrf_ip_local_out(struct sock *sk, struct sk_buff *skb)
 {
-   return ip_local_out_sk(sk, skb);
+   return ip_local_out(sk, skb);
 }
 
 static unsigned int vrf_v4_mtu(const struct dst_entry *dst)
@@ -221,7 +221,7 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
   RT_SCOPE_LINK);
}
 
-   ret = ip_local_out(skb);
+   ret = ip_local_out(skb->sk, skb);
if (unlikely(net_xmit_eval(ret)))
vrf_dev->stats.tx_errors++;
else
diff --git a/include/net/ip.h b/include/net/ip.h
index 46272e04f3b6..03e80f936847 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -113,11 +113,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
   int (*output)(struct net *, struct sock *, struct sk_buff *));
 void ip_send_check(struct iphdr *ip);
 int __ip_local_out(struct sock *sk, struct sk_buff *skb);
-int ip_local_out_sk(struct sock *sk, struct sk_buff *skb);
-static inline int ip_local_out(struct sk_buff *skb)
-{
-   return ip_local_out_sk(skb->sk, skb);
-}
+int ip_local_out(struct sock *sk, struct sk_buff *skb);
 
 int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl);
 void ip_init(void);
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index de6d4c8ba600..43375d9e02ab 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -397,7 +397,7 @@ static int igmpv3_sendpack(struct sk_buff *skb)
 
pig->csum = ip_compute_csum(igmp_hdr(skb), igmplen);
 
-   return ip_local_out(skb);
+   return ip_local_out(skb->sk, skb);
 }
 
 static int grec_size(struct ip_mc_list *pmc, int type, int gdel, int sdel)
@@ -739,7 +739,7 @@ static int igmp_send_report(struct in_device *in_dev, struct ip_mc_list *pmc,
ih->group = group;
ih->csum = ip_compute_csum((void *)ih, sizeof(struct igmphdr));
 
-   return ip_local_out(skb);
+   return ip_local_out(skb->sk, skb);
 }
 
 static void igmp_gq_timer_expire(unsigned long data)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 59cec0af3b2e..10366ee03bec 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -108,7 +108,7 @@ int __ip_local_out(struct sock *sk, struct sk_buff *skb)
   dst_output);
 }
 
-int ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
+int ip_local_out(struct sock *sk, struct sk_buff *skb)
 {
struct net *net = dev_net(skb_dst(skb)->dev);
int err;
@@ -119,7 +119,7 @@ int ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
 
return err;
 }
-EXPORT_SYMBOL_GPL(ip_local_out_sk);
+EXPORT_SYMBOL_GPL(ip_local_out);
 
 static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst)
 {
@@ -169,7 +169,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
skb->mark = sk->sk_mark;
 
/* Send it out. */
-   return ip_local_out(skb);
+   return ip_local_out(skb->sk, skb);
 }
 EXPORT_SYMBOL_GPL(ip_build_and_send_pkt);
 
@@ -456,7 +456,7 @@

[PATCH net-next 03/15] net: Pass net into dst_output and remove dst_output_okfn

2015-10-06 Thread Eric W. Biederman

Replace dst_output_okfn with dst_output

Signed-off-by: "Eric W. Biederman" 
---
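
Note: a small user-space sketch of why the matching signature removes the need
for an adapter. The mock_* names and okfn_t typedef are hypothetical;
mock_run_hook only imitates a dispatcher that expects a (net, sk, skb)
callback. In the patch itself the new net argument is not yet used by the
function body.

```c
#include <assert.h>

struct mock_net  { int id; };
struct mock_sock { int id; };
struct mock_skb  { int sent_in_net; };

/* Callback type expected by the hypothetical hook dispatcher. */
typedef int (*okfn_t)(struct mock_net *, struct mock_sock *,
		      struct mock_skb *);

/* Stand-in for a hook dispatcher: accept the packet and continue. */
static int mock_run_hook(struct mock_net *net, struct mock_sock *sk,
			 struct mock_skb *skb, okfn_t okfn)
{
	return okfn(net, sk, skb);
}

/* Output function that already has the callback signature, so it can be
 * handed to the dispatcher directly without a wrapper in between. */
static int mock_dst_output(struct mock_net *net, struct mock_sock *sk,
			   struct mock_skb *skb)
{
	(void)sk; /* unused in this sketch */
	skb->sent_in_net = net->id;
	return 0;
}
```

Before such a change, a two-argument output function would need a
three-argument wrapper solely to satisfy the callback type.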
 include/net/dst.h   | 6 +-
 net/decnet/dn_nsp_out.c | 4 ++--
 net/ipv4/ip_forward.c   | 2 +-
 net/ipv4/ip_output.c| 7 ---
 net/ipv4/ip_vti.c   | 2 +-
 net/ipv4/ipmr.c | 2 +-
 net/ipv4/raw.c  | 2 +-
 net/ipv4/xfrm4_output.c | 2 +-
 net/ipv6/ip6_output.c   | 4 ++--
 net/ipv6/ip6_vti.c  | 2 +-
 net/ipv6/ip6mr.c| 2 +-
 net/ipv6/mcast.c| 4 ++--
 net/ipv6/ndisc.c| 2 +-
 net/ipv6/output_core.c  | 5 +++--
 net/ipv6/raw.c  | 2 +-
 net/ipv6/xfrm6_output.c | 2 +-
 net/netfilter/ipvs/ip_vs_xmit.c | 4 ++--
 net/xfrm/xfrm_output.c  | 2 +-
 net/xfrm/xfrm_policy.c  | 2 +-
 19 files changed, 28 insertions(+), 30 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 779206c15f8b..fdd01fed1a7b 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -454,14 +454,10 @@ static inline void dst_set_expires(struct dst_entry *dst, int timeout)
 }
 
 /* Output packet to network from transport.  */
-static inline int dst_output(struct sock *sk, struct sk_buff *skb)
+static inline int dst_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
return skb_dst(skb)->output(sk, skb);
 }
-static inline int dst_output_okfn(struct net *net, struct sock *sk, struct sk_buff *skb)
-{
-   return dst_output(sk, skb);
-}
 
 /* Input packet from network to transport.  */
 static inline int dst_input(struct sk_buff *skb)
diff --git a/net/decnet/dn_nsp_out.c b/net/decnet/dn_nsp_out.c
index 4b02dd300f50..849805e7af52 100644
--- a/net/decnet/dn_nsp_out.c
+++ b/net/decnet/dn_nsp_out.c
@@ -85,7 +85,7 @@ static void dn_nsp_send(struct sk_buff *skb)
if (dst) {
 try_again:
skb_dst_set(skb, dst);
-   dst_output(skb->sk, skb);
+   dst_output(_net, skb->sk, skb);
return;
}
 
@@ -582,7 +582,7 @@ static __inline__ void dn_nsp_do_disc(struct sock *sk, unsigned char msgflg,
 * associations.
 */
skb_dst_set(skb, dst_clone(dst));
-   dst_output(skb->sk, skb);
+   dst_output(_net, skb->sk, skb);
 }
 
 
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index d66cfb35ba74..da0d7ce85844 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -72,7 +72,7 @@ static int ip_forward_finish(struct net *net, struct sock *sk, struct sk_buff *s
ip_forward_options(skb);
 
skb_sender_cpu_clear(skb);
-   return dst_output(sk, skb);
+   return dst_output(net, sk, skb);
 }
 
 int ip_forward(struct sk_buff *skb)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 6cb585a05dd1..d80e646bb175 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -105,7 +105,7 @@ static int __ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
ip_send_check(iph);
return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT,
   net, sk, skb, NULL, skb_dst(skb)->dev,
-  dst_output_okfn);
+  dst_output);
 }
 
 int __ip_local_out(struct sk_buff *skb)
@@ -115,11 +115,12 @@ int __ip_local_out(struct sk_buff *skb)
 
 int ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
 {
+   struct net *net = dev_net(skb_dst(skb)->dev);
int err;
 
err = __ip_local_out_sk(sk, skb);
if (likely(err == 1))
-   err = dst_output(sk, skb);
+   err = dst_output(net, sk, skb);
 
return err;
 }
@@ -276,7 +277,7 @@ static int ip_finish_output(struct net *net, struct sock *sk, struct sk_buff *sk
/* Policy lookup after SNAT yielded a new policy */
if (skb_dst(skb)->xfrm) {
IPCB(skb)->flags |= IPSKB_REROUTED;
-   return dst_output(sk, skb);
+   return dst_output(net, sk, skb);
}
 #endif
mtu = ip_skb_dst_mtu(skb);
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 3b87ec5178f9..4d8f0b698777 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -197,7 +197,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev,
skb_dst_set(skb, dst);
skb->dev = skb_dst(skb)->dev;
 
-   err = dst_output(skb->sk, skb);
+   err = dst_output(tunnel->net, skb->sk, skb);
if (net_xmit_eval(err) == 0)
err = skb->len;
iptunnel_xmit_stats(err, >stats, dev->tstats);
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index cfcb996ec51b..fc42525d8694 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1689,7 +1689,7 @@ static inline int ipmr_forward_finish(struct net *net, struct sock *sk,
if (unlikely(opt->optlen))
ip_forward_options(skb);
 
-   return dst_output(sk, skb);
+   return dst_output(net, sk, skb);
 }
 
 /*

[PATCH net-next 05/15] ipv4: Merge __ip_local_out and __ip_local_out_sk

2015-10-06 Thread Eric W. Biederman

Signed-off-by: "Eric W. Biederman" 
---
 include/net/ip.h| 3 +--
 net/ipv4/ip_output.c| 9 ++---
 net/ipv4/route.c| 2 +-
 net/ipv4/xfrm4_policy.c | 2 +-
 4 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index ea1f721f7224..46272e04f3b6 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -112,8 +112,7 @@ int ip_mc_output(struct sock *sk, struct sk_buff *skb);
 int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
   int (*output)(struct net *, struct sock *, struct sk_buff *));
 void ip_send_check(struct iphdr *ip);
-int __ip_local_out_sk(struct sock *sk, struct sk_buff *skb);
-int __ip_local_out(struct sk_buff *skb);
+int __ip_local_out(struct sock *sk, struct sk_buff *skb);
 int ip_local_out_sk(struct sock *sk, struct sk_buff *skb);
 static inline int ip_local_out(struct sk_buff *skb)
 {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 25c93af125e4..59cec0af3b2e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -96,7 +96,7 @@ void ip_send_check(struct iphdr *iph)
 }
 EXPORT_SYMBOL(ip_send_check);
 
-int __ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
+int __ip_local_out(struct sock *sk, struct sk_buff *skb)
 {
struct net *net = dev_net(skb_dst(skb)->dev);
struct iphdr *iph = ip_hdr(skb);
@@ -108,17 +108,12 @@ int __ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
   dst_output);
 }
 
-int __ip_local_out(struct sk_buff *skb)
-{
-   return __ip_local_out_sk(skb->sk, skb);
-}
-
 int ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
 {
struct net *net = dev_net(skb_dst(skb)->dev);
int err;
 
-   err = __ip_local_out_sk(sk, skb);
+   err = __ip_local_out(sk, skb);
if (likely(err == 1))
err = dst_output(net, sk, skb);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d1208806e2c6..54297d3a0559 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -165,7 +165,7 @@ static struct dst_ops ipv4_dst_ops = {
.link_failure = ipv4_link_failure,
.update_pmtu =  ip_rt_update_pmtu,
.redirect = ip_do_redirect,
-   .local_out =__ip_local_out_sk,
+   .local_out =__ip_local_out,
.neigh_lookup = ipv4_neigh_lookup,
 };
 
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index d46d99f9cabd..f2606b9056bb 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -243,7 +243,7 @@ static struct dst_ops xfrm4_dst_ops = {
.cow_metrics =  dst_cow_metrics_generic,
.destroy =  xfrm4_dst_destroy,
.ifdown =   xfrm4_dst_ifdown,
-   .local_out =__ip_local_out_sk,
+   .local_out =__ip_local_out,
.gc_thresh =32768,
 };
 
-- 
2.2.1


[PATCH net-next 08/15] ipv6: Merge ip6_local_out and ip6_local_out_sk

2015-10-06 Thread Eric W. Biederman

Stop hiding the sk parameter with an inline helper function and
make all of the callers pass it, so that it is clear what the
function is doing.

Signed-off-by: "Eric W. Biederman" 
---
 drivers/net/ipvlan/ipvlan_core.c| 2 +-
 include/net/ip6_tunnel.h| 2 +-
 include/net/ipv6.h  | 3 +--
 net/ipv6/ip6_output.c   | 2 +-
 net/ipv6/netfilter/ip6t_SYNPROXY.c  | 2 +-
 net/ipv6/netfilter/nf_dup_ipv6.c| 2 +-
 net/ipv6/netfilter/nf_reject_ipv6.c | 2 +-
 net/ipv6/output_core.c  | 8 +---
 net/netfilter/ipvs/ip_vs_xmit.c | 2 +-
 9 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index c75ad39c752f..75dcf36c0366 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -401,7 +401,7 @@ static int ipvlan_process_v6_outbound(struct sk_buff *skb)
}
skb_dst_drop(skb);
skb_dst_set(skb, dst);
-   err = ip6_local_out(skb);
+   err = ip6_local_out(skb->sk, skb);
if (unlikely(net_xmit_eval(err)))
dev->stats.tx_errors++;
else
diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index fa915fa0f703..8f18a8b126e9 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -87,7 +87,7 @@ static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb,
int pkt_len, err;
 
pkt_len = skb->len - skb_inner_network_offset(skb);
-   err = ip6_local_out_sk(sk, skb);
+   err = ip6_local_out(sk, skb);
 
if (net_xmit_eval(err) == 0) {
struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index be7e7689514b..30eb1821c184 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -866,8 +866,7 @@ int ip6_input(struct sk_buff *skb);
 int ip6_mc_input(struct sk_buff *skb);
 
 int __ip6_local_out(struct sock *sk, struct sk_buff *skb);
-int ip6_local_out_sk(struct sock *sk, struct sk_buff *skb);
-int ip6_local_out(struct sk_buff *skb);
+int ip6_local_out(struct sock *sk, struct sk_buff *skb);
 
 /*
  * Extension header (options) processing
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 0171e762e03c..31c686b7fcc0 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1692,7 +1692,7 @@ int ip6_send_skb(struct sk_buff *skb)
struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
int err;
 
-   err = ip6_local_out(skb);
+   err = ip6_local_out(skb->sk, skb);
if (err) {
if (err > 0)
err = net_xmit_errno(err);
diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index c2356602158a..c38c3411150b 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -76,7 +76,7 @@ synproxy_send_tcp(const struct synproxy_net *snet,
nf_conntrack_get(nfct);
}
 
-   ip6_local_out(nskb);
+   ip6_local_out(nskb->sk, nskb);
return;
 
 free_nskb:
diff --git a/net/ipv6/netfilter/nf_dup_ipv6.c b/net/ipv6/netfilter/nf_dup_ipv6.c
index ee0d9a5b16c3..64f3fe5e2719 100644
--- a/net/ipv6/netfilter/nf_dup_ipv6.c
+++ b/net/ipv6/netfilter/nf_dup_ipv6.c
@@ -68,7 +68,7 @@ void nf_dup_ipv6(struct net *net, struct sk_buff *skb, unsigned int hooknum,
}
if (nf_dup_ipv6_route(net, skb, gw, oif)) {
__this_cpu_write(nf_skb_duplicated, true);
-   ip6_local_out(skb);
+   ip6_local_out(skb->sk, skb);
__this_cpu_write(nf_skb_duplicated, false);
} else {
kfree_skb(skb);
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 94b4c6dfb400..a4f73e235ca5 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -206,7 +206,7 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
dev_queue_xmit(nskb);
} else
 #endif
-   ip6_local_out(nskb);
+   ip6_local_out(nskb->sk, nskb);
 }
 EXPORT_SYMBOL_GPL(nf_send_reset6);
 
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index f93ae1515387..12855811c6a0 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -155,7 +155,7 @@ int __ip6_local_out(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(__ip6_local_out);
 
-int ip6_local_out_sk(struct sock *sk, struct sk_buff *skb)
+int ip6_local_out(struct sock *sk, struct sk_buff *skb)
 {
struct net *net = dev_net(skb_dst(skb)->dev);
int err;
@@ -166,10 +166,4 @@ int ip6_local_out_sk(struct sock *sk, struct sk_buff *skb)
 
return err;
 }
-EXPORT_SYMBOL_GPL(ip6_local_out_sk);
-
-int ip6_local_out(struct sk_buff *skb)
-{
-   return ip6_local_out_sk(skb->sk, skb);
-}
 EXPORT_SYMBOL_GPL(ip6_local_out);
diff --git

[PATCH net-next 07/15] ipv6: Merge __ip6_local_out and __ip6_local_out_sk

2015-10-06 Thread Eric W. Biederman

Only __ip6_local_out_sk has callers, so rename __ip6_local_out_sk to
__ip6_local_out and remove the previous __ip6_local_out.

Signed-off-by: "Eric W. Biederman" 
---
 include/net/ipv6.h  | 3 +--
 net/ipv6/output_core.c  | 9 ++---
 net/ipv6/route.c| 2 +-
 net/ipv6/xfrm6_policy.c | 2 +-
 4 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 56920262dbe9..be7e7689514b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -865,8 +865,7 @@ int ip6_forward(struct sk_buff *skb);
 int ip6_input(struct sk_buff *skb);
 int ip6_mc_input(struct sk_buff *skb);
 
-int __ip6_local_out_sk(struct sock *sk, struct sk_buff *skb);
-int __ip6_local_out(struct sk_buff *skb);
+int __ip6_local_out(struct sock *sk, struct sk_buff *skb);
 int ip6_local_out_sk(struct sock *sk, struct sk_buff *skb);
 int ip6_local_out(struct sk_buff *skb);
 
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index e5affb5fe095..f93ae1515387 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -138,7 +138,7 @@ int ip6_dst_hoplimit(struct dst_entry *dst)
 EXPORT_SYMBOL(ip6_dst_hoplimit);
 #endif
 
-int __ip6_local_out_sk(struct sock *sk, struct sk_buff *skb)
+int __ip6_local_out(struct sock *sk, struct sk_buff *skb)
 {
struct net *net = dev_net(skb_dst(skb)->dev);
int len;
@@ -153,11 +153,6 @@ int __ip6_local_out_sk(struct sock *sk, struct sk_buff *skb)
   net, sk, skb, NULL, skb_dst(skb)->dev,
   dst_output);
 }
-
-int __ip6_local_out(struct sk_buff *skb)
-{
-   return __ip6_local_out_sk(skb->sk, skb);
-}
 EXPORT_SYMBOL_GPL(__ip6_local_out);
 
 int ip6_local_out_sk(struct sock *sk, struct sk_buff *skb)
@@ -165,7 +160,7 @@ int ip6_local_out_sk(struct sock *sk, struct sk_buff *skb)
struct net *net = dev_net(skb_dst(skb)->dev);
int err;
 
-   err = __ip6_local_out_sk(sk, skb);
+   err = __ip6_local_out(sk, skb);
if (likely(err == 1))
err = dst_output(net, sk, skb);
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b62a507cc1a5..d3d946773a3e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -226,7 +226,7 @@ static struct dst_ops ip6_dst_ops_template = {
.link_failure   =   ip6_link_failure,
.update_pmtu=   ip6_rt_update_pmtu,
.redirect   =   rt6_do_redirect,
-   .local_out  =   __ip6_local_out_sk,
+   .local_out  =   __ip6_local_out,
.neigh_lookup   =   ip6_neigh_lookup,
 };
 
diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index 861a1679f33f..69cee4e0d728 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -284,7 +284,7 @@ static struct dst_ops xfrm6_dst_ops = {
.cow_metrics =  dst_cow_metrics_generic,
.destroy =  xfrm6_dst_destroy,
.ifdown =   xfrm6_dst_ifdown,
-   .local_out =__ip6_local_out_sk,
+   .local_out =__ip6_local_out,
.gc_thresh =32768,
 };
 
-- 
2.2.1


[PATCH net-next 04/15] dst: Pass a sk into .local_out

2015-10-06 Thread Eric W. Biederman

For consistency with the other similar methods in the kernel pass a
struct sock into the dst_ops .local_out method.

Simplifying the socket passing case is needed as a prerequisite to passing a
struct net reference into .local_out.

Signed-off-by: "Eric W. Biederman" 
---
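
Note: the ops-table dispatch this patch touches can be sketched in user space
as below. The mock_* names are hypothetical and only mirror the shape of
dst_ops; returning 1 to mean "continue with output" follows the err == 1 check
visible elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>

struct mock_sock { int mark; };
struct mock_skb  { int mark; };

/* Miniature ops table: the method now receives the socket explicitly. */
struct mock_dst_ops {
	int (*local_out)(struct mock_sock *sk, struct mock_skb *skb);
};

/* One implementation of the method; copies the socket mark if present. */
static int mock_ip_local_out(struct mock_sock *sk, struct mock_skb *skb)
{
	assert(skb != NULL);
	skb->mark = sk ? sk->mark : 0;
	return 1; /* 1: caller should continue with the output step */
}

static const struct mock_dst_ops mock_ipv4_ops = {
	.local_out = mock_ip_local_out,
};

/* Generic caller: it does not know which protocol is behind the table. */
static int mock_dst_local_out(const struct mock_dst_ops *ops,
			      struct mock_sock *sk, struct mock_skb *skb)
{
	return ops->local_out(sk, skb);
}
```

Because the socket travels through the method signature, every implementation
behind the table sees the same socket the generic caller chose.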
 drivers/net/vrf.c   | 4 ++--
 include/net/dst_ops.h   | 2 +-
 include/net/ip.h| 1 +
 include/net/ipv6.h  | 1 +
 net/ipv4/ip_output.c| 2 +-
 net/ipv4/route.c| 2 +-
 net/ipv4/xfrm4_policy.c | 2 +-
 net/ipv6/output_core.c  | 2 +-
 net/ipv6/route.c| 2 +-
 net/ipv6/xfrm6_policy.c | 2 +-
 net/xfrm/xfrm_output.c  | 2 +-
 11 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 474396353e7f..2a02cee0bf95 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -73,9 +73,9 @@ static struct dst_entry *vrf_ip_check(struct dst_entry *dst, u32 cookie)
return dst;
 }
 
-static int vrf_ip_local_out(struct sk_buff *skb)
+static int vrf_ip_local_out(struct sock *sk, struct sk_buff *skb)
 {
-   return ip_local_out(skb);
+   return ip_local_out_sk(sk, skb);
 }
 
 static unsigned int vrf_v4_mtu(const struct dst_entry *dst)
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index d64253914a6a..3f26a6af444e 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -28,7 +28,7 @@ struct dst_ops {
   struct sk_buff *skb, u32 mtu);
void(*redirect)(struct dst_entry *dst, struct sock *sk,
struct sk_buff *skb);
-   int (*local_out)(struct sk_buff *skb);
+   int (*local_out)(struct sock *sk, struct sk_buff *skb);
struct neighbour *  (*neigh_lookup)(const struct dst_entry *dst,
struct sk_buff *skb,
const void *daddr);
diff --git a/include/net/ip.h b/include/net/ip.h
index dd06ab3669f9..ea1f721f7224 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -112,6 +112,7 @@ int ip_mc_output(struct sock *sk, struct sk_buff *skb);
 int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
   int (*output)(struct net *, struct sock *, struct sk_buff *));
 void ip_send_check(struct iphdr *ip);
+int __ip_local_out_sk(struct sock *sk, struct sk_buff *skb);
 int __ip_local_out(struct sk_buff *skb);
 int ip_local_out_sk(struct sock *sk, struct sk_buff *skb);
 static inline int ip_local_out(struct sk_buff *skb)
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 3dde042bcd3f..56920262dbe9 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -865,6 +865,7 @@ int ip6_forward(struct sk_buff *skb);
 int ip6_input(struct sk_buff *skb);
 int ip6_mc_input(struct sk_buff *skb);
 
+int __ip6_local_out_sk(struct sock *sk, struct sk_buff *skb);
 int __ip6_local_out(struct sk_buff *skb);
 int ip6_local_out_sk(struct sock *sk, struct sk_buff *skb);
 int ip6_local_out(struct sk_buff *skb);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d80e646bb175..25c93af125e4 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -96,7 +96,7 @@ void ip_send_check(struct iphdr *iph)
 }
 EXPORT_SYMBOL(ip_send_check);
 
-static int __ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
+int __ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
 {
struct net *net = dev_net(skb_dst(skb)->dev);
struct iphdr *iph = ip_hdr(skb);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 54297d3a0559..d1208806e2c6 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -165,7 +165,7 @@ static struct dst_ops ipv4_dst_ops = {
.link_failure = ipv4_link_failure,
.update_pmtu =  ip_rt_update_pmtu,
.redirect = ip_do_redirect,
-   .local_out =__ip_local_out,
+   .local_out =__ip_local_out_sk,
.neigh_lookup = ipv4_neigh_lookup,
 };
 
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index f2606b9056bb..d46d99f9cabd 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -243,7 +243,7 @@ static struct dst_ops xfrm4_dst_ops = {
.cow_metrics =  dst_cow_metrics_generic,
.destroy =  xfrm4_dst_destroy,
.ifdown =   xfrm4_dst_ifdown,
-   .local_out =__ip_local_out,
+   .local_out =__ip_local_out_sk,
.gc_thresh =32768,
 };
 
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index 4337147ee23d..e5affb5fe095 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -138,7 +138,7 @@ int ip6_dst_hoplimit(struct dst_entry *dst)
 EXPORT_SYMBOL(ip6_dst_hoplimit);
 #endif
 
-static int __ip6_local_out_sk(struct sock *sk, struct sk_buff *skb)
+int __ip6_local_out_sk(struct sock *sk, struct sk_buff

[PATCH net-next 10/15] ipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmit

2015-10-06 Thread Eric W. Biederman

Compute net and store it in a variable in the functions
ip_build_and_send_pkt and ip_queue_xmit so that it does not need to be
recomputed next time it is needed.

Signed-off-by: "Eric W. Biederman" 
---
 net/ipv4/ip_output.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 10366ee03bec..a7012f2fa68a 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -139,6 +139,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
 {
struct inet_sock *inet = inet_sk(sk);
struct rtable *rt = skb_rtable(skb);
+   struct net *net = sock_net(sk);
struct iphdr *iph;
 
/* Build the IP header. */
@@ -157,7 +158,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
iph->id = 0;
} else {
iph->frag_off = 0;
-   __ip_select_ident(sock_net(sk), iph, 1);
+   __ip_select_ident(net, iph, 1);
}
 
if (opt && opt->opt.optlen) {
@@ -382,6 +383,7 @@ static void ip_copy_addrs(struct iphdr *iph, const struct flowi4 *fl4)
 int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
 {
struct inet_sock *inet = inet_sk(sk);
+   struct net *net = sock_net(sk);
struct ip_options_rcu *inet_opt;
struct flowi4 *fl4;
struct rtable *rt;
@@ -412,7 +414,7 @@ int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
 * keep trying until route appears or the connection times
 * itself out.
 */
-   rt = ip_route_output_ports(sock_net(sk), fl4, sk,
+   rt = ip_route_output_ports(net, fl4, sk,
   daddr, inet->inet_saddr,
   inet->inet_dport,
   inet->inet_sport,
@@ -449,7 +451,7 @@ packet_routed:
ip_options_build(skb, _opt->opt, inet->inet_daddr, rt, 0);
}
 
-   ip_select_ident_segs(sock_net(sk), skb, sk,
+   ip_select_ident_segs(net, skb, sk,
 skb_shinfo(skb)->gso_segs ?: 1);
 
/* TODO : should we use skb->sk here instead of sk ? */
@@ -462,7 +464,7 @@ packet_routed:
 
 no_route:
rcu_read_unlock();
-   IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
+   IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES);
kfree_skb(skb);
return -EHOSTUNREACH;
 }
-- 
2.2.1


[PATCH] bridge/netfilter: avoid unused label warning

2015-10-06 Thread Arnd Bergmann

With the ARM mini2440_defconfig, the bridge netfilter code gets
built with both CONFIG_NF_DEFRAG_IPV4 and CONFIG_NF_DEFRAG_IPV6
disabled, which leads to a harmless gcc warning:

net/bridge/br_netfilter_hooks.c: In function 'br_nf_dev_queue_xmit':
net/bridge/br_netfilter_hooks.c:792:2: warning: label 'drop' defined but not used [-Wunused-label]

This gets rid of the warning by cleaning up the code to avoid
the respective #ifdefs causing this problem, and replacing them
with if(IS_ENABLED()) checks. I have verified that the resulting
object code is unchanged, and an additional advantage is that
we now get compile coverage of the unused functions in more
configurations.

Signed-off-by: Arnd Bergmann 
Fixes: dd302b59bde0 ("netfilter: bridge: don't leak skb in error paths")
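
Note: the difference between the two styles can be sketched in user space as
below. This is an illustration only: the MOCK_* switches are plain 0/1 macros
standing in for config options (the kernel's IS_ENABLED() is more involved),
and mock_xmit is a hypothetical function, not the bridge code.

```c
#include <assert.h>

/* Hypothetical build-time switches, fixed to 0 or 1 for this sketch. */
#define MOCK_DEFRAG_IPV4 0
#define MOCK_DEFRAG_IPV6 1

enum { MOCK_PROTO_IPV4 = 4, MOCK_PROTO_IPV6 = 6 };

/* Both branches are always parsed and type-checked. A branch guarded by a
 * constant 0 can be dropped by the optimizer, yet its "goto drop" still
 * counts as a use of the label, so no unused-label warning is expected
 * even when every guarded branch is disabled. */
static int mock_xmit(int proto, int len, int mtu)
{
	if (MOCK_DEFRAG_IPV4 && proto == MOCK_PROTO_IPV4) {
		if (len > mtu)
			goto drop;
		return 4; /* handled by the IPv4 path */
	}
	if (MOCK_DEFRAG_IPV6 && proto == MOCK_PROTO_IPV6) {
		if (len > mtu)
			goto drop;
		return 6; /* handled by the IPv6 path */
	}
	return 0; /* plain transmit, no fragmentation handling */
drop:
	return -1;
}
```

With #if/#endif around the same branches, a configuration that disables both
would leave the drop label without any reference, which is the situation the
warning above reports.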

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 370aa4d4cf4d..5c679ac5cdef 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -111,7 +111,6 @@ static inline __be16 pppoe_proto(const struct sk_buff *skb)
 /* largest possible L2 header, see br_nf_dev_queue_xmit() */
 #define NF_BRIDGE_MAX_MAC_HEADER_LENGTH (PPPOE_SES_HLEN + ETH_HLEN)
 
-#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) || IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 struct brnf_frag_data {
char mac[NF_BRIDGE_MAX_MAC_HEADER_LENGTH];
u8 encap_size;
@@ -121,7 +120,6 @@ struct brnf_frag_data {
 };
 
 static DEFINE_PER_CPU(struct brnf_frag_data, brnf_frag_data_storage);
-#endif
 
 static void nf_bridge_info_free(struct sk_buff *skb)
 {
@@ -666,7 +664,6 @@ static unsigned int br_nf_forward_arp(void *priv,
return NF_STOLEN;
 }
 
-#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) || IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 static int br_nf_push_frag_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
struct brnf_frag_data *data;
@@ -691,9 +688,7 @@ static int br_nf_push_frag_xmit(struct net *net, struct sock *sk, struct sk_buff
nf_bridge_info_free(skb);
return br_dev_queue_push_xmit(net, sk, skb);
 }
-#endif
 
-#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
 static int
 br_nf_ip_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
  int (*output)(struct net *, struct sock *, struct sk_buff *))
@@ -711,7 +706,6 @@ br_nf_ip_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 
return ip_do_fragment(net, sk, skb, output);
 }
-#endif
 
 static unsigned int nf_bridge_mtu_reduction(const struct sk_buff *skb)
 {
@@ -734,11 +728,11 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
 
nf_bridge = nf_bridge_info_get(skb);
 
-#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
/* This is wrong! We should preserve the original fragment
 * boundaries by preserving frag_list rather than refragmenting.
 */
-   if (skb->protocol == htons(ETH_P_IP)) {
+   if (IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) &&
+   skb->protocol == htons(ETH_P_IP)) {
struct brnf_frag_data *data;
 
if (br_validate_ipv4(net, skb))
@@ -760,9 +754,8 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
 
return br_nf_ip_fragment(net, sk, skb, br_nf_push_frag_xmit);
}
-#endif
-#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
-   if (skb->protocol == htons(ETH_P_IPV6)) {
+   if (IS_ENABLED(CONFIG_NF_DEFRAG_IPV6) &&
+   skb->protocol == htons(ETH_P_IPV6)) {
const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
struct brnf_frag_data *data;
 
@@ -786,7 +779,6 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
kfree_skb(skb);
return -EMSGSIZE;
}
-#endif
nf_bridge_info_free(skb);
return br_dev_queue_push_xmit(net, sk, skb);
  drop:


[PATCH v3 3/5] net: phy: Add Broadcom phy library for common interfaces

2015-10-06 Thread Arun Parameswaran

This patch adds the Broadcom PHY library to consolidate common
interfaces shared by Broadcom PHYs.

Move the common interfaces to 'bcm-phy-lib.c' and update
the Broadcom PHY drivers to use the new APIs.

Signed-off-by: Arun Parameswaran 
---
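
Note: the select/data access pattern used by the expansion register helpers in
this patch can be sketched against a mock device as below. Everything here is
an assumption for illustration: the mock_* names, the register numbers, and
the 16-entry register bank are made up and do not describe real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_EXP_SEL  0x17 /* hypothetical selector register */
#define MOCK_EXP_DATA 0x15 /* hypothetical data window register */
#define MOCK_EXP_REGS 16

/* Mock PHY: a selector plus a small bank of expansion registers. */
struct mock_phy {
	uint16_t sel;
	uint16_t exp[MOCK_EXP_REGS];
};

static int mock_phy_write(struct mock_phy *phy, int reg, uint16_t val)
{
	if (reg == MOCK_EXP_SEL) {
		phy->sel = val;
		return 0;
	}
	if (reg == MOCK_EXP_DATA && phy->sel < MOCK_EXP_REGS) {
		phy->exp[phy->sel] = val;
		return 0;
	}
	return -1;
}

static int mock_phy_read(struct mock_phy *phy, int reg)
{
	if (reg == MOCK_EXP_DATA && phy->sel < MOCK_EXP_REGS)
		return phy->exp[phy->sel];
	return -1;
}

/* Write: select the expansion register, then write through the window. */
static int mock_write_exp(struct mock_phy *phy, uint16_t reg, uint16_t val)
{
	int rc = mock_phy_write(phy, MOCK_EXP_SEL, reg);

	if (rc < 0)
		return rc;
	return mock_phy_write(phy, MOCK_EXP_DATA, val);
}

/* Read: select, read through the window, then restore the selector. */
static int mock_read_exp(struct mock_phy *phy, uint16_t reg)
{
	int val;

	assert(phy != NULL);
	val = mock_phy_write(phy, MOCK_EXP_SEL, reg);
	if (val < 0)
		return val;
	val = mock_phy_read(phy, MOCK_EXP_DATA);
	/* Restore the default selector; a failure here is ignored. */
	mock_phy_write(phy, MOCK_EXP_SEL, 0);
	return val;
}
```

Keeping this two-step sequence in one shared helper means each driver gets the
same ordering and the same selector restore after a read.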
 drivers/net/phy/Kconfig   |   6 ++
 drivers/net/phy/Makefile  |   1 +
 drivers/net/phy/bcm-phy-lib.c | 209 ++
 drivers/net/phy/bcm-phy-lib.h |  37 
 drivers/net/phy/bcm63xx.c |  38 +---
 drivers/net/phy/bcm7xxx.c | 127 ++---
 drivers/net/phy/broadcom.c| 149 +-
 include/linux/brcmphy.h   |  22 +
 8 files changed, 333 insertions(+), 256 deletions(-)
 create mode 100644 drivers/net/phy/bcm-phy-lib.c
 create mode 100644 drivers/net/phy/bcm-phy-lib.h

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index b57f6c2..606fdc9 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -69,8 +69,12 @@ config SMSC_PHY
---help---
  Currently supports the LAN83C185, LAN8187 and LAN8700 PHYs
 
+config BCM_NET_PHYLIB
+   tristate
+
 config BROADCOM_PHY
tristate "Drivers for Broadcom PHYs"
+   select BCM_NET_PHYLIB
---help---
  Currently supports the BCM5411, BCM5421, BCM5461, BCM54616S, BCM5464,
  BCM5481 and BCM5482 PHYs.
@@ -78,11 +82,13 @@ config BROADCOM_PHY
 config BCM63XX_PHY
tristate "Drivers for Broadcom 63xx SOCs internal PHY"
depends on BCM63XX
+   select BCM_NET_PHYLIB
---help---
  Currently supports the 6348 and 6358 PHYs.
 
 config BCM7XXX_PHY
tristate "Drivers for Broadcom 7xxx SOCs internal PHYs"
+   select BCM_NET_PHYLIB
---help---
  Currently supports the BCM7366, BCM7439, BCM7445, and
  40nm and 65nm generation of BCM7xxx Set Top Box SoCs.
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index f4e6eb9..6932475 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_QSEMI_PHY)   += qsemi.o
 obj-$(CONFIG_SMSC_PHY) += smsc.o
 obj-$(CONFIG_TERANETICS_PHY)   += teranetics.o
 obj-$(CONFIG_VITESSE_PHY)  += vitesse.o
+obj-$(CONFIG_BCM_NET_PHYLIB)   += bcm-phy-lib.o
 obj-$(CONFIG_BROADCOM_PHY) += broadcom.o
 obj-$(CONFIG_BCM63XX_PHY)  += bcm63xx.o
 obj-$(CONFIG_BCM7XXX_PHY)  += bcm7xxx.o
diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
new file mode 100644
index 000..13e161e
--- /dev/null
+++ b/drivers/net/phy/bcm-phy-lib.c
@@ -0,0 +1,209 @@
+/*
+ * Copyright (C) 2015 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include "bcm-phy-lib.h"
+#include 
+#include 
+#include 
+#include 
+
+#define MII_BCM_CHANNEL_WIDTH 0x2000
+#define BCM_CL45VEN_EEE_ADV   0x3c
+
+int bcm_phy_write_exp(struct phy_device *phydev, u16 reg, u16 val)
+{
+   int rc;
+
+   rc = phy_write(phydev, MII_BCM54XX_EXP_SEL, reg);
+   if (rc < 0)
+   return rc;
+
+   return phy_write(phydev, MII_BCM54XX_EXP_DATA, val);
+}
+EXPORT_SYMBOL_GPL(bcm_phy_write_exp);
+
+int bcm_phy_read_exp(struct phy_device *phydev, u16 reg)
+{
+   int val;
+
+   val = phy_write(phydev, MII_BCM54XX_EXP_SEL, reg);
+   if (val < 0)
+   return val;
+
+   val = phy_read(phydev, MII_BCM54XX_EXP_DATA);
+
+   /* Restore default value.  It's O.K. if this write fails. */
+   phy_write(phydev, MII_BCM54XX_EXP_SEL, 0);
+
+   return val;
+}
+EXPORT_SYMBOL_GPL(bcm_phy_read_exp);
+
+int bcm_phy_write_misc(struct phy_device *phydev,
+  u16 reg, u16 chl, u16 val)
+{
+   int rc;
+   int tmp;
+
+   rc = phy_write(phydev, MII_BCM54XX_AUX_CTL,
+  MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
+   if (rc < 0)
+   return rc;
+
+   tmp = phy_read(phydev, MII_BCM54XX_AUX_CTL);
+   tmp |= MII_BCM54XX_AUXCTL_ACTL_SMDSP_ENA;
+   rc = phy_write(phydev, MII_BCM54XX_AUX_CTL, tmp);
+   if (rc < 0)
+   return rc;
+
+   tmp = (chl * MII_BCM_CHANNEL_WIDTH) | reg;
+   rc = bcm_phy_write_exp(phydev, tmp, val);
+
+   return rc;
+}
+EXPORT_SYMBOL_GPL(bcm_phy_write_misc);
+
+int bcm_phy_read_misc(struct phy_device *phydev,
+ u16 reg, u16 chl)
+{
+   int rc;
+   int tmp;
+
+   rc = phy_write(phydev, MII_BCM54XX_AUX_CTL,
+  MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
+   if (rc < 0)
+   return
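
For readers following the misc-register helpers above: the expansion register address is formed by scaling the channel number by the channel width and OR-ing in the register offset. Below is a minimal standalone sketch of that arithmetic. The names ex_misc_exp_addr and EX_CHANNEL_WIDTH are hypothetical; only the 0x2000 width and the (chl * width) | reg expression come from the patch.

```c
#include <stdint.h>

/* Same value as MII_BCM_CHANNEL_WIDTH in the patch; the name is hypothetical. */
#define EX_CHANNEL_WIDTH 0x2000

/*
 * Compute the 16-bit expansion register address for a misc register.
 * Mirrors "tmp = (chl * MII_BCM_CHANNEL_WIDTH) | reg" from the patch,
 * with the truncation to u16 made explicit.
 */
uint16_t ex_misc_exp_addr(uint16_t reg, uint16_t chl)
{
	return (uint16_t)((chl * EX_CHANNEL_WIDTH) | reg);
}
```

With a width of 0x2000, channel numbers of 8 and above no longer fit in 16 bits, so a caller would presumably keep chl in the range 0 to 7.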

Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Alexei Starovoitov


On 10/6/15 10:56 AM, Eric Dumazet wrote:

On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:


was also thinking that we can do it only in paths that actually
have multiple protocol layers, since today bpf is mainly used with
tcpdump(raw_socket) and new af_packet fanout both have cb cleared
on RX, because it just came out of alloc_skb and no layers were called,
and on TX we can clear 20 bytes in dev_queue_xmit_nit().
af_unix/netlink also have clean skb. Need to analyze tun and sctp...
but it feels overly fragile to save a branch in sk_filter,
so planning to go with
if(unlikely(prog->cb_access)) memset in sk_filter().



This will break TCP use of sk_filter().
skb->cb[] contains useful data in TCP layer.


Oops, thanks for catching that. In the case of sk_filter on top of a TCP
socket, it shouldn't be looking at cb at all.
I'm thinking of sending a patch to get rid of cb access for socket filters
altogether until a better solution is found.


[PATCH 1/1] net: ipv4: tcp.c Fixed an assignment coding style issue

2015-10-06 Thread Yuvaraja Mariappan

From beecc68c2ec1fecda34d26fcbc3c821e617b7bcb Mon Sep 17 00:00:00 2001
From: Yuvaraja Mariappan 
Date: Tue, 6 Oct 2015 10:53:29 -0700
Subject: [PATCH 1/1] net: ipv4: tcp.c Fixed an assignment coding style issue

Move assignments out of if conditions to fix a coding style issue.

Signed-off-by: Yuvaraja Mariappan 
---
 net/ipv4/tcp.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b8b8fa1..7d619d3 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -900,7 +900,8 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct 
page *page, int offset,
 */
if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
!tcp_passive_fastopen(sk)) {
-   if ((err = sk_stream_wait_connect(sk, )) != 0)
+   err = sk_stream_wait_connect(sk, );
+   if (err != 0)
goto out_err;
}
 
@@ -967,7 +968,8 @@ new_segment:
 
copied += copy;
offset += copy;
-   if (!(size -= copy)) {
+   size -= copy;
+   if (!size) {
tcp_tx_timestamp(sk, skb);
goto out;
}
@@ -988,7 +990,8 @@ wait_for_memory:
tcp_push(sk, flags & ~MSG_MORE, mss_now,
 TCP_NAGLE_PUSH, size_goal);
 
-   if ((err = sk_stream_wait_memory(sk, )) != 0)
+   err = sk_stream_wait_memory(sk, );
+   if (err != 0)
goto do_error;
 
mss_now = tcp_send_mss(sk, _goal, flags);
@@ -,7 +1114,8 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, 
size_t size)
 */
if (((1 << sk->sk_state) & ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) &&
!tcp_passive_fastopen(sk)) {
-   if ((err = sk_stream_wait_connect(sk, )) != 0)
+   err = sk_stream_wait_connect(sk, );
+   if (err != 0)
goto do_error;
}
 
@@ -1267,7 +1271,8 @@ wait_for_memory:
tcp_push(sk, flags & ~MSG_MORE, mss_now,
 TCP_NAGLE_PUSH, size_goal);
 
-   if ((err = sk_stream_wait_memory(sk, )) != 0)
+   err = sk_stream_wait_memory(sk, );
+   if (err != 0)
goto do_error;
 
mss_now = tcp_send_mss(sk, _goal, flags);
@@ -1767,7 +1772,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int nonblock,
 
/* __ Restore normal policy in scheduler __ */
 
-   if ((chunk = len - tp->ucopy.len) != 0) {
+   chunk = len - tp->ucopy.len;
+   if (chunk != 0) {
NET_ADD_STATS_USER(sock_net(sk), 
LINUX_MIB_TCPDIRECTCOPYFROMBACKLOG, chunk);
len -= chunk;
copied += chunk;
@@ -1778,7 +1784,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int nonblock,
 do_prequeue:
tcp_prequeue_process(sk);
 
-   if ((chunk = len - tp->ucopy.len) != 0) {
+   chunk = len - tp->ucopy.len;
+   if (chunk != 0) {
NET_ADD_STATS_USER(sock_net(sk), 
LINUX_MIB_TCPDIRECTCOPYFROMPREQUEUE, chunk);
len -= chunk;
copied += chunk;
@@ -2230,7 +2237,8 @@ int tcp_disconnect(struct sock *sk, int flags)
sk->sk_shutdown = 0;
sock_reset_flag(sk, SOCK_DONE);
tp->srtt_us = 0;
-   if ((tp->write_seq += tp->max_window + 2) == 0)
+   tp->write_seq += tp->max_window + 2;
+   if (tp->write_seq == 0)
tp->write_seq = 1;
icsk->icsk_backoff = 0;
tp->snd_cwnd = 2;
-- 
1.9.1
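
The transformation in this patch is meant to be purely stylistic, so the old and new forms should behave identically. Here is a small standalone sketch of the tcp_disconnect() hunk in both styles, assuming a 32-bit unsigned sequence number. The function names are hypothetical and this is not kernel code.

```c
#include <stdint.h>

/* Old style: assignment embedded in the if condition. */
uint32_t ex_advance_seq_old(uint32_t write_seq, uint32_t max_window)
{
	if ((write_seq += max_window + 2) == 0)
		write_seq = 1;
	return write_seq;
}

/* New style: assign first, then test the result. */
uint32_t ex_advance_seq_new(uint32_t write_seq, uint32_t max_window)
{
	write_seq += max_window + 2;
	if (write_seq == 0)
		write_seq = 1;
	return write_seq;
}
```

Both versions skip the value 0 when the unsigned addition wraps around, which is the only case where the branch is taken.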


Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Pravin Shelar

On Tue, Oct 6, 2015 at 8:42 AM, Jiri Benc  wrote:
> Some more feedback after doing a deeper review.
>
Thanks for the review.

> On Mon,  5 Oct 2015 10:58:17 -0700, Pravin B Shelar wrote:
>> --- a/drivers/net/geneve.c
>> +++ b/drivers/net/geneve.c
>> @@ -703,6 +703,32 @@ err:
>>   return NETDEV_TX_OK;
>>  }
>>
>> +static int geneve_egress_tun_info(struct net_device *dev, struct sk_buff 
>> *skb,
>> +   struct ip_tunnel_info *egress_tun_info,
>> +   const void **egress_tun_opts)
>> +{
>> + struct geneve_dev *geneve = netdev_priv(dev);
>> + struct ip_tunnel_info *info;
>> + struct rtable *rt;
>> + struct flowi4 fl4;
>> + __be16 sport;
>> +
>> + info = skb_tunnel_info(skb);
>> + if (ip_tunnel_info_af(info) != AF_INET)
>> + return -EINVAL;
>> +
>> + rt = geneve_get_rt(skb, dev, , info);
>
> This will increase dev tx error stats in case the lookup fails which is
> probably something we don't want.
>
right, I will fix it.
> [...]
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -60,6 +60,7 @@ struct wireless_dev;
>>  /* 802.15.4 specific */
>>  struct wpan_dev;
>>  struct mpls_dev;
>> +struct ip_tunnel_info;
>>
>>  void netdev_set_default_ethtool_ops(struct net_device *dev,
>>   const struct ethtool_ops *ops);
>> @@ -1054,6 +1055,11 @@ typedef u16 (*select_queue_fallback_t)(struct 
>> net_device *dev,
>>   *   This function is used to pass protocol port error state information
>>   *   to the switch driver. The switch driver can react to the proto_down
>>   *  by doing a phys down on the associated switch port.
>> + * int (*ndo_get_egress_info)(struct net_device *dev, struct sk_buff *skb,
>> + * __be32 *saddr, __be16 *sport, __be16 *dport);
>> + *   This function is used to get egress tunnel information for given skb.
>> + *   This is useful for retrieving outer tunnel header parameters while
>> + *   sampling packet.
>>   *
>>   */
>>  struct net_device_ops {
>> @@ -1227,6 +1233,10 @@ struct net_device_ops {
>>   int (*ndo_get_iflink)(const struct net_device 
>> *dev);
>>   int (*ndo_change_proto_down)(struct net_device 
>> *dev,
>>bool proto_down);
>> + int (*ndo_get_egress_info)(struct net_device *dev,
>> +struct sk_buff *skb,
>> +struct ip_tunnel_info 
>> *egress_tun_info,
>> +const void 
>> **egress_tun_opts);
>
> This should have at least a better name to reflect it is about IP
> tunnels.
>
> But I don't like having an IP tunnel specific ndo, that doesn't sound
> right. The real thing that is wanted here is to complete the dst
> metadata. What about:
>
> int (*ndo_fill_metadata_dst)(struct net_device *dev, struct sk_buff *skb);
>
This is nicer, I will change it.

> The function will use skb_tunnel_info to get the template info, then
> skb_dst_drop and allocate and attach a fully populated metadata_dst.
> The egress_tun_info in struct dp_upcall_info then can be completely
> dropped, as all the necessary tunnel information will be available
> through skb_tunnel_info(skb). Also, when implemented correctly, such
> skb will be just sent out without route lookups etc. if afterwards
> handed to ndo_start_xmit.
>
I do not see need to drop and reallocate dst in this operation. I just
need to set source IP address and source and dst port in the metadata
dst already set in skb.
This fill_metadata function is not called for every packet so
ndo_start_xmit() still needs to do route lookup.
egress_tun_info in dp_upcall_info is required to check for failure. It
would be only set on successful fill_metadata operation.

> [...]
>> --- a/include/net/ip_tunnels.h
>> +++ b/include/net/ip_tunnels.h
>> @@ -337,6 +337,11 @@ void __init ip_tunnel_core_init(void);
>>  void ip_tunnel_need_metadata(void);
>>  void ip_tunnel_unneed_metadata(void);
>>
>> +void ipv4_egress_info_init(struct ip_tunnel_info *egress_tun_info,
>> +const void **egress_tun_opts,
>> +struct ip_tunnel_info *info, __be32 saddr,
>> +__be16 sport, __be16 dport);
>
> Please use the ip_tunnel prefix as the rest of the functions, this is
> not ipv4 egress info but ip *tunnel* egress info.
>
> Also, it's not clear what the difference between "egress_tun_info" and
> "info" is. I'd suggest to use "dst_info" and "src_info" or something
> similar.
>
The src and dst prefixes are confusing here, since this has nothing to do
with tunnel endpoints. Anyway, it's a moot point: after the changes above
there is no need to have these functions.

> [...]
>> --- a/net/ipv4/ip_tunnel_core.c
>> +++ b/net/ipv4/ip_tunnel_core.c
>> @@ -424,3 +424,40 @@ void

Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Jiri Benc

On Tue, 6 Oct 2015 11:55:35 -0700, Pravin Shelar wrote:
> We should be able to use lwtunnel devices on 4.3. How about using
> tunnel device parameters to detect lwtunnel support. For example in
> case of vxlan tunnel IFLA_VXLAN_COLLECT_METADATA flags can confirm
> lwtunnel support.

You would have to create the interface first using that flag and then
check whether the created interface has the flag set or not. If not,
delete the interface.

Unfortunately, old kernels will just ignore the flag when creating the
interface. That's why I wrote there's no useful way to check it.

 Jiri

-- 
Jiri Benc

[PATCH] af_unix: constify the sock parameter in unix_sk()

2015-10-06 Thread Paul Moore

Make unix_sk() just like inet[6]_sk() by constify'ing the sock
parameter.

Signed-off-by: Paul Moore 
---
 include/net/af_unix.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index cb1b9bb..b36d837 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -64,7 +64,7 @@ struct unix_sock {
struct socket_wq peer_wq;
 };
 
-static inline struct unix_sock *unix_sk(struct sock *sk)
+static inline struct unix_sock *unix_sk(const struct sock *sk)
 {
return (struct unix_sock *)sk;
 }


Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Pravin Shelar

On Tue, Oct 6, 2015 at 11:55 AM, Jiri Benc  wrote:
> On Tue, 6 Oct 2015 11:26:31 -0700, Pravin Shelar wrote:
>> I do not see need to drop and reallocate dst in this operation. I just
>> need to set source IP address and source and dst port in the metadata
>> dst already set in skb.
>
> If I'm looking at the code correctly, metadata_dst is stored in the
> action and each skb gets only a reference to it. Modifying it would
> modify the shared metadata_dst (see execute_set_action).
>
Right, I was thinking about updating just the source IPv4 address,
which should be the same for a given tunnel metadata. But that can get
complicated for updates to the IPv6 address.

>> This fill_metadata function is not called for every packet so
>> ndo_start_xmit() still needs to do route lookup.
>
> Yes. I meant that in those cases where the fill_metadata function was
> called, we may skip the lookup. Just an optimization and not an
> important one. I'm not even sure it can currently happen as the skb is
> cloned for each action.
>
The function is called for sampling, which is a rare operation. So I do
not want to complicate the code to optimize such a case.

>> egress_tun_info in dp_upcall_info is required to check for failure. It
>> would be only set on successful fill_metadata operation.
>
> Or you can just set dst to NULL on failure.
>

In that case we would need extra logic to update error stats
correctly in the device xmit operation. For example, geneve_xmit() updates
different error stats for route lookup.

[PATCH v3 1/5] dt-bindings: net: Broadcom iProc MDIO bus driver device tree binding

2015-10-06 Thread Arun Parameswaran

Add device tree binding documentation for the Broadcom iProc MDIO
bus driver.

Signed-off-by: Arun Parameswaran 
---
 .../devicetree/bindings/net/brcm,iproc-mdio.txt| 23 ++
 1 file changed, 23 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/brcm,iproc-mdio.txt

diff --git a/Documentation/devicetree/bindings/net/brcm,iproc-mdio.txt 
b/Documentation/devicetree/bindings/net/brcm,iproc-mdio.txt
new file mode 100644
index 000..8ba9ed1
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/brcm,iproc-mdio.txt
@@ -0,0 +1,23 @@
+* Broadcom iProc MDIO bus controller
+
+Required properties:
+- compatible: should be "brcm,iproc-mdio"
+- reg: address and length of the register set for the MDIO interface
+- #size-cells: must be 1
+- #address-cells: must be 0
+
+Child nodes of this MDIO bus controller node are standard Ethernet PHY device
+nodes as described in Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+
+mdio@18002000 {
+   compatible = "brcm,iproc-mdio";
+   reg = <0x18002000 0x8>;
+   #size-cells = <1>;
+   #address-cells = <0>;
+
+   enet-gphy@0 {
+   reg = <0>;
+   };
+};
-- 
2.5.2


[PATCH v3 2/5] net: phy: Broadcom iProc MDIO bus driver

2015-10-06 Thread Arun Parameswaran

This patch adds support for the Broadcom iProc MDIO bus interface.
The MDIO interface can be found in the Broadcom iProc family SoCs.

The MDIO bus is accessed using a combination of command and data
registers. This MDIO driver provides access to the Ethernet GPHYs
connected to the MDIO bus.

Signed-off-by: Arun Parameswaran 
---
 drivers/net/phy/Kconfig  |   9 ++
 drivers/net/phy/Makefile |   1 +
 drivers/net/phy/mdio-bcm-iproc.c | 213 +++
 3 files changed, 223 insertions(+)
 create mode 100644 drivers/net/phy/mdio-bcm-iproc.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index c5ad98a..b57f6c2 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -225,6 +225,15 @@ config MDIO_BCM_UNIMAC
  This hardware can be found in the Broadcom GENET Ethernet MAC
  controllers as well as some Broadcom Ethernet switches such as the
  Starfighter 2 switches.
+
+config MDIO_BCM_IPROC
+   tristate "Broadcom iProc MDIO bus controller"
+   depends on ARCH_BCM_IPROC || COMPILE_TEST
+   depends on HAS_IOMEM && OF_MDIO
+   help
+ This module provides a driver for the MDIO busses found in the
+ Broadcom iProc SoC's.
+
 endif # PHYLIB
 
 config MICREL_KS8995MA
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 87f079c..f4e6eb9 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -38,3 +38,4 @@ obj-$(CONFIG_MDIO_SUN4I)  += mdio-sun4i.o
 obj-$(CONFIG_MDIO_MOXART)  += mdio-moxart.o
 obj-$(CONFIG_MDIO_BCM_UNIMAC)  += mdio-bcm-unimac.o
 obj-$(CONFIG_MICROCHIP_PHY)    += microchip.o
+obj-$(CONFIG_MDIO_BCM_IPROC)   += mdio-bcm-iproc.o
diff --git a/drivers/net/phy/mdio-bcm-iproc.c b/drivers/net/phy/mdio-bcm-iproc.c
new file mode 100644
index 000..c0b4e65
--- /dev/null
+++ b/drivers/net/phy/mdio-bcm-iproc.c
@@ -0,0 +1,213 @@
+/*
+ * Copyright (C) 2015 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define IPROC_GPHY_MDCDIV    0x1a
+
+#define MII_CTRL_OFFSET  0x000
+
+#define MII_CTRL_DIV_SHIFT   0
+#define MII_CTRL_PRE_SHIFT   7
+#define MII_CTRL_BUSY_SHIFT  8
+
+#define MII_DATA_OFFSET  0x004
+#define MII_DATA_MASK    0x
+#define MII_DATA_TA_SHIFT    16
+#define MII_DATA_TA_VAL  2
+#define MII_DATA_RA_SHIFT    18
+#define MII_DATA_PA_SHIFT    23
+#define MII_DATA_OP_SHIFT    28
+#define MII_DATA_OP_WRITE    1
+#define MII_DATA_OP_READ 2
+#define MII_DATA_SB_SHIFT    30
+
+struct iproc_mdio_priv {
+   struct mii_bus *mii_bus;
+   void __iomem *base;
+};
+
+static inline int iproc_mdio_wait_for_idle(void __iomem *base)
+{
+   u32 val;
+   unsigned int timeout = 1000; /* loop for 1s */
+
+   do {
+   val = readl(base + MII_CTRL_OFFSET);
+   if ((val & BIT(MII_CTRL_BUSY_SHIFT)) == 0)
+   return 0;
+
+   usleep_range(1000, 2000);
+   } while (timeout--);
+
+   return -ETIMEDOUT;
+}
+
+static inline void iproc_mdio_config_clk(void __iomem *base)
+{
+   u32 val;
+
+   val = (IPROC_GPHY_MDCDIV << MII_CTRL_DIV_SHIFT) |
+ BIT(MII_CTRL_PRE_SHIFT);
+   writel(val, base + MII_CTRL_OFFSET);
+}
+
+static int iproc_mdio_read(struct mii_bus *bus, int phy_id, int reg)
+{
+   struct iproc_mdio_priv *priv = bus->priv;
+   u32 cmd;
+   int rc;
+
+   rc = iproc_mdio_wait_for_idle(priv->base);
+   if (rc)
+   return rc;
+
+   iproc_mdio_config_clk(priv->base);
+
+   /* Prepare the read operation */
+   cmd = (MII_DATA_TA_VAL << MII_DATA_TA_SHIFT) |
+   (reg << MII_DATA_RA_SHIFT) |
+   (phy_id << MII_DATA_PA_SHIFT) |
+   BIT(MII_DATA_SB_SHIFT) |
+   (MII_DATA_OP_READ << MII_DATA_OP_SHIFT);
+
+   writel(cmd, priv->base + MII_DATA_OFFSET);
+
+   rc = iproc_mdio_wait_for_idle(priv->base);
+   if (rc)
+   return rc;
+
+   cmd = readl(priv->base + MII_DATA_OFFSET) & MII_DATA_MASK;
+
+   return cmd;
+}
+
+static int iproc_mdio_write(struct mii_bus *bus, int phy_id,
+   int reg, u16 val)
+{
+   struct iproc_mdio_priv *priv = bus->priv;
+   u32 cmd;
+   int rc;
+
+   rc = iproc_mdio_wait_for_idle(priv->base);
+   if (rc)
+   return rc;
+
+   iproc_mdio_config_clk(priv->base);
+
+   /* Prepare the
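
To make the register layout in iproc_mdio_read() easier to follow, here is a standalone sketch of how the read command word is assembled from the shift values defined in the patch. The EX_ prefixed names and ex_mdio_read_cmd() are hypothetical; the shifts and field values are copied from the MII_DATA_* defines above. Like the patch, the sketch does not mask phy_id or reg, so callers are assumed to pass values in the range 0 to 31.

```c
#include <stdint.h>

/* Field positions copied from the MII_DATA_* defines in the patch. */
#define EX_DATA_TA_SHIFT 16	/* turnaround */
#define EX_DATA_TA_VAL   2
#define EX_DATA_RA_SHIFT 18	/* register address */
#define EX_DATA_PA_SHIFT 23	/* PHY address */
#define EX_DATA_OP_SHIFT 28	/* opcode */
#define EX_DATA_OP_READ  2
#define EX_DATA_SB_SHIFT 30	/* start bit */

/* Build the 32-bit command word for an MDIO read of (phy_id, reg). */
uint32_t ex_mdio_read_cmd(unsigned int phy_id, unsigned int reg)
{
	return ((uint32_t)EX_DATA_TA_VAL << EX_DATA_TA_SHIFT) |
	       ((uint32_t)reg << EX_DATA_RA_SHIFT) |
	       ((uint32_t)phy_id << EX_DATA_PA_SHIFT) |
	       ((uint32_t)1 << EX_DATA_SB_SHIFT) |
	       ((uint32_t)EX_DATA_OP_READ << EX_DATA_OP_SHIFT);
}
```

The low 16 bits are left clear for the data the controller returns. For example, reading register 2 of PHY 1 gives the command word 0x608a0000.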

Re: [PATCH] bridge/netfilter: avoid unused label warning

2015-10-06 Thread Nikolay Aleksandrov

On 10/06/2015 09:22 PM, Arnd Bergmann wrote:
> With the ARM mini2440_defconfig, the bridge netfilter code gets
> built with both CONFIG_NF_DEFRAG_IPV4 and CONFIG_NF_DEFRAG_IPV6
> disabled, which leads to a harmless gcc warning:
> 
> net/bridge/br_netfilter_hooks.c: In function 'br_nf_dev_queue_xmit':
> net/bridge/br_netfilter_hooks.c:792:2: warning: label 'drop' defined but not 
> used [-Wunused-label]
> 
> This gets rid of the warning by cleaning up the code to avoid
> the respective #ifdefs causing this problem, and replacing them
> with if(IS_ENABLED()) checks. I have verified that the resulting
> object code is unchanged, and an additional advantage is that
> we now get compile coverage of the unused functions in more
> configurations.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: dd302b59bde0 ("netfilter: bridge: don't leak skb in error paths")
> 

I posted a fix for this a couple of days ago, but I like your approach better.
Since mine is not yet applied (I sent it to netfilter-devel only, as I wasn't
sure exactly which jurisdiction this falls into), we can drop it.
Just for reference, my patch is here:
http://patchwork.ozlabs.org/patch/526417/
Pablo, could you please drop it?

By the way, this also takes care of another warning about an unused variable
(nf_bridge).
So,

Reviewed-by: Nikolay Aleksandrov 
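
For anyone wondering why the if (IS_ENABLED()) form avoids the unused-label warning: with #ifdef, the only goto is removed by the preprocessor, while with a plain if on a constant the compiler still parses the branch and sees the label being used before it discards the dead code. A simplified userspace sketch follows. EX_DEFRAG_ENABLED is a hypothetical stand-in for the kernel's IS_ENABLED(CONFIG_...) machinery, and the function names are made up for illustration.

```c
/* Hypothetical stand-in for a Kconfig option; 0 means "disabled". */
#define EX_DEFRAG_ENABLED 0

/* Pretend fragmentation helper: fails for oversized packets. */
static int ex_fragment(int len)
{
	return len > 1500 ? -1 : 0;
}

/*
 * The branch below is compiled (and type-checked) in every configuration,
 * so the "drop" label is always referenced, even when the constant is 0
 * and the optimizer removes the branch.
 */
int ex_queue_xmit(int len)
{
	if (EX_DEFRAG_ENABLED) {
		if (ex_fragment(len) < 0)
			goto drop;
	}
	return 0;
drop:
	return -1;
}
```

This also shows the compile-coverage benefit mentioned in the patch description: ex_fragment() is still built and checked when the option is off.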


Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Jiri Benc

On Tue, 6 Oct 2015 12:21:24 -0700, Pravin Shelar wrote:
> It should report the flag when device parameters are requested by
> userspace if there is support for lwtunnel. So check for the flag
> should be good enough test for lwtunnel support. Why is that not
> reliable check?

Which device? You'll need to create one before querying it. Which won't
work so nicely for older kernels. Basically, what the user space would
have to do is trying to create a vxlan interface and then deleting it
and going the tunnel vport way. It can be quite confusing, I'm afraid.

 Jiri

-- 
Jiri Benc

Re: [PATCHv3 net 2/6] openvswitch: Fix skb leak in ovs_fragment()

2015-10-06 Thread Joe Stringer

On 6 October 2015 at 07:12, Sergei Shtylyov
 wrote:
> Hello.
>
> On 10/6/2015 1:23 AM, Joe Stringer wrote:
>
>> If ovs_fragment() was unable to fragment the skb due to an L2 header
>> that exceeds the supported length, skbs would be leaked. Fix the bug.
>>
>> Fixes: 7f8a436 "openvswitch: Add conntrack action"
>
>
>12-digit SHA1 needed here. And parens along with double quotes as well.

OK, I sent a fresh series with all of the SHA1s updated.

[PATCHv4 net 3/6] openvswitch: Ensure flow is valid before executing ct

2015-10-06 Thread Joe Stringer

The ct action uses parts of the flow key, so we need to ensure that it
is valid before executing that action.

Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
---
v4: Extend "fixes" hash in commit message.
v3: No change.
v2: Acked.
---
 net/openvswitch/actions.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 4cb93f92d6be..c6a39bf2c3b9 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1102,6 +1102,12 @@ static int do_execute_actions(struct datapath *dp, 
struct sk_buff *skb,
break;
 
case OVS_ACTION_ATTR_CT:
+   if (!is_flow_key_valid(key)) {
+   err = ovs_flow_key_update(skb, key);
+   if (err)
+   return err;
+   }
+
err = ovs_ct_execute(ovs_dp_get_net(dp), skb, key,
 nla_data(a));
 
-- 
2.1.4


[PATCHv4 net 6/6] openvswitch: Change CT_ATTR_FLAGS to CT_ATTR_COMMIT

2015-10-06 Thread Joe Stringer

Previously, the CT_ATTR_FLAGS attribute, when nested under the
OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the
semantics of the ct action. It's more extensible to just represent each
flag as a nested attribute, and this requires no additional error
checking to reject flags that aren't currently supported.

Suggested-by: Ben Pfaff 
Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
---
v4: No change.
v3: Acked.
v2: Use bitmask for internal representation of COMMIT.
---
 include/uapi/linux/openvswitch.h | 14 --
 net/openvswitch/conntrack.c  | 13 ++---
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index c861a4cf5fec..036f73bc54cd 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -618,7 +618,9 @@ struct ovs_action_hash {
 
 /**
  * enum ovs_ct_attr - Attributes for %OVS_ACTION_ATTR_CT action.
- * @OVS_CT_ATTR_FLAGS: u32 connection tracking flags.
+ * @OVS_CT_ATTR_COMMIT: If present, commits the connection to the conntrack
+ * table. This allows future packets for the same connection to be identified
+ * as 'established' or 'related'.
  * @OVS_CT_ATTR_ZONE: u16 connection tracking zone.
  * @OVS_CT_ATTR_MARK: u32 value followed by u32 mask. For each bit set in the
  * mask, the corresponding bit in the value is copied to the connection
@@ -630,7 +632,7 @@ struct ovs_action_hash {
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
-   OVS_CT_ATTR_FLAGS,  /* u32 bitmask of OVS_CT_F_*. */
+   OVS_CT_ATTR_COMMIT, /* No argument, commits connection. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */
@@ -641,14 +643,6 @@ enum ovs_ct_attr {
 
 #define OVS_CT_ATTR_MAX (__OVS_CT_ATTR_MAX - 1)
 
-/*
- * OVS_CT_ATTR_FLAGS flags - bitmask of %OVS_CT_F_*
- * @OVS_CT_F_COMMIT: Commits the flow to the conntrack table. This allows
- * future packets for the same connection to be identified as 'established'
- * or 'related'.
- */
-#define OVS_CT_F_COMMIT 0x01
-
 /**
  * enum ovs_action_attr - Action types.
  *
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 466d5576fe3f..80bf702715bb 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -47,7 +47,7 @@ struct ovs_conntrack_info {
struct nf_conntrack_helper *helper;
struct nf_conntrack_zone zone;
struct nf_conn *ct;
-   u32 flags;
+   u8 commit : 1;
u16 family;
struct md_mark mark;
struct md_labels labels;
@@ -493,7 +493,7 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
return err;
}
 
-   if (info->flags & OVS_CT_F_COMMIT)
+   if (info->commit)
err = ovs_ct_commit(net, key, info, skb);
else
err = ovs_ct_lookup(net, key, info, skb);
@@ -539,8 +539,7 @@ static int ovs_ct_add_helper(struct ovs_conntrack_info 
*info, const char *name,
 }
 
 static const struct ovs_ct_len_tbl ovs_ct_attr_lens[OVS_CT_ATTR_MAX + 1] = {
-   [OVS_CT_ATTR_FLAGS] = { .minlen = sizeof(u32),
-   .maxlen = sizeof(u32) },
+   [OVS_CT_ATTR_COMMIT]= { .minlen = 0, .maxlen = 0 },
[OVS_CT_ATTR_ZONE]  = { .minlen = sizeof(u16),
.maxlen = sizeof(u16) },
[OVS_CT_ATTR_MARK]  = { .minlen = sizeof(struct md_mark),
@@ -576,8 +575,8 @@ static int parse_ct(const struct nlattr *attr, struct 
ovs_conntrack_info *info,
}
 
switch (type) {
-   case OVS_CT_ATTR_FLAGS:
-   info->flags = nla_get_u32(a);
+   case OVS_CT_ATTR_COMMIT:
+   info->commit = true;
break;
 #ifdef CONFIG_NF_CONNTRACK_ZONES
case OVS_CT_ATTR_ZONE:
@@ -701,7 +700,7 @@ int ovs_ct_action_to_attr(const struct ovs_conntrack_info 
*ct_info,
if (!start)
return -EMSGSIZE;
 
-   if (nla_put_u32(skb, OVS_CT_ATTR_FLAGS, ct_info->flags))
+   if (ct_info->commit && nla_put_flag(skb, OVS_CT_ATTR_COMMIT))
return -EMSGSIZE;
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
nla_put_u16(skb, OVS_CT_ATTR_ZONE, ct_info->zone.id))
-- 
2.1.4


[PATCHv4 net 4/6] openvswitch: Reject ct_state unsupported bits

2015-10-06 Thread Joe Stringer

Previously, if userspace specified ct_state bits in the flow key which
are currently undefined (and therefore unsupported), then they would be
ignored. This could cause unexpected behaviour in future if userspace is
extended to support additional bits but attempts to communicate with the
current version of the kernel. This patch rectifies the situation by
rejecting such ct_state bits.

Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
---
v4: Extend "fixes" hash in commit message.
v3: No change.
v2: Acked.
---
 net/openvswitch/conntrack.h| 12 
 net/openvswitch/flow_netlink.c |  6 ++
 2 files changed, 18 insertions(+)

diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index 6bd603c6a031..d6eca8394254 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -34,6 +34,13 @@ int ovs_ct_execute(struct net *, struct sk_buff *, struct 
sw_flow_key *,
 void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
 int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
 void ovs_ct_free_action(const struct nlattr *a);
+
+static inline bool ovs_ct_state_supported(u8 state)
+{
+   return !(state & ~(OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED |
+OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR |
+OVS_CS_F_INVALID | OVS_CS_F_TRACKED));
+}
 #else
 #include 
 
@@ -46,6 +53,11 @@ static inline bool ovs_ct_verify(struct net *net, int attr)
return false;
 }
 
+static inline bool ovs_ct_state_supported(u8 state)
+{
+   return false;
+}
+
 static inline int ovs_ct_copy_action(struct net *net, const struct nlattr *nla,
 const struct sw_flow_key *key,
 struct sw_flow_actions **acts, bool log)
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index a60e3b7684bc..d47b5c5c640e 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -816,6 +816,12 @@ static int metadata_from_nlattrs(struct net *net, struct 
sw_flow_match *match,
ovs_ct_verify(net, OVS_KEY_ATTR_CT_STATE)) {
u8 ct_state = nla_get_u8(a[OVS_KEY_ATTR_CT_STATE]);
 
+   if (!is_mask && !ovs_ct_state_supported(ct_state)) {
+   OVS_NLERR(log, "ct_state flags %02x unsupported",
+ ct_state);
+   return -EINVAL;
+   }
+
SW_FLOW_KEY_PUT(match, ct.state, ct_state, is_mask);
*attrs &= ~(1ULL << OVS_KEY_ATTR_CT_STATE);
}
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
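
For readers following along outside the kernel tree, the validation idea can be sketched in plain C. This is only an illustration under stated assumptions: the CS_F_* constants and parse_ct_state() are local stand-ins invented for the sketch, not the kernel's OVS_CS_F_* definitions or the real netlink parsing path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration only; the real OVS_CS_F_*
 * constants live in include/uapi/linux/openvswitch.h. */
#define CS_F_NEW         0x01u
#define CS_F_ESTABLISHED 0x02u
#define CS_F_RELATED     0x04u
#define CS_F_INVALID     0x20u
#define CS_F_REPLY_DIR   0x40u
#define CS_F_TRACKED     0x80u

#define CS_F_SUPPORTED (CS_F_NEW | CS_F_ESTABLISHED | CS_F_RELATED | \
                        CS_F_INVALID | CS_F_REPLY_DIR | CS_F_TRACKED)

/* True only if every bit set in state is a known flag. */
static bool ct_state_supported(uint8_t state)
{
    return (state & ~CS_F_SUPPORTED) == 0;
}

/* Hypothetical parse helper: returns 0 on success and -1 (standing in
 * for -EINVAL) when a key carries unknown bits. Masks are not checked,
 * matching the !is_mask condition in the patch. */
static int parse_ct_state(uint8_t state, bool is_mask, uint8_t *out)
{
    if (!is_mask && !ct_state_supported(state))
        return -1;
    *out = state;
    return 0;
}
```

With these stand-in values, bits 0x08 and 0x10 are undefined, so a key carrying either of them is rejected while a mask carrying them is still accepted.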

[PATCHv4 net 2/6] openvswitch: Fix skb leak in ovs_fragment()

2015-10-06 Thread Joe Stringer

If ovs_fragment() was unable to fragment the skb due to an L2 header
that exceeds the supported length, skbs would be leaked. Fix the bug.

Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
---
v4: Extend "fixes" hash in commit message.
v3: Acked.
v2: Drop if condition by returning in success case.
---
 net/openvswitch/actions.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index e23a61cc3d5c..4cb93f92d6be 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -684,7 +684,7 @@ static void ovs_fragment(struct vport *vport, struct 
sk_buff *skb, u16 mru,
 {
if (skb_network_offset(skb) > MAX_L2_LEN) {
OVS_NLERR(1, "L2 header too long to fragment");
-   return;
+   goto err;
}
 
if (ethertype == htons(ETH_P_IP)) {
@@ -708,8 +708,7 @@ static void ovs_fragment(struct vport *vport, struct 
sk_buff *skb, u16 mru,
struct rt6_info ovs_rt;
 
if (!v6ops) {
-   kfree_skb(skb);
-   return;
+   goto err;
}
 
prepare_frag(vport, skb);
@@ -728,8 +727,12 @@ static void ovs_fragment(struct vport *vport, struct 
sk_buff *skb, u16 mru,
WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
  ovs_vport_name(vport), ntohs(ethertype), mru,
  vport->dev->mtu);
-   kfree_skb(skb);
+   goto err;
}
+
+   return;
+err:
+   kfree_skb(skb);
 }
 
 static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port,
-- 
2.1.4
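
The single-exit cleanup pattern used by this fix can be sketched in userspace C. Everything here is hypothetical scaffolding for illustration (struct buf, buf_new(), the freed counter, and an int return value that the kernel function does not have); the point is only that every failure path funnels through one label that releases the buffer.

```c
#include <stdlib.h>

#define MAX_L2 64   /* arbitrary limit for the sketch */

/* Hypothetical buffer standing in for an skb. */
struct buf {
    size_t l2_len;
    int can_fragment;
};

static int freed;   /* counts buffers released on error paths */

static struct buf *buf_new(size_t l2_len, int can_fragment)
{
    struct buf *b = malloc(sizeof(*b));

    if (b) {
        b->l2_len = l2_len;
        b->can_fragment = can_fragment;
    }
    return b;
}

/* Simulated output path: takes ownership of the buffer. */
static void output(struct buf *b)
{
    free(b);
}

/* Consumes b on every path. Returns 0 on success, -1 on failure. */
static int fragment(struct buf *b)
{
    if (b->l2_len > MAX_L2)
        goto err;
    if (!b->can_fragment)
        goto err;

    output(b);
    return 0;
err:
    /* Single place where failed buffers are released. */
    freed++;
    free(b);
    return -1;
}
```

Returning early on success keeps the err label reachable only from failure paths, which is the shape the v2 note ("Drop if condition by returning in success case") describes.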

[PATCHv4 net 1/6] openvswitch: Fix typos in CT headers

2015-10-06 Thread Joe Stringer

These comments hadn't caught up with their implementations; fix them.

Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
---
v4: Extend "fixes" hash in commit message.
v3: No change.
v2: Acked.
---
 include/uapi/linux/openvswitch.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index c736344afed4..a9a4a59912e9 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -630,7 +630,7 @@ struct ovs_action_hash {
  */
 enum ovs_ct_attr {
OVS_CT_ATTR_UNSPEC,
-   OVS_CT_ATTR_FLAGS,  /* u8 bitmask of OVS_CT_F_*. */
+   OVS_CT_ATTR_FLAGS,  /* u32 bitmask of OVS_CT_F_*. */
OVS_CT_ATTR_ZONE,   /* u16 zone id. */
OVS_CT_ATTR_MARK,   /* mark to associate with this connection. */
OVS_CT_ATTR_LABELS, /* labels to associate with this connection. */
@@ -705,7 +705,7 @@ enum ovs_action_attr {
   * data immediately followed by a mask.
   * The data must be zero for the unmasked
   * bits. */
-   OVS_ACTION_ATTR_CT,   /* One nested OVS_CT_ATTR_* . */
+   OVS_ACTION_ATTR_CT,   /* Nested OVS_CT_ATTR_* . */
 
__OVS_ACTION_ATTR_MAX,/* Nothing past this will be accepted
   * from userspace. */
-- 
2.1.4

[PATCHv4 net 5/6] openvswitch: Extend ct_state match field to 32 bits

2015-10-06 Thread Joe Stringer

The ct_state field was initially added as an 8-bit field, however six of
the bits are already being used and use cases are already starting to
appear that may push the limits of this field. This patch extends the
field to 32 bits while retaining the internal representation of 8 bits.
This should cover forward compatibility of the ABI for the foreseeable
future.

This patch also reorders the OVS_CS_F_* bits to be sequential.
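
A rough sketch of what "32 bits on the wire, 8 bits internally" can look like, assuming the six sequential flag bits 0x01..0x20 from the reordered definitions. The function names and the -1 error value are made up for the illustration, and the sketch ignores the key/mask distinction that the real parser makes.

```c
#include <stdint.h>

/* Six sequential flag bits, 0x01 through 0x20. */
#define CS_SUPPORTED_MASK 0x3Fu

/* Returns 0 and stores the 8-bit internal form, or -1 if the 32-bit
 * value from userspace carries bits this side does not know about. */
static int ct_state_from_user(uint32_t wire, uint8_t *internal)
{
    if (wire & ~(uint32_t)CS_SUPPORTED_MASK)
        return -1;
    *internal = (uint8_t)wire;   /* safe: all set bits fit in 8 bits */
    return 0;
}

/* Widening back to the wire format is lossless. */
static uint32_t ct_state_to_user(uint8_t internal)
{
    return internal;
}
```

Because unknown bits are rejected before narrowing, the truncation to 8 bits can never silently drop a flag.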

Suggested-by: Jarno Rajahalme 
Signed-off-by: Joe Stringer 
Acked-by: Pravin B Shelar 
---
v4: No change.
v3: No change.
v2: Acked.
---
 include/uapi/linux/openvswitch.h | 8 
 net/openvswitch/conntrack.c  | 2 +-
 net/openvswitch/conntrack.h  | 4 ++--
 net/openvswitch/flow_netlink.c   | 8 
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index a9a4a59912e9..c861a4cf5fec 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -323,7 +323,7 @@ enum ovs_key_attr {
OVS_KEY_ATTR_MPLS,  /* array of struct ovs_key_mpls.
 * The implementation may restrict
 * the accepted length of the array. */
-   OVS_KEY_ATTR_CT_STATE,  /* u8 bitmask of OVS_CS_F_* */
+   OVS_KEY_ATTR_CT_STATE,  /* u32 bitmask of OVS_CS_F_* */
OVS_KEY_ATTR_CT_ZONE,   /* u16 connection tracking zone. */
OVS_KEY_ATTR_CT_MARK,   /* u32 connection tracking mark */
OVS_KEY_ATTR_CT_LABELS, /* 16-octet connection tracking label */
@@ -449,9 +449,9 @@ struct ovs_key_ct_labels {
 #define OVS_CS_F_ESTABLISHED   0x02 /* Part of an existing connection. */
 #define OVS_CS_F_RELATED   0x04 /* Related to an established
 * connection. */
-#define OVS_CS_F_INVALID   0x20 /* Could not track connection. */
-#define OVS_CS_F_REPLY_DIR 0x40 /* Flow is in the reply direction. */
-#define OVS_CS_F_TRACKED   0x80 /* Conntrack has occurred. */
+#define OVS_CS_F_REPLY_DIR 0x08 /* Flow is in the reply direction. */
+#define OVS_CS_F_INVALID   0x10 /* Could not track connection. */
+#define OVS_CS_F_TRACKED   0x20 /* Conntrack has occurred. */
 
 /**
  * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 7d80acfb80d0..466d5576fe3f 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -167,7 +167,7 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key)
 
 int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
 {
-   if (nla_put_u8(skb, OVS_KEY_ATTR_CT_STATE, key->ct.state))
+   if (nla_put_u32(skb, OVS_KEY_ATTR_CT_STATE, key->ct.state))
return -EMSGSIZE;
 
if (IS_ENABLED(CONFIG_NF_CONNTRACK_ZONES) &&
diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
index d6eca8394254..da8714942c95 100644
--- a/net/openvswitch/conntrack.h
+++ b/net/openvswitch/conntrack.h
@@ -35,7 +35,7 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct 
sw_flow_key *key);
 int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
 void ovs_ct_free_action(const struct nlattr *a);
 
-static inline bool ovs_ct_state_supported(u8 state)
+static inline bool ovs_ct_state_supported(u32 state)
 {
return !(state & ~(OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED |
 OVS_CS_F_RELATED | OVS_CS_F_REPLY_DIR |
@@ -53,7 +53,7 @@ static inline bool ovs_ct_verify(struct net *net, int attr)
return false;
 }
 
-static inline bool ovs_ct_state_supported(u8 state)
+static inline bool ovs_ct_state_supported(u32 state)
 {
return false;
 }
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index d47b5c5c640e..171a691f1c32 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -291,7 +291,7 @@ size_t ovs_key_attr_size(void)
+ nla_total_size(4)   /* OVS_KEY_ATTR_SKB_MARK */
+ nla_total_size(4)   /* OVS_KEY_ATTR_DP_HASH */
+ nla_total_size(4)   /* OVS_KEY_ATTR_RECIRC_ID */
-   + nla_total_size(1)   /* OVS_KEY_ATTR_CT_STATE */
+   + nla_total_size(4)   /* OVS_KEY_ATTR_CT_STATE */
+ nla_total_size(2)   /* OVS_KEY_ATTR_CT_ZONE */
+ nla_total_size(4)   /* OVS_KEY_ATTR_CT_MARK */
+ nla_total_size(16)  /* OVS_KEY_ATTR_CT_LABELS */
@@ -349,7 +349,7 @@ static const struct ovs_len_tbl 
ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
[OVS_KEY_ATTR_TUNNEL]= { .len = OVS_ATTR_NESTED,
 .next = ovs_tunnel_key_lens, },
[OVS_KEY_ATTR_MPLS]  = { .len = sizeof(struct ovs_key_mpls) },
-   [OVS_KEY_ATTR_CT_STATE]  = { .len = sizeof(u8) },
+

Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Andy Lutomirski

On Tue, Oct 6, 2015 at 10:56 AM, Eric Dumazet  wrote:
> On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:
>
>> was also thinking that we can do it only in paths that actually
>> have multiple protocol layers, since today bpf is mainly used with
>> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
>> on RX, because it just came out of alloc_skb and no layers were called,
>> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
>> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
>> but it feels overly fragile to save a branch in sk_filter,
>> so planning to go with
>> if(unlikely(prog->cb_access)) memset in sk_filter().
>>
>
> This will break TCP use of sk_filter().
> skb->cb[] contains useful data in TCP layer.
>
>

Since I don't know too much about the networking details:

1. Does "skb->cb" *ever* contain anything useful for an unprivileged user?

2. Does skb->cb form a stable ABI?

Unless both answers are solid yesses, then maybe the right solution is
to just deny access entirely to unprivileged users.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC

Re: [PATCH] ipvs: Remove possibly unused net variable from ip_vs_out

2015-10-06 Thread Julian Anastasov


Hello,

On Tue, 6 Oct 2015, Simon Horman wrote:

> Since 6f2bcea9917d ("ipvs: Pass ipvs into ip_vs_in_icmp and
> ip_vs_in_icmp_v6") the net variable in ip_vs_out() appears to be unused
> unless CONFIG_IP_VS_IPV6 is set. To resolve this remove the net variable
> and dereference net as needed.
> 
> Signed-off-by: Simon Horman 

Both patches look good to me

Acked-by: Julian Anastasov 

> ---
>  net/netfilter/ipvs/ip_vs_core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index d08df435c2aa..3773154d9b71 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -1172,7 +1172,6 @@ drop:
>  static unsigned int
>  ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff 
> *skb, int af)
>  {
> - struct net *net = ipvs->net;
>   struct ip_vs_iphdr iph;
>   struct ip_vs_protocol *pp;
>   struct ip_vs_proto_data *pd;
> @@ -1272,7 +1271,7 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int 
> hooknum, struct sk_buff *skb, in
>  #ifdef CONFIG_IP_VS_IPV6
>   if (af == AF_INET6) {
>   if (!skb->dev)
> - skb->dev = net->loopback_dev;
> + skb->dev = 
> ipvs->net->loopback_dev;
>   icmpv6_send(skb,
>   ICMPV6_DEST_UNREACH,
>   ICMPV6_PORT_UNREACH,
> -- 
> 2.1.4

Regards

--
Julian Anastasov 

Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Jiri Benc

On Tue, 6 Oct 2015 11:28:08 -0700, Pravin Shelar wrote:
> What do you have in mind? I do not see way to fix this issue in vport-*.c.

It seems to me that e.g. the code you have in vxlan_egress_tun_info in
drivers/net/vxlan.c can be put into vxlan_get_egress_tun_info in
net/openvswitch/vport-vxlan.c. vport->dev is guaranteed to be vxlan,
and the current code accesses netdev_priv(dev) as struct vxlan_dev
anyway.

This would of course not work if we created lwtunnel interface from the
ovs user space. But that's not going to happen with kernel 4.3, we'll
need a way to query datapath for features it supports for this to work
- there's currently no useful way to determine whether the kernel
supports metadata based vxlan or not. I'm working on a patch to query
the datapath for the supported features but that's for net-next. Thus
I think we're safe.

 Jiri

-- 
Jiri Benc

[PATCH net-next v2 1/2] mpls: multipath support

2015-10-06 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'

- 'struct mpls_nh' represents a mpls nexthop label forwarding entry

- moves mpls route and nexthop structures into internal.h

- A mpls_route can point to multiple mpls_nh structs

- the nexthops are maintained as a list

- In the process of restructuring, this patch also consistently changes all
label counts (the 'labels' fields) to u8

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this patch, the multipath route nexthop selection algorithm
is a simple round robin picked up from ipv4 fib code and is replaced by
a hash based algorithm from Robert Shearman in the next patch

- mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
Although mpls_route_update was implemented to update based on dev, it was
never used that way. The dev handling also gets tricky with multiple
nexthops, since a route cannot be matched against any single nexthop's dev.
So this patch removes the unused 'dev' handling in mpls_route_update.

Example:

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3
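
To make the route/nexthop split concrete, here is a loose userspace sketch. The struct and function names are invented for illustration and only approximate the shape described above (a route holding a list of nexthops, each with its own label stack); they are not the definitions from internal.h. The 4-byte entry size is the size of one MPLS label stack entry.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NEW_LABELS 2
#define MPLS_SHIM_SIZE 4   /* one MPLS label stack entry is 4 bytes */

/* Loosely modelled on the per-nexthop forwarding entry. */
struct nexthop {
    uint32_t label[MAX_NEW_LABELS];
    uint8_t labels;              /* number of labels to push */
    struct nexthop *next;
};

/* A route now only anchors a list of nexthops. */
struct route {
    struct nexthop *nhs;
    unsigned int nhn;            /* number of nexthops */
};

/* Append nh to the end of the route's nexthop list. */
static void route_add_nh(struct route *rt, struct nexthop *nh)
{
    struct nexthop **pp = &rt->nhs;

    while (*pp)
        pp = &(*pp)->next;
    nh->next = NULL;
    *pp = nh;
    rt->nhn++;
}

/* Size of the label headers this nexthop would push. */
static unsigned int nh_header_size(const struct nexthop *nh)
{
    return nh->labels * MPLS_SHIM_SIZE;
}

/* Largest header size over all nexthops of a route. */
static unsigned int route_max_header_size(const struct route *rt)
{
    const struct nexthop *nh;
    unsigned int max = 0;

    for (nh = rt->nhs; nh; nh = nh->next)
        if (nh_header_size(nh) > max)
            max = nh_header_size(nh);
    return max;
}
```

The header size becomes a per-nexthop property once each nexthop carries its own labels, which is why the patch renames mpls_rt_header_size to mpls_nh_header_size.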

Signed-off-by: Roopa Prabhu 
---
 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 627 +---
 net/mpls/internal.h |  43 ++-
 3 files changed, 516 insertions(+), 156 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index 4757997..179253f 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -18,7 +18,7 @@
 
 struct mpls_iptunnel_encap {
u32 label[MAX_NEW_LABELS];
-   u32 labels;
+   u8  labels;
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct 
lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 8c5707d..ae9e153 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -19,39 +19,12 @@
 #include 
 #include 
 #endif
+#include 
 #include "internal.h"
 
-#define LABEL_NOT_SPECIFIED (1<<20)
-#define MAX_NEW_LABELS 2
-
-/* This maximum ha length copied from the definition of struct neighbour */
-#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
-
-enum mpls_payload_type {
-   MPT_UNSPEC, /* IPv4 or IPv6 */
-   MPT_IPV4 = 4,
-   MPT_IPV6 = 6,
-
-   /* Other types not implemented:
-*  - Pseudo-wire with or without control word (RFC4385)
-*  - GAL (RFC5586)
-*/
-};
-
-struct mpls_route { /* next hop label forwarding entry */
-   struct net_device __rcu *rt_dev;
-   struct rcu_head rt_rcu;
-   u32 rt_label[MAX_NEW_LABELS];
-   u8  rt_protocol; /* routing protocol that set this 
entry */
-   u8  rt_payload_type;
-   u8  rt_labels;
-   u8  rt_via_alen;
-   u8  rt_via_table;
-   u8  rt_via[0];
-};
-
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
+static DEFINE_SPINLOCK(mpls_multipath_lock);
 
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
   struct nlmsghdr *nlh, struct net *net, u32 portid,
@@ -80,10 +53,10 @@ bool mpls_output_possible(const struct net_device *dev)
 }
 EXPORT_SYMBOL_GPL(mpls_output_possible);
 
-static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
+static unsigned int mpls_nh_header_size(const struct mpls_nh *nh)
 {
/* The size of the layer 2.5 labels to be added for this route */
-   return rt->rt_labels * sizeof(struct mpls_shim_hdr);
+   return nh->nh_labels * sizeof(struct mpls_shim_hdr);
 }
 
 unsigned int mpls_dev_mtu(const struct net_device *dev)
@@ -105,8 +78,58 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
-   struct mpls_entry_decoded dec)
+/* This is a cut/copy/modify from fib_select_multipath */
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+{
+   struct mpls_nh *nh;
+   struct mpls_nh *ret_nh;
+   int nhsel = 0;
+   int w;
+
+   spin_lock_bh(&mpls_multipath_lock);
+   ret_nh = list_first_entry_or_null(&rt->rt_nhs, struct mpls_nh,
+ nh_next);
+   if (rt->rt_power <= 0) {
+   int power = 0;
+
+   list_for_each_entry(nh, &rt->rt_nhs,

[PATCH net-next v2 0/2] mpls: multipath support

2015-10-06 Thread Roopa Prabhu

From: Roopa Prabhu 

This patch adds support for MPLS multipath routes.

Includes following changes to support multipath:
- splits struct mpls_route into 'struct mpls_route + struct mpls_nh'.

- struct mpls_nh represents a mpls nexthop label forwarding entry

- Adds support to parse/fill RTA_MULTIPATH netlink attribute for
multipath routes similar to ipv4/v6 fib

- In this series the multipath route nexthop selection
algorithm starts with a simple round robin and is replaced by a hash
based algorithm from Robert Shearman in the last patch

- In the process of restructuring, this patch also consistently changes all
label counts (the 'labels' fields) to u8

$ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
nexthop as 700 via inet 10.1.1.6 dev swp2 \
nexthop as 800 via inet 40.1.1.2 dev swp3

$ip  -f mpls route show
100 
nexthop as to 200 via inet 10.1.1.2  dev swp1
nexthop as to 700 via inet 10.1.1.6  dev swp2
nexthop as to 800 via inet 40.1.1.2  dev swp3

Roopa Prabhu (2):
  mpls: multipath support
  mpls: flow-based multipath selection

Signed-off-by: Roopa Prabhu 


Changes since v1:
- Incorporate some feedback from Robert:
use dynamic allocation (list) instead of static allocation
for nexthops
- include flow hash based multipath selection from Robert


 include/net/mpls_iptunnel.h |   2 +-
 net/mpls/af_mpls.c  | 668 ++--
 net/mpls/internal.h |  57 +++-
 3 files changed, 572 insertions(+), 155 deletions(-)

-- 
1.9.1

[PATCH net-next v2 2/2] mpls: flow-based multipath selection

2015-10-06 Thread Roopa Prabhu

From: Robert Shearman 

Change the selection of a multipath route to use a flow-based
hash. This is more suitable for traffic sensitive to reordering within a
flow (e.g. TCP, L2VPN) whilst still allowing a good distribution
of traffic given enough flows.

Selection of the path for a multipath route is done using a hash of:
1. Label stack up to MAX_MP_SELECT_LABELS labels or up to and
   including entropy label, whichever is first.
2. 3-tuple of (L3 src, L3 dst, proto) from IPv4/IPv6 header in MPLS
   payload, if present.

Naturally, a 5-tuple hash using L4 information in addition would be
possible and be better in some scenarios, but there is a tradeoff
between looking deeper into the packet to achieve good distribution,
and packet forwarding performance, and I have erred on the side of the
latter as the default.
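
The selection rules above can be sketched in userspace C as follows. This is a simplified illustration, not the patch's code: struct flow is a hypothetical pre-parsed packet view, only 32-bit (IPv4-style) addresses are modelled, and mix32() is an FNV-1a stand-in for the kernel's jhash. Label value 7 is the entropy label indicator and 16 is the first unreserved label.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SELECT_LABELS      4    /* mirrors MAX_MP_SELECT_LABELS */
#define LABEL_ENTROPY_IND      7u   /* entropy label indicator, RFC 6790 */
#define LABEL_FIRST_UNRESERVED 16u

/* Hypothetical pre-parsed view of a packet. */
struct flow {
    const uint32_t *labels;   /* label stack, top of stack first */
    size_t n_labels;
    int has_ipv4;             /* an IPv4 payload was recognised */
    uint32_t saddr, daddr;
    uint8_t proto;
};

/* FNV-1a over the four bytes of a word; stand-in for jhash. */
static uint32_t mix32(uint32_t hash, uint32_t word)
{
    int i;

    for (i = 0; i < 4; i++) {
        hash ^= (word >> (8 * i)) & 0xffu;
        hash *= 16777619u;
    }
    return hash;
}

/* Returns the index of the nexthop to use, in [0, n_nh). */
static unsigned int select_nexthop(const struct flow *f, unsigned int n_nh)
{
    uint32_t hash = 2166136261u;
    int eli_seen = 0;
    size_t i;

    /* Nothing to choose between: skip all packet inspection. */
    if (n_nh <= 1)
        return 0;

    for (i = 0; i < f->n_labels && i < MAX_SELECT_LABELS; i++) {
        uint32_t label = f->labels[i];

        if (label == LABEL_ENTROPY_IND) {
            eli_seen = 1;
        } else if (label >= LABEL_FIRST_UNRESERVED) {
            /* Reserved labels are never used as hash keys. */
            hash = mix32(hash, label);
            /* The label after the indicator is the entropy
             * label; once hashed, stop looking deeper. */
            if (eli_seen)
                return hash % n_nh;
        }
    }

    /* Only look at the payload if the whole stack was walked. */
    if (i == f->n_labels && f->has_ipv4) {
        hash = mix32(hash, f->saddr);
        hash = mix32(hash, f->daddr);
        hash = mix32(hash, f->proto);
    }
    return hash % n_nh;
}
```

Because the result depends only on fields that are constant within a flow, all packets of a flow map to the same nexthop, which is what avoids reordering.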

Signed-off-by: Robert Shearman 
---
 net/mpls/af_mpls.c | 110 -
 1 file changed, 76 insertions(+), 34 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ae9e153..1bef057 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -22,9 +22,13 @@
 #include 
 #include "internal.h"
 
+/* Maximum number of labels to look ahead at when selecting a path of
+ * a multipath route
+ */
+#define MAX_MP_SELECT_LABELS 4
+
 static int zero = 0;
 static int label_limit = (1 << 20) - 1;
-static DEFINE_SPINLOCK(mpls_multipath_lock);
 
 static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
   struct nlmsghdr *nlh, struct net *net, u32 portid,
@@ -78,53 +82,91 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned 
int mtu)
 }
 EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
-/* This is a cut/copy/modify from fib_select_multipath */
-static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt)
+static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
+struct sk_buff *skb, bool bos)
 {
+   struct mpls_entry_decoded dec;
+   struct mpls_shim_hdr *hdr;
struct mpls_nh *nh;
struct mpls_nh *ret_nh;
-   int nhsel = 0;
-   int w;
-
-   spin_lock_bh(&mpls_multipath_lock);
+   bool eli_seen = false;
+   int label_index;
+   int nh_index;
+   u32 hash = 0;
+   int nhsel;
+
+   /* No need to look further into packet if there's only
+* one path
+*/
ret_nh = list_first_entry_or_null(&rt->rt_nhs, struct mpls_nh,
  nh_next);
-   if (rt->rt_power <= 0) {
-   int power = 0;
+   if (rt->rt_nhn == 1)
+   goto out;
 
-   list_for_each_entry(nh, &rt->rt_nhs, nh_next) {
-   power += nh->nh_weight;
-   nh->nh_power = nh->nh_weight;
+   for (label_index = 0; label_index < MAX_MP_SELECT_LABELS && !bos;
+label_index++) {
+   if (!pskb_may_pull(skb, sizeof(*hdr) * label_index))
+   break;
+
+   /* Read and decode the current label */
+   hdr = mpls_hdr(skb) + label_index;
+   dec = mpls_entry_decode(hdr);
+
+   /* RFC6790 - reserved labels MUST NOT be used as keys
+* for the load-balancing function
+*/
+   if (dec.label == MPLS_LABEL_ENTROPY) {
+   eli_seen = true;
+   } else if (dec.label >= MPLS_LABEL_FIRST_UNRESERVED) {
+   hash = jhash_1word(dec.label, hash);
+
+   /* The entropy label follows the entropy label
+* indicator, so this means that the entropy
+* label was just added to the hash - no need to
+* go any deeper either in the label stack or in the
+* payload
+*/
+   if (eli_seen)
+   break;
}
-   rt->rt_power = power;
-   if (power <= 0) {
-   spin_unlock_bh(&mpls_multipath_lock);
-   /* Race condition: route has just become dead. */
-   return ret_nh;
+
+   bos = dec.bos;
+   if (bos && pskb_may_pull(skb, sizeof(*hdr) * label_index +
+sizeof(struct iphdr))) {
+   const struct iphdr *v4hdr;
+
+   v4hdr = (const struct iphdr *)(mpls_hdr(skb) +
+  label_index);
+   if (v4hdr->version == 4) {
+   hash = jhash_3words(ntohl(v4hdr->saddr),
+   ntohl(v4hdr->daddr),
+   v4hdr->protocol, hash);
+   } else if (v4hdr->version == 6 &&
+

Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Eric Dumazet

On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:

> was also thinking that we can do it only in paths that actually
> have multiple protocol layers, since today bpf is mainly used with
> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
> on RX, because it just came out of alloc_skb and no layers were called,
> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
> but it feels overly fragile to save a branch in sk_filter,
> so planning to go with
> if(unlikely(prog->cb_access)) memset in sk_filter().
> 

This will break TCP use of sk_filter().
skb->cb[] contains useful data in TCP layer.


Re: [PATCH net] openvswitch: Fix egress tunnel info.

2015-10-06 Thread Pravin Shelar

On Tue, Oct 6, 2015 at 8:56 AM, Jiri Benc  wrote:
> On Mon, 5 Oct 2015 12:37:50 -0700, Pravin Shelar wrote:
>> That would involve lot of refactoring vxlan xmit code. I do not want
>> to make such change in net tree at such late stage.
>
> Fair enough. But then, the minimal fix for net.git would be just
> modifying the current get_egress_tun_info handlers in vport-*.c,
> wouldn't it? I'd say let's go this way and fix this properly for
> net-next. See my other mail for my comments on the patch.
>
What do you have in mind? I do not see way to fix this issue in vport-*.c.

[PATCH net-next 01/15] ipv4: Fix ip_local_out_sk by passing the sk into __ip_local_out_sk

2015-10-06 Thread Eric W. Biederman

In the rare case where sk != skb->sk ip_local_out_sk arranges
to call dst->output differently if the skb is queued or not.
This is a bug.

Fix this bug by passing the sk parameter of ip_local_out_sk through
from ip_local_out_sk to __ip_local_out_sk (skipping __ip_local_out).

Fixes: 7026b1ddb6b8d4e6ee33dc2bd06c0ca8746fa7ab
Signed-off-by: "Eric W. Biederman" 
---
 net/ipv4/ip_output.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 911ea739049a..6cb585a05dd1 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -117,7 +117,7 @@ int ip_local_out_sk(struct sock *sk, struct sk_buff *skb)
 {
int err;
 
-   err = __ip_local_out(skb);
+   err = __ip_local_out_sk(sk, skb);
if (likely(err == 1))
err = dst_output(sk, skb);
 
-- 
2.2.1

[PATCH net-next 09/15] ipv4: Cache net in iptunnel_xmit

2015-10-06 Thread Eric W. Biederman

Store net in a variable in iptunnel_xmit so it does not need
to be recomputed when it is used again.

Signed-off-by: "Eric W. Biederman" 
---
 net/ipv4/ip_tunnel_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 8d85ecd1ced5..caef8e2c281d 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -53,6 +53,7 @@ int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct 
sk_buff *skb,
  __u8 tos, __u8 ttl, __be16 df, bool xnet)
 {
int pkt_len = skb->len - skb_inner_network_offset(skb);
+   struct net *net = dev_net(rt->dst.dev);
struct iphdr *iph;
int err;
 
@@ -76,8 +77,7 @@ int iptunnel_xmit(struct sock *sk, struct rtable *rt, struct 
sk_buff *skb,
iph->daddr  =   dst;
iph->saddr  =   src;
iph->ttl=   ttl;
-   __ip_select_ident(dev_net(rt->dst.dev), iph,
- skb_shinfo(skb)->gso_segs ?: 1);
+   __ip_select_ident(net, iph, skb_shinfo(skb)->gso_segs ?: 1);
 
err = ip_local_out(sk, skb);
if (unlikely(net_xmit_eval(err)))
-- 
2.2.1

[PATCH v3 0/5] Add support for Broadcom's iProc MDIO and Cygnus Ethernet PHY

2015-10-06 Thread Arun Parameswaran

Hi
This patchset adds support for the iProc MDIO interface and the
Broadcom Cygnus SoC's internal Ethernet PHY.

The internal Ethernet PHY(s) in the Cygnus SoCs are accessed
via the MDIO interface found in most of the iProc based chips.

The patch also consolidates the common APIs used by the
Broadcom phys to a common library. Existing Broadcom phy
drivers have been modified to use the common library APIs.

This patch series is based on Linux v4.3-rc1 and is available in:
https://github.com/Broadcom/cygnus-linux/tree/cygnus-net-phy-mdio-v3

The Ethernet driver for the iProc family will be submitted soon,
as will the device tree configurations for the different iProc
family SoCs.

Changes from v2:
- Modified drivers/net/phy/Kconfig to modify the BCM_CYGNUS_PHY
  driver to 'depends on MDIO_BCM_IPROC' instead of 'select'.
- Added github branch to the cover letter

Changes from v1:
- Updated device tree documentation for the iProc MDIO driver
  based on Florian's feedback.
- Moved the core register defines from the Cygnus PHY driver to
  'include/linux/brcmphy.h' based on Florian's feedback.
- Created a new patch/commit to modify the bcm7xxx phy driver
  to use the new core register defines.
- Modified the Kconfig entry for the Broadcom PHY library to
  'tristate' instead of 'bool'

Arun Parameswaran (5):
  dt-bindings: net: Broadcom iProc MDIO bus driver device tree binding
  net: phy: Broadcom iProc MDIO bus driver
  net: phy: Add Broadcom phy library for common interfaces
  net: phy: Broadcom Cygnus internal Ethernet PHY driver
  net: phy: bcm7xxx: Modified to use global core register defines

 .../devicetree/bindings/net/brcm,iproc-mdio.txt|  23 +++
 drivers/net/phy/Kconfig|  28 +++
 drivers/net/phy/Makefile   |   3 +
 drivers/net/phy/bcm-cygnus.c   | 158 +++
 drivers/net/phy/bcm-phy-lib.c  | 209 
 drivers/net/phy/bcm-phy-lib.h  |  37 
 drivers/net/phy/bcm63xx.c  |  38 +---
 drivers/net/phy/bcm7xxx.c  | 136 +++--
 drivers/net/phy/broadcom.c | 149 +-
 drivers/net/phy/mdio-bcm-iproc.c   | 213 +
 include/linux/brcmphy.h|  29 +--
 11 files changed, 761 insertions(+), 262 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/brcm,iproc-mdio.txt
 create mode 100644 drivers/net/phy/bcm-cygnus.c
 create mode 100644 drivers/net/phy/bcm-phy-lib.c
 create mode 100644 drivers/net/phy/bcm-phy-lib.h
 create mode 100644 drivers/net/phy/mdio-bcm-iproc.c

-- 
2.5.2

Re: [PATCH net-next v2 1/2] regmap: Allow installing custom reg_update_bits function

2015-10-06 Thread David Miller

From: Mark Brown 
Date: Mon, 5 Oct 2015 15:25:31 +0100

> On Mon, Oct 05, 2015 at 06:16:09AM -0700, David Miller wrote:
> 
>> >> Applied.
> 
>> > Thanks David. However, I've sent a v3 patch, and also expecting feedback 
>> > from Mark Brown on the regmap portion of it.
> 
>> Please send me relative changes from v2 to v3, thanks.
> 
>> Sorry about that.
> 
> Ugh, this is a mess :(  Can you please drop this patch instead?

I can't just "drop" changes.  Once a commit hits my tree it is part
of the permanent record.

The easiest thing to do is to send a relative fix, and that's why
I have asked for exactly that.

Re: [PATCH] amd-xgbe: Check for successful buffer allocation before use

2015-10-06 Thread David Miller

From: Tom Lendacky 
Date: Mon, 5 Oct 2015 10:51:03 -0500

> The kasprintf function can return NULL if the allocation fails. Check for
> successful allocation before attempting to use the returned buffer.
> 
> Signed-off-by: Tom Lendacky 

Applied, thanks.

Re: [PATCH] ipvs: Remove unused net variable from ip_vs_out

2015-10-06 Thread Julian Anastasov


Hello,

On Fri, 2 Oct 2015, Simon Horman wrote:

> Since 6f2bcea9917d ("ipvs: Pass ipvs into ip_vs_in_icmp and ip_vs_in_icmp_v6")
> the net variable in ip_vs_out() appears to be unused so remove it.
> 
> Signed-off-by: Simon Horman 
> ---
>  net/netfilter/ipvs/ip_vs_core.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index d08df435c2aa..72b6a44ea9de 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -1172,7 +1172,6 @@ drop:
>  static unsigned int
>  ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff 
> *skb, int af)
>  {
> - struct net *net = ipvs->net;

So, this line should be moved into the block
where the var is used?

Regards

--
Julian Anastasov 
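
A minimal sketch of the alternative raised above, assuming the variable is only needed inside a conditionally compiled branch. The names HAVE_V6_SKETCH, struct ctx_sketch and handle_out_sketch() are hypothetical stand-ins for illustration, not the ipvs code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch; remove this line to build without the branch. */
#define HAVE_V6_SKETCH 1

/* Hypothetical stand-ins for the real types. */
struct net_sketch { int id; };
struct ctx_sketch { struct net_sketch *net; };

/* Returns the namespace id on the optional path, -1 otherwise. */
static int handle_out_sketch(struct ctx_sketch *c, int af)
{
	assert(c != NULL);

#ifdef HAVE_V6_SKETCH
	if (af == 6) {
		/* Declared in the only block that uses it, so a build
		 * without HAVE_V6_SKETCH never sees an unused variable.
		 */
		struct net_sketch *net = c->net;

		return net->id;
	}
#endif
	(void)af;	/* keeps the parameter referenced in every configuration */
	return -1;
}
```

Keeping the declaration next to its single user means the variable exists only in configurations that compile that user, which is the effect the question is after.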

Re: [PATCH 18/23] spear13xx_pcie_gadget: use per-attribute show and store methods

2015-10-06 Thread Pratyush Anand

On Sat, Oct 3, 2015 at 7:02 PM, Christoph Hellwig  wrote:
> Signed-off-by: Christoph Hellwig 

Acked-by: Pratyush Anand 

Re: [PATCH v2 1/5] net: dsa: add missing kfree on remove

2015-10-06 Thread Neil Armstrong

On 10/03/2015 10:54 PM, Sergei Shtylyov wrote:
> On 10/3/2015 6:20 PM, Neil Armstrong wrote:
> 
>>> On 10/3/2015 5:25 PM, Neil Armstrong wrote:
>>>
>>>> To prevent memory leakage on unbinding, add missing kfree calls.
>>>>
>>>> Signed-off-by: Neil Armstrong
>>>> ---
>>>>   net/dsa/dsa.c | 5 ++++-
>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
>>>> index c59fa5d..12cec40 100644
>>>> --- a/net/dsa/dsa.c
>>>> +++ b/net/dsa/dsa.c
>>>> @@ -914,8 +914,10 @@ static void dsa_remove_dst(struct dsa_switch_tree *dst)
>>>>       for (i = 0; i < dst->pd->nr_chips; i++) {
>>>>           struct dsa_switch *ds = dst->ds[i];
>>>>
>>>> -        if (ds != NULL)
>>>> +        if (ds != NULL) {
>>>
>>> Didn't scripts/checkpatch.pl complain here? Just if (ds) is preferred
>>> in the networking code.
>>>
>>> MBR, Sergei
>>>
>> Yes,
>>
>> But I considered that cosmetic changes are not the subject of this series.
> 
>Formally, all the patches should be checkpatch-clean...
> 
>> Neil
> 
> MBR, Sergei
> 
Sure,

How should I handle this case ?
A separate patch with the cosmetic change before the kfree addition ?

Neil

Re: [patch net-next v2 00/13] rocker: add support for multiple worlds

2015-10-06 Thread Jiri Pirko

Tue, Oct 06, 2015 at 05:56:12AM CEST, sfel...@gmail.com wrote:
>On Mon, Oct 5, 2015 at 10:43 AM, Jiri Pirko  wrote:
>> From: Jiri Pirko 
>>
>> This patchset allows new rocker worlds to be easily added in future (like
>> the eBPF based one I have been working on). The main part of the patchset
>> is the OF-DPA carve-out. It results in an OF-DPA specific file. Clean cut.
>>
>> v1->v2:
>>  - rtnl rocker mode change userspace expose patch was removed
>>
>> Jiri Pirko (13):
>>   rocker: remove unused rocker_port param from alloc funcs and shorten
>> their names
>>   rocker: rename rocker.h to rocker_hw.h
>>   rocker: rename rocker.c to rocker_main.c
>>   rocker: push tlv processing into separate files
>>   rocker: implement set settings mode command
>>   rocker: introduce worlds infrastructure
>>   rocker: introduce OF-DPA world skeleton
>>   rocker: set default world on port probe and clean world on remove
>>   rocker: pass "learning" value as a parameter to
>> rocker_port_set_learning
>>   rocker: pre-allocate wait structures during cmd ring init
>>   rocker: remove trans parameter to rocker_cmd_exec function
>>   rocker: call rocker_cmd_exec function with "nowait" boolean instead of
>> flags
>>   rocker: move OF-DPA stuff into separate file
>
>A couple of my tests are failing with this patchset.  A simple port
>test is failing and IPv4 routing test is failing.
>
>The port test is simple: just connect a port on DUT to a port on
>another system and assign an IP address to each port and verify IP
>connectivity.  I have this:
>
>   DUT:sw1p1 (11.0.0.1/24) <---> host1:eth0 (11.0.0.2/24)
>
>The IPv4 routing tests is a bit more complicated to setup.  I'm using
>OSPF, but I'm not seeing full routes formed in the topology, so I
>suspect OSPF hellos aren't getting thru.
>
>Please find/fix these issues and send v3.  I don't want any git
>bisect issues when running tests.  Thanks.

I fixed that. Sending v3 in a sec. Thanks.

Re: [net-next:master 647/682] fib_semantics.c:undefined reference to `__divdi3'

2015-10-06 Thread David Miller

From: Peter Nørlund 
Date: Mon, 5 Oct 2015 23:31:58 +0200

> What is the proper way to supply fixes for this kind of errors? Should
> I submit the original patches again with the bug fixed, or should I
> submit a patch fixing this particular problem? I imagine bisecting
> becomes annoying when a particular commit doesn't compile.

Once a patch is in my tree, it is part of the permanent record and
cannot be removed.  Therefore you must send me a fix for this
problem.

Re: [PATCH v2 5/5] net: dsa: exit probe if no switch were found

2015-10-06 Thread Neil Armstrong

On 10/03/2015 09:27 PM, Florian Fainelli wrote:
> Le 03/10/2015 07:26, Neil Armstrong a écrit :
>> If no switch were found in dsa_setup_dst, return -ENODEV and
>> exit the dsa_probe cleanly.
>>
>> Tested-by: Andrew Lunn 
>> Tested-by: Florian Fainelli 
>> Signed-off-by: Neil Armstrong 
>> ---
> 
> [snip]
> 
>>  static int dsa_probe(struct platform_device *pdev)
>> @@ -926,9 +937,9 @@ static int dsa_probe(struct platform_device *pdev)
>>
>>  platform_set_drvdata(pdev, dst);
>>
>> -dsa_setup_dst(dst, dev, &pdev->dev, pd);
>> -
>> -return 0;
>> +ret = dsa_setup_dst(dst, dev, &pdev->dev, pd);
>> +if (!ret)
>> +return 0;
> 
> That logic is a little weird, I would just go with something like this:
> 
> ret = dsa_setup_dst(ds, dev, &pdev->dev, pd);
> if (ret)
>   goto out;
> 
> return 0;
> 
Yes you are right, the goto out is needed to clean up the of_probe resources.

I will send a v3 with this fixed.

Neil
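
A minimal userspace sketch of the error-path shape suggested above, assuming earlier setup took a resource that must be released on failure. setup_sketch(), probe_sketch() and the malloc'd buffer are hypothetical placeholders, not the dsa code:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical setup step: fails with -ENODEV when nothing was found. */
static int setup_sketch(int nr_found)
{
	return nr_found > 0 ? 0 : -ENODEV;
}

/* Hypothetical probe; *released reports whether the error path ran. */
static int probe_sketch(int nr_found, int *released)
{
	char *res;
	int ret;

	assert(released != NULL);
	*released = 0;

	res = malloc(16);	/* stands in for resources taken earlier */
	if (!res)
		return -ENOMEM;

	ret = setup_sketch(nr_found);
	if (ret)
		goto out;

	/* Success: a driver would keep res until remove(); it is
	 * freed here only so the sketch does not leak.
	 */
	free(res);
	return 0;

out:
	free(res);
	*released = 1;
	return ret;
}
```

Testing for the failure and jumping to a single cleanup label keeps the success path unindented and makes it harder to return without undoing the earlier steps.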

Re: [PATCH v2 3/5] net: dsa: complete dsa_switch_destroy

2015-10-06 Thread Neil Armstrong

On 10/03/2015 09:25 PM, Florian Fainelli wrote:
> Le 03/10/2015 07:26, Neil Armstrong a écrit :
>> When unbinding dsa, complete the dsa_switch_destroy to unregister the
>> fixed link phy then cleanly unregister and destroy the net devices.
>>
>> Signed-off-by: Neil Armstrong 
>> ---
> 
> [snip]
> 
>> +port_dn = cd->port_dn[port];
>> +if (of_phy_is_fixed_link(port_dn)) {
>> +phydev = of_phy_find_device(port_dn);
>> +if (phydev) {
>> +int addr = phydev->addr;
>> +phy_device_free(phydev);
>> +of_node_put(port_dn);
>> +fixed_phy_del(addr);
> 
> fixed_phy_del() removes the fixed PHY from the platform fixed MDIO bus
> list of PHYs, so we should be okay even with switch drivers which
> register a link_update callback via fixed_phy_set_link_update(), but I
> have not checked that. The sequence of calls (phy_device_free then
> fixed_phy_del) looks sane though.
Actually I copied the exact sequence from the fixed link module, eventually 
this should be done in this module with a proper fixed_link_remove function.

> 
> Eventually this logic might be better moved into net/dsa/slave.c such
> that it is easy to see how it balances dsa_slave_phy_setup().
> 
> Reviewed-by: Florian Fainelli 
> 

Thanks for the review,

Neil

Re: [patch net-next 00/14] rocker: add support for multiple worlds

2015-10-06 Thread John Fastabend

[...]

>>
>> Its related in that if you expose your device model you do not need
>> opaque strings to do wholesale reconfiguration of the device. Instead
>> if the parts of the device that are configurable are exposed to the
>> user they can build the "world" they want.
> 
> The disconnect here, I believe, is offloading to hw the Linux
> forwarding plane vs. offloading an arbitrary application's forwarding
> plane.  Switchdev (and rocker) are about offloading the Linux
> dataplane.  That means Linux _is_ the application (the NOS); hw
> offloads what it can from the kernel to accelerate pkt forwarding.
> But the user's experience is standard Linux tools (iproute2, netlink)
> and building blocks (bridge, bond, etc) are used to construct a switch
> (or router), and the fact that the data path is offloaded to hw is
> transparent to the user.  We could define APIs for arbitrary
> applications to program hardware, like John is suggesting, by giving
> up raw access to hw resources, like tables, etc.  This approach moves
> the "driver" to the application, and by-passes the Linux tools and
> building blocks.  We're still TBD on these APIs, probably because of
> the "by-pass" part.

Thanks Scott I think this helps some.

I don't view my approach as a by-pass though or even letting arbitrary
applications have access to the hardware. Today I load arbitrary
filters and bpf programs into the kernel to create a pipeline. Now I
want to string a couple other tables in front of my pipeline to do
some of the heavy lifting. Maybe the real difference is my _datapath_
is not offloaded (by-passing?) the kernel. Most (all?) of my packets
are meant for the host and I want to do partial offloading where some
of the initial processing is done in the hardware and the rest is
handled by software. The "driver" is not in the application it is
still in the kernel.

I almost have something ready to kick out. I meant to do this today;
it might be another day or two though.

> 
> Jiri's patchset here is about moving things around so he can define
> another hw mode in rocker.   The upper edge for rocker driver is still
> switchdev, but with the new eBPF hw mode he's working on, he'll be
> able to push down a dynamic pipeline rather than being stuck with the
> OF-DPA pipeline we have today (in rocker).  I presume once he has this
> new eBPF support, he'll program in a "Linux kernel" pipeline, and fill
> out the corresponding switchdev ops.  I imagine a P4 -> eBPF compiler,
> and we take a linux.p4 and program hw.  Linux.p4 should be
> generic...consumable by any hardware...it is a representation of the
> Linux pipeline.  (Similar to P4's switch.p4).
> 
> But now, with eBPF mode in hw, an arbitrary.p4 could be written for
> that arbitrary application and pushed down.  We still need APIs for
> that application.
> 

My gripe here was flipping the hardware between modes with a string
value. It seems it has been dropped from the latest version though so
I have no problem with the patches.

.John


Re: [PATCH v2 net-next] ipv4: Fix compilation errors in fib_rebalance

2015-10-06 Thread David Miller

From: Peter Nørlund 
Date: Tue,  6 Oct 2015 07:24:47 +0200

> This fixes
> 
> net/built-in.o: In function `fib_rebalance':
> fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3'
> 
> and
> 
> net/built-in.o: In function `fib_rebalance':
> net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod'
> 
> Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath")
> 
> Signed-off-by: Peter Nørlund 

Applied, thanks.
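
For context, a hedged sketch of why such link errors appear: on 32-bit targets the compiler typically lowers a 64-bit division to a libgcc helper call (`__divdi3`, `__aeabi_ldivmod`), which the kernel build does not provide, so kernel code is expected to use helpers such as div_u64() from <linux/math64.h> instead. The function below is a userspace stand-in written for illustration only (shift-and-subtract, no `/` operator); it is not the kernel's implementation and not necessarily what the applied fix does:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for a u64-by-u32 divide helper. */
static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	uint64_t quot = 0, rem = 0;
	int bit;

	assert(divisor != 0);

	/* Classic restoring division, one dividend bit per step. */
	for (bit = 63; bit >= 0; bit--) {
		rem = (rem << 1) | ((dividend >> bit) & 1);
		if (rem >= divisor) {
			rem -= divisor;
			quot |= (uint64_t)1 << bit;
		}
	}
	return quot;
}
```

The remainder always stays below the 32-bit divisor before each shift, so the intermediate value fits comfortably in 64 bits.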

Re: [PATCH net-next] net/mlx4_core: Test interrupts fail if not all comp vectors called request_irq

2015-10-06 Thread David Miller

From: Carol Soto 
Date: Mon, 5 Oct 2015 10:24:19 -0500

> 
> 
> On 10/4/2015 3:03 AM, Or Gerlitz wrote:
>> On 9/29/2015 9:38 PM, cls...@linux.vnet.ibm.com wrote:
>>> From: Carol L Soto 
>>>
>>> Test interrupts fails if not all completion vectors called
>>> request_irq. This case can happen if only mlx4_en loads and
>>> we have more completion vectors than rx rings.
>>
>> good catch! is this a bug since the driver 0-day or was introduced by
>> some recent commit? in the latter case, please add a Fixes: tag before
>> your S.O.B note.
> Probably the issue was introduced by this one
> Fixes: c66fa19c405a ('net/mlx4: Add EQ pool')

Please resubmit this patch with the Fixes: tag properly
added, thanks.
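
A minimal sketch of the idea in the quoted description: a self-test loop that skips queues which never had an interrupt requested. struct eq_sketch and test_interrupts_sketch() are hypothetical names, and the "healthy" field merely stands in for whatever a real per-queue test would report:

```c
#include <assert.h>
#include <stddef.h>

struct eq_sketch {
	int have_irq;	/* nonzero once an interrupt was requested */
	int healthy;	/* result a hypothetical per-queue test would give */
};

/* Returns 0 if every queue that has an interrupt passes, -1 otherwise. */
static int test_interrupts_sketch(const struct eq_sketch *eq, size_t n)
{
	size_t i;

	assert(eq != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!eq[i].have_irq)
			continue;	/* no interrupt assigned, nothing to test */
		if (!eq[i].healthy)
			return -1;
	}
	return 0;
}
```

Without the skip, a queue that was never wired to an interrupt would be reported as a failure even though nothing is wrong with it.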

Re: [PATCH net-next v2] net: Fix unused variable compile warning

2015-10-06 Thread David Miller

From: Cong Wang 
Date: Mon, 5 Oct 2015 10:53:57 -0700

> On Thu, Oct 1, 2015 at 8:49 AM, David Ahern  wrote:
>> Eric's net namespace changes in 1b75097dd7a26 leaves net unreferenced if
>> CONFIG_IP_VS_IPV6 is not enabled:
>>
>> ../net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_out’:
>> ../net/netfilter/ipvs/ip_vs_core.c:1177:14: warning: unused variable ‘net’ 
>> [-Wunused-variable]
>>
>> After the net refactoring there is only 1 user; push the reference to the
>> 1 user. While the line length slightly exceeds 80 it seems to be the
>> best change.
>>
>> Fixes: 1b75097dd7a26("ipvs: Pass ipvs into ip_vs_out")
>> Signed-off-by: David Ahern 
> 
> I saw the same build warning, and this fix looks good to me.
> 
> DaveM, can you take this?

It should be submitted to netfilter-devel and Pablo should take it.

Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Ingo Molnar


* Alexei Starovoitov  wrote:

> On 10/5/15 3:14 PM, Daniel Borkmann wrote:
> >One scenario that comes to mind ... what happens when there are kernel
> >pointers stored in skb->cb[] (either from the current layer or an old
> >one from a different layer that the skb went through previously, but
> >which did not get overwritten)?
> >
> >Socket filters could read a portion of skb->cb[] also when unprived and
> >leak that out through maps. I think the verifier doesn't catch that,
> >right?
> 
> grrr. indeed. previous layer before sk_filter() can leave junk in there.

Could this be solved by activating zeroing/sanitizing of this data if there's an
active BPF function around that can access that socket?

Thanks,

Ingo
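
A rough sketch of the conditional sanitizing idea above, assuming a fixed-size scratch area and a flag saying whether a filter may read it. struct pkt_sketch, CB_SIZE_SKETCH and sanitize_cb_sketch() are hypothetical and only model the concept in userspace:

```c
#include <assert.h>
#include <string.h>

#define CB_SIZE_SKETCH 48	/* assumed scratch size for this sketch */

struct pkt_sketch {
	unsigned char cb[CB_SIZE_SKETCH];	/* per-layer scratch space */
	int filter_attached;	/* nonzero if a filter may read cb */
};

/* Clear stale scratch data only when a filter could observe it. */
static void sanitize_cb_sketch(struct pkt_sketch *p)
{
	assert(p != NULL);

	if (p->filter_attached)
		memset(p->cb, 0, sizeof(p->cb));
}
```

Gating the memset on the flag keeps the cost off the path where no filter is present, which seems to be the trade-off the question has in mind.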

Re: [PATCH] ipvs: Remove unused net variable from ip_vs_out

2015-10-06 Thread Simon Horman

On Tue, Oct 06, 2015 at 09:35:29AM +0300, Julian Anastasov wrote:
> 
>   Hello,
> 
> On Fri, 2 Oct 2015, Simon Horman wrote:
> 
> > Since 6f2bcea9917d ("ipvs: Pass ipvs into ip_vs_in_icmp and 
> > ip_vs_in_icmp_v6")
> > the net variable in ip_vs_out() appears to be unused so remove it.
> > 
> > Signed-off-by: Simon Horman 
> > ---
> >  net/netfilter/ipvs/ip_vs_core.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_core.c 
> > b/net/netfilter/ipvs/ip_vs_core.c
> > index d08df435c2aa..72b6a44ea9de 100644
> > --- a/net/netfilter/ipvs/ip_vs_core.c
> > +++ b/net/netfilter/ipvs/ip_vs_core.c
> > @@ -1172,7 +1172,6 @@ drop:
> >  static unsigned int
> >  ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff 
> > *skb, int af)
> >  {
> > -   struct net *net = ipvs->net;
> 
>   So, this line should be moved into the block
> where the var is used?

Indeed, I'll send an updated patch.

Re: [PATCH net-next] net/mlx4_core: Test interrupts fail if not all comp vectors called request_irq

2015-10-06 Thread Matan Barak




On 10/5/2015 11:12 PM, Or Gerlitz wrote:
> On Mon, Oct 5, 2015 at 6:24 PM, Carol Soto wrote:
>> On 10/4/2015 3:03 AM, Or Gerlitz wrote:
>>> On 9/29/2015 9:38 PM, cls...@linux.vnet.ibm.com wrote:
>>>> From: Carol L Soto
>>>>
>>>> Test interrupts fails if not all completion vectors called
>>>> request_irq. This case can happen if only mlx4_en loads and
>>>> we have more completion vectors than rx rings.
>>>
>>> good catch! is this a bug since the driver 0-day or was introduced by
>>> some recent commit? in the latter case, please add a Fixes: tag before your
>>> S.O.B note.
>>
>> Probably the issue was introduced by this one
>> Fixes: c66fa19c405a ('net/mlx4: Add EQ pool')
>
> Matan, agree to the fix and analysis? if yes, Carol, please add the
> Fixes: tag to the change-log
> and anyway respin/submit the patch against/for net not net-next

Yeah, we should (obviously) only check EQs which are assigned to
interrupts, so skipping unassigned EQs is a good idea.
Good catch and thanks for the fix.

Matan

> Or.
>
>>>> Signed-off-by: Carol L Soto
>>>> ---
>>>>   drivers/net/ethernet/mellanox/mlx4/eq.c | 4 ++++
>>>>   1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c
>>>> b/drivers/net/ethernet/mellanox/mlx4/eq.c
>>>> index 8e81e53..c344884 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
>>>> @@ -1364,6 +1364,10 @@ int mlx4_test_interrupts(struct mlx4_dev *dev)
>>>>        * and performing a NOP command
>>>>        */
>>>>       for(i = 0; !err && (i < dev->caps.num_comp_vectors); ++i) {
>>>> +        /* Make sure request_irq was called */
>>>> +        if (!priv->eq_table.eq[i].have_irq)
>>>> +            continue;
>>>> +
>>>>           /* Temporary use polling for command completions */
>>>>           mlx4_cmd_use_polling(dev);

[PATCH net-next v4] net: Microchip encx24j600 driver

2015-10-06 Thread jon

From: Jon Ringle 

This ethernet driver supports the Microchip enc424j600/624j600 Ethernet
controller over a SPI bus interface. This driver makes use of the regmap API to
optimize access to registers by caching registers where possible.

Datasheet:
http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdf

Signed-off-by: Jon Ringle 
---

This patch requires that the pull request from Mark be applied first:
  git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap.git tags/regmap-offload-update-bits

Since the v3 patch, this v4 patch addresses the issue brought up by:

> Hello Jon Ringle,
> 
> This is a semi-automatic email about new static checker warnings.
> 
> The patch 04fbfce7a222: "net: Microchip encx24j600 driver" from Oct 
> 1, 2015, leads to the following Smatch complaint:
> 
> drivers/net/ethernet/microchip/encx24j600.c:313 encx24j600_tx_complete()
>warn: variable dereferenced before check 'priv->tx_skb' (see line 307)
> 
> drivers/net/ethernet/microchip/encx24j600.c
>306
>307dev->stats.tx_bytes += priv->tx_skb->len;
>
> Unchecked dereference.
> 
>308
>309encx24j600_clr_bits(priv, EIR, TXIF | TXABTIF);
>310
>311netif_dbg(priv, tx_done, dev, "TX Done%s\n", err ? ": 
> Err" : "");
>312
>313if (priv->tx_skb) {
> 
> Too late.
> 
>314dev_kfree_skb(priv->tx_skb);
>315priv->tx_skb = NULL;
> 
> regards,
> dan carpenter

Thank you,

Jon
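
A minimal sketch of the pattern the static checker is asking for, assuming the pointer can legitimately be NULL: test it first and dereference only inside the guarded block. struct priv_sketch and tx_complete_sketch() are hypothetical and do not reproduce the driver code:

```c
#include <assert.h>
#include <stddef.h>

struct skb_sketch { unsigned int len; };

struct priv_sketch {
	struct skb_sketch *tx_skb;	/* NULL when nothing is in flight */
	unsigned long tx_bytes;
};

/* Account for a completed transmit; the NULL test guards the dereference. */
static void tx_complete_sketch(struct priv_sketch *priv)
{
	assert(priv != NULL);

	if (priv->tx_skb) {
		priv->tx_bytes += priv->tx_skb->len;
		priv->tx_skb = NULL;	/* a driver would also free the buffer here */
	}
}
```

Moving the byte accounting inside the check removes the "dereferenced before check" ordering that the report points at.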

 drivers/net/ethernet/microchip/Kconfig |9 +
 drivers/net/ethernet/microchip/Makefile|1 +
 drivers/net/ethernet/microchip/encx24j600-regmap.c |  513 +
 drivers/net/ethernet/microchip/encx24j600.c| 1127 
 drivers/net/ethernet/microchip/encx24j600_hw.h |  437 
 5 files changed, 2087 insertions(+)
 create mode 100644 drivers/net/ethernet/microchip/encx24j600-regmap.c
 create mode 100644 drivers/net/ethernet/microchip/encx24j600.c
 create mode 100644 drivers/net/ethernet/microchip/encx24j600_hw.h

diff --git a/drivers/net/ethernet/microchip/Kconfig 
b/drivers/net/ethernet/microchip/Kconfig
index afaf0c0..b45b28a 100644
--- a/drivers/net/ethernet/microchip/Kconfig
+++ b/drivers/net/ethernet/microchip/Kconfig
@@ -35,4 +35,13 @@ config ENC28J60_WRITEVERIFY
  Enable the verify after the buffer write useful for debugging purpose.
  If unsure, say N.
 
+config ENCX24J600
+tristate "ENCX24J600 support"
+depends on SPI
+---help---
+  Support for the Microchip ENC424J600 ethernet chip.
+
+  To compile this driver as a module, choose M here. The module will be
+  called encx24j600.
+
 endif # NET_VENDOR_MICROCHIP
diff --git a/drivers/net/ethernet/microchip/Makefile 
b/drivers/net/ethernet/microchip/Makefile
index 573d429..ff78f62 100644
--- a/drivers/net/ethernet/microchip/Makefile
+++ b/drivers/net/ethernet/microchip/Makefile
@@ -3,3 +3,4 @@
 #
 
 obj-$(CONFIG_ENC28J60) += enc28j60.o
+obj-$(CONFIG_ENCX24J600) += encx24j600.o encx24j600-regmap.o
diff --git a/drivers/net/ethernet/microchip/encx24j600-regmap.c 
b/drivers/net/ethernet/microchip/encx24j600-regmap.c
new file mode 100644
index 000..f3bb905
--- /dev/null
+++ b/drivers/net/ethernet/microchip/encx24j600-regmap.c
@@ -0,0 +1,513 @@
+/**
+ * Register map access API - ENCX24J600 support
+ *
+ * Copyright 2015 Gridpoint
+ *
+ * Author: Jon Ringle 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "encx24j600_hw.h"
+
+static inline bool is_bits_set(int value, int mask)
+{
+   return (value & mask) == mask;
+}
+
+static int encx24j600_switch_bank(struct encx24j600_context *ctx,
+int bank)
+{
+   int ret = 0;
+
+   int bank_opcode = BANK_SELECT(bank);
+   ret = spi_write(ctx->spi, &bank_opcode, 1);
+   if (ret == 0)
+   ctx->bank = bank;
+
+   return ret;
+}
+
+static int encx24j600_cmdn(struct encx24j600_context *ctx, u8 opcode,
+   const void *buf, size_t len)
+{
+   struct spi_message m;
+   struct spi_transfer t[2] = { { .tx_buf = &opcode, .len = 1, },
+                                { .tx_buf = buf, .len = len }, };
+   spi_message_init(&m);
+   spi_message_add_tail(&t[0], &m);
+   spi_message_add_tail(&t[1], &m);
+
+   return spi_sync(ctx->spi, &m);
+}
+
+static void regmap_lock_mutex(void *context)
+{
+   struct encx24j600_context *ctx = context;
+   mutex_lock(&ctx->mutex);
+}
+
+static void regmap_unlock_mutex(void

[PATCH] af_unix: introduce unix_sk_const helper

2015-10-06 Thread Arnd Bergmann

Commit 124613012db1 ("af_unix: Convert the unix_sk macro to an inline
function for type safety") was recently added to catch incorrect
uses of the unix_sk helper using compiler warnings.

It has now caught one such case in lsm_audit.c. The code is technically
correct, but as it converts a const pointer to a non-const pointer,
the annotation got lost, which gcc now warns about.

This patch avoids the warning by introducing an additional helper
that has const input and output, which makes the lsm_audit code build
cleanly again.

Signed-off-by: Arnd Bergmann 
---
I'm not entirely happy with this workaround myself, but could not come
up with a better one.

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index cb1b9bbda332..1871b6436ee9 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -69,6 +69,11 @@ static inline struct unix_sock *unix_sk(struct sock *sk)
return (struct unix_sock *)sk;
 }
 
+static inline const struct unix_sock *unix_sk_const(const struct sock *sk)
+{
+   return (const struct unix_sock *)sk;
+}
+
 #define peer_wait peer_wq.wait
 
 long unix_inq_len(struct sock *sk);
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 2deace208db2..cb07f1318a27 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -307,7 +307,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
case LSM_AUDIT_DATA_NET:
if (a->u.net->sk) {
const struct sock *sk = a->u.net->sk;
-   struct unix_sock *u;
+   const struct unix_sock *u;
int len = 0;
char *p = NULL;
 
@@ -337,7 +337,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
}
 #endif
case AF_UNIX:
-   u = unix_sk(sk);
+   u = unix_sk_const(sk);
if (u->path.dentry) {
audit_log_d_path(ab, " path=", &u->path);
break;
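
A self-contained sketch of the const-preserving helper idea from the patch above. The struct and function names carry a _sketch suffix because they are simplified stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stddef.h>

struct sock_sketch { int family; };

struct unix_sock_sketch {
	struct sock_sketch sk;	/* must stay first so the casts below are valid */
	int path_id;
};

/* Non-const variant: callers may modify the result. */
static inline struct unix_sock_sketch *unix_sk_sketch(struct sock_sketch *sk)
{
	return (struct unix_sock_sketch *)sk;
}

/* Const variant: input and output are both const, so no qualifier is lost. */
static inline const struct unix_sock_sketch *
unix_sk_const_sketch(const struct sock_sketch *sk)
{
	return (const struct unix_sock_sketch *)sk;
}

/* Read-only user, comparable to an audit path that only inspects the socket. */
static int read_path_id_sketch(const struct sock_sketch *sk)
{
	const struct unix_sock_sketch *u;

	assert(sk != NULL);
	u = unix_sk_const_sketch(sk);
	return u->path_id;
}
```

Passing a const pointer to the non-const helper is what draws the discarded-qualifier warning; the second helper lets read-only callers stay const-correct without an open-coded cast.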


[PATCH v4] net/bonding: send arp in interval if no active slave

2015-10-06 Thread Jarod Wilson

From: Uwe Koziolek 

With some very finicky switch hardware, active backup bonding can get into
a situation where we play ping-pong between interfaces, trying to get one
to come up as the active slave. There seems to be an issue with the
switch's arp replies either taking too long, or simply getting lost, so we
wind up unable to get any interface up and active. Sometimes, the issue
sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing with
this hardware combination.

[jarod: manufacturing of changelog, addition of modparam gating]
CC: Jay Vosburgh 
CC: Andy Gospodarek 
CC: Veaceslav Falico 
CC: netdev@vger.kernel.org
Signed-off-by: Uwe Koziolek 
Signed-off-by: Jarod Wilson 
---
v2: add code comment as to why change is needed
v3: fix wrapping of comments
v4: [jarod] add module parameter gating of code addition

 drivers/net/bonding/bond_main.c | 24 
 include/net/bonding.h   |  1 +
 2 files changed, 25 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 90f2615..72ab512 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -95,6 +95,7 @@ static int miimon;
 static int updelay;
 static int downdelay;
 static int use_carrier = 1;
+static int arp_slow_switch;
 static char *mode;
 static char *primary;
 static char *primary_reselect;
@@ -133,6 +134,10 @@ MODULE_PARM_DESC(downdelay, "Delay before considering link 
down, "
 module_param(use_carrier, int, 0);
 MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; 
"
  "0 for off, 1 for on (default)");
+module_param(arp_slow_switch, int, 0);
+MODULE_PARM_DESC(arp_slow_switch, "Do extra arp checks for switches with arp "
+ "caches that are slow to update; "
+ "0 for off (default), 1 for on");
 module_param(mode, charp, 0);
 MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, "
   "1 for active-backup, 2 for balance-xor, "
@@ -2793,6 +2798,18 @@ static bool bond_ab_arp_probe(struct bonding *bond)
return should_notify_rtnl;
}
 
+   /* Sometimes the forwarding tables of the switches are not updated
+* fast enough, so the first arp response after a slave change is
+* received on the wrong slave.
+*
+* The arp requests will be retried 2 times on the same slave.
+*/
+   if (arp_slow_switch &&
+   bond_time_in_interval(bond, curr_arp_slave->last_link_up, 2)) {
+   bond_arp_send_all(bond, curr_arp_slave);
+   return should_notify_rtnl;
+   }
+
bond_set_slave_inactive_flags(curr_arp_slave, BOND_SLAVE_NOTIFY_LATER);
 
bond_for_each_slave_rcu(bond, slave, iter) {
@@ -4280,6 +4297,12 @@ static int bond_check_params(struct bond_params *params)
use_carrier = 1;
}
 
+   if ((arp_slow_switch != 0) && (arp_slow_switch != 1)) {
+   pr_warn("Warning: arp_slow_switch module parameter (%d), not of 
valid value (0/1), so it was set to 1\n",
+   arp_slow_switch);
+   arp_slow_switch = 1;
+   }
+
if (num_peer_notif < 0 || num_peer_notif > 255) {
pr_warn("Warning: num_grat_arp/num_unsol_na (%d) not in range 
0-255 so it was reset to 1\n",
num_peer_notif);
@@ -4516,6 +4539,7 @@ static int bond_check_params(struct bond_params *params)
params->updelay = updelay;
params->downdelay = downdelay;
params->use_carrier = use_carrier;
+   params->arp_slow_switch = arp_slow_switch;
params->lacp_fast = lacp_fast;
params->primary[0] = 0;
params->primary_reselect = primary_reselect_value;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index c1740a2..208d31c 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -120,6 +120,7 @@ struct bond_params {
int arp_validate;
int arp_all_targets;
int use_carrier;
+   int arp_slow_switch;
int fail_over_mac;
int updelay;
int downdelay;
-- 
1.8.3.1


Re: [PATCH RFC] net: mvneta: add ethtool statistics

2015-10-06 Thread Russell King

Add support for the ethtool statistic interface, returning the full set
of statistics which both Armada 370 and Armada XP can support.

Signed-off-by: Russell King 
---
Andrew,

Here's the patch updated to use the example set by mv643xx_eth.c.

 drivers/net/ethernet/marvell/mvneta.c | 99 +++
 1 file changed, 99 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 514df76fc70f..9f048ba92d0e 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -277,6 +277,50 @@
 
 #define MVNETA_RX_BUF_SIZE(pkt_size)   ((pkt_size) + NET_SKB_PAD)
 
+struct mvneta_statistic {
+   unsigned short offset;
+   unsigned short type;
+   const char name[ETH_GSTRING_LEN];
+};
+
+#define T_REG_32   32
+#define T_REG_64   64
+
+static const struct mvneta_statistic mvneta_statistics[] = {
+   { 0x3000, T_REG_64, "good_octets_received", },
+   { 0x3010, T_REG_32, "good_frames_received", },
+   { 0x3008, T_REG_32, "bad_octets_received", },
+   { 0x3014, T_REG_32, "bad_frames_received", },
+   { 0x3018, T_REG_32, "broadcast_frames_received", },
+   { 0x301c, T_REG_32, "multicast_frames_received", },
+   { 0x3050, T_REG_32, "unrec_mac_control_received", },
+   { 0x3058, T_REG_32, "good_fc_received", },
+   { 0x305c, T_REG_32, "bad_fc_received", },
+   { 0x3060, T_REG_32, "undersize_received", },
+   { 0x3064, T_REG_32, "fragments_received", },
+   { 0x3068, T_REG_32, "oversize_received", },
+   { 0x306c, T_REG_32, "jabber_received", },
+   { 0x3070, T_REG_32, "mac_receive_error", },
+   { 0x3074, T_REG_32, "bad_crc_event", },
+   { 0x3078, T_REG_32, "collision", },
+   { 0x307c, T_REG_32, "late_collision", },
+   { 0x2484, T_REG_32, "rx_discard", },
+   { 0x2488, T_REG_32, "rx_overrun", },
+   { 0x3020, T_REG_32, "frames_64_octets", },
+   { 0x3024, T_REG_32, "frames_65_to_127_octets", },
+   { 0x3028, T_REG_32, "frames_128_to_255_octets", },
+   { 0x302c, T_REG_32, "frames_256_to_511_octets", },
+   { 0x3030, T_REG_32, "frames_512_to_1023_octets", },
+   { 0x3034, T_REG_32, "frames_1024_to_max_octets", },
+   { 0x3038, T_REG_64, "good_octets_sent", },
+   { 0x3040, T_REG_32, "good_frames_sent", },
+   { 0x3044, T_REG_32, "excessive_collision", },
+   { 0x3048, T_REG_32, "multicast_frames_sent", },
+   { 0x304c, T_REG_32, "broadcast_frames_sent", },
+   { 0x3054, T_REG_32, "fc_sent", },
+   { 0x300c, T_REG_32, "internal_mac_transmit_err", },
+};
+
 struct mvneta_pcpu_stats {
struct  u64_stats_sync syncp;
u64 rx_packets;
@@ -312,6 +356,8 @@ struct mvneta_port {
unsigned int speed;
unsigned int tx_csum_limit;
int use_inband_status:1;
+
+   u64 ethtool_stats[ARRAY_SIZE(mvneta_statistics)];
 };
 
 /* The mvneta_tx_desc and mvneta_rx_desc structures describe the
@@ -2875,6 +2921,56 @@ static int mvneta_ethtool_set_ringparam(struct 
net_device *dev,
return 0;
 }
 
+static void mvneta_ethtool_get_strings(struct net_device *netdev, u32 sset,
+  u8 *data)
+{
+   if (sset == ETH_SS_STATS) {
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(mvneta_statistics); i++)
+   memcpy(data + i * ETH_GSTRING_LEN,
+  mvneta_statistics[i].name, ETH_GSTRING_LEN);
+   }
+}
+
+static void mvneta_ethtool_get_stats(struct net_device *dev,
+struct ethtool_stats *stats, u64 *data)
+{
+   struct mvneta_port *pp = netdev_priv(dev);
+   const struct mvneta_statistic *s;
+   void __iomem *base = pp->base;
+   u32 high, low, val;
+   int i;
+
+   for (i = 0, s = mvneta_statistics;
+s < mvneta_statistics + ARRAY_SIZE(mvneta_statistics);
+s++, i++) {
+   val = 0;
+
+   switch (s->type) {
+   case T_REG_32:
+   val = readl_relaxed(base + s->offset);
+   break;
+   case T_REG_64:
+   /* Docs say to read low 32-bit then high */
+   low = readl_relaxed(base + s->offset);
+   high = readl_relaxed(base + s->offset + 4);
+   val = (u64)high << 32 | low;
+   break;
+   }
+
+   pp->ethtool_stats[i] += val;
+   *data++ = pp->ethtool_stats[i];
+   }
+}
+
+static int mvneta_ethtool_get_sset_count(struct net_device *dev, int sset)
+{
+   if (sset == ETH_SS_STATS)
+   return ARRAY_SIZE(mvneta_statistics);
+   return -EOPNOTSUPP;
+}
+
 static const struct net_device_ops mvneta_netdev_ops = {
.ndo_open= mvneta_open,
.ndo_stop= mvneta_stop,
@@ -2896,6

Re: [PATCH] bridge/netfilter: avoid unused label warning

2015-10-06 Thread Arnd Bergmann

On Tuesday 06 October 2015 21:28:29 Nikolay Aleksandrov wrote:
> I posted a fix for this a couple of days ago, but I like your approach better.
> Since mine is not yet applied (I sent it to netfilter-devel only, wasn't sure
> which jurisdiction this falls into exactly) we can drop it.
> Just for reference my patch is here:
> http://patchwork.ozlabs.org/patch/526417/
> Pablo, could you please drop it?
>
> By the way this takes care of another warning about unused variable
> (nf_bridge), too.

Hmm, I did not get that one, at least not in 4.3-rc4.

> Reviewed-by: Nikolay Aleksandrov 

Thanks and sorry for missing the Cc to the netfilter list on my patch.

Arnd
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH net-next v2 1/2] mpls: multipath support

2015-10-06 Thread Eric W. Biederman

Roopa Prabhu  writes:

> From: Roopa Prabhu 
>
> This patch adds support for MPLS multipath routes.
>
> Includes following changes to support multipath:
> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
>
> - 'struct mpls_nh' represents a mpls nexthop label forwarding entry
>
> - moves mpls route and nexthop structures into internal.h
>
> - A mpls_route can point to multiple mpls_nh structs
>
> - the nexthops are maintained as a list

So I am not certain I like nexthops being a list.  In the practical case
introducing this list guarantees that everyone will see at least an
extra cache line miss in the forwarding path.

In the more abstract sense a list is the wrong data structure.  If the
list is so short we can afford to walk it, an array is a better data
structure.  If we need enough entries to make the memory consumption
of an array a concern, we want some kind of hash table or tree data
structure, because a list will be too long in that case.

So can we please not use a list?

I expect we can simplify the data structures by noting that rt_via must
be an ethernet mac today so that 6 bytes are enough and 8 bytes gives us
a bit extra and aligns things nicely.

Also I know it goes away in the next patch but a spinlock taken for
every transit through the forwarding path really bugs me.

Eric

> - In the process of restructuring, this patch also consistently changes all
> labels to u8
>
> - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
> multipath routes similar to ipv4/v6 fib
>
> - In this patch, the multipath route nexthop selection algorithm
> is a simple round robin picked up from ipv4 fib code and is replaced by
> a hash based algorithm from Robert Shearman in the next patch
>
> - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
> mpls_route_update though implemented to update based on dev, it was never
> used that way. And the dev handling gets tricky with multiple nexthops. Cannot
> match against any single nexthops dev. So, this patch removes the unused
> 'dev' handling in mpls_route_update.

>
> Example:
>
> $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
> nexthop as 700 via inet 10.1.1.6 dev swp2 \
> nexthop as 800 via inet 40.1.1.2 dev swp3
>
> $ip  -f mpls route show
> 100
> nexthop as to 200 via inet 10.1.1.2  dev swp1
> nexthop as to 700 via inet 10.1.1.6  dev swp2
> nexthop as to 800 via inet 40.1.1.2  dev swp3
>
> Signed-off-by: Roopa Prabhu 
> ---
>  include/net/mpls_iptunnel.h |   2 +-
>  net/mpls/af_mpls.c  | 627 
> +---
>  net/mpls/internal.h |  43 ++-
>  3 files changed, 516 insertions(+), 156 deletions(-)
>
> diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
> index 4757997..179253f 100644
> --- a/include/net/mpls_iptunnel.h
> +++ b/include/net/mpls_iptunnel.h
> @@ -18,7 +18,7 @@
>  
>  struct mpls_iptunnel_encap {
>   u32 label[MAX_NEW_LABELS];
> - u32 labels;
> + u8  labels;
>  };
>  
>  static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct 
> lwtunnel_state *lwtstate)
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 8c5707d..ae9e153 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -19,39 +19,12 @@
>  #include 
>  #include 
>  #endif
> +#include 
>  #include "internal.h"
>  
> -#define LABEL_NOT_SPECIFIED (1<<20)
> -#define MAX_NEW_LABELS 2
> -
> -/* This maximum ha length copied from the definition of struct neighbour */
> -#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
> -
> -enum mpls_payload_type {
> - MPT_UNSPEC, /* IPv4 or IPv6 */
> - MPT_IPV4 = 4,
> - MPT_IPV6 = 6,
> -
> - /* Other types not implemented:
> -  *  - Pseudo-wire with or without control word (RFC4385)
> -  *  - GAL (RFC5586)
> -  */
> -};
> -
> -struct mpls_route { /* next hop label forwarding entry */
> - struct net_device __rcu *rt_dev;
> - struct rcu_head rt_rcu;
> - u32 rt_label[MAX_NEW_LABELS];
> - u8  rt_protocol; /* routing protocol that set this 
> entry */
> - u8  rt_payload_type;
> - u8  rt_labels;
> - u8  rt_via_alen;
> - u8  rt_via_table;
> - u8  rt_via[0];
> -};
> -
>  static int zero = 0;
>  static int label_limit = (1 << 20) - 1;
> +static DEFINE_SPINLOCK(mpls_multipath_lock);
>  
>  static void rtmsg_lfib(int event, u32 label, struct mpls_route *rt,
>  struct nlmsghdr *nlh, struct net *net, u32 portid,
> @@ -80,10 +53,10 @@ bool mpls_output_possible(const struct net_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(mpls_output_possible);
>  
> -static unsigned int mpls_rt_header_size(const struct

Re: [PATCH v2] Documentation: improve line discipline method descriptions

2015-10-06 Thread Paul Bolle

Hi Tilman,

On Wed, 2015-09-30 at 01:45 +0200, Tilman Schmidt wrote:
> --- a/Documentation/serial/tty.txt
> +++ b/Documentation/serial/tty.txt

> + Should [...] set receive_room
> + in the tty_struct to the maximum amount of data
> + the line discipline is willing to accept from the
> + driver with a single call to receive_buf().

A lot clearer than v1! (I do assume the TTY people will shout if that
sentence isn't actually correct.)

Thanks for looking into this, again.


Paul Bolle

Re: [PATCH v4] net/bonding: send arp in interval if no active slave

2015-10-06 Thread Jarod Wilson


Jarod Wilson wrote:

From: Uwe Koziolek

With some very finicky switch hardware, active backup bonding can get into
a situation where we play ping-pong between interfaces, trying to get one
to come up as the active slave. There seems to be an issue with the
switch's arp replies either taking too long, or simply getting lost, so we
wind up unable to get any interface up and active. Sometimes, the issue
sorts itself out after a while, sometimes it doesn't.

Testing with num_grat_arp has proven fruitless, but sending an additional
arp on curr_arp_slave if we're still in the arp_interval timeslice in
bond_ab_arp_probe(), has shown to produce 100% reliability in testing with
this hardware combination.

[jarod: manufacturing of changelog, addition of modparam gating]
CC: Jay Vosburgh
CC: Andy Gospodarek
CC: Veaceslav Falico
CC: netdev@vger.kernel.org
Signed-off-by: Uwe Koziolek
Signed-off-by: Jarod Wilson
---
v2: add code comment as to why change is needed
v3: fix wrapping of comments
v4: [jarod] add module parameter gating of code addition

  drivers/net/bonding/bond_main.c | 24 
  include/net/bonding.h   |  1 +
  2 files changed, 25 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 90f2615..72ab512 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -95,6 +95,7 @@ static int miimon;
  static int updelay;
  static int downdelay;
  static int use_carrier= 1;
+static int arp_slow_switch;
  static char *mode;
  static char *primary;
  static char *primary_reselect;
@@ -133,6 +134,10 @@ MODULE_PARM_DESC(downdelay, "Delay before considering link 
down, "
  module_param(use_carrier, int, 0);
  MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; 
"
  "0 for off, 1 for on (default)");
+module_param(arp_slow_switch, int, 0);
+MODULE_PARM_DESC(arp_slow_switch, "Do extra arp checks for switches with arp "
+ "caches that are slow to update; "
+ "0 for off (default), 1 for on");
  module_param(mode, charp, 0);
  MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, "
   "1 for active-backup, 2 for balance-xor, "
@@ -2793,6 +2798,18 @@ static bool bond_ab_arp_probe(struct bonding *bond)
return should_notify_rtnl;
}

+   /* Sometimes the forwarding tables of the switches are not updated
+* fast enough, so the first arp response after a slave change is
+* received on the wrong slave.
+*
+* The arp requests will be retried 2 times on the same slave.
+*/
+   if (arp_slow_switch &&


This here should actually be bond->params.arp_slow_switch, but I'd like 
to hear first if a module parameter gating this change is even a 
remotely acceptable idea. It'd keep the logic identical in the default 
case though, and still allow for people like Uwe that need it to deploy 
the work-around.


Though I'm slightly curious if this problem does NOT manifest by simply 
setting a larger arp_interval. Early on, I thought I'd heard that other 
intervals had been tried with the same results, but a comment in this 
thread suggested maybe only 500 had been tried.


--
Jarod Wilson
ja...@redhat.com



Re: [PATCH net-next v2 1/2] mpls: multipath support

2015-10-06 Thread Eric W. Biederman

ebied...@xmission.com (Eric W. Biederman) writes:

> Roopa Prabhu  writes:
>
>> From: Roopa Prabhu 
>>
>> This patch adds support for MPLS multipath routes.
>>
>> Includes following changes to support multipath:
>> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
>>
>> - 'struct mpls_nh' represents a mpls nexthop label forwarding entry
>>
>> - moves mpls route and nexthop structures into internal.h
>>
>> - A mpls_route can point to multiple mpls_nh structs
>>
>> - the nexthops are maintained as a list
>
> So I am not certain I like nexthops being a list.  In the practical case
> introducing this list guarantees that everyone will see at least an
> extra cache line miss in the forwarding path.
>
> In the more abstract sense a list is the wrong data structure.  If the
> list is so short we can afford to walk it an array is a better data
> structure.  If we need enough entries to make the memory consumption
> of an array a concern we want some kind of hash table or tree data
> structure, because a list will be too long in that case.
>
> So can we please not use a list?
>
> I expect we can simplify the data structures by noting that rt_via must
> be an ethernet mac today so that 6 bytes are enough and 8 bytes gives us
> a bit extra and aligns things nicely.

Grr. My mistake.  The current worst case is 16 bytes for an ipv6
address in rt_via.  But the point remains that a fixed-size array of
bytes in rt_via allows the use of an array and not a list for nexthops.

At least for the single nexthop case I really want something that is
small enough that it fits in a single 64-byte cache line.  The performance
difference compared to anything else is going to be noticeable.
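
To make the sizing argument concrete, here is a rough userspace sketch, not
the patch's code. Every name with a sketch_ prefix is hypothetical, the fields
only loosely mirror the old struct mpls_route, and sketch_rcu_head is a
two-pointer stand-in for struct rcu_head. It assumes a fixed 16-byte via
field, stores the nexthops inline as an array, and selects one by index:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_VIA_ALEN 16	/* worst case today: an IPv6 address */
#define SKETCH_MAX_NEW_LABELS 2

/* Hypothetical stand-in for struct rcu_head (two pointers). */
struct sketch_rcu_head {
	void *next;
	void *func;
};

/* One nexthop. The size is fixed, so nexthops can live in an array. */
struct sketch_nh {
	void *dev;	/* stand-in for struct net_device * */
	uint32_t label[SKETCH_MAX_NEW_LABELS];
	uint8_t labels;
	uint8_t via_alen;
	uint8_t via_table;
	uint8_t via[SKETCH_MAX_VIA_ALEN];
};

/* A route with its nexthops stored inline, directly after the header. */
struct sketch_route {
	struct sketch_rcu_head rcu;
	uint8_t protocol;
	uint8_t payload_type;
	uint8_t nhs;	/* number of entries in nh[] */
	struct sketch_nh nh[];
};

/* Bytes needed for a route carrying nhs nexthops. */
static size_t sketch_route_size(unsigned int nhs)
{
	return sizeof(struct sketch_route) + nhs * sizeof(struct sketch_nh);
}

/* Allocate a zeroed route with room for nhs nexthops (NULL on failure). */
static struct sketch_route *sketch_route_alloc(uint8_t nhs)
{
	struct sketch_route *rt = calloc(1, sketch_route_size(nhs));

	if (rt)
		rt->nhs = nhs;
	return rt;
}

/* Pick a nexthop by plain array indexing; no pointer chasing involved. */
static const struct sketch_nh *
sketch_select_nh(const struct sketch_route *rt, uint32_t hash)
{
	assert(rt->nhs > 0);
	return &rt->nh[hash % rt->nhs];
}
```

With this layout sketch_route_size(1) should come to 64 bytes on a typical
LP64 build, so the single nexthop case stays within one cache line, and each
additional nexthop simply extends the same allocation.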

Eric

> Also I know it goes away in the next patch but a spinlock taken for
> every transit through the forwarding path really bugs me.
>
> Eric
>
>> - In the process of restructuring, this patch also consistently changes all
>> labels to u8
>>
>> - Adds support to parse/fill RTA_MULTIPATH netlink attribute for
>> multipath routes similar to ipv4/v6 fib
>>
>> - In this patch, the multipath route nexthop selection algorithm
>> is a simple round robin picked up from ipv4 fib code and is replaced by
>> a hash based algorithm from Robert Shearman in the next patch
>>
>> - mpls_route_update cleanup: remove 'dev' handling in mpls_route_update.
>> mpls_route_update though implemented to update based on dev, it was never
>> used that way. And the dev handling gets tricky with multiple nexthops. 
>> Cannot
>> match against any single nexthops dev. So, this patch removes the unused
>> 'dev' handling in mpls_route_update.
>
>>
>> Example:
>>
>> $ip -f mpls route add 100 nexthop as 200 via inet 10.1.1.2 dev swp1 \
>> nexthop as 700 via inet 10.1.1.6 dev swp2 \
>> nexthop as 800 via inet 40.1.1.2 dev swp3
>>
>> $ip  -f mpls route show
>> 100
>> nexthop as to 200 via inet 10.1.1.2  dev swp1
>> nexthop as to 700 via inet 10.1.1.6  dev swp2
>> nexthop as to 800 via inet 40.1.1.2  dev swp3
>>
>> Signed-off-by: Roopa Prabhu 
>> ---
>>  include/net/mpls_iptunnel.h |   2 +-
>>  net/mpls/af_mpls.c  | 627 
>> +---
>>  net/mpls/internal.h |  43 ++-
>>  3 files changed, 516 insertions(+), 156 deletions(-)
>>
>> diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
>> index 4757997..179253f 100644
>> --- a/include/net/mpls_iptunnel.h
>> +++ b/include/net/mpls_iptunnel.h
>> @@ -18,7 +18,7 @@
>>  
>>  struct mpls_iptunnel_encap {
>>  u32 label[MAX_NEW_LABELS];
>> -u32 labels;
>> +u8  labels;
>>  };
>>  
>>  static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct 
>> lwtunnel_state *lwtstate)
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index 8c5707d..ae9e153 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -19,39 +19,12 @@
>>  #include 
>>  #include 
>>  #endif
>> +#include 
>>  #include "internal.h"
>>  
>> -#define LABEL_NOT_SPECIFIED (1<<20)
>> -#define MAX_NEW_LABELS 2
>> -
>> -/* This maximum ha length copied from the definition of struct neighbour */
>> -#define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
>> -
>> -enum mpls_payload_type {
>> -MPT_UNSPEC, /* IPv4 or IPv6 */
>> -MPT_IPV4 = 4,
>> -MPT_IPV6 = 6,
>> -
>> -/* Other types not implemented:
>> - *  - Pseudo-wire with or without control word (RFC4385)
>> - *  - GAL (RFC5586)
>> - */
>> -};
>> -
>> -struct mpls_route { /* next hop label forwarding entry */
>> -struct net_device __rcu *rt_dev;
>> -struct rcu_head rt_rcu;
>> -u32 rt_label[MAX_NEW_LABELS];
>> -u8  rt_protocol; /* routing protocol that set this 
>> entry */
>> -u8  rt_payload_type;
>> -u8  rt_labels;
>> -u8

Re: [PATCH net-next v2 1/2] mpls: multipath support

2015-10-06 Thread roopa

On 10/6/15, 12:44 PM, Eric W. Biederman wrote:
> Roopa Prabhu  writes:
>
>> From: Roopa Prabhu 
>>
>> This patch adds support for MPLS multipath routes.
>>
>> Includes following changes to support multipath:
>> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
>>
>> - 'struct mpls_nh' represents a mpls nexthop label forwarding entry
>>
>> - moves mpls route and nexthop structures into internal.h
>>
>> - A mpls_route can point to multiple mpls_nh structs
>>
>> - the nexthops are maintained as a list
> So I am not certain I like nexthops being a list.  In the practical case
> introducing this list guarantees that everyone will see at least an
> extra cache line miss in the forwarding path.
>
> In the more abstract sense a list is the wrong data structure.  If the
> list is so short we can afford to walk it an array is a better data
> structure.  If we need enough entries to make the memory consumption
> of an array a concern we want some kind of hash table or tree data
> structure, because a list will be too long in that case.
>
> So can we please not use a list?
Sure, I used arrays the first time:
http://marc.info/?l=linux-netdev&m=143932956719398&w=2
And I am very much ok with an array.  I used a list in v2 to follow the ipv6
fib code, following comments from v1.

The only place the lookup is sensitive is the nexthop selection in the
datapath.  And depending on how the selection algorithm works, I am not sure
if using a hash table will help there.  I will look though.

I did prefer an array, and if you are ok with an array, I will respin.

>
> I expect we can simplify the data structures by noting that rt_via must
> be an ethernet mac today so that 6 bytes are enough and 8 bytes gives us
> a bit extra and aligns things nicely.
>
> Also I know it goes away in the next patch but a spinlock taken for
> every transit through the forwarding path really bugs me.
Yes, agreed. I picked that up from the ipv4 fib code. Since it goes away with
Robert's patch, I did not spend any time on it.

Thanks for the review.

Re: [PATCH net-next v2 1/2] mpls: multipath support

2015-10-06 Thread roopa

On 10/6/15, 1:11 PM, Eric W. Biederman wrote:
> ebied...@xmission.com (Eric W. Biederman) writes:
>
>> Roopa Prabhu  writes:
>>
>>> From: Roopa Prabhu 
>>>
>>> This patch adds support for MPLS multipath routes.
>>>
>>> Includes following changes to support multipath:
>>> - splits struct mpls_route into 'struct mpls_route + struct mpls_nh'
>>>
>>> - 'struct mpls_nh' represents a mpls nexthop label forwarding entry
>>>
>>> - moves mpls route and nexthop structures into internal.h
>>>
>>> - A mpls_route can point to multiple mpls_nh structs
>>>
>>> - the nexthops are maintained as a list
>> So I am not certain I like nexthops being a list.  In the practical case
>> introducing this list guarantees that everyone will see at least an
>> extra cache line miss in the forwarding path.
>>
>> In the more abstract sense a list is the wrong data structure.  If the
>> list is so short we can afford to walk it an array is a better data
>> structure.  If we need enough entries to make the memory consumption
>> of an array a concern we want some kind of hash table or tree data
>> structure, because a list will be too long in that case.
>>
>> So can we please not use a list?
>>
>> I expect we can simplify the data structures by noting that rt_via must
>> be an ethernet mac today so that 6 bytes are enough and 8 bytes gives us
>> a bit extra and aligns things nicely.
> Grr. My mistake.  The current worst case is 16 bytes for an ipv6
> address in rt_via.  But the point remains that a fixed sized array of
> bytes in rt_via allows the use of an array and not a list for nexthops.
>
> At least for the single nexthop case I really want something that is
> small enough it fits in a single 64byte cache line.  The performance
> compared to anything else is going to be noticable.
>
Agreed. I just responded to your last email. I moved from an array to a list
only because of the extra bytes. I would prefer an array too.

http://marc.info/?l=linux-netdev&m=143932956719398&w=2

or

https://patchwork.ozlabs.org/patch/506226/

The link to the full series is here:
http://marc.info/?l=linux-netdev&m=143932955919395&w=2

thanks,
Roopa


Re: [patch net-next v2 06/13] rocker: introduce worlds infrastructure

2015-10-06 Thread Rustad, Mark D

Jiri Pirko  wrote:

>>> +   int (*port_init)(struct rocker_port *rocker_port, void *priv,
>>> +void *port_priv);
>> 
>> Yuck, void *.  Can we do better?
> 
> I see nothing wrong with this priv usage. It's done like this on many
> places. I think it is completely legit, since the call points are well
> defined and wrapped.

This particular call is perhaps the most troubling. In general, if there is
only one void * parameter, you may well get a compile error on a non-void
parameter if you get the arguments switched around. With two void * parameters
that is no longer the case, making this call even more error-prone than the
other uses of void *.
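
As an illustration of that point, here is a small userspace sketch. All
sketch_ names are hypothetical and only mimic the shape of the port_init op
quoted above; this is not the rocker code. With two void * parameters a call
with swapped arguments compiles silently, while a typed signature would make
the compiler complain about the same mistake:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical private data types for a "world" and for a port. */
struct sketch_world_priv { int world_id; };
struct sketch_port_priv { int port_no; };

struct sketch_port {
	void *world_priv;
	void *port_priv;
};

/* Untyped variant: both private pointers are void *. */
static int sketch_port_init_untyped(struct sketch_port *port,
				    void *priv, void *port_priv)
{
	port->world_priv = priv;
	port->port_priv = port_priv;
	return 0;
}

/* Typed variant: each private pointer has its own struct type. */
static int sketch_port_init_typed(struct sketch_port *port,
				  struct sketch_world_priv *priv,
				  struct sketch_port_priv *port_priv)
{
	port->world_priv = priv;
	port->port_priv = port_priv;
	return 0;
}

/* Returns 1 if the swapped untyped call slipped through unnoticed. */
static int sketch_swapped_call_goes_unnoticed(void)
{
	struct sketch_world_priv w = { .world_id = 1 };
	struct sketch_port_priv p = { .port_no = 7 };
	struct sketch_port good = { NULL, NULL };
	struct sketch_port bad = { NULL, NULL };

	/* Correct argument order through the typed variant. */
	sketch_port_init_typed(&good, &w, &p);
	assert(good.world_priv == (void *)&w);

	/* Arguments deliberately swapped: accepted without any diagnostic. */
	sketch_port_init_untyped(&bad, &p, &w);

	/*
	 * sketch_port_init_typed(&bad, &p, &w) would instead draw an
	 * incompatible-pointer-type diagnostic from the compiler.
	 */
	return bad.world_priv == (void *)&p;
}
```

The helper returns 1, meaning the port ends up holding the port data where
the world data was expected, and nothing flagged it at build time.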

--
Mark Rustad, Networking Division, Intel Corporation


Re: [patch net-next v3 06/14] rocker: introduce worlds infrastructure

2015-10-06 Thread John Fastabend

[...]

>>>
>>> Using void * in these ops is unacceptable, I can't agree to this patch.
>>>
>>> There is a much cleaner way to architect this.  If you look at the ops
>>> defined, they're mostly duplicates of the already defined
>>> switchdev_ops.  It would be much cleaner to:
>>>
>>> 0) set port mode on qemu/rocker (the device)
>>> 1) get the port mode on port probe
>>> 2) based on port mode, set the switchdev_ops to point to the port mode
>>> world switchdev_ops
>>> 3) sub-class rocker_port, like I mentioned in before, to store
>>> world-specific stuff in rocker_port
>>>
>>> I don't buy the argument that we need to change port mode dynamically
>>> from the driver.  Set it at the device and be done.
>>>
>>
>> Maybe as a reference this strikes me as similar to how we do multiple
>> device support in a single driver like ixgbe or fm10k (the two I'm most
>> familiar with). At probe time we read the device id and then stub in
>> the specific callbacks for that device.
> 
> Exactly
> 
>> Sorry I'm still hung up on the multiple worlds thing, is it really
>> trying to model different devices under a single driver? In which case
>> maybe rather than port mode expose it as its own device id. Just a
>> thought.
> 
> Yes, different devices under single driver.  New device ID or
> sub-device ID will not work in this case as we're trying to slice it
> at the port level, not the device level.
> 

OK, that uncovered my next level of suspicion/confusion.

Do you actually have, or have you seen, hardware that has a completely
different programming interface per port? And completely different pipelines?

This seems really strange to me and perhaps just an artifact of
the qemu implementation? Typically, or at least what I expect, is that you
have a switch pipeline with a set of data structures, tcams, hash
tables, etc. all connected together in some (possibly configurable)
topology. Ports feed packets into this pipeline and packets egress
out of ports. In my logical view of a "switch" device the pipeline
is a shared resource. You can partition it so that ports are isolated
in some sense, but you can't use fundamentally different underlying
resources per port. It's not a per-port attribute/mode like this
series sort of hints at.

Also I wonder how this works when a packet ingresses on a port in mode A
and egresses on a port in mode B? What fib/fdb tables does it cross when
this happens? It seems easier to just have two switch devices, not a
hybrid. If this per-port implementation maps to some hardware, that
would be really interesting though.

.John




Re: [PATCH 2/2] Do not set shared_ports when nreq > MAX_MSIX

2015-10-06 Thread Carol Soto




On 10/6/2015 4:39 PM, Or Gerlitz wrote:

On Wed, Oct 7, 2015 at 12:27 AM,   wrote:

From: Carol L Soto 

If we get MAX_MSIX interrupts, we would like each receive ring to have
its own MSI-X interrupt line.

so 9293267a3e2a  was only partially correct? and/or not fully optimal?
please elaborate more on that in your change log.
Just not fully optimal. With commit 9293267a3e2a, if I have 64 MSI-X vectors
and 2 ports I can get 8 rings for each port, but then the rings will share
the interrupt lines. With 64 MSI-X vectors we can have each ring with its own
interrupt line.



Fixes: 9293267a3e2a ('net/mlx4_core: Capping number of requested MSIXs to 
MAX_MSIX')
Signed-off-by: Carol L Soto 

Carol, you didn't use the net/mlx4: prefix as asked for mlx4 driver patch
titles, so please repost, but before that I'd like to see an ack from
Matan for this patch as well.

Sorry, I completely missed it. When Matan acks, I will resend it.


Or.


---
  drivers/net/ethernet/mellanox/mlx4/main.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 006757f..f03f513 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2673,10 +2673,8 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)

 nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
  nreq);
-   if (nreq > MAX_MSIX) {
+   if (nreq > MAX_MSIX)
 nreq = MAX_MSIX;
-   shared_ports = true;
-   }

 entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
 if (!entries)
--
1.8.3.1


[PATCH net-next] tcp: ensure prior synack rtx behavior with small backlogs

2015-10-06 Thread Eric Dumazet

From: Eric Dumazet 

Some applications use a listen() backlog of 1.

Prior kernels were silently enforcing a qlen_log of 4, so that we were
sending up to /proc/sys/net/ipv4/tcp_synack_retries SYNACK messages.
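
For illustration only, a userspace sketch of the comparison changed by the
diff below. The sketch_ helpers are hypothetical and are not kernel code;
they just evaluate the old and the new condition for a given queue length and
listen() backlog:

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int sketch_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Condition before the patch: compare against the raw backlog. */
static bool sketch_over_half_old(unsigned int qlen, unsigned int max_backlog)
{
	return (qlen << 1) > max_backlog;
}

/* Condition after the patch: the backlog is treated as at least 8. */
static bool sketch_over_half_new(unsigned int qlen, unsigned int max_backlog)
{
	return (qlen << 1) > sketch_max(8U, max_backlog);
}

/* Walk through the listen() backlog of 1 case from the changelog. */
static void sketch_backlog_one_demo(void)
{
	/* One pending request already trips the old check ... */
	assert(sketch_over_half_old(1, 1));
	/* ... but not the new one, which holds off until qlen exceeds 4. */
	assert(!sketch_over_half_new(1, 1));
	assert(!sketch_over_half_new(4, 1));
	assert(sketch_over_half_new(5, 1));
	/* Backlogs of 8 or more behave exactly as before. */
	assert(sketch_over_half_new(65, 128) == sketch_over_half_old(65, 128));
}
```

So with a backlog of 1 the branch guarded by this test, which is the one that
lowers the retry threshold, is only entered once more than four requests are
queued.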

Fixes: ef547f2ac16b ("tcp: remove max_qlen_log")
Signed-off-by: Eric Dumazet 
---
 net/ipv4/inet_connection_sock.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 89eedfbd4ad5..514b9e910bd4 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -579,7 +579,7 @@ static void reqsk_timer_handler(unsigned long data)
 * ones are about to clog our table.
 */
qlen = reqsk_queue_len(queue);
-   if ((qlen << 1) > sk_listener->sk_max_ack_backlog) {
+   if ((qlen << 1) > max(8U, sk_listener->sk_max_ack_backlog)) {
int young = reqsk_queue_len_young(queue) << 1;
 
while (thresh > 2) {



Re: [PATCH] igb: improve handling of disconnected adapters

2015-10-06 Thread kbuild test robot

Hi Jarod,

[auto build test ERROR on v4.3-rc4 -- if it's inappropriate base, please ignore]

config: i386-randconfig-x009-201540 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   drivers/net/ethernet/intel/igb/igb_main.c: In function 'igb_request_msix':
>> drivers/net/ethernet/intel/igb/igb_main.c:962:35: error: 'struct 
>> igb_adapter' has no member named 'io_addr'
  q_vector->itr_register = adapter->io_addr + E1000_EITR(vector);
  ^
>> drivers/net/ethernet/intel/igb/igb_main.c:949:19: warning: unused variable 
>> 'hw' [-Wunused-variable]
 struct e1000_hw *hw = &adapter->hw;
  ^
   drivers/net/ethernet/intel/igb/igb_main.c: In function 'igb_alloc_q_vector':
   drivers/net/ethernet/intel/igb/igb_main.c:1233:34: error: 'struct 
igb_adapter' has no member named 'io_addr'
 q_vector->itr_register = adapter->io_addr + E1000_EITR(0);
 ^
   drivers/net/ethernet/intel/igb/igb_main.c: In function 
'igb_configure_tx_ring':
   drivers/net/ethernet/intel/igb/igb_main.c:3282:22: error: 'struct 
igb_adapter' has no member named 'io_addr'
 ring->tail = adapter->io_addr + E1000_TDT(reg_idx);
 ^
   drivers/net/ethernet/intel/igb/igb_main.c: In function 
'igb_configure_rx_ring':
   drivers/net/ethernet/intel/igb/igb_main.c:3638:22: error: 'struct 
igb_adapter' has no member named 'io_addr'
 ring->tail = adapter->io_addr + E1000_RDT(reg_idx);
 ^

vim +962 drivers/net/ethernet/intel/igb/igb_main.c

   943   *  igb_request_msix allocates MSI-X vectors and requests interrupts 
from the
   944   *  kernel.
   945   **/
   946  static int igb_request_msix(struct igb_adapter *adapter)
   947  {
   948  struct net_device *netdev = adapter->netdev;
 > 949  struct e1000_hw *hw = &adapter->hw;
   950  int i, err = 0, vector = 0, free_vector = 0;
   951  
   952  err = request_irq(adapter->msix_entries[vector].vector,
   953igb_msix_other, 0, netdev->name, adapter);
   954  if (err)
   955  goto err_out;
   956  
   957  for (i = 0; i < adapter->num_q_vectors; i++) {
   958  struct igb_q_vector *q_vector = adapter->q_vector[i];
   959  
   960  vector++;
   961  
 > 962  q_vector->itr_register = adapter->io_addr + 
 > E1000_EITR(vector);
   963  
   964  if (q_vector->rx.ring && q_vector->tx.ring)
   965  sprintf(q_vector->name, "%s-TxRx-%u", 
netdev->name,

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data
