Re: [PATCH v10 00/12] Add driver bpf hook for early packet drop and forwarding

2016-07-19 Thread David Miller
From: Brenden Blanco 
Date: Tue, 19 Jul 2016 12:16:45 -0700

> This patch set introduces new infrastructure for programmatically
> processing packets in the earliest stages of rx, as part of an effort
> others are calling eXpress Data Path (XDP) [1]. Start this effort by
> introducing a new bpf program type for early packet filtering, before
> even an skb has been allocated.
> 
> Extend on this with the ability to modify packet data and send back out
> on the same port.

Series applied, thanks.


[PATCH 1/1] ixgbevf: replace num with macro

2016-07-19 Thread zyjzyj2000
From: Zhu Yanjun 

With the original num, when a several bits state is set, it is
possible that the wrong test occurs.

For example,

a state is 0x3, its bits are 11. When testing a state 0x2 whose
bits are 10, it is difficult to confirm that state 0x2 is set or
not.

As such, the MACROs are defined to avoid the above error.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 60fc63b..d37b910 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -436,18 +436,16 @@ struct ixgbevf_adapter {
u8 rss_indir_tbl[IXGBEVF_X550_VFRETA_SIZE];
 };
 
-enum ixbgevf_state_t {
-   __IXGBEVF_TESTING,
-   __IXGBEVF_RESETTING,
-   __IXGBEVF_DOWN,
-   __IXGBEVF_DISABLED,
-   __IXGBEVF_REMOVING,
-   __IXGBEVF_SERVICE_SCHED,
-   __IXGBEVF_SERVICE_INITED,
-   __IXGBEVF_RESET_REQUESTED,
-   __IXGBEVF_QUEUE_RESET_REQUESTED,
-   __IXGBEVF_HW_RESETTING,
-};
+#define__IXGBEVF_TESTING   0x0001
+#define__IXGBEVF_RESETTING 0x0002
+#define__IXGBEVF_DOWN  0x0004
+#define__IXGBEVF_DISABLED  0x0008
+#define__IXGBEVF_REMOVING  0x0010
+#define__IXGBEVF_SERVICE_SCHED 0x0020
+#define__IXGBEVF_SERVICE_INITED0x0040
+#define__IXGBEVF_RESET_REQUESTED   0x0080
+#define__IXGBEVF_QUEUE_RESET_REQUESTED 0x0100
+#define__IXGBEVF_HW_RESETTING  0x0200
 
 enum ixgbevf_boards {
board_82599_vf,
-- 
2.7.4



RE: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> >> From: kbuild test robot [mailto:l...@intel.com]
> >> [auto build test WARNING on net-next/master]
> >>
> >> url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-
> >> Hyper-V-VM-Sockets-hv_sock/20160715-223433
> >> config: x86_64-randconfig-a0-07191719 (attached as .config)
> >> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> >> reproduce:
> >> # save the attached .config to linux build tree
> >> make ARCH=x86_64
> >>
> >> All warnings (new ones prefixed by >>):
> >>
> >>net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
> >>net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized
> in
> >> this function
> >>net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used
> >> uninitialized in this function
> >>net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used
> uninitialized
> >> in this function
> >>net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
> >>net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized
> in
> >> this function
> >> >> net/hv_sock/af_hvsock.o: warning: objtool:
> hvsock_on_channel_cb()+0x1d:
> >> function has unreachable instruction
> >
> > These warnings are all false alarms.
> 
> But you still have to quiet them.

Sure. Will do.

Thanks,
-- Dexuan


Re: [Patch-V2 2/3] chcr: Support for Chelsio's Crypto Hardware

2016-07-19 Thread David Miller
From: Yeshaswi M R Gowda 
Date: Mon, 18 Jul 2016 22:42:14 -0700

> +config CRYPTO_DEV_CHELSIO
> + tristate "Chelsio Crypto Co-processor Driver"
> + depends on PCI && NETDEVICES && ETHERNET
> + select CRYPTO_SHA1
> + select CRYPTO_SHA256
> + select CRYPTO_SHA512
> + select NET_VENDOR_CHELSIO
> + select CHELSIO_T4

The user shouldn't have to know about the technical details about
how this chip is physically implemented.

It's therefore not reasonable to require an ethernet driver to be
enabled to use the crypto engine.

Also, selecting Kconfig symbol X does not recursively enable the
"select" statement(s) of symbol X nor does it check symbol X's
dependencies.

This is really one big huge dependency mess, and I think you have
to split out the core of the T4 driver into a driver subtype
agnostic library or similar to make this work properly.

Don't just shoehorn this stuff into the ethernet driver.  Round
peg, square hole.



Re: [PATCH net-next v3 00/10] NCSI Support

2016-07-19 Thread Benjamin Herrenschmidt
On Tue, 2016-07-19 at 20:49 -0700, David Miller wrote:
> From: Gavin Shan 
> Date: Tue, 19 Jul 2016 11:54:15 +1000
> 
> > This series rebases on David's linux-net git repo ("master"
> branch). It's
> > to support NCSI stack on drivers/net/ethernet/faraday/ftgmac100.c.
> The
> > implementation is based on NCSI spec (version: 1.1.0):
> > https://www.dmtf.org/sites/default/files/standards/documents/DSP022
> 2_1.1.0.pdf
> 
> Series applied, thanks.

Thanks Dave. There was one little nit left in the Faraday driver but we
will send a follow up patch to fix it.

Cheers,
Ben.



Re: [PATCH v3] packet: fix second argument of sock_tx_timestamp()

2016-07-19 Thread David Miller
From: Yoshihiro Shimoda 
Date: Tue, 19 Jul 2016 14:40:51 +0900

> This patch fixes an issue that a syscall (e.g. sendto syscall) cannot
> work correctly. Since the sendto syscall doesn't have msg_control buffer,
> the sock_tx_timestamp() in packet_snd() cannot work correctly because
> the socks.tsflags is set to 0.
> So, this patch sets the socks.tsflags to sk->sk_tsflags as default.
> 
> Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
> Cc: 
> Reported-by: Kazuya Mizuguchi 
> Reported-by: Keita Kobayashi 
> Signed-off-by: Yoshihiro Shimoda 

Applied.


Re: [PATCH net-next v3 00/10] NCSI Support

2016-07-19 Thread David Miller
From: Gavin Shan 
Date: Tue, 19 Jul 2016 11:54:15 +1000

> This series rebases on David's linux-net git repo ("master" branch). It's
> to support NCSI stack on drivers/net/ethernet/faraday/ftgmac100.c. The
> implementation is based on NCSI spec (version: 1.1.0):
> https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf

Series applied, thanks.


Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)

2016-07-19 Thread Sargun Dhillon



On Wed, 20 Jul 2016, Daniel Borkmann wrote:


On 07/19/2016 06:34 PM, Alexei Starovoitov wrote:

On Tue, Jul 19, 2016 at 01:17:53PM +0200, Daniel Borkmann wrote:

+   return -EINVAL;
+
+   /* Is this a user address, or a kernel address? */
+   if (!access_ok(VERIFY_WRITE, to, size))
+   return -EINVAL;
+
+   return probe_kernel_write(to, from, size);


I'm still worried that this can lead to all kind of hard to find
bugs or races for user processes, if you make this writable to entire
user address space (which is the only thing that access_ok() checks
for). What if the BPF program has bugs and writes to wrong addresses
for example, introducing bugs in some other, non-intended processes
elsewhere? Can this be limited to syscalls only? And if so, to the
passed memory only?


my understanding that above code will write only to memory of current 
process,
so impact is contained and in that sense buggy kprobe program is no 
different

from buggy seccomp prorgram.


Compared to seccomp, you might not notice that a race has happened,
in seccomp case you might have killed your process, which is visible.
But ok, in ptrace() case it might be similar issue perhaps ...

The asm-generic version does __access_ok(..) { return 1; } for nommu
case, I haven't checked closely enough whether there's actually an arch
that uses this, but for example arm nommu with only one addr space would
certainly result in access_ok() as 1, and then you could also go into
probe_kernel_write(), no?

Don't know that code well enough, but I believe the check would only
ensure in normal use-cases that user process doesn't fiddle with kernel
address space, but not necessarily guarantee that this really only
belongs to the process address space.

You're right, it doesn't gaurantee that the memory being written to is the 
user, but only the current process should be mapped into that chunk of 
memory that is being operated at in the point in time. I think it'd be 
fairly difficult to write to another process' memory to corrupt it.


It also might make sense to me to call access_ok with a known kernel 
address, and either deny access to the bpf helper, or warn in dmesg saying 
that it is running in an environment without any protection.
-- I'm not sure if there is a preferred kernel address to check? What do 
you think?




x86 code comments this with "note that, depending on architecture,
this function probably just checks that the pointer is in the user
space range - after calling this function, memory access functions may
still return -EFAULT".

Also, what happens in case of kernel thread?
It seems fairly reasonable to me to check task->flags & PF_KTHREAD in 
order to prevent this scenario since there is no memory to copy_to_user 
to.




As it stands, it does ...

if (unlikely(in_interrupt()))
return -EINVAL;
if (unlikely(!task || !task->pid))
return -EINVAL;

So up to here, irq/sirq, NULL current and that current is not the 'idle'
process is being checked (still fail to see the point for the !task->pid,
I believe the intend here is different).

/* Is this a user address, or a kernel address? */
if (!access_ok(VERIFY_WRITE, to, size))
return -EINVAL;

Now here. What if it's a kernel thread? You'll have KERNEL_DS segment,
task->pid was non-zero as well for the kthread, so access_ok() will
pass and you can still execute probe_kernel_write() ...


Limiting this to syscalls will make it too limited.
I'm in favor of this change, because it allows us to experiment
with restartable sequences and lock-free algorithms that need ultrafast
access to cpuid without burdening the kernel with stable abi.


Have you played around with ptrace() to check whether you could
achieve similar functionality (was thinking about things like [1],
PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't
this be limited to a similar functionality for only the current task.
ptrace() utilizes helpers like access_process_vm(), maybe this can
similarly be adapted here, too (under the circumstances that sleeping
is not allowed)?


If we hack access_process_vm I think at the end it will look like
probe_kernel_write. Walking mm requires semaphore, so we would only
be able to do it in task_work and there we can do normal copy_to_user
just as well, but it will complicate the programs quite a bit, since
writes will be asynchronous and batched.
Looks like with access_ok+probe_write we can achieve the same
with a lot less code.


I believe it may not quite be the same as it currently stands. No
fundamental objection, just that this needs to be made "safe" to the
limits you state above yourself. ;)

At least for my current use cases, doing this asynchronously could prove 
to be unworkable.



As far as races between user and bpf program, yeah, if process
is multithreaded, the other threads may access the same mem that
bpf is writing to, but that's no 

Re: [kbuild-all] [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread Fengguang Wu

On Tue, Jul 19, 2016 at 07:52:53PM -0700, David Miller wrote:

From: Dexuan Cui 
Date: Wed, 20 Jul 2016 01:48:18 +


From: kbuild test robot [mailto:l...@intel.com]
Sent: Wednesday, July 20, 2016 1:10

Hi,

[auto build test WARNING on net-next/master]

url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-
Hyper-V-VM-Sockets-hv_sock/20160715-223433
config: x86_64-randconfig-a0-07191719 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All warnings (new ones prefixed by >>):

   net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
   net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in
this function
   net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used
uninitialized in this function
   net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized
in this function
   net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
   net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in
this function
>> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d:
function has unreachable instruction


These warnings are all false alarms.


But you still have to quiet them.


Agreed.

However the last warning

   objtool: hvsock_on_channel_cb()+0x1d: function has unreachable 
instruction

is my fault -- it turns out to be a problem with old gcc-4.4 and
gcc-4.6 which we enabled 2 days ago. I just disabled reporting this
odd warning because it complaints on ~1 functions all over the
place, which can only be sanely fixed in the toolchain.

Thanks,
Fengguang


Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)

2016-07-19 Thread Alexei Starovoitov
On Wed, Jul 20, 2016 at 01:19:51AM +0200, Daniel Borkmann wrote:
> On 07/19/2016 06:34 PM, Alexei Starovoitov wrote:
> >On Tue, Jul 19, 2016 at 01:17:53PM +0200, Daniel Borkmann wrote:
> >>>+  return -EINVAL;
> >>>+
> >>>+  /* Is this a user address, or a kernel address? */
> >>>+  if (!access_ok(VERIFY_WRITE, to, size))
> >>>+  return -EINVAL;
> >>>+
> >>>+  return probe_kernel_write(to, from, size);
> >>
> >>I'm still worried that this can lead to all kind of hard to find
> >>bugs or races for user processes, if you make this writable to entire
> >>user address space (which is the only thing that access_ok() checks
> >>for). What if the BPF program has bugs and writes to wrong addresses
> >>for example, introducing bugs in some other, non-intended processes
> >>elsewhere? Can this be limited to syscalls only? And if so, to the
> >>passed memory only?
> >
> >my understanding that above code will write only to memory of current 
> >process,
> >so impact is contained and in that sense buggy kprobe program is no different
> >from buggy seccomp prorgram.
> 
> Compared to seccomp, you might not notice that a race has happened,
> in seccomp case you might have killed your process, which is visible.
> But ok, in ptrace() case it might be similar issue perhaps ...
> 
> The asm-generic version does __access_ok(..) { return 1; } for nommu
> case, I haven't checked closely enough whether there's actually an arch
> that uses this, but for example arm nommu with only one addr space would
> certainly result in access_ok() as 1, and then you could also go into
> probe_kernel_write(), no?

good point. how arm nommu handles copy_to_user? if there is nommu
then there is no user/kernel mm ? Crazy archs.
I guess we have to disable this helper on all such archs.

> Don't know that code well enough, but I believe the check would only
> ensure in normal use-cases that user process doesn't fiddle with kernel
> address space, but not necessarily guarantee that this really only
> belongs to the process address space.

why? on x86 that exactly what it does. access_ok=true means
it's user space address and since we're in _this user context_
probe_kernel_write can only affect this user.

> x86 code comments this with "note that, depending on architecture,
> this function probably just checks that the pointer is in the user
> space range - after calling this function, memory access functions may
> still return -EFAULT".

Yes. I've read that comment to :)
Certainly not an expert, but the archs I've looked at, access_ok
has the same meaning as on x86. They check the space range to
make sure address doesn't belong to kernel.
Could I have missed something? Certainly. Please double check :)

> Also, what happens in case of kernel thread?

my understanding if access_ok(addr)=true the addr will never point
to memory of kernel thread.
We need expert opinion. Whom should we ping?

> As it stands, it does ...
> 
>   if (unlikely(in_interrupt()))
>   return -EINVAL;
>   if (unlikely(!task || !task->pid))
>   return -EINVAL;
> 
> So up to here, irq/sirq, NULL current and that current is not the 'idle'
> process is being checked (still fail to see the point for the !task->pid,
> I believe the intend here is different).
> 
>   /* Is this a user address, or a kernel address? */
>   if (!access_ok(VERIFY_WRITE, to, size))
>   return -EINVAL;
> 
> Now here. What if it's a kernel thread? You'll have KERNEL_DS segment,
> task->pid was non-zero as well for the kthread, so access_ok() will
> pass and you can still execute probe_kernel_write() ...

I think user_addr_max() should be zero for kthread, but
worth checking for sure.

> >Limiting this to syscalls will make it too limited.
> >I'm in favor of this change, because it allows us to experiment
> >with restartable sequences and lock-free algorithms that need ultrafast
> >access to cpuid without burdening the kernel with stable abi.
> >
> >>Have you played around with ptrace() to check whether you could
> >>achieve similar functionality (was thinking about things like [1],
> >>PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't
> >>this be limited to a similar functionality for only the current task.
> >>ptrace() utilizes helpers like access_process_vm(), maybe this can
> >>similarly be adapted here, too (under the circumstances that sleeping
> >>is not allowed)?
> >
> >If we hack access_process_vm I think at the end it will look like
> >probe_kernel_write. Walking mm requires semaphore, so we would only
> >be able to do it in task_work and there we can do normal copy_to_user
> >just as well, but it will complicate the programs quite a bit, since
> >writes will be asynchronous and batched.
> >Looks like with access_ok+probe_write we can achieve the same
> >with a lot less code.
> 
> I believe it may not quite be the same as it currently stands. No
> fundamental objection, just that this needs to be made "safe" to the
> 

[PATCH 1/1] ixgbevf: avoid checking hang when performing hardware reset

2016-07-19 Thread zyjzyj2000
From: Zhu Yanjun 

When performing hardware reset, it is not necessary to check hang.
Or else, the call trace will appear.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  | 1 +
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 5 +++--
 drivers/net/ethernet/intel/ixgbevf/vf.c   | 5 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index d5944c3..60fc63b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -446,6 +446,7 @@ enum ixbgevf_state_t {
__IXGBEVF_SERVICE_INITED,
__IXGBEVF_RESET_REQUESTED,
__IXGBEVF_QUEUE_RESET_REQUESTED,
+   __IXGBEVF_HW_RESETTING,
 };
 
 enum ixgbevf_boards {
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index acc2401..530005b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2795,9 +2795,10 @@ static void ixgbevf_check_hang_subtask(struct 
ixgbevf_adapter *adapter)
u32 eics = 0;
int i;
 
-   /* If we're down or resetting, just bail */
+   /* If we're down, resetting or hw resetting, just bail */
if (test_bit(__IXGBEVF_DOWN, >state) ||
-   test_bit(__IXGBEVF_RESETTING, >state))
+   test_bit(__IXGBEVF_RESETTING, >state) ||
+   test_bit(__IXGBEVF_HW_RESETTING, >state))
return;
 
/* Force detection of hung controller */
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c 
b/drivers/net/ethernet/intel/ixgbevf/vf.c
index e670d3b..4ec4484 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -80,6 +80,9 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
s32 ret_val = IXGBE_ERR_INVALID_MAC_ADDR;
u32 msgbuf[IXGBE_VF_PERMADDR_MSG_LEN];
u8 *addr = (u8 *)([1]);
+   struct ixgbevf_adapter *adapter = hw->back;
+
+   set_bit(__IXGBEVF_HW_RESETTING, >state);
 
/* Call adapter stop to disable tx/rx and clear interrupts */
hw->mac.ops.stop_adapter(hw);
@@ -128,6 +131,8 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
 
hw->mac.mc_filter_type = msgbuf[IXGBE_VF_MC_TYPE_WORD];
 
+   clear_bit(__IXGBEVF_HW_RESETTING, >state);
+
return 0;
 }
 
-- 
2.7.4



[no subject]

2016-07-19 Thread zyjzyj2000

v1->v2:
Follow the advice from Donald, replacing read directly from RSTD and 
RSTI register with a state variable __IXGBEVF_HW_RESETTING;



Re: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread David Miller
From: Dexuan Cui 
Date: Wed, 20 Jul 2016 01:48:18 +

>> From: kbuild test robot [mailto:l...@intel.com]
>> Sent: Wednesday, July 20, 2016 1:10
>> 
>> Hi,
>> 
>> [auto build test WARNING on net-next/master]
>> 
>> url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-
>> Hyper-V-VM-Sockets-hv_sock/20160715-223433
>> config: x86_64-randconfig-a0-07191719 (attached as .config)
>> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
>> reproduce:
>> # save the attached .config to linux build tree
>> make ARCH=x86_64
>> 
>> All warnings (new ones prefixed by >>):
>> 
>>net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
>>net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in
>> this function
>>net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used
>> uninitialized in this function
>>net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized
>> in this function
>>net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
>>net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in
>> this function
>> >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d:
>> function has unreachable instruction
> 
> These warnings are all false alarms.

But you still have to quiet them.


Re: [PATCH net-next v3 00/12] net: dsa: mv88e6xxx: Global2 cleanup and STP

2016-07-19 Thread David Miller
From: Vivien Didelot 
Date: Mon, 18 Jul 2016 20:45:28 -0400

> The Marvell switches registers are organized in distinct internal SMI
> devices, such as PHY, Port, Global 1 or Global 2 registers sets.
> 
> Since not all chips support every registers sets or have slightly
> differences in them (such as old 88E6060 or new 88E6390 likely to be
> supported soon), make the setup code clearer now by removing a few
> family checks and adding flags to describe the Global 2 registers map.
> 
> This patchset enables basic STP support and bridging on most chips when
> getting rid of a few inconsistencies in chip descriptions (patch 1) and
> add bridge Ageing Time support to DSA and the mv88e6xxx driver.

Series applied.


Re: [patch 1/1] kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union initialization bug

2016-07-19 Thread David Miller
From: a...@linux-foundation.org
Date: Mon, 18 Jul 2016 15:50:58 -0700

> From: Andrew Morton 
> Subject: kernel/trace/bpf_trace.c: work around gcc-4.4.4 anon union 
> initialization bug
> 
> kernel/trace/bpf_trace.c: In function 'bpf_event_output':
> kernel/trace/bpf_trace.c:312: error: unknown field 'next' specified in 
> initializer
> kernel/trace/bpf_trace.c:312: warning: missing braces around initializer
> kernel/trace/bpf_trace.c:312: warning: (near initialization for 
> 'raw.frag.')
> 
> Fixes: 555c8a8623a3a87 ("bpf: avoid stack copy and use skb ctx for event 
> output")
> Acked-by: Daniel Borkmann 
> Cc: Alexei Starovoitov 
> Cc: David S. Miller 
> Signed-off-by: Andrew Morton 

Applied.


Re: [PATCH resend] virtio-net: Remove more stack DMA

2016-07-19 Thread David Miller
From: Andy Lutomirski 
Date: Mon, 18 Jul 2016 15:34:49 -0700

> VLAN and MQ control was doing DMA from the stack.  Fix it.
> 
> Cc: Michael S. Tsirkin 
> Cc: "netdev@vger.kernel.org" 
> Signed-off-by: Andy Lutomirski 
> ---
> 
> I tested VLAN addition and removal with CONFIG_VMAP_STACK=y,
> CONFIG_DEBUG_SG=y and it got rid of the warnings I saw.  I haven't
> tested the MQ part because I don't know how to enable it in the first
> place (I'm guessing it needs me to enable some QEMU feature I don't
> know about.)
> 
> DaveM, contrary to what I thought last time I sent this, I think this
> should go through net-next as long as it makes it in time for 4.8.

Applied to net-next, thanks.


Re: [PATCH net] bnxt_en: Remove locking around txr->dev_state

2016-07-19 Thread David Miller
From: Florian Fainelli 
Date: Mon, 18 Jul 2016 13:02:47 -0700

> txr->dev_state was not consistently manipulated with the acquisition of
> the per-queue lock, after further inspection the lock does not seem
> necessary, either the value is read as BNXT_DEV_STATE_CLOSING or 0.
> 
> Reported-by: coverity (CID 1339583)
> Fixes: c0c050c58d840 ("bnxt_en: New Broadcom ethernet driver.")
> Signed-off-by: Florian Fainelli 
> ---
> Changes in v2:
> 
> - remove locking in bnxt_tx_disable() as recommended by Michael

Applied to net-next, thanks.


linux-next: manual merge of the net-next tree with the net tree

2016-07-19 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/sctp/input.c

between commit:

  c74bfbdba0e8 ("sctp: load transport header after sk_filter")

from the net tree and commit:

  3acb50c18d8d ("sctp: delay as much as possible skb_linearize")

from the net-next tree.

I fixed it up (I just used the net-next version) and can carry the fix
as necessary. This is now fixed as far as linux-next is concerned, but
any non trivial conflicts should be mentioned to your upstream
maintainer when your tree is submitted for merging.  You may also want
to consider cooperating with the maintainer of the conflicting tree to
minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


linux-next: manual merge of the net-next tree with the net tree

2016-07-19 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/sched/sch_htb.c

between commit:

  338ed9b4de57 ("net_sched: sch_htb: export class backlog in dumps")

from the net tree and commit:

  0564bf0afae4 ("net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc net/sched/sch_htb.c
index 052f84d6cc23,91982d9784b3..
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@@ -1136,18 -1113,22 +1113,24 @@@ static in
  htb_dump_class_stats(struct Qdisc *sch, unsigned long arg, struct gnet_dump 
*d)
  {
struct htb_class *cl = (struct htb_class *)arg;
+   struct gnet_stats_queue qs = {
+   .drops = cl->drops,
+   };
__u32 qlen = 0;
  
-   if (!cl->level && cl->un.leaf.q)
+   if (!cl->level && cl->un.leaf.q) {
qlen = cl->un.leaf.q->q.qlen;
+   qs.backlog = cl->un.leaf.q->qstats.backlog;
+   }
 -  cl->xstats.tokens = PSCHED_NS2TICKS(cl->tokens);
 -  cl->xstats.ctokens = PSCHED_NS2TICKS(cl->ctokens);
 +  cl->xstats.tokens = clamp_t(s64, PSCHED_NS2TICKS(cl->tokens),
 +  INT_MIN, INT_MAX);
 +  cl->xstats.ctokens = clamp_t(s64, PSCHED_NS2TICKS(cl->ctokens),
 +   INT_MIN, INT_MAX);
  
-   if (gnet_stats_copy_basic(d, NULL, >bstats) < 0 ||
+   if (gnet_stats_copy_basic(qdisc_root_sleeping_running(sch),
+ d, NULL, >bstats) < 0 ||
gnet_stats_copy_rate_est(d, NULL, >rate_est) < 0 ||
-   gnet_stats_copy_queue(d, NULL, >qstats, qlen) < 0)
+   gnet_stats_copy_queue(d, NULL, , qlen) < 0)
return -1;
  
return gnet_stats_copy_app(d, >xstats, sizeof(cl->xstats));


Re: [kbuild-all] [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread Fengguang Wu

Yes, sorry for the noises!


>> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d:
function has unreachable instruction


These warnings are all false alarms.

Thanks,
-- Dexuan
___
kbuild-all mailing list
kbuild-...@lists.01.org
https://lists.01.org/mailman/listinfo/kbuild-all


RE: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread Dexuan Cui
> From: kbuild test robot [mailto:l...@intel.com]
> Sent: Wednesday, July 20, 2016 1:10
> 
> Hi,
> 
> [auto build test WARNING on net-next/master]
> 
> url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-
> Hyper-V-VM-Sockets-hv_sock/20160715-223433
> config: x86_64-randconfig-a0-07191719 (attached as .config)
> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
> 
> All warnings (new ones prefixed by >>):
> 
>net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
>net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in
> this function
>net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used
> uninitialized in this function
>net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized
> in this function
>net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
>net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in
> this function
> >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d:
> function has unreachable instruction

These warnings are all false alarms.

Thanks,
-- Dexuan


答复: [PATCH 1/1] netfilter: Add helper array register/unregister functions

2016-07-19 Thread 高峰
Oh, thanks Liping.
I have not found the extra port styles are different of irc, sane and tftp with 
ftp.

Hi Pablo,
Then should I modify the original patch or send a new one?


-邮件原件-
发件人: Liping Zhang [mailto:zlpnob...@gmail.com] 
发送时间: 2016年7月20日 8:51
收件人: f...@ikuai8.com
抄送: Pablo Neira Ayuso ; Patrick McHardy ; 
netfilter-de...@vger.kernel.org; netdev@vger.kernel.org; 
linux-ker...@vger.kernel.org; gfree.w...@gmail.com
主题: Re: [PATCH 1/1] netfilter: Add helper array register/unregister functions

2016-07-18 11:39 GMT+08:00  :
> From: Gao Feng 
>
> Add nf_ct_helper_init, nf_conntrack_helpers_register/unregister
> functions to enhance the conntrack helper codes.

I think this patch is breaking something ...

This irc:
> -   if (ports[i] == IRC_PORT)
> -   sprintf(irc[i].name, "irc");
> -   else
> -   sprintf(irc[i].name, "irc-%u", i);
> -
> -   ret = nf_conntrack_helper_register([i]);
> +   nf_ct_helper_init([i], AF_INET, IPPROTO_TCP, "irc",
> + IRC_PORT, ports[i], _exp_policy, 0, 0,
> + help, NULL, THIS_MODULE);
> +   }

This sip:
> -   if (ports[i] == SIP_PORT)
> -   sprintf(sip[i][j].name, "sip");
> -   else
> -   sprintf(sip[i][j].name, "sip-%u", i);

And this tftp:
> -   if (ports[i] == TFTP_PORT)
> -   sprintf(tftp[i][j].name, "tftp");
> -   else
> -   sprintf(tftp[i][j].name, "tftp-%u", i);

For example, if the user install the nf_conntrack_tftp module an specify the 
ports to "69,10069", then the helper name is "tftp" and "tftp-1".

But apply this patch, the helper name will be changed to "tftp" and 
"tftp-10069", this may break the existing iptables rules which used the helper 
match or CT target.

And this was already discussed  at https://patchwork.ozlabs.org/patch/622238/



Re: [PATCH 1/1] netfilter: Add helper array register/unregister functions

2016-07-19 Thread Liping Zhang
2016-07-18 11:39 GMT+08:00  :
> From: Gao Feng 
>
> Add nf_ct_helper_init, nf_conntrack_helpers_register/unregister
> functions to enhance the conntrack helper codes.

I think this patch is breaking something ...

This irc:
> -   if (ports[i] == IRC_PORT)
> -   sprintf(irc[i].name, "irc");
> -   else
> -   sprintf(irc[i].name, "irc-%u", i);
> -
> -   ret = nf_conntrack_helper_register([i]);
> +   nf_ct_helper_init([i], AF_INET, IPPROTO_TCP, "irc",
> + IRC_PORT, ports[i], _exp_policy, 0, 0,
> + help, NULL, THIS_MODULE);
> +   }

This sip:
> -   if (ports[i] == SIP_PORT)
> -   sprintf(sip[i][j].name, "sip");
> -   else
> -   sprintf(sip[i][j].name, "sip-%u", i);

And this tftp:
> -   if (ports[i] == TFTP_PORT)
> -   sprintf(tftp[i][j].name, "tftp");
> -   else
> -   sprintf(tftp[i][j].name, "tftp-%u", i);

For example, if the user install the nf_conntrack_tftp module an
specify the ports to "69,10069",
then the helper name is "tftp" and "tftp-1".

But apply this patch, the helper name will be changed to "tftp" and
"tftp-10069", this may break the
existing iptables rules which used the helper match or CT target.

And this was already discussed  at https://patchwork.ozlabs.org/patch/622238/


Re: [PATCH net-next 1/1] net_sched: Introduce skbmod action

2016-07-19 Thread Daniel Borkmann

On 07/19/2016 08:04 PM, Cong Wang wrote:

On Tue, Jul 19, 2016 at 8:03 AM, Daniel Borkmann  wrote:

On 07/19/2016 03:56 PM, Jamal Hadi Salim wrote:
[...]


But apart from this,
neither pedit nor tcf_skbmod_run() here handle checksum complete, so
you'll
potentially get false positives wrt csum corruption and drops as a result
when using either of the two.


pedit maybe tricky. Any suggestions?
On tcf_skbmod_run, mostly ignorance: while doing only ethernet updates;
is it still needed to do the checksum complete?


Well, what Cong recently fixed with mirred was related to mac header ...

You probably need skb_postpull_rcsum(), skb_postpush_rcsum() pair.

Also, what about skb_try_make_writable()?


I don't think so. 1) checksum is supposed to be done by csum action
rather than pedit (or skbmod if it matters), 2) csum action currently
already handles that correctly for both egress and ingress: 2a)
CHECKSUM_COMPLETE is meaningless on egress; 2b) it forces
CHECKSUM_COMPLETE to be CHECKSUM_NONE on ingress
and it is correctly respected.


Ahh, right, so it redoes entire csums when worst-case changing one byte
only, since it cannot know what other actions like pedit did previously.
But it also means for your l2 mac addr changes to force a CHECKSUM_COMPLETE
into CHECKSUM_NONE with csum action that you need to call into one of the
l3/l4 helpers as far as I see.


Re: [ovs-dev] [PATCH net-next v11 5/6] openvswitch: add layer 3 flow/port support

2016-07-19 Thread Simon Horman
On Mon, Jul 18, 2016 at 03:34:52PM -0700, pravin shelar wrote:
> On Sun, Jul 17, 2016 at 9:50 PM, Simon Horman
>  wrote:
> > [CC Jiri Benc for portion regarding GRE]
> >
> > Hi Pravin,
> >
> > On Fri, Jul 15, 2016 at 02:07:37PM -0700, pravin shelar wrote:
> >> On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman
> >>  wrote:
> >> > Hi Pravin,
> >> >
> >> > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote:
> >> >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman
> >> >>  wrote:
> >> >
> >> > ...
> >>
> >> >
> >> >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> >> >> > index 0ea128eeeab2..86f2cfb19de3 100644
> >> >> > --- a/net/openvswitch/flow.c
> >> >> > +++ b/net/openvswitch/flow.c
> >> >> ...
> >> >>
> >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct 
> >> >> > ip_tunnel_info *tun_info,
> >> >> > key->phy.skb_mark = skb->mark;
> >> >> > ovs_ct_fill_key(skb, key);
> >> >> > key->ovs_flow_hash = 0;
> >> >> > +   key->phy.is_layer3 = skb->mac_len == 0;
> >> >>
> >> >> I do not think mac_len can be used. mac_header needs to be checked.
> >> >> ...
> >> >
> >> > Yes, indeed. The update to use skb_mac_header_was_set() here accidently
> >> > slipped into the following patch, sorry about that.
> >> >
> >> > With that change in place I believe that this patch is internally
> >> > consistent because mac_header and mac_len are set correctly by the
> >> > call to key_extract() which is called by ovs_flow_key_extract() just
> >> > after where the excerpt above ends.
> >> >
> >> > That said, I do think that it is possible to rely on 
> >> > skb_mac_header_was_set
> >> > throughout the datapath, including action processing etc... I have 
> >> > provided
> >> > an incremental patch - which I created on top of this entire series - at
> >> > the end of this email. If you prefer that approach I am happy to take it,
> >> > though I do feel that using mac_len leads to slightly cleaner code. Let 
> >> > me
> >> > know what you think.
> >> >
> >>
> >>
> >> I am not sure if you can use only mac_len to detect L3 packet. This
> >> does not work with MPLS packets, mac_len is used to account MPLS
> >> headers pushed on skb. Therefore in case of a MPLS header on L3
> >> packet, mac_len would be non zero and we have to look at either
> >> mac_header or some other metadata like is_layer3 flag from key to
> >> check for L3 packet.
> >
> > At least within OvS mac_len does not include the length of the MPLS label
> > stack. Rather, the MPLS label stack length is the difference between the
> > end of (mac_header + mac_len) and network_header.
> >
> > So I think that the scheme does work as mac_len is 0 if there is no L2
> > header regardless of if an MPLS label stack is present or not.
> >
> 
> I was thinking in overall networking stack rather than just ovs
> datapath. I think we should have consistent method of detecting L3
> packet. As commented in previous mail it could be achieved using
> skb-protocol and device type.

This is somewhat of a surprise to me. As far as I recall when MPLS support
was added to OvS it and the accompanying support for MPLS GSO was the only
MPLS support present in the kernel. And at the time the scheme developed by
Jesse Gross, myself and others was as I describe above.

Internally OvS relies on this scheme and in particular it is used
by skb_mpls_header() to calculate the beginning of the MPLS label stack
accurately in the presence of VLAN tags.

Is it mpls_gso_segment() that you are concerned about?
If so, perhaps the problem could be addressed there.


Re: [PATCH net] net: switchdev: change ageing_time type to clock_t

2016-07-19 Thread David Miller
From: Vivien Didelot 
Date: Mon, 18 Jul 2016 15:02:06 -0400

> The switchdev value for the SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME
> attribute is a clock_t and requires to use helpers such as
> clock_t_to_jiffies() to convert to milliseconds.
> 
> Change ageing_time type from u32 to clock_t to make it explicit.
> 
> Fixes: f55ac58ae64c ("switchdev: add bridge ageing_time attribute")
> Signed-off-by: Vivien Didelot 

Applied.


Re: [PATCH 1/1] Update maintainer for EHEA driver.

2016-07-19 Thread David Miller
From: Douglas Miller 
Date: Mon, 18 Jul 2016 12:28:45 -0500

> Since Thadeu left IBM, EHEA has gone mostly unmaintained, since his email
> address doesn't work anymore.  I'm stepping up to help maintain this
> driver upstream.
> 
> I'm adding Thadeu's personal e-mail address in Cc, hoping that we can
> get his ack.
> 
> CC: Thadeu Lima de Souza Cascardo 
> Signed-off-by: Douglas Miller 

Applied.


Re: [PATCH net 0/2] Safe flow for mlx4_en configuration change

2016-07-19 Thread David Miller
From: Tariq Toukan 
Date: Mon, 18 Jul 2016 18:35:10 +0300

> This patchset improves the mlx4_en driver resiliency, especially on
> systems with low memory.  Upon a configuration change that requires
> the allocation of new resources, we first try to allocate, prior to
> destroying the current ones.  Once it is successfully done,
> we release the old resources and attach the new ones.  Otherwise, we
> stay with a functioning interface having the same old configuration.
> 
> This improvement became of greater significance after removing the use
> of vmap.

Series applied, thanks.


Re: [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso'

2016-07-19 Thread David Miller
From: Shmulik Ladkani 
Date: Mon, 18 Jul 2016 14:49:32 +0300

> Currently IP fragmentation of GSO segments that exceed dst mtu is
> considered only in the ipv4 forwarding case.
> 
> There are cases where GSO skbs that are bridged and then udp-tunneled
> may have gso_size exceeding the egress device mtu.
> It makes sense to fragment them, as in the non GSOed code path.
> 
> The exact cases where this behavior is needed is described and addressed
> in the 2nd patch.

Series applied, thanks.


Re: [PATCH net-next v2 07/10] net/faraday: Read MAC address from chip

2016-07-19 Thread Gavin Shan
On Tue, Jul 19, 2016 at 08:57:53PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2016-07-19 at 10:50 +, David Laight wrote:
>> > + if (!is_valid_ether_addr(mac)) {
>> > + mac[5] = (m >> 8) & 0xff;
>> > + mac[4] = m & 0xff;
>> > + mac[3] = (l >> 24) & 0xff;
>> > + mac[2] = (l >> 16) & 0xff;
>> > + mac[1] = (l >>  8) & 0xff;
>> > + mac[0] = l & 0xff;
>> > + }
>> ...
>> 
>> That is horrid, not all byte reversed addresses will be invalid.
>
>Right, that's just a hack for a broken vendor uboot we had here, Gavin,
>drop that part of the patch please.
>

Sure, I'll drop it in v4 or a followup patch. v3 is being reviewed this
moment.

Thanks,
Gavin



Re: Network hang after c3f1010b30f7fc611139cfb702a8685741aa6827 with CIPSO & Smack

2016-07-19 Thread Casey Schaufler
On 7/6/2016 11:56 AM, Casey Schaufler wrote:
> On 7/6/2016 11:43 AM, David Ahern wrote:
>> On 7/6/16 11:01 AM, Casey Schaufler wrote:
>>> I find the test occasionally passes without hanging, but will
>>> hang the system if repeated. I am running on Ubuntu and Fedora,
>>> both with systemd, which may be a contributing factor. I run
>>> under qemu, and am based on Linus' tree.
>>>
>> With this:
>>
>> for n in $(seq 1 10); do
>> bash -x ./testnetworking.sh
>> sleep 10
>> done
>>
>> I do get the VM to loop where I can not kill the test. dmesg has this splat:
>>
>> [ 3576.504715] general protection fault:  [#21] SMP
>> [ 3576.505322] Modules linked in: 8021q garp mrp stp llc
>> [ 3576.506007] CPU: 3 PID: 2938 Comm: killall Tainted: G  D 4.7.0-rc5+ 
>> #20
>> [ 3576.506881] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.7.5-20140531_083030-gandalf 04/01/2014
>> [ 3576.508048] task: 8800b4e72340 ti: 880138a48000 task.ti: 
>> 880138a48000
>> [ 3576.508894] RIP: 0010:[]  [] 
>> next_tgid+0x53/0x99
>> [ 3576.509803] RSP: 0018:880138a4bde8  EFLAGS: 00010206
>> [ 3576.510410] RAX: 4100646e4100608e RBX: 07f2 RCX: 
>> 8800b98c9bb0
>> [ 3576.511218] RDX: 4100646e4100608e RSI: 03e0 RDI: 
>> 8800b98c9b80
>> [ 3576.512024] RBP: 880138a4be10 R08: 0032 R09: 
>> 
>> [ 3576.512833] R10:  R11: 0200 R12: 
>> 07e5
>> [ 3576.513647] R13: 8800b98c9b80 R14: 81a27900 R15: 
>> 07e4
>> [ 3576.514453] FS:  7fc084469700() GS:88013fd8() 
>> knlGS:
>> [ 3576.515361] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 3576.516009] CR2: 01947000 CR3: b5449000 CR4: 
>> 000406e0
>> [ 3576.516818] Stack:
>> [ 3576.517057]  07e5  880138a4bee0 
>> 8800b982a140
>> [ 3576.517963]  81a27900 880138a4be68 81187090 
>> 8800b1d9d300
>> [ 3576.518857]  003032303201 880138a4bee0 880138a4bee0 
>> 
>> [ 3576.519754] Call Trace:
>> [ 3576.520044]  [] proc_pid_readdir+0xd4/0x18b
>> [ 3576.520697]  [] proc_root_readdir+0x35/0x3a
>> [ 3576.521352]  [] iterate_dir+0xac/0x148
>> [ 3576.521966]  [] ? __fget_light+0x27/0x48
>> [ 3576.522587]  [] SyS_getdents+0x8a/0xdc
>> [ 3576.523189]  [] ? fillonedir+0xc7/0xc7
>> [ 3576.523794]  [] entry_SYSCALL_64_fastpath+0x1a/0xa4
>> [ 3576.524524]  [] ? entry_SYSCALL_64_fastpath+0x1a/0xa4
>> [ 3576.525276] Code: d6 aa ed ff 48 85 c0 49 89 c5 74 40 4c 89 f6 48 89 c7 
>> e8 8b a2 ed ff 31 f6 4c 89 ef 89 c3 e8 15 a2 ed ff 48 85 c0 48 89 c2 74 17 
>> <48> 8b 80 78 05 00 00 48 8b 80 c8 00 00 00 48 39 82 f0 03 00 00
>> [ 3576.528359] RIP  [] next_tgid+0x53/0x99
>> [ 3576.528991]  RSP 
>> [ 3576.529452] ---[ end trace a6f0cb9bfb70d9e6 ]---
>>
>> And then I can no longer run commands:
>>
>> root@kenny-jessie3:~# top -d1
>> Segmentation fault
>>
> My thought is that there's a locking issue on a resource
> somewhere in the TCP stack, and that a freed but still in
> use buffer is getting into the filesystem code somehow.

Digging into this further I have determined that the
circumstances leading to this issue are somewhat complex.
The good news is that there seems to be a very limited
circumstances under which the problem manifests.

I have a socket, and change the Smack attributes on the
socket (security_inode_setsecurity) before connecting to
a server. The connect succeeds. The client sends a packet,
also successfully. The response is received. Now here's
where it gets interesting. I instrumented the code to print
the Smack attributes on the socket both before and after
the Smack access check. Before the check is made the Smack
data reflects the initial values from when the socket was
created. After the check, they reflect the explicit change
made earlier. The check reports failure based on the initial
values. As a result, an attempt to notify the caller that
the action failed is made (netlbl_skbuff_err) which results
in a call to icmp_send that frees already freed memory.

If the Smack attributes in the sk_security blob are not
explicitly set the problem does not occur. I have the same
result if I change the Smack attributes within the socket
security blob as I do if I replace the security blob.



Re: [PATCH net-next 0/6] bnxt_en: Add support for NS2 Nitro.

2016-07-19 Thread David Miller
From: Michael Chan 
Date: Mon, 18 Jul 2016 07:15:19 -0400

> This series adds support for the embedded version of the
> ethernet controller (Nitro) in the North Star 2 SoC.  There are a number
> of features not supported and a software workaround for a hardware rx
> bug is required for Nitro A0.  Please review.

Series applied, thanks.


Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)

2016-07-19 Thread Daniel Borkmann

On 07/19/2016 06:34 PM, Alexei Starovoitov wrote:

On Tue, Jul 19, 2016 at 01:17:53PM +0200, Daniel Borkmann wrote:

+   return -EINVAL;
+
+   /* Is this a user address, or a kernel address? */
+   if (!access_ok(VERIFY_WRITE, to, size))
+   return -EINVAL;
+
+   return probe_kernel_write(to, from, size);


I'm still worried that this can lead to all kind of hard to find
bugs or races for user processes, if you make this writable to entire
user address space (which is the only thing that access_ok() checks
for). What if the BPF program has bugs and writes to wrong addresses
for example, introducing bugs in some other, non-intended processes
elsewhere? Can this be limited to syscalls only? And if so, to the
passed memory only?


my understanding that above code will write only to memory of current process,
so impact is contained and in that sense buggy kprobe program is no different
from buggy seccomp prorgram.


Compared to seccomp, you might not notice that a race has happened,
in seccomp case you might have killed your process, which is visible.
But ok, in ptrace() case it might be similar issue perhaps ...

The asm-generic version does __access_ok(..) { return 1; } for nommu
case, I haven't checked closely enough whether there's actually an arch
that uses this, but for example arm nommu with only one addr space would
certainly result in access_ok() as 1, and then you could also go into
probe_kernel_write(), no?

Don't know that code well enough, but I believe the check would only
ensure in normal use-cases that user process doesn't fiddle with kernel
address space, but not necessarily guarantee that this really only
belongs to the process address space.

x86 code comments this with "note that, depending on architecture,
this function probably just checks that the pointer is in the user
space range - after calling this function, memory access functions may
still return -EFAULT".

Also, what happens in case of kernel thread?

As it stands, it does ...

if (unlikely(in_interrupt()))
return -EINVAL;
if (unlikely(!task || !task->pid))
return -EINVAL;

So up to here, irq/sirq, NULL current and that current is not the 'idle'
process is being checked (still fail to see the point for the !task->pid,
I believe the intend here is different).

/* Is this a user address, or a kernel address? */
if (!access_ok(VERIFY_WRITE, to, size))
return -EINVAL;

Now here. What if it's a kernel thread? You'll have KERNEL_DS segment,
task->pid was non-zero as well for the kthread, so access_ok() will
pass and you can still execute probe_kernel_write() ...


Limiting this to syscalls will make it too limited.
I'm in favor of this change, because it allows us to experiment
with restartable sequences and lock-free algorithms that need ultrafast
access to cpuid without burdening the kernel with stable abi.


Have you played around with ptrace() to check whether you could
achieve similar functionality (was thinking about things like [1],
PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't
this be limited to a similar functionality for only the current task.
ptrace() utilizes helpers like access_process_vm(), maybe this can
similarly be adapted here, too (under the circumstances that sleeping
is not allowed)?


If we hack access_process_vm I think at the end it will look like
probe_kernel_write. Walking mm requires semaphore, so we would only
be able to do it in task_work and there we can do normal copy_to_user
just as well, but it will complicate the programs quite a bit, since
writes will be asynchronous and batched.
Looks like with access_ok+probe_write we can achieve the same
with a lot less code.


I believe it may not quite be the same as it currently stands. No
fundamental objection, just that this needs to be made "safe" to the
limits you state above yourself. ;)


As far as races between user and bpf program, yeah, if process
is multithreaded, the other threads may access the same mem that
bpf is writing to, but that's no different from reading.
For tracing we walk complex datastructures and pointers. They
can be changed by user space on the fly and bpf will see garbage.
Like we have uprobe+bpf that walks clang c++ internal datastructures
to figure out how long it takes clang to process .h headers.
There is a lot of fragility in the bpf script, yet it's pretty
useful to quickly analyze compile times.
I see bpf_copy_to_user to be hugely valuable too, not as a stable
interface, but rather as a tool to quickly debug and experiment.


Right, so maybe there should be a warn once to the dmesg log that this
is just experimental?

Thanks,
Daniel


Re: [PATCH v6 0/4] Marvell phy: fiber interface configuration

2016-07-19 Thread David Miller
From: Charles-Antoine Couret 
Date: Tue, 19 Jul 2016 11:13:09 +0200

> Another patchset to manage correctly the fiber link for some concerned 
> Marvell's
> phy like 88E1512.
> 
> This patchset fixed the commit log for the third and last commits and a 
> comment
> in the first commit.

Series applied to net-next, thanks.


Re: [PATCH 0/2] Fix DMA channel misreporting for the Renesas Ethernet drivers

2016-07-19 Thread David Miller
From: Sergei Shtylyov 
Date: Sun, 17 Jul 2016 16:06:24 +0300

>Here's a set of 2 patches against DaveM's 'net.git' repo fixing
> up the DMA channel reporting by 'ifconfig'...

Seires applied to net-next.


Re: [PATCH v10 12/12] bpf: add sample for xdp forwarding and rewrite

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 12:16:57PM -0700, Brenden Blanco wrote:
> Add a sample that rewrites and forwards packets out on the same
> interface. Observed single core forwarding performance of ~10Mpps.
> 
> Since the mlx4 driver under test recycles every single packet page, the
> perf output shows almost exclusively just the ring management and bpf
> program work. Slowdowns are likely occurring due to cache misses.

long term we need to resurrect your prefetch patch. sux to leave
so much performance on the table.

> +static int parse_ipv4(void *data, u64 nh_off, void *data_end)
> +{
> + struct iphdr *iph = data + nh_off;
> +
> + if (iph + 1 > data_end)
> + return 0;
> + return iph->protocol;
> +}
> +
> +static int parse_ipv6(void *data, u64 nh_off, void *data_end)
> +{
> + struct ipv6hdr *ip6h = data + nh_off;
> +
> + if (ip6h + 1 > data_end)
> + return 0;
> + return ip6h->nexthdr;
> +}
...
> + if (h_proto == htons(ETH_P_IP))
> + index = parse_ipv4(data, nh_off, data_end);
> + else if (h_proto == htons(ETH_P_IPV6))
> + index = parse_ipv6(data, nh_off, data_end);
> + else
> + index = 0;
> +
> + value = bpf_map_lookup_elem(, );
> + if (value)
> + *value += 1;
> +
> + if (index == 17) {

not an obvious xdp example. if you'd have to respin for other
reasons please consider 'proto' name and IPPROTO_UDP here.

Acked-by: Alexei Starovoitov 



Re: [PATCH v4 0/7] thunderbolt: Introducing Thunderbolt(TM) networking

2016-07-19 Thread Bjorn Helgaas
On Mon, Jul 18, 2016 at 01:00:33PM +0300, Amir Levy wrote:
> This is version 4 of Thunderbolt(TM) driver for non-Apple hardware.
> 
> Changes since v3:
>  - Moved new Thunderbolt device IDs from pci_ids.h to icm_nhi.h.
>  - Cleanup and added some comments in code.
> 
> These patches were pushed to GitHub where they can be reviewed more
> comfortably with green/red highlighting:
>   https://github.com/01org/thunderbolt-software-kernel-tree
> 
> Daemon code:
>   https://github.com/01org/thunderbolt-software-daemon
> 
> For reference, here's a link to version 3:
> [v3]: https://lkml.org/lkml/2016/7/14/311
> 
> Amir Levy (7):
>   thunderbolt: Macro rename
>   thunderbolt: Updating the register definitions
>   thunderbolt: Kconfig for Thunderbolt(TM) networking
>   thunderbolt: Communication with the ICM (firmware)
>   thunderbolt: Networking state machine
>   thunderbolt: Networking transmit and receive
>   thunderbolt: Networking doc
> 
>  Documentation/00-INDEX   |2 +
>  Documentation/thunderbolt-networking.txt |  135 ++
>  drivers/thunderbolt/Kconfig  |   25 +-
>  drivers/thunderbolt/Makefile |3 +-
>  drivers/thunderbolt/icm/Makefile |   28 +
>  drivers/thunderbolt/icm/icm_nhi.c| 1642 +
>  drivers/thunderbolt/icm/icm_nhi.h|   93 ++
>  drivers/thunderbolt/icm/net.c| 2276 
> ++
>  drivers/thunderbolt/icm/net.h|  274 
>  drivers/thunderbolt/nhi_regs.h   |  115 +-
>  10 files changed, 4585 insertions(+), 8 deletions(-)
>  create mode 100644 Documentation/thunderbolt-networking.txt
>  create mode 100644 drivers/thunderbolt/icm/Makefile
>  create mode 100644 drivers/thunderbolt/icm/icm_nhi.c
>  create mode 100644 drivers/thunderbolt/icm/icm_nhi.h
>  create mode 100644 drivers/thunderbolt/icm/net.c
>  create mode 100644 drivers/thunderbolt/icm/net.h

Andreas, I assume you'll handle this.


Re: [PATCH v10 11/12] bpf: enable direct packet data write for xdp progs

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 12:16:56PM -0700, Brenden Blanco wrote:
> For forwarding to be effective, XDP programs should be allowed to
> rewrite packet data.
> 
> This requires that the drivers supporting XDP must all map the packet
> memory as TODEVICE or BIDIRECTIONAL before invoking the program.
> 
> Signed-off-by: Brenden Blanco 
> ---
>  kernel/bpf/verifier.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a8d67d0..f72f23b 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -653,6 +653,16 @@ static int check_map_access(struct verifier_env *env, 
> u32 regno, int off,
>  
>  #define MAX_PACKET_OFF 0x
>  
> +static bool may_write_pkt_data(enum bpf_prog_type type)
> +{
> + switch (type) {
> + case BPF_PROG_TYPE_XDP:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
>  static int check_packet_access(struct verifier_env *env, u32 regno, int off,
>  int size)
>  {
> @@ -806,10 +816,15 @@ static int check_mem_access(struct verifier_env *env, 
> u32 regno, int off,
>   err = check_stack_read(state, off, size, value_regno);
>   }
>   } else if (state->regs[regno].type == PTR_TO_PACKET) {
> - if (t == BPF_WRITE) {
> + if (t == BPF_WRITE && !may_write_pkt_data(env->prog->type)) {
>   verbose("cannot write into packet\n");
>   return -EACCES;
>   }
> + if (t == BPF_WRITE && value_regno >= 0 &&
> + is_pointer_value(env, value_regno)) {
> + verbose("R%d leaks addr into packet\n", value_regno);
> + return -EACCES;
> + }

Like this extra security check :) though it's arguably overkill.

Acked-by: Alexei Starovoitov 



Re: [PATCH v10 08/12] bpf: add XDP_TX xdp_action for direct forwarding

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 12:16:53PM -0700, Brenden Blanco wrote:
> XDP enabled drivers must transmit received packets back out on the same
> port they were received on when a program returns this action.
> 
> Signed-off-by: Brenden Blanco 
> ---
>  include/uapi/linux/bpf.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index a517865..2b7076f 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -449,6 +449,7 @@ enum xdp_action {
>   XDP_ABORTED = 0,
>   XDP_DROP,
>   XDP_PASS,
> + XDP_TX,

Acked-by: Alexei Starovoitov 



Re: [PATCH v10 07/12] net/mlx4_en: add page recycle to prepare rx ring for tx support

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 12:16:52PM -0700, Brenden Blanco wrote:
> The mlx4 driver by default allocates order-3 pages for the ring to
> consume in multiple fragments. When the device has an xdp program, this
> behavior will prevent tx actions since the page must be re-mapped in
> TODEVICE mode, which cannot be done if the page is still shared.
> 
> Start by making the allocator configurable based on whether xdp is
> running, such that order-0 pages are always used and never shared.
> 
> Since this will stress the page allocator, add a simple page cache to
> each rx ring. Pages in the cache are left dma-mapped, and in drop-only
> stress tests the page allocator is eliminated from the perf report.
> 
> Note that setting an xdp program will now require the rings to be
> reconfigured.
> 
> Before:
>  26.91%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_process_rx_cq
>  17.88%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_alloc_frags
>   6.00%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_free_frag
>   4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
>   3.21%  swapper  [kernel.vmlinux]  [k] intel_idle
>   2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
>   2.57%  swapper  [mlx4_en] [k] mlx4_en_process_rx_cq
> 
> After:
>  31.72%  swapper  [kernel.vmlinux]   [k] intel_idle
>   8.79%  swapper  [mlx4_en]  [k] mlx4_en_process_rx_cq
>   7.54%  swapper  [kernel.vmlinux]   [k] poll_idle
>   6.36%  swapper  [mlx4_core][k] mlx4_eq_int
>   4.21%  swapper  [kernel.vmlinux]   [k] tasklet_action
>   4.03%  swapper  [kernel.vmlinux]   [k] cpuidle_enter_state
>   3.43%  swapper  [mlx4_en]  [k] mlx4_en_prepare_rx_desc
>   2.18%  swapper  [kernel.vmlinux]   [k] native_irq_return_iret
>   1.37%  swapper  [kernel.vmlinux]   [k] menu_select
>   1.09%  swapper  [kernel.vmlinux]   [k] bpf_map_lookup_elem
> 
> Signed-off-by: Brenden Blanco 
...
> +#define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
> +struct mlx4_en_page_cache {
> + u32 index;
> + struct mlx4_en_rx_alloc buf[MLX4_EN_CACHE_SIZE];
> +};

amazing that this tiny recycling pool makes such a huge difference.
Acked-by: Alexei Starovoitov 



Re: [PATCH v10 01/12] bpf: add bpf_prog_add api for bulk prog refcnt

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 12:16:46PM -0700, Brenden Blanco wrote:
> A subsystem may need to store many copies of a bpf program, each
> deserving its own reference. Rather than requiring the caller to loop
> one by one (with possible mid-loop failure), add a bulk bpf_prog_add
> api.
> 
> Signed-off-by: Brenden Blanco 
> ---
>  include/linux/bpf.h  |  1 +
>  kernel/bpf/syscall.c | 12 +---
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index c13e92b..75a5ae6 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -224,6 +224,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl);
>  
>  struct bpf_prog *bpf_prog_get(u32 ufd);
>  struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
> +struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i);
>  struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
>  void bpf_prog_put(struct bpf_prog *prog);
>  
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 96d938a..228f962 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -670,14 +670,20 @@ static struct bpf_prog *bpf_prog_get(struct fd f)
>   return f.file->private_data;
>  }
>  
> -struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
> +struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
>  {
> - if (atomic_inc_return(>aux->refcnt) > BPF_MAX_REFCNT) {
> - atomic_dec(>aux->refcnt);
> + if (atomic_add_return(i, >aux->refcnt) > BPF_MAX_REFCNT) {
> + atomic_sub(i, >aux->refcnt);
>   return ERR_PTR(-EBUSY);
>   }
>   return prog;
>  }
> +EXPORT_SYMBOL_GPL(bpf_prog_add);
> +
> +struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
> +{
> + return bpf_prog_add(prog, 1);
> +}

that extension turned out to be smaller than I thought :)
Acked-by: Alexei Starovoitov 



Re: [PATCH v10 06/12] Add sample for adding simple drop program to link

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 12:16:51PM -0700, Brenden Blanco wrote:
> Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
> hook of a link. With the drop-only program, observed single core rate is
> ~20Mpps.
> 
> Other tests were run, for instance without the dropcnt increment or
> without reading from the packet header, the packet rate was mostly
> unchanged.
> 
> $ perf record -a samples/bpf/xdp1 $( proto 17:   20403027 drops/s
> 
> ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
> Running... ctrl^C to stop
> Device: eth4@0
> Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags)
>   5056638pps 2427Mb/sec (2427186240bps) errors: 0
> Device: eth4@1
> Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags)
>   5133311pps 2463Mb/sec (2463989280bps) errors: 0
> Device: eth4@2
> Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags)
>   5077431pps 2437Mb/sec (2437166880bps) errors: 0
> Device: eth4@3
> Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags)
>   5043067pps 2420Mb/sec (2420672160bps) errors: 0
> 
> perf report --no-children:
>  26.05%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_process_rx_cq
>  17.84%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_alloc_frags
>   5.52%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_free_frag
>   4.90%  swapper  [kernel.vmlinux]  [k] poll_idle
>   4.14%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
>   2.78%  ksoftirqd/0  [kernel.vmlinux]  [k] __free_pages_ok
>   2.57%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
>   2.51%  swapper  [mlx4_en] [k] mlx4_en_process_rx_cq
>   1.94%  ksoftirqd/0  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
>   1.45%  swapper  [mlx4_en] [k] mlx4_en_alloc_frags
>   1.35%  ksoftirqd/0  [kernel.vmlinux]  [k] free_one_page
>   1.33%  swapper  [kernel.vmlinux]  [k] intel_idle
>   1.04%  ksoftirqd/0  [mlx4_en] [k] 0x0001c5c5
>   0.96%  ksoftirqd/0  [mlx4_en] [k] 0x0001c58d
>   0.93%  ksoftirqd/0  [mlx4_en] [k] 0x0001c6ee
>   0.92%  ksoftirqd/0  [mlx4_en] [k] 0x0001c6b9
>   0.89%  ksoftirqd/0  [kernel.vmlinux]  [k] __alloc_pages_nodemask
>   0.83%  ksoftirqd/0  [mlx4_en] [k] 0x0001c686
>   0.83%  ksoftirqd/0  [mlx4_en] [k] 0x0001c5d5
>   0.78%  ksoftirqd/0  [mlx4_en] [k] mlx4_alloc_pages.isra.23
>   0.77%  ksoftirqd/0  [mlx4_en] [k] 0x0001c5b4
>   0.77%  ksoftirqd/0  [kernel.vmlinux]  [k] net_rx_action
> 
> machine specs:
>  receiver - Intel E5-1630 v3 @ 3.70GHz
>  sender - Intel E5645 @ 2.40GHz
>  Mellanox ConnectX-3 @40G
> 
> Signed-off-by: Brenden Blanco 
...
> +int main(int ac, char **argv)
> +{
> + char filename[256];
> +
> + snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
> +
> + if (ac != 2) {
> + printf("usage: %s IFINDEX\n", argv[0]);
> + return 1;
> + }
> +
> + ifindex = strtoul(argv[1], NULL, 0);

great test. some future extension could be to use dev name instead of id.

Acked-by: Alexei Starovoitov 



Re: [PATCH v10 05/12] net/mlx4_en: add support for fast rx drop bpf program

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 12:16:50PM -0700, Brenden Blanco wrote:
> Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.
> 
> In tc/socket bpf programs, helpers linearize skb fragments as needed
> when the program touches the packet data. However, in the pursuit of
> speed, XDP programs will not be allowed to use these slower functions,
> especially if it involves allocating an skb.
> 
> Therefore, disallow MTU settings that would produce a multi-fragment
> packet that XDP programs would fail to access. Future enhancements could
> be done to increase the allowable MTU.
> 
> The xdp program is present as a per-ring data structure, but as of yet
> it is not possible to set at that granularity through any ndo.
> 
> Signed-off-by: Brenden Blanco 
...
> +static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> +{
> + struct mlx4_en_priv *priv = netdev_priv(dev);
> + struct bpf_prog *old_prog;
> + int xdp_ring_num;
> + int i;
> +
> + xdp_ring_num = prog ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0;
> +
> + if (priv->num_frags > 1) {
> + en_err(priv, "Cannot set XDP if MTU requires multiple frags\n");
> + return -EOPNOTSUPP;
> + }
> +
> + if (prog) {
> + prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
> + if (IS_ERR(prog))
> + return PTR_ERR(prog);
> + }
> +
> + priv->xdp_ring_num = xdp_ring_num;
> +
> + /* This xchg is paired with READ_ONCE in the fast path */
> + for (i = 0; i < priv->rx_ring_num; i++) {
> + old_prog = xchg(>rx_ring[i]->xdp_prog, prog);
> + if (old_prog)
> + bpf_prog_put(old_prog);
> + }

priv->xdp_ring_num looks similar priv->rx_ring_num, so on the first glance
it seemed that the per ring refactoring broke detach logic, but no. it's good.
Acked-by: Alexei Starovoitov 



Re: [PATCH v10 02/12] bpf: add XDP prog type for early driver filter

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 12:16:47PM -0700, Brenden Blanco wrote:
> Add a new bpf prog type that is intended to run in early stages of the
> packet rx path. Only minimal packet metadata will be available, hence a
> new context type, struct xdp_md, is exposed to userspace. So far only
> expose the packet start and end pointers, and only in read mode.
> 
> An XDP program must return one of the well known enum values, all other
> return codes are reserved for future use. Unfortunately, this
> restriction is hard to enforce at verification time, so take the
> approach of warning at runtime when such programs are encountered. Out
> of bounds return codes should alias to XDP_ABORTED.
> 
> Signed-off-by: Brenden Blanco 

Acked-by: Alexei Starovoitov 



Re: [Intel-wired-lan] [PATCH net] e1000e: fix PTP on e1000_pch_lpt variants

2016-07-19 Thread Rustad, Mark D

Jarod Wilson  wrote:


I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used
as a PTP slave experiences random ~10 hour clock jumps, which are resolved
if the same workaround for the 82574 and 82583 is employed. Switching from
an if to a select, because the list of NIC types could well grow further
and we'd already have to wrap the conditionals.

CC: Jeff Kirsher 
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c  
b/drivers/net/ethernet/intel/e1000e/netdev.c

index 2b2e2f8..866fea0 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4335,7 +4335,10 @@ static cycle_t e1000e_cyclecounter_read(const  
struct cyclecounter *cc)

systim = (cycle_t)systimel;
systim |= (cycle_t)systimeh << 32;

-   if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
+   switch (hw->mac.type) {
+   case e1000_82574:
+   case e1000_82583:
+   case e1000_pch_lpt:
u64 time_delta, rem, temp;
u32 incvalue;
int i;


I don't think that it is acceptable to declare local variables inside a  
switch statement quite like this. At a minimum, a new block needs to be  
opened to allow the declarations.


@@ -4360,6 +4363,9 @@ static cycle_t e1000e_cyclecounter_read(const  
struct cyclecounter *cc)

(rem == 0))
break;
}
+   break;
+   default:
+   break;
}
return systim;
 }


--
Mark Rustad, Networking Division, Intel Corporation


signature.asc
Description: Message signed with OpenPGP using GPGMail


[PATCH net] e1000e: fix PTP on e1000_pch_lpt variants

2016-07-19 Thread Jarod Wilson
I've got reports that the Intel I-218V NIC in Intel NUC5i5RYH systems used
as a PTP slave experiences random ~10 hour clock jumps, which are resolved
if the same workaround for the 82574 and 82583 is employed. Switching from
an if to a select, because the list of NIC types could well grow further
and we'd already have to wrap the conditionals.

CC: Jeff Kirsher 
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 2b2e2f8..866fea0 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4335,7 +4335,10 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
systim = (cycle_t)systimel;
systim |= (cycle_t)systimeh << 32;
 
-   if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
+   switch (hw->mac.type) {
+   case e1000_82574:
+   case e1000_82583:
+   case e1000_pch_lpt:
u64 time_delta, rem, temp;
u32 incvalue;
int i;
@@ -4360,6 +4363,9 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
(rem == 0))
break;
}
+   break;
+   default:
+   break;
}
return systim;
 }
-- 
1.8.3.1



Re: [PATCH] WiMAX-i2400m: Delete an unnecessary check before the function call "kfree_skb"

2016-07-19 Thread SF Markus Elfring
> From: Markus Elfring 
> Date: Mon, 16 Nov 2015 11:17:55 +0100
> 
> The kfree_skb() function tests whether its argument is NULL and then
> returns immediately. Thus the test around the call is not needed.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/net/wimax/i2400m/control.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wimax/i2400m/control.c 
> b/drivers/net/wimax/i2400m/control.c
> index 4c41790..4de22b7 100644
> --- a/drivers/net/wimax/i2400m/control.c
> +++ b/drivers/net/wimax/i2400m/control.c
> @@ -644,7 +644,7 @@ void i2400m_msg_to_dev_cancel_wait(struct i2400m *i2400m, 
> int code)
>  
>   spin_lock_irqsave(>rx_lock, flags);
>   ack_skb = i2400m->ack_skb;
> - if (ack_skb && !IS_ERR(ack_skb))
> + if (!IS_ERR(ack_skb))
>   kfree_skb(ack_skb);
>   i2400m->ack_skb = ERR_PTR(code);
>   spin_unlock_irqrestore(>rx_lock, flags);
> 

How do you think about to integrate this update suggestion
into another source code repository?

Regards,
Markus


[PATCH v10 06/12] Add sample for adding simple drop program to link

2016-07-19 Thread Brenden Blanco
Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
hook of a link. With the drop-only program, observed single core rate is
~20Mpps.

Other tests were run, for instance without the dropcnt increment or
without reading from the packet header, the packet rate was mostly
unchanged.

$ perf record -a samples/bpf/xdp1 $(
---
 samples/bpf/Makefile|   4 ++
 samples/bpf/bpf_load.c  |   8 +++
 samples/bpf/xdp1_kern.c |  93 +
 samples/bpf/xdp1_user.c | 181 
 4 files changed, 286 insertions(+)
 create mode 100644 samples/bpf/xdp1_kern.c
 create mode 100644 samples/bpf/xdp1_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index a98b780..0e4ab3a 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -21,6 +21,7 @@ hostprogs-y += spintest
 hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
+hostprogs-y += xdp1
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -42,6 +43,7 @@ spintest-objs := bpf_load.o libbpf.o spintest_user.o
 map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
+xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -64,6 +66,7 @@ always += test_overhead_tp_kern.o
 always += test_overhead_kprobe_kern.o
 always += parse_varlen.o parse_simple.o parse_ldabs.o
 always += test_cgrp2_tc_kern.o
+always += xdp1_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -84,6 +87,7 @@ HOSTLOADLIBES_offwaketime += -lelf
 HOSTLOADLIBES_spintest += -lelf
 HOSTLOADLIBES_map_perf_test += -lelf -lrt
 HOSTLOADLIBES_test_overhead += -lelf -lrt
+HOSTLOADLIBES_xdp1 += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on 
cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc 
CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 022af71..0cfda23 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -50,6 +50,7 @@ static int load_and_attach(const char *event, struct bpf_insn 
*prog, int size)
bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0;
+   bool is_xdp = strncmp(event, "xdp", 3) == 0;
enum bpf_prog_type prog_type;
char buf[256];
int fd, efd, err, id;
@@ -66,6 +67,8 @@ static int load_and_attach(const char *event, struct bpf_insn 
*prog, int size)
prog_type = BPF_PROG_TYPE_KPROBE;
} else if (is_tracepoint) {
prog_type = BPF_PROG_TYPE_TRACEPOINT;
+   } else if (is_xdp) {
+   prog_type = BPF_PROG_TYPE_XDP;
} else {
printf("Unknown event '%s'\n", event);
return -1;
@@ -79,6 +82,9 @@ static int load_and_attach(const char *event, struct bpf_insn 
*prog, int size)
 
prog_fd[prog_cnt++] = fd;
 
+   if (is_xdp)
+   return 0;
+
if (is_socket) {
event += 6;
if (*event != '/')
@@ -319,6 +325,7 @@ int load_bpf_file(char *path)
if (memcmp(shname_prog, "kprobe/", 7) == 0 ||
memcmp(shname_prog, "kretprobe/", 10) == 0 ||
memcmp(shname_prog, "tracepoint/", 11) == 0 ||
+   memcmp(shname_prog, "xdp", 3) == 0 ||
memcmp(shname_prog, "socket", 6) == 0)
load_and_attach(shname_prog, insns, 
data_prog->d_size);
}
@@ -336,6 +343,7 @@ int load_bpf_file(char *path)
if (memcmp(shname, "kprobe/", 7) == 0 ||
memcmp(shname, "kretprobe/", 10) == 0 ||
memcmp(shname, "tracepoint/", 11) == 0 ||
+   memcmp(shname, "xdp", 3) == 0 ||
memcmp(shname, "socket", 6) == 0)
load_and_attach(shname, data->d_buf, data->d_size);
}
diff --git a/samples/bpf/xdp1_kern.c b/samples/bpf/xdp1_kern.c
new file mode 100644
index 000..e7dd8ac
--- /dev/null
+++ b/samples/bpf/xdp1_kern.c
@@ -0,0 +1,93 @@
+/* Copyright (c) 2016 PLUMgrid
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") dropcnt = {
+   .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(long),
+   .max_entries = 256,
+};
+

[PATCH v10 05/12] net/mlx4_en: add support for fast rx drop bpf program

2016-07-19 Thread Brenden Blanco
Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.

In tc/socket bpf programs, helpers linearize skb fragments as needed
when the program touches the packet data. However, in the pursuit of
speed, XDP programs will not be allowed to use these slower functions,
especially if it involves allocating an skb.

Therefore, disallow MTU settings that would produce a multi-fragment
packet that XDP programs would fail to access. Future enhancements could
be done to increase the allowable MTU.

The xdp program is present as a per-ring data structure, but as of yet
it is not possible to set at that granularity through any ndo.

Signed-off-by: Brenden Blanco 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 60 ++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 40 +++--
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  6 +++
 3 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 6083775..c34a33d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -31,6 +31,7 @@
  *
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -2112,6 +2113,11 @@ static int mlx4_en_change_mtu(struct net_device *dev, 
int new_mtu)
en_err(priv, "Bad MTU size:%d.\n", new_mtu);
return -EPERM;
}
+   if (priv->xdp_ring_num && MLX4_EN_EFF_MTU(new_mtu) > FRAG_SZ0) {
+   en_err(priv, "MTU size:%d requires frags but XDP running\n",
+  new_mtu);
+   return -EOPNOTSUPP;
+   }
dev->mtu = new_mtu;
 
if (netif_running(dev)) {
@@ -2520,6 +2526,58 @@ static int mlx4_en_set_tx_maxrate(struct net_device 
*dev, int queue_index, u32 m
return err;
 }
 
+static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
+{
+   struct mlx4_en_priv *priv = netdev_priv(dev);
+   struct bpf_prog *old_prog;
+   int xdp_ring_num;
+   int i;
+
+   xdp_ring_num = prog ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0;
+
+   if (priv->num_frags > 1) {
+   en_err(priv, "Cannot set XDP if MTU requires multiple frags\n");
+   return -EOPNOTSUPP;
+   }
+
+   if (prog) {
+   prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+   }
+
+   priv->xdp_ring_num = xdp_ring_num;
+
+   /* This xchg is paired with READ_ONCE in the fast path */
+   for (i = 0; i < priv->rx_ring_num; i++) {
+   old_prog = xchg(>rx_ring[i]->xdp_prog, prog);
+   if (old_prog)
+   bpf_prog_put(old_prog);
+   }
+
+   return 0;
+}
+
+static bool mlx4_xdp_attached(struct net_device *dev)
+{
+   struct mlx4_en_priv *priv = netdev_priv(dev);
+
+   return !!priv->xdp_ring_num;
+}
+
+static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+   switch (xdp->command) {
+   case XDP_SETUP_PROG:
+   return mlx4_xdp_set(dev, xdp->prog);
+   case XDP_QUERY_PROG:
+   xdp->prog_attached = mlx4_xdp_attached(dev);
+   return 0;
+   default:
+   return -EINVAL;
+   }
+}
+
 static const struct net_device_ops mlx4_netdev_ops = {
.ndo_open   = mlx4_en_open,
.ndo_stop   = mlx4_en_close,
@@ -2548,6 +2606,7 @@ static const struct net_device_ops mlx4_netdev_ops = {
.ndo_udp_tunnel_del = mlx4_en_del_vxlan_port,
.ndo_features_check = mlx4_en_features_check,
.ndo_set_tx_maxrate = mlx4_en_set_tx_maxrate,
+   .ndo_xdp= mlx4_xdp,
 };
 
 static const struct net_device_ops mlx4_netdev_ops_master = {
@@ -2584,6 +2643,7 @@ static const struct net_device_ops mlx4_netdev_ops_master 
= {
.ndo_udp_tunnel_del = mlx4_en_del_vxlan_port,
.ndo_features_check = mlx4_en_features_check,
.ndo_set_tx_maxrate = mlx4_en_set_tx_maxrate,
+   .ndo_xdp= mlx4_xdp,
 };
 
 struct mlx4_en_bond {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c1b3a9c..6729545 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -32,6 +32,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -509,6 +510,8 @@ void mlx4_en_destroy_rx_ring(struct mlx4_en_priv *priv,
struct mlx4_en_dev *mdev = priv->mdev;
struct mlx4_en_rx_ring *ring = *pring;
 
+   if (ring->xdp_prog)
+   bpf_prog_put(ring->xdp_prog);
mlx4_free_hwq_res(mdev->dev, >wqres, size * stride + TXBB_SIZE);
vfree(ring->rx_info);
ring->rx_info = NULL;
@@ -743,6 +746,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct 
mlx4_en_cq 

[PATCH v10 00/12] Add driver bpf hook for early packet drop and forwarding

2016-07-19 Thread Brenden Blanco
This patch set introduces new infrastructure for programmatically
processing packets in the earliest stages of rx, as part of an effort
others are calling eXpress Data Path (XDP) [1]. Start this effort by
introducing a new bpf program type for early packet filtering, before
even an skb has been allocated.

Extend on this with the ability to modify packet data and send back out
on the same port.

Patch 1 adds an API for bulk bpf prog refcnt incrememnt.
Patch 2 introduces the new prog type and helpers for validating the bpf
  program. A new userspace struct is defined containing only data and
  data_end as fields, with others to follow in the future.
In patch 3, create a new ndo to pass the fd to supported drivers.
In patch 4, expose a new rtnl option to userspace.
In patch 5, enable support in mlx4 driver.
In patch 6, create a sample drop and count program. With single core,
  achieved ~20 Mpps drop rate on a 40G ConnectX3-Pro. This includes
  packet data access, bpf array lookup, and increment.
In patch 7, add a page recycle facility to mlx4 rx, enabled when xdp is
  active.
In patch 8, add the XDP_TX type to bpf.h
In patch 9, add helper in tx patch for writing tx_desc
In patch 10, add support in mlx4 for packet data write and forwarding
In patch 11, turn on packet write support in the bpf verifier
In patch 12, add a sample program for packet write and forwarding. With
  single core, achieved ~10 Mpps rewrite and forwarding.

[1] https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf

v10:
 1/12: Add bulk refcnt api.
 5/12: Move prog from priv to ring. This attribute is still only set
   globally, but the path to finer granularity should be clear. No lock
   is taken, so some rings may operate on older programs for a time (one
   napi loop). Looked into options such as napi_synchronize, but they
   were deemed too slow (calls to msleep).
   Rename prog to xdp_prog. Add xdp_ring_num to help with accounting,
   used more heavily in later patches.
 7/12: Adjust to use per-ring xdp prog. Use priv->xdp_ring_num where
   before priv->prog was used to determine buffer allocations.
 9/12: Add cpu_to_be16 to vlan_tag in mxl4_en_xmit(). Remove unused variable
   from mlx4_en_xmit and unused params from build_inline_wqe.

v9:
 4/11: Add missing newline in en_err message.
 6/11: Move page_cache cleanup from mlx4_en_destroy_rx_ring to
   mlx4_en_deactivate_rx_ring. Move mlx4_en_moderation_update back to
   static. Remove calls to mlx4_en_alloc/free_resources in mlx4_xdp_set.
   Adopt instead the approach of mlx4_en_change_mtu to use a watchdog.
 9/11: Use a per-ring function pointer in tx to separate out the code
   for regular and recycle paths of tx completion handling. Add a helper
   function to init the recycle ring and callback, called just after
   activating tx. Remove extra tx ring resource requirement, and instead
   steal from the upper rings. This helps to avoid needing
   mlx4_en_alloc_resources. Add some hopefully meaningful error
   messages for the various error cases. Reverted some of the
   hard-to-follow logic that was accounting for the extra tx rings.

v8:
 1/11: Reduce WARN_ONCE to single line. Also, change act param of that
   function to u32 to match return type of bpf_prog_run_xdp.
 2/11: Clarify locking semantics in ndo comment.
 4/11: Add en_err warning in mlx4_xdp_set on num_frags/mtu violation.

v7:
 Addressing two of the major discussion points: return codes and ndo.
 The rest will be taken as todo items for separate patches.

 Add an XDP_ABORTED type, which explicitly falls through to DROP. The
 same result must be taken for the default case as well, as it is now
 well-defined API behavior.

 Merge ndo_xdp_* into a single ndo. The style is similar to
 ndo_setup_tc, but with less unidirectional naming convention. The IFLA
 parameter names are unchanged.

 TODOs:
 Add ethtool per-ring stats for aborted, default cases, maybe even drop
 and tx as well.
 Avoid duplicate dma sync operation in XDP_PASS case as mentioned by
 Saeed.

  1/12: Add XDP_ABORTED enum, reword API comment, and update commit
   message.
  2/12: Rewrite ndo_xdp_*() into single ndo_xdp() with type/union style
calling convention.
  3/12: Switch to ndo_xdp callback.
  4/12: Add XDP_ABORTED case as a fall-through to XDP_DROP. Implement
ndo_xdp.
 12/12: Dropped, this will need some more work.

v6:
  2/12: drop unnecessary netif_device_present check
  4/12, 6/12, 9/12: Reorder default case statement above drop case to
remove some copy/paste.

v5:
  0/12: Rebase and remove previous 1/13 patch
  1/12: Fix nits from Daniel. Left the (void *) cast as-is, to be fixed
in future. Add bpf_warn_invalid_xdp_action() helper, to be used when
out of bounds action is returned by the program. Add a comment to
bpf.h denoting the undefined nature of out of bounds returns.
  2/12: Switch to using bpf_prog_get_type(). Rename ndo_xdp_get() to
ndo_xdp_attached().
  3/12: Add IFLA_XDP as a nested type, and 

[PATCH v10 01/12] bpf: add bpf_prog_add api for bulk prog refcnt

2016-07-19 Thread Brenden Blanco
A subsystem may need to store many copies of a bpf program, each
deserving its own reference. Rather than requiring the caller to loop
one by one (with possible mid-loop failure), add a bulk bpf_prog_add
api.

Signed-off-by: Brenden Blanco 
---
 include/linux/bpf.h  |  1 +
 kernel/bpf/syscall.c | 12 +---
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c13e92b..75a5ae6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -224,6 +224,7 @@ void bpf_register_map_type(struct bpf_map_type_list *tl);
 
 struct bpf_prog *bpf_prog_get(u32 ufd);
 struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
+struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i);
 struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 96d938a..228f962 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -670,14 +670,20 @@ static struct bpf_prog *bpf_prog_get(struct fd f)
return f.file->private_data;
 }
 
-struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
+struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 {
-   if (atomic_inc_return(>aux->refcnt) > BPF_MAX_REFCNT) {
-   atomic_dec(>aux->refcnt);
+   if (atomic_add_return(i, >aux->refcnt) > BPF_MAX_REFCNT) {
+   atomic_sub(i, >aux->refcnt);
return ERR_PTR(-EBUSY);
}
return prog;
 }
+EXPORT_SYMBOL_GPL(bpf_prog_add);
+
+struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
+{
+   return bpf_prog_add(prog, 1);
+}
 
 static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
 {
-- 
2.8.2



[PATCH v10 08/12] bpf: add XDP_TX xdp_action for direct forwarding

2016-07-19 Thread Brenden Blanco
XDP enabled drivers must transmit received packets back out on the same
port they were received on when a program returns this action.

Signed-off-by: Brenden Blanco 
---
 include/uapi/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a517865..2b7076f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -449,6 +449,7 @@ enum xdp_action {
XDP_ABORTED = 0,
XDP_DROP,
XDP_PASS,
+   XDP_TX,
 };
 
 /* user accessible metadata for XDP packet hook
-- 
2.8.2



[PATCH v10 07/12] net/mlx4_en: add page recycle to prepare rx ring for tx support

2016-07-19 Thread Brenden Blanco
The mlx4 driver by default allocates order-3 pages for the ring to
consume in multiple fragments. When the device has an xdp program, this
behavior will prevent tx actions since the page must be re-mapped in
TODEVICE mode, which cannot be done if the page is still shared.

Start by making the allocator configurable based on whether xdp is
running, such that order-0 pages are always used and never shared.

Since this will stress the page allocator, add a simple page cache to
each rx ring. Pages in the cache are left dma-mapped, and in drop-only
stress tests the page allocator is eliminated from the perf report.

Note that setting an xdp program will now require the rings to be
reconfigured.

Before:
 26.91%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_process_rx_cq
 17.88%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_alloc_frags
  6.00%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_free_frag
  4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  3.21%  swapper  [kernel.vmlinux]  [k] intel_idle
  2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.57%  swapper  [mlx4_en] [k] mlx4_en_process_rx_cq

After:
 31.72%  swapper  [kernel.vmlinux]   [k] intel_idle
  8.79%  swapper  [mlx4_en]  [k] mlx4_en_process_rx_cq
  7.54%  swapper  [kernel.vmlinux]   [k] poll_idle
  6.36%  swapper  [mlx4_core][k] mlx4_eq_int
  4.21%  swapper  [kernel.vmlinux]   [k] tasklet_action
  4.03%  swapper  [kernel.vmlinux]   [k] cpuidle_enter_state
  3.43%  swapper  [mlx4_en]  [k] mlx4_en_prepare_rx_desc
  2.18%  swapper  [kernel.vmlinux]   [k] native_irq_return_iret
  1.37%  swapper  [kernel.vmlinux]   [k] menu_select
  1.09%  swapper  [kernel.vmlinux]   [k] bpf_map_lookup_elem

Signed-off-by: Brenden Blanco 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 38 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 70 +++---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   | 11 +++-
 3 files changed, 109 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c34a33d..47ae2a2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2529,12 +2529,33 @@ static int mlx4_en_set_tx_maxrate(struct net_device 
*dev, int queue_index, u32 m
 static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
struct mlx4_en_priv *priv = netdev_priv(dev);
+   struct mlx4_en_dev *mdev = priv->mdev;
struct bpf_prog *old_prog;
int xdp_ring_num;
+   int port_up = 0;
+   int err;
int i;
 
xdp_ring_num = prog ? ALIGN(priv->rx_ring_num, MLX4_EN_NUM_UP) : 0;
 
+   /* No need to reconfigure buffers when simply swapping the
+* program for a new one.
+*/
+   if (priv->xdp_ring_num == xdp_ring_num) {
+   if (prog) {
+   prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+   }
+   for (i = 0; i < priv->rx_ring_num; i++) {
+   /* This xchg is paired with READ_ONCE in the fastpath */
+   old_prog = xchg(>rx_ring[i]->xdp_prog, prog);
+   if (old_prog)
+   bpf_prog_put(old_prog);
+   }
+   return 0;
+   }
+
if (priv->num_frags > 1) {
en_err(priv, "Cannot set XDP if MTU requires multiple frags\n");
return -EOPNOTSUPP;
@@ -2546,15 +2567,30 @@ static int mlx4_xdp_set(struct net_device *dev, struct 
bpf_prog *prog)
return PTR_ERR(prog);
}
 
+   mutex_lock(>state_lock);
+   if (priv->port_up) {
+   port_up = 1;
+   mlx4_en_stop_port(dev, 1);
+   }
+
priv->xdp_ring_num = xdp_ring_num;
 
-   /* This xchg is paired with READ_ONCE in the fast path */
for (i = 0; i < priv->rx_ring_num; i++) {
old_prog = xchg(>rx_ring[i]->xdp_prog, prog);
if (old_prog)
bpf_prog_put(old_prog);
}
 
+   if (port_up) {
+   err = mlx4_en_start_port(dev);
+   if (err) {
+   en_err(priv, "Failed starting port %d for XDP change\n",
+  priv->port);
+   queue_work(mdev->workqueue, >watchdog_task);
+   }
+   }
+
+   mutex_unlock(>state_lock);
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 6729545..9dd5dc1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -58,7 +58,7 @@ 

[PATCH v10 04/12] rtnl: add option for setting link xdp prog

2016-07-19 Thread Brenden Blanco
Sets the bpf program represented by fd as an early filter in the rx path
of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP.
Providing a negative value as fd clears the program. Getting the fd back
via rtnl is not possible, therefore reading of this value merely
provides a bool whether the program is valid on the link or not.

Signed-off-by: Brenden Blanco 
---
 include/uapi/linux/if_link.h | 12 +
 net/core/rtnetlink.c | 64 
 2 files changed, 76 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 4285ac3..a1b5202 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -156,6 +156,7 @@ enum {
IFLA_GSO_MAX_SEGS,
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
+   IFLA_XDP,
__IFLA_MAX
 };
 
@@ -843,4 +844,15 @@ enum {
 };
 #define LINK_XSTATS_TYPE_MAX (__LINK_XSTATS_TYPE_MAX - 1)
 
+/* XDP section */
+
+enum {
+   IFLA_XDP_UNSPEC,
+   IFLA_XDP_FD,
+   IFLA_XDP_ATTACHED,
+   __IFLA_XDP_MAX,
+};
+
+#define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a9e3805..eba2b82 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -891,6 +891,16 @@ static size_t rtnl_port_size(const struct net_device *dev,
return port_self_size;
 }
 
+static size_t rtnl_xdp_size(const struct net_device *dev)
+{
+   size_t xdp_size = nla_total_size(1);/* XDP_ATTACHED */
+
+   if (!dev->netdev_ops->ndo_xdp)
+   return 0;
+   else
+   return xdp_size;
+}
+
 static noinline size_t if_nlmsg_size(const struct net_device *dev,
 u32 ext_filter_mask)
 {
@@ -927,6 +937,7 @@ static noinline size_t if_nlmsg_size(const struct 
net_device *dev,
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
   + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
+  + rtnl_xdp_size(dev) /* IFLA_XDP */
   + nla_total_size(1); /* IFLA_PROTO_DOWN */
 
 }
@@ -1211,6 +1222,33 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, 
struct net_device *dev)
return 0;
 }
 
+static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
+{
+   struct netdev_xdp xdp_op = {};
+   struct nlattr *xdp;
+   int err;
+
+   if (!dev->netdev_ops->ndo_xdp)
+   return 0;
+   xdp = nla_nest_start(skb, IFLA_XDP);
+   if (!xdp)
+   return -EMSGSIZE;
+   xdp_op.command = XDP_QUERY_PROG;
+   err = dev->netdev_ops->ndo_xdp(dev, _op);
+   if (err)
+   goto err_cancel;
+   err = nla_put_u8(skb, IFLA_XDP_ATTACHED, xdp_op.prog_attached);
+   if (err)
+   goto err_cancel;
+
+   nla_nest_end(skb, xdp);
+   return 0;
+
+err_cancel:
+   nla_nest_cancel(skb, xdp);
+   return err;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
unsigned int flags, u32 ext_filter_mask)
@@ -1307,6 +1345,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct 
net_device *dev,
if (rtnl_port_fill(skb, dev, ext_filter_mask))
goto nla_put_failure;
 
+   if (rtnl_xdp_fill(skb, dev))
+   goto nla_put_failure;
+
if (dev->rtnl_link_ops || rtnl_have_link_slave_info(dev)) {
if (rtnl_link_fill(skb, dev) < 0)
goto nla_put_failure;
@@ -1392,6 +1433,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_PHYS_SWITCH_ID]   = { .type = NLA_BINARY, .len = 
MAX_PHYS_ITEM_ID_LEN },
[IFLA_LINK_NETNSID] = { .type = NLA_S32 },
[IFLA_PROTO_DOWN]   = { .type = NLA_U8 },
+   [IFLA_XDP]  = { .type = NLA_NESTED },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1429,6 +1471,11 @@ static const struct nla_policy 
ifla_port_policy[IFLA_PORT_MAX+1] = {
[IFLA_PORT_RESPONSE]= { .type = NLA_U16, },
 };
 
+static const struct nla_policy ifla_xdp_policy[IFLA_XDP_MAX + 1] = {
+   [IFLA_XDP_FD]   = { .type = NLA_S32 },
+   [IFLA_XDP_ATTACHED] = { .type = NLA_U8 },
+};
+
 static const struct rtnl_link_ops *linkinfo_to_kind_ops(const struct nlattr 
*nla)
 {
const struct rtnl_link_ops *ops = NULL;
@@ -2054,6 +2101,23 @@ static int do_setlink(const struct sk_buff *skb,
status |= DO_SETLINK_NOTIFY;
}
 
+   if (tb[IFLA_XDP]) {
+   struct nlattr *xdp[IFLA_XDP_MAX + 1];
+
+   err = nla_parse_nested(xdp, IFLA_XDP_MAX, tb[IFLA_XDP],
+  ifla_xdp_policy);
+   if (err < 0)
+ 

[PATCH v10 02/12] bpf: add XDP prog type for early driver filter

2016-07-19 Thread Brenden Blanco
Add a new bpf prog type that is intended to run in early stages of the
packet rx path. Only minimal packet metadata will be available, hence a
new context type, struct xdp_md, is exposed to userspace. So far only
expose the packet start and end pointers, and only in read mode.

An XDP program must return one of the well known enum values, all other
return codes are reserved for future use. Unfortunately, this
restriction is hard to enforce at verification time, so take the
approach of warning at runtime when such programs are encountered. Out
of bounds return codes should alias to XDP_ABORTED.

Signed-off-by: Brenden Blanco 
---
 include/linux/filter.h   | 18 +++
 include/uapi/linux/bpf.h | 20 
 kernel/bpf/verifier.c|  1 +
 net/core/filter.c| 79 
 4 files changed, 118 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6fc31ef..15d816a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -368,6 +368,11 @@ struct bpf_skb_data_end {
void *data_end;
 };
 
+struct xdp_buff {
+   void *data;
+   void *data_end;
+};
+
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf and act_bpf programs
  */
@@ -429,6 +434,18 @@ static inline u32 bpf_prog_run_clear_cb(const struct 
bpf_prog *prog,
return BPF_PROG_RUN(prog, skb);
 }
 
+static inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
+  struct xdp_buff *xdp)
+{
+   u32 ret;
+
+   rcu_read_lock();
+   ret = BPF_PROG_RUN(prog, (void *)xdp);
+   rcu_read_unlock();
+
+   return ret;
+}
+
 static inline unsigned int bpf_prog_size(unsigned int proglen)
 {
return max(sizeof(struct bpf_prog),
@@ -509,6 +526,7 @@ bool bpf_helper_changes_skb_data(void *func);
 
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
   const struct bpf_insn *patch, u32 len);
+void bpf_warn_invalid_xdp_action(u32 act);
 
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c4d9224..a517865 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SCHED_CLS,
BPF_PROG_TYPE_SCHED_ACT,
BPF_PROG_TYPE_TRACEPOINT,
+   BPF_PROG_TYPE_XDP,
 };
 
 #define BPF_PSEUDO_MAP_FD  1
@@ -439,4 +440,23 @@ struct bpf_tunnel_key {
__u32 tunnel_label;
 };
 
+/* User return codes for XDP prog type.
+ * A valid XDP program must return one of these defined values. All other
+ * return codes are reserved for future use. Unknown return codes will result
+ * in packet drop.
+ */
+enum xdp_action {
+   XDP_ABORTED = 0,
+   XDP_DROP,
+   XDP_PASS,
+};
+
+/* user accessible metadata for XDP packet hook
+ * new fields must be added to the end of this structure
+ */
+struct xdp_md {
+   __u32 data;
+   __u32 data_end;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e206c21..a8d67d0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -713,6 +713,7 @@ static int check_ptr_alignment(struct verifier_env *env, 
struct reg_state *reg,
switch (env->prog->type) {
case BPF_PROG_TYPE_SCHED_CLS:
case BPF_PROG_TYPE_SCHED_ACT:
+   case BPF_PROG_TYPE_XDP:
break;
default:
verbose("verifier is misconfigured\n");
diff --git a/net/core/filter.c b/net/core/filter.c
index 22e3992..6c627bc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2410,6 +2410,12 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
}
 }
 
+static const struct bpf_func_proto *
+xdp_func_proto(enum bpf_func_id func_id)
+{
+   return sk_filter_func_proto(func_id);
+}
+
 static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 {
if (off < 0 || off >= sizeof(struct __sk_buff))
@@ -2477,6 +2483,44 @@ static bool tc_cls_act_is_valid_access(int off, int size,
return __is_valid_access(off, size, type);
 }
 
+static bool __is_valid_xdp_access(int off, int size,
+ enum bpf_access_type type)
+{
+   if (off < 0 || off >= sizeof(struct xdp_md))
+   return false;
+   if (off % size != 0)
+   return false;
+   if (size != 4)
+   return false;
+
+   return true;
+}
+
+static bool xdp_is_valid_access(int off, int size,
+   enum bpf_access_type type,
+   enum bpf_reg_type *reg_type)
+{
+   if (type == BPF_WRITE)
+   return false;
+
+   switch (off) {
+   case offsetof(struct xdp_md, data):
+   *reg_type = PTR_TO_PACKET;
+   break;
+   case offsetof(struct xdp_md, data_end):
+   

[PATCH v10 10/12] net/mlx4_en: add xdp forwarding and data write support

2016-07-19 Thread Brenden Blanco
A user will now be able to loop packets back out of the same port using
a bpf program attached to xdp hook. Updates to the packet contents from
the bpf program is also supported.

For the packet write feature to work, the rx buffers are now mapped as
bidirectional when the page is allocated. This occurs only when the xdp
hook is active.

When the program returns a TX action, enqueue the packet directly to a
dedicated tx ring, so as to avoid completely any locking. This requires
the tx ring to be allocated 1:1 for each rx ring, as well as the tx
completion running in the same softirq.

Upon tx completion, this dedicated tx ring recycles pages without
unmapping directly back to the original rx ring. In steady state tx/drop
workload, effectively 0 page allocs/frees will occur.

In order to separate out the paths between free and recycle, a
free_tx_desc func pointer is introduced that is optionally updated
whenever recycle_ring is activated. By default the original free
function is always initialized.

Signed-off-by: Brenden Blanco 
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |   9 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  29 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c  |  14 +++
 drivers/net/ethernet/mellanox/mlx4/en_tx.c  | 140 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|  27 -
 5 files changed, 211 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 51a2e82..f32e272 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1722,6 +1722,12 @@ static int mlx4_en_set_channels(struct net_device *dev,
!channel->tx_count || !channel->rx_count)
return -EINVAL;
 
+   if (channel->tx_count * MLX4_EN_NUM_UP <= priv->xdp_ring_num) {
+   en_err(priv, "Minimum %d tx channels required with XDP on\n",
+  priv->xdp_ring_num / MLX4_EN_NUM_UP + 1);
+   return -EINVAL;
+   }
+
mutex_lock(>state_lock);
if (priv->port_up) {
port_up = 1;
@@ -1740,7 +1746,8 @@ static int mlx4_en_set_channels(struct net_device *dev,
goto out;
}
 
-   netif_set_real_num_tx_queues(dev, priv->tx_ring_num);
+   netif_set_real_num_tx_queues(dev, priv->tx_ring_num -
+   priv->xdp_ring_num);
netif_set_real_num_rx_queues(dev, priv->rx_ring_num);
 
if (dev->num_tc)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 47ae2a2..9abbba6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1522,6 +1522,24 @@ static void mlx4_en_free_affinity_hint(struct 
mlx4_en_priv *priv, int ring_idx)
free_cpumask_var(priv->rx_ring[ring_idx]->affinity_mask);
 }
 
+static void mlx4_en_init_recycle_ring(struct mlx4_en_priv *priv,
+ int tx_ring_idx)
+{
+   struct mlx4_en_tx_ring *tx_ring = priv->tx_ring[tx_ring_idx];
+   int rr_index;
+
+   rr_index = (priv->xdp_ring_num - priv->tx_ring_num) + tx_ring_idx;
+   if (rr_index >= 0) {
+   tx_ring->free_tx_desc = mlx4_en_recycle_tx_desc;
+   tx_ring->recycle_ring = priv->rx_ring[rr_index];
+   en_dbg(DRV, priv,
+  "Set tx_ring[%d]->recycle_ring = rx_ring[%d]\n",
+  tx_ring_idx, rr_index);
+   } else {
+   tx_ring->recycle_ring = NULL;
+   }
+}
+
 int mlx4_en_start_port(struct net_device *dev)
 {
struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -1644,6 +1662,8 @@ int mlx4_en_start_port(struct net_device *dev)
}
tx_ring->tx_queue = netdev_get_tx_queue(dev, i);
 
+   mlx4_en_init_recycle_ring(priv, i);
+
/* Arm CQ for TX completions */
mlx4_en_arm_cq(priv, cq);
 
@@ -2561,6 +2581,13 @@ static int mlx4_xdp_set(struct net_device *dev, struct 
bpf_prog *prog)
return -EOPNOTSUPP;
}
 
+   if (priv->tx_ring_num < xdp_ring_num + MLX4_EN_NUM_UP) {
+   en_err(priv,
+  "Minimum %d tx channels required to run XDP\n",
+  (xdp_ring_num + MLX4_EN_NUM_UP) / MLX4_EN_NUM_UP);
+   return -EINVAL;
+   }
+
if (prog) {
prog = bpf_prog_add(prog, priv->rx_ring_num - 1);
if (IS_ERR(prog))
@@ -2574,6 +2601,8 @@ static int mlx4_xdp_set(struct net_device *dev, struct 
bpf_prog *prog)
}
 
priv->xdp_ring_num = xdp_ring_num;
+   netif_set_real_num_tx_queues(dev, priv->tx_ring_num -
+   priv->xdp_ring_num);
 
for (i = 0; i < 

[PATCH v10 11/12] bpf: enable direct packet data write for xdp progs

2016-07-19 Thread Brenden Blanco
For forwarding to be effective, XDP programs should be allowed to
rewrite packet data.

This requires that the drivers supporting XDP must all map the packet
memory as TODEVICE or BIDIRECTIONAL before invoking the program.

Signed-off-by: Brenden Blanco 
---
 kernel/bpf/verifier.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a8d67d0..f72f23b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -653,6 +653,16 @@ static int check_map_access(struct verifier_env *env, u32 
regno, int off,
 
 #define MAX_PACKET_OFF 0x
 
+static bool may_write_pkt_data(enum bpf_prog_type type)
+{
+   switch (type) {
+   case BPF_PROG_TYPE_XDP:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static int check_packet_access(struct verifier_env *env, u32 regno, int off,
   int size)
 {
@@ -806,10 +816,15 @@ static int check_mem_access(struct verifier_env *env, u32 
regno, int off,
err = check_stack_read(state, off, size, value_regno);
}
} else if (state->regs[regno].type == PTR_TO_PACKET) {
-   if (t == BPF_WRITE) {
+   if (t == BPF_WRITE && !may_write_pkt_data(env->prog->type)) {
verbose("cannot write into packet\n");
return -EACCES;
}
+   if (t == BPF_WRITE && value_regno >= 0 &&
+   is_pointer_value(env, value_regno)) {
+   verbose("R%d leaks addr into packet\n", value_regno);
+   return -EACCES;
+   }
err = check_packet_access(env, regno, off, size);
if (!err && t == BPF_READ && value_regno >= 0)
mark_reg_unknown_value(state->regs, value_regno);
-- 
2.8.2



[PATCH v10 03/12] net: add ndo to setup/query xdp prog in adapter rx

2016-07-19 Thread Brenden Blanco
Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP
filter. The single op is used for both setup/query of the xdp program,
modelled after ndo_setup_tc.

Signed-off-by: Brenden Blanco 
---
 include/linux/netdevice.h | 34 ++
 net/core/dev.c| 33 +
 2 files changed, 67 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 49736a3..fab9a1c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -63,6 +63,7 @@ struct wpan_dev;
 struct mpls_dev;
 /* UDP Tunnel offloads */
 struct udp_tunnel_info;
+struct bpf_prog;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
const struct ethtool_ops *ops);
@@ -799,6 +800,33 @@ struct tc_to_netdev {
};
 };
 
+/* These structures hold the attributes of xdp state that are being passed
+ * to the netdevice through the xdp op.
+ */
+enum xdp_netdev_command {
+   /* Set or clear a bpf program used in the earliest stages of packet
+* rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
+* is responsible for calling bpf_prog_put on any old progs that are
+* stored. In case of error, the callee need not release the new prog
+* reference, but on success it takes ownership and must bpf_prog_put
+* when it is no longer used.
+*/
+   XDP_SETUP_PROG,
+   /* Check if a bpf program is set on the device.  The callee should
+* return true if a program is currently attached and running.
+*/
+   XDP_QUERY_PROG,
+};
+
+struct netdev_xdp {
+   enum xdp_netdev_command command;
+   union {
+   /* XDP_SETUP_PROG */
+   struct bpf_prog *prog;
+   /* XDP_QUERY_PROG */
+   bool prog_attached;
+   };
+};
 
 /*
  * This structure defines the management hooks for network devices.
@@ -1087,6 +1115,9 @@ struct tc_to_netdev {
  * appropriate rx headroom value allows avoiding skb head copy on
  * forward. Setting a negative value resets the rx headroom to the
  * default value.
+ * int (*ndo_xdp)(struct net_device *dev, struct netdev_xdp *xdp);
+ * This function is used to set or query state related to XDP on the
+ * netdevice. See definition of enum xdp_netdev_command for details.
  *
  */
 struct net_device_ops {
@@ -1271,6 +1302,8 @@ struct net_device_ops {
   struct sk_buff *skb);
void(*ndo_set_rx_headroom)(struct net_device *dev,
   int needed_headroom);
+   int (*ndo_xdp)(struct net_device *dev,
+  struct netdev_xdp *xdp);
 };
 
 /**
@@ -3257,6 +3290,7 @@ int dev_get_phys_port_id(struct net_device *dev,
 int dev_get_phys_port_name(struct net_device *dev,
   char *name, size_t len);
 int dev_change_proto_down(struct net_device *dev, bool proto_down);
+int dev_change_xdp_fd(struct net_device *dev, int fd);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device 
*dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device 
*dev,
struct netdev_queue *txq, int *ret);
diff --git a/net/core/dev.c b/net/core/dev.c
index 7894e40..2a9c39f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -94,6 +94,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -6615,6 +6616,38 @@ int dev_change_proto_down(struct net_device *dev, bool 
proto_down)
 EXPORT_SYMBOL(dev_change_proto_down);
 
 /**
+ * dev_change_xdp_fd - set or clear a bpf program for a device rx path
+ * @dev: device
+ * @fd: new program fd or negative value to clear
+ *
+ * Set or clear a bpf program for a device
+ */
+int dev_change_xdp_fd(struct net_device *dev, int fd)
+{
+   const struct net_device_ops *ops = dev->netdev_ops;
+   struct bpf_prog *prog = NULL;
+   struct netdev_xdp xdp = {};
+   int err;
+
+   if (!ops->ndo_xdp)
+   return -EOPNOTSUPP;
+   if (fd >= 0) {
+   prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+   }
+
+   xdp.command = XDP_SETUP_PROG;
+   xdp.prog = prog;
+   err = ops->ndo_xdp(dev, );
+   if (err < 0 && prog)
+   bpf_prog_put(prog);
+
+   return err;
+}
+EXPORT_SYMBOL(dev_change_xdp_fd);
+
+/**
  * dev_new_index   -   allocate an ifindex
  * @net: the applicable net namespace
  *
-- 
2.8.2



[PATCH v10 09/12] net/mlx4_en: break out tx_desc write into separate function

2016-07-19 Thread Brenden Blanco
In preparation for writing the tx descriptor from multiple functions,
create a helper for both normal and blueflame access.

Signed-off-by: Brenden Blanco 
---
 drivers/infiniband/hw/mlx4/qp.c|  11 +--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 134 -
 include/linux/mlx4/qp.h|  18 ++--
 3 files changed, 92 insertions(+), 71 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 8db8405..768085f 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -232,7 +232,7 @@ static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n, 
int size)
}
} else {
ctrl = buf = get_send_wqe(qp, n & (qp->sq.wqe_cnt - 1));
-   s = (ctrl->fence_size & 0x3f) << 4;
+   s = (ctrl->qpn_vlan.fence_size & 0x3f) << 4;
for (i = 64; i < s; i += 64) {
wqe = buf + i;
*wqe = cpu_to_be32(0x);
@@ -264,7 +264,7 @@ static void post_nop_wqe(struct mlx4_ib_qp *qp, int n, int 
size)
inl->byte_count = cpu_to_be32(1 << 31 | (size - s - sizeof 
*inl));
}
ctrl->srcrb_flags = 0;
-   ctrl->fence_size = size / 16;
+   ctrl->qpn_vlan.fence_size = size / 16;
/*
 * Make sure descriptor is fully written before setting ownership bit
 * (because HW can start executing as soon as we do).
@@ -1992,7 +1992,8 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
ctrl = get_send_wqe(qp, i);
ctrl->owner_opcode = cpu_to_be32(1 << 31);
if (qp->sq_max_wqes_per_wr == 1)
-   ctrl->fence_size = 1 << (qp->sq.wqe_shift - 4);
+   ctrl->qpn_vlan.fence_size =
+   1 << (qp->sq.wqe_shift - 4);
 
stamp_send_wqe(qp, i, 1 << qp->sq.wqe_shift);
}
@@ -3169,8 +3170,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
wmb();
*lso_wqe = lso_hdr_sz;
 
-   ctrl->fence_size = (wr->send_flags & IB_SEND_FENCE ?
-   MLX4_WQE_CTRL_FENCE : 0) | size;
+   ctrl->qpn_vlan.fence_size = (wr->send_flags & IB_SEND_FENCE ?
+MLX4_WQE_CTRL_FENCE : 0) | size;
 
/*
 * Make sure descriptor is fully written before
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 76aa4d2..2f56018 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -631,8 +631,7 @@ static int get_real_size(const struct sk_buff *skb,
 static void build_inline_wqe(struct mlx4_en_tx_desc *tx_desc,
 const struct sk_buff *skb,
 const struct skb_shared_info *shinfo,
-int real_size, u16 *vlan_tag,
-int tx_ind, void *fragptr)
+void *fragptr)
 {
struct mlx4_wqe_inline_seg *inl = _desc->inl;
int spc = MLX4_INLINE_ALIGN - CTRL_SIZE - sizeof *inl;
@@ -700,10 +699,66 @@ static void mlx4_bf_copy(void __iomem *dst, const void 
*src,
__iowrite64_copy(dst, src, bytecnt / 8);
 }
 
+void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring)
+{
+   wmb();
+   /* Since there is no iowrite*_native() that writes the
+* value as is, without byteswapping - using the one
+* the doesn't do byteswapping in the relevant arch
+* endianness.
+*/
+#if defined(__LITTLE_ENDIAN)
+   iowrite32(
+#else
+   iowrite32be(
+#endif
+ ring->doorbell_qpn,
+ ring->bf.uar->map + MLX4_SEND_DOORBELL);
+}
+
+static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
+ struct mlx4_en_tx_desc *tx_desc,
+ union mlx4_wqe_qpn_vlan qpn_vlan,
+ int desc_size, int bf_index,
+ __be32 op_own, bool bf_ok,
+ bool send_doorbell)
+{
+   tx_desc->ctrl.qpn_vlan = qpn_vlan;
+
+   if (bf_ok) {
+   op_own |= htonl((bf_index & 0x) << 8);
+   /* Ensure new descriptor hits memory
+* before setting ownership of this descriptor to HW
+*/
+   dma_wmb();
+   tx_desc->ctrl.owner_opcode = op_own;
+
+   wmb();
+
+   mlx4_bf_copy(ring->bf.reg + ring->bf.offset, _desc->ctrl,
+desc_size);
+
+   wmb();
+
+   ring->bf.offset ^= ring->bf.buf_size;
+   } else {
+   /* Ensure new descriptor hits 

[PATCH v10 12/12] bpf: add sample for xdp forwarding and rewrite

2016-07-19 Thread Brenden Blanco
Add a sample that rewrites and forwards packets out on the same
interface. Observed single core forwarding performance of ~10Mpps.

Since the mlx4 driver under test recycles every single packet page, the
perf output shows almost exclusively just the ring management and bpf
program work. Slowdowns are likely occurring due to cache misses.

Signed-off-by: Brenden Blanco 
---
 samples/bpf/Makefile|   5 +++
 samples/bpf/xdp2_kern.c | 114 
 2 files changed, 119 insertions(+)
 create mode 100644 samples/bpf/xdp2_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 0e4ab3a..d2d2b35 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -22,6 +22,7 @@ hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
 hostprogs-y += xdp1
+hostprogs-y += xdp2
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -44,6 +45,8 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
 xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
+# reuse xdp1 source intentionally
+xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -67,6 +70,7 @@ always += test_overhead_kprobe_kern.o
 always += parse_varlen.o parse_simple.o parse_ldabs.o
 always += test_cgrp2_tc_kern.o
 always += xdp1_kern.o
+always += xdp2_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -88,6 +92,7 @@ HOSTLOADLIBES_spintest += -lelf
 HOSTLOADLIBES_map_perf_test += -lelf -lrt
 HOSTLOADLIBES_test_overhead += -lelf -lrt
 HOSTLOADLIBES_xdp1 += -lelf
+HOSTLOADLIBES_xdp2 += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on 
cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc 
CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/xdp2_kern.c b/samples/bpf/xdp2_kern.c
new file mode 100644
index 000..38fe7e1
--- /dev/null
+++ b/samples/bpf/xdp2_kern.c
@@ -0,0 +1,114 @@
+/* Copyright (c) 2016 PLUMgrid
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") dropcnt = {
+   .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(long),
+   .max_entries = 256,
+};
+
+static void swap_src_dst_mac(void *data)
+{
+   unsigned short *p = data;
+   unsigned short dst[3];
+
+   dst[0] = p[0];
+   dst[1] = p[1];
+   dst[2] = p[2];
+   p[0] = p[3];
+   p[1] = p[4];
+   p[2] = p[5];
+   p[3] = dst[0];
+   p[4] = dst[1];
+   p[5] = dst[2];
+}
+
+static int parse_ipv4(void *data, u64 nh_off, void *data_end)
+{
+   struct iphdr *iph = data + nh_off;
+
+   if (iph + 1 > data_end)
+   return 0;
+   return iph->protocol;
+}
+
+static int parse_ipv6(void *data, u64 nh_off, void *data_end)
+{
+   struct ipv6hdr *ip6h = data + nh_off;
+
+   if (ip6h + 1 > data_end)
+   return 0;
+   return ip6h->nexthdr;
+}
+
+SEC("xdp1")
+int xdp_prog1(struct xdp_md *ctx)
+{
+   void *data_end = (void *)(long)ctx->data_end;
+   void *data = (void *)(long)ctx->data;
+   struct ethhdr *eth = data;
+   int rc = XDP_DROP;
+   long *value;
+   u16 h_proto;
+   u64 nh_off;
+   u32 index;
+
+   nh_off = sizeof(*eth);
+   if (data + nh_off > data_end)
+   return rc;
+
+   h_proto = eth->h_proto;
+
+   if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
+   struct vlan_hdr *vhdr;
+
+   vhdr = data + nh_off;
+   nh_off += sizeof(struct vlan_hdr);
+   if (data + nh_off > data_end)
+   return rc;
+   h_proto = vhdr->h_vlan_encapsulated_proto;
+   }
+   if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
+   struct vlan_hdr *vhdr;
+
+   vhdr = data + nh_off;
+   nh_off += sizeof(struct vlan_hdr);
+   if (data + nh_off > data_end)
+   return rc;
+   h_proto = vhdr->h_vlan_encapsulated_proto;
+   }
+
+   if (h_proto == htons(ETH_P_IP))
+   index = parse_ipv4(data, nh_off, data_end);
+   else if (h_proto == htons(ETH_P_IPV6))
+   index = parse_ipv6(data, nh_off, data_end);
+   else
+   index = 0;
+
+   value = bpf_map_lookup_elem(, );
+   if (value)
+   *value += 1;
+
+   if (index == 17) {
+   

Re: [PATCH net-next v3 11/12] net: dsa: mv88e6xxx: add G1 helper for ageing time

2016-07-19 Thread Andrew Lunn
On Mon, Jul 18, 2016 at 08:45:39PM -0400, Vivien Didelot wrote:
> All Marvell switch chips from (88E6060 to 88E6390) have a ATU Control
> register containing bits 11:4 to configure an ATU Age Time quotient.
> 
> However the coefficient used to calculate the ATU Age Time vary with the
> models. E.g. 88E6060, 88E6352 and 88E6390 use respectively 16, 15 and
> 3.75 seconds.
> 
> Add a age_time_coeff to the info structure to handle this and a Global 1
> helper to set the default age time of 5 minutes in the setup code.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew



Re: [PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr

2016-07-19 Thread Andrew Lunn
On Mon, Jul 18, 2016 at 08:45:38PM -0400, Vivien Didelot wrote:
> Add a new function for DSA drivers to handle the switchdev
> SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.
> 
> The ageing time is passed as milliseconds.
> 
> Also because we can have multiple logical bridges on top of a physical
> switch and ageing time are switch-wide, call the driver function with
> the fastest ageing time in use on the chip instead of the requested one.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next v3 12/12] net: dsa: mv88e6xxx: add support for DSA ageing time

2016-07-19 Thread Andrew Lunn
On Mon, Jul 18, 2016 at 08:45:40PM -0400, Vivien Didelot wrote:
> Implement the DSA driver function to configure the bridge ageing time.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew


4.4 Backport request 4d06dd537f95 cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind

2016-07-19 Thread Chas Williams
Can

4d06dd537f95 cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind

be added to 4.4 stable?  This fixes CVE-2016-3951 [The second half
necessary for a complete fix, 666984c8625 ("usbnet: cleanup after bind()
in probe()"), is already present in 4.4.]

Thanks!


Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-07-19 Thread Netanel Belgazal


On 07/19/2016 08:26 PM, Leon Romanovsky wrote:
> On Tue, Jul 19, 2016 at 02:09:25PM +0300, Netanel Belgazal wrote:
>>
>> On 07/15/2016 08:00 AM, Leon Romanovsky wrote:
>>> On Thu, Jul 14, 2016 at 09:46:14AM +0300, Netanel Belgazal wrote:
 This is a driver for the ENA family of networking devices.

 Signed-off-by: Netanel Belgazal 
 ---

 Notes:
>>> ...
>>>
 - Increase driver version to 1.0.2
>>> ...
>>>
 +static void ena_get_drvinfo(struct net_device *dev,
 +  struct ethtool_drvinfo *info)
 +{
 +  struct ena_adapter *adapter = netdev_priv(dev);
 +
 +  strlcpy(info->driver, DRV_MODULE_NAME, sizeof(info->driver));
 +  strlcpy(info->version, DRV_MODULE_VERSION, sizeof(info->version));
>>> Does module version give anything valuable in real life usage?
>>> Do you plan to bump version after every patch?
>>>
>>> Hint, NO.
>> I think it is appropriate to expose driver version to ethtool, and itis 
>> appropriate to be able to version a driver in upstream (mainly for debug 
>> purpose)
>> I don't think there is upstream agreementthat no driver should be allowed to 
>> maintain a versionnumber.
> You didn't answer on my questions, so I suppose that this version
> interface will be forgotten and won't be relevant after first major
> rework.
>
> You have kernel version to know which driver you are running, mixing
> different versions of driver with other kernels are seeing as
> not-supported by the community.

We need this information for our internal purposes including debugging and 
problems tracking.
It is in our interests to make sure that versions are managed correctly.
Besides, it seems to be a common practice, and most of the drivers maintain 
internal driver versions.
   
 +  strlcpy(info->bus_info, pci_name(adapter->pdev),
 +  sizeof(info->bus_info));
 +}
 +
 +
>>> ...
>>>
 +
 +static char version[] =
 +  DEVICE_NAME " v"
 +  DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
 +
 +MODULE_AUTHOR("Amazon.com, Inc. or its affiliates");
 +MODULE_DESCRIPTION(DEVICE_NAME);
 +MODULE_LICENSE("GPL");
 +MODULE_VERSION(DRV_MODULE_VERSION);
 +
 +/* Time in jiffies before concluding the transmitter is hung. */
 +#define TX_TIMEOUT  (5 * HZ)
 +
 +#define ENA_NAPI_BUDGET 64
 +
 +#define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_PROBE | 
 NETIF_MSG_IFUP | \
 +  NETIF_MSG_TX_DONE | NETIF_MSG_TX_ERR | NETIF_MSG_RX_ERR)
 +static int debug = -1;
 +module_param(debug, int, 0);
 +MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
>>> What is it?
>> This is the debugging message interface.
>> https://www.kernel.org/doc/Documentation/networking/netif-msg.txt
> This document was updated last time in 2006 and I doubt that it is
> relevant in 2016. You have dynamic debug prints infrastructure for it,
> use it.

Joe answer for that.




Re: [PATCH] net: ti: cpmac: Use the correct function to free some resources.

2016-07-19 Thread David Miller
From: Christophe JAILLET 
Date: Sun, 17 Jul 2016 08:15:50 +0200

> In 'cpmac_open', 'dma_alloc_coherent' has been used to allocate some
> resources, so we need to free them using 'dma_free_coherent' instead
> of 'kfree'.
> 
> Also, we don't need to free these resources if the allocation has failed.
> So I have slighly modified the goto label in this case.
> 
> Signed-off-by: Christophe JAILLET 

Applied.


Re: [PATCH] drivers: atm: nicstar: Use the correct function to free some resources

2016-07-19 Thread David Miller
From: Christophe JAILLET 
Date: Sun, 17 Jul 2016 09:03:24 +0200

> In 'get_scq', 'dma_alloc_coherent' has been used to allocate some
> resources, so we need to free them using 'dma_free_coherent' instead
> of 'kfree'.
> 
> Signed-off-by: Christophe JAILLET 

Applied.


Re: [1/3] rtlwifi: don't add include path for rtl8188ee

2016-07-19 Thread Kalle Valo
Arnd Bergmann  wrote:
> For rtl8188ee, we pass -Idrivers/net/wireless/rtlwifi/ to gcc,
> however that directy no longer exists, so evidently this option
> is no longer required here and can be removed to avoid a warning
> when building with 'make W=1' or 'gcc -Wmissing-include-dirs'
> 
> Signed-off-by: Arnd Bergmann 

Thanks, 1 patch applied to wireless-drivers-next.git:

944c07a7aac9 rtlwifi: don't add include path for rtl8188ee

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9237723/



Re: [v7] wlcore: spi: add wl18xx support

2016-07-19 Thread Kalle Valo
Eyal Reizer  wrote:
> From: Eyal Reizer 
> 
> Add support for using with both wl12xx and wl18xx.
> 
> - all wilink family needs special init command for entering wspi mode.
>   extra clock cycles should be sent after the spi init command while the
>   cs pin is high.
> - Use inverted chip select for sending a dummy 4 bytes command that
>   completes the init stage.
> 
> Signed-off-by: Eyal Reizer 
> Acked-by: Rob Herring 

Thanks, 1 patch applied to wireless-drivers-next.git:

01efe65aba65 wlcore: spi: add wl18xx support

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9235983/



Re: [PATCH 1/1] netfilter: Add helper array register/unregister functions

2016-07-19 Thread Pablo Neira Ayuso
On Mon, Jul 18, 2016 at 11:39:23AM +0800, f...@ikuai8.com wrote:
> From: Gao Feng 
> 
> Add nf_ct_helper_init, nf_conntrack_helpers_register/unregister
> functions to enhance the conntrack helper codes.

Applied, thanks.

I have manually updated indentations to make it fit to our coding
style, btw.


Re: [PATCH v2 05/11] Kbuild: don't add obj tree in additional includes

2016-07-19 Thread Kalle Valo
Arnd Bergmann  writes:

>> > I think that's fine, a couple were already picked up, and what I have
>> > left now is
>> >
>> > a281bfa5713a [SUBMITTED 20160615] [EXPERIMENTAL] Kbuild: enable 
>> > -Wmissing-include-dirs by default
>> > 83934921e68e [SUBMITTED 20160615] rtlwifi: don't add include path for 
>> > rtl8188ee
>> 
>> Apparently[1] you didn't CC linux-wireless and that's why I didn't see
>> the rtlwifi patch in wireless patchwork. Care to resend?
>> 
>> [1] https://patchwork.kernel.org/patch/9178861/
>> 
>
> Done.

Thanks.

> I've also thrown in two patches for drivers/staging/rtl8*/ that I
> submitted a while ago, but I'm not sure if they should get merged
> through the staging tree or the wireless tree. I had previously
> submitted those two as a combined patch along with a third one that
> turned out to be unnecessary.

Greg applies drivers/staging patches to his staging tree, but I'll take
the rtlwifi patch.

-- 
Kalle Valo


Re: [PATCH net-next 1/1] net_sched: Introduce skbmod action

2016-07-19 Thread Cong Wang
On Tue, Jul 19, 2016 at 8:03 AM, Daniel Borkmann  wrote:
> On 07/19/2016 03:56 PM, Jamal Hadi Salim wrote:
> [...]
>>>
>>> But apart from this,
>>> neither pedit nor tcf_skbmod_run() here handle checksum complete, so
>>> you'll
>>> potentially get false positives wrt csum corruption and drops as a result
>>> when using either of the two.
>>
>>
>> pedit maybe tricky. Any suggestions?
>> On tcf_skbmod_run, mostly ignorance: while doing only ethernet updates;
>> is it still needed to do the checksum complete?
>
>
> Well, what Cong recently fixed with mirred was related to mac header ...
>
> You probably need skb_postpull_rcsum(), skb_postpush_rcsum() pair.
>
> Also, what about skb_try_make_writable()?

I don't think so. 1) checksum is supposed to be done by csum action
rather than pedit (or skbmod if it matters), 2) csum action currently
already handles that correctly for both egress and ingress: 2a)
CHECKSUM_COMPLETE is meaningless on egress; 2b) it forces
CHECKSUM_COMPLETE to be CHECKSUM_NONE on ingress
and it is correctly respected.


Re: [net-next v2 1/6] libcxgb: add library module for Chelsio drivers

2016-07-19 Thread David Miller
From: Varun Prakash 
Date: Sat, 16 Jul 2016 22:49:15 +0530

>  }
> +
> +static int __init libcxgb_init(void)
> +{
> + return 0;
> +}
> +
> +static void __exit libcxgb_exit(void)
> +{
> +}
> +
> +module_init(libcxgb_init);
> +module_exit(libcxgb_exit);

If these functions don't have to do anything, you can remove this
entire sequence entirely.


Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-07-19 Thread Joe Perches
On Tue, 2016-07-19 at 20:26 +0300, Leon Romanovsky wrote:
> On Tue, Jul 19, 2016 at 02:09:25PM +0300, Netanel Belgazal wrote:
> > This is the debugging message interface.
> > https://www.kernel.org/doc/Documentation/networking/netif-msg.txt
> This document was updated last time in 2006 and I doubt that it is
> relevant in 2016. You have dynamic debug prints infrastructure for it,
> use it.

I think this is uninformed.

netif_ works well, is compatible with dynamic debug,
and is commonly used with new networking drivers.



Re: [PATCH v8 06/11] net/mlx4_en: add page recycle to prepare rx ring for tx support

2016-07-19 Thread Brenden Blanco
On Tue, Jul 19, 2016 at 04:33:28PM +0300, Tariq Toukan wrote:
[...]
> >So, I took Dave's suggestion to heart, and spent the last 2 days seeing
> >what was possible to implement with just xdp as the focus, rather than
> >an overall cleanup which Tariq will be looking at.
> >
> >Unfortunately, this turned out to a be a bit of a rat hole.
> >
> >What I wanted to do was to pre-allocate all the required pages before
> >reaching the point of no return. Doing this isn't all that hard, since
> >it should just be a few loops. However, I ended with a bit more
> >duplicated code than one would like, since I had to tease out the
> >various sections that assume exclusive access to hardware.
> >
> >But, more than that, is that I don't see a way to fill these pages into
> >the rings safely while hardware still has ability to write into the old
> >ones. There was no "pause" API that I could find besides
> >mlx4_en_stop_port(). That function is fairly destructive and requires
> >the resource allocation in mlx4_en_start_port() to succeed to recover
> >the port status.
> >
> >One option that I considered would be to drain buffers from the rx ring,
> >and just let mlx4_en_recover_from_oom() do its job once we update the
> >page template in frag_info[]. This, however, also requires the queues to
> >be paused safely, so we again have to rely on mlx4_en_stop_port().
> >
> >One change I can make is to avoid allocating additional tx rings, which
> >means that we can skip the calls to mlx4_en_free/alloc_resources().
> I think it's more natural to manage the two types of tx rings
> without converting one type to the other dynamically, just to avoid
> re-allocation.
> I don't see the re-allocation of resources as a serious issue here,
> especially after I submitted patches that add resiliency and make it
> more safe (see them in [1]).
Yes, I saw those, it would be helpful for the v8 version but with v9
onwards I guess it is not conflicting, since no reallocation is done.
> We just need to make sure we don't end up with an inoperative port,
> I will take care of it once net and net-next are merged.
Thanks
> 
> I'd like to keep the tx rings separation, also when it comes to
> resource allocation, so that we allocate xdp rings when needed, and
> not borrow them from the other type.
> I have two possible solutions in mind:
> 1) Based on the suggestion you have in v8, in which rsv tx rings
> grow beyond to the regular tx array, while maintaining the
> accounting.
> One improvement to make the code more clear and readable is to
> explicitly define two fields for accounting the number of tx rings:
> - rename rsv_tx_rings to xdp_tx_rings.
I'll incorporate this one, since anyway the code is changing a bit to
accommodate prog-per-ring.
> - have a new priv->netdev_num_tx_rings = priv->num_tx_rings - xdp_tx_rings.
> and then modify driver rings iterators accordingly.
The ring iteration is sprinkled in too many places to incorporate
easily, so this makes sense as its own cleanup, with either (1) or (2)
from your proposal.
> 
> 2) Another solution I have in mind goes as follows.
> Expand the current tx_ring array to be two dimensional, where
> tx_ring[0] points to the regular tx array, and tx_ring[1] points to
> the xdp rings.
> We look to keep using the same napi handlers and not get into code
> duplication, and make minimal adjustments that do not harm
> performance.
> The idea is to mark CQ's types in creation time, so that we know for
> each CQ whether it's for a regular TX ring or an XDP tx ring.
> When we get a completion, we can just read the CQ type, and then
> choose the correct tx_ring according to cq type and cq->ring,
> something like tx_ring[cq->cq_type][cq->ring]; (we can do it so that
> we avoid using an if statement).
> It is possible to use the existing enum cq_type field "cq->is_tx",
> rename it (say to 'type'), and expand the enum cq_type with an
> additional value for TX_XDP.
> In addition, similarly to the previous suggestion, also here we
> would want to use two separate fields for the number of tx rings,
> one for netdev and one for xdp, and modify the rings iterators
> accordingly.
I think (2) makes sense, but the proof will be in the code, depending on
how easy to review it is.
> 
> [1]:
> https://patchwork.ozlabs.org/patch/649620/
> https://patchwork.ozlabs.org/patch/649619/
> 
> Another (not related) nit, as you're already working on the next V,
> I suggest we rename priv->prog to priv->bpf_prog.
I think xdp_prog is better, but point taken. Will update.
> 
> Regards,
> Tariq


Re: [PATCH V2] Add flow control to the portmapper

2016-07-19 Thread Leon Romanovsky
On Tue, Jul 19, 2016 at 09:50:24AM -0500, Shiraz Saleem wrote:
> On Tue, Jul 19, 2016 at 08:40:06AM +0300, Leon Romanovsky wrote:
> > 
> > You are the one user of this new inline function.
> > Why don't you directly call to netlink_unicast() in your ibnl_unicast()
> > without messing with widely visible header file?
> 
> Since there is a non-blocking version of nlmsg_unicast(), the idea is 
> to make a blocking version available to others as well as maintain 
> consistency of existing code.
> 

In such way, please provide patch series which will convert all other
users to this new call.

➜  linux-rdma git:(master) grep -rI netlink_unicast * | grep -I 0
kernel/audit.c: err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
kernel/audit.c: netlink_unicast(aunet->nlsk, skb, dest->portid, 0);
kernel/audit.c: netlink_unicast(aunet->nlsk , reply->skb, reply->portid, 0);
kernel/audit.c: return netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
samples/connector/cn_test.c:netlink_unicast(nls, skb, 0, 0);

Thanks


signature.asc
Description: Digital signature


Re: [PATCH net-next V3] net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)

2016-07-19 Thread Leon Romanovsky
On Tue, Jul 19, 2016 at 02:09:25PM +0300, Netanel Belgazal wrote:
> 
> 
> On 07/15/2016 08:00 AM, Leon Romanovsky wrote:
> > On Thu, Jul 14, 2016 at 09:46:14AM +0300, Netanel Belgazal wrote:
> >> This is a driver for the ENA family of networking devices.
> >>
> >> Signed-off-by: Netanel Belgazal 
> >> ---
> >>
> >> Notes:
> > ...
> >
> >> - Increase driver version to 1.0.2
> > ...
> >
> >> +static void ena_get_drvinfo(struct net_device *dev,
> >> +  struct ethtool_drvinfo *info)
> >> +{
> >> +  struct ena_adapter *adapter = netdev_priv(dev);
> >> +
> >> +  strlcpy(info->driver, DRV_MODULE_NAME, sizeof(info->driver));
> >> +  strlcpy(info->version, DRV_MODULE_VERSION, sizeof(info->version));
> > Does module version give anything valuable in real life usage?
> > Do you plan to bump version after every patch?
> >
> > Hint, NO.
> 
> I think it is appropriate to expose driver version to ethtool, and itis 
> appropriate to be able to version a driver in upstream (mainly for debug 
> purpose)
> I don't think there is upstream agreementthat no driver should be allowed to 
> maintain a versionnumber.

You didn't answer on my questions, so I suppose that this version
interface will be forgotten and won't be relevant after first major
rework.

You have kernel version to know which driver you are running, mixing
different versions of driver with other kernels are seeing as
not-supported by the community.

> 
> >> +  strlcpy(info->bus_info, pci_name(adapter->pdev),
> >> +  sizeof(info->bus_info));
> >> +}
> >> +
> >> +
> > ...
> >
> >> +
> >> +static char version[] =
> >> +  DEVICE_NAME " v"
> >> +  DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
> >> +
> >> +MODULE_AUTHOR("Amazon.com, Inc. or its affiliates");
> >> +MODULE_DESCRIPTION(DEVICE_NAME);
> >> +MODULE_LICENSE("GPL");
> >> +MODULE_VERSION(DRV_MODULE_VERSION);
> >> +
> >> +/* Time in jiffies before concluding the transmitter is hung. */
> >> +#define TX_TIMEOUT  (5 * HZ)
> >> +
> >> +#define ENA_NAPI_BUDGET 64
> >> +
> >> +#define DEFAULT_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_PROBE | 
> >> NETIF_MSG_IFUP | \
> >> +  NETIF_MSG_TX_DONE | NETIF_MSG_TX_ERR | NETIF_MSG_RX_ERR)
> >> +static int debug = -1;
> >> +module_param(debug, int, 0);
> >> +MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
> > What is it?
> 
> This is the debugging message interface.
> https://www.kernel.org/doc/Documentation/networking/netif-msg.txt

This document was updated last time in 2006 and I doubt that it is
relevant in 2016. You have dynamic debug prints infrastructure for it,
use it.


signature.asc
Description: Digital signature


Re: [PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr

2016-07-19 Thread Florian Fainelli
On 07/19/2016 07:20 AM, Andrew Lunn wrote:
> On Mon, Jul 18, 2016 at 09:26:00PM -0700, Florian Fainelli wrote:
>> Le 18/07/2016 à 20:24, Andrew Lunn a écrit :
>>> On Mon, Jul 18, 2016 at 08:45:38PM -0400, Vivien Didelot wrote:
 Add a new function for DSA drivers to handle the switchdev
 SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.

 The ageing time is passed as milliseconds.

 Also because we can have multiple logical bridges on top of a physical
 switch and ageing time are switch-wide, call the driver function with
 the fastest ageing time in use on the chip instead of the requested one.

 Signed-off-by: Vivien Didelot 
 ---
  include/net/dsa.h |  2 ++
  net/dsa/slave.c   | 41 +
>>>
>>
>> Hi Andrew,
>>
>>> Hi Florian
>>>
>>> It looks like the SF2 can do fast ageing per port. What i don't see if
>>> what configuration options you have. Can you get the fast and the
>>> normal age time per port? Or is it global?
>>
>> The normal ageing is global and the value needs to be programmed in
>> seconds, can can range from 10 to 1,048,575 (encoded on 20 bits). The
>> fast-ageing can actually be per-port, per-VLAN id, for just dynamic or
>> static entries etc. and is just a poor name for a flush based on any of
>> these criteria.
> 
> Hi Florian
> 
> So fast ageing does not have a timer value associated to it? It is
> just a flush?

Correct, it's a flush operation which is internally implemented/named as
a fast aging, as in fast enough it is almost instantenous from the
programmer's perspective.

> 
> If so, the code Vivien is proposing in DSA slave is O.K. If however
> there was a per port timer, Vivien's code is too high in the stack,
> blocking SF2 from being able to use per-port timers. That is what i'm
> trying to get at.

Browsing through all the generations, there does not seem to be any
per-port aging, it's always global.
-- 
Florian


Re: [PATCH v4 0/7] thunderbolt: Introducing Thunderbolt(TM) networking

2016-07-19 Thread Joe Perches
On Mon, 2016-07-18 at 13:00 +0300, Amir Levy wrote:
> This is version 4 of Thunderbolt(TM) driver for non-Apple hardware.
[]
>  Documentation/00-INDEX   |2 +
>  Documentation/thunderbolt-networking.txt |  135 ++
>  drivers/thunderbolt/Kconfig  |   25 +-
>  drivers/thunderbolt/Makefile |3 +-
>  drivers/thunderbolt/icm/Makefile |   28 +
>  drivers/thunderbolt/icm/icm_nhi.c| 1642 +
>  drivers/thunderbolt/icm/icm_nhi.h|   93 ++
>  drivers/thunderbolt/icm/net.c| 2276 
> ++
>  drivers/thunderbolt/icm/net.h|  274 
>  drivers/thunderbolt/nhi_regs.h   |  115 +-
>  10 files changed, 4585 insertions(+), 8 deletions(-)
>  create mode 100644 Documentation/thunderbolt-networking.txt
>  create mode 100644 drivers/thunderbolt/icm/Makefile
>  create mode 100644 drivers/thunderbolt/icm/icm_nhi.c
>  create mode 100644 drivers/thunderbolt/icm/icm_nhi.h
>  create mode 100644 drivers/thunderbolt/icm/net.c
>  create mode 100644 drivers/thunderbolt/icm/net.h

Is a MAINTAINERS update necessary or is Andreas going to
be the maintainer?


Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)

2016-07-19 Thread Sargun Dhillon



On Tue, 19 Jul 2016, Daniel Borkmann wrote:


Hi Sargun,

On 07/19/2016 11:32 AM, Sargun Dhillon wrote:

This allows user memory to be written to during the course of a kprobe.
It shouldn't be used to implement any kind of security mechanism
because of TOC-TOU attacks, but rather to debug, divert, and
manipulate execution of semi-cooperative processes.

Although it uses probe_kernel_write, we limit the address space
the probe can write into by checking the space with access_ok.
This call shouldn't sleep on any architectures based on review.

It was tested with the tracex7 program on x86-64.

Signed-off-by: Sargun Dhillon 
Cc: Alexei Starovoitov 

[...]

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ebfbb7d..45878f3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -81,6 +81,35 @@ static const struct bpf_func_proto bpf_probe_read_proto 
= {

.arg3_type  = ARG_ANYTHING,
  };

+static u64 bpf_copy_to_user(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+   void *to = (void *) (long) r1;
+   void *from = (void *) (long) r2;
+   int  size = (int) r3;


Nit: you have two spaces here?


+   struct task_struct *task = current;
+
+   /* check if we're in a user context */
+   if (unlikely(in_interrupt()))
+   return -EINVAL;
+   if (unlikely(!task || !task->pid))


Why !task->pid is needed? Can you point to the code where this is
set up as such? I'd assume if that's the case, there's a generic
helper for it to determine that the task_struct is a kernel task?



One example of precedent:
http://lxr.free-electrons.com/source/arch/x86/kernel/process.c#L270

Also, while testing this out, I instrumented ip_route_output_flow,
and I found that there were cases that were invoked from the kernel
there current was not NULL, but task->pid was reliably 0.


+   return -EINVAL;
+
+   /* Is this a user address, or a kernel address? */
+   if (!access_ok(VERIFY_WRITE, to, size))
+   return -EINVAL;
+
+   return probe_kernel_write(to, from, size);


I'm still worried that this can lead to all kind of hard to find
bugs or races for user processes, if you make this writable to entire
user address space (which is the only thing that access_ok() checks
for). What if the BPF program has bugs and writes to wrong addresses
for example, introducing bugs in some other, non-intended processes
elsewhere? Can this be limited to syscalls only? And if so, to the
passed memory only?

Have you played around with ptrace() to check whether you could
achieve similar functionality (was thinking about things like [1],
PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't
this be limited to a similar functionality for only the current task.
ptrace() utilizes helpers like access_process_vm(), maybe this can
similarly be adapted here, too (under the circumstances that sleeping
is not allowed)?

 [1] https://code.google.com/archive/p/ptrace-parasite/


I agree that this can get really complicated if it clashes with memory
that a task is currently manipulating while we do the copy_to_user.
I highlighted that in one of the commit messages not to rely on this
behaviour for anything "real" because TOC-TOU.

We could limit the use to be within syscalls and uprobes if you think
that's the best idea. Is there an easy way to find out, but I think
it's somewhat difficult to restrict it to passed memory only. In the
example app there's only one level of indirection that occurs,
and without some new rules in the verifier, I don't how to handle
this. A simple, problematic example is dealing with iovecs, and 
process_vm_readv, and process_vm_writev, where to find the actual buffer 
content I need to dereference once to get the list of iovecs, and then 
again, to find the buffer the iovec points to.


We looked at using ptrace, but our software is often working with blackbox 
code that sometimes does not like to ptrace'd. ptrace is kinda also a pain 
to work with, and has the same potential for data races you highlighted 
earlier. In fact, we have a demo of tracex7 that uses ptrace + 
process_vm_writev + seccomp, and it's much more complicated and fragile.


Just casually poking around, it looks like access_process_vm sleeps, 
resulting in a whole new set of complexities that Alexei highlighted 
earlier.



+}
+
+static const struct bpf_func_proto bpf_copy_to_user_proto = {
+   .func = bpf_copy_to_user,
+   .gpl_only = false,
+   .ret_type = RET_INTEGER,
+   .arg1_type = ARG_ANYTHING,
+   .arg2_type = ARG_PTR_TO_RAW_STACK,
+   .arg3_type = ARG_CONST_STACK_SIZE,


Nit: please tab-align '=' like we have in the other *_proto cases in
bpf_trace.


+};
+
  /*
   * limited trace_printk()
   * only %d %u %x %ld %lu %lx %lld %llu %llx %p %s conversion specifiers 
allowed


Thanks,
Daniel

Thanks for the review. Unless there's any other input, I'll go ahead and 
resubmit the 

Re: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread kbuild test robot
Hi,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-Hyper-V-VM-Sockets-hv_sock/20160715-223433
config: x86_64-randconfig-a0-07191719 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
   net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in 
this function
   net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used uninitialized 
in this function
   net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized in 
this function
   net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
   net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in 
this function
>> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: 
>> function has unreachable instruction

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH net-next v3 1/2] bpf: Add bpf_copy_to_user BPF helper to be called in tracers (kprobes)

2016-07-19 Thread Alexei Starovoitov
On Tue, Jul 19, 2016 at 01:17:53PM +0200, Daniel Borkmann wrote:
> >+return -EINVAL;
> >+
> >+/* Is this a user address, or a kernel address? */
> >+if (!access_ok(VERIFY_WRITE, to, size))
> >+return -EINVAL;
> >+
> >+return probe_kernel_write(to, from, size);
> 
> I'm still worried that this can lead to all kind of hard to find
> bugs or races for user processes, if you make this writable to entire
> user address space (which is the only thing that access_ok() checks
> for). What if the BPF program has bugs and writes to wrong addresses
> for example, introducing bugs in some other, non-intended processes
> elsewhere? Can this be limited to syscalls only? And if so, to the
> passed memory only?

my understanding that above code will write only to memory of current process,
so impact is contained and in that sense buggy kprobe program is no different
from buggy seccomp prorgram.
Limiting this to syscalls will make it too limited.
I'm in favor of this change, because it allows us to experiment
with restartable sequences and lock-free algorithms that need ultrafast
access to cpuid without burdening the kernel with stable abi.

> Have you played around with ptrace() to check whether you could
> achieve similar functionality (was thinking about things like [1],
> PTRACE_PEEK{TEXT,DATA} / PTRACE_POKE{TEXT,DATA}). If not, why can't
> this be limited to a similar functionality for only the current task.
> ptrace() utilizes helpers like access_process_vm(), maybe this can
> similarly be adapted here, too (under the circumstances that sleeping
> is not allowed)?

If we hack access_process_vm I think at the end it will look like
probe_kernel_write. Walking mm requires semaphore, so we would only
be able to do it in task_work and there we can do normal copy_to_user
just as well, but it will complicate the programs quite a bit, since
writes will be asynchronous and batched.
Looks like with access_ok+probe_write we can achieve the same
with a lot less code.
As far as races between user and bpf program, yeah, if process
is multithreaded, the other threads may access the same mem that
bpf is writing to, but that's no different from reading.
For tracing we walk complex datastructures and pointers. They
can be changed by user space on the fly and bpf will see garbage.
Like we have uprobe+bpf that walks clang c++ internal datastructures
to figure out how long it takes clang to process .h headers.
There is a lot of fragility in the bpf script, yet it's pretty
useful to quickly analyze compile times.
I see bpf_copy_to_user to be hugely valuable too, not as a stable
interface, but rather as a tool to quickly debug and experiment.



Re: [PATCH v3] packet: fix second argument of sock_tx_timestamp()

2016-07-19 Thread Willem de Bruijn
On Tue, Jul 19, 2016 at 8:34 AM, Soheil Hassas Yeganeh
 wrote:
> On Tue, Jul 19, 2016 at 1:40 AM, Yoshihiro Shimoda
>  wrote:
>> This patch fixes an issue that a syscall (e.g. sendto syscall) cannot
>> work correctly. Since the sendto syscall doesn't have msg_control buffer,
>> the sock_tx_timestamp() in packet_snd() cannot work correctly because
>> the socks.tsflags is set to 0.
>> So, this patch sets the socks.tsflags to sk->sk_tsflags as default.
>>
>> Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
>> Cc: 
>> Reported-by: Kazuya Mizuguchi 
>> Reported-by: Keita Kobayashi 
>> Signed-off-by: Yoshihiro Shimoda 
> Acked-by: Soheil Hassas Yeganeh 

Acked-by: Willem de Bruijn 


Re: [PATCH 2/3] staging/rtl8192e: use s8 instead of char

2016-07-19 Thread Jes Sorensen
Arnd Bergmann  writes:
> On Tuesday, July 19, 2016 11:46:04 AM CEST Jes Sorensen wrote:
>> > diff --git a/drivers/staging/rtl8192e/rtl819x_TSProc.c 
>> > b/drivers/staging/rtl8192e/rtl819x_TSProc.c
>> > index 2c8a526773ed..e0a2fe5e6148 100644
>> > --- a/drivers/staging/rtl8192e/rtl819x_TSProc.c
>> > +++ b/drivers/staging/rtl8192e/rtl819x_TSProc.c
>> > @@ -323,7 +323,7 @@ bool GetTs(struct rtllib_device *ieee, struct 
>> > ts_common_info **ppTS,
>> >   if (ieee->current_network.qos_data.supported == 0) {
>> >   UP = 0;
>> >   } else {
>> > - if (!IsACValid(TID)) {
>> > + if (!IsACValid((s8)TID)) {
>> >   netdev_warn(ieee->dev, "%s(): TID(%d) is not 
>> > valid\n",
>> >   __func__, TID);
>> >   return false;
>> 
>> TID is a 4-bit field, it should never go negative. The cast to s8 seems
>> wrong to me, if anything it should be using u8. I do realize the macro
>> IsACValid checks against negative too, but that just looks silly to me.
>
> Ok, I'll remove the extra comparison then to avoid the warning:
>
> staging/rtl8192e/rtl819x_TSProc.c:326:14: error: comparison is always
> true due to limited range of data type [-Werror=type-limits]
>
> I guess it should be a separate patch. I had just stumbled over the
> same thing before resending the patch but decided not to change it
> to keep the patch simple.

I think that would be better, albeit not a big issue. I'd like to get
rid of all the drivers/staging/rtl* drivers eventually :)

Cheers,
Jes


Re: [PATCH] packet: fix second argument of sock_tx_timestamp()

2016-07-19 Thread kbuild test robot
Hi,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.7-rc7 next-20160719]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Yoshihiro-Shimoda/packet-fix-second-argument-of-sock_tx_timestamp/20160719-194240
config: i386-defconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/packet/af_packet.c: In function 'tpacket_snd':
>> net/packet/af_packet.c:2645:18: error: 'sk' undeclared (first use in this 
>> function)
 sockc.tsflags = sk->sk_tsflags;
 ^~
   net/packet/af_packet.c:2645:18: note: each undeclared identifier is reported 
only once for each function it appears in

vim +/sk +2645 net/packet/af_packet.c

  2639  goto out;
  2640  proto   = saddr->sll_protocol;
  2641  addr= saddr->sll_addr;
  2642  dev = dev_get_by_index(sock_net(>sk), 
saddr->sll_ifindex);
  2643  }
  2644  
> 2645  sockc.tsflags = sk->sk_tsflags;
  2646  if (msg->msg_controllen) {
  2647  err = sock_cmsg_send(>sk, msg, );
  2648  if (unlikely(err))

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH 2/3] staging/rtl8192e: use s8 instead of char

2016-07-19 Thread Arnd Bergmann
On Tuesday, July 19, 2016 11:46:04 AM CEST Jes Sorensen wrote:
> > diff --git a/drivers/staging/rtl8192e/rtl819x_TSProc.c 
> > b/drivers/staging/rtl8192e/rtl819x_TSProc.c
> > index 2c8a526773ed..e0a2fe5e6148 100644
> > --- a/drivers/staging/rtl8192e/rtl819x_TSProc.c
> > +++ b/drivers/staging/rtl8192e/rtl819x_TSProc.c
> > @@ -323,7 +323,7 @@ bool GetTs(struct rtllib_device *ieee, struct 
> > ts_common_info **ppTS,
> >   if (ieee->current_network.qos_data.supported == 0) {
> >   UP = 0;
> >   } else {
> > - if (!IsACValid(TID)) {
> > + if (!IsACValid((s8)TID)) {
> >   netdev_warn(ieee->dev, "%s(): TID(%d) is not valid\n",
> >   __func__, TID);
> >   return false;
> 
> TID is a 4-bit field, it should never go negative. The cast to s8 seems
> wrong to me, if anything it should be using u8. I do realize the macro
> IsACValid checks against negative too, but that just looks silly to me.

Ok, I'll remove the extra comparison then to avoid the warning:

staging/rtl8192e/rtl819x_TSProc.c:326:14: error: comparison is always true due 
to limited range of data type [-Werror=type-limits]

I guess it should be a separate patch. I had just stumbled over the
same thing before resending the patch but decided not to change it
to keep the patch simple.

Arnd


Re: [PATCH 3/3] staging/rtl8192u: use s8 instead of char

2016-07-19 Thread Jes Sorensen
Arnd Bergmann  writes:
> Compiling the rtlwifi drivers for ARM with gcc -Wextra warns about lots of
> incorrect code that results from 'char' being unsigned here, e.g.
>
> staging/rtl8192u/r8192U_core.c:4150:16: error: comparison is always false due 
> to limited range of data type [-Werror=type-limits]
> staging/rtl8192u/r8192U_dm.c:646:50: error: comparison is always false due to 
> limited range of data type [-Werror=type-limits]
>
> This patch changes all uses of 'char' in this driver that refer to
> 8-bit integers to use 's8' instead, which is signed on all architectures.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/staging/rtl8192u/ieee80211/ieee80211.h |  4 ++--
>  drivers/staging/rtl8192u/r8192U.h  |  4 ++--
>  drivers/staging/rtl8192u/r8192U_core.c | 14 +++---
>  3 files changed, 11 insertions(+), 11 deletions(-)

Looks good to me

Acked-by: Jes Sorensen 


Re: [PATCH 2/3] staging/rtl8192e: use s8 instead of char

2016-07-19 Thread Jes Sorensen
Arnd Bergmann  writes:
> Compiling the rtlwifi drivers for ARM with gcc -Wextra warns about lots of
> incorrect code that results from 'char' being unsigned here, e.g.
>
> staging/rtl8192e/rtl8192e/r8192E_phy.c:1072:36: error: comparison is always 
> false due to limited range of data type [-Werror=type-limits]
> staging/rtl8192e/rtl8192e/r8192E_phy.c:1104:36: error: comparison is always 
> false due to limited range of data type [-Werror=type-limits]
> staging/rtl8192e/rtl8192e/rtl_core.c:1987:16: error: comparison is always 
> false due to limited range of data type [-Werror=type-limits]
> staging/rtl8192e/rtl8192e/rtl_dm.c:782:37: error: comparison is always false 
> due to limited range of data type [-Werror=type-limits]
> staging/rtl8192e/rtl819x_TSProc.c:326:14: error: comparison is always true 
> due to limited range of data type [-Werror=type-limits]
> staging/rtl8192e/rtllib_softmac_wx.c:465:16: error: comparison is always 
> false due to limited range of data type [-Werror=type-limits]
>
> This patch changes all uses of 'char' in this driver that refer to
> 8-bit integers to use 's8' instead, which is signed on all architectures.
>
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c | 8 
>  drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c | 2 +-
>  drivers/staging/rtl8192e/rtl8192e/rtl_core.c   | 6 +++---
>  drivers/staging/rtl8192e/rtl8192e/rtl_core.h   | 8 
>  drivers/staging/rtl8192e/rtl819x_TSProc.c  | 2 +-
>  5 files changed, 13 insertions(+), 13 deletions(-)
>

Most of this looks fine to me. One issue stands out which I don't think
is right:

> diff --git a/drivers/staging/rtl8192e/rtl819x_TSProc.c 
> b/drivers/staging/rtl8192e/rtl819x_TSProc.c
> index 2c8a526773ed..e0a2fe5e6148 100644
> --- a/drivers/staging/rtl8192e/rtl819x_TSProc.c
> +++ b/drivers/staging/rtl8192e/rtl819x_TSProc.c
> @@ -323,7 +323,7 @@ bool GetTs(struct rtllib_device *ieee, struct 
> ts_common_info **ppTS,
>   if (ieee->current_network.qos_data.supported == 0) {
>   UP = 0;
>   } else {
> - if (!IsACValid(TID)) {
> + if (!IsACValid((s8)TID)) {
>   netdev_warn(ieee->dev, "%s(): TID(%d) is not valid\n",
>   __func__, TID);
>   return false;

TID is a 4-bit field, it should never go negative. The cast to s8 seems
wrong to me, if anything it should be using u8. I do realize the macro
IsACValid checks against negative too, but that just looks silly to me.

Cheers,
Jes


Re: [PATCH v2 05/11] Kbuild: don't add obj tree in additional includes

2016-07-19 Thread Arnd Bergmann
On Tuesday, July 19, 2016 5:33:44 PM CEST Kalle Valo wrote:
> Arnd Bergmann  writes:
> 
> > On Monday, July 18, 2016 10:14:39 PM CEST Michal Marek wrote:
> >> On Wed, Jun 15, 2016 at 05:45:47PM +0200, Arnd Bergmann wrote:
> >> > When building with separate object directories and driver specific
> >> > Makefiles that add additional header include paths, Kbuild adjusts
> >> > the gcc flags so that we include both the directory in the source
> >> > tree and in the object tree.
> >> > 
> >> > However, due to another bug I fixed earlier, this did not actually
> >> > include the correct directory in the object tree, so we know that
> >> > we only really need the source tree here. Also, including the
> >> > object tree sometimes causes warnings about nonexisting directories
> >> > when the include path only exists in the source.
> >> > 
> >> > This changes the logic to only emit the -I argument for the srctree,
> >> > not for objects. We still need both $(srctree)/$(src) and $(obj)
> >> > though, so I'm adding them manually.
> >> > 
> >> > Signed-off-by: Arnd Bergmann 
> >> 
> >> Hi Arnd,
> >> 
> >> I applied the series up to this patch to kbuild.git#kbuild. The rest
> >> seem to be related but not dependent patches, so I'll leave it up to the
> >> respective maintainers to pick them up. Is that OK with you?
> >
> > I think that's fine, a couple were already picked up, and what I have
> > left now is
> >
> > a281bfa5713a [SUBMITTED 20160615] [EXPERIMENTAL] Kbuild: enable 
> > -Wmissing-include-dirs by default
> > 83934921e68e [SUBMITTED 20160615] rtlwifi: don't add include path for 
> > rtl8188ee
> 
> Apparently[1] you didn't CC linux-wireless and that's why I didn't see
> the rtlwifi patch in wireless patchwork. Care to resend?
> 
> [1] https://patchwork.kernel.org/patch/9178861/
> 

Done. I've also thrown in two patches for drivers/staging/rtl8*/ that I 
submitted
a while ago, but I'm not sure if they should get merged through the staging
tree or the wireless tree. I had previously submitted those two as a combined
patch along with a third one that turned out to be unnecessary.

Arnd


[PATCH 3/3] staging/rtl8192u: use s8 instead of char

2016-07-19 Thread Arnd Bergmann
Compiling the rtlwifi drivers for ARM with gcc -Wextra warns about lots of
incorrect code that results from 'char' being unsigned here, e.g.

staging/rtl8192u/r8192U_core.c:4150:16: error: comparison is always false due 
to limited range of data type [-Werror=type-limits]
staging/rtl8192u/r8192U_dm.c:646:50: error: comparison is always false due to 
limited range of data type [-Werror=type-limits]

This patch changes all uses of 'char' in this driver that refer to
8-bit integers to use 's8' instead, which is signed on all architectures.

Signed-off-by: Arnd Bergmann 
---
 drivers/staging/rtl8192u/ieee80211/ieee80211.h |  4 ++--
 drivers/staging/rtl8192u/r8192U.h  |  4 ++--
 drivers/staging/rtl8192u/r8192U_core.c | 14 +++---
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211.h 
b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
index 09e9499b7f9d..077ea13eb1e7 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211.h
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
@@ -746,7 +746,7 @@ struct ieee80211_rx_stats {
bool  bisrxaggrsubframe;
bool  bPacketBeacon;//cosa add for rssi
bool  bToSelfBA;//cosa add for rssi
-   char  cck_adc_pwdb[4];  //cosa add for rx path selection
+   s8cck_adc_pwdb[4];  //cosa add for rx path selection
u16   Seq_Num;
 
 };
@@ -1814,7 +1814,7 @@ struct ieee80211_device {
u32 wpax_type_notify; //{added by David, 2006.9.26}
 
/* QoS related flag */
-   char init_wmmparam_flag;
+   s8  init_wmmparam_flag;
/* set on initialization */
u8  qos_support;
 
diff --git a/drivers/staging/rtl8192u/r8192U.h 
b/drivers/staging/rtl8192u/r8192U.h
index 5dba6a28dd9b..b28bc7812caa 100644
--- a/drivers/staging/rtl8192u/r8192U.h
+++ b/drivers/staging/rtl8192u/r8192U.h
@@ -533,7 +533,7 @@ typedef struct _rt_9x_tx_rate_history {
u32 ht_mcs[4][16];
 } rt_tx_rahis_t, *prt_tx_rahis_t;
 typedef struct _RT_SMOOTH_DATA_4RF {
-   charelements[4][100]; /* array to store values */
+   s8elements[4][100]; /* array to store values */
u32 index;/* index to current array to store */
u32 TotalNum; /* num of valid elements */
u32 TotalVal[4];  /* sum of valid elements */
@@ -1031,7 +1031,7 @@ typedef struct r8192_priv {
s8 cck_present_attentuation;
u8 cck_present_attentuation_20Mdefault;
u8 cck_present_attentuation_40Mdefault;
-   char cck_present_attentuation_difference;
+   s8 cck_present_attentuation_difference;
bool btxpower_tracking;
bool bcck_in_ch14;
bool btxpowerdata_readfromEEPORM;
diff --git a/drivers/staging/rtl8192u/r8192U_core.c 
b/drivers/staging/rtl8192u/r8192U_core.c
index dd0970facdf5..f36b2d3b1ee9 100644
--- a/drivers/staging/rtl8192u/r8192U_core.c
+++ b/drivers/staging/rtl8192u/r8192U_core.c
@@ -4209,7 +4209,7 @@ static void rtl8192_process_phyinfo(struct r8192_priv 
*priv, u8 *buffer,
  *
  * Return: 0-100 percentage
  *---*/
-static u8 rtl819x_query_rxpwrpercentage(char antpower)
+static u8 rtl819x_query_rxpwrpercentage(s8 antpower)
 {
if ((antpower <= -100) || (antpower >= 20))
return  0;
@@ -4220,9 +4220,9 @@ static u8 rtl819x_query_rxpwrpercentage(char antpower)
 
 }  /* QueryRxPwrPercentage */
 
-static u8 rtl819x_evm_dbtopercentage(char value)
+static u8 rtl819x_evm_dbtopercentage(s8 value)
 {
-   char ret_val;
+   s8 ret_val;
 
ret_val = value;
 
@@ -4297,8 +4297,8 @@ static void rtl8192_query_rxphystatus(struct r8192_priv 
*priv,
phy_ofdm_rx_status_rxsc_sgien_exintfflag *prxsc;
u8  *prxpkt;
u8  i, max_spatial_stream, tmp_rxsnr, tmp_rxevm, rxsc_sgien_exflg;
-   charrx_pwr[4], rx_pwr_all = 0;
-   charrx_snrX, rx_evmX;
+   s8  rx_pwr[4], rx_pwr_all = 0;
+   s8  rx_snrX, rx_evmX;
u8  evm, pwdb_all;
u32 RSSI, total_rssi = 0;
u8  is_cck_rate = 0;
@@ -4423,7 +4423,7 @@ static void rtl8192_query_rxphystatus(struct r8192_priv 
*priv,
 
/* Get Rx snr value in DB */
tmp_rxsnr = pofdm_buf->rxsnr_X[i];
-   rx_snrX = (char)(tmp_rxsnr);
+   rx_snrX = (s8)(tmp_rxsnr);
rx_snrX /= 2;
priv->stats.rxSNRdB[i] = (long)rx_snrX;
 
@@ -4457,7 +4457,7 @@ static void rtl8192_query_rxphystatus(struct r8192_priv 
*priv,
 
for (i = 0; i < max_spatial_stream; i++) {
tmp_rxevm = pofdm_buf->rxevm_X[i];
-   rx_evmX = (char)(tmp_rxevm);
+   rx_evmX = 

[PATCH 1/3] rtlwifi: don't add include path for rtl8188ee

2016-07-19 Thread Arnd Bergmann
For rtl8188ee, we pass -Idrivers/net/wireless/rtlwifi/ to gcc,
however that directy no longer exists, so evidently this option
is no longer required here and can be removed to avoid a warning
when building with 'make W=1' or 'gcc -Wmissing-include-dirs'

Signed-off-by: Arnd Bergmann 
---
 drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile 
b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile
index a85419a37651..676e7de27f27 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile
@@ -12,4 +12,4 @@ rtl8188ee-objs := \
 
 obj-$(CONFIG_RTL8188EE) += rtl8188ee.o
 
-ccflags-y += -Idrivers/net/wireless/rtlwifi -D__CHECK_ENDIAN__
+ccflags-y += -D__CHECK_ENDIAN__
-- 
2.9.0



[PATCH 2/3] staging/rtl8192e: use s8 instead of char

2016-07-19 Thread Arnd Bergmann
Compiling the rtlwifi drivers for ARM with gcc -Wextra warns about lots of
incorrect code that results from 'char' being unsigned here, e.g.

staging/rtl8192e/rtl8192e/r8192E_phy.c:1072:36: error: comparison is always 
false due to limited range of data type [-Werror=type-limits]
staging/rtl8192e/rtl8192e/r8192E_phy.c:1104:36: error: comparison is always 
false due to limited range of data type [-Werror=type-limits]
staging/rtl8192e/rtl8192e/rtl_core.c:1987:16: error: comparison is always false 
due to limited range of data type [-Werror=type-limits]
staging/rtl8192e/rtl8192e/rtl_dm.c:782:37: error: comparison is always false 
due to limited range of data type [-Werror=type-limits]
staging/rtl8192e/rtl819x_TSProc.c:326:14: error: comparison is always true due 
to limited range of data type [-Werror=type-limits]
staging/rtl8192e/rtllib_softmac_wx.c:465:16: error: comparison is always false 
due to limited range of data type [-Werror=type-limits]

This patch changes all uses of 'char' in this driver that refer to
8-bit integers to use 's8' instead, which is signed on all architectures.

Signed-off-by: Arnd Bergmann 
---
 drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c | 8 
 drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c | 2 +-
 drivers/staging/rtl8192e/rtl8192e/rtl_core.c   | 6 +++---
 drivers/staging/rtl8192e/rtl8192e/rtl_core.h   | 8 
 drivers/staging/rtl8192e/rtl819x_TSProc.c  | 2 +-
 5 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c 
b/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c
index ba64a4f1b3a8..8d6bca61e7aa 100644
--- a/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c
+++ b/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c
@@ -1487,8 +1487,8 @@ static void _rtl92e_query_rxphystatus(
struct phy_ofdm_rx_status_rxsc_sgien_exintfflag *prxsc;
u8 *prxpkt;
u8 i, max_spatial_stream, tmp_rxsnr, tmp_rxevm, rxsc_sgien_exflg;
-   char rx_pwr[4], rx_pwr_all = 0;
-   char rx_snrX, rx_evmX;
+   s8 rx_pwr[4], rx_pwr_all = 0;
+   s8 rx_snrX, rx_evmX;
u8 evm, pwdb_all;
u32 RSSI, total_rssi = 0;
u8 is_cck_rate = 0;
@@ -1613,7 +1613,7 @@ static void _rtl92e_query_rxphystatus(
 2) - 110;
 
tmp_rxsnr = pofdm_buf->rxsnr_X[i];
-   rx_snrX = (char)(tmp_rxsnr);
+   rx_snrX = (s8)(tmp_rxsnr);
rx_snrX /= 2;
priv->stats.rxSNRdB[i] = (long)rx_snrX;
 
@@ -1643,7 +1643,7 @@ static void _rtl92e_query_rxphystatus(
 
for (i = 0; i < max_spatial_stream; i++) {
tmp_rxevm = pofdm_buf->rxevm_X[i];
-   rx_evmX = (char)(tmp_rxevm);
+   rx_evmX = (s8)(tmp_rxevm);
 
rx_evmX /= 2;
 
diff --git a/drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c 
b/drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c
index 5e3bbe5c3ca4..0698131e4300 100644
--- a/drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c
+++ b/drivers/staging/rtl8192e/rtl8192e/r8192E_phy.c
@@ -630,7 +630,7 @@ void rtl92e_set_tx_power(struct net_device *dev, u8 channel)
 {
struct r8192_priv *priv = rtllib_priv(dev);
u8  powerlevel = 0, powerlevelOFDM24G = 0;
-   char ant_pwr_diff;
+   s8  ant_pwr_diff;
u32 u4RegValue;
 
if (priv->epromtype == EEPROM_93C46) {
diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c 
b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
index 13a5ddc2bea5..41e05f206300 100644
--- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
+++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
@@ -1982,7 +1982,7 @@ void rtl92e_update_rx_statistics(struct r8192_priv *priv,
weighting) / 6;
 }
 
-u8 rtl92e_rx_db_to_percent(char antpower)
+u8 rtl92e_rx_db_to_percent(s8 antpower)
 {
if ((antpower <= -100) || (antpower >= 20))
return  0;
@@ -1993,9 +1993,9 @@ u8 rtl92e_rx_db_to_percent(char antpower)
 
 }  /* QueryRxPwrPercentage */
 
-u8 rtl92e_evm_db_to_percent(char value)
+u8 rtl92e_evm_db_to_percent(s8 value)
 {
-   char ret_val;
+   s8 ret_val;
 
ret_val = value;
 
diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.h 
b/drivers/staging/rtl8192e/rtl8192e/rtl_core.h
index f627fdc15a58..6921125c9d35 100644
--- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.h
+++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.h
@@ -503,8 +503,8 @@ struct r8192_priv {
u32 Pwr_Track;
u8 CCKPresentAttentuation_20Mdefault;
u8 CCKPresentAttentuation_40Mdefault;
-   char CCKPresentAttentuation_difference;
-   char CCKPresentAttentuation;
+   s8 CCKPresentAttentuation_difference;
+   s8 CCKPresentAttentuation;
long undecorated_smoothed_pwdb;
 
u32 MCSTxPowerLevelOriginalOffset[6];
@@ -604,8 +604,8 @@ void 

Re: [PATCH v0 00/10] Convert Netgear WNR854T to devicetree

2016-07-19 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

> [Reducing the Cc: list a bit to networking people]
>
>> Okay. Frames sent from the port are EDSA-tagged (which isn't exactly
>> surprising), but I'm yet to see the switch receive 0xdada frames.
>> Even with the net-next branch which uses DSA_TAG_PROTO_EDSA for all
>> chip types.
>> 
>> However, the ethertype is reflecting the port:-
>> 
>> lan1/5 : ethertype Unknown (0xc028), length 176:
>> lan2/7 : ethertype Unknown (0xc038), length 176:
>> lan3/0 : ethertype Unknown (0xc000), length 176:
>> lan4/1 : ethertype Unknown (0xc008), length 176:
>> wan/2  : ethertype Unknown (0xc010), length 176:
>
> O.K, we broke it :-(
>
> The 6185 does not support EDSA, only DSA.

I'm wondering if EDSA could be simulated on 6185 (only) with the
CoreTagType register (0x19) and DoubleTag bit in Port Control (0x04)...

> Or we do something more radical, like add a driver callback to return
> the tagging protocol, rather than hard code it in the structure? We
> can then use a capability flag.

The driver callback is the way to go. The supported tag format is
identifiable via the register layout. 6352 and newer have a 2-bit
FrameMode in Port Control (0x04), while 6185 has a single DSA_Tag bit.

A new switch operation can return the tag protocol enum, or an exposed
xmit/recv pair from net/dsa/tag_*.c, which must be used by the CPU.

Note: 88E6060 (which is likely to be supported by mv88e6xxx) has the
same issue and requires this, since it uses the Trailer tagging format.

Thanks,

Vivien


Re: [PATCH net-next 1/1] net_sched: Introduce skbmod action

2016-07-19 Thread Daniel Borkmann

On 07/19/2016 03:56 PM, Jamal Hadi Salim wrote:
[...]

But apart from this,
neither pedit nor tcf_skbmod_run() here handle checksum complete, so you'll
potentially get false positives wrt csum corruption and drops as a result
when using either of the two.


pedit maybe tricky. Any suggestions?
On tcf_skbmod_run, mostly ignorance: while doing only ethernet updates;
is it still needed to do the checksum complete?


Well, what Cong recently fixed with mirred was related to mac header ...

You probably need skb_postpull_rcsum(), skb_postpush_rcsum() pair.

Also, what about skb_try_make_writable()?


Re: [PATCH V2] Add flow control to the portmapper

2016-07-19 Thread Shiraz Saleem
On Tue, Jul 19, 2016 at 08:40:06AM +0300, Leon Romanovsky wrote:
> 
> You are the one user of this new inline function.
> Why don't you directly call to netlink_unicast() in your ibnl_unicast()
> without messing with widely visible header file?

Since there is a non-blocking version of nlmsg_unicast(), the idea is 
to make a blocking version available to others as well as maintain 
consistency of existing code.



[PATCH net-next v3] cdc_ether: Improve ZTE MF823/831/910 handling

2016-07-19 Thread Kristian Evensen
The firmware in several ZTE devices (at least the MF823/831/910
modems/mifis) use OS fingerprinting to determine which type of device to
export. In addition, these devices export a REST API which can be used to
control the type of device. So far, on Linux, the devices have been seen as
RNDIS or CDC Ether.

When CDC Ether is used, devices of the same type are, as with RNDIS,
exported with the same, bogus random MAC address. In addition, the devices
(at least on all firmware revisions I have found) use the bogus MAC when
sending traffic routed from external networks. And as a final feature, the
devices sometimes export the link state incorrectly. There are also
references online to several other ZTE devices displaying this behavior,
with several different PIDs and MAC addresses.

This patch tries to improve the handling of ZTE devices by doing the
following:

* Create a new driver_info-struct that is used by ZTE devices that do not
have an explicit entry in the product table. This struct is the same as the
default cdc_ether driver info, but a new bind- and an rx_fixup-function
have been added.

* In the new bind function, we check if we have read a random MAC from the
device. If we have, then we generate a new random MAC address. This will
ensure that all devices get a unique MAC.

* The rx_fixup-function replaces the destination MAC address in the skb
with that of the device. I have not seen a revision of these devices that
behaves correctly (i.e., sets the right destination MAC), so I chose not to
do any comparison with for example the known, bogus addresses.

* The MF823/MF832/MF910 sometimes export cdc carrier on twice on connect
(the correct behavior is off then on). Work around this by manually setting
carrier to off if an on-notification is received and the NOCARRIER-bit is
not set.

This change will affect all devices, but it should take care of similar
mistakes made by other manufacturers. I tried to think of/look/test for
problems/regressions that could be introduced by this behavior, but could
not find any. However, my familiarity with this code path is not that
great, so there could be something I have overlooked.

I have tested this patch with multiple revisions of all three devices, and
they behave as expected. In other words, they all got a valid, random MAC,
the correct operational state and I can receive/sent traffic without
problems. I also tested with some other cdc_ether devices I have and did
not find any problems/regressions caused by the two general changes.

v2->v3:
* I had forgot to remove the random MAC generation from usbnet_cdc_bind()
(thanks Olive).
* Rework logic in the ZTE bind-function a bit.

v1->v2:
* Only generate random MAC for ZTE devices (thanks Oliver Neukum).
* Set random MAC and do RX fixup for all ZTE devices that do not have a
product-entry, as the bogus MAC have been seen on devices with several
different PIDs/MAC addresses. In other words, it seems to be the default
behavior of ZTE CDC Ether devices (thanks Lars Melin).

Signed-off-by: Kristian Evensen 
---
 drivers/net/usb/cdc_ether.c | 54 +
 1 file changed, 54 insertions(+)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 7cba2c3..874caad 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -388,6 +388,12 @@ void usbnet_cdc_status(struct usbnet *dev, struct urb *urb)
case USB_CDC_NOTIFY_NETWORK_CONNECTION:
netif_dbg(dev, timer, dev->net, "CDC: carrier %s\n",
  event->wValue ? "on" : "off");
+
+   /* Work-around for devices with broken off-notifications */
+   if (event->wValue &&
+   !test_bit(__LINK_STATE_NOCARRIER, >net->state))
+   usbnet_link_change(dev, 0, 0);
+
usbnet_link_change(dev, !!event->wValue, 0);
break;
case USB_CDC_NOTIFY_SPEED_CHANGE:   /* tx/rx rates */
@@ -432,6 +438,37 @@ int usbnet_cdc_bind(struct usbnet *dev, struct 
usb_interface *intf)
 }
 EXPORT_SYMBOL_GPL(usbnet_cdc_bind);
 
+static int usbnet_cdc_zte_bind(struct usbnet *dev, struct usb_interface *intf)
+{
+   int status = usbnet_cdc_bind(dev, intf);
+
+   if (!status && (dev->net->dev_addr[0] & 0x02))
+   eth_hw_addr_random(dev->net);
+
+   return status;
+}
+
+/* Make sure packets have correct destination MAC address
+ *
+ * A firmware bug observed on some devices (ZTE MF823/831/910) is that the
+ * device sends packets with a static, bogus, random MAC address (event if
+ * device MAC address has been updated). Always set MAC address to that of the
+ * device.
+ */
+static int usbnet_cdc_zte_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
+{
+   u8 num_buggy_hwaddrs;
+   u8 buggy_hwaddrs_idx = 0;
+
+   if (skb->len < ETH_HLEN || !(skb->data[0] & 0x02))
+   return 1;
+
+   skb_reset_mac_header(skb);
+   

Re: [PATCH net-next v3 11/12] net: dsa: mv88e6xxx: add G1 helper for ageing time

2016-07-19 Thread Andrew Lunn
> Hi Vivien
> 
> This is doing a read/modify/write, so should really hold the mutex.

Upps. The mutex is held. Sorry for the noise.

  Andrew


Re: [PATCH net-next v3 11/12] net: dsa: mv88e6xxx: add G1 helper for ageing time

2016-07-19 Thread Andrew Lunn
> +static int mv88e6xxx_g1_set_age_time(struct mv88e6xxx_chip *chip,
> +  unsigned int msecs)
> +{
> + const unsigned int coeff = chip->info->age_time_coeff;
> + const unsigned int min = 0x01 * coeff;
> + const unsigned int max = 0xff * coeff;
> + u8 age_time;
> + u16 val;
> + int err;
> +
> + if (msecs < min || msecs > max)
> + return -ERANGE;
> +
> + /* Round to nearest multiple of coeff */
> + age_time = (msecs + coeff / 2) / coeff;
> +
> + err = mv88e6xxx_read(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL, );
> + if (err)
> + return err;
> +
> + /* AgeTime is 11:4 bits */
> + val &= ~0xff0;
> + val |= age_time << 4;
> +
> + return mv88e6xxx_write(chip, REG_GLOBAL, GLOBAL_ATU_CONTROL, val);
> +}

Hi Vivien

This is doing a read/modify/write, so should really hold the mutex.

> + err = mv88e6xxx_g1_set_age_time(chip, 30);

Maybe use (5 * 60 * 1000) to make it more readable?

  Andrew


Re: [PATCH v2 05/11] Kbuild: don't add obj tree in additional includes

2016-07-19 Thread Kalle Valo
Arnd Bergmann  writes:

> On Monday, July 18, 2016 10:14:39 PM CEST Michal Marek wrote:
>> On Wed, Jun 15, 2016 at 05:45:47PM +0200, Arnd Bergmann wrote:
>> > When building with separate object directories and driver specific
>> > Makefiles that add additional header include paths, Kbuild adjusts
>> > the gcc flags so that we include both the directory in the source
>> > tree and in the object tree.
>> > 
>> > However, due to another bug I fixed earlier, this did not actually
>> > include the correct directory in the object tree, so we know that
>> > we only really need the source tree here. Also, including the
>> > object tree sometimes causes warnings about nonexisting directories
>> > when the include path only exists in the source.
>> > 
>> > This changes the logic to only emit the -I argument for the srctree,
>> > not for objects. We still need both $(srctree)/$(src) and $(obj)
>> > though, so I'm adding them manually.
>> > 
>> > Signed-off-by: Arnd Bergmann 
>> 
>> Hi Arnd,
>> 
>> I applied the series up to this patch to kbuild.git#kbuild. The rest
>> seem to be related but not dependent patches, so I'll leave it up to the
>> respective maintainers to pick them up. Is that OK with you?
>
> I think that's fine, a couple were already picked up, and what I have
> left now is
>
> a281bfa5713a [SUBMITTED 20160615] [EXPERIMENTAL] Kbuild: enable 
> -Wmissing-include-dirs by default
> 83934921e68e [SUBMITTED 20160615] rtlwifi: don't add include path for 
> rtl8188ee

Apparently[1] you didn't CC linux-wireless and that's why I didn't see
the rtlwifi patch in wireless patchwork. Care to resend?

[1] https://patchwork.kernel.org/patch/9178861/

-- 
Kalle Valo


Re: [PATCH net-next v3 10/12] net: dsa: support switchdev ageing time attr

2016-07-19 Thread Andrew Lunn
On Mon, Jul 18, 2016 at 09:26:00PM -0700, Florian Fainelli wrote:
> Le 18/07/2016 à 20:24, Andrew Lunn a écrit :
> > On Mon, Jul 18, 2016 at 08:45:38PM -0400, Vivien Didelot wrote:
> >> Add a new function for DSA drivers to handle the switchdev
> >> SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.
> >>
> >> The ageing time is passed as milliseconds.
> >>
> >> Also because we can have multiple logical bridges on top of a physical
> >> switch and ageing time are switch-wide, call the driver function with
> >> the fastest ageing time in use on the chip instead of the requested one.
> >>
> >> Signed-off-by: Vivien Didelot 
> >> ---
> >>  include/net/dsa.h |  2 ++
> >>  net/dsa/slave.c   | 41 +
> > 
> 
> Hi Andrew,
> 
> > Hi Florian
> > 
> > It looks like the SF2 can do fast ageing per port. What i don't see if
> > what configuration options you have. Can you get the fast and the
> > normal age time per port? Or is it global?
> 
> The normal ageing is global and the value needs to be programmed in
> seconds, can can range from 10 to 1,048,575 (encoded on 20 bits). The
> fast-ageing can actually be per-port, per-VLAN id, for just dynamic or
> static entries etc. and is just a poor name for a flush based on any of
> these criteria.

Hi Florian

So fast ageing does not have a timer value associated to it? It is
just a flush?

If so, the code Vivien is proposing in DSA slave is O.K. If however
there was a per port timer, Vivien's code is too high in the stack,
blocking SF2 from being able to use per-port timers. That is what i'm
trying to get at.

 Andrew


  1   2   >