date:20170610

RE: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-10 Thread Ilan Tayari

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jguntho...@obsidianresearch.com]
> Subject: Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova
> 


> > > This keeps getting more ugly :(
> > >
> > > What about security? What if user space sends some raw packets to the
> > > FPGA - can it reprogram the ISPEC settings or worse?
> > >
> >
> > No such thing. This QP is only for internal driver/HW communications,
> > as it is faster than the existing command interface.
> > It is not meant to be exposed for any raw user space usage at all,
> > without a proper standard API adapter of course.
> 
> I'm not asking about the QP, I'm asking what happens after the NIC
> part. You use RoCE packets to control the FPGA. What prevents
> userspace from forcibly constructing RoCE packets and sending them to
> the FPGA? How does the FPGA know for certain the packet came from the
> kernel QP and not someplace else?
> 
> This is especially true for mlx nics as there are many raw packet
> bypass mechanisms available to userspace.

Hi Jason,

The device uses internal signaling that ensures that no entity other than the 
mlx5 driver can talk over the FPGA channel.
This is also the reason why this is not a "ULP in a driver", but rather an
internal bus that happens to use some of our existing HW features.

As explained earlier, this "bus" is an internal device implementation issue, 
and has nothing to do with the network or RDMA stack.

Ilan.

Re: [PATCH net-next 6/9] net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC

2017-06-10 Thread Andrew Lunn

> > +int hclge_mac_mdio_config(struct hclge_dev *hdev)
> > +{
> > +   struct hclge_mac *mac = &hdev->hw.mac;
> > +   struct mii_bus *mdio_bus;
> > +   struct net_device *ndev = >ndev;
> > +   struct phy_device *phy;
> > +   bool is_c45;
> > +   int ret;
> > +
> > +   if (hdev->hw.mac.phy_addr >= PHY_MAX_ADDR)
> > +   return 0;
> > +
> > +   if (hdev->hw.mac.phy_if == PHY_INTERFACE_MODE_NA)
> > +   return 0;
> > +   else if (mac->phy_if == PHY_INTERFACE_MODE_SGMII)
> > +   is_c45 = 0;
> > +   else if (mac->phy_if == PHY_INTERFACE_MODE_XGMII)
> > +   is_c45 = 1;
> > +   else
> > +   return -ENODATA;
> 
> Can you consider using a switch () case statement here?

Does this concept even make sense? The Marvell 10G PHY will use SGMII
for 10/100/1000 Mbps and swap to XGMII for 10 Gbps. It however stays a
C45 device all the time.

In general, I don't think PHY mode is related to C22/C45.

   Andrew

Re: [PATCH 08/44] xen-swiotlb: implement ->mapping_error

2017-06-10 Thread Konrad Rzeszutek Wilk

On Thu, Jun 08, 2017 at 03:25:33PM +0200, Christoph Hellwig wrote:
> DMA_ERROR_CODE is going to go away, so don't rely on it.

Reviewed-by: Konrad Rzeszutek Wilk

Re: [PATCH 07/44] xen-swiotlb: consolidate xen_swiotlb_dma_ops

2017-06-10 Thread Konrad Rzeszutek Wilk

On Thu, Jun 08, 2017 at 03:25:32PM +0200, Christoph Hellwig wrote:
> ARM and x86 had duplicated versions of the dma_ops structure, the
> only difference is that x86 hasn't wired up the set_dma_mask,
> mmap, and get_sgtable ops yet.  On x86 all of them are identical
> to the generic version, so they aren't needed but harmless.
> 
> All the symbols used only for xen_swiotlb_dma_ops can now be marked
> static as well.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm/xen/mm.c  | 17 
>  arch/x86/xen/pci-swiotlb-xen.c | 14 ---
>  drivers/xen/swiotlb-xen.c  | 93 ++
>  include/xen/swiotlb-xen.h  | 62 +---
>  4 files changed, 49 insertions(+), 137 deletions(-)

Yeeey!

Reviewed-by: Konrad Rzeszutek Wilk

[PATCH net-next] bpf, arm64: take advantage of stack_depth tracking

2017-06-10 Thread Daniel Borkmann

Make use of recently implemented stack_depth tracking for arm64 JIT,
so that stack usage can be reduced heavily for programs not using
tail calls at least.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 arch/arm64/net/bpf_jit_comp.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 4f95873..73de2c7 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -69,6 +69,7 @@ struct jit_ctx {
int epilogue_offset;
int *offset;
u32 *image;
+   u32 stack_size;
 };
 
 static inline void emit(const u32 insn, struct jit_ctx *ctx)
@@ -145,16 +146,11 @@ static inline int epilogue_offset(const struct jit_ctx *ctx)
 /* Stack must be multiples of 16B */
 #define STACK_ALIGN(sz) (((sz) + 15) & ~15)
 
-#define _STACK_SIZE \
-   (MAX_BPF_STACK \
-+ 4 /* extra for skb_copy_bits buffer */)
-
-#define STACK_SIZE STACK_ALIGN(_STACK_SIZE)
-
 #define PROLOGUE_OFFSET 8
 
 static int build_prologue(struct jit_ctx *ctx)
 {
+   const struct bpf_prog *prog = ctx->prog;
const u8 r6 = bpf2a64[BPF_REG_6];
const u8 r7 = bpf2a64[BPF_REG_7];
const u8 r8 = bpf2a64[BPF_REG_8];
@@ -176,9 +172,9 @@ static int build_prologue(struct jit_ctx *ctx)
 *| |
 *| ... | BPF prog stack
 *| |
-*+-+ <= (BPF_FP - MAX_BPF_STACK)
+*+-+ <= (BPF_FP - prog->aux->stack_depth)
 *|RSVD | JIT scratchpad
-* current A64_SP =>  +-+ <= (BPF_FP - STACK_SIZE)
+* current A64_SP =>  +-+ <= (BPF_FP - ctx->stack_size)
 *| |
 *| ... | Function call stack
 *| |
@@ -202,8 +198,12 @@ static int build_prologue(struct jit_ctx *ctx)
/* Initialize tail_call_cnt */
emit(A64_MOVZ(1, tcc, 0, 0), ctx);
 
+   /* 4 byte extra for skb_copy_bits buffer */
+   ctx->stack_size = prog->aux->stack_depth + 4;
+   ctx->stack_size = STACK_ALIGN(ctx->stack_size);
+
/* Set up function call stack */
-   emit(A64_SUB_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);
+   emit(A64_SUB_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
 
cur_offset = ctx->idx - idx0;
if (cur_offset != PROLOGUE_OFFSET) {
@@ -288,7 +288,7 @@ static void build_epilogue(struct jit_ctx *ctx)
const u8 fp = bpf2a64[BPF_REG_FP];
 
/* We're done with BPF stack */
-   emit(A64_ADD_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);
+   emit(A64_ADD_I(1, A64_SP, A64_SP, ctx->stack_size), ctx);
 
/* Restore fs (x25) and x26 */
emit(A64_POP(fp, A64_R(26), A64_SP), ctx);
@@ -732,7 +732,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
return -EINVAL;
}
emit_a64_mov_i64(r3, size, ctx);
-   emit(A64_SUB_I(1, r4, fp, STACK_SIZE), ctx);
+   emit(A64_SUB_I(1, r4, fp, ctx->stack_size), ctx);
emit_a64_mov_i64(r5, (unsigned long)bpf_load_pointer, ctx);
emit(A64_BLR(r5), ctx);
emit(A64_MOV(1, r0, A64_R(0)), ctx);
-- 
1.9.3
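
For illustration, the stack size computed in build_prologue() above can be
worked through in plain C. This is a minimal user-space sketch, not kernel
code: jit_stack_size() and jit_stack_size_examples() are hypothetical names,
and MAX_BPF_STACK is assumed to be 512 as in the kernel at the time.

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as the STACK_ALIGN() macro in the patch. */
#define STACK_ALIGN(sz) (((sz) + 15) & ~15)

/* Assumption: the kernel's MAX_BPF_STACK value. */
#define MAX_BPF_STACK 512

/*
 * Hypothetical helper mirroring the two added lines in build_prologue():
 * stack_depth plus 4 bytes for the skb_copy_bits buffer, rounded up to
 * a multiple of 16 bytes.
 */
uint32_t jit_stack_size(uint32_t stack_depth)
{
	return STACK_ALIGN(stack_depth + 4);
}

/* Worked values; each assertion holds for the formula above. */
void jit_stack_size_examples(void)
{
	assert(jit_stack_size(0) == 16);              /* 4 -> 16 */
	assert(jit_stack_size(12) == 16);             /* 16 -> 16 */
	assert(jit_stack_size(13) == 32);             /* 17 -> 32 */
	assert(jit_stack_size(MAX_BPF_STACK) == 528); /* 516 -> 528 */
}
```

With the old fixed STACK_SIZE, every program paid the 528-byte case; with
stack_depth tracking, a program that uses no BPF stack needs only 16 bytes.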

[PATCHv3 net] xfrm: move xfrm_garbage_collect out of xfrm_policy_flush

2017-06-10 Thread Hangbin Liu

Currently, xfrm_policy_flush() forces a garbage collection whenever any
policy is removed. During xfrm_net_exit(), however, flow_cache_fini() is
called first and sets fc->percpu to NULL. When xfrm_policy_fini() then
calls xfrm_policy_flush() -> xfrm_garbage_collect() -> flow_cache_flush(),
we hit a NULL pointer dereference when checking flow_cache_percpu_empty().
The code path looks like:

flow_cache_fini()
  - fc->percpu = NULL
xfrm_policy_fini()
  - xfrm_policy_flush()
    - xfrm_garbage_collect()
      - flow_cache_flush()
        - flow_cache_percpu_empty()
          - fcp = per_cpu_ptr(fc->percpu, cpu)

To reproduce, configure IPsec in a netns and then remove the netns.

v2:
As Xin Long suggested, since only two other places need to call it, move
xfrm_garbage_collect() outside xfrm_policy_flush().

v3:
Fix subject mismatch after v2 fix.

Fixes: 35db06912189 ("xfrm: do the garbage collection after flushing policy")
Signed-off-by: Hangbin Liu 
---
 net/key/af_key.c   | 2 ++
 net/xfrm/xfrm_policy.c | 4 
 net/xfrm/xfrm_user.c   | 1 +
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 512dc43..5103f92 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -2755,6 +2755,8 @@ static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, const struct sad
int err, err2;
 
err = xfrm_policy_flush(net, XFRM_POLICY_TYPE_MAIN, true);
+   if (!err)
+   xfrm_garbage_collect(net);
err2 = unicast_flush_resp(sk, hdr);
if (err || err2) {
if (err == -ESRCH) /* empty table - old silent behavior */
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index ed4e52d..643a18f 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1006,10 +1006,6 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
err = -ESRCH;
 out:
spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
-
-   if (cnt)
-   xfrm_garbage_collect(net);
-
return err;
 }
 EXPORT_SYMBOL(xfrm_policy_flush);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 38614df..86116e9 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2027,6 +2027,7 @@ static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
return 0;
return err;
}
+   xfrm_garbage_collect(net);
 
c.data.type = type;
c.event = nlh->nlmsg_type;
-- 
2.5.5
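
The ordering problem described in the commit message can be modelled in
user space. The following is only a sketch under stated assumptions: every
name ending in _model or starting with model_ is hypothetical, the "cache"
is a single heap pointer rather than real per-CPU data, and the would-be
NULL dereference is reported as -1 instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct flow_cache; only the pointer matters. */
struct flow_cache_model {
	int *percpu;
};

/* Models flow_cache_fini(): release the data and clear the pointer. */
void model_flow_cache_fini(struct flow_cache_model *fc)
{
	free(fc->percpu);
	fc->percpu = NULL;
}

/*
 * Models xfrm_garbage_collect() -> flow_cache_flush(). The kernel would
 * dereference fc->percpu here; the model returns -1 where it would oops.
 */
int model_garbage_collect(const struct flow_cache_model *fc)
{
	if (fc->percpu == NULL)
		return -1;
	return 0;
}

/* Pre-patch flush: collects garbage itself whenever a policy was removed. */
int model_policy_flush_old(const struct flow_cache_model *fc, int *npolicies)
{
	int cnt = *npolicies;

	*npolicies = 0;
	return cnt ? model_garbage_collect(fc) : 0;
}

/* Post-patch flush: only removes policies; callers decide about GC. */
int model_policy_flush_new(int *npolicies)
{
	*npolicies = 0;
	return 0;
}
```

In this model the netns teardown path is "fini, then flush": the old flush
reaches the cleared pointer, while the new flush never touches the cache.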

Re: [PATCHv2 net] net/flow: fix fc->percpu NULL pointer dereference

2017-06-10 Thread Hangbin Liu

On Sat, Jun 10, 2017 at 04:29:23PM +0800, Xin Long wrote:
> It's an xfrm fix, pls also fix the title, like:
>    xfrm: move xfrm_garbage_collect out of xfrm_policy_flush
> or
>    xfrm: fix ...
Oops, sorry, I forgot that.

Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-10 Thread Nikolay Aleksandrov

On 10/06/17 23:30, Donald Sharp wrote:
> This patch fixes two issues:
> 
> 1) When forwarding on *,G mroutes that are in a vrf, the
> kernel was dropping information about the actual incoming
> interface when calling ip_mr_forward from ip_mr_input.
> This caused ip_mr_forward to send the multicast packet
> back out the incoming interface.  Fix this by
> modifying ip_mr_forward to be handed the correctly
> resolved dev.
> 
> 2) When an unresolved cache entry is created we store
> the incoming skb on the unresolved cache entry and
> upon mroute resolution from the user space daemon,
> we attempt to forward the packet.  Again we were
> not resolving to the correct incoming device for
> a vrf scenario, before calling ip_mr_forward.
> Fix this by resolving to the correct interface
> and calling ip_mr_forward with the result.
> 
> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
> Signed-off-by: Donald Sharp 
> ---
> v2: Fixed title
> v3: Addressed Review comments by Andrew Lunn and David Ahern
> 

Looks good, thanks!

Acked-by: Nikolay Aleksandrov

Re: [Intel-wired-lan] [PATCH] i40evf: remove redundant null check on key

2017-06-10 Thread Alexander Duyck

On Sat, Jun 10, 2017 at 4:33 AM, Dan Carpenter wrote:
>
> This patch isn't right...
>
> On Wed, Jun 07, 2017 at 12:54:07AM +0100, Colin King wrote:
>> From: Colin Ian King 
>>
>> key has previously been null checked so the subsequent null check
>> is redundant as key can never be null at that point, so remove it.
>>
>
> Actually, it's the reverse.  "key" is always NULL.  Probably the ||
> should be a &&?
>
> regards,
> dan carpenter

Actually the original code and the patched version are still both
broken, but it is more broken with the patch. With this change I am
pretty sure we will kernel panic if we use the ethtool ioctl for
ETHTOOL_SRXFHINDIR, or don't update the key when updating other fields
in the flow hash.

So the original logic here looks like a bad copy of code from igb.
There it doesn't support updating the key so if key is set we are
supposed to be returning an error since key update isn't currently
supported. So the check for key at the start of this function should
probably be dropped instead of the second check. From what I can tell
the original code prevents key from ever being updated since if key is
non-null it means we want to update the key.

- Alex

Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-10 Thread Thomas Winter

I don't think we've seen this issue, but the patch looks good.

From: David Ahern 
Sent: 11 June 2017 11:33
To: David Miller; sha...@cumulusnetworks.com
Cc: netdev@vger.kernel.org; Thomas Winter; niko...@cumulusnetworks.com; 
yot...@mellanox.com; ido...@mellanox.com; ro...@cumulusnetworks.com
Subject: Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

On 6/10/17 5:07 PM, David Miller wrote:
> From: Donald Sharp 
> Date: Sat, 10 Jun 2017 16:30:17 -0400
>
>> This patch fixes two issues:
>>
>> 1) When forwarding on *,G mroutes that are in a vrf, the
>> kernel was dropping information about the actual incoming
>> interface when calling ip_mr_forward from ip_mr_input.
>> This caused ip_mr_forward to send the multicast packet
>> back out the incoming interface.  Fix this by
>> modifying ip_mr_forward to be handed the correctly
>> resolved dev.
>>
>> 2) When an unresolved cache entry is created we store
>> the incoming skb on the unresolved cache entry and
>> upon mroute resolution from the user space daemon,
>> we attempt to forward the packet.  Again we were
>> not resolving to the correct incoming device for
>> a vrf scenario, before calling ip_mr_forward.
>> Fix this by resolving to the correct interface
>> and calling ip_mr_forward with the result.
>>
>> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
>> Signed-off-by: Donald Sharp 
>
> David, please review.
>

Responded. Would be good for the Mellanox team (and Thomas) to chime in
as well.

Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-10 Thread David Ahern

On 6/10/17 5:07 PM, David Miller wrote:
> From: Donald Sharp 
> Date: Sat, 10 Jun 2017 16:30:17 -0400
> 
>> This patch fixes two issues:
>>
>> 1) When forwarding on *,G mroutes that are in a vrf, the
>> kernel was dropping information about the actual incoming
>> interface when calling ip_mr_forward from ip_mr_input.
>> This caused ip_mr_forward to send the multicast packet
>> back out the incoming interface.  Fix this by
>> modifying ip_mr_forward to be handed the correctly
>> resolved dev.
>>
>> 2) When an unresolved cache entry is created we store
>> the incoming skb on the unresolved cache entry and
>> upon mroute resolution from the user space daemon,
>> we attempt to forward the packet.  Again we were
>> not resolving to the correct incoming device for
>> a vrf scenario, before calling ip_mr_forward.
>> Fix this by resolving to the correct interface
>> and calling ip_mr_forward with the result.
>>
>> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
>> Signed-off-by: Donald Sharp 
> 
> David, please review.
> 

Responded. Would be good for the Mellanox team (and Thomas) to chime in
as well.

Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-10 Thread David Ahern

On 6/10/17 2:30 PM, Donald Sharp wrote:
> This patch fixes two issues:
> 
> 1) When forwarding on *,G mroutes that are in a vrf, the
> kernel was dropping information about the actual incoming
> interface when calling ip_mr_forward from ip_mr_input.
> This caused ip_mr_forward to send the multicast packet
> back out the incoming interface.  Fix this by
> modifying ip_mr_forward to be handed the correctly
> resolved dev.
> 
> 2) When an unresolved cache entry is created we store
> the incoming skb on the unresolved cache entry and
> upon mroute resolution from the user space daemon,
> we attempt to forward the packet.  Again we were
> not resolving to the correct incoming device for
> a vrf scenario, before calling ip_mr_forward.
> Fix this by resolving to the correct interface
> and calling ip_mr_forward with the result.
> 
> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
> Signed-off-by: Donald Sharp 
> ---
> v2: Fixed title
> v3: Addressed Review comments by Andrew Lunn and David Ahern
> 

LGTM.

Acked-by: David Ahern

Re: [PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-10 Thread David Miller

From: Donald Sharp 
Date: Sat, 10 Jun 2017 16:30:17 -0400

> This patch fixes two issues:
> 
> 1) When forwarding on *,G mroutes that are in a vrf, the
> kernel was dropping information about the actual incoming
> interface when calling ip_mr_forward from ip_mr_input.
> This caused ip_mr_forward to send the multicast packet
> back out the incoming interface.  Fix this by
> modifying ip_mr_forward to be handed the correctly
> resolved dev.
> 
> 2) When an unresolved cache entry is created we store
> the incoming skb on the unresolved cache entry and
> upon mroute resolution from the user space daemon,
> we attempt to forward the packet.  Again we were
> not resolving to the correct incoming device for
> a vrf scenario, before calling ip_mr_forward.
> Fix this by resolving to the correct interface
> and calling ip_mr_forward with the result.
> 
> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
> Signed-off-by: Donald Sharp 

David, please review.

Re: [PATCH net-next 0/8] Misc BPF updates

2017-06-10 Thread David Miller

From: Daniel Borkmann 
Date: Sun, 11 Jun 2017 00:50:39 +0200

> This set contains a couple of misc updates: stack usage reduction
> for perf_sample_data in tracing progs, reduction of stale data in
> verifier on register state transitions that I still had in my queue
> and few selftest improvements as well as bpf_set_hash() helper for
> tc programs.

Series applied, thanks Daniel.

Re: [PATCH net-next 1/1] net: reflect mark on tcp syn ack packets

2017-06-10 Thread David Miller

From: Jamal Hadi Salim 
Date: Sat, 10 Jun 2017 09:31:01 -0400

> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 7a3fd25..a8fd5f0 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -173,7 +173,8 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
>   }
>  
>   skb->priority = sk->sk_priority;
> - skb->mark = sk->sk_mark;
> + if (!skb->mark)
> + skb->mark = sk->sk_mark;

Maybe this should both be "inet_request_mark()"?

Also, Lorenzo, please review.

Re: [PATCH v2 2/2] tcp: md5: extend the tcp_md5sig struct to specify a key address prefix

2017-06-10 Thread David Miller

From: Ivan Delalande 
Date: Fri,  9 Jun 2017 19:14:49 -0700

> Add a flag field and address prefix length at the end of the tcp_md5sig
> structure so users can configure an address prefix length along with a
> key. Make sure shorter option values are still accepted in
> tcp_v4_parse_md5_keys and tcp_v6_parse_md5_keys to maintain backward
> compatibility.
> 
> Signed-off-by: Bob Gilligan 
> Signed-off-by: Eric Mowat 
> Signed-off-by: Ivan Delalande 

As I believe was previously stated, the problem with this approach is
that if a new tool requests the prefix length and is run on an older
kernel, the kernel will return success even though the prefix length
was not taken into account.

We do not want to get a success back when the operation requested was
not performed.
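
The failure mode described here can be sketched in a few lines of C. This
is a simplified model, not the real ABI: both struct layouts, the function
model_old_kernel_parse() and MODEL_FULL_PREFIXLEN are hypothetical, and it
assumes an older parser that only rejects options shorter than the
structure it knows and ignores any trailing bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified layouts; not the real struct tcp_md5sig. */
struct md5sig_old_model {
	unsigned char key[16];
};

struct md5sig_new_model {
	unsigned char key[16];
	unsigned char flags;     /* appended by the newer ABI */
	unsigned char prefixlen; /* appended by the newer ABI */
};

/* Full-length IPv4 match, the only behavior the old parser knows. */
#define MODEL_FULL_PREFIXLEN 32

/*
 * Models an older kernel: accepts any option at least as long as the
 * structure it knows, copies only that much, and reports success.
 */
int model_old_kernel_parse(const void *optval, size_t optlen,
			   int *applied_prefixlen)
{
	struct md5sig_old_model cmd;

	if (optlen < sizeof(cmd))
		return -1;
	memcpy(&cmd, optval, sizeof(cmd)); /* trailing bytes are never read */
	*applied_prefixlen = MODEL_FULL_PREFIXLEN;
	return 0;
}
```

A caller that passes the longer structure with prefixlen set to 24 gets 0
back from this model, yet the key is installed as a full-length match, which
is the silent success being objected to.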

[PATCH net-next 4/8] bpf: reset id on spilled regs in clear_all_pkt_pointers

2017-06-10 Thread Daniel Borkmann

Right now, we don't reset the id of spilled registers in case of
clear_all_pkt_pointers(). Given pkt_pointers are highly likely to
contain an id, do so by reusing __mark_reg_unknown_value().

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d195d82..519a614 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1346,8 +1346,8 @@ static void clear_all_pkt_pointers(struct bpf_verifier_env *env)
if (reg->type != PTR_TO_PACKET &&
reg->type != PTR_TO_PACKET_END)
continue;
-   reg->type = UNKNOWN_VALUE;
-   reg->imm = 0;
+   __mark_reg_unknown_value(state->spilled_regs,
+i / BPF_REG_SIZE);
}
 }
 
-- 
1.9.3

[PATCH net-next 8/8] bpf: add bpf_set_hash helper for tc progs

2017-06-10 Thread Daniel Borkmann

Allow for tc BPF programs to set a skb->hash, apart from clearing
and triggering a recalc that we have right now. It allows for BPF
to implement a custom hashing routine for skb_get_hash().

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h   |  8 +++-
 net/core/filter.c  | 20 
 tools/include/uapi/linux/bpf.h |  8 +++-
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9b2c10b..f94b48b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -513,6 +513,11 @@ enum bpf_attach_type {
  * Get the owner uid of the socket stored inside sk_buff.
  * @skb: pointer to skb
  * Return: uid of the socket owner on success or overflowuid if failed.
+ *
+ * u32 bpf_set_hash(skb, hash)
+ * Set full skb->hash.
+ * @skb: pointer to skb
+ * @hash: hash to set
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -562,7 +567,8 @@ enum bpf_attach_type {
FN(xdp_adjust_head),\
FN(probe_read_str), \
FN(get_socket_cookie),  \
-   FN(get_socket_uid),
+   FN(get_socket_uid), \
+   FN(set_hash),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 4867391..a65a3b2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1874,6 +1874,24 @@ int skb_do_redirect(struct sk_buff *skb)
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_2(bpf_set_hash, struct sk_buff *, skb, u32, hash)
+{
+   /* Set user specified hash as L4(+), so that it gets returned
+* on skb_get_hash() call unless BPF prog later on triggers a
+* skb_clear_hash().
+*/
+   __skb_set_sw_hash(skb, hash, true);
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_set_hash_proto = {
+   .func   = bpf_set_hash,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+};
+
 BPF_CALL_3(bpf_skb_vlan_push, struct sk_buff *, skb, __be16, vlan_proto,
   u16, vlan_tci)
 {
@@ -2744,6 +2762,8 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
return &bpf_get_hash_recalc_proto;
case BPF_FUNC_set_hash_invalid:
return &bpf_set_hash_invalid_proto;
+   case BPF_FUNC_set_hash:
+   return &bpf_set_hash_proto;
case BPF_FUNC_perf_event_output:
return &bpf_skb_event_output_proto;
case BPF_FUNC_get_smp_processor_id:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9b2c10b..f94b48b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -513,6 +513,11 @@ enum bpf_attach_type {
  * Get the owner uid of the socket stored inside sk_buff.
  * @skb: pointer to skb
  * Return: uid of the socket owner on success or overflowuid if failed.
+ *
+ * u32 bpf_set_hash(skb, hash)
+ * Set full skb->hash.
+ * @skb: pointer to skb
+ * @hash: hash to set
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -562,7 +567,8 @@ enum bpf_attach_type {
FN(xdp_adjust_head),\
FN(probe_read_str), \
FN(get_socket_cookie),  \
-   FN(get_socket_uid),
+   FN(get_socket_uid), \
+   FN(set_hash),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
1.9.3

[PATCH net-next 3/8] bpf: reset id on CONST_IMM transition

2017-06-10 Thread Daniel Borkmann

Whenever we set the register to the type CONST_IMM, we currently don't
reset the id to 0. id member is not used in CONST_IMM case, so don't
let it become stale, where pruning won't be able to match later on.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d031b3b..d195d82 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1952,6 +1952,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 */
regs[insn->dst_reg].type = CONST_IMM;
regs[insn->dst_reg].imm = insn->imm;
+   regs[insn->dst_reg].id = 0;
regs[insn->dst_reg].max_value = insn->imm;
regs[insn->dst_reg].min_value = insn->imm;
regs[insn->dst_reg].min_align = calc_align(insn->imm);
@@ -2409,6 +2410,7 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 
regs[insn->dst_reg].type = CONST_IMM;
regs[insn->dst_reg].imm = imm;
+   regs[insn->dst_reg].id = 0;
return 0;
}
 
-- 
1.9.3
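
Why a stale id hurts pruning can be shown with a small user-space model.
This is a sketch under assumptions: struct reg_state_model is a hypothetical
three-field stand-in for the verifier's register state, and the comparison
is assumed to be a byte-wise memcmp() of the two states.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced register state: type, id and immediate only. */
struct reg_state_model {
	int type;
	int id;
	long long imm;
};

enum { MODEL_CONST_IMM = 1 };

/* Byte-wise comparison, as assumed for state pruning in this model. */
int model_regs_equal(const struct reg_state_model *a,
		     const struct reg_state_model *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/*
 * Transition a register to CONST_IMM. With reset_id == 0 this models the
 * old behavior (id left untouched); with reset_id != 0 it models the patch.
 */
void model_mark_const_imm(struct reg_state_model *reg, long long imm,
			  int reset_id)
{
	reg->type = MODEL_CONST_IMM;
	reg->imm = imm;
	if (reset_id)
		reg->id = 0;
}
```

Two registers holding the same constant compare as different in this model
when one of them still carries a leftover id, and as equal once the id is
cleared on the transition.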

[PATCH net-next 0/8] Misc BPF updates

2017-06-10 Thread Daniel Borkmann

This set contains a couple of misc updates: stack usage reduction
for perf_sample_data in tracing progs, reduction of stale data in
verifier on register state transitions that I still had in my queue
and few selftest improvements as well as bpf_set_hash() helper for
tc programs.

Thanks!

Daniel Borkmann (8):
  bpf: avoid excessive stack usage for perf_sample_data
  bpf: don't check spilled reg state for non-STACK_SPILLed type slots
  bpf: reset id on CONST_IMM transition
  bpf: reset id on spilled regs in clear_all_pkt_pointers
  bpf, tests: add a test for htab lookup + update traversal
  bpf, tests: set rlimit also for test_align, so it doesn't fail
  bpf: remove cg_skb_func_proto and use sk_filter_func_proto directly
  bpf: add bpf_set_hash helper for tc progs

 include/uapi/linux/bpf.h |  8 -
 kernel/bpf/verifier.c|  8 +++--
 kernel/trace/bpf_trace.c | 10 ---
 net/core/filter.c| 28 +-
 tools/include/uapi/linux/bpf.h   |  8 -
 tools/testing/selftests/bpf/test_align.c |  5 
 tools/testing/selftests/bpf/test_maps.c  | 50 
 7 files changed, 102 insertions(+), 15 deletions(-)

-- 
1.9.3

[PATCH net-next 5/8] bpf, tests: add a test for htab lookup + update traversal

2017-06-10 Thread Daniel Borkmann

Add a test case to track behaviour when traversing and updating the
htab map. We recently used such traversal, so it's quite useful to
keep it as an example in selftests.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_maps.c | 50 +
 1 file changed, 50 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 9331452..79601c8 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -239,6 +239,54 @@ static void test_hashmap_percpu(int task, void *data)
close(fd);
 }
 
+static void test_hashmap_walk(int task, void *data)
+{
+   int fd, i, max_entries = 10;
+   long long key, value, next_key;
+   bool next_key_valid = true;
+
+   fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
+   max_entries, map_flags);
+   if (fd < 0) {
+   printf("Failed to create hashmap '%s'!\n", strerror(errno));
+   exit(1);
+   }
+
+   for (i = 0; i < max_entries; i++) {
+   key = i; value = key;
+   assert(bpf_map_update_elem(fd, &key, &value, BPF_NOEXIST) == 0);
+   }
+
+   for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
+                                    &next_key) == 0; i++) {
+   key = next_key;
+   assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
+   }
+
+   assert(i == max_entries);
+
+   assert(bpf_map_get_next_key(fd, NULL, &key) == 0);
+   for (i = 0; next_key_valid; i++) {
+   next_key_valid = bpf_map_get_next_key(fd, &key, &next_key) == 0;
+   assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
+   value++;
+   assert(bpf_map_update_elem(fd, &key, &value, BPF_EXIST) == 0);
+   key = next_key;
+   }
+
+   assert(i == max_entries);
+
+   for (i = 0; bpf_map_get_next_key(fd, !i ? NULL : &key,
+                                    &next_key) == 0; i++) {
+   key = next_key;
+   assert(bpf_map_lookup_elem(fd, &key, &value) == 0);
+   assert(value - 1 == key);
+   }
+
+   assert(i == max_entries);
+   close(fd);
+}
+
 static void test_arraymap(int task, void *data)
 {
int key, next_key, fd;
@@ -464,6 +512,7 @@ static void test_map_stress(void)
run_parallel(100, test_hashmap, NULL);
run_parallel(100, test_hashmap_percpu, NULL);
run_parallel(100, test_hashmap_sizes, NULL);
+   run_parallel(100, test_hashmap_walk, NULL);
 
run_parallel(100, test_arraymap, NULL);
run_parallel(100, test_arraymap_percpu, NULL);
@@ -549,6 +598,7 @@ static void run_all_tests(void)
 {
test_hashmap(0, NULL);
test_hashmap_percpu(0, NULL);
+   test_hashmap_walk(0, NULL);
 
test_arraymap(0, NULL);
test_arraymap_percpu(0, NULL);
-- 
1.9.3

[PATCH net-next 2/8] bpf: don't check spilled reg state for non-STACK_SPILLed type slots

2017-06-10 Thread Daniel Borkmann

spilled_regs[] state is only used for stack slots of type STACK_SPILL,
never for STACK_MISC. Right now, in states_equal(), even if we have
old and current stack state of type STACK_MISC, we compare spilled_regs[]
for that particular offset. Just skip these like we do everywhere else.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 14ccb07..d031b3b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2828,6 +2828,8 @@ static bool states_equal(struct bpf_verifier_env *env,
return false;
if (i % BPF_REG_SIZE)
continue;
+   if (old->stack_slot_type[i] != STACK_SPILL)
+   continue;
if (memcmp(&old->spilled_regs[i / BPF_REG_SIZE],
   &cur->spilled_regs[i / BPF_REG_SIZE],
   sizeof(old->spilled_regs[0])))
-- 
1.9.3
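
The effect of the added check can be illustrated with a reduced model of
the stack comparison. This is a sketch, not the verifier code: all
MODEL_-prefixed names and struct stack_state_model are hypothetical, the
stack has only two 8-byte slots, and a spilled register is reduced to a
single integer.

```c
#include <assert.h>
#include <string.h>

#define MODEL_REG_SIZE 8
#define MODEL_NSLOTS   2

enum model_slot_type {
	MODEL_STACK_INVALID,
	MODEL_STACK_SPILL,
	MODEL_STACK_MISC,
};

/* Hypothetical, reduced stack state: per-byte type, per-slot spill. */
struct stack_state_model {
	unsigned char slot_type[MODEL_NSLOTS * MODEL_REG_SIZE];
	long long spilled_regs[MODEL_NSLOTS];
};

/* Mark every byte with one slot type and zero the spill area. */
void model_stack_init(struct stack_state_model *s, unsigned char type)
{
	memset(s->slot_type, type, sizeof(s->slot_type));
	memset(s->spilled_regs, 0, sizeof(s->spilled_regs));
}

/* Compare two stacks; spill contents only count for STACK_SPILL slots. */
int model_stacks_equal(const struct stack_state_model *old,
		       const struct stack_state_model *cur)
{
	int i;

	for (i = 0; i < MODEL_NSLOTS * MODEL_REG_SIZE; i++) {
		if (old->slot_type[i] != cur->slot_type[i])
			return 0;
		if (i % MODEL_REG_SIZE)
			continue;
		if (old->slot_type[i] != MODEL_STACK_SPILL)
			continue; /* the check added by the patch */
		if (old->spilled_regs[i / MODEL_REG_SIZE] !=
		    cur->spilled_regs[i / MODEL_REG_SIZE])
			return 0;
	}
	return 1;
}
```

With the extra check, leftover spill data under a STACK_MISC slot no longer
makes two otherwise identical states look different.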

[PATCH net-next 7/8] bpf: remove cg_skb_func_proto and use sk_filter_func_proto directly

2017-06-10 Thread Daniel Borkmann

Since cg_skb_func_proto() doesn't do anything else than just calling
into sk_filter_func_proto(), remove it and set sk_filter_func_proto()
directly for .get_func_proto callback.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 946f758..4867391 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2775,12 +2775,6 @@ static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
 }
 
 static const struct bpf_func_proto *
-cg_skb_func_proto(enum bpf_func_id func_id)
-{
-   return sk_filter_func_proto(func_id);
-}
-
-static const struct bpf_func_proto *
 lwt_inout_func_proto(enum bpf_func_id func_id)
 {
switch (func_id) {
@@ -3344,7 +3338,7 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 };
 
 const struct bpf_verifier_ops cg_skb_prog_ops = {
-   .get_func_proto = cg_skb_func_proto,
+   .get_func_proto = sk_filter_func_proto,
.is_valid_access= sk_filter_is_valid_access,
.convert_ctx_access = bpf_convert_ctx_access,
.test_run   = bpf_prog_test_run_skb,
-- 
1.9.3

[PATCH net-next 1/8] bpf: avoid excessive stack usage for perf_sample_data

2017-06-10 Thread Daniel Borkmann

perf_sample_data consumes 386 bytes on stack; reduce the excessive stack
usage by moving it to a per-CPU buffer. This is allowed because preemption
is disabled for tracing, xdp and tc programs, so at all times only one
program can run on a specific CPU, and programs cannot run from
interrupt. We similarly also handle bpf_pt_regs.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/trace/bpf_trace.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 08eb072..051d7fc 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -266,14 +266,16 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
.arg2_type  = ARG_ANYTHING,
 };
 
+static DEFINE_PER_CPU(struct perf_sample_data, bpf_sd);
+
 static __always_inline u64
 __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map,
u64 flags, struct perf_raw_record *raw)
 {
struct bpf_array *array = container_of(map, struct bpf_array, map);
+   struct perf_sample_data *sd = this_cpu_ptr(&bpf_sd);
unsigned int cpu = smp_processor_id();
u64 index = flags & BPF_F_INDEX_MASK;
-   struct perf_sample_data sample_data;
struct bpf_event_entry *ee;
struct perf_event *event;
 
@@ -294,9 +296,9 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
if (unlikely(event->oncpu != cpu))
return -EOPNOTSUPP;
 
-   perf_sample_data_init(&sample_data, 0, 0);
-   sample_data.raw = raw;
-   perf_event_output(event, &sample_data, regs);
+   perf_sample_data_init(sd, 0, 0);
+   sd->raw = raw;
+   perf_event_output(event, sd, regs);
return 0;
 }
 
-- 
1.9.3

[PATCH net-next 6/8] bpf, tests: set rlimit also for test_align, so it doesn't fail

2017-06-10 Thread Daniel Borkmann

When running all the tests, through 'make run_tests', I had
test_align failing due to insufficient rlimit. Set it the same
way as all other test cases from BPF selftests do, so that
test case properly loads everything.

  [...]
  Summary: 7 PASSED, 1 FAILED
  selftests: test_progs [PASS]
  /home/foo/net-next/tools/testing/selftests/bpf
  Test   0: mov ... Failed to load program.
  FAIL
  Test   1: shift ... Failed to load program.
  FAIL
  Test   2: addsub ... Failed to load program.
  FAIL
  Test   3: mul ... Failed to load program.
  FAIL
  Test   4: unknown shift ... Failed to load program.
  FAIL
  Test   5: unknown mul ... Failed to load program.
  FAIL
  Test   6: packet const offset ... Failed to load program.
  FAIL
  Test   7: packet variable offset ... Failed to load program.
  FAIL
  Results: 0 pass 8 fail
  selftests: test_align [PASS]
  [...]

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 tools/testing/selftests/bpf/test_align.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_align.c 
b/tools/testing/selftests/bpf/test_align.c
index 9644d4e..1426594 100644
--- a/tools/testing/selftests/bpf/test_align.c
+++ b/tools/testing/selftests/bpf/test_align.c
@@ -9,6 +9,8 @@
 #include 
 #include 
 
+#include <sys/resource.h>
+
 #include 
 #include 
 #include 
@@ -432,6 +434,9 @@ static int do_test(unsigned int from, unsigned int to)
 int main(int argc, char **argv)
 {
unsigned int from = 0, to = ARRAY_SIZE(tests);
+   struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
+
+   setrlimit(RLIMIT_MEMLOCK, &rinf);
 
if (argc == 3) {
unsigned int l = atoi(argv[argc - 2]);
-- 
1.9.3
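
As a standalone illustration of the same idea, the helper below lifts RLIMIT_MEMLOCK before any BPF program or map would be loaded. It is a sketch, not the selftest code: bump_memlock_rlimit() is a hypothetical name, and the fallback for unprivileged callers (raising only the soft limit up to the hard limit) is an addition of this example that the patch does not contain.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/*
 * Try to lift RLIMIT_MEMLOCK to infinity, as the patch does. Raising the
 * hard limit needs privileges, so on EPERM fall back to raising the soft
 * limit up to the current hard limit.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int bump_memlock_rlimit(void)
{
	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
	struct rlimit cur;

	if (setrlimit(RLIMIT_MEMLOCK, &rinf) == 0)
		return 0;
	if (errno != EPERM)
		return -1;

	/* Unprivileged: the best we can do is soft limit == hard limit. */
	if (getrlimit(RLIMIT_MEMLOCK, &cur) != 0)
		return -1;
	cur.rlim_cur = cur.rlim_max;
	return setrlimit(RLIMIT_MEMLOCK, &cur);
}
```

A test program would call this once at the top of main() and could report a failure instead of silently continuing.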

Re: [PATCH v2 0/2] net: mvpp2: driver fixes

2017-06-10 Thread David Miller

From: Thomas Petazzoni 
Date: Sat, 10 Jun 2017 23:18:20 +0200

> As requested, here is a series of patches containing only bug fixes
> for the mvpp2 driver. It is based on the latest "net" branch.
> 
> Changes since v1:
> 
>  - Fixed a build breakage that occurred when only PATCH 1 was applied,
>    and not the later patches in the series. It was reported by kbuild
>    on the first submission.
> 
>  - Added Tested-by from Marc Zyngier on PATCH 2.

Series applied, thank you.

Re: [PATCH] net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx

2017-06-10 Thread David Miller

From: Jia-Ju Bai 
Date: Sat, 10 Jun 2017 16:49:39 +0800

> The kernel may sleep under a rcu read lock in cfpkt_create_pfx, and the
> function call path is:
> cfcnfg_linkup_rsp (acquire the lock by rcu_read_lock)
>   cfctrl_linkdown_req
> cfpkt_create
>   cfpkt_create_pfx
> alloc_skb(GFP_KERNEL) --> may sleep
> cfserl_receive (acquire the lock by rcu_read_lock)
>   cfpkt_split
> cfpkt_create_pfx
>   alloc_skb(GFP_KERNEL) --> may sleep
> 
> cfpkt_create_pfx checks "in_interrupt" to decide whether to use "GFP_KERNEL"
> or "GFP_ATOMIC". In this situation, "GFP_KERNEL" is used because the function
> is called under an RCU read lock rather than in interrupt context.
> 
> To fix it, only "GFP_ATOMIC" is used in cfpkt_create_pfx.
> 
> Signed-off-by: Jia-Ju Bai 

Applied and queued up for -stable.
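
The reasoning in the quoted changelog can be captured in a small userspace model. Everything here is hypothetical naming for illustration (enum alloc_mode, struct exec_ctx, pick_mode_old(), pick_mode_fixed()); the model only shows why a check based on interrupt context alone misses the RCU read-side case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GFP_KERNEL / GFP_ATOMIC. */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

/* The two reasons, in this model, why the caller must not sleep. */
struct exec_ctx {
	bool in_interrupt;     /* what in_interrupt() would report */
	bool rcu_read_locked;  /* inside an RCU read-side critical section */
};

/* Old policy: only interrupt context selects the atomic mode. */
static enum alloc_mode pick_mode_old(const struct exec_ctx *ctx)
{
	return ctx->in_interrupt ? ALLOC_ATOMIC : ALLOC_MAY_SLEEP;
}

/* Fixed policy: always atomic, whatever the calling context is. */
static enum alloc_mode pick_mode_fixed(const struct exec_ctx *ctx)
{
	(void)ctx;
	return ALLOC_ATOMIC;
}

/* True when a sleeping allocation would happen in a no-sleep context. */
static bool sleeps_in_atomic(const struct exec_ctx *ctx, enum alloc_mode mode)
{
	return mode == ALLOC_MAY_SLEEP &&
	       (ctx->in_interrupt || ctx->rcu_read_locked);
}
```

With rcu_read_locked set and in_interrupt clear, the old policy picks the sleeping mode, which is exactly the reported bug.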

Re: [PATCH] net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse

2017-06-10 Thread David Miller

From: Jia-Ju Bai 
Date: Sat, 10 Jun 2017 17:03:35 +0800

> The kernel may sleep under a rcu read lock in tipc_msg_reverse, and the
> function call path is:
> tipc_l2_rcv_msg (acquire the lock by rcu_read_lock)
>   tipc_rcv
> tipc_sk_rcv
>   tipc_msg_reverse
> pskb_expand_head(GFP_KERNEL) --> may sleep
> tipc_node_broadcast
>   tipc_node_xmit_skb
> tipc_node_xmit
>   tipc_sk_rcv
> tipc_msg_reverse
>   pskb_expand_head(GFP_KERNEL) --> may sleep
> 
> To fix it, "GFP_KERNEL" is replaced with "GFP_ATOMIC".
> 
> Signed-off-by: Jia-Ju Bai 

Applied and queued up for -stable.

[PATCH v2 0/2] net: mvpp2: driver fixes

2017-06-10 Thread Thomas Petazzoni

Hello,

As requested, here is a series of patches containing only bug fixes
for the mvpp2 driver. It is based on the latest "net" branch.

Changes since v1:

 - Fixed a build breakage that occurred when only PATCH 1 was applied,
   and not the later patches in the series. It was reported by kbuild
   on the first submission.

 - Added Tested-by from Marc Zyngier on PATCH 2.

Thanks!

Thomas

Thomas Petazzoni (2):
  net: mvpp2: remove mvpp2_bm_cookie_{build,pool_get}
  net: mvpp2: use {get,put}_cpu() instead of smp_processor_id()

 drivers/net/ethernet/marvell/mvpp2.c | 74 
 1 file changed, 33 insertions(+), 41 deletions(-)

-- 
2.9.4

[PATCH v2 2/2] net: mvpp2: use {get,put}_cpu() instead of smp_processor_id()

2017-06-10 Thread Thomas Petazzoni

smp_processor_id() should not be used in migration-enabled contexts. We
originally thought it was OK in the specific situation of this driver,
but it was wrong, and calling smp_processor_id() in a migration-enabled
context prints a big fat warning when CONFIG_DEBUG_PREEMPT=y.

Therefore, this commit replaces the smp_processor_id() in
migration-enabled contexts by the appropriate get_cpu/put_cpu sections.

Reported-by: Marc Zyngier 
Fixes: a786841df72e ("net: mvpp2: handle register mapping and access for 
PPv2.2")
Signed-off-by: Thomas Petazzoni 
Tested-by: Marc Zyngier 
---
 drivers/net/ethernet/marvell/mvpp2.c | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index 5841e53..33c9016 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -3719,7 +3719,7 @@ static void mvpp2_bm_bufs_get_addrs(struct device *dev, 
struct mvpp2 *priv,
dma_addr_t *dma_addr,
phys_addr_t *phys_addr)
 {
-   int cpu = smp_processor_id();
+   int cpu = get_cpu();
 
*dma_addr = mvpp2_percpu_read(priv, cpu,
  MVPP2_BM_PHY_ALLOC_REG(bm_pool->id));
@@ -3740,6 +3740,8 @@ static void mvpp2_bm_bufs_get_addrs(struct device *dev, 
struct mvpp2 *priv,
if (sizeof(phys_addr_t) == 8)
*phys_addr |= (u64)phys_addr_highbits << 32;
}
+
+   put_cpu();
 }
 
 /* Free all buffers from the pool */
@@ -3925,7 +3927,7 @@ static inline void mvpp2_bm_pool_put(struct mvpp2_port 
*port, int pool,
 dma_addr_t buf_dma_addr,
 phys_addr_t buf_phys_addr)
 {
-   int cpu = smp_processor_id();
+   int cpu = get_cpu();
 
if (port->priv->hw_version == MVPP22) {
u32 val = 0;
@@ -3952,6 +3954,8 @@ static inline void mvpp2_bm_pool_put(struct mvpp2_port 
*port, int pool,
   MVPP2_BM_VIRT_RLS_REG, buf_phys_addr);
mvpp2_percpu_write(port->priv, cpu,
   MVPP2_BM_PHY_RLS_REG(pool), buf_dma_addr);
+
+   put_cpu();
 }
 
 /* Refill BM pool */
@@ -4732,7 +4736,7 @@ static void mvpp2_txp_max_tx_size_set(struct mvpp2_port 
*port)
 static void mvpp2_rx_pkts_coal_set(struct mvpp2_port *port,
   struct mvpp2_rx_queue *rxq)
 {
-   int cpu = smp_processor_id();
+   int cpu = get_cpu();
 
if (rxq->pkts_coal > MVPP2_OCCUPIED_THRESH_MASK)
rxq->pkts_coal = MVPP2_OCCUPIED_THRESH_MASK;
@@ -4740,6 +4744,8 @@ static void mvpp2_rx_pkts_coal_set(struct mvpp2_port 
*port,
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_NUM_REG, rxq->id);
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_THRESH_REG,
   rxq->pkts_coal);
+
+   put_cpu();
 }
 
 static u32 mvpp2_usec_to_cycles(u32 usec, unsigned long clk_hz)
@@ -4920,7 +4926,7 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
mvpp2_write(port->priv, MVPP2_RXQ_STATUS_REG(rxq->id), 0);
 
/* Set Rx descriptors queue starting address - indirect access */
-   cpu = smp_processor_id();
+   cpu = get_cpu();
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_NUM_REG, rxq->id);
if (port->priv->hw_version == MVPP21)
rxq_dma = rxq->descs_dma;
@@ -4929,6 +4935,7 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_DESC_ADDR_REG, rxq_dma);
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_DESC_SIZE_REG, rxq->size);
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_INDEX_REG, 0);
+   put_cpu();
 
/* Set Offset */
mvpp2_rxq_offset_set(port, rxq->id, NET_SKB_PAD);
@@ -4991,10 +4998,11 @@ static void mvpp2_rxq_deinit(struct mvpp2_port *port,
 * free descriptor number
 */
mvpp2_write(port->priv, MVPP2_RXQ_STATUS_REG(rxq->id), 0);
-   cpu = smp_processor_id();
+   cpu = get_cpu();
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_NUM_REG, rxq->id);
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_DESC_ADDR_REG, 0);
mvpp2_percpu_write(port->priv, cpu, MVPP2_RXQ_DESC_SIZE_REG, 0);
+   put_cpu();
 }
 
 /* Create and initialize a Tx queue */
@@ -5017,7 +5025,7 @@ static int mvpp2_txq_init(struct mvpp2_port *port,
txq->last_desc = txq->size - 1;
 
/* Set Tx descriptors queue starting address - indirect access */
-   cpu = smp_processor_id();
+   cpu = get_cpu();
mvpp2_percpu_write(port->priv, cpu, MVPP2_TXQ_NUM_REG, txq->id);
mvpp2_percpu_write(port->priv, cpu, MVPP2_TXQ_DESC_ADDR_REG,
   txq->descs_dma);
@@ -5042,6 +5050,7 @@ static int
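
The get/access/put shape that every hunk above introduces can be modelled outside the kernel. This is a sketch with hypothetical names (model_get_cpu(), model_put_cpu(), model_percpu_write(), set_threshold()) and a made-up preemption counter; the real get_cpu() and put_cpu() disable and re-enable preemption so that the CPU number stays valid for the whole per-CPU register sequence.

```c
#include <assert.h>

static int preempt_depth;          /* model of the preemption-disable count */
static int current_cpu = 2;        /* hypothetical CPU the task runs on */
static unsigned int percpu_reg[4]; /* hypothetical per-CPU register window */

/* Model of get_cpu(): disable preemption, then read the CPU number. */
static int model_get_cpu(void)
{
	preempt_depth++;
	return current_cpu;
}

/* Model of put_cpu(): re-enable preemption. */
static void model_put_cpu(void)
{
	assert(preempt_depth > 0);
	preempt_depth--;
}

/* Per-CPU write that is only valid while the task cannot migrate. */
static void model_percpu_write(int cpu, unsigned int val)
{
	assert(preempt_depth > 0);
	percpu_reg[cpu] = val;
}

/* Shape of the fixed driver functions: get, access, put. */
static void set_threshold(unsigned int val)
{
	int cpu = model_get_cpu();

	model_percpu_write(cpu, val);
	model_put_cpu();
}
```

The point of the pairing is that every exit path from the function must reach the put, which is why the patch adds put_cpu() at the end of each converted section.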

[PATCH v2 1/2] net: mvpp2: remove mvpp2_bm_cookie_{build,pool_get}

2017-06-10 Thread Thomas Petazzoni

This commit removes the useless
mvpp2_bm_cookie_{build,pool_get} functions. All that
mvpp2_bm_cookie_build() was doing is compute a 32-bit value by
concatenating the pool number and the CPU number... only to get the pool
number re-extracted by mvpp2_bm_cookie_pool_get() later on.

Instead, just get the pool number directly from RX descriptor status,
and pass it to mvpp2_pool_refill() and mvpp2_rx_refill().

This has the added benefit of dropping a smp_processor_id() call in a
migration-enabled context, which is wrong, and is the original
motivation for making this change.

Fixes: 3f518509dedc9 ("ethernet: Add new driver for Marvell Armada 375 network 
unit")
Signed-off-by: Thomas Petazzoni 
---
 drivers/net/ethernet/marvell/mvpp2.c | 47 +++-
 1 file changed, 14 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index 70bca2a..5841e53 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -3920,12 +3920,6 @@ static inline u32 mvpp2_bm_cookie_pool_set(u32 cookie, 
int pool)
return bm;
 }
 
-/* Get pool number from a BM cookie */
-static inline int mvpp2_bm_cookie_pool_get(unsigned long cookie)
-{
-   return (cookie >> MVPP2_BM_COOKIE_POOL_OFFS) & 0xFF;
-}
-
 /* Release buffer to BM */
 static inline void mvpp2_bm_pool_put(struct mvpp2_port *port, int pool,
 dma_addr_t buf_dma_addr,
@@ -3961,12 +3955,10 @@ static inline void mvpp2_bm_pool_put(struct mvpp2_port 
*port, int pool,
 }
 
 /* Refill BM pool */
-static void mvpp2_pool_refill(struct mvpp2_port *port, u32 bm,
+static void mvpp2_pool_refill(struct mvpp2_port *port, int pool,
  dma_addr_t dma_addr,
  phys_addr_t phys_addr)
 {
-   int pool = mvpp2_bm_cookie_pool_get(bm);
-
mvpp2_bm_pool_put(port, pool, dma_addr, phys_addr);
 }
 
@@ -4513,21 +4505,6 @@ static void mvpp2_rxq_offset_set(struct mvpp2_port *port,
mvpp2_write(port->priv, MVPP2_RXQ_CONFIG_REG(prxq), val);
 }
 
-/* Obtain BM cookie information from descriptor */
-static u32 mvpp2_bm_cookie_build(struct mvpp2_port *port,
-struct mvpp2_rx_desc *rx_desc)
-{
-   int cpu = smp_processor_id();
-   int pool;
-
-   pool = (mvpp2_rxdesc_status_get(port, rx_desc) &
-   MVPP2_RXD_BM_POOL_ID_MASK) >>
-   MVPP2_RXD_BM_POOL_ID_OFFS;
-
-   return ((pool & 0xFF) << MVPP2_BM_COOKIE_POOL_OFFS) |
-  ((cpu & 0xFF) << MVPP2_BM_COOKIE_CPU_OFFS);
-}
-
 /* Tx descriptors helper methods */
 
 /* Get pointer to next Tx descriptor to be processed (send) by HW */
@@ -4978,9 +4955,13 @@ static void mvpp2_rxq_drop_pkts(struct mvpp2_port *port,
 
for (i = 0; i < rx_received; i++) {
struct mvpp2_rx_desc *rx_desc = mvpp2_rxq_next_desc_get(rxq);
-   u32 bm = mvpp2_bm_cookie_build(port, rx_desc);
+   u32 status = mvpp2_rxdesc_status_get(port, rx_desc);
+   int pool;
+
+   pool = (status & MVPP2_RXD_BM_POOL_ID_MASK) >>
+   MVPP2_RXD_BM_POOL_ID_OFFS;
 
-   mvpp2_pool_refill(port, bm,
+   mvpp2_pool_refill(port, pool,
  mvpp2_rxdesc_dma_addr_get(port, rx_desc),
  mvpp2_rxdesc_cookie_get(port, rx_desc));
}
@@ -5418,7 +5399,7 @@ static void mvpp2_rx_csum(struct mvpp2_port *port, u32 
status,
 
 /* Reuse skb if possible, or allocate a new skb and add it to BM pool */
 static int mvpp2_rx_refill(struct mvpp2_port *port,
-  struct mvpp2_bm_pool *bm_pool, u32 bm)
+  struct mvpp2_bm_pool *bm_pool, int pool)
 {
dma_addr_t dma_addr;
phys_addr_t phys_addr;
@@ -5430,7 +5411,7 @@ static int mvpp2_rx_refill(struct mvpp2_port *port,
if (!buf)
return -ENOMEM;
 
-   mvpp2_pool_refill(port, bm, dma_addr, phys_addr);
+   mvpp2_pool_refill(port, pool, dma_addr, phys_addr);
 
return 0;
 }
@@ -5488,7 +5469,7 @@ static int mvpp2_rx(struct mvpp2_port *port, int rx_todo,
unsigned int frag_size;
dma_addr_t dma_addr;
phys_addr_t phys_addr;
-   u32 bm, rx_status;
+   u32 rx_status;
int pool, rx_bytes, err;
void *data;
 
@@ -5500,8 +5481,8 @@ static int mvpp2_rx(struct mvpp2_port *port, int rx_todo,
phys_addr = mvpp2_rxdesc_cookie_get(port, rx_desc);
data = (void *)phys_to_virt(phys_addr);
 
-   bm = mvpp2_bm_cookie_build(port, rx_desc);
-   pool = mvpp2_bm_cookie_pool_get(bm);
+   pool = (rx_status & MVPP2_RXD_BM_POOL_ID_MASK) >>
+   MVPP2_RXD_BM_POOL_ID_OFFS;
bm_pool =
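
To see why the cookie helpers were redundant, the round trip can be reproduced in userspace. The bit offsets and mask below are illustrative assumptions for this sketch (the authoritative values are in the driver source), and the function names are shortened, hypothetical versions of the driver helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layouts; the real values live in the driver source. */
#define RXD_BM_POOL_ID_OFFS  13
#define RXD_BM_POOL_ID_MASK  (0x7u << RXD_BM_POOL_ID_OFFS)
#define BM_COOKIE_POOL_OFFS  8
#define BM_COOKIE_CPU_OFFS   24

/* Old path, step 1: pack pool and CPU numbers into a 32-bit cookie. */
static uint32_t cookie_build(uint32_t status, unsigned int cpu)
{
	uint32_t pool = (status & RXD_BM_POOL_ID_MASK) >> RXD_BM_POOL_ID_OFFS;

	return ((pool & 0xFF) << BM_COOKIE_POOL_OFFS) |
	       (((uint32_t)cpu & 0xFF) << BM_COOKIE_CPU_OFFS);
}

/* Old path, step 2: unpack the pool number again. */
static int cookie_pool_get(uint32_t cookie)
{
	return (cookie >> BM_COOKIE_POOL_OFFS) & 0xFF;
}

/* New path: read the pool number straight from the descriptor status. */
static int pool_from_status(uint32_t status)
{
	return (status & RXD_BM_POOL_ID_MASK) >> RXD_BM_POOL_ID_OFFS;
}
```

Whatever CPU number goes into the cookie, the pool that comes back out is the one already present in the status word, so the direct extraction gives the same result without needing the CPU number at all.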

Re: [PATCH] net: fec: Add a fec_enet_clear_ethtool_stats() stub for CONFIG_M5272

2017-06-10 Thread David Miller

From: Fabio Estevam 
Date: Sat, 10 Jun 2017 17:33:01 -0300

> On Sat, Jun 10, 2017 at 5:16 PM, David Miller  wrote:
>> From: Fabio Estevam 
>> Date: Fri,  9 Jun 2017 22:37:22 -0300
>>
>>> From: Fabio Estevam 
>>>
>>> Commit 2b30842b23b9 ("net: fec: Clear and enable MIB counters on imx51")
>>> introduced fec_enet_clear_ethtool_stats(), but missed to add a stub
>>> for the CONFIG_M5272=y case, causing build failure for the
>>> m5272c3_defconfig.
>>>
>>> Add the missing empty stub to fix the build failure.
>>>
>>> Reported-by: Paul Gortmaker 
>>> Signed-off-by: Fabio Estevam 
>>
>> Applied, thanks.
> 
> I should have marked net-next in the Subject.
> 
> Commit 2b30842b23b9 ("net: fec: Clear and enable MIB counters on
> imx51") is in net-next, so this one should go to net-next, not to net.
> 
> Sorry about that.

Ok, reverted from 'net' and applied to 'net-next' with an appropriate
Fixes: tag added.

Thanks.

Re: [PATCH] net: fec: Add a fec_enet_clear_ethtool_stats() stub for CONFIG_M5272

2017-06-10 Thread Fabio Estevam

On Sat, Jun 10, 2017 at 5:16 PM, David Miller  wrote:
> From: Fabio Estevam 
> Date: Fri,  9 Jun 2017 22:37:22 -0300
>
>> From: Fabio Estevam 
>>
>> Commit 2b30842b23b9 ("net: fec: Clear and enable MIB counters on imx51")
>> introduced fec_enet_clear_ethtool_stats(), but missed to add a stub
>> for the CONFIG_M5272=y case, causing build failure for the
>> m5272c3_defconfig.
>>
>> Add the missing empty stub to fix the build failure.
>>
>> Reported-by: Paul Gortmaker 
>> Signed-off-by: Fabio Estevam 
>
> Applied, thanks.

I should have marked net-next in the Subject.

Commit 2b30842b23b9 ("net: fec: Clear and enable MIB counters on
imx51") is in net-next, so this one should go to net-next, not to net.

Sorry about that.

[PATCH net v3] net: ipmr: Fix some mroute forwarding issues in vrf's

2017-06-10 Thread Donald Sharp

This patch fixes two issues:

1) When forwarding on *,G mroutes that are in a vrf, the
kernel was dropping information about the actual incoming
interface when calling ip_mr_forward from ip_mr_input.
This caused ip_mr_forward to send the multicast packet
back out the incoming interface.  Fix this by
modifying ip_mr_forward to be handed the correctly
resolved dev.

2) When an unresolved cache entry is created we store
the incoming skb on the unresolved cache entry and
upon mroute resolution from the user space daemon,
we attempt to forward the packet.  Again we were
not resolving to the correct incoming device for
a vrf scenario, before calling ip_mr_forward.
Fix this by resolving to the correct interface
and calling ip_mr_forward with the result.

Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast")
Signed-off-by: Donald Sharp 
---
v2: Fixed title
v3: Addressed Review comments by Andrew Lunn and David Ahern

 net/ipv4/ipmr.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 551de4d..09368a1 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -101,8 +101,8 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 
id);
 static void ipmr_free_table(struct mr_table *mrt);
 
 static void ip_mr_forward(struct net *net, struct mr_table *mrt,
- struct sk_buff *skb, struct mfc_cache *cache,
- int local);
+ struct net_device *dev, struct sk_buff *skb,
+ struct mfc_cache *cache, int local);
 static int ipmr_cache_report(struct mr_table *mrt,
 struct sk_buff *pkt, vifi_t vifi, int assert);
 static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
@@ -988,7 +988,7 @@ static void ipmr_cache_resolve(struct net *net, struct 
mr_table *mrt,
 
rtnl_unicast(skb, net, NETLINK_CB(skb).portid);
} else {
-   ip_mr_forward(net, mrt, skb, c, 0);
+   ip_mr_forward(net, mrt, skb->dev, skb, c, 0);
}
}
 }
@@ -1073,7 +1073,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
 
 /* Queue a packet for resolution. It gets locked cache entry! */
 static int ipmr_cache_unresolved(struct mr_table *mrt, vifi_t vifi,
-struct sk_buff *skb)
+struct sk_buff *skb, struct net_device *dev)
 {
const struct iphdr *iph = ip_hdr(skb);
struct mfc_cache *c;
@@ -1130,6 +1130,10 @@ static int ipmr_cache_unresolved(struct mr_table *mrt, 
vifi_t vifi,
kfree_skb(skb);
err = -ENOBUFS;
} else {
+   if (dev) {
+   skb->dev = dev;
+   skb->skb_iif = dev->ifindex;
+   }
skb_queue_tail(&c->mfc_un.unres.unresolved, skb);
err = 0;
}
@@ -1828,10 +1832,10 @@ static int ipmr_find_vif(struct mr_table *mrt, struct 
net_device *dev)
 
 /* "local" means that we should preserve one skb (for local delivery) */
 static void ip_mr_forward(struct net *net, struct mr_table *mrt,
- struct sk_buff *skb, struct mfc_cache *cache,
- int local)
+ struct net_device *dev, struct sk_buff *skb,
+ struct mfc_cache *cache, int local)
 {
-   int true_vifi = ipmr_find_vif(mrt, skb->dev);
+   int true_vifi = ipmr_find_vif(mrt, dev);
int psend = -1;
int vif, ct;
 
@@ -1853,13 +1857,7 @@ static void ip_mr_forward(struct net *net, struct 
mr_table *mrt,
}
 
/* Wrong interface: drop packet and (maybe) send PIM assert. */
-   if (mrt->vif_table[vif].dev != skb->dev) {
-   struct net_device *mdev;
-
-   mdev = l3mdev_master_dev_rcu(mrt->vif_table[vif].dev);
-   if (mdev == skb->dev)
-   goto forward;
-
+   if (mrt->vif_table[vif].dev != dev) {
if (rt_is_output_route(skb_rtable(skb))) {
/* It is our own packet, looped back.
 * Very complicated situation...
@@ -2053,7 +2051,7 @@ int ip_mr_input(struct sk_buff *skb)
read_lock(&mrt_lock);
vif = ipmr_find_vif(mrt, dev);
if (vif >= 0) {
-   int err2 = ipmr_cache_unresolved(mrt, vif, skb);
+   int err2 = ipmr_cache_unresolved(mrt, vif, skb, dev);
read_unlock(&mrt_lock);
 
return err2;
@@ -2064,7 +2062,7 @@ int ip_mr_input(struct sk_buff *skb)
}
 
read_lock(&mrt_lock);
-   ip_mr_forward(net, mrt, skb, cache, local);
+   ip_mr_forward(net, mrt, dev, skb, cache, local);
read_unlock(&mrt_lock);
 
if (local)
@@ -2238,7 +2236,7 @@ int

Re: [PATCH net-next] Remove the redundant skb->dev initialization in ip6_fragment

2017-06-10 Thread David Miller

From: Chenbo Feng 
Date: Sat, 10 Jun 2017 12:35:38 -0700

> From: Chenbo Feng 
> 
> After moving the skb->dev and skb->protocol initialization into
> ip6_output, setting the skb->dev inside ip6_fragment is unnecessary.
> 
> Fixes: 97a7a37a7b7b("ipv6: Initial skb->dev and skb->protocol in ip6_output")
> Signed-off-by: Chenbo Feng 

Applied, thank you.

Re: [PATCHv2 net-next] sctp: no need to check assoc id before calling sctp_assoc_set_id

2017-06-10 Thread David Miller

From: Xin Long 
Date: Sat, 10 Jun 2017 15:27:12 +0800

> sctp_assoc_set_id does the assoc id check in the beginning when
> processing dupcookie, no need to do the same check before calling
> it.
> 
> v1->v2:
>   fix some typo errs Marcelo pointed in changelog.
> 
> Signed-off-by: Xin Long 

Also applied, thanks.

Re: [PATCH net-next] sctp: fix recursive locking warning in sctp_do_peeloff

2017-06-10 Thread David Miller

From: Xin Long 
Date: Sat, 10 Jun 2017 14:56:56 +0800

> Dmitry got the following recursive locking report while running syzkaller
> fuzzer, the Call Trace:
 ...
> This warning arises because the lock held by sctp_getsockopt() is on one
> socket, while the other lock that sctp_close() takes later is on
> the newly created (which failed) socket during the peeloff operation.
> 
> This patch avoids the warning by using lock_sock with subclass
> SINGLE_DEPTH_NESTING, as suggested by Wang Cong and Marcelo.
> 
> Reported-by: Dmitry Vyukov 
> Suggested-by: Marcelo Ricardo Leitner 
> Suggested-by: Cong Wang 
> Signed-off-by: Xin Long 

Applied.

Re: [PATCH net-next] sctp: use read_lock_bh in sctp_eps_seq_show

2017-06-10 Thread David Miller

From: Xin Long 
Date: Sat, 10 Jun 2017 15:13:32 +0800

> This patch is to use read_lock_bh instead of local_bh_disable
> and read_lock in sctp_eps_seq_show.
> 
> Signed-off-by: Xin Long 

Applied.

Re: [PATCH net] sctp: disable BH in sctp_for_each_endpoint

2017-06-10 Thread David Miller

From: Xin Long 
Date: Sat, 10 Jun 2017 14:48:14 +0800

> Now sctp holds read_lock while iterating over sctp_ep_hashtable without
> disabling BH. If the CPU switches to another thread A at this moment,
> thread A may try to take the write_lock with BH disabled.
> 
> Because BH is disabled, the CPU cannot switch back to the thread holding
> the read_lock while thread A keeps waiting for it, and a deadlock is
> triggered.
> 
> This patch fixes the deadlock by calling read_lock_bh instead, so that BH
> is disabled while the read_lock is held in sctp_for_each_endpoint.
> 
> Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and 
> reuse some for proc")
> Reported-by: Xiumei Mu 
> Signed-off-by: Xin Long 

Applied and queued up for -stable, thanks.

Re: [PATCH] net: fec: Add a fec_enet_clear_ethtool_stats() stub for CONFIG_M5272

2017-06-10 Thread David Miller

From: Fabio Estevam 
Date: Fri,  9 Jun 2017 22:37:22 -0300

> From: Fabio Estevam 
> 
> Commit 2b30842b23b9 ("net: fec: Clear and enable MIB counters on imx51")
> introduced fec_enet_clear_ethtool_stats(), but missed to add a stub
> for the CONFIG_M5272=y case, causing build failure for the
> m5272c3_defconfig.
> 
> Add the missing empty stub to fix the build failure.
> 
> Reported-by: Paul Gortmaker 
> Signed-off-by: Fabio Estevam 

Applied, thanks.

Re: [PATCH net-next] net/packet: remove unneeded declaraion of tpacket_snd().

2017-06-10 Thread David Miller

From: Rami Rosen 
Date: Sat, 10 Jun 2017 03:22:48 +0300

> This patch removes unneeded forward declaration of tpacket_snd()
> in net/packet/af_packet.c.
> 
> Signed-off-by: Rami Rosen 

Applied, thanks Rami.

Re: [PATCH] l2tp: cast l2tp traffic counter to unsigned

2017-06-10 Thread David Miller

From: Eric Dumazet 
Date: Fri, 09 Jun 2017 17:29:11 -0700

> On Fri, 2017-06-09 at 15:16 -0700, Stephen Hemminger wrote:
>> On Fri,  9 Jun 2017 16:29:47 +0200
>> Dominik Heidler  wrote:
>> 
>> > This fixes a counter problem on 32bit systems:
>> > When the rx_bytes counter reached 2 GiB, it jumped to (2^64 Bytes - 2GiB) 
>> > Bytes.
>> > 
>> > The rtnl_link_stats64 fields have __u64 type and atomic_long_read
>> > returns a long, which is signed. Due to the conversion
>> > we get an incorrect value on 32bit systems if the MSB of
>> > the returned value is set.
>> > 
>> > CC: Tom Parkin 
>> > Fixes: 7b7c0719cd7a ("l2tp: avoid deadlock in l2tp stats update")
>> > Signed-off-by: Dominik Heidler 
>> > ---
>> >  net/l2tp/l2tp_eth.c | 13 +++--
>> >  1 file changed, 7 insertions(+), 6 deletions(-)
>> > 
>> > diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
>> > index 8b21af7321b9..668a75e002e9 100644
>> > --- a/net/l2tp/l2tp_eth.c
>> > +++ b/net/l2tp/l2tp_eth.c
>> > @@ -114,12 +114,13 @@ static void l2tp_eth_get_stats64(struct net_device 
>> > *dev,
>> >  {
>> >struct l2tp_eth *priv = netdev_priv(dev);
>> >  
>> > -  stats->tx_bytes   = atomic_long_read(&priv->tx_bytes);
>> > -  stats->tx_packets = atomic_long_read(&priv->tx_packets);
>> > -  stats->tx_dropped = atomic_long_read(&priv->tx_dropped);
>> > -  stats->rx_bytes   = atomic_long_read(&priv->rx_bytes);
>> > -  stats->rx_packets = atomic_long_read(&priv->rx_packets);
>> > -  stats->rx_errors  = atomic_long_read(&priv->rx_errors);
>> > +  stats->tx_bytes   = (unsigned long) atomic_long_read(&priv->tx_bytes);
>> > +  stats->tx_packets = (unsigned long) atomic_long_read(&priv->tx_packets);
>> > +  stats->tx_dropped = (unsigned long) atomic_long_read(&priv->tx_dropped);
>> > +  stats->rx_bytes   = (unsigned long) atomic_long_read(&priv->rx_bytes);
>> > +  stats->rx_packets = (unsigned long) atomic_long_read(&priv->rx_packets);
>> > +  stats->rx_errors  = (unsigned long) atomic_long_read(&priv->rx_errors);
>> > +
>> >  }
>> >  
>> >  static const struct net_device_ops l2tp_eth_netdev_ops = {
>> 
>> This is not the right way to fix this.
>> 
>> 1. shouldn't be using atomic's for network counters, look at other network 
>> devices.
>> 
>> 2. should be using u64_stats_fetch  api to handle 64 bit counters.
> 
> But they do not want 64bit counters, and not per cpu counters for a
> driver handling few packets per second.
> 
> Just use native size of "unsigned long".

Ahh yeah, indeed.  I've applied this l2tp patch, therefore.

> We use the same atomic_long_t for (struct netdev)->rx_dropped,
> tx_dropped & rx_nohandler
> 
> So I guess same fix is needed in dev_get_stats()

Looks like it, I'll apply a formal submission of this.
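
The arithmetic behind the reported jump can be checked in userspace. In this sketch int32_t plays the role of a signed long on a 32-bit system and uint64_t the role of the __u64 stats field; widen_signed() and widen_unsigned() are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* int32_t plays the role of a 32-bit signed long counter value. */

/* Direct assignment to a 64-bit unsigned field: sign-extends. */
static uint64_t widen_signed(int32_t counter)
{
	return (uint64_t)counter;
}

/* Cast to the unsigned type of the same width first: zero-extends. */
static uint64_t widen_unsigned(int32_t counter)
{
	return (uint64_t)(uint32_t)counter;
}
```

For a counter whose bit pattern is 0x80000000 (2 GiB), the first helper yields 0xFFFFFFFF80000000, which is 2^64 minus 2 GiB, while the second yields 0x80000000. Below 2 GiB both agree, which is why the problem only appears once the top bit is set.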

Re: [PATCH net-next 0/8] Bug fixes in ena ethernet driver

2017-06-10 Thread David Miller

From: Florian Fainelli 
Date: Fri, 9 Jun 2017 15:19:54 -0700

> On 06/09/2017 03:13 PM, neta...@amazon.com wrote:
>> From: Netanel Belgazal 
>> 
>> This patchset contains fixes for the bugs that were discovered so far.
> 
> If these are all fixes you should submit them against the "net" tree.
> net-next is for features [1].
> 
> Since these are fixes, you may also want to provide a Fixes: 12-digit
> commit ("commit subject") [2] such that David can queue these patches
> for stable trees and this can be retrofitted into kernel distributions.
> 
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/netdev-FAQ.txt#n25
> 
> [2]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n183

Yeah I agree.  If they are genuine bug fixes they should be submitted
against 'net'.  And yes, Fixes: tags are quite desirable as well.

Re: [PATCH] net: aquantia: atlantic: remove declaration of hw_atl_utils_hw_set_power

2017-06-10 Thread David Miller

From: Philippe Reynes 
Date: Fri,  9 Jun 2017 23:50:57 +0200

> This function is not defined, so no need to declare it.
> 
> As I don't have the hardware, I'd be very pleased if
> someone may test this patch.
> 
> Signed-off-by: Philippe Reynes 

Applied, thanks.

Re: [PATCH net-next] bpf: Remove duplicate tcp_filter hook in ipv6

2017-06-10 Thread David Miller

From: Chenbo Feng 
Date: Fri,  9 Jun 2017 12:17:37 -0700

> From: Chenbo Feng 
> 
> There are two tcp_filter hooks in tcp_ipv6 ingress path currently.
> One is at tcp_v6_rcv and another is in tcp_v6_do_rcv. It seems the
> tcp_filter() call inside tcp_v6_do_rcv is redundant and some packets
> will be filtered twice in this situation. This will cause trouble
> when using eBPF filters to account traffic data.
> 
> Signed-off-by: Chenbo Feng 
> Acked-by: Eric Dumazet 

Applied, thank you.

Re: [PATCH net-next v2] bonding: warn user when 802.3ad speed is unknown

2017-06-10 Thread David Miller

From: Nicolas Dichtel 
Date: Fri,  9 Jun 2017 17:58:08 +0200

> The goal is to warn the user when ethtool speeds and 802.3ad speeds are
> out of sync.
> When this happens, the kernel needs to be patched.
> 
> Suggested-by: Andrew Lunn 
> Signed-off-by: Nicolas Dichtel 

Applied, thanks Nicolas.

Re: [PATCH net 0/2] bnx2x: Fix malicious VFs indication

2017-06-10 Thread David Miller

From: Yuval Mintz 
Date: Fri, 9 Jun 2017 17:17:00 +0300

> It was discovered that for a VF there's a simple [yet uncommon] scenario
> which would cause device firmware to declare that VF as malicious -
> Add a vlan interface on top of a VF and disable txvlan offloading for
> that VF [causing VF to transmit packets where vlan is on payload].
> 
> Patch #1 corrects driver transmission to prevent this issue.
> Patch #2 is a by-product correcting PF behavior once a VF is declared
> malicious.

Series applied, thank you.

Re: [PATCH net-next] bonding: warn user when 802.3ad speed is unknown

2017-06-10 Thread David Miller

From: Nicolas Dichtel 
Date: Fri,  9 Jun 2017 15:34:43 +0200

> Make it explicit in the log.
> 
> Suggested-by: Andrew Lunn 
> Signed-off-by: Nicolas Dichtel 

I agree with others that we should rate limit this somehow given
the context in which it is invoked.

Re: [PATCH net-next 2/2] netns: fix error code when the nsid is already used

2017-06-10 Thread David Miller

From: Nicolas Dichtel 
Date: Fri,  9 Jun 2017 14:41:57 +0200

> When the user tries to assign a specific nsid, idr_alloc() is called with
> the range [nsid, nsid+1]. If this nsid is already used, idr_alloc() returns
> ENOSPC (No space left on device). In our case, it's better to return
> EEXIST to make it clear that the nsid is not available.
> 
> CC: Jamal Hadi Salim 
> Signed-off-by: Nicolas Dichtel 

Applied.
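
The error translation described in the quoted changelog amounts to a single conditional. The sketch below is a userspace illustration, not the netns code: nsid_alloc_result() is a hypothetical helper, and the convention that a negative requested id means "pick any free id" is an assumption of this example.

```c
#include <assert.h>
#include <errno.h>

/*
 * Translate the allocator result for the caller. requested_nsid < 0 means
 * "pick any free id"; a value >= 0 means the user asked for that exact id.
 */
static int nsid_alloc_result(int alloc_ret, int requested_nsid)
{
	if (alloc_ret == -ENOSPC && requested_nsid >= 0)
		return -EEXIST; /* the one id in the range is taken */
	return alloc_ret;
}
```

When no specific id was requested, ENOSPC still means the id space is exhausted, so the sketch leaves that case untouched.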

Re: [PATCH net-next 1/2] netns: define extack error msg for nsis cmds

2017-06-10 Thread David Miller

From: Nicolas Dichtel 
Date: Fri,  9 Jun 2017 14:41:56 +0200

> It helps the user to identify errors.
> 
> CC: Jamal Hadi Salim 
> Signed-off-by: Nicolas Dichtel 

Applied, but please in the future always provide a proper "[PATCH 0/N]
" header posting with a patch series.

Thank you.

Re: [PATCH v4.11 -stable] esp4: Fix udpencap for local TCP packets.

2017-06-10 Thread David Miller

From: Steffen Klassert 
Date: Fri, 9 Jun 2017 11:35:46 +0200

> Locally generated TCP packets are usually cloned, so we
> do skb_cow_data() on these packets. After that we need to
> reload the pointer to the esp header. On udpencap this
> header has an offset to skb_transport_header, so take this
> offset into account.
> 
> This is a backport of:
> commit 0e78a87306a ("esp4: Fix udpencap for local TCP packets.")
> 
> Fixes: 67d349ed603 ("net/esp4: Fix invalid esph pointer crash")
> Fixes: fca11ebde3f0 ("esp4: Reorganize esp_output")
> Reported-by: Don Bowman 
> Signed-off-by: Steffen Klassert 

Queued up for -stable, thanks Steffen.

Re: [PATCH net-next] Revert "ipv6: Initial skb->dev and skb->protocol in ip6_output"

2017-06-10 Thread Chenbo Feng




On 06/10/2017 07:55 AM, Eric Dumazet wrote:
> On Fri, 2017-06-09 at 12:56 -0700, Chenbo Feng wrote:
>> From: Chenbo Feng 
>>
>> This reverts commit 97a7a37a7b7b("ipv6: Initial skb->dev and
>> skb->protocol in ip6_output") since it does not handles the
>> skb->dev assignment inside ip6_fragment() code path properly.
>> Need to rework and upload again
>
> We can avoid the revert I believe the patch is fine after analysis.
>
> Please submit this followup, thanks ! :
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 02cd44f0953900108701895108b2fdaa9f9980e5..0d6f3b6345de26c329ae1d6f25dde652a5452d4b 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -869,7 +869,6 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
> if (skb->sk && dst_allfrag(skb_dst(skb)))
> sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
>
> -	skb->dev = skb_dst(skb)->dev;
> icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
> err = -EMSGSIZE;




Thanks for the help! Patch submitted here: 
http://patchwork.ozlabs.org/patch/774260/

[PATCH net-next] Remove the redundant skb->dev initialization in ip6_fragment

2017-06-10 Thread Chenbo Feng

From: Chenbo Feng 

After moving the skb->dev and skb->protocol initialization into
ip6_output, setting the skb->dev inside ip6_fragment is unnecessary.

Fixes: 97a7a37a7b7b("ipv6: Initial skb->dev and skb->protocol in ip6_output")
Signed-off-by: Chenbo Feng 
---
 net/ipv6/ip6_output.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 02cd44f..0d6f3b6 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -869,7 +869,6 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (skb->sk && dst_allfrag(skb_dst(skb)))
sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
 
-   skb->dev = skb_dst(skb)->dev;
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
err = -EMSGSIZE;
 
-- 
2.7.4

Re: [PATCH v2] brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain

2017-06-10 Thread Arend van Spriel

On 03-06-17 17:36, Andy Shevchenko wrote:
> On Sat, Jun 3, 2017 at 1:29 AM, Peter S. Housel  wrote:
>> An earlier change to this function (3bdae810721b) fixed a leak in the
>> case of an unsuccessful call to brcmf_sdiod_buffrw(). However, the
>> glom_skb buffer, used for emulating a scattering read, is never used
>> or referenced after its contents are copied into the destination
>> buffers, and therefore always needs to be freed by the end of the
>> function.

[snip]

>> +   skb_queue_walk(pktq, skb) {
>> +   memcpy(skb->data, glom_skb->data, skb->len);
>> +   skb_pull(glom_skb, skb->len);
>> +   }
>> }
> 
>> +   brcmu_pkt_buf_free_skb(glom_skb);
> 
> Can we just add this one line instead or I'm missing something?

I guess. We don't want to walk the packet queue if glom_skb is not
carrying data due to brcmf_sdiod_buffrw() failure.

So I would go with the patch below, as brcmu_pkt_buf_free_skb() simply
ignores a NULL pointer.

Regards,
Arend
---
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index 5bc2ba2..3722f23 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
@@ -705,7 +705,7 @@ int brcmf_sdiod_recv_pkt(struct brcmf_sdio_dev
*sdiodev, struct sk_buff *pkt)
 int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev *sdiodev,
   struct sk_buff_head *pktq, uint totlen)
 {
-   struct sk_buff *glom_skb;
+   struct sk_buff *glom_skb = NULL;
struct sk_buff *skb;
u32 addr = sdiodev->sbwad;
int err = 0;
@@ -726,10 +726,8 @@ int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev
*sdiodev,
return -ENOMEM;
err = brcmf_sdiod_buffrw(sdiodev, SDIO_FUNC_2, false, addr,
 glom_skb);
-   if (err) {
-   brcmu_pkt_buf_free_skb(glom_skb);
+   if (err)
goto done;
-   }

skb_queue_walk(pktq, skb) {
memcpy(skb->data, glom_skb->data, skb->len);
@@ -740,6 +738,7 @@ int brcmf_sdiod_recv_chain(struct brcmf_sdio_dev
*sdiodev,
pktq);

 done:
+   brcmu_pkt_buf_free_skb(glom_skb);
return err;
 }
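
For readers less familiar with the pattern, here is a small userspace sketch of the single-exit cleanup the patch above moves to. It is not driver code: read_chain() and its parameters are made up for illustration, malloc()/free() stand in for the skb allocation and brcmu_pkt_buf_free_skb(), and the fact that free(NULL) is a no-op plays the role of the NULL-tolerant free.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical analogue of brcmf_sdiod_recv_chain(): the bounce buffer is
 * only allocated on one path, so it starts out NULL and is released at a
 * single exit label on every path.
 */
int read_chain(char *dst, size_t len, int use_bounce, int fail_read)
{
	char *bounce = NULL;	/* NULL until (and unless) it is allocated */
	int err = 0;

	if (use_bounce) {
		bounce = malloc(len);
		if (!bounce)
			return -1;

		if (fail_read) {	/* simulated read failure */
			err = -2;
			goto done;	/* skip the copy, still free below */
		}

		memset(bounce, 0xab, len);	/* pretend the read filled it */
		memcpy(dst, bounce, len);
	} else {
		memset(dst, 0xcd, len);		/* direct path, no bounce buffer */
	}

done:
	free(bounce);	/* free(NULL) is a no-op, so no conditional is needed */
	return err;
}
```

The point of the layout is that the error branch no longer needs its own free call, and the success path cannot forget one.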

Re: [PATCH v2] sh_eth: add support to change MTU

2017-06-10 Thread Sergei Shtylyov


Hello!

On 06/09/2017 11:36 PM, Niklas Söderlund wrote:


> The hardware supports the MTU to be changed and the driver it self is
> somewhat prepared to support this. This patch hooks up the callbacks to
> be able to change the MTU from user-space.
>
> Signed-off-by: Niklas Söderlund
> Acked-by: Sergei Shtylyov
> ---

[...]

> @@ -3171,6 +3184,13 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
> }
> sh_eth_set_default_cpu_data(mdp->cd);
>
> +   /* User's manua states max MTU should be 2048 but due to the


s/manua/manual/


   Looking at it for another time, I'd also reword the subject -- something 
like "sh_eth: add support for changing MTU" or "sh_eth: add MTU change support".


MBR, Sergei

[PATCH v1] net: rfkill: gpio: Switch to devm_acpi_dev_add_driver_gpios()

2017-06-10 Thread Andy Shevchenko

Switch to the managed variant of acpi_dev_add_driver_gpios() to simplify
the error path and fix a potentially wrong assignment if ->probe() fails.

Signed-off-by: Andy Shevchenko 
---
 net/rfkill/rfkill-gpio.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/rfkill/rfkill-gpio.c b/net/rfkill/rfkill-gpio.c
index 76c01cbd56e3..41bd496531d4 100644
--- a/net/rfkill/rfkill-gpio.c
+++ b/net/rfkill/rfkill-gpio.c
@@ -81,8 +81,7 @@ static int rfkill_gpio_acpi_probe(struct device *dev,
 
rfkill->type = (unsigned)id->driver_data;
 
-   return acpi_dev_add_driver_gpios(ACPI_COMPANION(dev),
-acpi_rfkill_default_gpios);
+   return devm_acpi_dev_add_driver_gpios(dev, acpi_rfkill_default_gpios);
 }
 
 static int rfkill_gpio_probe(struct platform_device *pdev)
@@ -154,8 +153,6 @@ static int rfkill_gpio_remove(struct platform_device *pdev)
rfkill_unregister(rfkill->rfkill_dev);
rfkill_destroy(rfkill->rfkill_dev);
 
-   acpi_dev_remove_driver_gpios(ACPI_COMPANION(&pdev->dev));
-
return 0;
 }
 
-- 
2.11.0

Re: [PATCH net-next 6/9] net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC

2017-06-10 Thread Florian Fainelli

Le 06/09/17 à 20:46, Salil Mehta a écrit :
> This patch adds the support of MDIO bus interface for HNS3 driver.
> Code provides various interfaces to start and stop the PHY layer
> and to read and write the MDIO bus or PHY.
> 
> Signed-off-by: Daode Huang 
> Signed-off-by: lipeng 
> Signed-off-by: Salil Mehta 
> Signed-off-by: Yisen Zhuang 
> ---
>  .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c| 310 
> +
>  1 file changed, 310 insertions(+)
>  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> 
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c 
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> new file mode 100644
> index 000..c6812d2
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> @@ -0,0 +1,310 @@
> +/*
> + * Copyright (c) 2016~2017 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include 
> +#include 
> +
> +#include "hclge_cmd.h"
> +#include "hclge_main.h"
> +
> +enum hclge_mdio_c22_op_seq {
> + HCLGE_MDIO_C22_WRITE = 1,
> + HCLGE_MDIO_C22_READ = 2
> +};
> +
> +enum hclge_mdio_c45_op_seq {
> + HCLGE_MDIO_C45_WRITE_ADDR = 0,
> + HCLGE_MDIO_C45_WRITE_DATA,
> + HCLGE_MDIO_C45_READ_INCREMENT,
> + HCLGE_MDIO_C45_READ
> +};
> +
> +#define HCLGE_MDIO_CTRL_START_BIT   BIT(0)
> +#define HCLGE_MDIO_CTRL_ST_MSK  GENMASK(2, 1)
> +#define HCLGE_MDIO_CTRL_ST_LSH  1
> +#define HCLGE_MDIO_IS_C22(c22)  (((c22) << HCLGE_MDIO_CTRL_ST_LSH) & \
> + HCLGE_MDIO_CTRL_ST_MSK)
> +
> +#define HCLGE_MDIO_CTRL_OP_MSK  GENMASK(4, 3)
> +#define HCLGE_MDIO_CTRL_OP_LSH  3
> +#define HCLGE_MDIO_CTRL_OP(access) \
> + (((access) << HCLGE_MDIO_CTRL_OP_LSH) & HCLGE_MDIO_CTRL_OP_MSK)
> +#define HCLGE_MDIO_CTRL_PRTAD_MSK   GENMASK(4, 0)
> +#define HCLGE_MDIO_CTRL_DEVAD_MSK   GENMASK(4, 0)
> +
> +#define HCLGE_MDIO_STA_VAL(val)  ((val) & BIT(0))
> +
> +struct hclge_mdio_cfg_cmd {
> + u8 ctrl_bit;
> + u8 prtad;   /* The external port address */
> + u8 devad;   /* The external device address */
> + u8 rsvd;
> + __le16 addr_c45;/* Only valid for c45 */
> + __le16 data_wr;
> + __le16 data_rd;
> + __le16 sta;
> +};
> +
> +static int hclge_mdio_write(struct mii_bus *bus, int phy_id, int regnum,
> + u16 data)
> +{
> + struct hclge_dev *hdev = (struct hclge_dev *)bus->priv;
> + struct hclge_mdio_cfg_cmd *mdio_cmd;
> + enum hclge_cmd_status status;
> + struct hclge_desc desc;
> + u8 is_c45, devad;
> + u16 reg;
> +
> + if (!bus)
> + return -EINVAL;
> +
> + is_c45 = !!(regnum & MII_ADDR_C45);
> + devad = ((regnum >> 16) & 0x1f);
> + reg = (u16)(regnum & 0xffff);
> +
> + hclge_cmd_setup_basic_desc(&desc, HCLGE_OPC_MDIO_CONFIG, false);
> +
> + mdio_cmd = (struct hclge_mdio_cfg_cmd *)desc.data;
> +
> + if (!is_c45) {

It would be more readable with positive logic: if (is_c45) { } else { }

> + /* C22 write reg and data */
> + mdio_cmd->ctrl_bit = HCLGE_MDIO_IS_C22(!is_c45);
> + mdio_cmd->ctrl_bit |= HCLGE_MDIO_CTRL_OP(HCLGE_MDIO_C22_WRITE);
> + mdio_cmd->ctrl_bit |= HCLGE_MDIO_CTRL_START_BIT;
> + mdio_cmd->data_wr = cpu_to_le16(data);
> + mdio_cmd->devad = devad & HCLGE_MDIO_CTRL_DEVAD_MSK;
> + mdio_cmd->prtad = phy_id & HCLGE_MDIO_CTRL_PRTAD_MSK;
> + } else {
> + /* Set phy addr */
> + mdio_cmd->ctrl_bit |= HCLGE_MDIO_CTRL_START_BIT;
> + mdio_cmd->addr_c45 = cpu_to_le16(reg);
> + mdio_cmd->data_wr = cpu_to_le16(data);
> + mdio_cmd->devad = devad & HCLGE_MDIO_CTRL_DEVAD_MSK;
> + mdio_cmd->prtad = phy_id & HCLGE_MDIO_CTRL_PRTAD_MSK;
> + }

There is some common initialization that you could probably extract
out of the C22/C45 clause here.
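
To make the two suggestions above concrete, here is a rough userspace sketch, not a drop-in replacement for the driver code. The struct and function names are hypothetical, the fields are kept in host byte order (the driver uses cpu_to_le16()), the command is assumed to start zeroed, and MII_ADDR_C45 is defined locally as bit 30, as in <linux/mdio.h> at the time of writing. The bit layout is taken from the HCLGE_MDIO_CTRL_* macros quoted earlier.

```c
#include <stdint.h>
#include <string.h>

#define MII_ADDR_C45	(1u << 30)	/* local copy, see lead-in */
#define CTRL_START_BIT	0x01u		/* BIT(0) */
#define CTRL_ST_LSH	1
#define CTRL_ST_MSK	0x06u		/* GENMASK(2, 1) */
#define CTRL_OP_LSH	3
#define CTRL_OP_MSK	0x18u		/* GENMASK(4, 3) */
#define C22_WRITE	1u
#define ADDR_MSK	0x1fu		/* GENMASK(4, 0) */

/* Hypothetical, simplified stand-in for struct hclge_mdio_cfg_cmd */
struct mdio_cmd {
	uint8_t ctrl_bit;
	uint8_t prtad;
	uint8_t devad;
	uint16_t addr_c45;
	uint16_t data_wr;
};

void mdio_fill_write(struct mdio_cmd *cmd, int phy_id, uint32_t regnum,
		     uint16_t data)
{
	int is_c45 = !!(regnum & MII_ADDR_C45);

	memset(cmd, 0, sizeof(*cmd));

	/* Fields that both branches of the original set identically */
	cmd->ctrl_bit = CTRL_START_BIT;
	cmd->data_wr = data;
	cmd->devad = (regnum >> 16) & ADDR_MSK;
	cmd->prtad = (uint8_t)phy_id & ADDR_MSK;

	if (is_c45) {
		/* C45: register address travels in its own field */
		cmd->addr_c45 = regnum & 0xffff;
	} else {
		/* C22: flag the frame type and the write opcode */
		cmd->ctrl_bit |= (1u << CTRL_ST_LSH) & CTRL_ST_MSK;
		cmd->ctrl_bit |= (C22_WRITE << CTRL_OP_LSH) & CTRL_OP_MSK;
	}
}
```

With the shared assignments hoisted, each branch only carries what is actually specific to that clause.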

> +
> + status = hclge_cmd_send(&hdev->hw, &desc, 1);
> + if (status) {
> + dev_err(&hdev->pdev->dev,
> + "mdio write fail when sending cmd, status is %d.\n",
> + status);
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +static int hclge_mdio_read(struct mii_bus *bus, int phy_id, int regnum)
> +{
> + struct hclge_dev *hdev = (struct hclge_dev *)bus->priv;
> + struct hclge_mdio_cfg_cmd *mdio_cmd;
> + enum hclge_cmd_status status;
> + struct hclge_desc desc;
> + u8 is_c45, devad;
> + u16 reg;
> +
> + if (!bus)
> + return -EINVAL;
> +
> + is_c45 = !!(regnum & MII_ADDR_C45);
> + devad = ((regnum >> 16) & GENMASK(4,

Re: [PATCH net-next 8/9] net: hns3: Add support of debugfs interface to HNS3 driver

2017-06-10 Thread Andrew Lunn

On Sat, Jun 10, 2017 at 12:51:57PM +, Mintz, Yuval wrote:
> > This adds the support of the debugfs interface to the driver for debugging
> > purposes.
> 
> > +const struct  hclge_support_cmd  support_cmd[] = {
> > +   {"send cmd", 8, hclge_dbg_send,
> > +   "opcode flag data0 data1 data2 data3 data4 data5"},
> > +   {"help", 4, hclge_dbg_usage, "no option"}, };
> 
> Is there an actual description of what this does? Or is it simply a huge 
> backdoor?

It looks like a huge backdoor to the chip.

It is O.K. to have such a patch internally for your own development
work, but it should not be submitted for mainline.

NACK

Andrew

Re: [PATCHv2 net-next] sctp: no need to check assoc id before calling sctp_assoc_set_id

2017-06-10 Thread Marcelo Ricardo Leitner

On Sat, Jun 10, 2017 at 03:27:12PM +0800, Xin Long wrote:
> sctp_assoc_set_id does the assoc id check in the beginning when
> processing dupcookie, no need to do the same check before calling
> it.
> 
> v1->v2:
>   fix some typo errs Marcelo pointed in changelog.
> 
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/associola.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index 288c5e0..72b07dd 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1181,12 +1181,8 @@ void sctp_assoc_update(struct sctp_association *asoc,
>   if (sctp_state(asoc, COOKIE_WAIT))
>   sctp_stream_update(>stream, >stream);
>  
> - if (!asoc->assoc_id) {
> - /* get a new association id since we don't have one
> -  * yet.
> -  */
> - sctp_assoc_set_id(asoc, GFP_ATOMIC);
> - }
> + /* get a new assoc id if we don't have one yet. */
> + sctp_assoc_set_id(asoc, GFP_ATOMIC);
>   }
>  
>   /* SCTP-AUTH: Save the peer parameters from the new associations
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [PATCH net-next] sctp: use read_lock_bh in sctp_eps_seq_show

2017-06-10 Thread Marcelo Ricardo Leitner

On Sat, Jun 10, 2017 at 03:13:32PM +0800, Xin Long wrote:
> This patch is to use read_lock_bh instead of local_bh_disable
> and read_lock in sctp_eps_seq_show.
> 
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/proc.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index 5a27d0f..8e34db5 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -218,8 +218,7 @@ static int sctp_eps_seq_show(struct seq_file *seq, void 
> *v)
>   return -ENOMEM;
>  
>   head = &sctp_ep_hashtable[hash];
> - local_bh_disable();
> - read_lock(&head->lock);
> + read_lock_bh(&head->lock);
>   sctp_for_each_hentry(epb, &head->chain) {
>   ep = sctp_ep(epb);
>   sk = epb->sk;
> @@ -234,8 +233,7 @@ static int sctp_eps_seq_show(struct seq_file *seq, void 
> *v)
>   sctp_seq_dump_local_addrs(seq, epb);
>   seq_printf(seq, "\n");
>   }
> - read_unlock(&head->lock);
> - local_bh_enable();
> + read_unlock_bh(&head->lock);
>  
>   return 0;
>  }
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [PATCH net-next] sctp: fix recursive locking warning in sctp_do_peeloff

2017-06-10 Thread Marcelo Ricardo Leitner

On Sat, Jun 10, 2017 at 02:56:56PM +0800, Xin Long wrote:
> Dmitry got the following recursive locking report while running syzkaller
> fuzzer, the Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
>  print_deadlock_bug kernel/locking/lockdep.c:1729 [inline]
>  check_deadlock kernel/locking/lockdep.c:1773 [inline]
>  validate_chain kernel/locking/lockdep.c:2251 [inline]
>  __lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340
>  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
>  lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
>  lock_sock include/net/sock.h:1460 [inline]
>  sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
>  inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
>  sock_release+0x8d/0x1e0 net/socket.c:597
>  __sock_create+0x38b/0x870 net/socket.c:1226
>  sock_create+0x7f/0xa0 net/socket.c:1237
>  sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879
>  sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline]
>  sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628
>  sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690
>  SYSC_getsockopt net/socket.c:1817 [inline]
>  SyS_getsockopt+0x240/0x380 net/socket.c:1799
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> 
> This warning is caused by the lock held by sctp_getsockopt() is on one
> socket, while the other lock that sctp_close() is getting later is on
> the newly created (which failed) socket during peeloff operation.
> 
> This patch is to avoid this warning by use lock_sock with subclass
> SINGLE_DEPTH_NESTING as Wang Cong and Marcelo's suggestion.
> 
> Reported-by: Dmitry Vyukov 
> Suggested-by: Marcelo Ricardo Leitner 
> Suggested-by: Cong Wang 
> Signed-off-by: Xin Long 

Thanks for following up on this.

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/socket.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 5f58dd0..32d5495 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)
>  
>   pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);
>  
> - lock_sock(sk);
> + lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
>   sk->sk_shutdown = SHUTDOWN_MASK;
>   sk->sk_state = SCTP_SS_CLOSING;
>  
> @@ -1544,7 +1544,7 @@ static void sctp_close(struct sock *sk, long timeout)
>* held and that should be grabbed before socket lock.
>*/
>   spin_lock_bh(>sctp.addr_wq_lock);
> - bh_lock_sock(sk);
> + bh_lock_sock_nested(sk);
>  
>   /* Hold the sock, since sk_common_release() will put sock_put()
>* and we have just a little more cleanup.
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: [PATCH net] sctp: disable BH in sctp_for_each_endpoint

2017-06-10 Thread Marcelo Ricardo Leitner

On Sat, Jun 10, 2017 at 02:48:14PM +0800, Xin Long wrote:
> Now sctp holds read_lock when foreach sctp_ep_hashtable without disabling
> BH. If CPU schedules to another thread A at this moment, the thread A may
> be trying to hold the write_lock with disabling BH.
> 
> As BH is disabled and CPU cannot schedule back to the thread holding the
> read_lock, while the thread A keeps waiting for the read_lock. A dead
> lock would be triggered by this.
> 
> This patch is to fix this dead lock by calling read_lock_bh instead to
> disable BH when holding the read_lock in sctp_for_each_endpoint.
> 
> Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and 
> reuse some for proc")
> Reported-by: Xiumei Mu 
> Signed-off-by: Xin Long 

Similar is done for proc interface already (sctp_eps_seq_show).

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/socket.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index f16c8d9..30aa0a5 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -4622,13 +4622,13 @@ int sctp_for_each_endpoint(int (*cb)(struct 
> sctp_endpoint *, void *),
>  
>   for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
>hash++, head++) {
> - read_lock(&head->lock);
> + read_lock_bh(&head->lock);
>   sctp_for_each_hentry(epb, &head->chain) {
>   err = cb(sctp_ep(epb), p);
>   if (err)
>   break;
>   }
> - read_unlock(&head->lock);
> + read_unlock_bh(&head->lock);
>   }
>  
>   return err;
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-06-10 Thread Andrew Lunn

> Also what Martin Blumenstingl wrote is following which is also crucial
> for fixing the issue:
> Amlogic has given their ethernet PHY driver some updates [2], it now
> includes wake-on-lan, and they now have an internal_phy_read_status
> which uses reset_internal_phy if there's a link and some error counter
> exceeds some magic value.

Hi Crow

You could probably just drop the Amlogic driver into mainline and see
if it works better. If that solves your problem, we can look at
merging the changes.

Andrew

Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-10 Thread Eric Dumazet

On Fri, 2017-06-09 at 21:24 +0200, Bjørn Mork wrote:
> Chenbo Feng  writes:
> 
> > This patch is still under working since it may have problem with
> > ip_fragment() call, did you applied it already? Should I send a revert
> > patch to you then?
> 
> It does? I initially thought so too, but looking closer I believe the
> ip6_copy_metadata() calls in ip6_fragment() takes care of it.

Indeed, this should be fine.

Re: [PATCH net-next] Revert "ipv6: Initial skb->dev and skb->protocol in ip6_output"

2017-06-10 Thread Eric Dumazet

On Fri, 2017-06-09 at 12:56 -0700, Chenbo Feng wrote:
> From: Chenbo Feng 
> 
> This reverts commit 97a7a37a7b7b("ipv6: Initial skb->dev and
> skb->protocol in ip6_output") since it does not handles the
> skb->dev assignment inside ip6_fragment() code path properly.
> Need to rework and upload again

We can avoid the revert I believe the patch is fine after analysis.

Please submit this followup, thanks ! :

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 02cd44f0953900108701895108b2fdaa9f9980e5..0d6f3b6345de26c329ae1d6f25dde652a5452d4b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -869,7 +869,6 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (skb->sk && dst_allfrag(skb_dst(skb)))
sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
 
-   skb->dev = skb_dst(skb)->dev;
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
err = -EMSGSIZE;

Re: [PATCH net-next] ipv6: Initial skb->dev and skb->protocol in ip6_output

2017-06-10 Thread Eric Dumazet

On Fri, 2017-06-09 at 16:12 -0700, Chenbo Feng wrote:
> 
> On 06/09/2017 12:39 PM, David Miller wrote:
> > From: Chenbo Feng 
> > Date: Fri, 9 Jun 2017 12:13:57 -0700
> >
> >>
> >> On 06/09/2017 12:08 PM, David Miller wrote:
> >>> From: Chenbo Feng 
> >>> Date: Fri,  9 Jun 2017 12:06:07 -0700
> >>>
> >>>> From: Chenbo Feng
> >>>>
> >>>> Move the initialization of skb->dev and skb->protocol from
> >>>> ip6_finish_output2 to ip6_output. This can make the skb->dev and
> >>>> skb->protocol information avalaible to the CGROUP eBPF filter.
> >>>>
> >>>> Signed-off-by: Chenbo Feng
> >>>> Acked-by: Eric Dumazet
> >>> Applied, thanks.
> >>>
> >>> This makes ipv6 consistent with ipv4.
> >>>
> >>> I am surprised this wasn't noticed, for example, in netfilter.
> >>> .
> >>>
> >> Hi David,
> >>
> >> This patch is still under working since it may have problem with
> >> ip_fragment() call, did you applied it already? Should I send a revert
> >> patch to you then?
> > A revert is necessary or a relative fixup.
> >
> > Thank you.
> >
> Hi David,
> 
> The revert is uploaded here: http://patchwork.ozlabs.org/patch/774136/
> 
> Thanks and sorry for the trouble caused
> 
> Chenbo Feng

No worries !

It seems the revert is not needed, after further analysis.

One of the points I raised was this no longer needed chunk, which can be
removed in a separate patch:

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 02cd44f0953900108701895108b2fdaa9f9980e5..0d6f3b6345de26c329ae1d6f25dde652a5452d4b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -869,7 +869,6 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (skb->sk && dst_allfrag(skb_dst(skb)))
sk_nocaps_add(skb->sk, NETIF_F_GSO_MASK);
 
-   skb->dev = skb_dst(skb)->dev;
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
err = -EMSGSIZE;

Re: [for-next 4/6] net/mlx5: FPGA, Add basic support for Innova

2017-06-10 Thread Majd Dibbiny


> On Jun 10, 2017, at 1:24 AM, Doug Ledford  wrote:
> 
>> On Wed, 2017-06-07 at 13:21 -0600, Jason Gunthorpe wrote:
>>> On Wed, Jun 07, 2017 at 10:13:43PM +0300, Saeed Mahameed wrote:
>>>  
>>> No !!
>>> I am just showing you that the ib_core eventually will end up
>>> calling
>>> mlx5_core to create a QP.
>>> so mlx5_core can create the QP it self since it is the one
>>> eventually
>>> creating QPs.
>>> we just call mlx5_core_create_qp directly.
>> 
>> Which is building a RDMA ULP inside a driver without using the core
>> code :(
> 
> Aren't the transmit/receive queues of the Ethernet netdevice on
> mlx4/mlx5 hardware QPs too?  Those bypass the RDMA subsystem entirely.
>  Just because something uses a QP on hardware that does *everything*
> via QPs doesn't necessarily mean it must go through the RDMA subsystem.
> 
> Now, the fact that the content of the packets is basically a RoCE
> packet does make things a bit fuzzier, but if their packets are
> specially crafted RoCE packets that aren't really intended to be fully
> RoCE spec compliant (maybe they don't support all the options as normal
> RoCE QPs), then I can see hiding them from the larger RoCE portion of
> the RDMA stack.
> 
>>> 
 
>>>> This keep getting more ugly :(
>>>>
>>>> What about security? What if user space sends some raw packets to
>>>> the
>>>> FPGA - can it reprogram the ISPEC settings or worse?
 
>>> 
>>> No such thing. This QP is only for internal driver/HW
>>> communications,
>>> as it is faster from the existing command interface.
>>> it is not meant to be exposed for any raw user space usages at all,
>>> without proper standard API adapter of course.
>> 
>> I'm not asking about the QP, I'm asking what happens after the NIC
>> part. You use ROCE packets to control the FPGA. What prevents
>> userspace from forcibly constructing roce packets and sending them to
>> the FPGA. How does the FPGA know for certain the packet came from the
>> kernel QP and not someplace else.
> 
> This is a valid concern.
> 
>> This is especially true for mlx nics as there are many raw packet
>> bypass mechanisms available to userspace.
> 

All of the raw packet bypass mechanisms are restricted to CAP_NET_RAW, and thus
malicious users can't simply open a raw packet QP and send packets to the FPGA.

> Right.  The question becomes: Does the firmware filter outgoing raw ETH
> QPs such that a nefarious user could not send a crafted RoCE packet
> that the bump on the wire would intercept and accept?
> 
> -- 
> Doug Ledford 
> GPG KeyID: B826A3330E572FDD
>
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH net-next 1/1] net: reflect mark on tcp syn ack packets

2017-06-10 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

SYN-ACK responses on a server in response to a SYN from a client
did not get the injected skb mark that was tagged on the SYN packet.

Fixes: 84f39b08d786 ("net: support marking accepting TCP sockets")
Signed-off-by: Jamal Hadi Salim 
---
 net/ipv4/ip_output.c  | 3 ++-
 net/ipv4/tcp_output.c | 2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7a3fd25..a8fd5f0 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -173,7 +173,8 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct 
sock *sk,
}
 
skb->priority = sk->sk_priority;
-   skb->mark = sk->sk_mark;
+   if (!skb->mark)
+   skb->mark = sk->sk_mark;
 
/* Send it out. */
return ip_local_out(net, skb->sk, skb);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9a9c395..8c3661a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3212,6 +3212,8 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, 
struct dst_entry *dst,
tcp_ecn_make_synack(req, th);
th->source = htons(ireq->ir_num);
th->dest = ireq->ir_rmt_port;
+   if (sock_net(sk)->ipv4.sysctl_tcp_fwmark_accept)
+   skb->mark = ireq->ir_mark;
/* Setting of flags are superfluous here for callers (and ECE is
 * not even correctly set)
 */
-- 
1.9.1

[PATCH v2] net: fec: Add a fec_enet_clear_ethtool_stats() stub for CONFIG_M5272

2017-06-10 Thread Fabio Estevam

From: Fabio Estevam 

Commit 2b30842b23b9 ("net: fec: Clear and enable MIB counters on imx51")
introduced fec_enet_clear_ethtool_stats(), but did not add a stub
for the CONFIG_M5272=y case, causing a build failure for
m5272c3_defconfig.

Add the missing empty stub to fix the build failure.

Reported-by: Paul Gortmaker 
Signed-off-by: Fabio Estevam 
Reviewed-by: Andrew Lunn 
---
Changes since v1:
- I am resending it with Andrew's Reviewed-by. I originally mistyped
the netdev mailing list address.

 drivers/net/ethernet/freescale/fec_main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 297fd19..a6e323f 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2379,6 +2379,10 @@ static void fec_enet_clear_ethtool_stats(struct 
net_device *dev)
 static inline void fec_enet_update_ethtool_stats(struct net_device *dev)
 {
 }
+
+static inline void fec_enet_clear_ethtool_stats(struct net_device *dev)
+{
+}
 #endif /* !defined(CONFIG_M5272) */
 
 /* ITR clock source is enet system clock (clk_ahb).
-- 
2.7.4

RE: [PATCH] net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse

2017-06-10 Thread Jon Maloy

Acked.

///jon


> -Original Message-
> From: Jia-Ju Bai [mailto:baijiaju1...@163.com]
> Sent: Saturday, June 10, 2017 05:04
> To: Jon Maloy ; Ying Xue
> ; da...@davemloft.net
> Cc: netdev@vger.kernel.org; tipc-discuss...@lists.sourceforge.net; linux-
> ker...@vger.kernel.org; Jia-Ju Bai 
> Subject: [PATCH] net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse
> 
> The kernel may sleep under a rcu read lock in tipc_msg_reverse, and the
> function call path is:
> tipc_l2_rcv_msg (acquire the lock by rcu_read_lock)
>   tipc_rcv
>     tipc_sk_rcv
>       tipc_msg_reverse
>         pskb_expand_head(GFP_KERNEL) --> may sleep
> tipc_node_broadcast
>   tipc_node_xmit_skb
>     tipc_node_xmit
>       tipc_sk_rcv
>         tipc_msg_reverse
>           pskb_expand_head(GFP_KERNEL) --> may sleep
> 
> To fix it, "GFP_KERNEL" is replaced with "GFP_ATOMIC".
> 
> Signed-off-by: Jia-Ju Bai 
> ---
>  net/tipc/msg.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/tipc/msg.c b/net/tipc/msg.c
> index 312ef7d..ab30876 100644
> --- a/net/tipc/msg.c
> +++ b/net/tipc/msg.c
> @@ -508,7 +508,7 @@ bool tipc_msg_reverse(u32 own_node,  struct
> sk_buff **skb, int err)
>   }
> 
>   if (skb_cloned(_skb) &&
> - pskb_expand_head(_skb, BUF_HEADROOM, BUF_TAILROOM,
> GFP_KERNEL))
> + pskb_expand_head(_skb, BUF_HEADROOM, BUF_TAILROOM,
> GFP_ATOMIC))
>   goto exit;
> 
>   /* Now reverse the concerned fields */
> --
> 1.7.9.5
>

RE: [PATCH net-next 8/9] net: hns3: Add support of debugfs interface to HNS3 driver

2017-06-10 Thread Mintz, Yuval

> This adds the support of the debugfs interface to the driver for debugging
> purposes.

> +const struct  hclge_support_cmd  support_cmd[] = {
> + {"send cmd", 8, hclge_dbg_send,
> + "opcode flag data0 data1 data2 data3 data4 data5"},
> + {"help", 4, hclge_dbg_usage, "no option"}, };

Is there an actual description of what this does? Or is it simply a huge 
backdoor?

RE: [PATCH net-next 1/9] net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC

2017-06-10 Thread Mintz, Yuval

> +static void hns3_nic_net_down(struct net_device *ndev) {
> + struct hns3_nic_priv *priv = netdev_priv(ndev);
> + struct hnae3_ae_ops *ops;
> + int i;
> +
> + netif_tx_stop_all_queues(ndev);
> + netif_carrier_off(ndev);
> + netif_tx_disable(ndev);
> +
> + ops = priv->ae_handle->ae_algo->ops;
> +
> + if (ops->stop)
> + ops->stop(priv->ae_handle);
> +
> + netif_tx_stop_all_queues(ndev);

Looks a bit excessive. Why do you need all these netif_tx_stop_all_queues()?

> +int hns3_nic_net_xmit_hw(struct net_device *ndev,
...
> +out_map_frag_fail:
> +
> + while (ring->next_to_use != next_to_use) {
> + if (ring->next_to_use != next_to_use)
> + dma_unmap_page(dev,
> +ring->desc_cb[ring->next_to_use].dma,
> +ring->desc_cb[ring->next_to_use].length,
> +DMA_TO_DEVICE);
> + else
> + dma_unmap_single(dev,
> +  ring->desc_cb[next_to_use].dma,
> +  ring->desc_cb[next_to_use].length,
> +  DMA_TO_DEVICE);
> + }

Something looks completely broken in this error-handling 'loop'.
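
Reading only the quoted hunk: nothing in the loop body changes either side of the while condition, and the inner 'if' repeats that same condition, so the 'else' branch can never run. As an illustration of what an unwind usually looks like, here is a userspace sketch. Everything in it is hypothetical (the ring layout, the names, and the assumption that the saved index marks where the failed packet started and that the driver index has advanced past it); the real driver would call dma_unmap_single() or dma_unmap_page() where the comment indicates.

```c
#define RING_SIZE 8u

enum map_kind { MAP_NONE = 0, MAP_SINGLE, MAP_PAGE };

/* Hypothetical per-descriptor bookkeeping */
struct desc_cb {
	enum map_kind kind;
};

struct ring {
	struct desc_cb desc_cb[RING_SIZE];
	unsigned int next_to_use;
};

/*
 * Walk back from ring->next_to_use to 'start', releasing each slot that
 * was filled for the failed packet. Returns the number of slots undone.
 */
int ring_unwind(struct ring *ring, unsigned int start)
{
	int undone = 0;

	while (ring->next_to_use != start) {
		/* Step back one slot, wrapping at the ring boundary */
		ring->next_to_use =
			(ring->next_to_use + RING_SIZE - 1u) % RING_SIZE;

		/*
		 * A driver would pick the unmap call from 'kind' here:
		 * MAP_SINGLE for the linear head, MAP_PAGE for fragments.
		 */
		ring->desc_cb[ring->next_to_use].kind = MAP_NONE;
		undone++;
	}

	return undone;
}
```

The loop makes progress on every iteration, and the per-slot bookkeeping (rather than an index comparison) decides which unmap variant applies.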

> +static int hns3_setup_tc(struct net_device *ndev, u8 tc) {
...
> + /* Assign UP2TC map for the VSI */
> + for (i = 0; i < HNAE3_MAX_TC; i++) {
> + netdev_set_prio_tc_map(ndev,
> +kinfo->tc_info[i].up,
> +kinfo->tc_info[i].tc);
> + }
...
> +static int hns3_nic_setup_tc(struct net_device *dev, u32 handle,
> +  u32 chain_index, __be16 protocol,
> +  struct tc_to_netdev *tc)
> +{
> + if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
> + return -EINVAL;
> +
> + return hns3_setup_tc(dev, tc->mqprio->num_tc); }

Isn't mqprio going to override your priority2tc mapping with the one provided
by user?

> +
> +static int hns3_handle_rx_bd(struct hns3_enet_ring *ring,
> +  struct sk_buff **out_skb, int *out_bnum) {
...
> + /* Prefetch first cache line of first page */
> + prefetch(va);
> +#if L1_CACHE_BYTES < 128
> + prefetch(va + L1_CACHE_BYTES);
> +#endif

Might be better to comment what you're actually fetching

Re: [PATCH net-next RFC 1/1] net netlink: Add new type NLA_FLAG_BITS

2017-06-10 Thread Jamal Hadi Salim




Ping.. Following up

On 17-05-02 09:52 PM, Jamal Hadi Salim wrote:
> On 17-05-02 03:03 PM, David Miller wrote:
>> From: Jamal Hadi Salim
>> Date: Sun, 30 Apr 2017 10:28:39 -0400
>>
>>> Generic bitflags attribute content sent to the kernel by user.
>>> With this type the user can either set or unset a flag in the
>>> kernel.
>>
>> You asked for feedback, here it is :-)
>>
>> I think this is overengineered.
>>
>> Just define a u32 for the value, and mask which defines which bits are
>> legitimate and defined.  Any bit outside of the legitimate mask must
>> be zero.
>
> That is what it does but as a nla type. It has a validator ops() in
> case someone wants to override the default validator.
> Is the ops the over-engineering you refer to or merely making it an
> nla type is a problem?


Since it's been a month, a reminder of what the patch does: it introduces
a new netlink type, NLA_FLAG_BITS.

UAPI structure:

struct nla_bit_flags {
   u32 nla_flag_values;
   u32 nla_flag_selector;
};

User to kernel example:
---
nla_flag_values = 0x0, and nla_flag_selector = 0x1
implies we are selecting bit 1 and we want to set its value to 0.

nla_flag_values = 0x2, and nla_flag_selector = 0x2
implies we are selecting bit 2 and we want to set its value to 1.
---

A kernel subsystem specifies validation rules of the following
nature:

 [ATTR_GOO] = { .type = NLA_FLAG_BITS,
                .validation_data = &myvalidflags },

where myvalidflags is the bit mask of the flags the kernel understands.
We reject any bitmask of values that don't fit myvalidflags.

The user can also specify their own validation callback like so:

[ATTR_GOO] = { .type = MYTYPE,
   .validation_data = &myvalidation_data,
   .validate_content = mycontent_validator },

myvalidation_data is the allowed bitmap as before.
The validator callback looks like:

int mycontent_validator(const struct nlattr *nla, void *valid_data)
{
   const struct myattribute *user_data = nla_data(nla);
   struct myvalidation_struct *valid_data_constraint = valid_data;

  ... validate user_data against valid_data_constraint ...
  ... return appropriate error code etc ...
}
---
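
The set/unset semantics and the default validation described above can be sketched in plain userspace C. This is only an illustration, not the code from the patch: the helper names flag_bits_validate() and flag_bits_apply() are made up for this example, and u32 is spelled uint32_t.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the UAPI structure quoted above (u32 spelled as uint32_t). */
struct nla_bit_flags {
	uint32_t nla_flag_values;
	uint32_t nla_flag_selector;
};

/*
 * Hypothetical helper: reject any request that touches a bit outside the
 * mask of flags the kernel understands ("myvalidflags" in the text).
 */
static int flag_bits_validate(const struct nla_bit_flags *req, uint32_t valid)
{
	assert(req != NULL);
	if ((req->nla_flag_values | req->nla_flag_selector) & ~valid)
		return -EINVAL;
	return 0;
}

/*
 * Hypothetical helper: only bits named in the selector change; each of
 * them takes the corresponding bit from nla_flag_values.
 */
static uint32_t flag_bits_apply(uint32_t cur, const struct nla_bit_flags *req)
{
	assert(req != NULL);
	return (cur & ~req->nla_flag_selector) |
	       (req->nla_flag_values & req->nla_flag_selector);
}
```

With values = 0x0 and selector = 0x1 the lowest flag is cleared; with values = 0x2 and selector = 0x2 the second flag is set, which matches the two user-to-kernel examples.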

So Dave ;-> Are you suggesting I get rid of the validation op?

cheers,
jamal

veth: speed problems

2017-06-10 Thread Алексей Болдырев

In short, we are running kernel 4.11.4. When transferring data over veth, we
get roughly this throughput:
root@containers:~# iperf3 -c 10.194.1.3
Connecting to host 10.194.1.3, port 5201
[  4] local 10.194.1.2 port 55640 connected to 10.194.1.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   678 MBytes  5.68 Gbits/sec    0   1.10 MBytes
[  4]   1.00-2.00   sec   684 MBytes  5.73 Gbits/sec    0   1.60 MBytes
[  4]   2.00-3.00   sec   684 MBytes  5.74 Gbits/sec    0   1.60 MBytes
[  4]   3.00-4.00   sec   686 MBytes  5.75 Gbits/sec    0   1.60 MBytes
[  4]   4.00-5.00   sec   685 MBytes  5.75 Gbits/sec    0   1.60 MBytes
[  4]   5.00-6.00   sec   684 MBytes  5.73 Gbits/sec    0   1.60 MBytes
[  4]   6.00-7.00   sec   684 MBytes  5.74 Gbits/sec    0   1.60 MBytes
[  4]   7.00-8.00   sec   684 MBytes  5.74 Gbits/sec    0   1.60 MBytes
[  4]   8.00-9.00   sec   686 MBytes  5.75 Gbits/sec    0   1.60 MBytes
[  4]   9.00-10.00  sec   685 MBytes  5.75 Gbits/sec    0   1.60 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  6.68 GBytes  5.74 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  6.68 GBytes  5.74 Gbits/sec                  receiver

iperf Done.
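
As a sanity check on the summary line, the bandwidth can be recomputed from the transfer size. The sketch below assumes that iperf3 prints transfers in binary units (1 GByte = 2^30 bytes) and bandwidth in decimal units (1 Gbit = 10^9 bits); the function name is made up for this example.

```c
#include <assert.h>

/*
 * Convert an iperf3 transfer figure into a rate.
 * Assumption: GBytes are binary (2^30 bytes), Gbits are decimal (10^9 bits).
 */
static double gbytes_to_gbits_per_sec(double gbytes, double seconds)
{
	assert(seconds > 0.0);
	return gbytes * 1073741824.0 * 8.0 / seconds / 1e9;
}
```

Under that assumption, 6.68 GBytes over 10 seconds comes out at roughly 5.74 Gbits/sec, in line with the sender and receiver totals reported above.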

So here is the question:
What changes were made to the veth driver? As far as I understand, the low
throughput comes from an enabled check against the bad packets described in
this bug: https://github.com/kubernetes/kubernetes/issues/18898

The point is that veth is also used to build pseudo-tunnels, for example when
devices that do not support large packets have to be connected in L2 mode to
a network that uses jumbo frames.

The problem is that this check cannot be disabled.

Re: [PATCH] i40evf: remove redundant null check on key

2017-06-10 Thread Dan Carpenter


This patch isn't right...

On Wed, Jun 07, 2017 at 12:54:07AM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> key has previously been null checked so the subsequent null check
> is redundant as key can never be null at that point, so remove it.
> 

Actually, it's the reverse.  "key" is always NULL.  Probably the ||
should be a &&?

regards,
dan carpenter

[PATCH] net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse

2017-06-10 Thread Jia-Ju Bai

The kernel may sleep under a rcu read lock in tipc_msg_reverse, and the
function call path is:
tipc_l2_rcv_msg (acquire the lock by rcu_read_lock)
  tipc_rcv
    tipc_sk_rcv
      tipc_msg_reverse
        pskb_expand_head(GFP_KERNEL) --> may sleep
tipc_node_broadcast
  tipc_node_xmit_skb
    tipc_node_xmit
      tipc_sk_rcv
        tipc_msg_reverse
          pskb_expand_head(GFP_KERNEL) --> may sleep

To fix it, "GFP_KERNEL" is replaced with "GFP_ATOMIC".

Signed-off-by: Jia-Ju Bai 
---
 net/tipc/msg.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 312ef7d..ab30876 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -508,7 +508,7 @@ bool tipc_msg_reverse(u32 own_node, struct sk_buff **skb, int err)
}
 
if (skb_cloned(_skb) &&
-   pskb_expand_head(_skb, BUF_HEADROOM, BUF_TAILROOM, GFP_KERNEL))
+   pskb_expand_head(_skb, BUF_HEADROOM, BUF_TAILROOM, GFP_ATOMIC))
goto exit;
 
/* Now reverse the concerned fields */
-- 
1.7.9.5

[PATCH] net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx

2017-06-10 Thread Jia-Ju Bai

The kernel may sleep under a rcu read lock in cfpkt_create_pfx, and the
function call path is:
cfcnfg_linkup_rsp (acquire the lock by rcu_read_lock)
  cfctrl_linkdown_req
    cfpkt_create
      cfpkt_create_pfx
        alloc_skb(GFP_KERNEL) --> may sleep
cfserl_receive (acquire the lock by rcu_read_lock)
  cfpkt_split
    cfpkt_create_pfx
      alloc_skb(GFP_KERNEL) --> may sleep

cfpkt_create_pfx uses "in_interrupt" to decide whether to use "GFP_KERNEL" or
"GFP_ATOMIC". In this situation, "GFP_KERNEL" is used because the function
is called under an RCU read lock rather than in interrupt context.

To fix it, only "GFP_ATOMIC" is used in cfpkt_create_pfx.

Signed-off-by: Jia-Ju Bai 
---
 net/caif/cfpkt_skbuff.c |6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/caif/cfpkt_skbuff.c b/net/caif/cfpkt_skbuff.c
index 59ce1fc..71b6ab2 100644
--- a/net/caif/cfpkt_skbuff.c
+++ b/net/caif/cfpkt_skbuff.c
@@ -81,11 +81,7 @@ static struct cfpkt *cfpkt_create_pfx(u16 len, u16 pfx)
 {
struct sk_buff *skb;
 
-   if (likely(in_interrupt()))
-   skb = alloc_skb(len + pfx, GFP_ATOMIC);
-   else
-   skb = alloc_skb(len + pfx, GFP_KERNEL);
-
+   skb = alloc_skb(len + pfx, GFP_ATOMIC);
if (unlikely(skb == NULL))
return NULL;
 
-- 
1.7.9.5
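
The reasoning in the changelog above can be illustrated with a small userspace model. Everything here is a toy: the toy_* names and the context structure are invented for this sketch, and it assumes, as the changelog does, that sleeping inside an RCU read-side section is a bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the allocation modes; not the real gfp_t values. */
enum toy_gfp { TOY_GFP_ATOMIC, TOY_GFP_KERNEL };

/* Hypothetical model of the calling context. */
struct toy_ctx {
	bool in_interrupt;   /* what in_interrupt() would report */
	int rcu_read_depth;  /* rcu_read_lock() nesting, not seen by in_interrupt() */
};

/* Old logic: only interrupt context selects the non-sleeping allocation. */
static enum toy_gfp pick_gfp_old(const struct toy_ctx *c)
{
	return c->in_interrupt ? TOY_GFP_ATOMIC : TOY_GFP_KERNEL;
}

/* Patched logic: always use the non-sleeping allocation. */
static enum toy_gfp pick_gfp_new(const struct toy_ctx *c)
{
	(void)c;
	return TOY_GFP_ATOMIC;
}

/* True when the chosen mode could sleep in a context that must not sleep. */
static bool is_buggy(enum toy_gfp gfp, const struct toy_ctx *c)
{
	assert(c != NULL && c->rcu_read_depth >= 0);
	bool may_sleep = !c->in_interrupt && c->rcu_read_depth == 0;
	return gfp == TOY_GFP_KERNEL && !may_sleep;
}
```

For a context with in_interrupt false and rcu_read_depth 1, the old selection yields the sleeping mode and is flagged, while the patched selection is not.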

Re: [PATCHv2 net] net/flow: fix fc->percpu NULL pointer dereference

2017-06-10 Thread Xin Long

On Fri, Jun 9, 2017 at 9:09 PM, Hangbin Liu  wrote:
> We now force garbage collection whenever xfrm_policy_flush() removes a
> policy. But during xfrm_net_exit() we call flow_cache_fini() first, which
> sets fc->percpu to NULL. When we then call xfrm_policy_fini()
> -> xfrm_policy_flush() -> flow_cache_flush(), we get a NULL pointer
> dereference when checking percpu_empty. The code path looks like:
>
> flow_cache_fini()
>   - fc->percpu = NULL
> xfrm_policy_fini()
>   - xfrm_policy_flush()
>     - xfrm_garbage_collect()
>       - flow_cache_flush()
>         - flow_cache_percpu_empty()
>           - fcp = per_cpu_ptr(fc->percpu, cpu)
>
> To reproduce, just add ipsec in netns and then remove the netns.
>
> v2:
> As Xin Long suggested, since only two other places need to call it, move
> xfrm_garbage_collect() outside xfrm_policy_flush().
>
> Fixes: 35db06912189 ("xfrm: do the garbage collection after flushing policy")
> Signed-off-by: Hangbin Liu 
> ---
>  net/key/af_key.c   | 2 ++
>  net/xfrm/xfrm_policy.c | 4 
>  net/xfrm/xfrm_user.c   | 1 +
>  3 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/net/key/af_key.c b/net/key/af_key.c
> index 512dc43..5103f92 100644
> --- a/net/key/af_key.c
> +++ b/net/key/af_key.c
> @@ -2755,6 +2755,8 @@ static int pfkey_spdflush(struct sock *sk, struct sk_buff *skb, const struct sad
> int err, err2;
>
> err = xfrm_policy_flush(net, XFRM_POLICY_TYPE_MAIN, true);
> +   if (!err)
> +   xfrm_garbage_collect(net);
> err2 = unicast_flush_resp(sk, hdr);
> if (err || err2) {
> if (err == -ESRCH) /* empty table - old silent behavior */
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index ed4e52d..643a18f 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1006,10 +1006,6 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
> err = -ESRCH;
>  out:
> spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
> -
> -   if (cnt)
> -   xfrm_garbage_collect(net);
> -
> return err;
>  }
>  EXPORT_SYMBOL(xfrm_policy_flush);
> diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
> index 38614df..86116e9 100644
> --- a/net/xfrm/xfrm_user.c
> +++ b/net/xfrm/xfrm_user.c
> @@ -2027,6 +2027,7 @@ static int xfrm_flush_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
> return 0;
> return err;
> }
> +   xfrm_garbage_collect(net);
>
> c.data.type = type;
> c.event = nlh->nlmsg_type;
> --
> 2.5.5
>
It's an xfrm fix, please also fix the title, like:
   xfrm: move xfrm_garbage_collect out of xfrm_policy_flush
or
   xfrm: fix ...

Re: ARM GLX Khadas VIM Pro - Ethernet detected as only 10Mbps and stalled after some traffic

2017-06-10 Thread crow

Hi,

On Thu, Jun 8, 2017 at 7:30 PM, crow  wrote:
> Hi,
> I have here two problems with Khadas VIM Pro device and Ethernet.
>
> 1) sometimes with the Kernel Linux 4.12-rc4 the Ethernet link is Up
> but only 10Mbps.
> Doesn't work (neither SSH to the device nor pinging out over the serial console):
> meson8b-dwmac c941.ethernet eth0: Link is Up - 10Mbps/Half - flow
> control off
>
> Works (if I do ifconfig eth0 down / up):
> meson8b-dwmac c941.ethernet eth0: Link is Down
> meson8b-dwmac c941.ethernet eth0: Link is Up - 100Mbps/Full - flow
> control off
>
> Whole log could be found in [0].
>
> 2) if downloading an amount of data while connected to device over
> SSH, connection will stall and after some minutes SSH session would be
> disconnected. Nothing is written in dmesg and ethtool shows me same
> values before when Ethernet was working, and after when connection
> stall. Whole log can be found in [1]
>
> SSH to device (OK)
>   -> Cloning linux git repo...
> Cloning into bare repository '/opt/mmcblk1/linux-aarch64-git/linux'...
> remote: Counting objects: 5399033, done.
> remote: Compressing objects: 100% (1495/1495), done.
> Receiving objects:   3% (161971/5399033), 73.20 MiB | 6.19 MiB/s
>
> here the download stalled and SSH connection also stalled but not disconnected
>
> # ethtool eth0
> Settings for eth0:
> Supported ports: [ TP MII ]
> Supported link modes:   10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> Supported pause frame use: Symmetric Receive-only
> Supports auto-negotiation: Yes
> Advertised link modes:  10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
> Advertised pause frame use: No
> Advertised auto-negotiation: Yes
> Link partner advertised link modes:  10baseT/Half 10baseT/Full
>  100baseT/Half 100baseT/Full
> Link partner advertised pause frame use: No
> Link partner advertised auto-negotiation: Yes
> Speed: 100Mb/s
> Duplex: Full
> Port: MII
> PHYAD: 8
> Transceiver: internal
> Auto-negotiation: on
> Supports Wake-on: ug
> Wake-on: d
> Current message level: 0x003f (63)
>drv probe link timer ifdown ifup
> Link detected: yes
> #
>
> after 2 min SSH connection disconnected
>
> over serial console:
> # ping -c3 8.8.8.8
> PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
> From 10.8.8.6 icmp_seq=1 Destination Host Unreachable
>
> --- 8.8.8.8 ping statistics ---
> 3 packets transmitted, 0 received, +1 errors, 100% packet loss, time 2060ms
> pipe 3
> #
>
> nothing in dmesg or in journalctl
>
> here, too, the only thing that helps is taking the port down and up again.
> Restarting the networking services fails with "timed out".
>
> Regards,
>
>
> [0] https://defuse.ca/b/jqXqW9Ip
> [1] https://defuse.ca/b/bJLOAuX6


I am posting this only for information, in case it helps others who are
able to look into it, as I cannot do so myself:

Neil Armstrong asked me to post a PHY register dump from eth0, first
using the official Khadas VIM Ubuntu image (Amlogic kernel) and then
the mainline kernel 4.12-rc4 (which I am running on ArchLinuxArm).

With the custom Amlogic 4.9.26 kernel I was able to git clone the linux repository:

Linux Khadas 4.9.26 #1 SMP PREEMPT Sun Jun 4 11:34:23 CST 2017 aarch64
aarch64 aarch64 GNU/Linux

# mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-HD flow-control, link ok
  registers for MII PHY 8:
1000 782d 0181 4400 01e1 c1e1 000f 2001
       
0040 0082 40e8 5400 0d80 1000  a900
fff0   140a 1407 00ca  105a
  product info: vendor 00:60:51, model 0 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  link partner: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
#
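
For anyone reading the raw dump, the first two words (0x1000 and 0x782d) are the standard MII control and status registers. The sketch below decodes just the bits that mii-tool summarises as "basic mode" and "basic status". The bit values are the standard MII ones (as also defined in linux/mii.h); the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard MII register bits, defined locally to keep the sketch standalone. */
#define BMCR_ANENABLE     0x1000 /* register 0: auto-negotiation enabled */
#define BMSR_LSTATUS      0x0004 /* register 1: link is up */
#define BMSR_ANEGCOMPLETE 0x0020 /* register 1: auto-negotiation complete */

/* Hypothetical summary of the two basic registers. */
struct mii_summary {
	bool aneg_enabled;
	bool aneg_complete;
	bool link_up;
};

/* regs[0] is the control register (BMCR), regs[1] the status register (BMSR). */
static struct mii_summary mii_decode(const uint16_t *regs, int nregs)
{
	struct mii_summary s;

	assert(regs != 0 && nregs >= 2);
	s.aneg_enabled = (regs[0] & BMCR_ANENABLE) != 0;
	s.aneg_complete = (regs[1] & BMSR_ANEGCOMPLETE) != 0;
	s.link_up = (regs[1] & BMSR_LSTATUS) != 0;
	return s;
}
```

Fed with 0x1000 and 0x782d, all three fields come out true, which agrees with "autonegotiation enabled" and "autonegotiation complete, link ok" in the output above.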

With the custom Amlogic 3.14.29 kernel, cloning the linux repository also worked:

Linux Khadas 3.14.29 #21 SMP PREEMPT Sat May 13 22:10:31 CST 2017
aarch64 aarch64 aarch64 GNU/Linux

# mii-tool -vvv eth0
Using SIOCGMIIPHY=0x8947
eth0: negotiated 1000baseT-HD flow-control, link ok
  registers for MII PHY 8:
1000 782d 0181 4400 01e1 c1e1 000f 2001
       
0040 00c2 40e8 5400 0803   0009
fff0   020a 1407 00ca 0e00 105a
  product info: vendor 00:60:51, model 0 rev 0
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD 10baseT-HD
  advertising:  1000baseT-HD 1000baseT-FD 100baseTx-FD 100baseTx-HD
10baseT-FD

[PATCH net-next] net/packet: remove unneeded declaration of tpacket_snd().

2017-06-10 Thread Rami Rosen

This patch removes unneeded forward declaration of tpacket_snd()
in net/packet/af_packet.c.

Signed-off-by: Rami Rosen 
---
 net/packet/af_packet.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 82ca49fba336..f9349a495caf 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -188,7 +188,6 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
 #define BLOCK_PRIV(x)  ((void *)((char *)(x) + BLOCK_O2PRIV(x)))
 
 struct packet_sock;
-static int tpacket_snd(struct packet_sock *po, struct msghdr *msg);
 static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
   struct packet_type *pt, struct net_device *orig_dev);
 
-- 
2.4.11

[PATCHv2 net-next] sctp: no need to check assoc id before calling sctp_assoc_set_id

2017-06-10 Thread Xin Long

sctp_assoc_set_id does the assoc id check in the beginning when
processing dupcookie, no need to do the same check before calling
it.

v1->v2:
  fix some typo errs Marcelo pointed in changelog.

Signed-off-by: Xin Long 
---
 net/sctp/associola.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 288c5e0..72b07dd 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1181,12 +1181,8 @@ void sctp_assoc_update(struct sctp_association *asoc,
if (sctp_state(asoc, COOKIE_WAIT))
sctp_stream_update(&asoc->stream, &new->stream);
 
-   if (!asoc->assoc_id) {
-   /* get a new association id since we don't have one
-* yet.
-*/
-   sctp_assoc_set_id(asoc, GFP_ATOMIC);
-   }
+   /* get a new assoc id if we don't have one yet. */
+   sctp_assoc_set_id(asoc, GFP_ATOMIC);
}
 
/* SCTP-AUTH: Save the peer parameters from the new associations
-- 
2.1.0
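
The changelog argument (the callee already returns early when an id exists, so the caller's check is redundant) can be shown with a toy model. The names toy_assoc and toy_assoc_set_id are hypothetical, and the counter stands in for whatever id allocator the real code uses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy association: an id of 0 means "no id assigned yet". */
struct toy_assoc {
	int assoc_id;
};

/*
 * Hypothetical setter that performs the "already has an id" check itself,
 * so calling it unconditionally is safe.
 */
static int toy_assoc_set_id(struct toy_assoc *a, int *next_id)
{
	assert(a != NULL && next_id != NULL);
	if (a->assoc_id)	/* id already assigned: nothing to do */
		return 0;
	a->assoc_id = (*next_id)++;
	return 0;
}
```

Calling the setter twice on the same association assigns an id only once, so a guard at the call site adds nothing.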

[PATCH net-next] sctp: use read_lock_bh in sctp_eps_seq_show

2017-06-10 Thread Xin Long

This patch is to use read_lock_bh instead of local_bh_disable
and read_lock in sctp_eps_seq_show.

Signed-off-by: Xin Long 
---
 net/sctp/proc.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 5a27d0f..8e34db5 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -218,8 +218,7 @@ static int sctp_eps_seq_show(struct seq_file *seq, void *v)
return -ENOMEM;
 
head = &sctp_ep_hashtable[hash];
-   local_bh_disable();
-   read_lock(&head->lock);
+   read_lock_bh(&head->lock);
sctp_for_each_hentry(epb, &head->chain) {
ep = sctp_ep(epb);
sk = epb->sk;
@@ -234,8 +233,7 @@ static int sctp_eps_seq_show(struct seq_file *seq, void *v)
sctp_seq_dump_local_addrs(seq, epb);
seq_printf(seq, "\n");
}
-   read_unlock(&head->lock);
-   local_bh_enable();
+   read_unlock_bh(&head->lock);
 
return 0;
 }
-- 
2.1.0

[PATCH net-next] sctp: fix recursive locking warning in sctp_do_peeloff

2017-06-10 Thread Xin Long

Dmitry got the following recursive locking report while running syzkaller
fuzzer, the Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
 print_deadlock_bug kernel/locking/lockdep.c:1729 [inline]
 check_deadlock kernel/locking/lockdep.c:1773 [inline]
 validate_chain kernel/locking/lockdep.c:2251 [inline]
 __lock_acquire+0xef2/0x3430 kernel/locking/lockdep.c:3340
 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
 lock_sock_nested+0xcb/0x120 net/core/sock.c:2536
 lock_sock include/net/sock.h:1460 [inline]
 sctp_close+0xcd/0x9d0 net/sctp/socket.c:1497
 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
 sock_release+0x8d/0x1e0 net/socket.c:597
 __sock_create+0x38b/0x870 net/socket.c:1226
 sock_create+0x7f/0xa0 net/socket.c:1237
 sctp_do_peeloff+0x1a2/0x440 net/sctp/socket.c:4879
 sctp_getsockopt_peeloff net/sctp/socket.c:4914 [inline]
 sctp_getsockopt+0x111a/0x67e0 net/sctp/socket.c:6628
 sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2690
 SYSC_getsockopt net/socket.c:1817 [inline]
 SyS_getsockopt+0x240/0x380 net/socket.c:1799
 entry_SYSCALL_64_fastpath+0x1f/0xc2

This warning arises because the lock held by sctp_getsockopt() is on one
socket, while the lock that sctp_close() takes later is on the newly
created socket (whose creation failed) during the peeloff operation.

This patch avoids the warning by using lock_sock with subclass
SINGLE_DEPTH_NESTING, as suggested by Wang Cong and Marcelo.

Reported-by: Dmitry Vyukov 
Suggested-by: Marcelo Ricardo Leitner 
Suggested-by: Cong Wang 
Signed-off-by: Xin Long 
---
 net/sctp/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 5f58dd0..32d5495 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1494,7 +1494,7 @@ static void sctp_close(struct sock *sk, long timeout)
 
pr_debug("%s: sk:%p, timeout:%ld\n", __func__, sk, timeout);
 
-   lock_sock(sk);
+   lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
sk->sk_shutdown = SHUTDOWN_MASK;
sk->sk_state = SCTP_SS_CLOSING;
 
@@ -1544,7 +1544,7 @@ static void sctp_close(struct sock *sk, long timeout)
 * held and that should be grabbed before socket lock.
 */
spin_lock_bh(&net->sctp.addr_wq_lock);
-   bh_lock_sock(sk);
+   bh_lock_sock_nested(sk);
 
/* Hold the sock, since sk_common_release() will put sock_put()
 * and we have just a little more cleanup.
-- 
2.1.0

[PATCH net] sctp: disable BH in sctp_for_each_endpoint

2017-06-10 Thread Xin Long

sctp currently holds the read_lock while iterating over sctp_ep_hashtable
without disabling BH. If the CPU switches to another thread A at that
moment, thread A may try to take the write_lock with BH disabled.

Because BH is disabled, the CPU cannot schedule back to the thread holding
the read_lock, while thread A keeps waiting for that read_lock to be
released. This results in a deadlock.

This patch fixes the deadlock by calling read_lock_bh instead, so that BH
is disabled while the read_lock is held in sctp_for_each_endpoint.

Fixes: 626d16f50f39 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
Reported-by: Xiumei Mu 
Signed-off-by: Xin Long 
---
 net/sctp/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index f16c8d9..30aa0a5 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4622,13 +4622,13 @@ int sctp_for_each_endpoint(int (*cb)(struct sctp_endpoint *, void *),
 
for (head = sctp_ep_hashtable; hash < sctp_ep_hashsize;
 hash++, head++) {
-   read_lock(&head->lock);
+   read_lock_bh(&head->lock);
sctp_for_each_hentry(epb, &head->chain) {
err = cb(sctp_ep(epb), p);
if (err)
break;
}
-   read_unlock(&head->lock);
+   read_unlock_bh(&head->lock);
}
 
return err;
-- 
2.1.0
