[PATCH] VSOCK: Don't dec ack backlog twice for rejected connections

2016-09-27 Thread Jorgen Hansen
If a pending socket is marked as rejected, we will decrease the
sk_ack_backlog twice. So don't decrement it for rejected sockets
in vsock_pending_work().

Testing of the rejected socket path was done through code
modifications.

Reported-by: Stefan Hajnoczi 
Signed-off-by: Jorgen Hansen 
Reviewed-by: Adit Ranadive 
Reviewed-by: Aditya Sarwade 
---
 net/vmw_vsock/af_vsock.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 17dbbe6..8a398b3 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -465,6 +465,8 @@ void vsock_pending_work(struct work_struct *work)
 
if (vsock_is_pending(sk)) {
vsock_remove_pending(listener, sk);
+
+   listener->sk_ack_backlog--;
} else if (!vsk->rejected) {
/* We are not on the pending list and accept() did not reject
 * us, so we must have been accepted by our user process.  We
@@ -475,8 +477,6 @@ void vsock_pending_work(struct work_struct *work)
goto out;
}
 
-   listener->sk_ack_backlog--;
-
/* We need to remove ourself from the global connected sockets list so
 * incoming packets can't find this socket, and to reduce the reference
 * count.
@@ -2010,5 +2010,5 @@ EXPORT_SYMBOL_GPL(vsock_core_get_transport);
 
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMware Virtual Socket Family");
-MODULE_VERSION("1.0.1.0-k");
+MODULE_VERSION("1.0.2.0-k");
 MODULE_LICENSE("GPL v2");
-- 
1.7.0
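
The accounting problem described in this patch can be modelled with a small user-space sketch. It assumes that the first decrement happens in the accept() path before the socket is marked rejected; every `toy_` name is hypothetical and none of this is code from af_vsock.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the listener and the child socket. */
struct toy_listener { int ack_backlog; };
struct toy_sock { bool pending; bool rejected; };

/* Assumed accept() path: dequeue, decrement once, possibly reject. */
static void toy_accept(struct toy_listener *l, struct toy_sock *s, bool reject)
{
	assert(l->ack_backlog > 0);
	s->pending = false;
	l->ack_backlog--;
	s->rejected = reject;
}

/* Cleanup work; 'fixed' selects the behaviour after this patch. */
static void toy_pending_work(struct toy_listener *l, struct toy_sock *s,
			     bool fixed)
{
	if (s->pending) {
		s->pending = false;
		if (fixed)
			l->ack_backlog--;	/* only sockets still pending */
	} else if (!s->rejected) {
		return;				/* accepted: nothing to undo */
	}
	if (!fixed)
		l->ack_backlog--;	/* old code: also runs for rejected sockets */
}

/* Returns the final backlog for one socket that accept() rejected. */
static int toy_backlog_after_reject(bool fixed)
{
	struct toy_listener l = { .ack_backlog = 1 };
	struct toy_sock s = { .pending = true, .rejected = false };

	toy_accept(&l, &s, true);
	toy_pending_work(&l, &s, fixed);
	return l.ack_backlog;
}
```

With the old placement the counter for a single rejected socket ends below zero; with the decrement moved into the pending branch it ends at zero.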



Re: [PATCH 5/5] ISDN-Gigaset: Enclose two expressions for the sizeof operator by parentheses

2016-09-27 Thread Dan Carpenter
On Mon, Sep 26, 2016 at 08:38:14PM +0300, Sergei Shtylyov wrote:
> >@@ -53,7 +53,7 @@ void gigaset_dbg_buffer(enum debuglevel level, const 
> >unsigned char *msg,
> > {
> > unsigned char outbuf[80];
> > unsigned char c;
> >-size_t space = sizeof outbuf - 1;
> >+size_t space = sizeof(outbuf - 1);
> 
>What?! Does that compile?
> 
> [...]

It prints a Smatch warning.  Smatch ignores these if they happen inside
a macro where you pass a pointer and it takes the sizeof() the argument.

regards,
dan carpenter
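
For readers wondering why the quoted hunk is a problem even though it compiles: `sizeof` binds tighter than binary minus, so moving the parenthesis changes the operand. A minimal sketch (the function names are made up for illustration):

```c
#include <stddef.h>

/* Size computed by the original line: sizeof applies to outbuf only. */
static size_t space_original(void)
{
	unsigned char outbuf[80];

	return sizeof outbuf - 1;	/* (sizeof outbuf) - 1 */
}

/* Size computed by the patched line: outbuf decays to a pointer first. */
static size_t space_patched(void)
{
	unsigned char outbuf[80];

	return sizeof(outbuf - 1);	/* sizeof(unsigned char *) */
}
```

The first function yields 79 for the 80-byte buffer, while the second yields the size of a pointer on the build target, independent of the buffer length.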



Re: [PATCH 4/5] ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread Dan Carpenter
This patch creates new bugs.

I have a policy of not telling Markus where the bug is, because
otherwise he'll just resend the patch and I have told him many times to
stop sending these cleanup patches that just introduce bugs and waste
maintainer time.

regards,
dan carpenter



Re: [PATCH 5/5] ISDN-Gigaset: Enclose two expressions for the sizeof operator by parentheses

2016-09-27 Thread Dan Carpenter
On Tue, Sep 27, 2016 at 10:08:37AM +0300, Dan Carpenter wrote:
> On Mon, Sep 26, 2016 at 08:38:14PM +0300, Sergei Shtylyov wrote:
> > >@@ -53,7 +53,7 @@ void gigaset_dbg_buffer(enum debuglevel level, const 
> > >unsigned char *msg,
> > > {
> > >   unsigned char outbuf[80];
> > >   unsigned char c;
> > >-  size_t space = sizeof outbuf - 1;
> > >+  size_t space = sizeof(outbuf - 1);
> > 
> >What?! Does that compile?
> > 
> > [...]
> 
> It prints a Smatch warning.  Smatch ignores these if they happen inside
> a macro where you pass a pointer and it takes the sizeof() the argument.

Reading that again, I realize it's not clear.  Smatch ignores these any
time they happen in a macro whether they're valid or not.  (Many times
they are valid).

regards,
dan carpenter



Re: ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread SF Markus Elfring
> This patch creates new bugs.

Thanks for your information.


> I have a policy of not telling Markus where the bug is,

I find this kind of response strange.


> because otherwise he'll just resend the patch

This can also happen when the other contributors request it.


> and I have told him many times to stop sending these cleanup patches

Software "cleanups" seem to stress the review process to some degree.


> that just introduce bugs and waste maintainer time.

I guess that the situation is mixed depending on the subsystem
or concrete software module, isn't it?

Regards,
Markus


Re: [PATCH 4/5] ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread Dan Carpenter
Ah well...  Someone else discovered the double free bug first and gave
it away.  Reassuring, I guess.

regards,
dan carpenter



Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

2016-09-27 Thread David Miller
From: "R. Parameswaran" 
Date: Thu, 22 Sep 2016 13:52:43 -0700 (PDT)

> From ed585bdd6d3d2b3dec58d414f514cd764d89159d Mon Sep 17 00:00:00 2001
> From: "R. Parameswaran" 
> Date: Thu, 22 Sep 2016 13:19:25 -0700
> Subject: [PATCH] L2TP:Adjust intf MTU,factor underlay L3,overlay L2
> 
> Take into account all of the tunnel encapsulation headers when setting
> up the MTU on the L2TP logical interface device. Otherwise, packets
> created by the applications on top of the L2TP layer are larger
> than they ought to be, relative to the underlay MTU, leading to
> needless fragmentation once the outer IP encap is added.
> 
> Specifically, take into account the (outer, underlay) IP header
> imposed on the encapsulated L2TP packet, and the Layer 2 header
> imposed on the inner IP packet prior to L2TP encapsulation.
> 
> Do not assume an Ethernet (non-jumbo) underlay. Use the PMTU mechanism
> and the dst entry in the L2TP tunnel socket to directly pull up
> the underlay MTU (as the baseline number on top of which the
> encapsulation headers are factored in).  Fall back to Ethernet MTU
> if this fails.
> 
> Signed-off-by: R. Parameswaran 
> 
> Reviewed-by: "N. Prachanda" ,
> Reviewed-by: "R. Shearman" ,
> Reviewed-by: "D. Fawcus" 

I have to ask, how do other tunnels over UDP such as VXLAN handle
this problem?
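
The overhead arithmetic described in the quoted changelog can be sketched roughly as follows. This is not the patch's code: the helper name, the fallback constant, and the idea of passing each header size as a parameter are assumptions made for illustration, since the real header sizes depend on the tunnel configuration.

```c
#define TOY_FALLBACK_MTU 1500u	/* Ethernet default, used if PMTU is unknown */

/*
 * Hypothetical helper: derive the payload MTU of the logical interface
 * from the underlay MTU and the per-packet encapsulation overhead.
 */
static unsigned int toy_l2tp_mtu(unsigned int underlay_mtu,
				 unsigned int outer_ip_hdr,
				 unsigned int udp_hdr,
				 unsigned int l2tp_hdr,
				 unsigned int inner_l2_hdr)
{
	unsigned int overhead = outer_ip_hdr + udp_hdr + l2tp_hdr + inner_l2_hdr;

	if (underlay_mtu == 0)		/* PMTU lookup failed: fall back */
		underlay_mtu = TOY_FALLBACK_MTU;
	if (overhead >= underlay_mtu)	/* nothing left for payload */
		return 0;
	return underlay_mtu - overhead;
}
```

For example, with a 1500-byte underlay, a 20-byte IPv4 header, an 8-byte UDP header, an illustrative 12-byte L2TP header and a 14-byte inner Ethernet header, the sketch leaves 1446 bytes for the inner packet.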


Re: [PATCH 1/2] net: qcom/emac: do not use devm on internal phy pdev

2016-09-27 Thread David Miller

This patch doesn't apply to net-next.

Also, when you send a patch series, you must send an initial
posting with Subject of the form "[PATCH {net,net-next} 0/2] ..."
explaining at a high level what your patch series is doing,
how it is doing it, and why it is doing it that way.

Thanks.


Re: [PATCH resend] sh_eth: add R8A7743/5 support

2016-09-27 Thread Geert Uytterhoeven
Hi Sergei,

On Tue, Sep 27, 2016 at 12:23 AM, Sergei Shtylyov
 wrote:
> Add support for the first two members of the Renesas RZ/G family, RZ/G1M/E
> (also known as  R8A7743/5). The Ether core is the same as in the R-Car gen2
> SoCs, so will share the code/data with them...
>
> Signed-off-by: Sergei Shtylyov 

> --- net-next.orig/drivers/net/ethernet/renesas/Kconfig
> +++ net-next/drivers/net/ethernet/renesas/Kconfig
> @@ -27,7 +27,7 @@ config SH_ETH
>   Renesas SuperH Ethernet device driver.
>   This driver supporting CPUs are:
> - SH7619, SH7710, SH7712, SH7724, SH7734, SH7763, SH7757,
> - R8A7740, R8A777x and R8A779x.
> + R8A7740, R8A774x, R8A777x and R8A779x.

Surely "R8A7740" is covered by "R8A774x"? :-)
However, the "x" is not a real wildcard (also for '7x and '9x), as the driver
doesn't support all possible values of "x".

Apart from that:
Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v3 2/2] netfilter: Create revision 2 of xt_hashlimit to support higher pps rates

2016-09-27 Thread Vishwanath Pai
On Tue, Sep 27, 2016 at 12:15 AM, Liping Zhang  wrote:
> Hi Vishwanath,
>
> 2016-09-23 0:43 GMT+08:00 Vishwanath Pai :
>>
>>  /* Precision saver. */
>> -static u32 user2credits(u32 user)
>> +static u64 user2credits(u64 user, int revision)
>>  {
>> -   /* If multiplying would overflow... */
>> -   if (user > 0x / (HZ*CREDITS_PER_JIFFY_v1))
>> -   /* Divide first. */
>> -   return (user / XT_HASHLIMIT_SCALE_v1) *\
>> -   HZ * CREDITS_PER_JIFFY_v1;
>> +   if (revision == 1) {
>> +   /* If multiplying would overflow... */
>> +   if (user > 0x / (HZ*CREDITS_PER_JIFFY_v1))
>> +   /* Divide first. */
>> +   return (user / XT_HASHLIMIT_SCALE_v1) *\
>> +   HZ * CREDITS_PER_JIFFY_v1;
>> +
>> +   return (user * HZ * CREDITS_PER_JIFFY_v1) \
>> +   / XT_HASHLIMIT_SCALE_v1;
>> +   } else {
>> +   if (user > 0x / (HZ*CREDITS_PER_JIFFY))
>> +   return (user / XT_HASHLIMIT_SCALE) *\
>> +   HZ * CREDITS_PER_JIFFY;
>>
>> -   return (user * HZ * CREDITS_PER_JIFFY_v1) / XT_HASHLIMIT_SCALE_v1;
>> +   return (user * HZ * CREDITS_PER_JIFFY) / XT_HASHLIMIT_SCALE;
>> +   }
>>  }
>>
>
> In my memory, 64-bit division operation should be replaced by
> div_u64 or div64_u64, otherwise on some 32-bit architecture
> systems, link error will happen. Something like this:
> ... undefined reference to `__udivdi3'.

I did not know that, thanks for pointing it out. I will send a patch
to fix this.

-Vishwanath
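
For context, a user-space sketch of the "divide first if multiplying would overflow" logic with the division routed through a helper. In the kernel the helpers are div_u64() and div64_u64() from linux/math64.h; the `toy_` function and constants below are stand-ins chosen for illustration, not the values used by xt_hashlimit.

```c
#include <stdint.h>

/* Illustrative constants only; the kernel derives its own values. */
#define TOY_HZ			1000ULL
#define TOY_CREDITS_PER_JIFFY	128ULL
#define TOY_SCALE		10000ULL

/*
 * User-space stand-in for the kernel's div64_u64().  Here a plain '/'
 * is fine; in the kernel the helper avoids the libgcc call on 32-bit.
 */
static uint64_t toy_div64_u64(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}

static uint64_t toy_user2credits(uint64_t user)
{
	const uint64_t per_unit = TOY_HZ * TOY_CREDITS_PER_JIFFY;

	/* If multiplying would overflow, divide first (loses precision). */
	if (user > toy_div64_u64(UINT64_MAX, per_unit))
		return toy_div64_u64(user, TOY_SCALE) * per_unit;

	return toy_div64_u64(user * per_unit, TOY_SCALE);
}
```

Keeping every 64-bit division behind one helper makes it easy to see that no bare '/' on a u64 is left in the function.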


[PATCH] Fix link error in 32bit arch because of 64bit division

2016-09-27 Thread Vishwanath Pai
Fix link error in 32bit arch because of 64bit division

Division of 64-bit integers with the plain '/' operator causes the linker
error "undefined reference to `__udivdi3'" on 32-bit architectures. Fix this
by replacing the divisions with div64_u64().

Signed-off-by: Vishwanath Pai 

---
 net/netfilter/xt_hashlimit.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 44a095e..7fc694e 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -465,19 +465,20 @@ static u64 user2credits(u64 user, int revision)
 {
if (revision == 1) {
/* If multiplying would overflow... */
-   if (user > 0x / (HZ*CREDITS_PER_JIFFY_v1))
+   if (user > div64_u64(0x, (HZ*CREDITS_PER_JIFFY_v1)))
/* Divide first. */
-   return (user / XT_HASHLIMIT_SCALE) *\
+   return div64_u64(user, XT_HASHLIMIT_SCALE) *\
HZ * CREDITS_PER_JIFFY_v1;
 
-   return (user * HZ * CREDITS_PER_JIFFY_v1) \
-   / XT_HASHLIMIT_SCALE;
+   return div64_u64((user * HZ * CREDITS_PER_JIFFY_v1),
+ XT_HASHLIMIT_SCALE);
} else {
-   if (user > 0x / (HZ*CREDITS_PER_JIFFY))
-   return (user / XT_HASHLIMIT_SCALE_v2) *\
+   if (user > div64_u64(0x, (HZ*CREDITS_PER_JIFFY)))
+   return div64_u64(user, XT_HASHLIMIT_SCALE_v2) *\
HZ * CREDITS_PER_JIFFY;
 
-   return (user * HZ * CREDITS_PER_JIFFY) / XT_HASHLIMIT_SCALE_v2;
+   return div64_u64((user * HZ * CREDITS_PER_JIFFY),
+XT_HASHLIMIT_SCALE_v2);
}
 }
 
-- 
1.9.1



Re: [PATCH net-next 2/3] net: mpls: Fixups for GSO

2016-09-27 Thread Jiri Benc
On Mon, 26 Sep 2016 20:04:06 -0600, David Ahern wrote:
> you know this code better than me, but key_extract pulls the eth
> header and then sets network header. If MPLS labels are present then
> it is the labels that the network_header now points to. How did you come
> to the conclusion it is after the labels?

Look ~100 lines below that, to "if (eth_p_mpls(key->eth.type))".
There's a while loop advancing network header.

 Jiri


Re: [PATCH net-next] net/sched: pkt_cls: change tc actions order to be as the user sets

2016-09-27 Thread Hadar Hen Zion
On Mon, Sep 26, 2016 at 11:34 PM, Cong Wang  wrote:
> On Sun, Sep 25, 2016 at 11:02 PM, Hadar Hen Zion
>  wrote:
>> On Mon, Sep 26, 2016 at 7:31 AM, Cong Wang  wrote:
>>> On Sun, Sep 25, 2016 at 7:39 AM, Jamal Hadi Salim  wrote:
>>>> On 16-09-25 10:08 AM, Hadar Hen Zion wrote:
>>>>>
>>>>> Currently the created tc actions list is reversed against the order
>>>>> set by the user.
>>>>> Change the actions list order to be the same as was set by the user.
>>>>>
>>>>
>>>>
>>>> Did something break? It seems to matter most for dumping. But even that
>>>> didn't break. Looking at the latest net tree, I tried:
>>>>
>>>
>>> The reason is we use action->order as an nested attribute, so
>>> the order in the list doesn't matter, only action->order itself matters.
>>
>> The order in the list matters for offload drivers who use the
>> "tcf_exts_to_list" function and action->order parameter isn't usable
>> for them.
>> Why not keep the actions in the same order as the user set them? Isn't it
>> more elegant?
>
> I don't object to this patch since it affects offloading, I just explained
> why it doesn't affect dumping.
>
> Please add this to your changelog, to make it obvious.

Sure, I'll add it.

Hadar

>
> Thanks!


Re: [PATCH 1/2] bpf samples: fix compiler errors with sockex2 and sockex3

2016-09-27 Thread David Miller
From: "Naveen N. Rao" 
Date: Sat, 24 Sep 2016 02:10:04 +0530

> These samples fail to compile as 'struct flow_keys' conflicts with
> definition in net/flow_dissector.h. Fix the same by renaming the
> structure used in the sample.
> 
> Signed-off-by: Naveen N. Rao 

Applied to net-next.


Re: [PATCH 2/2] bpf samples: update tracex5 sample to use __seccomp_filter

2016-09-27 Thread David Miller
From: "Naveen N. Rao" 
Date: Sat, 24 Sep 2016 02:10:05 +0530

> seccomp_phase1() does not exist anymore. Instead, update sample to use
> __seccomp_filter(). While at it, set max locked memory to unlimited.
> 
> Signed-off-by: Naveen N. Rao 

Also applied to net-next, thanks.


Re: [PATCH v3] net: ip, diag -- Add diag interface for raw sockets

2016-09-27 Thread Cyrill Gorcunov
On Mon, Sep 26, 2016 at 07:54:37PM -0600, David Ahern wrote:
> On 9/26/16 4:38 PM, Cyrill Gorcunov wrote:
> > Something like
> > 
> > Index: linux-ml.git/include/uapi/linux/inet_diag.h
> > ===
> > --- linux-ml.git.orig/include/uapi/linux/inet_diag.h 2016-09-11 20:56:18.191584145 +0300
> > +++ linux-ml.git/include/uapi/linux/inet_diag.h 2016-09-27 01:34:08.413172394 +0300
> > @@ -38,7 +38,7 @@ struct inet_diag_req_v2 {
> > __u8sdiag_family;
> > __u8sdiag_protocol;
> > __u8idiag_ext;
> > -   __u8pad;
> > +   __u8sdiag_raw_protocol; /* SOCK_RAW only, @pad for others */
> 
> Seems like that should be a union to keep the API.

Are anonymous unions (which are not part of C99) acceptable in uapi?
Initially I declared it as union but then scratched my head if this
would be acceptable.

> 
> 
> > __u32   idiag_states;
> > struct inet_diag_sockid id;
> >  };
> > 
> > and in raw-diag module we will use @sdiag_raw_protocol instead of
> > @sdiag_protocol field. Didn't cover ss tool source code yet but
> > I think the idea is seen. Still not sure if starting to use @pad here
> > is a good idea (it's uapi), maybe better to ask nla attribute which would
> > come right after the inet_diag_req_v2 message?
> > 
> 
> seems reasonable to me since 2 protocols need to be sent to the kernel.
> 
> Alternatively, sdiag_protocol could be the actual protocol and the pad union
> be a flag field with say bit 0 = INET_DIAG_FLAG_SOCK_RAW. Allows other
> overrides in the future if needed.

The @sdiag_protocol is used for matching in the diag module handler, so no, I
think we should not change these semantics. I would stick with @pad usage and
if anonymous unions are acceptable this would be just great.

Cyrill
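
A small sketch of what the union variant discussed above could look like, and why it keeps the layout intact. The struct and field names are simplified stand-ins (not the real inet_diag_req_v2), fixed-width types replace __u8/__u32, and the code assumes a compiler that accepts anonymous unions (C11 or the GNU extension).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the current layout, with a plain pad byte. */
struct toy_req_old {
	uint8_t  family;
	uint8_t  protocol;
	uint8_t  ext;
	uint8_t  pad;
	uint32_t states;
};

/* Same layout, but the pad byte gets a second name via an anonymous union. */
struct toy_req_new {
	uint8_t  family;
	uint8_t  protocol;
	uint8_t  ext;
	union {
		uint8_t pad;		/* old name keeps compiling */
		uint8_t raw_protocol;	/* new meaning for SOCK_RAW */
	};
	uint32_t states;
};

_Static_assert(sizeof(struct toy_req_old) == sizeof(struct toy_req_new),
	       "union must not change the struct size");
_Static_assert(offsetof(struct toy_req_old, pad) ==
	       offsetof(struct toy_req_new, raw_protocol),
	       "new field must alias the old pad byte");

/* Writes through the old name and reads back through the new one. */
static uint8_t toy_alias_roundtrip(uint8_t value)
{
	struct toy_req_new req = { 0 };

	req.pad = value;
	return req.raw_protocol;
}
```

Existing user-space code that still assigns to the old name would continue to build, while new code can use the descriptive name for the same byte.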


Re: [PATCH v3] bpf: Set register type according to is_valid_access()

2016-09-27 Thread David Miller
From: Mickaël Salaün 
Date: Sat, 24 Sep 2016 20:01:50 +0200

> This prevents future potential pointer leaks when an unprivileged eBPF
> program reads a pointer value from its context. Even if
> is_valid_access() returns a pointer type, the eBPF verifier replaces it
> with UNKNOWN_VALUE. The register value that contains a kernel address is
> then allowed to leak. Moreover, this fix allows unprivileged eBPF
> programs to use functions with (legitimate) pointer arguments.
> 
> Not an issue currently since reg_type is only set for PTR_TO_PACKET or
> PTR_TO_PACKET_END in XDP and TC programs that can only be loaded as
> privileged. For now, the only unprivileged eBPF program allowed is for
> socket filtering and all the types from its context are UNKNOWN_VALUE.
> However, this fix is important for future unprivileged eBPF programs
> which could use pointers in their context.
> 
> Signed-off-by: Mickaël Salaün 

Applied to net-next, thanks.


Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread Shmulik Ladkani
Hi David,

On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), da...@davemloft.net wrote:
> The discussion on this patch has ventured off into what to do about
> recursion.
> 
> But it is unclear to me where this specific patch, and this series,
> stands right now.  Someone please clear this up for me.

Status:
 - Series adds "ingress redirect/mirror" support
 - Positive feedback for the feature
 - So far no comments regarding code itself
 - Questions raised regarding "recursion handling"
   Expressed that existing mirred code (i.e egress redirect) is *already*
   loop-unsafe (and also, some non-tc netdev constructs, as exampled by
   others).
   Discussion then wandered to "recursion handling".

Regards,
Shmulik 


Re: [PATCH v2] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Vlastimil Babka

On 09/27/2016 02:01 AM, Andrew Morton wrote:

On Thu, 22 Sep 2016 18:43:59 +0200 Vlastimil Babka  wrote:


The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
with the number of fds passed. We had a customer report page allocation
failures of order-4 for this allocation. This is a costly order, so it might
easily fail, as the VM expects such allocation to have a lower-order fallback.

Such trivial fallback is vmalloc(), as the memory doesn't have to be
physically contiguous. Also the allocation is temporary for the duration of the
syscall, so it's unlikely to stress vmalloc too much.

Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
it doesn't need this kind of fallback.

...

--- a/fs/select.c
+++ b/fs/select.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 

@@ -558,6 +559,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set 
__user *outp,
struct fdtable *fdt;
/* Allocate small arguments on the stack to save memory and be faster */
long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
+   unsigned long alloc_size;

ret = -EINVAL;
if (n < 0)
@@ -580,8 +582,12 @@ int core_sys_select(int n, fd_set __user *inp, fd_set 
__user *outp,
bits = stack_fds;
if (size > sizeof(stack_fds) / 6) {
/* Not enough space in on-stack array; must use kmalloc */
+   alloc_size = 6 * size;


> Well.  `size' is `unsigned'.  The multiplication will be done as 32-bit
> so there was no point in making `alloc_size' unsigned long.

Uh, right. Thanks.

> So can we tighten up the types in this function?  size_t might make
> sense, but vmalloc() takes a ulong.


Let's do size_t then, as the conversion to ulong is safe.
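
The type issue raised above can be reproduced in user space. The sketch below uses fixed-width types so the result does not depend on the width of `unsigned long`; it assumes the usual case where `int` is 32 bits wide. The function names are made up for illustration.

```c
#include <stdint.h>

/* The product is computed in 32 bits and only then widened. */
static uint64_t toy_alloc_size_narrow(uint32_t size)
{
	return 6 * size;
}

/* Widening one operand first makes the multiplication 64-bit. */
static uint64_t toy_alloc_size_wide(uint32_t size)
{
	return (uint64_t)6 * size;
}
```

For small sizes both functions agree; for a size of 0x40000000 the narrow variant wraps to 0x80000000 while the wide variant returns 0x180000000, which is why the destination type alone does not help.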




[PATCH net-next V2] net/sched: pkt_cls: change tc actions order to be as the user sets

2016-09-27 Thread Hadar Hen Zion
Currently the created tc actions list is reversed against the order
set by the user.
Change the actions list order to be the same as was set by the user.

This patch doesn't affect dump actions behavior.
For dumping, action->order parameter is used so the list order doesn't
matter.

Signed-off-by: Hadar Hen Zion 
Acked-by: Jamal Hadi Salim 
---
 include/net/pkt_cls.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 5ccaa4b..767b03a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -123,7 +123,7 @@ static inline void tcf_exts_to_list(const struct tcf_exts 
*exts,
for (i = 0; i < exts->nr_actions; i++) {
struct tc_action *a = exts->actions[i];
 
-   list_add(&a->list, actions);
+   list_add_tail(&a->list, actions);
}
 #endif
 }
-- 
1.8.3.1
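
The effect of the one-line change can be seen with a minimal list modelled on the kernel's list_head. This is a user-space sketch with hypothetical `toy_` names, not the kernel implementation: inserting at the head yields reverse order, inserting at the tail keeps the order in which the actions were added.

```c
/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct toy_node {
	struct toy_node *prev, *next;
	int id;
};

static void toy_list_init(struct toy_node *head)
{
	head->prev = head->next = head;
}

/* Insert right after the head: newest entry comes first (stack order). */
static void toy_list_add(struct toy_node *n, struct toy_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Insert right before the head: entries keep insertion order (queue order). */
static void toy_list_add_tail(struct toy_node *n, struct toy_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Build a list from actions 1, 2, 3 and return the id at position 'pos'. */
static int toy_id_at(int use_tail, int pos)
{
	struct toy_node head, a[3];
	struct toy_node *it;
	int i;

	toy_list_init(&head);
	for (i = 0; i < 3; i++) {
		a[i].id = i + 1;
		if (use_tail)
			toy_list_add_tail(&a[i], &head);
		else
			toy_list_add(&a[i], &head);
	}
	for (it = head.next, i = 0; it != &head && i < pos; it = it->next, i++)
		;
	return it == &head ? -1 : it->id;
}
```

Walking the head-inserted list gives 3, 2, 1, while the tail-inserted list gives 1, 2, 3, which is the order a consumer of the list would expect from the user's configuration.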



Re: [PATCH v2] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Vlastimil Babka

On 09/27/2016 03:38 AM, Eric Dumazet wrote:

> On Mon, 2016-09-26 at 17:01 -0700, Andrew Morton wrote:
>
>> I don't share Eric's concerns about performance here.  If the vmalloc()
>> is called, we're about to write to that quite large amount of memory
>> which we just allocated, and the vmalloc() overhead will be relatively
>> low.
>
> I did not care of the performance of this particular select() system
> call really, but other cpus because of more TLB invalidations.

There are many other ways to cause those, AFAIK. The reclaim/compaction
for order-3 allocation has its own impact on system, including TLB flushes.
Or a flood of mmap(MAP_POPULATE) and madvise(MADV_DONTNEED) calls...
This vmalloc() would however require raising RLIMIT_NOFILE above the defaults.

> At least CONFIG_DEBUG_PAGEALLOC=y builds should be impacted, but maybe
> we do not care.

I doubt anyone runs that in production, especially if performance is of concern.



[PATCH net-next] net/sched: cls_flower: Use a proper mask value for enc key id parameter

2016-09-27 Thread Hadar Hen Zion
The current code uses the encapsulation key id value as the mask of that
parameter, which is wrong. Fix that by using a full mask.

Fixes: bc3103f1ed40 ('net/sched: cls_flower: Classify packet in ip tunnels')
Signed-off-by: Hadar Hen Zion 
---
 net/sched/cls_flower.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 2af09c8..f6f40fb 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -481,7 +481,7 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
}
 
fl_set_key_val(tb, &key->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
-  &mask->enc_key_id.keyid, TCA_FLOWER_KEY_ENC_KEY_ID,
+  &mask->enc_key_id.keyid, TCA_FLOWER_UNSPEC,
   sizeof(key->enc_key_id.keyid));
 
return 0;
@@ -919,7 +919,7 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, 
unsigned long fh,
goto nla_put_failure;
 
if (fl_dump_key_val(skb, &key->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
-   &mask->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
+   &mask->enc_key_id, TCA_FLOWER_UNSPEC,
sizeof(key->enc_key_id)))
goto nla_put_failure;
 
-- 
1.8.3.1



Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Vlastimil Babka

On 09/23/2016 06:47 PM, Jason Baron wrote:

Hi,

On 09/23/2016 03:24 AM, Nicholas Piggin wrote:

On Fri, 23 Sep 2016 14:42:53 +0800
"Hillf Danton"  wrote:



The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
with the number of fds passed. We had a customer report page allocation
failures of order-4 for this allocation. This is a costly order, so it might
easily fail, as the VM expects such allocation to have a lower-order fallback.

Such trivial fallback is vmalloc(), as the memory doesn't have to be
physically contiguous. Also the allocation is temporary for the duration of the
syscall, so it's unlikely to stress vmalloc too much.

Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
it doesn't need this kind of fallback.


How about something like this? (untested)


This pushes the limit further, but might just delay the problem. Could be an 
optimization on top if there's enough interest, though.


[...]


+
+   if (!(fds.in && fds.out && fds.ex &&
+   fds.res_in && fds.res_out && fds.res_ex))
+   goto out;
+   } else {
+   if (nr_bytes > sizeof(stack_fds)) {
+   /* Not enough space in on-stack array */
+   if (nr_bytes > PAGE_SIZE * 2)


The 'if' looks extraneous?

Also, I wonder if we can just avoid some allocations altogether by
checking if the user fd_set pointers are NULL? That can avoid failures :)


That would be a more major rewrite, as the core algorithm doesn't expect NULLs.


Thanks,

-Jason





[PATCH v3] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Vlastimil Babka
The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
with the number of fds passed. We had a customer report page allocation
failures of order-4 for this allocation. This is a costly order, so it might
easily fail, as the VM expects such allocation to have a lower-order fallback.

Such trivial fallback is vmalloc(), as the memory doesn't have to be physically
contiguous and the allocation is temporary for the duration of the syscall
only. There were some concerns, whether this would have negative impact on the
system by exposing vmalloc() to userspace. Although an excessive use of vmalloc
can cause some system wide performance issues - TLB flushes etc. - a large
order allocation is not for free either and an excessive reclaim/compaction can
have a similar effect. Also note that the size is effectively limited by
RLIMIT_NOFILE which defaults to 1024 on the systems I checked. That means the
bitmaps will fit well within a single page and thus the vmalloc() fallback could
be only exercised for processes where root allows a higher limit.

Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
it doesn't need this kind of fallback.

[eric.duma...@gmail.com: fix failure path logic]
[a...@linux-foundation.org: use proper type for size]
Signed-off-by: Vlastimil Babka 
---
 fs/select.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/select.c b/fs/select.c
index 8ed9da50896a..3d4f85defeab 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -554,7 +555,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set 
__user *outp,
fd_set_bits fds;
void *bits;
int ret, max_fds;
-   unsigned int size;
+   size_t size, alloc_size;
struct fdtable *fdt;
/* Allocate small arguments on the stack to save memory and be faster */
long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
@@ -581,7 +582,14 @@ int core_sys_select(int n, fd_set __user *inp, fd_set 
__user *outp,
if (size > sizeof(stack_fds) / 6) {
/* Not enough space in on-stack array; must use kmalloc */
ret = -ENOMEM;
-   bits = kmalloc(6 * size, GFP_KERNEL);
+   if (size > (SIZE_MAX / 6))
+   goto out_nofds;
+
+   alloc_size = 6 * size;
+   bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
+   if (!bits && alloc_size > PAGE_SIZE)
+   bits = vmalloc(alloc_size);
+
if (!bits)
goto out_nofds;
}
@@ -618,7 +626,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set 
__user *outp,
 
 out:
if (bits != stack_fds)
-   kfree(bits);
+   kvfree(bits);
 out_nofds:
return ret;
 }
-- 
2.10.0
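
The allocation pattern in this patch (overflow check, preferred allocator, fallback for large sizes) can be sketched in user space as follows. The two `toy_` allocators are hypothetical stand-ins for kmalloc() and vmalloc(): the first simply refuses requests above an arbitrary limit so that the fallback path can be exercised, and both are backed by malloc() so a single free() releases either result.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_CONTIG_LIMIT 4096u	/* arbitrary stand-in for PAGE_SIZE */

/* Stand-in for kmalloc(): pretends large contiguous requests fail. */
static void *toy_contig_alloc(size_t n)
{
	return n > TOY_CONTIG_LIMIT ? NULL : malloc(n);
}

/* Stand-in for vmalloc(): no contiguity requirement. */
static void *toy_fallback_alloc(size_t n)
{
	return malloc(n);
}

/* Allocate room for six bitmaps of 'size' bytes each. */
static void *toy_alloc_fd_bits(size_t size, int *used_fallback)
{
	size_t alloc_size;
	void *bits;

	*used_fallback = 0;
	if (size > SIZE_MAX / 6)	/* 6 * size would overflow */
		return NULL;

	alloc_size = 6 * size;
	bits = toy_contig_alloc(alloc_size);
	if (!bits && alloc_size > TOY_CONTIG_LIMIT) {
		bits = toy_fallback_alloc(alloc_size);
		*used_fallback = bits != NULL;
	}
	return bits;
}
```

As a sanity check on the changelog's numbers: with 1024 file descriptors each bitmap is 128 bytes, so the six bitmaps take 768 bytes and never reach the fallback path in this sketch.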



Re: ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread Paul Bolle
On Tue, 2016-09-27 at 07:20 +0200, SF Markus Elfring wrote:
> Will it matter here if the function "kfree" will be called for the
> data structure members "bcs" and "inbuf" after a later function call
> failed within the implementation of "gigaset_initcs"?

My translation of this question is: could you please hold my hand while
I read the code of a driver I do not use - a driver for hardware that I
don't even have, and therefore cannot really test - after I submitted a
patch that appears to be broken?

My answer to that question is: no, sorry, I won't do that.


Paul Bolle


[PATCH RFC 3/3] e1000e: Add ndo_set_env_hdr_len

2016-09-27 Thread Toshiaki Makita
e1000e supports generic 1522-sized frames by default, so set the default
env_hdr_len to 4, and replace dev->mtu with dev->mtu + env_hdr_len.
Note that e1000e has adapter->max_frame_size that includes mtu +
env_hdr_len, so I use it where mtu was used to validate frame length.

Signed-off-by: Toshiaki Makita 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 84 +-
 1 file changed, 60 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7017281..4dc9315 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3033,7 +3033,7 @@ static void e1000_setup_rctl(struct e1000_adapter 
*adapter)
if (hw->mac.type >= e1000_pch2lan) {
s32 ret_val;
 
-   if (adapter->netdev->mtu > ETH_DATA_LEN)
+   if (adapter->max_frame_size > VLAN_ETH_FRAME_LEN + ETH_FCS_LEN)
ret_val = e1000_lv_jumbo_workaround_ich8lan(hw, true);
else
ret_val = e1000_lv_jumbo_workaround_ich8lan(hw, false);
@@ -3053,7 +3053,7 @@ static void e1000_setup_rctl(struct e1000_adapter 
*adapter)
rctl &= ~E1000_RCTL_SBP;
 
/* Enable Long Packet receive */
-   if (adapter->netdev->mtu <= ETH_DATA_LEN)
+   if (adapter->max_frame_size <= VLAN_ETH_FRAME_LEN + ETH_FCS_LEN)
rctl &= ~E1000_RCTL_LPE;
else
rctl |= E1000_RCTL_LPE;
@@ -3121,7 +3121,8 @@ static void e1000_setup_rctl(struct e1000_adapter 
*adapter)
 * a lot of memory, since we allocate 3 pages at all times
 * per packet.
 */
-   pages = PAGE_USE_COUNT(adapter->netdev->mtu);
+   pages = PAGE_USE_COUNT(adapter->netdev->mtu +
+  adapter->netdev->env_hdr_len);
if ((pages <= 3) && (PAGE_SIZE <= 16384) && (rctl & E1000_RCTL_LPE))
adapter->rx_ps_pages = pages;
else
@@ -3191,7 +3192,8 @@ static void e1000_configure_rx(struct e1000_adapter 
*adapter)
sizeof(union e1000_rx_desc_packet_split);
adapter->clean_rx = e1000_clean_rx_irq_ps;
adapter->alloc_rx_buf = e1000_alloc_rx_buffers_ps;
-   } else if (adapter->netdev->mtu > ETH_FRAME_LEN + ETH_FCS_LEN) {
+   } else if (adapter->netdev->mtu + adapter->netdev->env_hdr_len >
+  ETH_FRAME_LEN + ETH_FCS_LEN) {
rdlen = rx_ring->count * sizeof(union e1000_rx_desc_extended);
adapter->clean_rx = e1000_clean_jumbo_rx_irq;
adapter->alloc_rx_buf = e1000_alloc_jumbo_rx_buffers;
@@ -3273,7 +3275,7 @@ static void e1000_configure_rx(struct e1000_adapter 
*adapter)
/* With jumbo frames, excessive C-state transition latencies result
 * in dropped transactions.
 */
-   if (adapter->netdev->mtu > ETH_DATA_LEN) {
+   if (adapter->max_frame_size > VLAN_ETH_FRAME_LEN + ETH_FCS_LEN) {
u32 lat =
((er32(PBA) & E1000_PBA_RXA_MASK) * 1024 -
 adapter->max_frame_size) * 8 / 1000;
@@ -4001,7 +4003,8 @@ void e1000e_reset(struct e1000_adapter *adapter)
switch (hw->mac.type) {
case e1000_ich9lan:
case e1000_ich10lan:
-   if (adapter->netdev->mtu > ETH_DATA_LEN) {
+   if (adapter->max_frame_size > VLAN_ETH_FRAME_LEN +
+ ETH_FCS_LEN) {
pba = 14;
ew32(PBA, pba);
fc->high_water = 0x2800;
@@ -4020,7 +4023,8 @@ void e1000e_reset(struct e1000_adapter *adapter)
/* Workaround PCH LOM adapter hangs with certain network
 * loads.  If hangs persist, try disabling Tx flow control.
 */
-   if (adapter->netdev->mtu > ETH_DATA_LEN) {
+   if (adapter->max_frame_size > VLAN_ETH_FRAME_LEN +
+ ETH_FCS_LEN) {
fc->high_water = 0x3500;
fc->low_water = 0x1500;
} else {
@@ -4034,7 +4038,8 @@ void e1000e_reset(struct e1000_adapter *adapter)
case e1000_pch_spt:
fc->refresh_time = 0x0400;
 
-   if (adapter->netdev->mtu <= ETH_DATA_LEN) {
+   if (adapter->max_frame_size <= VLAN_ETH_FRAME_LEN +
+  ETH_FCS_LEN) {
fc->high_water = 0x05C20;
fc->low_water = 0x05048;
fc->pause_time = 0x0650;
@@ -4278,7 +4283,7 @@ void e1000e_down(struct e1000_adapter *adapter, bool 
reset)
 
/* Disable Si errata workaround on PCHx for jumbo frame flow */
if ((hw->mac.type >= e1000_pch2lan) &&
-   (adapter->netdev->mtu > ETH_DATA_LEN) &&
+   (adapter->max_frame_size > VLAN_ETH_FRAME_LEN + ET

[PATCH RFC 2/3] net: Support IFLA_ENV_HDR_LEN to configure max envelope header length

2016-09-27 Thread Toshiaki Makita
With this change, admin can configure env_hdr_len.

Signed-off-by: Toshiaki Makita 
---
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b4fba66..9545ea4 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
IFLA_XDP,
+   IFLA_ENV_HDR_LEN,
__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3ac8946..9233709 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -943,7 +943,8 @@ static noinline size_t if_nlmsg_size(const struct 
net_device *dev,
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
   + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
   + rtnl_xdp_size(dev) /* IFLA_XDP */
-  + nla_total_size(1); /* IFLA_PROTO_DOWN */
+  + nla_total_size(1) /* IFLA_PROTO_DOWN */
+  + nla_total_size(4); /* IFLA_ENV_HDR_LEN */
 
 }
 
@@ -1321,7 +1322,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct 
net_device *dev,
 nla_put_string(skb, IFLA_IFALIAS, dev->ifalias)) ||
nla_put_u32(skb, IFLA_CARRIER_CHANGES,
atomic_read(&dev->carrier_changes)) ||
-   nla_put_u8(skb, IFLA_PROTO_DOWN, dev->proto_down))
+   nla_put_u8(skb, IFLA_PROTO_DOWN, dev->proto_down) ||
+   nla_put_u32(skb, IFLA_ENV_HDR_LEN, netif_get_env_hdr_len(dev)))
goto nla_put_failure;
 
if (rtnl_fill_link_ifmap(skb, dev))
@@ -1458,6 +1460,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_LINK_NETNSID] = { .type = NLA_S32 },
[IFLA_PROTO_DOWN]   = { .type = NLA_U8 },
[IFLA_XDP]  = { .type = NLA_NESTED },
+   [IFLA_ENV_HDR_LEN]  = { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2174,6 +2177,13 @@ static int do_setlink(const struct sk_buff *skb,
}
}
 
+   if (tb[IFLA_ENV_HDR_LEN]) {
+   err = dev_set_env_hdr_len(dev, 
nla_get_u32(tb[IFLA_ENV_HDR_LEN]));
+   if (err < 0)
+   goto errout;
+   status |= DO_SETLINK_MODIFIED;
+   }
+
 errout:
if (status & DO_SETLINK_MODIFIED) {
if (status & DO_SETLINK_NOTIFY)
@@ -2378,6 +2388,8 @@ struct net_device *rtnl_create_link(struct net *net,
dev->link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
if (tb[IFLA_GROUP])
dev_set_group(dev, nla_get_u32(tb[IFLA_GROUP]));
+   if (tb[IFLA_ENV_HDR_LEN])
+   dev->env_hdr_len = nla_get_u32(tb[IFLA_ENV_HDR_LEN]);
 
return dev;
 
-- 
1.8.3.1





[PATCH RFC iproute2] iplink: Support envhdrlen

2016-09-27 Thread Toshiaki Makita
This adds support for envhdrlen.

Example:
 # ip link set eno1 envhdrlen 8
 # ip link show eno1
 2: eno1:  mtu 1500 envhdrlen 8 qdisc fq_codel 
state UP mode DEFAULT group default qlen 1000
 link/ether 44:37:e6:6c:69:a4 brd ff:ff:ff:ff:ff:ff

Note:
As an RFC, this includes an update for kernel headers.

Signed-off-by: Toshiaki Makita 
---
 include/linux/if_link.h |  1 +
 ip/ipaddress.c  |  2 ++
 ip/iplink.c | 10 ++
 3 files changed, 13 insertions(+)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index b9299e3..46ef8cc 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
IFLA_XDP,
+   IFLA_ENV_HDR_LEN,
__IFLA_MAX
 };
 
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 76bd7b3..92a472d 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -820,6 +820,8 @@ int print_linkinfo(const struct sockaddr_nl *who,
 
if (tb[IFLA_MTU])
fprintf(fp, "mtu %u ", *(int *)RTA_DATA(tb[IFLA_MTU]));
+   if (tb[IFLA_ENV_HDR_LEN])
+   fprintf(fp, "envhdrlen %u ", *(int 
*)RTA_DATA(tb[IFLA_ENV_HDR_LEN]));
if (tb[IFLA_QDISC])
fprintf(fp, "qdisc %s ", rta_getattr_str(tb[IFLA_QDISC]));
if (tb[IFLA_MASTER]) {
diff --git a/ip/iplink.c b/ip/iplink.c
index 6b1db18..4dcb9ac 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -50,6 +50,7 @@ void iplink_usage(void)
fprintf(stderr, "   [ address LLADDR ]\n");
fprintf(stderr, "   [ broadcast LLADDR ]\n");
fprintf(stderr, "   [ mtu MTU ] [index IDX 
]\n");
+   fprintf(stderr, "   [ envhdrlen ENVHDRLEN ]\n");
fprintf(stderr, "   [ numtxqueues QUEUE_COUNT 
]\n");
fprintf(stderr, "   [ numrxqueues QUEUE_COUNT 
]\n");
fprintf(stderr, "   type TYPE [ ARGS ]\n");
@@ -489,6 +490,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
char abuf[32];
int qlen = -1;
int mtu = -1;
+   int envhdrlen = -1;
int netns = -1;
int vf = -1;
int numtxqueues = -1;
@@ -547,6 +549,14 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
if (get_integer(&mtu, *argv, 0))
invarg("Invalid \"mtu\" value\n", *argv);
addattr_l(&req->n, sizeof(*req), IFLA_MTU, &mtu, 4);
+   } else if (strcmp(*argv, "envhdrlen") == 0) {
+   NEXT_ARG();
+   if (envhdrlen != -1)
+   duparg("envhdrlen", *argv);
+   if (get_integer(&envhdrlen, *argv, 0))
+   invarg("Invalid \"envhdrlen\" value\n", *argv);
+   addattr_l(&req->n, sizeof(*req), IFLA_ENV_HDR_LEN,
+ &envhdrlen, 4);
} else if (strcmp(*argv, "netns") == 0) {
NEXT_ARG();
if (netns != -1)
-- 
2.5.5





[PATCH RFC 1/3] net: Add dev_set_env_hdr_len to accept envelope frames

2016-09-27 Thread Toshiaki Makita
Currently most NICs support Q-tagged frames[1], i.e. 1522-byte frames,
to handle the 4-byte VLAN header. But some encapsulation protocols like
802.1ad require them to handle larger frames.
This change introduces dev_set_env_hdr_len() and corresponding drivers'
operation .ndo_set_env_hdr_len(), which notifies drivers of needed
encapsulation header length. This enables devices to accept longer
frames with encapsulation headers, i.e. envelope frames[2], without
expanding MTU size for non-encapsulated frames.

Note 1:
Envelope frames are not jumbo frames. See IEEE 802.3as[3] for detail.
IEEE 802.3-2012 3.2.7 says:
  The envelope frame is intended to allow inclusion of additional
  prefixes and suffixes required by higher layer encapsulation
  protocols such as those defined by the IEEE 802.1 working
  group (such as Provider Bridges and MAC Security), ITU-T or
  IETF (such as MPLS). The original MAC Client Data field
  maximum remains 1500 octets while the encapsulation protocols
  may add up to an additional 482 octets. Use of these extra
  octets for other purposes is not recommended, and may result
  in MAC frames being dropped or corrupted as they may violate
  maximum MAC frame size restrictions if encapsulation protocols
  are required to operate on them.

Note 2:
IEEE 802.3 defines the max size of envelope frames as 2000 bytes.
This change is more flexible than 802.3 in terms of max allowed
frame length.

[1] IEEE 802.3-2012, 1.4.334.
[2] IEEE 802.3-2012, 1.4.184.
[3] http://www.ieee802.org/3/as/public/0607/802.3as_overview.pdf
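
The frame sizes mentioned in the notes above can be checked with a small
sketch. The constant and function names are illustrative assumptions, not
kernel identifiers; the values are the 802.3 header, FCS and payload sizes,
which give 1518 octets for a basic frame, 1522 for a Q-tagged frame and
2000 for the largest envelope frame.

```c
#include <stdbool.h>

/* Illustrative constants (names are assumptions, values from IEEE 802.3). */
enum {
	SKETCH_ETH_HLEN     = 14,   /* dst MAC + src MAC + Length/Type */
	SKETCH_ETH_FCS_LEN  = 4,    /* frame check sequence */
	SKETCH_ETH_DATA_LEN = 1500, /* max MAC client data */
	SKETCH_MAX_ENV_HDR  = 482,  /* max encapsulation overhead */
};

/* Total frame length for a full-sized payload plus env_hdr_len bytes
 * of encapsulation headers. */
static unsigned int sketch_frame_len(unsigned int env_hdr_len)
{
	return SKETCH_ETH_HLEN + env_hdr_len + SKETCH_ETH_DATA_LEN +
	       SKETCH_ETH_FCS_LEN;
}

/* True if the encapsulation overhead stays within the 802.3 envelope limit. */
static bool sketch_fits_envelope(unsigned int env_hdr_len)
{
	return env_hdr_len <= SKETCH_MAX_ENV_HDR;
}
```

With env_hdr_len set to 8 (two VLAN tags, as with 802.1ad) the frame length
comes out at 1526 octets, well inside the envelope limit.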

Signed-off-by: Toshiaki Makita 
---
 include/linux/netdevice.h | 21 +
 net/core/dev.c| 32 
 2 files changed, 53 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 136ae6bb..a0ac76a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1132,6 +1132,10 @@ struct netdev_xdp {
  * int (*ndo_xdp)(struct net_device *dev, struct netdev_xdp *xdp);
  * This function is used to set or query state related to XDP on the
  * netdevice. See definition of enum xdp_netdev_command for details.
+ * int (*ndo_set_env_hdr_len)(struct net_device *dev, int hdr_len);
+ * This function is used to set the maximum header size of envelope
+ * frames. The device must accept the size of MTU + envelope header
+ * size on packet reception.
  *
  */
 struct net_device_ops {
@@ -1323,6 +1327,8 @@ struct net_device_ops {
   int needed_headroom);
int (*ndo_xdp)(struct net_device *dev,
   struct netdev_xdp *xdp);
+   int (*ndo_set_env_hdr_len)(struct net_device *dev,
+  int hdr_len);
 };
 
 /**
@@ -1506,6 +1512,7 @@ enum netdev_priv_flags {
  * @if_port:   Selectable AUI, TP, ...
  * @dma:   DMA channel
  * @mtu:   Interface MTU value
+ * @env_hdr_len:   Additional encapsulation header length to MTU
  * @type:  Interface hardware type
  * @hard_header_len: Maximum hardware header length.
  *
@@ -1726,6 +1733,7 @@ struct net_device {
unsigned char   dma;
 
unsigned intmtu;
+   unsigned intenv_hdr_len;
unsigned short  type;
unsigned short  hard_header_len;
 
@@ -3300,6 +3308,7 @@ int dev_change_name(struct net_device *, const char *);
 int dev_set_alias(struct net_device *, const char *, size_t);
 int dev_change_net_namespace(struct net_device *, struct net *, const char *);
 int dev_set_mtu(struct net_device *, int);
+int dev_set_env_hdr_len(struct net_device *, int);
 void dev_set_group(struct net_device *, int);
 int dev_set_mac_address(struct net_device *, struct sockaddr *);
 int dev_change_carrier(struct net_device *, bool new_carrier);
@@ -4233,6 +4242,18 @@ static inline bool netif_reduces_vlan_mtu(struct 
net_device *dev)
return dev->priv_flags & IFF_MACSEC;
 }
 
+/* return envelope header length */
+static inline int netif_get_env_hdr_len(struct net_device *dev)
+{
+   if (dev->netdev_ops->ndo_set_env_hdr_len)
+   return dev->env_hdr_len;
+
+   if (netif_reduces_vlan_mtu(dev))
+   return 0;
+
+   return 4; /* VLAN_HLEN */
+}
+
 extern struct pernet_operations __net_initdata loopback_net_ops;
 
 /* Logging, debugging and troubleshooting/diagnostic helpers. */
diff --git a/net/core/dev.c b/net/core/dev.c
index c0c291f..df75aaa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6524,6 +6524,38 @@ int dev_set_mtu(struct net_device *dev, int new_mtu)
 EXPORT_SYMBOL(dev_set_mtu);
 
 /**
+ * dev_set_env_hdr_len - Set max envelope header length
+ * @dev: device
+ * @new_len: new length
+ 

[PATCH RFC 0/3] Support envelope frames (802.3as)

2016-09-27 Thread Toshiaki Makita
This patch set introduces a software implementation of envelope frames defined
in 802.3as[1], which allows encapsulated packets to be received without
expanding MTU for them.


* Envelope frames

Envelope frames were introduced by IEEE 802.3as[1], which has been
incorporated in IEEE 802.3-2012.

IEEE 802.3-2012 1.4.184 defines envelope frame as:
A MAC frame that carries a Length/Type field with the Type
interpretation that may indicate additional encapsulation
information within the MAC client data and has a maximum length
of 2000 octets. The envelope frame is intended to allow inclusion
of additional prefixes and suffixes required by higher layer
encapsulation protocols. The encapsulation protocols may use up
to 482 octets.


* Motivation

The intended customer of this feature is mainly vlan, possibly mpls or
other encapsulation protocols.

Vlan is different than other encapsulation protocols in that the packet
size is generally larger than normal packets by vlan-header size (4 bytes).
Thus, most NICs allow packets the size of which is larger by 4 bytes
than MTU (802.3 calls these vlan-tagged packets "Q-tagged frames", whose
MTU is 1504 including vlan header. Most NICs accept Q-tagged frames).

Similarly, when doubly tagged vlan is used leveraging 802.1ad, the packet
size will be larger by vlan-header size * 2 (8 bytes). This packet size is
needed to provide Ethernet VPN transparent to the users. Thus, hardware
switches support 1508 bytes MTU when using 802.1ad, as suggested by MEF[2].
Also, Linux stacked vlan devices have 1500 bytes MTU, which emit 1508
bytes doubly tagged packets. But unfortunately some NICs don't accept 
1508 bytes packets by default, and they are dropped.

+----+ single tag +-------+ double tag +-------+ double tag +------+
|End | 1504 bytes |802.1ad| 1508 bytes |802.1ad| 1508 bytes |Linux |
|User|----------->|Edge SW|----------->|NNI SW |----------->|Server|
+----+            +-------+            +-------+   *drop*   +------+
                                                   on NIC

802.3 calls such encapsulated packets larger than 1504 "envelope frames".
Most NICs lack support for envelope frames. But many of them support jumbo
frames, which can be used to implement envelope frames support in Linux.
I'm proposing this envelope frames support to fix problems described above.


* Implementation

Envelope frames require normal packets to use 1500-sized MTU, while
encapsulation headers can be added to the MTU. If we simply increase MTU
of the physical device, it causes jumbo frames as well as envelope frames
(jumbo frames are non-encapsulated packets whose MTU is larger than 1500).
So what we need here is to increase the max acceptable frame size of NICs
without changing dev->mtu.

In order to achieve this, I add a new function pointer,
.ndo_set_env_hdr_len, in net_device_ops, through which kernel can inform
device drivers of needed additional header size of envelope frames
(env_hdr_len).
Implementation in device drivers is as simple as replacing dev->mtu with
dev->mtu + env_hdr_len. This makes devices recognize dev->mtu + env_hdr_len
as MTU, and allow packets with additional header up to env_hdr_len, while
kernel networking stack recognizes dev->mtu as MTU. Thus no packets larger
than MTU will be sent other than those encapsulated by upper devices. This
effectively supports envelope frames.

Userspace API is netlink, the same as MTU. It will be a parameter which
can be configured through "ip link".
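
The driver-side idea described above can be sketched as follows. This is
only an illustration under stated assumptions: struct sketch_netdev and
both helpers are hypothetical stand-ins, not the real struct net_device or
any driver's code. It shows the receive limit growing by env_hdr_len while
the payload limit the stack uses stays at dev->mtu.

```c
/* Hypothetical, simplified stand-in for the fields a driver would read. */
struct sketch_netdev {
	unsigned int mtu;         /* payload limit seen by the stack */
	unsigned int env_hdr_len; /* extra encapsulation header allowance */
};

#define SKETCH_ETH_HLEN    14 /* Ethernet header */
#define SKETCH_ETH_FCS_LEN 4  /* frame check sequence */

/* Largest frame the NIC would be programmed to accept on receive. */
static unsigned int sketch_max_rx_frame(const struct sketch_netdev *dev)
{
	return dev->mtu + dev->env_hdr_len + SKETCH_ETH_HLEN +
	       SKETCH_ETH_FCS_LEN;
}

/* Largest payload the stack itself hands to the device: unchanged. */
static unsigned int sketch_max_tx_payload(const struct sketch_netdev *dev)
{
	return dev->mtu;
}
```

For mtu 1500 and env_hdr_len 8 the receive limit is 1526 bytes, while
non-encapsulated traffic is still limited to a 1500-byte payload, so no
jumbo frames are generated.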


* Q&A

** Why not reducing MTU of VLAN devices?

As written in Motivation, in order to achieve transparency of Ethernet VPN,
MTU of vlan device needs to be 1500. Since this is usual in 802.1ad network,
switches in 802.1ad network send 1508-sized tagged packets. Thus, reducing
MTU of vlan device does not change the situation where Linux receives
packets whose MTU is larger than NICs' acceptable size, and does not fix
the issue.

** Why not increasing MTU of physical devices?

Increasing MTU of physical device indeed resolves the problem that NICs
cannot receive doubly tagged packets. However, this effectively allows
devices to send jumbo frames as well as envelope frames, and could cause
packet drops on network elements which do not accept jumbo frames.

** Why is .ndo_set_env_hdr_len needed?
   Why not modifying drivers to accept envelope frames by default?

Some NICs actually support envelope frames by default. One example is igb,
which always accepts packet size up to 9728.
However, I don't think all NICs are necessarily able to do that, since some
NICs change their behaviour when the MTU is set larger than 1500.
For example, e1000e changes usage of descriptors when its MTU gets larger
than 1500. qlge also looks to change its behaviour as far as I can see from
the source code of the driver.
In order to keep the default behaviour when not using 802.1ad or stacked
vlan, some knob is needed.

** Why are drivers notified of header _length_?
   

Re: [PATCH] brcmfmac: implement more accurate skb tracking

2016-09-27 Thread Arend Van Spriel
On 26-9-2016 16:59, Dan Williams wrote:
> On Mon, 2016-09-26 at 14:13 +0200, Rafał Miłecki wrote:
>> On 26 September 2016 at 13:46, Arend Van Spriel
>>  wrote:
>>>
>>> On 26-9-2016 12:23, Rafał Miłecki wrote:

>>>> From: Rafał Miłecki
>>>>
>>>> We need to track 802.1x packets to know if there are any pending
>>>> ones
>>>> for transmission. This is required for performing key update in
>>>> the
>>>> firmware.
>>>
>>> The problem we are trying to solve is a pretty old one. The problem
>>> is
>>> that wpa_supplicant uses two separate code paths: EAPOL messaging
>>> through data path and key configuration through nl80211.
>>
>> Can I find it described/reported somewhere?
> 
> If I understand the issue correctly, you can find all this in the
> supplicant code.  Once the supplicant has done whatever it wants to do
> with the data frames that just happen to be EAPOL it then sends the
> keys down to the driver with nl80211.

Indeed. EAPOL packets are simply data packets as far as the 802.11 stack
is concerned. The arrival of those in the driver is not predictable
hence we hold off the key configuration until those have been passed
over to firmware.

> But it sounds like, instead of sniffing EAPOL frames in the driver skb
> tracking and sniffing ETH_P_PAE, you should probably implement support
> for NL80211_CMD_CRIT_PROTOCOL_START/NL80211_CMD_CRIT_PROTOCOL_STOP and
> key off the passed-in NL80211_CRIT_PROTO_EAPOL.  At least at the
> beginning of connection setup only EAPOL packets will be allowed
> anyway.
> 
> It doesn't seem like the supplicant uses NL80211_CRIT_PROTO_EAPOL yet,
> but that should also be fixed in the supplicant itself.  You should
> probably get some comments from Jouni on how he'd like to see all this
> work.  But generally the less specific sniffing of frames in drivers,
> likely the better.

Indeed. That was the main motivation to introduce the CRIT_PROTO api. If
I recall correctly it was considered the task of the network manager to
issue the START/STOP. Recently noticed the use of CRIT_PROTO_DHCP on
some target system, which we already support in brcmfmac. From your
response I guess you consider CRIT_PROTO_EAPOL to be issued by the
supplicant.

Regards,
Arend

> Dan
> 
>>
>>>

>>>> Unfortunately our old tracking code wasn't very accurate. It was
>>>> treating skb as pending as soon as it was passed by the netif.
>>>> Actual
>>>> handling packet to the firmware was happening later as brcmfmac
>>>> internally queues them and uses its own worker(s).
>>>
>>> That does not seem right. As soon as we get a 1x packet we need to
>>> wait
>>> with key configuration regardless whether it is still in the driver
>>> or
>>> handed over to firmware already.
>>
>> OK, thanks.


[PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Rafał Miłecki
From: Rafał Miłecki 

Flowrings contain skbs waiting for transmission that were passed to us
by netif. It means we checked every one of them looking for 802.1x
Ethernet type. When deleting flowring we have to use freeing function
that will check for 802.1x type as well.

Freeing skbs without a proper check was leading to the counter not being
properly decreased. This was triggering a WARNING every time
brcmf_netdev_wait_pend8021x was called.
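
The accounting problem described above can be illustrated with a minimal
sketch. Everything here is hypothetical (struct sketch_if, struct
sketch_pkt and the three helpers are not brcmfmac code); the only fixed
fact is the 802.1X EtherType value 0x888E. The point is that every frame
counted on entry must go through a release path that applies the same
check, including frames flushed from a queue.

```c
#include <stdint.h>

#define SKETCH_ETH_P_PAE 0x888E /* 802.1X (EAPOL) EtherType */

struct sketch_if {
	int pend_8021x_cnt; /* 802.1X frames accepted but not yet finalized */
};

struct sketch_pkt {
	uint16_t ethertype;
};

/* Called when the stack hands a frame to the driver. */
static void sketch_tx_accept(struct sketch_if *ifp, const struct sketch_pkt *p)
{
	if (p->ethertype == SKETCH_ETH_P_PAE)
		ifp->pend_8021x_cnt++;
}

/* Release path that mirrors the accounting done in sketch_tx_accept(). */
static void sketch_tx_finalize(struct sketch_if *ifp, const struct sketch_pkt *p)
{
	if (p->ethertype == SKETCH_ETH_P_PAE)
		ifp->pend_8021x_cnt--;
}

/* Drop every queued frame while keeping the counter balanced. */
static void sketch_flush(struct sketch_if *ifp, const struct sketch_pkt *q,
			 int n)
{
	for (int i = 0; i < n; i++)
		sketch_tx_finalize(ifp, &q[i]);
}
```

If the flush freed the frames without the EtherType check, the counter
would stay above zero and anything waiting for it to drain would time out.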

Signed-off-by: Rafał Miłecki 
---
Kalle: this isn't important enough for 4.8 as it's too late for that.

I'd like to get it for 4.9 however, as this fixes a bug that could lead
to a WARNING on every add_key/del_key call. We were struggling with these
WARNINGs for some time and this fixes one of two problems causing them.
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
index b16b367..d0b738d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
@@ -234,13 +234,20 @@ static void brcmf_flowring_block(struct brcmf_flowring 
*flow, u16 flowid,
 
 void brcmf_flowring_delete(struct brcmf_flowring *flow, u16 flowid)
 {
+   struct brcmf_bus *bus_if = dev_get_drvdata(flow->dev);
struct brcmf_flowring_ring *ring;
+   struct brcmf_if *ifp;
u16 hash_idx;
+   u8 ifidx;
struct sk_buff *skb;
 
ring = flow->rings[flowid];
if (!ring)
return;
+
+   ifidx = brcmf_flowring_ifidx_get(flow, flowid);
+   ifp = brcmf_get_ifp(bus_if->drvr, ifidx);
+
brcmf_flowring_block(flow, flowid, false);
hash_idx = ring->hash_id;
flow->hash[hash_idx].ifidx = BRCMF_FLOWRING_INVALID_IFIDX;
@@ -249,7 +256,7 @@ void brcmf_flowring_delete(struct brcmf_flowring *flow, u16 
flowid)
 
skb = skb_dequeue(&ring->skblist);
while (skb) {
-   brcmu_pkt_buf_free_skb(skb);
+   brcmf_txfinalize(ifp, skb, false);
skb = skb_dequeue(&ring->skblist);
}
 
-- 
2.9.3



Re: [PATCH net-next V2] net/sched: pkt_cls: change tc actions order to be as the user sets

2016-09-27 Thread Hadar Hen Zion
On Tue, Sep 27, 2016 at 11:09 AM, Hadar Hen Zion  wrote:
> Currently the created tc actions list is reversed against the order
> set by the user.
> Change the actions list order to be the same as was set by the user.
>
> This patch doesn't affect dump actions behavior.
> For dumping, action->order parameter is used so the list order doesn't
> matter.
>
> Signed-off-by: Hadar Hen Zion 
> Acked-by: Jamal Hadi Salim 


Changes from V1:
- Add a comment to the change log


> ---
>  include/net/pkt_cls.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 5ccaa4b..767b03a 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -123,7 +123,7 @@ static inline void tcf_exts_to_list(const struct tcf_exts 
> *exts,
> for (i = 0; i < exts->nr_actions; i++) {
> struct tc_action *a = exts->actions[i];
>
> -   list_add(&a->list, actions);
> +   list_add_tail(&a->list, actions);
> }
>  #endif
>  }
> --
> 1.8.3.1
>


RE: [PATCH v5 13/16] IB/pvrdma: Add the main driver module for PVRDMA

2016-09-27 Thread David Laight
From: Adit Ranadive
> Sent: 26 September 2016 19:15
> On Mon, Sep 26, 2016 at 00:27:40AM -0700, Yuval Shaia wrote:
> > On Sat, Sep 24, 2016 at 04:21:37PM -0700, Adit Ranadive wrote:
> > > +
> > > + /* Currently, the driver only supports RoCE mode. */
> > > + if (dev->dsr->caps.mode != PVRDMA_DEVICE_MODE_ROCE) {
> > > + dev_err(&pdev->dev, "unsupported transport %d\n",
> > > + dev->dsr->caps.mode);
> > > + ret = -EINVAL;
> >
> > This is some fatal error with the device, not that something wrong with the
> > function's argument.
> > Suggesting to replace with -EFAULT.
> >
> 
> Thanks, will fix this one and the others here.

Won't EFAULT generate SIGSEGV ?

David



Re: [PATCH] brcmfmac: implement more accurate skb tracking

2016-09-27 Thread Arend Van Spriel
On 26-9-2016 14:38, Rafał Miłecki wrote:
> On 26 September 2016 at 14:13, Rafał Miłecki  wrote:
>> On 26 September 2016 at 13:46, Arend Van Spriel
>>  wrote:
>>> On 26-9-2016 12:23, Rafał Miłecki wrote:
>>>> From: Rafał Miłecki
>>>>
>>>> We need to track 802.1x packets to know if there are any pending ones
>>>> for transmission. This is required for performing key update in the
>>>> firmware.
>>>
>>> The problem we are trying to solve is a pretty old one. The problem is
>>> that wpa_supplicant uses two separate code paths: EAPOL messaging
>>> through data path and key configuration through nl80211.
>>
>> Can I find it described/reported somewhere?
>>
>>
>>>> Unfortunately our old tracking code wasn't very accurate. It was
>>>> treating skb as pending as soon as it was passed by the netif. Actual
>>>> handling packet to the firmware was happening later as brcmfmac
>>>> internally queues them and uses its own worker(s).
>>>
>>> That does not seem right. As soon as we get a 1x packet we need to wait
>>> with key configuration regardless whether it is still in the driver or
>>> handed over to firmware already.
>>
>> OK, thanks.
> 
> Actually, it's not OK. I was trying to report/describe/discuss this
> problem for over a week. I couldn't get much of answer from you.
> 
> I had to come with a patch I worked on for quite some time. Only then
> you decided to react and reply with a reason for a nack. I see this
> patch may be wrong (but it's still hard to know what's going wrong
> without a proper hostapd bug report). I'd expect you to somehow work &
> communicate with open source community.

We do or at least make an honest attempt, but there is more on our plate
so responses may be delayed. It also does not help when you get anal and
preachy when we do respond. Also not OK. In this case the delay is
caused because I had to pick up the thread(s) as Hante is on vacation
(he needed a break :-p ). However, you started sending patches so I
decided to look at and respond to those. Sorry if you felt like we left
you hanging to dry.

Regards,
Arend


Re: [PATCH net-next] net/sched: cls_flower: Use a proper mask value for enc key id parameter

2016-09-27 Thread Amir Vadai
On Tue, Sep 27, 2016 at 11:21:18AM +0300, Hadar Hen Zion wrote:
> The current code uses the encapsulation key id value as the mask of that
> parameter which is wrong. Fix that by using a full mask.
> 
> Fixes: bc3103f1ed40 ('net/sched: cls_flower: Classify packet in ip tunnels')
> Signed-off-by: Hadar Hen Zion 
> ---

Acked-by: Amir Vadai 


Explaining RX-stages for XDP

2016-09-27 Thread Jesper Dangaard Brouer

Let me try in a calm way (not like [1]) to explain how I imagine that
the XDP processing RX-stage should be implemented. As I've pointed out
before[2], I'm proposing splitting up the driver into RX-stages.  This
is a mental-model change, I hope you can follow my "inception" attempt.

The basic concept behind this idea is: if the RX-ring contains
multiple "ready" packets, then the kernel was too slow processing
incoming packets. Thus, switch into a more efficient mode, which is a
"packet-vector" mode.

Today, our XDP micro-benchmarks look amazing, and they are!  But once
real-life intermixed traffic is used, we lose the XDP I-cache
benefit.  XDP is meant for DoS protection, and an attacker can easily
construct intermixed traffic.  Why not fix this architecturally?

Most important concept: If XDP returns XDP_PASS, do NOT pass the
packet up the network stack immediately (that would flush I-cache).
Instead store the packet for the next RX-stage.  Basically splitting
the packet-vector into two packet-vectors, one for network-stack and
one for XDP.  Thus, intermixed XDP vs. netstack traffic no longer has
an effect on XDP performance.

The reason for also creating an XDP packet-vector is to move the
XDP_TX transmit code out of the XDP processing stage (and future
features).  This maximizes I-cache availability to the eBPF program,
and makes eBPF performance more uniform across drivers.


Inception:
 * Instead of individual packets, see it as a RX packet-vector.
 * XDP should be seen as a stage *before* the network stack gets called.

If your mind can handle it: I'm NOT proposing a RX-vector of 64-packets.
I actually want N packets per vector (8-16), as the NIC HW RX process
runs concurrently, and in the time it takes to process N packets, more
packets have had a chance to arrive in the RX-ring queue.
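
A rough userspace sketch of the staging idea, under loud assumptions: the
sketch_* types, the verdict enum and the example program are invented for
illustration and are not the XDP API or any driver's RX path. The first
stage only runs the program over a small RX vector and sorts packets into
a "to stack" vector and a "to TX" vector; handing packets to the network
stack or to the transmit code would happen in later stages.

```c
#include <stddef.h>

enum sketch_verdict { SKETCH_DROP, SKETCH_PASS, SKETCH_TX };

#define SKETCH_VEC_MAX 16 /* N packets per vector, per the 8-16 suggestion */

struct sketch_pkt {
	int id;
};

struct sketch_vec {
	struct sketch_pkt *pkt[SKETCH_VEC_MAX];
	size_t len;
};

typedef enum sketch_verdict (*sketch_prog_t)(const struct sketch_pkt *);

/* Stage 1: run the program over the whole RX vector and only sort packets
 * into per-verdict vectors; nothing is handed to the stack or TX here. */
static void sketch_xdp_stage(struct sketch_pkt *rx, size_t n,
			     sketch_prog_t prog,
			     struct sketch_vec *to_stack,
			     struct sketch_vec *to_tx)
{
	to_stack->len = 0;
	to_tx->len = 0;
	if (n > SKETCH_VEC_MAX)
		n = SKETCH_VEC_MAX;
	for (size_t i = 0; i < n; i++) {
		switch (prog(&rx[i])) {
		case SKETCH_PASS:
			to_stack->pkt[to_stack->len++] = &rx[i];
			break;
		case SKETCH_TX:
			to_tx->pkt[to_tx->len++] = &rx[i];
			break;
		default: /* SKETCH_DROP: packet is simply not queued */
			break;
		}
	}
}

/* Example program: even ids go to the stack, odd ids divisible by 3 are
 * sent back out, everything else is dropped. */
static enum sketch_verdict sketch_prog(const struct sketch_pkt *p)
{
	if (p->id % 2 == 0)
		return SKETCH_PASS;
	if (p->id % 3 == 0)
		return SKETCH_TX;
	return SKETCH_DROP;
}
```

Because the loop body contains only the program call and two stores, the
program's instructions are not evicted by stack or transmit code between
packets, which is the I-cache argument made above.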

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] https://mid.mail-archive.com/netdev@vger.kernel.org/msg127043.html

[2] http://lists.openwall.net/netdev/2016/01/15/51  

[3] http://lists.openwall.net/netdev/2016/04/19/89


Re: ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread SF Markus Elfring
>> Will it matter here if the function "kfree" will be called for the
>> data structure members "bcs" and "inbuf" after a later function call
>> failed within the implementation of "gigaset_initcs"?
> 
> My translation of this question is: could you please hold my hand while
> I read the code of a driver I do not use - a driver for hardware that I
> don't even have, and therefore cannot really test - after I submitted a
> patch that appears to be broken?

I got the impression that the exception handling was incomplete in the
implementation of the function "gigaset_initcs".
Does anybody (besides me) care for improving the software situation there?


> My answer to that question is: no, sorry, I won't do that.

I find that the process has just started once more to clarify in which
directions this software module could evolve.

Do we discuss a potential memory leak under special conditions here?

Regards,
Markus


Re: [PATCH v4 5/7] ipv6 addrconf: implement RFC7559 router solicitation backoff

2016-09-27 Thread Maciej Żenczykowski
> Please just use do_div here and go back to the first version of the
> patch. Variable names could be more aligned with the RFC maybe?

So I tried:

static inline s32 rfc3315_s14_backoff_init(s32 irt)
{
	/* multiply 'initial retransmission time' by 0.9 .. 1.1 */
	u64 tmp = (90 + prandom_u32() % 21) * (u64)irt;

	do_div(tmp, 100);
	return (s32)tmp;
}

static inline s32 rfc3315_s14_backoff_update(s32 rt, s32 mrt)
{
	/* multiply 'retransmission timeout' by 1.9 .. 2.1 */
	u64 tmp = (190 + prandom_u32() % 21) * (u64)rt;

	do_div(tmp, 100);
	if ((s32)tmp > mrt) {
		/* multiply 'maximum retransmission time' by 0.9 .. 1.1 */
		tmp = (90 + prandom_u32() % 21) * (u64)mrt;
		do_div(tmp, 100);
	}
	return (s32)tmp;
}

but then building for i386 I get:

ERROR: "__udivdi3" [net/netfilter/xt_hashlimit.ko] undefined!

which happens even at net-next/master itself.

Anyway, I'll resubmit assuming the above is what you're looking for...
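
For checking the arithmetic outside the kernel, here is a userspace sketch
of the same backoff. It is not the submitted code: the sketch_* names are
assumptions, the random term is passed in as a jitter argument (0..20) so
the result is reproducible, the same jitter is reused for the cap where the
kernel version draws a fresh random value, and plain 64-bit division is
used because userspace does not need do_div(). Inputs are assumed to be
non-negative.

```c
#include <stdint.h>

/* Scale val by pct/100 using 64-bit intermediate math. */
static int32_t sketch_scale(int32_t val, uint32_t pct)
{
	return (int32_t)(((uint64_t)pct * (uint64_t)val) / 100);
}

/* First timeout: IRT scaled by 0.9 .. 1.1; jitter must be in 0..20. */
static int32_t sketch_backoff_init(int32_t irt, uint32_t jitter)
{
	return sketch_scale(irt, 90 + jitter);
}

/* Next timeout: RT scaled by 1.9 .. 2.1, capped near MRT (0.9 .. 1.1). */
static int32_t sketch_backoff_update(int32_t rt, int32_t mrt, uint32_t jitter)
{
	int32_t next = sketch_scale(rt, 190 + jitter);

	if (next > mrt)
		next = sketch_scale(mrt, 90 + jitter);
	return next;
}
```

With irt = 4000 ms the first timeout falls between 3600 and 4400 ms, and
once doubling would exceed mrt the timeout settles within 10% of mrt.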


Re: ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread Paul Bolle
You're in Eliza mode again.

(Hat tip to Björn Mork, https://lkml.org/lkml/2016/1/4/259).


Paul Bolle


Re: [PATCH v4 4/7] ipv6 addrconf: add new sysctl 'router_solicitation_max_interval'

2016-09-27 Thread Hannes Frederic Sowa
On 27.09.2016 04:30, Maciej Żenczykowski wrote:
>> Is seconds granular enough?
> 
> The only reason why one would ever want to go into fractions of
> seconds would be some sort of unittesting with very low delays.
> 
> In any normal environment the max is going to be tens if not hundreds
> or thousands of seconds.
> 
> Also note that the delay and interval (ie. not max interval) are also
> currently exported in seconds, so having more granularity for
> max_seconds is kind of pointless.
> 
> I have been considering whether I could make proc_dointvec_jiffies
> accept floating point input (and output) though... although that seems
> a little harder and probably out of scope of this change.

Good point. Using ms should actually be easy, instead of
proc_dointvec_jiffies you can use proc_dointvec_ms_jiffies.

It seems good practice to add a _ms then to the sysctl, too.

I am fine with both ways.

Bye,
Hannes




[PATCH v5 4/7] ipv6 addrconf: add new sysctl 'router_solicitation_max_interval'

2016-09-27 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

Accessible via:
  /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval

For now we default it to the same value as the normal interval.

Signed-off-by: Maciej Żenczykowski 
---
 include/linux/ipv6.h  |  1 +
 include/net/addrconf.h|  1 +
 include/uapi/linux/ipv6.h |  1 +
 net/ipv6/addrconf.c   | 11 +++
 4 files changed, 14 insertions(+)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index c6dbcd84a2c7..7e9a789be5e0 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -18,6 +18,7 @@ struct ipv6_devconf {
__s32   dad_transmits;
__s32   rtr_solicits;
__s32   rtr_solicit_interval;
+   __s32   rtr_solicit_max_interval;
__s32   rtr_solicit_delay;
__s32   force_mld_version;
__s32   mldv1_unsolicited_report_interval;
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 9826d3a9464c..275e5af4c2f4 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -3,6 +3,7 @@
 
 #define MAX_RTR_SOLICITATIONS  3
 #define RTR_SOLICITATION_INTERVAL  (4*HZ)
+#define RTR_SOLICITATION_MAX_INTERVAL  (4*HZ)
 
 #define MIN_VALID_LIFETIME (2*3600)/* 2 hours */
 
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 395876060f50..8c2772340c3f 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -177,6 +177,7 @@ enum {
DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
DEVCONF_DROP_UNSOLICITED_NA,
DEVCONF_KEEP_ADDR_ON_DOWN,
+   DEVCONF_RTR_SOLICIT_MAX_INTERVAL,
DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1e59c0034916..84c46950876a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -187,6 +187,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.dad_transmits  = 1,
.rtr_solicits   = MAX_RTR_SOLICITATIONS,
.rtr_solicit_interval   = RTR_SOLICITATION_INTERVAL,
+   .rtr_solicit_max_interval = RTR_SOLICITATION_MAX_INTERVAL,
.rtr_solicit_delay  = MAX_RTR_SOLICITATION_DELAY,
.use_tempaddr   = 0,
.temp_valid_lft = TEMP_VALID_LIFETIME,
@@ -232,6 +233,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly 
= {
.dad_transmits  = 1,
.rtr_solicits   = MAX_RTR_SOLICITATIONS,
.rtr_solicit_interval   = RTR_SOLICITATION_INTERVAL,
+   .rtr_solicit_max_interval = RTR_SOLICITATION_MAX_INTERVAL,
.rtr_solicit_delay  = MAX_RTR_SOLICITATION_DELAY,
.use_tempaddr   = 0,
.temp_valid_lft = TEMP_VALID_LIFETIME,
@@ -4891,6 +4893,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf 
*cnf,
array[DEVCONF_RTR_SOLICITS] = cnf->rtr_solicits;
array[DEVCONF_RTR_SOLICIT_INTERVAL] =
jiffies_to_msecs(cnf->rtr_solicit_interval);
+   array[DEVCONF_RTR_SOLICIT_MAX_INTERVAL] =
+   jiffies_to_msecs(cnf->rtr_solicit_max_interval);
array[DEVCONF_RTR_SOLICIT_DELAY] =
jiffies_to_msecs(cnf->rtr_solicit_delay);
array[DEVCONF_FORCE_MLD_VERSION] = cnf->force_mld_version;
@@ -5771,6 +5775,13 @@ static const struct ctl_table addrconf_sysctl[] = {
.proc_handler   = proc_dointvec_jiffies,
},
{
+   .procname   = "router_solicitation_max_interval",
+   .data   = &ipv6_devconf.rtr_solicit_max_interval,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_jiffies,
+   },
+   {
.procname   = "router_solicitation_delay",
.data   = &ipv6_devconf.rtr_solicit_delay,
.maxlen = sizeof(int),
-- 
2.8.0.rc3.226.g39d4020



[PATCH v5 3/7] ipv6 addrconf: rtr_solicits == -1 means unlimited

2016-09-27 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This allows setting /proc/sys/net/ipv6/conf/*/router_solicitations
to -1 meaning an unlimited number of retransmits.

Signed-off-by: Maciej Żenczykowski 
---
 net/ipv6/addrconf.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8bd2d06eefe7..1e59c0034916 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3687,7 +3687,7 @@ static void addrconf_rs_timer(unsigned long data)
if (idev->if_flags & IF_RA_RCVD)
goto out;
 
-   if (idev->rs_probes++ < idev->cnf.rtr_solicits) {
+   if (idev->rs_probes++ < idev->cnf.rtr_solicits || idev->cnf.rtr_solicits == -1) {
write_unlock(&idev->lock);
if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
ndisc_send_rs(dev, &lladdr,
@@ -3949,7 +3949,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
send_mld = ifp->scope == IFA_LINK && ipv6_lonely_lladdr(ifp);
send_rs = send_mld &&
  ipv6_accept_ra(ifp->idev) &&
- ifp->idev->cnf.rtr_solicits > 0 &&
+ ifp->idev->cnf.rtr_solicits != 0 &&
  (dev->flags&IFF_LOOPBACK) == 0;
read_unlock_bh(&ifp->idev->lock);
 
@@ -5099,7 +5099,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
return -EINVAL;
if (!ipv6_accept_ra(idev))
return -EINVAL;
-   if (idev->cnf.rtr_solicits <= 0)
+   if (idev->cnf.rtr_solicits == 0)
return -EINVAL;
 
write_lock_bh(&idev->lock);
@@ -5699,6 +5699,7 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct ctl_table *ctl,
return ret;
 }
 
+static const int minus_one = -1;
 static const int one = 1;
 static const int two_five_five = 255;
 
@@ -5759,7 +5760,8 @@ static const struct ctl_table addrconf_sysctl[] = {
.data   = &ipv6_devconf.rtr_solicits,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = (void *)&minus_one,
},
{
.procname   = "router_solicitation_interval",
-- 
2.8.0.rc3.226.g39d4020
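
For readers following the rtr_solicits semantics introduced above, here is a
minimal userspace sketch of the retransmit decision. The function name
rs_should_retransmit() is hypothetical and not part of the patch; it only
models the patched condition in addrconf_rs_timer(), where 0 disables
solicitations, a positive value caps them, and -1 means unlimited.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not a kernel function: models the patched test
 * "rs_probes++ < rtr_solicits || rtr_solicits == -1".
 * probes_sent is the number of solicitations sent before this timer tick.
 */
static bool rs_should_retransmit(unsigned int probes_sent, int rtr_solicits)
{
	/* The sysctl lower bound set through extra1 in the patch is -1. */
	assert(rtr_solicits >= -1);

	if (rtr_solicits == -1)
		return true;	/* unlimited retransmits */

	return probes_sent < (unsigned int)rtr_solicits;
}
```

With this model, rs_should_retransmit(3, 3) is false (the old default stops
after three probes), while any probe count with -1 keeps retransmitting.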



[PATCH v5 6/7] ipv6 addrconf: change default RTR_SOLICITATION_MAX_INTERVAL from 4s to 1h

2016-09-27 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This changes:
  /proc/sys/net/ipv6/conf/all/router_solicitation_max_interval
  /proc/sys/net/ipv6/conf/default/router_solicitation_max_interval
from 4 seconds to 1 hour.

This is the https://tools.ietf.org/html/rfc7559 recommended default.

Signed-off-by: Maciej Żenczykowski 
---
 include/net/addrconf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 275e5af4c2f4..8f3677269f9a 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -3,7 +3,7 @@
 
 #define MAX_RTR_SOLICITATIONS  3
 #define RTR_SOLICITATION_INTERVAL  (4*HZ)
-#define RTR_SOLICITATION_MAX_INTERVAL  (4*HZ)
+#define RTR_SOLICITATION_MAX_INTERVAL  (3600*HZ)   /* 1 hour */
 
 #define MIN_VALID_LIFETIME (2*3600)/* 2 hours */
 
-- 
2.8.0.rc3.226.g39d4020
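
To illustrate what the larger cap changes in practice, the sketch below walks
the doubling schedule with the random factor deliberately left out. Both
helper names are hypothetical, and the values are in seconds rather than
jiffies; this is a simplification for illustration, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: next interval in seconds, doubling and then capping
 * at mrt_s. The 0.9 .. 1.1 randomization from RFC 3315 is omitted here.
 */
static int32_t next_interval_no_jitter(int32_t rt_s, int32_t mrt_s)
{
	int64_t doubled;

	assert(rt_s > 0 && mrt_s > 0);
	doubled = 2 * (int64_t)rt_s;
	return doubled > mrt_s ? mrt_s : (int32_t)doubled;
}

/* Count how many updates it takes to reach the cap, starting from irt_s. */
static int steps_to_cap(int32_t irt_s, int32_t mrt_s)
{
	int steps = 0;
	int32_t rt = irt_s;

	while (rt < mrt_s) {
		rt = next_interval_no_jitter(rt, mrt_s);
		steps++;
	}
	return steps;
}
```

With the old 4 second cap the interval never grows, so steps_to_cap(4, 4) is
0. With the 1 hour cap the sequence is 4, 8, 16, ..., 2048, 3600, so
steps_to_cap(4, 3600) is 10.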



[PATCH v5 5/7] ipv6 addrconf: implement RFC7559 router solicitation backoff

2016-09-27 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This implements:
  https://tools.ietf.org/html/rfc7559

Backoff is performed according to RFC3315 section 14:
  https://tools.ietf.org/html/rfc3315#section-14

Signed-off-by: Maciej Żenczykowski 
---
 include/net/if_inet6.h |  1 +
 net/ipv6/addrconf.c| 34 ++
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 1c8b6820b694..515352c6280a 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -201,6 +201,7 @@ struct inet6_dev {
struct ipv6_devstat stats;
 
struct timer_list   rs_timer;
+   __s32   rs_interval;/* in jiffies */
__u8rs_probes;
 
__u8addr_gen_mode;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 84c46950876a..dc287f57c39b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -112,6 +112,27 @@ static inline u32 cstamp_delta(unsigned long cstamp)
return (cstamp - INITIAL_JIFFIES) * 100UL / HZ;
 }
 
+static inline s32 rfc3315_s14_backoff_init(s32 irt)
+{
+   /* multiply 'initial retransmission time' by 0.9 .. 1.1 */
+   u64 tmp = (90 + prandom_u32() % 21) * (u64)irt;
+   do_div(tmp, 100);
+   return (s32)tmp;
+}
+
+static inline s32 rfc3315_s14_backoff_update(s32 rt, s32 mrt)
+{
+   /* multiply 'retransmission timeout' by 1.9 .. 2.1 */
+   u64 tmp = (190 + prandom_u32() % 21) * (u64)rt;
+   do_div(tmp, 100);
+   if ((s32)tmp > mrt) {
+   /* multiply 'maximum retransmission time' by 0.9 .. 1.1 */
+   tmp = (90 + prandom_u32() % 21) * (u64)mrt;
+   do_div(tmp, 100);
+   }
+   return (s32)tmp;
+}
+
 #ifdef CONFIG_SYSCTL
 static int addrconf_sysctl_register(struct inet6_dev *idev);
 static void addrconf_sysctl_unregister(struct inet6_dev *idev);
@@ -3698,11 +3719,13 @@ static void addrconf_rs_timer(unsigned long data)
goto put;
 
write_lock(&idev->lock);
+   idev->rs_interval = rfc3315_s14_backoff_update(
+   idev->rs_interval, idev->cnf.rtr_solicit_max_interval);
/* The wait after the last probe can be shorter */
addrconf_mod_rs_timer(idev, (idev->rs_probes ==
 idev->cnf.rtr_solicits) ?
  idev->cnf.rtr_solicit_delay :
- idev->cnf.rtr_solicit_interval);
+ idev->rs_interval);
} else {
/*
 * Note: we do not support deprecated "all on-link"
@@ -3973,10 +3996,11 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 
write_lock_bh(&ifp->idev->lock);
spin_lock(&ifp->lock);
+   ifp->idev->rs_interval = rfc3315_s14_backoff_init(
+   ifp->idev->cnf.rtr_solicit_interval);
ifp->idev->rs_probes = 1;
ifp->idev->if_flags |= IF_RS_SENT;
-   addrconf_mod_rs_timer(ifp->idev,
- ifp->idev->cnf.rtr_solicit_interval);
+   addrconf_mod_rs_timer(ifp->idev, ifp->idev->rs_interval);
spin_unlock(&ifp->lock);
write_unlock_bh(&ifp->idev->lock);
}
@@ -5132,8 +5156,10 @@ update_lft:
 
if (update_rs) {
idev->if_flags |= IF_RS_SENT;
+   idev->rs_interval = rfc3315_s14_backoff_init(
+   idev->cnf.rtr_solicit_interval);
idev->rs_probes = 1;
-   addrconf_mod_rs_timer(idev, idev->cnf.rtr_solicit_interval);
+   addrconf_mod_rs_timer(idev, idev->rs_interval);
}
 
/* Well, that's kinda nasty ... */
-- 
2.8.0.rc3.226.g39d4020
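
The two backoff helpers in this patch can be tried outside the kernel with a
small port. This is a sketch under stated assumptions: rand() stands in for
prandom_u32(), plain 64-bit division stands in for do_div(), and the function
names are shortened. It is meant for experimenting with the arithmetic, not as
a replacement for the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: rand() replaces the kernel's prandom_u32(), and ordinary
 * 64-bit division replaces do_div(). Intervals are in arbitrary ticks.
 */
static int32_t backoff_init(int32_t irt)
{
	uint64_t tmp;

	assert(irt >= 0);
	/* scale 'initial retransmission time' by 0.90 .. 1.10 */
	tmp = (90 + (uint32_t)rand() % 21) * (uint64_t)irt;
	return (int32_t)(tmp / 100);
}

static int32_t backoff_update(int32_t rt, int32_t mrt)
{
	uint64_t tmp;

	assert(rt >= 0 && mrt >= 0);
	/* scale 'retransmission timeout' by 1.90 .. 2.10 */
	tmp = (190 + (uint32_t)rand() % 21) * (uint64_t)rt;
	tmp /= 100;
	if ((int32_t)tmp > mrt) {
		/* past the cap: use 'maximum retransmission time' * 0.90 .. 1.10 */
		tmp = (90 + (uint32_t)rand() % 21) * (uint64_t)mrt / 100;
	}
	return (int32_t)tmp;
}
```

For an initial interval of 4000 ticks, backoff_init() returns a value between
3600 and 4400, and the first backoff_update() of 4000 lands between 7600 and
8400 as long as the cap is larger than that.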



[PATCH v5 2/7] ipv6 addrconf: remove addrconf_sysctl_hop_limit()

2016-09-27 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

replace with extra1/2 magic

Signed-off-by: Maciej Żenczykowski 
---
 net/ipv6/addrconf.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 11fa1a5564d4..8bd2d06eefe7 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5467,20 +5467,6 @@ int addrconf_sysctl_forward(struct ctl_table *ctl, int write,
 }
 
 static
-int addrconf_sysctl_hop_limit(struct ctl_table *ctl, int write,
-  void __user *buffer, size_t *lenp, loff_t *ppos)
-{
-   struct ctl_table lctl;
-   int min_hl = 1, max_hl = 255;
-
-   lctl = *ctl;
-   lctl.extra1 = &min_hl;
-   lctl.extra2 = &max_hl;
-
-   return proc_dointvec_minmax(&lctl, write, buffer, lenp, ppos);
-}
-
-static
 int addrconf_sysctl_mtu(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -5713,6 +5699,9 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct ctl_table *ctl,
return ret;
 }
 
+static const int one = 1;
+static const int two_five_five = 255;
+
 static const struct ctl_table addrconf_sysctl[] = {
{
.procname   = "forwarding",
@@ -5726,7 +5715,9 @@ static const struct ctl_table addrconf_sysctl[] = {
.data   = &ipv6_devconf.hop_limit,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = addrconf_sysctl_hop_limit,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = (void *)&one,
+   .extra2 = (void *)&two_five_five,
},
{
.procname   = "mtu",
-- 
2.8.0.rc3.226.g39d4020
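
The "extra1/2 magic" relies on proc_dointvec_minmax reading its bounds from
the table entry itself. The sketch below is a simplified userspace model of
that idea, assuming (as far as I know) that out-of-range writes are rejected
with -EINVAL rather than clamped. struct ctl_entry_model and
store_int_minmax() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct ctl_table; only the fields used here. */
struct ctl_entry_model {
	int *data;		/* the value being configured */
	const int *extra1;	/* lower bound, or NULL for none */
	const int *extra2;	/* upper bound, or NULL for none */
};

/* Store val if it lies within [*extra1, *extra2]; otherwise leave the
 * current value untouched and report -EINVAL.
 */
static int store_int_minmax(const struct ctl_entry_model *e, int val)
{
	assert(e && e->data);

	if (e->extra1 && val < *e->extra1)
		return -EINVAL;
	if (e->extra2 && val > *e->extra2)
		return -EINVAL;

	*e->data = val;
	return 0;
}
```

With bounds of 1 and 255, as in the hop_limit entry above, writing 0 or 256
fails and the previous value is kept, while 255 is accepted.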



[PATCH v5 0/7] implement rfc7559 ipv6 router solicitation backoff

2016-09-27 Thread Maciej Żenczykowski
Hi,

This patch series implements RFC7559 style backoff of IPv6 router
solicitation requests.

Patches 1 and 2 are minor cleanup and stand on their own.

Patch 3 allows a (potentially) infinite number of RS'es to be sent
when the rtr_solicits sysctl is set to -1 (this depends on patch 1).

Patch 4 is just boilerplate to add a new sysctl for the maximum
backoff period.

Patch 5 implements the backoff algorithm (and depends on the previous
patches).

Patches 6 and 7 switch the defaults over to enable this by default
(defaults come from the RFC).

[PATCH v5 1/7] ipv6 addrconf: enable use of proc_dointvec_minmax in
[PATCH v5 2/7] ipv6 addrconf: remove addrconf_sysctl_hop_limit()
[PATCH v5 3/7] ipv6 addrconf: rtr_solicits == -1 means unlimited
[PATCH v5 4/7] ipv6 addrconf: add new sysctl
[PATCH v5 5/7] ipv6 addrconf: implement RFC7559 router solicitation
[PATCH v5 6/7] ipv6 addrconf: change default
[PATCH v5 7/7] ipv6 addrconf: change default MAX_RTR_SOLICITATIONS

Changes v4->v5:
  added 'const' qualifier to extra1/2 constants - requires (void*) casting
  switched away from shifts by 20 to do_div(..., 100)
  switched to variable names from the rfc, elaborated a bit in the comments

Changes v3->v4:
  added subject line to cover letter

Changes v2->v3:
  added cover letter

Changes v1->v2:
  avoid 64-bit divisions to fix 32-bit build errors


[PATCH v5 1/7] ipv6 addrconf: enable use of proc_dointvec_minmax in addrconf_sysctl

2016-09-27 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

Signed-off-by: Maciej Żenczykowski 
---
 net/ipv6/addrconf.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2f1f5d439788..11fa1a5564d4 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6044,8 +6044,14 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
 
for (i = 0; table[i].data; i++) {
table[i].data += (char *)p - (char *)&ipv6_devconf;
-   table[i].extra1 = idev; /* embedded; no ref */
-   table[i].extra2 = net;
+   /* If one of these is already set, then it is not safe to
+* overwrite either of them: this makes proc_dointvec_minmax
+* usable.
+*/
+   if (!table[i].extra1 && !table[i].extra2) {
+   table[i].extra1 = idev; /* embedded; no ref */
+   table[i].extra2 = net;
+   }
}
 
snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name);
-- 
2.8.0.rc3.226.g39d4020



[PATCH v5 7/7] ipv6 addrconf: change default MAX_RTR_SOLICITATIONS from 3 to -1 (unlimited)

2016-09-27 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This changes:
  /proc/sys/net/ipv6/conf/all/router_solicitations
  /proc/sys/net/ipv6/conf/default/router_solicitations
from 3 to unlimited.

This is the https://tools.ietf.org/html/rfc7559 recommended default.

Signed-off-by: Maciej Żenczykowski 
---
 include/net/addrconf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 8f3677269f9a..f2d072787947 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -1,7 +1,7 @@
 #ifndef _ADDRCONF_H
 #define _ADDRCONF_H
 
-#define MAX_RTR_SOLICITATIONS  3
+#define MAX_RTR_SOLICITATIONS  -1  /* unlimited */
 #define RTR_SOLICITATION_INTERVAL  (4*HZ)
 #define RTR_SOLICITATION_MAX_INTERVAL  (3600*HZ)   /* 1 hour */
 
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH net 2/5] sctp: reuse sent_count to avoid retransmitted chunks for RTT measurements

2016-09-27 Thread Xin Long
>
> Maybe wrap this in a macro?  i.e.:
> #define chunk_retransmitted(chunk) (chunk->sent_count > 1)
>
> For readability?
>
That's a nice suggestion.
chunk->sent_count == 1 is confusing to read there.
I will improve it in v2.

Thanks.
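
A minimal sketch of the suggested macro follows, with the argument
parenthesized so it is safe with arbitrary expressions. struct chunk_model
and rtt_sample_allowed() are hypothetical stand-ins for illustration; the real
SCTP chunk structure has many more fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SCTP chunk; only the counter matters here. */
struct chunk_model {
	int sent_count;
};

/* True once the chunk has been sent more than once. */
#define chunk_retransmitted(chunk) ((chunk)->sent_count > 1)

/* Only chunks that were sent exactly once are used for RTT measurement. */
static inline bool rtt_sample_allowed(const struct chunk_model *c)
{
	assert(c && c->sent_count >= 1);
	return !chunk_retransmitted(c);
}
```

The call site then reads as !chunk_retransmitted(chunk) instead of comparing
sent_count against 1 directly.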


Re: [PATCH v4 5/7] ipv6 addrconf: implement RFC7559 router solicitation backoff

2016-09-27 Thread Hannes Frederic Sowa
On 27.09.2016 11:42, Maciej Żenczykowski wrote:
>> Please just use do_div here and go back to the first version of the
>> patch. Variable names could be more aligned with the RFC maybe?
> 
> So I tried:
> 
> static inline s32 rfc3315_s14_backoff_init(s32 irt)
>  {
>/* multiply 'initial retransmission time' by 0.9 .. 1.1 */
>u64 tmp = (90 + prandom_u32() % 21) * (u64)irt;
>do_div(tmp, 100);
>return (s32)tmp;
>  }
> 
> static inline s32 rfc3315_s14_backoff_update(s32 rt, s32 mrt)
>  {
>/* multiply 'retransmission timeout' by 1.9 .. 2.1 */
>u64 tmp = (190 + prandom_u32() % 21) * (u64)rt;
>do_div(tmp, 100);
>if ((s32)tmp > mrt) {
>/* multiply 'maximum retransmission time' by 0.9 .. 1.1 */
>tmp = (90 + prandom_u32() % 21) * (u64)mrt;
>do_div(tmp, 100);
> }
>return (s32)tmp;
> }
> 
> but then building for i386 I get:
> 
> ERROR: "__udivdi3" [net/netfilter/xt_hashlimit.ko] undefined!
> 
> which happens even at net-next/master itself.
> 
> Anyway, I'll resubmit assuming the above is what you're looking for...

I think the __udivdi3 comes from the modulo operation: the reciprocal
divide optimization doesn't work in this case, so you end up with a call
into libgcc.

Can you use the remainder from the do_div operation also?

u32 r = prandom_u32();
u64 tmp = (90 + do_div(r,21)) * (u64)irt;

Depending on if you keep the values in ms or jiffies, maybe it would
make sense to simply use msecs_to_jiffies and vice versa?

Thanks,
Hannes
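
The remainder idea above can be modelled in userspace as follows.
model_do_div() is a hypothetical function that mimics the documented
behaviour of do_div() (divide in place, return the remainder); it is not the
kernel macro. As far as I know the kernel's do_div() expects a 64-bit
dividend, so this sketch widens the random value first. The random value is
passed in as a parameter so the result is reproducible.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of do_div(n, base): *n becomes the quotient and the
 * remainder is returned.
 */
static uint32_t model_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(n && base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/* 'rnd' would come from prandom_u32() in the kernel. */
static int32_t backoff_init_rem(int32_t irt, uint32_t rnd)
{
	uint64_t r = rnd;	/* widened so the dividend is 64-bit */
	uint64_t tmp = (90 + model_do_div(&r, 21)) * (uint64_t)irt;

	model_do_div(&tmp, 100);
	return (int32_t)tmp;
}
```

For irt = 4000, a random value of 0 gives 3600 and a random value of 20 gives
4400, which matches the 0.9 .. 1.1 range in the original comment.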



Re: [PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Arend Van Spriel
On 27-9-2016 11:14, Rafał Miłecki wrote:
> From: Rafał Miłecki 
> 
> Flowrings contain skbs waiting for transmission that were passed to us
> by netif. It means we checked every one of them looking for 802.1x
> Ethernet type. When deleting flowring we have to use freeing function
> that will check for 802.1x type as well.
> 
> Freeing skbs without a proper check was leading to counter not being
> properly decreased. This was triggering a WARNING every time
> brcmf_netdev_wait_pend8021x was called.

Acked-by: Arend van Spriel 
> Signed-off-by: Rafał Miłecki 
> ---
> Kalle: this isn't important enough for 4.8 as it's too late for that.
> 
> I'd like to get it for 4.9 however, as this fixes a bug that could lead
> to a WARNING on every add_key/del_key call. We were struggling with these
> WARNINGs for some time and this fixes one of two problems causing them.

Please mark it for stable as well.

> ---
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
> index b16b367..d0b738d 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
> @@ -234,13 +234,20 @@ static void brcmf_flowring_block(struct brcmf_flowring 
> *flow, u16 flowid,
>  
>  void brcmf_flowring_delete(struct brcmf_flowring *flow, u16 flowid)
>  {
> + struct brcmf_bus *bus_if = dev_get_drvdata(flow->dev);
>   struct brcmf_flowring_ring *ring;
> + struct brcmf_if *ifp;
>   u16 hash_idx;
> + u8 ifidx;
>   struct sk_buff *skb;
>  
>   ring = flow->rings[flowid];
>   if (!ring)
>   return;
> +
> + ifidx = brcmf_flowring_ifidx_get(flow, flowid);
> + ifp = brcmf_get_ifp(bus_if->drvr, ifidx);
> +
>   brcmf_flowring_block(flow, flowid, false);
>   hash_idx = ring->hash_id;
>   flow->hash[hash_idx].ifidx = BRCMF_FLOWRING_INVALID_IFIDX;

I am not very familiar with flowring code, but I suppose this is just
initializing the entry for later use, right?

> @@ -249,7 +256,7 @@ void brcmf_flowring_delete(struct brcmf_flowring *flow, 
> u16 flowid)
>  
>   skb = skb_dequeue(&ring->skblist);
>   while (skb) {
> - brcmu_pkt_buf_free_skb(skb);
> + brcmf_txfinalize(ifp, skb, false);
>   skb = skb_dequeue(&ring->skblist);
>   }
>  
> 


Re: linux-next: Tree for Sep 27

2016-09-27 Thread Sergey Senozhatsky
Hello,

On (09/27/16 16:40), Stephen Rothwell wrote:
> 
> Changes since 20160923:
> 

seems that commit e3b37f11e6e4e6b6 ("netfilter: replace list_head with
single linked list") breaks the build on !CONFIG_NETFILTER_INGRESS systems
accessing ->nf_hooks_ingress

static void nf_set_hooks_head(struct net *net, const struct nf_hook_ops *reg,
 struct nf_hook_entry *entry)
{
   switch (reg->pf) {
   case NFPROTO_NETDEV:
   /* We already checked in nf_register_net_hook() that this is
* used from ingress.
*/
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);



log:


In file included from ./include/linux/linkage.h:4:0,
 from ./include/linux/kernel.h:6,
 from net/netfilter/core.c:10:
net/netfilter/core.c: In function ‘nf_set_hooks_head’:
net/netfilter/core.c:96:30: error: ‘struct net_device’ has no member named 
‘nf_hooks_ingress’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
  ^
./include/linux/compiler.h:322:17: note: in definition of macro ‘WRITE_ONCE’
  union { typeof(x) __val; char __c[1]; } __u = \
 ^
net/netfilter/core.c:96:3: note: in expansion of macro ‘rcu_assign_pointer’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
   ^~
net/netfilter/core.c:96:30: error: ‘struct net_device’ has no member named 
‘nf_hooks_ingress’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
  ^
./include/linux/compiler.h:323:30: note: in definition of macro ‘WRITE_ONCE’
   { .__val = (__force typeof(x)) (val) }; \
  ^
net/netfilter/core.c:96:3: note: in expansion of macro ‘rcu_assign_pointer’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
   ^~
net/netfilter/core.c:96:30: error: ‘struct net_device’ has no member named 
‘nf_hooks_ingress’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
  ^
./include/linux/compiler.h:323:35: note: in definition of macro ‘WRITE_ONCE’
   { .__val = (__force typeof(x)) (val) }; \
   ^~~
net/netfilter/core.c:96:3: note: in expansion of macro ‘rcu_assign_pointer’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
   ^~
net/netfilter/core.c:96:30: error: ‘struct net_device’ has no member named 
‘nf_hooks_ingress’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
  ^
./include/linux/compiler.h:324:22: note: in definition of macro ‘WRITE_ONCE’
  __write_once_size(&(x), __u.__c, sizeof(x)); \
  ^
net/netfilter/core.c:96:3: note: in expansion of macro ‘rcu_assign_pointer’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
   ^~
net/netfilter/core.c:96:30: error: ‘struct net_device’ has no member named 
‘nf_hooks_ingress’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
  ^
./include/linux/compiler.h:324:42: note: in definition of macro ‘WRITE_ONCE’
  __write_once_size(&(x), __u.__c, sizeof(x)); \
  ^
net/netfilter/core.c:96:3: note: in expansion of macro ‘rcu_assign_pointer’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
   ^~
In file included from ./include/linux/linkage.h:4:0,
 from ./include/linux/kernel.h:6,
 from net/netfilter/core.c:10:
net/netfilter/core.c:96:30: error: ‘struct net_device’ has no member named 
‘nf_hooks_ingress’
   rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
  ^
./include/linux/compiler.h:498:19: note: in definition of macro 
‘__compiletime_assert’
   bool __cond = !(condition);\
   ^
./include/linux/compiler.h:518:2: note: in expansion of macro 
‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
  ^~~
./include/linux/compiler.h:521:2: note: in expansion of macro 
‘compiletime_assert’
  compiletime_assert(__native_word(t),\
  ^~
./include/linux/compiler.h:521:21: note: in expansion of macro ‘__native_word’
  compiletime_assert(__native_word(t),\
 ^
./arch/x86/include/asm/barrier.h:64:2: note: in expansion of macro 
‘compiletime_assert_atomic_type’
  compiletime_assert_atomic_type(*p);\
  ^~
./include/asm-generic/barrier.h:157:33: note: in expansion of macro 
‘__smp_store_release’
 #define smp_store_release(p, v) __smp_store_release(p, v)
 ^~~
./include/linux/rcupdate.h:668:3: note: in expansion of macro 
‘smp_store_release’
   smp_store_release(&p, RCU_INITIALIZER((typeof(p))_r_a_p__v)); \
   ^
net/netfilter/core.c:96:3: note: in expansion of macro ‘rc

Re: [PATCH v4 4/7] ipv6 addrconf: add new sysctl 'router_solicitation_max_interval'

2016-09-27 Thread Maciej Żenczykowski
> Good point. Using ms should actually be easy, instead of
> proc_dointvec_jiffies you can use proc_dointvec_ms_jiffies.

Yes, I'm aware of this, but 'proc_dointvec_ms_jiffies' seems to be a
bit of a hack, and especially for large settings like this exporting
them to userspace in units of ms instead of seconds seems to impose
unnecessary cognitive burden on the user.

(furthermore the other two related settings are exported as seconds)

> It seems good practice to add a _ms then to the sysctl, too.

If we wanted to do that, we'd have to add _ms versions of
router_solicitation_{delay,interval}.
This seems like pointless duplication - since we can't actually delete
the older seconds one.

There isn't really any real world scenario I can think of (besides
unit-testing) where ms granularity would be useful.

If we were to actually fix proc_dointvec_jiffies to accept fractional
seconds, we'd fix all of these interfaces in one fell swoop...

I took a look at what this would take, and there doesn't really appear
to be a nice way to do it :-(
The lower level conversion functions get a character buffer and
convert it to/from an int (or maybe long).

We'd either need to switch these over to take a void* or add an
entirely separate implementation of proc_dointvec_jiffies...

(at which point it could probably also parse units (ns, us, ms, s, m,
h, d) on input if we wanted to...)

> I am fine with both ways.

I'm going to leave this as is (especially since I sent out v5 before I
saw your comment).

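
To make the units discussion concrete, here is a small model of the two
conversions. It assumes a tick rate of 250 (MODEL_HZ is a made-up constant;
the real HZ is a kernel build setting) and assumes that the millisecond
variant rounds up to whole jiffies. Both function names are hypothetical.

```c
#include <assert.h>

#define MODEL_HZ 250	/* assumed tick rate, for illustration only */

/* What a seconds-based sysctl would store for a written value. */
static long secs_to_jiffies_model(long secs)
{
	assert(secs >= 0);
	return secs * MODEL_HZ;
}

/* What a milliseconds-based sysctl would store, rounding up. */
static long ms_to_jiffies_model(long ms)
{
	assert(ms >= 0);
	return (ms * MODEL_HZ + 999) / 1000;
}
```

A one hour maximum interval is written as 3600 in the seconds interface and
as 3600000 in a milliseconds interface; both map to the same 900000 jiffies in
this model.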

[PATCH] brcmfmac: replace WARNING on timeout with a simple error message

2016-09-27 Thread Rafał Miłecki
From: Rafał Miłecki 

Even with timeout increased to 950 ms we get WARNINGs from time to time.
It mostly happens on A-MPDU stalls (e.g. when station goes out of
range). It may take up to 5-10 seconds for the firmware to recover, and
during that time it doesn't process packets.

It's still useful to have a message on timeout, as it may indicate some
firmware problem and incorrect key update. Raising a WARNING, however,
wasn't really necessary: it no longer points to any driver bug, and the
backtrace wasn't very useful.

Signed-off-by: Rafał Miłecki 
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 6d046ba..9e6f60a 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -1161,7 +1161,8 @@ int brcmf_netdev_wait_pend8021x(struct brcmf_if *ifp)
 !brcmf_get_pend_8021x_cnt(ifp),
 MAX_WAIT_FOR_8021X_TX);
 
-   WARN_ON(!err);
+   if (!err)
+   brcmf_err("Timed out waiting for no pending 802.1x packets\n");
 
return !err;
 }
-- 
2.9.3



Re: [PATCH v5 1/7] ipv6 addrconf: enable use of proc_dointvec_minmax in addrconf_sysctl

2016-09-27 Thread YOSHIFUJI Hideaki
Hi,

Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski 
> 
> Signed-off-by: Maciej Żenczykowski 
> ---
>  net/ipv6/addrconf.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 2f1f5d439788..11fa1a5564d4 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -6044,8 +6044,14 @@ static int __addrconf_sysctl_register(struct net *net, 
> char *dev_name,
>  
>   for (i = 0; table[i].data; i++) {
>   table[i].data += (char *)p - (char *)&ipv6_devconf;
> - table[i].extra1 = idev; /* embedded; no ref */
> - table[i].extra2 = net;
> + /* If one of these is already set, then it is not safe to
> +  * overwrite either of them: this makes proc_dointvec_minmax
> +  * usable.
> +  */
> + if (!table[i].extra1 && !table[i].extra2) {
> + table[i].extra1 = idev; /* embedded; no ref */
> + table[i].extra2 = net;
> + }
>   }
>  
>   snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name);
> 

This seems to have nothing to do with the RFC7559 changes.
Why don't you submit this as a separate patch?

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION


Re: [PATCH v5 2/7] ipv6 addrconf: remove addrconf_sysctl_hop_limit()

2016-09-27 Thread YOSHIFUJI Hideaki
Hi,

Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski 
> 
> replace with extra1/2 magic
> 
> Signed-off-by: Maciej Żenczykowski 
> ---
>  net/ipv6/addrconf.c | 21 ++---
>  1 file changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 11fa1a5564d4..8bd2d06eefe7 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -5467,20 +5467,6 @@ int addrconf_sysctl_forward(struct ctl_table *ctl, int 
> write,
>  }
>  
>  static
> -int addrconf_sysctl_hop_limit(struct ctl_table *ctl, int write,
> -  void __user *buffer, size_t *lenp, loff_t 
> *ppos)
> -{
> - struct ctl_table lctl;
> - int min_hl = 1, max_hl = 255;
> -
> - lctl = *ctl;
> - lctl.extra1 = &min_hl;
> - lctl.extra2 = &max_hl;
> -
> - return proc_dointvec_minmax(&lctl, write, buffer, lenp, ppos);
> -}
> -
> -static
>  int addrconf_sysctl_mtu(struct ctl_table *ctl, int write,
>   void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> @@ -5713,6 +5699,9 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct 
> ctl_table *ctl,
>   return ret;
>  }
>  
> +static const int one = 1;
> +static const int two_five_five = 255;
> +
>  static const struct ctl_table addrconf_sysctl[] = {
>   {
>   .procname   = "forwarding",
> @@ -5726,7 +5715,9 @@ static const struct ctl_table addrconf_sysctl[] = {
>   .data   = &ipv6_devconf.hop_limit,
>   .maxlen = sizeof(int),
>   .mode   = 0644,
> - .proc_handler   = addrconf_sysctl_hop_limit,
> + .proc_handler   = proc_dointvec_minmax,
> + .extra1 = (void *)&one,
> + .extra2 = (void *)&two_five_five,
>   },
>   {
>   .procname   = "mtu",
> 

Please submit this in a different series of patches
(like 1/7).

--yoshfuji


Re: [PATCH v5 3/7] ipv6 addrconf: rtr_solicits == -1 means unlimited

2016-09-27 Thread YOSHIFUJI Hideaki


Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski 
> 
> This allows setting /proc/sys/net/ipv6/conf/*/router_solicitations
> to -1 meaning an unlimited number of retransmits.
> 

We could say "< 0 means infinite" and we can reduce changes here.

--yoshfuji

> Signed-off-by: Maciej Żenczykowski 
> ---
>  net/ipv6/addrconf.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 8bd2d06eefe7..1e59c0034916 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3687,7 +3687,7 @@ static void addrconf_rs_timer(unsigned long data)
>   if (idev->if_flags & IF_RA_RCVD)
>   goto out;
>  
> - if (idev->rs_probes++ < idev->cnf.rtr_solicits) {
> + if (idev->rs_probes++ < idev->cnf.rtr_solicits || 
> idev->cnf.rtr_solicits == -1) {
>   write_unlock(&idev->lock);
>   if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
>   ndisc_send_rs(dev, &lladdr,
> @@ -3949,7 +3949,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr 
> *ifp)
>   send_mld = ifp->scope == IFA_LINK && ipv6_lonely_lladdr(ifp);
>   send_rs = send_mld &&
> ipv6_accept_ra(ifp->idev) &&
> -   ifp->idev->cnf.rtr_solicits > 0 &&
> +   ifp->idev->cnf.rtr_solicits != 0 &&
> (dev->flags&IFF_LOOPBACK) == 0;
>   read_unlock_bh(&ifp->idev->lock);
>  
> @@ -5099,7 +5099,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, 
> struct in6_addr *token)
>   return -EINVAL;
>   if (!ipv6_accept_ra(idev))
>   return -EINVAL;
> - if (idev->cnf.rtr_solicits <= 0)
> + if (idev->cnf.rtr_solicits == 0)
>   return -EINVAL;
>  
>   write_lock_bh(&idev->lock);
> @@ -5699,6 +5699,7 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct 
> ctl_table *ctl,
>   return ret;
>  }
>  
> +static const int minus_one = -1;
>  static const int one = 1;
>  static const int two_five_five = 255;
>  
> @@ -5759,7 +5760,8 @@ static const struct ctl_table addrconf_sysctl[] = {
>   .data   = &ipv6_devconf.rtr_solicits,
>   .maxlen = sizeof(int),
>   .mode   = 0644,
> - .proc_handler   = proc_dointvec,
> + .proc_handler   = proc_dointvec_minmax,
> + .extra1 = (void *)&minus_one,
>   },
>   {
>   .procname   = "router_solicitation_interval",
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION


Re: [PATCH v4 5/7] ipv6 addrconf: implement RFC7559 router solicitation backoff

2016-09-27 Thread Hannes Frederic Sowa
[cc Vishwanath Pai]

On 27.09.2016 11:42, Maciej Żenczykowski wrote:
>> Please just use do_div here and go back to the first version of the
>> patch. Variable names could be more aligned with the RFC maybe?
> 
> So I tried:
> 
> static inline s32 rfc3315_s14_backoff_init(s32 irt)
>  {
>/* multiply 'initial retransmission time' by 0.9 .. 1.1 */
>u64 tmp = (90 + prandom_u32() % 21) * (u64)irt;
>do_div(tmp, 100);
>return (s32)tmp;
>  }
> 
> static inline s32 rfc3315_s14_backoff_update(s32 rt, s32 mrt)
>  {
>/* multiply 'retransmission timeout' by 1.9 .. 2.1 */
>u64 tmp = (190 + prandom_u32() % 21) * (u64)rt;
>do_div(tmp, 100);
>if ((s32)tmp > mrt) {
>/* multiply 'maximum retransmission time' by 0.9 .. 1.1 */
>tmp = (90 + prandom_u32() % 21) * (u64)mrt;
>do_div(tmp, 100);
> }
>return (s32)tmp;
> }
> 
> but then building for i386 I get:
> 
> ERROR: "__udivdi3" [net/netfilter/xt_hashlimit.ko] undefined!

Hmm, evidently we have some u64 divisions in xt_hashlimit.c which should
be replaced by do_div?



Re: [PATCH v3] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Michal Hocko
On Tue 27-09-16 10:45:36, Vlastimil Babka wrote:
> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> with the number of fds passed. We had a customer report page allocation
> failures of order-4 for this allocation. This is a costly order, so it might
> easily fail, as the VM expects such allocation to have a lower-order fallback.
> 
> Such trivial fallback is vmalloc(), as the memory doesn't have to be
> physically contiguous and the allocation is temporary for the duration of
> the syscall only. There were some concerns, whether this would have negative
> impact on the system by exposing vmalloc() to userspace. Although an
> excessive use of vmalloc can cause some system wide performance issues - TLB
> flushes etc. - a large order allocation is not for free either and an
> excessive reclaim/compaction can have a similar effect. Also note that the
> size is effectively limited by RLIMIT_NOFILE which defaults to 1024 on the
> systems I checked. That means the bitmaps will fit well within single page
> and thus the vmalloc() fallback could be only exercised for processes where
> root allows a higher limit.
> 
> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
> it doesn't need this kind of fallback.
> 
> [eric.duma...@gmail.com: fix failure path logic]
> [a...@linux-foundation.org: use proper type for size]
> Signed-off-by: Vlastimil Babka 

Yes this makes sense to me. It could be argued that this could be
simplified to not rely on high order allocations at all but this is
simple enough (and backportable to stable trees) and should work
reasonably well.

So FWIW
Acked-by: Michal Hocko 

I would even argue for using __GFP_NORETRY for size > PAGE_SIZE, because
giving userspace access to high-order pages which can invoke the OOM
killer is not a great idea. Something for a separate patch though.

> ---
>  fs/select.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/select.c b/fs/select.c
> index 8ed9da50896a..3d4f85defeab 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -554,7 +555,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set 
> __user *outp,
>   fd_set_bits fds;
>   void *bits;
>   int ret, max_fds;
> - unsigned int size;
> + size_t size, alloc_size;
>   struct fdtable *fdt;
>   /* Allocate small arguments on the stack to save memory and be faster */
>   long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
> @@ -581,7 +582,14 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>   if (size > sizeof(stack_fds) / 6) {
>   /* Not enough space in on-stack array; must use kmalloc */
>   ret = -ENOMEM;
> - bits = kmalloc(6 * size, GFP_KERNEL);
> + if (size > (SIZE_MAX / 6))
> + goto out_nofds;
> +
> + alloc_size = 6 * size;
> + bits = kmalloc(alloc_size, GFP_KERNEL|__GFP_NOWARN);
> + if (!bits && alloc_size > PAGE_SIZE)
> + bits = vmalloc(alloc_size);
> +
>   if (!bits)
>   goto out_nofds;
>   }
> @@ -618,7 +626,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>  
>  out:
>   if (bits != stack_fds)
> - kfree(bits);
> + kvfree(bits);
>  out_nofds:
>   return ret;
>  }
> -- 
> 2.10.0
> 

-- 
Michal Hocko
SUSE Labs
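
The allocation logic in the quoted patch can be modelled outside the kernel. What follows is a minimal userspace sketch, not the kernel code: sketch_alloc_bitmaps, sketch_alloc_fn, sketch_always_fail and the small_limit parameter are hypothetical names. The two function pointers stand in for kmalloc() and vmalloc(), and small_limit stands in for PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_BITMAPS 6   /* in/out/ex bitmaps, incoming and outgoing */

/* Hypothetical allocator signature; malloc() matches it. */
typedef void *(*sketch_alloc_fn)(size_t);

/* Stand-in for a contiguous allocator that is out of high-order pages. */
static void *sketch_always_fail(size_t n)
{
    (void)n;
    return NULL;
}

/*
 * Allocate room for six bitmaps of `size` bytes each.
 * Returns NULL if 6 * size would overflow or if no allocator succeeds.
 * The fallback is only tried for requests larger than small_limit,
 * mirroring the "alloc_size > PAGE_SIZE" test in the patch.
 */
static void *sketch_alloc_bitmaps(size_t size, size_t small_limit,
                                  sketch_alloc_fn primary,
                                  sketch_alloc_fn fallback)
{
    size_t alloc_size;
    void *bits;

    assert(primary != NULL && fallback != NULL);

    if (size > SIZE_MAX / SKETCH_BITMAPS)
        return NULL;                    /* 6 * size would wrap around */

    alloc_size = SKETCH_BITMAPS * size;
    bits = primary(alloc_size);
    if (!bits && alloc_size > small_limit)
        bits = fallback(alloc_size);    /* contiguity is not required */

    return bits;
}
```

In this sketch both stand-ins are malloc()-compatible, so the caller releases the buffer with free() regardless of which path produced it; that is the role kvfree() plays in the patch.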


Re: [PATCH] Fix link error in 32bit arch because of 64bit division

2016-09-27 Thread Liping Zhang
Hi  Vishwanath Pai,

2016-09-27 15:42 GMT+08:00 Vishwanath Pai :
> Fix link error in 32bit arch because of 64bit division

This should be "netfilter: xt_hashlimit: fix ... "

>
> --- a/net/netfilter/xt_hashlimit.c
> +++ b/net/netfilter/xt_hashlimit.c
> @@ -465,19 +465,20 @@ static u64 user2credits(u64 user, int revision)
>  {
> if (revision == 1) {
> /* If multiplying would overflow... */
> -   if (user > 0x / (HZ*CREDITS_PER_JIFFY_v1))
> +   if (user > div64_u64(0x, (HZ*CREDITS_PER_JIFFY_v1)))

Here the divisor and dividend are both 32-bit integers, so converting "/" to
div64_u64 seems unnecessary.

> /* Divide first. */
> -   return (user / XT_HASHLIMIT_SCALE) *\
> +   return div64_u64(user, XT_HASHLIMIT_SCALE) *\
> HZ * CREDITS_PER_JIFFY_v1;
>
> -   return (user * HZ * CREDITS_PER_JIFFY_v1) \
> -   / XT_HASHLIMIT_SCALE;
> +   return div64_u64((user * HZ * CREDITS_PER_JIFFY_v1),
> + XT_HASHLIMIT_SCALE);
> } else {
> -   if (user > 0x / (HZ*CREDITS_PER_JIFFY))
> -   return (user / XT_HASHLIMIT_SCALE_v2) *\
> +   if (user > div64_u64(0x, (HZ*CREDITS_PER_JIFFY)))

0x and "HZ*CREDITS_PER_JIFFY" are both constants, and GCC will fold the
division at compile time, so I think converting "/" to div64_u64 here is
also unnecessary.

> +   return div64_u64(user, XT_HASHLIMIT_SCALE_v2) *\
> HZ * CREDITS_PER_JIFFY;
>
> -   return (user * HZ * CREDITS_PER_JIFFY) / XT_HASHLIMIT_SCALE_v2;
> +   return div64_u64((user * HZ * CREDITS_PER_JIFFY),
> +XT_HASHLIMIT_SCALE_v2);
> }
>  }
>
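
The review comments above come down to which divisions actually happen at run time. Below is a minimal userspace sketch of the divide-first-on-overflow logic; sketch_scale_credits and its parameters are hypothetical and only stand in for user2credits() and its HZ-derived constants. In userspace a plain "/" on 64-bit operands is fine; in a 32-bit kernel build the same run-time division needs a helper such as div64_u64(), while a division of two compile-time constants is folded by the compiler and needs no helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute user * factor / scale.
 * If the product would wrap 64 bits, divide first and accept losing the
 * remainder. Note that the divide-first result can itself still wrap when
 * factor is larger than scale; this sketch does not guard against that.
 */
static uint64_t sketch_scale_credits(uint64_t user, uint64_t factor,
                                     uint64_t scale)
{
    assert(factor != 0 && scale != 0);

    if (user > UINT64_MAX / factor)
        return (user / scale) * factor;     /* run-time 64-bit division */

    return (user * factor) / scale;         /* run-time 64-bit division */
}
```

Only the two marked divisions have a run-time dividend, which matches the reviewer's point that the overflow checks against a constant quotient can keep using "/".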


Re: [Gigaset307x-common] ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread Tilman Schmidt
Hi,

as longtime maintainer of the code in question I feel compelled to chime
in at this point.

On Tue, Sep 27, 2016, at 11:34, SF Markus Elfring wrote:
> >> Will it matter here if the function "kfree" will be called for the
> >> data structure members "bcs" and "inbuf" after a later function call
> >> failed within the implementation of "gigaset_initcs"?
> > 
> > My translation of this question is: could you please hold my hand while
> > I read the code of a driver I do not use - a driver for hardware that I
> > don't even have, and therefore cannot really test - after I submitted a
> > patch that appears to be broken?
> 
> I got the impression that the exception handling  was incomplete in the
> implementation of the function "gigaset_initcs".

That impression is wrong. Careful reading of the code will confirm that.

> Does anybody (besides me) care for improving the software situation
> there?

There's no urgent need for improvement. The code is stable and there's
no demonstrated bug to be fixed.
You could improve the coding style, but that is of secondary importance,
and if you want to do that, as a minimum you have to make sure that you
don't introduce new bugs.

Thanks,
Tilman

-- 
Tilman Schmidt
til...@imap.cc


Re: [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit

2016-09-27 Thread Daniel Borkmann

On 09/22/2016 03:21 PM, Shmulik Ladkani wrote:

From: Shmulik Ladkani 

'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
egress to the target device (if action is performed at ingress).

Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
the target device.
This allows decoupling the attribute from the action to be taken.

Signed-off-by: Shmulik Ladkani 
---
  include/net/tc_act/tc_mirred.h |  2 +-
  net/sched/act_mirred.c | 10 +-
  2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
index 62770ad..5275158 100644
--- a/include/net/tc_act/tc_mirred.h
+++ b/include/net/tc_act/tc_mirred.h
@@ -8,7 +8,7 @@ struct tcf_mirred {
struct tc_action common;
int tcfm_eaction;
int tcfm_ifindex;
-   int tcfm_ok_push;
+   int tcfm_mac_header_xmit;


Since you already touch this here and in patch 2/4 anyway, maybe
make that a bool along the way?

Perhaps instead of tcfm_mac_header_xmit, tcfm_mac_header_push
might be a better name?


struct net_device __rcu *tcfm_dev;
struct list_head tcfm_list;
  };
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 667dc38..7b03b13 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -63,7 +63,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
struct tc_mirred *parm;
struct tcf_mirred *m;
struct net_device *dev;
-   int ret, ok_push = 0;
+   int ret, mac_header_xmit = 0;
bool exists = false;

if (nla == NULL)
@@ -102,10 +102,10 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
case ARPHRD_IPGRE:
case ARPHRD_VOID:
case ARPHRD_NONE:
-   ok_push = 0;
+   mac_header_xmit = 0;
break;
default:
-   ok_push = 1;
+   mac_header_xmit = 1;
break;
}
} else {
@@ -136,7 +136,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
dev_put(rcu_dereference_protected(m->tcfm_dev, 1));
dev_hold(dev);
rcu_assign_pointer(m->tcfm_dev, dev);
-   m->tcfm_ok_push = ok_push;
+   m->tcfm_mac_header_xmit = mac_header_xmit;
}

if (ret == ACT_P_CREATED) {
@@ -181,7 +181,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
goto out;

if (!(at & AT_EGRESS)) {
-   if (m->tcfm_ok_push)
+   if (m->tcfm_mac_header_xmit)
skb_push_rcsum(skb2, skb->mac_len);
}






Re: [PATCH v5 3/7] ipv6 addrconf: rtr_solicits == -1 means unlimited

2016-09-27 Thread Maciej Żenczykowski
That wouldn't really simplify much.

This change currently has 5 lines.
3 of those would be needed anyway if we were to define anything < 0 to
mean infinite.

Yes, you could get rid of the two lines with minus_one in them, but
this way we can also use -2 to mean something else in the future if we
ever want to.
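
The convention argued for above can be sketched in a few lines. This is a hypothetical illustration, not the addrconf code: the names sketch_rtr_solicits_valid, sketch_may_send_solicit and SKETCH_RTR_SOLICITS_UNLIMITED are made up, and the lower bound check is assumed to play the role of the minus_one lines mentioned in the message.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RTR_SOLICITS_UNLIMITED (-1)

/* Accept -1 (unlimited) and non-negative counts; -2 and below stay reserved. */
static bool sketch_rtr_solicits_valid(int value)
{
    return value >= SKETCH_RTR_SOLICITS_UNLIMITED;
}

/* Decide whether one more router solicitation may be sent. */
static bool sketch_may_send_solicit(int rtr_solicits, int already_sent)
{
    assert(sketch_rtr_solicits_valid(rtr_solicits));

    if (rtr_solicits == SKETCH_RTR_SOLICITS_UNLIMITED)
        return true;

    return already_sent < rtr_solicits;
}
```

Testing for exactly -1 instead of "anything below zero" is what keeps other negative values free for a later meaning.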


Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread Daniel Borkmann

On 09/27/2016 10:07 AM, Shmulik Ladkani wrote:

Hi David,

On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), da...@davemloft.net wrote:

The discussion on this patch has ventured off into what to do about
recursion.

But it is unclear to me where this specific patch, and this series,
stands right now.  Someone please clear this up for me.


Status:
  - Series adds "ingress redirect/mirror" support
  - Positive feedback for the feature
  - So far no comments regarding code itself
  - Questions raised regarding "recursion handling"
Expressed that existing mirred code (i.e egress redirect) is *already*
loop-unsafe (and also, some non-tc netdev constructs, as exampled by
others).
Discussion then wandered to "recursion handling".


Any reason why dev_forward_skb() is not preferred over direct
netif_receive_skb() you're using? It would, for example, implicitly
assure that pkt_type is always PACKET_HOST, etc.

Thanks,
Daniel


Re: [PATCH 3/5] ISDN-Gigaset: Delete an error message for a failed memory allocation

2016-09-27 Thread Tilman Schmidt
On Mon, Sep 26, 2016, at 17:42, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Mon, 26 Sep 2016 15:35:47 +0200
> 
> Omit an extra message for a memory allocation failure in this function.
> 
> Link:
> http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
> 
> Signed-off-by: Markus Elfring 

The patch is fine but the link in the commit message is irrelevant.
Please remove it.
(Yes, I read through the whole presentation to verify that. It was fun,
even.)

-- 
Tilman Schmidt
til...@imap.cc


Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Nicholas Piggin
On Tue, 27 Sep 2016 10:44:04 +0200
Vlastimil Babka  wrote:

> On 09/23/2016 06:47 PM, Jason Baron wrote:
> > Hi,
> >
> > On 09/23/2016 03:24 AM, Nicholas Piggin wrote:  
> >> On Fri, 23 Sep 2016 14:42:53 +0800
> >> "Hillf Danton"  wrote:
> >>  
> 
> >>>> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> >>>> with the number of fds passed. We had a customer report page allocation
> >>>> failures of order-4 for this allocation. This is a costly order, so it might
> >>>> easily fail, as the VM expects such allocation to have a lower-order fallback.
> >>>>
> >>>> Such trivial fallback is vmalloc(), as the memory doesn't have to be
> >>>> physically contiguous. Also the allocation is temporary for the duration of the
> >>>> syscall, so it's unlikely to stress vmalloc too much.
> >>>>
> >>>> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
> >>>> it doesn't need this kind of fallback.
> >>
> >> How about something like this? (untested)  
> 
> This pushes the limit further, but might just delay the problem. Could be an 
> optimization on top if there's enough interest, though.

What's your customer doing with those selects? If they care at all about
performance, I doubt they want select to attempt order-4 allocations, fail,
then use vmalloc :)



Re: [PATCH] brcmfmac: replace WARNING on timeout with a simple error message

2016-09-27 Thread Arend Van Spriel
On 27-9-2016 12:12, Rafał Miłecki wrote:
> From: Rafał Miłecki 
> 
> Even with timeout increased to 950 ms we get WARNINGs from time to time.
> It mostly happens on A-MPDU stalls (e.g. when station goes out of
> range). It may take up to 5-10 seconds for the firmware to recover and
> for that time it doesn't process packets.
> 
> It's still useful to have a message on timeout as it may indicate some
> firmware problem and incorrect key update. Raising a WARNING however
> wasn't really that necessary, it doesn't point to any driver bug anymore
> and the backtrace wasn't very useful.

Indeed the interesting part would be in another context. So:

Acked-by: Arend van Spriel 
> Signed-off-by: Rafał Miłecki 
> ---
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> index 6d046ba..9e6f60a 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> @@ -1161,7 +1161,8 @@ int brcmf_netdev_wait_pend8021x(struct brcmf_if *ifp)
>!brcmf_get_pend_8021x_cnt(ifp),
>MAX_WAIT_FOR_8021X_TX);
>  
> - WARN_ON(!err);
> + if (!err)
> + brcmf_err("Timed out waiting for no pending 802.1x packets\n");
>  
>   return !err;
>  }
> 


Re: [PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Kalle Valo
Arend Van Spriel  writes:

> On 27-9-2016 11:14, Rafał Miłecki wrote:
>> From: Rafał Miłecki 
>> 
>> Flowrings contain skbs waiting for transmission that were passed to us
>> by netif. It means we checked every one of them looking for 802.1x
>> Ethernet type. When deleting flowring we have to use freeing function
>> that will check for 802.1x type as well.
>> 
>> Freeing skbs without a proper check was leading to counter not being
>> properly decreased. This was triggering a WARNING every time
>> brcmf_netdev_wait_pend8021x was called.
>
> Acked-by: Arend van Spriel 
>> Signed-off-by: Rafał Miłecki 
>> ---
>> Kalle: this isn't important enough for 4.8 as it's too late for that.
>> 
>> I'd like to get it for 4.9 however, as this fixes a bug that could lead
>> to a WARNING on every add_key/del_key call. We were struggling with these
>> WARNINGs for some time and this fixes one of two problems causing them.

Ok, I'll queue this for 4.9.

> Please mark it for stable as well.

I can add that. Any idea how far back in the stable releases this should
go?

-- 
Kalle Valo


Re: [Gigaset307x-common] ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread SF Markus Elfring
>> I got the impression that the exception handling  was incomplete in the
>> implementation of the function "gigaset_initcs".
> 
> That impression is wrong. Careful reading of the code will confirm that.

* Is it still correct nowadays that the function "gigaset_initcs" did not
  call the function "kfree" after a later function call failed?

* Do you expect that allocated memory will be automatically reclaimed
  after it would return a null pointer?

Regards,
Markus


Re: [PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Arend Van Spriel
On 27-9-2016 13:27, Kalle Valo wrote:
> Arend Van Spriel  writes:
> 
>> On 27-9-2016 11:14, Rafał Miłecki wrote:
>>> From: Rafał Miłecki 
>>>
>>> Flowrings contain skbs waiting for transmission that were passed to us
>>> by netif. It means we checked every one of them looking for 802.1x
>>> Ethernet type. When deleting flowring we have to use freeing function
>>> that will check for 802.1x type as well.
>>>
>>> Freeing skbs without a proper check was leading to counter not being
>>> properly decreased. This was triggering a WARNING every time
>>> brcmf_netdev_wait_pend8021x was called.
>>
>> Acked-by: Arend van Spriel 
>>> Signed-off-by: Rafał Miłecki 
>>> ---
>>> Kalle: this isn't important enough for 4.8 as it's too late for that.
>>>
>>> I'd like to get it for 4.9 however, as this fixes a bug that could lead
>>> to a WARNING on every add_key/del_key call. We were struggling with these
>>> WARNINGs for some time and this fixes one of two problems causing them.
> 
> Ok, I'll queue this for 4.9.
> 
>> Please mark it for stable as well.
> 
> I can add that. Any idea how far back in the stable releases this should
> go?

Not sure if the vendor directory move causes issues, as stable cannot
fall back to a three-way merge. I assumed it would, so my last stable tag
was only for 4.7 and I took care of older kernels at a later time with
a backported patch. I can do that for this one as well.

Regards,
Arend


Re: [PATCH 0/3] net: fec: updates to align IP header

2016-09-27 Thread David Miller
From: Eric Nelson 
Date: Sat, 24 Sep 2016 07:42:16 -0700

> This patch series is the outcome of investigation into very high
> numbers of alignment faults on kernel 4.1.33 from the linux-fslc
> tree:
> https://github.com/freescale/linux-fslc/tree/4.1-1.0.x-imx
> 
> The first two patches remove support for the receive accelerator (RACC) from
> the i.MX25 and i.MX27 SoCs which don't support the function.
> 
> The third patch enables hardware alignment of the ethernet packet payload
> (and especially the IP header) to prevent alignment faults in the IP stack.
> 
> Testing on i.MX6UL on the 4.1.33 kernel showed that this patch removed
> on the order of 70k alignment faults during a 100MiB transfer using wget.
> 
> Testing on an i.MX6Q (SABRE Lite) board on net-next (4.8.0-rc7) showed
> a much more modest improvement from 10's of faults, and it's not clear
> why that's the case.

Series applied and queued up for -stable.


RE: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread David Laight
From: Nicholas Piggin
> Sent: 27 September 2016 12:25
> On Tue, 27 Sep 2016 10:44:04 +0200
> Vlastimil Babka  wrote:
> 
> > On 09/23/2016 06:47 PM, Jason Baron wrote:
> > > Hi,
> > >
> > > On 09/23/2016 03:24 AM, Nicholas Piggin wrote:
> > >> On Fri, 23 Sep 2016 14:42:53 +0800
> > >> "Hillf Danton"  wrote:
> > >>
> > 
> > >>>> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> > >>>> with the number of fds passed. We had a customer report page allocation
> > >>>> failures of order-4 for this allocation. This is a costly order, so it might
> > >>>> easily fail, as the VM expects such allocation to have a lower-order fallback.
> > >>>>
> > >>>> Such trivial fallback is vmalloc(), as the memory doesn't have to be
> > >>>> physically contiguous. Also the allocation is temporary for the duration of the
> > >>>> syscall, so it's unlikely to stress vmalloc too much.
> > >>>>
> > >>>> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
> > >>>> it doesn't need this kind of fallback.
> > >>
> > >> How about something like this? (untested)
> >
> > This pushes the limit further, but might just delay the problem. Could be an
> > optimization on top if there's enough interest, though.
> 
> What's your customer doing with those selects? If they care at all about
> performance, I doubt they want select to attempt order-4 allocations, fail,
> then use vmalloc :)

If they care about performance they shouldn't be passing select() lists that
are anywhere near that large.
If the number of actual fd is small - use poll().

Otherwise you want one of the 'event' mechanisms in order to avoid setting
the markers on every fd after every event (can't remember how you do that
in Linux).

At least this isn't SYSV - poll() was O(n^2) in the number of fd
(because the fd were on a linked list).

David
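
As a concrete illustration of the poll() suggestion above, here is a minimal userspace sketch that waits on a single descriptor. The helper name sketch_wait_readable is hypothetical; poll() and struct pollfd are the standard POSIX interfaces.

```c
#include <assert.h>
#include <poll.h>

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if a read would not block (data or hang-up), 0 on timeout,
 * and -1 if poll() fails or reports an error condition on fd.
 */
static int sketch_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ready;

    assert(fd >= 0);

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;
    if (pfd.revents & (POLLERR | POLLNVAL))
        return -1;

    return 1;   /* POLLIN and/or POLLHUP is set */
}
```

The array passed to poll() is sized by the number of descriptors watched, not by the highest descriptor number, so a process with a few high-numbered fds never builds the large bitmaps discussed in this thread. The Linux event mechanism alluded to above is epoll(7).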



Re: [PATCH net] Revert "net: ethernet: bcmgenet: use phydev from struct net_device"

2016-09-27 Thread David Miller
From: Florian Fainelli 
Date: Sat, 24 Sep 2016 12:58:30 -0700

> This reverts commit 62469c76007e ("net: ethernet: bcmgenet: use phydev
> from struct net_device") because it causes GENETv1/2/3 adapters to
> expose the following behavior after an ifconfig down/up sequence:
> 
> PING fainelli-linux (10.112.156.244): 56 data bytes
> 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.352 ms
> 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.472 ms (DUP!)
> 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.496 ms (DUP!)
> 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.517 ms (DUP!)
> 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.536 ms (DUP!)
> 64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.557 ms (DUP!)
> 64 bytes from 10.112.156.244: seq=1 ttl=61 time=752.448 ms (DUP!)
> 
> This was previously fixed by commit 5dbebbb44a6a ("net: bcmgenet:
> Software reset EPHY after power on") but the commit we are reverting was
> essentially making this previous commit void, here is why.
> 
> Without commit 62469c76007e we would have the following scenario after
> an ifconfig down then up sequence:
> 
> - bcmgenet_open() calls bcmgenet_power_up() to make sure the PHY is
>   initialized *before* we get to initialize the UniMAC, this is
>   critical to ensure the PHY is in a correct state, priv->phydev is
>   valid, this code executes fine
> 
> - second time from bcmgenet_mii_probe(), through the normal
>   phy_init_hw() call (which arguably could be optimized out)
> 
> Everything is fine in that case. With commit 62469c76007e, we would have
> the following scenario to happen after an ifconfig down then up
> sequence:
> 
> - bcmgenet_close() calls phy_disconnect() which makes dev->phydev become
>   NULL
> 
> - when bcmgenet_open() executes again and calls bcmgenet_mii_reset() from
>   bcmgenet_power_up() to initialize the internal PHY, the NULL check
>   becomes true, so we do not reset the PHY, yet we keep going on and
>   initialize the UniMAC, causing MAC activity to occur
> 
> - we call bcmgenet_mii_reset() from bcmgenet_mii_probe(), but this is
>   too late, the PHY is botched, and causes the above bogus pings/packets
>   transmission/reception to occur
> 
> Reported-by: Jaedon Shin 
> Signed-off-by: Florian Fainelli 

Applied and queued up for -stable.


Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Nicholas Piggin
On Tue, 27 Sep 2016 11:37:24 +
David Laight  wrote:

> From: Nicholas Piggin
> > Sent: 27 September 2016 12:25
> > On Tue, 27 Sep 2016 10:44:04 +0200
> > Vlastimil Babka  wrote:
> >   
> > > On 09/23/2016 06:47 PM, Jason Baron wrote:  
> > > > Hi,
> > > >
> > > > On 09/23/2016 03:24 AM, Nicholas Piggin wrote:  
> > > >> On Fri, 23 Sep 2016 14:42:53 +0800
> > > >> "Hillf Danton"  wrote:
> > > >>  
> > > 
> > > >>>> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
> > > >>>> with the number of fds passed. We had a customer report page allocation
> > > >>>> failures of order-4 for this allocation. This is a costly order, so it might
> > > >>>> easily fail, as the VM expects such allocation to have a lower-order fallback.
> > > >>>>
> > > >>>> Such trivial fallback is vmalloc(), as the memory doesn't have to be
> > > >>>> physically contiguous. Also the allocation is temporary for the duration of the
> > > >>>> syscall, so it's unlikely to stress vmalloc too much.
> > > >>>>
> > > >>>> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
> > > >>>> it doesn't need this kind of fallback.
> > > > >>
> > > > >> How about something like this? (untested)
> > > >
> > > > This pushes the limit further, but might just delay the problem. Could be an
> > > > optimization on top if there's enough interest, though.
> > 
> > What's your customer doing with those selects? If they care at all about
> > performance, I doubt they want select to attempt order-4 allocations, fail,
> > then use vmalloc :)  
> 
> If they care about performance they shouldn't be passing select() lists that
> are anywhere near that large.
> If the number of actual fd is small - use poll().

Right. Presumably it's some old app they're still using, no?


Re: [PATCH v3 net-next 0/3] net: bcmgenet: only use new api ethtool_{get|set}_link_ksettings

2016-09-27 Thread David Miller
From: Philippe Reynes 
Date: Mon, 26 Sep 2016 22:31:54 +0200

> Some time ago, a series of patches was committed:
> - commit 62469c76007e ("net: ethernet: bcmgenet: use phydev from struct net_device")
> - commit 6b352ebccbcf ("net: ethernet: broadcom: bcmgenet: use new api ethtool_{get|set}_link_ksettings")
> The first patch adds a regression on this driver, so it should be reverted.
> As the second patch depends on the former, it should be reverted too.
> 
> The first patch is buggy because there is a "trick" in this driver.
> The structure phydev is kept in the private data when the interface
> goes down, and used when the interface goes up to enable the phy before
> the function phy_connect is called.
> 
> I have neither this hardware nor the datasheet, so I won't
> update the driver to avoid this trick.
> 
> But the real goal of the first series was to move to the new api
> ethtool_{get|set}_link_ksettings. So I provide a new version of
> the patch without the "cleaning" of the driver to use the phydev
> stored in the net_device structure.
> 
> Changelog:
> v3:
> - use priv instead of dev (so all the code use the same phydev)
> v2:
> - use Florian Fainelli patches for the revert instead of Jaedon Shin
> - simply use net: bcmgenet: for the prefix of the patch

Series applied, thanks.


Re: [PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Rafał Miłecki
On 27 September 2016 at 13:27, Kalle Valo  wrote:
> Arend Van Spriel  writes:
>
>> On 27-9-2016 11:14, Rafał Miłecki wrote:
>>> From: Rafał Miłecki 
>>>
>>> Flowrings contain skbs waiting for transmission that were passed to us
>>> by netif. It means we checked every one of them looking for 802.1x
>>> Ethernet type. When deleting flowring we have to use freeing function
>>> that will check for 802.1x type as well.
>>>
>>> Freeing skbs without a proper check was leading to counter not being
>>> properly decreased. This was triggering a WARNING every time
>>> brcmf_netdev_wait_pend8021x was called.
>>
>> Acked-by: Arend van Spriel 
>>> Signed-off-by: Rafał Miłecki 
>>> ---
>>> Kalle: this isn't important enough for 4.8 as it's too late for that.
>>>
>>> I'd like to get it for 4.9 however, as this fixes a bug that could lead
>>> to a WARNING on every add_key/del_key call. We were struggling with these
>>> WARNINGs for some time and this fixes one of two problems causing them.
>
> Ok, I'll queue this for 4.9.
>
>> Please mark it for stable as well.
>
> I can add that. Any idea how far back in the stable releases this should
> go?

I was analyzing this.
1) This patch uses brcmf_get_ifp which is available in 4.4+ only.
2) It applies cleanly to 4.5+ only due to 32f90caa7debd ("brcmfmac:
Increase nr of supported flowrings.")
3) 4.4 would also require applying the patch without the broadcom/ subdir

That said I suggest 4.5+. Any objections?

-- 
Rafał


Re: [PATCH v2] net: hns: mark symbols static where possible

2016-09-27 Thread David Miller
From: Baoyou Xie 
Date: Mon, 26 Sep 2016 17:13:38 +0800

> We get a few warnings when building kernel with W=1:
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:76:21: warning: no previous prototype for 'hns_ae_get_handle' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:274:6: warning: no previous prototype for 'hns_ae_stop' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:302:6: warning: no previous prototype for 'hns_ae_toggle_ring_irq' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:490:6: warning: no previous prototype for 'hns_ae_update_stats' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:573:6: warning: no previous prototype for 'hns_ae_get_stats' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:605:6: warning: no previous prototype for 'hns_ae_get_strings' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:638:5: warning: no previous prototype for 'hns_ae_get_sset_count' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:687:6: warning: no previous prototype for 'hns_ae_update_led_status' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:698:5: warning: no previous prototype for 'hns_ae_cpld_set_led_id' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:710:6: warning: no previous prototype for 'hns_ae_get_regs' [-Wmissing-prototypes]
> drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c:735:5: warning: no previous prototype for 'hns_ae_get_regs_len' [-Wmissing-prototypes]
> 
> 
> In fact, these functions are only used in the file in which they are
> declared and don't need a declaration, but can be made static.
> so this patch marks these functions with 'static'.
> 
> Signed-off-by: Baoyou Xie 

This still doesn't apply to the net-next tree.

If you aren't actually building your patch against the net-next
tree, don't bother submitting these patches any more.



Re: [PATCH] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Vlastimil Babka

On 09/27/2016 01:42 PM, Nicholas Piggin wrote:

On Tue, 27 Sep 2016 11:37:24 +
David Laight  wrote:


From: Nicholas Piggin
> Sent: 27 September 2016 12:25
> On Tue, 27 Sep 2016 10:44:04 +0200
> Vlastimil Babka  wrote:
>
>
> What's your customer doing with those selects? If they care at all about
> performance, I doubt they want select to attempt order-4 allocations, fail,
> then use vmalloc :)

If they care about performance they shouldn't be passing select() lists that
are anywhere near that large.
If the number of actual fd is small - use poll().


Right. Presumably it's some old app they're still using, no?


Process name suggests it's part of db2 database. It seems it has to implement 
its own interface to select() syscall, because glibc itself seems to have a 
FD_SETSIZE limit of 1024, which is probably why this wasn't an issue for all the 
years...





Re: [PATCH] VSOCK: Don't dec ack backlog twice for rejected connections

2016-09-27 Thread David Miller
From: Jorgen Hansen 
Date: Mon, 26 Sep 2016 23:59:53 -0700

> If a pending socket is marked as rejected, we will decrease the
> sk_ack_backlog twice. So don't decrement it for rejected sockets
> in vsock_pending_work().
> 
> Testing of the rejected socket path was done through code
> modifications.
> 
> Reported-by: Stefan Hajnoczi 
> Signed-off-by: Jorgen Hansen 
> Reviewed-by: Adit Ranadive 
> Reviewed-by: Aditya Sarwade 

Applied, thanks.


Re: [PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Rafał Miłecki
On 27 September 2016 at 13:44, Rafał Miłecki  wrote:
> On 27 September 2016 at 13:27, Kalle Valo  wrote:
>> Arend Van Spriel  writes:
>>
>>> On 27-9-2016 11:14, Rafał Miłecki wrote:
>>>> From: Rafał Miłecki 
>>>>
>>>> Flowrings contain skbs waiting for transmission that were passed to us
>>>> by netif. It means we checked every one of them looking for 802.1x
>>>> Ethernet type. When deleting flowring we have to use freeing function
>>>> that will check for 802.1x type as well.
>>>>
>>>> Freeing skbs without a proper check was leading to counter not being
>>>> properly decreased. This was triggering a WARNING every time
>>>> brcmf_netdev_wait_pend8021x was called.
>>>
>>> Acked-by: Arend van Spriel 
>>>> Signed-off-by: Rafał Miłecki 
>>>> ---
>>>> Kalle: this isn't important enough for 4.8 as it's too late for that.
>>>>
>>>> I'd like to get it for 4.9 however, as this fixes a bug that could lead
>>>> to a WARNING on every add_key/del_key call. We were struggling with these
>>>> WARNINGs for some time and this fixes one of two problems causing them.
>>
>> Ok, I'll queue this for 4.9.
>>
>>> Please mark it for stable as well.
>>
>> I can add that. Any idea how far back in the stable releases this should
>> go?
>
> I was analyzing this.
> 1) This patch uses brcmf_get_ifp which is available in 4.4+ only.
> 2) It applies cleanly to 4.5+ only due to 32f90caa7debd ("brcmfmac:
> Increase nr of supported flowrings.")
> 3) 4.4 would also require applying the patch without the broadcom/ subdir
>
> That said I suggest 4.5+. Any objections?

Let me see if patchwork will pick up the Cc tag as it does for others.

Cc: sta...@vger.kernel.org # 4.5+

This may be worth backporting to 4.4 as well (as it's longterm), but
I'll do it separately due to patch not applying cleanly.

-- 
Rafał


Re: ath10k: Spelling and miscellaneous neatening

2016-09-27 Thread Kalle Valo
Joe Perches  wrote:
> Correct some trivial comment typos.
> Remove unnecessary parentheses in a long line.
> Convert a return; before the end of a void function definition to just ;
> 
> Signed-off-by: Joe Perches 
> Reviewed-by: Julian Calaby 

Patch applied to ath-next branch of ath.git, thanks.

e13dbead976d ath10k: spelling and miscellaneous neatening

-- 
https://patchwork.kernel.org/patch/9304171/

Documentation about submitting wireless patches and checking status
from patchwork:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Arend Van Spriel
On 27-9-2016 13:58, Rafał Miłecki wrote:
> On 27 September 2016 at 13:44, Rafał Miłecki  wrote:
>> On 27 September 2016 at 13:27, Kalle Valo  wrote:
>>> Arend Van Spriel  writes:
>>>
>>>> On 27-9-2016 11:14, Rafał Miłecki wrote:
>>>>> From: Rafał Miłecki 
>>>>>
>>>>> Flowrings contain skbs waiting for transmission that were passed to us
>>>>> by netif. It means we checked every one of them looking for 802.1x
>>>>> Ethernet type. When deleting flowring we have to use freeing function
>>>>> that will check for 802.1x type as well.
>>>>>
>>>>> Freeing skbs without a proper check was leading to counter not being
>>>>> properly decreased. This was triggering a WARNING every time
>>>>> brcmf_netdev_wait_pend8021x was called.
>>>>
>>>> Acked-by: Arend van Spriel 
>>>>> Signed-off-by: Rafał Miłecki 
>>>>> ---
>>>>> Kalle: this isn't important enough for 4.8 as it's too late for that.
>>>>>
>>>>> I'd like to get it for 4.9 however, as this fixes bug that could lead
>>>>> to WARNING on every add_key/del_key call. We was struggling with these
>>>>> WARNINGs for some time and this fixes one of two problems causing them.
>>>
>>> Ok, I'll queue this for 4.9.
>>>
>>>> Please mark it for stable as well.
>>>
>>> I can add that. Any ideas how old releases stable releases should this
>>> go to?
>>
>> I was analyzing this.
>> 1) This patch uses brcmf_get_ifp which is available in 4.4+ only.
>> 2) It applies cleanly to 4.5+ only due to 32f90caa7debd ("brcmfmac:
>> Increase nr of supported flowrings.")
>> 3) 4.4 would also require applying to the patch without broadcom/ subdir
>>
>> That said I suggest 4.5+. Any objections?

No objections. Just a tip: I tend to look at the kernel.org main page to
see which stable and long-term kernels are listed. So 4.7+ and 4.5+ mean
the same thing, as 4.5 and 4.6 are not stable/long-term kernels.

Regards,
Arend

> Let me see if patchwork will pick up the Cc tag as it does for others.
> 
> Cc: sta...@vger.kernel.org # 4.5+
> 
> This may be worth backporting to 4.4 as well (as it's longterm), but
> I'll do it separately due to patch not applying cleanly.


Re: [PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Rafał Miłecki
On 27 September 2016 at 14:04, Arend Van Spriel
 wrote:
> On 27-9-2016 13:58, Rafał Miłecki wrote:
>> On 27 September 2016 at 13:44, Rafał Miłecki  wrote:
>>> On 27 September 2016 at 13:27, Kalle Valo  wrote:
>>>> Arend Van Spriel  writes:
>>>>
>>>>> On 27-9-2016 11:14, Rafał Miłecki wrote:
>>>>>> From: Rafał Miłecki 
>>>>>>
>>>>>> Flowrings contain skbs waiting for transmission that were passed to us
>>>>>> by netif. It means we checked every one of them looking for 802.1x
>>>>>> Ethernet type. When deleting flowring we have to use freeing function
>>>>>> that will check for 802.1x type as well.
>>>>>>
>>>>>> Freeing skbs without a proper check was leading to counter not being
>>>>>> properly decreased. This was triggering a WARNING every time
>>>>>> brcmf_netdev_wait_pend8021x was called.
>>>>>
>>>>> Acked-by: Arend van Spriel 
>>>>>> Signed-off-by: Rafał Miłecki 
>>>>>> ---
>>>>>> Kalle: this isn't important enough for 4.8 as it's too late for that.
>>>>>>
>>>>>> I'd like to get it for 4.9 however, as this fixes bug that could lead
>>>>>> to WARNING on every add_key/del_key call. We was struggling with these
>>>>>> WARNINGs for some time and this fixes one of two problems causing them.
>>>>
>>>> Ok, I'll queue this for 4.9.
>>>>
>>>>> Please mark it for stable as well.
>>>>
>>>> I can add that. Any ideas how old releases stable releases should this
>>>> go to?
>>>
>>> I was analyzing this.
>>> 1) This patch uses brcmf_get_ifp which is available in 4.4+ only.
>>> 2) It applies cleanly to 4.5+ only due to 32f90caa7debd ("brcmfmac:
>>> Increase nr of supported flowrings.")
>>> 3) 4.4 would also require applying to the patch without broadcom/ subdir
>>>
>>> That said I suggest 4.5+. Any objections?
>
> No objections. Just a tip. I tend to look at kernel.org main page to see
> the stable and long-term kernel listed. So 4.7+ and 4.5+ have same
> meaning as 4.5 and 4.6 are not stable/long-term kernels.

Some projects may work on their own stable kernels, e.g. Ubuntu, see:
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

That's why I don't always look strictly at upstream stable releases only.

-- 
Rafał


Re: [Gigaset307x-common] ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread Tilman Schmidt
On Tue, Sep 27, 2016, at 13:32, SF Markus Elfring wrote:
> >> I got the impression that the exception handling  was incomplete in the
> >> implementation of the function "gigaset_initcs".
> > 
> > That impression is wrong. Careful reading of the code will confirm that.
> 
> * Is it still correct nowadays that the function "gigaset_initcs" did not
>   call the function "kfree" after a later function call failed?

Wrong premise. That statement was never correct.

> * Do you expect that allocated memory will be automatically reclaimed
>   after it would return a null pointer?

No. Should I? Do you?

Regards,
Tilman

-- 
Tilman Schmidt
til...@imap.cc


Re: [PATCH 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Kalle Valo
Rafał Miłecki  writes:

>>>>> Kalle: this isn't important enough for 4.8 as it's too late for that.
>>>>>
>>>>> I'd like to get it for 4.9 however, as this fixes bug that could lead
>>>>> to WARNING on every add_key/del_key call. We was struggling with these
>>>>> WARNINGs for some time and this fixes one of two problems causing them.
>>>
>>> Ok, I'll queue this for 4.9.
>>>
>>>> Please mark it for stable as well.
>>>
>>> I can add that. Any ideas how old releases stable releases should this
>>> go to?
>>
>> I was analyzing this.
>> 1) This patch uses brcmf_get_ifp which is available in 4.4+ only.
>> 2) It applies cleanly to 4.5+ only due to 32f90caa7debd ("brcmfmac:
>> Increase nr of supported flowrings.")
>> 3) 4.4 would also require applying to the patch without broadcom/ subdir
>>
>> That said I suggest 4.5+. Any objections?
>
> Let me see if patchwork will pick up the Cc tag as it does for others.
>
> Cc: sta...@vger.kernel.org # 4.5+

An excellent idea but no luck:

Signed-off-by: Rafa? Mi?ecki 
Acked-by: Arend van Spriel 

I'll add this to my patchwork wishlist though, I think it would be a
really useful feature to have.

(The question marks are because of my buggy copy paste, ignore those)

-- 
Kalle Valo


[PATCH V2 4.9] brcmfmac: use correct skb freeing helper when deleting flowring

2016-09-27 Thread Rafał Miłecki
From: Rafał Miłecki 

Flowrings contain skbs waiting for transmission that were passed to us
by netif. It means we checked every one of them looking for 802.1x
Ethernet type. When deleting flowring we have to use freeing function
that will check for 802.1x type as well.

Freeing skbs without a proper check was leading to counter not being
properly decreased. This was triggering a WARNING every time
brcmf_netdev_wait_pend8021x was called.

Signed-off-by: Rafał Miłecki 
Acked-by: Arend van Spriel 
Cc: sta...@vger.kernel.org # 4.5+
---
V2: Add Cc for stable 4.5+. It doesn't apply cleanly to 4.4 and is not
possible for 4.3- due to missing brcmf_get_ifp.
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
index b16b367..d0b738d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/flowring.c
@@ -234,13 +234,20 @@ static void brcmf_flowring_block(struct brcmf_flowring *flow, u16 flowid,
 
 void brcmf_flowring_delete(struct brcmf_flowring *flow, u16 flowid)
 {
+   struct brcmf_bus *bus_if = dev_get_drvdata(flow->dev);
struct brcmf_flowring_ring *ring;
+   struct brcmf_if *ifp;
u16 hash_idx;
+   u8 ifidx;
struct sk_buff *skb;
 
ring = flow->rings[flowid];
if (!ring)
return;
+
+   ifidx = brcmf_flowring_ifidx_get(flow, flowid);
+   ifp = brcmf_get_ifp(bus_if->drvr, ifidx);
+
brcmf_flowring_block(flow, flowid, false);
hash_idx = ring->hash_id;
flow->hash[hash_idx].ifidx = BRCMF_FLOWRING_INVALID_IFIDX;
@@ -249,7 +256,7 @@ void brcmf_flowring_delete(struct brcmf_flowring *flow, u16 flowid)
 
skb = skb_dequeue(&ring->skblist);
while (skb) {
-   brcmu_pkt_buf_free_skb(skb);
+   brcmf_txfinalize(ifp, skb, false);
skb = skb_dequeue(&ring->skblist);
}
 
-- 
2.9.3
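
The accounting problem described in the commit message can be sketched in user space. This is only an illustration under stated assumptions: every `sketch_*` name below is hypothetical and none of it is the brcmfmac API. It models a per-interface counter that is incremented when an 802.1x packet is queued and must be decremented by whichever path releases the packet, including ring deletion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an skb, an interface and a flowring. */
struct sketch_pkt {
	bool is_8021x;
	struct sketch_pkt *next;
};

struct sketch_if {
	int pend_8021x_cnt;	/* 802.1x packets queued but not yet finalized */
};

struct sketch_ring {
	struct sketch_pkt *head;
};

/* Queue a packet; 802.1x packets are counted as pending. */
static void sketch_enqueue(struct sketch_if *ifp, struct sketch_ring *ring,
			   struct sketch_pkt *pkt)
{
	assert(ifp != NULL && ring != NULL && pkt != NULL);
	if (pkt->is_8021x)
		ifp->pend_8021x_cnt++;
	pkt->next = ring->head;
	ring->head = pkt;
}

/* Finalize a packet: undo the accounting done at enqueue time.
 * A real driver would also free the buffer here.
 */
static void sketch_txfinalize(struct sketch_if *ifp, struct sketch_pkt *pkt)
{
	if (pkt->is_8021x)
		ifp->pend_8021x_cnt--;
}

/* Delete a ring: every queued packet goes through the finalizer, so the
 * pending counter returns to its value from before the packets were queued.
 */
static void sketch_ring_delete(struct sketch_if *ifp, struct sketch_ring *ring)
{
	struct sketch_pkt *pkt = ring->head;

	while (pkt) {
		struct sketch_pkt *next = pkt->next;

		sketch_txfinalize(ifp, pkt);
		pkt = next;
	}
	ring->head = NULL;
}
```

If the delete path released packets without calling the finalizer, the counter in this model would stay above zero after the ring is gone, which corresponds to the condition the commit message says triggered the WARNING.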



Re: [1/3] ath10k: use devm_clk_get() instead of clk_get()

2016-09-27 Thread Kalle Valo
Masahiro Yamada  wrote:
> Use the managed variant of clk_get() to simplify the failure path
> and the .remove callback.
> 
> Signed-off-by: Masahiro Yamada 

3 patches applied to ath-next branch of ath.git, thanks.

828662753d60 ath10k: use devm_clk_get() instead of clk_get()
c5d8a34675d9 ath10k: use devm_reset_control_get() instead of reset_control_get()
65901a9e7058 ath10k: do not check if reset is NULL

-- 
https://patchwork.kernel.org/patch/9316579/

Documentation about submitting wireless patches and checking status
from patchwork:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: [Gigaset307x-common] ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread isdn
Am 27.09.2016 um 13:32 schrieb SF Markus Elfring:
>>> I got the impression that the exception handling  was incomplete in the
>>> implementation of the function "gigaset_initcs".
>>
>> That impression is wrong. Careful reading of the code will confirm that.
> 
> * Is it still correct nowadays that the function "gigaset_initcs" did not
>   call the function "kfree" after a later function call failed?
> 

Yes, if it is handled in another place. Paul already showed you the place.


> * Do you expect that allocated memory will be automatically reclaimed
>   after it would return a null pointer?
> 
Of course not

Best regards
Karsten



[PATCH] ipv6 addrconf: enable use of proc_dointvec_minmax in addrconf_sysctl

2016-09-27 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

Signed-off-by: Maciej Żenczykowski 
---
 net/ipv6/addrconf.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2f1f5d439788..11fa1a5564d4 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6044,8 +6044,14 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
 
for (i = 0; table[i].data; i++) {
table[i].data += (char *)p - (char *)&ipv6_devconf;
-   table[i].extra1 = idev; /* embedded; no ref */
-   table[i].extra2 = net;
+   /* If one of these is already set, then it is not safe to
+* overwrite either of them: this makes proc_dointvec_minmax
+* usable.
+*/
+   if (!table[i].extra1 && !table[i].extra2) {
+   table[i].extra1 = idev; /* embedded; no ref */
+   table[i].extra2 = net;
+   }
}
 
snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name);
-- 
2.8.0.rc3.226.g39d4020
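
The patch carries no description beyond its code comment, so here is a minimal user-space sketch of the reasoning. proc_dointvec_minmax reads its lower and upper bounds through the entry's extra1 and extra2 pointers, so overwriting them unconditionally would replace the bounds with unrelated pointers. The `sketch_*` names are hypothetical and are not the kernel's ctl_table API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a sysctl table entry. */
struct sketch_ctl_entry {
	void *data;	/* NULL data terminates the table */
	void *extra1;	/* handler-specific; a min pointer for minmax handlers */
	void *extra2;	/* handler-specific; a max pointer for minmax handlers */
};

/* Fill in the default context pointers only for entries that have not
 * already claimed extra1/extra2 for their own purposes.
 */
static void sketch_fill_extras(struct sketch_ctl_entry *table,
			       void *idev, void *net)
{
	size_t i;

	assert(table != NULL);
	for (i = 0; table[i].data; i++) {
		if (!table[i].extra1 && !table[i].extra2) {
			table[i].extra1 = idev;
			table[i].extra2 = net;
		}
	}
}
```

An entry that supplies its own min/max pointers keeps them, while entries that left both fields NULL receive the idev/net context as before.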



Re: [Gigaset307x-common] ISDN-Gigaset: Release memory in gigaset_initcs() after an allocation failure

2016-09-27 Thread SF Markus Elfring
>> * Is it still correct nowadays that the function "gigaset_initcs" did not
>>   call the function "kfree" after a later function call failed?
> 
> Yes, if it is handled in another place, Paul already did show you the place.

To which source code place do you refer here?


>> * Do you expect that allocated memory will be automatically reclaimed
>>   after it would return a null pointer?
>>
> Of course not

Thanks for this acknowledgement.

Regards,
Markus


Re: [PATCH] Fix link error in 32bit arch because of 64bit division

2016-09-27 Thread Eric Dumazet
On Tue, 2016-09-27 at 03:42 -0400, Vishwanath Pai wrote:
> Fix link error in 32bit arch because of 64bit division
> 
> Division of 64bit integers will cause linker error undefined reference
> to `__udivdi3'. Fix this by replacing divisions with div64_64
> 
> Signed-off-by: Vishwanath Pai 
> 
> ---
>  net/netfilter/xt_hashlimit.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
> index 44a095e..7fc694e 100644
> --- a/net/netfilter/xt_hashlimit.c
> +++ b/net/netfilter/xt_hashlimit.c
> @@ -465,19 +465,20 @@ static u64 user2credits(u64 user, int revision)
>  {
>   if (revision == 1) {
>   /* If multiplying would overflow... */
> - if (user > 0x / (HZ*CREDITS_PER_JIFFY_v1))
> + if (user > div64_u64(0x, (HZ*CREDITS_PER_JIFFY_v1)))

How can this be needed ? 0x is 32bits, compiler knows how to
compute 0x / (HZ*CREDITS_PER_JIFFY_v1) itself, without using a
64 bit divide !

Please be selective.

>   /* Divide first. */
> - return (user / XT_HASHLIMIT_SCALE) *\
> + return div64_u64(user, XT_HASHLIMIT_SCALE) *\
>   HZ * CREDITS_PER_JIFFY_v1;
>  
> - return (user * HZ * CREDITS_PER_JIFFY_v1) \
> - / XT_HASHLIMIT_SCALE;
> + return div64_u64((user * HZ * CREDITS_PER_JIFFY_v1),
> +   XT_HASHLIMIT_SCALE);
>   } else {
> - if (user > 0x / (HZ*CREDITS_PER_JIFFY))
> - return (user / XT_HASHLIMIT_SCALE_v2) *\

Probably same remark here.

> + if (user > div64_u64(0x, 
> (HZ*CREDITS_PER_JIFFY)))
> + return div64_u64(user, XT_HASHLIMIT_SCALE_v2) *\
>   HZ * CREDITS_PER_JIFFY;
>  
> - return (user * HZ * CREDITS_PER_JIFFY) / XT_HASHLIMIT_SCALE_v2;
> + return div64_u64((user * HZ * CREDITS_PER_JIFFY),
> +  XT_HASHLIMIT_SCALE_v2);
>   }
>  }
>  
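
To illustrate the point about being selective, here is a user-space sketch of the overflow-guarded conversion. It makes several assumptions: the `SKETCH_*` constants are made-up stand-ins for HZ, CREDITS_PER_JIFFY_v1 and XT_HASHLIMIT_SCALE, `UINT32_MAX` is assumed for the limit that appears elided in the quoted diff (based on the "32bits" remark), and `sketch_div64_u64()` merely stands in for the kernel helper div64_u64().

```c
#include <assert.h>
#include <stdint.h>

/* Made-up stand-ins for HZ, CREDITS_PER_JIFFY_v1 and XT_HASHLIMIT_SCALE. */
#define SKETCH_HZ 1000u
#define SKETCH_CREDITS_PER_JIFFY 128u
#define SKETCH_SCALE 10000u

/* Stand-in for div64_u64(); in user space plain division is enough. */
static uint64_t sketch_div64_u64(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

static uint64_t sketch_user2credits(uint64_t user)
{
	/* Both operands of this division are 32-bit constants, so the
	 * compiler folds it at build time; no 64-bit divide helper is
	 * involved.
	 */
	if (user > UINT32_MAX / (SKETCH_HZ * SKETCH_CREDITS_PER_JIFFY))
		/* Divide first so the multiplication does not overflow. */
		return sketch_div64_u64(user, SKETCH_SCALE) *
		       SKETCH_HZ * SKETCH_CREDITS_PER_JIFFY;

	/* The dividend is a 64-bit runtime value: this is the kind of
	 * division that needs the helper on 32-bit architectures.
	 */
	return sketch_div64_u64(user * SKETCH_HZ * SKETCH_CREDITS_PER_JIFFY,
				SKETCH_SCALE);
}
```

Only the divisions whose dividend is a 64-bit runtime value go through the helper in this sketch; the constant expression in the comparison is left as ordinary C arithmetic.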




Re: [PATCH v2] fs/select: add vmalloc fallback for select(2)

2016-09-27 Thread Eric Dumazet
On Tue, 2016-09-27 at 10:13 +0200, Vlastimil Babka wrote:

> I doubt anyone runs that in production, especially if performance is of 
> concern.
> 

I doubt anyone serious runs select() on a large fd set in production.

Last time I used it was in last century.




[PATCH nf-next v2 0/2] fixes for recent nf_compact hooks

2016-09-27 Thread Aaron Conole
Two possible error conditions were caught during an extended testing
session, and by a build robot.  These patches fix the two issues (a
missing handler when config is changed, and a potential NULL
dereference).

Aaron Conole (2):
  netfilter: Fix potential null pointer dereference
  nf_set_hooks_head: acommodate different kconfig

 net/netfilter/core.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

-- 
2.5.5



[PATCH nf-next v2 1/2] netfilter: Fix potential null pointer dereference

2016-09-27 Thread Aaron Conole
It's possible for nf_hook_entry_head to return NULL if two
nf_unregister_net_hook calls happen simultaneously with a single hook
entry in the list.  This fix ensures that no null pointer dereference
could occur when such a race happens.

Signed-off-by: Aaron Conole 
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 360c63d..e58e420 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -160,7 +160,7 @@ void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
 
mutex_lock(&nf_hook_mutex);
hooks_entry = nf_hook_entry_head(net, reg);
-   if (hooks_entry->orig_ops == reg) {
+   if (hooks_entry && hooks_entry->orig_ops == reg) {
nf_set_hooks_head(net, reg,
  nf_entry_dereference(hooks_entry->next));
goto unlock;
-- 
2.7.4
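
The guard added by this patch can be sketched with a plain singly linked list in user space. This is an illustration only: the `sketch_*` types and function are hypothetical, and locking and RCU are left out entirely. It shows why the head must be checked before it is dereferenced when an earlier unregister may already have emptied the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for nf_hook_ops and nf_hook_entry. */
struct sketch_ops {
	int id;
};

struct sketch_entry {
	const struct sketch_ops *orig_ops;
	struct sketch_entry *next;
};

/* Unlink the entry registered with reg.
 * Returns 1 if an entry was unlinked, 0 if none was found.
 */
static int sketch_unregister(struct sketch_entry **head,
			     const struct sketch_ops *reg)
{
	struct sketch_entry *entry;

	assert(head != NULL);
	entry = *head;

	/* The list may already be empty, so test the head before use. */
	if (entry && entry->orig_ops == reg) {
		*head = entry->next;
		return 1;
	}

	while (entry && entry->next) {
		if (entry->next->orig_ops == reg) {
			entry->next = entry->next->next;
			return 1;
		}
		entry = entry->next;
	}
	return 0;
}
```

Calling the function a second time for the same registration finds an empty list and returns 0 instead of dereferencing a NULL head.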



[PATCH nf-next v2 2/2] nf_set_hooks_head: acommodate different kconfig

2016-09-27 Thread Aaron Conole
When CONFIG_NETFILTER_INGRESS is unset (or no), we need to handle
the request for registration properly by dropping the hook.  This
releases the entry during the set.

Signed-off-by: Aaron Conole 
---
 net/netfilter/core.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index e58e420..61e8a9d 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -90,10 +90,12 @@ static void nf_set_hooks_head(struct net *net, const struct nf_hook_ops *reg,
 {
switch (reg->pf) {
case NFPROTO_NETDEV:
+#ifdef CONFIG_NETFILTER_INGRESS
/* We already checked in nf_register_net_hook() that this is
 * used from ingress.
 */
rcu_assign_pointer(reg->dev->nf_hooks_ingress, entry);
+#endif
break;
default:
rcu_assign_pointer(net->nf.hooks[reg->pf][reg->hooknum],
@@ -107,10 +109,15 @@ int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
struct nf_hook_entry *hooks_entry;
struct nf_hook_entry *entry;
 
-   if (reg->pf == NFPROTO_NETDEV &&
-   (reg->hooknum != NF_NETDEV_INGRESS ||
-!reg->dev || dev_net(reg->dev) != net))
-   return -EINVAL;
+   if (reg->pf == NFPROTO_NETDEV) {
+#ifndef CONFIG_NETFILTER_INGRESS
+   if (reg->hooknum == NF_NETDEV_INGRESS)
+   return -EOPNOTSUPP;
+#endif
+   if (reg->hooknum != NF_NETDEV_INGRESS ||
+   !reg->dev || dev_net(reg->dev) != net)
+   return -EINVAL;
+   }
 
entry = kmalloc(sizeof(*entry), GFP_KERNEL);
if (!entry)
-- 
2.7.4



Re: [PATCH net-next 0/2] net: ethernet: mediatek: some bug fixes for PDAM and HW LRO

2016-09-27 Thread David Miller
From: Nelson Chang 
Date: Mon, 26 Sep 2016 14:33:48 +0800

> 1) Add to stop PDMA while stopping the frame engine
> 2) Modify the register settings for LRO relinquishments
> 3) Jump out from the waiting loop while LRO relinquishments are done

Series applied, but like Sergei I think you should have split patch
#2 into two separate patches.

You even list the changes individually here in your header
posting.


Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-27 Thread David Miller
From: Daniel Borkmann 
Date: Tue, 27 Sep 2016 12:39:34 +0200

> Any reason why dev_forward_skb() is not preferred over direct
> netif_receive_skb() you're using? It would, for example, implicitly
> assure that pkt_type is always PACKET_HOST, etc.

dev_forward_skb() will pull the ethernet header.

And since a direct call to netif_receive_skb() will not, one of these
two choices won't work properly.



[PATCH v7 6/8] thunderbolt: Networking transmit and receive

2016-09-27 Thread Amir Levy
This patch provides the handling interface for sending and receiving
network packets between the hosts over the full communication route
(using the communication path established in the previous patch).

The Thunderbolt Network driver interfaces the Linux network stack
and the hardware controller configuration to handle packet transmissions:
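  +--------------------+            +--------------------+
  |Host 1              |            |Host 2              |
  |                    |            |                    |
  |   +-----------+    |            |    +-----------+   |
  |   |Network    |    |            |    |Network    |   |
  |   |Stack      |    |            |    |Stack      |   |
  |   +-----------+    |            |    +-----------+   |
  |         ^          |            |          ^         |
  |         |          |            |          |         |
  |         v          |            |          v         |
  |   +-----------+    |            |    +-----------+   |
  |   |Thunderbolt|    |            |    |Thunderbolt|   |
  |   |Networking |    |            |    |Networking |   |
  |   |Driver     |    |            |    |Driver     |   |
  |   +-----------+    |            |    +-----------+   |
  |         ^          |            |          ^         |
  |         |          |            |          |         |
  |         v          |            |          v         |
  |   +-----------+    |            |    +-----------+   |
  |   |Thunderbolt|    |            |    |Thunderbolt|   |
  |   |Controller |<---+------------+--->|Controller |   |
  |   +-----------+    |            |    +-----------+   |
  +--------------------+            +--------------------+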

Signed-off-by: Amir Levy 
---
 drivers/thunderbolt/icm/icm_nhi.c |   15 +
 drivers/thunderbolt/icm/net.c | 1471 +
 2 files changed, 1486 insertions(+)

diff --git a/drivers/thunderbolt/icm/icm_nhi.c b/drivers/thunderbolt/icm/icm_nhi.c
index 578eb14..c181abf 100644
--- a/drivers/thunderbolt/icm/icm_nhi.c
+++ b/drivers/thunderbolt/icm/icm_nhi.c
@@ -1036,6 +1036,7 @@ static irqreturn_t nhi_msi(int __always_unused irq, void *data)
 {
struct tbt_nhi_ctxt *nhi_ctxt = data;
u32 isr0, isr1, imr0, imr1;
+   int i;
 
/* clear on read */
isr0 = ioread32(nhi_ctxt->iobase + REG_RING_NOTIFY_BASE);
@@ -1058,6 +1059,20 @@ static irqreturn_t nhi_msi(int __always_unused irq, void *data)
 
spin_unlock(&nhi_ctxt->lock);
 
+   for (i = 0; i < nhi_ctxt->num_ports; ++i) {
+   struct net_device *net_dev =
+   nhi_ctxt->net_devices[i].net_dev;
+   if (net_dev) {
+   u8 path = PATH_FROM_PORT(nhi_ctxt->num_paths, i);
+
+   if (isr0 & REG_RING_INT_RX_PROCESSED(
+   path, nhi_ctxt->num_paths))
+   tbt_net_rx_msi(net_dev);
+   if (isr0 & REG_RING_INT_TX_PROCESSED(path))
+   tbt_net_tx_msi(net_dev);
+   }
+   }
+
if (isr0 & REG_RING_INT_RX_PROCESSED(TBT_ICM_RING_NUM,
 nhi_ctxt->num_paths))
schedule_work(&nhi_ctxt->icm_msgs_work);
diff --git a/drivers/thunderbolt/icm/net.c b/drivers/thunderbolt/icm/net.c
index acf30ad..a0f3c4a 100644
--- a/drivers/thunderbolt/icm/net.c
+++ b/drivers/thunderbolt/icm/net.c
@@ -134,6 +134,17 @@ struct approve_inter_domain_connection_cmd {
 
 };
 
+struct tbt_frame_header {
+   /* size of the data with the frame */
+   __le32 frame_size;
+   /* running index on the frames */
+   __le16 frame_index;
+   /* ID of the frame to match frames to specific packet */
+   __le16 frame_id;
+   /* how many frames assembles a full packet */
+   __le32 frame_count;
+};
+
 enum neg_event {
RECEIVE_LOGOUT = NUM_MEDIUM_STATUSES,
RECEIVE_LOGIN_RESPONSE,
@@ -141,15 +152,81 @@ enum neg_event {
NUM_NEG_EVENTS
 };
 
+enum frame_status {
+   GOOD_FRAME,
+   GOOD_AS_FIRST_FRAME,
+   GOOD_AS_FIRST_MULTICAST_FRAME,
+   FRAME_NOT_READY,
+   FRAME_ERROR,
+};
+
+enum packet_filter {
+   /* all multicast MAC addresses */
+   PACKET_TYPE_ALL_MULTICAST,
+   /* all types of MAC addresses: multicast, unicast and broadcast */
+   PACKET_TYPE_PROMISCUOUS,
+   /* all unicast MAC addresses */
+   PACKET_TYPE_UNICAST_PROMISCUOUS,
+};
+
 enum disconnect_path_stage {
STAGE_1 = BIT(0),
STAGE_2 = BIT(1)
 };
 
+struct tbt_net_stats {
+   u64 tx_packets;
+   u64 tx_bytes;
+   u64 tx_errors;
+   u64 rx_packets;
+   u64 rx_bytes;
+   u64 rx_length_errors;
+   u64 rx_over_errors;
+   u64 rx_crc_errors;
+   u64 rx_missed_errors;
+   u64 multicast;
+};
+
+static const char tbt_net_gstrings_stats[][ETH_GSTRING_LEN] = {
+   "tx_packets",
+   "tx_bytes",
+   "tx_errors",
+   "rx_packets",
+   "rx_bytes",
+   "rx_length_errors",
+   "rx_over_errors",
+   "rx_crc_errors",
+   "rx_missed_errors",
+   "multicast
