date:20170828

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-28 Thread Greg KH

On Mon, Aug 28, 2017 at 04:46:07PM -0700, Joe Perches wrote:
> On Mon, 2017-08-28 at 16:42 -0700, David Miller wrote:
> > From: Greg Kroah-Hartman 
> > Date: Sun, 27 Aug 2017 17:03:30 +0200
> > 
> > > The IRDA code has long been obsolete and broken.  So, to keep people
> > > from trying to use it, and to prevent people from having to maintain it,
> > > let's move it to drivers/staging/ so that we can delete it entirely from
> > > the kernel in a few releases.
> > 
> > No objection, I'll apply this to net-next, thanks Greg.
> 
> Still needs an update to MAINTAINERS.

Oops, forgot those directories, will send a follow-on patch for that.

greg k-h

Re: [PATCH net 0/4] xfrm_user info leaks

2017-08-28 Thread Steffen Klassert

On Mon, Aug 28, 2017 at 03:52:32PM -0700, David Miller wrote:
> From: Mathias Krause 
> Date: Sat, 26 Aug 2017 17:08:56 +0200
> 
> > Hi David, Steffen,
> > 
> > the following series fixes a few info leaks due to missing padding byte
> > initialization in the xfrm_user netlink interface.
> > 
> > Please apply!
> 
> Steffen please pick this up if you haven't already.

I had it already in the ipsec/testing branch, now merged into
ipsec/master.

Thanks everyone!

Re: [PATCH net-next] hinic: don't build the module by default

2017-08-28 Thread David Miller

From: Vitaly Kuznetsov 
Date: Mon, 28 Aug 2017 15:16:05 +0200

> We probably don't want to enable code supporting particular hardware by
> default e.g. when someone does 'make defconfig'. Other ethernet modules
> don't do it.
> 
> Signed-off-by: Vitaly Kuznetsov 

Applied, thanks.

Re: [PATCH net-next v2 00/10] net: dsa: add generic debugfs interface

2017-08-28 Thread David Miller

From: Vivien Didelot 
Date: Mon, 28 Aug 2017 15:17:38 -0400

> This patch series adds a generic debugfs interface for the DSA
> framework, so that all switch devices benefit from it, e.g. Marvell,
> Broadcom, Microchip or any other DSA driver.

I've been thinking this over and I agree with the feedback given that
debugfs really isn't appropriate for this.

Please create a DSA device class, and hang these values under
appropriate sysfs device nodes that can be easily found via
/sys/class/dsa/ just as easily as they would be /sys/kernel/debug/dsa/

You really intend these values to be consistent across DSA devices,
and you don't intend to go willy-nilly changig these exported values
arbitrarily over time.  That's what debugfs is for, throw-away
stuff.

So please make these proper device sysfs attributes rather than
debugfs.

Thank you.

Re: [net-next] be2net: use shift instead of expensive divide

2017-08-28 Thread David Miller

From: Zhang Shengju 
Date: Tue, 29 Aug 2017 11:18:39 +0800

> Replace shift instead of expensive divide.
> 
> Signed-off-by: Zhang Shengju 

The divide is more easier to understand, and it costs the same
as a shift if the types are unsigned.

I'm not applying silly changes like this which actually make
the code worse off, sorry.

[PATCH net-next v2] bridge: fdb add and delete tracepoints

2017-08-28 Thread Roopa Prabhu

From: Roopa Prabhu 

A few useful tracepoints to trace bridge forwarding
database updates.

Signed-off-by: Roopa Prabhu 
---
v2 - address comments from Florian

 include/trace/events/bridge.h |   98 +
 net/bridge/br_fdb.c   |7 +++
 net/core/net-traces.c |6 +++
 3 files changed, 111 insertions(+)
 create mode 100644 include/trace/events/bridge.h

diff --git a/include/trace/events/bridge.h b/include/trace/events/bridge.h
new file mode 100644
index 000..3a4ecc3
--- /dev/null
+++ b/include/trace/events/bridge.h
@@ -0,0 +1,98 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM bridge
+
+#if !defined(_TRACE_BRIDGE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_BRIDGE_H
+
+#include 
+#include 
+
+#include "../../../net/bridge/br_private.h"
+
+TRACE_EVENT(br_fdb_add,
+
+   TP_PROTO(struct ndmsg *ndm, struct net_device *dev,
+const unsigned char *addr, u16 vid, u16 nlh_flags),
+
+   TP_ARGS(ndm, dev, addr, vid, nlh_flags),
+
+   TP_STRUCT__entry(
+   __field(u8, ndm_flags)
+   __string(dev, dev->name)
+   __array(unsigned char, addr, ETH_ALEN)
+   __field(u16, vid)
+   __field(u16, nlh_flags)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev, dev->name);
+   memcpy(__entry->addr, addr, ETH_ALEN);
+   __entry->vid = vid;
+   __entry->nlh_flags = nlh_flags;
+   __entry->ndm_flags = ndm->ndm_flags;
+   ),
+
+   TP_printk("dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u nlh_flags 
%04x ndm_flags = %02x",
+ __get_str(dev), __entry->addr[0], __entry->addr[1],
+ __entry->addr[2], __entry->addr[3], __entry->addr[4],
+ __entry->addr[5], __entry->vid,
+ __entry->nlh_flags, __entry->ndm_flags)
+);
+
+TRACE_EVENT(br_fdb_external_learn_add,
+
+   TP_PROTO(struct net_bridge *br, struct net_bridge_port *p,
+const unsigned char *addr, u16 vid),
+
+   TP_ARGS(br, p, addr, vid),
+
+   TP_STRUCT__entry(
+   __string(br_dev, br->dev->name)
+   __string(dev, p->dev->name)
+   __array(unsigned char, addr, ETH_ALEN)
+   __field(u16, vid)
+   ),
+
+   TP_fast_assign(
+   __assign_str(br_dev, br ? br->dev->name : "null");
+   __assign_str(dev, p ? p->dev->name : "null");
+   memcpy(__entry->addr, addr, ETH_ALEN);
+   __entry->vid = vid;
+   ),
+
+   TP_printk("br_dev %s port %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
+ __get_str(br_dev), __get_str(dev), __entry->addr[0],
+ __entry->addr[1], __entry->addr[2], __entry->addr[3],
+ __entry->addr[4], __entry->addr[5], __entry->vid)
+);
+
+TRACE_EVENT(fdb_delete,
+
+   TP_PROTO(struct net_bridge *br, struct net_bridge_fdb_entry *f),
+
+   TP_ARGS(br, f),
+
+   TP_STRUCT__entry(
+   __string(br_dev, br->dev->name)
+   __string(dev, f->dst ? f->dst->dev->name : "null")
+   __array(unsigned char, addr, ETH_ALEN)
+   __field(u16, vid)
+   ),
+
+   TP_fast_assign(
+   __assign_str(br_dev, br ? br->dev->name : "null");
+   __assign_str(dev, f->dst ? f->dst->dev->name : "null");
+   memcpy(__entry->addr, f->addr.addr, ETH_ALEN);
+   __entry->vid = f->vlan_id;
+   ),
+
+   TP_printk("br_dev %s dev %s addr %02x:%02x:%02x:%02x:%02x:%02x vid %u",
+ __get_str(br_dev), __get_str(dev), __entry->addr[0],
+ __entry->addr[1], __entry->addr[2], __entry->addr[3],
+ __entry->addr[4], __entry->addr[5], __entry->vid)
+);
+
+#endif /* _TRACE_BRIDGE_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index a79b648..be5e1da 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "br_private.h"
 
 static struct kmem_cache *br_fdb_cache __read_mostly;
@@ -171,6 +172,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, const 
unsigned char *addr)
 
 static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f)
 {
+   trace_fdb_delete(br, f);
+
if (f->is_static)
fdb_del_hw_addr(br, f->addr.addr);
 
@@ -870,6 +873,8 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
struct net_bridge *br = NULL;
int err = 0;
 
+   trace_br_fdb_add(ndm, dev, addr, vid, nlh_flags);
+
if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_NOARP|NUD_REACHABLE))) {
pr_info("bridge: RTM_NEWNEIGH with invalid state %#x\n", 
ndm->ndm_state);
return -EINVAL;
@@ -1066,6 +1071,8 @@ int

Re: [PATCH] be2net: Fix some u16 fields appropriately

2017-08-28 Thread David Miller

From: 严海双 
Date: Tue, 29 Aug 2017 09:04:57 +0800

> The GET_TX_COMPL_BITS comes from amap_get which also returns a 32-bit value:

It never returns a value with more than 16-bits of significance for
this specific call.

Please stop trying to be semantically clever when arguing about this
change.

It's not about types, it's about what range of values the struct
member can actually hold.

Re: [PATCH net] net: dsa: Don't dereference dst->cpu_dp->netdev

2017-08-28 Thread David Miller

From: Florian Fainelli 
Date: Mon, 28 Aug 2017 17:10:51 -0700

> If we do not have a master network device attached dst->cpu_dp will be
> NULL and accessing cpu_dp->netdev will create a trace similar to the one
> below. The correct check is on dst->cpu_dp period.
 ...
> Reported-by: Dan Carpenter 
> Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use 
> dst->cpu_dp->netdev")
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.

Re: net.ipv4.tcp_max_syn_backlog implementation

2017-08-28 Thread Eric Dumazet

On Mon, 2017-08-28 at 23:47 -0400, Harsha Chenji wrote:
> So I have ubuntu 12.04 x32 in a VM with syncookies turned off. I tried
> to do a syn flood (with netwox) on 3 different processes. Each of them
> returns a different value with netstat -na | grep -c RECV :
> 
> nc -l  returns 16 (netcat-traditional)
> apache2 port 80 returns 256
> vsftpd on 21 returns 64.
> net.ipv4.tcp_max_syn_backlog is 512.
> 
> Why do these different processes on different ports have different
> queue lengths for incomplete connections? Where exactly in the kernel
> is this decided?

See 2nd argument in listen() system call, ie backlog 

man listen

Without a synflood, just look at "ss -t state listening" 

The backlog is the 2nd column (Send)

Re: net.ipv4.tcp_max_syn_backlog implementation

2017-08-28 Thread Willy Tarreau

On Mon, Aug 28, 2017 at 11:47:41PM -0400, Harsha Chenji wrote:
> So I have ubuntu 12.04 x32 in a VM with syncookies turned off. I tried
> to do a syn flood (with netwox) on 3 different processes. Each of them
> returns a different value with netstat -na | grep -c RECV :
> 
> nc -l  returns 16 (netcat-traditional)
> apache2 port 80 returns 256
> vsftpd on 21 returns 64.
> net.ipv4.tcp_max_syn_backlog is 512.
> 
> Why do these different processes on different ports have different
> queue lengths for incomplete connections? Where exactly in the kernel
> is this decided?

The listening socket's backlog (second argument to the listen() syscall)
is considered as well. The code path to determine whether or not to start
to send SYN cookies is far from being trivial but makes sense once you
write it down completely. I never perfectly remember it, I regularly have
to recheck when I have a doubt.

Willy

Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters

2017-08-28 Thread Alexei Starovoitov

On Mon, Aug 28, 2017 at 08:22:31PM -0600, David Ahern wrote:
> On 8/28/17 7:12 PM, Alexei Starovoitov wrote:
>  To consider what happens on doubling back and changing programs in the
>  hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
>  recursive on 'b' and recursive on 'c') for each of the following cases:
> 
>  1. Program attached to 'b' is detached, recursive flag is reset in the
>  request. Attempt fails EINVAL because the recursion flag has to be set.
> >>>
> >>> didn't get this point. you mean 'detach' will fail?
> >>
> >> Yes, because it tried to reset the flag in the process.
> >>
> >> This is something we can make user friendly - the detach succeeds, but
> >> the recurse flag is ignored and the recurse flag in the group is not
> >> reset unless it is the base group with the recurse flag (i.e., the
> >> parent is marked non-recurse).
> > 
> > if we don't reset group flags to default it will be even more difficult
> > for users to use, since attach with recursive flag + immediate detach sets
> > some internal flag on the cgroup and user space has no way of
> > either querying this flag or deleting it.
> 
> We have discussed this before -- the need to know which cgroup has a
> program and now what is the status of flags. That need is a different
> problem than this patch set.
> 
> I'll address the reset of the flags below to keep that discussion together.
> 
> > 
> >>>
>  2. Program attached to 'b' is detached, recursive flag is set. Allowed.
> >>>
> >>> meaing that detach from 'b' has to pass recurse flag to be detached?
> >>> That's also odd.
> >>> imo detach should always succeed and the process doing detach
> >>> shouldn't need to know what flags were used in attach.
> >>
> >> Then, we agree to make it user friendly and handle resetting the recurse
> >> flag automatically.
> > 
> > in that sense yes attach/delete pair should be side-effect free.
> > 
>  Process in 'b' opens a socket. No program attached to 'b' so no program
>  is run. Recursive flag is set to program from 'a' is run. Stop.
> 
>  We should allow the recursive flag to be reset if the parent is not
>  recursive allowing an unwind of settings applied. I'll add that change.
> >>>
> >>> I don't get this part.
> >>> Anyway looking forward to the next patch set with tests and comments like 
> >>> above.
> >>
> >> Per above discussion, you don't want 'a' run since it is not marked
> >> recurse. My last sentence is the user friendly part in resetting the
> >> flag in the cgroup.
> > 
> > I'm still not grasping fully the semantics of what you're proposing.
> > You keep saying that override and recurse flags are indepedent, but
> > the more we talk the more it's clear that there is a complicated
> > relationship between them. Like no_override overrules everything, etc.
> 
> yes, I have said that a few times. Override should block everything in
> terms of installing programs. If it is not enabled, the status of the
> recurse flag is not relevant at attach / detach time as the call should
> fail. So installing a program with the recurse flag only works if
> override is allowed.
> 
> 
> > I'm looking for the simplest to use logic. Not implementation.
> > Implementation can be complex, but uapi should be as simple to
> > explain and as simple to understand as possible.
> > So how about allowing recurse+overide combination only?
> > All descendents must be recurse+override too and
> > no program allowed to be set on parent unless it's recurse+override
> > as well. Then detach anywhere is simple, since all programs in
> > such chain are always recurse+override.
> 
> 
> Let's walk through examples based on the new ground rule - recursion
> stops at last cgroup with flag set.
> 
> Assuming override is allowed ...
> 
> ${MNT}/a/b/c/d
> 
> - 'a' has no program
> 
> - 'b' has a program, override allowed, recurse set
> 
> - 'c' and 'd' inherit the program from 'b' by recursion, not inheritance
> (ie., bpf.effective is not updated with the program from 'b', but the
> recurse flag is set on 'c' and 'd').
> 
> At this point 'c' and 'd' can ONLY take programs that are recursive.
> 
> - 'c' gets a program installed
> - 'd' gets a program installed.
> 
> 
> Process in 'd' has programs run in this order: 'd', 'c', 'b'
> 
> 
> Now, program 'c' is detached. It is in the middle of the recursive set.
> It MUST keep the recurse flag set as it inherited the restriction from
> 'b'. The recurse flag on 'c' can ONLY be reset when the program is
> detached from 'b' as it is the start of the recursive chain.
> 
> I'll stop here to make sure we agree on the above. Considering all
> permutations is a maze.

Agree on the above, but you're mixing semantics of the new recurse
flag and implementation of it. Ex: we don't have to copy this flag
from prog->attr into cgroup. So this reset or non-reset discussion
only makes sense in the context of your current implementation.
We can implement the logic

net.ipv4.tcp_max_syn_backlog implementation

2017-08-28 Thread Harsha Chenji

So I have ubuntu 12.04 x32 in a VM with syncookies turned off. I tried
to do a syn flood (with netwox) on 3 different processes. Each of them
returns a different value with netstat -na | grep -c RECV :

nc -l  returns 16 (netcat-traditional)
apache2 port 80 returns 256
vsftpd on 21 returns 64.
net.ipv4.tcp_max_syn_backlog is 512.

Why do these different processes on different ports have different
queue lengths for incomplete connections? Where exactly in the kernel
is this decided?

Re: [net-next] be2net: use shift instead of expensive divide

2017-08-28 Thread Joe Perches

On Tue, 2017-08-29 at 11:18 +0800, Zhang Shengju wrote:
> Replace shift instead of expensive divide.

This change is mostly pointless.
Any half-way decent compiler should produce the same object.
gcc emits the same object with the old code.

The AMAP_GET_BITS macro uses the "offsetof(struct, member) / 32"
style too.

> diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
> b/drivers/net/ethernet/emulex/benet/be_main.c
[]
> @@ -2455,7 +2455,7 @@ static struct be_rx_compl_info *be_rx_compl_get(struct 
> be_rx_obj *rxo)
>  
>   /* For checking the valid bit it is Ok to use either definition as the
>* valid bit is at the same position in both v0 and v1 Rx compl */
> - if (compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) / 32] == 0)
> + if (compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) >> 5] == 0)

etc...

Re: Hooking on L4 Level with process information

2017-08-28 Thread Ravish Kumar

Not convinced with this .

A process open a socket and that socket is associated with that
particular process unless it shares the file descriptors.
Can you explain why it is not related , at a time a socket will be
opened by a particular process.

On Tue, Aug 29, 2017 at 8:49 AM, Stephen Hemminger
 wrote:
> On Tue, 29 Aug 2017 07:34:51 +0530
> Ravish Kumar  wrote:
>
>> Hi,
>>
>> I want to hook tcp/udp packets on L4 Layer and based on process
>> information , content want to deny or allow packets.
>>
>> Netfilter provides pre/post Routing hooks but not sure that will be
>> right place so thought of asking whether my approach is right.
>> Also how i can get process information whether this packet is send by
>> this process.
>>
>> Thoughts /source code reference would be appreciated.
>>
>> Regards,
>> Ravish
>
> There is not a 1:1 relationship between sockets/files and processes.

RE: [patch net-next 2/3] net/sched: Change cls_flower to use IDR

2017-08-28 Thread Chris Mi



> -Original Message-
> From: Simon Horman [mailto:simon.hor...@netronome.com]
> Sent: Monday, August 28, 2017 7:37 PM
> To: Chris Mi 
> Cc: netdev@vger.kernel.org; j...@mojatatu.com;
> xiyou.wangc...@gmail.com; j...@resnulli.us; da...@davemloft.net;
> mawil...@microsoft.com
> Subject: Re: [patch net-next 2/3] net/sched: Change cls_flower to use IDR
> 
> On Mon, Aug 28, 2017 at 02:41:16AM -0400, Chris Mi wrote:
> > Currently, all filters with the same priority are linked in a doubly
> > linked list. Every filter should have a unique handle. To make the
> > handle unique, we need to iterate the list every time to see if the
> > handle exists or not when inserting a new filter. It is time-consuming.
> > For example, it takes about 5m3.169s to insert 64K rules.
> >
> > This patch changes cls_flower to use IDR. With this patch, it takes
> > about 0m1.127s to insert 64K rules. The improvement is huge.
> 
> Very nice :)
> 
> > But please note that in this testing, all filters share the same action.
> > If every filter has a unique action, that is another bottleneck.
> > Follow-up patch in this patchset addresses that.
> >
> > Signed-off-by: Chris Mi 
> > Signed-off-by: Jiri Pirko 
> > ---
> >  net/sched/cls_flower.c | 55
> > +-
> >  1 file changed, 23 insertions(+), 32 deletions(-)
> >
> > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index
> > bd9dab4..3d041d2 100644
> > --- a/net/sched/cls_flower.c
> > +++ b/net/sched/cls_flower.c
> 
> ...
> 
> > @@ -890,6 +870,7 @@ static int fl_change(struct net *net, struct sk_buff
> *in_skb,
> > struct cls_fl_filter *fnew;
> > struct nlattr **tb;
> > struct fl_flow_mask mask = {};
> > +   unsigned long idr_index;
> > int err;
> >
> > if (!tca[TCA_OPTIONS])
> > @@ -920,13 +901,21 @@ static int fl_change(struct net *net, struct sk_buff
> *in_skb,
> > goto errout;
> >
> > if (!handle) {
> > -   handle = fl_grab_new_handle(tp, head);
> > -   if (!handle) {
> > -   err = -EINVAL;
> > +   err = idr_alloc_ext(>handle_idr, fnew, _index,
> > +   1, 0x8000, GFP_KERNEL);
> > +   if (err)
> > goto errout;
> > -   }
> > +   fnew->handle = idr_index;
> > +   }
> > +
> > +   /* user specifies a handle and it doesn't exist */
> > +   if (handle && !fold) {
> > +   err = idr_alloc_ext(>handle_idr, fnew, _index,
> > +   handle, handle + 1, GFP_KERNEL);
> > +   if (err)
> > +   goto errout;
> > +   fnew->handle = idr_index;
> > }
> > -   fnew->handle = handle;
> >
> > if (tb[TCA_FLOWER_FLAGS]) {
> > fnew->flags = nla_get_u32(tb[TCA_FLOWER_FLAGS]);
> > @@ -980,6 +969,8 @@ static int fl_change(struct net *net, struct sk_buff
> *in_skb,
> > *arg = fnew;
> >
> > if (fold) {
> > +   fnew->handle = handle;
> 
> Can it be the case that fold is non-NULL and handle is zero?
> The handling of that case seem to have changed in this patch.
I don't think that could happen.  In function tc_ctl_tfilter(),

fl_get() will be called.  If handle is zero, fl_get() will return NULL.
That means fold is NULL.

> 
> > +   idr_replace_ext(>handle_idr, fnew, fnew->handle);
> > list_replace_rcu(>list, >list);
> > tcf_unbind_filter(tp, >res);
> > call_rcu(>rcu, fl_destroy_filter);
> > --
> > 1.8.3.1
> >

Re: Hooking on L4 Level with process information

2017-08-28 Thread Stephen Hemminger

On Tue, 29 Aug 2017 07:34:51 +0530
Ravish Kumar  wrote:

> Hi,
> 
> I want to hook tcp/udp packets on L4 Layer and based on process
> information , content want to deny or allow packets.
> 
> Netfilter provides pre/post Routing hooks but not sure that will be
> right place so thought of asking whether my approach is right.
> Also how i can get process information whether this packet is send by
> this process.
> 
> Thoughts /source code reference would be appreciated.
> 
> Regards,
> Ravish

There is not a 1:1 relationship between sockets/files and processes.

[net-next] be2net: use shift instead of expensive divide

2017-08-28 Thread Zhang Shengju

Replace shift instead of expensive divide.

Signed-off-by: Zhang Shengju 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 319eee3..e06094f 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2455,7 +2455,7 @@ static struct be_rx_compl_info *be_rx_compl_get(struct 
be_rx_obj *rxo)
 
/* For checking the valid bit it is Ok to use either definition as the
 * valid bit is at the same position in both v0 and v1 Rx compl */
-   if (compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) / 32] == 0)
+   if (compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) >> 5] == 0)
return NULL;
 
rmb();
@@ -2486,7 +2486,7 @@ static struct be_rx_compl_info *be_rx_compl_get(struct 
be_rx_obj *rxo)
}
 
/* As the compl has been parsed, reset it; we wont touch it again */
-   compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) / 32] = 0;
+   compl->dw[offsetof(struct amap_eth_rx_compl_v1, valid) >> 5] = 0;
 
queue_tail_inc(>cq);
return rxcp;
@@ -2590,7 +2590,7 @@ static struct be_tx_compl_info *be_tx_compl_get(struct 
be_tx_obj *txo)
struct be_tx_compl_info *txcp = >txcp;
struct be_eth_tx_compl *compl = queue_tail_node(tx_cq);
 
-   if (compl->dw[offsetof(struct amap_eth_tx_compl, valid) / 32] == 0)
+   if (compl->dw[offsetof(struct amap_eth_tx_compl, valid) >> 5] == 0)
return NULL;
 
/* Ensure load ordering of valid bit dword and other dwords below */
@@ -2600,7 +2600,7 @@ static struct be_tx_compl_info *be_tx_compl_get(struct 
be_tx_obj *txo)
txcp->status = GET_TX_COMPL_BITS(status, compl);
txcp->end_index = GET_TX_COMPL_BITS(wrb_index, compl);
 
-   compl->dw[offsetof(struct amap_eth_tx_compl, valid) / 32] = 0;
+   compl->dw[offsetof(struct amap_eth_tx_compl, valid) >> 5] = 0;
queue_tail_inc(tx_cq);
return txcp;
 }
-- 
1.8.3.1

Re: [PATCH net-next 1/3 v9] net: ether: Add support for multiplexing and aggregation type

2017-08-28 Thread Subash Abhinov Kasiviswanathan


On 2017-08-28 16:22, Dan Williams wrote:

On Thu, 2017-08-24 at 22:39 -0600, Subash Abhinov Kasiviswanathan
wrote:

Define the multiplexing and aggregation (MAP) ether type 0x00F9. This
is needed for receiving data in the MAP protocol like RMNET. This is
not an officially registered ID.

Signed-off-by: Subash Abhinov Kasiviswanathan 

Any chance you could name this QUALCOMM_MAP or something like that?  Or
at least update the comment to include that fact.

Dan



Hi Dan

Sure, I can add a comment for it.
--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project

Re: [patch net-next 11/12] mlxsw: spectrum_dpipe: Add support for IPv4 host table dump

2017-08-28 Thread David Ahern

On 8/27/17 2:31 AM, Arkadi Sharshevsky wrote:
>> Also, this dpipe capability seems to be just dumping data structures
>> maintained by the driver. ie., you can compare the mlxsw view of
>> networking state to IPv4 and IPv6 level tables. Any plans to offer a
>> command that reads data from the h/w and passes that back to the user?
>> i.e, a command to compare kernel tables to h/w state?
>>
> 
> So this infra should provide several things-
> 
> 1) Reveal the interactions between various hardware tables
> 2) Counters for this tables
> 3) Debugabillity
> 
> The first two can be achieved right now. Regarding debugabillity, which
> is a bit vague, the current assumption is that the drivers internal data
> structures are synced with hardware (which is no always true), and maybe
> are not synced with the kernel, so this can be achieved right now by
> dumping the internal state of the driver. Furthermore, the counters are
> dumped from the hardware and give the user additional indication.
> 
> I completely agree that the hardware should be dumped in order to
> validate the internal data structures are really synced with HW. This
> could be usable for observing data corruptions inside the ASIC and
> various complex bugs.
> 
> In order to address that I though about maybe add a flag called
> "validate_hw" so that during the dump the driver<-->hw state could be
> validated.
> 
> What do you think about it?

It is not just a matter of dumping hardware state. The data returned by
dump needs to be consistent across platforms and vendors.

If the intent is validating hardware state matches kernel state (ie.,
h/w forwarding matches s/w forwarding), then the hardware state should
be dumped by the driver in a form that parallels kernel state. e.g.,
dump h/w routes, neighbor entries, fdb's in a form and granularity
similar to what is done for kernel tables.

With the recent dpipe changes that allows kernel to driver cache and
kernel to h/w state comparisons.

[net-next:master 1466/1469] drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:48:10: error: implicit declaration of function 'bnxt_vf_rep_get_fid'

2017-08-28 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
master
head:   acae4b48856838d71d548ab6610a99d8e32653e4
commit: 2ae7408fedfee979e01ed3801223c632bb124c46 [1466/1469] bnxt_en: bnxt: add 
TC flower filter offload support
config: x86_64-randconfig-b0-08290613 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout 2ae7408fedfee979e01ed3801223c632bb124c46
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c: In function 
'bnxt_flow_get_dst_fid':
>> drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:48:10: error: implicit 
>> declaration of function 'bnxt_vf_rep_get_fid' 
>> [-Werror=implicit-function-declaration]
  return bnxt_vf_rep_get_fid(dev);
 ^~~
   In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/timer.h:4,
from include/linux/netdevice.h:28,
from drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:10:
   drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c: At top level:
   include/linux/compiler.h:162:4: warning: '__f' is static but declared in 
inline function 'strcpy' which is not static
   __f = { \
   ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:390:2: note: in expansion of macro 'if'
 if (p_size == (size_t)-1 && q_size == (size_t)-1)
 ^~
   include/linux/compiler.h:162:4: warning: '__f' is static but declared in 
inline function 'kmemdup' which is not static
   __f = { \
   ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:380:2: note: in expansion of macro 'if'
 if (p_size < size)
 ^~
   include/linux/compiler.h:162:4: warning: '__f' is static but declared in 
inline function 'kmemdup' which is not static
   __f = { \
   ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:378:2: note: in expansion of macro 'if'
 if (__builtin_constant_p(size) && p_size < size)
 ^~
   include/linux/compiler.h:162:4: warning: '__f' is static but declared in 
inline function 'memchr_inv' which is not static
   __f = { \
   ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:369:2: note: in expansion of macro 'if'
 if (p_size < size)
 ^~
   include/linux/compiler.h:162:4: warning: '__f' is static but declared in 
inline function 'memchr_inv' which is not static
   __f = { \
   ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:367:2: note: in expansion of macro 'if'
 if (__builtin_constant_p(size) && p_size < size)
 ^~
   include/linux/compiler.h:162:4: warning: '__f' is static but declared in 
inline function 'memchr' which is not static
   __f = { \
   ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:358:2: note: in expansion of macro 'if'
 if (p_size < size)
 ^~
   include/linux/compiler.h:162:4: warning: '__f' is static but declared in 
inline function 'memchr' which is not static
   __f = { \
   ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:356:2: note: in expansion of macro 'if'
 if (__builtin_constant_p(size) && p_size < size)
 ^~
   include/linux/compiler.h:162:4: warning: '__f' is static but declared in 
inline function 'memcmp' which is not static
   __f = { \
   ^
   include/linux/compiler.h:154:23: note: in expansion of macro '__trace_if'
#define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
  ^~
   include/linux/string.h:348:2:

RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-08-28 Thread Dexuan Cui

> From: Dexuan Cui
> Sent: Tuesday, August 22, 2017 21:21
> > ...
> > ...
> > The only problem here would be the potential for a guest and a host app
> to
> > have a conflict wrt port numbers, even though they would be able to
> > operate fine, if restricted to their appropriate transport.
> >
> > Thanks,
> > Jorgen
> 
> Hi Jorgen, Stefan,
> Thank you for the detailed analysis!
> You have a much better understanding than me about the complex
> scenarios. Can you please work out a patch? :-)

Hi Jorgen, Stefan,
May I know your plan for this? 
 
> IMO Linux driver of Hyper-V sockets is the simplest case, as we only have
> the "to host" option (the host side driver of Hyper-V sockets runs on
> Windows kernel and I don't think the other hypervisors emulate
> the full Hyper-V VMBus 4.0, which is required to support Hyper-V sockets).
> 
> -- Dexuan

Thanks,
-- Dexuan

linux-next: manual merge of the net-next tree with the net tree

2017-08-28 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/marvell/mvpp2.c

between commit:

  4c2286826451 ("net: mvpp2: fix the mac address used when using PPv2.2")

from the net tree and commits:

  09f8397553a2 ("net: mvpp2: introduce per-port nrxqs/ntxqs variables")
  213f428f5056 ("net: mvpp2: add support for TX interrupts and RX queue 
distribution modes")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/marvell/mvpp2.c
index 4d598ca8503a,fea9ae5b70ba..
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@@ -6504,7 -7248,9 +7248,9 @@@ static int mvpp2_port_probe(struct plat
struct resource *res;
const char *dt_mac_addr;
const char *mac_from;
 -  char hw_mac_addr[ETH_ALEN];
 +  char hw_mac_addr[ETH_ALEN] = {0};
+   unsigned int ntxqs, nrxqs;
+   bool has_tx_irqs;
u32 id;
int features;
int phy_mode;

Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters

2017-08-28 Thread David Ahern

On 8/28/17 7:12 PM, Alexei Starovoitov wrote:
 To consider what happens on doubling back and changing programs in the
 hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
 recursive on 'b' and recursive on 'c') for each of the following cases:

 1. Program attached to 'b' is detached, recursive flag is reset in the
 request. Attempt fails EINVAL because the recursion flag has to be set.
>>>
>>> didn't get this point. you mean 'detach' will fail?
>>
>> Yes, because it tried to reset the flag in the process.
>>
>> This is something we can make user friendly - the detach succeeds, but
>> the recurse flag is ignored and the recurse flag in the group is not
>> reset unless it is the base group with the recurse flag (i.e., the
>> parent is marked non-recurse).
> 
> if we don't reset group flags to default it will be even more difficult
> for users to use, since attach with recursive flag + immediate detach sets
> some internal flag on the cgroup and user space has no way of
> either querying this flag or deleting it.

We have discussed this before -- the need to know which cgroup has a
program and now what is the status of flags. That need is a different
problem than this patch set.

I'll address the reset of the flags below to keep that discussion together.

> 
>>>
 2. Program attached to 'b' is detached, recursive flag is set. Allowed.
>>>
>>> meaing that detach from 'b' has to pass recurse flag to be detached?
>>> That's also odd.
>>> imo detach should always succeed and the process doing detach
>>> shouldn't need to know what flags were used in attach.
>>
>> Then, we agree to make it user friendly and handle resetting the recurse
>> flag automatically.
> 
> in that sense yes attach/delete pair should be side-effect free.
> 
 Process in 'b' opens a socket. No program attached to 'b' so no program
 is run. Recursive flag is set to program from 'a' is run. Stop.

 We should allow the recursive flag to be reset if the parent is not
 recursive allowing an unwind of settings applied. I'll add that change.
>>>
>>> I don't get this part.
>>> Anyway looking forward to the next patch set with tests and comments like 
>>> above.
>>
>> Per above discussion, you don't want 'a' run since it is not marked
>> recurse. My last sentence is the user friendly part in resetting the
>> flag in the cgroup.
> 
> I'm still not grasping fully the semantics of what you're proposing.
> You keep saying that override and recurse flags are indepedent, but
> the more we talk the more it's clear that there is a complicated
> relationship between them. Like no_override overrules everything, etc.

yes, I have said that a few times. Override should block everything in
terms of installing programs. If it is not enabled, the status of the
recurse flag is not relevant at attach / detach time as the call should
fail. So installing a program with the recurse flag only works if
override is allowed.

> I'm looking for the simplest to use logic. Not implementation.
> Implementation can be complex, but uapi should be as simple to
> explain and as simple to understand as possible.
> So how about allowing recurse+overide combination only?
> All descendents must be recurse+override too and
> no program allowed to be set on parent unless it's recurse+override
> as well. Then detach anywhere is simple, since all programs in
> such chain are always recurse+override.

Let's walk through examples based on the new ground rule - recursion
stops at last cgroup with flag set.

Assuming override is allowed ...

${MNT}/a/b/c/d

- 'a' has no program

- 'b' has a program, override allowed, recurse set

- 'c' and 'd' inherit the program from 'b' by recursion, not inheritance
(ie., bpf.effective is not updated with the program from 'b', but the
recurse flag is set on 'c' and 'd').

At this point 'c' and 'd' can ONLY take programs that are recursive.

- 'c' gets a program installed
- 'd' gets a program installed.

Process in 'd' has programs run in this order: 'd', 'c', 'b'

Now, program 'c' is detached. It is in the middle of the recursive set.
It MUST keep the recurse flag set as it inherited the restriction from
'b'. The recurse flag on 'c' can ONLY be reset when the program is
detached from 'b' as it is the start of the recursive chain.

I'll stop here to make sure we agree on the above. Considering all
permutations is a maze.

###

Also, let's agree on this intention. Based on the new ground rule, I
want to point out this example:

If 'a' gets a program installed with no recurse flag set, ONLY processes
in 'a' have the 'a' program run. Processes in groups 'b', 'c' and 'd'
all stop at cgroup 'b' program.

To me this is counterintuitive and why I said programs are run up to the
first parent without the recurse flag. They are all part of the same tree.

Hooking on L4 Level with process information

2017-08-28 Thread Ravish Kumar

Hi,

I want to hook tcp/udp packets on L4 Layer and based on process
information , content want to deny or allow packets.

Netfilter provides pre/post Routing hooks but not sure that will be
right place so thought of asking whether my approach is right.
Also how i can get process information whether this packet is send by
this process.

Thoughts /source code reference would be appreciated.

Regards,
Ravish

Re: Permissions for eBPF objects

2017-08-28 Thread Chenbo Feng

On Mon, Aug 28, 2017 at 6:15 PM, Alexei Starovoitov
 wrote:
> On Mon, Aug 28, 2017 at 05:47:19PM -0700, Chenbo Feng wrote:
>> On Fri, Aug 25, 2017 at 6:03 PM, Alexei Starovoitov
>>  wrote:
>> > On Fri, Aug 25, 2017 at 10:07:27PM +0200, Daniel Borkmann wrote:
>> >> On 08/25/2017 09:52 PM, Chenbo Feng wrote:
>> >> > On Fri, Aug 25, 2017 at 12:45 PM, Jeffrey Vander Stoep 
>> >> >  wrote:
>> >> > > On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley 
>> >> > >  wrote:
>> >> > > > On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
>> >> > > > wrote:
>> >> > > > > I’d like to get your thoughts on adding LSM permission checks on 
>> >> > > > > BPF
>> >> > > > > objects.
>> >
>> > before reinventing the wheel please take a look at landlock work.
>> > Everything that was discussed in this thread is covered by it.
>> > The patches have been in development for more than a year and most of the 
>> > early
>> > issues have been resolved.
>> > It will be presented again during security summit in LA in September.
>> >
>> I am not very familiar with landlock lsm, isn't this module also
>> depend on the lsm hooks to do
>> the landlock check? If so then adding lsm hooks for eBPF object seems
>> not conflict with the
>> work on progress.
>
> I see. I got it the other way around. What lsm checks are you proposing?
> and why unprivileged_bpf_disabled is not enough?
> you want to allow unpriv only for specific user(s) ?
>
Exactly, the proposal patch I am currently working on will add checks
before map creation,
map read,  and map modify, since all these functionalities will be
available to all users when
unprivileged_bpf_disabled is turned off. And eBPF prog_load may also
need a check as well
since loading some types of program is not restricted either.

RE: [patch net-next 2/3] net/sched: Change cls_flower to use IDR

2017-08-28 Thread Chris Mi

> -Original Message-
> From: Jamal Hadi Salim [mailto:j...@mojatatu.com]
> Sent: Tuesday, August 29, 2017 5:56 AM
> To: Chris Mi ; netdev@vger.kernel.org
> Cc: xiyou.wangc...@gmail.com; j...@resnulli.us; da...@davemloft.net;
> mawil...@microsoft.com
> Subject: Re: [patch net-next 2/3] net/sched: Change cls_flower to use IDR
> 
> On 17-08-28 02:41 AM, Chris Mi wrote:
> > Currently, all filters with the same priority are linked in a doubly
> > linked list. Every filter should have a unique handle. To make the
> > handle unique, we need to iterate the list every time to see if the
> > handle exists or not when inserting a new filter. It is time-consuming.
> > For example, it takes about 5m3.169s to insert 64K rules.
> >
> > This patch changes cls_flower to use IDR. With this patch, it takes
> > about 0m1.127s to insert 64K rules. The improvement is huge.
> >
> > But please note that in this testing, all filters share the same action.
> > If every filter has a unique action, that is another bottleneck.
> > Follow-up patch in this patchset addresses that.
> >
> > Signed-off-by: Chris Mi 
> > Signed-off-by: Jiri Pirko 
> 
> Acked-by: Jamal Hadi Salim 
> 
> As Cong asked last time - any plans to add to other classifiers?
I think if other classifiers don't need so many items, list is enough for them.
If we change all of them, we need spend a lot of time to test them to make sure
there is no regression. But the benefit is not very big. If a certain classifier
need to change in the future, flower is an example for reference.

-Chris
> 
> cheers,
> jamal

Re: Permissions for eBPF objects

2017-08-28 Thread Alexei Starovoitov

On Mon, Aug 28, 2017 at 05:47:19PM -0700, Chenbo Feng wrote:
> On Fri, Aug 25, 2017 at 6:03 PM, Alexei Starovoitov
>  wrote:
> > On Fri, Aug 25, 2017 at 10:07:27PM +0200, Daniel Borkmann wrote:
> >> On 08/25/2017 09:52 PM, Chenbo Feng wrote:
> >> > On Fri, Aug 25, 2017 at 12:45 PM, Jeffrey Vander Stoep 
> >> >  wrote:
> >> > > On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley  
> >> > > wrote:
> >> > > > On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
> >> > > > wrote:
> >> > > > > I’d like to get your thoughts on adding LSM permission checks on 
> >> > > > > BPF
> >> > > > > objects.
> >
> > before reinventing the wheel please take a look at landlock work.
> > Everything that was discussed in this thread is covered by it.
> > The patches have been in development for more than a year and most of the 
> > early
> > issues have been resolved.
> > It will be presented again during security summit in LA in September.
> >
> I am not very familiar with landlock lsm, isn't this module also
> depend on the lsm hooks to do
> the landlock check? If so then adding lsm hooks for eBPF object seems
> not conflict with the
> work on progress.

I see. I got it the other way around. What lsm checks are you proposing?
and why unprivileged_bpf_disabled is not enough?
you want to allow unpriv only for specific user(s) ?

Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters

2017-08-28 Thread Alexei Starovoitov

On Mon, Aug 28, 2017 at 06:43:42PM -0600, David Ahern wrote:
> On 8/28/17 5:56 PM, Alexei Starovoitov wrote:
> > On Sun, Aug 27, 2017 at 08:49:23AM -0600, David Ahern wrote:
> >>
> >> The override flag is independent of the recursive flag. If the override
> >> flag does not allow an override, the attempt to add a new program fails.
> >> The recursive flag brings an additional constraint: once a cgroup has a
> >> program with the recursive flag set it is inherited by all descendant
> >> groups. Attempts to insert a program that changes that flag fails EINVAL.
> >>
> >> Start with the root group at $MNT. No program is attached. By default
> >> override is allowed and recursive is not set.
> > 
> > The above explanation is the reason we need tests for this logic.
> > The default is the opposite! By default override is _not_ allowed.
> 
> yes, I got that backwards. It's how my programs work since I always add
> the OVERRIDE flag ATM.
> 
> > 
> >> 1. Group $MNT/a is created.
> >>
> >> i. Default settings from $MNT are inherited; 'a' has override enabled
> >> and recursive disabled.
> > 
> > not true, but say the user attached a prog wih override on...
> 
> exactly.
> 
> > 
> >> ii. Program is attached. Override flag is set, recursive flag is not set.
> >>
> >> iii. Process in 'a' opens a socket, program attached to 'a' is run.
> >>
> >>
> >> 2. $MNT/a/b is created
> >>
> >> i. 'b' inherits the program and settings of 'a' (override enabled,
> >> recursive disabled).
> >>
> >> ii. Process in 'b' opens a socket. Program inherited from 'a' is run.
> >>
> >> iii. Non-interesting case for this patch set: attaching a non-recursive
> >> program to 'b' overrides the inherited one. process opens a socket only
> >> the 'b' program is run.
> >>
> >> iv. Program is attached to 'b', override flag set, recursive flag set.
> >>
> >> v. Process in 'b' opens a socket. Program attached to 'b' is run and
> >> then program from 'a' is run. Recursion stops here since 'a' does not
> >> have the recursion flag set.
> > 
> > isn't this the problem? Override+non_recurse was set on 'a'.
> > Now we attached override+recurse on 'b' and suddenly 'a'
> > will be run like it was 'recursive'?
> 
> Without 'b', 'a' is run. With 'b' and the recurse flag I am suggesting
> the user is saying I want 'b' and then 'a'. ie., the first parent group
> without the flag is where we stop.
> 
> > imo that is counter intuitive to the owner of 'a'.
> > I think there can be two options:
> > - if recurse is not set on 'a', all of it descendents should not be allowed
> >   to use recurse flag
> > - if recurse is not set on 'a', it should not be run
> 
> If 'a' can be overridden then I don't agree with the first suggestion.
> Which means you are suggesting we stop at the last group with the
> recursive flag set. That is fine too.
> 
> > 
> > imo the former is cleaner and avoids issues with detach in the middle
> > 
> >> 3. $MNT/a/b/c is created
> >>
> >> i. 'c' inherits the settings of 'b' (override is allowed, recursive flag
> >> is set)
> >>
> >> ii. Process in 'c' opens a socket. No program from 'c' exists, so
> >> nothing is run. Recursion flag is set, so program from 'b' is run, then
> >> program from 'a' is run. Stop (recursive flag not set on 'a').
> > 
> > also doesn't make sense to me. Both 'b' and 'c' were attached as
> > override+recurse while 'a' as non-recurse why would it run?
> > The owner of 'a' attached it as override in the first place,
> > so it assumed that if descendent wants to override it it can
> > and the prog 'a' won't be running.
> 
> same argument as above - the question is where do we stop. My suggestion
> and implementation is stopping at the first group without the flag.
> 
> As stated all along, descendant groups inherit programs of the parent
> marked recurse - not by copying them into the group but by recursion.
> 
> 
> > 
> >> iii. Attaching a non-recursive program to 'c' fails because it inherited
> >> the recursive flag from 'b' and that can not be reset by a descendant.
> > 
> > that part makes sense
> > 
> >> iv. Recursive program is attached to 'c'
> >>
> >> v. Process in 'c' opens a socket. Program attached to 'c' is run, then
> >> the program from 'b' and the program from 'a'. Stop.
> >>
> >> etc.
> >>
> >> To consider what happens on doubling back and changing programs in the
> >> hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
> >> recursive on 'b' and recursive on 'c') for each of the following cases:
> >>
> >> 1. Program attached to 'b' is detached, recursive flag is reset in the
> >> request. Attempt fails EINVAL because the recursion flag has to be set.
> > 
> > didn't get this point. you mean 'detach' will fail?
> 
> Yes, because it tried to reset the flag in the process.
> 
> This is something we can make user friendly - the detach succeeds, but
> the recurse flag is ignored and the recurse flag in the group is not
> reset unless it is the base group with the recurse flag (i.e., the
> parent

Re: [PATCH] be2net: Fix some u16 fields appropriately

2017-08-28 Thread 严海双


> On 2017年8月29日, at 上午7:19, David Miller  wrote:
> 
> From: Haishuang Yan 
> Date: Sun, 27 Aug 2017 15:24:45 +0800
> 
>> In be_tx_compl_process, frag_index declared as u32, so it's better to
>> declare last_index as u32 also.
>> 
>> CC: Ajit Khaparde 
>> Fixes: b0fd2eb28bd4 ("be2net: Declare some u16 fields as u32 to improve
>> performance")
>> Signed-off-by: Haishuang Yan 
> 
> That is not a legitimate reason for making this change.
> 
>> @@ -255,7 +255,7 @@ struct be_tx_stats {
>> /* Structure to hold some data of interest obtained from a TX CQE */
>> struct be_tx_compl_info {
>>  u8 status;  /* Completion status */
>> -u16 end_index;  /* Completed TXQ Index */
>> +u32 end_index;  /* Completed TXQ Index */
>> };
>> 
>> struct be_tx_obj {
> 
> The ->end_index comes solely from:
> 
>   txcp->end_index = GET_TX_COMPL_BITS(wrb_index, compl);
> 
> Which is precisely a 16-bit value.
> 
> I'm not applying this, sorry.
> 

Hi David,

The GET_TX_COMPL_BITS comes from amap_get which also returns a 32-bit value:

  6 static inline u32 amap_get(void *ptr, u32 dw_offset, u32 mask, u32 offset)
  5 {
  4 u32 *dw = (u32 *) ptr;
  3 return mask & (*(dw + dw_offset) >> offset);
  2 }
  1
869 #define AMAP_GET_BITS(_struct, field, ptr)  \
  1 amap_get(ptr,   \
  2 offsetof(_struct, field)/32,\
  3 amap_mask(sizeof(((_struct *)0)->field)),   \
  4 AMAP_BIT_OFFSET(_struct, field))

Re: Permissions for eBPF objects

2017-08-28 Thread Chenbo Feng

On Fri, Aug 25, 2017 at 6:03 PM, Alexei Starovoitov
 wrote:
> On Fri, Aug 25, 2017 at 10:07:27PM +0200, Daniel Borkmann wrote:
>> On 08/25/2017 09:52 PM, Chenbo Feng wrote:
>> > On Fri, Aug 25, 2017 at 12:45 PM, Jeffrey Vander Stoep  
>> > wrote:
>> > > On Fri, Aug 25, 2017 at 12:26 PM, Stephen Smalley  
>> > > wrote:
>> > > > On Fri, 2017-08-25 at 11:01 -0700, Jeffrey Vander Stoep via Selinux
>> > > > wrote:
>> > > > > I’d like to get your thoughts on adding LSM permission checks on BPF
>> > > > > objects.
>
> before reinventing the wheel please take a look at landlock work.
> Everything that was discussed in this thread is covered by it.
> The patches have been in development for more than a year and most of the 
> early
> issues have been resolved.
> It will be presented again during security summit in LA in September.
>
I am not very familiar with landlock lsm, isn't this module also
depend on the lsm hooks to do
the landlock check? If so then adding lsm hooks for eBPF object seems
not conflict with the
work on progress.

Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters

2017-08-28 Thread David Ahern

On 8/28/17 5:56 PM, Alexei Starovoitov wrote:
> On Sun, Aug 27, 2017 at 08:49:23AM -0600, David Ahern wrote:
>>
>> The override flag is independent of the recursive flag. If the override
>> flag does not allow an override, the attempt to add a new program fails.
>> The recursive flag brings an additional constraint: once a cgroup has a
>> program with the recursive flag set it is inherited by all descendant
>> groups. Attempts to insert a program that changes that flag fails EINVAL.
>>
>> Start with the root group at $MNT. No program is attached. By default
>> override is allowed and recursive is not set.
> 
> The above explanation is the reason we need tests for this logic.
> The default is the opposite! By default override is _not_ allowed.

yes, I got that backwards. It's how my programs work since I always add
the OVERRIDE flag ATM.

> 
>> 1. Group $MNT/a is created.
>>
>> i. Default settings from $MNT are inherited; 'a' has override enabled
>> and recursive disabled.
> 
> not true, but say the user attached a prog wih override on...

exactly.

> 
>> ii. Program is attached. Override flag is set, recursive flag is not set.
>>
>> iii. Process in 'a' opens a socket, program attached to 'a' is run.
>>
>>
>> 2. $MNT/a/b is created
>>
>> i. 'b' inherits the program and settings of 'a' (override enabled,
>> recursive disabled).
>>
>> ii. Process in 'b' opens a socket. Program inherited from 'a' is run.
>>
>> iii. Non-interesting case for this patch set: attaching a non-recursive
>> program to 'b' overrides the inherited one. process opens a socket only
>> the 'b' program is run.
>>
>> iv. Program is attached to 'b', override flag set, recursive flag set.
>>
>> v. Process in 'b' opens a socket. Program attached to 'b' is run and
>> then program from 'a' is run. Recursion stops here since 'a' does not
>> have the recursion flag set.
> 
> isn't this the problem? Override+non_recurse was set on 'a'.
> Now we attached override+recurse on 'b' and suddenly 'a'
> will be run like it was 'recursive'?

Without 'b', 'a' is run. With 'b' and the recurse flag I am suggesting
the user is saying I want 'b' and then 'a'. ie., the first parent group
without the flag is where we stop.

> imo that is counter intuitive to the owner of 'a'.
> I think there can be two options:
> - if recurse is not set on 'a', all of it descendents should not be allowed
>   to use recurse flag
> - if recurse is not set on 'a', it should not be run

If 'a' can be overridden then I don't agree with the first suggestion.
Which means you are suggesting we stop at the last group with the
recursive flag set. That is fine too.

> 
> imo the former is cleaner and avoids issues with detach in the middle
> 
>> 3. $MNT/a/b/c is created
>>
>> i. 'c' inherits the settings of 'b' (override is allowed, recursive flag
>> is set)
>>
>> ii. Process in 'c' opens a socket. No program from 'c' exists, so
>> nothing is run. Recursion flag is set, so program from 'b' is run, then
>> program from 'a' is run. Stop (recursive flag not set on 'a').
> 
> also doesn't make sense to me. Both 'b' and 'c' were attached as
> override+recurse while 'a' as non-recurse why would it run?
> The owner of 'a' attached it as override in the first place,
> so it assumed that if descendent wants to override it it can
> and the prog 'a' won't be running.

same argument as above - the question is where do we stop. My suggestion
and implementation is stopping at the first group without the flag.

As stated all along, descendant groups inherit programs of the parent
marked recurse - not by copying them into the group but by recursion.


> 
>> iii. Attaching a non-recursive program to 'c' fails because it inherited
>> the recursive flag from 'b' and that can not be reset by a descendant.
> 
> that part makes sense
> 
>> iv. Recursive program is attached to 'c'
>>
>> v. Process in 'c' opens a socket. Program attached to 'c' is run, then
>> the program from 'b' and the program from 'a'. Stop.
>>
>> etc.
>>
>> To consider what happens on doubling back and changing programs in the
>> hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
>> recursive on 'b' and recursive on 'c') for each of the following cases:
>>
>> 1. Program attached to 'b' is detached, recursive flag is reset in the
>> request. Attempt fails EINVAL because the recursion flag has to be set.
> 
> didn't get this point. you mean 'detach' will fail?

Yes, because it tried to reset the flag in the process.

This is something we can make user friendly - the detach succeeds, but
the recurse flag is ignored and the recurse flag in the group is not
reset unless it is the base group with the recurse flag (i.e., the
parent is marked non-recurse).


> 
>> 2. Program attached to 'b' is detached, recursive flag is set. Allowed.
> 
> meaing that detach from 'b' has to pass recurse flag to be detached?
> That's also odd.
> imo detach should always succeed and the process doing detach
> shouldn't need to know what flags were

[PATCH net] net: dsa: Don't dereference dst->cpu_dp->netdev

2017-08-28 Thread Florian Fainelli

If we do not have a master network device attached dst->cpu_dp will be
NULL and accessing cpu_dp->netdev will create a trace similar to the one
below. The correct check is on dst->cpu_dp period.

[1.004650] DSA: switch 0 0 parsed
[1.008078] Unable to handle kernel NULL pointer dereference at
virtual address 0010
[1.016195] pgd = c0003000
[1.018918] [0010] *pgd=804003, *pmd=
[1.024349] Internal error: Oops: 206 [#1] SMP ARM
[1.029157] Modules linked in:
[1.032228] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.13.0-rc6-00071-g45b45afab9bd-dirty #7
[1.040772] Hardware name: Broadcom STB (Flattened Device Tree)
[1.046704] task: ee08f840 task.stack: ee09
[1.051258] PC is at dsa_register_switch+0x5e0/0x9dc
[1.056234] LR is at dsa_register_switch+0x5d0/0x9dc
[1.061211] pc : []lr : []psr: 6213
[1.067491] sp : ee091d88  ip :   fp : 000c
[1.072728] r10:   r9 : 0001  r8 : ee208010
[1.077965] r7 : ee2b57b0  r6 : ee2b5780  r5 :   r4 :
ee208e0c
[1.084506] r3 :   r2 : 00040d00  r1 : 2d1b2000  r0 :
0016
[1.091050] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
Segment user
[1.098199] Control: 32c5387d  Table: 3000  DAC: fffd
[1.103957] Process swapper/0 (pid: 1, stack limit = 0xee090210)

Reported-by: Dan Carpenter 
Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use 
dst->cpu_dp->netdev")
Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index c442051d5a55..20bc9c56fca0 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -577,7 +577,7 @@ static int dsa_dst_parse(struct dsa_switch_tree *dst)
return err;
}
 
-   if (!dst->cpu_dp->netdev) {
+   if (!dst->cpu_dp) {
pr_warn("Tree has no master device\n");
return -EINVAL;
}
-- 
1.9.1

[RESEND PATCH] Allow passing tid or pid in SCM_CREDENTIALS without CAP_SYS_ADMIN

2017-08-28 Thread Prakash Sangappa

Currently passing tid(gettid(2)) of a thread in struct ucred in
SCM_CREDENTIALS message requires CAP_SYS_ADMIN capability otherwise
it fails with EPERM error. Some applications deal with thread id
of a thread(tid) and so it would help to allow tid in SCM_CREDENTIALS
message. Basically, either tgid(pid of the process) or the tid of
the thread should be allowed without the need for CAP_SYS_ADMIN capability.

SCM_CREDENTIALS will be used to determine the global id of a process or
a thread running inside a pid namespace.

This patch adds necessary check to accept tid in SCM_CREDENTIALS
struct ucred.

Signed-off-by: Prakash Sangappa 
---
 net/core/scm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/scm.c b/net/core/scm.c
index b1ff8a4..9274197 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -55,6 +55,7 @@ static __inline__ int scm_check_creds(struct ucred *creds)
return -EINVAL;
 
if ((creds->pid == task_tgid_vnr(current) ||
+creds->pid == task_pid_vnr(current) ||
 ns_capable(task_active_pid_ns(current)->user_ns, CAP_SYS_ADMIN)) &&
((uid_eq(uid, cred->uid)   || uid_eq(uid, cred->euid) ||
  uid_eq(uid, cred->suid)) || ns_capable(cred->user_ns, 
CAP_SETUID)) &&
-- 
2.7.4

Re: [PATCH net-next 00/11] bnxt_en: Updates.

2017-08-28 Thread David Miller

From: Michael Chan 
Date: Mon, 28 Aug 2017 13:40:24 -0400

> Various changes including updated firmware interface, improved TX ring
> allocation scheme, improved out-of-memory logic in NAPI loop, reduced
> default rings on multi-port devices, new PCI IDs. Of particular note,
> 
> CPU affinity hints from Vasundhara Volam.
> 
> TC Flower eswitch support from Sathya Perla.

Looks good, series applied, thanks!

Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running cgroup sock filters

2017-08-28 Thread Alexei Starovoitov

On Sun, Aug 27, 2017 at 08:49:23AM -0600, David Ahern wrote:
> 
> The override flag is independent of the recursive flag. If the override
> flag does not allow an override, the attempt to add a new program fails.
> The recursive flag brings an additional constraint: once a cgroup has a
> program with the recursive flag set it is inherited by all descendant
> groups. Attempts to insert a program that changes that flag fails EINVAL.
> 
> Start with the root group at $MNT. No program is attached. By default
> override is allowed and recursive is not set.

The above explanation is the reason we need tests for this logic.
The default is the opposite! By default override is _not_ allowed.

> 1. Group $MNT/a is created.
> 
> i. Default settings from $MNT are inherited; 'a' has override enabled
> and recursive disabled.

not true, but say the user attached a prog wih override on...

> ii. Program is attached. Override flag is set, recursive flag is not set.
> 
> iii. Process in 'a' opens a socket, program attached to 'a' is run.
> 
> 
> 2. $MNT/a/b is created
> 
> i. 'b' inherits the program and settings of 'a' (override enabled,
> recursive disabled).
> 
> ii. Process in 'b' opens a socket. Program inherited from 'a' is run.
> 
> iii. Non-interesting case for this patch set: attaching a non-recursive
> program to 'b' overrides the inherited one. process opens a socket only
> the 'b' program is run.
>
> iv. Program is attached to 'b', override flag set, recursive flag set.
> 
> v. Process in 'b' opens a socket. Program attached to 'b' is run and
> then program from 'a' is run. Recursion stops here since 'a' does not
> have the recursion flag set.

isn't this the problem? Override+non_recurse was set on 'a'.
Now we attached override+recurse on 'b' and suddenly 'a'
will be run like it was 'recursive'?
imo that is counter intuitive to the owner of 'a'.
I think there can be two options:
- if recurse is not set on 'a', all of it descendents should not be allowed
  to use recurse flag
- if recurse is not set on 'a', it should not be run

imo the former is cleaner and avoids issues with detach in the middle

> 3. $MNT/a/b/c is created
> 
> i. 'c' inherits the settings of 'b' (override is allowed, recursive flag
> is set)
> 
> ii. Process in 'c' opens a socket. No program from 'c' exists, so
> nothing is run. Recursion flag is set, so program from 'b' is run, then
> program from 'a' is run. Stop (recursive flag not set on 'a').

also doesn't make sense to me. Both 'b' and 'c' were attached as
override+recurse while 'a' as non-recurse why would it run?
The owner of 'a' attached it as override in the first place,
so it assumed that if descendent wants to override it it can
and the prog 'a' won't be running.

> iii. Attaching a non-recursive program to 'c' fails because it inherited
> the recursive flag from 'b' and that can not be reset by a descendant.

that part makes sense

> iv. Recursive program is attached to 'c'
> 
> v. Process in 'c' opens a socket. Program attached to 'c' is run, then
> the program from 'b' and the program from 'a'. Stop.
> 
> etc.
> 
> To consider what happens on doubling back and changing programs in the
> hierarchy, start with $MNT/a/b/c from 3 above (non-recursive on 'a',
> recursive on 'b' and recursive on 'c') for each of the following cases:
> 
> 1. Program attached to 'b' is detached, recursive flag is reset in the
> request. Attempt fails EINVAL because the recursion flag has to be set.

didn't get this point. you mean 'detach' will fail?

> 2. Program attached to 'b' is detached, recursive flag is set. Allowed.

meaing that detach from 'b' has to pass recurse flag to be detached?
That's also odd.
imo detach should always succeed and the process doing detach
shouldn't need to know what flags were used in attach.

> Process in 'b' opens a socket. No program attached to 'b' so no program
> is run. Recursive flag is set to program from 'a' is run. Stop.
> 
> We should allow the recursive flag to be reset if the parent is not
> recursive allowing an unwind of settings applied. I'll add that change.

I don't get this part.
Anyway looking forward to the next patch set with tests and comments like above.

Also adding Andy to cc. I'd really like both Andy and Tejun to review this 
logic.

Re: [PATCH net-next v3 0/3] NCSI VLAN Filtering Support

2017-08-28 Thread David Miller

From: Samuel Mendoza-Jonas 
Date: Mon, 28 Aug 2017 16:18:40 +1000

> This series (mainly patch 2) adds VLAN filtering to the NCSI implementation.
> A fair amount of code already exists in the NCSI stack for VLAN filtering but
> none of it is actually hooked up. This goes the final mile and fixes a few
> bugs in the existing code found along the way (patch 1).
> 
> Patch 3 adds the appropriate flag and callbacks to the ftgmac100 driver to
> enable filtering as it's a large consumer of NCSI (and what I've been
> testing on).
> 
> v3:   - Add comment describing change to ncsi_find_filter()
>   - Catch NULL in clear_one_vid() from ncsi_get_filter()
>   - Simplify state changes when kicking updated channel

Series applied.

Re: [net-next 00/15][pull request] 40GbE Intel Wired LAN Driver Updates 2017-08-27

2017-08-28 Thread David Miller

From: Jeff Kirsher 
Date: Sun, 27 Aug 2017 17:15:48 -0700

> This series contains updates to i40e and i40evf only.

Pulled, thanks Jeff.

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-28 Thread Joe Perches

On Mon, 2017-08-28 at 16:42 -0700, David Miller wrote:
> From: Greg Kroah-Hartman 
> Date: Sun, 27 Aug 2017 17:03:30 +0200
> 
> > The IRDA code has long been obsolete and broken.  So, to keep people
> > from trying to use it, and to prevent people from having to maintain it,
> > let's move it to drivers/staging/ so that we can delete it entirely from
> > the kernel in a few releases.
> 
> No objection, I'll apply this to net-next, thanks Greg.

Still needs an update to MAINTAINERS.

Re: [PATCH V2 net-next] net-next/hinic: Fix MTU limitation

2017-08-28 Thread David Miller

From: Aviad Krawczyk 
Date: Mon, 28 Aug 2017 01:20:26 +0800

> Fix the hw MTU limitation by setting max_mtu
> 
> Signed-off-by: Aviad Krawczyk 
> Signed-off-by: Zhao Chen 

Applied.

Re: [PATCH net-next] net-next/hinic: fix comparison of a uint16_t type with -1

2017-08-28 Thread David Miller

From: Aviad Krawczyk 
Date: Mon, 28 Aug 2017 01:35:30 +0800

> Remove the search for index of constant buffer size
> 
> Signed-off-by: Aviad Krawczyk 
> Signed-off-by: Zhao Chen 

Applied.

Re: [PATCH 0/4] irda: move it to drivers/staging so we can delete it

2017-08-28 Thread David Miller

From: Greg Kroah-Hartman 
Date: Sun, 27 Aug 2017 17:03:30 +0200

> The IRDA code has long been obsolete and broken.  So, to keep people
> from trying to use it, and to prevent people from having to maintain it,
> let's move it to drivers/staging/ so that we can delete it entirely from
> the kernel in a few releases.

No objection, I'll apply this to net-next, thanks Greg.

Re: [PATCH v4 0/7] Add RSS to DPAA 1.x Ethernet driver

2017-08-28 Thread David Miller

From: Madalin Bucur 
Date: Sun, 27 Aug 2017 16:13:36 +0300

> This patch set introduces Receive Side Scaling for the DPAA Ethernet
> driver. Documentation is updated with details related to the new
> feature and limitations that apply.
> Added also a small fix.
> 
> v2: removed a C++ style comment
> v3: move struct fman to header file to avoid exporting a function
> v4: addressed compilation issues introduced in v3

Series applied, thanks.

RE: [PATCH v3 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-28 Thread Dexuan Cui

> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, August 28, 2017 15:39
> From: Dexuan Cui 
> Date: Sat, 26 Aug 2017 04:52:43 +
> 
> >
> > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> > mechanism between the host and the guest. It uses VMBus ringbuffer as
> the
> > transportation layer.
> >
> > With hv_sock, applications between the host (Windows 10, Windows
> Server
> > 2016 or newer) and the guest can talk with each other using the traditional
> > socket APIs.
> >
> > Signed-off-by: Dexuan Cui 
> 
> Applied, thank you.

Thanks a lot!

There are some supporting patches still pending in the VMBus driver.
I'll make sure they go in through the char-misc tree.

Thanks,
-- Dexuan

Re: [PATCH] be2net: Fix some u16 fields appropriately

2017-08-28 Thread David Miller

From: Haishuang Yan 
Date: Sun, 27 Aug 2017 15:24:45 +0800

> In be_tx_compl_process, frag_index declared as u32, so it's better to
> declare last_index as u32 also.
> 
> CC: Ajit Khaparde 
> Fixes: b0fd2eb28bd4 ("be2net: Declare some u16 fields as u32 to improve
> performance")
> Signed-off-by: Haishuang Yan 

That is not a legitimate reason for making this change.

> @@ -255,7 +255,7 @@ struct be_tx_stats {
>  /* Structure to hold some data of interest obtained from a TX CQE */
>  struct be_tx_compl_info {
>   u8 status;  /* Completion status */
> - u16 end_index;  /* Completed TXQ Index */
> + u32 end_index;  /* Completed TXQ Index */
>  };
>  
>  struct be_tx_obj {

The ->end_index comes solely from:

txcp->end_index = GET_TX_COMPL_BITS(wrb_index, compl);

Which is precisely a 16-bit value.

I'm not applying this, sorry.

Re: [PATCH iproute2 2/4] tc: m_ife: print IEEE ethertype format

2017-08-28 Thread Jamal Hadi Salim


On 17-08-28 06:37 PM, Stephen Hemminger wrote:

On Mon, 28 Aug 2017 18:18:04 -0400


[..]

@@ -125,7 +125,7 @@ static int parse_ife(struct action_util *a, int *argc_p, 
char ***argv_p,
NEXT_ARG();
if (get_u16(_type, *argv, 0))
invarg("ife type is invalid", *argv);
-   fprintf(stderr, "IFE type 0x%x\n", ife_type);
+   fprintf(stderr, "IFE type 0x%04X\n", ife_type);
user_type = 1;
} else if (matches(*argv, "dst") == 0) {
NEXT_ARG();
   





For iproute commands the show output is supposed to match the corresponding set 
inputs.



These were being printed at parse time ...

cheers,
jamal

Re: [PATCH] net: ethernet: broadcom: Remove null check before kfree

2017-08-28 Thread David Miller

From: Himanshu Jha 
Date: Sun, 27 Aug 2017 01:47:47 +0530

> Kfree on NULL pointer is a no-op and therefore checking is redundant.
> 
> Signed-off-by: Himanshu Jha 

Applied.

Re: [PATCH net-next v2] sched: sfq: drop packets after root qdisc lock is released

2017-08-28 Thread David Miller

From: gfree.w...@vip.163.com
Date: Sat, 26 Aug 2017 22:58:58 +0800

> From: Gao Feng 
> 
> The commit 520ac30f4551 ("net_sched: drop packets after root qdisc lock
> is released) made a big change of tc for performance. But there are
> some points which are not changed in SFQ enqueue operation.
> 1. Fail to find the SFQ hash slot;
> 2. When the queue is full;
> 
> Now use qdisc_drop instead free skb directly.
> 
> Signed-off-by: Gao Feng 

Applied, thank you.

Re: [PATCH net 0/4] xfrm_user info leaks

2017-08-28 Thread David Miller

From: Mathias Krause 
Date: Sat, 26 Aug 2017 17:08:56 +0200

> Hi David, Steffen,
> 
> the following series fixes a few info leaks due to missing padding byte
> initialization in the xfrm_user netlink interface.
> 
> Please apply!

Steffen please pick this up if you haven't already.

Thank you.

Re: [PATCH] sni_82596: Add Thomas' email address to driver.

2017-08-28 Thread David Miller


I need this reposted with a proper commit message and a signoff before
I can apply this, thanks.

Re: [PATCH net] ipv6: set dst.obsolete when a cached route has expired

2017-08-28 Thread David Miller

From: Xin Long 
Date: Sat, 26 Aug 2017 20:10:10 +0800

> Now it doesn't check for the cached route expiration in ipv6's
> dst_ops->check(), because it trusts dst_gc that would clean the
> cached route up when it's expired.
> 
> The problem is in dst_gc, it would clean the cached route only
> when it's refcount is 1. If some other module (like xfrm) keeps
> holding it and the module only release it when dst_ops->check()
> fails.
> 
> But without checking for the cached route expiration, .check()
> may always return true. Meanwhile, without releasing the cached
> route, dst_gc couldn't del it. It will cause this cached route
> never to expire.
> 
> This patch is to set dst.obsolete with DST_OBSOLETE_KILL in .gc
> when it's expired, and check obsolete != DST_OBSOLETE_FORCE_CHK
> in .check.
> 
> Note that this is even needed when ipv6 dst_gc timer is removed
> one day. It would set dst.obsolete in .redirect and .update_pmtu
> instead, and check for cached route expiration when getting it,
> just like what ipv4 route does.
> 
> Reported-by: Jianlin Shi 
> Signed-off-by: Xin Long 
> Acked-by: Hannes Frederic Sowa 

Applied and queued up for -stable, thanks.

Re: [PATCH 3/4] i825xx: switch to switch to dma_alloc_attrs

2017-08-28 Thread David Miller

From: Christoph Hellwig 
Date: Sat, 26 Aug 2017 09:21:24 +0200

> This way we can always pass DMA_ATTR_NON_CONSISTENT, the SNI mips version
> will simply ignore the flag.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller

Re: [PATCH 2/4] au1000_eth: switch to dma_alloc_attrs

2017-08-28 Thread David Miller

From: Christoph Hellwig 
Date: Sat, 26 Aug 2017 09:21:23 +0200

> Use dma_alloc_attrs directly instead of the dma_alloc_noncoherent wrapper.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller

Re: [PATCH 1/4] sgiseeq: switch to dma_alloc_attrs

2017-08-28 Thread David Miller

From: Christoph Hellwig 
Date: Sat, 26 Aug 2017 09:21:22 +0200

> Use dma_alloc_attrs directly instead of the dma_alloc_noncoherent wrapper.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David S. Miller

Re: [patch net-next 0/2] mlxsw: spectrum: Fix couple of dpipe ipv4 host table bugs

2017-08-28 Thread David Miller

From: Jiri Pirko 
Date: Sat, 26 Aug 2017 08:35:37 +0200

> From: Jiri Pirko 
> 
> Arkadi Sharshevsky (1):
>   mlxsw: spectrum_dpipe: Fix host table dump
> 
> Jiri Pirko (1):
>   mlxsw: spectrum: compile-in dpipe support only if devlink is enabled

Series applied, thanks Jiri.

Re: [PATCH v3 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-28 Thread David Miller

From: Dexuan Cui 
Date: Sat, 26 Aug 2017 04:52:43 +

> 
> Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> mechanism between the host and the guest. It uses VMBus ringbuffer as the
> transportation layer.
> 
> With hv_sock, applications between the host (Windows 10, Windows Server
> 2016 or newer) and the guest can talk with each other using the traditional
> socket APIs.
> 
> More info about Hyper-V Sockets is available here:
> 
> "Make your own integration services":
> https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service
> 
> The patch implements the necessary support in Linux guest by introducing a new
> vsock transport for AF_VSOCK.
> 
> Signed-off-by: Dexuan Cui 

Applied, thank you.

Re: [PATCH iproute2 2/4] tc: m_ife: print IEEE ethertype format

2017-08-28 Thread Stephen Hemminger

On Mon, 28 Aug 2017 18:18:04 -0400
Jamal Hadi Salim  wrote:

> Alex,
> 
> I think we should get rid of these fprintfs instead of fixing them.
> They were originally intended to be debug outputs.
> 
> cheers,
> jamal
> 
> On 17-08-28 03:07 PM, Alexander Aring wrote:
> > This patch uses the usually IEEE format to display an ethertype which is
> > 4-digits and every digit in upper case.
> > 
> > Signed-off-by: Alexander Aring 
> > ---
> >   tc/m_ife.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tc/m_ife.c b/tc/m_ife.c
> > index e05e2276..7b57130e 100644
> > --- a/tc/m_ife.c
> > +++ b/tc/m_ife.c
> > @@ -125,7 +125,7 @@ static int parse_ife(struct action_util *a, int 
> > *argc_p, char ***argv_p,
> > NEXT_ARG();
> > if (get_u16(_type, *argv, 0))
> > invarg("ife type is invalid", *argv);
> > -   fprintf(stderr, "IFE type 0x%x\n", ife_type);
> > +   fprintf(stderr, "IFE type 0x%04X\n", ife_type);
> > user_type = 1;
> > } else if (matches(*argv, "dst") == 0) {
> > NEXT_ARG();
> >   
> 


For iproute commands the show output is supposed to match the corresponding set 
inputs.

Re: [PATCH net] ipv6: fix sparse warning on rt6i_node

2017-08-28 Thread David Miller

From: Wei Wang 
Date: Fri, 25 Aug 2017 15:03:10 -0700

> From: Wei Wang 
> 
> Commit c5cff8561d2d adds rcu grace period before freeing fib6_node. This
> generates a new sparse warning on rt->rt6i_node related code:
>   net/ipv6/route.c:1394:30: error: incompatible types in comparison
>   expression (different address spaces)
>   ./include/net/ip6_fib.h:187:14: error: incompatible types in comparison
>   expression (different address spaces)
> 
> This commit adds "__rcu" tag for rt6i_node and makes sure corresponding
> rcu API is used for it.
> After this fix, sparse no longer generates the above warning.
> 
> Fixes: c5cff8561d2d ("ipv6: add rcu grace period before freeing fib6_node")
> Signed-off-by: Wei Wang 
> Acked-by: Eric Dumazet 

Applied.

Re: [PATCH net-next] selftests/bpf: check the instruction dumps are populated

2017-08-28 Thread David Miller

From: Jakub Kicinski 
Date: Fri, 25 Aug 2017 14:39:57 -0700

> Add a basic test for checking whether kernel is populating
> the jited and xlated BPF images.  It was used to confirm
> the behaviour change from commit d777b2ddbecf ("bpf: don't 
> zero out the info struct in bpf_obj_get_info_by_fd()"),
> which made bpf_obj_get_info_by_fd() usable for retrieving
> the image dumps.
> 
> Signed-off-by: Jakub Kicinski 

Applied.

Re: [PATCH net] cxgb4: Fix stack out-of-bounds read due to wrong size to t4_record_mbox()

2017-08-28 Thread David Miller

From: Stefano Brivio 
Date: Fri, 25 Aug 2017 22:48:48 +0200

> Passing commands for logging to t4_record_mbox() with size
> MBOX_LEN, when the actual command size is actually smaller,
> causes out-of-bounds stack accesses in t4_record_mbox() while
> copying command words here:
> 
>   for (i = 0; i < size / 8; i++)
>   entry->cmd[i] = be64_to_cpu(cmd[i]);
> 
> Up to 48 bytes from the stack are then leaked to debugfs.
> 
> This happens whenever we send (and log) commands described by
> structs fw_sched_cmd (32 bytes leaked), fw_vi_rxmode_cmd (48),
> fw_hello_cmd (48), fw_bye_cmd (48), fw_initialize_cmd (48),
> fw_reset_cmd (48), fw_pfvf_cmd (32), fw_eq_eth_cmd (16),
> fw_eq_ctrl_cmd (32), fw_eq_ofld_cmd (32), fw_acl_mac_cmd(16),
> fw_rss_glb_config_cmd(32), fw_rss_vi_config_cmd(32),
> fw_devlog_cmd(32), fw_vi_enable_cmd(48), fw_port_cmd(32),
> fw_sched_cmd(32), fw_devlog_cmd(32).
> 
> The cxgb4vf driver got this right instead.
> 
> When we call t4_record_mbox() to log a command reply, a MBOX_LEN
> size can be used though, as get_mbox_rpl() will fill cmd_rpl up
> completely.
> 
> Fixes: 7f080c3f2ff0 ("cxgb4: Add support to enable logging of firmware 
> mailbox commands")
> Signed-off-by: Stefano Brivio 
> ---
> I guess this should be queued up for -stable, back to 4.7.

Applied and queued up for -stable, thanks.

Re: [PATCH net-next 1/3 v9] net: ether: Add support for multiplexing and aggregation type

2017-08-28 Thread Dan Williams

On Thu, 2017-08-24 at 22:39 -0600, Subash Abhinov Kasiviswanathan
wrote:
> Define the multiplexing and aggregation (MAP) ether type 0x00F9. This
> is needed for receiving data in the MAP protocol like RMNET. This is
> not an officially registered ID.
> 
> Signed-off-by: Subash Abhinov Kasiviswanathan  g>
> ---
>  include/uapi/linux/if_ether.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/uapi/linux/if_ether.h
> b/include/uapi/linux/if_ether.h
> index 5bc9bfd..0d73ecc 100644
> --- a/include/uapi/linux/if_ether.h
> +++ b/include/uapi/linux/if_ether.h
> @@ -137,6 +137,7 @@
>  #define ETH_P_IEEE802154 0x00F6  /* IEEE802.15.4 frame
>   */
>  #define ETH_P_CAIF   0x00F7  /* ST-Ericsson CAIF
> protocol  */
>  #define ETH_P_XDSA   0x00F8  /* Multiplexed DSA
> protocol  */
> +#define ETH_P_MAP0x00F9  /* Multiplex &
> aggregation proto*/

Any chance you could name this QUALCOMM_MAP or something like that?  Or
at least update the comment to include that fact.

Dan

>  /*
>   *   This is an Ethernet frame header.

Re: [PATCH net-next] bpf: fix oops on allocation failure

2017-08-28 Thread David Miller

From: Dan Carpenter 
Date: Fri, 25 Aug 2017 23:27:14 +0300

> "err" is set to zero if bpf_map_area_alloc() fails so it means we return
> ERR_PTR(0) which is NULL.  The caller, find_and_alloc_map(), is not
> expecting NULL returns and will oops.
> 
> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
> Signed-off-by: Dan Carpenter 

Applied.

Re: [PATCH 4/4] net: stmmac: sun8i: Remove the compatibles

2017-08-28 Thread David Miller

From: Maxime Ripard 
Date: Fri, 25 Aug 2017 21:12:17 +0200

> Since the bindings have been controversial, and we follow the DT stable ABI
> rule, we shouldn't let a driver with a DT binding that might change slip
> through in a stable release.
> 
> Remove the compatibles to make sure the driver will not probe and no-one
> will start using the binding currently implemented. This commit will
> obviously need to be reverted in due time.
> 
> Signed-off-by: Maxime Ripard 

Applied to net tree.

Re: [PATCH net 0/3] fix layer calculation and flow dissector use

2017-08-28 Thread David Miller

From: Pieter Jansen van Vuuren 
Date: Fri, 25 Aug 2017 19:31:00 +0200

> Previously when calculating the supported key layers MPLS, IPv4/6
> TTL and TOS were not considered. Formerly flow dissectors were referenced
> without first checking that they are in use and correctly populated by TC.
> Additionally this patch set fixes the incorrect use of mask field for vlan
> matching.

Series applied, thanks.

Re: [PATCH iproute2 4/4] man: tc-ife: add default type note

2017-08-28 Thread Jamal Hadi Salim


On 17-08-28 03:07 PM, Alexander Aring wrote:

This patch updates the tc-ife man page that the default IFE ethertype
will be used if it's not specified.

Signed-off-by: Alexander Aring 


Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH iproute2 3/4] tc: m_ife: report about kernels default type

2017-08-28 Thread Jamal Hadi Salim


Same comment as previous patch.

cheers,
jamal

On 17-08-28 03:07 PM, Alexander Aring wrote:

This patch will report about if the ethertype for IFE is not specified
that the default IFE type is used.

Signed-off-by: Alexander Aring 
---
  tc/m_ife.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/tc/m_ife.c b/tc/m_ife.c
index 7b57130e..5633ab90 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -189,6 +189,8 @@ static int parse_ife(struct action_util *a, int *argc_p, 
char ***argv_p,
addattr_l(n, MAX_MSG, TCA_IFE_DMAC, dbuf, ETH_ALEN);
if (user_type)
addattr_l(n, MAX_MSG, TCA_IFE_TYPE, _type, 2);
+   else
+   fprintf(stderr, "IFE type 0x%04X\n", ETH_P_IFE);
if (saddr)
addattr_l(n, MAX_MSG, TCA_IFE_SMAC, sbuf, ETH_ALEN);

Re: [PATCH iproute2 2/4] tc: m_ife: print IEEE ethertype format

2017-08-28 Thread Jamal Hadi Salim


Alex,

I think we should get rid of these fprintfs instead of fixing them.
They were originally intended to be debug outputs.

cheers,
jamal

On 17-08-28 03:07 PM, Alexander Aring wrote:

This patch uses the usually IEEE format to display an ethertype which is
4-digits and every digit in upper case.

Signed-off-by: Alexander Aring 
---
  tc/m_ife.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/m_ife.c b/tc/m_ife.c
index e05e2276..7b57130e 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -125,7 +125,7 @@ static int parse_ife(struct action_util *a, int *argc_p, 
char ***argv_p,
NEXT_ARG();
if (get_u16(_type, *argv, 0))
invarg("ife type is invalid", *argv);
-   fprintf(stderr, "IFE type 0x%x\n", ife_type);
+   fprintf(stderr, "IFE type 0x%04X\n", ife_type);
user_type = 1;
} else if (matches(*argv, "dst") == 0) {
NEXT_ARG();

Re: [PATCH net-next] xen-netback: update ubuf_info initialization to anonymous union

2017-08-28 Thread David Miller

From: Willem de Bruijn 
Date: Fri, 25 Aug 2017 13:10:43 -0400

> From: Willem de Bruijn 
> 
> The xen driver initializes struct ubuf_info fields using designated
> initializers. I recently moved these fields inside a nested anonymous
> struct inside an anonymous union. I had missed this use case.
> 
> This breaks compilation of xen-netback with older compilers.
> From kbuild bot with gcc-4.4.7:
> 
>drivers/net//xen-netback/interface.c: In function
>'xenvif_init_queue':
>>> drivers/net//xen-netback/interface.c:554: error: unknown field 'ctx' 
> specified in initializer
>>> drivers/net//xen-netback/interface.c:554: warning: missing braces 
> around initializer
>   drivers/net//xen-netback/interface.c:554: warning: (near initialization 
> for '(anonymous).')
>>> drivers/net//xen-netback/interface.c:554: warning: initialization makes 
> integer from pointer without a cast
>>> drivers/net//xen-netback/interface.c:555: error: unknown field 'desc' 
> specified in initializer
> 
> Add double braces around the designated initializers to match their
> nested position in the struct. After this, compilation succeeds again.
> 
> Fixes: 4ab6c99d99bb ("sock: MSG_ZEROCOPY notification coalescing")
> Reported-by: kbuild bot 
> Signed-off-by: Willem de Bruijn 

APplied.

Re: [PATCH net-next] net: Add comment that early_demux can change via sysctl

2017-08-28 Thread David Miller

From: David Ahern 
Date: Mon, 28 Aug 2017 15:14:20 -0700

> Twice patches trying to constify inet{6}_protocol have been reverted:
> 39294c3df2a8 ("Revert "ipv6: constify inet6_protocol structures"") to
> revert 3a3a4e3054137 and then 03157937fe0b5 ("Revert "ipv4: make
> net_protocol const"") to revert aa8db499ea67.
> 
> Add a comment that the structures can not be const because the
> early_demux field can change based on a sysctl.
> 
> Signed-off-by: David Ahern 

Applied.

Re: [PATCH iproute2 1/4] tc: m_ife: allow ife type to zero

2017-08-28 Thread Jamal Hadi Salim


On 17-08-28 03:07 PM, Alexander Aring wrote:

This patch allows to set an ethertype for IFE which is zero. There is no
kernel side validation which forbids a type to zero.

Signed-off-by: Alexander Aring 


Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH net-next 3/3] tc-testing: add test for testing ife type

2017-08-28 Thread Jamal Hadi Salim


On 17-08-28 03:03 PM, Alexander Aring wrote:

This patch adds a new testcase for the IFE type setting in tc. In case
of user specified the type it will check if the ife is correctly
configured to react on it. If it's not specified the default IFE type
should be used.

Signed-off-by: Alexander Aring 


Acked-by: Jamal Hadi Salim 


We should add the tests which do filter binding as well
(both auto-binding and late binding)

cheers,
jamal

[PATCH net-next] net: Add comment that early_demux can change via sysctl

2017-08-28 Thread David Ahern

Twice patches trying to constify inet{6}_protocol have been reverted:
39294c3df2a8 ("Revert "ipv6: constify inet6_protocol structures"") to
revert 3a3a4e3054137 and then 03157937fe0b5 ("Revert "ipv4: make
net_protocol const"") to revert aa8db499ea67.

Add a comment that the structures can not be const because the
early_demux field can change based on a sysctl.

Signed-off-by: David Ahern 
---
 net/ipv4/af_inet.c  | 6 ++
 net/ipv6/tcp_ipv6.c | 3 +++
 net/ipv6/udp.c  | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d678820e4306..e31108e5ef79 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1596,6 +1596,9 @@ static const struct net_protocol igmp_protocol = {
 };
 #endif
 
+/* thinking of making this const? Don't.
+ * early_demux can change based on sysctl.
+ */
 static struct net_protocol tcp_protocol = {
.early_demux=   tcp_v4_early_demux,
.early_demux_handler =  tcp_v4_early_demux,
@@ -1606,6 +1609,9 @@ static struct net_protocol tcp_protocol = {
.icmp_strict_tag_validation = 1,
 };
 
+/* thinking of making this const? Don't.
+ * early_demux can change based on sysctl.
+ */
 static struct net_protocol udp_protocol = {
.early_demux =  udp_v4_early_demux,
.early_demux_handler =  udp_v4_early_demux,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index abba3bc2a3d9..38f76d8b231e 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1949,6 +1949,9 @@ struct proto tcpv6_prot = {
.diag_destroy   = tcp_abort,
 };
 
+/* thinking of making this const? Don't.
+ * early_demux can change based on sysctl.
+ */
 static struct inet6_protocol tcpv6_protocol = {
.early_demux=   tcp_v6_early_demux,
.early_demux_handler =  tcp_v6_early_demux,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2a15f1bb6ef8..976f30391356 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1472,6 +1472,9 @@ int compat_udpv6_getsockopt(struct sock *sk, int level, 
int optname,
 }
 #endif
 
+/* thinking of making this const? Don't.
+ * early_demux can change based on sysctl.
+ */
 static struct inet6_protocol udpv6_protocol = {
.early_demux=   udp_v6_early_demux,
.early_demux_handler =  udp_v6_early_demux,
-- 
2.1.4

Re: [PATCH net-next 2/3] act_ife: use registered ife_type as fallback

2017-08-28 Thread Jamal Hadi Salim


On 17-08-28 03:03 PM, Alexander Aring wrote:

This patch handles a default IFE type if it's not given by user space
netlink api. The default IFE type will be the registered ethertype by
IEEE for IFE ForCES.

Signed-off-by: Alexander Aring 


Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [PATCH net-next 0/3] gre: add collect_md mode for ERSPAN tunnel

2017-08-28 Thread David Miller

From: William Tu 
Date: Fri, 25 Aug 2017 09:21:26 -0700

> This patch series provide collect_md mode for ERSPAN tunnel.  The fist patch
> refactors the existing gre_fb_xmit function by exacting the route cache
> portion into a new function called prepare_fb_xmit.  The second patch
> introduces the collect_md mode for ERSPAN tunnel, by calling the
> prepare_fb_xmit function and adding ERSPAN specific logic.  The final patch
> adds the test case using bpf_skb_{set,get}_tunnel_{key,opt}.

Series applied, thanks William.

Re: [PATCH net-next 1/3] if_ether: add forces ife lfb type

2017-08-28 Thread Jamal Hadi Salim


On 17-08-28 03:03 PM, Alexander Aring wrote:

This patch adds the forces IFE lfb type according to IEEE registered
ethertypes. See http://standards-oui.ieee.org/ethertype/eth.txt for more
information. Since there exists the IFE subsystem it can be used there.

This patch also use the correct word "ForCES" instead of "FoRCES" which
is a spelling error inside the IEEE ethertype specification.

Signed-off-by: Alexander Aring 


Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [patch net-next 3/3] net/sched: Change act_api and act_xxx modules to use IDR

2017-08-28 Thread Jamal Hadi Salim


On 17-08-28 02:41 AM, Chris Mi wrote:

Typically, each TC filter has its own action. All the actions of the
same type are saved in its hash table. But the hash buckets are too
small that it degrades to a list. And the performance is greatly
affected. For example, it takes about 0m11.914s to insert 64K rules.
If we convert the hash table to IDR, it only takes about 0m1.500s.
The improvement is huge.

But please note that the test result is based on previous patch that
cls_flower uses IDR.

Signed-off-by: Chris Mi 
Signed-off-by: Jiri Pirko 


Acked-by: Jamal Hadi Salim 

cheers,
jamal

Re: [patch net-next 2/3] net/sched: Change cls_flower to use IDR

2017-08-28 Thread Jamal Hadi Salim


On 17-08-28 02:41 AM, Chris Mi wrote:

Currently, all filters with the same priority are linked in a doubly
linked list. Every filter should have a unique handle. To make the
handle unique, we need to iterate the list every time to see if the
handle exists or not when inserting a new filter. It is time-consuming.
For example, it takes about 5m3.169s to insert 64K rules.

This patch changes cls_flower to use IDR. With this patch, it
takes about 0m1.127s to insert 64K rules. The improvement is huge.

But please note that in this testing, all filters share the same action.
If every filter has a unique action, that is another bottleneck.
Follow-up patch in this patchset addresses that.

Signed-off-by: Chris Mi 
Signed-off-by: Jiri Pirko 


Acked-by: Jamal Hadi Salim 

As Cong asked last time - any plans to add to other classifiers?

cheers,
jamal

[PATCH v2 18/30] net: Define usercopy region in struct proto slab cache

2017-08-28 Thread Kees Cook

From: David Windsor 

In support of usercopy hardening, this patch defines a region in the
struct proto slab cache in which userspace copy operations are allowed.
Some protocols need to copy objects to/from userspace, and they can
declare the region via their proto structure with the new usersize and
useroffset fields. Initially, if no region is specified (usersize ==
0), the entire field is marked as whitelisted. This allows protocols
to be whitelisted in subsequent patches. Once all protocols have been
annotated, the full-whitelist default can be removed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor 
[kees: adjust commit log, split off per-proto patches]
[kees: add logic for by-default full-whitelist]
Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Paolo Abeni 
Cc: David Howells 
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
 include/net/sock.h | 2 ++
 net/core/sock.c| 6 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7c0632c7e870..170d5b2dbcb6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1106,6 +1106,8 @@ struct proto {
struct kmem_cache   *slab;
unsigned intobj_size;
int slab_flags;
+   size_t  useroffset; /* Usercopy region offset */
+   size_t  usersize;   /* Usercopy region size */
 
struct percpu_counter   *orphan_count;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index ac2a404c73eb..02dab98ca3e3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3109,8 +3109,12 @@ static int req_prot_init(const struct proto *prot)
 int proto_register(struct proto *prot, int alloc_slab)
 {
if (alloc_slab) {
-   prot->slab = kmem_cache_create(prot->name, prot->obj_size, 0,
+   prot->slab = kmem_cache_create_usercopy(prot->name,
+   prot->obj_size, 0,
SLAB_HWCACHE_ALIGN | prot->slab_flags,
+   prot->usersize ? prot->useroffset : 0,
+   prot->usersize ? prot->usersize
+  : prot->obj_size,
NULL);
 
if (prot->slab == NULL) {
-- 
2.7.4

[PATCH v2 23/30] net: Restrict unwhitelisted proto caches to size 0

2017-08-28 Thread Kees Cook

Now that protocols have been annotated (the copy of icsk_ca_ops->name
is of an ops field from outside the slab cache):

$ git grep 'copy_.*_user.*sk.*->'
caif/caif_socket.c: copy_from_user(_sk->conn_req.param.data, ov, ol)) {
ipv4/raw.c:   if (copy_from_user(_sk(sk)->filter, optval, optlen))
ipv4/raw.c:   copy_to_user(optval, _sk(sk)->filter, len))
ipv4/tcp.c:   if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
ipv4/tcp.c:   if (copy_to_user(optval, icsk->icsk_ulp_ops->name, len))
ipv6/raw.c:   if (copy_from_user(_sk(sk)->filter, optval, optlen))
ipv6/raw.c:   if (copy_to_user(optval, _sk(sk)->filter, len))
sctp/socket.c: if (copy_from_user(_sk(sk)->subscribe, optval, optlen))
sctp/socket.c: if (copy_to_user(optval, _sk(sk)->subscribe, len))
sctp/socket.c: if (copy_to_user(optval, _sk(sk)->initmsg, len))

we can switch the default proto usercopy region to size 0. Any protocols
needing to add whitelisted regions must annotate the fields with the
useroffset and usersize fields of struct proto.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: Paolo Abeni 
Cc: David Howells 
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
 net/core/sock.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 02dab98ca3e3..c7d0afa1d0b1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3112,9 +3112,7 @@ int proto_register(struct proto *prot, int alloc_slab)
prot->slab = kmem_cache_create_usercopy(prot->name,
prot->obj_size, 0,
SLAB_HWCACHE_ALIGN | prot->slab_flags,
-   prot->usersize ? prot->useroffset : 0,
-   prot->usersize ? prot->usersize
-  : prot->obj_size,
+   prot->useroffset, prot->usersize,
NULL);
 
if (prot->slab == NULL) {
-- 
2.7.4

[PATCH v2 22/30] sctp: Copy struct sctp_sock.autoclose to userspace using put_user()

2017-08-28 Thread Kees Cook

From: David Windsor 

The autoclose field can be copied with put_user(), so there is no need to
use copy_to_user(). In both cases, hardened usercopy is being bypassed
since the size is constant, and not open to runtime manipulation.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor 
[kees: adjust commit log]
Cc: Vlad Yasevich 
Cc: Neil Horman 
Cc: "David S. Miller" 
Cc: linux-s...@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
 net/sctp/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c8784cb216e4..a29e41e19d64 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4882,7 +4882,7 @@ static int sctp_getsockopt_autoclose(struct sock *sk, int 
len, char __user *optv
len = sizeof(int);
if (put_user(len, optlen))
return -EFAULT;
-   if (copy_to_user(optval, _sk(sk)->autoclose, sizeof(int)))
+   if (put_user(sctp_sk(sk)->autoclose, (int __user *)optval))
return -EFAULT;
return 0;
 }
-- 
2.7.4

[PATCH v2 19/30] ip: Define usercopy region in IP proto slab cache

2017-08-28 Thread Kees Cook

From: David Windsor 

The ICMP filters for IPv4 and IPv6 raw sockets need to be copied to/from
userspace. In support of usercopy hardening, this patch defines a region
in the struct proto slab cache in which userspace copy operations are
allowed.

example usage trace:

net/ipv4/raw.c:
raw_seticmpfilter(...):
...
copy_from_user(_sk(sk)->filter, ..., optlen)

raw_geticmpfilter(...):
...
copy_to_user(..., _sk(sk)->filter, len)

net/ipv6/raw.c:
rawv6_seticmpfilter(...):
...
copy_from_user(_sk(sk)->filter, ..., optlen)

rawv6_geticmpfilter(...):
...
copy_to_user(..., _sk(sk)->filter, len)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor 
[kees: split from network patch, provide usage trace]
Cc: "David S. Miller" 
Cc: Alexey Kuznetsov 
Cc: Hideaki YOSHIFUJI 
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
 net/ipv4/raw.c | 2 ++
 net/ipv6/raw.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index b0bb5d0a30bd..6c7f8d2eb3af 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -964,6 +964,8 @@ struct proto raw_prot = {
.hash  = raw_hash_sk,
.unhash= raw_unhash_sk,
.obj_size  = sizeof(struct raw_sock),
+   .useroffset= offsetof(struct raw_sock, filter),
+   .usersize  = sizeof_field(struct raw_sock, filter),
.h.raw_hash= _v4_hashinfo,
 #ifdef CONFIG_COMPAT
.compat_setsockopt = compat_raw_setsockopt,
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 60be012fe708..27dd9a5f71c6 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1265,6 +1265,8 @@ struct proto rawv6_prot = {
.hash  = raw_hash_sk,
.unhash= raw_unhash_sk,
.obj_size  = sizeof(struct raw6_sock),
+   .useroffset= offsetof(struct raw6_sock, filter),
+   .usersize  = sizeof_field(struct raw6_sock, filter),
.h.raw_hash= _v6_hashinfo,
 #ifdef CONFIG_COMPAT
.compat_setsockopt = compat_rawv6_setsockopt,
-- 
2.7.4

[PATCH v2 21/30] sctp: Define usercopy region in SCTP proto slab cache

2017-08-28 Thread Kees Cook

From: David Windsor 

The SCTP socket event notification subscription information need to be
copied to/from userspace. In support of usercopy hardening, this patch
defines a region in the struct proto slab cache in which userspace copy
operations are allowed. Additionally moves the usercopy fields to be
adjacent for the region to cover both.

example usage trace:

net/sctp/socket.c:
sctp_getsockopt_events(...):
...
copy_to_user(..., _sk(sk)->subscribe, len)

sctp_setsockopt_events(...):
...
copy_from_user(_sk(sk)->subscribe, ..., optlen)

sctp_getsockopt_initmsg(...):
...
copy_to_user(..., _sk(sk)->initmsg, len)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor 
[kees: split from network patch, move struct member adjacent, provide usage]
Cc: Vlad Yasevich 
Cc: Neil Horman 
Cc: "David S. Miller" 
Cc: linux-s...@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
 include/net/sctp/structs.h | 9 +++--
 net/sctp/socket.c  | 4 
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 5ab29af8ca8a..f1d7810e200e 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -202,12 +202,17 @@ struct sctp_sock {
/* Flags controlling Heartbeat, SACK delay, and Path MTU Discovery. */
__u32 param_flags;
 
-   struct sctp_initmsg initmsg;
struct sctp_rtoinfo rtoinfo;
struct sctp_paddrparams paddrparam;
-   struct sctp_event_subscribe subscribe;
struct sctp_assocparams assocparams;
 
+   /*
+* These two structures must be grouped together for the usercopy
+* whitelist region.
+*/
+   struct sctp_event_subscribe subscribe;
+   struct sctp_initmsg initmsg;
+
int user_frag;
 
__u32 autoclose;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1db478e34520..c8784cb216e4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -8235,6 +8235,10 @@ struct proto sctp_prot = {
.unhash  =  sctp_unhash,
.get_port=  sctp_get_port,
.obj_size=  sizeof(struct sctp_sock),
+   .useroffset  =  offsetof(struct sctp_sock, subscribe),
+   .usersize=  offsetof(struct sctp_sock, initmsg) -
+   offsetof(struct sctp_sock, subscribe) +
+   sizeof_field(struct sctp_sock, initmsg),
.sysctl_mem  =  sysctl_sctp_mem,
.sysctl_rmem =  sysctl_sctp_rmem,
.sysctl_wmem =  sysctl_sctp_wmem,
-- 
2.7.4

[PATCH v2 20/30] caif: Define usercopy region in caif proto slab cache

2017-08-28 Thread Kees Cook

From: David Windsor 

The CAIF channel connection request parameters need to be copied to/from
userspace. In support of usercopy hardening, this patch defines a region
in the struct proto slab cache in which userspace copy operations are
allowed.

example usage trace:

net/caif/caif_socket.c:
setsockopt(...):
...
copy_from_user(_sk->conn_req.param.data, ..., ol)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor 
[kees: split from network patch, provide usage trace]
Cc: Dmitry Tarnyagin 
Cc: "David S. Miller" 
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook 
---
 net/caif/caif_socket.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 632d5a416d97..c76d513b9a7a 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -1032,6 +1032,8 @@ static int caif_create(struct net *net, struct socket 
*sock, int protocol,
static struct proto prot = {.name = "PF_CAIF",
.owner = THIS_MODULE,
.obj_size = sizeof(struct caifsock),
+   .useroffset = offsetof(struct caifsock, conn_req.param),
+   .usersize = sizeof_field(struct caifsock, conn_req.param)
};
 
if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN))
-- 
2.7.4

Re: Get ARP/ND tables from kernel

2017-08-28 Thread Bassam Alsanie

PJ,
Thank you. strace is really gonna help :)
It seem using Netlink (NETLINK_ROUTE) interface is the way to get the
arp/tables form kernel programmatically.

Thank you,
Bassam

On Sun, Aug 27, 2017 at 7:53 PM, Waskiewicz Jr, Peter
 wrote:
> On 8/27/17 9:25 PM, Bassam Alsanie wrote:
>> Hello everyone,
>> I looking into a good way (stable and compatible with large number of
>> distros) to get the arp/nd cache from kernel to user space, for both
>> IP4 and IP6.
>>
>> It seem IOCTL (SIOCGARP) can't do that, you can only get MAC address
>> from provided IP address. But IOCTL can't give the the full arp/nd
>> table.
>> The other option is the Netlink interface. I tried it and I got the
>> ARP/ND table :).
>> The third option is using /proc/net/arp, which only restricted to IP4.
>>
>> There is command line utilities that I excluding in my case.
>>
>> Is there another way to do it? what is the best way in my case?
>>
>> Thank you all.
>
> # strace arp -an
> [...]
> open("/proc/net/arp", O_RDONLY) = 4
> fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
> read(4, "IP address   HW type Fla"..., 1024) = 310
> [...]
>
> # strace ip -6 neighbor show
> [...]
> socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC, NETLINK_ROUTE) = 3
> setsockopt(3, SOL_SOCKET, SO_SNDBUF, [32768], 4) = 0
> setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1048576], 4) = 0
> bind(3, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=}, 12) = 0
> getsockname(3, {sa_family=AF_NETLINK, nl_pid=30292, nl_groups=},
> [12]) = 0
> sendto(3, {{len=40, type=RTM_GETLINK, flags=NLM_F_REQUEST|NLM_F_DUMP,
> seq=1503888680, pid=0},
> "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\35\0\1\0\0\0"}, 40, 0, NULL, 0) = 40
> recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0,
> nl_groups=}, msg_namelen=12, msg_iov=[{iov_base=[{{len=1268,
> type=RTM_NEWLINK, flags=NLM_F_MULTI, seq=1503888680, pid=30292},
> "\0\0\4\3\1\0\0\0I\0\1\0\0\0\0\0\7\0\3\0lo\0\0\10\0\r\0\350\3\0\0"...},
> {{len=1280, type=RTM_NEWLINK, flags=NLM_F_MULTI, seq=1503888680,
> pid=30292},
> "\0\0\1\0\2\0\0\0C\20\1\0\0\0\0\0\t\0\3\0eno1\0\0\0\0\10\0\r\0"...}],
> iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 2548
> [...]
>
> Seems like it's pretty obvious if you don't want to use the existing
> tools, just look at how the existing tools get this data.  IPv4 uses
> /proc/net/arp, IPv6 uses netlink.
>
> Cheers,
> -PJ
>

Re: [PATCH net-next] Revert "ipv4: make net_protocol const"

2017-08-28 Thread David Miller

From: David Ahern 
Date: Mon, 28 Aug 2017 15:03:34 -0600

> On 8/28/17 3:01 PM, David Miller wrote:
>> From: David Ahern 
>> Date: Mon, 28 Aug 2017 13:23:09 -0700
>> 
>>> This reverts commit aa8db499ea67cff1f5f049033810ffede2fe5ae4.
>> ...
>>> I think this is the second time such a patch has been reverted.
>> 
>> Then please add a comment, it will help prevent this from happening
>> again.
>> 
> 
> Was going to do that in both places after the revert. You want it as
> part of this one?

You're right a clean revert first is better.

I'll apply this now, thanks David.

Re: [PATCH] ipv6: sr: fix get_srh() to comply with IPv6 standard "RFC 8200"

2017-08-28 Thread Ahmed Abdelsalam

On Mon, 28 Aug 2017 19:48:15 +0100
David Lebrun  wrote:

> On 08/28/2017 07:20 PM, Ahmed Abdelsalam wrote:
> > This patch fixes the get_srh(), so it gets the segment routing header
> > regardless of its position in the chain of the extension headers in IPv6
> > packet, and makes sure that the IPv6 routing extension header is of
> > Type 4.
> 
> Ahmed,
> 
> You need to initialize srhoff to 0, otherwise ipv6_find_hdr() will crash
> the kernel by dereferencing an uninitialized pointer.
> 
> Please test your patches before submitting them.
> 
> Furthermore, your pskb_may_pull() check should happen right after the
> call to ipv6_find_hdr, with srhoff + sizeof(*srh) as argument. Once you
> have checked the SRH type, you can then do another pskb_may_pull with
> srhoff + len.
> 
> David
> 

Thanks David, 
 
I will address the comments and re-submit the patch after testing.

-- 
Ahmed

Re: [PATCH 2/6] nbd: make device_attribute const

2017-08-28 Thread Jens Axboe

On 08/21/2017 05:43 AM, Bhumika Goyal wrote:
> Make this const as is is only passed as an argument to the
> function device_create_file and device_remove_file and the corresponding
> arguments are of type const.
> Done using Coccinelle

Added for 4.14, thanks.

-- 
Jens Axboe

Re: [PATCH net-next] Revert "ipv4: make net_protocol const"

2017-08-28 Thread Stephen Hemminger

On Mon, 28 Aug 2017 13:23:09 -0700
David Ahern  wrote:

> This reverts commit aa8db499ea67cff1f5f049033810ffede2fe5ae4.
> 
> Early demux structs can not be made const. Doing so results in:
> [   84.967355] BUG: unable to handle kernel paging request at 81684b10
> [   84.969272] IP: proc_configure_early_demux+0x1e/0x3d
> [   84.970544] PGD 1a0a067
> [   84.970546] P4D 1a0a067
> [   84.971212] PUD 1a0b063
> [   84.971733] PMD 816001e1
> 
> [   84.972669] Oops: 0003 [#1] SMP
> [   84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf
> [   84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22
> [   84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.7.5-20140531_083030-gandalf 04/01/2014
> [   84.975855] task: 88003854ce00 task.stack: c95a4000
> [   84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d
> [   84.977253] RSP: 0018:c95a7dd0 EFLAGS: 00010246
> [   84.977891] RAX: 81684b10 RBX: 0001 RCX: 
> 
> [   84.978759] RDX:  RSI: 0006 RDI: 
> 
> [   84.979628] RBP: c95a7dd0 R08:  R09: 
> 
> [   84.980501] R10: 0001 R11: 0008 R12: 
> 0001
> [   84.981373] R13: ffea R14: 81a9b4c0 R15: 
> 0002
> [   84.982249] FS:  7feb237b7700() GS:88003fc0() 
> knlGS:
> [   84.983231] CS:  0010 DS:  ES:  CR0: 80050033
> [   84.983941] CR2: 81684b10 CR3: 38492000 CR4: 
> 000406f0
> [   84.984817] Call Trace:
> [   84.985133]  proc_tcp_early_demux+0x29/0x30
> 
> I think this is the second time such a patch has been reverted.
> 
> Cc: Bhumika Goyal 
> Signed-off-by: David Ahern 

This would have been caught at compile time if you tried setting inet_protos to 
const.
Start with that if you want to give it a try.



diff --git a/include/net/protocol.h b/include/net/protocol.h
index 65ba335b0e7e..373fa92d33ff 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -93,7 +93,7 @@ struct inet_protosw {
 #define INET_PROTOSW_PERMANENT 0x02  /* Permanent protocols are unremovable. */
 #define INET_PROTOSW_ICSK  0x04  /* Is this an inet_connection_sock? */
 
-extern struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
+extern const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
 extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
 extern const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
 
diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
index 32a691b7ce2c..4b7c0ec65251 100644
--- a/net/ipv4/protocol.c
+++ b/net/ipv4/protocol.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 
-struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
+const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
 const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
 EXPORT_SYMBOL(inet_offloads);


net/ipv4/sysctl_net_ipv4.c: In function ‘proc_configure_early_demux’:
net/ipv4/sysctl_net_ipv4.c:310:9: warning: assignment discards ‘const’ 
qualifier from pointer target type [-Wdiscarded-qualifiers]
  ipprot = rcu_dereference(inet_protos[protocol]);

Re: Fwd: DA850-evm MAC Address is random

2017-08-28 Thread Tony Lindgren

* Adam Ford  [170828 13:33]:
> On Mon, Aug 28, 2017 at 1:54 PM, Grygorii Strashko
>  wrote:
> > Cc: Sekhar
> >
> > On 08/28/2017 10:32 AM, Adam Ford wrote:
> >>
> >> The davinvi_emac MAC address seems to attempt a call to
> >> ti_cm_get_macid in cpsw-common.c but it returns the message
> >> 'davinci_emac davinci_emac.1: incompatible machine/device type for
> >> reading mac address ' and then generates a random MAC address.
> >>
> >> The function appears to lookup varions boards using
> >> 'of_machine_is_compaible' and supports dm8148, am33xx, am3517, dm816,
> >> am4372 and dra7.  I don't see the ti,davinci-dm6467-emac which is
> >> what's shown in the da850 device tree.
> >>
> >> Is there a patch somewhere for supporting the da850-evm?
> >
> >
> > Not sure if MAC address can be read from Control module.
> > May be Sekhar can say more?
> 
> My understanding is that the MAC address is programmed by Logic PD
> into the SPI flash.  The Bootloader reads this from either SPI or its
> env variables.  Looking at the partition info listed in the
> da850-evm.dts file, it appears as if they've reserved space for it.
> Unfortunately, I don't see any code that reads it out.  I was hoping
> there might be a way to just pass cmdline parameter from the
> bootloader to the kernel to accept the MAC address.
> 
> >
> >>
> >> If not, is there a way to pass the MAC address from U-Boot to the
> >> driver so it doesn't generate a random MAC?
> >
> >
> > "local-mac-address" dt porp
> 
> The downside here, is that we'd have to have the Bootloader modify the
> device tree.

That piece of code exists somewhere in u-boot already. Note how
we are populating the mac address for USB Ethernet drivers in
u-boot and then the Ethernet driver code parses it. See commit
055d31de7158 ("ARM: omap3: beagleboard-xm: dt: Add ethernet to
the device tree") for some more information.

I think u-boot needs the ethernet alias for finding the interface.

Regards,

Tony

Re: [PATCH 1/6] ACPI: make device_attribute const

2017-08-28 Thread Rafael J. Wysocki

On Monday, August 21, 2017 1:43:07 PM CEST Bhumika Goyal wrote:
> Make these const as they are only passed as an argument to the function
> device_create_file and device_remove_file and the corresponding
> arguments are of type const.
> Done using Coccinelle
> 
> Signed-off-by: Bhumika Goyal 
> ---
>  drivers/acpi/battery.c | 2 +-
>  drivers/acpi/sbs.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
> index 1cbb88d..13e7b56 100644
> --- a/drivers/acpi/battery.c
> +++ b/drivers/acpi/battery.c
> @@ -620,7 +620,7 @@ static ssize_t acpi_battery_alarm_store(struct device 
> *dev,
>   return count;
>  }
>  
> -static struct device_attribute alarm_attr = {
> +static const struct device_attribute alarm_attr = {
>   .attr = {.name = "alarm", .mode = 0644},
>   .show = acpi_battery_alarm_show,
>   .store = acpi_battery_alarm_store,
> diff --git a/drivers/acpi/sbs.c b/drivers/acpi/sbs.c
> index a184637..a2428e9 100644
> --- a/drivers/acpi/sbs.c
> +++ b/drivers/acpi/sbs.c
> @@ -474,7 +474,7 @@ static ssize_t acpi_battery_alarm_store(struct device 
> *dev,
>   return count;
>  }
>  
> -static struct device_attribute alarm_attr = {
> +static const struct device_attribute alarm_attr = {
>   .attr = {.name = "alarm", .mode = 0644},
>   .show = acpi_battery_alarm_show,
>   .store = acpi_battery_alarm_store,
> 

Applied, thanks!

Re: [PATCH net-next] Revert "ipv4: make net_protocol const"

2017-08-28 Thread David Ahern

On 8/28/17 3:01 PM, David Miller wrote:
> From: David Ahern 
> Date: Mon, 28 Aug 2017 13:23:09 -0700
> 
>> This reverts commit aa8db499ea67cff1f5f049033810ffede2fe5ae4.
> ...
>> I think this is the second time such a patch has been reverted.
> 
> Then please add a comment, it will help prevent this from happening
> again.
> 

Was going to do that in both places after the revert. You want it as
part of this one?

Re: [PATCH net-next] Revert "ipv4: make net_protocol const"

2017-08-28 Thread David Miller

From: David Ahern 
Date: Mon, 28 Aug 2017 13:23:09 -0700

> This reverts commit aa8db499ea67cff1f5f049033810ffede2fe5ae4.
...
> I think this is the second time such a patch has been reverted.

Then please add a comment, it will help prevent this from happening
again.

RE: [RFC PATCH] net: limit maximum number of packets to mark with xmit_more

2017-08-28 Thread Keller, Jacob E

> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
> Behalf Of Jakub Kicinski
> Sent: Friday, August 25, 2017 12:34 PM
> To: Keller, Jacob E 
> Cc: netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with
> xmit_more
> 
> On Fri, 25 Aug 2017 08:24:49 -0700, Jacob Keller wrote:
> > Under some circumstances, such as with many stacked devices, it is
> > possible that dev_hard_start_xmit will bundle many packets together, and
> > mark them all with xmit_more.
> 
> Excuse my ignorance but what are those stacked devices?  Could they
> perhaps be fixed somehow?  My intuition was that long xmit_more
> sequences can only happen if NIC and/or BQL are back pressuring, and
> therefore we shouldn't be seeing a long xmit_more "train" arriving at
> an empty device ring...

a veth device connecting a VM to the host, then connected to a bridge, which is 
connected to a vlan interface connected to a bond, which is hooked in 
active-backup to a physical device.

Sorry if I don't really know the correct way to refer to these, I just think of 
them as devices stacked on top of each other.

During root cause investigation I found that we (the i40e driver) sometimes 
received up to 100 or more SKBs in a row with xmit_more set. We were 
incorrectly also using xmit_more as a hint for not marking packets to get 
writebacks, which caused significant throughput issues. Additionally there was 
concern that that many packets in a row without a tail bump would cause latency 
issues, so I thought maybe it was best to simply guarantee that the stack 
didn't send us too many packets marked with xmit more at once.

It seems based on discussion that it should be up to the driver to determine 
exactly how to handle the xmit_more hint and to determine when it actually 
isn't helpful or not, so I do not think this patch makes sense now.

Thanks,
Jake

[PATCH net-next] ipv6: Use rt6i_idev index for echo replies to a local address

2017-08-28 Thread David Ahern

Tariq repored local pings to linklocal address is failing:
$ ifconfig ens8
ens8: flags=4163  mtu 1500
inet 11.141.16.6  netmask 255.255.0.0  broadcast 11.141.255.255
inet6 fe80::7efe:90ff:fecb:7502  prefixlen 64  scopeid 0x20
ether 7c:fe:90:cb:75:02  txqueuelen 1000  (Ethernet)
RX packets 12  bytes 1164 (1.1 KiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 30  bytes 2484 (2.4 KiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

$  /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes

--- fe80::7efe:90ff:fecb:7502%ens8 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2043ms

icmpv6_echo_reply needs to use the rt6i_idev dev index for local traffic
similar to how icmp6_send does. Convert the change for icmp6_send into a
helper that can be used in both places. Add the long over due
skb_rt6_info helper to convert dst on an skb to rt6_info similar to
skb_rtable for ipv4.

Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on
   device with address")
Reported-by: Tariq Toukan 
Signed-off-by: David Ahern 
---
 include/net/ip6_route.h | 10 ++
 net/ipv6/icmp.c | 33 -
 2 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 882bc3c7ccde..ee96f402cb75 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -164,6 +164,16 @@ void rt6_mtu_change(struct net_device *dev, unsigned int 
mtu);
 void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 void rt6_clean_tohost(struct net *net, struct in6_addr *gateway);
 
+static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb)
+{
+   const struct dst_entry *dst = skb_dst(skb);
+   const struct rt6_info *rt6 = NULL;
+
+   if (dst)
+   rt6 = container_of(dst, struct rt6_info, dst);
+
+   return rt6;
+}
 
 /*
  * Store a destination cache entry in a socket
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index dd7608cf1d72..c25b5954cfbb 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -399,6 +399,24 @@ static struct dst_entry *icmpv6_route_lookup(struct net 
*net,
return ERR_PTR(err);
 }
 
+static int icmp6_iif(const struct sk_buff *skb)
+{
+   int iif = skb->dev->ifindex;
+
+   /* for local traffic to local address, skb dev is the loopback
+* device. Check if there is a dst attached to the skb and if so
+* get the real device index.
+*/
+   if (unlikely(iif == LOOPBACK_IFINDEX)) {
+   const struct rt6_info *rt6 = skb_rt6_info(skb);
+
+   if (rt6)
+   iif = rt6->rt6i_idev->dev->ifindex;
+   }
+
+   return iif;
+}
+
 /*
  * Send an ICMP message in response to a packet in error
  */
@@ -460,18 +478,7 @@ static void icmp6_send(struct sk_buff *skb, u8 type, u8 
code, __u32 info,
 */
 
if (__ipv6_addr_needs_scope_id(addr_type)) {
-   iif = skb->dev->ifindex;
-
-   /* for local packets, get the real device index */
-   if (iif == LOOPBACK_IFINDEX) {
-   dst = skb_dst(skb);
-   if (dst) {
-   struct rt6_info *rt;
-
-   rt = container_of(dst, struct rt6_info, dst);
-   iif = rt->rt6i_idev->dev->ifindex;
-   }
-   }
+   iif = icmp6_iif(skb);
} else {
dst = skb_dst(skb);
iif = l3mdev_master_ifindex(dst ? dst->dev : skb->dev);
@@ -694,7 +701,7 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
fl6.daddr = ipv6_hdr(skb)->saddr;
if (saddr)
fl6.saddr = *saddr;
-   fl6.flowi6_oif = skb->dev->ifindex;
+   fl6.flowi6_oif = icmp6_iif(skb);
fl6.fl6_icmp_type = ICMPV6_ECHO_REPLY;
fl6.flowi6_mark = mark;
fl6.flowi6_uid = sock_net_uid(net, NULL);
-- 
2.1.4

RE: [RFC PATCH] net: limit maximum number of packets to mark with xmit_more

2017-08-28 Thread Keller, Jacob E

> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
> Behalf Of Alexander Duyck
> Sent: Friday, August 25, 2017 3:34 PM
> To: Stephen Hemminger 
> Cc: Waskiewicz Jr, Peter ; Keller, Jacob E
> ; netdev@vger.kernel.org
> Subject: Re: [RFC PATCH] net: limit maximum number of packets to mark with
> xmit_more
> 
> On Fri, Aug 25, 2017 at 8:58 AM, Stephen Hemminger
>  wrote:
> > On Fri, 25 Aug 2017 15:36:22 +
> > "Waskiewicz Jr, Peter"  wrote:
> >
> >> On 8/25/17 11:25 AM, Jacob Keller wrote:
> >> > Under some circumstances, such as with many stacked devices, it is
> >> > possible that dev_hard_start_xmit will bundle many packets together, and
> >> > mark them all with xmit_more.
> >> >
> >> > Most drivers respond to xmit_more by skipping tail bumps on packet
> >> > rings, or similar behavior as long as xmit_more is set. This is
> >> > a performance win since it means drivers can avoid notifying hardware of
> >> > new packets repeat daily, and thus avoid wasting unnecessary PCIe or 
> >> > other
> >> > bandwidth.
> >> >
> >> > This use of xmit_more comes with a trade off because bundling too many
> >> > packets can increase latency of the Tx packets. To avoid this, we should
> >> > limit the maximum number of packets with xmit_more.
> >> >
> >> > Driver authors could modify their drivers to check for some determined
> >> > limit, but this requires all drivers to be modified in order to gain
> >> > advantage.
> >> >
> >> > Instead, add a sysctl "xmit_more_max" which can be used to configure the
> >> > maximum number of xmit_more skbs to send in a sequence. This ensures
> >> > that all drivers benefit, and allows system administrators the option to
> >> > tune the value to their environment.
> >> >
> >> > Signed-off-by: Jacob Keller 
> >> > ---
> >> >
> >> > Stray thoughts and further questions
> >> >
> >> > Is this the right approach? Did I miss any other places where we should
> >> > limit? Does the limit make sense? Should it instead be a per-device
> >> > tuning nob instead of a global? Is 32 a good default?
> >>
> >> I actually like the idea of a per-device knob.  A xmit_more_max that's
> >> global in a system with 1GbE devices along with a 25/50GbE or more just
> >> doesn't make much sense to me.  Or having heterogeneous vendor devices
> >> in the same system that have different HW behaviors could mask issues
> >> with latency.
> >>
> >> This seems like another incarnation of possible buffer-bloat if the max
> >> is too high...
> >>
> >> >
> >> >   Documentation/sysctl/net.txt |  6 ++
> >> >   include/linux/netdevice.h|  2 ++
> >> >   net/core/dev.c   | 10 +-
> >> >   net/core/sysctl_net_core.c   |  7 +++
> >> >   4 files changed, 24 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> >> > index b67044a2575f..3d995e8f4448 100644
> >> > --- a/Documentation/sysctl/net.txt
> >> > +++ b/Documentation/sysctl/net.txt
> >> > @@ -230,6 +230,12 @@ netdev_max_backlog
> >> >   Maximum number  of  packets,  queued  on  the  INPUT  side, when the
> interface
> >> >   receives packets faster than kernel can process them.
> >> >
> >> > +xmit_more_max
> >> > +-
> >> > +
> >> > +Maximum number of packets in a row to mark with skb->xmit_more. A value
> of zero
> >> > +indicates no limit.
> >>
> >> What defines "packet?"  MTU-sized packets, or payloads coming down from
> >> the stack (e.g. TSO's)?
> >
> > xmit_more is only a hint to the device. The device driver should ignore it 
> > unless
> > there are hardware advantages. The device driver is the place with HW 
> > specific
> > knowledge (like 4 Tx descriptors is equivalent to one PCI transaction on 
> > this
> device).
> >
> > Anything that pushes that optimization out to the user is only useful for
> benchmarks
> > and embedded devices.
> 
> Actually I think I might have an idea what is going on here and I
> agree that this is probably something that needs to be fixed in the
> drivers. Especially since the problem isn't so much the skbs but
> descriptors in the descriptor ring.
> 
> If I am not mistaken the issue is most drivers will honor the
> xmit_more unless the ring cannot enqueue another packet. The problem
> is if the clean-up is occurring on a different CPU than transmit we
> can cause the clean-up CPU/device DMA to go idle by not providing any
> notifications to the device that new packets are present. What we
> should probably do is look at adding another condition which is to
> force us to flush the packet if we have used over half of the
> descriptors in a given ring without notifying the device. Then that
> way we can be filling half while the device is processing the other
> half which should result in us operating

Re: Fwd: DA850-evm MAC Address is random

2017-08-28 Thread Adam Ford

On Mon, Aug 28, 2017 at 1:54 PM, Grygorii Strashko
 wrote:
> Cc: Sekhar
>
> On 08/28/2017 10:32 AM, Adam Ford wrote:
>>
>> The davinvi_emac MAC address seems to attempt a call to
>> ti_cm_get_macid in cpsw-common.c but it returns the message
>> 'davinci_emac davinci_emac.1: incompatible machine/device type for
>> reading mac address ' and then generates a random MAC address.
>>
>> The function appears to lookup varions boards using
>> 'of_machine_is_compaible' and supports dm8148, am33xx, am3517, dm816,
>> am4372 and dra7.  I don't see the ti,davinci-dm6467-emac which is
>> what's shown in the da850 device tree.
>>
>> Is there a patch somewhere for supporting the da850-evm?
>
>
> Not sure if MAC address can be read from Control module.
> May be Sekhar can say more?

My understanding is that the MAC address is programmed by Logic PD
into the SPI flash.  The Bootloader reads this from either SPI or its
env variables.  Looking at the partition info listed in the
da850-evm.dts file, it appears as if they've reserved space for it.
Unfortunately, I don't see any code that reads it out.  I was hoping
there might be a way to just pass cmdline parameter from the
bootloader to the kernel to accept the MAC address.

>
>>
>> If not, is there a way to pass the MAC address from U-Boot to the
>> driver so it doesn't generate a random MAC?
>
>
> "local-mac-address" dt porp

The downside here, is that we'd have to have the Bootloader modify the
device tree.
>
> --
> regards,
> -grygorii

thanks

adam

[PATCH net-next v1] amd-xgbe: Interrupt summary bits are h/w version dependent

2017-08-28 Thread Tom Lendacky

There is a difference in the bit position of the normal interrupt summary
enable (NIE) and abnormal interrupt summary enable (AIE) between revisions
of the hardware.  For older revisions the NIE and AIE bits are positions
16 and 15 respectively.  For newer revisions the NIE and AIE bits are
positions 15 and 14.  The effect in changing the bit position is that
newer hardware won't receive AIE interrupts in the current version of the
driver.  Specifically, the driver uses this interrupt to collect
statistics on when a receive buffer unavailable event occurs and to
restart the driver/device when a fatal bus error occurs.

Update the driver to set the interrupt enable bit based on the reported
version of the hardware.

Signed-off-by: Tom Lendacky 
---
 drivers/net/ethernet/amd/xgbe/xgbe-common.h |8 ++--
 drivers/net/ethernet/amd/xgbe/xgbe-dev.c|   13 ++---
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-common.h 
b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
index 9431330..7ea72ef 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-common.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-common.h
@@ -210,11 +210,15 @@
 #define DMA_CH_CR_PBLX8_WIDTH  1
 #define DMA_CH_CR_SPH_INDEX24
 #define DMA_CH_CR_SPH_WIDTH1
-#define DMA_CH_IER_AIE_INDEX   15
+#define DMA_CH_IER_AIE20_INDEX 15
+#define DMA_CH_IER_AIE20_WIDTH 1
+#define DMA_CH_IER_AIE_INDEX   14
 #define DMA_CH_IER_AIE_WIDTH   1
 #define DMA_CH_IER_FBEE_INDEX  12
 #define DMA_CH_IER_FBEE_WIDTH  1
-#define DMA_CH_IER_NIE_INDEX   16
+#define DMA_CH_IER_NIE20_INDEX 16
+#define DMA_CH_IER_NIE20_WIDTH 1
+#define DMA_CH_IER_NIE_INDEX   15
 #define DMA_CH_IER_NIE_WIDTH   1
 #define DMA_CH_IER_RBUE_INDEX  7
 #define DMA_CH_IER_RBUE_WIDTH  1
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c 
b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
index 671203d..e107e18 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
@@ -649,13 +649,15 @@ static void xgbe_config_flow_control(struct xgbe_prv_data 
*pdata)
 static void xgbe_enable_dma_interrupts(struct xgbe_prv_data *pdata)
 {
struct xgbe_channel *channel;
-   unsigned int i;
+   unsigned int i, ver;
 
/* Set the interrupt mode if supported */
if (pdata->channel_irq_mode)
XGMAC_IOWRITE_BITS(pdata, DMA_MR, INTM,
   pdata->channel_irq_mode);
 
+   ver = XGMAC_GET_BITS(pdata->hw_feat.version, MAC_VR, SNPSVER);
+
for (i = 0; i < pdata->channel_count; i++) {
channel = pdata->channel[i];
 
@@ -671,8 +673,13 @@ static void xgbe_enable_dma_interrupts(struct 
xgbe_prv_data *pdata)
 *   AIE  - Abnormal Interrupt Summary Enable
 *   FBEE - Fatal Bus Error Enable
 */
-   XGMAC_SET_BITS(channel->curr_ier, DMA_CH_IER, NIE, 1);
-   XGMAC_SET_BITS(channel->curr_ier, DMA_CH_IER, AIE, 1);
+   if (ver < 0x21) {
+   XGMAC_SET_BITS(channel->curr_ier, DMA_CH_IER, NIE20, 1);
+   XGMAC_SET_BITS(channel->curr_ier, DMA_CH_IER, AIE20, 1);
+   } else {
+   XGMAC_SET_BITS(channel->curr_ier, DMA_CH_IER, NIE, 1);
+   XGMAC_SET_BITS(channel->curr_ier, DMA_CH_IER, AIE, 1);
+   }
XGMAC_SET_BITS(channel->curr_ier, DMA_CH_IER, FBEE, 1);
 
if (channel->tx_ring) {

[PATCH net-next] Revert "ipv4: make net_protocol const"

2017-08-28 Thread David Ahern

This reverts commit aa8db499ea67cff1f5f049033810ffede2fe5ae4.

Early demux structs can not be made const. Doing so results in:
[   84.967355] BUG: unable to handle kernel paging request at 81684b10
[   84.969272] IP: proc_configure_early_demux+0x1e/0x3d
[   84.970544] PGD 1a0a067
[   84.970546] P4D 1a0a067
[   84.971212] PUD 1a0b063
[   84.971733] PMD 816001e1

[   84.972669] Oops: 0003 [#1] SMP
[   84.973065] Modules linked in: ip6table_filter ip6_tables veth vrf
[   84.973833] CPU: 0 PID: 955 Comm: sysctl Not tainted 4.13.0-rc6+ #22
[   84.974612] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.7.5-20140531_083030-gandalf 04/01/2014
[   84.975855] task: 88003854ce00 task.stack: c95a4000
[   84.976580] RIP: 0010:proc_configure_early_demux+0x1e/0x3d
[   84.977253] RSP: 0018:c95a7dd0 EFLAGS: 00010246
[   84.977891] RAX: 81684b10 RBX: 0001 RCX: 
[   84.978759] RDX:  RSI: 0006 RDI: 
[   84.979628] RBP: c95a7dd0 R08:  R09: 
[   84.980501] R10: 0001 R11: 0008 R12: 0001
[   84.981373] R13: ffea R14: 81a9b4c0 R15: 0002
[   84.982249] FS:  7feb237b7700() GS:88003fc0() 
knlGS:
[   84.983231] CS:  0010 DS:  ES:  CR0: 80050033
[   84.983941] CR2: 81684b10 CR3: 38492000 CR4: 000406f0
[   84.984817] Call Trace:
[   84.985133]  proc_tcp_early_demux+0x29/0x30

I think this is the second time such a patch has been reverted.

Cc: Bhumika Goyal 
Signed-off-by: David Ahern 
---
Bhumika: How are you testing these constify changes? In this case a simple
sysctl -w net.ipv4.tcp_early_demux=1 would have shown the problem

 net/ipv4/af_inet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 19aee073ba29..d678820e4306 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1596,7 +1596,7 @@ static const struct net_protocol igmp_protocol = {
 };
 #endif
 
-static const struct net_protocol tcp_protocol = {
+static struct net_protocol tcp_protocol = {
.early_demux=   tcp_v4_early_demux,
.early_demux_handler =  tcp_v4_early_demux,
.handler=   tcp_v4_rcv,
@@ -1606,7 +1606,7 @@ static const struct net_protocol tcp_protocol = {
.icmp_strict_tag_validation = 1,
 };
 
-static const struct net_protocol udp_protocol = {
+static struct net_protocol udp_protocol = {
.early_demux =  udp_v4_early_demux,
.early_demux_handler =  udp_v4_early_demux,
.handler =  udp_rcv,
-- 
2.1.4

Re: [PATCH net-next v2 01/10] net: dsa: add debugfs interface

2017-08-28 Thread Andrew Lunn

On Mon, Aug 28, 2017 at 03:17:39PM -0400, Vivien Didelot wrote:
> This commit adds a DEBUG_FS dependent DSA core file creating a generic
> debug filesystem interface for the DSA switch devices.
> 
> The interface can be mounted with:
> 
> # mount -t debugfs none /sys/kernel/debug
> 
> The dsa directory contains one directory per switch chip:
> 
> # cd /sys/kernel/debug/dsa/
> # ls
> switch0  switch1 switch2
> 
> Each chip directory contains one directory per port:
> 
> # ls -l switch0/
> drwxr-xr-x 2 root root 0 Jan  1 00:00 port0
> drwxr-xr-x 2 root root 0 Jan  1 00:00 port1
> drwxr-xr-x 2 root root 0 Jan  1 00:00 port2
> drwxr-xr-x 2 root root 0 Jan  1 00:00 port5
> drwxr-xr-x 2 root root 0 Jan  1 00:00 port6
> 
> Future patches will add entry files to these directories.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next v2 03/10] net: dsa: debugfs: add tag_protocol

2017-08-28 Thread Andrew Lunn

On Mon, Aug 28, 2017 at 03:17:41PM -0400, Vivien Didelot wrote:
> Add a debug filesystem "tag_protocol" entry to query the switch tagging
> protocol through the .get_tag_protocol operation.
> 
> # cat switch1/tag_protocol
> EDSA
> 
> To ease maintenance of tag protocols, add a dsa_tag_protocol_name helper
> to the public API which to convert a tag protocol enum to a string.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next v2 00/10] net: dsa: add generic debugfs interface

2017-08-28 Thread Andrew Lunn

> I see this overlaps a lot with DPIPE. Why won't you use that to expose
> your hw state?

We took a look at dpipe and i talked to you about using it for this
sort of thing at netconf/netdev. But dpipe has issues displaying the
sort of information we have. I never figured out how to do two
dimensional tables. The output of the dpipe command is pretty
unreadable. A lot of the information being dumped here is not about
the data pipe, etc.

There is a lot of pushback on debugfs for individual drivers. As i
said recently to somebody, debugfs is a bit of a wild west. When
designing this code, we thought about that. This debugfs is not at the
driver level. It is at the DSA level. All DSA drivers will benefit
from this code, and all DSA drivers will get the same information
exposed in debugfs. It is generic, well defined and structured, with
respect to DSA.

Andrew

1 2 3 >

1 - 100 of 279 matches

Mail list logo