date:20170331

Re: [PATCH v2 net-next 3/6] tools/lib/bpf: expose bpf_program__set_type()

2017-03-31 Thread Alexei Starovoitov


On 3/31/17 10:32 PM, Wangnan (F) wrote:


OK. Then let your patch be merged first then let me do the code cleanup.


Thanks!

Once these two lib/bpf patches are applied to net-next we can apply
the same to tip and there will be no conflicts during the merge
window.

I'm also planning to fix a few things later:

- there is an obscure error
libbpf: relocation failed: no 11 section
It's printed when llvm generates access to global variables.
I think we need to improve the error message to point out the actual
reason for the failure, so the user can fix the program.

- there is another equally obscure error
libbpf: Program 'foo' contains non-map related relo data pointing to section 8

It happens when the program is compiled with -g.
It's just a bug in libbpf and the fix is straightforward.

So I suspect these fixes will also go into both net-next and tip.

Re: [PATCH net-next 1/1 v2] net: rmnet_data: Initial implementation

2017-03-31 Thread Subash Abhinov Kasiviswanathan


Yeah, seems quite a bit like VLAN (from a workflow perspective, not
quite as much from a protocol one) and I think the same workflow could
work for this too.  Would be nice to eventually get qmi_wwan onto the
same base, if possible (though we'd need to preserve the 802.3
capability somehow for devices that don't support raw-ip).

It doesn't necessarily mean that configuration would need to move to
the IP tool.  I just used it as an example of how VLAN works and how
rmnet could work as well, quite easily with the ip tool.

Since the ip tool is based on netlink, both it and your userspace
library could use the same netlink attributes and families to do the
same thing.

Essentially, I am recommending that instead of your current custom
netlink commands, port them over to rtnetlink which will mean less code
for you, and a more standard kernel interface for everyone.


Thanks for your comments. I'll work on conversion into rtnl_link_ops.

Ethernet frames are supported in pass through mode (though not used often)
but they cannot be used in conjunction with MAP functionality.


Does the aggregation happen at the level of the raw device, or at the
level of the MUX channels?  eg, can I aggregate packets from multiple
MUX channels into the same request, especially on USB devices?

Hardware does allow aggregation of packets from different mux channels in
a single frame.


One use-case is to put different packet data contexts into different
namespaces.  You could then isolate different EPS/PDP contexts by
putting them into different network namespaces, and for example have
your IMS handler only be able to access its own EPS/PDP context.

We could already do this with qmi_wwan on devices that provide multiple
USB endpoints for QMI/rmnet, but I thought the point of the MUX
protocol was to allow a single endpoint for rmnet that can MUX multiple
packet data contexts.  So it would be nice to allow each rmnet netdev
to be placed into a different network namespace.


I need to study namespaces more since I am not familiar with them.
I'll add support for them in a follow-up patchset.


Like a usb gadget rmnet interface for debugging?


Yes, it's mostly used for testing only.
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH v2 net-next 3/6] tools/lib/bpf: expose bpf_program__set_type()

2017-03-31 Thread Wangnan (F)




On 2017/4/1 11:18, Alexei Starovoitov wrote:

On 3/31/17 7:29 PM, Wangnan (F) wrote:



On 2017/3/31 12:45, Alexei Starovoitov wrote:

expose bpf_program__set_type() to set program type

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
  tools/lib/bpf/libbpf.c | 3 +--
  tools/lib/bpf/libbpf.h | 2 ++
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ac6eb863b2a4..1a2c07eb7795 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1618,8 +1618,7 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n)
  return fd;
  }

-static void bpf_program__set_type(struct bpf_program *prog,
-  enum bpf_prog_type type)
+void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type)
  {
  prog->type = type;
  }


Since it becomes a public interface, we need to check if prog is a
NULL pointer and check if the value of type is okay, and let it
return an errno.


I strongly disagree with such defensive programming. It's a cause
of bugs for users of this library.
I think you're trying to mimic C++ setters/getters, but C++
never checks 'this != null', since passing null into a setter
is a bug of the user of the library and not the library.
Setters should also have a 'void' return type when they
cannot fail. That is exactly the case here.
If, in the future, we decide that libbpf shouldn't support
all bpf program types, then you'd need to change the prototype
of this function to return an error code and change all places
where this function is called to check for the error code.
It may or may not be the right approach.
For example, today the only user of the bpf_program__set*() methods
is perf/util/bpf-loader.c, and it calls bpf_program__set_kprobe() and
bpf_program__set_tracepoint() _without_ checking the return value,
which is the _correct_ thing to do. Instead, the current prototype
'int bpf_program__set_tracepoint(struct bpf_program *prog);'
is not correct and I suggest you fix it.

You also need to do other cleanup. Like in bpf_object__elf_finish()
you have:
if (!obj_elf_valid(obj))
return;

if (obj->efile.elf) {

which is redundant. It's another example where mistakes creep in
due to defensive programming.

There is another bug in bpf_object__close(), which does:
if (!obj)
return;
Again defensive programming strikes, since
you're not checking IS_ERR(obj) and that's what bpf_object__open()
returns, so most users of the library (who don't read the source
code and just use it based on the .h) will do

obj = bpf_object__open(...);
bpf_object__close(obj);

and the current 'if (!obj)' won't help and it will segfault.
I hit this issue while developing this patch set.


diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b30394f9947a..32c7252f734e 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include   // for size_t
+#include 
enum libbpf_errno {
  __LIBBPF_ERRNO__START = 4000,
@@ -185,6 +186,7 @@ int bpf_program__set_sched_cls(struct bpf_program *prog);
  int bpf_program__set_sched_act(struct bpf_program *prog);
  int bpf_program__set_xdp(struct bpf_program *prog);
  int bpf_program__set_perf_event(struct bpf_program *prog);
+void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);



The above bpf_program__set_xxx functions become redundant. They should be
generated using a macro as static inline functions.


  bool bpf_program__is_socket_filter(struct bpf_program *prog);
  bool bpf_program__is_tracepoint(struct bpf_program *prog);


bpf_program__is_xxx should be updated like bpf_program__set_xxx, since
enum bpf_prog_type is not a problem now.


All of these suggestions are cleanups for your code that you
need to do yourself. I actually suggest you kill all bpf_program__is*()
and all but one bpf_program__set*() functions.
The current user perf/util/bpf-loader.c should be converted
to use bpf_program__set_type() with the _void_ return type that
I'm introducing here.

Overall, I think, tools/lib/bpf/ is a nice library and it can be used
by many projects, but I suggest you stop making excuses based on
your proprietary usage of it.

Also please cc me in the future on changes to the library. It still
has my copyrights, though a lot has changed since the last time
I looked at it, and it's my fault for not paying attention earlier.



OK. Then let your patch be merged first then let me do the code cleanup.

Thank you.

Re: [PATCH V2 net-next 1/7] ptr_ring: introduce batch dequeuing

2017-03-31 Thread Jason Wang




On 2017-03-31 22:31, Michael S. Tsirkin wrote:

On Fri, Mar 31, 2017 at 11:52:24AM +0800, Jason Wang wrote:

On 2017-03-30 21:53, Michael S. Tsirkin wrote:

On Thu, Mar 30, 2017 at 03:22:24PM +0800, Jason Wang wrote:

This patch introduces a batched version of consuming: a consumer can
dequeue more than one pointer from the ring at a time. We don't care
about the reordering of reads here, so there is no need for a compiler barrier.

Signed-off-by: Jason Wang
---
   include/linux/ptr_ring.h | 65 

   1 file changed, 65 insertions(+)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 6c70444..2be0f350 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -247,6 +247,22 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
return ptr;
   }
+static inline int __ptr_ring_consume_batched(struct ptr_ring *r,
+void **array, int n)

Can we use a shorter name? ptr_ring_consume_batch?

Ok, but at least we need to keep the prefix since there's a locked version.




+{
+   void *ptr;
+   int i;
+
+   for (i = 0; i < n; i++) {
+   ptr = __ptr_ring_consume(r);
+   if (!ptr)
+   break;
+   array[i] = ptr;
+   }
+
+   return i;
+}
+
   /*
* Note: resize (below) nests producer lock within consumer lock, so if you
* call this in interrupt or BH context, you must disable interrupts/BH when

I'd like to add a code comment here explaining why we don't
care about cpu or compiler reordering. And I think the reason is
in the way you use this API: in vhost it does not matter
if you get less entries than present in the ring.
That's ok but needs to be noted
in a code comment so people use this function correctly.

Interesting, but I still think it's not necessary.

If the consumer is busy polling, it will eventually get the entries. If
the consumer needs notification from the producer, it should drain the queue,
which means it needs to enable notification before the last consume call;
otherwise it is a bug. The batch consuming function in this patch can
guarantee to return at least one pointer if there are many; this looks sufficient
for correctness?

Thanks

You ask for N entries but get N-1. This seems to imply the
ring is now empty. Do we guarantee this?


I think the consumer cannot assume the ring is empty, considering that the
producer can produce at the same time. It needs to enable notification and do
another poll in this case.
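The consumer pattern discussed above (a short batch is only a hint, so enable notification and poll once more) can be sketched in user space. This is a minimal sketch, assuming a hypothetical single-threaded `toy_ring` in which a NULL slot means "empty"; none of these names come from the patch, and the kernel's ptr_ring locking and barriers are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical toy ring, NOT the kernel's ptr_ring: a NULL slot is empty. */
#define TOY_RING_SIZE 8

struct toy_ring {
    void *queue[TOY_RING_SIZE];
    int producer;        /* next slot to write */
    int consumer;        /* next slot to read */
    int notify_enabled;  /* stands in for the producer's notification */
};

/* Returns 0 on success, -1 when the next slot is still occupied (full). */
static int toy_ring_produce(struct toy_ring *r, void *ptr)
{
    if (r->queue[r->producer])
        return -1;
    r->queue[r->producer] = ptr;
    r->producer = (r->producer + 1) % TOY_RING_SIZE;
    return 0;
}

/* Returns the next pointer, or NULL when the ring looks empty. */
static void *toy_ring_consume(struct toy_ring *r)
{
    void *ptr = r->queue[r->consumer];

    if (ptr) {
        r->queue[r->consumer] = NULL;
        r->consumer = (r->consumer + 1) % TOY_RING_SIZE;
    }
    return ptr;
}

/* Same loop shape as the batched helper in the patch: stop at first NULL. */
static int toy_ring_consume_batched(struct toy_ring *r, void **array, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        void *ptr = toy_ring_consume(r);

        if (!ptr)
            break;
        array[i] = ptr;
    }
    return i;
}

/*
 * A short batch does not prove the ring is empty, because a producer may
 * have added entries meanwhile. Enable notification first, then poll again.
 */
static int toy_ring_drain(struct toy_ring *r, void **array, int n)
{
    int got = toy_ring_consume_batched(r, array, n);

    if (got < n) {
        r->notify_enabled = 1;
        got += toy_ring_consume_batched(r, array + got, n - got);
    }
    return got;
}
```

In this single-threaded sketch the second poll never finds anything new; with a concurrent producer it closes the window between the short batch and the notification being enabled.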


Thanks

[PATCH net-next 1/1] net: tcp: Define the TCP_MAX_WSCALE instead of literal number 14

2017-03-31 Thread gfree . wind

From: Gao Feng 

Define one new macro TCP_MAX_WSCALE instead of literal number '14',
and use U16_MAX instead of 65535 as the max value of TCP window.
There is another minor change, use rounddown(space, mss) instead of
(space / mss) * mss;

Signed-off-by: Gao Feng 
---
 include/net/tcp.h | 3 +++
 net/ipv4/tcp.c| 2 +-
 net/ipv4/tcp_input.c  | 4 ++--
 net/ipv4/tcp_output.c | 8 
 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e614ad4..2a89881 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -78,6 +78,9 @@
 /* Maximal number of ACKs sent quickly to accelerate slow-start. */
 #define TCP_MAX_QUICKACKS  16U
 
+/* Maximal number of window scale according to RFC1323 */
+#define TCP_MAX_WSCALE 14U
+
 /* urg_data states */
 #define TCP_URG_VALID  0x0100
 #define TCP_URG_NOTYET 0x0200
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cf4..95be443 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2393,7 +2393,7 @@ static int tcp_repair_options_est(struct tcp_sock *tp,
u16 snd_wscale = opt.opt_val & 0xFFFF;
u16 rcv_wscale = opt.opt_val >> 16;
 
-   if (snd_wscale > 14 || rcv_wscale > 14)
+   if (snd_wscale > TCP_MAX_WSCALE || rcv_wscale > TCP_MAX_WSCALE)
return -EFBIG;
 
tp->rx_opt.snd_wscale = snd_wscale;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bb09c70..e277901 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3759,11 +3759,11 @@ void tcp_parse_options(const struct sk_buff *skb,
!estab && sysctl_tcp_window_scaling) {
__u8 snd_wscale = *(__u8 *)ptr;
opt_rx->wscale_ok = 1;
-   if (snd_wscale > 14) {
+   if (snd_wscale > TCP_MAX_WSCALE) {
net_info_ratelimited("%s: Illegal window scaling value %d >14 received\n",
 __func__,
 snd_wscale);
-   snd_wscale = 14;
+   snd_wscale = TCP_MAX_WSCALE;
}
opt_rx->snd_wscale = snd_wscale;
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 22548b5..d8f12d7 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -212,12 +212,12 @@ void tcp_select_initial_window(int __space, __u32 mss,
 
/* If no clamp set the clamp to the max possible scaled window */
if (*window_clamp == 0)
-   (*window_clamp) = (65535 << 14);
+   (*window_clamp) = (U16_MAX << TCP_MAX_WSCALE);
space = min(*window_clamp, space);
 
/* Quantize space offering to a multiple of mss if possible. */
if (space > mss)
-   space = (space / mss) * mss;
+   space = rounddown(space, mss);
 
/* NOTE: offering an initial window larger than 32767
 * will break some buggy TCP stacks. If the admin tells us
@@ -240,7 +240,7 @@ void tcp_select_initial_window(int __space, __u32 mss,
space = max_t(u32, space, sysctl_tcp_rmem[2]);
space = max_t(u32, space, sysctl_rmem_max);
space = min_t(u32, space, *window_clamp);
-   while (space > 65535 && (*rcv_wscale) < 14) {
+   while (space > U16_MAX && (*rcv_wscale) < TCP_MAX_WSCALE) {
space >>= 1;
(*rcv_wscale)++;
}
@@ -253,7 +253,7 @@ void tcp_select_initial_window(int __space, __u32 mss,
}
 
/* Set the clamp no higher than max representable value */
-   (*window_clamp) = min(65535U << (*rcv_wscale), *window_clamp);
+   (*window_clamp) = min_t(__u32, U16_MAX << (*rcv_wscale), *window_clamp);
 }
 EXPORT_SYMBOL(tcp_select_initial_window);
 
-- 
1.9.1
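The arithmetic this patch touches can be checked in isolation. The following is a small user-space sketch, not the kernel code: `U16_MAX_SKETCH` and `rounddown_u32()` are local stand-ins (assumptions) for the kernel's `U16_MAX` and `rounddown()`, and `pick_rcv_wscale()` only mirrors the shape of the `while` loop in `tcp_select_initial_window()`.

```c
#include <stdint.h>

#define TCP_MAX_WSCALE 14U     /* same value the patch introduces */
#define U16_MAX_SKETCH 65535U  /* local stand-in for the kernel's U16_MAX */

/* Round x down to a multiple of y (y > 0); equals (x / y) * y. */
static uint32_t rounddown_u32(uint32_t x, uint32_t y)
{
    return x - (x % y);
}

/* Largest window a 16-bit field can describe at the maximum scale. */
static uint32_t max_scaled_window(void)
{
    return U16_MAX_SKETCH << TCP_MAX_WSCALE;
}

/* Smallest shift that makes space fit in 16 bits, capped at the maximum. */
static unsigned int pick_rcv_wscale(uint32_t space)
{
    unsigned int wscale = 0;

    while (space > U16_MAX_SKETCH && wscale < TCP_MAX_WSCALE) {
        space >>= 1;
        wscale++;
    }
    return wscale;
}
```

For example, with `space = 10000` and `mss = 1460`, both `rounddown_u32(space, mss)` and `(space / mss) * mss` give 8760, which is why the patch can swap one form for the other.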

Re: [PATCH v2 net-next 3/6] tools/lib/bpf: expose bpf_program__set_type()

2017-03-31 Thread Alexei Starovoitov


On 3/31/17 7:29 PM, Wangnan (F) wrote:



On 2017/3/31 12:45, Alexei Starovoitov wrote:

expose bpf_program__set_type() to set program type

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
  tools/lib/bpf/libbpf.c | 3 +--
  tools/lib/bpf/libbpf.h | 2 ++
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ac6eb863b2a4..1a2c07eb7795 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1618,8 +1618,7 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n)
  return fd;
  }

-static void bpf_program__set_type(struct bpf_program *prog,
-  enum bpf_prog_type type)
+void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type)
  {
  prog->type = type;
  }


Since it becomes a public interface, we need to check if prog is a
NULL pointer and check if the value of type is okay, and let it
return an errno.


I strongly disagree with such defensive programming. It's a cause
of bugs for users of this library.
I think you're trying to mimic C++ setters/getters, but C++
never checks 'this != null', since passing null into a setter
is a bug of the user of the library and not the library.
Setters should also have a 'void' return type when they
cannot fail. That is exactly the case here.
If, in the future, we decide that libbpf shouldn't support
all bpf program types, then you'd need to change the prototype
of this function to return an error code and change all places
where this function is called to check for the error code.
It may or may not be the right approach.
For example, today the only user of the bpf_program__set*() methods
is perf/util/bpf-loader.c, and it calls bpf_program__set_kprobe() and
bpf_program__set_tracepoint() _without_ checking the return value,
which is the _correct_ thing to do. Instead, the current prototype
'int bpf_program__set_tracepoint(struct bpf_program *prog);'
is not correct and I suggest you fix it.

You also need to do other cleanup. Like in bpf_object__elf_finish()
you have:
if (!obj_elf_valid(obj))
return;

if (obj->efile.elf) {

which is redundant. It's another example where mistakes creep in
due to defensive programming.

There is another bug in bpf_object__close(), which does:
if (!obj)
return;
Again defensive programming strikes, since
you're not checking IS_ERR(obj) and that's what bpf_object__open()
returns, so most users of the library (who don't read the source
code and just use it based on the .h) will do

obj = bpf_object__open(...);
bpf_object__close(obj);

and the current 'if (!obj)' won't help and it will segfault.
I hit this issue while developing this patch set.
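To illustrate why a NULL check does not catch an error pointer, here is a minimal user-space sketch. The `sketch_*` helpers are local stand-ins modeled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention, and `toy_object` with its open/close functions is entirely hypothetical; it is not libbpf's implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (stand-in for ERR_PTR()). */
static void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True for pointers in the top SKETCH_MAX_ERRNO values (like IS_ERR()). */
static bool sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the errno from an error pointer (stand-in for PTR_ERR()). */
static long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Hypothetical object type, used only for this illustration. */
struct toy_object {
    int fd;
};

/* Never returns NULL: failures come back as encoded error pointers. */
static struct toy_object *toy_object_open(const char *path)
{
    struct toy_object *obj;

    if (!path)
        return sketch_err_ptr(-EINVAL);
    obj = calloc(1, sizeof(*obj));
    if (!obj)
        return sketch_err_ptr(-ENOMEM);
    return obj;
}

/* Assumes a valid object; an 'if (!obj)' here would not catch -EINVAL. */
static void toy_object_close(struct toy_object *obj)
{
    free(obj);
}

/* Caller-side pattern: test the open result before using or closing it. */
static int toy_open_and_close(const char *path)
{
    struct toy_object *obj = toy_object_open(path);

    if (sketch_is_err(obj))
        return (int)sketch_ptr_err(obj);
    toy_object_close(obj);
    return 0;
}
```

The failed open yields a non-NULL pointer, so a close function guarded only by `if (!obj)` would go on to dereference or free an invalid address.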


diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b30394f9947a..32c7252f734e 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include   // for size_t
+#include 
enum libbpf_errno {
  __LIBBPF_ERRNO__START = 4000,
@@ -185,6 +186,7 @@ int bpf_program__set_sched_cls(struct bpf_program *prog);
  int bpf_program__set_sched_act(struct bpf_program *prog);
  int bpf_program__set_xdp(struct bpf_program *prog);
  int bpf_program__set_perf_event(struct bpf_program *prog);
+void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);



The above bpf_program__set_xxx functions become redundant. They should be
generated using a macro as static inline functions.


  bool bpf_program__is_socket_filter(struct bpf_program *prog);
  bool bpf_program__is_tracepoint(struct bpf_program *prog);


bpf_program__is_xxx should be updated like bpf_program__set_xxx, since
enum bpf_prog_type is not a problem now.


All of these suggestions are cleanups for your code that you
need to do yourself. I actually suggest you kill all bpf_program__is*()
and all but one bpf_program__set*() functions.
The current user perf/util/bpf-loader.c should be converted
to use bpf_program__set_type() with the _void_ return type that
I'm introducing here.
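The single-setter design argued for above can be sketched as follows. This is only an illustration under stated assumptions: `toy_prog_type`, `toy_program`, and `toy_configure()` are hypothetical names, the enum values are not the kernel's, and the real function operates on libbpf's own `struct bpf_program`.

```c
#include <stdbool.h>

/* Hypothetical program types; values are illustrative, not the kernel's. */
enum toy_prog_type {
    TOY_PROG_TYPE_UNSPEC,
    TOY_PROG_TYPE_KPROBE,
    TOY_PROG_TYPE_TRACEPOINT,
    TOY_PROG_TYPE_XDP,
};

struct toy_program {
    enum toy_prog_type type;
};

/*
 * One generic setter. It cannot fail, so it returns void, and passing a
 * NULL prog is treated as a caller bug rather than a checked condition.
 */
static void toy_program__set_type(struct toy_program *prog,
                                  enum toy_prog_type type)
{
    prog->type = type;
}

/* A caller choosing between two types, with no return value to ignore. */
static void toy_configure(struct toy_program *prog, bool is_tracepoint)
{
    toy_program__set_type(prog, is_tracepoint ? TOY_PROG_TYPE_TRACEPOINT
                                              : TOY_PROG_TYPE_KPROBE);
}
```

With one setter taking the enum, adding a new program type needs no new function, and callers have no error path that they would otherwise silently skip.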

Overall, I think, tools/lib/bpf/ is a nice library and it can be used
by many projects, but I suggest you stop making excuses based on
your proprietary usage of it.

Also please cc me in the future on changes to the library. It still
has my copyrights, though a lot has changed since the last time
I looked at it, and it's my fault for not paying attention earlier.

[PATCH v2] ip: Add support for netdev events to monitor

2017-03-31 Thread Vladislav Yasevich

Add IFLA_EVENT handling so that event types can be viewed with the
'monitor' command.  This gives a little more information about why
a given message was received.

V2: Adds all events recently proposed.  This way all currently supported
 events can be viewed.

Signed-off-by: Vladislav Yasevich 
---
 include/linux/if_link.h | 19 +++
 ip/ipaddress.c  | 29 +
 2 files changed, 48 insertions(+)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index b0bdbd6..6cc0b36 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
IFLA_XDP,
+   IFLA_EVENT,
__IFLA_MAX
 };
 
@@ -890,4 +891,22 @@ enum {
 
 #define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
 
+enum {
+   IFLA_EVENT_UNSPEC,
+   IFLA_EVENT_REBOOT,
+   IFLA_EVENT_CHANGE_MTU,
+   IFLA_EVENT_CHANGE_NAME,
+   IFLA_EVENT_FEAT_CHANGE,
+   IFLA_EVENT_BONDING_FAILOVER,
+   IFLA_EVENT_NOTIFY_PEERS,
+   IFLA_EVENT_CHANGE_UPPER,
+   IFLA_EVENT_RESEND_IGMP,
+   IFLA_EVENT_PRE_CHANGE_MTU,
+   IFLA_EVENT_CHANGE_INFO_DATA,
+   IFLA_EVENT_PRE_CHANGE_UPPER,
+   IFLA_EVENT_CHANGE_LOWER_STATE,
+   IFLA_EVENT_UDP_TUNNEL_PUSH_INFO,
+   IFLA_EVENT_CHANGE_TX_QUEUE_LEN,
+};
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index b8d9c7d..dfe93f3 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -753,6 +753,32 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
return 0;
 }
 
+static const char *netdev_events[] = {"UNKNOWN",
+ "REBOOT",
+ "CHANGE_MTU",
+ "CHANGE_NAME",
+ "FEATURE_CHANGE",
+ "BONDING_FAILOVER",
+ "NOTIFY_PEERS",
+ "CHANGE_UPPER",
+ "RESEND_IGMP",
+ "PRE_CHANGE_MTU",
+ "CHANGE_INFO_DATA",
+ "PRE_CHANGE_UPPER",
+ "CHANGE_LOWER_STATE",
+ "UDP_TUNNEL_PUSH_INFO",
+ "CHANGE_TXQUEUE_LEN"};
+
+static void print_dev_event(FILE *f, __u32 event)
+{
+   if (event >= ARRAY_SIZE(netdev_events))
+   fprintf(f, "event %d ", event);
+   else {
+   if (event)
+   fprintf(f, "event %s ", netdev_events[event]);
+   }
+}
+
 int print_linkinfo(const struct sockaddr_nl *who,
   struct nlmsghdr *n, void *arg)
 {
@@ -858,6 +884,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
if (filter.showqueue)
print_queuelen(fp, tb);
 
+   if (tb[IFLA_EVENT])
+   print_dev_event(fp, rta_getattr_u32(tb[IFLA_EVENT]));
+
if (!filter.family || filter.family == AF_PACKET || show_details) {
SPRINT_BUF(b1);
fprintf(fp, "%s", _SL_);
-- 
2.7.4
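The bounds-checked table lookup used by print_dev_event() in this patch can be tried outside iproute2. The sketch below is an assumption-laden stand-in: `format_dev_event()` and the shortened `event_names` table are hypothetical, and it writes into a buffer instead of a `FILE *` so the result is easy to inspect.

```c
#include <stddef.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Shortened, illustrative table; index 0 is the "no event" placeholder. */
static const char *const event_names[] = {
    "UNKNOWN",
    "REBOOT",
    "CHANGE_MTU",
};

/*
 * Format an event code into buf and return the number of characters that
 * snprintf() reports. Codes beyond the table fall back to the raw number,
 * and code 0 produces an empty string.
 */
static int format_dev_event(char *buf, size_t len, unsigned int event)
{
    if (event >= ARRAY_SIZE(event_names))
        return snprintf(buf, len, "event %u ", event);
    if (event == 0) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len, "event %s ", event_names[event]);
}
```

Falling back to the numeric code keeps the tool usable when the kernel reports an event that is newer than the name table.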

Re: [PATCH v2 net-next 3/6] tools/lib/bpf: expose bpf_program__set_type()

2017-03-31 Thread Wangnan (F)




On 2017/3/31 12:45, Alexei Starovoitov wrote:

expose bpf_program__set_type() to set program type

Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
  tools/lib/bpf/libbpf.c | 3 +--
  tools/lib/bpf/libbpf.h | 2 ++
  2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ac6eb863b2a4..1a2c07eb7795 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1618,8 +1618,7 @@ int bpf_program__nth_fd(struct bpf_program *prog, int n)
return fd;
  }
  
-static void bpf_program__set_type(struct bpf_program *prog,

- enum bpf_prog_type type)
+void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type)
  {
prog->type = type;
  }


Since it becomes a public interface, we need to check if prog is a
NULL pointer and check if the value of type is okay, and let it
return an errno.


diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b30394f9947a..32c7252f734e 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -25,6 +25,7 @@
  #include 
  #include 
  #include   // for size_t
+#include 
  
  enum libbpf_errno {

__LIBBPF_ERRNO__START = 4000,
@@ -185,6 +186,7 @@ int bpf_program__set_sched_cls(struct bpf_program *prog);
  int bpf_program__set_sched_act(struct bpf_program *prog);
  int bpf_program__set_xdp(struct bpf_program *prog);
  int bpf_program__set_perf_event(struct bpf_program *prog);
+void bpf_program__set_type(struct bpf_program *prog, enum bpf_prog_type type);
  


The above bpf_program__set_xxx functions become redundant. They should be
generated using a macro as static inline functions.


  bool bpf_program__is_socket_filter(struct bpf_program *prog);
  bool bpf_program__is_tracepoint(struct bpf_program *prog);


bpf_program__is_xxx should be updated like bpf_program__set_xxx, since
enum bpf_prog_type is not a problem now.

Thank you.

[PATCH V2 net-next 2/2] rtnetlink: Add support for netdev event to link messages

2017-03-31 Thread Vladislav Yasevich

When netdev events happen, the rtnetlink_event() handler will send
messages for every event in its white list.  These messages contain
current information about a particular device, but they do not include
information about which event just happened.  The consumer of
the message has to try to infer this information.  In some cases
(e.g. NETDEV_NOTIFY_PEERS), that is not possible.

This patch adds a new extension to the RTM_NEWLINK message called IFLA_EVENT
that encodes which event triggered this
message.  This allows the message consumer to easily determine
whether it is interested in a particular event or not.

Signed-off-by: Vladislav Yasevich 
---
 include/linux/rtnetlink.h|  3 +-
 include/uapi/linux/if_link.h | 19 ++
 net/core/dev.c   |  2 +-
 net/core/rtnetlink.c | 86 +++-
 4 files changed, 99 insertions(+), 11 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 57e5484..0459018 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -18,7 +18,8 @@ extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct 
dst_entry *dst,
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change, gfp_t 
flags);
 struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
-  unsigned change, gfp_t flags);
+  unsigned change, unsigned long event,
+  gfp_t flags);
 void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
   gfp_t flags);
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 320fc1e..8eaada5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
IFLA_XDP,
+   IFLA_EVENT,
__IFLA_MAX
 };
 
@@ -892,4 +893,22 @@ enum {
 
 #define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
 
+enum {
+   IFLA_EVENT_UNSPEC,
+   IFLA_EVENT_REBOOT,
+   IFLA_EVENT_CHANGE_MTU,
+   IFLA_EVENT_CHANGE_NAME,
+   IFLA_EVENT_FEAT_CHANGE,
+   IFLA_EVENT_BONDING_FAILOVER,
+   IFLA_EVENT_NOTIFY_PEERS,
+   IFLA_EVENT_CHANGE_UPPER,
+   IFLA_EVENT_RESEND_IGMP,
+   IFLA_EVENT_PRE_CHANGE_MTU,
+   IFLA_EVENT_CHANGE_INFO_DATA,
+   IFLA_EVENT_PRE_CHANGE_UPPER,
+   IFLA_EVENT_CHANGE_LOWER_STATE,
+   IFLA_EVENT_UDP_TUNNEL_PUSH_INFO,
+   IFLA_EVENT_CHANGE_TX_QUEUE_LEN,
+};
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index ef9fe60e..7efb417 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6840,7 +6840,7 @@ static void rollback_registered_many(struct list_head 
*head)
 
if (!dev->rtnl_link_ops ||
dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
-   skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U,
+   skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U, 0,
 GFP_KERNEL);
 
/*
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f48a60d..956729c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -944,6 +944,7 @@ static noinline size_t if_nlmsg_size(const struct 
net_device *dev,
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
   + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
   + rtnl_xdp_size(dev) /* IFLA_XDP */
+  + nla_total_size(4)  /* IFLA_EVENT */
   + nla_total_size(1); /* IFLA_PROTO_DOWN */
 
 }
@@ -1276,9 +1277,64 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct 
net_device *dev)
return err;
 }
 
+static int rtnl_fill_link_event(struct sk_buff *skb, unsigned long event)
+{
+   u32 rtnl_event;
+
+   switch (event) {
+   case NETDEV_REBOOT:
+   rtnl_event = IFLA_EVENT_REBOOT;
+   break;
+   case NETDEV_CHANGEMTU:
+   rtnl_event = IFLA_EVENT_CHANGE_MTU;
+   break;
+   case NETDEV_CHANGENAME:
+   rtnl_event = IFLA_EVENT_CHANGE_NAME;
+   break;
+   case NETDEV_FEAT_CHANGE:
+   rtnl_event = IFLA_EVENT_FEAT_CHANGE;
+   break;
+   case NETDEV_BONDING_FAILOVER:
+   rtnl_event = IFLA_EVENT_BONDING_FAILOVER;
+   break;
+   case NETDEV_NOTIFY_PEERS:
+   rtnl_event = IFLA_EVENT_NOTIFY_PEERS;
+   break;
+   case NETDEV_CHANGEUPPER:
+   rtnl_event = IFLA_EVENT_CHANGE_UPPER;
+   break;
+   case NETDEV_RESEND_IGMP:
+   rtnl_event = IFLA_EVENT_RESEND_IGMP;
+   break;
+   case NETDEV_PRECHANGEMTU:
+   rtnl_event = IFLA_EVENT_PRE_CHANGE_MTU;
+   break;
+   case NETDEV_CHANGEINFODATA:
+

[PATCH net-next 0/2] rtnetlink: Updates to rtnetlink_event()

2017-03-31 Thread Vladislav Yasevich

This series came out of the conversation that started as a result of
my first attempt to add netdevice event info to netlink messages.

This series converts event processing to a 'white list', where
we explicitly permit events to generate netlink messages.  This
is meant to make people take a closer look and determine whether
these events should really trigger netlink messages.

I am also adding a V2 of my patch to add the event type to the netlink
message.  This version supports all events that we currently generate.

I will also update my patch to iproute that will show this data
through 'ip monitor'.

I actually need the ability to trap the NETDEV_NOTIFY_PEERS event
(as well as possibly NETDEV_RESEND_IGMP) to support handling of
macvtap on top of bonding.  I hope others will also find this info useful.

Vladislav Yasevich (2):
  rtnetlink: Convert rtnetlink_event to white list
  rtnl: Add support for netdev event to link messages

 include/linux/rtnetlink.h|   3 +-
 include/uapi/linux/if_link.h |  19 
 net/core/dev.c   |   2 +-
 net/core/rtnetlink.c | 113 ++-
 4 files changed, 113 insertions(+), 24 deletions(-)

-- 
2.7.4

[PATCH net-next 1/2] rtnetlink: Convert rtnetlink_event to white list

2017-03-31 Thread Vladislav Yasevich

The rtnetlink_event currently functions as a blacklist where
we block certain netdev events from being sent to user space.
As a result, events have been added to the system that userspace
probably doesn't care about.

This patch converts the implementation to a white list so that
new events have to be specifically added to the list to
be sent to userspace.  This forces new event implementers to
consider whether a given event is useful to user space or if it's
just a kernel event.

Signed-off-by: Vladislav Yasevich 
---
 net/core/rtnetlink.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9c3947a..f48a60d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4116,22 +4116,23 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 
switch (event) {
-   case NETDEV_UP:
-   case NETDEV_DOWN:
-   case NETDEV_PRE_UP:
-   case NETDEV_POST_INIT:
-   case NETDEV_REGISTER:
-   case NETDEV_CHANGE:
-   case NETDEV_PRE_TYPE_CHANGE:
-   case NETDEV_GOING_DOWN:
-   case NETDEV_UNREGISTER:
-   case NETDEV_UNREGISTER_FINAL:
-   case NETDEV_RELEASE:
-   case NETDEV_JOIN:
-   case NETDEV_BONDING_INFO:
+   case NETDEV_REBOOT:
+   case NETDEV_CHANGEMTU:
+   case NETDEV_CHANGENAME:
+   case NETDEV_FEAT_CHANGE:
+   case NETDEV_BONDING_FAILOVER:
+   case NETDEV_NOTIFY_PEERS:
+   case NETDEV_CHANGEUPPER:
+   case NETDEV_RESEND_IGMP:
+   case NETDEV_PRECHANGEMTU:
+   case NETDEV_CHANGEINFODATA:
+   case NETDEV_PRECHANGEUPPER:
+   case NETDEV_CHANGELOWERSTATE:
+   case NETDEV_UDP_TUNNEL_PUSH_INFO:
+   case NETDEV_CHANGE_TX_QUEUE_LEN:
+   rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
break;
default:
-   rtmsg_ifinfo(RTM_NEWLINK, dev, 0, GFP_KERNEL);
break;
}
return NOTIFY_DONE;
-- 
2.7.4
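The behavioral difference between the two switch styles shows up only when a new event is added. A minimal sketch, assuming a hypothetical `toy_netdev_event` enum (not the kernel's NETDEV_* values) and two predicate functions that stand in for the decision to call rtmsg_ifinfo():

```c
#include <stdbool.h>

/* Hypothetical event codes; TOY_NETDEV_NEW_INTERNAL plays a later addition. */
enum toy_netdev_event {
    TOY_NETDEV_UP,
    TOY_NETDEV_DOWN,
    TOY_NETDEV_CHANGEMTU,
    TOY_NETDEV_NOTIFY_PEERS,
    TOY_NETDEV_NEW_INTERNAL,
};

/* Blacklist style: anything not explicitly blocked reaches user space. */
static bool blacklist_forwards(enum toy_netdev_event ev)
{
    switch (ev) {
    case TOY_NETDEV_UP:
    case TOY_NETDEV_DOWN:
        return false;
    default:
        return true;
    }
}

/* White list style: only explicitly listed events reach user space. */
static bool whitelist_forwards(enum toy_netdev_event ev)
{
    switch (ev) {
    case TOY_NETDEV_CHANGEMTU:
    case TOY_NETDEV_NOTIFY_PEERS:
        return true;
    default:
        return false;
    }
}
```

Both functions agree on the events that existed when they were written; they differ on `TOY_NETDEV_NEW_INTERNAL`, which the blacklist forwards by default and the white list drops until someone adds it deliberately.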

[PATCH] ss: replace all zero characters in a unix name to '@'

2017-03-31 Thread Andrei Vagin

From: Andrei Vagin 

A name of an abstract socket can contain zero characters.
Now we replace only the first character. If a name contains more
than one zero character, the ss tool shows only a part of the name:
u_str  UNCONN00 @1931097   * 0

the output with this patch:
u_str  UNCONN00 @@zdtm-./sk-unix-unconn-23/@ 1931097   * 0

Signed-off-by: Andrei Vagin 
---
 misc/ss.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 5cda728..a3200a1 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2726,10 +2726,24 @@ static int unix_show_sock(const struct sockaddr_nl 
*addr, struct nlmsghdr *nlh,
if (tb[UNIX_DIAG_NAME]) {
int len = RTA_PAYLOAD(tb[UNIX_DIAG_NAME]);
 
+   if (len > sizeof(name) - 1)
+   len = sizeof(name) - 1;
+
memcpy(name, RTA_DATA(tb[UNIX_DIAG_NAME]), len);
name[len] = '\0';
-   if (name[0] == '\0')
+   if (name[0] == '\0') {
+   char *n;
+
name[0] = '@';
+
+   n = name + 1;
+   while (n && n < name + len) {
+   n = memchr(n, 0, name + len - n);
+   if (n == NULL)
+   break;
+   *n = '@';
+   }
+   }
stat.name = &name[0];
memcpy(stat.local.data, &stat.name, sizeof(stat.name));
}
-- 
2.7.4
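The replacement loop in this patch can be exercised on its own. This is a minimal sketch rather than the ss code: `abstract_name_to_printable()` is a hypothetical helper, and it assumes the caller has already copied `len` bytes of the socket name into a buffer with room for one more byte.

```c
#include <stddef.h>
#include <string.h>

/*
 * If buf holds an abstract socket name (first byte is NUL), replace every
 * NUL in the first len bytes with '@'. Always terminates the string at
 * buf[len], so buf must hold at least len + 1 bytes.
 * Returns the number of bytes replaced.
 */
static size_t abstract_name_to_printable(char *buf, size_t len)
{
    size_t replaced = 0;
    char *n = buf;
    char *end = buf + len;

    buf[len] = '\0';
    if (len == 0 || buf[0] != '\0')
        return 0;  /* not an abstract name: leave it untouched */

    /* memchr() is length-bounded, so embedded NULs do not stop the scan. */
    while (n < end && (n = memchr(n, 0, (size_t)(end - n))) != NULL) {
        *n++ = '@';
        replaced++;
    }
    return replaced;
}
```

Using memchr() with an explicit remaining length is what lets the scan continue past each embedded zero byte, which string functions such as strchr() would treat as the end of the name.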

[PATCH] net: hns: fix boolreturn.cocci warnings

2017-03-31 Thread kbuild test robot

drivers/net/ethernet/hisilicon/hns/hns_enet.c:1548:8-9: WARNING: return of 0/1 
in function 'hns_enable_serdes_lb' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

CC: lipeng 
Signed-off-by: Fengguang Wu 
---

 hns_enet.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1545,7 +1545,7 @@ static bool hns_enable_serdes_lb(struct
/* wait h/w ready */
mdelay(300);
 
-   return 0;
+   return false;
 }
 
 static void hns_disable_serdes_lb(struct net_device *ndev)

Re: [PATCH net 08/19] net: hns: Fix to adjust buf_size of ring according to mtu

2017-03-31 Thread kbuild test robot

Hi lipeng,

[auto build test WARNING on net/master]

url:
https://github.com/0day-ci/linux/commits/Salil-Mehta/net-hns-Misc-HNS-Bug-Fixes-Code-Improvements/20170401-060153


coccinelle warnings: (new ones prefixed by >>)

>> drivers/net/ethernet/hisilicon/hns/hns_enet.c:1548:8-9: WARNING: return of 
>> 0/1 in function 'hns_enable_serdes_lb' with return type bool

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation

[PATCH] net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout

2017-03-31 Thread Grygorii Strashko

If the TX watchdog fires, some or all netdev TX queues will be stopped.
As part of recovery it is required not only to drain and reinitialize
the CPSW TX channels, but also to wake up the stopped TX queues. That
does not happen now, so the netdevice stops transmitting data until it
is reopened.

Hence, add netif_tx_wake_all_queues() call in .ndo_tx_timeout() to complete
recovery and restore TX path.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 86d6f10..71fd4ef 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1819,6 +1819,8 @@ static void cpsw_ndo_tx_timeout(struct net_device *ndev)
}
 
cpsw_intr_enable(cpsw);
+   netif_trans_update(ndev);
+   netif_tx_wake_all_queues(ndev);
 }
 
 static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
-- 
2.10.1

Re: [PATCH net-next] sock: correctly test SOCK_TIMESTAMP in sock_recv_ts_and_drops()

2017-03-31 Thread Paolo Abeni

On Fri, 2017-03-31 at 14:59 -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> It seems the code does not match the intent.
> 
> This broke packetdrill, and probably other programs.
> 
> Fixes: 6c7c98bad488 ("sock: avoid dirtying sk_stamp, if possible")
> Signed-off-by: Eric Dumazet 
> Cc: Paolo Abeni 
> ---
>  include/net/sock.h |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 
> 8e53158a7d957ea2a480cc449606dca2480b1259..66349e49d468646ce724485bb8e74952825f0d6c
>  100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2250,7 +2250,7 @@ static inline void sock_recv_ts_and_drops(struct msghdr 
> *msg, struct sock *sk,
>  
>   if (sk->sk_flags & FLAGS_TS_OR_DROPS || sk->sk_tsflags & TSFLAGS_ANY)
>   __sock_recv_ts_and_drops(msg, sk, skb);
> - else if (unlikely(sk->sk_flags & SOCK_TIMESTAMP))
> + else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
>   sk->sk_stamp = skb->tstamp;
>   else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP))
>   sk->sk_stamp = 0;
> 

Oh, my bad! 

Thanks Eric for fixing this.

Acked-by: Paolo Abeni

Re: [PATCH v2 net-next 3/6] tools/lib/bpf: expose bpf_program__set_type()

2017-03-31 Thread Alexei Starovoitov


On 3/31/17 12:49 AM, Wangnan (F) wrote:

Hi Alexei,

Please see the patch I sent. Since we export bpf_program__set_type(),
bpf_program__set_xxx() should be built based on it.


Replied to your patch. I still think simply adding
#include <linux/bpf.h> here like I did in this patch is much cleaner.
That's the whole purpose of uapi header to be used in such libraries
and apps. Copy-pasting enum bpf_prog_type from bpf.h into bunch of
macros in libbpf.h just doesn't look right and likely an indication
that whatever you do with your proprietary stuff is fragile.
Clearly you're trying to avoid including bpf.h not because of perf,
which has its own copy of bpf.h in tools/include/
Just like iproute2 has its copy of bpf.h as well.

Re: Issue with load across multiple connections

2017-03-31 Thread Eric Dumazet

On Fri, 2017-03-31 at 12:51 -0700, Tom Herbert wrote:
> Hi netdev,
> 
> I'm forwarding this problem report from Ulrich Speidel.
> 
> The correlation to FDs would seem to point to application always
> reading for FDs in order and it seems unlikely that FDs have any
> relevance in the networking stack below the file IO for sockets. Some
> of the stack experts might have a better idea though...
> 
> Thanks,
> Tom
> 

TCP stack has no fairness guarantee, both at sender side and receive
side.

This smells like some memory tuning to me. Some flows, depending on
their start time, can grab big receive/send windows, and others might
hit global memory pressure and fallback to ridiculous windows.

Please provide, on server and client :

cat /proc/sys/net/ipv4/tcp_rmem
cat /proc/sys/net/ipv4/tcp_wmem
cat /proc/sys/net/ipv4/tcp_mem

and maybe nstat output

nstat -n >/dev/null ; < run experiment > ; nstat


But I guess this is really a receiver problem, with too small amount of
memory.
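
The three sysctl files requested above each contain three whitespace-separated numbers (min, default and max in bytes for tcp_rmem and tcp_wmem; low, pressure and high in pages for tcp_mem). As a rough sketch of how a test harness might collect them alongside its throughput numbers, something like the following could be used. The function names are hypothetical and the error handling is deliberately minimal.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse one "a b c" line as found in
 * /proc/sys/net/ipv4/tcp_rmem, tcp_wmem or tcp_mem.
 * Returns 0 on success, -1 if three numbers could not be read.
 */
static int parse_mem_triplet(const char *line, long out[3])
{
	if (line == NULL)
		return -1;
	if (sscanf(line, "%ld %ld %ld", &out[0], &out[1], &out[2]) != 3)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: read the first line of path and parse it.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_mem_triplet(const char *path, long out[3])
{
	char buf[128];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) != NULL)
		ret = parse_mem_triplet(buf, out);
	fclose(f);
	return ret;
}
```

Calling read_mem_triplet("/proc/sys/net/ipv4/tcp_rmem", v) on both client and server would give the values to compare against the number of concurrent connections.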


> -- Forwarded message --
> From: Ulrich Speidel 
> Date: Fri, Mar 31, 2017 at 2:11 AM
> Subject: Linux kernel query
> To: t...@quantonium.net
> Cc: Brian Carpenter , Nevil Brownlee
> , l...@steinwurf.com, Lei Qian
> 
> 
> 
> Dear Tom,
> 
> I'm a colleague of Brian Carpenter at the University of Auckland. He
> has suggested that I contact you about this as I'm not sure that what
> we have discovered is a bug - it may even be an intended feature but
> I've failed to find it documented anywhere. From all we can tell, the
> problem seems related to how socket file descriptor numbers & SKBs are
> handled in POSIX-compliant kernels. I'm not a kernel hack so apologise
> in advance if terminology isn't always spot-on.
> 
> This is how we triggered the effect: We have a setup in which we have
> multiple physical network clients connect to multiple servers at
> random. On the client side, we create N "channels" (indexed, say 0 to
> N-1) on each physical client. Each channel executes the following
> task:
> 
> 1) create a fresh TCP socket
> 2) connect to a randomly chosen server from our pool
> 3) receive a quantity of data that the server sends (this may be
> somewhere between 0 bytes and hundreds of MB). In our case, we use the
> application merely as a network traffic generator, so the receive
> process consists of recording the number of bytes made available by
> the socket and freeing the buffer without ever actually reading it.
> 4) wait for server disconnect
> 5) free socket (i.e., we're not explicitly re-using the previous
> connection's socket)
> 6) jump back to 1)
> 
> We keep track of the throughput on each channel.
> 
> Note that the effect is the same regardless of whether we implement
> each channel in a process of its own, in a threaded application, or
> whether we use non-blocking sockets and check on them in a loop.
> 
> What we would normally expect is that the each channel would receive
> about the same goodput over time, regardless of the value of N. Note
> that each channel uses a succession of fresh sockets.
> 
> What actually happens is this: For up to approximately N=20 channels
> on a single physical client (we've tried Raspbian and Debian, as well
> as Ubuntu), each channel sees on average substantial and comparable
> levels of throughput, adding up to values approaching network
> interface capacity. Once we push N beyond 20, the throughput on any
> further channels drops to zero very quickly. For N=30, we typically
> see at least half a dozen channels with no throughput at all beyond
> the connection handshake. Throughput on the first 20 or so channels
> remains pretty much unchanged. The sockets on the channels with low or
> no throughput all manage to connect, but remain in connected state but
> receive no data.
> 
> Throughput on the first ~20 channels is sustainable for long periods
> of time - so we're not dealing with an intermittent bug that causes
> our sockets to stall: The affected sockets / channels never receive
> anything (and the sockets around the 20-or-so mark very little). So it
> seems that subsequent sockets on a channel inherit the ability of
> their predecessor to receive data at quantity.
> 
> We also see the issue on a single physical Raspberry client having the
> sole use of 14 Super Micros on GbE interfaces to download from. So we
> know we're definitely not overloading the server side (note that we
> are able to saturate the network to the Pi). Here is some sample data
> from the Pi (my apologies for the rough format):
> 
> Channel index/MB transferred/Number of connections completed+attempted
> 0 2.37 144
> 1 29.32 92
> 2 2.71 132
> 3 10.88 705
> 4 11.90 513
> 5 16.045990 571
> 6 9.631539 598
> 7 15.420138 362
> 8 9.854378 106
> 9 8.975264 315
> 10 8.020266 526
> 11 6.369107 582
> 12 8.877760 277
> 13 8.148640 406
> 14 13.536793 301
> 15 9.804712 55
> 16 7.643378 292
> 17 7.970028

Re: [PATCH net-next 1/1 v2] net: rmnet_data: Initial implementation

2017-03-31 Thread Dan Williams

On Fri, 2017-03-24 at 18:49 -0600, Subash Abhinov Kasiviswanathan
wrote:
> > (re-sending from an address that's actually subscribed to
> > netdev@...)
> > 
> > The first thing that jumps out at me is "why isn't this using
> > rtnl_link_ops?"
> > 
> > To me (and perhaps I'm completely wrong) the structure here is a
> > lot
> > like VLAN interfaces.  You have a base device (whether that's a
> > netdev
> > or not) and you essentially "ip link add link cdc-wdm0 name rmnet0
> > type
> > rmnet id 5".  Does the aggregation happen only on the downlink (eg,
> > device -> host) or can the host send aggregated packets too?
> 
> Hi Dan
> 
> Yes, you are correct. We associate this driver with a physical
> device 
> and
> then create rmnet devices over it as needed for multiplexing.

Yeah, seems quite a bit like VLAN (from a workflow perspective, not
quite as much from a protocol one) and I think the same workflow could
work for this too.  Would be nice to eventually get qmi_wwan onto the
same base, if possible (though we'd need to preserve the 802.3
capability somehow for devices that don't support raw-ip).

> Aggregation is supported both on downlink and uplink by Qualcomm
> Technologies, Inc. modem hardware. This initial patchset implements
> only
> downlink aggregation since uplink aggregation is rarely enabled or
> used.
> I'll send a separate patchset for that.

Does the aggregation happen at the level of the raw device, or at the
level of the MUX channels?  eg, can I aggregate packets from multiple
MUX channels into the same request, especially on USB devices?

> > Using rtnl_link_ops would get rid of ASSOC_NET_DEV,
> > UNASSOC_NET_DEV,
> > NEW_VND, NEW_VND_WITH_PREFIX, and FREE_VND.  GET_NET_DEV_ASSOC goes
> > away because you use normal 'kobject' associations and you can
> > derive
> > the rmnet parent through sysfs links.  rmnet_nl_msg_s goes away,
> > because you can use nla_policy.
> > 
> > Just a thought; there seems to be a ton of overlap with
> > rtnl_link_ops
> > in the control plane here.
> 
> As of now, we have been using a custom netlink userspace tool for
> configuring the interfaces by listening to RMNET_NETLINK_PROTO
> events.
> Does that mean that configuration needs to be moved to ip tool by
> adding a new option for rmnet (like how you have mentioned above)
> and add additional options for end point configuration as well?

It doesn't necessarily mean that configuration would need to move to
the IP tool.  I just used it as an example of how VLAN works and how
rmnet could work as well, quite easily with the ip tool.

Since the ip tool is based on netlink, both it and your userspace
library could use the same netlink attributes and families to do the
same thing.

Essentially, I am recommending that instead of your current custom
netlink commands, port them over to rtnetlink which will mean less code
for you, and a more standard kernel interface for everyone.

> > Any thoughts on how this plays with net namespaces?
> 
> We have not tried it with different net namespaces since we never had
> such use cases internally. We can look into this.

One use-case is to put different packet data contexts into different
namespaces.  You could then isolate different EPS/PDP contexts by
putting them into different network namespaces, and for example have
your IMS handler only be able to access its own EPS/PDP context.

We could already do this with qmi_wwan on devices that provide multiple
USB endpoints for QMI/rmnet, but I thought the point of the MUX
protocol was to allow a single endpoint for rmnet that can MUX multiple
packet data contexts.  So it would be nice to allow each rmnet netdev
to be placed into a different network namespace.

> > Also, I'm not sure if it make sense to provide first class
> > tracepoints
> > for a specific driver, as it's not clear if they are userspace API
> > or
> > not and thus may need to be kept stable.   Or are perf probes
> > enough
> > instead?
> 
> We have some tracepoints which are in datapath and some for specific
> events such as device unregistration and configuration events 
> association.
> Most of these devices are on ARM64 and need to support older kernels,
> (from 3.10) so we had to rely on tracepoints. I believe ARM64 got
> trace
> point support somewhat recently. I was not aware of the restriction
> of
> tracepoints, so I can remove that.

Yeah, best to remove that for now, you can propose to add them back
later and see what people say.

> > 
> > What's RMNET_EPMODE_BRIDGE and how is it used?
> 
> RMNET_EPMODE_BRIDGE is for bridging two different physical
> interfaces.
> An example is sending raw bytes from hardware to USB - this can be
> used
> if a PC connected by USB would require the data in the MAP format.

Like a usb gadget rmnet interface for debugging?

Dan

Re: [PATCH v2] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-31 Thread Masami Hiramatsu

On Fri, 31 Mar 2017 10:08:39 -0400
Steven Rostedt  wrote:

> On Fri, 31 Mar 2017 15:20:24 +0200
> Alban Crequy  wrote:
> 
> > When a kretprobe is installed on a kernel function, there is a maximum
> > limit of how many calls in parallel it can catch (aka "maxactive"). A
> > kernel module could call register_kretprobe() and initialize maxactive
> > (see example in samples/kprobes/kretprobe_example.c).
> > 
> > But that is not exposed to userspace and it is currently not possible to
> > choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
> > 
> > The default maxactive can be as low as 1 on single-core with a
> > non-preemptive kernel. This is too low and we need to increase it not
> > only for recursive functions, but for functions that sleep or resched.
> > 
> > This patch updates the format of the command that can be written to
> > kprobe_events so that maxactive can be optionally specified.
> > 
> > I need this for a bpf program attached to the kretprobe of
> > inet_csk_accept, which can sleep for a long time.
> > 
> > This patch includes a basic selftest:
> > 
> > > # ./ftracetest -v  test.d/kprobe/
> > > === Ftrace unit tests ===
> > > [1] Kprobe dynamic event - adding and removing[PASS]
> > > [2] Kprobe dynamic event - busy event check   [PASS]
> > > [3] Kprobe dynamic event with arguments   [PASS]
> > > [4] Kprobes event arguments with types[PASS]
> > > [5] Kprobe dynamic event with function tracer [PASS]
> > > [6] Kretprobe dynamic event with arguments[PASS]
> > > [7] Kretprobe dynamic event with maxactive[PASS]
> > >
> > > # of passed:  7
> > > # of failed:  0
> > > # of unresolved:  0
> > > # of untested:  0
> > > # of unsupported:  0
> > > # of xfailed:  0
> > > # of undefined(test bug):  0  
> > 
> > BugLink: https://github.com/iovisor/bcc/issues/1072
> > Signed-off-by: Alban Crequy 
> > 
> > ---
> > 
> > Changes since v1:
> > - Remove "(*)" from documentation. (Review from Masami Hiramatsu)
> > - Fix support for "r100" without the event name (Review from Masami 
> > Hiramatsu)
> > - Get rid of magic numbers within the code.  (Review from Steven Rostedt)
> >   Note that I didn't use KRETPROBE_MAXACTIVE_ALLOC since that patch is not
> >   merged.
> > - Return -E2BIG when maxactive is too big.
> > - Add basic selftest
> > ---
> >  Documentation/trace/kprobetrace.txt|  4 ++-
> >  kernel/trace/trace_kprobe.c| 39 
> > ++
> >  .../ftrace/test.d/kprobe/kretprobe_maxactive.tc| 39 
> > ++
> >  3 files changed, 75 insertions(+), 7 deletions(-)
> >  create mode 100644 
> > tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
> > 
> > diff --git a/Documentation/trace/kprobetrace.txt 
> > b/Documentation/trace/kprobetrace.txt
> > index 41ef9d8..7051a20 100644
> > --- a/Documentation/trace/kprobetrace.txt
> > +++ b/Documentation/trace/kprobetrace.txt
> > @@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
> >  Synopsis of kprobe_events
> >  -
> >p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
> > -  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]: Set a return 
> > probe
> > +  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return 
> > probe
> >-:[GRP/]EVENT: Clear a probe
> >  
> >   GRP   : Group name. If omitted, use "kprobes" for it.
> > @@ -32,6 +32,8 @@ Synopsis of kprobe_events
> >   MOD   : Module name which has given SYM.
> >   SYM[+offs]: Symbol+offset where the probe is inserted.
> >   MEMADDR   : Address where the probe is inserted.
> > + MAXACTIVE : Maximum number of instances of the specified function that
> > + can be probed simultaneously, or 0 for the default.
> 
> BTW, to me, 0 means none (no instances can probe). This should have a
> better description of what "0" actually means.

The default value is defined in Documentation/kprobes.txt, section 1.3.1,
so you just need to refer to that.

Thank you,

> 
> -- Steve
> 
> 
> >  
> >   FETCHARGS : Arguments. Each probe can have up to 128 args.
> >%REG : Fetch register REG


-- 
Masami Hiramatsu
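
To illustrate the proposed r[MAXACTIVE][:[GRP/]EVENT] syntax discussed above, here is a small user-space sketch that extracts the optional maxactive number from a command string. This is not the parser from kernel/trace/trace_kprobe.c; the function name is hypothetical, and the upper limit that makes the real patch return -E2BIG is not modelled here (only int overflow is rejected).

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: parse the "r[MAXACTIVE]" prefix of a
 * kprobe_events command. On success returns 0 and stores the parsed
 * value in *maxactive (0 when omitted, meaning "use the default").
 * Returns -1 if cmd is not a return probe or the prefix is malformed.
 */
static int parse_retprobe_maxactive(const char *cmd, int *maxactive)
{
	const char *p;
	int value = 0;

	if (cmd == NULL || cmd[0] != 'r')
		return -1;

	for (p = cmd + 1; isdigit((unsigned char)*p); p++) {
		int digit = *p - '0';

		/* Reject values that would overflow an int. */
		if (value > (INT_MAX - digit) / 10)
			return -1;
		value = value * 10 + digit;
	}

	/* The number must be followed by ":EVENT", a space, or nothing. */
	if (*p != ':' && *p != ' ' && *p != '\0')
		return -1;

	*maxactive = value;
	return 0;
}
```

Under these assumptions "r100:myprobe inet_csk_accept" and "r100 inet_csk_accept" both give 100, while "r:myprobe inet_csk_accept" gives 0.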

[PATCH net-next] sock: correctly test SOCK_TIMESTAMP in sock_recv_ts_and_drops()

2017-03-31 Thread Eric Dumazet

From: Eric Dumazet 

It seems the code does not match the intent.

This broke packetdrill, and probably other programs.

Fixes: 6c7c98bad488 ("sock: avoid dirtying sk_stamp, if possible")
Signed-off-by: Eric Dumazet 
Cc: Paolo Abeni 
---
 include/net/sock.h |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 
8e53158a7d957ea2a480cc449606dca2480b1259..66349e49d468646ce724485bb8e74952825f0d6c
 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2250,7 +2250,7 @@ static inline void sock_recv_ts_and_drops(struct msghdr 
*msg, struct sock *sk,
 
if (sk->sk_flags & FLAGS_TS_OR_DROPS || sk->sk_tsflags & TSFLAGS_ANY)
__sock_recv_ts_and_drops(msg, sk, skb);
-   else if (unlikely(sk->sk_flags & SOCK_TIMESTAMP))
+   else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
sk->sk_stamp = skb->tstamp;
else if (unlikely(sk->sk_stamp == SK_DEFAULT_STAMP))
sk->sk_stamp = 0;
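
The one-line change above matters because SOCK_TIMESTAMP is a bit number from an enum, not a bit mask, and sock_flag() tests that bit in sk_flags. The user-space sketch below shows the difference between the two tests. The enum values and function names are made up for illustration and do not match the real enum sock_flags.

```c
#include <stdbool.h>

/* Hypothetical flag numbering; the real enum sock_flags differs. */
enum demo_sock_flag {
	DEMO_SOCK_DEAD,		/* bit 0 */
	DEMO_SOCK_DONE,		/* bit 1 */
	DEMO_SOCK_URGINLINE,	/* bit 2 */
	DEMO_SOCK_TIMESTAMP,	/* bit 3 */
};

/* Intended test: is bit number 'flag' set in 'flags'? */
static bool demo_flag_is_set(unsigned long flags, enum demo_sock_flag flag)
{
	return (flags >> flag) & 1UL;
}

/* Mistaken test: uses the bit number itself as a mask. */
static bool demo_mask_test(unsigned long flags, enum demo_sock_flag flag)
{
	return (flags & (unsigned long)flag) != 0;
}
```

With only the timestamp bit set, the mistaken test reports false; with unrelated low bits set, it can report true. That matches the kind of misbehaviour one would expect from the original code.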

Re: [PATCH net] ftgmac100: Mostly rewrite the driver

2017-03-31 Thread Andrew Lunn

> They are incremental, but some of them are trivial and in the end
> it's the end result that matters but yes I could probably split some
> misc stuff, rx path, tx path, and more misc.

Hi Ben

Trivial patches are good. They are easy to review. You should be
aiming for patches which are obviously correct, wherever possible.
Refactoring existing code often has a lot of obviously correct
patches, and a few complex patches which take some effort to review.

   Andrew

Re: [PATCH net-next v6 01/11] bpf: Add eBPF program subtype and is_valid_subtype() verifier (fwd)

2017-03-31 Thread Mickaël Salaün

Good catch, thanks again Julia!

 Mickaël

On 29/03/2017 17:14, Julia Lawall wrote:
> Size is unsigned, so not negative.
> 
> julia
> 
> -- Forwarded message --
> Date: Wed, 29 Mar 2017 23:06:01 +0800
> From: kbuild test robot 
> To: kbu...@01.org
> Cc: Julia Lawall 
> Subject: Re: [PATCH net-next v6 01/11] bpf: Add eBPF program subtype and
> is_valid_subtype() verifier
> 
> In-Reply-To: <20170328234650.19695-2-...@digikod.net>
> TO: "Mickaël Salaün" 
> 
> Hi Mickaël,
> 
> [auto build test WARNING on net-next/master]
> 
> url:
> https://github.com/0day-ci/linux/commits/Micka-l-Sala-n/Landlock-LSM-Toward-unprivileged-sandboxing/20170329-211258
> :: branch date: 2 hours ago
> :: commit date: 2 hours ago
> 
>>> kernel/bpf/syscall.c:1041:5-9: WARNING: Unsigned expression compared with 
>>> zero: size < 0
> 
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 07d282aef4f60235407284c0be81d01e352e040b
> vim +1041 kernel/bpf/syscall.c
> 
> f4324551 Daniel Mack2016-11-23  1025  return -EINVAL;
> f4324551 Daniel Mack2016-11-23  1026  }
> f4324551 Daniel Mack2016-11-23  1027
> 7f677633 Alexei Starovoitov 2017-02-10  1028  return ret;
> f4324551 Daniel Mack2016-11-23  1029  }
> f4324551 Daniel Mack2016-11-23  1030  #endif /* CONFIG_CGROUP_BPF */
> f4324551 Daniel Mack2016-11-23  1031
> 99c55f7d Alexei Starovoitov 2014-09-26  1032  SYSCALL_DEFINE3(bpf, int, cmd, 
> union bpf_attr __user *, uattr, unsigned int, size)
> 99c55f7d Alexei Starovoitov 2014-09-26  1033  {
> 99c55f7d Alexei Starovoitov 2014-09-26  1034  union bpf_attr attr = 
> {};
> 99c55f7d Alexei Starovoitov 2014-09-26  1035  int err;
> 99c55f7d Alexei Starovoitov 2014-09-26  1036
> 1be7f75d Alexei Starovoitov 2015-10-07  1037  if 
> (!capable(CAP_SYS_ADMIN) && sysctl_unprivileged_bpf_disabled)
> 99c55f7d Alexei Starovoitov 2014-09-26  1038  return -EPERM;
> 99c55f7d Alexei Starovoitov 2014-09-26  1039
> 07d282ae Mickaël Salaün 2017-03-29  1040  size = 
> check_user_buf((void __user *)uattr, size, sizeof(attr));
> 07d282ae Mickaël Salaün 2017-03-29 @1041  if (size < 0)
> 07d282ae Mickaël Salaün 2017-03-29  1042  return size;
> 99c55f7d Alexei Starovoitov 2014-09-26  1043
> 99c55f7d Alexei Starovoitov 2014-09-26  1044  /* copy attributes from 
> user space, may be less than sizeof(bpf_attr) */
> 99c55f7d Alexei Starovoitov 2014-09-26  1045  if 
> (copy_from_user(&attr, uattr, size) != 0)
> 99c55f7d Alexei Starovoitov 2014-09-26  1046  return -EFAULT;
> 99c55f7d Alexei Starovoitov 2014-09-26  1047
> 99c55f7d Alexei Starovoitov 2014-09-26  1048  switch (cmd) {
> 99c55f7d Alexei Starovoitov 2014-09-26  1049  case BPF_MAP_CREATE:
> 
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation
> 
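
The coccinelle warning quoted above comes down to storing a possibly negative error code in an unsigned variable, after which a "< 0" check can never fire. The sketch below shows the effect in user space. All names are hypothetical; demo_check_size() merely stands in for a size check that can fail and is not the function from the patch.

```c
#include <errno.h>

/* Hypothetical stand-in: returns the size, or -E2BIG if it is too big. */
static int demo_check_size(unsigned int size, unsigned int max)
{
	return size > max ? -E2BIG : (int)size;
}

/*
 * Problematic pattern: the result lands in an unsigned type, so an
 * error code becomes a huge positive number and "< 0" is never true.
 */
static unsigned int demo_store_unsigned(unsigned int size, unsigned int max)
{
	return (unsigned int)demo_check_size(size, max);
}

/*
 * One possible fix: keep the result in a signed int, check it, and
 * only then hand the validated size back to the caller.
 */
static int demo_store_signed(unsigned int size, unsigned int max,
			     unsigned int *out)
{
	int ret = demo_check_size(size, max);

	if (ret < 0)
		return ret;
	*out = (unsigned int)ret;
	return 0;
}
```

Here the signed variant propagates -E2BIG to the caller, whereas the unsigned variant silently turns it into a very large size.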




Re: [kernel-hardening] [PATCH net-next v6 06/11] seccomp,landlock: Handle Landlock events per process hierarchy

2017-03-31 Thread Mickaël Salaün



On 29/03/2017 12:35, Djalal Harouni wrote:
> On Wed, Mar 29, 2017 at 1:46 AM, Mickaël Salaün  wrote:

>> @@ -25,6 +30,9 @@ struct seccomp_filter;
>>  struct seccomp {
>> int mode;
>> struct seccomp_filter *filter;
>> +#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
>> +   struct landlock_events *landlock_events;
>> +#endif /* CONFIG_SECCOMP_FILTER && CONFIG_SECURITY_LANDLOCK */
>>  };
> 
> Sorry if this was discussed before, but since this is meant to be a
> stackable LSM, I'm wondering if later you could move the events from
> seccomp, and go with a security_task_alloc() model [1] ?
> 
> Thanks!
> 
> [1] 
> http://kernsec.org/pipermail/linux-security-module-archive/2017-March/000184.html
> 

Landlock uses the seccomp syscall to attach a rule to a process, so using
struct seccomp to store this rule makes sense. There is currently no way
to store multiple task->security, which is needed for a stackable LSM
like Landlock, but we could move the events there if needed in the future.

 Mickaël




Re: [PATCH net] ftgmac100: Mostly rewrite the driver

2017-03-31 Thread Benjamin Herrenschmidt

On Fri, 2017-03-31 at 15:52 +0200, Andrew Lunn wrote:
> > We're running some more testing tonight, if it's all solid I'll shoot
> > it out tomorrow or sunday. Dave, it's ok to just spam the list with a
> > 55 patches series like that ?
> 
> Hi Ben
> 
> Is there a good reason to spam the list with 55 patches? The patches
> should be incremental, so getting them reviewed and applied in batches
> of 10 should not be a problem.

They are incremental, but some of them are trivial and in the end
it's the end result that matters but yes I could probably split some
misc stuff, rx path, tx path, and more misc.

I found an issue with link down vs. pending tx packets last night
so I need to fix that and test. I'll send things when that's done.

Cheers,
Ben.

Re: [B.A.T.M.A.N.] [PATCH] net: batman-adv: use new api ethtool_{get|set}_link_ksettings

2017-03-31 Thread Philippe Reynes

Hi Sven,

On 3/31/17, Sven Eckelmann  wrote:
> On Donnerstag, 30. März 2017 23:01:27 CEST Philippe Reynes wrote:
>> The ethtool api {get|set}_settings is deprecated.
>> We move this driver to new api {get|set}_link_ksettings.
>>
>> I've only compiled this change. If someone may test it,
>> it would be very nice.
>>
>> Signed-off-by: Philippe Reynes 
>> ---
>>  net/batman-adv/soft-interface.c |   25 -
>>  1 files changed, 12 insertions(+), 13 deletions(-)
>
> Do you know if anyone already prepared the get_link_ksettings support for
> kernels older than 4.6 for backports.git?

Sorry, I don't know this repo. Do you have an url please ?
But I suppose that nobody works on such backport.

> Kind regards,
>   Sven

Regards,
Philippe

[patch 2/2] drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: fix build with gcc-4.4.4

2017-03-31 Thread akpm

From: Andrew Morton 
Subject: drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: fix build with 
gcc-4.4.4

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 
'mlx5e_set_rxfh':
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: error: unknown field 
'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: missing 
braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: (near 
initialization for 'rrp.')
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1068: error: unknown field 
'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: excess 
elements in struct initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: (near 
initialization for 'rrp')

gcc-4.4.4 has issues with anonymous union initializers.  Work around this.

Cc: Saeed Mahameed 
Cc: Tariq Toukan 
Signed-off-by: Andrew Morton 
---

 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff -puN 
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c~drivers-net-ethernet-mellanox-mlx5-core-en_ethtoolc-fix-build-with-gcc-444
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c~drivers-net-ethernet-mellanox-mlx5-core-en_ethtoolc-fix-build-with-gcc-444
+++ a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1064,8 +1064,12 @@ static int mlx5e_set_rxfh(struct net_dev
u32 rqtn = priv->indir_rqt.rqtn;
struct mlx5e_redirect_rqt_param rrp = {
.is_rss = true,
-   .rss.hfunc = priv->channels.params.rss_hfunc,
-   .rss.channels  = &priv->channels
+   {
+   .rss = {
+   .hfunc = 
priv->channels.params.rss_hfunc,
+   .channels  = &priv->channels,
+   },
+   },
};
 
mlx5e_redirect_rqt(priv, rqtn, MLX5E_INDIR_RQT_SIZE, 
rrp);
_

[patch 1/2] drivers/net/ethernet/mellanox/mlx5/core/en_main.c: fix build with gcc-4.4.4

2017-03-31 Thread akpm

From: Andrew Morton 
Subject: drivers/net/ethernet/mellanox/mlx5/core/en_main.c: fix build with 
gcc-4.4.4

drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 
'mlx5e_redirect_rqts':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2210: error: unknown field 
'rqn' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: missing braces 
around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: (near 
initialization for 'direct_rrp.')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 
'mlx5e_redirect_rqts_to_channels':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: error: unknown field 
'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: missing braces 
around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: (near 
initialization for 'rrp.')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: initialization 
makes integer from pointer without a cast
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2228: error: unknown field 
'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: excess 
elements in struct initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: (near 
initialization for 'rrp')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 
'mlx5e_redirect_rqts_to_drop':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2238: error: unknown field 
'rqn' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: missing braces 
around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: (near 
initialization for 'drop_rrp.')

gcc-4.4.4 has issues with anonymous union initializers.  Work around this.

Cc: Saeed Mahameed 
Cc: Tariq Toukan 
Signed-off-by: Andrew Morton 
---

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |   16 +---
 1 file changed, 12 insertions(+), 4 deletions(-)

diff -puN 
drivers/net/ethernet/mellanox/mlx5/core/en_main.c~drivers-net-ethernet-mellanox-mlx5-core-en_mainc-fix-build-with-gcc-444
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c
--- 
a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c~drivers-net-ethernet-mellanox-mlx5-core-en_mainc-fix-build-with-gcc-444
+++ a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2207,7 +2207,9 @@ static void mlx5e_redirect_rqts(struct m
for (ix = 0; ix < priv->profile->max_nch(priv->mdev); ix++) {
struct mlx5e_redirect_rqt_param direct_rrp = {
.is_rss = false,
-   .rqn= mlx5e_get_direct_rqn(priv, ix, rrp)
+   {
+   .rqn= mlx5e_get_direct_rqn(priv, ix, rrp)
+   },
};
 
/* Direct RQ Tables */
@@ -2224,8 +2226,12 @@ static void mlx5e_redirect_rqts_to_chann
 {
struct mlx5e_redirect_rqt_param rrp = {
.is_rss= true,
-   .rss.channels  = chs,
-   .rss.hfunc = chs->params.rss_hfunc
+   {
+   .rss = {
+   .channels  = chs,
+   .hfunc = chs->params.rss_hfunc,
+   }
+   },
};
 
mlx5e_redirect_rqts(priv, rrp);
@@ -2235,7 +2241,9 @@ static void mlx5e_redirect_rqts_to_drop(
 {
struct mlx5e_redirect_rqt_param drop_rrp = {
.is_rss = false,
-   .rqn = priv->drop_rq.rqn
+   {
+   .rqn = priv->drop_rq.rqn,
+   },
};
 
mlx5e_redirect_rqts(priv, drop_rrp);
_
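
As a stand-alone illustration of the workaround used in the two patches above, the sketch below initializes a struct with an anonymous union by wrapping the union members in their own braces instead of naming them directly (the ".rss.hfunc = ..." form that gcc-4.4.4 rejects). The struct and function names are hypothetical and only loosely modelled on mlx5e_redirect_rqt_param.

```c
#include <stdbool.h>

/* Hypothetical types, loosely modelled on the mlx5e parameter struct. */
struct demo_rss {
	int hfunc;
	int nchannels;
};

struct demo_param {
	bool is_rss;
	union {			/* anonymous union */
		unsigned int rqn;
		struct demo_rss rss;
	};
};

/* RSS case: the braces after .is_rss initialize the anonymous union. */
static struct demo_param demo_make_rss(int hfunc, int nchannels)
{
	struct demo_param p = {
		.is_rss = true,
		{
			.rss = {
				.hfunc = hfunc,
				.nchannels = nchannels,
			},
		},
	};

	return p;
}

/* Direct case: same pattern, selecting the rqn member of the union. */
static struct demo_param demo_make_direct(unsigned int rqn)
{
	struct demo_param p = {
		.is_rss = false,
		{
			.rqn = rqn,
		},
	};

	return p;
}
```

The extra brace level is positional: it initializes whichever member follows is_rss, which is the anonymous union, so the form does not depend on the compiler resolving member names through the union.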

Re: [BUG] ethernet:mellanox:mlx5: Oops in health_recover get_nic_state(dev)

2017-03-31 Thread Goel, Sameer



On 3/28/2017 7:25 AM, Daniel Jurgens wrote:
> On 3/28/2017 4:11 AM, Saeed Mahameed wrote:
>> On Tue, Mar 28, 2017 at 2:45 AM, Goel, Sameer  wrote:
>>> Stack frame:
>>> [ 1744.418958] [] get_nic_state+0x24/0x40 [mlx5_core]
>>> [ 1744.425273] [] health_recover+0x28/0x80 [mlx5_core]
>>> [ 1744.431496] [] process_one_work+0x150/0x460
>>> [ 1744.437218] [] worker_thread+0x50/0x4b8
>>> [ 1744.442609] [] kthread+0xd8/0xf0
>>> [ 1744.447377] [] ret_from_fork+0x10/0x20
>>>
>>> Summary:
>>> This issue was seen on a QDF2400 system about 30 minutes into a speccpu 
>>> 2006 run. During the test a recoverable PCIe error was seen that gave the 
>>> following log:
>>> [ 1673.170969] pcieport 0002:00:00.0: aer_status: 0x4000, aer_mask: 
>>> 0x0040
>>> [ 1673.177961] pcieport 0002:00:00.0: aer_layer=Transaction Layer, 
>>> aer_agent=Requester ID
>>> [ 1673.185832] pcieport 0002:00:00.0: aer_uncor_severity: 0x00462030
>>> [ 1675.536391] mlx5_core 0002:01:00.0: assert_var[0] 0x
>>> [ 1675.541093] mlx5_core 0002:01:00.0: assert_var[1] 0x
>>> [ 1675.546750] mlx5_core 0002:01:00.0: assert_var[2] 0x
>>> [ 1675.552377] mlx5_core 0002:01:00.0: assert_var[3] 0x
>>> [ 1675.558040] mlx5_core 0002:01:00.0: assert_var[4] 0x
>>> [ 1675.563661] mlx5_core 0002:01:00.0: assert_exit_ptr 0x
>>> [ 1675.569488] mlx5_core 0002:01:00.0: assert_callra 0x
>>> [ 1675.575120] mlx5_core 0002:01:00.0: fw_ver 15.4095.65535
>>> [ 1675.580426] mlx5_core 0002:01:00.0: hw_id 0x
>>> [ 1675.585363] mlx5_core 0002:01:00.0: irisc_index 255
>>> [ 1675.590242] mlx5_core 0002:01:00.0: synd 0xff: unrecognized error
>>> [ 1675.596301] mlx5_core 0002:01:00.0: ext_synd 0x
>>> [ 1675.601209] mlx5_core 0002:01:00.0: mlx5_enter_error_state:120:(pid 
>>> 7205): start
>>> [ 1675.608613] mlx5_core 0002:01:00.0: mlx5_enter_error_state:127:(pid 
>>> 7205): end
>>>
>>> After the above log we see the above stack frame and a page fault due to 
>>> an invalid dev pointer.
>>>
>>> So the recovery work is queued and the timer is stopped. Somehow the 
>>> workqueue is not cleared and when it runs the dev pointer is invalid.
>>>
>>> This issue was difficult to repro and was seen only once in multiple runs 
>>> on a specific device.
>> Hi Sameer,
>>
>> Thanks for the report,
>> adding more relevant ppl
>>
>> Mohamad/Daniel Does the above ring a bell ?
>> can you check ?
>>
>> Thanks
>> Saeed.
> 
> Hi Sameer, Can you tell me if you have these 2 patches?
> 
> 5e44fca504705 ('net/mlx5: Only cancel recovery work when cleaning up device')
> 
> 689a248df83b ("net/mlx5: Cancel recovery work in remove flow")
> 
> 

Hi Daniel,
No, I do not have these patches. I can pick up the cleanups from the mailing
list. Are these patches acked for 4.12?
Thanks,
Sameer
 
-- 
 Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, 
Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

Issue with load across multiple connections

2017-03-31 Thread Tom Herbert

Hi netdev,

I'm forwarding this problem report from Ulrich Speidel.

The correlation to FDs would seem to point to the application always
reading from FDs in order, and it seems unlikely that FDs have any
relevance in the networking stack below the file IO for sockets. Some
of the stack experts might have a better idea though...

Thanks,
Tom

-- Forwarded message --
From: Ulrich Speidel 
Date: Fri, Mar 31, 2017 at 2:11 AM
Subject: Linux kernel query
To: t...@quantonium.net
Cc: Brian Carpenter , Nevil Brownlee
, l...@steinwurf.com, Lei Qian



Dear Tom,

I'm a colleague of Brian Carpenter at the University of Auckland. He
has suggested that I contact you about this as I'm not sure that what
we have discovered is a bug - it may even be an intended feature but
I've failed to find it documented anywhere. From all we can tell, the
problem seems related to how socket file descriptor numbers & SKBs are
handled in POSIX-compliant kernels. I'm not a kernel hack so apologise
in advance if terminology isn't always spot-on.

This is how we triggered the effect: We have a setup in which we have
multiple physical network clients connect to multiple servers at
random. On the client side, we create N "channels" (indexed, say 0 to
N-1) on each physical client. Each channel executes the following
task:

1) create a fresh TCP socket
2) connect to a randomly chosen server from our pool
3) receive a quantity of data that the server sends (this may be
somewhere between 0 bytes and hundreds of MB). In our case, we use the
application merely as a network traffic generator, so the receive
process consists of recording the number of bytes made available by
the socket and freeing the buffer without ever actually reading it.
4) wait for server disconnect
5) free socket (i.e., we're not explicitly re-using the previous
connection's socket)
6) jump back to 1)

We keep track of the throughput on each channel.
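
The per-channel task in steps 1) to 6) can be sketched roughly as follows.
This is a minimal illustration and not the application described here: the
function names (demo_drain_fd, demo_channel_once) are hypothetical, only IPv4
and blocking sockets are assumed, and received data is copied into a scratch
buffer and discarded rather than being released unread.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Steps 3 and 4: read and discard data until the peer closes.
 * Returns the number of bytes seen, or -1 on a read error.
 */
long long demo_drain_fd(int fd)
{
	char buf[65536];
	long long total = 0;

	assert(fd >= 0);
	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0)
			total += n;		/* count it, then drop it */
		else if (n == 0)
			return total;		/* server disconnected */
		else if (errno != EINTR)
			return -1;
	}
}

/*
 * Steps 1, 2 and 5 around the drain: fresh socket, connect, receive, close.
 * Step 6 is the caller invoking this again, possibly with another server.
 */
long long demo_channel_once(const char *ip, unsigned short port)
{
	struct sockaddr_in sa;
	long long got;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1 ||
	    connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		close(fd);
		return -1;
	}

	got = demo_drain_fd(fd);
	close(fd);	/* the next iteration starts from a fresh socket */
	return got;
}
```

A channel would then call demo_channel_once() in a loop with a randomly
chosen server address and add each non-negative return value to its
throughput counter.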

Note that the effect is the same regardless of whether we implement
each channel in a process of its own, in a threaded application, or
whether we use non-blocking sockets and check on them in a loop.

What we would normally expect is that each channel would receive
about the same goodput over time, regardless of the value of N. Note
that each channel uses a succession of fresh sockets.

What actually happens is this: For up to approximately N=20 channels
on a single physical client (we've tried Raspbian and Debian, as well
as Ubuntu), each channel sees on average substantial and comparable
levels of throughput, adding up to values approaching network
interface capacity. Once we push N beyond 20, the throughput on any
further channels drops to zero very quickly. For N=30, we typically
see at least half a dozen channels with no throughput at all beyond
the connection handshake. Throughput on the first 20 or so channels
remains pretty much unchanged. The sockets on the channels with low or
no throughput all manage to connect, but then remain in connected state
and receive no data.

Throughput on the first ~20 channels is sustainable for long periods
of time - so we're not dealing with an intermittent bug that causes
our sockets to stall: The affected sockets / channels never receive
anything (and the sockets around the 20-or-so mark very little). So it
seems that subsequent sockets on a channel inherit the ability of
their predecessor to receive data at quantity.

We also see the issue on a single physical Raspberry client having the
sole use of 14 Super Micros on GbE interfaces to download from. So we
know we're definitely not overloading the server side (note that we
are able to saturate the network to the Pi). Here is some sample data
from the Pi (my apologies for the rough format):

Channel index/MB transferred/Number of connections completed+attempted
0 2.37 144
1 29.32 92
2 2.71 132
3 10.88 705
4 11.90 513
5 16.045990 571
6 9.631539 598
7 15.420138 362
8 9.854378 106
9 8.975264 315
10 8.020266 526
11 6.369107 582
12 8.877760 277
13 8.148640 406
14 13.536793 301
15 9.804712 55
16 7.643378 292
17 7.970028 393
18 0.000120 1
19 9.359919 415
20 0.000120 1
21 0.000120 1
22 12.937519 314
23 0.000920 2
24 14.561784 362
25 0.000240 2
26 11.005030 535
27 0.000120 1
28 0.000120 1
29 0.000120 1

The total data rate in this example was 94.1 Mbps on the 100 Mbps
connection of the Pi. Experiment duration was 20 seconds on this
occasion, but the effect is stable - we have observed it for many
minutes. Once "stuck", a channel remains stuck.

The fact that the incoming data rate accrues almost exclusively to the
~20 busy channels suggests that the sockets on the other channels are
either advertising a window of 0 bytes or are not generating ACKs for
incoming data, or both.

We have considered the possibility of FIN packets getting dropped
somewhere along the way - not only is this unlikely since they are
small, but the effect also happens

Pointer type of _arp in __skb_flow_dissect()

2017-03-31 Thread Nicolas Iooss

Hello,

Linux 4.11-rc4 contains the following code in function
__skb_flow_dissect(), file net/core/flow_dissector.c:

const struct arphdr *arp;
struct arphdr *_arp;

arp = __skb_header_pointer(skb, nhoff, sizeof(_arp), data,
   hlen, &_arp);


Here _arp and arp are both pointers to arphdr structures. In other calls
to __skb_header_pointer(), the buffer argument (_arp here) would have
been a struct instead of a pointer. What makes ARP packets different in
__skb_flow_dissect()?
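
To make the difference concrete, here is a small userspace sketch of what
sizeof() yields for the two possible declarations of the buffer. Everything
in it is a stand-in: demo_arphdr only mirrors the fixed 8-byte part of an ARP
header, and demo_header_pointer() is a simplified, hypothetical substitute
that always copies, unlike the kernel helper. It takes no position on what
the kernel code intends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the fixed part of an ARP header (8 bytes, no padding). */
struct demo_arphdr {
	uint16_t hrd;
	uint16_t pro;
	uint8_t  hln;
	uint8_t  pln;
	uint16_t op;
};

/*
 * Simplified stand-in for a header-pointer helper: copy len bytes found at
 * offset into buffer, or return NULL if the packet is too short.
 */
const void *demo_header_pointer(const uint8_t *pkt, size_t pkt_len,
				size_t offset, size_t len, void *buffer)
{
	assert(pkt != NULL && buffer != NULL);
	if (offset > pkt_len || len > pkt_len - offset)
		return NULL;
	memcpy(buffer, pkt + offset, len);
	return buffer;
}

/* Buffer declared as a struct: sizeof() is the header size. */
size_t demo_len_struct_buffer(void)
{
	struct demo_arphdr _arp;

	return sizeof(_arp);
}

/* Buffer declared as a pointer: sizeof() is the size of a pointer. */
size_t demo_len_pointer_buffer(void)
{
	struct demo_arphdr *_arp;

	return sizeof(_arp);
}
```

With the struct declaration the requested length is 8 bytes on any platform;
with the pointer declaration it is the pointer size, which matches the header
size only where pointers are 8 bytes wide.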

Thanks,
Nicolas

PS: the code which I am curious about seems to have been introduced in
4.11-rc1 with commit 55733350e5e8 ("flow disector: ARP support")

Re: probably serious conntrack/netfilter panic, 4.8.14, timers and intel turbo

2017-03-31 Thread Denys Fedoryshchenko

I am not sure if it is the same issue, but panics still happen, though much 
less often. Same server, NAT.
I will upgrade to the latest 4.10.x build, because for this one I don't have 
the files anymore (symbols, etc.).


 [864288.511464] Modules linked in: nf_conntrack_netlink nf_nat_pptp 
nf_nat_proto_gre xt_TCPMSS xt_connmark ipt_MASQUERADE 
nf_nat_masquerade_ipv4 xt_nat xt_rateest xt_RATEEST nf_conntrack_pptp 
nf_conntrack_proto_gre xt_CT xt_set xt_hl xt_tcpudp ip_set_hash_net 
ip_set nfnetlink iptable_raw iptable_mangle iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
iptable_filter ip_tables x_tables netconsole configfs 8021q garp mrp stp 
llc bonding ixgbe dca
 [864288.512740] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 
4.10.1-build-0132 #2
 [864288.513005] Hardware name: Intel Corporation S2600WTT/S2600WTT, 
BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016

 [864288.513454] task: 881038cb6000 task.stack: c9000c678000
 [864288.513719] RIP: 0010:nf_nat_cleanup_conntrack+0xe2/0x1bc [nf_nat]
 [864288.513980] RSP: 0018:88103fc43ba0 EFLAGS: 00010206
 [864288.514237] RAX: 140504021ad8 RBX: 881004021ad8 RCX: 
0100
 [864288.514677] RDX: 140504021ad8 RSI: 88103279628c RDI: 
88103279628c
 [864288.515117] RBP: 88103fc43be0 R08: c9003b47b558 R09: 
0004
 [864288.515558] R10: 8820083d00ce R11: 881038480b00 R12: 
881004021a40
 [864288.515998] R13:  R14: a00d406e R15: 
c90036e11000
 [864288.516438] FS:  () GS:88103fc4() 
knlGS:

 [864288.516882] CS:  0010 DS:  ES:  CR0: 80050033
 [864288.517142] CR2: 7fbfc303f978 CR3: 00202267c000 CR4: 
001406e0

 [864288.517580] Call Trace:
 [864288.517831]  
 [864288.518090]  __nf_ct_ext_destroy+0x3f/0x57 [nf_conntrack]
 [864288.518352]  nf_conntrack_free+0x25/0x55 [nf_conntrack]
 [864288.518615]  destroy_conntrack+0x80/0x8c [nf_conntrack]
 [864288.518880]  nf_conntrack_destroy+0x19/0x1b
 [864288.519137]  nf_ct_gc_expired+0x6e/0x71 [nf_conntrack]
 [864288.519400]  __nf_conntrack_find_get+0x89/0x2ab [nf_conntrack]
 [864288.519663]  nf_conntrack_in+0x1ec/0x877 [nf_conntrack]
 [864288.519925]  ipv4_conntrack_in+0x1c/0x1e [nf_conntrack_ipv4]
 [864288.520185]  nf_hook_slow+0x2a/0x9a
 [864288.520439]  ip_rcv+0x318/0x337
 [864288.520692]  ? ip_local_deliver_finish+0x1ba/0x1ba
 [864288.520953]  __netif_receive_skb_core+0x607/0x852
 [864288.521213]  ? kmem_cache_free_bulk+0x232/0x274
 [864288.521471]  __netif_receive_skb+0x18/0x5a
 [864288.521727]  process_backlog+0x90/0x113
 [864288.521981]  net_rx_action+0x114/0x2dc
 [864288.522238]  ? sched_clock_cpu+0x15/0x94
 [864288.522496]  __do_softirq+0xe7/0x259
 [864288.522753]  irq_exit+0x52/0x93
 [864288.523006]  smp_call_function_single_interrupt+0x33/0x35
 [864288.523267]  call_function_single_interrupt+0x83/0x90
 [864288.523531] RIP: 0010:mwait_idle+0x9e/0x125
 [864288.523786] RSP: 0018:c9000c67beb0 EFLAGS: 0246 ORIG_RAX: 
ff04
 [864288.524229] RAX:  RBX: 881038cb6000 RCX: 

 [864288.524669] RDX:  RSI:  RDI: 

 [864288.525110] RBP: c9000c67bec0 R08: 0001 R09: 

 [864288.525551] R10: c9000c67be50 R11:  R12: 
0011
 [864288.525991] R13:  R14: 881038cb6000 R15: 
881038cb6000

 [864288.526429]  
 [864288.526682]  arch_cpu_idle+0xf/0x11
 [864288.526937]  default_idle_call+0x25/0x27
 [864288.527193]  do_idle+0xb6/0x15d
 [864288.527446]  cpu_startup_entry+0x1f/0x21
 [864288.527702]  start_secondary+0xe8/0xeb
 [864288.527961]  start_cpu+0x14/0x14
 [864288.528212] Code: 48 89 f7 48 89 75 c8 e8 6e e8 8f e1 8b 45 c4 48 
8b 75 c8 48 83 c0 08 4d 8d 04 c7 49 8b 04 c7 a8 01 75 46 48 39 c3 74 1e 
48 89 c2 <48> 8b 7a 08 48 85 ff 0f 84 b3 00 00 00 48 39 fb 0f 84 9e 00 
00
 [864288.528905] RIP: nf_nat_cleanup_conntrack+0xe2/0x1bc [nf_nat] RSP: 
88103fc43ba0

 [864288.529362] ---[ end trace e3c40a5e4bf43e26 ]---
 [864288.567835] Kernel panic - not syncing: Fatal exception in 
interrupt

 [864288.568122] Kernel Offset: disabled
 [864288.587619] Rebooting in 5 seconds..

Re: [PATCH net] sctp: use right in and out stream cnt

2017-03-31 Thread Marcelo Ricardo Leitner

On Fri, Mar 31, 2017 at 05:57:28PM +0800, Xin Long wrote:
> Since stream reconf was added to sctp, the real in/out stream counts
> are no longer c.sinit_max_instreams and c.sinit_num_ostreams.
> 
> This patch replaces them with stream->in/outcnt.
> 
> Signed-off-by: Xin Long 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/outqueue.c |  3 +--
>  net/sctp/proc.c |  4 ++--
>  net/sctp/sm_statefuns.c |  6 +++---
>  net/sctp/socket.c   | 10 +-
>  4 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 025ccff..8081476 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1026,8 +1026,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int 
> rtx_timeout, gfp_t gfp)
>   /* RFC 2960 6.5 Every DATA chunk MUST carry a valid
>* stream identifier.
>*/
> - if (chunk->sinfo.sinfo_stream >=
> - asoc->c.sinit_num_ostreams) {
> + if (chunk->sinfo.sinfo_stream >= asoc->stream->outcnt) {
>  
>   /* Mark as failed send. */
>   sctp_chunk_fail(chunk, SCTP_ERROR_INV_STRM);
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index 206377f..a0b29d4 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -361,8 +361,8 @@ static int sctp_assocs_seq_show(struct seq_file *seq, 
> void *v)
>   sctp_seq_dump_remote_addrs(seq, assoc);
>   seq_printf(seq, "\t%8lu %5d %5d %4d %4d %4d %8d "
>  "%8d %8d %8d %8d",
> - assoc->hbinterval, assoc->c.sinit_max_instreams,
> - assoc->c.sinit_num_ostreams, assoc->max_retrans,
> + assoc->hbinterval, assoc->stream->incnt,
> + assoc->stream->outcnt, assoc->max_retrans,
>   assoc->init_retries, assoc->shutdown_retries,
>   assoc->rtx_data_chunks,
>   atomic_read(>sk_wmem_alloc),
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index e03bb1a..24c6ccc 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -3946,7 +3946,7 @@ sctp_disposition_t sctp_sf_eat_fwd_tsn(struct net *net,
>  
>   /* Silently discard the chunk if stream-id is not valid */
>   sctp_walk_fwdtsn(skip, chunk) {
> - if (ntohs(skip->stream) >= asoc->c.sinit_max_instreams)
> + if (ntohs(skip->stream) >= asoc->stream->incnt)
>   goto discard_noforce;
>   }
>  
> @@ -4017,7 +4017,7 @@ sctp_disposition_t sctp_sf_eat_fwd_tsn_fast(
>  
>   /* Silently discard the chunk if stream-id is not valid */
>   sctp_walk_fwdtsn(skip, chunk) {
> - if (ntohs(skip->stream) >= asoc->c.sinit_max_instreams)
> + if (ntohs(skip->stream) >= asoc->stream->incnt)
>   goto gen_shutdown;
>   }
>  
> @@ -6353,7 +6353,7 @@ static int sctp_eat_data(const struct sctp_association 
> *asoc,
>* and discard the DATA chunk.
>*/
>   sid = ntohs(data_hdr->stream);
> - if (sid >= asoc->c.sinit_max_instreams) {
> + if (sid >= asoc->stream->incnt) {
>   /* Mark tsn as received even though we drop it */
>   sctp_add_cmd_sf(commands, SCTP_CMD_REPORT_TSN, SCTP_U32(tsn));
>  
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index baa269a..12fbae2 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -1920,7 +1920,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr 
> *msg, size_t msg_len)
>   }
>  
>   /* Check for invalid stream. */
> - if (sinfo->sinfo_stream >= asoc->c.sinit_num_ostreams) {
> + if (sinfo->sinfo_stream >= asoc->stream->outcnt) {
>   err = -EINVAL;
>   goto out_free;
>   }
> @@ -4461,8 +4461,8 @@ int sctp_get_sctp_info(struct sock *sk, struct 
> sctp_association *asoc,
>   info->sctpi_rwnd = asoc->a_rwnd;
>   info->sctpi_unackdata = asoc->unack_data;
>   info->sctpi_penddata = sctp_tsnmap_pending(>peer.tsn_map);
> - info->sctpi_instrms = asoc->c.sinit_max_instreams;
> - info->sctpi_outstrms = asoc->c.sinit_num_ostreams;
> + info->sctpi_instrms = asoc->stream->incnt;
> + info->sctpi_outstrms = asoc->stream->outcnt;
>   list_for_each(pos, >base.inqueue.in_chunk_list)
>   info->sctpi_inqueue++;
>   list_for_each(pos, >outqueue.out_chunk_list)
> @@ -4691,8 +4691,8 @@ static int sctp_getsockopt_sctp_status(struct sock *sk, 
> int len,
>   status.sstat_unackdata = asoc->unack_data;
>  
>   status.sstat_penddata = sctp_tsnmap_pending(>peer.tsn_map);
> - status.sstat_instrms = asoc->c.sinit_max_instreams;
> - status.sstat_outstrms = asoc->c.sinit_num_ostreams;
> + status.sstat_instrms = asoc->stream->incnt;
> + status.sstat_outstrms = asoc->stream->outcnt;
>
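
The reasoning behind the change can be illustrated with a small standalone
sketch. The structs below are hypothetical stand-ins, not the kernel's
sctp_association or sctp_stream; the assumption is simply that a reconf can
change the live stream count after the INIT handshake, so a stream id must be
validated against the live count rather than the negotiated one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical live per-association stream state. */
struct demo_stream {
	uint16_t outcnt;	/* current number of outbound streams */
	uint16_t incnt;		/* current number of inbound streams */
};

/* Hypothetical association: handshake values plus the live state. */
struct demo_assoc {
	uint16_t init_num_ostreams;	/* fixed at INIT time */
	uint16_t init_max_instreams;	/* fixed at INIT time */
	struct demo_stream stream;	/* may change after a reconf */
};

/* Valid outbound stream ids are 0 .. outcnt - 1. */
int demo_out_stream_valid(const struct demo_assoc *asoc, uint16_t sid)
{
	assert(asoc != NULL);
	return sid < asoc->stream.outcnt;
}

/* Valid inbound stream ids are 0 .. incnt - 1. */
int demo_in_stream_valid(const struct demo_assoc *asoc, uint16_t sid)
{
	assert(asoc != NULL);
	return sid < asoc->stream.incnt;
}
```

If an association negotiated 5 outbound streams and later grew to 8, a check
against the handshake value would reject stream 7 even though it exists,
while the check above accepts it.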

Re: [PATCH v6 00/15] Replace PCI pool by DMA pool API

2017-03-31 Thread Romain Perier

ping


Le 19/03/2017 à 18:03, Romain Perier a écrit :
> The current PCI pool API is a set of simple macros that expand directly
> to the appropriate dma pool functions. The prototypes are almost the same
> and semantically, they are very similar. I propose to use the DMA pool
> API directly and get rid of the old API.
>
> This set of patches replaces the old API with the dma pool API
> and removes the defines.
>
> Changes in v6:
> - Fixed an issue reported by kbuild test robot about changes in DAC960
> - Removed patches 15/19,16/19,17/19,18/19. They have been merged by Greg
> - Added Acked-by Tags
>
> Changes in v5:
> - Re-worded the cover letter (remove sentence about checkpatch.pl)
> - Rebased series onto next-20170308
> - Fix typos in commit message
> - Added Acked-by Tags
>
> Changes in v4:
> - Rebased series onto next-20170301
> - Removed patch 20/20: checks done by checkpatch.pl, no longer required.
>   Thanks to Peter and Joe for their feedback.
> - Added Reviewed-by tags
>
> Changes in v3:
> - Rebased series onto next-20170224
> - Fix checkpatch.pl reports for patch 11/20 and patch 12/20
> - Remove prefix RFC
> Changes in v2:
> - Introduced patch 18/20
> - Fixed cosmetic changes: spaces before brace, line over 80 characters
> - Removed some of the check for NULL pointers before calling dma_pool_destroy
> - Improved the regexp in checkpatch for pci_pool, thanks to Joe Perches
> - Added Tested-by and Acked-by tags
>
> Romain Perier (15):
>   block: DAC960: Replace PCI pool old API
>   dmaengine: pch_dma: Replace PCI pool old API
>   IB/mthca: Replace PCI pool old API
>   net: e100: Replace PCI pool old API
>   mlx4: Replace PCI pool old API
>   mlx5: Replace PCI pool old API
>   wireless: ipw2200: Replace PCI pool old API
>   scsi: be2iscsi: Replace PCI pool old API
>   scsi: csiostor: Replace PCI pool old API
>   scsi: lpfc: Replace PCI pool old API
>   scsi: megaraid: Replace PCI pool old API
>   scsi: mpt3sas: Replace PCI pool old API
>   scsi: mvsas: Replace PCI pool old API
>   scsi: pmcraid: Replace PCI pool old API
>   PCI: Remove PCI pool macro functions
>
>  drivers/block/DAC960.c|  38 +
>  drivers/block/DAC960.h|   4 +-
>  drivers/dma/pch_dma.c |  12 +--
>  drivers/infiniband/hw/mthca/mthca_av.c|  10 +--
>  drivers/infiniband/hw/mthca/mthca_cmd.c   |   8 +-
>  drivers/infiniband/hw/mthca/mthca_dev.h   |   4 +-
>  drivers/net/ethernet/intel/e100.c |  12 +--
>  drivers/net/ethernet/mellanox/mlx4/cmd.c  |  10 +--
>  drivers/net/ethernet/mellanox/mlx4/mlx4.h |   2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c |  11 +--
>  drivers/net/wireless/intel/ipw2x00/ipw2200.c  |  13 ++--
>  drivers/scsi/be2iscsi/be_iscsi.c  |   6 +-
>  drivers/scsi/be2iscsi/be_main.c   |   6 +-
>  drivers/scsi/be2iscsi/be_main.h   |   2 +-
>  drivers/scsi/csiostor/csio_hw.h   |   2 +-
>  drivers/scsi/csiostor/csio_init.c |  11 +--
>  drivers/scsi/csiostor/csio_scsi.c |   6 +-
>  drivers/scsi/lpfc/lpfc.h  |  14 ++--
>  drivers/scsi/lpfc/lpfc_init.c |  16 ++--
>  drivers/scsi/lpfc/lpfc_mem.c  | 106 
> +-
>  drivers/scsi/lpfc/lpfc_nvme.c |   6 +-
>  drivers/scsi/lpfc/lpfc_nvmet.c|   4 +-
>  drivers/scsi/lpfc/lpfc_scsi.c |  12 +--
>  drivers/scsi/megaraid/megaraid_mbox.c |  33 
>  drivers/scsi/megaraid/megaraid_mm.c   |  32 
>  drivers/scsi/megaraid/megaraid_sas_base.c |  29 +++
>  drivers/scsi/megaraid/megaraid_sas_fusion.c   |  66 
>  drivers/scsi/mpt3sas/mpt3sas_base.c   |  73 +-
>  drivers/scsi/mvsas/mv_init.c  |   6 +-
>  drivers/scsi/mvsas/mv_sas.c   |   6 +-
>  drivers/scsi/pmcraid.c|  10 +--
>  drivers/scsi/pmcraid.h|   2 +-
>  include/linux/mlx5/driver.h   |   2 +-
>  include/linux/pci.h   |   9 ---
>  34 files changed, 280 insertions(+), 303 deletions(-)
>

Re: [PATCH net-next] sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp

2017-03-31 Thread Marcelo Ricardo Leitner

On Fri, Mar 31, 2017 at 06:14:09PM +0800, Xin Long wrote:
> Before when implementing sctp prsctp, SCTP_PR_STREAM_STATUS wasn't
> added, as it needs to save abandoned_(un)sent for every stream.
> 
> After sctp stream reconf is added in sctp, assoc has structure
> sctp_stream_out to save per stream info.
> 
> This patch is to add SCTP_PR_STREAM_STATUS by putting the prsctp
> per stream statistics into sctp_stream_out.
> 
> Signed-off-by: Xin Long 
> ---
>  include/net/sctp/structs.h |  2 ++
>  include/uapi/linux/sctp.h  |  1 +
>  net/sctp/chunk.c   | 14 +--
>  net/sctp/outqueue.c| 10 
>  net/sctp/socket.c  | 59 
> ++
>  5 files changed, 84 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 592dece..3e61a54 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -1315,6 +1315,8 @@ struct sctp_inithdr_host {
>  struct sctp_stream_out {
>   __u16   ssn;
>   __u8state;
> + __u64   abandoned_unsent[SCTP_PR_INDEX(MAX) + 1];
> + __u64   abandoned_sent[SCTP_PR_INDEX(MAX) + 1];
>  };
>  
>  struct sctp_stream_in {
> diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
> index 7212870..ced9d8b 100644
> --- a/include/uapi/linux/sctp.h
> +++ b/include/uapi/linux/sctp.h
> @@ -115,6 +115,7 @@ typedef __s32 sctp_assoc_t;
>  #define SCTP_PR_SUPPORTED113
>  #define SCTP_DEFAULT_PRINFO  114
>  #define SCTP_PR_ASSOC_STATUS 115
> +#define SCTP_PR_STREAM_STATUS116
>  #define SCTP_RECONFIG_SUPPORTED  117
>  #define SCTP_ENABLE_STREAM_RESET 118
>  #define SCTP_RESET_STREAMS   119
> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> index e3621cb..697721a 100644
> --- a/net/sctp/chunk.c
> +++ b/net/sctp/chunk.c
> @@ -306,14 +306,24 @@ int sctp_chunk_abandoned(struct sctp_chunk *chunk)
>  
>   if (SCTP_PR_TTL_ENABLED(chunk->sinfo.sinfo_flags) &&
>   time_after(jiffies, chunk->msg->expires_at)) {
> - if (chunk->sent_count)
> + struct sctp_stream_out *streamout =
> + >asoc->stream->out[chunk->sinfo.sinfo_stream];
> +
> + if (chunk->sent_count) {
>   chunk->asoc->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
> - else
> + streamout->abandoned_sent[SCTP_PR_INDEX(TTL)]++;
> + } else {
>   chunk->asoc->abandoned_unsent[SCTP_PR_INDEX(TTL)]++;
> + streamout->abandoned_unsent[SCTP_PR_INDEX(TTL)]++;
> + }
>   return 1;
>   } else if (SCTP_PR_RTX_ENABLED(chunk->sinfo.sinfo_flags) &&
>  chunk->sent_count > chunk->sinfo.sinfo_timetolive) {
> + struct sctp_stream_out *streamout =
> + >asoc->stream->out[chunk->sinfo.sinfo_stream];
> +
>   chunk->asoc->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
> + streamout->abandoned_sent[SCTP_PR_INDEX(RTX)]++;
>   return 1;
>   } else if (!SCTP_PR_POLICY(chunk->sinfo.sinfo_flags) &&
>  chunk->msg->expires_at &&
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 025ccff..3f78d7f 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -353,6 +353,8 @@ static int sctp_prsctp_prune_sent(struct sctp_association 
> *asoc,
>   struct sctp_chunk *chk, *temp;
>  
>   list_for_each_entry_safe(chk, temp, queue, transmitted_list) {
> + struct sctp_stream_out *streamout;
> +
>   if (!SCTP_PR_PRIO_ENABLED(chk->sinfo.sinfo_flags) ||
>   chk->sinfo.sinfo_timetolive <= sinfo->sinfo_timetolive)
>   continue;
> @@ -361,8 +363,10 @@ static int sctp_prsctp_prune_sent(struct 
> sctp_association *asoc,
>   sctp_insert_list(>outqueue.abandoned,
>>transmitted_list);
>  
> + streamout = >stream->out[chk->sinfo.sinfo_stream];
>   asoc->sent_cnt_removable--;
>   asoc->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
> + streamout->abandoned_sent[SCTP_PR_INDEX(PRIO)]++;
>  
>   if (!chk->tsn_gap_acked) {
>   if (chk->transport)
> @@ -396,6 +400,12 @@ static int sctp_prsctp_prune_unsent(struct 
> sctp_association *asoc,
>   q->out_qlen -= chk->skb->len;
>   asoc->sent_cnt_removable--;
>   asoc->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
> + if (chk->sinfo.sinfo_stream < asoc->stream->outcnt) {
> + struct sctp_stream_out *streamout =
> + >stream->out[chk->sinfo.sinfo_stream];
> +
> + streamout->abandoned_unsent[SCTP_PR_INDEX(PRIO)]++;
> + }
>  
>   msg_len -= SCTP_DATA_SNDSIZE(chk) +
>  sizeof(struct sk_buff) +
> diff --git a/net/sctp/socket.c

[PATCH V3 net-next 00/18] net: hns: Misc. HNS Bug Fixes & Code Improvements

2017-03-31 Thread Salil Mehta

This patch set introduces various HNS bug fixes, optimizations and code
improvements.

Daode Huang (1):
  net: hns: bug fix of ethtool show the speed

Kejian Yan (7):
  net: hns: Remove the redundant adding and deleting mac function
  net: hns: Remove redundant mac_get_id()
  net: hns: Remove redundant mac table operations
  net: hns: Clean redundant code from hns_mdio.c file
  net: hns: Optimise the code in hns_mdio_wait_ready()
  net: hns: Simplify the exception sequence in hns_ppe_init()
  net: hns: Adjust the SBM module buffer threshold

Salil Mehta (1):
  net: hns: Some checkpatch.pl script & warning fixes

lipeng (9):
  net: hns: Fix the implementation of irq affinity function
  net: hns: Modify GMAC init TX threshold value
  net: hns: Optimize the code for GMAC pad and crc Config
  net: hns: Remove redundant memset during buffer release
  net: hns: Optimize hns_nic_common_poll for better performance
  net: hns: Fix to adjust buf_size of ring according to mtu
  net: hns: Replace netif_tx_lock to ring spin lock
  net: hns: Correct HNS RSS key set function
  net: hns: Avoid Hip06 chip TX packet line bug

 drivers/net/ethernet/hisilicon/hns/hnae.c  |   7 +-
 drivers/net/ethernet/hisilicon/hns/hnae.h  |  47 ++-
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  | 127 +--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c |  61 ++--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  |  41 +--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |   5 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 249 +
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |  14 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |  17 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  | 143 ++--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h  |  26 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  |   6 +-
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c|  13 -
 drivers/net/ethernet/hisilicon/hns/hns_enet.c  | 400 -
 drivers/net/ethernet/hisilicon/hns/hns_enet.h  |   3 +-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |  34 +-
 drivers/net/ethernet/hisilicon/hns_mdio.c  |  20 +-
 17 files changed, 653 insertions(+), 560 deletions(-)

-- 
2.7.4

[PATCH V3 net-next 01/18] net: hns: Fix the implementation of irq affinity function

2017-03-31 Thread Salil Mehta

From: lipeng 

This patch fixes the implementation of the IRQ affinity
function. This function is used to create the cpu mask
which eventually is used to initialize the cpu<->queue
association for XPS (Transmit Packet Steering).

Signed-off-by: lipeng 
Signed-off-by: Kejian Yan 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 75 +++
 drivers/net/ethernet/hisilicon/hns/hns_enet.h |  1 +
 2 files changed, 30 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index fca37e2..73ec8c8 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1196,54 +1196,31 @@ static void hns_nic_ring_close(struct net_device 
*netdev, int idx)
napi_disable(>ring_data[idx].napi);
 }
 
-static void hns_set_irq_affinity(struct hns_nic_priv *priv)
+static int hns_nic_init_affinity_mask(int q_num, int ring_idx,
+ struct hnae_ring *ring, cpumask_t *mask)
 {
-   struct hnae_handle *h = priv->ae_handle;
-   struct hns_nic_ring_data *rd;
-   int i;
int cpu;
-   cpumask_var_t mask;
 
-   if (!alloc_cpumask_var(, GFP_KERNEL))
-   return;
-
-   /*diffrent irq banlance for 16core and 32core*/
-   if (h->q_num == num_possible_cpus()) {
-   for (i = 0; i < h->q_num * 2; i++) {
-   rd = >ring_data[i];
-   if (cpu_online(rd->queue_index)) {
-   cpumask_clear(mask);
-   cpu = rd->queue_index;
-   cpumask_set_cpu(cpu, mask);
-   (void)irq_set_affinity_hint(rd->ring->irq,
-   mask);
-   }
-   }
+   /* Diffrent irq banlance between 16core and 32core.
+* The cpu mask set by ring index according to the ring flag
+* which indicate the ring is tx or rx.
+*/
+   if (q_num == num_possible_cpus()) {
+   if (is_tx_ring(ring))
+   cpu = ring_idx;
+   else
+   cpu = ring_idx - q_num;
} else {
-   for (i = 0; i < h->q_num; i++) {
-   rd = >ring_data[i];
-   if (cpu_online(rd->queue_index * 2)) {
-   cpumask_clear(mask);
-   cpu = rd->queue_index * 2;
-   cpumask_set_cpu(cpu, mask);
-   (void)irq_set_affinity_hint(rd->ring->irq,
-   mask);
-   }
-   }
-
-   for (i = h->q_num; i < h->q_num * 2; i++) {
-   rd = >ring_data[i];
-   if (cpu_online(rd->queue_index * 2 + 1)) {
-   cpumask_clear(mask);
-   cpu = rd->queue_index * 2 + 1;
-   cpumask_set_cpu(cpu, mask);
-   (void)irq_set_affinity_hint(rd->ring->irq,
-   mask);
-   }
-   }
+   if (is_tx_ring(ring))
+   cpu = ring_idx * 2;
+   else
+   cpu = (ring_idx - q_num) * 2 + 1;
}
 
-   free_cpumask_var(mask);
+   cpumask_clear(mask);
+   cpumask_set_cpu(cpu, mask);
+
+   return cpu;
 }
 
 static int hns_nic_init_irq(struct hns_nic_priv *priv)
@@ -1252,6 +1229,7 @@ static int hns_nic_init_irq(struct hns_nic_priv *priv)
struct hns_nic_ring_data *rd;
int i;
int ret;
+   int cpu;
 
for (i = 0; i < h->q_num * 2; i++) {
rd = >ring_data[i];
@@ -1261,7 +1239,7 @@ static int hns_nic_init_irq(struct hns_nic_priv *priv)
 
snprintf(rd->ring->ring_name, RCB_RING_NAME_LEN,
 "%s-%s%d", priv->netdev->name,
-(i < h->q_num ? "tx" : "rx"), rd->queue_index);
+(is_tx_ring(rd->ring) ? "tx" : "rx"), rd->queue_index);
 
rd->ring->ring_name[RCB_RING_NAME_LEN - 1] = '\0';
 
@@ -1273,12 +1251,17 @@ static int hns_nic_init_irq(struct hns_nic_priv *priv)
return ret;
}
disable_irq(rd->ring->irq);
+
+   cpu = hns_nic_init_affinity_mask(h->q_num, i,
+rd->ring, >mask);
+
+   if (cpu_online(cpu))
+   irq_set_affinity_hint(rd->ring->irq,
+ >mask);
+
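
The ring-to-CPU mapping that hns_nic_init_affinity_mask() computes in the
hunk above can be restated as a plain function for illustration. This is a
sketch under assumptions: demo_ring_cpu() is a hypothetical name, the ring
direction and the possible-CPU count are passed in as arguments instead of
coming from is_tx_ring() and num_possible_cpus(), and rings 0 .. q_num - 1
are taken to be TX with the rest RX, as the indexing in the patch suggests.

```c
#include <assert.h>

/*
 * Return the CPU a ring's interrupt would be hinted to.
 *
 * q_num:          number of queues (there are 2 * q_num rings)
 * ring_idx:       ring index, TX rings first, then RX rings
 * ring_is_tx:     non-zero for a TX ring
 * possible_cpus:  number of possible CPUs in the system
 */
int demo_ring_cpu(int q_num, int ring_idx, int ring_is_tx, int possible_cpus)
{
	int queue;

	assert(q_num > 0 && ring_idx >= 0 && ring_idx < 2 * q_num);

	/* RX rings follow the TX rings, so subtract q_num to get the queue. */
	queue = ring_is_tx ? ring_idx : ring_idx - q_num;

	/* One queue per CPU: TX and RX of a queue share the same CPU. */
	if (q_num == possible_cpus)
		return queue;

	/* Otherwise TX goes to the even CPU and RX to the odd neighbour. */
	return ring_is_tx ? queue * 2 : queue * 2 + 1;
}
```

For example, with 16 queues on a 32-CPU system, TX ring 3 maps to CPU 6 and
its RX counterpart (ring 19) maps to CPU 7.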

[PATCH V3 net-next 02/18] net: hns: Modify GMAC init TX threshold value

2017-03-31 Thread Salil Mehta

From: lipeng 

This patch reduces the GMAC TX threshold value to avoid a GMAC
hang-up at 100M speed with half duplex.

Signed-off-by: lipeng 
Signed-off-by: JinchuanTian 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c | 6 ++
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h  | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 3382441..a8dbe00 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -325,6 +325,12 @@ static void hns_gmac_init(void *mac_drv)
hns_gmac_tx_loop_pkt_dis(mac_drv);
if (drv->mac_cb->mac_type == HNAE_PORT_DEBUG)
hns_gmac_set_uc_match(mac_drv, 0);
+
+   /* reduce gmac tx water line to avoid gmac hang-up
+* in speed 100M and duplex half.
+*/
+   dsaf_set_dev_field(drv, GMAC_TX_WATER_LINE_REG, GMAC_TX_WATER_LINE_MASK,
+  GMAC_TX_WATER_LINE_SHIFT, 8);
 }
 
 void hns_gmac_update_stats(void *mac_drv)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
index 8fa18fc..4b8af68 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h
@@ -466,6 +466,7 @@
 
 #define GMAC_DUPLEX_TYPE_REG   0x0008UL
 #define GMAC_FD_FC_TYPE_REG0x000CUL
+#define GMAC_TX_WATER_LINE_REG 0x0010UL
 #define GMAC_FC_TX_TIMER_REG   0x001CUL
 #define GMAC_FD_FC_ADDR_LOW_REG0x0020UL
 #define GMAC_FD_FC_ADDR_HIGH_REG   0x0024UL
@@ -912,6 +913,9 @@
 
 #define GMAC_DUPLEX_TYPE_B 0
 
+#define GMAC_TX_WATER_LINE_MASK((1UL << 8) - 1)
+#define GMAC_TX_WATER_LINE_SHIFT   0
+
 #define GMAC_FC_TX_TIMER_S 0
 #define GMAC_FC_TX_TIMER_M 0x
 
-- 
2.7.4
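
The water-line update above is a masked field write. As a rough
userspace sketch of that read-modify-write pattern (set_field() is a
hypothetical stand-in written for illustration, not the driver's
dsaf_set_dev_field(), and it assumes the mask is expressed in register
position, as the GMAC_TX_WATER_LINE_MASK definition suggests):

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patch above. */
#define GMAC_TX_WATER_LINE_MASK  ((1UL << 8) - 1)
#define GMAC_TX_WATER_LINE_SHIFT 0

/* Hypothetical helper: replace the bits selected by mask with val and
 * leave every other bit of reg untouched.
 */
static uint32_t set_field(uint32_t reg, uint32_t mask, unsigned int shift,
                          uint32_t val)
{
    assert(shift < 32);

    reg &= ~mask;                  /* clear the old field value */
    reg |= (val << shift) & mask;  /* insert the new one, clipped to the field */
    return reg;
}
```

With the patch's values, only the low 8 bits of the register change;
the upper 24 bits keep whatever the hardware default was.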

[PATCH V3 net-next 03/18] net: hns: Optimize the code for GMAC pad and crc Config

2017-03-31 Thread Salil Mehta

From: lipeng 

This patch moves the GMAC pad and CRC configuration out of
hns_gmac_adjust_link() and into hns_gmac_init(), where it reuses
hns_gmac_config_pad_and_crc().

Signed-off-by: lipeng 
Signed-off-by: JinchuanTian 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c | 36 ++
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index a8dbe00..723f3ae 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -148,6 +148,17 @@ static void hns_gmac_config_max_frame_length(void 
*mac_drv, u16 newval)
   GMAC_MAX_FRM_SIZE_S, newval);
 }
 
+static void hns_gmac_config_pad_and_crc(void *mac_drv, u8 newval)
+{
+   u32 tx_ctrl;
+   struct mac_driver *drv = (struct mac_driver *)mac_drv;
+
+   tx_ctrl = dsaf_read_dev(drv, GMAC_TRANSMIT_CONTROL_REG);
+   dsaf_set_bit(tx_ctrl, GMAC_TX_PAD_EN_B, !!newval);
+   dsaf_set_bit(tx_ctrl, GMAC_TX_CRC_ADD_B, !!newval);
+   dsaf_write_dev(drv, GMAC_TRANSMIT_CONTROL_REG, tx_ctrl);
+}
+
 static void hns_gmac_config_an_mode(void *mac_drv, u8 newval)
 {
struct mac_driver *drv = (struct mac_driver *)mac_drv;
@@ -250,7 +261,6 @@ static void hns_gmac_get_pausefrm_cfg(void *mac_drv, u32 
*rx_pause_en,
 static int hns_gmac_adjust_link(void *mac_drv, enum mac_speed speed,
u32 full_duplex)
 {
-   u32 tx_ctrl;
struct mac_driver *drv = (struct mac_driver *)mac_drv;
 
dsaf_set_dev_bit(drv, GMAC_DUPLEX_TYPE_REG,
@@ -279,14 +289,6 @@ static int hns_gmac_adjust_link(void *mac_drv, enum 
mac_speed speed,
return -EINVAL;
}
 
-   tx_ctrl = dsaf_read_dev(drv, GMAC_TRANSMIT_CONTROL_REG);
-   dsaf_set_bit(tx_ctrl, GMAC_TX_PAD_EN_B, 1);
-   dsaf_set_bit(tx_ctrl, GMAC_TX_CRC_ADD_B, 1);
-   dsaf_write_dev(drv, GMAC_TRANSMIT_CONTROL_REG, tx_ctrl);
-
-   dsaf_set_dev_bit(drv, GMAC_MODE_CHANGE_EN_REG,
-GMAC_MODE_CHANGE_EB_B, 1);
-
return 0;
 }
 
@@ -326,6 +328,11 @@ static void hns_gmac_init(void *mac_drv)
if (drv->mac_cb->mac_type == HNAE_PORT_DEBUG)
hns_gmac_set_uc_match(mac_drv, 0);
 
+   hns_gmac_config_pad_and_crc(mac_drv, 1);
+
+   dsaf_set_dev_bit(drv, GMAC_MODE_CHANGE_EN_REG,
+GMAC_MODE_CHANGE_EB_B, 1);
+
/* reduce gmac tx water line to avoid gmac hang-up
 * in speed 100M and duplex half.
 */
@@ -459,17 +466,6 @@ static int hns_gmac_config_loopback(void *mac_drv, enum 
hnae_loop loop_mode,
return 0;
 }
 
-static void hns_gmac_config_pad_and_crc(void *mac_drv, u8 newval)
-{
-   u32 tx_ctrl;
-   struct mac_driver *drv = (struct mac_driver *)mac_drv;
-
-   tx_ctrl = dsaf_read_dev(drv, GMAC_TRANSMIT_CONTROL_REG);
-   dsaf_set_bit(tx_ctrl, GMAC_TX_PAD_EN_B, !!newval);
-   dsaf_set_bit(tx_ctrl, GMAC_TX_CRC_ADD_B, !!newval);
-   dsaf_write_dev(drv, GMAC_TRANSMIT_CONTROL_REG, tx_ctrl);
-}
-
 static void hns_gmac_get_id(void *mac_drv, u8 *mac_id)
 {
struct mac_driver *drv = (struct mac_driver *)mac_drv;
-- 
2.7.4

[PATCH V3 net-next 04/18] net: hns: Remove redundant memset during buffer release

2017-03-31 Thread Salil Mehta

From: lipeng 

All members of desc_cb are assigned each time a packet is
transmitted, so the memset in hnae_free_buffer() can be removed:
- "dma, priv, length, type" are assigned in fill_v2_desc.
- "page_offset, reuse_flag, buf" are not used in tx direction.

Signed-off-by: lipeng 
Signed-off-by: Weiwei Deng 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hnae.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c 
b/drivers/net/ethernet/hisilicon/hns/hnae.c
index b6ed818..78af663 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -61,7 +61,6 @@ static void hnae_free_buffer(struct hnae_ring *ring, struct 
hnae_desc_cb *cb)
dev_kfree_skb_any((struct sk_buff *)cb->priv);
else if (unlikely(is_rx_ring(ring)))
put_page((struct page *)cb->priv);
-   memset(cb, 0, sizeof(*cb));
 }
 
 static int hnae_map_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb)
-- 
2.7.4

[PATCH V3 net-next 06/18] net: hns: Optimize hns_nic_common_poll for better performance

2017-03-31 Thread Salil Mehta

From: lipeng 

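After polling fewer than budget packets, we need to check the ring
again. If packets are still pending, the current code calls
napi_schedule() to requeue the softirq, which is not the best
approach. Instead of calling napi_schedule(), we poll again within
the remaining budget and return the budget value once it is used up.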

Signed-off-by: lipeng 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 50 ---
 drivers/net/ethernet/hisilicon/hns/hns_enet.h |  2 +-
 2 files changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 73ec8c8..5f67db2 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -859,7 +859,7 @@ static int hns_nic_rx_poll_one(struct hns_nic_ring_data 
*ring_data,
return recv_pkts;
 }
 
-static void hns_nic_rx_fini_pro(struct hns_nic_ring_data *ring_data)
+static bool hns_nic_rx_fini_pro(struct hns_nic_ring_data *ring_data)
 {
struct hnae_ring *ring = ring_data->ring;
int num = 0;
@@ -873,22 +873,23 @@ static void hns_nic_rx_fini_pro(struct hns_nic_ring_data 
*ring_data)
ring_data->ring->q->handle->dev->ops->toggle_ring_irq(
ring_data->ring, 1);
 
-   napi_schedule(&ring_data->napi);
+   return false;
+   } else {
+   return true;
}
 }
 
-static void hns_nic_rx_fini_pro_v2(struct hns_nic_ring_data *ring_data)
+static bool hns_nic_rx_fini_pro_v2(struct hns_nic_ring_data *ring_data)
 {
struct hnae_ring *ring = ring_data->ring;
-   int num = 0;
+   int num;
 
num = readl_relaxed(ring->io_base + RCB_REG_FBDNUM);
 
-   if (num == 0)
-   ring_data->ring->q->handle->dev->ops->toggle_ring_irq(
-   ring, 0);
+   if (!num)
+   return true;
else
-   napi_schedule(&ring_data->napi);
+   return false;
 }
 
 static inline void hns_nic_reclaim_one_desc(struct hnae_ring *ring,
@@ -989,7 +990,7 @@ static int hns_nic_tx_poll_one(struct hns_nic_ring_data 
*ring_data,
return 0;
 }
 
-static void hns_nic_tx_fini_pro(struct hns_nic_ring_data *ring_data)
+static bool hns_nic_tx_fini_pro(struct hns_nic_ring_data *ring_data)
 {
struct hnae_ring *ring = ring_data->ring;
int head;
@@ -1002,20 +1003,21 @@ static void hns_nic_tx_fini_pro(struct 
hns_nic_ring_data *ring_data)
ring_data->ring->q->handle->dev->ops->toggle_ring_irq(
ring_data->ring, 1);
 
-   napi_schedule(&ring_data->napi);
+   return false;
+   } else {
+   return true;
}
 }
 
-static void hns_nic_tx_fini_pro_v2(struct hns_nic_ring_data *ring_data)
+static bool hns_nic_tx_fini_pro_v2(struct hns_nic_ring_data *ring_data)
 {
struct hnae_ring *ring = ring_data->ring;
int head = readl_relaxed(ring->io_base + RCB_REG_HEAD);
 
if (head == ring->next_to_clean)
-   ring_data->ring->q->handle->dev->ops->toggle_ring_irq(
-   ring, 0);
+   return true;
else
-   napi_schedule(&ring_data->napi);
+   return false;
 }
 
 static void hns_nic_tx_clr_all_bufs(struct hns_nic_ring_data *ring_data)
@@ -1042,15 +1044,23 @@ static void hns_nic_tx_clr_all_bufs(struct 
hns_nic_ring_data *ring_data)
 
 static int hns_nic_common_poll(struct napi_struct *napi, int budget)
 {
+   int clean_complete = 0;
struct hns_nic_ring_data *ring_data =
container_of(napi, struct hns_nic_ring_data, napi);
-   int clean_complete = ring_data->poll_one(
-   ring_data, budget, ring_data->ex_process);
+   struct hnae_ring *ring = ring_data->ring;
 
-   if (clean_complete >= 0 && clean_complete < budget) {
-   napi_complete(napi);
-   ring_data->fini_process(ring_data);
-   return 0;
+try_again:
+   clean_complete += ring_data->poll_one(
+   ring_data, budget - clean_complete,
+   ring_data->ex_process);
+
+   if (clean_complete < budget) {
+   if (ring_data->fini_process(ring_data)) {
+   napi_complete(napi);
+   ring->q->handle->dev->ops->toggle_ring_irq(ring, 0);
+   } else {
+   goto try_again;
+   }
}
 
return clean_complete;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.h 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.h
index fff8f8a..1b83232 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.h
@@ -41,7 +41,7 @@ struct hns_nic_ring_data {
int queue_index;
int (*poll_one)(struct hns_nic_ring_data *, int,
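
The reworked hns_nic_common_poll() keeps polling until either the
budget is used up or the fini check reports an empty ring. A small
userspace simulation of that control flow may help when reading the
hunk above. All sim_* names are hypothetical, and "late_arrivals"
simply models packets that land between the poll and the fini check;
none of this is driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hardware ring. */
struct sim_ring {
    int pending;        /* packets waiting in the ring */
    int late_arrivals;  /* packets that show up during the fini check */
    bool irq_enabled;   /* set once polling is declared complete */
};

/* Consume at most budget packets and report how many were handled. */
static int sim_poll_one(struct sim_ring *r, int budget)
{
    int n = r->pending < budget ? r->pending : budget;

    r->pending -= n;
    return n;
}

/* Return true when the ring is really empty. */
static bool sim_fini(struct sim_ring *r)
{
    r->pending += r->late_arrivals;
    r->late_arrivals = 0;
    return r->pending == 0;
}

/* Poll again instead of rescheduling; re-enable the IRQ only when done. */
static int sim_common_poll(struct sim_ring *r, int budget)
{
    int done = 0;

    assert(budget > 0);

    for (;;) {
        done += sim_poll_one(r, budget - done);
        if (done >= budget)
            break;              /* budget used up: stay in polling mode */
        if (sim_fini(r)) {
            r->irq_enabled = true;
            break;
        }
    }

    return done;
}
```

Returning the full budget leaves the instance scheduled, which is what
lets the patch drop the explicit napi_schedule() calls.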

[PATCH V3 net-next 05/18] net: hns: bug fix of ethtool show the speed

2017-03-31 Thread Salil Mehta

From: Daode Huang 

When running "ethtool ethX" on the hns driver, the speed is shown
as "Unknown" because base.speed is not assigned correctly.
This patch fixes the bug.

Signed-off-by: Daode Huang 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 3ac2183..3404eac 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -146,7 +146,7 @@ static int hns_nic_get_link_ksettings(struct net_device 
*net_dev,
 
/* When there is no phy, autoneg is off. */
cmd->base.autoneg = false;
-   cmd->base.cmd = speed;
+   cmd->base.speed = speed;
cmd->base.duplex = duplex;
 
if (net_dev->phydev)
-- 
2.7.4

[PATCH V3 net-next 07/18] net: hns: Fix to adjust buf_size of ring according to mtu

2017-03-31 Thread Salil Mehta

From: lipeng 

When the ring buf_size is set to 2048, rx_poll_one can reuse the
page, which improves XGE performance. But the chip supports only
three BDs per packet, so the maximum MTU is 6K when buf_size is
2048. For better performance with small MTUs, we need to change
buf_size according to the MTU.

When the user changes the MTU, hns only changes the descriptors in
memory. Some descriptors have already been fetched by the chip and
cannot be changed by the code. So the port has to be put into
loopback and some packets sent, so that the chip consumes the stale
descriptors and fetches new ones.
Because the Pv660 does not support RSS indirection, we need to add a
version check to the MTU change process.

Signed-off-by: lipeng 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hnae.h |  37 
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c |  26 ++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c |   3 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h |   2 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c |  41 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h |   3 +
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 249 +-
 7 files changed, 337 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h 
b/drivers/net/ethernet/hisilicon/hns/hnae.h
index 8016854..c66581d 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -67,6 +67,8 @@ do { \
 #define AE_IS_VER1(ver) ((ver) == AE_VERSION_1)
 #define AE_NAME_SIZE 16
 
+#define BD_SIZE_2048_MAX_MTU   6000
+
 /* some said the RX and TX RCB format should not be the same in the future. But
  * it is the same now...
  */
@@ -646,6 +648,41 @@ static inline void hnae_reuse_buffer(struct hnae_ring 
*ring, int i)
ring->desc[i].rx.ipoff_bnum_pid_flag = 0;
 }
 
+/* when reinit buffer size, we should reinit buffer description */
+static inline void hnae_reinit_all_ring_desc(struct hnae_handle *h)
+{
+   int i, j;
+   struct hnae_ring *ring;
+
+   for (i = 0; i < h->q_num; i++) {
+   ring = &h->qs[i]->rx_ring;
+   for (j = 0; j < ring->desc_num; j++)
+   ring->desc[j].addr = cpu_to_le64(ring->desc_cb[j].dma);
+   }
+
+   wmb();  /* commit all data before submit */
+}
+
+/* when reinit buffer size, we should reinit page offset */
+static inline void hnae_reinit_all_ring_page_off(struct hnae_handle *h)
+{
+   int i, j;
+   struct hnae_ring *ring;
+
+   for (i = 0; i < h->q_num; i++) {
+   ring = &h->qs[i]->rx_ring;
+   for (j = 0; j < ring->desc_num; j++) {
+   ring->desc_cb[j].page_offset = 0;
+   if (ring->desc[j].addr !=
+   cpu_to_le64(ring->desc_cb[j].dma))
+   ring->desc[j].addr =
+   cpu_to_le64(ring->desc_cb[j].dma);
+   }
+   }
+
+   wmb();  /* commit all data before submit */
+}
+
 #define hnae_set_field(origin, mask, shift, val) \
do { \
(origin) &= (~(mask)); \
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index 0a9cdf0..cd7e88e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -267,8 +267,32 @@ static int hns_ae_clr_multicast(struct hnae_handle *handle)
 static int hns_ae_set_mtu(struct hnae_handle *handle, int new_mtu)
 {
struct hns_mac_cb *mac_cb = hns_get_mac_cb(handle);
+   struct hnae_queue *q;
+   u32 rx_buf_size;
+   int i, ret;
+
+   /* when buf_size is 2048, max mtu is 6K for rx ring max bd num is 3. */
+   if (!AE_IS_VER1(mac_cb->dsaf_dev->dsaf_ver)) {
+   if (new_mtu <= BD_SIZE_2048_MAX_MTU)
+   rx_buf_size = 2048;
+   else
+   rx_buf_size = 4096;
+   } else {
+   rx_buf_size = mac_cb->dsaf_dev->buf_size;
+   }
+
+   ret = hns_mac_set_mtu(mac_cb, new_mtu, rx_buf_size);
 
-   return hns_mac_set_mtu(mac_cb, new_mtu);
+   if (!ret) {
+   /* reinit ring buf_size */
+   for (i = 0; i < handle->q_num; i++) {
+   q = handle->qs[i];
+   q->rx_ring.buf_size = rx_buf_size;
+   hns_rcb_set_rx_ring_bs(q, rx_buf_size);
+   }
+   }
+
+   return ret;
 }
 
 static void hns_ae_set_tso_stats(struct hnae_handle *handle, int enable)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index 3239d27..edf9a23 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@
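
The buffer-size choice made in hns_ae_set_mtu() above can be read as a
small pure function. The sketch below restates it in userspace C;
pick_rx_buf_size() and its parameters are hypothetical names used only
for illustration, and the 2048/4096 values and the 6000-byte threshold
are taken from the hunks shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BD_SIZE_2048_MAX_MTU 6000  /* from the patch */

/* Hypothetical restatement of the selection logic:
 * - version 1 hardware keeps its preset buffer size;
 * - newer hardware uses 2048-byte buffers while the MTU still fits in
 *   three BDs, and 4096-byte buffers otherwise.
 */
static uint32_t pick_rx_buf_size(bool is_ver1, int new_mtu,
                                 uint32_t ver1_buf_size)
{
    assert(new_mtu > 0);

    if (is_ver1)
        return ver1_buf_size;

    return new_mtu <= BD_SIZE_2048_MAX_MTU ? 2048 : 4096;
}
```

Note that the threshold is inclusive: an MTU of exactly 6000 still
selects the 2048-byte buffers.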

[PATCH V3 net-next 08/18] net: hns: Replace netif_tx_lock to ring spin lock

2017-03-31 Thread Salil Mehta

From: lipeng 

netif_tx_lock is a global spin lock that affects all rings in the
netdevice. The tx_poll_one process only needs to lock the current
ring, so we define a spin lock in struct hnae_ring for that
purpose.

Signed-off-by: lipeng 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hnae.c |  1 +
 drivers/net/ethernet/hisilicon/hns/hnae.h |  3 +++
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 21 +++--
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c 
b/drivers/net/ethernet/hisilicon/hns/hnae.c
index 78af663..513c257 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -196,6 +196,7 @@ hnae_init_ring(struct hnae_queue *q, struct hnae_ring 
*ring, int flags)
 
ring->q = q;
ring->flags = flags;
+   spin_lock_init(&ring->lock);
assert(!ring->desc && !ring->desc_cb && !ring->desc_dma_addr);
 
/* not matter for tx or rx ring, the ntc and ntc start from 0 */
diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h 
b/drivers/net/ethernet/hisilicon/hns/hnae.h
index c66581d..859c536 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -275,6 +275,9 @@ struct hnae_ring {
/* statistic */
struct ring_stats stats;
 
+   /* ring lock for poll one */
+   spinlock_t lock;
+
dma_addr_t desc_dma_addr;
u32 buf_size;   /* size for hnae_desc->addr, preset by AE */
u16 desc_num;   /* total number of desc */
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 3449a18..3634366 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -922,12 +922,13 @@ static int is_valid_clean_head(struct hnae_ring *ring, 
int h)
 
 /* netif_tx_lock will turn down the performance, set only when necessary */
 #ifdef CONFIG_NET_POLL_CONTROLLER
-#define NETIF_TX_LOCK(ndev) netif_tx_lock(ndev)
-#define NETIF_TX_UNLOCK(ndev) netif_tx_unlock(ndev)
+#define NETIF_TX_LOCK(ring) spin_lock(&ring->lock)
+#define NETIF_TX_UNLOCK(ring) spin_unlock(&ring->lock)
 #else
-#define NETIF_TX_LOCK(ndev)
-#define NETIF_TX_UNLOCK(ndev)
+#define NETIF_TX_LOCK(ring)
+#define NETIF_TX_UNLOCK(ring)
 #endif
+
 /* reclaim all desc in one budget
  * return error or number of desc left
  */
@@ -941,13 +942,13 @@ static int hns_nic_tx_poll_one(struct hns_nic_ring_data 
*ring_data,
int head;
int bytes, pkts;
 
-   NETIF_TX_LOCK(ndev);
+   NETIF_TX_LOCK(ring);
 
head = readl_relaxed(ring->io_base + RCB_REG_HEAD);
rmb(); /* make sure head is ready before touch any data */
 
if (is_ring_empty(ring) || head == ring->next_to_clean) {
-   NETIF_TX_UNLOCK(ndev);
+   NETIF_TX_UNLOCK(ring);
return 0; /* no data to poll */
}
 
@@ -955,7 +956,7 @@ static int hns_nic_tx_poll_one(struct hns_nic_ring_data 
*ring_data,
netdev_err(ndev, "wrong head (%d, %d-%d)\n", head,
   ring->next_to_use, ring->next_to_clean);
ring->stats.io_err_cnt++;
-   NETIF_TX_UNLOCK(ndev);
+   NETIF_TX_UNLOCK(ring);
return -EIO;
}
 
@@ -967,7 +968,7 @@ static int hns_nic_tx_poll_one(struct hns_nic_ring_data 
*ring_data,
prefetch(&ring->desc_cb[ring->next_to_clean]);
}
 
-   NETIF_TX_UNLOCK(ndev);
+   NETIF_TX_UNLOCK(ring);
 
dev_queue = netdev_get_tx_queue(ndev, ring_data->queue_index);
netdev_tx_completed_queue(dev_queue, pkts, bytes);
@@ -1028,7 +1029,7 @@ static void hns_nic_tx_clr_all_bufs(struct 
hns_nic_ring_data *ring_data)
int head;
int bytes, pkts;
 
-   NETIF_TX_LOCK(ndev);
+   NETIF_TX_LOCK(ring);
 
head = ring->next_to_use; /* ntu :soft setted ring position*/
bytes = 0;
@@ -1036,7 +1037,7 @@ static void hns_nic_tx_clr_all_bufs(struct 
hns_nic_ring_data *ring_data)
while (head != ring->next_to_clean)
hns_nic_reclaim_one_desc(ring, &bytes, &pkts);
 
-   NETIF_TX_UNLOCK(ndev);
+   NETIF_TX_UNLOCK(ring);
 
dev_queue = netdev_get_tx_queue(ndev, ring_data->queue_index);
netdev_tx_reset_queue(dev_queue);
-- 
2.7.4
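
To illustrate why a per-ring lock scales better than one device-wide
lock, here is a userspace analogy built on C11 atomic_flag. It is only
a sketch: the kernel's spinlock_t is a different mechanism, and every
sim_* name below is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring carrying its own lock, mirroring the new
 * "lock" member added to struct hnae_ring.
 */
struct sim_ring {
    atomic_flag lock;
    int next_to_clean;
};

static void sim_ring_init(struct sim_ring *r)
{
    assert(r != NULL);
    atomic_flag_clear(&r->lock);
    r->next_to_clean = 0;
}

/* Try to take this ring's lock; true means the caller now owns it. */
static bool sim_ring_trylock(struct sim_ring *r)
{
    return !atomic_flag_test_and_set_explicit(&r->lock,
                                              memory_order_acquire);
}

static void sim_ring_unlock(struct sim_ring *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}
```

Holding the lock of one ring does not prevent another ring from being
locked, whereas a single device-wide lock would serialise both.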

[PATCH V3 net-next 11/18] net: hns: Remove redundant mac_get_id()

2017-03-31 Thread Salil Mehta

From: Kejian Yan 

There is a mac_id in mac control block structure, so the callback
function mac_get_id() is useless. Here we remove this function.

Reported-by: Weiwei Deng 
Signed-off-by: Kejian Yan 
Reviewed-by: Salil Mehta 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c  |  8 
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h   |  2 --
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c | 13 -
 3 files changed, 23 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 723f3ae..035db86 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -466,13 +466,6 @@ static int hns_gmac_config_loopback(void *mac_drv, enum 
hnae_loop loop_mode,
return 0;
 }
 
-static void hns_gmac_get_id(void *mac_drv, u8 *mac_id)
-{
-   struct mac_driver *drv = (struct mac_driver *)mac_drv;
-
-   *mac_id = drv->mac_id;
-}
-
 static void hns_gmac_get_info(void *mac_drv, struct mac_info *mac_info)
 {
enum hns_gmac_duplex_mdoe duplex;
@@ -714,7 +707,6 @@ void *hns_gmac_config(struct hns_mac_cb *mac_cb, struct 
mac_params *mac_param)
mac_drv->config_pad_and_crc = hns_gmac_config_pad_and_crc;
mac_drv->config_half_duplex = hns_gmac_set_duplex_type;
mac_drv->set_rx_ignore_pause_frames = hns_gmac_set_rx_auto_pause_frames;
-   mac_drv->mac_get_id = hns_gmac_get_id;
mac_drv->get_info = hns_gmac_get_info;
mac_drv->autoneg_stat = hns_gmac_autoneg_stat;
mac_drv->get_pause_enable = hns_gmac_get_pausefrm_cfg;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
index e6842c9..24dfba5 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
@@ -373,8 +373,6 @@ struct mac_driver {
void (*set_rx_ignore_pause_frames)(void *mac_drv, u32 enable);
/* config rx mode for promiscuous*/
void (*set_promiscuous)(void *mac_drv, u8 enable);
-   /* get mac id */
-   void (*mac_get_id)(void *mac_drv, u8 *mac_id);
void (*mac_pausefrm_cfg)(void *mac_drv, u32 rx_en, u32 tx_en);
 
void (*autoneg_stat)(void *mac_drv, u32 *enable);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
index aae830a..37a2fc3 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
@@ -300,18 +300,6 @@ static void hns_xgmac_set_tx_auto_pause_frames(void 
*mac_drv, u16 enable)
 }
 
 /**
- *hns_xgmac_get_id - get xgmac port id
- *@mac_drv: mac driver
- *@newval:xgmac max frame length
- */
-static void hns_xgmac_get_id(void *mac_drv, u8 *mac_id)
-{
-   struct mac_driver *drv = (struct mac_driver *)mac_drv;
-
-   *mac_id = drv->mac_id;
-}
-
-/**
  *hns_xgmac_config_max_frame_length - set xgmac max frame length
  *@mac_drv: mac driver
  *@newval:xgmac max frame length
@@ -833,7 +821,6 @@ void *hns_xgmac_config(struct hns_mac_cb *mac_cb, struct 
mac_params *mac_param)
mac_drv->config_half_duplex = NULL;
mac_drv->set_rx_ignore_pause_frames =
hns_xgmac_set_rx_ignore_pause_frames;
-   mac_drv->mac_get_id = hns_xgmac_get_id;
mac_drv->mac_free = hns_xgmac_free;
mac_drv->adjust_link = NULL;
mac_drv->set_tx_auto_pause_frames = hns_xgmac_set_tx_auto_pause_frames;
-- 
2.7.4

[PATCH V3 net-next 13/18] net: hns: Clean redundant code from hns_mdio.c file

2017-03-31 Thread Salil Mehta

From: Kejian Yan 

This patch removes redundant code from hns_mdio.c.

Reported-by: Ping Zhang 
Signed-off-by: Kejian Yan 
Reviewed-by: Salil Mehta 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns_mdio.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns_mdio.c 
b/drivers/net/ethernet/hisilicon/hns_mdio.c
index 501eb20..fad1c5b 100644
--- a/drivers/net/ethernet/hisilicon/hns_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns_mdio.c
@@ -23,17 +23,9 @@
 #include 
 #include 
 #include 
-#include 
 
 #define MDIO_DRV_NAME "Hi-HNS_MDIO"
 #define MDIO_BUS_NAME "Hisilicon MII Bus"
-#define MDIO_DRV_VERSION "1.3.0"
-#define MDIO_COPYRIGHT "Copyright(c) 2015 Huawei Corporation."
-#define MDIO_DRV_STRING MDIO_BUS_NAME
-#define MDIO_DEFAULT_DEVICE_DESCR MDIO_BUS_NAME
-
-#define MDIO_CTL_DEV_ADDR(x)   (x & 0x1f)
-#define MDIO_CTL_PORT_ADDR(x)  ((x & 0x1f) << 5)
 
 #define MDIO_TIMEOUT   100
 
@@ -64,9 +56,7 @@ struct hns_mdio_device {
 #define MDIO_CMD_DEVAD_S   0
 #define MDIO_CMD_PRTAD_M   0x1f
 #define MDIO_CMD_PRTAD_S   5
-#define MDIO_CMD_OP_M  0x3
 #define MDIO_CMD_OP_S  10
-#define MDIO_CMD_ST_M  0x3
 #define MDIO_CMD_ST_S  12
 #define MDIO_CMD_START_B   14
 
-- 
2.7.4

[PATCH V3 net-next 09/18] net: hns: Correct HNS RSS key set function

2017-03-31 Thread Salil Mehta

From: lipeng 

This patch fixes the following ethtool configuration error:

localhost:~ # ethtool -X eth0 hkey XX:XX:XX...
Cannot set Rx flow hash configuration: Operation not supported

Signed-off-by: lipeng 
Reviewed-by: Yisen Zhuang 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c | 23 ++-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c  |  9 -
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index cd7e88e..f0142e5 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -826,8 +826,9 @@ static int hns_ae_get_rss(struct hnae_handle *handle, u32 
*indir, u8 *key,
memcpy(key, ppe_cb->rss_key, HNS_PPEV2_RSS_KEY_SIZE);
 
/* update the current hash->queue mappings from the shadow RSS table */
-   memcpy(indir, ppe_cb->rss_indir_table,
-  HNS_PPEV2_RSS_IND_TBL_SIZE * sizeof(*indir));
+   if (indir)
+   memcpy(indir, ppe_cb->rss_indir_table,
+  HNS_PPEV2_RSS_IND_TBL_SIZE  * sizeof(*indir));
 
return 0;
 }
@@ -838,15 +839,19 @@ static int hns_ae_set_rss(struct hnae_handle *handle, 
const u32 *indir,
struct hns_ppe_cb *ppe_cb = hns_get_ppe_cb(handle);
 
/* set the RSS Hash Key if specififed by the user */
-   if (key)
-   hns_ppe_set_rss_key(ppe_cb, (u32 *)key);
+   if (key) {
+   memcpy(ppe_cb->rss_key, key, HNS_PPEV2_RSS_KEY_SIZE);
+   hns_ppe_set_rss_key(ppe_cb, ppe_cb->rss_key);
+   }
 
-   /* update the shadow RSS table with user specified qids */
-   memcpy(ppe_cb->rss_indir_table, indir,
-  HNS_PPEV2_RSS_IND_TBL_SIZE * sizeof(*indir));
+   if (indir) {
+   /* update the shadow RSS table with user specified qids */
+   memcpy(ppe_cb->rss_indir_table, indir,
+  HNS_PPEV2_RSS_IND_TBL_SIZE  * sizeof(*indir));
 
-   /* now update the hardware */
-   hns_ppe_set_indir_table(ppe_cb, ppe_cb->rss_indir_table);
+   /* now update the hardware */
+   hns_ppe_set_indir_table(ppe_cb, ppe_cb->rss_indir_table);
+   }
 
return 0;
 }
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 3404eac..3a2a342 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -1244,6 +1244,7 @@ hns_set_rss(struct net_device *netdev, const u32 *indir, 
const u8 *key,
 {
struct hns_nic_priv *priv = netdev_priv(netdev);
struct hnae_ae_ops *ops;
+   int ret;
 
if (AE_IS_VER1(priv->enet_ver)) {
netdev_err(netdev,
@@ -1253,12 +1254,10 @@ hns_set_rss(struct net_device *netdev, const u32 
*indir, const u8 *key,
 
ops = priv->ae_handle->dev->ops;
 
-   /* currently hfunc can only be Toeplitz hash */
-   if (key ||
-   (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP))
+   if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP) {
+   netdev_err(netdev, "Invalid hfunc!\n");
return -EOPNOTSUPP;
-   if (!indir)
-   return 0;
+   }
 
return ops->set_rss(priv->ae_handle, indir, key, hfunc);
 }
-- 
2.7.4
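
The fix above makes both arguments of the set_rss path optional: a
NULL key or a NULL indirection table means "leave that part alone". A
minimal userspace sketch of that contract follows. The sim_* names,
the two size constants and the write counters are assumptions made up
for this example; they are not the driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_RSS_KEY_SIZE     40   /* assumed size, for illustration only */
#define SIM_RSS_IND_TBL_SIZE 256  /* assumed size, for illustration only */

/* Hypothetical shadow state; the counters stand in for hardware writes. */
struct sim_ppe {
    uint8_t  rss_key[SIM_RSS_KEY_SIZE];
    uint32_t rss_indir_table[SIM_RSS_IND_TBL_SIZE];
    int hw_key_writes;
    int hw_indir_writes;
};

/* Update only the parts the caller supplied. */
static int sim_set_rss(struct sim_ppe *ppe, const uint32_t *indir,
                       const uint8_t *key)
{
    assert(ppe != NULL);

    if (key) {
        /* refresh the shadow copy, then "program" the hardware */
        memcpy(ppe->rss_key, key, SIM_RSS_KEY_SIZE);
        ppe->hw_key_writes++;
    }

    if (indir) {
        memcpy(ppe->rss_indir_table, indir,
               sizeof(ppe->rss_indir_table));
        ppe->hw_indir_writes++;
    }

    return 0;
}
```

Keeping the shadow key in sync matters because the get path copies the
key back from the shadow copy rather than from the hardware.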

[PATCH V3 net-next 12/18] net: hns: Remove redundant mac table operations

2017-03-31 Thread Salil Mehta

From: Kejian Yan 

This patch removes redundant functions used only for debugging
purposes.

Reported-by: Weiwei Deng 
Signed-off-by: Kejian Yan 
Reviewed-by: Salil Mehta 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 160 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |  10 --
 2 files changed, 170 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index 6a069ff..abd8aec 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -2008,166 +2008,6 @@ int hns_dsaf_clr_mac_mc_port(struct dsaf_device 
*dsaf_dev, u8 mac_id,
return ret;
 }
 
-/**
- * hns_dsaf_get_mac_uc_entry - get mac uc entry
- * @dsaf_dev: dsa fabric device struct pointer
- * @mac_entry: mac entry
- */
-int hns_dsaf_get_mac_uc_entry(struct dsaf_device *dsaf_dev,
- struct dsaf_drv_mac_single_dest_entry *mac_entry)
-{
-   u16 entry_index = DSAF_INVALID_ENTRY_IDX;
-   struct dsaf_drv_tbl_tcam_key mac_key;
-
-   struct dsaf_tbl_tcam_ucast_cfg mac_data;
-   struct dsaf_tbl_tcam_data tcam_data;
-
-   /* check macaddr */
-   if (MAC_IS_ALL_ZEROS(mac_entry->addr) ||
-   MAC_IS_BROADCAST(mac_entry->addr)) {
-   dev_err(dsaf_dev->dev, "get_entry failed,addr %pM\n",
-   mac_entry->addr);
-   return -EINVAL;
-   }
-
-   /*config key */
-   hns_dsaf_set_mac_key(dsaf_dev, &mac_key, mac_entry->in_vlan_id,
-mac_entry->in_port_num, mac_entry->addr);
-
-   /*check exist? */
-   entry_index = hns_dsaf_find_soft_mac_entry(dsaf_dev, &mac_key);
-   if (entry_index == DSAF_INVALID_ENTRY_IDX) {
-   /*find none, error */
-   dev_err(dsaf_dev->dev,
-   "get_uc_entry failed, %s Mac key(%#x:%#x)\n",
-   dsaf_dev->ae_dev.name,
-   mac_key.high.val, mac_key.low.val);
-   return -EINVAL;
-   }
-   dev_dbg(dsaf_dev->dev,
-   "get_uc_entry, %s Mac key(%#x:%#x) entry_index%d\n",
-   dsaf_dev->ae_dev.name, mac_key.high.val,
-   mac_key.low.val, entry_index);
-
-   /* read entry */
-   hns_dsaf_tcam_uc_get(dsaf_dev, entry_index, &tcam_data, &mac_data);
-
-   mac_key.high.val = le32_to_cpu(tcam_data.tbl_tcam_data_high);
-   mac_key.low.val = le32_to_cpu(tcam_data.tbl_tcam_data_low);
-
-   mac_entry->port_num = mac_data.tbl_ucast_out_port;
-
-   return 0;
-}
-
-/**
- * hns_dsaf_get_mac_mc_entry - get mac mc entry
- * @dsaf_dev: dsa fabric device struct pointer
- * @mac_entry: mac entry
- */
-int hns_dsaf_get_mac_mc_entry(struct dsaf_device *dsaf_dev,
- struct dsaf_drv_mac_multi_dest_entry *mac_entry)
-{
-   u16 entry_index = DSAF_INVALID_ENTRY_IDX;
-   struct dsaf_drv_tbl_tcam_key mac_key;
-
-   struct dsaf_tbl_tcam_mcast_cfg mac_data;
-   struct dsaf_tbl_tcam_data tcam_data;
-
-   /*check mac addr */
-   if (MAC_IS_ALL_ZEROS(mac_entry->addr) ||
-   MAC_IS_BROADCAST(mac_entry->addr)) {
-   dev_err(dsaf_dev->dev, "get_entry failed,addr %pM\n",
-   mac_entry->addr);
-   return -EINVAL;
-   }
-
-   /*config key */
-   hns_dsaf_set_mac_key(dsaf_dev, &mac_key, mac_entry->in_vlan_id,
-mac_entry->in_port_num, mac_entry->addr);
-
-   /*check exist? */
-   entry_index = hns_dsaf_find_soft_mac_entry(dsaf_dev, &mac_key);
-   if (entry_index == DSAF_INVALID_ENTRY_IDX) {
-   /* find none, error */
-   dev_err(dsaf_dev->dev,
-   "get_mac_uc_entry failed, %s Mac key(%#x:%#x)\n",
-   dsaf_dev->ae_dev.name, mac_key.high.val,
-   mac_key.low.val);
-   return -EINVAL;
-   }
-   dev_dbg(dsaf_dev->dev,
-   "get_mac_uc_entry, %s Mac key(%#x:%#x) entry_index%d\n",
-   dsaf_dev->ae_dev.name, mac_key.high.val,
-   mac_key.low.val, entry_index);
-
-   /*read entry */
-   hns_dsaf_tcam_mc_get(dsaf_dev, entry_index, &tcam_data, &mac_data);
-
-   mac_key.high.val = le32_to_cpu(tcam_data.tbl_tcam_data_high);
-   mac_key.low.val = le32_to_cpu(tcam_data.tbl_tcam_data_low);
-
-   mac_entry->port_mask[0] = mac_data.tbl_mcast_port_msk[0] & 0x3F;
-   return 0;
-}
-
-/**
- * hns_dsaf_get_mac_entry_by_index - get mac entry by tab index
- * @dsaf_dev: dsa fabric device struct pointer
- * @entry_index: tab entry index
- * @mac_entry: mac entry
- */
-int hns_dsaf_get_mac_entry_by_index(
-   struct dsaf_device *dsaf_dev,
-   u16 entry_index, struct dsaf_drv_mac_multi_dest_entry

[PATCH V3 net-next 15/18] net: hns: Simplify the exception sequence in hns_ppe_init()

2017-03-31 Thread Salil Mehta

From: Kejian Yan 

If PPE initialization fails for any reason, all PPE submodules must
be freed, so this patch frees all PPE resources before
hns_ppe_init() returns an error.

Reported-by: JinchuanTian 
Signed-off-by: Kejian Yan 
Reviewed-by: Salil Mehta 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c 
b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index 6ea8722..eba406b 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -496,17 +496,17 @@ void hns_ppe_get_stats(struct hns_ppe_cb *ppe_cb, u64 
*data)
  */
 int hns_ppe_init(struct dsaf_device *dsaf_dev)
 {
-   int i, k;
int ret;
+   int i;
 
for (i = 0; i < HNS_PPE_COM_NUM; i++) {
ret = hns_ppe_common_get_cfg(dsaf_dev, i);
if (ret)
-   goto get_ppe_cfg_fail;
+   goto get_cfg_fail;
 
ret = hns_rcb_common_get_cfg(dsaf_dev, i);
if (ret)
-   goto get_rcb_cfg_fail;
+   goto get_cfg_fail;
 
hns_ppe_get_cfg(dsaf_dev->ppe_common[i]);
 
@@ -518,13 +518,12 @@ int hns_ppe_init(struct dsaf_device *dsaf_dev)
 
return 0;
 
-get_rcb_cfg_fail:
-   hns_ppe_common_free_cfg(dsaf_dev, i);
-get_ppe_cfg_fail:
-   for (k = i - 1; k >= 0; k--) {
-   hns_rcb_common_free_cfg(dsaf_dev, k);
-   hns_ppe_common_free_cfg(dsaf_dev, k);
+get_cfg_fail:
+   for (i = 0; i < HNS_PPE_COM_NUM; i++) {
+   hns_rcb_common_free_cfg(dsaf_dev, i);
+   hns_ppe_common_free_cfg(dsaf_dev, i);
}
+
return ret;
 }
 
-- 
2.7.4
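
For readers following the error-path change in this patch, here is a
minimal standalone sketch (not the driver code) of the same pattern: one
failure label that walks every slot and frees it. All names here
(demo_init, demo_get_cfg, demo_free_cfg, DEMO_NUM) are hypothetical. The
sketch assumes the free routine is a no-op for slots that were never set
up, which is what makes a single cleanup loop over all slots safe; the
hns free helpers would need to give the same guarantee.

```c
#include <errno.h>
#include <stdlib.h>

#define DEMO_NUM 4	/* hypothetical number of slots */

static void *demo_slot[DEMO_NUM];

/* Allocate slot i; fail_at simulates a failure at a chosen index. */
static int demo_get_cfg(int i, int fail_at)
{
	if (i == fail_at)
		return -ENOMEM;
	demo_slot[i] = malloc(16);
	return demo_slot[i] ? 0 : -ENOMEM;
}

/* Safe on slots that were never allocated: free(NULL) is a no-op. */
static void demo_free_cfg(int i)
{
	free(demo_slot[i]);
	demo_slot[i] = NULL;
}

/* Count the slots that currently hold an allocation. */
static int demo_slots_in_use(void)
{
	int i, n = 0;

	for (i = 0; i < DEMO_NUM; i++)
		if (demo_slot[i])
			n++;
	return n;
}

static int demo_init(int fail_at)
{
	int ret = 0;
	int i;

	for (i = 0; i < DEMO_NUM; i++) {
		ret = demo_get_cfg(i, fail_at);
		if (ret)
			goto get_cfg_fail;
	}
	return 0;

get_cfg_fail:
	/* One label: free every slot, whether or not it was set up. */
	for (i = 0; i < DEMO_NUM; i++)
		demo_free_cfg(i);
	return ret;
}
```

Calling demo_init(2) fails at the third slot and leaves nothing
allocated; demo_init(-1) never fails and leaves all slots in use.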

[PATCH V3 net-next 10/18] net: hns: Remove the redundant adding and deleting mac function

2017-03-31 Thread Salil Mehta

From: Kejian Yan 

The functions hns_dsaf_set_mac_mc_entry() and hns_mac_del_mac() are
not called anywhere, so they are dead code in hns. The same features
are already implemented by commit 66355f5.

Reported-by: Weiwei Deng 
Signed-off-by: Kejian Yan 
Reviewed-by: Salil Mehta 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  | 38 --
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |  1 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 81 --
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |  2 -
 4 files changed, 122 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index edf9a23..696f2ae 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -332,44 +332,6 @@ int hns_mac_set_multi(struct hns_mac_cb *mac_cb,
return 0;
 }
 
-/**
- *hns_mac_del_mac - delete mac address into dsaf table,can't delete the same
- *  address twice
- *@net_dev: net device
- *@vfn :   vf lan
- *@mac : mac address
- *return status
- */
-int hns_mac_del_mac(struct hns_mac_cb *mac_cb, u32 vfn, char *mac)
-{
-   struct mac_entry_idx *old_mac;
-   struct dsaf_device *dsaf_dev;
-   u32 ret;
-
-   dsaf_dev = mac_cb->dsaf_dev;
-
-   if (vfn < DSAF_MAX_VM_NUM) {
-   old_mac = &mac_cb->addr_entry_idx[vfn];
-   } else {
-   dev_err(mac_cb->dev,
-   "vf queue is too large, %s mac%d queue = %#x!\n",
-   mac_cb->dsaf_dev->ae_dev.name, mac_cb->mac_id, vfn);
-   return -EINVAL;
-   }
-
-   if (dsaf_dev) {
-   ret = hns_dsaf_del_mac_entry(dsaf_dev, old_mac->vlan_id,
-mac_cb->mac_id, old_mac->addr);
-   if (ret)
-   return ret;
-
-   if (memcmp(old_mac->addr, mac, sizeof(old_mac->addr)) == 0)
-   old_mac->valid = 0;
-   }
-
-   return 0;
-}
-
 int hns_mac_clr_multicast(struct hns_mac_cb *mac_cb, int vfn)
 {
struct dsaf_device *dsaf_dev = mac_cb->dsaf_dev;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
index 7f14d91..e6842c9 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
@@ -436,7 +436,6 @@ int hns_mac_set_multi(struct hns_mac_cb *mac_cb,
 int hns_mac_vm_config_bc_en(struct hns_mac_cb *mac_cb, u32 vm, bool enable);
 void hns_mac_start(struct hns_mac_cb *mac_cb);
 void hns_mac_stop(struct hns_mac_cb *mac_cb);
-int hns_mac_del_mac(struct hns_mac_cb *mac_cb, u32 vfn, char *mac);
 void hns_mac_uninit(struct dsaf_device *dsaf_dev);
 void hns_mac_adjust_link(struct hns_mac_cb *mac_cb, int speed, int duplex);
 void hns_mac_reset(struct hns_mac_cb *mac_cb);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index 90dbda7..6a069ff 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -1647,87 +1647,6 @@ int hns_dsaf_rm_mac_addr(
  mac_entry->addr);
 }
 
-/**
- * hns_dsaf_set_mac_mc_entry - set mac mc-entry
- * @dsaf_dev: dsa fabric device struct pointer
- * @mac_entry: mc-mac entry
- */
-int hns_dsaf_set_mac_mc_entry(
-   struct dsaf_device *dsaf_dev,
-   struct dsaf_drv_mac_multi_dest_entry *mac_entry)
-{
-   u16 entry_index = DSAF_INVALID_ENTRY_IDX;
-   struct dsaf_drv_tbl_tcam_key mac_key;
-   struct dsaf_tbl_tcam_mcast_cfg mac_data;
-   struct dsaf_drv_priv *priv =
-   (struct dsaf_drv_priv *)hns_dsaf_dev_priv(dsaf_dev);
-   struct dsaf_drv_soft_mac_tbl *soft_mac_entry = priv->soft_mac_tbl;
-   struct dsaf_drv_tbl_tcam_key tmp_mac_key;
-   struct dsaf_tbl_tcam_data tcam_data;
-
-   /* mac addr check */
-   if (MAC_IS_ALL_ZEROS(mac_entry->addr)) {
-   dev_err(dsaf_dev->dev, "set uc %s Mac %pM err!\n",
-   dsaf_dev->ae_dev.name, mac_entry->addr);
-   return -EINVAL;
-   }
-
-   /*config key */
-   hns_dsaf_set_mac_key(dsaf_dev, &mac_key,
-mac_entry->in_vlan_id,
-mac_entry->in_port_num, mac_entry->addr);
-
-   /* entry ie exist? */
-   entry_index = hns_dsaf_find_soft_mac_entry(dsaf_dev, &mac_key);
-   if (entry_index == DSAF_INVALID_ENTRY_IDX) {
-   /*if hasnot, find enpty entry*/
-   entry_index = hns_dsaf_find_empty_mac_entry(dsaf_dev);
-   if (entry_index == DSAF_INVALID_ENTRY_IDX) {
-   /*if hasnot empty, error*/
-

[PATCH V3 net-next 14/18] net: hns: Optimise the code in hns_mdio_wait_ready()

2017-03-31 Thread Salil Mehta

From: Kejian Yan 

This patch reworks the polling loop in hns_mdio_wait_ready() to clear a
pclint warning/info.

Reported-by: Ping Zhang 
Signed-off-by: Kejian Yan 
Reviewed-by: Salil Mehta 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns_mdio.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns_mdio.c b/drivers/net/ethernet/hisilicon/hns_mdio.c
index fad1c5b..e5221d9 100644
--- a/drivers/net/ethernet/hisilicon/hns_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns_mdio.c
@@ -175,18 +175,20 @@ static int mdio_sc_cfg_reg_write(struct hns_mdio_device *mdio_dev,
 static int hns_mdio_wait_ready(struct mii_bus *bus)
 {
struct hns_mdio_device *mdio_dev = bus->priv;
+   u32 cmd_reg_value;
int i;
-   u32 cmd_reg_value = 1;
 
/* waitting for MDIO_COMMAND_REG 's mdio_start==0 */
/* after that can do read or write*/
-   for (i = 0; cmd_reg_value; i++) {
+   for (i = 0; i < MDIO_TIMEOUT; i++) {
cmd_reg_value = MDIO_GET_REG_BIT(mdio_dev,
 MDIO_COMMAND_REG,
 MDIO_CMD_START_B);
-   if (i == MDIO_TIMEOUT)
-   return -ETIMEDOUT;
+   if (!cmd_reg_value)
+   break;
}
+   if ((i == MDIO_TIMEOUT) && cmd_reg_value)
+   return -ETIMEDOUT;
 
return 0;
 }
-- 
2.7.4
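
The loop shape this patch moves to (bounded iteration, break when the
busy bit clears, report a timeout afterwards) can be tried outside the
kernel with a small sketch. Everything below is hypothetical:
demo_read_start_bit() stands in for the MDIO_GET_REG_BIT() read,
DEMO_TIMEOUT for MDIO_TIMEOUT, and the sketch checks only the busy flag
after the loop, which is equivalent here because the flag is still set
only if every iteration saw it set.

```c
#include <errno.h>

#define DEMO_TIMEOUT 1000	/* hypothetical poll budget */

/* Number of polls for which the fake register still reports "busy". */
static int demo_busy_polls;

/* Hypothetical stand-in for reading the start bit of a command register. */
static unsigned int demo_read_start_bit(void)
{
	if (demo_busy_polls > 0) {
		demo_busy_polls--;
		return 1;
	}
	return 0;
}

static int demo_wait_ready(void)
{
	unsigned int busy = 1;
	int i;

	/* Poll at most DEMO_TIMEOUT times; stop as soon as the bit clears. */
	for (i = 0; i < DEMO_TIMEOUT; i++) {
		busy = demo_read_start_bit();
		if (!busy)
			break;
	}
	if (busy)
		return -ETIMEDOUT;

	return 0;
}
```

With demo_busy_polls set below DEMO_TIMEOUT the function returns 0; set
to DEMO_TIMEOUT or more it returns -ETIMEDOUT.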

[PATCH V3 net-next 17/18] net: hns: Avoid Hip06 chip TX packet line bug

2017-03-31 Thread Salil Mehta

From: lipeng 

There is a bug on Hip06: the TX ring interrupt packet count is cleared
whenever the driver sends data to the TX ring, so the TX packet count
never reaches the packet line and the interrupt is delayed.
Sometimes this makes send performance lower than expected.

To work around this bug, we set the TX ring interrupt packet line to 1
permanently, so that clearing the count no longer matters. We also set
the gap time to 20us, to avoid generating too many interrupts when the
packet line is 1.

This patch improves send performance on ARM from 6.6G to 9.37G, with an
iperf send thread on ARM and an iperf send thread on X86 for XGE.

Signed-off-by: lipeng 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hnae.c |   5 ++
 drivers/net/ethernet/hisilicon/hns/hnae.h |   6 +-
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c |  78 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 101 --
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h |  23 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_reg.h |   2 +-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c  |  24 +++--
 7 files changed, 169 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c b/drivers/net/ethernet/hisilicon/hns/hnae.c
index 513c257..8950b74 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -57,10 +57,15 @@ static int hnae_alloc_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb)
 
 static void hnae_free_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb)
 {
+   if (unlikely(!cb->priv))
+   return;
+
if (cb->type == DESC_TYPE_SKB)
dev_kfree_skb_any((struct sk_buff *)cb->priv);
else if (unlikely(is_rx_ring(ring)))
put_page((struct page *)cb->priv);
+
+   cb->priv = NULL;
 }
 
 static int hnae_map_buffer(struct hnae_ring *ring, struct hnae_desc_cb *cb)
diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index 859c536..0943138 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -488,11 +488,11 @@ struct hnae_ae_ops {
  u32 auto_neg, u32 rx_en, u32 tx_en);
void (*get_coalesce_usecs)(struct hnae_handle *handle,
   u32 *tx_usecs, u32 *rx_usecs);
-   void (*get_rx_max_coalesced_frames)(struct hnae_handle *handle,
-   u32 *tx_frames, u32 *rx_frames);
+   void (*get_max_coalesced_frames)(struct hnae_handle *handle,
+u32 *tx_frames, u32 *rx_frames);
int (*set_coalesce_usecs)(struct hnae_handle *handle, u32 timeout);
int (*set_coalesce_frames)(struct hnae_handle *handle,
-  u32 coalesce_frames);
+  u32 tx_frames, u32 rx_frames);
void (*get_coalesce_range)(struct hnae_handle *handle,
   u32 *tx_frames_low, u32 *rx_frames_low,
   u32 *tx_frames_high, u32 *rx_frames_high,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index f0142e5..ff864a1 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -487,15 +487,21 @@ static void hns_ae_get_coalesce_usecs(struct hnae_handle *handle,
   ring_pair->port_id_in_comm);
 }
 
-static void hns_ae_get_rx_max_coalesced_frames(struct hnae_handle *handle,
-  u32 *tx_frames, u32 *rx_frames)
+static void hns_ae_get_max_coalesced_frames(struct hnae_handle *handle,
+   u32 *tx_frames, u32 *rx_frames)
 {
struct ring_pair_cb *ring_pair =
container_of(handle->qs[0], struct ring_pair_cb, q);
+   struct dsaf_device *dsaf_dev = hns_ae_get_dsaf_dev(handle->dev);
 
-   *tx_frames = hns_rcb_get_coalesced_frames(ring_pair->rcb_common,
- ring_pair->port_id_in_comm);
-   *rx_frames = hns_rcb_get_coalesced_frames(ring_pair->rcb_common,
+   if (AE_IS_VER1(dsaf_dev->dsaf_ver) ||
+   handle->port_type == HNAE_PORT_DEBUG)
+   *tx_frames = hns_rcb_get_rx_coalesced_frames(
+   ring_pair->rcb_common, ring_pair->port_id_in_comm);
+   else
+   *tx_frames = hns_rcb_get_tx_coalesced_frames(
+   ring_pair->rcb_common, ring_pair->port_id_in_comm);
+   *rx_frames = hns_rcb_get_rx_coalesced_frames(ring_pair->rcb_common,
  ring_pair->port_id_in_comm);
 }
 
@@

Re: [PATCH] ath6kl: Add __printf verification to ath6kl_dbg

2017-03-31 Thread Steve deRosier

On Fri, Mar 31, 2017 at 10:45 AM, Joe Perches  wrote:
> On Fri, 2017-03-31 at 10:34 -0700, Steve deRosier wrote:
>> On Fri, Mar 31, 2017 at 10:23 AM, Joe Perches  wrote:
>> > On Fri, 2017-03-31 at 10:19 -0700, Steve deRosier wrote:
>> > > On Thu, Mar 30, 2017 at 3:57 PM, Joe Perches  wrote:
>> > > > Fix fallout too.
>> >
>> > []
>> > > My only question is why bother doing a format check on something
>> > > that's going to be compiled out anyway?
>> >
>> > To avoid introducing defects when writing new code
>> > and not using the debugging code path.
>> >
>>
>> Fair enough. And I totally agree with the defensive programming here
>> in that case and feel it's worth the tradeoff (if indeed there really
>> is any cost, I'm unsure what gcc actually does in this instance).
>>
>> For sake of discussion though - shouldn't anything not using the debug
>> code path in this case always be of the form that compiles out? ie
>> would be empty functions intended here just to make compilation work
>> and the code that depends on it simpler? Thus, there really should
>> never be a risk of introducing said defects. If any "real" code were
>> put in that else clause, that'd be a big red-flag in the review of
>> said hypothetical patch.
>
> Generically, all debugging forms should strive to avoid side-effects.
>

Yes. Of course. Lightbulb.

I wasn't even thinking of the fact someone could load the printf
arguments with code that might have side-effects instead of simple
variables to print. I never do it for obvious reasons, but I could see
it happening.

Thanks for spending the time going back and forth with me about it.

Thanks,
- Steve

[PATCH V3 net-next 18/18] net: hns: Some checkpatch.pl script & warning fixes

2017-03-31 Thread Salil Mehta

This patch fixes some errors caught by the checkpatch.pl script and
some warnings seen at compilation time.

Signed-off-by: Salil Mehta 
---
Patch V3: https://lkml.org/lkml/2017/3/31/538
  Addressed the comment by Joe Perches
---
 drivers/net/ethernet/hisilicon/hns/hnae.h  |  1 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c | 11 +--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |  1 -
 drivers/net/ethernet/hisilicon/hns/hns_enet.c  |  9 +
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |  1 -
 6 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index 0943138..04211ac 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -103,7 +103,6 @@ enum hnae_led_state {
 #define HNS_RX_FLAG_L4ID_TCP 0x1
 #define HNS_RX_FLAG_L4ID_SCTP 0x3
 
-
 #define HNS_TXD_ASID_S 0
 #define HNS_TXD_ASID_M (0xff << HNS_TXD_ASID_S)
 #define HNS_TXD_BUFNUM_S 8
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 035db86..74bd260 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -86,12 +86,11 @@ static void hns_gmac_disable(void *mac_drv, enum mac_commom_mode mode)
dsaf_set_dev_bit(drv, GMAC_PORT_EN_REG, GMAC_PORT_RX_EN_B, 0);
 }
 
-/**
-*hns_gmac_get_en - get port enable
-*@mac_drv:mac device
-*@rx:rx enable
-*@tx:tx enable
-*/
+/* hns_gmac_get_en - get port enable
+ * @mac_drv:mac device
+ * @rx:rx enable
+ * @tx:tx enable
+ */
 static void hns_gmac_get_en(void *mac_drv, u32 *rx, u32 *tx)
 {
struct mac_driver *drv = (struct mac_driver *)mac_drv;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
index 4db02e2..4507e82 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
@@ -68,7 +68,7 @@ enum dsaf_roce_qos_sl {
 };
 
 #define DSAF_STATS_READ(p, offset) (*((u64 *)((u8 *)(p) + (offset))))
-#define HNS_DSAF_IS_DEBUG(dev) (dev->dsaf_mode == DSAF_MODE_DISABLE_SP)
+#define HNS_DSAF_IS_DEBUG(dev) ((dev)->dsaf_mode == DSAF_MODE_DISABLE_SP)
 
 enum hal_dsaf_mode {
HRD_DSAF_NO_DSAF_MODE   = 0x0,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index 9b66057..c20a0f4 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -471,7 +471,6 @@ static void hns_rcb_ring_pair_get_cfg(struct ring_pair_cb *ring_pair_cb)
 static int hns_rcb_get_port_in_comm(
struct rcb_common_cb *rcb_common, int ring_idx)
 {
-
return ring_idx / (rcb_common->max_q_per_vf * rcb_common->max_vfn);
 }
 
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 3634366..1f7b2cd 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -512,7 +512,8 @@ static void hns_nic_reuse_page(struct sk_buff *skb, int i,
int last_offset;
bool twobufs;
 
-   twobufs = ((PAGE_SIZE < 8192) && hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048);
+   twobufs = ((PAGE_SIZE < 8192) &&
+   hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048);
 
desc = &ring->desc[ring->next_to_clean];
size = le16_to_cpu(desc->rx.size);
@@ -922,8 +923,8 @@ static int is_valid_clean_head(struct hnae_ring *ring, int h)
 
 /* netif_tx_lock will turn down the performance, set only when necessary */
 #ifdef CONFIG_NET_POLL_CONTROLLER
-#define NETIF_TX_LOCK(ring) spin_lock(&ring->lock)
-#define NETIF_TX_UNLOCK(ring) spin_unlock(&ring->lock)
+#define NETIF_TX_LOCK(ring) spin_lock(&(ring)->lock)
+#define NETIF_TX_UNLOCK(ring) spin_unlock(&(ring)->lock)
 #else
 #define NETIF_TX_LOCK(ring)
 #define NETIF_TX_UNLOCK(ring)
@@ -2012,7 +2013,7 @@ static void hns_nic_reset_subtask(struct hns_nic_priv *priv)
 static void hns_nic_service_event_complete(struct hns_nic_priv *priv)
 {
WARN_ON(!test_bit(NIC_STATE_SERVICE_SCHED, &priv->state));
-
+   /* make sure to commit the things */
smp_mb__before_atomic();
clear_bit(NIC_STATE_SERVICE_SCHED, &priv->state);
 }
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 36f33bd..b8fab14 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -1242,7 +1242,6 @@ hns_set_rss(struct net_device *netdev, const u32 *indir, const u8 *key,
 {
struct hns_nic_priv *priv = netdev_priv(netdev);
struct hnae_ae_ops *ops;
-   int ret;
 
if (AE_IS_VER1(priv->enet_ver)) {

[PATCH V3 net-next 16/18] net: hns: Adjust the SBM module buffer threshold

2017-03-31 Thread Salil Mehta

From: Kejian Yan 

HNS needs SBM buffers to store at least two packets after sending a
pause frame, because of the link delay. The MTU of HNS is 9728. The SBM
buffer threshold is modified as described in the processor user
manual.

Reported-by: Ping Zhang 
Signed-off-by: Kejian Yan 
Reviewed-by: Salil Mehta 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index abd8aec..d07b4fe 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -510,10 +510,10 @@ static void hns_dsafv2_sbm_bp_wl_cfg(struct dsaf_device *dsaf_dev)
o_sbm_bp_cfg = dsaf_read_dev(dsaf_dev, reg);
dsaf_set_field(o_sbm_bp_cfg,
   DSAFV2_SBM_CFG3_SET_BUF_NUM_NO_PFC_M,
-  DSAFV2_SBM_CFG3_SET_BUF_NUM_NO_PFC_S, 48);
+  DSAFV2_SBM_CFG3_SET_BUF_NUM_NO_PFC_S, 55);
dsaf_set_field(o_sbm_bp_cfg,
   DSAFV2_SBM_CFG3_RESET_BUF_NUM_NO_PFC_M,
-  DSAFV2_SBM_CFG3_RESET_BUF_NUM_NO_PFC_S, 80);
+  DSAFV2_SBM_CFG3_RESET_BUF_NUM_NO_PFC_S, 110);
dsaf_write_dev(dsaf_dev, reg, o_sbm_bp_cfg);
 
/* for no enable pfc mode */
@@ -521,10 +521,10 @@ static void hns_dsafv2_sbm_bp_wl_cfg(struct dsaf_device *dsaf_dev)
o_sbm_bp_cfg = dsaf_read_dev(dsaf_dev, reg);
dsaf_set_field(o_sbm_bp_cfg,
   DSAFV2_SBM_CFG4_SET_BUF_NUM_NO_PFC_M,
-  DSAFV2_SBM_CFG4_SET_BUF_NUM_NO_PFC_S, 192);
+  DSAFV2_SBM_CFG4_SET_BUF_NUM_NO_PFC_S, 128);
dsaf_set_field(o_sbm_bp_cfg,
   DSAFV2_SBM_CFG4_RESET_BUF_NUM_NO_PFC_M,
-  DSAFV2_SBM_CFG4_RESET_BUF_NUM_NO_PFC_S, 240);
+  DSAFV2_SBM_CFG4_RESET_BUF_NUM_NO_PFC_S, 192);
dsaf_write_dev(dsaf_dev, reg, o_sbm_bp_cfg);
}
 
-- 
2.7.4

Re: [PATCH] ath6kl: Add __printf verification to ath6kl_dbg

2017-03-31 Thread Joe Perches

On Fri, 2017-03-31 at 10:34 -0700, Steve deRosier wrote:
> On Fri, Mar 31, 2017 at 10:23 AM, Joe Perches  wrote:
> > On Fri, 2017-03-31 at 10:19 -0700, Steve deRosier wrote:
> > > On Thu, Mar 30, 2017 at 3:57 PM, Joe Perches  wrote:
> > > > Fix fallout too.
> > 
> > []
> > > My only question is why bother doing a format check on something
> > > that's going to be compiled out anyway?
> > 
> > To avoid introducing defects when writing new code
> > and not using the debugging code path.
> > 
> 
> Fair enough. And I totally agree with the defensive programming here
> in that case and feel it's worth the tradeoff (if indeed there really
> is any cost, I'm unsure what gcc actually does in this instance).
> 
> For sake of discussion though - shouldn't anything not using the debug
> code path in this case always be of the form that compiles out? ie
> would be empty functions intended here just to make compilation work
> and the code that depends on it simpler? Thus, there really should
> never be a risk of introducing said defects. If any "real" code were
> put in that else clause, that'd be a big red-flag in the review of
> said hypothetical patch.

Generically, all debugging forms should strive to avoid side-effects.

For instance, look at no_printk/pr_debug in the #ifndef DEBUG paths.

It uses if (0) to avoid compilation of arguments that might be
function calls or volatile accesses and so might have side-effects
altogether.

include/linux/printk.h-/*
include/linux/printk.h- * Dummy printk for disabled debugging statements to use whilst maintaining
include/linux/printk.h- * gcc's format checking.
include/linux/printk.h- */
include/linux/printk.h:#define no_printk(fmt, ...)				\
include/linux/printk.h-({							\
include/linux/printk.h-	do {						\
include/linux/printk.h-		if (0)					\
include/linux/printk.h-			printk(fmt, ##__VA_ARGS__);	\
include/linux/printk.h-	} while (0);					\
include/linux/printk.h-	0;						\
include/linux/printk.h-})
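
The effect of the if (0) guard can be observed in userspace with a small
sketch. This is an analogue written for illustration, not the kernel
macro: demo_no_printf() and demo_next_id() are hypothetical names, and
plain printf() stands in for printk(). The compiler still parses and
format-checks the call, but the argument with a side effect is never
evaluated.

```c
#include <stdio.h>

static int demo_calls;

/* Has a side effect, so we can tell whether it was ever evaluated. */
static int demo_next_id(void)
{
	return ++demo_calls;
}

/*
 * Userspace analogue of no_printk(): the printf() call is compiled and
 * its format string is checked against the arguments, but it sits
 * behind if (0), so neither the call nor its arguments run.
 */
#define demo_no_printf(...)			\
	do {					\
		if (0)				\
			printf(__VA_ARGS__);	\
	} while (0)

/* Returns how many times demo_next_id() has run so far. */
static int demo_log_disabled(void)
{
	demo_no_printf("id=%d\n", demo_next_id());
	return demo_calls;
}
```

After calling demo_log_disabled() the counter is still zero, while a
mismatched format such as "%s" with an int argument would still be
reported by a compiler that has format warnings enabled.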

Re: [PATCH] ath6kl: Add __printf verification to ath6kl_dbg

2017-03-31 Thread Steve deRosier

On Fri, Mar 31, 2017 at 10:23 AM, Joe Perches  wrote:
> On Fri, 2017-03-31 at 10:19 -0700, Steve deRosier wrote:
>> On Thu, Mar 30, 2017 at 3:57 PM, Joe Perches  wrote:
>> > Fix fallout too.
> []
>> My only question is why bother doing a format check on something
>> that's going to be compiled out anyway?
>
> To avoid introducing defects when writing new code
> and not using the debugging code path.
>

Fair enough. And I totally agree with the defensive programming here
in that case and feel it's worth the tradeoff (if indeed there really
is any cost, I'm unsure what gcc actually does in this instance).

For sake of discussion though - shouldn't anything not using the debug
code path in this case always be of the form that compiles out? ie
would be empty functions intended here just to make compilation work
and the code that depends on it simpler? Thus, there really should
never be a risk of introducing said defects. If any "real" code were
put in that else clause, that'd be a big red-flag in the review of
said hypothetical patch.

Thanks,
- Steve

Re: [PATCH net-next v2 9/9] net: dsa: mv88e6xxx: add cross-chip bridging

2017-03-31 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> I don't like the idea of leaking frames.

I don't like it either. That's why this patch series is out there. It
improves the security on PVT-capable Marvell switches by programming the
tables correctly on bridging events.

A next step can be to warn the user about the software or hardware
limitations (s)he is currently experiencing with something roughly like
this in cross-chip bridging operations:

    if (!mv88e6xxx_has_pvt(chip)) {
    #if IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
        return 0;
    #else
        pr_err("Cross-chip bridging is forbidden on non-PVT hardware and non-VLAN-filtering aware systems\n");
        return -EINVAL;
    #endif
    }

But let's keep it simple for the moment and go baby steps. First program
PVT tables correctly as this patchset does, then figure out how to
handle non-PVT systems. That is a good topic for next week ;-)

Thanks,

Vivien

Re: [PATCH net 5/5] l2tp: take a reference on sessions used in genetlink handlers

2017-03-31 Thread Guillaume Nault

On Fri, Mar 31, 2017 at 01:02:30PM +0200, Guillaume Nault wrote:
> Callers of l2tp_nl_session_find() need to hold a reference on the
> returned session since there's no guarantee that it isn't going to
> disappear from under them.
> 
> Relying on the fact that no l2tp netlink message may be processed
> concurrently isn't enough: sessions can be deleted by other means
> (e.g. by closing the PPPOL2TP socket of a ppp pseudowire).
> 
> l2tp_nl_cmd_session_delete() is a bit special: it runs a callback
> function that may require a previous call to session->ref(). In
> particular, for ppp pseudowires, the callback is l2tp_session_delete(),
> which then calls pppol2tp_session_close() and dereferences the PPPOL2TP
> socket. The socket might already be gone at the moment
> l2tp_session_delete() calls session->ref(), so we need to take a
> reference during the session lookup. So we need to pass the do_ref
> variable down to l2tp_session_get() and l2tp_session_get_by_ifname().
> 
> Since all callers have to be updated, l2tp_session_find_by_ifname() and
> l2tp_nl_session_find() are renamed to reflect their new behaviour.
> 
> Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners")

Sorry, it should have been
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")

Commit 33f72e6f0c67 ("l2tp : multicast notification to the registered
listeners") just worsened the existing race conditions.

David, do you want me to repost this series with the new Fixes tag?
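
The rule described in the quoted commit message (a lookup must hand back
a session on which the caller already holds a reference) can be modelled
in a few lines. This is a single-threaded sketch with hypothetical names
(demo_session, demo_session_get, demo_session_put); it leaves out the
do_ref/session->ref() handling entirely. A real implementation would
need an atomic counter and would have to take the reference before
leaving the protected lookup section, which this sketch does not model.

```c
#include <stddef.h>

struct demo_session {
	int id;
	int refcnt;	/* 1 = the reference held by the table itself */
};

static struct demo_session demo_table[] = {
	{ .id = 7, .refcnt = 1 },
	{ .id = 9, .refcnt = 1 },
};

/*
 * Look up a session and take a reference before returning it, so the
 * caller never holds a pointer it does not own a reference on.
 */
static struct demo_session *demo_session_get(int id)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (demo_table[i].id == id) {
			demo_table[i].refcnt++;
			return &demo_table[i];
		}
	}
	return NULL;
}

/* Drop the reference taken by demo_session_get(). */
static void demo_session_put(struct demo_session *s)
{
	s->refcnt--;
}
```

Every successful demo_session_get() is paired with one
demo_session_put() once the caller is done with the session.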

Re: [PATCH] ath6kl: Add __printf verification to ath6kl_dbg

2017-03-31 Thread Joe Perches

On Fri, 2017-03-31 at 10:19 -0700, Steve deRosier wrote:
> On Thu, Mar 30, 2017 at 3:57 PM, Joe Perches  wrote:
> > Fix fallout too.
[]
> My only question is why bother doing a format check on something
> that's going to be compiled out anyway?

To avoid introducing defects when writing new code
and not using the debugging code path.

Re: [PATCH] ath6kl: Add __printf verification to ath6kl_dbg

2017-03-31 Thread Steve deRosier

On Thu, Mar 30, 2017 at 3:57 PM, Joe Perches  wrote:
> Fix fallout too.
>
> Signed-off-by: Joe Perches 
> ---
>  drivers/net/wireless/ath/ath6kl/debug.h| 2 ++
>  drivers/net/wireless/ath/ath6kl/htc_pipe.c | 2 +-
>  drivers/net/wireless/ath/ath6kl/wmi.c  | 2 +-
>  3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath6kl/debug.h b/drivers/net/wireless/ath/ath6kl/debug.h
> index 0614393dd7ae..94297572914f 100644
> --- a/drivers/net/wireless/ath/ath6kl/debug.h
> +++ b/drivers/net/wireless/ath/ath6kl/debug.h
> @@ -63,6 +63,7 @@ int ath6kl_read_tgt_stats(struct ath6kl *ar, struct ath6kl_vif *vif);
>
>  #ifdef CONFIG_ATH6KL_DEBUG
>
> +__printf(2, 3)
>  void ath6kl_dbg(enum ATH6K_DEBUG_MASK mask, const char *fmt, ...);
>  void ath6kl_dbg_dump(enum ATH6K_DEBUG_MASK mask,
>  const char *msg, const char *prefix,
> @@ -83,6 +84,7 @@ int ath6kl_debug_init_fs(struct ath6kl *ar);
>  void ath6kl_debug_cleanup(struct ath6kl *ar);
>
>  #else
> +__printf(2, 3)
>  static inline void ath6kl_dbg(enum ATH6K_DEBUG_MASK dbg_mask,
>   const char *fmt, ...)
>  {

My only question is why bother doing a format check on something
that's going to be compiled out anyway? I suppose the only harm is a
tiny extra bit of compile time due to the check and I'm sure that's
measured in micro-seconds on full development systems, but if we do it
everywhere those tiny bits of time would eventually add up.

Admittedly it's a comment that probably isn't worth redoing the commit
over.  I guess I'm bringing up the point more discuss the question:
"Should we add the printf format verification on the clauses that get
compiled out?"

So, it looks good to me as is, or if you feel like making the change
I'm suggesting, that's fine too.  And it builds and runs on my
platforms.

Reviewed-by: Steve deRosier 

- Steve
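
For anyone who wants to try the annotation discussed in this thread
outside the kernel tree: __printf(a, b) is the kernel's shorthand for
the GCC/Clang format(printf, a, b) function attribute. The sketch below
is a userspace illustration with hypothetical names (demo_printf_attr,
demo_dbg, demo_buf) and assumes a GCC-compatible compiler. Argument 2 is
the format string and the variadic arguments start at position 3,
matching the (2, 3) used for ath6kl_dbg().

```c
#include <stdarg.h>
#include <stdio.h>

/* Userspace equivalent of the kernel's __printf(a, b) shorthand. */
#define demo_printf_attr(a, b) __attribute__((format(printf, a, b)))

static char demo_buf[128];

/*
 * The attribute makes the compiler check every call's format string
 * against its arguments, even though the function is variadic.
 */
demo_printf_attr(2, 3)
static int demo_dbg(unsigned int mask, const char *fmt, ...)
{
	va_list args;
	int len;

	(void)mask;	/* a real debug helper would filter on the mask */

	va_start(args, fmt);
	len = vsnprintf(demo_buf, sizeof(demo_buf), fmt, args);
	va_end(args);

	return len;
}
```

With format warnings enabled, a call such as demo_dbg(1, "%s", 3) is
flagged at compile time, while demo_dbg(1, "vif %d up", 3) compiles
cleanly and formats into demo_buf.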

Re: [PATCH net-next v2 9/9] net: dsa: mv88e6xxx: add cross-chip bridging

2017-03-31 Thread Andrew Lunn

On Fri, Mar 31, 2017 at 12:55:03PM -0400, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn  writes:
> 
> > On Thu, Mar 30, 2017 at 05:37:15PM -0400, Vivien Didelot wrote:
> >> Implement the DSA cross-chip bridging operations by remapping the local
> >> ports an external source port can egress frames to, when this cross-chip
> >> port joins or leaves a bridge.
> >> 
> >> The PVT is no longer configured with all ones allowing any external
> >> frame to egress any local port. Only DSA and CPU ports, as well as
> >> bridge group members, can egress frames on local ports.
> >
> > With the ZII devel B, we have two switches with PVT, and one
> > without. What happens in this setup? Can the non-PVT switch leak
> > frames out user ports which should otherwise be blocked?
> 
> If CONFIG_BRIDGE_VLAN_FILTERING isn't enabled in the kernel, the non-PVT
> switch would indeed have no means to restrict arbitrary external
> frames. So in that setup, yes the switch can theoretically leak frames.

I don't like the idea of leaking frames. It has security implications,
and hard to debug weird networking problems, like some other machine
is using my IP address, maybe spanning tree is broken if BPDUs leak,
even broadcast storms?

So we need to consider the complexity of detecting we have a non-PVT
destination switch, and forward it frames via the software bridge.

What about the case the non-PVT switch is in the middle of a chain of
PVT switches?

Maybe to start with, to keep it simple, we check all switches are PVT
capable. If they are not, we refuse to use PVT and all inter-switch
frames need to go via the Linux software bridge?

   Andrew
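
The "check all switches are PVT capable" idea from this message could
look roughly like the following. This is only a standalone model with
hypothetical names (demo_switch, demo_tree_can_use_pvt); it does not use
the DSA or mv88e6xxx data structures, and what the caller does when the
check fails (for example forwarding inter-switch frames through the
software bridge) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-switch capability record. */
struct demo_switch {
	bool has_pvt;
};

/*
 * PVT-based isolation is only used if every switch in the tree
 * supports it; a single non-PVT switch disables it for the whole tree.
 */
static bool demo_tree_can_use_pvt(const struct demo_switch *sw, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!sw[i].has_pvt)
			return false;
	}
	return true;
}
```

A tree like the one described above (two PVT switches and one without)
would fail this check, wherever the non-PVT switch sits in the chain.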

Re: EINVAL when using connect() for udp sockets

2017-03-31 Thread Cong Wang

On Fri, Mar 31, 2017 at 9:52 AM, Cong Wang  wrote:
>
> Please submit your patch formally and with a man page patch too.

BTW, and probably IPv6 too.

Re: [PATCH] [net-next] stmmac: use netif_set_real_num_{rx,tx}_queues

2017-03-31 Thread Joao Pinto

On 3/31/2017 at 5:57 PM, David Miller wrote:
> From: Joao Pinto 
> Date: Fri, 31 Mar 2017 11:43:38 +0100
> 
>> @David: Could you please create a branch in your git tree for us to work on it
>> until the multiple buffers get stable for everyone? This way the patches could
>> circulate in the mailing-list with a different target, like stmmac-next or
>> similar.
>>
>> What do you think?
> 
> Sorry, I'm not going to do that.
> 

Ok. I will send the patches normally through the mailing-list.

Thanks.

Re: [PATCH] [net-next] stmmac: use netif_set_real_num_{rx,tx}_queues

2017-03-31 Thread David Miller

From: Joao Pinto 
Date: Fri, 31 Mar 2017 11:43:38 +0100

> @David: Could you please create a branch in your git tree for us to work on it
> until the multiple buffers get stable for everyone? This way the patches could
> circulate in the mailing-list with a different target, like stmmac-next or 
> similar.
> 
> What do you think?

Sorry, I'm not going to do that.

Re: [PATCH net] ftgmac100: Mostly rewrite the driver

2017-03-31 Thread David Miller

From: Benjamin Herrenschmidt 
Date: Fri, 31 Mar 2017 20:59:27 +1100

> We're running some more testing tonight, if it's all solid I'll shoot
> it out tomorrow or sunday. Dave, it's ok to just spam the list with a
> 55 patches series like that ?

Please send about a dozen at a time, thank you.  Group them logically
as best as you can.

Re: [PATCH net-next v2 9/9] net: dsa: mv88e6xxx: add cross-chip bridging

2017-03-31 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> On Thu, Mar 30, 2017 at 05:37:15PM -0400, Vivien Didelot wrote:
>> Implement the DSA cross-chip bridging operations by remapping the local
>> ports an external source port can egress frames to, when this cross-chip
>> port joins or leaves a bridge.
>> 
>> The PVT is no longer configured with all ones allowing any external
>> frame to egress any local port. Only DSA and CPU ports, as well as
>> bridge group members, can egress frames on local ports.
>
> With the ZII devel B, we have two switches with PVT, and one
> without. What happens in this setup? Can the non-PVT switch leak
> frames out user ports which should otherwise be blocked?

If CONFIG_BRIDGE_VLAN_FILTERING isn't enabled in the kernel, the non-PVT
switch would indeed have no means to restrict arbitrary external
frames. So in that setup, yes the switch can theoretically leak frames.

With a VLAN-filtering aware system, the VTU policy and 802.1Q Secure
port mode should guard against that.

Thanks,

Vivien

Re: [PATCH net-next v2] net: dsa: Mock-up driver

2017-03-31 Thread Andrew Lunn

> Actually we do not, because netdev_uses_dsa() returns true only when
> dst->rcv is different from NULL. When dst->rcv is NULL we completely
> bypass the DSA hook in eth_type_trans() and everything is well, the
> master network device is the one receiving packets.

static inline bool netdev_uses_dsa(struct net_device *dev)
{
#if IS_ENABLED(CONFIG_NET_DSA)
if (dev->dsa_ptr != NULL)
return dsa_uses_tagged_protocol(dev->dsa_ptr);
#endif
return false;
}

and

static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst)
{
return dst->rcv != NULL;
}

Yep, i just needed to dig further...

Thanks
Andrew

Re: EINVAL when using connect() for udp sockets

2017-03-31 Thread Cong Wang

On Thu, Mar 30, 2017 at 5:02 PM, Eric Dumazet  wrote:
> On Thu, 2017-03-30 at 16:36 -0700, Cong Wang wrote:
>> On Tue, Mar 28, 2017 at 5:52 PM, Eric Dumazet  wrote:
>> > On Tue, 2017-03-28 at 16:11 -0700, Eric Dumazet wrote:
>> >
>> >> Yes, this looks better.
>> >>
>> >> Although you probably need to change a bit later this part :
>> >>
>> >> if (!inet->inet_saddr)
>> >>   inet->inet_saddr = fl4->saddr;  /* Update source address */
>> >>
>> >
>> > I came up with the following tested patch for IPv4
>> >
>> > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
>> > index 
>> > f915abff1350a86af8d5bb89725b751c061b0fb5..1454b6191e0d38ffae0ae260578858285bc5f77b
>> >  100644
>> > --- a/net/ipv4/datagram.c
>> > +++ b/net/ipv4/datagram.c
>> > @@ -40,7 +40,7 @@ int __ip4_datagram_connect(struct sock *sk, struct 
>> > sockaddr *uaddr, int addr_len
>> > sk_dst_reset(sk);
>> >
>> > oif = sk->sk_bound_dev_if;
>> > -   saddr = inet->inet_saddr;
>> > +   saddr = (sk->sk_userlocks & SOCK_BINDADDR_LOCK) ? inet->inet_saddr 
>> > : 0;
>> > if (ipv4_is_multicast(usin->sin_addr.s_addr)) {
>> > if (!oif)
>> > oif = inet->mc_index;
>> > @@ -64,9 +64,8 @@ int __ip4_datagram_connect(struct sock *sk, struct 
>> > sockaddr *uaddr, int addr_len
>> > err = -EACCES;
>> > goto out;
>> > }
>> > -   if (!inet->inet_saddr)
>> > +   if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK)) {
>> > inet->inet_saddr = fl4->saddr;  /* Update source address */
>> > -   if (!inet->inet_rcv_saddr) {
>> > inet->inet_rcv_saddr = fl4->saddr;
>> > if (sk->sk_prot->rehash)
>> > sk->sk_prot->rehash(sk);
>>
>> Why do we need this here? If you mean bind() INADDR_ANY is bound,
>> then it is totally a different problem?
>
>
> Proper delivery of RX packets will need to find the socket, and this
> needs the 2-tuple (source address, source port) info for UDP.
>
> So after a connect(), we need to rehash

Oh, I didn't notice you removed the if (!inet->inet_rcv_saddr) check...

[...]

>> 1) When a bind() is called before connect()'s, aka:
>>
>> bind();
>> connect(addr1); // should not change source addr
>
> It depends. bind() can be only allocating the source port.
>
> If bind(INADDR_ANY) was used, then we need to determine source addr at
> connect() time.

Yes, bind() only sets the flag for non-zero address:

if (inet->inet_rcv_saddr)
sk->sk_userlocks |= SOCK_BINDADDR_LOCK;
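
The bind(INADDR_ANY) case discussed above can be observed from userspace. The
following is a minimal sketch, assuming a Linux host with a working loopback
interface; connected_source_is_loopback() is a hypothetical helper name and
port 9 is an arbitrary choice (no packet is sent). It only shows the visible
effect (the source address is picked at connect() time), not the kernel flag
itself.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a UDP socket to INADDR_ANY with an ephemeral port, connect() it
 * to 127.0.0.1, and report the local address chosen at connect() time.
 * Returns 1 if the source became 127.0.0.1, 0 if it did not, and -1 if
 * the environment refused a socket call (for example, no loopback).
 */
int connected_source_is_loopback(void)
{
	struct sockaddr_in local, peer, got;
	socklen_t len = sizeof(got);
	int fd, ret = -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&local, 0, sizeof(local));
	local.sin_family = AF_INET;
	local.sin_addr.s_addr = htonl(INADDR_ANY);	/* port only, no address */
	local.sin_port = 0;
	if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0)
		goto out;

	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	peer.sin_port = htons(9);	/* arbitrary port; no packet is sent */
	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0)
		goto out;

	memset(&got, 0, sizeof(got));
	if (getsockname(fd, (struct sockaddr *)&got, &len) < 0)
		goto out;

	assert(got.sin_family == AF_INET);
	ret = got.sin_addr.s_addr == htonl(INADDR_LOOPBACK);
out:
	close(fd);
	return ret;
}
```

Binding to a specific non-zero address instead of INADDR_ANY would be the
case where connect() is expected to leave the source address alone.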


Please submit your patch formally and with a man page patch too.

RE: [PATCH V2 net-next 18/18] net: hns: Some checkpatch.pl script & warning fixes

2017-03-31 Thread Salil Mehta

> -Original Message-
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Friday, March 31, 2017 4:45 PM
> To: Salil Mehta; da...@davemloft.net
> Cc: Zhuangyuzeng (Yisen); mehta.salil@gmail.com;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V2 net-next 18/18] net: hns: Some checkpatch.pl
> script & warning fixes
> 
> On Fri, 2017-03-31 at 12:20 +0100, Salil Mehta wrote:
> > This patch fixes some checkpatch.pl script caught errors and
> > warnings during the compilation time.
> []
> > diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
> []
> > @@ -512,7 +512,8 @@ static void hns_nic_reuse_page(struct sk_buff
> *skb, int i,
> > int last_offset;
> > bool twobufs;
> >
> > -   twobufs = ((PAGE_SIZE < 8192) && hnae_buf_size(ring) ==
> HNS_BUFFER_SIZE_2048);
> > +   twobufs = ((PAGE_SIZE < 8192) && hnae_buf_size(ring)
> > +   == HNS_BUFFER_SIZE_2048);
> 
> This would read nicer without splitting a comparison test
> onto multiple lines
> 
>   twobufs = PAGE_SIZE < 8192 &&
> hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048;
For sure, thanks for noticing. Will correct this!

Best regards
Salil
>

Re: [PATCH net-next v2] net: dsa: Mock-up driver

2017-03-31 Thread Florian Fainelli

Hi Andrew,

On 03/31/2017 09:06 AM, Andrew Lunn wrote:
> Hi Florian
> 
>> +static enum dsa_tag_protocol dsa_loop_get_protocol(struct dsa_switch *ds)
>> +{
>> +dev_dbg(ds->dev, "%s\n", __func__);
>> +
>> +return DSA_TAG_PROTO_NONE;
>> +}
> 
> I'm wondering how safe this is:
> 
> static const struct dsa_device_ops none_ops = {
> .xmit   = dsa_slave_notag_xmit,
> .rcv= NULL,
> };
> 
> /*
>  * If the CPU connects to this switch, set the switch tree
>  * tagging protocol to the preferred tagging format of this
>  * switch.
>  */
> if (dst->cpu_switch == ds) {
> enum dsa_tag_protocol tag_protocol;
> 
> tag_protocol = ops->get_tag_protocol(ds);
> dst->tag_ops = dsa_resolve_tag_protocol(tag_protocol);
> if (IS_ERR(dst->tag_ops))
> return PTR_ERR(dst->tag_ops);
> 
> dst->rcv = dst->tag_ops->rcv;
> }
> 
> 
> static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>   struct packet_type *pt, struct net_device *orig_dev)
> {
> struct dsa_switch_tree *dst = dev->dsa_ptr;
> 
> if (unlikely(dst == NULL)) {
> kfree_skb(skb);
> return 0;
> }
> 
> return dst->rcv(skb, dev, pt, orig_dev);
> }
> 
> static struct packet_type dsa_pack_type __read_mostly = {
> .type   = cpu_to_be16(ETH_P_XDSA),
> .func   = dsa_switch_rcv,
> };
> 
> It looks like when a frame is received, we are going to dereference a
> NULL pointer.

Actually we do not, because netdev_uses_dsa() returns true only when
dst->rcv is different from NULL. When dst->rcv is NULL we completely
bypass the DSA hook in eth_type_trans() and everything is well, the
master network device is the one receiving packets.

This is actually the intended behavior for netdev_uses_dsa() because it
really tells whether there is a DSA tagging protocol set-up and that is
what NIC drivers (e.g: bcmsysport) would care about.

Thanks!
-- 
Florian

Re: [PATCH net-next v2 9/9] net: dsa: mv88e6xxx: add cross-chip bridging

2017-03-31 Thread Andrew Lunn

On Thu, Mar 30, 2017 at 05:37:15PM -0400, Vivien Didelot wrote:
> Implement the DSA cross-chip bridging operations by remapping the local
> ports an external source port can egress frames to, when this cross-chip
> port joins or leaves a bridge.
> 
> The PVT is no longer configured with all ones allowing any external
> frame to egress any local port. Only DSA and CPU ports, as well as
> bridge group members, can egress frames on local ports.

Hi Vivien

With the ZII devel B, we have two switches with PVT, and one
without. What happens in this setup? Can the non-PVT switch leak
frames out user ports which should otherwise be blocked?

Thanks
Andrew

Re: [PATCH net-next v2 3/9] net: dsa: mv88e6xxx: program the PVT with all ones

2017-03-31 Thread Andrew Lunn

On Thu, Mar 30, 2017 at 05:37:09PM -0400, Vivien Didelot wrote:
> The Cross-chip Port Based VLAN Table (PVT) is currently initialized with
> all ones, allowing any external ports to egress frames on local ports.
> 
> This commit implements the PVT access functions and programs the PVT
> with all ones for the local switch ports only, instead of using the Init
> operation. The current behavior is unchanged for the moment.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next v2 2/9] net: dsa: mv88e6xxx: use 4-bit port for PVT data

2017-03-31 Thread Andrew Lunn

> +/* Offset 0x1D: Misc Register */
> +
> +static int mv88e6xxx_g2_misc_5_bit_port(struct mv88e6xxx_chip *chip,
> + bool port_5_bit)
> +{
> + u16 val;
> + int err;
> +
> + err = mv88e6xxx_g2_read(chip, GLOBAL2_MISC, &val);
> + if (err)
> + return err;
> +
> + if (port_5_bit)
> + val |= GLOBAL2_MISC_5_BIT_PORT;
> + else
> + val &= ~GLOBAL2_MISC_5_BIT_PORT;
> +
> + return mv88e6xxx_g2_write(chip, GLOBAL2_MISC, val);
> +}
> +
> +int mv88e6xxx_g2_misc_4_bit_port(struct mv88e6xxx_chip *chip)
> +{
> + return mv88e6xxx_g2_misc_5_bit_port(chip, false);
> +}

Hi Vivien

Yes, i know, i'm nit-picking. The function naming is confusing here.
What would you call this function:

int mv88e6xxx_g2_misc_X(struct mv88e6xxx_chip *chip)
{
return mv88e6xxx_g2_misc_5_bit_port(chip, true);
}

Thanks
Andrew

Re: [PATCH net-next v2 4/9] net: dsa: mv88e6xxx: allocate the number of ports

2017-03-31 Thread Andrew Lunn

On Thu, Mar 30, 2017 at 05:37:10PM -0400, Vivien Didelot wrote:
> The current code allocates DSA_MAX_PORTS ports for a Marvell dsa_switch
> structure. Provide the exact number of ports so the corresponding
> ds->num_ports is accurate.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH 1/6] virtio: wrap find_vqs

2017-03-31 Thread Michael S. Tsirkin

On Fri, Mar 31, 2017 at 12:04:55PM +0800, Jason Wang wrote:
> 
> 
> On 2017年03月30日 22:32, Michael S. Tsirkin wrote:
> > On Thu, Mar 30, 2017 at 02:00:08PM +0800, Jason Wang wrote:
> > > 
> > > On 2017年03月30日 04:48, Michael S. Tsirkin wrote:
> > > > We are going to add more parameters to find_vqs, let's wrap the call so
> > > > we don't need to tweak all drivers every time.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin
> > > > ---
> > > A quick glance and it looks ok, but what the benefit of this series, is it
> > > required by other changes?
> > > 
> > > Thanks
> > Yes - to avoid touching all devices when doing the rest of
> > the patchset.
> 
> Maybe I wasn't clear. I mean the benefit of this series, not this single
> patch. I guess it may be used by your proposal that avoids a reset when XDP is set?

In particular, yes. It generally simplifies things significantly if
we can get the true buffer size back.

> If yes, do we really want to drop some packets after XDP is set?
> 
> Thanks

We would rather not drop packets. We could detect and copy them to make
XDP work.

-- 
MST

Re: [PATCH] net: udp: add socket option to report RX queue level

2017-03-31 Thread Eric Dumazet

Please do not top post on netdev

On Mon, 2017-03-27 at 18:08 -0700, Chris Kuiper wrote:
> Sorry, I have been transferring jobs and had no time to look at this.
> 
> Josh Hunt's change seems to solve a different problem. I was looking
> for something that works the same way as SO_RXQ_OVERFL, providing
> information as ancillary data to the recvmsg() call. The problem with
> SO_RXQ_OVERFL alone is that it tells you when things have already gone
> wrong (you dropped data), so the new option SO_RX_ALLOC acts as a
> leading indicator to check if you are getting close to hitting such
> problem.

SO_RXQ_OVERFL gives a very precise info for every skb that was queued.

This is a different indicator, because it can tell you where is the
discontinuity point at the time skb were queued, not at the time they
are dequeued.

Just tune SO_RCVBUF to not even have to care about this.

By the time you sample the queue occupancy, the information might be
completely stale and queue already overflowed.

There is very little point in having a super system call gathering all kinds
of (stale) info.

> 
> Regarding only UDP being supported, it is only meaningful for UDP. TCP
> doesn't drop data and if its buffer gets full it just stops the sender
> from sending more. The buffer level in that case doesn't even tell you
> the whole picture, since it doesn't include any information on how
> much additional buffering is done at the sender side.
> 

We have more protocols than UDP and TCP in linux kernel.

> In terms of "a lot overhead", logically the overhead of adding
> additional getsockopt() calls after each recvmsg() is significantly
> larger than just getting the information as part of recvmsg(). If you
> don't need it, then don't enable this option. Admitted you can reduce
> the frequency of calling getsockopt() relative to recvmsg(), but that
> also increases your risk of missing the point where data is dropped.

Your proposal adds overhead for all UDP recvmsg() calls, while most of
them absolutely not care about overruns. There is little you can do if
you are under attack or if your SO_RCVBUF is too small for the workload.

Some people work hard to reach 2 Millions UDP recvmsg() calls per second
on a single UDP socket, so everything added in fast path will be
scrutinized.

Re: [PATCH net-next v2] net: dsa: Mock-up driver

2017-03-31 Thread Andrew Lunn

Hi Florian

> +static enum dsa_tag_protocol dsa_loop_get_protocol(struct dsa_switch *ds)
> +{
> + dev_dbg(ds->dev, "%s\n", __func__);
> +
> + return DSA_TAG_PROTO_NONE;
> +}

I'm wondering how safe this is:

static const struct dsa_device_ops none_ops = {
.xmit   = dsa_slave_notag_xmit,
.rcv= NULL,
};

/*
 * If the CPU connects to this switch, set the switch tree
 * tagging protocol to the preferred tagging format of this
 * switch.
 */
if (dst->cpu_switch == ds) {
enum dsa_tag_protocol tag_protocol;

tag_protocol = ops->get_tag_protocol(ds);
dst->tag_ops = dsa_resolve_tag_protocol(tag_protocol);
if (IS_ERR(dst->tag_ops))
return PTR_ERR(dst->tag_ops);

dst->rcv = dst->tag_ops->rcv;
}


static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
  struct packet_type *pt, struct net_device *orig_dev)
{
struct dsa_switch_tree *dst = dev->dsa_ptr;

if (unlikely(dst == NULL)) {
kfree_skb(skb);
return 0;
}

return dst->rcv(skb, dev, pt, orig_dev);
}

static struct packet_type dsa_pack_type __read_mostly = {
.type   = cpu_to_be16(ETH_P_XDSA),
.func   = dsa_switch_rcv,
};

It looks like when a frame is received, we are going to dereference a
NULL pointer.

Either we need a NOP rcv function, or we don't register dsa_pack_type
if rcv is NULL.

   Andrew

Re: [PATCH V2 net-next 18/18] net: hns: Some checkpatch.pl script & warning fixes

2017-03-31 Thread Joe Perches

On Fri, 2017-03-31 at 12:20 +0100, Salil Mehta wrote:
> This patch fixes some checkpatch.pl script caught errors and
> warnings during the compilation time.
[]
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c 
> b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
[]
> @@ -512,7 +512,8 @@ static void hns_nic_reuse_page(struct sk_buff *skb, int i,
>   int last_offset;
>   bool twobufs;
>  
> - twobufs = ((PAGE_SIZE < 8192) && hnae_buf_size(ring) == 
> HNS_BUFFER_SIZE_2048);
> + twobufs = ((PAGE_SIZE < 8192) && hnae_buf_size(ring)
> + == HNS_BUFFER_SIZE_2048);

This would read nicer without splitting a comparison test
onto multiple lines

twobufs = PAGE_SIZE < 8192 &&
  hnae_buf_size(ring) == HNS_BUFFER_SIZE_2048;

Re: [PATCH net-next] udp: use sk_protocol instead of pcflag to detect udplite sockets

2017-03-31 Thread Paolo Abeni

On Fri, 2017-03-31 at 08:09 -0700, Eric Dumazet wrote:
> On Fri, 2017-03-31 at 16:33 +0200, Paolo Abeni wrote:
> 
> > I did the above to avoid increasing the udp_sock struct size; this will
> > costs more than a whole cacheline.
> 
> Yes, but who cares :)
> 
> Also note that we discussed about having a secondary receive queue in
> the future, to decouple the fact that producers/consumer have to grab a
> contended spinlock for every enqueued and dequeued packet.
> 
> With a secondary queue, the consumer can transfer one queue into another
> in one batch.
> 
> Or simply use ptr_ring / skb_array now these infras are available thanks
> to Michael.
> 
> So we will likely increase UDP socket size in a near future...
> 
> > 
> > I did not hit others false sharing issues because:
> > - gro_receive/gro_complete are touched only for packets coming from 
> > devices with udp tunnel offload enabled, that hit the tunnel offload
> > path on the nic; such packets will most probably land in the udp tunnel
> >  and will not use 'forward_deficit'
> 
> 
> > - encap_destroy is touched only socket shutdown
> > - encap_rcv is protected by the 'udp_encap_needed' static key
> > 
> > I think this latter is problematic, so I'm ok with the patch you
> > suggested.
> > 
> > The above change could still make sense, the udp code is already
> > checking for udplite sockets with either pcflag and protocol;
> > testing always the same data will make the code more cleaner.
> 
> Where are we testing sk->sk_protocol in receive path ?

Sorry, I was ambiguous: sk->sk_protocol is not used yet; before the
socket lookup, __udp4_lib_rcv() and __udp6_lib_rcv() use the protocol
number provided by the caller to properly account udp vs udplite stats.

Cheers,

Paolo

Re: [PATCH net-next] udp: use sk_protocol instead of pcflag to detect udplite sockets

2017-03-31 Thread Eric Dumazet

On Fri, 2017-03-31 at 15:03 +, David Laight wrote:

> Is that really sensible on systems with large cache lines?

Yes it is.

We mostly do our perf analysis on x86, and it turns out that linux
networking on PowerPC is not great because of this.

struct dst_entry is showing problems, simply because we aligned __refcnt
on 64 byte boundary, which is not enough on PowerPC.

With bigger cache lines, you have bigger chances of false sharing.
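
To make the false sharing point concrete, here is a small C11 sketch of
putting a frequently written field on its own cache line. It is a userspace
analogy only: struct demo_udp_sock is a hypothetical stand-in whose field
names merely echo the thread, and DEMO_CACHE_LINE is an assumed value (64
suits common x86 parts; as noted above, that is not enough on PowerPC).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed line size; 64 fits common x86 parts, larger on some PowerPC. */
#define DEMO_CACHE_LINE 64

/* Hypothetical stand-in for udp_sock; field names echo the thread only. */
struct demo_udp_sock {
	/* Read-mostly fields, consulted for every received packet. */
	unsigned short pcflag;
	void *encap_rcv;

	/* Written by the receiving thread; starts a cache line of its own. */
	alignas(DEMO_CACHE_LINE) int forward_deficit;
};

/* Offset of the hot field modulo the line size; 0 means line-aligned. */
size_t demo_deficit_misalignment(void)
{
	static_assert(sizeof(struct demo_udp_sock) % DEMO_CACHE_LINE == 0,
		      "struct size is padded to whole cache lines");
	return offsetof(struct demo_udp_sock, forward_deficit) % DEMO_CACHE_LINE;
}

/* 1 when the read-mostly and written fields sit on different lines. */
int demo_fields_on_separate_lines(void)
{
	return offsetof(struct demo_udp_sock, pcflag) / DEMO_CACHE_LINE !=
	       offsetof(struct demo_udp_sock, forward_deficit) / DEMO_CACHE_LINE;
}
```

The cost is visible in sizeof(struct demo_udp_sock): the alignment pads the
structure to a multiple of the line size, which is the size increase being
debated in this thread.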

Re: [PATCH net-next] udp: use sk_protocol instead of pcflag to detect udplite sockets

2017-03-31 Thread Eric Dumazet

On Fri, 2017-03-31 at 16:33 +0200, Paolo Abeni wrote:

> I did the above to avoid increasing the udp_sock struct size; this will
> costs more than a whole cacheline.

Yes, but who cares :)

Also note that we discussed about having a secondary receive queue in
the future, to decouple the fact that producers/consumer have to grab a
contended spinlock for every enqueued and dequeued packet.

With a secondary queue, the consumer can transfer one queue into another
in one batch.

Or simply use ptr_ring / skb_array now these infras are available thanks
to Michael.

So we will likely increase UDP socket size in a near future...
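
The secondary queue idea can be sketched as a list splice: the consumer
moves everything from the shared queue to a private one in a single step and
then dequeues without touching the shared state. This is a single-threaded
illustration with hypothetical names (demo_pkt, demo_queue, demo_splice); the
locking is deliberately omitted and only indicated in a comment.

```c
#include <assert.h>
#include <stddef.h>

struct demo_pkt {
	struct demo_pkt *next;
	int id;
};

/* Singly linked FIFO; head is the oldest packet. */
struct demo_queue {
	struct demo_pkt *head;
	struct demo_pkt *tail;
	unsigned int len;
};

static void demo_enqueue(struct demo_queue *q, struct demo_pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
}

/*
 * Move every packet from the shared queue to the consumer's private
 * queue in one step. In a real implementation only this function would
 * run under the contended lock; dequeuing from dst afterwards would not.
 */
static unsigned int demo_splice(struct demo_queue *src, struct demo_queue *dst)
{
	unsigned int moved = src->len;

	if (!src->head)
		return 0;
	if (dst->tail)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	dst->len += moved;
	src->head = src->tail = NULL;
	src->len = 0;
	return moved;
}

static struct demo_pkt *demo_dequeue(struct demo_queue *q)
{
	struct demo_pkt *p = q->head;

	if (!p)
		return NULL;
	q->head = p->next;
	if (!q->head)
		q->tail = NULL;
	q->len--;
	return p;
}

/* Self-check: enqueue n packets, splice once, drain privately. */
int demo_batch_transfer(int n)
{
	struct demo_pkt pkts[8];
	struct demo_queue shared = { NULL, NULL, 0 };
	struct demo_queue priv = { NULL, NULL, 0 };
	struct demo_pkt *p;
	int i, drained = 0;

	assert(n >= 0 && n <= 8);
	for (i = 0; i < n; i++) {
		pkts[i].id = i;
		demo_enqueue(&shared, &pkts[i]);
	}
	if (demo_splice(&shared, &priv) != (unsigned int)n)
		return -1;
	while ((p = demo_dequeue(&priv)) != NULL) {
		if (p->id != drained)	/* FIFO order must be preserved */
			return -1;
		drained++;
	}
	return drained;
}
```

The point of the design is that the lock is taken once per batch instead of
once per packet, at the price of a second queue head in the socket.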

> 
> I did not hit others false sharing issues because:
> - gro_receive/gro_complete are touched only for packets coming from 
> devices with udp tunnel offload enabled, that hit the tunnel offload
> path on the nic; such packets will most probably land in the udp tunnel
>  and will not use 'forward_deficit'

> - encap_destroy is touched only socket shutdown
> - encap_rcv is protected by the 'udp_encap_needed' static key
> 
> I think this latter is problematic, so I'm ok with the patch you
> suggested.
> 
> The above change could still make sense, the udp code is already
> checking for udplite sockets with either pcflag and protocol;
> testing always the same data will make the code more cleaner.

Where are we testing sk->sk_protocol in receive path ?

Thanks Paolo !

RE: [PATCH net-next] udp: use sk_protocol instead of pcflag to detect udplite sockets

2017-03-31 Thread David Laight

From: Eric Dumazet
> Sent: 31 March 2017 14:25
> On Fri, 2017-03-31 at 11:47 +0200, Paolo Abeni wrote:
> > In the udp_sock struct, the 'forward_deficit' and 'pcflag' fields
> > share the same cacheline. While the first is dirtied by
> > udp_recvmsg, the latter is read, possibly several times, by the
> > bottom half processing to discriminate between udp and udplite
> > sockets.
> >
> > With this patch, sk->sk_protocol is used to check if the socket is
> > really an udplite one, avoiding some cache misses per
> > packet and improving the performance under udp_flood with
> > small packet up to 10%.
...
> I am pretty sure we agreed in the past that forward_deficit would need
> to be placed on a cache line of its own. Somehow we manage to not
> implement this properly.
> 
> What about other fields like encap_rcv, encap_destroy, gro_receive,
> gro_complete. They really should have the same false sharing issue.
> 
> Proper fix is :
...
> - /* This field is dirtied by udp_recvmsg() */
> - int forward_deficit;
> + /* This field is dirtied by udp_recvmsg().
> +  * Make sure it wont share a cache line with prior fields.
> +  */
> + int forward_deficit cacheline_aligned_in_smp;

Is that really sensible on systems with large cache lines?

David

Re: [PATCH net-next] udp: use sk_protocol instead of pcflag to detect udplite sockets

2017-03-31 Thread Paolo Abeni

On Fri, 2017-03-31 at 06:25 -0700, Eric Dumazet wrote:
> On Fri, 2017-03-31 at 11:47 +0200, Paolo Abeni wrote:
> > In the udp_sock struct, the 'forward_deficit' and 'pcflag' fields
> > share the same cacheline. While the first is dirtied by
> > udp_recvmsg, the latter is read, possibly several times, by the
> > bottom half processing to discriminate between udp and udplite
> > sockets.
> > 
> > With this patch, sk->sk_protocol is used to check if the socket is
> > really an udplite one, avoiding some cache misses per
> > packet and improving the performance under udp_flood with
> > small packet up to 10%.
> > 
> > Signed-off-by: Paolo Abeni 
> > ---
> >  include/linux/udp.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > index c0f5308..6cb4061 100644
> > --- a/include/linux/udp.h
> > +++ b/include/linux/udp.h
> > @@ -115,6 +115,6 @@ static inline bool udp_get_no_check6_rx(struct sock *sk)
> >  #define udp_portaddr_for_each_entry_rcu(__sk, list) \
> > hlist_for_each_entry_rcu(__sk, list, __sk_common.skc_portaddr_node)
> >  
> > -#define IS_UDPLITE(__sk) (udp_sk(__sk)->pcflag)
> > +#define IS_UDPLITE(__sk) (__sk->sk_protocol == IPPROTO_UDPLITE)
> >  
> >  #endif /* _LINUX_UDP_H */
> 
> 
> 
> I am pretty sure we agreed in the past that forward_deficit would need
> to be placed on a cache line of its own. Somehow we manage to not
> implement this properly.
> 
> What about other fields like encap_rcv, encap_destroy, gro_receive,
> gro_complete. They really should have the same false sharing issue.

I did the above to avoid increasing the udp_sock struct size; this will
costs more than a whole cacheline.

I did not hit others false sharing issues because:
- gro_receive/gro_complete are touched only for packets coming from 
devices with udp tunnel offload enabled, that hit the tunnel offload
path on the nic; such packets will most probably land in the udp tunnel
 and will not use 'forward_deficit'
- encap_destroy is touched only socket shutdown
- encap_rcv is protected by the 'udp_encap_needed' static key

I think this latter is problematic, so I'm ok with the patch you
suggested.

The above change could still make sense, the udp code is already
checking for udplite sockets with either pcflag and protocol;
testing always the same data will make the code more cleaner.

Paolo

Re: [PATCH V2 net-next 1/7] ptr_ring: introduce batch dequeuing

2017-03-31 Thread Michael S. Tsirkin

On Fri, Mar 31, 2017 at 11:52:24AM +0800, Jason Wang wrote:
> 
> 
> On 2017年03月30日 21:53, Michael S. Tsirkin wrote:
> > On Thu, Mar 30, 2017 at 03:22:24PM +0800, Jason Wang wrote:
> > > This patch introduce a batched version of consuming, consumer can
> > > dequeue more than one pointers from the ring at a time. We don't care
> > > about the reorder of reading here so no need for compiler barrier.
> > > 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >   include/linux/ptr_ring.h | 65 
> > > 
> > >   1 file changed, 65 insertions(+)
> > > 
> > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > > index 6c70444..2be0f350 100644
> > > --- a/include/linux/ptr_ring.h
> > > +++ b/include/linux/ptr_ring.h
> > > @@ -247,6 +247,22 @@ static inline void *__ptr_ring_consume(struct 
> > > ptr_ring *r)
> > >   return ptr;
> > >   }
> > > +static inline int __ptr_ring_consume_batched(struct ptr_ring *r,
> > > +  void **array, int n)
> > Can we use a shorter name? ptr_ring_consume_batch?
> 
> Ok, but at least we need to keep the prefix since there's a locked version.
> 
> 
> 
> > 
> > > +{
> > > + void *ptr;
> > > + int i;
> > > +
> > > + for (i = 0; i < n; i++) {
> > > + ptr = __ptr_ring_consume(r);
> > > + if (!ptr)
> > > + break;
> > > + array[i] = ptr;
> > > + }
> > > +
> > > + return i;
> > > +}
> > > +
> > >   /*
> > >* Note: resize (below) nests producer lock within consumer lock, so if 
> > > you
> > >* call this in interrupt or BH context, you must disable interrupts/BH 
> > > when
> > I'd like to add a code comment here explaining why we don't
> > care about cpu or compiler reordering. And I think the reason is
> > in the way you use this API: in vhost it does not matter
> > if you get less entries than present in the ring.
> > That's ok but needs to be noted
> > in a code comment so people use this function correctly.
> 
> Interesting, but I still think it's not necessary.
> 
> If consumer is doing a busy polling, it will eventually get the entries. If
> the consumer need notification from producer, it should drain the queue
> which means it need enable notification before last try of consuming call,
> otherwise it was a bug. The batch consuming function in this patch can
> guarantee return at least one pointer if there's many, this looks sufficient
> for the correctness?
> 
> Thanks

You ask for N entries but get N-1. This seems to imply the
ring is now empty. Do we guarantee this?
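
The question can be explored with a simplified, single-threaded model of the
ring, where a NULL slot means "empty". All names below (demo_ring,
demo_ring_produce, demo_batch, ...) are hypothetical and the model ignores
locking and memory ordering entirely. In this model a short return does mean
an empty slot was seen; with a concurrent producer it would only mean the ring
looked empty at that instant, which is exactly the guarantee being asked
about.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 8

/* Simplified single-threaded model: a NULL slot means "empty". */
struct demo_ring {
	void *queue[DEMO_RING_SIZE];
	int producer;
	int consumer;
};

/* Returns 0 on success, -1 when the next slot is still occupied (full). */
static int demo_ring_produce(struct demo_ring *r, void *ptr)
{
	assert(ptr != NULL);	/* NULL is reserved as the empty marker */
	if (r->queue[r->producer])
		return -1;
	r->queue[r->producer] = ptr;
	r->producer = (r->producer + 1) % DEMO_RING_SIZE;
	return 0;
}

static void *demo_ring_consume(struct demo_ring *r)
{
	void *ptr = r->queue[r->consumer];

	if (ptr) {
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) % DEMO_RING_SIZE;
	}
	return ptr;
}

/* Same loop shape as the quoted patch: stop at the first empty slot. */
static int demo_ring_consume_batched(struct demo_ring *r, void **array, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		void *ptr = demo_ring_consume(r);

		if (!ptr)
			break;
		array[i] = ptr;
	}
	return i;
}

/* Self-check: queue `queued` items, then ask for a batch of `want`. */
int demo_batch(int queued, int want)
{
	static int items[DEMO_RING_SIZE];
	struct demo_ring r = { { NULL }, 0, 0 };
	void *out[DEMO_RING_SIZE];
	int i;

	assert(queued >= 0 && queued <= DEMO_RING_SIZE);
	assert(want >= 0 && want <= DEMO_RING_SIZE);
	for (i = 0; i < queued; i++)
		if (demo_ring_produce(&r, &items[i]))
			return -1;
	return demo_ring_consume_batched(&r, out, want);
}
```

Asking for 8 entries when 3 are queued returns 3, and asking for 4 when 5 are
queued returns 4, so only a return value smaller than n says anything about
emptiness.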


> > 
> > Also, I think you need to repeat the comment about cpu_relax
> > near this function: if someone uses it in a loop,
> > a compiler barrier is needed to prevent compiler from
> > optimizing it out.
> > 
> > I note that ptr_ring_consume currently lacks any of these
> > comments so I'm ok with merging as is, and I'll add
> > documentation on top.
> > Like this perhaps?
> > 
> > /* Consume up to n entries and return the number of entries consumed
> >   * or 0 on ring empty.
> >   * Note: this might return early with less entries than present in the
> >   * ring.
> >   * Note: callers invoking this in a loop must use a compiler barrier,
> >   * for example cpu_relax(). Callers must take consumer_lock
> >   * if the ring is ever resized - see e.g. ptr_ring_consume_batch.
> >   */
> > 
> > 
> > 
> > > @@ -297,6 +313,55 @@ static inline void *ptr_ring_consume_bh(struct 
> > > ptr_ring *r)
> > >   return ptr;
> > >   }
> > > +static inline int ptr_ring_consume_batched(struct ptr_ring *r,
> > > +void **array, int n)
> > > +{
> > > + int ret;
> > > +
> > > > + spin_lock(&r->consumer_lock);
> > > > + ret = __ptr_ring_consume_batched(r, array, n);
> > > > + spin_unlock(&r->consumer_lock);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static inline int ptr_ring_consume_batched_irq(struct ptr_ring *r,
> > > +void **array, int n)
> > > +{
> > > + int ret;
> > > +
> > > > + spin_lock_irq(&r->consumer_lock);
> > > > + ret = __ptr_ring_consume_batched(r, array, n);
> > > > + spin_unlock_irq(&r->consumer_lock);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static inline int ptr_ring_consume_batched_any(struct ptr_ring *r,
> > > +void **array, int n)
> > > +{
> > > + unsigned long flags;
> > > + int ret;
> > > +
> > > > + spin_lock_irqsave(&r->consumer_lock, flags);
> > > > + ret = __ptr_ring_consume_batched(r, array, n);
> > > > + spin_unlock_irqrestore(&r->consumer_lock, flags);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static inline int ptr_ring_consume_batched_bh(struct ptr_ring *r,
> > > +   void **array, int n)
> > > +{
> > > + int ret;
> > > +
> > > > + spin_lock_bh(&r->consumer_lock);
> > > > + ret = __ptr_ring_consume_batched(r, array, n);
> > > > + spin_unlock_bh(&r->consumer_lock);
> > > +
> > > + return ret;
> > > +}
> >

[PATCH net-next v3 3/6] net: mpls: change mpls_route layout

2017-03-31 Thread David Ahern

Move labels to the end of mpls_nh as a 0-sized array and within mpls_route
move the via for a nexthop after the mpls_nh. The new layout becomes:

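    +----------------------+
    | mpls_route           |
    +----------------------+
    | mpls_nh 0            |
    +----------------------+
    | alignment padding    |   4 bytes for odd number of labels; 0 for even
    +----------------------+
    | via[rt_max_alen] 0   |
    +----------------------+
    | alignment padding    |   via's aligned on sizeof(unsigned long)
    +----------------------+
    | ...                  |
    +----------------------+
    | mpls_nh n-1          |
    +----------------------+
    | via[rt_max_alen] n-1 |
    +----------------------+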

Memory allocated for nexthop + via is constant across all nexthops and
their via. It is based on the maximum number of labels across all nexthops
and the maximum via length. The size is saved in the mpls_route as
rt_nh_size. Accessing a nexthop becomes rt->rt_nh + index * rt->rt_nh_size.

The offset of the via address from a nexthop is saved as rt_via_offset
so that given an mpls_nh pointer the via for that hop is simply
nh + rt->rt_via_offset.
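
The offset arithmetic described above can be sketched in userspace C. This is
an illustration under assumptions, not the patch's macros: the demo_* names
are hypothetical, DEMO_VIA_ALIGN assumes an 8-byte unsigned long, labels are
taken to be 32 bits each, and the nexthop header size is passed in as a
parameter rather than taken from the real struct mpls_nh.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment for the via address (sizeof(unsigned long) on LP64). */
#define DEMO_VIA_ALIGN 8u

/* Round x up to the next multiple of a (a must be a power of two). */
static size_t demo_align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Offset of the via address from the start of a nexthop. */
size_t demo_via_offset(size_t nh_hdr, unsigned int labels)
{
	return demo_align_up(nh_hdr + labels * sizeof(uint32_t), DEMO_VIA_ALIGN);
}

/* Fixed stride of one nexthop slot: header, labels, padding, via, padding. */
size_t demo_nh_size(size_t nh_hdr, unsigned int max_labels, unsigned int max_alen)
{
	return demo_via_offset(nh_hdr, max_labels) +
	       demo_align_up(max_alen, DEMO_VIA_ALIGN);
}

/* Address of nexthop `index` inside a route's trailing storage. */
uint8_t *demo_get_nexthop(uint8_t *rt_nh, size_t nh_size, unsigned int index)
{
	return rt_nh + (size_t)index * nh_size;
}
```

With an assumed 16-byte header, one label needs 4 bytes of padding and two
labels need none, so demo_via_offset() gives 24 in both cases. That matches
the note in the diagram and explains why the stride does not change between 1
and 2 labels.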

With prior code, memory allocated per mpls_route with 1 nexthop:
 via is an ethernet address - 64 bytes
 via is an ipv4 address - 64
 via is an ipv6 address - 72

With this patch set, memory allocated per mpls_route with 1 nexthop and
1 or 2 labels:
 via is an ethernet address - 56 bytes
 via is an ipv4 address - 56
 via is an ipv6 address - 64

The 8-byte reduction is due to the previous patch; the change introduced
by this patch has no impact on the size of allocations for 1 or 2 labels.

Performance impact of this change was examined using network namespaces
with veth pairs connecting namespaces. ns0 inserts the packet to the
label-switched path using an lwt route with encap mpls. ns1 adds 1 or 2
labels depending on test, ns2 (and ns3 for 2-label test) pops the label
and forwards. ns3 (or ns4) for a 2-label is the destination. Similar
series of namespaces used for 2-nexthop test.

Intent is to measure changes to latency (overhead in manipulating the
packet) in the forwarding path. Tests used netperf with UDP_RR.

IPv4:                     current   patches
   1 label, 1 nexthop       29908     30115
   2 label, 1 nexthop       29071     29612
   1 label, 2 nexthop       29582     29776
   2 label, 2 nexthop       29086     29149

IPv6:                     current   patches
   1 label, 1 nexthop       24502     24960
   2 label, 1 nexthop       24041     24407
   1 label, 2 nexthop       23795     23899
   2 label, 2 nexthop       23074     22959

In short, the change has no effect to a modest increase in performance.
This is expected since this patch does not really have an impact on routes
with 1 or 2 labels (the current limit) and 1 or 2 nexthops.

Signed-off-by: David Ahern 
---
v3
- no change

v2
- add u8 and u16 reserved variables to explicitly note holes in mpls_nh
  and mpls_route

 net/mpls/af_mpls.c  | 37 +
 net/mpls/internal.h | 45 ++---
 2 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 665dec84f001..1863b94133e4 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -24,6 +24,8 @@
 #include 
 #include "internal.h"
 
+#define MAX_NEW_LABELS 2
+
 /* Maximum number of labels to look ahead at when selecting a path of
  * a multipath route
  */
@@ -60,10 +62,7 @@ EXPORT_SYMBOL_GPL(mpls_output_possible);
 
 static u8 *__mpls_nh_via(struct mpls_route *rt, struct mpls_nh *nh)
 {
-   u8 *nh0_via = PTR_ALIGN((u8 *)&rt->rt_nh[rt->rt_nhn], VIA_ALEN_ALIGN);
-   int nh_index = nh - rt->rt_nh;
-
-   return nh0_via + rt->rt_max_alen * nh_index;
+   return (u8 *)nh + rt->rt_via_offset;
 }
 
 static const u8 *mpls_nh_via(const struct mpls_route *rt,
@@ -189,6 +188,11 @@ static u32 mpls_multipath_hash(struct mpls_route *rt, 
struct sk_buff *skb)
return hash;
 }
 
+static struct mpls_nh *mpls_get_nexthop(struct mpls_route *rt, u8 index)
+{
+   return (struct mpls_nh *)((u8 *)rt->rt_nh + index * rt->rt_nh_size);
+}
+
 /* number of alive nexthops (rt->rt_nhn_alive) and the flags for
  * a next hop (nh->nh_flags) are modified by netdev event handlers.
  * Since those fields can change at any moment, use READ_ONCE to
@@ -206,7 +210,7 @@ static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
 * one path
 */
if (rt->rt_nhn == 1)
-   goto out;
+   return rt->rt_nh;
 
alive = READ_ONCE(rt->rt_nhn_alive);
if (alive == 0)
@@ -227,7 +231,7 @@ static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
} endfor_nexthops(rt);
 
 out:
-   return &rt->rt_nh[nh_index];
+   return mpls_get_nexthop(rt, nh_index);
 }
 
 static bool mpls_egress(struct net *net, struct

[PATCH net-next v3 4/6] net: mpls: Limit memory allocation for mpls_route

2017-03-31 Thread David Ahern

Limit memory allocation size for mpls_route to 4096.
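
The allocator now has two distinct failure modes (request too large versus
out of memory), which is why the diff switches callers from a NULL check to
error pointers. A userspace sketch of that pattern follows. The helpers
err_ptr(), ptr_err() and is_err() are stand-ins that only imitate the
kernel's ERR_PTR(), PTR_ERR() and IS_ERR(); route_alloc() and its parameters
are illustrative names, not the patch's code:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO	4095
#define MAX_ROUTE_MEM	4096	/* mirrors MAX_MPLS_ROUTE_MEM */

/* Encode a negative errno value in the pointer itself. */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* hdr_size and nh_size are hypothetical parameters for this sketch. */
static void *route_alloc(size_t hdr_size, unsigned int num_nh, size_t nh_size)
{
	size_t size = hdr_size + (size_t)num_nh * nh_size;
	void *rt;

	if (size > MAX_ROUTE_MEM)
		return err_ptr(-EINVAL);	/* caller asked for too much */

	rt = calloc(1, size);
	if (!rt)
		return err_ptr(-ENOMEM);	/* allocator failed */

	return rt;
}
```

A caller can then tell the two cases apart with is_err() and ptr_err()
instead of assuming that every failure is -ENOMEM.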

Signed-off-by: David Ahern 
---
v3
- no change

v2
- new patch in v2 of set

 net/mpls/af_mpls.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1863b94133e4..f84c52b6eafc 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -26,6 +26,9 @@
 
 #define MAX_NEW_LABELS 2
 
+/* max memory we will use for mpls_route */
+#define MAX_MPLS_ROUTE_MEM 4096
+
 /* Maximum number of labels to look ahead at when selecting a path of
  * a multipath route
  */
@@ -477,14 +480,20 @@ static struct mpls_route *mpls_rt_alloc(u8 num_nh, u8 max_alen, u8 max_labels)
 {
u8 nh_size = MPLS_NH_SIZE(max_labels, max_alen);
struct mpls_route *rt;
+   size_t size;
 
-   rt = kzalloc(sizeof(*rt) + num_nh * nh_size, GFP_KERNEL);
-   if (rt) {
-   rt->rt_nhn = num_nh;
-   rt->rt_nhn_alive = num_nh;
-   rt->rt_nh_size = nh_size;
-   rt->rt_via_offset = MPLS_NH_VIA_OFF(max_labels);
-   }
+   size = sizeof(*rt) + num_nh * nh_size;
+   if (size > MAX_MPLS_ROUTE_MEM)
+   return ERR_PTR(-EINVAL);
+
+   rt = kzalloc(size, GFP_KERNEL);
+   if (!rt)
+   return ERR_PTR(-ENOMEM);
+
+   rt->rt_nhn = num_nh;
+   rt->rt_nhn_alive = num_nh;
+   rt->rt_nh_size = nh_size;
+   rt->rt_via_offset = MPLS_NH_VIA_OFF(max_labels);
 
return rt;
 }
@@ -898,8 +907,10 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 
err = -ENOMEM;
rt = mpls_rt_alloc(nhs, max_via_alen, MAX_NEW_LABELS);
-   if (!rt)
+   if (IS_ERR(rt)) {
+   err = PTR_ERR(rt);
goto errout;
+   }
 
rt->rt_protocol = cfg->rc_protocol;
rt->rt_payload_type = cfg->rc_payload_type;
@@ -1970,7 +1981,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
if (limit > MPLS_LABEL_IPV4NULL) {
struct net_device *lo = net->loopback_dev;
rt0 = mpls_rt_alloc(1, lo->addr_len, MAX_NEW_LABELS);
-   if (!rt0)
+   if (IS_ERR(rt0))
goto nort0;
RCU_INIT_POINTER(rt0->rt_nh->nh_dev, lo);
rt0->rt_protocol = RTPROT_KERNEL;
@@ -1984,7 +1995,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
if (limit > MPLS_LABEL_IPV6NULL) {
struct net_device *lo = net->loopback_dev;
rt2 = mpls_rt_alloc(1, lo->addr_len, MAX_NEW_LABELS);
-   if (!rt2)
+   if (IS_ERR(rt2))
goto nort2;
RCU_INIT_POINTER(rt2->rt_nh->nh_dev, lo);
rt2->rt_protocol = RTPROT_KERNEL;
-- 
2.1.4

[PATCH net-next v3 0/6] net: mpls: Allow users to configure more labels per route

2017-03-31 Thread David Ahern

Increase the maximum number of new labels for MPLS routes from 2 to 30.

To keep memory consumption in check, the labels array is moved to the end
of mpls_nh and mpls_iptunnel_encap structs as a 0-sized array. Allocations
use the maximum number of labels across all nexthops in a route for LSR
and the number of labels configured for LWT.

The mpls_route layout is changed to:

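   +----------------------+
   | mpls_route           |
   +----------------------+
   | mpls_nh 0            |
   +----------------------+
   | alignment padding    |   4 bytes for odd number of labels; 0 for even
   +----------------------+
   | via[rt_max_alen] 0   |
   +----------------------+
   | alignment padding    |   via's aligned on sizeof(unsigned long)
   +----------------------+
   | ...                  |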

Meaning the via follows its mpls_nh providing better locality as the
number of labels increases. UDP_RR tests with namespaces show no impact
to a modest performance increase with this layout for 1 or 2 labels and
1 or 2 nexthops.
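
The layout in the diagram can be sketched with plain offset arithmetic. This
is an illustration only: struct nh_hdr is an assumed stand-in for the fixed
part of a nexthop (the real struct mpls_nh may differ), and nh_via_offset(),
nh_size(), get_nexthop() and nh_via() are hypothetical names for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VIA_ALIGN	sizeof(unsigned long)
#define ALIGN_UP(x, a)	(((x) + (a) - 1) / (a) * (a))

/* Assumed fixed part of one nexthop; the labels follow it directly. */
struct nh_hdr {
	void *dev;
	uint32_t flags;
	uint8_t labels, via_alen, via_table, reserved1;
};

/* Offset of the via address from the start of its nexthop. */
static size_t nh_via_offset(unsigned int labels)
{
	return ALIGN_UP(sizeof(struct nh_hdr) + labels * sizeof(uint32_t),
			VIA_ALIGN);
}

/* Size of one nexthop slot: header, labels, padding, via, padding. */
static size_t nh_size(unsigned int labels, unsigned int max_alen)
{
	return nh_via_offset(labels) + ALIGN_UP(max_alen, VIA_ALIGN);
}

/* All slots have the same size, so nexthop i is a fixed stride away. */
static uint8_t *get_nexthop(uint8_t *first_nh, unsigned int index,
			    size_t stride)
{
	return first_nh + index * stride;
}

/* The via lives inside the slot of the nexthop it belongs to. */
static uint8_t *nh_via(uint8_t *nh, size_t via_offset)
{
	return nh + via_offset;
}
```

Because every slot is sized for the largest label count and via length in the
route, locating a nexthop or its via needs only a multiply and an add.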

mpls_route allocation size is limited to 4096 bytes allowing on the
order of 30 nexthops with 30 labels (or more nexthops with fewer
labels). LWT encap shares same maximum number of labels as mpls routing.
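
To see where "on the order of 30 nexthops" comes from, the budget can be
worked through numerically. The sketch below is arithmetic only and rests on
assumed sizes for a 64-bit build: a 24-byte mpls_route header, a 16-byte
fixed nexthop header, 4-byte labels and 8-byte alignment. max_nexthops() is a
hypothetical helper, not a function from the series:

```c
#include <assert.h>
#include <stddef.h>

#define ROUTE_MEM_LIMIT	4096	/* MAX_MPLS_ROUTE_MEM in the series */

/* How many equally sized nexthop slots fit under the limit.
 * route_hdr and nh_hdr are assumed structure sizes passed in by the caller.
 */
static unsigned int max_nexthops(size_t route_hdr, size_t nh_hdr,
				 unsigned int labels, size_t via_len)
{
	size_t align = 8;	/* assumed sizeof(unsigned long) */
	size_t via_off = (nh_hdr + labels * 4 + align - 1) / align * align;
	size_t nh_size = via_off + (via_len + align - 1) / align * align;

	return (unsigned int)((ROUTE_MEM_LIMIT - route_hdr) / nh_size);
}
```

Under those assumptions, 30 labels with a 16-byte via allows 26 nexthops,
30 labels with a 4-byte via allows 28, and 2 labels with a 16-byte via allows
101, which matches the "more nexthops with fewer labels" remark.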

v3
- initialize n_labels to 0 in case RTA_NEWDST is not defined; detected
  by the kbuild test robot

v2
- updates per Eric's comments
  + added patch to ensure all reads of rt_nhn_alive and nh_flags in
the packet path use READ_ONCE and all writes via event handlers
use WRITE_ONCE

  + limit mpls_route size to 4096 (PAGE_SIZE for most arch)

  + mostly killed use of MAX_NEW_LABELS; it exists only for common
limit between lwt and routing paths

David Ahern (6):
  net: mpls: rt_nhn_alive and nh_flags should be accessed using
READ_ONCE
  net: mpls: Convert number of nexthops to u8
  net: mpls: change mpls_route layout
  net: mpls: Limit memory allocation for mpls_route
  net: mpls: bump maximum number of labels
  net: mpls: Increase max number of labels for lwt encap

 include/net/mpls_iptunnel.h |   5 +-
 net/mpls/af_mpls.c  | 210 +---
 net/mpls/internal.h |  61 +
 net/mpls/mpls_iptunnel.c|  13 ++-
 4 files changed, 196 insertions(+), 93 deletions(-)

-- 
2.1.4

[PATCH net-next v3 1/6] net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE

2017-03-31 Thread David Ahern

The number of alive nexthops for a route (rt->rt_nhn_alive) and the
flags for a next hop (nh->nh_flags) are modified by netdev event
handlers. The event handlers run with rtnl_lock held so updates are
always done with the lock held. The packet path accesses the fields
under the rcu lock. Since those fields can change at any moment in
the packet path, both fields should be accessed using READ_ONCE. Updates
to both fields should use WRITE_ONCE.
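
The reader/writer discipline described above can be sketched in userspace C.
The macros below only approximate the kernel's READ_ONCE/WRITE_ONCE (they
rely on the GCC/Clang __typeof__ extension); the structures, flag values and
the functions pick_alive() and mark_down() are simplified stand-ins invented
for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace approximations of the kernel macros. */
#define READ_ONCE(x)	 (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define NH_F_DEAD	0x01	/* illustrative values, not RTNH_F_* */
#define NH_F_LINKDOWN	0x10

struct nh {
	unsigned int flags;
};

struct route {
	uint8_t nhn;
	uint8_t nhn_alive;
	struct nh nh[4];
};

/* Reader: snapshot each shared field once, then use only the snapshot. */
static int pick_alive(struct route *rt, unsigned int hash)
{
	uint8_t alive = READ_ONCE(rt->nhn_alive);
	unsigned int n = 0, want;

	if (alive == 0)
		return -1;

	want = hash % alive;
	for (int i = 0; i < rt->nhn; i++) {
		unsigned int flags = READ_ONCE(rt->nh[i].flags);

		if (flags & (NH_F_DEAD | NH_F_LINKDOWN))
			continue;
		if (n++ == want)
			return i;
	}
	return -1;
}

/* Writer: build the new value in a local and publish it with one store. */
static void mark_down(struct route *rt, int i)
{
	unsigned int flags = rt->nh[i].flags | NH_F_DEAD | NH_F_LINKDOWN;

	if (rt->nh[i].flags != flags) {
		WRITE_ONCE(rt->nh[i].flags, flags);
		WRITE_ONCE(rt->nhn_alive, rt->nhn_alive - 1);
	}
}
```

The point of the local snapshot is that a comparison and a later use of the
same field cannot observe two different values if an event handler runs in
between.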

Update mpls_select_multipath (packet path) and mpls_ifdown and mpls_ifup
(event handlers) accordingly.

Signed-off-by: David Ahern 
---
v3
- no change

v2
- new patch in v2 of set

 net/mpls/af_mpls.c  | 36 +---
 net/mpls/internal.h |  8 
 2 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 06ffafde70da..6bdd2f95b576 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -189,10 +189,15 @@ static u32 mpls_multipath_hash(struct mpls_route *rt, struct sk_buff *skb)
return hash;
 }
 
+/* number of alive nexthops (rt->rt_nhn_alive) and the flags for
+ * a next hop (nh->nh_flags) are modified by netdev event handlers.
+ * Since those fields can change at any moment, use READ_ONCE to
+ * access both.
+ */
 static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
 struct sk_buff *skb)
 {
-   int alive = ACCESS_ONCE(rt->rt_nhn_alive);
+   unsigned int alive;
u32 hash = 0;
int nh_index = 0;
int n = 0;
@@ -203,7 +208,8 @@ static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
if (rt->rt_nhn == 1)
goto out;
 
-   if (alive <= 0)
+   alive = READ_ONCE(rt->rt_nhn_alive);
+   if (alive == 0)
return NULL;
 
hash = mpls_multipath_hash(rt, skb);
@@ -211,7 +217,9 @@ static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
if (alive == rt->rt_nhn)
goto out;
for_nexthops(rt) {
-   if (nh->nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
+   unsigned int nh_flags = READ_ONCE(nh->nh_flags);
+
+   if (nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
continue;
if (n == nh_index)
return nh;
@@ -1302,7 +1310,6 @@ static void mpls_ifdown(struct net_device *dev, int event)
 {
struct mpls_route __rcu **platform_label;
struct net *net = dev_net(dev);
-   unsigned int nh_flags = RTNH_F_DEAD | RTNH_F_LINKDOWN;
unsigned int alive, deleted;
unsigned index;
 
@@ -1316,22 +1323,27 @@ static void mpls_ifdown(struct net_device *dev, int event)
alive = 0;
deleted = 0;
change_nexthops(rt) {
+   unsigned int nh_flags = nh->nh_flags;
+
if (rtnl_dereference(nh->nh_dev) != dev)
goto next;
 
switch (event) {
case NETDEV_DOWN:
case NETDEV_UNREGISTER:
-   nh->nh_flags |= RTNH_F_DEAD;
+   nh_flags |= RTNH_F_DEAD;
/* fall through */
case NETDEV_CHANGE:
-   nh->nh_flags |= RTNH_F_LINKDOWN;
+   nh_flags |= RTNH_F_LINKDOWN;
break;
}
if (event == NETDEV_UNREGISTER)
RCU_INIT_POINTER(nh->nh_dev, NULL);
+
+   if (nh->nh_flags != nh_flags)
+   WRITE_ONCE(nh->nh_flags, nh_flags);
 next:
-   if (!(nh->nh_flags & nh_flags))
+   if (!(nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN)))
alive++;
if (!rtnl_dereference(nh->nh_dev))
deleted++;
@@ -1345,7 +1357,7 @@ static void mpls_ifdown(struct net_device *dev, int event)
}
 }
 
-static void mpls_ifup(struct net_device *dev, unsigned int nh_flags)
+static void mpls_ifup(struct net_device *dev, unsigned int flags)
 {
struct mpls_route __rcu **platform_label;
struct net *net = dev_net(dev);
@@ -1361,20 +1373,22 @@ static void mpls_ifup(struct net_device *dev, unsigned int nh_flags)
 
alive = 0;
change_nexthops(rt) {
+   unsigned int nh_flags = nh->nh_flags;
struct net_device *nh_dev =
rtnl_dereference(nh->nh_dev);
 
-   if (!(nh->nh_flags & nh_flags)) {
+   if (!(nh_flags & flags)) {
alive++;
continue;
}
if

[PATCH net-next v3 6/6] net: mpls: Increase max number of labels for lwt encap

2017-03-31 Thread David Ahern

Allow users to push down more labels per MPLS encap. Similar to LSR case,
move label array to the end of mpls_iptunnel_encap and allocate based on
the number of labels for the route.

For consistency with the LSR case, re-use the same maximum number of
labels.
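
Moving the label array to the end of the struct and sizing the allocation by
the actual label count can be sketched in userspace C as follows. struct
encap only mirrors the field order shown in the diff; encap_alloc() and
encap_cmp() are hypothetical helpers written for this illustration, and the
sketch uses a C99 flexible array member where the patch uses label[0]:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LABELS 30	/* same value the series uses for MAX_NEW_LABELS */

struct encap {
	uint8_t labels;
	uint8_t ttl_propagate;
	uint8_t default_ttl;
	uint8_t reserved1;	/* names the hole explicitly */
	uint32_t label[];	/* sized at allocation time */
};

/* Allocate exactly enough room for n labels and copy them in. */
static struct encap *encap_alloc(const uint32_t *labels, unsigned int n)
{
	struct encap *e;

	if (n == 0 || n > MAX_LABELS)
		return NULL;

	e = calloc(1, sizeof(*e) + n * sizeof(e->label[0]));
	if (!e)
		return NULL;

	e->labels = (uint8_t)n;
	memcpy(e->label, labels, n * sizeof(e->label[0]));
	return e;
}

/* Compare only the labels that are present; returns 0 when equal. */
static int encap_cmp(const struct encap *a, const struct encap *b)
{
	if (a->labels != b->labels)
		return 1;

	for (unsigned int l = 0; l < a->labels; l++)
		if (a->label[l] != b->label[l])
			return 1;
	return 0;
}
```

With this shape an encap with one label costs 8 bytes here rather than the
full worst case, and comparison never reads past the allocated labels.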

Signed-off-by: David Ahern 
---
v3
- no change

v2
- marked hole in mpls_iptunnel_encap as reserved1

 include/net/mpls_iptunnel.h |  5 ++---
 net/mpls/af_mpls.c  |  5 -
 net/mpls/internal.h |  5 +
 net/mpls/mpls_iptunnel.c| 13 ++---
 4 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
index a18af6a16eb5..9d22bf67ac86 100644
--- a/include/net/mpls_iptunnel.h
+++ b/include/net/mpls_iptunnel.h
@@ -14,13 +14,12 @@
 #ifndef _NET_MPLS_IPTUNNEL_H
 #define _NET_MPLS_IPTUNNEL_H 1
 
-#define MAX_NEW_LABELS 2
-
 struct mpls_iptunnel_encap {
-   u32 label[MAX_NEW_LABELS];
u8  labels;
u8  ttl_propagate;
u8  default_ttl;
+   u8  reserved1;
+   u32 label[0];
 };
 
 static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 2458d7ed2ab5..2da15dcb2675 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -24,11 +24,6 @@
 #include 
 #include "internal.h"
 
-/* put a reasonable limit on the number of labels
- * we will accept from userspace
- */
-#define MAX_NEW_LABELS 30
-
 /* max memory we will use for mpls_route */
 #define MAX_MPLS_ROUTE_MEM 4096
 
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index c5d2f5bc37ec..4db6a5971322 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -2,6 +2,11 @@
 #define MPLS_INTERNAL_H
 #include 
 
+/* put a reasonable limit on the number of labels
+ * we will accept from userspace
+ */
+#define MAX_NEW_LABELS 30
+
 struct mpls_entry_decoded {
u32 label;
u8 ttl;
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index 22f71fce0bfb..fe00e98667cf 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -164,6 +164,7 @@ static int mpls_build_state(struct nlattr *nla,
struct mpls_iptunnel_encap *tun_encap_info;
struct nlattr *tb[MPLS_IPTUNNEL_MAX + 1];
struct lwtunnel_state *newts;
+   u8 n_labels;
int ret;
 
ret = nla_parse_nested(tb, MPLS_IPTUNNEL_MAX, nla,
@@ -175,12 +176,18 @@ static int mpls_build_state(struct nlattr *nla,
return -EINVAL;
 
 
-   newts = lwtunnel_state_alloc(sizeof(*tun_encap_info));
+   /* determine number of labels */
+   if (nla_get_labels(tb[MPLS_IPTUNNEL_DST],
+  MAX_NEW_LABELS, &n_labels, NULL))
+   return -EINVAL;
+
+   newts = lwtunnel_state_alloc(sizeof(*tun_encap_info) +
+n_labels * sizeof(u32));
if (!newts)
return -ENOMEM;
 
tun_encap_info = mpls_lwtunnel_encap(newts);
-   ret = nla_get_labels(tb[MPLS_IPTUNNEL_DST], MAX_NEW_LABELS,
+   ret = nla_get_labels(tb[MPLS_IPTUNNEL_DST], n_labels,
 &tun_encap_info->labels, tun_encap_info->label);
if (ret)
goto errout;
@@ -257,7 +264,7 @@ static int mpls_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
a_hdr->default_ttl != b_hdr->default_ttl)
return 1;
 
-   for (l = 0; l < MAX_NEW_LABELS; l++)
+   for (l = 0; l < a_hdr->labels; l++)
if (a_hdr->label[l] != b_hdr->label[l])
return 1;
return 0;
-- 
2.1.4

[PATCH net-next v3 5/6] net: mpls: bump maximum number of labels

2017-03-31 Thread David Ahern

Allow users to push down more labels per MPLS route. With the previous
patches, no memory allocations are based on MAX_NEW_LABELS; the limit
is only used to keep userspace in check.

At this point MAX_NEW_LABELS is only used for mpls_route_config (copying
route data from userspace) and processing nexthops looking for the max
number of labels across the route spec.
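
The "max number of labels across the route spec" step amounts to a single
pass over the nexthops. A minimal sketch, assuming the per-nexthop label
counts have already been extracted into an array; max_labels_across() is a
hypothetical helper and not the patch's mpls_count_nexthops():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NEW_LABELS 30

/* Store the largest per-nexthop label count in *max_labels.
 * Returns 0 on success, or -1 if any nexthop exceeds the userspace limit.
 */
static int max_labels_across(const unsigned int *nh_labels, size_t n,
			     uint8_t *max_labels)
{
	*max_labels = 0;

	for (size_t i = 0; i < n; i++) {
		if (nh_labels[i] > MAX_NEW_LABELS)
			return -1;
		if (nh_labels[i] > *max_labels)
			*max_labels = (uint8_t)nh_labels[i];
	}
	return 0;
}
```

The result is what sizes every nexthop slot in the route, so the limit check
happens before any memory is allocated.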

Signed-off-by: David Ahern 
---
v3
- initialize n_labels to 0 in case RTA_NEWDST is not defined; detected
  by the kbuild test robot

v2
- increased MAX_NEW_LABELS to 30
- allocate mpls_route_config dynamically to reduce stack usage with
  new label count

 net/mpls/af_mpls.c  | 103 +++-
 net/mpls/internal.h |   2 +-
 2 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index f84c52b6eafc..2458d7ed2ab5 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -24,7 +24,10 @@
 #include 
 #include "internal.h"
 
-#define MAX_NEW_LABELS 2
+/* put a reasonable limit on the number of labels
+ * we will accept from userspace
+ */
+#define MAX_NEW_LABELS 30
 
 /* max memory we will use for mpls_route */
 #define MAX_MPLS_ROUTE_MEM 4096
@@ -698,9 +701,6 @@ static int mpls_nh_build_from_cfg(struct mpls_route_config *cfg,
return -ENOMEM;
 
err = -EINVAL;
-   /* Ensure only a supported number of labels are present */
-   if (cfg->rc_output_labels > MAX_NEW_LABELS)
-   goto errout;
 
nh->nh_labels = cfg->rc_output_labels;
for (i = 0; i < nh->nh_labels; i++)
@@ -725,7 +725,7 @@ static int mpls_nh_build_from_cfg(struct mpls_route_config *cfg,
 
 static int mpls_nh_build(struct net *net, struct mpls_route *rt,
 struct mpls_nh *nh, int oif, struct nlattr *via,
-struct nlattr *newdst)
+struct nlattr *newdst, u8 max_labels)
 {
int err = -ENOMEM;
 
@@ -733,7 +733,7 @@ static int mpls_nh_build(struct net *net, struct mpls_route *rt,
goto errout;
 
if (newdst) {
-   err = nla_get_labels(newdst, MAX_NEW_LABELS,
+   err = nla_get_labels(newdst, max_labels,
 &nh->nh_labels, nh->nh_label);
if (err)
goto errout;
@@ -759,21 +759,19 @@ static int mpls_nh_build(struct net *net, struct mpls_route *rt,
 }
 
 static u8 mpls_count_nexthops(struct rtnexthop *rtnh, int len,
- u8 cfg_via_alen, u8 *max_via_alen)
+ u8 cfg_via_alen, u8 *max_via_alen,
+ u8 *max_labels)
 {
int remaining = len;
u8 nhs = 0;
 
-   if (!rtnh) {
-   *max_via_alen = cfg_via_alen;
-   return 1;
-   }
-
*max_via_alen = 0;
+   *max_labels = 0;
 
while (rtnh_ok(rtnh, remaining)) {
struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
int attrlen;
+   u8 n_labels = 0;
 
attrlen = rtnh_attrlen(rtnh);
nla = nla_find(attrs, attrlen, RTA_VIA);
@@ -787,6 +785,13 @@ static u8 mpls_count_nexthops(struct rtnexthop *rtnh, int len,
  via_alen);
}
 
+   nla = nla_find(attrs, attrlen, RTA_NEWDST);
+   if (nla &&
+   nla_get_labels(nla, MAX_NEW_LABELS, &n_labels, NULL) != 0)
+   return 0;
+
+   *max_labels = max_t(u8, *max_labels, n_labels);
+
/* number of nexthops is tracked by a u8.
 * Check for overflow.
 */
@@ -802,7 +807,7 @@ static u8 mpls_count_nexthops(struct rtnexthop *rtnh, int len,
 }
 
 static int mpls_nh_build_multi(struct mpls_route_config *cfg,
-  struct mpls_route *rt)
+  struct mpls_route *rt, u8 max_labels)
 {
struct rtnexthop *rtnh = cfg->rc_mp;
struct nlattr *nla_via, *nla_newdst;
@@ -835,7 +840,8 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
}
 
err = mpls_nh_build(cfg->rc_nlinfo.nl_net, rt, nh,
-   rtnh->rtnh_ifindex, nla_via, nla_newdst);
+   rtnh->rtnh_ifindex, nla_via, nla_newdst,
+   max_labels);
if (err)
goto errout;
 
@@ -862,6 +868,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
int err = -EINVAL;
u8 max_via_alen;
unsigned index;
+   u8 max_labels;
u8 nhs;
 
index = cfg->rc_label;
@@ -900,13 +907,21 @@ static int mpls_route_add(struct mpls_route_config *cfg)
goto errout;
 
err = -EINVAL;
-   nhs = mpls_count_nexthops(cfg->rc_mp, cfg->rc_mp_len,
-

[PATCH net-next v3 2/6] net: mpls: Convert number of nexthops to u8

2017-03-31 Thread David Ahern

Number of nexthops and number of alive nexthops are tracked using an
unsigned int. A route should never have more than 255 nexthops so
convert both to u8. Update all references and intermediate variables
to consistently use u8 as well.

Shrinks the size of mpls_route from 32 bytes to 24 bytes with a 2-byte
hole before the nexthops.
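
Narrowing the counter to a u8 means the counting loop has to refuse a 256th
entry rather than wrap to 0 silently. A small sketch of that guard, with
count_u8() as a hypothetical stand-in for the counting loop in the diff
(which returns 0 when the count would overflow):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count entries into a u8. A return of 0 means "none, or too many". */
static uint8_t count_u8(size_t entries)
{
	uint8_t nhs = 0;

	for (size_t i = 0; i < entries; i++) {
		/* The counter is a u8: stop before it can wrap. */
		if (nhs == 255)
			return 0;
		nhs++;
	}
	return nhs;
}
```

Since a valid route always has at least one nexthop, 0 is free to act as the
error value and the caller needs no separate status code.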

Signed-off-by: David Ahern 
---
v3
- no change

v2
- label u16 hole in mpls_route as rt_reserved1

 net/mpls/af_mpls.c  | 28 +---
 net/mpls/internal.h |  5 +++--
 2 files changed, 20 insertions(+), 13 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 6bdd2f95b576..665dec84f001 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -197,10 +197,10 @@ static u32 mpls_multipath_hash(struct mpls_route *rt, struct sk_buff *skb)
 static struct mpls_nh *mpls_select_multipath(struct mpls_route *rt,
 struct sk_buff *skb)
 {
-   unsigned int alive;
u32 hash = 0;
int nh_index = 0;
int n = 0;
+   u8 alive;
 
/* No need to look further into packet if there's only
 * one path
@@ -466,7 +466,7 @@ struct mpls_route_config {
int rc_mp_len;
 };
 
-static struct mpls_route *mpls_rt_alloc(int num_nh, u8 max_alen)
+static struct mpls_route *mpls_rt_alloc(u8 num_nh, u8 max_alen)
 {
u8 max_alen_aligned = ALIGN(max_alen, VIA_ALEN_ALIGN);
struct mpls_route *rt;
@@ -744,11 +744,11 @@ static int mpls_nh_build(struct net *net, struct mpls_route *rt,
return err;
 }
 
-static int mpls_count_nexthops(struct rtnexthop *rtnh, int len,
-  u8 cfg_via_alen, u8 *max_via_alen)
+static u8 mpls_count_nexthops(struct rtnexthop *rtnh, int len,
+ u8 cfg_via_alen, u8 *max_via_alen)
 {
-   int nhs = 0;
int remaining = len;
+   u8 nhs = 0;
 
if (!rtnh) {
*max_via_alen = cfg_via_alen;
@@ -773,7 +773,13 @@ static int mpls_count_nexthops(struct rtnexthop *rtnh, int len,
  via_alen);
}
 
+   /* number of nexthops is tracked by a u8.
+* Check for overflow.
+*/
+   if (nhs == 255)
+   return 0;
nhs++;
+
rtnh = rtnh_next(rtnh, &remaining);
}
 
@@ -787,8 +793,8 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
struct rtnexthop *rtnh = cfg->rc_mp;
struct nlattr *nla_via, *nla_newdst;
int remaining = cfg->rc_mp_len;
-   int nhs = 0;
int err = 0;
+   u8 nhs = 0;
 
change_nexthops(rt) {
int attrlen;
@@ -842,7 +848,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
int err = -EINVAL;
u8 max_via_alen;
unsigned index;
-   int nhs;
+   u8 nhs;
 
index = cfg->rc_label;
 
@@ -1310,7 +1316,7 @@ static void mpls_ifdown(struct net_device *dev, int event)
 {
struct mpls_route __rcu **platform_label;
struct net *net = dev_net(dev);
-   unsigned int alive, deleted;
+   u8 alive, deleted;
unsigned index;
 
platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -1362,7 +1368,7 @@ static void mpls_ifup(struct net_device *dev, unsigned int flags)
struct mpls_route __rcu **platform_label;
struct net *net = dev_net(dev);
unsigned index;
-   int alive;
+   u8 alive;
 
platform_label = rtnl_dereference(net->mpls.platform_label);
for (index = 0; index < net->mpls.platform_labels; index++) {
@@ -1786,8 +1792,8 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
} else {
struct rtnexthop *rtnh;
struct nlattr *mp;
-   int dead = 0;
-   int linkdown = 0;
+   u8 linkdown = 0;
+   u8 dead = 0;
 
mp = nla_nest_start(skb, RTA_MULTIPATH);
if (!mp)
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 91419fe63464..2ac97433c3b7 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -127,12 +127,13 @@ struct mpls_route { /* next hop label forwarding entry */
u8  rt_payload_type;
u8  rt_max_alen;
u8  rt_ttl_propagate;
-   unsigned int rt_nhn;
+   u8  rt_nhn;
 
/* rt_nhn_alive is accessed under RCU in the packet path; it
 * is modified handling netdev events with rtnl lock held
 */
-   unsigned int rt_nhn_alive;
+   u8  rt_nhn_alive;
+   u16 rt_reserved1;
struct mpls_nh  rt_nh[0];
 };
 
-- 
2.1.4

Re: [PATCH] selftests: add a generic testsuite for ethernet device

2017-03-31 Thread Andrew Lunn

On Fri, Mar 31, 2017 at 02:57:52PM +0200, Corentin Labbe wrote:
> This patch add a generic testsuite for testing ethernet network device driver.
> 
> Signed-off-by: Corentin Labbe 
> ---
>  tools/testing/selftests/net/Makefile |   2 +-
>  tools/testing/selftests/net/netdevice.sh | 185 
> +++
>  2 files changed, 186 insertions(+), 1 deletion(-)
>  create mode 100755 tools/testing/selftests/net/netdevice.sh
> 
> diff --git a/tools/testing/selftests/net/Makefile 
> b/tools/testing/selftests/net/Makefile
> index fbfe5d0..35cbb4c 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -5,7 +5,7 @@ CFLAGS += -I../../../../usr/include/
>  
>  reuseport_bpf_numa: LDFLAGS += -lnuma
>  
> -TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh
> +TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh
>  TEST_GEN_FILES =  socket
>  TEST_GEN_FILES += psock_fanout psock_tpacket
>  TEST_GEN_FILES += reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
> diff --git a/tools/testing/selftests/net/netdevice.sh 
> b/tools/testing/selftests/net/netdevice.sh
> new file mode 100755
> index 000..89ba827
> --- /dev/null
> +++ b/tools/testing/selftests/net/netdevice.sh
> @@ -0,0 +1,185 @@
> +#!/bin/sh
> +#
> +# This test is for checking network interface
> +# For the moment it tests only ethernet interface (but wifi could be easily 
> added)
> +#
> +# We assume that all network driver are loaded
> +# if not they probably have failed earlier in the boot process and their 
> logged error will be catched by another test
> +#
> +

Hi Corentin

Nice to see some basic tests.

> +# this function will try to up the interface
> +# if already up, nothing done
> +# arg1: network interface name
> +kci_net_start()
> +{
> + netdev=$1
> +
> + ip link show "$netdev" |grep -q UP
> + if [ $? -eq 0 ];then
> + echo "SKIP: interface $netdev already up"
> + return 0
> + fi
> +
> + ip link set "$netdev" up
> + if [ $? -ne 0 ];then
> + echo "FAIL: Fail to up $netdev"
> + return 1
> + else
> + echo "PASS: set interface $netdev up"
> + NETDEV_STARTED=1
> + fi

This is going to be problematic.

3: eth1:  mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000
    link/ether 8e:01:30:d5:63:ff brd ff:ff:ff:ff:ff:ff
4: lan1@eth1:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 8e:01:30:d5:63:ff brd ff:ff:ff:ff:ff:ff

lan1 has eth1 as its master interface. If you try to up lan1 while eth1 is down:

# ip link set lan1 up
RTNETLINK answers: Network is down

> +
> +ls /sys/class/net/ |grep -vE '^lo|^tun' | grep -E '^eth|enp[0-9]s[0-9]' > 
> "$TMP_LIST_NETDEV"
> +while read netdev
> +do
> + kci_test_netdev "$netdev"
> +done < "$TMP_LIST_NETDEV"

Because of the grep, on this board, you won't actually test
lan1. Which is a shame. It would be nice to test it, and the other
interfaces like it.

Rather than going on the order ls gives you, could you order it based
on the ifnum? The master has to exist before a slave can be
created. Hence the master has a lower ifnum than the slave. So bring
the interfaces up in ifnum order, and down in reverse order.
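
The ordering itself is easy to sketch. The script under review is shell, so
the C below only illustrates the idea: sort by interface index, walk forward
to bring links up, walk backward to take them down. struct iface and
sort_for_up() are hypothetical names, and the index values in any example
are made up; a real program could fill the array from if_nameindex():

```c
#include <assert.h>
#include <stdlib.h>

struct iface {
	unsigned int ifindex;
	const char *name;
};

/* qsort comparator: ascending interface index. */
static int by_ifindex(const void *a, const void *b)
{
	const struct iface *x = a, *y = b;

	return (x->ifindex > y->ifindex) - (x->ifindex < y->ifindex);
}

/* After sorting, masters precede the slaves created on top of them.
 * Bring links up from element 0 to n-1, and down from n-1 to 0.
 */
static void sort_for_up(struct iface *ifs, size_t n)
{
	qsort(ifs, n, sizeof(*ifs), by_ifindex);
}
```

On the board above this would put eth1 (index 3) ahead of lan1 (index 4), so
the "Network is down" error from upping lan1 first would not occur.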

Thanks
Andrew

Re: [PATCH v2] tracing/kprobes: expose maxactive for kretprobe in kprobe_events

2017-03-31 Thread Steven Rostedt

On Fri, 31 Mar 2017 15:20:24 +0200
Alban Crequy  wrote:

> When a kretprobe is installed on a kernel function, there is a maximum
> limit of how many calls in parallel it can catch (aka "maxactive"). A
> kernel module could call register_kretprobe() and initialize maxactive
> (see example in samples/kprobes/kretprobe_example.c).
> 
> But that is not exposed to userspace and it is currently not possible to
> choose maxactive when writing to /sys/kernel/debug/tracing/kprobe_events
> 
> The default maxactive can be as low as 1 on single-core with a
> non-preemptive kernel. This is too low and we need to increase it not
> only for recursive functions, but for functions that sleep or resched.
> 
> This patch updates the format of the command that can be written to
> kprobe_events so that maxactive can be optionally specified.
> 
> I need this for a bpf program attached to the kretprobe of
> inet_csk_accept, which can sleep for a long time.
> 
> This patch includes a basic selftest:
> 
> > # ./ftracetest -v  test.d/kprobe/
> > === Ftrace unit tests ===
> > [1] Kprobe dynamic event - adding and removing  [PASS]
> > [2] Kprobe dynamic event - busy event check [PASS]
> > [3] Kprobe dynamic event with arguments [PASS]
> > [4] Kprobes event arguments with types  [PASS]
> > [5] Kprobe dynamic event with function tracer   [PASS]
> > [6] Kretprobe dynamic event with arguments  [PASS]
> > [7] Kretprobe dynamic event with maxactive  [PASS]
> >
> > # of passed:  7
> > # of failed:  0
> > # of unresolved:  0
> > # of untested:  0
> > # of unsupported:  0
> > # of xfailed:  0
> > # of undefined(test bug):  0  
> 
> BugLink: https://github.com/iovisor/bcc/issues/1072
> Signed-off-by: Alban Crequy 
> 
> ---
> 
> Changes since v1:
> - Remove "(*)" from documentation. (Review from Masami Hiramatsu)
> - Fix support for "r100" without the event name (Review from Masami Hiramatsu)
> - Get rid of magic numbers within the code.  (Review from Steven Rostedt)
>   Note that I didn't use KRETPROBE_MAXACTIVE_ALLOC since that patch is not
>   merged.
> - Return -E2BIG when maxactive is too big.
> - Add basic selftest
> ---
>  Documentation/trace/kprobetrace.txt|  4 ++-
>  kernel/trace/trace_kprobe.c| 39 
> ++
>  .../ftrace/test.d/kprobe/kretprobe_maxactive.tc| 39 
> ++
>  3 files changed, 75 insertions(+), 7 deletions(-)
>  create mode 100644 
> tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_maxactive.tc
> 
> diff --git a/Documentation/trace/kprobetrace.txt 
> b/Documentation/trace/kprobetrace.txt
> index 41ef9d8..7051a20 100644
> --- a/Documentation/trace/kprobetrace.txt
> +++ b/Documentation/trace/kprobetrace.txt
> @@ -23,7 +23,7 @@ current_tracer. Instead of that, add probe points via
>  Synopsis of kprobe_events
>  -
>p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS]   : Set a probe
> -  r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]  : Set a return probe
> +  r[MAXACTIVE][:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS]   : Set a return 
> probe
>-:[GRP/]EVENT  : Clear a probe
>  
>   GRP : Group name. If omitted, use "kprobes" for it.
> @@ -32,6 +32,8 @@ Synopsis of kprobe_events
>   MOD : Module name which has given SYM.
>   SYM[+offs]  : Symbol+offset where the probe is inserted.
>   MEMADDR : Address where the probe is inserted.
> + MAXACTIVE   : Maximum number of instances of the specified function that
> +   can be probed simultaneously, or 0 for the default.

BTW, to me, 0 means none (no instances can probe). This should have a
better description of what "0" actually means.
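
To make the convention under discussion concrete, here is a userspace sketch
of parsing the optional number after 'r'. It is not the kernel's parser:
parse_maxactive() is a hypothetical helper, MAXACTIVE_LIMIT is an assumed cap
(the thread does not state the real one), and the input is taken to be the
first whitespace-separated token of the command:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

#define MAXACTIVE_LIMIT 4096	/* assumed cap, for illustration only */

/* Parse "r", "rN", "r:EVENT" or "rN:EVENT".
 * On success returns 0; *maxactive == 0 means "no value given, use the
 * default", not "probe zero instances".
 */
static int parse_maxactive(const char *tok, int *maxactive)
{
	char *end;
	unsigned long v;

	if (tok[0] != 'r')
		return -EINVAL;

	*maxactive = 0;
	if (tok[1] == '\0' || tok[1] == ':')
		return 0;		/* plain "r" or "r:EVENT" */
	if (!isdigit((unsigned char)tok[1]))
		return -EINVAL;

	errno = 0;
	v = strtoul(tok + 1, &end, 10);
	if (*end != '\0' && *end != ':')
		return -EINVAL;
	if (errno == ERANGE || v > MAXACTIVE_LIMIT)
		return -E2BIG;

	*maxactive = (int)v;
	return 0;
}
```

Seen this way, 0 is really "unset", which is arguably what the documentation
text should say.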

-- Steve


>  
>   FETCHARGS   : Arguments. Each probe can have up to 128 args.
>%REG   : Fetch register REG

Re: [PATCH net] ftgmac100: Mostly rewrite the driver

2017-03-31 Thread Andrew Lunn

> We're running some more testing tonight, if it's all solid I'll shoot
> it out tomorrow or sunday. Dave, it's ok to just spam the list with a
> 55 patches series like that ?

Hi Ben

Is there a good reason to spam the list with 55 patches? The patches
should be incremental, so getting them reviewed and applied in batches
of 10 should not be a problem.

   Andrew

Re: [PATCHv3] net: usbnet: support 64bit stats in qmi_wwan driver

2017-03-31 Thread Bjørn Mork



On March 31, 2017 3:27:59 PM CEST, Greg Ungerer  wrote:
>On 31/03/17 18:48, Bjørn Mork wrote:
>
>>> +void usbnet_get_stats64(struct net_device *net, struct rtnl_link_stats64 *stats)
>>> +{
>>> +   struct usbnet *dev = netdev_priv(net);
>>> +   unsigned int start;
>>> +   int cpu;
>>> +
>>> +   netdev_stats_to_stats64(stats, &net->stats);
>>> +
>>> +   for_each_possible_cpu(cpu) {
>>> +   struct pcpu_sw_netstats *stats64;
>>> +   u64 rx_packets, rx_bytes;
>>> +   u64 tx_packets, tx_bytes;
>>> +
>>> +   stats64 = per_cpu_ptr(dev->stats64, cpu);
>>> +
>>> +   do {
>>> +   start = u64_stats_fetch_begin_irq(&stats64->syncp);
>>> +   rx_packets = stats64->rx_packets;
>>> +   rx_bytes = stats64->rx_bytes;
>>> +   tx_packets = stats64->tx_packets;
>>> +   tx_bytes = stats64->tx_bytes;
>>> +   } while (u64_stats_fetch_retry_irq(&stats64->syncp, start));
>>> +
>>> +   stats->rx_packets += rx_packets;
>>> +   stats->rx_bytes += rx_bytes;
>>> +   stats->tx_packets += tx_packets;
>>> +   stats->tx_bytes += tx_bytes;
>>> +   }
>>> +}
>>
>> So we only count packets and bytes.  No errors.  Why?
>
>All stats are counted. That call to netdev_stats_to_stats64() transfers
>all other stats struct fields (errors, etc) to the stats64 struct.
>No error counts are lost (though they are only stored as 32bits values
>on 32bit machines).


Ah, right. Thanks for explaining and sorry for being so slow. Then I have no 
objection to the patch as it is.


Bjørn

Re: [PATCH net-next] net: dsa: fix build error with devlink build as module

2017-03-31 Thread Andrew Lunn

On Fri, Mar 31, 2017 at 11:44:52AM +0200, Tobias Regnery wrote:
> After commit 96567d5dacf4 ("net: dsa: dsa2: Add basic support of devlink")
> I see the following link error with CONFIG_NET_DSA=y and CONFIG_NET_DEVLINK=m:
> 
> net/built-in.o: In function 'dsa_register_switch':
> (.text+0xe226b): undefined reference to `devlink_alloc'
> net/built-in.o: In function 'dsa_register_switch':
> (.text+0xe2284): undefined reference to `devlink_register'
> net/built-in.o: In function 'dsa_register_switch':
> (.text+0xe243e): undefined reference to `devlink_port_register'
> net/built-in.o: In function 'dsa_register_switch':
> (.text+0xe24e1): undefined reference to `devlink_port_register'
> net/built-in.o: In function 'dsa_register_switch':
> (.text+0xe24fa): undefined reference to `devlink_port_type_eth_set'
> net/built-in.o: In function 'dsa_dst_unapply.part.8':
> dsa2.c:(.text.unlikely+0x345): undefined reference to 
> 'devlink_port_unregister'
> dsa2.c:(.text.unlikely+0x36c): undefined reference to 
> 'devlink_port_unregister'
> dsa2.c:(.text.unlikely+0x38e): undefined reference to 
> 'devlink_port_unregister'
> dsa2.c:(.text.unlikely+0x3f2): undefined reference to 'devlink_unregister'
> dsa2.c:(.text.unlikely+0x3fb): undefined reference to 'devlink_free'
> 
> Fix this by adding a dependency on MAY_USE_DEVLINK so that CONFIG_NET_DSA
> get switched to be build as module when CONFIG_NET_DEVLINK=m.
> 
> Fixes: 96567d5dacf4 ("net: dsa: dsa2: Add basic support of devlink")
> Signed-off-by: Tobias Regnery 

Hi Tobias

0-day just found the same issue as well.

Reviewed-by: Andrew Lunn 

Thanks
Andrew

Re: [PATCHv3] net: usbnet: support 64bit stats in qmi_wwan driver

2017-03-31 Thread Greg Ungerer


Hi Oliver,

On 31/03/17 19:39, Oliver Neukum wrote:
> On Friday, 31.03.2017 at 10:48 +0200, Bjørn Mork wrote:
>> You get *all* the "0" line drivers for free, not only "qmi_wwan".  No
>> code changes needed, except for adding the single .ndo line to drivers
>> overriding the usbnet default net_device_ops. And even that only applies
>> to a few of them.  Most will be OK if you just change the usbnet
>> default.
>>
>> I don't think the size of a complete series will be terrifying to
>> anyone.
>
> It would really be nice to do that.
> However, if you really don't want to do it, well you wrote
> a patch. But I am afraid dropping the error count is not acceptable.


Of course dropping error counts would be, but that doesn't happen.

I will generate a patch that converts all usbnet users in one go.

Regards
Greg
