date:20170321

Re: [patch -next] net: dwc-xlgmac: fix an error code in xlgmac_alloc_pages()

2017-03-21 Thread Jie Deng

On 2017/3/22 4:42, Dan Carpenter wrote:

> The dma_mapping_error() returns true if there is an error but we want
> to return -ENOMEM and not 1.
>
> Fixes: 65e0ace2c5cd ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise Ethernet")
> Signed-off-by: Dan Carpenter 
>
> diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c b/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
> index 55c796ed7d26..39b5cb967bba 100644
> --- a/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
> +++ b/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
> @@ -335,7 +335,6 @@ static int xlgmac_alloc_pages(struct xlgmac_pdata *pdata,
>  {
>   struct page *pages = NULL;
>   dma_addr_t pages_dma;
> - int ret;
>  
>   /* Try to obtain pages, decreasing order if necessary */
>   gfp |= __GFP_COLD | __GFP_COMP | __GFP_NOWARN;
> @@ -352,10 +351,9 @@ static int xlgmac_alloc_pages(struct xlgmac_pdata *pdata,
>   /* Map the pages */
>   pages_dma = dma_map_page(pdata->dev, pages, 0,
>PAGE_SIZE << order, DMA_FROM_DEVICE);
> - ret = dma_mapping_error(pdata->dev, pages_dma);
> - if (ret) {
> + if (dma_mapping_error(pdata->dev, pages_dma)) {
>   put_page(pages);
> - return ret;
> + return -ENOMEM;
>   }
>  
>   pa->pages = pages;

Thanks for the fix.
Reviewed-by: Jie Deng
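
The difference between the two return paths in this patch can be illustrated in user space. The following is a minimal sketch, not the driver code: fake_mapping_error(), map_buggy() and map_fixed() are hypothetical names, and the "all ones means failure" rule is an assumption made only for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for dma_mapping_error(): it reports failure as a
 * truth value, not as a negative errno. The all-ones failure marker is an
 * assumption for this sketch only. */
static bool fake_mapping_error(uint64_t dma_addr)
{
	return dma_addr == UINT64_MAX;
}

/* Old shape: forwards the truth value, so callers see 1 on failure. */
static int map_buggy(uint64_t dma_addr)
{
	int ret = fake_mapping_error(dma_addr);

	if (ret)
		return ret;
	return 0;
}

/* Fixed shape: translates the truth value into a real error code. */
static int map_fixed(uint64_t dma_addr)
{
	if (fake_mapping_error(dma_addr))
		return -ENOMEM;
	return 0;
}
```

A caller that tests "ret < 0" would miss the failure reported by map_buggy(), which is why returning -ENOMEM matters.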

[PATCH net-next v3] net: Add sysctl to toggle early demux for tcp and udp

2017-03-21 Thread Subash Abhinov Kasiviswanathan

Certain systems process a significant unconnected UDP workload.
It would be preferable to disable UDP early demux for those systems
and enable it for TCP only.

By disabling UDP demux, we see these slight gains on an ARM64 system:
782 -> 788Mbps unconnected single stream UDPv4
633 -> 654Mbps unconnected UDPv4 different sources

The performance impact can change based on CPU architecture and cache
sizes. There will not be much difference if the entire UDP hash table
is in cache.

Both sysctls are enabled by default to preserve existing behavior.

v1->v2: Change function pointer instead of adding conditional as
suggested by Stephen.

v2->v3: Read once in callers to avoid issues due to compiler
optimizations. Also update commit message with the tests.

Signed-off-by: Subash Abhinov Kasiviswanathan 
Suggested-by: Eric Dumazet 
Cc: Stephen Hemminger 
Cc: Tom Herbert 
---
 Documentation/networking/ip-sysctl.txt | 11 +++-
 include/net/netns/ipv4.h   |  2 ++
 include/net/tcp.h  |  2 ++
 include/net/udp.h  |  3 +++
 net/ipv4/af_inet.c | 22 ++--
 net/ipv4/ip_input.c|  2 +-
 net/ipv4/sysctl_net_ipv4.c | 48 ++
 net/ipv6/ip6_input.c   |  2 +-
 net/ipv6/tcp_ipv6.c| 10 ++-
 net/ipv6/udp.c | 10 ++-
 10 files changed, 105 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ed3d079..6b921a1 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -848,12 +848,21 @@ ip_dynaddr - BOOLEAN
 ip_early_demux - BOOLEAN
Optimize input packet processing down to one demux for
certain kinds of local sockets.  Currently we only do this
-   for established TCP sockets.
+   for established TCP and connected UDP sockets.
 
It may add an additional cost for pure routing workloads that
reduces overall throughput, in such case you should disable it.
Default: 1
 
+tcp_early_demux - BOOLEAN
+   Enable early demux for established TCP sockets.
+   Default: 1
+
+udp_early_demux - BOOLEAN
+   Enable early demux for connected UDP sockets. Disable this if
+   your system could experience more unconnected load.
+   Default: 1
+
 icmp_echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 2e9d649..a489b76 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -95,6 +95,8 @@ struct netns_ipv4 {
/* Shall we try to damage output packets if routing dev changes? */
int sysctl_ip_dynaddr;
int sysctl_ip_early_demux;
+   int sysctl_tcp_early_demux;
+   int sysctl_udp_early_demux;
 
int sysctl_fwmark_reflect;
int sysctl_tcp_fwmark_accept;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index e614ad4..edc1df4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1932,4 +1932,6 @@ static inline void tcp_listendrop(const struct sock *sk)
__NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENDROPS);
 }
 
+void tcp_v4_early_demux_configure(int enable);
+void tcp_v6_early_demux_configure(int enable);
 #endif /* _TCP_H */
diff --git a/include/net/udp.h b/include/net/udp.h
index c9d8b8e..33198fa 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -372,4 +372,7 @@ struct udp_iter_state {
 #if IS_ENABLED(CONFIG_IPV6)
 void udpv6_encap_enable(void);
 #endif
+
+void udp_v4_early_demux_configure(int enable);
+void udp_v6_early_demux_configure(int enable);
 #endif /* _UDP_H */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 6b1fc6e..d286750 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1599,7 +1599,7 @@ u64 snmp_fold_field64(void __percpu *mib, int offt, size_t syncp_offset)
 };
 #endif
 
-static const struct net_protocol tcp_protocol = {
+static struct net_protocol tcp_protocol = {
.early_demux=   tcp_v4_early_demux,
.handler=   tcp_v4_rcv,
.err_handler=   tcp_v4_err,
@@ -1608,7 +1608,7 @@ u64 snmp_fold_field64(void __percpu *mib, int offt, size_t syncp_offset)
.icmp_strict_tag_validation = 1,
 };
 
-static const struct net_protocol udp_protocol = {
+static struct net_protocol udp_protocol = {
.early_demux =  udp_v4_early_demux,
.handler =  udp_rcv,
.err_handler =  udp_err,
@@ -1616,6 +1616,22 @@ u64 snmp_fold_field64(void __percpu *mib, int offt, size_t syncp_offset)
.netns_ok = 1,
 };
 
+void tcp_v4_early_demux_configure(int enable)
+{
+   if (enable)
+   tcp_protocol.early_demux = tcp_v4_early_demux;
+   else
+
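
The v2 and v3 notes above (switch a function pointer instead of adding a conditional, and read it once in the caller) can be sketched in user space as follows. This is only an illustration: every name ending in _sketch is hypothetical, and the volatile access is an assumed stand-in for the kernel's READ_ONCE().

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demux_fn_sketch)(int *hits);

/* Hypothetical miniature of a protocol descriptor with an optional hook. */
struct proto_sketch {
	demux_fn_sketch early_demux;
};

static void demux_sketch(int *hits)
{
	(*hits)++;
}

static struct proto_sketch udp_proto_sketch = { .early_demux = demux_sketch };

/* Toggle path: flip the pointer rather than testing a flag per packet. */
static void early_demux_configure_sketch(int enable)
{
	udp_proto_sketch.early_demux = enable ? demux_sketch : NULL;
}

/* Input path: load the pointer once into a local, then test and call the
 * local copy, so a concurrent toggle cannot make it NULL between the check
 * and the call. */
static void input_path_sketch(int *hits)
{
	demux_fn_sketch edemux =
		*(demux_fn_sketch volatile *)&udp_proto_sketch.early_demux;

	if (edemux)
		edemux(hits);
}
```

Calling early_demux_configure_sketch(0) makes input_path_sketch() skip the hook entirely; calling it with 1 restores the hook.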

Re: [PATCH net-next 1/8] ptr_ring: introduce batch dequeuing

2017-03-21 Thread Jason Wang




On 2017-03-21 18:25, Sergei Shtylyov wrote:
> Hello!
>
> On 3/21/2017 7:04 AM, Jason Wang wrote:
>
>> Signed-off-by: Jason Wang
>> ---
>>  include/linux/ptr_ring.h | 65
>>  1 file changed, 65 insertions(+)
>>
>> diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
>> index 6c70444..4771ded 100644
>> --- a/include/linux/ptr_ring.h
>> +++ b/include/linux/ptr_ring.h
>> @@ -247,6 +247,22 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
>>
>>  return ptr;
>>  }
>>
>> +static inline int __ptr_ring_consume_batched(struct ptr_ring *r,
>> + void **array, int n)
>> +{
>> +void *ptr;
>> +int i = 0;
>> +
>> +while (i < n) {
>
>    Hm, why not *for*?

Yes, it may be better. If there are other comments on the series, I will
change it in the next version.


Thanks
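
For illustration, the batched dequeue under discussion could be written with a for loop roughly like this. It is a user-space sketch under assumptions: ring_sketch and both helpers are hypothetical, and only the "NULL slot means empty" convention is borrowed from ptr_ring.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE_SKETCH 8

/* Hypothetical single-consumer ring; a NULL slot means "empty". */
struct ring_sketch {
	void *queue[RING_SIZE_SKETCH];
	int consumer;
};

/* Take one entry, or return NULL if the ring is empty. */
static void *ring_consume_sketch(struct ring_sketch *r)
{
	void *ptr = r->queue[r->consumer];

	if (ptr) {
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) % RING_SIZE_SKETCH;
	}
	return ptr;
}

/* Take up to n entries into array; returns how many were taken. */
static int ring_consume_batched_sketch(struct ring_sketch *r,
				       void **array, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		void *ptr = ring_consume_sketch(r);

		if (!ptr)
			break;
		array[i] = ptr;
	}
	return i;
}
```

The for form keeps the index, the bound and the increment on one line, which is the readability point raised in the review.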

[PATCH net] ipv4: provide stronger user input validation in nl_fib_input()

2017-03-21 Thread Eric Dumazet

From: Eric Dumazet 

Alexander reported a KMSAN splat caused by reads of an uninitialized
field (tb_id_in) from a user-provided struct fib_result_nl.

It turns out the nl_fib_input() sanity tests on user input are a bit
wrong:

The user can pretend nlh->nlmsg_len is big enough, but provide
a too-small buffer at sendmsg() time.

Reported-by: Alexander Potapenko 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/fib_frontend.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 42bfd08109dd78ab509493e8d2205d72845bb3eb..8f2133ffc2ff1b94871408a5f934cb938d3462b5 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1083,7 +1083,8 @@ static void nl_fib_input(struct sk_buff *skb)
 
net = sock_net(skb->sk);
nlh = nlmsg_hdr(skb);
-   if (skb->len < NLMSG_HDRLEN || skb->len < nlh->nlmsg_len ||
+   if (skb->len < nlmsg_total_size(sizeof(*frn)) ||
+   skb->len < nlh->nlmsg_len ||
nlmsg_len(nlh) < sizeof(*frn))
return;
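
The ordering of the checks in this hunk can be sketched outside the kernel. In the following, hdr_sketch and payload_sketch are hypothetical stand-ins for the netlink header and struct fib_result_nl; the layout and field names are assumptions, and only the order of validation mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the message header and its payload. */
struct hdr_sketch {
	uint32_t msg_len;	/* length claimed by the sender, header included */
};

struct payload_sketch {
	uint32_t tb_id_in;
	uint32_t addr;
};

static bool input_ok_sketch(const void *buf, size_t buf_len)
{
	const struct hdr_sketch *h = buf;
	const size_t need = sizeof(struct hdr_sketch) +
			    sizeof(struct payload_sketch);

	/* 1. The real buffer must hold header plus payload before any
	 *    field is trusted. */
	if (buf_len < need)
		return false;
	/* 2. The sender must not claim more bytes than were provided. */
	if (buf_len < h->msg_len)
		return false;
	/* 3. The claimed length must itself cover the payload. */
	if (h->msg_len < need)
		return false;
	return true;
}
```

With the first check in place, a short buffer is rejected before the claimed length is even read.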

[PATCH net] bpf: fix hashmap extra_elems logic

2017-03-21 Thread Alexei Starovoitov

In both kmalloc and prealloc mode the bpf_map_update_elem() is using
per-cpu extra_elems to do atomic update when the map is full.
There are two issues with it. The logic can be misused, since it allows
max_entries+num_cpus elements to be present in the map. And alloc_extra_elems()
at map creation time can fail percpu alloc for large map values with a warn:
WARNING: CPU: 3 PID: 2752 at ../mm/percpu.c:892 pcpu_alloc+0x119/0xa60
illegal size (32824) or align (8) for percpu allocation

The fixes for both of these issues are different for kmalloc and prealloc modes.
For prealloc mode allocate extra num_possible_cpus elements and store
their pointers into extra_elems array instead of actual elements.
Hence we can use these hidden(spare) elements not only when the map is full
but during bpf_map_update_elem() that replaces existing element too.
That also improves performance, since pcpu_freelist_pop/push is avoided.
Unfortunately this approach cannot be used for kmalloc mode which needs
to kfree elements after rcu grace period. Therefore switch it back to normal
kmalloc even when full and old element exists like it was prior to
commit 6c9059817432 ("bpf: pre-allocate hash map elements").

Add tests to check for over max_entries and large map values.

Reported-by: Dave Jones 
Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 kernel/bpf/hashtab.c| 144 
 tools/testing/selftests/bpf/test_maps.c |  29 ++-
 2 files changed, 97 insertions(+), 76 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index afe5bab376c9..361a69dfe543 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -30,18 +30,12 @@ struct bpf_htab {
struct pcpu_freelist freelist;
struct bpf_lru lru;
};
-   void __percpu *extra_elems;
+   struct htab_elem *__percpu *extra_elems;
atomic_t count; /* number of elements in this hashtable */
u32 n_buckets;  /* number of hash buckets */
u32 elem_size;  /* size of each element in bytes */
 };
 
-enum extra_elem_state {
-   HTAB_NOT_AN_EXTRA_ELEM = 0,
-   HTAB_EXTRA_ELEM_FREE,
-   HTAB_EXTRA_ELEM_USED
-};
-
 /* each htab element is struct htab_elem + key + value */
 struct htab_elem {
union {
@@ -56,7 +50,6 @@ struct htab_elem {
};
union {
struct rcu_head rcu;
-   enum extra_elem_state state;
struct bpf_lru_node lru_node;
};
u32 hash;
@@ -77,6 +70,11 @@ static bool htab_is_percpu(const struct bpf_htab *htab)
htab->map.map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH;
 }
 
+static bool htab_is_prealloc(const struct bpf_htab *htab)
+{
+   return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
+}
+
 static inline void htab_elem_set_ptr(struct htab_elem *l, u32 key_size,
 void __percpu *pptr)
 {
@@ -128,17 +126,20 @@ static struct htab_elem *prealloc_lru_pop(struct bpf_htab *htab, void *key,
 
 static int prealloc_init(struct bpf_htab *htab)
 {
+   u32 num_entries = htab->map.max_entries;
int err = -ENOMEM, i;
 
-   htab->elems = bpf_map_area_alloc(htab->elem_size *
-htab->map.max_entries);
+   if (!htab_is_percpu(htab) && !htab_is_lru(htab))
+   num_entries += num_possible_cpus();
+
+   htab->elems = bpf_map_area_alloc(htab->elem_size * num_entries);
if (!htab->elems)
return -ENOMEM;
 
if (!htab_is_percpu(htab))
goto skip_percpu_elems;
 
-   for (i = 0; i < htab->map.max_entries; i++) {
+   for (i = 0; i < num_entries; i++) {
u32 size = round_up(htab->map.value_size, 8);
void __percpu *pptr;
 
@@ -166,11 +167,11 @@ static int prealloc_init(struct bpf_htab *htab)
if (htab_is_lru(htab))
bpf_lru_populate(&htab->lru, htab->elems,
 offsetof(struct htab_elem, lru_node),
-htab->elem_size, htab->map.max_entries);
+htab->elem_size, num_entries);
else
pcpu_freelist_populate(&htab->freelist,
   htab->elems + offsetof(struct htab_elem, fnode),
-  htab->elem_size, htab->map.max_entries);
+  htab->elem_size, num_entries);
 
return 0;
 
@@ -191,16 +192,22 @@ static void prealloc_destroy(struct bpf_htab *htab)
 
 static int alloc_extra_elems(struct bpf_htab *htab)
 {
-   void __percpu *pptr;
+   struct htab_elem *__percpu *pptr, *l_new;
+   struct pcpu_freelist_node *l;
int cpu;
 
-   pptr = __alloc_percpu_gfp(htab->elem_size, 8,

[PATCH] netfilter: ipset: print out warnings generated by commands

2017-03-21 Thread Vishwanath Pai

Warnings are only printed out for IPSET_CMD_TEST. The user won't see
warnings from other commands.

Reviewed-by: Josh Hunt 
Signed-off-by: Vishwanath Pai 
---
 src/ipset.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/ipset.c b/src/ipset.c
index 2c4fa10..b0bef7b 100644
--- a/src/ipset.c
+++ b/src/ipset.c
@@ -812,8 +812,8 @@ parse_commandline(int argc, char *argv[])
"Unknown argument %s", argv[1]);
ret = ipset_cmd(session, cmd, restore_line);
D("ret %d", ret);
-   /* Special case for TEST and non-quiet mode */
-   if (cmd == IPSET_CMD_TEST && ipset_session_warning(session)) {
+
+   if (ipset_session_warning(session)) {
if (!ipset_envopt_test(session, IPSET_ENV_QUIET))
fprintf(stderr, "%s", ipset_session_warning(session));
ipset_session_report_reset(session);
-- 
1.9.1

[PATCH 1/2] netfilter: ipset: warn users of list:set that parameter 'size' is ignored

2017-03-21 Thread Vishwanath Pai

Since kernel commit 00590fdd5be0 ("netfilter: ipset: Introduce RCU
locking in list type"), the parameter 'size' has not been in use and
is ignored by the kernel. This is not very apparent to the user. This
commit makes 'size' optional and also warns the user if they try to
specify it. We also don't print it out on 'ipset l'.

I created revision 4 to make this change; revision 3 should work with
older kernels just like before.

Reviewed-by: Josh Hunt 
Signed-off-by: Vishwanath Pai 
---
 lib/ipset_list_set.c | 92 
 1 file changed, 92 insertions(+)

diff --git a/lib/ipset_list_set.c b/lib/ipset_list_set.c
index 45934e7..2d8bc7a 100644
--- a/lib/ipset_list_set.c
+++ b/lib/ipset_list_set.c
@@ -322,6 +322,31 @@ static const struct ipset_arg list_set_create_args3[] = {
{ },
 };
 
+/* Parse commandline arguments */
+static const struct ipset_arg list_set_create_args4[] = {
+   { .name = { "size", NULL },
+ .has_arg = IPSET_OPTIONAL_ARG,.opt = IPSET_OPT_SIZE,
+ .parse = ipset_parse_ignored,
+   },
+   { .name = { "timeout", NULL },
+ .has_arg = IPSET_MANDATORY_ARG,   .opt = IPSET_OPT_TIMEOUT,
+ .parse = ipset_parse_timeout, .print = ipset_print_number,
+   },
+   { .name = { "counters", NULL },
+ .has_arg = IPSET_NO_ARG,  .opt = IPSET_OPT_COUNTERS,
+ .parse = ipset_parse_flag,.print = ipset_print_flag,
+   },
+   { .name = { "comment", NULL },
+ .has_arg = IPSET_NO_ARG,  .opt = IPSET_OPT_CREATE_COMMENT,
+ .parse = ipset_parse_flag,.print = ipset_print_flag,
+   },
+   { .name = { "skbinfo", NULL },
+ .has_arg = IPSET_NO_ARG,  .opt = IPSET_OPT_SKBINFO,
+ .parse = ipset_parse_flag,.print = ipset_print_flag,
+   },
+   { },
+};
+
 static const struct ipset_arg list_set_adt_args3[] = {
{ .name = { "timeout", NULL },
  .has_arg = IPSET_MANDATORY_ARG,   .opt = IPSET_OPT_TIMEOUT,
@@ -426,6 +451,72 @@ static struct ipset_type ipset_list_set3 = {
.usage = list_set_usage3,
.description = "skbinfo support",
 };
+
+static const char list_set_usage4[] =
+"create SETNAME list:set\n"
+"   [timeout VALUE] [counters] [comment]\n"
+"  [skbinfo]\n"
+"add    SETNAME NAME [before|after NAME] [timeout VALUE]\n"
+"   [packets VALUE] [bytes VALUE] [comment STRING]\n"
+"  [skbmark VALUE] [skbprio VALUE] [skbqueue VALUE]\n"
+"del    SETNAME NAME [before|after NAME]\n"
+"test   SETNAME NAME [before|after NAME]\n\n"
+"where NAME are existing set names.\n";
+
+static struct ipset_type ipset_list_set4 = {
+   .name = "list:set",
+   .alias = { "setlist", NULL },
+   .revision = 4,
+   .family = NFPROTO_UNSPEC,
+   .dimension = IPSET_DIM_ONE,
+   .elem = {
+   [IPSET_DIM_ONE - 1] = {
+   .parse = ipset_parse_setname,
+   .print = ipset_print_name,
+   .opt = IPSET_OPT_NAME
+   },
+   },
+   .compat_parse_elem = ipset_parse_name_compat,
+   .args = {
+   [IPSET_CREATE] = list_set_create_args4,
+   [IPSET_ADD] = list_set_adt_args3,
+   [IPSET_DEL] = list_set_adt_args2,
+   [IPSET_TEST] = list_set_adt_args2,
+   },
+   .mandatory = {
+   [IPSET_CREATE] = 0,
+   [IPSET_ADD] = IPSET_FLAG(IPSET_OPT_NAME),
+   [IPSET_DEL] = IPSET_FLAG(IPSET_OPT_NAME),
+   [IPSET_TEST] = IPSET_FLAG(IPSET_OPT_NAME),
+   },
+   .full = {
+   [IPSET_CREATE] = IPSET_FLAG(IPSET_OPT_SIZE)
+   | IPSET_FLAG(IPSET_OPT_TIMEOUT)
+   | IPSET_FLAG(IPSET_OPT_COUNTERS)
+   | IPSET_FLAG(IPSET_OPT_CREATE_COMMENT)
+   | IPSET_FLAG(IPSET_OPT_SKBINFO),
+   [IPSET_ADD] = IPSET_FLAG(IPSET_OPT_NAME)
+   | IPSET_FLAG(IPSET_OPT_BEFORE)
+   | IPSET_FLAG(IPSET_OPT_NAMEREF)
+   | IPSET_FLAG(IPSET_OPT_TIMEOUT)
+   | IPSET_FLAG(IPSET_OPT_PACKETS)
+   | IPSET_FLAG(IPSET_OPT_BYTES)
+   | IPSET_FLAG(IPSET_OPT_ADT_COMMENT)
+   | IPSET_FLAG(IPSET_OPT_SKBMARK)
+   | IPSET_FLAG(IPSET_OPT_SKBPRIO)
+   | IPSET_FLAG(IPSET_OPT_SKBQUEUE),
+   [IPSET_DEL] = IPSET_FLAG(IPSET_OPT_NAME)
+   | IPSET_FLAG(IPSET_OPT_BEFORE)
+   | IPSET_FLAG(IPSET_OPT_NAMEREF),
+   [IPSET_TEST] = IPSET_FLAG(IPSET_OPT_NAME)
+   | IPSET_FLAG(IPSET_OPT_BEFORE)
+   | IPSET_FLAG(IPSET_OPT_NAMEREF),
+   },
+
+   .usage =

Re: [PATCH 3/4] flowcache: make struct flow_cache_percpu::hash_rnd_recalc bool

2017-03-21 Thread David Miller

From: Alexey Dobriyan 
Date: Mon, 20 Mar 2017 01:27:43 +0300

> ->hash_rnd_recalc is only used in boolean context.
> 
> Space savings on x86_64 come from the fact that "MOV rm8, imm8" is
> shorter than "MOV rm32, imm32" by at least 3 bytes.
> 
>   add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-10 (-10)
>   function old new   delta
>   flow_cache_new_hashrnd   166 163  -3
>   flow_cache_cpu_up_prep   171 168  -3
>   flow_cache_lookup   11481144  -4
>   Total: Before=170822872, After=170822862, chg -0.00%
> 
> Signed-off-by: Alexey Dobriyan 

I agree with Eric Dumazet that we might have atomicity issues in the
future because of this change.

Why don't you drop this and resubmit just the other 3 patches which
seem to be much less controversial?

Thanks.

Re: [PATCH] net: qmi_wwan: Add USB IDs for MDM6600 modem on Motorola Droid 4

2017-03-21 Thread David Miller

From: Tony Lindgren 
Date: Sun, 19 Mar 2017 09:19:57 -0700

> This gets qmicli working with the MDM6600 modem.
> 
> Cc: Bjørn Mork 
> Reviewed-by: Sebastian Reichel 
> Tested-by: Sebastian Reichel 
> Signed-off-by: Tony Lindgren 

Applied, thanks.

[PATCH 2/2] netfilter: ipset: warn users of list:set that parameter 'size' is ignored

2017-03-21 Thread Vishwanath Pai

Revision 4 warns the users that the parameter 'size' is ignored. The
kernel module doesn't need any changes, it will work with both the
revisions.

Note that this will not restore old behavior before commit 00590fdd5be0
("netfilter: ipset: Introduce RCU locking in list type") for users of
the older revision. It will be a much bigger change if that is
what we need.

Reviewed-by: Josh Hunt 
Signed-off-by: Vishwanath Pai 
---
 net/netfilter/ipset/ip_set_list_set.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
index 178d4eb..d4f820a 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -19,7 +19,8 @@
 #define IPSET_TYPE_REV_MIN 0
 /* 1 Counters support added */
 /* 2 Comments support added */
-#define IPSET_TYPE_REV_MAX 3 /* skbinfo support added */
+/* 3 skbinfo support added */
+#define IPSET_TYPE_REV_MAX 4 /* size argument is ignored */
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Jozsef Kadlecsik ");
-- 
1.9.1

Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-21 Thread Eric Dumazet

On Tue, 2017-03-21 at 16:51 -0700, Kees Cook wrote:

> Am I understanding you correctly that you'd want something like:
> 
> refcount.h:
> #ifdef UNPROTECTED_REFCOUNT
> #define refcount_inc(x)   atomic_inc(x)
> ...
> #else
> void refcount_inc(...
> ...
> #endif
> 
> some/net.c:
> #define UNPROTECTED_REFCOUNT
> #include 
> 
> or similar?

At first, it could be something simple like that yes.

Note that we might define two refcount_inc()  : One that does whole
tests, and refcount_inc_relaxed() that might translate to atomic_inc()
on non debug kernels.

Then later, maybe provide a dynamic infrastructure so that we can
dynamically force the full checks even for refcount_inc_relaxed() on say
1% of the hosts, to get better debug coverage ?
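
The two flavours mentioned above might look like this in user space. This is a sketch under assumptions: the _sketch names are hypothetical, C11 atomics stand in for the kernel's atomic_t, and the checked variant simply refuses the increment instead of reproducing the kernel's warning and saturation behaviour.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct refcount_sketch {
	atomic_uint refs;
};

/* Relaxed flavour: a plain atomic increment with no checks. */
static void refcount_inc_relaxed_sketch(struct refcount_sketch *r)
{
	atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);
}

/* Checked flavour: refuses to increment from 0 (possible use-after-free)
 * or from the maximum value (overflow). Returns false when refused. */
static bool refcount_inc_checked_sketch(struct refcount_sketch *r)
{
	unsigned int old = atomic_load_explicit(&r->refs,
						memory_order_relaxed);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &old,
							old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}
```

The checked flavour costs a compare-and-exchange loop per increment, which is the overhead the relaxed variant is meant to avoid on hot paths.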

Re: [PATCH net-next 0/9] qed: IOV related cleanups

2017-03-21 Thread David Miller

From: Yuval Mintz 
Date: Sun, 19 Mar 2017 13:08:11 +0200

> This patch series targets IOV functionality [on both PF and VF].
> 
> Patches #2, #3 and #5 fix flows relating to malicious VFs, either by
> upgrading and aligning current safe-guards or by correcting racy flows.
> 
> Patches #1 and #8 make some malicious/dysfunctional VFs logging appear
> by default in logs.
> 
> The rest of the patches either cleanup the existing code or else correct
> some possible [yet fairly insignificant] issues in VF behavior.

Series applied, thank you.

Re: [PATCH v2 net] selftests/bpf: fix broken build, take 2

2017-03-21 Thread David Miller

From: Shuah Khan 
Date: Mon, 20 Mar 2017 10:37:26 -0600

> On 03/20/2017 09:45 AM, Alexei Starovoitov wrote:
>> On Mon, Mar 20, 2017 at 04:31:28PM +0100, Daniel Borkmann wrote:
>>> On 03/20/2017 07:03 AM, Zi Shen Lim wrote:
>>>> Merge of 'linux-kselftest-4.11-rc1':
>>>>
>>>> 1. Partially removed use of 'test_objs' target, breaking force rebuild of
>>>> BPFOBJ, introduced in commit d498f8719a09 ("bpf: Rebuild bpf.o for any
>>>> dependency update").
>>>>
>>>>   Update target so dependency on BPFOBJ is restored.
>>>>
>>>> 2. Introduced commit 2047f1d8ba28 ("selftests: Fix the .c linking rule")
>>>> which fixes order of LDLIBS.
>>>>
>>>>   Commit d02d8986a768 ("bpf: Always test unprivileged programs") added
>>>> libcap dependency into CFLAGS. Use LDLIBS instead to fix linking of
>>>> test_verifier.
>>>>
>>>> 3. Introduced commit d83c3ba0b926 ("selftests: Fix selftests build to
>>>> just build, not run tests").
>>>>
>>>>   Reordering the Makefile allows us to remove the 'all' target.
>>>>
>>>> Tested both:
>>>> selftests/bpf$ make
>>>> and
>>>> selftests$ make TARGETS=bpf
>>>> on Ubuntu 16.04.2.
>>>>
>>>> Signed-off-by: Zi Shen Lim
>>>
>>> Looks reasonable to me as follow up to 1da8ac7c49fb ("selftests/bpf:
>>> fix broken build"), thanks for fixing Zi!
>>>
>>> Acked-by: Daniel Borkmann 
>>> Tested-by: Daniel Borkmann 
>> 
>> worked for me as well:
>> Acked-by: Alexei Starovoitov 
>> Tested-by: Alexei Starovoitov 
>> 
>> 
>> 
> 
> David,
> 
> Could you please apply it to your tree. I think you already applied
> the first fix.
> 
> Acked-by: Shuah Khan 

Done.

Re: [PATCH net 2/2] tcp: mark skbs with SCM_TIMESTAMPING_OPT_STATS

2017-03-21 Thread David Miller

From: Soheil Hassas Yeganeh 
Date: Sat, 18 Mar 2017 17:03:00 -0400

> From: Soheil Hassas Yeganeh 
> 
> SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled
> while packets are collected on the error queue.
> So, checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags
> is not enough to safely assume that the skb contains
> OPT_STATS data.
> 
> Add a bit in sock_exterr_skb to indicate whether the
> skb contains opt_stats data.
> 
> Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
> Reported-by: JongHwan Kim 
> Signed-off-by: Soheil Hassas Yeganeh 
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Willem de Bruijn 

Also applied and queued up for -stable.

Re: [PATCH net 1/2] tcp: fix SCM_TIMESTAMPING_OPT_STATS for normal skbs

2017-03-21 Thread David Miller

From: Soheil Hassas Yeganeh 
Date: Sat, 18 Mar 2017 17:02:59 -0400

> From: Soheil Hassas Yeganeh 
> 
> __sock_recv_timestamp can be called for both normal skbs (for
> receive timestamps) and for skbs on the error queue (for transmit
> timestamps).
> 
> Commit 1c885808e456
> (tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING)
> assumes any skb passed to __sock_recv_timestamp are from
> the error queue, containing OPT_STATS in the content of the skb.
> This results in accessing invalid memory or generating junk
> data.
> 
> To fix this, set skb->pkt_type to PACKET_OUTGOING for packets
> on the error queue. This is safe because on the receive path
> on local sockets skb->pkt_type is never set to PACKET_OUTGOING.
> With that, copy OPT_STATS from a packet, only if its pkt_type
> is PACKET_OUTGOING.
> 
> Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING")
> Reported-by: JongHwan Kim 
> Signed-off-by: Soheil Hassas Yeganeh 
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Willem de Bruijn 

Applied and queued up for -stable.

Re: [PATCH net] sctp: out_qlen should be updated when pruning unsent queue

2017-03-21 Thread David Miller

From: Xin Long 
Date: Sat, 18 Mar 2017 20:03:59 +0800

> This patch is to fix the issue that sctp_prsctp_prune_sent forgot
> to update q->out_qlen when removing a chunk from unsent queue.
> 
> Fixes: 8dbdf1f5b09c ("sctp: implement prsctp PRIO policy")
> Signed-off-by: Xin Long 

Applied, thanks.

Re: [PATCH net] sctp: define dst_pending_confirm as a bit in sctp_transport

2017-03-21 Thread David Miller

From: Xin Long 
Date: Sat, 18 Mar 2017 19:27:23 +0800

> As tp->dst_pending_confirm's value can only be set 0 or 1, this
> patch is to change to define it as a bit instead of __u32.
> 
> Signed-off-by: Xin Long 

Applied.
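
The space argument behind this change can be sketched as follows. The two structs are hypothetical and much smaller than the real sctp_transport; the neighbouring flag names are assumptions used only to show a full-width field being folded into an existing run of one-bit fields.

```c
#include <assert.h>
#include <stdint.h>

/* Before (illustrative): the 0/1 flag occupies a full 32-bit word next to
 * an existing group of one-bit flags. */
struct transport_wide_sketch {
	unsigned int rto_pending:1,
		     hb_sent:1;
	uint32_t dst_pending_confirm;	/* only ever holds 0 or 1 */
};

/* After (illustrative): the flag joins the existing bit-field group. */
struct transport_bits_sketch {
	unsigned int rto_pending:1,
		     hb_sent:1,
		     dst_pending_confirm:1;
};
```

Bit-field layout is implementation-defined, so the exact sizes depend on the compiler and ABI; on common ones the second struct fits in a single word.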

Re: [PATCH net] sctp: remove temporary variable confirm from sctp_packet_transmit

2017-03-21 Thread David Miller

From: Xin Long 
Date: Sat, 18 Mar 2017 19:12:22 +0800

> Commit c86a773c7802 ("sctp: add dst_pending_confirm flag") introduced
> a temporary variable "confirm" in sctp_packet_transmit.
> 
> But it broke the rule that longer lines should be above shorter ones.
> Besides, this variable is not necessary, so this patch is to just
> remove it and use tp->dst_pending_confirm directly.
> 
> Fixes: c86a773c7802 ("sctp: add dst_pending_confirm flag")
> Signed-off-by: Xin Long 

Applied.

[PATCH v2 net-next 1/4] drivers: net: xgene-v2: Add MDIO support

2017-03-21 Thread Iyappan Subramanian

Added phy management support by using phy abstraction layer APIs.

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene-v2/Makefile |   2 +-
 drivers/net/ethernet/apm/xgene-v2/mac.c|   2 +-
 drivers/net/ethernet/apm/xgene-v2/mac.h|   1 +
 drivers/net/ethernet/apm/xgene-v2/main.c   |  11 +-
 drivers/net/ethernet/apm/xgene-v2/main.h   |   4 +
 drivers/net/ethernet/apm/xgene-v2/mdio.c   | 167 +
 6 files changed, 182 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/apm/xgene-v2/mdio.c

diff --git a/drivers/net/ethernet/apm/xgene-v2/Makefile b/drivers/net/ethernet/apm/xgene-v2/Makefile
index 735309c..0fa5975 100644
--- a/drivers/net/ethernet/apm/xgene-v2/Makefile
+++ b/drivers/net/ethernet/apm/xgene-v2/Makefile
@@ -2,5 +2,5 @@
 # Makefile for APM X-Gene Ethernet v2 driver
 #
 
-xgene-enet-v2-objs := main.o mac.o enet.o ring.o
+xgene-enet-v2-objs := main.o mac.o enet.o ring.o mdio.o
 obj-$(CONFIG_NET_XGENE_V2) += xgene-enet-v2.o
diff --git a/drivers/net/ethernet/apm/xgene-v2/mac.c b/drivers/net/ethernet/apm/xgene-v2/mac.c
index c3189de..ee431e3 100644
--- a/drivers/net/ethernet/apm/xgene-v2/mac.c
+++ b/drivers/net/ethernet/apm/xgene-v2/mac.c
@@ -27,7 +27,7 @@ void xge_mac_reset(struct xge_pdata *pdata)
xge_wr_csr(pdata, MAC_CONFIG_1, 0);
 }
 
-static void xge_mac_set_speed(struct xge_pdata *pdata)
+void xge_mac_set_speed(struct xge_pdata *pdata)
 {
u32 icm0, icm2, ecm0, mc2;
u32 intf_ctrl, rgmii;
diff --git a/drivers/net/ethernet/apm/xgene-v2/mac.h b/drivers/net/ethernet/apm/xgene-v2/mac.h
index 0fce6ae..74397c9 100644
--- a/drivers/net/ethernet/apm/xgene-v2/mac.h
+++ b/drivers/net/ethernet/apm/xgene-v2/mac.h
@@ -101,6 +101,7 @@ static inline u32 xgene_get_reg_bits(u32 var, int pos, int len)
 struct xge_pdata;
 
 void xge_mac_reset(struct xge_pdata *pdata);
+void xge_mac_set_speed(struct xge_pdata *pdata);
 void xge_mac_enable(struct xge_pdata *pdata);
 void xge_mac_disable(struct xge_pdata *pdata);
 void xge_mac_init(struct xge_pdata *pdata);
diff --git a/drivers/net/ethernet/apm/xgene-v2/main.c b/drivers/net/ethernet/apm/xgene-v2/main.c
index ae76977..82ac5b4 100644
--- a/drivers/net/ethernet/apm/xgene-v2/main.c
+++ b/drivers/net/ethernet/apm/xgene-v2/main.c
@@ -500,9 +500,10 @@ static int xge_open(struct net_device *ndev)
 
xge_intr_enable(pdata);
xge_wr_csr(pdata, DMARXCTRL, 1);
+
+   phy_start(ndev->phydev);
xge_mac_enable(pdata);
netif_start_queue(ndev);
-   netif_carrier_on(ndev);
 
return 0;
 }
@@ -511,9 +512,9 @@ static int xge_close(struct net_device *ndev)
 {
struct xge_pdata *pdata = netdev_priv(ndev);
 
-   netif_carrier_off(ndev);
netif_stop_queue(ndev);
xge_mac_disable(pdata);
+   phy_stop(ndev->phydev);
 
xge_intr_disable(pdata);
xge_free_irq(ndev);
@@ -683,9 +684,12 @@ static int xge_probe(struct platform_device *pdev)
if (ret)
goto err;
 
+   ret = xge_mdio_config(ndev);
+   if (ret)
+   goto err;
+
netif_napi_add(ndev, &pdata->napi, xge_napi, NAPI_POLL_WEIGHT);
 
-   netif_carrier_off(ndev);
ret = register_netdev(ndev);
if (ret) {
netdev_err(ndev, "Failed to register netdev\n");
@@ -713,6 +717,7 @@ static int xge_remove(struct platform_device *pdev)
dev_close(ndev);
rtnl_unlock();
 
+   xge_mdio_remove(ndev);
unregister_netdev(ndev);
free_netdev(ndev);
 
diff --git a/drivers/net/ethernet/apm/xgene-v2/main.h b/drivers/net/ethernet/apm/xgene-v2/main.h
index ada7b0e..777f254 100644
--- a/drivers/net/ethernet/apm/xgene-v2/main.h
+++ b/drivers/net/ethernet/apm/xgene-v2/main.h
@@ -65,6 +65,7 @@ struct xge_pdata {
struct xge_desc_ring *rx_ring;
struct platform_device *pdev;
char irq_name[IRQ_ID_SIZE];
+   struct mii_bus *mdio_bus;
struct net_device *ndev;
struct napi_struct napi;
struct xge_stats stats;
@@ -72,4 +73,7 @@ struct xge_pdata {
u8 nbufs;
 };
 
+int xge_mdio_config(struct net_device *ndev);
+void xge_mdio_remove(struct net_device *ndev);
+
 #endif /* __XGENE_ENET_V2_MAIN_H__ */
diff --git a/drivers/net/ethernet/apm/xgene-v2/mdio.c b/drivers/net/ethernet/apm/xgene-v2/mdio.c
new file mode 100644
index 0000000..a583c6a
--- /dev/null
+++ b/drivers/net/ethernet/apm/xgene-v2/mdio.c
@@ -0,0 +1,167 @@
+/*
+ * Applied Micro X-Gene SoC Ethernet v2 Driver
+ *
+ * Copyright (c) 2017, Applied Micro Circuits Corporation
+ * Author(s): Iyappan Subramanian 
+ *   Keyur Chudgar 
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ * This

[PATCH v2 net-next 0/4] drivers: net: xgene-v2: Add MDIO and ethtool support

2017-03-21 Thread Iyappan Subramanian

This patch set,

- adds phy management and ethtool support
- fixes ethernet reset
- addresses review comments from previous patch set

Signed-off-by: Iyappan Subramanian 
---
v2: Address review comments from v1
- removed mdio_lock, since there is a top level lock in mdio_bus.c

v1:
- Initial version
---

Iyappan Subramanian (4):
  drivers: net: xgene-v2: Add MDIO support
  drivers: net: xgene-v2: Add ethtool support
  drivers: net: xgene-v2: Fix port reset
  drivers: net: xgene-v2: misc fixes

 drivers/net/ethernet/apm/xgene-v2/Makefile  |   2 +-
 drivers/net/ethernet/apm/xgene-v2/enet.c|  24 +++-
 drivers/net/ethernet/apm/xgene-v2/enet.h|   2 +
 drivers/net/ethernet/apm/xgene-v2/ethtool.c | 121 
 drivers/net/ethernet/apm/xgene-v2/mac.c |   2 +-
 drivers/net/ethernet/apm/xgene-v2/mac.h |   2 +-
 drivers/net/ethernet/apm/xgene-v2/main.c|  67 +--
 drivers/net/ethernet/apm/xgene-v2/main.h|   5 +
 drivers/net/ethernet/apm/xgene-v2/mdio.c| 167 
 9 files changed, 351 insertions(+), 41 deletions(-)
 create mode 100644 drivers/net/ethernet/apm/xgene-v2/ethtool.c
 create mode 100644 drivers/net/ethernet/apm/xgene-v2/mdio.c

-- 
1.9.1

[PATCH v2 net-next 4/4] drivers: net: xgene-v2: misc fixes

2017-03-21 Thread Iyappan Subramanian

Fixed review comments from the previous patch-set.

- changed return value check of platform_get_irq() to < 0
- replaced devm_request(free)_irq() calls by request(free)_irq() since
  they are called from open() and close()
- changed sizeof(struct mystruct) to sizeof(*mystruct)
- reduced indentation on tx_timeout()

Signed-off-by: Iyappan Subramanian 
---
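Note (not part of the patch): a minimal userspace sketch of the
sizeof(*ptr) idiom mentioned above. struct demo_ring and
demo_ring_alloc() are hypothetical names, and calloc() stands in for
kzalloc(); this is an illustration, not the driver code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver ring structure (not xge_desc_ring). */
struct demo_ring {
    unsigned short head;
    unsigned short tail;
    void *desc_addr;
};

/* Allocate one zeroed ring, sizing the request from the pointer itself. */
static struct demo_ring *demo_ring_alloc(void)
{
    struct demo_ring *ring;

    /* Both spellings agree today; only sizeof(*ring) stays correct if
     * the declared type of 'ring' is changed later. sizeof does not
     * evaluate its operand, so the uninitialized pointer is fine here. */
    assert(sizeof(*ring) == sizeof(struct demo_ring));

    ring = calloc(1, sizeof(*ring));   /* userspace analogue of kzalloc() */
    return ring;
}
```

The caller owns the returned memory and releases it with free().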
 drivers/net/ethernet/apm/xgene-v2/main.c | 55 +++-
 1 file changed, 26 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene-v2/main.c 
b/drivers/net/ethernet/apm/xgene-v2/main.c
index e764e58..0f2ad50 100644
--- a/drivers/net/ethernet/apm/xgene-v2/main.c
+++ b/drivers/net/ethernet/apm/xgene-v2/main.c
@@ -66,9 +66,8 @@ static int xge_get_resources(struct xge_pdata *pdata)
}
 
ret = platform_get_irq(pdev, 0);
-   if (ret <= 0) {
-   dev_err(dev, "Unable to get ENET IRQ\n");
-   ret = ret ? : -ENXIO;
+   if (ret < 0) {
+   dev_err(dev, "Unable to get irq\n");
return ret;
}
pdata->resources.irq = ret;
@@ -156,13 +155,12 @@ static irqreturn_t xge_irq(const int irq, void *data)
 static int xge_request_irq(struct net_device *ndev)
 {
struct xge_pdata *pdata = netdev_priv(ndev);
-   struct device *dev = &pdata->pdev->dev;
int ret;
 
snprintf(pdata->irq_name, IRQ_ID_SIZE, "%s", ndev->name);
 
-   ret = devm_request_irq(dev, pdata->resources.irq, xge_irq,
-  0, pdata->irq_name, pdata);
+   ret = request_irq(pdata->resources.irq, xge_irq, 0, pdata->irq_name,
+ pdata);
if (ret)
netdev_err(ndev, "Failed to request irq %s\n", pdata->irq_name);
 
@@ -172,9 +170,8 @@ static int xge_request_irq(struct net_device *ndev)
 static void xge_free_irq(struct net_device *ndev)
 {
struct xge_pdata *pdata = netdev_priv(ndev);
-   struct device *dev = &pdata->pdev->dev;
 
-   devm_free_irq(dev, pdata->resources.irq, pdata);
+   free_irq(pdata->resources.irq, pdata);
 }
 
 static bool is_tx_slot_available(struct xge_raw_desc *raw_desc)
@@ -424,7 +421,7 @@ static struct xge_desc_ring *xge_create_desc_ring(struct 
net_device *ndev)
struct xge_desc_ring *ring;
u16 size;
 
-   ring = kzalloc(sizeof(struct xge_desc_ring), GFP_KERNEL);
+   ring = kzalloc(sizeof(*ring), GFP_KERNEL);
if (!ring)
return NULL;
 
@@ -436,7 +433,7 @@ static struct xge_desc_ring *xge_create_desc_ring(struct 
net_device *ndev)
if (!ring->desc_addr)
goto err;
 
-   ring->pkt_info = kcalloc(XGENE_ENET_NUM_DESC, sizeof(struct pkt_info),
+   ring->pkt_info = kcalloc(XGENE_ENET_NUM_DESC, sizeof(*ring->pkt_info),
 GFP_KERNEL);
if (!ring->pkt_info)
goto err;
@@ -598,28 +595,28 @@ static void xge_timeout(struct net_device *ndev)
 
rtnl_lock();
 
-   if (netif_running(ndev)) {
-   netif_carrier_off(ndev);
-   netif_stop_queue(ndev);
-   xge_intr_disable(pdata);
-   napi_disable(&pdata->napi);
+   if (!netif_running(ndev))
+   goto out;
 
-   xge_wr_csr(pdata, DMATXCTRL, 0);
-   xge_txc_poll(ndev);
-   xge_free_pending_skb(ndev);
-   xge_wr_csr(pdata, DMATXSTATUS, ~0U);
+   netif_stop_queue(ndev);
+   xge_intr_disable(pdata);
+   napi_disable(&pdata->napi);
 
-   xge_setup_desc(pdata->tx_ring);
-   xge_update_tx_desc_addr(pdata);
-   xge_mac_init(pdata);
+   xge_wr_csr(pdata, DMATXCTRL, 0);
+   xge_txc_poll(ndev);
+   xge_free_pending_skb(ndev);
+   xge_wr_csr(pdata, DMATXSTATUS, ~0U);
 
-   napi_enable(&pdata->napi);
-   xge_intr_enable(pdata);
-   xge_mac_enable(pdata);
-   netif_start_queue(ndev);
-   netif_carrier_on(ndev);
-   }
+   xge_setup_desc(pdata->tx_ring);
+   xge_update_tx_desc_addr(pdata);
+   xge_mac_init(pdata);
+
+   napi_enable(&pdata->napi);
+   xge_intr_enable(pdata);
+   xge_mac_enable(pdata);
+   netif_start_queue(ndev);
 
+out:
rtnl_unlock();
 }
 
@@ -653,7 +650,7 @@ static int xge_probe(struct platform_device *pdev)
struct xge_pdata *pdata;
int ret;
 
-   ndev = alloc_etherdev(sizeof(struct xge_pdata));
+   ndev = alloc_etherdev(sizeof(*pdata));
if (!ndev)
return -ENOMEM;
 
-- 
1.9.1

[PATCH v2 net-next 2/4] drivers: net: xgene-v2: Add ethtool support

2017-03-21 Thread Iyappan Subramanian

Added basic ethtool support.

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene-v2/Makefile  |   2 +-
 drivers/net/ethernet/apm/xgene-v2/ethtool.c | 121 
 drivers/net/ethernet/apm/xgene-v2/main.c|   1 +
 drivers/net/ethernet/apm/xgene-v2/main.h|   1 +
 4 files changed, 124 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/apm/xgene-v2/ethtool.c

diff --git a/drivers/net/ethernet/apm/xgene-v2/Makefile 
b/drivers/net/ethernet/apm/xgene-v2/Makefile
index 0fa5975..f16a2b3 100644
--- a/drivers/net/ethernet/apm/xgene-v2/Makefile
+++ b/drivers/net/ethernet/apm/xgene-v2/Makefile
@@ -2,5 +2,5 @@
 # Makefile for APM X-Gene Ethernet v2 driver
 #
 
-xgene-enet-v2-objs := main.o mac.o enet.o ring.o mdio.o
+xgene-enet-v2-objs := main.o mac.o enet.o ring.o mdio.o ethtool.o
 obj-$(CONFIG_NET_XGENE_V2) += xgene-enet-v2.o
diff --git a/drivers/net/ethernet/apm/xgene-v2/ethtool.c 
b/drivers/net/ethernet/apm/xgene-v2/ethtool.c
new file mode 100644
index 000..0c426f5
--- /dev/null
+++ b/drivers/net/ethernet/apm/xgene-v2/ethtool.c
@@ -0,0 +1,121 @@
+/*
+ * Applied Micro X-Gene SoC Ethernet v2 Driver
+ *
+ * Copyright (c) 2017, Applied Micro Circuits Corporation
+ * Author(s): Iyappan Subramanian 
+ *   Keyur Chudgar 
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include "main.h"
+
+struct xge_gstrings_stats {
+   char name[ETH_GSTRING_LEN];
+   int offset;
+};
+
+#define XGE_STAT(m){ #m, offsetof(struct xge_pdata, stats.m) }
+
+static const struct xge_gstrings_stats gstrings_stats[] = {
+   XGE_STAT(rx_packets),
+   XGE_STAT(tx_packets),
+   XGE_STAT(rx_bytes),
+   XGE_STAT(tx_bytes),
+   XGE_STAT(rx_errors)
+};
+
+#define XGE_STATS_LEN  ARRAY_SIZE(gstrings_stats)
+
+static void xge_get_drvinfo(struct net_device *ndev,
+   struct ethtool_drvinfo *info)
+{
+   struct xge_pdata *pdata = netdev_priv(ndev);
+   struct platform_device *pdev = pdata->pdev;
+
+   strcpy(info->driver, "xgene-enet-v2");
+   strcpy(info->version, XGENE_ENET_V2_VERSION);
+   snprintf(info->fw_version, ETHTOOL_FWVERS_LEN, "N/A");
+   sprintf(info->bus_info, "%s", pdev->name);
+}
+
+static void xge_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
+{
+   u8 *p = data;
+   int i;
+
+   if (stringset != ETH_SS_STATS)
+   return;
+
+   for (i = 0; i < XGE_STATS_LEN; i++) {
+   memcpy(p, gstrings_stats[i].name, ETH_GSTRING_LEN);
+   p += ETH_GSTRING_LEN;
+   }
+}
+
+static int xge_get_sset_count(struct net_device *ndev, int sset)
+{
+   if (sset != ETH_SS_STATS)
+   return -EINVAL;
+
+   return XGE_STATS_LEN;
+}
+
+static void xge_get_ethtool_stats(struct net_device *ndev,
+ struct ethtool_stats *dummy,
+ u64 *data)
+{
+   void *pdata = netdev_priv(ndev);
+   int i;
+
+   for (i = 0; i < XGE_STATS_LEN; i++)
+   *data++ = *(u64 *)(pdata + gstrings_stats[i].offset);
+}
+
+static int xge_get_link_ksettings(struct net_device *ndev,
+ struct ethtool_link_ksettings *cmd)
+{
+   struct phy_device *phydev = ndev->phydev;
+
+   if (!phydev)
+   return -ENODEV;
+
+   return phy_ethtool_ksettings_get(phydev, cmd);
+}
+
+static int xge_set_link_ksettings(struct net_device *ndev,
+ const struct ethtool_link_ksettings *cmd)
+{
+   struct phy_device *phydev = ndev->phydev;
+
+   if (!phydev)
+   return -ENODEV;
+
+   return phy_ethtool_ksettings_set(phydev, cmd);
+}
+
+static const struct ethtool_ops xge_ethtool_ops = {
+   .get_drvinfo = xge_get_drvinfo,
+   .get_link = ethtool_op_get_link,
+   .get_strings = xge_get_strings,
+   .get_sset_count = xge_get_sset_count,
+   .get_ethtool_stats = xge_get_ethtool_stats,
+   .get_link_ksettings = xge_get_link_ksettings,
+   .set_link_ksettings = xge_set_link_ksettings,
+};
+
+void xge_set_ethtool_ops(struct net_device *ndev)
+{
+   ndev->ethtool_ops = &xge_ethtool_ops;
+}
diff --git a/drivers/net/ethernet/apm/xgene-v2/main.c

[PATCH v2 net-next 3/4] drivers: net: xgene-v2: Fix port reset

2017-03-21 Thread Iyappan Subramanian

Fixed port reset sequence by adding ECC init.

Signed-off-by: Iyappan Subramanian 
---
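Note (not part of the patch): a minimal userspace sketch of a bounded
"poll until ready" loop of the kind the ECC init step uses. The
DEMO_MEM_RDY value, demo_wait_mem_ready() and demo_fake_read() are
assumptions made up for illustration; the real ready pattern and
register access are not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical ready pattern; NOT the driver's MEM_RDY value. */
#define DEMO_MEM_RDY 0xffu

typedef uint32_t (*demo_read_fn)(void *ctx);

/* Poll rd() until it reports DEMO_MEM_RDY or the retries run out. */
static int demo_wait_mem_ready(demo_read_fn rd, void *ctx, unsigned int retries)
{
    uint32_t data;

    assert(rd);
    do {
        /* A driver would sleep between reads, e.g. usleep_range(). */
        data = rd(ctx);
    } while (data != DEMO_MEM_RDY && retries--);

    /* Re-check the value: the loop also exits when retries hit zero. */
    return data == DEMO_MEM_RDY ? 0 : -ETIMEDOUT;
}

/* Fake register: reports "ready" once *ctx earlier reads have happened. */
static uint32_t demo_fake_read(void *ctx)
{
    unsigned int *remaining = ctx;

    if (*remaining == 0)
        return DEMO_MEM_RDY;
    (*remaining)--;
    return 0;
}
```

Checking the value again after the loop, rather than the retry
counter, avoids misreporting a success that arrives on the last read.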
 drivers/net/ethernet/apm/xgene-v2/enet.c | 24 ++--
 drivers/net/ethernet/apm/xgene-v2/enet.h |  2 ++
 drivers/net/ethernet/apm/xgene-v2/mac.h  |  1 -
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene-v2/enet.c 
b/drivers/net/ethernet/apm/xgene-v2/enet.c
index b49edee..5998da0 100644
--- a/drivers/net/ethernet/apm/xgene-v2/enet.c
+++ b/drivers/net/ethernet/apm/xgene-v2/enet.c
@@ -38,10 +38,24 @@ u32 xge_rd_csr(struct xge_pdata *pdata, u32 offset)
 int xge_port_reset(struct net_device *ndev)
 {
struct xge_pdata *pdata = netdev_priv(ndev);
+   struct device *dev = &pdata->pdev->dev;
+   u32 data, wait = 10;
 
-   xge_wr_csr(pdata, ENET_SRST, 0x3);
-   xge_wr_csr(pdata, ENET_SRST, 0x2);
-   xge_wr_csr(pdata, ENET_SRST, 0x0);
+   xge_wr_csr(pdata, ENET_CLKEN, 0x3);
+   xge_wr_csr(pdata, ENET_SRST, 0xf);
+   xge_wr_csr(pdata, ENET_SRST, 0);
+   xge_wr_csr(pdata, CFG_MEM_RAM_SHUTDOWN, 1);
+   xge_wr_csr(pdata, CFG_MEM_RAM_SHUTDOWN, 0);
+
+   do {
+   usleep_range(100, 110);
+   data = xge_rd_csr(pdata, BLOCK_MEM_RDY);
+   } while (data != MEM_RDY && wait--);
+
+   if (data != MEM_RDY) {
+   dev_err(dev, "ECC init failed: %x\n", data);
+   return -ETIMEDOUT;
+   }
 
xge_wr_csr(pdata, ENET_SHIM, DEVM_ARAUX_COH | DEVM_AWAUX_COH);
 
@@ -59,13 +73,11 @@ static void xge_traffic_resume(struct net_device *ndev)
xge_wr_csr(pdata, RX_DV_GATE_REG, 1);
 }
 
-int xge_port_init(struct net_device *ndev)
+void xge_port_init(struct net_device *ndev)
 {
struct xge_pdata *pdata = netdev_priv(ndev);
 
pdata->phy_speed = SPEED_1000;
xge_mac_init(pdata);
xge_traffic_resume(ndev);
-
-   return 0;
 }
diff --git a/drivers/net/ethernet/apm/xgene-v2/enet.h 
b/drivers/net/ethernet/apm/xgene-v2/enet.h
index 40371cf..3fd36dc6 100644
--- a/drivers/net/ethernet/apm/xgene-v2/enet.h
+++ b/drivers/net/ethernet/apm/xgene-v2/enet.h
@@ -28,6 +28,7 @@
 #define CFG_MEM_RAM_SHUTDOWN   0xd070
 #define BLOCK_MEM_RDY  0xd074
 
+#define MEM_RDY0x
 #define DEVM_ARAUX_COH BIT(19)
 #define DEVM_AWAUX_COH BIT(3)
 
@@ -39,5 +40,6 @@
 void xge_wr_csr(struct xge_pdata *pdata, u32 offset, u32 val);
 u32 xge_rd_csr(struct xge_pdata *pdata, u32 offset);
 int xge_port_reset(struct net_device *ndev);
+void xge_port_init(struct net_device *ndev);
 
 #endif  /* __XGENE_ENET_V2_ENET__H__ */
diff --git a/drivers/net/ethernet/apm/xgene-v2/mac.h 
b/drivers/net/ethernet/apm/xgene-v2/mac.h
index 74397c9..18a9c9d 100644
--- a/drivers/net/ethernet/apm/xgene-v2/mac.h
+++ b/drivers/net/ethernet/apm/xgene-v2/mac.h
@@ -105,7 +105,6 @@ static inline u32 xgene_get_reg_bits(u32 var, int pos, int 
len)
 void xge_mac_enable(struct xge_pdata *pdata);
 void xge_mac_disable(struct xge_pdata *pdata);
 void xge_mac_init(struct xge_pdata *pdata);
-int xge_port_init(struct net_device *ndev);
 void xge_mac_set_station_addr(struct xge_pdata *pdata);
 
 #endif /* __XGENE_ENET_V2_MAC_H__ */
-- 
1.9.1

[PATCH net-next 13/15] nfp: flush xmit_more on error paths

2017-03-21 Thread Jakub Kicinski

In case of ring full or DMA mapping error remember to flush xmit_more
delayed kicks.

Signed-off-by: Jakub Kicinski 
---
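Note (not part of the patch): a minimal userspace sketch of why an
error path has to flush deferred kicks. It assumes a ring that
batches queued descriptors in a counter and announces them to the
hardware later; struct demo_tx_ring, demo_xmit_more_flush() and
demo_xmit() are hypothetical names, not the nfp functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct demo_tx_ring {
    unsigned int wr_ptr_add;  /* descriptors queued but not yet announced */
    unsigned int hw_wr_p;     /* what the (simulated) hardware was told */
};

/* Announce everything that was deferred and reset the batch counter. */
static void demo_xmit_more_flush(struct demo_tx_ring *ring)
{
    assert(ring);
    ring->hw_wr_p += ring->wr_ptr_add;
    ring->wr_ptr_add = 0;
}

/* Queue one packet; 'more' means another packet follows immediately. */
static int demo_xmit(struct demo_tx_ring *ring, bool more, bool fail)
{
    if (fail) {
        /* Earlier packets may still be waiting for their kick. */
        demo_xmit_more_flush(ring);
        return -EBUSY;
    }

    ring->wr_ptr_add++;
    if (!more)
        demo_xmit_more_flush(ring);
    return 0;
}
```

Without the flush in the failure branch, a packet queued with 'more'
set would stay unannounced until some later successful transmit.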
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 255294b8bc5f..d35eeba86bac 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -759,6 +759,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct 
net_device *netdev)
nn_dp_warn(dp, "TX ring %d busy. wrp=%u rdp=%u\n",
   qidx, tx_ring->wr_p, tx_ring->rd_p);
netif_tx_stop_queue(nd_q);
+   nfp_net_tx_xmit_more_flush(tx_ring);
u64_stats_update_begin(&r_vec->tx_sync);
r_vec->tx_busy++;
u64_stats_update_end(&r_vec->tx_sync);
@@ -867,6 +868,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct 
net_device *netdev)
tx_ring->txbufs[wr_idx].fidx = -2;
 err_free:
nn_dp_warn(dp, "Failed to map DMA TX buffer\n");
+   nfp_net_tx_xmit_more_flush(tx_ring);
u64_stats_update_begin(&r_vec->tx_sync);
r_vec->tx_errors++;
u64_stats_update_end(&r_vec->tx_sync);
-- 
2.11.0

[PATCH net-next 05/15] nfp: document expected locking in the core

2017-03-21 Thread Jakub Kicinski

Document which fields of nfp_cpp are protected by which locks.

Signed-off-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_cppcore.c   | 33 ++
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index 62aa7bcee93d..4e08362d8c97 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -65,28 +65,49 @@ struct nfp_cpp_resource {
u64 end;
 };
 
+/**
+ * struct nfp_cpp - main nfpcore device structure
+ * Following fields are read-only after probe() exits or netdevs are spawned.
+ * @dev:   embedded device structure
+ * @op:low-level implementation ops
+ * @priv:  private data of the low-level implementation
+ * @model: chip model
+ * @interface: chip interface id we are using to reach it
+ * @serial:chip serial number
+ * @imb_cat_table: CPP Mapping Table
+ *
+ * Following fields can be used only in probe() or with rtnl held:
+ * @hwinfo:HWInfo database fetched from the device
+ * @rtsym: firmware run time symbols
+ *
+ * Following fields use explicit locking:
+ * @resource_list: NFP CPP resource list
+ * @resource_lock: protects @resource_list
+ *
+ * @area_cache_list:   cached areas for cpp/xpb read/write speed up
+ * @area_cache_mutex:  protects @area_cache_list
+ *
+ * @waitq: area wait queue
+ */
 struct nfp_cpp {
struct device dev;
 
-   void *priv; /* Private data of the low-level implementation */
+   void *priv;
 
u32 model;
u16 interface;
u8 serial[NFP_SERIAL_LEN];
 
const struct nfp_cpp_operations *op;
-   struct list_head resource_list; /* NFP CPP resource list */
+   struct list_head resource_list;
rwlock_t resource_lock;
wait_queue_head_t waitq;
 
-   /* NFP6000 CPP Mapping Table */
u32 imb_cat_table[16];
 
-   /* Cached areas for cpp/xpb readl/writel speedups */
-   struct mutex area_cache_mutex;  /* Lock for the area cache */
+   struct mutex area_cache_mutex;
struct list_head area_cache_list;
 
-   /* Cached information */
void *hwinfo;
void *rtsym;
 };
-- 
2.11.0

[PATCH net-next 08/15] nfp: don't ignore return value of wait_event_interruptible

2017-03-21 Thread Jakub Kicinski

When a signal interrupts waiting for an area to become available
we currently assume success.  Pay attention to the return code of
wait_event_interruptible().  Unpack the code a little bit to make
it more readable.

Signed-off-by: Jakub Kicinski 
---
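Note (not part of the patch): a minimal userspace sketch of the
error handling described above. The wait result and the acquire
status are simulated by plain struct fields; struct demo_area and
demo_area_acquire() are hypothetical and do no real waiting or
locking.

```c
#include <assert.h>
#include <errno.h>

struct demo_area {
    int refcount;
    int wait_err;  /* what the simulated interruptible wait returns */
    int status;    /* what the simulated acquire operation reports */
};

/* Take a reference; only the first user performs the acquire. */
static int demo_area_acquire(struct demo_area *area)
{
    int err;

    assert(area);
    if (++area->refcount > 1)
        return 0;              /* already acquired by an earlier user */

    err = area->wait_err;      /* stands in for the interruptible wait */
    if (!err)
        err = area->status;    /* wait finished: use the op's result */
    if (err) {
        area->refcount--;      /* undo the reference on any failure */
        return err;
    }
    return 0;
}
```

The point is the ordering: an interrupted wait is reported first, and
the operation's own status is consulted only if the wait completed.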
 .../ethernet/netronome/nfp/nfpcore/nfp_cppcore.c   | 56 +++---
 1 file changed, 38 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index 5189fedb0f4f..2e4796b52b84 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -411,9 +411,43 @@ nfp_cpp_area_alloc(struct nfp_cpp *cpp, u32 dest,
  */
 void nfp_cpp_area_free(struct nfp_cpp_area *area)
 {
+   if (atomic_read(&area->refcount))
+   nfp_warn(area->cpp, "Warning: freeing busy area\n");
nfp_cpp_area_put(area);
 }
 
+static bool nfp_cpp_area_acquire_try(struct nfp_cpp_area *area, int *status)
+{
+   *status = area->cpp->op->area_acquire(area);
+
+   return *status != -EAGAIN;
+}
+
+static int __nfp_cpp_area_acquire(struct nfp_cpp_area *area)
+{
+   int err, status;
+
+   if (atomic_inc_return(&area->refcount) > 1)
+   return 0;
+
+   if (!area->cpp->op->area_acquire)
+   return 0;
+
+   err = wait_event_interruptible(area->cpp->waitq,
+  nfp_cpp_area_acquire_try(area, &status));
+   if (!err)
+   err = status;
+   if (err) {
+   nfp_warn(area->cpp, "Warning: area wait failed: %d\n", err);
+   atomic_dec(&area->refcount);
+   return err;
+   }
+
+   nfp_cpp_area_get(area);
+
+   return 0;
+}
+
 /**
  * nfp_cpp_area_acquire() - lock down a CPP area for access
  * @area:  CPP area handle
@@ -425,27 +459,13 @@ void nfp_cpp_area_free(struct nfp_cpp_area *area)
  */
 int nfp_cpp_area_acquire(struct nfp_cpp_area *area)
 {
-   mutex_lock(&area->mutex);
-   if (atomic_inc_return(&area->refcount) == 1) {
-   int (*a_a)(struct nfp_cpp_area *);
-
-   a_a = area->cpp->op->area_acquire;
-   if (a_a) {
-   int err;
+   int ret;
 
-   wait_event_interruptible(area->cpp->waitq,
-(err = a_a(area)) != -EAGAIN);
-   if (err < 0) {
-   atomic_dec(&area->refcount);
-   mutex_unlock(&area->mutex);
-   return err;
-   }
-   }
-   }
+   mutex_lock(&area->mutex);
+   ret = __nfp_cpp_area_acquire(area);
mutex_unlock(&area->mutex);
 
-   nfp_cpp_area_get(area);
-   return 0;
+   return ret;
 }
 
 /**
-- 
2.11.0

[PATCH net-next 06/15] nfp: lock area cache earlier

2017-03-21 Thread Jakub Kicinski

We shouldn't access area_cache_list without its lock even
to check if it's empty.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index 4e08362d8c97..5189fedb0f4f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -821,10 +821,7 @@ area_cache_get(struct nfp_cpp *cpp, u32 id,
 * the need for special case code below when
 * checking against available cache size.
 */
-   if (length == 0)
-   return NULL;
-
-   if (list_empty(&cpp->area_cache_list) || id == 0)
+   if (length == 0 || id == 0)
return NULL;
 
/* Remap from cpp_island to cpp_target */
@@ -832,10 +829,15 @@ area_cache_get(struct nfp_cpp *cpp, u32 id,
if (err < 0)
return NULL;
 
-   addr += *offset;
-
mutex_lock(&cpp->area_cache_mutex);
 
+   if (list_empty(&cpp->area_cache_list)) {
+   mutex_unlock(&cpp->area_cache_mutex);
+   return NULL;
+   }
+
+   addr += *offset;
+
/* See if we have a match */
list_for_each_entry(cache, &cpp->area_cache_list, entry) {
if (id == cache->id &&
-- 
2.11.0

[PATCH net-next 01/15] nfp: disallow sharing mutexes on the same machine

2017-03-21 Thread Jakub Kicinski

NFP can be connected to multiple machines via PCI or other buses.
Access to hardware resources is arbitrated using locks residing
in device memory.  Currently nfpcore only respects the mutexes
when it comes to inter-host locking.  If we try to acquire the
same lock again on one host, it will simply return success
because the owner of the lock is already set to that host.

This makes the locks useless for arbitration within one host
and unfair because whichever host grabbed the lock will have
a chance to reacquire it without others getting a shot.

Signed-off-by: Jakub Kicinski 
---
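Note (not part of the patch): a minimal userspace sketch of the
trylock behaviour after this change. It models the lock as an owner
word and is not atomic, unlike the device-memory mutex;
struct demo_hw_mutex, demo_trylock() and demo_unlock() are
hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct demo_hw_mutex {
    uint32_t owner;  /* 0 means unlocked; otherwise the holder's id */
};

/* Try to take the lock for 'self'; never succeeds on a held lock. */
static int demo_trylock(struct demo_hw_mutex *m, uint32_t self)
{
    assert(m && self != 0);
    if (m->owner == 0) {
        m->owner = self;
        return 0;
    }
    /* Held by someone, possibly by 'self': no "already ours" shortcut. */
    return -EBUSY;
}

/* Release the lock; only the current holder may do so. */
static int demo_unlock(struct demo_hw_mutex *m, uint32_t self)
{
    assert(m);
    if (m->owner != self)
        return -EPERM;
    m->owner = 0;
    return 0;
}
```

With the shortcut gone, a second caller on the same host has to wait
for an unlock just like a caller on another host.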
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index 40108e66c654..e2267b421af0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -1736,11 +1736,5 @@ int nfp_cpp_mutex_trylock(struct nfp_cpp_mutex *mutex)
return 0;
}
 
-   /* Already locked by us? Success! */
-   if (tmp == value) {
-   mutex->depth = 1;
-   return 0;
-   }
-
return nfp_mutex_is_locked(tmp) ? -EBUSY : -EINVAL;
 }
-- 
2.11.0

[PATCH net-next 12/15] nfp: remove RX queue pointers

2017-03-21 Thread Jakub Kicinski

NFP6000 doesn't use queue pointers/doorbells for RX; it uses the
'done' bit in descriptors.  Remove the pointers from data structures.
Since we are saving space in the rx_ring structure, make the fields
we previously compressed to 16 bits word size again.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h |  8 ++--
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c  |  3 ---
 drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c | 15 ---
 3 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 4d45f4573b57..8e04aa0e6e87 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -307,9 +307,7 @@ struct nfp_net_rx_buf {
  * @rd_p:   FL/RX ring read pointer (free running)
  * @idx:Ring index from Linux's perspective
  * @fl_qcidx:   Queue Controller Peripheral (QCP) queue index for the freelist
- * @rx_qcidx:   Queue Controller Peripheral (QCP) queue index for the RX queue
  * @qcp_fl: Pointer to base of the QCP freelist queue
- * @qcp_rx: Pointer to base of the QCP RX queue
  * @wr_ptr_add: Accumulated number of buffers to add to QCP write pointer
  *  (used for free list batching)
  * @rxbufs: Array of transmitted FL/RX buffers
@@ -324,13 +322,11 @@ struct nfp_net_rx_ring {
u32 wr_p;
u32 rd_p;
 
-   u16 idx;
-   u16 wr_ptr_add;
+   u32 idx;
+   u32 wr_ptr_add;
 
int fl_qcidx;
-   int rx_qcidx;
u8 __iomem *qcp_fl;
-   u8 __iomem *qcp_rx;
 
struct nfp_net_rx_buf *rxbufs;
struct nfp_net_rx_desc *rxds;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 19f9d95faea4..255294b8bc5f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -479,10 +479,7 @@ nfp_net_rx_ring_init(struct nfp_net_rx_ring *rx_ring,
rx_ring->r_vec = r_vec;
 
rx_ring->fl_qcidx = rx_ring->idx * nn->stride_rx;
-   rx_ring->rx_qcidx = rx_ring->fl_qcidx + (nn->stride_rx - 1);
-
rx_ring->qcp_fl = nn->rx_bar + NFP_QCP_QUEUE_OFF(rx_ring->fl_qcidx);
-   rx_ring->qcp_rx = nn->rx_bar + NFP_QCP_QUEUE_OFF(rx_ring->rx_qcidx);
 }
 
 /**
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
index 74125584260b..4077c59bf782 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
@@ -40,9 +40,9 @@ static struct dentry *nfp_dir;
 
 static int nfp_net_debugfs_rx_q_read(struct seq_file *file, void *data)
 {
-   int fl_rd_p, fl_wr_p, rx_rd_p, rx_wr_p, rxd_cnt;
struct nfp_net_r_vector *r_vec = file->private;
struct nfp_net_rx_ring *rx_ring;
+   int fl_rd_p, fl_wr_p, rxd_cnt;
struct nfp_net_rx_desc *rxd;
struct nfp_net *nn;
void *frag;
@@ -61,14 +61,11 @@ static int nfp_net_debugfs_rx_q_read(struct seq_file *file, 
void *data)
 
fl_rd_p = nfp_qcp_rd_ptr_read(rx_ring->qcp_fl);
fl_wr_p = nfp_qcp_wr_ptr_read(rx_ring->qcp_fl);
-   rx_rd_p = nfp_qcp_rd_ptr_read(rx_ring->qcp_rx);
-   rx_wr_p = nfp_qcp_wr_ptr_read(rx_ring->qcp_rx);
 
-   seq_printf(file, "RX[%02d,%02d,%02d]: cnt=%d dma=%pad host=%p   H_RD=%d H_WR=%d FL_RD=%d FL_WR=%d RX_RD=%d RX_WR=%d\n",
-  rx_ring->idx, rx_ring->fl_qcidx, rx_ring->rx_qcidx,
+   seq_printf(file, "RX[%02d,%02d]: cnt=%d dma=%pad host=%p   H_RD=%d H_WR=%d FL_RD=%d FL_WR=%d\n",
+  rx_ring->idx, rx_ring->fl_qcidx,
   rx_ring->cnt, &rx_ring->dma, rx_ring->rxds,
-  rx_ring->rd_p, rx_ring->wr_p,
-  fl_rd_p, fl_wr_p, rx_rd_p, rx_wr_p);
+  rx_ring->rd_p, rx_ring->wr_p, fl_rd_p, fl_wr_p);
 
for (i = 0; i < rxd_cnt; i++) {
rxd = &rx_ring->rxds[i];
@@ -91,10 +88,6 @@ static int nfp_net_debugfs_rx_q_read(struct seq_file *file, 
void *data)
seq_puts(file, " FL_RD");
if (i == fl_wr_p % rxd_cnt)
seq_puts(file, " FL_WR");
-   if (i == rx_rd_p % rxd_cnt)
-   seq_puts(file, " RX_RD");
-   if (i == rx_wr_p % rxd_cnt)
-   seq_puts(file, " RX_WR");
 
seq_putc(file, '\n');
}
-- 
2.11.0

[PATCH net-next 10/15] nfp: fix nfp_cpp_read()/nfp_cpp_write() error paths

2017-03-21 Thread Jakub Kicinski

When acquiring an area fails we can't call a function that does
both release and free.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index 2e4796b52b84..e2abba4c3a3f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -951,12 +951,14 @@ int nfp_cpp_read(struct nfp_cpp *cpp, u32 destination,
return -ENOMEM;
 
err = nfp_cpp_area_acquire(area);
-   if (err)
-   goto out;
+   if (err) {
+   nfp_cpp_area_free(area);
+   return err;
+   }
}
 
err = nfp_cpp_area_read(area, offset, kernel_vaddr, length);
-out:
+
if (cache)
area_cache_put(cpp, cache);
else
@@ -993,13 +995,14 @@ int nfp_cpp_write(struct nfp_cpp *cpp, u32 destination,
return -ENOMEM;
 
err = nfp_cpp_area_acquire(area);
-   if (err)
-   goto out;
+   if (err) {
+   nfp_cpp_area_free(area);
+   return err;
+   }
}
 
err = nfp_cpp_area_write(area, offset, kernel_vaddr, length);
 
-out:
if (cache)
area_cache_put(cpp, cache);
else
-- 
2.11.0

[PATCH net-next 11/15] nfp: don't use netdev_warn() before netdev is registered

2017-03-21 Thread Jakub Kicinski

Fix a warning which was using netdev_warn() instead of dev_warn()
too early.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index f134f1808b9a..19f9d95faea4 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -336,9 +336,9 @@ nfp_net_irqs_assign(struct nfp_net *nn, struct msix_entry 
*irq_entries,
 
if (dp->num_rx_rings > dp->num_r_vecs ||
dp->num_tx_rings > dp->num_r_vecs)
-   nn_warn(nn, "More rings (%d,%d) than vectors (%d).\n",
-   dp->num_rx_rings, dp->num_tx_rings,
-   dp->num_r_vecs);
+   dev_warn(nn->dp.dev, "More rings (%d,%d) than vectors (%d).\n",
+dp->num_rx_rings, dp->num_tx_rings,
+dp->num_r_vecs);
 
dp->num_rx_rings = min(dp->num_r_vecs, dp->num_rx_rings);
dp->num_tx_rings = min(dp->num_r_vecs, dp->num_tx_rings);
-- 
2.11.0

[PATCH net-next 07/15] nfp: correct return codes when msleep gets interrupted

2017-03-21 Thread Jakub Kicinski

msleep_interruptible() returns the time left to wait, not an error
code.  Return -ERESTARTSYS when interrupted.

While at it correct a comment and make the polling a bit
more aggressive.

Signed-off-by: Jakub Kicinski 
---
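Note (not part of the patch): a minimal userspace sketch of the
return-value conversion described above. The sleep is simulated by a
callback that returns the milliseconds left, and -EINTR stands in
for -ERESTARTSYS, which is not exposed to userspace; all demo_*
names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for a sleep that returns the milliseconds left to wait. */
typedef unsigned long (*demo_sleep_fn)(unsigned int msecs);

/* Simulated sleep that runs to completion. */
static unsigned long demo_sleep_full(unsigned int msecs)
{
    (void)msecs;
    return 0;
}

/* Simulated sleep that is cut short with time still remaining. */
static unsigned long demo_sleep_interrupted(unsigned int msecs)
{
    return msecs / 2 + 1;
}

/* Turn "time left" into an error code instead of passing it through. */
static int demo_sleep_checked(demo_sleep_fn sleep_fn, unsigned int msecs)
{
    assert(sleep_fn);
    if (sleep_fn(msecs))
        return -EINTR;  /* interrupted; the remaining time is no errno */
    return 0;
}
```

Returning the raw remainder would hand callers a positive number
where they expect zero or a negative error code.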
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 34c50987c377..17822ae4a17f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -209,9 +209,8 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg,
if ((*reg & mask) == val)
return 0;
 
-   err = msleep_interruptible(100);
-   if (err)
-   return err;
+   if (msleep_interruptible(25))
+   return -ERESTARTSYS;
 
if (time_after(start_time, wait_until))
return -ETIMEDOUT;
@@ -228,7 +227,7 @@ nfp_nsp_wait_reg(struct nfp_cpp *cpp, u64 *reg,
  *
  * Return: 0 for success with no result
  *
- *  1..255 for NSP completion with a result code
+ *  positive value for NSP completion with a result code
  *
  * -EAGAIN if the NSP is not yet present
  * -ENODEV if the NSP is not a supported model
@@ -380,9 +379,10 @@ int nfp_nsp_wait(struct nfp_nsp *state)
if (err != -EAGAIN)
break;
 
-   err = msleep_interruptible(100);
-   if (err)
+   if (msleep_interruptible(25)) {
+   err = -ERESTARTSYS;
break;
+   }
 
if (time_after(start_time, wait_until)) {
err = -ETIMEDOUT;
-- 
2.11.0

[PATCH net-next 02/15] nfp: fail graciously when someone tries to grab global lock

2017-03-21 Thread Jakub Kicinski

The global device lock is acquired to search the resource table.
The lock is actually itself part of the table (entry 0).
Therefore if someone asks for resource 0 we would deadlock since
double locking is no longer allowed.

Currently the driver doesn't try to lock that resource so let's
simply make sure we fail gracefully and not add special handling
of this case until it is really needed.  Hide the relevant defines
in the source file.

Signed-off-by: Jakub Kicinski 
---
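Note (not part of the patch): a minimal userspace sketch of rejecting
the table's own name before a lookup. It assumes names are compared
as fixed 8-byte, zero-padded strings, as the diff suggests;
demo_resource_check() and the DEMO_* macros are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_SZ  8
#define DEMO_TBL_NAME "nfp.res"  /* the table's self-identifier */

/* Return -EOPNOTSUPP for the table's own entry, 0 for any other name. */
static int demo_resource_check(const char *name)
{
    char name_pad[DEMO_NAME_SZ] = { 0 };
    size_t i;

    assert(name);

    /* Copy at most 8 bytes; shorter names stay zero-padded. */
    for (i = 0; i < DEMO_NAME_SZ && name[i]; i++)
        name_pad[i] = name[i];

    /* "nfp.res" plus its terminating NUL is exactly 8 bytes. */
    if (!memcmp(name_pad, DEMO_TBL_NAME, DEMO_NAME_SZ))
        return -EOPNOTSUPP;

    return 0;
}
```

Failing early keeps the lookup from trying to take a lock that the
search itself already holds.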
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h  |  9 +
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c | 15 ---
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
index 42cb720b696d..f7ca8e374923 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h
@@ -66,14 +66,7 @@ int nfp_nsp_write_eth_table(struct nfp_nsp *state,
 
 /* Implemented in nfp_resource.c */
 
-#define NFP_RESOURCE_TBL_TARGETNFP_CPP_TARGET_MU
-#define NFP_RESOURCE_TBL_BASE  0x81ULL
-
-/* NFP Resource Table self-identifier */
-#define NFP_RESOURCE_TBL_NAME  "nfp.res"
-#define NFP_RESOURCE_TBL_KEY   0x /* Special key for entry 0 */
-
-/* All other keys are CRC32-POSIX of the 8-byte identification string */
+/* All keys are CRC32-POSIX of the 8-byte identification string */
 
 /* ARM/PCI vNIC Interfaces 0..3 */
 #define NFP_RESOURCE_VNIC_PCI_0"vnic.p0"
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
index a2850344f8b4..2d15a7c9d0de 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_resource.c
@@ -45,6 +45,13 @@
 #include "nfp_cpp.h"
 #include "nfp6000/nfp6000.h"
 
+#define NFP_RESOURCE_TBL_TARGETNFP_CPP_TARGET_MU
+#define NFP_RESOURCE_TBL_BASE  0x81ULL
+
+/* NFP Resource Table self-identifier */
+#define NFP_RESOURCE_TBL_NAME  "nfp.res"
+#define NFP_RESOURCE_TBL_KEY   0x /* Special key for entry 0 */
+
 #define NFP_RESOURCE_ENTRY_NAME_SZ 8
 
 /**
@@ -100,9 +107,11 @@ static int nfp_cpp_resource_find(struct nfp_cpp *cpp, 
struct nfp_resource *res)
strncpy(name_pad, res->name, sizeof(name_pad));
 
/* Search for a matching entry */
-   key = NFP_RESOURCE_TBL_KEY;
-   if (memcmp(name_pad, NFP_RESOURCE_TBL_NAME "\0\0\0\0\0\0\0\0", 8))
-   key = crc32_posix(name_pad, sizeof(name_pad));
+   if (!memcmp(name_pad, NFP_RESOURCE_TBL_NAME "\0\0\0\0\0\0\0\0", 8)) {
+   nfp_err(cpp, "Grabbing device lock not supported\n");
+   return -EOPNOTSUPP;
+   }
+   key = crc32_posix(name_pad, sizeof(name_pad));
 
for (i = 0; i < NFP_RESOURCE_TBL_ENTRIES; i++) {
u64 addr = NFP_RESOURCE_TBL_BASE +
-- 
2.11.0

[PATCH net-next 15/15] nfp: disable FW on reconfiguration errors

2017-03-21 Thread Jakub Kicinski

Since we no longer need to keep the FW enabled for .ndo_close()
to work we can always stop FW after reconfiguration failure.
This seems to make most FWs more resilient to faults (at least
in error injection scenarios).

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 29 --
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index e12353a7c83c..8f2da128ce0f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2127,7 +2127,11 @@ nfp_net_tx_ring_hw_cfg_write(struct nfp_net *nn,
nn_writeb(nn, NFP_NET_CFG_TXR_VEC(idx), tx_ring->r_vec->irq_entry);
 }
 
-static int __nfp_net_set_config_and_enable(struct nfp_net *nn)
+/**
+ * nfp_net_set_config_and_enable() - Write control BAR and enable NFP
+ * @nn:  NFP Net device to reconfigure
+ */
+static int nfp_net_set_config_and_enable(struct nfp_net *nn)
 {
u32 new_ctrl, update = 0;
unsigned int r;
@@ -2176,6 +2180,10 @@ static int __nfp_net_set_config_and_enable(struct 
nfp_net *nn)
 
nn_writel(nn, NFP_NET_CFG_CTRL, new_ctrl);
err = nfp_net_reconfig(nn, update);
+   if (err) {
+   nfp_net_clear_config_and_disable(nn);
+   return err;
+   }
 
nn->dp.ctrl = new_ctrl;
 
@@ -2191,22 +2199,7 @@ static int __nfp_net_set_config_and_enable(struct 
nfp_net *nn)
udp_tunnel_get_rx_info(nn->dp.netdev);
}
 
-   return err;
-}
-
-/**
- * nfp_net_set_config_and_enable() - Write control BAR and enable NFP
- * @nn:  NFP Net device to reconfigure
- */
-static int nfp_net_set_config_and_enable(struct nfp_net *nn)
-{
-   int err;
-
-   err = __nfp_net_set_config_and_enable(nn);
-   if (err)
-   nfp_net_clear_config_and_disable(nn);
-
-   return err;
+   return 0;
 }
 
 /**
@@ -2447,7 +2440,7 @@ static int nfp_net_dp_swap_enable(struct nfp_net *nn, 
struct nfp_net_dp *dp)
return err;
}
 
-   return __nfp_net_set_config_and_enable(nn);
+   return nfp_net_set_config_and_enable(nn);
 }
 
 struct nfp_net_dp *nfp_net_clone_dp(struct nfp_net *nn)
-- 
2.11.0

[PATCH net-next 03/15] nfp: remove cpp mutex cache

2017-03-21 Thread Jakub Kicinski

CPP mutex cache was introduced to work around the fact that the
same host could successfully acquire a lock multiple times.  It
used to collapse multiple users to the same struct nfp_cpp_mutex
and track use count.  Unfortunately it's racy.  Since we now force
all nfp_mutex_lock() callers within the host to actually succeed
at acquiring the lock, we no longer need the cache, so remove it.

Signed-off-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_cppcore.c   | 43 +-
 1 file changed, 2 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index e2267b421af0..6337342c5b62 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -66,10 +66,8 @@ struct nfp_cpp_resource {
 };
 
 struct nfp_cpp_mutex {
-   struct list_head list;
struct nfp_cpp *cpp;
int target;
-   u16 usage;
u16 depth;
unsigned long long address;
u32 key;
@@ -86,7 +84,6 @@ struct nfp_cpp {
 
const struct nfp_cpp_operations *op;
struct list_head resource_list; /* NFP CPP resource list */
-   struct list_head mutex_cache;   /* Mutex cache */
rwlock_t resource_lock;
wait_queue_head_t waitq;
 
@@ -187,24 +184,6 @@ void nfp_cpp_free(struct nfp_cpp *cpp)
 {
struct nfp_cpp_area_cache *cache, *ctmp;
struct nfp_cpp_resource *res, *rtmp;
-   struct nfp_cpp_mutex *mutex, *mtmp;
-
-   /* There should be no mutexes in the cache at this point. */
-   WARN_ON(!list_empty(&cpp->mutex_cache));
-   /* .. but if there are, unlock them and complain. */
-   list_for_each_entry_safe(mutex, mtmp, &cpp->mutex_cache, list) {
-   dev_err(cpp->dev.parent, "Dangling mutex: @%d::0x%llx, %d locks held by %d owners\n",
-   mutex->target, (unsigned long long)mutex->address,
-   mutex->depth, mutex->usage);
-
-   /* Forcing an unlock */
-   mutex->depth = 1;
-   nfp_cpp_mutex_unlock(mutex);
-
-   /* Forcing a free */
-   mutex->usage = 1;
-   nfp_cpp_mutex_free(mutex);
-   }
 
/* Remove all caches */
list_for_each_entry_safe(cache, ctmp, &cpp->area_cache_list, entry) {
@@ -1127,7 +1106,6 @@ nfp_cpp_from_operations(const struct nfp_cpp_operations 
*ops,
rwlock_init(&cpp->resource_lock);
init_waitqueue_head(&cpp->waitq);
lockdep_set_class(&cpp->resource_lock, &nfp_cpp_resource_lock_key);
-   INIT_LIST_HEAD(&cpp->mutex_cache);
INIT_LIST_HEAD(&cpp->resource_list);
INIT_LIST_HEAD(&cpp->area_cache_list);
mutex_init(&cpp->area_cache_mutex);
@@ -1536,14 +1514,6 @@ struct nfp_cpp_mutex *nfp_cpp_mutex_alloc(struct nfp_cpp 
*cpp, int target,
if (err)
return NULL;
 
-   /* Look for mutex on cache list */
-   list_for_each_entry(mutex, &cpp->mutex_cache, list) {
-   if (mutex->target == target && mutex->address == address) {
-   mutex->usage++;
-   return mutex;
-   }
-   }
-
err = nfp_cpp_readl(cpp, mur, address + 4, );
if (err < 0)
return NULL;
@@ -1560,10 +1530,6 @@ struct nfp_cpp_mutex *nfp_cpp_mutex_alloc(struct nfp_cpp 
*cpp, int target,
mutex->address = address;
mutex->key = key;
mutex->depth = 0;
-   mutex->usage = 1;
-
-   /* Add mutex to cache list */
-   list_add(&mutex->list, &cpp->mutex_cache);
 
return mutex;
 }
@@ -1574,11 +1540,6 @@ struct nfp_cpp_mutex *nfp_cpp_mutex_alloc(struct nfp_cpp 
*cpp, int target,
  */
 void nfp_cpp_mutex_free(struct nfp_cpp_mutex *mutex)
 {
-   if (--mutex->usage)
-   return;
-
-   /* Remove mutex from cache */
-   list_del(&mutex->list);
kfree(mutex);
 }
 
@@ -1611,8 +1572,8 @@ int nfp_cpp_mutex_lock(struct nfp_cpp_mutex *mutex)
if (time_is_before_eq_jiffies(warn_at)) {
warn_at = jiffies + 60 * HZ;
dev_warn(mutex->cpp->dev.parent,
-"Warning: waiting for NFP mutex [usage:%hd 
depth:%hd target:%d addr:%llx key:%08x]\n",
-mutex->usage, mutex->depth,
+"Warning: waiting for NFP mutex [depth:%hd 
target:%d addr:%llx key:%08x]\n",
+mutex->depth,
 mutex->target, mutex->address, mutex->key);
}
}
-- 
2.11.0

[PATCH net-next 04/15] nfp: move mutex code out of nfp_cppcore.c

2017-03-21 Thread Jakub Kicinski

After mutex cache removal we can put the mutex code in a separate
source file.  This makes it clear it doesn't play with internals
of struct nfp_cpp any more.

No functional changes.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 .../ethernet/netronome/nfp/nfpcore/nfp_cppcore.c   | 304 --
 .../net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c | 345 +
 3 files changed, 346 insertions(+), 304 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile 
b/drivers/net/ethernet/netronome/nfp/Makefile
index 6933afa69df2..4a5d13ef92a4 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -6,6 +6,7 @@ nfp-objs := \
nfpcore/nfp_cpplib.o \
nfpcore/nfp_hwinfo.o \
nfpcore/nfp_mip.o \
+   nfpcore/nfp_mutex.o \
nfpcore/nfp_nffw.o \
nfpcore/nfp_nsp.o \
nfpcore/nfp_nsp_eth.o \
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
index 6337342c5b62..62aa7bcee93d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c
@@ -65,14 +65,6 @@ struct nfp_cpp_resource {
u64 end;
 };
 
-struct nfp_cpp_mutex {
-   struct nfp_cpp *cpp;
-   int target;
-   u16 depth;
-   unsigned long long address;
-   u32 key;
-};
-
 struct nfp_cpp {
struct device dev;
 
@@ -1403,299 +1395,3 @@ void *nfp_cpp_explicit_priv(struct nfp_cpp_explicit 
*cpp_explicit)
 {
return &cpp_explicit[1];
 }
-
-/* THIS FUNCTION IS NOT EXPORTED */
-static u32 nfp_mutex_locked(u16 interface)
-{
-   return (u32)interface << 16 | 0x000f;
-}
-
-static u32 nfp_mutex_unlocked(u16 interface)
-{
-   return (u32)interface << 16 | 0x;
-}
-
-static bool nfp_mutex_is_locked(u32 val)
-{
-   return (val & 0x) == 0x000f;
-}
-
-static bool nfp_mutex_is_unlocked(u32 val)
-{
-   return (val & 0x) == ;
-}
-
-/* If you need more than 65536 recursive locks, please rethink your code. */
-#define MUTEX_DEPTH_MAX 0x
-
-static int
-nfp_cpp_mutex_validate(u16 interface, int *target, unsigned long long address)
-{
-   /* Not permitted on invalid interfaces */
-   if (NFP_CPP_INTERFACE_TYPE_of(interface) ==
-   NFP_CPP_INTERFACE_TYPE_INVALID)
-   return -EINVAL;
-
-   /* Address must be 64-bit aligned */
-   if (address & 7)
-   return -EINVAL;
-
-   if (*target != NFP_CPP_TARGET_MU)
-   return -EINVAL;
-
-   return 0;
-}
-
-/**
- * nfp_cpp_mutex_init() - Initialize a mutex location
- * @cpp:   NFP CPP handle
- * @target:NFP CPP target ID (ie NFP_CPP_TARGET_CLS or NFP_CPP_TARGET_MU)
- * @address:   Offset into the address space of the NFP CPP target ID
- * @key:   Unique 32-bit value for this mutex
- *
- * The CPP target:address must point to a 64-bit aligned location, and
- * will initialize 64 bits of data at the location.
- *
- * This creates the initial mutex state, as locked by this
- * nfp_cpp_interface().
- *
- * This function should only be called when setting up
- * the initial lock state upon boot-up of the system.
- *
- * Return: 0 on success, or -errno on failure
- */
-int nfp_cpp_mutex_init(struct nfp_cpp *cpp,
-  int target, unsigned long long address, u32 key)
-{
-   const u32 muw = NFP_CPP_ID(target, 4, 0);/* atomic_write */
-   u16 interface = nfp_cpp_interface(cpp);
-   int err;
-
-   err = nfp_cpp_mutex_validate(interface, &target, address);
-   if (err)
-   return err;
-
-   err = nfp_cpp_writel(cpp, muw, address + 4, key);
-   if (err)
-   return err;
-
-   err = nfp_cpp_writel(cpp, muw, address, nfp_mutex_locked(interface));
-   if (err)
-   return err;
-
-   return 0;
-}
-
-/**
- * nfp_cpp_mutex_alloc() - Create a mutex handle
- * @cpp:   NFP CPP handle
- * @target:NFP CPP target ID (ie NFP_CPP_TARGET_CLS or NFP_CPP_TARGET_MU)
- * @address:   Offset into the address space of the NFP CPP target ID
- * @key:   32-bit unique key (must match the key at this location)
- *
- * The CPP target:address must point to a 64-bit aligned location, and
- * reserve 64 bits of data at the location for use by the handle.
- *
- * Only target/address pairs that point to entities that support the
- * MU Atomic Engine's CmpAndSwap32 command are supported.
- *
- * Return: A non-NULL struct nfp_cpp_mutex * on success, NULL on failure.
- */
-struct nfp_cpp_mutex *nfp_cpp_mutex_alloc(struct nfp_cpp *cpp, int target,
- unsigned long long address, u32 key)
-{
-   const u32 mur = NFP_CPP_ID(target, 3, 0);/*

[PATCH net-next 00/15] nfp: allow concurrency in core and minor fixes

2017-03-21 Thread Jakub Kicinski

Hi!

The first 10 patches of this series prepare nfpcore for concurrent 
accesses.  This will be needed by upcoming hwmon and devlink patches.
Most locking is already in place, the patches in this series iron out
a few bugs.

Last 5 patches are fixes and cleanups to the netdev code, including
removal of doorbell pointers used only on old versions of the chip,
removal of unnecessarily defensive code and flushing xmit_more more
carefully on error paths.

Jakub Kicinski (15):
  nfp: disallow sharing mutexes on the same machine
  nfp: fail graciously when someone tries to grab global lock
  nfp: remove cpp mutex cache
  nfp: move mutex code out of nfp_cppcore.c
  nfp: document expected locking in the core
  nfp: lock area cache earlier
  nfp: correct return codes when msleep gets interrupted
  nfp: don't ignore return value of wait_event_interruptible
  nfp: fix invalid area detection
  nfp: fix nfp_cpp_read()/nfp_cpp_write() error paths
  nfp: don't use netdev_warn() before netdev is registered
  nfp: remove RX queue pointers
  nfp: flush xmit_more on error paths
  nfp: remove defensive checks around ndo_open()/ndo_close()
  nfp: disable FW on reconfiguration errors

 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   8 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  50 +--
 .../net/ethernet/netronome/nfp/nfp_net_debugfs.c   |  15 +-
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp.h   |   9 +-
 .../ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c  |  29 +-
 .../ethernet/netronome/nfp/nfpcore/nfp_cppcore.c   | 467 -
 .../net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c | 345 +++
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |  12 +-
 .../ethernet/netronome/nfp/nfpcore/nfp_resource.c  |  15 +-
 10 files changed, 484 insertions(+), 467 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_mutex.c

-- 
2.11.0

[PATCH net-next 14/15] nfp: remove defensive checks around ndo_open()/ndo_close()

2017-03-21 Thread Jakub Kicinski

Device open and close handlers check if the device is already
in the desired state.  Thanks to our reconfig infrastructure
this should not be necessary, and there doesn't seem to be any
code in the driver which depends on it.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index d35eeba86bac..e12353a7c83c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2233,11 +2233,6 @@ static int nfp_net_netdev_open(struct net_device *netdev)
struct nfp_net *nn = netdev_priv(netdev);
int err, r;
 
-   if (nn->dp.ctrl & NFP_NET_CFG_CTRL_ENABLE) {
-   nn_err(nn, "Dev is already enabled: 0x%08x\n", nn->dp.ctrl);
-   return -EBUSY;
-   }
-
/* Step 1: Allocate resources for rings and the like
 * - Request interrupts
 * - Allocate RX and TX ring resources
@@ -2368,11 +2363,6 @@ static int nfp_net_netdev_close(struct net_device 
*netdev)
 {
struct nfp_net *nn = netdev_priv(netdev);
 
-   if (!(nn->dp.ctrl & NFP_NET_CFG_CTRL_ENABLE)) {
-   nn_err(nn, "Dev is not up: 0x%08x\n", nn->dp.ctrl);
-   return 0;
-   }
-
/* Step 1: Disable RX and TX rings from the Linux kernel perspective
 */
nfp_net_close_stack(nn);
-- 
2.11.0

[PATCH net-next 09/15] nfp: fix invalid area detection

2017-03-21 Thread Jakub Kicinski

Core should detect when someone is trying to request an access
window which is too large for a given type of access.  Otherwise
the requester will be put on a wait queue for ever without any
error message.

Add const qualifiers to clarify that we are only looking at read-only
members in relevant functions.

Signed-off-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c  | 29 +++---
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index 15cc3e77cf6a..43dc68e01274 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -217,7 +217,7 @@ static resource_size_t nfp_bar_resource_start(struct 
nfp_bar *bar)
 #define TARGET_WIDTH_64 8
 
 static int
-compute_bar(struct nfp6000_pcie *nfp, struct nfp_bar *bar,
+compute_bar(const struct nfp6000_pcie *nfp, const struct nfp_bar *bar,
u32 *bar_config, u64 *bar_base,
int tgt, int act, int tok, u64 offset, size_t size, int width)
 {
@@ -410,35 +410,36 @@ find_matching_bar(struct nfp6000_pcie *nfp,
 
 /* Return EAGAIN if no resource is available */
 static int
-find_unused_bar_noblock(struct nfp6000_pcie *nfp,
+find_unused_bar_noblock(const struct nfp6000_pcie *nfp,
int tgt, int act, int tok,
u64 offset, size_t size, int width)
 {
-   int n, invalid = 0;
+   int n, busy = 0;
 
for (n = 0; n < nfp->bars; n++) {
-   struct nfp_bar *bar = &nfp->bar[n];
+   const struct nfp_bar *bar = &nfp->bar[n];
int err;
 
-   if (bar->bitsize == 0) {
-   invalid++;
-   continue;
-   }
-
-   if (atomic_read(&bar->refcnt) != 0)
+   if (!bar->bitsize)
continue;
 
/* Just check to see if we can make it fit... */
err = compute_bar(nfp, bar, NULL, NULL,
  tgt, act, tok, offset, size, width);
+   if (err)
+   continue;
 
-   if (err < 0)
-   invalid++;
-   else
+   if (!atomic_read(&bar->refcnt))
return n;
+
+   busy++;
}
 
-   return (n == invalid) ? -EINVAL : -EAGAIN;
+   if (WARN(!busy, "No suitable BAR found for request tgt:0x%x act:0x%x tok:0x%x off:0x%llx size:%zd width:%d\n",
+tgt, act, tok, offset, size, width))
+   return -EINVAL;
+
+   return -EAGAIN;
 }
 
 static int
-- 
2.11.0
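
The selection logic introduced by this patch can be sketched in plain userspace C.  This is a simplification under stated assumptions, not the driver code: struct bar_sketch, bar_fits() and find_unused_bar_sketch() are hypothetical names, and bar_fits() stands in for compute_bar() with a bare size check instead of the real target/action/token matching.  It only illustrates the distinction the patch draws: -EAGAIN when a suitable BAR exists but is in use, -EINVAL when no BAR could ever satisfy the request.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified view of a BAR for illustration only. */
struct bar_sketch {
	unsigned int bitsize;	/* 0 means the BAR is not usable at all */
	int refcnt;		/* non-zero means the BAR is currently in use */
};

/* Stand-in for compute_bar(): 0 if the request fits, -EINVAL if not. */
static int bar_fits(const struct bar_sketch *bar, uint64_t size)
{
	if (bar->bitsize >= 64)
		return 0;
	return size <= (UINT64_C(1) << bar->bitsize) ? 0 : -EINVAL;
}

/* Return a free BAR index, -EAGAIN if all fitting BARs are busy,
 * or -EINVAL if no BAR can hold the request.
 */
static int find_unused_bar_sketch(const struct bar_sketch *bars, int nbars,
				  uint64_t size)
{
	int n, busy = 0;

	for (n = 0; n < nbars; n++) {
		if (!bars[n].bitsize)
			continue;	/* disabled BAR */
		if (bar_fits(&bars[n], size))
			continue;	/* request can never fit here */
		if (!bars[n].refcnt)
			return n;	/* fits and is free */
		busy++;			/* fits, but someone holds it */
	}

	/* Waiting only makes sense if some BAR could fit later. */
	return busy ? -EAGAIN : -EINVAL;
}
```

With this ordering a caller that gets -EAGAIN can sleep and retry, while -EINVAL tells it that retrying is pointless.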

Re: [PATCH] net: vrf: Reset rt6i_idev in local dst after put

2017-03-21 Thread David Miller

From: David Ahern 
Date: Fri, 17 Mar 2017 16:07:11 -0700

> The VRF driver takes a reference to the inet6_dev on the VRF device for
> its rt6_local dst when handling local traffic through the VRF device as
> a loopback. When the device is deleted the driver does a put on the idev
> but does not reset rt6i_idev in the rt6_info struct. When the dst is
> destroyed, dst_destroy calls ip6_dst_destroy which does a second put for
> what is essentially the same reference causing it to be prematurely freed.
> Reset rt6i_idev after the put in the vrf driver.
> 
> Fixes: b4869aa2f881e ("net: vrf: ipv6 support for local traffic to
>local addresses")
> Signed-off-by: David Ahern 

Applied and queued up for -stable.

Re: [PATCH] rhashtable: Add rhashtable_lookup_get_insert_fast

2017-03-21 Thread David Miller

From: Andreas Gruenbacher 
Date: Sat, 18 Mar 2017 00:36:15 +0100

> Add rhashtable_lookup_get_insert_fast for fixed keys, similar to
> rhashtable_lookup_get_insert_key for explicit keys.
> 
> Signed-off-by: Andreas Gruenbacher 
> Acked-by: Herbert Xu 

Applied to net-next, thanks.

Re: [PATCH net-next] liquidio: fix for vf mac addr command sent to nic firmware

2017-03-21 Thread David Miller

From: Felix Manlunas 
Date: Fri, 17 Mar 2017 15:43:26 -0700

> From: Rick Farrington 
> 
> Change to support host<->firmware command return value.
> Fix for vf mac addr state command.
> 1. Added support for firmware commands to return a value:
>- previously, the returned code overlapped with host codes, thus
>  commands were only returning 0 (success) or -1 (interpreted as
>  timeout)
>- per 'response_manager.h', the error codes are split into two fields
>  (major/minor) now, firmware commands are grouped into their own
>  'major' group, separate from the host's 'major' group, which allow f/w
>  commands to return any 16-bit value
> 2. The command to set vf mac addr was logging a success message even if
>command failed.  Now command uses a callback function to log the status
>message.
> 3. The command to set vf mac addr was not logging a message when set via
>the host 'ip' command.  Now, the callback function will log an
>appropriate message.
> 
> Signed-off-by: Rick Farrington 
> Signed-off-by: Felix Manlunas 
> Signed-off-by: Derek Chickles 
> Signed-off-by: Satanand Burla 

Applied, thanks.

Re: [PATCH net 0/4] ibmvnic: Initialization fixes and improvements

2017-03-21 Thread David Miller

From: John Allen 
Date: Fri, 17 Mar 2017 17:13:39 -0500

> These patches resolve issues with the ibmvnic initialization process.

Series applied, thanks.

Re: [PATCH net-next] liquidio: add debug error messages to report command timeout

2017-03-21 Thread David Miller

From: Felix Manlunas 
Date: Fri, 17 Mar 2017 11:23:08 -0700

> From: Rick Farrington 
> 
> Add timeout error message in lio_process_ordered_list().  Add host failure
> status in existing error message in if_cfg_callback().
> 
> Signed-off-by: Rick Farrington 
> Signed-off-by: Felix Manlunas 

Applied.

Re: [PATCH net-next] liquidio: remove duplicate code

2017-03-21 Thread David Miller

From: Felix Manlunas 
Date: Fri, 17 Mar 2017 10:50:05 -0700

> From: Satanand Burla 
> 
> Remove code duplicated in PF and VF; define that code once only in a common
> header file included by PF and VF.
>  
> Signed-off-by: Satanand Burla 
> Signed-off-by: Felix Manlunas 

Applied.

Re: [PATCH] bna: integer overflow bug in debugfs

2017-03-21 Thread David Miller

From: Dan Carpenter 
Date: Fri, 17 Mar 2017 23:52:35 +0300

> We could allocate less memory than intended because we do:
> 
>   bnad->regdata = kzalloc(len << 2, GFP_KERNEL);
> 
> The shift can overflow leading to a crash.  This is debugfs code so the
> impact is very small.
> 
> Fixes: 7afc5dbde091 ("bna: Add debugfs interface.")
> Signed-off-by: Dan Carpenter 

Applied.
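
The overflow described in the quoted patch can be illustrated with a small userspace sketch.  This is not the code from the patch: regdata_bytes() is a hypothetical helper, and it assumes len is a 32-bit word count taken from user input.  It rejects any count for which len << 2 would wrap around, so the wrapped (too small) size never reaches the allocator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert a count of 32-bit words into a byte
 * count, refusing values where the shift would wrap in 32 bits.
 */
static bool regdata_bytes(uint32_t len, uint32_t *bytes)
{
	if (len > (UINT32_MAX >> 2))
		return false;	/* len << 2 would overflow */

	*bytes = len << 2;
	return true;
}
```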

Re: [PATCH net,stable] qmi_wwan: add Dell DW5811e

2017-03-21 Thread David Miller

From: Bjørn Mork 
Date: Fri, 17 Mar 2017 17:20:48 +0100

> This is a Dell branded Sierra Wireless EM7455. It is operating in
> MBIM mode by default, but can be configured to provide two QMI/RMNET
> functions.
> 
> Signed-off-by: Bjørn Mork 
> ---
> Note regarding stable backports:
> 
> This device should only be added to v4.5 and later. It depends on
> commit 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode")
> which has not been backported AFAIK.

Applied and queued up for -stable, thanks.

Re: [PATCH v2 net-next 0/3] net: stmmac: adding multiple buffers and routing

2017-03-21 Thread David Miller

From: Joao Pinto 
Date: Fri, 17 Mar 2017 16:11:04 +

> As agreed with David Miller, this patch-set is the third and last to enable
> multiple queues in stmmac.
> 
> This third one focuses on:
> 
> a) Enable multiple buffering to the driver and queue independent data
> b) Configuration of RX and TX queues' priority
> c) Configuration of RX queues' routing

Series applied, thanks.

Re: [PATCH net] sch_dsmark: fix invalid skb_cow() usage

2017-03-21 Thread David Miller

From: Eric Dumazet 
Date: Fri, 17 Mar 2017 08:05:28 -0700

> From: Eric Dumazet 
> 
> skb_cow(skb, sizeof(ip header)) is not very helpful in this context.
> 
> First we need to use pskb_may_pull() to make sure the ip header
> is in skb linear part, then use skb_try_make_writable() to
> address clones issues.
> 
> Fixes: 4c30719f4f55 ("[PKT_SCHED] dsmark: handle cloned and non-linear skb's")
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable, thanks.

Re: [PATCH net-next] net: ethoc: Use ether_addr_copy()

2017-03-21 Thread David Miller

From: Tobias Klauser 
Date: Fri, 17 Mar 2017 11:52:15 +0100

> Use ether_addr_copy() instead of memcpy() to set netdev->dev_addr (which
> is 2-byte aligned).
> 
> Signed-off-by: Tobias Klauser 

Applied.

Re: [patch net-next 0/2] mlxsw: small driver update

2017-03-21 Thread David Miller

From: Jiri Pirko 
Date: Fri, 17 Mar 2017 09:37:59 +0100

> From: Jiri Pirko 
> 
> Contains two cleanup patches.

Series applied, thanks.

Re: [PATCH net-next] r8152: check hw version first

2017-03-21 Thread David Miller

From: Hayes Wang 
Date: Fri, 17 Mar 2017 11:20:13 +0800

> Check hw version first in probe(). Do nothing if the driver doesn't
> support the chip.
> 
> Signed-off-by: Hayes Wang 

Applied, thanks.

Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-21 Thread Kees Cook

On Tue, Mar 21, 2017 at 2:23 PM, Eric Dumazet  wrote:
> On Tue, 2017-03-21 at 13:49 -0700, Kees Cook wrote:
>
>> Yeah, this is exactly what I'd like to find as well. Just comparing
>> cycles between refcount implementations, while interesting, doesn't
>> show us real-world performance changes, which is what we need to
>> measure.
>>
>> Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
>> elsewhere in this email thread) real-world meaningful enough?
>
> Not at all ;)
>
> This was targeting the specific change I had in mind for
> ip_idents_reserve(), which is not used by TCP flows.

Okay, I just wanted to check. I didn't think so, but it was the only
example in the thread.

> Unfortunately there is no good test simulating real-world workloads,
> which are mostly using TCP flows.

Sure, but there has to be _something_ that can be used to test to
measure the effects. Without a meaningful test, it's weird to reject a
change for performance reasons.

> Most synthetic tools you can find are not using epoll(), and very often
> hit bottlenecks in other layers.
>
>
> It looks like our suggestion to get kernel builds with atomic_inc()
> being exactly an atomic_inc() is not even discussed or implemented.

So, FWIW, I originally tried to make this a CONFIG in the first couple
passes at getting a refcount defense. I would be fine with this, but I
was not able to convince Peter. :) However, things have evolved a lot
since then, so perhaps there are things to be done here.

> Coding this would require less time than running a typical Google kernel
> qualification (roughly one month, thousands of hosts..., days of SWE).

It wasn't the issue of coding time; just that it had been specifically
not wanted. :)

Am I understanding you correctly that you'd want something like:

refcount.h:
#ifdef UNPROTECTED_REFCOUNT
#define refcount_inc(x)   atomic_inc(x)
...
#else
void refcount_inc(...
...
#endif

some/net.c:
#define UNPROTECTED_REFCOUNT
#include 

or similar?

-Kees

-- 
Kees Cook
Pixel Security
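
The compile-time switch outlined in the message above can be made concrete in userspace C11.  Everything below is an assumption for illustration: refcount_sketch_t and refcount_inc_sketch() are hypothetical names, and the protected variant only saturates at UINT_MAX, which is a simplification of what a hardened refcount would do.

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for a reference counter. */
typedef struct {
	atomic_uint refs;
} refcount_sketch_t;

#ifdef UNPROTECTED_REFCOUNT
/* Opt-out: the increment is exactly one atomic add. */
static void refcount_inc_sketch(refcount_sketch_t *r)
{
	atomic_fetch_add(&r->refs, 1);
}
#else
/* Protected: never wrap past UINT_MAX, stay saturated instead. */
static void refcount_inc_sketch(refcount_sketch_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == UINT_MAX)
			return;		/* saturated, leave untouched */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
}
#endif
```

A file that wants the plain behaviour would define UNPROTECTED_REFCOUNT before including the header, as in the some/net.c outline above.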

[net-next 1/5] e1000: use new API ethtool_{get|set}_link_ksettings

2017-03-21 Thread Jeff Kirsher

From: Philippe Reynes 

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 117 +++
 1 file changed, 58 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c 
b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index 975eeb885ca2..ec8aa4562cc9 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -103,104 +103,104 @@ static const char 
e1000_gstrings_test[][ETH_GSTRING_LEN] = {
 
 #define E1000_TEST_LEN ARRAY_SIZE(e1000_gstrings_test)
 
-static int e1000_get_settings(struct net_device *netdev,
- struct ethtool_cmd *ecmd)
+static int e1000_get_link_ksettings(struct net_device *netdev,
+   struct ethtool_link_ksettings *cmd)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
+   u32 supported, advertising;
 
if (hw->media_type == e1000_media_type_copper) {
-   ecmd->supported = (SUPPORTED_10baseT_Half |
-  SUPPORTED_10baseT_Full |
-  SUPPORTED_100baseT_Half |
-  SUPPORTED_100baseT_Full |
-  SUPPORTED_1000baseT_Full|
-  SUPPORTED_Autoneg |
-  SUPPORTED_TP);
-   ecmd->advertising = ADVERTISED_TP;
+   supported = (SUPPORTED_10baseT_Half |
+SUPPORTED_10baseT_Full |
+SUPPORTED_100baseT_Half |
+SUPPORTED_100baseT_Full |
+SUPPORTED_1000baseT_Full|
+SUPPORTED_Autoneg |
+SUPPORTED_TP);
+   advertising = ADVERTISED_TP;
 
if (hw->autoneg == 1) {
-   ecmd->advertising |= ADVERTISED_Autoneg;
+   advertising |= ADVERTISED_Autoneg;
/* the e1000 autoneg seems to match ethtool nicely */
-   ecmd->advertising |= hw->autoneg_advertised;
+   advertising |= hw->autoneg_advertised;
}
 
-   ecmd->port = PORT_TP;
-   ecmd->phy_address = hw->phy_addr;
-
-   if (hw->mac_type == e1000_82543)
-   ecmd->transceiver = XCVR_EXTERNAL;
-   else
-   ecmd->transceiver = XCVR_INTERNAL;
-
+   cmd->base.port = PORT_TP;
+   cmd->base.phy_address = hw->phy_addr;
} else {
-   ecmd->supported   = (SUPPORTED_1000baseT_Full |
-SUPPORTED_FIBRE |
-SUPPORTED_Autoneg);
+   supported   = (SUPPORTED_1000baseT_Full |
+  SUPPORTED_FIBRE |
+  SUPPORTED_Autoneg);
 
-   ecmd->advertising = (ADVERTISED_1000baseT_Full |
-ADVERTISED_FIBRE |
-ADVERTISED_Autoneg);
+   advertising = (ADVERTISED_1000baseT_Full |
+  ADVERTISED_FIBRE |
+  ADVERTISED_Autoneg);
 
-   ecmd->port = PORT_FIBRE;
-
-   if (hw->mac_type >= e1000_82545)
-   ecmd->transceiver = XCVR_INTERNAL;
-   else
-   ecmd->transceiver = XCVR_EXTERNAL;
+   cmd->base.port = PORT_FIBRE;
}
 
if (er32(STATUS) & E1000_STATUS_LU) {
e1000_get_speed_and_duplex(hw, &adapter->link_speed,
   &adapter->link_duplex);
-   ethtool_cmd_speed_set(ecmd, adapter->link_speed);
+   cmd->base.speed = adapter->link_speed;
 
/* unfortunately FULL_DUPLEX != DUPLEX_FULL
 * and HALF_DUPLEX != DUPLEX_HALF
 */
if (adapter->link_duplex == FULL_DUPLEX)
-   ecmd->duplex = DUPLEX_FULL;
+   cmd->base.duplex = DUPLEX_FULL;
else
-   ecmd->duplex = DUPLEX_HALF;
+   cmd->base.duplex = DUPLEX_HALF;
} else {
-   ethtool_cmd_speed_set(ecmd, SPEED_UNKNOWN);
-   ecmd->duplex = DUPLEX_UNKNOWN;
+   cmd->base.speed = SPEED_UNKNOWN;
+   cmd->base.duplex = DUPLEX_UNKNOWN;
}
 
-   ecmd->autoneg = ((hw->media_type ==

Re: [PATCH v2] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-03-21 Thread Stephen Hemminger

On Tue, 21 Mar 2017 23:28:45 +0100
Linus Lüssing  wrote:

> However, the IP code drops it in the beginning of ip_input.c/ip_rcv()
> as the dnat target did not update the skb->pkt_type. If after
> dnat'ing the packet is now destined to us then the skb->pkt_type
> needs to be updated from PACKET_OTHERHOST to PACKET_HOST, too.

Why not fix DNAT netfilter module rather than hacking bridge code here?

Re: [PATCH net-next 1/4] drivers: net: xgene-v2: Add MDIO support

2017-03-21 Thread Iyappan Subramanian

On Tue, Mar 21, 2017 at 1:35 PM, Andrew Lunn  wrote:
>> @@ -511,9 +512,9 @@ static int xge_close(struct net_device *ndev)
>>  {
>>   struct xge_pdata *pdata = netdev_priv(ndev);
>>
>> - netif_carrier_off(ndev);
>>   netif_stop_queue(ndev);
>>   xge_mac_disable(pdata);
>> + phy_stop(ndev->phydev);
>>
>>   xge_intr_disable(pdata);
>>   xge_free_irq(ndev);
>> @@ -683,9 +684,14 @@ static int xge_probe(struct platform_device *pdev)
>>   if (ret)
>>   goto err;
>>
>> + spin_lock_init(&pdata->mdio_lock);
>> +
>
> ...
>
>> +static int xge_mdio_write(struct mii_bus *bus, int phy_id, int reg, u16 
>> data)
>> +{
>> + struct xge_pdata *pdata = bus->priv;
>> + u32 done, val = 0;
>> + u8 wait = 10;
>> + int ret = 0;
>> +
>> + spin_lock(&pdata->mdio_lock);
>> +
>> + SET_REG_BITS(&val, PHY_ADDR, phy_id);
>> + SET_REG_BITS(&val, REG_ADDR, reg);
>> + xge_wr_csr(pdata, MII_MGMT_ADDRESS, val);
>> +
>> + xge_wr_csr(pdata, MII_MGMT_CONTROL, data);
>> + do {
>> + usleep_range(5, 10);
>> + done = xge_rd_csr(pdata, MII_MGMT_INDICATORS);
>> + } while ((done & MII_MGMT_BUSY) && wait--);
>> +
>> + if (done & MII_MGMT_BUSY) {
>> + dev_err(>dev, "MII_MGMT write failed\n");
>> + ret = -ETIMEDOUT;
>> + }
>> +
>> + spin_unlock(&pdata->mdio_lock);
>> +
>> + return ret;
>> +}
>> +
>> +static int xge_mdio_read(struct mii_bus *bus, int phy_id, int reg)
>> +{
>> + struct xge_pdata *pdata = bus->priv;
>> + u32 data, done, val = 0;
>> + u8 wait = 10;
>> +
>> + spin_lock(&pdata->mdio_lock);
>> +
>
> Hi Iyappan
>
> Please could you explain what this lock is protecting which the
> mii_bus mdio_lock in mdio_bus.c is not protecting?

Hi Keyur,

Please could you explain what this lock is protecting which the
mii_bus mdio_lock in mdio_bus.c is not protecting?

I agree with him.  There is already a mutex on the mdio_bus, so the MDIO
bus read and write are locked and we don't need this lock.

Do you agree?


>
> Thanks
> Andrew
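
The point made in this exchange, that the bus layer already serializes accesses so the driver callback needs no lock of its own, can be sketched in userspace C.  This is an illustration under assumptions, not kernel code: struct bus_sketch, drv_read() and bus_read() are hypothetical names, and a pthread mutex plays the role of the bus-level mdio_lock.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a bus object with a per-bus lock. */
struct bus_sketch {
	pthread_mutex_t lock;	/* plays the role of the bus mdio_lock */
	int (*read)(struct bus_sketch *bus, int phy_id, int reg);
	uint16_t regs[32];	/* fake register file for the example */
};

/* Driver callback: no locking here, the caller already holds the lock. */
static int drv_read(struct bus_sketch *bus, int phy_id, int reg)
{
	(void)phy_id;
	return bus->regs[reg & 31];
}

/* Bus-layer entry point: takes the lock around every driver callback,
 * so two readers can never interleave their register accesses.
 */
static int bus_read(struct bus_sketch *bus, int phy_id, int reg)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->read(bus, phy_id, reg);
	pthread_mutex_unlock(&bus->lock);

	return ret;
}
```

Because every access goes through bus_read(), a second lock inside drv_read() would add cost without protecting anything further.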

[net-next 4/5] igbvf: use new API ethtool_{get|set}_link_ksettings

2017-03-21 Thread Jeff Kirsher

From: Philippe Reynes 

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igbvf/ethtool.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/ethtool.c 
b/drivers/net/ethernet/intel/igbvf/ethtool.c
index 8dea1b1367ef..34faa113a8a0 100644
--- a/drivers/net/ethernet/intel/igbvf/ethtool.c
+++ b/drivers/net/ethernet/intel/igbvf/ethtool.c
@@ -71,45 +71,45 @@ static const char igbvf_gstrings_test[][ETH_GSTRING_LEN] = {
 
 #define IGBVF_TEST_LEN ARRAY_SIZE(igbvf_gstrings_test)
 
-static int igbvf_get_settings(struct net_device *netdev,
- struct ethtool_cmd *ecmd)
+static int igbvf_get_link_ksettings(struct net_device *netdev,
+   struct ethtool_link_ksettings *cmd)
 {
struct igbvf_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
u32 status;
 
-   ecmd->supported   = SUPPORTED_1000baseT_Full;
+   ethtool_link_ksettings_zero_link_mode(cmd, supported);
+   ethtool_link_ksettings_add_link_mode(cmd, supported, 1000baseT_Full);
+   ethtool_link_ksettings_zero_link_mode(cmd, advertising);
+   ethtool_link_ksettings_add_link_mode(cmd, advertising, 1000baseT_Full);
 
-   ecmd->advertising = ADVERTISED_1000baseT_Full;
-
-   ecmd->port = -1;
-   ecmd->transceiver = XCVR_DUMMY1;
+   cmd->base.port = -1;
 
status = er32(STATUS);
if (status & E1000_STATUS_LU) {
if (status & E1000_STATUS_SPEED_1000)
-   ethtool_cmd_speed_set(ecmd, SPEED_1000);
+   cmd->base.speed = SPEED_1000;
else if (status & E1000_STATUS_SPEED_100)
-   ethtool_cmd_speed_set(ecmd, SPEED_100);
+   cmd->base.speed = SPEED_100;
else
-   ethtool_cmd_speed_set(ecmd, SPEED_10);
+   cmd->base.speed = SPEED_10;
 
if (status & E1000_STATUS_FD)
-   ecmd->duplex = DUPLEX_FULL;
+   cmd->base.duplex = DUPLEX_FULL;
else
-   ecmd->duplex = DUPLEX_HALF;
+   cmd->base.duplex = DUPLEX_HALF;
} else {
-   ethtool_cmd_speed_set(ecmd, SPEED_UNKNOWN);
-   ecmd->duplex = DUPLEX_UNKNOWN;
+   cmd->base.speed = SPEED_UNKNOWN;
+   cmd->base.duplex = DUPLEX_UNKNOWN;
}
 
-   ecmd->autoneg = AUTONEG_DISABLE;
+   cmd->base.autoneg = AUTONEG_DISABLE;
 
return 0;
 }
 
-static int igbvf_set_settings(struct net_device *netdev,
- struct ethtool_cmd *ecmd)
+static int igbvf_set_link_ksettings(struct net_device *netdev,
+   const struct ethtool_link_ksettings *cmd)
 {
return -EOPNOTSUPP;
 }
@@ -443,8 +443,6 @@ static void igbvf_get_strings(struct net_device *netdev, u32 stringset,
 }
 
 static const struct ethtool_ops igbvf_ethtool_ops = {
-   .get_settings   = igbvf_get_settings,
-   .set_settings   = igbvf_set_settings,
.get_drvinfo= igbvf_get_drvinfo,
.get_regs_len   = igbvf_get_regs_len,
.get_regs   = igbvf_get_regs,
@@ -467,6 +465,8 @@ static const struct ethtool_ops igbvf_ethtool_ops = {
.get_ethtool_stats  = igbvf_get_ethtool_stats,
.get_coalesce   = igbvf_get_coalesce,
.set_coalesce   = igbvf_set_coalesce,
+   .get_link_ksettings = igbvf_get_link_ksettings,
+   .set_link_ksettings = igbvf_set_link_ksettings,
 };
 
 void igbvf_set_ethtool_ops(struct net_device *netdev)
-- 
2.12.0

[net-next 5/5] ixgb: use new API ethtool_{get|set}_link_ksettings

2017-03-21 Thread Jeff Kirsher

From: Philippe Reynes 

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c | 39 +++---
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c b/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
index e5d72559cca9..d10a0d242dda 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c
@@ -94,24 +94,30 @@ static struct ixgb_stats ixgb_gstrings_stats[] = {
 #define IXGB_STATS_LEN ARRAY_SIZE(ixgb_gstrings_stats)
 
 static int
-ixgb_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
+ixgb_get_link_ksettings(struct net_device *netdev,
+   struct ethtool_link_ksettings *cmd)
 {
struct ixgb_adapter *adapter = netdev_priv(netdev);
 
-   ecmd->supported = (SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE);
-   ecmd->advertising = (ADVERTISED_10000baseT_Full | ADVERTISED_FIBRE);
-   ecmd->port = PORT_FIBRE;
-   ecmd->transceiver = XCVR_EXTERNAL;
+   ethtool_link_ksettings_zero_link_mode(cmd, supported);
+   ethtool_link_ksettings_add_link_mode(cmd, supported, 10000baseT_Full);
+   ethtool_link_ksettings_add_link_mode(cmd, supported, FIBRE);
+
+   ethtool_link_ksettings_zero_link_mode(cmd, advertising);
+   ethtool_link_ksettings_add_link_mode(cmd, advertising, 10000baseT_Full);
+   ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
+
+   cmd->base.port = PORT_FIBRE;
 
if (netif_carrier_ok(adapter->netdev)) {
-   ethtool_cmd_speed_set(ecmd, SPEED_10000);
-   ecmd->duplex = DUPLEX_FULL;
+   cmd->base.speed = SPEED_10000;
+   cmd->base.duplex = DUPLEX_FULL;
} else {
-   ethtool_cmd_speed_set(ecmd, SPEED_UNKNOWN);
-   ecmd->duplex = DUPLEX_UNKNOWN;
+   cmd->base.speed = SPEED_UNKNOWN;
+   cmd->base.duplex = DUPLEX_UNKNOWN;
}
 
-   ecmd->autoneg = AUTONEG_DISABLE;
+   cmd->base.autoneg = AUTONEG_DISABLE;
return 0;
 }
 
@@ -126,13 +132,14 @@ void ixgb_set_speed_duplex(struct net_device *netdev)
 }
 
 static int
-ixgb_set_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
+ixgb_set_link_ksettings(struct net_device *netdev,
+   const struct ethtool_link_ksettings *cmd)
 {
struct ixgb_adapter *adapter = netdev_priv(netdev);
-   u32 speed = ethtool_cmd_speed(ecmd);
+   u32 speed = cmd->base.speed;
 
-   if (ecmd->autoneg == AUTONEG_ENABLE ||
-   (speed + ecmd->duplex != SPEED_10000 + DUPLEX_FULL))
+   if (cmd->base.autoneg == AUTONEG_ENABLE ||
+   (speed + cmd->base.duplex != SPEED_10000 + DUPLEX_FULL))
return -EINVAL;
 
if (netif_running(adapter->netdev)) {
@@ -630,8 +637,6 @@ ixgb_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 }
 
 static const struct ethtool_ops ixgb_ethtool_ops = {
-   .get_settings = ixgb_get_settings,
-   .set_settings = ixgb_set_settings,
.get_drvinfo = ixgb_get_drvinfo,
.get_regs_len = ixgb_get_regs_len,
.get_regs = ixgb_get_regs,
@@ -649,6 +654,8 @@ static const struct ethtool_ops ixgb_ethtool_ops = {
.set_phys_id = ixgb_set_phys_id,
.get_sset_count = ixgb_get_sset_count,
.get_ethtool_stats = ixgb_get_ethtool_stats,
+   .get_link_ksettings = ixgb_get_link_ksettings,
+   .set_link_ksettings = ixgb_set_link_ksettings,
 };
 
 void ixgb_set_ethtool_ops(struct net_device *netdev)
-- 
2.12.0

[net-next 0/5][pull request] Intel Wired LAN Driver Updates 2017-03-21

2017-03-21 Thread Jeff Kirsher

This series contains updates to e1000, e1000e, igb, igbvf and ixgb.

This finishes up the work Philippe Reynes did to update the Intel drivers
to the new API for ethtool (get|set)_link_ksettings.

The following are changes since commit b3407c8e5eb78e4e0b57a97a4dd2e411354b60cd:
  Cleanup some warning from timestamping code.
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 1GbE

Philippe Reynes (5):
  e1000: use new API ethtool_{get|set}_link_ksettings
  e1000e: use new API ethtool_{get|set}_link_ksettings
  igb: use new API ethtool_{get|set}_link_ksettings
  igbvf: use new API ethtool_{get|set}_link_ksettings
  ixgb: use new API ethtool_{get|set}_link_ksettings

 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 117 ++-
 drivers/net/ethernet/intel/e1000e/ethtool.c  | 111 +-
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 138 ---
 drivers/net/ethernet/intel/igbvf/ethtool.c   |  38 +++
 drivers/net/ethernet/intel/ixgb/ixgb_ethtool.c   |  39 ---
 5 files changed, 232 insertions(+), 211 deletions(-)

-- 
2.12.0
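
For readers following the conversion in this series: the legacy
ethtool_cmd carried "supported" and "advertising" as 32-bit masks, while
ethtool_link_ksettings stores them as bitmaps indexed by link-mode bit
number. The user-space sketch below only models that difference. Every
name in it (toy_ksettings, toy_add_link_mode, the TOY_* constants and
their bit numbers) is hypothetical and is not the kernel API; the last
helper is merely analogous in spirit to the kernel's
ethtool_convert_legacy_u32_to_link_mode().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical link-mode bit numbers, for illustration only. */
enum toy_link_mode {
	TOY_MODE_1000baseT_Full = 5,
	TOY_MODE_FIBRE = 10,
	TOY_MODE_HIGH = 40,	/* a mode that cannot fit in a 32-bit mask */
	TOY_MODE_NBITS = 64,
};

#define TOY_WORD_BITS 32
#define TOY_NWORDS (TOY_MODE_NBITS / TOY_WORD_BITS)

/* Toy stand-in for the bitmap part of the new settings structure. */
struct toy_ksettings {
	uint32_t supported[TOY_NWORDS];
	uint32_t advertising[TOY_NWORDS];
};

/* Clear every bit of one link-mode bitmap. */
static void toy_zero_link_mode(uint32_t *mask)
{
	memset(mask, 0, sizeof(uint32_t) * TOY_NWORDS);
}

/* Set one link-mode bit, selecting the word from the bit number. */
static void toy_add_link_mode(uint32_t *mask, enum toy_link_mode bit)
{
	assert(bit < TOY_MODE_NBITS);
	mask[bit / TOY_WORD_BITS] |= (uint32_t)1 << (bit % TOY_WORD_BITS);
}

/* Return 1 if the link-mode bit is set, 0 otherwise. */
static int toy_test_link_mode(const uint32_t *mask, enum toy_link_mode bit)
{
	assert(bit < TOY_MODE_NBITS);
	return (mask[bit / TOY_WORD_BITS] >> (bit % TOY_WORD_BITS)) & 1;
}

/* A legacy 32-bit mask maps onto the lowest word of the bitmap. */
static void toy_legacy_to_link_mode(uint32_t *mask, uint32_t legacy)
{
	toy_zero_link_mode(mask);
	mask[0] = legacy;
}
```

A caller would zero a bitmap, add the modes it supports one by one, and
only fall back to the legacy conversion when it still builds its masks
as 32-bit values, which is the pattern the igb and e1000e patches in
this series appear to follow.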

Re: [PATCH 01/11] net: usb: usbnet: add new api ethtool_{get|set}_link_ksettings

2017-03-21 Thread David Miller

From: Oliver Neukum 
Date: Tue, 21 Mar 2017 12:33:03 +0100

> Am Donnerstag, den 16.03.2017, 23:18 +0100 schrieb Philippe Reynes:
>> The ethtool api {get|set}_settings is deprecated.
>> We add the new api {get|set}_link_ksettings to this driver.
>> 
>> As I don't have the hardware, I'd be very pleased if
>> someone may test this patch.
>> 
> 
> Unfortunately I lack hardware to test.
> Philippe and I had a patch collision. David, please take his
> patch for the next merge window. It looks good and is comprehensive and
> nobody has reported issues. We will never find testers for all those
> drivers on this list.

Ok.

[net-next 3/5] igb: use new API ethtool_{get|set}_link_ksettings

2017-03-21 Thread Jeff Kirsher

From: Philippe Reynes 

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 138 ++-
 1 file changed, 73 insertions(+), 65 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 797b9daba224..0efb62db6efd 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -151,7 +151,8 @@ static const char igb_priv_flags_strings[][ETH_GSTRING_LEN] = {
 
 #define IGB_PRIV_FLAGS_STR_LEN ARRAY_SIZE(igb_priv_flags_strings)
 
-static int igb_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
+static int igb_get_link_ksettings(struct net_device *netdev,
+ struct ethtool_link_ksettings *cmd)
 {
struct igb_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
@@ -159,76 +160,73 @@ static int igb_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
struct e1000_sfp_flags *eth_flags = &dev_spec->eth_flags;
u32 status;
u32 speed;
+   u32 supported, advertising;
 
status = rd32(E1000_STATUS);
if (hw->phy.media_type == e1000_media_type_copper) {
 
-   ecmd->supported = (SUPPORTED_10baseT_Half |
-  SUPPORTED_10baseT_Full |
-  SUPPORTED_100baseT_Half |
-  SUPPORTED_100baseT_Full |
-  SUPPORTED_1000baseT_Full|
-  SUPPORTED_Autoneg |
-  SUPPORTED_TP |
-  SUPPORTED_Pause);
-   ecmd->advertising = ADVERTISED_TP;
+   supported = (SUPPORTED_10baseT_Half |
+SUPPORTED_10baseT_Full |
+SUPPORTED_100baseT_Half |
+SUPPORTED_100baseT_Full |
+SUPPORTED_1000baseT_Full|
+SUPPORTED_Autoneg |
+SUPPORTED_TP |
+SUPPORTED_Pause);
+   advertising = ADVERTISED_TP;
 
if (hw->mac.autoneg == 1) {
-   ecmd->advertising |= ADVERTISED_Autoneg;
+   advertising |= ADVERTISED_Autoneg;
/* the e1000 autoneg seems to match ethtool nicely */
-   ecmd->advertising |= hw->phy.autoneg_advertised;
+   advertising |= hw->phy.autoneg_advertised;
}
 
-   ecmd->port = PORT_TP;
-   ecmd->phy_address = hw->phy.addr;
-   ecmd->transceiver = XCVR_INTERNAL;
+   cmd->base.port = PORT_TP;
+   cmd->base.phy_address = hw->phy.addr;
} else {
-   ecmd->supported = (SUPPORTED_FIBRE |
-  SUPPORTED_1000baseKX_Full |
-  SUPPORTED_Autoneg |
-  SUPPORTED_Pause);
-   ecmd->advertising = (ADVERTISED_FIBRE |
-ADVERTISED_1000baseKX_Full);
+   supported = (SUPPORTED_FIBRE |
+SUPPORTED_1000baseKX_Full |
+SUPPORTED_Autoneg |
+SUPPORTED_Pause);
+   advertising = (ADVERTISED_FIBRE |
+  ADVERTISED_1000baseKX_Full);
if (hw->mac.type == e1000_i354) {
if ((hw->device_id ==
 E1000_DEV_ID_I354_BACKPLANE_2_5GBPS) &&
!(status & E1000_STATUS_2P5_SKU_OVER)) {
-   ecmd->supported |= SUPPORTED_2500baseX_Full;
-   ecmd->supported &=
-   ~SUPPORTED_1000baseKX_Full;
-   ecmd->advertising |= ADVERTISED_2500baseX_Full;
-   ecmd->advertising &=
-   ~ADVERTISED_1000baseKX_Full;
+   supported |= SUPPORTED_2500baseX_Full;
+   supported &= ~SUPPORTED_1000baseKX_Full;
+   advertising |= ADVERTISED_2500baseX_Full;
+   advertising &= ~ADVERTISED_1000baseKX_Full;
}
}
if (eth_flags->e100_base_fx) {
-   ecmd->supported |= SUPPORTED_100baseT_Full;
-

[net-next 2/5] e1000e: use new API ethtool_{get|set}_link_ksettings

2017-03-21 Thread Jeff Kirsher

From: Philippe Reynes 

The ethtool API {get|set}_settings is deprecated.
We move this driver to new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000e/ethtool.c | 111 +++-
 1 file changed, 59 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 7aff68a4a4df..e70b1ebff60d 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -117,55 +117,52 @@ static const char e1000_gstrings_test[][ETH_GSTRING_LEN] = {
 
 #define E1000_TEST_LEN ARRAY_SIZE(e1000_gstrings_test)
 
-static int e1000_get_settings(struct net_device *netdev,
- struct ethtool_cmd *ecmd)
+static int e1000_get_link_ksettings(struct net_device *netdev,
+   struct ethtool_link_ksettings *cmd)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
-   u32 speed;
+   u32 speed, supported, advertising;
 
if (hw->phy.media_type == e1000_media_type_copper) {
-   ecmd->supported = (SUPPORTED_10baseT_Half |
-  SUPPORTED_10baseT_Full |
-  SUPPORTED_100baseT_Half |
-  SUPPORTED_100baseT_Full |
-  SUPPORTED_1000baseT_Full |
-  SUPPORTED_Autoneg |
-  SUPPORTED_TP);
+   supported = (SUPPORTED_10baseT_Half |
+SUPPORTED_10baseT_Full |
+SUPPORTED_100baseT_Half |
+SUPPORTED_100baseT_Full |
+SUPPORTED_1000baseT_Full |
+SUPPORTED_Autoneg |
+SUPPORTED_TP);
if (hw->phy.type == e1000_phy_ife)
-   ecmd->supported &= ~SUPPORTED_1000baseT_Full;
-   ecmd->advertising = ADVERTISED_TP;
+   supported &= ~SUPPORTED_1000baseT_Full;
+   advertising = ADVERTISED_TP;
 
if (hw->mac.autoneg == 1) {
-   ecmd->advertising |= ADVERTISED_Autoneg;
+   advertising |= ADVERTISED_Autoneg;
/* the e1000 autoneg seems to match ethtool nicely */
-   ecmd->advertising |= hw->phy.autoneg_advertised;
+   advertising |= hw->phy.autoneg_advertised;
}
 
-   ecmd->port = PORT_TP;
-   ecmd->phy_address = hw->phy.addr;
-   ecmd->transceiver = XCVR_INTERNAL;
-
+   cmd->base.port = PORT_TP;
+   cmd->base.phy_address = hw->phy.addr;
} else {
-   ecmd->supported   = (SUPPORTED_1000baseT_Full |
-SUPPORTED_FIBRE |
-SUPPORTED_Autoneg);
+   supported   = (SUPPORTED_1000baseT_Full |
+  SUPPORTED_FIBRE |
+  SUPPORTED_Autoneg);
 
-   ecmd->advertising = (ADVERTISED_1000baseT_Full |
-ADVERTISED_FIBRE |
-ADVERTISED_Autoneg);
+   advertising = (ADVERTISED_1000baseT_Full |
+  ADVERTISED_FIBRE |
+  ADVERTISED_Autoneg);
 
-   ecmd->port = PORT_FIBRE;
-   ecmd->transceiver = XCVR_EXTERNAL;
+   cmd->base.port = PORT_FIBRE;
}
 
speed = SPEED_UNKNOWN;
-   ecmd->duplex = DUPLEX_UNKNOWN;
+   cmd->base.duplex = DUPLEX_UNKNOWN;
 
if (netif_running(netdev)) {
if (netif_carrier_ok(netdev)) {
speed = adapter->link_speed;
-   ecmd->duplex = adapter->link_duplex - 1;
+   cmd->base.duplex = adapter->link_duplex - 1;
}
} else if (!pm_runtime_suspended(netdev->dev.parent)) {
u32 status = er32(STATUS);
@@ -179,30 +176,36 @@ static int e1000_get_settings(struct net_device *netdev,
speed = SPEED_10;
 
if (status & E1000_STATUS_FD)
-   ecmd->duplex = DUPLEX_FULL;
+   cmd->base.duplex = DUPLEX_FULL;
else
-   ecmd->duplex = DUPLEX_HALF;
+   cmd->base.duplex = DUPLEX_HALF;
}
}
 
-   ethtool_cmd_speed_set(ecmd, speed);
-

Re: [PATCH 00/11] net: usbnet: move to new api ethtool_{get|set}_link_ksettings

2017-03-21 Thread David Miller

From: Philippe Reynes 
Date: Thu, 16 Mar 2017 23:18:46 +0100

> The ethtool api {get|set}_settings is deprecated. On usbnet, it
> was often implemented with usbnet_{get|set}_settings.
> 
> In this series, I add usbnet_{get|set}_link_ksettings
> in the first patch, then I update all the driver to
> use this new api, and in the last patch I remove the
> old api usbnet_{get|set}_settings.

Series applied, thanks Philippe.

Re: [PATCH net-next v2] net: Add sysctl to toggle early demux for tcp and udp

2017-03-21 Thread Tom Herbert

On Sat, Mar 18, 2017 at 7:07 PM, Subash Abhinov Kasiviswanathan
 wrote:
>> Less than 1% performance improvement in a benchmark doesn't justify
>> the complexity of the patch. Eric's hypothesis was that an unconnected
>> UDP socket may show issues because of cache misses in look-ups due to
>> so many different sources. This should be fairly easy to benchmark by
>> randomly setting source address in your test (IP any and routing may
>> need to be set appropriately).
>>
>
> With different source addresses, a larger increase is seen here
> (633->654Mbps).
>
Thanks for running the tests. It's obviously not a huge win, at least
relative to the performance improvement we got from early demux, but I
suppose with very specific and engineered loads this might have value.
Please include this in the next patch set.

Generally, I think a good goal moving forward would be to apply the
0 or 1 times rule for connection lookup. That is, for any transport
tuple in a received packet we want to do at most one connection lookup.
So early demux would need to apply to unconnected sockets, and then we
wouldn't have to do the second lookup in UDP (or TCP for a SYN)
receive (note we also do an extra lookup for GRO with UDP
encapsulation). A reason we haven't done this before might be that the
lookup may actually find the wrong socket (for example, we go into a
different network namespace). Maybe the stack should consider any
lookup result outside of the protocol stack to be provisional (and it
would be super nice if we could somehow cache a dst with an
unconnected socket also ;-) )

Tom

>
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project

Re: [PATCH net] net: bcmgenet: remove bcmgenet_internal_phy_setup()

2017-03-21 Thread Florian Fainelli

On 03/21/2017 02:01 PM, Doug Berger wrote:
> Commit 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset")
> removed the bcmgenet_mii_reset() function from bcmgenet_power_up() and
> bcmgenet_internal_phy_setup() functions.  In so doing it broke the reset
> of the internal PHY devices used by the GENETv1-GENETv3 which required
> this reset before the UniMAC was enabled.  It also broke the internal
> GPHY devices used by the GENETv4 because the config_init that installed
> the AFE workaround was no longer occurring after the reset of the GPHY
> performed by bcmgenet_phy_power_set() in bcmgenet_internal_phy_setup().
> In addition the code in bcmgenet_internal_phy_setup() related to the
> "enable APD" comment goes with the bcmgenet_mii_reset() so it should
> have also been removed.
> 
> Commit bd4060a6108b ("net: bcmgenet: Power on integrated GPHY in
> bcmgenet_power_up()") moved the bcmgenet_phy_power_set() call to the
> bcmgenet_power_up() function, but failed to remove it from the
> bcmgenet_internal_phy_setup() function.  Had it done so, the
> bcmgenet_internal_phy_setup() function would have been empty and could
> have been removed at that time.
> 
> Commit 5dbebbb44a6a ("net: bcmgenet: Software reset EPHY after power on")
> was submitted to correct the functional problems introduced by
> commit 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset"). It
> was included in v4.4 and made available on 4.3-stable. Unfortunately,
> it didn't fully revert the commit because this bcmgenet_mii_reset()
> doesn't apply the soft reset to the internal GPHY used by GENETv4 like
> the previous one did. This prevents the restoration of the AFE work-
> arounds for internal GPHY devices after the bcmgenet_phy_power_set() in
> bcmgenet_internal_phy_setup().
> 
> This commit takes the alternate approach of removing the unnecessary
> bcmgenet_internal_phy_setup() function which shouldn't have been in v4.3
> so that when bcmgenet_mii_reset() was restored it should have only gone
> into bcmgenet_power_up().  This will avoid the problems while also
> removing the redundancy (and hopefully some of the confusion).
> 
> Fixes: 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset")
> Signed-off-by: Doug Berger 

Too bad the commit message award of the year was already won [1],
because you are definitely on the short list for that one above.

Reviewed-by: Florian Fainelli 

> ---

[1]: https://lkml.org/lkml/2017/3/10/1583
-- 
Florian

Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-21 Thread David Miller

From: Eric Dumazet 
Date: Tue, 21 Mar 2017 14:23:09 -0700

> It looks like our suggestion to get kernel builds with atomic_inc()
> being exactly an atomic_inc() is not even discussed or implemented.
> 
> Coding this would require less time than running a typical Google kernel
> qualification (roughly one month, thousands of hosts..., days of SWE).

+1

Re: [PATCH] net: unix: properly re-increment inflight counter of GC discarded candidates

2017-03-21 Thread David Miller

From: Andrey Ulanov 
Date: Tue, 14 Mar 2017 20:16:42 -0700

> Dmitry has reported that a BUG_ON() condition in unix_notinflight()
> may be triggered by a simple code that forwards unix socket in an
> SCM_RIGHTS message.
> That is caused by incorrect unix socket GC implementation in unix_gc().
> 
> The GC first collects list of candidates, then (a) decrements their
> "children's" inflight counter, (b) checks which inflight counters are
> now 0, and then (c) increments all inflight counters back.
> (a) and (c) are done by calling scan_children() with inc_inflight or
> dec_inflight as the second argument.
> 
> Commit 6209344f5a37 ("net: unix: fix inflight counting bug in garbage
> collector") changed scan_children() such that it no longer considers
> sockets that do not have UNIX_GC_CANDIDATE flag. It also added a block
> of code that unsets this flag _before_ invoking
> scan_children(, dec_inflight, ). This may lead to incorrect inflight
> counters for some sockets.
> 
> This change fixes this bug by changing order of operations:
> UNIX_GC_CANDIDATE is now unset only after all inflight counters are
> restored to the original state.
> 
>   kernel BUG at net/unix/garbage.c:149!
>   RIP: 0010:[]  []
>   unix_notinflight+0x3b4/0x490 net/unix/garbage.c:149
>   Call Trace:
>[] unix_detach_fds.isra.19+0xff/0x170 
> net/unix/af_unix.c:1487
>[] unix_destruct_scm+0xf9/0x210 net/unix/af_unix.c:1496
>[] skb_release_head_state+0x101/0x200 
> net/core/skbuff.c:655
>[] skb_release_all+0x1a/0x60 net/core/skbuff.c:668
>[] __kfree_skb+0x1a/0x30 net/core/skbuff.c:684
>[] kfree_skb+0x184/0x570 net/core/skbuff.c:705
>[] unix_release_sock+0x5b5/0xbd0 net/unix/af_unix.c:559
>[] unix_release+0x49/0x90 net/unix/af_unix.c:836
>[] sock_release+0x92/0x1f0 net/socket.c:570
>[] sock_close+0x1b/0x20 net/socket.c:1017
>[] __fput+0x34e/0x910 fs/file_table.c:208
>[] fput+0x1a/0x20 fs/file_table.c:244
>[] task_work_run+0x1a0/0x280 kernel/task_work.c:116
>[< inline >] exit_task_work include/linux/task_work.h:21
>[] do_exit+0x183a/0x2640 kernel/exit.c:828
>[] do_group_exit+0x14e/0x420 kernel/exit.c:931
>[] get_signal+0x663/0x1880 kernel/signal.c:2307
>[] do_signal+0xc5/0x2190 arch/x86/kernel/signal.c:807
>[] exit_to_usermode_loop+0x1ea/0x2d0
>   arch/x86/entry/common.c:156
>[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
>[] syscall_return_slowpath+0x4d3/0x570
>   arch/x86/entry/common.c:259
>[] entry_SYSCALL_64_fastpath+0xc4/0xc6
> 
> Link: https://lkml.org/lkml/2017/3/6/252
> Signed-off-by: Andrey Ulanov 
> Reported-by: Dmitry Vyukov 
> Fixes: 6209344 ("net: unix: fix inflight counting bug in garbage collector")

Applied and queued up for -stable, thanks.

Re: [PATCH net-next] net/8021q: create device with all possible features in wanted_features

2017-03-21 Thread David Miller

From: Andrei Vagin 
Date: Wed, 15 Mar 2017 17:41:14 -0700

> wanted_features is the set of features which have to be enabled if the
> hardware allows it.
> 
> Currently when a vlan device is created, its wanted_features is set to
> current features of its base device.
> 
> The problem is that the base device can get new features and they are
> not propagated to vlan-s of this device.
> 
> If we look at bonding devices, they don't have this problem, and this
> patch fixes the issue in the same way it works for bonding
> devices.
> 
> We hit this problem when we try to create a vlan device over a bonding
> device. When a system is booting, real devices require time to be
> initialized, so bonding devices are created without slaves, then vlan
> devices are created, and only then are ethernet devices added to the
> bonding device. As a result we have vlan devices with disabled
> scatter-gather.
> 
> * create a bonding device
>   $ ip link add bond0 type bond
>   $ ethtool -k bond0 | grep scatter
>   scatter-gather: off
>   tx-scatter-gather: off [requested on]
>   tx-scatter-gather-fraglist: off [requested on]
> 
> * create a vlan device
>   $ ip link add link bond0 name bond0.10 type vlan id 10
>   $ ethtool -k bond0.10 | grep scatter
>   scatter-gather: off
>   tx-scatter-gather: off
>   tx-scatter-gather-fraglist: off
> 
> * Add a slave device to bond0
>   $ ip link set dev eth0 master bond0
> 
> And now we can see that the bond0 device has got the scatter-gather
> feature, but the bond0.10 hasn't got it.
> [root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter
> scatter-gather: on
>   tx-scatter-gather: on
>   tx-scatter-gather-fraglist: on
> [root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter
> scatter-gather: off
>   tx-scatter-gather: off
>   tx-scatter-gather-fraglist: off
> 
> With this patch the vlan device will get all new features from the
> bonding device.
> 
> Here is a call trace how features which are set in this patch reach
> dev->wanted_features.
> 
> register_netdevice
>vlan_dev_init
>   ...
>   dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG |
>  NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE |
>  NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC |
>  NETIF_F_ALL_FCOE;
> 
>   dev->features |= dev->hw_features;
>   ...
> dev->wanted_features = dev->features & dev->hw_features;
> __netdev_update_features(dev);
> vlan_dev_fix_features
>  ...
> 
> Signed-off-by: Andrei Vagin 

Applied, thanks.
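
The effect described in the quoted message can be modelled in a few
lines of user-space C. This is a deliberately simplified sketch, not the
kernel's feature negotiation: struct toy_dev, the TOY_F_* flags and all
toy_* functions are hypothetical, and the recomputation rule (wanted AND
hw AND lower-device features) is an assumption that stands in for
netdev_update_features() plus vlan_dev_fix_features().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_SG	(1u << 0)
#define TOY_F_HW_CSUM	(1u << 1)
#define TOY_F_HIGHDMA	(1u << 2)
#define TOY_F_VLAN_ALL	(TOY_F_SG | TOY_F_HW_CSUM | TOY_F_HIGHDMA)

struct toy_dev {
	uint32_t hw_features;		/* what the device could ever enable */
	uint32_t wanted_features;	/* what should be on when possible */
	uint32_t features;		/* what is on right now */
};

/*
 * Simplified recomputation: a stacked device enables only what it wants,
 * what it can support, and what its lower device currently offers.
 */
static void toy_update_features(struct toy_dev *upper,
				const struct toy_dev *lower)
{
	assert(upper != NULL && lower != NULL);
	upper->features = upper->wanted_features & upper->hw_features &
			  lower->features;
}

/* Old behaviour: snapshot the lower device's features at creation time. */
static void toy_vlan_init_snapshot(struct toy_dev *vlan,
				   const struct toy_dev *lower)
{
	vlan->hw_features = TOY_F_VLAN_ALL;
	vlan->wanted_features = lower->features;
	toy_update_features(vlan, lower);
}

/* Patched behaviour: want everything the vlan device could support. */
static void toy_vlan_init_all(struct toy_dev *vlan,
			      const struct toy_dev *lower)
{
	vlan->hw_features = TOY_F_VLAN_ALL;
	vlan->wanted_features = vlan->hw_features;
	toy_update_features(vlan, lower);
}
```

With the snapshot variant, a vlan created over a bond that has no slaves
yet keeps scatter-gather off even after the bond gains it, because the
feature was never in wanted_features. With the second variant the
feature appears as soon as the lower device offers it.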

Re: [PATCH] tun: fix inability to set offloads after disabling them via ethtool

2017-03-21 Thread David Miller

From: Yaroslav Isakov 
Date: Thu, 16 Mar 2017 22:44:10 +0300

> Add the missing logic to the tun driver; without it, apps cannot set
> offloads using the tun ioctl if offloads were previously disabled via ethtool
> 
> Signed-off-by: Yaroslav Isakov 

Applied, thanks.

Re: [PATCH net-next] net: bcmgenet: Track per TX/RX rings statistics

2017-03-21 Thread David Miller

From: Florian Fainelli 
Date: Thu, 16 Mar 2017 10:27:08 -0700

> __bcmgenet_tx_reclaim() is currently summing TX bytes/packets in a way
> that is not SMP friendly: multiple CPUs could run
> __bcmgenet_tx_reclaim() independently and still update stats->tx_bytes
> and stats->tx_packets, clobbering the other CPUs' statistics.
> 
> Fix this by tracking per RX and TX rings the number of bytes, packets,
> dropped and errors statistics, and provide a bcmgenet_get_stats()
> function which aggregates everything and returns a consistent output.
> 
> Signed-off-by: Florian Fainelli 

Applied, thanks.
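
The per-ring accounting idea in the quoted message can be sketched as
follows. This is a single-threaded user-space model under the assumption
that each ring's counters are only ever updated from that ring's own
reclaim context; the toy_* names and the ring count are hypothetical and
do not come from the bcmgenet driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_TX_RINGS 4

struct toy_ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct toy_priv {
	struct toy_ring_stats tx_rings[TOY_NUM_TX_RINGS];
};

struct toy_totals {
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/* Reset all per-ring counters. */
static void toy_priv_init(struct toy_priv *priv)
{
	size_t i;

	for (i = 0; i < TOY_NUM_TX_RINGS; i++) {
		priv->tx_rings[i].packets = 0;
		priv->tx_rings[i].bytes = 0;
	}
}

/* Reclaim path of one ring: touches only that ring's counters. */
static void toy_tx_reclaim(struct toy_priv *priv, size_t ring,
			   uint64_t pkts, uint64_t bytes)
{
	assert(ring < TOY_NUM_TX_RINGS);
	priv->tx_rings[ring].packets += pkts;
	priv->tx_rings[ring].bytes += bytes;
}

/* Aggregate the per-ring counters into device-wide totals on demand. */
static struct toy_totals toy_get_stats(const struct toy_priv *priv)
{
	struct toy_totals t = { 0, 0 };
	size_t i;

	for (i = 0; i < TOY_NUM_TX_RINGS; i++) {
		t.tx_packets += priv->tx_rings[i].packets;
		t.tx_bytes += priv->tx_rings[i].bytes;
	}
	return t;
}
```

Because no two rings share a counter, concurrent reclaim on different
rings cannot overwrite each other's updates; the cost moves to the
reader, which sums the rings when statistics are requested.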

Re: [PATCH net-next v4] net: ipv4: add support for ECMP hash policy choice

2017-03-21 Thread David Miller

From: Nikolay Aleksandrov 
Date: Thu, 16 Mar 2017 15:28:00 +0200

> This patch adds support for ECMP hash policy choice via a new sysctl
> called fib_multipath_hash_policy and also adds support for L4 hashes.
> The current values for fib_multipath_hash_policy are:
>  0 - layer 3 (default)
>  1 - layer 4
> If there's an skb hash already set and it matches the chosen policy then it
> will be used instead of being calculated (currently only for L4).
> In L3 mode we always calculate the hash due to the ICMP error special
> case, the flow dissector's field consistentification should handle the
> address order thus we can remove the address reversals.
> If the skb is provided we always use it for the hash calculation,
> otherwise we fall back to fl4; that is, if skb is NULL, fl4 has to be set.
> 
> Signed-off-by: Nikolay Aleksandrov 

Applied, thanks Nikolay.
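
To make the two policy values concrete, here is a small user-space
sketch of a policy-dependent flow hash. It is not the kernel's hash:
the mixing function, struct toy_flow and every toy_* name are
hypothetical, and only the selection of input fields (addresses for
layer 3; addresses, ports and protocol for layer 4) follows the quoted
description.

```c
#include <assert.h>
#include <stdint.h>

enum toy_hash_policy {
	TOY_HASH_L3 = 0,	/* addresses only */
	TOY_HASH_L4 = 1,	/* addresses, ports and protocol */
};

struct toy_flow {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
	uint8_t proto;
};

static uint32_t toy_multipath_hash(enum toy_hash_policy policy,
				   const struct toy_flow *fl)
{
	/* Layer 3 input: source address and rotated destination address. */
	uint32_t h = fl->saddr ^ ((fl->daddr << 16) | (fl->daddr >> 16));

	/* Layer 4 additionally folds in ports and protocol. */
	if (policy == TOY_HASH_L4) {
		h ^= ((uint32_t)fl->sport << 16) | fl->dport;
		h ^= fl->proto;
	}

	/* Both steps are invertible, so distinct pre-mix values stay distinct. */
	h *= 0x9e3779b1u;
	h ^= h >> 16;
	return h;
}

/* Map a hash onto one of nhops equal-cost next hops. */
static unsigned int toy_select_nexthop(uint32_t hash, unsigned int nhops)
{
	assert(nhops > 0);
	return hash % nhops;
}
```

Under the layer 3 policy, two flows between the same hosts always land
on the same next hop regardless of ports; under the layer 4 policy they
can be spread across paths.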

[PATCH v2] bridge: ebtables: fix reception of frames DNAT-ed to bridge device

2017-03-21 Thread Linus Lüssing

When trying to redirect bridged frames to the bridge device itself
via the ebtables nat-prerouting chain and the dnat target then this
currently fails:

The ethernet destination of the frame is dnat'ed to the MAC address of
the bridge itself just fine and the correctly altered frame can even
be captured via a tcpdump on br0 (with or without promisc mode).

However, the IP code drops it in the beginning of ip_input.c/ip_rcv()
as the dnat target did not update the skb->pkt_type. If after
dnat'ing the packet is now destined to us then the skb->pkt_type
needs to be updated from PACKET_OTHERHOST to PACKET_HOST, too.

Signed-off-by: Linus Lüssing 

---
Changelog v2:
* refrain from altering pkt_type for multicast packets
  with a unicast destination MAC
---
 net/bridge/br_input.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 013f2290b..fd7bc4c 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -198,8 +198,13 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
if (dst) {
unsigned long now = jiffies;
 
-   if (dst->is_local)
+   if (dst->is_local) {
+   /* fix up potential DNAT inconsistencies */
+   if (skb->pkt_type == PACKET_OTHERHOST)
+   skb->pkt_type = PACKET_HOST;
+
return br_pass_frame_up(skb);
+   }
 
if (now != dst->used)
dst->used = now;
-- 
2.1.4
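
The decision made in the hunk above can be isolated in a stand-alone
sketch. The enum, struct toy_skb and toy_deliver_local() are
hypothetical stand-ins for the kernel types and for the dst->is_local
branch of br_handle_frame_finish(); only the rule itself (retag
OTHERHOST as HOST when the destination is local, leave every other
packet type alone) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_pkt_type {
	TOY_PACKET_HOST,
	TOY_PACKET_BROADCAST,
	TOY_PACKET_MULTICAST,
	TOY_PACKET_OTHERHOST,
};

struct toy_skb {
	enum toy_pkt_type pkt_type;
};

/* Returns true when the frame should be passed up to the local stack. */
static bool toy_deliver_local(struct toy_skb *skb, bool dst_is_local)
{
	assert(skb != NULL);

	if (!dst_is_local)
		return false;

	/* A DNAT-ed frame may still carry the pre-DNAT classification. */
	if (skb->pkt_type == TOY_PACKET_OTHERHOST)
		skb->pkt_type = TOY_PACKET_HOST;

	return true;
}
```

Checking for OTHERHOST specifically, rather than overwriting the type
unconditionally, is what keeps multicast and broadcast frames with a
unicast destination MAC untouched, as the v2 changelog requires.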

[PATCH 2] net: virtio_net: use new api ethtool_{get|set}_link_ksettings

2017-03-21 Thread Philippe Reynes

The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes 
---
Changelog:
v2:
- remove comment about the missing hardware,
  I've tested this change with qemu

 drivers/net/virtio_net.c |   50 +++--
 1 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ea9890d..b0d241d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1636,47 +1636,57 @@ static void virtnet_get_channels(struct net_device *dev,
 }
 
 /* Check if the user is trying to change anything besides speed/duplex */
-static bool virtnet_validate_ethtool_cmd(const struct ethtool_cmd *cmd)
+static bool
+virtnet_validate_ethtool_cmd(const struct ethtool_link_ksettings *cmd)
 {
-   struct ethtool_cmd diff1 = *cmd;
-   struct ethtool_cmd diff2 = {};
+   struct ethtool_link_ksettings diff1 = *cmd;
+   struct ethtool_link_ksettings diff2 = {};
 
/* cmd is always set so we need to clear it, validate the port type
 * and also without autonegotiation we can ignore advertising
 */
-   ethtool_cmd_speed_set(&diff1, 0);
-   diff2.port = PORT_OTHER;
-   diff1.advertising = 0;
-   diff1.duplex = 0;
-   diff1.cmd = 0;
+   diff1.base.speed = 0;
+   diff2.base.port = PORT_OTHER;
+   ethtool_link_ksettings_zero_link_mode(&diff1, advertising);
+   diff1.base.duplex = 0;
+   diff1.base.cmd = 0;
+   diff1.base.link_mode_masks_nwords = 0;
 
-   return !memcmp(&diff1, &diff2, sizeof(diff1));
+   return !memcmp(&diff1.base, &diff2.base, sizeof(diff1.base)) &&
+   bitmap_empty(diff1.link_modes.supported,
+__ETHTOOL_LINK_MODE_MASK_NBITS) &&
+   bitmap_empty(diff1.link_modes.advertising,
+__ETHTOOL_LINK_MODE_MASK_NBITS) &&
+   bitmap_empty(diff1.link_modes.lp_advertising,
+__ETHTOOL_LINK_MODE_MASK_NBITS);
 }
 
-static int virtnet_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int virtnet_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
 {
struct virtnet_info *vi = netdev_priv(dev);
u32 speed;
 
-   speed = ethtool_cmd_speed(cmd);
+   speed = cmd->base.speed;
/* don't allow custom speed and duplex */
if (!ethtool_validate_speed(speed) ||
-   !ethtool_validate_duplex(cmd->duplex) ||
+   !ethtool_validate_duplex(cmd->base.duplex) ||
!virtnet_validate_ethtool_cmd(cmd))
return -EINVAL;
vi->speed = speed;
-   vi->duplex = cmd->duplex;
+   vi->duplex = cmd->base.duplex;
 
return 0;
 }
 
-static int virtnet_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int virtnet_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
 {
struct virtnet_info *vi = netdev_priv(dev);
 
-   ethtool_cmd_speed_set(cmd, vi->speed);
-   cmd->duplex = vi->duplex;
-   cmd->port = PORT_OTHER;
+   cmd->base.speed = vi->speed;
+   cmd->base.duplex = vi->duplex;
+   cmd->base.port = PORT_OTHER;
 
return 0;
 }
@@ -1696,8 +1706,8 @@ static void virtnet_init_settings(struct net_device *dev)
.set_channels = virtnet_set_channels,
.get_channels = virtnet_get_channels,
.get_ts_info = ethtool_op_get_ts_info,
-   .get_settings = virtnet_get_settings,
-   .set_settings = virtnet_set_settings,
+   .get_link_ksettings = virtnet_get_link_ksettings,
+   .set_link_ksettings = virtnet_set_link_ksettings,
 };
 
 static void virtnet_freeze_down(struct virtio_device *vdev)
-- 
1.7.4.4

[PATCH net-next] enic: update enic maintainers

2017-03-21 Thread Govindarajulu Varadarajan

update enic maintainers

Signed-off-by: Govindarajulu Varadarajan 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d14e42bef72e..5094b78c4acc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3159,7 +3159,6 @@ F:drivers/platform/chrome/
 
 CISCO VIC ETHERNET NIC DRIVER
 M: Christian Benvenuti 
-M: Sujith Sankar 
 M: Govindarajulu Varadarajan <_gov...@gmx.com>
 M: Neel Patel 
 S: Supported
-- 
2.12.0

[PATCH net-next v7 3/3] A Sample of using socket cookie and uid for traffic monitoring

2017-03-21 Thread Chenbo Feng

From: Chenbo Feng 

Add a sample program to demonstrate a possible use of the
get_socket_cookie and get_socket_uid helper functions. The program
counts the bytes and packets of in/out traffic monitored by iptables
and stores the stats in a bpf map on a per-socket basis. The owner uid
of the socket is stored as part of the data entry. A shell script for
running the program is also included.
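
The per-socket accounting described above can be sketched in plain userspace C. This is only an illustration of the data layout, assuming a small open-addressing table in place of the bpf map; TABLE_SLOTS, struct slot, stats_account() and stats_lookup() are hypothetical names and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SLOTS 128	/* hypothetical capacity, not taken from the patch */

struct stats {
	uint32_t uid;
	uint64_t packets;
	uint64_t bytes;
};

struct slot {
	int used;
	uint64_t cookie;
	struct stats st;
};

struct stats_table {
	struct slot slots[TABLE_SLOTS];	/* must start zeroed */
};

/* Linear probing: return the slot holding this cookie, or the first free
 * slot on its probe path, or NULL when the table is full. Entries are
 * never deleted in this sketch, so stopping at a free slot is safe.
 */
static struct slot *find_slot(struct stats_table *t, uint64_t cookie)
{
	size_t i, start = (size_t)(cookie % TABLE_SLOTS);

	for (i = 0; i < TABLE_SLOTS; i++) {
		struct slot *s = &t->slots[(start + i) % TABLE_SLOTS];

		if (!s->used || s->cookie == cookie)
			return s;
	}
	return NULL;
}

/* Account one packet of 'len' bytes to the socket identified by 'cookie'. */
static int stats_account(struct stats_table *t, uint64_t cookie,
			 uint32_t uid, uint32_t len)
{
	struct slot *s;

	assert(t != NULL);
	s = find_slot(t, cookie);
	if (!s)
		return -1;
	if (!s->used) {
		s->used = 1;
		s->cookie = cookie;
		s->st.uid = uid;	/* owner uid is recorded once per entry */
		s->st.packets = 0;
		s->st.bytes = 0;
	}
	s->st.packets++;
	s->st.bytes += len;
	return 0;
}

/* Return the stats for a cookie, or NULL if nothing was accounted yet. */
static const struct stats *stats_lookup(struct stats_table *t, uint64_t cookie)
{
	struct slot *s = find_slot(t, cookie);

	return (s && s->used) ? &s->st : NULL;
}
```

A caller would invoke stats_account() once per observed packet and walk or look up the table to print lines like the sample output in the program's header comment.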

Acked-by: Alexei Starovoitov 
Signed-off-by: Chenbo Feng 
---
 samples/bpf/Makefile |   3 +
 samples/bpf/cookie_uid_helper_example.c  | 217 +++
 samples/bpf/libbpf.h |  10 ++
 samples/bpf/run_cookie_uid_helper_example.sh |  14 ++
 4 files changed, 244 insertions(+)
 create mode 100644 samples/bpf/cookie_uid_helper_example.c
 create mode 100755 samples/bpf/run_cookie_uid_helper_example.sh

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 09e9d53..f803f51 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -34,6 +34,7 @@ hostprogs-y += sampleip
 hostprogs-y += tc_l2_redirect
 hostprogs-y += lwt_len_hist
 hostprogs-y += xdp_tx_iptunnel
+hostprogs-y += per_socket_stats_example
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -72,6 +73,7 @@ sampleip-objs := bpf_load.o $(LIBBPF) sampleip_user.o
 tc_l2_redirect-objs := bpf_load.o $(LIBBPF) tc_l2_redirect_user.o
 lwt_len_hist-objs := bpf_load.o $(LIBBPF) lwt_len_hist_user.o
 xdp_tx_iptunnel-objs := bpf_load.o $(LIBBPF) xdp_tx_iptunnel_user.o
+per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -105,6 +107,7 @@ always += trace_event_kern.o
 always += sampleip_kern.o
 always += lwt_len_hist_kern.o
 always += xdp_tx_iptunnel_kern.o
+always += cookie_uid_helper_example.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/cookie_uid_helper_example.c 
b/samples/bpf/cookie_uid_helper_example.c
new file mode 100644
index 000..f6e5e58
--- /dev/null
+++ b/samples/bpf/cookie_uid_helper_example.c
@@ -0,0 +1,217 @@
+/* This test is a demo of using get_socket_uid and get_socket_cookie
+ * helper function to do per socket based network traffic monitoring.
+ * It requires iptables version higher than 1.6.1 to load pinned eBPF
+ * program into the xt_bpf match.
+ *
+ * TEST:
+ * ./run_cookie_uid_helper_example.sh
+ * Then generate some traffic in various ways. ping 0 -c 10 would work
+ * but the cookie and uid in this case could both be 0. A sample output
+ * with some traffic generated by web browser is shown below:
+ *
+ * cookie: 877, uid: 0x3e8, Pakcet Count: 20, Bytes Count: 11058
+ * cookie: 132, uid: 0x0, Pakcet Count: 2, Bytes Count: 286
+ * cookie: 812, uid: 0x3e8, Pakcet Count: 3, Bytes Count: 1726
+ * cookie: 802, uid: 0x3e8, Pakcet Count: 2, Bytes Count: 104
+ * cookie: 877, uid: 0x3e8, Pakcet Count: 20, Bytes Count: 11058
+ * cookie: 831, uid: 0x3e8, Pakcet Count: 2, Bytes Count: 104
+ * cookie: 0, uid: 0x0, Pakcet Count: 6, Bytes Count: 712
+ * cookie: 880, uid: 0xfffe, Pakcet Count: 1, Bytes Count: 70
+ *
+ * Clean up: if using the shell script, the script will delete the iptables
+ * rule and unmount the bpf program on exit. Otherwise the iptables rule needs
+ * to be deleted by hand; see run_cookie_uid_helper_example.sh for details.
+ */
+
+#define _GNU_SOURCE
+
+#define offsetof(type, member) __builtin_offsetof(type, member)
+#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "libbpf.h"
+
+struct stats {
+   uint32_t uid;
+   uint64_t packets;
+   uint64_t bytes;
+};
+
+static int map_fd, prog_fd;
+
+static void maps_create(void)
+{
+   map_fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(uint32_t),
+   sizeof(struct stats), 100, 0);
+   if (map_fd < 0)
+   error(1, errno, "map create failed!\n");
+}
+
+static void prog_load(void)
+{
+   static char log_buf[1 << 16];
+
+   struct bpf_insn prog[] = {
+   /*
+* Save sk_buff for future usage. value stored in R6 to R10 will
+* not be reset after a bpf helper function call.
+*/
+   BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+   /*
+* pc1: BPF_FUNC_get_socket_cookie takes one parameter,
+* R1: sk_buff
+*/
+   BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+   BPF_FUNC_get_socket_cookie),
+   /* pc2-4: save  to r7 for future usage*/
+   BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+   BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
+   BPF_ALU64_IMM(BPF_ADD,

[PATCH net-next v7 0/3] net: core: Two Helper function about socket information

2017-03-21 Thread Chenbo Feng

From: Chenbo Feng 

Introduce two eBPF helper functions to get the socket cookie and
socket uid for each packet. The helper functions are useful when
the *sk field inside sk_buff is not empty. They can be used in
socket- and uid-based traffic monitoring programs.

Change since V6:
* Change the user namespace of the uid helper function back to
  init_user_ns, since in some situations (for example, a pinned bpf
  object) the current user namespace is not always applicable.

Change since V5:
* Delete unnecessary blank lines in sample program.
* Refine the variable orders in get_uid helper function.

Change since V4:
* Using current user namespace to get uid instead of using init_ns.
* Add compiling setup of example program in to Makefile.
* Change the name style of the example program binaries.

Change since V3:
* Fixed some typos and incorrect comments in sample program
* replaced raw insns with BPF_STX_XADD and add it to libbpf.h
* Use a temp dir as mount point instead and added a check for
  the user input string.
* Make the get uid helper function return the user namespace uid
  instead of kuid.
* Return overflowuid instead of 0 when no uid information is found.

Change since V2:
* Add a sample program to demonstrate the usage of the helper function.
* Moved the helper function proto invoking place.
* Add function header into tools/include
* Apply sk_to_full_sk() before getting uid.

Change since V1:
* Removed the unnecessary declarations and export command
* resolved conflict with master branch.
* Examine if the socket is a full socket before getting the uid.


Chenbo Feng (3):
  Add a helper function to get socket cookie in eBPF
  Add a eBPF helper function to retrieve socket uid
  A Sample of using socket cookie and uid for traffic monitoring

 include/linux/sock_diag.h|   1 +
 include/uapi/linux/bpf.h |  16 +-
 net/core/filter.c|  39 +
 net/core/sock_diag.c |   2 +-
 samples/bpf/Makefile |   3 +
 samples/bpf/cookie_uid_helper_example.c  | 217 +++
 samples/bpf/libbpf.h |  10 ++
 samples/bpf/run_cookie_uid_helper_example.sh |  14 ++
 tools/include/uapi/linux/bpf.h   |   4 +-
 9 files changed, 303 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/cookie_uid_helper_example.c
 create mode 100755 samples/bpf/run_cookie_uid_helper_example.sh

-- 
2.7.4

[PATCH net-next v7 1/3] Add a helper function to get socket cookie in eBPF

2017-03-21 Thread Chenbo Feng

From: Chenbo Feng 

Retrieve the socket cookie generated by sock_gen_cookie() from a sk_buff
with a known socket. Generates a new cookie if one was not yet set. If
the socket pointer inside sk_buff is NULL, 0 is returned. The helper
function could be useful in monitoring per-socket networking traffic
statistics and provides a unique socket identifier per namespace.
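
The "generate a cookie if one was not yet set, return 0 for a missing socket" behaviour can be sketched in userspace with C11 atomics. This is an assumption-laden illustration, not the kernel code: struct fake_sock, cookie_gen, gen_cookie() and get_socket_cookie() are hypothetical stand-ins, and a single global counter replaces whatever per-namespace state the kernel keeps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket; only the cookie field matters here. */
struct fake_sock {
	_Atomic uint64_t cookie;	/* 0 means "not assigned yet" */
};

/* Hypothetical generator; the first cookie handed out is 1, so 0 stays
 * reserved for "no socket".
 */
static _Atomic uint64_t cookie_gen;

/* Return the socket's cookie, assigning a fresh one on first use. */
static uint64_t gen_cookie(struct fake_sock *sk)
{
	for (;;) {
		uint64_t cur = atomic_load(&sk->cookie);
		uint64_t expected = 0;
		uint64_t fresh;

		if (cur)
			return cur;
		fresh = atomic_fetch_add(&cookie_gen, 1) + 1;
		assert(fresh != 0);	/* counter wraparound is not handled */
		/* Publish only if nobody else did; either way, loop and
		 * re-read so every caller sees the same winning value.
		 */
		atomic_compare_exchange_strong(&sk->cookie, &expected, fresh);
	}
}

/* Mirror of the helper's contract: 0 when there is no socket. */
static uint64_t get_socket_cookie(struct fake_sock *sk)
{
	return sk ? gen_cookie(sk) : 0;
}
```

The compare-and-exchange is what keeps the cookie stable when two contexts race on the first lookup: the loser discards its fresh value and returns the one that was published.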

Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Signed-off-by: Chenbo Feng 
---
 include/linux/sock_diag.h  |  1 +
 include/uapi/linux/bpf.h   |  9 -
 net/core/filter.c  | 17 +
 net/core/sock_diag.c   |  2 +-
 tools/include/uapi/linux/bpf.h |  3 ++-
 5 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/linux/sock_diag.h b/include/linux/sock_diag.h
index a0596ca0..a2f8109 100644
--- a/include/linux/sock_diag.h
+++ b/include/linux/sock_diag.h
@@ -24,6 +24,7 @@ void sock_diag_unregister(const struct sock_diag_handler *h);
 void sock_diag_register_inet_compat(int (*fn)(struct sk_buff *skb, struct 
nlmsghdr *nlh));
 void sock_diag_unregister_inet_compat(int (*fn)(struct sk_buff *skb, struct 
nlmsghdr *nlh));
 
+u64 sock_gen_cookie(struct sock *sk);
 int sock_diag_check_cookie(struct sock *sk, const __u32 *cookie);
 void sock_diag_save_cookie(struct sock *sk, __u32 *cookie);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0539a0c..dc81a9f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -456,6 +456,12 @@ union bpf_attr {
  * Return:
  *   > 0 length of the string including the trailing NUL on success
  *   < 0 error
+ *
+ * u64 bpf_get_socket_cookie(skb)
+ * Get the cookie for the socket stored inside sk_buff.
+ * @skb: pointer to skb
+ * Return: 8 Bytes non-decreasing number on success or 0 if the socket
+ * field is missing inside sk_buff
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -503,7 +509,8 @@ union bpf_attr {
FN(get_numa_node_id),   \
FN(skb_change_head),\
FN(xdp_adjust_head),\
-   FN(probe_read_str),
+   FN(probe_read_str), \
+   FN(get_socket_cookie),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index ebaeaf2..5b65ae3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2599,6 +2600,18 @@ static const struct bpf_func_proto 
bpf_xdp_event_output_proto = {
.arg5_type  = ARG_CONST_SIZE,
 };
 
+BPF_CALL_1(bpf_get_socket_cookie, struct sk_buff *, skb)
+{
+   return skb->sk ? sock_gen_cookie(skb->sk) : 0;
+}
+
+static const struct bpf_func_proto bpf_get_socket_cookie_proto = {
+   .func   = bpf_get_socket_cookie,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+};
+
 static const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
 {
@@ -2633,6 +2646,8 @@ sk_filter_func_proto(enum bpf_func_id func_id)
switch (func_id) {
case BPF_FUNC_skb_load_bytes:
return &bpf_skb_load_bytes_proto;
+   case BPF_FUNC_get_socket_cookie:
+   return &bpf_get_socket_cookie_proto;
default:
return bpf_base_func_proto(func_id);
}
@@ -2692,6 +2707,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return &bpf_get_smp_processor_id_proto;
case BPF_FUNC_skb_under_cgroup:
return &bpf_skb_under_cgroup_proto;
+   case BPF_FUNC_get_socket_cookie:
+   return &bpf_get_socket_cookie_proto;
default:
return bpf_base_func_proto(func_id);
}
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 6b10573..acd2a6c 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -19,7 +19,7 @@ static int (*inet_rcv_compat)(struct sk_buff *skb, struct 
nlmsghdr *nlh);
 static DEFINE_MUTEX(sock_diag_table_mutex);
 static struct workqueue_struct *broadcast_wq;
 
-static u64 sock_gen_cookie(struct sock *sk)
+u64 sock_gen_cookie(struct sock *sk)
 {
while (1) {
u64 res = atomic64_read(&sk->sk_cookie);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0539a0c..a94bdd3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -503,7 +503,8 @@ union bpf_attr {
FN(get_numa_node_id),   \
FN(skb_change_head),\
FN(xdp_adjust_head),\
-   FN(probe_read_str),
+   FN(probe_read_str), \
+   FN(get_socket_cookie),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function

[PATCH net-next v7 2/3] Add a eBPF helper function to retrieve socket uid

2017-03-21 Thread Chenbo Feng

From: Chenbo Feng 

Returns the owner uid of the socket inside a sk_buff. This is useful to
perform per-UID accounting of network traffic or per-UID packet
filtering. The socket needs to be a full socket; otherwise overflowuid
is returned.
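
The fallback rule can be sketched as below. This is a simplified userspace model, assuming a boolean "full" flag in place of the kernel's full-socket check and using 65534 for overflowuid (the kernel default, which also matches the 0xfffe entry in the sample output of patch 3/3); struct fake_sock, struct fake_skb and socket_uid() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OVERFLOWUID 65534u	/* assumed default overflowuid */

/* Hypothetical stand-ins for the kernel structures. */
struct fake_sock {
	bool full;		/* true for a full socket */
	uint32_t owner_uid;
};

struct fake_skb {
	const struct fake_sock *sk;	/* may be NULL */
};

/* Return the owner uid, or overflowuid when no usable socket is attached. */
static uint32_t socket_uid(const struct fake_skb *skb)
{
	const struct fake_sock *sk;

	assert(skb != NULL);
	sk = skb->sk;
	if (!sk || !sk->full)
		return SKETCH_OVERFLOWUID;
	return sk->owner_uid;
}
```

Returning overflowuid rather than 0 keeps traffic without a usable socket from being attributed to root in a per-UID accounting table.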

Signed-off-by: Chenbo Feng 
---
 include/uapi/linux/bpf.h   |  9 -
 net/core/filter.c  | 22 ++
 tools/include/uapi/linux/bpf.h |  3 ++-
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dc81a9f..ff42111 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -462,6 +462,12 @@ union bpf_attr {
  * @skb: pointer to skb
  * Return: 8 Bytes non-decreasing number on success or 0 if the socket
  * field is missing inside sk_buff
+ *
+ * u32 bpf_get_socket_uid(skb)
+ * Get the owner uid of the socket stored inside sk_buff.
+ * @skb: pointer to skb
+ * Return: uid of the socket owner on success or overflowuid if the socket
+ * is missing or is not a full socket
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -510,7 +516,8 @@ union bpf_attr {
FN(skb_change_head),\
FN(xdp_adjust_head),\
FN(probe_read_str), \
-   FN(get_socket_cookie),
+   FN(get_socket_cookie),  \
+   FN(get_socket_uid),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 5b65ae3..2f022df 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2612,6 +2612,24 @@ static const struct bpf_func_proto 
bpf_get_socket_cookie_proto = {
.arg1_type  = ARG_PTR_TO_CTX,
 };
 
+BPF_CALL_1(bpf_get_socket_uid, struct sk_buff *, skb)
+{
+   struct sock *sk = sk_to_full_sk(skb->sk);
+   kuid_t kuid;
+
+   if (!sk || !sk_fullsock(sk))
+   return overflowuid;
+   kuid = sock_net_uid(sock_net(sk), sk);
+   return from_kuid_munged(&init_user_ns, kuid);
+}
+
+static const struct bpf_func_proto bpf_get_socket_uid_proto = {
+   .func   = bpf_get_socket_uid,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+};
+
 static const struct bpf_func_proto *
 bpf_base_func_proto(enum bpf_func_id func_id)
 {
@@ -2648,6 +2666,8 @@ sk_filter_func_proto(enum bpf_func_id func_id)
return &bpf_skb_load_bytes_proto;
case BPF_FUNC_get_socket_cookie:
return &bpf_get_socket_cookie_proto;
+   case BPF_FUNC_get_socket_uid:
+   return &bpf_get_socket_uid_proto;
default:
return bpf_base_func_proto(func_id);
}
@@ -2709,6 +2729,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return &bpf_skb_under_cgroup_proto;
case BPF_FUNC_get_socket_cookie:
return &bpf_get_socket_cookie_proto;
+   case BPF_FUNC_get_socket_uid:
+   return &bpf_get_socket_uid_proto;
default:
return bpf_base_func_proto(func_id);
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a94bdd3..4a2d56d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -504,7 +504,8 @@ union bpf_attr {
FN(skb_change_head),\
FN(xdp_adjust_head),\
FN(probe_read_str), \
-   FN(get_socket_cookie),
+   FN(get_socket_cookie),  \
+   FN(get_socket_uid),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.7.4

Re: [PATCH net-next 2/2] sctp: add support for MSG_MORE

2017-03-21 Thread Marcelo Ricardo Leitner

Hi,

On Fri, Feb 24, 2017 at 10:14:09AM +, David Laight wrote:
> From: Xin Long
> > Sent: 24 February 2017 06:44
> ...
> > > IIRC sctp_packet_can_append_data() is called for the first queued
> > > data chunk in order to decide whether to generate a message that
> > > consists only of data chunks.
> > > If it returns SCTP_XMIT_OK then a message is built collecting the
> > > rest of the queued data chunks (until the window fills).
> > >
> > > So if I send a message with MSG_MORE set (on an idle connection)
> > > SCTP_XMIT_DELAY is returned and a message isn't sent.
> > >
> > > I now send a second small message, this time with MSG_MORE clear.
> > > The message is queued, then the code looks to see if it can send anything.
> > >
> > > sctp_packet_can_append_data() is called for the first queued chunk.
> > > Since it has force_delay set SCTP_XMIT_DELAY is returned and no
> > > message is built.
> > > The second message isn't even looked at.
> > You're right. I can see the problem now.
> > 
> > What I expected is it should work like:
> > 
> > 1, send 3 small chunks with MSG_MORE set, the queue is:
> >   chk3 [set] -> chk2 [set] -> chk1 [set]
> 
> Strange way to write a queue! chk1 points to chk2 :-)

Strictly speaking, it's actually both together, as it's a double-linked
list. :-)

> 
> > 2. send 1 more chunk with MSG_MORE clear, the queue is:
> >   chk4[clear] -> chk3 [clear] -> chk2 [clear] -> chk1 [clear]
> 
> I don't think processing the entire queue is a good idea.
> Both from execution time and the effects on the data cache.

It won't process the entire queue unless needed, and it will only
process it on the last sendmsg() call. As the list is double-linked, it
can walk backwards as necessary and stop at just the right point. So
this doesn't imply any quadratic or exponential cost here; the cost is
linear, and only paid when finishing the MSG_MORE block.

If the application is not using MSG_MORE, impact is zero.
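
To make the cost argument concrete, here is a rough userspace sketch of the backward walk. It is not the SCTP code: struct chunk, struct chunk_queue and queue_send() are hypothetical, and "hold" simply models a chunk queued with MSG_MORE set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued chunk in a double-linked list. */
struct chunk {
	struct chunk *prev;
	struct chunk *next;
	bool hold;	/* queued with MSG_MORE set */
};

struct chunk_queue {
	struct chunk *head;
	struct chunk *tail;
};

/* Append a chunk. When MSG_MORE is clear, walk backwards from the new
 * tail and release held chunks, stopping at the first chunk that was
 * already released. Returns how many older chunks were released.
 */
static size_t queue_send(struct chunk_queue *q, struct chunk *c, bool msg_more)
{
	size_t released = 0;
	struct chunk *p;

	assert(q != NULL && c != NULL);
	c->hold = msg_more;
	c->next = NULL;
	c->prev = q->tail;
	if (q->tail)
		q->tail->next = c;
	else
		q->head = c;
	q->tail = c;

	if (msg_more)
		return 0;	/* nothing to do until the block is finished */

	for (p = c->prev; p && p->hold; p = p->prev) {
		p->hold = false;
		released++;
	}
	return released;
}
```

Each chunk is visited at most once by a release walk, so the work is proportional to the size of the MSG_MORE block being finished, and a sender that never sets MSG_MORE never enters the loop body.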

> The SCTP code is horrid enough as it is.
> 
> > 3. then if user send more small chunks with MSG_MORE set,
> > the queue is like:
> >   chkB[set] -> chkA[set] -> chk4[clear] -> chk3 [clear] -> chk2 [clear] -> 
> > chk1 [clear]
> > so that the new small chunks' flag will not affect the other chunks 
> > bundling.
> 
> That isn't really necessary.
> The user can't expect to have absolute control over which chunks get bundled
> together.

So...?
I mean, I'm okay with that, but that doesn't explain why we can't do as
Xin proposed in the previous email here.

> If the above chunks still aren't big enough to fill a frame the code might
> as well wait for the next chunk instead of building a packet that contains
> chk1 through to chkB.

Our expectations are the same and that's what the proposed solution also
achieves, no?

> 
> Remember you'll only get a queued chunk with MSG_MORE clear if data can't be 
> sent.
> As soon as data can be sent, if the first chunk has MSG_MORE clear all of the
> queued chunks will be sent.

With the fix proposed by Xin, this would be more like: ... all of the
_non-held_ chunks will be sent.
After all, the application asked to hold them, for whatever reason it had.

> 
> So immediately after your (3) the application is expected to send a chunk
> with MSG_MORE clear - at that point all the queued chunks can be sent in
> a single packet.

Yes. Isn't that the idea?

> 
> So just save the last MSG_MORE on the association as I did.

I don't see the reason to change that. Your reply seems to reject the
idea but I cannot see why. The proposed solution is more complex, yes,
and allows more control, yes, but those aren't real issues here.

  Marcelo

[PATCH] net: virtio_net: use new api ethtool_{get|set}_link_ksettings

2017-03-21 Thread Philippe Reynes

The ethtool API {get|set}_settings is deprecated.
We move this driver to the new API {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.
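
For readers unfamiliar with the validation trick this driver uses (copy the request, clear the fields the user may change, then compare against an expected template), here is a minimal userspace sketch. struct link_settings is a hypothetical, simplified layout and not the real ethtool structure; SKETCH_PORT_OTHER assumes the value 0xff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PORT_OTHER 0xff	/* assumed value for the "other" port type */

/* Hypothetical, simplified settings block. The explicit pad keeps the
 * struct free of implicit padding, so memcmp() only sees defined bytes.
 */
struct link_settings {
	uint32_t speed;
	uint8_t duplex;
	uint8_t port;
	uint8_t autoneg;
	uint8_t pad;
	uint32_t advertising;
};

/* True when the request differs from the template only in fields the
 * user is allowed to set (speed, duplex) or that are ignored
 * (advertising).
 */
static bool only_speed_duplex_changed(const struct link_settings *cmd)
{
	struct link_settings diff1;
	struct link_settings diff2;

	assert(cmd != NULL);
	diff1 = *cmd;
	memset(&diff2, 0, sizeof(diff2));

	/* Clear what may legitimately vary in the request... */
	diff1.speed = 0;
	diff1.duplex = 0;
	diff1.advertising = 0;
	/* ...and require the port type to be the expected one. */
	diff2.port = SKETCH_PORT_OTHER;

	return memcmp(&diff1, &diff2, sizeof(diff1)) == 0;
}
```

The design choice is that any field not explicitly cleared must still be zero (or match the template), so newly added fields are rejected by default instead of being silently accepted.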

Signed-off-by: Philippe Reynes 
---
 drivers/net/virtio_net.c |   50 +++--
 1 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ea9890d..b0d241d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1636,47 +1636,57 @@ static void virtnet_get_channels(struct net_device *dev,
 }
 
 /* Check if the user is trying to change anything besides speed/duplex */
-static bool virtnet_validate_ethtool_cmd(const struct ethtool_cmd *cmd)
+static bool
+virtnet_validate_ethtool_cmd(const struct ethtool_link_ksettings *cmd)
 {
-   struct ethtool_cmd diff1 = *cmd;
-   struct ethtool_cmd diff2 = {};
+   struct ethtool_link_ksettings diff1 = *cmd;
+   struct ethtool_link_ksettings diff2 = {};
 
/* cmd is always set so we need to clear it, validate the port type
 * and also without autonegotiation we can ignore advertising
 */
-   ethtool_cmd_speed_set(&diff1, 0);
-   diff2.port = PORT_OTHER;
-   diff1.advertising = 0;
-   diff1.duplex = 0;
-   diff1.cmd = 0;
+   diff1.base.speed = 0;
+   diff2.base.port = PORT_OTHER;
+   ethtool_link_ksettings_zero_link_mode(&diff1, advertising);
+   diff1.base.duplex = 0;
+   diff1.base.cmd = 0;
+   diff1.base.link_mode_masks_nwords = 0;
 
-   return !memcmp(&diff1, &diff2, sizeof(diff1));
+   return !memcmp(&diff1.base, &diff2.base, sizeof(diff1.base)) &&
+   bitmap_empty(diff1.link_modes.supported,
+__ETHTOOL_LINK_MODE_MASK_NBITS) &&
+   bitmap_empty(diff1.link_modes.advertising,
+__ETHTOOL_LINK_MODE_MASK_NBITS) &&
+   bitmap_empty(diff1.link_modes.lp_advertising,
+__ETHTOOL_LINK_MODE_MASK_NBITS);
 }
 
-static int virtnet_set_settings(struct net_device *dev, struct ethtool_cmd 
*cmd)
+static int virtnet_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
 {
struct virtnet_info *vi = netdev_priv(dev);
u32 speed;
 
-   speed = ethtool_cmd_speed(cmd);
+   speed = cmd->base.speed;
/* don't allow custom speed and duplex */
if (!ethtool_validate_speed(speed) ||
-   !ethtool_validate_duplex(cmd->duplex) ||
+   !ethtool_validate_duplex(cmd->base.duplex) ||
!virtnet_validate_ethtool_cmd(cmd))
return -EINVAL;
vi->speed = speed;
-   vi->duplex = cmd->duplex;
+   vi->duplex = cmd->base.duplex;
 
return 0;
 }
 
-static int virtnet_get_settings(struct net_device *dev, struct ethtool_cmd 
*cmd)
+static int virtnet_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
 {
struct virtnet_info *vi = netdev_priv(dev);
 
-   ethtool_cmd_speed_set(cmd, vi->speed);
-   cmd->duplex = vi->duplex;
-   cmd->port = PORT_OTHER;
+   cmd->base.speed = vi->speed;
+   cmd->base.duplex = vi->duplex;
+   cmd->base.port = PORT_OTHER;
 
return 0;
 }
@@ -1696,8 +1706,8 @@ static void virtnet_init_settings(struct net_device *dev)
.set_channels = virtnet_set_channels,
.get_channels = virtnet_get_channels,
.get_ts_info = ethtool_op_get_ts_info,
-   .get_settings = virtnet_get_settings,
-   .set_settings = virtnet_set_settings,
+   .get_link_ksettings = virtnet_get_link_ksettings,
+   .set_link_ksettings = virtnet_set_link_ksettings,
 };
 
 static void virtnet_freeze_down(struct virtio_device *vdev)
-- 
1.7.4.4

[PATCH iproute2 v3 1/1] actions: Add support for user cookies

2017-03-21 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it. The user can store whatever they wish in the
128 bits.
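
As an illustration of the variable-length cookie, the sketch below turns a hex string such as a1b2c3d4 into at most 16 bytes. It is a standalone example, not the tc code: COOKIE_MAX_SIZE, hex_nibble() and parse_cookie() are hypothetical names, and rejecting odd-length strings and returning the length in bytes are choices made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COOKIE_MAX_SIZE 16	/* 128 bits, as stated above */

/* Value of one hex digit, or -1 if the character is not a hex digit. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse a hex string into 'out'. Returns the cookie length in bytes,
 * or -1 if the string is empty, odd-length, too long, or not hex.
 */
static int parse_cookie(const char *hex, unsigned char out[COOKIE_MAX_SIZE])
{
	size_t slen, i;

	assert(hex != NULL && out != NULL);
	slen = strlen(hex);
	if (slen == 0 || slen % 2 != 0 || slen > COOKIE_MAX_SIZE * 2)
		return -1;

	for (i = 0; i < slen / 2; i++) {
		int hi = hex_nibble(hex[2 * i]);
		int lo = hex_nibble(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return (int)(slen / 2);
}
```

With this sketch, the 8-character string a1b2c3d4 yields a 4-byte cookie, and a 32-character string fills all 128 bits.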

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_action.c | 44 ++--
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 05ef07e..6ba7615 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -150,18 +150,19 @@ new_cmd(char **argv)
 
 }
 
-int
-parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+int parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
 {
int argc = *argc_p;
char **argv = *argv_p;
struct rtattr *tail, *tail2;
char k[16];
+   int act_ck_len = 0;
int ok = 0;
int eap = 0; /* expect action parameters */
 
int ret = 0;
int prio = 0;
+   unsigned char act_ck[TC_COOKIE_MAX_SIZE];
 
if (argc <= 0)
return -1;
@@ -215,16 +216,39 @@ done0:
addattr_l(n, MAX_MSG, ++prio, NULL, 0);
addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
 
-   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS, 
n);
+   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS,
+   n);
 
if (ret < 0) {
fprintf(stderr, "bad action parsing\n");
goto bad_val;
}
+
+   if (*argv && strcmp(*argv, "cookie") == 0) {
+   size_t slen;
+
+   NEXT_ARG();
+   slen = strlen(*argv);
+   if (slen > TC_COOKIE_MAX_SIZE * 2)
+   invarg("cookie cannot exceed %d\n",
+  *argv);
+
+   if (hex2mem(*argv, act_ck, slen / 2) < 0)
+   invarg("cookie must be a hex string\n",
+  *argv);
+
+   act_ck_len = slen;
+   argc--;
+   argv++;
+   }
+
+   if (act_ck_len)
+   addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
+ &act_ck, act_ck_len);
+
tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
ok++;
}
-
}
 
if (eap > 0) {
@@ -245,8 +269,7 @@ bad_val:
return -1;
 }
 
-static int
-tc_print_one_action(FILE *f, struct rtattr *arg)
+static int tc_print_one_action(FILE *f, struct rtattr *arg)
 {
 
struct rtattr *tb[TCA_ACT_MAX + 1];
@@ -274,8 +297,17 @@ tc_print_one_action(FILE *f, struct rtattr *arg)
return err;
 
if (show_stats && tb[TCA_ACT_STATS]) {
+
fprintf(f, "\tAction statistics:\n");
print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
+   if (tb[TCA_ACT_COOKIE]) {
+   int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+   char b1[strsz+1];
+
+   fprintf(f, "\n\tcookie len %d %s ", strsz,
+   hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
+ strsz, b1, sizeof(b1)));
+   }
fprintf(f, "\n");
}
 
-- 
1.9.1

Re: [PATCH iproute2 v2 1/1] actions: Add support for user cookies

2017-03-21 Thread Jamal Hadi Salim


Ignore - i resent the same version again ;->
Resending ..

cheers,
jamal

On 17-03-21 05:58 PM, Jamal Hadi Salim wrote:

From: Jamal Hadi Salim 

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it. The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_action.c | 44 ++--
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 05ef07e..9d5857c 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -150,18 +150,19 @@ new_cmd(char **argv)

 }

-int
-parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+int parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
 {
int argc = *argc_p;
char **argv = *argv_p;
struct rtattr *tail, *tail2;
char k[16];
+   int act_ck_len = 0;
int ok = 0;
int eap = 0; /* expect action parameters */

int ret = 0;
int prio = 0;
+   unsigned char act_ck[TC_COOKIE_MAX_SIZE];

if (argc <= 0)
return -1;
@@ -215,16 +216,39 @@ done0:
addattr_l(n, MAX_MSG, ++prio, NULL, 0);
addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);

-   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS, 
n);
+   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS,
+   n);

if (ret < 0) {
fprintf(stderr, "bad action parsing\n");
goto bad_val;
}
+
+   if (*argv && strcmp(*argv, "cookie") == 0) {
+   int slen;
+
+   NEXT_ARG();
+   slen = strlen(*argv);
+   if (slen > (TC_COOKIE_MAX_SIZE*2))
+   invarg("cookie cannot exceed %d\n",
+  *argv);
+
+   if (hex2mem(*argv, act_ck, slen/2) < 0)
+   invarg("cookie must be a hex string\n",
+  *argv);
+
+   act_ck_len = slen;
+   argc--;
+   argv++;
+   }
+
+   if (act_ck_len)
+   addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
+ (const void *)&act_ck, act_ck_len);
+
tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
ok++;
}
-
}

if (eap > 0) {
@@ -245,8 +269,7 @@ bad_val:
return -1;
 }

-static int
-tc_print_one_action(FILE *f, struct rtattr *arg)
+static int tc_print_one_action(FILE *f, struct rtattr *arg)
 {

struct rtattr *tb[TCA_ACT_MAX + 1];
@@ -274,8 +297,17 @@ tc_print_one_action(FILE *f, struct rtattr *arg)
return err;

if (show_stats && tb[TCA_ACT_STATS]) {
+
fprintf(f, "\tAction statistics:\n");
print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
+   if (tb[TCA_ACT_COOKIE]) {
+   int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+   char b1[strsz+1];
+
+   fprintf(f, "\n\tcookie len %d %s ", strsz,
+   hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
+ strsz, b1, sizeof(b1)));
+   }
fprintf(f, "\n");
}

IPv6 IGMP issue in v4.4.44 ??

2017-03-21 Thread Murali Karicheri

Hello David, experts,

I see an issue with IGMP for IPv6 when I test an HSR redundancy network
interface. As soon as I set up an HSR interface, I see some IGMP messages
(destination MAC address: 33 33 00 00 00 02) going over the HSR interface
to the slave interfaces at the egress, whereas for IPv6 I see similar
messages going directly over the Ethernet interfaces that are attached to
the HSR master. It appears that NETDEV_CHANGEUPPER is not handled properly
and the mcast snoop sends the packets over the old interfaces at timer
expiry.

A dump of the message at the slave Ethernet interface looks like below.

IPv4

[   64.643842] 33 33 00 00 00 02 70 ff 76 1c 0f 8d 89 2f 10 3e fc 
[   64.649910] 18 86 dd 60 00 00 00 00 10 3a ff fe 80 00 00 00 
[   64.655705] 00 00 00 72 ff 76 ff fe 1c 0f 8d ff 02 00 00 00 
[   64.661503] 00 00 00 00 00 00 00 00 00 00 02 85 00 8d dc 


You can see this is tagged with HSR.

IPv6

[   65.559130] 33 33 00 00 00 02 70 ff 76 1c 0f 8d 86 dd 60 00 00 
[   65.565205] 00 00 10 3a ff fe 80 00 00 00 00 00 00 72 ff 76 
[   65.571011] ff fe 1c 0f 8d ff 02 00 00 00 00 00 00 00 00 00 
[   65.576806] 00 00 00 00 02 85 00 8d dc 00 00 00 00 01 01 

This is going directly to the slave Ethernet interface.
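
For anyone reading the two dumps, the difference is visible in bytes 12 and 13 of the frame (the EtherType): 89 2f in the first dump and 86 dd in the second. A small sketch of that check follows; classify_frame() and the enum are hypothetical names, and the two constants are the standard EtherType values for HSR and IPv6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETHERTYPE_IPV6 0x86DD
#define SKETCH_ETHERTYPE_HSR  0x892F

enum frame_kind {
	FRAME_TOO_SHORT,
	FRAME_HSR_TAGGED,
	FRAME_PLAIN_IPV6,
	FRAME_OTHER
};

/* Classify an Ethernet frame by the EtherType that follows the two
 * 6-byte MAC addresses. VLAN tags are not handled in this sketch.
 */
static enum frame_kind classify_frame(const uint8_t *frame, size_t len)
{
	uint16_t ethertype;

	assert(frame != NULL);
	if (len < 14)
		return FRAME_TOO_SHORT;

	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
	if (ethertype == SKETCH_ETHERTYPE_HSR)
		return FRAME_HSR_TAGGED;
	if (ethertype == SKETCH_ETHERTYPE_IPV6)
		return FRAME_PLAIN_IPV6;
	return FRAME_OTHER;
}
```

Feeding the first 14 bytes of each dump to classify_frame() separates the HSR-tagged frame from the one that bypassed the HSR master.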

When I put a WARN_ONCE, I found this is coming directly from 
mld_ifc_timer_expire() -> mld_sendpack() -> ip6_output()

Do you think this is fixed in the latest kernel at master? If so, could
you point me to some commits?

Thanks

-- 
Murali Karicheri
Linux Kernel, Keystone

[PATCH iproute2 v2 1/1] actions: Add support for user cookies

2017-03-21 Thread Jamal Hadi Salim

From: Jamal Hadi Salim 

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ interpret it. The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

action order 0: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 0 installed 5 sec used 5 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent : protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_action.c | 44 ++--
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/tc/m_action.c b/tc/m_action.c
index 05ef07e..9d5857c 100644
--- a/tc/m_action.c
+++ b/tc/m_action.c
@@ -150,18 +150,19 @@ new_cmd(char **argv)
 
 }
 
-int
-parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
+int parse_action(int *argc_p, char ***argv_p, int tca_id, struct nlmsghdr *n)
 {
int argc = *argc_p;
char **argv = *argv_p;
struct rtattr *tail, *tail2;
char k[16];
+   int act_ck_len = 0;
int ok = 0;
int eap = 0; /* expect action parameters */
 
int ret = 0;
int prio = 0;
+   unsigned char act_ck[TC_COOKIE_MAX_SIZE];
 
if (argc <= 0)
return -1;
@@ -215,16 +216,39 @@ done0:
addattr_l(n, MAX_MSG, ++prio, NULL, 0);
addattr_l(n, MAX_MSG, TCA_ACT_KIND, k, strlen(k) + 1);
 
-   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS, n);
+   ret = a->parse_aopt(a, &argc, &argv, TCA_ACT_OPTIONS,
+   n);
 
if (ret < 0) {
fprintf(stderr, "bad action parsing\n");
goto bad_val;
}
+
+   if (*argv && strcmp(*argv, "cookie") == 0) {
+   int slen;
+
+   NEXT_ARG();
+   slen = strlen(*argv);
+   if (slen > (TC_COOKIE_MAX_SIZE*2))
+   invarg("cookie cannot exceed %d\n",
+  *argv);
+
+   if (hex2mem(*argv, act_ck, slen/2) < 0)
+   invarg("cookie must be a hex string\n",
+  *argv);
+
+   act_ck_len = slen;
+   argc--;
+   argv++;
+   }
+
+   if (act_ck_len)
+   addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
+ (const void *)&act_ck, act_ck_len);
+
tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
ok++;
}
-
}
 
if (eap > 0) {
@@ -245,8 +269,7 @@ bad_val:
return -1;
 }
 
-static int
-tc_print_one_action(FILE *f, struct rtattr *arg)
+static int tc_print_one_action(FILE *f, struct rtattr *arg)
 {
 
struct rtattr *tb[TCA_ACT_MAX + 1];
@@ -274,8 +297,17 @@ tc_print_one_action(FILE *f, struct rtattr *arg)
return err;
 
if (show_stats && tb[TCA_ACT_STATS]) {
+
fprintf(f, "\tAction statistics:\n");
print_tcstats2_attr(f, tb[TCA_ACT_STATS], "\t", NULL);
+   if (tb[TCA_ACT_COOKIE]) {
+   int strsz = RTA_PAYLOAD(tb[TCA_ACT_COOKIE]);
+   char b1[strsz+1];
+
+   fprintf(f, "\n\tcookie len %d %s ", strsz,
+   hexstring_n2a(RTA_DATA(tb[TCA_ACT_COOKIE]),
+ strsz, b1, sizeof(b1)));
+   }
fprintf(f, "\n");
}
 
-- 
1.9.1

Re: [PATCH net] r8152: fix the list rx_done may be used without initialization

2017-03-21 Thread David Miller

From: Hayes Wang 
Date: Tue, 14 Mar 2017 14:15:20 +0800

> The list rx_done would be initialized when the linking on occurs.
> Therefore, if a napi is scheduled without any linking on before,
> the following kernel panic would happen.
> 
>   BUG: unable to handle kernel NULL pointer dereference at 008
>   IP: [] r8152_poll+0xe1e/0x1210 [r8152]
>   PGD 0
>   Oops: 0002 [#1] SMP
> 
> Signed-off-by: Hayes Wang 

Applied.

Re: [net-next 00/13][pull request] 40GbE Intel Wired LAN Driver Updates 2017-03-20

2017-03-21 Thread David Miller

From: Jeff Kirsher 
Date: Mon, 20 Mar 2017 16:46:59 -0700

> This series contains updates to i40e and i40evf only.

Pulled, thanks Jeff.

Re: run_timer_softirq gpf. [smc]

2017-03-21 Thread Thomas Gleixner

On Tue, 21 Mar 2017, Dave Jones wrote:
> On Tue, Mar 21, 2017 at 08:25:39PM +0100, Thomas Gleixner wrote:
>  
>  > > I just hit this while fuzzing..
>  > > 
>  > > general protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>  > > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.11.0-rc2-think+ #1 
>  > > task: 88017f0ed440 task.stack: c9094000
>  > > RIP: 0010:run_timer_softirq+0x15f/0x700
>  > > RSP: 0018:880507c03ec8 EFLAGS: 00010086
>  > > RAX: dead0200 RBX: 880507dd0d00 RCX: 0002
>  > > RDX: 880507c03ed0 RSI:  RDI: 8204b3a0
>  > > RBP: 880507c03f48 R08: 880507dd12d0 R09: 880507c03ed8
>  > > R10: 880507dd0db0 R11:  R12: 8215cc38
>  > > R13: 880507c03ed0 R14: 82005188 R15: 8804b55491a8
>  > > FS:  () GS:880507c0() 
> knlGS:
>  > > CS:  0010 DS:  ES:  CR0: 80050033
>  > > CR2: 0004 CR3: 05011000 CR4: 001406e0
>  > > Call Trace:
>  > >  
>  > >  ? clockevents_program_event+0x47/0x120
>  > >  __do_softirq+0xbf/0x5b1
>  > >  irq_exit+0xb5/0xc0
>  > >  smp_apic_timer_interrupt+0x3d/0x50
>  > >  apic_timer_interrupt+0x97/0xa0
>  > > RIP: 0010:cpuidle_enter_state+0x12e/0x400
>  > > RSP: 0018:c9097e40 EFLAGS: 0202
>  > > [CONT START]  ORIG_RAX: ff10
>  > > RAX: 88017f0ed440 RBX: e8a03cc8 RCX: 0001
>  > > RDX: 20c49ba5e353f7cf RSI: 0001 RDI: 88017f0ed440
>  > > RBP: c9097e80 R08:  R09: 0008
>  > > R10:  R11:  R12: 0005
>  > > R13: 820b9338 R14: 0005 R15: 820b9320
>  > >  
>  > >  cpuidle_enter+0x17/0x20
>  > >  call_cpuidle+0x23/0x40
>  > >  do_idle+0xfb/0x200
>  > >  cpu_startup_entry+0x71/0x80
>  > >  start_secondary+0x16a/0x210
>  > >  start_cpu+0x14/0x14
>  > > Code: 8b 05 ce 1b ef 7e 83 f8 03 0f 87 4e 01 00 00 89 c0 49 0f a3 04 24 
> 0f 82 0a 01 00 00 49 8b 07 49 8b 57 08 48 85 c0 48 89 02 74 04 <48> 89 50 08 
> 41 f6 47 2a 20 49 c7 47 08 00 00 00 00 48 89 df 48 
>  > 
>  > The timer which expires has timer->entry.next == POISON2 !
>  > 
>  > it's a classic list corruption.  The
>  > bad news is that there is no trace of the culprit because that happens when
>  > some other timer expires after some random amount of time.
>  > 
>  > If that is reproducible, then please enable debugobjects. That should
>  > pinpoint the culprit.
> 
> It's net/smc.  This recently had a similar bug with workqueues. 
> (https://marc.info/?l=linux-kernel&m=148821582909541) fixed by
> 637fdbae60d6cb9f6e963c1079d7e0445c86ff7d

Fixed? It's not fixed by that commit. The workqueue code merely got a new
WARN_ON_ONCE(). But the underlying problem is still unfixed in net/smc.

> so it's probably unsurprising that there are similar issues.

That one is related to workqueues:

> WARNING: CPU: 0 PID: 2430 at lib/debugobjects.c:289 
> debug_print_object+0x87/0xb0
> ODEBUG: free active (active state 0) object type: timer_list hint: 
> delayed_work_timer_fn+0x0/0x20

delayed_work_timer_fn() is what queues the work once the timer expires.

> CPU: 0 PID: 2430 Comm: trinity-c4 Not tainted 4.11.0-rc3-think+ #3 
> Call Trace:
>  dump_stack+0x68/0x93
>  __warn+0xcb/0xf0
>  warn_slowpath_fmt+0x5f/0x80
>  ? debug_check_no_obj_freed+0xd9/0x260
>  debug_print_object+0x87/0xb0
>  ? work_on_cpu+0xd0/0xd0
>  debug_check_no_obj_freed+0x219/0x260
>  ? __sk_destruct+0x10d/0x1c0
>  kmem_cache_free+0x9f/0x370
>  __sk_destruct+0x10d/0x1c0
>  sk_destruct+0x20/0x30
>  __sk_free+0x43/0xa0
>  sk_free+0x18/0x20

smc_release does at the end of the function:

	if (smc->use_fallback) {
		schedule_delayed_work(&smc->sock_put_work, TCP_TIMEWAIT_LEN);
	} else if (sk->sk_state == SMC_CLOSED) {
		smc_conn_free(&smc->conn);
		schedule_delayed_work(&smc->sock_put_work,
				      SMC_CLOSE_SOCK_PUT_DELAY);
	}
	sk->sk_prot->unhash(sk);
	release_sock(sk);

	sock_put(sk);

sock_put(sk)
{
	if (atomic_dec_and_test(&sk->sk_refcnt))
		sk_free(sk);
}

That means either smc_release() queued delayed work or it was already
queued.

But in neither case it holds an extra refcount on sk. Otherwise sock_put()
would not end up in sk_free().
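The reference-counting rule described above can be sketched in plain userspace C. This is only a toy model under stated assumptions: `toy_sock`, `toy_schedule_work()` and the other `toy_*` names are hypothetical, the counter is a plain non-atomic int, and nothing here is the actual net/smc code or a proposed fix. It just shows that if pending deferred work owns a reference, the final put in the release path cannot free the object underneath the timer.

```c
#include <stdbool.h>

/* Toy object standing in for a socket; all names here are hypothetical. */
struct toy_sock {
	int refcnt;		/* plain int: single-threaded illustration only */
	bool work_pending;	/* deferred work is queued and not yet run */
	bool freed;		/* set when the last reference is dropped */
};

static void toy_sock_hold(struct toy_sock *sk)
{
	sk->refcnt++;
}

static void toy_sock_put(struct toy_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = true;	/* stands in for sk_free() */
}

/* Queue deferred work; the pending work owns one reference. */
static void toy_schedule_work(struct toy_sock *sk)
{
	if (!sk->work_pending) {
		toy_sock_hold(sk);
		sk->work_pending = true;
	}
}

/* Body of the deferred work: drops the reference taken when queued. */
static void toy_work_fn(struct toy_sock *sk)
{
	sk->work_pending = false;
	toy_sock_put(sk);
}

/* Release path: queue the work, then drop the caller's own reference. */
static void toy_release(struct toy_sock *sk)
{
	toy_schedule_work(sk);
	toy_sock_put(sk);
}
```

Starting from a refcount of 1, calling `toy_release()` leaves the object alive with the work pending, and only `toy_work_fn()` frees it. Without the hold in `toy_schedule_work()`, the put in `toy_release()` would free the object while the work is still queued, which is the shape of the problem in the trace.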

Thanks,

tglx

Re: [PATCH-v5 0/4] vsock: cancel connect packets when failing to connect

2017-03-21 Thread David Miller

From: Peng Tao 
Date: Wed, 15 Mar 2017 09:32:13 +0800

> Currently, if a connect call fails on a signal or timeout (e.g., guest is 
> still
> in the process of starting up), we'll just return to caller and leave the 
> connect
> packet queued and they are sent even though the connection is considered a 
> failure,
> which can confuse applications with unwanted false connect attempt.
> 
> The patchset enables vsock (both host and guest) to cancel queued packets when
> a connect attempt is considered to fail.
> 
> v5 changelog:
>   - change virtio_vsock_pkt->cancel_token back to virtio_vsock_pkt->vsk
> v4 changelog:
>   - drop two unnecessary void * cast
>   - update new callback comment
> v3 changelog:
>   - define cancel_pkt callback in struct vsock_transport rather than struct 
> virtio_transport
>   - rename virtio_vsock_pkt->vsk to virtio_vsock_pkt->cancel_token
> v2 changelog:
>   - fix queued_replies counting and resume tx/rx when necessary

Series applied, thanks.

Re: [PATCH 00/22] Netfilter/IPVS updates for net-next

2017-03-21 Thread David Miller

From: Pablo Neira Ayuso 
Date: Mon, 20 Mar 2017 11:08:28 +0100

> The following patchset contains Netfilter/IPVS updates for your
> net-next tree. A couple of new features for nf_tables, and unsorted
> cleanups and incremental updates for the Netfilter tree. More
> specifically, they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Pulled, thanks a lot Pablo.

RE: [net-next 05/13] i40e: rework exit flow of i40e_add_fdir_ethtool

2017-03-21 Thread Keller, Jacob E

> -Original Message-
> From: Sergei Shtylyov [mailto:sergei.shtyl...@cogentembedded.com]
> Sent: Tuesday, March 21, 2017 3:11 AM
> To: Kirsher, Jeffrey T ; da...@davemloft.net
> Cc: Keller, Jacob E ; netdev@vger.kernel.org;
> nhor...@redhat.com; sassm...@redhat.com; jogre...@redhat.com
> Subject: Re: [net-next 05/13] i40e: rework exit flow of i40e_add_fdir_ethtool
> 
> Hello!
> 
> On 3/21/2017 2:47 AM, Jeff Kirsher wrote:
> 
> > From: Jacob Keller 
> >
> > Refactor the exit flow of the i40e_add_fdir_ethtool function. Move the
> > input_label to the end of the function, removing the dependency on
> 
> I don't see 'input_label' anywhere. Perhaps 'free_input' label was meant?
> 

You're correct. 

Thanks,
Jake

Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-21 Thread Eric Dumazet

On Tue, 2017-03-21 at 13:49 -0700, Kees Cook wrote:

> Yeah, this is exactly what I'd like to find as well. Just comparing
> cycles between refcount implementations, while interesting, doesn't
> show us real-world performance changes, which is what we need to
> measure.
> 
> Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
> elsewhere in this email thread) real-world meaningful enough?

Not at all ;)

This was targeting the specific change I had in mind for
ip_idents_reserve(), which is not used by TCP flows.

Unfortunately there is no good test simulating real-world workloads,
which are mostly using TCP flows.

Most synthetic tools you can find are not using epoll(), and very often
hit bottlenecks in other layers.

It looks like our suggestion to get kernel builds with atomic_inc()
being exactly an atomic_inc() is not even discussed or implemented.

Coding this would require less time than running a typical Google kernel
qualification (roughly one month, thousands of hosts..., days of SWE).
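For readers following the cost argument, the two increment styles being compared in this thread can be sketched with C11 atomics in userspace. This is an illustration only: `plain_inc()` and `checked_inc()` are hypothetical names, the memory ordering is relaxed for simplicity, and the checked variant is only loosely in the spirit of a refcount that refuses to increment from zero or past saturation. It is not the kernel's refcount_t implementation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Direct increment: a single atomic read-modify-write, no checks. */
static void plain_inc(atomic_uint *v)
{
	atomic_fetch_add_explicit(v, 1, memory_order_relaxed);
}

/*
 * Checked increment built on a compare-and-exchange loop.
 * Returns false, leaving the counter untouched, if it is 0 or UINT_MAX.
 */
static bool checked_inc(atomic_uint *v)
{
	unsigned int old = atomic_load_explicit(v, memory_order_relaxed);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;
		/* On failure the CAS reloads 'old' and the checks run again. */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}
```

The loop form pays for a load plus a compare-and-exchange that may retry under contention, while the direct form is one instruction on most architectures. That difference is what the cycle counts quoted in this thread measure, and it says nothing by itself about end-to-end throughput.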

Re: [PATCH iproute2 1/1] actions: Add support for user cookies

2017-03-21 Thread Stephen Hemminger

Minor style issues.

> + if (*argv && strcmp(*argv, "cookie") == 0) {
> + int slen;
slen is strlen() and that returns size_t not int.

> +
> + NEXT_ARG();
> + slen = strlen(*argv);
> + if (slen > (TC_COOKIE_MAX_SIZE*2))

No extra (), and space around *

> + invarg("cookie cannot exceed %d\n",
> +*argv);
> +
> + if (hex2mem(*argv, act_ck, slen/2) < 0)
Space around / operator

> + invarg("cookie must be a hex string\n",
> +*argv);
> +
> + act_ck_len = slen;
> + argc--;
> + argv++;
> + }
> +
> + if (act_ck_len)
> + addattr_l(n, MAX_MSG, TCA_ACT_COOKIE,
> +   (const void *)&act_ck, act_ck_len);

Cast to void *  is not necessary.
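A standalone sketch of the cookie parsing with these style points applied might look like the following. It is hedged: `parse_cookie()`, `hex_nibble()` and `COOKIE_MAX_SIZE` are hypothetical stand-ins written for this illustration, not iproute2's hex2mem() or TC_COOKIE_MAX_SIZE. The sketch also makes two choices of its own that the quoted patch does not: it rejects odd-length strings, and it reports the length in bytes (half the string length).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define COOKIE_MAX_SIZE 16	/* assumed: 128 bits, as in the patch description */

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Parse a hex string into buf, which must hold COOKIE_MAX_SIZE bytes.
 * Returns the number of bytes written, or -1 if the string is empty,
 * has odd length, is too long, or contains a non-hex character.
 */
static int parse_cookie(const char *str, unsigned char *buf)
{
	size_t slen = strlen(str);	/* size_t, matching strlen() */
	size_t i;

	if (slen == 0 || slen % 2 != 0 || slen > COOKIE_MAX_SIZE * 2)
		return -1;

	for (i = 0; i < slen / 2; i++) {
		int hi = hex_nibble(str[2 * i]);
		int lo = hex_nibble(str[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		buf[i] = (unsigned char)((hi << 4) | lo);
	}
	return (int)(slen / 2);
}
```

With the sample cookie from the patch description, `parse_cookie("a1b2c3d4", buf)` fills four bytes.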

Re: [net-next 00/13][pull request] 1GbE Intel Wired LAN Driver Updates 2017-03-17

2017-03-21 Thread David Miller

From: Jeff Kirsher 
Date: Fri, 17 Mar 2017 12:58:05 -0700

> This series contains updates to mainly igb, with one fix for ixgbe.

Pulled, thanks Jeff.

[PATCH net] net: bcmgenet: remove bcmgenet_internal_phy_setup()

2017-03-21 Thread Doug Berger

Commit 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset")
removed the bcmgenet_mii_reset() function from bcmgenet_power_up() and
bcmgenet_internal_phy_setup() functions.  In so doing it broke the reset
of the internal PHY devices used by the GENETv1-GENETv3 which required
this reset before the UniMAC was enabled.  It also broke the internal
GPHY devices used by the GENETv4 because the config_init that installed
the AFE workaround was no longer occurring after the reset of the GPHY
performed by bcmgenet_phy_power_set() in bcmgenet_internal_phy_setup().
In addition the code in bcmgenet_internal_phy_setup() related to the
"enable APD" comment goes with the bcmgenet_mii_reset() so it should
have also been removed.

Commit bd4060a6108b ("net: bcmgenet: Power on integrated GPHY in
bcmgenet_power_up()") moved the bcmgenet_phy_power_set() call to the
bcmgenet_power_up() function, but failed to remove it from the
bcmgenet_internal_phy_setup() function.  Had it done so, the
bcmgenet_internal_phy_setup() function would have been empty and could
have been removed at that time.

Commit 5dbebbb44a6a ("net: bcmgenet: Software reset EPHY after power on")
was submitted to correct the functional problems introduced by
commit 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset"). It
was included in v4.4 and made available on 4.3-stable. Unfortunately,
it didn't fully revert the commit because this bcmgenet_mii_reset()
doesn't apply the soft reset to the internal GPHY used by GENETv4 like
the previous one did. This prevents the restoration of the AFE
workarounds for internal GPHY devices after the bcmgenet_phy_power_set() in
bcmgenet_internal_phy_setup().

This commit takes the alternate approach of removing the unnecessary
bcmgenet_internal_phy_setup() function which shouldn't have been in v4.3
so that when bcmgenet_mii_reset() was restored it should have only gone
into bcmgenet_power_up().  This will avoid the problems while also
removing the redundancy (and hopefully some of the confusion).

Fixes: 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset")
Signed-off-by: Doug Berger 
---
 drivers/net/ethernet/broadcom/genet/bcmmii.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c 
b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index e87607621e62..2f9281936f0e 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -220,20 +220,6 @@ void bcmgenet_phy_power_set(struct net_device *dev, bool 
enable)
udelay(60);
 }
 
-static void bcmgenet_internal_phy_setup(struct net_device *dev)
-{
-   struct bcmgenet_priv *priv = netdev_priv(dev);
-   u32 reg;
-
-   /* Power up PHY */
-   bcmgenet_phy_power_set(dev, true);
-   /* enable APD */
-   reg = bcmgenet_ext_readl(priv, EXT_EXT_PWR_MGMT);
-   reg |= EXT_PWR_DN_EN_LD;
-   bcmgenet_ext_writel(priv, reg, EXT_EXT_PWR_MGMT);
-   bcmgenet_mii_reset(dev);
-}
-
 static void bcmgenet_moca_phy_setup(struct bcmgenet_priv *priv)
 {
u32 reg;
@@ -281,7 +267,6 @@ int bcmgenet_mii_config(struct net_device *dev)
 
if (priv->internal_phy) {
phy_name = "internal PHY";
-   bcmgenet_internal_phy_setup(dev);
} else if (priv->phy_interface == PHY_INTERFACE_MODE_MOCA) {
phy_name = "MoCA";
bcmgenet_moca_phy_setup(priv);
-- 
2.11.1

Re: [PATCH 07/17] net: convert sock.sk_refcnt from atomic_t to refcount_t

2017-03-21 Thread Kees Cook

On Mon, Mar 20, 2017 at 6:40 AM, Peter Zijlstra  wrote:
> On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
>> On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
>> >
>> > So what bench/setup do you want ran?
>>
>> You can start by counting how many cycles an atomic op takes
>> vs. how many cycles this new code takes.
>
> On what uarch?
>
> I think I tested hand coded asm version and it ended up about double the
> cycles for a cmpxchg loop vs the direct instruction on an IVB-EX (until
> the memory bus saturated, at which point they took the same). Newer
> parts will of course have different numbers,
>
> Can't we run some iperf on a 40gbe fiber loop or something? It would be
> very useful to have an actual workload we can run.

Yeah, this is exactly what I'd like to find as well. Just comparing
cycles between refcount implementations, while interesting, doesn't
show us real-world performance changes, which is what we need to
measure.

Is Eric's "20 concurrent 'netperf -t UDP_STREAM'" example (from
elsewhere in this email thread) real-world meaningful enough?

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCH v2 net-next] net: fix dma operation mode config for older versions

2017-03-21 Thread Thierry Reding

Resending with Sergei's proper email address.

On Tue, Mar 21, 2017 at 06:02:36PM +, Joao Pinto wrote:
> This patch fixes a bug introduced in:
> commit 6deee2221e11 ("net: stmmac: prepare dma op mode config for multiple
> queues")
> 
> The dma operation mode configuration routine was wrongly moved to a
> function (stmmac_mtl_configuration) that is only executed if the
> core version is >= 4.00.
> 
> Reported-by: Corentin Labbe 
> Signed-off-by: Joao Pinto 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

With Florian's comments addressed, this is:

Reviewed-by: Thierry Reding 


[patch -next] net: dwc-xlgmac: fix an error code in xlgmac_alloc_pages()

2017-03-21 Thread Dan Carpenter

The dma_mapping_error() returns true if there is an error but we want
to return -ENOMEM and not 1.

Fixes: 65e0ace2c5cd ("net: dwc-xlgmac: Initial driver for DesignWare Enterprise 
Ethernet")
Signed-off-by: Dan Carpenter 

diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c 
b/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
index 55c796ed7d26..39b5cb967bba 100644
--- a/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
+++ b/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
@@ -335,7 +335,6 @@ static int xlgmac_alloc_pages(struct xlgmac_pdata *pdata,
 {
struct page *pages = NULL;
dma_addr_t pages_dma;
-   int ret;
 
/* Try to obtain pages, decreasing order if necessary */
gfp |= __GFP_COLD | __GFP_COMP | __GFP_NOWARN;
@@ -352,10 +351,9 @@ static int xlgmac_alloc_pages(struct xlgmac_pdata *pdata,
/* Map the pages */
pages_dma = dma_map_page(pdata->dev, pages, 0,
 PAGE_SIZE << order, DMA_FROM_DEVICE);
-   ret = dma_mapping_error(pdata->dev, pages_dma);
-   if (ret) {
+   if (dma_mapping_error(pdata->dev, pages_dma)) {
put_page(pages);
-   return ret;
+   return -ENOMEM;
}
 
pa->pages = pages;
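Why the return value matters can be shown with a small userspace sketch. Everything here is hypothetical and only mimics the shape of the change: `toy_mapping_error()` stands in for a check that reports failure as a true/false value, and `caller_sees_failure()` models a caller that treats only negative values as errors, which is an assumption about the callers rather than something taken from the driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mapping check that returns true on error. */
static bool toy_mapping_error(unsigned long addr)
{
	return addr == 0;
}

/* Old pattern: propagates the boolean, so a failure is reported as 1. */
static int toy_map_old(unsigned long addr)
{
	int ret = toy_mapping_error(addr);

	if (ret)
		return ret;
	return 0;
}

/* New pattern: translates the boolean into a negative errno value. */
static int toy_map_new(unsigned long addr)
{
	if (toy_mapping_error(addr))
		return -ENOMEM;
	return 0;
}

/* Assumed caller convention: only negative return values mean failure. */
static bool caller_sees_failure(int ret)
{
	return ret < 0;
}
```

Under that caller convention the old pattern's return value of 1 is not recognized as a failure, while the new pattern's -ENOMEM is.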

Re: [PATCH v2 net-next] net: fix dma operation mode config for older versions

2017-03-21 Thread Thierry Reding

On Tue, Mar 21, 2017 at 06:00:53PM +, Joao Pinto wrote:
> This patch fixes a bug introduced in:
> commit 6deee2221e11 ("net: stmmac: prepare dma op mode config for multiple
> queues")
> 
> The dma operation mode configuration routine was wrongly moved to a
> function (stmmac_mtl_configuration) that is only executed if the
> core version is >= 4.00.
> 
> Reported-by: Corentin Labbe 
> Signed-off-by: Joao Pinto 
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

With Florian's comments addressed, this is:

Reviewed-by: Thierry Reding 


Re: [PATCH net-next 1/4] drivers: net: xgene-v2: Add MDIO support

2017-03-21 Thread Andrew Lunn

> @@ -511,9 +512,9 @@ static int xge_close(struct net_device *ndev)
>  {
>   struct xge_pdata *pdata = netdev_priv(ndev);
>  
> - netif_carrier_off(ndev);
>   netif_stop_queue(ndev);
>   xge_mac_disable(pdata);
> + phy_stop(ndev->phydev);
>  
>   xge_intr_disable(pdata);
>   xge_free_irq(ndev);
> @@ -683,9 +684,14 @@ static int xge_probe(struct platform_device *pdev)
>   if (ret)
>   goto err;
>  
> + spin_lock_init(&pdata->mdio_lock);
> +

...

> +static int xge_mdio_write(struct mii_bus *bus, int phy_id, int reg, u16 data)
> +{
> + struct xge_pdata *pdata = bus->priv;
> + u32 done, val = 0;
> + u8 wait = 10;
> + int ret = 0;
> +
> + spin_lock(&pdata->mdio_lock);
> +
> + SET_REG_BITS(&val, PHY_ADDR, phy_id);
> + SET_REG_BITS(&val, REG_ADDR, reg);
> + xge_wr_csr(pdata, MII_MGMT_ADDRESS, val);
> +
> + xge_wr_csr(pdata, MII_MGMT_CONTROL, data);
> + do {
> + usleep_range(5, 10);
> + done = xge_rd_csr(pdata, MII_MGMT_INDICATORS);
> + } while ((done & MII_MGMT_BUSY) && wait--);
> +
> + if (done & MII_MGMT_BUSY) {
> + dev_err(>dev, "MII_MGMT write failed\n");
> + ret = -ETIMEDOUT;
> + }
> +
> + spin_unlock(&pdata->mdio_lock);
> +
> + return ret;
> +}
> +
> +static int xge_mdio_read(struct mii_bus *bus, int phy_id, int reg)
> +{
> + struct xge_pdata *pdata = bus->priv;
> + u32 data, done, val = 0;
> + u8 wait = 10;
> +
> + spin_lock(&pdata->mdio_lock);
> +

Hi Iyappan

Please could you explain what this lock is protecting which the
mii_bus mdio_lock in mdio_bus.c is not protecting?

Thanks
Andrew
