[PATCH net v2] tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs

2016-02-03 Thread skallam
From: Siva Reddy Kallam 

tg3_tso_bug() can hit a condition where the entire tx ring is not big
enough to segment the GSO packet. For example, if MSS is very small,
gso_segs can exceed the tx ring size. When we hit the condition, it
will cause tx timeout.

tg3_tso_bug() is called to handle TSO and DMA hardware bugs.
For TSO bugs, if tg3_tso_bug() cannot succeed, we have to drop the packet.
For DMA bugs, we can still fall back to linearize the SKB and let the
hardware transmit the TSO packet.

This patch adds a function tg3_tso_bug_gso_check() to check if there
are enough tx descriptors for GSO before calling tg3_tso_bug().
The caller will then handle the error appropriately - drop or
linearize the SKB.

v2: Corrected patch description to avoid confusion.

Signed-off-by: Siva Reddy Kallam 
Signed-off-by: Michael Chan 
Acked-by: Prashant Sreedharan 
---
 drivers/net/ethernet/broadcom/tg3.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 9293675..eb3d347 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -7831,6 +7831,14 @@ static int tigon3_dma_hwbug_workaround(struct tg3_napi *tnapi,
return ret;
 }
 
+static bool tg3_tso_bug_gso_check(struct tg3_napi *tnapi, struct sk_buff *skb)
+{
+   /* Check if we will never have enough descriptors,
+* as gso_segs can be more than current ring size
+*/
+   return skb_shinfo(skb)->gso_segs < tnapi->tx_pending / 3;
+}
+
 static netdev_tx_t tg3_start_xmit(struct sk_buff *, struct net_device *);
 
 /* Use GSO to workaround all TSO packets that meet HW bug conditions
@@ -7934,14 +7942,19 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 * vlan encapsulated.
 */
if (skb->protocol == htons(ETH_P_8021Q) ||
-   skb->protocol == htons(ETH_P_8021AD))
-   return tg3_tso_bug(tp, tnapi, txq, skb);
+   skb->protocol == htons(ETH_P_8021AD)) {
+   if (tg3_tso_bug_gso_check(tnapi, skb))
+   return tg3_tso_bug(tp, tnapi, txq, skb);
+   goto drop;
+   }
 
if (!skb_is_gso_v6(skb)) {
if (unlikely((ETH_HLEN + hdr_len) > 80) &&
-   tg3_flag(tp, TSO_BUG))
-   return tg3_tso_bug(tp, tnapi, txq, skb);
-
+   tg3_flag(tp, TSO_BUG)) {
+   if (tg3_tso_bug_gso_check(tnapi, skb))
+   return tg3_tso_bug(tp, tnapi, txq, skb);
+   goto drop;
+   }
ip_csum = iph->check;
ip_tot_len = iph->tot_len;
iph->check = 0;
@@ -8073,7 +8086,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
if (would_hit_hwbug) {
tg3_tx_skb_unmap(tnapi, tnapi->tx_prod, i);
 
-   if (mss) {
+   if (mss && tg3_tso_bug_gso_check(tnapi, skb)) {
/* If it's a TSO packet, do GSO instead of
 * allocating and copying to a large linear SKB
 */
-- 
1.9.1



Re: ixgbe: get link speed as a slave nic unrelated with link

2016-02-03 Thread zhuyj

Hi, Emil

Thanks for your reply.

I ran some simple tests, and this patch may work. Since you can
reproduce the problem, would you be willing to test this patch?

If this patch fixes the problem, that confirms the root cause, and we
can then look for another solution to fix it.

If this patch cannot fix the problem, we should investigate further
to find the root cause.

Thanks a lot.
Zhu Yanjun

On 02/01/2016 11:53 PM, Tantilov, Emil S wrote:

-Original Message-
From: zyjzyj2...@gmail.com [mailto:zyjzyj2...@gmail.com]
Sent: Sunday, January 31, 2016 11:28 PM
To: zyjzyj2...@gmail.com; Tantilov, Emil S; Schmitt, Phillip J; Kirsher,
Jeffrey T; netdev@vger.kernel.org; e1000-de...@lists.sourceforge.net;
Shteinbock, Boris (Wind River)
Subject: ixgbe: get link speed as a slave nic unrelated with link


Hi, Emil

Thanks for your patch.
After I applied your patch, I got the following feedback from my users.

"
Users had tested the latest patch that you provided and it is much improved
now. However it’s still not good enough as the users are planning field
deployment. Here are their findings:

So close, but not quite 100%. I did run over 2500 re-negotiations on one
interface of a bonded pair and got the 0 MBps status total of three times.
The longest run without single error was something like 1800 re-
negotiations or so. So, this version seems to improve the situation
immensely (the unpatched driver fails like 25% of the time), but there
still seems to remain some tiny race somewhere.

Yes, at the time the bonding interface comes up there can be a message
about 0 Mbps in dmesg; however, the actual bond, once fully up, will
have the correct speeds, as seen by:
#cat /proc/net/bonding/bond0

Thanks,
Emil






Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe

2016-02-03 Thread Or Gerlitz

On 2/3/2016 12:21 PM, John Fastabend wrote:

Thanks, we will need at least a v2 to fixup some build errors
with various compile flags caught by build_bot and missed by me.

Hi John,

You didn't mark that as RFC... but we said this direction/approach is yet
to be discussed @ netdev next week, so.. can you clarify?

I suggest not rushing to ask for this to be pulled; let's have the tc
workshop beforehand...


Please add to v2 a listing of changes from V0/V1 to assist with the review.

thanks,

Or.


[patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it

2016-02-03 Thread Jiri Pirko
From: Jiri Pirko 

There is a need for some userspace API that would allow exposing things
that are not directly related to any device class like net_device or
ib_device, but rather chip-wide/switch-ASIC-wide stuff.

Use cases:
1) get/set of port type (Ethernet/InfiniBand)
2) monitoring of hardware messages to and from chip
3) setting up port splitters - split port into multiple ones and squash again,
   enables usage of splitter cable
4) setting up shared buffers - shared among multiple ports within one chip

First patch of this set introduces a new generic Netlink based interface,
called "devlink". It is similar to nl80211 model and it is heavily
influenced by it, including the API definition. The devlink introduction patch
implements use cases 1) and 2). Other 2 are in development atm and will
be addressed by follow-ups.

It is very convenient for drivers to use devlink, as you can see in other
patches in this set.

Counterpart for devlink is userspace tool called "dl". Command line interface
and outputs are derived from "ip" tool so it should be easy for users to get
used to it.

It is available here:
https://github.com/jpirko/devlink

Example usage:
butter:~$ dl help
Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
where  OBJECT := { dev | port | monitor }
   OPTIONS := { -v/--verbose }

butter:~$ dl dev show
0: devlink0: bus pci dev :01:00.0

butter:~$ dl port help
Usage: dl port show [DEV/PORT_INDEX]
Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]

butter:~$ dl port show
devlink0/1: type ib ibdev mlx4_0
devlink0/2: type ib ibdev mlx4_0

butter:~$ sudo dl port set devlink0/1 type eth

butter:~$ dl port show
devlink0/1: type eth netdev ens4
devlink0/2: type ib ibdev mlx4_0

butter:~$ sudo dl port set devlink0/1 type auto

butter:~$ dl port show
devlink0/1: type eth(auto) netdev ens4
devlink0/2: type ib ibdev mlx4_0

Jiri Pirko (6):
  Introduce devlink infrastructure
  mlxsw: Implement devlink interface
  mlxsw: Implement hardware messages notification using devlink
  mlx4: Implement devlink interface
  mlx4: Implement hardware messages notification using devlink
  mlx4: Implement port type setting via devlink interface

 MAINTAINERS|   8 +
 drivers/infiniband/hw/mlx4/main.c  |   7 +
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   8 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c|   9 +
 drivers/net/ethernet/mellanox/mlx4/intf.c  |   9 +
 drivers/net/ethernet/mellanox/mlx4/main.c  | 129 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |   2 +
 drivers/net/ethernet/mellanox/mlxsw/core.c |  39 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  20 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   2 +
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c |  20 +
 include/linux/mlx4/driver.h|   3 +
 include/net/devlink.h  | 152 +
 include/uapi/linux/devlink.h   |  84 +++
 net/Kconfig|   7 +
 net/core/Makefile  |   1 +
 net/core/devlink.c | 856 +
 17 files changed, 1313 insertions(+), 43 deletions(-)
 create mode 100644 include/net/devlink.h
 create mode 100644 include/uapi/linux/devlink.h
 create mode 100644 net/core/devlink.c

-- 
1.9.3



[RESEND PATCH iproute2] tc: fix compilation with old gcc (< 4.6) (bis)

2016-02-03 Thread Nicolas Dichtel
Commit 8f80d450c3cb ("tc: fix compilation with old gcc (< 4.6)") was reverted
to ease the merge of the net-next branch.

Here is the new version.

Signed-off-by: Nicolas Dichtel 
Signed-off-by: Daniel Borkmann 
---
 tc/tc_bpf.c | 58 +-
 1 file changed, 33 insertions(+), 25 deletions(-)

diff --git a/tc/tc_bpf.c b/tc/tc_bpf.c
index 42c8841869f5..219ffa582c1a 100644
--- a/tc/tc_bpf.c
+++ b/tc/tc_bpf.c
@@ -49,6 +49,10 @@
 #include "tc_util.h"
 #include "tc_bpf.h"
 
+#ifndef AF_ALG
+#define AF_ALG 38
+#endif
+
 #ifdef HAVE_ELF
 static int bpf_obj_open(const char *path, enum bpf_prog_type type,
const char *sec, bool verbose);
@@ -81,12 +85,13 @@ static int bpf(int cmd, union bpf_attr *attr, unsigned int size)
 static int bpf_map_update(int fd, const void *key, const void *value,
  uint64_t flags)
 {
-   union bpf_attr attr = {
-   .map_fd = fd,
-   .key= bpf_ptr_to_u64(key),
-   .value  = bpf_ptr_to_u64(value),
-   .flags  = flags,
-   };
+   union bpf_attr attr;
+
+   memset(&attr, 0, sizeof(attr));
+   attr.map_fd = fd;
+   attr.key = bpf_ptr_to_u64(key);
+   attr.value = bpf_ptr_to_u64(value);
+   attr.flags = flags;
 
return bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
 }
@@ -745,12 +750,13 @@ static __check_format_string(1, 2) void bpf_dump_error(const char *format, ...)
 static int bpf_map_create(enum bpf_map_type type, unsigned int size_key,
  unsigned int size_value, unsigned int max_elem)
 {
-   union bpf_attr attr = {
-   .map_type   = type,
-   .key_size   = size_key,
-   .value_size = size_value,
-   .max_entries= max_elem,
-   };
+   union bpf_attr attr;
+
+   memset(&attr, 0, sizeof(attr));
+   attr.map_type = type;
+   attr.key_size = size_key;
+   attr.value_size = size_value;
+   attr.max_entries = max_elem;
 
return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
 }
@@ -758,15 +764,16 @@ static int bpf_map_create(enum bpf_map_type type, unsigned int size_key,
 static int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
 size_t size, const char *license)
 {
-   union bpf_attr attr = {
-   .prog_type  = type,
-   .insns  = bpf_ptr_to_u64(insns),
-   .insn_cnt   = size / sizeof(struct bpf_insn),
-   .license= bpf_ptr_to_u64(license),
-   .log_buf= bpf_ptr_to_u64(bpf_log_buf),
-   .log_size   = sizeof(bpf_log_buf),
-   .log_level  = 1,
-   };
+   union bpf_attr attr;
+
+   memset(&attr, 0, sizeof(attr));
+   attr.prog_type = type;
+   attr.insns = bpf_ptr_to_u64(insns);
+   attr.insn_cnt = size / sizeof(struct bpf_insn);
+   attr.license = bpf_ptr_to_u64(license);
+   attr.log_buf = bpf_ptr_to_u64(bpf_log_buf);
+   attr.log_size = sizeof(bpf_log_buf);
+   attr.log_level = 1;
 
if (getenv(BPF_ENV_NOLOG)) {
attr.log_buf= 0;
@@ -779,10 +786,11 @@ static int bpf_prog_load(enum bpf_prog_type type, const struct bpf_insn *insns,
 
 static int bpf_obj_pin(int fd, const char *pathname)
 {
-   union bpf_attr attr = {
-   .pathname   = bpf_ptr_to_u64(pathname),
-   .bpf_fd = fd,
-   };
+   union bpf_attr attr;
+
+   memset(&attr, 0, sizeof(attr));
+   attr.pathname = bpf_ptr_to_u64(pathname);
+   attr.bpf_fd = fd;
 
return bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
 }
-- 
2.4.2



[PATCH net] enic: increment devcmd2 result ring in case of timeout

2016-02-03 Thread Govindarajulu Varadarajan
From: Sandeep Pillai 

Firmware posts the devcmd result in result ring. In case of timeout, driver
does not increment the current result pointer and firmware could post the
result after timeout has occurred. During next devcmd, driver would be
reading the result of previous devcmd.

Fix this by incrementing result even in case of timeout.

Fixes: 373fb0873d43 ("enic: add devcmd2")
Signed-off-by: Sandeep Pillai 
Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com>
---
 drivers/net/ethernet/cisco/enic/enic.h |  2 +-
 drivers/net/ethernet/cisco/enic/vnic_dev.c | 19 ---
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 1671fa3..7ba6d53 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -33,7 +33,7 @@
 
 #define DRV_NAME   "enic"
 #define DRV_DESCRIPTION"Cisco VIC Ethernet NIC Driver"
-#define DRV_VERSION"2.3.0.12"
+#define DRV_VERSION"2.3.0.20"
 #define DRV_COPYRIGHT  "Copyright 2008-2013 Cisco Systems, Inc"
 
 #define ENIC_BARS_MAX  6
diff --git a/drivers/net/ethernet/cisco/enic/vnic_dev.c b/drivers/net/ethernet/cisco/enic/vnic_dev.c
index 1ffd105..1fdf5fe 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_dev.c
+++ b/drivers/net/ethernet/cisco/enic/vnic_dev.c
@@ -298,7 +298,8 @@ static int _vnic_dev_cmd2(struct vnic_dev *vdev, enum vnic_devcmd_cmd cmd,
  int wait)
 {
struct devcmd2_controller *dc2c = vdev->devcmd2;
-   struct devcmd2_result *result = dc2c->result + dc2c->next_result;
+   struct devcmd2_result *result;
+   u8 color;
unsigned int i;
int delay, err;
u32 fetch_index, new_posted;
@@ -336,13 +337,17 @@ static int _vnic_dev_cmd2(struct vnic_dev *vdev, enum vnic_devcmd_cmd cmd,
if (dc2c->cmd_ring[posted].flags & DEVCMD2_FNORESULT)
return 0;
 
+   result = dc2c->result + dc2c->next_result;
+   color = dc2c->color;
+
+   dc2c->next_result++;
+   if (dc2c->next_result == dc2c->result_size) {
+   dc2c->next_result = 0;
+   dc2c->color = dc2c->color ? 0 : 1;
+   }
+
for (delay = 0; delay < wait; delay++) {
-   if (result->color == dc2c->color) {
-   dc2c->next_result++;
-   if (dc2c->next_result == dc2c->result_size) {
-   dc2c->next_result = 0;
-   dc2c->color = dc2c->color ? 0 : 1;
-   }
+   if (result->color == color) {
if (result->error) {
err = result->error;
if (err != ERR_ECMDUNKNOWN ||
-- 
2.6.2



Re: [net-next PATCH 04/11] net: bulk alloc and reuse of SKBs in NAPI context

2016-02-03 Thread Jesper Dangaard Brouer
On Tue, 2 Feb 2016 16:52:50 -0800
Alexei Starovoitov  wrote:

> On Tue, Feb 02, 2016 at 10:12:01PM +0100, Jesper Dangaard Brouer wrote:
> > Think twice before applying
> >  - This patch can potentially introduce added latency in some workloads
> > 
> > This patch introduce bulk alloc of SKBs and allow reuse of SKBs
> > free'ed in same softirq cycle.  SKBs are normally free'ed during TX
> > completion, but most high speed drivers also cleanup TX ring during
> > NAPI RX poll cycle.  Thus, if using napi_consume_skb/__kfree_skb_defer,
> > SKBs will be avail in the napi_alloc_cache->skb_cache.
> > 
> > If no SKBs are avail for reuse, then only bulk alloc 8 SKBs, to limit
> > the potential overshooting unused SKBs needed to free'ed when NAPI
> > cycle ends (flushed in net_rx_action via __kfree_skb_flush()).
> > 
> > Benchmarking IPv4-forwarding, on CPU i7-4790K @4.2GHz (no turbo boost)
> > (GCC version 5.1.1 20150618 (Red Hat 5.1.1-4))
> >  Allocator SLUB:
> >   Single CPU/flow numbers: before: 2064446 pps -> after: 2083031 pps
> >   Improvement: +18585 pps, -4.3 nanosec, +0.9%
> >  Allocator SLAB:
> >   Single CPU/flow numbers: before: 2035949 pps -> after: 2033567 pps
> >   Regression: -2382 pps, +0.57 nanosec, -0.1 %
> > 
> > Even-though benchmarking does show an improvement for SLUB(+0.9%), I'm
> > not convinced bulk alloc will be a win in all situations:
> >  * I see stalls on walking the SLUB freelist (normal hidden by prefetch)
> >  * In case RX queue is not full, alloc and free more SKBs than needed
> > 
> > More testing is needed with more real life benchmarks.
> > 
> > Joint work with Alexander Duyck.
> > 
> > Signed-off-by: Jesper Dangaard Brouer 
> > Signed-off-by: Alexander Duyck   
> ...
> > -   skb = __build_skb(data, len);
> > -   if (unlikely(!skb)) {
> > +#define BULK_ALLOC_SIZE 8
> > +   if (!nc->skb_count) {
> > +   nc->skb_count = kmem_cache_alloc_bulk(skbuff_head_cache,
> > + gfp_mask, BULK_ALLOC_SIZE,
> > + nc->skb_cache);
> > +   }
> > +   if (likely(nc->skb_count)) {
> > +   skb = (struct sk_buff *)nc->skb_cache[--nc->skb_count];
> > +   } else {
> > +   /* alloc bulk failed */
> > skb_free_frag(data);
> > return NULL;
> > }
> >  
> > +   len -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +
> > +   memset(skb, 0, offsetof(struct sk_buff, tail));
> > +   skb->truesize = SKB_TRUESIZE(len);
> > +   atomic_set(&skb->users, 1);
> > +   skb->head = data;
> > +   skb->data = data;
> > +   skb_reset_tail_pointer(skb);
> > +   skb->end = skb->tail + len;
> > +   skb->mac_header = (typeof(skb->mac_header))~0U;
> > +   skb->transport_header = (typeof(skb->transport_header))~0U;
> > +
> > +   /* make sure we initialize shinfo sequentially */
> > +   shinfo = skb_shinfo(skb);
> > +   memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
> > +   atomic_set(&shinfo->dataref, 1);
> > +   kmemcheck_annotate_variable(shinfo->destructor_arg);  
> 
> copy-pasting from __build_skb()...
> Either new helper is needed or extend __build_skb() to take
> pre-allocated 'raw_skb' pointer.

Guess I should have created a helper for the basic skb setup.  I just kept
it here, as I was planning to do tricks like removing the memset, as
discussed below.

> Overall I like the first 3 patches. I think they're useful
> on their own.

Great, hope they can go into net-next soon.

> As far as bulk alloc... have you considered splitting
> bulk alloc of skb and init of skb?

I have given the SKB memset a lot of thought, as it shows up quite high
in profiles: how can we either 1) avoid clearing this much, or 2) speed
up clearing by clearing more (full pages in the allocator).

My idea (2) is about clearing larger memory chunks, as the
memset/rep-stos operation has a fixed startup cost.  But do it inside
the slab allocator's bulk alloc call.  During bulk alloc we can
identify objects that are "next-to-each-other" and exec one rep-stos
operation.  My measurements show we need 3x 256-byte objects cleared
together before it is a win.

*BUT* I actually like your idea better, below, of delaying the init
... more comments below.

> Like in the above
> + skb = (struct sk_buff *)nc->skb_cache[--nc->skb_count];
> will give cold pointer and first memset() will be missing cache.
> Either prefetch is needed the way slab_alloc_node() is doing
> in the line prefetch_freepointer(s, next_object);

I could prefetch the next elem in skb_cache[]. I have been playing with
that.  It didn't have much effect, as the bulk alloc (at least for
SLUB) has already walked the memory.  It helped a little to prefetch
the 3rd cache-line, on the memset speed... (but so small that it was
difficult to measure with enough confidence).

> or buld_alloc_skb and bulk_init_skb need to be two loops
> driven by drivers.
> Another idea is we can move 

Re: [PATCH v3] net:Add sysctl_max_skb_frags

2016-02-03 Thread Herbert Xu
On Wed, Feb 03, 2016 at 09:26:57AM +0100, Hans Westgaard Ry wrote:
> Devices may have limits on the number of fragments in an skb they support.
> Current codebase uses a constant as maximum for number of fragments one
> skb can hold and use.
> When enabling scatter/gather and running traffic with many small messages
> the codebase uses the maximum number of fragments and may thereby violate
> the max for certain devices.
> The patch introduces a global variable as max number of fragments.
> 
> Signed-off-by: Hans Westgaard Ry 
> Reviewed-by: Håkon Bugge 

I have to say this seems rather dirty.  I mean if taken to the
extreme wouldn't this mean that we should disable frags altogether
if some NIC can't handle them at all?

Someone suggested earlier to partially linearise the skb, why
couldn't we do that? IOW let's handle this craziness in the crazy
drivers and not in the general stack.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH net-next 2/4] drivers: net: xgene: Add support for RSS

2016-02-03 Thread kbuild test robot
Hi Iyappan,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Iyappan-Subramanian/Add-support-for-Classifier-and-RSS/20160203-155838
config: i386-allmodconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/apm/xgene/xgene_enet_cle.c: In function 'xgene_enet_cle_init':
>> drivers/net/ethernet/apm/xgene/xgene_enet_cle.c:729:1: warning: the frame size of 1096 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^

vim +729 drivers/net/ethernet/apm/xgene/xgene_enet_cle.c

bdc3a65f Iyappan Subramanian 2016-02-02  713      dbptr[DB_RES_DROP].drop = 1;
bdc3a65f Iyappan Subramanian 2016-02-02  714  
bdc3a65f Iyappan Subramanian 2016-02-02  715      memset(&kn, 0, sizeof(kn));
bdc3a65f Iyappan Subramanian 2016-02-02  716      kn.node_type = KN;
bdc3a65f Iyappan Subramanian 2016-02-02  717      kn.num_keys = 1;
bdc3a65f Iyappan Subramanian 2016-02-02  718      kn.key[0].priority = 0;
bdc3a65f Iyappan Subramanian 2016-02-02  719      kn.key[0].result_pointer = DB_RES_ACCEPT;
bdc3a65f Iyappan Subramanian 2016-02-02  720  
bdc3a65f Iyappan Subramanian 2016-02-02  721      ptree->dn = ptree_dn;
bdc3a65f Iyappan Subramanian 2016-02-02  722      ptree->kn = &kn;
bdc3a65f Iyappan Subramanian 2016-02-02  723      ptree->dbptr = dbptr;
bdc3a65f Iyappan Subramanian 2016-02-02  724      ptree->num_dn = MAX_NODES;
bdc3a65f Iyappan Subramanian 2016-02-02  725      ptree->num_kn = 1;
bdc3a65f Iyappan Subramanian 2016-02-02  726      ptree->num_dbptr = DB_MAX_PTRS;
bdc3a65f Iyappan Subramanian 2016-02-02  727  
bdc3a65f Iyappan Subramanian 2016-02-02  728      return xgene_cle_setup_ptree(pdata, enet_cle);
bdc3a65f Iyappan Subramanian 2016-02-02 @729  }
bdc3a65f Iyappan Subramanian 2016-02-02  730  
bdc3a65f Iyappan Subramanian 2016-02-02  731  struct xgene_cle_ops xgene_cle3in_ops = {
bdc3a65f Iyappan Subramanian 2016-02-02  732      .cle_init = xgene_enet_cle_init,
bdc3a65f Iyappan Subramanian 2016-02-02  733  };

:: The code at line 729 was first introduced by commit
:: bdc3a65fd7d47cc058cc913cfbd8f2934ef7f7a2 drivers: net: xgene: Add support for Classifier engine

:: TO: Iyappan Subramanian <isubraman...@apm.com>
:: CC: 0day robot <fengguang...@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


Re: [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter

2016-02-03 Thread kbuild test robot
Hi John,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/John-Fastabend/tc-offload-for-cls_u32-on-ixgbe/20160203-173342
config: x86_64-randconfig-x005-201605 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from drivers/net/ethernet/intel/fm10k/fm10k.h:24,
from drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:21:
   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c: In function '__fm10k_setup_tc':
   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:16: error: 'TC_H_ROOT' undeclared (first use in this function)
 if (handle != TC_H_ROOT)
   ^
   include/linux/compiler.h:147:28: note: in definition of macro '__trace_if'
 if (__builtin_constant_p((cond)) ? !!(cond) :   \
   ^
>> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:2: note: in expansion of macro 'if'
 if (handle != TC_H_ROOT)
 ^
   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:16: note: each undeclared identifier is reported only once for each function it appears in
 if (handle != TC_H_ROOT)
   ^
   include/linux/compiler.h:147:28: note: in definition of macro '__trace_if'
 if (__builtin_constant_p((cond)) ? !!(cond) :   \
   ^
>> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:2: note: in expansion of macro 'if'
 if (handle != TC_H_ROOT)
 ^

vim +/if +1209 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c

  1193  /* flag to indicate SWPRI has yet to be updated */
  1194  interface->flags |= FM10K_FLAG_SWPRI_CONFIG;
  1195  
  1196  return 0;
  1197  err_open:
  1198  fm10k_mbx_free_irq(interface);
  1199  err_mbx_irq:
  1200  fm10k_clear_queueing_scheme(interface);
  1201  err_queueing_scheme:
  1202  netif_device_detach(dev);
  1203  
  1204  return err;
  1205  }
  1206  
  1207  static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
  1208  {
> 1209  if (handle != TC_H_ROOT)
  1210  return -EINVAL;
  1211  
  1212  return fm10k_setup_tc(dev, tc);
  1213  }
  1214  
  1215  static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
  1216  {
  1217  switch (cmd) {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data


[patch net-next RFC 1/6] Introduce devlink infrastructure

2016-02-03 Thread Jiri Pirko
From: Jiri Pirko 

Introduce devlink infrastructure for drivers to register and expose to
userspace via generic Netlink interface.

There are two basic objects defined:
devlink - one instance for every "parent device", for example switch ASIC
devlink port - one instance for every physical port of the device.

This initial portion implements basic get/dump of objects to userspace.
Also, hardware message monitoring and port type setting is implemented.

Signed-off-by: Jiri Pirko 
---
 MAINTAINERS  |   8 +
 include/net/devlink.h| 152 
 include/uapi/linux/devlink.h |  83 +
 net/Kconfig  |   7 +
 net/core/Makefile|   1 +
 net/core/devlink.c   | 856 +++
 6 files changed, 1107 insertions(+)
 create mode 100644 include/net/devlink.h
 create mode 100644 include/uapi/linux/devlink.h
 create mode 100644 net/core/devlink.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f678c37..c4efa8d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3501,6 +3501,14 @@ F:   include/linux/device-mapper.h
 F: include/linux/dm-*.h
 F: include/uapi/linux/dm-*.h
 
+DEVLINK
+M: Jiri Pirko 
+L: netdev@vger.kernel.org
+S: Supported
+F: net/core/devlink.c
+F: include/net/devlink.h
+F: include/uapi/linux/devlink.h
+
 DIALOG SEMICONDUCTOR DRIVERS
 M: Support Opensource 
 W: http://www.dialog-semiconductor.com/products
diff --git a/include/net/devlink.h b/include/net/devlink.h
new file mode 100644
index 000..a7888e2
--- /dev/null
+++ b/include/net/devlink.h
@@ -0,0 +1,152 @@
+/*
+ * include/net/devlink.h - Network physical device Netlink interface
+ * Copyright (c) 2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2016 Jiri Pirko 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#ifndef _NET_DEVLINK_H_
+#define _NET_DEVLINK_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct devlink_ops;
+
+struct devlink {
+   struct list_head list;
+   struct list_head port_list;
+   int index;
+   const struct devlink_ops *ops;
+   struct device dev;
+   possible_net_t _net;
+   char priv[0] __aligned(NETDEV_ALIGN);
+};
+
+struct devlink_port {
+   struct list_head list;
+   struct devlink *devlink;
+   unsigned index;
+   enum devlink_port_type type;
+   enum devlink_port_type desired_type;
+   void *type_dev;
+};
+
+struct devlink_ops {
+   size_t priv_size;
+   int (*port_type_set)(struct devlink_port *devlink_port,
+enum devlink_port_type port_type);
+};
+
+static inline void *devlink_priv(struct devlink *devlink)
+{
+   BUG_ON(!devlink);
+   return &devlink->priv;
+}
+
+static inline struct devlink *priv_to_devlink(void *priv)
+{
+   BUG_ON(!priv);
+   return container_of(priv, struct devlink, priv);
+}
+
+static inline struct device *devlink_dev(struct devlink *devlink)
+{
+   return &devlink->dev;
+}
+
+static inline void set_devlink_dev(struct devlink *devlink, struct device *dev)
+{
+   devlink->dev.parent = dev;
+}
+
+static inline const char *devlink_name(const struct devlink *devlink)
+{
+   return dev_name(&devlink->dev);
+}
+
+struct ib_device;
+
+#if IS_ENABLED(CONFIG_NET_DEVLINK)
+
+struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
+int devlink_register(struct devlink *devlink);
+void devlink_unregister(struct devlink *devlink);
+void devlink_free(struct devlink *devlink);
+void devlink_hwmsg_notify(struct devlink *devlink,
+ const char *buf, size_t buf_len,
+ enum devlink_hwmsg_type type,
+ enum devlink_hwmsg_dir dir,
+ gfp_t gfp_mask);
+int devlink_port_register(struct devlink *devlink,
+ struct devlink_port *devlink_port,
+ unsigned int port_index);
+void devlink_port_unregister(struct devlink_port *devlink_port);
+void devlink_port_type_eth_set(struct devlink_port *devlink_port,
+  struct net_device *netdev);
+void devlink_port_type_ib_set(struct devlink_port *devlink_port,
+ struct ib_device *ibdev);
+void devlink_port_type_clear(struct devlink_port *devlink_port);
+
+#else
+
+static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
+   size_t priv_size)
+{
+   return kzalloc(sizeof(struct devlink) + priv_size, GFP_KERNEL);
+}
+
+static inline int devlink_register(struct devlink *devlink)
+{
+   return 0;
+}
+
+static inline void 

[patch net-next RFC 5/6] mlx4: Implement hardware messages notification using devlink

2016-02-03 Thread Jiri Pirko
From: Jiri Pirko 

Use devlink HW message notification facilities to pass messages going
to and from HW to userspace via Netlink multicast.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 9 +
 include/uapi/linux/devlink.h| 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index d66c690..2735211 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fw.h"
 #include "icm.h"
@@ -2734,6 +2735,8 @@ static int mlx4_ACCESS_REG(struct mlx4_dev *dev, u16 reg_id,
 {
struct mlx4_cmd_mailbox *inbox, *outbox;
struct mlx4_access_reg *inbuf, *outbuf;
+   struct mlx4_priv *priv = mlx4_priv(dev);
+   struct devlink *devlink = priv_to_devlink(priv);
int err;
 
inbox = mlx4_alloc_cmd_mailbox(dev);
@@ -2760,6 +2763,9 @@ static int mlx4_ACCESS_REG(struct mlx4_dev *dev, u16 reg_id,
((0x3) << 12));
 
memcpy(inbuf->reg_data, reg_data, reg_len);
+   devlink_hwmsg_notify(devlink, reg_data, reg_len,
+DEVLINK_HWMSG_TYPE_MLX_CMD_REG,
+DEVLINK_HWMSG_DIR_TO_HW, GFP_KERNEL);
err = mlx4_cmd_box(dev, inbox->dma, outbox->dma, 0, 0,
   MLX4_CMD_ACCESS_REG, MLX4_CMD_TIME_CLASS_C,
   MLX4_CMD_WRAPPED);
@@ -2775,6 +2781,9 @@ static int mlx4_ACCESS_REG(struct mlx4_dev *dev, u16 reg_id,
}
 
memcpy(reg_data, outbuf->reg_data, reg_len);
+   devlink_hwmsg_notify(devlink, reg_data, reg_len,
+DEVLINK_HWMSG_TYPE_MLX_CMD_REG,
+DEVLINK_HWMSG_DIR_FROM_HW, GFP_KERNEL);
 out:
mlx4_free_cmd_mailbox(dev, inbox);
mlx4_free_cmd_mailbox(dev, outbox);
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 761612b..f06f4f7 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -41,6 +41,7 @@ enum devlink_command {
 
 enum devlink_hwmsg_type {
DEVLINK_HWMSG_TYPE_MLX_EMAD, /* Mellanox EMAD packet */
+   DEVLINK_HWMSG_TYPE_MLX_CMD_REG, /* Mellanox CMD iface register access */
 };
 
 enum devlink_hwmsg_dir {
-- 
1.9.3



[patch net-next RFC 3/6] mlxsw: Implement hardware messages notification using devlink

2016-02-03 Thread Jiri Pirko
From: Jiri Pirko 

Use devlink HW message notification facilities to pass messages going
to and from HW to userspace via Netlink multicast.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 15 +++
 include/uapi/linux/devlink.h   |  2 +-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c 
b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 57d9655..da4e6c9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -65,6 +65,7 @@
 #include "trap.h"
 #include "emad.h"
 #include "reg.h"
+#include "txheader.h"
 
 static LIST_HEAD(mlxsw_core_driver_list);
 static DEFINE_SPINLOCK(mlxsw_core_driver_list_lock);
@@ -1099,7 +1100,11 @@ static int mlxsw_core_reg_access_emad(struct mlxsw_core 
*mlxsw_core,
 
dev_dbg(mlxsw_core->bus_info->dev, "EMAD send (tid=%llx)\n",
mlxsw_core->emad.tid);
-   mlxsw_core_buf_dump_dbg(mlxsw_core, skb->data, skb->len);
+   devlink_hwmsg_notify(priv_to_devlink(mlxsw_core),
+skb->data + MLXSW_TXHDR_LEN,
+skb->len - MLXSW_TXHDR_LEN,
+DEVLINK_HWMSG_TYPE_MLX_EMAD,
+DEVLINK_HWMSG_DIR_TO_HW, GFP_KERNEL);
 
err = mlxsw_emad_transmit(mlxsw_core, skb, _info);
if (!err) {
@@ -1109,9 +1114,11 @@ static int mlxsw_core_reg_access_emad(struct mlxsw_core 
*mlxsw_core,
 
dev_dbg(mlxsw_core->bus_info->dev, "EMAD recv (tid=%llx)\n",
mlxsw_core->emad.tid - 1);
-   mlxsw_core_buf_dump_dbg(mlxsw_core,
-   mlxsw_core->emad.resp_skb->data,
-   mlxsw_core->emad.resp_skb->len);
+   devlink_hwmsg_notify(priv_to_devlink(mlxsw_core),
+mlxsw_core->emad.resp_skb->data,
+mlxsw_core->emad.resp_skb->len,
+DEVLINK_HWMSG_TYPE_MLX_EMAD,
+DEVLINK_HWMSG_DIR_FROM_HW, GFP_KERNEL);
 
dev_kfree_skb(mlxsw_core->emad.resp_skb);
}
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 1d9f999..761612b 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -40,7 +40,7 @@ enum devlink_command {
 };
 
 enum devlink_hwmsg_type {
-   DEVLINK_HWMSG_TYPE_TMP, /* temporary, until first message type is 
introduced */
+   DEVLINK_HWMSG_TYPE_MLX_EMAD, /* Mellanox EMAD packet */
 };
 
 enum devlink_hwmsg_dir {
-- 
1.9.3
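
A minimal user-space sketch of the mirroring idea in this patch, under stated assumptions: every message is copied to a tap before it goes to the hardware, and the driver-private TX header is skipped first, as the `skb->data + MLXSW_TXHDR_LEN` arithmetic above does. All names here (`hwmsg_tap`, `hwmsg_notify`, `hwmsg_notify_tx`, `TXHDR_LEN`, `TAP_MAX`) are hypothetical stand-ins, not the RFC's devlink API, and the one-slot buffer only imitates the Netlink multicast group.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the RFC's direction enum. */
enum hwmsg_dir { HWMSG_DIR_TO_HW, HWMSG_DIR_FROM_HW };

#define TXHDR_LEN 16	/* assumed header length, for illustration only */
#define TAP_MAX   256	/* capacity of the one-slot tap */

/* A one-slot "tap" standing in for the Netlink multicast group. */
struct hwmsg_tap {
	enum hwmsg_dir dir;
	size_t len;
	unsigned char buf[TAP_MAX];
};

/* Copy a message into the tap; returns 0 on success, -1 if it does not fit. */
static int hwmsg_notify(struct hwmsg_tap *tap, const void *data, size_t len,
			enum hwmsg_dir dir)
{
	assert(tap != NULL && data != NULL);
	if (len > TAP_MAX)
		return -1;
	memcpy(tap->buf, data, len);
	tap->len = len;
	tap->dir = dir;
	return 0;
}

/* Mirror an outgoing frame, skipping the driver-private TX header. */
static int hwmsg_notify_tx(struct hwmsg_tap *tap, const unsigned char *frame,
			   size_t frame_len)
{
	if (frame_len < TXHDR_LEN)
		return -1;
	return hwmsg_notify(tap, frame + TXHDR_LEN, frame_len - TXHDR_LEN,
			    HWMSG_DIR_TO_HW);
}
```

The length check before the pointer arithmetic is the one thing the sketch adds over the diff: subtracting the header length from a shorter frame would wrap around.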



[patch net-next RFC 4/6] mlx4: Implement devlink interface

2016-02-03 Thread Jiri Pirko
From: Jiri Pirko 

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko 
---
 drivers/infiniband/hw/mlx4/main.c  |  7 
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |  8 -
 drivers/net/ethernet/mellanox/mlx4/intf.c  |  9 ++
 drivers/net/ethernet/mellanox/mlx4/main.c  | 45 +++---
 drivers/net/ethernet/mellanox/mlx4/mlx4.h  |  2 ++
 include/linux/mlx4/driver.h|  3 ++
 6 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c 
b/drivers/infiniband/hw/mlx4/main.c
index 1c7ab6c..a15a7b3 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -2519,6 +2520,9 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
}
 
ibdev->ib_active = true;
+   mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+   devlink_port_type_ib_set(mlx4_get_devlink_port(dev, i),
+&ibdev->ib_dev);
 
if (mlx4_is_mfunc(ibdev->dev))
init_pkeys(ibdev);
@@ -2643,7 +2647,10 @@ static void mlx4_ib_remove(struct mlx4_dev *dev, void 
*ibdev_ptr)
 {
struct mlx4_ib_dev *ibdev = ibdev_ptr;
int p;
+   int i;
 
+   mlx4_foreach_port(i, dev, MLX4_PORT_TYPE_IB)
+   devlink_port_type_clear(mlx4_get_devlink_port(dev, i));
ibdev->ib_active = false;
flush_workqueue(wq);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0c7e3f6..17ac3b0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -2024,8 +2025,11 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
en_dbg(DRV, priv, "Destroying netdev on port:%d\n", priv->port);
 
/* Unregister device - this will close the port if it was up */
-   if (priv->registered)
+   if (priv->registered) {
+   devlink_port_type_clear(mlx4_get_devlink_port(mdev->dev,
+ priv->port));
unregister_netdev(dev);
+   }
 
if (priv->allocated)
        mlx4_free_hwq_res(mdev->dev, &priv->res, MLX4_EN_PAGE_SIZE);
@@ -3041,6 +3045,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int 
port,
}
 
priv->registered = 1;
+   devlink_port_type_eth_set(mlx4_get_devlink_port(mdev->dev, priv->port),
+ dev);
 
return 0;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/intf.c 
b/drivers/net/ethernet/mellanox/mlx4/intf.c
index 0472941..dec77d6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/intf.c
+++ b/drivers/net/ethernet/mellanox/mlx4/intf.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mlx4.h"
 
@@ -249,3 +250,11 @@ void *mlx4_get_protocol_dev(struct mlx4_dev *dev, enum 
mlx4_protocol proto, int
return result;
 }
 EXPORT_SYMBOL_GPL(mlx4_get_protocol_dev);
+
+struct devlink_port *mlx4_get_devlink_port(struct mlx4_dev *dev, int port)
+{
+   struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
+
+   return &info->devlink_port;
+}
+EXPORT_SYMBOL_GPL(mlx4_get_devlink_port);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index f1b6d21..a5f54a5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -2847,8 +2848,13 @@ no_msi:
 
 static int mlx4_init_port_info(struct mlx4_dev *dev, int port)
 {
+   struct devlink *devlink = priv_to_devlink(mlx4_priv(dev));
    struct mlx4_port_info *info = &mlx4_priv(dev)->port[port];
-   int err = 0;
+   int err;
+
+   err = devlink_port_register(devlink, &info->devlink_port, port);
+   if (err)
+   return err;
 
info->dev = dev;
info->port = port;
@@ -2873,6 +2879,7 @@ static int mlx4_init_port_info(struct mlx4_dev *dev, int 
port)
    err = device_create_file(&dev->persist->pdev->dev, &info->port_attr);
if (err) {
mlx4_err(dev, "Failed to create file for port %d\n", port);
+   devlink_port_unregister(&info->devlink_port);
info->port = -1;
}
 
@@ -3646,21 +3653,23 @@ err_disable_pdev:
 
 static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 {
+   struct devlink *devlink;
struct mlx4_priv *priv;
struct mlx4_dev *dev;
int ret;
 
printk_once(KERN_INFO "%s", mlx4_version);
 
-   priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-   if (!priv)
+   

[patch net-next RFC 2/6] mlxsw: Implement devlink interface

2016-02-03 Thread Jiri Pirko
From: Jiri Pirko 

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.
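
The diff below swaps `kzalloc()` for `devlink_alloc(NULL, alloc_size)` and then reaches the driver state through `devlink_priv()` and `priv_to_devlink()`. A rough user-space sketch of that layout, assuming the private area simply trails the container in one allocation; the `fake_*` names are hypothetical and are not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a devlink-style container. */
struct fake_devlink {
	int registered;
	/* Driver-private area follows; max_align_t keeps it suitably aligned. */
	max_align_t priv[];
};

/* Example driver state that would live in the private area. */
struct fake_core {
	int id;
};

/* One zeroed allocation holds both the container and the private data. */
static struct fake_devlink *fake_devlink_alloc(size_t priv_size)
{
	return calloc(1, sizeof(struct fake_devlink) + priv_size);
}

/* Container -> private data. */
static void *fake_devlink_priv(struct fake_devlink *dl)
{
	assert(dl != NULL);
	return dl->priv;
}

/* Private data -> container, by stepping back over the fixed header. */
static struct fake_devlink *fake_priv_to_devlink(void *priv)
{
	assert(priv != NULL);
	return (struct fake_devlink *)((char *)priv -
				       offsetof(struct fake_devlink, priv));
}

static void fake_devlink_free(struct fake_devlink *dl)
{
	free(dl);
}
```

Because both objects share one allocation, freeing the container also frees the driver state, which is why the diff replaces `kfree(mlxsw_core)` with `devlink_free(devlink)` rather than adding a second free.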

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 24 ++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 20 
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  2 ++
 drivers/net/ethernet/mellanox/mlxsw/switchx2.c | 20 
 4 files changed, 60 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c 
b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 22379eb..57d9655 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "core.h"
 #include "item.h"
@@ -791,6 +792,7 @@ int mlxsw_core_bus_device_register(const struct 
mlxsw_bus_info *mlxsw_bus_info,
const char *device_kind = mlxsw_bus_info->device_kind;
struct mlxsw_core *mlxsw_core;
struct mlxsw_driver *mlxsw_driver;
+   struct devlink *devlink;
size_t alloc_size;
int err;
 
@@ -798,12 +800,13 @@ int mlxsw_core_bus_device_register(const struct 
mlxsw_bus_info *mlxsw_bus_info,
if (!mlxsw_driver)
return -EINVAL;
alloc_size = sizeof(*mlxsw_core) + mlxsw_driver->priv_size;
-   mlxsw_core = kzalloc(alloc_size, GFP_KERNEL);
-   if (!mlxsw_core) {
+   devlink = devlink_alloc(NULL, alloc_size);
+   if (!devlink) {
err = -ENOMEM;
-   goto err_core_alloc;
+   goto err_devlink_alloc;
}
 
+   mlxsw_core = devlink_priv(devlink);
    INIT_LIST_HEAD(&mlxsw_core->rx_listener_list);
    INIT_LIST_HEAD(&mlxsw_core->event_listener_list);
mlxsw_core->driver = mlxsw_driver;
@@ -841,6 +844,11 @@ int mlxsw_core_bus_device_register(const struct 
mlxsw_bus_info *mlxsw_bus_info,
if (err)
goto err_hwmon_init;
 
+   set_devlink_dev(devlink, mlxsw_bus_info->dev);
+   err = devlink_register(devlink);
+   if (err)
+   goto err_devlink_register;
+
err = mlxsw_driver->init(mlxsw_core->driver_priv, mlxsw_core,
 mlxsw_bus_info);
if (err)
@@ -855,6 +863,8 @@ int mlxsw_core_bus_device_register(const struct 
mlxsw_bus_info *mlxsw_bus_info,
 err_debugfs_init:
mlxsw_core->driver->fini(mlxsw_core->driver_priv);
 err_driver_init:
+   devlink_unregister(devlink);
+err_devlink_register:
 err_hwmon_init:
mlxsw_emad_fini(mlxsw_core);
 err_emad_init:
@@ -864,8 +874,8 @@ err_bus_init:
 err_alloc_lag_mapping:
free_percpu(mlxsw_core->pcpu_stats);
 err_alloc_stats:
-   kfree(mlxsw_core);
-err_core_alloc:
+   devlink_free(devlink);
+err_devlink_alloc:
mlxsw_core_driver_put(device_kind);
return err;
 }
@@ -874,14 +884,16 @@ EXPORT_SYMBOL(mlxsw_core_bus_device_register);
 void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 {
const char *device_kind = mlxsw_core->bus_info->device_kind;
+   struct devlink *devlink = priv_to_devlink(mlxsw_core);
 
mlxsw_core_debugfs_fini(mlxsw_core);
mlxsw_core->driver->fini(mlxsw_core->driver_priv);
+   devlink_unregister(devlink);
mlxsw_emad_fini(mlxsw_core);
mlxsw_core->bus->fini(mlxsw_core->bus_priv);
kfree(mlxsw_core->lag.mapping);
free_percpu(mlxsw_core->pcpu_stats);
-   kfree(mlxsw_core);
+   devlink_free(devlink);
mlxsw_core_driver_put(device_kind);
 }
 EXPORT_SYMBOL(mlxsw_core_bus_device_unregister);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 217856b..9d4b06c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1351,7 +1352,9 @@ static const struct ethtool_ops mlxsw_sp_port_ethtool_ops 
= {
 
 static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port)
 {
+   struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
struct mlxsw_sp_port *mlxsw_sp_port;
+   struct devlink_port *devlink_port;
struct net_device *dev;
bool usable;
size_t bytes;
@@ -1360,6 +1363,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp 
*mlxsw_sp, u8 local_port)
dev = alloc_etherdev(sizeof(struct mlxsw_sp_port));
if (!dev)
return -ENOMEM;
+   SET_NETDEV_DEV(dev, devlink_dev(devlink));
mlxsw_sp_port = netdev_priv(dev);
mlxsw_sp_port->dev = dev;
mlxsw_sp_port->mlxsw_sp = mlxsw_sp;
@@ -1417,6 +1421,14 @@ static int mlxsw_sp_port_create(struct mlxsw_sp 
*mlxsw_sp, u8 local_port)
goto port_not_usable;
}
 
+   devlink_port = 

Re: [PATCH net] ipv6: fix a lockdep splat

2016-02-03 Thread Hannes Frederic Sowa

On 03.02.2016 02:55, Eric Dumazet wrote:

> From: Eric Dumazet
>
> Silence lockdep false positive about rcu_dereference() being
> used in the wrong context.
>
> First one should use rcu_dereference_protected() as we own the spinlock.
>
> Second one should be a normal assignment, as no barrier is needed.


Acked-by: Hannes Frederic Sowa 


Re: [PATCH v3] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-03 Thread Sudip Mukherjee
On Wed, Feb 03, 2016 at 02:21:46PM +0900, Byeoungwook Kim wrote:
> Conditional codes in rtl_addr_delay() were improved in readability and
> performance by using switch codes.
> 
> Reviewed-by: Julian Calaby 
> Signed-off-by: Byeoungwook Kim 
> Signed-off-by: Fengguang Wu 

How are you using the Signed-off-by: of Fengguang Wu?

Did I miss any mail from Fengguang in your previous versions?

regards
sudip


[net-next PATCH 2/7] net: rework setup_tc ndo op to consume general tc operand

2016-02-03 Thread John Fastabend
This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide a structured union
and a type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.

Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |7 ---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |3 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   |8 ++--
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c |7 ---
 drivers/net/ethernet/intel/i40e/i40e_main.c |7 ---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c   |7 ---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |7 ---
 drivers/net/ethernet/sfc/efx.h  |3 ++-
 drivers/net/ethernet/sfc/tx.c   |9 ++---
 drivers/net/ethernet/ti/netcp_core.c|   13 +++--
 include/linux/netdevice.h   |   20 +++-
 net/sched/sch_mqprio.c  |9 ++---
 12 files changed, 68 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 1c7ff51..e925831 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4272,11 +4272,12 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
return 0;
 }
 
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+struct tc_to_netdev *tc)
 {
-   if (handle != TC_H_ROOT)
+   if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
return -EINVAL;
-   return bnx2x_setup_tc(dev, num_tc);
+   return bnx2x_setup_tc(dev, tc->tc);
 }
 
 /* called with rtnl_lock */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index e92d6e7..ef2c776 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,7 +486,8 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct 
net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
-int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+struct tc_to_netdev *tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ff08faf..169920a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5370,13 +5370,17 @@ static int bnxt_change_mtu(struct net_device *dev, int 
new_mtu)
return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+struct tc_to_netdev *ntc)
 {
struct bnxt *bp = netdev_priv(dev);
+   u8 tc;
 
-   if (handle != TC_H_ROOT)
+   if (handle != TC_H_ROOT || ntc->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
+   tc = ntc->tc;
+
if (tc > bp->max_tc) {
netdev_err(dev, "too many traffic classes requested: %d Max 
supported is %d\n",
   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 12701a4..dc1a821 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1204,12 +1204,13 @@ err_queueing_scheme:
return err;
 }
 
-static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+static int __fm10k_setup_tc(struct net_device *dev, u32 handle, __be16 proto,
+   struct tc_to_netdev *tc)
 {
-   if (handle != TC_H_ROOT)
+   if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)
return -EINVAL;
 
-   return fm10k_setup_tc(dev, tc);
+   return fm10k_setup_tc(dev, tc->tc);
 }
 
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ae70e7c..8a84328 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5306,11 +5306,12 @@ exit:
return ret;
 }
 
-static int __i40e_setup_tc(struct net_device *netdev, u32 handle, u8 tc)
+static int __i40e_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+  struct tc_to_netdev *tc)
 {
-   if (handle != 

[net-next PATCH 5/7] net: tc: helper functions to query action types

2016-02-03 Thread John Fastabend
This is a helper function drivers can use to learn if the
action type is a drop action.

Signed-off-by: John Fastabend 
---
 include/net/tc_act/tc_gact.h |   16 
 1 file changed, 16 insertions(+)

diff --git a/include/net/tc_act/tc_gact.h b/include/net/tc_act/tc_gact.h
index 592a6bc..3067a10 100644
--- a/include/net/tc_act/tc_gact.h
+++ b/include/net/tc_act/tc_gact.h
@@ -2,6 +2,7 @@
 #define __NET_TC_GACT_H
 
 #include 
+#include 
 
 struct tcf_gact {
struct tcf_common   common;
@@ -15,4 +16,19 @@ struct tcf_gact {
 #define to_gact(a) \
container_of(a->priv, struct tcf_gact, common)
 
+#ifdef CONFIG_NET_CLS_ACT
+static inline bool is_tcf_gact_dropped(const struct tc_action *a)
+{
+   struct tcf_gact *gact;
+
+   if (a->ops && a->ops->type != TCA_ACT_GACT)
+   return false;
+
+   gact = a->priv;
+   if (gact->tcf_action == TC_ACT_SHOT)
+   return true;
+
+   return false;
+}
+#endif
 #endif /* __NET_TC_GACT_H */
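
A user-space sketch of the helper's logic, for illustration only. One thing worth noting in the patch as posted: the test `a->ops && a->ops->type != TCA_ACT_GACT` is false when `a->ops` is NULL, so such an action falls through and has its `priv` read as a gact. The variant below treats a missing ops pointer as "not a drop action"; whether that is the intended behaviour is an assumption. The `demo_*` types and the `DEMO_ACT_*` values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ACT_GACT 5	/* illustrative action-type id */
#define DEMO_ACT_SHOT 2	/* illustrative "drop" verdict */

struct demo_ops {
	int type;
};

struct demo_gact {
	int tcf_action;
};

struct demo_action {
	const struct demo_ops *ops;
	void *priv;
};

/* True only for a gact action whose verdict is "drop". */
static bool demo_is_gact_drop(const struct demo_action *a)
{
	const struct demo_gact *gact;

	assert(a != NULL);
	/* Reject actions with no ops as well as non-gact ones. */
	if (!a->ops || a->ops->type != DEMO_ACT_GACT)
		return false;

	gact = a->priv;
	return gact->tcf_action == DEMO_ACT_SHOT;
}
```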



[net-next PATCH 4/7] net: add tc offload feature flag

2016-02-03 Thread John Fastabend
It's useful to turn off the qdisc offload feature at a per device
level. This gives us a big hammer to enable/disable offloading.
More fine grained control (i.e. per rule) may be supported later.

Signed-off-by: John Fastabend 
---
 include/linux/netdev_features.h |3 +++
 net/core/ethtool.c  |1 +
 2 files changed, 4 insertions(+)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index d9654f0e..a734bf4 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -67,6 +67,8 @@ enum {
NETIF_F_HW_L2FW_DOFFLOAD_BIT,   /* Allow L2 Forwarding in Hardware */
NETIF_F_BUSY_POLL_BIT,  /* Busy poll */
 
+   NETIF_F_HW_TC_BIT,  /* Offload TC infrastructure */
+
/*
 * Add your fresh new feature above and remember to update
 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -124,6 +126,7 @@ enum {
 #define NETIF_F_HW_VLAN_STAG_TX__NETIF_F(HW_VLAN_STAG_TX)
 #define NETIF_F_HW_L2FW_DOFFLOAD   __NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL  __NETIF_F(BUSY_POLL)
+#define NETIF_F_HW_TC  __NETIF_F(HW_TC)
 
 #define for_each_netdev_feature(mask_addr, bit)\
for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index daf0470..636c1c1 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -98,6 +98,7 @@ static const char 
netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
[NETIF_F_RXALL_BIT] ="rx-all",
[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
[NETIF_F_BUSY_POLL_BIT] ="busy-poll",
+   [NETIF_F_HW_TC_BIT] ="hw-tc-offload",
 };
 
 static const char
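
The three hunks above follow one pattern: a bit index in an enum, a mask derived from it, and a string in a table indexed by the same bit. A small stand-alone sketch of that pattern, plus the kind of check a driver could make before offloading; the `DEMO_*` and `demo_*` names are hypothetical and the real feature type is not modelled here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit indices; a new feature is added just before the count. */
enum {
	DEMO_F_BUSY_POLL_BIT,
	DEMO_F_HW_TC_BIT,
	DEMO_FEATURE_COUNT
};

/* Mask derived from the bit index, in the spirit of __NETIF_F(). */
#define DEMO_F_BIT(bit)	((uint64_t)1 << (bit))
#define DEMO_F_HW_TC	DEMO_F_BIT(DEMO_F_HW_TC_BIT)

/* User-visible names, indexed by the same bit numbers. */
static const char *const demo_feature_strings[DEMO_FEATURE_COUNT] = {
	[DEMO_F_BUSY_POLL_BIT] = "busy-poll",
	[DEMO_F_HW_TC_BIT]     = "hw-tc-offload",
};

/* A driver-side gate: only offload when the feature bit is set. */
static int demo_can_offload(uint64_t features)
{
	return (features & DEMO_F_HW_TC) != 0;
}
```

Keeping the enum, the mask and the string table keyed on the same index is what makes the "remember to update netdev_features_strings[]" comment in the header necessary: a bit without a string would have no user-visible name.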



[net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs

2016-02-03 Thread John Fastabend
This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This work aligns with how network drivers have been doing qdisc
offloads for mqprio.

Signed-off-by: John Fastabend 
---
 include/linux/netdevice.h |6 +++-
 include/net/pkt_cls.h |   33 
 net/sched/cls_u32.c   |   73 -
 3 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9090ff7..861ce67 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -778,17 +778,21 @@ static inline bool netdev_phys_item_id_same(struct 
netdev_phys_item_id *a,
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
   struct sk_buff *skb);
 
-/* This structure holds attributes of qdisc and classifiers
+/* These structures hold the attributes of qdisc and classifiers
  * that are being passed to the netdevice through the setup_tc op.
  */
 enum {
TC_SETUP_MQPRIO,
+   TC_SETUP_CLSU32,
 };
 
+struct tc_cls_u32_offload;
+
 struct tc_to_netdev {
unsigned int type;
union {
u8 tc;
+   struct tc_cls_u32_offload *cls_u32;
};
 };
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index bc49967..0bd12cd 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -358,4 +358,37 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 #endif /* CONFIG_NET_CLS_IND */
 
+struct tc_cls_u32_knode {
+   struct tcf_exts *exts;
+   u8 fshift;
+   u32 handle;
+   u32 val;
+   u32 mask;
+   u32 link_handle;
+   struct tc_u32_sel *sel;
+};
+
+struct tc_cls_u32_hnode {
+   u32 handle;
+   u32 prio;
+   unsigned int divisor;
+};
+
+enum {
+   TC_CLSU32_NEW_KNODE,
+   TC_CLSU32_REPLACE_KNODE,
+   TC_CLSU32_DELETE_KNODE,
+   TC_CLSU32_NEW_HNODE,
+   TC_CLSU32_REPLACE_HNODE,
+};
+
+struct tc_cls_u32_offload {
+   /* knode values */
+   int command;
+   union {
+   struct tc_cls_u32_knode knode;
+   struct tc_cls_u32_hnode hnode;
+   };
+};
+
 #endif
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4fbb674..dfaaf29 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct tc_u_knode {
struct tc_u_knode __rcu *next;
@@ -424,6 +425,68 @@ static int u32_delete_key(struct tcf_proto *tp, struct 
tc_u_knode *key)
return 0;
 }
 
+static void u32_remove_hw_knode(struct tcf_proto *tp, u32 handle)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = &u32_offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_DELETE_KNODE;
+   offload.cls_u32->knode.handle = handle;
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, &offload);
+   }
+}
+
+static void u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = &u32_offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_NEW_HNODE;
+   offload.cls_u32->hnode.divisor = h->divisor;
+   offload.cls_u32->hnode.handle = h->handle;
+   offload.cls_u32->hnode.prio = h->prio;
+
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, &offload);
+   }
+}
+
+static void u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_u32_offload u32_offload = {0};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSU32;
+   offload.cls_u32 = &u32_offload;
+
+   if (dev->netdev_ops->ndo_setup_tc) {
+   offload.cls_u32->command = TC_CLSU32_REPLACE_KNODE;
+   offload.cls_u32->knode.handle = n->handle;
+   offload.cls_u32->knode.fshift = n->fshift;
+   offload.cls_u32->knode.val = n->val;
+   offload.cls_u32->knode.mask = n->mask;
+   offload.cls_u32->knode.sel = &n->sel;
+   offload.cls_u32->knode.exts = &n->exts;
+   if (n->ht_down)
+   offload.cls_u32->knode.link_handle = n->ht_down->handle;
+
+   dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+ tp->protocol, &offload);
+   }
+}
+
 static void 

[net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter

2016-02-03 Thread John Fastabend
The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs; however, only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri 
CC: Shradha Shah 
CC: Or Gerlitz 
CC: Ariel Elior 
CC: Jeff Kirsher 
CC: Bruce Allan 
CC: Jesse Brandeburg 
CC: Don Skidmore 
Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |7 +++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c|5 -
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   10 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c  |9 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|   11 ++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   12 ++--
 drivers/net/ethernet/sfc/efx.h   |2 +-
 drivers/net/ethernet/sfc/tx.c|5 -
 drivers/net/ethernet/ti/netcp_core.c |5 -
 include/linux/netdevice.h|2 +-
 net/sched/sch_mqprio.c   |5 +++--
 13 files changed, 63 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 9695a4c..1c7ff51 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -4272,6 +4272,13 @@ int bnx2x_setup_tc(struct net_device *dev, u8 num_tc)
return 0;
 }
 
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc)
+{
+   if (handle != TC_H_ROOT)
+   return -EINVAL;
+   return bnx2x_setup_tc(dev, num_tc);
+}
+
 /* called with rtnl_lock */
 int bnx2x_change_mac_addr(struct net_device *dev, void *p)
 {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 4cbb03f8..e92d6e7 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -486,6 +486,7 @@ netdev_tx_t bnx2x_start_xmit(struct sk_buff *skb, struct 
net_device *dev);
 
 /* setup_tc callback */
 int bnx2x_setup_tc(struct net_device *dev, u8 num_tc);
+int __bnx2x_setup_tc(struct net_device *dev, u32 handle, u8 num_tc);
 
 int bnx2x_get_vf_config(struct net_device *dev, int vf,
struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 6c4e3a6..b17bb17 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12994,7 +12994,7 @@ static const struct net_device_ops bnx2x_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
.ndo_poll_controller= poll_bnx2x,
 #endif
-   .ndo_setup_tc   = bnx2x_setup_tc,
+   .ndo_setup_tc   = __bnx2x_setup_tc,
 #ifdef CONFIG_BNX2X_SRIOV
.ndo_set_vf_mac = bnx2x_set_vf_mac,
.ndo_set_vf_vlan= bnx2x_set_vf_vlan,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5dc89e5..ff08faf 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -5370,10 +5370,13 @@ static int bnxt_change_mtu(struct net_device *dev, int 
new_mtu)
return 0;
 }
 
-static int bnxt_setup_tc(struct net_device *dev, u8 tc)
+static int bnxt_setup_tc(struct net_device *dev, u32 handle, u8 tc)
 {
struct bnxt *bp = netdev_priv(dev);
 
+   if (handle != TC_H_ROOT)
+   return -EINVAL;
+
if (tc > bp->max_tc) {
netdev_err(dev, "too many traffic classes requested: %d Max 
supported is %d\n",
   tc, bp->max_tc);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 662569d..12701a4 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1204,6 +1204,14 @@ err_queueing_scheme:
return err;
 }
 
+static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
+{
+   if (handle != TC_H_ROOT)
+   return -EINVAL;
+
+   return fm10k_setup_tc(dev, tc);
+}
+
 static int fm10k_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 {
switch (cmd) {
@@ -1386,7 +1394,7 @@ static const struct net_device_ops fm10k_netdev_ops 

Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload

2016-02-03 Thread Amir Vadai
On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
> This adds initial support for offloading the u32 tc classifier. This
> initial implementation only implements a few base matches and actions
> to illustrate the use of the infrastructure patches.
> 
> However it is an interesting subset because it handles the u32 next
> hdr logic to correctly map tcp packets from ip headers using the ihl
> and protocol fields. After this is accepted we can extend the match
> and action fields easily by updating the model header file.
> 
> Also only the drop action is supported initially.
> 
> Here is a short test script,
> 
>  #tc qdisc add dev eth4 ingress
>  #tc filter add dev eth4 parent : protocol ip \
>   u32 ht 800: order 1 \
>   match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
> 
> <-- hardware has dst/src ip match rule installed -->
> 
>  #tc filter del dev eth4 parent : prio 49152
>  #tc filter add dev eth4 parent : protocol ip prio 99 \
>   handle 1: u32 divisor 1
>  #tc filter add dev eth4 protocol ip parent : prio 99 \
>   u32 ht 800: order 1 link 1: \
>   offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>  #tc filter add dev eth4 parent : protocol ip \
>   u32 ht 1: order 3 match tcp src 23  action drop
> 
> <-- hardware has tcp src port rule installed -->
> 
>  #tc qdisc del dev eth4 parent :
> 
> <-- hardware cleaned up -->
> 
> Signed-off-by: John Fastabend 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h |3 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  196 
> ++
>  3 files changed, 198 insertions(+), 7 deletions(-)
> 

What are you doing w.r.t. priorities? Are the filters processed in
priority order?

[...]

>  
> -static int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
> -struct ixgbe_fdir_filter *input,
> -u16 sw_idx)
> +int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
> + struct ixgbe_fdir_filter *input,
> + u16 sw_idx)
>  {
>   struct ixgbe_hw *hw = >hw;
>   struct hlist_node *node2;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 03e236c..a1a91bf 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -51,6 +51,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_OF
>  #include 
> @@ -8200,10 +8201,197 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
>   return 0;
>  }
>  
> +#include 
> +#include "ixgbe_model.h"
Did you leave those #include's in the middle of the file on purpose?

[...]



Re: [PATCH iproute2] ipmonitor: match user option 'all' before 'all-nsid'

2016-02-03 Thread Nicolas Dichtel

On 03/02/2016 01:53, Roopa Prabhu wrote:

> From: Roopa Prabhu
>
> 'ip monitor all' is broken on older kernels.
> This patch fixes 'ip monitor all' to match
> 'all' and not 'all-nsid'.
>
> It moves parsing arg 'all-nsid' to after parsing
> 'all'.
>
> Before:
> $ip monitor all
> NETLINK_LISTEN_ALL_NSID: Protocol not available
>
> After:
> $ip monitor all
> [NEIGH]Deleted 10.0.0.1 dev eth1 lladdr c4:54:44:4f:b2:dd STALE
>
> Fixes: 449b824ad196 ("ipmonitor: allows to monitor in several netns")
> Signed-off-by: Roopa Prabhu

Acked-by: Nicolas Dichtel 


Re: [PATCH v3] rtlwifi: Fix improve function 'rtl_addr_delay()' in core.c

2016-02-03 Thread ByeoungWook Kim
Hi Sudip,

This patch depends on the following one:
http://marc.info/?l=linux-wireless&m=145447963305712&w=2

I wrote the review and the follow-up patch,
but it seems I left out the Cc.

Sorry for the confusion.

Regards,
Byeoungwook.

2016-02-03 17:56 GMT+09:00 Sudip Mukherjee :
> On Wed, Feb 03, 2016 at 02:21:46PM +0900, Byeoungwook Kim wrote:
>> Conditional codes in rtl_addr_delay() were improved in readability and
>> performance by using switch codes.
>>
>> Reviewed-by: Julian Calaby 
>> Signed-off-by: Byeoungwook Kim 
>> Signed-off-by: Fengguang Wu 
>
> How are you using the Signed-off-by: of Fengguang Wu?
>
> Did I miss any mail from Fengguang in your previous versions?
>
> regards
> sudip


Re: [PATCH v2 net 6/6] net: mvneta: Fix race condition during stopping

2016-02-03 Thread Jisheng Zhang
On Mon, 1 Feb 2016 14:07:47 +0100 Gregory CLEMENT wrote:

> When stopping the port, the CPU notifier is still registered while
> mvneta_stop_dev() calls mvneta_percpu_disable() on each CPU.
> A new CPU could come online at this point, which would be racy.
> 
> This patch adds a flag that prevents the notifier code from running for a
> new CPU while the port is stopping.
> 
> 
> Signed-off-by: Gregory CLEMENT 
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c 
> b/drivers/net/ethernet/marvell/mvneta.c
> index 4d40d2fde7ca..2f53975aa6ec 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -374,6 +374,7 @@ struct mvneta_port {
>* ensuring that the configuration remains coherent.
>*/
>   spinlock_t lock;
> + bool is_stopping;
>  
>   /* Core clock */
>   struct clk *clk;
> @@ -2916,6 +2917,11 @@ static int mvneta_percpu_notifier(struct 
> notifier_block *nfb,
>   switch (action) {
>   case CPU_ONLINE:
>   case CPU_ONLINE_FROZEN:
> + /* Configuring the driver for a new CPU while the
> +  * driver is stopping is racy, so just avoid it.
> +  */
> + if (pp->is_stopping)
> + break;

I still see a race. What if another CPU sets is_stopping at this point?

Thanks
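
To make the concern concrete, here is a user-space sketch of one way to
close such a window: test the flag and do the per-CPU setup under the
same lock that the stop path takes when it sets the flag. This is an
assumption for illustration, not code from the patch; struct port_model,
port_cpu_online() and port_stop() are hypothetical names, and a C11
atomic_flag stands in for the driver's spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the port structure; only what the sketch needs. */
struct port_model {
	atomic_flag lock;	/* plays the role of a spinlock */
	bool is_stopping;
	int cpus_configured;
};

static void port_init(struct port_model *pp)
{
	assert(pp != NULL);
	atomic_flag_clear(&pp->lock);
	pp->is_stopping = false;
	pp->cpus_configured = 0;
}

static void port_lock(struct port_model *pp)
{
	while (atomic_flag_test_and_set_explicit(&pp->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void port_unlock(struct port_model *pp)
{
	atomic_flag_clear_explicit(&pp->lock, memory_order_release);
}

/* Notifier path: the check and the setup form one critical section. */
static bool port_cpu_online(struct port_model *pp)
{
	bool configured = false;

	port_lock(pp);
	if (!pp->is_stopping) {
		pp->cpus_configured++;	/* stands in for the per-CPU setup */
		configured = true;
	}
	port_unlock(pp);
	return configured;
}

/* Stop path: the flag changes only while the lock is held. */
static void port_stop(struct port_model *pp)
{
	port_lock(pp);
	pp->is_stopping = true;
	port_unlock(pp);
}
```

Because the flag can only change while the lock is held, the stop path
cannot set it between the check and the setup in port_cpu_online().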


>   netif_tx_stop_all_queues(pp->dev);
>  
>   /* We have to synchronise on tha napi of each CPU
> @@ -3054,9 +3060,17 @@ static int mvneta_stop(struct net_device *dev)
>  {
>   struct mvneta_port *pp = netdev_priv(dev);
>  
> + /* Inform that we are stopping so we don't want to setup the
> +  * driver for new CPUs in the notifiers
> +  */
> + pp->is_stopping = true;
>   mvneta_stop_dev(pp);
>   mvneta_mdio_remove(pp);
>   unregister_cpu_notifier(&pp->cpu_notifier);
> + /* Now that the notifier are unregistered, we can clear the
> +  * flag
> +  */
> + pp->is_stopping = false;
>   on_each_cpu(mvneta_percpu_disable, pp, true);
>   free_percpu_irq(dev->irq, pp->ports);
>   mvneta_cleanup_rxqs(pp);



[PATCH net-next V2 5/6] e1000: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-03 Thread Stefan Assmann
Calling dev_close() causes IFF_UP to be cleared, which will remove the
interface's routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides, this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead, call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/e1000/e1000.h | 2 ++
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 4 ++--
 drivers/net/ethernet/intel/e1000/e1000_main.c| 8 
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000.h 
b/drivers/net/ethernet/intel/e1000/e1000.h
index 98fe5a2..d7bdea7 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/intel/e1000/e1000.h
@@ -358,6 +358,8 @@ struct net_device *e1000_get_hw_dev(struct e1000_hw *hw);
 extern char e1000_driver_name[];
 extern const char e1000_driver_version[];
 
+int e1000_open(struct net_device *netdev);
+int e1000_close(struct net_device *netdev);
 int e1000_up(struct e1000_adapter *adapter);
 void e1000_down(struct e1000_adapter *adapter);
 void e1000_reinit_locked(struct e1000_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c 
b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index 83e557c..975eeb8 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1553,7 +1553,7 @@ static void e1000_diag_test(struct net_device *netdev,
 
if (if_running)
/* indicate we're in test mode */
-   dev_close(netdev);
+   e1000_close(netdev);
else
e1000_reset(adapter);
 
@@ -1582,7 +1582,7 @@ static void e1000_diag_test(struct net_device *netdev,
e1000_reset(adapter);
clear_bit(__E1000_TESTING, &adapter->flags);
if (if_running)
-   dev_open(netdev);
+   e1000_open(netdev);
} else {
e_info(hw, "online testing starting\n");
/* Online tests */
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 3fc7bde..6de0c7d 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -114,8 +114,8 @@ static int e1000_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent);
 static void e1000_remove(struct pci_dev *pdev);
 static int e1000_alloc_queues(struct e1000_adapter *adapter);
 static int e1000_sw_init(struct e1000_adapter *adapter);
-static int e1000_open(struct net_device *netdev);
-static int e1000_close(struct net_device *netdev);
+int e1000_open(struct net_device *netdev);
+int e1000_close(struct net_device *netdev);
 static void e1000_configure_tx(struct e1000_adapter *adapter);
 static void e1000_configure_rx(struct e1000_adapter *adapter);
 static void e1000_setup_rctl(struct e1000_adapter *adapter);
@@ -1360,7 +1360,7 @@ static int e1000_alloc_queues(struct e1000_adapter 
*adapter)
  * handler is registered with the OS, the watchdog task is started,
  * and the stack is notified that the interface is ready.
  **/
-static int e1000_open(struct net_device *netdev)
+int e1000_open(struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
@@ -1437,7 +1437,7 @@ err_setup_tx:
  * needs to be disabled.  A global MAC reset is issued to stop the
  * hardware, and all transmit and receive resources are freed.
  **/
-static int e1000_close(struct net_device *netdev)
+int e1000_close(struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
-- 
2.5.0



[PATCH net-next V2 2/6] ixgbe: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-03 Thread Stefan Assmann
Calling dev_close() causes IFF_UP to be cleared, which will remove the
interface's routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides, this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead, call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4b9156c..6cf1ac7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -875,6 +875,8 @@ extern const char ixgbe_driver_version[];
 extern char ixgbe_default_device_descr[];
 #endif /* IXGBE_FCOE */
 
+int ixgbe_open(struct net_device *netdev);
+int ixgbe_close(struct net_device *netdev);
 void ixgbe_up(struct ixgbe_adapter *adapter);
 void ixgbe_down(struct ixgbe_adapter *adapter);
 void ixgbe_reinit_locked(struct ixgbe_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index bea96b3..2f05937 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2053,7 +2053,7 @@ static void ixgbe_diag_test(struct net_device *netdev,
 
if (if_running)
/* indicate we're in test mode */
-   dev_close(netdev);
+   ixgbe_close(netdev);
else
ixgbe_reset(adapter);
 
@@ -2091,7 +2091,7 @@ skip_loopback:
/* clear testing bit and return adapter to previous state */
clear_bit(__IXGBE_TESTING, &adapter->state);
if (if_running)
-   dev_open(netdev);
+   ixgbe_open(netdev);
else if (hw->mac.ops.disable_tx_laser)
hw->mac.ops.disable_tx_laser(hw);
} else {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c4003a8..354be34 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5988,7 +5988,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, 
int new_mtu)
  * handler is registered with the OS, the watchdog timer is started,
  * and the stack is notified that the interface is ready.
  **/
-static int ixgbe_open(struct net_device *netdev)
+int ixgbe_open(struct net_device *netdev)
 {
struct ixgbe_adapter *adapter = netdev_priv(netdev);
struct ixgbe_hw *hw = &adapter->hw;
@@ -6090,7 +6090,7 @@ static void ixgbe_close_suspend(struct ixgbe_adapter 
*adapter)
  * needs to be disabled.  A global MAC reset is issued to stop the
  * hardware, and all transmit and receive resources are freed.
  **/
-static int ixgbe_close(struct net_device *netdev)
+int ixgbe_close(struct net_device *netdev)
 {
struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-- 
2.5.0



[PATCH net-next V2 4/6] igb: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-03 Thread Stefan Assmann
Calling dev_close() causes IFF_UP to be cleared, which will remove the
interface's routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides, this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead, call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/igb/igb.h | 2 ++
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c| 8 
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h 
b/drivers/net/ethernet/intel/igb/igb.h
index e3cb93b..5dac361 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -510,6 +510,8 @@ enum igb_boards {
 extern char igb_driver_name[];
 extern char igb_driver_version[];
 
+int igb_open(struct net_device *netdev);
+int igb_close(struct net_device *netdev);
 int igb_up(struct igb_adapter *);
 void igb_down(struct igb_adapter *);
 void igb_reinit_locked(struct igb_adapter *);
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c 
b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 1d329f1..7982243 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2017,7 +2017,7 @@ static void igb_diag_test(struct net_device *netdev,
 
if (if_running)
/* indicate we're in test mode */
-   dev_close(netdev);
+   igb_close(netdev);
else
igb_reset(adapter);
 
@@ -2050,7 +2050,7 @@ static void igb_diag_test(struct net_device *netdev,
 
clear_bit(__IGB_TESTING, &adapter->state);
if (if_running)
-   dev_open(netdev);
+   igb_open(netdev);
} else {
dev_info(&adapter->pdev->dev, "online testing starting\n");
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 31e5f39..7a2ccb5 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -122,8 +122,8 @@ static void igb_setup_mrqc(struct igb_adapter *);
 static int igb_probe(struct pci_dev *, const struct pci_device_id *);
 static void igb_remove(struct pci_dev *pdev);
 static int igb_sw_init(struct igb_adapter *);
-static int igb_open(struct net_device *);
-static int igb_close(struct net_device *);
+int igb_open(struct net_device *);
+int igb_close(struct net_device *);
 static void igb_configure(struct igb_adapter *);
 static void igb_configure_tx(struct igb_adapter *);
 static void igb_configure_rx(struct igb_adapter *);
@@ -3132,7 +3132,7 @@ err_setup_tx:
return err;
 }
 
-static int igb_open(struct net_device *netdev)
+int igb_open(struct net_device *netdev)
 {
return __igb_open(netdev, false);
 }
@@ -3169,7 +3169,7 @@ static int __igb_close(struct net_device *netdev, bool 
suspending)
return 0;
 }
 
-static int igb_close(struct net_device *netdev)
+int igb_close(struct net_device *netdev)
 {
return __igb_close(netdev, false);
 }
-- 
2.5.0



[PATCH net-next V2 6/6] e1000e: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-03 Thread Stefan Assmann
Calling dev_close() causes IFF_UP to be cleared, which will remove the
interface's routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides, this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead, call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

V2: rename e1000_open(), e1000_close() to e1000e_open(), e1000e_close()
to avoid name clash with e1000.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/e1000e/e1000.h   |  2 ++
 drivers/net/ethernet/intel/e1000e/ethtool.c |  4 ++--
 drivers/net/ethernet/intel/e1000e/netdev.c  | 12 ++--
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h
index 1dc293b..52eb641 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -480,6 +480,8 @@ extern const char e1000e_driver_version[];
 void e1000e_check_options(struct e1000_adapter *adapter);
 void e1000e_set_ethtool_ops(struct net_device *netdev);
 
+int e1000e_open(struct net_device *netdev);
+int e1000e_close(struct net_device *netdev);
 void e1000e_up(struct e1000_adapter *adapter);
 void e1000e_down(struct e1000_adapter *adapter, bool reset);
 void e1000e_reinit_locked(struct e1000_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/e1000e/ethtool.c 
b/drivers/net/ethernet/intel/e1000e/ethtool.c
index 6cab1f3..1e3973a 100644
--- a/drivers/net/ethernet/intel/e1000e/ethtool.c
+++ b/drivers/net/ethernet/intel/e1000e/ethtool.c
@@ -1816,7 +1816,7 @@ static void e1000_diag_test(struct net_device *netdev,
 
if (if_running)
/* indicate we're in test mode */
-   dev_close(netdev);
+   e1000e_close(netdev);
 
if (e1000_reg_test(adapter, &data[0]))
eth_test->flags |= ETH_TEST_FL_FAILED;
@@ -1849,7 +1849,7 @@ static void e1000_diag_test(struct net_device *netdev,
 
clear_bit(__E1000_TESTING, &adapter->state);
if (if_running)
-   dev_open(netdev);
+   e1000e_open(netdev);
} else {
/* Online tests */
 
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index c71ba1b..02449a0 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4495,7 +4495,7 @@ static int e1000_test_msi(struct e1000_adapter *adapter)
 }
 
 /**
- * e1000_open - Called when a network interface is made active
+ * e1000e_open - Called when a network interface is made active
  * @netdev: network interface device structure
  *
  * Returns 0 on success, negative value on failure
@@ -4506,7 +4506,7 @@ static int e1000_test_msi(struct e1000_adapter *adapter)
  * handler is registered with the OS, the watchdog timer is started,
  * and the stack is notified that the interface is ready.
  **/
-static int e1000_open(struct net_device *netdev)
+int e1000e_open(struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct e1000_hw *hw = &adapter->hw;
@@ -4604,7 +4604,7 @@ err_setup_tx:
 }
 
 /**
- * e1000_close - Disables a network interface
+ * e1000e_close - Disables a network interface
  * @netdev: network interface device structure
  *
  * Returns 0, this is not allowed to fail
@@ -4614,7 +4614,7 @@ err_setup_tx:
  * needs to be disabled.  A global MAC reset is issued to stop the
  * hardware, and all transmit and receive resources are freed.
  **/
-static int e1000_close(struct net_device *netdev)
+int e1000e_close(struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
struct pci_dev *pdev = adapter->pdev;
@@ -6920,8 +6920,8 @@ static int e1000_set_features(struct net_device *netdev,
 }
 
 static const struct net_device_ops e1000e_netdev_ops = {
-   .ndo_open   = e1000_open,
-   .ndo_stop   = e1000_close,
+   .ndo_open   = e1000e_open,
+   .ndo_stop   = e1000e_close,
.ndo_start_xmit = e1000_xmit_frame,
.ndo_get_stats64= e1000e_get_stats64,
.ndo_set_rx_mode= e1000e_set_rx_mode,
-- 
2.5.0



[PATCH net-next V2 0/6] net/intel: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-03 Thread Stefan Assmann
Calling dev_close() causes IFF_UP to be cleared, which will remove the
interface's routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides, this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead, call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Tested this with igb, e1000e, ixgbe and i40e. All drivers now are able to
resume operation without restoring the IP address, gateway or such.

V2: rename e1000_open(), e1000_close() to e1000e_open(), e1000e_close()
to avoid name clash with e1000.
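
The difference between the two paths can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: the mock_*
names, MOCK_IFF_UP and the routes_present field are hypothetical, and
the real route and address handling lives in the networking core and is
merely imitated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_IFF_UP 0x1u	/* models the IFF_UP interface flag */

struct mock_netdev {
	unsigned int flags;	/* only MOCK_IFF_UP is modelled */
	bool hw_running;	/* driver state: hardware started or stopped */
	bool routes_present;	/* models routes/addresses kept by the stack */
};

/* Driver callbacks: they touch the hardware state only. */
static int mock_ndo_stop(struct mock_netdev *dev)
{
	assert(dev != NULL);
	dev->hw_running = false;
	return 0;
}

static int mock_ndo_open(struct mock_netdev *dev)
{
	assert(dev != NULL);
	dev->hw_running = true;
	return 0;
}

/* Models dev_close(): besides stopping the hardware it clears the up
 * flag, and the stack reacts by dropping the interface's routes.
 */
static void mock_dev_close(struct mock_netdev *dev)
{
	mock_ndo_stop(dev);
	dev->flags &= ~MOCK_IFF_UP;
	dev->routes_present = false;
}

/* Models dev_open(): the flag comes back, the dropped routes do not. */
static void mock_dev_open(struct mock_netdev *dev)
{
	mock_ndo_open(dev);
	dev->flags |= MOCK_IFF_UP;
}

/* Old selftest flow. */
static void selftest_via_dev_close(struct mock_netdev *dev)
{
	mock_dev_close(dev);
	/* ... offline tests would run here ... */
	mock_dev_open(dev);
}

/* Flow proposed by this series: the up flag is never touched. */
static void selftest_via_ndo_stop(struct mock_netdev *dev)
{
	mock_ndo_stop(dev);
	/* ... offline tests would run here ... */
	mock_ndo_open(dev);
}
```

In this model both flows leave the hardware running and the flag set
afterwards, but only the ndo_stop/ndo_open flow keeps routes_present
true.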


Stefan Assmann (6):
  i40e: call ndo_stop() instead of dev_close() when running offline
selftest
  ixgbe: call ndo_stop() instead of dev_close() when running offline
selftest
  ixgbevf: call ndo_stop() instead of dev_close() when running offline
selftest
  igb: call ndo_stop() instead of dev_close() when running offline
selftest
  e1000: call ndo_stop() instead of dev_close() when running offline
selftest
  e1000e: call ndo_stop() instead of dev_close() when running offline
selftest

 drivers/net/ethernet/intel/e1000/e1000.h  |  2 ++
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c  |  4 ++--
 drivers/net/ethernet/intel/e1000/e1000_main.c |  8 
 drivers/net/ethernet/intel/e1000e/e1000.h |  2 ++
 drivers/net/ethernet/intel/e1000e/ethtool.c   |  4 ++--
 drivers/net/ethernet/intel/e1000e/netdev.c| 12 ++--
 drivers/net/ethernet/intel/i40e/i40e.h|  2 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c|  4 ++--
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  4 
 drivers/net/ethernet/intel/igb/igb.h  |  2 ++
 drivers/net/ethernet/intel/igb/igb_ethtool.c  |  4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c |  8 
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c  |  4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  4 ++--
 drivers/net/ethernet/intel/ixgbevf/ethtool.c  |  4 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  2 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  4 ++--
 18 files changed, 41 insertions(+), 35 deletions(-)

-- 
2.5.0



[PATCH net-next V2 1/6] i40e: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-03 Thread Stefan Assmann
Calling dev_close() causes IFF_UP to be cleared, which will remove the
interface's routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides, this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead, call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 4 ++--
 drivers/net/ethernet/intel/i40e/i40e_main.c| 4 
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 68f2204..7e16c55 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -773,6 +773,7 @@ int i40e_vlan_rx_kill_vid(struct net_device *netdev,
  __always_unused __be16 proto, u16 vid);
 #endif
 int i40e_open(struct net_device *netdev);
+int i40e_close(struct net_device *netdev);
 int i40e_vsi_open(struct i40e_vsi *vsi);
 void i40e_vlan_stripping_disable(struct i40e_vsi *vsi);
 int i40e_vsi_add_vlan(struct i40e_vsi *vsi, s16 vid);
@@ -785,7 +786,6 @@ bool i40e_is_vsi_in_vlan(struct i40e_vsi *vsi);
 struct i40e_mac_filter *i40e_find_mac(struct i40e_vsi *vsi, u8 *macaddr,
  bool is_vf, bool is_netdev);
 #ifdef I40E_FCOE
-int i40e_close(struct net_device *netdev);
 int i40e_setup_tc(struct net_device *netdev, u8 tc);
 void i40e_netpoll(struct net_device *netdev);
 int i40e_fcoe_enable(struct net_device *netdev);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 29d5833..eeca530 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1704,7 +1704,7 @@ static void i40e_diag_test(struct net_device *netdev,
/* If the device is online then take it offline */
if (if_running)
/* indicate we're in test mode */
-   dev_close(netdev);
+   i40e_close(netdev);
else
/* This reset does not affect link - if it is
 * changed to a type of reset that does affect
@@ -1733,7 +1733,7 @@ static void i40e_diag_test(struct net_device *netdev,
i40e_do_reset(pf, BIT(__I40E_PF_RESET_REQUESTED));
 
if (if_running)
-   dev_open(netdev);
+   i40e_open(netdev);
} else {
/* Online tests */
netif_info(pf, drv, netdev, "online testing starting\n");
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8f3b53e..39a1653 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5459,11 +5459,7 @@ static void i40e_fdir_filter_exit(struct i40e_pf *pf)
  *
  * Returns 0, this is not allowed to fail
  **/
-#ifdef I40E_FCOE
 int i40e_close(struct net_device *netdev)
-#else
-static int i40e_close(struct net_device *netdev)
-#endif
 {
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
-- 
2.5.0



[patch net-next 1/3] bridge: mdb: add support for offloaded mdb entries

2016-02-03 Thread Jiri Pirko
From: Elad Raz 

Add a new bitmask member 'flags' to the br_mdb_entry structure, and add the
MDB_FLAGS_OFFLOAD bit, which indicates that an MDB entry is offloaded to
hardware.

Signed-off-by: Elad Raz 
Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/if_bridge.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 18db144..ec35472 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -183,6 +183,8 @@ struct br_mdb_entry {
 #define MDB_TEMPORARY 0
 #define MDB_PERMANENT 1
__u8 state;
+#define MDB_FLAGS_OFFLOAD  (1 << 0)
+   __u8 flags;
__u16 vid;
struct {
union {
-- 
1.9.3



[patch net-next 2/3] bridge: mdb: Separate br_mdb_entry->state from net_bridge_port_group->state

2016-02-03 Thread Jiri Pirko
From: Elad Raz 

Rename the net_bridge_port_group 'state' member to 'flags' and define a new
set of flags internal to the kernel.
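
The translation from the kernel-internal flags to the two user-visible
fields can be sketched in isolation as follows. The numeric values of
MDB_PG_FLAGS_PERMANENT and MDB_PG_FLAGS_OFFLOAD are assumptions, since
the br_private.h hunk is not visible here; struct mdb_entry_view and
fill_entry_flags() are hypothetical names that mirror the helper in the
diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-visible values, as in include/uapi/linux/if_bridge.h. */
#define MDB_TEMPORARY		0
#define MDB_PERMANENT		1
#define MDB_FLAGS_OFFLOAD	(1 << 0)

/* Kernel-internal flag bits: ASSUMED values for this sketch. */
#define MDB_PG_FLAGS_PERMANENT	(1 << 0)
#define MDB_PG_FLAGS_OFFLOAD	(1 << 1)

/* Hypothetical cut-down view of struct br_mdb_entry. */
struct mdb_entry_view {
	uint8_t state;
	uint8_t flags;
};

static void fill_entry_flags(struct mdb_entry_view *e, unsigned char pg_flags)
{
	assert(e != NULL);
	/* Works as a direct mask only while the internal PERMANENT bit
	 * has the same value as MDB_PERMANENT.
	 */
	e->state = pg_flags & MDB_PG_FLAGS_PERMANENT;
	e->flags = 0;
	if (pg_flags & MDB_PG_FLAGS_OFFLOAD)
		e->flags |= MDB_FLAGS_OFFLOAD;
}
```

The point of the split is that internal bits such as the offload bit
never leak into the user-visible 'state' field, which keeps its
temporary/permanent meaning.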

Signed-off-by: Elad Raz 
Signed-off-by: Jiri Pirko 
---
 net/bridge/br_mdb.c   | 16 
 net/bridge/br_multicast.c | 16 
 net/bridge/br_private.h   |  9 ++---
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 30e105f..5312570 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -41,6 +41,14 @@ fail:
return -EMSGSIZE;
 }
 
+static void __mdb_entry_fill_flags(struct br_mdb_entry *e, unsigned char flags)
+{
+   e->state = flags & MDB_PG_FLAGS_PERMANENT;
+   e->flags = 0;
+   if (flags & MDB_PG_FLAGS_OFFLOAD)
+   e->flags |= MDB_FLAGS_OFFLOAD;
+}
+
 static int br_mdb_fill_info(struct sk_buff *skb, struct netlink_callback *cb,
struct net_device *dev)
 {
@@ -85,8 +93,8 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct 
netlink_callback *cb,
struct br_mdb_entry e;
memset(&e, 0, sizeof(e));
e.ifindex = port->dev->ifindex;
-   e.state = p->state;
e.vid = p->addr.vid;
+   __mdb_entry_fill_flags(&e, p->flags);
if (p->addr.proto == htons(ETH_P_IP))
e.addr.u.ip4 = p->addr.u.ip4;
 #if IS_ENABLED(CONFIG_IPV6)
@@ -254,7 +262,7 @@ errout:
 }
 
 void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
-  struct br_ip *group, int type, u8 state)
+  struct br_ip *group, int type, u8 flags)
 {
struct br_mdb_entry entry;
 
@@ -265,8 +273,8 @@ void br_mdb_notify(struct net_device *dev, struct 
net_bridge_port *port,
 #if IS_ENABLED(CONFIG_IPV6)
entry.addr.u.ip6 = group->u.ip6;
 #endif
-   entry.state = state;
entry.vid = group->vid;
+   __mdb_entry_fill_flags(&entry, flags);
__br_mdb_notify(dev, &entry, type);
 }
 
@@ -568,7 +576,7 @@ static int __br_mdb_del(struct net_bridge *br, struct 
br_mdb_entry *entry)
if (p->port->state == BR_STATE_DISABLED)
goto unlock;
 
-   entry->state = p->state;
+   __mdb_entry_fill_flags(entry, p->flags);
rcu_assign_pointer(*pp, p->next);
hlist_del_init(&p->mglist);
del_timer(&p->timer);
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 03661d9..d156491 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -284,7 +284,7 @@ static void br_multicast_del_pg(struct net_bridge *br,
hlist_del_init(&p->mglist);
del_timer(&p->timer);
br_mdb_notify(br->dev, p->port, &pg->addr, RTM_DELMDB,
- p->state);
+ p->flags);
call_rcu_bh(&p->rcu, br_multicast_free_pg);
 
if (!mp->ports && !mp->mglist &&
@@ -304,7 +304,7 @@ static void br_multicast_port_group_expired(unsigned long 
data)
 
spin_lock(&br->multicast_lock);
if (!netif_running(br->dev) || timer_pending(&pg->timer) ||
-   hlist_unhashed(&pg->mglist) || pg->state & MDB_PERMANENT)
+   hlist_unhashed(&pg->mglist) || pg->flags & MDB_PG_FLAGS_PERMANENT)
goto out;
 
br_multicast_del_pg(br, pg);
@@ -649,7 +649,7 @@ struct net_bridge_port_group *br_multicast_new_port_group(
struct net_bridge_port *port,
struct br_ip *group,
struct net_bridge_port_group __rcu *next,
-   unsigned char state)
+   unsigned char flags)
 {
struct net_bridge_port_group *p;
 
@@ -659,7 +659,7 @@ struct net_bridge_port_group *br_multicast_new_port_group(
 
p->addr = *group;
p->port = port;
-   p->state = state;
+   p->flags = flags;
rcu_assign_pointer(p->next, next);
hlist_add_head(&p->mglist, &port->mglist);
setup_timer(&p->timer, br_multicast_port_group_expired,
@@ -702,11 +702,11 @@ static int br_multicast_add_group(struct net_bridge *br,
break;
}
 
-   p = br_multicast_new_port_group(port, group, *pp, MDB_TEMPORARY);
+   p = br_multicast_new_port_group(port, group, *pp, 0);
if (unlikely(!p))
goto err;
rcu_assign_pointer(*pp, p);
-   br_mdb_notify(br->dev, port, group, RTM_NEWMDB, MDB_TEMPORARY);
+   br_mdb_notify(br->dev, port, group, RTM_NEWMDB, 0);
 
 found:
mod_timer(&p->timer, now + br->multicast_membership_interval);
@@ -975,7 +975,7 @@ void br_multicast_disable_port(struct net_bridge_port *port)
 

[patch net-next 3/3] bridge: mdb: Passing the port-group pointer to br_mdb module

2016-02-03 Thread Jiri Pirko
From: Elad Raz 

Pass the port-group to br_mdb in order to allow direct access to the
structure. br_mdb will later use the structure to reflect the HW offload
status via the "state" variable.
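
As a rough illustration of why the pointer is needed: once the notify
path has the port-group itself, it can record a successful hardware add
directly on it. The sketch below is hypothetical user-space code, not
the bridge implementation; struct pg_model, the hw_add_* callbacks and
the PG_FLAGS_OFFLOAD value are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PG_FLAGS_OFFLOAD (1 << 1)	/* assumed bit value */

/* Hypothetical cut-down port-group. */
struct pg_model {
	unsigned char flags;
};

/* Stand-ins for the hardware add call: 0 on success, -errno on failure. */
typedef int (*hw_add_fn)(void);

static int hw_add_ok(void)
{
	return 0;
}

static int hw_add_unsupported(void)
{
	return -EOPNOTSUPP;
}

/* Mark the group as offloaded only when the hardware accepted the
 * entry and a group was actually passed in (callers may pass NULL).
 */
static int notify_new_entry(struct pg_model *pg, hw_add_fn hw_add)
{
	int err;

	assert(hw_add != NULL);
	err = hw_add();
	if (!err && pg)
		pg->flags |= PG_FLAGS_OFFLOAD;
	return err;
}
```

A failed or unsupported hardware add leaves the flags untouched, so the
entry keeps being reported as handled in software only.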

Signed-off-by: Elad Raz 
Signed-off-by: Jiri Pirko 
---
 net/bridge/br_mdb.c   | 51 +++
 net/bridge/br_multicast.c |  8 +++-
 net/bridge/br_private.h   |  4 ++--
 3 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 5312570..ac089286 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -217,7 +217,7 @@ static inline size_t rtnl_mdb_nlmsg_size(void)
 }
 
 static void __br_mdb_notify(struct net_device *dev, struct br_mdb_entry *entry,
-   int type)
+   int type, struct net_bridge_port_group *pg)
 {
struct switchdev_obj_port_mdb mdb = {
.obj = {
@@ -240,10 +240,13 @@ static void __br_mdb_notify(struct net_device *dev, 
struct br_mdb_entry *entry,
 #endif
 
mdb.obj.orig_dev = port_dev;
-   if (port_dev && type == RTM_NEWMDB)
-   switchdev_port_obj_add(port_dev, &mdb.obj);
-   else if (port_dev && type == RTM_DELMDB)
+   if (port_dev && type == RTM_NEWMDB) {
+   err = switchdev_port_obj_add(port_dev, &mdb.obj);
+   if (!err && pg)
+   pg->flags |= MDB_PG_FLAGS_OFFLOAD;
+   } else if (port_dev && type == RTM_DELMDB) {
switchdev_port_obj_del(port_dev, &mdb.obj);
+   }
 
skb = nlmsg_new(rtnl_mdb_nlmsg_size(), GFP_ATOMIC);
if (!skb)
@@ -261,21 +264,21 @@ errout:
rtnl_set_sk_err(net, RTNLGRP_MDB, err);
 }
 
-void br_mdb_notify(struct net_device *dev, struct net_bridge_port *port,
-  struct br_ip *group, int type, u8 flags)
+void br_mdb_notify(struct net_device *dev, struct net_bridge_port_group *pg,
+  int type)
 {
struct br_mdb_entry entry;
 
memset(&entry, 0, sizeof(entry));
-   entry.ifindex = port->dev->ifindex;
-   entry.addr.proto = group->proto;
-   entry.addr.u.ip4 = group->u.ip4;
+   entry.ifindex = pg->port->dev->ifindex;
+   entry.addr.proto = pg->addr.proto;
+   entry.addr.u.ip4 = pg->addr.u.ip4;
 #if IS_ENABLED(CONFIG_IPV6)
-   entry.addr.u.ip6 = group->u.ip6;
+   entry.addr.u.ip6 = pg->addr.u.ip6;
 #endif
-   entry.vid = group->vid;
-   __mdb_entry_fill_flags(&entry, flags);
-   __br_mdb_notify(dev, &entry, type);
+   entry.vid = pg->addr.vid;
+   __mdb_entry_fill_flags(&entry, pg->flags);
+   __br_mdb_notify(dev, &entry, type, pg);
 }
 
 static int nlmsg_populate_rtr_fill(struct sk_buff *skb,
@@ -420,7 +423,8 @@ static int br_mdb_parse(struct sk_buff *skb, struct 
nlmsghdr *nlh,
 }
 
 static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port 
*port,
-   struct br_ip *group, unsigned char state)
+   struct br_ip *group, unsigned char state,
+   struct net_bridge_port_group **pg)
 {
struct net_bridge_mdb_entry *mp;
struct net_bridge_port_group *p;
@@ -451,6 +455,7 @@ static int br_mdb_add_group(struct net_bridge *br, struct 
net_bridge_port *port,
if (unlikely(!p))
return -ENOMEM;
rcu_assign_pointer(*pp, p);
+   *pg = p;
if (state == MDB_TEMPORARY)
mod_timer(&p->timer, now + br->multicast_membership_interval);
 
@@ -458,7 +463,8 @@ static int br_mdb_add_group(struct net_bridge *br, struct 
net_bridge_port *port,
 }
 
 static int __br_mdb_add(struct net *net, struct net_bridge *br,
-   struct br_mdb_entry *entry)
+   struct br_mdb_entry *entry,
+   struct net_bridge_port_group **pg)
 {
struct br_ip ip;
struct net_device *dev;
@@ -487,7 +493,7 @@ static int __br_mdb_add(struct net *net, struct net_bridge 
*br,
 #endif
 
spin_lock_bh(&br->multicast_lock);
-   ret = br_mdb_add_group(br, p, &ip, entry->state);
+   ret = br_mdb_add_group(br, p, &ip, entry->state, pg);
spin_unlock_bh(&br->multicast_lock);
return ret;
 }
@@ -495,6 +501,7 @@ static int __br_mdb_add(struct net *net, struct net_bridge 
*br,
 static int br_mdb_add(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
struct net *net = sock_net(skb->sk);
+   struct net_bridge_port_group *pg;
struct net_bridge_vlan_group *vg;
struct net_device *dev, *pdev;
struct br_mdb_entry *entry;
@@ -524,15 +531,15 @@ static int br_mdb_add(struct sk_buff *skb, struct 
nlmsghdr *nlh)
if (br_vlan_enabled(br) && vg && entry->vid == 0) {
list_for_each_entry(v, &vg->vlan_list, vlist) {
entry->vid = v->vid;
-   err = __br_mdb_add(net, br, entry);
+   err = __br_mdb_add(net, br, entry, &pg);

[net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload

2016-02-03 Thread John Fastabend
This adds initial support for offloading the u32 tc classifier. This
initial implementation only implements a few base matches and actions
to illustrate the use of the infrastructure patches.

However it is an interesting subset because it handles the u32 next
hdr logic to correctly map tcp packets from ip headers using the ihl
and protocol fields. After this is accepted we can extend the match
and action fields easily by updating the model header file.

Also only the drop action is supported initially.

Here is a short test script,

 #tc qdisc add dev eth4 ingress
 #tc filter add dev eth4 parent ffff: protocol ip \
        u32 ht 800: order 1 \
        match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop

<-- hardware has dst/src ip match rule installed -->

 #tc filter del dev eth4 parent ffff: prio 49152
 #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
        handle 1: u32 divisor 1
 #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
        u32 ht 800: order 1 link 1: \
        offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
 #tc filter add dev eth4 parent ffff: protocol ip \
        u32 ht 1: order 3 match tcp src 23 ffff action drop

<-- hardware has tcp src port rule installed -->

 #tc qdisc del dev eth4 parent ffff:

<-- hardware cleaned up -->
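
As a rough illustration of the "offset at 0 mask 0f00 shift 6" step in the
script above, here is a minimal user-space sketch (not driver code). The
helper name ipv4_payload_offset is hypothetical, and it assumes the first
16 bits of the IPv4 header are passed in host byte order:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * word0 is the first 16 bits of the IPv4 header (version, IHL, TOS)
 * in host byte order, e.g. 0x4500 for a header without options.
 */
static unsigned int ipv4_payload_offset(uint16_t word0)
{
	/* "mask 0f00" keeps the IHL nibble (header length in 32-bit words).
	 * "shift 6" is a right shift by 8 to extract the nibble combined
	 * with a left shift by 2 to convert words to bytes.
	 */
	return (word0 & 0x0f00u) >> 6;
}
```

With no IP options the result is 20 bytes, which is where the linked table
(ht 1: in the script) would start matching TCP fields.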

Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h |3 
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  196 ++
 3 files changed, 198 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 4b9156c..09c2d9b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -925,6 +925,9 @@ s32 ixgbe_fdir_erase_perfect_filter_82599(struct ixgbe_hw 
*hw,
  u16 soft_id);
 void ixgbe_atr_compute_perfect_hash_82599(union ixgbe_atr_input *input,
  union ixgbe_atr_input *mask);
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+   struct ixgbe_fdir_filter *input,
+   u16 sw_idx);
 void ixgbe_set_rx_mode(struct net_device *netdev);
 #ifdef CONFIG_IXGBE_DCB
 void ixgbe_set_rx_drop_en(struct ixgbe_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index bea96b3..726e0ee 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2520,9 +2520,9 @@ static int ixgbe_get_rxnfc(struct net_device *dev, struct 
ethtool_rxnfc *cmd,
return ret;
 }
 
-static int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
-  struct ixgbe_fdir_filter *input,
-  u16 sw_idx)
+int ixgbe_update_ethtool_fdir_entry(struct ixgbe_adapter *adapter,
+   struct ixgbe_fdir_filter *input,
+   u16 sw_idx)
 {
struct ixgbe_hw *hw = &adapter->hw;
struct hlist_node *node2;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 03e236c..a1a91bf 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_OF
 #include 
@@ -8200,10 +8201,197 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
return 0;
 }
 
+#include 
+#include "ixgbe_model.h"
+static int ixgbe_delete_clsu32(struct ixgbe_adapter *adapter,
+  struct tc_cls_u32_offload *cls)
+{
+   int err;
+
+   spin_lock(&adapter->fdir_perfect_lock);
+   err = ixgbe_update_ethtool_fdir_entry(adapter, NULL, cls->knode.handle);
+   spin_unlock(&adapter->fdir_perfect_lock);
+   return err;
+}
+
+#define IXGBE_MAX_LINK_HANDLE 10
+static struct ixgbe_mat_field *
+ixgbe_jump_tables[IXGBE_MAX_LINK_HANDLE] = {ixgbe_ipv4_fields,};
+
+static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
+ __be16 protocol,
+ struct tc_cls_u32_offload *cls)
+{
+   u32 loc = cls->knode.handle & 0xf;
+   struct ixgbe_hw *hw = &adapter->hw;
+   struct ixgbe_mat_field *field_ptr;
+   struct ixgbe_fdir_filter *input;
+   union ixgbe_atr_input mask;
+#ifdef CONFIG_NET_CLS_ACT
+   const struct tc_action *a;
+#endif
+   int i, err = 0;
+   u8 queue;
+   u32 handle;
+
+   memset(&mask, 0, sizeof(union ixgbe_atr_input));
+   handle = cls->knode.handle;
+
+   /* At the moment cls_u32 jumps to transport layer and skips past
+* L2 headers. The canonical method to match L2 frames is to 

[net-next PATCH 6/7] net: ixgbe: add minimal parser details for ixgbe

2016-02-03 Thread John Fastabend
This adds an ixgbe data structure that is used to determine what
headers:fields can be matched and in what order they are supported.

For hardware devices this can be a bit tricky because typically
only pre-programmed (firmware, ucode, rtl) parse graphs will be
supported and we don't yet have an interface to change these from
the OS. So it's sort of a "you get whatever your friendly vendor
provides" affair at the moment.

In the future we can add the get routines and set routines to
update this data structure. One interesting thing to note here
is the data structure here identifies ethernet, ip, and tcp
fields without having to hardcode them as enumerations or use
other identifiers.
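
A minimal user-space sketch of how such a table of header fields might be
consulted is shown here. The names flow_filter, mat_field and program_match
are hypothetical stand-ins for the driver structures, and the rule that a
requested mask must be a subset of the supported mask is an assumption made
for this example, not necessarily what the driver does:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the driver's filter state (assumption for this sketch). */
struct flow_filter {
	uint32_t src_ip, src_mask;
	uint32_t dst_ip, dst_mask;
};

struct mat_field {
	unsigned int off;	/* byte offset of the field within its header */
	uint32_t mask;		/* bits of the field that can be matched */
	int (*val)(struct flow_filter *f, uint32_t val, uint32_t m);
};

static int prgm_sip(struct flow_filter *f, uint32_t val, uint32_t m)
{
	f->src_ip = val;
	f->src_mask = m;
	return 0;
}

static int prgm_dip(struct flow_filter *f, uint32_t val, uint32_t m)
{
	f->dst_ip = val;
	f->dst_mask = m;
	return 0;
}

static const struct mat_field ipv4_fields[] = {
	{ .off = 12, .mask = 0xffffffffu, .val = prgm_sip },
	{ .off = 16, .mask = 0xffffffffu, .val = prgm_dip },
	{ .val = NULL } /* terminal node */
};

/* Walk the table until the terminal node; return 0 if the (off, mask)
 * pair maps to a supported field and was programmed, -1 otherwise.
 */
static int program_match(const struct mat_field *tbl, struct flow_filter *f,
			 unsigned int off, uint32_t val, uint32_t m)
{
	for (; tbl->val; tbl++) {
		if (tbl->off == off && (m & ~tbl->mask) == 0)
			return tbl->val(f, val, m);
	}
	return -1;
}
```

Because the table is data rather than a list of enumerations, supporting
another field only means adding another row.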

Signed-off-by: John Fastabend 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h |  112 
 1 file changed, 112 insertions(+)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
new file mode 100644
index 000..43ebec4
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_model.h
@@ -0,0 +1,112 @@
+/***
+ *
+ * Intel 10 Gigabit PCI Express Linux driver
+ * Copyright(c) 2013 - 2015 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see .
+ *
+ * The full GNU General Public License is included in this distribution in
+ * the file called "COPYING".
+ *
+ * Contact Information:
+ * e1000-devel Mailing List 
+ * Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
+ *
+ **/
+
+#ifndef _IXGBE_MODEL_H_
+#define _IXGBE_MODEL_H_
+
+#include "ixgbe.h"
+#include "ixgbe_type.h"
+
+struct ixgbe_mat_field {
+   unsigned int off;
+   unsigned int mask;
+   int (*val)(struct ixgbe_fdir_filter *input,
+  union ixgbe_atr_input *mask,
+  __u32 val, __u32 m);
+   unsigned int type;
+};
+
+static inline int ixgbe_mat_prgm_sip(struct ixgbe_fdir_filter *input,
+union ixgbe_atr_input *mask,
+__u32 val, __u32 m)
+{
+   input->filter.formatted.src_ip[0] = val;
+   mask->formatted.src_ip[0] = m;
+   return 0;
+}
+
+static inline int ixgbe_mat_prgm_dip(struct ixgbe_fdir_filter *input,
+union ixgbe_atr_input *mask,
+__u32 val, __u32 m)
+{
+   input->filter.formatted.dst_ip[0] = val;
+   mask->formatted.dst_ip[0] = m;
+   return 0;
+}
+
+static struct ixgbe_mat_field ixgbe_ipv4_fields[] = {
+   { .off = 12, .mask = -1, .val = ixgbe_mat_prgm_sip,
+ .type = IXGBE_ATR_FLOW_TYPE_IPV4},
+   { .off = 16, .mask = -1, .val = ixgbe_mat_prgm_dip,
+ .type = IXGBE_ATR_FLOW_TYPE_IPV4},
+   { .val = NULL } /* terminal node */
+};
+
+static inline int ixgbe_mat_prgm_sport(struct ixgbe_fdir_filter *input,
+  union ixgbe_atr_input *mask,
+  __u32 val, __u32 m)
+{
+   input->filter.formatted.src_port = val & 0xffff;
+   mask->formatted.src_port = m & 0xffff;
+   return 0;
+};
+
+static inline int ixgbe_mat_prgm_dport(struct ixgbe_fdir_filter *input,
+  union ixgbe_atr_input *mask,
+  __u32 val, __u32 m)
+{
+   input->filter.formatted.dst_port = val & 0xffff;
+   mask->formatted.dst_port = m & 0xffff;
+   return 0;
+};
+
+static struct ixgbe_mat_field ixgbe_tcp_fields[] = {
+   {.off = 0, .mask = 0xffff, .val = ixgbe_mat_prgm_sport,
+.type = IXGBE_ATR_FLOW_TYPE_TCPV4},
+   {.off = 2, .mask = 0xffff, .val = ixgbe_mat_prgm_dport,
+.type = IXGBE_ATR_FLOW_TYPE_TCPV4},
+   { .val = NULL } /* terminal node */
+};
+
+struct ixgbe_nexthdr {
+   /* offset, shift, and mask of position to next header */
+   unsigned int o;
+   __u32 s;
+   __u32 m;
+   /* match criteria to make this jump*/
+   unsigned int off;
+   __u32 val;
+   __u32 mask;
+   /* location of jump to make */
+   struct ixgbe_mat_field *jump;
+};
+
+static struct ixgbe_nexthdr ixgbe_ipv4_jumps[] = {
+ 

[net-next PATCH 0/7] tc offload for cls_u32 on ixgbe

2016-02-03 Thread John Fastabend
This extends the setup_tc framework so it can support more than just
the mqprio offload and push other classifiers and qdiscs into the
hardware. The series here targets the u32 classifier and ixgbe
driver. I worked out the u32 classifier because it is protocol
oblivious and aligns with multiple hardware devices I have access
to. I did an initial implementation on ixgbe because (a) I have one
in my box (b) it's a stable driver and (c) it is relatively simple
compared to the other devices I have here but still has enough
flexibility to exercise the features of cls_u32.

I intentionally limited the scope of this series to the basic
feature set. Specifically this uses a 'big hammer' feature bit
to do the offload or not. If the bit is set you get offloaded rules
if it is not then rules will not be offloaded. If we can agree on
this patch series there are some more patches on my queue we can
talk about to make the offload decision per rule using flags similar
to how we do l2 mac updates. Additionally the error strategy can
be improved to be hard aborting, log and continue, etc. I think
these are nice to have improvements but shouldn't block this series.

Also by adding get_parse_graph and set_parse_graph attributes as
in my previous flow_api work we can build programmable devices
and programmatically learn when rules can or can not be loaded
into the hardware. Again future work.

Any comments/feedback appreciated.

Thanks,
John

---

John Fastabend (7):
  net: rework ndo tc op to consume additional qdisc handle parameter
  net: rework setup_tc ndo op to consume general tc operand
  net: sched: add cls_u32 offload hooks for netdevs
  net: add tc offload feature flag
  net: tc: helper functions to query action types
  net: ixgbe: add minimal parser details for ixgbe
  net: ixgbe: add support for tc_u32 offload


 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  |8 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h  |2 
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |2 
 drivers/net/ethernet/broadcom/bnxt/bnxt.c|9 +
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |   11 +
 drivers/net/ethernet/intel/i40e/i40e_main.c  |   10 +
 drivers/net/ethernet/intel/ixgbe/ixgbe.h |3 
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  206 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h   |  112 
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   13 +
 drivers/net/ethernet/sfc/efx.h   |3 
 drivers/net/ethernet/sfc/tx.c|   10 +
 drivers/net/ethernet/ti/netcp_core.c |   14 +
 include/linux/netdev_features.h  |3 
 include/linux/netdevice.h|   24 ++-
 include/net/pkt_cls.h|   33 
 include/net/tc_act/tc_gact.h |   16 ++
 net/core/ethtool.c   |1 
 net/sched/cls_u32.c  |   73 
 net/sched/sch_mqprio.c   |8 +
 21 files changed, 541 insertions(+), 26 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h

--
Signature


Re: [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter

2016-02-03 Thread kbuild test robot
Hi John,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/John-Fastabend/tc-offload-for-cls_u32-on-ixgbe/20160203-173342
config: x86_64-randconfig-x016-201605 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c: In function '__fm10k_setup_tc':
>> drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:16: error: 'TC_H_ROOT' undeclared (first use in this function)
     if (handle != TC_H_ROOT)
                   ^
   drivers/net/ethernet/intel/fm10k/fm10k_netdev.c:1209:16: note: each undeclared identifier is reported only once for each function it appears in

vim +/TC_H_ROOT +1209 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c

  1203  
  1204  return err;
  1205  }
  1206  
  1207  static int __fm10k_setup_tc(struct net_device *dev, u32 handle, u8 tc)
  1208  {
> 1209  if (handle != TC_H_ROOT)
  1210  return -EINVAL;
  1211  
  1212  return fm10k_setup_tc(dev, tc);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe

2016-02-03 Thread Amir Vadai
On Wed, Feb 03, 2016 at 01:27:32AM -0800, John Fastabend wrote:
> This extends the setup_tc framework so it can support more than just
> the mqprio offload and push other classifiers and qdiscs into the
> hardware. The series here targets the u32 classifier and ixgbe
> driver. I worked out the u32 classifier because it is protocol
> oblivious and aligns with multiple hardware devices I have access
> to. I did an initial implementation on ixgbe because (a) I have one
> in my box (b) it's a stable driver and (c) it is relatively simple
> compared to the other devices I have here but still has enough
> flexibility to exercise the features of cls_u32.
> 
> I intentionally limited the scope of this series to the basic
> feature set. Specifically this uses a 'big hammer' feature bit
> to do the offload or not. If the bit is set you get offloaded rules
> if it is not then rules will not be offloaded. If we can agree on
> this patch series there are some more patches on my queue we can
> talk about to make the offload decision per rule using flags similar
> to how we do l2 mac updates. Additionally the error strategy can
> be improved to be hard aborting, log and continue, etc. I think
> these are nice to have improvements but shouldn't block this series.
> 
> Also by adding get_parse_graph and set_parse_graph attributes as
> in my previous flow_api work we can build programmable devices
> and programmatically learn when rules can or can not be loaded
> into the hardware. Again future work.
> 
> Any comments/feedback appreciated.
> 
> Thanks,
> John
> 
> ---
> 
> John Fastabend (7):
>   net: rework ndo tc op to consume additional qdisc handle parameter
>   net: rework setup_tc ndo op to consume general tc operand
>   net: sched: add cls_u32 offload hooks for netdevs
>   net: add tc offload feature flag
>   net: tc: helper functions to query action types
>   net: ixgbe: add minimal parser details for ixgbe
>   net: ixgbe: add support for tc_u32 offload
> 

Hi John,

Nice work :)

I will add mlx5 support, and see if we can live with u32. If not, I will
add flower support too.

Amir


Re: [net-next PATCH 3/7] net: sched: add cls_u32 offload hooks for netdevs

2016-02-03 Thread kbuild test robot
Hi John,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/John-Fastabend/tc-offload-for-cls_u32-on-ixgbe/20160203-173342
config: i386-randconfig-x009-02010231 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/sched/cls_u32.c: In function 'u32_replace_hw_knode':
>> net/sched/cls_u32.c:478:33: error: 'struct tc_u_knode' has no member named 'val'
      offload.cls_u32->knode.val = n->val;
                                    ^
>> net/sched/cls_u32.c:479:34: error: 'struct tc_u_knode' has no member named 'mask'
      offload.cls_u32->knode.mask = n->mask;
                                     ^

vim +478 net/sched/cls_u32.c

   472  offload.cls_u32 = &u32_offload;
   473  
   474  if (dev->netdev_ops->ndo_setup_tc) {
   475  offload.cls_u32->command = TC_CLSU32_REPLACE_KNODE;
   476  offload.cls_u32->knode.handle = n->handle;
   477  offload.cls_u32->knode.fshift = n->fshift;
 > 478  offload.cls_u32->knode.val = n->val;
 > 479  offload.cls_u32->knode.mask = n->mask;
   480  offload.cls_u32->knode.sel = &n->sel;
   481  offload.cls_u32->knode.exts = &n->exts;
   482  if (n->ht_down)
---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload

2016-02-03 Thread John Fastabend
On 16-02-03 02:07 AM, Amir Vadai wrote:
> On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:
>> This adds initial support for offloading the u32 tc classifier. This
>> initial implementation only implements a few base matches and actions
>> to illustrate the use of the infrastructure patches.
>>
>> However it is an interesting subset because it handles the u32 next
>> hdr logic to correctly map tcp packets from ip headers using the ihl
>> and protocol fields. After this is accepted we can extend the match
>> and action fields easily by updating the model header file.
>>
>> Also only the drop action is supported initially.
>>
>> Here is a short test script,
>>
>>  #tc qdisc add dev eth4 ingress
>>  #tc filter add dev eth4 parent ffff: protocol ip \
>>  u32 ht 800: order 1 \
>>  match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
>>
>> <-- hardware has dst/src ip match rule installed -->
>>
>>  #tc filter del dev eth4 parent ffff: prio 49152
>>  #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
>>  handle 1: u32 divisor 1
>>  #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
>>  u32 ht 800: order 1 link 1: \
>>  offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
>>  #tc filter add dev eth4 parent ffff: protocol ip \
>>  u32 ht 1: order 3 match tcp src 23 ffff action drop
>>
>> <-- hardware has tcp src port rule installed -->
>>
>>  #tc qdisc del dev eth4 parent ffff:
>>
>> <-- hardware cleaned up -->
>>
>> Signed-off-by: John Fastabend 
>> ---
>>  drivers/net/ethernet/intel/ixgbe/ixgbe.h |3 
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 -
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  196 
>> ++
>>  3 files changed, 198 insertions(+), 7 deletions(-)
>>
> 
> What are you doing w.r.t priorities? Are the filters processed by the
> order of the priorities?
> 

The rules are put in order by the handles, which are populated in
my command above such that 'ht 1: order 3' gives handle 1::3 and
'ht 800: order 1' gives 800::1. Take a look at this block in cls_u32:

        if (err == 0) {
                struct tc_u_knode __rcu **ins;
                struct tc_u_knode *pins;

                ins = &ht->ht[TC_U32_HASH(handle)];
                for (pins = rtnl_dereference(*ins); pins;
                     ins = &pins->next, pins = rtnl_dereference(*ins))
                        if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
                                break;

                RCU_INIT_POINTER(n->next, pins);
                rcu_assign_pointer(*ins, n);
                u32_replace_hw_knode(tp, n);
                *arg = (unsigned long)n;
                return 0;


If you leave ht and order off the tc cli I believe 'tc' just
picks some semi-arbitrary ones for you. I've been in the habit
of always specifying them even for software filters.
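
To make the ordering concrete, here is a small user-space sketch of the
same sorted insertion, without RCU. The names knode, knode_insert and
U32_NODE are hypothetical; U32_NODE is meant to mirror TC_U32_NODE (the
low 12 bits of the handle, i.e. the 'order' part), and handle 1::3 is
written as 0x00100003 on that assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Low 12 bits of a u32 handle: the "order" part (assumed layout). */
#define U32_NODE(h) ((h) & 0xFFF)

struct knode {
	uint32_t handle;
	struct knode *next;
};

/* Insert n into one hash bucket, keeping nodes sorted by ascending
 * node id. A node with an equal id is placed after the existing ones,
 * matching the "<" comparison in the cls_u32 loop quoted above.
 */
static void knode_insert(struct knode **bucket, struct knode *n)
{
	struct knode **ins = bucket;

	while (*ins && U32_NODE(n->handle) >= U32_NODE((*ins)->handle))
		ins = &(*ins)->next;

	n->next = *ins;
	*ins = n;
}
```

So within one hash table bucket the filters are visited in 'order' sequence
regardless of the order in which they were added.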

[...]

>>  
>> +#include 
>> +#include "ixgbe_model.h"
> Did you leave those #include's in the middle of the file on purpose?
> 
> [...]
> 

Probably bad form; I'll move it to the top with the other header files.

Thanks!



[patch net-next RFC 6/6] mlx4: Implement port type setting via devlink interface

2016-02-03 Thread Jiri Pirko
From: Jiri Pirko 

So far, there has been an mlx4-specific sysfs file allowing the user to
change the port type to either Ethernet or InfiniBand. This is very
inconvenient.

Expose the same ability to set the port type in a generic way
using the devlink interface.

Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 86 +++
 1 file changed, 65 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index a5f54a5..4bac714 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1052,36 +1052,20 @@ static ssize_t show_port_type(struct device *dev,
return strlen(buf);
 }
 
-static ssize_t set_port_type(struct device *dev,
-struct device_attribute *attr,
-const char *buf, size_t count)
+static int __set_port_type(struct mlx4_port_info *info,
+  enum mlx4_port_type port_type)
 {
-   struct mlx4_port_info *info = container_of(attr, struct mlx4_port_info,
-  port_attr);
struct mlx4_dev *mdev = info->dev;
struct mlx4_priv *priv = mlx4_priv(mdev);
enum mlx4_port_type types[MLX4_MAX_PORTS];
enum mlx4_port_type new_types[MLX4_MAX_PORTS];
-   static DEFINE_MUTEX(set_port_type_mutex);
int i;
int err = 0;
 
-   mutex_lock(&set_port_type_mutex);
-
-   if (!strcmp(buf, "ib\n"))
-   info->tmp_type = MLX4_PORT_TYPE_IB;
-   else if (!strcmp(buf, "eth\n"))
-   info->tmp_type = MLX4_PORT_TYPE_ETH;
-   else if (!strcmp(buf, "auto\n"))
-   info->tmp_type = MLX4_PORT_TYPE_AUTO;
-   else {
-   mlx4_err(mdev, "%s is not supported port type\n", buf);
-   err = -EINVAL;
-   goto err_out;
-   }
-
mlx4_stop_sense(mdev);
mutex_lock(&priv->port_mutex);
+   info->tmp_type = port_type;
+
/* Possible type is always the one that was delivered */
mdev->caps.possible_type[info->port] = info->tmp_type;
 
@@ -1123,6 +1107,37 @@ static ssize_t set_port_type(struct device *dev,
 out:
mlx4_start_sense(mdev);
mutex_unlock(&priv->port_mutex);
+
+   return err;
+}
+
+static ssize_t set_port_type(struct device *dev,
+struct device_attribute *attr,
+const char *buf, size_t count)
+{
+   struct mlx4_port_info *info = container_of(attr, struct mlx4_port_info,
+  port_attr);
+   struct mlx4_dev *mdev = info->dev;
+   enum mlx4_port_type port_type;
+   static DEFINE_MUTEX(set_port_type_mutex);
+   int err;
+
+   mutex_lock(&set_port_type_mutex);
+
+   if (!strcmp(buf, "ib\n")) {
+   port_type = MLX4_PORT_TYPE_IB;
+   } else if (!strcmp(buf, "eth\n")) {
+   port_type = MLX4_PORT_TYPE_ETH;
+   } else if (!strcmp(buf, "auto\n")) {
+   port_type = MLX4_PORT_TYPE_AUTO;
+   } else {
+   mlx4_err(mdev, "%s is not supported port type\n", buf);
+   err = -EINVAL;
+   goto err_out;
+   }
+
+   err = __set_port_type(info, port_type);
+
 err_out:
mutex_unlock(&set_port_type_mutex);
 
@@ -3651,6 +3666,35 @@ err_disable_pdev:
return err;
 }
 
+static int mlx4_devlink_port_type_set(struct devlink_port *devlink_port,
+ enum devlink_port_type port_type)
+{
+   struct mlx4_port_info *info = container_of(devlink_port,
+  struct mlx4_port_info,
+  devlink_port);
+   enum mlx4_port_type mlx4_port_type;
+
+   switch (port_type) {
+   case DEVLINK_PORT_TYPE_AUTO:
+   mlx4_port_type = MLX4_PORT_TYPE_AUTO;
+   break;
+   case DEVLINK_PORT_TYPE_ETH:
+   mlx4_port_type = MLX4_PORT_TYPE_ETH;
+   break;
+   case DEVLINK_PORT_TYPE_IB:
+   mlx4_port_type = MLX4_PORT_TYPE_IB;
+   break;
+   default:
+   return -EOPNOTSUPP;
+   }
+
+   return __set_port_type(info, mlx4_port_type);
+}
+
+static const struct devlink_ops mlx4_devlink_ops = {
+   .port_type_set  = mlx4_devlink_port_type_set,
+};
+
 static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)
 {
struct devlink *devlink;
@@ -3660,7 +3704,7 @@ static int mlx4_init_one(struct pci_dev *pdev, const 
struct pci_device_id *id)
 
printk_once(KERN_INFO "%s", mlx4_version);
 
-   devlink = devlink_alloc(NULL, sizeof(*priv));
+   devlink = devlink_alloc(&mlx4_devlink_ops, sizeof(*priv));
if (!devlink)
return -ENOMEM;
priv = devlink_priv(devlink);
-- 
1.9.3



Re: [PATCH net-next 2/4] net: dev: add batching to net_device notifiers

2016-02-03 Thread Julian Anastasov

Hello,

On Tue, 2 Feb 2016, Salam Noureddine wrote:

> On Tue, Feb 2, 2016 at 12:55 PM, Julian Anastasov  wrote:
> >
> > If the rule is once per net, the above call...
> >
> >>   }
> >
> > should be here:
> >
> > call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> > net->loopback_dev);
> >
> > and also once after outroll label?:
> 
> That's a good optimization to add. I was mostly focusing on the device
> unregister path.

Yes, the idea is to avoid NETDEV_UNREGISTER_BATCH for
every dev. And not to forget to call it for every net.
In this case it is a cleanup path after failure.

> >>   call_netdevice_notifier(nb, NETDEV_UNREGISTER, dev);
> >> + call_netdevice_notifier(nb, NETDEV_UNREGISTER_BATCH,
> >> + dev);
> >
> > Above call...
> >
> >>   }
> >
> > should be here, for net->loopback_dev?
> > Also, is it ok to call NETDEV_DOWN_BATCH many times, as result,
> > sometimes after NETDEV_UNREGISTER?
> 
> Same here, I can add this optimization. I think it is fine to call the
> BATCH notifiers
> for every interface. It is just better to do it for many interfaces at
> the same time.

Agreed

> >> + list_for_each_entry_safe(net, net_tmp, _head, event_list) {
> >> + call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
> >> +  net->loopback_dev);
> >> + net_del_event_list(net);
> >> + }
> >> +
> >
> > NETDEV_UNREGISTER* should not be called before
> > following synchronize_net and NETDEV_UNREGISTER. May be
> > we should split the loop: loop (dev_shutdown+NETDEV_UNREGISTER)
> > followed by above NETDEV_UNREGISTER_BATCH then again the
> > loop for all remaining calls
> >
> >>   synchronize_net();
> 
> The call to NETDEV_UNREGISTER_BATCH is actually after NETDEV_UNREGISTER,
> it seems the other way around in the patch because it is showing part
> of netdev_wait_allrefs
> and not rollback_registered_many.

Aha, I see, it is after NETDEV_UNREGISTER, but maybe
the above loop should be changed to two loops so that
NETDEV_UNREGISTER_BATCH is called exactly after all
NETDEV_UNREGISTER and before all dev_*_flush and
ndo_uninit calls to avoid any risks. I mean:

synchronize_net();

First part of loop:

list_for_each_entry(dev, head, unreg_list) {
/* Shutdown queueing discipline. */
dev_shutdown(dev);

/* Notify protocols, that we are about to destroy
   this device. They should clean all the things.
*/
call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
}

This is the same NETDEV_UNREGISTER_BATCH logic:

+   list_for_each_entry(dev, head, unreg_list) {
+   net_add_event_list(_head, dev_net(dev));
+   }
+   list_for_each_entry_safe(net, net_tmp, _head, event_list) {
+   call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH,
+net->loopback_dev);
+   net_del_event_list(net);
+   }

Second part of the loop:

list_for_each_entry(dev, head, unreg_list) {
struct sk_buff *skb = NULL;

if (!dev->rtnl_link_ops ||
...
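
The "once per net" idea behind the middle step can be sketched in plain
user-space C as follows. This is only a toy model: struct net and struct
dev here are simplified stand-ins, and notify_batch_once_per_net with its
queued flag is a hypothetical replacement for the event list used in the
patch:

```c
#include <stddef.h>

/* Simplified stand-ins, not the kernel structures. */
struct net {
	int queued;		/* set while the net awaits its batch event */
	int batch_events;	/* number of batch events delivered */
};

struct dev {
	struct net *net;
};

/* Deliver one batch event per distinct net among devs[0..n).
 * Returns the number of batch events delivered.
 */
static int notify_batch_once_per_net(struct dev *devs, size_t n)
{
	size_t i;
	int calls = 0;

	/* Pass 1: mark every net that owns at least one listed device. */
	for (i = 0; i < n; i++)
		devs[i].net->queued = 1;

	/* Pass 2: deliver a single event to each marked net, clearing
	 * the mark so that later devices of the same net are skipped.
	 */
	for (i = 0; i < n; i++) {
		struct net *net = devs[i].net;

		if (net->queued) {
			net->queued = 0;
			net->batch_events++;
			calls++;
		}
	}
	return calls;
}
```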

Regards

--
Julian Anastasov 


[PATCH net-next V2 3/6] ixgbevf: call ndo_stop() instead of dev_close() when running offline selftest

2016-02-03 Thread Stefan Assmann
Calling dev_close() causes IFF_UP to be cleared, which removes the
interface's routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides, this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead, call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c  | 4 ++--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  | 2 ++
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c 
b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index c48aef6..d7aa4b2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -680,7 +680,7 @@ static void ixgbevf_diag_test(struct net_device *netdev,
 
if (if_running)
/* indicate we're in test mode */
-   dev_close(netdev);
+   ixgbevf_close(netdev);
else
ixgbevf_reset(adapter);
 
@@ -692,7 +692,7 @@ static void ixgbevf_diag_test(struct net_device *netdev,
 
clear_bit(__IXGBEVF_TESTING, &adapter->state);
if (if_running)
-   dev_open(netdev);
+   ixgbevf_open(netdev);
} else {
hw_dbg(&adapter->hw, "online testing starting\n");
/* Online tests */
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 68ec7daa..991eeae 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -486,6 +486,8 @@ extern const struct ixgbe_mbx_operations ixgbevf_mbx_ops;
 extern const char ixgbevf_driver_name[];
 extern const char ixgbevf_driver_version[];
 
+int ixgbevf_open(struct net_device *netdev);
+int ixgbevf_close(struct net_device *netdev);
 void ixgbevf_up(struct ixgbevf_adapter *adapter);
 void ixgbevf_down(struct ixgbevf_adapter *adapter);
 void ixgbevf_reinit_locked(struct ixgbevf_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c 
b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 3558f01..01f79fa 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3122,7 +3122,7 @@ static void ixgbevf_free_all_rx_resources(struct 
ixgbevf_adapter *adapter)
  * handler is registered with the OS, the watchdog timer is started,
  * and the stack is notified that the interface is ready.
  **/
-static int ixgbevf_open(struct net_device *netdev)
+int ixgbevf_open(struct net_device *netdev)
 {
struct ixgbevf_adapter *adapter = netdev_priv(netdev);
struct ixgbe_hw *hw = &adapter->hw;
@@ -3205,7 +3205,7 @@ err_setup_reset:
  * needs to be disabled.  A global MAC reset is issued to stop the
  * hardware, and all transmit and receive resources are freed.
  **/
-static int ixgbevf_close(struct net_device *netdev)
+int ixgbevf_close(struct net_device *netdev)
 {
struct ixgbevf_adapter *adapter = netdev_priv(netdev);
 
-- 
2.5.0



Re: [PATCH v3] net:Add sysctl_max_skb_frags

2016-02-03 Thread Hannes Frederic Sowa

On 03.02.2016 12:25, Herbert Xu wrote:
> On Wed, Feb 03, 2016 at 09:26:57AM +0100, Hans Westgaard Ry wrote:
>> Devices may have limits on the number of fragments in an skb they support.
>> Current codebase uses a constant as maximum for number of fragments one
>> skb can hold and use.
>> When enabling scatter/gather and running traffic with many small messages
>> the codebase uses the maximum number of fragments and may thereby violate
>> the max for certain devices.
>> The patch introduces a global variable as max number of fragments.
>>
>> Signed-off-by: Hans Westgaard Ry 
>> Reviewed-by: Håkon Bugge 
>
> I have to say this seems rather dirty.  I mean if taken to the
> extreme wouldn't this mean that we should disable frags altogether
> if some NIC can't handle them at all?
>
> Someone suggested earlier to partially linearise the skb, why
> couldn't we do that? IOW let's handle this craziness in the crazy
> drivers and not in the general stack.


Agreed that it feels like a hack, but a rather simple one. I would
consider this to be just a performance improvement. We certainly need
a slow-path when virtio drivers submit gso packets to the stack (and
already discussed with Hans). The sysctl can't help here. But without
the sysctl the packets would constantly hit the slow-path in case of
e.g. IPoIB and that would also be rather bad.

Thanks,
Hannes


[3.16.y-ckt stable] Patch "veth: don’t modify ip_summed; doing so treats packets with bad checksums as good." has been added to the 3.16.y-ck

2016-02-03 Thread Luis Henriques
This is a note to let you know that I have just added a patch titled

veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

to the linux-3.16.y-queue branch of the 3.16.y-ckt extended stable tree 
which can be found at:

http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.16.y-queue

This patch is scheduled to be released in version 3.16.7-ckt24.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.16.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Luis

---8<

From 3be302bd22c56fe6a9a1c6de7e14661f0d3defc1 Mon Sep 17 00:00:00 2001
From: Vijay Pandurangan 
Date: Fri, 18 Dec 2015 14:34:59 -0500
Subject: =?UTF-8?q?veth:=20don=E2=80=99t=20modify=20ip=5Fsummed;=20doing?=
 =?UTF-8?q?=20so=20treats=20packets=20with=20bad=20checksums=20as=20good.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f upstream.

Packets that arrive from real hardware devices have ip_summed ==
CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
current version of veth will replace CHECKSUM_NONE with
CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
a veth device to be delivered to the application. This caused applications
at Twitter to receive corrupt data when network hardware was corrupting
packets.

We believe this was added as an optimization to skip computing and
verifying checksums for communication between containers. However, locally
generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
written does nothing for them. As far as we can tell, after removing this
code, these packets are transmitted from one stack to another unmodified
(tcpdump shows invalid checksums on both sides, as expected), and they are
delivered correctly to applications. We didn’t test every possible network
configuration, but we tried a few common ones such as bridging containers,
using NAT between the host and a container, and routing from hardware
devices to containers. We have effectively deployed this in production at
Twitter (by disabling RX checksum offloading on veth devices).

This code dates back to the first version of the driver, commit
 ("[NET]: Virtual ethernet device driver"), so I
suspect this bug occurred mostly because the driver API has evolved
significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
packet checksumming") (in December 2010) fixed this for packets that get
created locally and sent to hardware devices, by not changing
CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
in from hardware devices.
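
The effect described above can be modelled in a few lines of userspace C.
This is only a sketch under stated assumptions: the enum and the two helper
functions are hypothetical stand-ins, not code from veth.c; only the state
names follow the kernel's CHECKSUM_* values.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's ip_summed states. */
enum csum_state {
    CSUM_NONE = 0,        /* not verified, or the packet is bad */
    CSUM_UNNECESSARY = 1, /* already verified, e.g. by hardware */
    CSUM_COMPLETE = 2,
    CSUM_PARTIAL = 3,     /* locally generated, checksum not yet filled in */
};

/* Model of the removed code: an unverified packet is marked as verified
 * whenever the receiving peer advertises RX checksum offload. */
static enum csum_state veth_csum_before(enum csum_state in, bool peer_rxcsum)
{
    if (in == CSUM_NONE && peer_rxcsum)
        return CSUM_UNNECESSARY;
    return in;
}

/* Model of the patched behaviour: ip_summed is passed through untouched,
 * so the receiving stack still verifies unverified packets itself. */
static enum csum_state veth_csum_after(enum csum_state in, bool peer_rxcsum)
{
    (void)peer_rxcsum;
    return in;
}
```

In this model a CSUM_NONE packet coming from hardware leaves the old path
as CSUM_UNNECESSARY, which is how a corrupt packet could skip verification;
CSUM_PARTIAL packets are unchanged in both versions.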

Co-authored-by: Evan Jones 
Signed-off-by: Evan Jones 
Cc: Nicolas Dichtel 
Cc: Phil Sutter 
Cc: Toshiaki Makita 
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Vijay Pandurangan 
Acked-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Luis Henriques 
---
 drivers/net/veth.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index b4a10bcb66a0..e3a0e674136f 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -117,12 +117,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
kfree_skb(skb);
goto drop;
}
-   /* don't change ip_summed == CHECKSUM_PARTIAL, as that
-* will cause bad checksum on forwarded packets
-*/
-   if (skb->ip_summed == CHECKSUM_NONE &&
-   rcv->features & NETIF_F_RXCSUM)
-   skb->ip_summed = CHECKSUM_UNNECESSARY;

if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);


Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it

2016-02-03 Thread Jesper Dangaard Brouer
On Wed,  3 Feb 2016 11:47:56 +0100
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> There is a need for some userspace API that would allow exposing things
> that are not directly related to any device class like net_device or
> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
> 
> Use cases:
> 1) get/set of port type (Ethernet/InfiniBand)
> 2) monitoring of hardware messages to and from chip
> 3) setting up port splitters - split port into multiple ones and squash again,
>enables usage of splitter cable
> 4) setting up shared buffers - shared among multiple ports within one chip
> 
> First patch of this set introduces a new generic Netlink based interface,
> called "devlink". It is similar to nl80211 model and it is heavily
> influenced by it, including the API definition. The devlink introduction patch
> implements use cases 1) and 2). Other 2 are in development atm and will
> be addressed by follow-ups.
> 
> It is very convenient for drivers to use devlink, as you can see in other
> patches in this set.
> 
> Counterpart for devlink is userspace tool called "dl". Command line interface
> and outputs are derived from "ip" tool so it should be easy for users to get
> used to it.
> 
> It is available here:
> https://github.com/jpirko/devlink

IMHO this very short command name "dl" is not an advantage.  It is
simply too unspecific and short for a good Google search.  E.g. when
searching for good examples for using "dl".  I think "devlink" would be
better.  If you like short commands do: alias dl="devlink"


> Example usage:
> butter:~$ dl help
> Usage: dl [ OPTIONS ] OBJECT { COMMAND | help }
> where  OBJECT := { dev | port | monitor }
>OPTIONS := { -v/--verbose }
> 
> butter:~$ dl dev show
> 0: devlink0: bus pci dev :01:00.0
> 
> butter:~$ dl port help
> Usage: dl port show [DEV/PORT_INDEX]
> Usage: dl port set DEV/PORT_INDEX [ type { eth | ib | auto} ]
> 
> butter:~$ dl port show
> devlink0/1: type ib ibdev mlx4_0
> devlink0/2: type ib ibdev mlx4_0
> 
> butter:~$ sudo dl port set devlink0/1 type eth
> 
> butter:~$ dl port show
> devlink0/1: type eth netdev ens4
> devlink0/2: type ib ibdev mlx4_0
> 
> butter:~$ sudo dl port set devlink0/1 type auto
> 
> butter:~$ dl port show
> devlink0/1: type eth(auto) netdev ens4
> devlink0/2: type ib ibdev mlx4_0

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH v2 12/15] ptp: pch: Allow build on MIPS platforms

2016-02-03 Thread Paul Burton
Allow the ptp_pch driver to be built on MIPS platforms in preparation
for use on the MIPS Boston board.

Signed-off-by: Paul Burton 
Acked-by: Richard Cochran 
---

Changes in v2: None

 drivers/ptp/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig
index ee3de34..ee43549 100644
--- a/drivers/ptp/Kconfig
+++ b/drivers/ptp/Kconfig
@@ -74,7 +74,7 @@ config DP83640_PHY
 
 config PTP_1588_CLOCK_PCH
tristate "Intel PCH EG20T as PTP clock"
-   depends on X86_32 || COMPILE_TEST
+   depends on X86_32 || MIPS || COMPILE_TEST
depends on HAS_IOMEM && NET
select PTP_1588_CLOCK
help
-- 
2.7.0



[PATCH v2 1/6] net: pch_gbe: Allow build on MIPS platforms

2016-02-03 Thread Paul Burton
Allow the pch_gbe driver to be built on MIPS platforms, in preparation
for its use on the MIPS Boston board.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 drivers/net/ethernet/oki-semi/pch_gbe/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig 
b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
index 5f7a352..4d3809a 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/Kconfig
@@ -4,7 +4,7 @@
 
 config PCH_GBE
tristate "OKI SEMICONDUCTOR IOH(ML7223/ML7831) GbE"
-   depends on PCI && (X86_32 || COMPILE_TEST)
+   depends on PCI && (X86_32 || MIPS || COMPILE_TEST)
select MII
select PTP_1588_CLOCK_PCH
select NET_PTP_CLASSIFY
-- 
2.7.0



[PATCH v2 0/6] pch_gbe fixes for Imagination Technologies MIPS Boston

2016-02-03 Thread Paul Burton
This series has been extracted from an earlier larger series adding
support for the Imagination Technologies MIPS Boston development board.
The current version of that series without these patches included can be
found here:

http://marc.info/?l=linux-mips&m=145449909110835&w=2

This series is somewhat standalone & should fix theoretical issues for
other users of the driver, but has only been tested by myself in
conjunction with the above series on a Boston board.

Paul Burton (6):
  net: pch_gbe: Allow build on MIPS platforms
  net: pch_gbe: Mark Minnow PHY reset GPIO active low
  net: pch_gbe: Pull PHY GPIO handling out of Minnow code
  net: pch_gbe: Always reset PHY along with MAC
  net: pch_gbe: Add device tree support
  net: pch_gbe: Allow longer for resets

 drivers/net/ethernet/oki-semi/pch_gbe/Kconfig  |  2 +-
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h|  4 +-
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   | 71 ++
 3 files changed, 62 insertions(+), 15 deletions(-)

-- 
2.7.0



Re: [net-next PATCH 7/7] net: ixgbe: add support for tc_u32 offload

2016-02-03 Thread Jamal Hadi Salim

On 16-02-03 05:26 AM, John Fastabend wrote:

On 16-02-03 02:07 AM, Amir Vadai wrote:

On Wed, Feb 03, 2016 at 01:29:59AM -0800, John Fastabend wrote:

This adds initial support for offloading the u32 tc classifier. This
initial implementation only implements a few base matches and actions
to illustrate the use of the infrastructure patches.

However it is an interesting subset because it handles the u32 next
hdr logic to correctly map tcp packets from ip headers using the ihl
and protocol fields. After this is accepted we can extend the match
and action fields easily by updating the model header file.

Also only the drop action is supported initially.

Here is a short test script,

  #tc qdisc add dev eth4 ingress
  #tc filter add dev eth4 parent : protocol ip \
u32 ht 800: order 1 \
match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop

<-- hardware has dst/src ip match rule installed -->

  #tc filter del dev eth4 parent : prio 49152
  #tc filter add dev eth4 parent : protocol ip prio 99 \
handle 1: u32 divisor 1
  #tc filter add dev eth4 protocol ip parent : prio 99 \
u32 ht 800: order 1 link 1: \
offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
  #tc filter add dev eth4 parent : protocol ip \
u32 ht 1: order 3 match tcp src 23  action drop

<-- hardware has tcp src port rule installed -->

  #tc qdisc del dev eth4 parent :

<-- hardware cleaned up -->

Signed-off-by: John Fastabend 
---
  drivers/net/ethernet/intel/ixgbe/ixgbe.h |3
  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 -
  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c|  196 ++
  3 files changed, 198 insertions(+), 7 deletions(-)



What are you doing w.r.t priorities? Are the filters processed by the
order of the priorities?



The rules are put in order by the handles which is populated in
my command above such that 'ht 1: order 3' gives handle 1::3 and
'ht 800: order 1' gives 800::1. Take a look at this block in cls_u32

 if (err == 0) {
 struct tc_u_knode __rcu **ins;
 struct tc_u_knode *pins;

 ins = &ht->ht[TC_U32_HASH(handle)];
 for (pins = rtnl_dereference(*ins); pins;
  ins = &pins->next, pins = rtnl_dereference(*ins))
 if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
 break;

 RCU_INIT_POINTER(n->next, pins);
 rcu_assign_pointer(*ins, n);
 u32_replace_hw_knode(tp, n);
 *arg = (unsigned long)n;
 return 0;


If you leave ht and order off the tc cli I believe 'tc' just
picks some semi-arbitrary ones for you. I've been in the habit
of always specifying them even for software filters.
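
The ordering rule in the cls_u32 block quoted above can be tried out in
userspace. This is a minimal sketch, assuming a plain singly linked list
without RCU; struct knode, NODE_ID() and knode_insert_sorted() are
hypothetical names, only the 0xFFF mask matches TC_U32_NODE.

```c
#include <stddef.h>
#include <stdint.h>

#define NODE_ID(h) ((h) & 0xFFFu)   /* same mask as TC_U32_NODE */

struct knode {
    uint32_t handle;
    struct knode *next;
};

/* Insert n into the bucket so that node ids stay in ascending order.
 * The walk stops at the first existing node with a larger node id,
 * mirroring the "TC_U32_NODE(handle) < TC_U32_NODE(pins->handle)" test. */
static void knode_insert_sorted(struct knode **bucket, struct knode *n)
{
    struct knode **ins = bucket;

    while (*ins && NODE_ID(n->handle) >= NODE_ID((*ins)->handle))
        ins = &(*ins)->next;

    n->next = *ins;
    *ins = n;
}
```

Inserting handles with node ids 3, 1 and 2 in that order leaves the bucket
as 1, 2, 3, which is the traversal order the filters would be matched in.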



The default table id is essentially 0x800. The default bucket is 0.
"order" is essentially the filter id. And given that you can link tables
(Nice work John!), the ht:bucket:nodeid is essentially an "address" of
a specific filter on a specific table and, when it makes sense, in a
specific hash bucket. Another way to look at it is as a way to construct
a mapping to a TCAM key.
What John is doing is essentially taking the nodeid and trying to use
it as a priority. In other words, the abstraction is reduced to a linked
list in which the ordering is how the list is traversed.
It may work in this case, but I am for being able to explicitly specify
priorities.
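
As an illustration of the ht:bucket:nodeid "address", here is a small
sketch of how the three parts pack into a 32-bit handle. The bit layout
follows the TC_U32_* macros in the uapi pkt_cls.h header; u32_handle() is
a hypothetical helper added for the example.

```c
#include <stdint.h>

/* Same bit layout as TC_U32_USERHTID / TC_U32_HASH / TC_U32_NODE. */
#define U32_USERHTID(h) (((h) & 0xFFF00000u) >> 20)  /* hash table id */
#define U32_HASH(h)     (((h) >> 12) & 0xFFu)        /* bucket in table */
#define U32_NODE(h)     ((h) & 0xFFFu)               /* node ("order") id */

/* Hypothetical helper: pack ht:bucket:nodeid into one handle. */
static uint32_t u32_handle(uint32_t htid, uint32_t bucket, uint32_t nodeid)
{
    return ((htid & 0xFFFu) << 20) | ((bucket & 0xFFu) << 12) |
           (nodeid & 0xFFFu);
}
```

With this layout 'ht 800: order 1' becomes 0x80000001 and 'ht 1: order 3'
becomes 0x00100003, matching the 800::1 and 1::3 notation used earlier in
the thread.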

cheers,
jamal



Re: [PATCH net-next] hv_netvsc: Increase delay for RNDIS_STATUS_NETWORK_CHANGE

2016-02-03 Thread Vitaly Kuznetsov
Haiyang Zhang  writes:

> We simulate a link down period for RNDIS_STATUS_NETWORK_CHANGE message to
> trigger DHCP renew. User daemons may need multiple seconds to trigger the
> link down event. (e.g. ifplugd: 5sec, network-manager: 4sec.) So update
> this link down period to 10 sec to properly trigger DHCP renew.
>

I probably don't follow: why do we need such a delay? If (with real
hardware) you unplug the network cable and plug it back in one second
later, you'll get DHCP renewed, right?

When I introduced RNDIS_STATUS_NETWORK_CHANGE handling by emulating a
pair of up/down events I put in a 2 second delay to make link_watch happy
(as we only handle 1 event per second there), but 10 seconds sounds like
too much to me.
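
For reference, the arithmetic behind the "10 seconds" in the quoted patch
can be sketched as follows. The HZ value and both helper functions are
assumptions made for the example; only the two LINKCHANGE_* definitions
come from the patch.

```c
#define HZ 250                      /* assumed tick rate, illustration only */
#define LINKCHANGE_INT   (2 * HZ)   /* existing interval between events */
#define LINKCHANGE_DELAY (8 * HZ)   /* extra delay added by the patch */

/* Delay, in jiffies, before the next reconfig event is handled. */
static unsigned long reconfig_delay(int network_change)
{
    unsigned long delay = LINKCHANGE_INT;

    if (network_change)
        delay += LINKCHANGE_DELAY;
    return delay;
}

/* Convert jiffies to whole seconds for the assumed HZ. */
static unsigned long jiffies_to_secs(unsigned long j)
{
    return j / HZ;
}
```

So an ordinary event is followed by a 2 second pause, while the emulated
link-down for RNDIS_STATUS_NETWORK_CHANGE lasts 2 + 8 = 10 seconds.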

> Signed-off-by: Haiyang Zhang 
> ---
>  drivers/net/hyperv/netvsc_drv.c |   10 --
>  1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 1d3a665..6f23973 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -43,6 +43,8 @@
>
>  #define RING_SIZE_MIN 64
>  #define LINKCHANGE_INT (2 * HZ)
> +/* Extra delay for RNDIS_STATUS_NETWORK_CHANGE: */
> +#define LINKCHANGE_DELAY (8 * HZ)
>  static int ring_size = 128;
>  module_param(ring_size, int, S_IRUGO);
>  MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
> @@ -964,6 +966,7 @@ static void netvsc_link_change(struct work_struct *w)
>   return;
>   }
>   ndev_ctx->last_reconfig = jiffies;
> + delay = LINKCHANGE_INT;
>
>   spin_lock_irqsave(&ndev_ctx->lock, flags);
>   if (!list_empty(&ndev_ctx->reconfig_events)) {
> @@ -1009,8 +1012,11 @@ static void netvsc_link_change(struct work_struct *w)
>   netif_tx_stop_all_queues(net);
>   event->event = RNDIS_STATUS_MEDIA_CONNECT;
>   spin_lock_irqsave(&ndev_ctx->lock, flags);
> - list_add_tail(&event->list, &ndev_ctx->reconfig_events);
> + list_add(&event->list, &ndev_ctx->reconfig_events);

Why? Adding to tail was here to not screw the order of events...
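
To make the ordering concern concrete, here is a small userspace sketch of
head versus tail insertion on a circular doubly linked list. It is modelled
loosely on the kernel's list_head; every name in it is hypothetical and the
event ids are invented for the example.

```c
struct list_head {
    struct list_head *next, *prev;
};

struct event {
    struct list_head list;  /* must stay the first member, see below */
    int id;
};

static void q_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void q_insert(struct list_head *n, struct list_head *prev,
                     struct list_head *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

/* Like list_add(): the new entry is handled before everything queued. */
static void q_add_head(struct list_head *n, struct list_head *h)
{
    q_insert(n, h, h->next);
}

/* Like list_add_tail(): the new entry waits behind everything queued. */
static void q_add_tail(struct list_head *n, struct list_head *h)
{
    q_insert(n, h->prev, h);
}

/* Id of the event that would be handled next (queue must be non-empty).
 * The cast is valid because 'list' is the first member of struct event. */
static int q_first_id(const struct list_head *h)
{
    return ((const struct event *)h->next)->id;
}
```

With two events already queued at the tail, a third one added at the head
is handled first, which is the reordering being asked about.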

>   spin_unlock_irqrestore(&ndev_ctx->lock, flags);
> +
> + ndev_ctx->last_reconfig += LINKCHANGE_DELAY;
> + delay = LINKCHANGE_INT + LINKCHANGE_DELAY;
>   reschedule = true;
>   }
>   break;
> @@ -1025,7 +1031,7 @@ static void netvsc_link_change(struct work_struct *w)
>* second, handle next reconfig event in 2 seconds.
>*/
>   if (reschedule)
> - schedule_delayed_work(&ndev_ctx->dwork, LINKCHANGE_INT);
> + schedule_delayed_work(&ndev_ctx->dwork, delay);
>  }
>
>  static void netvsc_free_netdev(struct net_device *netdev)

-- 
  Vitaly


Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it

2016-02-03 Thread Jiri Pirko
Wed, Feb 03, 2016 at 02:31:33PM CET, bro...@redhat.com wrote:
>On Wed,  3 Feb 2016 11:47:56 +0100
>Jiri Pirko  wrote:
>
>> From: Jiri Pirko 
>> 
>> There is a need for some userspace API that would allow exposing things
>> that are not directly related to any device class like net_device or
>> ib_device, but rather chip-wide/switch-ASIC-wide stuff.
>> 
>> Use cases:
>> 1) get/set of port type (Ethernet/InfiniBand)
>> 2) monitoring of hardware messages to and from chip
>> 3) setting up port splitters - split port into multiple ones and squash 
>> again,
>>enables usage of splitter cable
>> 4) setting up shared buffers - shared among multiple ports within one chip
>> 
>> First patch of this set introduces a new generic Netlink based interface,
>> called "devlink". It is similar to nl80211 model and it is heavily
>> influenced by it, including the API definition. The devlink introduction 
>> patch
>> implements use cases 1) and 2). Other 2 are in development atm and will
>> be addressed by follow-ups.
>> 
>> It is very convenient for drivers to use devlink, as you can see in other
>> patches in this set.
>> 
>> Counterpart for devlink is userspace tool called "dl". Command line interface
>> and outputs are derived from "ip" tool so it should be easy for users to get
>> used to it.
>> 
>> It is available here:
>> https://github.com/jpirko/devlink
>
>IMHO this very short command name "dl" is not an advantage.  It is
>simply too unspecific and short for a good Google search.  E.g. when
>searching for good examples for using "dl".  I think "devlink" would be
>better.  If you like short commands do: alias dl="devlink"

I was thinking about using "devlink". Decided to go with shortened
version so this is in-line with "ip". But you have a point.


[RFC (v3) 10/19] nfp: cleanup tx ring flush and rename to reset

2016-02-03 Thread Jakub Kicinski
Since we never used flush without freeing the ring later
the functionality of the two operations is mixed.
Rename flush to ring reset and move there all the things
which have to be done after FW ring state is cleared.
While at it do some clean-ups.
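
The reset semantics described above can be sketched in userspace as
follows. This is a simplified model under stated assumptions: the struct,
the fixed ring size and the function name are hypothetical, and DMA
unmapping and skb freeing are reduced to clearing an ownership flag.

```c
#include <stdint.h>
#include <string.h>

#define RING_CNT 8  /* illustrative ring size */

struct tx_ring_model {
    uint32_t rd_p, wr_p;        /* free-running read/write counters */
    int bufs[RING_CNT];         /* non-zero: buffer still owned by ring */
    uint64_t descs[RING_CNT];   /* stand-in for the TX descriptors */
};

/* Release everything between rd_p and wr_p, then return the ring to its
 * initial state. Returns the number of buffers that were released. */
static unsigned int tx_ring_reset(struct tx_ring_model *r)
{
    unsigned int freed = 0;

    while (r->rd_p != r->wr_p) {
        unsigned int idx = r->rd_p % RING_CNT;

        r->bufs[idx] = 0;   /* the driver unmaps and frees the skb here */
        freed++;
        r->rd_p++;
    }

    /* One memset for the whole ring instead of one per descriptor. */
    memset(r->descs, 0, sizeof(r->descs));
    r->rd_p = 0;
    r->wr_p = 0;
    return freed;
}
```

Because the pointers are cleared as part of the reset, the free routine no
longer needs to touch them, which is the split the patch is after.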

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 81 ++
 1 file changed, 37 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index faaa25dd5a1e..cc8b06651f57 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -868,61 +868,59 @@ static void nfp_net_tx_complete(struct nfp_net_tx_ring 
*tx_ring)
 }
 
 /**
- * nfp_net_tx_flush() - Free any untransmitted buffers currently on the TX ring
- * @tx_ring: TX ring structure
+ * nfp_net_tx_ring_reset() - Free any untransmitted buffers and reset pointers
+ * @nn:NFP Net device
+ * @tx_ring:   TX ring structure
  *
  * Assumes that the device is stopped
  */
-static void nfp_net_tx_flush(struct nfp_net_tx_ring *tx_ring)
+static void
+nfp_net_tx_ring_reset(struct nfp_net *nn, struct nfp_net_tx_ring *tx_ring)
 {
-   struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
-   struct nfp_net *nn = r_vec->nfp_net;
-   struct pci_dev *pdev = nn->pdev;
const struct skb_frag_struct *frag;
struct netdev_queue *nd_q;
-   struct sk_buff *skb;
-   int nr_frags;
-   int fidx;
-   int idx;
+   struct pci_dev *pdev = nn->pdev;
 
while (tx_ring->rd_p != tx_ring->wr_p) {
-   idx = tx_ring->rd_p % tx_ring->cnt;
+   int nr_frags, fidx, idx;
+   struct sk_buff *skb;
 
+   idx = tx_ring->rd_p % tx_ring->cnt;
skb = tx_ring->txbufs[idx].skb;
-   if (skb) {
-   nr_frags = skb_shinfo(skb)->nr_frags;
-   fidx = tx_ring->txbufs[idx].fidx;
-
-   if (fidx == -1) {
-   /* unmap head */
-   dma_unmap_single(&pdev->dev,
-tx_ring->txbufs[idx].dma_addr,
-skb_headlen(skb),
-DMA_TO_DEVICE);
-   } else {
-   /* unmap fragment */
-   frag = &skb_shinfo(skb)->frags[fidx];
-   dma_unmap_page(&pdev->dev,
-  tx_ring->txbufs[idx].dma_addr,
-  skb_frag_size(frag),
-  DMA_TO_DEVICE);
-   }
-
-   /* check for last gather fragment */
-   if (fidx == nr_frags - 1)
-   dev_kfree_skb_any(skb);
-
-   tx_ring->txbufs[idx].dma_addr = 0;
-   tx_ring->txbufs[idx].skb = NULL;
-   tx_ring->txbufs[idx].fidx = -2;
+   nr_frags = skb_shinfo(skb)->nr_frags;
+   fidx = tx_ring->txbufs[idx].fidx;
+
+   if (fidx == -1) {
+   /* unmap head */
+   dma_unmap_single(&pdev->dev,
+tx_ring->txbufs[idx].dma_addr,
+skb_headlen(skb), DMA_TO_DEVICE);
+   } else {
+   /* unmap fragment */
+   frag = &skb_shinfo(skb)->frags[fidx];
+   dma_unmap_page(&pdev->dev,
+  tx_ring->txbufs[idx].dma_addr,
+  skb_frag_size(frag), DMA_TO_DEVICE);
}
 
-   memset(&tx_ring->txds[idx], 0, sizeof(tx_ring->txds[idx]));
+   /* check for last gather fragment */
+   if (fidx == nr_frags - 1)
+   dev_kfree_skb_any(skb);
+
+   tx_ring->txbufs[idx].dma_addr = 0;
+   tx_ring->txbufs[idx].skb = NULL;
+   tx_ring->txbufs[idx].fidx = -2;
 
tx_ring->qcp_rd_p++;
tx_ring->rd_p++;
}
 
+   memset(tx_ring->txds, 0, sizeof(*tx_ring->txds) * tx_ring->cnt);
+   tx_ring->wr_p = 0;
+   tx_ring->rd_p = 0;
+   tx_ring->qcp_rd_p = 0;
+   tx_ring->wr_ptr_add = 0;
+
nd_q = netdev_get_tx_queue(nn->netdev, tx_ring->idx);
netdev_tx_reset_queue(nd_q);
 }
@@ -1360,11 +1358,6 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring 
*tx_ring)
  tx_ring->txds, tx_ring->dma);
 
tx_ring->cnt = 0;
-   tx_ring->wr_p = 0;
-   tx_ring->rd_p = 0;
-   tx_ring->qcp_rd_p = 0;
-   tx_ring->wr_ptr_add = 0;
-

[RFC (v3) 05/19] nfp: don't trust netif_running() in debug code

2016-02-03 Thread Jakub Kicinski
Since change_mtu() can fail and leave us with netif_running()
returning true even though all rings were freed, we should
look at the NFP_NET_CFG_CTRL_ENABLE flag to determine whether
the device is really open.
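
A minimal model of the failure state this guards against, assuming a
control word with a single enable bit. The struct, the helper and the bit
position of CFG_CTRL_ENABLE are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFG_CTRL_ENABLE (1u << 0)   /* assumed bit position */

struct nic_model {
    bool netif_running;   /* what the stack believes */
    uint32_t ctrl;        /* last control word accepted by the device */
};

/* Debug code should only walk the rings when the device itself was
 * enabled, regardless of what netif_running says. */
static bool rings_safe_to_read(const struct nic_model *n)
{
    return (n->ctrl & CFG_CTRL_ENABLE) != 0;
}
```

After a failed MTU change the model would have netif_running set but the
enable bit clear, and the helper correctly reports that the rings must not
be read.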

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
index 4c97c713121c..7af404d492cc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_debugfs.c
@@ -52,7 +52,7 @@ static int nfp_net_debugfs_rx_q_read(struct seq_file *file, 
void *data)
if (!rx_ring->r_vec || !rx_ring->r_vec->nfp_net)
goto out;
nn = rx_ring->r_vec->nfp_net;
-   if (!netif_running(nn->netdev))
+   if (!(nn->ctrl & NFP_NET_CFG_CTRL_ENABLE))
goto out;
 
rxd_cnt = rx_ring->cnt;
@@ -127,7 +127,7 @@ static int nfp_net_debugfs_tx_q_read(struct seq_file *file, 
void *data)
if (!tx_ring->r_vec || !tx_ring->r_vec->nfp_net)
goto out;
nn = tx_ring->r_vec->nfp_net;
-   if (!netif_running(nn->netdev))
+   if (!(nn->ctrl & NFP_NET_CFG_CTRL_ENABLE))
goto out;
 
txd_cnt = tx_ring->cnt;
-- 
1.9.1



[RFC (v3) 19/19] nfp: allow ring size reconfiguration at runtime

2016-02-03 Thread Jakub Kicinski
Since most of the required changes have already been made for
changing MTU at runtime, let's use them for ring size changes as
well.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   1 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 129 +
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  30 ++---
 3 files changed, 139 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 1e08c9cf3ee0..90ad6264e62c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -724,6 +724,7 @@ void nfp_net_rss_write_key(struct nfp_net *nn);
 void nfp_net_coalesce_write_cfg(struct nfp_net *nn);
 int nfp_net_irqs_alloc(struct nfp_net *nn);
 void nfp_net_irqs_disable(struct nfp_net *nn);
+int nfp_net_set_ring_size(struct nfp_net *nn, u32 rxd_cnt, u32 txd_cnt);
 
 #ifdef CONFIG_NFP_NET_DEBUG
 void nfp_net_debugfs_create(void);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 2c86a10abcd3..70d366bdd4b7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1442,6 +1442,59 @@ err_alloc:
return -ENOMEM;
 }
 
+static struct nfp_net_tx_ring *
+nfp_net_shadow_tx_rings_prepare(struct nfp_net *nn, u32 buf_cnt)
+{
+   struct nfp_net_tx_ring *rings;
+   unsigned int r;
+
+   rings = kcalloc(nn->num_tx_rings, sizeof(*rings), GFP_KERNEL);
+   if (!rings)
+   return NULL;
+
+   for (r = 0; r < nn->num_tx_rings; r++) {
+   nfp_net_tx_ring_init(&rings[r], nn->tx_rings[r].r_vec, r);
+
+   if (nfp_net_tx_ring_alloc(&rings[r], buf_cnt))
+   goto err_free_prev;
+   }
+
+   return rings;
+
+err_free_prev:
+   while (r--)
+   nfp_net_tx_ring_free(&rings[r]);
+   kfree(rings);
+   return NULL;
+}
+
+static struct nfp_net_tx_ring *
+nfp_net_shadow_tx_rings_swap(struct nfp_net *nn, struct nfp_net_tx_ring *rings)
+{
+   struct nfp_net_tx_ring *old = nn->tx_rings;
+   unsigned int r;
+
+   for (r = 0; r < nn->num_tx_rings; r++)
+   old[r].r_vec->tx_ring = &rings[r];
+
+   nn->tx_rings = rings;
+   return old;
+}
+
+static void
+nfp_net_shadow_tx_rings_free(struct nfp_net *nn, struct nfp_net_tx_ring *rings)
+{
+   unsigned int r;
+
+   if (!rings)
+   return;
+
+   for (r = 0; r < nn->num_tx_rings; r++)
+   nfp_net_tx_ring_free(&rings[r]);
+
+   kfree(rings);
+}
+
 /**
  * nfp_net_rx_ring_free() - Free resources allocated to a RX ring
  * @rx_ring:  RX ring to free
@@ -1558,6 +1611,9 @@ nfp_net_shadow_rx_rings_free(struct nfp_net *nn, struct 
nfp_net_rx_ring *rings)
 {
unsigned int r;
 
+   if (!rings)
+   return;
+
for (r = 0; r < nn->num_r_vecs; r++) {
nfp_net_rx_ring_bufs_free(nn, &rings[r]);
nfp_net_rx_ring_free(&rings[r]);
@@ -2100,6 +2156,79 @@ static int nfp_net_change_mtu(struct net_device *netdev, 
int new_mtu)
return err;
 }
 
+int nfp_net_set_ring_size(struct nfp_net *nn, u32 rxd_cnt, u32 txd_cnt)
+{
+   struct nfp_net_tx_ring *tx_rings = NULL;
+   struct nfp_net_rx_ring *rx_rings = NULL;
+   u32 old_rxd_cnt, old_txd_cnt;
+   int err, err2;
+
+   if (!(nn->ctrl & NFP_NET_CFG_CTRL_ENABLE)) {
+   nn->rxd_cnt = rxd_cnt;
+   nn->txd_cnt = txd_cnt;
+   return 0;
+   }
+
+   old_rxd_cnt = nn->rxd_cnt;
+   old_txd_cnt = nn->txd_cnt;
+
+   /* Prepare new rings */
+   if (nn->rxd_cnt != rxd_cnt) {
+   rx_rings = nfp_net_shadow_rx_rings_prepare(nn, nn->fl_bufsz,
+  rxd_cnt);
+   if (!rx_rings)
+   return -ENOMEM;
+   }
+   if (nn->txd_cnt != txd_cnt) {
+   tx_rings = nfp_net_shadow_tx_rings_prepare(nn, txd_cnt);
+   if (!tx_rings) {
+   nfp_net_shadow_rx_rings_free(nn, rx_rings);
+   return -ENOMEM;
+   }
+   }
+
+   /* Stop device, swap in new rings, try to start the device */
+   nfp_net_close_stack(nn);
+   nfp_net_clear_config_and_disable(nn);
+
+   if (rx_rings)
+   rx_rings = nfp_net_shadow_rx_rings_swap(nn, rx_rings);
+   if (tx_rings)
+   tx_rings = nfp_net_shadow_tx_rings_swap(nn, tx_rings);
+
+   nn->rxd_cnt = rxd_cnt;
+   nn->txd_cnt = txd_cnt;
+
+   err = nfp_net_set_config_and_enable(nn);
+   if (err) {
+   /* Try with old configuration and old rings */
+   if (rx_rings)
+   rx_rings = nfp_net_shadow_rx_rings_swap(nn, rx_rings);
+   if (tx_rings)
+   

[RFC (v3) 17/19] nfp: convert .ndo_change_mtu() to prepare/commit paradigm

2016-02-03 Thread Jakub Kicinski
When changing the MTU on a running device, first allocate new rings
and buffers and, once that succeeds, proceed with changing the MTU.

Allocation of new rings is not really necessary for this
operation - it's done to keep the code simple and because
the size of the extra ring memory is quite small compared to
the size of the buffers.

The operation can still fail midway through if FW communication
times out.  In that case we retry with the old rings and, if the
failure persists, there is little we can do: we just free all
resources and leave the device in a fully closed state.
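
The prepare/commit flow with its two fallback levels can be sketched in
userspace like this. It is a model under stated assumptions, not the driver
code: the struct, the callback type and both enable helpers are
hypothetical, and returning the first error after a successful rollback is
a choice made for the example.

```c
#include <stdlib.h>

struct dev_model {
    int *rings;          /* currently installed ring memory */
    unsigned int bufsz;  /* buffer size the rings were built for */
    int enabled;
};

/* Stand-in for the FW enable step; returns 0 on success. */
typedef int (*enable_fn)(struct dev_model *d);

static int enable_if_small(struct dev_model *d)
{
    if (d->bufsz > 4096)
        return -5;       /* pretend the FW rejects large buffers */
    d->enabled = 1;
    return 0;
}

static int enable_never(struct dev_model *d)
{
    (void)d;
    return -5;           /* pretend FW communication always times out */
}

static int change_bufsz(struct dev_model *d, unsigned int new_bufsz,
                        enable_fn enable)
{
    unsigned int old_bufsz = d->bufsz;
    int *tmp, *swap;
    int err, err2;

    /* Prepare: allocate before touching the running device. */
    tmp = calloc(4, sizeof(*tmp));
    if (!tmp)
        return -12;

    /* Stop the device and swap in the new rings. */
    d->enabled = 0;
    swap = d->rings; d->rings = tmp; tmp = swap;
    d->bufsz = new_bufsz;

    err = enable(d);
    if (err) {
        /* Retry with the old configuration and the old rings. */
        swap = d->rings; d->rings = tmp; tmp = swap;
        d->bufsz = old_bufsz;

        err2 = enable(d);
        if (err2) {
            /* Nothing left to try: free everything, stay closed. */
            free(tmp);
            free(d->rings);
            d->rings = NULL;
            return err2;
        }
    }

    free(tmp);  /* whichever set of rings ended up unused */
    return err;
}
```

The key property is that a failed allocation leaves the device untouched,
and only a double failure of the enable step ends in the closed state.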

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 113 +++--
 1 file changed, 105 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index fd226d2e8606..0153fce33dff 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1504,6 +1504,64 @@ err_alloc:
return -ENOMEM;
 }
 
+static struct nfp_net_rx_ring *
+nfp_net_shadow_rx_rings_prepare(struct nfp_net *nn, unsigned int fl_bufsz)
+{
+   struct nfp_net_rx_ring *rings;
+   unsigned int r;
+
+   rings = kcalloc(nn->num_rx_rings, sizeof(*rings), GFP_KERNEL);
+   if (!rings)
+   return NULL;
+
+   for (r = 0; r < nn->num_rx_rings; r++) {
+   nfp_net_rx_ring_init(&rings[r], nn->rx_rings[r].r_vec, r);
+
+   if (nfp_net_rx_ring_alloc(&rings[r], fl_bufsz))
+   goto err_free_prev;
+
+   if (nfp_net_rx_ring_bufs_alloc(nn, &rings[r]))
+   goto err_free_ring;
+   }
+
+   return rings;
+
+err_free_prev:
+   while (r--) {
+   nfp_net_rx_ring_bufs_free(nn, &rings[r]);
+err_free_ring:
+   nfp_net_rx_ring_free(&rings[r]);
+   }
+   kfree(rings);
+   return NULL;
+}
+
+static struct nfp_net_rx_ring *
+nfp_net_shadow_rx_rings_swap(struct nfp_net *nn, struct nfp_net_rx_ring *rings)
+{
+   struct nfp_net_rx_ring *old = nn->rx_rings;
+   unsigned int r;
+
+   for (r = 0; r < nn->num_rx_rings; r++)
+   old[r].r_vec->rx_ring = &rings[r];
+
+   nn->rx_rings = rings;
+   return old;
+}
+
+static void
+nfp_net_shadow_rx_rings_free(struct nfp_net *nn, struct nfp_net_rx_ring *rings)
+{
+   unsigned int r;
+
+   for (r = 0; r < nn->num_r_vecs; r++) {
+   nfp_net_rx_ring_bufs_free(nn, &rings[r]);
+   nfp_net_rx_ring_free(&rings[r]);
+   }
+
+   kfree(rings);
+}
+
 static int
 nfp_net_prepare_vector(struct nfp_net *nn, struct nfp_net_r_vector *r_vec,
   int idx)
@@ -1977,25 +2035,64 @@ static void nfp_net_set_rx_mode(struct net_device 
*netdev)
 
 static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
 {
+   unsigned int old_mtu, old_fl_bufsz, new_fl_bufsz;
struct nfp_net *nn = netdev_priv(netdev);
-   int ret = 0;
+   struct nfp_net_rx_ring *tmp_rings;
+   int err, err2;
 
if (new_mtu < 68 || new_mtu > nn->max_mtu) {
nn_err(nn, "New MTU (%d) is not valid\n", new_mtu);
return -EINVAL;
}
 
-   if (netif_running(netdev))
-   nfp_net_netdev_close(netdev);
+   old_mtu = netdev->mtu;
+   old_fl_bufsz = nn->fl_bufsz;
+   new_fl_bufsz = NFP_NET_MAX_PREPEND + ETH_HLEN + VLAN_HLEN * 2 +
+   MPLS_HLEN * 8 + new_mtu;
+
+   if (!(nn->ctrl & NFP_NET_CFG_CTRL_ENABLE)) {
+   netdev->mtu = new_mtu;
+   nn->fl_bufsz = new_fl_bufsz;
+   return 0;
+   }
+
+   /* Prepare new rings */
+   tmp_rings = nfp_net_shadow_rx_rings_prepare(nn, new_fl_bufsz);
+   if (!tmp_rings)
+   return -ENOMEM;
+
+   /* Stop device, swap in new rings, try to start the device */
+   nfp_net_close_stack(nn);
+   nfp_net_clear_config_and_disable(nn);
+
+   tmp_rings = nfp_net_shadow_rx_rings_swap(nn, tmp_rings);
 
netdev->mtu = new_mtu;
-   nn->fl_bufsz = NFP_NET_MAX_PREPEND + ETH_HLEN + VLAN_HLEN * 2 +
-   MPLS_HLEN * 8 + new_mtu;
+   nn->fl_bufsz = new_fl_bufsz;
+
+   err = nfp_net_set_config_and_enable(nn);
+   if (err) {
+   /* Try with old configuration and old rings */
+   tmp_rings = nfp_net_shadow_rx_rings_swap(nn, tmp_rings);
+
+   netdev->mtu = old_mtu;
+   nn->fl_bufsz = old_fl_bufsz;
+
+   err2 = nfp_net_set_config_and_enable(nn);
+   if (err2) {
+   nn_err(nn, "Can't restore MTU - FW communication failed (%d,%d)\n",
+  err, err2);
+   nfp_net_shadow_rx_rings_free(nn, tmp_rings);
+   nfp_net_close_free_all(nn);
+   return err2;
+   }
+   }
 
-   if (netif_running(netdev))
-   ret = 

[RFC (v3) 16/19] nfp: propagate list buffer size in struct rx_ring

2016-02-03 Thread Jakub Kicinski
The free list buffer size needs to be propagated to a few functions
as a parameter and added to struct nfp_net_rx_ring, since soon
some of the functions will be reused to manage rings with
buffers of a size different from nn->fl_bufsz.
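
As a rough illustration of why the size has to travel with the ring, the
sketch below computes a free list buffer size from the MTU and stores it
per ring. The formula mirrors the one used in nfp_net_change_mtu(); the
value of NFP_NET_MAX_PREPEND is an assumption, and the struct and helper
names are hypothetical.

```c
#define ETH_HLEN            14  /* Ethernet header */
#define VLAN_HLEN            4  /* one 802.1Q tag */
#define MPLS_HLEN            4  /* one MPLS label */
#define NFP_NET_MAX_PREPEND 64  /* assumed value, check nfp_net.h */

static unsigned int fl_bufsz_for_mtu(unsigned int mtu)
{
    return NFP_NET_MAX_PREPEND + ETH_HLEN + VLAN_HLEN * 2 +
           MPLS_HLEN * 8 + mtu;
}

struct rx_ring_model {
    unsigned int cnt;    /* number of descriptors */
    unsigned int bufsz;  /* size the ring's buffers were mapped with */
};

static void rx_ring_model_init(struct rx_ring_model *ring,
                               unsigned int cnt, unsigned int mtu)
{
    ring->cnt = cnt;
    ring->bufsz = fl_bufsz_for_mtu(mtu);
}

/* The unmap length comes from the ring itself, so a shadow ring built
 * for a different MTU is still unmapped with the size it was mapped at. */
static unsigned int rx_ring_unmap_len(const struct rx_ring_model *ring)
{
    return ring->bufsz;
}
```

Under the assumed constants an MTU of 1500 gives 64 + 14 + 8 + 32 + 1500 =
1618 bytes, while a shadow ring prepared for MTU 9000 carries 9118.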

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  3 +++
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 24 ++
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 0a87571a7d9c..1e08c9cf3ee0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -298,6 +298,8 @@ struct nfp_net_rx_buf {
  * @rxds:   Virtual address of FL/RX ring in host memory
  * @dma:DMA address of the FL/RX ring
  * @size:   Size, in bytes, of the FL/RX ring (needed to free)
+ * @bufsz: Buffer allocation size for convenience of management routines
+ * (NOTE: this is in second cache line, do not use on fast path!)
  */
 struct nfp_net_rx_ring {
struct nfp_net_r_vector *r_vec;
@@ -319,6 +321,7 @@ struct nfp_net_rx_ring {
 
dma_addr_t dma;
unsigned int size;
+   unsigned int bufsz;
 } cacheline_aligned;
 
 /**
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 15d695cd8c44..fd226d2e8606 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -958,25 +958,27 @@ static inline int nfp_net_rx_space(struct nfp_net_rx_ring 
*rx_ring)
  * nfp_net_rx_alloc_one() - Allocate and map skb for RX
  * @rx_ring:   RX ring structure of the skb
  * @dma_addr:  Pointer to storage for DMA address (output param)
+ * @fl_bufsz:  size of freelist buffers
  *
  * This function will allcate a new skb, map it for DMA.
  *
  * Return: allocated skb or NULL on failure.
  */
 static struct sk_buff *
-nfp_net_rx_alloc_one(struct nfp_net_rx_ring *rx_ring, dma_addr_t *dma_addr)
+nfp_net_rx_alloc_one(struct nfp_net_rx_ring *rx_ring, dma_addr_t *dma_addr,
+unsigned int fl_bufsz)
 {
struct nfp_net *nn = rx_ring->r_vec->nfp_net;
struct sk_buff *skb;
 
-   skb = netdev_alloc_skb(nn->netdev, nn->fl_bufsz);
+   skb = netdev_alloc_skb(nn->netdev, fl_bufsz);
if (!skb) {
nn_warn_ratelimit(nn, "Failed to alloc receive SKB\n");
return NULL;
}
 
*dma_addr = dma_map_single(&nn->pdev->dev, skb->data,
- nn->fl_bufsz, DMA_FROM_DEVICE);
+  fl_bufsz, DMA_FROM_DEVICE);
if (dma_mapping_error(&nn->pdev->dev, *dma_addr)) {
dev_kfree_skb_any(skb);
nn_warn_ratelimit(nn, "Failed to map DMA RX buffer\n");
@@ -1069,7 +1071,7 @@ nfp_net_rx_ring_bufs_free(struct nfp_net *nn, struct 
nfp_net_rx_ring *rx_ring)
continue;
 
dma_unmap_single(&pdev->dev, rx_ring->rxbufs[i].dma_addr,
-nn->fl_bufsz, DMA_FROM_DEVICE);
+rx_ring->bufsz, DMA_FROM_DEVICE);
dev_kfree_skb_any(rx_ring->rxbufs[i].skb);
rx_ring->rxbufs[i].dma_addr = 0;
rx_ring->rxbufs[i].skb = NULL;
@@ -1091,7 +1093,8 @@ nfp_net_rx_ring_bufs_alloc(struct nfp_net *nn, struct 
nfp_net_rx_ring *rx_ring)
 
for (i = 0; i < rx_ring->cnt - 1; i++) {
rxbufs[i].skb =
-   nfp_net_rx_alloc_one(rx_ring, &rxbufs[i].dma_addr);
+   nfp_net_rx_alloc_one(rx_ring, &rxbufs[i].dma_addr,
+rx_ring->bufsz);
if (!rxbufs[i].skb) {
nfp_net_rx_ring_bufs_free(nn, rx_ring);
return -ENOMEM;
@@ -1279,7 +1282,8 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 
skb = rx_ring->rxbufs[idx].skb;
 
-   new_skb = nfp_net_rx_alloc_one(rx_ring, &new_dma_addr);
+   new_skb = nfp_net_rx_alloc_one(rx_ring, &new_dma_addr,
+  nn->fl_bufsz);
if (!new_skb) {
nfp_net_rx_give_one(rx_ring, rx_ring->rxbufs[idx].skb,
rx_ring->rxbufs[idx].dma_addr);
@@ -1463,10 +1467,12 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring 
*rx_ring)
 /**
  * nfp_net_rx_ring_alloc() - Allocate resource for a RX ring
  * @rx_ring:  RX ring to allocate
+ * @fl_bufsz: Size of buffers to allocate
  *
  * Return: 0 on success, negative errno otherwise.
  */
-static int nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring)
+static int
+nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring, unsigned int fl_bufsz)
 {
struct nfp_net_r_vector *r_vec = 

[RFC (v3) 18/19] nfp: pass ring count as function parameter

2016-02-03 Thread Jakub Kicinski
Soon ring resize will call these functions with values
different from the current configuration, so we need to
explicitly pass the ring count as a parameter.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 23 +-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 0153fce33dff..2c86a10abcd3 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1405,17 +1405,18 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring 
*tx_ring)
 /**
  * nfp_net_tx_ring_alloc() - Allocate resource for a TX ring
  * @tx_ring:   TX Ring structure to allocate
+ * @cnt:   Ring buffer count
  *
  * Return: 0 on success, negative errno otherwise.
  */
-static int nfp_net_tx_ring_alloc(struct nfp_net_tx_ring *tx_ring)
+static int nfp_net_tx_ring_alloc(struct nfp_net_tx_ring *tx_ring, u32 cnt)
 {
struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
struct nfp_net *nn = r_vec->nfp_net;
struct pci_dev *pdev = nn->pdev;
int sz;
 
-   tx_ring->cnt = nn->txd_cnt;
+   tx_ring->cnt = cnt;
 
tx_ring->size = sizeof(*tx_ring->txds) * tx_ring->cnt;
tx_ring->txds = dma_zalloc_coherent(&pdev->dev, tx_ring->size,
@@ -1468,18 +1469,20 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring 
*rx_ring)
  * nfp_net_rx_ring_alloc() - Allocate resource for a RX ring
  * @rx_ring:  RX ring to allocate
  * @fl_bufsz: Size of buffers to allocate
+ * @cnt:  Ring buffer count
  *
  * Return: 0 on success, negative errno otherwise.
  */
 static int
-nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring, unsigned int fl_bufsz)
+nfp_net_rx_ring_alloc(struct nfp_net_rx_ring *rx_ring, unsigned int fl_bufsz,
+ u32 cnt)
 {
struct nfp_net_r_vector *r_vec = rx_ring->r_vec;
struct nfp_net *nn = r_vec->nfp_net;
struct pci_dev *pdev = nn->pdev;
int sz;
 
-   rx_ring->cnt = nn->rxd_cnt;
+   rx_ring->cnt = cnt;
rx_ring->bufsz = fl_bufsz;
 
rx_ring->size = sizeof(*rx_ring->rxds) * rx_ring->cnt;
@@ -1505,7 +1508,8 @@ err_alloc:
 }
 
 static struct nfp_net_rx_ring *
-nfp_net_shadow_rx_rings_prepare(struct nfp_net *nn, unsigned int fl_bufsz)
+nfp_net_shadow_rx_rings_prepare(struct nfp_net *nn, unsigned int fl_bufsz,
+   u32 buf_cnt)
 {
struct nfp_net_rx_ring *rings;
unsigned int r;
@@ -1517,7 +1521,7 @@ nfp_net_shadow_rx_rings_prepare(struct nfp_net *nn, 
unsigned int fl_bufsz)
for (r = 0; r < nn->num_rx_rings; r++) {
nfp_net_rx_ring_init(&rings[r], nn->rx_rings[r].r_vec, r);
 
-   if (nfp_net_rx_ring_alloc(&rings[r], fl_bufsz))
+   if (nfp_net_rx_ring_alloc(&rings[r], fl_bufsz, buf_cnt))
goto err_free_prev;
 
if (nfp_net_rx_ring_bufs_alloc(nn, &rings[r]))
@@ -1871,12 +1875,12 @@ static int nfp_net_netdev_open(struct net_device 
*netdev)
if (err)
goto err_free_prev_vecs;
 
-   err = nfp_net_tx_ring_alloc(nn->r_vecs[r].tx_ring);
+   err = nfp_net_tx_ring_alloc(nn->r_vecs[r].tx_ring, nn->txd_cnt);
if (err)
goto err_cleanup_vec_p;
 
err = nfp_net_rx_ring_alloc(nn->r_vecs[r].rx_ring,
-   nn->fl_bufsz);
+   nn->fl_bufsz, nn->rxd_cnt);
if (err)
goto err_free_tx_ring_p;
 
@@ -2057,7 +2061,8 @@ static int nfp_net_change_mtu(struct net_device *netdev, 
int new_mtu)
}
 
/* Prepare new rings */
-   tmp_rings = nfp_net_shadow_rx_rings_prepare(nn, new_fl_bufsz);
+   tmp_rings = nfp_net_shadow_rx_rings_prepare(nn, new_fl_bufsz,
+   nn->rxd_cnt);
if (!tmp_rings)
return -ENOMEM;
 
-- 
1.9.1



Re: [net-next PATCH 1/7] net: rework ndo tc op to consume additional qdisc handle parameter

2016-02-03 Thread kbuild test robot
Hi John,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/John-Fastabend/tc-offload-for-cls_u32-on-ixgbe/20160203-173342
config: x86_64-allmodconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

>> drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1715:19: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
     .ndo_setup_tc  = xgbe_setup_tc,
                      ^
   drivers/net/ethernet/amd/xgbe/xgbe-drv.c:1715:19: note: (near initialization for 'xgbe_netdev_ops.ndo_setup_tc')
--
>> drivers/net/ethernet/intel/i40e/i40e_fcoe.c:1460:19: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
     .ndo_setup_tc  = i40e_setup_tc,
                      ^
   drivers/net/ethernet/intel/i40e/i40e_fcoe.c:1460:19: note: (near initialization for 'i40e_fcoe_netdev_ops.ndo_setup_tc')

vim +1715 drivers/net/ethernet/amd/xgbe/xgbe-drv.c

c5aa9e3b Lendacky, Thomas 2014-06-05  1699  static const struct net_device_ops xgbe_netdev_ops = {
c5aa9e3b Lendacky, Thomas 2014-06-05  1700      .ndo_open               = xgbe_open,
c5aa9e3b Lendacky, Thomas 2014-06-05  1701      .ndo_stop               = xgbe_close,
c5aa9e3b Lendacky, Thomas 2014-06-05  1702      .ndo_start_xmit         = xgbe_xmit,
c5aa9e3b Lendacky, Thomas 2014-06-05  1703      .ndo_set_rx_mode        = xgbe_set_rx_mode,
c5aa9e3b Lendacky, Thomas 2014-06-05  1704      .ndo_set_mac_address    = xgbe_set_mac_address,
c5aa9e3b Lendacky, Thomas 2014-06-05  1705      .ndo_validate_addr      = eth_validate_addr,
23e4eef7 Lendacky, Thomas 2014-07-29  1706      .ndo_do_ioctl           = xgbe_ioctl,
c5aa9e3b Lendacky, Thomas 2014-06-05  1707      .ndo_change_mtu         = xgbe_change_mtu,
a8373f1a Lendacky, Thomas 2015-04-09  1708      .ndo_tx_timeout         = xgbe_tx_timeout,
c5aa9e3b Lendacky, Thomas 2014-06-05  1709      .ndo_get_stats64        = xgbe_get_stats64,
801c62d9 Lendacky, Thomas 2014-06-24  1710      .ndo_vlan_rx_add_vid    = xgbe_vlan_rx_add_vid,
801c62d9 Lendacky, Thomas 2014-06-24  1711      .ndo_vlan_rx_kill_vid   = xgbe_vlan_rx_kill_vid,
c5aa9e3b Lendacky, Thomas 2014-06-05  1712  #ifdef CONFIG_NET_POLL_CONTROLLER
c5aa9e3b Lendacky, Thomas 2014-06-05  1713      .ndo_poll_controller    = xgbe_poll_controller,
c5aa9e3b Lendacky, Thomas 2014-06-05  1714  #endif
fca2d994 Lendacky, Thomas 2014-07-29 @1715      .ndo_setup_tc           = xgbe_setup_tc,
c5aa9e3b Lendacky, Thomas 2014-06-05  1716      .ndo_set_features       = xgbe_set_features,
c5aa9e3b Lendacky, Thomas 2014-06-05  1717  };
c5aa9e3b Lendacky, Thomas 2014-06-05  1718
c5aa9e3b Lendacky, Thomas 2014-06-05  1719  struct net_device_ops *xgbe_get_netdev_ops(void)
c5aa9e3b Lendacky, Thomas 2014-06-05  1720  {
c5aa9e3b Lendacky, Thomas 2014-06-05  1721      return (struct net_device_ops *)&xgbe_netdev_ops;
c5aa9e3b Lendacky, Thomas 2014-06-05  1722  }
c5aa9e3b Lendacky, Thomas 2014-06-05  1723

:: The code at line 1715 was first introduced by commit
:: fca2d99428473884e67ef8ea1586e58151ed6ac3 amd-xgbe: Add traffic class support

:: TO: Lendacky, Thomas <thomas.lenda...@amd.com>
:: CC: David S. Miller <da...@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
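
[Editor's note] Warnings of this kind are what one would expect when the
prototype of a callback in an ops table changes while some drivers still
assign a function with the old prototype. Below is a minimal sketch of
the shape of the fix, assuming a hypothetical miniature ops table. The
demo_* names, the stand-in struct definitions and the exact parameter
list are illustrative assumptions, not the kernel's definitions.

```c
/* Hypothetical stand-ins; the real kernel structs are far larger. */
struct net_device {
	int num_tc;
};

struct net_device_ops {
	/* Assumed new-style prototype with an extra "handle" argument. */
	int (*ndo_setup_tc)(struct net_device *dev, unsigned int handle,
			    unsigned char tc);
};

/* Driver callback updated to match the new prototype. A driver that
 * does not care about the handle can simply ignore it.
 */
static int demo_setup_tc(struct net_device *dev, unsigned int handle,
			 unsigned char tc)
{
	(void)handle;

	if (tc > 8)
		return -1;

	dev->num_tc = tc;
	return 0;
}

/* With matching prototypes this initializer compiles without the
 * incompatible-pointer-type warning.
 */
static const struct net_device_ops demo_netdev_ops = {
	.ndo_setup_tc = demo_setup_tc,
};
```

Each driver named in the report would need the equivalent prototype
update for its own callback.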




RE: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-03 Thread David Laight
From: Linus Torvalds
> Sent: 02 February 2016 20:45
> On Tue, Feb 2, 2016 at 12:32 PM, Hannes Frederic Sowa
>  wrote:
> >
> > Unfortunately we never transfer a scm_cookie via the skbs but merely use it
> > to initialize unix_skb_parms structure in skb->cb and destroy it afterwards.
> 
> Ok, I obviously didn't check very closely.
> 
> > But "struct pid *" in unix_skb_parms should be enough to get us to
> > corresponding "struct cred *" so we can decrement the correct counter during
> > skb destruction.
> 
> Umm. I think the "struct cred" may change in between, can't it?
> 
> So I don't think you can later look up the cred based on the pid.
> 
> Could we add the cred pointer (or just the user pointer) to the 
> unix_skb_parms?
> 
> Or maybe just add it to the "struct scm_fp_list"?

Is that going to work if the sending process exits before the message is read?
You need a reference count against the structure that contains the count.
I think this is 'struct user', not 'struct cred'.

David
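
[Editor's note] The lifetime concern above can be sketched in userspace
C. This is only an illustration under stated assumptions: user_acct,
queued_msg and the helper names are hypothetical, the counters are not
atomic, and none of this is the kernel's actual data layout.

```c
#include <stdlib.h>

/* Hypothetical per-user accounting object that holds the counter. */
struct user_acct {
	int refcount;       /* one per holder: sender plus queued messages */
	long inflight_fds;  /* fds currently queued on behalf of this user */
};

/* Hypothetical queued message carrying fds. */
struct queued_msg {
	struct user_acct *owner;
	int nr_fds;
};

static struct user_acct *user_acct_new(void)
{
	struct user_acct *u = calloc(1, sizeof(*u));

	if (u)
		u->refcount = 1;
	return u;
}

static struct user_acct *user_acct_get(struct user_acct *u)
{
	u->refcount++;
	return u;
}

/* Drops one reference; returns 1 if the object was freed. */
static int user_acct_put(struct user_acct *u)
{
	if (--u->refcount == 0) {
		free(u);
		return 1;
	}
	return 0;
}

/* Queueing pins the accounting object for the lifetime of the message. */
static void msg_queue(struct queued_msg *m, struct user_acct *u, int nr_fds)
{
	m->owner = user_acct_get(u);
	m->nr_fds = nr_fds;
	u->inflight_fds += nr_fds;
}

/* Destruction decrements the counter it incremented, then unpins. */
static int msg_destroy(struct queued_msg *m)
{
	m->owner->inflight_fds -= m->nr_fds;
	return user_acct_put(m->owner);
}
```

Because the message holds its own reference, the sender can drop its
reference (for example by exiting) and the counter is still valid when
the message is finally destroyed.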



Re: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-03 Thread Simon McVittie
On 03/02/16 11:56, David Herrmann wrote:
> However, with Hannes' revised patch, a different DoS attack against
> dbus-daemon is possible. Imagine a peer that receives batches of FDs,
> but never dequeues them. They will be accounted on the inflight-limit
> of dbus-daemon, as such causing messages of independent peers to be
> rejected in case they carry FDs.
> Preferably, dbus-daemon would avoid queuing more than 16 FDs on a
> single destination (total). But that would require POLLOUT to be
> capped by the number of queued fds. A possible workaround is to add
> CAP_SYS_RESOURCE to dbus-daemon.

We have several mitigations for this:

* there's a limit on outgoing fds queued inside dbus-daemon for a
  particular recipient connection, currently 64, and if that's
  exceeded, we stop accepting messages for that recipient, feeding back
  a send failure to the sender for messages to that recipient;
* there's a time limit for how long any given fd can stay queued up
  inside dbus-daemon, currently 2.5 minutes, and if that's exceeded, we
  drop the message;
* if started as root[1], we increase our fd limit to 64k before
  dropping privileges to the intended uid, which in combination with
  the maximum of 256 connections per uid and 64 fds per connection means
  it takes multiple uids working together to carry out a DoS;
* all of those limits can be adjusted by a local sysadmin if necessary,
  if their users are particularly hostile (for instance 16 fds per
  message is a generous limit intended to avoid unforeseen breakage,
  and 4 would probably be enough in practice)
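
[Editor's note] The "multiple uids" claim in the third point can be
checked with rough arithmetic. The sketch below assumes that "64k" means
65536 and ignores every other fd the daemon holds (including the
connection sockets themselves), so it is an upper-bound estimate only.
The function names are hypothetical.

```c
/* Most fds a single uid can have queued inside the daemon. */
static unsigned long max_queued_fds_per_uid(unsigned long conns_per_uid,
					    unsigned long fds_per_conn)
{
	return conns_per_uid * fds_per_conn;
}

/* Number of cooperating uids needed to reach fd_limit (rounded up). */
static unsigned long uids_needed(unsigned long fd_limit,
				 unsigned long fds_per_uid)
{
	return (fd_limit + fds_per_uid - 1) / fds_per_uid;
}
```

With the defaults quoted above, 256 connections times 64 fds gives
16384 queued fds per uid, so under these assumptions about four uids
would have to cooperate to approach a 65536 fd limit.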

The queueing logic is fairly involved, so I'd appreciate suggested
patches on freedesktop.org Bugzilla or to
dbus-secur...@lists.freedesktop.org if you have an idea for better
limits. If you believe you have found a non-public vulnerability, please
mark any Bugzilla bugs as restricted to the D-Bus security group.

Thanks,
S

[1] The system dbus-daemon is started as root (and hence able to
increase its own fd limit) on all systemd systems, anything that uses
the Red Hat or Slackware sysvinit scripts bundled with dbus, and any
Debian derivatives that use sysvinit and have taken security updates
from at least Debian 7. Other distro-specific init system glue is up to
the relevant distro.

-- 
Simon McVittie
Collabora Ltd. 



Re: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-03 Thread Simon McVittie
On 02/02/16 17:34, David Herrmann wrote:
> Furthermore, with this patch in place, a process better not pass any
> file-descriptors to an untrusted process.
...
> Did anyone notify the dbus maintainers of this? They
> might wanna document this, if not already done (CC: smcv).

Sorry, I'm not clear from this message on what patterns I should be
documenting as bad, and what the effect of non-compliance would be.

dbus-daemon has a fd-passing feature, which uses AF_UNIX sockets'
existing ability to pass fds to let users of D-Bus attach fds to their
messages. The message is passed from the sending client to dbus-daemon,
then from dbus-daemon to the recipient:

         AF_UNIX              AF_UNIX
            |                    |
sender ----------> dbus-daemon ----------> recipient
            |                    |

This has been API since before I was a D-Bus maintainer, so I have no
influence over its existence; just like the kernel doesn't want to break
user-space, dbus-daemon doesn't want to break its users.

The system dbus-daemon (dbus-daemon --system) is a privilege boundary,
and accepts senders and recipients with differing privileges. Without
configuration, messages are denied by default. Recipients can open this
up (by installing system-wide configuration) to allow arbitrary
processes to send messages to them, so that they can carry out their own
discretionary access control. Since 2014, the system dbus-daemon accepts
up to 16 file descriptors per message by default.

There is also a session or user dbus-daemon (dbus-daemon --session) per
uid, but that isn't normally a privilege boundary, so any user trying to
carry out a denial of service there is only hurting themselves.

Am I right in saying that the advice I give to D-Bus users should be
something like this?

* system services should not send fds at all, unless they trust the
  dbus-daemon
* system services should not send fds via D-Bus that will be delivered
  to recipients that they do not trust
* sending fds to an untrusted recipient would enable that recipient to
  carry out a denial-of-service attack (on what? the sender? the
  dbus-daemon?)

-- 
Simon McVittie
Collabora Ltd. 



[RFC (v3) 02/19] nfp: free buffers before changing MTU

2016-02-03 Thread Jakub Kicinski
For freeing DMA buffers we depend on nfp_net.fl_bufsz having the same
value as during allocation. Therefore, in .ndo_change_mtu we must first
free the buffers and only then change the setting.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Rolf Neugebauer 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 006d9600240f..b381682de3d6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1921,17 +1921,17 @@ static int nfp_net_change_mtu(struct net_device 
*netdev, int new_mtu)
return -EINVAL;
}
 
+   if (netif_running(netdev))
+   nfp_net_netdev_close(netdev);
+
netdev->mtu = new_mtu;
 
/* Freelist buffer size rounded up to the nearest 1K */
tmp = new_mtu + ETH_HLEN + VLAN_HLEN + NFP_NET_MAX_PREPEND;
nn->fl_bufsz = roundup(tmp, 1024);
 
-   /* restart if running */
-   if (netif_running(netdev)) {
-   nfp_net_netdev_close(netdev);
+   if (netif_running(netdev))
ret = nfp_net_netdev_open(netdev);
-   }
 
return ret;
 }
-- 
1.9.1
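
[Editor's note] A small sketch of why the ordering matters: the freelist
buffer size is derived from the MTU, so it changes whenever the MTU
crosses a 1K rounding boundary. ETH_HLEN and VLAN_HLEN are the usual
Ethernet and VLAN header lengths; the value 64 for NFP_NET_MAX_PREPEND
is an assumption made for this example, and fl_bufsz_for_mtu() is a
hypothetical helper, not a driver function.

```c
#define ETH_HLEN		14
#define VLAN_HLEN		4
#define NFP_NET_MAX_PREPEND	64	/* assumed value, for illustration */

/* Freelist buffer size rounded up to the nearest 1K, mirroring the
 * computation in the hunk above.
 */
static unsigned int fl_bufsz_for_mtu(unsigned int mtu)
{
	unsigned int tmp = mtu + ETH_HLEN + VLAN_HLEN + NFP_NET_MAX_PREPEND;

	return ((tmp + 1023) / 1024) * 1024;
}
```

Under these assumptions an MTU of 1500 gives 2048-byte buffers and an
MTU of 9000 gives 9216-byte buffers. Buffers mapped with the old size
must therefore be unmapped before the stored size is overwritten.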



[RFC (v3) 01/19] nfp: return error if MTU change fails

2016-02-03 Thread Jakub Kicinski
When reopening the device fails after an MTU change, let userspace
know.  The MTU remains changed even though an error is returned; this
is what all Ethernet devices do.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Rolf Neugebauer 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 43c618bafdb6..006d9600240f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1911,6 +1911,7 @@ static void nfp_net_set_rx_mode(struct net_device *netdev)
 static int nfp_net_change_mtu(struct net_device *netdev, int new_mtu)
 {
struct nfp_net *nn = netdev_priv(netdev);
+   int ret = 0;
u32 tmp;
 
nn_dbg(nn, "New MTU = %d\n", new_mtu);
@@ -1929,10 +1930,10 @@ static int nfp_net_change_mtu(struct net_device 
*netdev, int new_mtu)
/* restart if running */
if (netif_running(netdev)) {
nfp_net_netdev_close(netdev);
-   nfp_net_netdev_open(netdev);
+   ret = nfp_net_netdev_open(netdev);
}
 
-   return 0;
+   return ret;
 }
 
 static struct rtnl_link_stats64 *nfp_net_stat64(struct net_device *netdev,
-- 
1.9.1



[RFC (v3) 14/19] nfp: slice .ndo_open() and .ndo_stop() up

2016-02-03 Thread Jakub Kicinski
Divide .ndo_open() and .ndo_stop() into logical, callable
chunks.  No functional changes.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 213 +
 1 file changed, 131 insertions(+), 82 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 34f933f19059..4ce17cb95e6f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1671,6 +1671,77 @@ nfp_net_vec_write_ring_data(struct nfp_net *nn, struct 
nfp_net_r_vector *r_vec,
 }
 
 /**
+ * nfp_net_set_config_and_enable() - Write control BAR and enable NFP
+ * @nn:  NFP Net device to reconfigure
+ */
+static int nfp_net_set_config_and_enable(struct nfp_net *nn)
+{
+   u32 new_ctrl, update = 0;
+   unsigned int r;
+   int err;
+
+   new_ctrl = nn->ctrl;
+
+   if (nn->cap & NFP_NET_CFG_CTRL_RSS) {
+   nfp_net_rss_write_key(nn);
+   nfp_net_rss_write_itbl(nn);
+   nn_writel(nn, NFP_NET_CFG_RSS_CTRL, nn->rss_cfg);
+   update |= NFP_NET_CFG_UPDATE_RSS;
+   }
+
+   if (nn->cap & NFP_NET_CFG_CTRL_IRQMOD) {
+   nfp_net_coalesce_write_cfg(nn);
+
+   new_ctrl |= NFP_NET_CFG_CTRL_IRQMOD;
+   update |= NFP_NET_CFG_UPDATE_IRQMOD;
+   }
+
+   for (r = 0; r < nn->num_r_vecs; r++)
+   nfp_net_vec_write_ring_data(nn, &nn->r_vecs[r], r);
+
+   nn_writeq(nn, NFP_NET_CFG_TXRS_ENABLE, nn->num_tx_rings == 64 ?
+ 0xffffffffffffffffULL : ((u64)1 << nn->num_tx_rings) - 1);
+
+   nn_writeq(nn, NFP_NET_CFG_RXRS_ENABLE, nn->num_rx_rings == 64 ?
+ 0xffffffffffffffffULL : ((u64)1 << nn->num_rx_rings) - 1);
+
+   nfp_net_write_mac_addr(nn, nn->netdev->dev_addr);
+
+   nn_writel(nn, NFP_NET_CFG_MTU, nn->netdev->mtu);
+   nn_writel(nn, NFP_NET_CFG_FLBUFSZ, nn->fl_bufsz);
+
+   /* Enable device */
+   new_ctrl |= NFP_NET_CFG_CTRL_ENABLE;
+   update |= NFP_NET_CFG_UPDATE_GEN;
+   update |= NFP_NET_CFG_UPDATE_MSIX;
+   update |= NFP_NET_CFG_UPDATE_RING;
+   if (nn->cap & NFP_NET_CFG_CTRL_RINGCFG)
+   new_ctrl |= NFP_NET_CFG_CTRL_RINGCFG;
+
+   nn_writel(nn, NFP_NET_CFG_CTRL, new_ctrl);
+   err = nfp_net_reconfig(nn, update);
+   if (err)
+   goto err_clear_config;
+
+   nn->ctrl = new_ctrl;
+
+   /* Since reconfiguration requests while NFP is down are ignored we
+* have to wipe the entire VXLAN configuration and reinitialize it.
+*/
+   if (nn->ctrl & NFP_NET_CFG_CTRL_VXLAN) {
+   memset(&nn->vxlan_ports, 0, sizeof(nn->vxlan_ports));
+   memset(&nn->vxlan_usecnt, 0, sizeof(nn->vxlan_usecnt));
+   vxlan_get_rx_port(nn->netdev);
+   }
+
+   return 0;
+
+err_clear_config:
+   nfp_net_clear_config_and_disable(nn);
+   return err;
+}
+
+/**
  * nfp_net_start_vec() - Start ring vector
  * @nn:  NFP Net device structure
  * @r_vec:   Ring vector to be started
@@ -1690,20 +1761,33 @@ nfp_net_start_vec(struct nfp_net *nn, struct 
nfp_net_r_vector *r_vec)
enable_irq(irq_vec);
 }
 
+/**
+ * nfp_net_open_stack() - Start the device from stack's perspective
+ * @nn:  NFP Net device to reconfigure
+ */
+static void nfp_net_open_stack(struct nfp_net *nn)
+{
+   unsigned int r;
+
+   for (r = 0; r < nn->num_r_vecs; r++)
+   nfp_net_start_vec(nn, &nn->r_vecs[r]);
+
+   netif_tx_wake_all_queues(nn->netdev);
+
+   enable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
+   nfp_net_read_link_status(nn);
+}
+
 static int nfp_net_netdev_open(struct net_device *netdev)
 {
struct nfp_net *nn = netdev_priv(netdev);
int err, r;
-   u32 update = 0;
-   u32 new_ctrl;
 
if (nn->ctrl & NFP_NET_CFG_CTRL_ENABLE) {
nn_err(nn, "Dev is already enabled: 0x%08x\n", nn->ctrl);
return -EBUSY;
}
 
-   new_ctrl = nn->ctrl;
-
/* Step 1: Allocate resources for rings and the like
 * - Request interrupts
 * - Allocate RX and TX ring resources
@@ -1756,20 +1840,6 @@ static int nfp_net_netdev_open(struct net_device *netdev)
if (err)
goto err_free_rings;
 
-   if (nn->cap & NFP_NET_CFG_CTRL_RSS) {
-   nfp_net_rss_write_key(nn);
-   nfp_net_rss_write_itbl(nn);
-   nn_writel(nn, NFP_NET_CFG_RSS_CTRL, nn->rss_cfg);
-   update |= NFP_NET_CFG_UPDATE_RSS;
-   }
-
-   if (nn->cap & NFP_NET_CFG_CTRL_IRQMOD) {
-   nfp_net_coalesce_write_cfg(nn);
-
-   new_ctrl |= NFP_NET_CFG_CTRL_IRQMOD;
-   update |= NFP_NET_CFG_UPDATE_IRQMOD;
-   }
-
/* Step 2: Configure the NFP
 * - Enable rings from 0 to tx_rings/rx_rings - 1.
 * - 
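
[Editor's note] The ring enable writes in this patch special-case a ring
count of 64. A minimal sketch of why, using a hypothetical helper name:
shifting a 64-bit value by 64 is undefined behaviour in C, so the
all-ones mask has to be produced separately.

```c
#include <stdint.h>

/* Bitmask with the low num_rings bits set, valid for 0..64 rings. */
static uint64_t ring_enable_mask(unsigned int num_rings)
{
	/* (1 << 64) is undefined, so handle the full-width case first. */
	return num_rings == 64 ? UINT64_MAX
			       : (UINT64_C(1) << num_rings) - 1;
}
```

For example, 8 rings yields a mask of 0xff, and 64 rings yields all
64 bits set.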

[RFC (v3) 07/19] nfp: break up nfp_net_{alloc|free}_rings

2016-02-03 Thread Jakub Kicinski
nfp_net_{alloc|free}_rings contained a strange mix of allocations
and vector initialization.  Remove it, declare vector init as
a separate function and handle allocations explicitly.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 126 -
 1 file changed, 47 insertions(+), 79 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index bebdae80ccda..d39ac3553e1e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1486,91 +1486,40 @@ err_alloc:
return -ENOMEM;
 }
 
-static void __nfp_net_free_rings(struct nfp_net *nn, unsigned int n_free)
+static int
+nfp_net_prepare_vector(struct nfp_net *nn, struct nfp_net_r_vector *r_vec,
+  int idx)
 {
-   struct nfp_net_r_vector *r_vec;
-   struct msix_entry *entry;
+   struct msix_entry *entry = &nn->irq_entries[r_vec->irq_idx];
+   int err;
 
-   while (n_free--) {
-   r_vec = &nn->r_vecs[n_free];
-   entry = &nn->irq_entries[r_vec->irq_idx];
+   snprintf(r_vec->name, sizeof(r_vec->name),
+"%s-rxtx-%d", nn->netdev->name, idx);
+   err = request_irq(entry->vector, r_vec->handler, 0, r_vec->name, r_vec);
+   if (err) {
+   nn_err(nn, "Error requesting IRQ %d\n", entry->vector);
+   return err;
+   }
 
-   nfp_net_rx_ring_free(r_vec->rx_ring);
-   nfp_net_tx_ring_free(r_vec->tx_ring);
+   /* Setup NAPI */
+   netif_napi_add(nn->netdev, &r_vec->napi,
+  nfp_net_poll, NAPI_POLL_WEIGHT);
 
-   irq_set_affinity_hint(entry->vector, NULL);
-   free_irq(entry->vector, r_vec);
+   irq_set_affinity_hint(entry->vector, &r_vec->affinity_mask);
 
-   netif_napi_del(&r_vec->napi);
-   }
-}
+   nn_dbg(nn, "RV%02d: irq=%03d/%03d\n", idx, entry->vector, entry->entry);
 
-/**
- * nfp_net_free_rings() - Free all ring resources
- * @nn:  NFP Net device to reconfigure
- */
-static void nfp_net_free_rings(struct nfp_net *nn)
-{
-   __nfp_net_free_rings(nn, nn->num_r_vecs);
+   return 0;
 }
 
-/**
- * nfp_net_alloc_rings() - Allocate resources for RX and TX rings
- * @nn:  NFP Net device to reconfigure
- *
- * Return: 0 on success or negative errno on error.
- */
-static int nfp_net_alloc_rings(struct nfp_net *nn)
+static void
+nfp_net_cleanup_vector(struct nfp_net *nn, struct nfp_net_r_vector *r_vec)
 {
-   struct nfp_net_r_vector *r_vec;
-   struct msix_entry *entry;
-   int err;
-   int r;
+   struct msix_entry *entry = &nn->irq_entries[r_vec->irq_idx];
 
-   for (r = 0; r < nn->num_r_vecs; r++) {
-   r_vec = &nn->r_vecs[r];
-   entry = &nn->irq_entries[r_vec->irq_idx];
-
-   /* Setup NAPI */
-   netif_napi_add(nn->netdev, &r_vec->napi,
-  nfp_net_poll, NAPI_POLL_WEIGHT);
-
-   snprintf(r_vec->name, sizeof(r_vec->name),
-"%s-rxtx-%d", nn->netdev->name, r);
-   err = request_irq(entry->vector, r_vec->handler, 0,
- r_vec->name, r_vec);
-   if (err) {
-   nn_dbg(nn, "Error requesting IRQ %d\n", entry->vector);
-   goto err_napi_del;
-   }
-
-   irq_set_affinity_hint(entry->vector, &r_vec->affinity_mask);
-
-   nn_dbg(nn, "RV%02d: irq=%03d/%03d\n",
-  r, entry->vector, entry->entry);
-
-   /* Allocate TX ring resources */
-   err = nfp_net_tx_ring_alloc(r_vec->tx_ring);
-   if (err)
-   goto err_free_irq;
-
-   /* Allocate RX ring resources */
-   err = nfp_net_rx_ring_alloc(r_vec->rx_ring);
-   if (err)
-   goto err_free_tx;
-   }
-
-   return 0;
-
-err_free_tx:
-   nfp_net_tx_ring_free(r_vec->tx_ring);
-err_free_irq:
irq_set_affinity_hint(entry->vector, NULL);
-   free_irq(entry->vector, r_vec);
-err_napi_del:
netif_napi_del(&r_vec->napi);
-   __nfp_net_free_rings(nn, r);
-   return err;
+   free_irq(entry->vector, r_vec);
 }
 
 /**
@@ -1734,9 +1683,19 @@ static int nfp_net_netdev_open(struct net_device *netdev)
goto err_free_exn;
disable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
 
-   err = nfp_net_alloc_rings(nn);
-   if (err)
-   goto err_free_lsc;
+   for (r = 0; r < nn->num_r_vecs; r++) {
+   err = nfp_net_prepare_vector(nn, &nn->r_vecs[r], r);
+   if (err)
+   goto err_free_prev_vecs;
+
+   err = nfp_net_tx_ring_alloc(nn->r_vecs[r].tx_ring);
+   if (err)
+   goto 

[RFC (v3) 08/19] nfp: make *x_ring_init do all the init

2016-02-03 Thread Jakub Kicinski
nfp_net_[rt]x_ring_init functions used to be called from probe
path only and some of their functionality was spilled to the
call site.  In order to reuse them for ring reconfiguration
we need them to do all the init.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 28 ++
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index d39ac3553e1e..8299d4c002fb 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -348,12 +348,18 @@ static irqreturn_t nfp_net_irq_exn(int irq, void *data)
 /**
  * nfp_net_tx_ring_init() - Fill in the boilerplate for a TX ring
  * @tx_ring:  TX ring structure
+ * @r_vec:IRQ vector servicing this ring
+ * @idx:  Ring index
  */
-static void nfp_net_tx_ring_init(struct nfp_net_tx_ring *tx_ring)
+static void
+nfp_net_tx_ring_init(struct nfp_net_tx_ring *tx_ring,
+struct nfp_net_r_vector *r_vec, unsigned int idx)
 {
-   struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
struct nfp_net *nn = r_vec->nfp_net;
 
+   tx_ring->idx = idx;
+   tx_ring->r_vec = r_vec;
+
tx_ring->qcidx = tx_ring->idx * nn->stride_tx;
tx_ring->qcp_q = nn->tx_bar + NFP_QCP_QUEUE_OFF(tx_ring->qcidx);
 }
@@ -361,12 +367,18 @@ static void nfp_net_tx_ring_init(struct nfp_net_tx_ring 
*tx_ring)
 /**
  * nfp_net_rx_ring_init() - Fill in the boilerplate for a RX ring
  * @rx_ring:  RX ring structure
+ * @r_vec:IRQ vector servicing this ring
+ * @idx:  Ring index
  */
-static void nfp_net_rx_ring_init(struct nfp_net_rx_ring *rx_ring)
+static void
+nfp_net_rx_ring_init(struct nfp_net_rx_ring *rx_ring,
+struct nfp_net_r_vector *r_vec, unsigned int idx)
 {
-   struct nfp_net_r_vector *r_vec = rx_ring->r_vec;
struct nfp_net *nn = r_vec->nfp_net;
 
+   rx_ring->idx = idx;
+   rx_ring->r_vec = r_vec;
+
rx_ring->fl_qcidx = rx_ring->idx * nn->stride_rx;
rx_ring->rx_qcidx = rx_ring->fl_qcidx + (nn->stride_rx - 1);
 
@@ -404,14 +416,10 @@ static void nfp_net_irqs_assign(struct net_device *netdev)
cpumask_set_cpu(r, &r_vec->affinity_mask);
 
r_vec->tx_ring = &nn->tx_rings[r];
-   nn->tx_rings[r].idx = r;
-   nn->tx_rings[r].r_vec = r_vec;
-   nfp_net_tx_ring_init(r_vec->tx_ring);
+   nfp_net_tx_ring_init(r_vec->tx_ring, r_vec, r);
 
r_vec->rx_ring = &nn->rx_rings[r];
-   nn->rx_rings[r].idx = r;
-   nn->rx_rings[r].r_vec = r_vec;
-   nfp_net_rx_ring_init(r_vec->rx_ring);
+   nfp_net_rx_ring_init(r_vec->rx_ring, r_vec, r);
}
 }
 
-- 
1.9.1



[RFC (v3) 09/19] nfp: allocate ring SW structs dynamically

2016-02-03 Thread Jakub Kicinski
To be able to switch rings more easily on config changes, allocate
them dynamically, separately from the nfp_net structure.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  6 ++---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 28 +-
 2 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h 
b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index ab264e1bccd0..0a87571a7d9c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -472,6 +472,9 @@ struct nfp_net {
 
u32 rx_offset;
 
+   struct nfp_net_tx_ring *tx_rings;
+   struct nfp_net_rx_ring *rx_rings;
+
 #ifdef CONFIG_PCI_IOV
unsigned int num_vfs;
struct vf_data_storage *vfinfo;
@@ -504,9 +507,6 @@ struct nfp_net {
int txd_cnt;
int rxd_cnt;
 
-   struct nfp_net_tx_ring tx_rings[NFP_NET_MAX_TX_RINGS];
-   struct nfp_net_rx_ring rx_rings[NFP_NET_MAX_RX_RINGS];
-
u8 num_irqs;
u8 num_r_vecs;
struct nfp_net_r_vector r_vecs[NFP_NET_MAX_TX_RINGS];
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 8299d4c002fb..faaa25dd5a1e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -414,12 +414,6 @@ static void nfp_net_irqs_assign(struct net_device *netdev)
r_vec->irq_idx = NFP_NET_NON_Q_VECTORS + r;
 
cpumask_set_cpu(r, &r_vec->affinity_mask);
-
-   r_vec->tx_ring = &nn->tx_rings[r];
-   nfp_net_tx_ring_init(r_vec->tx_ring, r_vec, r);
-
-   r_vec->rx_ring = &nn->rx_rings[r];
-   nfp_net_rx_ring_init(r_vec->rx_ring, r_vec, r);
}
 }
 
@@ -1501,6 +1495,12 @@ nfp_net_prepare_vector(struct nfp_net *nn, struct 
nfp_net_r_vector *r_vec,
struct msix_entry *entry = &nn->irq_entries[r_vec->irq_idx];
int err;
 
+   r_vec->tx_ring = &nn->tx_rings[idx];
+   nfp_net_tx_ring_init(r_vec->tx_ring, r_vec, idx);
+
+   r_vec->rx_ring = &nn->rx_rings[idx];
+   nfp_net_rx_ring_init(r_vec->rx_ring, r_vec, idx);
+
snprintf(r_vec->name, sizeof(r_vec->name),
 "%s-rxtx-%d", nn->netdev->name, idx);
err = request_irq(entry->vector, r_vec->handler, 0, r_vec->name, r_vec);
@@ -1691,6 +1691,15 @@ static int nfp_net_netdev_open(struct net_device *netdev)
goto err_free_exn;
disable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
 
+   nn->rx_rings = kcalloc(nn->num_rx_rings, sizeof(*nn->rx_rings),
+  GFP_KERNEL);
+   if (!nn->rx_rings)
+   goto err_free_lsc;
+   nn->tx_rings = kcalloc(nn->num_tx_rings, sizeof(*nn->tx_rings),
+  GFP_KERNEL);
+   if (!nn->tx_rings)
+   goto err_free_rx_rings;
+
for (r = 0; r < nn->num_r_vecs; r++) {
err = nfp_net_prepare_vector(nn, &nn->r_vecs[r], r);
if (err)
@@ -1805,6 +1814,10 @@ err_free_tx_ring_p:
 err_cleanup_vec_p:
nfp_net_cleanup_vector(nn, &nn->r_vecs[r]);
}
+   kfree(nn->tx_rings);
+err_free_rx_rings:
+   kfree(nn->rx_rings);
+err_free_lsc:
nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
 err_free_exn:
nfp_net_aux_irq_free(nn, NFP_NET_CFG_EXN, NFP_NET_IRQ_EXN_IDX);
@@ -1850,6 +1863,9 @@ static int nfp_net_netdev_close(struct net_device *netdev)
nfp_net_cleanup_vector(nn, &nn->r_vecs[r]);
}
 
+   kfree(nn->rx_rings);
+   kfree(nn->tx_rings);
+
nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
nfp_net_aux_irq_free(nn, NFP_NET_CFG_EXN, NFP_NET_IRQ_EXN_IDX);
 
-- 
1.9.1
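
The allocation order and the matching unwind labels in the open path above
can be sketched in user space. This is only a minimal sketch, assuming
calloc()/free() in place of kcalloc()/kfree(); struct demo_net,
demo_alloc_rings() and the fail_tx error-injection flag are hypothetical
names and not part of the driver.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's ring structures. */
struct demo_ring { int idx; };

struct demo_net {
	unsigned int num_rx_rings;
	unsigned int num_tx_rings;
	struct demo_ring *rx_rings;
	struct demo_ring *tx_rings;
};

/* fail_tx forces the second allocation to fail (error injection). */
static int demo_alloc_rings(struct demo_net *nn, int fail_tx)
{
	nn->rx_rings = calloc(nn->num_rx_rings, sizeof(*nn->rx_rings));
	if (!nn->rx_rings)
		goto err_out;

	nn->tx_rings = fail_tx ? NULL :
		calloc(nn->num_tx_rings, sizeof(*nn->tx_rings));
	if (!nn->tx_rings)
		goto err_free_rx_rings;

	return 0;

	/* Labels unwind in reverse order of allocation. */
err_free_rx_rings:
	free(nn->rx_rings);
	nn->rx_rings = NULL;
err_out:
	return -1;
}

/* Counterpart for the close path: release both arrays. */
static void demo_free_rings(struct demo_net *nn)
{
	free(nn->tx_rings);
	free(nn->rx_rings);
	nn->tx_rings = NULL;
	nn->rx_rings = NULL;
}
```

Each label releases only what was allocated before the failing step, so a
failure of the second allocation frees the first array and nothing else.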



[RFC (v3) 06/19] nfp: move link state interrupt request/free calls

2016-02-03 Thread Jakub Kicinski
We need to be able to disable the link state interrupt when
the device is brought down.  We used to just free the IRQ
at the beginning of .ndo_stop().  As we now move towards
more ordered .ndo_open()/.ndo_stop() paths LSC allocation
should be placed in the "allocate resource" section.

Since the IRQ can't be freed early in .ndo_stop(), it is
disabled instead.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 23 +++---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 070645f9bc21..bebdae80ccda 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1727,10 +1727,16 @@ static int nfp_net_netdev_open(struct net_device 
*netdev)
  NFP_NET_IRQ_EXN_IDX, nn->exn_handler);
if (err)
return err;
+   err = nfp_net_aux_irq_request(nn, NFP_NET_CFG_LSC, "%s-lsc",
+ nn->lsc_name, sizeof(nn->lsc_name),
+ NFP_NET_IRQ_LSC_IDX, nn->lsc_handler);
+   if (err)
+   goto err_free_exn;
+   disable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
 
err = nfp_net_alloc_rings(nn);
if (err)
-   goto err_free_exn;
+   goto err_free_lsc;
 
err = netif_set_real_num_tx_queues(netdev, nn->num_tx_rings);
if (err)
@@ -1810,19 +1816,11 @@ static int nfp_net_netdev_open(struct net_device 
*netdev)
 
netif_tx_wake_all_queues(netdev);
 
-   err = nfp_net_aux_irq_request(nn, NFP_NET_CFG_LSC, "%s-lsc",
- nn->lsc_name, sizeof(nn->lsc_name),
- NFP_NET_IRQ_LSC_IDX, nn->lsc_handler);
-   if (err)
-   goto err_stop_tx;
+   enable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
nfp_net_read_link_status(nn);
 
return 0;
 
-err_stop_tx:
-   netif_tx_disable(netdev);
-   for (r = 0; r < nn->num_r_vecs; r++)
-   nfp_net_tx_flush(nn->r_vecs[r].tx_ring);
 err_disable_napi:
while (r--) {
napi_disable(&nn->r_vecs[r].napi);
@@ -1832,6 +1830,8 @@ err_clear_config:
nfp_net_clear_config_and_disable(nn);
 err_free_rings:
nfp_net_free_rings(nn);
+err_free_lsc:
+   nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
 err_free_exn:
nfp_net_aux_irq_free(nn, NFP_NET_CFG_EXN, NFP_NET_IRQ_EXN_IDX);
return err;
@@ -1853,7 +1853,7 @@ static int nfp_net_netdev_close(struct net_device *netdev)
 
/* Step 1: Disable RX and TX rings from the Linux kernel perspective
 */
-   nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
+   disable_irq(nn->irq_entries[NFP_NET_CFG_LSC].vector);
netif_carrier_off(netdev);
nn->link_up = false;
 
@@ -1874,6 +1874,7 @@ static int nfp_net_netdev_close(struct net_device *netdev)
}
 
nfp_net_free_rings(nn);
+   nfp_net_aux_irq_free(nn, NFP_NET_CFG_LSC, NFP_NET_IRQ_LSC_IDX);
nfp_net_aux_irq_free(nn, NFP_NET_CFG_EXN, NFP_NET_IRQ_EXN_IDX);
 
nn_dbg(nn, "%s down", netdev->name);
-- 
1.9.1



[RFC (v3) 13/19] nfp: move filling ring information to FW config

2016-02-03 Thread Jakub Kicinski
nfp_net_[rt]x_ring_{alloc,free} should only allocate or free
ring resources without touching the device.  Move setting
parameters in the BAR to separate functions.  This will make
it possible to reuse alloc/free functions to allocate new
rings while the device is running.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 50 ++
 1 file changed, 32 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1e1e0f7ac077..34f933f19059 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1385,10 +1385,6 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring 
*tx_ring)
struct nfp_net *nn = r_vec->nfp_net;
struct pci_dev *pdev = nn->pdev;
 
-   nn_writeq(nn, NFP_NET_CFG_TXR_ADDR(tx_ring->idx), 0);
-   nn_writeb(nn, NFP_NET_CFG_TXR_SZ(tx_ring->idx), 0);
-   nn_writeb(nn, NFP_NET_CFG_TXR_VEC(tx_ring->idx), 0);
-
kfree(tx_ring->txbufs);
 
if (tx_ring->txds)
@@ -1428,11 +1424,6 @@ static int nfp_net_tx_ring_alloc(struct nfp_net_tx_ring 
*tx_ring)
if (!tx_ring->txbufs)
goto err_alloc;
 
-   /* Write the DMA address, size and MSI-X info to the device */
-   nn_writeq(nn, NFP_NET_CFG_TXR_ADDR(tx_ring->idx), tx_ring->dma);
-   nn_writeb(nn, NFP_NET_CFG_TXR_SZ(tx_ring->idx), ilog2(tx_ring->cnt));
-   nn_writeb(nn, NFP_NET_CFG_TXR_VEC(tx_ring->idx), r_vec->irq_idx);
-
netif_set_xps_queue(nn->netdev, &r_vec->affinity_mask, tx_ring->idx);
 
nn_dbg(nn, "TxQ%02d: QCidx=%02d cnt=%d dma=%#llx host=%p\n",
@@ -1456,10 +1447,6 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring 
*rx_ring)
struct nfp_net *nn = r_vec->nfp_net;
struct pci_dev *pdev = nn->pdev;
 
-   nn_writeq(nn, NFP_NET_CFG_RXR_ADDR(rx_ring->idx), 0);
-   nn_writeb(nn, NFP_NET_CFG_RXR_SZ(rx_ring->idx), 0);
-   nn_writeb(nn, NFP_NET_CFG_RXR_VEC(rx_ring->idx), 0);
-
kfree(rx_ring->rxbufs);
 
if (rx_ring->rxds)
@@ -1499,11 +1486,6 @@ static int nfp_net_rx_ring_alloc(struct nfp_net_rx_ring 
*rx_ring)
if (!rx_ring->rxbufs)
goto err_alloc;
 
-   /* Write the DMA address, size and MSI-X info to the device */
-   nn_writeq(nn, NFP_NET_CFG_RXR_ADDR(rx_ring->idx), rx_ring->dma);
-   nn_writeb(nn, NFP_NET_CFG_RXR_SZ(rx_ring->idx), ilog2(rx_ring->cnt));
-   nn_writeb(nn, NFP_NET_CFG_RXR_VEC(rx_ring->idx), r_vec->irq_idx);
-
nn_dbg(nn, "RxQ%02d: FlQCidx=%02d RxQCidx=%02d cnt=%d dma=%#llx 
host=%p\n",
   rx_ring->idx, rx_ring->fl_qcidx, rx_ring->rx_qcidx,
   rx_ring->cnt, (unsigned long long)rx_ring->dma, rx_ring->rxds);
@@ -1628,6 +1610,17 @@ static void nfp_net_write_mac_addr(struct nfp_net *nn, 
const u8 *mac)
  get_unaligned_be16(nn->netdev->dev_addr + 4) << 16);
 }
 
+static void nfp_net_vec_clear_ring_data(struct nfp_net *nn, unsigned int idx)
+{
+   nn_writeq(nn, NFP_NET_CFG_RXR_ADDR(idx), 0);
+   nn_writeb(nn, NFP_NET_CFG_RXR_SZ(idx), 0);
+   nn_writeb(nn, NFP_NET_CFG_RXR_VEC(idx), 0);
+
+   nn_writeq(nn, NFP_NET_CFG_TXR_ADDR(idx), 0);
+   nn_writeb(nn, NFP_NET_CFG_TXR_SZ(idx), 0);
+   nn_writeb(nn, NFP_NET_CFG_TXR_VEC(idx), 0);
+}
+
 /**
  * nfp_net_clear_config_and_disable() - Clear control BAR and disable NFP
  * @nn:  NFP Net device to reconfigure
@@ -1635,6 +1628,7 @@ static void nfp_net_write_mac_addr(struct nfp_net *nn, 
const u8 *mac)
 static void nfp_net_clear_config_and_disable(struct nfp_net *nn)
 {
u32 new_ctrl, update;
+   unsigned int r;
int err;
 
new_ctrl = nn->ctrl;
@@ -1656,9 +1650,26 @@ static void nfp_net_clear_config_and_disable(struct 
nfp_net *nn)
return;
}
 
+   for (r = 0; r < nn->num_r_vecs; r++)
+   nfp_net_vec_clear_ring_data(nn, r);
+
nn->ctrl = new_ctrl;
 }
 
+static void
+nfp_net_vec_write_ring_data(struct nfp_net *nn, struct nfp_net_r_vector *r_vec,
+   unsigned int idx)
+{
+   /* Write the DMA address, size and MSI-X info to the device */
+   nn_writeq(nn, NFP_NET_CFG_RXR_ADDR(idx), r_vec->rx_ring->dma);
+   nn_writeb(nn, NFP_NET_CFG_RXR_SZ(idx), ilog2(r_vec->rx_ring->cnt));
+   nn_writeb(nn, NFP_NET_CFG_RXR_VEC(idx), r_vec->irq_idx);
+
+   nn_writeq(nn, NFP_NET_CFG_TXR_ADDR(idx), r_vec->tx_ring->dma);
+   nn_writeb(nn, NFP_NET_CFG_TXR_SZ(idx), ilog2(r_vec->tx_ring->cnt));
+   nn_writeb(nn, NFP_NET_CFG_TXR_VEC(idx), r_vec->irq_idx);
+}
+
 /**
  * nfp_net_start_vec() - Start ring vector
  * @nn:  NFP Net device structure
@@ -1766,6 +1777,9 @@ static int nfp_net_netdev_open(struct net_device *netdev)
 * - Set the Freelist buffer size
 * - Enable the FW
 */
+   

[RFC (v3) 00/19] MTU changes and other fixes

2016-02-03 Thread Jakub Kicinski
Hi Dave,

I'm posting this as RFC because I think +625/-339 of mostly
code refactoring is not good for net; my intention is to
show you I did the homework and then repost this as two parts
- for net and for net-next.  In retrospect I could've just
asked you right away where you intend this series to go...

The first four patches are what you have already seen - those plus
number 5 I would be glad to see in net.

Patches 6-13 refactor open/stop paths to follow this:
open:
 - alloc;
 - dev/FW init;
 - stack init/start.
stop:
 - stack stop;
 - dev/FW down;
 - free.
That's quite a bit of code churn; I did my best to split
it up, but it is probably still not much fun to review.

Patch 14 splits the open/stop into chunks I can call later.

Patch 15 makes sure that FW start/stop operations are
reflected in SW state (which was not needed earlier since
we always did full down/up).

Patches 16 and 18 are trivial, split for readability.

Patch 17 does what you requested for MTU change:
 - alloc new resources;
 - stop dev;
 - try to start dev with new config;
 - if failed try with old config;
 - if failed die loudly.

Patch 19 does the same thing for ring resize.

I tested this with various error injection hacks and it 
seems quite solid.

Please let me know if 1-5/6-19 split makes sense to you or if
you prefer to take them all into one tree (and I should squash
#1 and #2 into proper rework (#17)).

Thanks!


Jakub Kicinski (19):
  nfp: return error if MTU change fails
  nfp: free buffers before changing MTU
  nfp: correct RX buffer length calculation
  nfp: fix RX buffer length validation
  nfp: don't trust netif_running() in debug code
  nfp: move link state interrupt request/free calls
  nfp: break up nfp_net_{alloc|free}_rings
  nfp: make *x_ring_init do all the init
  nfp: allocate ring SW structs dynamically
  nfp: cleanup tx ring flush and rename to reset
  nfp: reorganize initial filling of RX rings
  nfp: preallocate RX buffers early in .ndo_open
  nfp: move filling ring information to FW config
  nfp: slice .ndo_open() and .ndo_stop() up
  nfp: sync ring state during FW reconfiguration
  nfp: propagate list buffer size in struct rx_ring
  nfp: convert .ndo_change_mtu() to prepare/commit paradigm
  nfp: pass ring count as function parameter
  nfp: allow ring size reconfiguration at runtime

 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  10 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 920 ++---
 .../net/ethernet/netronome/nfp/nfp_net_debugfs.c   |   4 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  30 +-
 4 files changed, 625 insertions(+), 339 deletions(-)

-- 
1.9.1
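
The reconfiguration sequence listed for patch 17 (allocate new resources,
stop the device, try the new config, fall back to the old one, otherwise
die loudly) can be sketched as follows. This is a minimal user-space
sketch under stated assumptions: struct demo_dev, demo_start(),
demo_stop() and the max_mtu limit are hypothetical stand-ins, not the
driver's actual functions.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_cfg { unsigned int mtu; };

struct demo_dev {
	struct demo_cfg cfg;	/* configuration currently in use */
	int running;
	unsigned int max_mtu;	/* start fails above this (stands in for a FW limit) */
};

static void demo_stop(struct demo_dev *dev)
{
	dev->running = 0;
}

static int demo_start(struct demo_dev *dev, const struct demo_cfg *cfg)
{
	if (cfg->mtu > dev->max_mtu)
		return -1;
	dev->cfg = *cfg;
	dev->running = 1;
	return 0;
}

static int demo_change_mtu(struct demo_dev *dev, unsigned int new_mtu)
{
	struct demo_cfg old_cfg = dev->cfg;
	struct demo_cfg new_cfg = { .mtu = new_mtu };	/* "alloc new resources" */
	int err;

	demo_stop(dev);					/* stop dev */
	err = demo_start(dev, &new_cfg);		/* try new config */
	if (!err)
		return 0;

	if (demo_start(dev, &old_cfg))			/* fall back to old config */
		abort();				/* die loudly */
	return err;
}
```

Because the new configuration is fully prepared before the device is
stopped, a failed change leaves the device running with its old settings
and still reports the error to the caller.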



[RFC (v3) 04/19] nfp: fix RX buffer length validation

2016-02-03 Thread Jakub Kicinski
The meaning of the data_len and meta_len RX WB descriptor fields
depends slightly on whether rx_offset is dynamic or not.  For dynamic
offsets data_len includes meta_len.  This makes the code harder
to follow; in fact our RX buffer length check is incorrect -
we are comparing allocation length to data_len while we should
also account for meta_len.

Let's adjust the values of data_len and meta_len to their natural
meaning and simplify the logic.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Rolf Neugebauer 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 553ae64e2f7f..070645f9bc21 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1259,22 +1259,19 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, 
int budget)
 
meta_len = rxd->rxd.meta_len_dd & PCIE_DESC_RX_META_LEN_MASK;
data_len = le16_to_cpu(rxd->rxd.data_len);
+   /* For dynamic offset data_len includes meta_len, adjust */
+   if (nn->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC)
+   data_len -= meta_len;
+   else
+   meta_len = nn->rx_offset;
 
-   if (WARN_ON_ONCE(data_len > nn->fl_bufsz)) {
+   if (WARN_ON_ONCE(meta_len + data_len > nn->fl_bufsz)) {
dev_kfree_skb_any(skb);
continue;
}
 
-   if (nn->rx_offset == NFP_NET_CFG_RX_OFFSET_DYNAMIC) {
-   /* The packet data starts after the metadata */
-   skb_reserve(skb, meta_len);
-   } else {
-   /* The packet data starts at a fixed offset */
-   skb_reserve(skb, nn->rx_offset);
-   }
-
-   /* Adjust the SKB for the dynamic meta data pre-pended */
-   skb_put(skb, data_len - meta_len);
+   skb_reserve(skb, meta_len);
+   skb_put(skb, data_len);
 
nfp_net_set_hash(nn->netdev, skb, rxd);
 
-- 
1.9.1
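
The length normalization in the patch above can be illustrated outside the
driver. This is a rough sketch, assuming plain unsigned lengths;
demo_rx_lens_ok() and DEMO_RX_OFFSET_DYNAMIC are hypothetical names and the
marker value is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_RX_OFFSET_DYNAMIC 0	/* hypothetical marker for "dynamic offset" */

/*
 * Normalize the descriptor lengths so that meta_len is the number of bytes
 * before the packet and data_len is the packet itself, then check that both
 * fit in the buffer. Assumes data_len >= meta_len in the dynamic case.
 */
static bool demo_rx_lens_ok(unsigned int rx_offset, unsigned int fl_bufsz,
			    unsigned int *meta_len, unsigned int *data_len)
{
	if (rx_offset == DEMO_RX_OFFSET_DYNAMIC)
		*data_len -= *meta_len;	/* data_len included meta_len */
	else
		*meta_len = rx_offset;	/* data starts at a fixed offset */

	return *meta_len + *data_len <= fl_bufsz;
}
```

With a fixed offset of 32 and a 2048-byte buffer, a 2040-byte packet would
pass a check on data_len alone but fails once the offset is accounted for.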



[PATCH v2 2/6] net: pch_gbe: Mark Minnow PHY reset GPIO active low

2016-02-03 Thread Paul Burton
The Minnow PHY reset GPIO is set to 0 to enter reset & 1 to leave reset
- that is, it is an active low GPIO. In order to allow for the code to
be made more generic by further patches, indicate to the GPIO subsystem
that the GPIO is active low & invert the values it is set to such that
they reflect logically whether the device is being reset or not.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 3b98b263b..fde4c11 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -2717,7 +2717,8 @@ err_free_netdev:
  */
 static int pch_gbe_minnow_platform_init(struct pci_dev *pdev)
 {
-   unsigned long flags = GPIOF_DIR_OUT | GPIOF_INIT_HIGH | GPIOF_EXPORT;
+   unsigned long flags = GPIOF_DIR_OUT | GPIOF_INIT_LOW |
+   GPIOF_EXPORT | GPIOF_ACTIVE_LOW;
unsigned gpio = MINNOW_PHY_RESET_GPIO;
int ret;
 
@@ -2729,10 +2730,10 @@ static int pch_gbe_minnow_platform_init(struct pci_dev 
*pdev)
return ret;
}
 
-   gpio_set_value(gpio, 0);
-   usleep_range(1250, 1500);
gpio_set_value(gpio, 1);
usleep_range(1250, 1500);
+   gpio_set_value(gpio, 0);
+   usleep_range(1250, 1500);
 
return ret;
 }
-- 
2.7.0
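
The difference between the logical value ("is the device in reset?") and
the physical line level described in the commit message can be sketched as
below. This is an illustrative user-space model only; struct demo_gpio and
demo_gpio_set_logical() are hypothetical and are not the kernel GPIO API.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_gpio {
	bool active_low;	/* line is asserted when driven low */
	int raw;		/* physical line level, 0 or 1 */
};

/* value is logical: non-zero means "asserted" (here: PHY held in reset). */
static void demo_gpio_set_logical(struct demo_gpio *g, int value)
{
	bool asserted = value != 0;

	/* For an active-low line the physical level is the inverse. */
	g->raw = (asserted != g->active_low) ? 1 : 0;
}
```

With the line marked active low, writing 1 means "enter reset" and drives
the pin to 0, which is why the values in the reset sequence are swapped.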



[PATCH v3] net:Add sysctl_max_skb_frags

2016-02-03 Thread Hans Westgaard Ry
Devices may have limits on the number of fragments in an skb they support.
The current codebase uses a constant as the maximum number of fragments one
skb can hold and use.
When scatter/gather is enabled and traffic consists of many small messages,
the codebase uses the maximum number of fragments and may thereby violate
the maximum for certain devices.
This patch introduces a global variable as the maximum number of fragments.

Signed-off-by: Hans Westgaard Ry 
Reviewed-by: Håkon Bugge 

---
 include/linux/skbuff.h |  1 +
 net/core/skbuff.c  |  2 ++
 net/core/sysctl_net_core.c | 10 ++
 net/ipv4/tcp.c |  4 ++--
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4355129..fe47ad3 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -219,6 +219,7 @@ struct sk_buff;
 #else
 #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
 #endif
+extern int sysctl_max_skb_frags;
 
 typedef struct skb_frag_struct skb_frag_t;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 152b9c7..c336b97 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -79,6 +79,8 @@
 
 struct kmem_cache *skbuff_head_cache __read_mostly;
 static struct kmem_cache *skbuff_fclone_cache __read_mostly;
+int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
+EXPORT_SYMBOL(sysctl_max_skb_frags);
 
 /**
  * skb_panic - private function for out-of-line support
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 95b6139..a6beb7b 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -26,6 +26,7 @@ static int zero = 0;
 static int one = 1;
 static int min_sndbuf = SOCK_MIN_SNDBUF;
 static int min_rcvbuf = SOCK_MIN_RCVBUF;
+static int max_skb_frags = MAX_SKB_FRAGS;
 
 static int net_msg_warn;   /* Unused, but still a sysctl */
 
@@ -392,6 +393,15 @@ static struct ctl_table net_core_table[] = {
.mode   = 0644,
.proc_handler   = proc_dointvec
},
+   {
+   .procname   = "max_skb_frags",
+   .data   = &sysctl_max_skb_frags,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &one,
+   .extra2 = &max_skb_frags,
+   },
{ }
 };
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c82cca1..3dc7a2fd 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -938,7 +938,7 @@ new_segment:
 
i = skb_shinfo(skb)->nr_frags;
can_coalesce = skb_can_coalesce(skb, i, page, offset);
-   if (!can_coalesce && i >= MAX_SKB_FRAGS) {
+   if (!can_coalesce && i >= sysctl_max_skb_frags) {
tcp_mark_push(tp, skb);
goto new_segment;
}
@@ -1211,7 +1211,7 @@ new_segment:
 
if (!skb_can_coalesce(skb, i, pfrag->page,
  pfrag->offset)) {
-   if (i == MAX_SKB_FRAGS || !sg) {
+   if (i == sysctl_max_skb_frags || !sg) {
tcp_mark_push(tp, skb);
goto new_segment;
}
-- 
2.4.3
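
How a range-checked tunable changes the "start a new segment" decision can
be sketched in user space. This is a minimal sketch, assuming a 4096-byte
page size and a lower bound of 1; the demo_* names are hypothetical and the
setter only imitates a min/max-checked integer write.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE		4096	/* assumption for illustration */
#define DEMO_MAX_SKB_FRAGS	(65536 / DEMO_PAGE_SIZE + 1)

/* Runtime limit, initialized to the compile-time maximum. */
static int demo_max_skb_frags = DEMO_MAX_SKB_FRAGS;

/* Reject values outside [1, DEMO_MAX_SKB_FRAGS]; keep the old value then. */
static int demo_set_max_skb_frags(int val)
{
	if (val < 1 || val > DEMO_MAX_SKB_FRAGS)
		return -1;
	demo_max_skb_frags = val;
	return 0;
}

/* Mirrors the first tcp.c hunk: no coalescing and the frag limit reached. */
static int demo_needs_new_segment(int nr_frags, int can_coalesce)
{
	return !can_coalesce && nr_frags >= demo_max_skb_frags;
}
```

Lowering the limit to 8 makes the sender close an skb after eight
fragments, while the compile-time constant still caps how high the limit
can be raised.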



Re: [PATCH net-next v5 2/2] virtio_net: add ethtool support for set and get of settings

2016-02-03 Thread Nikolay Aleksandrov
On 02/03/2016 04:04 AM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov 
> 
> This patch allows the user to set and retrieve speed and duplex of the
> virtio_net device via ethtool. Having this functionality is very helpful
> for simulating different environments and also enables the virtio_net
> device to participate in operations where proper speed and duplex are
> required (e.g. currently bonding lacp mode requires full duplex). Custom
> speed and duplex are not allowed, the user-supplied settings are validated
> before applying.
> 
> Example:
> $ ethtool eth1
> Settings for eth1:
> ...
>   Speed: Unknown!
>   Duplex: Unknown! (255)
> $ ethtool -s eth1 speed 1000 duplex full
> $ ethtool eth1
> Settings for eth1:
> ...
>   Speed: 1000Mb/s
>   Duplex: Full
> 
> Based on a patch by Roopa Prabhu.
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---
> v2: use the new ethtool speed/duplex validation functions and allow half
> duplex to be set
> v3: return error if the user tries to change anything besides speed/duplex
> as per Michael's comment
> We have to zero-out advertising as it gets set automatically by ethtool if
> setting speed and duplex together.
> v4: Set port type to PORT_OTHER
> v5: null diff1.port because we set cmd->port now and ethtool returns it in
> the set request, retested all cases
> 

Hmm, nulling the advertising and ->port completely ignores them, i.e. won't
produce an error if the user actually specified a different value for either
of them. We can check if the ->port matches what we returned, but there's no
fix for advertising. I'm leaving both ignored for now, please let me know if
you'd prefer otherwise.

Thanks,
 Nik



RE: bnx2x commits needed to use 7.51.10 firmware?

2016-02-03 Thread Ariel Elior
+Yuval

> -----Original Message-----
> From: Dan Streetman [mailto:dan.street...@canonical.com]
> Sent: Wednesday, February 03, 2016 2:20 AM
> To: Ariel Elior 
> Cc: netdev 
> Subject: bnx2x commits needed to use 7.51.10 firmware?
> 
> Hi Ariel,
> 
> I'm trying to update the bnx2x driver in Ubuntu trusty (3.13 kernel)
> release to use the 7.51.10 firmware; can you help me determine which
> commits need to be backported?
> 
> Some reference is in Launchpad bug 1454286:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1454286
> 
> basically, there are 87 commits between the current bnx2x driver level
> in Ubuntu trusty's kernel, and commit e42780b where the bnx2x driver
> is actually updated to use the 7.51.10 firmware.  Can you provide
> guidance to which commits should be pulled back?  Do all 87 of them
> need to be included?
> 
> Thanks!


Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe

2016-02-03 Thread John Fastabend
On 16-02-03 02:11 AM, Amir Vadai wrote:
> On Wed, Feb 03, 2016 at 01:27:32AM -0800, John Fastabend wrote:
>> This extends the setup_tc framework so it can support more than just
>> the mqprio offload and push other classifiers and qdiscs into the
>> hardware. The series here targets the u32 classifier and ixgbe
>> driver. I worked out the u32 classifier because it is protocol
>> oblivious and aligns with multiple hardware devices I have access
>> to. I did an initial implementation on ixgbe because (a) I have one
>> in my box (b) it's a stable driver and (c) it is relatively simple
>> compared to the other devices I have here but still has enough
>> flexibility to exercise the features of cls_u32.
>>
>> I intentionally limited the scope of this series to the basic
>> feature set. Specifically this uses a 'big hammer' feature bit
>> to do the offload or not. If the bit is set you get offloaded rules
>> if it is not then rules will not be offloaded. If we can agree on
>> this patch series there are some more patches on my queue we can
>> talk about to make the offload decision per rule using flags similar
>> to how we do l2 mac updates. Additionally the error strategy can
>> be improved to be hard aborting, log and continue, etc. I think
>> these are nice to have improvements but shouldn't block this series.
>>
>> Also by adding get_parse_graph and set_parse_graph attributes as
>> in my previous flow_api work we can build programmable devices
>> and programmatically learn when rules can or can not be loaded
>> into the hardware. Again future work.
>>
>> Any comments/feedback appreciated.
>>
>> Thanks,
>> John
>>
>> ---
>>
>> John Fastabend (7):
>>   net: rework ndo tc op to consume additional qdisc handle parameter
>>   net: rework setup_tc ndo op to consume general tc operand
>>   net: sched: add cls_u32 offload hooks for netdevs
>>   net: add tc offload feature flag
>>   net: tc: helper functions to query action types
>>   net: ixgbe: add minimal parser details for ixgbe
>>   net: ixgbe: add support for tc_u32 offload
>>
> 
> Hi John,
> 
> Nice work :)

Thanks, we will need at least a v2 to fixup some build errors
with various compile flags caught by build_bot and missed by me.

> 
> I will add mlx5 support, and see if can live with u32. If not - will
> add flower support too.

That would be great.

Thanks
.John

> 
> Amir
> 



Re: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-03 Thread Hannes Frederic Sowa

On 03.02.2016 12:36, Simon McVittie wrote:
> On 02/02/16 17:34, David Herrmann wrote:
>> Furthermore, with this patch in place, a process better not pass any
>> file-descriptors to an untrusted process.
>> ...
>> Did anyone notify the dbus maintainers of this? They
>> might wanna document this, if not already done (CC: smcv).
>
> Sorry, I'm not clear from this message on what patterns I should be
> documenting as bad, and what the effect of non-compliance would be.
>
> dbus-daemon has a fd-passing feature, which uses AF_UNIX sockets'
> existing ability to pass fds to let users of D-Bus attach fds to their
> messages. The message is passed from the sending client to dbus-daemon,
> then from dbus-daemon to the recipient:
>
>          AF_UNIX          AF_UNIX
>             |                |
>     sender ---> dbus-daemon ---> recipient
>             |                |
>
> This has been API since before I was a D-Bus maintainer, so I have no
> influence over its existence; just like the kernel doesn't want to break
> user-space, dbus-daemon doesn't want to break its users.
>
> The system dbus-daemon (dbus-daemon --system) is a privilege boundary,
> and accepts senders and recipients with differing privileges. Without
> configuration, messages are denied by default. Recipients can open this
> up (by installing system-wide configuration) to allow arbitrary
> processes to send messages to them, so that they can carry out their own
> discretionary access control. Since 2014, the system dbus-daemon accepts
> up to 16 file descriptors per message by default.
>
> There is also a session or user dbus-daemon (dbus-daemon --session) per
> uid, but that isn't normally a privilege boundary, so any user trying to
> carry out a denial of service there is only hurting themselves.
>
> Am I right in saying that the advice I give to D-Bus users should be
> something like this?
>
> * system services should not send fds at all, unless they trust the
>   dbus-daemon
> * system services should not send fds via D-Bus that will be delivered
>   to recipients that they do not trust
> * sending fds to an untrusted recipient would enable that recipient to
>   carry out a denial-of-service attack (on what? the sender? the
>   dbus-daemon?)



The described behavior was simply a bug in the referenced patch. I 
already posted a follow-up to change this behavior so that only the 
current sending process is credited with the number of fds in flight:




Other processes (in this case the original opener of the file) are not
credited anymore if they do not send the fd themselves.


That said, I don't think you need to change anything or give different 
advice because of this thread.


Thanks,
Hannes



Re: [PATCH v2] unix: properly account for FDs passed over unix sockets

2016-02-03 Thread David Herrmann
Hi

On Wed, Feb 3, 2016 at 12:36 PM, Simon McVittie
 wrote:
> Am I right in saying that the advice I give to D-Bus users should be
> something like this?
>
> * system services should not send fds at all, unless they trust the
>   dbus-daemon
> * system services should not send fds via D-Bus that will be delivered
>   to recipients that they do not trust
> * sending fds to an untrusted recipient would enable that recipient to
>   carry out a denial-of-service attack (on what? the sender? the
>   dbus-daemon?)

With the revised patch from Hannes, this should no longer be needed.
My original concern was only about accounting inflight-fds on the
file-owner, rather than the sender.

However, with Hannes' revised patch, a different DoS attack against
dbus-daemon is possible. Imagine a peer that receives batches of FDs,
but never dequeues them. They will be accounted on the inflight-limit
of dbus-daemon, as such causing messages of independent peers to be
rejected in case they carry FDs.
Preferably, dbus-daemon would avoid queuing more than 16 FDs on a
single destination (total). But that would require POLLOUT to be
capped by the number of queued fds. A possible workaround is to add
CAP_SYS_RESOURCE to dbus-daemon.

Thanks
David


[PATCH v2 6/6] net: pch_gbe: Allow longer for resets

2016-02-03 Thread Paul Burton
Resets of the EG20T MAC on the MIPS Boston development board take longer
than the 1000 loops that pch_gbe_wait_clr_bit was performing. Bump up
the number of loops.

Signed-off-by: Paul Burton 

---

Changes in v2: None

 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 00ef83c..87994d2 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -321,7 +321,7 @@ static void pch_gbe_wait_clr_bit(void *reg, u32 bit)
u32 tmp;
 
/* wait busy */
-   tmp = 1000;
+   tmp = 1;
while ((ioread32(reg) & bit) && --tmp)
cpu_relax();
if (!tmp)
-- 
2.7.0
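
The bounded busy-wait that this patch lengthens can be modelled in user
space to show how the loop budget decides between success and timeout.
This is a rough sketch: demo_wait_clr_bit() and demo_tick() are
hypothetical, the callback stands in for cpu_relax() plus the hardware
making progress, and the loop counts are arbitrary illustration values.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated hardware: clears the register after this many polls. */
static uint32_t demo_polls_left;

static void demo_tick(volatile uint32_t *reg)
{
	if (demo_polls_left && --demo_polls_left == 0)
		*reg = 0;
}

/*
 * Poll until the bit clears or the loop budget runs out.
 * Returns the remaining budget; 0 means the wait timed out.
 */
static uint32_t demo_wait_clr_bit(volatile uint32_t *reg, uint32_t bit,
				  uint32_t max_loops,
				  void (*tick)(volatile uint32_t *reg))
{
	uint32_t tmp = max_loops;

	while ((*reg & bit) && --tmp)
		tick(reg);
	return tmp;
}
```

If the simulated reset needs 1500 polls, a budget of 1000 reports a
timeout while a budget of 2000 succeeds, which is the kind of failure the
patch addresses by raising the loop count.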



[PATCH v2 4/6] net: pch_gbe: Always reset PHY along with MAC

2016-02-03 Thread Paul Burton
On the MIPS Boston development board, the EG20T MAC does not report
receiving the RX clock from the (RGMII) RTL8211E PHY unless the PHY is
reset at the same time as the MAC. Since the pch_gbe driver resets the
MAC a number of times - twice during probe, and when taking down the
network interface - we need to reset the PHY at all the same times. Do
that from pch_gbe_mac_reset_hw which is used to reset the MAC in all
cases.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 23d28f0..824ff9e 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -378,10 +378,13 @@ static void pch_gbe_mac_reset_hw(struct pch_gbe_hw *hw)
 {
/* Read the MAC address. and store to the private data */
pch_gbe_mac_read_mac_addr(hw);
+   pch_gbe_phy_set_reset(hw, 1);
iowrite32(PCH_GBE_ALL_RST, &hw->reg->RESET);
 #ifdef PCH_GBE_MAC_IFOP_RGMII
iowrite32(PCH_GBE_MODE_GMII_ETHER, &hw->reg->MODE);
 #endif
+   pch_gbe_phy_set_reset(hw, 0);
+   usleep_range(1250, 1500);
pch_gbe_wait_clr_bit(&hw->reg->RESET, PCH_GBE_ALL_RST);
/* Setup the receive addresses */
pch_gbe_mac_mar_set(hw, hw->mac.addr, 0);
-- 
2.7.0



[PATCH v2 3/6] net: pch_gbe: Pull PHY GPIO handling out of Minnow code

2016-02-03 Thread Paul Burton
The MIPS Boston development board uses the Intel EG20T Platform
Controller Hub, including its gigabit ethernet controller, and requires
that its RTL8211E PHY be reset much like the Minnow platform. Pull the
PHY reset GPIO handling out of Minnow-specific code such that it can be
shared by later patches.

Signed-off-by: Paul Burton 
---

Changes in v2: None

 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h|  4 ++-
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   | 33 +++---
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
index 2a55d6d..884f90b 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
@@ -582,15 +582,17 @@ struct pch_gbe_hw_stats {
 
 /**
  * struct pch_gbe_privdata - PCI Device ID driver data
+ * @phy_reset_gpio:PHY reset GPIO descriptor.
  * @phy_tx_clk_delay:  Bool, configure the PHY TX delay in software
  * @phy_disable_hibernate: Bool, disable PHY hibernation
  * @platform_init: Platform initialization callback, called from
  * probe, prior to PHY initialization.
  */
 struct pch_gbe_privdata {
+   struct gpio_desc *phy_reset_gpio;
bool phy_tx_clk_delay;
bool phy_disable_hibernate;
-   int (*platform_init)(struct pci_dev *pdev);
+   int (*platform_init)(struct pci_dev *, struct pch_gbe_privdata *);
 };
 
 /**
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index fde4c11..23d28f0 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -360,6 +360,16 @@ static void pch_gbe_mac_mar_set(struct pch_gbe_hw *hw, u8 
* addr, u32 index)
pch_gbe_wait_clr_bit(&hw->reg->ADDR_MASK, PCH_GBE_BUSY);
 }
 
+static void pch_gbe_phy_set_reset(struct pch_gbe_hw *hw, int value)
+{
+   struct pch_gbe_adapter *adapter = pch_gbe_hw_to_adapter(hw);
+
+   if (!adapter->pdata || !adapter->pdata->phy_reset_gpio)
+   return;
+
+   gpiod_set_value(adapter->pdata->phy_reset_gpio, value);
+}
+
 /**
  * pch_gbe_mac_reset_hw - Reset hardware
  * @hw:Pointer to the HW structure
@@ -2627,7 +2637,14 @@ static int pch_gbe_probe(struct pci_dev *pdev,
adapter->hw.reg = pcim_iomap_table(pdev)[PCH_GBE_PCI_BAR];
adapter->pdata = (struct pch_gbe_privdata *)pci_id->driver_data;
if (adapter->pdata && adapter->pdata->platform_init)
-   adapter->pdata->platform_init(pdev);
+   adapter->pdata->platform_init(pdev, pdata);
+
+   if (adapter->pdata && adapter->pdata->phy_reset_gpio) {
+   pch_gbe_phy_set_reset(&adapter->hw, 1);
+   usleep_range(1250, 1500);
+   pch_gbe_phy_set_reset(&adapter->hw, 0);
+   usleep_range(1250, 1500);
+   }
 
adapter->ptp_pdev = pci_get_bus_and_slot(adapter->pdev->bus->number,
   PCI_DEVFN(12, 4));
@@ -2715,7 +2732,8 @@ err_free_netdev:
 /* The AR803X PHY on the MinnowBoard requires a physical pin to be toggled to
  * ensure it is awake for probe and init. Request the line and reset the PHY.
  */
-static int pch_gbe_minnow_platform_init(struct pci_dev *pdev)
+static int pch_gbe_minnow_platform_init(struct pci_dev *pdev,
+   struct pch_gbe_privdata *pdata)
 {
unsigned long flags = GPIOF_DIR_OUT | GPIOF_INIT_LOW |
GPIOF_EXPORT | GPIOF_ACTIVE_LOW;
@@ -2724,16 +2742,11 @@ static int pch_gbe_minnow_platform_init(struct pci_dev 
*pdev)
 
ret = devm_gpio_request_one(&pdev->dev, gpio, flags,
"minnow_phy_reset");
-   if (ret) {
+   if (!ret)
+   pdata->phy_reset_gpio = gpio_to_desc(gpio);
+   else
dev_err(&pdev->dev,
"ERR: Can't request PHY reset GPIO line '%d'\n", gpio);
-   return ret;
-   }
-
-   gpio_set_value(gpio, 1);
-   usleep_range(1250, 1500);
-   gpio_set_value(gpio, 0);
-   usleep_range(1250, 1500);
 
return ret;
 }
-- 
2.7.0



[PATCH v2 5/6] net: pch_gbe: Add device tree support

2016-02-03 Thread Paul Burton
Introduce support for retrieving the PHY reset GPIO from device tree,
which will be used on the MIPS Boston development board. This requires
support for probe deferral in order to work correctly, since the order
of device probe is not guaranteed & typically the EG20T GPIO controller
device will be probed after the ethernet MAC.

Signed-off-by: Paul Burton 
---

Changes in v2:
- Tidy up handling of parsing private data, drop err_out.

 .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 824ff9e..00ef83c 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -23,6 +23,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #define DRV_VERSION "1.01"
 const char pch_driver_version[] = DRV_VERSION;
@@ -2594,13 +2596,39 @@ static void pch_gbe_remove(struct pci_dev *pdev)
free_netdev(netdev);
 }
 
+static struct pch_gbe_privdata *
+pch_gbe_get_priv(struct pci_dev *pdev, const struct pci_device_id *pci_id)
+{
+   struct pch_gbe_privdata *pdata;
+   struct gpio_desc *gpio;
+
+   if (!config_enabled(CONFIG_OF))
+   return (struct pch_gbe_privdata *)pci_id->driver_data;
+
+   pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL);
+   if (!pdata)
+   return ERR_PTR(-ENOMEM);
+
+   gpio = devm_gpiod_get(&pdev->dev, "phy-reset", GPIOD_ASIS);
+   if (IS_ERR(gpio))
+   return ERR_PTR(PTR_ERR(gpio));
+   pdata->phy_reset_gpio = gpio;
+
+   return pdata;
+}
+
 static int pch_gbe_probe(struct pci_dev *pdev,
  const struct pci_device_id *pci_id)
 {
struct net_device *netdev;
struct pch_gbe_adapter *adapter;
+   struct pch_gbe_privdata *pdata;
int ret;
 
+   pdata = pch_gbe_get_priv(pdev, pci_id);
+   if (IS_ERR(pdata))
+   return PTR_ERR(pdata);
+
ret = pcim_enable_device(pdev);
if (ret)
return ret;
@@ -2638,7 +2666,7 @@ static int pch_gbe_probe(struct pci_dev *pdev,
adapter->pdev = pdev;
adapter->hw.back = adapter;
adapter->hw.reg = pcim_iomap_table(pdev)[PCH_GBE_PCI_BAR];
-   adapter->pdata = (struct pch_gbe_privdata *)pci_id->driver_data;
+   adapter->pdata = pdata;
if (adapter->pdata && adapter->pdata->platform_init)
adapter->pdata->platform_init(pdev, pdata);
 
-- 
2.7.0
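
A note on the error handling in the patch above: pch_gbe_get_priv() returns either a valid pointer or an errno value encoded in a pointer, and pch_gbe_probe() forwards that errno to its caller, which is how a deferral reported by the GPIO lookup would reach the driver core. The following is a minimal userspace sketch of that pattern, not kernel code. All model_* names are hypothetical stand-ins (for ERR_PTR(), PTR_ERR(), IS_ERR() and the two driver functions), and the constants 4095 and 517 are assumed to mirror the kernel's MAX_ERRNO and EPROBE_DEFER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MAX_ERRNO and EPROBE_DEFER values. */
#define MODEL_MAX_ERRNO    4095
#define MODEL_EPROBE_DEFER 517

/* Hypothetical stand-in for ERR_PTR(): encode a negative errno in a pointer. */
static void *model_err_ptr(long err)
{
	assert(err < 0 && err >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Hypothetical stand-in for PTR_ERR(): recover the errno from the pointer. */
static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical stand-in for IS_ERR(): the top 4095 addresses encode errors. */
static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

struct model_priv {
	void *phy_reset_gpio;
};

/* Models pch_gbe_get_priv(): forward a failed GPIO lookup as an error pointer. */
static struct model_priv *model_get_priv(struct model_priv *storage,
					 void *gpio_lookup_result)
{
	if (model_is_err(gpio_lookup_result))
		return model_err_ptr(model_ptr_err(gpio_lookup_result));

	storage->phy_reset_gpio = gpio_lookup_result;
	return storage;
}

/* Models the start of pch_gbe_probe(): bail out with the encoded errno. */
static int model_probe(void *gpio_lookup_result)
{
	struct model_priv storage = { NULL };
	struct model_priv *pdata = model_get_priv(&storage, gpio_lookup_result);

	if (model_is_err(pdata))
		return (int)model_ptr_err(pdata);

	return 0;
}
```

Calling model_probe() with model_err_ptr(-MODEL_EPROBE_DEFER) returns -517, while any ordinary pointer yields 0. The intermediate ERR_PTR(PTR_ERR(...)) round trip is kept here only because the patch writes it that way.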



[PATCH net-next] bonding: 3ad: apply ad_actor settings changes immediately

2016-02-03 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

Currently bonding allows ad_actor_system and ad_actor_sys_prio to be set
while the bond device is down, but the new values are actually applied only
if there aren't any slaves yet (they are applied to the bond device when the
first slave shows up, and to slaves at 3ad bind time). After this patch,
changes are applied immediately and the new values can be used and seen once
the bond is brought up, so it is no longer necessary to release all slaves
and enslave them again to see the changes.

CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
Signed-off-by: Nikolay Aleksandrov 
---
 drivers/net/bonding/bond_3ad.c | 40 +++---
 drivers/net/bonding/bond_options.c |  4 
 include/net/bond_3ad.h |  1 +
 3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 4cbb8b27a891..ee94056dbb2e 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -357,6 +357,14 @@ static u8 __get_duplex(struct port *port)
return retval;
 }
 
+static void __ad_actor_update_port(struct port *port)
+{
+   const struct bonding *bond = bond_get_bond_by_slave(port->slave);
+
+   port->actor_system = BOND_AD_INFO(bond).system.sys_mac_addr;
+   port->actor_system_priority = BOND_AD_INFO(bond).system.sys_priority;
+}
+
 /* Conversions */
 
 /**
@@ -1963,9 +1971,7 @@ void bond_3ad_bind_slave(struct slave *slave)
port->actor_admin_port_key = bond->params.ad_user_port_key << 6;
ad_update_actor_keys(port, false);
/* actor system is the bond's system */
-   port->actor_system = BOND_AD_INFO(bond).system.sys_mac_addr;
-   port->actor_system_priority =
-   BOND_AD_INFO(bond).system.sys_priority;
+   __ad_actor_update_port(port);
/* tx timer(to verify that no more than MAX_TX_IN_SECOND
 * lacpdu's are sent in one second)
 */
@@ -2148,6 +2154,34 @@ out:
 }
 
 /**
+ * bond_3ad_update_ad_actor_settings - reflect change of actor settings to ports
+ * @bond: bonding struct to work on
+ *
+ * If an ad_actor setting gets changed we need to update the individual port
+ * settings so the bond device will use the new values when it gets upped.
+ */
+void bond_3ad_update_ad_actor_settings(struct bonding *bond)
+{
+   struct list_head *iter;
+   struct slave *slave;
+
+   ASSERT_RTNL();
+
+   BOND_AD_INFO(bond).system.sys_priority = bond->params.ad_actor_sys_prio;
+   if (is_zero_ether_addr(bond->params.ad_actor_system))
+   BOND_AD_INFO(bond).system.sys_mac_addr =
+   *((struct mac_addr *)bond->dev->dev_addr);
+   else
+   BOND_AD_INFO(bond).system.sys_mac_addr =
+   *((struct mac_addr *)bond->params.ad_actor_system);
+
+   spin_lock_bh(&bond->mode_lock);
+   bond_for_each_slave(bond, slave, iter)
+   __ad_actor_update_port(&(SLAVE_AD_INFO(slave)->port));
+   spin_unlock_bh(&bond->mode_lock);
+}
+
+/**
  * bond_3ad_state_machine_handler - handle state machines timeout
  * @bond: bonding struct to work on
  *
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 55e93b6b6d21..ed0bdae64f5e 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -1392,6 +1392,8 @@ static int bond_option_ad_actor_sys_prio_set(struct bonding *bond,
newval->value);
 
bond->params.ad_actor_sys_prio = newval->value;
+   bond_3ad_update_ad_actor_settings(bond);
+
return 0;
 }
 
@@ -1418,6 +1420,8 @@ static int bond_option_ad_actor_system_set(struct bonding *bond,
 
netdev_info(bond->dev, "Setting ad_actor_system to %pM\n", mac);
ether_addr_copy(bond->params.ad_actor_system, mac);
+   bond_3ad_update_ad_actor_settings(bond);
+
return 0;
 
 err:
diff --git a/include/net/bond_3ad.h b/include/net/bond_3ad.h
index f1fbc3b11962..f358ad5e4214 100644
--- a/include/net/bond_3ad.h
+++ b/include/net/bond_3ad.h
@@ -306,5 +306,6 @@ int bond_3ad_lacpdu_recv(const struct sk_buff *skb, struct bonding *bond,
 struct slave *slave);
 int bond_3ad_set_carrier(struct bonding *bond);
 void bond_3ad_update_lacp_rate(struct bonding *bond);
+void bond_3ad_update_ad_actor_settings(struct bonding *bond);
 #endif /* _NET_BOND_3AD_H */
 
-- 
2.4.3
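
The update logic in bond_3ad_update_ad_actor_settings() above can be sketched in plain userspace C: pick the configured actor system address unless it is all zeros, in which case fall back to the device address, then copy the address and priority to every port. This is only an illustrative model. The model_* names and the flat port array are assumptions made for the example, and locking and the RTNL assertion are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

/* Hypothetical, reduced view of the per-port actor fields. */
struct model_port {
	uint8_t  actor_system[MODEL_ETH_ALEN];
	uint16_t actor_system_priority;
};

/* True when every byte of the address is zero. */
static bool model_mac_is_zero(const uint8_t *mac)
{
	static const uint8_t zero[MODEL_ETH_ALEN];

	return memcmp(mac, zero, MODEL_ETH_ALEN) == 0;
}

/*
 * Choose the system address (configured value, or the device address when
 * nothing is configured) and push it, with the priority, to all ports.
 */
static void model_update_actor_settings(struct model_port *ports, size_t n,
					const uint8_t *configured_mac,
					const uint8_t *dev_mac,
					uint16_t sys_prio)
{
	const uint8_t *sys_mac =
		model_mac_is_zero(configured_mac) ? dev_mac : configured_mac;

	for (size_t i = 0; i < n; i++) {
		memcpy(ports[i].actor_system, sys_mac, MODEL_ETH_ALEN);
		ports[i].actor_system_priority = sys_prio;
	}
}
```

Because the selection and the per-port copy happen in one call, running it from each option setter is enough for existing ports to pick up a change without being released and enslaved again.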



Re: [PATCH] flowi: add concept of "not_oif"

2016-02-03 Thread Julian Anastasov

Hello,

On Wed, 3 Feb 2016, Jason A. Donenfeld wrote:

> This patch simply adds support for specifying a "not_oif" device in
> flowi4 and flowi6 lookups, that will find a matching route that _isn't_
> via the specified device.

If you check every flowi4_oif user you will notice
that some places can not fulfil this requirement:

- fib_select_path -> fib_select_multipath

Other places like fib_select_default are called
for flowi4_oif=0 and there are no other checks for flowi4_oif
but they will be needed for the new field.

I don't know about the particular problems with
tunnels but the scripts can use the route metric to order
the routes in a table. Your patch looks simple but misses
a dozen of problems. The first breakage should be from the
missing initialization of this new field because the flowi
structure is not initialized at some places. Random
stack can lead to skipped routes. If this feature has
fans, you have to check all places that use flowi4_oif and
flowi6_oif.

Regards

--
Julian Anastasov 


Re: [PATCH] of: of_mdio: Add marvell,88e1145 to whitelist of PHY compatibilities.

2016-02-03 Thread Aaro Koskinen
Hi,

On Wed, Feb 03, 2016 at 09:08:57PM +0100, Andrew Lunn wrote:
> On Wed, Feb 03, 2016 at 09:35:29PM +0200, Aaro Koskinen wrote:
> > Commit ae461131960b ("of: of_mdio: Add a whitelist of PHY
> > compatibilities.") missed one compatible string used in in-tree DTBs:
> > in OCTEON, for selected boards, the kernel DTB pruning code will overwrite
> > the DTB compatible string with "marvell,88e1145", which is missing
> > from the whitelist. Add it.
> 
> Does this overwriting means this compatibility is not visible in the
> current DTS files? Or did i miss it?

Yeah, it happens in arch/mips/cavium-octeon/octeon-platform.c:

if (octeon_has_88e1145()) {
fdt_nop_property(initial_boot_params, phy, "marvell,reg-init");
memset(new_name, 0, sizeof(new_name));
strcpy(new_name, "marvell,88e1145");

It took a while for me to figure out this as well... Nasty.

> At least for the Marvell SoCs i intend to submit a patch removing
> these compatible strings from the DTS files. Will you do the same for
> the OCTEON boards?

Yes, for in-tree OCTEON DTS files, I can do the update; the above strcpy
needs to be deleted at the same time, and this needs to go through the MIPS tree.

A.


[PATCH net 4/4] net: phy: bcm7xxx: Make MII_BCM7XX_64CLK_MDIO naming consistent

2016-02-03 Thread Florian Fainelli
The driver is BCM7xxx, we were missing an additional X in the constant naming,
fix that to be consistent.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/bcm7xxx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index 568768abe3ed..0666f54ceeb5 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -24,7 +24,7 @@
 #define MII_BCM7XXX_100TX_FALSE_CAR0x13
 #define MII_BCM7XXX_100TX_DISC 0x14
 #define MII_BCM7XXX_AUX_MODE   0x1d
-#define  MII_BCM7XX_64CLK_MDIO BIT(12)
+#define  MII_BCM7XXX_64CLK_MDIOBIT(12)
 #define MII_BCM7XXX_TEST   0x1f
 #define  MII_BCM7XXX_SHD_MODE_2BIT(2)
 
@@ -247,7 +247,7 @@ static int bcm7xxx_config_init(struct phy_device *phydev)
int ret;
 
/* Enable 64 clock MDIO */
-   phy_write(phydev, MII_BCM7XXX_AUX_MODE, MII_BCM7XX_64CLK_MDIO);
+   phy_write(phydev, MII_BCM7XXX_AUX_MODE, MII_BCM7XXX_64CLK_MDIO);
phy_read(phydev, MII_BCM7XXX_AUX_MODE);
 
/* set shadow mode 2 */
-- 
2.1.0



[PATCH net 1/4] net: phy: bcm7xxx: Fix shadow mode 2 disabling

2016-02-03 Thread Florian Fainelli
The clear and set masks in the call to phy_set_clr_bits() made from
bcm7xxx_config_init() are inverted. Fix this by swapping the two
arguments, that is, set 0 bits, but clear the shadow mode 2 enable bit.

Fixes: b560a58c45c66 ("net: phy: add Broadcom BCM7xxx internal PHY driver")
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/bcm7xxx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index bf241a3ec5e5..234a28502793 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -270,7 +270,7 @@ static int bcm7xxx_config_init(struct phy_device *phydev)
phy_write(phydev, MII_BCM7XXX_100TX_FALSE_CAR, 0x7555);
 
/* reset shadow mode 2 */
-   ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, MII_BCM7XXX_SHD_MODE_2, 0);
+   ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST, 0, MII_BCM7XXX_SHD_MODE_2);
if (ret < 0)
return ret;
 
-- 
2.1.0
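
To illustrate why the argument order matters in the fix above, here is a small userspace model of a set/clear read-modify-write helper. It is a sketch under stated assumptions, not the driver's code: model_set_clr_bits() is a hypothetical stand-in that operates on a plain value instead of an MDIO register, and the clear-then-set ordering inside it is assumed. Only the argument order (set mask first, clear mask second) and the BIT(2) value of the shadow mode 2 bit are taken from the patches in this thread.

```c
#include <stdint.h>

/* Mirrors MII_BCM7XXX_SHD_MODE_2, which the driver defines as BIT(2). */
#define MODEL_SHD_MODE_2 (1u << 2)

/* Clear the bits in clr_mask, then set the bits in set_mask. */
static uint16_t model_set_clr_bits(uint16_t reg, uint16_t set_mask,
				   uint16_t clr_mask)
{
	reg &= (uint16_t)~clr_mask;
	reg |= set_mask;
	return reg;
}

/* Argument order before the fix: sets the bit and clears nothing. */
static uint16_t model_old_call(uint16_t reg)
{
	return model_set_clr_bits(reg, MODEL_SHD_MODE_2, 0);
}

/* Argument order after the fix: sets nothing and clears the bit. */
static uint16_t model_new_call(uint16_t reg)
{
	return model_set_clr_bits(reg, 0, MODEL_SHD_MODE_2);
}
```

Starting from a value with the shadow mode 2 bit set, model_old_call() leaves the bit set, so shadow mode would stay enabled, while model_new_call() clears it and leaves the other bits untouched.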



[PATCH net 2/4] net: phy: bcm7xxx: Fix 40nm EPHY features

2016-02-03 Thread Florian Fainelli
The PHY entries for BCM7425/29/35 declare the 40nm Ethernet PHY as being
10/100/1000 capable, while this is just a 10/100 capable PHY device, fix that.

Fixes: d068b02cfdfc2 ("net: phy: add BCM7425 and BCM7429 PHYs")
Fixes: 9458ceab4917 ("net: phy: bcm7xxx: Add entry for BCM7435")
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/bcm7xxx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index 234a28502793..524806dd0f6b 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -337,7 +337,7 @@ static struct phy_driver bcm7xxx_driver[] = {
.phy_id = PHY_ID_BCM7425,
.phy_id_mask= 0xfff0,
.name   = "Broadcom BCM7425",
-   .features   = PHY_GBIT_FEATURES |
+   .features   = PHY_BASIC_FEATURES |
  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
.flags  = PHY_IS_INTERNAL,
.config_init= bcm7xxx_config_init,
@@ -349,7 +349,7 @@ static struct phy_driver bcm7xxx_driver[] = {
.phy_id = PHY_ID_BCM7429,
.phy_id_mask= 0xfff0,
.name   = "Broadcom BCM7429",
-   .features   = PHY_GBIT_FEATURES |
+   .features   = PHY_BASIC_FEATURES |
  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
.flags  = PHY_IS_INTERNAL,
.config_init= bcm7xxx_config_init,
@@ -361,7 +361,7 @@ static struct phy_driver bcm7xxx_driver[] = {
.phy_id = PHY_ID_BCM7435,
.phy_id_mask= 0xfff0,
.name   = "Broadcom BCM7435",
-   .features   = PHY_GBIT_FEATURES |
+   .features   = PHY_BASIC_FEATURES |
  SUPPORTED_Pause | SUPPORTED_Asym_Pause,
.flags  = PHY_IS_INTERNAL,
.config_init= bcm7xxx_config_init,
-- 
2.1.0



[PATCH net 0/4] net: phy: bcm7xxx 40nm PHY fixes

2016-02-03 Thread Florian Fainelli
Hi David,

Here is a collection of fixes for the 40nm Ethernet PHY supported
by the 7xxx PHY driver, please also queue these fixes for stable.

Let me know if you think patch 4 is too much of a cleanup to be taken
as a fix.

Thanks!

Florian Fainelli (4):
  net: phy: bcm7xxx: Fix shadow mode 2 disabling
  net: phy: bcm7xxx: Fix 40nm EPHY features
  net: phy: bcm7xxx: Fix bcm7xxx_config_init() check
  net: phy: bcm7xxx: Make MII_BCM7XX_64CLK_MDIO naming consistent

 drivers/net/phy/bcm7xxx.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

-- 
2.1.0



Re: [PATCH] flowi: add concept of "not_oif"

2016-02-03 Thread Jason A. Donenfeld
Hi Julian,

Thanks a lot for your review. Much appreciated.

On Wed, Feb 3, 2016 at 9:42 PM, Julian Anastasov  wrote:
> If you check every flowi4_oif user you will notice
> that some places can not fulfil this requirement:
> - fib_select_path -> fib_select_multipath
> Other places like fib_select_default are called
> for flowi4_oif=0 and there are no other checks for flowi4_oif
> but they will be needed for the new field.
> fans, you have to check all places that use flowi4_oif and
> flowi6_oif.
> missing initialization of this new field

Thanks for pointing these out. I will take this into account and send
an updated patch [series].



> I don't know about the particular problems with
> tunnels but the scripts can use the route metric to order
> the routes in a table.

This unfortunately does not cut it with tunnels.


Jason


[PATCH] net: ethernet: davicom: fix devicetree irq resource

2016-02-03 Thread Robert Jarzmik
The dm9000 driver doesn't work in at least one device-tree
configuration, printing an error message about the irq resource:
[1.062495] dm9000 800.ethernet: insufficient resources
[1.068439] dm9000 800.ethernet: not found (-2).
[1.073451] dm9000: probe of 800.ethernet failed with error -2

The reason is that the interrupt might be provided by a gpio
controller that has not yet been probed when dm9000 is probed, so the
probe deferral mechanism needs to apply.

Currently, the interrupt is directly taken from resources. This patch
changes this to use the more generic platform_get_irq(), which handles
the deferral.

Moreover, since commit 7085a7401ba5 ("drivers: platform: parse
IRQ flags from resources"), the interrupt trigger flags are honored in
platform_get_irq(), so remove the needless code in dm9000.

Signed-off-by: Robert Jarzmik 
Acked-by: Marcel Ziswiler 
---
 drivers/net/ethernet/davicom/dm9000.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
index cf94b72dbacd..6c527bde9edb 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -128,7 +128,6 @@ struct board_info {
struct resource *data_res;
struct resource *addr_req;   /* resources requested */
struct resource *data_req;
-   struct resource *irq_res;
 
int  irq_wake;
 
@@ -1300,18 +1299,14 @@ static int
 dm9000_open(struct net_device *dev)
 {
struct board_info *db = netdev_priv(dev);
-   unsigned long irqflags = db->irq_res->flags & IRQF_TRIGGER_MASK;
+   unsigned long irqflags = 0;
 
if (netif_msg_ifup(db))
dev_dbg(db->dev, "enabling %s\n", dev->name);
 
-   /* If there is no IRQ type specified, default to something that
-* may work, and tell the user that this is a problem */
-
-   if (irqflags == IRQF_TRIGGER_NONE)
-   irqflags = irq_get_trigger_type(dev->irq);
-
-   if (irqflags == IRQF_TRIGGER_NONE)
+   /* If there is no IRQ type specified, tell the user that this is a
+* problem */
+   if (irq_get_trigger_type(dev->irq) == IRQF_TRIGGER_NONE)
dev_warn(db->dev, "WARNING: no IRQ resource flags set.\n");
 
irqflags |= IRQF_SHARED;
@@ -1500,15 +1495,21 @@ dm9000_probe(struct platform_device *pdev)
 
db->addr_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
db->data_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
-   db->irq_res  = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
 
-   if (db->addr_res == NULL || db->data_res == NULL ||
-   db->irq_res == NULL) {
-   dev_err(db->dev, "insufficient resources\n");
+   if (db->addr_res == NULL || db->data_res == NULL) {
+   dev_err(db->dev, "insufficient resources addr=%p data=%p\n",
+   db->addr_res, db->data_res);
ret = -ENOENT;
goto out;
}
 
+   ndev->irq = platform_get_irq(pdev, 0);
+   if (ndev->irq <= 0) {
+   dev_err(db->dev, "interrupt resource unavailable: %d\n",
+   ndev->irq);
+   return ndev->irq;
+   }
+
db->irq_wake = platform_get_irq(pdev, 1);
if (db->irq_wake >= 0) {
dev_dbg(db->dev, "wakeup irq %d\n", db->irq_wake);
@@ -1570,7 +1571,6 @@ dm9000_probe(struct platform_device *pdev)
 
/* fill in parameters for net-dev structure */
ndev->base_addr = (unsigned long)db->io_addr;
-   ndev->irq   = db->irq_res->start;
 
/* ensure at least we have a default set of IO routines */
dm9000_set_io(db, iosize);
-- 
2.1.4



Re: Keystone 2 boards boot failure

2016-02-03 Thread Arnd Bergmann
On Wednesday 03 February 2016 18:31:00 Grygorii Strashko wrote:
> On 02/03/2016 06:20 PM, Arnd Bergmann wrote:
> > On Wednesday 03 February 2016 16:21:05 Grygorii Strashko wrote:
> >> On 02/03/2016 04:11 PM, Franklin S Cooper Jr. wrote:
> >>> On 02/02/2016 07:19 PM, Franklin S Cooper Jr. wrote:
> >
> > This looks wrong: I was getting the build warnings originally
> > because of 64-bit dma_addr_t, and that should be the only way that
> > this driver can operate, because in some configurations on keystone
> > there is no memory below 4GB, and there is no dma-ranges property
> > in the DT that shifts around the start of the DMA addresses.
> 
> see keystone.dtsi:
>   soc {
>   #address-cells = <1>;
>   #size-cells = <1>;
>   compatible = "ti,keystone","simple-bus";
>   interrupt-parent = <>;
>   ranges = <0x0 0x0 0x0 0xc000>;
>   dma-ranges = <0x8000 0x8 0x 0x8000>;
>   ^^^

You are right, I totally missed it when I looked again. I thought it
was correct but then couldn't find it in the dts.

> config:
> 
> CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
> CONFIG_PHYS_ADDR_T_64BIT=y
> 
> and
> 
> #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT <--- should not be defined for KS2
> typedef u64 dma_addr_t;
> #else
> typedef u32 dma_addr_t;
> #endif
> 
> Above is valid configuration for Keystone 2 with LPAE=y

Ok, but what do you mean by "should not be defined"? It clearly is
defined in any multiplatform configuration that enables another platform
needing 64-bit dma_addr_t.


Arnd


Re: Keystone 2 boards boot failure

2016-02-03 Thread Arnd Bergmann
On Wednesday 03 February 2016 11:41:40 Murali Karicheri wrote:
> > 
> > This looks wrong: I was getting the build warnings originally
> > because of 64-bit dma_addr_t, and that should be the only way that
> > this driver can operate, because in some configurations on keystone
> > there is no memory below 4GB, and there is no dma-ranges property
> > in the DT that shifts around the start of the DMA addresses.
> Arnd,
> 
> Why do you think so? I see in arch/arm/boot/dts/keystone.dtsi
> 
> soc {
> #address-cells = <1>;
> #size-cells = <1>;
> compatible = "ti,keystone","simple-bus";
> interrupt-parent = <>;
> ranges = <0x0 0x0 0x0 0xc000>;
> dma-ranges = <0x8000 0x8 0x 0x8000>;
> 
> AFAIK, On Keystone, dma address is 32 bit and Physical DDR address is
> 64 bit (actually 36 bit, LPAE address). The conversion happens based on
> pfn_offset which is calculated based on the above dma-range property.

My mistake, see my other reply.
 


Arnd
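
The translation described in the quoted text (a 32-bit DMA address derived from a physical address above 4GB through the offset implied by dma-ranges) can be sketched as follows. This is a byte-granular userspace model, whereas the thread says the real conversion works through pfn_offset. The model_* names are hypothetical, and the range values in the usage note are illustrative assumptions, since the dma-ranges cells in the quoted snippet are truncated in this archive.

```c
#include <stdbool.h>
#include <stdint.h>

/* One dma-ranges style window: bus (DMA) base, CPU physical base, size. */
struct model_dma_range {
	uint64_t dma_base;
	uint64_t cpu_base;
	uint64_t size;
};

/*
 * Translate a CPU physical address into a 32-bit DMA address.
 * Returns false when the address lies outside the window or when the
 * resulting bus address does not fit in 32 bits.
 */
static bool model_phys_to_dma32(const struct model_dma_range *r,
				uint64_t phys, uint32_t *dma)
{
	uint64_t bus;

	if (phys < r->cpu_base || phys - r->cpu_base >= r->size)
		return false;

	bus = r->dma_base + (phys - r->cpu_base);
	if (bus > UINT32_MAX)
		return false;

	*dma = (uint32_t)bus;
	return true;
}
```

With an assumed window of DMA base 0x80000000, CPU base 0x800000000 and size 0x80000000, the physical address 0x800001000 maps to the DMA address 0x80001000, which is why a 32-bit dma_addr_t can suffice even when all memory sits above 4GB.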


[PATCH net 3/4] net: phy: bcm7xxx: Fix bcm7xxx_config_init() check

2016-02-03 Thread Florian Fainelli
Since we were wrongly advertising gigabit features for these 10/100 only
Ethernet PHYs, bcm7xxx_config_init(), which is supposed to apply workarounds,
would not have run because the check would be true. Now that we have fixed the
PHY features, remove that check since it has no reason to be there anymore.

Fixes: e18556ee3bd83 ("net: phy: bcm7xxx: do not use PHY_BRCM_100MBPS_WAR")
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/bcm7xxx.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index 524806dd0f6b..568768abe3ed 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -250,10 +250,6 @@ static int bcm7xxx_config_init(struct phy_device *phydev)
phy_write(phydev, MII_BCM7XXX_AUX_MODE, MII_BCM7XX_64CLK_MDIO);
phy_read(phydev, MII_BCM7XXX_AUX_MODE);
 
-   /* Workaround only required for 100Mbits/sec capable PHYs */
-   if (phydev->supported & PHY_GBIT_FEATURES)
-   return 0;
-
/* set shadow mode 2 */
ret = phy_set_clr_bits(phydev, MII_BCM7XXX_TEST,
MII_BCM7XXX_SHD_MODE_2, MII_BCM7XXX_SHD_MODE_2);
-- 
2.1.0



Re: [PATCH net-next] sxgbe: remove unused code

2016-02-03 Thread Shuah Khan
On 02/03/2016 01:11 PM, Jεan Sacren wrote:
> From: Jean Sacren 
> 
> With the introduction of this commit 1edb9ca69e8a
> ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver"),
> the following interface was added:
> 
>   int sxgbe_xpcs_init(struct net_device *ndev);
>   int sxgbe_xpcs_init_1G(struct net_device *ndev);
> 
> But those two functions have never been used since the inception.
> 
> In addition to the interface, the following macros are defined in
> sxgbe_xpcs header but not used:
> 
>   SR_MII_MMD_AN_ADV
>   SR_MII_MMD_AN_LINK_PARTNER_BA
>   VR_MII_MMD_AN_CONTROL
>   VR_MII_MMD_AN_INT_STATUS
>   XPCS_TYPE_SEL_R
>   XPCS_TYPE_SEL_W
>   XPCS_RXAUI_MODE
> 
> If we remove the interface, functions it uses and all other macros
> defined in sxgbe_xpcs header will also become useless. Thus, the whole
> sxgbe_xpcs shebang should be all gone.
> 
> Julia Lawall observed:
> 
> "...  I was looking at dependencies between networking files.  This one
> stands out because nothing is dependent[ ]on it, even the files it is
> compiled with, and it doesn't contain the usual functions,
> register_netdev, etc."
> 
> David Miller commented:
> 
> "There are no in-tree callers of this code.  It should be removed until
> there are in-tree users.
> 
> Nobody can figure out if the interface for this is done properly without
> seeing the call sites and how they work.  It is therefore impossible to
> review this code and judge it[']s design."
> 
> Let's remove this unused code. As a matter of fact, it should not have
> been merged in the first place.

ok - I don't think all of this belongs in the change log.
You can just say "removing unused code" and include a link
to the discussion. You could also add a Suggested-by tag for
David Miller.

thanks,
-- Shuah
> 
> Reported-by: Julia Lawall 
> Signed-off-by: Jean Sacren 
> Cc: Byungho An 
> Cc: Girish K S 
> ---
> We may use "--ignore FILE_PATH_CHANGES" to suppress checkpatch warning.
> 
>  drivers/net/ethernet/samsung/sxgbe/Makefile |  2 +-
>  drivers/net/ethernet/samsung/sxgbe/sxgbe_xpcs.c | 91 
> -
>  drivers/net/ethernet/samsung/sxgbe/sxgbe_xpcs.h | 38 ---
>  3 files changed, 1 insertion(+), 130 deletions(-)
>  delete mode 100644 drivers/net/ethernet/samsung/sxgbe/sxgbe_xpcs.c
>  delete mode 100644 drivers/net/ethernet/samsung/sxgbe/sxgbe_xpcs.h
> 
> diff --git a/drivers/net/ethernet/samsung/sxgbe/Makefile b/drivers/net/ethernet/samsung/sxgbe/Makefile
> index dcc80b9d4370..31e968561d5c 100644
> --- a/drivers/net/ethernet/samsung/sxgbe/Makefile
> +++ b/drivers/net/ethernet/samsung/sxgbe/Makefile
> @@ -1,4 +1,4 @@
>  obj-$(CONFIG_SXGBE_ETH) += samsung-sxgbe.o
>  samsung-sxgbe-objs:= sxgbe_platform.o sxgbe_main.o sxgbe_desc.o \
>   sxgbe_dma.o sxgbe_core.o sxgbe_mtl.o  sxgbe_mdio.o \
> - sxgbe_ethtool.o sxgbe_xpcs.o $(samsung-sxgbe-y)
> + sxgbe_ethtool.o $(samsung-sxgbe-y)
> diff --git a/drivers/net/ethernet/samsung/sxgbe/sxgbe_xpcs.c b/drivers/net/ethernet/samsung/sxgbe/sxgbe_xpcs.c
> deleted file mode 100644
> index 51c32194ba88..
> --- a/drivers/net/ethernet/samsung/sxgbe/sxgbe_xpcs.c
> +++ /dev/null
> @@ -1,91 +0,0 @@
> -/* 10G controller driver for Samsung SoCs
> - *
> - * Copyright (C) 2013 Samsung Electronics Co., Ltd.
> - *   http://www.samsung.com
> - *
> - * Author: Siva Reddy Kallam 
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - */
> -#include 
> -#include 
> -#include 
> -#include 
> -#include "sxgbe_common.h"
> -#include "sxgbe_xpcs.h"
> -
> -static int sxgbe_xpcs_read(struct net_device *ndev, unsigned int reg)
> -{
> - u32 value;
> - struct sxgbe_priv_data *priv = netdev_priv(ndev);
> -
> - value = readl(priv->ioaddr + XPCS_OFFSET + reg);
> -
> - return value;
> -}
> -
> -static int sxgbe_xpcs_write(struct net_device *ndev, int reg, int data)
> -{
> - struct sxgbe_priv_data *priv = netdev_priv(ndev);
> -
> - writel(data, priv->ioaddr + XPCS_OFFSET + reg);
> -
> - return 0;
> -}
> -
> -int sxgbe_xpcs_init(struct net_device *ndev)
> -{
> - u32 value;
> -
> - value = sxgbe_xpcs_read(ndev, SR_PCS_MMD_CONTROL1);
> - /* 10G XAUI mode */
> - sxgbe_xpcs_write(ndev, SR_PCS_CONTROL2, XPCS_TYPE_SEL_X);
> - sxgbe_xpcs_write(ndev, VR_PCS_MMD_XAUI_MODE_CONTROL, XPCS_XAUI_MODE);
> - sxgbe_xpcs_write(ndev, VR_PCS_MMD_XAUI_MODE_CONTROL, value | BIT(13));
> - sxgbe_xpcs_write(ndev, SR_PCS_MMD_CONTROL1, value | BIT(11));
> -
> - do {
> - value = sxgbe_xpcs_read(ndev, VR_PCS_MMD_DIGITAL_STATUS);
> - } while ((value & XPCS_QSEQ_STATE_MPLLOFF) == XPCS_QSEQ_STATE_STABLE);
> -
> - 

Re: [PATCH] of: of_mdio: Add marvell,88e1145 to whitelist of PHY compatibilities.

2016-02-03 Thread Andrew Lunn
> The compatibility strings may be present in deployed firmware, they
> cannot be removed.  For many OCTEON boards, the device tree is a
> firmware-kernel ABI, it is not practical to unilaterally decide to
> change the bindings on the kernel side as you don't control the
> firmware.

Hi David

We are keeping backwards compatibility. The kernel has always ignored
this string, and will continue to always ignore this string. But since
it is being ignored, you may as well remove it in future versions of
the DTB.

Andrew


Re: [PATCH] of: of_mdio: Add marvell,88e1145 to whitelist of PHY compatibilities.

2016-02-03 Thread David Daney

On 02/03/2016 12:08 PM, Andrew Lunn wrote:
> On Wed, Feb 03, 2016 at 09:35:29PM +0200, Aaro Koskinen wrote:
>> Commit ae461131960b ("of: of_mdio: Add a whitelist of PHY
>> compatibilities.") missed one compatible string used in in-tree DTBs:
>> in OCTEON, for selected boards, the kernel DTB pruning code will overwrite
>> the DTB compatible string with "marvell,88e1145", which is missing
>> from the whitelist. Add it.
>
> Does this overwriting means this compatibility is not visible in the
> current DTS files? Or did i miss it?
>
> At least for the Marvell SoCs i intend to submit a patch removing
> these compatible strings from the DTS files. Will you do the same for
> the OCTEON boards?



The compatibility strings may be present in deployed firmware, they 
cannot be removed.  For many OCTEON boards, the device tree is a 
firmware-kernel ABI, it is not practical to unilaterally decide to 
change the bindings on the kernel side as you don't control the firmware.


David Daney




>> The patch fixes broken networking on EdgeRouter Lite.
>>
>> Fixes: ae461131960b ("of: of_mdio: Add a whitelist of PHY compatibilities.")
>> Signed-off-by: Aaro Koskinen
>
> Reviewed-by: Andrew Lunn
>
> Thanks
> Andrew
>
>> ---
>>   drivers/of/of_mdio.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
>> index 5648317..39c4be4 100644
>> --- a/drivers/of/of_mdio.c
>> +++ b/drivers/of/of_mdio.c
>> @@ -154,6 +154,7 @@ static const struct of_device_id whitelist_phys[] = {
>> { .compatible = "marvell,88E", },
>> { .compatible = "marvell,88e1116", },
>> { .compatible = "marvell,88e1118", },
>> +   { .compatible = "marvell,88e1145", },
>> { .compatible = "marvell,88e1149r", },
>> { .compatible = "marvell,88e1310", },
>> { .compatible = "marvell,88E1510", },
>> --
>> 2.4.0




