[PATCH net] ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output

2018-12-06 Thread Shmulik Ladkani
In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
initialization.

Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection 
with lwtunnels")
Signed-off-by: Shmulik Ladkani 

---
Suggesting this fix, spotted during code review while experimenting
with SRv6, although I haven't encountered a specific issue during
the experiments.

Was there any genuine intention to actually keep 'fl6' uninitialized?
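For reference, a minimal sketch (illustrative only, not the actual seg6 code)
of why leaving the struct uninitialized matters: ip6_route_output() consults
more flowi6 members than the three that seg6_output() fills in (e.g.
flowi6_oif, flowi6_mark and the sport/dport union), so without the memset
those fields feed stack garbage into the FIB/rule lookup:

    struct flowi6 fl6;
    struct dst_entry *dst;

    memset(&fl6, 0, sizeof(fl6));   /* start from a known-clean state */
    fl6.daddr = hdr->daddr;
    fl6.saddr = hdr->saddr;
    fl6.flowlabel = ip6_flowinfo(hdr);

    dst = ip6_route_output(net, NULL, &fl6);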
---

 net/ipv6/seg6_iptunnel.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index a8854dd3e9c5..8181ee7e1e27 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -347,6 +347,7 @@ static int seg6_output(struct net *net, struct sock *sk, 
struct sk_buff *skb)
struct ipv6hdr *hdr = ipv6_hdr(skb);
struct flowi6 fl6;
 
+   memset(&fl6, 0, sizeof(fl6));
fl6.daddr = hdr->daddr;
fl6.saddr = hdr->saddr;
fl6.flowlabel = ip6_flowinfo(hdr);
-- 
2.19.1


Re: [PATCH 0/3]: net: dsa: mt7530: support MT7530 in the MT7621 SoC

2018-12-06 Thread Greg Ungerer

Hi John,

On 4/12/18 12:02 am, John Crispin wrote:

On 03/12/2018 15:00, René van Dorst wrote:

Quoting Bjørn Mork :

Greg Ungerer  writes:


The following change helped a lot, but I still get some problems under
sustained load and with some types of port setups. Still trying to figure
out what exactly is going on.

--- a/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1750,8 +1750,8 @@ static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
   if (likely(napi_schedule_prep(&eth->rx_napi))) {
    __napi_schedule(&eth->rx_napi);
-   mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
    }
+   mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
   return IRQ_HANDLED;
 }
@@ -1762,11 +1762,53 @@ static irqreturn_t mtk_handle_irq_tx(int irq, void 
*_eth)
   if (likely(napi_schedule_prep(&eth->tx_napi))) {
    __napi_schedule(&eth->tx_napi);
-   mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
    }
+   mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
   return IRQ_HANDLED;
 }


Yes, sorry I didn't point to that as well.  Just to be clear:  I have no
clue how this thing is actually wired up, or if you could use three
interrupts on the MT7621 too. I just messed with it until I got
something to work, based on Renés original idea and code.


My idea is just a copy of mtk_handle_irq_{rx,tx}, see [1].
You probably want to look at the staging driver, or the Ubiquiti source with a
3.10.x kernel [2], or padavan with a 3.4.x kernel [3].
AFAIK the mt7621 only has 1 IRQ for the ethernet part.


Correct, there is only a single IRQ on mt7621.


One of the main differences I see between the mainline mtk_eth_soc.c
and the older mediatek/openwrt driver is that the older driver uses
the PDMA module for TX transmission, while the mainline uses the
QDMA module. I have no documentation on what the differences
are between the two (or why there are even two DMA engines in there).
Can you shed any light on that?

I did a quick and dirty recode of the QDMA transmission parts of
the mainline driver code to use the PDMA engine instead. The most
immediate result is that it suffers the same IP header checksum
problem on TX packets :-(  But it also still suffers from the
same occasional TX watchdog timeout I see with only the mainline
driver and basic support of MT7621.

What I see with the TX watchdog timeouts is that there are valid
TX descriptors, but the frame engine is just not processing them.
It seems to be just sitting there idle. The CTX and DTX registers
look valid and consistent with the local last_free/next_free
pointers.

Regards
Greg



Re: [PATCH rdma-next 0/3] Packet based credit mode

2018-12-06 Thread Leon Romanovsky
On Thu, Dec 06, 2018 at 08:27:06PM -0700, Jason Gunthorpe wrote:
> On Fri, Nov 30, 2018 at 01:22:03PM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky 
> >
> > From Danit,
> >
> > Packet based credit mode is an alternative end-to-end credit mode for QPs
> > set during their creation. Credits are transported from the responder
> > to the requester to optimize the use of its receive resources.
> > In packet-based credit mode, credits are issued on a per packet basis.
> >
> > The advantage of this feature comes while sending large RDMA messages
> > through switches that are short in memory.
> >
> > The first commit exposes QP creation flag and the HCA capability. The
> > second commit adds support for a new DV QP creation flag. The last
> > commit report packet based credit mode capability via the MLX5DV device
> > capabilities.
> >
> > Thanks
> >
> > Danit Goldberg (3):
> >   net/mlx5: Expose packet based credit mode
> >   IB/mlx5: Add packet based credit mode support
> >   IB/mlx5: Report packet based credit mode device capability
>
> This looks fine to me, can you update the shared branch please

Done, thanks
3fd3c80acc17 net/mlx5: Expose packet based credit mode

>
> Thanks,
> Jason




Re: [PATCH bpf] selftests/bpf: add missing pointer dereference for map stacktrace fixup

2018-12-06 Thread Prashant Bhole




On 12/7/2018 1:14 PM, Stanislav Fomichev wrote:

I get a segfault without it, other fixups always do dereference, and
without dereference I don't understand how it can ever work.

Fixes: 7c85c448e7d74 ("selftests/bpf: test_verifier, check
bpf_map_lookup_elem access in bpf prog")

Signed-off-by: Stanislav Fomichev 
---
  tools/testing/selftests/bpf/test_verifier.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index df6f751cc1e8..d23929a1985d 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -14166,7 +14166,7 @@ static void do_test_fixup(struct bpf_test *test, enum 
bpf_map_type prog_type,
do {
prog[*fixup_map_stacktrace].imm = map_fds[12];
fixup_map_stacktrace++;
-   } while (fixup_map_stacktrace);
+   } while (*fixup_map_stacktrace);
}
  }


It was my mistake. Thanks for the fix!

-Prashant



[PATCH bpf] selftests/bpf: add missing pointer dereference for map stacktrace fixup

2018-12-06 Thread Stanislav Fomichev
I get a segfault without it, other fixups always do dereference, and
without dereference I don't understand how it can ever work.
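For readers unfamiliar with the fixup convention, a simplified sketch
(illustrative, not the exact selftest code) of why the loop has to test the
dereferenced offset rather than the array pointer, which is always non-NULL:

    #include <linux/bpf.h>

    /* Each test carries an array of instruction offsets, terminated by 0,
     * whose immediates must be patched with a real map fd before loading.
     */
    static void fixup_map(struct bpf_insn *prog, int *fixup, int map_fd)
    {
            if (*fixup) {                   /* the array may be empty */
                    do {
                            prog[*fixup].imm = map_fd;
                            fixup++;
                    } while (*fixup);       /* stop at the 0 terminator */
            }
    }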

Fixes: 7c85c448e7d74 ("selftests/bpf: test_verifier, check
bpf_map_lookup_elem access in bpf prog")

Signed-off-by: Stanislav Fomichev 
---
 tools/testing/selftests/bpf/test_verifier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_verifier.c 
b/tools/testing/selftests/bpf/test_verifier.c
index df6f751cc1e8..d23929a1985d 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -14166,7 +14166,7 @@ static void do_test_fixup(struct bpf_test *test, enum 
bpf_map_type prog_type,
do {
prog[*fixup_map_stacktrace].imm = map_fds[12];
fixup_map_stacktrace++;
-   } while (fixup_map_stacktrace);
+   } while (*fixup_map_stacktrace);
}
 }
 
-- 
2.20.0.rc2.403.gdbc3b29805-goog



Re: [RFC PATCH 4/6] dt-bindings: update mvneta binding document

2018-12-06 Thread Kishon Vijay Abraham I
Hi Russell,

On 05/12/18 9:00 PM, Rob Herring wrote:
> On Wed, Dec 5, 2018 at 5:00 AM Russell King - ARM Linux
>  wrote:
>>
>> On Mon, Dec 03, 2018 at 05:54:55PM -0600, Rob Herring wrote:
>>> On Mon, Nov 12, 2018 at 12:31:02PM +, Russell King wrote:
 Signed-off-by: Russell King 
>>>
>>> Needs a better subject and a commit msg.
>>
>> Hmm, not sure why it didn't contain:
>>
>> "dt-bindings: net: mvneta: add phys property
>>
>> Add an optional phys property to the mvneta binding documentation for
>> the common phy.
>>
>> Signed-off-by: Russell King "
>>
>> as the commit message.  With the correct commit message, are you happy
>> with it?
> 
> Yes.
> 
> Reviewed-by: Rob Herring 

Are you planning to resend this series?

Thanks
Kishon


Re: [PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"

2018-12-06 Thread Benjamin Herrenschmidt
On Fri, 2018-12-07 at 14:20 +1100, Benjamin Herrenschmidt wrote:
>

Apologies for the empty email, not too sure what happened, I did a
resend and the second one worked.

Cheers
Ben.



[PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"

2018-12-06 Thread Benjamin Herrenschmidt
This reverts commit 624ca9c33c8a853a4a589836e310d776620f4ab9.

This commit is completely bogus. The STACR register has two formats, old
and new, depending on the version of the IP block used. There's a pair of
device-tree properties that can be used to specify the format used:

has-inverted-stacr-oc
has-new-stacr-staopc

What this commit did was to change the bit definition used with the old
parts to match the new parts. This of course breaks the driver on all
the old ones.

Instead, the author should have set the appropriate properties in the
device-tree for the variant used on his board.

Signed-off-by: Benjamin Herrenschmidt 
---

Found while setting up some old ppc440 boxes for test/CI

 drivers/net/ethernet/ibm/emac/emac.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/emac/emac.h 
b/drivers/net/ethernet/ibm/emac/emac.h
index e2f80cc..0d2de6f 100644
--- a/drivers/net/ethernet/ibm/emac/emac.h
+++ b/drivers/net/ethernet/ibm/emac/emac.h
@@ -231,7 +231,7 @@ struct emac_regs {
 #define EMAC_STACR_PHYE0x4000
 #define EMAC_STACR_STAC_MASK   0x3000
 #define EMAC_STACR_STAC_READ   0x1000
-#define EMAC_STACR_STAC_WRITE  0x0800
+#define EMAC_STACR_STAC_WRITE  0x2000
 #define EMAC_STACR_OPBC_MASK   0x0C00
 #define EMAC_STACR_OPBC_50 0x
 #define EMAC_STACR_OPBC_66 0x0400
-- 
2.7.4




Re: [PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"

2018-12-06 Thread David Miller


Looks like your posting was empty?


Re: [PATCH net-next] neighbour: Improve garbage collection

2018-12-06 Thread David Miller
From: David Ahern 
Date: Thu,  6 Dec 2018 14:38:44 -0800

> The existing garbage collection algorithm has a number of problems:

Thanks for working on this!

I totally agree with what you are doing, especially the separate
gc_list.

But why do you need the on_gc_list boolean state?  That's equivalent
to "!list_empty(>gc_list)" and seems redundant.


Re: [PATCH net-next 3/4] net: use indirect call wrapper at GRO transport layer

2018-12-06 Thread kbuild test robot
Hi Paolo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Paolo-Abeni/net-mitigate-retpoline-overhead/20181206-183400
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/ipv6/ip6_offload.o: In function `ipv6_gro_receive':
>> ip6_offload.c:(.text+0x58e): undefined reference to `transport6_gro_receive1'
   ip6_offload.c:(.text+0x595): undefined reference to `transport6_gro_receive1'
   net/ipv6/ip6_offload.o: In function `ipv6_gro_complete':
>> ip6_offload.c:(.text+0xb26): undefined reference to 
>> `transport6_gro_complete1'
   ip6_offload.c:(.text+0xb30): undefined reference to 
`transport6_gro_complete1'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[PATCH bpf] selftests/bpf: skip sockmap tests on kernels without support

2018-12-06 Thread Stanislav Fomichev
Include "autoconf.h" header in the test_maps.c and selectively
disable test_sockmap if CONFIG_BPF_STREAM_PARSER is not specified
in the kernel config.
When building out of tree/without autoconf.h, fall back to 'enabled'.

Signed-off-by: Stanislav Fomichev 
---
 tools/testing/selftests/bpf/test_maps.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_maps.c 
b/tools/testing/selftests/bpf/test_maps.c
index 4db2116e52be..3548de8a78ac 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -32,6 +32,13 @@
 #define ENOTSUPP 524
 #endif
 
+#ifdef HAVE_GENHDR
+# include "autoconf.h"
+#else
+/* fallback to all features enabled */
+# define CONFIG_BPF_STREAM_PARSER 1
+#endif
+
 static int map_flags;
 
 #define CHECK(condition, tag, format...) ({\
@@ -588,6 +595,7 @@ static void test_stackmap(int task, void *data)
close(fd);
 }
 
+#ifdef CONFIG_BPF_STREAM_PARSER
 #include 
 #include 
 #include 
@@ -1079,6 +1087,7 @@ static void test_sockmap(int tasks, void *data)
close(fd);
exit(1);
 }
+#endif
 
 #define MAP_SIZE (32 * 1024)
 
@@ -1541,7 +1550,9 @@ static void run_all_tests(void)
test_arraymap_percpu_many_keys();
 
test_devmap(0, NULL);
+#ifdef CONFIG_BPF_STREAM_PARSER
test_sockmap(0, NULL);
+#endif
 
test_map_large();
test_map_parallel();
-- 
2.20.0.rc2.403.gdbc3b29805-goog



Re: [PATCH rdma-next 0/3] Packet based credit mode

2018-12-06 Thread Jason Gunthorpe
On Fri, Nov 30, 2018 at 01:22:03PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> From Danit,
> 
> Packet based credit mode is an alternative end-to-end credit mode for QPs
> set during their creation. Credits are transported from the responder
> to the requester to optimize the use of its receive resources.
> In packet-based credit mode, credits are issued on a per packet basis.
> 
> The advantage of this feature comes while sending large RDMA messages
> through switches that are short in memory.
> 
> The first commit exposes QP creation flag and the HCA capability. The
> second commit adds support for a new DV QP creation flag. The last
> commit reports packet based credit mode capability via the MLX5DV device
> capabilities.
> 
> Thanks
> 
> Danit Goldberg (3):
>   net/mlx5: Expose packet based credit mode
>   IB/mlx5: Add packet based credit mode support
>   IB/mlx5: Report packet based credit mode device capability

This looks fine to me, can you update the shared branch please

Thanks,
Jason


[PATCH] Revert "net/ibm/emac: wrong bit is used for STA control"

2018-12-06 Thread Benjamin Herrenschmidt



Re: [PATCH RFC 5/6] net: dsa: microchip: Update tag_ksz.c to access switch driver

2018-12-06 Thread Richard Cochran
On Thu, Dec 06, 2018 at 08:00:26PM +, tristram...@microchip.com wrote:
> A customer has already inquired about implementing 1588 PTP in the DSA 
> driver.  I hope
> this mechanism is approved so that I can start doing that.

If you need changes to the PTP core, you had better discuss this with
the PTP maintainer.

Thanks,
Richard


[net-next PATCH RFC 8/8] veth: xdp_frames redirected into veth need to transfer xdp_mem_info

2018-12-06 Thread Jesper Dangaard Brouer
XDP frames redirected into a veth device that chooses XDP_PASS end up
creating an SKB from the xdp_frame.  The xdp_frame mem info needs to be
transferred into the SKB.

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: Ilias Apalodimas 
---
 drivers/net/veth.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f412ea1cef18..925d300402ca 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -555,6 +555,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
goto err;
}
 
+   skb->mem_info = frame->mem;
xdp_scrub_frame(frame);
skb->protocol = eth_type_trans(skb, rq->dev);
 err:



[net-next PATCH RFC 5/8] net: mvneta: remove copybreak, prefetch and use build_skb

2018-12-06 Thread Jesper Dangaard Brouer
From: Ilias Apalodimas 

The driver's memcpy for packets < 256b and its recycle tricks are not
needed anymore, as the previous patch introduces buffer recycling using
the page_pool API (although recycling doesn't get fully activated in
this patch).  After this, switch to using build_skb().

This patch implicitly fixes a driver bug where the memory was copied
prior to syncing it for the CPU, in the < 256b case (as this case is
removed).

We also remove the data prefetch completely. The original driver had
the prefetch misplaced before any dma_sync operations took place.
Based on Jesper's analysis, even if the prefetch is placed after
any DMA sync ops it ends up hurting performance.
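Condensed sketch of the resulting single-buffer RX path (illustrative; it
assumes the HW writes the frame at data + NET_SKB_PAD, as arranged by the
mvneta_rx_refill() change in the diff below):

    void *data = page_address(page);

    dma_sync_single_range_for_cpu(dev->dev.parent, phys_addr, 0,
                                  rx_bytes, DMA_FROM_DEVICE);

    rxq->skb = build_skb(data, PAGE_SIZE);  /* no memcpy, the page becomes the skb head */
    if (!rxq->skb)
            break;
    skb_reserve(rxq->skb, MVNETA_MH_SIZE + NET_SKB_PAD); /* skip Marvell header + headroom */
    skb_put(rxq->skb, rx_bytes);
    mvneta_rx_csum(pp, rx_status, rxq->skb);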

Signed-off-by: Ilias Apalodimas 
Signed-off-by: Jesper Dangaard Brouer 
---
 drivers/net/ethernet/marvell/mvneta.c |   81 +
 1 file changed, 22 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 2354421fe96f..78f1fcdc1f00 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -643,7 +643,6 @@ static int txq_number = 8;
 static int rxq_def;
 
 static int rx_copybreak __read_mostly = 256;
-static int rx_header_size __read_mostly = 128;
 
 /* HW BM need that each port be identify by a unique ID */
 static int global_port_id;
@@ -1823,7 +1822,7 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
 
phys_addr = page_pool_get_dma_addr(page);
 
-   phys_addr += pp->rx_offset_correction;
+   phys_addr += pp->rx_offset_correction + NET_SKB_PAD;
mvneta_rx_desc_fill(rx_desc, phys_addr, page, rxq);
return 0;
 }
@@ -1944,14 +1943,12 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
struct page *page;
dma_addr_t phys_addr;
u32 rx_status, index;
-   int rx_bytes, skb_size, copy_size;
-   int frag_num, frag_size, frag_offset;
+   int frag_num, frag_size;
+   int rx_bytes;
 
index = rx_desc - rxq->descs;
page = (struct page *)rxq->buf_virt_addr[index];
data = page_address(page);
-   /* Prefetch header */
-   prefetch(data);
 
phys_addr = rx_desc->buf_phys_addr;
rx_status = rx_desc->status;
@@ -1969,49 +1966,25 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
rx_bytes = rx_desc->data_size -
   (ETH_FCS_LEN + MVNETA_MH_SIZE);
 
-   /* Allocate small skb for each new packet */
-   skb_size = max(rx_copybreak, rx_header_size);
-   rxq->skb = netdev_alloc_skb_ip_align(dev, skb_size);
-   if (unlikely(!rxq->skb)) {
-   netdev_err(dev,
-  "Can't allocate skb on queue %d\n",
-  rxq->id);
-   dev->stats.rx_dropped++;
-   rxq->skb_alloc_err++;
-   continue;
-   }
-   copy_size = min(skb_size, rx_bytes);
-
-   /* Copy data from buffer to SKB, skip Marvell header */
-   memcpy(rxq->skb->data, data + MVNETA_MH_SIZE,
-  copy_size);
-   skb_put(rxq->skb, copy_size);
-   rxq->left_size = rx_bytes - copy_size;
 
-   mvneta_rx_csum(pp, rx_status, rxq->skb);
-   if (rxq->left_size == 0) {
-   int size = copy_size + MVNETA_MH_SIZE;
-
-   dma_sync_single_range_for_cpu(dev->dev.parent,
- phys_addr, 0,
- size,
- DMA_FROM_DEVICE);
+   dma_sync_single_range_for_cpu(dev->dev.parent,
+ phys_addr, 0,
+ rx_bytes,
+ DMA_FROM_DEVICE);
 
-   /* leave the descriptor and buffer untouched */
-   } else {
-   /* refill descriptor with new buffer later */
-   rx_desc->buf_phys_addr = 0;
+   rxq->skb = build_skb(data, PAGE_SIZE);
+   if (!rxq->skb)
+   break;
 
-   frag_num = 0;
-   frag_offset = copy_size + MVNETA_MH_SIZE;
-   frag_size = min(rxq->left_size,
-   (int)(PAGE_SIZE - 

[net-next PATCH RFC 2/8] net: mvneta: use page pool API for sw buffer manager

2018-12-06 Thread Jesper Dangaard Brouer
From: Ilias Apalodimas 

Use the page_pool api for allocations and DMA handling instead of
__dev_alloc_page()/dma_map_page() and free_page()/dma_unmap_page().

The page_pool API offers buffer recycling capabilities for XDP but
allocates one page per packet, unless the driver splits and manages
the allocated page.

Although XDP is not a part of the driver yet, the current implementation
is allocating one page per packet, thus there's no performance penalty from
using the API.

For now, pages are unmapped via page_pool_unmap_page() before packets
travel into the network stack, as the stack doesn't have a return hook yet.
Since this call clears the page_pool state, it is safe to let the
page be returned to the normal page allocator.
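Condensed usage sketch (illustrative; the field names follow the
mvneta_create_page_pool() hunk below, while the dev/dma_dir members and the
error handling are assumptions about the page_pool API used by this series):

    struct page_pool_params pp_params = {
            .order          = 0,                    /* one page per packet */
            .flags          = PP_FLAG_DMA_MAP,      /* pool performs the DMA mapping */
            .pool_size      = num,                  /* number of RX descriptors */
            .nid            = NUMA_NO_NODE,
            .dev            = pp->dev->dev.parent,
            .dma_dir        = DMA_FROM_DEVICE,
    };

    rxq->page_pool = page_pool_create(&pp_params);
    if (IS_ERR(rxq->page_pool))
            return PTR_ERR(rxq->page_pool);

    /* refill: allocate a pre-mapped page and fetch its DMA address */
    page = page_pool_dev_alloc_pages(rxq->page_pool);
    phys_addr = page_pool_get_dma_addr(page) + pp->rx_offset_correction;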

Signed-off-by: Ilias Apalodimas 
Signed-off-by: Jesper Dangaard Brouer 
---
 drivers/net/ethernet/marvell/Kconfig  |1 +
 drivers/net/ethernet/marvell/mvneta.c |   56 -
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig 
b/drivers/net/ethernet/marvell/Kconfig
index 3238aa7f5dac..3325abe67465 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -60,6 +60,7 @@ config MVNETA
depends on ARCH_MVEBU || COMPILE_TEST
select MVMDIO
select PHYLINK
+   select PAGE_POOL
---help---
  This driver supports the network interface units in the
  Marvell ARMADA XP, ARMADA 370, ARMADA 38x and
diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 5bfd349bf41a..2354421fe96f 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include "mvneta_bm.h"
+#include 
 #include 
 #include 
 #include 
@@ -624,6 +625,9 @@ struct mvneta_rx_queue {
struct sk_buff *skb;
int left_size;
 
+   /* page pool */
+   struct page_pool *page_pool;
+
/* error counters */
u32 skb_alloc_err;
u32 refill_err;
@@ -1813,17 +1817,11 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
dma_addr_t phys_addr;
struct page *page;
 
-   page = __dev_alloc_page(gfp_mask);
+   page = page_pool_dev_alloc_pages(rxq->page_pool);
if (!page)
return -ENOMEM;
 
-   /* map page for use */
-   phys_addr = dma_map_page(pp->dev->dev.parent, page, 0, PAGE_SIZE,
-DMA_FROM_DEVICE);
-   if (unlikely(dma_mapping_error(pp->dev->dev.parent, phys_addr))) {
-   __free_page(page);
-   return -ENOMEM;
-   }
+   phys_addr = page_pool_get_dma_addr(page);
 
phys_addr += pp->rx_offset_correction;
mvneta_rx_desc_fill(rx_desc, phys_addr, page, rxq);
@@ -1892,10 +1890,11 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
if (!data || !(rx_desc->buf_phys_addr))
continue;
 
-   dma_unmap_page(pp->dev->dev.parent, rx_desc->buf_phys_addr,
-  PAGE_SIZE, DMA_FROM_DEVICE);
-   __free_page(data);
+   page_pool_put_page(rxq->page_pool, data, false);
}
+
+   if (rxq->page_pool)
+   page_pool_destroy(rxq->page_pool);
 }
 
 static inline
@@ -2010,8 +2009,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
skb_add_rx_frag(rxq->skb, frag_num, page,
frag_offset, frag_size,
PAGE_SIZE);
-   dma_unmap_page(dev->dev.parent, phys_addr,
-  PAGE_SIZE, DMA_FROM_DEVICE);
+   page_pool_unmap_page(rxq->page_pool, page);
rxq->left_size -= frag_size;
}
} else {
@@ -2041,8 +2039,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
frag_offset, frag_size,
PAGE_SIZE);
 
-   dma_unmap_page(dev->dev.parent, phys_addr,
-  PAGE_SIZE, DMA_FROM_DEVICE);
+   page_pool_unmap_page(rxq->page_pool, page);
 
rxq->left_size -= frag_size;
}
@@ -2828,11 +2825,37 @@ static int mvneta_poll(struct napi_struct *napi, int 
budget)
return rx_done;
 }
 
+static int mvneta_create_page_pool(struct mvneta_port *pp,
+  struct mvneta_rx_queue *rxq, int num)
+{
+   struct page_pool_params pp_params = { 0 };
+   int err = 0;
+
+   pp_params.order = 0;
+   /* internal DMA mapping in page_pool */
+   pp_params.flags = PP_FLAG_DMA_MAP;
+   pp_params.pool_size = num;
+   pp_params.nid = 

[net-next PATCH RFC 1/8] page_pool: add helper functions for DMA

2018-12-06 Thread Jesper Dangaard Brouer
From: Ilias Apalodimas 

Add helper functions for retrieving the dma_addr_t stored in page_private and
for unmapping DMA addresses that were mapped via the page_pool API.

Signed-off-by: Ilias Apalodimas 
Signed-off-by: Jesper Dangaard Brouer 
---
 include/net/page_pool.h |6 ++
 net/core/page_pool.c|7 +++
 2 files changed, 13 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 694d055e01ef..439f9183d4cd 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -111,6 +111,8 @@ struct page_pool *page_pool_create(const struct 
page_pool_params *params);
 
 void page_pool_destroy(struct page_pool *pool);
 
+void page_pool_unmap_page(struct page_pool *pool, struct page *page);
+
 /* Never call this directly, use helpers below */
 void __page_pool_put_page(struct page_pool *pool,
  struct page *page, bool allow_direct);
@@ -141,4 +143,8 @@ static inline bool is_page_pool_compiled_in(void)
 #endif
 }
 
+static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
+{
+   return page_private(page);
+}
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 43a932cb609b..26e14a17a67c 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -184,6 +184,13 @@ static void __page_pool_clean_page(struct page_pool *pool,
set_page_private(page, 0);
 }
 
+/* unmap the page and clean our state */
+void page_pool_unmap_page(struct page_pool *pool, struct page *page)
+{
+   __page_pool_clean_page(pool, page);
+}
+EXPORT_SYMBOL(page_pool_unmap_page);
+
 /* Return a page to the page allocator, cleaning up our state */
 static void __page_pool_return_page(struct page_pool *pool, struct page *page)
 {



[net-next PATCH RFC 6/8] mvneta: activate page recycling via skb using page_pool

2018-12-06 Thread Jesper Dangaard Brouer
Previous mvneta patches have already started to use page_pool, but
this was primarily for RX page alloc-side and for doing DMA map/unmap
handling.  Pages traveling through the netstack were unmapped and
returned through the normal page allocator.

It is now time to activate recycling of these pages. This involves
registering the page_pool with the XDP rxq memory model API, even though
the driver doesn't support XDP itself yet, and simply updating the
SKB->mem_info field with info from the xdp_rxq_info.

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: Ilias Apalodimas 
---
 drivers/net/ethernet/marvell/mvneta.c |   29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 78f1fcdc1f00..449c19829d67 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -628,6 +628,9 @@ struct mvneta_rx_queue {
/* page pool */
struct page_pool *page_pool;
 
+   /* XDP */
+   struct xdp_rxq_info xdp_rxq;
+
/* error counters */
u32 skb_alloc_err;
u32 refill_err;
@@ -1892,6 +1895,9 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
page_pool_put_page(rxq->page_pool, data, false);
}
 
+   if (xdp_rxq_info_is_reg(&rxq->xdp_rxq))
+   xdp_rxq_info_unreg(&rxq->xdp_rxq);
+
if (rxq->page_pool)
page_pool_destroy(rxq->page_pool);
 }
@@ -1978,11 +1984,11 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 
rx_desc->buf_phys_addr = 0;
frag_num = 0;
+   rxq->skb->mem_info = rxq->xdp_rxq.mem;
skb_reserve(rxq->skb, MVNETA_MH_SIZE + NET_SKB_PAD);
skb_put(rxq->skb, rx_bytes < PAGE_SIZE ? rx_bytes :
PAGE_SIZE);
mvneta_rx_csum(pp, rx_status, rxq->skb);
-   page_pool_unmap_page(rxq->page_pool, page);
rxq->left_size = rx_bytes < PAGE_SIZE ? 0 : rx_bytes -
PAGE_SIZE;
} else {
@@ -2001,7 +2007,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
skb_add_rx_frag(rxq->skb, frag_num, page,
0, frag_size,
PAGE_SIZE);
-
+   /* skb frags[] are not recycled, unmap now */
page_pool_unmap_page(rxq->page_pool, page);
 
rxq->left_size -= frag_size;
@@ -2815,10 +2821,25 @@ static int mvneta_create_page_pool(struct mvneta_port 
*pp,
 static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
   int num)
 {
-   int i = 0;
+   int err, i = 0;
+
+   err = mvneta_create_page_pool(pp, rxq, num);
+   if (err)
+   goto out;
 
-   if (mvneta_create_page_pool(pp, rxq, num))
+   err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
+   if (err) {
+   page_pool_destroy(rxq->page_pool);
+   goto out;
+   }
+
+   err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq, MEM_TYPE_PAGE_POOL,
+rxq->page_pool);
+   if (err) {
+   xdp_rxq_info_unreg(&rxq->xdp_rxq);
+   page_pool_destroy(rxq->page_pool);
goto out;
+   }
 
for (i = 0; i < num; i++) {
memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));



[net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API

2018-12-06 Thread Jesper Dangaard Brouer
From: Ilias Apalodimas 

This patch is changing struct sk_buff, and is thus by definition
controversial.

Place a new member 'mem_info' of type struct xdp_mem_info just after the
flag members head_frag and pfmemalloc, and not in between
headers_start/end, to ensure skb_copy() and pskb_copy() work as-is.
Copying mem_info during skb_clone() is required.  This makes sure that
pages are correctly freed or recycled during the altered
skb_free_head() invocation.

The 'mem_info' name is chosen as this is not strictly tied to XDP,
although the XDP return infrastructure is used.  As a future plan, we
could introduce a __u8 flags member to xdp_mem_info and move flags
head_frag and pfmemalloc into this area.

Signed-off-by: Ilias Apalodimas 
Signed-off-by: Jesper Dangaard Brouer 
---
 include/linux/skbuff.h |6 +-
 include/net/xdp.h  |1 +
 net/core/skbuff.c  |7 +++
 net/core/xdp.c |6 ++
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7dcfb5591dc3..95dac0ba6947 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* The interface for checksum offload between the stack and networking drivers
  * is as follows...
@@ -744,6 +745,10 @@ struct sk_buff {
head_frag:1,
xmit_more:1,
pfmemalloc:1;
+   /* TODO: Future idea, extend mem_info with __u8 flags, and
+* move bits head_frag and pfmemalloc there.
+*/
+   struct xdp_mem_info mem_info;
 
/* fields enclosed in headers_start/headers_end are copied
 * using a single memcpy() in __copy_skb_header()
@@ -827,7 +832,6 @@ struct sk_buff {
 #ifdef CONFIG_NETWORK_SECMARK
__u32   secmark;
 #endif
-
union {
__u32   mark;
__u32   reserved_tailroom;
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 5c33b9e0efab..4a0ca7a3d5e5 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -128,6 +128,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 void xdp_return_frame(struct xdp_frame *xdpf);
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
+void xdp_return_skb_page(void *data, struct xdp_mem_info *mem_info);
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
 struct net_device *dev, u32 queue_index);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b4ee5c8b928f..71aca186e44c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -70,6 +70,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -544,6 +545,11 @@ static void skb_free_head(struct sk_buff *skb)
 {
unsigned char *head = skb->head;
 
+   if (skb->mem_info.type == MEM_TYPE_PAGE_POOL) {
+   xdp_return_skb_page(head, &skb->mem_info);
+   return;
+   }
+
if (skb->head_frag)
skb_free_frag(head);
else
@@ -859,6 +865,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, 
struct sk_buff *skb)
n->nohdr = 0;
n->peeked = 0;
C(pfmemalloc);
+   C(mem_info);
n->destructor = NULL;
C(tail);
C(end);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index e79526314864..1703be4c2611 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -379,6 +379,12 @@ void xdp_return_buff(struct xdp_buff *xdp)
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);
 
+void xdp_return_skb_page(void *data, struct xdp_mem_info *mem_info)
+{
+   __xdp_return(data, mem_info, false, 0);
+}
+EXPORT_SYMBOL(xdp_return_skb_page);
+
 int xdp_attachment_query(struct xdp_attachment_info *info,
 struct netdev_bpf *bpf)
 {



[net-next PATCH RFC 7/8] xdp: bpf: cpumap redirect must update skb->mem_info

2018-12-06 Thread Jesper Dangaard Brouer
XDP-redirect to CPUMAP is about creating the SKB outside the driver
(and on another CPU) via xdp_frame info. Transfer the xdp_frame mem
info to the new SKB mem_info field.

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: Ilias Apalodimas 
---
 kernel/bpf/cpumap.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 24aac0d0f412..e3e05b6ccc42 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -199,6 +199,8 @@ static struct sk_buff *cpu_map_build_skb(struct 
bpf_cpu_map_entry *rcpu,
/* Essential SKB info: protocol and skb->dev */
skb->protocol = eth_type_trans(skb, xdpf->dev_rx);
 
+   skb->mem_info = xdpf->mem;
+
/* Optional SKB info, currently missing:
 * - HW checksum info   (skb->ip_summed)
 * - HW RX hash (skb_set_hash)



[net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB

2018-12-06 Thread Jesper Dangaard Brouer
This RFC patchset shows the plans for allowing page_pool to handle and
maintain DMA map/unmap of the pages it serves to the driver.  For this
to work a return hook in the network core is introduced, which also
involves extending sk_buff with the necessary information.

The overall purpose is to simplify drivers, by providing a page
allocation API that does recycling, such that each driver doesn't have
to reinvent its own recycling scheme.  Using page_pool in a driver
does not require implementing XDP support, but it makes it trivially
easy to do so.  The recycle code leverages the XDP recycle APIs.

The Marvell mvneta driver was used in this patchset to demonstrate how
to use the API, and tested on the EspressoBIN board.  We also have
patches enabling XDP for this driver, but they are not part of this
patchset as we want review of the general idea of the page_pool return
SKB hook.

A driver using page_pool and XDP-redirecting into CPUMAP or veth will
also take advantage of the new SKB return hook; currently this is only mlx5.


The longer-term plan involves allowing other types of allocators to
use this return hook, particularly allowing zero-copy AF_XDP frames
to travel further into the netstack, if the userspace pages have been
restricted to read-only.

---

Ilias Apalodimas (4):
  page_pool: add helper functions for DMA
  net: mvneta: use page pool API for sw buffer manager
  net: core: add recycle capabilities on skbs via page_pool API
  net: mvneta: remove copybreak, prefetch and use build_skb

Jesper Dangaard Brouer (4):
  xdp: reduce size of struct xdp_mem_info
  mvneta: activate page recycling via skb using page_pool
  xdp: bpf: cpumap redirect must update skb->mem_info
  veth: xdp_frames redirected into veth need to transfer xdp_mem_info


 drivers/net/ethernet/marvell/Kconfig  |1 
 drivers/net/ethernet/marvell/mvneta.c |  158 +
 drivers/net/veth.c|1 
 include/linux/skbuff.h|6 +
 include/net/page_pool.h   |6 +
 include/net/xdp.h |5 +
 kernel/bpf/cpumap.c   |2 
 net/core/page_pool.c  |7 +
 net/core/skbuff.c |7 +
 net/core/xdp.c|   14 ++-
 10 files changed, 125 insertions(+), 82 deletions(-)

--


[net-next PATCH RFC 3/8] xdp: reduce size of struct xdp_mem_info

2018-12-06 Thread Jesper Dangaard Brouer
It is possible to compress/reduce the size of struct xdp_mem_info.
This change reduces struct xdp_mem_info from 8 bytes to 4 bytes.

The member xdp_mem_info.id can be reduced to u16, as the mem_id_ht
rhashtable in net/core/xdp.c is already limited by MEM_ID_MAX=0xFFFE
which can safely fit in u16.

The member xdp_mem_info.type could be reduced to less than u16, as it stores
the enum xdp_mem_type, but due to alignment it is only reduced to u16.
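Illustrative size check (not part of the patch): with two u16 members the
struct packs into 4 bytes, and shrinking 'type' to u8 would not help further,
since 'id' (u16) forces 2-byte alignment and the size is padded back up:

    struct xdp_mem_info {
            u16 type;       /* enum xdp_mem_type, but known size type */
            u16 id;
    };                      /* sizeof() == 4; with a u8 type it would still be 4 */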

Signed-off-by: Jesper Dangaard Brouer 
---
 include/net/xdp.h |4 ++--
 net/core/xdp.c|8 
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0f25b3675c5c..5c33b9e0efab 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -46,8 +46,8 @@ enum xdp_mem_type {
 #define XDP_XMIT_FLAGS_MASKXDP_XMIT_FLUSH
 
 struct xdp_mem_info {
-   u32 type; /* enum xdp_mem_type, but known size type */
-   u32 id;
+   u16 type; /* enum xdp_mem_type, but known size type */
+   u16 id;
 };
 
 struct page_pool;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4b2b194f4f1f..e79526314864 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -42,11 +42,11 @@ struct xdp_mem_allocator {
 
 static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
 {
-   const u32 *k = data;
-   const u32 key = *k;
+   const u16 *k = data;
+   const u16 key = *k;
 
BUILD_BUG_ON(FIELD_SIZEOF(struct xdp_mem_allocator, mem.id)
-!= sizeof(u32));
+!= sizeof(u16));
 
/* Use cyclic increasing ID as direct hash key */
return key;
@@ -56,7 +56,7 @@ static int xdp_mem_id_cmp(struct rhashtable_compare_arg *arg,
  const void *ptr)
 {
const struct xdp_mem_allocator *xa = ptr;
-   u32 mem_id = *(u32 *)arg->key;
+   u16 mem_id = *(u16 *)arg->key;
 
return xa->mem.id != mem_id;
 }



RE: [Intel-wired-lan] [PATCH] ixgbe: Fix race when the VF driver does a reset

2018-12-06 Thread Bowers, AndrewX
> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Ross Lagerwall
> Sent: Wednesday, December 5, 2018 5:54 AM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Cc: Ross Lagerwall ; David S. Miller
> 
> Subject: [Intel-wired-lan] [PATCH] ixgbe: Fix race when the VF driver does a
> reset
> 
> When the VF driver does a reset, it (at least the Linux one) writes to the
> VFCTRL register to issue a reset and then immediately sends a reset message
> using the mailbox API. This is racy because when the PF driver detects that
> the VFCTRL register reset pin has been asserted, it clears the mailbox
> memory. Depending on ordering, the reset message sent by the VF could be
> cleared by the PF driver. It then responds to the cleared message with a
> NACK which causes the VF driver to malfunction.
> Fix this by deferring clearing the mailbox memory until the reset message is
> received.
> 
> Fixes: 939b701ad633 ("ixgbe: fix driver behaviour after issuing VFLR")
> Signed-off-by: Ross Lagerwall 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)

Tested-by: Andrew Bowers 




RE: [Intel-wired-lan] [PATCH v3 2/2] i40e: DRY rx_ptype handling code

2018-12-06 Thread Bowers, AndrewX
> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Michal Miroslaw
> Sent: Tuesday, December 4, 2018 9:31 AM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH v3 2/2] i40e: DRY rx_ptype handling code
> 
> Move rx_ptype extraction to i40e_process_skb_fields() to avoid duplicating
> the code.
> 
> Signed-off-by: Michał Mirosław 
> Signed-off-by: Michał Mirosław 
> ---
> v3:
>  * no changes
> v2:
>  * fix prototype in i40e_txrx_common.h
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c| 12 
>  drivers/net/ethernet/intel/i40e/i40e_txrx_common.h |  3 +--
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c |  6 +-
>  3 files changed, 6 insertions(+), 15 deletions(-)

Tested-by: Andrew Bowers 




RE: [Intel-wired-lan] [PATCH v3 1/2] i40e: fix VLAN.TCI == 0 RX HW offload

2018-12-06 Thread Bowers, AndrewX
> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On
> Behalf Of Michal Miroslaw
> Sent: Tuesday, December 4, 2018 9:31 AM
> To: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH v3 1/2] i40e: fix VLAN.TCI == 0 RX HW
> offload
> 
> This fixes two bugs in hardware VLAN offload:
>  1. VLAN.TCI == 0 was being dropped
>  2. there was a race between disabling of VLAN RX feature in hardware
> and processing RX queue, where packets processed in this window
> could have their VLAN information dropped
> 
> The fix moves the VLAN handling into i40e_process_skb_fields() to save on
> duplicated code. i40e_receive_skb() becomes trivial and so is removed.
> 
> Signed-off-by: Michał Mirosław 
> Signed-off-by: Michał Mirosław 
> ---
> v3:
>  * fix whitespace for checkpatch
> v2:
>  * no changes
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 31 +--
>  .../ethernet/intel/i40e/i40e_txrx_common.h|  2 --
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c|  6 +---
>  3 files changed, 9 insertions(+), 30 deletions(-)

Tested-by: Andrew Bowers 




[PATCH net-next 2/4] tc-testing: Add new TdcResults module

2018-12-06 Thread Lucas Bates
This module includes new classes for tdc to use in keeping track
of test case results, instead of generating and tracking a lengthy
string.

The new module can be extended to support multiple formal
test result formats to be friendlier to automation.

Signed-off-by: Lucas Bates 
---
 tools/testing/selftests/tc-testing/TdcResults.py | 133 +++
 1 file changed, 133 insertions(+)
 create mode 100644 tools/testing/selftests/tc-testing/TdcResults.py

diff --git a/tools/testing/selftests/tc-testing/TdcResults.py 
b/tools/testing/selftests/tc-testing/TdcResults.py
new file mode 100644
index 000..250fbb2
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/TdcResults.py
@@ -0,0 +1,133 @@
+#!/usr/bin/env python3
+
+from enum import Enum
+
+class ResultState(Enum):
+noresult = -1
+skip = 0
+success = 1
+fail = 2
+
+class TestResult:
+def __init__(self, test_id="", test_name=""):
+   self.test_id = test_id
+   self.test_name = test_name
+   self.result = ResultState.noresult
+   self.failmsg = ""
+   self.errormsg = ""
+   self.steps = []
+
+def set_result(self, result):
+if (isinstance(result, ResultState)):
+self.result = result
+return True
+else:
+raise TypeError('Unknown result type, must be type ResultState')
+
+def get_result(self):
+return self.result
+
+def set_errormsg(self, errormsg):
+self.errormsg = errormsg
+return True
+
+def append_errormsg(self, errormsg):
+self.errormsg = '{}\n{}'.format(self.errormsg, errormsg)
+
+def get_errormsg(self):
+return self.errormsg
+
+def set_failmsg(self, failmsg):
+self.failmsg = failmsg
+return True
+
+def append_failmsg(self, failmsg):
+self.failmsg = '{}\n{}'.format(self.failmsg, failmsg)
+
+def get_failmsg(self):
+return self.failmsg
+
+def add_steps(self, newstep):
+if type(newstep) == list:
+self.steps.extend(newstep)
+elif type(newstep) == str:
+self.steps.append(newstep)
+else:
+raise TypeError('TdcResults.add_steps() requires a list or str')
+
+def get_executed_steps(self):
+return self.steps
+
+class TestSuiteReport():
+_testsuite = []
+
+def add_resultdata(self, result_data):
+if isinstance(result_data, TestResult):
+self._testsuite.append(result_data)
+return True
+
+def count_tests(self):
+return len(self._testsuite)
+
+def count_failures(self):
+return sum(1 for t in self._testsuite if t.result == ResultState.fail)
+
+def count_skips(self):
+return sum(1 for t in self._testsuite if t.result == ResultState.skip)
+
+def find_result(self, test_id):
+return next((tr for tr in self._testsuite if tr.test_id == test_id), 
None)
+
+def update_result(self, result_data):
+orig = self.find_result(result_data.test_id)
+if orig != None:
+idx = self._testsuite.index(orig)
+self._testsuite[idx] = result_data
+else:
+self.add_resultdata(result_data)
+
+def format_tap(self):
+ftap = ""
+ftap += '1..{}\n'.format(self.count_tests())
+index = 1
+for t in self._testsuite:
+if t.result == ResultState.fail:
+ftap += 'not '
+ftap += 'ok {} {} - {}'.format(str(index), t.test_id, t.test_name)
+if t.result == ResultState.skip or t.result == 
ResultState.noresult:
+ftap += ' # skipped - {}\n'.format(t.errormsg)
+elif t.result == ResultState.fail:
+if len(t.steps) > 0:
+ftap += '\tCommands executed in this test case:'
+for step in t.steps:
+ftap += '\n\t\t{}'.format(step)
+ftap += '\n\t{}'.format(t.failmsg)
+ftap += '\n'
+index += 1
+return ftap
+
+def format_xunit(self):
+from xml.sax.saxutils import escape
+xunit = "\n"
+xunit += '\t\n'.format(self.count_tests(), self.count_skips())
+for t in self._testsuite:
+xunit += '\t\t 0:
+xunit += 'Commands executed in this test case:\n'
+for step in t.steps:
+xunit += '\t{}\n'.format(escape(step))
+xunit += 'FAILURE: {}\n'.format(escape(t.failmsg))
+xunit += '\t\t\t\n'
+if t.errormsg:
+xunit += '\t\t\t\n{}\n'.format(escape(t.errormsg))
+xunit += '\t\t\t\n'
+if t.result == ResultState.skip:
+xunit += '\t\t\t\n'
+xunit += '\t\t\n'
+xunit += '\t\n'
+xunit += '\n'
+return xunit
+
-- 
2.7.4



[PATCH net-next 1/4] tc-testing: Add command timeout feature to tdc

2018-12-06 Thread Lucas Bates
Using an attribute set in the tdc_config.py file, limit the
amount of time tdc will wait for an executed command to
complete and prevent the script from hanging entirely.

This timeout will be applied to all executed commands.

Signed-off-by: Lucas Bates 
---
 tools/testing/selftests/tc-testing/tdc.py| 16 +++-
 tools/testing/selftests/tc-testing/tdc_config.py |  2 ++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/tdc.py 
b/tools/testing/selftests/tc-testing/tdc.py
index 7607ba3..b862ee5 100755
--- a/tools/testing/selftests/tc-testing/tdc.py
+++ b/tools/testing/selftests/tc-testing/tdc.py
@@ -131,12 +131,16 @@ def exec_cmd(args, pm, stage, command):
 stdout=subprocess.PIPE,
 stderr=subprocess.PIPE,
 env=ENVIR)
-(rawout, serr) = proc.communicate()
 
-if proc.returncode != 0 and len(serr) > 0:
-foutput = serr.decode("utf-8", errors="ignore")
-else:
-foutput = rawout.decode("utf-8", errors="ignore")
+try:
+(rawout, serr) = proc.communicate(timeout=NAMES['TIMEOUT'])
+if proc.returncode != 0 and len(serr) > 0:
+foutput = serr.decode("utf-8", errors="ignore")
+else:
+foutput = rawout.decode("utf-8", errors="ignore")
+except subprocess.TimeoutExpired:
+foutput = "Command \"{}\" timed out\n".format(command)
+proc.returncode = 255
 
 proc.stdout.close()
 proc.stderr.close()
@@ -438,6 +442,8 @@ def check_default_settings(args, remaining, pm):
 NAMES['TC'] = args.path
 if args.device != None:
 NAMES['DEV2'] = args.device
+if 'TIMEOUT' not in NAMES:
+NAMES['TIMEOUT'] = None
 if not os.path.isfile(NAMES['TC']):
 print("The specified tc path " + NAMES['TC'] + " does not exist.")
 exit(1)
diff --git a/tools/testing/selftests/tc-testing/tdc_config.py 
b/tools/testing/selftests/tc-testing/tdc_config.py
index d651bc1..6d91e48 100644
--- a/tools/testing/selftests/tc-testing/tdc_config.py
+++ b/tools/testing/selftests/tc-testing/tdc_config.py
@@ -15,6 +15,8 @@ NAMES = {
   'DEV1': 'v0p1',
   'DEV2': '',
   'BATCH_FILE': './batch.txt',
+  # Length of time in seconds to wait before terminating a command
+  'TIMEOUT': 12,
   # Name of the namespace to use
   'NS': 'tcut',
   # Directory containing eBPF test programs
-- 
2.7.4



[PATCH net-next 3/4] tc-testing: Implement the TdcResults module in tdc

2018-12-06 Thread Lucas Bates
In tdc and the valgrind plugin, begin using the TdcResults module
to track executed test cases.

Signed-off-by: Lucas Bates 
---
 tools/testing/selftests/tc-testing/TdcPlugin.py|   3 +-
 .../tc-testing/plugin-lib/valgrindPlugin.py|  22 +++-
 tools/testing/selftests/tc-testing/tdc.py  | 117 -
 3 files changed, 91 insertions(+), 51 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/TdcPlugin.py 
b/tools/testing/selftests/tc-testing/TdcPlugin.py
index 3ee9a6d..1d9e279 100644
--- a/tools/testing/selftests/tc-testing/TdcPlugin.py
+++ b/tools/testing/selftests/tc-testing/TdcPlugin.py
@@ -18,11 +18,12 @@ class TdcPlugin:
 if self.args.verbose > 1:
 print(' -- {}.post_suite'.format(self.sub_class))
 
-def pre_case(self, test_ordinal, testid):
+def pre_case(self, test_ordinal, testid, test_name):
 '''run commands before test_runner does one test'''
 if self.args.verbose > 1:
 print(' -- {}.pre_case'.format(self.sub_class))
 self.args.testid = testid
+self.args.test_name = test_name
 self.args.test_ordinal = test_ordinal
 
 def post_case(self):
diff --git a/tools/testing/selftests/tc-testing/plugin-lib/valgrindPlugin.py 
b/tools/testing/selftests/tc-testing/plugin-lib/valgrindPlugin.py
index 477a7bd..e00c798 100644
--- a/tools/testing/selftests/tc-testing/plugin-lib/valgrindPlugin.py
+++ b/tools/testing/selftests/tc-testing/plugin-lib/valgrindPlugin.py
@@ -11,6 +11,7 @@ from string import Template
 import subprocess
 import time
 from TdcPlugin import TdcPlugin
+from TdcResults import *
 
 from tdc_config import *
 
@@ -21,6 +22,7 @@ class SubPlugin(TdcPlugin):
 def __init__(self):
 self.sub_class = 'valgrind/SubPlugin'
 self.tap = ''
+self._tsr = TestSuiteReport()
 super().__init__()
 
 def pre_suite(self, testcount, testidlist):
@@ -34,10 +36,14 @@ class SubPlugin(TdcPlugin):
 def post_suite(self, index):
 '''run commands after test_runner goes into a test loop'''
 super().post_suite(index)
-self._add_to_tap('\n|---\n')
 if self.args.verbose > 1:
 print('{}.post_suite'.format(self.sub_class))
-print('{}'.format(self.tap))
+#print('{}'.format(self.tap))
+for xx in range(index - 1, self.testcount):
+res = TestResult('{}-mem'.format(self.testidlist[xx]), 'Test 
skipped')
+res.set_result(ResultState.skip)
+res.set_errormsg('Skipped because of prior setup/teardown failure')
+self._add_results(res)
 if self.args.verbose < 4:
 subprocess.check_output('rm -f vgnd-*.log', shell=True)
 
@@ -128,8 +134,17 @@ class SubPlugin(TdcPlugin):
 nle_num = int(nle_mo.group(1))
 
 mem_results = ''
+res = TestResult('{}-mem'.format(self.args.testid),
+  '{} memory leak check'.format(self.args.test_name))
 if (def_num > 0) or (ind_num > 0) or (pos_num > 0) or (nle_num > 0):
 mem_results += 'not '
+res.set_result(ResultState.fail)
+res.set_failmsg('Memory leak detected')
+res.append_failmsg(content)
+else:
+res.set_result(ResultState.success)
+
+self._add_results(res)
 
 mem_results += 'ok {} - {}-mem # {}\n'.format(
 self.args.test_ordinal, self.args.testid, 'memory leak check')
@@ -138,5 +153,8 @@ class SubPlugin(TdcPlugin):
 print('{}'.format(content))
 self._add_to_tap(content)
 
+def _add_results(self, res):
+self._tsr.add_resultdata(res)
+
 def _add_to_tap(self, more_tap_output):
 self.tap += more_tap_output
diff --git a/tools/testing/selftests/tc-testing/tdc.py 
b/tools/testing/selftests/tc-testing/tdc.py
index b862ee5..e6e4ce8 100755
--- a/tools/testing/selftests/tc-testing/tdc.py
+++ b/tools/testing/selftests/tc-testing/tdc.py
@@ -23,6 +23,7 @@ from tdc_config import *
 from tdc_helper import *
 
 import TdcPlugin
+from TdcResults import *
 
 
 class PluginMgrTestFail(Exception):
@@ -60,10 +61,10 @@ class PluginMgr:
 for pgn_inst in reversed(self.plugin_instances):
 pgn_inst.post_suite(index)
 
-def call_pre_case(self, test_ordinal, testid):
+def call_pre_case(self, test_ordinal, testid, test_name):
 for pgn_inst in self.plugin_instances:
 try:
-pgn_inst.pre_case(test_ordinal, testid)
+pgn_inst.pre_case(test_ordinal, testid, test_name)
 except Exception as ee:
 print('exception {} in call to pre_case for {} plugin'.
   format(ee, pgn_inst.__class__))
@@ -102,7 +103,6 @@ class PluginMgr:
 self.argparser = argparse.ArgumentParser(
 description='Linux TC unit tests')
 
-
 def replace_keywords(cmd):
 """
 For a given executable command, substitute any known
@@ -187,6 +187,7 @@ 

[PATCH net-next 4/4] tc-testing: gitignore, ignore generated test results

2018-12-06 Thread Lucas Bates
Ignore any .tap or .xml test result files generated by tdc.

Additionally, ignore plugin symlinks.

Signed-off-by: Lucas Bates 
---
 tools/testing/selftests/tc-testing/.gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/.gitignore 
b/tools/testing/selftests/tc-testing/.gitignore
index 7a60b85..c5cc160 100644
--- a/tools/testing/selftests/tc-testing/.gitignore
+++ b/tools/testing/selftests/tc-testing/.gitignore
@@ -1,2 +1,5 @@
 __pycache__/
 *.pyc
+plugins/
+*.xml
+*.tap
-- 
2.7.4



[PATCH net-next 0/4] tc-testing: implement command timeouts and better results tracking

2018-12-06 Thread Lucas Bates
Patch 1 adds a timeout feature for any command tdc launches in a subshell.
This prevents tdc from hanging indefinitely.

Patches 2-4 introduce a new method for tracking and generating test case
results, and implements it across the core script and all applicable
plugins.

Lucas Bates (4):
  tc-testing: Add command timeout feature to tdc
  tc-testing: Add new TdcResults module
  tc-testing: Implement the TdcResults module in tdc
  tc-testing: gitignore, ignore generated test results

 tools/testing/selftests/tc-testing/.gitignore  |   3 +
 tools/testing/selftests/tc-testing/TdcPlugin.py|   3 +-
 tools/testing/selftests/tc-testing/TdcResults.py   | 133 +
 .../tc-testing/plugin-lib/valgrindPlugin.py|  22 +++-
 tools/testing/selftests/tc-testing/tdc.py  | 133 +
 tools/testing/selftests/tc-testing/tdc_config.py   |   2 +
 6 files changed, 240 insertions(+), 56 deletions(-)
 create mode 100644 tools/testing/selftests/tc-testing/TdcResults.py

--
2.7.4



[PATCH net-next,v5 11/12] qede: place ethtool_rx_flow_spec after code after TC flower codebase

2018-12-06 Thread Pablo Neira Ayuso
This is a preparation patch to reuse the existing TC flower codebase
from ethtool_rx_flow_spec.

This patch is merely moving the core ethtool_rx_flow_spec parser after
the tc flower offload driver code so we can skip a few forward function
declarations in the follow-up patch.

Signed-off-by: Pablo Neira Ayuso 
---
v5: rebase on top of net-next head.

 drivers/net/ethernet/qlogic/qede/qede_filter.c | 264 -
 1 file changed, 132 insertions(+), 132 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c 
b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index 833c9ec58a6e..ed77950f6cf9 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -1791,72 +1791,6 @@ static int qede_flow_spec_to_tuple_udpv6(struct qede_dev 
*edev,
return 0;
 }
 
-static int qede_flow_spec_to_tuple(struct qede_dev *edev,
-  struct qede_arfs_tuple *t,
-  struct ethtool_rx_flow_spec *fs)
-{
-   memset(t, 0, sizeof(*t));
-
-   if (qede_flow_spec_validate_unused(edev, fs))
-   return -EOPNOTSUPP;
-
-   switch ((fs->flow_type & ~FLOW_EXT)) {
-   case TCP_V4_FLOW:
-   return qede_flow_spec_to_tuple_tcpv4(edev, t, fs);
-   case UDP_V4_FLOW:
-   return qede_flow_spec_to_tuple_udpv4(edev, t, fs);
-   case TCP_V6_FLOW:
-   return qede_flow_spec_to_tuple_tcpv6(edev, t, fs);
-   case UDP_V6_FLOW:
-   return qede_flow_spec_to_tuple_udpv6(edev, t, fs);
-   default:
-   DP_VERBOSE(edev, NETIF_MSG_IFUP,
-  "Can't support flow of type %08x\n", fs->flow_type);
-   return -EOPNOTSUPP;
-   }
-
-   return 0;
-}
-
-static int qede_flow_spec_validate(struct qede_dev *edev,
-  struct ethtool_rx_flow_spec *fs,
-  struct qede_arfs_tuple *t)
-{
-   if (fs->location >= QEDE_RFS_MAX_FLTR) {
-   DP_INFO(edev, "Location out-of-bounds\n");
-   return -EINVAL;
-   }
-
-   /* Check location isn't already in use */
-   if (test_bit(fs->location, edev->arfs->arfs_fltr_bmap)) {
-   DP_INFO(edev, "Location already in use\n");
-   return -EINVAL;
-   }
-
-   /* Check if the filtering-mode could support the filter */
-   if (edev->arfs->filter_count &&
-   edev->arfs->mode != t->mode) {
-   DP_INFO(edev,
-   "flow_spec would require filtering mode %08x, but %08x 
is configured\n",
-   t->mode, edev->arfs->filter_count);
-   return -EINVAL;
-   }
-
-   /* If drop requested then no need to validate other data */
-   if (fs->ring_cookie == RX_CLS_FLOW_DISC)
-   return 0;
-
-   if (ethtool_get_flow_spec_ring_vf(fs->ring_cookie))
-   return 0;
-
-   if (fs->ring_cookie >= QEDE_RSS_COUNT(edev)) {
-   DP_INFO(edev, "Queue out-of-bounds\n");
-   return -EINVAL;
-   }
-
-   return 0;
-}
-
 /* Must be called while qede lock is held */
 static struct qede_arfs_fltr_node *
 qede_flow_find_fltr(struct qede_dev *edev, struct qede_arfs_tuple *t)
@@ -1896,72 +1830,6 @@ static void qede_flow_set_destination(struct qede_dev 
*edev,
   "Configuring N-tuple for VF 0x%02x\n", n->vfid - 1);
 }
 
-int qede_add_cls_rule(struct qede_dev *edev, struct ethtool_rxnfc *info)
-{
-   struct ethtool_rx_flow_spec *fsp = &info->fs;
-   struct qede_arfs_fltr_node *n;
-   struct qede_arfs_tuple t;
-   int min_hlen, rc;
-
-   __qede_lock(edev);
-
-   if (!edev->arfs) {
-   rc = -EPERM;
-   goto unlock;
-   }
-
-   /* Translate the flow specification into something fittign our DB */
-   rc = qede_flow_spec_to_tuple(edev, &t, fsp);
-   if (rc)
-   goto unlock;
-
-   /* Make sure location is valid and filter isn't already set */
-   rc = qede_flow_spec_validate(edev, fsp, &t);
-   if (rc)
-   goto unlock;
-
-   if (qede_flow_find_fltr(edev, &t)) {
-   rc = -EINVAL;
-   goto unlock;
-   }
-
-   n = kzalloc(sizeof(*n), GFP_KERNEL);
-   if (!n) {
-   rc = -ENOMEM;
-   goto unlock;
-   }
-
-   min_hlen = qede_flow_get_min_header_size();
-   n->data = kzalloc(min_hlen, GFP_KERNEL);
-   if (!n->data) {
-   kfree(n);
-   rc = -ENOMEM;
-   goto unlock;
-   }
-
-   n->sw_id = fsp->location;
-   set_bit(n->sw_id, edev->arfs->arfs_fltr_bmap);
-   n->buf_len = min_hlen;
-
-   memcpy(&n->tuple, &t, sizeof(n->tuple));
-
-   qede_flow_set_destination(edev, n, fsp);
-
-   /* Build a minimal header according to the flow */
-   n->tuple.build_hdr(&n->tuple, n->data);
-
-   rc = 

[PATCH net-next,v5 01/12] flow_offload: add flow_rule and flow_match structures and use them

2018-12-06 Thread Pablo Neira Ayuso
This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.

To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.

This introduces two new interfaces:

bool flow_rule_match_key(rule, dissector_id)

that returns true if a given matching key is set on the rule, and:

flow_rule_match_XYZ(rule, &match);

which fetches the matching side XYZ into the match container structure,
retrieving both the key and the mask with a single call.
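
As an illustration, the driver side changes from two
skb_flow_dissector_target() lookups per key to a single helper call.
A minimal sketch following the bnxt hunk below (the local flow structure
is just an example):

static void example_parse_basic(struct tc_cls_flower_offload *f,
                                struct bnxt_tc_flow *flow)
{
        struct flow_rule *rule = tc_cls_flower_offload_flow_rule(f);

        if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_BASIC)) {
                struct flow_match_basic match;

                /* one call fetches both key and mask */
                flow_rule_match_basic(rule, &match);
                flow->l2_key.ether_type = match.key->n_proto;
                flow->l2_mask.ether_type = match.mask->n_proto;
        }
}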

Signed-off-by: Pablo Neira Ayuso 
---
v5: fix double kfree in cls_flower error path, reported by kbuild robot
via Julia Lawall.

 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c   | 174 -
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 194 --
 drivers/net/ethernet/intel/i40e/i40e_main.c| 178 -
 drivers/net/ethernet/intel/iavf/iavf_main.c| 195 --
 drivers/net/ethernet/intel/igb/igb_main.c  |  64 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 420 +
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  | 202 +-
 drivers/net/ethernet/netronome/nfp/flower/action.c |  11 +-
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 417 ++--
 .../net/ethernet/netronome/nfp/flower/offload.c| 145 +++
 drivers/net/ethernet/qlogic/qede/qede_filter.c |  85 ++---
 include/net/flow_offload.h | 115 ++
 include/net/pkt_cls.h  |  11 +-
 net/core/Makefile  |   2 +-
 net/core/flow_offload.c| 143 +++
 net/sched/cls_flower.c |  47 ++-
 16 files changed, 1196 insertions(+), 1207 deletions(-)
 create mode 100644 include/net/flow_offload.h
 create mode 100644 net/core/flow_offload.c

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 749f63beddd8..b82143d6cdde 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -177,18 +177,12 @@ static int bnxt_tc_parse_actions(struct bnxt *bp,
return 0;
 }
 
-#define GET_KEY(flow_cmd, key_type)\
-   skb_flow_dissector_target((flow_cmd)->dissector, key_type,\
- (flow_cmd)->key)
-#define GET_MASK(flow_cmd, key_type)   \
-   skb_flow_dissector_target((flow_cmd)->dissector, key_type,\
- (flow_cmd)->mask)
-
 static int bnxt_tc_parse_flow(struct bnxt *bp,
  struct tc_cls_flower_offload *tc_flow_cmd,
  struct bnxt_tc_flow *flow)
 {
-   struct flow_dissector *dissector = tc_flow_cmd->dissector;
+   struct flow_rule *rule = tc_cls_flower_offload_flow_rule(tc_flow_cmd);
+   struct flow_dissector *dissector = rule->match.dissector;
 
/* KEY_CONTROL and KEY_BASIC are needed for forming a meaningful key */
if ((dissector->used_keys & BIT(FLOW_DISSECTOR_KEY_CONTROL)) == 0 ||
@@ -198,140 +192,120 @@ static int bnxt_tc_parse_flow(struct bnxt *bp,
return -EOPNOTSUPP;
}
 
-   if (dissector_uses_key(dissector, FLOW_DISSECTOR_KEY_BASIC)) {
-   struct flow_dissector_key_basic *key =
-   GET_KEY(tc_flow_cmd, FLOW_DISSECTOR_KEY_BASIC);
-   struct flow_dissector_key_basic *mask =
-   GET_MASK(tc_flow_cmd, FLOW_DISSECTOR_KEY_BASIC);
+   if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_BASIC)) {
+   struct flow_match_basic match;
 
-   flow->l2_key.ether_type = key->n_proto;
-   flow->l2_mask.ether_type = mask->n_proto;
+   flow_rule_match_basic(rule, );
+   flow->l2_key.ether_type = match.key->n_proto;
+   flow->l2_mask.ether_type = match.mask->n_proto;
 
-   if (key->n_proto == htons(ETH_P_IP) ||
-   key->n_proto == htons(ETH_P_IPV6)) {
-   flow->l4_key.ip_proto = key->ip_proto;
-   flow->l4_mask.ip_proto = mask->ip_proto;
+   if (match.key->n_proto == htons(ETH_P_IP) ||
+   match.key->n_proto == htons(ETH_P_IPV6)) {
+   flow->l4_key.ip_proto = match.key->ip_proto;
+   flow->l4_mask.ip_proto = match.mask->ip_proto;
}
}
 
-   if (dissector_uses_key(dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
-   struct flow_dissector_key_eth_addrs *key =
-   GET_KEY(tc_flow_cmd, FLOW_DISSECTOR_KEY_ETH_ADDRS);
-   struct flow_dissector_key_eth_addrs *mask =
-   

[PATCH net-next,v5 10/12] dsa: bcm_sf2: use flow_rule infrastructure

2018-12-06 Thread Pablo Neira Ayuso
Update this driver to use the flow_rule infrastructure, hence we can use
the same code to populate hardware IR from ethtool_rx_flow and the
cls_flower interfaces.

Signed-off-by: Pablo Neira Ayuso 
---
v5: rebase on top of net-next head.

 drivers/net/dsa/bcm_sf2_cfp.c | 102 +++---
 1 file changed, 67 insertions(+), 35 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index e14663ab6dbc..6d8059dc77b7 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "bcm_sf2.h"
 #include "bcm_sf2_regs.h"
@@ -257,7 +258,8 @@ static int bcm_sf2_cfp_act_pol_set(struct bcm_sf2_priv 
*priv,
 }
 
 static void bcm_sf2_cfp_slice_ipv4(struct bcm_sf2_priv *priv,
-  struct ethtool_tcpip4_spec *v4_spec,
+  struct flow_dissector_key_ipv4_addrs *addrs,
+  struct flow_dissector_key_ports *ports,
   unsigned int slice_num,
   bool mask)
 {
@@ -278,7 +280,7 @@ static void bcm_sf2_cfp_slice_ipv4(struct bcm_sf2_priv 
*priv,
 * UDF_n_A6 [23:8]
 * UDF_n_A5 [7:0]
 */
-   reg = be16_to_cpu(v4_spec->pdst) >> 8;
+   reg = be16_to_cpu(ports->dst) >> 8;
if (mask)
offset = CORE_CFP_MASK_PORT(3);
else
@@ -289,9 +291,9 @@ static void bcm_sf2_cfp_slice_ipv4(struct bcm_sf2_priv 
*priv,
 * UDF_n_A4 [23:8]
 * UDF_n_A3 [7:0]
 */
-   reg = (be16_to_cpu(v4_spec->pdst) & 0xff) << 24 |
- (u32)be16_to_cpu(v4_spec->psrc) << 8 |
- (be32_to_cpu(v4_spec->ip4dst) & 0xff00) >> 8;
+   reg = (be16_to_cpu(ports->dst) & 0xff) << 24 |
+ (u32)be16_to_cpu(ports->src) << 8 |
+ (be32_to_cpu(addrs->dst) & 0xff00) >> 8;
if (mask)
offset = CORE_CFP_MASK_PORT(2);
else
@@ -302,9 +304,9 @@ static void bcm_sf2_cfp_slice_ipv4(struct bcm_sf2_priv 
*priv,
 * UDF_n_A2 [23:8]
 * UDF_n_A1 [7:0]
 */
-   reg = (u32)(be32_to_cpu(v4_spec->ip4dst) & 0xff) << 24 |
- (u32)(be32_to_cpu(v4_spec->ip4dst) >> 16) << 8 |
- (be32_to_cpu(v4_spec->ip4src) & 0xff00) >> 8;
+   reg = (u32)(be32_to_cpu(addrs->dst) & 0xff) << 24 |
+ (u32)(be32_to_cpu(addrs->dst) >> 16) << 8 |
+ (be32_to_cpu(addrs->src) & 0xff00) >> 8;
if (mask)
offset = CORE_CFP_MASK_PORT(1);
else
@@ -317,8 +319,8 @@ static void bcm_sf2_cfp_slice_ipv4(struct bcm_sf2_priv 
*priv,
 * Slice ID [3:2]
 * Slice valid  [1:0]
 */
-   reg = (u32)(be32_to_cpu(v4_spec->ip4src) & 0xff) << 24 |
- (u32)(be32_to_cpu(v4_spec->ip4src) >> 16) << 8 |
+   reg = (u32)(be32_to_cpu(addrs->src) & 0xff) << 24 |
+ (u32)(be32_to_cpu(addrs->src) >> 16) << 8 |
  SLICE_NUM(slice_num) | SLICE_VALID;
if (mask)
offset = CORE_CFP_MASK_PORT(0);
@@ -332,9 +334,13 @@ static int bcm_sf2_cfp_ipv4_rule_set(struct bcm_sf2_priv 
*priv, int port,
 unsigned int queue_num,
 struct ethtool_rx_flow_spec *fs)
 {
-   struct ethtool_tcpip4_spec *v4_spec, *v4_m_spec;
+   struct ethtool_rx_flow_spec_input input = {};
const struct cfp_udf_layout *layout;
unsigned int slice_num, rule_index;
+   struct ethtool_rx_flow_rule *flow;
+   struct flow_match_ipv4_addrs ipv4;
+   struct flow_match_ports ports;
+   struct flow_match_ip ip;
u8 ip_proto, ip_frag;
u8 num_udf;
u32 reg;
@@ -343,13 +349,9 @@ static int bcm_sf2_cfp_ipv4_rule_set(struct bcm_sf2_priv 
*priv, int port,
switch (fs->flow_type & ~FLOW_EXT) {
case TCP_V4_FLOW:
ip_proto = IPPROTO_TCP;
-   v4_spec = >h_u.tcp_ip4_spec;
-   v4_m_spec = >m_u.tcp_ip4_spec;
break;
case UDP_V4_FLOW:
ip_proto = IPPROTO_UDP;
-   v4_spec = >h_u.udp_ip4_spec;
-   v4_m_spec = >m_u.udp_ip4_spec;
break;
default:
return -EINVAL;
@@ -367,11 +369,22 @@ static int bcm_sf2_cfp_ipv4_rule_set(struct bcm_sf2_priv 
*priv, int port,
if (rule_index > bcm_sf2_cfp_rule_size(priv))
return -ENOSPC;
 
+   input.fs = fs;
+   flow = ethtool_rx_flow_rule_create();
+   if (IS_ERR(flow))
+   return PTR_ERR(flow);
+
+   flow_rule_match_ipv4_addrs(flow->rule, );
+   flow_rule_match_ports(flow->rule, );
+   flow_rule_match_ip(flow->rule, );
+
layout = _tcpip4_layout;
/* We only use one UDF slice for now */

[PATCH net-next,v5 04/12] cls_api: add translator to flow_action representation

2018-12-06 Thread Pablo Neira Ayuso
This patch implements a new function to translate from native TC action
to the new flow_action representation. Moreover, this patch also updates
cls_flower to use this new function.
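
For reference, the caller side in cls_flower ends up looking roughly like
the fragment below (simplified from fl_hw_replace_filter(); error and
extack handling trimmed):

        cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
        if (!cls_flower.rule)
                return -ENOMEM;

        cls_flower.rule->match.dissector = &f->mask->dissector;
        cls_flower.rule->match.mask = &f->mask->key;
        cls_flower.rule->match.key = &f->mkey;

        err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
        if (err) {
                kfree(cls_flower.rule);
                return err;
        }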

Signed-off-by: Pablo Neira Ayuso 
---
v5: rebase on top of net-next head.

 include/net/pkt_cls.h  |  3 ++
 net/sched/cls_api.c| 99 ++
 net/sched/cls_flower.c | 14 +++
 3 files changed, 116 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 9ceac97e5eff..abb035f84321 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -622,6 +622,9 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 
 unsigned int tcf_exts_num_actions(struct tcf_exts *exts);
 
+int tc_setup_flow_action(struct flow_action *flow_action,
+const struct tcf_exts *exts);
+
 int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
 enum tc_setup_type type, void *type_data, bool err_stop);
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3a4d36072fd5..00b7b639f713 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -32,6 +32,13 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
 
@@ -2568,6 +2575,98 @@ int tc_setup_cb_call(struct tcf_block *block, struct 
tcf_exts *exts,
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
 
+int tc_setup_flow_action(struct flow_action *flow_action,
+const struct tcf_exts *exts)
+{
+   const struct tc_action *act;
+   int i, j, k;
+
+   if (!exts)
+   return 0;
+
+   j = 0;
+   tcf_exts_for_each_action(i, act, exts) {
+   struct flow_action_entry *entry;
+
+   entry = _action->entries[j];
+   if (is_tcf_gact_ok(act)) {
+   entry->id = FLOW_ACTION_ACCEPT;
+   } else if (is_tcf_gact_shot(act)) {
+   entry->id = FLOW_ACTION_DROP;
+   } else if (is_tcf_gact_trap(act)) {
+   entry->id = FLOW_ACTION_TRAP;
+   } else if (is_tcf_gact_goto_chain(act)) {
+   entry->id = FLOW_ACTION_GOTO;
+   entry->chain_index = tcf_gact_goto_chain_index(act);
+   } else if (is_tcf_mirred_egress_redirect(act)) {
+   entry->id = FLOW_ACTION_REDIRECT;
+   entry->dev = tcf_mirred_dev(act);
+   } else if (is_tcf_mirred_egress_mirror(act)) {
+   entry->id = FLOW_ACTION_MIRRED;
+   entry->dev = tcf_mirred_dev(act);
+   } else if (is_tcf_vlan(act)) {
+   switch (tcf_vlan_action(act)) {
+   case TCA_VLAN_ACT_PUSH:
+   entry->id = FLOW_ACTION_VLAN_PUSH;
+   entry->vlan.vid = tcf_vlan_push_vid(act);
+   entry->vlan.proto = tcf_vlan_push_proto(act);
+   entry->vlan.prio = tcf_vlan_push_prio(act);
+   break;
+   case TCA_VLAN_ACT_POP:
+   entry->id = FLOW_ACTION_VLAN_POP;
+   break;
+   case TCA_VLAN_ACT_MODIFY:
+   entry->id = FLOW_ACTION_VLAN_MANGLE;
+   entry->vlan.vid = tcf_vlan_push_vid(act);
+   entry->vlan.proto = tcf_vlan_push_proto(act);
+   entry->vlan.prio = tcf_vlan_push_prio(act);
+   break;
+   default:
+   goto err_out;
+   }
+   } else if (is_tcf_tunnel_set(act)) {
+   entry->id = FLOW_ACTION_TUNNEL_ENCAP;
+   entry->tunnel = tcf_tunnel_info(act);
+   } else if (is_tcf_tunnel_release(act)) {
+   entry->id = FLOW_ACTION_TUNNEL_DECAP;
+   entry->tunnel = tcf_tunnel_info(act);
+   } else if (is_tcf_pedit(act)) {
+   for (k = 0; k < tcf_pedit_nkeys(act); k++) {
+   switch (tcf_pedit_cmd(act, k)) {
+   case TCA_PEDIT_KEY_EX_CMD_SET:
+   entry->id = FLOW_ACTION_MANGLE;
+   break;
+   case TCA_PEDIT_KEY_EX_CMD_ADD:
+   entry->id = FLOW_ACTION_ADD;
+   break;
+   default:
+   goto err_out;
+   }
+   entry->mangle.htype = tcf_pedit_htype(act, k);
+   

[PATCH net-next,v5 07/12] cls_flower: don't expose TC actions to drivers anymore

2018-12-06 Thread Pablo Neira Ayuso
Now that drivers have been converted to use the flow action
infrastructure, remove this field from the tc_cls_flower_offload
structure.

Signed-off-by: Pablo Neira Ayuso 
---
v5: rebase on top of net-next head.

 include/net/pkt_cls.h  | 1 -
 net/sched/cls_flower.c | 5 -
 2 files changed, 6 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a08c06e383db..9bd724bfa860 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -768,7 +768,6 @@ struct tc_cls_flower_offload {
unsigned long cookie;
struct flow_rule *rule;
struct flow_stats stats;
-   struct tcf_exts *exts;
u32 classid;
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 806392598ae2..bb4d39689404 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -392,7 +392,6 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
cls_flower.rule->match.dissector = >mask->dissector;
cls_flower.rule->match.mask = >mask->key;
cls_flower.rule->match.key = >mkey;
-   cls_flower.exts = >exts;
cls_flower.classid = f->res.classid;
 
err = tc_setup_flow_action(_flower.rule->action, >exts);
@@ -427,7 +426,6 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct 
cls_fl_filter *f)
tc_cls_common_offload_init(_flower.common, tp, f->flags, NULL);
cls_flower.command = TC_CLSFLOWER_STATS;
cls_flower.cookie = (unsigned long) f;
-   cls_flower.exts = >exts;
cls_flower.classid = f->res.classid;
 
tc_setup_cb_call(block, >exts, TC_SETUP_CLSFLOWER,
@@ -1490,7 +1488,6 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, 
tc_setup_cb_t *cb,
cls_flower.rule->match.dissector = >dissector;
cls_flower.rule->match.mask = >key;
cls_flower.rule->match.key = >mkey;
-   cls_flower.exts = >exts;
 
err = tc_setup_flow_action(_flower.rule->action,
   >exts);
@@ -1523,7 +1520,6 @@ static int fl_hw_create_tmplt(struct tcf_chain *chain,
 {
struct tc_cls_flower_offload cls_flower = {};
struct tcf_block *block = chain->block;
-   struct tcf_exts dummy_exts = { 0, };
 
cls_flower.rule = flow_rule_alloc(0);
if (!cls_flower.rule)
@@ -1535,7 +1531,6 @@ static int fl_hw_create_tmplt(struct tcf_chain *chain,
cls_flower.rule->match.dissector = >dissector;
cls_flower.rule->match.mask = >mask;
cls_flower.rule->match.key = >dummy_key;
-   cls_flower.exts = _exts;
 
/* We don't care if driver (any of them) fails to handle this
 * call. It serves just as a hint for it.
-- 
2.11.0



[PATCH net-next,v5 05/12] flow_offload: add statistics retrieval infrastructure and use it

2018-12-06 Thread Pablo Neira Ayuso
This patch provides the flow_stats structure that acts as a container
embedded in tc_cls_flower_offload, which we can then use to restore the
statistics on the existing TC actions. Hence, tcf_exts_stats_update() is
no longer called from drivers.
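
On the driver side the change is mechanical: the stats callback feeds the
hardware counters into the container instead of into tcf_exts. A minimal
sketch following the mlx5e hunk below:

static int example_flower_stats(struct tc_cls_flower_offload *f,
                                struct mlx5_fc *counter)
{
        u64 bytes, packets, lastuse;

        mlx5_fc_query_cached(counter, &bytes, &packets, &lastuse);
        /* was: tcf_exts_stats_update(f->exts, bytes, packets, lastuse); */
        flow_stats_update(&f->stats, bytes, packets, lastuse);
        return 0;
}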

Signed-off-by: Pablo Neira Ayuso 
---
v5: Fix bytes and packet parameter swap in flow_stats_update() call,
reported by Venkat Duvvuru.

 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c  |  4 ++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c  |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c   |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c |  2 +-
 drivers/net/ethernet/netronome/nfp/flower/offload.c   |  5 ++---
 include/net/flow_offload.h| 14 ++
 include/net/pkt_cls.h |  1 +
 net/sched/cls_flower.c|  4 
 8 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index b82143d6cdde..09cd75f54eba 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -1366,8 +1366,8 @@ static int bnxt_tc_get_flow_stats(struct bnxt *bp,
lastused = flow->lastused;
spin_unlock(>stats_lock);
 
-   tcf_exts_stats_update(tc_flow_cmd->exts, stats.bytes, stats.packets,
- lastused);
+   flow_stats_update(_flow_cmd->stats, stats.bytes, stats.packets,
+ lastused);
return 0;
 }
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 39c5af5dad3d..8a2d66ee1d7b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -807,9 +807,9 @@ int cxgb4_tc_flower_stats(struct net_device *dev,
if (ofld_stats->packet_count != packets) {
if (ofld_stats->prev_packet_count != packets)
ofld_stats->last_used = jiffies;
-   tcf_exts_stats_update(cls->exts, bytes - ofld_stats->byte_count,
- packets - ofld_stats->packet_count,
- ofld_stats->last_used);
+   flow_stats_update(>stats, bytes - ofld_stats->byte_count,
+ packets - ofld_stats->packet_count,
+ ofld_stats->last_used);
 
ofld_stats->packet_count = packets;
ofld_stats->byte_count = bytes;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 1daaab91280f..2e1eaf6f5139 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -3227,7 +3227,7 @@ int mlx5e_stats_flower(struct mlx5e_priv *priv,
 
mlx5_fc_query_cached(counter, , , );
 
-   tcf_exts_stats_update(f->exts, bytes, packets, lastuse);
+   flow_stats_update(>stats, bytes, packets, lastuse);
 
return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index e6c4c672b1ca..60900e53243b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -460,7 +460,7 @@ int mlxsw_sp_flower_stats(struct mlxsw_sp *mlxsw_sp,
if (err)
goto err_rule_get_stats;
 
-   tcf_exts_stats_update(f->exts, bytes, packets, lastuse);
+   flow_stats_update(>stats, bytes, packets, lastuse);
 
mlxsw_sp_acl_ruleset_put(mlxsw_sp, ruleset);
return 0;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 708331234908..524b9ae1a639 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -532,9 +532,8 @@ nfp_flower_get_stats(struct nfp_app *app, struct net_device 
*netdev,
ctx_id = be32_to_cpu(nfp_flow->meta.host_ctx_id);
 
spin_lock_bh(>stats_lock);
-   tcf_exts_stats_update(flow->exts, priv->stats[ctx_id].bytes,
- priv->stats[ctx_id].pkts,
- priv->stats[ctx_id].used);
+   flow_stats_update(>stats, priv->stats[ctx_id].bytes,
+ priv->stats[ctx_id].pkts, priv->stats[ctx_id].used);
 
priv->stats[ctx_id].pkts = 0;
priv->stats[ctx_id].bytes = 0;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index dabc819b6cc9..f9ce39992dbd 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -179,4 +179,18 @@ static inline bool flow_rule_match_key(const struct 
flow_rule *rule,
return dissector_uses_key(rule->match.dissector, key);
 }
 
+struct flow_stats {
+  

[PATCH net-next,v5 12/12] qede: use ethtool_rx_flow_rule() to remove duplicated parser code

2018-12-06 Thread Pablo Neira Ayuso
The qede driver supports both ethtool_rx_flow_spec and flower, and the
two codebases look very similar.

This patch uses the ethtool_rx_flow_rule() infrastructure to remove the
duplicated ethtool_rx_flow_spec parser and consolidate ACL offload
support around the flow_rule infrastructure.

Furthermore, more code can be consolidated by merging
qede_add_cls_rule() and qede_add_tc_flower_fltr(), since these two
functions also look very similar.

This driver currently provides simple ACL support, such as 5-tuple
matching, drop policy and queue to CPU.

Drivers that support more features can benefit from this infrastructure
to save even more redundant codebase.
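
A rough sketch of what the consolidated entry point looks like once the
legacy parser is gone (the names below are placeholders, not the final
code; example_parse_flow_rule() stands for the parser shared with
qede_add_tc_flower_fltr()):

static int example_add_cls_rule(struct qede_dev *edev,
                                struct ethtool_rxnfc *info)
{
        struct ethtool_rx_flow_spec_input input = { .fs = &info->fs };
        struct ethtool_rx_flow_rule *flow;
        struct qede_arfs_tuple t = {};
        int rc;

        /* translate the legacy ethtool spec into a flow_rule ... */
        flow = ethtool_rx_flow_rule_create(&input);
        if (IS_ERR(flow))
                return PTR_ERR(flow);

        /* ... and reuse the same parsing path as the TC flower code */
        rc = example_parse_flow_rule(edev, flow->rule, &t);

        ethtool_rx_flow_rule_destroy(flow);
        return rc;
}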

Signed-off-by: Pablo Neira Ayuso 
---
v5: rebase on top of net-next head.

 drivers/net/ethernet/qlogic/qede/qede_filter.c | 279 +++--
 1 file changed, 76 insertions(+), 203 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c 
b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index ed77950f6cf9..37c0651184ce 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -1665,132 +1665,6 @@ static int qede_set_v6_tuple_to_profile(struct qede_dev 
*edev,
return 0;
 }
 
-static int qede_flow_spec_to_tuple_ipv4_common(struct qede_dev *edev,
-  struct qede_arfs_tuple *t,
-  struct ethtool_rx_flow_spec *fs)
-{
-   if ((fs->h_u.tcp_ip4_spec.ip4src &
-fs->m_u.tcp_ip4_spec.ip4src) != fs->h_u.tcp_ip4_spec.ip4src) {
-   DP_INFO(edev, "Don't support IP-masks\n");
-   return -EOPNOTSUPP;
-   }
-
-   if ((fs->h_u.tcp_ip4_spec.ip4dst &
-fs->m_u.tcp_ip4_spec.ip4dst) != fs->h_u.tcp_ip4_spec.ip4dst) {
-   DP_INFO(edev, "Don't support IP-masks\n");
-   return -EOPNOTSUPP;
-   }
-
-   if ((fs->h_u.tcp_ip4_spec.psrc &
-fs->m_u.tcp_ip4_spec.psrc) != fs->h_u.tcp_ip4_spec.psrc) {
-   DP_INFO(edev, "Don't support port-masks\n");
-   return -EOPNOTSUPP;
-   }
-
-   if ((fs->h_u.tcp_ip4_spec.pdst &
-fs->m_u.tcp_ip4_spec.pdst) != fs->h_u.tcp_ip4_spec.pdst) {
-   DP_INFO(edev, "Don't support port-masks\n");
-   return -EOPNOTSUPP;
-   }
-
-   if (fs->h_u.tcp_ip4_spec.tos) {
-   DP_INFO(edev, "Don't support tos\n");
-   return -EOPNOTSUPP;
-   }
-
-   t->eth_proto = htons(ETH_P_IP);
-   t->src_ipv4 = fs->h_u.tcp_ip4_spec.ip4src;
-   t->dst_ipv4 = fs->h_u.tcp_ip4_spec.ip4dst;
-   t->src_port = fs->h_u.tcp_ip4_spec.psrc;
-   t->dst_port = fs->h_u.tcp_ip4_spec.pdst;
-
-   return qede_set_v4_tuple_to_profile(edev, t);
-}
-
-static int qede_flow_spec_to_tuple_tcpv4(struct qede_dev *edev,
-struct qede_arfs_tuple *t,
-struct ethtool_rx_flow_spec *fs)
-{
-   t->ip_proto = IPPROTO_TCP;
-
-   if (qede_flow_spec_to_tuple_ipv4_common(edev, t, fs))
-   return -EINVAL;
-
-   return 0;
-}
-
-static int qede_flow_spec_to_tuple_udpv4(struct qede_dev *edev,
-struct qede_arfs_tuple *t,
-struct ethtool_rx_flow_spec *fs)
-{
-   t->ip_proto = IPPROTO_UDP;
-
-   if (qede_flow_spec_to_tuple_ipv4_common(edev, t, fs))
-   return -EINVAL;
-
-   return 0;
-}
-
-static int qede_flow_spec_to_tuple_ipv6_common(struct qede_dev *edev,
-  struct qede_arfs_tuple *t,
-  struct ethtool_rx_flow_spec *fs)
-{
-   struct in6_addr zero_addr;
-
-   memset(_addr, 0, sizeof(zero_addr));
-
-   if ((fs->h_u.tcp_ip6_spec.psrc &
-fs->m_u.tcp_ip6_spec.psrc) != fs->h_u.tcp_ip6_spec.psrc) {
-   DP_INFO(edev, "Don't support port-masks\n");
-   return -EOPNOTSUPP;
-   }
-
-   if ((fs->h_u.tcp_ip6_spec.pdst &
-fs->m_u.tcp_ip6_spec.pdst) != fs->h_u.tcp_ip6_spec.pdst) {
-   DP_INFO(edev, "Don't support port-masks\n");
-   return -EOPNOTSUPP;
-   }
-
-   if (fs->h_u.tcp_ip6_spec.tclass) {
-   DP_INFO(edev, "Don't support tclass\n");
-   return -EOPNOTSUPP;
-   }
-
-   t->eth_proto = htons(ETH_P_IPV6);
-   memcpy(>src_ipv6, >h_u.tcp_ip6_spec.ip6src,
-  sizeof(struct in6_addr));
-   memcpy(>dst_ipv6, >h_u.tcp_ip6_spec.ip6dst,
-  sizeof(struct in6_addr));
-   t->src_port = fs->h_u.tcp_ip6_spec.psrc;
-   t->dst_port = fs->h_u.tcp_ip6_spec.pdst;
-
-   return qede_set_v6_tuple_to_profile(edev, t, _addr);
-}
-
-static int qede_flow_spec_to_tuple_tcpv6(struct qede_dev *edev,
-struct qede_arfs_tuple *t,
-

[PATCH net-next,v5 09/12] ethtool: add ethtool_rx_flow_spec to flow_rule structure translator

2018-12-06 Thread Pablo Neira Ayuso
This patch adds a function to translate the ethtool_rx_flow_spec
structure to the flow_rule representation.

This allows us to reuse code from the driver side given that both flower
and ethtool_rx_flow interfaces use the same representation.

This patch also includes support for the flow type flags FLOW_EXT,
FLOW_MAC_EXT and FLOW_RSS.

The ethtool_rx_flow_spec_input wrapper structure is used to convey the
ethtool_rx_flow_spec structure together with the rss_context field, which
lives outside of the ethtool_rx_flow_spec structure.
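
Usage from a driver then follows this pattern (sketch; example_program_hw()
is a placeholder for whatever code already consumes a flow_rule for
cls_flower offload):

static int example_set_rxnfc(struct net_device *dev,
                             struct ethtool_rxnfc *info)
{
        struct ethtool_rx_flow_spec_input input = {
                .fs      = &info->fs,
                .rss_ctx = info->rss_context,  /* only meaningful with FLOW_RSS */
        };
        struct ethtool_rx_flow_rule *flow;
        int err;

        flow = ethtool_rx_flow_rule_create(&input);
        if (IS_ERR(flow))
                return PTR_ERR(flow);

        err = example_program_hw(dev, flow->rule);

        ethtool_rx_flow_rule_destroy(flow);
        return err;
}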

Signed-off-by: Pablo Neira Ayuso 
---
v5: support for FLOW_RSS flowtype flag and set rss context in queue action.
Suggested by Michal Kubecek.

 include/linux/ethtool.h |  15 +++
 net/core/ethtool.c  | 240 
 2 files changed, 255 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index afd9596ce636..19a8de5326fb 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -400,4 +400,19 @@ struct ethtool_ops {
void(*get_ethtool_phy_stats)(struct net_device *,
 struct ethtool_stats *, u64 *);
 };
+
+struct ethtool_rx_flow_rule {
+   struct flow_rule*rule;
+   unsigned long   priv[0];
+};
+
+struct ethtool_rx_flow_spec_input {
+   const struct ethtool_rx_flow_spec   *fs;
+   u32 rss_ctx;
+};
+
+struct ethtool_rx_flow_rule *
+ethtool_rx_flow_rule_create(const struct ethtool_rx_flow_spec_input *input);
+void ethtool_rx_flow_rule_destroy(struct ethtool_rx_flow_rule *rule);
+
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index d05402868575..2711d0737d3f 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Some useful ethtool_ops methods that're device independent.
@@ -2808,3 +2809,242 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
 
return rc;
 }
+
+struct ethtool_rx_flow_key {
+   struct flow_dissector_key_basic basic;
+   union {
+   struct flow_dissector_key_ipv4_addrsipv4;
+   struct flow_dissector_key_ipv6_addrsipv6;
+   };
+   struct flow_dissector_key_ports tp;
+   struct flow_dissector_key_ipip;
+   struct flow_dissector_key_vlan  vlan;
+   struct flow_dissector_key_eth_addrs eth_addrs;
+} __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
+
+struct ethtool_rx_flow_match {
+   struct flow_dissector   dissector;
+   struct ethtool_rx_flow_key  key;
+   struct ethtool_rx_flow_key  mask;
+};
+
+struct ethtool_rx_flow_rule *
+ethtool_rx_flow_rule_create(const struct ethtool_rx_flow_spec_input *input)
+{
+   const struct ethtool_rx_flow_spec *fs = input->fs;
+   static struct in6_addr zero_addr = {};
+   struct ethtool_rx_flow_match *match;
+   struct ethtool_rx_flow_rule *flow;
+   struct flow_action_entry *act;
+
+   flow = kzalloc(sizeof(struct ethtool_rx_flow_rule) +
+  sizeof(struct ethtool_rx_flow_match), GFP_KERNEL);
+   if (!flow)
+   return ERR_PTR(-ENOMEM);
+
+   /* ethtool_rx supports only one single action per rule. */
+   flow->rule = flow_rule_alloc(1);
+   if (!flow->rule) {
+   kfree(flow);
+   return ERR_PTR(-ENOMEM);
+   }
+
+   match = (struct ethtool_rx_flow_match *)flow->priv;
+   flow->rule->match.dissector = >dissector;
+   flow->rule->match.mask  = >mask;
+   flow->rule->match.key   = >key;
+
+   match->mask.basic.n_proto = htons(0x);
+
+   switch (fs->flow_type & ~(FLOW_EXT | FLOW_MAC_EXT | FLOW_RSS)) {
+   case TCP_V4_FLOW:
+   case UDP_V4_FLOW: {
+   const struct ethtool_tcpip4_spec *v4_spec, *v4_m_spec;
+
+   match->key.basic.n_proto = htons(ETH_P_IP);
+
+   v4_spec = >h_u.tcp_ip4_spec;
+   v4_m_spec = >m_u.tcp_ip4_spec;
+
+   if (v4_m_spec->ip4src) {
+   match->key.ipv4.src = v4_spec->ip4src;
+   match->mask.ipv4.src = v4_m_spec->ip4src;
+   }
+   if (v4_m_spec->ip4dst) {
+   match->key.ipv4.dst = v4_spec->ip4dst;
+   match->mask.ipv4.dst = v4_m_spec->ip4dst;
+   }
+   if (v4_m_spec->ip4src ||
+   v4_m_spec->ip4dst) {
+   match->dissector.used_keys |=
+   BIT(FLOW_DISSECTOR_KEY_IPV4_ADDRS);
+   match->dissector.offset[FLOW_DISSECTOR_KEY_IPV4_ADDRS] =
+   offsetof(struct ethtool_rx_flow_key, ipv4);
+   }
+   if (v4_m_spec->psrc) {
+   

[PATCH net-next,v5 06/12] drivers: net: use flow action infrastructure

2018-12-06 Thread Pablo Neira Ayuso
This patch updates drivers to use the new flow action infrastructure.

Signed-off-by: Pablo Neira Ayuso 
---
v5: rebase on top of net-next head.

 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c   |  74 +++---
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 250 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 266 ++---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c |   2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  |  54 +++--
 drivers/net/ethernet/netronome/nfp/flower/action.c | 187 ---
 drivers/net/ethernet/qlogic/qede/qede_filter.c |  12 +-
 7 files changed, 418 insertions(+), 427 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 09cd75f54eba..b7bd27edd80e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -61,9 +61,9 @@ static u16 bnxt_flow_get_dst_fid(struct bnxt *pf_bp, struct 
net_device *dev)
 
 static int bnxt_tc_parse_redir(struct bnxt *bp,
   struct bnxt_tc_actions *actions,
-  const struct tc_action *tc_act)
+  const struct flow_action_entry *act)
 {
-   struct net_device *dev = tcf_mirred_dev(tc_act);
+   struct net_device *dev = act->dev;
 
if (!dev) {
netdev_info(bp->dev, "no dev in mirred action");
@@ -77,16 +77,16 @@ static int bnxt_tc_parse_redir(struct bnxt *bp,
 
 static int bnxt_tc_parse_vlan(struct bnxt *bp,
  struct bnxt_tc_actions *actions,
- const struct tc_action *tc_act)
+ const struct flow_action_entry *act)
 {
-   switch (tcf_vlan_action(tc_act)) {
-   case TCA_VLAN_ACT_POP:
+   switch (act->id) {
+   case FLOW_ACTION_VLAN_POP:
actions->flags |= BNXT_TC_ACTION_FLAG_POP_VLAN;
break;
-   case TCA_VLAN_ACT_PUSH:
+   case FLOW_ACTION_VLAN_PUSH:
actions->flags |= BNXT_TC_ACTION_FLAG_PUSH_VLAN;
-   actions->push_vlan_tci = htons(tcf_vlan_push_vid(tc_act));
-   actions->push_vlan_tpid = tcf_vlan_push_proto(tc_act);
+   actions->push_vlan_tci = htons(act->vlan.vid);
+   actions->push_vlan_tpid = act->vlan.proto;
break;
default:
return -EOPNOTSUPP;
@@ -96,10 +96,10 @@ static int bnxt_tc_parse_vlan(struct bnxt *bp,
 
 static int bnxt_tc_parse_tunnel_set(struct bnxt *bp,
struct bnxt_tc_actions *actions,
-   const struct tc_action *tc_act)
+   const struct flow_action_entry *act)
 {
-   struct ip_tunnel_info *tun_info = tcf_tunnel_info(tc_act);
-   struct ip_tunnel_key *tun_key = _info->key;
+   const struct ip_tunnel_info *tun_info = act->tunnel;
+   const struct ip_tunnel_key *tun_key = _info->key;
 
if (ip_tunnel_info_af(tun_info) != AF_INET) {
netdev_info(bp->dev, "only IPv4 tunnel-encap is supported");
@@ -113,51 +113,43 @@ static int bnxt_tc_parse_tunnel_set(struct bnxt *bp,
 
 static int bnxt_tc_parse_actions(struct bnxt *bp,
 struct bnxt_tc_actions *actions,
-struct tcf_exts *tc_exts)
+struct flow_action *flow_action)
 {
-   const struct tc_action *tc_act;
+   struct flow_action_entry *act;
int i, rc;
 
-   if (!tcf_exts_has_actions(tc_exts)) {
+   if (!flow_action_has_entries(flow_action)) {
netdev_info(bp->dev, "no actions");
return -EINVAL;
}
 
-   tcf_exts_for_each_action(i, tc_act, tc_exts) {
-   /* Drop action */
-   if (is_tcf_gact_shot(tc_act)) {
+   flow_action_for_each(i, act, flow_action) {
+   switch (act->id) {
+   case FLOW_ACTION_DROP:
actions->flags |= BNXT_TC_ACTION_FLAG_DROP;
return 0; /* don't bother with other actions */
-   }
-
-   /* Redirect action */
-   if (is_tcf_mirred_egress_redirect(tc_act)) {
-   rc = bnxt_tc_parse_redir(bp, actions, tc_act);
+   case FLOW_ACTION_REDIRECT:
+   rc = bnxt_tc_parse_redir(bp, actions, act);
if (rc)
return rc;
-   continue;
-   }
-
-   /* Push/pop VLAN */
-   if (is_tcf_vlan(tc_act)) {
-   rc = bnxt_tc_parse_vlan(bp, actions, tc_act);
+   break;
+   case FLOW_ACTION_VLAN_POP:
+   case FLOW_ACTION_VLAN_PUSH:
+   case FLOW_ACTION_VLAN_MANGLE:
+   rc = bnxt_tc_parse_vlan(bp, 

[PATCH net-next,v5 08/12] flow_offload: add wake-up-on-lan and queue to flow_action

2018-12-06 Thread Pablo Neira Ayuso
These actions need to be added to support the ethtool_rx_flow interface.
The queue action includes a field to specify the RSS context, which is
set via the FLOW_RSS flow type flag and the rss_context field in struct
ethtool_rxnfc, plus the corresponding queue index. FLOW_RSS implies that
rss_context is non-zero; therefore, queue.ctx == 0 means that FLOW_RSS
was not set.
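
A driver consuming the translated rule would handle the new entries roughly
as follows (sketch; example_priv and the example_set_*() calls are
placeholders for the hardware programming, and an ethtool rule carries
exactly one action):

static int example_parse_action(struct example_priv *priv,
                                struct flow_rule *rule)
{
        const struct flow_action_entry *act = &rule->action.entries[0];

        switch (act->id) {
        case FLOW_ACTION_DROP:
                return example_set_drop(priv);
        case FLOW_ACTION_WAKE:
                return example_arm_wake_filter(priv);
        case FLOW_ACTION_QUEUE:
                if (act->queue.ctx)     /* FLOW_RSS was set */
                        return example_set_rss_ctx(priv, act->queue.ctx);
                return example_set_queue(priv, act->queue.index);
        default:
                return -EOPNOTSUPP;
        }
}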

Signed-off-by: Pablo Neira Ayuso 
---
v5: add queue structure and context field, per Michal Kubecek.

 include/net/flow_offload.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index f9ce39992dbd..6489fb9eb394 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -116,6 +116,8 @@ enum flow_action_id {
FLOW_ACTION_ADD,
FLOW_ACTION_CSUM,
FLOW_ACTION_MARK,
+   FLOW_ACTION_WAKE,
+   FLOW_ACTION_QUEUE,
 };
 
 /* This is mirroring enum pedit_header_type definition for easy mapping between
@@ -150,6 +152,10 @@ struct flow_action_entry {
const struct ip_tunnel_info *tunnel;/* 
FLOW_ACTION_TUNNEL_ENCAP */
u32 csum_flags; /* FLOW_ACTION_CSUM */
u32 mark;   /* FLOW_ACTION_MARK */
+   struct {/* FLOW_ACTION_QUEUE */
+   u32 ctx;
+   u32 index;
+   } queue;
};
 };
 
-- 
2.11.0



[PATCH net-next,v5 02/12] net/mlx5e: support for two independent packet edit actions

2018-12-06 Thread Pablo Neira Ayuso
This patch adds a pedit_headers_action structure to store the result of
parsing tc pedit actions. Then, it calls alloc_tc_pedit_action() to
populate the mlx5e hardware intermediate representation once all actions
have been parsed.

This patch comes in preparation for the new flow_action infrastructure,
where each packet mangle comes as a separate action, i.e. not packed
together as in tc pedit.
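
The effect is that each pedit command gets its own accumulator and nothing
is translated until the whole action list has been walked. Illustrative
fragment only (mask/val/offset values are placeholders, error checks
omitted):

        struct pedit_headers_action hdrs[__PEDIT_CMD_MAX] = {};

        /* two independent "set" mangles on the IPv4 header land in the
         * same slot, keyed by the pedit command
         */
        err = set_pedit_val(TCA_PEDIT_KEY_EX_HDR_TYPE_IP4, mask1, val1, off1,
                            &hdrs[TCA_PEDIT_KEY_EX_CMD_SET]);
        err = set_pedit_val(TCA_PEDIT_KEY_EX_HDR_TYPE_IP4, mask2, val2, off2,
                            &hdrs[TCA_PEDIT_KEY_EX_CMD_SET]);

        /* only once all actions are parsed is the mod_hdr array sized and
         * filled from the accumulated masks and values
         */
        err = alloc_mod_hdr_actions(priv, hdrs, MLX5_FLOW_NAMESPACE_KERNEL,
                                    parse_attr);
        err = offload_pedit_fields(hdrs, parse_attr, extack);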

Signed-off-by: Pablo Neira Ayuso 
---
v5: rebase on top of net-next head.

 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 81 ++---
 1 file changed, 59 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index d2e6c6578b9c..1daaab91280f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1751,6 +1751,12 @@ struct pedit_headers {
struct udphdr  udp;
 };
 
+struct pedit_headers_action {
+   struct pedit_headersvals;
+   struct pedit_headersmasks;
+   u32 pedits;
+};
+
 static int pedit_header_offsets[] = {
[TCA_PEDIT_KEY_EX_HDR_TYPE_ETH] = offsetof(struct pedit_headers, eth),
[TCA_PEDIT_KEY_EX_HDR_TYPE_IP4] = offsetof(struct pedit_headers, ip4),
@@ -1762,16 +1768,15 @@ static int pedit_header_offsets[] = {
 #define pedit_header(_ph, _htype) ((void *)(_ph) + 
pedit_header_offsets[_htype])
 
 static int set_pedit_val(u8 hdr_type, u32 mask, u32 val, u32 offset,
-struct pedit_headers *masks,
-struct pedit_headers *vals)
+struct pedit_headers_action *hdrs)
 {
u32 *curr_pmask, *curr_pval;
 
if (hdr_type >= __PEDIT_HDR_TYPE_MAX)
goto out_err;
 
-   curr_pmask = (u32 *)(pedit_header(masks, hdr_type) + offset);
-   curr_pval  = (u32 *)(pedit_header(vals, hdr_type) + offset);
+   curr_pmask = (u32 *)(pedit_header(>masks, hdr_type) + offset);
+   curr_pval  = (u32 *)(pedit_header(>vals, hdr_type) + offset);
 
if (*curr_pmask & mask)  /* disallow acting twice on the same location 
*/
goto out_err;
@@ -1827,8 +1832,7 @@ static struct mlx5_fields fields[] = {
  * max from the SW pedit action. On success, it says how many HW actions were
  * actually parsed.
  */
-static int offload_pedit_fields(struct pedit_headers *masks,
-   struct pedit_headers *vals,
+static int offload_pedit_fields(struct pedit_headers_action *hdrs,
struct mlx5e_tc_flow_parse_attr *parse_attr,
struct netlink_ext_ack *extack)
 {
@@ -1843,10 +1847,10 @@ static int offload_pedit_fields(struct pedit_headers 
*masks,
__be16 mask_be16;
void *action;
 
-   set_masks = [TCA_PEDIT_KEY_EX_CMD_SET];
-   add_masks = [TCA_PEDIT_KEY_EX_CMD_ADD];
-   set_vals = [TCA_PEDIT_KEY_EX_CMD_SET];
-   add_vals = [TCA_PEDIT_KEY_EX_CMD_ADD];
+   set_masks = [TCA_PEDIT_KEY_EX_CMD_SET].masks;
+   add_masks = [TCA_PEDIT_KEY_EX_CMD_ADD].masks;
+   set_vals = [TCA_PEDIT_KEY_EX_CMD_SET].vals;
+   add_vals = [TCA_PEDIT_KEY_EX_CMD_ADD].vals;
 
action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
action = parse_attr->mod_hdr_actions;
@@ -1942,12 +1946,14 @@ static int offload_pedit_fields(struct pedit_headers 
*masks,
 }
 
 static int alloc_mod_hdr_actions(struct mlx5e_priv *priv,
-const struct tc_action *a, int namespace,
+struct pedit_headers_action *hdrs,
+int namespace,
 struct mlx5e_tc_flow_parse_attr *parse_attr)
 {
int nkeys, action_size, max_actions;
 
-   nkeys = tcf_pedit_nkeys(a);
+   nkeys = hdrs[TCA_PEDIT_KEY_EX_CMD_SET].pedits +
+   hdrs[TCA_PEDIT_KEY_EX_CMD_ADD].pedits;
action_size = MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto);
 
if (namespace == MLX5_FLOW_NAMESPACE_FDB) /* FDB offloading */
@@ -1971,18 +1977,15 @@ static const struct pedit_headers zero_masks = {};
 static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 const struct tc_action *a, int namespace,
 struct mlx5e_tc_flow_parse_attr *parse_attr,
+struct pedit_headers_action *hdrs,
 struct netlink_ext_ack *extack)
 {
-   struct pedit_headers masks[__PEDIT_CMD_MAX], vals[__PEDIT_CMD_MAX], 
*cmd_masks;
int nkeys, i, err = -EOPNOTSUPP;
u32 mask, val, offset;
u8 cmd, htype;
 
nkeys = tcf_pedit_nkeys(a);
 
-   memset(masks, 0, sizeof(struct pedit_headers) * __PEDIT_CMD_MAX);
-   memset(vals,  0, sizeof(struct pedit_headers) * __PEDIT_CMD_MAX);
-
for (i = 0; i < nkeys; i++) {
htype = tcf_pedit_htype(a, i);
  

[PATCH net-next,v5 03/12] flow_offload: add flow action infrastructure

2018-12-06 Thread Pablo Neira Ayuso
This new infrastructure defines the NIC actions that you can perform
from existing network drivers. It allows us to avoid a direct dependency
on the native software TC action representation.
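
For example, a driver can walk the translated actions without touching
struct tc_action at all (sketch; the pr_debug() calls stand in for real
hardware programming):

static int example_walk_actions(const struct flow_action *flow_action)
{
        const struct flow_action_entry *act;
        int i;

        if (!flow_action_has_entries(flow_action))
                return -EINVAL;

        flow_action_for_each(i, act, flow_action) {
                switch (act->id) {
                case FLOW_ACTION_DROP:
                        pr_debug("drop\n");
                        break;
                case FLOW_ACTION_REDIRECT:
                        pr_debug("redirect to %s\n", act->dev->name);
                        break;
                case FLOW_ACTION_VLAN_PUSH:
                        pr_debug("push vlan %u proto 0x%04x prio %u\n",
                                 act->vlan.vid, ntohs(act->vlan.proto),
                                 act->vlan.prio);
                        break;
                default:
                        return -EOPNOTSUPP;
                }
        }
        return 0;
}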

Signed-off-by: Pablo Neira Ayuso 
---
v5: rebase on top of net-next head.

 include/net/flow_offload.h | 69 +-
 include/net/pkt_cls.h  |  2 ++
 net/core/flow_offload.c| 14 --
 net/sched/cls_api.c| 17 
 net/sched/cls_flower.c |  7 +++--
 5 files changed, 103 insertions(+), 6 deletions(-)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 461c66595763..dabc819b6cc9 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -100,11 +100,78 @@ void flow_rule_match_enc_keyid(const struct flow_rule 
*rule,
 void flow_rule_match_enc_opts(const struct flow_rule *rule,
  struct flow_match_enc_opts *out);
 
+enum flow_action_id {
+   FLOW_ACTION_ACCEPT  = 0,
+   FLOW_ACTION_DROP,
+   FLOW_ACTION_TRAP,
+   FLOW_ACTION_GOTO,
+   FLOW_ACTION_REDIRECT,
+   FLOW_ACTION_MIRRED,
+   FLOW_ACTION_VLAN_PUSH,
+   FLOW_ACTION_VLAN_POP,
+   FLOW_ACTION_VLAN_MANGLE,
+   FLOW_ACTION_TUNNEL_ENCAP,
+   FLOW_ACTION_TUNNEL_DECAP,
+   FLOW_ACTION_MANGLE,
+   FLOW_ACTION_ADD,
+   FLOW_ACTION_CSUM,
+   FLOW_ACTION_MARK,
+};
+
+/* This is mirroring enum pedit_header_type definition for easy mapping between
+ * tc pedit action. Legacy TCA_PEDIT_KEY_EX_HDR_TYPE_NETWORK is mapped to
+ * FLOW_ACT_MANGLE_UNSPEC, which is supported by no driver.
+ */
+enum flow_action_mangle_base {
+   FLOW_ACT_MANGLE_UNSPEC  = 0,
+   FLOW_ACT_MANGLE_HDR_TYPE_ETH,
+   FLOW_ACT_MANGLE_HDR_TYPE_IP4,
+   FLOW_ACT_MANGLE_HDR_TYPE_IP6,
+   FLOW_ACT_MANGLE_HDR_TYPE_TCP,
+   FLOW_ACT_MANGLE_HDR_TYPE_UDP,
+};
+
+struct flow_action_entry {
+   enum flow_action_id id;
+   union {
+   u32 chain_index;/* FLOW_ACTION_GOTO */
+   struct net_device   *dev;   /* FLOW_ACTION_REDIRECT 
*/
+   struct {/* FLOW_ACTION_VLAN */
+   u16 vid;
+   __be16  proto;
+   u8  prio;
+   } vlan;
+   struct {/* 
FLOW_ACTION_PACKET_EDIT */
+   enum flow_action_mangle_base htype;
+   u32 offset;
+   u32 mask;
+   u32 val;
+   } mangle;
+   const struct ip_tunnel_info *tunnel;/* 
FLOW_ACTION_TUNNEL_ENCAP */
+   u32 csum_flags; /* FLOW_ACTION_CSUM */
+   u32 mark;   /* FLOW_ACTION_MARK */
+   };
+};
+
+struct flow_action {
+   unsigned intnum_entries;
+   struct flow_action_entryentries[0];
+};
+
+static inline bool flow_action_has_entries(const struct flow_action *action)
+{
+   return action->num_entries;
+}
+
+#define flow_action_for_each(__i, __act, __actions)\
+for (__i = 0, __act = &(__actions)->entries[0]; __i < 
(__actions)->num_entries; __act = &(__actions)->entries[__i++])
+
 struct flow_rule {
struct flow_match   match;
+   struct flow_action  action;
 };
 
-struct flow_rule *flow_rule_alloc(void);
+struct flow_rule *flow_rule_alloc(unsigned int num_actions);
 
 static inline bool flow_rule_match_key(const struct flow_rule *rule,
   enum flow_dissector_key_id key)
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 359876ee32be..9ceac97e5eff 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -620,6 +620,8 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 #endif /* CONFIG_NET_CLS_IND */
 
+unsigned int tcf_exts_num_actions(struct tcf_exts *exts);
+
 int tc_setup_cb_call(struct tcf_block *block, struct tcf_exts *exts,
 enum tc_setup_type type, void *type_data, bool err_stop);
 
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 2fbf6903d2f6..c3a00eac4804 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -3,9 +3,19 @@
 #include 
 #include 
 
-struct flow_rule *flow_rule_alloc(void)
+struct flow_rule *flow_rule_alloc(unsigned int num_actions)
 {
-   return kzalloc(sizeof(struct flow_rule), GFP_KERNEL);
+   struct flow_rule *rule;
+
+   rule = kzalloc(sizeof(struct flow_rule) +
+  sizeof(struct flow_action_entry) * num_actions,
+  GFP_KERNEL);
+   if (!rule)
+   return NULL;
+
+   rule->action.num_entries = num_actions;
+
+   return rule;
 }
 

[PATCH net-next,v5 00/12] add flow_rule infrastructure

2018-12-06 Thread Pablo Neira Ayuso
Hi,

This is another iteration of the in-kernel intermediate representation
(IR) that allows ACL hardware offloads to be expressed through one
unified representation on the driver side, for both the ethtool and tc
frontends [1] [2] [3].

In words of Michal Kubecek:

"... the ethtool interface can apply four types of action to matching
packets:

  - put into a specific queue
  - discard
  - distribute across the queues according to an RSS context
  - use the rule as a wake-on-lan filter"

This new round now supports these four types, which can be mapped to
the flow_rule representation; a rough sketch of the mapping follows.
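
Sketch only (the real translation lives in ethtool_rx_flow_rule_create()
added in patch 9; the RX_CLS_FLOW_WAKE handling is an assumption of this
illustration):

static void example_map_action(struct ethtool_rx_flow_rule *flow,
                               const struct ethtool_rx_flow_spec *fs,
                               const struct ethtool_rx_flow_spec_input *input)
{
        struct flow_action_entry *act = &flow->rule->action.entries[0];

        switch (fs->ring_cookie) {
        case RX_CLS_FLOW_DISC:
                act->id = FLOW_ACTION_DROP;
                break;
        case RX_CLS_FLOW_WAKE:
                act->id = FLOW_ACTION_WAKE;
                break;
        default:
                act->id = FLOW_ACTION_QUEUE;
                if (fs->flow_type & FLOW_RSS)  /* spread over an RSS context */
                        act->queue.ctx = input->rss_ctx;
                act->queue.index = ethtool_get_flow_spec_ring(fs->ring_cookie);
                break;
        }
}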

Changes from previous version:

* Michal Kubecek:
- Add support for FLOW_RSS flag to the ethtool_rx_flow_spec
  to flow_rule translator.

* Venkat Duvvuru:
- Fix accidental swapping of flow_stats_update() bytes and
  packets parameter.

* kbuild robot:
- Fix double kfree in error path from cls_flower, via Julia Lawall.
- Fix enum type mismatch in nfp driver reported by sparse
  checks.

Please apply, thanks.

Pablo Neira Ayuso (12):
  flow_offload: add flow_rule and flow_match structures and use them
  net/mlx5e: support for two independent packet edit actions
  flow_offload: add flow action infrastructure
  cls_api: add translator to flow_action representation
  flow_offload: add statistics retrieval infrastructure and use it
  drivers: net: use flow action infrastructure
  cls_flower: don't expose TC actions to drivers anymore
  flow_offload: add wake-up-on-lan and queue to flow_action
  ethtool: add ethtool_rx_flow_spec to flow_rule structure translator
  dsa: bcm_sf2: use flow_rule infrastructure
  qede: place ethtool_rx_flow_spec after code after TC flower codebase
  qede: use ethtool_rx_flow_rule() to remove duplicated parser code

 drivers/net/dsa/bcm_sf2_cfp.c  | 102 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c   | 252 +++
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 450 ++---
 drivers/net/ethernet/intel/i40e/i40e_main.c| 178 ++---
 drivers/net/ethernet/intel/iavf/iavf_main.c| 195 +++---
 drivers/net/ethernet/intel/igb/igb_main.c  |  64 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c| 743 ++---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c |   2 +-
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  | 258 ---
 drivers/net/ethernet/netronome/nfp/flower/action.c | 198 +++---
 drivers/net/ethernet/netronome/nfp/flower/match.c  | 417 ++--
 .../net/ethernet/netronome/nfp/flower/offload.c| 150 ++---
 drivers/net/ethernet/qlogic/qede/qede_filter.c | 572 ++--
 include/linux/ethtool.h|  15 +
 include/net/flow_offload.h | 202 ++
 include/net/pkt_cls.h  |  18 +-
 net/core/Makefile  |   2 +-
 net/core/ethtool.c | 240 +++
 net/core/flow_offload.c| 153 +
 net/sched/cls_api.c| 116 
 net/sched/cls_flower.c |  71 +-
 21 files changed, 2403 insertions(+), 1995 deletions(-)
 create mode 100644 include/net/flow_offload.h
 create mode 100644 net/core/flow_offload.c

-- 
2.11.0



[PATCH net-next] neighbour: Improve garbage collection

2018-12-06 Thread David Ahern
From: David Ahern 

The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago, gc kicks in
   and walks the entire hash table, evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

            -----|----------x|-----------|-----
                t-5          t          t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is overkill and contributes to thrashing,
   especially during startup.

This patch addresses these problems as follows:
1. use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.
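
A minimal sketch of the resulting forced gc walk (not the patch verbatim;
locking and refcounting are omitted):

static int example_forced_gc(struct neigh_table *tbl)
{
        unsigned long tref = jiffies - 5 * HZ;
        struct neighbour *n, *tmp;
        int shrunk = 0;

        list_for_each_entry_safe(n, tmp, &tbl->gc_list, gc_list) {
                /* PERMANENT entries are never put on gc_list */
                if (time_after(tref, n->updated) &&
                    !(n->nud_state & (NUD_PERMANENT | NUD_IN_TIMER))) {
                        neigh_remove_one(n, tbl);
                        shrunk++;
                }
                /* stop once we are back under gc_thresh2 */
                if (atomic_read(&tbl->gc_entries) <= tbl->gc_thresh2)
                        break;
        }
        return shrunk;
}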

Signed-off-by: David Ahern 
---
 Documentation/networking/ip-sysctl.txt |   4 +-
 include/net/neighbour.h|   4 ++
 net/core/neighbour.c   | 122 +++--
 3 files changed, 93 insertions(+), 37 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index af2a69439b93..acdfb5d2bcaa 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -108,8 +108,8 @@ neigh/default/gc_thresh2 - INTEGER
Default: 512
 
 neigh/default/gc_thresh3 - INTEGER
-   Maximum number of neighbor entries allowed.  Increase this
-   when using large numbers of interfaces and when communicating
+   Maximum number of non-PERMANENT neighbor entries allowed.  Increase
+   this when using large numbers of interfaces and when communicating
with large numbers of directly-connected peers.
Default: 1024
 
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index f58b384aa6c9..846ad8da91eb 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -154,6 +154,8 @@ struct neighbour {
struct hh_cache hh;
int (*output)(struct neighbour *, struct sk_buff *);
const struct neigh_ops  *ops;
+   struct list_headgc_list;
+   boolon_gc_list;
struct rcu_head rcu;
struct net_device   *dev;
u8  primary_key[0];
@@ -214,6 +216,8 @@ struct neigh_table {
struct timer_list   proxy_timer;
struct sk_buff_head proxy_queue;
atomic_tentries;
+   atomic_tgc_entries;
+   struct list_headgc_list;
rwlock_tlock;
unsigned long   last_rand;
struct neigh_statistics __percpu *stats;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 6d479b5562be..ab11e94ec44d 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -118,6 +118,36 @@ unsigned long 

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-06 Thread Tony Lindgren
* Russell King - ARM Linux  [181206 18:08]:
> reverted, the problem is still there.  Revert:
> 
> ec0daae685b2 ("gpio: omap: Add level wakeup handling for omap4 based SoCs")
> 
> on top, and networking returns to normal.  So it appears to be this
> last commit causing the issue.
> 
> With that and b764a5863fd8 applied, it still misbehaves.  Then, poking
> at the OMAP4_GPIO_IRQWAKEN0 register, changing it from 0 to 4 with
> devmem2 restores normal behaviour - ping times are normal and NFS is
> happy.
> 
> # devmem2 0x48055044 w 4

OK thanks.

> Given that this GPIO device is not runtime suspended, and is
> permanently active (which is what I think we expect, given that it
> has an IRQ claimed against it) does the hardware still attempt to
> idle the GPIO block - if so, could that be why we need to program
> the wakeup register, so the GPIO block signals that it's active?

Yes we now idle non-irq GPIOs only from CPU_CLUSTER_PM_ENTER
as the selected cpuidle state triggers the domain transitions
with WFI. And that's why runtime_suspended_time does not increase
for a GPIO instance with IRQs.

I can reproduce the long ping latencies on duovero smsc connected
to gpio_44, I'll try to debug it more.

Regards,

Tony


Re: [PATCH net-next 2/2] net: dsa: Set the master device's MTU to account for DSA overheads

2018-12-06 Thread Stephen Hemminger
On Thu,  6 Dec 2018 11:36:05 +0100
Andrew Lunn  wrote:

> +void dsa_master_set_mtu(struct net_device *dev, struct dsa_port *cpu_dp)
> +{
> + unsigned int mtu = ETH_DATA_LEN + cpu_dp->tag_ops->overhead;
> + int err;
> +
> + rtnl_lock();
> + if (mtu <= dev->max_mtu) {
> + err = dev_set_mtu(dev, mtu);
> + if (err)
> + netdev_dbg(dev, "Unable to set MTU to include for DSA 
> overheads\n");
> + }
> + rtnl_unlock();
> +}
> +

You don't need the debug message. Use err_ack instead?

Debug messages are usually disabled in most distributions.


Re: [PATCH net-next 2/2] net: dsa: Set the master device's MTU to account for DSA overheads

2018-12-06 Thread David Miller
From: Andrew Lunn 
Date: Thu, 6 Dec 2018 21:48:46 +0100

> David has already accepted the patchset, so i will add a followup
> patch.

Yeah sorry for jumping the gun, the changes looked pretty
straightforward to me. :-/


Re: [PATCH net-next v2 0/8] Pass extack to NETDEV_PRE_UP

2018-12-06 Thread David Miller
From: Petr Machata 
Date: Thu, 6 Dec 2018 17:05:35 +

> Drivers may need to validate configuration of a device that's about to
> be upped. An example is mlxsw, which needs to check the configuration of
> a VXLAN device attached to an offloaded bridge. Should the validation
> fail, there's currently no way to communicate details of the failure to
> the user, beyond an error number.
> 
> Therefore this patch set extends the NETDEV_PRE_UP event to include
> extack, if available.
 ...

Series applied, thank you.


Re: [PATCH net 0/4] mlxsw: Various fixes

2018-12-06 Thread David Miller
From: Ido Schimmel 
Date: Thu, 6 Dec 2018 17:44:48 +

> Patches #1 and #2 fix two VxLAN related issues. The first patch removes
> warnings that can currently be triggered from user space. Second patch
> avoids leaking a FID in an error path.
> 
> Patch #3 fixes a too strict check that causes certain host routes not to
> be promoted to perform GRE decapsulation in hardware.
> 
> Last patch avoids a use-after-free when deleting a VLAN device via an
> ioctl when it is enslaved to a bridge. I have a patchset for net-next
> that reworks this code and makes the driver more robust.

Series applied.


Re: [PATCH net-next 2/2] net: dsa: Set the master device's MTU to account for DSA overheads

2018-12-06 Thread Andrew Lunn
On Thu, Dec 06, 2018 at 12:21:31PM -0800, Florian Fainelli wrote:
> On 12/6/18 2:36 AM, Andrew Lunn wrote:
> > DSA tagging of frames sent over the master interface to the switch
> > increases the size of the frame. Such frames can then be bigger than
> > the normal MTU of the master interface, and it may drop them. Use the
> > overhead information from the tagger to set the MTU of the master
> > device to include this overhead.
> > 
> > Signed-off-by: Andrew Lunn 
> > ---
> >  net/dsa/master.c | 16 
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/net/dsa/master.c b/net/dsa/master.c
> > index c90ee3227dea..42f525bc68e2 100644
> > --- a/net/dsa/master.c
> > +++ b/net/dsa/master.c
> > @@ -158,8 +158,24 @@ static void dsa_master_ethtool_teardown(struct 
> > net_device *dev)
> > cpu_dp->orig_ethtool_ops = NULL;
> >  }
> >  
> > +void dsa_master_set_mtu(struct net_device *dev, struct dsa_port *cpu_dp)
> > +{
> > +   unsigned int mtu = ETH_DATA_LEN + cpu_dp->tag_ops->overhead;
> > +   int err;
> > +
> > +   rtnl_lock();
> > +   if (mtu <= dev->max_mtu) {
> > +   err = dev_set_mtu(dev, mtu);
> > +   if (err)
> > +   netdev_dbg(dev, "Unable to set MTU to include for DSA 
> > overheads\n");
> > +   }
> 
> Would it make sense to warn the user that there might be
> transmit/receive issues with the DSA tagging protocol if either
> dev_set_mtu() fails or mtu > dev->max_mtu?

I thought about that. But we have setups which work today with the
standard MTU. The master might not implement the set_mtu op, or might
impose the standard MTU, but be quite happy to deal with our DSA
packets. So i wanted to make this a hint to the master device, not a
strong requirement.

> Not that I think it matters too much to people because unbinding the
> switch driver and expecting the CPU port to continue operating is
> wishful thinking, but we should probably unwind that operation in
> dsa_master_teardown(), right?

That would make sense.

David has already accepted the patchset, so i will add a followup
patch.

Andrew


Re: [PATCH RFC 2/6] net: dsa: microchip: Add MIB counter reading support

2018-12-06 Thread Andrew Lunn
> I wonder what is the official way to clear the counters.

I don't think there is one, other than unloading the driver and
loading it again.

Andrew


Re: [PATCH RFC 5/6] net: dsa: microchip: Update tag_ksz.c to access switch driver

2018-12-06 Thread Andrew Lunn
> I did try to implement this way.  But the other switches do not have the same
> format even though the length is the same.  Then I need to change the 
> following
> files for any new KSZ switch: include/linux/dsa.h, net/dsa/dsa.c, 
> net/dsa/dsa_priv.h,
> and finally net/dsa/tag_ksz.c.

You can always add two different tag drivers. They don't have to share
code if it does not make sense.

> Even then it will not work if Microchip wants to add 1588 PTP
> capability to the switches.
> 
> For KSZ9477 the length of the tail tag changes when the PTP function
> is enabled.  Typically this function is either enabled or disabled
> all the time, but if users want to change that during normal
> operation to see how the switch behaves, the transmit function
> completely stops working correctly.

We should figure out how to support PTP. I think that is the main
issue here.

> Older driver implementation is to monitor that register change and adjust the 
> length
> dynamically.
> 
> Another problem is the tail tag needs to include the timestamp for the 1-step
> Pdelay_Resp to have accurate turnaround time when that message is sent out by 
> the
> switch.  This will require access to the main switch driver which will keep 
> track of those
> PTP messages.
> 
> PTP handles transmit timestamp in skb_tx_timestamp, which is typically called 
> after the
> frame is sent, so it is too late.  DSA calls dsa_skb_tx_timestamp before 
> sending, but it
> only provides a clone to the driver that supports port_txstamp and so the 
> switch driver
> may not be able to do anything.

The current design assumes the hardware will insert the PTP timestamp
into the frame using the clock inside the hardware. You then ask it
what timestamp it actually used. 

If i understand you correctly, in your case, software was to provide
the timestamp which then gets inserted into the frame. So you want to
provide this timestamp as late as possible, when the frame reaches the
head of the queue and is about to be sent out the master interface?

> In dsa_switch_rcv() the CPU receive function is called first before
> dsa_skb_defer_rx_timestamp().  That means the receive tail tag
> operation has to be done first to retrieve the receive timestamp so
> that it can be passed later.

What i think you can do is in your tag rx function you can directly
add the timestamp info to the skbuf. The dsa driver function
.port_txtstamp can then always return false.

Your tag function is going to need access to some driver state, but
you should be able to get at that, following pointers, and placing
some of the structures in global headers.
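
As a very rough sketch of that idea (the tail tag layout, timestamp width
and conversion below are made up, not the real KSZ9477 format):

static struct sk_buff *ksz_ptp_rcv_sketch(struct sk_buff *skb,
					  struct net_device *dev)
{
	struct skb_shared_hwtstamps *hwts = skb_hwtstamps(skb);
	u32 raw;

	/* Assume the last 4 bytes of the tail tag carry the RX timestamp
	 * when PTP is enabled; trimming the tag happens as usual afterwards.
	 */
	raw = get_unaligned_be32(skb_tail_pointer(skb) - 4);

	memset(hwts, 0, sizeof(*hwts));
	hwts->hwtstamp = ns_to_ktime(raw);	/* conversion is chip specific */

	return skb;
}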

Andrew


Re: [PATCH 1/2] net: linkwatch: send change uevent on link changes

2018-12-06 Thread Jouke Witteveen
On Thu, Dec 6, 2018 at 9:10 PM David Miller  wrote:
>
> From: Jouke Witteveen 
> Date: Thu, 6 Dec 2018 09:59:20 +0100
>
> > On Thu, Dec 6, 2018 at 1:34 AM David Miller  wrote:
> >>
> >> From: Jouke Witteveen 
> >> Date: Wed, 5 Dec 2018 23:38:17 +0100
> >>
> >> > Can you elaborate a bit? I may not be aware of the policy you have in
> >> > mind.
> >>
> >> When we have a user facing interface to do something, we don't create
> >> another one unless it is absolutely, positively, unavoidable.
> >
> > Obviously, if I would have known this I would not have gone through
> > the trouble of investigating and proposing this patch. It was an
> > honest attempt at making the kernel better.
> > Where could I have found this policy? I have looked on kernel.org/doc,
> > but couldn't find it.
>
> It is not formally documented but it is a concern we raise every time
> a duplicate piece of user facing functionality is proposed.

Ok, thanks for getting back to me! Now I know.

That said, when looking into udev I became more convinced that the
kernel should send uevents on link state changes anyway. An example of
another kernel interface that has two ways of sending out state
changes is rfkill. It informs userspace of state changes via
/dev/rfkill and via uevents. For the latter it also sets some
informative environment variables, which my patch currently does not
do.
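
For instance, something along these lines could be added on top (purely
illustrative; the helper name and the LINK_STATE variable are made up and
not part of the posted patch):

static void linkwatch_change_uevent(struct net_device *dev)
{
	/* hypothetical helper: report the new link state to udev */
	char *envp[] = {
		netif_carrier_ok(dev) ? "LINK_STATE=up" : "LINK_STATE=down",
		NULL
	};

	kobject_uevent_env(&dev->dev.kobj, KOBJ_CHANGE, envp);
}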

What would be needed to get you (or anyone else) to reconsider this
patch (or a revision)? I can definitely see your point and am willing
to accept your rejection. However, I also think there are substantial
ergonomic benefits to a unified and consistent interface for device
state changes and would like to give it one more shot, if possible.

Thanks for your time,
- Jouke


Re: [PATCH] mv88e6060: Warn about errors

2018-12-06 Thread Pavel Machek
On Thu 2018-12-06 12:21:59, David Miller wrote:
> 
> Plain "printk" are never appropriate.
> 
> Please explicitly use pr_warn() or similar.  If there is a device context
> available, either a generic device or a netdev, use one of the dev_*()
> or netdev_*() variants.

Can do, I guess, if there's agreement that such an error is worth
some kind of output to the logs?

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: mv88e6060: Turn e6060 driver into e6065 driver

2018-12-06 Thread Pavel Machek
On Thu 2018-12-06 12:23:33, David Miller wrote:
> From: Pavel Machek 
> Date: Thu, 6 Dec 2018 14:03:45 +0100
> 
> > @@ -79,7 +82,7 @@ static enum dsa_tag_protocol 
> > mv88e6060_get_tag_protocol(struct dsa_switch *ds,
> >  {
> >//return DSA_TAG_PROTO_QCA;
> >//return DSA_TAG_PROTO_TRAILER;
> 
> These C++ style comments are not in any of my tree(s).
> 
> Your patch submission really needs to shape up if you want your patches
> to be considered seriously.

This one should have been "RFD". It has way more serious problems than
this, as the changelog tried to explain.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


RE: [PATCH RFC 1/6] net: dsa: microchip: Prepare PHY for proper advertisement

2018-12-06 Thread Tristram.Ha
> > +static void ksz9477_phy_setup(struct ksz_device *dev, int port,
> > + struct phy_device *phy)
> > +{
> > +   if (port < dev->phy_port_cnt) {
> > +   /* SUPPORTED_Asym_Pause and SUPPORTED_Pause can be
> removed to
> > +* disable flow control when rate limiting is used.
> > +*/
> > +   }
> 
> Hi Tristram
> 
> Is this meant to be a TODO comment?
> 
> What is supposed to happen here is that all forms of pause are disable
> by default. The MAC driver needs to enable what it supports by calling
> phy_support_sym_pause() or phy_support_asym_pause().
> 
> Ah, is this because there is not a real PHY driver?

The kernel has been changed so I am not sure about the current behavior.
I would like to turn on flow control by default.  Before I just assigned
"supported" to "advertising."  I know linkmode_copy is being used now for that.
But last time I checked "advertising" is already the same as "supported."

There will be situations where flow control should not be turned on, as the
switch uses bandwidth control to limit outgoing traffic.

The issue is actually becoming more complex, as KSZ9477 has a variant which
does not support gigabit speed, although the same PHY device id is being used.
That means the driver has to fake it by returning a different id and also
registering a different PHY driver to handle that.  Marketing also likes to
display the correct chip name during kernel booting so that users do not get
confused.



Re: mv88e6060: Turn e6060 driver into e6065 driver

2018-12-06 Thread David Miller
From: Pavel Machek 
Date: Thu, 6 Dec 2018 14:03:45 +0100

> @@ -79,7 +82,7 @@ static enum dsa_tag_protocol 
> mv88e6060_get_tag_protocol(struct dsa_switch *ds,
>  {
>//return DSA_TAG_PROTO_QCA;
>//return DSA_TAG_PROTO_TRAILER;

These C++ style comments are not in any of my tree(s).

Your patch submission really needs to shape up if you want your patches
to be considered seriously.

Thank you.


Re: [PATCH] mv88e6060: Warn about errors

2018-12-06 Thread David Miller


Plain "printk" are never appropriate.

Please explicitly use pr_warn() or similar.  If there is a device context
available, either a generic device or a netdev, use one of the dev_*()
or netdev_*() variants.


Re: [PATCH net-next 2/2] net: dsa: Set the master device's MTU to account for DSA overheads

2018-12-06 Thread Florian Fainelli
On 12/6/18 2:36 AM, Andrew Lunn wrote:
> DSA tagging of frames sent over the master interface to the switch
> increases the size of the frame. Such frames can then be bigger than
> the normal MTU of the master interface, and it may drop them. Use the
> overhead information from the tagger to set the MTU of the master
> device to include this overhead.
> 
> Signed-off-by: Andrew Lunn 
> ---
>  net/dsa/master.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/net/dsa/master.c b/net/dsa/master.c
> index c90ee3227dea..42f525bc68e2 100644
> --- a/net/dsa/master.c
> +++ b/net/dsa/master.c
> @@ -158,8 +158,24 @@ static void dsa_master_ethtool_teardown(struct 
> net_device *dev)
>   cpu_dp->orig_ethtool_ops = NULL;
>  }
>  
> +void dsa_master_set_mtu(struct net_device *dev, struct dsa_port *cpu_dp)
> +{
> + unsigned int mtu = ETH_DATA_LEN + cpu_dp->tag_ops->overhead;
> + int err;
> +
> + rtnl_lock();
> + if (mtu <= dev->max_mtu) {
> + err = dev_set_mtu(dev, mtu);
> + if (err)
> + netdev_dbg(dev, "Unable to set MTU to include for DSA 
> overheads\n");
> + }

Would it make sense to warn the user that there might be
transmit/receive issues with the DSA tagging protocol if either
dev_set_mtu() fails or mtu > dev->max_mtu?

> + rtnl_unlock();
> +}
> +
>  int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp)
>  {
> + dsa_master_set_mtu(dev,  cpu_dp);

Not that I think it matters too much to people because unbinding the
switch driver and expecting the CPU port to continue operating is
wishful thinking, but we should probably unwind that operation in
dsa_master_teardown(), right?

> +
>   /* If we use a tagging format that doesn't have an ethertype
>* field, make sure that all packets from this point on get
>* sent to the tag format's receive function.
> 


-- 
Florian


Re: [PATCH] tcp: fix code style in tcp_recvmsg()

2018-12-06 Thread David Miller
From: Pedro Tammela 
Date: Thu,  6 Dec 2018 10:45:28 -0200

> 2 goto labels are indented with a tab. remove the tabs and
> keep the code style consistent.
> 
> Signed-off-by: Pedro Tammela 

Applied to net-next.


Re: [PATCH net-next 0/2] Adjust MTU of DSA master interface

2018-12-06 Thread David Miller
From: Andrew Lunn 
Date: Thu,  6 Dec 2018 11:36:03 +0100

> DSA makes use of additional headers to direct a frame in/out of a
> specific port of the switch. When the slave interfaces uses an MTU of
> 1500, the master interface can be asked to handle frames with an MTU
> of 1504, or 1508 bytes. Some Ethernet interfaces won't
> transmit/receive frames which are bigger than their MTU.
> 
> Automate the increasing of the MTU on the master interface, by adding
> to each tagging driver how much overhead they need, and then calling
> dev_set_mtu() of the master interface to increase its MTU as needed.

Series applied, thanks Andrew.


Re: [PATCH][net-next] tun: align write-heavy flow entry members to a cache line

2018-12-06 Thread David Miller
From: Li RongQing 
Date: Thu,  6 Dec 2018 16:08:17 +0800

> tun flow entry 'updated' fields are written when receive
> every packet. Thus if a flow is receiving packets from a
> particular flow entry, it'll cause false-sharing with
> all the other who has looked it up, so move it in its own
> cache line
> 
> and update 'queue_index' and 'update' field only when
> they are changed to reduce the cache false-sharing.
> 
> Signed-off-by: Zhang Yu 
> Signed-off-by: Wang Li 
> Signed-off-by: Li RongQing 

Applied.


Re: [PATCH][net-next] tun: remove unnecessary check in tun_flow_update

2018-12-06 Thread David Miller
From: Li RongQing 
Date: Thu,  6 Dec 2018 16:28:11 +0800

> caller has guaranted that rxhash is not zero
> 
> Signed-off-by: Li RongQing 

Applied.


RE: [PATCH RFC 2/6] net: dsa: microchip: Add MIB counter reading support

2018-12-06 Thread Tristram.Ha
> > +static void ksz9477_r_mib_cnt(struct ksz_device *dev, int port, u16 addr,
> > + u64 *cnt)
> > +{
> > +   u32 data;
> > +   int timeout;
> > +   struct ksz_port *p = &dev->ports[port];
> > +
> > +   /* retain the flush/freeze bit */
> > +   data = p->freeze ? MIB_COUNTER_FLUSH_FREEZE : 0;
> > +   data |= MIB_COUNTER_READ;
> > +   data |= (addr << MIB_COUNTER_INDEX_S);
> > +   ksz_pwrite32(dev, port, REG_PORT_MIB_CTRL_STAT__4, data);
> > +
> > +   timeout = 1000;
> > +   do {
> > +   ksz_pread32(dev, port, REG_PORT_MIB_CTRL_STAT__4,
> > +   &data);
> > +   usleep_range(1, 10);
> > +   if (!(data & MIB_COUNTER_READ))
> > +   break;
> > +   } while (timeout-- > 0);
> 
> Could you use readx_poll_timeout() here?
> 
> > +void ksz_get_ethtool_stats(struct dsa_switch *ds, int port, uint64_t *buf)
> > +{
> > +   struct ksz_device *dev = ds->priv;
> > +   struct ksz_port_mib *mib;
> > +
> > +   mib = &dev->ports[port].mib;
> > +
> > +   /* freeze MIB counters if supported */
> > +   if (dev->dev_ops->freeze_mib)
> > +   dev->dev_ops->freeze_mib(dev, port, true);
> > +   mutex_lock(&mib->cnt_mutex);
> > +   port_r_cnt(dev, port);
> > +   mutex_unlock(&mib->cnt_mutex);
> > +   if (dev->dev_ops->freeze_mib)
> > +   dev->dev_ops->freeze_mib(dev, port, false);
> 
> Should the freeze be protected by the mutex as well?
> 
> > +   memcpy(buf, mib->counters, dev->mib_cnt * sizeof(u64));
> 
> I wonder if this memcpy should also be protected by the mutex. As soon
> as the mutex is dropped, the scheduled work could start updating
> mib->counters in non-atomic ways?
>

I will update as suggested.
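
Roughly, the updated function could then look like this (a sketch using the
field names from the quoted patch, not the final code):

void ksz_get_ethtool_stats(struct dsa_switch *ds, int port, uint64_t *buf)
{
	struct ksz_device *dev = ds->priv;
	struct ksz_port_mib *mib = &dev->ports[port].mib;

	/* hold the mutex across freeze, read and copy */
	mutex_lock(&mib->cnt_mutex);
	if (dev->dev_ops->freeze_mib)
		dev->dev_ops->freeze_mib(dev, port, true);
	port_r_cnt(dev, port);
	if (dev->dev_ops->freeze_mib)
		dev->dev_ops->freeze_mib(dev, port, false);
	memcpy(buf, mib->counters, dev->mib_cnt * sizeof(u64));
	mutex_unlock(&mib->cnt_mutex);
}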
 
> > +}
> > +
> >  int ksz_port_bridge_join(struct dsa_switch *ds, int port,
> >  struct net_device *br)
> >  {
> > @@ -255,6 +349,7 @@ int ksz_enable_port(struct dsa_switch *ds, int port,
> struct phy_device *phy)
> > /* setup slave port */
> > dev->dev_ops->port_setup(dev, port, false);
> > dev->dev_ops->phy_setup(dev, port, phy);
> > +   dev->dev_ops->port_init_cnt(dev, port);
> 
> This is probably not the correct place to do this. MIB counters should
> not be cleared by an ifdown/ifup cycle. They should only be cleared
> when the driver is probed.

I wonder what is the official way to clear the counters.  For network
debugging it is good to clear the counters to start fresh and see which
frames are not being sent or received.  Typically the device is reset when it
is shut down, as there are hardware problems.  I would think it is the job of
applications like an SNMP manager to keep track of MIB counters throughout
the life of a running system.


Re: [PATCH 1/2] net: linkwatch: send change uevent on link changes

2018-12-06 Thread David Miller
From: Jouke Witteveen 
Date: Thu, 6 Dec 2018 09:59:20 +0100

> On Thu, Dec 6, 2018 at 1:34 AM David Miller  wrote:
>>
>> From: Jouke Witteveen 
>> Date: Wed, 5 Dec 2018 23:38:17 +0100
>>
>> > Can you elaborate a bit? I may not be aware of the policy you have in
>> > mind.
>>
>> When we have a user facing interface to do something, we don't create
>> another one unless it is absolutely, positively, unavoidable.
> 
> Obviously, if I would have known this I would not have gone through
> the trouble of investigating and proposing this patch. It was an
> honest attempt at making the kernel better.
> Where could I have found this policy? I have looked on kernel.org/doc,
> but couldn't find it.

It is not formally documented but it is a concern we raise every time
a duplicate piece of user facing functionality is proposed.


RE: [PATCH RFC 5/6] net: dsa: microchip: Update tag_ksz.c to access switch driver

2018-12-06 Thread Tristram.Ha
> >>> Update tag_ksz.c to access switch driver's tail tagging operations.
> >>
> >> Hi Tristram
> >>
> >> Humm, i'm not sure we want this, the tagging spit into two places.  I
> >> need to take a closer look at the previous patch, to see why it cannot
> >> be done here.
> >
> > O.K, i think i get what is going on.
> >
> > I would however implement it differently.
> >
> > One net/dsa/tag_X.c file can export two dsa_device_ops structures,
> > allowing you to share common code for the two taggers. You could call
> > these DSA_TAG_PROTO_KSZ_1_BYTE, and DSA_TAG_PROTO_KSZ_2_BYTE,
> and the
> > .get_tag_protocol call would then return the correct one for the
> > switch.
> 
> Agreed, that is what is done by net/dsa/tag_brcm.c because there are two
> formats for the Broadcom tag:
> 
> - TAG_BRCM: the 4-bytes Broadcom tag is between MAC SA and Ethertype
> - TAG_BRCM_PREPEND: the 4-bytes Broadcom tag is before the MAC DA
>

I did try to implement this way.  But the other switches do not have the same
format even though the length is the same.  Then I need to change the following
files for any new KSZ switch: include/linux/dsa.h, net/dsa/dsa.c, 
net/dsa/dsa_priv.h,
and finally net/dsa/tag_ksz.c.

Even then it will not work if Microchip wants to add 1588 PTP capability to the 
switches.

For KSZ9477 the length of the tail tag changes when the PTP function is enabled.
Typically this function is either enabled or disabled all the time, but if 
users want to
change that during normal operation to see how the switch behaves, the transmit
function completely stops working correctly.

Older driver implementation is to monitor that register change and adjust the 
length
dynamically.

Another problem is the tail tag needs to include the timestamp for the 1-step
Pdelay_Resp to have accurate turnaround time when that message is sent out by 
the
switch.  This will require access to the main switch driver which will keep 
track of those
PTP messages.

PTP handles transmit timestamp in skb_tx_timestamp, which is typically called 
after the
frame is sent, so it is too late.  DSA calls dsa_skb_tx_timestamp before 
sending, but it
only provides a clone to the driver that supports port_txstamp and so the 
switch driver
may not be able to do anything.
 
> And the code to process them is basically using relative offsets from
> the start of the frame to access correct data.
> 
> This is done largely for performance reasons because we have 1/2
> Gigabit/secs capable CPU ports and so we want to avoid as little cache
> trashing as possible and immediately get the right rcv() function to
> process the packets.
> 

The SoC I used for this driver development actually has a problem sending
Gigabit traffic, so I do not see the effect of any slowdown, and the updated
MAC driver change for a hardware problem does not help and greatly degrades
the transmit performance.

> >
> > It might also be possible to merge in tag_trailer, or at least share
> > some code.
> >

Actually in previous old DSA implementation I just hijacked this file to
add the tail tag operations without creating a new file like tag_ksz.c.

> > What i don't yet understand is how you are passing PTP information
> > around. The commit messages need to explain that, since it is not
> > obvious, and it is the first time we have needed PTP info in a tag
> > driver.

It seems the official 1588 PTP timestamp API for a PHY driver is implemented
in only one PHY driver, net/phy/dp83640.c, in the whole kernel.  DSA uses a
similar mechanism to support 1588 PTP.  In dsa_switch_rcv() the CPU receive
function is called first before dsa_skb_defer_rx_timestamp().  That means the
receive tail tag operation has to be done first to retrieve the receive
timestamp so that it can be passed later.

It is probably not good to change the socket buffer length inside the 
port_rxtstamp
function, and I do not see any other way to insert that transmit timestamp.

A customer has already inquired about implementing 1588 PTP in the DSA driver.  
I hope
this mechanism is approved so that I can start doing that.



Re: [PATCH net-next 0/2] platform data controls for mdio-gpio

2018-12-06 Thread Andrew Lunn
On Thu, Dec 06, 2018 at 09:22:27AM -0800, Florian Fainelli wrote:
> Hi Andrew,
> 
> On 12/6/18 5:58 AM, Andrew Lunn wrote:
> > Soon to be mainlined is an x86 platform with a Marvell switch, and a
> > bit-banging MDIO bus. In order to make this work, the phy_mask of the
> > MDIO bus needs to be set to prevent scanning for PHYs, and the
> > phy_ignore_ta_mask needs to be set because the switch has broken
> > turnaround.
> 
> This looks good, I would just make one/two changes which is to match the
> internal phy_mask and phy_ignore_ta_mask types from the struct mii_bus
> and use u32 instead of int.

Yes, that makes sense.

v2 to follow.

 Andrew


Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-06 Thread Andrew Lunn
> I can have a look at using dormant, but what is change_carrier
> supposed to do if not this?

It is intended for interfaces which are stacked, like the team driver,
and for devices which don't have a phy, e.g. tun, and dummy.

> I didn't find a tool for DORMANT, I guess i will have to write one
> myself(using SIOCGIFFLAGS, SIOCSIFFLAGS)?

ip link should be able to set it.

Try ip link set mode dormant dev eth0

Andrew


Re: [PATCH bpf-next] tools: bpftool: add a command to dump the trace pipe

2018-12-06 Thread Alexei Starovoitov
On Thu, Dec 06, 2018 at 05:20:54PM +, Quentin Monnet wrote:
> 2018-12-05 19:18 UTC-0800 ~ Alexei Starovoitov
> 
> > On Wed, Dec 05, 2018 at 06:15:23PM +, Quentin Monnet wrote:
>  +
>  +/* Allow room for NULL terminating byte and pipe file name */
>  +snprintf(format, sizeof(format), "%%*s %%%zds %%99s %%*s %%*d 
>  %%*d\\n",
>  + PATH_MAX - strlen(pipe_name) - 1);
> >>>
> >>> before scanning trace_pipe could you add a check that trace_options are 
> >>> compatible?
> >>> Otherwise there will be a lot of garbage printed.
> >>> afaik default is rarely changed, so the patch is ok as-is.
> >>> The followup some time in the future would be perfect.
> >>
> >> Sure. What do you mean exactly by compatible options? I can check that
> >> "trace_printk" is set, is there any other option that would be relevant?
> > 
> > See Documentation/trace/ftrace.rst
> > a lot of the flags will change the format significantly.
> > Like 'bin' will make it binary.
> > I'm not suggesting to support all possible output formats.
> > Only to check that trace flags match scanf.
> 
> fscanf() is only used to retrieve the name of the sysfs directory where
> the pipe is located, when listing all the mount points on the system. It
> is not used to dump the content from the pipe (which is done with
> getline(), so formatting does not matter much).
> 
> If the "bin" option is set, "bpftool prog tracelog" will dump the same
> binary content as "cat /sys/kernel/debug/tracing/trace_pipe", which is
> the expected behaviour (at least with the current patch). Let me know if
> you would like me to change this somehow.

I misread the patch :) thanks for explaining. all good then.



Re: [PATCH net-next 4/4] net: aquantia: add support of RSS configuration

2018-12-06 Thread Jakub Kicinski
On Thu, 6 Dec 2018 15:02:52 +, Igor Russkikh wrote:
> From: Dmitry Bogdanov 
> 
> Add support of configuration of RSS hash key and RSS indirection table.
> 
> Signed-off-by: Dmitry Bogdanov 
> Signed-off-by: Igor Russkikh 
> ---
>  .../ethernet/aquantia/atlantic/aq_ethtool.c   | 42 +++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c 
> b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
> index a5fd71692c8b..2f2e12c2b632 100644
> --- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
> +++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
> @@ -202,6 +202,47 @@ static int aq_ethtool_get_rss(struct net_device *ndev, 
> u32 *indir, u8 *key,
>   return 0;
>  }
>  
> +static int aq_ethtool_set_rss(struct net_device *netdev, const u32 *indir,
> +   const u8 *key, const u8 hfunc)
> +{
> + struct aq_nic_s *aq_nic = netdev_priv(netdev);
> + struct aq_nic_cfg_s *cfg;
> + unsigned int i = 0U;
> + u32 rss_entries;
> + int err = 0;
> +
> + cfg = aq_nic_get_cfg(aq_nic);
> + rss_entries = cfg->aq_rss.indirection_table_size;
> +
> + /* We do not allow change in unsupported parameters */
> + if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
> + return -EOPNOTSUPP;
> + /* Fill out the redirection table */
> + if (indir) {
> + /* Verify user input. */
> + for (i = 0; i < rss_entries; i++)
> + if (indir[i] >= cfg->num_rss_queues)
> + return -EINVAL;

nit: you shouldn't have to do this, see ethtool_copy_validate_indir().

> + for (i = 0; i < rss_entries; i++)
> + cfg->aq_rss.indirection_table[i] = indir[i];
> + }
> +
> + /* Fill out the rss hash key */
> + if (key) {
> + memcpy(cfg->aq_rss.hash_secret_key, key,
> +sizeof(cfg->aq_rss.hash_secret_key));
> + err = aq_nic->aq_hw_ops->hw_rss_hash_set(aq_nic->aq_hw,
> + &cfg->aq_rss);
> + if (err)
> + return err;
> + }
> +
> + err = aq_nic->aq_hw_ops->hw_rss_set(aq_nic->aq_hw, &cfg->aq_rss);
> +
> + return err;
> +}


[PATCH] vhost/vsock: fix reset orphans race with close timeout

2018-12-06 Thread Stefan Hajnoczi
If a local process has closed a connected socket and hasn't received a
RST packet yet, then the socket remains in the table until a timeout
expires.

When a vhost_vsock instance is released with the timeout still pending,
the socket is never freed because vhost_vsock has already set the
SOCK_DONE flag.

Check if the close timer is pending and let it close the socket.  This
prevents the race which can leak sockets.

Reported-by: Maximilian Riemensberger 
Cc: Graham Whaley 
Signed-off-by: Stefan Hajnoczi 
---
 drivers/vhost/vsock.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 34bc3ab40c6d..731e2ea2aeca 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -563,13 +563,21 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 * executing.
 */
 
-   if (!vhost_vsock_get(vsk->remote_addr.svm_cid)) {
-   sock_set_flag(sk, SOCK_DONE);
-   vsk->peer_shutdown = SHUTDOWN_MASK;
-   sk->sk_state = SS_UNCONNECTED;
-   sk->sk_err = ECONNRESET;
-   sk->sk_error_report(sk);
-   }
+   /* If the peer is still valid, no need to reset connection */
+   if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+   return;
+
+   /* If the close timeout is pending, let it expire.  This avoids races
+* with the timeout callback.
+*/
+   if (vsk->close_work_scheduled)
+   return;
+
+   sock_set_flag(sk, SOCK_DONE);
+   vsk->peer_shutdown = SHUTDOWN_MASK;
+   sk->sk_state = SS_UNCONNECTED;
+   sk->sk_err = ECONNRESET;
+   sk->sk_error_report(sk);
 }
 
 static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
-- 
2.19.2



Re: [PATCH] gianfar: Add gfar_change_carrier()

2018-12-06 Thread Joakim Tjernlund
On Thu, 2018-12-06 at 17:54 +0100, Andrew Lunn wrote:
> 
> > I wish I had a proper DSA/Switchdev driver in place but I don't :(
> > Adding one is not impossible but then a lot of our user space app needs 
> > fixing so all
> > in all it is a fairly big project.
> > Anyhow, these carrier additions should be fine I think?
> 
> I'm not too sure about that. You are potentially messing up the state
> machine, and the MAC driver could be looking at phydev->link, which
> says up, but the carrier is down.
> 
> https://www.kernel.org/doc/Documentation/networking/operstates.txt
> 
> Could you set the interface to dormant? That seems like a better fit
> anyway:
> 
> IF_OPER_DORMANT (5):
>  Interface is L1 up, but waiting for an external event, f.e. for a
>  protocol to establish. (802.1X)
> 
> The interface does have L1 to the switch, but you are waiting for the
> external interface to go up. You can set this from user space without
> needing any kernel changes.

I can have a look at using dormant, but what is change_carrier supposed to do if
not this?

I didn't find a tool for DORMANT, I guess i will have to write one myself(using 
SIOCGIFFLAGS, SIOCSIFFLAGS)?
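
For reference, a minimal sketch of such a tool: as far as I can tell the
dormant link mode is set through rtnetlink's IFLA_LINKMODE (which is what
"ip link set mode dormant" does) rather than SIOCSIFFLAGS.  Error handling
is kept minimal and the caller is assumed to pass a valid ifindex
(e.g. obtained with if_nametoindex()):

/* sketch: set IF_LINK_MODE_DORMANT on an interface via rtnetlink */
#include <linux/rtnetlink.h>
#include <linux/if.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

static int set_dormant(int ifindex)
{
	struct {
		struct nlmsghdr nlh;
		struct ifinfomsg ifi;
		char buf[64];
	} req;
	struct rtattr *rta;
	unsigned char mode = IF_LINK_MODE_DORMANT;
	int fd, ret;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	req.nlh.nlmsg_type = RTM_SETLINK;
	req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req.ifi.ifi_family = AF_UNSPEC;
	req.ifi.ifi_index = ifindex;

	/* append IFLA_LINKMODE = IF_LINK_MODE_DORMANT (a single u8) */
	rta = (struct rtattr *)((char *)&req + NLMSG_ALIGN(req.nlh.nlmsg_len));
	rta->rta_type = IFLA_LINKMODE;
	rta->rta_len = RTA_LENGTH(sizeof(mode));
	memcpy(RTA_DATA(rta), &mode, sizeof(mode));
	req.nlh.nlmsg_len = NLMSG_ALIGN(req.nlh.nlmsg_len) + rta->rta_len;

	/* the NLM_F_ACK reply is not read back in this sketch */
	ret = send(fd, &req, req.nlh.nlmsg_len, 0) < 0 ? -1 : 0;
	close(fd);
	return ret;
}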


Re: [RFD] mv88e6060: Allow the driver to be probed from device tree

2018-12-06 Thread kbuild test robot
Hi Pavel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.20-rc5 next-20181206]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Pavel-Machek/mv88e6060-Allow-the-driver-to-be-probed-from-device-tree/20181207-013430
config: i386-randconfig-x007-201848 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/net/dsa/mv88e6060.c: In function 'mv88e6060_probe':
>> drivers/net/dsa/mv88e6060.c:316:16: warning: initialization discards 'const' 
>> qualifier from pointer target type [-Wdiscarded-qualifiers]
  char *name = mv88e6060_get_name(mdiodev->bus, addr);
   ^~
>> drivers/net/dsa/mv88e6060.c:317:34: warning: format '%lx' expects argument 
>> of type 'long unsigned int', but argument 3 has type 'struct mii_bus *' 
>> [-Wformat=]
  printk("e6060: got name %s @ %lx %lx\n", name, mdiodev->bus, addr);
   ~~^   
>> drivers/net/dsa/mv88e6060.c:317:38: warning: format '%lx' expects argument 
>> of type 'long unsigned int', but argument 4 has type 'int' [-Wformat=]
  printk("e6060: got name %s @ %lx %lx\n", name, mdiodev->bus, addr);
   ~~^
   %x
   drivers/net/dsa/mv88e6060.c:307:6: warning: unused variable 'err' 
[-Wunused-variable]
 int err;
 ^~~
   drivers/net/dsa/mv88e6060.c:306:6: warning: unused variable 'eeprom_len' 
[-Wunused-variable]
 u32 eeprom_len;
 ^~
   drivers/net/dsa/mv88e6060.c:304:31: warning: unused variable 'compat_info' 
[-Wunused-variable]
 const struct mv88e6060_info *compat_info;
  ^~~
   drivers/net/dsa/mv88e6060.c:303:22: warning: unused variable 'np' 
[-Wunused-variable]
 struct device_node *np = dev->of_node;
 ^~
   drivers/net/dsa/mv88e6060.c: In function 'mv88e6060_remove':
   drivers/net/dsa/mv88e6060.c:344:25: warning: unused variable 'chip' 
[-Wunused-variable]
 struct mv88e6060_chip *chip = ds->priv;
^~~~
   Cyclomatic Complexity 1 include/linux/device.h:devm_kzalloc
   Cyclomatic Complexity 1 include/linux/device.h:dev_get_drvdata
   Cyclomatic Complexity 1 include/linux/device.h:dev_set_drvdata
   Cyclomatic Complexity 1 include/linux/etherdevice.h:eth_random_addr
   Cyclomatic Complexity 1 include/net/dsa.h:dsa_to_port
   Cyclomatic Complexity 1 include/net/dsa.h:dsa_is_cpu_port
   Cyclomatic Complexity 1 include/net/dsa.h:dsa_is_user_port
   Cyclomatic Complexity 3 include/net/dsa.h:dsa_user_ports
   Cyclomatic Complexity 1 
drivers/net/dsa/mv88e6060.c:mv88e6060_get_tag_protocol
   Cyclomatic Complexity 2 drivers/net/dsa/mv88e6060.c:alloc_priv
   Cyclomatic Complexity 2 
drivers/net/dsa/mv88e6060.c:mv88e6060_port_to_phy_addr
   Cyclomatic Complexity 1 drivers/net/dsa/mv88e6060.c:mv88e6060_init
   Cyclomatic Complexity 0 drivers/net/dsa/mv88e6060.c:mv88e6060_remove
   Cyclomatic Complexity 1 drivers/net/dsa/mv88e6060.c:reg_read
   Cyclomatic Complexity 2 drivers/net/dsa/mv88e6060.c:mv88e6060_phy_read
   Cyclomatic Complexity 1 drivers/net/dsa/mv88e6060.c:reg_write
   Cyclomatic Complexity 2 drivers/net/dsa/mv88e6060.c:mv88e6060_phy_write
   Cyclomatic Complexity 3 drivers/net/dsa/mv88e6060.c:mv88e6060_setup_global
   Cyclomatic Complexity 4 drivers/net/dsa/mv88e6060.c:mv88e6060_setup_addr
   Cyclomatic Complexity 6 drivers/net/dsa/mv88e6060.c:mv88e6060_setup_port
   Cyclomatic Complexity 13 drivers/net/dsa/mv88e6060.c:mv88e6060_switch_reset
   Cyclomatic Complexity 6 drivers/net/dsa/mv88e6060.c:mv88e6060_setup
   Cyclomatic Complexity 5 drivers/net/dsa/mv88e6060.c:mv88e6060_get_name
   Cyclomatic Complexity 3 drivers/net/dsa/mv88e6060.c:mv88e6060_probe
   Cyclomatic Complexity 3 drivers/net/dsa/mv88e6060.c:mv88e6060_drv_probe
   Cyclomatic Complexity 1 drivers/net/dsa/mv88e6060.c:mv88e6060_cleanup
   Cyclomatic Complexity 1 
drivers/net/dsa/mv88e6060.c:_GLOBAL__sub_I_00100_0_mv88e6060.c
   Cyclomatic Complexity 1 
drivers/net/dsa/mv88e6060.c:_GLOBAL__sub_D_00100_1_mv88e6060.c

vim +/const +316 drivers/net/dsa/mv88e6060.c

   299  
   300  static int mv88e6060_probe(struct mdio_device *mdiodev)
   301  {
   302  struct device *dev = >dev;
   303  struct device_node *np = dev->of_node;
   304  const struct mv88e6060_info *compat_info;
   305  struct mv88e6060_priv *chip;
   306  u32 eeprom_len;
   307  int err;
   308  
   309  int addr = 0x10 /* mdiodev->addr */ ;
   310  
   311  chip = alloc_priv

[PATCH net v2 2/2] neighbour: Avoid writing before skb->head in neigh_hh_output()

2018-12-06 Thread Stefano Brivio
While skb_push() makes the kernel panic if the skb headroom is less than
the unaligned hardware header size, it will proceed normally in case we
copy more than that because of alignment, and we'll silently corrupt
adjacent slabs.

In the case fixed by the previous patch,
"ipv6: Check available headroom in ip6_xmit() even without options", we
end up in neigh_hh_output() with 14 bytes headroom, 14 bytes hardware
header and write 16 bytes, starting 2 bytes before the allocated buffer.

Always check we're not writing before skb->head and, if the headroom is
not enough, warn and drop the packet.

v2:
 - instead of panicking with BUG_ON(), WARN_ON_ONCE() and drop the packet
   (Eric Dumazet)
 - if we avoid the panic, though, we need to explicitly check the headroom
   before the memcpy(), otherwise we'll have corrupted slabs on a running
   kernel, after we warn
 - use __skb_push() instead of skb_push(), as the headroom check is
   already implemented here explicitly (Eric Dumazet)

Signed-off-by: Stefano Brivio 
---
 include/net/neighbour.h | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index f58b384aa6c9..665990c7dec8 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -454,6 +454,7 @@ static inline int neigh_hh_bridge(struct hh_cache *hh, 
struct sk_buff *skb)
 
 static inline int neigh_hh_output(const struct hh_cache *hh, struct sk_buff 
*skb)
 {
+   unsigned int hh_alen = 0;
unsigned int seq;
unsigned int hh_len;
 
@@ -461,16 +462,33 @@ static inline int neigh_hh_output(const struct hh_cache 
*hh, struct sk_buff *skb
	seq = read_seqbegin(&hh->hh_lock);
hh_len = hh->hh_len;
if (likely(hh_len <= HH_DATA_MOD)) {
-   /* this is inlined by gcc */
-   memcpy(skb->data - HH_DATA_MOD, hh->hh_data, 
HH_DATA_MOD);
+   hh_alen = HH_DATA_MOD;
+
+   /* skb_push() would proceed silently if we have room for
+* the unaligned size but not for the aligned size:
+* check headroom explicitly.
+*/
+   if (likely(skb_headroom(skb) >= HH_DATA_MOD)) {
+   /* this is inlined by gcc */
+   memcpy(skb->data - HH_DATA_MOD, hh->hh_data,
+  HH_DATA_MOD);
+   }
} else {
-   unsigned int hh_alen = HH_DATA_ALIGN(hh_len);
+   hh_alen = HH_DATA_ALIGN(hh_len);
 
-   memcpy(skb->data - hh_alen, hh->hh_data, hh_alen);
+   if (likely(skb_headroom(skb) >= hh_alen)) {
+   memcpy(skb->data - hh_alen, hh->hh_data,
+  hh_alen);
+   }
}
	} while (read_seqretry(&hh->hh_lock, seq));
 
-   skb_push(skb, hh_len);
+   if (WARN_ON_ONCE(skb_headroom(skb) < hh_alen)) {
+   kfree_skb(skb);
+   return NET_XMIT_DROP;
+   }
+
+   __skb_push(skb, hh_len);
return dev_queue_xmit(skb);
 }
 
-- 
2.19.2



[PATCH net v2 0/2] Fix slab out-of-bounds on insufficient headroom for IPv6 packets

2018-12-06 Thread Stefano Brivio
Patch 1/2 fixes a slab out-of-bounds occurring with short SCTP packets over
IPv4 over L2TP over IPv6 on a configuration with relatively low HEADER_MAX.

Patch 2/2 makes sure we avoid writing before the allocated buffer in
neigh_hh_output() in case the headroom is enough for the unaligned hardware
header size, but not enough for the aligned one, and that we warn if we hit
this condition.

Stefano Brivio (2):
  ipv6: Check available headroom in ip6_xmit() even without options
  neighbour: Avoid writing before skb->head in neigh_hh_output()

 include/net/neighbour.h | 28 ++-
 net/ipv6/ip6_output.c   | 42 -
 2 files changed, 44 insertions(+), 26 deletions(-)

-- 
2.19.2



[PATCH net v2 1/2] ipv6: Check available headroom in ip6_xmit() even without options

2018-12-06 Thread Stefano Brivio
Even if we send an IPv6 packet without options, MAX_HEADER might not be
enough to account for the additional headroom required by alignment of
hardware headers.

On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().

Those would be enough to append our 14 bytes header, but we're going to
align that to 16 bytes, and write 2 bytes out of the allocated slab in
neigh_hh_output().

KASan says:

[  264.967848] 
==
[  264.967861] BUG: KASAN: slab-out-of-bounds in 
ip6_finish_output2+0x1aec/0x1c70
[  264.967866] Write of size 16 at addr 6af1c7fe by task netperf/6201
[  264.967870]
[  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
[  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
[  264.967887] Call Trace:
[  264.967896] ([<001347d6>] show_stack+0x56/0xa0)
[  264.967903]  [<017e379c>] dump_stack+0x23c/0x290
[  264.967912]  [<007bc594>] print_address_description+0xf4/0x290
[  264.967919]  [<007bc8fc>] kasan_report+0x13c/0x240
[  264.967927]  [<0162f5e4>] ip6_finish_output2+0x1aec/0x1c70
[  264.967935]  [<0163f890>] ip6_finish_output+0x430/0x7f0
[  264.967943]  [<0163fe44>] ip6_output+0x1f4/0x580
[  264.967953]  [<0163882a>] ip6_xmit+0xfea/0x1ce8
[  264.967963]  [<017396e2>] inet6_csk_xmit+0x282/0x3f8
[  264.968033]  [<03ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
[  264.968037]  [<03ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
[  264.968041]  [<01220020>] dev_hard_start_xmit+0x268/0x928
[  264.968069]  [<01330e8e>] sch_direct_xmit+0x7ae/0x1350
[  264.968071]  [<0122359c>] __dev_queue_xmit+0x2b7c/0x3478
[  264.968075]  [<013d2862>] ip_finish_output2+0xce2/0x11a0
[  264.968078]  [<013d9b14>] ip_finish_output+0x56c/0x8c8
[  264.968081]  [<013ddd1e>] ip_output+0x226/0x4c0
[  264.968083]  [<013dbd6c>] __ip_queue_xmit+0x894/0x1938
[  264.968100]  [<03ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
[  264.968116]  [<03ff80b7bf68>] 
sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
[  264.968131]  [<03ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
[  264.968146]  [<03ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 
[sctp]
[  264.968161]  [<03ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
[  264.968177]  [<03ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
[  264.968192]  [<03ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
[  264.968208]  [<03ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
[  264.968212]  [<01197942>] __sys_connect+0x21a/0x450
[  264.968215]  [<0119aff8>] sys_socketcall+0x3d0/0xb08
[  264.968218]  [<0184ea7a>] system_call+0x2a2/0x2c0

[...]

Just like ip_finish_output2() does for IPv4, check that we have enough
headroom in ip6_xmit(), and reallocate it if we don't.

This issue is older than git history.

Reported-by: Jianlin Shi 
Signed-off-by: Stefano Brivio 
---
v2: Fixed Jianlin's name in commit message

 net/ipv6/ip6_output.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 827a3f5ff3bb..fcd3c66ded16 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -195,37 +195,37 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, 
struct flowi6 *fl6,
const struct ipv6_pinfo *np = inet6_sk(sk);
	struct in6_addr *first_hop = &fl6->daddr;
struct dst_entry *dst = skb_dst(skb);
+   unsigned int head_room;
struct ipv6hdr *hdr;
u8  proto = fl6->flowi6_proto;
int seg_len = skb->len;
int hlimit = -1;
u32 mtu;
 
-   if (opt) {
-   unsigned int head_room;
+   head_room = sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dst->dev);
+   if (opt)
+   head_room += opt->opt_nflen + opt->opt_flen;
 
-   /* First: exthdrs may take lots of space (~8K for now)
-  MAX_HEADER is not enough.
-*/
-   head_room = opt->opt_nflen + opt->opt_flen;
-   seg_len += head_room;
-   head_room += sizeof(struct ipv6hdr) + 
LL_RESERVED_SPACE(dst->dev);
-
-   if (skb_headroom(skb) < head_room) {
-   struct sk_buff *skb2 = skb_realloc_headroom(skb, 
head_room);
-   if (!skb2) {
-   IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
- IPSTATS_MIB_OUTDISCARDS);
-   kfree_skb(skb);
-   return -ENOBUFS;
-   

Re: OMAP4430 SDP with KS8851: very slow networking

2018-12-06 Thread Russell King - ARM Linux
On Thu, Dec 06, 2018 at 08:31:54AM -0800, Tony Lindgren wrote:
> Hi,
> 
> * Russell King - ARM Linux  [181206 13:23]:
> > It looks very much like a receive problem - in that the board is not
> > always aware of a packet having been received until it attempts to
> > transmit (eg, in the case of TFTP, when it re-sends the ACK after a
> > receive timeout, it _then_ notices that there's a packet waiting.)
> > 
> > I'm not quite sure when this cropped up as I no longer regularly
> > update and run my nightly boot tests, but I think 4.18 was fine.
> 
> Sounds like it's some gpio or PM related issue. If it's not caused
> by commit b764a5863fd8 ("gpio: omap: Remove custom PM calls and
> use cpu_pm instead"), then maybe the changes to probe devices
> with ti-sysc interconnect target module driver caused it. Below
> is a revert for mcspi that would help in that case.

In the interests of keeping the mailing list record up to date, with
the following:

850d434ea37b ("gpio: omap: Remove set but not used variable 'dev'")
c4791bc6e3a6 ("gpio: omap: drop omap_gpio_list")
467480738d0b ("gpio: omap: get rid of the conditional PM runtime calls")
5284521a290e ("gpio: omap: Get rid of pm_runtime_irq_safe()")
b764a5863fd8 ("gpio: omap: Remove custom PM calls and use cpu_pm instead")

reverted, the problem is still there.  Revert:

ec0daae685b2 ("gpio: omap: Add level wakeup handling for omap4 based SoCs")

on top, and networking returns to normal.  So it appears to be this
last commit causing the issue.

With that and b764a5863fd8 applied, it still misbehaves.  Then, poking
at the OMAP4_GPIO_IRQWAKEN0 register, changing it from 0 to 4 with
devmem2 restores normal behaviour - ping times are normal and NFS is
happy.

# devmem2 0x48055044 w 4

(slightly more complex for me as its via NFS and needs different C
libraries from the ones on the rootfs.)

Given that this GPIO device is not runtime suspended, and is
permanently active (which is what I think we expect, given that it
has an IRQ claimed against it) does the hardware still attempt to
idle the GPIO block - if so, could that be why we need to program
the wakeup register, so the GPIO block signals that it's active?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


[PATCH net] tcp: lack of available data can also cause TSO defer

2018-12-06 Thread Eric Dumazet
tcp_tso_should_defer() can return true in three different cases :

 1) We are cwnd-limited
 2) We are rwnd-limited
 3) We are application limited.

Neal pointed out that my recent fix went too far, since
it assumed that if we were not in 1) case, we must be rwnd-limited

Fix this by properly populating the is_cwnd_limited and
is_rwnd_limited booleans.

After this change, we can finally move the silly check for FIN
flag only for the application-limited case.

The same move for EOR bit will be handled in net-next,
since commit 1c09f7d073b1 ("tcp: do not try to defer skbs
with eor mark (MSG_EOR)") is scheduled for linux-4.21

Tested by running 200 concurrent netperf -t TCP_RR -- -r 6,100
and checking none of them was rwnd_limited in the chrono_stat
output from "ss -ti" command.

Fixes: 41727549de3e ("tcp: Do not underestimate rwnd_limited")
Signed-off-by: Eric Dumazet 
Suggested-by: Neal Cardwell 
Reviewed-by: Neal Cardwell 
Acked-by: Soheil Hassas Yeganeh 
Reviewed-by: Yuchung Cheng 
---
 net/ipv4/tcp_output.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 
5aa600900695666aa8b55f1e4b11ba5129509958..d1676d8a6ed70fbe050709a16a650df35a1f4d87
 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1904,7 +1904,9 @@ static int tso_fragment(struct sock *sk, enum tcp_queue 
tcp_queue,
  * This algorithm is from John Heffner.
  */
 static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
-bool *is_cwnd_limited, u32 max_segs)
+bool *is_cwnd_limited,
+bool *is_rwnd_limited,
+u32 max_segs)
 {
const struct inet_connection_sock *icsk = inet_csk(sk);
u32 age, send_win, cong_win, limit, in_flight;
@@ -1912,9 +1914,6 @@ static bool tcp_tso_should_defer(struct sock *sk, struct 
sk_buff *skb,
struct sk_buff *head;
int win_divisor;
 
-   if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
-   goto send_now;
-
if (icsk->icsk_ca_state >= TCP_CA_Recovery)
goto send_now;
 
@@ -1973,10 +1972,27 @@ static bool tcp_tso_should_defer(struct sock *sk, 
struct sk_buff *skb,
if (age < (tp->srtt_us >> 4))
goto send_now;
 
-   /* Ok, it looks like it is advisable to defer. */
+   /* Ok, it looks like it is advisable to defer.
+* Three cases are tracked :
+* 1) We are cwnd-limited
+* 2) We are rwnd-limited
+* 3) We are application limited.
+*/
+   if (cong_win < send_win) {
+   if (cong_win <= skb->len) {
+   *is_cwnd_limited = true;
+   return true;
+   }
+   } else {
+   if (send_win <= skb->len) {
+   *is_rwnd_limited = true;
+   return true;
+   }
+   }
 
-   if (cong_win < send_win && cong_win <= skb->len)
-   *is_cwnd_limited = true;
+   /* If this packet won't get more data, do not wait. */
+   if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
+   goto send_now;
 
return true;
 
@@ -2356,11 +2372,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
} else {
if (!push_one &&
    tcp_tso_should_defer(sk, skb, &is_cwnd_limited,
-max_segs)) {
-   if (!is_cwnd_limited)
-   is_rwnd_limited = true;
+ &is_rwnd_limited, max_segs))
break;
-   }
}
 
limit = mss_now;
-- 
2.20.0.rc2.403.gdbc3b29805-goog



Re: [PATCH bpf 2/2] net/flow_dissector: correctly cap nhoff and thoff in case of BPF

2018-12-06 Thread Song Liu
On Wed, Dec 5, 2018 at 8:41 PM Stanislav Fomichev  wrote:
>
> We want to make sure that the following condition holds:
> 0 <= nhoff <= thoff <= skb->len
>
> BPF program can set out-of-bounds nhoff and thoff, which is dangerous, see
> recent commit d0c081b49137 ("flow_dissector: properly cap thoff field")'.
>
> Signed-off-by: Stanislav Fomichev 

Acked-by: Song Liu 

> ---
>  net/core/flow_dissector.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index ff5556d80570..af68207ee56c 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -791,9 +791,12 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> /* Restore state */
> memcpy(cb, &cb_saved, sizeof(cb_saved));
>
> +   flow_keys.nhoff = clamp_t(u16, flow_keys.nhoff, 0, skb->len);
> +   flow_keys.thoff = clamp_t(u16, flow_keys.thoff,
> + flow_keys.nhoff, skb->len);
> +
> __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
>  target_container);
> -   key_control->thoff = min_t(u16, key_control->thoff, skb->len);
> rcu_read_unlock();
> return result == BPF_OK;
> }
> --
> 2.20.0.rc1.387.gf8505762e3-goog
>


Re: [PATCH bpf 1/2] selftests/bpf: use thoff instead of nhoff in BPF flow dissector

2018-12-06 Thread Song Liu
On Wed, Dec 5, 2018 at 8:41 PM Stanislav Fomichev  wrote:
>
> We are returning thoff from the flow dissector, not the nhoff. Pass
> thoff along with nhoff to the bpf program (initially thoff == nhoff)
> and expect flow dissector amend/return thoff, not nhoff.
>
> This avoids confusion, when by the time bpf flow dissector exits,
> nhoff == thoff, which doesn't make much sense.
>
> Signed-off-by: Stanislav Fomichev 
Acked-by: Song Liu 

> ---
>  net/core/flow_dissector.c  |  1 +
>  tools/testing/selftests/bpf/bpf_flow.c | 36 --
>  2 files changed, 18 insertions(+), 19 deletions(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 588f475019d4..ff5556d80570 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -783,6 +783,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> /* Pass parameters to the BPF program */
> cb->qdisc_cb.flow_keys = &flow_keys;
> flow_keys.nhoff = nhoff;
> +   flow_keys.thoff = nhoff;
>
> bpf_compute_data_pointers((struct sk_buff *)skb);
> result = BPF_PROG_RUN(attached, skb);
> diff --git a/tools/testing/selftests/bpf/bpf_flow.c 
> b/tools/testing/selftests/bpf/bpf_flow.c
> index 107350a7821d..df9d32fd2055 100644
> --- a/tools/testing/selftests/bpf/bpf_flow.c
> +++ b/tools/testing/selftests/bpf/bpf_flow.c
> @@ -70,18 +70,18 @@ static __always_inline void 
> *bpf_flow_dissect_get_header(struct __sk_buff *skb,
>  {
> void *data_end = (void *)(long)skb->data_end;
> void *data = (void *)(long)skb->data;
> -   __u16 nhoff = skb->flow_keys->nhoff;
> +   __u16 thoff = skb->flow_keys->thoff;
> __u8 *hdr;
>
> /* Verifies this variable offset does not overflow */
> -   if (nhoff > (USHRT_MAX - hdr_size))
> +   if (thoff > (USHRT_MAX - hdr_size))
> return NULL;
>
> -   hdr = data + nhoff;
> +   hdr = data + thoff;
> if (hdr + hdr_size <= data_end)
> return hdr;
>
> -   if (bpf_skb_load_bytes(skb, nhoff, buffer, hdr_size))
> +   if (bpf_skb_load_bytes(skb, thoff, buffer, hdr_size))
> return NULL;
>
> return buffer;
> @@ -158,13 +158,13 @@ static __always_inline int parse_ip_proto(struct 
> __sk_buff *skb, __u8 proto)
> /* Only inspect standard GRE packets with version 0 */
> return BPF_OK;
>
> -   keys->nhoff += sizeof(*gre); /* Step over GRE Flags and Proto 
> */
> +   keys->thoff += sizeof(*gre); /* Step over GRE Flags and Proto 
> */
> if (GRE_IS_CSUM(gre->flags))
> -   keys->nhoff += 4; /* Step over chksum and Padding */
> +   keys->thoff += 4; /* Step over chksum and Padding */
> if (GRE_IS_KEY(gre->flags))
> -   keys->nhoff += 4; /* Step over key */
> +   keys->thoff += 4; /* Step over key */
> if (GRE_IS_SEQ(gre->flags))
> -   keys->nhoff += 4; /* Step over sequence number */
> +   keys->thoff += 4; /* Step over sequence number */
>
> keys->is_encap = true;
>
> @@ -174,7 +174,7 @@ static __always_inline int parse_ip_proto(struct 
> __sk_buff *skb, __u8 proto)
> if (!eth)
> return BPF_DROP;
>
> -   keys->nhoff += sizeof(*eth);
> +   keys->thoff += sizeof(*eth);
>
> return parse_eth_proto(skb, eth->h_proto);
> } else {
> @@ -191,7 +191,6 @@ static __always_inline int parse_ip_proto(struct 
> __sk_buff *skb, __u8 proto)
> if ((__u8 *)tcp + (tcp->doff << 2) > data_end)
> return BPF_DROP;
>
> -   keys->thoff = keys->nhoff;
> keys->sport = tcp->source;
> keys->dport = tcp->dest;
> return BPF_OK;
> @@ -201,7 +200,6 @@ static __always_inline int parse_ip_proto(struct 
> __sk_buff *skb, __u8 proto)
> if (!udp)
> return BPF_DROP;
>
> -   keys->thoff = keys->nhoff;
> keys->sport = udp->source;
> keys->dport = udp->dest;
> return BPF_OK;
> @@ -252,8 +250,8 @@ PROG(IP)(struct __sk_buff *skb)
> keys->ipv4_src = iph->saddr;
> keys->ipv4_dst = iph->daddr;
>
> -   keys->nhoff += iph->ihl << 2;
> -   if (data + keys->nhoff > data_end)
> +   keys->thoff += iph->ihl << 2;
> +   if (data + keys->thoff > data_end)
> return BPF_DROP;
>
> if (iph->frag_off & bpf_htons(IP_MF | IP_OFFSET)) {
> @@ -285,7 +283,7 @@ PROG(IPV6)(struct __sk_buff *skb)
> keys->addr_proto = ETH_P_IPV6;
> -   memcpy(&keys->ipv6_src, &ip6h->saddr, 2*sizeof(ip6h->saddr));
>
> -   keys->nhoff += 

[PATCH net 4/4] mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl

2018-12-06 Thread Ido Schimmel
When deleting a VLAN device using an ioctl the netdev is unregistered
before the VLAN filter is updated via ndo_vlan_rx_kill_vid(). It can
lead to a use-after-free in mlxsw in case the VLAN device is deleted
while being enslaved to a bridge.

The reason for the above is that when mlxsw receives the CHANGEUPPER
event, it wrongly assumes that the VLAN device is no longer its upper
and thus destroys the internal representation of the bridge port despite
the reference count being non-zero.

Fix this by checking if the VLAN device is our upper using its real
device. In net-next I'm going to remove this trick and instead make
mlxsw completely agnostic to the order of the events.

Fixes: c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c   | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 7f2091c2648e..50080c60a279 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -296,7 +296,13 @@ static bool
 mlxsw_sp_bridge_port_should_destroy(const struct mlxsw_sp_bridge_port *
bridge_port)
 {
-   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_lower_get(bridge_port->dev);
+   struct net_device *dev = bridge_port->dev;
+   struct mlxsw_sp *mlxsw_sp;
+
+   if (is_vlan_dev(dev))
+   mlxsw_sp = mlxsw_sp_lower_get(vlan_dev_real_dev(dev));
+   else
+   mlxsw_sp = mlxsw_sp_lower_get(dev);
 
/* In case ports were pulled from out of a bridged LAG, then
 * it's possible the reference count isn't zero, yet the bridge
@@ -2109,7 +2115,7 @@ mlxsw_sp_bridge_8021d_port_leave(struct 
mlxsw_sp_bridge_device *bridge_device,
 
vid = is_vlan_dev(dev) ? vlan_dev_vlan_id(dev) : 1;
mlxsw_sp_port_vlan = mlxsw_sp_port_vlan_find_by_vid(mlxsw_sp_port, vid);
-   if (WARN_ON(!mlxsw_sp_port_vlan))
+   if (!mlxsw_sp_port_vlan)
return;
 
mlxsw_sp_port_vlan_bridge_leave(mlxsw_sp_port_vlan);
-- 
2.19.1



[PATCH net 3/4] mlxsw: spectrum_router: Relax GRE decap matching check

2018-12-06 Thread Ido Schimmel
From: Nir Dotan 

GRE decap offload is configured when local routes prefix correspond to the
local address of one of the offloaded GRE tunnels. The matching check was
found to be too strict, such that for a flat GRE configuration, in which
the overlay and underlay traffic share the same non-default VRF, decap flow
was not offloaded.

Relax the check for decap flow offloading. A match occurs if the local
address of the tunnel matches the local route address while both share the
same VRF table.

Fixes: 4607f6d26950 ("mlxsw: spectrum_router: Support IPv4 underlay decap")
Signed-off-by: Nir Dotan 
Signed-off-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 9e9bb57134f2..6ebf99cc3154 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -1275,15 +1275,12 @@ mlxsw_sp_ipip_entry_matches_decap(struct mlxsw_sp 
*mlxsw_sp,
 {
u32 ul_tb_id = l3mdev_fib_table(ul_dev) ? : RT_TABLE_MAIN;
enum mlxsw_sp_ipip_type ipipt = ipip_entry->ipipt;
-   struct net_device *ipip_ul_dev;
 
if (mlxsw_sp->router->ipip_ops_arr[ipipt]->ul_proto != ul_proto)
return false;
 
-   ipip_ul_dev = __mlxsw_sp_ipip_netdev_ul_dev_get(ipip_entry->ol_dev);
return mlxsw_sp_ipip_entry_saddr_matches(mlxsw_sp, ul_proto, ul_dip,
-ul_tb_id, ipip_entry) &&
-  (!ipip_ul_dev || ipip_ul_dev == ul_dev);
+ul_tb_id, ipip_entry);
 }
 
 /* Given decap parameters, find the corresponding IPIP entry. */
-- 
2.19.1



[PATCH net 2/4] mlxsw: spectrum_switchdev: Avoid leaking FID's reference count

2018-12-06 Thread Ido Schimmel
It should never be possible for a user to set a VNI on a FID in case one
is already set. The driver therefore returns an error, but fails to drop
the reference count taken earlier when calling
mlxsw_sp_fid_8021d_lookup().

Drop the reference when this unlikely error is hit.

Fixes: 1c30d1836aeb ("mlxsw: spectrum: Enable VxLAN enslavement to bridges")
Signed-off-by: Ido Schimmel 
Reviewed-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 739a51f0a366..7f2091c2648e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2134,8 +2134,10 @@ mlxsw_sp_bridge_8021d_vxlan_join(struct 
mlxsw_sp_bridge_device *bridge_device,
if (!fid)
return -EINVAL;
 
-   if (mlxsw_sp_fid_vni_is_set(fid))
-   return -EINVAL;
+   if (mlxsw_sp_fid_vni_is_set(fid)) {
+   err = -EINVAL;
+   goto err_vni_exists;
+   }
 
	err = mlxsw_sp_nve_fid_enable(mlxsw_sp, fid, &params, extack);
if (err)
@@ -2149,6 +2151,7 @@ mlxsw_sp_bridge_8021d_vxlan_join(struct 
mlxsw_sp_bridge_device *bridge_device,
return 0;
 
 err_nve_fid_enable:
+err_vni_exists:
mlxsw_sp_fid_put(fid);
return err;
 }
-- 
2.19.1



[PATCH net 0/4] mlxsw: Various fixes

2018-12-06 Thread Ido Schimmel
Patches #1 and #2 fix two VxLAN related issues. The first patch removes
warnings that can currently be triggered from user space. Second patch
avoids leaking a FID in an error path.

Patch #3 fixes a too strict check that causes certain host routes not to
be promoted to perform GRE decapsulation in hardware.

Last patch avoids a use-after-free when deleting a VLAN device via an
ioctl when it is enslaved to a bridge. I have a patchset for net-next
that reworks this code and makes the driver more robust.

Ido Schimmel (3):
  mlxsw: spectrum_nve: Remove easily triggerable warnings
  mlxsw: spectrum_switchdev: Avoid leaking FID's reference count
  mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl

Nir Dotan (1):
  mlxsw: spectrum_router: Relax GRE decap matching check

 .../net/ethernet/mellanox/mlxsw/spectrum_nve.c  |  4 ++--
 .../ethernet/mellanox/mlxsw/spectrum_router.c   |  5 +
 .../mellanox/mlxsw/spectrum_switchdev.c | 17 +
 3 files changed, 16 insertions(+), 10 deletions(-)

-- 
2.19.1



[PATCH net 1/4] mlxsw: spectrum_nve: Remove easily triggerable warnings

2018-12-06 Thread Ido Schimmel
It is possible to trigger a warning in mlxsw in case a flood entry which
mlxsw is not aware of is deleted from the VxLAN device. This is because
mlxsw expects to find a singly linked list where the flood entry is
present in.

Fix by removing these warnings for now.

We will re-add them in the next release, after we teach mlxsw to ask
the VxLAN device for a dump of FDB entries once it is enslaved to a
bridge mlxsw cares about.

Fixes: 6e6030bd5412 ("mlxsw: spectrum_nve: Implement common NVE core")
Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
index ad06d9969bc1..5c13674439f1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.c
@@ -560,7 +560,7 @@ static void mlxsw_sp_nve_mc_list_ip_del(struct mlxsw_sp 
*mlxsw_sp,
 
mc_record = mlxsw_sp_nve_mc_record_find(mc_list, proto, addr,
						&mc_entry);
-   if (WARN_ON(!mc_record))
+   if (!mc_record)
return;
 
mlxsw_sp_nve_mc_record_entry_del(mc_record, mc_entry);
@@ -647,7 +647,7 @@ void mlxsw_sp_nve_flood_ip_del(struct mlxsw_sp *mlxsw_sp,
 
key.fid_index = mlxsw_sp_fid_index(fid);
	mc_list = mlxsw_sp_nve_mc_list_find(mlxsw_sp, &key);
-   if (WARN_ON(!mc_list))
+   if (!mc_list)
return;
 
mlxsw_sp_nve_fid_flood_index_clear(fid, mc_list);
-- 
2.19.1



Re: [PATCH net] macvlan: remove duplicate check

2018-12-06 Thread Matteo Croce
On Wed, Dec 5, 2018 at 8:40 PM David Miller  wrote:
>
> From: Matteo Croce 
> Date: Tue,  4 Dec 2018 18:05:42 +0100
>
> > Following commit 59f997b088d2 ("macvlan: return correct error value"),
> > there is a duplicate check for mac addresses both in macvlan_sync_address()
> > and macvlan_set_mac_address().
> > As the former calls the latter, remove the one in macvlan_set_mac_address()
> > and move the one in macvlan_sync_address() before any other check.
> >
> > Signed-off-by: Matteo Croce 
>
> Hmmm, doesn't this change behavior?
>
> For the handling of the NETDEV_CHANGEADDR event in macvlan_device_event()
> we would make it to macvlan_sync_address(), and if IFF_UP is false,
> we would elide the macvlan_addr_busy() check and just copy the MAC addres
> over and return.
>
> Now, we would always perform the macvlan_addr_busy() check.
>
> Please, if this is OK, explain and document this behavioral chance in
> the commit message.
>
> Thank you.

Hi David,

I looked at macvlan_device_event() again. Correct me if I'm wrong:
That function is meant to handle changes to the macvlan lower device.
In my case, it receives a NETDEV_CHANGEADDR after the lower device
MAC address is changed.
Actually, events are handled only if the macvlan mode is passthru,
while in all other modes NOTIFY_DONE is just returned, so
macvlan_sync_address() is called only for passthru mode.
The passthru mode mandates that the macvlan and phy addresses are the
same, hence macvlan_addr_busy() skips the address comparison if the
mode is passthru, and in the end nothing happens.

Speaking of mac address change, I have a question about the generic code.
If I look at the NOTIFY_BAD definition in include/linux/notifier.h,
the comment states "Bad/Veto action", which suggests to me that a
notifier returning NOTIFY_BAD should prevent a change.
This doesn't happen because in dev_set_mac_address() the event is
sent to notifiers after the change has already been made, and the result of
call_netdevice_notifiers() is ignored anyway.

So in theory a notifier can deny another device address change, but in
practice this doesn't happen. Does it sound right? Just asking.
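
For illustration, a minimal sketch (not from any in-tree driver) of such a
would-be veto. Even if this handler returns NOTIFY_BAD for
NETDEV_CHANGEADDR, dev_set_mac_address() has already copied the new
address and discards the return value of call_netdevice_notifiers():

#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/notifier.h>

/* Hypothetical module that tries to veto MAC address changes. */
static int veto_changeaddr(struct notifier_block *nb, unsigned long event,
                           void *ptr)
{
        struct net_device *dev = netdev_notifier_info_to_dev(ptr);

        if (event == NETDEV_CHANGEADDR) {
                netdev_info(dev, "would veto this address change\n");
                return NOTIFY_BAD;      /* ignored by dev_set_mac_address() */
        }
        return NOTIFY_DONE;
}

static struct notifier_block veto_nb = {
        .notifier_call = veto_changeaddr,
};

static int __init veto_init(void)
{
        return register_netdevice_notifier(&veto_nb);
}

static void __exit veto_exit(void)
{
        unregister_netdevice_notifier(&veto_nb);
}

module_init(veto_init);
module_exit(veto_exit);
MODULE_LICENSE("GPL");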

Regards,
-- 
Matteo Croce
per aspera ad upstream


Re: [PATCH net-next 0/2] platform data controls for mdio-gpio

2018-12-06 Thread Florian Fainelli
Hi Andrew,

On 12/6/18 5:58 AM, Andrew Lunn wrote:
> Soon to be mainlined is an x86 platform with a Marvell switch, and a
> bit-banging MDIO bus. In order to make this work, the phy_mask of the
> MDIO bus needs to be set to prevent scanning for PHYs, and the
> phy_ignore_ta_mask needs to be set because the switch has broken
> turnaround.

This looks good. I would just make one or two changes: match the
internal phy_mask and phy_ignore_ta_mask types to those in struct
mii_bus, i.e. use u32 instead of int.
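
For illustration, a rough sketch of what I have in mind; the field names
follow the patch titles in this series, the u32 types are my suggestion
here, and the board values below are made up:

/* include/linux/platform_data/mdio-gpio.h (sketch) */
#include <linux/bits.h>
#include <linux/types.h>

struct mdio_gpio_platform_data {
        /* Bits set here are PHY addresses the core must not probe. */
        u32 phy_mask;
        /* Addresses of devices with broken MDIO turnaround. */
        u32 phy_ignore_ta_mask;
};

/* Hypothetical x86 board code: only a switch at address 4, broken TA. */
static struct mdio_gpio_platform_data board_mdio_pdata = {
        .phy_mask               = ~BIT(4),
        .phy_ignore_ta_mask     = BIT(4),
};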

> 
> Add a platform_data structure with these parameters.
> 
> Andrew Lunn (2):
>   net: phy: mdio-gpio: Add platform_data support for phy_mask
>   net: phy: mdio-gpio: Add phy_ignore_ta_mask to platform data
> 
>  MAINTAINERS |  1 +
>  drivers/net/phy/mdio-gpio.c |  7 +++
>  include/linux/platform_data/mdio-gpio.h | 14 ++
>  3 files changed, 22 insertions(+)
>  create mode 100644 include/linux/platform_data/mdio-gpio.h
> 

-- 
Florian


Re: [PATCH bpf-next] tools: bpftool: add a command to dump the trace pipe

2018-12-06 Thread Quentin Monnet
2018-12-05 19:18 UTC-0800 ~ Alexei Starovoitov

> On Wed, Dec 05, 2018 at 06:15:23PM +0000, Quentin Monnet wrote:
 +
 +  /* Allow room for NULL terminating byte and pipe file name */
 +  snprintf(format, sizeof(format), "%%*s %%%zds %%99s %%*s %%*d %%*d\\n",
 +   PATH_MAX - strlen(pipe_name) - 1);
>>>
>>> before scanning trace_pipe could you add a check that trace_options are 
>>> compatible?
>>> Otherwise there will be a lot of garbage printed.
>>> afaik default is rarely changed, so the patch is ok as-is.
>>> The followup some time in the future would be perfect.
>>
>> Sure. What do you mean exactly by compatible options? I can check that
>> "trace_printk" is set, is there any other option that would be relevant?
> 
> See Documentation/trace/ftrace.rst
> a lot of the flags will change the format significantly.
> Like 'bin' will make it binary.
> I'm not suggesting to support all possible output formats.
> Only to check that trace flags match scanf.

fscanf() is only used to retrieve the name of the sysfs directory where
the pipe is located, when listing all the mount points on the system. It
is not used to dump the content from the pipe (which is done with
getline(), so formatting does not matter much).

If the "bin" option is set, "bpftool prog tracelog" will dump the same
binary content as "cat /sys/kernel/debug/tracing/trace_pipe", which is
the expected behaviour (at least with the current patch). Let me know if
you would like me to change this somehow.
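
For context, a minimal userspace sketch of the two steps described above
(illustrative only; the mount parsing format mirrors the one quoted
earlier, and buffer sizes are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        char dir[4096] = "", type[100] = "", path[4200];
        char *line = NULL;
        size_t len = 0;
        FILE *mounts, *tp;
        int found = 0;

        /* Step 1: fscanf() is only used to find a tracefs mount point. */
        mounts = fopen("/proc/mounts", "r");
        if (!mounts)
                return 1;
        while (fscanf(mounts, "%*s %4095s %99s %*s %*d %*d\n", dir, type) == 2) {
                if (!strcmp(type, "tracefs")) {
                        found = 1;
                        break;
                }
        }
        fclose(mounts);
        if (!found)
                return 1;

        /* Step 2: dump trace_pipe with getline(); the content is passed
         * through untouched, like `cat trace_pipe`.
         */
        snprintf(path, sizeof(path), "%s/trace_pipe", dir);
        tp = fopen(path, "r");
        if (!tp)
                return 1;
        while (getline(&line, &len, tp) != -1)
                fputs(line, stdout);
        free(line);
        fclose(tp);
        return 0;
}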

Thanks,
Quentin


[PATCH] bpf: fix overflow of bpf_jit_limit when PAGE_SIZE >= 64K

2018-12-06 Thread Michael Roth
Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
and is adjusted again at init time if MODULES_VADDR is defined.

For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
the compile-time default at boot-time, which is 0x9c400000 when
using a 64K page size. This overflows the signed 32-bit bpf_jit_limit
value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

and can cause various unexpected failures throughout the network
stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8}, 16) 
= -1 ENOTSUPP (Unknown error 524)

and similar failures can be seen with tools like tcpdump. This doesn't
always reproduce, however, and I'm not sure why. The more consistent
failure I've seen is that an Ubuntu 18.04 KVM guest booted on a POWER9
host would time out in systemd/netplan while configuring a virtio-net
NIC, with no noticeable errors in the logs.
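
For reference, a small standalone demonstration of the overflow, assuming
64K pages and the PAGE_SIZE * 40000 compile-time default (the negative
value matches the sysctl output above):

#include <stdio.h>
#include <limits.h>

int main(void)
{
        long page_size = 64 * 1024;     /* 64K pages on this ppc64 config */
        long dflt = page_size * 40000;  /* compile-time BPF_JIT_LIMIT_DEFAULT */
        int jit_limit = (int)dflt;      /* bpf_jit_limit is a signed 32-bit knob */

        printf("default = %ld (0x%lx), as s32 = %d, INT_MAX = %d\n",
               dflt, dflt, jit_limit, INT_MAX);
        return 0;
}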

Fix this by limiting the compile-time default for bpf_jit_limit to
INT_MAX.

Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
allocations")
Cc: linuxppc-...@ozlabs.org
Cc: Daniel Borkmann 
Cc: Sandipan Das 
Cc: Alexei Starovoitov 
Signed-off-by: Michael Roth 
---
 kernel/bpf/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b1a3545d0ec8..55de4746cdfd 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -365,7 +365,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
 }
 
 #ifdef CONFIG_BPF_JIT
-# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 40000)
+# define BPF_MIN(x, y) ((x) < (y) ? (x) : (y))
+# define BPF_JIT_LIMIT_DEFAULT BPF_MIN((PAGE_SIZE * 40000), INT_MAX)
 
 /* All BPF JIT sysctl knobs here. */
 int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
-- 
2.17.1



[PATCH net-next v2 8/8] selftests: mlxsw: Add a new test extack.sh

2018-12-06 Thread Petr Machata
Add a testsuite dedicated to testing extack propagation and related
functionality.

Signed-off-by: Petr Machata 
Acked-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
---
 .../testing/selftests/drivers/net/mlxsw/extack.sh  | 84 ++
 1 file changed, 84 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/extack.sh

diff --git a/tools/testing/selftests/drivers/net/mlxsw/extack.sh 
b/tools/testing/selftests/drivers/net/mlxsw/extack.sh
new file mode 100755
index 000000000000..101a5508bdfd
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/mlxsw/extack.sh
@@ -0,0 +1,84 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test operations that we expect to report extended ack.
+
+lib_dir=$(dirname $0)/../../../net/forwarding
+
+ALL_TESTS="
+   netdev_pre_up_test
+"
+NUM_NETIFS=2
+source $lib_dir/lib.sh
+
+setup_prepare()
+{
+   swp1=${NETIFS[p1]}
+   swp2=${NETIFS[p2]}
+
+   ip link set dev $swp1 up
+   ip link set dev $swp2 up
+}
+
+cleanup()
+{
+   pre_cleanup
+
+   ip link set dev $swp2 down
+   ip link set dev $swp1 down
+}
+
+netdev_pre_up_test()
+{
+   RET=0
+
+   ip link add name br1 up type bridge vlan_filtering 0 mcast_snooping 0
+   ip link add name vx1 up type vxlan id 1000 \
+   local 192.0.2.17 remote 192.0.2.18 \
+   dstport 4789 nolearning noudpcsum tos inherit ttl 100
+
+   ip link set dev vx1 master br1
+   check_err $?
+
+   ip link set dev $swp1 master br1
+   check_err $?
+
+   ip link add name br2 up type bridge vlan_filtering 0 mcast_snooping 0
+   ip link add name vx2 up type vxlan id 2000 \
+   local 192.0.2.17 remote 192.0.2.18 \
+   dstport 4789 nolearning noudpcsum tos inherit ttl 100
+
+   ip link set dev vx2 master br2
+   check_err $?
+
+   ip link set dev $swp2 master br2
+   check_err $?
+
+   # Unsupported configuration: mlxsw demands that all offloaded VXLAN
+   # devices have the same TTL.
+   ip link set dev vx2 down
+   ip link set dev vx2 type vxlan ttl 200
+
+   ip link set dev vx2 up &>/dev/null
+   check_fail $?
+
+   ip link set dev vx2 up 2>&1 >/dev/null | grep -q mlxsw_spectrum
+   check_err $?
+
+   log_test "extack - NETDEV_PRE_UP"
+
+   ip link del dev vx2
+   ip link del dev br2
+
+   ip link del dev vx1
+   ip link del dev br1
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.4.11



Re: [PATCH bpf] bpf: fix default unprivileged allocation limit

2018-12-06 Thread Michael Roth
Quoting Sandipan Das (2018-12-06 03:27:32)
> When using a large page size, the default value of the bpf_jit_limit
> knob becomes invalid and users are not able to run unprivileged bpf
> programs.
> 
> The bpf_jit_limit knob is represented internally as a 32-bit signed
> integer because of which the default value, i.e. PAGE_SIZE * 40000,
> overflows in case of an architecture like powerpc64 which uses 64K
> as the default page size (i.e. CONFIG_PPC_64K_PAGES is set).
> 
> So, instead of depending on the page size, use a constant value.
> 
> Fixes: ede95a63b5e8 ("bpf: add bpf_jit_limit knob to restrict unpriv 
> allocations")

This also consistently caused a virtio-net KVM Ubuntu 18.04 guest to time out
on configuring networking during boot via systemd/netplan. A bisect
pointed to the same commit this patch addresses.

> Signed-off-by: Sandipan Das 
> ---
>  kernel/bpf/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b1a3545d0ec8..a81d097a17fb 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -365,7 +365,7 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
>  }
> 
>  #ifdef CONFIG_BPF_JIT
> -# define BPF_JIT_LIMIT_DEFAULT (PAGE_SIZE * 40000)
> +# define BPF_JIT_LIMIT_DEFAULT (4096 * 40000)

This isn't quite right as we still use (bpf_jit_limit >> PAGE_SHIFT) to check
allocations in bpf_jit_charge_modmem(), so that should be fixed up as well.

An alternative is to clamp BPF_JIT_LIMIT_DEFAULT to INT_MAX, which
fixes the issue for me and is similar to what bpf_jit_charge_init()
does for kernels where MODULES_VADDR is defined. I'll go ahead and
send that patch in case it seems preferable.

> 
>  /* All BPF JIT sysctl knobs here. */
>  int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_ALWAYS_ON);
> -- 
> 2.19.2
> 



[PATCH net-next v2 6/8] net: core: dev: Add call_netdevice_notifiers_extack()

2018-12-06 Thread Petr Machata
In order to propagate extack through NETDEV_PRE_UP, add a new function
call_netdevice_notifiers_extack() that primes the extack field of the
notifier info. Convert call_netdevice_notifiers() to a simple wrapper
around the new function that passes NULL for extack.

Signed-off-by: Petr Machata 
Acked-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
Reviewed-by: David Ahern 
---
 net/core/dev.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b37e320def13..4b033af8e6cd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -162,6 +162,9 @@ static struct list_head offload_base __read_mostly;
 static int netif_rx_internal(struct sk_buff *skb);
 static int call_netdevice_notifiers_info(unsigned long val,
 struct netdev_notifier_info *info);
+static int call_netdevice_notifiers_extack(unsigned long val,
+  struct net_device *dev,
+  struct netlink_ext_ack *extack);
 static struct napi_struct *napi_by_id(unsigned int napi_id);
 
 /*
@@ -1734,6 +1737,18 @@ static int call_netdevice_notifiers_info(unsigned long 
val,
	return raw_notifier_call_chain(&netdev_chain, val, info);
 }
 
+static int call_netdevice_notifiers_extack(unsigned long val,
+  struct net_device *dev,
+  struct netlink_ext_ack *extack)
+{
+   struct netdev_notifier_info info = {
+   .dev = dev,
+   .extack = extack,
+   };
+
+   return call_netdevice_notifiers_info(val, &info);
+}
+
 /**
  * call_netdevice_notifiers - call all network notifier blocks
  *  @val: value passed unmodified to notifier function
@@ -1745,11 +1760,7 @@ static int call_netdevice_notifiers_info(unsigned long 
val,
 
 int call_netdevice_notifiers(unsigned long val, struct net_device *dev)
 {
-   struct netdev_notifier_info info = {
-   .dev = dev,
-   };
-
-   return call_netdevice_notifiers_info(val, &info);
+   return call_netdevice_notifiers_extack(val, dev, NULL);
 }
 EXPORT_SYMBOL(call_netdevice_notifiers);
 
-- 
2.4.11



[PATCH net-next v2 7/8] net: core: dev: Attach extack to NETDEV_PRE_UP

2018-12-06 Thread Petr Machata
Drivers may need to validate configuration of a device that's about to
be upped. Should the validation fail, there's currently no way to
communicate details of the failure to the user, beyond an error number.

To mend that, change __dev_open() to take an extack argument and pass it
from __dev_change_flags() and dev_open(), where it was propagated in the
previous patches.

Change __dev_open() to call call_netdevice_notifiers_extack() so that
the passed-in extack is attached to the NETDEV_PRE_UP notifier.
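
For illustration, a hedged sketch (not taken from mlxsw) of how a
driver-side notifier can now refuse NETDEV_PRE_UP with a human-readable
message; foo_config_is_supported() and the message text are made up:

#include <linux/netdevice.h>
#include <linux/netlink.h>
#include <linux/notifier.h>

static bool foo_config_is_supported(const struct net_device *dev);

static int foo_netdevice_event(struct notifier_block *nb, unsigned long event,
                               void *ptr)
{
        struct netdev_notifier_info *info = ptr;
        struct net_device *dev = netdev_notifier_info_to_dev(ptr);
        struct netlink_ext_ack *extack = netdev_notifier_info_to_extack(info);

        if (event == NETDEV_PRE_UP && !foo_config_is_supported(dev)) {
                /* NL_SET_ERR_MSG_MOD() tolerates a NULL extack. */
                NL_SET_ERR_MSG_MOD(extack, "Unsupported device configuration");
                return notifier_from_errno(-EINVAL);
        }
        return NOTIFY_DONE;
}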

Signed-off-by: Petr Machata 
Acked-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
Reviewed-by: David Ahern 
---
 net/core/dev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 4b033af8e6cd..068b60db35ae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1364,7 +1364,7 @@ void netdev_notify_peers(struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_notify_peers);
 
-static int __dev_open(struct net_device *dev)
+static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 {
const struct net_device_ops *ops = dev->netdev_ops;
int ret;
@@ -1380,7 +1380,7 @@ static int __dev_open(struct net_device *dev)
 */
netpoll_poll_disable(dev);
 
-   ret = call_netdevice_notifiers(NETDEV_PRE_UP, dev);
+   ret = call_netdevice_notifiers_extack(NETDEV_PRE_UP, dev, extack);
ret = notifier_to_errno(ret);
if (ret)
return ret;
@@ -1427,7 +1427,7 @@ int dev_open(struct net_device *dev, struct 
netlink_ext_ack *extack)
if (dev->flags & IFF_UP)
return 0;
 
-   ret = __dev_open(dev);
+   ret = __dev_open(dev, extack);
if (ret < 0)
return ret;
 
@@ -7547,7 +7547,7 @@ int __dev_change_flags(struct net_device *dev, unsigned 
int flags,
if (old_flags & IFF_UP)
__dev_close(dev);
else
-   ret = __dev_open(dev);
+   ret = __dev_open(dev, extack);
}
 
if ((flags ^ dev->gflags) & IFF_PROMISC) {
-- 
2.4.11



[PATCH net-next v2 5/8] net: core: dev: Add extack argument to __dev_change_flags()

2018-12-06 Thread Petr Machata
In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. The last missing API is __dev_change_flags().

Therefore extend __dev_change_flags() with an extra extack argument and
update the two existing users.

Since the function declaration line is changed anyway, name the struct
net_device argument to placate checkpatch.

Signed-off-by: Petr Machata 
Acked-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
Reviewed-by: David Ahern 
---
 include/linux/netdevice.h | 3 ++-
 net/core/dev.c| 5 +++--
 net/core/rtnetlink.c  | 3 ++-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 18cf464450ee..fc6ba71513be 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3611,7 +3611,8 @@ int dev_ioctl(struct net *net, unsigned int cmd, struct 
ifreq *ifr,
 int dev_ifconf(struct net *net, struct ifconf *, int);
 int dev_ethtool(struct net *net, struct ifreq *);
 unsigned int dev_get_flags(const struct net_device *);
-int __dev_change_flags(struct net_device *, unsigned int flags);
+int __dev_change_flags(struct net_device *dev, unsigned int flags,
+  struct netlink_ext_ack *extack);
 int dev_change_flags(struct net_device *dev, unsigned int flags,
 struct netlink_ext_ack *extack);
 void __dev_notify_flags(struct net_device *, unsigned int old_flags,
diff --git a/net/core/dev.c b/net/core/dev.c
index 8bba6f98b545..b37e320def13 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7498,7 +7498,8 @@ unsigned int dev_get_flags(const struct net_device *dev)
 }
 EXPORT_SYMBOL(dev_get_flags);
 
-int __dev_change_flags(struct net_device *dev, unsigned int flags)
+int __dev_change_flags(struct net_device *dev, unsigned int flags,
+  struct netlink_ext_ack *extack)
 {
unsigned int old_flags = dev->flags;
int ret;
@@ -7606,7 +7607,7 @@ int dev_change_flags(struct net_device *dev, unsigned int 
flags,
int ret;
unsigned int changes, old_flags = dev->flags, old_gflags = dev->gflags;
 
-   ret = __dev_change_flags(dev, flags);
+   ret = __dev_change_flags(dev, flags, extack);
if (ret < 0)
return ret;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4c9e4e187600..91a0f7477f8e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2871,7 +2871,8 @@ int rtnl_configure_link(struct net_device *dev, const 
struct ifinfomsg *ifm)
 
old_flags = dev->flags;
if (ifm && (ifm->ifi_flags || ifm->ifi_change)) {
-   err = __dev_change_flags(dev, rtnl_dev_combine_flags(dev, ifm));
+   err = __dev_change_flags(dev, rtnl_dev_combine_flags(dev, ifm),
+NULL);
if (err < 0)
return err;
}
-- 
2.4.11



[PATCH net-next v2 4/8] net: core: dev: Add extack argument to dev_change_flags()

2018-12-06 Thread Petr Machata
In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_change_flags().

Therefore extend dev_change_flags() with an extra extack argument and
update all users. Most of the calls end up just encoding NULL, but
several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

Since the function declaration line is changed anyway, name the other
function arguments to placate checkpatch.

Signed-off-by: Petr Machata 
Acked-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
Reviewed-by: David Ahern 
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  6 +++---
 drivers/net/hyperv/netvsc_drv.c   |  2 +-
 drivers/net/ipvlan/ipvlan_main.c  | 12 
 drivers/net/vrf.c |  4 ++--
 include/linux/netdevice.h |  3 ++-
 net/8021q/vlan.c  |  4 +++-
 net/core/dev.c|  4 +++-
 net/core/dev_ioctl.c  |  2 +-
 net/core/net-sysfs.c  |  2 +-
 net/core/rtnetlink.c  |  3 ++-
 net/ipv4/devinet.c|  2 +-
 net/ipv4/ipconfig.c   |  6 +++---
 net/openvswitch/vport-geneve.c|  2 +-
 net/openvswitch/vport-gre.c   |  2 +-
 net/openvswitch/vport-vxlan.c |  2 +-
 15 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 8710214594d8..6214d8c0d546 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -167,7 +167,7 @@ int ipoib_open(struct net_device *dev)
if (flags & IFF_UP)
continue;
 
-   dev_change_flags(cpriv->dev, flags | IFF_UP);
+   dev_change_flags(cpriv->dev, flags | IFF_UP, NULL);
}
up_read(>vlan_rwsem);
}
@@ -207,7 +207,7 @@ static int ipoib_stop(struct net_device *dev)
if (!(flags & IFF_UP))
continue;
 
-   dev_change_flags(cpriv->dev, flags & ~IFF_UP);
+   dev_change_flags(cpriv->dev, flags & ~IFF_UP, NULL);
}
up_read(>vlan_rwsem);
}
@@ -1823,7 +1823,7 @@ static void ipoib_parent_unregister_pre(struct net_device 
*ndev)
 * running ensures the it will not add more work.
 */
rtnl_lock();
-   dev_change_flags(priv->dev, priv->dev->flags & ~IFF_UP);
+   dev_change_flags(priv->dev, priv->dev->flags & ~IFF_UP, NULL);
rtnl_unlock();
 
/* ipoib_event() cannot be running once this returns */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index c65620adab52..18b5584d6377 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1993,7 +1993,7 @@ static void __netvsc_vf_setup(struct net_device *ndev,
"unable to change mtu to %u\n", ndev->mtu);
 
/* set multicast etc flags on VF */
-   dev_change_flags(vf_netdev, ndev->flags | IFF_SLAVE);
+   dev_change_flags(vf_netdev, ndev->flags | IFF_SLAVE, NULL);
 
/* sync address list from ndev to VF */
netif_addr_lock_bh(ndev);
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 14f1cbd3b96f..c3d3e458f541 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -85,10 +85,12 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 
nval,
flags = ipvlan->dev->flags;
if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S) {
err = dev_change_flags(ipvlan->dev,
-  flags | IFF_NOARP);
+  flags | IFF_NOARP,
+  extack);
} else {
err = dev_change_flags(ipvlan->dev,
-  flags & ~IFF_NOARP);
+  flags & ~IFF_NOARP,
+  extack);
}
if (unlikely(err))
goto fail;
@@ -117,9 +119,11 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, 
u16 nval,
flags = ipvlan->dev->flags;
if (port->mode == IPVLAN_MODE_L3 ||
port->mode == IPVLAN_MODE_L3S)
-   dev_change_flags(ipvlan->dev, flags | IFF_NOARP);
+   dev_change_flags(ipvlan->dev, flags | 

[PATCH net-next v2 3/8] net: ipvlan: ipvlan_set_port_mode(): Add an extack argument

2018-12-06 Thread Petr Machata
A follow-up patch will extend dev_change_flags() with an extack
argument. Extend ipvlan_set_port_mode() to have that argument available
for the conversion.

Signed-off-by: Petr Machata 
Acked-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
---
 drivers/net/ipvlan/ipvlan_main.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 4a949569ec4c..14f1cbd3b96f 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -71,7 +71,8 @@ static void ipvlan_unregister_nf_hook(struct net *net)
ARRAY_SIZE(ipvl_nfops));
 }
 
-static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
+static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval,
+   struct netlink_ext_ack *extack)
 {
struct ipvl_dev *ipvlan;
struct net_device *mdev = port->dev;
@@ -498,7 +499,7 @@ static int ipvlan_nl_changelink(struct net_device *dev,
if (data[IFLA_IPVLAN_MODE]) {
u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
 
-   err = ipvlan_set_port_mode(port, nmode);
+   err = ipvlan_set_port_mode(port, nmode, extack);
}
 
if (!err && data[IFLA_IPVLAN_FLAGS]) {
@@ -672,7 +673,7 @@ int ipvlan_link_new(struct net *src_net, struct net_device 
*dev,
if (data && data[IFLA_IPVLAN_MODE])
mode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
 
-   err = ipvlan_set_port_mode(port, mode);
+   err = ipvlan_set_port_mode(port, mode, extack);
if (err)
goto unlink_netdev;
 
-- 
2.4.11



[PATCH net-next v2 2/8] net: vrf: cycle_netdev(): Add an extack argument

2018-12-06 Thread Petr Machata
A follow-up patch will extend dev_change_flags() with an extack
argument. Extend cycle_netdev() to have that argument available for the
conversion.

Signed-off-by: Petr Machata 
Acked-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
Reviewed-by: David Ahern 
---
 drivers/net/vrf.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 21ad4b1d7f03..1e9f2dc0de07 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -747,7 +747,8 @@ static int vrf_rtable_create(struct net_device *dev)
 / device handling /
 
 /* cycle interface to flush neighbor cache and move routes across tables */
-static void cycle_netdev(struct net_device *dev)
+static void cycle_netdev(struct net_device *dev,
+struct netlink_ext_ack *extack)
 {
unsigned int flags = dev->flags;
int ret;
@@ -785,7 +786,7 @@ static int do_vrf_add_slave(struct net_device *dev, struct 
net_device *port_dev,
if (ret < 0)
goto err;
 
-   cycle_netdev(port_dev);
+   cycle_netdev(port_dev, extack);
 
return 0;
 
@@ -815,7 +816,7 @@ static int do_vrf_del_slave(struct net_device *dev, struct 
net_device *port_dev)
netdev_upper_dev_unlink(port_dev, dev);
port_dev->priv_flags &= ~IFF_L3MDEV_SLAVE;
 
-   cycle_netdev(port_dev);
+   cycle_netdev(port_dev, NULL);
 
return 0;
 }
-- 
2.4.11



[PATCH net-next v2 1/8] net: core: dev: Add extack argument to dev_open()

2018-12-06 Thread Petr Machata
In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_open().

Therefore extend dev_open() with an extra extack argument and update
all users. Most of the calls end up just encoding NULL, but bond and
team drivers have the extack readily available.

Signed-off-by: Petr Machata 
Acked-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
Reviewed-by: David Ahern 
---
 drivers/net/bonding/bond_main.c | 2 +-
 drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c | 2 +-
 drivers/net/ethernet/cisco/enic/enic_ethtool.c  | 2 +-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c| 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c  | 2 +-
 drivers/net/ethernet/sfc/ethtool.c  | 2 +-
 drivers/net/ethernet/sfc/falcon/ethtool.c   | 2 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c   | 2 +-
 drivers/net/hyperv/netvsc_drv.c | 4 ++--
 drivers/net/net_failover.c  | 8 
 drivers/net/team/team.c | 2 +-
 drivers/net/wireless/intersil/hostap/hostap_main.c  | 2 +-
 drivers/s390/net/qeth_l2_main.c | 2 +-
 drivers/s390/net/qeth_l3_main.c | 2 +-
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 2 +-
 drivers/staging/unisys/visornic/visornic_main.c | 2 +-
 include/linux/netdevice.h   | 2 +-
 net/bluetooth/6lowpan.c | 2 +-
 net/core/dev.c  | 5 +++--
 net/core/netpoll.c  | 2 +-
 net/ipv4/ipmr.c | 4 ++--
 net/ipv6/addrconf.c | 2 +-
 net/ipv6/ip6mr.c| 2 +-
 23 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 87f1f1fe..6b34dbefa7dd 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1538,7 +1538,7 @@ int bond_enslave(struct net_device *bond_dev, struct 
net_device *slave_dev,
slave_dev->flags |= IFF_SLAVE;
 
/* open the slave since the application closed it */
-   res = dev_open(slave_dev);
+   res = dev_open(slave_dev, extack);
if (res) {
netdev_dbg(bond_dev, "Opening slave %s failed\n", 
slave_dev->name);
goto err_restore_mac;
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
index a5fd71692c8b..43b42615ad84 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
@@ -525,7 +525,7 @@ static int aq_set_ringparam(struct net_device *ndev,
}
}
if (ndev_running)
-   err = dev_open(ndev);
+   err = dev_open(ndev, NULL);
 
 err_exit:
return err;
diff --git a/drivers/net/ethernet/cisco/enic/enic_ethtool.c 
b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
index f42f7a6e1559..ebd5c2cf1efe 100644
--- a/drivers/net/ethernet/cisco/enic/enic_ethtool.c
+++ b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
@@ -241,7 +241,7 @@ static int enic_set_ringparam(struct net_device *netdev,
}
enic_init_vnic_resources(enic);
if (running) {
-   err = dev_open(netdev);
+   err = dev_open(netdev, NULL);
if (err)
goto err_out;
}
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 774beda040a1..8e9b95871d30 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -624,7 +624,7 @@ static void hns_nic_self_test(struct net_device *ndev,
clear_bit(NIC_STATE_TESTING, >state);
 
if (if_running)
-   (void)dev_open(ndev);
+   (void)dev_open(ndev, NULL);
}
/* Online tests aren't run; pass by default */
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 4563638367ac..e678b6939da3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -821,7 +821,7 @@ static int hns3_set_ringparam(struct net_device *ndev,
}
 
if (if_running)
-   ret = dev_open(ndev);
+   ret = dev_open(ndev, NULL);
 
return ret;
 }
diff --git a/drivers/net/ethernet/sfc/ethtool.c 
b/drivers/net/ethernet/sfc/ethtool.c
index 3143588ffd77..600d7b895cf2 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -539,7 +539,7 @@ static 
