Re: [PATCH net] cxgb4: Added missing break in ndo_udp_tunnel_{add/del}

2018-07-25 Thread David Miller
From: Arjun Vynipadath 
Date: Wed, 25 Jul 2018 19:39:52 +0530

> Break statements were missing for the Geneve case in
> ndo_udp_tunnel_{add/del}, so raw MAC matchall
> entries were not getting added.
> 
> Fixes: c746fc0e8b2d ("cxgb4: add geneve offload support for T6")
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied and queued up for -stable.
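The commit message implies that, without the `break`, the Geneve case fell through into the `default` case, which returns before the raw MAC matchall entries are added. A minimal sketch of that failure mode, assuming that shape; the enum, the counters, and both functions are hypothetical and not taken from cxgb4:

```c
#include <assert.h>

/* Hypothetical stand-ins for the tunnel types the driver handles. */
enum tunnel_type { TUNNEL_VXLAN, TUNNEL_GENEVE, TUNNEL_OTHER };

int geneve_ports;           /* Geneve-specific setup performed */
int matchall_entries_added; /* common setup done after the switch */

/* Buggy shape: the Geneve case lacks a break and falls into default. */
void tunnel_add_buggy(enum tunnel_type type)
{
	switch (type) {
	case TUNNEL_VXLAN:
		break;
	case TUNNEL_GENEVE:
		geneve_ports++;
		/* missing break: execution continues into default */
	default:
		return;
	}
	matchall_entries_added++; /* never reached for Geneve */
}

/* Fixed shape: every handled case ends with break. */
void tunnel_add_fixed(enum tunnel_type type)
{
	switch (type) {
	case TUNNEL_VXLAN:
		break;
	case TUNNEL_GENEVE:
		geneve_ports++;
		break;
	default:
		return;
	}
	matchall_entries_added++;
}
```

Calling `tunnel_add_buggy(TUNNEL_GENEVE)` performs the Geneve setup but leaves `matchall_entries_added` unchanged, while `tunnel_add_fixed(TUNNEL_GENEVE)` reaches the common code. GCC's `-Wimplicit-fallthrough` can flag this kind of unintended fallthrough at compile time.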


Re: [PATCH net] netdevsim: don't leak devlink resources

2018-07-25 Thread David Miller
From: Jakub Kicinski 
Date: Wed, 25 Jul 2018 15:39:27 -0700

> Devlink resources registered with devlink_resource_register() have
> to be unregistered.
> 
> Fixes: 37923ed6b8ce ("netdevsim: Add simple FIB resource controller via devlink")
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Quentin Monnet 

Applied and queued up for -stable.


Re: [PATCH net-next 0/4] nfp: protect from theoretical size overflows and SR-IOV errors

2018-07-25 Thread David Miller
From: Jakub Kicinski 
Date: Wed, 25 Jul 2018 19:40:33 -0700

> This small set changes the handling of pci_sriov_set_totalvfs() errors.
> nfp is the only driver which fails probe on pci_sriov_set_totalvfs()
> errors.  It turns out some BIOS configurations may break SR-IOV and
> users who don't use that feature should not suffer.
> 
> The remaining patches make sure we use overflow-safe functions for ring
> allocation, even though ring sizes are limited.  It won't hurt, and
> we can also enable fallback to vmalloc() if memory is tight while
> at it.

Series applied, thanks Jakub.


Re: [PATCH rdma-next v2 0/8] Support mlx5 flow steering with RAW data

2018-07-25 Thread Leon Romanovsky
On Wed, Jul 25, 2018 at 08:35:17AM -0600, Jason Gunthorpe wrote:
> On Wed, Jul 25, 2018 at 08:37:03AM +0300, Leon Romanovsky wrote:
>
> > > Also, I would like to keep the specs consistently formatted according
> > > to clang-format with 'BinPackParameters: true', so I reflowed them as
> > > well.
> >
> > I'm using the default VIM clang-format.py without anything in .clang-format.
> > Do you have any extra definitions there, besides BinPackParameters?
>
> These days Linux includes a top level .clang-format that does a
> pretty good job.
>
> I have to manually switch BinPackParameters on when working with these
> specs to get the right indenting. A pain, but maybe there is a better
> way someday.

I don't think it is feasible to ask people to change some
defaults only for patches that touch those specs. Any later change in
this area will revert the formatting.

Jason, bottom line, I won't use BinPackParameters for my patches.

Thanks

>
> Jason




[PATCH bpf-next 2/6] nfp: allow control message reception on data queues

2018-07-25 Thread Jakub Kicinski
Port id 0x is reserved for control messages.  Allow reception
of messages with this id on data queues.  Hand off a raw buffer to
the higher layer code, without allocating an SKB, for maximum
efficiency.  The RX handler can't modify or keep the buffer; after it
returns, the buffer is handed back to the NIC RX free buffer list.
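The fast path described above can be sketched in userspace C. This is a hypothetical model, not the nfp code: `DEMO_PORT_ID_CTRL` is a placeholder for `NFP_META_PORT_ID_CTRL` (its real value is not assumed here), and the buffer type and handler are illustrative. The property it shows is that the handler only borrows the buffer, which is recycled immediately instead of being wrapped in an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for NFP_META_PORT_ID_CTRL; the value is an assumption. */
#define DEMO_PORT_ID_CTRL UINT32_MAX

/* Hypothetical RX buffer: memory owned by the ring, plus a counter of
 * how many times it was returned to the free list. */
struct demo_rx_buf {
	unsigned char *frag;
	unsigned int recycled;
};

typedef void (*demo_raw_handler)(const void *data, unsigned int len);

/* Returns 1 if the packet was a control message consumed in place, 0 if
 * the caller should continue with normal (skb-building) processing. */
int demo_rx_one(struct demo_rx_buf *buf, uint32_t portid,
		unsigned int pkt_off, unsigned int pkt_len,
		demo_raw_handler handler)
{
	assert(buf != NULL);

	if (portid != DEMO_PORT_ID_CTRL)
		return 0;

	/* The handler may only read the data; it must not keep the pointer. */
	if (handler)
		handler(buf->frag + pkt_off, pkt_len);

	/* Recycle the same buffer right away: no skb, no new allocation. */
	buf->recycled++;
	return 1;
}

/* Example handler that just records the message length. */
unsigned int demo_last_len;

void demo_record_len(const void *data, unsigned int len)
{
	(void)data;
	demo_last_len = len;
}
```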

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 drivers/net/ethernet/netronome/nfp/nfp_app.c|  2 ++
 drivers/net/ethernet/netronome/nfp/nfp_app.h| 17 +
 .../net/ethernet/netronome/nfp/nfp_net_common.c | 11 +++
 .../net/ethernet/netronome/nfp/nfp_net_ctrl.h   |  1 +
 4 files changed, 31 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index 69d4ae7a61f3..8607d09ab732 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -172,6 +172,8 @@ struct nfp_app *nfp_app_alloc(struct nfp_pf *pf, enum nfp_app_id id)
 
if (WARN_ON(!apps[id]->name || !apps[id]->vnic_alloc))
return ERR_PTR(-EINVAL);
+   if (WARN_ON(!apps[id]->ctrl_msg_rx && apps[id]->ctrl_msg_rx_raw))
+   return ERR_PTR(-EINVAL);
 
app = kzalloc(sizeof(*app), GFP_KERNEL);
if (!app)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.h b/drivers/net/ethernet/netronome/nfp/nfp_app.h
index afbc19aa66a8..ccb244cf6c30 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.h
@@ -98,6 +98,7 @@ extern const struct nfp_app_type app_abm;
  * @start: start application logic
  * @stop:  stop application logic
  * @ctrl_msg_rx:control message handler
+ * @ctrl_msg_rx_raw:   handler for control messages from data queues
  * @setup_tc:  setup TC ndo
  * @bpf:   BPF ndo offload-related calls
  * @xdp_offload:offload an XDP program
@@ -150,6 +151,8 @@ struct nfp_app_type {
void (*stop)(struct nfp_app *app);
 
void (*ctrl_msg_rx)(struct nfp_app *app, struct sk_buff *skb);
+   void (*ctrl_msg_rx_raw)(struct nfp_app *app, const void *data,
+   unsigned int len);
 
int (*setup_tc)(struct nfp_app *app, struct net_device *netdev,
enum tc_setup_type type, void *type_data);
@@ -318,6 +321,11 @@ static inline bool nfp_app_ctrl_has_meta(struct nfp_app *app)
return app->type->ctrl_has_meta;
 }
 
+static inline bool nfp_app_ctrl_uses_data_vnics(struct nfp_app *app)
+{
+   return app && app->type->ctrl_msg_rx_raw;
+}
+
 static inline const char *nfp_app_extra_cap(struct nfp_app *app,
struct nfp_net *nn)
 {
@@ -381,6 +389,15 @@ static inline void nfp_app_ctrl_rx(struct nfp_app *app, struct sk_buff *skb)
app->type->ctrl_msg_rx(app, skb);
 }
 
+static inline void
+nfp_app_ctrl_rx_raw(struct nfp_app *app, const void *data, unsigned int len)
+{
+   if (!app || !app->type->ctrl_msg_rx_raw)
+   return;
+   trace_devlink_hwmsg(priv_to_devlink(app->pf), true, 0, data, len);
+   app->type->ctrl_msg_rx_raw(app, data, len);
+}
+
 static inline int nfp_app_eswitch_mode_get(struct nfp_app *app, u16 *mode)
 {
if (!app->type->eswitch_mode_get)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index cdfbd1e8bf4b..ca42f7da4f50 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1759,6 +1759,14 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 
if (likely(!meta.portid)) {
netdev = dp->netdev;
+   } else if (meta.portid == NFP_META_PORT_ID_CTRL) {
+   struct nfp_net *nn = netdev_priv(dp->netdev);
+
+   nfp_app_ctrl_rx_raw(nn->app, rxbuf->frag + pkt_off,
+   pkt_len);
+   nfp_net_rx_give_one(dp, rx_ring, rxbuf->frag,
+   rxbuf->dma_addr);
+   continue;
} else {
struct nfp_net *nn;
 
@@ -3857,6 +3865,9 @@ int nfp_net_init(struct nfp_net *nn)
nn->dp.mtu = NFP_NET_DEFAULT_MTU;
nn->dp.fl_bufsz = nfp_net_calc_fl_bufsz(&nn->dp);
 
+   if (nfp_app_ctrl_uses_data_vnics(nn->app))
+   nn->dp.ctrl |= nn->cap & NFP_NET_CFG_CTRL_CMSG_DATA;
+
if (nn->cap & NFP_NET_CFG_CTRL_RSS_ANY) {
nfp_net_rss_init(nn);
nn->dp.ctrl |= nn->cap & NFP_NET_CFG_CTRL_RSS2 ?:
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
index bb63c115537d..44d3ea75d043 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h
@@ -127,6 +127,7 @@
 

[PATCH bpf-next 1/6] nfp: move repr handling on RX path

2018-07-25 Thread Jakub Kicinski
Representor packets are received on PF queues with a special metadata
tag for demux.  There is no reason to resolve the representor ID ->
netdev mapping after the skb has been allocated.  Move the code; this
will allow us to handle special FW messages without SKB allocation
overhead.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 .../ethernet/netronome/nfp/nfp_net_common.c   | 29 ++-
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index cf1704e972b7..cdfbd1e8bf4b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1757,6 +1757,21 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
}
}
 
+   if (likely(!meta.portid)) {
+   netdev = dp->netdev;
+   } else {
+   struct nfp_net *nn;
+
+   nn = netdev_priv(dp->netdev);
+   netdev = nfp_app_repr_get(nn->app, meta.portid);
+   if (unlikely(!netdev)) {
+   nfp_net_rx_drop(dp, r_vec, rx_ring, rxbuf,
+   NULL);
+   continue;
+   }
+   nfp_repr_inc_rx_stats(netdev, pkt_len);
+   }
+
skb = build_skb(rxbuf->frag, true_bufsz);
if (unlikely(!skb)) {
nfp_net_rx_drop(dp, r_vec, rx_ring, rxbuf, NULL);
@@ -1772,20 +1787,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 
nfp_net_rx_give_one(dp, rx_ring, new_frag, new_dma_addr);
 
-   if (likely(!meta.portid)) {
-   netdev = dp->netdev;
-   } else {
-   struct nfp_net *nn;
-
-   nn = netdev_priv(dp->netdev);
-   netdev = nfp_app_repr_get(nn->app, meta.portid);
-   if (unlikely(!netdev)) {
-   nfp_net_rx_drop(dp, r_vec, rx_ring, NULL, skb);
-   continue;
-   }
-   nfp_repr_inc_rx_stats(netdev, pkt_len);
-   }
-
skb_reserve(skb, pkt_off);
skb_put(skb, pkt_len);
 
-- 
2.17.1



[PATCH bpf-next 3/6] nfp: bpf: pass raw data buffer to nfp_bpf_event_output()

2018-07-25 Thread Jakub Kicinski
In preparation for SKB-less perf event handling, make
nfp_bpf_event_output() take a buffer address and length as
parameters, not an SKB.
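Passing a pointer and length instead of an skb moves the ownership decision to the caller: the function only validates and returns an error code, and the caller picks between `dev_consume_skb_any()` and `dev_kfree_skb_any()`. A minimal userspace sketch of the same two length checks, using a simplified, hypothetical header (`struct demo_event_hdr`) rather than the real `struct cmsg_bpf_event`:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical header; the real struct cmsg_bpf_event also
 * carries the ABI version, CPU id and map reference. */
struct demo_event_hdr {
	uint8_t be_pkt_size[4];  /* big-endian: packet bytes that follow */
	uint8_t be_data_size[4]; /* big-endian: extra data bytes that follow */
};

static uint32_t demo_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Reject anything shorter than the header, then anything shorter than
 * header + pkt_size + data_size. Returns 0 or -EINVAL. */
int demo_event_validate(const void *data, unsigned int len)
{
	struct demo_event_hdr hdr;
	uint64_t need;

	assert(data != NULL || len == 0);

	if (len < sizeof(hdr))
		return -EINVAL;
	memcpy(&hdr, data, sizeof(hdr)); /* avoid unaligned access */

	/* 64-bit sum so two large 32-bit sizes cannot wrap around. */
	need = (uint64_t)sizeof(hdr) + demo_be32(hdr.be_pkt_size) +
	       demo_be32(hdr.be_data_size);
	if (len < need)
		return -EINVAL;
	return 0;
}
```

The explicit 64-bit sum is a defensive choice for the sketch; on 64-bit kernels the `sizeof()` term already makes the patch's sum a 64-bit `size_t`.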

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c |  5 -
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  3 ++-
 .../net/ethernet/netronome/nfp/bpf/offload.c  | 21 ---
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
index cb87fccb9f6a..0a89b53962aa 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
@@ -441,7 +441,10 @@ void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
}
 
if (nfp_bpf_cmsg_get_type(skb) == CMSG_TYPE_BPF_EVENT) {
-   nfp_bpf_event_output(bpf, skb);
+   if (!nfp_bpf_event_output(bpf, skb->data, skb->len))
+   dev_consume_skb_any(skb);
+   else
+   dev_kfree_skb_any(skb);
return;
}
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index bec935468f90..e25d3c0c7e43 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -501,7 +501,8 @@ int nfp_bpf_ctrl_lookup_entry(struct bpf_offloaded_map *offmap,
 int nfp_bpf_ctrl_getnext_entry(struct bpf_offloaded_map *offmap,
   void *key, void *next_key);
 
-int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb);
+int nfp_bpf_event_output(struct nfp_app_bpf *bpf, const void *data,
+unsigned int len);
 
 void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb);
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 49b03f7dbf46..293dda84818f 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -453,23 +453,24 @@ nfp_bpf_perf_event_copy(void *dst, const void *src,
return 0;
 }
 
-int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb)
+int nfp_bpf_event_output(struct nfp_app_bpf *bpf, const void *data,
+unsigned int len)
 {
-   struct cmsg_bpf_event *cbe = (void *)skb->data;
+   struct cmsg_bpf_event *cbe = (void *)data;
u32 pkt_size, data_size;
struct bpf_map *map;
 
-   if (skb->len < sizeof(struct cmsg_bpf_event))
-   goto err_drop;
+   if (len < sizeof(struct cmsg_bpf_event))
+   return -EINVAL;
 
pkt_size = be32_to_cpu(cbe->pkt_size);
data_size = be32_to_cpu(cbe->data_size);
map = (void *)(unsigned long)be64_to_cpu(cbe->map_ptr);
 
-   if (skb->len < sizeof(struct cmsg_bpf_event) + pkt_size + data_size)
-   goto err_drop;
+   if (len < sizeof(struct cmsg_bpf_event) + pkt_size + data_size)
+   return -EINVAL;
if (cbe->hdr.ver != CMSG_MAP_ABI_VERSION)
-   goto err_drop;
+   return -EINVAL;
 
rcu_read_lock();
if (!rhashtable_lookup_fast(&bpf->maps_neutral, &map,
@@ -477,7 +478,7 @@ int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb)
rcu_read_unlock();
pr_warn("perf event: dest map pointer %px not recognized, dropping event\n",
map);
-   goto err_drop;
+   return -EINVAL;
}
 
bpf_event_output(map, be32_to_cpu(cbe->cpu_id),
@@ -485,11 +486,7 @@ int nfp_bpf_event_output(struct nfp_app_bpf *bpf, struct sk_buff *skb)
 cbe->data, pkt_size, nfp_bpf_perf_event_copy);
rcu_read_unlock();
 
-   dev_consume_skb_any(skb);
return 0;
-err_drop:
-   dev_kfree_skb_any(skb);
-   return -EINVAL;
 }
 
 static int
-- 
2.17.1



[PATCH bpf-next 5/6] nfp: bpf: remember maps by ID

2018-07-25 Thread Jakub Kicinski
Record perf maps by map ID, not raw kernel pointer.  This helps
with debug messages, because printing pointers to logs is frowned
upon, and makes debugging easier for users, as the map ID is something
they should be more familiar with.  Note that perf maps are offload
neutral, therefore IDs won't be orphaned.

While at it, use a rate-limited print helper for the error message.
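The patch does not show the internals of the rate-limited helper it switches to. As a generic illustration only, an interval/burst limiter in the spirit of the kernel's `struct ratelimit_state` allows at most `burst` messages per `interval` and counts the rest as missed. Time is passed in explicitly to keep the sketch deterministic, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified interval/burst limiter; not the kernel implementation. */
struct demo_ratelimit {
	unsigned long interval; /* window length, in caller's time units */
	unsigned int burst;     /* messages allowed per window */
	unsigned long begin;    /* start of the current window */
	unsigned int printed;   /* messages allowed in the current window */
	unsigned int missed;    /* messages suppressed so far */
	bool started;
};

/* Returns true if a message may be printed at time 'now'. */
bool demo_ratelimit_ok(struct demo_ratelimit *rs, unsigned long now)
{
	assert(rs->burst > 0);

	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->begin = now; /* open a new window */
		rs->printed = 0;
		rs->started = true;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With `interval = 5` and `burst = 2`, calls at times 0 and 1 are allowed, a call at time 2 is suppressed, and a call at time 5 opens a new window and is allowed again.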

Reported-by: Kees Cook 
Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
Reviewed-by: Quentin Monnet 
---
CC: Kees Cook 

 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c |  2 --
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 12 ++
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  4 ++--
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  3 +++
 .../net/ethernet/netronome/nfp/bpf/offload.c  | 22 +++
 5 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
index 1946291bf4fd..2572a4b91c7c 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
@@ -43,8 +43,6 @@
 #include "fw.h"
 #include "main.h"
 
-#define cmsg_warn(bpf, msg...) nn_dp_warn(&(bpf)->app->ctrl->dp, msg)
-
 #define NFP_BPF_TAG_ALLOC_SPAN (U16_MAX / 4)
 
 static bool nfp_bpf_all_tags_busy(struct nfp_app_bpf *bpf)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 1d9e36835404..3c22d27de9da 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -3883,6 +3883,7 @@ static int nfp_bpf_replace_map_ptrs(struct nfp_prog *nfp_prog)
struct nfp_insn_meta *meta1, *meta2;
struct nfp_bpf_map *nfp_map;
struct bpf_map *map;
+   u32 id;
 
nfp_for_each_insn_walk2(nfp_prog, meta1, meta2) {
if (meta1->skip || meta2->skip)
@@ -3894,11 +3895,14 @@ static int nfp_bpf_replace_map_ptrs(struct nfp_prog *nfp_prog)
 
map = (void *)(unsigned long)((u32)meta1->insn.imm |
  (u64)meta2->insn.imm << 32);
-   if (bpf_map_offload_neutral(map))
-   continue;
-   nfp_map = map_to_offmap(map)->dev_priv;
+   if (bpf_map_offload_neutral(map)) {
+   id = map->id;
+   } else {
+   nfp_map = map_to_offmap(map)->dev_priv;
+   id = nfp_map->tid;
+   }
 
-   meta1->insn.imm = nfp_map->tid;
+   meta1->insn.imm = id;
meta2->insn.imm = 0;
}
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 192e88981fb2..cce1d2945a32 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -45,8 +45,8 @@
 
 const struct rhashtable_params nfp_bpf_maps_neutral_params = {
.nelem_hint = 4,
-   .key_len= FIELD_SIZEOF(struct nfp_bpf_neutral_map, ptr),
-   .key_offset = offsetof(struct nfp_bpf_neutral_map, ptr),
+   .key_len= FIELD_SIZEOF(struct bpf_map, id),
+   .key_offset = offsetof(struct nfp_bpf_neutral_map, map_id),
.head_offset= offsetof(struct nfp_bpf_neutral_map, l),
.automatic_shrinking= true,
 };
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index 017e0ae5e736..57573bfa8c03 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -47,6 +47,8 @@
 #include "../nfp_asm.h"
 #include "fw.h"
 
+#define cmsg_warn(bpf, msg...) nn_dp_warn(&(bpf)->app->ctrl->dp, msg)
+
 /* For relocation logic use up-most byte of branch instruction as scratch
  * area.  Remember to clear this before sending instructions to HW!
  */
@@ -221,6 +223,7 @@ struct nfp_bpf_map {
 struct nfp_bpf_neutral_map {
struct rhash_head l;
struct bpf_map *ptr;
+   u32 map_id;
u32 count;
 };
 
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index 293dda84818f..b1fbb3babc7f 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -67,7 +67,7 @@ nfp_map_ptr_record(struct nfp_app_bpf *bpf, struct nfp_prog *nfp_prog,
ASSERT_RTNL();
 
/* Reuse path - other offloaded program is already tracking this map. */
-   record = rhashtable_lookup_fast(&bpf->maps_neutral, &map,
+   record = rhashtable_lookup_fast(&bpf->maps_neutral, &map->id,
nfp_bpf_maps_neutral_params);
if (record) {
nfp_prog->map_records[nfp_prog->map_records_cnt++] = record;
@@ -89,6 +89,7 @@ nfp_map_ptr_record(struct nfp_app_bpf *bpf, 

[PATCH bpf-next 4/6] nfp: bpf: allow receiving perf events on data queues

2018-07-25 Thread Jakub Kicinski
The control queue is fairly low latency and requires SKB allocations,
which means we can't even reach 0.5 Msps with perf events.  Allow
perf events to be delivered to data queues.  This allows us not
only to use multiple queues, but also to receive and deliver to user
space more than 5 Msps per queue (Xeon E5-2630 v4 2.20GHz, no retpolines).

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
Reviewed-by: Quentin Monnet 
---
 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 18 ++
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  1 +
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  3 +++
 3 files changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
index 0a89b53962aa..1946291bf4fd 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/cmsg.c
@@ -468,3 +468,21 @@ void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb)
 err_free:
dev_kfree_skb_any(skb);
 }
+
+void
+nfp_bpf_ctrl_msg_rx_raw(struct nfp_app *app, const void *data, unsigned int len)
+{
+   struct nfp_app_bpf *bpf = app->priv;
+   const struct cmsg_hdr *hdr = data;
+
+   if (unlikely(len < sizeof(struct cmsg_reply_map_simple))) {
+   cmsg_warn(bpf, "cmsg drop - too short %u!\n", len);
+   return;
+   }
+
+   if (hdr->type == CMSG_TYPE_BPF_EVENT)
+   nfp_bpf_event_output(bpf, data, len);
+   else
+   cmsg_warn(bpf, "cmsg drop - msg type %d with raw buffer!\n",
+ hdr->type);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.c b/drivers/net/ethernet/netronome/nfp/bpf/main.c
index 994d2b756fe1..192e88981fb2 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.c
@@ -490,6 +490,7 @@ const struct nfp_app_type app_bpf = {
.vnic_free  = nfp_bpf_vnic_free,
 
.ctrl_msg_rx= nfp_bpf_ctrl_msg_rx,
+   .ctrl_msg_rx_raw= nfp_bpf_ctrl_msg_rx_raw,
 
.setup_tc   = nfp_bpf_setup_tc,
.bpf= nfp_ndo_bpf,
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index e25d3c0c7e43..017e0ae5e736 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -505,4 +505,7 @@ int nfp_bpf_event_output(struct nfp_app_bpf *bpf, const void *data,
 unsigned int len);
 
 void nfp_bpf_ctrl_msg_rx(struct nfp_app *app, struct sk_buff *skb);
+void
+nfp_bpf_ctrl_msg_rx_raw(struct nfp_app *app, const void *data,
+   unsigned int len);
 #endif
-- 
2.17.1



[PATCH bpf-next 0/6] nfp: bpf: improve efficiency of offloaded perf events

2018-07-25 Thread Jakub Kicinski
Hi!

This set is focused on improving the performance of perf events
reported from BPF offload.  Perf events can now be received on
packet data queues, which significantly improves the performance
(from a total of 0.5 Msps to 5 Msps per core).  To get to this
performance we need a fast path for control messages which will
operate on raw buffers and recycle them immediately.

Patch 5 replaces the map pointers for perf maps with map IDs.
We look the pointers up in a hashtable anyway, to validate that they
are correct, so there is no performance difference.  Map IDs
have the advantage of being easier to understand for users in
case of errors (we no longer print raw pointers to the logs).

The last patch improves info messages about map offload.

Jakub Kicinski (6):
  nfp: move repr handling on RX path
  nfp: allow control message reception on data queues
  nfp: bpf: pass raw data buffer to nfp_bpf_event_output()
  nfp: bpf: allow receiving perf events on data queues
  nfp: bpf: remember maps by ID
  nfp: bpf: improve map offload info messages

 drivers/net/ethernet/netronome/nfp/bpf/cmsg.c | 25 +++-
 drivers/net/ethernet/netronome/nfp/bpf/jit.c  | 12 ++--
 drivers/net/ethernet/netronome/nfp/bpf/main.c |  5 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.h |  9 ++-
 .../net/ethernet/netronome/nfp/bpf/offload.c  | 63 +++
 drivers/net/ethernet/netronome/nfp/nfp_app.c  |  2 +
 drivers/net/ethernet/netronome/nfp/nfp_app.h  | 17 +
 .../ethernet/netronome/nfp/nfp_net_common.c   | 40 +++-
 .../net/ethernet/netronome/nfp/nfp_net_ctrl.h |  1 +
 9 files changed, 125 insertions(+), 49 deletions(-)

-- 
2.17.1



[PATCH bpf-next 6/6] nfp: bpf: improve map offload info messages

2018-07-25 Thread Jakub Kicinski
FW can put constraints on map element size to maximize resource
use and efficiency.  When a user attempts to offload a map which
does not fit into those constraints, an informational message is
printed to the kernel logs to tell the user why offload failed.
Map offload does not have access to any advanced error
reporting like the verifier log or extack.  There is also currently
no way for us to nicely expose the FW capabilities to user
space.  Given all those constraints, we should make sure log
messages are as informative as possible.  Improve them.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 .../net/ethernet/netronome/nfp/bpf/offload.c  | 20 +++
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/offload.c b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
index b1fbb3babc7f..1ccd6371a15b 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/offload.c
@@ -380,11 +380,23 @@ nfp_bpf_map_alloc(struct nfp_app_bpf *bpf, struct bpf_offloaded_map *offmap)
bpf->maps.max_elems - bpf->map_elems_in_use);
return -ENOMEM;
}
-   if (offmap->map.key_size > bpf->maps.max_key_sz ||
-   offmap->map.value_size > bpf->maps.max_val_sz ||
-   round_up(offmap->map.key_size, 8) +
+
+   if (round_up(offmap->map.key_size, 8) +
round_up(offmap->map.value_size, 8) > bpf->maps.max_elem_sz) {
-   pr_info("elements don't fit in device constraints\n");
+   pr_info("map elements too large: %u, FW max element size (key+value): %u\n",
+   round_up(offmap->map.key_size, 8) +
+   round_up(offmap->map.value_size, 8),
+   bpf->maps.max_elem_sz);
+   return -ENOMEM;
+   }
+   if (offmap->map.key_size > bpf->maps.max_key_sz) {
+   pr_info("map key size %u, FW max is %u\n",
+   offmap->map.key_size, bpf->maps.max_key_sz);
+   return -ENOMEM;
+   }
+   if (offmap->map.value_size > bpf->maps.max_val_sz) {
+   pr_info("map value size %u, FW max is %u\n",
+   offmap->map.value_size, bpf->maps.max_val_sz);
return -ENOMEM;
}
 
-- 
2.17.1



[PATCH net-next 3/4] nfp: restore correct ordering of fields in rx ring structure

2018-07-25 Thread Jakub Kicinski
Commit 7f1c684a8966 ("nfp: setup xdp_rxq_info") mixed the cache
cold and cache hot data in the nfp_net_rx_ring structure (ignoring
the feedback), to try to fit the structure into 2 cache lines
after struct xdp_rxq_info was added.  Now that we are about to add
a new field, the structure will grow back to 3 cache lines, so
order the members correctly.
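The ordering rule is to keep fields touched on every packet together at the start of the structure and push setup-only fields, such as the DMA address and size used at allocation and free time, to the end. A hypothetical layout illustrating the idea, assuming 64-byte cache lines; it is not the real `struct nfp_net_rx_ring`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring layout: hot fields first, cold fields last. */
struct demo_rx_ring {
	/* hot: touched in the RX fast path */
	uint32_t rd_p;
	uint32_t wr_p;
	uint32_t cnt;
	void *rxbufs;
	void *rxds;

	/* cold: only needed when the ring is allocated or freed */
	uint64_t dma;
	size_t size;
};

/* Bytes occupied by the hot fields, i.e. everything before the first
 * cold member. */
size_t demo_hot_bytes(void)
{
	return offsetof(struct demo_rx_ring, dma);
}
```

`offsetof()` makes the hot/cold boundary checkable in code; for real kernel structures, `pahole` reports the same layout information, including holes and cache line boundaries.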

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 8970ec981e11..607896910fb0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -350,9 +350,9 @@ struct nfp_net_rx_buf {
  * @qcp_fl: Pointer to base of the QCP freelist queue
  * @rxbufs: Array of transmitted FL/RX buffers
  * @rxds:   Virtual address of FL/RX ring in host memory
+ * @xdp_rxq:RX-ring info avail for XDP
  * @dma:DMA address of the FL/RX ring
  * @size:   Size, in bytes, of the FL/RX ring (needed to free)
- * @xdp_rxq:RX-ring info avail for XDP
  */
 struct nfp_net_rx_ring {
struct nfp_net_r_vector *r_vec;
@@ -364,14 +364,15 @@ struct nfp_net_rx_ring {
u32 idx;
 
int fl_qcidx;
-   unsigned int size;
u8 __iomem *qcp_fl;
 
struct nfp_net_rx_buf *rxbufs;
struct nfp_net_rx_desc *rxds;
 
-   dma_addr_t dma;
struct xdp_rxq_info xdp_rxq;
+
+   dma_addr_t dma;
+   unsigned int size;
 } cacheline_aligned;
 
 /**
-- 
2.17.1



[PATCH net-next 0/4] nfp: protect from theoretical size overflows and SR-IOV errors

2018-07-25 Thread Jakub Kicinski
Hi!

This small set changes the handling of pci_sriov_set_totalvfs() errors.
nfp is the only driver which fails probe on pci_sriov_set_totalvfs()
errors.  It turns out some BIOS configurations may break SR-IOV and
users who don't use that feature should not suffer.

The remaining patches make sure we use overflow-safe functions for ring
allocation, even though ring sizes are limited.  It won't hurt, and
we can also enable fallback to vmalloc() if memory is tight while
at it.

Jakub Kicinski (4):
  nfp: don't fail probe on pci_sriov_set_totalvfs() errors
  nfp: use kvcalloc() to allocate SW buffer descriptor arrays
  nfp: restore correct ordering of fields in rx ring structure
  nfp: protect from theoretical size overflows on HW descriptor ring

 drivers/net/ethernet/netronome/nfp/nfp_main.c | 20 +--
 drivers/net/ethernet/netronome/nfp/nfp_net.h  |  9 ---
 .../ethernet/netronome/nfp/nfp_net_common.c   | 25 ++-
 3 files changed, 30 insertions(+), 24 deletions(-)

-- 
2.17.1



[PATCH net-next 2/4] nfp: use kvcalloc() to allocate SW buffer descriptor arrays

2018-07-25 Thread Jakub Kicinski
Use kvcalloc() instead of a temporary size variable + kzalloc() when
allocating SW buffer information, to allow falling back to vmalloc()
and to protect from a theoretical integer overflow.
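`kvcalloc()` combines two things the open-coded `kzalloc(sizeof(*x) * cnt)` lacked: an overflow check on the multiplication, and a fallback from `kmalloc()` to `vmalloc()` when a large physically contiguous allocation fails. A userspace sketch of that behaviour, assuming a hypothetical size limit (`DEMO_CONTIG_MAX`) to stand in for a failing contiguous allocation, with `malloc()` playing both tiers:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit above which the "contiguous" tier refuses requests,
 * standing in for a large kmalloc() failing under fragmentation. */
#define DEMO_CONTIG_MAX 4096u

unsigned int demo_fallbacks; /* how often the second tier was used */

static void *demo_contig_alloc(size_t bytes)
{
	return bytes <= DEMO_CONTIG_MAX ? malloc(bytes) : NULL;
}

/* Zeroed array allocation: overflow check, contiguous tier, fallback. */
void *demo_kvcalloc(size_t n, size_t size)
{
	size_t bytes;
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL; /* n * size would wrap around */
	bytes = n * size ? n * size : 1;

	p = demo_contig_alloc(bytes);
	if (!p) {
		p = malloc(bytes); /* stand-in for vmalloc() */
		if (p)
			demo_fallbacks++;
	}
	if (p)
		memset(p, 0, bytes);
	return p;
}
```

Because both tiers use `malloc()` here, `free()` releases either result. In the kernel the two allocators differ, which is why the patch also switches the matching `kfree()` calls to `kvfree()`, which handles memory from either one.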

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c  | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index cf1704e972b7..d02baefcb350 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2126,7 +2127,7 @@ static void nfp_net_tx_ring_free(struct nfp_net_tx_ring *tx_ring)
struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
struct nfp_net_dp *dp = &r_vec->nfp_net->dp;
 
-   kfree(tx_ring->txbufs);
+   kvfree(tx_ring->txbufs);
 
if (tx_ring->txds)
dma_free_coherent(dp->dev, tx_ring->size,
@@ -2150,7 +2151,6 @@ static int
 nfp_net_tx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_tx_ring *tx_ring)
 {
struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
-   int sz;
 
tx_ring->cnt = dp->txd_cnt;
 
@@ -2160,8 +2160,8 @@ nfp_net_tx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_tx_ring *tx_ring)
if (!tx_ring->txds)
goto err_alloc;
 
-   sz = sizeof(*tx_ring->txbufs) * tx_ring->cnt;
-   tx_ring->txbufs = kzalloc(sz, GFP_KERNEL);
+   tx_ring->txbufs = kvcalloc(tx_ring->cnt, sizeof(*tx_ring->txbufs),
+  GFP_KERNEL);
if (!tx_ring->txbufs)
goto err_alloc;
 
@@ -2275,7 +2275,7 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring *rx_ring)
 
if (dp->netdev)
xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
-   kfree(rx_ring->rxbufs);
+   kvfree(rx_ring->rxbufs);
 
if (rx_ring->rxds)
dma_free_coherent(dp->dev, rx_ring->size,
@@ -2298,7 +2298,7 @@ static void nfp_net_rx_ring_free(struct nfp_net_rx_ring *rx_ring)
 static int
 nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
 {
-   int sz, err;
+   int err;
 
if (dp->netdev) {
err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, dp->netdev,
@@ -2314,8 +2314,8 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
if (!rx_ring->rxds)
goto err_alloc;
 
-   sz = sizeof(*rx_ring->rxbufs) * rx_ring->cnt;
-   rx_ring->rxbufs = kzalloc(sz, GFP_KERNEL);
+   rx_ring->rxbufs = kvcalloc(rx_ring->cnt, sizeof(*rx_ring->rxbufs),
+  GFP_KERNEL);
if (!rx_ring->rxbufs)
goto err_alloc;
 
-- 
2.17.1



[PATCH net-next 1/4] nfp: don't fail probe on pci_sriov_set_totalvfs() errors

2018-07-25 Thread Jakub Kicinski
On machines with buggy ACPI tables, or when SR-IOV is already enabled,
we may not be able to set the SR-IOV VF limit in sysfs.  This is not
fatal, because the driver imposes the limit anyway; only the sysfs
'sriov_totalvfs' attribute will be too high.  Print a warning to inform
the user about the failure, but allow probe to continue.
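The resulting control flow distinguishes three outcomes: a missing symbol (allow all VFs, success), a failed symbol read (fail probe), and a failed sysfs update (warn, success). A sketch with hypothetical stand-ins, where the struct fields inject the errors that `nfp_rtsym_read_le()` and `pci_sriov_set_totalvfs()` would otherwise return:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical PF state; the *_err fields simulate the two calls. */
struct demo_pf {
	unsigned int limit_vfs;
	unsigned int read_value; /* VF limit the symbol read would return */
	int read_err;            /* error from reading the FW symbol, or 0 */
	int sysfs_err;           /* error from setting the sysfs limit, or 0 */
};

int demo_sriov_read_limit(struct demo_pf *pf)
{
	int err = pf->read_err;

	assert(pf != NULL);

	if (err) {
		pf->limit_vfs = ~0u; /* backwards compatibility: allow all */
		if (err == -ENOENT)
			return 0;    /* symbol absent: not an error */
		fprintf(stderr, "VF limit read failed: %d\n", err);
		return err;          /* genuine read failure fails probe */
	}

	pf->limit_vfs = pf->read_value;
	if (pf->sysfs_err)           /* only the sysfs attribute is off */
		fprintf(stderr, "Failed to set VF count in sysfs: %d\n",
			pf->sysfs_err);
	return 0;                    /* the driver enforces the limit */
}
```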

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c | 20 +++
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 152283d7e59c..4a540c5e27fe 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -236,16 +236,20 @@ static int nfp_pcie_sriov_read_nfd_limit(struct nfp_pf *pf)
int err;
 
pf->limit_vfs = nfp_rtsym_read_le(pf->rtbl, "nfd_vf_cfg_max_vfs", &err);
-   if (!err)
-   return pci_sriov_set_totalvfs(pf->pdev, pf->limit_vfs);
+   if (err) {
+   /* For backwards compatibility if symbol not found allow all */
+   pf->limit_vfs = ~0;
+   if (err == -ENOENT)
+   return 0;
 
-   pf->limit_vfs = ~0;
-   /* Allow any setting for backwards compatibility if symbol not found */
-   if (err == -ENOENT)
-   return 0;
+   nfp_warn(pf->cpp, "Warning: VF limit read failed: %d\n", err);
+   return err;
+   }
 
-   nfp_warn(pf->cpp, "Warning: VF limit read failed: %d\n", err);
-   return err;
+   err = pci_sriov_set_totalvfs(pf->pdev, pf->limit_vfs);
+   if (err)
+   nfp_warn(pf->cpp, "Failed to set VF count in sysfs: %d\n", err);
+   return 0;
 }
 
 static int nfp_pcie_sriov_enable(struct pci_dev *pdev, int num_vfs)
-- 
2.17.1



[PATCH net-next 4/4] nfp: protect from theoretical size overflows on HW descriptor ring

2018-07-25 Thread Jakub Kicinski
Use array_size() and store the size as a full size_t to protect
against theoretical size overflow when handling HW descriptor rings.
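`array_size()` returns `SIZE_MAX` when the multiplication would overflow, so the following allocation fails cleanly instead of returning a buffer that is too small. Storing the result in a `size_t` also avoids truncating it back into an `unsigned int`. A userspace sketch of the saturating multiply (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating n * elem in the style of the kernel's array_size():
 * on overflow return SIZE_MAX, a size no allocator can satisfy,
 * instead of a silently wrapped (too small) value. */
size_t demo_array_size(size_t n, size_t elem)
{
	if (elem != 0 && n > SIZE_MAX / elem)
		return SIZE_MAX;
	return n * elem;
}
```

Because the saturated value is still a valid `size_t`, the ring-reset paths can then pass the stored `size` straight to `memset()` instead of recomputing `sizeof(*desc) * cnt`.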

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h| 4 ++--
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 9 +
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 607896910fb0..439e6ffe2f05 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -250,7 +250,7 @@ struct nfp_net_tx_ring {
struct nfp_net_tx_desc *txds;
 
dma_addr_t dma;
-   unsigned int size;
+   size_t size;
bool is_xdp;
 } cacheline_aligned;
 
@@ -372,7 +372,7 @@ struct nfp_net_rx_ring {
struct xdp_rxq_info xdp_rxq;
 
dma_addr_t dma;
-   unsigned int size;
+   size_t size;
 } cacheline_aligned;
 
 /**
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index d02baefcb350..7c1a921d178d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1121,7 +1122,7 @@ nfp_net_tx_ring_reset(struct nfp_net_dp *dp, struct nfp_net_tx_ring *tx_ring)
tx_ring->rd_p++;
}
 
-   memset(tx_ring->txds, 0, sizeof(*tx_ring->txds) * tx_ring->cnt);
+   memset(tx_ring->txds, 0, tx_ring->size);
tx_ring->wr_p = 0;
tx_ring->rd_p = 0;
tx_ring->qcp_rd_p = 0;
@@ -1301,7 +1302,7 @@ static void nfp_net_rx_ring_reset(struct nfp_net_rx_ring *rx_ring)
rx_ring->rxbufs[last_idx].dma_addr = 0;
rx_ring->rxbufs[last_idx].frag = NULL;
 
-   memset(rx_ring->rxds, 0, sizeof(*rx_ring->rxds) * rx_ring->cnt);
+   memset(rx_ring->rxds, 0, rx_ring->size);
rx_ring->wr_p = 0;
rx_ring->rd_p = 0;
 }
@@ -2154,7 +2155,7 @@ nfp_net_tx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_tx_ring *tx_ring)
 
tx_ring->cnt = dp->txd_cnt;
 
-   tx_ring->size = sizeof(*tx_ring->txds) * tx_ring->cnt;
+   tx_ring->size = array_size(tx_ring->cnt, sizeof(*tx_ring->txds));
tx_ring->txds = dma_zalloc_coherent(dp->dev, tx_ring->size,
&tx_ring->dma, GFP_KERNEL);
if (!tx_ring->txds)
@@ -2308,7 +2309,7 @@ nfp_net_rx_ring_alloc(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring)
}
 
rx_ring->cnt = dp->rxd_cnt;
-   rx_ring->size = sizeof(*rx_ring->rxds) * rx_ring->cnt;
+   rx_ring->size = array_size(rx_ring->cnt, sizeof(*rx_ring->rxds));
rx_ring->rxds = dma_zalloc_coherent(dp->dev, rx_ring->size,
&rx_ring->dma, GFP_KERNEL);
if (!rx_ring->rxds)
-- 
2.17.1

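The commit message says array_size() and a size_t field protect against overflow. The reason is that the kernel helper saturates to SIZE_MAX on overflow, so an oversized product makes the allocation fail instead of wrapping to a small size. The user-space sketch below approximates that behaviour with the GCC/Clang builtin __builtin_mul_overflow(). array_size_sketch() and struct desc_sketch are illustrative names, not nfp or kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating n * elem_size in the spirit of the kernel's array_size():
 * on overflow the result is SIZE_MAX, which no allocator can satisfy. */
static size_t array_size_sketch(size_t n, size_t elem_size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, elem_size, &bytes))
		return SIZE_MAX;
	return bytes;
}

/* Hypothetical 16-byte descriptor; real nfp descriptors differ. */
struct desc_sketch {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};
```

Storing the result in a size_t rather than an unsigned int also keeps a large 64-bit product from being truncated before it reaches the allocator or memset().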


Re: [PATCH V2 bpf] xdp: add NULL pointer check in __xdp_return()

2018-07-25 Thread Jakub Kicinski
On Thu, 26 Jul 2018 00:09:50 +0900, Taehee Yoo wrote:
> rhashtable_lookup() can return NULL, so a NULL pointer
> check should be added.
> 
> Fixes: 02b55e5657c3 ("xdp: add MEM_TYPE_ZERO_COPY")
> Signed-off-by: Taehee Yoo 
> ---
> V2 : add WARN_ON_ONCE when xa is NULL.
> 
>  net/core/xdp.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 9d1f220..786fdbe 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -345,7 +345,10 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
>   rcu_read_lock();
>   /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
>   xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> - xa->zc_alloc->free(xa->zc_alloc, handle);
> + if (!xa)
> + WARN_ON_ONCE(1);

nit: is the compiler smart enough to figure out the fast path here?
WARN_ON_ONCE() has the nice side effect of wrapping the condition in
unlikely().  It could save us both LoC and potentially cycles to do:

if (!WARN_ON_ONCE(!xa))
xa->zc_alloc->free(xa->zc_alloc, handle);

Although it admittedly looks a bit awkward.  I'm not sure if we have
some form of assert (i.e. positive check) in tree :S

> + else
> + xa->zc_alloc->free(xa->zc_alloc, handle);
>   rcu_read_unlock();
>   default:
>   /* Not possible, checked in xdp_rxq_info_reg_mem_model() */

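Jakub's suggestion works because WARN_ON_ONCE() evaluates to its condition, wrapped in unlikely(), so it can guard the call directly. The following is a user-space approximation using GCC/Clang statement expressions and __builtin_expect(). WARN_ON_ONCE_SKETCH(), struct allocator and return_handle() are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

static int warn_count; /* number of warnings actually printed */

/* Evaluates to the (hinted-unlikely) condition and warns only the first
 * time this particular call site sees it true. */
#define WARN_ON_ONCE_SKETCH(cond) ({                          \
	static bool sketch_warned;                            \
	bool sketch_ret = !!(cond);                           \
	if (unlikely(sketch_ret) && !sketch_warned) {         \
		sketch_warned = true;                         \
		warn_count++;                                 \
		fprintf(stderr, "WARNING: %s\n", #cond);      \
	}                                                     \
	unlikely(sketch_ret);                                 \
})

struct allocator {
	int freed;
};

static void free_handle(struct allocator *a)
{
	a->freed++;
}

/* Shape of the suggested fast path: free only if the lookup succeeded. */
static void return_handle(struct allocator *xa)
{
	if (!WARN_ON_ONCE_SKETCH(!xa))
		free_handle(xa);
}
```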

Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-25 Thread D'Souza, Nelson
David,

To narrow down the issue, our kernel team has asked for the following information:

"Can you clarify what kernel configuration was used for the clean 4.14.52 kernel (no changes)?

The kernel configuration may be available in /proc/config.gz, or it might be available as a text file in the /boot directory."

Would you be able to provide this?

Nelson

On 7/25/18, 5:35 PM, "D'Souza, Nelson"  wrote:

David, 

I tried out the commands on an Ubuntu 17.10.1 VM.
The pings on test-vrf are successful, but the pings on br0 are not successful.

# uname -rv  
4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017

 # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 17.10
Release:17.10
Codename:   artful

# ip rule  --> Note: it's missing the l3mdev rule
0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default

Ran the configs from a bash script vrf.sh

 # ./vrf.sh 
+ ip netns add foo
+ ip li add veth1 type veth peer name veth2
+ ip li set veth2 netns foo
+ ip -netns foo li set lo up
+ ip -netns foo li set veth2 up
+ ip -netns foo addr add 172.16.1.2/24 dev veth2
+ ip li add test-vrf type vrf table 123
+ ip li set test-vrf up
+ ip ro add vrf test-vrf unreachable default
+ ip li add br0 type bridge
+ ip li set veth1 master br0
+ ip li set veth1 up
+ ip li set br0 up
+ ip addr add dev br0 172.16.1.1/24
+ ip li set br0 master test-vrf
+ ip -netns foo addr add 172.16.2.2/32 dev lo
+ ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

# ping -I test-vrf 172.16.2.2 -c 2  <<< successful on test-vrf
ping: Warning: source address might be selected on device other than test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.045 ms

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.035/0.040/0.045/0.005 ms

#ping -I br0 172.16.2.2 -c 2   <<< fails on br0
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1022ms

Please let me know if I should try a different version.

Nelson

On 7/24/18, 9:08 AM, "D'Souza, Nelson"  wrote:

It's strange that enslaving eth1 -> br0 -> test-vrf does not work, but enslaving eth1->test-vrf works fine.

Nelson

On 7/24/18, 8:58 AM, "D'Souza, Nelson"  wrote:

Thank you David, really appreciate the help. Most likely something specific to my environment.

ip vrf id does not report anything on my system. Here's the result after running the command.

# ip vrf id
#

I'll follow up with a VM.

Nelson

On 7/24/18, 5:55 AM, "David Ahern"  wrote:

On 7/23/18 7:43 PM, D'Souza, Nelson wrote:
> I copy and pasted the configs onto my device, but pings on test-vrf do not work in my setup.
> I'm essentially seeing the same issue as I reported before.
> 
> In this case, pings sent out on test-vrf (host ns) are received and replied to by the loopback interface (foo ns). Although the replies are seen at the test-vrf level, they are not locally delivered to the ping application.
> 

I just built v4.14.52 kernel and ran those commands - worked fine. It is
something specific to your environment. Is your shell tied to a VRF --
(ip vrf id)?

After that, I suggest you create a VM running a newer distribution of
your choice (Ubuntu 17.10 or newer, debian stretch with 4.14 kernel, or
Fedora 26 or newer) and run the commands there.










Re: [net-next 10/16] net/mlx5: Support PCIe buffer congestion handling via Devlink

2018-07-25 Thread Jakub Kicinski
On Wed, 25 Jul 2018 08:23:26 -0700, Alexander Duyck wrote:
> On Wed, Jul 25, 2018 at 5:31 AM, Eran Ben Elisha wrote:
> > On 7/24/2018 10:51 PM, Jakub Kicinski wrote:  
>  The devlink params haven't been upstream even for a full cycle and
>  already you guys are starting to use them to configure standard
>  features like queuing.  
> >>>
> >>> We developed the devlink params in order to support non-standard
> >>> configuration only. And for non-standard, there are generic and vendor
> >>> specific options.  
> >>
> >> I thought it was developed for performing non-standard and possibly
> >> vendor specific configuration.  Look at DEVLINK_PARAM_GENERIC_* for
> >> examples of well justified generic options for which we have no
> >> other API.  The vendor mlx4 options look fairly vendor specific if you
> >> ask me, too.
> >>
> >> Configuring queuing has an API.  The question is whether it is acceptable
> >> to enter into the risky territory of controlling offloads via devlink
> >> parameters, or would we rather make vendors take the time and effort to
> >> model things to (a subset of) existing APIs.  The HW never fits the APIs
> >> perfectly.  
> >
> > I understand what you meant here, I would like to highlight that this
> > mechanism was not meant to handle SRIOV, Representors, etc.
> > The vendor-specific configuration suggested here handles a congestion
> > state in a Multi Host environment (which includes a PF and multiple VFs per
> > host), where one host is not aware of the other hosts, and each is running
> > on its own pci/driver. It is a device working mode configuration.
> >
> > This couldn't fit into any existing API, so creating this vendor-specific
> > unique API is needed.
> 
> If we are just going to start creating devlink interfaces for every
> one-off option a device wants to add, why did we even bother with
> trying to prevent drivers from using sysfs? This just feels like we
> are back to the same arguments we had back in the day with it.
> 
> I feel like the bigger question here is if devlink is how we are going
> to deal with all PCIe related features going forward, or should we
> start looking at creating a new interface/tool for PCI/PCIe related
> features? My concern is that we have already had features such as DMA
> Coalescing that didn't really fit into anything and now we are
> starting to see other things related to DMA and PCIe bus credits. I'm
> wondering if we shouldn't start looking at a tool/interface to
> configure all the PCIe related features such as interrupts, error
> reporting, DMA configuration, power management, etc. Maybe we could
> even look at sharing it across subsystems and include things like
> storage, graphics, and other subsystems in the conversation.

Agreed, for actual PCIe configuration (i.e. not ECN marking) we do need
to build up an API.  Sharing it across subsystems would be very cool!


Re: [PATCH] bpf: verifier: BPF_MOV don't mark dst reg if src == dst

2018-07-25 Thread Daniel Borkmann
On 07/26/2018 12:08 AM, Arthur Fabre wrote:
> When check_alu_op() handles a BPF_MOV between two registers,
> it calls check_reg_arg() on the dst register, marking it as unbounded.
> If the src and dst register are the same, this marks the src as
> unbounded, which can lead to unexpected errors for further checks that
> rely on bounds info.
> 
> check_alu_op() now only marks the dst register as unbounded if it
> is different from the src register.
> 
> Signed-off-by: Arthur Fabre 
> ---
>  kernel/bpf/verifier.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 63aaac52a265..ddfe3c544a80 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3238,8 +3238,9 @@ static int check_alu_op(struct bpf_verifier_env
> *env, struct bpf_insn *insn)
> }
> }
> 
> -   /* check dest operand */
> -   err = check_reg_arg(env, insn->dst_reg, DST_OP);
> +   /* check dest operand, only mark if dest != src */
> +   err = check_reg_arg(env, insn->dst_reg,
> +   insn->dst_reg == insn->src_reg ?
> DST_OP_NO_MARK : DST_OP);
> if (err)
> return err;
> 

Thanks a lot for the patch! Looks like it's corrupted wrt newline.

Please also add test cases to tools/testing/selftests/bpf/test_verifier.c
for the cases of mov64 and mov32, each with src==dst and src!=dst; mov32
should mark it as unbounded but mov64 should not, so it would be good to keep
tracking that in selftests.

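Daniel's distinction between mov32 and mov64 follows from eBPF's 32-bit ALU semantics. A 32-bit operation zero-extends its result into the upper half of the 64-bit register, so even w1 = w1 can change r1 and invalidate its 64-bit bounds, whereas r1 = r1 cannot. The helpers below are an illustrative user-space model of the two moves, not verifier code.

```c
#include <assert.h>
#include <stdint.h>

/* r_dst = r_src: with src == dst the value, and thus its bounds, is unchanged. */
static uint64_t mov64_sketch(uint64_t src)
{
	return src;
}

/* w_dst = w_src: the low 32 bits are copied and the upper 32 bits are
 * zeroed, so the 64-bit value can change even when src == dst. */
static uint64_t mov32_sketch(uint64_t src)
{
	return (uint64_t)(uint32_t)src;
}
```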

Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-25 Thread D'Souza, Nelson
David, 

I tried out the commands on an Ubuntu 17.10.1 VM.
The pings on test-vrf are successful, but the pings on br0 are not successful.

# uname -rv  
4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017

 # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 17.10
Release:17.10
Codename:   artful

# ip rule  --> Note: it's missing the l3mdev rule
0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default

Ran the configs from a bash script vrf.sh

 # ./vrf.sh 
+ ip netns add foo
+ ip li add veth1 type veth peer name veth2
+ ip li set veth2 netns foo
+ ip -netns foo li set lo up
+ ip -netns foo li set veth2 up
+ ip -netns foo addr add 172.16.1.2/24 dev veth2
+ ip li add test-vrf type vrf table 123
+ ip li set test-vrf up
+ ip ro add vrf test-vrf unreachable default
+ ip li add br0 type bridge
+ ip li set veth1 master br0
+ ip li set veth1 up
+ ip li set br0 up
+ ip addr add dev br0 172.16.1.1/24
+ ip li set br0 master test-vrf
+ ip -netns foo addr add 172.16.2.2/32 dev lo
+ ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

# ping -I test-vrf 172.16.2.2 -c 2  <<< successful on test-vrf
ping: Warning: source address might be selected on device other than test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.045 ms

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.035/0.040/0.045/0.005 ms

#ping -I br0 172.16.2.2 -c 2   <<< fails on br0
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1022ms

Please let me know if I should try a different version.

Nelson

On 7/24/18, 9:08 AM, "D'Souza, Nelson"  wrote:

It's strange that enslaving eth1 -> br0 -> test-vrf does not work, but enslaving eth1->test-vrf works fine.

Nelson

On 7/24/18, 8:58 AM, "D'Souza, Nelson"  wrote:

Thank you David, really appreciate the help. Most likely something specific to my environment.

ip vrf id does not report anything on my system. Here's the result after running the command.

# ip vrf id
#

I'll follow up with a VM.

Nelson

On 7/24/18, 5:55 AM, "David Ahern"  wrote:

On 7/23/18 7:43 PM, D'Souza, Nelson wrote:
> I copy and pasted the configs onto my device, but pings on test-vrf do not work in my setup.
> I'm essentially seeing the same issue as I reported before.
> 
> In this case, pings sent out on test-vrf (host ns) are received and replied to by the loopback interface (foo ns). Although the replies are seen at the test-vrf level, they are not locally delivered to the ping application.
> 

I just built v4.14.52 kernel and ran those commands - worked fine. It is
something specific to your environment. Is your shell tied to a VRF --
(ip vrf id)?

After that, I suggest you create a VM running a newer distribution of
your choice (Ubuntu 17.10 or newer, debian stretch with 4.14 kernel, or
Fedora 26 or newer) and run the commands there.








[PATCH bpf-next v3] bpf: add End.DT6 action to bpf_lwt_seg6_action helper

2018-07-25 Thread Mathieu Xhonneux
The seg6local LWT provides the End.DT6 action, which decapsulates
an outer IPv6 header containing a Segment Routing Header (SRH). The
full specification is available here:

https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-05

This patch now adds this action to the seg6local BPF
interface. Since the inner IPv6 header does not necessarily
contain an SRH, seg6_bpf_srh_state has been extended with a pointer to
a possible SRH of the outermost IPv6 header. This helps assess whether
validation must be triggered, and avoids some calls to
ipv6_find_hdr.

v3: s/1/true, s/0/false for boolean values
v2: - changed true/false -> 1/0
- preempt_enable no longer called in first conditional block

Signed-off-by: Mathieu Xhonneux 
---
 include/net/seg6_local.h |  4 ++-
 net/core/filter.c| 89 
 net/ipv6/seg6_local.c| 50 +--
 3 files changed, 95 insertions(+), 48 deletions(-)

diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
index 661fd5b4d3e0..08359e2d8b35 100644
--- a/include/net/seg6_local.h
+++ b/include/net/seg6_local.h
@@ -21,10 +21,12 @@
 
 extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
   u32 tbl_id);
+extern bool seg6_bpf_has_valid_srh(struct sk_buff *skb);
 
 struct seg6_bpf_srh_state {
-   bool valid;
+   struct ipv6_sr_hdr *srh;
u16 hdrlen;
+   bool valid;
 };
 
 DECLARE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
diff --git a/net/core/filter.c b/net/core/filter.c
index 104d560946da..355430cfeb76 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4542,26 +4542,28 @@ BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
 {
struct seg6_bpf_srh_state *srh_state =
this_cpu_ptr(&seg6_bpf_srh_states);
+   struct ipv6_sr_hdr *srh = srh_state->srh;
void *srh_tlvs, *srh_end, *ptr;
-   struct ipv6_sr_hdr *srh;
int srhoff = 0;
 
-   if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+   if (srh == NULL)
return -EINVAL;
 
-   srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
 
ptr = skb->data + offset;
if (ptr >= srh_tlvs && ptr + len <= srh_end)
-   srh_state->valid = 0;
+   srh_state->valid = false;
else if (ptr < (void *)&srh->flags ||
 ptr + len > (void *)&srh->segments)
return -EFAULT;
 
if (unlikely(bpf_try_make_writable(skb, offset + len)))
return -EFAULT;
+   if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
+   return -EINVAL;
+   srh_state->srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
 
memcpy(skb->data + offset, from, len);
return 0;
@@ -4577,52 +4579,79 @@ static const struct bpf_func_proto bpf_lwt_seg6_store_bytes_proto = {
.arg4_type  = ARG_CONST_SIZE
 };
 
-BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
-  u32, action, void *, param, u32, param_len)
+static void bpf_update_srh_state(struct sk_buff *skb)
 {
struct seg6_bpf_srh_state *srh_state =
this_cpu_ptr(&seg6_bpf_srh_states);
-   struct ipv6_sr_hdr *srh;
int srhoff = 0;
-   int err;
-
-   if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
-   return -EINVAL;
-   srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
-
-   if (!srh_state->valid) {
-   if (unlikely((srh_state->hdrlen & 7) != 0))
-   return -EBADMSG;
-
-   srh->hdrlen = (u8)(srh_state->hdrlen >> 3);
-   if (unlikely(!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3)))
-   return -EBADMSG;
 
-   srh_state->valid = 1;
+   if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0) {
+   srh_state->srh = NULL;
+   } else {
+   srh_state->srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
+   srh_state->hdrlen = srh_state->srh->hdrlen << 3;
+   srh_state->valid = true;
}
+}
+
+BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
+  u32, action, void *, param, u32, param_len)
+{
+   struct seg6_bpf_srh_state *srh_state =
+   this_cpu_ptr(&seg6_bpf_srh_states);
+   int hdroff = 0;
+   int err;
 
switch (action) {
case SEG6_LOCAL_ACTION_END_X:
+   if (!seg6_bpf_has_valid_srh(skb))
+   return -EBADMSG;
if (param_len != sizeof(struct in6_addr))
return -EINVAL;
return seg6_lookup_nexthop(skb, (struct in6_addr *)param, 0);
case SEG6_LOCAL_ACTION_END_T:
+   if 

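The hdrlen bookkeeping in this patch mixes two units:

- seg6_bpf_srh_state.hdrlen holds the bytes beyond the fixed 8-byte SRH header, since srh_end is computed as srh + sizeof(*srh) + hdrlen.
- The on-wire Hdr Ext Len field counts 8-octet units and excludes the first 8 octets.

The conversions are sketched below. The helper names are hypothetical, and the upper-bound check is an addition of this sketch, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* On-wire Hdr Ext Len (8-octet units, excluding the first 8 octets)
 * to the byte count kept in seg6_bpf_srh_state.hdrlen. */
static uint16_t srh_wire_to_bytes(uint8_t wire_hdrlen)
{
	return (uint16_t)(wire_hdrlen << 3);
}

/* Byte count back to the on-wire field. */
static int srh_bytes_to_wire(uint16_t bytes, uint8_t *wire_hdrlen)
{
	/* Not expressible in 8-octet units: the pre-patch code returned
	 * -EBADMSG in this case too. */
	if (bytes & 7)
		return -EBADMSG;
	/* Sketch-only guard: the on-wire field is 8 bits wide. */
	if ((bytes >> 3) > UINT8_MAX)
		return -EBADMSG;
	*wire_hdrlen = (uint8_t)(bytes >> 3);
	return 0;
}

/* Total SRH length in bytes, the form passed to seg6_validate_srh(). */
static unsigned int srh_total_len(uint8_t wire_hdrlen)
{
	return ((unsigned int)wire_hdrlen + 1) << 3;
}
```

For example, an SRH carrying two 16-byte segments and no TLVs is 8 + 32 = 40 bytes in total. That gives 32 bytes beyond the fixed header and an on-wire Hdr Ext Len of 4.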
Re: [181992] ...8xx/816: net-tcp_bbr: improve DCTCP ECN delayed ACK

2018-07-25 Thread Laurent Chavey
Are we OK with all the BBR patches for 816 (i.e., did we do extensive testing)?


On Wed, Jul 25, 2018 at 3:58 PM kernel-commit-validator (Code Review) <
prodkernel-gerrit-comm...@google.com> wrote:

> GA release commit message is missing a Release-Commit: tag indicating the
> original SHA1.
>
>
> Current Buildbot status: http://prodkernel-bot/?review=181992.
>
> Patch set 1:Code-Review -2
>
> 1 comment:
>
> Commit Message:
>
>   Patch Set #1, Line 15:
>
>     CherryPick-4.3.5-SHA1: 7205c1178312155984ab4047b1e4459bde572f8f
>
>   Unexpected tag; expected CherryPick-8xx-SHA1:
>
> Gerrit-Project: kernel/release/8xx
> Gerrit-Branch: 816
> Gerrit-Change-Id: Ied39c2f4ef576c697b0064d6b810b47282796388
> Gerrit-Change-Number: 181992
> Gerrit-PatchSet: 1
> Gerrit-Owner: Yuchung Cheng 
> Gerrit-Reviewer: Greg Thelen 
> Gerrit-Reviewer: Laurent Chavey 
> Gerrit-Reviewer: kernel-commit-validator
> Gerrit-CC: Neal Cardwell 
> Gerrit-Comment-Date: Wed, 25 Jul 2018 22:58:32 +
> Gerrit-HasComments: Yes
> Gerrit-Has-Labels: Yes
> Gerrit-MessageType: comment
>


Re: [PATCH net-next 00/17] mlxsw: Introduce algorithmic TCAM support

2018-07-25 Thread David Miller
From: Ido Schimmel 
Date: Wed, 25 Jul 2018 09:23:49 +0300

> The Spectrum-2 ASIC uses an algorithmic TCAM (A-TCAM) where multiple
> exact-match lookups are performed instead of a single lookup as with
> standard circuit TCAM (C-TCAM) memory. This allows for higher scale and
> reduced power consumption.
> 
> The lookups are performed by masking a packet using different masks
> (e.g., {dst_ip/24, ethtype}) defined for the region and looking for an
> exact match. Eventually, the rule with the highest priority will be
> picked.
> 
> Since the number of masks per-region is limited, the ASIC includes a
> C-TCAM that can be used as a spill area for rules that do not fit into
> the A-TCAM.
 ...

Looks great, series applied, thanks!
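The A-TCAM lookup described in the cover letter can be modelled as one exact-match probe per region mask, keeping the highest-priority hit. This is a conceptual user-space sketch, not the mlxsw or Spectrum-2 implementation. The two-field key, the linear scans standing in for hardware hashed lookups, and all names are assumptions, and the C-TCAM spill area is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key with two fields, e.g. {dst_ip, ethtype}. */
struct key_sketch {
	uint32_t dst_ip;
	uint16_t ethtype;
};

struct rule_sketch {
	struct key_sketch mask;  /* region mask the rule was inserted under */
	struct key_sketch value; /* key value, already ANDed with mask */
	int priority;            /* larger value wins */
	int action;
};

static struct key_sketch apply_mask(struct key_sketch k, struct key_sketch m)
{
	struct key_sketch r = { k.dst_ip & m.dst_ip,
				(uint16_t)(k.ethtype & m.ethtype) };
	return r;
}

/* Mask the packet key with every region mask, look for an exact match
 * among rules using that mask, and keep the highest-priority hit. */
static const struct rule_sketch *
atcam_lookup(const struct rule_sketch *rules, size_t n_rules,
	     const struct key_sketch *masks, size_t n_masks,
	     struct key_sketch pkt)
{
	const struct rule_sketch *best = NULL;

	for (size_t i = 0; i < n_masks; i++) {
		struct key_sketch masked = apply_mask(pkt, masks[i]);

		for (size_t j = 0; j < n_rules; j++) {
			const struct rule_sketch *r = &rules[j];

			if (r->mask.dst_ip != masks[i].dst_ip ||
			    r->mask.ethtype != masks[i].ethtype)
				continue;
			if (r->value.dst_ip == masked.dst_ip &&
			    r->value.ethtype == masked.ethtype &&
			    (!best || r->priority > best->priority))
				best = r;
		}
	}
	return best;
}
```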


Re: [PATCH bpf-next v2] bpf: add End.DT6 action to bpf_lwt_seg6_action helper

2018-07-25 Thread Mathieu Xhonneux
2018-07-25 23:13 GMT+02:00 Martin KaFai Lau :
>> v2: - changed true/false -> 1/0
> hmmm...I thought I was asking to replace 1/0 with true/false.  More
> below.

Silly me, I read your indication backwards. Agreed.

@Daniel: sorry for this one, sending a v3.


Re: [PATCH net-next] net: igmp: make function __ip_mc_inc_group() static

2018-07-25 Thread David Miller
From: Wei Yongjun 
Date: Wed, 25 Jul 2018 06:06:13 +

> Fixes the following sparse warnings:
> 
> net/ipv4/igmp.c:1391:6: warning:
>  symbol '__ip_mc_inc_group' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied to 'net'.


Re: [PATCH net-next] lan743x: Make symbol lan743x_pm_ops static

2018-07-25 Thread David Miller
From: Wei Yongjun 
Date: Wed, 25 Jul 2018 06:11:16 +

> Fixes the following sparse warning:
> 
> drivers/net/ethernet/microchip/lan743x_main.c:2944:25: warning:
>  symbol 'lan743x_pm_ops' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied.


Re: [PATCH net-next] tcp: make function tcp_retransmit_stamp() static

2018-07-25 Thread David Miller
From: Wei Yongjun 
Date: Wed, 25 Jul 2018 06:06:07 +

> Fixes the following sparse warnings:
> 
> net/ipv4/tcp_timer.c:25:5: warning:
>  symbol 'tcp_retransmit_stamp' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied, thank you.


Re: [PATCH net-next] net/sched: cls_flower: Use correct inline function for assignment of vlan tpid

2018-07-25 Thread David Miller
From: Jianbo Liu 
Date: Wed, 25 Jul 2018 02:31:25 +

> This fixes the following sparse warning:
> 
> net/sched/cls_flower.c:1356:36: warning: incorrect type in argument 3 
> (different base types)
> net/sched/cls_flower.c:1356:36: expected unsigned short [unsigned] [usertype] 
> value
> net/sched/cls_flower.c:1356:36: got restricted __be16 [usertype] vlan_tpid
> 
> Signed-off-by: Jianbo Liu 
> Reported-by: Or Gerlitz 
> Reviewed-by: Or Gerlitz 

Applied, thank you.


Re: [PATCH net-next] net/mlx4_core: Allow MTTs starting at any index

2018-07-25 Thread David Miller
From: Tariq Toukan 
Date: Tue, 24 Jul 2018 14:31:45 +0300

> Allow obtaining MTTs starting at any index,
> thus giving better cache utilization.
> 
> For this, allow setting log_mtts_per_seg to 0, and use
> this as the default.
> 
> Signed-off-by: Tariq Toukan 
> Signed-off-by: Eli Cohen 
> Signed-off-by: Anaty Rahamim Bar Kat 
> Reviewed-by: Jack Morgenstein 

Applied.


Re: [PATCH net-next 0/3] net/mlx5: Offload setting/matching on tunnel tos/ttl

2018-07-25 Thread David Miller
From: Or Gerlitz 
Date: Tue, 24 Jul 2018 13:59:32 +0300

> This series enables mlx5 offloading of tc eswitch rules that set
> tos/ttl (encap) or match on them (decap) for tunnels.

Series applied, thanks.


Re: pull-request: can 2018-07-23

2018-07-25 Thread David Miller
From: Marc Kleine-Budde 
Date: Tue, 24 Jul 2018 09:27:30 +0200

> Thanks David. Can you please merge net into net-next, as I've some
> patches for net-next that would result in a merge conflict between net
> and net-next later.

This has now been done.


Re: [PATCH net-next] tcp: ack immediately when a cwr packet arrives

2018-07-25 Thread David Miller
From: Neal Cardwell 
Date: Tue, 24 Jul 2018 21:57:27 -0400

> On Tue, Jul 24, 2018 at 1:42 PM Lawrence Brakmo  wrote:
>>
>> Note that without this fix the 99% latencies when doing 10KB RPCs
>> in a congested network using DCTCP are 40ms vs. 190us with the patch.
>> Also note that these 40ms high tail latencies started after commit
>> 3759824da87b30ce7a35b4873b62b0ba38905ef5 in Jul 2015,
>> which triggered the bugs/features we are fixing/adding. I agree it is
>> debatable whether it is a bug fix or a feature improvement and I am
>> fine either way.
> 
> Good point. The fact that this greatly mitigates a regression in DCTCP
> performance resulting from 3759824da87b30ce7a35b4873b62b0ba38905ef5
> ("tcp: PRR uses CRB mode by default and SS mode conditionally") IMHO
> seems to be a good argument for putting this patch ("tcp: ack
> immediately when a cwr packet arrives") in the "net" branch and stable
> releases.

Thus, applied to 'net' and queued up for -stable.


[PATCH net] netdevsim: don't leak devlink resources

2018-07-25 Thread Jakub Kicinski
Devlink resources registered with devlink_resource_register() have
to be unregistered.

Fixes: 37923ed6b8ce ("netdevsim: Add simple FIB resource controller via 
devlink")
Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
---
 drivers/net/netdevsim/devlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/netdevsim/devlink.c b/drivers/net/netdevsim/devlink.c
index ba663e5af168..5135fc371f01 100644
--- a/drivers/net/netdevsim/devlink.c
+++ b/drivers/net/netdevsim/devlink.c
@@ -207,6 +207,7 @@ void nsim_devlink_teardown(struct netdevsim *ns)
struct net *net = nsim_to_net(ns);
bool *reg_devlink = net_generic(net, nsim_devlink_id);
 
+   devlink_resources_unregister(ns->devlink, NULL);
devlink_unregister(ns->devlink);
devlink_free(ns->devlink);
ns->devlink = NULL;
-- 
2.17.1



[PATCH RFC ipsec-next] xfrm: Check Reverse-Mark Lookup Before ADDSA/DELSA

2018-07-25 Thread Nathan Harold
It's possible to insert an SA into the SADB that will
preclude the lookup of other SAs when using the MARK
attribute. The problem occurs based on a particular
sequencing with the marks where a new mark matches
requests for the SA mark of an existing SA but the
inverse is not true. For example:
1) Add SA with mark=0, mask=0
2) Add an otherwise-identical SA
   with mark=0x1234, mask=0x

This will fail; however, if done in the reverse order
the second add will succeed:
1) Add an SA with mark=0x1234, mask=0x
2) Add an otherwise-identical SA
   except with mark=0, mask=0
Then:
3) Delete the SA using mark=0x1234, mask=0xFFF
4) Dump the SADB, and there will be one SA, and it will
   have mark=0x1234, mask=0x; the 0/0 SA will
   be deleted

This patch addresses the problem by performing a
reverse-match on the mark and preventing ADDSA for
any SA that would 'shadow' an existing SA in the SADB.

This patch also addresses a bug where it was possible to
add an SA with a mark broader than its mask, which
could never be deleted:
1) ADDSA with mark=0x1234, mask=0xFF
2) DELSA with mark=0x1234, mask=; error=ESRCH

By applying both masks to each mark, bits outside the
mask of a given SA mark will be ignored, and the match
will succeed.

This patch does not make any changes to the 'data'
path, so SAs with such oddly-defined marks will still
be unmatchable.

Signed-off-by: Nathan Harold 
---
 net/xfrm/xfrm_state.c | 44 +++
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index b669262682c9..ee212a7c91a9 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -815,10 +815,10 @@ xfrm_init_tempstate(struct xfrm_state *x, const struct flowi *fl,
afinfo->init_temprop(x, tmpl, daddr, saddr);
 }
 
-static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark,
- const xfrm_address_t *daddr,
- __be32 spi, u8 proto,
- unsigned short family)
+static struct xfrm_state *
+__xfrm_state_lookup(struct net *net, u32 mark, u32 mask,
+   const xfrm_address_t *daddr,
+   __be32 spi, u8 proto, unsigned short family)
 {
unsigned int h = xfrm_spi_hash(net, daddr, spi, proto, family);
struct xfrm_state *x;
@@ -830,7 +830,7 @@ static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark,
!xfrm_addr_equal(&x->id.daddr, daddr, family))
continue;
 
-   if ((mark & x->mark.m) != x->mark.v)
+   if ((mark ^ x->mark.v) & mask & x->mark.m)
continue;
if (!xfrm_state_hold_rcu(x))
continue;
@@ -840,10 +840,11 @@ static struct xfrm_state *__xfrm_state_lookup(struct net *net, u32 mark,
return NULL;
 }
 
-static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark,
-const xfrm_address_t *daddr,
-const xfrm_address_t *saddr,
-u8 proto, unsigned short family)
+static struct xfrm_state *
+__xfrm_state_lookup_byaddr(struct net *net, u32 mark, u32 mask,
+  const xfrm_address_t *daddr,
+  const xfrm_address_t *saddr,
+  u8 proto, unsigned short family)
 {
unsigned int h = xfrm_src_hash(net, daddr, saddr, family);
struct xfrm_state *x;
@@ -855,7 +856,7 @@ static struct xfrm_state *__xfrm_state_lookup_byaddr(struct net *net, u32 mark,
!xfrm_addr_equal(&x->props.saddr, saddr, family))
continue;
 
-   if ((mark & x->mark.m) != x->mark.v)
+   if ((mark ^ x->mark.v) & mask & x->mark.m)
continue;
if (!xfrm_state_hold_rcu(x))
continue;
@@ -869,15 +870,14 @@ static inline struct xfrm_state *
 __xfrm_state_locate(struct xfrm_state *x, int use_spi, int family)
 {
struct net *net = xs_net(x);
-   u32 mark = x->mark.v & x->mark.m;
 
if (use_spi)
-   return __xfrm_state_lookup(net, mark, &x->id.daddr,
-  x->id.spi, x->id.proto, family);
+   return __xfrm_state_lookup(net, x->mark.v, x->mark.m,
+  &x->id.daddr, x->id.spi,
+  x->id.proto, family);
else
-   return __xfrm_state_lookup_byaddr(net, mark,
- &x->id.daddr,
- &x->props.saddr,
+   return __xfrm_state_lookup_byaddr(net, x->mark.v, x->mark.m,
+ 

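The two mark predicates can be compared directly. The sketch below restates them as user-space helpers that return true on a match:

- The pre-patch test treats an SA x as a match when (mark & x->mark.m) == x->mark.v, with mark = v & m.
- The patched test treats it as a match when (mark ^ x->mark.v) & mask & x->mark.m is zero.

struct mark_sketch and the helper names are hypothetical. Because the masks in the commit message are truncated, 0xffff is assumed for the "full" mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An SA's mark: value v and mask m, as in struct xfrm_mark. */
struct mark_sketch {
	uint32_t v;
	uint32_t m;
};

/* Pre-patch: the query mark is pre-masked, then tested against x.
 * Asymmetric in q and x. */
static bool old_match(struct mark_sketch q, struct mark_sketch x)
{
	uint32_t mark = q.v & q.m;

	return (mark & x.m) == x.v;
}

/* Patched: compare values only on bits covered by both masks.
 * Symmetric in q and x, so neither SA can shadow the other. */
static bool new_match(struct mark_sketch q, struct mark_sketch x)
{
	return ((q.v ^ x.v) & q.m & x.m) == 0;
}
```

Symmetry is the property that closes the ordering problem described in the commit message. With the old predicate, adding 0x1234/0xffff collides with an existing 0/0 SA, but adding 0/0 on top of 0x1234/0xffff does not. With the new predicate, both orders collide.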
[PATCH] bpf: verifier: BPF_MOV don't mark dst reg if src == dst

2018-07-25 Thread Arthur Fabre
When check_alu_op() handles a BPF_MOV between two registers,
it calls check_reg_arg() on the dst register, marking it as unbounded.
If the src and dst register are the same, this marks the src as
unbounded, which can lead to unexpected errors for further checks that
rely on bounds info.

check_alu_op() now only marks the dst register as unbounded if it
is different from the src register.

Signed-off-by: Arthur Fabre 
---
 kernel/bpf/verifier.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 63aaac52a265..ddfe3c544a80 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3238,8 +3238,9 @@ static int check_alu_op(struct bpf_verifier_env
*env, struct bpf_insn *insn)
}
}

-   /* check dest operand */
-   err = check_reg_arg(env, insn->dst_reg, DST_OP);
+   /* check dest operand, only mark if dest != src */
+   err = check_reg_arg(env, insn->dst_reg,
+   insn->dst_reg == insn->src_reg ?
DST_OP_NO_MARK : DST_OP);
if (err)
return err;

-- 
2.18.0


Re: [PATCH v1 1/4] igb: Remove unnecessary include of

2018-07-25 Thread Jeff Kirsher
On Wed, 2018-07-25 at 14:52 -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> The igb driver doesn't need anything provided by pci-aspm.h, so
> remove the unnecessary include of it.
> 
> Signed-off-by: Bjorn Helgaas 

Acked-by: Jeff Kirsher 

I am fine with you picking up this change.

> ---
>  drivers/net/ethernet/intel/igb/igb_main.c |1 -
>  1 file changed, 1 deletion(-)





[PATCH net-next v2] tls: Skip zerocopy path for ITER_KVEC

2018-07-25 Thread Doron Roberts-Kedes
The zerocopy path ultimately calls iov_iter_get_pages, whose step function
for ITER_KVECs simply returns -EFAULT. Taking the non-zerocopy path for
ITER_KVECs avoids this unnecessary fallback.

See https://lore.kernel.org/lkml/20150401023311.gl29...@zeniv.linux.org.uk/T/#u
for a discussion of why zerocopy for vmalloc data is not a good idea.

Discovered while testing NBD traffic encrypted with ktls.

Fixes: c46234ebb4d1 ("tls: RX path for ktls")
Signed-off-by: Doron Roberts-Kedes 
---
 net/tls/tls_sw.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 1f3d9789af30..da584dc7d633 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -377,6 +377,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int record_room;
bool full_record;
int orig_size;
+   bool is_kvec = msg->msg_iter.type & ITER_KVEC;
 
if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL))
return -ENOTSUPP;
@@ -425,8 +426,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
try_to_copy -= required_size - ctx->sg_encrypted_size;
full_record = true;
}
-
-   if (full_record || eor) {
+   if (!is_kvec && (full_record || eor)) {
ret = zerocopy_from_iter(sk, &msg->msg_iter,
try_to_copy, &ctx->sg_plaintext_num_elem,
&ctx->sg_plaintext_size,
@@ -764,6 +764,7 @@ int tls_sw_recvmsg(struct sock *sk,
bool cmsg = false;
int target, err = 0;
long timeo;
+   bool is_kvec = msg->msg_iter.type & ITER_KVEC;
 
flags |= nonblock;
 
@@ -807,7 +808,7 @@ int tls_sw_recvmsg(struct sock *sk,
page_count = iov_iter_npages(&msg->msg_iter,
 MAX_SKB_FRAGS);
to_copy = rxm->full_len - tls_ctx->rx.overhead_size;
-   if (to_copy <= len && page_count < MAX_SKB_FRAGS &&
+   if (!is_kvec && to_copy <= len && page_count < MAX_SKB_FRAGS &&
likely(!(flags & MSG_PEEK)))  {
struct scatterlist sgin[MAX_SKB_FRAGS + 1];
int pages = 0;
-- 
2.17.1
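
A minimal sketch of the sendmsg decision after this patch, assuming hypothetical TOY_ITER_* bit values in place of the real ITER_* constants from <linux/uio.h>: zerocopy is only attempted for full records or end-of-record, and never for kvec-backed iterators.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the iov_iter type bits; the real ITER_*
 * values live in the kernel's <linux/uio.h> and are not reproduced here. */
enum toy_iter_type {
	TOY_ITER_IOVEC = 1 << 0,
	TOY_ITER_KVEC  = 1 << 1,
};

enum toy_path { TOY_PATH_ZEROCOPY, TOY_PATH_COPY };

/* Zerocopy only for full records (or end-of-record) and never for
 * kvec-backed iterators, whose page-pinning step would just fail. */
static enum toy_path toy_pick_send_path(unsigned int iter_type,
					bool full_record, bool eor)
{
	bool is_kvec = iter_type & TOY_ITER_KVEC;

	if (!is_kvec && (full_record || eor))
		return TOY_PATH_ZEROCOPY;
	return TOY_PATH_COPY;
}
```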



Re: [PATCH net-next] net/tls: Removed redundant checks for non-NULL

2018-07-25 Thread Dave Watson
On 07/24/18 04:54 PM, Vakul Garg wrote:
> Removed checks against non-NULL before calling kfree_skb() and
> crypto_free_aead(). These functions are safe to be called with NULL
> as an argument.
> 
> Signed-off-by: Vakul Garg 

Acked-by: Dave Watson 


Re: [PATCH bpf-next v2] bpf: add End.DT6 action to bpf_lwt_seg6_action helper

2018-07-25 Thread Martin KaFai Lau
On Wed, Jul 25, 2018 at 12:36:45PM +, Mathieu Xhonneux wrote:
> The seg6local LWT provides the End.DT6 action, which can decapsulate
> an outer IPv6 header containing a Segment Routing Header
> (SRH); the full specification is available here:
> 
> https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-05
> 
> This patch now adds this action to the seg6local BPF
> interface. Since the inner IPv6 header is not required to
> contain an SRH, seg6_bpf_srh_state has been extended with a pointer to
> a possible SRH of the outermost IPv6 header. This helps assess whether
> validation must be triggered, and avoids some calls to
> ipv6_find_hdr.
> 
> v2: - changed true/false -> 1/0
hmmm...I thought I was asking to replace 1/0 with true/false.  More
below.

> - preempt_enable no longer called in first conditional block
> 
> Signed-off-by: Mathieu Xhonneux 
> ---
>  include/net/seg6_local.h |  4 ++-
>  net/core/filter.c| 83 
> +---
>  net/ipv6/seg6_local.c| 48 ++--
>  3 files changed, 91 insertions(+), 44 deletions(-)
> 
> diff --git a/include/net/seg6_local.h b/include/net/seg6_local.h
> index 661fd5b4d3e0..08359e2d8b35 100644
> --- a/include/net/seg6_local.h
> +++ b/include/net/seg6_local.h
> @@ -21,10 +21,12 @@
>  
>  extern int seg6_lookup_nexthop(struct sk_buff *skb, struct in6_addr *nhaddr,
>  u32 tbl_id);
> +extern bool seg6_bpf_has_valid_srh(struct sk_buff *skb);
>  
>  struct seg6_bpf_srh_state {
> - bool valid;
> + struct ipv6_sr_hdr *srh;
>   u16 hdrlen;
> + bool valid;
"valid" is a bool, so it is easier to read
if true/false is used in srh_state->valid = true/false;

>  };
>  
>  DECLARE_PER_CPU(struct seg6_bpf_srh_state, seg6_bpf_srh_states);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 104d560946da..2cdea7d05063 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4542,14 +4542,13 @@ BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
>  {
>   struct seg6_bpf_srh_state *srh_state =
>   this_cpu_ptr(&seg6_bpf_srh_states);
> + struct ipv6_sr_hdr *srh = srh_state->srh;
>   void *srh_tlvs, *srh_end, *ptr;
> - struct ipv6_sr_hdr *srh;
>   int srhoff = 0;
>  
> - if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
> + if (srh == NULL)
>   return -EINVAL;
>  
> - srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
>   srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
>   srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
>  
> @@ -4562,6 +4561,9 @@ BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
>  
>   if (unlikely(bpf_try_make_writable(skb, offset + len)))
>   return -EFAULT;
> + if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
> + return -EINVAL;
> + srh_state->srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
>  
>   memcpy(skb->data + offset, from, len);
>   return 0;
> @@ -4577,52 +4579,79 @@ static const struct bpf_func_proto bpf_lwt_seg6_store_bytes_proto = {
>   .arg4_type  = ARG_CONST_SIZE
>  };
>  
> -BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
> -u32, action, void *, param, u32, param_len)
> +static void bpf_update_srh_state(struct sk_buff *skb)
>  {
>   struct seg6_bpf_srh_state *srh_state =
>   this_cpu_ptr(&seg6_bpf_srh_states);
> - struct ipv6_sr_hdr *srh;
>   int srhoff = 0;
> - int err;
> -
> - if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
> - return -EINVAL;
> - srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
> -
> - if (!srh_state->valid) {
> - if (unlikely((srh_state->hdrlen & 7) != 0))
> - return -EBADMSG;
> -
> - srh->hdrlen = (u8)(srh_state->hdrlen >> 3);
> - if (unlikely(!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3)))
> - return -EBADMSG;
>  
> + if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0) {
> + srh_state->srh = NULL;
> + } else {
> + srh_state->srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
> + srh_state->hdrlen = srh_state->srh->hdrlen << 3;
>   srh_state->valid = 1;
e.g. here

>   }
> +}
> +
> +BPF_CALL_4(bpf_lwt_seg6_action, struct sk_buff *, skb,
> +u32, action, void *, param, u32, param_len)
> +{
> + struct seg6_bpf_srh_state *srh_state =
> + this_cpu_ptr(&seg6_bpf_srh_states);
> + int hdroff = 0;
> + int err;
>  
>   switch (action) {
>   case SEG6_LOCAL_ACTION_END_X:
> + if 
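
For reference, the hdrlen arithmetic used in this patch (srh->hdrlen << 3 when caching, and the multiple-of-8 check before writing hdrlen >> 3 back) follows the IPv6 routing header convention: Hdr Ext Len counts 8-octet units and excludes the first 8 octets. The helpers below are hypothetical illustrations of that arithmetic, not kernel functions. For example, an SRH carrying one 16-byte segment has Hdr Ext Len 2, 16 bytes after the fixed 8-byte header, and 24 bytes in total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bytes following the fixed 8-byte SRH header; this is the value the
 * patch caches as srh_state->hdrlen (srh->hdrlen << 3). */
static uint16_t toy_srh_tail_bytes(uint8_t hdrlen_field)
{
	return (uint16_t)(hdrlen_field << 3);
}

/* Total on-the-wire SRH size: (hdrlen + 1) << 3. */
static uint32_t toy_srh_total_bytes(uint8_t hdrlen_field)
{
	return ((uint32_t)hdrlen_field + 1) << 3;
}

/* Reverse direction: a modified tail length must be a multiple of 8 and
 * must fit the 8-bit field before hdrlen = tail >> 3 is written back. */
static bool toy_srh_encode(uint16_t tail_bytes, uint8_t *hdrlen_field)
{
	if ((tail_bytes & 7) != 0 || (tail_bytes >> 3) > UINT8_MAX)
		return false;
	*hdrlen_field = (uint8_t)(tail_bytes >> 3);
	return true;
}
```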

Re: [Intel-wired-lan] [PATCH v1 1/4] igb: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Alexander Duyck
On Wed, Jul 25, 2018 at 12:52 PM, Bjorn Helgaas  wrote:
> From: Bjorn Helgaas 
>
> The igb driver doesn't need anything provided by pci-aspm.h, so remove
> the unnecessary include of it.
>
> Signed-off-by: Bjorn Helgaas 

Looks good to me.

Acked-by: Alexander Duyck 

> ---
>  drivers/net/ethernet/intel/igb/igb_main.c |1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
> index f707709969ac..c77fda05f683 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -22,7 +22,6 @@
>  #include 
>  #include 
>  #include 
> -#include <linux/pci-aspm.h>
>  #include 
>  #include 
>  #include 
>
> ___
> Intel-wired-lan mailing list
> intel-wired-...@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan


[PATCH v2 bpf-next] samples/bpf: xdpsock: order memory on AArch64

2018-07-25 Thread Brian Brooks
Define u_smp_rmb() and u_smp_wmb() as the respective AArch64 barrier instructions.
This ensures the processor will order accesses to queue indices against
accesses to queue ring entries.

Signed-off-by: Brian Brooks 
---
 samples/bpf/xdpsock_user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
index 5904b1543831..1e82f7c617c3 100644
--- a/samples/bpf/xdpsock_user.c
+++ b/samples/bpf/xdpsock_user.c
@@ -145,8 +145,13 @@ static void dump_stats(void);
} while (0)
 
 #define barrier() __asm__ __volatile__("": : :"memory")
+#ifdef __aarch64__
+#define u_smp_rmb() __asm__ __volatile__("dmb ishld": : :"memory")
+#define u_smp_wmb() __asm__ __volatile__("dmb ishst": : :"memory")
+#else
 #define u_smp_rmb() barrier()
 #define u_smp_wmb() barrier()
+#endif
 #define likely(x) __builtin_expect(!!(x), 1)
 #define unlikely(x) __builtin_expect(!!(x), 0)
 
-- 
2.18.0
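
To show where the two barriers sit, here is a minimal single-producer/single-consumer ring sketch that reuses the macros from the patch. struct toy_ring and the toy_ring_* functions are hypothetical and do not reproduce the AF_XDP ring layout used by xdpsock_user.c. On x86, which does not reorder loads with older loads or stores with older stores, the compiler-only barrier() fallback is enough for these two orderings; on AArch64 the dmb instructions are needed. Releasing a slot also requires the entry load to be ordered before the consumer-index store; the sketch uses GCC's full __sync_synchronize() there rather than stretch the meaning of u_smp_rmb().

```c
#include <assert.h>
#include <stdint.h>

/* Barrier macros as defined by the patch. */
#define barrier() __asm__ __volatile__("": : :"memory")
#ifdef __aarch64__
#define u_smp_rmb() __asm__ __volatile__("dmb ishld": : :"memory")
#define u_smp_wmb() __asm__ __volatile__("dmb ishst": : :"memory")
#else
#define u_smp_rmb() barrier()
#define u_smp_wmb() barrier()
#endif

#define TOY_RING_SIZE 8u /* must be a power of two */

/* Hypothetical SPSC ring, not the AF_XDP ring layout. */
struct toy_ring {
	volatile uint32_t producer;
	volatile uint32_t consumer;
	uint64_t desc[TOY_RING_SIZE];
};

/* Producer: fill the entry, then publish it by advancing the index.
 * u_smp_wmb() orders the entry store before the index store. */
static int toy_ring_enq(struct toy_ring *r, uint64_t val)
{
	uint32_t prod = r->producer;

	if (prod - r->consumer == TOY_RING_SIZE)
		return -1; /* full */
	r->desc[prod & (TOY_RING_SIZE - 1)] = val;
	u_smp_wmb();
	r->producer = prod + 1;
	return 0;
}

/* Consumer: observe the index, then read the entry it covers.
 * u_smp_rmb() orders the index load before the entry load. */
static int toy_ring_deq(struct toy_ring *r, uint64_t *val)
{
	uint32_t cons = r->consumer;

	if (cons == r->producer)
		return -1; /* empty */
	u_smp_rmb();
	*val = r->desc[cons & (TOY_RING_SIZE - 1)];
	__sync_synchronize(); /* entry load before releasing the slot */
	r->consumer = cons + 1;
	return 0;
}
```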



Re: [PATCH] hinic: Link the logical network device to the pci device in sysfs

2018-07-25 Thread David Miller
From: dann frazier 
Date: Mon, 23 Jul 2018 16:55:40 -0600

> Otherwise interfaces get exposed under /sys/devices/virtual, which
> doesn't give udev the context it needs for PCI-based predictable
> interface names.
> 
> Signed-off-by: dann frazier 

Applied, thank you.


Re: [net-next v5 3/3] net/tls: Remove redundant array allocation.

2018-07-25 Thread Dave Watson
On 07/24/18 08:22 AM, Vakul Garg wrote:
> > I don't think this patch is safe as-is.  sgin_arr is a stack array of size
> > MAX_SKB_FRAGS (+ overhead), while my read of skb_cow_data is that it
> > walks the whole chain of skbs from skb->next, and can return any number of
> > segments.  Therefore we need to heap allocate.  I think I copied the IPSEC
> > code here.
> > 
> > For perf though, we could use the stack array if skb_cow_data returns <=
> > MAX_SKB_FRAGS.
> > 
> > This code is slightly confusing though, since we don't heap allocate in the
> > zerocopy case - what happens is that skb_to_sgvec returns -EMSGSIZE, and
> > we fall back to the non-zerocopy case, and return again to this function,
> > where we then hit the skb_cow_data path and heap allocate.
> 
> Thanks for explaining.
> I don't understand why a MAX_SKB_FRAGS-sized local array
> sgin is used in tls_sw_recvmsg(). What is special about MAX_SKB_FRAGS
> that makes it the size factor for 'sgin'?

There is nothing special about it; in the zerocopy fast path, if we
happen to fit in MAX_SKB_FRAGS, we avoid any kmalloc whatsoever.
It could be renamed MAX_SC_FOR_FASTPATH or something.

> Would it be a bad idea to get rid of the on-stack array 'sgin' and simply
> kmalloc 'sgin' for whatever number of pages iov_iter_npages() returns?
> We can allocate sgout too with the same kmalloc().
> 
> (Using a local array for 'sgin' gets in the way of sending multiple async
> record decryption requests to the accelerator without waiting for the
> previous one to complete.)

Yes we could do this, and yes we would need to heap allocate if you
want to support multiple outstanding decryption requests.  I think
async crypto prevents any sort of zerocopy-fastpath, however.  
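
A minimal userspace sketch of the "stack array when it fits, heap otherwise" idea discussed above. TOY_MAX_FRAGS, struct toy_sg and the toy_sg_* helpers are hypothetical stand-ins for MAX_SKB_FRAGS, struct scatterlist and the real allocation code. As noted in the thread, this only works for synchronous use: with multiple outstanding async decryption requests the list must outlive the call, so it has to live on the heap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_FRAGS 17 /* stand-in for MAX_SKB_FRAGS; value illustrative */

/* Hypothetical scatterlist entry; not the kernel's struct scatterlist. */
struct toy_sg {
	void *page;
	unsigned int len;
	unsigned int offset;
};

/* Return the caller's on-stack array when n entries fit, otherwise a
 * zeroed heap allocation. Returns NULL only if the allocation fails. */
static struct toy_sg *toy_sg_get(struct toy_sg *stack_buf, size_t stack_n,
				 size_t n)
{
	if (n <= stack_n) {
		memset(stack_buf, 0, n * sizeof(*stack_buf));
		return stack_buf;
	}
	return calloc(n, sizeof(struct toy_sg));
}

/* Free only what toy_sg_get() took from the heap. */
static void toy_sg_put(struct toy_sg *sg, struct toy_sg *stack_buf)
{
	if (sg != stack_buf)
		free(sg);
}
```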


Re: [PATCH v1 0/4] PCI: Remove unnecessary includes of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
On Wed, Jul 25, 2018 at 01:33:23PM -0700, Sinan Kaya wrote:
> On 7/25/2018 12:52 PM, Bjorn Helgaas wrote:
> > Remove includes of <linux/pci-aspm.h> from files that don't need
> > it.  I'll apply all these via the PCI tree unless there's objection.
> > 
> > ---
> > 
> > Bjorn Helgaas (4):
> >igb: Remove unnecessary include of <linux/pci-aspm.h>
> >ath9k: Remove unnecessary include of <linux/pci-aspm.h>
> >iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
> >PCI: Remove unnecessary include of <linux/pci-aspm.h>
> 
> Thanks.
> 
> Reviewed-by: Sinan Kaya 
> 
> Is it possible to kill that file altogether? I haven't looked at who is
> using it outside of the pci directory.

Thanks for taking a look!

It's possible we could remove it altogether; there's very little in
it, and in most cases the only reason drivers include it is to disable
certain ASPM link states to work around hardware defects.  It might
make sense to just move that interface into .


Re: [PATCH v1 3/4] iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Luciano Coelho
On Wed, 2018-07-25 at 14:52 -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas 
> 
> This part of the iwlwifi driver doesn't need anything provided by
> pci-aspm.h, so remove the unnecessary include of it.
> 
> Signed-off-by: Bjorn Helgaas 
> ---

Acked-by: Luca Coelho 

Thanks!

--
Cheers,
Luca.


Re: [PATCH] vxge: Remove unnecessary include of <linux/pci_hotplug.h>

2018-07-25 Thread David Miller
From: Bjorn Helgaas 
Date: Mon, 23 Jul 2018 15:59:46 -0500

> From: Bjorn Helgaas 
> 
> The vxge driver doesn't need anything provided by pci_hotplug.h, so remove
> the unnecessary include of it.
> 
> Signed-off-by: Bjorn Helgaas 

Applied to net-next, thanks.


Re: [PATCH V2 bpf] xdp: add NULL pointer check in __xdp_return()

2018-07-25 Thread Martin KaFai Lau
On Thu, Jul 26, 2018 at 12:09:50AM +0900, Taehee Yoo wrote:
> rhashtable_lookup() can return NULL, so a NULL pointer
> check should be added.
> 
> Fixes: 02b55e5657c3 ("xdp: add MEM_TYPE_ZERO_COPY")
> Signed-off-by: Taehee Yoo 
Acked-by: Martin KaFai Lau 

> ---
> V2 : add WARN_ON_ONCE when xa is NULL.
> 
>  net/core/xdp.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 9d1f220..786fdbe 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -345,7 +345,10 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
>   rcu_read_lock();
>   /* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
>   xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> - xa->zc_alloc->free(xa->zc_alloc, handle);
> + if (!xa)
> + WARN_ON_ONCE(1);
> + else
> + xa->zc_alloc->free(xa->zc_alloc, handle);
>   rcu_read_unlock();
>   default:
>   /* Not possible, checked in xdp_rxq_info_reg_mem_model() */
> -- 
> 2.9.3
> 
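
A userspace approximation of what the V2 change does, assuming a lookup that may return NULL. TOY_WARN_ON_ONCE() reports the first time its condition is true at a given call site and evaluates to the condition, roughly like the kernel's WARN_ON_ONCE() (which additionally dumps a stack trace). The macro relies on the GNU statement-expression extension, as the kernel macro does; toy_return() and struct toy_allocator are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Warn once per call site; the expression evaluates to the condition. */
#define TOY_WARN_ON_ONCE(cond) ({                                 \
	static bool toy_warned;                                   \
	bool toy_c = !!(cond);                                    \
	if (toy_c && !toy_warned) {                               \
		toy_warned = true;                                \
		fprintf(stderr, "WARNING: %s\n", #cond);         \
	}                                                         \
	toy_c;                                                    \
})

struct toy_allocator {
	int frees;
};

/* Guard a possibly-NULL lookup result before dereferencing it. */
static int toy_return(struct toy_allocator *xa)
{
	if (TOY_WARN_ON_ONCE(!xa))
		return -1;
	xa->frees++;
	return 0;
}
```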


[PATCH ipsec-next] xfrm: Return detailed errors from xfrmi_newlink

2018-07-25 Thread Benedict Wong
Currently all failure modes of xfrm interface creation return EEXIST.
This change improves the granularity of the errnos provided by also
returning ENODEV or EINVAL when looking up the underlying interface
fails or a required parameter is not provided.

This change has been tested against the Android Kernel Networking Tests,
with additional xfrmi_newlink tests here:

https://android-review.googlesource.com/c/kernel/tests/+/715755

Signed-off-by: Benedict Wong 
---
 net/xfrm/xfrm_interface.c | 32 
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index ccfe18d67e98..481d7307ab51 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -149,14 +149,18 @@ static struct xfrm_if *xfrmi_create(struct net *net, struct xfrm_if_parms *p)
char name[IFNAMSIZ];
int err;
 
-   if (p->name[0])
+   if (p->name[0]) {
strlcpy(name, p->name, IFNAMSIZ);
-   else
+   } else {
+   err = -EINVAL;
goto failed;
+   }
 
dev = alloc_netdev(sizeof(*xi), name, NET_NAME_UNKNOWN, 
xfrmi_dev_setup);
-   if (!dev)
+   if (!dev) {
+   err = -EAGAIN;
goto failed;
+   }
 
dev_net_set(dev, net);
 
@@ -165,8 +169,10 @@ static struct xfrm_if *xfrmi_create(struct net *net, struct xfrm_if_parms *p)
xi->net = net;
xi->dev = dev;
xi->phydev = dev_get_by_index(net, p->link);
-   if (!xi->phydev)
+   if (!xi->phydev) {
+   err = -ENODEV;
goto failed_free;
+   }
 
err = xfrmi_create2(dev);
if (err < 0)
@@ -179,7 +185,7 @@ static struct xfrm_if *xfrmi_create(struct net *net, struct xfrm_if_parms *p)
 failed_free:
free_netdev(dev);
 failed:
-   return NULL;
+   return ERR_PTR(err);
 }
 
 static struct xfrm_if *xfrmi_locate(struct net *net, struct xfrm_if_parms *p,
@@ -194,13 +200,13 @@ static struct xfrm_if *xfrmi_locate(struct net *net, struct xfrm_if_parms *p,
 xip = &xi->next) {
if (xi->p.if_id == p->if_id) {
if (create)
-   return NULL;
+   return ERR_PTR(-EEXIST);
 
return xi;
}
}
if (!create)
-   return NULL;
+   return ERR_PTR(-ENODEV);
return xfrmi_create(net, p);
 }
 
@@ -682,8 +688,9 @@ static int xfrmi_newlink(struct net *src_net, struct net_device *dev,
 
nla_strlcpy(p->name, tb[IFLA_IFNAME], IFNAMSIZ);
 
-   if (!xfrmi_locate(net, p, 1))
-   return -EEXIST;
+   xi = xfrmi_locate(net, p, 1);
+   if (IS_ERR(xi))
+   return PTR_ERR(xi);
 
return 0;
 }
@@ -704,11 +711,12 @@ static int xfrmi_changelink(struct net_device *dev, struct nlattr *tb[],
 
xi = xfrmi_locate(net, &xi->p, 0);
 
-   if (xi) {
+   if (IS_ERR_OR_NULL(xi)) {
+   xi = netdev_priv(dev);
+   } else {
if (xi->dev != dev)
return -EEXIST;
-   } else
-   xi = netdev_priv(dev);
+   }
 
return xfrmi_update(xi, &xi->p);
 }
-- 
2.18.0.233.g985f88cf7e-goog
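
The patch switches xfrmi_create() and xfrmi_locate() from returning NULL on every failure to the kernel's ERR_PTR() convention, so the caller can propagate the specific errno. The sketch below re-creates that convention in userspace, mirroring the semantics of <linux/err.h>: errno values -1..-MAX_ERRNO are encoded in the topmost 4095 addresses, which never hold a valid object. struct toy_table and toy_locate() are hypothetical and only loosely follow the patched lookup; the -ENOMEM case is just this toy's capacity limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

struct toy_if { int if_id; };
struct toy_table { struct toy_if slot[4]; size_t used; };

/* Found: object (or -EEXIST when creating); missing: -ENODEV unless
 * creating, in which case a new entry is returned. */
static struct toy_if *toy_locate(struct toy_table *t, int if_id, bool create)
{
	for (size_t i = 0; i < t->used; i++) {
		if (t->slot[i].if_id == if_id)
			return create ? ERR_PTR(-EEXIST) : &t->slot[i];
	}
	if (!create)
		return ERR_PTR(-ENODEV);
	if (t->used == sizeof(t->slot) / sizeof(t->slot[0]))
		return ERR_PTR(-ENOMEM); /* toy capacity limit */
	t->slot[t->used].if_id = if_id;
	return &t->slot[t->used++];
}
```

A caller then mirrors the patched xfrmi_newlink(): `if (IS_ERR(xi)) return PTR_ERR(xi);`.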



Re: [PATCH] samples/bpf: Add BTF build flags to Makefile

2018-07-25 Thread Martin KaFai Lau
On Wed, Jul 25, 2018 at 01:38:44PM -0700, Martin KaFai Lau wrote:
> On Thu, Jul 26, 2018 at 01:30:39AM +0900, Taeung Song wrote:
> > To make BTF-enabled binaries easy to test in samples/bpf,
> > let samples/bpf/Makefile probe llc, pahole and
> > llvm-objcopy for BTF support and use them,
> > as tools/testing/selftests/bpf/Makefile has done since
> > commit c0fa1b6c3efc ("bpf: btf: Add BTF tests").
> > 
> > Cc: Martin KaFai Lau 
> > Signed-off-by: Taeung Song 
> Thanks for the patch. LGTM.
> 
> Acked-by: Martin KaFai Lau 
and it should go to bpf-next (Please use the proper tag in the
Subject, thanks!).

> 
> > ---
> >  samples/bpf/Makefile | 21 -
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > index 1303af10e54d..e079266360a3 100644
> > --- a/samples/bpf/Makefile
> > +++ b/samples/bpf/Makefile
> > @@ -191,6 +191,8 @@ HOSTLOADLIBES_xdpsock   += -pthread
> >  #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
> >  LLC ?= llc
> >  CLANG ?= clang
> > +LLVM_OBJCOPY ?= llvm-objcopy
> > +BTF_PAHOLE ?= pahole
> >  
> >  # Detect that we're cross compiling and use the cross compiler
> >  ifdef CROSS_COMPILE
> > @@ -198,6 +200,20 @@ HOSTCC = $(CROSS_COMPILE)gcc
> >  CLANG_ARCH_ARGS = -target $(ARCH)
> >  endif
> >  
> > +BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
> > +BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
> > +BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
> > +
> > +ifneq ($(BTF_LLC_PROBE),)
> > +ifneq ($(BTF_PAHOLE_PROBE),)
> > +ifneq ($(BTF_OBJCOPY_PROBE),)
> > +   EXTRA_CFLAGS += -g
> > +   LLC_FLAGS += -mattr=dwarfris
> > +   DWARF2BTF = y
> > +endif
> > +endif
> > +endif
> > +
> >  # Trick to allow make to be run from this directory
> >  all:
> > $(MAKE) -C ../../ $(CURDIR)/ BPF_SAMPLES_PATH=$(CURDIR)
> > @@ -256,4 +272,7 @@ $(obj)/%.o: $(src)/%.c
> > -Wno-gnu-variable-sized-type-not-at-end \
> > -Wno-address-of-packed-member -Wno-tautological-compare \
> > -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
> > -   -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> > +   -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf $(LLC_FLAGS) -filetype=obj -o $@
> > +ifeq ($(DWARF2BTF),y)
> > +   $(BTF_PAHOLE) -J $@
> > +endif
> > -- 
> > 2.17.1
> > 


Re: [PATCH v3] net: ethernet: freescale: Use generic CRC32 implementation

2018-07-25 Thread David Miller
From: Krzysztof Kozlowski 
Date: Mon, 23 Jul 2018 18:19:14 +0200

> Use generic kernel CRC32 implementation because it:
> 1. Should be faster (uses lookup tables),
> 2. Removes duplicated CRC generation code,
> 3. Uses a well-proven algorithm instead of coding it one more time.
> 
> Suggested-by: Eric Biggers 
> Signed-off-by: Krzysztof Kozlowski 

Applied.
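
To make the "should be faster (uses lookup tables)" point concrete, here is a sketch contrasting a bit-at-a-time CRC-32 with a byte-at-a-time table-driven one. It computes the standard reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF); it is an illustration, not lib/crc32.c itself, which has its own table layout and takes the seed as a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CRC32_POLY 0xEDB88320u /* bit-reversed 0x04C11DB7 */

/* Bit-at-a-time: eight shift/XOR steps per input byte, like the
 * open-coded loops the patches remove. */
static uint32_t toy_crc32_bitwise(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (TOY_CRC32_POLY & (0u - (crc & 1u)));
	}
	return ~crc;
}

static uint32_t toy_crc_table[256];

/* Precompute the effect of eight bit-steps for every byte value. */
static void toy_crc32_init_table(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int k = 0; k < 8; k++)
			c = (c >> 1) ^ (TOY_CRC32_POLY & (0u - (c & 1u)));
		toy_crc_table[i] = c;
	}
}

/* Table-driven: one lookup per byte instead of eight steps. */
static uint32_t toy_crc32_table(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--)
		crc = (crc >> 8) ^ toy_crc_table[(crc ^ *p++) & 0xFFu];
	return ~crc;
}
```

After toy_crc32_init_table(), both functions return the standard check value 0xCBF43926 for the nine bytes "123456789".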


Re: [PATCH] net: ethernet: fs-enet: Use generic CRC32 implementation

2018-07-25 Thread David Miller
From: Krzysztof Kozlowski 
Date: Mon, 23 Jul 2018 18:20:20 +0200

> Use generic kernel CRC32 implementation because it:
> 1. Should be faster (uses lookup tables),
> 2. Removes duplicated CRC generation code,
> 3. Uses a well-proven algorithm instead of coding it one more time.
> 
> Suggested-by: Eric Biggers 
> Signed-off-by: Krzysztof Kozlowski 

Applied.


Re: [PATCH v2 net-next] net: phy: add helper phy_polling_mode

2018-07-25 Thread David Miller
From: Heiner Kallweit 
Date: Mon, 23 Jul 2018 21:40:07 +0200

> Add a helper for checking whether polling is used to detect PHY status
> changes.
> 
> Signed-off-by: Heiner Kallweit 

Applied.


Re: [PATCH] samples/bpf: Add BTF build flags to Makefile

2018-07-25 Thread Martin KaFai Lau
On Thu, Jul 26, 2018 at 01:30:39AM +0900, Taeung Song wrote:
> To make BTF-enabled binaries easy to test in samples/bpf,
> let samples/bpf/Makefile probe llc, pahole and
> llvm-objcopy for BTF support and use them,
> as tools/testing/selftests/bpf/Makefile has done since
> commit c0fa1b6c3efc ("bpf: btf: Add BTF tests").
> 
> Cc: Martin KaFai Lau 
> Signed-off-by: Taeung Song 
Thanks for the patch. LGTM.

Acked-by: Martin KaFai Lau 

> ---
>  samples/bpf/Makefile | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> index 1303af10e54d..e079266360a3 100644
> --- a/samples/bpf/Makefile
> +++ b/samples/bpf/Makefile
> @@ -191,6 +191,8 @@ HOSTLOADLIBES_xdpsock += -pthread
>  #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
>  LLC ?= llc
>  CLANG ?= clang
> +LLVM_OBJCOPY ?= llvm-objcopy
> +BTF_PAHOLE ?= pahole
>  
>  # Detect that we're cross compiling and use the cross compiler
>  ifdef CROSS_COMPILE
> @@ -198,6 +200,20 @@ HOSTCC = $(CROSS_COMPILE)gcc
>  CLANG_ARCH_ARGS = -target $(ARCH)
>  endif
>  
> +BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
> +BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
> +BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
> +
> +ifneq ($(BTF_LLC_PROBE),)
> +ifneq ($(BTF_PAHOLE_PROBE),)
> +ifneq ($(BTF_OBJCOPY_PROBE),)
> + EXTRA_CFLAGS += -g
> + LLC_FLAGS += -mattr=dwarfris
> + DWARF2BTF = y
> +endif
> +endif
> +endif
> +
>  # Trick to allow make to be run from this directory
>  all:
>   $(MAKE) -C ../../ $(CURDIR)/ BPF_SAMPLES_PATH=$(CURDIR)
> @@ -256,4 +272,7 @@ $(obj)/%.o: $(src)/%.c
>   -Wno-gnu-variable-sized-type-not-at-end \
>   -Wno-address-of-packed-member -Wno-tautological-compare \
>   -Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
> - -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> + -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf $(LLC_FLAGS) -filetype=obj -o $@
> +ifeq ($(DWARF2BTF),y)
> + $(BTF_PAHOLE) -J $@
> +endif
> -- 
> 2.17.1
> 


Re: [PATCH v1 0/4] PCI: Remove unnecessary includes of <linux/pci-aspm.h>

2018-07-25 Thread Sinan Kaya

On 7/25/2018 12:52 PM, Bjorn Helgaas wrote:

Remove includes of <linux/pci-aspm.h> from files that don't need
it.  I'll apply all these via the PCI tree unless there's objection.

---

Bjorn Helgaas (4):
   igb: Remove unnecessary include of <linux/pci-aspm.h>
   ath9k: Remove unnecessary include of <linux/pci-aspm.h>
   iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>
   PCI: Remove unnecessary include of <linux/pci-aspm.h>


Thanks.

Reviewed-by: Sinan Kaya 

Is it possible to kill that file altogether? I haven't looked at who is
using it outside of the pci directory.


Re: [PATCH net-next] net/tls: Corrected enabling of zero-copy mode

2018-07-25 Thread David Miller
From: Vakul Garg 
Date: Mon, 23 Jul 2018 21:00:06 +0530

> @@ -787,7 +787,7 @@ int tls_sw_recvmsg(struct sock *sk,
>   target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
>   timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
>   do {
> - bool zc = false;
> + bool zc;
>   int chunk = 0;
>  
>   skb = tls_wait_data(sk, flags, timeo, &err);
 ...
> @@ -836,6 +835,7 @@ int tls_sw_recvmsg(struct sock *sk,
>   if (err < 0)
>   goto fallback_to_reg_recv;
>  
> + zc = true;
>   err = decrypt_skb_update(sk, skb, sgin, &zc);
>   for (; pages > 0; pages--)
>   put_page(sg_page(&sgin[pages]));
> @@ -845,6 +845,7 @@ int tls_sw_recvmsg(struct sock *sk,
>   }
>   } else {
>  fallback_to_reg_recv:
> + zc = false;
>   err = decrypt_skb_update(sk, skb, NULL, &zc);
>   if (err < 0) {
>   tls_err_abort(sk, EBADMSG);
> -- 
> 2.13.6
> 

This will leave a code path where 'zc' is evaluated but not initialized to
any value.

And that's the path taken when ctx->decrypted is true.  The code after
your changes looks like:

bool zc;
 ...
if (!ctx->decrypted) {

 ... assignments to 'zc' happen in this code block

ctx->decrypted = true;
}

if (!zc) {

So when ctx->decrypted is true, the if(!zc) condition runs on an
uninitialized value.

I have to say that your TLS changes are becoming quite a time sink
for two reasons.

First, you are making a lot of changes that seem not so needed, and
whose value is purely determined by taste.  I'd put the
msg_data_left() multiple evaluation patch into this category.

The rest require deep review and understanding of the complicated
details of the TLS code, and many of them turn out to be incorrect.

As I find more errors in your submissions, I begin to scrutinize your
patches even more.  Thus, review of your changes takes even more time.

And it isn't helping that there are not a lot of other developers
helping actively to review your changes.

I would like to just make a small request to you, that you concentrate
on fixing clear bugs and clear issues that need to be resolved.

Thank you.
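
A condensed sketch of the control flow described above, with the initialization that keeps it well defined. toy_needs_copy() is hypothetical and only loosely follows tls_sw_recvmsg(): when the record was already decrypted, the block that assigns zc is skipped, so zc must be initialized at its declaration, as the pre-patch `bool zc = false;` did.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the caller must still copy plaintext out of the
 * record; returns false when it was decrypted straight into user pages. */
static bool toy_needs_copy(bool already_decrypted, bool zerocopy_ok)
{
	bool zc = false; /* the already-decrypted path never assigns zc */

	if (!already_decrypted) {
		if (zerocopy_ok)
			zc = true;  /* decrypt directly into user pages */
		else
			zc = false; /* fallback: decrypt in place */
	}
	return !zc;
}
```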


Re: pahole + BTF was: Re: [Question] bpf: about a new 'tools/bpf/bpf_dwarf2btf'

2018-07-25 Thread Martin KaFai Lau
On Thu, Jul 26, 2018 at 04:21:31AM +0900, Taeung Song wrote:
> 
> 
> On 07/26/2018 03:27 AM, Taeung Song wrote:
> > Hi Arnaldo,
> > 
> > On 07/26/2018 02:52 AM, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Jul 26, 2018 at 02:23:32AM +0900, Taeung Song escreveu:
> > > > Hi,
> > > > 
> > > > Building bpf programs with .BTF section,
> > > > I thought it'd be better to convert dwarf info to .BTF by
> > > > a new tool such as 'tools/bpf/bpf_dwarf2btf' instead of pahole
> > > > in the future.
> > > > Currently for bpf binary that have .BTF section,
> > > > we need to use pahole from https://github.com/iamkafai/pahole/tree/btf
> > > > with the command line such as "pahole -J bpf_prog.o".
> > > > I think it is great but if implementing new 'bpf_dwarf2btf'
> > > > (dwarf parsing + btf encoder code written by Martin KaFai Lau on
> > > > the pahole project i.e. btf.h, btf_encoder.c, btf_encoder.h,
> > > > libbtf.c, libbtf.h),
> > > > BPF developers would more easily use functionalities based on BTF.
> > > 
> > > What would be easier exactly? Not having to install a package but build
> > > it from the kernel sources?
> > > 
> > > Many kernel developers already have pahole installed for other uses, so
> > > no need to install anything.
> > > 
> > 
> > Understood, but I think there are many non-kernel developers
> > developing BPF programs and they mightn't have or use pahole.
> > 
> > So, if providing the 'dwarf2btf' feature on tools/bpf or tools/bpf/bpftool,
> > non-kernel developers can also more easily build bpf prog with .BTF, no?
Some quick thoughts,
IMO, if it is in the distro's pahole package, it should be easy
enough for kernel and non-kernel developers to install.
BTF usage is still evolving,  we might re-evaluate going forward but at this
point I think leveraging pahole's existing capability is a good option.

> > 
> 
> Or, if tools/lib/bpf/ had the 'dwarf2btf' feature,
> I think BPF developers could just use bpf programs that have dwarf info
> after compiling with clang '-g' and llc '-mattr=dwarfris', even without
> using pahole.
> Isn't that a good way?
> 
> > > BTW, Daniel, I just pushed to pahole's main repository at:
> > > 
> > >    git://git.kernel.org/pub/scm/devel/pahole/pahole.git
> > > 
> > > with the Martin's BTF patch, so no need to pull from the github one,
> > > I'll tag v1.12 and announce the release so that distro package
> > > maintainers can update their packages.
Awesome! Thanks, Arnaldo!


- Martin


Re: [PATCH net-next] net: phy: prevent PHYs w/o Clause 22 regs from calling genphy_config_aneg

2018-07-25 Thread David Miller
From: Camelia Groza 
Date: Mon, 23 Jul 2018 18:06:15 +0300

> genphy_config_aneg() should be called only by PHYs that implement
> the Clause 22 register set. Prevent Clause 45 PHYs that don't implement
> the register set from calling the genphy function.
> 
> Signed-off-by: Camelia Groza 

Applied, thank you.


Re: [PATCH net-next v6 0/4] net: vhost: improve performance when enable busyloop

2018-07-25 Thread David Miller
From: xiangxia.m@gmail.com
Date: Sat, 21 Jul 2018 11:03:58 -0700

> From: Tonghao Zhang 
> 
> This patches improve the guest receive performance.
> On the handle_tx side, we poll the sock receive queue
> at the same time. handle_rx do that in the same way.
> 
> For more performance report, see patch 4.
> 
> v5->v6:
> rebase the codes.

It looks like there is still some dangling discussions about this
patch set.

Please repost this series when those discussions have completed.

Thank you.


Re: [PATCH net-next 0/6] virtio_net: Add ethtool stat items

2018-07-25 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Wed, 25 Jul 2018 12:40:12 +0300

> On Mon, Jul 23, 2018 at 11:36:03PM +0900, Toshiaki Makita wrote:
>> From: Toshiaki Makita 
>> 
>> Add some ethtool stat items useful for performance analysis.
>> 
>> Signed-off-by: Toshiaki Makita 
> 
> Series:
> 
> Acked-by: Michael S. Tsirkin 

Series applied.

> Patch 1 seems appropriate for stable, even though it's minor.

Ok, I'll put patch #1 also into 'net' and queue it up for -stable.

Thanks.


[PATCH v1 1/4] igb: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The igb driver doesn't need anything provided by pci-aspm.h, so remove
the unnecessary include of it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/net/ethernet/intel/igb/igb_main.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index f707709969ac..c77fda05f683 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -22,7 +22,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include 
 #include 



[PATCH v1 4/4] PCI: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
From: Bjorn Helgaas 

Several PCI core files include pci-aspm.h even though they don't need
anything provided by that file.  Remove the unnecessary includes of it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/pci/pci-sysfs.c |1 -
 drivers/pci/pci.c   |1 -
 drivers/pci/probe.c |1 -
 drivers/pci/remove.c|1 -
 4 files changed, 4 deletions(-)

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 0c4653c1d2ce..91337faae60d 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include 
 #include 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f5c6ab14fb31..7c2f0e682fc0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -23,7 +23,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include 
 #include 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac876e32de4b..1ed2852dee21 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include 
 #include 
 #include 
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index 6f072eae4f7a..01ec7fcb5634 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -1,7 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
-#include <linux/pci-aspm.h>
 #include "pci.h"
 
 static void pci_free_resources(struct pci_dev *dev)



[PATCH v1 3/4] iwlwifi: Remove unnecessary include of <linux/pci-aspm.h>

2018-07-25 Thread Bjorn Helgaas
From: Bjorn Helgaas 

This part of the iwlwifi driver doesn't need anything provided by
pci-aspm.h, so remove the unnecessary include of it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
index 38234bda9017..d6c55e111fda 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c
@@ -72,7 +72,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "fw/acpi.h"



[PATCH v1 2/4] ath9k: Remove unnecessary include of

2018-07-25 Thread Bjorn Helgaas
From: Bjorn Helgaas 

The ath9k driver doesn't need anything provided by pci-aspm.h, so remove
the unnecessary include of it.

Signed-off-by: Bjorn Helgaas 
---
 drivers/net/wireless/ath/ath9k/pci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/pci.c b/drivers/net/wireless/ath/ath9k/pci.c
index 645f0fbd9179..92b2dd396436 100644
--- a/drivers/net/wireless/ath/ath9k/pci.c
+++ b/drivers/net/wireless/ath/ath9k/pci.c
@@ -18,7 +18,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include "ath9k.h"
 



[PATCH v1 0/4] PCI: Remove unnecessary includes of

2018-07-25 Thread Bjorn Helgaas
Remove includes of  from files that don't need
it.  I'll apply all these via the PCI tree unless there's objection.

---

Bjorn Helgaas (4):
  igb: Remove unnecessary include of 
  ath9k: Remove unnecessary include of 
  iwlwifi: Remove unnecessary include of 
  PCI: Remove unnecessary include of 


 drivers/net/ethernet/intel/igb/igb_main.c     | 1 -
 drivers/net/wireless/ath/ath9k/pci.c          | 1 -
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 1 -
 drivers/pci/pci-sysfs.c                       | 1 -
 drivers/pci/pci.c                             | 1 -
 drivers/pci/probe.c                           | 1 -
 drivers/pci/remove.c                          | 1 -
 7 files changed, 7 deletions(-)


Re: pahole + BTF was: Re: [Question] bpf: about a new 'tools/bpf/bpf_dwarf2btf'

2018-07-25 Thread Taeung Song




On 07/26/2018 03:27 AM, Taeung Song wrote:

Hi Arnaldo,

On 07/26/2018 02:52 AM, Arnaldo Carvalho de Melo wrote:

Em Thu, Jul 26, 2018 at 02:23:32AM +0900, Taeung Song escreveu:

Hi,

Building bpf programs with .BTF section,
I thought it'd be better to convert dwarf info to .BTF by
a new tool such as 'tools/bpf/bpf_dwarf2btf' instead of pahole
in the future.
Currently for bpf binary that have .BTF section,
we need to use pahole from https://github.com/iamkafai/pahole/tree/btf
with the command line such as "pahole -J bpf_prog.o".
I think it is great but if implementing new 'bpf_dwarf2btf'
(dwarf parsing + btf encoder code written by Martin KaFai Lau on
the pahole project i.e. btf.h, btf_encoder.c, btf_encoder.h,
libbtf.c, libbtf.h),
BPF developers would more easily use functionalities based on BTF.


What would be easier exactly? Not having to install a package but build
it from the kernel sources?

Many kernel developers already have pahole installed for other uses, so
no need to install anything.



Understood, but I think there are many non-kernel developers
developing BPF programs and they mightn't have or use pahole.

So, if we provided the 'dwarf2btf' feature in tools/bpf or tools/bpf/bpftool,
non-kernel developers could also more easily build bpf progs with .BTF, no?



Or, if tools/lib/bpf/ had the 'dwarf2btf' feature,
I think BPF developers could just use bpf programs that have dwarf info
after compiling with clang '-g' and llc '-mattr=dwarfris', even without
using pahole.

Isn't that a good way?


BTW, Daniel, I just pushed to pahole's main repository at:

   git://git.kernel.org/pub/scm/devel/pahole/pahole.git

with the Martin's BTF patch, so no need to pull from the github one,
I'll tag v1.12 and announce the release so that distro package
maintainers can update their packages.

What do you think about this ? Do you think this is needed ?
Or, already implementing something like this ?
If it is needed, I want to try to make 'tools/bpf/bpf_dwarf2btf'
based on the pahole code. I'd appreciate it, if you reply to this


The way Martin took advantage of the work done a long time ago to
support CTF out of the same DWARF reading codebase was really cool, not
that much work to do, just add a new format to pahole's codebase making
it more useful.



I got it !

Thanks,
Taeung


I was just so far overly picky with testing it and kept leaving for
later to have a good documentation about testing it, vacation and perf
maintainership duties kept making this take like forever, grumble :-\

- Arnaldo



--
oh.. :'(


Re: selftests: bpf: test_progs: deadlock at trace_call_bpf

2018-07-25 Thread Martin KaFai Lau
On Tue, Jul 24, 2018 at 02:51:42PM +0530, Naresh Kamboju wrote:
> Deadlock warning on an x86 machine while testing selftests: bpf:
> test_progs on linux-next 4.18.0-rc3-next-20180705; it is still
> happening on 4.18.0-rc5-next-20180720.
> 
> Has anyone noticed this kernel warning about a deadlock?
It should be a false positive.  The head->lock is a percpu
lock and is acquired by the bpf prog running on that cpu when
updating a bpf htab.  Hence, CPU0 and CPU1 are acquiring
different head->locks.
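To picture why CPU0 and CPU1 are not contending for the same head->lock, here is a small user-space model of a per-CPU free list: one list head and one lock per CPU, loosely after the kernel's pcpu_freelist. All names are hypothetical, C11 atomic_flag stands in for the kernel's raw spinlock, and the kernel's pop path (which can also fall back to other CPUs' lists) is simplified to the local list only. It is a sketch under those assumptions, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SIM_NR_CPUS 2

struct sim_node {
	struct sim_node *next;
	int id;
};

/* One list head and one lock per CPU (models a per-CPU freelist head). */
struct sim_freelist_head {
	atomic_flag lock;
	struct sim_node *first;
};

static struct sim_freelist_head sim_heads[SIM_NR_CPUS];

static void sim_lock(struct sim_freelist_head *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		;	/* spin until this CPU's lock is free */
}

static void sim_unlock(struct sim_freelist_head *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

static void sim_freelist_init(void)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		atomic_flag_clear(&sim_heads[cpu].lock);
		sim_heads[cpu].first = NULL;
	}
}

/* Push onto the given CPU's list; only that CPU's lock is taken. */
static void sim_push(int cpu, struct sim_node *n)
{
	struct sim_freelist_head *h;

	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	h = &sim_heads[cpu];
	sim_lock(h);
	n->next = h->first;
	h->first = n;
	sim_unlock(h);
}

/* Pop from the given CPU's list only (simplified, no cross-CPU fallback). */
static struct sim_node *sim_pop(int cpu)
{
	struct sim_freelist_head *h;
	struct sim_node *n;

	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	h = &sim_heads[cpu];
	sim_lock(h);
	n = h->first;
	if (n)
		h->first = n->next;
	sim_unlock(h);
	return n;
}
```

A push on CPU 0 and a push on CPU 1 each take a different lock, which is the situation Martin describes for the two CPUs in the report.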

When looking at a CPU alone, another bpf prog cannot start
running on the same CPU before the currently running bpf prog
has finished.  For example, on the tracing side, a percpu
"bpf_prog_active" counter ensures that.
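The per-CPU guard can be sketched in user space as follows. This roughly mirrors the check trace_call_bpf() performs with __this_cpu_inc_return(bpf_prog_active) != 1, but the array, function and program names here are hypothetical, and preemption control is not modelled: a second program that would start on a CPU where one is already running is simply skipped.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4

/* Simulated per-CPU counter, indexed by CPU id (stand-in for the kernel's
 * per-CPU bpf_prog_active variable). */
static int sim_bpf_prog_active[SIM_NR_CPUS];

typedef void (*sim_prog_fn)(int cpu);

/* Run prog on cpu unless another program is already active on that CPU.
 * Returns true if prog ran, false if the nested invocation was skipped. */
static bool sim_trace_call_bpf(int cpu, sim_prog_fn prog)
{
	bool ran = false;

	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	if (++sim_bpf_prog_active[cpu] != 1)
		goto out;	/* a program is already running here: skip */

	prog(cpu);
	ran = true;
out:
	--sim_bpf_prog_active[cpu];
	return ran;
}

static bool sim_nested_ran;

static void sim_noop_prog(int cpu)
{
	(void)cpu;
}

/* A program whose execution hits another traced point on the same CPU. */
static void sim_reentrant_prog(int cpu)
{
	sim_nested_ran = sim_trace_call_bpf(cpu, sim_noop_prog);
}
```

Calling sim_trace_call_bpf(1, sim_reentrant_prog) runs the outer program but skips the nested one, and the counter returns to zero afterwards.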

The head->lock is primarily used in bpf htab updates, which
are used very heavily by most bpf progs.  Hence, replacing
the lock with the irqsave version would be unnecessary and
would have a performance impact.

Thanks,
Martin

> 
> selftests: bpf: test_progs
> libbpf: incorrect bpf_call opcode
> libbpf: incorrect bpf_call opcode
> test_pkt_access:FAIL:ipv4 err 0 errno 2 retval 0 duration 126
> test_pkt_access:FAIL:ipv6 err 0 errno 2 retval 0 duration 115
> test_xdp:FAIL:ipv4 err 0 errno 2 retval 3 size 74
> test_xdp:FAIL:ipv6 err 0 errno 2 retval 3 size 114
> test_xdp_adjust_tail:FAIL:ipv4 err 0 errno 2 retval 1 size 54
> test_xdp_adjust_tail:FAIL:ipv6 err 0 errno 2 retval 3 siz[   69.901655]
> [   69.903862] 
> [   69.910213] WARNING: possible irq lock inversion dependency detected
> [   69.916559] 4.18.0-rc3-next-20180705 #1 Not tainted
> [   69.921428] 
> [   69.927774] dd/2928 just changed the state of lock:
> [   69.932643] 22eeb38d (>lock){+...}, at:
> pcpu_freelist_push+0x28/0x50
> [   69.940208] but this lock was taken by another, HARDIRQ-safe lock
> in the past:
> [   69.947420]  (>lock){-.-.}
> [   69.947421]
> [   69.947421]
> [   69.947421] and interrupts could create inverse lock ordering between them.
> [   69.947421]
> [   69.961842]
> [   69.961842] other info that might help us debug this:
> [   69.968357]  Possible interrupt unsafe locking scenario:
> [   69.968357]
> [   69.975136]        CPU0                    CPU1
> [   69.979659]
> [   69.984184]   lock(>lock);
> [   69.987406]                                local_irq_disable();
> [   69.993319]                                lock(>lock);
> [   69.998882]                                lock(>lock);
> [   70.004618]   
> [   70.007235]     lock(>lock);
> [   70.010461]
> [   70.010461]  *** DEADLOCK ***
> [   70.010461]
> [   70.016372] 1 lock held by dd/2928:
> [   70.019856]  #0: ab9293c8 (rcu_read_lock){}, at:
> trace_call_bpf+0x37/0x1d0
> [   70.027768]
> [   70.027768] the shortest dependencies between 2nd lock and 1st lock:
> [   70.035586]  -> (>lock){-.-.} ops: 1401365 {
> [   70.040204] IN-HARDIRQ-W at:
> [   70.043428]   lock_acquire+0xd5/0x1c0
> [   70.048820]   _raw_spin_lock+0x2f/0x40
> [   70.054299]   scheduler_tick+0x51/0xf0
> [   70.059781]   update_process_times+0x47/0x60
> [   70.065779]   tick_periodic+0x2b/0xc0
> [   70.071171]   tick_handle_periodic+0x25/0x70
> [   70.077168]   timer_interrupt+0x15/0x20
> [   70.082731]   __handle_irq_event_percpu+0x48/0x320
> [   70.089250]   handle_irq_event_percpu+0x32/0x80
> [   70.095505]   handle_irq_event+0x39/0x60
> [   70.101157]   handle_level_irq+0x7f/0x100
> [   70.106893]   handle_irq+0x6f/0x110
> [   70.112112]   do_IRQ+0x5c/0x110
> [   70.116982]   ret_from_intr+0x0/0x1d
> [   70.122286]   _raw_spin_unlock_irqrestore+0x38/0x50
> [   70.128891]   __setup_irq+0x45d/0x700
> [   70.134281]   setup_irq+0x4c/0x90
> [   70.139324]   hpet_time_init+0x25/0x37
> [   70.144803]   x86_late_time_init+0xf/0x1c
> [   70.150538]   start_kernel+0x40c/0x4ca
> [   70.156017]   x86_64_start_reservations+0x24/0x26
> [   70.162445]   x86_64_start_kernel+0x6f/0x72
> [   70.168357]   secondary_startup_64+0xa4/0xb0
> [   70.174356] IN-SOFTIRQ-W at:
> [   70.177578]   lock_acquire+0xd5/0x1c0
> [   70.182970]   _raw_spin_lock+0x2f/0x40
> [   70.188448]   try_to_wake_up+0x31b/0x540
> [   70.194097]   wake_up_process+0x15/0x20
> [   70.199661]   swake_up_locked+0x24/0x40
> [   70.205226]   swake_up_one+0x1f/0x30
> [   70.210530]  

Re: [PATCH RFC/RFT net-next 00/17] net: Convert neighbor tables to per-namespace

2018-07-25 Thread Eric W. Biederman
David Ahern  writes:

> On 7/25/18 11:38 AM, Eric W. Biederman wrote:
>> 
>> Absolutely NOT.  Global thresholds are exactly correct given the fact
>> you are running on a single kernel.
>> 
>> Memory is not free (even though we are swimming in enough of it that memory
>> rarely matters).  One of the few remaining challenges for containers
>> is finding ways to limit resources in such a way that one application
>> does not mess things up for another container during ordinary usage.
>> 
>> It looks like the neighbour tables absolutely are that kind of problem,
>> because the artificial limits are too strict.   Completely giving up on
>> limits does not seem the right approach either.  We need to fix the limits
>> we have (perhaps making them go away entirely), not just apply a
>> band-aid.  Let's get to the bottom of this and make the system better.
>
> Eric: yes, they all share the global resource of memory and there should
> be limits on how many entries a remote entity can create.
>
> Network namespaces can provide a separation such that one namespace does
> not disrupt networking in another. It is absolutely appropriate to do
> so. Your rigid stance is inconsistent given the basic meaning of a
> network namespace and the parallels to this same problem -- bridges,
> vxlans, and ip fragments. Only neighbor tables are not per-device or per
> namespace; your insistence on global limits is missing the mark and wrong.

That is not what I said.  Let me rephrase and see if you understand.

The problem appears to be one of lots of devices.  Fundamentally, if you use
lots of network devices today, unless you adjust gc_thresh3 you will run
out of neighbour table entries.

The problem has a bigger scope than what you are looking at.

If you fix the core problem you won't see the problem in the context
of network namespaces either.

Default limits should be something that will never be hit unless
something goes crazy.  We are hitting them.  Therefore by definition
there is a bug in these limits.


And yes there is absolutely a place for global limits on things like
inodes, file descriptors etc, that does not care about which part of the
kernel you are in.  However hitting those limits in normal operation is
a bug.

We have ourselves a bug.

Eric

p.s. I wrote the definition of network namespaces and it absolutely does
have room for global limits.   One of the things Linus has periodically
yelled at me about is that there are not enough of them.




Re: [PATCH v3] ipvs: fix race between ip_vs_conn_new() and ip_vs_del_dest()

2018-07-25 Thread Julian Anastasov


Hello,

On Wed, 25 Jul 2018, Tan Hu wrote:

> We came across an infinite loop in ipvs when using ipvs in a docker
> env.
> 
> When ipvs receives new packets and cannot find an ipvs connection,
> it will create a new connection, then if the dest is unavailable
> (i.e. IP_VS_DEST_F_AVAILABLE is not set), the packet will be dropped silently.
> 
> But if the dropped packet is the first packet of this connection,
> the connection control timer never has a chance to start and the
> ipvs connection cannot be released. This will lead to a memory leak, or
> an infinite loop in cleanup_net() when the net namespace is released like
> this:
> 
> ip_vs_conn_net_cleanup at a0a9f31a [ip_vs]
> __ip_vs_cleanup at a0a9f60a [ip_vs]
> ops_exit_list at 81567a49
> cleanup_net at 81568b40
> process_one_work at 810a851b
> worker_thread at 810a9356
> kthread at 810b0b6f
> ret_from_fork at 81697a18
> 
> race condition:
> CPU1   CPU2
> ip_vs_in()
>   ip_vs_conn_new()
>ip_vs_del_dest()
>  __ip_vs_unlink_dest()
>~IP_VS_DEST_F_AVAILABLE
>   cp->dest && !IP_VS_DEST_F_AVAILABLE
>   __ip_vs_conn_put
> ...
> cleanup_net  ---> infinite looping
> 
> Fix this by checking whether the timer has already started.
> 
> Signed-off-by: Tan Hu 
> Reviewed-by: Jiang Biao 

v3 looks good to me,

Acked-by: Julian Anastasov 

Simon and Pablo, this can be applied to ipvs/nf tree...

> ---
> v2: fix use-after-free in CONN_ONE_PACKET case suggested by Julian Anastasov
> v3: remove trailing whitespace for patch checking 
> 
>  net/netfilter/ipvs/ip_vs_core.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index 0679dd1..a17104f 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -1972,13 +1972,20 @@ static int ip_vs_in_icmp_v6(struct netns_ipvs *ipvs, struct sk_buff *skb,
>   if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
>   /* the destination server is not available */
> 
> - if (sysctl_expire_nodest_conn(ipvs)) {
> + __u32 flags = cp->flags;
> +
> + /* when timer already started, silently drop the packet.*/
> + if (timer_pending(>timer))
> + __ip_vs_conn_put(cp);
> + else
> + ip_vs_conn_put(cp);
> +
> + if (sysctl_expire_nodest_conn(ipvs) &&
> + !(flags & IP_VS_CONN_F_ONE_PACKET)) {
>   /* try to expire the connection immediately */
>   ip_vs_conn_expire_now(cp);
>   }
> - /* don't restart its timer, and silently
> -drop the packet. */
> - __ip_vs_conn_put(cp);
> +
>   return NF_DROP;
>   }
> 
> --
> 1.8.3.1
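The race is easier to follow with a deliberately simplified user-space model of the drop path. The struct and helper names are hypothetical: sim_conn_put_noarm() plays the role of __ip_vs_conn_put() (drop a reference, leave the timer alone) and sim_conn_put_arm() plays the role of ip_vs_conn_put(), reduced here to "arm the expiry timer, then drop the reference". The real ip_vs_conn_put() also has a one-packet (IP_VS_CONN_F_ONE_PACKET) path that, per the v2 changelog, can free the connection, which is why the patch copies cp->flags into a local before the put; that path is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of an IPVS connection. */
struct sim_conn {
	int refcnt;
	bool timer_pending;	/* stands in for timer_pending(&cp->timer) */
};

/* Drop a reference and leave the timer alone (role of __ip_vs_conn_put). */
static void sim_conn_put_noarm(struct sim_conn *cp)
{
	cp->refcnt--;
	assert(cp->refcnt >= 0);
}

/* Arm the expiry timer, then drop the reference (simplified role of
 * ip_vs_conn_put; the one-packet path is not modelled). */
static void sim_conn_put_arm(struct sim_conn *cp)
{
	cp->timer_pending = true;
	sim_conn_put_noarm(cp);
}

/* Drop path for an unavailable destination, following the v3 patch. */
static void sim_drop_unavailable(struct sim_conn *cp)
{
	if (cp->timer_pending)
		sim_conn_put_noarm(cp);	/* a running timer will expire cp */
	else
		sim_conn_put_arm(cp);	/* first packet: start the timer now */
}

/* In this model, cp is eventually freed only if nobody holds it and
 * its expiry timer is armed. */
static bool sim_conn_reclaimable(const struct sim_conn *cp)
{
	return cp->refcnt == 0 && cp->timer_pending;
}
```

With the old unconditional __ip_vs_conn_put(), a connection whose first packet is dropped ends up with no references and no timer, so nothing ever frees it; with the patched logic the timer is armed in that case.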

Regards

--
Julian Anastasov 


Re: [PATCH iproute2] iplink: report drop stats for VFs

2018-07-25 Thread Ivan Vecera
On 25.7.2018 19:04, David Ahern wrote:
> On 7/25/18 10:22 AM, Ivan Vecera wrote:
>> Kernel commit c5a9f6f0ab40 ("net/core: Add drop counters to VF
>> statistics") added support for Rx/Tx packet drops but these stats are
>> not reported by 'ip link'.
>>
>> Cc: Eugenia Emantayev 
>> Cc: Saeed Mahameed 
>>
>> Signed-off-by: Ivan Vecera 
>> ---
>>  ip/ipaddress.c | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
> 
> duplicates a patch from Eran which I just committed to iproute2-next
> 
Oops, I missed it :-)

I.


Re: pahole + BTF was: Re: [Question] bpf: about a new 'tools/bpf/bpf_dwarf2btf'

2018-07-25 Thread Daniel Borkmann
On 07/25/2018 08:27 PM, Taeung Song wrote:
> On 07/26/2018 02:52 AM, Arnaldo Carvalho de Melo wrote:
[...]
>> BTW, Daniel, I just pushed to pahole's main repository at:
>>
>>    git://git.kernel.org/pub/scm/devel/pahole/pahole.git
>>
>> with the Martin's BTF patch, so no need to pull from the github one,
>> I'll tag v1.12 and announce the release so that distro package
>> maintainers can update their packages.

Awesome, thanks so much Arnaldo!


[PATCH] net: wireless: cw1200: Remove extra parentheses

2018-07-25 Thread Varsha Rao
Remove unnecessary parentheses to fix the extraneous parentheses clang
warning.

Signed-off-by: Varsha Rao 
---
 drivers/net/wireless/st/cw1200/txrx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/st/cw1200/txrx.c b/drivers/net/wireless/st/cw1200/txrx.c
index f7b1b0062db3..8c800ef23159 100644
--- a/drivers/net/wireless/st/cw1200/txrx.c
+++ b/drivers/net/wireless/st/cw1200/txrx.c
@@ -624,9 +624,9 @@ cw1200_tx_h_bt(struct cw1200_common *priv,
priority = WSM_EPTA_PRIORITY_ACTION;
else if (ieee80211_is_mgmt(t->hdr->frame_control))
priority = WSM_EPTA_PRIORITY_MGT;
-   else if ((wsm->queue_id == WSM_QUEUE_VOICE))
+   else if (wsm->queue_id == WSM_QUEUE_VOICE)
priority = WSM_EPTA_PRIORITY_VOICE;
-   else if ((wsm->queue_id == WSM_QUEUE_VIDEO))
+   else if (wsm->queue_id == WSM_QUEUE_VIDEO)
priority = WSM_EPTA_PRIORITY_VIDEO;
else
priority = WSM_EPTA_PRIORITY_DATA;
-- 
2.17.0



[PATCH] net: wireless: brcmsmac: Remove extra parentheses

2018-07-25 Thread Varsha Rao
Remove the unnecessary parentheses to fix the clang warning of
extraneous parentheses.

Signed-off-by: Varsha Rao 
---
 drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
index 1a187557982e..4deba3075083 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
@@ -25453,12 +25453,12 @@ void wlc_phy_cal_perical_nphy_run(struct brcms_phy *pi, u8 caltype)
(pi->cal_type_override ==
 PHY_PERICAL_FULL) ? true : false;
 
-   if ((pi->mphase_cal_phase_id > MPHASE_CAL_STATE_INIT)) {
+   if (pi->mphase_cal_phase_id > MPHASE_CAL_STATE_INIT) {
if (pi->nphy_txiqlocal_chanspec != pi->radio_chanspec)
wlc_phy_cal_perical_mphase_restart(pi);
}
 
-   if ((pi->mphase_cal_phase_id == MPHASE_CAL_STATE_RXCAL))
+   if (pi->mphase_cal_phase_id == MPHASE_CAL_STATE_RXCAL)
wlapi_bmac_write_shm(pi->sh->physhim, M_CTS_DURATION, 1);
 
wlapi_suspend_mac_and_wait(pi->sh->physhim);
-- 
2.17.0



[PATCH] net: wireless: ath9k: Remove unnecessary parentheses

2018-07-25 Thread Varsha Rao
Remove extra parentheses to fix the clang warning of extraneous
parentheses.

Signed-off-by: Varsha Rao 
---
 drivers/net/wireless/ath/ath9k/debug_sta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/debug_sta.c b/drivers/net/wireless/ath/ath9k/debug_sta.c
index a6f45f1bb5bb..ed8b77a74630 100644
--- a/drivers/net/wireless/ath/ath9k/debug_sta.c
+++ b/drivers/net/wireless/ath/ath9k/debug_sta.c
@@ -116,7 +116,7 @@ void ath_debug_rate_stats(struct ath_softc *sc,
if (rxs->rate_idx >= ARRAY_SIZE(rstats->ht_stats))
goto exit;
 
-   if ((rxs->bw == RATE_INFO_BW_40))
+   if (rxs->bw == RATE_INFO_BW_40)
rstats->ht_stats[rxs->rate_idx].ht40_cnt++;
else
rstats->ht_stats[rxs->rate_idx].ht20_cnt++;
-- 
2.17.0



[PATCH] net: wireless: ath6kl: Remove unnecessary parentheses

2018-07-25 Thread Varsha Rao
Remove extra parentheses to fix the clang warning of extraneous
parentheses.

Signed-off-by: Varsha Rao 
---
 drivers/net/wireless/ath/ath6kl/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath6kl/main.c b/drivers/net/wireless/ath/ath6kl/main.c
index 808fb30be9ad..451297441532 100644
--- a/drivers/net/wireless/ath/ath6kl/main.c
+++ b/drivers/net/wireless/ath/ath6kl/main.c
@@ -639,7 +639,7 @@ void ath6kl_connect_event(struct ath6kl_vif *vif, u16 channel, u8 *bssid,
memcpy(vif->bssid, bssid, sizeof(vif->bssid));
vif->bss_ch = channel;
 
-   if ((vif->nw_type == INFRA_NETWORK)) {
+   if (vif->nw_type == INFRA_NETWORK) {
ath6kl_wmi_listeninterval_cmd(ar->wmi, vif->fw_vif_idx,
  vif->listen_intvl_t, 0);
ath6kl_check_ch_switch(ar, channel);
-- 
2.17.0



Re: pahole + BTF was: Re: [Question] bpf: about a new 'tools/bpf/bpf_dwarf2btf'

2018-07-25 Thread Taeung Song

Hi Arnaldo,

On 07/26/2018 02:52 AM, Arnaldo Carvalho de Melo wrote:

Em Thu, Jul 26, 2018 at 02:23:32AM +0900, Taeung Song escreveu:

Hi,

Building bpf programs with .BTF section,
I thought it'd be better to convert dwarf info to .BTF by
a new tool such as 'tools/bpf/bpf_dwarf2btf' instead of pahole
in the future.
  

Currently for bpf binary that have .BTF section,
we need to use pahole from https://github.com/iamkafai/pahole/tree/btf
with the command line such as "pahole -J bpf_prog.o".
  

I think it is great but if implementing new 'bpf_dwarf2btf'
(dwarf parsing + btf encoder code written by Martin KaFai Lau on
the pahole project i.e. btf.h, btf_encoder.c, btf_encoder.h,
libbtf.c, libbtf.h),
BPF developers would more easily use functionalities based on BTF.


What would be easier exactly? Not having to install a package but build
it from the kernel sources?

Many kernel developers already have pahole installed for other uses, so
no need to install anything.



Understood, but I think there are many non-kernel developers
developing BPF programs and they mightn't have or use pahole.

So, if we provided the 'dwarf2btf' feature in tools/bpf or tools/bpf/bpftool,
non-kernel developers could also more easily build bpf progs with .BTF, no?


BTW, Daniel, I just pushed to pahole's main repository at:

   git://git.kernel.org/pub/scm/devel/pahole/pahole.git

with the Martin's BTF patch, so no need to pull from the github one,
I'll tag v1.12 and announce the release so that distro package
maintainers can update their packages.
  

What do you think about this ? Do you think this is needed ?
Or, already implementing something like this ?
  

If it is needed, I want to try to make 'tools/bpf/bpf_dwarf2btf'
based on the pahole code. I'd appreciate it, if you reply to this


The way Martin took advantage of the work done a long time ago to
support CTF out of the same DWARF reading codebase was really cool, not
that much work to do, just add a new format to pahole's codebase making
it more useful.



I got it !

Thanks,
Taeung


I was just so far overly picky with testing it and kept leaving for
later to have a good documentation about testing it, vacation and perf
maintainership duties kept making this take like forever, grumble :-\

- Arnaldo



--
oh.. :'(


Re: [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback

2018-07-25 Thread Marcelo Ricardo Leitner
On Wed, Jul 25, 2018 at 07:59:48AM -0400, Jamal Hadi Salim wrote:
> On 24/07/18 04:06 PM, Paolo Abeni wrote:
> > Each lockless action currently does its own RCU locking in ->act().
> > This allows using plain RCU accessors, even if the context
> > is really RCU BH.
> > 
> > This change drops the per-action RCU lock, replaces the accessors
> > with the _bh variants, cleans up the surrounding code a bit and documents
> > the RCU status in the relevant header.
> > No functional or performance change is intended.
> > 
> > The goal of this patch is clarifying that the RCU critical section
> > used by the tc actions extends up to the classifier's caller.
> > 
> 
> This and 2/5 seems to stand on their own merit.

So does 1/5, I think.

> 
> cheers,
> jamal


Re: [PATCH RFC/RFT net-next 00/17] net: Convert neighbor tables to per-namespace

2018-07-25 Thread David Ahern
On 7/24/18 11:14 AM, David Miller wrote:
> From: David Ahern 
> Date: Tue, 24 Jul 2018 09:14:01 -0600
> 
>> I get the impression there is no longer a strong resistance against
>> moving the tables to per namespace, but deciding what is the right
>> approach to handle backwards compatibility. Correct? Changing the
>> accounting is inevitably going to be noticeable to some use case(s), but
>> with sysctl settings it is a simple runtime update once the user knows
>> to make the change.
>>
>> neighbor entries round up to 512 byte allocations, so with the current
>> gc_thresh defaults (128/512/1024) 512k can be consumed. Using those
>> limits per namespace seems high which is why I suggested a per-namespace
>> default of (16/32/64) which amounts to 32k per namespace limit by
>> default. Open to other suggestions as well.
> 
> No objection from me about going to per-ns neigh tables.
> 
> About the defaults, I wonder if we can scale them to the amount of
> memory given to the ns or something like that?  I bet this will better
> match the intended use of the ns.
> 

Not sure how to do that. I am not aware of memory allocations to a
network namespace. As I understand it containers use cgroups to control
memory use, but I am not aware of any direct ties to namespace.
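The figures quoted earlier in this thread can be checked directly. With the 512-byte allocation per neighbour entry, the current gc_thresh3 default of 1024 gives 1024 × 512 B = 512 KiB, the proposed per-namespace gc_thresh3 of 64 gives 64 × 512 B = 32 KiB, and because each namespace would get its own table, the worst case grows linearly with the number of namespaces. A throwaway helper (hypothetical name, arithmetic only, one table per namespace assumed) makes that explicit:

```c
#include <assert.h>

/* Allocation size per neighbour entry, as quoted in the thread. */
#define SIM_NEIGH_ALLOC_BYTES 512UL

/* Worst-case memory for nr_tables neighbour tables, each capped at
 * gc_thresh3 entries (hypothetical helper, for the arithmetic only). */
static unsigned long sim_neigh_worst_case(unsigned long gc_thresh3,
					  unsigned long nr_tables)
{
	assert(nr_tables > 0);
	return gc_thresh3 * SIM_NEIGH_ALLOC_BYTES * nr_tables;
}
```

For example, 1000 namespaces at the proposed default of 64 entries come to 32,768,000 bytes, which is the kind of scaling the thread is debating.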


Re: [PATCH RFC/RFT net-next 00/17] net: Convert neighbor tables to per-namespace

2018-07-25 Thread David Ahern
On 7/25/18 11:38 AM, Eric W. Biederman wrote:
> 
> Absolutely NOT.  Global thresholds are exactly correct given the fact
> you are running on a single kernel.
> 
> Memory is not free (even though we are swimming in enough of it that memory
> rarely matters).  One of the few remaining challenges for containers
> is finding ways to limit resources in such a way that one application
> does not mess things up for another container during ordinary usage.
> 
> It looks like the neighbour tables absolutely are that kind of problem,
> because the artificial limits are too strict.   Completely giving up on
> limits does not seem the right approach either.  We need to fix the limits
> we have (perhaps making them go away entirely), not just apply a
> band-aid.  Let's get to the bottom of this and make the system better.

Eric: yes, they all share the global resource of memory and there should
be limits on how many entries a remote entity can create.

Network namespaces can provide a separation such that one namespace does
not disrupt networking in another. It is absolutely appropriate to do
so. Your rigid stance is inconsistent given the basic meaning of a
network namespace and the parallels to this same problem -- bridges,
vxlans, and ip fragments. Only neighbor tables are not per-device or per
namespace; your insistence on global limits is missing the mark and wrong.


Re: [PATCH v6 10/18] x86/power/64: Remove VLA usage

2018-07-25 Thread Kees Cook
On Wed, Jul 25, 2018 at 4:32 AM, Rafael J. Wysocki  wrote:
> On Tue, Jul 24, 2018 at 6:49 PM, Kees Cook  wrote:
>> In the quest to remove all stack VLA usage from the kernel[1], this
>> removes the discouraged use of AHASH_REQUEST_ON_STACK by switching to
>> shash directly and allocating the descriptor in heap memory (which should
>> be fine: the tfm has already been allocated there too).
>>
>> [1] 
>> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
>>
>> Signed-off-by: Kees Cook 
>> Acked-by: Pavel Machek 
>
> I think I can queue this up if there are no objections from others.
>
> Do you want me to do that?

Sure thing. It looks like the other stand-alone patches like this one
are getting taken into the non-crypto trees, so that's fine.

Thanks!

-Kees

-- 
Kees Cook
Pixel Security
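The general pattern behind this kind of VLA removal, moving a descriptor whose size is only known at run time off the stack and onto the heap, can be sketched in user space. This is not the patch itself: the descriptor type, the toy byte-sum "digest" and all names are hypothetical, and in the kernel the allocation would come from kmalloc()/kzalloc() rather than calloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor whose context size is only known at run time,
 * loosely modelled on a hash descriptor with a trailing context area. */
struct sim_desc {
	size_t nr_words;
	unsigned long ctx[];	/* flexible array member, sized at allocation */
};

/* Heap allocation replaces an on-stack buffer sized at run time (a VLA). */
static struct sim_desc *sim_desc_alloc(size_t nr_words)
{
	struct sim_desc *d;

	assert(nr_words > 0);
	d = calloc(1, sizeof(*d) + nr_words * sizeof(d->ctx[0]));
	if (d)
		d->nr_words = nr_words;
	return d;
}

/* Toy "digest" (sum of bytes) kept in the descriptor context.
 * Returns 0 on success or -1 if the descriptor could not be allocated. */
static int sim_digest(const char *data, unsigned long *out)
{
	struct sim_desc *d = sim_desc_alloc(1);

	if (!d)
		return -1;	/* unlike a VLA, a failed allocation can be reported */
	for (size_t i = 0; data[i] != '\0'; i++)
		d->ctx[0] += (unsigned char)data[i];
	*out = d->ctx[0];
	free(d);
	return 0;
}
```

The trade-off is an extra allocation and an error path in exchange for a bounded stack frame, which matches the reasoning in the commit message that the transform is already heap-allocated anyway.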


Re: [PATCH RFC/RFT net-next 00/17] net: Convert neighbor tables to per-namespace

2018-07-25 Thread Eric W. Biederman
David Ahern  writes:

> On 7/25/18 6:33 AM, Eric W. Biederman wrote:
>> Cong Wang  writes:
>> 
>>> On Tue, Jul 24, 2018 at 8:14 AM David Ahern  wrote:

 On 7/19/18 11:12 AM, Cong Wang wrote:
> On Thu, Jul 19, 2018 at 9:16 AM David Ahern  wrote:
>>
>> Chatting with Nikolay about this and he brought up a good corollary - ip
>> fragmentation. It really is a similar problem in that memory is consumed
>> as a result of packets received from an external entity. The ipfrag
>> sysctls are per namespace with a limit that non-init_net namespaces can
>> not set high_thresh > the current value of init_net. Potential memory
>> consumed by fragments scales with the number of namespaces which is the
>> primary concern with making neighbor tables per namespace.
>
> Nothing new, already discussed:
> https://marc.info/?l=linux-netdev=140391416215988=2
>
> :)
>

 Neighbor tables, bridge fdbs, vxlan fdbs and ip fragments all consume
 local memory resources due to received packets. bridge and vxlan fdb's
 are fairly straightforward analogs to neighbor entries; they are per
 device with no limits on the number of entries. Fragments have memory
 limits per namespace. So neighbor tables are the only ones with this
 strict limitation and concern on memory consumption.

 I get the impression there is no longer a strong resistance against
 moving the tables to per namespace, but deciding what is the right
 approach to handle backwards compatibility. Correct? Changing the
 accounting is inevitably going to be noticeable to some use case(s), but
 with sysctl settings it is a simple runtime update once the user knows
 to make the change.
>>>
>>> This question definitely should go to Eric Biederman who was against
>>> my proposal.
>>>
>>> Let's add Eric into CC.
>> 
>> Given that the entries are per device and the devices are per-namespace,
>> semantically neighbours are already kept in a per-namespace manner.  So
>> this is all about making the code not honor global resource limits.
>> Making the code not honor gc_thresh3.
>> 
>> Skimming through the code today the default for gc_thresh3 is 1024.
>> Which means that we limit the neighbour tables to 1024 entries per
>> protocol type.
>> 
>> There are some pretty compelling reasons especially with ipv4 to keep
>> the subnet size down.  Arp storms are a real thing.
>> 
>> I don't know off the top of my head what the reasons are for limiting the
>> neighbour table sizes.  I would be much more comfortable with a patchset
>> like this if we did some research and figured out the reasons why
>> we have a global limit.  Then changed the code to remove those limits.
>> 
>> When the limits are gone, when the code can support large subnets
>> without tuning, and when we don't have to worry about someone scanning all
>> addresses in an ipv6 subnet and causing a DOS on working machines,
>> I think it is completely appropriate to look to see if something per
>> network namespace needs to happen.
>> 
>> So please let's address the limits, not the fact that some specific
>> corner case ran into them.
>> 
>> If we are going to neuter gc_thresh3 let's go as far as removing it
>> entirely.  If we are going to make the neighbour table per something
>> let's make it per network device.  If we can afford the multiple hash
>> tables then a hash table per device is better.   Perhaps we want to move
>> to rhash tables while we look at this, instead of an old hand grown
>> version of resizable hash table.
>
> Given the use cases with increasing numbers of devices (> 10,000),
> per-device tables will have more problems than per namespace - in
> reference to your concern in the last paragraph below.
>
>> 
>> Unless I misread something, all your patchset did is reshuffle code and
>> data structures so that gc_thresh3 does not apply across namespaces.
>> That does not feel like it really fixes anything.  That just lies to
>> people.
>
> This patch set fixes the lie that network namespaces provide complete
> isolation when in fact one namespace can evict neighbor entries from
> another. An arp storm you are concerned about in one namespace impacts
> all containers.

Network namespaces can not provide complete isolation.  They share the
same kernel and they do not dedicate resources to each other.
Namespaces in general are about the names.  They are about sharing a
machine efficiently.

I humbly suggest that anyone who wants ``complete'' isolation use a VM
at the very least.

I do think the limits on the neighbour table are quite likely too
strict.  We should be able to relax them and continue to have a
networking stack that works for everyone.

> It starts by removing the proliferation of open coded references to
> arp_tbl and nd_tbl, moving them behind the existing neigh_find_table.
> From there (patches 14-16) it makes the tables per-namespace and hence
> makes the gc_thresh parameters which 

[Question] bpf: about a new 'tools/bpf/bpf_dwarf2btf'

2018-07-25 Thread Taeung Song

Hi,

Building bpf programs with .BTF section,
I thought it'd be better to convert dwarf info to .BTF by
a new tool such as 'tools/bpf/bpf_dwarf2btf' instead of pahole
in the future.

Currently, for BPF binaries that have a .BTF section,
we need to use pahole from https://github.com/iamkafai/pahole/tree/btf
with a command line such as "pahole -J bpf_prog.o".

I think it is great, but if we implemented a new 'bpf_dwarf2btf'
(the dwarf parsing + btf encoder code written by Martin KaFai Lau in
the pahole project, i.e. btf.h, btf_encoder.c, btf_encoder.h, libbtf.c,
libbtf.h), BPF developers could more easily use functionality based on BTF.

What do you think about this ? Do you think this is needed ?
Or, already implementing something like this ?

If it is needed, I want to try to make 'tools/bpf/bpf_dwarf2btf'
based on the pahole code. I'd appreciate it, if you reply to this

Thanks,
Taeung


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Marcelo Ricardo Leitner
On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim  wrote:
> >
> > Those changes were there from the beginning (above patch did
> > not introduce them).
> > IIRC, the reason was to distinguish between policy intended
> > drops and drops because of errors.
> 
> There must be a limit for "overlimit" to make sense. There is
> no limit in the mirred action's context; probably only act_police
> has such a limit. So, all the rest should not touch overlimit.

+1



Re: [PATCH iproute2] iplink: report drop stats for VFs

2018-07-25 Thread David Ahern
On 7/25/18 10:22 AM, Ivan Vecera wrote:
> Kernel commit c5a9f6f0ab40 ("net/core: Add drop counters to VF
> statistics") added support for Rx/Tx packet drops but these stats are
> not reported by 'ip link'.
> 
> Cc: Eugenia Emantayev 
> Cc: Saeed Mahameed 
> 
> Signed-off-by: Ivan Vecera 
> ---
>  ip/ipaddress.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 

duplicates a patch from Eran which I just committed to iproute2-next



Re: [patch iproute2/net-next v4] tc: introduce support for chain templates

2018-07-25 Thread David Ahern
On 7/23/18 1:24 AM, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> Signed-off-by: Jiri Pirko 
> ---
> v3->v4:
> - reworked to chain object
> v1->v2:
> - moved the template handling
>   from "tc filter template" to "tc chaintemplate"
> ---
>  include/uapi/linux/rtnetlink.h |   7 +++
>  man/man8/tc.8                  |  26 
>  tc/tc.c                        |   5 +-
>  tc/tc_common.h                 |   1 +
>  tc/tc_filter.c                 | 131 +
>  tc/tc_monitor.c                |   5 +-
>  6 files changed, 135 insertions(+), 40 deletions(-)
> 

applied to iproute2-next. Thanks




Re: [PATCH iproute2-next] ip: Add violation counters to VF statisctics

2018-07-25 Thread David Ahern
On 7/22/18 4:31 AM, Eran Ben Elisha wrote:
> Extend VFs statistics by receive and transmit violation counters.
> 
> Example: "ip -s link show dev enp5s0f0"
> 
> 6: enp5s0f0:  mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
>     link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          0        0       0       0       2
>     TX: bytes  packets  errors  dropped carrier collsns
>     1406       17       0       0       0       0
>     vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off
>     RX: bytes  packets  mcast   bcast   dropped
>     1666       29       14      32      0
>     TX: bytes  packets  dropped
>     2880       44       2412
> 
> Signed-off-by: Eran Ben Elisha 
> ---
>  ip/ipaddress.c | 20 ++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 

applied to iproute2-next. Thanks




Re: [PATCH bpf] xsk: fix poll/POLLIN premature returns

2018-07-25 Thread Daniel Borkmann
On 07/23/2018 11:43 AM, Björn Töpel wrote:
> From: Björn Töpel 
> 
> Polling for the ingress queues relies on reading the producer/consumer
> pointers of the Rx queue.
> 
> Prior to this commit, a cached consumer pointer could be used instead of
> the actual consumer pointer, and therefore POLLIN could be reported prematurely.
> 
> This patch makes sure that the non-cached consumer pointer is used
> instead.
> 
> Reported-by: Qi Zhang 
> Tested-by: Qi Zhang 
> Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support")
> Signed-off-by: Björn Töpel 

Applied thanks, Björn!


Re: [PATCH] bpf, x32: Fix regression caused by commit 24dea04767e6

2018-07-25 Thread Daniel Borkmann
On 07/25/2018 06:46 PM, Wang YanQing wrote:
> Commit 24dea04767e6 ("bpf, x32: remove ld_abs/ld_ind")
> removed the 4-byte /* Extra space for skb_copy_bits buffer */
> from _STACK_SIZE, but it didn't fix the corresponding code
> in emit_prologue and emit_epilogue, and this mismatch causes
> very strange kernel runtime errors.
> 
> This patch fixes it.
> 
> Fixes: 24dea04767e6 ("bpf, x32: remove ld_abs/ld_ind")
> Signed-off-by: Wang YanQing 

Applied, thanks Wang!


Re: [PATCH] rds: send: Fix dead code in rds_sendmsg

2018-07-25 Thread Santosh Shilimkar

On 7/25/2018 8:22 AM, Gustavo A. R. Silva wrote:

Currently, code at label *out* is unreachable. Fix this by updating
variable *ret* with -EINVAL, so the jump to *out* can be properly
executed instead of directly returning from function.

Addresses-Coverity-ID: 1472059 ("Structurally dead code")
Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support")
Signed-off-by: Gustavo A. R. Silva 
---

Looks fine.
Acked-by: Santosh Shilimkar 


Re: [PATCH 4/4] net: dsa: Add Lantiq / Intel DSA driver for vrx200

2018-07-25 Thread Andrew Lunn
> 
> I extracted this struct/blob/voodoo from UGW3/4 7 years ago and was puzzled
> by it. For those of us that have worked with this table in the past, it's
> semi-understandable, yet it's almost like a blackbox FW blob. Can we try to
> make it a little more readable by adding a comment on each line explaining
> why it's there and what it does?

Hi John

I was wondering if there was any reverse-engineered documentation. Do
you have any links?

Thanks
Andrew


Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.

2018-07-25 Thread Cong Wang
On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim  wrote:
>
> Those changes were there from the beginning (above patch did
> not introduce them).
> IIRC, the reason was to distinguish between policy intended
> drops and drops because of errors.

There must be a limit for "overlimit" to make sense. There is
no limit in the mirred action's context; probably only act_police
has such a limit. So, all the rest should not touch overlimit.


[PATCH] bpf, x32: Fix regression caused by commit 24dea04767e6

2018-07-25 Thread Wang YanQing
Commit 24dea04767e6 ("bpf, x32: remove ld_abs/ld_ind")
removed the 4-byte /* Extra space for skb_copy_bits buffer */
from _STACK_SIZE, but it didn't fix the corresponding code
in emit_prologue and emit_epilogue, and this mismatch causes
very strange kernel runtime errors.

This patch fixes it.

Fixes: 24dea04767e6 ("bpf, x32: remove ld_abs/ld_ind")
Signed-off-by: Wang YanQing 
---
 arch/x86/net/bpf_jit_comp32.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 5579987..8f6cc71 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -1441,8 +1441,8 @@ static void emit_prologue(u8 **pprog, u32 stack_depth)
 
/* sub esp,STACK_SIZE */
EMIT2_off32(0x81, 0xEC, STACK_SIZE);
-   /* sub ebp,SCRATCH_SIZE+4+12*/
-   EMIT3(0x83, add_1reg(0xE8, IA32_EBP), SCRATCH_SIZE + 16);
+   /* sub ebp,SCRATCH_SIZE+12*/
+   EMIT3(0x83, add_1reg(0xE8, IA32_EBP), SCRATCH_SIZE + 12);
/* xor ebx,ebx */
EMIT2(0x31, add_2reg(0xC0, IA32_EBX, IA32_EBX));
 
@@ -1475,8 +1475,8 @@ static void emit_epilogue(u8 **pprog, u32 stack_depth)
/* mov edx,dword ptr [ebp+off]*/
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EDX), STACK_VAR(r0[1]));
 
-   /* add ebp,SCRATCH_SIZE+4+12*/
-   EMIT3(0x83, add_1reg(0xC0, IA32_EBP), SCRATCH_SIZE + 16);
+   /* add ebp,SCRATCH_SIZE+12*/
+   EMIT3(0x83, add_1reg(0xC0, IA32_EBP), SCRATCH_SIZE + 12);
 
/* mov ebx,dword ptr [ebp-12]*/
EMIT3(0x8B, add_2reg(0x40, IA32_EBP, IA32_EBX), -12);
-- 
1.8.5.6.2.g3d8a54e.dirty


RE: [PATCH net-next] lan743x: Make symbol lan743x_pm_ops static

2018-07-25 Thread Bryan.Whitehead
> Subject: [PATCH net-next] lan743x: Make symbol lan743x_pm_ops static
> 
> Fixes the following sparse warning:
> 
> drivers/net/ethernet/microchip/lan743x_main.c:2944:25: warning:
>  symbol 'lan743x_pm_ops' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Acked-by: Bryan Whitehead 



Re: [patch net-next v4 03/12] net: sched: introduce chain object to uapi

2018-07-25 Thread Cong Wang
On Tue, Jul 24, 2018 at 11:49 PM Jiri Pirko  wrote:
>
> Wed, Jul 25, 2018 at 01:20:08AM CEST, xiyou.wangc...@gmail.com wrote:
> >So, you only send out notification when the last refcnt is gone.
> >
> >If the chain that is being deleted by a user is still used by an action,
> >you return 0 or -EPERM?
>
> 0, and the chain stays there until the action is removed. Hmm, do you think
> that -EPERM should be returned in that case? The thing is, we have to
> flush the chain in order to see whether action references are there. We would
> have to have 2 ref counters, one for filters, one for actions.
> What do you think?

_If_ RTM_DELCHAIN does decrease the chain refcnt, then it is
broken:

# tc chain add X ...              (refcnt == 1)
# tc action add ... goto chain X  (refcnt == 2)
# tc chain del X ...              (refcnt == 1)
# tc chain del X ...              (refcnt == 0)

RTM_DELCHAIN should just test whether refcnt is 1; if it is, delete the
chain, otherwise return -EPERM. This is how we handle tc standalone
actions, see tcf_idr_delete_index().

Yes, you might need two refcnts here.


Re: [PATCH 4/4] net: dsa: Add Lantiq / Intel DSA driver for vrx200

2018-07-25 Thread John Crispin




On 21/07/18 21:13, Hauke Mehrtens wrote:

+#define MC_ENTRY(val, msk, ns, out, len, type, flags, ipv4_len) \
+   { val, msk, (ns << 10 | out << 4 | len >> 1),\
+   (len & 1) << 15 | type << 13 | flags << 9 | ipv4_len << 8 }
+static const struct gswip_pce_microcode gswip_pce_microcode[] = {
+   /*  value   mask    ns  fields      L  type     flags       ipv4_len */
+   MC_ENTRY(0x88c3, 0x,  1, OUT_ITAG0,  4, INSTR,   FLAG_ITAG,  0),
+   MC_ENTRY(0x8100, 0x,  2, OUT_VTAG0,  2, INSTR,   FLAG_VLAN,  0),
+   MC_ENTRY(0x88A8, 0x,  1, OUT_VTAG0,  2, INSTR,   FLAG_VLAN,  0),
+   MC_ENTRY(0x8100, 0x,  1, OUT_VTAG0,  2, INSTR,   FLAG_VLAN,  0),
+   MC_ENTRY(0x8864, 0x, 17, OUT_ETHTYP, 1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x0800, 0x, 21, OUT_ETHTYP, 1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x86DD, 0x, 22, OUT_ETHTYP, 1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x8863, 0x, 16, OUT_ETHTYP, 1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0xF800, 10, OUT_NONE,   0, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 40, OUT_ETHTYP, 1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x0600, 0x0600, 40, OUT_ETHTYP, 1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 12, OUT_NONE,   1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 14, OUT_NONE,   1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x0300, 0xFF00, 41, OUT_NONE,   0, INSTR,   FLAG_SNAP,  0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 41, OUT_DIP7,   3, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 18, OUT_DIP7,   3, INSTR,   FLAG_PPPOE, 0),
+   MC_ENTRY(0x0021, 0x, 21, OUT_NONE,   1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x0057, 0x, 22, OUT_NONE,   1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 40, OUT_NONE,   0, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x4000, 0xF000, 24, OUT_IP0,4, INSTR,   FLAG_IPV4,  1),
+   MC_ENTRY(0x6000, 0xF000, 27, OUT_IP0,3, INSTR,   FLAG_IPV6,  0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 25, OUT_IP3,2, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 26, OUT_SIP0,   4, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 40, OUT_NONE,   0, LENACCU, FLAG_NO,0),
+   MC_ENTRY(0x1100, 0xFF00, 39, OUT_PROT,   1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x0600, 0xFF00, 39, OUT_PROT,   1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0xFF00, 33, OUT_IP3,   17, INSTR,   FLAG_HOP,   0),
+   MC_ENTRY(0x2B00, 0xFF00, 33, OUT_IP3,   17, INSTR,   FLAG_NN1,   0),
+   MC_ENTRY(0x3C00, 0xFF00, 33, OUT_IP3,   17, INSTR,   FLAG_NN2,   0),
+   MC_ENTRY(0x, 0x, 39, OUT_PROT,   1, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x00E0, 35, OUT_NONE,   0, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 40, OUT_NONE,   0, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0xFF00, 33, OUT_NONE,   0, IPV6,FLAG_HOP,   0),
+   MC_ENTRY(0x2B00, 0xFF00, 33, OUT_NONE,   0, IPV6,FLAG_NN1,   0),
+   MC_ENTRY(0x3C00, 0xFF00, 33, OUT_NONE,   0, IPV6,FLAG_NN2,   0),
+   MC_ENTRY(0x, 0x, 40, OUT_PROT,   1, IPV6,FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 40, OUT_SIP0,  16, INSTR,   FLAG_NO,0),
+   MC_ENTRY(0x, 0x, 41, OUT_APP0,   4, INSTR,   FLAG_IGMP,  0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR,   FLAG_END,   0),
+   MC_ENTRY(0x, 0x, 41, OUT_NONE,   0, INSTR, 

[PATCH] samples/bpf: Add BTF build flags to Makefile

2018-07-25 Thread Taeung Song
To make it easy to test BTF-enabled binaries in samples/bpf,
let samples/bpf/Makefile probe llc, pahole and llvm-objcopy
for BPF support and use them, as tools/testing/selftests/bpf/Makefile
has done since commit c0fa1b6c3efc ("bpf: btf: Add BTF tests").

Cc: Martin KaFai Lau 
Signed-off-by: Taeung Song 
---
 samples/bpf/Makefile | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 1303af10e54d..e079266360a3 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -191,6 +191,8 @@ HOSTLOADLIBES_xdpsock   += -pthread
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
 LLC ?= llc
 CLANG ?= clang
+LLVM_OBJCOPY ?= llvm-objcopy
+BTF_PAHOLE ?= pahole
 
 # Detect that we're cross compiling and use the cross compiler
 ifdef CROSS_COMPILE
@@ -198,6 +200,20 @@ HOSTCC = $(CROSS_COMPILE)gcc
 CLANG_ARCH_ARGS = -target $(ARCH)
 endif
 
+BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris)
+BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF)
+BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm')
+
+ifneq ($(BTF_LLC_PROBE),)
+ifneq ($(BTF_PAHOLE_PROBE),)
+ifneq ($(BTF_OBJCOPY_PROBE),)
+   EXTRA_CFLAGS += -g
+   LLC_FLAGS += -mattr=dwarfris
+   DWARF2BTF = y
+endif
+endif
+endif
+
 # Trick to allow make to be run from this directory
 all:
$(MAKE) -C ../../ $(CURDIR)/ BPF_SAMPLES_PATH=$(CURDIR)
@@ -256,4 +272,7 @@ $(obj)/%.o: $(src)/%.c
-Wno-gnu-variable-sized-type-not-at-end \
-Wno-address-of-packed-member -Wno-tautological-compare \
-Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
-   -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
+   -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf $(LLC_FLAGS) -filetype=obj -o $@
+ifeq ($(DWARF2BTF),y)
+   $(BTF_PAHOLE) -J $@
+endif
-- 
2.17.1



Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly

2018-07-25 Thread Daniel Borkmann
On 07/25/2018 05:48 PM, Paolo Abeni wrote:
> On Wed, 2018-07-25 at 15:03 +0200, Jiri Pirko wrote:
>> Wed, Jul 25, 2018 at 02:54:04PM CEST, pab...@redhat.com wrote:
>>> On Wed, 2018-07-25 at 13:56 +0200, Jiri Pirko wrote:
>>>> Tue, Jul 24, 2018 at 10:06:39PM CEST, pab...@redhat.com wrote:
>>>>> Only cls_bpf and act_bpf can safely use such value. If a generic
>>>>> action is configured by user space to return TC_ACT_REDIRECT,
>>>>> the usually visible behavior is passing the skb up the stack - as
>>>>> for unknown action, but, with complex configuration, more random
>>>>> results can be obtained.
>>>>>
>>>>> This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
>>>>> at action init time, making the kernel behavior more consistent.
>>>>>
>>>>> v1 -> v3: use TC_ACT_UNSPEC instead of a newly defined act value
>>>>>
>>>>> Signed-off-by: Paolo Abeni 
>>>>> ---
>>>>> net/sched/act_api.c | 5 +
>>>>> 1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>>>> index 148a89ab789b..24b5534967fe 100644
>>>>> --- a/net/sched/act_api.c
>>>>> +++ b/net/sched/act_api.c
>>>>> @@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>>>>>   }
>>>>>   }
>>>>>
>>>>> + if (a->tcfa_action == TC_ACT_REDIRECT) {
>>>>> + net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
>>>>
>>>> Can't you push this warning through extack?
>>>>
>>>> But, wouldn't it be more appropriate to fail here? User is passing
>>>> invalid configuration
>>>
>>> Jiri, Jamal, thank you for the feedback.
>>>
>>> Please allow me to answer both of you here, since you raised similar
>>> concerns.
>>>
>>> I thought about rejecting the action, but that change of behavior could
>>> break some users, as currently most kind of invalid tcfa_action values
>>> are simply accepted.
>>>
>>> If there is consensus about it, I can simply fail.
>>
>> Well it was obviously wrong to expose TC_ACT_REDIRECT to uapi and it
>> really has no meaning for anyone to use it throughout its whole history.

That claim is completely wrong.

>> I would vote for "fail", yet I admit that I am usually alone in opinion
>> about similar uapi changes :)
> 
> Since even Jamal suggested the same, unless someone else voices some
> opposition soon, in v4 I'll opt for rejecting actions using
> TC_ACT_REDIRECT.

You should probably leave act_bpf out of that rejection, as there is a
small chance that users rely on it as the default action.

Thanks,
Daniel


Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly

2018-07-25 Thread Paolo Abeni
On Wed, 2018-07-25 at 17:48 +0200, Paolo Abeni wrote:
> On Wed, 2018-07-25 at 15:03 +0200, Jiri Pirko wrote:
> > Well it was obviously wrong to expose TC_ACT_REDIRECT to uapi and it
> > really has no meaning for anyone to use it throughout its whole history.
> > I would vote for "fail", yet I admit that I am usually alone in opinion
> > about similar uapi changes :)
> 
> Since even Jamal suggested the same, unless someone else voices some
> opposition soon, in v4 I'll opt for rejecting actions using
> TC_ACT_REDIRECT.

Thinking about it again, I'm going to drop this patch from this series.
Since v2 it is not strictly needed anymore, and it is actually quite unrelated.

Thanks and sorry for the reiterated noise ;)

Paolo



Please, I want to talk with you

2018-07-25 Thread KATIE




[PATCH iproute2] iplink: report drop stats for VFs

2018-07-25 Thread Ivan Vecera
Kernel commit c5a9f6f0ab40 ("net/core: Add drop counters to VF
statistics") added support for Rx/Tx packet drops but these stats are
not reported by 'ip link'.

Cc: Eugenia Emantayev 
Cc: Saeed Mahameed 

Signed-off-by: Ivan Vecera 
---
 ip/ipaddress.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index ea8211c1..1b5ec02a 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -558,6 +558,8 @@ static void print_vf_stats64(FILE *fp, struct rtattr *vfstats)
   rta_getattr_u64(vf[IFLA_VF_STATS_RX_BYTES]));
print_u64(PRINT_JSON, "packets", NULL,
   rta_getattr_u64(vf[IFLA_VF_STATS_RX_PACKETS]));
+   print_u64(PRINT_JSON, "dropped", NULL,
+  rta_getattr_u64(vf[IFLA_VF_STATS_RX_DROPPED]));
print_u64(PRINT_JSON, "multicast", NULL,
   rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
print_u64(PRINT_JSON, "broadcast", NULL,
@@ -570,26 +572,31 @@ static void print_vf_stats64(FILE *fp, struct rtattr *vfstats)
   rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
print_u64(PRINT_JSON, "tx_packets", NULL,
   rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
+   print_u64(PRINT_JSON, "tx_dropped", NULL,
+  rta_getattr_u64(vf[IFLA_VF_STATS_TX_DROPPED]));
close_json_object();
close_json_object();
} else {
/* RX stats */
fprintf(fp, "%s", _SL_);
-   fprintf(fp, "RX: bytes  packets  mcast   bcast %s", _SL_);
+   fprintf(fp, "RX: bytes  packets  dropped mcast   bcast %s",
+   _SL_);
fprintf(fp, "");
 
print_num(fp, 10, rta_getattr_u64(vf[IFLA_VF_STATS_RX_BYTES]));
print_num(fp, 8, rta_getattr_u64(vf[IFLA_VF_STATS_RX_PACKETS]));
+   print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_RX_DROPPED]));
print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_BROADCAST]));
 
/* TX stats */
fprintf(fp, "%s", _SL_);
-   fprintf(fp, "TX: bytes  packets %s", _SL_);
+   fprintf(fp, "TX: bytes  packets  dropped %s", _SL_);
fprintf(fp, "");
 
print_num(fp, 10, rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
print_num(fp, 8, rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
+   print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_TX_DROPPED]));
}
 }
 
-- 
2.16.4


