Re: [PATCH net v2] net: sched: cls_flower: validate nested enc_opts_policy to avoid warning

2018-11-09 Thread Jiri Pirko
Sat, Nov 10, 2018 at 06:06:26AM CET, jakub.kicin...@netronome.com wrote:
>TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
>currently contain further nested attributes, which are parsed by
>hand, so the policy is never actually used resulting in a W=1
>build warning:
>
>net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used 
>[-Wunused-const-variable=]
> enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {
>
>Add the validation anyway to avoid potential bugs when other
>attributes are added and to make the attribute structure slightly
>more clear.  Validation will also set extack to point to the bad
>attribute on error.
>
>Signed-off-by: Jakub Kicinski 
>Acked-by: Simon Horman 

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Acked-by: Jiri Pirko 


Re: [net-next PATCH v3] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Jiri Pirko
Sat, Nov 10, 2018 at 01:11:10AM CET, amritha.namb...@intel.com wrote:

[...]

>@@ -1026,8 +1122,7 @@ static void fl_init_dissector(struct flow_dissector 
>*dissector,
>FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
>-  FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>-   FLOW_DISSECTOR_KEY_PORTS, tp);
>+  FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);

You still need to set the key under a condition. Something like:
if (FL_KEY_IS_MASKED(mask, tp) ||
FL_KEY_IS_MASKED(mask, tp_min) ||
FL_KEY_IS_MASKED(mask, tp_max))
FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);


>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>FLOW_DISSECTOR_KEY_IP, ip);
>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,

[...]


Re: [PATCH net-next] udp6: cleanup stats accounting in recvmsg()

2018-11-09 Thread David Miller
From: Paolo Abeni 
Date: Fri,  9 Nov 2018 15:52:45 +0100

> In the udp6 code path, we needed multiple tests to select the correct
> mib to be updated. Since we touch at least one counter on each iteration,
> it's convenient to use the recently introduced __UDPX_MIB() helper once
> and remove some code duplication.
> 
> Signed-off-by: Paolo Abeni 

Applied, thanks.


Re: [PATCH net-next v2 0/2] dpaa2-eth: defer probe on object allocate

2018-11-09 Thread David Miller
From: Ioana Ciornei 
Date: Fri, 9 Nov 2018 15:26:45 +

> Allocatable objects on the fsl-mc bus may be probed by the fsl_mc_allocator
> after the first attempts of other drivers to use them. Defer the probe when
> this situation happens.
> 
> Changes in v2:
>   - proper handling of IS_ERR_OR_NULL

Series applied.


Re: [PATCH net] flow_dissector: do not dissect l4 ports for fragments

2018-11-09 Thread David Miller
From: Eric Dumazet 
Date: Fri,  9 Nov 2018 16:53:06 -0800

> From: 배석진 
> 
> Only first fragment has the sport/dport information,
> not the following ones.
> 
> If we want consistent hash for all fragments, we need to
> ignore ports even for first fragment.
> 
> This bug is visible for IPv6 traffic, if incoming fragments
> do not have a flow label, since skb_get_hash() will give
> different results for first fragment and following ones.
> 
> It is also visible if any routing rule wants dissection
> and sport or dport.
> 
> See commit 5e5d6fed3741 ("ipv6: route: dissect flow
> in input path if fib rules need it") for details.
> 
> [edumazet] rewrote the changelog completely.
> 
> Fixes: 06635a35d13d ("flow_dissect: use programable dissector in 
> skb_flow_dissect and friends")
> Signed-off-by: 배석진 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable.


Re: [net-next PATCH v2] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Nambiar, Amritha
On 11/9/2018 4:10 AM, Jiri Pirko wrote:
> Wed, Nov 07, 2018 at 10:22:42PM CET, amritha.namb...@intel.com wrote:
>> Added support in tc flower for filtering based on port ranges.
>>
>> Example:
>> 1. Match on a port range:
>> -
>> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>>  action drop
>>
>> $ tc -s filter show dev enp4s0 parent ffff:
>> filter protocol ip pref 1 flower chain 0
>> filter protocol ip pref 1 flower chain 0 handle 0x1
>>  eth_type ipv4
>>  ip_proto tcp
>>  dst_port range 20-30
>>  skip_hw
>>  not_in_hw
>>action order 1: gact action drop
>> random type none pass val 0
>> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>>Action statistics:
>>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>>backlog 0b 0p requeues 0
>>
>> 2. Match on IP address and port range:
>> --
>> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>>  skip_hw action drop
>>
>> $ tc -s filter show dev enp4s0 parent ffff:
>> filter protocol ip pref 1 flower chain 0 handle 0x2
>>  eth_type ipv4
>>  ip_proto tcp
>>  dst_ip 192.168.1.1
>>  dst_port range 100-200
>>  skip_hw
>>  not_in_hw
>>action order 1: gact action drop
>> random type none pass val 0
>> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>>Action statistics:
>>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>>backlog 0b 0p requeues 0
>>
>> v2:
>> Addressed Jiri's comments:
>> 1. Added separate functions for dst and src comparisons.
>> 2. Removed endpoint enum.
>> 3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
>>  lookup.
>> 4. Cleaned up fl_lookup function.
>>
>> Signed-off-by: Amritha Nambiar 
>> ---
>> include/uapi/linux/pkt_cls.h |7 ++
>> net/sched/cls_flower.c   |  133 
>> --
>> 2 files changed, 134 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 401d0c1..b63c3cf 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -405,6 +405,11 @@ enum {
>>  TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>>  TCA_FLOWER_KEY_UDP_DST, /* be16 */
>>
>> +TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>> +
> 
> Please put it at the end of the enum, as David mentioned.

Will fix in v3.

> 
> 
>>  TCA_FLOWER_FLAGS,
>>  TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>>  TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>> @@ -518,6 +523,8 @@ enum {
>>  TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
>> };
>>
>> +#define TCA_FLOWER_MASK_FLAGS_RANGE (1 << 0) /* Range-based match */
>> +
>> /* Match-all classifier */
>>
>> enum {
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 9aada2d..9d2582d 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -55,6 +55,9 @@ struct fl_flow_key {
>>  struct flow_dissector_key_ip ip;
>>  struct flow_dissector_key_ip enc_ip;
>>  struct flow_dissector_key_enc_opts enc_opts;
>> +
> 
> No need for an empty line.

Will fix in v3.

> 
> 
>> +struct flow_dissector_key_ports tp_min;
>> +struct flow_dissector_key_ports tp_max;
>> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as 
>> longs. */
>>
>> struct fl_flow_mask_range {
>> @@ -65,6 +68,7 @@ struct fl_flow_mask_range {
>> struct fl_flow_mask {
>>  struct fl_flow_key key;
>>  struct fl_flow_mask_range range;
>> +u32 flags;
>>  struct rhash_head ht_node;
>>  struct rhashtable ht;
>>  struct rhashtable_params filter_ht_params;
>> @@ -179,13 +183,89 @@ static void fl_clear_masked_range(struct fl_flow_key 
>> *key,
>>  memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
>> }
>>
>> -static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
>> -   struct fl_flow_key *mkey)
>> +static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter,
>> +  struct fl_flow_key *key,
>> +  struct fl_flow_key *mkey)
>> +{
>> +__be16 min_mask, max_mask, min_val, max_val;
>> +
>> +min_mask = htons(filter->mask->key.tp_min.dst);
>> +max_mask = htons(filter->mask->key.tp_max.dst);
>> +min_val = htons(filter->key.tp_min.dst);
>> +max_val = htons(filter->key.tp_max.dst);
>> +
>> +if (min_mask && max_mask) {
>> +if (htons(key->tp.dst) < min_val ||
>> +htons(key->tp.dst) > max_val)
>> +return false;
>> +
>> +/* skb does not have min and max values */
>> +

Re: [net-next PATCH v2] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Nambiar, Amritha
On 11/8/2018 3:15 PM, David Miller wrote:
> From: Amritha Nambiar 
> Date: Wed, 07 Nov 2018 13:22:42 -0800
> 
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 401d0c1..b63c3cf 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -405,6 +405,11 @@ enum {
>>  TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>>  TCA_FLOWER_KEY_UDP_DST, /* be16 */
>>  
>> +TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>> +
>>  TCA_FLOWER_FLAGS,
>>  TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>>  TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>> @@ -518,6 +523,8 @@ enum {
> 
> I don't think you can do this without breaking UAPI, this changes the
> value of TCA_FLOWER_FLAGS and all subsequent values in this
> enumeration.
> 

Will move the new fields to the bottom of the enum in v3.


Re: [PATCH net] net: sched: cls_flower: validate nested enc_opts_policy to avoid build warning

2018-11-09 Thread Jakub Kicinski
On Fri, 09 Nov 2018 20:40:25 -0800 (PST), David Miller wrote:
> From: Jakub Kicinski 
> Date: Fri,  9 Nov 2018 14:41:22 -0800
> 
> > TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
> > currently contain further nested attributes, which are parsed by
> > hand, so the policy is never actually used.  Add the validation
> > anyway to avoid potential bugs when other attributes are added
> > and to make the attribute structure slightly more clear.  Validation
> > will also set extack to point to the bad attribute on error.
> > 
> > Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
> > Signed-off-by: Jakub Kicinski 
> > Acked-by: Simon Horman   
> 
> If this fixes a build warning, please include the build warning
> message in your commit log.
> 
> Thanks!

Ah, sorry, it's a W=1 warning, which should have been mentioned, too.
I'll repost shortly!


Re: [PATCH net] net: sched: cls_flower: validate nested enc_opts_policy to avoid build warning

2018-11-09 Thread David Miller
From: Jakub Kicinski 
Date: Fri,  9 Nov 2018 14:41:22 -0800

> TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
> currently contain further nested attributes, which are parsed by
> hand, so the policy is never actually used.  Add the validation
> anyway to avoid potential bugs when other attributes are added
> and to make the attribute structure slightly more clear.  Validation
> will also set extack to point to the bad attribute on error.
> 
> Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
> Signed-off-by: Jakub Kicinski 
> Acked-by: Simon Horman 

If this fixes a build warning, please include the build warning
message in your commit log.

Thanks!


[PATCH net-next 6/6] nfp: flower: remove unnecessary code in flow lookup

2018-11-09 Thread Jakub Kicinski
From: John Hurley 

Recent changes to NFP mean that stats updates from fw to driver no longer
require a flow lookup and (because egdev offload has been removed) the
ingress netdev for a lookup is now always known.

Remove obsolete code in a flow lookup that matches on host context and
that allows for a netdev to be NULL.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/main.h |  3 +--
 drivers/net/ethernet/netronome/nfp/flower/metadata.c | 11 +++
 drivers/net/ethernet/netronome/nfp/flower/offload.c  |  6 ++
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 9d134aa871fc..b858bac47621 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -20,7 +20,6 @@ struct nfp_fl_pre_lag;
 struct net_device;
 struct nfp_app;
 
-#define NFP_FL_STATS_CTX_DONT_CARE cpu_to_be32(0xffffffff)
 #define NFP_FL_STATS_ELEM_RS   FIELD_SIZEOF(struct nfp_fl_stats_id, \
 init_unalloc)
 #define NFP_FLOWER_MASK_ENTRY_RS   256
@@ -242,7 +241,7 @@ int nfp_modify_flow_metadata(struct nfp_app *app,
 
 struct nfp_fl_payload *
 nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
-  struct net_device *netdev, __be32 host_ctx);
+  struct net_device *netdev);
 struct nfp_fl_payload *
 nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long 
tc_flower_cookie);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 9b4711ce98f0..573a4400a26c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -21,7 +21,6 @@ struct nfp_mask_id_table {
 struct nfp_fl_flow_table_cmp_arg {
struct net_device *netdev;
unsigned long cookie;
-   __be32 host_ctx;
 };
 
 static int nfp_release_stats_entry(struct nfp_app *app, u32 stats_context_id)
@@ -76,14 +75,13 @@ static int nfp_get_stats_entry(struct nfp_app *app, u32 
*stats_context_id)
 /* Must be called with either RTNL or rcu_read_lock */
 struct nfp_fl_payload *
 nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
-  struct net_device *netdev, __be32 host_ctx)
+  struct net_device *netdev)
 {
struct nfp_fl_flow_table_cmp_arg flower_cmp_arg;
struct nfp_flower_priv *priv = app->priv;
 
flower_cmp_arg.netdev = netdev;
flower_cmp_arg.cookie = tc_flower_cookie;
-   flower_cmp_arg.host_ctx = host_ctx;
 
	return rhashtable_lookup_fast(&priv->flow_table, &flower_cmp_arg,
  nfp_flower_table_params);
@@ -307,8 +305,7 @@ int nfp_compile_flow_metadata(struct nfp_app *app,
priv->stats[stats_cxt].bytes = 0;
priv->stats[stats_cxt].used = jiffies;
 
-   check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev,
-NFP_FL_STATS_CTX_DONT_CARE);
+   check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (check_entry) {
if (nfp_release_stats_entry(app, stats_cxt))
return -EINVAL;
@@ -353,9 +350,7 @@ static int nfp_fl_obj_cmpfn(struct rhashtable_compare_arg 
*arg,
const struct nfp_fl_flow_table_cmp_arg *cmp_arg = arg->key;
const struct nfp_fl_payload *flow_entry = obj;
 
-   if ((!cmp_arg->netdev || flow_entry->ingress_dev == cmp_arg->netdev) &&
-   (cmp_arg->host_ctx == NFP_FL_STATS_CTX_DONT_CARE ||
-flow_entry->meta.host_ctx_id == cmp_arg->host_ctx))
+   if (flow_entry->ingress_dev == cmp_arg->netdev)
return flow_entry->tc_flower_cookie != cmp_arg->cookie;
 
return 1;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 0e2dfbb3ef86..545d94168874 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -512,8 +512,7 @@ nfp_flower_del_offload(struct nfp_app *app, struct 
net_device *netdev,
if (nfp_netdev_is_nfp_repr(netdev))
port = nfp_port_from_netdev(netdev);
 
-   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev,
- NFP_FL_STATS_CTX_DONT_CARE);
+   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (!nfp_flow)
return -ENOENT;
 
@@ -561,8 +560,7 @@ nfp_flower_get_stats(struct nfp_app *app, struct net_device 
*netdev,
struct nfp_fl_payload *nfp_flow;
u32 ctx_id;
 
-   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, 

[PATCH net-next 0/6] net: sched: indirect tc block cb registration

2018-11-09 Thread Jakub Kicinski
John says:

This patchset introduces an alternative to egdev offload by allowing a
driver to register for block updates when an external device (e.g. tunnel
netdev) is bound to a TC block. Drivers can track new netdevs or register
to existing ones to receive information on such events. Based on this,
they may register for block offload rules using already existing
functions.

The patchset also implements this new indirect block registration in the
NFP driver to allow the offloading of tunnel rules. The use of egdev
offload (which is currently only used for tunnel offload) is subsequently
removed.
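
As a rough illustration of how a driver could consume the new API from
patch 1/6 (a sketch only: every name prefixed with example_ is a
placeholder, not a function or type from this series):

#include <linux/netdevice.h>
#include <net/pkt_cls.h>

/* Hypothetical per-block setup, standing in for whatever rule-offload
 * registration a real driver performs when a block is (un)bound. */
static int example_setup_block(struct net_device *netdev, void *cb_priv,
			       void *type_data)
{
	return -EOPNOTSUPP;
}

/* tc_indr_block_bind_cb_t implementation, called when 'netdev' (e.g. a
 * tunnel device the driver does not own) is bound to a TC block. */
static int example_indr_bind_cb(struct net_device *netdev, void *cb_priv,
				enum tc_setup_type type, void *type_data)
{
	if (type != TC_SETUP_BLOCK)
		return -EOPNOTSUPP;

	return example_setup_block(netdev, cb_priv, type_data);
}

/* Start and stop receiving bind notifications for an external netdev. */
static int example_watch_netdev(struct net_device *netdev, void *cb_priv)
{
	return tc_indr_block_cb_register(netdev, cb_priv,
					 example_indr_bind_cb, cb_priv);
}

static void example_unwatch_netdev(struct net_device *netdev, void *cb_priv)
{
	tc_indr_block_cb_unregister(netdev, example_indr_bind_cb, cb_priv);
}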

RFC v2 -> PATCH
 - removed embedded tracking function from indir block register (now up to
   driver to clean up after itself)
 - refactored NFP code due to recent submissions
 - removed priv list clean function in NFP (list should be cleared by
   indirect block unregisters)

RFC v1->v2:
 - free allocated owner struct in block_owner_clean function
 - add geneve type helper function
 - move test stub in NFP (v1 patch 2) to full tunnel offload
   implementation via indirect blocks (v2 patches 3-8)

John Hurley (6):
  net: sched: register callbacks for indirect tc block binds
  nfp: flower: allow non repr netdev offload
  nfp: flower: increase scope of netdev checking functions
  nfp: flower: offload tunnel decap rules via indirect TC blocks
  nfp: flower: remove TC egdev offloads
  nfp: flower: remove unnecessary code in flow lookup

 .../ethernet/netronome/nfp/flower/action.c|  28 +-
 .../net/ethernet/netronome/nfp/flower/cmsg.h  |  27 ++
 .../net/ethernet/netronome/nfp/flower/main.c  |  18 +-
 .../net/ethernet/netronome/nfp/flower/main.h  |  14 +-
 .../net/ethernet/netronome/nfp/flower/match.c |  38 +--
 .../ethernet/netronome/nfp/flower/metadata.c  |  12 +-
 .../ethernet/netronome/nfp/flower/offload.c   | 243 +++--
 .../netronome/nfp/flower/tunnel_conf.c|  19 +-
 include/net/pkt_cls.h |  34 +++
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c   | 256 +-
 11 files changed, 531 insertions(+), 161 deletions(-)

-- 
2.17.1



[PATCH net-next 4/6] nfp: flower: offload tunnel decap rules via indirect TC blocks

2018-11-09 Thread Jakub Kicinski
From: John Hurley 

Previously, TC block tunnel decap rules were only offloaded when a
callback was triggered through registration of the rule's egress device.
This meant that the driver had no access to the ingress netdev and so
could not verify it was the same tunnel type that the rule implied.

Register tunnel devices for indirect TC block offloads in NFP, giving
access to new rules based on the ingress device rather than egress. Use
this to verify the netdev type of VXLAN and Geneve based rules and offload
the rules to HW if applicable.

Tunnel registration is done via a netdev notifier. On notifier
registration, this is triggered for already existing netdevs. This means
that NFP can register for offloads from devices that exist before it is
loaded (filter rules will be replayed from the TC core). Similarly, on
notifier unregister, a call is triggered for each currently active netdev.
This allows the driver to unregister any indirect block callbacks that may
still be active.
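
A simplified sketch of the notifier-driven registration described above
(assuming the usual flower driver includes such as "main.h" and "cmsg.h";
example_indr_bind_cb stands in for the driver's tc_indr_block_bind_cb_t
implementation, and the real nfp_flower_reg_indir_block_handler() in
offload.c differs in detail):

static int example_indir_block_event(struct nfp_app *app,
				     struct net_device *netdev,
				     unsigned long event)
{
	/* Only tunnel/OvS type netdevs are interesting for decap offload. */
	if (!nfp_fl_is_netdev_to_offload(netdev))
		return NOTIFY_OK;

	if (event == NETDEV_REGISTER) {
		/* If the netdev is already bound to an ingress block, its
		 * existing filters are replayed to the new callback. */
		if (tc_indr_block_cb_register(netdev, app,
					      example_indr_bind_cb, app))
			return NOTIFY_DONE;
	} else if (event == NETDEV_UNREGISTER) {
		tc_indr_block_cb_unregister(netdev, example_indr_bind_cb,
					    app);
	}

	return NOTIFY_OK;
}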

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/flower/main.c  |   6 +
 .../net/ethernet/netronome/nfp/flower/main.h  |   5 +
 .../ethernet/netronome/nfp/flower/offload.c   | 137 +-
 3 files changed, 144 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 2ad00773750f..d1c3c2081461 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -568,6 +568,8 @@ static int nfp_flower_init(struct nfp_app *app)
goto err_cleanup_metadata;
}
 
+   INIT_LIST_HEAD(&app_priv->indr_block_cb_priv);
+
return 0;
 
 err_cleanup_metadata:
@@ -684,6 +686,10 @@ nfp_flower_netdev_event(struct nfp_app *app, struct 
net_device *netdev,
return ret;
}
 
+   ret = nfp_flower_reg_indir_block_handler(app, netdev, event);
+   if (ret & NOTIFY_STOP_MASK)
+   return ret;
+
return nfp_tunnel_mac_event_handler(app, netdev, event, ptr);
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 4a2b1a915131..8c84829ebd21 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -130,6 +130,7 @@ struct nfp_fl_lag {
  * @reify_wait_queue:  wait queue for repr reify response counting
  * @mtu_conf:  Configuration of repr MTU value
  * @nfp_lag:   Link aggregation data block
+ * @indr_block_cb_priv:List of priv data passed to indirect block cbs
  */
 struct nfp_flower_priv {
struct nfp_app *app;
@@ -162,6 +163,7 @@ struct nfp_flower_priv {
wait_queue_head_t reify_wait_queue;
struct nfp_mtu_conf mtu_conf;
struct nfp_fl_lag nfp_lag;
+   struct list_head indr_block_cb_priv;
 };
 
 /**
@@ -271,5 +273,8 @@ int nfp_flower_lag_populate_pre_action(struct nfp_app *app,
   struct nfp_fl_pre_lag *pre_act);
 int nfp_flower_lag_get_output_id(struct nfp_app *app,
 struct net_device *master);
+int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
+  struct net_device *netdev,
+  unsigned long event);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 2c32edfc1a9d..222e1a98cf16 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -128,6 +128,7 @@ nfp_flower_calc_opt_layer(struct 
flow_dissector_key_enc_opts *enc_opts,
 
 static int
 nfp_flower_calculate_key_layers(struct nfp_app *app,
+   struct net_device *netdev,
struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow,
bool egress,
@@ -186,8 +187,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
skb_flow_dissector_target(flow->dissector,
  
FLOW_DISSECTOR_KEY_ENC_CONTROL,
  flow->key);
-   if (!egress)
-   return -EOPNOTSUPP;
 
	if (mask_enc_ctl->addr_type != 0xffff ||
enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
@@ -250,6 +249,10 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
default:
return -EOPNOTSUPP;
}
+
+   /* Ensure the ingress netdev matches the expected tun type. */
+   if (!nfp_fl_netdev_is_tunnel_type(netdev, *tun_type))
+   return -EOPNOTSUPP;
} else if (egress) {
/* 

[PATCH net-next 5/6] nfp: flower: remove TC egdev offloads

2018-11-09 Thread Jakub Kicinski
From: John Hurley 

Previously, only tunnel decap rules required egdev registration for
offload in NFP. These are now supported via indirect TC block callbacks.

Remove the egdev code from NFP.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/flower/main.c  | 12 ---
 .../net/ethernet/netronome/nfp/flower/main.h  |  3 -
 .../ethernet/netronome/nfp/flower/metadata.c  |  1 +
 .../ethernet/netronome/nfp/flower/offload.c   | 79 ---
 4 files changed, 17 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c 
b/drivers/net/ethernet/netronome/nfp/flower/main.c
index d1c3c2081461..5059110a1768 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -146,23 +146,12 @@ nfp_flower_repr_netdev_stop(struct nfp_app *app, struct 
nfp_repr *repr)
return nfp_flower_cmsg_portmod(repr, false, repr->netdev->mtu, false);
 }
 
-static int
-nfp_flower_repr_netdev_init(struct nfp_app *app, struct net_device *netdev)
-{
-   return tc_setup_cb_egdev_register(netdev,
- nfp_flower_setup_tc_egress_cb,
- netdev_priv(netdev));
-}
-
 static void
 nfp_flower_repr_netdev_clean(struct nfp_app *app, struct net_device *netdev)
 {
struct nfp_repr *repr = netdev_priv(netdev);
 
kfree(repr->app_priv);
-
-   tc_setup_cb_egdev_unregister(netdev, nfp_flower_setup_tc_egress_cb,
-netdev_priv(netdev));
 }
 
 static void
@@ -711,7 +700,6 @@ const struct nfp_app_type app_flower = {
.vnic_init  = nfp_flower_vnic_init,
.vnic_clean = nfp_flower_vnic_clean,
 
-   .repr_init  = nfp_flower_repr_netdev_init,
.repr_preclean  = nfp_flower_repr_netdev_preclean,
.repr_clean = nfp_flower_repr_netdev_clean,
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 8c84829ebd21..9d134aa871fc 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -207,7 +207,6 @@ struct nfp_fl_payload {
char *unmasked_data;
char *mask_data;
char *action_data;
-   bool ingress_offload;
 };
 
 extern const struct rhashtable_params nfp_flower_table_params;
@@ -259,8 +258,6 @@ void nfp_tunnel_del_ipv4_off(struct nfp_app *app, __be32 
ipv4);
 void nfp_tunnel_add_ipv4_off(struct nfp_app *app, __be32 ipv4);
 void nfp_tunnel_request_route(struct nfp_app *app, struct sk_buff *skb);
 void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb);
-int nfp_flower_setup_tc_egress_cb(enum tc_setup_type type, void *type_data,
- void *cb_priv);
 void nfp_flower_lag_init(struct nfp_fl_lag *lag);
 void nfp_flower_lag_cleanup(struct nfp_fl_lag *lag);
 int nfp_flower_lag_reset(struct nfp_fl_lag *lag);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c 
b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 48729bf171e0..9b4711ce98f0 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -287,6 +287,7 @@ int nfp_compile_flow_metadata(struct nfp_app *app,
 
nfp_flow->meta.host_ctx_id = cpu_to_be32(stats_cxt);
nfp_flow->meta.host_cookie = cpu_to_be64(flow->cookie);
+   nfp_flow->ingress_dev = netdev;
 
new_mask_id = 0;
if (!nfp_check_mask_add(app, nfp_flow->mask_data,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c 
b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 222e1a98cf16..0e2dfbb3ef86 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -131,7 +131,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
struct net_device *netdev,
struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow,
-   bool egress,
enum nfp_flower_tun_type *tun_type)
 {
struct flow_dissector_key_basic *mask_basic = NULL;
@@ -253,9 +252,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
/* Ensure the ingress netdev matches the expected tun type. */
if (!nfp_fl_netdev_is_tunnel_type(netdev, *tun_type))
return -EOPNOTSUPP;
-   } else if (egress) {
-   /* Reject non tunnel matches offloaded to egress repr. */
-   return -EOPNOTSUPP;
}
 
if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
@@ -376,7 +372,7 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
 }
 
 static struct nfp_fl_payload *
-nfp_flower_allocate_new(struct nfp_fl_key_ls *key_layer, bool 

[net-next PATCH v3] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Amritha Nambiar
Added support in tc flower for filtering based on port ranges.

Example:
1. Match on a port range:
-
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port range 20-30
  skip_hw
  not_in_hw
action order 1: gact action drop
 random type none pass val 0
 index 1 ref 1 bind 1 installed 85 sec used 3 sec
Action statistics:
Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

2. Match on IP address and port range:
--
$ tc filter add dev enp4s0 protocol ip parent ffff:\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent ffff:
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port range 100-200
  skip_hw
  not_in_hw
action order 1: gact action drop
 random type none pass val 0
 index 2 ref 1 bind 1 installed 58 sec used 2 sec
Action statistics:
Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

v3:
1. Moved new fields in UAPI enum to the end of enum.
2. Removed couple of empty lines.

v2:
Addressed Jiri's comments:
1. Added separate functions for dst and src comparisons.
2. Removed endpoint enum.
3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
  lookup.
4. Cleaned up fl_lookup function.

Signed-off-by: Amritha Nambiar 
---
 include/uapi/linux/pkt_cls.h |7 ++
 net/sched/cls_flower.c   |  132 --
 2 files changed, 133 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 401d0c1..95d0db2 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -485,6 +485,11 @@ enum {
 
TCA_FLOWER_IN_HW_COUNT,
 
+   TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
+   TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
+   TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
+   TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
+
__TCA_FLOWER_MAX,
 };
 
@@ -518,6 +523,8 @@ enum {
TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
 };
 
+#define TCA_FLOWER_MASK_FLAGS_RANGE	(1 << 0) /* Range-based match */
+
 /* Match-all classifier */
 
 enum {
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9aada2d..7780106 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -55,6 +55,8 @@ struct fl_flow_key {
struct flow_dissector_key_ip ip;
struct flow_dissector_key_ip enc_ip;
struct flow_dissector_key_enc_opts enc_opts;
+   struct flow_dissector_key_ports tp_min;
+   struct flow_dissector_key_ports tp_max;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. 
*/
 
 struct fl_flow_mask_range {
@@ -65,6 +67,7 @@ struct fl_flow_mask_range {
 struct fl_flow_mask {
struct fl_flow_key key;
struct fl_flow_mask_range range;
+   u32 flags;
struct rhash_head ht_node;
struct rhashtable ht;
struct rhashtable_params filter_ht_params;
@@ -179,13 +182,89 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
 }
 
-static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
-  struct fl_flow_key *mkey)
+static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter,
+ struct fl_flow_key *key,
+ struct fl_flow_key *mkey)
+{
+   __be16 min_mask, max_mask, min_val, max_val;
+
+   min_mask = htons(filter->mask->key.tp_min.dst);
+   max_mask = htons(filter->mask->key.tp_max.dst);
+   min_val = htons(filter->key.tp_min.dst);
+   max_val = htons(filter->key.tp_max.dst);
+
+   if (min_mask && max_mask) {
+   if (htons(key->tp.dst) < min_val ||
+   htons(key->tp.dst) > max_val)
+   return false;
+
+   /* skb does not have min and max values */
+   mkey->tp_min.dst = filter->mkey.tp_min.dst;
+   mkey->tp_max.dst = filter->mkey.tp_max.dst;
+   }
+   return true;
+}
+
+static bool fl_range_port_src_cmp(struct cls_fl_filter *filter,
+ struct fl_flow_key *key,
+ struct fl_flow_key *mkey)
+{
+   __be16 min_mask, max_mask, min_val, max_val;
+
+   min_mask = htons(filter->mask->key.tp_min.src);
+   max_mask = htons(filter->mask->key.tp_max.src);
+   min_val = htons(filter->key.tp_min.src);
+   max_val = 

Re: [iproute2 PATCH v2] tc: flower: Classify packets based port ranges

2018-11-09 Thread Nambiar, Amritha
On 11/9/2018 12:51 AM, Jiri Pirko wrote:
> Wed, Nov 07, 2018 at 10:22:50PM CET, amritha.namb...@intel.com wrote:
>> Added support for filtering based on port ranges.
>>
>> Example:
>> 1. Match on a port range:
>> -
>> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>>  action drop
>>
>> $ tc -s filter show dev enp4s0 parent ffff:
>> filter protocol ip pref 1 flower chain 0
>> filter protocol ip pref 1 flower chain 0 handle 0x1
>>  eth_type ipv4
>>  ip_proto tcp
>>  dst_port range 20-30
>>  skip_hw
>>  not_in_hw
>>action order 1: gact action drop
>> random type none pass val 0
>> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>>Action statistics:
>>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>>backlog 0b 0p requeues 0
>>
>> 2. Match on IP address and port range:
>> --
>> $ tc filter add dev enp4s0 protocol ip parent ffff:\
>>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>>  skip_hw action drop
>>
>> $ tc -s filter show dev enp4s0 parent ffff:
>> filter protocol ip pref 1 flower chain 0 handle 0x2
>>  eth_type ipv4
>>  ip_proto tcp
>>  dst_ip 192.168.1.1
>>  dst_port range 100-200
>>  skip_hw
>>  not_in_hw
>>action order 1: gact action drop
>> random type none pass val 0
>> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>>Action statistics:
>>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>>backlog 0b 0p requeues 0
>>
>> v2:
>> Addressed Jiri's comment to sync output format with input
>>
>> Signed-off-by: Amritha Nambiar 
>> ---
>> include/uapi/linux/pkt_cls.h |7 ++
>> tc/f_flower.c|  145 
>> +++---
>> 2 files changed, 142 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 401d0c1..b63c3cf 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -405,6 +405,11 @@ enum {
>>  TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>>  TCA_FLOWER_KEY_UDP_DST, /* be16 */
>>
>> +TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>> +
>>  TCA_FLOWER_FLAGS,
>>  TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>>  TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>> @@ -518,6 +523,8 @@ enum {
>>  TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
>> };
>>
>> +#define TCA_FLOWER_MASK_FLAGS_RANGE (1 << 0) /* Range-based match */
>> +
>> /* Match-all classifier */
>>
>> enum {
>> diff --git a/tc/f_flower.c b/tc/f_flower.c
>> index 65fca04..7724a1d 100644
>> --- a/tc/f_flower.c
>> +++ b/tc/f_flower.c
>> @@ -494,6 +494,66 @@ static int flower_parse_port(char *str, __u8 ip_proto,
>>  return 0;
>> }
>>
>> +static int flower_port_range_attr_type(__u8 ip_proto, enum flower_endpoint 
>> type,
>> +   __be16 *min_port_type,
>> +   __be16 *max_port_type)
>> +{
>> +if (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP ||
>> +ip_proto == IPPROTO_SCTP) {
>> +if (type == FLOWER_ENDPOINT_SRC) {
>> +*min_port_type = TCA_FLOWER_KEY_PORT_SRC_MIN;
>> +*max_port_type = TCA_FLOWER_KEY_PORT_SRC_MAX;
>> +} else {
>> +*min_port_type = TCA_FLOWER_KEY_PORT_DST_MIN;
>> +*max_port_type = TCA_FLOWER_KEY_PORT_DST_MAX;
>> +}
>> +} else {
>> +return -1;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +static int flower_parse_port_range(__be16 *min, __be16 *max, __u8 ip_proto,
>> +   enum flower_endpoint endpoint,
>> +   struct nlmsghdr *n)
>> +{
>> +__be16 min_port_type, max_port_type;
>> +
>> +flower_port_range_attr_type(ip_proto, endpoint, &min_port_type,
>> +&max_port_type);
>> +addattr16(n, MAX_MSG, min_port_type, *min);
>> +addattr16(n, MAX_MSG, max_port_type, *max);
>> +
>> +return 0;
>> +}
>> +
>> +static int get_range(__be16 *min, __be16 *max, char *argv)
>> +{
>> +char *r;
>> +
>> +r = strchr(argv, '-');
>> +if (r) {
>> +*r = '\0';
>> +if (get_be16(min, argv, 10)) {
>> +fprintf(stderr, "invalid min range\n");
>> +return -1;
>> +}
>> +if (get_be16(max, r + 1, 10)) {
>> +fprintf(stderr, "invalid max range\n");
>> +return -1;
>> +}
>> +if (htons(*max) <= htons(*min)) {
>> +fprintf(stderr, "max value should be greater than min 
>> value\n");
>> +return -1;
>> +  

[PATCH] infiniband: nes: Fix more direct skb list accesses.

2018-11-09 Thread David Miller


The following:

skb = skb->next;
...
if (skb == (struct sk_buff *)queue)

is transformed into:

skb = skb_peek_next(skb, queue);
...
if (!skb)
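
For context, the equivalent open-coded walk of an skb queue using the
peek helpers looks roughly like this (illustrative only; process() is a
placeholder):

	struct sk_buff *skb;

	/* skb_peek()/skb_peek_next() return NULL at the end of the list,
	 * so there is no need to compare against the queue head cast to
	 * an sk_buff pointer. */
	for (skb = skb_peek(queue); skb; skb = skb_peek_next(skb, queue))
		process(skb);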

Signed-off-by: David S. Miller 
---
 drivers/infiniband/hw/nes/nes_mgt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_mgt.c 
b/drivers/infiniband/hw/nes/nes_mgt.c
index e9661c3a..fc0c191014e9 100644
--- a/drivers/infiniband/hw/nes/nes_mgt.c
+++ b/drivers/infiniband/hw/nes/nes_mgt.c
@@ -223,11 +223,11 @@ static struct sk_buff *nes_get_next_skb(struct nes_device 
*nesdev, struct nes_qp
}
 
old_skb = skb;
-   skb = skb->next;
+   skb = skb_peek_next(skb, &nesqp->pau_list);
	skb_unlink(old_skb, &nesqp->pau_list);
nes_mgt_free_skb(nesdev, old_skb, PCI_DMA_TODEVICE);
nes_rem_ref_cm_node(nesqp->cm_node);
-   if (skb == (struct sk_buff *)&nesqp->pau_list)
+   if (!skb)
goto out;
}
return skb;
-- 
2.19.1



Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread David Miller
From: Cong Wang 
Date: Fri,  9 Nov 2018 11:43:33 -0800

> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, 
> int len)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>   if (!skb_shared(skb))
>   skb->csum_valid = !sum;

Didn't you move this function into net/core/skbuff.c? :-)

Please respin.


[PATCH net v2] net: sched: cls_flower: validate nested enc_opts_policy to avoid warning

2018-11-09 Thread Jakub Kicinski
TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
currently contain further nested attributes, which are parsed by
hand, so the policy is never actually used resulting in a W=1
build warning:

net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used 
[-Wunused-const-variable=]
 enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {

Add the validation anyway to avoid potential bugs when other
attributes are added and to make the attribute structure slightly
more clear.  Validation will also set extack to point to the bad
attribute on error.

Signed-off-by: Jakub Kicinski 
Acked-by: Simon Horman 
---
 net/sched/cls_flower.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9aada2d0ef06..c6c327874abc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -709,11 +709,23 @@ static int fl_set_enc_opt(struct nlattr **tb, struct 
fl_flow_key *key,
  struct netlink_ext_ack *extack)
 {
const struct nlattr *nla_enc_key, *nla_opt_key, *nla_opt_msk = NULL;
-   int option_len, key_depth, msk_depth = 0;
+   int err, option_len, key_depth, msk_depth = 0;
+
+   err = nla_validate_nested(tb[TCA_FLOWER_KEY_ENC_OPTS],
+ TCA_FLOWER_KEY_ENC_OPTS_MAX,
+ enc_opts_policy, extack);
+   if (err)
+   return err;
 
nla_enc_key = nla_data(tb[TCA_FLOWER_KEY_ENC_OPTS]);
 
if (tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]) {
+   err = nla_validate_nested(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK],
+ TCA_FLOWER_KEY_ENC_OPTS_MAX,
+ enc_opts_policy, extack);
+   if (err)
+   return err;
+
nla_opt_msk = nla_data(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
msk_depth = nla_len(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
}
-- 
2.17.1



Re: [PATCH v2 net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-09 Thread David Miller
From: Heiner Kallweit 
Date: Fri, 9 Nov 2018 18:35:52 +0100

> As a heritage from the very early days of phylib, member interrupts is
> defined as u32 even though it's just a flag indicating whether interrupts are
> enabled. So we can change it to a bitfield member. In addition change
> the code dealing with this member in a way that it's clear we're
> dealing with a bool value.
> 
> Signed-off-by: Heiner Kallweit 
> ---
> v2:
> - use false/true instead of 0/1 for the constants

Applied.


[PATCH net-next 1/6] net: sched: register callbacks for indirect tc block binds

2018-11-09 Thread Jakub Kicinski
From: John Hurley 

Currently drivers can register to receive TC block bind/unbind callbacks
by implementing the setup_tc ndo in any of their given netdevs. However,
drivers may also be interested in binds to higher level devices (e.g.
tunnel drivers) to potentially offload filters applied to them.

Introduce indirect block devs, which allow drivers to register callbacks
for block binds on other devices. The callback is triggered when the
device is bound to a block, allowing the driver to register for rules
applied to that block using already available functions.

Freeing an indirect block callback will trigger an unbind event (if
necessary) to direct the driver to remove any offloaded rules and
unregister any block rule callbacks. It is the responsibility of the
implementing driver to clean up any registered indirect block callbacks
before exiting, if the block is still active at such a time.

Allow registering an indirect block dev callback for a device that is
already bound to a block. In this case (if it is an ingress block),
register and also trigger the callback meaning that any already installed
rules can be replayed to the calling driver.

Signed-off-by: John Hurley 
Signed-off-by: Jakub Kicinski 
---
 include/net/pkt_cls.h |  34 +
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c   | 256 +-
 3 files changed, 292 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 00f71644fbcd..f6c0cd29dea4 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -81,6 +81,14 @@ void __tcf_block_cb_unregister(struct tcf_block *block,
   struct tcf_block_cb *block_cb);
 void tcf_block_cb_unregister(struct tcf_block *block,
 tc_setup_cb_t *cb, void *cb_ident);
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb, void *cb_ident);
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident);
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb, void *cb_ident);
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb, void *cb_ident);
 
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 struct tcf_result *res, bool compat_mode);
@@ -183,6 +191,32 @@ void tcf_block_cb_unregister(struct tcf_block *block,
 {
 }
 
+static inline
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+   return 0;
+}
+
+static inline
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+   return 0;
+}
+
+static inline
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+}
+
+static inline
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+}
+
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
   struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a8dd1fc141b6..9481f2c142e2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -24,6 +24,9 @@ struct bpf_flow_keys;
 typedef int tc_setup_cb_t(enum tc_setup_type type,
  void *type_data, void *cb_priv);
 
+typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+   enum tc_setup_type type, void *type_data);
+
 struct qdisc_rate_table {
struct tc_ratespec rate;
u32 data[256];
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f427a1e00e7e..d92f44ac4c39 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -365,6 +366,245 @@ static void tcf_chain_flush(struct tcf_chain *chain)
}
 }
 
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+   const struct Qdisc_class_ops *cops;
+   struct Qdisc *qdisc;
+
+   if (!dev_ingress_queue(dev))
+   return NULL;
+
+   qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+   if (!qdisc)
+   return NULL;
+
+   cops = qdisc->ops->cl_ops;
+   if (!cops)
+   return NULL;
+
+   if (!cops->tcf_block)
+   return NULL;
+
+   return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+static struct rhashtable indr_setup_block_ht;
+
+struct tc_indr_block_dev {
+  

[PATCH net-next 2/6] nfp: flower: allow non repr netdev offload

2018-11-09 Thread Jakub Kicinski
From: John Hurley 

Previously the offload functions in NFP assumed that the ingress (or
egress) netdev passed to them was an nfp repr.

Modify the driver to permit the passing of non repr netdevs as the ingress
device for an offload rule candidate. This may include devices such as
tunnels. The driver should then base its offload decision on a combination
of ingress device and egress port for a rule.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/flower/action.c| 14 +++
 .../net/ethernet/netronome/nfp/flower/main.h  |  3 +-
 .../net/ethernet/netronome/nfp/flower/match.c | 38 ++-
 .../ethernet/netronome/nfp/flower/offload.c   | 33 +---
 4 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
index fbc052d5bb47..2e64fe878da6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -149,11 +149,12 @@ nfp_fl_output(struct nfp_app *app, struct nfp_fl_output 
*output,
/* Set action output parameters. */
output->flags = cpu_to_be16(tmp_flags);
 
-   /* Only offload if egress ports are on the same device as the
-* ingress port.
-*/
-   if (!switchdev_port_same_parent_id(in_dev, out_dev))
-   return -EOPNOTSUPP;
+   if (nfp_netdev_is_nfp_repr(in_dev)) {
+   /* Confirm ingress and egress are on same device. */
+   if (!switchdev_port_same_parent_id(in_dev, out_dev))
+   return -EOPNOTSUPP;
+   }
+
if (!nfp_netdev_is_nfp_repr(out_dev))
return -EOPNOTSUPP;
 
@@ -840,9 +841,8 @@ nfp_flower_loop_action(struct nfp_app *app, const struct 
tc_action *a,
*a_len += sizeof(struct nfp_fl_push_vlan);
} else if (is_tcf_tunnel_set(a)) {
struct ip_tunnel_info *ip_tun = tcf_tunnel_info(a);
-   struct nfp_repr *repr = netdev_priv(netdev);
 
-   *tun_type = nfp_fl_get_tun_from_act_l4_port(repr->app, a);
+   *tun_type = nfp_fl_get_tun_from_act_l4_port(app, a);
if (*tun_type == NFP_FL_TUNNEL_NONE)
return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h 
b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 0f6f1675f6f1..4a2b1a915131 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -222,7 +222,8 @@ void nfp_flower_metadata_cleanup(struct nfp_app *app);
 
 int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
enum tc_setup_type type, void *type_data);
-int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_flow_match(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
  struct nfp_fl_key_ls *key_ls,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c 
b/drivers/net/ethernet/netronome/nfp/flower/match.c
index e54fb6034326..cdf75595f627 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -52,10 +52,13 @@ nfp_flower_compile_port(struct nfp_flower_in_port *frame, 
u32 cmsg_port,
return 0;
}
 
-   if (tun_type)
+   if (tun_type) {
frame->in_port = cpu_to_be32(NFP_FL_PORT_TYPE_TUN | tun_type);
-   else
+   } else {
+   if (!cmsg_port)
+   return -EOPNOTSUPP;
frame->in_port = cpu_to_be32(cmsg_port);
+   }
 
return 0;
 }
@@ -289,17 +292,21 @@ nfp_flower_compile_ipv4_udp_tun(struct 
nfp_flower_ipv4_udp_tun *frame,
}
 }
 
-int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_flow_match(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
  struct nfp_fl_key_ls *key_ls,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow,
  enum nfp_flower_tun_type tun_type)
 {
-   struct nfp_repr *netdev_repr;
+   u32 cmsg_port = 0;
int err;
u8 *ext;
u8 *msk;
 
+   if (nfp_netdev_is_nfp_repr(netdev))
+   cmsg_port = nfp_repr_get_port_id(netdev);
+
memset(nfp_flow->unmasked_data, 0, key_ls->key_size);
memset(nfp_flow->mask_data, 0, key_ls->key_size);
 
@@ -327,15 +334,13 @@ int 

[PATCH net-next 3/6] nfp: flower: increase scope of netdev checking functions

2018-11-09 Thread Jakub Kicinski
From: John Hurley 

Both the actions and tunnel_conf files contain local functions that check
the type of an input netdev. In preparation for re-use with tunnel offload
via indirect blocks, move these to static inline functions in a header
file.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/flower/action.c| 14 --
 .../net/ethernet/netronome/nfp/flower/cmsg.h  | 27 +++
 .../netronome/nfp/flower/tunnel_conf.c| 19 ++---
 3 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c 
b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 2e64fe878da6..8d54b36afee8 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -2,7 +2,6 @@
 /* Copyright (C) 2017-2018 Netronome Systems, Inc. */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -11,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "cmsg.h"
 #include "main.h"
@@ -92,18 +90,6 @@ nfp_fl_pre_lag(struct nfp_app *app, const struct tc_action 
*action,
return act_size;
 }
 
-static bool nfp_fl_netdev_is_tunnel_type(struct net_device *out_dev,
-enum nfp_flower_tun_type tun_type)
-{
-   if (netif_is_vxlan(out_dev))
-   return tun_type == NFP_FL_TUNNEL_VXLAN;
-
-   if (netif_is_geneve(out_dev))
-   return tun_type == NFP_FL_TUNNEL_GENEVE;
-
-   return false;
-}
-
 static int
 nfp_fl_output(struct nfp_app *app, struct nfp_fl_output *output,
  const struct tc_action *action, struct nfp_fl_payload *nfp_flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h 
b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 3e391555e191..15f41cfef9f1 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../nfp_app.h"
 #include "../nfpcore/nfp_cpp.h"
@@ -499,6 +500,32 @@ static inline int nfp_flower_cmsg_get_data_len(struct 
sk_buff *skb)
return skb->len - NFP_FLOWER_CMSG_HLEN;
 }
 
+static inline bool
+nfp_fl_netdev_is_tunnel_type(struct net_device *netdev,
+enum nfp_flower_tun_type tun_type)
+{
+   if (netif_is_vxlan(netdev))
+   return tun_type == NFP_FL_TUNNEL_VXLAN;
+   if (netif_is_geneve(netdev))
+   return tun_type == NFP_FL_TUNNEL_GENEVE;
+
+   return false;
+}
+
+static inline bool nfp_fl_is_netdev_to_offload(struct net_device *netdev)
+{
+   if (!netdev->rtnl_link_ops)
+   return false;
+   if (!strcmp(netdev->rtnl_link_ops->kind, "openvswitch"))
+   return true;
+   if (netif_is_vxlan(netdev))
+   return true;
+   if (netif_is_geneve(netdev))
+   return true;
+
+   return false;
+}
+
 struct sk_buff *
 nfp_flower_cmsg_mac_repr_start(struct nfp_app *app, unsigned int num_ports);
 void
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c 
b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 5d641d7dabff..2d9f26a725c2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -4,7 +4,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -182,20 +181,6 @@ void nfp_tunnel_keep_alive(struct nfp_app *app, struct 
sk_buff *skb)
}
 }
 
-static bool nfp_tun_is_netdev_to_offload(struct net_device *netdev)
-{
-   if (!netdev->rtnl_link_ops)
-   return false;
-   if (!strcmp(netdev->rtnl_link_ops->kind, "openvswitch"))
-   return true;
-   if (netif_is_vxlan(netdev))
-   return true;
-   if (netif_is_geneve(netdev))
-   return true;
-
-   return false;
-}
-
 static int
 nfp_flower_xmit_tun_conf(struct nfp_app *app, u8 mtype, u16 plen, void *pdata,
 gfp_t flag)
@@ -617,7 +602,7 @@ static void nfp_tun_add_to_mac_offload_list(struct 
net_device *netdev,
 
if (nfp_netdev_is_nfp_repr(netdev))
port = nfp_repr_get_port_id(netdev);
-   else if (!nfp_tun_is_netdev_to_offload(netdev))
+   else if (!nfp_fl_is_netdev_to_offload(netdev))
return;
 
entry = kmalloc(sizeof(*entry), GFP_KERNEL);
@@ -660,7 +645,7 @@ int nfp_tunnel_mac_event_handler(struct nfp_app *app,
 {
if (event == NETDEV_DOWN || event == NETDEV_UNREGISTER) {
/* If non-nfp netdev then free its offload index. */
-   if (nfp_tun_is_netdev_to_offload(netdev))
+   if (nfp_fl_is_netdev_to_offload(netdev))
nfp_tun_del_mac_idx(app, netdev->ifindex);
} else if (event == NETDEV_UP || event == NETDEV_CHANGEADDR ||
   event == NETDEV_REGISTER) {

Re: [PATCH net-next] nfp: use the new __netdev_tx_sent_queue() BQL optimisation

2018-11-09 Thread David Miller
From: Jakub Kicinski 
Date: Fri,  9 Nov 2018 18:50:00 -0800

> __netdev_tx_sent_queue() was added in commit e59020abf0f
> ("net: bql: add __netdev_tx_sent_queue()") and allows for
> better GSO performance.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Dirk van der Merwe 
> Reviewed-by: Simon Horman 

Applied.


Re: [PATCH net] net: qualcomm: rmnet: Fix incorrect assignment of real_dev

2018-11-09 Thread David Miller
From: Subash Abhinov Kasiviswanathan 
Date: Fri,  9 Nov 2018 18:56:27 -0700

> A null dereference was observed when a sysctl was being set
> from userspace and rmnet was stuck trying to complete some actions
> in the NETDEV_REGISTER callback. This is because the real_dev is set
> only after the device registration handler completes.
 ...
> Fixes: b752eff5be24 ("net: qualcomm: rmnet: Implement ndo_get_iflink")
> Signed-off-by: Sean Tranchetti 
> Signed-off-by: Subash Abhinov Kasiviswanathan 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread David Miller
From: Richard Cochran 
Date: Fri, 9 Nov 2018 17:44:31 -0800

> On Fri, Nov 09, 2018 at 03:28:46PM -0800, David Miller wrote:
>> This series looks good to me but I want to give Richard an opportunity to
>> review it first.
> 
> The series is good to go.
> 
> Acked-by: Richard Cochran 

Great, series applied to net-next, thanks everyone.


[PATCH net-next] nfp: use the new __netdev_tx_sent_queue() BQL optimisation

2018-11-09 Thread Jakub Kicinski
__netdev_tx_sent_queue() was added in commit e59020abf0f
("net: bql: add __netdev_tx_sent_queue()") and allows for
better GSO performance.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b55c91818a67..9aa6265bf4de 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -890,8 +890,6 @@ static int nfp_net_tx(struct sk_buff *skb, struct 
net_device *netdev)
		u64_stats_update_end(&r_vec->tx_sync);
}
 
-   netdev_tx_sent_queue(nd_q, txbuf->real_len);
-
skb_tx_timestamp(skb);
 
tx_ring->wr_p += nr_frags + 1;
@@ -899,7 +897,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct 
net_device *netdev)
nfp_net_tx_ring_stop(nd_q, tx_ring);
 
tx_ring->wr_ptr_add += nr_frags + 1;
-   if (!skb->xmit_more || netif_xmit_stopped(nd_q))
+   if (__netdev_tx_sent_queue(nd_q, txbuf->real_len, skb->xmit_more))
nfp_net_tx_xmit_more_flush(tx_ring);
 
return NETDEV_TX_OK;
-- 
2.17.1



Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Yunsheng Lin
On 2018/11/10 10:09, Cong Wang wrote:
> On Fri, Nov 9, 2018 at 6:02 PM Yunsheng Lin  wrote:
>>
>> On 2018/11/10 9:42, Cong Wang wrote:
>>> On Fri, Nov 9, 2018 at 5:39 PM Yunsheng Lin  wrote:

 On 2018/11/10 3:43, Cong Wang wrote:
> Currently netdev_rx_csum_fault() only shows a device name,
> we need more information about the skb for debugging.
>
> Sample output:
>
>  ens3: hw csum failure
>  dev features: 0x00014b89
>  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, 
> csum_complete_sw=0, csum_valid=0
>
> Signed-off-by: Cong Wang 
> ---
>  include/linux/netdevice.h |  5 +++--
>  net/core/datagram.c   |  6 +++---
>  net/core/dev.c| 10 --
>  net/sunrpc/socklib.c  |  2 +-
>  4 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 857f8abf7b91..fabcd9fa6cf7 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -4332,9 +4332,10 @@ static inline bool 
> can_checksum_protocol(netdev_features_t features,
>  }
>
>  #ifdef CONFIG_BUG
> -void netdev_rx_csum_fault(struct net_device *dev);
> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
>  #else
> -static inline void netdev_rx_csum_fault(struct net_device *dev)
> +static inline void netdev_rx_csum_fault(struct net_device *dev,
> + struct sk_buff *skb)
>  {
>  }
>  #endif
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff 
> *skb, int len)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>   if (!skb_shared(skb))
>   skb->csum_valid = !sum;
> @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>
>   if (!skb_shared(skb)) {
> @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff 
> *skb,
>
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(NULL);
> + netdev_rx_csum_fault(NULL, skb);
>   }
>   return 0;
>  fault:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0ffcbdd55fa9..2b337df26117 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
>
>  /* Take action when hardware reception checksum errors are detected. */
>  #ifdef CONFIG_BUG
> -void netdev_rx_csum_fault(struct net_device *dev)
> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
>  {
>   if (net_ratelimit()) {
>   pr_err("%s: hw csum failure\n", dev ? dev->name : 
> "");
> + if (dev)
> + pr_err("dev features: %pNF\n", &dev->features);
> + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d 
> ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
> +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
> +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
> +skb->csum_complete_sw, skb->csum_valid);


 This function also have the netdev available, use netdev_err to log the 
 error?
>>>
>>> It is apparently not me who picked pr_err() from the beginning,
>>> I just follow that pr_err(). If you are not happy with it, please send
>>> a followup.
>>
>> Yes, but perhaps it is something to improve.
> 
> 
> Sure, no one stops you from improving it in a followup patch. :)
> 
> 
>> When using the netdev, then maybe it does not have to check if dev is null, 
>> because
>> netdev_err has handled the netdev being NULL case.
>> Maybe I missed something that netdev can not be used here?
>> If not, maybe I can send a followup.
>>
> 
> Maybe. Again, my patch intends to add a few debugging logs,
> not to convert pr_err() to whatever else, they are totally different
> goals. I choose pr_err() only because I follow the existing one,
> not to say which one is better than the other.

Ok. :)

> 
> 

Re: [PATCH] add an initial version of snmp_counter.rst

2018-11-09 Thread Cong Wang
(Cc Randy)

On Fri, Nov 9, 2018 at 10:13 AM yupeng  wrote:
>
> The snmp_counter.rst runs a set of simple experiments and explains the
> meaning of snmp counters based on the experiments' results. This is
> an initial version; it only covers a small part of the snmp counters.


I didn't look into much detail, so just a few high-level comments:

1. Please try to group those counters by protocol, it would be easier
to search.

2. For many counters you provide a link to RFC, do you just copy
and paste them? Please try to expand.

3. _I think_ you don't need to show, for example, how to run a ping
command. It's safe to assume readers already know this. Therefore,
just explaining those counters is okay.


Thanks.

>
> Signed-off-by: yupeng 
> ---
>  Documentation/networking/index.rst|   1 +
>  Documentation/networking/snmp_counter.rst | 963 ++
>  2 files changed, 964 insertions(+)
>  create mode 100644 Documentation/networking/snmp_counter.rst
>
> diff --git a/Documentation/networking/index.rst 
> b/Documentation/networking/index.rst
> index bd89dae8d578..6a47629ef8ed 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -31,6 +31,7 @@ Contents:
> net_failover
> alias
> bridge
> +   snmp_counter
>
>  .. only::  subproject
>
> diff --git a/Documentation/networking/snmp_counter.rst 
> b/Documentation/networking/snmp_counter.rst
> new file mode 100644
> index ..2939c5acf675
> --- /dev/null
> +++ b/Documentation/networking/snmp_counter.rst
> @@ -0,0 +1,963 @@
> +
> +snmp counter tutorial
> +
> +
> +This document explains the meaning of snmp counters. For understanding
> +their meanings better, this document doesn't explain the counters one
> +by one, but creates a set of experiments, and explains the counters
> +based on the experiments' results. The experiments are on one or two
> +virtual machines. Except for the test commands we use in the experiments,
> +the virtual machines have no other network traffic. We use the 'nstat'
> +command to get the values of snmp counters. Before every test, we run
> +'nstat -n' to update the history, so the 'nstat' output will only
> +show the changes of the snmp counters. For more information about
> +nstat, please refer to:
> +
> +http://man7.org/linux/man-pages/man8/nstat.8.html
> +
> +icmp ping
> +
> +
> +Run the ping command against the public dns server 8.8.8.8::
> +
> +  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
> +  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
> +  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms
> +
> +  --- 8.8.8.8 ping statistics ---
> +  1 packets transmitted, 1 received, 0% packet loss, time 0ms
> +  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms
> +
> +The nstat result::
> +
> +  nstatuser@nstat-a:~$ nstat
> +  #kernel
> +  IpInReceives1  0.0
> +  IpInDelivers1  0.0
> +  IpOutRequests   1  0.0
> +  IcmpInMsgs  1  0.0
> +  IcmpInEchoReps  1  0.0
> +  IcmpOutMsgs 1  0.0
> +  IcmpOutEchos1  0.0
> +  IcmpMsgInType0  1  0.0
> +  IcmpMsgOutType8 1  0.0
> +  IpExtInOctets   84 0.0
> +  IpExtOutOctets  84 0.0
> +  IpExtInNoECTPkts1  0.0
> +
> +The nstat output could be divided into two parts: one with the 'Ext'
> +keyword, another without the 'Ext' keyword. If the counter name
> +doesn't have 'Ext', it is defined by one of the snmp RFCs; if it has 'Ext',
> +it is a kernel extension counter. Below we explain them one by one.
> +
> +The rfc defined counters
> +--
> +
> +* IpInReceives
> +The total number of input datagrams received from interfaces,
> +including those received in error.
> +
> +https://tools.ietf.org/html/rfc1213#page-26
> +
> +* IpInDelivers
> +The total number of input datagrams successfully delivered to IP
> +user-protocols (including ICMP).
> +
> +https://tools.ietf.org/html/rfc1213#page-28
> +
> +* IpOutRequests
> +The total number of IP datagrams which local IP user-protocols
> +(including ICMP) supplied to IP in requests for transmission.  Note
> +that this counter does not include any datagrams counted in
> +ipForwDatagrams.
> +
> +https://tools.ietf.org/html/rfc1213#page-28
> +
> +* IcmpInMsgs
> +The total number of ICMP messages which the entity received.  Note
> +that this counter includes all those counted by icmpInErrors.
> +
> +https://tools.ietf.org/html/rfc1213#page-41
> +
> +* IcmpInEchoReps
> +The number of ICMP Echo Reply messages received.
> +
> +https://tools.ietf.org/html/rfc1213#page-42
> +
> +* IcmpOutMsgs
> +The total number of ICMP messages which this entity attempted 

Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Cong Wang
On Fri, Nov 9, 2018 at 6:02 PM Yunsheng Lin  wrote:
>
> On 2018/11/10 9:42, Cong Wang wrote:
> > On Fri, Nov 9, 2018 at 5:39 PM Yunsheng Lin  wrote:
> >>
> >> On 2018/11/10 3:43, Cong Wang wrote:
> >>> Currently netdev_rx_csum_fault() only shows a device name,
> >>> we need more information about the skb for debugging.
> >>>
> >>> Sample output:
> >>>
> >>>  ens3: hw csum failure
> >>>  dev features: 0x00014b89
> >>>  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, 
> >>> csum_complete_sw=0, csum_valid=0
> >>>
> >>> Signed-off-by: Cong Wang 
> >>> ---
> >>>  include/linux/netdevice.h |  5 +++--
> >>>  net/core/datagram.c   |  6 +++---
> >>>  net/core/dev.c| 10 --
> >>>  net/sunrpc/socklib.c  |  2 +-
> >>>  4 files changed, 15 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>> index 857f8abf7b91..fabcd9fa6cf7 100644
> >>> --- a/include/linux/netdevice.h
> >>> +++ b/include/linux/netdevice.h
> >>> @@ -4332,9 +4332,10 @@ static inline bool 
> >>> can_checksum_protocol(netdev_features_t features,
> >>>  }
> >>>
> >>>  #ifdef CONFIG_BUG
> >>> -void netdev_rx_csum_fault(struct net_device *dev);
> >>> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
> >>>  #else
> >>> -static inline void netdev_rx_csum_fault(struct net_device *dev)
> >>> +static inline void netdev_rx_csum_fault(struct net_device *dev,
> >>> + struct sk_buff *skb)
> >>>  {
> >>>  }
> >>>  #endif
> >>> diff --git a/net/core/datagram.c b/net/core/datagram.c
> >>> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> >>> --- a/net/core/datagram.c
> >>> +++ b/net/core/datagram.c
> >>> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff 
> >>> *skb, int len)
> >>>   if (likely(!sum)) {
> >>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >>>   !skb->csum_complete_sw)
> >>> - netdev_rx_csum_fault(skb->dev);
> >>> + netdev_rx_csum_fault(skb->dev, skb);
> >>>   }
> >>>   if (!skb_shared(skb))
> >>>   skb->csum_valid = !sum;
> >>> @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
> >>>   if (likely(!sum)) {
> >>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >>>   !skb->csum_complete_sw)
> >>> - netdev_rx_csum_fault(skb->dev);
> >>> + netdev_rx_csum_fault(skb->dev, skb);
> >>>   }
> >>>
> >>>   if (!skb_shared(skb)) {
> >>> @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff 
> >>> *skb,
> >>>
> >>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >>>   !skb->csum_complete_sw)
> >>> - netdev_rx_csum_fault(NULL);
> >>> + netdev_rx_csum_fault(NULL, skb);
> >>>   }
> >>>   return 0;
> >>>  fault:
> >>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>> index 0ffcbdd55fa9..2b337df26117 100644
> >>> --- a/net/core/dev.c
> >>> +++ b/net/core/dev.c
> >>> @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
> >>>
> >>>  /* Take action when hardware reception checksum errors are detected. */
> >>>  #ifdef CONFIG_BUG
> >>> -void netdev_rx_csum_fault(struct net_device *dev)
> >>> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
> >>>  {
> >>>   if (net_ratelimit()) {
> >>>   pr_err("%s: hw csum failure\n", dev ? dev->name : 
> >>> "");
> >>> + if (dev)
> >>> + pr_err("dev features: %pNF\n", &dev->features);
> >>> + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d 
> >>> ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
> >>> +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
> >>> +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
> >>> +skb->csum_complete_sw, skb->csum_valid);
> >>
> >>
> >> This function also have the netdev available, use netdev_err to log the 
> >> error?
> >
> > It is apparently not me who picked pr_err() from the beginning,
> > I just follow that pr_err(). If you are not happy with it, please send
> > a followup.
>
> Yes, but perhaps it is something to improve.


Sure, no one stops you from improving it in a followup patch. :)


> When using the netdev, then maybe it does not have to check if dev is null, 
> because
> netdev_err has handled the netdev being NULL case.
> Maybe I missed something that netdev can not be used here?
> If not, maybe I can send a followup.
>

Maybe. Again, my patch intends to add a few debugging logs,
not to convert pr_err() to whatever else, they are totally different
goals. I choose pr_err() only because I follow the existing one,
not to say which one is better than the other.

Thanks.


Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Yunsheng Lin
On 2018/11/10 9:42, Cong Wang wrote:
> On Fri, Nov 9, 2018 at 5:39 PM Yunsheng Lin  wrote:
>>
>> On 2018/11/10 3:43, Cong Wang wrote:
>>> Currently netdev_rx_csum_fault() only shows a device name,
>>> we need more information about the skb for debugging.
>>>
>>> Sample output:
>>>
>>>  ens3: hw csum failure
>>>  dev features: 0x00014b89
>>>  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, 
>>> csum_complete_sw=0, csum_valid=0
>>>
>>> Signed-off-by: Cong Wang 
>>> ---
>>>  include/linux/netdevice.h |  5 +++--
>>>  net/core/datagram.c   |  6 +++---
>>>  net/core/dev.c| 10 --
>>>  net/sunrpc/socklib.c  |  2 +-
>>>  4 files changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 857f8abf7b91..fabcd9fa6cf7 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -4332,9 +4332,10 @@ static inline bool 
>>> can_checksum_protocol(netdev_features_t features,
>>>  }
>>>
>>>  #ifdef CONFIG_BUG
>>> -void netdev_rx_csum_fault(struct net_device *dev);
>>> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
>>>  #else
>>> -static inline void netdev_rx_csum_fault(struct net_device *dev)
>>> +static inline void netdev_rx_csum_fault(struct net_device *dev,
>>> + struct sk_buff *skb)
>>>  {
>>>  }
>>>  #endif
>>> diff --git a/net/core/datagram.c b/net/core/datagram.c
>>> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
>>> --- a/net/core/datagram.c
>>> +++ b/net/core/datagram.c
>>> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff 
>>> *skb, int len)
>>>   if (likely(!sum)) {
>>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>   !skb->csum_complete_sw)
>>> - netdev_rx_csum_fault(skb->dev);
>>> + netdev_rx_csum_fault(skb->dev, skb);
>>>   }
>>>   if (!skb_shared(skb))
>>>   skb->csum_valid = !sum;
>>> @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
>>>   if (likely(!sum)) {
>>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>   !skb->csum_complete_sw)
>>> - netdev_rx_csum_fault(skb->dev);
>>> + netdev_rx_csum_fault(skb->dev, skb);
>>>   }
>>>
>>>   if (!skb_shared(skb)) {
>>> @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>>>
>>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>   !skb->csum_complete_sw)
>>> - netdev_rx_csum_fault(NULL);
>>> + netdev_rx_csum_fault(NULL, skb);
>>>   }
>>>   return 0;
>>>  fault:
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 0ffcbdd55fa9..2b337df26117 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
>>>
>>>  /* Take action when hardware reception checksum errors are detected. */
>>>  #ifdef CONFIG_BUG
>>> -void netdev_rx_csum_fault(struct net_device *dev)
>>> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
>>>  {
>>>   if (net_ratelimit()) {
>>>   pr_err("%s: hw csum failure\n", dev ? dev->name : 
>>> "");
>>> + if (dev)
> >>> + pr_err("dev features: %pNF\n", &dev->features);
>>> + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d 
>>> ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
>>> +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
>>> +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
>>> +skb->csum_complete_sw, skb->csum_valid);
>>
>>
>> This function also have the netdev available, use netdev_err to log the 
>> error?
> 
> It is apparently not me who picked pr_err() from the beginning,
> I just follow that pr_err(). If you are not happy with it, please send
> a followup.

Yes, but perhaps it is something to improve.
When using the netdev, then maybe it does not have to check if dev is null, 
because
netdev_err has handled the netdev being NULL case.
Maybe I missed something that netdev can not be used here?
If not, maybe I can send a followup.

> 
> 
>>
>> Also, dev->features was dumped before this patch, why remove it?
> 
> Seriously? Where do I remove it? Please be specific. :)

Sorry, I missed that, I thought it was removed when adding the new log.

> 
> .
> 



Re: [PATCH v4] Wait for running BPF programs when updating map-in-map

2018-11-09 Thread Chenbo Feng
Hi netdev,

Could we queue up this patch to stable 4.14 and stable 4.19? I can
provide a backport patch if needed. I checked that it is a clean
cherry-pick for 4.19 but has some minor conflicts for 4.14.

Thanks
Chenbo Feng
On Thu, Oct 18, 2018 at 4:36 PM Joel Fernandes  wrote:
>
> On Thu, Oct 18, 2018 at 08:46:59AM -0700, Alexei Starovoitov wrote:
> > On Tue, Oct 16, 2018 at 10:39:57AM -0700, Joel Fernandes wrote:
> > > On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov
> > >  wrote:
> > > > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
> > > >> The map-in-map frequently serves as a mechanism for atomic
> > > >> snapshotting of state that a BPF program might record.  The current
> > > >> implementation is dangerous to use in this way, however, since
> > > >> userspace has no way of knowing when all programs that might have
> > > >> retrieved the "old" value of the map may have completed.
> > > >>
> > > >> This change ensures that map update operations on map-in-map map types
> > > >> always wait for all references to the old map to drop before returning
> > > >> to userspace.
> > > >>
> > > >> Signed-off-by: Daniel Colascione 
> > > >> ---
> > > >>  kernel/bpf/syscall.c | 14 ++
> > > >>  1 file changed, 14 insertions(+)
> > > >>
> > > >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > >> index 8339d81cba1d..d7c16ae1e85a 100644
> > > >> --- a/kernel/bpf/syscall.c
> > > >> +++ b/kernel/bpf/syscall.c
> > > >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
> > > >>   return err;
> > > >>  }
> > > >>
> > > >> +static void maybe_wait_bpf_programs(struct bpf_map *map)
> > > >> +{
> > > >> + /* Wait for any running BPF programs to complete so that
> > > >> +  * userspace, when we return to it, knows that all programs
> > > >> +  * that could be running use the new map value.
> > > >> +  */
> > > >> + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
> > > >> + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> > > >> + synchronize_rcu();
> > > >> + }
> > > >
> > > > extra {} were not necessary. I removed them while applying to bpf-next.
> > > > Please run checkpatch.pl next time.
> > > > Thanks
> > >
> > > Thanks Alexei for taking it. Me and Lorenzo were discussing that not
> > > having this causes incorrect behavior for apps using map-in-map for
> > > this. So I CC'd stable as well.
> >
> > It is too late in the release cycle.
> > We can submit it to stable releases after the merge window.
> >
>
> Sounds good, thanks.
>
> - Joel
>
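
For readers following the thread: the atomic snapshot pattern being discussed
usually looks roughly like the sketch below from userspace (illustration only,
assuming libbpf's bpf_map_update_elem() and an already-created outer
BPF_MAP_TYPE_ARRAY_OF_MAPS map; all names and fds here are hypothetical).
With the synchronize_rcu() added by this patch, the update call returning
means no BPF program can still be holding the old inner map.

#include <bpf/bpf.h>
#include <stdio.h>

/* Swap the inner map in slot 0 of the outer map, then read the old one. */
static int swap_and_snapshot(int outer_fd, int old_inner_fd, int new_inner_fd)
{
	__u32 key = 0;

	/* Point the outer slot at the new inner map. */
	if (bpf_map_update_elem(outer_fd, &key, &new_inner_fd, BPF_ANY))
		return -1;

	/* After this patch, reaching here guarantees all in-flight programs
	 * that looked up the old inner map have finished, so its contents
	 * can be read as a consistent snapshot.
	 */
	printf("inner map fd %d quiesced, reading snapshot\n", old_inner_fd);
	return 0;
}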


[PATCH net] net: qualcomm: rmnet: Fix incorrect assignment of real_dev

2018-11-09 Thread Subash Abhinov Kasiviswanathan
A null dereference was observed when a sysctl was being set
from userspace and rmnet was stuck trying to complete some actions
in the NETDEV_REGISTER callback. This is because the real_dev is set
only after the device registration handler completes.

sysctl call stack -

<6> Unable to handle kernel NULL pointer dereference at
virtual address 0108
<2> pc : rmnet_vnd_get_iflink+0x1c/0x28
<2> lr : dev_get_iflink+0x2c/0x40
<2>  rmnet_vnd_get_iflink+0x1c/0x28
<2>  inet6_fill_ifinfo+0x15c/0x234
<2>  inet6_ifinfo_notify+0x68/0xd4
<2>  ndisc_ifinfo_sysctl_change+0x1b8/0x234
<2>  proc_sys_call_handler+0xac/0x100
<2>  proc_sys_write+0x3c/0x4c
<2>  __vfs_write+0x54/0x14c
<2>  vfs_write+0xcc/0x188
<2>  SyS_write+0x60/0xc0
<2>  el0_svc_naked+0x34/0x38

device register call stack -

<2>  notifier_call_chain+0x84/0xbc
<2>  raw_notifier_call_chain+0x38/0x48
<2>  call_netdevice_notifiers_info+0x40/0x70
<2>  call_netdevice_notifiers+0x38/0x60
<2>  register_netdevice+0x29c/0x3d8
<2>  rmnet_vnd_newlink+0x68/0xe8
<2>  rmnet_newlink+0xa0/0x160
<2>  rtnl_newlink+0x57c/0x6c8
<2>  rtnetlink_rcv_msg+0x1dc/0x328
<2>  netlink_rcv_skb+0xac/0x118
<2>  rtnetlink_rcv+0x24/0x30
<2>  netlink_unicast+0x158/0x1f0
<2>  netlink_sendmsg+0x32c/0x338
<2>  sock_sendmsg+0x44/0x60
<2>  SyS_sendto+0x150/0x1ac
<2>  el0_svc_naked+0x34/0x38

Fixes: b752eff5be24 ("net: qualcomm: rmnet: Implement ndo_get_iflink")
Signed-off-by: Sean Tranchetti 
Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c 
b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 0afc3d3..d11c16a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -234,7 +234,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
  struct net_device *real_dev,
  struct rmnet_endpoint *ep)
 {
-   struct rmnet_priv *priv;
+   struct rmnet_priv *priv = netdev_priv(rmnet_dev);
int rc;
 
if (ep->egress_dev)
@@ -247,6 +247,8 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
rmnet_dev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
rmnet_dev->hw_features |= NETIF_F_SG;
 
+   priv->real_dev = real_dev;
+
rc = register_netdevice(rmnet_dev);
if (!rc) {
ep->egress_dev = rmnet_dev;
@@ -255,9 +257,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
 
rmnet_dev->rtnl_link_ops = &rmnet_link_ops;
 
-   priv = netdev_priv(rmnet_dev);
priv->mux_id = id;
-   priv->real_dev = real_dev;
 
netdev_dbg(rmnet_dev, "rmnet dev created\n");
}
-- 
1.9.1



Re: [PATCH net-next 5/8] e1000e: extend PTP gettime function to read system clock

2018-11-09 Thread Jeff Kirsher
On Fri, 2018-11-09 at 11:14 +0100, Miroslav Lichvar wrote:
> This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
> 
> Cc: Richard Cochran 
> Cc: Jacob Keller 
> Cc: Jeff Kirsher 
> Signed-off-by: Miroslav Lichvar 
> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h  |  3 ++
>  drivers/net/ethernet/intel/e1000e/netdev.c | 42 --
>  drivers/net/ethernet/intel/e1000e/ptp.c| 16 +
>  3 files changed, 45 insertions(+), 16 deletions(-)

Acked-by: Jeff Kirsher 




Re: [PATCH net-next 6/8] igb: extend PTP gettime function to read system clock

2018-11-09 Thread Jeff Kirsher
On Fri, 2018-11-09 at 11:14 +0100, Miroslav Lichvar wrote:
> This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
> 
> Cc: Richard Cochran 
> Cc: Jacob Keller 
> Cc: Jeff Kirsher 
> Signed-off-by: Miroslav Lichvar 
> ---
>  drivers/net/ethernet/intel/igb/igb_ptp.c | 65 
>  1 file changed, 55 insertions(+), 10 deletions(-)

Acked-by: Jeff Kirsher 




Re: [PATCH net-next 7/8] ixgbe: extend PTP gettime function to read system clock

2018-11-09 Thread Jeff Kirsher
On Fri, 2018-11-09 at 11:14 +0100, Miroslav Lichvar wrote:
> This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
> 
> Cc: Richard Cochran 
> Cc: Jacob Keller 
> Cc: Jeff Kirsher 
> Signed-off-by: Miroslav Lichvar 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 54 
>  1 file changed, 44 insertions(+), 10 deletions(-)

Acked-by: Jeff Kirsher 




Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread Richard Cochran
On Fri, Nov 09, 2018 at 03:28:46PM -0800, David Miller wrote:
> This series looks good to me but I want to give Richard an opportunity to
> review it first.

The series is good to go.

Acked-by: Richard Cochran 


Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Cong Wang
On Fri, Nov 9, 2018 at 5:39 PM Yunsheng Lin  wrote:
>
> On 2018/11/10 3:43, Cong Wang wrote:
> > Currently netdev_rx_csum_fault() only shows a device name,
> > we need more information about the skb for debugging.
> >
> > Sample output:
> >
> >  ens3: hw csum failure
> >  dev features: 0x00014b89
> >  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, 
> > csum_complete_sw=0, csum_valid=0
> >
> > Signed-off-by: Cong Wang 
> > ---
> >  include/linux/netdevice.h |  5 +++--
> >  net/core/datagram.c   |  6 +++---
> >  net/core/dev.c| 10 --
> >  net/sunrpc/socklib.c  |  2 +-
> >  4 files changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 857f8abf7b91..fabcd9fa6cf7 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -4332,9 +4332,10 @@ static inline bool 
> > can_checksum_protocol(netdev_features_t features,
> >  }
> >
> >  #ifdef CONFIG_BUG
> > -void netdev_rx_csum_fault(struct net_device *dev);
> > +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
> >  #else
> > -static inline void netdev_rx_csum_fault(struct net_device *dev)
> > +static inline void netdev_rx_csum_fault(struct net_device *dev,
> > + struct sk_buff *skb)
> >  {
> >  }
> >  #endif
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> > --- a/net/core/datagram.c
> > +++ b/net/core/datagram.c
> > @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff 
> > *skb, int len)
> >   if (likely(!sum)) {
> >   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >   !skb->csum_complete_sw)
> > - netdev_rx_csum_fault(skb->dev);
> > + netdev_rx_csum_fault(skb->dev, skb);
> >   }
> >   if (!skb_shared(skb))
> >   skb->csum_valid = !sum;
> > @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
> >   if (likely(!sum)) {
> >   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >   !skb->csum_complete_sw)
> > - netdev_rx_csum_fault(skb->dev);
> > + netdev_rx_csum_fault(skb->dev, skb);
> >   }
> >
> >   if (!skb_shared(skb)) {
> > @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
> >
> >   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >   !skb->csum_complete_sw)
> > - netdev_rx_csum_fault(NULL);
> > + netdev_rx_csum_fault(NULL, skb);
> >   }
> >   return 0;
> >  fault:
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 0ffcbdd55fa9..2b337df26117 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
> >
> >  /* Take action when hardware reception checksum errors are detected. */
> >  #ifdef CONFIG_BUG
> > -void netdev_rx_csum_fault(struct net_device *dev)
> > +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
> >  {
> >   if (net_ratelimit()) {
> >   pr_err("%s: hw csum failure\n", dev ? dev->name : 
> > "");
> > + if (dev)
> > + pr_err("dev features: %pNF\n", &dev->features);
> > + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d 
> > ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
> > +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
> > +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
> > +skb->csum_complete_sw, skb->csum_valid);
>
>
> This function also have the netdev available, use netdev_err to log the error?

It is apparently not me who picked pr_err() from the beginning,
I just follow that pr_err(). If you are not happy with it, please send
a followup.


>
> Also, dev->features was dumped before this patch, why remove it?

Seriously? Where do I remove it? Please be specific. :)


Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Yunsheng Lin
On 2018/11/10 3:43, Cong Wang wrote:
> Currently netdev_rx_csum_fault() only shows a device name,
> we need more information about the skb for debugging.
> 
> Sample output:
> 
>  ens3: hw csum failure
>  dev features: 0x00014b89
>  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, 
> csum_complete_sw=0, csum_valid=0
> 
> Signed-off-by: Cong Wang 
> ---
>  include/linux/netdevice.h |  5 +++--
>  net/core/datagram.c   |  6 +++---
>  net/core/dev.c| 10 --
>  net/sunrpc/socklib.c  |  2 +-
>  4 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 857f8abf7b91..fabcd9fa6cf7 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -4332,9 +4332,10 @@ static inline bool 
> can_checksum_protocol(netdev_features_t features,
>  }
>  
>  #ifdef CONFIG_BUG
> -void netdev_rx_csum_fault(struct net_device *dev);
> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
>  #else
> -static inline void netdev_rx_csum_fault(struct net_device *dev)
> +static inline void netdev_rx_csum_fault(struct net_device *dev,
> + struct sk_buff *skb)
>  {
>  }
>  #endif
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, 
> int len)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>   if (!skb_shared(skb))
>   skb->csum_valid = !sum;
> @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>  
>   if (!skb_shared(skb)) {
> @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>  
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(NULL);
> + netdev_rx_csum_fault(NULL, skb);
>   }
>   return 0;
>  fault:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0ffcbdd55fa9..2b337df26117 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
>  
>  /* Take action when hardware reception checksum errors are detected. */
>  #ifdef CONFIG_BUG
> -void netdev_rx_csum_fault(struct net_device *dev)
> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
>  {
>   if (net_ratelimit()) {
>   pr_err("%s: hw csum failure\n", dev ? dev->name : "");
> + if (dev)
> + pr_err("dev features: %pNF\n", &dev->features);
> + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d 
> ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
> +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
> +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
> +skb->csum_complete_sw, skb->csum_valid);


This function also has the netdev available, use netdev_err to log the error?

Also, dev->features was dumped before this patch, why remove it?


>   dump_stack();
>   }
>  }
> @@ -5779,7 +5785,7 @@ __sum16 __skb_gro_checksum_complete(struct sk_buff *skb)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>  
>   NAPI_GRO_CB(skb)->csum = wsum;
> diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
> index 9062967575c4..7e55cfc69697 100644
> --- a/net/sunrpc/socklib.c
> +++ b/net/sunrpc/socklib.c
> @@ -175,7 +175,7 @@ int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct 
> sk_buff *skb)
>   return -1;
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   return 0;
>  no_checksum:
>   if (xdr_partial_copy_from_skb(xdr, 0, , xdr_skb_read_bits) < 0)
> 



Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread David Miller
From: Jeff Kirsher 
Date: Fri, 09 Nov 2018 15:33:10 -0800

> On Fri, 2018-11-09 at 15:28 -0800, David Miller wrote:
>> From: Miroslav Lichvar 
>> Date: Fri,  9 Nov 2018 11:14:41 +0100
>> 
>> > RFC->v1:
>> > - added new patches
>> > - separated PHC timestamp from ptp_system_timestamp
>> > - fixed memory leak in PTP_SYS_OFFSET_EXTENDED
>> > - changed PTP_SYS_OFFSET_EXTENDED to work with array of arrays
>> > - fixed PTP_SYS_OFFSET_EXTENDED to break correctly from loop
>> > - fixed timecounter updates in drivers
>> > - split gettimex in igb driver
>> > - fixed ptp_read_* functions to be available without
>> >   CONFIG_PTP_1588_CLOCK
>> > 
>> > This series enables a more accurate synchronization between PTP
>> > hardware
>> > clocks and the system clock.
>>  ...
>> 
>> This series looks good to me but I want to give Richard an opportunity to
>> review it first.
> 
> Dave, I also do not want to hold this series up by picking up patches 5, 6
> and 7 (Intel drivers) so please apply the entire series after Richard
> provides his review.

Ok, will do.


[PATCH net] flow_dissector: do not dissect l4 ports for fragments

2018-11-09 Thread Eric Dumazet
From: 배석진 

Only first fragment has the sport/dport information,
not the following ones.

If we want consistent hash for all fragments, we need to
ignore ports even for first fragment.

This bug is visible for IPv6 traffic, if incoming fragments
do not have a flow label, since skb_get_hash() will give
different results for first fragment and following ones.

It is also visible if any routing rule wants dissection
and sport or dport.

See commit 5e5d6fed3741 ("ipv6: route: dissect flow
in input path if fib rules need it") for details.

[edumazet] rewrote the changelog completely.

Fixes: 06635a35d13d ("flow_dissect: use programable dissector in 
skb_flow_dissect and friends")
Signed-off-by: 배석진 
Signed-off-by: Eric Dumazet 
---
 net/core/flow_dissector.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 
676f3ad629f95625422aa55f0f54157001ac477c..588f475019d47c9d6bae8883acebab48aaf63b48
 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1166,8 +1166,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
break;
}
 
-   if (dissector_uses_key(flow_dissector,
-  FLOW_DISSECTOR_KEY_PORTS)) {
+   if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS) &&
+   !(key_control->flags & FLOW_DIS_IS_FRAGMENT)) {
key_ports = skb_flow_dissector_target(flow_dissector,
  FLOW_DISSECTOR_KEY_PORTS,
  target_container);
-- 
2.19.1.930.g4563a0d9d0-goog
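
For context on why first fragments behaved differently: the dissector flags
fragments while parsing the IP header, and before this patch only the
"not first fragment" case stopped before the L4 ports.  A rough sketch of the
IPv4 side (illustration only, not verbatim kernel code):

#include <net/ip.h>
#include <net/flow_dissector.h>

/* Returns true when L4 port parsing should be skipped for this packet. */
static bool sketch_ipv4_frag_check(const struct iphdr *iph,
				   struct flow_dissector_key_control *key_control,
				   unsigned int flags)
{
	if (!ip_is_fragment(iph))
		return false;

	key_control->flags |= FLOW_DIS_IS_FRAGMENT;

	/* Non-first fragment: there is no L4 header to parse at all. */
	if (iph->frag_off & htons(IP_OFFSET))
		return true;

	/* First fragment: ports are present, so dissection used to continue
	 * unless the caller did not ask for it.
	 */
	key_control->flags |= FLOW_DIS_FIRST_FRAG;
	return !(flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG);
}

The fix above therefore skips the ports key whenever FLOW_DIS_IS_FRAGMENT is
set, so first and following fragments hash the same way.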



Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-09 Thread David Ahern
On 11/9/18 9:21 AM, David Ahern wrote:
>> Is it possible to add counters from xdp only for vlans?
>> This will help me in testing.
> I will take a look today at adding counters that you can dump using
> bpftool. It will be a temporary solution for this xdp program only.
> 

Same tree, kernel-tables-wip-02 branch. Compile kernel and install.
Compile samples as before.

If you give the userspace program a -t arg, it loops showing stats.
Ctrl-C to break. The xdp programs are not detached on exit.

Example:

./xdp_fwd -t 5 eth1 eth2 eth3 eth4

15:59:32:        rx        tx   dropped   skipped    l3_dev   fib_dev
index  3:   901158 9011580   18 0  0
index  4:   901159 9011580   20 0 901139
index 10:0  00019 19
index 11:0  000901139 901139
index 15:0  00019 19
index 16:0  000901139  0

Rx and Tx counters are for the physical port.

VLANs show up as l3_dev (ingress) and fib_dev (egress).

dropped is anytime the xdp program returns XDP_DROP (e.g., invalid packet)

skipped is anytime the program returns XDP_PASS (e.g., not ipv4 or ipv6,
local traffic, or needs full stack assist).
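
For anyone wanting to replicate this, per-interface counters in an XDP program
are typically kept in a per-CPU array that bpftool can dump.  The sketch below
is illustrative only (samples/bpf-style helpers, hypothetical map and section
names), not the actual code from the kernel-tables-wip-02 branch:

#include <linux/bpf.h>
#include "bpf_helpers.h"

struct pkt_stats {
	__u64 rx;
	__u64 dropped;
	__u64 skipped;
};

/* Keyed directly by ingress ifindex for simplicity; a real program would
 * bound-check or remap the index.
 */
struct bpf_map_def SEC("maps") xdp_stats_map = {
	.type        = BPF_MAP_TYPE_PERCPU_ARRAY,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(struct pkt_stats),
	.max_entries = 64,
};

SEC("xdp_stats")
int xdp_stats_prog(struct xdp_md *ctx)
{
	__u32 key = ctx->ingress_ifindex;
	struct pkt_stats *stats;

	stats = bpf_map_lookup_elem(&xdp_stats_map, &key);
	if (stats)
		stats->rx++;	/* per-CPU map, so no atomics are needed */

	return XDP_PASS;	/* the real program does the FIB lookup here */
}

char _license[] SEC("license") = "GPL";

The counters can then be read with something like
"bpftool map dump name xdp_stats_map", summing the per-CPU values.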


[PATCH net-next 2/2] net: phy: realtek: use new PHYID matching macros

2018-11-09 Thread Heiner Kallweit
Use new macros for PHYID matching to avoid boilerplate code.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/realtek.c | 29 ++---
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 0f8e5b1c9..c6010fb1a 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -213,14 +213,12 @@ static int rtl8366rb_config_init(struct phy_device 
*phydev)
 
 static struct phy_driver realtek_drvs[] = {
{
-   .phy_id = 0x8201,
+   PHY_ID_MATCH_EXACT(0x8201),
.name   = "RTL8201CP Ethernet",
-   .phy_id_mask= 0x,
.features   = PHY_BASIC_FEATURES,
}, {
-   .phy_id = 0x001cc816,
+   PHY_ID_MATCH_EXACT(0x001cc816),
.name   = "RTL8201F Fast Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_BASIC_FEATURES,
.ack_interrupt  = _ack_interrupt,
.config_intr= _config_intr,
@@ -229,17 +227,15 @@ static struct phy_driver realtek_drvs[] = {
.read_page  = rtl821x_read_page,
.write_page = rtl821x_write_page,
}, {
-   .phy_id = 0x001cc910,
+   PHY_ID_MATCH_EXACT(0x001cc910),
.name   = "RTL8211 Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.config_aneg= rtl8211_config_aneg,
.read_mmd   = _read_mmd_unsupported,
.write_mmd  = _write_mmd_unsupported,
}, {
-   .phy_id = 0x001cc912,
+   PHY_ID_MATCH_EXACT(0x001cc912),
.name   = "RTL8211B Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.ack_interrupt  = _ack_interrupt,
.config_intr= _config_intr,
@@ -248,35 +244,31 @@ static struct phy_driver realtek_drvs[] = {
.suspend= rtl8211b_suspend,
.resume = rtl8211b_resume,
}, {
-   .phy_id = 0x001cc913,
+   PHY_ID_MATCH_EXACT(0x001cc913),
.name   = "RTL8211C Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.config_init= rtl8211c_config_init,
.read_mmd   = _read_mmd_unsupported,
.write_mmd  = _write_mmd_unsupported,
}, {
-   .phy_id = 0x001cc914,
+   PHY_ID_MATCH_EXACT(0x001cc914),
.name   = "RTL8211DN Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.ack_interrupt  = rtl821x_ack_interrupt,
.config_intr= rtl8211e_config_intr,
.suspend= genphy_suspend,
.resume = genphy_resume,
}, {
-   .phy_id = 0x001cc915,
+   PHY_ID_MATCH_EXACT(0x001cc915),
.name   = "RTL8211E Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.ack_interrupt  = _ack_interrupt,
.config_intr= _config_intr,
.suspend= genphy_suspend,
.resume = genphy_resume,
}, {
-   .phy_id = 0x001cc916,
+   PHY_ID_MATCH_EXACT(0x001cc916),
.name   = "RTL8211F Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.config_init= _config_init,
.ack_interrupt  = _ack_interrupt,
@@ -286,9 +278,8 @@ static struct phy_driver realtek_drvs[] = {
.read_page  = rtl821x_read_page,
.write_page = rtl821x_write_page,
}, {
-   .phy_id = 0x001cc961,
+   PHY_ID_MATCH_EXACT(0x001cc961),
.name   = "RTL8366RB Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.config_init= _config_init,
.suspend= genphy_suspend,
@@ -299,7 +290,7 @@ static struct phy_driver realtek_drvs[] = {
 module_phy_driver(realtek_drvs);
 
 static const struct mdio_device_id __maybe_unused realtek_tbl[] = {
-   { 0x001cc800, GENMASK(31, 10) },
+   { PHY_ID_MATCH_VENDOR(0x001cc800) },
{ }
 };
 
-- 
2.19.1




[PATCH net-next 1/2] net: phy: add macros for PHYID matching

2018-11-09 Thread Heiner Kallweit
Add macros for PHYID matching to be used in PHY driver configs.
By using these macros some boilerplate code can be avoided.

Signed-off-by: Heiner Kallweit 
---
 include/linux/phy.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 17d1f6472..03005c65e 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -651,6 +651,10 @@ struct phy_driver {
 #define PHY_ANY_ID "MATCH ANY PHY"
 #define PHY_ANY_UID 0x
 
+#define PHY_ID_MATCH_EXACT(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 0)
+#define PHY_ID_MATCH_MODEL(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 4)
+#define PHY_ID_MATCH_VENDOR(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 10)
+
 /* A Structure for boards to register fixups with the PHY Lib */
 struct phy_fixup {
struct list_head list;
-- 
2.19.1




Re: [PATCH net 0/5] net: aquantia: 2018-11 bugfixes

2018-11-09 Thread David Miller
From: Igor Russkikh 
Date: Fri, 9 Nov 2018 11:53:54 +

> The patchset fixes a number of bugs found in various areas after
> driver validation.

Series applied, thank you.

Please, when you provide a Fixes: tag, do not separate it from the
other Signed-off-by: and Acked-by: etc. tags with an empty line.  It
is just another tag, so keep them all together without any kind of
separation like that.

A lot of people seem to do this, I wonder why :-)

Thank you.


[PATCH net-next 0/2] net: phy: add macros for PHYID matching in PHY driver config

2018-11-09 Thread Heiner Kallweit
Add macros for PHYID matching to be used in PHY driver configs.
By using these macros some boilerplate code can be avoided.

Use them initially in the Realtek PHY drivers.

Heiner Kallweit (2):
  net: phy: add macros for PHYID matching
  net: phy: realtek: use new PHYID matching macros

 drivers/net/phy/realtek.c | 29 ++---
 include/linux/phy.h   |  4 
 2 files changed, 14 insertions(+), 19 deletions(-)

-- 
2.19.1



Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread Jeff Kirsher
On Fri, 2018-11-09 at 15:28 -0800, David Miller wrote:
> From: Miroslav Lichvar 
> Date: Fri,  9 Nov 2018 11:14:41 +0100
> 
> > RFC->v1:
> > - added new patches
> > - separated PHC timestamp from ptp_system_timestamp
> > - fixed memory leak in PTP_SYS_OFFSET_EXTENDED
> > - changed PTP_SYS_OFFSET_EXTENDED to work with array of arrays
> > - fixed PTP_SYS_OFFSET_EXTENDED to break correctly from loop
> > - fixed timecounter updates in drivers
> > - split gettimex in igb driver
> > - fixed ptp_read_* functions to be available without
> >   CONFIG_PTP_1588_CLOCK
> > 
> > This series enables a more accurate synchronization between PTP
> > hardware
> > clocks and the system clock.
>  ...
> 
> This series looks good to me but I want to give Richard an opportunity to
> review it first.

Dave, I also do not want to hold this series up by picking up patches 5, 6
and 7 (Intel drivers) so please apply the entire series after Richard
provides his review.




Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread David Miller
From: Miroslav Lichvar 
Date: Fri,  9 Nov 2018 11:14:41 +0100

> RFC->v1:
> - added new patches
> - separated PHC timestamp from ptp_system_timestamp
> - fixed memory leak in PTP_SYS_OFFSET_EXTENDED
> - changed PTP_SYS_OFFSET_EXTENDED to work with array of arrays
> - fixed PTP_SYS_OFFSET_EXTENDED to break correctly from loop
> - fixed timecounter updates in drivers
> - split gettimex in igb driver
> - fixed ptp_read_* functions to be available without
>   CONFIG_PTP_1588_CLOCK
> 
> This series enables a more accurate synchronization between PTP hardware
> clocks and the system clock.
 ...

This series looks good to me but I want to give Richard an opportunity to
review it first.


Re: [PATCH net 1/1] bnx2x: Assign unique DMAE channel number for FW DMAE transactions.

2018-11-09 Thread David Miller
From: Sudarsana Reddy Kalluru 
Date: Fri, 9 Nov 2018 02:10:43 -0800

> +/* Following is the DMAE channel number allocation for the clients.
> + *   MFW: OCBB/OCSD implementations use DMAE channels 14/15 respectively.
> + *   Driver: 0-3 and 8-11 (for PF dmae operations)
> + *   4 and 12 (for stats requests)
> + */
> +#define BNX2X_FW_DMAE_C 13 /* Channel for FW DMAE operations 
> */
 ...
> + start_params->dmae_cmd_id = BNX2X_FW_DMAE_C;

Why do you need this, it never changes, and:

> + rdata->dmae_cmd_id  = start_params->dmae_cmd_id;

It always is the same value here in the one place it is used.

Just assign BNX2X_FW_DMAE_C directly to rdata->dmae_cmd_id please.


Re: [PATCH net-next] cxgb4vf: free mac_hlist properly

2018-11-09 Thread David Miller
From: Arjun Vynipadath 
Date: Fri,  9 Nov 2018 14:52:53 +0530

> The locally maintained list for tracking hash mac table was
> not freed during driver remove.
> 
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied.


Re: [PATCH net-next] cxgb4vf: fix memleak in mac_hlist initialization

2018-11-09 Thread David Miller
From: Arjun Vynipadath 
Date: Fri,  9 Nov 2018 14:52:01 +0530

> mac_hlist was initialized during adapter_up, which will be called
> every time a vf device is first brought up, or every time when device
> is brought up again after bringing all devices down. This means our
> state of previous list is lost, causing a memleak if entries are
> present in the list. To fix that, move list init to the condition
> that performs initial one time adapter setup.
> 
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied.


Re: [PATCH net-next] cxgb4: free mac_hlist properly

2018-11-09 Thread David Miller
From: Arjun Vynipadath 
Date: Fri,  9 Nov 2018 14:50:25 +0530

> The locally maintained list for tracking hash mac table was
> not freed during driver remove.
> 
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied.


Re: [PATCH][net-next] net: tcp: remove BUG_ON from tcp_v4_err

2018-11-09 Thread David Miller
From: Li RongQing 
Date: Fri,  9 Nov 2018 17:04:51 +0800

> if skb is a NULL pointer, the following access of skb's
> skb_mstamp_ns will trigger a panic, which is the same as BUG_ON
> 
> Signed-off-by: Li RongQing 

Applied.


[PATCH net] net: sched: cls_flower: validate nested enc_opts_policy to avoid build warning

2018-11-09 Thread Jakub Kicinski
TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
currently contain further nested attributes, which are parsed by
hand, so the policy is never actually used.  Add the validation
anyway to avoid potential bugs when other attributes are added
and to make the attribute structure slightly more clear.  Validation
will also set extack to point to the bad attribute on error.

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Jakub Kicinski 
Acked-by: Simon Horman 
---
 net/sched/cls_flower.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9aada2d0ef06..c6c327874abc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -709,11 +709,23 @@ static int fl_set_enc_opt(struct nlattr **tb, struct 
fl_flow_key *key,
  struct netlink_ext_ack *extack)
 {
const struct nlattr *nla_enc_key, *nla_opt_key, *nla_opt_msk = NULL;
-   int option_len, key_depth, msk_depth = 0;
+   int err, option_len, key_depth, msk_depth = 0;
+
+   err = nla_validate_nested(tb[TCA_FLOWER_KEY_ENC_OPTS],
+ TCA_FLOWER_KEY_ENC_OPTS_MAX,
+ enc_opts_policy, extack);
+   if (err)
+   return err;
 
nla_enc_key = nla_data(tb[TCA_FLOWER_KEY_ENC_OPTS]);
 
if (tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]) {
+   err = nla_validate_nested(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK],
+ TCA_FLOWER_KEY_ENC_OPTS_MAX,
+ enc_opts_policy, extack);
+   if (err)
+   return err;
+
nla_opt_msk = nla_data(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
msk_depth = nla_len(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
}
-- 
2.17.1



Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-09 Thread Paweł Staszewski




On 08.11.2018 at 20:12, Paweł Staszewski wrote:
CPU load is lower than for the ConnectX-4 - but it looks like the bandwidth
limit is the same :)

But also after reaching 60Gbit/60Gbit

 bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
  input: /proc/net/dev type: rate
  - iface               Rx           Tx        Total
  ==================================================
  enp175s0:     45.09 Gb/s   15.09 Gb/s   60.18 Gb/s
  enp216s0:     15.14 Gb/s   45.19 Gb/s   60.33 Gb/s
  --------------------------------------------------
  total:        60.45 Gb/s   60.48 Gb/s  120.93 Gb/s


Today it reached 65/65 Gbit/s

But starting from 60 Gbit/s RX / 60 Gbit/s TX the NICs start to drop packets
(with 50% CPU on all 28 cores) - so there is still CPU power to use :).


So I checked other stats.
softnet_stat shows an average of 1k squeezed per sec:
cpu  total    dropped   squeezed  collision    rps flow_limit
  0  18554  0  1  0  0 0
  1  16728  0  1  0  0 0
  2  18033  0  1  0  0 0
  3  17757  0  1  0  0 0
  4  18861  0  0  0  0 0
  5  0  0  1  0  0 0
  6  2  0  1  0  0 0
  7  0  0  1  0  0 0
  8  0  0  0  0  0 0
  9  0  0  1  0  0 0
 10  0  0  0  0  0 0
 11  0  0  1  0  0 0
 12 50  0  1  0  0 0
 13    257  0  0  0  0 0
 14 3629115363  0    3353259  0  0 0
 15  255167835  0    3138271  0  0 0
 16 4240101961  0    3036130  0  0 0
 17  599810018  0    3072169  0  0 0
 18  432796524  0    3034191  0  0 0
 19   41803906  0    3037405  0  0 0
 20  900382666  0    3112294  0  0 0
 21  620926085  0    3086009  0  0 0
 22   41861198  0    3023142  0  0 0
 23 4090425574  0    2990412  0  0 0
 24 4264870218  0    3010272  0  0 0
 25  141401811  0    3027153  0  0 0
 26  104155188  0    3051251  0  0 0
 27 4261258691  0    3039765  0  0 0
 28  4  0  1  0  0 0
 29  4  0  0  0  0 0
 30  0  0  1  0  0 0
 31  0  0  0  0  0 0
 32  3  0  1  0  0 0
 33  1  0  1  0  0 0
 34  0  0  1  0  0 0
 35  0  0  0  0  0 0
 36  0  0  1  0  0 0
 37  0  0  1  0  0 0
 38  0  0  1  0  0 0
 39  0  0  1  0  0 0
 40  0  0  0  0  0 0
 41  0  0  1  0  0 0
 42  299758202  0    3139693  0  0 0
 43 4254727979  0    3103577  0  0 0
 44 195943  0    2554885  0  0 0
 45 1675702723  0    2513481  0  0 0
 46 1908435503  0    2519698  0  0 0
 47 1877799710  0    2537768  0  0 0
 48 2384274076  0    2584673  0  0 0
 49 2598104878  0    2593616  0  0 0
 50 1897566829  0    2530857  0  0 0
 51 1712741629  0    2489089  0  0 0
 52 1704033648  0    2495892  0  0 0
 53 1636781820  0    2499783  0  0 0
 54 1861997734  0    2541060  0  0 0
 55 2113521616  0    2555673  0  0 0


So I raised netdev backlog and budget to really high values:
524288 for netdev_budget and the same for backlog

This raised softirqs from about 600k/sec to 800k/sec for NET_TX/NET_RX

But after these changes I have fewer packet drops.


Below perf top from max traffic reached:
   PerfTop:   72230 irqs/sec  kernel:99.4%  exact:  0.0% [4000Hz 
cycles],  (all, 56 CPUs)


Re: [PATCH net-next 0/4] Remove VLAN_TAG_PRESENT from drivers

2018-11-09 Thread Shiraz Saleem
On Thu, Nov 08, 2018 at 06:44:46PM +0100, Michał Mirosław wrote:
> This series removes VLAN_TAG_PRESENT use from network drivers in
> preparation to removing its special meaning.
> 
> Michał Mirosław (4):
>   i40iw: remove use of VLAN_TAG_PRESENT
>   cnic: remove use of VLAN_TAG_PRESENT
>   gianfar: remove use of VLAN_TAG_PRESENT
>   OVS: remove use of VLAN_TAG_PRESENT
> 
>  drivers/infiniband/hw/i40iw/i40iw_cm.c|  8 +++
>

i40iw bit looks fine. Thanks!

Acked-by: Shiraz Saleem 


Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-09 Thread Alexei Starovoitov
On 11/9/18 1:28 PM, Edward Cree wrote:
> On 09/11/18 21:14, Alexei Starovoitov wrote:
>> same link, but i cannot make it right now.
>> have to extinguish few fires.
>> may be at 2pm (unlikely) or 3pm (more likely) PST?
> 
> Yep I can do either of those, just let me know which when you can.

still swamped. but see the light.
let's do 3pm




Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-09 Thread Edward Cree
On 09/11/18 21:14, Alexei Starovoitov wrote:
> same link, but i cannot make it right now.
> have to extinguish few fires.
> may be at 2pm (unlikely) or 3pm (more likely) PST?

Yep I can do either of those, just let me know which when you can.



Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-09 Thread Alexei Starovoitov
On 11/9/18 12:00 PM, Edward Cree wrote:
> On 09/11/18 04:35, Alexei Starovoitov wrote:
>> On Thu, Nov 08, 2018 at 10:56:55PM +, Edward Cree wrote:
>>>   think this question of maps should be discussed in tomorrow's
>>>   call, since it is when we start having other kinds of instances
>> turned out most of us have a conflict, so the earliest is 1:30pm on Friday.
>> still works for you?
> 
> Yep (that's 9.30pm GMT right?)
> 
> I'm assuming same bluejeans link again.

same link, but i cannot make it right now.
have to extinguish few fires.
may be at 2pm (unlikely) or 3pm (more likely) PST?


Re: [PATCH 08/20] octeontx2-af: Alloc and config NPC MCAM entry at a time

2018-11-09 Thread Arnd Bergmann
On Fri, Nov 9, 2018 at 6:13 PM Sunil Kovvuri  wrote:
> On Fri, Nov 9, 2018 at 4:32 PM Arnd Bergmann  wrote:
> > On Fri, Nov 9, 2018 at 5:21 AM Sunil Kovvuri  
> > wrote:

> >
> > Since b is aligned to four bytes, you get padding between a and b.
> > On top of that, you also get padding after c to make the size of
> > structure itself be a multiple of its alignment. For interfaces, we
> > should avoid both kinds of padding. This can be done by marking
> > members as __packed (usually I don't recommend that), by
> > changing the size of members, or by adding explicit 'reserved'
> > fields in place of the padding.
> >
> > > > I also noticed a similar problem in struct mbox_msghdr. Maybe
> > > > use the 'pahole' tool to check for this kind of padding in the
> > > > API structures.
>
> Got your point now and agree that padding has to be avoided.
> But this is a big change, and the structure pointed out above is not
> the only one, as this applies to all structures in the file.
>
> Would it be okay if I submit a separate patch after this series
> addressing all structures ?

It depends on how you want to address it. If you want to
change the structure layout, then I think it would be better
integrated into the series as that is an incompatible interface
change. If you just want to add reserved members to make
the padding explicit, that could be a follow-up.

Arnd
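
To make the padding issue concrete, here is a generic example (illustration
only, not the actual octeontx2-af mailbox structs): with default alignment the
compiler inserts hidden padding that silently becomes part of the ABI, and
spelling the padding out as reserved fields removes the ambiguity.

#include <stdint.h>

struct mbox_req_bad {
	uint8_t  a;		/* offset 0                         */
				/* 3 hidden padding bytes at 1..3   */
	uint32_t b;		/* offset 4, needs 4-byte alignment */
	uint16_t c;		/* offset 8                         */
				/* 2 hidden tail padding bytes      */
};				/* sizeof() == 12, 5 bytes unnamed  */

struct mbox_req_good {
	uint8_t  a;
	uint8_t  reserved0[3];	/* padding made explicit            */
	uint32_t b;
	uint16_t c;
	uint16_t reserved1;	/* explicit tail padding            */
};				/* sizeof() == 12, fully specified  */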


Re: [PATCH net-next v2 1/2] dpaa2-eth: defer probe on object allocate

2018-11-09 Thread Andrew Lunn
On Fri, Nov 09, 2018 at 03:26:45PM +, Ioana Ciornei wrote:
> The fsl_mc_object_allocate function can fail because not all allocatable
> objects are probed by the fsl_mc_allocator at the call time. Defer the
> dpaa2-eth probe when this happens.
> 
> Signed-off-by: Ioana Ciornei 
> ---
> Changes in v2:
>   - proper handling of IS_ERR_OR_NULL

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH v2 net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-09 Thread Florian Fainelli
On 11/9/18 9:35 AM, Heiner Kallweit wrote:
> As a heritage from the very early days of phylib member interrupts is
> defined as u32 even though it's just a flag whether interrupts are
> enabled. So we can change it to a bitfield member. In addition change
> the code dealing with this member in a way that it's clear we're
> dealing with a bool value.
> 
> Signed-off-by: Heiner Kallweit 

Reviewed-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v5 bpf-next 0/7] bpftool: support loading flow dissector

2018-11-09 Thread Jakub Kicinski
On Fri,  9 Nov 2018 08:21:39 -0800, Stanislav Fomichev wrote:
> v5 changes:
> * FILE -> PATH for load/loadall (can be either file or directory now)
> * simpler implementation for __bpf_program__pin_name
> * removed p_err for REQ_ARGS checks
> * parse_atach_detach_args -> parse_attach_detach_args
> * for -> while in bpf_object__pin_{programs,maps} recovery

Thanks!  Patch 3 needs attention from maintainers but the rest LGTM!


Re: [PATCH v2 net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-09 Thread Andrew Lunn
On Fri, Nov 09, 2018 at 06:35:52PM +0100, Heiner Kallweit wrote:
> As a heritage from the very early days of phylib member interrupts is
> defined as u32 even though it's just a flag whether interrupts are
> enabled. So we can change it to a bitfield member. In addition change
> the code dealing with this member in a way that it's clear we're
> dealing with a bool value.
> 
> Signed-off-by: Heiner Kallweit 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-09 Thread Edward Cree
On 09/11/18 04:35, Alexei Starovoitov wrote:
> On Thu, Nov 08, 2018 at 10:56:55PM +, Edward Cree wrote:
>>  think this question of maps should be discussed in tomorrow's
>>  call, since it is when we start having other kinds of instances
> turned out most of us have a conflict, so the earliest is 1:30pm on Friday.
> still works for you?

Yep (that's 9.30pm GMT right?)

I'm assuming same bluejeans link again.

-Ed



Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-09 Thread Paweł Staszewski




On 09.11.2018 at 17:21, David Ahern wrote:

On 11/9/18 3:20 AM, Paweł Staszewski wrote:

I just caught some weird behavior :)
All was working fine for about 20k packets

Then xdp started to forward only every ~10th packet

Interesting. Any counter showing drops?

nothing that will fit

NIC statistics:
 rx_packets: 187041
 rx_bytes: 10600954
 tx_packets: 40316
 tx_bytes: 16526844
 tx_tso_packets: 797
 tx_tso_bytes: 3876084
 tx_tso_inner_packets: 0
 tx_tso_inner_bytes: 0
 tx_added_vlan_packets: 38391
 tx_nop: 2
 rx_lro_packets: 0
 rx_lro_bytes: 0
 rx_ecn_mark: 0
 rx_removed_vlan_packets: 187041
 rx_csum_unnecessary: 0
 rx_csum_none: 150011
 rx_csum_complete: 37030
 rx_csum_unnecessary_inner: 0
 rx_xdp_drop: 0
 rx_xdp_redirect: 64893
 rx_xdp_tx_xmit: 0
 rx_xdp_tx_full: 0
 rx_xdp_tx_err: 0
 rx_xdp_tx_cqe: 0
 tx_csum_none: 2468
 tx_csum_partial: 35955
 tx_csum_partial_inner: 0
 tx_queue_stopped: 0
 tx_queue_dropped: 0
 tx_xmit_more: 0
 tx_recover: 0
 tx_cqes: 38423
 tx_queue_wake: 0
 tx_udp_seg_rem: 0
 tx_cqe_err: 0
 tx_xdp_xmit: 0
 tx_xdp_full: 0
 tx_xdp_err: 0
 tx_xdp_cqes: 0
 rx_wqe_err: 0
 rx_mpwqe_filler_cqes: 0
 rx_mpwqe_filler_strides: 0
 rx_buff_alloc_err: 0
 rx_cqe_compress_blks: 0
 rx_cqe_compress_pkts: 0
 rx_page_reuse: 0
 rx_cache_reuse: 186302
 rx_cache_full: 0
 rx_cache_empty: 666768
 rx_cache_busy: 174
 rx_cache_waive: 0
 rx_congst_umr: 0
 rx_arfs_err: 0
 ch_events: 249320
 ch_poll: 249321
 ch_arm: 249001
 ch_aff_change: 0
 ch_eq_rearm: 0
 rx_out_of_buffer: 0
 rx_if_down_packets: 57
 rx_vport_unicast_packets: 142659
 rx_vport_unicast_bytes: 42706914
 tx_vport_unicast_packets: 40167
 tx_vport_unicast_bytes: 16668096
 rx_vport_multicast_packets: 39188170
 rx_vport_multicast_bytes: 3466527450
 tx_vport_multicast_packets: 58
 tx_vport_multicast_bytes: 4556
 rx_vport_broadcast_packets: 16343520
 rx_vport_broadcast_bytes: 1031334602
 tx_vport_broadcast_packets: 91
 tx_vport_broadcast_bytes: 5460
 rx_vport_rdma_unicast_packets: 0
 rx_vport_rdma_unicast_bytes: 0
 tx_vport_rdma_unicast_packets: 0
 tx_vport_rdma_unicast_bytes: 0
 rx_vport_rdma_multicast_packets: 0
 rx_vport_rdma_multicast_bytes: 0
 tx_vport_rdma_multicast_packets: 0
 tx_vport_rdma_multicast_bytes: 0
 tx_packets_phy: 40316
 rx_packets_phy: 55674361
 rx_crc_errors_phy: 0
 tx_bytes_phy: 16839376
 rx_bytes_phy: 4763267396
 tx_multicast_phy: 58
 tx_broadcast_phy: 91
 rx_multicast_phy: 39188180
 rx_broadcast_phy: 16343521
 rx_in_range_len_errors_phy: 0
 rx_out_of_range_len_phy: 0
 rx_oversize_pkts_phy: 0
 rx_symbol_err_phy: 0
 tx_mac_control_phy: 0
 rx_mac_control_phy: 0
 rx_unsupported_op_phy: 0
 rx_pause_ctrl_phy: 0
 tx_pause_ctrl_phy: 0
 rx_discards_phy: 1
 tx_discards_phy: 0
 tx_errors_phy: 0
 rx_undersize_pkts_phy: 0
 rx_fragments_phy: 0
 rx_jabbers_phy: 0
 rx_64_bytes_phy: 3792455
 rx_65_to_127_bytes_phy: 51821620
 rx_128_to_255_bytes_phy: 37669
 rx_256_to_511_bytes_phy: 1481
 rx_512_to_1023_bytes_phy: 434
 rx_1024_to_1518_bytes_phy: 694
 rx_1519_to_2047_bytes_phy: 20008
 rx_2048_to_4095_bytes_phy: 0
 rx_4096_to_8191_bytes_phy: 0
 rx_8192_to_10239_bytes_phy: 0
 link_down_events_phy: 0
 rx_pcs_symbol_err_phy: 0
 rx_corrected_bits_phy: 6
 rx_err_lane_0_phy: 0
 rx_err_lane_1_phy: 0
 rx_err_lane_2_phy: 0
 rx_err_lane_3_phy: 6
 rx_buffer_passed_thres_phy: 0
 rx_pci_signal_integrity: 0
 tx_pci_signal_integrity: 82
 outbound_pci_stalled_rd: 0
 outbound_pci_stalled_wr: 0
 outbound_pci_stalled_rd_events: 0
 outbound_pci_stalled_wr_events: 0
 rx_prio0_bytes: 4144920388
 rx_prio0_packets: 48310037
 tx_prio0_bytes: 16839376
 tx_prio0_packets: 40316
 rx_prio1_bytes: 481032
 rx_prio1_packets: 7074
 tx_prio1_bytes: 0
 tx_prio1_packets: 0
 rx_prio2_bytes: 9074194
 rx_prio2_packets: 106207
 tx_prio2_bytes: 0
 tx_prio2_packets: 0
 rx_prio3_bytes: 0
 rx_prio3_packets: 0
 tx_prio3_bytes: 0
 tx_prio3_packets: 0
 rx_prio4_bytes: 0
 rx_prio4_packets: 0
 tx_prio4_bytes: 0
 tx_prio4_packets: 0
 rx_prio5_bytes: 0
 rx_prio5_packets: 0
 tx_prio5_bytes: 0
 tx_prio5_packets: 0
 rx_prio6_bytes: 371961810
 rx_prio6_packets: 4006281
 tx_prio6_bytes: 0
 tx_prio6_packets: 0
 rx_prio7_bytes: 236830040
 rx_prio7_packets: 3244761
 tx_prio7_bytes: 0
 tx_prio7_packets: 0
 tx_pause_storm_warning_events : 0
 tx_pause_storm_error_events: 0
 module_unplug: 0
 module_bus_stuck: 0
 module_high_temp: 0
 module_bad_shorted: 0


Re: [PATCH iproute] ss: Actually print left delimiter for columns

2018-11-09 Thread Stefano Brivio
On Fri, 9 Nov 2018 09:05:46 -0800
Stephen Hemminger  wrote:

> On Mon, 29 Oct 2018 23:04:25 +0100
> Stefano Brivio  wrote:
> 
> > While rendering columns, we use a local variable to keep track of the
> > field currently being printed, without touching current_field, which is
> > used for buffering.
> > 
> > Use the right pointer to access the left delimiter for the current column,
> > instead of always printing the left delimiter for the last buffered field,
> > which is usually an empty string.
> > 
> > This fixes an issue especially visible on narrow terminals, where some
> > columns might be displayed without separation.
> > 
> > Reported-by: YoyPa 
> > Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a 
> > table")
> > Signed-off-by: Stefano Brivio 
> > Tested-by: YoyPa   
> 
> This test broke the testsuite/ss/ssfilter.t test.
> Please fix the test to match your new output format, or I will have to revert 
> it.

Ouch, sorry, I didn't notice that "new" test. I'll fix that by tomorrow.

-- 
Stefano


Re: [PATCH net] ip: hash fragments consistently

2018-11-09 Thread Eric Dumazet



On 07/23/2018 09:26 AM, Eric Dumazet wrote:
> 
> 
> On 07/23/2018 07:50 AM, Paolo Abeni wrote:
>> The skb hash for locally generated ip[v6] fragments belonging
>> to the same datagram can vary in several circumstances:
>> * for connected UDP[v6] sockets, the first fragment get its hash
>>   via set_owner_w()/skb_set_hash_from_sk()
>> * for unconnected IPv6 UDPv6 sockets, the first fragment can get
>>   its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
>>   auto_flowlabel is enabled
>>
>> For the following frags the hash is usually computed via
>> skb_get_hash().
>> The above can cause OoO for unconnected IPv6 UDPv6 socket: in that
>> scenario the egress tx queue can be selected on a per packet basis
>> via the skb hash.
>> It may also fool flow-oriented schedulers to place fragments belonging
>> to the same datagram in different flows.
>>
> 
> It also fools bond_xmit_hash(): packets of the same datagram can be sent on
> two bonding slaves instead of one, which adds pressure on the defrag unit
> in the receiver.
> 
> Reviewed-by: Eric Dumazet 
> 

Also we might note that flow dissector itself is buggy as
found by Soukjin Bae ( https://patchwork.ozlabs.org/patch/994601/ )

I will send a v2 of his patch with a different changelog.

Defrag is fixed [1] but the bug in flow dissector is adding
extra work and hash inconsistencies.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=0d5b9311baf27bb545f187f12ecfd558220c607d



[Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Cong Wang
Currently netdev_rx_csum_fault() only shows a device name; we need
more information about the skb for debugging.

Sample output:

 ens3: hw csum failure
 dev features: 0x00014b89
 skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, 
csum_complete_sw=0, csum_valid=0

Signed-off-by: Cong Wang 
---
 include/linux/netdevice.h |  5 +++--
 net/core/datagram.c   |  6 +++---
 net/core/dev.c| 10 --
 net/sunrpc/socklib.c  |  2 +-
 4 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 857f8abf7b91..fabcd9fa6cf7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4332,9 +4332,10 @@ static inline bool 
can_checksum_protocol(netdev_features_t features,
 }
 
 #ifdef CONFIG_BUG
-void netdev_rx_csum_fault(struct net_device *dev);
+void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
 #else
-static inline void netdev_rx_csum_fault(struct net_device *dev)
+static inline void netdev_rx_csum_fault(struct net_device *dev,
+   struct sk_buff *skb)
 {
 }
 #endif
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 57f3a6fcfc1e..d8f4d55cd6c5 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, 
int len)
if (likely(!sum)) {
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
+   netdev_rx_csum_fault(skb->dev, skb);
}
if (!skb_shared(skb))
skb->csum_valid = !sum;
@@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
if (likely(!sum)) {
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
+   netdev_rx_csum_fault(skb->dev, skb);
}
 
if (!skb_shared(skb)) {
@@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
 
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(NULL);
+   netdev_rx_csum_fault(NULL, skb);
}
return 0;
 fault:
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ffcbdd55fa9..2b337df26117 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
 
 /* Take action when hardware reception checksum errors are detected. */
 #ifdef CONFIG_BUG
-void netdev_rx_csum_fault(struct net_device *dev)
+void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
 {
if (net_ratelimit()) {
pr_err("%s: hw csum failure\n", dev ? dev->name : "");
+   if (dev)
+   pr_err("dev features: %pNF\n", &dev->features);
+   pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d 
ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
+  skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
+  skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
+  skb->csum_complete_sw, skb->csum_valid);
dump_stack();
}
 }
@@ -5779,7 +5785,7 @@ __sum16 __skb_gro_checksum_complete(struct sk_buff *skb)
if (likely(!sum)) {
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
+   netdev_rx_csum_fault(skb->dev, skb);
}
 
NAPI_GRO_CB(skb)->csum = wsum;
diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 9062967575c4..7e55cfc69697 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -175,7 +175,7 @@ int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct 
sk_buff *skb)
return -1;
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
-   netdev_rx_csum_fault(skb->dev);
+   netdev_rx_csum_fault(skb->dev, skb);
return 0;
 no_checksum:
if (xdr_partial_copy_from_skb(xdr, 0, , xdr_skb_read_bits) < 0)
-- 
2.19.1



Re: [PATCH bpf-next v2 2/3] bpf: Support socket lookup in CGROUP_SOCK_ADDR progs

2018-11-09 Thread Martin Lau
On Fri, Nov 09, 2018 at 10:54:01AM -0800, Andrey Ignatov wrote:
> Make bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers
> available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR.
> 
> Such programs operate on sockets and have access to socket and struct
> sockaddr passed by user to system calls such as sys_bind, sys_connect,
> sys_sendmsg.
> 
> It's useful to be able to lookup other sockets from these programs.
> E.g. sys_connect may lookup IP:port endpoint and if there is a server
> socket bound to that endpoint ("server" can be defined by saddr & sport
> being zero), redirect client connection to it by rewriting IP:port in
> sockaddr passed to sys_connect.
Acked-by: Martin KaFai Lau 


[PATCH bpf-next v2 2/3] bpf: Support socket lookup in CGROUP_SOCK_ADDR progs

2018-11-09 Thread Andrey Ignatov
Make bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers
available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR.

Such programs operate on sockets and have access to socket and struct
sockaddr passed by user to system calls such as sys_bind, sys_connect,
sys_sendmsg.

It's useful to be able to lookup other sockets from these programs.
E.g. sys_connect may lookup IP:port endpoint and if there is a server
socket bound to that endpoint ("server" can be defined by saddr & sport
being zero), redirect client connection to it by rewriting IP:port in
sockaddr passed to sys_connect.

Signed-off-by: Andrey Ignatov 
Acked-by: Alexei Starovoitov 
---
 net/core/filter.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index f4ae933edf61..f6ca38a7d433 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5042,6 +5042,43 @@ static const struct bpf_func_proto 
bpf_xdp_sk_lookup_tcp_proto = {
.arg4_type  = ARG_ANYTHING,
.arg5_type  = ARG_ANYTHING,
 };
+
+BPF_CALL_5(bpf_sock_addr_sk_lookup_tcp, struct bpf_sock_addr_kern *, ctx,
+  struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
+{
+   return __bpf_sk_lookup(NULL, tuple, len, sock_net(ctx->sk), 0,
+  IPPROTO_TCP, netns_id, flags);
+}
+
+static const struct bpf_func_proto bpf_sock_addr_sk_lookup_tcp_proto = {
+   .func   = bpf_sock_addr_sk_lookup_tcp,
+   .gpl_only   = false,
+   .ret_type   = RET_PTR_TO_SOCKET_OR_NULL,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_PTR_TO_MEM,
+   .arg3_type  = ARG_CONST_SIZE,
+   .arg4_type  = ARG_ANYTHING,
+   .arg5_type  = ARG_ANYTHING,
+};
+
+BPF_CALL_5(bpf_sock_addr_sk_lookup_udp, struct bpf_sock_addr_kern *, ctx,
+  struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
+{
+   return __bpf_sk_lookup(NULL, tuple, len, sock_net(ctx->sk), 0,
+  IPPROTO_UDP, netns_id, flags);
+}
+
+static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
+   .func   = bpf_sock_addr_sk_lookup_udp,
+   .gpl_only   = false,
+   .ret_type   = RET_PTR_TO_SOCKET_OR_NULL,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_PTR_TO_MEM,
+   .arg3_type  = ARG_CONST_SIZE,
+   .arg4_type  = ARG_ANYTHING,
+   .arg5_type  = ARG_ANYTHING,
+};
+
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -5148,6 +5185,14 @@ sock_addr_func_proto(enum bpf_func_id func_id, const 
struct bpf_prog *prog)
return &bpf_get_socket_cookie_sock_addr_proto;
case BPF_FUNC_get_local_storage:
return &bpf_get_local_storage_proto;
+#ifdef CONFIG_INET
+   case BPF_FUNC_sk_lookup_tcp:
+   return &bpf_sock_addr_sk_lookup_tcp_proto;
+   case BPF_FUNC_sk_lookup_udp:
+   return &bpf_sock_addr_sk_lookup_udp_proto;
+   case BPF_FUNC_sk_release:
+   return &bpf_sk_release_proto;
+#endif /* CONFIG_INET */
default:
return bpf_base_func_proto(func_id);
}
-- 
2.17.1



[PATCH bpf-next v2 3/3] selftest/bpf: Use bpf_sk_lookup_{tcp,udp} in test_sock_addr

2018-11-09 Thread Andrey Ignatov
Use bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers from
test_sock_addr programs to make sure they're available and can lookup
and release socket properly for IPv4/IPv4, TCP/UDP.

Reading from a few fields of returned struct bpf_sock is also tested.

Signed-off-by: Andrey Ignatov 
Acked-by: Alexei Starovoitov 
Acked-by: Martin KaFai Lau 
---
 tools/testing/selftests/bpf/connect4_prog.c | 43 
 tools/testing/selftests/bpf/connect6_prog.c | 56 -
 2 files changed, 78 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/bpf/connect4_prog.c 
b/tools/testing/selftests/bpf/connect4_prog.c
index 5a88a681d2ab..b8395f3c43e9 100644
--- a/tools/testing/selftests/bpf/connect4_prog.c
+++ b/tools/testing/selftests/bpf/connect4_prog.c
@@ -21,23 +21,48 @@ int _version SEC("version") = 1;
 SEC("cgroup/connect4")
 int connect_v4_prog(struct bpf_sock_addr *ctx)
 {
+   struct bpf_sock_tuple tuple = {};
struct sockaddr_in sa;
+   struct bpf_sock *sk;
+
+   /* Verify that new destination is available. */
+   memset(&tuple.ipv4.saddr, 0, sizeof(tuple.ipv4.saddr));
+   memset(&tuple.ipv4.sport, 0, sizeof(tuple.ipv4.sport));
+
+   tuple.ipv4.daddr = bpf_htonl(DST_REWRITE_IP4);
+   tuple.ipv4.dport = bpf_htons(DST_REWRITE_PORT4);
+
+   if (ctx->type != SOCK_STREAM && ctx->type != SOCK_DGRAM)
+   return 0;
+   else if (ctx->type == SOCK_STREAM)
+   sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof(tuple.ipv4), 0, 0);
+   else
+   sk = bpf_sk_lookup_udp(ctx, &tuple, sizeof(tuple.ipv4), 0, 0);
+
+   if (!sk)
+   return 0;
+
+   if (sk->src_ip4 != tuple.ipv4.daddr ||
+   sk->src_port != DST_REWRITE_PORT4) {
+   bpf_sk_release(sk);
+   return 0;
+   }
+
+   bpf_sk_release(sk);
 
/* Rewrite destination. */
ctx->user_ip4 = bpf_htonl(DST_REWRITE_IP4);
ctx->user_port = bpf_htons(DST_REWRITE_PORT4);
 
-   if (ctx->type == SOCK_DGRAM || ctx->type == SOCK_STREAM) {
-   ///* Rewrite source. */
-   memset(&sa, 0, sizeof(sa));
+   /* Rewrite source. */
+   memset(&sa, 0, sizeof(sa));
 
-   sa.sin_family = AF_INET;
-   sa.sin_port = bpf_htons(0);
-   sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);
+   sa.sin_family = AF_INET;
+   sa.sin_port = bpf_htons(0);
+   sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);
 
-   if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
-   return 0;
-   }
+   if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
+   return 0;
 
return 1;
 }
diff --git a/tools/testing/selftests/bpf/connect6_prog.c 
b/tools/testing/selftests/bpf/connect6_prog.c
index 8ea3f7d12dee..25f5dc7b7aa0 100644
--- a/tools/testing/selftests/bpf/connect6_prog.c
+++ b/tools/testing/selftests/bpf/connect6_prog.c
@@ -29,7 +29,41 @@ int _version SEC("version") = 1;
 SEC("cgroup/connect6")
 int connect_v6_prog(struct bpf_sock_addr *ctx)
 {
+   struct bpf_sock_tuple tuple = {};
struct sockaddr_in6 sa;
+   struct bpf_sock *sk;
+
+   /* Verify that new destination is available. */
+   memset(&tuple.ipv6.saddr, 0, sizeof(tuple.ipv6.saddr));
+   memset(&tuple.ipv6.sport, 0, sizeof(tuple.ipv6.sport));
+
+   tuple.ipv6.daddr[0] = bpf_htonl(DST_REWRITE_IP6_0);
+   tuple.ipv6.daddr[1] = bpf_htonl(DST_REWRITE_IP6_1);
+   tuple.ipv6.daddr[2] = bpf_htonl(DST_REWRITE_IP6_2);
+   tuple.ipv6.daddr[3] = bpf_htonl(DST_REWRITE_IP6_3);
+
+   tuple.ipv6.dport = bpf_htons(DST_REWRITE_PORT6);
+
+   if (ctx->type != SOCK_STREAM && ctx->type != SOCK_DGRAM)
+   return 0;
+   else if (ctx->type == SOCK_STREAM)
+   sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof(tuple.ipv6), 0, 0);
+   else
+   sk = bpf_sk_lookup_udp(ctx, &tuple, sizeof(tuple.ipv6), 0, 0);
+
+   if (!sk)
+   return 0;
+
+   if (sk->src_ip6[0] != tuple.ipv6.daddr[0] ||
+   sk->src_ip6[1] != tuple.ipv6.daddr[1] ||
+   sk->src_ip6[2] != tuple.ipv6.daddr[2] ||
+   sk->src_ip6[3] != tuple.ipv6.daddr[3] ||
+   sk->src_port != DST_REWRITE_PORT6) {
+   bpf_sk_release(sk);
+   return 0;
+   }
+
+   bpf_sk_release(sk);
 
/* Rewrite destination. */
ctx->user_ip6[0] = bpf_htonl(DST_REWRITE_IP6_0);
@@ -39,21 +73,19 @@ int connect_v6_prog(struct bpf_sock_addr *ctx)
 
ctx->user_port = bpf_htons(DST_REWRITE_PORT6);
 
-   if (ctx->type == SOCK_DGRAM || ctx->type == SOCK_STREAM) {
-   /* Rewrite source. */
-   memset(&sa, 0, sizeof(sa));
+   /* Rewrite source. */
+   memset(&sa, 0, sizeof(sa));
 
-   sa.sin6_family = AF_INET6;
-   sa.sin6_port = bpf_htons(0);
+   sa.sin6_family = AF_INET6;
+   sa.sin6_port = bpf_htons(0);
 
-   sa.sin6_addr.s6_addr32[0] = bpf_htonl(SRC_REWRITE_IP6_0);

[PATCH bpf-next v2 1/3] bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp

2018-11-09 Thread Andrey Ignatov
Lookup functions in sk_lookup have different expectations about byte
order of provided arguments.

Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup
expect dport to be in network byte order and do ntohs(dport) internally.

At the same time __inet6_lookup expects dport to be in host byte order
and correspondingly name the argument hnum.

sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and
__inet6_lookup with regard to dport. But in __udp6_lib_lookup case it
uses host instead of expected network byte order. It makes result
returned by bpf_sk_lookup_udp for IPv6 incorrect.

The patch fixes byte order of dport passed to __udp6_lib_lookup.

Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84f02a
fixes TCPv6 but breaks UDPv6.

Fixes: 5ef0ae84f02a ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup")
Signed-off-by: Andrey Ignatov 
Acked-by: Joe Stringer 
Acked-by: Martin KaFai Lau 
---
 net/core/filter.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 53d50fb75ea1..f4ae933edf61 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4867,17 +4867,16 @@ static struct sock *sk_lookup(struct net *net, struct 
bpf_sock_tuple *tuple,
} else {
struct in6_addr *src6 = (struct in6_addr *)&tuple->ipv6.saddr;
struct in6_addr *dst6 = (struct in6_addr *)&tuple->ipv6.daddr;
-   u16 hnum = ntohs(tuple->ipv6.dport);
 
if (proto == IPPROTO_TCP)
sk = __inet6_lookup(net, &tcp_hashinfo, NULL, 0,
src6, tuple->ipv6.sport,
-   dst6, hnum,
+   dst6, ntohs(tuple->ipv6.dport),
dif, sdif, );
else if (likely(ipv6_bpf_stub))
sk = ipv6_bpf_stub->udp6_lib_lookup(net,
src6, 
tuple->ipv6.sport,
-   dst6, hnum,
+   dst6, 
tuple->ipv6.dport,
dif, sdif,
&udp_table, NULL);
 #endif
-- 
2.17.1
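
To make the byte-order distinction above concrete, a small stand-alone
illustration (the port value 4242 is an arbitrary assumption, not from
the patch):

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

int main(void)
{
	/* network byte order, as carried in bpf_sock_tuple's dport */
	uint16_t dport = htons(4242);
	/* host byte order, as expected by __inet6_lookup()'s hnum argument */
	uint16_t hnum = ntohs(dport);

	printf("dport (network order): 0x%04x\n", (unsigned)dport);
	printf("hnum  (host order)   : 0x%04x\n", (unsigned)hnum);
	return 0;
}

On a little-endian machine the two values differ, which is why passing
the wrong representation makes the UDPv6 lookup miss.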



[PATCH bpf-next v2 0/3] bpf: Support socket lookup in CGROUP_SOCK_ADDR progs

2018-11-09 Thread Andrey Ignatov
This patch set makes bpf_sk_lookup_tcp, bpf_sk_lookup_udp and
bpf_sk_release helpers available in programs of type
BPF_PROG_TYPE_CGROUP_SOCK_ADDR.

Patch 1 is a fix for bpf_sk_lookup_udp that was already merged to bpf
(stable) tree. Here it's prerequisite for patch 3.

Patch 2 is the main patch in the set, it makes the helpers available for
BPF_PROG_TYPE_CGROUP_SOCK_ADDR and provides more details about use-case.

Patch 3 adds selftest for new functionality.

v1->v2:
- remove "Split bpf_sk_lookup" patch since it was already split by:
  commit c8123ead13a5 ("bpf: Extend the sk_lookup() helper to XDP
  hookpoint.");
- avoid unnecessary bpf_sock_addr_sk_lookup function.


Andrey Ignatov (3):
  bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp
  bpf: Support socket lookup in CGROUP_SOCK_ADDR progs
  selftest/bpf: Use bpf_sk_lookup_{tcp,udp} in test_sock_addr

 net/core/filter.c   | 50 --
 tools/testing/selftests/bpf/connect4_prog.c | 43 
 tools/testing/selftests/bpf/connect6_prog.c | 56 -
 3 files changed, 125 insertions(+), 24 deletions(-)

-- 
2.17.1



Re: [PATCH bpf-next] filter: add BPF_ADJ_ROOM_DATA mode to bpf_skb_adjust_room()

2018-11-09 Thread Martin Lau
On Thu, Nov 08, 2018 at 04:11:37PM +0100, Nicolas Dichtel wrote:
> This new mode makes it possible to add or remove an l2 header in a
> programmatic way with cls_bpf.
> For example, it can be used to play with mpls headers.
> 
> Signed-off-by: Nicolas Dichtel 
> ---
>  include/uapi/linux/bpf.h   |  3 ++
>  net/core/filter.c  | 54 ++
>  tools/include/uapi/linux/bpf.h |  3 ++
>  3 files changed, 60 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 852dc17ab47a..47407fd5162b 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1467,6 +1467,8 @@ union bpf_attr {
>   *
>   *   * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
>   * (room space is added or removed below the layer 3 header).
> + *   * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
> + * packet (room space is added or removed below skb->data).
>   *
>   *   All values for *flags* are reserved for future usage, and must
>   *   be left at zero.
> @@ -2408,6 +2410,7 @@ enum bpf_func_id {
>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>  enum bpf_adj_room_mode {
>   BPF_ADJ_ROOM_NET,
> + BPF_ADJ_ROOM_DATA,
>  };
>  
>  /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e521c5ebc7d1..e699849b269d 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2884,6 +2884,58 @@ static int bpf_skb_adjust_net(struct sk_buff *skb, s32 
> len_diff)
>   return ret;
>  }
>  
> +static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
> +{
> + unsigned short hhlen = skb->dev->header_ops ?
> +skb->dev->hard_header_len : 0;
> + int ret;
> +
> + ret = skb_unclone(skb, GFP_ATOMIC);
> + if (unlikely(ret < 0))
> + return ret;
> +
> + __skb_pull(skb, len);
> + skb_reset_mac_header(skb);
> + skb_reset_network_header(skb);
> + skb->network_header += hhlen;
> + skb_reset_transport_header(skb);
hmm...why transport_header does not need += hhlen here
while network_header does?

> + return 0;
> +}
> +
> +static int bpf_skb_data_grow(struct sk_buff *skb, u32 len)
> +{
> + unsigned short hhlen = skb->dev->header_ops ?
> +skb->dev->hard_header_len : 0;
> + int ret;
> +
> + ret = skb_cow(skb, len);
> + if (unlikely(ret < 0))
> + return ret;
> +
> + skb_push(skb, len);
> + skb_reset_mac_header(skb);
> + return 0;
> +}
> +
> +static int bpf_skb_adjust_data(struct sk_buff *skb, s32 len_diff)
> +{
> + u32 len_diff_abs = abs(len_diff);
> + bool shrink = len_diff < 0;
> + int ret;
> +
> + if (unlikely(len_diff_abs > 0xfffU))
> + return -EFAULT;
> +
> + if (shrink && len_diff_abs >= skb_headlen(skb))
> + return -EFAULT;
> +
> + ret = shrink ? bpf_skb_data_shrink(skb, len_diff_abs) :
> +bpf_skb_data_grow(skb, len_diff_abs);
> +
> + bpf_compute_data_pointers(skb);
> + return ret;
> +}
> +
>  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
>  u32, mode, u64, flags)
>  {
> @@ -2891,6 +2943,8 @@ BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, 
> s32, len_diff,
>   return -EINVAL;
>   if (likely(mode == BPF_ADJ_ROOM_NET))
>   return bpf_skb_adjust_net(skb, len_diff);
> + if (likely(mode == BPF_ADJ_ROOM_DATA))
> + return bpf_skb_adjust_data(skb, len_diff);
>  
>   return -ENOTSUPP;
>  }
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 852dc17ab47a..47407fd5162b 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -1467,6 +1467,8 @@ union bpf_attr {
>   *
>   *   * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
>   * (room space is added or removed below the layer 3 header).
> + *   * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
> + * packet (room space is added or removed below skb->data).
>   *
>   *   All values for *flags* are reserved for future usage, and must
>   *   be left at zero.
> @@ -2408,6 +2410,7 @@ enum bpf_func_id {
>  /* Mode for BPF_FUNC_skb_adjust_room helper. */
>  enum bpf_adj_room_mode {
>   BPF_ADJ_ROOM_NET,
> + BPF_ADJ_ROOM_DATA,
>  };
>  
>  /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
> -- 
> 2.18.0
> 
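
For context, a minimal sketch of how a cls_bpf program might use the
proposed BPF_ADJ_ROOM_DATA mode to push an MPLS label stack entry in
front of skb->data. The section name, label value and the selftest
headers are assumptions, and the ethertype update is left out for
brevity:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include "bpf_helpers.h"
#include "bpf_endian.h"

SEC("classifier")
int push_mpls(struct __sk_buff *skb)
{
	/* MPLS LSE: label 20, bottom-of-stack bit set, TTL 64 */
	__be32 lse = bpf_htonl((20 << 12) | (1 << 8) | 64);

	/* grow 4 bytes of room below skb->data */
	if (bpf_skb_adjust_room(skb, 4, BPF_ADJ_ROOM_DATA, 0))
		return TC_ACT_SHOT;
	/* write the LSE into the newly created room at the packet front */
	if (bpf_skb_store_bytes(skb, 0, &lse, sizeof(lse), 0))
		return TC_ACT_SHOT;
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";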


Re: bring back IPX and NCPFS, please!

2018-11-09 Thread Willy Tarreau
On Fri, Nov 09, 2018 at 06:30:14PM +0100, Johannes C. Schulz wrote:
> Hello Willy, hello Stephen
> 
> Thankyou for your reply.
> But I'm not able to maintain or code these modules. I'm just a bloody
> user/webdev.

That's what we've all claimed before taking over something many years
ago you know :-)  The most important is time and willingness to try to
do it.

You could first look at the latest kernel supporting those, check if
they still used to work fine in your environment (not everyone has
access to these ones anymore), and if so, then try to copy that code
over newer kernels. Sometimes it will not build with an obvious error
that you'll be able to fix by yourself, sometimes it will be harder
and you'll have to ask for help and/or figure out API changes in "git log".
After working many hours on this you'll be much more at ease with this
code and you'll possibly be able to make it work on your kernel version.
This is already a huge step because even if you don't consider it as
being in a mergeable state (too hackish, dirty etc), you have the
option to run it as your own patch for a while.

After this you'll seek some more help about the process needed to get
these merged back and to maintain them as long as you estimate you can
(possibly mark it deprecated and keep it as long as you can). And who
knows, given nothing changes in this area these days, maybe it will be
trivial to maintain this FS for another decade and you'll have learned
something fun and useful.

> It would be really nice if these modules will find a good
> maintainer!

Just think again about the advantages you have over many other people :
  - access to the environment
  - real use case for the feature

There's nothing wrong with trying and failing multiple times, even giving
up if you find the task too hard. But giving up before trying is quite
sad in your situation.

Cheers,
Willy


[PATCH bpf-next] selftests/bpf: Fix uninitialized duration warning

2018-11-09 Thread Joe Stringer
Daniel Borkmann reports:

test_progs.c: In function ‘main’:
test_progs.c:81:3: warning: ‘duration’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]
   printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\
   ^~
test_progs.c:1706:8: note: ‘duration’ was declared here
  __u32 duration;
^~~~

Signed-off-by: Joe Stringer 
---

I'm actually not able to reproduce this with GCC 7.3 or 8.2, so I'll
rely on review to establish that this patch works as intended.
---
 tools/testing/selftests/bpf/test_progs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c 
b/tools/testing/selftests/bpf/test_progs.c
index 2d3c04f45530..c1e688f61061 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -1703,7 +1703,7 @@ static void test_reference_tracking()
const char *file = "./test_sk_lookup_kern.o";
struct bpf_object *obj;
struct bpf_program *prog;
-   __u32 duration;
+   __u32 duration = 0;
int err = 0;
 
obj = bpf_object__open(file);
-- 
2.17.1



RE: [PATCH net-next 7/8] ixgbe: extend PTP gettime function to read system clock

2018-11-09 Thread Keller, Jacob E
> -Original Message-
> From: Miroslav Lichvar [mailto:mlich...@redhat.com]
> Sent: Friday, November 09, 2018 2:15 AM
> To: netdev@vger.kernel.org
> Cc: Richard Cochran ; Keller, Jacob E
> ; Miroslav Lichvar ; Kirsher,
> Jeffrey T 
> Subject: [PATCH net-next 7/8] ixgbe: extend PTP gettime function to read 
> system
> clock
> 
> -static int ixgbe_ptp_gettime(struct ptp_clock_info *ptp, struct timespec64 
> *ts)
> +static int ixgbe_ptp_gettimex(struct ptp_clock_info *ptp,
> +   struct timespec64 *ts,
> +   struct ptp_system_timestamp *sts)
>  {
>   struct ixgbe_adapter *adapter =
>   container_of(ptp, struct ixgbe_adapter, ptp_caps);
> + struct ixgbe_hw *hw = &adapter->hw;
>   unsigned long flags;
> - u64 ns;
> + u64 ns, stamp;
> 
>   spin_lock_irqsave(&adapter->tmreg_lock, flags);
> - ns = timecounter_read(&adapter->hw_tc);
> +
> + switch (adapter->hw.mac.type) {
> + case ixgbe_mac_X550:
> + case ixgbe_mac_X550EM_x:
> + case ixgbe_mac_x550em_a:
> + /* Upper 32 bits represent billions of cycles, lower 32 bits
> +  * represent cycles. However, we use timespec64_to_ns for the
> +  * correct math even though the units haven't been corrected
> +  * yet.
> +  */
> + ptp_read_system_prets(sts);
> + IXGBE_READ_REG(hw, IXGBE_SYSTIMR);
> + ptp_read_system_postts(sts);
> + ts->tv_nsec = IXGBE_READ_REG(hw, IXGBE_SYSTIML);
> + ts->tv_sec = IXGBE_READ_REG(hw, IXGBE_SYSTIMH);
> + stamp = timespec64_to_ns(ts);
> + break;
> + default:
> + ptp_read_system_prets(sts);
> + stamp = IXGBE_READ_REG(hw, IXGBE_SYSTIML);
> + ptp_read_system_postts(sts);
> + stamp |= (u64)IXGBE_READ_REG(hw, IXGBE_SYSTIMH) << 32;
> + break;
> + }
> +
> + ns = timecounter_cyc2time(&adapter->hw_tc, stamp);
> +

At first, I was confused by this entire block of code, but then realized that 
we can't update the timecounter_read method, so we instead have to break this 
out so that our calls to ptp_read_system_prets() and ptp_read_system_postts() 
can be added between the register reads.

Ok, that makes sense.

>   spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
> 
>   *ts = ns_to_timespec64(ns);
> @@ -567,10 +597,14 @@ void ixgbe_ptp_overflow_check(struct ixgbe_adapter
> *adapter)
>  {
>   bool timeout = time_is_before_jiffies(adapter->last_overflow_check +
>IXGBE_OVERFLOW_PERIOD);
> - struct timespec64 ts;
> + unsigned long flags;
> 
>   if (timeout) {
> - ixgbe_ptp_gettime(&adapter->ptp_caps, &ts);
> + /* Update the timecounter */
> + spin_lock_irqsave(&adapter->tmreg_lock, flags);
> + timecounter_read(&adapter->hw_tc);
> + spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
> +

This also explains this change where we now have to update the timecounter 
during the overflow check.

Ok, this makes sense to me.

Thanks,
Jake


[PATCH] add an initial version of snmp_counter.rst

2018-11-09 Thread yupeng
The snmp_counter.rst document runs a set of simple experiments and
explains the meaning of snmp counters based on the experiments'
results. This is an initial version and only covers a small part of
the snmp counters.

Signed-off-by: yupeng 
---
 Documentation/networking/index.rst|   1 +
 Documentation/networking/snmp_counter.rst | 963 ++
 2 files changed, 964 insertions(+)
 create mode 100644 Documentation/networking/snmp_counter.rst

diff --git a/Documentation/networking/index.rst 
b/Documentation/networking/index.rst
index bd89dae8d578..6a47629ef8ed 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -31,6 +31,7 @@ Contents:
net_failover
alias
bridge
+   snmp_counter
 
 .. only::  subproject
 
diff --git a/Documentation/networking/snmp_counter.rst 
b/Documentation/networking/snmp_counter.rst
new file mode 100644
index ..2939c5acf675
--- /dev/null
+++ b/Documentation/networking/snmp_counter.rst
@@ -0,0 +1,963 @@
+
+snmp counter tutorial
+=====================
+
+This document explains the meaning of snmp counters. To make their
+meaning easier to understand, this document doesn't explain the
+counters one by one; instead, it creates a set of experiments and
+explains the counters based on the experiments' results. The
+experiments run on one or two virtual machines. Except for the test
+commands used in the experiments, the virtual machines have no other
+network traffic. We use the 'nstat' command to read the values of the
+snmp counters; before every test we run 'nstat -n' to update the
+history, so the 'nstat' output only shows the changes of the snmp
+counters. For more information about nstat, please refer to:
+
+http://man7.org/linux/man-pages/man8/nstat.8.html
+
+icmp ping
+=========
+
+Run the ping command against the public dns server 8.8.8.8::
+
+  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
+  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
+  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms
+
+  --- 8.8.8.8 ping statistics ---
+  1 packets transmitted, 1 received, 0% packet loss, time 0ms
+  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms
+
+The nstat result::
+
+  nstatuser@nstat-a:~$ nstat
+  #kernel
+  IpInReceives1  0.0
+  IpInDelivers1  0.0
+  IpOutRequests   1  0.0
+  IcmpInMsgs  1  0.0
+  IcmpInEchoReps  1  0.0
+  IcmpOutMsgs 1  0.0
+  IcmpOutEchos1  0.0
+  IcmpMsgInType0  1  0.0
+  IcmpMsgOutType8 1  0.0
+  IpExtInOctets   84 0.0
+  IpExtOutOctets  84 0.0
+  IpExtInNoECTPkts1  0.0
+
+The nstat output can be divided into two parts: one with the 'Ext'
+keyword and one without it. If a counter name doesn't have 'Ext', it
+is defined by one of the snmp RFCs; if it has 'Ext', it is a kernel
+extension counter. Below we explain them one by one.
+
+The rfc defined counters
+--
+
+* IpInReceives
+The total number of input datagrams received from interfaces,
+including those received in error.
+
+https://tools.ietf.org/html/rfc1213#page-26
+
+* IpInDelivers
+The total number of input datagrams successfully delivered to IP
+user-protocols (including ICMP).
+
+https://tools.ietf.org/html/rfc1213#page-28
+
+* IpOutRequests
+The total number of IP datagrams which local IP user-protocols
+(including ICMP) supplied to IP in requests for transmission.  Note
+that this counter does not include any datagrams counted in
+ipForwDatagrams.
+
+https://tools.ietf.org/html/rfc1213#page-28
+
+* IcmpInMsgs
+The total number of ICMP messages which the entity received.  Note
+that this counter includes all those counted by icmpInErrors.
+
+https://tools.ietf.org/html/rfc1213#page-41
+
+* IcmpInEchoReps
+The number of ICMP Echo Reply messages received.
+
+https://tools.ietf.org/html/rfc1213#page-42
+
+* IcmpOutMsgs
+The total number of ICMP messages which this entity attempted to send.
+Note that this counter includes all those counted by icmpOutErrors.
+
+https://tools.ietf.org/html/rfc1213#page-43
+
+* IcmpOutEchos
+The number of ICMP Echo (request) messages sent.
+
+https://tools.ietf.org/html/rfc1213#page-45
+
+IcmpMsgInType0 and IcmpMsgOutType8 are not defined by any snmp related
+RFCs, but their meaning is quite straightforward: they count the
+number of packets of specific icmp types. The icmp types are listed
+here:
+
+https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml
+
+Type 8 is echo, type 0 is echo reply.
+
+Until now, we can easily explain these items of the nstat: We sent an
+icmp echo request, so IpOutRequests, IcmpOutMsgs, IcmpOutEchos and

RE: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread Keller, Jacob E
> -Original Message-
> From: Miroslav Lichvar [mailto:mlich...@redhat.com]
> Sent: Friday, November 09, 2018 2:15 AM
> To: netdev@vger.kernel.org
> Cc: Richard Cochran ; Keller, Jacob E
> ; Miroslav Lichvar ; Marcelo
> Tosatti ; Kirsher, Jeffrey T 
> ;
> Michael Chan 
> Subject: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization
> 
> RFC->v1:
> - added new patches
> - separated PHC timestamp from ptp_system_timestamp
> - fixed memory leak in PTP_SYS_OFFSET_EXTENDED
> - changed PTP_SYS_OFFSET_EXTENDED to work with array of arrays
> - fixed PTP_SYS_OFFSET_EXTENDED to break correctly from loop
> - fixed timecounter updates in drivers
> - split gettimex in igb driver
> - fixed ptp_read_* functions to be available without
>   CONFIG_PTP_1588_CLOCK
> 
> This series enables a more accurate synchronization between PTP hardware
> clocks and the system clock.

Thanks for doing this, Miroslav!

> 
> The first two patches are minor cleanup/bug fixes.
> 
> The third patch adds an extended version of the PTP_SYS_OFFSET ioctl,
> which returns three timestamps for each measurement. The idea is to
> shorten the interval between the system timestamps to contain just the
> reading of the lowest register of the PHC in order to reduce the error
> in the measured offset and get a smaller upper bound on the maximum
> error.
> 
> The fourth patch deprecates the original gettime function.
> 
> The remaining patches update the gettime function in order to support
> the new ioctl in the e1000e, igb, ixgbe, and tg3 drivers.
> 
> Tests with few different NICs in different machines show that:
> - with an I219 (e1000e) the measured delay was reduced from 2500 to 1300
>   ns and the error in the measured offset, when compared to the cross
>   timestamping supported by the driver, was reduced by a factor of 5
> - with an I210 (igb) the delay was reduced from 5100 to 1700 ns
> - with an I350 (igb) the delay was reduced from 2300 to 750 ns
> - with an X550 (ixgbe) the delay was reduced from 1950 to 650 ns
> - with a BCM5720 (tg3) the delay was reduced from 2400 to 1200 ns
> 

Impressive results!

For the main portions and the Intel driver changes this is

Reviewed-by: Jacob Keller 

Regards,
Jake

> 
> Miroslav Lichvar (8):
>   ptp: reorder declarations in ptp_ioctl()
>   ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl
>   ptp: add PTP_SYS_OFFSET_EXTENDED ioctl
>   ptp: deprecate gettime64() in favor of gettimex64()
>   e1000e: extend PTP gettime function to read system clock
>   igb: extend PTP gettime function to read system clock
>   ixgbe: extend PTP gettime function to read system clock
>   tg3: extend PTP gettime function to read system clock
> 
>  drivers/net/ethernet/broadcom/tg3.c  | 19 --
>  drivers/net/ethernet/intel/e1000e/e1000.h|  3 +
>  drivers/net/ethernet/intel/e1000e/netdev.c   | 42 ++---
>  drivers/net/ethernet/intel/e1000e/ptp.c  | 16 +++--
>  drivers/net/ethernet/intel/igb/igb_ptp.c | 65 +---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 54 +---
>  drivers/ptp/ptp_chardev.c| 55 ++---
>  drivers/ptp/ptp_clock.c  |  5 +-
>  include/linux/ptp_clock_kernel.h | 33 ++
>  include/uapi/linux/ptp_clock.h   | 12 
>  10 files changed, 253 insertions(+), 51 deletions(-)
> 
> --
> 2.17.2
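
For context, a minimal user-space sketch of how the new ioctl might be
consumed, assuming the uapi layout proposed in patch 3 (n_samples plus
an array of [system-before, PHC, system-after] timestamp triplets);
the device path and sample count are arbitrary assumptions:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ptp_clock.h>

int main(void)
{
	struct ptp_sys_offset_extended req;
	int fd = open("/dev/ptp0", O_RDONLY);

	if (fd < 0)
		return 1;

	memset(&req, 0, sizeof(req));
	req.n_samples = 5;
	if (ioctl(fd, PTP_SYS_OFFSET_EXTENDED, &req))
		return 1;

	for (unsigned int i = 0; i < req.n_samples; i++) {
		/* delay = sys_after - sys_before; offset ~ phc - midpoint */
		long long t1 = req.ts[i][0].sec * 1000000000LL + req.ts[i][0].nsec;
		long long tp = req.ts[i][1].sec * 1000000000LL + req.ts[i][1].nsec;
		long long t2 = req.ts[i][2].sec * 1000000000LL + req.ts[i][2].nsec;

		printf("delay %lld ns, offset %lld ns\n",
		       t2 - t1, tp - (t1 + t2) / 2);
	}
	return 0;
}

The shorter the t2 - t1 window reported per sample, the tighter the
bound on the error of the measured PHC-to-system offset.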



[PATCH net-next 3/3] net: phy: improve and inline phy_change

2018-11-09 Thread Heiner Kallweit
Now that phy_mac_interrupt() doesn't call phy_change() any longer it's
called from phy_interrupt() only. Therefore phy_interrupt_is_valid()
returns true always and the check can be removed.
In case of PHY_HALTED phy_interrupt() bails out immediately,
therefore the second check for PHY_HALTED including the call to
phy_disable_interrupts() can be removed.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/phy.c | 47 ++-
 1 file changed, 15 insertions(+), 32 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ce1e8130a..083977d2f 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -722,41 +722,12 @@ static int phy_disable_interrupts(struct phy_device 
*phydev)
return phy_clear_interrupt(phydev);
 }
 
-/**
- * phy_change - Called by the phy_interrupt to handle PHY changes
- * @phydev: phy_device struct that interrupted
- */
-static irqreturn_t phy_change(struct phy_device *phydev)
-{
-   if (phy_interrupt_is_valid(phydev)) {
-   if (phydev->drv->did_interrupt &&
-   !phydev->drv->did_interrupt(phydev))
-   return IRQ_NONE;
-
-   if (phydev->state == PHY_HALTED)
-   if (phy_disable_interrupts(phydev))
-   goto phy_err;
-   }
-
-   /* reschedule state queue work to run as soon as possible */
-   phy_trigger_machine(phydev);
-
-   if (phy_interrupt_is_valid(phydev) && phy_clear_interrupt(phydev))
-   goto phy_err;
-   return IRQ_HANDLED;
-
-phy_err:
-   phy_error(phydev);
-   return IRQ_NONE;
-}
-
 /**
  * phy_interrupt - PHY interrupt handler
  * @irq: interrupt line
  * @phy_dat: phy_device pointer
  *
- * Description: When a PHY interrupt occurs, the handler disables
- * interrupts, and uses phy_change to handle the interrupt.
+ * Description: Handle PHY interrupt
  */
 static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 {
@@ -765,7 +736,19 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
if (PHY_HALTED == phydev->state)
return IRQ_NONE;/* It can't be ours.  */
 
-   return phy_change(phydev);
+   if (phydev->drv->did_interrupt && !phydev->drv->did_interrupt(phydev))
+   return IRQ_NONE;
+
+   /* reschedule state queue work to run as soon as possible */
+   phy_trigger_machine(phydev);
+
+   if (phy_clear_interrupt(phydev))
+   goto phy_err;
+   return IRQ_HANDLED;
+
+phy_err:
+   phy_error(phydev);
+   return IRQ_NONE;
 }
 
 /**
@@ -846,7 +829,7 @@ void phy_stop(struct phy_device *phydev)
phy_state_machine(&phydev->state_queue.work);
 
/* Cannot call flush_scheduled_work() here as desired because
-* of rtnl_lock(), but PHY_HALTED shall guarantee phy_change()
+* of rtnl_lock(), but PHY_HALTED shall guarantee irq handler
 * will not reenable interrupts.
 */
 }
-- 
2.19.1




[PATCH net-next 2/3] net: phy: simplify phy_mac_interrupt and related functions

2018-11-09 Thread Heiner Kallweit
When using phy_mac_interrupt() the irq number is set to
PHY_IGNORE_INTERRUPT, therefore phy_interrupt_is_valid() returns false.
As a result phy_change() effectively just calls phy_trigger_machine()
when called from phy_mac_interrupt() via phy_change_work(). So we can
call phy_trigger_machine() from phy_mac_interrupt() directly and
remove some now unneeded code.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/phy.c| 14 +-
 drivers/net/phy/phy_device.c |  1 -
 include/linux/phy.h  |  3 ---
 3 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index da41420df..ce1e8130a 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -750,18 +750,6 @@ static irqreturn_t phy_change(struct phy_device *phydev)
return IRQ_NONE;
 }
 
-/**
- * phy_change_work - Scheduled by the phy_mac_interrupt to handle PHY changes
- * @work: work_struct that describes the work to be done
- */
-void phy_change_work(struct work_struct *work)
-{
-   struct phy_device *phydev =
-   container_of(work, struct phy_device, phy_queue);
-
-   phy_change(phydev);
-}
-
 /**
  * phy_interrupt - PHY interrupt handler
  * @irq: interrupt line
@@ -1005,7 +993,7 @@ void phy_state_machine(struct work_struct *work)
 void phy_mac_interrupt(struct phy_device *phydev)
 {
/* Trigger a state machine change */
-   queue_work(system_power_efficient_wq, &phydev->phy_queue);
+   phy_trigger_machine(phydev);
 }
 EXPORT_SYMBOL(phy_mac_interrupt);
 
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 00a46218c..0f56d408b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -587,7 +587,6 @@ struct phy_device *phy_device_create(struct mii_bus *bus, 
int addr, int phy_id,
 
mutex_init(&phydev->lock);
INIT_DELAYED_WORK(&phydev->state_queue, phy_state_machine);
-   INIT_WORK(&phydev->phy_queue, phy_change_work);
 
/* Request the appropriate module unconditionally; don't
 * bother trying to do so only if it isn't already loaded,
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 7db07e69c..17d1f6472 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -369,7 +369,6 @@ struct phy_c45_device_ids {
  * giving up on the current attempt at acquiring a link
  * irq: IRQ number of the PHY's interrupt (-1 if none)
  * phy_timer: The timer for handling the state machine
- * phy_queue: A work_queue for the phy_mac_interrupt
  * attached_dev: The attached enet driver's device instance ptr
  * adjust_link: Callback for the enet controller to respond to
  * changes in the link state.
@@ -454,7 +453,6 @@ struct phy_device {
void *priv;
 
/* Interrupt and Polling infrastructure */
-   struct work_struct phy_queue;
struct delayed_work state_queue;
 
struct mutex lock;
@@ -1029,7 +1027,6 @@ int phy_driver_register(struct phy_driver *new_driver, 
struct module *owner);
 int phy_drivers_register(struct phy_driver *new_driver, int n,
 struct module *owner);
 void phy_state_machine(struct work_struct *work);
-void phy_change_work(struct work_struct *work);
 void phy_mac_interrupt(struct phy_device *phydev);
 void phy_start_machine(struct phy_device *phydev);
 void phy_stop_machine(struct phy_device *phydev);
-- 
2.19.1




[PATCH net-next 1/3] net: phy: don't set state PHY_CHANGELINK in phy_change

2018-11-09 Thread Heiner Kallweit
State PHY_CHANGELINK isn't needed here, we can call the state machine
directly. We just have to remove the check for phy_polling_mode() to
make this work also in interrupt mode. Removing this check doesn't
cause any overhead because when not polling the state machine is
called only if required by some event.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/phy.c | 8 
 include/linux/phy.h   | 7 ++-
 2 files changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 8dac890f3..da41420df 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -738,11 +738,6 @@ static irqreturn_t phy_change(struct phy_device *phydev)
goto phy_err;
}
 
-   mutex_lock(&phydev->lock);
-   if ((PHY_RUNNING == phydev->state) || (PHY_NOLINK == phydev->state))
-   phydev->state = PHY_CHANGELINK;
-   mutex_unlock(&phydev->lock);
-
/* reschedule state queue work to run as soon as possible */
phy_trigger_machine(phydev);
 
@@ -946,9 +941,6 @@ void phy_state_machine(struct work_struct *work)
break;
case PHY_NOLINK:
case PHY_RUNNING:
-   if (!phy_polling_mode(phydev))
-   break;
-   /* fall through */
case PHY_CHANGELINK:
case PHY_RESUMING:
err = phy_check_link_status(phydev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 59bb31ee1..7db07e69c 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -298,7 +298,7 @@ struct phy_device *mdiobus_scan(struct mii_bus *bus, int 
addr);
  * - timer moves to NOLINK or RUNNING
  *
  * NOLINK: PHY is up, but not currently plugged in.
- * - If the timer notes that the link comes back, we move to RUNNING
+ * - irq or timer will set RUNNING if link comes back
  * - phy_stop moves to HALTED
  *
  * FORCING: PHY is being configured with forced settings
@@ -309,10 +309,7 @@ struct phy_device *mdiobus_scan(struct mii_bus *bus, int 
addr);
  *
  * RUNNING: PHY is currently up, running, and possibly sending
  * and/or receiving packets
- * - timer will set CHANGELINK if we're polling (this ensures the
- *   link state is polled every other cycle of this state machine,
- *   which makes it every other second)
- * - irq will set CHANGELINK
+ * - irq or timer will set NOLINK if link goes down
  * - phy_stop moves to HALTED
  *
  * CHANGELINK: PHY experienced a change in link state
-- 
2.19.1




[PATCH net-next 0/3] net: phy: further phylib simplifications after recent changes to the state machine

2018-11-09 Thread Heiner Kallweit
After the recent changes to the state machine phylib can be further
simplified (w/o having to make any assumptions).

Heiner Kallweit (3):
  net: phy: don't set state PHY_CHANGELINK in phy_change
  net: phy: simplify phy_mac_interrupt and related functions
  net: phy: improve and inline phy_change

 drivers/net/phy/phy.c| 67 
 drivers/net/phy/phy_device.c |  1 -
 include/linux/phy.h  | 10 ++
 3 files changed, 17 insertions(+), 61 deletions(-)

-- 
2.19.1



Re: [PATCH v5 bpf-next 0/7] bpftool: support loading flow dissector

2018-11-09 Thread Quentin Monnet
2018-11-09 08:21 UTC-0800 ~ Stanislav Fomichev 
> v5 changes:
> * FILE -> PATH for load/loadall (can be either file or directory now)
> * simpler implementation for __bpf_program__pin_name
> * removed p_err for REQ_ARGS checks
> * parse_atach_detach_args -> parse_attach_detach_args
> * for -> while in bpf_object__pin_{programs,maps} recovery
> 
> v4 changes:
> * addressed another round of comments/style issues from Jakub Kicinski &
>   Quentin Monnet (thanks!)
> * implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
>   used them in bpf_program__pin
> * added new pin_name to bpf_program so bpf_program__pin
>   works with sections that contain '/'
> * moved *loadall* command implementation into a separate patch
> * added patch that implements *pinmaps* to pin maps when doing
>   load/loadall
> 
> v3 changes:
> * (maybe) better cleanup for partial failure in bpf_object__pin
> * added special case in bpf_program__pin for programs with single
>   instances
> 
> v2 changes:
> * addressed comments/style issues from Jakub Kicinski & Quentin Monnet
> * removed logic that populates jump table
> * added cleanup for partial failure in bpf_object__pin
> 
> This patch series adds support for loading and attaching flow dissector
> programs from the bpftool:
> 
> * first patch fixes flow dissector section name in the selftests (so
>   libbpf auto-detection works)
> * second patch adds proper cleanup to bpf_object__pin, parts of which are now
>   being used to attach all flow dissector progs/maps
> * third patch adds special case in bpf_program__pin for programs with
>   single instances (we don't create /0 pin anymore, just )
> * forth patch adds pin_name to the bpf_program struct
>   which is now used as a pin name in bpf_program__pin et al
> * fifth patch adds *loadall* command that pins all programs, not just
>   the first one
> * sixth patch adds *pinmaps* argument to load/loadall to let users pin
>   all maps of the obj file
> * seventh patch adds actual flow_dissector support to the bpftool and
>   an example

The series look good to me, thanks!

For the bpftool parts:
Acked-by: Quentin Monnet 



[PATCH v2 net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-09 Thread Heiner Kallweit
As a heritage from the very early days of phylib member interrupts is
defined as u32 even though it's just a flag whether interrupts are
enabled. So we can change it to a bitfield member. In addition change
the code dealing with this member in a way that it's clear we're
dealing with a bool value.

Signed-off-by: Heiner Kallweit 
---
v2:
- use false/true instead of 0/1 for the constants

Actually this member isn't needed at all and could be replaced with
a parameter in phy_driver->config_intr. But this would mean an API
change, so maybe I'll come up with a proposal for that later.
---
 drivers/net/phy/phy.c |  4 ++--
 include/linux/phy.h   | 10 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index dd5bff955..8dac890f3 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -115,9 +115,9 @@ static int phy_clear_interrupt(struct phy_device *phydev)
  *
  * Returns 0 on success or < 0 on error.
  */
-static int phy_config_interrupt(struct phy_device *phydev, u32 interrupts)
+static int phy_config_interrupt(struct phy_device *phydev, bool interrupts)
 {
-   phydev->interrupts = interrupts;
+   phydev->interrupts = interrupts ? 1 : 0;
if (phydev->drv->config_intr)
return phydev->drv->config_intr(phydev);
 
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 240e04d5a..59bb31ee1 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -262,8 +262,8 @@ static inline struct mii_bus *devm_mdiobus_alloc(struct 
device *dev)
 void devm_mdiobus_free(struct device *dev, struct mii_bus *bus);
 struct phy_device *mdiobus_scan(struct mii_bus *bus, int addr);
 
-#define PHY_INTERRUPT_DISABLED 0x0
-#define PHY_INTERRUPT_ENABLED  0x8000
+#define PHY_INTERRUPT_DISABLED false
+#define PHY_INTERRUPT_ENABLED  true
 
 /* PHY state machine states:
  *
@@ -409,6 +409,9 @@ struct phy_device {
/* The most recently read link state */
unsigned link:1;
 
+   /* Interrupts are enabled */
+   unsigned interrupts:1;
+
enum phy_state state;
 
u32 dev_flags;
@@ -424,9 +427,6 @@ struct phy_device {
int pause;
int asym_pause;
 
-   /* Enabled Interrupts */
-   u32 interrupts;
-
/* Union of PHY and Attached devices' supported modes */
/* See mii.h for more info */
u32 supported;
-- 
2.19.1



Re: [PATCH bpf-next 2/4] bpf: Split bpf_sk_lookup

2018-11-09 Thread Andrey Ignatov
Martin Lau  [Fri, 2018-11-09 09:19 -0800]:
> On Thu, Nov 08, 2018 at 08:54:23AM -0800, Andrey Ignatov wrote:
> > Split bpf_sk_lookup to separate core functionality, that can be reused
> > to make socket lookup available to more program types, from
> > functionality specific to program types that have access to skb.
> > 
> > Core functionality is placed to __bpf_sk_lookup. And bpf_sk_lookup only
> > gets caller netns and ifindex from skb and passes it to __bpf_sk_lookup.
> > 
> > Program types that don't have access to skb can just pass NULL to
> > __bpf_sk_lookup that will be handled correctly by both inet{,6}_sdif and
> > lookup functions.
> > 
> > This is refactoring that simply moves blocks around and does NOT change
> > existing logic.
> > 
> > Signed-off-by: Andrey Ignatov 
> > Acked-by: Alexei Starovoitov 
> > ---
> >  net/core/filter.c | 38 +++---
> >  1 file changed, 23 insertions(+), 15 deletions(-)
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 9a1327eb25fa..dc0f86a707b7 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -4825,14 +4825,10 @@ static const struct bpf_func_proto 
> > bpf_lwt_seg6_adjust_srh_proto = {
> >  
> >  #ifdef CONFIG_INET
> >  static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple 
> > *tuple,
> > - struct sk_buff *skb, u8 family, u8 proto)
> > + struct sk_buff *skb, u8 family, u8 proto, int dif)
> >  {
> > bool refcounted = false;
> > struct sock *sk = NULL;
> > -   int dif = 0;
> > -
> > -   if (skb->dev)
> > -   dif = skb->dev->ifindex;
> >  
> > if (family == AF_INET) {
> > __be32 src4 = tuple->ipv4.saddr;
> > @@ -4875,16 +4871,16 @@ static struct sock *sk_lookup(struct net *net, 
> > struct bpf_sock_tuple *tuple,
> > return sk;
> >  }
> >  
> > -/* bpf_sk_lookup performs the core lookup for different types of sockets,
> > +/* __bpf_sk_lookup performs the core lookup for different types of sockets,
> >   * taking a reference on the socket if it doesn't have the flag 
> > SOCK_RCU_FREE.
> >   * Returns the socket as an 'unsigned long' to simplify the casting in the
> >   * callers to satisfy BPF_CALL declarations.
> >   */
> >  static unsigned long
> > -bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> > - u8 proto, u64 netns_id, u64 flags)
> > +__bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> > +   u8 proto, u64 netns_id, struct net *caller_net, int ifindex,
> > +   u64 flags)
> That looks a bit different from the one landed to bpf-next.
> You may need to respin the set.

Since Nitin's version is landed now, I'll rebase on top of it and this
patch just won't be needed (initially I did it to unblock myself).

I'll also address the nit in patch 3 and send v2 with both changes.

Thanks Martin!

> >  {
> > -   struct net *caller_net;
> > struct sock *sk = NULL;
> > u8 family = AF_UNSPEC;
> > struct net *net;
> > @@ -4893,19 +4889,15 @@ bpf_sk_lookup(struct sk_buff *skb, struct 
> > bpf_sock_tuple *tuple, u32 len,
> > if (unlikely(family == AF_UNSPEC || netns_id > U32_MAX || flags))
> > goto out;
> >  
> > -   if (skb->dev)
> > -   caller_net = dev_net(skb->dev);
> > -   else
> > -   caller_net = sock_net(skb->sk);
> > if (netns_id) {
> > net = get_net_ns_by_id(caller_net, netns_id);
> > if (unlikely(!net))
> > goto out;
> > -   sk = sk_lookup(net, tuple, skb, family, proto);
> > +   sk = sk_lookup(net, tuple, skb, family, proto, ifindex);
> > put_net(net);
> > } else {
> > net = caller_net;
> > -   sk = sk_lookup(net, tuple, skb, family, proto);
> > +   sk = sk_lookup(net, tuple, skb, family, proto, ifindex);
> > }
> >  
> > if (sk)
> > @@ -4914,6 +4906,22 @@ bpf_sk_lookup(struct sk_buff *skb, struct 
> > bpf_sock_tuple *tuple, u32 len,
> > return (unsigned long) sk;
> >  }
> >  
> > +static unsigned long
> > +bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> > + u8 proto, u64 netns_id, u64 flags)
> > +{
> > +   struct net *caller_net = sock_net(skb->sk);
> > +   int ifindex = 0;
> > +
> > +   if (skb->dev) {
> > +   caller_net = dev_net(skb->dev);
> > +   ifindex = skb->dev->ifindex;
> > +   }
> > +
> > +   return __bpf_sk_lookup(skb, tuple, len, proto, netns_id, caller_net,
> > +  ifindex, flags);
> > +}
> > +
> >  BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb,
> >struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
> >  {
> > -- 
> > 2.17.1
> > 

-- 
Andrey Ignatov


Re: bring back IPX and NCPFS, please!

2018-11-09 Thread Johannes C. Schulz
Hello Willy, hello Stephen

Thank you for your reply.
But I'm not able to maintain or code these modules. I'm just a bloody
user/webdev. It would be really nice if these modules would find a good
maintainer!

Best regards
Johannes

On Fri, 9 Nov 2018 at 17:09, Willy Tarreau  wrote:
>
> On Fri, Nov 09, 2018 at 02:23:27PM +0100, Johannes C. Schulz wrote:
> > Hello all!
> >
> > I like to please you to bring back IPX and NCPFS modules to the kernel.
> > Whyever my admins using Novell-shares on our network which I'm not be
> > able to use anymore - I'm forced to use cifs instead (and the admins
> > will kill the cifs-shares in some time), because my kernel (4.18) does
> > not have support for ncpfs anymore.
> > Maybe we at my work are not enough people that just for us this
> > modules will come back, but maybe out there are other people.
> > Thank you.
>
> Well, like any code, it requires time and skills. If nobody with the
> required skills is available for this anymore, there's no way you'll
> get a feature back. However you could always step up to maintain it
> yourself if you have the time and are willing to develop your own
> skills at it. It's how maintainers change over time for certain parts
> of the system, so you have an opportunity here.
>
> Just my two cents,
> Willy



-- 
Best regards
Johannes C. Schulz

„Programmer - n. [proh-gram-er] an organism that turns caffeine and
pizza into software“


Re: [PATCH bpf-next 4/4] selftest/bpf: Use bpf_sk_lookup_{tcp,udp} in test_sock_addr

2018-11-09 Thread Martin Lau
On Thu, Nov 08, 2018 at 08:54:25AM -0800, Andrey Ignatov wrote:
> Use bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers from
> test_sock_addr programs to make sure they're available and can lookup
> and release socket properly for IPv4/IPv4, TCP/UDP.
> 
> Reading from a few fields of returned struct bpf_sock is also tested.
> 
Acked-by: Martin KaFai Lau 


Re: [PATCH][RFC] udp: cache sock to avoid searching it twice

2018-11-09 Thread Eric Dumazet



On 11/08/2018 10:21 PM, Li RongQing wrote:
> GRO for UDP needs to lookup socket twice, first is in gro receive,
> second is gro complete, so if store sock to skb to avoid looking up
> twice, this can give small performance boost
> 
> netperf -t UDP_RR -l 10
> 
> Before:
>   Rate per sec: 28746.01
> After:
>   Rate per sec: 29401.67
> 
> Signed-off-by: Li RongQing 
> ---
>  net/ipv4/udp_offload.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 0646d61f4fa8..429570112a33 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -408,6 +408,11 @@ struct sk_buff *udp_gro_receive(struct list_head *head, 
> struct sk_buff *skb,
>  
>   if (udp_sk(sk)->gro_enabled) {
>   pp = call_gro_receive(udp_gro_receive_segment, head, skb);
> +
> + if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
> + sock_hold(sk);
> + pp->sk = sk;


You also have to set pp->destructor to sock_edemux

flush_gro_hash -> kfree_skb()

If there is no destructor, the reference on pp->sk will never be released.
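
Something like this on top of the hunk above (sketch only):

	if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
		sock_hold(sk);
		pp->sk = sk;
		/* release the reference when the skb is freed, e.g. via
		 * flush_gro_hash() -> kfree_skb()
		 */
		pp->destructor = sock_edemux;
	}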




> + }
>   rcu_read_unlock();
>   return pp;
>   }
> @@ -444,6 +449,10 @@ struct sk_buff *udp_gro_receive(struct list_head *head, 
> struct sk_buff *skb,
>   skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
>   pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
>  
> + if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
> + sock_hold(sk);
> + pp->sk = sk;
> + }
>  out_unlock:
>   rcu_read_unlock();
>   skb_gro_flush_final(skb, pp, flush);
> @@ -502,7 +511,9 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
>   uh->len = newlen;
>  
>   rcu_read_lock();
> - sk = (*lookup)(skb, uh->source, uh->dest);
> + sk = skb->sk;
> + if (!sk)
> + sk = (*lookup)(skb, uh->source, uh->dest);
>   if (sk && udp_sk(sk)->gro_enabled) {
>   err = udp_gro_complete_segment(skb);
>   } else if (sk && udp_sk(sk)->gro_complete) {
> @@ -516,6 +527,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
>   err = udp_sk(sk)->gro_complete(sk, skb,
>   nhoff + sizeof(struct udphdr));
>   }
> +
> + if (skb->sk) {
> + sock_put(skb->sk);
> + skb->sk = NULL;
> + }
>   rcu_read_unlock();
>  
>   if (skb->remcsum_offload)
> 


Re: [PATCH bpf-next 3/4] bpf: Support socket lookup in CGROUP_SOCK_ADDR progs

2018-11-09 Thread Martin Lau
On Thu, Nov 08, 2018 at 08:54:24AM -0800, Andrey Ignatov wrote:
> Make bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers
> available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR.
> 
> Such programs operate on sockets and have access to socket and struct
> sockaddr passed by user to system calls such as sys_bind, sys_connect,
> sys_sendmsg.
> 
> It's useful to be able to lookup other sockets from these programs.
> E.g. sys_connect may lookup IP:port endpoint and if there is a server
> socket bound to that endpoint ("server" can be defined by saddr & sport
> being zero), redirect client connection to it by rewriting IP:port in
> sockaddr passed to sys_connect.
> 
> Signed-off-by: Andrey Ignatov 
> Acked-by: Alexei Starovoitov 
> ---
>  net/core/filter.c | 53 +++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index dc0f86a707b7..2e8575a34a1e 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4971,6 +4971,51 @@ static const struct bpf_func_proto 
> bpf_sk_release_proto = {
>   .ret_type   = RET_INTEGER,
>   .arg1_type  = ARG_PTR_TO_SOCKET,
>  };
> +
> +static unsigned long
> +bpf_sock_addr_sk_lookup(struct sock *sk, struct bpf_sock_tuple *tuple, u32 
> len,
> + u8 proto, u64 netns_id, u64 flags)
Nit. This func looks unnecessary; it's as good as directly calling __bpf_sk_lookup().
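e.g., roughly (same arguments, just without the wrapper):

	BPF_CALL_5(bpf_sock_addr_sk_lookup_tcp, struct bpf_sock_addr_kern *, ctx,
		   struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
	{
		return __bpf_sk_lookup(NULL, tuple, len, IPPROTO_TCP, netns_id,
				       sock_net(ctx->sk), 0, flags);
	}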

Others LGTM.

> +{
> + return __bpf_sk_lookup(NULL, tuple, len, proto, netns_id, sock_net(sk),
> +0, flags);
> +}
> +
> +BPF_CALL_5(bpf_sock_addr_sk_lookup_tcp, struct bpf_sock_addr_kern *, ctx,
> +struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
> +{
> + return bpf_sock_addr_sk_lookup(ctx->sk, tuple, len, IPPROTO_TCP,
> +netns_id, flags);
> +}
> +
> +static const struct bpf_func_proto bpf_sock_addr_sk_lookup_tcp_proto = {
> + .func   = bpf_sock_addr_sk_lookup_tcp,
> + .gpl_only   = false,
> + .ret_type   = RET_PTR_TO_SOCKET_OR_NULL,
> + .arg1_type  = ARG_PTR_TO_CTX,
> + .arg2_type  = ARG_PTR_TO_MEM,
> + .arg3_type  = ARG_CONST_SIZE,
> + .arg4_type  = ARG_ANYTHING,
> + .arg5_type  = ARG_ANYTHING,
> +};
> +
> +BPF_CALL_5(bpf_sock_addr_sk_lookup_udp, struct bpf_sock_addr_kern *, ctx,
> +struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
> +{
> + return bpf_sock_addr_sk_lookup(ctx->sk, tuple, len, IPPROTO_UDP,
> +netns_id, flags);
> +}
> +
> +static const struct bpf_func_proto bpf_sock_addr_sk_lookup_udp_proto = {
> + .func   = bpf_sock_addr_sk_lookup_udp,
> + .gpl_only   = false,
> + .ret_type   = RET_PTR_TO_SOCKET_OR_NULL,
> + .arg1_type  = ARG_PTR_TO_CTX,
> + .arg2_type  = ARG_PTR_TO_MEM,
> + .arg3_type  = ARG_CONST_SIZE,
> + .arg4_type  = ARG_ANYTHING,
> + .arg5_type  = ARG_ANYTHING,
> +};
> +
>  #endif /* CONFIG_INET */
>  
>  bool bpf_helper_changes_pkt_data(void *func)
> @@ -5077,6 +5122,14 @@ sock_addr_func_proto(enum bpf_func_id func_id, const 
> struct bpf_prog *prog)
>   return &bpf_get_socket_cookie_sock_addr_proto;
>   case BPF_FUNC_get_local_storage:
>   return &bpf_get_local_storage_proto;
> +#ifdef CONFIG_INET
> + case BPF_FUNC_sk_lookup_tcp:
> + return &bpf_sock_addr_sk_lookup_tcp_proto;
> + case BPF_FUNC_sk_lookup_udp:
> + return &bpf_sock_addr_sk_lookup_udp_proto;
> + case BPF_FUNC_sk_release:
> + return &bpf_sk_release_proto;
> +#endif /* CONFIG_INET */
>   default:
>   return bpf_base_func_proto(func_id);
>   }
> -- 
> 2.17.1
> 


Re: [PATCH bpf-next 2/4] bpf: Split bpf_sk_lookup

2018-11-09 Thread Martin Lau
On Thu, Nov 08, 2018 at 08:54:23AM -0800, Andrey Ignatov wrote:
> Split bpf_sk_lookup to separate core functionality, that can be reused
> to make socket lookup available to more program types, from
> functionality specific to program types that have access to skb.
> 
> Core functionality is placed to __bpf_sk_lookup. And bpf_sk_lookup only
> gets caller netns and ifindex from skb and passes it to __bpf_sk_lookup.
> 
> Program types that don't have access to skb can just pass NULL to
> __bpf_sk_lookup that will be handled correctly by both inet{,6}_sdif and
> lookup functions.
> 
> This is refactoring that simply moves blocks around and does NOT change
> existing logic.
> 
> Signed-off-by: Andrey Ignatov 
> Acked-by: Alexei Starovoitov 
> ---
>  net/core/filter.c | 38 +++---
>  1 file changed, 23 insertions(+), 15 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 9a1327eb25fa..dc0f86a707b7 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4825,14 +4825,10 @@ static const struct bpf_func_proto 
> bpf_lwt_seg6_adjust_srh_proto = {
>  
>  #ifdef CONFIG_INET
>  static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
> -   struct sk_buff *skb, u8 family, u8 proto)
> +   struct sk_buff *skb, u8 family, u8 proto, int dif)
>  {
>   bool refcounted = false;
>   struct sock *sk = NULL;
> - int dif = 0;
> -
> - if (skb->dev)
> - dif = skb->dev->ifindex;
>  
>   if (family == AF_INET) {
>   __be32 src4 = tuple->ipv4.saddr;
> @@ -4875,16 +4871,16 @@ static struct sock *sk_lookup(struct net *net, struct 
> bpf_sock_tuple *tuple,
>   return sk;
>  }
>  
> -/* bpf_sk_lookup performs the core lookup for different types of sockets,
> +/* __bpf_sk_lookup performs the core lookup for different types of sockets,
>   * taking a reference on the socket if it doesn't have the flag 
> SOCK_RCU_FREE.
>   * Returns the socket as an 'unsigned long' to simplify the casting in the
>   * callers to satisfy BPF_CALL declarations.
>   */
>  static unsigned long
> -bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> -   u8 proto, u64 netns_id, u64 flags)
> +__bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> + u8 proto, u64 netns_id, struct net *caller_net, int ifindex,
> + u64 flags)
That looks a bit different from the one landed to bpf-next.
You may need to respin the set.

>  {
> - struct net *caller_net;
>   struct sock *sk = NULL;
>   u8 family = AF_UNSPEC;
>   struct net *net;
> @@ -4893,19 +4889,15 @@ bpf_sk_lookup(struct sk_buff *skb, struct 
> bpf_sock_tuple *tuple, u32 len,
>   if (unlikely(family == AF_UNSPEC || netns_id > U32_MAX || flags))
>   goto out;
>  
> - if (skb->dev)
> - caller_net = dev_net(skb->dev);
> - else
> - caller_net = sock_net(skb->sk);
>   if (netns_id) {
>   net = get_net_ns_by_id(caller_net, netns_id);
>   if (unlikely(!net))
>   goto out;
> - sk = sk_lookup(net, tuple, skb, family, proto);
> + sk = sk_lookup(net, tuple, skb, family, proto, ifindex);
>   put_net(net);
>   } else {
>   net = caller_net;
> - sk = sk_lookup(net, tuple, skb, family, proto);
> + sk = sk_lookup(net, tuple, skb, family, proto, ifindex);
>   }
>  
>   if (sk)
> @@ -4914,6 +4906,22 @@ bpf_sk_lookup(struct sk_buff *skb, struct 
> bpf_sock_tuple *tuple, u32 len,
>   return (unsigned long) sk;
>  }
>  
> +static unsigned long
> +bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
> +   u8 proto, u64 netns_id, u64 flags)
> +{
> + struct net *caller_net = sock_net(skb->sk);
> + int ifindex = 0;
> +
> + if (skb->dev) {
> + caller_net = dev_net(skb->dev);
> + ifindex = skb->dev->ifindex;
> + }
> +
> + return __bpf_sk_lookup(skb, tuple, len, proto, netns_id, caller_net,
> +ifindex, flags);
> +}
> +
>  BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb,
>  struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
>  {
> -- 
> 2.17.1
> 


Re: [PATCH 08/20] octeontx2-af: Alloc and config NPC MCAM entry at a time

2018-11-09 Thread Sunil Kovvuri
On Fri, Nov 9, 2018 at 4:32 PM Arnd Bergmann  wrote:
>
> On Fri, Nov 9, 2018 at 5:21 AM Sunil Kovvuri  wrote:
> >
> > On Fri, Nov 9, 2018 at 2:13 AM Arnd Bergmann  wrote:
> > >
> > > On Thu, Nov 8, 2018 at 7:37 PM  wrote:
> > > > @@ -666,4 +668,20 @@ struct npc_mcam_unmap_counter_req {
> > > > u8  all;   /* Unmap all entries using this counter ? */
> > > >  };
> > > >
> > > > +struct npc_mcam_alloc_and_write_entry_req {
> > > > +   struct mbox_msghdr hdr;
> > > > +   struct mcam_entry entry_data;
> > > > +   u16 ref_entry;
> > > > +   u8  priority;/* Lower or higher w.r.t ref_entry */
> > > > +   u8  intf;/* Rx or Tx interface */
> > > > +   u8  enable_entry;/* Enable this MCAM entry ? */
> > > > +   u8  alloc_cntr;  /* Allocate counter and map ? */
> > > > +};
> > >
> > > I noticed that this structure requires padding at the end because
> > > struct mbox_msghdr has a 32-bit alignment requirement. For
> > > data structures in an interface, I'd recommend avoiding that kind
> > > of padding and adding reserved fields or widening the types
> > > accordingly.
> > >
> >
> > When there are multiple messages in the mailbox, each message starts
> > at a 16byte aligned offset. So struct mbox_msghdr is always aligned.
> > I think adding reserved fields is not needed here.
> >
> > ===
> > struct mbox_msghdr *otx2_mbox_alloc_msg_rsp(struct otx2_mbox *mbox, int 
> > devid,
> > int size, int size_rsp)
> > {
> > size = ALIGN(size, MBOX_MSG_ALIGN);
> > ===
> >
> > Is this what you were referring to ?
> >
>
> No, I mean the padding at the end of the structure. An example
> would be a structure like
>
> struct s {
> u16 a;
> u32 b;
> u16 c;
> };
>
> Since b is aligned to four bytes, you get padding between a and b.
> On top of that, you also get padding after c to make the size of
> structure itself be a multiple of its alignment. For interfaces, we
> should avoid both kinds of padding. This can be done by marking
> members as __packed (usually I don't recommend that), by
> changing the size of members, or by adding explicit 'reserved'
> fields in place of the padding.
>
> > > I also noticed a similar problem in struct mbox_msghdr. Maybe
> > > use the 'pahole' tool to check for this kind of padding in the
> > > API structures.
>
>  Arnd

Got your point now and agree that the padding has to be avoided.
But this is a big change, and the structure pointed out above is not
the only one; this applies to all structures in the file.
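
For the struct in your example, making the holes explicit would look roughly
like this (illustrative only):

	struct s {
		u16 a;
		u16 reserved0;	/* explicit, instead of compiler padding before b */
		u32 b;
		u16 c;
		u16 reserved1;	/* keeps the total size a multiple of 4 */
	};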

Would it be okay if I submit a separate patch after this series
addressing all the structures?

Thanks,
Sunil.


Re: [PATCH][net-next] net: tcp: remove BUG_ON from tcp_v4_err

2018-11-09 Thread Eric Dumazet



On 11/09/2018 01:04 AM, Li RongQing wrote:
> if skb is NULL pointer, and the following access of skb's
> skb_mstamp_ns will trigger panic, which is same as BUG_ON
> 
> Signed-off-by: Li RongQing 
> ---
>  net/ipv4/tcp_ipv4.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index a336787d75e5..5424a4077c27 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -542,7 +542,6 @@ int tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
>   icsk->icsk_rto = inet_csk_rto_backoff(icsk, TCP_RTO_MAX);
>  
>   skb = tcp_rtx_queue_head(sk);
> - BUG_ON(!skb);
>  
>   tcp_mstamp_refresh(tp);
>   delta_us = (u32)(tp->tcp_mstamp - tcp_skb_timestamp_us(skb));
> 

SGTM, thanks.

Signed-off-by: Eric Dumazet 


Re: [PATCH iproute] ss: Actually print left delimiter for columns

2018-11-09 Thread Stephen Hemminger
On Mon, 29 Oct 2018 23:04:25 +0100
Stefano Brivio  wrote:

> While rendering columns, we use a local variable to keep track of the
> field currently being printed, without touching current_field, which is
> used for buffering.
> 
> Use the right pointer to access the left delimiter for the current column,
> instead of always printing the left delimiter for the last buffered field,
> which is usually an empty string.
> 
> This fixes an issue especially visible on narrow terminals, where some
> columns might be displayed without separation.
> 
> Reported-by: YoyPa 
> Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a 
> table")
> Signed-off-by: Stefano Brivio 
> Tested-by: YoyPa 

This patch broke the testsuite/ss/ssfilter.t test.
Please fix the test to match your new output format, or I will have to
revert it.


Re: [PATCH 11/20] octeontx2-af: Add support for stripping STAG/CTAG

2018-11-09 Thread Sunil Kovvuri
On Fri, Nov 9, 2018 at 4:42 PM Arnd Bergmann  wrote:
>
> On Fri, Nov 9, 2018 at 5:29 AM Sunil Kovvuri  wrote:
> > On Fri, Nov 9, 2018 at 2:17 AM Arnd Bergmann  wrote:
> > > On Thu, Nov 8, 2018 at 7:37 PM  wrote:
>
> > >
> > > Here is another instance of bitfields in an interface structure. As
> > > before, please try to avoid doing that and use bit shifts and masks
> > > instead.
> > >
> > >Arnd
> >
> > No, this struct is not part of communication interface.
> > This is used to fill up a register in a bit more readable fashion
> > instead of plain bit shifts.
>
> But this is still an interface, isn't it? Writing to the register
> implies that there is some hardware that interprets the
> bits, so they have to be in the right place.
>
> > ===
> > struct nix_rx_vtag_action vtag_action;
> >
> > *(u64 *)&vtag_action = 0;
> > vtag_action.vtag0_valid = 1;
> > /* must match type set in NIX_VTAG_CFG */
> > vtag_action.vtag0_type = 0;
> > vtag_action.vtag0_lid = NPC_LID_LA;
> > vtag_action.vtag0_relptr = 12;
> > entry.vtag_action = *(u64 *)&vtag_action;
> >
> > /* Set TAG 'action' */
> > rvu_write64(rvu, blkaddr, NPC_AF_MCAMEX_BANKX_TAG_ACT(index, 
> > actbank),
> > entry->vtag_action);
>
> I assume this rvu_write64() does a cpu_to_le64() swap on big-endian,
> so the contents again are in the wrong place. I don't see any non-reserved
> fields that span an 8-bit boundary, so you can probably rearrange the bits
> to make it work, but generally speaking it's better to not rely on how the
> compiler lays out bit fields.
>
> Arnd

Agreed.
Will fix and submit a new series.
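
Probably something along these lines, reusing the names from the snippet above
(the field positions here are made up; only the shift/mask technique matters):

	#include <linux/bitfield.h>

	/* hypothetical bit layout for the vtag0 action fields */
	#define VTAG0_RELPTR_MASK	GENMASK_ULL(7, 0)
	#define VTAG0_LID_MASK		GENMASK_ULL(10, 8)
	#define VTAG0_TYPE_MASK		GENMASK_ULL(14, 12)
	#define VTAG0_VALID_BIT		BIT_ULL(15)

	u64 vtag_action = FIELD_PREP(VTAG0_RELPTR_MASK, 12) |
			  FIELD_PREP(VTAG0_LID_MASK, NPC_LID_LA) |
			  FIELD_PREP(VTAG0_TYPE_MASK, 0) |
			  VTAG0_VALID_BIT;

	rvu_write64(rvu, blkaddr,
		    NPC_AF_MCAMEX_BANKX_TAG_ACT(index, actbank), vtag_action);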

Thanks,
Sunil.


Re: [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously

2018-11-09 Thread Jason Gunthorpe
On Fri, Nov 09, 2018 at 06:26:22PM +0200, Leon Romanovsky wrote:
> On Thu, Nov 08, 2018 at 07:49:03PM +, Jason Gunthorpe wrote:
> > On Thu, Nov 08, 2018 at 09:10:15PM +0200, Leon Romanovsky wrote:
> > > From: Moni Shoua 
> > >
> > > Telling the HCA that page fault handling is done and QP can resume
> > > its flow is done in the context of the page fault handler. This blocks
> > > the handling of the next work in queue without a need.
> > > Call the PAGE_FAULT_RESUME command in an asynchronous manner and free
> > > the workqueue to pick the next work item for handling. All tasks that
> > > were executed after PAGE_FAULT_RESUME need to be done now
> > > in the callback of the asynchronous command mechanism.
> > >
> > > Signed-off-by: Moni Shoua 
> > > Signed-off-by: Leon Romanovsky 
> > >  drivers/infiniband/hw/mlx5/odp.c | 110 +--
> > >  include/linux/mlx5/driver.h  |   3 +
> > >  2 files changed, 94 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/mlx5/odp.c 
> > > b/drivers/infiniband/hw/mlx5/odp.c
> > > index abce55b8b9ba..0c4f469cdd5b 100644
> > > +++ b/drivers/infiniband/hw/mlx5/odp.c
> > > @@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct 
> > > mlx5_ib_dev *dev)
> > >   return;
> > >  }
> > >
> > > +struct pfault_resume_cb_ctx {
> > > + struct mlx5_ib_dev *dev;
> > > + struct mlx5_core_rsc_common *res;
> > > + struct mlx5_pagefault *pfault;
> > > +};
> > > +
> > > +static void page_fault_resume_callback(int status, void *context)
> > > +{
> > > + struct pfault_resume_cb_ctx *ctx = context;
> > > + struct mlx5_pagefault *pfault = ctx->pfault;
> > > +
> > > + if (status)
> > > + mlx5_ib_err(ctx->dev, "Resolve the page fault failed with 
> > > status %d\n",
> > > + status);
> > > +
> > > + if (ctx->res)
> > > + mlx5_core_res_put(ctx->res);
> > > + kfree(pfault);
> > > + kfree(ctx);
> > > +}
> > > +
> > >  static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
> > > +   struct mlx5_core_rsc_common *res,
> > > struct mlx5_pagefault *pfault,
> > > -   int error)
> > > +   int error,
> > > +   bool async)
> > >  {
> > > + int ret = 0;
> > > + u32 *out = pfault->out_pf_resume;
> > > + u32 *in = pfault->in_pf_resume;
> > > + u32 token = pfault->token;
> > >   int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
> > > -  pfault->wqe.wq_num : pfault->token;
> > > - int ret = mlx5_core_page_fault_resume(dev->mdev,
> > > -   pfault->token,
> > > -   wq_num,
> > > -   pfault->type,
> > > -   error);
> > > - if (ret)
> > > - mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 
> > > 0x%x\n",
> > > - wq_num);
> > > + pfault->wqe.wq_num : pfault->token;
> > > + u8 type = pfault->type;
> > > + struct pfault_resume_cb_ctx *ctx = NULL;
> > > +
> > > + if (async)
> > > + ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
> >
> > Why not allocate this ctx ast part of the mlx5_pagefault and avoid
> > this allocation failure strategy?
> 
> It is another way to implement it, both of them are correct.

.. I think it is a lot better to move this allocation; it gets rid of
this ugly duplicated code.
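
e.g. (sketch only, with the context folded into the pagefault itself):

	struct mlx5_pagefault {
		/* ... existing fields ... */
		struct {
			struct mlx5_ib_dev *dev;
			struct mlx5_core_rsc_common *res;
		} resume_ctx;
	};

	static void page_fault_resume_callback(int status, void *context)
	{
		struct mlx5_pagefault *pfault = context;

		if (status)
			mlx5_ib_err(pfault->resume_ctx.dev,
				    "Resolve the page fault failed with status %d\n",
				    status);
		if (pfault->resume_ctx.res)
			mlx5_core_res_put(pfault->resume_ctx.res);
		kfree(pfault);	/* the context goes away with the pagefault */
	}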

> Can I assume that we can progress with patches except patch #2?

Let's drop this one too.

Jason


Re: [PATCH iproute2 net-next v2 0/2] Add DF configuration for VXLAN and GENEVE link types

2018-11-09 Thread David Ahern
On 11/8/18 4:21 AM, Stefano Brivio wrote:
> This series adds configuration of the DF bit in outgoing IPv4 packets for
> VXLAN and GENEVE link types.
> 
> Stefano Brivio (2):
>   iplink_vxlan: Add DF configuration
>   iplink_geneve: Add DF configuration
> 
>  include/uapi/linux/if_link.h | 18 ++
>  ip/iplink_geneve.c   | 29 +
>  ip/iplink_vxlan.c| 29 +
>  man/man8/ip-link.8.in| 28 
>  4 files changed, 104 insertions(+)
> 

applied to iproute2-next. Thanks



Re: Should the bridge learn from frames with link local destination MAC address?

2018-11-09 Thread nikolay
On 9 November 2018 18:24:18 EET, Roopa Prabhu  wrote:
>On Fri, Nov 9, 2018 at 8:00 AM Stephen Hemminger
> wrote:
>>
>> On Fri, 9 Nov 2018 04:24:43 +0100
>> Andrew Lunn  wrote:
>>
>> > Hi Roopa, Nikolay
>> >
>> > br_handle_frame() looks out for frames with a destination MAC
>> > addresses with is Ethernet link local, those which fit
>> > 01-80-C2-00-00-XX. It does not normally forward these, but it will
>> > deliver them locally.
>> >
>> > Should the bridge perform learning on such frames?
>> >
>> > I've got a setup with two bridges connected together with multiple
>> > links between them. STP has done its thing, and blocked one of the
>> > ports to solve the loop.
>> >
>> > host0   host1
>> > +-+ +-+
>> > | lan0 forwarding |-| lan0 forwarding |
>> > | |   | |
>> > | lan1 forwarding |-| lan1 blocked|
>> > +-+   +-+
>> >
>> > I have LLDP running on both system, and they are sending out
>periodic
>> > frames on each port.
>> >
>> > Now, lan0 and lan1 on host1 use the same MAC address.  So i see the
>> > MAC address bouncing between ports because of the LLDP packets.
>> >
>> > # bridge monitor
>> > 00:26:55:d2:27:a8 dev lan1 master br0
>> > 00:26:55:d2:27:a8 dev lan0 master br0
>> > 00:26:55:d2:27:a8 dev lan1 master br0
>> > 00:26:55:d2:27:a8 dev lan0 master br0
>> > 00:26:55:d2:27:a8 dev lan1 master br0
>> >
>> > This then results in normal traffic from host0 to host1 being sent
>to
>> > the blocked port for some of the time.
>> >
>> > LLDP is using 01-80-C2-00-00-0E, a link local MAC address. If the
>> > bridge did not learn on such frames, i think this setup would
>> > work. The bridge would learn from ARP, IP etc, coming from the
>> > forwarding port of host1, and the blocked port would be ignored.
>> >
>> > I've tried a similar setup with a hardware switch, Marvell 6352. It
>> > never seems to learn from such frames.
>> >
>> > Thanks
>> >   Andrew
>>
>> I agree with your analysis. A properly operating 802 compliant bridge
>> should not learn link local addresses.  But changing that in Linux
>bridge
>> would probably break some users. There is already a hack to forward
>link
>> local frames. There are many usages of Linux vswitch where this
>behavior
>> might be a problem:
>> 1. a container or VM hub
>> 2. bump in the wire filter
>> 3. L2 nat etc.
>>
>> So what ever you decide it has to be optional and unfortunately
>default
>> to off.
>>
>
>Andrew, I agree with your analysis also. We have hit this problem too
>(and we have an internal bug tracking it).
>We have not acted on this so far because of the fear of breaking
>existing deployments. I am all for fixing this if there is a
>clean way.

+1. Since this would be a new bridge boolean option, I'd like to add a single
new 64-bit value-with-mask attribute for boolean bridge options, so we can
avoid increasing the max rtnl attr id for every such option. Please let me know
if you plan to work on the new option, or I can cook something up.
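
Roughly something like this (purely illustrative; the name and widths are
hypothetical, not an existing uapi struct):

	struct br_bool_opts {
		__u64 optval;	/* bit N carries the value of boolean option N */
		__u64 optmask;	/* bit N set means option N is being changed */
	};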


Thanks,
Nik


[PATCH iproute2] testsuite: correctly use CC macros for generate_nlmsg

2018-11-09 Thread Luca Boccassi
It's $(QUIET_CC)$(CC), not $(QUIET_CC); a copy-paste error. CI does a
verbose build, so it slipped through.

Fixes: 6e7d347aabbb ("testsuite: build generate_nlmsg with QUIET_CC")

Signed-off-by: Luca Boccassi 
---
 testsuite/tools/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/testsuite/tools/Makefile b/testsuite/tools/Makefile
index 85df69ec..344e58b0 100644
--- a/testsuite/tools/Makefile
+++ b/testsuite/tools/Makefile
@@ -3,7 +3,7 @@ CFLAGS=
 include ../../config.mk
 
 generate_nlmsg: generate_nlmsg.c ../../lib/libnetlink.c
-   $(QUIET_CC) $(CPPFLAGS) $(CFLAGS) $(EXTRA_CFLAGS) -I../../include 
-include../../include/uapi/linux/netlink.h -o $@ $^ -lmnl
+   $(QUIET_CC)$(CC) $(CPPFLAGS) $(CFLAGS) $(EXTRA_CFLAGS) -I../../include 
-include../../include/uapi/linux/netlink.h -o $@ $^ -lmnl
 
 clean:
rm -f generate_nlmsg
-- 
2.19.1



Re: [PATCH mlx5-next 08/10] IB/mlx5: Call PAGE_FAULT_RESUME command asynchronously

2018-11-09 Thread Leon Romanovsky
On Thu, Nov 08, 2018 at 07:49:03PM +, Jason Gunthorpe wrote:
> On Thu, Nov 08, 2018 at 09:10:15PM +0200, Leon Romanovsky wrote:
> > From: Moni Shoua 
> >
> > Telling the HCA that page fault handling is done and QP can resume
> > its flow is done in the context of the page fault handler. This blocks
> > the handling of the next work in queue without a need.
> > Call the PAGE_FAULT_RESUME command in an asynchronous manner and free
> > the workqueue to pick the next work item for handling. All tasks that
> > were executed after PAGE_FAULT_RESUME need to be done now
> > in the callback of the asynchronous command mechanism.
> >
> > Signed-off-by: Moni Shoua 
> > Signed-off-by: Leon Romanovsky 
> >  drivers/infiniband/hw/mlx5/odp.c | 110 +--
> >  include/linux/mlx5/driver.h  |   3 +
> >  2 files changed, 94 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/odp.c 
> > b/drivers/infiniband/hw/mlx5/odp.c
> > index abce55b8b9ba..0c4f469cdd5b 100644
> > +++ b/drivers/infiniband/hw/mlx5/odp.c
> > @@ -298,20 +298,78 @@ void mlx5_ib_internal_fill_odp_caps(struct 
> > mlx5_ib_dev *dev)
> > return;
> >  }
> >
> > +struct pfault_resume_cb_ctx {
> > +   struct mlx5_ib_dev *dev;
> > +   struct mlx5_core_rsc_common *res;
> > +   struct mlx5_pagefault *pfault;
> > +};
> > +
> > +static void page_fault_resume_callback(int status, void *context)
> > +{
> > +   struct pfault_resume_cb_ctx *ctx = context;
> > +   struct mlx5_pagefault *pfault = ctx->pfault;
> > +
> > +   if (status)
> > +   mlx5_ib_err(ctx->dev, "Resolve the page fault failed with 
> > status %d\n",
> > +   status);
> > +
> > +   if (ctx->res)
> > +   mlx5_core_res_put(ctx->res);
> > +   kfree(pfault);
> > +   kfree(ctx);
> > +}
> > +
> >  static void mlx5_ib_page_fault_resume(struct mlx5_ib_dev *dev,
> > + struct mlx5_core_rsc_common *res,
> >   struct mlx5_pagefault *pfault,
> > - int error)
> > + int error,
> > + bool async)
> >  {
> > +   int ret = 0;
> > +   u32 *out = pfault->out_pf_resume;
> > +   u32 *in = pfault->in_pf_resume;
> > +   u32 token = pfault->token;
> > int wq_num = pfault->event_subtype == MLX5_PFAULT_SUBTYPE_WQE ?
> > -pfault->wqe.wq_num : pfault->token;
> > -   int ret = mlx5_core_page_fault_resume(dev->mdev,
> > - pfault->token,
> > - wq_num,
> > - pfault->type,
> > - error);
> > -   if (ret)
> > -   mlx5_ib_err(dev, "Failed to resolve the page fault on WQ 
> > 0x%x\n",
> > -   wq_num);
> > +   pfault->wqe.wq_num : pfault->token;
> > +   u8 type = pfault->type;
> > +   struct pfault_resume_cb_ctx *ctx = NULL;
> > +
> > +   if (async)
> > +   ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>
> Why not allocate this ctx ast part of the mlx5_pagefault and avoid
> this allocation failure strategy?

It is another way to implement it, both of them are correct.

Can I assume that we can progress with patches except patch #2?

Thanks

>
> Jason



