date:20181109

Re: [PATCH net 0/5] net: aquantia: 2018-11 bugfixes

2018-11-09 Thread David Miller

From: Igor Russkikh 
Date: Fri, 9 Nov 2018 11:53:54 +

> The patchset fixes a number of bugs found in various areas after
> driver validation.

Series applied, thank you.

Please, when you provide a Fixes: tag, do not separate it with the
other Signed-off-by: and Acked-by: etc. tags with an empty line.  It
is just another tag, so keep them all together without any kind of
separation like that.

A lot of people seem to do this, I wonder why :-)

Thank you.

Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread Richard Cochran

On Fri, Nov 09, 2018 at 03:28:46PM -0800, David Miller wrote:
> This series looks good to me but I want to give Richard an opportunity to
> review it first.

The series is good to go.

Acked-by: Richard Cochran

[PATCH net] net: qualcomm: rmnet: Fix incorrect assignment of real_dev

2018-11-09 Thread Subash Abhinov Kasiviswanathan

A null dereference was observed when a sysctl was being set
from userspace and rmnet was stuck trying to complete some actions
in the NETDEV_REGISTER callback. This is because the real_dev is set
only after the device registration handler completes.

sysctl call stack -

<6> Unable to handle kernel NULL pointer dereference at
virtual address 0108
<2> pc : rmnet_vnd_get_iflink+0x1c/0x28
<2> lr : dev_get_iflink+0x2c/0x40
<2>  rmnet_vnd_get_iflink+0x1c/0x28
<2>  inet6_fill_ifinfo+0x15c/0x234
<2>  inet6_ifinfo_notify+0x68/0xd4
<2>  ndisc_ifinfo_sysctl_change+0x1b8/0x234
<2>  proc_sys_call_handler+0xac/0x100
<2>  proc_sys_write+0x3c/0x4c
<2>  __vfs_write+0x54/0x14c
<2>  vfs_write+0xcc/0x188
<2>  SyS_write+0x60/0xc0
<2>  el0_svc_naked+0x34/0x38

device register call stack -

<2>  notifier_call_chain+0x84/0xbc
<2>  raw_notifier_call_chain+0x38/0x48
<2>  call_netdevice_notifiers_info+0x40/0x70
<2>  call_netdevice_notifiers+0x38/0x60
<2>  register_netdevice+0x29c/0x3d8
<2>  rmnet_vnd_newlink+0x68/0xe8
<2>  rmnet_newlink+0xa0/0x160
<2>  rtnl_newlink+0x57c/0x6c8
<2>  rtnetlink_rcv_msg+0x1dc/0x328
<2>  netlink_rcv_skb+0xac/0x118
<2>  rtnetlink_rcv+0x24/0x30
<2>  netlink_unicast+0x158/0x1f0
<2>  netlink_sendmsg+0x32c/0x338
<2>  sock_sendmsg+0x44/0x60
<2>  SyS_sendto+0x150/0x1ac
<2>  el0_svc_naked+0x34/0x38

Fixes: b752eff5be24 ("net: qualcomm: rmnet: Implement ndo_get_iflink")
Signed-off-by: Sean Tranchetti 
Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 0afc3d3..d11c16a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -234,7 +234,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
  struct net_device *real_dev,
  struct rmnet_endpoint *ep)
 {
-   struct rmnet_priv *priv;
+   struct rmnet_priv *priv = netdev_priv(rmnet_dev);
int rc;
 
if (ep->egress_dev)
@@ -247,6 +247,8 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
rmnet_dev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
rmnet_dev->hw_features |= NETIF_F_SG;
 
+   priv->real_dev = real_dev;
+
rc = register_netdevice(rmnet_dev);
if (!rc) {
ep->egress_dev = rmnet_dev;
@@ -255,9 +257,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
 
rmnet_dev->rtnl_link_ops = &rmnet_link_ops;
 
-   priv = netdev_priv(rmnet_dev);
priv->mux_id = id;
-   priv->real_dev = real_dev;
 
netdev_dbg(rmnet_dev, "rmnet dev created\n");
}
-- 
1.9.1
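The commit message describes an ordering bug: a reader (here the sysctl path calling ndo_get_iflink) can run while the NETDEV_REGISTER notifiers are still executing, so any field it dereferences must be assigned before register_netdevice() is called. The following is a minimal userspace sketch of that ordering, not the driver code. `fake_dev`, `fake_priv`, `fake_register()` and `get_iflink()` are hypothetical stand-ins for the netdev, `struct rmnet_priv`, `register_netdevice()` and `rmnet_vnd_get_iflink()`, and the concurrent reader is simplified to a read performed during registration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the lower (real) net_device. */
struct fake_dev {
	int ifindex;
};

/* Hypothetical stand-in for struct rmnet_priv. */
struct fake_priv {
	struct fake_dev *real_dev;	/* read by get_iflink() */
};

/* Mirrors rmnet_vnd_get_iflink(): dereferences real_dev unconditionally. */
static int get_iflink(const struct fake_priv *priv)
{
	return priv->real_dev->ifindex;
}

/* What the simulated NETDEV_REGISTER listener observed. */
static int notifier_saw_iflink = -1;

/*
 * Models register_netdevice(): notifiers run during registration and
 * may call get_iflink(). Returns -1 where the kernel would oops.
 */
static int fake_register(struct fake_priv *priv)
{
	if (priv->real_dev == NULL)
		return -1;
	notifier_saw_iflink = get_iflink(priv);
	return 0;
}

/* Ordering before the patch: register first, assign real_dev afterwards. */
static int newlink_buggy(struct fake_priv *priv, struct fake_dev *real)
{
	int rc = fake_register(priv);

	if (rc == 0)
		priv->real_dev = real;
	return rc;
}

/* Ordering after the patch: assign real_dev, then register. */
static int newlink_fixed(struct fake_priv *priv, struct fake_dev *real)
{
	priv->real_dev = real;
	return fake_register(priv);
}
```

The only difference between the two helpers is where the assignment happens relative to registration, which is exactly what the patch moves.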

Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread David Miller

From: Jeff Kirsher 
Date: Fri, 09 Nov 2018 15:33:10 -0800

> On Fri, 2018-11-09 at 15:28 -0800, David Miller wrote:
>> From: Miroslav Lichvar 
>> Date: Fri,  9 Nov 2018 11:14:41 +0100
>> 
>> > RFC->v1:
>> > - added new patches
>> > - separated PHC timestamp from ptp_system_timestamp
>> > - fixed memory leak in PTP_SYS_OFFSET_EXTENDED
>> > - changed PTP_SYS_OFFSET_EXTENDED to work with array of arrays
>> > - fixed PTP_SYS_OFFSET_EXTENDED to break correctly from loop
>> > - fixed timecounter updates in drivers
>> > - split gettimex in igb driver
>> > - fixed ptp_read_* functions to be available without
>> >   CONFIG_PTP_1588_CLOCK
>> > 
>> > This series enables a more accurate synchronization between PTP hardware
>> > clocks and the system clock.
>>  ...
>> 
>> This series looks good to me but I want to give Richard an opportunity to
>> review it first.
> 
> Dave, I also do not want to hold this series up by picking up patches 5, 6
> and 7 (Intel drivers) so please apply the entire series after Richard
> provides his review.

Ok, will do.
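The cover letter only states the goal. As a hedged illustration of the general idea behind PTP_SYS_OFFSET_EXTENDED (a system timestamp taken immediately before and immediately after each PHC read), the sketch below estimates the PHC-to-system offset from a set of such samples. It picks the sample with the narrowest before/after window and assumes the PHC was latched at that window's midpoint. The `sandwich_sample` struct and the sample values are hypothetical; this is not the ioctl interface or the series' code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hypothetical sample, in nanoseconds. */
struct sandwich_sample {
	int64_t sys_before;	/* system clock just before the PHC read */
	int64_t phc;		/* PHC value returned by the read */
	int64_t sys_after;	/* system clock just after the PHC read */
};

/*
 * Returns the estimated (PHC - system) offset using the sample with
 * the smallest sys_after - sys_before window, assuming the PHC was
 * read at the midpoint of that window. *delay receives the window
 * width. n must be at least 1.
 */
static int64_t best_offset(const struct sandwich_sample *s, size_t n,
			   int64_t *delay)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (s[i].sys_after - s[i].sys_before <
		    s[best].sys_after - s[best].sys_before)
			best = i;

	*delay = s[best].sys_after - s[best].sys_before;
	int64_t mid = s[best].sys_before + *delay / 2;

	return s[best].phc - mid;
}
```

Narrower windows bound the uncertainty of each reading more tightly, which is why separating the PHC timestamp from the surrounding system timestamps can improve accuracy.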

Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Cong Wang

On Fri, Nov 9, 2018 at 5:39 PM Yunsheng Lin  wrote:
>
> On 2018/11/10 3:43, Cong Wang wrote:
> > Currently netdev_rx_csum_fault() only shows a device name,
> > we need more information about the skb for debugging.
> >
> > Sample output:
> >
> >  ens3: hw csum failure
> >  dev features: 0x00014b89
> >  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, csum_complete_sw=0, csum_valid=0
> >
> > Signed-off-by: Cong Wang 
> > ---
> >  include/linux/netdevice.h |  5 +++--
> >  net/core/datagram.c   |  6 +++---
> >  net/core/dev.c| 10 --
> >  net/sunrpc/socklib.c  |  2 +-
> >  4 files changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 857f8abf7b91..fabcd9fa6cf7 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -4332,9 +4332,10 @@ static inline bool can_checksum_protocol(netdev_features_t features,
> >  }
> >
> >  #ifdef CONFIG_BUG
> > -void netdev_rx_csum_fault(struct net_device *dev);
> > +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
> >  #else
> > -static inline void netdev_rx_csum_fault(struct net_device *dev)
> > +static inline void netdev_rx_csum_fault(struct net_device *dev,
> > + struct sk_buff *skb)
> >  {
> >  }
> >  #endif
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> > --- a/net/core/datagram.c
> > +++ b/net/core/datagram.c
> > @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
> >   if (likely(!sum)) {
> >   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >   !skb->csum_complete_sw)
> > - netdev_rx_csum_fault(skb->dev);
> > + netdev_rx_csum_fault(skb->dev, skb);
> >   }
> >   if (!skb_shared(skb))
> >   skb->csum_valid = !sum;
> > @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
> >   if (likely(!sum)) {
> >   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >   !skb->csum_complete_sw)
> > - netdev_rx_csum_fault(skb->dev);
> > + netdev_rx_csum_fault(skb->dev, skb);
> >   }
> >
> >   if (!skb_shared(skb)) {
> > @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
> >
> >   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >   !skb->csum_complete_sw)
> > - netdev_rx_csum_fault(NULL);
> > + netdev_rx_csum_fault(NULL, skb);
> >   }
> >   return 0;
> >  fault:
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 0ffcbdd55fa9..2b337df26117 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
> >
> >  /* Take action when hardware reception checksum errors are detected. */
> >  #ifdef CONFIG_BUG
> > -void netdev_rx_csum_fault(struct net_device *dev)
> > +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
> >  {
> >   if (net_ratelimit()) {
> >   pr_err("%s: hw csum failure\n", dev ? dev->name : "");
> > + if (dev)
> > + pr_err("dev features: %pNF\n", &dev->features);
> > + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
> > +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
> > +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
> > +skb->csum_complete_sw, skb->csum_valid);
>
>
> This function also have the netdev available, use netdev_err to log the error?

It is apparently not me who picked pr_err() from the beginning,
I just follow that pr_err(). If you are not happy with it, please send
a followup.


>
> Also, dev->features was dumped before this patch, why remove it?

Seriously? Where do I remove it? Please be specific. :)
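The patch's logging pattern is: print the device line with a fallback when `dev` is NULL (as in the `skb_copy_and_csum_datagram_msg()` caller, which passes NULL), print the features line only when a device exists, then always print the skb fields. The userspace sketch below reproduces that pattern into a buffer so it can be checked. `fake_skb` and `format_csum_fault()` are hypothetical, the `"<unknown>"` placeholder and the `0x%08lx` features format are assumptions chosen to match the sample output, and none of this is the kernel's printk or `%pNF` implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the skb fields the patch prints. */
struct fake_skb {
	unsigned int len, data_len;
	unsigned int gso_size, gso_type;
	unsigned int ip_summed, csum;
	unsigned int csum_complete_sw, csum_valid;
};

/*
 * Formats the report into buf. dev_name may be NULL; then a placeholder
 * name is used and the features line is skipped. Returns the total
 * length snprintf reported (may exceed size if truncated).
 */
static int format_csum_fault(char *buf, size_t size, const char *dev_name,
			     unsigned long features, const struct fake_skb *skb)
{
	int n = snprintf(buf, size, "%s: hw csum failure\n",
			 dev_name ? dev_name : "<unknown>");

	if (dev_name && (size_t)n < size)
		n += snprintf(buf + n, size - n, "dev features: 0x%08lx\n",
			      features);
	if ((size_t)n < size)
		n += snprintf(buf + n, size - n,
			      "skb len=%u data_len=%u gso_size=%u gso_type=%u "
			      "ip_summed=%u csum=%x, csum_complete_sw=%u, csum_valid=%u\n",
			      skb->len, skb->data_len, skb->gso_size,
			      skb->gso_type, skb->ip_summed, skb->csum,
			      skb->csum_complete_sw, skb->csum_valid);
	return n;
}
```

Passing the skb explicitly is what makes the second line possible at all; the device pointer alone carries none of this per-packet state.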

Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Yunsheng Lin

On 2018/11/10 10:09, Cong Wang wrote:
> On Fri, Nov 9, 2018 at 6:02 PM Yunsheng Lin  wrote:
>>
>> On 2018/11/10 9:42, Cong Wang wrote:
>>> On Fri, Nov 9, 2018 at 5:39 PM Yunsheng Lin  wrote:

 On 2018/11/10 3:43, Cong Wang wrote:
> Currently netdev_rx_csum_fault() only shows a device name,
> we need more information about the skb for debugging.
>
> Sample output:
>
>  ens3: hw csum failure
>  dev features: 0x00014b89
>  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, csum_complete_sw=0, csum_valid=0
>
> Signed-off-by: Cong Wang 
> ---
>  include/linux/netdevice.h |  5 +++--
>  net/core/datagram.c   |  6 +++---
>  net/core/dev.c| 10 --
>  net/sunrpc/socklib.c  |  2 +-
>  4 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 857f8abf7b91..fabcd9fa6cf7 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -4332,9 +4332,10 @@ static inline bool can_checksum_protocol(netdev_features_t features,
>  }
>
>  #ifdef CONFIG_BUG
> -void netdev_rx_csum_fault(struct net_device *dev);
> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
>  #else
> -static inline void netdev_rx_csum_fault(struct net_device *dev)
> +static inline void netdev_rx_csum_fault(struct net_device *dev,
> + struct sk_buff *skb)
>  {
>  }
>  #endif
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>   if (!skb_shared(skb))
>   skb->csum_valid = !sum;
> @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>
>   if (!skb_shared(skb)) {
> @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(NULL);
> + netdev_rx_csum_fault(NULL, skb);
>   }
>   return 0;
>  fault:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0ffcbdd55fa9..2b337df26117 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
>
>  /* Take action when hardware reception checksum errors are detected. */
>  #ifdef CONFIG_BUG
> -void netdev_rx_csum_fault(struct net_device *dev)
> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
>  {
>   if (net_ratelimit()) {
>   pr_err("%s: hw csum failure\n", dev ? dev->name : "");
> + if (dev)
> + pr_err("dev features: %pNF\n", &dev->features);
> + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
> +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
> +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
> +skb->csum_complete_sw, skb->csum_valid);


 This function also have the netdev available, use netdev_err to log the error?
>>>
>>> It is apparently not me who picked pr_err() from the beginning,
>>> I just follow that pr_err(). If you are not happy with it, please send
>>> a followup.
>>
>> Yes, but perhaps it is something to improve.
> 
> 
> Sure, no one stops you from improving it in a followup patch. :)
> 
> 
>> When using the netdev, then maybe it does not have to check if dev is null, because
>> netdev_err has handled the netdev being NULL case.
>> Maybe I missed something that netdev can not be used here?
>> If not, maybe I can send a followup.
>>
> 
> Maybe. Again, my patch intends to add a few debugging logs,
> not to convert pr_err() to whatever else, they are totally different
> goals. I choose pr_err() only because I follow the existing one,
> not to say which one is better than the other.

Ok. :)

> 
>

[PATCH net-next 2/6] nfp: flower: allow non repr netdev offload

2018-11-09 Thread Jakub Kicinski

From: John Hurley 

Previously the offload functions in NFP assumed that the ingress (or
egress) netdev passed to them was an nfp repr.

Modify the driver to permit the passing of non repr netdevs as the ingress
device for an offload rule candidate. This may include devices such as
tunnels. The driver should then base its offload decision on a combination
of ingress device and egress port for a rule.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/flower/action.c| 14 +++
 .../net/ethernet/netronome/nfp/flower/main.h  |  3 +-
 .../net/ethernet/netronome/nfp/flower/match.c | 38 ++-
 .../ethernet/netronome/nfp/flower/offload.c   | 33 +---
 4 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index fbc052d5bb47..2e64fe878da6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -149,11 +149,12 @@ nfp_fl_output(struct nfp_app *app, struct nfp_fl_output *output,
/* Set action output parameters. */
output->flags = cpu_to_be16(tmp_flags);
 
-   /* Only offload if egress ports are on the same device as the
-* ingress port.
-*/
-   if (!switchdev_port_same_parent_id(in_dev, out_dev))
-   return -EOPNOTSUPP;
+   if (nfp_netdev_is_nfp_repr(in_dev)) {
+   /* Confirm ingress and egress are on same device. */
+   if (!switchdev_port_same_parent_id(in_dev, out_dev))
+   return -EOPNOTSUPP;
+   }
+
if (!nfp_netdev_is_nfp_repr(out_dev))
return -EOPNOTSUPP;
 
@@ -840,9 +841,8 @@ nfp_flower_loop_action(struct nfp_app *app, const struct tc_action *a,
*a_len += sizeof(struct nfp_fl_push_vlan);
} else if (is_tcf_tunnel_set(a)) {
struct ip_tunnel_info *ip_tun = tcf_tunnel_info(a);
-   struct nfp_repr *repr = netdev_priv(netdev);
 
-   *tun_type = nfp_fl_get_tun_from_act_l4_port(repr->app, a);
+   *tun_type = nfp_fl_get_tun_from_act_l4_port(app, a);
if (*tun_type == NFP_FL_TUNNEL_NONE)
return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 0f6f1675f6f1..4a2b1a915131 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -222,7 +222,8 @@ void nfp_flower_metadata_cleanup(struct nfp_app *app);
 
 int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
enum tc_setup_type type, void *type_data);
-int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_flow_match(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
  struct nfp_fl_key_ls *key_ls,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/match.c b/drivers/net/ethernet/netronome/nfp/flower/match.c
index e54fb6034326..cdf75595f627 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/match.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/match.c
@@ -52,10 +52,13 @@ nfp_flower_compile_port(struct nfp_flower_in_port *frame, u32 cmsg_port,
return 0;
}
 
-   if (tun_type)
+   if (tun_type) {
frame->in_port = cpu_to_be32(NFP_FL_PORT_TYPE_TUN | tun_type);
-   else
+   } else {
+   if (!cmsg_port)
+   return -EOPNOTSUPP;
frame->in_port = cpu_to_be32(cmsg_port);
+   }
 
return 0;
 }
@@ -289,17 +292,21 @@ nfp_flower_compile_ipv4_udp_tun(struct nfp_flower_ipv4_udp_tun *frame,
}
 }
 
-int nfp_flower_compile_flow_match(struct tc_cls_flower_offload *flow,
+int nfp_flower_compile_flow_match(struct nfp_app *app,
+ struct tc_cls_flower_offload *flow,
  struct nfp_fl_key_ls *key_ls,
  struct net_device *netdev,
  struct nfp_fl_payload *nfp_flow,
  enum nfp_flower_tun_type tun_type)
 {
-   struct nfp_repr *netdev_repr;
+   u32 cmsg_port = 0;
int err;
u8 *ext;
u8 *msk;
 
+   if (nfp_netdev_is_nfp_repr(netdev))
+   cmsg_port = nfp_repr_get_port_id(netdev);
+
memset(nfp_flow->unmasked_data, 0, key_ls->key_size);
memset(nfp_flow->mask_data, 0, key_ls->key_size);
 
@@ -327,15 +334,13 @@ int
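The commit message says the offload decision should combine the ingress device and the egress port. Read together, the action.c and match.c hunks above amount to three rules: the same-parent check applies only when the ingress is a repr, the egress must always be a repr, and a non-tunnel match needs a nonzero port id (which only a repr ingress provides). The userspace sketch below restates those rules. `fake_netdev`, its fields and `FAKE_PORT_TYPE_TUN` are hypothetical, and the tunnel port encoding is illustrative only, not the NFP firmware value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_PORT_TYPE_TUN 0x50000000u	/* hypothetical, not the NFP constant */

/* Hypothetical view of a netdev as seen by the offload path. */
struct fake_netdev {
	bool is_repr;		/* models nfp_netdev_is_nfp_repr() */
	uint32_t parent_id;	/* models the switchdev parent ID */
	uint32_t port_id;	/* models nfp_repr_get_port_id() */
};

/* Output action check, following nfp_fl_output() after the patch. */
static int check_output(const struct fake_netdev *in,
			const struct fake_netdev *out)
{
	/* Only a repr ingress must share a parent with the egress. */
	if (in->is_repr && in->parent_id != out->parent_id)
		return -EOPNOTSUPP;
	/* The egress must always be a repr. */
	if (!out->is_repr)
		return -EOPNOTSUPP;
	return 0;
}

/* Ingress port match, following nfp_flower_compile_port() after the patch. */
static int compile_in_port(const struct fake_netdev *in, uint32_t tun_type,
			   uint32_t *in_port)
{
	uint32_t cmsg_port = in->is_repr ? in->port_id : 0;

	if (tun_type) {
		*in_port = FAKE_PORT_TYPE_TUN | tun_type;
		return 0;
	}
	/* A non-repr ingress without tunnel info cannot be matched. */
	if (!cmsg_port)
		return -EOPNOTSUPP;
	*in_port = cmsg_port;
	return 0;
}
```

For example, a VXLAN ingress with a repr egress passes `check_output()`, but is only matchable when a tunnel type is supplied.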

[PATCH net-next 1/6] net: sched: register callbacks for indirect tc block binds

2018-11-09 Thread Jakub Kicinski

From: John Hurley 

Currently drivers can register to receive TC block bind/unbind callbacks
by implementing the setup_tc ndo in any of their given netdevs. However,
drivers may also be interested in binds to higher level devices (e.g.
tunnel drivers) to potentially offload filters applied to them.

Introduce indirect block devs, which allow drivers to register callbacks
for block binds on other devices. The callback is triggered when the
device is bound to a block, allowing the driver to register for rules
applied to that block using already available functions.

Freeing an indirect block callback will trigger an unbind event (if
necessary) to direct the driver to remove any offloaded rules and unreg
any block rule callbacks. It is the responsibility of the implementing
driver to clean any registered indirect block callbacks before exiting,
if the block is still active at such a time.

Allow registering an indirect block dev callback for a device that is
already bound to a block. In this case (if it is an ingress block),
register and also trigger the callback meaning that any already installed
rules can be replayed to the calling driver.

Signed-off-by: John Hurley 
Signed-off-by: Jakub Kicinski 
---
 include/net/pkt_cls.h |  34 +
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c   | 256 +-
 3 files changed, 292 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 00f71644fbcd..f6c0cd29dea4 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -81,6 +81,14 @@ void __tcf_block_cb_unregister(struct tcf_block *block,
   struct tcf_block_cb *block_cb);
 void tcf_block_cb_unregister(struct tcf_block *block,
 tc_setup_cb_t *cb, void *cb_ident);
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb, void *cb_ident);
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident);
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb, void *cb_ident);
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb, void *cb_ident);
 
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 struct tcf_result *res, bool compat_mode);
@@ -183,6 +191,32 @@ void tcf_block_cb_unregister(struct tcf_block *block,
 {
 }
 
+static inline
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+   return 0;
+}
+
+static inline
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+   return 0;
+}
+
+static inline
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+}
+
+static inline
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb, void *cb_ident)
+{
+}
+
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
   struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a8dd1fc141b6..9481f2c142e2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -24,6 +24,9 @@ struct bpf_flow_keys;
 typedef int tc_setup_cb_t(enum tc_setup_type type,
  void *type_data, void *cb_priv);
 
+typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+   enum tc_setup_type type, void *type_data);
+
 struct qdisc_rate_table {
struct tc_ratespec rate;
u32 data[256];
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f427a1e00e7e..d92f44ac4c39 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -365,6 +366,245 @@ static void tcf_chain_flush(struct tcf_chain *chain)
}
 }
 
+static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
+{
+   const struct Qdisc_class_ops *cops;
+   struct Qdisc *qdisc;
+
+   if (!dev_ingress_queue(dev))
+   return NULL;
+
+   qdisc = dev_ingress_queue(dev)->qdisc_sleeping;
+   if (!qdisc)
+   return NULL;
+
+   cops = qdisc->ops->cl_ops;
+   if (!cops)
+   return NULL;
+
+   if (!cops->tcf_block)
+   return NULL;
+
+   return cops->tcf_block(qdisc, TC_H_MIN_INGRESS, NULL);
+}
+
+static struct rhashtable indr_setup_block_ht;
+
+struct tc_indr_block_dev {
+
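The commit message describes three behaviours for indirect block callbacks: call the driver when the watched device is bound or unbound, replay a bind if the device is already bound when the driver registers, and send an unbind when the driver unregisters while the block is still active. The userspace sketch below models only those semantics. It uses a fixed-size array where the patch uses an rhashtable, and every name in it (`fake_indr_cb_t`, `indr_register()`, `block_bind_event()` and so on) is hypothetical rather than the cls_api.c API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fake_cmd { FAKE_BIND, FAKE_UNBIND };

/* Hypothetical callback type, loosely modeled on tc_indr_block_bind_cb_t. */
typedef void fake_indr_cb_t(int dev, void *cb_priv, enum fake_cmd cmd);

#define MAX_CBS 8
#define MAX_DEVS 16	/* device indices must be in [0, MAX_DEVS) */

struct fake_indr_entry {
	int dev;
	fake_indr_cb_t *cb;
	void *cb_priv;
	bool used;
};

static struct fake_indr_entry registry[MAX_CBS];
static bool dev_bound[MAX_DEVS];	/* device bound to an ingress block? */

/* Register a callback; replay BIND if the device is already bound. */
static int indr_register(int dev, fake_indr_cb_t *cb, void *cb_priv)
{
	for (size_t i = 0; i < MAX_CBS; i++) {
		if (registry[i].used)
			continue;
		registry[i] = (struct fake_indr_entry){ dev, cb, cb_priv, true };
		if (dev_bound[dev])
			cb(dev, cb_priv, FAKE_BIND);
		return 0;
	}
	return -1;	/* registry full */
}

/* Unregister; if the block is still active, tell the driver to unbind. */
static void indr_unregister(int dev, fake_indr_cb_t *cb)
{
	for (size_t i = 0; i < MAX_CBS; i++) {
		if (!registry[i].used || registry[i].dev != dev ||
		    registry[i].cb != cb)
			continue;
		if (dev_bound[dev])
			cb(dev, registry[i].cb_priv, FAKE_UNBIND);
		registry[i].used = false;
	}
}

/* Core side: a device was bound to or unbound from a block. */
static void block_bind_event(int dev, enum fake_cmd cmd)
{
	dev_bound[dev] = (cmd == FAKE_BIND);
	for (size_t i = 0; i < MAX_CBS; i++)
		if (registry[i].used && registry[i].dev == dev)
			registry[i].cb(dev, registry[i].cb_priv, cmd);
}

/* Example driver callback: tracks binds minus unbinds in *cb_priv. */
static void count_cb(int dev, void *cb_priv, enum fake_cmd cmd)
{
	(void)dev;
	*(int *)cb_priv += (cmd == FAKE_BIND) ? 1 : -1;
}
```

The replay on registration is what lets a driver loaded after a tunnel device was already bound still see (and offload) rules installed earlier.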

[PATCH net-next 3/6] nfp: flower: increase scope of netdev checking functions

2018-11-09 Thread Jakub Kicinski

From: John Hurley 

Both the actions and tunnel_conf files contain local functions that check
the type of an input netdev. In preparation for re-use with tunnel offload
via indirect blocks, move these to static inline functions in a header
file.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../ethernet/netronome/nfp/flower/action.c| 14 --
 .../net/ethernet/netronome/nfp/flower/cmsg.h  | 27 +++
 .../netronome/nfp/flower/tunnel_conf.c| 19 ++---
 3 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 2e64fe878da6..8d54b36afee8 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -2,7 +2,6 @@
 /* Copyright (C) 2017-2018 Netronome Systems, Inc. */
 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -11,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "cmsg.h"
 #include "main.h"
@@ -92,18 +90,6 @@ nfp_fl_pre_lag(struct nfp_app *app, const struct tc_action *action,
return act_size;
 }
 
-static bool nfp_fl_netdev_is_tunnel_type(struct net_device *out_dev,
-enum nfp_flower_tun_type tun_type)
-{
-   if (netif_is_vxlan(out_dev))
-   return tun_type == NFP_FL_TUNNEL_VXLAN;
-
-   if (netif_is_geneve(out_dev))
-   return tun_type == NFP_FL_TUNNEL_GENEVE;
-
-   return false;
-}
-
 static int
 nfp_fl_output(struct nfp_app *app, struct nfp_fl_output *output,
  const struct tc_action *action, struct nfp_fl_payload *nfp_flow,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
index 3e391555e191..15f41cfef9f1 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/cmsg.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../nfp_app.h"
 #include "../nfpcore/nfp_cpp.h"
@@ -499,6 +500,32 @@ static inline int nfp_flower_cmsg_get_data_len(struct sk_buff *skb)
return skb->len - NFP_FLOWER_CMSG_HLEN;
 }
 
+static inline bool
+nfp_fl_netdev_is_tunnel_type(struct net_device *netdev,
+enum nfp_flower_tun_type tun_type)
+{
+   if (netif_is_vxlan(netdev))
+   return tun_type == NFP_FL_TUNNEL_VXLAN;
+   if (netif_is_geneve(netdev))
+   return tun_type == NFP_FL_TUNNEL_GENEVE;
+
+   return false;
+}
+
+static inline bool nfp_fl_is_netdev_to_offload(struct net_device *netdev)
+{
+   if (!netdev->rtnl_link_ops)
+   return false;
+   if (!strcmp(netdev->rtnl_link_ops->kind, "openvswitch"))
+   return true;
+   if (netif_is_vxlan(netdev))
+   return true;
+   if (netif_is_geneve(netdev))
+   return true;
+
+   return false;
+}
+
 struct sk_buff *
 nfp_flower_cmsg_mac_repr_start(struct nfp_app *app, unsigned int num_ports);
 void
diff --git a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
index 5d641d7dabff..2d9f26a725c2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
@@ -4,7 +4,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -182,20 +181,6 @@ void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb)
}
 }
 
-static bool nfp_tun_is_netdev_to_offload(struct net_device *netdev)
-{
-   if (!netdev->rtnl_link_ops)
-   return false;
-   if (!strcmp(netdev->rtnl_link_ops->kind, "openvswitch"))
-   return true;
-   if (netif_is_vxlan(netdev))
-   return true;
-   if (netif_is_geneve(netdev))
-   return true;
-
-   return false;
-}
-
 static int
 nfp_flower_xmit_tun_conf(struct nfp_app *app, u8 mtype, u16 plen, void *pdata,
 gfp_t flag)
@@ -617,7 +602,7 @@ static void nfp_tun_add_to_mac_offload_list(struct net_device *netdev,
 
if (nfp_netdev_is_nfp_repr(netdev))
port = nfp_repr_get_port_id(netdev);
-   else if (!nfp_tun_is_netdev_to_offload(netdev))
+   else if (!nfp_fl_is_netdev_to_offload(netdev))
return;
 
entry = kmalloc(sizeof(*entry), GFP_KERNEL);
@@ -660,7 +645,7 @@ int nfp_tunnel_mac_event_handler(struct nfp_app *app,
 {
if (event == NETDEV_DOWN || event == NETDEV_UNREGISTER) {
/* If non-nfp netdev then free its offload index. */
-   if (nfp_tun_is_netdev_to_offload(netdev))
+   if (nfp_fl_is_netdev_to_offload(netdev))
nfp_tun_del_mac_idx(app, netdev->ifindex);
} else if (event == NETDEV_UP || event == NETDEV_CHANGEADDR ||
   event == NETDEV_REGISTER) {
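The two helpers moved into cmsg.h classify a netdev by its link type: `nfp_fl_netdev_is_tunnel_type()` checks that a VXLAN or Geneve device matches the requested tunnel type, and `nfp_fl_is_netdev_to_offload()` accepts Open vSwitch, VXLAN and Geneve devices. The userspace sketch below keeps the same decision order. It models `netif_is_vxlan()` and `netif_is_geneve()` as comparisons of `rtnl_link_ops->kind` against `"vxlan"` and `"geneve"`, which is an assumption of this sketch; the reduced structs and enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced netdev: only the link-ops "kind" string matters. */
struct fake_link_ops { const char *kind; };
struct fake_netdev { const struct fake_link_ops *rtnl_link_ops; };

enum fake_tun_type { FAKE_TUN_NONE, FAKE_TUN_VXLAN, FAKE_TUN_GENEVE };

/* True if the device has link ops whose kind equals the given string. */
static bool kind_is(const struct fake_netdev *dev, const char *kind)
{
	return dev->rtnl_link_ops && !strcmp(dev->rtnl_link_ops->kind, kind);
}

/* Same decision order as nfp_fl_netdev_is_tunnel_type(). */
static bool is_tunnel_type(const struct fake_netdev *dev,
			   enum fake_tun_type t)
{
	if (kind_is(dev, "vxlan"))
		return t == FAKE_TUN_VXLAN;
	if (kind_is(dev, "geneve"))
		return t == FAKE_TUN_GENEVE;
	return false;
}

/* Same decision order as nfp_fl_is_netdev_to_offload(). */
static bool is_netdev_to_offload(const struct fake_netdev *dev)
{
	if (!dev->rtnl_link_ops)
		return false;
	return kind_is(dev, "openvswitch") || kind_is(dev, "vxlan") ||
	       kind_is(dev, "geneve");
}
```

Making these `static inline` in a shared header, as the patch does, lets both action.c and tunnel_conf.c (and later the indirect-block code) use one definition.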

Re: [PATCH net] net: sched: cls_flower: validate nested enc_opts_policy to avoid build warning

2018-11-09 Thread Jakub Kicinski

On Fri, 09 Nov 2018 20:40:25 -0800 (PST), David Miller wrote:
> From: Jakub Kicinski 
> Date: Fri,  9 Nov 2018 14:41:22 -0800
> 
> > TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
> > currently contain further nested attributes, which are parsed by
> > hand, so the policy is never actually used.  Add the validation
> > anyway to avoid potential bugs when other attributes are added
> > and to make the attribute structure slightly more clear.  Validation
> > will also set extact to point to bad attribute on error.
> > 
> > Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
> > Signed-off-by: Jakub Kicinski 
> > Acked-by: Simon Horman   
> 
> If this fixes a build warning, please include the build warning
> message in your commit log.
> 
> Thanks!

Ah, sorry, it's a W=1 warning, which should have been mentioned, too.
I'll repost shortly!

Re: [PATCH net] net: sched: cls_flower: validate nested enc_opts_policy to avoid build warning

2018-11-09 Thread David Miller

From: Jakub Kicinski 
Date: Fri,  9 Nov 2018 14:41:22 -0800

> TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
> currently contain further nested attributes, which are parsed by
> hand, so the policy is never actually used.  Add the validation
> anyway to avoid potential bugs when other attributes are added
> and to make the attribute structure slightly more clear.  Validation
> will also set extact to point to bad attribute on error.
> 
> Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
> Signed-off-by: Jakub Kicinski 
> Acked-by: Simon Horman 

If this fixes a build warning, please include the build warning
message in your commit log.

Thanks!
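The patch description says that validating the nested attributes against a policy guards against future attribute additions and lets the error report point at the offending attribute. The userspace sketch below illustrates that idea on a netlink-style buffer: each attribute is a length/type header followed by a payload padded to 4 bytes, each type is checked against a per-type minimum payload length, and the byte offset of the first bad attribute is reported. The header layout mirrors `struct nlattr`, but `fake_policy`, `validate_attrs()` and its rejection rules are hypothetical; the kernel's own validation rules differ in detail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Netlink-style header: total length (header + payload), then type. */
struct fake_nlattr {
	uint16_t len;
	uint16_t type;
};

#define FAKE_ALIGN(x) (((x) + 3u) & ~3u)
#define HDR_LEN ((uint16_t)sizeof(struct fake_nlattr))

/* Hypothetical policy entry: minimum payload length; 0 = type not allowed. */
struct fake_policy { uint16_t min_len; };

/*
 * Checks every attribute in buf against policy[1..maxtype].
 * Returns 0 if all are valid; otherwise -EINVAL with *bad_off set to
 * the byte offset of the first offending attribute.
 */
static int validate_attrs(const uint8_t *buf, size_t len,
			  const struct fake_policy *policy, uint16_t maxtype,
			  size_t *bad_off)
{
	size_t off = 0;

	while (off + HDR_LEN <= len) {
		struct fake_nlattr a;

		memcpy(&a, buf + off, sizeof(a));
		/* Malformed: shorter than a header or runs past the buffer. */
		if (a.len < HDR_LEN || off + a.len > len)
			goto bad;
		/* Unknown type or payload shorter than the policy requires. */
		if (a.type == 0 || a.type > maxtype ||
		    policy[a.type].min_len == 0 ||
		    a.len - HDR_LEN < policy[a.type].min_len)
			goto bad;
		off += FAKE_ALIGN(a.len);
	}
	return 0;
bad:
	*bad_off = off;
	return -EINVAL;
}
```

Reporting the offset, rather than just failing, is the userspace analogue of setting the extack to the bad attribute.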

Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Cong Wang

On Fri, Nov 9, 2018 at 6:02 PM Yunsheng Lin  wrote:
>
> On 2018/11/10 9:42, Cong Wang wrote:
> > On Fri, Nov 9, 2018 at 5:39 PM Yunsheng Lin  wrote:
> >>
> >> On 2018/11/10 3:43, Cong Wang wrote:
> >>> Currently netdev_rx_csum_fault() only shows a device name,
> >>> we need more information about the skb for debugging.
> >>>
> >>> Sample output:
> >>>
> >>>  ens3: hw csum failure
> >>>  dev features: 0x00014b89
> >>>  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, csum_complete_sw=0, csum_valid=0
> >>>
> >>> Signed-off-by: Cong Wang 
> >>> ---
> >>>  include/linux/netdevice.h |  5 +++--
> >>>  net/core/datagram.c   |  6 +++---
> >>>  net/core/dev.c| 10 --
> >>>  net/sunrpc/socklib.c  |  2 +-
> >>>  4 files changed, 15 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>> index 857f8abf7b91..fabcd9fa6cf7 100644
> >>> --- a/include/linux/netdevice.h
> >>> +++ b/include/linux/netdevice.h
> >>> @@ -4332,9 +4332,10 @@ static inline bool can_checksum_protocol(netdev_features_t features,
> >>>  }
> >>>
> >>>  #ifdef CONFIG_BUG
> >>> -void netdev_rx_csum_fault(struct net_device *dev);
> >>> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
> >>>  #else
> >>> -static inline void netdev_rx_csum_fault(struct net_device *dev)
> >>> +static inline void netdev_rx_csum_fault(struct net_device *dev,
> >>> + struct sk_buff *skb)
> >>>  {
> >>>  }
> >>>  #endif
> >>> diff --git a/net/core/datagram.c b/net/core/datagram.c
> >>> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> >>> --- a/net/core/datagram.c
> >>> +++ b/net/core/datagram.c
> >>> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
> >>>   if (likely(!sum)) {
> >>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >>>   !skb->csum_complete_sw)
> >>> - netdev_rx_csum_fault(skb->dev);
> >>> + netdev_rx_csum_fault(skb->dev, skb);
> >>>   }
> >>>   if (!skb_shared(skb))
> >>>   skb->csum_valid = !sum;
> >>> @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
> >>>   if (likely(!sum)) {
> >>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >>>   !skb->csum_complete_sw)
> >>> - netdev_rx_csum_fault(skb->dev);
> >>> + netdev_rx_csum_fault(skb->dev, skb);
> >>>   }
> >>>
> >>>   if (!skb_shared(skb)) {
> >>> @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
> >>>
> >>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> >>>   !skb->csum_complete_sw)
> >>> - netdev_rx_csum_fault(NULL);
> >>> + netdev_rx_csum_fault(NULL, skb);
> >>>   }
> >>>   return 0;
> >>>  fault:
> >>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>> index 0ffcbdd55fa9..2b337df26117 100644
> >>> --- a/net/core/dev.c
> >>> +++ b/net/core/dev.c
> >>> @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
> >>>
> >>>  /* Take action when hardware reception checksum errors are detected. */
> >>>  #ifdef CONFIG_BUG
> >>> -void netdev_rx_csum_fault(struct net_device *dev)
> >>> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
> >>>  {
> >>>   if (net_ratelimit()) {
> >>>   pr_err("%s: hw csum failure\n", dev ? dev->name : "");
> >>> + if (dev)
> >>> + pr_err("dev features: %pNF\n", &dev->features);
> >>> + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
> >>> +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
> >>> +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
> >>> +skb->csum_complete_sw, skb->csum_valid);
> >>
> >>
> >> This function also have the netdev available, use netdev_err to log the error?
> >
> > It is apparently not me who picked pr_err() from the beginning,
> > I just follow that pr_err(). If you are not happy with it, please send
> > a followup.
>
> Yes, but perhaps it is something to improve.


Sure, no one stops you from improving it in a followup patch. :)


> If we use netdev_err(), maybe there is no need to check whether dev is
> NULL, because netdev_err() already handles a NULL netdev.
> Maybe I missed something and netdev_err() cannot be used here?
> If not, maybe I can send a followup.
>

Maybe. Again, my patch is meant to add a few debugging logs,
not to convert pr_err() to something else; those are totally different
goals. I chose pr_err() only to follow the existing call, not because
one is better than the other.

Thanks.
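For illustration only: the point raised above is that a logging helper which handles a NULL device internally lets callers drop their own `dev ? dev->name : ...` check. The sketch below models that in userspace C. To our understanding the kernel's netdev_err() prints a "(NULL net_device)" prefix when dev is NULL, and that prefix is copied here; `struct fake_netdev` and `format_dev_msg()` are hypothetical names, not kernel APIs.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct net_device; only the name is used. */
struct fake_netdev {
	const char *name;
};

/*
 * Write "<name>: <msg>" into buf, or "(NULL net_device): <msg>" when dev
 * is NULL, so callers need no NULL check of their own.
 * Returns whatever snprintf() returns.
 */
static int format_dev_msg(char *buf, size_t size,
			  const struct fake_netdev *dev, const char *msg)
{
	if (dev)
		return snprintf(buf, size, "%s: %s", dev->name, msg);
	return snprintf(buf, size, "(NULL net_device): %s", msg);
}
```

The design choice is simply to move the NULL test into one place instead of repeating it at every call site.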

Re: [PATCH] add an initial version of snmp_counter.rst

2018-11-09 Thread Cong Wang

(Cc Randy)

On Fri, Nov 9, 2018 at 10:13 AM yupeng  wrote:
>
> The snmp_counter.rst runs a set of simple experiments and explains the
> meaning of snmp counters based on the experiments' results. This is
> an initial version that only covers a small part of the snmp counters.


I haven't looked into the details much, so just a few high-level comments:

1. Please try to group those counters by protocol, it would be easier
to search.

2. For many counters you provide a link to an RFC; did you just copy
and paste the descriptions from it? Please try to expand on them.

3. _I think_ you don't need to show, for example, how to run a ping
command. It's safe to assume readers already know this. Therefore,
just explaining those counters is okay.


Thanks.
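As background for the counter names used in the quoted document: as far as we know, nstat builds names such as IpInReceives or IpExtInOctets by joining a protocol prefix with a field name from /proc/net/snmp and /proc/net/netstat, where each protocol contributes a header line of field names followed by a line of values with the same prefix. A minimal userspace C sketch of that lookup, assuming this two-line layout; `snmp_lookup()` is a hypothetical helper, not nstat's implementation, and it ignores nstat's history/delta handling.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up one counter in a /proc/net/snmp-style line pair, e.g.
 *   hdr:  "Ip: Forwarding DefaultTTL InReceives"
 *   vals: "Ip: 1 64 12345"
 * The nstat-style name is the prefix (without ':') followed by the field
 * name, so "IpInReceives" selects the third column here.
 * Returns 0 and stores the value on success, -1 if the name is not found.
 */
static int snmp_lookup(const char *hdr, const char *vals,
		       const char *name, long long *out)
{
	char hbuf[1024], vbuf[1024], full[128];
	char *hsave, *vsave, *prefix, *field, *value;

	snprintf(hbuf, sizeof(hbuf), "%s", hdr);
	snprintf(vbuf, sizeof(vbuf), "%s", vals);

	/* The first token of each line is the protocol prefix, e.g. "Ip:". */
	prefix = strtok_r(hbuf, " \n", &hsave);
	if (!prefix || !strtok_r(vbuf, " \n", &vsave))
		return -1;
	prefix[strcspn(prefix, ":")] = '\0';

	/* Walk field names and values in lockstep. */
	while ((field = strtok_r(NULL, " \n", &hsave)) &&
	       (value = strtok_r(NULL, " \n", &vsave))) {
		snprintf(full, sizeof(full), "%s%s", prefix, field);
		if (strcmp(full, name) == 0) {
			/* signed: some fields, e.g. Tcp MaxConn, can be -1 */
			*out = strtoll(value, NULL, 10);
			return 0;
		}
	}
	return -1;
}
```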

>
> Signed-off-by: yupeng 
> ---
>  Documentation/networking/index.rst|   1 +
>  Documentation/networking/snmp_counter.rst | 963 ++
>  2 files changed, 964 insertions(+)
>  create mode 100644 Documentation/networking/snmp_counter.rst
>
> diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
> index bd89dae8d578..6a47629ef8ed 100644
> --- a/Documentation/networking/index.rst
> +++ b/Documentation/networking/index.rst
> @@ -31,6 +31,7 @@ Contents:
> net_failover
> alias
> bridge
> +   snmp_counter
>
>  .. only::  subproject
>
> diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
> new file mode 100644
> index ..2939c5acf675
> --- /dev/null
> +++ b/Documentation/networking/snmp_counter.rst
> @@ -0,0 +1,963 @@
> +
> +snmp counter tutorial
> +
> +
> +This document explains the meaning of snmp counters. To make their
> +meanings easier to understand, this document doesn't explain the
> +counters one by one; instead it runs a set of experiments and explains
> +the counters based on the experiments' results. The experiments run on
> +one or two virtual machines. Apart from the test commands used in the
> +experiments, the virtual machines have no other network traffic. We use
> +the 'nstat' command to get the values of snmp counters. Before every
> +test we run 'nstat -n' to update the history, so the 'nstat' output
> +only shows the changes of the snmp counters. For more information about
> +nstat, please refer to:
> +
> +http://man7.org/linux/man-pages/man8/nstat.8.html
> +
> +icmp ping
> +
> +
> +Run the ping command against the public dns server 8.8.8.8::
> +
> +  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
> +  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
> +  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms
> +
> +  --- 8.8.8.8 ping statistics ---
> +  1 packets transmitted, 1 received, 0% packet loss, time 0ms
> +  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms
> +
> +The nstat result::
> +
> +  nstatuser@nstat-a:~$ nstat
> +  #kernel
> +  IpInReceives                    1                  0.0
> +  IpInDelivers                    1                  0.0
> +  IpOutRequests                   1                  0.0
> +  IcmpInMsgs                      1                  0.0
> +  IcmpInEchoReps                  1                  0.0
> +  IcmpOutMsgs                     1                  0.0
> +  IcmpOutEchos                    1                  0.0
> +  IcmpMsgInType0                  1                  0.0
> +  IcmpMsgOutType8                 1                  0.0
> +  IpExtInOctets                   84                 0.0
> +  IpExtOutOctets                  84                 0.0
> +  IpExtInNoECTPkts                1                  0.0
> +
> +The nstat output can be divided into two parts: counters with the 'Ext'
> +keyword and counters without it. If a counter name doesn't contain
> +'Ext', it is defined by one of the snmp RFCs; if it contains 'Ext', it
> +is a kernel extension counter. Below we explain them one by one.
> +
> +The rfc defined counters
> +--
> +
> +* IpInReceives
> +The total number of input datagrams received from interfaces,
> +including those received in error.
> +
> +https://tools.ietf.org/html/rfc1213#page-26
> +
> +* IpInDelivers
> +The total number of input datagrams successfully delivered to IP
> +user-protocols (including ICMP).
> +
> +https://tools.ietf.org/html/rfc1213#page-28
> +
> +* IpOutRequests
> +The total number of IP datagrams which local IP user-protocols
> +(including ICMP) supplied to IP in requests for transmission.  Note
> +that this counter does not include any datagrams counted in
> +ipForwDatagrams.
> +
> +https://tools.ietf.org/html/rfc1213#page-28
> +
> +* IcmpInMsgs
> +The total number of ICMP messages which the entity received.  Note
> +that this counter includes all those counted by icmpInErrors.
> +
> +https://tools.ietf.org/html/rfc1213#page-41
> +
> +* IcmpInEchoReps
> +The number of ICMP Echo Reply messages received.
> +
> +https://tools.ietf.org/html/rfc1213#page-42
> +
> +* IcmpOutMsgs
> +The total number of ICMP messages which this entity attempted

Re: [PATCH net-next] nfp: use the new __netdev_tx_sent_queue() BQL optimisation

2018-11-09 Thread David Miller

From: Jakub Kicinski 
Date: Fri,  9 Nov 2018 18:50:00 -0800

> __netdev_tx_sent_queue() was added in commit e59020abf0f
> ("net: bql: add __netdev_tx_sent_queue()") and allows for
> better GSO performance.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Dirk van der Merwe 
> Reviewed-by: Simon Horman 

Applied.

[net-next PATCH v3] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Amritha Nambiar

Added support in tc flower for filtering based on port ranges.

Example:
1. Match on a port range:
-
$ tc filter add dev enp4s0 protocol ip parent :\
  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
  action drop

$ tc -s filter show dev enp4s0 parent :
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  ip_proto tcp
  dst_port range 20-30
  skip_hw
  not_in_hw
action order 1: gact action drop
 random type none pass val 0
 index 1 ref 1 bind 1 installed 85 sec used 3 sec
Action statistics:
Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

2. Match on IP address and port range:
--
$ tc filter add dev enp4s0 protocol ip parent :\
  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
  skip_hw action drop

$ tc -s filter show dev enp4s0 parent :
filter protocol ip pref 1 flower chain 0 handle 0x2
  eth_type ipv4
  ip_proto tcp
  dst_ip 192.168.1.1
  dst_port range 100-200
  skip_hw
  not_in_hw
action order 1: gact action drop
 random type none pass val 0
 index 2 ref 1 bind 1 installed 58 sec used 2 sec
Action statistics:
Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
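The examples above match `dst_port range 20-30` inclusively: the patch rejects a packet only if its port is below the minimum or above the maximum, so ports 20 and 30 both match. A minimal userspace C sketch of that comparison, assuming, as in the flow dissector keys, that ports are stored in network byte order; `port_in_range()` is a hypothetical helper, not the cls_flower code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/*
 * Inclusive port-range check. All three ports are in network byte order,
 * so convert with ntohs() before comparing; comparing the raw big-endian
 * values would give wrong answers on little-endian hosts.
 */
static bool port_in_range(uint16_t port_be, uint16_t min_be, uint16_t max_be)
{
	uint16_t port = ntohs(port_be);

	return port >= ntohs(min_be) && port <= ntohs(max_be);
}
```

For example, with a 200-300 range, port 256 is stored as bytes 0x01 0x00; on a little-endian host its raw value (1) would compare below the raw minimum, which is why the conversion matters.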

v3:
1. Moved the new fields in the UAPI enum to the end of the enum.
2. Removed a couple of empty lines.

v2:
Addressed Jiri's comments:
1. Added separate functions for dst and src comparisons.
2. Removed endpoint enum.
3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
  lookup.
4. Cleaned up fl_lookup function.

Signed-off-by: Amritha Nambiar 
---
 include/uapi/linux/pkt_cls.h |7 ++
 net/sched/cls_flower.c   |  132 --
 2 files changed, 133 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 401d0c1..95d0db2 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -485,6 +485,11 @@ enum {
 
TCA_FLOWER_IN_HW_COUNT,
 
+   TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
+   TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
+   TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
+   TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
+
__TCA_FLOWER_MAX,
 };
 
@@ -518,6 +523,8 @@ enum {
TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
 };
 
+#define TCA_FLOWER_MASK_FLAGS_RANGE (1 << 0) /* Range-based match */
+
 /* Match-all classifier */
 
 enum {
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9aada2d..7780106 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -55,6 +55,8 @@ struct fl_flow_key {
struct flow_dissector_key_ip ip;
struct flow_dissector_key_ip enc_ip;
struct flow_dissector_key_enc_opts enc_opts;
+   struct flow_dissector_key_ports tp_min;
+   struct flow_dissector_key_ports tp_max;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
 
 struct fl_flow_mask_range {
@@ -65,6 +67,7 @@ struct fl_flow_mask_range {
 struct fl_flow_mask {
struct fl_flow_key key;
struct fl_flow_mask_range range;
+   u32 flags;
struct rhash_head ht_node;
struct rhashtable ht;
struct rhashtable_params filter_ht_params;
@@ -179,13 +182,89 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
 }
 
-static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
-  struct fl_flow_key *mkey)
+static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter,
+ struct fl_flow_key *key,
+ struct fl_flow_key *mkey)
+{
+   __be16 min_mask, max_mask, min_val, max_val;
+
+   min_mask = htons(filter->mask->key.tp_min.dst);
+   max_mask = htons(filter->mask->key.tp_max.dst);
+   min_val = htons(filter->key.tp_min.dst);
+   max_val = htons(filter->key.tp_max.dst);
+
+   if (min_mask && max_mask) {
+   if (htons(key->tp.dst) < min_val ||
+   htons(key->tp.dst) > max_val)
+   return false;
+
+   /* skb does not have min and max values */
+   mkey->tp_min.dst = filter->mkey.tp_min.dst;
+   mkey->tp_max.dst = filter->mkey.tp_max.dst;
+   }
+   return true;
+}
+
+static bool fl_range_port_src_cmp(struct cls_fl_filter *filter,
+ struct fl_flow_key *key,
+ struct fl_flow_key *mkey)
+{
+   __be16 min_mask, max_mask, min_val, max_val;
+
+   min_mask = htons(filter->mask->key.tp_min.src);
+   max_mask = htons(filter->mask->key.tp_max.src);
+   min_val = htons(filter->key.tp_min.src);
+   max_val =

Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Yunsheng Lin

On 2018/11/10 9:42, Cong Wang wrote:
> On Fri, Nov 9, 2018 at 5:39 PM Yunsheng Lin  wrote:
>>
>> On 2018/11/10 3:43, Cong Wang wrote:
>>> Currently netdev_rx_csum_fault() only shows a device name,
>>> we need more information about the skb for debugging.
>>>
>>> Sample output:
>>>
>>>  ens3: hw csum failure
>>>  dev features: 0x00014b89
>>>  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, csum_complete_sw=0, csum_valid=0
>>>
>>> Signed-off-by: Cong Wang 
>>> ---
>>>  include/linux/netdevice.h |  5 +++--
>>>  net/core/datagram.c   |  6 +++---
>>>  net/core/dev.c| 10 --
>>>  net/sunrpc/socklib.c  |  2 +-
>>>  4 files changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 857f8abf7b91..fabcd9fa6cf7 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -4332,9 +4332,10 @@ static inline bool can_checksum_protocol(netdev_features_t features,
>>>  }
>>>
>>>  #ifdef CONFIG_BUG
>>> -void netdev_rx_csum_fault(struct net_device *dev);
>>> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
>>>  #else
>>> -static inline void netdev_rx_csum_fault(struct net_device *dev)
>>> +static inline void netdev_rx_csum_fault(struct net_device *dev,
>>> + struct sk_buff *skb)
>>>  {
>>>  }
>>>  #endif
>>> diff --git a/net/core/datagram.c b/net/core/datagram.c
>>> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
>>> --- a/net/core/datagram.c
>>> +++ b/net/core/datagram.c
>>> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
>>>   if (likely(!sum)) {
>>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>   !skb->csum_complete_sw)
>>> - netdev_rx_csum_fault(skb->dev);
>>> + netdev_rx_csum_fault(skb->dev, skb);
>>>   }
>>>   if (!skb_shared(skb))
>>>   skb->csum_valid = !sum;
>>> @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
>>>   if (likely(!sum)) {
>>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>   !skb->csum_complete_sw)
>>> - netdev_rx_csum_fault(skb->dev);
>>> + netdev_rx_csum_fault(skb->dev, skb);
>>>   }
>>>
>>>   if (!skb_shared(skb)) {
>>> @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>>>
>>>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>   !skb->csum_complete_sw)
>>> - netdev_rx_csum_fault(NULL);
>>> + netdev_rx_csum_fault(NULL, skb);
>>>   }
>>>   return 0;
>>>  fault:
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 0ffcbdd55fa9..2b337df26117 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
>>>
>>>  /* Take action when hardware reception checksum errors are detected. */
>>>  #ifdef CONFIG_BUG
>>> -void netdev_rx_csum_fault(struct net_device *dev)
>>> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
>>>  {
>>>   if (net_ratelimit()) {
>>>   pr_err("%s: hw csum failure\n", dev ? dev->name : "");
>>> + if (dev)
>>> + pr_err("dev features: %pNF\n", >features);
>>> + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
>>> +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
>>> +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
>>> +skb->csum_complete_sw, skb->csum_valid);
>>
>>
>> This function also has the netdev available; why not use netdev_err to
>> log the error?
> 
> It is apparently not me who picked pr_err() from the beginning,
> I just follow that pr_err(). If you are not happy with it, please send
> a followup.

Yes, but perhaps it is something to improve.
If we use netdev_err(), maybe there is no need to check whether dev is
NULL, because netdev_err() already handles a NULL netdev.
Maybe I missed something and netdev_err() cannot be used here?
If not, maybe I can send a followup.

> 
> 
>>
>> Also, dev->features was dumped before this patch, why remove it?
> 
> Seriously? Where do I remove it? Please be specific. :)

Sorry, I missed that; I thought it had been removed when the new log was added.


Re: [PATCH v4] Wait for running BPF programs when updating map-in-map

2018-11-09 Thread Chenbo Feng

Hi netdev,

Could we queue up this patch for stable 4.14 and stable 4.19? I can
provide a backport patch if needed. I checked that it is a clean
cherry-pick for 4.19 but has some minor conflicts for 4.14.

Thanks
Chenbo Feng
On Thu, Oct 18, 2018 at 4:36 PM Joel Fernandes  wrote:
>
> On Thu, Oct 18, 2018 at 08:46:59AM -0700, Alexei Starovoitov wrote:
> > On Tue, Oct 16, 2018 at 10:39:57AM -0700, Joel Fernandes wrote:
> > > On Fri, Oct 12, 2018 at 7:31 PM, Alexei Starovoitov
> > >  wrote:
> > > > On Fri, Oct 12, 2018 at 03:54:27AM -0700, Daniel Colascione wrote:
> > > >> The map-in-map frequently serves as a mechanism for atomic
> > > >> snapshotting of state that a BPF program might record.  The current
> > > >> implementation is dangerous to use in this way, however, since
> > > >> userspace has no way of knowing when all programs that might have
> > > >> retrieved the "old" value of the map may have completed.
> > > >>
> > > >> This change ensures that map update operations on map-in-map map types
> > > >> always wait for all references to the old map to drop before returning
> > > >> to userspace.
> > > >>
> > > >> Signed-off-by: Daniel Colascione 
> > > >> ---
> > > >>  kernel/bpf/syscall.c | 14 ++
> > > >>  1 file changed, 14 insertions(+)
> > > >>
> > > >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > >> index 8339d81cba1d..d7c16ae1e85a 100644
> > > >> --- a/kernel/bpf/syscall.c
> > > >> +++ b/kernel/bpf/syscall.c
> > > >> @@ -741,6 +741,18 @@ static int map_lookup_elem(union bpf_attr *attr)
> > > >>   return err;
> > > >>  }
> > > >>
> > > >> +static void maybe_wait_bpf_programs(struct bpf_map *map)
> > > >> +{
> > > >> + /* Wait for any running BPF programs to complete so that
> > > >> +  * userspace, when we return to it, knows that all programs
> > > >> +  * that could be running use the new map value.
> > > >> +  */
> > > >> + if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS ||
> > > >> + map->map_type == BPF_MAP_TYPE_ARRAY_OF_MAPS) {
> > > >> + synchronize_rcu();
> > > >> + }
> > > >
> > > > extra {} were not necessary. I removed them while applying to bpf-next.
> > > > Please run checkpatch.pl next time.
> > > > Thanks
> > >
> > > Thanks Alexei for taking it. Lorenzo and I were discussing that not
> > > having this causes incorrect behavior for apps using map-in-map for
> > > this. So I CC'd stable as well.
> >
> > It is too late in the release cycle.
> > We can submit it to stable releases after the merge window.
> >
>
> Sounds good, thanks.
>
> - Joel
>
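The guarantee discussed in this thread can be illustrated with a userspace toy: the updater publishes the new inner map, then waits until every reader that might have picked up the old one has finished, which is the role synchronize_rcu() plays in maybe_wait_bpf_programs(). In this sketch a plain reader counter stands in for RCU (real RCU readers do not write shared state, and this is not how RCU is implemented); `struct outer_slot` and the helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Toy "map-in-map" slot. Readers announce themselves with a counter while
 * they use the inner map; the updater swaps the pointer and then waits
 * until no reader is active, so any reader that saw the old map is done.
 * All atomics use the default sequentially consistent ordering, which is
 * what makes the "increment, then load pointer" vs "swap, then load
 * counter" handshake safe.
 */
struct outer_slot {
	_Atomic(void *) inner;
	atomic_int readers;
};

static void *reader_enter(struct outer_slot *s)
{
	atomic_fetch_add(&s->readers, 1);
	return atomic_load(&s->inner);
}

static void reader_exit(struct outer_slot *s)
{
	atomic_fetch_sub(&s->readers, 1);
}

/* Publish new_map and return the old one once no reader can still use it. */
static void *update_and_wait(struct outer_slot *s, void *new_map)
{
	void *old = atomic_exchange(&s->inner, new_map);

	/* Busy-wait; this may also wait for readers that saw new_map. */
	while (atomic_load(&s->readers) != 0)
		;
	return old;
}
```

When update_and_wait() returns, the caller (like userspace after the syscall returns) knows no reader is still using the old map, so it can be inspected or freed.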

Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread David Miller

From: Richard Cochran 
Date: Fri, 9 Nov 2018 17:44:31 -0800

> On Fri, Nov 09, 2018 at 03:28:46PM -0800, David Miller wrote:
>> This series looks good to me but I want to give Richard an opportunity to
>> review it first.
> 
> The series is good to go.
> 
> Acked-by: Richard Cochran 

Great, series applied to net-next, thanks everyone.

[PATCH net v2] net: sched: cls_flower: validate nested enc_opts_policy to avoid warning

2018-11-09 Thread Jakub Kicinski

TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can currently
only contain further nested attributes, which are parsed by hand, so
the policy is never actually used, resulting in a W=1 build warning:

net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=]
 enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {

Add the validation anyway to avoid potential bugs when other
attributes are added and to make the attribute structure slightly
clearer.  Validation will also set extack to point to the bad
attribute on error.

Signed-off-by: Jakub Kicinski 
Acked-by: Simon Horman 
---
 net/sched/cls_flower.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9aada2d0ef06..c6c327874abc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -709,11 +709,23 @@ static int fl_set_enc_opt(struct nlattr **tb, struct fl_flow_key *key,
  struct netlink_ext_ack *extack)
 {
const struct nlattr *nla_enc_key, *nla_opt_key, *nla_opt_msk = NULL;
-   int option_len, key_depth, msk_depth = 0;
+   int err, option_len, key_depth, msk_depth = 0;
+
+   err = nla_validate_nested(tb[TCA_FLOWER_KEY_ENC_OPTS],
+ TCA_FLOWER_KEY_ENC_OPTS_MAX,
+ enc_opts_policy, extack);
+   if (err)
+   return err;
 
nla_enc_key = nla_data(tb[TCA_FLOWER_KEY_ENC_OPTS]);
 
if (tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]) {
+   err = nla_validate_nested(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK],
+ TCA_FLOWER_KEY_ENC_OPTS_MAX,
+ enc_opts_policy, extack);
+   if (err)
+   return err;
+
nla_opt_msk = nla_data(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
msk_depth = nla_len(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
}
-- 
2.17.1
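Conceptually, validating a nested attribute means walking the TLV records inside it and checking each header before trusting it. A simplified userspace sketch of such a walk, assuming the standard netlink attribute layout (16-bit length that includes the 4-byte header, 16-bit type, payloads padded to 4 bytes); `struct attr_hdr` and `validate_nested()` are hypothetical, and unlike nla_validate_nested() this version has no per-type policy and simply rejects unknown types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as a netlink attribute header (struct nlattr). */
struct attr_hdr {
	uint16_t len;	/* header + payload, in bytes */
	uint16_t type;
};

#define ATTR_ALIGN(len) (((len) + 3u) & ~3u)	/* 4-byte alignment */

/*
 * Walk a nested attribute payload and check that every attribute header is
 * sane and its type is known (1..maxtype). Returns 0 if the whole buffer
 * validates, -1 on the first bad attribute or on trailing garbage.
 */
static int validate_nested(const void *buf, size_t buflen, uint16_t maxtype)
{
	const unsigned char *p = buf;

	while (buflen >= sizeof(struct attr_hdr)) {
		struct attr_hdr h;
		size_t step;

		memcpy(&h, p, sizeof(h));	/* avoid unaligned access */
		if (h.len < sizeof(h) || h.len > buflen)
			return -1;		/* truncated or overrunning */
		if (h.type == 0 || h.type > maxtype)
			return -1;		/* unknown attribute type */

		step = ATTR_ALIGN(h.len);	/* skip payload and padding */
		if (step > buflen)
			step = buflen;		/* last attribute, no padding */
		p += step;
		buflen -= step;
	}
	return buflen == 0 ? 0 : -1;
}
```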

Re: [iproute2 PATCH v2] tc: flower: Classify packets based port ranges

2018-11-09 Thread Nambiar, Amritha

On 11/9/2018 12:51 AM, Jiri Pirko wrote:
> Wed, Nov 07, 2018 at 10:22:50PM CET, amritha.namb...@intel.com wrote:
>> Added support for filtering based on port ranges.
>>
>> Example:
>> 1. Match on a port range:
>> -
>> $ tc filter add dev enp4s0 protocol ip parent :\
>>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>>  action drop
>>
>> $ tc -s filter show dev enp4s0 parent :
>> filter protocol ip pref 1 flower chain 0
>> filter protocol ip pref 1 flower chain 0 handle 0x1
>>  eth_type ipv4
>>  ip_proto tcp
>>  dst_port range 20-30
>>  skip_hw
>>  not_in_hw
>>action order 1: gact action drop
>> random type none pass val 0
>> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>>Action statistics:
>>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>>backlog 0b 0p requeues 0
>>
>> 2. Match on IP address and port range:
>> --
>> $ tc filter add dev enp4s0 protocol ip parent :\
>>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>>  skip_hw action drop
>>
>> $ tc -s filter show dev enp4s0 parent :
>> filter protocol ip pref 1 flower chain 0 handle 0x2
>>  eth_type ipv4
>>  ip_proto tcp
>>  dst_ip 192.168.1.1
>>  dst_port range 100-200
>>  skip_hw
>>  not_in_hw
>>action order 1: gact action drop
>> random type none pass val 0
>> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>>Action statistics:
>>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>>backlog 0b 0p requeues 0
>>
>> v2:
>> Addressed Jiri's comment to sync output format with input
>>
>> Signed-off-by: Amritha Nambiar 
>> ---
>> include/uapi/linux/pkt_cls.h |7 ++
>> tc/f_flower.c|  145 +++---
>> 2 files changed, 142 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 401d0c1..b63c3cf 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -405,6 +405,11 @@ enum {
>>  TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>>  TCA_FLOWER_KEY_UDP_DST, /* be16 */
>>
>> +TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>> +
>>  TCA_FLOWER_FLAGS,
>>  TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>>  TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>> @@ -518,6 +523,8 @@ enum {
>>  TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
>> };
>>
>> +#define TCA_FLOWER_MASK_FLAGS_RANGE (1 << 0) /* Range-based match */
>> +
>> /* Match-all classifier */
>>
>> enum {
>> diff --git a/tc/f_flower.c b/tc/f_flower.c
>> index 65fca04..7724a1d 100644
>> --- a/tc/f_flower.c
>> +++ b/tc/f_flower.c
>> @@ -494,6 +494,66 @@ static int flower_parse_port(char *str, __u8 ip_proto,
>>  return 0;
>> }
>>
>> +static int flower_port_range_attr_type(__u8 ip_proto, enum flower_endpoint type,
>> +   __be16 *min_port_type,
>> +   __be16 *max_port_type)
>> +{
>> +if (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP ||
>> +ip_proto == IPPROTO_SCTP) {
>> +if (type == FLOWER_ENDPOINT_SRC) {
>> +*min_port_type = TCA_FLOWER_KEY_PORT_SRC_MIN;
>> +*max_port_type = TCA_FLOWER_KEY_PORT_SRC_MAX;
>> +} else {
>> +*min_port_type = TCA_FLOWER_KEY_PORT_DST_MIN;
>> +*max_port_type = TCA_FLOWER_KEY_PORT_DST_MAX;
>> +}
>> +} else {
>> +return -1;
>> +}
>> +
>> +return 0;
>> +}
>> +
>> +static int flower_parse_port_range(__be16 *min, __be16 *max, __u8 ip_proto,
>> +   enum flower_endpoint endpoint,
>> +   struct nlmsghdr *n)
>> +{
>> +__be16 min_port_type, max_port_type;
>> +
>> +flower_port_range_attr_type(ip_proto, endpoint, _port_type,
>> +_port_type);
>> +addattr16(n, MAX_MSG, min_port_type, *min);
>> +addattr16(n, MAX_MSG, max_port_type, *max);
>> +
>> +return 0;
>> +}
>> +
>> +static int get_range(__be16 *min, __be16 *max, char *argv)
>> +{
>> +char *r;
>> +
>> +r = strchr(argv, '-');
>> +if (r) {
>> +*r = '\0';
>> +if (get_be16(min, argv, 10)) {
>> +fprintf(stderr, "invalid min range\n");
>> +return -1;
>> +}
>> +if (get_be16(max, r + 1, 10)) {
>> +fprintf(stderr, "invalid max range\n");
>> +return -1;
>> +}
>> +if (htons(*max) <= htons(*min)) {
>> +fprintf(stderr, "max value should be greater than min value\n");
>> +return -1;
>> +

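A userspace sketch of the "MIN-MAX" parsing that get_range() performs in the quoted patch, written against a const input string and host-order output so it can be tested on its own. It keeps the patch's rule that max must be strictly greater than min; `parse_port_range()` is a hypothetical name, and converting the results to network order (the patch uses get_be16()) is left to the caller.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse "MIN-MAX" into host-order ports. Rejects a missing dash, trailing
 * garbage, values above 65535 and ranges where max is not greater than min.
 * Returns 0 on success, -1 on error. The input string is not modified.
 */
static int parse_port_range(const char *arg, uint16_t *min, uint16_t *max)
{
	const char *dash = strchr(arg, '-');
	unsigned long lo, hi;
	char *end;

	if (!dash || dash == arg)
		return -1;

	errno = 0;
	lo = strtoul(arg, &end, 10);
	if (errno || end != dash || lo > 65535)
		return -1;

	errno = 0;
	hi = strtoul(dash + 1, &end, 10);
	if (errno || end == dash + 1 || *end != '\0' || hi > 65535)
		return -1;

	if (hi <= lo)
		return -1;	/* same rule as the patch: max > min */

	*min = (uint16_t)lo;
	*max = (uint16_t)hi;
	return 0;
}
```

Not writing a NUL into the argument (as the patch's `*r = '\0'` does) keeps the caller's argv intact for later error messages.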
[PATCH] infiniband: nes: Fix more direct skb list accesses.

2018-11-09 Thread David Miller



The following:

skb = skb->next;
...
if (skb == (struct sk_buff *)queue)

is transformed into:

skb = skb_peek_next(skb, queue);
...
if (!skb)

Signed-off-by: David S. Miller 
---
 drivers/infiniband/hw/nes/nes_mgt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_mgt.c b/drivers/infiniband/hw/nes/nes_mgt.c
index e9661c3a..fc0c191014e9 100644
--- a/drivers/infiniband/hw/nes/nes_mgt.c
+++ b/drivers/infiniband/hw/nes/nes_mgt.c
@@ -223,11 +223,11 @@ static struct sk_buff *nes_get_next_skb(struct nes_device *nesdev, struct nes_qp
}
 
old_skb = skb;
-   skb = skb->next;
+   skb = skb_peek_next(skb, >pau_list);
skb_unlink(old_skb, >pau_list);
nes_mgt_free_skb(nesdev, old_skb, PCI_DMA_TODEVICE);
nes_rem_ref_cm_node(nesqp->cm_node);
-   if (skb == (struct sk_buff *)>pau_list)
+   if (!skb)
goto out;
}
return skb;
-- 
2.19.1
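The conversion in this patch relies on skb_peek_next() returning NULL, instead of the list head, once the end of the queue is reached. A self-contained C sketch of the same idea on a circular doubly linked list with a sentinel head; `struct node`, `list_init()`, `list_append()` and `peek_next()` are hypothetical stand-ins for sk_buff, sk_buff_head and the kernel helpers.

```c
#include <stddef.h>

/*
 * Circular doubly linked list with a sentinel head, in the style of
 * struct sk_buff_head: the last element's next pointer is the head.
 */
struct node {
	struct node *next, *prev;
	int val;
};

/* An empty list is a head that points at itself. */
static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

/* Insert n just before the head, i.e. at the tail. */
static void list_append(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Return the element after n, or NULL when the next pointer wraps back to
 * the head. Callers then test for NULL instead of comparing against a
 * cast of the list head.
 */
static struct node *peek_next(struct node *n, struct node *head)
{
	return n->next == head ? NULL : n->next;
}
```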

Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread David Miller

From: Cong Wang 
Date: Fri,  9 Nov 2018 11:43:33 -0800

> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>   if (!skb_shared(skb))
>   skb->csum_valid = !sum;

Didn't you move this function into net/core/skbuff.c? :-)

Please respin.

Re: [PATCH net] flow_dissector: do not dissect l4 ports for fragments

2018-11-09 Thread David Miller

From: Eric Dumazet 
Date: Fri,  9 Nov 2018 16:53:06 -0800

> From: 배석진 
> 
> Only first fragment has the sport/dport information,
> not the following ones.
> 
> If we want consistent hash for all fragments, we need to
> ignore ports even for first fragment.
> 
> This bug is visible for IPv6 traffic, if incoming fragments
> do not have a flow label, since skb_get_hash() will give
> different results for first fragment and following ones.
> 
> It is also visible if any routing rule wants dissection
> and sport or dport.
> 
> See commit 5e5d6fed3741 ("ipv6: route: dissect flow
> in input path if fib rules need it") for details.
> 
> [edumazet] rewrote the changelog completely.
> 
> Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
> Signed-off-by: 배석진 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable.
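The reasoning in the changelog can be shown with a toy flow hash: later fragments carry no L4 header, so the only way to hash every fragment of a datagram consistently is to leave ports out for all fragments, including the first one that does carry them. A userspace C sketch, assuming a simplified flow key; `struct flow_key` and `flow_hash()` are hypothetical, and FNV-1a is used only as a simple stand-in for the kernel's flow hash.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: whatever the dissector extracted for one packet. */
struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
	bool is_fragment;	/* any fragment, including the first one */
};

/* 32-bit FNV-1a step over a byte buffer. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash the flow, ignoring ports for every fragment. */
static uint32_t flow_hash(const struct flow_key *k)
{
	uint16_t sport = k->is_fragment ? 0 : k->sport;
	uint16_t dport = k->is_fragment ? 0 : k->dport;
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */

	h = fnv1a(h, &k->saddr, sizeof(k->saddr));
	h = fnv1a(h, &k->daddr, sizeof(k->daddr));
	h = fnv1a(h, &sport, sizeof(sport));
	h = fnv1a(h, &dport, sizeof(dport));
	h = fnv1a(h, &k->proto, sizeof(k->proto));
	return h;
}
```

The first fragment (real ports), a later fragment (no ports), and a later fragment whose payload bytes were misread as ports all land on the same hash.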

[PATCH net-next] nfp: use the new __netdev_tx_sent_queue() BQL optimisation

2018-11-09 Thread Jakub Kicinski

__netdev_tx_sent_queue() was added in commit e59020abf0f
("net: bql: add __netdev_tx_sent_queue()") and allows for
better GSO performance.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
Reviewed-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index b55c91818a67..9aa6265bf4de 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -890,8 +890,6 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev)
u64_stats_update_end(_vec->tx_sync);
}
 
-   netdev_tx_sent_queue(nd_q, txbuf->real_len);
-
skb_tx_timestamp(skb);
 
tx_ring->wr_p += nr_frags + 1;
@@ -899,7 +897,7 @@ static int nfp_net_tx(struct sk_buff *skb, struct net_device *netdev)
nfp_net_tx_ring_stop(nd_q, tx_ring);
 
tx_ring->wr_ptr_add += nr_frags + 1;
-   if (!skb->xmit_more || netif_xmit_stopped(nd_q))
+   if (__netdev_tx_sent_queue(nd_q, txbuf->real_len, skb->xmit_more))
nfp_net_tx_xmit_more_flush(tx_ring);
 
return NETDEV_TX_OK;
-- 
2.17.1
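The old code flushed when `!skb->xmit_more || netif_xmit_stopped(nd_q)`; the new helper folds the BQL accounting and that flush decision into one call. As we read the helper added in e59020abf0f, when xmit_more says more packets follow it only accounts the bytes and asks for a flush only if the queue is already stopped, deferring the limit check to the last packet of the batch. A toy userspace model of that decision; `struct toy_txq` and `toy_tx_sent_queue()` are hypothetical, and the real BQL limit is dynamic rather than fixed.

```c
#include <stdbool.h>

/* Toy TX queue with a fixed byte limit, loosely modelled on BQL. */
struct toy_txq {
	unsigned int queued;	/* bytes handed to hardware, not yet completed */
	unsigned int limit;	/* byte limit (dynamic in real BQL) */
	bool stopped;		/* queue stopped because the limit was exceeded */
};

/*
 * Returns true when the driver should ring the doorbell (flush).
 * With xmit_more set, more packets are coming: only account the bytes and
 * flush early if the queue is already stopped. On the last packet of a
 * batch, apply the limit and always flush.
 */
static bool toy_tx_sent_queue(struct toy_txq *q, unsigned int bytes,
			      bool xmit_more)
{
	q->queued += bytes;

	if (xmit_more)
		return q->stopped;

	if (q->queued > q->limit)
		q->stopped = true;
	return true;
}
```

Deferring the limit check while xmit_more is set is, as far as we understand, where the GSO benefit comes from: a batch is accounted cheaply and the doorbell is rung once.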

Re: [PATCH v2 net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-09 Thread David Miller

From: Heiner Kallweit 
Date: Fri, 9 Nov 2018 18:35:52 +0100

> As a legacy of the very early days of phylib, member interrupts is
> defined as u32 even though it's just a flag indicating whether interrupts
> are enabled. So we can change it to a bitfield member. In addition, change
> the code dealing with this member so that it's clear we're
> dealing with a bool value.
> 
> Signed-off-by: Heiner Kallweit 
> ---
> v2:
> - use false/true instead of 0/1 for the constants

Applied.

[PATCH net-next 6/6] nfp: flower: remove unnecessary code in flow lookup

2018-11-09 Thread Jakub Kicinski

From: John Hurley 

Recent changes to NFP mean that stats updates from fw to driver no longer
require a flow lookup and (because egdev offload has been removed) the
ingress netdev for a lookup is now always known.

Remove obsolete code in a flow lookup that matches on host context and
that allows for a netdev to be NULL.
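After this change the compare callback reduces to "same ingress device and same cookie". A userspace C sketch of that callback, assuming the rhashtable convention that a compare function returns 0 on a match and nonzero otherwise; `struct flow_entry`, `struct cmp_arg` and `flow_obj_cmp()` are hypothetical stand-ins for nfp_fl_payload, nfp_fl_flow_table_cmp_arg and nfp_fl_obj_cmpfn().

```c
/* Hypothetical stand-ins for the table entry and the lookup argument. */
struct flow_entry {
	const void *ingress_dev;
	unsigned long cookie;
};

struct cmp_arg {
	const void *netdev;
	unsigned long cookie;
};

/* rhashtable-style compare: 0 means "match", nonzero means "no match". */
static int flow_obj_cmp(const struct cmp_arg *arg, const struct flow_entry *e)
{
	if (e->ingress_dev == arg->netdev)
		return e->cookie != arg->cookie;
	return 1;
}
```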

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/flower/main.h |  3 +--
 drivers/net/ethernet/netronome/nfp/flower/metadata.c | 11 +++
 drivers/net/ethernet/netronome/nfp/flower/offload.c  |  6 ++
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 9d134aa871fc..b858bac47621 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -20,7 +20,6 @@ struct nfp_fl_pre_lag;
 struct net_device;
 struct nfp_app;
 
-#define NFP_FL_STATS_CTX_DONT_CARE cpu_to_be32(0x)
 #define NFP_FL_STATS_ELEM_RS   FIELD_SIZEOF(struct nfp_fl_stats_id, \
 init_unalloc)
 #define NFP_FLOWER_MASK_ENTRY_RS   256
@@ -242,7 +241,7 @@ int nfp_modify_flow_metadata(struct nfp_app *app,
 
 struct nfp_fl_payload *
 nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
-  struct net_device *netdev, __be32 host_ctx);
+  struct net_device *netdev);
 struct nfp_fl_payload *
 nfp_flower_remove_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 9b4711ce98f0..573a4400a26c 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -21,7 +21,6 @@ struct nfp_mask_id_table {
 struct nfp_fl_flow_table_cmp_arg {
struct net_device *netdev;
unsigned long cookie;
-   __be32 host_ctx;
 };
 
 static int nfp_release_stats_entry(struct nfp_app *app, u32 stats_context_id)
@@ -76,14 +75,13 @@ static int nfp_get_stats_entry(struct nfp_app *app, u32 *stats_context_id)
 /* Must be called with either RTNL or rcu_read_lock */
 struct nfp_fl_payload *
 nfp_flower_search_fl_table(struct nfp_app *app, unsigned long tc_flower_cookie,
-  struct net_device *netdev, __be32 host_ctx)
+  struct net_device *netdev)
 {
struct nfp_fl_flow_table_cmp_arg flower_cmp_arg;
struct nfp_flower_priv *priv = app->priv;
 
flower_cmp_arg.netdev = netdev;
flower_cmp_arg.cookie = tc_flower_cookie;
-   flower_cmp_arg.host_ctx = host_ctx;
 
return rhashtable_lookup_fast(>flow_table, _cmp_arg,
  nfp_flower_table_params);
@@ -307,8 +305,7 @@ int nfp_compile_flow_metadata(struct nfp_app *app,
priv->stats[stats_cxt].bytes = 0;
priv->stats[stats_cxt].used = jiffies;
 
-   check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev,
-NFP_FL_STATS_CTX_DONT_CARE);
+   check_entry = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (check_entry) {
if (nfp_release_stats_entry(app, stats_cxt))
return -EINVAL;
@@ -353,9 +350,7 @@ static int nfp_fl_obj_cmpfn(struct rhashtable_compare_arg *arg,
const struct nfp_fl_flow_table_cmp_arg *cmp_arg = arg->key;
const struct nfp_fl_payload *flow_entry = obj;
 
-   if ((!cmp_arg->netdev || flow_entry->ingress_dev == cmp_arg->netdev) &&
-   (cmp_arg->host_ctx == NFP_FL_STATS_CTX_DONT_CARE ||
-flow_entry->meta.host_ctx_id == cmp_arg->host_ctx))
+   if (flow_entry->ingress_dev == cmp_arg->netdev)
return flow_entry->tc_flower_cookie != cmp_arg->cookie;
 
return 1;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 0e2dfbb3ef86..545d94168874 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -512,8 +512,7 @@ nfp_flower_del_offload(struct nfp_app *app, struct net_device *netdev,
if (nfp_netdev_is_nfp_repr(netdev))
port = nfp_port_from_netdev(netdev);
 
-   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev,
- NFP_FL_STATS_CTX_DONT_CARE);
+   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie, netdev);
if (!nfp_flow)
return -ENOENT;
 
@@ -561,8 +560,7 @@ nfp_flower_get_stats(struct nfp_app *app, struct net_device *netdev,
struct nfp_fl_payload *nfp_flow;
u32 ctx_id;
 
-   nfp_flow = nfp_flower_search_fl_table(app, flow->cookie,

Re: [PATCH net-next] udp6: cleanup stats accounting in recvmsg()

2018-11-09 Thread David Miller

From: Paolo Abeni 
Date: Fri,  9 Nov 2018 15:52:45 +0100

> In the udp6 code path, we needed multiple tests to select the correct
> mib to be updated. Since we touch at least one counter at each iteration,
> it's convenient to use the recently introduced __UDPX_MIB() helper once
> and remove some code duplication.
> 
> Signed-off-by: Paolo Abeni 

Applied, thanks.

Re: [PATCH net-next v2 0/2] dpaa2-eth: defer probe on object allocate

2018-11-09 Thread David Miller

From: Ioana Ciornei 
Date: Fri, 9 Nov 2018 15:26:45 +

> Allocatable objects on the fsl-mc bus may be probed by the fsl_mc_allocator
> after the first attempts of other drivers to use them. Defer the probe when
> this situation happens.
> 
> Changes in v2:
>   - proper handling of IS_ERR_OR_NULL

Series applied.
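
For readers following the IS_ERR_OR_NULL note in the v2 changelog: a common pitfall is that PTR_ERR(NULL) evaluates to 0, so returning PTR_ERR(obj) after an IS_ERR_OR_NULL() check silently reports success for NULL. Below is a minimal userspace sketch of one way to handle both cases. ERR_PTR(), PTR_ERR() and IS_ERR() are simplified copies of the kernel helpers, and probe_ret_from_obj() is hypothetical; this is not the dpaa2-eth code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, not exported to userspace */
#endif

#define MAX_ERRNO 4095

/* Simplified userspace copies of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: turn an allocator result that may be NULL or an
 * ERR_PTR() into a probe() return code. NULL is handled explicitly,
 * because PTR_ERR(NULL) would be 0 and look like success. */
static int probe_ret_from_obj(const void *obj)
{
	if (!obj)
		return -EPROBE_DEFER;		/* not available yet: probe again later */
	if (IS_ERR(obj))
		return (int)PTR_ERR(obj);	/* propagate the real error */
	return 0;
}
```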

Re: [net-next PATCH v3] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Jiri Pirko

Sat, Nov 10, 2018 at 01:11:10AM CET, amritha.namb...@intel.com wrote:

[...]

>@@ -1026,8 +1122,7 @@ static void fl_init_dissector(struct flow_dissector *dissector,
>FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
>-  FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>-   FLOW_DISSECTOR_KEY_PORTS, tp);
>+  FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);

You still need to set the key under a condition. Something like:
	if (FL_KEY_IS_MASKED(mask, tp) ||
	    FL_KEY_IS_MASKED(mask, tp_min) ||
	    FL_KEY_IS_MASKED(mask, tp_max))
		FL_KEY_SET(keys, cnt, FLOW_DISSECTOR_KEY_PORTS, tp);


>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,
>FLOW_DISSECTOR_KEY_IP, ip);
>   FL_KEY_SET_IF_MASKED(mask, keys, cnt,

[...]
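
Jiri's condition can be illustrated outside the kernel. In the sketch below, struct ports, struct mask, is_masked() and need_ports_key() are hypothetical simplifications: is_masked() plays the role of the memchr_inv()-based FL_KEY_IS_MASKED() check, and ports are plain uint16_t rather than __be16. Presumably the point of the condition is that filters which match on neither exact ports nor a port range should not pay for port dissection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct flow_dissector_key_ports. */
struct ports { uint16_t src, dst; };

/* Simplified stand-in for the flower mask: only the port members. */
struct mask { struct ports tp, tp_min, tp_max; };

/* True if any byte of the member is non-zero, i.e. the user masked it
 * (the same idea as memchr_inv(&member, 0, sizeof(member)) != NULL). */
static bool is_masked(const void *member, size_t len)
{
	const unsigned char *p = member;

	for (size_t i = 0; i < len; i++)
		if (p[i])
			return true;
	return false;
}

#define KEY_IS_MASKED(m, member) is_masked(&(m)->member, sizeof((m)->member))

/* Enable PORTS dissection if an exact port or either range bound is masked. */
static bool need_ports_key(const struct mask *m)
{
	return KEY_IS_MASKED(m, tp) ||
	       KEY_IS_MASKED(m, tp_min) ||
	       KEY_IS_MASKED(m, tp_max);
}
```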

[PATCH net] flow_dissector: do not dissect l4 ports for fragments

2018-11-09 Thread Eric Dumazet

From: 배석진 

Only the first fragment has the sport/dport information,
not the following ones.

If we want a consistent hash for all fragments, we need to
ignore ports even for the first fragment.

This bug is visible for IPv6 traffic, if incoming fragments
do not have a flow label, since skb_get_hash() will give
different results for the first fragment and the following ones.

It is also visible if any routing rule wants dissection
and matches on sport or dport.

See commit 5e5d6fed3741 ("ipv6: route: dissect flow
in input path if fib rules need it") for details.

[edumazet] rewrote the changelog completely.

Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: 배석진 
Signed-off-by: Eric Dumazet 
---
 net/core/flow_dissector.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 676f3ad629f95625422aa55f0f54157001ac477c..588f475019d47c9d6bae8883acebab48aaf63b48 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -1166,8 +1166,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
break;
}
 
-   if (dissector_uses_key(flow_dissector,
-  FLOW_DISSECTOR_KEY_PORTS)) {
+   if (dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_PORTS) &&
+   !(key_control->flags & FLOW_DIS_IS_FRAGMENT)) {
key_ports = skb_flow_dissector_target(flow_dissector,
  FLOW_DISSECTOR_KEY_PORTS,
  target_container);
-- 
2.19.1.930.g4563a0d9d0-goog
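
The effect described in this changelog can be reproduced with a toy hash. The sketch below is a minimal illustration, assuming a simplified struct flow and an FNV-1a style mix() in place of the kernel's flow keys and jhash. Because every fragment (including the first) carries the fragment flag, leaving ports out for fragments gives all pieces of one datagram the same hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified flow keys; not the kernel's struct flow_keys. */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;	/* only present in the first fragment */
	bool is_fragment;	/* set for every fragment, first one included */
};

/* Toy FNV-1a mixing of one 32-bit word, standing in for jhash. */
static uint32_t mix(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Ports are folded in only for non-fragmented packets, so the first
 * fragment (which has the L4 header) and later fragments (which do not)
 * hash identically. */
static uint32_t flow_hash(const struct flow *f)
{
	uint32_t h = 2166136261u;

	h = mix(h, f->saddr);
	h = mix(h, f->daddr);
	if (!f->is_fragment)
		h = mix(h, ((uint32_t)f->sport << 16) | f->dport);
	return h;
}
```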

Re: [Patch net-next] net: dump more useful information in netdev_rx_csum_fault()

2018-11-09 Thread Yunsheng Lin

On 2018/11/10 3:43, Cong Wang wrote:
> Currently netdev_rx_csum_fault() only shows a device name;
> we need more information about the skb for debugging.
> 
> Sample output:
> 
>  ens3: hw csum failure
>  dev features: 0x00014b89
>  skb len=84 data_len=0 gso_size=0 gso_type=0 ip_summed=0 csum=0, csum_complete_sw=0, csum_valid=0
> 
> Signed-off-by: Cong Wang 
> ---
>  include/linux/netdevice.h |  5 +++--
>  net/core/datagram.c   |  6 +++---
>  net/core/dev.c| 10 --
>  net/sunrpc/socklib.c  |  2 +-
>  4 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 857f8abf7b91..fabcd9fa6cf7 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -4332,9 +4332,10 @@ static inline bool can_checksum_protocol(netdev_features_t features,
>  }
>  
>  #ifdef CONFIG_BUG
> -void netdev_rx_csum_fault(struct net_device *dev);
> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb);
>  #else
> -static inline void netdev_rx_csum_fault(struct net_device *dev)
> +static inline void netdev_rx_csum_fault(struct net_device *dev,
> + struct sk_buff *skb)
>  {
>  }
>  #endif
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index 57f3a6fcfc1e..d8f4d55cd6c5 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -736,7 +736,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>   if (!skb_shared(skb))
>   skb->csum_valid = !sum;
> @@ -756,7 +756,7 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>  
>   if (!skb_shared(skb)) {
> @@ -810,7 +810,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>  
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(NULL);
> + netdev_rx_csum_fault(NULL, skb);
>   }
>   return 0;
>  fault:
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0ffcbdd55fa9..2b337df26117 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3091,10 +3091,16 @@ EXPORT_SYMBOL(__skb_gso_segment);
>  
>  /* Take action when hardware reception checksum errors are detected. */
>  #ifdef CONFIG_BUG
> -void netdev_rx_csum_fault(struct net_device *dev)
> +void netdev_rx_csum_fault(struct net_device *dev, struct sk_buff *skb)
>  {
>   if (net_ratelimit()) {
>   pr_err("%s: hw csum failure\n", dev ? dev->name : "");
> + if (dev)
> + pr_err("dev features: %pNF\n", >features);
> + pr_err("skb len=%d data_len=%d gso_size=%d gso_type=%d ip_summed=%d csum=%x, csum_complete_sw=%d, csum_valid=%d\n",
> +skb->len, skb->data_len, skb_shinfo(skb)->gso_size,
> +skb_shinfo(skb)->gso_type, skb->ip_summed, skb->csum,
> +skb->csum_complete_sw, skb->csum_valid);


This function also has the netdev available; why not use netdev_err() to log the error?

Also, dev->features was dumped before this patch; why remove it?


>   dump_stack();
>   }
>  }
> @@ -5779,7 +5785,7 @@ __sum16 __skb_gro_checksum_complete(struct sk_buff *skb)
>   if (likely(!sum)) {
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   }
>  
>   NAPI_GRO_CB(skb)->csum = wsum;
> diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
> index 9062967575c4..7e55cfc69697 100644
> --- a/net/sunrpc/socklib.c
> +++ b/net/sunrpc/socklib.c
> @@ -175,7 +175,7 @@ int csum_partial_copy_to_xdr(struct xdr_buf *xdr, struct sk_buff *skb)
>   return -1;
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>   !skb->csum_complete_sw)
> - netdev_rx_csum_fault(skb->dev);
> + netdev_rx_csum_fault(skb->dev, skb);
>   return 0;
>  no_checksum:
>   if (xdr_partial_copy_from_skb(xdr, 0, , xdr_skb_read_bits) < 0)
>

[PATCH net-next 4/6] nfp: flower: offload tunnel decap rules via indirect TC blocks

2018-11-09 Thread Jakub Kicinski

From: John Hurley 

Previously, TC block tunnel decap rules were only offloaded when a
callback was triggered through registration of the rule's egress device.
This meant that the driver had no access to the ingress netdev and so
could not verify it was the same tunnel type that the rule implied.

Register tunnel devices for indirect TC block offloads in NFP, giving
access to new rules based on the ingress device rather than egress. Use
this to verify the netdev type of VXLAN and Geneve based rules and offload
the rules to HW if applicable.

Tunnel registration is done via a netdev notifier. On notifier
registration, this is triggered for already existing netdevs. This means
that NFP can register for offloads from devices that exist before it is
loaded (filter rules will be replayed from the TC core). Similarly, on
notifier unregister, a call is triggered for each currently active netdev.
This allows the driver to unregister any indirect block callbacks that may
still be active.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/flower/main.c  |   6 +
 .../net/ethernet/netronome/nfp/flower/main.h  |   5 +
 .../ethernet/netronome/nfp/flower/offload.c   | 137 +-
 3 files changed, 144 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index 2ad00773750f..d1c3c2081461 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -568,6 +568,8 @@ static int nfp_flower_init(struct nfp_app *app)
goto err_cleanup_metadata;
}
 
+   INIT_LIST_HEAD(_priv->indr_block_cb_priv);
+
return 0;
 
 err_cleanup_metadata:
@@ -684,6 +686,10 @@ nfp_flower_netdev_event(struct nfp_app *app, struct net_device *netdev,
return ret;
}
 
+   ret = nfp_flower_reg_indir_block_handler(app, netdev, event);
+   if (ret & NOTIFY_STOP_MASK)
+   return ret;
+
return nfp_tunnel_mac_event_handler(app, netdev, event, ptr);
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 4a2b1a915131..8c84829ebd21 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -130,6 +130,7 @@ struct nfp_fl_lag {
  * @reify_wait_queue:  wait queue for repr reify response counting
  * @mtu_conf:  Configuration of repr MTU value
  * @nfp_lag:   Link aggregation data block
+ * @indr_block_cb_priv:List of priv data passed to indirect block cbs
  */
 struct nfp_flower_priv {
struct nfp_app *app;
@@ -162,6 +163,7 @@ struct nfp_flower_priv {
wait_queue_head_t reify_wait_queue;
struct nfp_mtu_conf mtu_conf;
struct nfp_fl_lag nfp_lag;
+   struct list_head indr_block_cb_priv;
 };
 
 /**
@@ -271,5 +273,8 @@ int nfp_flower_lag_populate_pre_action(struct nfp_app *app,
   struct nfp_fl_pre_lag *pre_act);
 int nfp_flower_lag_get_output_id(struct nfp_app *app,
 struct net_device *master);
+int nfp_flower_reg_indir_block_handler(struct nfp_app *app,
+  struct net_device *netdev,
+  unsigned long event);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 2c32edfc1a9d..222e1a98cf16 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -128,6 +128,7 @@ nfp_flower_calc_opt_layer(struct flow_dissector_key_enc_opts *enc_opts,
 
 static int
 nfp_flower_calculate_key_layers(struct nfp_app *app,
+   struct net_device *netdev,
struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow,
bool egress,
@@ -186,8 +187,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
skb_flow_dissector_target(flow->dissector,
  FLOW_DISSECTOR_KEY_ENC_CONTROL,
  flow->key);
-   if (!egress)
-   return -EOPNOTSUPP;
 
if (mask_enc_ctl->addr_type != 0x ||
enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
@@ -250,6 +249,10 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
default:
return -EOPNOTSUPP;
}
+
+   /* Ensure the ingress netdev matches the expected tun type. */
+   if (!nfp_fl_netdev_is_tunnel_type(netdev, *tun_type))
+   return -EOPNOTSUPP;
} else if (egress) {
/*

[PATCH net-next 0/6] net: sched: indirect tc block cb registration

2018-11-09 Thread Jakub Kicinski

John says:

This patchset introduces an alternative to egdev offload by allowing a
driver to register for block updates when an external device (e.g. tunnel
netdev) is bound to a TC block. Drivers can track new netdevs or register
to existing ones to receive information on such events. Based on this,
they may register for block offload rules using already existing
functions.

The patchset also implements this new indirect block registration in the
NFP driver to allow the offloading of tunnel rules. The use of egdev
offload (which is currently only used for tunnel offload) is subsequently
removed.

RFC v2 -> PATCH
 - removed embedded tracking function from indir block register (now up to
   driver to clean up after itself)
 - refactored NFP code due to recent submissions
 - removed priv list clean function in NFP (list should be cleared by
   indirect block unregisters)

RFC v1->v2:
 - free allocated owner struct in block_owner_clean function
 - add geneve type helper function
 - move test stub in NFP (v1 patch 2) to full tunnel offload
   implementation via indirect blocks (v2 patches 3-8)

John Hurley (6):
  net: sched: register callbacks for indirect tc block binds
  nfp: flower: allow non repr netdev offload
  nfp: flower: increase scope of netdev checking functions
  nfp: flower: offload tunnel decap rules via indirect TC blocks
  nfp: flower: remove TC egdev offloads
  nfp: flower: remove unnecessary code in flow lookup

 .../ethernet/netronome/nfp/flower/action.c|  28 +-
 .../net/ethernet/netronome/nfp/flower/cmsg.h  |  27 ++
 .../net/ethernet/netronome/nfp/flower/main.c  |  18 +-
 .../net/ethernet/netronome/nfp/flower/main.h  |  14 +-
 .../net/ethernet/netronome/nfp/flower/match.c |  38 +--
 .../ethernet/netronome/nfp/flower/metadata.c  |  12 +-
 .../ethernet/netronome/nfp/flower/offload.c   | 243 +++--
 .../netronome/nfp/flower/tunnel_conf.c|  19 +-
 include/net/pkt_cls.h |  34 +++
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c   | 256 +-
 11 files changed, 531 insertions(+), 161 deletions(-)

-- 
2.17.1

Re: [PATCH net-next 7/8] ixgbe: extend PTP gettime function to read system clock

2018-11-09 Thread Jeff Kirsher

On Fri, 2018-11-09 at 11:14 +0100, Miroslav Lichvar wrote:
> This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
> 
> Cc: Richard Cochran 
> Cc: Jacob Keller 
> Cc: Jeff Kirsher 
> Signed-off-by: Miroslav Lichvar 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 54 
>  1 file changed, 44 insertions(+), 10 deletions(-)

Acked-by: Jeff Kirsher 


Re: [PATCH net-next 5/8] e1000e: extend PTP gettime function to read system clock

2018-11-09 Thread Jeff Kirsher

On Fri, 2018-11-09 at 11:14 +0100, Miroslav Lichvar wrote:
> This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
> 
> Cc: Richard Cochran 
> Cc: Jacob Keller 
> Cc: Jeff Kirsher 
> Signed-off-by: Miroslav Lichvar 
> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h  |  3 ++
>  drivers/net/ethernet/intel/e1000e/netdev.c | 42 --
>  drivers/net/ethernet/intel/e1000e/ptp.c| 16 +
>  3 files changed, 45 insertions(+), 16 deletions(-)

Acked-by: Jeff Kirsher 


Re: [PATCH net-next 6/8] igb: extend PTP gettime function to read system clock

2018-11-09 Thread Jeff Kirsher

On Fri, 2018-11-09 at 11:14 +0100, Miroslav Lichvar wrote:
> This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.
> 
> Cc: Richard Cochran 
> Cc: Jacob Keller 
> Cc: Jeff Kirsher 
> Signed-off-by: Miroslav Lichvar 
> ---
>  drivers/net/ethernet/intel/igb/igb_ptp.c | 65 
>  1 file changed, 55 insertions(+), 10 deletions(-)

Acked-by: Jeff Kirsher 


Re: [PATCH net] net: qualcomm: rmnet: Fix incorrect assignment of real_dev

2018-11-09 Thread David Miller

From: Subash Abhinov Kasiviswanathan 
Date: Fri,  9 Nov 2018 18:56:27 -0700

> A null dereference was observed when a sysctl was being set
> from userspace and rmnet was stuck trying to complete some actions
> in the NETDEV_REGISTER callback. This is because the real_dev is set
> only after the device registration handler completes.
 ...
> Fixes: b752eff5be24 ("net: qualcomm: rmnet: Implement ndo_get_iflink")
> Signed-off-by: Sean Tranchetti 
> Signed-off-by: Subash Abhinov Kasiviswanathan 

Applied and queued up for -stable, thanks.

[PATCH net-next 5/6] nfp: flower: remove TC egdev offloads

2018-11-09 Thread Jakub Kicinski

From: John Hurley 

Previously, only tunnel decap rules required egdev registration for
offload in NFP. These are now supported via indirect TC block callbacks.

Remove the egdev code from NFP.

Signed-off-by: John Hurley 
Reviewed-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/flower/main.c  | 12 ---
 .../net/ethernet/netronome/nfp/flower/main.h  |  3 -
 .../ethernet/netronome/nfp/flower/metadata.c  |  1 +
 .../ethernet/netronome/nfp/flower/offload.c   | 79 ---
 4 files changed, 17 insertions(+), 78 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index d1c3c2081461..5059110a1768 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -146,23 +146,12 @@ nfp_flower_repr_netdev_stop(struct nfp_app *app, struct nfp_repr *repr)
return nfp_flower_cmsg_portmod(repr, false, repr->netdev->mtu, false);
 }
 
-static int
-nfp_flower_repr_netdev_init(struct nfp_app *app, struct net_device *netdev)
-{
-   return tc_setup_cb_egdev_register(netdev,
- nfp_flower_setup_tc_egress_cb,
- netdev_priv(netdev));
-}
-
 static void
 nfp_flower_repr_netdev_clean(struct nfp_app *app, struct net_device *netdev)
 {
struct nfp_repr *repr = netdev_priv(netdev);
 
kfree(repr->app_priv);
-
-   tc_setup_cb_egdev_unregister(netdev, nfp_flower_setup_tc_egress_cb,
-netdev_priv(netdev));
 }
 
 static void
@@ -711,7 +700,6 @@ const struct nfp_app_type app_flower = {
.vnic_init  = nfp_flower_vnic_init,
.vnic_clean = nfp_flower_vnic_clean,
 
-   .repr_init  = nfp_flower_repr_netdev_init,
.repr_preclean  = nfp_flower_repr_netdev_preclean,
.repr_clean = nfp_flower_repr_netdev_clean,
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 8c84829ebd21..9d134aa871fc 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -207,7 +207,6 @@ struct nfp_fl_payload {
char *unmasked_data;
char *mask_data;
char *action_data;
-   bool ingress_offload;
 };
 
 extern const struct rhashtable_params nfp_flower_table_params;
@@ -259,8 +258,6 @@ void nfp_tunnel_del_ipv4_off(struct nfp_app *app, __be32 ipv4);
 void nfp_tunnel_add_ipv4_off(struct nfp_app *app, __be32 ipv4);
 void nfp_tunnel_request_route(struct nfp_app *app, struct sk_buff *skb);
 void nfp_tunnel_keep_alive(struct nfp_app *app, struct sk_buff *skb);
-int nfp_flower_setup_tc_egress_cb(enum tc_setup_type type, void *type_data,
- void *cb_priv);
 void nfp_flower_lag_init(struct nfp_fl_lag *lag);
 void nfp_flower_lag_cleanup(struct nfp_fl_lag *lag);
 int nfp_flower_lag_reset(struct nfp_fl_lag *lag);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
index 48729bf171e0..9b4711ce98f0 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c
@@ -287,6 +287,7 @@ int nfp_compile_flow_metadata(struct nfp_app *app,
 
nfp_flow->meta.host_ctx_id = cpu_to_be32(stats_cxt);
nfp_flow->meta.host_cookie = cpu_to_be64(flow->cookie);
+   nfp_flow->ingress_dev = netdev;
 
new_mask_id = 0;
if (!nfp_check_mask_add(app, nfp_flow->mask_data,
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 222e1a98cf16..0e2dfbb3ef86 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -131,7 +131,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
struct net_device *netdev,
struct nfp_fl_key_ls *ret_key_ls,
struct tc_cls_flower_offload *flow,
-   bool egress,
enum nfp_flower_tun_type *tun_type)
 {
struct flow_dissector_key_basic *mask_basic = NULL;
@@ -253,9 +252,6 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
/* Ensure the ingress netdev matches the expected tun type. */
if (!nfp_fl_netdev_is_tunnel_type(netdev, *tun_type))
return -EOPNOTSUPP;
-   } else if (egress) {
-   /* Reject non tunnel matches offloaded to egress repr. */
-   return -EOPNOTSUPP;
}
 
if (dissector_uses_key(flow->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
@@ -376,7 +372,7 @@ nfp_flower_calculate_key_layers(struct nfp_app *app,
 }
 
 static struct nfp_fl_payload *
-nfp_flower_allocate_new(struct nfp_fl_key_ls *key_layer, bool

Re: [net-next PATCH v2] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Nambiar, Amritha

On 11/9/2018 4:10 AM, Jiri Pirko wrote:
> Wed, Nov 07, 2018 at 10:22:42PM CET, amritha.namb...@intel.com wrote:
>> Added support in tc flower for filtering based on port ranges.
>>
>> Example:
>> 1. Match on a port range:
>> -
>> $ tc filter add dev enp4s0 protocol ip parent :\
>>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>>  action drop
>>
>> $ tc -s filter show dev enp4s0 parent :
>> filter protocol ip pref 1 flower chain 0
>> filter protocol ip pref 1 flower chain 0 handle 0x1
>>  eth_type ipv4
>>  ip_proto tcp
>>  dst_port range 20-30
>>  skip_hw
>>  not_in_hw
>>action order 1: gact action drop
>> random type none pass val 0
>> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>>Action statistics:
>>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>>backlog 0b 0p requeues 0
>>
>> 2. Match on IP address and port range:
>> --
>> $ tc filter add dev enp4s0 protocol ip parent :\
>>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>>  skip_hw action drop
>>
>> $ tc -s filter show dev enp4s0 parent :
>> filter protocol ip pref 1 flower chain 0 handle 0x2
>>  eth_type ipv4
>>  ip_proto tcp
>>  dst_ip 192.168.1.1
>>  dst_port range 100-200
>>  skip_hw
>>  not_in_hw
>>action order 1: gact action drop
>> random type none pass val 0
>> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>>Action statistics:
>>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>>backlog 0b 0p requeues 0
>>
>> v2:
>> Addressed Jiri's comments:
>> 1. Added separate functions for dst and src comparisons.
>> 2. Removed endpoint enum.
>> 3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
>>  lookup.
>> 4. Cleaned up fl_lookup function.
>>
>> Signed-off-by: Amritha Nambiar 
>> ---
>> include/uapi/linux/pkt_cls.h |7 ++
>> net/sched/cls_flower.c   |  133 
>> --
>> 2 files changed, 134 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 401d0c1..b63c3cf 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -405,6 +405,11 @@ enum {
>>  TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>>  TCA_FLOWER_KEY_UDP_DST, /* be16 */
>>
>> +TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>> +
> 
> Please put it at the end of the enum, as David mentioned.

Will fix in v3.

> 
> 
>>  TCA_FLOWER_FLAGS,
>>  TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>>  TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>> @@ -518,6 +523,8 @@ enum {
>>  TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
>> };
>>
>> +#define TCA_FLOWER_MASK_FLAGS_RANGE (1 << 0) /* Range-based match */
>> +
>> /* Match-all classifier */
>>
>> enum {
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 9aada2d..9d2582d 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -55,6 +55,9 @@ struct fl_flow_key {
>>  struct flow_dissector_key_ip ip;
>>  struct flow_dissector_key_ip enc_ip;
>>  struct flow_dissector_key_enc_opts enc_opts;
>> +
> 
> No need for an empty line.

Will fix in v3.

> 
> 
>> +struct flow_dissector_key_ports tp_min;
>> +struct flow_dissector_key_ports tp_max;
>> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
>>
>> struct fl_flow_mask_range {
>> @@ -65,6 +68,7 @@ struct fl_flow_mask_range {
>> struct fl_flow_mask {
>>  struct fl_flow_key key;
>>  struct fl_flow_mask_range range;
>> +u32 flags;
>>  struct rhash_head ht_node;
>>  struct rhashtable ht;
>>  struct rhashtable_params filter_ht_params;
>> @@ -179,13 +183,89 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
>>  memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
>> }
>>
>> -static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
>> -   struct fl_flow_key *mkey)
>> +static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter,
>> +  struct fl_flow_key *key,
>> +  struct fl_flow_key *mkey)
>> +{
>> +__be16 min_mask, max_mask, min_val, max_val;
>> +
>> +min_mask = htons(filter->mask->key.tp_min.dst);
>> +max_mask = htons(filter->mask->key.tp_max.dst);
>> +min_val = htons(filter->key.tp_min.dst);
>> +max_val = htons(filter->key.tp_max.dst);
>> +
>> +if (min_mask && max_mask) {
>> +if (htons(key->tp.dst) < min_val ||
>> +htons(key->tp.dst) > max_val)
>> +return false;
>> +
>> +/* skb does not have min and max values */
>> +
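
The range check discussed in this thread compares ports that are stored in network byte order, so they have to be converted to host order before any ordered comparison; comparing raw big-endian values gives wrong answers on little-endian hosts. A minimal sketch, assuming a hypothetical struct port_range that stores both bounds in network byte order (ntohs() and htons() perform the same swap, but ntohs() documents that the result is a host-order value):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical range filter; bounds kept in network byte order, like the
 * __be16 members of struct flow_dissector_key_ports. */
struct port_range {
	uint16_t min_be;
	uint16_t max_be;
};

/* Convert everything to host order first, then compare. */
static bool port_in_range(uint16_t port_be, const struct port_range *r)
{
	uint16_t port = ntohs(port_be);

	return port >= ntohs(r->min_be) && port <= ntohs(r->max_be);
}
```

On a little-endian machine htons(256) is stored as 0x0001, so a raw comparison would place port 256 below a lower bound of 20 (stored as 0x1400); converting with ntohs() first avoids that.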

Re: [net-next PATCH v2] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Nambiar, Amritha

On 11/8/2018 3:15 PM, David Miller wrote:
> From: Amritha Nambiar 
> Date: Wed, 07 Nov 2018 13:22:42 -0800
> 
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 401d0c1..b63c3cf 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -405,6 +405,11 @@ enum {
>>  TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>>  TCA_FLOWER_KEY_UDP_DST, /* be16 */
>>  
>> +TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>> +TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>> +
>>  TCA_FLOWER_FLAGS,
>>  TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>>  TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>> @@ -518,6 +523,8 @@ enum {
> 
> I don't think you can do this without breaking UAPI, this changes the
> value of TCA_FLOWER_FLAGS and all subsequent values in this
> enumeration.
> 

Will move the new fields to the bottom of the enum in v3.
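
David's point about the UAPI can be checked at compile time. The enums below are hypothetical and shortened; only the relative position of the new entries matters. Values after an insertion point are renumbered, while appended entries leave existing values untouched (in pkt_cls.h that means adding them just before the __TCA_FLOWER_MAX sentinel).

```c
#include <assert.h>

/* What existing binaries were built against. */
enum attrs_v1 {
	V1_UNSPEC,
	V1_UDP_DST,
	V1_FLAGS,	/* = 2 */
};

/* Inserting in the middle: FLAGS moves, so an old binary sending 2
 * for "FLAGS" is now interpreted as "PORT_MIN". */
enum attrs_middle {
	M_UNSPEC,
	M_UDP_DST,
	M_PORT_MIN,	/* = 2 */
	M_FLAGS,	/* = 3: ABI break */
};

/* Appending: every existing value stays the same. */
enum attrs_append {
	A_UNSPEC,
	A_UDP_DST,
	A_FLAGS,	/* still 2 */
	A_PORT_MIN,	/* = 3, new */
};

static_assert(V1_FLAGS == A_FLAGS, "appending preserves existing values");
static_assert(V1_FLAGS != M_FLAGS, "inserting in the middle renumbers them");
```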

Re: [PATCH net v2] net: sched: cls_flower: validate nested enc_opts_policy to avoid warning

2018-11-09 Thread Jiri Pirko

Sat, Nov 10, 2018 at 06:06:26AM CET, jakub.kicin...@netronome.com wrote:
>TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
>currently contain further nested attributes, which are parsed by
>hand, so the policy is never actually used resulting in a W=1
>build warning:
>
>net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=]
> enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = {
>
>Add the validation anyway to avoid potential bugs when other
>attributes are added and to make the attribute structure slightly
>clearer.  Validation will also set extack to point to the bad
>attribute on error.
>
>Signed-off-by: Jakub Kicinski 
>Acked-by: Simon Horman 

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Acked-by: Jiri Pirko

[PATCH net] net: sched: cls_flower: validate nested enc_opts_policy to avoid build warning

2018-11-09 Thread Jakub Kicinski

TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only
currently contain further nested attributes, which are parsed by
hand, so the policy is never actually used.  Add the validation
anyway to avoid potential bugs when other attributes are added
and to make the attribute structure slightly clearer.  Validation
will also set extack to point to the bad attribute on error.

Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Signed-off-by: Jakub Kicinski 
Acked-by: Simon Horman 
---
 net/sched/cls_flower.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9aada2d0ef06..c6c327874abc 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -709,11 +709,23 @@ static int fl_set_enc_opt(struct nlattr **tb, struct fl_flow_key *key,
  struct netlink_ext_ack *extack)
 {
const struct nlattr *nla_enc_key, *nla_opt_key, *nla_opt_msk = NULL;
-   int option_len, key_depth, msk_depth = 0;
+   int err, option_len, key_depth, msk_depth = 0;
+
+   err = nla_validate_nested(tb[TCA_FLOWER_KEY_ENC_OPTS],
+ TCA_FLOWER_KEY_ENC_OPTS_MAX,
+ enc_opts_policy, extack);
+   if (err)
+   return err;
 
nla_enc_key = nla_data(tb[TCA_FLOWER_KEY_ENC_OPTS]);
 
if (tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]) {
+   err = nla_validate_nested(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK],
+ TCA_FLOWER_KEY_ENC_OPTS_MAX,
+ enc_opts_policy, extack);
+   if (err)
+   return err;
+
nla_opt_msk = nla_data(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
msk_depth = nla_len(tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
}
-- 
2.17.1
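
nla_validate_nested() checks each nested attribute against a policy before the hand-written parser in fl_set_enc_opt() walks the payload. The same idea can be sketched in userspace. struct tlv, struct policy and validate_nested() are hypothetical, use host byte order, and ignore netlink's 4-byte attribute alignment, so this is not the kernel's nla_* implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified attribute header: total length (header + payload), then type. */
struct tlv {
	uint16_t len;
	uint16_t type;
};

/* Hypothetical per-type policy: minimum payload length in bytes. */
struct policy {
	uint16_t min_payload;
};

/* Check every attribute in buf before any hand-written parsing happens.
 * On failure, *bad_off holds the offset of the offending attribute,
 * loosely mirroring how extack points at the bad attribute. */
static int validate_nested(const uint8_t *buf, size_t buflen, int maxtype,
			   const struct policy *pol, size_t *bad_off)
{
	size_t off = 0;

	while (off < buflen) {
		struct tlv hdr;

		*bad_off = off;
		if (buflen - off < sizeof(hdr))
			return -EINVAL;		/* truncated header */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > buflen - off)
			return -EINVAL;		/* bogus length */
		if (hdr.type > maxtype)
			return -EINVAL;		/* unknown attribute type */
		if (hdr.len - sizeof(hdr) < pol[hdr.type].min_payload)
			return -EINVAL;		/* payload too short */
		off += hdr.len;
	}
	return 0;
}
```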

Re: [PATCH net 1/1] bnx2x: Assign unique DMAE channel number for FW DMAE transactions.

2018-11-09 Thread David Miller

From: Sudarsana Reddy Kalluru 
Date: Fri, 9 Nov 2018 02:10:43 -0800

> +/* Following is the DMAE channel number allocation for the clients.
> + *   MFW: OCBB/OCSD implementations use DMAE channels 14/15 respectively.
> + *   Driver: 0-3 and 8-11 (for PF dmae operations)
> + *   4 and 12 (for stats requests)
> + */
> +#define BNX2X_FW_DMAE_C 13 /* Channel for FW DMAE operations */
 ...
> + start_params->dmae_cmd_id = BNX2X_FW_DMAE_C;

Why do you need this? It never changes, and:

> + rdata->dmae_cmd_id  = start_params->dmae_cmd_id;

It is always the same value here, in the one place it is used.

Just assign BNX2X_FW_DMAE_C directly to rdata->dmae_cmd_id please.
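
The channel map from the quoted comment and the suggested direct assignment can be sketched as follows. channel_used_by_others(), struct ramrod_data and fill_ramrod() are hypothetical names; only BNX2X_FW_DMAE_C and the channel numbers come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define BNX2X_FW_DMAE_C 13	/* Channel for FW DMAE operations */

/* Channels claimed by other clients, per the quoted comment. */
static bool channel_used_by_others(int ch)
{
	if ((ch >= 0 && ch <= 3) || (ch >= 8 && ch <= 11))
		return true;	/* driver: PF DMAE operations */
	if (ch == 4 || ch == 12)
		return true;	/* driver: statistics requests */
	if (ch == 14 || ch == 15)
		return true;	/* MFW: OCBB / OCSD */
	return false;
}

/* Hypothetical ramrod data holding the channel id. */
struct ramrod_data {
	unsigned char dmae_cmd_id;
};

/* Assign the constant directly instead of copying it through an
 * intermediate start_params field that never changes. */
static void fill_ramrod(struct ramrod_data *rdata)
{
	rdata->dmae_cmd_id = BNX2X_FW_DMAE_C;
}
```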

Re: [PATCH 08/20] octeontx2-af: Alloc and config NPC MCAM entry at a time

2018-11-09 Thread Arnd Bergmann

On Fri, Nov 9, 2018 at 6:13 PM Sunil Kovvuri  wrote:
> On Fri, Nov 9, 2018 at 4:32 PM Arnd Bergmann  wrote:
> > On Fri, Nov 9, 2018 at 5:21 AM Sunil Kovvuri  
> > wrote:

> >
> > Since b is aligned to four bytes, you get padding between a and b.
> > On top of that, you also get padding after c to make the size of
> > structure itself be a multiple of its alignment. For interfaces, we
> > should avoid both kinds of padding. This can be done by marking
> > members as __packed (usually I don't recommend that), by
> > changing the size of members, or by adding explicit 'reserved'
> > fields in place of the padding.
> >
> > > > I also noticed a similar problem in struct mbox_msghdr. Maybe
> > > > use the 'pahole' tool to check for this kind of padding in the
> > > > API structures.
>
> Got your point now and agree that padding has to be avoided.
> But this is a big change and above pointed structure is not
> the only one as this applies to all structures in the file.
>
> Would it be okay if I submit a separate patch after this series
> addressing all structures ?

It depends on how you want to address it. If you want to
change the structure layout, then I think it would be better
integrated into the series as that is an incompatible interface
change. If you just want to add reserved members to make
the padding explicit, that could be a follow-up.

Arnd

Re: [PATCH net-next 0/4] Remove VLAN_TAG_PRESENT from drivers

2018-11-09 Thread Shiraz Saleem

On Thu, Nov 08, 2018 at 06:44:46PM +0100, Michał Mirosław wrote:
> This series removes VLAN_TAG_PRESENT use from network drivers in
> preparation for removing its special meaning.
> 
> Michał Mirosław (4):
>   i40iw: remove use of VLAN_TAG_PRESENT
>   cnic: remove use of VLAN_TAG_PRESENT
>   gianfar: remove use of VLAN_TAG_PRESENT
>   OVS: remove use of VLAN_TAG_PRESENT
> 
>  drivers/infiniband/hw/i40iw/i40iw_cm.c|  8 +++
>

i40iw bit looks fine. Thanks!

Acked-by: Shiraz Saleem

Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-09 Thread David Ahern

On 11/9/18 9:21 AM, David Ahern wrote:
>> Is there possible to add only counters from xdp for vlans ?
>> This will help me in testing.
> I will take a look today at adding counters that you can dump using
> bpftool. It will be a temporary solution for this xdp program only.
> 

Same tree, kernel-tables-wip-02 branch. Compile kernel and install.
Compile samples as before.

If you give the userspace program a -t arg, it loops showing stats.
Ctrl-C to break. The xdp programs are not detached on exit.

Example:

./xdp_fwd -t 5 eth1 eth2 eth3 eth4

15:59:32:       rx        tx  dropped  skipped   l3_dev  fib_dev
index  3:   901158    901158        0       18        0        0
index  4:   901159    901158        0       20        0   901139
index 10:        0         0        0        0       19       19
index 11:        0         0        0        0   901139   901139
index 15:        0         0        0        0       19       19
index 16:        0         0        0        0   901139        0

Rx and Tx counters are for the physical port.

VLANs show up as l3_dev (ingress) and fib_dev (egress).

dropped is anytime the xdp program returns XDP_DROP (e.g., invalid packet)

skipped is anytime the program returns XDP_PASS (e.g., not ipv4 or ipv6,
local traffic, or needs full stack assist).
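
The counter semantics above can be modelled in userspace as follows. This is an illustrative sketch, not the code in the kernel-tables-wip-02 branch: the struct, the `demo_account()` function and the rule that `rx` counts every packet seen on the physical port are assumptions; only the action values are taken from `enum xdp_action` in `<linux/bpf.h>`.

```c
#include <stdint.h>

/* Same numeric values as enum xdp_action in <linux/bpf.h>. */
enum demo_xdp_action {
	DEMO_XDP_ABORTED = 0,
	DEMO_XDP_DROP,
	DEMO_XDP_PASS,
	DEMO_XDP_TX,
	DEMO_XDP_REDIRECT,
};

/* One row of the stats table, indexed by ifindex (hypothetical layout). */
struct demo_fwd_stats {
	uint64_t rx, tx, dropped, skipped, l3_dev, fib_dev;
};

#define DEMO_MAX_IFINDEX 64

static int demo_ok(int idx)
{
	return idx >= 0 && idx < DEMO_MAX_IFINDEX;
}

/*
 * Account one packet received on physical port in_phys. For a forwarded
 * packet, in_l3 is the ingress L3 device (a VLAN or the port itself),
 * out_fib the egress device from the FIB lookup and out_phys the physical
 * port it leaves on. Returns -1 if any index is out of range.
 */
static int demo_account(struct demo_fwd_stats *tbl, int in_phys, int in_l3,
			int out_fib, int out_phys, enum demo_xdp_action act)
{
	if (!demo_ok(in_phys) || !demo_ok(in_l3) ||
	    !demo_ok(out_fib) || !demo_ok(out_phys))
		return -1;

	tbl[in_phys].rx++;
	switch (act) {
	case DEMO_XDP_DROP:
		tbl[in_phys].dropped++;		/* e.g. invalid packet */
		break;
	case DEMO_XDP_PASS:
		tbl[in_phys].skipped++;		/* left to the full stack */
		break;
	case DEMO_XDP_REDIRECT:
		tbl[in_l3].l3_dev++;		/* ingress L3 device */
		tbl[out_fib].fib_dev++;		/* egress device from the FIB */
		tbl[out_phys].tx++;		/* physical egress port */
		break;
	default:
		break;
	}
	return 0;
}
```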

Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-09 Thread Edward Cree

On 09/11/18 04:35, Alexei Starovoitov wrote:
> On Thu, Nov 08, 2018 at 10:56:55PM +, Edward Cree wrote:
>>  think this question of maps should be discussed in tomorrow's
>>  call, since it is when we start having other kinds of instances
> turned out most of us have a conflict, so the earliest is 1:30pm on Friday.
> still works for you?

Yep (that's 9.30pm GMT right?)

I'm assuming same bluejeans link again.

-Ed

[PATCH net-next 0/2] net: phy: add macros for PHYID matching in PHY driver config

2018-11-09 Thread Heiner Kallweit

Add macros for PHYID matching to be used in PHY driver configs.
By using these macros some boilerplate code can be avoided.

Use them initially in the Realtek PHY drivers.

Heiner Kallweit (2):
  net: phy: add macros for PHYID matching
  net: phy: realtek: use new PHYID matching macros

 drivers/net/phy/realtek.c | 29 ++---
 include/linux/phy.h   |  4 
 2 files changed, 14 insertions(+), 19 deletions(-)

-- 
2.19.1

Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-09 Thread Paweł Staszewski





On 09.11.2018 at 17:21, David Ahern wrote:

> On 11/9/18 3:20 AM, Paweł Staszewski wrote:
>
>> I just caught some weird behavior :)
>> Everything was working fine for about 20k packets.
>>
>> Then after xdp start to forward every 10 packets
>
> Interesting. Any counter showing drops?

Nothing that would fit:

NIC statistics:
 rx_packets: 187041
 rx_bytes: 10600954
 tx_packets: 40316
 tx_bytes: 16526844
 tx_tso_packets: 797
 tx_tso_bytes: 3876084
 tx_tso_inner_packets: 0
 tx_tso_inner_bytes: 0
 tx_added_vlan_packets: 38391
 tx_nop: 2
 rx_lro_packets: 0
 rx_lro_bytes: 0
 rx_ecn_mark: 0
 rx_removed_vlan_packets: 187041
 rx_csum_unnecessary: 0
 rx_csum_none: 150011
 rx_csum_complete: 37030
 rx_csum_unnecessary_inner: 0
 rx_xdp_drop: 0
 rx_xdp_redirect: 64893
 rx_xdp_tx_xmit: 0
 rx_xdp_tx_full: 0
 rx_xdp_tx_err: 0
 rx_xdp_tx_cqe: 0
 tx_csum_none: 2468
 tx_csum_partial: 35955
 tx_csum_partial_inner: 0
 tx_queue_stopped: 0
 tx_queue_dropped: 0
 tx_xmit_more: 0
 tx_recover: 0
 tx_cqes: 38423
 tx_queue_wake: 0
 tx_udp_seg_rem: 0
 tx_cqe_err: 0
 tx_xdp_xmit: 0
 tx_xdp_full: 0
 tx_xdp_err: 0
 tx_xdp_cqes: 0
 rx_wqe_err: 0
 rx_mpwqe_filler_cqes: 0
 rx_mpwqe_filler_strides: 0
 rx_buff_alloc_err: 0
 rx_cqe_compress_blks: 0
 rx_cqe_compress_pkts: 0
 rx_page_reuse: 0
 rx_cache_reuse: 186302
 rx_cache_full: 0
 rx_cache_empty: 666768
 rx_cache_busy: 174
 rx_cache_waive: 0
 rx_congst_umr: 0
 rx_arfs_err: 0
 ch_events: 249320
 ch_poll: 249321
 ch_arm: 249001
 ch_aff_change: 0
 ch_eq_rearm: 0
 rx_out_of_buffer: 0
 rx_if_down_packets: 57
 rx_vport_unicast_packets: 142659
 rx_vport_unicast_bytes: 42706914
 tx_vport_unicast_packets: 40167
 tx_vport_unicast_bytes: 16668096
 rx_vport_multicast_packets: 39188170
 rx_vport_multicast_bytes: 3466527450
 tx_vport_multicast_packets: 58
 tx_vport_multicast_bytes: 4556
 rx_vport_broadcast_packets: 16343520
 rx_vport_broadcast_bytes: 1031334602
 tx_vport_broadcast_packets: 91
 tx_vport_broadcast_bytes: 5460
 rx_vport_rdma_unicast_packets: 0
 rx_vport_rdma_unicast_bytes: 0
 tx_vport_rdma_unicast_packets: 0
 tx_vport_rdma_unicast_bytes: 0
 rx_vport_rdma_multicast_packets: 0
 rx_vport_rdma_multicast_bytes: 0
 tx_vport_rdma_multicast_packets: 0
 tx_vport_rdma_multicast_bytes: 0
 tx_packets_phy: 40316
 rx_packets_phy: 55674361
 rx_crc_errors_phy: 0
 tx_bytes_phy: 16839376
 rx_bytes_phy: 4763267396
 tx_multicast_phy: 58
 tx_broadcast_phy: 91
 rx_multicast_phy: 39188180
 rx_broadcast_phy: 16343521
 rx_in_range_len_errors_phy: 0
 rx_out_of_range_len_phy: 0
 rx_oversize_pkts_phy: 0
 rx_symbol_err_phy: 0
 tx_mac_control_phy: 0
 rx_mac_control_phy: 0
 rx_unsupported_op_phy: 0
 rx_pause_ctrl_phy: 0
 tx_pause_ctrl_phy: 0
 rx_discards_phy: 1
 tx_discards_phy: 0
 tx_errors_phy: 0
 rx_undersize_pkts_phy: 0
 rx_fragments_phy: 0
 rx_jabbers_phy: 0
 rx_64_bytes_phy: 3792455
 rx_65_to_127_bytes_phy: 51821620
 rx_128_to_255_bytes_phy: 37669
 rx_256_to_511_bytes_phy: 1481
 rx_512_to_1023_bytes_phy: 434
 rx_1024_to_1518_bytes_phy: 694
 rx_1519_to_2047_bytes_phy: 20008
 rx_2048_to_4095_bytes_phy: 0
 rx_4096_to_8191_bytes_phy: 0
 rx_8192_to_10239_bytes_phy: 0
 link_down_events_phy: 0
 rx_pcs_symbol_err_phy: 0
 rx_corrected_bits_phy: 6
 rx_err_lane_0_phy: 0
 rx_err_lane_1_phy: 0
 rx_err_lane_2_phy: 0
 rx_err_lane_3_phy: 6
 rx_buffer_passed_thres_phy: 0
 rx_pci_signal_integrity: 0
 tx_pci_signal_integrity: 82
 outbound_pci_stalled_rd: 0
 outbound_pci_stalled_wr: 0
 outbound_pci_stalled_rd_events: 0
 outbound_pci_stalled_wr_events: 0
 rx_prio0_bytes: 4144920388
 rx_prio0_packets: 48310037
 tx_prio0_bytes: 16839376
 tx_prio0_packets: 40316
 rx_prio1_bytes: 481032
 rx_prio1_packets: 7074
 tx_prio1_bytes: 0
 tx_prio1_packets: 0
 rx_prio2_bytes: 9074194
 rx_prio2_packets: 106207
 tx_prio2_bytes: 0
 tx_prio2_packets: 0
 rx_prio3_bytes: 0
 rx_prio3_packets: 0
 tx_prio3_bytes: 0
 tx_prio3_packets: 0
 rx_prio4_bytes: 0
 rx_prio4_packets: 0
 tx_prio4_bytes: 0
 tx_prio4_packets: 0
 rx_prio5_bytes: 0
 rx_prio5_packets: 0
 tx_prio5_bytes: 0
 tx_prio5_packets: 0
 rx_prio6_bytes: 371961810
 rx_prio6_packets: 4006281
 tx_prio6_bytes: 0
 tx_prio6_packets: 0
 rx_prio7_bytes: 236830040
 rx_prio7_packets: 3244761
 tx_prio7_bytes: 0
 tx_prio7_packets: 0
 tx_pause_storm_warning_events : 0
 tx_pause_storm_error_events: 0
 module_unplug: 0
 module_bus_stuck: 0
 module_high_temp: 0
 module_bad_shorted: 0

NIC

Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-09 Thread Paweł Staszewski





On 08.11.2018 at 20:12, Paweł Staszewski wrote:
CPU load is lower than with the ConnectX-4, but it looks like the
bandwidth limit is the same :)

But also after reaching 60Gbit/60Gbit

 bwm-ng v0.6.1 (probing every 1.000s), press 'h' for help
  input: /proc/net/dev type: rate
  -         iface            Rx            Tx         Total
  =========================================================
         enp175s0:   45.09 Gb/s    15.09 Gb/s    60.18 Gb/s
         enp216s0:   15.14 Gb/s    45.19 Gb/s    60.33 Gb/s
  ---------------------------------------------------------
            total:   60.45 Gb/s    60.48 Gb/s   120.93 Gb/s


Today it reached 65/65 Gbit/s.

But starting from 60 Gbit/s RX / 60 Gbit/s TX, the NICs start to drop
packets (with 50% CPU on all 28 cores), so there is still CPU power to spare :).


So I checked other stats.
softnet_stats shows an average of 1k squeezed per second:
cpu  total    dropped   squeezed  collision    rps flow_limit
  0  18554  0  1  0  0 0
  1  16728  0  1  0  0 0
  2  18033  0  1  0  0 0
  3  17757  0  1  0  0 0
  4  18861  0  0  0  0 0
  5  0  0  1  0  0 0
  6  2  0  1  0  0 0
  7  0  0  1  0  0 0
  8  0  0  0  0  0 0
  9  0  0  1  0  0 0
 10  0  0  0  0  0 0
 11  0  0  1  0  0 0
 12 50  0  1  0  0 0
 13    257  0  0  0  0 0
 14 3629115363  0    3353259  0  0 0
 15  255167835  0    3138271  0  0 0
 16 4240101961  0    3036130  0  0 0
 17  599810018  0    3072169  0  0 0
 18  432796524  0    3034191  0  0 0
 19   41803906  0    3037405  0  0 0
 20  900382666  0    3112294  0  0 0
 21  620926085  0    3086009  0  0 0
 22   41861198  0    3023142  0  0 0
 23 4090425574  0    2990412  0  0 0
 24 4264870218  0    3010272  0  0 0
 25  141401811  0    3027153  0  0 0
 26  104155188  0    3051251  0  0 0
 27 4261258691  0    3039765  0  0 0
 28  4  0  1  0  0 0
 29  4  0  0  0  0 0
 30  0  0  1  0  0 0
 31  0  0  0  0  0 0
 32  3  0  1  0  0 0
 33  1  0  1  0  0 0
 34  0  0  1  0  0 0
 35  0  0  0  0  0 0
 36  0  0  1  0  0 0
 37  0  0  1  0  0 0
 38  0  0  1  0  0 0
 39  0  0  1  0  0 0
 40  0  0  0  0  0 0
 41  0  0  1  0  0 0
 42  299758202  0    3139693  0  0 0
 43 4254727979  0    3103577  0  0 0
 44 195943  0    2554885  0  0 0
 45 1675702723  0    2513481  0  0 0
 46 1908435503  0    2519698  0  0 0
 47 1877799710  0    2537768  0  0 0
 48 2384274076  0    2584673  0  0 0
 49 2598104878  0    2593616  0  0 0
 50 1897566829  0    2530857  0  0 0
 51 1712741629  0    2489089  0  0 0
 52 1704033648  0    2495892  0  0 0
 53 1636781820  0    2499783  0  0 0
 54 1861997734  0    2541060  0  0 0
 55 2113521616  0    2555673  0  0 0
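
The squeezed column corresponds to the third hex field (`time_squeeze`) of each per-CPU line in `/proc/net/softnet_stat`, after `processed` and `dropped`. The kernel prints these counters as 32-bit values, which is likely why several totals above sit close to 2^32: they wrap. A per-second figure therefore needs two snapshots and a wrap-safe difference. A minimal sketch, with hypothetical `demo_*` names, parsing only the first three fields:

```c
#include <stdint.h>
#include <stdio.h>

/* First three fields of one /proc/net/softnet_stat line (hex, 32-bit). */
struct demo_softnet_sample {
	uint32_t processed;
	uint32_t dropped;
	uint32_t time_squeeze;
};

/* Returns 0 on success, -1 if the line does not start with three hex fields. */
static int demo_parse_softnet_line(const char *line, struct demo_softnet_sample *s)
{
	unsigned int p, d, t;

	if (sscanf(line, "%x %x %x", &p, &d, &t) != 3)
		return -1;
	s->processed = p;
	s->dropped = d;
	s->time_squeeze = t;
	return 0;
}

/* Unsigned 32-bit subtraction stays correct across a single wrap. */
static uint32_t demo_counter_delta(uint32_t prev, uint32_t cur)
{
	return (uint32_t)(cur - prev);
}
```

Reading the file twice, one second apart, and applying `demo_counter_delta()` to each CPU's `time_squeeze` would give squeezes per second for that CPU.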


So I raised the netdev backlog and budget to really high values:
524288 for netdev_budget and the same for the backlog.

This raised softirqs from about 600k/sec to 800k/sec for NET_TX/NET_RX.

But after these changes I have fewer packet drops.


Below is the perf top output at the maximum traffic reached:
   PerfTop:   72230 irqs/sec  kernel:99.4%  exact:  0.0% [4000Hz cycles],  (all, 56 CPUs)

Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-09 Thread Alexei Starovoitov

On 11/9/18 1:28 PM, Edward Cree wrote:
> On 09/11/18 21:14, Alexei Starovoitov wrote:
>> same link, but i cannot make it right now.
>> have to extinguish few fires.
>> may be at 2pm (unlikely) or 3pm (more likely) PST?
> 
> Yep I can do either of those, just let me know which when you can.

still swamped. but see the light.
let's do 3pm

[PATCH net-next 2/2] net: phy: realtek: use new PHYID matching macros

2018-11-09 Thread Heiner Kallweit

Use new macros for PHYID matching to avoid boilerplate code.

Signed-off-by: Heiner Kallweit 
---
 drivers/net/phy/realtek.c | 29 ++---
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 0f8e5b1c9..c6010fb1a 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -213,14 +213,12 @@ static int rtl8366rb_config_init(struct phy_device *phydev)
 
 static struct phy_driver realtek_drvs[] = {
{
-   .phy_id = 0x8201,
+   PHY_ID_MATCH_EXACT(0x8201),
.name   = "RTL8201CP Ethernet",
-   .phy_id_mask= 0x,
.features   = PHY_BASIC_FEATURES,
}, {
-   .phy_id = 0x001cc816,
+   PHY_ID_MATCH_EXACT(0x001cc816),
.name   = "RTL8201F Fast Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_BASIC_FEATURES,
.ack_interrupt  = _ack_interrupt,
.config_intr= _config_intr,
@@ -229,17 +227,15 @@ static struct phy_driver realtek_drvs[] = {
.read_page  = rtl821x_read_page,
.write_page = rtl821x_write_page,
}, {
-   .phy_id = 0x001cc910,
+   PHY_ID_MATCH_EXACT(0x001cc910),
.name   = "RTL8211 Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.config_aneg= rtl8211_config_aneg,
.read_mmd   = _read_mmd_unsupported,
.write_mmd  = _write_mmd_unsupported,
}, {
-   .phy_id = 0x001cc912,
+   PHY_ID_MATCH_EXACT(0x001cc912),
.name   = "RTL8211B Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.ack_interrupt  = _ack_interrupt,
.config_intr= _config_intr,
@@ -248,35 +244,31 @@ static struct phy_driver realtek_drvs[] = {
.suspend= rtl8211b_suspend,
.resume = rtl8211b_resume,
}, {
-   .phy_id = 0x001cc913,
+   PHY_ID_MATCH_EXACT(0x001cc913),
.name   = "RTL8211C Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.config_init= rtl8211c_config_init,
.read_mmd   = _read_mmd_unsupported,
.write_mmd  = _write_mmd_unsupported,
}, {
-   .phy_id = 0x001cc914,
+   PHY_ID_MATCH_EXACT(0x001cc914),
.name   = "RTL8211DN Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.ack_interrupt  = rtl821x_ack_interrupt,
.config_intr= rtl8211e_config_intr,
.suspend= genphy_suspend,
.resume = genphy_resume,
}, {
-   .phy_id = 0x001cc915,
+   PHY_ID_MATCH_EXACT(0x001cc915),
.name   = "RTL8211E Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.ack_interrupt  = _ack_interrupt,
.config_intr= _config_intr,
.suspend= genphy_suspend,
.resume = genphy_resume,
}, {
-   .phy_id = 0x001cc916,
+   PHY_ID_MATCH_EXACT(0x001cc916),
.name   = "RTL8211F Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.config_init= _config_init,
.ack_interrupt  = _ack_interrupt,
@@ -286,9 +278,8 @@ static struct phy_driver realtek_drvs[] = {
.read_page  = rtl821x_read_page,
.write_page = rtl821x_write_page,
}, {
-   .phy_id = 0x001cc961,
+   PHY_ID_MATCH_EXACT(0x001cc961),
.name   = "RTL8366RB Gigabit Ethernet",
-   .phy_id_mask= 0x001f,
.features   = PHY_GBIT_FEATURES,
.config_init= _config_init,
.suspend= genphy_suspend,
@@ -299,7 +290,7 @@ static struct phy_driver realtek_drvs[] = {
 module_phy_driver(realtek_drvs);
 
 static const struct mdio_device_id __maybe_unused realtek_tbl[] = {
-   { 0x001cc800, GENMASK(31, 10) },
+   { PHY_ID_MATCH_VENDOR(0x001cc800) },
{ }
 };
 
-- 
2.19.1

Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-09 Thread Edward Cree

On 09/11/18 21:14, Alexei Starovoitov wrote:
> same link, but i cannot make it right now.
> have to extinguish few fires.
> may be at 2pm (unlikely) or 3pm (more likely) PST?

Yep I can do either of those, just let me know which when you can.

Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread Jeff Kirsher

On Fri, 2018-11-09 at 15:28 -0800, David Miller wrote:
> From: Miroslav Lichvar 
> Date: Fri,  9 Nov 2018 11:14:41 +0100
> 
> > RFC->v1:
> > - added new patches
> > - separated PHC timestamp from ptp_system_timestamp
> > - fixed memory leak in PTP_SYS_OFFSET_EXTENDED
> > - changed PTP_SYS_OFFSET_EXTENDED to work with array of arrays
> > - fixed PTP_SYS_OFFSET_EXTENDED to break correctly from loop
> > - fixed timecounter updates in drivers
> > - split gettimex in igb driver
> > - fixed ptp_read_* functions to be available without
> >   CONFIG_PTP_1588_CLOCK
> > 
> > This series enables a more accurate synchronization between PTP hardware
> > clocks and the system clock.
>  ...
> 
> This series looks good to me but I want to give Richard an opportunity to
> review it first.

Dave, I also do not want to hold this series up by picking up patches 5, 6
and 7 (Intel drivers), so please apply the entire series after Richard
provides his review.



Re: [PATCH iproute] ss: Actually print left delimiter for columns

2018-11-09 Thread Stefano Brivio

On Fri, 9 Nov 2018 09:05:46 -0800
Stephen Hemminger  wrote:

> On Mon, 29 Oct 2018 23:04:25 +0100
> Stefano Brivio  wrote:
> 
> > While rendering columns, we use a local variable to keep track of the
> > field currently being printed, without touching current_field, which is
> > used for buffering.
> > 
> > Use the right pointer to access the left delimiter for the current column,
> > instead of always printing the left delimiter for the last buffered field,
> > which is usually an empty string.
> > 
> > This fixes an issue especially visible on narrow terminals, where some
> > columns might be displayed without separation.
> > 
> > Reported-by: YoyPa 
> > Fixes: 691bd854bf4a ("ss: Buffer raw fields first, then render them as a table")
> > Signed-off-by: Stefano Brivio 
> > Tested-by: YoyPa   
> 
> This test broke the testsuite/ss/ssfilter.t test.
> Please fix the test to match your new output format, or I will have to revert it.

Ouch, sorry, I didn't notice that "new" test. I'll fix that by tomorrow.
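
The bug described in the commit message can be reduced to a few lines. This is a simplified, hypothetical model (`struct column` and `render()` here are not ss's actual code): when every column is prefixed with the delimiter of the last buffered field, usually an empty string, adjacent columns run together; taking the delimiter from the column being rendered restores the separation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical column: each one carries its own left delimiter. */
struct column {
	const char *ldelim;
	const char *text;
};

/*
 * Render n columns into out. 'last' stands for the last buffered field
 * (think current_field). buggy != 0 reproduces the old behaviour of using
 * last->ldelim for every column; buggy == 0 uses each column's own one.
 */
static void render(const struct column *cols, size_t n,
		   const struct column *last, int buggy,
		   char *out, size_t outlen)
{
	size_t used = 0;

	out[0] = '\0';
	for (size_t i = 0; i < n && used < outlen; i++) {
		const struct column *f = &cols[i];	/* column being printed */
		const char *ld = buggy ? last->ldelim : f->ldelim;
		int w = snprintf(out + used, outlen - used, "%s%s", ld, f->text);

		if (w < 0)
			break;
		used += (size_t)w;	/* loop stops once output is truncated */
	}
}
```

With columns "ESTAB", "0", "0" whose left delimiters are "", " ", " ", the fixed path renders `ESTAB 0 0` while the buggy path renders `ESTAB00`, which is the "columns displayed without separation" symptom.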

-- 
Stefano

Re: [PATCH v2 net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-09 Thread Florian Fainelli

On 11/9/18 9:35 AM, Heiner Kallweit wrote:
> As a heritage from the very early days of phylib member interrupts is
> defined as u32 even though it's just a flag whether interrupts are
> enabled. So we can change it to a bitfield member. In addition change
> the code dealing with this member in a way that it's clear we're
> dealing with a bool value.
> 
> Signed-off-by: Heiner Kallweit 

Reviewed-by: Florian Fainelli 
-- 
Florian
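
A small illustration of why a 1-bit member is best handled as a bool. With an `unsigned int` bitfield of width 1, assignment keeps only the lowest bit, so a non-zero value such as 2 becomes 0; a `bool` (C99 `_Bool`) bitfield converts any non-zero value to 1. The struct below is hypothetical and far smaller than `struct phy_device`; it is not the code from the patch.

```c
#include <stdbool.h>

struct demo_phy {
	unsigned int irq_u1:1;	/* 1-bit unsigned: value is reduced modulo 2 */
	bool irq_bool:1;	/* 1-bit bool: any non-zero value becomes 1 */
};

static void demo_set(struct demo_phy *p, unsigned int v)
{
	p->irq_u1 = v;		/* v == 2 -> 0: the "flag" is silently cleared */
	p->irq_bool = v;	/* v == 2 -> 1 */
}
```

This is one reason code that writes such a member reads more safely when it only ever assigns true/false values, for example `!!value`.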

Re: [PATCH net-next v2 1/2] dpaa2-eth: defer probe on object allocate

2018-11-09 Thread Andrew Lunn

On Fri, Nov 09, 2018 at 03:26:45PM +, Ioana Ciornei wrote:
> The fsl_mc_object_allocate function can fail because not all allocatable
> objects are probed by the fsl_mc_allocator at the call time. Defer the
> dpaa2-eth probe when this happens.
> 
> Signed-off-by: Ioana Ciornei 
> ---
> Changes in v2:
>   - proper handling of IS_ERR_OR_NULL

Reviewed-by: Andrew Lunn 

Andrew
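
The subtlety behind "proper handling of IS_ERR_OR_NULL" is that the kernel's `PTR_ERR(NULL)` is 0, which a caller would read as success. Below is a userspace re-implementation of the ERR_PTR convention for illustration only; the `demo_*` helpers are hypothetical, 517 is the kernel's value for `EPROBE_DEFER`, and mapping NULL to `-ENOMEM` is an assumption, not what dpaa2-eth necessarily does.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO		4095
#define DEMO_EPROBE_DEFER	517	/* same value as the kernel's EPROBE_DEFER */
#define DEMO_ENOMEM		12

static inline void *demo_err_ptr(long err) { return (void *)err; }
static inline long demo_ptr_err(const void *p) { return (long)p; }

/* NULL, or a pointer in the top MAX_ERRNO bytes of the address space. */
static inline int demo_is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/*
 * Turn an allocation result into an errno for probe(). An error pointer
 * such as ERR_PTR(-EPROBE_DEFER) is passed through unchanged; NULL needs
 * its own code because PTR_ERR(NULL) would be 0.
 */
static int demo_setup(void *obj)
{
	if (demo_is_err_or_null(obj))
		return obj ? (int)demo_ptr_err(obj) : -DEMO_ENOMEM;
	return 0;
}
```

Returning `-EPROBE_DEFER` from probe asks the driver core to retry the probe later, which is what lets dpaa2-eth wait until the fsl_mc_allocator has probed the objects it needs.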

Re: [PATCH bpf-next v2 02/13] bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO

2018-11-09 Thread Alexei Starovoitov

On 11/9/18 12:00 PM, Edward Cree wrote:
> On 09/11/18 04:35, Alexei Starovoitov wrote:
>> On Thu, Nov 08, 2018 at 10:56:55PM +, Edward Cree wrote:
>>>   think this question of maps should be discussed in tomorrow's
>>>   call, since it is when we start having other kinds of instances
>> turned out most of us have a conflict, so the earliest is 1:30pm on Friday.
>> still works for you?
> 
> Yep (that's 9.30pm GMT right?)
> 
> I'm assuming same bluejeans link again.

same link, but i cannot make it right now.
have to extinguish a few fires.
maybe at 2pm (unlikely) or 3pm (more likely) PST?

Re: [PATCH net-next] cxgb4: free mac_hlist properly

2018-11-09 Thread David Miller

From: Arjun Vynipadath 
Date: Fri,  9 Nov 2018 14:50:25 +0530

> The locally maintained list for tracking hash mac table was
> not freed during driver remove.
> 
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH net-next] cxgb4vf: free mac_hlist properly

2018-11-09 Thread David Miller

From: Arjun Vynipadath 
Date: Fri,  9 Nov 2018 14:52:53 +0530

> The locally maintained list for tracking hash mac table was
> not freed during driver remove.
> 
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH net-next] cxgb4vf: fix memleak in mac_hlist initialization

2018-11-09 Thread David Miller

From: Arjun Vynipadath 
Date: Fri,  9 Nov 2018 14:52:01 +0530

> mac_hlist was initialized during adapter_up, which will be called
> every time a vf device is first brought up, or every time when device
> is brought up again after bringing all devices down. This means our
> state of previous list is lost, causing a memleak if entries are
> present in the list. To fix that, move list init to the condition
> that performs initial one time adapter setup.
> 
> Signed-off-by: Arjun Vynipadath 
> Signed-off-by: Ganesh Goudar 

Applied.
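
The memleak pattern in the cxgb4vf commit message can be modelled in a few lines: if the list head is re-initialised on every bring-up, entries added earlier become unreachable and the remove path can no longer free them. This is a hypothetical userspace sketch with a plain singly linked list, not the driver's `mac_hlist` code.

```c
#include <stdlib.h>

struct demo_node { struct demo_node *next; };
struct demo_list { struct demo_node *head; };

static void demo_list_init(struct demo_list *l) { l->head = NULL; }

static int demo_list_add(struct demo_list *l)
{
	struct demo_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->next = l->head;
	l->head = n;
	return 0;
}

/* Remove path: frees whatever is still reachable, returns the count. */
static size_t demo_list_free(struct demo_list *l)
{
	size_t freed = 0;

	while (l->head) {
		struct demo_node *n = l->head;

		l->head = n->next;
		free(n);
		freed++;
	}
	return freed;
}

/* Buggy: every "up" re-initialises the head, orphaning existing entries. */
static void demo_up_buggy(struct demo_list *l) { demo_list_init(l); }

/* Fixed: initialise only as part of the one-time adapter setup. */
static void demo_up_fixed(struct demo_list *l, int *initialised)
{
	if (!*initialised) {
		demo_list_init(l);
		*initialised = 1;
	}
}
```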

Re: [PATCH v5 bpf-next 0/7] bpftool: support loading flow dissector

2018-11-09 Thread Jakub Kicinski

On Fri,  9 Nov 2018 08:21:39 -0800, Stanislav Fomichev wrote:
> v5 changes:
> * FILE -> PATH for load/loadall (can be either file or directory now)
> * simpler implementation for __bpf_program__pin_name
> * removed p_err for REQ_ARGS checks
> * parse_atach_detach_args -> parse_attach_detach_args
> * for -> while in bpf_object__pin_{programs,maps} recovery

Thanks!  Patch 3 needs attention from maintainers but the rest LGTM!

Re: [PATCH v2 net-next] net: phy: improve struct phy_device member interrupts handling

2018-11-09 Thread Andrew Lunn

On Fri, Nov 09, 2018 at 06:35:52PM +0100, Heiner Kallweit wrote:
> As a heritage from the very early days of phylib member interrupts is
> defined as u32 even though it's just a flag whether interrupts are
> enabled. So we can change it to a bitfield member. In addition change
> the code dealing with this member in a way that it's clear we're
> dealing with a bool value.
> 
> Signed-off-by: Heiner Kallweit 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH][net-next] net: tcp: remove BUG_ON from tcp_v4_err

2018-11-09 Thread David Miller

From: Li RongQing 
Date: Fri,  9 Nov 2018 17:04:51 +0800

> If skb is a NULL pointer, the following access of skb's skb_mstamp_ns
> will trigger a panic, which has the same effect as the BUG_ON.
> 
> Signed-off-by: Li RongQing 

Applied.

Re: [PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread David Miller

From: Miroslav Lichvar 
Date: Fri,  9 Nov 2018 11:14:41 +0100

> RFC->v1:
> - added new patches
> - separated PHC timestamp from ptp_system_timestamp
> - fixed memory leak in PTP_SYS_OFFSET_EXTENDED
> - changed PTP_SYS_OFFSET_EXTENDED to work with array of arrays
> - fixed PTP_SYS_OFFSET_EXTENDED to break correctly from loop
> - fixed timecounter updates in drivers
> - split gettimex in igb driver
> - fixed ptp_read_* functions to be available without
>   CONFIG_PTP_1588_CLOCK
> 
> This series enables a more accurate synchronization between PTP hardware
> clocks and the system clock.
 ...

This series looks good to me but I want to give Richard an opportunity to
review it first.

[PATCH net-next 1/2] net: phy: add macros for PHYID matching

2018-11-09 Thread Heiner Kallweit

Add macros for PHYID matching to be used in PHY driver configs.
By using these macros some boilerplate code can be avoided.

Signed-off-by: Heiner Kallweit 
---
 include/linux/phy.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index 17d1f6472..03005c65e 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -651,6 +651,10 @@ struct phy_driver {
 #define PHY_ANY_ID "MATCH ANY PHY"
 #define PHY_ANY_UID 0x
 
+#define PHY_ID_MATCH_EXACT(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 0)
+#define PHY_ID_MATCH_MODEL(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 4)
+#define PHY_ID_MATCH_VENDOR(id) .phy_id = (id), .phy_id_mask = GENMASK(31, 10)
+
 /* A Structure for boards to register fixups with the PHY Lib */
 struct phy_fixup {
struct list_head list;
-- 
2.19.1
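
The three macros differ only in which bits of the 32-bit PHY ID are compared. In the Clause 22 identifier layout, bits 31:10 hold the vendor OUI, bits 9:4 the model number and bits 3:0 the revision, so `PHY_ID_MATCH_MODEL` ignores the revision and `PHY_ID_MATCH_VENDOR` accepts any PHY from the vendor. The userspace sketch below reimplements the masks and a masked comparison for illustration; the `DEMO_*` names are hypothetical, and the comparison is meant to mirror how phylib matches a driver's `phy_id`/`phy_id_mask` against a device's ID.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit variant of the kernel's GENMASK(h, l): bits l..h set. */
#define DEMO_GENMASK32(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

#define DEMO_MASK_EXACT		DEMO_GENMASK32(31, 0)	/* OUI + model + revision */
#define DEMO_MASK_MODEL		DEMO_GENMASK32(31, 4)	/* ignore revision bits 3:0 */
#define DEMO_MASK_VENDOR	DEMO_GENMASK32(31, 10)	/* OUI bits only */

/* Compare only the bits selected by the driver's mask. */
static bool demo_phy_id_match(uint32_t drv_id, uint32_t mask, uint32_t dev_id)
{
	return (drv_id & mask) == (dev_id & mask);
}
```

With the model mask, the RTL8211B (0x001cc912) and RTL8211F (0x001cc916) IDs from the Realtek table compare equal, since they differ only in the low four bits; that is why those entries need the exact match.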

Re: [PATCH v2 bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.

2018-11-09 Thread Daniel Borkmann

On 10/29/2018 05:02 AM, Nitin Hande wrote:
> 
> This patch proposes to extend the sk_lookup() BPF API to the
> XDP hookpoint. The sk_lookup() helper supports a lookup
> on incoming packet to find the corresponding socket that will
> receive this packet. Current support for this BPF API is
> at the tc hookpoint. This patch will extend this API at XDP
> hookpoint. An XDP program can map the incoming packet to the
> 5-tuple parameter and invoke the API to find the corresponding
> socket structure.
> 
> Signed-off-by: Nitin Hande 

Looks good to me, applied to bpf-next, thanks Nitin!
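
Mapping a packet to the lookup tuple amounts to pulling addresses and ports out of the IP and L4 headers. A minimal userspace sketch for IPv4 only; `struct demo_tuple4` is a hypothetical stand-in for the `ipv4` member of `struct bpf_sock_tuple`, and a real XDP program would also start from the Ethernet header and bounds-check every access against `data_end`. The protocol is not part of the tuple: it is selected by calling `bpf_sk_lookup_tcp()` or `bpf_sk_lookup_udp()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Addresses and ports are kept in network byte order, as in bpf_sock_tuple. */
struct demo_tuple4 {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t sport;
	uint16_t dport;
};

/*
 * Parse a bare IPv4 packet carrying TCP or UDP. Returns the IP protocol
 * number (6 or 17), or -1 if the packet is too short, not IPv4, a
 * non-first fragment (no ports), or another protocol.
 */
static int demo_extract_tuple4(const uint8_t *pkt, size_t len, struct demo_tuple4 *t)
{
	size_t ihl;

	if (len < 20 || (pkt[0] >> 4) != 4)
		return -1;
	ihl = (size_t)(pkt[0] & 0x0f) * 4;		/* header length in bytes */
	if (ihl < 20 || len < ihl + 4)			/* need both L4 ports */
		return -1;
	if (((pkt[6] & 0x1f) << 8 | pkt[7]) != 0)	/* fragment offset != 0 */
		return -1;
	if (pkt[9] != 6 && pkt[9] != 17)		/* TCP or UDP only */
		return -1;
	memcpy(&t->saddr, pkt + 12, 4);
	memcpy(&t->daddr, pkt + 16, 4);
	memcpy(&t->sport, pkt + ihl, 2);
	memcpy(&t->dport, pkt + ihl + 2, 2);
	return pkt[9];
}
```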

[PATCH net-next 3/8] ptp: add PTP_SYS_OFFSET_EXTENDED ioctl

2018-11-09 Thread Miroslav Lichvar

The PTP_SYS_OFFSET ioctl, which can be used to measure the offset
between a PHC and the system clock, includes the total time that the
driver needs to read the PHC timestamp.

This typically involves reading of multiple PCI registers (sometimes in
multiple iterations) and the register that contains the lowest bits of
the timestamp is not read in the middle between the two readings of the
system clock. This asymmetry causes the measured offset to have a
significant error.

Introduce a new ioctl, driver function, and helper functions, which
allow the reading of the lowest register to be isolated from the other
readings in order to reduce the asymmetry. The ioctl returns three
timestamps for each measurement:
- system time right before reading the lowest bits of the PHC timestamp
- PHC time
- system time immediately after reading the lowest bits of the PHC
  timestamp
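
With three timestamps per sample, userspace can bound the error: the PHC reading happened somewhere between the two system readings, so the midpoint gives an offset estimate whose uncertainty is at most half the interval. The sketch below shows one common way to reduce the samples, keeping the one with the shortest interval; `struct demo_clock_time` is a simplified, hypothetical version of `struct ptp_clock_time` (no reserved field), and this is not claimed to be what any particular daemon does.

```c
#include <stdint.h>

/* Simplified counterpart of struct ptp_clock_time. */
struct demo_clock_time {
	int64_t  sec;
	uint32_t nsec;
};

static int64_t demo_to_ns(struct demo_clock_time t)
{
	return t.sec * 1000000000LL + t.nsec;
}

/*
 * ts[i][0] = system time before, ts[i][1] = PHC time, ts[i][2] = system
 * time after. Returns the PHC - system offset at the midpoint of the
 * sample with the shortest pre/post interval and stores that interval
 * in *delay.
 */
static int64_t demo_best_offset(const struct demo_clock_time (*ts)[3],
				unsigned int n, int64_t *delay)
{
	int64_t best_off = 0, best_delay = INT64_MAX;

	for (unsigned int i = 0; i < n; i++) {
		int64_t pre = demo_to_ns(ts[i][0]);
		int64_t phc = demo_to_ns(ts[i][1]);
		int64_t post = demo_to_ns(ts[i][2]);
		int64_t d = post - pre;

		if (d < best_delay) {
			best_delay = d;
			best_off = phc - (pre + d / 2);
		}
	}
	*delay = best_delay;
	return best_off;
}
```

In a real caller, `n` would be `n_samples` and `ts` the array filled in by the ioctl.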

Cc: Richard Cochran 
Cc: Jacob Keller 
Cc: Marcelo Tosatti 
Signed-off-by: Miroslav Lichvar 
---
 drivers/ptp/ptp_chardev.c| 33 
 include/linux/ptp_clock_kernel.h | 31 ++
 include/uapi/linux/ptp_clock.h   | 12 
 3 files changed, 76 insertions(+)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 3c681bed5703..aad0d36cf5c0 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -122,10 +122,12 @@ int ptp_open(struct posix_clock *pc, fmode_t fmode)
 long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 {
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+   struct ptp_sys_offset_extended *extoff = NULL;
struct ptp_sys_offset_precise precise_offset;
struct system_device_crosststamp xtstamp;
struct ptp_clock_info *ops = ptp->info;
struct ptp_sys_offset *sysoff = NULL;
+   struct ptp_system_timestamp sts;
struct ptp_clock_request req;
struct ptp_clock_caps caps;
struct ptp_clock_time *pct;
@@ -211,6 +213,36 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
err = -EFAULT;
break;
 
+   case PTP_SYS_OFFSET_EXTENDED:
+   if (!ptp->info->gettimex64) {
+   err = -EOPNOTSUPP;
+   break;
+   }
+   extoff = memdup_user((void __user *)arg, sizeof(*extoff));
+   if (IS_ERR(extoff)) {
+   err = PTR_ERR(extoff);
+   extoff = NULL;
+   break;
+   }
+   if (extoff->n_samples > PTP_MAX_SAMPLES) {
+   err = -EINVAL;
+   break;
+   }
+   for (i = 0; i < extoff->n_samples; i++) {
+   err = ptp->info->gettimex64(ptp->info, , );
+   if (err)
+   goto out;
+   extoff->ts[i][0].sec = sts.pre_ts.tv_sec;
+   extoff->ts[i][0].nsec = sts.pre_ts.tv_nsec;
+   extoff->ts[i][1].sec = ts.tv_sec;
+   extoff->ts[i][1].nsec = ts.tv_nsec;
+   extoff->ts[i][2].sec = sts.post_ts.tv_sec;
+   extoff->ts[i][2].nsec = sts.post_ts.tv_nsec;
+   }
+   if (copy_to_user((void __user *)arg, extoff, sizeof(*extoff)))
+   err = -EFAULT;
+   break;
+
case PTP_SYS_OFFSET:
sysoff = memdup_user((void __user *)arg, sizeof(*sysoff));
if (IS_ERR(sysoff)) {
@@ -284,6 +316,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
}
 
 out:
+   kfree(extoff);
kfree(sysoff);
return err;
 }
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index 51349d124ee5..a1ec0448e341 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -39,6 +39,15 @@ struct ptp_clock_request {
 };
 
 struct system_device_crosststamp;
+
+/**
+ * struct ptp_system_timestamp - system time corresponding to a PHC timestamp
+ */
+struct ptp_system_timestamp {
+   struct timespec64 pre_ts;
+   struct timespec64 post_ts;
+};
+
 /**
  * struct ptp_clock_info - decribes a PTP hardware clock
  *
@@ -75,6 +84,14 @@ struct system_device_crosststamp;
  * @gettime64:  Reads the current time from the hardware clock.
  *  parameter ts: Holds the result.
  *
+ * @gettimex64:  Reads the current time from the hardware clock and optionally
+ *   also the system clock.
+ *   parameter ts: Holds the PHC timestamp.
+ *   parameter sts: If not NULL, it holds a pair of timestamps from
+ *   the system clock. The first reading is made right before
+ *   reading the lowest bits of the PHC timestamp and the second
+ *   reading immediately follows

[PATCH net-next 0/8] More accurate PHC<->system clock synchronization

2018-11-09 Thread Miroslav Lichvar

RFC->v1:
- added new patches
- separated PHC timestamp from ptp_system_timestamp
- fixed memory leak in PTP_SYS_OFFSET_EXTENDED
- changed PTP_SYS_OFFSET_EXTENDED to work with array of arrays
- fixed PTP_SYS_OFFSET_EXTENDED to break correctly from loop
- fixed timecounter updates in drivers
- split gettimex in igb driver
- fixed ptp_read_* functions to be available without
  CONFIG_PTP_1588_CLOCK

This series enables a more accurate synchronization between PTP hardware
clocks and the system clock.

The first two patches are minor cleanup/bug fixes.

The third patch adds an extended version of the PTP_SYS_OFFSET ioctl,
which returns three timestamps for each measurement. The idea is to
shorten the interval between the system timestamps to contain just the
reading of the lowest register of the PHC in order to reduce the error
in the measured offset and get a smaller upper bound on the maximum
error.

The fourth patch deprecates the original gettime function.

The remaining patches update the gettime function in order to support
the new ioctl in the e1000e, igb, ixgbe, and tg3 drivers.
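
On the driver side, the idea is that only the read of the register holding the lowest bits sits between the two system-clock readings. Below is a userspace model of that ordering, assuming a hypothetical device on which reading the low register latches the high one. In the kernel, the two `clock_gettime()` calls correspond to the `ptp_read_system_prets()`/`ptp_read_system_postts()` helpers added by this series, and the `demo_read_*` accessors would be MMIO register reads.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical device: reading the low register latches the high one. */
struct demo_dev {
	uint32_t counter_lo;
	uint32_t counter_hi;
	uint32_t latched_hi;
};

static uint32_t demo_read_lo(struct demo_dev *d)
{
	d->latched_hi = d->counter_hi;	/* latch on low read (assumed behaviour) */
	return d->counter_lo;
}

static uint32_t demo_read_hi(struct demo_dev *d)
{
	return d->latched_hi;
}

/* Counterpart of struct ptp_system_timestamp (pre_ts/post_ts). */
struct demo_sys_ts {
	struct timespec pre_ts;
	struct timespec post_ts;
};

/* Only the low-register read is bracketed; sts may be NULL. */
static uint64_t demo_gettimex(struct demo_dev *d, struct demo_sys_ts *sts)
{
	uint32_t lo, hi;

	if (sts)
		clock_gettime(CLOCK_REALTIME, &sts->pre_ts);
	lo = demo_read_lo(d);
	if (sts)
		clock_gettime(CLOCK_REALTIME, &sts->post_ts);
	hi = demo_read_hi(d);	/* outside the window: adds nothing to the error */
	return ((uint64_t)hi << 32) | lo;
}
```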

Tests with a few different NICs in different machines show that:
- with an I219 (e1000e) the measured delay was reduced from 2500 to 1300
  ns and the error in the measured offset, when compared to the cross
  timestamping supported by the driver, was reduced by a factor of 5
- with an I210 (igb) the delay was reduced from 5100 to 1700 ns
- with an I350 (igb) the delay was reduced from 2300 to 750 ns
- with an X550 (ixgbe) the delay was reduced from 1950 to 650 ns
- with a BCM5720 (tg3) the delay was reduced from 2400 to 1200 ns


Miroslav Lichvar (8):
  ptp: reorder declarations in ptp_ioctl()
  ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl
  ptp: add PTP_SYS_OFFSET_EXTENDED ioctl
  ptp: deprecate gettime64() in favor of gettimex64()
  e1000e: extend PTP gettime function to read system clock
  igb: extend PTP gettime function to read system clock
  ixgbe: extend PTP gettime function to read system clock
  tg3: extend PTP gettime function to read system clock

 drivers/net/ethernet/broadcom/tg3.c  | 19 --
 drivers/net/ethernet/intel/e1000e/e1000.h|  3 +
 drivers/net/ethernet/intel/e1000e/netdev.c   | 42 ++---
 drivers/net/ethernet/intel/e1000e/ptp.c  | 16 +++--
 drivers/net/ethernet/intel/igb/igb_ptp.c | 65 +---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 54 +---
 drivers/ptp/ptp_chardev.c| 55 ++---
 drivers/ptp/ptp_clock.c  |  5 +-
 include/linux/ptp_clock_kernel.h | 33 ++
 include/uapi/linux/ptp_clock.h   | 12 
 10 files changed, 253 insertions(+), 51 deletions(-)

-- 
2.17.2

[PATCH net-next 4/8] ptp: deprecate gettime64() in favor of gettimex64()

2018-11-09 Thread Miroslav Lichvar

When a driver provides gettimex64(), use it in the PTP_SYS_OFFSET ioctl
and POSIX clock's gettime() instead of gettime64(). Drivers should
provide only one of the functions.

Cc: Richard Cochran 
Cc: Jacob Keller 
Signed-off-by: Miroslav Lichvar 
---
 drivers/ptp/ptp_chardev.c| 5 -
 drivers/ptp/ptp_clock.c  | 5 -
 include/linux/ptp_clock_kernel.h | 2 ++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index aad0d36cf5c0..797fab33bb98 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -260,7 +260,10 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
pct->sec = ts.tv_sec;
pct->nsec = ts.tv_nsec;
pct++;
-   err = ptp->info->gettime64(ptp->info, );
+   if (ops->gettimex64)
+   err = ops->gettimex64(ops, , NULL);
+   else
+   err = ops->gettime64(ops, );
if (err)
goto out;
pct->sec = ts.tv_sec;
diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 5419a89d300e..40fda23e4b05 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -117,7 +117,10 @@ static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
int err;
 
-   err = ptp->info->gettime64(ptp->info, tp);
+   if (ptp->info->gettimex64)
+   err = ptp->info->gettimex64(ptp->info, tp, NULL);
+   else
+   err = ptp->info->gettime64(ptp->info, tp);
return err;
 }
 
diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h
index a1ec0448e341..7121bbe76979 100644
--- a/include/linux/ptp_clock_kernel.h
+++ b/include/linux/ptp_clock_kernel.h
@@ -82,6 +82,8 @@ struct ptp_system_timestamp {
  *parameter delta: Desired change in nanoseconds.
  *
  * @gettime64:  Reads the current time from the hardware clock.
+ *  This method is deprecated.  New drivers should implement
+ *  the @gettimex64 method instead.
  *  parameter ts: Holds the result.
  *
  * @gettimex64:  Reads the current time from the hardware clock and optionally
-- 
2.17.2
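The dispatch rule in this patch is simple. Prefer gettimex64() when the driver provides it, and pass a NULL system-timestamp pointer when the caller only needs the PHC time. Otherwise fall back to gettime64(). Below is a minimal user-space model of that rule. The toy_* types, the fake clocks, and the callbacks are all hypothetical. The pre/post helpers are modeled on the NULL check that makes ptp_read_system_prets()/ptp_read_system_postts() safe to call with a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ts  { long long ns; };
struct toy_sts { long long pre, post; };  /* system time around the PHC read */

static long long fake_sys_ns = 5000;       /* stand-in system clock */
static const long long fake_phc_ns = 9000; /* stand-in PHC register value */

/* NULL-tolerant helpers in the spirit of ptp_read_system_prets/postts(). */
static void toy_prets(struct toy_sts *sts)  { if (sts) sts->pre = fake_sys_ns++; }
static void toy_postts(struct toy_sts *sts) { if (sts) sts->post = fake_sys_ns++; }

/* Old-style callback: PHC time only. */
static int old_gettime(struct toy_ts *ts)
{
	ts->ns = fake_phc_ns;
	return 0;
}

/* New-style callback: brackets the PHC read with system timestamps. */
static int new_gettimex(struct toy_ts *ts, struct toy_sts *sts)
{
	toy_prets(sts);
	ts->ns = fake_phc_ns;
	toy_postts(sts);
	return 0;
}

/* Hypothetical stand-in for struct ptp_clock_info with two optional hooks. */
struct toy_clock_ops {
	int (*gettime64)(struct toy_ts *ts);
	int (*gettimex64)(struct toy_ts *ts, struct toy_sts *sts);
};

/* Prefer gettimex64 when present; pass NULL because a plain clock read
 * does not need the system timestamps. */
static int toy_clock_gettime(const struct toy_clock_ops *ops, struct toy_ts *ts)
{
	if (ops->gettimex64)
		return ops->gettimex64(ts, NULL);
	return ops->gettime64(ts);
}
```

Because the helpers do nothing for a NULL pointer, a converted driver can serve both the plain clock read and the extended ioctl from one callback. This fits the rule that drivers should provide only one of the two functions.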

[PATCH net-next 5/8] e1000e: extend PTP gettime function to read system clock

2018-11-09 Thread Miroslav Lichvar

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Cc: Richard Cochran 
Cc: Jacob Keller 
Cc: Jeff Kirsher 
Signed-off-by: Miroslav Lichvar 
---
 drivers/net/ethernet/intel/e1000e/e1000.h  |  3 ++
 drivers/net/ethernet/intel/e1000e/netdev.c | 42 --
 drivers/net/ethernet/intel/e1000e/ptp.c| 16 +
 3 files changed, 45 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index c760dc72c520..be13227f1697 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -505,6 +505,9 @@ extern const struct e1000_info e1000_es2_info;
 void e1000e_ptp_init(struct e1000_adapter *adapter);
 void e1000e_ptp_remove(struct e1000_adapter *adapter);
 
+u64 e1000e_read_systim(struct e1000_adapter *adapter,
+  struct ptp_system_timestamp *sts);
+
 static inline s32 e1000_phy_hw_reset(struct e1000_hw *hw)
 {
return hw->phy.ops.reset(hw);
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 16a73bd9f4cb..59bd587d809d 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4319,13 +4319,16 @@ void e1000e_reinit_locked(struct e1000_adapter *adapter)
 /**
  * e1000e_sanitize_systim - sanitize raw cycle counter reads
  * @hw: pointer to the HW structure
- * @systim: time value read, sanitized and returned
+ * @systim: PHC time value read, sanitized and returned
+ * @sts: structure to hold system time before and after reading SYSTIML,
+ * may be NULL
  *
  * Errata for 82574/82583 possible bad bits read from SYSTIMH/L:
  * check to see that the time is incrementing at a reasonable
  * rate and is a multiple of incvalue.
  **/
-static u64 e1000e_sanitize_systim(struct e1000_hw *hw, u64 systim)
+static u64 e1000e_sanitize_systim(struct e1000_hw *hw, u64 systim,
+ struct ptp_system_timestamp *sts)
 {
u64 time_delta, rem, temp;
u64 systim_next;
@@ -4335,7 +4338,9 @@ static u64 e1000e_sanitize_systim(struct e1000_hw *hw, u64 systim)
incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
/* latch SYSTIMH on read of SYSTIML */
+   ptp_read_system_prets(sts);
systim_next = (u64)er32(SYSTIML);
+   ptp_read_system_postts(sts);
systim_next |= (u64)er32(SYSTIMH) << 32;
 
time_delta = systim_next - systim;
@@ -4353,15 +4358,16 @@ static u64 e1000e_sanitize_systim(struct e1000_hw *hw, u64 systim)
 }
 
 /**
- * e1000e_cyclecounter_read - read raw cycle counter (used by time counter)
- * @cc: cyclecounter structure
+ * e1000e_read_systim - read SYSTIM register
+ * @adapter: board private structure
+ * @sts: structure which will contain system time before and after reading
+ * SYSTIML, may be NULL
  **/
-static u64 e1000e_cyclecounter_read(const struct cyclecounter *cc)
+u64 e1000e_read_systim(struct e1000_adapter *adapter,
+  struct ptp_system_timestamp *sts)
 {
-   struct e1000_adapter *adapter = container_of(cc, struct e1000_adapter,
-cc);
struct e1000_hw *hw = &adapter->hw;
-   u32 systimel, systimeh;
+   u32 systimel, systimel_2, systimeh;
u64 systim;
/* SYSTIMH latching upon SYSTIML read does not work well.
 * This means that if SYSTIML overflows after we read it but before
@@ -4369,11 +4375,15 @@ static u64 e1000e_cyclecounter_read(const struct cyclecounter *cc)
 * will experience a huge non linear increment in the systime value
 * to fix that we test for overflow and if true, we re-read systime.
 */
+   ptp_read_system_prets(sts);
systimel = er32(SYSTIML);
+   ptp_read_system_postts(sts);
systimeh = er32(SYSTIMH);
/* Is systimel is so large that overflow is possible? */
if (systimel >= (u32)0xffffffff - E1000_TIMINCA_INCVALUE_MASK) {
-   u32 systimel_2 = er32(SYSTIML);
+   ptp_read_system_prets(sts);
+   systimel_2 = er32(SYSTIML);
+   ptp_read_system_postts(sts);
if (systimel > systimel_2) {
/* There was an overflow, read again SYSTIMH, and use
 * systimel_2
@@ -4386,11 +4396,23 @@ static u64 e1000e_cyclecounter_read(const struct cyclecounter *cc)
systim |= (u64)systimeh << 32;
 
if (adapter->flags2 & FLAG2_CHECK_SYSTIM_OVERFLOW)
-   systim = e1000e_sanitize_systim(hw, systim);
+   systim = e1000e_sanitize_systim(hw, systim, sts);
 
return systim;
 }
 
+/**
+ * e1000e_cyclecounter_read - read raw cycle counter (used by time counter)
+ * @cc: cyclecounter structure
+ **/
+static u64 e1000e_cyclecounter_read(const
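The comment in e1000e_read_systim() explains why the overflow check exists. SYSTIMH is not reliably latched when SYSTIML is read, so a low word read just before a wrap can be paired with a high word read just after it. Below is a minimal simulation of that failure and of the re-read check. It assumes a hypothetical device model in which a 64-bit counter advances by a fixed step on every register access with no latching. NEAR_WRAP is a made-up stand-in for E1000_TIMINCA_INCVALUE_MASK.

```c
#include <assert.h>
#include <stdint.h>

#define STEP      0x10u   /* counter advance per register access (hypothetical) */
#define NEAR_WRAP 0xffu   /* stand-in for E1000_TIMINCA_INCVALUE_MASK */

static uint64_t hw_counter;  /* free-running counter, no latch between halves */

static uint32_t rd_lo(void) { hw_counter += STEP; return (uint32_t)hw_counter; }
static uint32_t rd_hi(void) { hw_counter += STEP; return (uint32_t)(hw_counter >> 32); }

/* Naive read: a pre-wrap low word can be combined with a post-wrap high
 * word, producing a jump of roughly 2^32 ticks. */
static uint64_t read_naive(void)
{
	uint32_t lo = rd_lo();
	uint32_t hi = rd_hi();

	return ((uint64_t)hi << 32) | lo;
}

/* Mirrors the check in the patch: if the low word was close to wrapping,
 * read it again; if it went backwards, a wrap happened, so re-read the
 * high word and use the second low word. */
static uint64_t read_checked(void)
{
	uint32_t lo = rd_lo();
	uint32_t hi = rd_hi();

	if (lo >= UINT32_MAX - NEAR_WRAP) {
		uint32_t lo2 = rd_lo();

		if (lo > lo2) {
			hi = rd_hi();
			lo = lo2;
		}
	}
	return ((uint64_t)hi << 32) | lo;
}
```

Starting from 0x1ffffffe0, the naive read returns 0x2fffffff0, about 2^32 ticks ahead of the real counter. The checked read returns 0x200000010, which is consistent with it.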

[PATCH net-next 6/8] igb: extend PTP gettime function to read system clock

2018-11-09 Thread Miroslav Lichvar

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Cc: Richard Cochran 
Cc: Jacob Keller 
Cc: Jeff Kirsher 
Signed-off-by: Miroslav Lichvar 
---
 drivers/net/ethernet/intel/igb/igb_ptp.c | 65 
 1 file changed, 55 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ptp.c b/drivers/net/ethernet/intel/igb/igb_ptp.c
index 29ced6b74d36..8c1833a157d3 100644
--- a/drivers/net/ethernet/intel/igb/igb_ptp.c
+++ b/drivers/net/ethernet/intel/igb/igb_ptp.c
@@ -275,17 +275,53 @@ static int igb_ptp_adjtime_i210(struct ptp_clock_info *ptp, s64 delta)
return 0;
 }
 
-static int igb_ptp_gettime_82576(struct ptp_clock_info *ptp,
-struct timespec64 *ts)
+static int igb_ptp_gettimex_82576(struct ptp_clock_info *ptp,
+ struct timespec64 *ts,
+ struct ptp_system_timestamp *sts)
 {
struct igb_adapter *igb = container_of(ptp, struct igb_adapter,
   ptp_caps);
+   struct e1000_hw *hw = &igb->hw;
unsigned long flags;
+   u32 lo, hi;
u64 ns;
 
spin_lock_irqsave(&igb->tmreg_lock, flags);
 
-   ns = timecounter_read(&igb->tc);
+   ptp_read_system_prets(sts);
+   lo = rd32(E1000_SYSTIML);
+   ptp_read_system_postts(sts);
+   hi = rd32(E1000_SYSTIMH);
+
+   ns = timecounter_cyc2time(&igb->tc, ((u64)hi << 32) | lo);
+
+   spin_unlock_irqrestore(&igb->tmreg_lock, flags);
+
+   *ts = ns_to_timespec64(ns);
+
+   return 0;
+}
+
+static int igb_ptp_gettimex_82580(struct ptp_clock_info *ptp,
+ struct timespec64 *ts,
+ struct ptp_system_timestamp *sts)
+{
+   struct igb_adapter *igb = container_of(ptp, struct igb_adapter,
+  ptp_caps);
+   struct e1000_hw *hw = &igb->hw;
+   unsigned long flags;
+   u32 lo, hi;
+   u64 ns;
+
+   spin_lock_irqsave(&igb->tmreg_lock, flags);
+
+   ptp_read_system_prets(sts);
+   rd32(E1000_SYSTIMR);
+   ptp_read_system_postts(sts);
+   lo = rd32(E1000_SYSTIML);
+   hi = rd32(E1000_SYSTIMH);
+
+   ns = timecounter_cyc2time(&igb->tc, ((u64)hi << 32) | lo);
 
spin_unlock_irqrestore(&igb->tmreg_lock, flags);
 
@@ -294,16 +330,22 @@ static int igb_ptp_gettime_82576(struct ptp_clock_info *ptp,
return 0;
 }
 
-static int igb_ptp_gettime_i210(struct ptp_clock_info *ptp,
-   struct timespec64 *ts)
+static int igb_ptp_gettimex_i210(struct ptp_clock_info *ptp,
+struct timespec64 *ts,
+struct ptp_system_timestamp *sts)
 {
struct igb_adapter *igb = container_of(ptp, struct igb_adapter,
   ptp_caps);
+   struct e1000_hw *hw = &igb->hw;
unsigned long flags;
 
spin_lock_irqsave(&igb->tmreg_lock, flags);
 
-   igb_ptp_read_i210(igb, ts);
+   ptp_read_system_prets(sts);
+   rd32(E1000_SYSTIMR);
+   ptp_read_system_postts(sts);
+   ts->tv_nsec = rd32(E1000_SYSTIML);
+   ts->tv_sec = rd32(E1000_SYSTIMH);
 
spin_unlock_irqrestore(&igb->tmreg_lock, flags);
 
@@ -656,9 +698,12 @@ static void igb_ptp_overflow_check(struct work_struct *work)
struct igb_adapter *igb =
container_of(work, struct igb_adapter, ptp_overflow_work.work);
struct timespec64 ts;
+   u64 ns;
 
-   igb->ptp_caps.gettime64(&igb->ptp_caps, &ts);
+   /* Update the timecounter */
+   ns = timecounter_read(&igb->tc);
 
+   ts = ns_to_timespec64(ns);
pr_debug("igb overflow check at %lld.%09lu\n",
 (long long) ts.tv_sec, ts.tv_nsec);
 
@@ -1124,7 +1169,7 @@ void igb_ptp_init(struct igb_adapter *adapter)
adapter->ptp_caps.pps = 0;
adapter->ptp_caps.adjfreq = igb_ptp_adjfreq_82576;
adapter->ptp_caps.adjtime = igb_ptp_adjtime_82576;
-   adapter->ptp_caps.gettime64 = igb_ptp_gettime_82576;
+   adapter->ptp_caps.gettimex64 = igb_ptp_gettimex_82576;
adapter->ptp_caps.settime64 = igb_ptp_settime_82576;
adapter->ptp_caps.enable = igb_ptp_feature_enable;
adapter->cc.read = igb_ptp_read_82576;
@@ -1143,7 +1188,7 @@ void igb_ptp_init(struct igb_adapter *adapter)
adapter->ptp_caps.pps = 0;
adapter->ptp_caps.adjfine = igb_ptp_adjfine_82580;
adapter->ptp_caps.adjtime = igb_ptp_adjtime_82576;
-   adapter->ptp_caps.gettime64 = igb_ptp_gettime_82576;
+   adapter->ptp_caps.gettimex64 = igb_ptp_gettimex_82580;
adapter->ptp_caps.settime64 = igb_ptp_settime_82576;
adapter->ptp_caps.enable = igb_ptp_feature_enable;
adapter->cc.read = igb_ptp_read_82580;
@@ -1171,7 +1216,7 @@ void igb_ptp_init(struct igb_adapter
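The 82580 and i210 variants bracket only the SYSTIMR read and then read SYSTIML/SYSTIMH outside the brackets. Below is a minimal simulation of why that is enough. It assumes, as the placement of the brackets implies, that the SYSTIMR access is what snapshots the counter into the low/high registers. The device model, the stand-in system clock, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t counter;          /* free-running PHC counter */
static uint32_t snap_lo, snap_hi; /* values captured by the SYSTIMR access */
static uint64_t sys_ns;           /* stand-in system clock */

struct sys_ts { uint64_t pre, post; };

/* Every access advances the counter; only SYSTIMR takes a snapshot. */
static void rd_systimr(void)
{
	counter += 4;
	snap_lo = (uint32_t)counter;
	snap_hi = (uint32_t)(counter >> 32);
}
static uint32_t rd_systiml(void) { counter += 4; return snap_lo; }
static uint32_t rd_systimh(void) { counter += 4; return snap_hi; }

/* Bracket only the access that takes the snapshot; the later low/high
 * reads return latched values, so their timing does not matter. */
static uint64_t read_phc(struct sys_ts *sts)
{
	uint32_t lo, hi;

	sts->pre = sys_ns++;
	rd_systimr();
	sts->post = sys_ns++;
	lo = rd_systiml();
	hi = rd_systimh();
	return ((uint64_t)hi << 32) | lo;
}
```

Starting from 0x1fffffff8, the live counter wraps into the next 2^32 period while SYSTIML/SYSTIMH are being read. The returned value is still the consistent snapshot 0x1fffffffc.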

[PATCH net-next 7/8] ixgbe: extend PTP gettime function to read system clock

2018-11-09 Thread Miroslav Lichvar

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Cc: Richard Cochran 
Cc: Jacob Keller 
Cc: Jeff Kirsher 
Signed-off-by: Miroslav Lichvar 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 54 
 1 file changed, 44 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
index b3e0d8bb5cbd..d81a50dc9535 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c
@@ -443,22 +443,52 @@ static int ixgbe_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 }
 
 /**
- * ixgbe_ptp_gettime
+ * ixgbe_ptp_gettimex
  * @ptp: the ptp clock structure
- * @ts: timespec structure to hold the current time value
+ * @ts: timespec to hold the PHC timestamp
+ * @sts: structure to hold the system time before and after reading the PHC
  *
  * read the timecounter and return the correct value on ns,
  * after converting it into a struct timespec.
  */
-static int ixgbe_ptp_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
+static int ixgbe_ptp_gettimex(struct ptp_clock_info *ptp,
+ struct timespec64 *ts,
+ struct ptp_system_timestamp *sts)
 {
struct ixgbe_adapter *adapter =
container_of(ptp, struct ixgbe_adapter, ptp_caps);
+   struct ixgbe_hw *hw = &adapter->hw;
unsigned long flags;
-   u64 ns;
+   u64 ns, stamp;
 
spin_lock_irqsave(&adapter->tmreg_lock, flags);
-   ns = timecounter_read(&adapter->hw_tc);
+
+   switch (adapter->hw.mac.type) {
+   case ixgbe_mac_X550:
+   case ixgbe_mac_X550EM_x:
+   case ixgbe_mac_x550em_a:
+   /* Upper 32 bits represent billions of cycles, lower 32 bits
+* represent cycles. However, we use timespec64_to_ns for the
+* correct math even though the units haven't been corrected
+* yet.
+*/
+   ptp_read_system_prets(sts);
+   IXGBE_READ_REG(hw, IXGBE_SYSTIMR);
+   ptp_read_system_postts(sts);
+   ts->tv_nsec = IXGBE_READ_REG(hw, IXGBE_SYSTIML);
+   ts->tv_sec = IXGBE_READ_REG(hw, IXGBE_SYSTIMH);
+   stamp = timespec64_to_ns(ts);
+   break;
+   default:
+   ptp_read_system_prets(sts);
+   stamp = IXGBE_READ_REG(hw, IXGBE_SYSTIML);
+   ptp_read_system_postts(sts);
+   stamp |= (u64)IXGBE_READ_REG(hw, IXGBE_SYSTIMH) << 32;
+   break;
+   }
+
+   ns = timecounter_cyc2time(&adapter->hw_tc, stamp);
+
spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
 
*ts = ns_to_timespec64(ns);
@@ -567,10 +597,14 @@ void ixgbe_ptp_overflow_check(struct ixgbe_adapter *adapter)
 {
bool timeout = time_is_before_jiffies(adapter->last_overflow_check +
 IXGBE_OVERFLOW_PERIOD);
-   struct timespec64 ts;
+   unsigned long flags;
 
if (timeout) {
-   ixgbe_ptp_gettime(&adapter->ptp_caps, &ts);
+   /* Update the timecounter */
+   spin_lock_irqsave(&adapter->tmreg_lock, flags);
+   timecounter_read(&adapter->hw_tc);
+   spin_unlock_irqrestore(&adapter->tmreg_lock, flags);
+
adapter->last_overflow_check = jiffies;
}
 }
@@ -1216,7 +1250,7 @@ static long ixgbe_ptp_create_clock(struct ixgbe_adapter *adapter)
adapter->ptp_caps.pps = 1;
adapter->ptp_caps.adjfreq = ixgbe_ptp_adjfreq_82599;
adapter->ptp_caps.adjtime = ixgbe_ptp_adjtime;
-   adapter->ptp_caps.gettime64 = ixgbe_ptp_gettime;
+   adapter->ptp_caps.gettimex64 = ixgbe_ptp_gettimex;
adapter->ptp_caps.settime64 = ixgbe_ptp_settime;
adapter->ptp_caps.enable = ixgbe_ptp_feature_enable;
adapter->ptp_setup_sdp = ixgbe_ptp_setup_sdp_x540;
@@ -1233,7 +1267,7 @@ static long ixgbe_ptp_create_clock(struct ixgbe_adapter *adapter)
adapter->ptp_caps.pps = 0;
adapter->ptp_caps.adjfreq = ixgbe_ptp_adjfreq_82599;
adapter->ptp_caps.adjtime = ixgbe_ptp_adjtime;
-   adapter->ptp_caps.gettime64 = ixgbe_ptp_gettime;
+   adapter->ptp_caps.gettimex64 = ixgbe_ptp_gettimex;
adapter->ptp_caps.settime64 = ixgbe_ptp_settime;
adapter->ptp_caps.enable = ixgbe_ptp_feature_enable;
break;
@@ -1249,7 +1283,7 @@ static long ixgbe_ptp_create_clock(struct ixgbe_adapter *adapter)
adapter->ptp_caps.pps = 0;
adapter->ptp_caps.adjfreq = ixgbe_ptp_adjfreq_X550;
adapter->ptp_caps.adjtime = ixgbe_ptp_adjtime;
-   adapter->ptp_caps.gettime64 = ixgbe_ptp_gettime;
+   adapter->ptp_caps.gettimex64 = ixgbe_ptp_gettimex;
adapter->ptp_caps.settime64 = ixgbe_ptp_settime;
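After this patch, the gettimex path converts a raw register value with timecounter_cyc2time(), which does not update the timecounter state. That is why the overflow check now calls timecounter_read() directly. Below is a simplified model of why the periodic update matters. It assumes a counter narrower than 64 bits and a plain mult/shift conversion. toy_timecounter and its helpers are hypothetical and are not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

struct toy_timecounter {
	uint64_t mask;       /* counter width, e.g. 0xffff for 16 bits */
	uint32_t mult;       /* ns = (cycles * mult) >> shift */
	uint32_t shift;
	uint64_t cycle_last; /* raw counter value at the last update */
	uint64_t nsec;       /* time in ns corresponding to cycle_last */
};

static uint64_t toy_cyc2ns(const struct toy_timecounter *tc, uint64_t cycles)
{
	return (cycles * tc->mult) >> tc->shift;
}

/* Convert a raw counter value to ns without touching the state. Deltas of
 * more than half the counter range are treated as lying in the past. */
static uint64_t toy_cyc2time(const struct toy_timecounter *tc, uint64_t cycles)
{
	uint64_t delta = (cycles - tc->cycle_last) & tc->mask;

	if (delta > tc->mask / 2) {
		delta = (tc->cycle_last - cycles) & tc->mask;
		return tc->nsec - toy_cyc2ns(tc, delta);
	}
	return tc->nsec + toy_cyc2ns(tc, delta);
}

/* Fold the elapsed cycles into nsec and move cycle_last forward; this is
 * the job the periodic overflow check performs. */
static uint64_t toy_read(struct toy_timecounter *tc, uint64_t now)
{
	uint64_t delta = (now - tc->cycle_last) & tc->mask;

	tc->nsec += toy_cyc2ns(tc, delta);
	tc->cycle_last = now & tc->mask;
	return tc->nsec;
}
```

Deltas are taken modulo the counter width, so the state must be refreshed well within half a counter period. Otherwise a timestamp that is actually in the future is misread as one in the past.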

[PATCH net-next 2/8] ptp: check gettime64 return code in PTP_SYS_OFFSET ioctl

2018-11-09 Thread Miroslav Lichvar

If a gettime64 call fails, return the error and avoid copying data back
to userspace.

Cc: Richard Cochran 
Cc: Jacob Keller 
Signed-off-by: Miroslav Lichvar 
---
 drivers/ptp/ptp_chardev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index b54b8158ff8a..3c681bed5703 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -228,7 +228,9 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
pct->sec = ts.tv_sec;
pct->nsec = ts.tv_nsec;
pct++;
-   ptp->info->gettime64(ptp->info, &ts);
+   err = ptp->info->gettime64(ptp->info, &ts);
+   if (err)
+   goto out;
pct->sec = ts.tv_sec;
pct->nsec = ts.tv_nsec;
pct++;
@@ -281,6 +283,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
break;
}
 
+out:
kfree(sysoff);
return err;
 }
-- 
2.17.2

[PATCH net-next 1/8] ptp: reorder declarations in ptp_ioctl()

2018-11-09 Thread Miroslav Lichvar

Reorder variable declarations in reverse Christmas tree order.

Cc: Richard Cochran 
Suggested-by: Richard Cochran 
Signed-off-by: Miroslav Lichvar 
---
 drivers/ptp/ptp_chardev.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index 2012551d93e0..b54b8158ff8a 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -121,18 +121,18 @@ int ptp_open(struct posix_clock *pc, fmode_t fmode)
 
 long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, unsigned long arg)
 {
-   struct ptp_clock_caps caps;
-   struct ptp_clock_request req;
-   struct ptp_sys_offset *sysoff = NULL;
-   struct ptp_sys_offset_precise precise_offset;
-   struct ptp_pin_desc pd;
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
+   struct ptp_sys_offset_precise precise_offset;
+   struct system_device_crosststamp xtstamp;
struct ptp_clock_info *ops = ptp->info;
+   struct ptp_sys_offset *sysoff = NULL;
+   struct ptp_clock_request req;
+   struct ptp_clock_caps caps;
struct ptp_clock_time *pct;
+   unsigned int i, pin_index;
+   struct ptp_pin_desc pd;
struct timespec64 ts;
-   struct system_device_crosststamp xtstamp;
int enable, err = 0;
-   unsigned int i, pin_index;
 
switch (cmd) {
 
-- 
2.17.2

[PATCH net-next 8/8] tg3: extend PTP gettime function to read system clock

2018-11-09 Thread Miroslav Lichvar

This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Cc: Richard Cochran 
Cc: Michael Chan 
Signed-off-by: Miroslav Lichvar 
---
 drivers/net/ethernet/broadcom/tg3.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 89295306f161..ce44d208e137 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6135,10 +6135,16 @@ static int tg3_setup_phy(struct tg3 *tp, bool force_reset)
 }
 
 /* tp->lock must be held */
-static u64 tg3_refclk_read(struct tg3 *tp)
+static u64 tg3_refclk_read(struct tg3 *tp, struct ptp_system_timestamp *sts)
 {
-   u64 stamp = tr32(TG3_EAV_REF_CLCK_LSB);
-   return stamp | (u64)tr32(TG3_EAV_REF_CLCK_MSB) << 32;
+   u64 stamp;
+
+   ptp_read_system_prets(sts);
+   stamp = tr32(TG3_EAV_REF_CLCK_LSB);
+   ptp_read_system_postts(sts);
+   stamp |= (u64)tr32(TG3_EAV_REF_CLCK_MSB) << 32;
+
+   return stamp;
 }
 
 /* tp->lock must be held */
@@ -6229,13 +6235,14 @@ static int tg3_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
return 0;
 }
 
-static int tg3_ptp_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
+static int tg3_ptp_gettimex(struct ptp_clock_info *ptp, struct timespec64 *ts,
+   struct ptp_system_timestamp *sts)
 {
u64 ns;
struct tg3 *tp = container_of(ptp, struct tg3, ptp_info);
 
tg3_full_lock(tp, 0);
-   ns = tg3_refclk_read(tp);
+   ns = tg3_refclk_read(tp, sts);
ns += tp->ptp_adjust;
tg3_full_unlock(tp);
 
@@ -6330,7 +6337,7 @@ static const struct ptp_clock_info tg3_ptp_caps = {
.pps= 0,
.adjfreq= tg3_ptp_adjfreq,
.adjtime= tg3_ptp_adjtime,
-   .gettime64  = tg3_ptp_gettime,
+   .gettimex64 = tg3_ptp_gettimex,
.settime64  = tg3_ptp_settime,
.enable = tg3_ptp_enable,
 };
-- 
2.17.2
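tg3_refclk_read() brackets only the LSB read and reads the MSB afterwards. Below is a minimal simulation of that pattern. It assumes, as the placement of the brackets suggests, that reading the LSB register latches the MSB. The register model, the stand-in system clock, and all names are hypothetical. The pre/post helpers accept NULL so that a plain gettime call costs nothing extra.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t refclk;     /* free-running reference clock */
static uint32_t msb_latch;  /* MSB captured when the LSB is read */
static uint64_t sys_ns;     /* stand-in system clock */

struct sys_ts { uint64_t pre, post; };

static uint32_t rd_lsb(void)
{
	refclk += 8;
	msb_latch = (uint32_t)(refclk >> 32);
	return (uint32_t)refclk;
}
static uint32_t rd_msb(void) { refclk += 8; return msb_latch; }

static void prets(struct sys_ts *sts)  { if (sts) sts->pre = sys_ns++; }
static void postts(struct sys_ts *sts) { if (sts) sts->post = sys_ns++; }

/* Same shape as the patched tg3_refclk_read(): only the latching access
 * sits between the system timestamps. */
static uint64_t refclk_read(struct sys_ts *sts)
{
	uint64_t stamp;

	prets(sts);
	stamp = rd_lsb();
	postts(sts);
	stamp |= (uint64_t)rd_msb() << 32;
	return stamp;
}
```

The counter crosses a 2^32 boundary before the MSB access, yet the latched MSB still matches the LSB that was sampled.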

Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-09 Thread Paweł Staszewski





On 08.11.2018 at 17:06, David Ahern wrote:

On 11/8/18 6:33 AM, Paweł Staszewski wrote:


On 07.11.2018 at 22:06, David Ahern wrote:

On 11/3/18 6:24 PM, Paweł Staszewski wrote:

Does your setup have any other device types besides physical ports with
VLANs (e.g., any macvlans or bonds)?



no.
just
phy(mlnx)->vlans only config

VLAN and non-VLAN (and a mix) seem to work ok. Patches are here:
     https://github.com/dsahern/linux.git bpf/kernel-tables-wip

I got lazy with the vlan exports; right now it requires 8021q to be
builtin (CONFIG_VLAN_8021Q=y)

You can use the xdp_fwd sample:
    make O=kbuild -C samples/bpf -j 8

Copy samples/bpf/xdp_fwd_kern.o and samples/bpf/xdp_fwd to the server
and run:
     ./xdp_fwd 

e.g., in my testing I run:
     xdp_fwd eth1 eth2 eth3 eth4

All of the relevant forwarding ports need to be on the same command
line. This version populates a second map to verify the egress port has
XDP enabled.

I installed it today on a lab server with a Mellanox ConnectX-4.

I am trying some simple static routing first, but after enabling the XDP
program the receiver is not receiving frames.

Route table is simple as possible for tests :)

ICMP ping test sent from 192.168.22.237 to 172.16.0.2; incoming
packets arrive on vlan 4081.

ip r
default via 192.168.22.236 dev vlan4081
172.16.0.0/30 dev vlan1740 proto kernel scope link src 172.16.0.1
192.168.22.0/24 dev vlan4081 proto kernel scope link src 192.168.22.205

neigh table:
ip neigh ls

192.168.22.237 dev vlan4081 lladdr 00:25:90:fb:a6:8d REACHABLE
172.16.0.2 dev vlan1740 lladdr ac:1f:6b:2c:2e:5a REACHABLE

and interfaces:
4: enp175s0f0:  mtu 1500 qdisc mq state
UP mode DEFAULT group default qlen 1000
     link/ether ac:1f:6b:07:c8:90 brd ff:ff:ff:ff:ff:ff
5: enp175s0f1:  mtu 1500 qdisc mq state
UP mode DEFAULT group default qlen 1000
     link/ether ac:1f:6b:07:c8:91 brd ff:ff:ff:ff:ff:ff
6: vlan4081@enp175s0f0:  mtu 1500 qdisc
noqueue state UP mode DEFAULT group default qlen 1000
     link/ether ac:1f:6b:07:c8:90 brd ff:ff:ff:ff:ff:ff
7: vlan1740@enp175s0f1:  mtu 1500 qdisc
noqueue state UP mode DEFAULT group default qlen 1000
     link/ether ac:1f:6b:07:c8:91 brd ff:ff:ff:ff:ff:ff

5: enp175s0f1:  mtu 1500 xdp/id:5 qdisc
mq state UP group default qlen 1000
     link/ether ac:1f:6b:07:c8:91 brd ff:ff:ff:ff:ff:ff
     inet6 fe80::ae1f:6bff:fe07:c891/64 scope link
    valid_lft forever preferred_lft forever
6: vlan4081@enp175s0f0:  mtu 1500 qdisc
noqueue state UP group default qlen 1000
     link/ether ac:1f:6b:07:c8:90 brd ff:ff:ff:ff:ff:ff
     inet 192.168.22.205/24 scope global vlan4081
    valid_lft forever preferred_lft forever
     inet6 fe80::ae1f:6bff:fe07:c890/64 scope link
    valid_lft forever preferred_lft forever
7: vlan1740@enp175s0f1:  mtu 1500 qdisc
noqueue state UP group default qlen 1000
     link/ether ac:1f:6b:07:c8:91 brd ff:ff:ff:ff:ff:ff
     inet 172.16.0.1/30 scope global vlan1740
    valid_lft forever preferred_lft forever
     inet6 fe80::ae1f:6bff:fe07:c891/64 scope link
    valid_lft forever preferred_lft forever


xdp program detached:
Receiving side tcpdump:
14:28:09.141233 IP 192.168.22.237 > 172.16.0.2: ICMP echo request, id
30227, seq 487, length 64

I can see icmp requests


enabling xdp
./xdp_fwd enp175s0f1 enp175s0f0

4: enp175s0f0:  mtu 1500 xdp qdisc mq
state UP mode DEFAULT group default qlen 1000
     link/ether ac:1f:6b:07:c8:90 brd ff:ff:ff:ff:ff:ff
     prog/xdp id 5 tag 3c231ff1e5e77f3f
5: enp175s0f1:  mtu 1500 xdp qdisc mq
state UP mode DEFAULT group default qlen 1000
     link/ether ac:1f:6b:07:c8:91 brd ff:ff:ff:ff:ff:ff
     prog/xdp id 5 tag 3c231ff1e5e77f3f
6: vlan4081@enp175s0f0:  mtu 1500 qdisc
noqueue state UP mode DEFAULT group default qlen 1000
     link/ether ac:1f:6b:07:c8:90 brd ff:ff:ff:ff:ff:ff
7: vlan1740@enp175s0f1:  mtu 1500 qdisc
noqueue state UP mode DEFAULT group default qlen 1000
     link/ether ac:1f:6b:07:c8:91 brd ff:ff:ff:ff:ff:ff


What hardware is this?

Start with:

echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
cat /sys/kernel/debug/tracing/trace_pipe

From there, you can check the FIB lookups:
sysctl -w kernel.perf_event_max_stack=16
perf record -e fib:* -a -g -- sleep 5
perf script



I just caught some weird behavior :)
Everything was working fine for about 20k packets.

Then XDP started forwarding only about every 10th packet:
ping 172.16.0.2 -i 0.1
PING 172.16.0.2 (172.16.0.2) 56(84) bytes of data.
64 bytes from 172.16.0.2: icmp_seq=1 ttl=64 time=5.12 ms
64 bytes from 172.16.0.2: icmp_seq=9 ttl=64 time=5.20 ms
64 bytes from 172.16.0.2: icmp_seq=19 ttl=64 time=4.85 ms
64 bytes from 172.16.0.2: icmp_seq=29 ttl=64 time=4.91 ms
64 bytes from 172.16.0.2: icmp_seq=38 ttl=64 time=4.85 ms
64 bytes from 172.16.0.2: icmp_seq=48 ttl=64 time=5.00 ms
^C
--- 172.16.0.2 ping statistics ---
55 packets transmitted, 6 received, 89% packet loss, time 5655ms
rtt min/avg/max/mdev = 4.850/4.992/5.203/0.145 ms


And after some time it was back to normal again:

 ping 172.16.0.2 -i 0.1

Re: [PATCH 08/20] octeontx2-af: Alloc and config NPC MCAM entry at a time

2018-11-09 Thread Arnd Bergmann

On Fri, Nov 9, 2018 at 5:21 AM Sunil Kovvuri  wrote:
>
> On Fri, Nov 9, 2018 at 2:13 AM Arnd Bergmann  wrote:
> >
> > On Thu, Nov 8, 2018 at 7:37 PM  wrote:
> > > @@ -666,4 +668,20 @@ struct npc_mcam_unmap_counter_req {
> > > u8  all;   /* Unmap all entries using this counter ? */
> > >  };
> > >
> > > +struct npc_mcam_alloc_and_write_entry_req {
> > > +   struct mbox_msghdr hdr;
> > > +   struct mcam_entry entry_data;
> > > +   u16 ref_entry;
> > > +   u8  priority;/* Lower or higher w.r.t ref_entry */
> > > +   u8  intf;/* Rx or Tx interface */
> > > +   u8  enable_entry;/* Enable this MCAM entry ? */
> > > +   u8  alloc_cntr;  /* Allocate counter and map ? */
> > > +};
> >
> > I noticed that this structure requires padding at the end because
> > struct mbox_msghdr has a 32-bit alignment requirement. For
> > data structures in an interface, I'd recommend avoiding that kind
> > of padding and adding reserved fields or widening the types
> > accordingly.
> >
>
> When there are multiple messages in the mailbox, each message starts
> at a 16-byte aligned offset, so struct mbox_msghdr is always aligned.
> I think adding reserved fields is not needed here.
>
> ===
> struct mbox_msghdr *otx2_mbox_alloc_msg_rsp(struct otx2_mbox *mbox, int devid,
> int size, int size_rsp)
> {
> size = ALIGN(size, MBOX_MSG_ALIGN);
> ===
>
> Is this what you were referring to ?
>

No, I mean the padding at the end of the structure. An example
would be a structure like

struct s {
u16 a;
u32 b;
u16 c;
};

Since b is aligned to four bytes, you get padding between a and b.
On top of that, you also get padding after c to make the size of
the structure itself a multiple of its alignment. For interfaces, we
should avoid both kinds of padding. This can be done by marking
members as __packed (usually I don't recommend that), by
changing the size of members, or by adding explicit 'reserved'
fields in place of the padding.
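Arnd's example can be checked at compile time. Below is a minimal sketch, assuming a typical ABI (such as x86-64 or arm64) where uint32_t is 4-byte aligned; some ABIs differ. struct s reproduces the example, and struct s_explicit is a hypothetical variant with the padding spelled out as reserved fields, so no byte of the layout is implicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The example from the mail: implicit padding after a and after c. */
struct s {
	uint16_t a;
	uint32_t b;
	uint16_t c;
};

/* Same members with the padding made explicit as reserved fields. */
struct s_explicit {
	uint16_t a;
	uint16_t rsvd0;
	uint32_t b;
	uint16_t c;
	uint16_t rsvd1;
};

/* C11 compile-time checks, valid on ABIs where uint32_t is 4-byte aligned. */
static_assert(offsetof(struct s, b) == 4, "2 bytes of padding after a");
static_assert(sizeof(struct s) == 12, "2 bytes of tail padding after c");
static_assert(sizeof(struct s_explicit) == sizeof(struct s),
	      "same size, but every byte is now a named field");
```

pahole reports the same holes for struct s and none for struct s_explicit.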

> > I also noticed a similar problem in struct mbox_msghdr. Maybe
> > use the 'pahole' tool to check for this kind of padding in the
> > API structures.

 Arnd

[PATCH net-next] cxgb4vf: fix memleak in mac_hlist initialization

2018-11-09 Thread Arjun Vynipadath

mac_hlist was initialized in adapter_up(), which is called every time a
VF device is first brought up, and again whenever a device is brought up
after all devices have been brought down. Re-initializing the list there
discards the previous list state and leaks any entries still present in
it. To fix that, move the list initialization into the branch that
performs the one-time adapter setup.

Signed-off-by: Arjun Vynipadath 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index ff84791..972dc7b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -722,6 +722,10 @@ static int adapter_up(struct adapter *adapter)
 
if (adapter->flags & USING_MSIX)
name_msix_vecs(adapter);
+
+   /* Initialize hash mac addr list*/
+   INIT_LIST_HEAD(&adapter->mac_hlist);
+
adapter->flags |= FULL_INIT_DONE;
}
 
@@ -747,8 +751,6 @@ static int adapter_up(struct adapter *adapter)
enable_rx(adapter);
t4vf_sge_start(adapter);
 
-   /* Initialize hash mac addr list*/
-   INIT_LIST_HEAD(&adapter->mac_hlist);
return 0;
 }
 
-- 
1.8.3.1
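The leak follows from what list initialization does: it points the head back at itself and ignores any nodes currently linked to it. Below is a minimal user-space model of that effect. It is a hypothetical re-implementation in the style of struct list_head, not <linux/list.h> itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list in the style of struct list_head. */
struct node {
	struct node *next, *prev;
};

static void init_head(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static void add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static int list_len(const struct node *head)
{
	int len = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		len++;
	return len;
}
```

After a second init_head() call, the head no longer reaches the entry. In the driver, nothing else references such entries, so they can never be freed. That is why the patch moves the initialization into the one-time setup path.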

[PATCH net 1/1] bnx2x: Assign unique DMAE channel number for FW DMAE transactions.

2018-11-09 Thread Sudarsana Reddy Kalluru

Driver assigns DMAE channel 0 for FW as part of the START_RAMROD command. FW
uses this channel for DMAE operations (e.g., TIME_SYNC implementation).
Driver also uses the same channel 0 for DMAE operations for some of the PFs
(e.g., PF0 on Port0). This could lead to concurrent access to the DMAE
channel by FW and driver, which is not legal. Hence, a unique DMAE id needs
to be assigned to the FW.
Currently the following DMAE channels are used by the clients:
  MFW - OCBB/OCSD functionality uses DMAE channels 14/15
  Driver - 0-3 and 8-11 (for PF DMAE operations)
           4 and 12 (for stats requests)
Assign the unique dmae_id 13 to the FW.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Michal Kalderon 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 7 +++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h | 3 +++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c  | 1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h  | 3 +++
 4 files changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index be15061..0de487a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -2191,6 +2191,13 @@ void bnx2x_igu_clear_sb_gen(struct bnx2x *bp, u8 func, u8 idu_sb_id,
 #define PMF_DMAE_C(bp) (BP_PORT(bp) * MAX_DMAE_C_PER_PORT + \
 E1HVN_MAX)
 
+/* Following is the DMAE channel number allocation for the clients.
+ *   MFW: OCBB/OCSD implementations use DMAE channels 14/15 respectively.
+ *   Driver: 0-3 and 8-11 (for PF dmae operations)
+ *   4 and 12 (for stats requests)
+ */
+#define BNX2X_FW_DMAE_C 13 /* Channel for FW DMAE operations */
+
 /* PCIE link and speed */
 #define PCICFG_LINK_WIDTH  0x1f0
 #define PCICFG_LINK_WIDTH_SHIFT20
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index 142bc11..b9059f4 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -965,6 +965,9 @@ static inline int bnx2x_func_start(struct bnx2x *bp)
start_params->network_cos_mode = STATIC_COS;
else /* CHIP_IS_E1X */
start_params->network_cos_mode = FW_WRR;
+
+   start_params->dmae_cmd_id = BNX2X_FW_DMAE_C;
+
if (bp->udp_tunnel_ports[BNX2X_UDP_PORT_VXLAN].count) {
port = bp->udp_tunnel_ports[BNX2X_UDP_PORT_VXLAN].dst_port;
start_params->vxlan_dst_port = port;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
index 3f4d2c8..1a33017 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
@@ -6149,6 +6149,7 @@ static inline int bnx2x_func_send_start(struct bnx2x *bp,
rdata->sd_vlan_tag  = cpu_to_le16(start_params->sd_vlan_tag);
rdata->path_id  = BP_PATH(bp);
rdata->network_cos_mode = start_params->network_cos_mode;
+   rdata->dmae_cmd_id  = start_params->dmae_cmd_id;
 
rdata->vxlan_dst_port   = cpu_to_le16(start_params->vxlan_dst_port);
rdata->geneve_dst_port  = cpu_to_le16(start_params->geneve_dst_port);
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
index 0bf2fd4..6cc3301 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h
@@ -1188,6 +1188,9 @@ struct bnx2x_func_start_params {
/* Function cos mode */
u8 network_cos_mode;
 
+   /* DMAE command id to be used for FW DMAE transactions */
+   u8 dmae_cmd_id;
+
/* UDP dest port for VXLAN */
u16 vxlan_dst_port;
 
-- 
1.8.3.1
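The channel allocation in the commit message can be checked mechanically by representing each owner's channels as a bitmask and verifying that no two owners share a bit. Below is a minimal sketch. The masks are transcribed from the commit message, and the helper is hypothetical, not bnx2x code.

```c
#include <assert.h>
#include <stdint.h>

#define CH(n) (1u << (n))

/* DMAE channel owners as listed in the commit message. */
static const uint32_t drv_pf_chans   = CH(0) | CH(1) | CH(2) | CH(3) |
				       CH(8) | CH(9) | CH(10) | CH(11);
static const uint32_t drv_stat_chans = CH(4) | CH(12);
static const uint32_t mfw_chans      = CH(14) | CH(15);

/* Returns 1 if no channel is claimed by more than one owner. */
static int channels_disjoint(const uint32_t *owners, int n)
{
	uint32_t seen = 0;

	for (int i = 0; i < n; i++) {
		if (seen & owners[i])
			return 0;
		seen |= owners[i];
	}
	return 1;
}
```

With the FW on channel 0, the check fails against the driver's PF channels. With BNX2X_FW_DMAE_C = 13, every owner is disjoint.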

Re: [PATCH bpf-next] bpftool: Improve handling of ENOENT on map dumps

2018-11-09 Thread Daniel Borkmann

On 11/08/2018 10:25 PM, Jakub Kicinski wrote:
> On Thu,  8 Nov 2018 13:00:07 -0800, David Ahern wrote:
>> From: David Ahern 
>>
>> bpftool output is not user friendly when dumping a map with only a few
>> populated entries:
>>
>> $ bpftool map
>> 1: devmap  name tx_devmap  flags 0x0
>> key 4B  value 4B  max_entries 64  memlock 4096B
>> 2: array  name tx_idxmap  flags 0x0
>> key 4B  value 4B  max_entries 64  memlock 4096B
>>
>> $ bpftool map dump id 1
>> key:
>> 00 00 00 00
>> value:
>> No such file or directory
>> key:
>> 01 00 00 00
>> value:
>> No such file or directory
>> key:
>> 02 00 00 00
>> value:
>> No such file or directory
>> key: 03 00 00 00  value: 03 00 00 00
>>
>> Handle ENOENT by keeping the line format sane and dumping
>> "" for the value
>>
>> $ bpftool map dump id 1
>> key: 00 00 00 00  value: 
>> key: 01 00 00 00  value: 
>> key: 02 00 00 00  value: 
>> key: 03 00 00 00  value: 03 00 00 00
>> ...
>>
>> Signed-off-by: David Ahern 
> 
> Seems good.  I wonder why "fd" maps report all indexes in get_next..
> 
> Acked-by: Jakub Kicinski 
> 
>> Alternatively, could just omit the value, so:
>> key: 00 00 00 00  value:
>> key: 01 00 00 00  value:
>> key: 02 00 00 00  value:
>> key: 03 00 00 00  value: 03 00 00 00
>>
>>  tools/bpf/bpftool/map.c | 19 +++
>>  1 file changed, 15 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
>> index 101b8a881225..1f0060644e0c 100644
>> --- a/tools/bpf/bpftool/map.c
>> +++ b/tools/bpf/bpftool/map.c
>> @@ -383,7 +383,10 @@ static void print_entry_plain(struct bpf_map_info *info, unsigned char *key,
>>  printf(single_line ? "  " : "\n");
>>  
>>  printf("value:%c", break_names ? '\n' : ' ');
>> -fprint_hex(stdout, value, info->value_size, " ");
>> +if (value)
>> +fprint_hex(stdout, value, info->value_size, " ");
>> +else
>> +printf("");
>>  
>>  printf("\n");
>>  } else {
>> @@ -398,8 +401,12 @@ static void print_entry_plain(struct bpf_map_info *info, unsigned char *key,
>>  for (i = 0; i < n; i++) {
>>  printf("value (CPU %02d):%c",
>> i, info->value_size > 16 ? '\n' : ' ');
>> -fprint_hex(stdout, value + i * step,
>> -   info->value_size, " ");
>> +if (value) {
>> +fprint_hex(stdout, value + i * step,
>> +   info->value_size, " ");
>> +} else {
>> +printf("");
>> +}
> 
> nit: in other places you don't add { }

Applied to bpf-next and fixed this nit up while doing so, thanks everyone!
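The idea behind the patch is to keep each entry on a single `key: ... value: ...` line and print a fixed placeholder when the value lookup returns ENOENT, instead of an error message. Below is a minimal user-space sketch of that formatting. It is not bpftool's code: `format_entry()` and the placeholder string `<none>` are hypothetical, and a missing value is modelled as a NULL pointer rather than obtained from a real map lookup.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical placeholder for entries whose lookup failed with ENOENT. */
#define NO_VALUE_PLACEHOLDER "<none>"

/* Append formatted text at buf[*pos], never running past the buffer. */
static void appendf(char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*pos >= size)
		return;
	va_start(ap, fmt);
	n = vsnprintf(buf + *pos, size - *pos, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	*pos += (size_t)n;
	if (*pos >= size)
		*pos = size - 1; /* output was truncated */
}

/* Print bytes as "xx xx xx", like the space-separated hex dump. */
static void append_hex(char *buf, size_t size, size_t *pos,
		       const unsigned char *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		appendf(buf, size, pos, i ? " %02x" : "%02x", data[i]);
}

/* Format one entry on one line; value == NULL means "no entry". */
static void format_entry(char *buf, size_t size,
			 const unsigned char *key, size_t key_size,
			 const unsigned char *value, size_t value_size)
{
	size_t pos = 0;

	if (size)
		buf[0] = '\0';
	appendf(buf, size, &pos, "key: ");
	append_hex(buf, size, &pos, key, key_size);
	appendf(buf, size, &pos, "  value: ");
	if (value)
		append_hex(buf, size, &pos, value, value_size);
	else
		appendf(buf, size, &pos, "%s", NO_VALUE_PLACEHOLDER);
}
```

Keeping the placeholder on the same line means every entry remains one greppable record, whether or not the slot is populated.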

[PATCH net-next] cxgb4vf: free mac_hlist properly

2018-11-09 Thread Arjun Vynipadath

The locally maintained list that tracks the hash MAC table was
not freed on driver removal.

Signed-off-by: Arjun Vynipadath 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
index 972dc7b..8ec503c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c
@@ -3289,6 +3289,7 @@ static int cxgb4vf_pci_probe(struct pci_dev *pdev,
 static void cxgb4vf_pci_remove(struct pci_dev *pdev)
 {
struct adapter *adapter = pci_get_drvdata(pdev);
+   struct hash_mac_addr *entry, *tmp;
 
/*
 * Tear down driver state associated with device.
@@ -3339,6 +3340,11 @@ static void cxgb4vf_pci_remove(struct pci_dev *pdev)
if (!is_t4(adapter->params.chip))
iounmap(adapter->bar2);
kfree(adapter->mbox_log);
+   list_for_each_entry_safe(entry, tmp, &adapter->mac_hlist,
+list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
kfree(adapter);
}
 
-- 
1.8.3.1
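This fix, like the matching cxgb4 one, relies on `list_for_each_entry_safe()`, which caches the next node before the loop body runs so the current entry can be unlinked and freed. A minimal user-space sketch of the same pattern follows. The `struct list_head` helpers are simplified stand-ins written for illustration (not the kernel's `<linux/list.h>`), and `struct hash_mac_entry` is a hypothetical substitute for `struct hash_mac_addr`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified circular doubly linked list, modelled on <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void list_del(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node->prev = NULL;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical entry type standing in for struct hash_mac_addr. */
struct hash_mac_entry {
	unsigned char addr[6];
	struct list_head list; /* deliberately not the first member */
};

#define entry_of(ptr) \
	((struct hash_mac_entry *)((char *)(ptr) - \
				   offsetof(struct hash_mac_entry, list)))

/* Free every entry; 'tmp' is read before 'pos' is freed, as in the _safe iterator. */
static size_t free_all(struct list_head *head)
{
	struct list_head *pos, *tmp;
	size_t freed = 0;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		list_del(pos);
		free(entry_of(pos));
		freed++;
	}
	return freed;
}
```

Without the cached `tmp`, advancing the loop would read `pos->next` from memory that has just been freed.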

Re: Kernel 4.19 network performance - forwarding/routing normal users traffic

2018-11-09 Thread Paweł Staszewski

On 09.11.2018 at 05:52, Saeed Mahameed wrote:

On Thu, 2018-11-08 at 17:42 -0700, David Ahern wrote:

On 11/8/18 5:40 PM, Paweł Staszewski wrote:

On 08.11.2018 at 17:32, David Ahern wrote:

On 11/8/18 9:27 AM, Paweł Staszewski wrote:

What hardware is this?


mellanox connectx 4
ethtool -i enp175s0f0
driver: mlx5_core
version: 5.0-0
firmware-version: 12.21.1000 (SM_200101033)
expansion-rom-version:
bus-info: :af:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

ethtool -i enp175s0f1
driver: mlx5_core
version: 5.0-0
firmware-version: 12.21.1000 (SM_200101033)
expansion-rom-version:
bus-info: :af:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes


Start with:

echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
cat /sys/kernel/debug/tracing/trace_pipe

   cat /sys/kernel/debug/tracing/trace_pipe
   -0 [045] ..s. 68469.467752:
xdp_devmap_xmit:
ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0
drops=1
from_ifindex=4 to_ifindex=5 err=-6

FIB lookup is good, the redirect is happening, but the mlx5 driver does
not like it.

I think the -6 is coming from the mlx5 driver and the packet is getting
dropped. Perhaps this check in mlx5e_xdp_xmit:

 if (unlikely(sq_num >= priv->channels.num))
  return -ENXIO;

I removed that part and recompiled, but now after running xdp_fwd I
have a kernel panic :)

Heh, no, please don't do such a thing :)

yes - dirty "try" :)
Code back in place :)

It must be because the TX netdev has fewer TX queues than the RX netdev,
or the RX netdev rings are bound to high CPU indexes.

Anyway, best practice is to configure as many combined RX/TX channels as
there are cores on both netdevs:

ethtool -L enp175s0f0  combined $(nproc)
ethtool -L enp175s0f1  combined $(nproc)
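Saeed's explanation can be modelled with a small user-space function. It assumes, as the quoted check suggests, that the XDP transmit queue is picked by the index of the CPU performing the redirect and that there is one XDP send queue per channel; `pick_xdp_sq()` and `cpus_without_sq()` are hypothetical names, not mlx5 code. With fewer channels than CPUs, redirects handled on high-numbered CPUs fail with -ENXIO (errno 6 on Linux, matching `err=-6` in the trace), while setting `combined $(nproc)` makes every CPU index valid.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the queue selection guarded by
 * "if (unlikely(sq_num >= priv->channels.num)) return -ENXIO;".
 * Returns the send queue index, or -ENXIO if this CPU has no queue.
 */
static int pick_xdp_sq(unsigned int cpu, unsigned int num_channels)
{
	if (cpu >= num_channels)
		return -ENXIO; /* frame is dropped: "err=-6" in the trace */
	return (int)cpu;
}

/* Count how many CPUs would hit the error for a given channel count. */
static unsigned int cpus_without_sq(unsigned int num_cpus,
				    unsigned int num_channels)
{
	unsigned int cpu, failing = 0;

	for (cpu = 0; cpu < num_cpus; cpu++)
		if (pick_xdp_sq(cpu, num_channels) < 0)
			failing++;
	return failing;
}
```

For example, with 16 channels a redirect processed on CPU 45 (as in the trace line above) would be dropped, whereas with one channel per CPU no CPU is left without a send queue.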

Ok now it is working.

Time for some tests :)

Thanks


Jesper or one of the Mellanox folks needs to respond about the config
needed to run XDP with this NIC. I don't have a 40G or 100G card to play
with.

Re: [PATCH 00/20] octeontx2-af: NPC MCAM support and FLR handling

2018-11-09 Thread Arnd Bergmann

On Fri, Nov 9, 2018 at 5:35 AM Sunil Kovvuri  wrote:
> On Fri, Nov 9, 2018 at 2:32 AM Arnd Bergmann  wrote:
> > On Thu, Nov 8, 2018 at 7:36 PM  wrote:
> > > From: Sunil Goutham 
> >
> > Hmm, I noticed that you use a different address as the patch author
> > and the submitter. I'm guessing that "Sunil Goutham" and
> > "Sunil Kovvuri" actually refer to the same person, and you just
> > need to pick which of the two email addresses you want to use
> > for public communication, but that's not obvious here.
> >
> > However, if there are actually two different Sunils here, then
> > you need to add that second Signed-off-by.
> >
>
> No, it's just me.
> Sometimes code indentation becomes messy and difficult to read if I use
> the corporate mail server to submit patches, so I have been using Gmail.

Ok, I see. Ideally you should try to get the company mail server fixed,
of course. A possible workaround is to add your Marvell address as
an alias in Gmail, which allows 'git send-email' to send out mails with
the other address as the sender. This may however fail if the Marvell
mail server uses SPF, as mail clients might then consider your
mails forged.

Arnd

[PATCH net-next] cxgb4: free mac_hlist properly

2018-11-09 Thread Arjun Vynipadath

The locally maintained list that tracks the hash MAC table was
not freed on driver removal.

Signed-off-by: Arjun Vynipadath 
Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 05a4692..956e708 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2295,6 +2295,8 @@ static int cxgb_up(struct adapter *adap)
 
 static void cxgb_down(struct adapter *adapter)
 {
+   struct hash_mac_addr *entry, *tmp;
+
cancel_work_sync(&adapter->tid_release_task);
cancel_work_sync(&adapter->db_full_task);
cancel_work_sync(&adapter->db_drop_task);
@@ -2303,6 +2305,12 @@ static void cxgb_down(struct adapter *adapter)
 
t4_sge_stop(adapter);
t4_free_sge_resources(adapter);
+
+   list_for_each_entry_safe(entry, tmp, &adapter->mac_hlist, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+
adapter->flags &= ~FULL_INIT_DONE;
 }
 
-- 
1.8.3.1

Re: [PATCH 11/20] octeontx2-af: Add support for stripping STAG/CTAG

2018-11-09 Thread Arnd Bergmann

On Fri, Nov 9, 2018 at 5:29 AM Sunil Kovvuri  wrote:
> On Fri, Nov 9, 2018 at 2:17 AM Arnd Bergmann  wrote:
> > On Thu, Nov 8, 2018 at 7:37 PM  wrote:

> >
> > Here is another instance of bitfields in an interface structure. As
> > before, please try to avoid doing that and use bit shifts and masks
> > instead.
> >
> >Arnd
>
> No, this struct is not part of the communication interface.
> It is used to fill up a register in a slightly more readable fashion
> than plain bit shifts.

But this is still an interface, isn't it? Writing to the register
implies that there is some hardware that interprets the
bits, so they have to be in the right place.

> ===
> struct nix_rx_vtag_action vtag_action;
>
> *(u64 *)&vtag_action = 0;
> vtag_action.vtag0_valid = 1;
> /* must match type set in NIX_VTAG_CFG */
> vtag_action.vtag0_type = 0;
> vtag_action.vtag0_lid = NPC_LID_LA;
> vtag_action.vtag0_relptr = 12;
> entry.vtag_action = *(u64 *)&vtag_action;
>
> /* Set TAG 'action' */
> rvu_write64(rvu, blkaddr, NPC_AF_MCAMEX_BANKX_TAG_ACT(index, actbank),
> entry->vtag_action);

I assume this rvu_write64() does a cpu_to_le64() swap on big-endian,
so the contents again are in the wrong place. I don't see any non-reserved
fields that span an 8-bit boundary, so you can probably rearrange the bits
to make it work, but generally speaking it's better to not rely on how the
compiler lays out bit fields.

Arnd
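Arnd's suggestion of building the register value with shifts and masks, instead of a bitfield struct, could look roughly like the sketch below. The bit positions and widths are invented for illustration (the real NPC_AF_MCAMEX_BANKX_TAG_ACT layout is not given in this thread), and the `VTAG0_*` macros, `field_prep()`, `field_get()` and `build_vtag_action()` are hypothetical. Because the value is assembled as a host-order `uint64_t`, field placement no longer depends on how the compiler lays out bitfields; only the register accessor has to deal with byte order. In kernel code, `FIELD_PREP()`/`FIELD_GET()` from `<linux/bitfield.h>` together with `GENMASK_ULL()` serve the same purpose.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define VTAG0_RELPTR_SHIFT 0
#define VTAG0_RELPTR_MASK  0xffULL /* bits 7:0 */
#define VTAG0_LID_SHIFT    8
#define VTAG0_LID_MASK     0x7ULL  /* bits 10:8 */
#define VTAG0_TYPE_SHIFT   12
#define VTAG0_TYPE_MASK    0x7ULL  /* bits 14:12 */
#define VTAG0_VALID_SHIFT  15
#define VTAG0_VALID_MASK   0x1ULL  /* bit 15 */

/* Place 'val' into the field described by 'mask' and 'shift'. */
static uint64_t field_prep(uint64_t mask, unsigned int shift, uint64_t val)
{
	return (val & mask) << shift;
}

/* Extract a field again, e.g. for debugging or tests. */
static uint64_t field_get(uint64_t mask, unsigned int shift, uint64_t reg)
{
	return (reg >> shift) & mask;
}

static uint64_t build_vtag_action(unsigned int valid, unsigned int type,
				  unsigned int lid, unsigned int relptr)
{
	return field_prep(VTAG0_VALID_MASK, VTAG0_VALID_SHIFT, valid) |
	       field_prep(VTAG0_TYPE_MASK, VTAG0_TYPE_SHIFT, type) |
	       field_prep(VTAG0_LID_MASK, VTAG0_LID_SHIFT, lid) |
	       field_prep(VTAG0_RELPTR_MASK, VTAG0_RELPTR_SHIFT, relptr);
}
```

For instance, `build_vtag_action(1, 0, 0, 12)` yields 0x800c under this made-up layout, regardless of compiler or CPU endianness.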

RE: [PATCH net-next 1/2] dpaa2-eth: defer probe on object allocate

2018-11-09 Thread Ioana Ciornei

> > The fsl_mc_object_allocate function can fail because not all
> > allocatable objects are probed by the fsl_mc_allocator at the call
> > time. Defer the dpaa2-eth probe when this happens.
> >
> > Signed-off-by: Ioana Ciornei 
> > ---
> >  drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 30
> > +---
> >  1 file changed, 21 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> > b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> > index 88f7acc..71f5cd4 100644
> > --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> > +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> > @@ -1434,8 +1434,11 @@ static struct fsl_mc_device *setup_dpcon(struct dpaa2_eth_priv *priv)
> > err = fsl_mc_object_allocate(to_fsl_mc_device(dev),
> >  FSL_MC_POOL_DPCON, &dpcon);
> > if (err) {
> > -   dev_info(dev, "Not enough DPCONs, will go on as-is\n");
> > -   return NULL;
> > +   if (err == -ENXIO)
> > +   err = -EPROBE_DEFER;
> > +   else
> > +   dev_info(dev, "Not enough DPCONs, will go on as-is\n");
> > +   return ERR_PTR(err);
> > }
> >
> > err = dpcon_open(priv->mc_io, 0, dpcon->obj_desc.id, &dpcon->mc_handle);
> > @@ -1493,8 +1496,10 @@ static void free_dpcon(struct dpaa2_eth_priv *priv,
> > return NULL;
> >
> > channel->dpcon = setup_dpcon(priv);
> > -   if (!channel->dpcon)
> > +   if (IS_ERR_OR_NULL(channel->dpcon)) {
> > +   err = PTR_ERR(channel->dpcon);
> > goto err_setup;
> > +   }
> 
> Hi Ioana
> 
> You need to be careful with IS_ERR_OR_NULL(). If it is a NULL,
> PTR_ERR() is going to return 0. You then jump to the error cleanup code, but
> return 0, meaning everything is O.K.
> 

Hi Andrew,

I took a closer look at the code path, and it seems that if channel->dpcon in
the snippet above is NULL, then PTR_ERR() will indeed be 0, but the error
cleanup code (the err_setup label in this case) returns ERR_PTR(err), which
is NULL here.
Continuing along the code path, alloc_channel() then returns NULL, which is
handled by the following snippet:

+   if (IS_ERR_OR_NULL(channel)) {
+   err = PTR_ERR(channel);
+   if (err != -EPROBE_DEFER)
+   dev_info(dev,
+"No affine channel for cpu %d and above\n", i);
goto err_alloc_ch;
}
  
If channel is NULL, dev_info() is called and we jump to the cleanup code.

err_alloc_ch:
+   if (err == -EPROBE_DEFER)
+   return err;
+
if (cpumask_empty(&priv->dpio_cpumask)) {
dev_err(dev, "No cpu with an affine DPIO/DPCON\n");
return err;

Here err is 0, so if the cpumask is empty, 0 is returned, which is not
the intended behavior.
I will send a v2 that changes the return value to -ENODEV when no CPU with
an affine DPIO is found.

Thanks,
Ioana
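The pitfall Andrew points out follows from how the kernel encodes errors in pointers: `ERR_PTR()` stores a negative errno in the top 4095 values of the address space, so `PTR_ERR(NULL)` is simply 0. The user-space re-creation below mirrors the helpers in `include/linux/err.h` for illustration only. `EPROBE_DEFER` (517) is a kernel-internal errno and is defined here by hand, and `setup_from()` is a hypothetical caller showing one way to keep NULL from turning into "success".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO    4095
#define EPROBE_DEFER 517 /* kernel-internal value from include/linux/errno.h */

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True for pointers in the last MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/*
 * Hypothetical caller: a NULL result must be mapped to an explicit error,
 * because PTR_ERR(NULL) == 0 would otherwise look like success.
 */
static int setup_from(void *obj)
{
	if (IS_ERR_OR_NULL(obj))
		return obj ? (int)PTR_ERR(obj) : -ENODEV;
	return 0;
}
```

This is the same reasoning that leads to returning an explicit -ENODEV in v2 instead of propagating `err` when it may be 0.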

[PATCH net-next] udp6: cleanup stats accounting in recvmsg()

2018-11-09 Thread Paolo Abeni

In the udp6 code path, we needed multiple tests to select the correct
MIB to update. Since we touch at least one counter on each iteration,
it's convenient to use the recently introduced __UDPX_MIB() helper once
and remove some code duplication.

Signed-off-by: Paolo Abeni 
---
 net/ipv6/udp.c | 32 +++-
 1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 0c0cb1611aef..dde51fc7ac16 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -326,6 +326,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
int err;
int is_udplite = IS_UDPLITE(sk);
bool checksum_valid = false;
+   struct udp_mib *mib;
int is_udp4;
 
if (flags & MSG_ERRQUEUE)
@@ -349,6 +350,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
msg->msg_flags |= MSG_TRUNC;
 
is_udp4 = (skb->protocol == htons(ETH_P_IP));
+   mib = __UDPX_MIB(sk, is_udp4);
 
/*
 * If checksum is needed at all, try to do it while copying the
@@ -377,24 +379,13 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
if (unlikely(err)) {
if (!peeked) {
atomic_inc(&sk->sk_drops);
-   if (is_udp4)
-   UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
- is_udplite);
-   else
-   UDP6_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
-  is_udplite);
+   SNMP_INC_STATS(mib, UDP_MIB_INERRORS);
}
kfree_skb(skb);
return err;
}
-   if (!peeked) {
-   if (is_udp4)
-   UDP_INC_STATS(sock_net(sk), UDP_MIB_INDATAGRAMS,
- is_udplite);
-   else
-   UDP6_INC_STATS(sock_net(sk), UDP_MIB_INDATAGRAMS,
-  is_udplite);
-   }
+   if (!peeked)
+   SNMP_INC_STATS(mib, UDP_MIB_INDATAGRAMS);
 
sock_recv_ts_and_drops(msg, sk, skb);
 
@@ -443,17 +434,8 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 csum_copy_err:
if (!__sk_queue_drop_skb(sk, &udp_sk(sk)->reader_queue, skb, flags,
 udp_skb_destructor)) {
-   if (is_udp4) {
-   UDP_INC_STATS(sock_net(sk),
- UDP_MIB_CSUMERRORS, is_udplite);
-   UDP_INC_STATS(sock_net(sk),
- UDP_MIB_INERRORS, is_udplite);
-   } else {
-   UDP6_INC_STATS(sock_net(sk),
-  UDP_MIB_CSUMERRORS, is_udplite);
-   UDP6_INC_STATS(sock_net(sk),
-  UDP_MIB_INERRORS, is_udplite);
-   }
+   SNMP_INC_STATS(mib, UDP_MIB_CSUMERRORS);
+   SNMP_INC_STATS(mib, UDP_MIB_INERRORS);
}
kfree_skb(skb);
 
-- 
2.17.2
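The cleanup replaces a two-level branch at every counter update with a single table selection. A rough user-space model of that idea: pick the statistics table once from the two flags, then bump counters through the pointer. The struct layout, the enum and `select_mib()` are simplified stand-ins for illustration. In the kernel, `__UDPX_MIB()` chooses among the per-netns UDP, UDP-Lite, UDPv6 and UDP-Lite-v6 MIBs, and `SNMP_INC_STATS()` performs the (per-CPU) increment.

```c
#include <assert.h>
#include <stdbool.h>

enum udp_mib_field { MIB_INDATAGRAMS, MIB_INERRORS, MIB_CSUMERRORS, MIB_MAX };

struct udp_mib_model {
	unsigned long mibs[MIB_MAX];
};

/* Simplified per-namespace statistics: one table per protocol/family. */
struct netns_stats_model {
	struct udp_mib_model udp4, udplite4, udp6, udplite6;
};

/* Choose the table once, in the spirit of __UDPX_MIB(sk, is_udp4). */
static struct udp_mib_model *select_mib(struct netns_stats_model *ns,
					bool is_udp4, bool is_udplite)
{
	if (is_udp4)
		return is_udplite ? &ns->udplite4 : &ns->udp4;
	return is_udplite ? &ns->udplite6 : &ns->udp6;
}

/* Counterpart of the csum_copy_err path after the patch. */
static void account_csum_error(struct udp_mib_model *mib)
{
	mib->mibs[MIB_CSUMERRORS]++;
	mib->mibs[MIB_INERRORS]++;
}
```

After the selection, every later update site is a single increment, which is where the line count reduction in the diff comes from.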

[PATCH net-next v2 0/2] dpaa2-eth: defer probe on object allocate

2018-11-09 Thread Ioana Ciornei

Allocatable objects on the fsl-mc bus may be probed by the fsl_mc_allocator
after the first attempts of other drivers to use them. Defer the probe when
this situation happens.

Changes in v2:
  - proper handling of IS_ERR_OR_NULL

Ioana Ciornei (2):
  dpaa2-eth: defer probe on object allocate
  dpaa2-ptp: defer probe when portal allocation failed

 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 32 
 drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c |  5 +++-
 2 files changed, 26 insertions(+), 11 deletions(-)

-- 
1.9.1

[PATCH net-next v2 2/2] dpaa2-ptp: defer probe when portal allocation failed

2018-11-09 Thread Ioana Ciornei

fsl_mc_portal_allocate() can fail when the requested MC portals have
not yet been probed by the fsl_mc_allocator. In this situation, the driver
should defer the probe.

Signed-off-by: Ioana Ciornei 
---
Changes in v2:
  - none

 drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c
index 84b942b..9b150db 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-ptp.c
@@ -140,7 +140,10 @@ static int dpaa2_ptp_probe(struct fsl_mc_device *mc_dev)
 
err = fsl_mc_portal_allocate(mc_dev, 0, &mc_dev->mc_io);
if (err) {
-   dev_err(dev, "fsl_mc_portal_allocate err %d\n", err);
+   if (err == -ENXIO)
+   err = -EPROBE_DEFER;
+   else
+   dev_err(dev, "fsl_mc_portal_allocate err %d\n", err);
goto err_exit;
}
 
-- 
1.9.1

[PATCH net-next v2 1/2] dpaa2-eth: defer probe on object allocate

2018-11-09 Thread Ioana Ciornei

fsl_mc_object_allocate() can fail because, at call time, not all allocatable
objects have been probed by the fsl_mc_allocator. Defer the
dpaa2-eth probe when this happens.

Signed-off-by: Ioana Ciornei 
---
Changes in v2:
  - proper handling of IS_ERR_OR_NULL


 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 32 
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 88f7acc..bdfb13b 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -1434,8 +1434,11 @@ static struct fsl_mc_device *setup_dpcon(struct dpaa2_eth_priv *priv)
err = fsl_mc_object_allocate(to_fsl_mc_device(dev),
 FSL_MC_POOL_DPCON, &dpcon);
if (err) {
-   dev_info(dev, "Not enough DPCONs, will go on as-is\n");
-   return NULL;
+   if (err == -ENXIO)
+   err = -EPROBE_DEFER;
+   else
+   dev_info(dev, "Not enough DPCONs, will go on as-is\n");
+   return ERR_PTR(err);
}
 
err = dpcon_open(priv->mc_io, 0, dpcon->obj_desc.id, &dpcon->mc_handle);
@@ -1493,8 +1496,10 @@ static void free_dpcon(struct dpaa2_eth_priv *priv,
return NULL;
 
channel->dpcon = setup_dpcon(priv);
-   if (!channel->dpcon)
+   if (IS_ERR_OR_NULL(channel->dpcon)) {
+   err = PTR_ERR(channel->dpcon);
goto err_setup;
+   }
 
err = dpcon_get_attributes(priv->mc_io, 0, channel->dpcon->mc_handle,
   &attr);
@@ -1513,7 +1518,7 @@ static void free_dpcon(struct dpaa2_eth_priv *priv,
free_dpcon(priv, channel->dpcon);
 err_setup:
kfree(channel);
-   return NULL;
+   return ERR_PTR(err);
 }
 
 static void free_channel(struct dpaa2_eth_priv *priv,
@@ -1547,10 +1552,11 @@ static int setup_dpio(struct dpaa2_eth_priv *priv)
for_each_online_cpu(i) {
/* Try to allocate a channel */
channel = alloc_channel(priv);
-   if (!channel) {
-   dev_info(dev,
-"No affine channel for cpu %d and above\n", i);
-   err = -ENODEV;
+   if (IS_ERR_OR_NULL(channel)) {
+   err = PTR_ERR(channel);
+   if (err != -EPROBE_DEFER)
+   dev_info(dev,
+"No affine channel for cpu %d and above\n", i);
goto err_alloc_ch;
}
 
@@ -1608,9 +1614,12 @@ static int setup_dpio(struct dpaa2_eth_priv *priv)
 err_service_reg:
free_channel(priv, channel);
 err_alloc_ch:
+   if (err == -EPROBE_DEFER)
+   return err;
+
if (cpumask_empty(&priv->dpio_cpumask)) {
dev_err(dev, "No cpu with an affine DPIO/DPCON\n");
-   return err;
+   return -ENODEV;
}
 
dev_info(dev, "Cores %*pbl available for processing ingress traffic\n",
@@ -1732,7 +1741,10 @@ static int setup_dpbp(struct dpaa2_eth_priv *priv)
err = fsl_mc_object_allocate(to_fsl_mc_device(dev), FSL_MC_POOL_DPBP,
 &dpbp_dev);
if (err) {
-   dev_err(dev, "DPBP device allocation failed\n");
+   if (err == -ENXIO)
+   err = -EPROBE_DEFER;
+   else
+   dev_err(dev, "DPBP device allocation failed\n");
return err;
}
 
-- 
1.9.1
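The series maps -ENXIO from the fsl-mc allocator to -EPROBE_DEFER, so the driver core retries the probe later instead of failing for good. The loop below is a rough user-space model of that retry behaviour. `allocator_ready`, `fake_object_allocate()`, `fake_probe()` and `run_deferred_probes()` are hypothetical, and the real driver core re-triggers deferred probes when other drivers bind rather than in a fixed loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define EPROBE_DEFER 517 /* kernel-internal errno value */

/* Hypothetical state: has the allocator probed the objects yet? */
static bool allocator_ready;

/* Stand-in for fsl_mc_object_allocate(): -ENXIO until objects exist. */
static int fake_object_allocate(void)
{
	return allocator_ready ? 0 : -ENXIO;
}

/* Stand-in for a probe routine using the patch's error mapping. */
static int fake_probe(void)
{
	int err = fake_object_allocate();

	if (err == -ENXIO)
		err = -EPROBE_DEFER; /* not fatal: ask to be retried later */
	return err;
}

/*
 * Retry a deferred probe after each "another driver bound" event.
 * Returns the event index at which the probe succeeded, or -1.
 */
static int run_deferred_probes(int allocator_binds_at, int max_events)
{
	for (int event = 0; event < max_events; event++) {
		allocator_ready = event >= allocator_binds_at;
		int err = fake_probe();

		if (err == 0)
			return event;
		if (err != -EPROBE_DEFER)
			return -1; /* real failure: give up */
	}
	return -1;
}
```

Only -ENXIO is translated, so genuine allocation failures still surface immediately instead of being retried forever.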

Re: [PATCH v4 bpf-next 5/7] bpftool: add loadall command

2018-11-09 Thread Stanislav Fomichev

On 11/09, Quentin Monnet wrote:
> 2018-11-08 16:22 UTC-0800 ~ Stanislav Fomichev 
> > From: Stanislav Fomichev 
> > 
> > This patch adds a new *loadall* command which slightly differs from the
> > existing *load*. The *load* command loads all programs from the obj file,
> > but pins only the first program. *loadall* pins all programs from the
> > obj file under the specified directory.
> > 
> > The intended use case is flow_dissector, where we want to load a bunch
> > of progs, pin them all and after that construct a jump table.
> > 
> > Signed-off-by: Stanislav Fomichev 
> > ---
> >  .../bpftool/Documentation/bpftool-prog.rst| 14 +++-
> >  tools/bpf/bpftool/bash-completion/bpftool |  4 +-
> >  tools/bpf/bpftool/common.c| 31 
> >  tools/bpf/bpftool/main.h  |  1 +
> >  tools/bpf/bpftool/prog.c  | 74 ++-
> >  5 files changed, 82 insertions(+), 42 deletions(-)
> > 
> > diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> > index ac4e904b10fb..d943d9b67a1d 100644
> > --- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> > +++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
> 
> > @@ -24,7 +25,7 @@ MAP COMMANDS
> >  |  **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** | **visual**}]
> >  |  **bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
> >  |  **bpftool** **prog pin** *PROG* *FILE*
> > -|  **bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> > +|  **bpftool** **prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> >  |   **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
> >  |   **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
> >  |  **bpftool** **prog help**
> > @@ -79,8 +80,13 @@ DESCRIPTION
> >   contain a dot character ('.'), which is reserved for future
> >   extensions of *bpffs*.
> >  
> > -   **bpftool prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> > - Load bpf program from binary *OBJ* and pin as *FILE*.
> > +   **bpftool prog { load | loadall }** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
> > + Load bpf program(s) from binary *OBJ* and pin as *FILE*.
> > + Both **bpftool prog load** and **bpftool prog loadall** load
> > + all maps and programs from the *OBJ* and differ only in
> > + pinning. **load** pins only the first program from the *OBJ*
> > + as *FILE*. **loadall** pins all programs from the *OBJ*
> > + under *FILE* directory.
> >   **type** is optional, if not specified program type will be
> >   inferred from section names.
> >   By default bpftool will create new maps as declared in the ELF
> 
> Thanks a lot for all the changes! The series looks really good to me
> now. The last nit I might have is that we could maybe replace "FILE"
> with "PATH" (as it can now be a directory), in the doc and below. No need
> to respin just for this, though.
Agreed, makes sense, will do another respin to address Jakub's comments
anyway. Thanks for a review!

> > @@ -1035,7 +1067,8 @@ static int do_help(int argc, char **argv)
> > "   %s %s dump xlated PROG [{ file FILE | opcodes | visual }]\n"
> > "   %s %s dump jited  PROG [{ file FILE | opcodes }]\n"
> > "   %s %s pin   PROG FILE\n"
> > -   "   %s %s load  OBJ  FILE [type TYPE] [dev NAME] \\\n"
> > +   "   %s %s { load | loadall } OBJ  FILE \\\n"
> > +   " [type TYPE] [dev NAME] \\\n"
> > " [map { idx IDX | name NAME } MAP]\n"
> > "   %s %s attach PROG ATTACH_TYPE MAP\n"
> > "   %s %s detach PROG ATTACH_TYPE MAP\n"

Re: [PATCH v4 bpf-next 2/7] libbpf: cleanup after partial failure in bpf_object__pin

2018-11-09 Thread Stanislav Fomichev

On 11/08, Jakub Kicinski wrote:
> On Thu,  8 Nov 2018 16:22:08 -0800, Stanislav Fomichev wrote:
> > +   for (map = bpf_map__prev(map, obj);
> > +map != NULL;
> > +map = bpf_map__prev(map, obj)) {
> 
> nit pick: if you need to respin, all these for loops on error paths could
> have been more concise while loops
Agreed with everything, will address this one and the other comments in v5.
Thank you for another thorough review!
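Jakub's nit concerns the unwind path after a partial failure: once pinning item *i* fails, everything pinned before it has to be unpinned in reverse order. Below is a generic sketch of that pattern using the more concise `while` loop he suggests. `pin_one()`, `unpin_one()`, the `pinned[]` array and `fail_at` are hypothetical stand-ins for libbpf's per-map pin and unpin calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NITEMS 5

static bool pinned[NITEMS];
static size_t fail_at = NITEMS; /* index whose pin fails; NITEMS = never */

/* Hypothetical pin/unpin operations. */
static int pin_one(size_t i)
{
	if (i == fail_at)
		return -1;
	pinned[i] = true;
	return 0;
}

static void unpin_one(size_t i)
{
	pinned[i] = false;
}

/* Pin all items; on failure, unpin the ones already pinned, newest first. */
static int pin_all(void)
{
	size_t i;
	int err;

	for (i = 0; i < NITEMS; i++) {
		err = pin_one(i);
		if (err)
			goto err_unpin;
	}
	return 0;

err_unpin:
	while (i--) /* visits i-1, ..., 0 and skips the failed item */
		unpin_one(i);
	return err;
}
```

The `while (i--)` form avoids repeating the iterator expression three times, as the `bpf_map__prev()` for loop does.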

Re: [iproute2 PATCH v2] tc: flower: Classify packets based port ranges

2018-11-09 Thread Jiri Pirko

Wed, Nov 07, 2018 at 10:22:50PM CET, amritha.namb...@intel.com wrote:
>Added support for filtering based on port ranges.
>
>Example:
>1. Match on a port range:
>-
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>  action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0
>filter protocol ip pref 1 flower chain 0 handle 0x1
>  eth_type ipv4
>  ip_proto tcp
>  dst_port range 20-30
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>Action statistics:
>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>2. Match on IP address and port range:
>--
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>  skip_hw action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0 handle 0x2
>  eth_type ipv4
>  ip_proto tcp
>  dst_ip 192.168.1.1
>  dst_port range 100-200
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>Action statistics:
>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>v2:
>Addressed Jiri's comment to sync output format with input
>
>Signed-off-by: Amritha Nambiar 
>---
> include/uapi/linux/pkt_cls.h |7 ++
> tc/f_flower.c|  145 +++---
> 2 files changed, 142 insertions(+), 10 deletions(-)
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 401d0c1..b63c3cf 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -405,6 +405,11 @@ enum {
>   TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>   TCA_FLOWER_KEY_UDP_DST, /* be16 */
> 
>+  TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>+
>   TCA_FLOWER_FLAGS,
>   TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>   TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>@@ -518,6 +523,8 @@ enum {
>   TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
> };
> 
>+#define TCA_FLOWER_MASK_FLAGS_RANGE   (1 << 0) /* Range-based match */
>+
> /* Match-all classifier */
> 
> enum {
>diff --git a/tc/f_flower.c b/tc/f_flower.c
>index 65fca04..7724a1d 100644
>--- a/tc/f_flower.c
>+++ b/tc/f_flower.c
>@@ -494,6 +494,66 @@ static int flower_parse_port(char *str, __u8 ip_proto,
>   return 0;
> }
> 
>+static int flower_port_range_attr_type(__u8 ip_proto, enum flower_endpoint type,
>+ __be16 *min_port_type,
>+ __be16 *max_port_type)
>+{
>+  if (ip_proto == IPPROTO_TCP || ip_proto == IPPROTO_UDP ||
>+  ip_proto == IPPROTO_SCTP) {
>+  if (type == FLOWER_ENDPOINT_SRC) {
>+  *min_port_type = TCA_FLOWER_KEY_PORT_SRC_MIN;
>+  *max_port_type = TCA_FLOWER_KEY_PORT_SRC_MAX;
>+  } else {
>+  *min_port_type = TCA_FLOWER_KEY_PORT_DST_MIN;
>+  *max_port_type = TCA_FLOWER_KEY_PORT_DST_MAX;
>+  }
>+  } else {
>+  return -1;
>+  }
>+
>+  return 0;
>+}
>+
>+static int flower_parse_port_range(__be16 *min, __be16 *max, __u8 ip_proto,
>+ enum flower_endpoint endpoint,
>+ struct nlmsghdr *n)
>+{
>+  __be16 min_port_type, max_port_type;
>+
>+  flower_port_range_attr_type(ip_proto, endpoint, &min_port_type,
>+  &max_port_type);
>+  addattr16(n, MAX_MSG, min_port_type, *min);
>+  addattr16(n, MAX_MSG, max_port_type, *max);
>+
>+  return 0;
>+}
>+
>+static int get_range(__be16 *min, __be16 *max, char *argv)
>+{
>+  char *r;
>+
>+  r = strchr(argv, '-');
>+  if (r) {
>+  *r = '\0';
>+  if (get_be16(min, argv, 10)) {
>+  fprintf(stderr, "invalid min range\n");
>+  return -1;
>+  }
>+  if (get_be16(max, r + 1, 10)) {
>+  fprintf(stderr, "invalid max range\n");
>+  return -1;
>+  }
>+  if (htons(*max) <= htons(*min)) {
>+  fprintf(stderr, "max value should be greater than min value\n");
>+  return -1;
>+  }
>+  } else {
>+  fprintf(stderr, "Illegal range format\n");
>+  return -1;
>+  }
>+  return 0;
>+}
>+
> #define TCP_FLAGS_MAX_MASK
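The `get_range()` helper in this patch splits `MIN-MAX` on the dash and rejects ranges where max is not above min. The sketch below performs the same validation on host-order values parsed with `strtoul()`, which avoids byte-order conversions altogether. (In the patch, converting the parsed `__be16` values back to host order would more conventionally use `ntohs()`, although `htons()` performs the same swap.) `parse_port()` and `parse_port_range()` are hypothetical names, not iproute2's API, and unlike the patch they do not modify the input string.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse one decimal port number; *end points just past it. -1 on error. */
static int parse_port(const char *s, char **end, uint16_t *port)
{
	unsigned long v;

	if (*s < '0' || *s > '9')
		return -1; /* strtoul() alone would accept signs and spaces */
	errno = 0;
	v = strtoul(s, end, 10);
	if (errno || v > UINT16_MAX)
		return -1;
	*port = (uint16_t)v;
	return 0;
}

/* Parse "MIN-MAX" into host-order ports; requires MIN < MAX. */
static int parse_port_range(const char *arg, uint16_t *min, uint16_t *max)
{
	char *end;

	if (parse_port(arg, &end, min) || *end != '-')
		return -1;
	if (parse_port(end + 1, &end, max) || *end != '\0')
		return -1;
	return *max > *min ? 0 : -1;
}
```

Usage: `parse_port_range("20-30", &lo, &hi)` succeeds with `lo == 20` and `hi == 30`, while `"30-20"`, `"20"` and `"20-70000"` are rejected.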

[PATCH][net-next] net: tcp: remove BUG_ON from tcp_v4_err

2018-11-09 Thread Li RongQing

If skb is a NULL pointer, the following access to its skb_mstamp_ns
will trigger a panic anyway, which has the same effect as the BUG_ON.

Signed-off-by: Li RongQing 
---
 net/ipv4/tcp_ipv4.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a336787d75e5..5424a4077c27 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -542,7 +542,6 @@ int tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
icsk->icsk_rto = inet_csk_rto_backoff(icsk, TCP_RTO_MAX);
 
skb = tcp_rtx_queue_head(sk);
-   BUG_ON(!skb);
 
tcp_mstamp_refresh(tp);
delta_us = (u32)(tp->tcp_mstamp - tcp_skb_timestamp_us(skb));
-- 
2.16.2

Re: [PATCH bpf-next 0/2] bpf: offer maximum packet offset info

2018-11-09 Thread Daniel Borkmann

On 11/08/2018 10:08 AM, Jiong Wang wrote:
> The maximum packet offset accessed by one BPF program is useful
> information.
> 
> Packets may sometimes be split, and for some reasons (for example
> performance) we may want to reject a BPF program if the maximum packet
> size would trigger such a split. Normally the MTU is treated as the
> maximum packet size, but a BPF program does not always access the whole
> packet; it may only access the head portion of the data.
> 
> We could let the verifier calculate the maximum packet offset ever used
> and record it in the program's auxiliary information structure as a new
> field, "max_pkt_offset".
> 
> Jiong Wang (2):
>   bpf: let verifier to calculate and record max_pkt_offset
>   nfp: bpf: relax prog rejection through max_pkt_offset
> 
>  drivers/net/ethernet/netronome/nfp/bpf/offload.c |  9 +
>  include/linux/bpf.h  |  1 +
>  kernel/bpf/verifier.c| 12 
>  3 files changed, 18 insertions(+), 4 deletions(-)
> 

Applied to bpf-next, thanks!
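The cover letter's idea, recording the furthest packet byte a program can touch and letting the offload driver compare it against a split threshold, can be modelled in a few lines. This is a simplified sketch, not the verifier's actual bookkeeping: each access is described by a hypothetical constant `(offset, size)` pair, whereas the verifier also has to account for variable offsets, and `max_pkt_offset()` and `prog_fits()` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One direct packet access: 'size' bytes starting at 'offset'. */
struct pkt_access {
	uint32_t offset;
	uint32_t size;
};

/* Highest byte offset touched by any access (0 if there are none). */
static uint32_t max_pkt_offset(const struct pkt_access *acc, size_t n)
{
	uint32_t max = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t last;

		if (acc[i].size == 0)
			continue; /* zero-length accesses touch nothing */
		last = acc[i].offset + acc[i].size - 1;
		if (last > max)
			max = last;
	}
	return max;
}

/* Accept the program only if every access stays below the split point. */
static int prog_fits(const struct pkt_access *acc, size_t n, uint32_t split_at)
{
	return max_pkt_offset(acc, n) < split_at;
}
```

For accesses covering bytes 0-13, 14-33 and 34-37, the recorded maximum is 37, so such a program would be accepted with a 64-byte split point even if the MTU is much larger.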

Re: [net-next PATCH v2] net: sched: cls_flower: Classify packets using port ranges

2018-11-09 Thread Jiri Pirko

Wed, Nov 07, 2018 at 10:22:42PM CET, amritha.namb...@intel.com wrote:
>Added support in tc flower for filtering based on port ranges.
>
>Example:
>1. Match on a port range:
>-
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower ip_proto tcp dst_port range 20-30 skip_hw\
>  action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0
>filter protocol ip pref 1 flower chain 0 handle 0x1
>  eth_type ipv4
>  ip_proto tcp
>  dst_port range 20-30
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 1 ref 1 bind 1 installed 85 sec used 3 sec
>Action statistics:
>Sent 460 bytes 10 pkt (dropped 10, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>2. Match on IP address and port range:
>--
>$ tc filter add dev enp4s0 protocol ip parent :\
>  prio 1 flower dst_ip 192.168.1.1 ip_proto tcp dst_port range 100-200\
>  skip_hw action drop
>
>$ tc -s filter show dev enp4s0 parent :
>filter protocol ip pref 1 flower chain 0 handle 0x2
>  eth_type ipv4
>  ip_proto tcp
>  dst_ip 192.168.1.1
>  dst_port range 100-200
>  skip_hw
>  not_in_hw
>action order 1: gact action drop
> random type none pass val 0
> index 2 ref 1 bind 1 installed 58 sec used 2 sec
>Action statistics:
>Sent 920 bytes 20 pkt (dropped 20, overlimits 0 requeues 0)
>backlog 0b 0p requeues 0
>
>v2:
>Addressed Jiri's comments:
>1. Added separate functions for dst and src comparisons.
>2. Removed endpoint enum.
>3. Added new bit TCA_FLOWER_FLAGS_RANGE to decide normal/range
>  lookup.
>4. Cleaned up fl_lookup function.
>
>Signed-off-by: Amritha Nambiar 
>---
> include/uapi/linux/pkt_cls.h |7 ++
> net/sched/cls_flower.c   |  133 --
> 2 files changed, 134 insertions(+), 6 deletions(-)
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 401d0c1..b63c3cf 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -405,6 +405,11 @@ enum {
>   TCA_FLOWER_KEY_UDP_SRC, /* be16 */
>   TCA_FLOWER_KEY_UDP_DST, /* be16 */
> 
>+  TCA_FLOWER_KEY_PORT_SRC_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_SRC_MAX,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MIN,/* be16 */
>+  TCA_FLOWER_KEY_PORT_DST_MAX,/* be16 */
>+

Please put it at the end of the enum, as David mentioned.


>   TCA_FLOWER_FLAGS,
>   TCA_FLOWER_KEY_VLAN_ID, /* be16 */
>   TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
>@@ -518,6 +523,8 @@ enum {
>   TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST = (1 << 1),
> };
> 
>+#define TCA_FLOWER_MASK_FLAGS_RANGE   (1 << 0) /* Range-based match */
>+
> /* Match-all classifier */
> 
> enum {
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index 9aada2d..9d2582d 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -55,6 +55,9 @@ struct fl_flow_key {
>   struct flow_dissector_key_ip ip;
>   struct flow_dissector_key_ip enc_ip;
>   struct flow_dissector_key_enc_opts enc_opts;
>+

No need for an empty line.


>+  struct flow_dissector_key_ports tp_min;
>+  struct flow_dissector_key_ports tp_max;
> } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
> 
> struct fl_flow_mask_range {
>@@ -65,6 +68,7 @@ struct fl_flow_mask_range {
> struct fl_flow_mask {
>   struct fl_flow_key key;
>   struct fl_flow_mask_range range;
>+  u32 flags;
>   struct rhash_head ht_node;
>   struct rhashtable ht;
>   struct rhashtable_params filter_ht_params;
>@@ -179,13 +183,89 @@ static void fl_clear_masked_range(struct fl_flow_key *key,
>   memset(fl_key_get_start(key, mask), 0, fl_mask_range(mask));
> }
> 
>-static struct cls_fl_filter *fl_lookup(struct fl_flow_mask *mask,
>- struct fl_flow_key *mkey)
>+static bool fl_range_port_dst_cmp(struct cls_fl_filter *filter,
>+struct fl_flow_key *key,
>+struct fl_flow_key *mkey)
>+{
>+  __be16 min_mask, max_mask, min_val, max_val;
>+
>+  min_mask = htons(filter->mask->key.tp_min.dst);
>+  max_mask = htons(filter->mask->key.tp_max.dst);
>+  min_val = htons(filter->key.tp_min.dst);
>+  max_val = htons(filter->key.tp_max.dst);
>+
>+  if (min_mask && max_mask) {
>+  if (htons(key->tp.dst) < min_val ||
>+  htons(key->tp.dst) > max_val)
>+  return false;
>+
>+  /* skb does not have min and max values */
>+  mkey->tp_min.dst = filter->mkey.tp_min.dst;
>+  mkey->tp_max.dst = filter->mkey.tp_max.dst;
>+  }
>+  return true;
>+}
>+
>+static bool fl_range_port_src_cmp(struct cls_fl_filter *filter,
>+

bring back IPX and NCPFS, please!

2018-11-09 Thread Johannes C. Schulz

Hello all!

I would like to ask you to bring back the IPX and NCPFS modules to the kernel.
For whatever reason, my admins use Novell shares on our network, which I am
no longer able to use because my kernel (4.18) no longer supports ncpfs. I'm
forced to use CIFS instead (and the admins will remove the CIFS shares at
some point).
Maybe there are not enough of us at my work to bring these modules back just
for us, but maybe there are other people out there who need them.
Thank you.

Re: [PATCH][RFC] udp: cache sock to avoid searching it twice

2018-11-09 Thread Paolo Abeni

Hi,

Adding Willem; I think he may be interested.

On Fri, 2018-11-09 at 14:21 +0800, Li RongQing wrote:
> GRO for UDP needs to look up the socket twice: first in GRO receive,
> then in GRO complete. Storing the sock in the skb avoids the second
> lookup, which can give a small performance boost.
> 
> netperf -t UDP_RR -l 10
> 
> Before:
>   Rate per sec: 28746.01
> After:
>   Rate per sec: 29401.67
> 
> Signed-off-by: Li RongQing 
> ---
>  net/ipv4/udp_offload.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 0646d61f4fa8..429570112a33 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -408,6 +408,11 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
>  
>   if (udp_sk(sk)->gro_enabled) {
>   pp = call_gro_receive(udp_gro_receive_segment, head, skb);
> +
> + if (!IS_ERR(pp) && NAPI_GRO_CB(pp)->count > 1) {
> + sock_hold(sk);
> + pp->sk = sk;
> + }
>   rcu_read_unlock();
>   return pp;
>   }

What if 'pp' is NULL?

Aside from that, this replaces a lookup with 2 atomic ops, and only when
such a lookup is amortized over multiple aggregated packets: I'm unsure if
it's worth it, and I don't understand how that improves RR tests (where
the socket can't see multiple, consecutive skbs, AFAIK).

Cheers,

Paolo

Re: [PATCH iproute2-next] devlink: Add missing region option to devlink man page

2018-11-09 Thread Jiri Pirko

Thu, Nov 08, 2018 at 10:14:13AM CET, va...@mellanox.com wrote:
>The region field was not added to the devlink man page.
>
>Fixes: 8b4fbf0bed8e6 ("devlink: Add support for devlink-region access")
>Signed-off-by: Alex Vesker 

Acked-by: Jiri Pirko
