date:20170820

Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac

2017-08-20 Thread Andrew Lunn

> I think we cannot use mdio-mux-mmioreg since the register for doing
> the switch is in middle of the "System Control" and shared with
> other functions.  This is why we use a syscon/regmap for selecting
> the MDIO.

You could add a mdio-mux-regmap.c.

However, it probably needs restructuring of the stmmac mdio code, to
make the mdio bus usable as a separate driver. You need stmmac mdio
to probe first, then mdio-mux-regmap should probe, and then lastly
the stmmac mac driver.

With stmmac mdio and stmmac mac being in one driver, there is no time
in the middle to allow the mux driver to probe.

It is some effort, but a nice cleanup and generalization.

   Andrew

[PATCH] ieee802154: ca8210: Fix a potential NULL pointer dereference

2017-08-20 Thread Christophe JAILLET

'spi' is known to be NULL, so we dereference a NULL pointer here.
Use 'pr_crit()' instead of 'dev_crit()' to report the message.

Signed-off-by: Christophe JAILLET 
---
 drivers/net/ieee802154/ca8210.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index 326243fae7e2..24a1eabbbc9d 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -917,10 +917,7 @@ static int ca8210_spi_transfer(
struct cas_control *cas_ctl;
 
if (!spi) {
-   dev_crit(
-   &spi->dev,
-   "NULL spi device passed to ca8210_spi_transfer\n"
-   );
+   pr_crit("NULL spi device passed to %s\n", __func__);
return -ENODEV;
}
 
-- 
2.11.0
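
To illustrate the fix above: once a pointer is known to be NULL, the diagnostic must not be routed through that pointer. The following user-space sketch mirrors the pattern. `demo_spi_device` and `demo_spi_transfer` are hypothetical stand-ins, not the ca8210 driver's types, and `fprintf(stderr, ...)` only plays the role that `pr_crit()` plays in the kernel.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a device structure; not the kernel's struct spi_device. */
struct demo_spi_device {
	const char *name;
};

/*
 * Returns -ENODEV when no device is passed. The message is printed without
 * touching 'spi', because dereferencing it on this path would be a NULL
 * pointer dereference.
 */
int demo_spi_transfer(const struct demo_spi_device *spi)
{
	if (!spi) {
		fprintf(stderr, "NULL spi device passed to %s\n", __func__);
		return -ENODEV;
	}

	/* Safe: 'spi' is known to be non-NULL from here on. */
	printf("transfer on %s\n", spi->name);
	return 0;
}
```

Using `__func__` in the message, as the patch does, also keeps the text correct if the function is ever renamed.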

Re: [PATCH] ieee802154: ca8210: Fix a potential NULL pointer dereference

2017-08-20 Thread Marcel Holtmann

Hi Christophe,

> 'spi' is known to be NULL, so we dereference a NULL pointer here.
> Use 'pr_crit()' instead of 'dev_crit()' to report the message.
> 
> Signed-off-by: Christophe JAILLET 
> ---
> drivers/net/ieee802154/ca8210.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel

RE: [PATCH net-next 4/4] mlx4: sizeof style usage

2017-08-20 Thread Stephen Hemminger

Yes, good catch.

-----Original Message-----
From: Tariq Toukan [mailto:tar...@mellanox.com] 
Sent: Sunday, August 20, 2017 3:27 AM
To: Stephen Hemminger ; mlind...@marvell.com; 
m...@redhat.com; jasow...@redhat.com
Cc: netdev@vger.kernel.org; linux-r...@vger.kernel.org; 
virtualizat...@lists.linux-foundation.org; Stephen Hemminger 

Subject: Re: [PATCH net-next 4/4] mlx4: sizeof style usage

Thanks Stephen.
Sorry for the late reply, I was on vacation.
I know this is already accepted, but still I have one comment.

On 15/08/2017 8:29 PM, Stephen Hemminger wrote:
> The kernel coding style is to treat sizeof as a function
> (ie. with parentheses) not as an operator.
>
> Also use kcalloc and kmalloc_array
>
> Signed-off-by: Stephen Hemminger 
> ---
> @@ -726,7 +726,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct 
> mlx4_eq *eq)
>   }
>   memcpy(&priv->mfunc.master.comm_arm_bit_vector,
>  eqe->event.comm_channel_arm.bit_vec,
> -sizeof eqe->event.comm_channel_arm.bit_vec);
> +sizeof(eqe)->event.comm_channel_arm.bit_vec);

I think the brackets here are misplaced.
Shouldn't they be as follows?

sizeof(eqe->event.comm_channel_arm.bit_vec));

>   queue_work(priv->mfunc.master.comm_wq,
>  &priv->mfunc.master.comm_work);
>   break;

Thanks,
Tariq
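
A side note on the expression discussed above, offered as a sketch based on the C grammar rather than on anything stated in the thread: because `eqe` is a variable and not a type name, `sizeof(eqe)->event...` is parsed as `sizeof` applied to the whole expression `(eqe)->event...`. Both spellings should therefore yield the same size, and the objection is about readability rather than behaviour. The structures below are hypothetical miniatures, not the mlx4 definitions.

```c
#include <stddef.h>

/* Hypothetical miniature of the event structure; field names are illustrative. */
struct demo_event {
	unsigned int bit_vec[8];
};

struct demo_eqe {
	struct demo_event event;
};

/* As written in the patch: parentheses around the pointer name only. */
size_t demo_size_as_patched(const struct demo_eqe *eqe)
{
	return sizeof(eqe)->event.bit_vec;
}

/* As suggested in the review: parentheses around the whole operand. */
size_t demo_size_as_suggested(const struct demo_eqe *eqe)
{
	return sizeof(eqe->event.bit_vec);
}
```

Since `sizeof` does not evaluate its operand here, neither function dereferences the pointer it is given. The suggested form is still the one that matches the "sizeof as a function" style the patch is trying to enforce.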

Re: [PATCH V2 net-next] net: hns3: Add support to change MTU in HNS3 hardware

2017-08-20 Thread Leon Romanovsky

On Sun, Aug 20, 2017 at 02:35:58PM +0000, Salil Mehta wrote:
> Hi Leon
>
> > -----Original Message-----
> > From: Leon Romanovsky [mailto:l...@kernel.org]
> > Sent: Sunday, August 20, 2017 8:05 AM
> > To: Salil Mehta
> > Cc: da...@davemloft.net; Zhuangyuzeng (Yisen); lipeng (Y);
> > mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> > ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> > Subject: Re: [PATCH V2 net-next] net: hns3: Add support to change MTU
> > in HNS3 hardware
> >
> > On Fri, Aug 18, 2017 at 05:57:59PM +0100, Salil Mehta wrote:
> > > This patch adds the following support to the HNS3 driver:
> > > 1. Support to change the Maximum Transmission Unit of a
> > >of a port in the HNS NIC hardware .
> >
> > Extra space before dot.
> Sure.
>
> >
> > > 2. Initializes the supported MTU range for the netdevice.
> > >
> > > Signed-off-by: lipeng 
> >
> > Does "lipeng" have name and surname?
> Yes, Lipeng's first name is 'Peng' and Surname is 'Li'
> But it is usually spelled as 'Lipeng' in one go when referring.
> This is quite usual convention with full names originating from
> China. Surnames comes first, followed by the first name and they
> both are inseparable while they are written as well. Therefore,
> his sign-off's appear like above.

Thank you for the explanation.

>
> Thanks
> Salil
> >
> > > Signed-off-by: Salil Mehta 
> > > ---
> > > PATCH V2: Addresses comments given by Andrew Lunn
> > >   1. https://lkml.org/lkml/2017/8/18/282
> > > PATCH V1: Initial Submit
> > > ---
> > >  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 38
> > ++
> > >  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h |  1 +
> > >  2 files changed, 39 insertions(+)
> > >
> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > > index e731f87..d905ea1 100644
> > > --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > > @@ -1278,11 +1278,46 @@ static int hns3_ndo_set_vf_vlan(struct
> > net_device *netdev, int vf, u16 vlan,
> > >   return ret;
> > >  }
> > >
> > > +static int hns3_nic_change_mtu(struct net_device *netdev, int
> > new_mtu)
> > > +{
> > > + struct hns3_nic_priv *priv = netdev_priv(netdev);
> > > + struct hnae3_handle *h = priv->ae_handle;
> > > + bool if_running = netif_running(netdev);
> > > + int ret;
> > > +
> > > + if (!h->ae_algo->ops->set_mtu)
> > > + return -ENOTSUPP;
> > > +
> > > + /* if this was called with netdev up then bring netdevice down */
> > > + if (if_running) {
> > > + (void)hns3_nic_net_stop(netdev);
> > > + msleep(100);
> > > + }
> > > +
> > > + ret = h->ae_algo->ops->set_mtu(h, new_mtu);
> > > + if (ret) {
> > > + netdev_err(netdev, "failed to change MTU in hardware %d\n",
> > > +ret);
> > > + return ret;
> > > + }
> > > +
> > > + /* if the netdev was running earlier, bring it up again */
> > > + if (if_running) {
> > > + if (hns3_nic_net_open(netdev)) {
> > > + netdev_err(netdev, "MTU, couldnt up netdev again\n");
> >
> > "couldnt" -> "couldn't"
> >
> > and you don't actually need this print.
> > If the function hns3_nic_net_open fails, you will print this error
> > there.
> Right. Will remove this print.
>
> Thanks
> Salil
> >
> > > + ret = -EINVAL;
> > > + }
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > >  static const struct net_device_ops hns3_nic_netdev_ops = {
> > >   .ndo_open   = hns3_nic_net_open,
> > >   .ndo_stop   = hns3_nic_net_stop,
> > >   .ndo_start_xmit = hns3_nic_net_xmit,
> > >   .ndo_set_mac_address= hns3_nic_net_set_mac_address,
> > > + .ndo_change_mtu = hns3_nic_change_mtu,
> > >   .ndo_set_features   = hns3_nic_set_features,
> > >   .ndo_get_stats64= hns3_nic_get_stats64,
> > >   .ndo_setup_tc   = hns3_nic_setup_tc,
> > > @@ -2752,6 +2787,9 @@ static int hns3_client_init(struct hnae3_handle
> > *handle)
> > >   goto out_reg_netdev_fail;
> > >   }
> > >
> > > + /* MTU range: (ETH_MIN_MTU(kernel default) - 9706) */
> > > + netdev->max_mtu = HNS3_MAX_MTU - (ETH_HLEN + ETH_FCS_LEN +
> > VLAN_HLEN);
> > > +
> > >   return ret;
> > >
> > >  out_reg_netdev_fail:
> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> > b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> > > index a6e8f15..7e87461 100644
> > > --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> > > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> > > @@ -76,6 +76,7 @@ enum hns3_nic_state {
> > >  #define HNS3_RING_NAME_LEN   16
> > >  #define HNS3_BUFFER_SIZE_20482048
> > >  #define HNS3_RING_MAX_PENDING32768
> > > +#define HNS3_MAX_MTU
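
The max_mtu computation in the quoted hunk can be checked with a small worked example. The header lengths are the standard Linux values for `ETH_HLEN`, `ETH_FCS_LEN` and `VLAN_HLEN`. The value of `HNS3_MAX_MTU` is cut off in this copy of the message, so the 9728 used in the usage note is an assumption, inferred from the 9706 stated in the patch comment. `demo_max_mtu` is a hypothetical helper.

```c
/* Header lengths as defined by the Linux kernel headers. */
#define DEMO_ETH_HLEN     14  /* Ethernet header */
#define DEMO_ETH_FCS_LEN   4  /* frame check sequence */
#define DEMO_VLAN_HLEN     4  /* one 802.1Q tag */

/* Largest MTU that fits in a hardware frame limit of max_frame bytes. */
int demo_max_mtu(int max_frame)
{
	return max_frame - (DEMO_ETH_HLEN + DEMO_ETH_FCS_LEN + DEMO_VLAN_HLEN);
}
```

With an assumed frame limit of 9728 bytes, `demo_max_mtu(9728)` gives 9706, which agrees with the "MTU range" comment in the patch.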

[V2 net-next 01/15] net/mlx5e: Send PAOS command on interface up/down

2017-08-20 Thread Saeed Mahameed

From: Eran Ben Elisha 

Upon interface up/down, driver will send PAOS (Ports Administrative and
Operational Status Register) in order to inform the Firmware on the
desired status of the port by the driver.

Since now we might change physical link status on mlx5e_open/close,
logical VF representor should not use mlx5e_open/close ndos as is, and
should call the logical version mlx5e_open/close_locked.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  7 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 20 +---
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2fc3832bc2f3..7c512a4c6d5c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2682,6 +2682,8 @@ int mlx5e_open(struct net_device *netdev)
 
mutex_lock(&priv->state_lock);
err = mlx5e_open_locked(netdev);
+   if (!err)
+   mlx5_set_port_admin_status(priv->mdev, MLX5_PORT_UP);
mutex_unlock(&priv->state_lock);
 
return err;
@@ -2716,6 +2718,7 @@ int mlx5e_close(struct net_device *netdev)
return -ENODEV;
 
mutex_lock(&priv->state_lock);
+   mlx5_set_port_admin_status(priv->mdev, MLX5_PORT_DOWN);
err = mlx5e_close_locked(netdev);
mutex_unlock(&priv->state_lock);
 
@@ -4187,6 +4190,10 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 
mlx5e_init_l2_addr(priv);
 
+   /* Marking the link as currently not needed by the Driver */
+   if (!netif_running(netdev))
+   mlx5_set_port_admin_status(mdev, MLX5_PORT_DOWN);
+
/* MTU range: 68 - hw-specific max */
netdev->min_mtu = ETH_MIN_MTU;
mlx5_query_port_max_mtu(priv->mdev, &max_mtu, 1);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7a9f53f74976..45c088c10ee1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -613,15 +613,18 @@ static int mlx5e_rep_open(struct net_device *dev)
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
int err;
 
-   err = mlx5e_open(dev);
+   mutex_lock(&priv->state_lock);
+   err = mlx5e_open_locked(dev);
if (err)
-   return err;
+   goto unlock;
 
-   err = mlx5_eswitch_set_vport_state(esw, rep->vport, 
MLX5_ESW_VPORT_ADMIN_STATE_UP);
-   if (!err)
+   if (!mlx5_eswitch_set_vport_state(esw, rep->vport,
+ MLX5_ESW_VPORT_ADMIN_STATE_UP))
netif_carrier_on(dev);
 
-   return 0;
+unlock:
+   mutex_unlock(&priv->state_lock);
+   return err;
 }
 
 static int mlx5e_rep_close(struct net_device *dev)
@@ -630,10 +633,13 @@ static int mlx5e_rep_close(struct net_device *dev)
struct mlx5e_rep_priv *rpriv = priv->ppriv;
struct mlx5_eswitch_rep *rep = rpriv->rep;
struct mlx5_eswitch *esw = priv->mdev->priv.eswitch;
+   int ret;
 
+   mutex_lock(&priv->state_lock);
(void)mlx5_eswitch_set_vport_state(esw, rep->vport, 
MLX5_ESW_VPORT_ADMIN_STATE_DOWN);
-
-   return mlx5e_close(dev);
+   ret = mlx5e_close_locked(dev);
+   mutex_unlock(&priv->state_lock);
+   return ret;
 }
 
 static int mlx5e_rep_get_phys_port_name(struct net_device *dev,
-- 
2.13.0
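
The split between the locked and unlocked entry points described in this patch can be sketched in plain C. Everything here is a hypothetical model: `demo_priv` and the `demo_*` functions are not the mlx5e structures or API, and a boolean flag stands in for the state mutex. The only point is that the uplink path also changes the physical port status, while the representor path reuses the logical `_locked` step and leaves the port alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver state; not the mlx5e structures. */
struct demo_priv {
	bool state_locked;   /* stands in for the state lock */
	bool opened;         /* logical netdev state */
	bool port_admin_up;  /* physical port status requested via PAOS */
};

static void demo_lock(struct demo_priv *p)
{
	assert(!p->state_locked); /* the lock is not recursive */
	p->state_locked = true;
}

static void demo_unlock(struct demo_priv *p)
{
	p->state_locked = false;
}

/* Logical open; the caller must already hold the lock. */
static int demo_open_locked(struct demo_priv *p)
{
	assert(p->state_locked);
	p->opened = true;
	return 0;
}

/* Uplink path: logical open, then request the physical port up. */
int demo_open(struct demo_priv *p)
{
	int err;

	demo_lock(p);
	err = demo_open_locked(p);
	if (!err)
		p->port_admin_up = true;
	demo_unlock(p);
	return err;
}

/* Representor path: logical open only, physical port untouched. */
int demo_rep_open(struct demo_priv *p)
{
	int err;

	demo_lock(p);
	err = demo_open_locked(p);
	demo_unlock(p);
	return err;
}
```

Calling `demo_open()` from inside `demo_rep_open()` would not fit this model, which is one way to read why the patch moves the representor to the `_locked` variant and takes the lock itself.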

[V2 net-next 02/15] net/mlx5e: IPoIB, Fix driver name retrieved by ethtool

2017-08-20 Thread Saeed Mahameed

From: Feras Daoud 

Printing an enhanced IPoIB device information using
"ethtool -i DEVNAME", prints the low level driver name: mlx5_core.
This commit changes the name to mlx5_core [ib_ipoib], to include the
ipoib device driver information.

Fixes: 076b0936e5fb ("net/mlx5e: IPoIB, Add ethtool support")
Signed-off-by: Feras Daoud 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index eb04e97d8765..b080fabfe8de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -39,6 +39,8 @@ static void mlx5i_get_drvinfo(struct net_device *dev,
struct mlx5e_priv *priv = mlx5i_epriv(dev);
 
mlx5e_ethtool_get_drvinfo(priv, drvinfo);
+   strlcpy(drvinfo->driver, DRIVER_NAME "[ib_ipoib]",
+   sizeof(drvinfo->driver));
 }
 
 static void mlx5i_get_strings(struct net_device *dev,
-- 
2.13.0

[V2 net-next 03/15] net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool

2017-08-20 Thread Saeed Mahameed

From: Shalom Lagziel 

Add support for "ethtool DEVNAME" over ipoib ports,
Display standard port information for IPoIB netdevices using ethtool
For example:
$ ethtool ib2
> Settings for ib2:
Supported ports: [ ]
Supported link modes:   Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: 10Mb/s
Duplex: Full
Port: Other
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Link detected: yes

Signed-off-by: Shalom Lagziel 
Signed-off-by: Saeed Mahameed 
---
 .../ethernet/mellanox/mlx5/core/ipoib/ethtool.c| 130 +++--
 1 file changed, 118 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index b080fabfe8de..dd49a59854e5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -131,17 +131,123 @@ static int mlx5i_flash_device(struct net_device *netdev,
return mlx5e_ethtool_flash_device(priv, flash);
 }
 
+enum mlx5_ptys_width {
+   MLX5_PTYS_WIDTH_1X  = 1 << 0,
+   MLX5_PTYS_WIDTH_2X  = 1 << 1,
+   MLX5_PTYS_WIDTH_4X  = 1 << 2,
+   MLX5_PTYS_WIDTH_8X  = 1 << 3,
+   MLX5_PTYS_WIDTH_12X = 1 << 4,
+};
+
+static inline int mlx5_ptys_width_enum_to_int(enum mlx5_ptys_width width)
+{
+   switch (width) {
+   case MLX5_PTYS_WIDTH_1X:  return  1;
+   case MLX5_PTYS_WIDTH_2X:  return  2;
+   case MLX5_PTYS_WIDTH_4X:  return  4;
+   case MLX5_PTYS_WIDTH_8X:  return  8;
+   case MLX5_PTYS_WIDTH_12X: return 12;
+   default:  return -1;
+   }
+}
+
+enum mlx5_ptys_rate {
+   MLX5_PTYS_RATE_SDR  = 1 << 0,
+   MLX5_PTYS_RATE_DDR  = 1 << 1,
+   MLX5_PTYS_RATE_QDR  = 1 << 2,
+   MLX5_PTYS_RATE_FDR10= 1 << 3,
+   MLX5_PTYS_RATE_FDR  = 1 << 4,
+   MLX5_PTYS_RATE_EDR  = 1 << 5,
+   MLX5_PTYS_RATE_HDR  = 1 << 6,
+};
+
+static inline int mlx5_ptys_rate_enum_to_int(enum mlx5_ptys_rate rate)
+{
+   switch (rate) {
+   case MLX5_PTYS_RATE_SDR:   return 2500;
+   case MLX5_PTYS_RATE_DDR:   return 5000;
+   case MLX5_PTYS_RATE_QDR:
+   case MLX5_PTYS_RATE_FDR10: return 10000;
+   case MLX5_PTYS_RATE_FDR:   return 14000;
+   case MLX5_PTYS_RATE_EDR:   return 25000;
+   case MLX5_PTYS_RATE_HDR:   return 50000;
+   default:   return -1;
+   }
+}
+
+static int mlx5i_get_port_settings(struct net_device *netdev,
+  u16 *ib_link_width_oper, u16 *ib_proto_oper)
+{
+   struct mlx5e_priv *priv= mlx5i_epriv(netdev);
+   struct mlx5_core_dev *mdev = priv->mdev;
+   u32 out[MLX5_ST_SZ_DW(ptys_reg)] = {0};
+   int ret;
+
+   ret = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_IB, 1);
+   if (ret)
+   return ret;
+
+   *ib_link_width_oper = MLX5_GET(ptys_reg, out, ib_link_width_oper);
+   *ib_proto_oper  = MLX5_GET(ptys_reg, out, ib_proto_oper);
+
+   return 0;
+}
+
+static int mlx5i_get_speed_settings(u16 ib_link_width_oper, u16 ib_proto_oper)
+{
+   int rate, width;
+
+   rate = mlx5_ptys_rate_enum_to_int(ib_proto_oper);
+   if (rate < 0)
+   return -EINVAL;
+   width = mlx5_ptys_width_enum_to_int(ib_link_width_oper);
+   if (width < 0)
+   return -EINVAL;
+
+   return rate * width;
+}
+
+static int mlx5i_get_link_ksettings(struct net_device *netdev,
+   struct ethtool_link_ksettings 
*link_ksettings)
+{
+   u16 ib_link_width_oper;
+   u16 ib_proto_oper;
+   int speed, ret;
+
+   ret = mlx5i_get_port_settings(netdev, &ib_link_width_oper, 
&ib_proto_oper);
+   if (ret)
+   return ret;
+
+   ethtool_link_ksettings_zero_link_mode(link_ksettings, supported);
+   ethtool_link_ksettings_zero_link_mode(link_ksettings, advertising);
+
+   speed = mlx5i_get_speed_settings(ib_link_width_oper, ib_proto_oper);
+   if (speed < 0)
+   return -EINVAL;
+
+   link_ksettings->base.duplex = DUPLEX_FULL;
+   link_ksettings->base.port = PORT_OTHER;
+
+   link_ksettings->base.autoneg = AUTONEG_DISABLE;
+
+   link_ksettings->base.speed = speed;
+
+   return 0;
+}
+
 const struct ethtool_ops mlx5i_ethtool_ops = {
-   .get_drvinfo   = mlx5i_get_drvinfo,
-   .get_strings   = mlx5i_get_strings,
-   .get_sset_count= mlx5i_get_sset_count,
-   .get_ethtool_stats = mlx5i_get_ethtool_stats,
-   .get_ringparam = mlx5i_get_ringparam,
-   .set_ringparam = mlx5i_set_ringparam,
-
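
The speed calculation in this patch is the per-lane rate multiplied by the lane count, with both inputs given as one-hot bit values. Below is a stand-alone sketch of that arithmetic. The `demo_*` names are local to the sketch, and the per-lane figures for QDR/FDR10 (10000) and HDR (50000) are assumptions based on nominal InfiniBand rates, since those digits appear truncated in the archived patch.

```c
#include <errno.h>

/* One-hot encodings mirroring the enums in the patch (names local to this sketch). */
enum demo_width {
	DEMO_WIDTH_1X  = 1 << 0,
	DEMO_WIDTH_2X  = 1 << 1,
	DEMO_WIDTH_4X  = 1 << 2,
	DEMO_WIDTH_8X  = 1 << 3,
	DEMO_WIDTH_12X = 1 << 4,
};

enum demo_rate {
	DEMO_RATE_SDR   = 1 << 0,
	DEMO_RATE_DDR   = 1 << 1,
	DEMO_RATE_QDR   = 1 << 2,
	DEMO_RATE_FDR10 = 1 << 3,
	DEMO_RATE_FDR   = 1 << 4,
	DEMO_RATE_EDR   = 1 << 5,
	DEMO_RATE_HDR   = 1 << 6,
};

/* Number of lanes for a one-hot width value, or -1 if it is not a single known bit. */
static int demo_width_to_lanes(unsigned int width)
{
	switch (width) {
	case DEMO_WIDTH_1X:  return 1;
	case DEMO_WIDTH_2X:  return 2;
	case DEMO_WIDTH_4X:  return 4;
	case DEMO_WIDTH_8X:  return 8;
	case DEMO_WIDTH_12X: return 12;
	default:             return -1;
	}
}

/* Per-lane rate in Mb/s, or -1 for an unknown value. QDR/FDR10 and HDR are assumed. */
static int demo_rate_to_mbps(unsigned int rate)
{
	switch (rate) {
	case DEMO_RATE_SDR:   return 2500;
	case DEMO_RATE_DDR:   return 5000;
	case DEMO_RATE_QDR:
	case DEMO_RATE_FDR10: return 10000;
	case DEMO_RATE_FDR:   return 14000;
	case DEMO_RATE_EDR:   return 25000;
	case DEMO_RATE_HDR:   return 50000;
	default:              return -1;
	}
}

/* Link speed in Mb/s, or -EINVAL if either input is not recognised. */
int demo_link_speed_mbps(unsigned int width, unsigned int rate)
{
	int lanes = demo_width_to_lanes(width);
	int mbps = demo_rate_to_mbps(rate);

	if (lanes < 0 || mbps < 0)
		return -EINVAL;
	return lanes * mbps;
}
```

For example, a 4X EDR link works out to 4 * 25000 = 100000 Mb/s. A width value with more than one bit set matches no case and is rejected.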

[V2 net-next 13/15] net/mlx5e: Place constants on the right side of comparisons

2017-08-20 Thread Saeed Mahameed

From: Or Gerlitz 

To fix these checkpatch complaints:

WARNING: Comparisons should place the constant on the right side of the test

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 8e224bcbc6a6..55a6786d3c4c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -509,8 +509,8 @@ static void mlx5e_lro_update_hdr(struct sk_buff *skb, 
struct mlx5_cqe64 *cqe,
u16 tot_len;
 
u8 l4_hdr_type = get_cqe_l4_hdr_type(cqe);
-   int tcp_ack = ((CQE_L4_HDR_TYPE_TCP_ACK_NO_DATA  == l4_hdr_type) ||
-  (CQE_L4_HDR_TYPE_TCP_ACK_AND_DATA == l4_hdr_type));
+   int tcp_ack = ((l4_hdr_type == CQE_L4_HDR_TYPE_TCP_ACK_NO_DATA) ||
+  (l4_hdr_type == CQE_L4_HDR_TYPE_TCP_ACK_AND_DATA));
 
skb->mac_len = ETH_HLEN;
proto = __vlan_get_protocol(skb, eth->h_proto, &network_depth);
-- 
2.13.0

[V2 net-next 11/15] net/mlx5e: Properly indent within conditional statements

2017-08-20 Thread Saeed Mahameed

From: Or Gerlitz 

To fix these checkpatch complaints:

WARNING: suspect code indent for conditional statements (8, 24)
+   if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
[...]
+   return PORT_FIBRE;

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 31 --
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index a75ac4d11c5b..1f3d87e28618 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -987,24 +987,27 @@ static u8 get_connector_port(u32 eth_proto, u8 
connector_type)
if (connector_type && connector_type < MLX5E_CONNECTOR_TYPE_NUMBER)
return ptys2connector_type[connector_type];
 
-   if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_SR)
-| MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)
-| MLX5E_PROT_MASK(MLX5E_100GBASE_SR4)
-| MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) {
-   return PORT_FIBRE;
+   if (eth_proto &
+   (MLX5E_PROT_MASK(MLX5E_10GBASE_SR)   |
+MLX5E_PROT_MASK(MLX5E_40GBASE_SR4)  |
+MLX5E_PROT_MASK(MLX5E_100GBASE_SR4) |
+MLX5E_PROT_MASK(MLX5E_1000BASE_CX_SGMII))) {
+   return PORT_FIBRE;
}
 
-   if (eth_proto & (MLX5E_PROT_MASK(MLX5E_40GBASE_CR4)
-| MLX5E_PROT_MASK(MLX5E_10GBASE_CR)
-| MLX5E_PROT_MASK(MLX5E_100GBASE_CR4))) {
-   return PORT_DA;
+   if (eth_proto &
+   (MLX5E_PROT_MASK(MLX5E_40GBASE_CR4) |
+MLX5E_PROT_MASK(MLX5E_10GBASE_CR)  |
+MLX5E_PROT_MASK(MLX5E_100GBASE_CR4))) {
+   return PORT_DA;
}
 
-   if (eth_proto & (MLX5E_PROT_MASK(MLX5E_10GBASE_KX4)
-| MLX5E_PROT_MASK(MLX5E_10GBASE_KR)
-| MLX5E_PROT_MASK(MLX5E_40GBASE_KR4)
-| MLX5E_PROT_MASK(MLX5E_100GBASE_KR4))) {
-   return PORT_NONE;
+   if (eth_proto &
+   (MLX5E_PROT_MASK(MLX5E_10GBASE_KX4) |
+MLX5E_PROT_MASK(MLX5E_10GBASE_KR)  |
+MLX5E_PROT_MASK(MLX5E_40GBASE_KR4) |
+MLX5E_PROT_MASK(MLX5E_100GBASE_KR4))) {
+   return PORT_NONE;
}
 
return PORT_OTHER;
-- 
2.13.0

[V2 net-next 09/15] net/mlx5: Avoid blank lines after/before open/close brace

2017-08-20 Thread Saeed Mahameed

From: Or Gerlitz 

To fix these checkpatch complaints:

CHECK: Blank lines aren't necessary after an open brace '{'
CHECK: Blank lines aren't necessary before a close brace '}'

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c  | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index d453a11f41fe..a75ac4d11c5b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -176,7 +176,6 @@ static bool mlx5e_query_global_pause_combined(struct 
mlx5e_priv *priv)
 
 int mlx5e_ethtool_get_sset_count(struct mlx5e_priv *priv, int sset)
 {
-
switch (sset) {
case ETH_SS_STATS:
return NUM_SW_COUNTERS +
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c 
b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index 5e7ffc9fad78..55b07c5ecd12 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -71,7 +71,6 @@ static int mlx5_device_enable_sriov(struct mlx5_core_dev 
*dev, int num_vfs)
sriov->vfs_ctx[vf].enabled = 1;
sriov->enabled_vfs++;
mlx5_core_dbg(dev, "successfully enabled VF* %d\n", vf);
-
}
 
return 0;
-- 
2.13.0

[pull request][V2 net-next 00/15] Mellanox, mlx5 updates 2017-08-17

2017-08-20 Thread Saeed Mahameed

Hi Dave,

The following changes provide updates for the mlx5 ethernet and IPoIB
netdevice drivers.

For more details please see tag log message below.
Please pull and let me know if there's any problem.

V1->V2:
- Rebase on latest net-next
- Fix a typo in 1st patch's commit message.
- Fix indentation in "Properly indent within conditional statements" 
patch.

Thanks,
Saeed.

---

The following changes since commit d6e1e46f69fbe956e877cdd00dbfb002baddf577:

  bpf: linux/bpf.h needs linux/numa.h (2017-08-19 23:34:03 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-updates-2017-08-17-V2

for you to fetch changes up to 9da5106c5656fdd7626af8abc09677364055f2c9:

  net/mlx5e: Use size_t to store byte offset in statistics descriptors 
(2017-08-20 12:57:20 +0300)


mlx5-updates-2017-08-17

Some updates for mlx5 ethernet and IPoIB device driver.

Eran added support for managing the physical link state from the netdevice
upon interface open/close requests.

Feras fixed the driver name shown in ethtool for IPoIB interfaces.
Shalom added support for IPoIB netdevice ethtool get link settings.

Gal and Eran exposed new diagnostic counters for outbound PCIe stalls and
overflow, and RX buffer fullness statistics.

Code cleanups from Or Gerlitz.
Variable types cleanup from Gal.

Thanks,
Saeed.


Eran Ben Elisha (2):
  net/mlx5e: Send PAOS command on interface up/down
  net/mlx5e: Add outbound PCI buffer overflow counter

Feras Daoud (1):
  net/mlx5e: IPoIB, Fix driver name retrieved by ethtool

Gal Pressman (6):
  net/mlx5: Add PCIe outbound stalls counters infrastructure
  net/mlx5e: Add PCIe outbound stalls counters
  net/mlx5: Add RX buffer fullness counters infrastructure
  net/mlx5e: Add RX buffer fullness counters
  net/mlx5e: Use kernel types instead of uint*_t in ethtool callbacks
  net/mlx5e: Use size_t to store byte offset in statistics descriptors

Or Gerlitz (5):
  net/mlx5: Avoid blank lines after/before open/close brace
  net/mlx5: Add a blank line after declarations
  net/mlx5e: Properly indent within conditional statements
  net/mlx5e: Avoid using multiple blank lines
  net/mlx5e: Place constants on the right side of comparisons

Shalom Lagziel (1):
  net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool

 drivers/net/ethernet/mellanox/mlx5/core/alloc.c|   1 +
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  |   1 -
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  64 +++---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  13 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  20 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |  46 ++-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |   1 +
 .../ethernet/mellanox/mlx5/core/ipoib/ethtool.c| 135 ++---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   1 -
 drivers/net/ethernet/mellanox/mlx5/core/sriov.c|   1 -
 include/linux/mlx5/mlx5_ifc.h  |  34 +-
 12 files changed, 267 insertions(+), 54 deletions(-)

[V2 net-next 10/15] net/mlx5: Add a blank line after declarations

2017-08-20 Thread Saeed Mahameed

From: Or Gerlitz 

To fix these checkpatch complaints:

WARNING: Missing a blank line after declarations

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/eq.c| 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
index 3c95f7f53802..47239bf7bf43 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
@@ -258,6 +258,7 @@ EXPORT_SYMBOL_GPL(mlx5_db_alloc);
 void mlx5_db_free(struct mlx5_core_dev *dev, struct mlx5_db *db)
 {
u32 db_per_page = PAGE_SIZE / cache_line_size();
+
mutex_lock(&dev->priv.pgdir_mutex);
 
__set_bit(db->index, db->u.pgdir->bitmap);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index de704ff5619a..a08027b8f3ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -188,6 +188,7 @@ static enum mlx5_dev_event port_subtype_event(u8 subtype)
 static void eq_update_ci(struct mlx5_eq *eq, int arm)
 {
__be32 __iomem *addr = eq->doorbell + (arm ? 0 : 2);
+
u32 val = (eq->cons_index & 0xff) | (eq->eqn << 24);
__raw_writel((__force u32)cpu_to_be32(val), addr);
/* We still want ordering, just not swabbing, so add a barrier */
-- 
2.13.0

[V2 net-next 14/15] net/mlx5e: Use kernel types instead of uint*_t in ethtool callbacks

2017-08-20 Thread Saeed Mahameed

From: Gal Pressman 

Fix checkpatch errors:
CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u32' over 'uint32_t'

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c| 8 +++-
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c | 3 +--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 1f3d87e28618..c30cf6b4736f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -206,7 +206,7 @@ static int mlx5e_get_sset_count(struct net_device *dev, int 
sset)
return mlx5e_ethtool_get_sset_count(priv, sset);
 }
 
-static void mlx5e_fill_stats_strings(struct mlx5e_priv *priv, uint8_t *data)
+static void mlx5e_fill_stats_strings(struct mlx5e_priv *priv, u8 *data)
 {
int i, j, tc, prio, idx = 0;
unsigned long pfc_combined;
@@ -308,8 +308,7 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
priv->channel_tc2txq[i][tc]);
 }
 
-void mlx5e_ethtool_get_strings(struct mlx5e_priv *priv,
-  uint32_t stringset, uint8_t *data)
+void mlx5e_ethtool_get_strings(struct mlx5e_priv *priv, u32 stringset, u8 
*data)
 {
int i;
 
@@ -331,8 +330,7 @@ void mlx5e_ethtool_get_strings(struct mlx5e_priv *priv,
}
 }
 
-static void mlx5e_get_strings(struct net_device *dev,
- uint32_t stringset, uint8_t *data)
+static void mlx5e_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 {
struct mlx5e_priv *priv = netdev_priv(dev);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index dd49a59854e5..43c126c63955 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -43,8 +43,7 @@ static void mlx5i_get_drvinfo(struct net_device *dev,
sizeof(drvinfo->driver));
 }
 
-static void mlx5i_get_strings(struct net_device *dev,
- uint32_t stringset, uint8_t *data)
+static void mlx5i_get_strings(struct net_device *dev, u32 stringset, u8 *data)
 {
struct mlx5e_priv *priv  = mlx5i_epriv(dev);
 
-- 
2.13.0

[V2 net-next 06/15] net/mlx5: Add RX buffer fullness counters infrastructure

2017-08-20 Thread Saeed Mahameed

From: Gal Pressman 

Add capability bit in PCAM register and counters to PPCNT register.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/mlx5_ifc.h | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ba533b39c885..cf7ff52c594e 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1538,7 +1538,17 @@ struct mlx5_ifc_eth_extended_cntrs_grp_data_layout_bits {
 
u8 port_transmit_wait_low[0x20];
 
-   u8 reserved_at_40[0x780];
+   u8 reserved_at_40[0x100];
+
+   u8 rx_buffer_almost_full_high[0x20];
+
+   u8 rx_buffer_almost_full_low[0x20];
+
+   u8 rx_buffer_full_high[0x20];
+
+   u8 rx_buffer_full_low[0x20];
+
+   u8 reserved_at_1c0[0x600];
 };
 
 struct mlx5_ifc_eth_3635_cntrs_grp_data_layout_bits {
@@ -7723,8 +7733,9 @@ struct mlx5_ifc_peir_reg_bits {
 };
 
 struct mlx5_ifc_pcam_enhanced_features_bits {
-   u8 reserved_at_0[0x7c];
+   u8 reserved_at_0[0x7b];
 
+   u8 rx_buffer_fullness_counters[0x1];
u8 ptys_connector_type[0x1];
u8 reserved_at_7d[0x1];
u8 ppcnt_discard_group[0x1];
-- 
2.13.0
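
The layout change in this patch can be sanity-checked with a little arithmetic, assuming the usual mlx5_ifc convention that the array size is a field's width in bits and that a `reserved_at_X` name encodes its bit offset. The constants are copied from the hunk above; the `DEMO_*` names and helper functions are hypothetical.

```c
/* Bit widths taken from the hunk above. */
enum {
	DEMO_OLD_RESERVED_BITS = 0x780, /* old reserved_at_40 */
	DEMO_NEW_RESERVED_LOW  = 0x100, /* new reserved_at_40 */
	DEMO_COUNTER_BITS      = 0x20,  /* each of the four new counter fields */
	DEMO_NEW_RESERVED_HIGH = 0x600, /* reserved_at_1c0 */
};

/* Total bits covered by the fields that replace the old reserved area. */
int demo_new_layout_bits(void)
{
	return DEMO_NEW_RESERVED_LOW + 4 * DEMO_COUNTER_BITS +
	       DEMO_NEW_RESERVED_HIGH;
}

/* Bit offset at which the trailing reserved field starts. */
int demo_trailing_reserved_offset(void)
{
	return 0x40 + DEMO_NEW_RESERVED_LOW + 4 * DEMO_COUNTER_BITS;
}
```

The first helper returns 0x780, so the structure keeps its overall size, and the second returns 0x1c0, matching the name of the new trailing reserved field. The PCAM hunk follows the same pattern: 0x7b reserved bits plus one new feature bit equals the previous 0x7c.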

[V2 net-next 05/15] net/mlx5e: Add PCIe outbound stalls counters

2017-08-20 Thread Saeed Mahameed

From: Gal Pressman 

outbound_pci_stalled_rd - The percentage of time within the last second
that the NIC had outbound non-posted read requests but could not perform
the operation due to insufficient non-posted credits.

outbound_pci_stalled_wr - The percentage of time within the
last second that the NIC had outbound posted write requests but could
not perform the operation due to insufficient posted credits.

outbound_pci_stalled_rd_events - The number of events where
outbound_pci_stalled_rd was above the threshold.

outbound_pci_stalled_wr_events - The number of events where
outbound_pci_stalled_wr was above the threshold.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  8 
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h   | 13 -
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 917fade5f5d5..07202f7322fc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -246,6 +246,10 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
strcpy(data + (idx++) * ETH_GSTRING_LEN,
   pcie_perf_stats_desc[i].format);
 
+   for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
+   strcpy(data + (idx++) * ETH_GSTRING_LEN,
+  pcie_perf_stall_stats_desc[i].format);
+
for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
for (i = 0; i < NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS; i++)
sprintf(data + (idx++) * ETH_GSTRING_LEN,
@@ -377,6 +381,10 @@ void mlx5e_ethtool_get_ethtool_stats(struct mlx5e_priv 
*priv,
data[idx++] = 
MLX5E_READ_CTR32_BE(>stats.pcie.pcie_perf_counters,
  pcie_perf_stats_desc, i);
 
+   for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
+   data[idx++] = 
MLX5E_READ_CTR32_BE(>stats.pcie.pcie_perf_counters,
+ pcie_perf_stall_stats_desc, 
i);
+
for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
for (i = 0; i < NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS; i++)
data[idx++] = 
MLX5E_READ_CTR64_BE(>stats.pport.per_prio_counters[prio],
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index e65517eafc58..bdc46170 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -305,6 +305,13 @@ static const struct counter_desc pcie_perf_stats_desc[] = {
{ "tx_pci_signal_integrity", PCIE_PERF_OFF(tx_errors) },
 };
 
+static const struct counter_desc pcie_perf_stall_stats_desc[] = {
+   { "outbound_pci_stalled_rd", PCIE_PERF_OFF(outbound_stalled_reads) },
+   { "outbound_pci_stalled_wr", PCIE_PERF_OFF(outbound_stalled_writes) },
+   { "outbound_pci_stalled_rd_events", 
PCIE_PERF_OFF(outbound_stalled_reads_events) },
+   { "outbound_pci_stalled_wr_events", 
PCIE_PERF_OFF(outbound_stalled_writes_events) },
+};
+
 struct mlx5e_rq_stats {
u64 packets;
u64 bytes;
@@ -397,6 +404,9 @@ static const struct counter_desc sq_stats_desc[] = {
 #define NUM_PCIE_PERF_COUNTERS(priv) \
(ARRAY_SIZE(pcie_perf_stats_desc) * \
 MLX5_CAP_MCAM_FEATURE((priv)->mdev, pcie_performance_group))
+#define NUM_PCIE_PERF_STALL_COUNTERS(priv) \
+   (ARRAY_SIZE(pcie_perf_stall_stats_desc) * \
+MLX5_CAP_MCAM_FEATURE((priv)->mdev, pcie_outbound_stalled))
 #define NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS \
ARRAY_SIZE(pport_per_prio_traffic_stats_desc)
 #define NUM_PPORT_PER_PRIO_PFC_COUNTERS \
@@ -407,7 +417,8 @@ static const struct counter_desc sq_stats_desc[] = {
 
NUM_PPORT_PHY_STATISTICAL_COUNTERS(priv) + \
 NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS * \
 NUM_PPORT_PRIO)
-#define NUM_PCIE_COUNTERS(priv)NUM_PCIE_PERF_COUNTERS(priv)
+#define NUM_PCIE_COUNTERS(priv)(NUM_PCIE_PERF_COUNTERS(priv) + 
\
+NUM_PCIE_PERF_STALL_COUNTERS(priv))
 #define NUM_RQ_STATS   ARRAY_SIZE(rq_stats_desc)
 #define NUM_SQ_STATS   ARRAY_SIZE(sq_stats_desc)
 
-- 
2.13.0
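
The capability gating used in this patch can be sketched in isolation. The following is a minimal userspace sketch, not the driver code: the size of an optional counter group is the descriptor array size multiplied by a 0/1 capability bit, so one loop serves devices with and without the feature. `struct fake_caps`, `num_perf()`, `num_stall()` and `fill_strings()` are hypothetical stand-ins for `MLX5_CAP_MCAM_FEATURE()`, the `NUM_PCIE_*` macros and `mlx5e_fill_stats_strings()`, and the descriptor offsets are left at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_GSTRING_LEN 32
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct counter_desc {
	char   format[ETH_GSTRING_LEN];
	size_t offset; /* byte offset, unused in this sketch */
};

/* Hypothetical stand-in for the firmware capability bits. */
struct fake_caps {
	unsigned int pcie_performance_group : 1;
	unsigned int pcie_outbound_stalled  : 1;
};

/* Trimmed to the one entry visible in the patch context. */
static const struct counter_desc perf_desc[] = {
	{ "tx_pci_signal_integrity", 0 },
};

static const struct counter_desc stall_desc[] = {
	{ "outbound_pci_stalled_rd", 0 },
	{ "outbound_pci_stalled_wr", 0 },
	{ "outbound_pci_stalled_rd_events", 0 },
	{ "outbound_pci_stalled_wr_events", 0 },
};

/* Group size is ARRAY_SIZE * capability bit: 0 when unsupported. */
size_t num_perf(const struct fake_caps *c)
{
	return ARRAY_SIZE(perf_desc) * c->pcie_performance_group;
}

size_t num_stall(const struct fake_caps *c)
{
	return ARRAY_SIZE(stall_desc) * c->pcie_outbound_stalled;
}

/* Copy the enabled counter names into data; return how many were written. */
size_t fill_strings(const struct fake_caps *c, char *data)
{
	size_t i, idx = 0;

	assert(c != NULL && data != NULL);
	for (i = 0; i < num_perf(c); i++)
		strcpy(data + (idx++) * ETH_GSTRING_LEN, perf_desc[i].format);
	for (i = 0; i < num_stall(c); i++)
		strcpy(data + (idx++) * ETH_GSTRING_LEN, stall_desc[i].format);
	return idx;
}
```

The strings and values loops have to walk the groups in the same order, which is presumably why the patch adds the new loop at the same position in both functions.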

RE: [PATCH V2 net-next] net: hns3: Add support to change MTU in HNS3 hardware

2017-08-20 Thread Salil Mehta

Hi Leon

> -Original Message-
> From: Leon Romanovsky [mailto:l...@kernel.org]
> Sent: Sunday, August 20, 2017 8:05 AM
> To: Salil Mehta
> Cc: da...@davemloft.net; Zhuangyuzeng (Yisen); lipeng (Y);
> mehta.salil@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> Subject: Re: [PATCH V2 net-next] net: hns3: Add support to change MTU
> in HNS3 hardware
> 
> On Fri, Aug 18, 2017 at 05:57:59PM +0100, Salil Mehta wrote:
> > This patch adds the following support to the HNS3 driver:
> > 1. Support to change the Maximum Transmission Unit of a
> >of a port in the HNS NIC hardware .
> 
> Extra space before dot.
Sure.

> 
> > 2. Initializes the supported MTU range for the netdevice.
> >
> > Signed-off-by: lipeng 
> 
> Does "lipeng" have name and surname?
Yes, Lipeng's first name is 'Peng' and his surname is 'Li',
but it is usually written as 'Lipeng' in one go when referring to him.
This is a common convention for full names originating from
China: the surname comes first, followed by the first name, and the
two are written together. Therefore,
his sign-offs appear as above.

Thanks
Salil
> 
> > Signed-off-by: Salil Mehta 
> > ---
> > PATCH V2: Addresses comments given by Andrew Lunn
> >   1. https://lkml.org/lkml/2017/8/18/282
> > PATCH V1: Initial Submit
> > ---
> >  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 38
> ++
> >  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h |  1 +
> >  2 files changed, 39 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > index e731f87..d905ea1 100644
> > --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> > @@ -1278,11 +1278,46 @@ static int hns3_ndo_set_vf_vlan(struct
> net_device *netdev, int vf, u16 vlan,
> > return ret;
> >  }
> >
> > +static int hns3_nic_change_mtu(struct net_device *netdev, int
> new_mtu)
> > +{
> > +   struct hns3_nic_priv *priv = netdev_priv(netdev);
> > +   struct hnae3_handle *h = priv->ae_handle;
> > +   bool if_running = netif_running(netdev);
> > +   int ret;
> > +
> > +   if (!h->ae_algo->ops->set_mtu)
> > +   return -ENOTSUPP;
> > +
> > +   /* if this was called with netdev up then bring netdevice down */
> > +   if (if_running) {
> > +   (void)hns3_nic_net_stop(netdev);
> > +   msleep(100);
> > +   }
> > +
> > +   ret = h->ae_algo->ops->set_mtu(h, new_mtu);
> > +   if (ret) {
> > +   netdev_err(netdev, "failed to change MTU in hardware %d\n",
> > +  ret);
> > +   return ret;
> > +   }
> > +
> > +   /* if the netdev was running earlier, bring it up again */
> > +   if (if_running) {
> > +   if (hns3_nic_net_open(netdev)) {
> > +   netdev_err(netdev, "MTU, couldnt up netdev again\n");
> 
> "couldnt" -> "couldn't"
> 
> and you don't actually need this print.
> If the function hns3_nic_net_open fails, you will print this error
> there.
Right. Will remove this print.

Thanks
Salil
> 
> > +   ret = -EINVAL;
> > +   }
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> >  static const struct net_device_ops hns3_nic_netdev_ops = {
> > .ndo_open   = hns3_nic_net_open,
> > .ndo_stop   = hns3_nic_net_stop,
> > .ndo_start_xmit = hns3_nic_net_xmit,
> > .ndo_set_mac_address= hns3_nic_net_set_mac_address,
> > +   .ndo_change_mtu = hns3_nic_change_mtu,
> > .ndo_set_features   = hns3_nic_set_features,
> > .ndo_get_stats64= hns3_nic_get_stats64,
> > .ndo_setup_tc   = hns3_nic_setup_tc,
> > @@ -2752,6 +2787,9 @@ static int hns3_client_init(struct hnae3_handle
> *handle)
> > goto out_reg_netdev_fail;
> > }
> >
> > +   /* MTU range: (ETH_MIN_MTU(kernel default) - 9706) */
> > +   netdev->max_mtu = HNS3_MAX_MTU - (ETH_HLEN + ETH_FCS_LEN +
> VLAN_HLEN);
> > +
> > return ret;
> >
> >  out_reg_netdev_fail:
> > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> > index a6e8f15..7e87461 100644
> > --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> > @@ -76,6 +76,7 @@ enum hns3_nic_state {
> >  #define HNS3_RING_NAME_LEN 16
> >  #define HNS3_BUFFER_SIZE_2048  2048
> >  #define HNS3_RING_MAX_PENDING  32768
> > +#define HNS3_MAX_MTU   9728
> >
> >  #define HNS3_BD_SIZE_512_TYPE  0
> >  #define HNS3_BD_SIZE_1024_TYPE 1
> > --
> > 2.7.4
> >
> >
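
The MTU range mentioned in the quoted patch comment can be checked by hand. A minimal sketch of the arithmetic, assuming the usual kernel values for the Ethernet constants (14-byte header, 4-byte FCS, 4-byte VLAN tag, minimum MTU of 68); `hns3_max_payload_mtu()` and `mtu_in_range()` are hypothetical helper names used only for illustration:

```c
#include <assert.h>

#define ETH_HLEN      14   /* Ethernet header length */
#define ETH_FCS_LEN    4   /* frame check sequence */
#define VLAN_HLEN      4   /* one VLAN tag */
#define ETH_MIN_MTU   68   /* kernel default lower bound */
#define HNS3_MAX_MTU 9728  /* value added by the patch */

/* Largest MTU once the L2 overhead is subtracted from HNS3_MAX_MTU. */
int hns3_max_payload_mtu(void)
{
	return HNS3_MAX_MTU - (ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN);
}

/* Range check equivalent to what min_mtu/max_mtu express. */
int mtu_in_range(int new_mtu)
{
	return new_mtu >= ETH_MIN_MTU && new_mtu <= hns3_max_payload_mtu();
}
```

With these values the upper bound works out to 9728 - 22 = 9706, which matches the "9706" in the patch comment.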

[V2 net-next 04/15] net/mlx5: Add PCIe outbound stalls counters infrastructure

2017-08-20 Thread Saeed Mahameed

From: Gal Pressman 

Add capability bit in MCAM register and counters to MPCNT register.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/mlx5_ifc.h | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index c99daffc3c3c..ba533b39c885 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1854,7 +1854,17 @@ struct mlx5_ifc_pcie_perf_cntrs_grp_data_layout_bits {
 
u8 crc_error_tlp[0x20];
 
-   u8 reserved_at_140[0x680];
+   u8 reserved_at_140[0x40];
+
+   u8 outbound_stalled_reads[0x20];
+
+   u8 outbound_stalled_writes[0x20];
+
+   u8 outbound_stalled_reads_events[0x20];
+
+   u8 outbound_stalled_writes_events[0x20];
+
+   u8 reserved_at_200[0x5c0];
 };
 
 struct mlx5_ifc_cmd_inter_comp_event_bits {
@@ -7744,8 +7754,9 @@ struct mlx5_ifc_pcam_reg_bits {
 };
 
 struct mlx5_ifc_mcam_enhanced_features_bits {
-   u8 reserved_at_0[0x7d];
-
+   u8 reserved_at_0[0x7b];
+   u8 pcie_outbound_stalled[0x1];
+   u8 reserved_at_7c[0x1];
u8 mtpps_enh_out_per_adj[0x1];
u8 mtpps_fs[0x1];
u8 pcie_performance_group[0x1];
-- 
2.13.0
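
The layout change in this patch can be sanity-checked with a little arithmetic. In mlx5_ifc.h the array sizes appear to be in bits and the `reserved_at_<offset>` names encode the bit offset where each reserved run starts, so the new fields plus the shrunken reserved runs must end at the same offset as the old single reserved run. A minimal sketch of that check; `struct ifc_field` and `ifc_end_offset()` are hypothetical helpers, not part of the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* One field of a bit-granular layout: name and width in bits. */
struct ifc_field {
	const char  *name;
	unsigned int bits;
};

/* Bit offset just past the last of n fields laid out from start. */
unsigned int ifc_end_offset(unsigned int start,
			    const struct ifc_field *f, size_t n)
{
	unsigned int off = start;
	size_t i;

	assert(f != NULL);
	for (i = 0; i < n; i++)
		off += f[i].bits;
	return off;
}

/* Tail of the PCIe performance counter group before the patch. */
const struct ifc_field pcie_old[] = {
	{ "reserved_at_140", 0x680 },
};

/* The same tail after the patch. */
const struct ifc_field pcie_new[] = {
	{ "reserved_at_140",                0x40  },
	{ "outbound_stalled_reads",         0x20  },
	{ "outbound_stalled_writes",        0x20  },
	{ "outbound_stalled_reads_events",  0x20  },
	{ "outbound_stalled_writes_events", 0x20  },
	{ "reserved_at_200",                0x5c0 },
};
```

Both variants end at bit 0x7c0, and the first five new fields end at 0x200, which agrees with the name of the trailing `reserved_at_200` run.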

[V2 net-next 07/15] net/mlx5e: Add RX buffer fullness counters

2017-08-20 Thread Saeed Mahameed

From: Gal Pressman 

rx_buffer_passed_thres_phy - The number of events where the port RX
buffer has passed a fullness threshold.

rx_buffer_full_phy - The number of events where the port RX buffer has
reached 100% fullness.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c |  8 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c|  6 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h   | 17 -
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 07202f7322fc..8c013a521319 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -242,6 +242,10 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
strcpy(data + (idx++) * ETH_GSTRING_LEN,
   pport_phy_statistical_stats_desc[i].format);
 
+   for (i = 0; i < NUM_PPORT_ETH_EXT_COUNTERS(priv); i++)
+   strcpy(data + (idx++) * ETH_GSTRING_LEN,
+  pport_eth_ext_stats_desc[i].format);
+
for (i = 0; i < NUM_PCIE_PERF_COUNTERS(priv); i++)
strcpy(data + (idx++) * ETH_GSTRING_LEN,
   pcie_perf_stats_desc[i].format);
@@ -377,6 +381,10 @@ void mlx5e_ethtool_get_ethtool_stats(struct mlx5e_priv 
*priv,
data[idx++] = 
MLX5E_READ_CTR64_BE(>stats.pport.phy_statistical_counters,
  
pport_phy_statistical_stats_desc, i);
 
+   for (i = 0; i < NUM_PPORT_ETH_EXT_COUNTERS(priv); i++)
+   data[idx++] = 
MLX5E_READ_CTR64_BE(>stats.pport.eth_ext_counters,
+ pport_eth_ext_stats_desc, i);
+
for (i = 0; i < NUM_PCIE_PERF_COUNTERS(priv); i++)
data[idx++] = 
MLX5E_READ_CTR32_BE(>stats.pcie.pcie_perf_counters,
  pcie_perf_stats_desc, i);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7c512a4c6d5c..fdc2b92f020b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -288,6 +288,12 @@ static void mlx5e_update_pport_counters(struct mlx5e_priv 
*priv, bool full)
mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 
0);
}
 
+   if (MLX5_CAP_PCAM_FEATURE(mdev, rx_buffer_fullness_counters)) {
+   out = pstats->eth_ext_counters;
+   MLX5_SET(ppcnt_reg, in, grp, 
MLX5_ETHERNET_EXTENDED_COUNTERS_GROUP);
+   mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPCNT, 0, 
0);
+   }
+
MLX5_SET(ppcnt_reg, in, grp, MLX5_PER_PRIORITY_COUNTERS_GROUP);
for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
out = pstats->per_prio_counters[prio];
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index bdc46170..be49df4bedd9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -216,6 +216,12 @@ static const struct counter_desc vport_stats_desc[] = {
MLX5_GET64(ppcnt_reg, pstats->per_prio_counters[prio], \
   counter_set.eth_per_prio_grp_data_layout.c##_high)
 #define NUM_PPORT_PRIO 8
+#define PPORT_ETH_EXT_OFF(c) \
+   MLX5_BYTE_OFF(ppcnt_reg, \
+ counter_set.eth_extended_cntrs_grp_data_layout.c##_high)
+#define PPORT_ETH_EXT_GET(pstats, c) \
+   MLX5_GET64(ppcnt_reg, (pstats)->eth_ext_counters, \
+  counter_set.eth_extended_cntrs_grp_data_layout.c##_high)
 
 struct mlx5e_pport_stats {
__be64 IEEE_802_3_counters[MLX5_ST_SZ_QW(ppcnt_reg)];
@@ -224,6 +230,7 @@ struct mlx5e_pport_stats {
__be64 per_prio_counters[NUM_PPORT_PRIO][MLX5_ST_SZ_QW(ppcnt_reg)];
__be64 phy_counters[MLX5_ST_SZ_QW(ppcnt_reg)];
__be64 phy_statistical_counters[MLX5_ST_SZ_QW(ppcnt_reg)];
+   __be64 eth_ext_counters[MLX5_ST_SZ_QW(ppcnt_reg)];
 };
 
 static const struct counter_desc pport_802_3_stats_desc[] = {
@@ -290,6 +297,10 @@ static const struct counter_desc 
pport_per_prio_pfc_stats_desc[] = {
{ "rx_%s_pause_transition", PPORT_PER_PRIO_OFF(rx_pause_transition) },
 };
 
+static const struct counter_desc pport_eth_ext_stats_desc[] = {
+   { "rx_buffer_passed_thres_phy", 
PPORT_ETH_EXT_OFF(rx_buffer_almost_full) },
+};
+
 #define PCIE_PERF_OFF(c) \
MLX5_BYTE_OFF(mpcnt_reg, counter_set.pcie_perf_cntrs_grp_data_layout.c)
 #define PCIE_PERF_GET(pcie_stats, c) \
@@ -411,12 +422,16 @@ static const struct counter_desc sq_stats_desc[]

[V2 net-next 12/15] net/mlx5e: Avoid using multiple blank lines

2017-08-20 Thread Saeed Mahameed

From: Or Gerlitz 

To fix these checkpatch complaints:

CHECK: Please don't use multiple blank lines

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c  | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 31cbe5e86a01..0ef68a7c051e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -802,7 +802,6 @@ static void cmd_work_handler(struct work_struct *work)
bool poll_cmd = ent->polling;
int alloc_ret;
 
-
sem = ent->page_queue ? >pages_sem : >sem;
down(sem);
if (!ent->page_queue) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 7e6e24398926..514c22d21729 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -836,7 +836,6 @@ static int mlx5_core_set_issi(struct mlx5_core_dev *dev)
return -EOPNOTSUPP;
 }
 
-
 static int mlx5_pci_init(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 {
struct pci_dev *pdev = dev->pdev;
-- 
2.13.0

[V2 net-next 08/15] net/mlx5e: Add outbound PCI buffer overflow counter

2017-08-20 Thread Saeed Mahameed

From: Eran Ben Elisha 

Add outbound_pci_buffer_overflow to ethtool output for monitoring the
number of packets that were dropped due to lack of PCIe buffers on the
receive path from the NIC port toward the host(s).

This counter is valid only if tx_overflow_buffer_pkt is
supported in MCAM enhanced features.

Signed-off-by: Eran Ben Elisha 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 12 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h   | 14 ++
 include/linux/mlx5/mlx5_ifc.h|  6 --
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 8c013a521319..d453a11f41fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -250,9 +250,13 @@ static void mlx5e_fill_stats_strings(struct mlx5e_priv 
*priv, uint8_t *data)
strcpy(data + (idx++) * ETH_GSTRING_LEN,
   pcie_perf_stats_desc[i].format);
 
-   for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
+   for (i = 0; i < NUM_PCIE_PERF_COUNTERS64(priv); i++)
strcpy(data + (idx++) * ETH_GSTRING_LEN,
-  pcie_perf_stall_stats_desc[i].format);
+  pcie_perf_stats_desc64[i].format);
+
+   for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
+strcpy(data + (idx++) * ETH_GSTRING_LEN,
+   pcie_perf_stall_stats_desc[i].format);
 
for (prio = 0; prio < NUM_PPORT_PRIO; prio++) {
for (i = 0; i < NUM_PPORT_PER_PRIO_TRAFFIC_COUNTERS; i++)
@@ -389,6 +393,10 @@ void mlx5e_ethtool_get_ethtool_stats(struct mlx5e_priv 
*priv,
data[idx++] = 
MLX5E_READ_CTR32_BE(>stats.pcie.pcie_perf_counters,
  pcie_perf_stats_desc, i);
 
+   for (i = 0; i < NUM_PCIE_PERF_COUNTERS64(priv); i++)
+   data[idx++] = 
MLX5E_READ_CTR64_BE(>stats.pcie.pcie_perf_counters,
+ pcie_perf_stats_desc64, i);
+
for (i = 0; i < NUM_PCIE_PERF_STALL_COUNTERS(priv); i++)
data[idx++] = 
MLX5E_READ_CTR32_BE(>stats.pcie.pcie_perf_counters,
  pcie_perf_stall_stats_desc, 
i);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index be49df4bedd9..40b5c73e5e26 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -307,6 +307,12 @@ static const struct counter_desc 
pport_eth_ext_stats_desc[] = {
MLX5_GET(mpcnt_reg, (pcie_stats)->pcie_perf_counters, \
 counter_set.pcie_perf_cntrs_grp_data_layout.c)
 
+#define PCIE_PERF_OFF64(c) \
+   MLX5_BYTE_OFF(mpcnt_reg, 
counter_set.pcie_perf_cntrs_grp_data_layout.c##_high)
+#define PCIE_PERF_GET64(pcie_stats, c) \
+   MLX5_GET64(mpcnt_reg, (pcie_stats)->pcie_perf_counters, \
+  counter_set.pcie_perf_cntrs_grp_data_layout.c##_high)
+
 struct mlx5e_pcie_stats {
__be64 pcie_perf_counters[MLX5_ST_SZ_QW(mpcnt_reg)];
 };
@@ -316,6 +322,10 @@ static const struct counter_desc pcie_perf_stats_desc[] = {
{ "tx_pci_signal_integrity", PCIE_PERF_OFF(tx_errors) },
 };
 
+static const struct counter_desc pcie_perf_stats_desc64[] = {
+   { "outbound_pci_buffer_overflow", 
PCIE_PERF_OFF64(tx_overflow_buffer_pkt) },
+};
+
 static const struct counter_desc pcie_perf_stall_stats_desc[] = {
{ "outbound_pci_stalled_rd", PCIE_PERF_OFF(outbound_stalled_reads) },
{ "outbound_pci_stalled_wr", PCIE_PERF_OFF(outbound_stalled_writes) },
@@ -415,6 +425,9 @@ static const struct counter_desc sq_stats_desc[] = {
 #define NUM_PCIE_PERF_COUNTERS(priv) \
(ARRAY_SIZE(pcie_perf_stats_desc) * \
 MLX5_CAP_MCAM_FEATURE((priv)->mdev, pcie_performance_group))
+#define NUM_PCIE_PERF_COUNTERS64(priv) \
+   (ARRAY_SIZE(pcie_perf_stats_desc64) * \
+MLX5_CAP_MCAM_FEATURE((priv)->mdev, tx_overflow_buffer_pkt))
 #define NUM_PCIE_PERF_STALL_COUNTERS(priv) \
(ARRAY_SIZE(pcie_perf_stall_stats_desc) * \
 MLX5_CAP_MCAM_FEATURE((priv)->mdev, pcie_outbound_stalled))
@@ -433,6 +446,7 @@ static const struct counter_desc sq_stats_desc[] = {
 NUM_PPORT_PRIO + \
 NUM_PPORT_ETH_EXT_COUNTERS(priv))
 #define NUM_PCIE_COUNTERS(priv)(NUM_PCIE_PERF_COUNTERS(priv) + 
\
+NUM_PCIE_PERF_COUNTERS64(priv) +\
 NUM_PCIE_PERF_STALL_COUNTERS(priv))
 #define NUM_RQ_STATS   ARRAY_SIZE(rq_stats_desc)
 #define

[V2 net-next 15/15] net/mlx5e: Use size_t to store byte offset in statistics descriptors

2017-08-20 Thread Saeed Mahameed

From: Gal Pressman 

The byte offset of counter descriptors should be stored in a size_t
variable instead of an int.

Signed-off-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 40b5c73e5e26..6761796e803c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -47,7 +47,7 @@
 
 struct counter_desc {
charformat[ETH_GSTRING_LEN];
-   int offset; /* Byte offset */
+   size_t  offset; /* Byte offset */
 };
 
 struct mlx5e_sw_stats {
-- 
2.13.0
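
To illustrate why size_t is the natural type here: descriptor offsets are typically produced by offsetof-style expressions, which yield size_t, and are then used for pointer arithmetic into a raw counter buffer. A minimal userspace sketch under those assumptions; `struct fake_regs`, `fake_desc` and `read_ctr64_be()` are hypothetical and only loosely modeled on what a macro such as `MLX5E_READ_CTR64_BE()` is used for in the diffs above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_GSTRING_LEN 32

struct counter_desc {
	char   format[ETH_GSTRING_LEN];
	size_t offset; /* Byte offset */
};

/* Hypothetical register image; real layouts come from mlx5_ifc.h. */
struct fake_regs {
	uint8_t hdr[8];
	uint8_t rx_bytes[8]; /* big-endian 64-bit counter */
	uint8_t tx_bytes[8]; /* big-endian 64-bit counter */
};

/* offsetof() yields size_t, matching the descriptor field type. */
const struct counter_desc fake_desc[] = {
	{ "rx_bytes", offsetof(struct fake_regs, rx_bytes) },
	{ "tx_bytes", offsetof(struct fake_regs, tx_bytes) },
};

/* Read the i-th described counter as a big-endian 64-bit value. */
uint64_t read_ctr64_be(const void *base,
		       const struct counter_desc *desc, size_t i)
{
	const uint8_t *p;
	uint64_t v = 0;
	int b;

	assert(base != NULL && desc != NULL);
	p = (const uint8_t *)base + desc[i].offset;
	for (b = 0; b < 8; b++)
		v = (v << 8) | p[b];
	return v;
}
```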

Re: [PATCH net-next v3] arm: eBPF JIT compiler

2017-08-20 Thread Shubham Bansal

> Acked-by: Alexei Starovoitov 

David, Russell, Kees and Daniel, anything from your side? Is this
patch ready to land in net-next?

Re: [PATCH net-next 1/3 v6] net: ether: Add support for multiplexing and aggregation type

2017-08-20 Thread Jamal Hadi Salim


On 17-08-19 01:35 AM, Subash Abhinov Kasiviswanathan wrote:

Define the multiplexing and aggregation (MAP) ether type 0xDA1A. This
is needed for receiving data in the MAP protocol like RMNET. This is
not an officially registered ID.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
  include/uapi/linux/if_ether.h | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index 5bc9bfd..e80b03f 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -104,7 +104,9 @@
  #define ETH_P_QINQ3   0x9300  /* deprecated QinQ VLAN [ NOT AN 
OFFICIALLY REGISTERED ID ] */
  #define ETH_P_EDSA0xDADA  /* Ethertype DSA [ NOT AN OFFICIALLY 
REGISTERED ID ] */
  #define ETH_P_AF_IUCV   0xFBFB/* IBM af_iucv [ NOT AN 
OFFICIALLY REGISTERED ID ] */
-
+#define ETH_P_MAP   0xDA1A  /* Multiplexing and Aggregation 
Protocol
+*  NOT AN OFFICIALLY REGISTERED ID ]


You can't just arbitrarily assign yourself an ethertype.
The IEEE may never issue you one - and if they do, it will likely not be
the one you want, i.e. the one above.

If there is a way for you to make this a config option that is not
hardcoded to some default value then that would be the best approach to
take.

cheers,
jamal

[PATCH] tools lib bpf: improve warning

2017-08-20 Thread Eric Leblond

Signed-off-by: Eric Leblond 
---
 tools/lib/bpf/libbpf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1cc3ea0ffdc3..35f6dfcdc565 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -879,7 +879,8 @@ bpf_object__create_maps(struct bpf_object *obj)
size_t j;
int err = *pfd;
 
-   pr_warning("failed to create map: %s\n",
+   pr_warning("failed to create map (name: '%s'): %s\n",
+  obj->maps[i].name,
   strerror(errno));
for (j = 0; j < i; j++)
zclose(obj->maps[j].fd);
-- 
2.14.1

Re: [PATCH net] bpf, doc: also add s390x as arch to sysctl description

2017-08-20 Thread Alexei Starovoitov


On 8/20/17 3:26 PM, Daniel Borkmann wrote:

Looks like this was accidentally missed, so still add s390x
as supported eBPF JIT arch to bpf_jit_enable.

Fixes: 014cd0a368dc ("bpf: Update sysctl documentation to list all supported 
architectures")
Signed-off-by: Daniel Borkmann 


Acked-by: Alexei Starovoitov

Re: [PATCH] net: dsa: mv88e6xxx: make irq_chip const

2017-08-20 Thread David Miller

From: Bhumika Goyal 
Date: Sat, 19 Aug 2017 16:25:52 +0530

> Make this const as it is only used in a copy operation.
> Done using Coccinelle.
> 
> Signed-off-by: Bhumika Goyal 

Applied to net-next, thanks.

Re: [PATCH] tools lib bpf: improve warning

2017-08-20 Thread Daniel Borkmann


On 08/20/2017 09:48 PM, Eric Leblond wrote:

Signed-off-by: Eric Leblond 


Acked-by: Daniel Borkmann

Re: [PATCH net-next 1/3 v6] net: ether: Add support for multiplexing and aggregation type

2017-08-20 Thread David Miller

From: Jamal Hadi Salim 
Date: Sun, 20 Aug 2017 14:18:03 -0400

> On 17-08-19 01:35 AM, Subash Abhinov Kasiviswanathan wrote:
>> Define the multiplexing and aggregation (MAP) ether type 0xDA1A. This
>> is needed for receiving data in the MAP protocol like RMNET. This is
>> not an officially registered ID.
>> Signed-off-by: Subash Abhinov Kasiviswanathan
>> 
>> ---
>>   include/uapi/linux/if_ether.h | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>> diff --git a/include/uapi/linux/if_ether.h
>> b/include/uapi/linux/if_ether.h
>> index 5bc9bfd..e80b03f 100644
>> --- a/include/uapi/linux/if_ether.h
>> +++ b/include/uapi/linux/if_ether.h
>> @@ -104,7 +104,9 @@
>>   #define ETH_P_QINQ3 0x9300 /* deprecated QinQ VLAN [ NOT AN OFFICIALLY
>>   #REGISTERED ID ] */
>>   #define ETH_P_EDSA 0xDADA /* Ethertype DSA [ NOT AN OFFICIALLY
>>   #REGISTERED ID ] */
>>   #define ETH_P_AF_IUCV 0xFBFB /* IBM af_iucv [ NOT AN OFFICIALLY
>>   #REGISTERED ID ] */
>> -
>> +#define ETH_P_MAP 0xDA1A /* Multiplexing and Aggregation Protocol
>> + * NOT AN OFFICIALLY REGISTERED ID ]
> 
> You can't just arbitrarily assign yourself an ethertype.  The IEEE may
> never issue you one - and if they do, it will likely not be the one
> you want, i.e. the one above.
> 
> If there is a way for you to make this a config option that is not
> hardcoded to some default value then that would be the best approach
> to take.

This may be a somewhat different situation: these ethertypes exist
only internally in the kernel and never on the wire.

It's just controlling the demux on ethernet receive.

We have several IDs like this, and thus this addition is consistent
with existing practice.

[PATCH net] bpf, doc: also add s390x as arch to sysctl description

2017-08-20 Thread Daniel Borkmann

Looks like this was accidentally missed, so still add s390x
as a supported eBPF JIT arch to bpf_jit_enable.

Fixes: 014cd0a368dc ("bpf: Update sysctl documentation to list all supported 
architectures")
Signed-off-by: Daniel Borkmann 
---
 Documentation/sysctl/net.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index d7c2b88..28596e0 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -49,6 +49,7 @@ two flavors of JITs, the newer eBPF JIT currently supported 
on:
   - ppc64
   - sparc64
   - mips64
+  - s390x
 
 And the older cBPF JIT supported on the following archs:
   - arm
-- 
1.9.3

[PATCH net-next] bpf: fix double free from dev_map_notification()

2017-08-20 Thread Daniel Borkmann

In the current code, dev_map_free() can still race with dev_map_notification().
In dev_map_free(), we remove dtab from the list of dtabs after we purged
all entries from it. However, we don't do xchg() with NULL or the like,
so the entry at that point is still pointing to the device. If an unregister
notification comes in at the same time, we therefore risk a double-free,
since the pointer is still present in the map, and then pushed again to
__dev_map_entry_free().

All this is completely unnecessary. Just remove the dtab from the list
right before the synchronize_rcu(), so all outstanding readers from the
notifier list have finished by then, thus we don't need to deal with this
corner case anymore and also wouldn't need to nullify dev entries. This is
fine because we iterate over the map releasing all entries and therefore
dev references anyway.

Fixes: 4cc7b9544b9a ("bpf: devmap fix mutex in rcu critical section")
Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/devmap.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 67f4f00..fa08181 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -148,6 +148,11 @@ static void dev_map_free(struct bpf_map *map)
 * no further reads against netdev_map. It does __not__ ensure pending
 * flush operations (if any) are complete.
 */
+
+   spin_lock(_map_lock);
+   list_del_rcu(>list);
+   spin_unlock(_map_lock);
+
synchronize_rcu();
 
/* To ensure all pending flush operations have completed wait for flush
@@ -162,10 +167,6 @@ static void dev_map_free(struct bpf_map *map)
cpu_relax();
}
 
-   /* Although we should no longer have datapath or bpf syscall operations
-* at this point we we can still race with netdev notifier, hence the
-* lock.
-*/
for (i = 0; i < dtab->map.max_entries; i++) {
struct bpf_dtab_netdev *dev;
 
@@ -180,9 +181,6 @@ static void dev_map_free(struct bpf_map *map)
/* At this point bpf program is detached and all pending operations
 * _must_ be complete
 */
-   spin_lock(_map_lock);
-   list_del_rcu(>list);
-   spin_unlock(_map_lock);
free_percpu(dtab->flush_needed);
bpf_map_area_free(dtab->netdev_map);
kfree(dtab);
-- 
1.9.3
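
The ordering argument in this patch can be modeled in a few lines. The following is a single-threaded sketch only: it has no RCU and no locking, and every name in it (`struct table`, `table_list`, `table_publish()`, `table_unlink()`, `table_release_all()`, `notify_unregister()`) is hypothetical. It only shows that a table which is unlinked before its entries are released can no longer be reached by a late notification, while a table that is still linked gets the same entry released twice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 4

struct table {
	struct table *next;        /* link in the notifier-visible list */
	int ifindex[MAX_ENTRIES];  /* 0 means empty slot */
	int released[MAX_ENTRIES]; /* times each slot was released */
};

struct table *table_list; /* hypothetical stand-in for the dtab list */

void table_publish(struct table *t)
{
	t->next = table_list;
	table_list = t;
}

/* Remove t from the list so notifications can no longer find it. */
void table_unlink(struct table *t)
{
	struct table **pp;

	for (pp = &table_list; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			t->next = NULL;
			return;
		}
	}
}

/* Teardown: release populated slots without clearing them, which is
 * the property the commit message points out. */
void table_release_all(struct table *t)
{
	int i;

	for (i = 0; i < MAX_ENTRIES; i++)
		if (t->ifindex[i])
			t->released[i]++;
}

/* Notifier: release and clear every slot that refers to the device. */
void notify_unregister(int ifindex)
{
	struct table *t;
	int i;

	for (t = table_list; t; t = t->next)
		for (i = 0; i < MAX_ENTRIES; i++)
			if (t->ifindex[i] == ifindex) {
				t->released[i]++;
				t->ifindex[i] = 0;
			}
}
```

In the real code the unlink is followed by synchronize_rcu(), which is what lets notifier readers that are already in flight finish before the entries are released; this sketch cannot model that part.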

Re: [PATCH] switchdev: documentation: minor typo fixes

2017-08-20 Thread David Miller

From: Chris Packham 
Date: Mon, 21 Aug 2017 08:52:54 +1200

> Two typos in switchdev.txt
> 
> Signed-off-by: Chris Packham 

Applied.

Re: [PATCH] tools lib bpf: improve warning

2017-08-20 Thread David Miller

From: Eric Leblond 
Date: Sun, 20 Aug 2017 21:48:14 +0200

> Signed-off-by: Eric Leblond 

Applied, thanks.

Re: [PATCH 1/2] vhost: remove the possible fruitless search on iotlb prefetch

2017-08-20 Thread Jason Wang




On 2017年08月19日 14:41, Koichiro Den wrote:

Signed-off-by: Koichiro Den 
---
  drivers/vhost/vhost.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e4613a3c362d..93e909afc1c3 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1184,7 +1184,7 @@ static int iotlb_access_ok(struct vhost_virtqueue *vq,
while (len > s) {
node = vhost_umem_interval_tree_iter_first(>umem_tree,
   addr,
-  addr + len - 1);
+  addr + len - s - 1);
if (node == NULL || node->start > addr) {
vhost_iotlb_miss(vq, addr, access);
return false;


Acked-by: Jason Wang
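
The effect of the one-line change can be seen in a small userspace model of the loop. This is a sketch under stated assumptions, not the vhost code: a linear scan (`first_overlap()`) stands in for the interval tree lookup, and `struct range` and `access_ok()` are hypothetical names. The point it illustrates is that `addr` advances as each mapping is consumed, so the still-unchecked tail ends at `addr + len - s - 1`; searching up to `addr + len - 1` would look past the end of the requested region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A mapped region covering [start, start + size - 1]. */
struct range {
	uint64_t start;
	uint64_t size;
};

/* Lowest-start range overlapping [lo, hi], or NULL if none. */
const struct range *first_overlap(const struct range *r, size_t n,
				  uint64_t lo, uint64_t hi)
{
	const struct range *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t last = r[i].start + r[i].size - 1;

		if (r[i].start <= hi && last >= lo &&
		    (!best || r[i].start < best->start))
			best = &r[i];
	}
	return best;
}

/* Return 1 if every byte of [addr, addr + len - 1] is mapped. */
int access_ok(const struct range *r, size_t n, uint64_t addr, uint64_t len)
{
	uint64_t s = 0; /* bytes verified so far */

	while (len > s) {
		/* addr already moved forward by s bytes */
		const struct range *node =
			first_overlap(r, n, addr, addr + len - s - 1);
		uint64_t size;

		if (!node || node->start > addr)
			return 0; /* hole at addr: would be an IOTLB miss */

		size = node->size - (addr - node->start);
		s += size;
		addr += size;
	}
	return 1;
}
```

As far as this model goes, the narrower bound does not change the result, since a node found beyond the requested region starts after `addr` and is treated as a miss anyway; it only avoids the search over bytes that were never requested.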

[PATCH net-next v5] openvswitch: enable NSH support

2017-08-20 Thread Yi Yang

v4->v5
 - Fix many comments by Jiri Benc and Eric Garver
   for v4.

v3->v4
 - Add new NSH match field ttl
 - Update NSH header to the latest format
   which will be final format and won't change
   per its author's confirmation.
 - Fix comments for v3.

v2->v3
 - Change OVS_KEY_ATTR_NSH to nested key to handle
   length-fixed attributes and length-variable
   attribute more flexibly.
 - Remove struct ovs_action_push_nsh completely
 - Add code to handle nested attribute for SET_MASKED
 - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
   to transfer NSH header data.
 - Fix comments and coding style issues by Jiri and Eric

v1->v2
 - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
 - Dynamically allocate struct ovs_action_push_nsh for
   length-variable metadata.

The OVS master and 2.8 branches have merged the NSH userspace
patch series. This patch enables NSH support
in the kernel data path so that OVS can support
NSH in the 2.8 release in compat mode by porting it.

Signed-off-by: Yi Yang 
---
 drivers/net/vxlan.c  |   7 +
 include/net/nsh.h| 307 +
 include/uapi/linux/if_ether.h|   1 +
 include/uapi/linux/openvswitch.h |  28 +++
 net/openvswitch/actions.c| 181 ++
 net/openvswitch/flow.c   |  50 +
 net/openvswitch/flow.h   |  11 ++
 net/openvswitch/flow_netlink.c   | 404 ++-
 net/openvswitch/flow_netlink.h   |   4 +
 9 files changed, 992 insertions(+), 1 deletion(-)
 create mode 100644 include/net/nsh.h

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ae3a1da..a36c41e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
@@ -1268,6 +1269,9 @@ static bool vxlan_parse_gpe_hdr(struct vxlanhdr *unparsed,
case VXLAN_GPE_NP_IPV6:
*protocol = htons(ETH_P_IPV6);
break;
+   case VXLAN_GPE_NP_NSH:
+   *protocol = htons(ETH_P_NSH);
+   break;
case VXLAN_GPE_NP_ETHERNET:
*protocol = htons(ETH_P_TEB);
break;
@@ -1807,6 +1811,9 @@ static int vxlan_build_gpe_hdr(struct vxlanhdr *vxh, u32 
vxflags,
case htons(ETH_P_IPV6):
gpe->next_protocol = VXLAN_GPE_NP_IPV6;
return 0;
+   case htons(ETH_P_NSH):
+   gpe->next_protocol = VXLAN_GPE_NP_NSH;
+   return 0;
case htons(ETH_P_TEB):
gpe->next_protocol = VXLAN_GPE_NP_ETHERNET;
return 0;
diff --git a/include/net/nsh.h b/include/net/nsh.h
new file mode 100644
index 000..df5812e
--- /dev/null
+++ b/include/net/nsh.h
@@ -0,0 +1,307 @@
+#ifndef __NET_NSH_H
+#define __NET_NSH_H 1
+
+/*
+ * Network Service Header:
+ *  0   1   2   3
+ *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Ver|O|U|TTL|   Length  |U|U|U|U|MD Type| Next Protocol |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |  Service Path Identifier (SPI)| Service Index |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |   |
+ * ~   Mandatory/Optional Context Headers  ~
+ * |   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * Version: The version field is used to ensure backward compatibility
+ * going forward with future NSH specification updates.  It MUST be set
+ * to 0x0 by the sender, in this first revision of NSH.  Given the
+ * widespread implementation of existing hardware that uses the first
+ * nibble after an MPLS label stack for ECMP decision processing, this
+ * document reserves version 01b and this value MUST NOT be used in
+ * future versions of the protocol.  Please see [RFC7325] for further
+ * discussion of MPLS-related forwarding requirements.
+ *
+ * O bit: Setting this bit indicates an Operations, Administration, and
+ * Maintenance (OAM) packet.  The actual format and processing of SFC
+ * OAM packets is outside the scope of this specification (see for
+ * example [I-D.ietf-sfc-oam-framework] for one approach).
+ *
+ * The O bit MUST be set for OAM packets and MUST NOT be set for non-OAM
+ * packets.  The O bit MUST NOT be modified along the SFP.
+ *
+ * SF/SFF/SFC Proxy/Classifier implementations that do not support SFC
+ * OAM procedures SHOULD discard packets with O bit set, but MAY support
+ * a configurable parameter to enable forwarding received SFC OAM
+ * packets unmodified to the next element in the chain.  Forwarding OAM
+ * packets unmodified by SFC elements that do not support SFC OAM
+ * procedures may be

Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi

2017-08-20 Thread Willem de Bruijn

On Sat, Aug 19, 2017 at 2:38 AM, Koichiro Den  wrote:
> Given the possibly unbounded delay when relying on freeing on the xmit path,
> we had better also invoke and clear the upper layer zerocopy callback
> beforehand to keep the upper layer from waiting for an unbounded duration in vain.

Good point.

> For instance, this removes the possible deadlock in the case that the
> upper layer is a zerocopy-enabled vhost-net.
> This does not apply if napi_tx is enabled since it will be called in
> reasonable time.

Indeed. Btw, I am gathering data to eventually make napi the default
mode. But that is taking some time.

>
> Signed-off-by: Koichiro Den 
> ---
>  drivers/net/virtio_net.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4302f313d9a7..f7deaa5b7b50 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1290,6 +1290,14 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>
> /* Don't wait up for transmitted skbs to be freed. */
> if (!use_napi) {
> +   if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
> +   struct ubuf_info *uarg;
> +   uarg = skb_shinfo(skb)->destructor_arg;
> +   if (uarg->callback)
> +   uarg->callback(uarg, true);
> +   skb_shinfo(skb)->destructor_arg = NULL;
> +   skb_shinfo(skb)->tx_flags &= ~SKBTX_DEV_ZEROCOPY;
> +   }

Instead of open coding, this can use skb_zcopy_clear.

> skb_orphan(skb);
> nf_reset(skb);
> }
> --
> 2.9.4
>
>
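
For readers following the "instead of open coding" remark, here is a
userspace sketch of what folding the quoted sequence into one helper looks
like. It is not the kernel's skb_zcopy_clear(); every mock_* name and
MOCK_TX_ZEROCOPY are hypothetical stand-ins that only model the tx_flags and
destructor_arg handling from the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag standing in for the zerocopy bit in tx_flags. */
#define MOCK_TX_ZEROCOPY 0x1u

/* Hypothetical model of the upper layer's completion descriptor. */
struct mock_ubuf {
	void (*callback)(struct mock_ubuf *uarg, bool success);
	int completions;
	bool last_success;
};

/* Hypothetical model of the two skb fields touched by the hunk. */
struct mock_skb {
	unsigned int tx_flags;
	struct mock_ubuf *destructor_arg;
};

/* Example completion callback: records that it ran and with what result. */
static void mock_complete(struct mock_ubuf *uarg, bool success)
{
	uarg->completions++;
	uarg->last_success = success;
}

/*
 * One helper instead of repeating the sequence at each call site:
 * notify the upper layer once, then drop the reference and the flag
 * so that a later free does not notify a second time.
 */
static void mock_zcopy_clear(struct mock_skb *skb, bool success)
{
	struct mock_ubuf *uarg;

	if (!(skb->tx_flags & MOCK_TX_ZEROCOPY))
		return;

	uarg = skb->destructor_arg;
	if (uarg && uarg->callback)
		uarg->callback(uarg, success);

	skb->destructor_arg = NULL;
	skb->tx_flags &= ~MOCK_TX_ZEROCOPY;
}
```

Calling mock_zcopy_clear() a second time on the same mock_skb is a no-op,
which is the property the xmit path relies on once the skb is orphaned.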

Re: [PATCH net-next] bpf: fix double free from dev_map_notification()

2017-08-20 Thread Daniel Borkmann


On 08/21/2017 02:42 AM, Alexei Starovoitov wrote:
> On 8/20/17 4:48 PM, Daniel Borkmann wrote:
[...]
> I wonder why it was done the other way around in the first place then?
> dev_map_list is there only for notifier and since the map is freed
> with all the devices totally makes sense to isolate it from notifier
> as a first step.

Yep, agree. Initially this was done by the mutex, but that was not
correct due to RCU for map helpers, of course.

Re: [PATCH] net_sched: fix order of queue length updates in qdisc_replace()

2017-08-20 Thread David Miller

From: Konstantin Khlebnikov 
Date: Sat, 19 Aug 2017 15:37:07 +0300

> It is important to call qdisc_tree_reduce_backlog() after changing the queue
> length. The parent qdisc should deactivate the class in ->qlen_notify(), called
> from qdisc_tree_reduce_backlog(), but this happens only if qdisc->q.qlen is zero.
> 
> Missed class deactivations lead to crashes/warnings when picking packets
> from an empty qdisc and to corrupted state when reactivating this class later.
> 
> Signed-off-by: Konstantin Khlebnikov 
> Fixes: 86a7996cc8a0 ("net_sched: introduce qdisc_replace() helper")

Applied and queued up for -stable, thanks.

Please do not add an explicit "CC: stable" to networking patches, simply
ask me to queue it up as I handle all networking -stable submissions
myself by hand.

Thank you.
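
The ordering issue described in the quoted commit message can be sketched in
userspace as follows. This is only a model under stated assumptions: the
mock_* types and functions are hypothetical and stand in for the parent
class, the child qdisc and the notification path; they are not the
net_sched API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical child qdisc: only the queue length matters here. */
struct mock_qdisc {
	unsigned int qlen;
};

/* Hypothetical parent class that is active while its leaf has packets. */
struct mock_class {
	bool active;
	struct mock_qdisc *leaf;
};

/* Parent-side notification: deactivate only if the leaf is already empty. */
static void mock_qlen_notify(struct mock_class *cl)
{
	if (cl->leaf->qlen == 0)
		cl->active = false;
}

/* Models the backlog reduction walking up to the parent. */
static void mock_tree_reduce_backlog(struct mock_class *cl, unsigned int n)
{
	if (n)
		mock_qlen_notify(cl);
}

/* Buggy order: the parent is notified while qlen still holds the old value. */
static void mock_purge_notify_first(struct mock_class *cl)
{
	unsigned int n = cl->leaf->qlen;

	mock_tree_reduce_backlog(cl, n);
	cl->leaf->qlen = 0;
}

/* Fixed order: update qlen first, then notify the parent. */
static void mock_purge_reset_first(struct mock_class *cl)
{
	unsigned int n = cl->leaf->qlen;

	cl->leaf->qlen = 0;
	mock_tree_reduce_backlog(cl, n);
}
```

With the first variant the class stays active although its leaf is empty,
which is the missed deactivation the commit message refers to; with the
second variant the class is deactivated as expected.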

Re: [PATCH net-next] liquidio: fix use of pf in pass-through mode in a virtual machine

2017-08-20 Thread David Miller

From: Felix Manlunas 
Date: Fri, 18 Aug 2017 18:21:49 -0700

> From: Rick Farrington 
> 
> Fix problem when PF is used in pass-through mode in a VM (w/embedded f/w).
> 
> If the host gets an error reading the PF num from the CN23XX_PCIE_SRIOV_FDL reg,
> try to retrieve PF num from SLI_PKT(0)_INPUT_CONTROL (initialized by f/w).
> 
> Signed-off-by: Rick Farrington 
> Signed-off-by: Felix Manlunas 

Applied.

[PATCH] switchdev: documentation: minor typo fixes

2017-08-20 Thread Chris Packham

Two typos in switchdev.txt

Signed-off-by: Chris Packham 
---
 Documentation/networking/switchdev.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt
index 3e7b946dea27..5e40e1f68873 100644
--- a/Documentation/networking/switchdev.txt
+++ b/Documentation/networking/switchdev.txt
@@ -228,7 +228,7 @@ Learning on the device port should be enabled, as well as learning_sync:
bridge link set dev DEV learning on self
bridge link set dev DEV learning_sync on self
 
-Learning_sync attribute enables syncing of the learned/forgotton FDB entry to
+Learning_sync attribute enables syncing of the learned/forgotten FDB entry to
 the bridge's FDB.  It's possible, but not optimal, to enable learning on the
 device port and on the bridge port, and disable learning_sync.
 
@@ -245,7 +245,7 @@ the responsibility of the port driver/device to age out these entries.  If the
 port device supports ageing, when the FDB entry expires, it will notify the
 driver which in turn will notify the bridge with SWITCHDEV_FDB_DEL.  If the
 device does not support ageing, the driver can simulate ageing using a
-garbage collection timer to monitor FBD entries.  Expired entries will be
+garbage collection timer to monitor FDB entries.  Expired entries will be
 notified to the bridge using SWITCHDEV_FDB_DEL.  See rocker driver for
 example of driver running ageing timer.
 
-- 
2.14.1

Re: [PATCH net-next v3] arm: eBPF JIT compiler

2017-08-20 Thread Daniel Borkmann


On 08/19/2017 11:20 AM, Shubham Bansal wrote:
[...]

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 61a0cb1..cc31f8b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -50,7 +50,7 @@ config ARM
select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
select HAVE_ARCH_TRACEHOOK
select HAVE_ARM_SMCCC if CPU_V7
-   select HAVE_CBPF_JIT
+   select HAVE_EBPF_JIT
select HAVE_CC_STACKPROTECTOR
select HAVE_CONTEXT_TRACKING
select HAVE_C_RECORDMCOUNT
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index d5b9fa1..ea7d079 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1,6 +1,7 @@
  /*
- * Just-In-Time compiler for BPF filters on 32bit ARM
+ * Just-In-Time compiler for eBPF filters on 32bit ARM
   *
+ * Copyright (c) 2017 Shubham Bansal 
   * Copyright (c) 2011 Mircea Gherzan 
   *
   * This program is free software; you can redistribute it and/or modify it
@@ -8,6 +9,7 @@
   * Free Software Foundation; version 2 of the License.
   */

+#include 
  #include 
  #include 
  #include 
@@ -18,50 +20,96 @@
  #include 

  #include 
-#include 
  #include 
  #include 

  #include "bpf_jit_32.h"

+int bpf_jit_enable __read_mostly;


[...]

With the below #ifdef __LITTLE_ENDIAN spanning the entire
bpf_int_jit_compile(), a user can then enable and compile
eBPF JIT for big endian, even set the bpf_jit_enable to 1
to turn it on, but it won't JIT anything, which is contrary
to the expectation.

This should rather be a hard dependency in the Kconfig, if
I got it correctly, expressed as e.g.

select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32


+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
  {
+#ifdef __LITTLE_ENDIAN
+   struct bpf_prog *tmp, *orig_prog = prog;
struct bpf_binary_header *header;
+   bool tmp_blinded = false;
struct jit_ctx ctx;
-   unsigned tmp_idx;
-   unsigned alloc_size;
-   u8 *target_ptr;
+   unsigned int tmp_idx;
+   unsigned int image_size;
+   u8 *image_ptr;

+   /* If BPF JIT was not enabled then we must fall back to
+* the interpreter.
+*/
if (!bpf_jit_enable)
-   return;
+   return orig_prog;

-   memset(, 0, sizeof(ctx));
-   ctx.skf = fp;
-   ctx.ret0_fp_idx = -1;
+   /* If constant blinding was enabled and we failed during blinding
+* then we must fall back to the interpreter. Otherwise, we save
+* the new JITed code.
+*/
+   tmp = bpf_jit_blind_constants(prog);

-   ctx.offsets = kzalloc(4 * (ctx.skf->len + 1), GFP_KERNEL);
-   if (ctx.offsets == NULL)
-   return;
+   if (IS_ERR(tmp))
+   return orig_prog;
+   if (tmp != prog) {
+   tmp_blinded = true;
+   prog = tmp;
+   }
+
+   memset(, 0, sizeof(ctx));
+   ctx.prog = prog;

-   /* fake pass to fill in the ctx->seen */
-   if (unlikely(build_body()))
+   /* Not able to allocate memory for offsets[] , then
+* we must fall back to the interpreter
+*/
+   ctx.offsets = kcalloc(prog->len, sizeof(int), GFP_KERNEL);
+   if (ctx.offsets == NULL) {
+   prog = orig_prog;
goto out;
+   }
+
+   /* 1) fake pass to find in the length of the JITed code,
+* to compute ctx->offsets and other context variables
+* needed to compute final JITed code.
+* Also, calculate random starting pointer/start of JITed code
+* which is prefixed by random number of fault instructions.
+*
+* If the first pass fails then there is no chance of it
+* being successful in the second pass, so just fall back
+* to the interpreter.
+*/
+   if (build_body()) {
+   prog = orig_prog;
+   goto out_off;
+   }

tmp_idx = ctx.idx;
build_prologue();
ctx.prologue_bytes = (ctx.idx - tmp_idx) * 4;

+   ctx.epilogue_offset = ctx.idx;
+
  #if __LINUX_ARM_ARCH__ < 7
tmp_idx = ctx.idx;
build_epilogue();
@@ -1021,64 +1878,98 @@ void bpf_jit_compile(struct bpf_prog *fp)

ctx.idx += ctx.imm_count;
if (ctx.imm_count) {
-   ctx.imms = kzalloc(4 * ctx.imm_count, GFP_KERNEL);
-   if (ctx.imms == NULL)
-   goto out;
+   ctx.imms = kcalloc(ctx.imm_count, sizeof(u32), GFP_KERNEL);
+   if (ctx.imms == NULL) {
+   prog = orig_prog;
+   goto out_off;
+   }
}
  #else
-   /* there's nothing after the epilogue on ARMv7 */
+   /* there's nothing about the epilogue on ARMv7 */
build_epilogue();
  #endif
-   alloc_size = 4 * ctx.idx;
-   header = bpf_jit_binary_alloc(alloc_size, _ptr,
- 4, jit_fill_hole);
-

Re: [PATCH net-next v2] cxgb4/cxgbvf: Handle 32-bit fw port capabilities

2017-08-20 Thread David Miller

From: Ganesh Goudar 
Date: Sun, 20 Aug 2017 14:15:51 +0530

> Implement new 32-bit Firmware Port Capabilities in order to
> handle new speeds which couldn't be represented in the old 16-bit
> Firmware Port Capabilities values.
> 
> Based on the original work of Casey Leedom 
> 
> Signed-off-by: Ganesh Goudar 
> ---
> v2: Fixes build error when DCB is enabled

Applied.

Re: [PATCH v2] net: ibm: emac: Fix some error handling path in 'emac_probe()'

2017-08-20 Thread David Miller

From: Christophe JAILLET 
Date: Sun, 20 Aug 2017 06:35:00 +0200

> If 'irq_of_parse_and_map()' or 'of_address_to_resource()' fail, 'err' is
> known to be 0 at this point.
> So return -ENODEV instead in the first case and use 'of_iomap()' instead of
> the equivalent 'of_address_to_resource()/ioremap()' combination in the 2nd
> case.
> 
> Doing so, the 'rsrc_regs' field of the 'emac_instance struct' becomes
> redundant and is removed.
> 
> While at it, turn a 'err != 0' test into an equivalent 'err' to be more
> consistent.
> 
> Signed-off-by: Christophe JAILLET 
> ---
> v2: use of_iomap() to simplify code
> remove 'rsrc_regs' field of the 'emac_instance struct'
> update comment

Applied to net-next.

[PATCH 5/8] xfrm: Auto-load xfrm offload modules

2017-08-20 Thread Steffen Klassert

From: Ilan Tayari 

IPSec crypto offload depends on the protocol-specific
offload module (such as esp_offload.ko).

When the user installs an SA with crypto-offload, load
the offload module automatically, in the same way
that the protocol module is loaded (such as esp.ko)

Signed-off-by: Ilan Tayari 
Signed-off-by: Steffen Klassert 
---
 include/net/xfrm.h  |  4 +++-
 net/ipv4/esp4_offload.c |  1 +
 net/ipv6/esp6_offload.c |  1 +
 net/xfrm/xfrm_device.c  |  2 +-
 net/xfrm/xfrm_state.c   | 16 
 net/xfrm/xfrm_user.c|  2 +-
 6 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index afb4929..5a36010 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -43,6 +43,8 @@
MODULE_ALIAS("xfrm-mode-" __stringify(family) "-" __stringify(encap))
 #define MODULE_ALIAS_XFRM_TYPE(family, proto) \
MODULE_ALIAS("xfrm-type-" __stringify(family) "-" __stringify(proto))
+#define MODULE_ALIAS_XFRM_OFFLOAD_TYPE(family, proto) \
+   MODULE_ALIAS("xfrm-offload-" __stringify(family) "-" __stringify(proto))
 
 #ifdef CONFIG_XFRM_STATISTICS
 #define XFRM_INC_STATS(net, field) SNMP_INC_STATS((net)->mib.xfrm_statistics, field)
@@ -1558,7 +1560,7 @@ void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si);
 u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
 int xfrm_init_replay(struct xfrm_state *x);
 int xfrm_state_mtu(struct xfrm_state *x, int mtu);
-int __xfrm_init_state(struct xfrm_state *x, bool init_replay);
+int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload);
 int xfrm_init_state(struct xfrm_state *x);
 int xfrm_prepare_input(struct xfrm_state *x, struct sk_buff *skb);
 int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index 05831de..aca1c85 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -305,3 +305,4 @@ module_init(esp4_offload_init);
 module_exit(esp4_offload_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Steffen Klassert ");
+MODULE_ALIAS_XFRM_OFFLOAD_TYPE(AF_INET, XFRM_PROTO_ESP);
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index eec3add..8d4e2ba 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -334,3 +334,4 @@ module_init(esp6_offload_init);
 module_exit(esp6_offload_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Steffen Klassert ");
+MODULE_ALIAS_XFRM_OFFLOAD_TYPE(AF_INET6, XFRM_PROTO_ESP);
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 5cd7a24..1904127 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -63,7 +63,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
xfrm_address_t *daddr;
 
if (!x->type_offload)
-   return 0;
+   return -EINVAL;
 
/* We don't yet support UDP encapsulation, TFC padding and ESN. */
if (x->encap || x->tfcpad || (x->props.flags & XFRM_STATE_ESN))
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 82cbbce..a41e2ef 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -296,12 +296,14 @@ int xfrm_unregister_type_offload(const struct xfrm_type_offload *type,
 }
 EXPORT_SYMBOL(xfrm_unregister_type_offload);
 
-static const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto, unsigned short family)
+static const struct xfrm_type_offload *
+xfrm_get_type_offload(u8 proto, unsigned short family, bool try_load)
 {
struct xfrm_state_afinfo *afinfo;
const struct xfrm_type_offload **typemap;
const struct xfrm_type_offload *type;
 
+retry:
afinfo = xfrm_state_get_afinfo(family);
if (unlikely(afinfo == NULL))
return NULL;
@@ -311,6 +313,12 @@ static const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto, unsigned
if ((type && !try_module_get(type->owner)))
type = NULL;
 
+   if (!type && try_load) {
+   request_module("xfrm-offload-%d-%d", family, proto);
+   try_load = 0;
+   goto retry;
+   }
+
rcu_read_unlock();
return type;
 }
@@ -2165,7 +2173,7 @@ int xfrm_state_mtu(struct xfrm_state *x, int mtu)
return mtu - x->props.header_len;
 }
 
-int __xfrm_init_state(struct xfrm_state *x, bool init_replay)
+int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload)
 {
struct xfrm_state_afinfo *afinfo;
struct xfrm_mode *inner_mode;
@@ -2230,7 +2238,7 @@ int __xfrm_init_state(struct xfrm_state *x, bool init_replay)
if (x->type == NULL)
goto error;
 
-   x->type_offload = xfrm_get_type_offload(x->id.proto, family);
+   x->type_offload = xfrm_get_type_offload(x->id.proto, family, offload);
 
err = x->type->init_state(x);
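
The "look up, request the module once, then retry" pattern described in the
commit message can be sketched in userspace like this. It is a model under
stated assumptions, not the xfrm code: all mock_* names are hypothetical, and
mock_request_module() merely stands in for a loader that registers the ESP
offload type (IP protocol 50) when asked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_PROTO 256

/* Hypothetical offload type descriptor keyed by IP protocol number. */
struct mock_offload_type {
	int proto;
};

/* Hypothetical registry of offload types that are currently "loaded". */
static const struct mock_offload_type *mock_typemap[MOCK_MAX_PROTO];
static int mock_load_requests;

static const struct mock_offload_type mock_esp_offload = { 50 };

/* Stand-in for a module loader: only the ESP offload "module" exists. */
static void mock_request_module(int proto)
{
	mock_load_requests++;
	if (proto == mock_esp_offload.proto)
		mock_typemap[proto] = &mock_esp_offload;
}

/*
 * Look the type up; if it is missing and the caller allows loading,
 * request the module exactly once and look again.
 */
static const struct mock_offload_type *
mock_get_type_offload(int proto, bool try_load)
{
	const struct mock_offload_type *type;

	if (proto < 0 || proto >= MOCK_MAX_PROTO)
		return NULL;
retry:
	type = mock_typemap[proto];
	if (!type && try_load) {
		mock_request_module(proto);
		try_load = false;	/* never loop more than once */
		goto retry;
	}
	return type;
}
```

Clearing try_load before the retry bounds the loop: a protocol with no
offload module costs one load request and then returns NULL.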

[PATCH 4/8] esp6: Fix RX checksum after header pull

2017-08-20 Thread Steffen Klassert

From: Yossi Kuperman 

Both ip6_input_finish (non-GRO) and esp6_gro_receive (GRO) strip
the IPv6 header without adjusting skb->csum accordingly. As a
result CHECKSUM_COMPLETE breaks and "hw csum failure" is written
to the kernel log by netdev_rx_csum_fault (dev.c).

Fix skb->csum by subtracting the checksum value of the pulled IPv6
header using a call to skb_postpull_rcsum.

This affects both transport and tunnel modes.

Note that the fix occurs far from the place that the header was
pulled. This is based on existing code, see:
ipv6_srh_rcv() in exthdrs.c and rawv6_rcv() in raw.c

Signed-off-by: Yossi Kuperman 
Signed-off-by: Ilan Tayari 
Signed-off-by: Steffen Klassert 
---
 net/ipv6/esp6.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 0ca1db6..74bde20 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -495,6 +495,8 @@ int esp6_input_done2(struct sk_buff *skb, int err)
 
trimlen = alen + padlen + 2;
if (skb->ip_summed == CHECKSUM_COMPLETE) {
+   skb_postpull_rcsum(skb, skb_network_header(skb),
+  skb_network_header_len(skb));
csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
skb->csum = csum_block_sub(skb->csum, csumdiff,
   skb->len - trimlen);
-- 
2.7.4
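
As a worked illustration of why the pulled header has to be subtracted, here
is a userspace sketch of the 16-bit one's complement arithmetic involved. It
assumes an even-length header and uses hypothetical helper names
(csum_add_bytes, csum_fold16, csum_sub16); it is not the kernel's
csum_partial() or skb_postpull_rcsum() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate big-endian 16-bit words of a buffer into a 32-bit sum. */
static uint32_t csum_add_bytes(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (i < len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)p[i] << 8;
	return sum;
}

/* Fold the carries back in until the sum fits in 16 bits. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's complement subtraction: a - b is a + ~b with end-around carry. */
static uint16_t csum_sub16(uint16_t a, uint16_t b)
{
	return csum_fold16((uint32_t)a + (uint16_t)~b);
}
```

If a device reports the sum over header plus payload and the header is then
pulled, the stored value no longer matches the remaining data; subtracting
the header's own sum with csum_sub16() restores a value that matches the
payload alone.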

Re: [PATCH net-next v3] arm: eBPF JIT compiler

2017-08-20 Thread Shubham Bansal

> With the below #ifdef __LITTLE_ENDIAN spanning the entire
> bpf_int_jit_compile(), a user can then enable and compile
> eBPF JIT for big endian, even set the bpf_jit_enable to 1
> to turn it on, but it won't JIT anything, which is contrary
> to the expectation.
>
> This should rather be a hard dependency in the Kconfig, if
> I got it correctly, expressed as e.g.
>
> select HAVE_EBPF_JIT if !CPU_ENDIAN_BE32

Will do it. That's a good catch. Thanks Daniel.

[PATCH net-next v7 10/10] landlock: Add user and kernel documentation for Landlock

2017-08-20 Thread Mickaël Salaün

This documentation can be built with the Sphinx framework.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Jonathan Corbet 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v6:
* add a check for ctx->event
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* rename Landlock version to ABI to better reflect its purpose and add a
  dedicated changelog section
* update tables
* relax no_new_privs recommendations
* remove ABILITY_WRITE related functions
* reword rule "appending" to "prepending" and explain it
* cosmetic fixes

Changes since v5:
* update the rule hierarchy inheritance explanation
* briefly explain ctx->arg2
* add ptrace restrictions
* explain EPERM
* update example (subtype)
* use ":manpage:"
---
 Documentation/security/index.rst   |   1 +
 Documentation/security/landlock/index.rst  |  19 ++
 Documentation/security/landlock/kernel.rst | 132 
 Documentation/security/landlock/user.rst   | 313 +
 4 files changed, 465 insertions(+)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst

diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 298a94a33f05..1db294025d0f 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -11,3 +11,4 @@ Security Documentation
LSM
self-protection
tpm/index
+   landlock/index
diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst
new file mode 100644
index ..8afde6a5805c
--- /dev/null
+++ b/Documentation/security/landlock/index.rst
@@ -0,0 +1,19 @@
+=========================================
+Landlock LSM: programmatic access control
+=========================================
+
+Landlock is a stackable Linux Security Module (LSM) that makes it possible to
+create security sandboxes.  This kind of sandbox is expected to help mitigate
+the security impact of bugs or unexpected/malicious behaviors in user-space
+applications.  The current version allows only a process with the global
+CAP_SYS_ADMIN capability to create such sandboxes but the ultimate goal of
+Landlock is to empower any process, including unprivileged ones, to securely
+restrict themselves.  Landlock is inspired by seccomp-bpf but instead of
+filtering syscalls and their raw arguments, a Landlock rule can inspect the use
+of kernel objects like files and hence make a decision according to the kernel
+semantic.
+
+.. toctree::
+
+user
+kernel
diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
new file mode 100644
index ..560711835ce8
--- /dev/null
+++ b/Documentation/security/landlock/kernel.rst
@@ -0,0 +1,132 @@
+==============================
+Landlock: kernel documentation
+==============================
+
+eBPF properties
+===============
+
+To get an expressive language while still being safe and small, Landlock is
+based on eBPF. Landlock should be usable by untrusted processes and must
+therefore expose a minimal attack surface. The eBPF bytecode is minimal,
+powerful, widely used and designed to be used by untrusted applications. Thus,
+reusing the eBPF support in the kernel enables a generic approach while
+minimizing new code.
+
+An eBPF program has access to an eBPF context containing some fields including
+event arguments (i.e. arg1 and arg2). These arguments can be used directly or
+passed to helper functions according to their types. It is then possible to do
+complex access checks without race conditions or inconsistent evaluation (i.e.
+`incorrect mirroring of the OS code and state
+`_).
+
+A Landlock event describes a particular access type.  For now, there is only
+one event type dedicated to filesystem related operations:
+LANDLOCK_SUBTYPE_EVENT_FS.  A Landlock rule is tied to one event type.  This
+makes it possible to statically check context accesses, potentially performed
+by such rule, and hence prevents kernel address leaks and ensure the right use
+of event arguments with eBPF functions.  Any user can add multiple Landlock
+rules per Landlock event.  They are stacked and evaluated one after the other,
+starting from the most recent rule, as seccomp-bpf does with its filters.
+Underneath, an event is an abstraction over a set of LSM hooks.
+
+
+Guiding principles
+==================
+
+Unprivileged use
+
+
+* Everything potentially security sensitive which is exposed to a Landlock

[PATCH net-next v7 02/10] bpf: Add eBPF program subtype and is_valid_subtype() verifier

2017-08-20 Thread Mickaël Salaün

The goal of the program subtype is to be able to have different static
fine-grained verifications for a unique program type.

The struct bpf_verifier_ops gets a new optional function:
is_valid_subtype(). This new verifier is called at the beginning of the
eBPF program verification to check if the (optional) program subtype is
valid.

For now, only Landlock eBPF programs are using a program subtype (see
next commit) but this could be used by other program types in the future.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Arnaldo Carvalho de Melo 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Link: https://lkml.kernel.org/r/20160827205559.ga43...@ast-mbp.thefacebook.com
---

Changes since v6:
* rename Landlock version to ABI to better reflect its purpose
* fix unsigned integer checks
* fix pointer cast
* constify pointers
* rebase

Changes since v5:
* use a prog_subtype pointer and make it future-proof
* add subtype test
* constify bpf_load_program()'s subtype argument
* cleanup subtype initialization
* rebase

Changes since v4:
* replace the "status" field with "version" (more generic)
* replace the "access" field with "ability" (less confusing)

Changes since v3:
* remove the "origin" field
* add an "option" field
* cleanup comments
---
 include/linux/bpf.h |  7 ++-
 include/linux/filter.h  |  2 +
 include/uapi/linux/bpf.h| 11 +
 kernel/bpf/syscall.c| 22 -
 kernel/bpf/verifier.c   | 17 +--
 kernel/trace/bpf_trace.c| 15 --
 net/core/filter.c   | 71 ++---
 samples/bpf/bpf_load.c  |  3 +-
 samples/bpf/cookie_uid_helper_example.c |  2 +-
 samples/bpf/fds_example.c   |  2 +-
 samples/bpf/sock_example.c  |  3 +-
 samples/bpf/test_cgrp2_attach.c |  2 +-
 samples/bpf/test_cgrp2_attach2.c|  2 +-
 samples/bpf/test_cgrp2_sock.c   |  2 +-
 tools/include/uapi/linux/bpf.h  | 11 +
 tools/lib/bpf/bpf.c | 10 +++-
 tools/lib/bpf/bpf.h |  5 +-
 tools/lib/bpf/libbpf.c  |  4 +-
 tools/perf/tests/bpf.c  |  2 +-
 tools/testing/selftests/bpf/test_align.c|  2 +-
 tools/testing/selftests/bpf/test_tag.c  |  2 +-
 tools/testing/selftests/bpf/test_verifier.c | 17 ++-
 22 files changed, 158 insertions(+), 56 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 830f472d8df5..aef2e6f6d763 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -159,13 +159,15 @@ bpf_ctx_record_field_size(struct bpf_insn_access_aux *aux, u32 size)
 
 struct bpf_verifier_ops {
/* return eBPF function prototype for verification */
-   const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id);
+   const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id,
+ const union bpf_prog_subtype *prog_subtype);
 
/* return true if 'size' wide access at offset 'off' within bpf_context
 * with 'type' (read or write) is allowed
 */
bool (*is_valid_access)(int off, int size, enum bpf_access_type type,
-   struct bpf_insn_access_aux *info);
+   struct bpf_insn_access_aux *info,
+   const union bpf_prog_subtype *prog_subtype);
int (*gen_prologue)(struct bpf_insn *insn, bool direct_write,
const struct bpf_prog *prog);
u32 (*convert_ctx_access)(enum bpf_access_type type,
@@ -174,6 +176,7 @@ struct bpf_verifier_ops {
  struct bpf_prog *prog, u32 *target_size);
int (*test_run)(struct bpf_prog *prog, const union bpf_attr *kattr,
union bpf_attr __user *uattr);
+   bool (*is_valid_subtype)(const union bpf_prog_subtype *prog_subtype);
 };
 
 struct bpf_prog_aux {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 7015116331af..0c3fadbb5a58 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -464,6 +464,8 @@ struct bpf_prog {
u32 len;/* Number of filter blocks */
u32 jited_len;  /* Size of jited insns in bytes */
u8  tag[BPF_TAG_SIZE];
+   u8  has_subtype;
+   union bpf_prog_subtype  subtype;/* Fine-grained verifications */
struct bpf_prog_aux *aux;   /* Auxiliary fields */
struct sock_fprog_kern  *orig_prog; /* Original BPF program */
unsigned int(*bpf_func)(const void *ctx,
diff --git a/include/uapi/linux/bpf.h

[PATCH net-next v7 08/10] bpf: Add a Landlock sandbox example

2017-08-20 Thread Mickaël Salaün

Add a basic sandbox tool to create a process isolated from some part of
the system. This sandbox creates a read-only environment. It is only
allowed to write to a character device such as a TTY:

  # :> X
  # echo $?
  0
  # ./samples/bpf/landlock1 /bin/sh -i
  Launching a new sandboxed process.
  # :> Y
  cannot create Y: Operation not permitted

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v6:
* check return value of load_and_attach()
* allow to write on pipes
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* rename Landlock version to ABI to better reflect its purpose
* use const variable (suggested by Kees Cook)
* remove useless definitions (suggested by Kees Cook)
* add detailed explanations (suggested by Kees Cook)

Changes since v5:
* cosmetic fixes
* rebase

Changes since v4:
* write Landlock rule in C and compiled it with LLVM
* remove cgroup handling
* remove path handling: only handle a read-only environment
* remove errno return codes

Changes since v3:
* remove seccomp and origin field: completely free from seccomp programs
* handle more FS-related hooks
* handle inode hooks and directory traversal
* add faked but consistent view thanks to ENOENT
* add /lib64 in the example
* fix spelling
* rename some types and definitions (e.g. SECCOMP_ADD_LANDLOCK_RULE)

Changes since v2:
* use BPF_PROG_ATTACH for cgroup handling
---
 samples/bpf/Makefile |   4 ++
 samples/bpf/bpf_load.c   |  28 ++--
 samples/bpf/landlock1_kern.c | 100 +++
 samples/bpf/landlock1_user.c | 100 +++
 4 files changed, 229 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/landlock1_kern.c
 create mode 100644 samples/bpf/landlock1_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index f1010fe759fe..08d5d728e3e0 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -40,6 +40,7 @@ hostprogs-y += load_sock_ops
 hostprogs-y += xdp_redirect
 hostprogs-y += xdp_redirect_map
 hostprogs-y += syscall_tp
+hostprogs-y += landlock1
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o
@@ -84,6 +85,7 @@ per_socket_stats_example-objs := $(LIBBPF) cookie_uid_helper_example.o
 xdp_redirect-objs := bpf_load.o $(LIBBPF) xdp_redirect_user.o
 xdp_redirect_map-objs := bpf_load.o $(LIBBPF) xdp_redirect_map_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
+landlock1-objs := bpf_load.o $(LIBBPF) landlock1_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -128,6 +130,7 @@ always += tcp_clamp_kern.o
 always += xdp_redirect_kern.o
 always += xdp_redirect_map_kern.o
 always += syscall_tp_kern.o
+always += landlock1_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -167,6 +170,7 @@ HOSTLOADLIBES_test_map_in_map += -lelf
 HOSTLOADLIBES_xdp_redirect += -lelf
 HOSTLOADLIBES_xdp_redirect_map += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
+HOSTLOADLIBES_landlock1 += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 01a506f768da..30fcddda8b81 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -31,6 +31,8 @@
 
 static char license[128];
 static int kern_version;
+static union bpf_prog_subtype subtype = {};
+static bool has_subtype;
 static bool processed_sec[128];
 char bpf_log_buf[BPF_LOG_BUF_SIZE];
 int map_fd[MAX_MAPS];
@@ -66,6 +68,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
bool is_sockops = strncmp(event, "sockops", 7) == 0;
bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
+   bool is_landlock = strncmp(event, "landlock", 8) == 0;
size_t insns_cnt = size / sizeof(struct bpf_insn);
enum bpf_prog_type prog_type;
char buf[256];
@@ -96,6 +99,13 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
prog_type = BPF_PROG_TYPE_SOCK_OPS;
} else if (is_sk_skb) {
prog_type = BPF_PROG_TYPE_SK_SKB;
+   } else if (is_landlock) {
+   prog_type = BPF_PROG_TYPE_LANDLOCK_RULE;
+   if (!has_subtype) {
+   printf("No subtype\n");
+   return -1;
+   }
+   st = 
} else {
printf("Unknown event '%s'\n", event);
return -1;
@@ -110,7 +120,8 @@ static int load_and_attach(const

[PATCH net-next v7 07/10] landlock: Add ptrace restrictions

2017-08-20 Thread Mickaël Salaün

A landlocked process has fewer privileges than a non-landlocked process
and must then be subject to additional restrictions when manipulating
processes. To be allowed to use ptrace(2) and related syscalls on a
target process, a landlocked process must have a subset of the target
process' rules.
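
The subset requirement above can be sketched in user space as follows. This
is a simplified model, not the kernel code: struct rule, struct events,
NR_EVENTS and may_ptrace() are hypothetical names, and a NULL events pointer
is assumed to stand for a task that is not landlocked.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_EVENTS 2 /* hypothetical number of event slots */

/* Rules are stacked: each new rule points back to the one it extends. */
struct rule {
	struct rule *prev;
};

/* Per-task view: the newest rule for each event (NULL if none). */
struct events {
	struct rule *rules[NR_EVENTS];
};

/*
 * Return true if every rule chain of @parent is contained in the
 * corresponding chain of @child, i.e. the child carries at least the
 * parent's rules.
 */
static bool events_are_subset(const struct events *parent,
			      const struct events *child)
{
	size_t i;

	if (!parent || !child)
		return false;
	if (parent == child)
		return true;

	for (i = 0; i < NR_EVENTS; i++) {
		const struct rule *walker;
		bool found = false;

		if (!parent->rules[i])
			continue;
		/* Walk the child's chain looking for the parent's head. */
		for (walker = child->rules[i]; walker; walker = walker->prev) {
			if (walker == parent->rules[i]) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;
	}
	return true;
}

/* Assumption: NULL means "not landlocked" in this model. */
static int may_ptrace(const struct events *tracer,
		      const struct events *tracee)
{
	if (!tracer)
		return 0;	/* unrestricted tracer: no extra check */
	if (!tracee)
		return -EPERM;	/* tracee is less restricted than tracer */
	return events_are_subset(tracer, tracee) ? 0 : -EPERM;
}
```

In this model a tracer holding only a base rule may trace a task that
stacked an extra rule on top of it, but not the other way around.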

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v6:
* factor out ptrace check
* constify pointers
* cleanup headers
* use the new security_add_hooks()
---
 security/landlock/Makefile   |   2 +-
 security/landlock/hooks_ptrace.c | 123 +++
 security/landlock/hooks_ptrace.h |  11 
 security/landlock/init.c |   2 +
 4 files changed, 137 insertions(+), 1 deletion(-)
 create mode 100644 security/landlock/hooks_ptrace.c
 create mode 100644 security/landlock/hooks_ptrace.h

diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 8153b024ffd7..7ff911328e74 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -5,4 +5,4 @@ ccflags-$(CONFIG_SECURITY_LANDLOCK) += -Werror=unused-function
 
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := init.o providers.o hooks.o hooks_fs.o
+landlock-y := init.o providers.o hooks.o hooks_ptrace.o hooks_fs.o
diff --git a/security/landlock/hooks_ptrace.c b/security/landlock/hooks_ptrace.c
new file mode 100644
index ..0f1c13172f54
--- /dev/null
+++ b/security/landlock/hooks_ptrace.c
@@ -0,0 +1,123 @@
+/*
+ * Landlock LSM - ptrace hooks
+ *
+ * Copyright © 2017 Mickaël Salaün 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include  /* ARRAY_SIZE */
+#include 
+#include  /* struct task_struct */
+#include 
+
+#include "common.h" /* struct landlock_events */
+#include "hooks.h" /* landlocked() */
+#include "hooks_ptrace.h"
+
+
+static bool landlock_events_are_subset(const struct landlock_events *parent,
+   const struct landlock_events *child)
+{
+   size_t i;
+
+   if (!parent || !child)
+   return false;
+   if (parent == child)
+   return true;
+
+   for (i = 0; i < ARRAY_SIZE(child->rules); i++) {
+   struct landlock_rule *walker;
+   bool found_parent = false;
+
+   if (!parent->rules[i])
+   continue;
+   for (walker = child->rules[i]; walker; walker = walker->prev) {
+   if (walker == parent->rules[i]) {
+   found_parent = true;
+   break;
+   }
+   }
+   if (!found_parent)
+   return false;
+   }
+   return true;
+}
+
+static bool landlock_task_has_subset_events(const struct task_struct *parent,
+   const struct task_struct *child)
+{
+#ifdef CONFIG_SECCOMP_FILTER
+   if (landlock_events_are_subset(parent->seccomp.landlock_events,
+   child->seccomp.landlock_events))
+   /* must be ANDed with other providers (i.e. cgroup) */
+   return true;
+#endif /* CONFIG_SECCOMP_FILTER */
+   return false;
+}
+
+static int landlock_task_ptrace(const struct task_struct *parent,
+   const struct task_struct *child)
+{
+   if (!landlocked(parent))
+   return 0;
+
+   if (!landlocked(child))
+   return -EPERM;
+
+   if (landlock_task_has_subset_events(parent, child))
+   return 0;
+
+   return -EPERM;
+}
+
+/**
+ * landlock_ptrace_access_check - determine whether the current process may
+ *   access another
+ *
+ * @child: the process to be accessed
+ * @mode: the mode of attachment
+ *
+ * If the current task has Landlock rules, then the child must have at least
+ * the same rules.  Else denied.
+ *
+ * Determine whether a process may access another, returning 0 if permission
+ * granted, -errno if denied.
+ */
+static int landlock_ptrace_access_check(struct task_struct *child,
+   unsigned int mode)
+{
+   return landlock_task_ptrace(current, child);
+}
+
+/**
+ * landlock_ptrace_traceme - determine whether another process may trace the
+ *  current one
+ *
+ * @parent: the task proposed to be the tracer
+ *
+ * If the parent has Landlock rules, then the current task must have the same
+ * or more rules.
+ * Else denied.
+ *
+ * Determine whether the nominated task is permitted to trace the current
+ * process, returning 0 if

[PATCH net-next v7 09/10] bpf,landlock: Add tests for Landlock

2017-08-20 Thread Mickaël Salaün

Test basic context access, ptrace protection and filesystem event with
multiple cases.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Cc: Shuah Khan 
Cc: Will Drewry 
---

Changes since v6:
* use the new kselftest_harness.h
* use const variables
* replace ASSERT_STEP with ASSERT_*
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE
* force sample library rebuild
* fix install target

Changes since v5:
* add subtype test
* add ptrace tests
* split and rename files
* cleanup and rebase
---
 tools/testing/selftests/Makefile   |   1 +
 tools/testing/selftests/bpf/test_verifier.c|  55 
 tools/testing/selftests/landlock/.gitignore|   5 +
 tools/testing/selftests/landlock/Makefile  |  48 
 tools/testing/selftests/landlock/bpf/Makefile  |  55 
 tools/testing/selftests/landlock/bpf/README.rst|   1 +
 .../selftests/landlock/bpf/rule_fs_no_open.c   |  32 +++
 .../selftests/landlock/bpf/rule_fs_read_only.c |  32 +++
 tools/testing/selftests/landlock/test.h|  28 ++
 tools/testing/selftests/landlock/test_base.c   |  27 ++
 tools/testing/selftests/landlock/test_fs.c | 296 +
 tools/testing/selftests/landlock/test_ptrace.c | 158 +++
 12 files changed, 738 insertions(+)
 create mode 100644 tools/testing/selftests/landlock/.gitignore
 create mode 100644 tools/testing/selftests/landlock/Makefile
 create mode 100644 tools/testing/selftests/landlock/bpf/Makefile
 create mode 12 tools/testing/selftests/landlock/bpf/README.rst
 create mode 100644 tools/testing/selftests/landlock/bpf/rule_fs_no_open.c
 create mode 100644 tools/testing/selftests/landlock/bpf/rule_fs_read_only.c
 create mode 100644 tools/testing/selftests/landlock/test.h
 create mode 100644 tools/testing/selftests/landlock/test_base.c
 create mode 100644 tools/testing/selftests/landlock/test_fs.c
 create mode 100644 tools/testing/selftests/landlock/test_ptrace.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 26ce4f7168be..099d19950739 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -12,6 +12,7 @@ TARGETS += gpio
 TARGETS += intel_pstate
 TARGETS += ipc
 TARGETS += kcmp
+TARGETS += landlock
 TARGETS += lib
 TARGETS += membarrier
 TARGETS += memfd
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 3146839a51bf..9fb19c975c1b 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -6499,6 +6499,61 @@ static struct bpf_test tests[] = {
.result = REJECT,
.has_prog_subtype = true,
},
+   {
+   "missing subtype",
+   .insns = {
+   BPF_MOV32_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .errstr = "",
+   .result = REJECT,
+   .prog_type = BPF_PROG_TYPE_LANDLOCK_RULE,
+   },
+   {
+   "landlock/fs: always accept",
+   .insns = {
+   BPF_MOV32_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   .prog_type = BPF_PROG_TYPE_LANDLOCK_RULE,
+   .has_prog_subtype = true,
+   .prog_subtype = {
+   .landlock_rule = {
+   .abi = 1,
+   .event = LANDLOCK_SUBTYPE_EVENT_FS,
+   }
+   },
+   },
+   {
+   "landlock/fs: read context",
+   .insns = {
+   BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+   offsetof(struct landlock_context, status)),
+   /* test operations on raw values */
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, 1),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+   offsetof(struct landlock_context, event)),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, 1),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+   offsetof(struct landlock_context, arg1)),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_7, BPF_REG_6,
+   offsetof(struct landlock_context, arg2)),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, 1),
+   BPF_MOV32_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),

[PATCH net-next v7 00/10] Landlock LSM: Toward unprivileged sandboxing

2017-08-20 Thread Mickaël Salaün

Hi,

This seventh series adds some changes to the previous one [1], including a
simplified landlock_context, architecture-independent rules, more documentation
and multiple fixes.

As planned [6], I simplified the FS event and made it more generic for the
IOCTL, LOCK or FCNTL actions. The action flags for the
LANDLOCK_SUBTYPE_EVENT_FS event remain the same but the syscall_cmd field is
removed from struct landlock_context. Instead, one of three dedicated events
is triggered in addition to one of these three multiplexed actions.  The aim
is to trigger the LANDLOCK_SUBTYPE_EVENT_FS for all file system events (still
including IOCTL/LOCK/FCNTL actions). This should prevent a developer/user from
forgetting such actions. However, when this kind of action is triggered, a
LANDLOCK_SUBTYPE_EVENT_FS_{IOCTL,LOCK,FCNTL} event will follow. This makes it
possible to simplify the struct landlock_context while still keeping it as
generic as possible. The difference will be that the arg2 field for one of the
LANDLOCK_SUBTYPE_EVENT_FS_{IOCTL,LOCK,FCNTL} events will contain a custom
IOCTL, LOCK or FCNTL command (previously in the syscall_cmd field) instead of a
LANDLOCK_ACTION_FS_* value. The same logic could be used to tighten other
actions in the future (e.g. add a LANDLOCK_SUBTYPE_EVENT_FS_RENAME).

I also removed the arch and syscall_nr fields, which results in a simpler
and architecture-independent landlock_context.

The documentation patch contains some kernel documentation and explanations on
how to use Landlock.  The compiled documentation can be found here:
https://landlock-lsm.github.io/linux-doc/landlock-v7/security/landlock/index.html

This is the first step of the roadmap discussed at LPC [2] (with the
inheritance feature included).  While the intended final goal is to allow
unprivileged users to use Landlock, this series allows only a process with
global CAP_SYS_ADMIN to load and enforce a rule.  This may help to get feedback
and avoid unexpected behaviors.

This series can be applied on top of net-next, commit d6e1e46f69fb ("bpf:
linux/bpf.h needs linux/numa.h").  This can be tested with
CONFIG_SECCOMP_FILTER and CONFIG_SECURITY_LANDLOCK.  I would really appreciate
constructive comments on the usability, architecture, code, userland API or use
cases.


# Landlock LSM

The goal of this new stackable Linux Security Module (LSM) called Landlock is
to allow any process, including unprivileged ones, to create powerful security
sandboxes comparable to XNU Sandbox or OpenBSD Pledge. This kind of sandbox is
expected to help mitigate the security impact of bugs or unexpected/malicious
behaviors in user-space applications.

The approach taken is to add the minimum amount of code while still allowing
the user-space application to create quite complex access rules.  A dedicated
security policy language such as the one used by SELinux, AppArmor and other
major LSMs involves a lot of code and is usually permitted to only a trusted
user (i.e. root).  On the contrary, eBPF programs already exist and are
designed to be safely loaded by unprivileged user-space.

This design does not seem too intrusive but is flexible enough to allow a
powerful sandbox mechanism accessible by any process on Linux. The use of
seccomp and Landlock is more suitable with the help of a user-space library
(e.g.  libseccomp) that could help to specify a high-level language to express
a security policy instead of raw eBPF programs. Moreover, thanks to the LLVM
front-end, it is quite easy to write an eBPF program with a subset of the C
language.


# Landlock events and rule enforcement

Unlike syscalls, LSM hooks are security checkpoints and are not architecture
dependent. They are designed to match a security need associated with a
security policy (e.g. access to a file).  The approach taken for Landlock is to
abstract these hooks with Landlock events such as a generic filesystem event
(LANDLOCK_SUBTYPE_EVENT_FS).  Further explanations can be found in the
documentation.

This series uses seccomp(2) only as an entry point to apply a rule to the
calling process and its future children.  It is planned to restore the ability
to use cgroup as an alternative way to enforce a Landlock rule.

There is as yet no way to allow a process to access only a subset of the
filesystem where the subset is specified via a path or a file descriptor.  This
feature is intentionally left out so as to minimize the amount of code of this
patch series but will come in a following series.  However, it is possible to
check the file type, as done in the following example.


# Sandbox example with a read-only filesystem

This example is provided in the samples/bpf directory.  It creates a read-only
environment for all kinds of file access except for character devices such as a
TTY.

  # :> X
  # echo $?
  0
  # ./samples/bpf/landlock1 /bin/sh -i
  Launching a new sandboxed process.
  # :> Y
  cannot create Y: Operation not permitted


# Warning on read-only filesystems

Other than owing a mount namespace

Re: [PATCH net-next] virtio-net: make napi_tx param easier to grasp

2017-08-20 Thread Willem de Bruijn

On Sat, Aug 19, 2017 at 2:37 AM, Koichiro Den  wrote:
> The module param napi_tx needs not to be writable for now since we do
> not have any means of activating/deactivating it online,

A virtio_net device inherits its napi tx mode from the global napi_tx flag
on device up. It is possible to change the parameter and bring a device
down/up to change the device mode.

> @@ -1179,13 +1172,19 @@ static int virtnet_open(struct net_device *dev)
> struct virtnet_info *vi = netdev_priv(dev);
> int i;
>
> +   /* Tx napi touches cachelines on the cpu handling tx interrupts. Only
> +* enable the feature if this is likely affine with the transmit path.
> +*/
> +   if (!vi->affinity_hint_set)
> +   napi_tx = false;
> +

This disables napi globally if a specific device lacks affinity.
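
A simplified user-space model of the concern, assuming (as described above)
that a device picks up its tx napi mode from the global flag when it is
brought up. struct vdev, open_clobbering() and open_per_device() are
hypothetical names used only for illustration; they are not driver code.

```c
#include <stdbool.h>

/* Module-wide parameter shared by every device. */
static bool napi_tx = true;

struct vdev {
	bool affinity_hint_set;
	bool tx_napi;	/* mode inherited at open time */
};

/* Mirrors the quoted hunk: one device without affinity clears the global. */
static void open_clobbering(struct vdev *d)
{
	if (!d->affinity_hint_set)
		napi_tx = false;
	d->tx_napi = napi_tx;
}

/* One possible alternative: decide per device, leave the global alone. */
static void open_per_device(struct vdev *d)
{
	d->tx_napi = napi_tx && d->affinity_hint_set;
}
```

With open_clobbering(), bringing up a device without an affinity hint also
turns tx napi off for every device opened afterwards, even one whose
affinity is set. With open_per_device() only the device lacking affinity is
affected.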

Re: [PATCH net-next] bpf: fix double free from dev_map_notification()

2017-08-20 Thread David Miller

From: Daniel Borkmann 
Date: Mon, 21 Aug 2017 01:48:12 +0200

> In the current code, dev_map_free() can still race with 
> dev_map_notification().
> In dev_map_free(), we remove dtab from the list of dtabs after we purged
> all entries from it. However, we don't do xchg() with NULL or the like,
> so the entry at that point is still pointing to the device. If a unregister
> notification comes in at the same time, we therefore risk a double-free,
> since the pointer is still present in the map, and then pushed again to
> __dev_map_entry_free().
> 
> All this is completely unnecessary. Just remove the dtab from the list
> right before the synchronize_rcu(), so all outstanding readers from the
> notifier list have finished by then, thus we don't need to deal with this
> corner case anymore and also wouldn't need to nullify dev entries. This is
> fine because we iterate over the map releasing all entries and therefore
> dev references anyway.
> 
> Fixes: 4cc7b9544b9a ("bpf: devmap fix mutex in rcu critical section")
> Signed-off-by: Daniel Borkmann 

Applied, thanks Daniel.

Re: [PATCH net] bpf, doc: also add s390x as arch to sysctl description

2017-08-20 Thread David Miller

From: Daniel Borkmann 
Date: Mon, 21 Aug 2017 00:26:03 +0200

> Looks like this was accidentally missed, so still add s390x
> as supported eBPF JIT arch to bpf_jit_enable.
> 
> Fixes: 014cd0a368dc ("bpf: Update sysctl documentation to list all supported architectures")
> Signed-off-by: Daniel Borkmann 

Applied, thanks.

Re: [PATCH net v2] ipv6: add rcu grace period before freeing fib6_node

2017-08-20 Thread David Miller

From: Wei Wang 
Date: Sat, 19 Aug 2017 17:34:08 -0700

> From: Wei Wang 
> 
> We currently keep rt->rt6i_node pointing to the fib6_node for the route.
> And some functions make use of this pointer to dereference the fib6_node
> from rt structure, e.g. rt6_check(). However, as there is neither
> refcount nor rcu taken when dereferencing rt->rt6i_node, it could
> potentially cause crashes as rt->rt6i_node could be set to NULL by other
> CPUs when doing a route deletion.
> This patch introduces an rcu grace period before freeing fib6_node and
> makes sure the functions that dereference it takes rcu_read_lock().
> 
> Note: there is no "Fixes" tag because this bug was there in a very
> early stage.
> 
> Signed-off-by: Wei Wang 
> Acked-by: Eric Dumazet 
> ---
> v2: removed one extra empty line

Goodness where to start.

If this bug has been around forever, why did you make this patch
against net-next instead of net?  (I can tell just by looking at
the patch because rt6_free_pcpu() is static in 'net' yet it is
not static in the diff hunk which matches net-next)

And if you made it against net-next, why are you saying "net" in
your subject line instead of "[PATCH net-next v2]"?

Please sort this out properly, and resubmit.

Thank you.

Re: [pull request][V2 net-next 00/15] Mellanox, mlx5 updates 2017-08-17

2017-08-20 Thread David Miller

From: Saeed Mahameed 
Date: Sun, 20 Aug 2017 16:49:01 +0300

> The following changes provide updates for mlx5 ethernet and IPoIB
> netdevice driver.

Pulled, thanks Saeed.

Re: [PATCH net-next] bpf: fix double free from dev_map_notification()

2017-08-20 Thread Alexei Starovoitov


On 8/20/17 4:48 PM, Daniel Borkmann wrote:

In the current code, dev_map_free() can still race with dev_map_notification().
In dev_map_free(), we remove dtab from the list of dtabs after we purged
all entries from it. However, we don't do xchg() with NULL or the like,
so the entry at that point is still pointing to the device. If a unregister
notification comes in at the same time, we therefore risk a double-free,
since the pointer is still present in the map, and then pushed again to
__dev_map_entry_free().

All this is completely unnecessary. Just remove the dtab from the list
right before the synchronize_rcu(), so all outstanding readers from the
notifier list have finished by then, thus we don't need to deal with this
corner case anymore and also wouldn't need to nullify dev entries. This is
fine because we iterate over the map releasing all entries and therefore
dev references anyway.

Fixes: 4cc7b9544b9a ("bpf: devmap fix mutex in rcu critical section")
Signed-off-by: Daniel Borkmann 


makes sense to me
Acked-by: Alexei Starovoitov 
I wonder why it was done the other way around in the first place then?
dev_map_list is there only for notifier and since the map is freed
with all the devices totally makes sense to isolate it from notifier
as a first step.

Re: [PATCH] tools lib bpf: improve warning

2017-08-20 Thread Alexei Starovoitov

On Sun, Aug 20, 2017 at 09:48:14PM +0200, Eric Leblond wrote:
> Signed-off-by: Eric Leblond 
> ---
>  tools/lib/bpf/libbpf.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 1cc3ea0ffdc3..35f6dfcdc565 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -879,7 +879,8 @@ bpf_object__create_maps(struct bpf_object *obj)
>   size_t j;
>   int err = *pfd;
>  
> - pr_warning("failed to create map: %s\n",
> + pr_warning("failed to create map (name: '%s'): %s\n",
> +obj->maps[i].name,
>  strerror(errno));

makes sense.
Acked-by: Alexei Starovoitov 
Please cc Wang for future libbpf patches.

[PATCH 8/8] net: xfrm: support setting an output mark.

2017-08-20 Thread Steffen Klassert

From: Lorenzo Colitti 

On systems that use mark-based routing it may be necessary for
routing lookups to use marks in order for packets to be routed
correctly. An example of such a system is Android, which uses
socket marks to route packets via different networks.

Currently, routing lookups in tunnel mode always use a mark of
zero, making routing incorrect on such systems.

This patch adds a new output_mark element to the xfrm state and
a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
mark differs from the existing xfrm mark in two ways:

1. The xfrm mark is used to match xfrm policies and states, while
   the xfrm output mark is used to set the mark (and influence
   the routing) of the packets emitted by those states.
2. The existing mark is constrained to be a subset of the bits of
   the originating socket or transformed packet, but the output
   mark is arbitrary and depends only on the state.

The use of a separate mark provides additional flexibility. For
example:

- A packet subject to two transforms (e.g., transport mode inside
  tunnel mode) can have two different output marks applied to it,
  one for the transport mode SA and one for the tunnel mode SA.
- On a system where socket marks determine routing, the packets
  emitted by an IPsec tunnel can be routed based on a mark that
  is determined by the tunnel, not by the marks of the
  unencrypted packets.
- Support for setting the output marks can be introduced without
  breaking any existing setups that employ both mark-based
  routing and xfrm tunnel mode. Simply changing the code to use
  the xfrm mark for routing output packets could change
  behaviour in a way that breaks these setups.

If the output mark is unspecified or set to zero, the mark is not
set or changed.
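
The "zero means leave the mark alone" rule can be sketched as below. This is
a user-space model under stated assumptions: struct sa_props, struct packet
and apply_output_mark() are hypothetical stand-ins for the xfrm state
properties and the packet being emitted.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-state properties. */
struct sa_props {
	uint32_t output_mark;	/* 0 means "unspecified" */
};

/* Hypothetical stand-in for a packet carrying a routing mark. */
struct packet {
	uint32_t mark;
};

/* Overwrite the packet mark only when the state asks for one. */
static void apply_output_mark(struct packet *pkt,
			      const struct sa_props *props)
{
	if (props->output_mark)
		pkt->mark = props->output_mark;
}
```

Applying this once per transform gives the behaviour from the first bullet:
a packet passing through a transport mode SA and then a tunnel mode SA ends
up with the tunnel SA's mark, unless that one is zero.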

Tested: make allyesconfig; make -j64
Tested: https://android-review.googlesource.com/452776
Signed-off-by: Lorenzo Colitti 
Signed-off-by: Steffen Klassert 
---
 include/net/xfrm.h|  9 ++---
 include/uapi/linux/xfrm.h |  1 +
 net/ipv4/xfrm4_policy.c   | 14 +-
 net/ipv6/xfrm6_policy.c   |  9 ++---
 net/xfrm/xfrm_device.c|  3 ++-
 net/xfrm/xfrm_output.c|  3 +++
 net/xfrm/xfrm_policy.c| 17 +
 net/xfrm/xfrm_user.c  | 11 +++
 8 files changed, 47 insertions(+), 20 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 18d7de3..9c7b70c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -165,6 +165,7 @@ struct xfrm_state {
int header_len;
int trailer_len;
u32 extra_flags;
+   u32 output_mark;
} props;
 
struct xfrm_lifetime_cfg lft;
@@ -298,10 +299,12 @@ struct xfrm_policy_afinfo {
struct dst_entry*(*dst_lookup)(struct net *net,
   int tos, int oif,
   const xfrm_address_t *saddr,
-  const xfrm_address_t *daddr);
+  const xfrm_address_t *daddr,
+  u32 mark);
int (*get_saddr)(struct net *net, int oif,
 xfrm_address_t *saddr,
-xfrm_address_t *daddr);
+xfrm_address_t *daddr,
+u32 mark);
void(*decode_session)(struct sk_buff *skb,
  struct flowi *fl,
  int reverse);
@@ -1640,7 +1643,7 @@ static inline int xfrm4_udp_encap_rcv(struct sock *sk, struct sk_buff *skb)
 struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif,
const xfrm_address_t *saddr,
const xfrm_address_t *daddr,
-   int family);
+   int family, u32 mark);
 
 struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp);
 
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index 2b384ff..5fe7370 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -304,6 +304,7 @@ enum xfrm_attr_type_t {
XFRMA_ADDRESS_FILTER,   /* struct xfrm_address_filter */
XFRMA_PAD,
XFRMA_OFFLOAD_DEV,  /* struct xfrm_state_offload */
+   XFRMA_OUTPUT_MARK,  /* __u32 */
__XFRMA_MAX
 
 #define XFRMA_MAX (__XFRMA_MAX - 1)
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 4aefb14..d7bf0b0 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -20,7 +20,8 @@
 static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct

[PATCH 3/8] xfrm6: Fix CHECKSUM_COMPLETE after IPv6 header push

2017-08-20 Thread Steffen Klassert

From: Yossi Kuperman 

xfrm6_transport_finish rebuilds the IPv6 header based on the
original one and pushes it back without fixing skb->csum.
Therefore, CHECKSUM_COMPLETE is no longer valid and the packet
gets dropped.

Fix skb->csum by calling skb_postpush_rcsum.

Note: A valid IPv4 header has checksum 0, unlike IPv6. Thus,
the change is not needed in the sibling xfrm4_transport_finish
function.
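
Why the pushed header has to be folded back into the checksum can be shown
with plain 16-bit one's-complement arithmetic. The helpers below are a
user-space sketch with hypothetical names (csum16, csum16_add,
csum_after_push); they are not the kernel's csum_partial() or
skb_postpush_rcsum(), and they assume even lengths and offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words; len must be even. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement addition with end-around carry. */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + b;

	return (uint16_t)((sum & 0xffff) + (sum >> 16));
}

/*
 * After pushing hdr_len bytes in front of data that is already covered
 * by old_csum, the checksum over the enlarged region is the old value
 * plus the sum of the pushed bytes.
 */
static uint16_t csum_after_push(uint16_t old_csum, const uint8_t *hdr,
				size_t hdr_len)
{
	return csum16_add(old_csum, csum16(hdr, hdr_len));
}
```

If the header bytes are pushed without this update, the stored value still
covers only the payload and no longer matches the data it claims to cover.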

Signed-off-by: Yossi Kuperman 
Signed-off-by: Ilan Tayari 
Signed-off-by: Steffen Klassert 
---
 net/ipv6/xfrm6_input.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 3ef5d91..f95943a 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -34,6 +34,7 @@ EXPORT_SYMBOL(xfrm6_rcv_spi);
 int xfrm6_transport_finish(struct sk_buff *skb, int async)
 {
struct xfrm_offload *xo = xfrm_offload(skb);
+   int nhlen = skb->data - skb_network_header(skb);
 
skb_network_header(skb)[IP6CB(skb)->nhoff] =
XFRM_MODE_SKB_CB(skb)->protocol;
@@ -43,8 +44,9 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
return 1;
 #endif
 
-   __skb_push(skb, skb->data - skb_network_header(skb));
+   __skb_push(skb, nhlen);
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+   skb_postpush_rcsum(skb, skb_network_header(skb), nhlen);
 
if (xo && (xo->flags & XFRM_GRO)) {
skb_mac_header_rebuild(skb);
-- 
2.7.4

pull request (net-next): ipsec-next 2017-08-21

2017-08-20 Thread Steffen Klassert

1) Support RX checksum with IPsec crypto offload for esp4/esp6.
   From Ilan Tayari.

2) Fixup IPv6 checksums when doing IPsec crypto offload.
   From Yossi Kuperman.

3) Auto load the xfrm offload modules if a user installs
   a SA that requests IPsec offload. From Ilan Tayari.

4) Clear RX offload information in xfrm_input to not
   confuse the TX path with stale offload information.
   From Ilan Tayari.

5) Allow IPsec GSO for local sockets if the crypto operation
   will be offloaded.

6) Support setting of an output mark to the xfrm_state.
   This mark can be used to do the tunnel route lookup.
   From Lorenzo Colitti.

Please pull or let me know if there are problems.

Thanks!

The following changes since commit cb5b136c0095d434cb63495da8efb6a3d663a38f:

  Merge branch 'dsa-rework-EEE-support' (2017-08-01 20:09:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next.git master

for you to fetch changes up to 077fbac405bfc6d41419ad6c1725804ad4e9887c:

  net: xfrm: support setting an output mark. (2017-08-11 07:03:00 +0200)


Ilan Tayari (4):
  esp4: Support RX checksum with crypto offload
  esp6: Support RX checksum with crypto offload
  xfrm: Auto-load xfrm offload modules
  xfrm: Clear RX SKB secpath xfrm_offload

Lorenzo Colitti (1):
  net: xfrm: support setting an output mark.

Steffen Klassert (1):
  net: Allow IPsec GSO for local sockets

Yossi Kuperman (2):
  xfrm6: Fix CHECKSUM_COMPLETE after IPv6 header push
  esp6: Fix RX checksum after header pull

 include/net/xfrm.h| 32 
 include/uapi/linux/xfrm.h |  1 +
 net/core/sock.c   |  2 +-
 net/ipv4/esp4.c   | 14 +++---
 net/ipv4/esp4_offload.c   |  5 -
 net/ipv4/xfrm4_policy.c   | 14 +-
 net/ipv6/esp6.c   | 16 +---
 net/ipv6/esp6_offload.c   |  5 -
 net/ipv6/xfrm6_input.c|  4 +++-
 net/ipv6/xfrm6_policy.c   |  9 ++---
 net/xfrm/xfrm_device.c|  5 +++--
 net/xfrm/xfrm_input.c |  2 ++
 net/xfrm/xfrm_output.c|  3 +++
 net/xfrm/xfrm_policy.c| 17 +
 net/xfrm/xfrm_state.c | 16 
 net/xfrm/xfrm_user.c  | 13 -
 16 files changed, 121 insertions(+), 37 deletions(-)

[PATCH 1/8] esp4: Support RX checksum with crypto offload

2017-08-20 Thread Steffen Klassert

From: Ilan Tayari 

Keep the device's reported ip_summed indication in case crypto
was offloaded by the device. Subtract the csum values of the
stripped parts (esp header+iv, esp trailer+auth_data) to keep
value correct.

Note: CHECKSUM_COMPLETE should be indicated only if skb->csum
has the post-decryption offload csum value.
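
Subtracting the checksum of a stripped part can be sketched with 16-bit
one's-complement arithmetic. This is a user-space illustration with
hypothetical helper names (fold, csum16, csum16_sub, csum_after_trim), not
the kernel's csum_block_sub(); it assumes the trimmed region starts at an
even offset and has even length.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement sum of big-endian 16-bit words; len must be even. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	return fold(sum);
}

/* Subtracting in one's complement is adding the bitwise complement. */
static uint16_t csum16_sub(uint16_t csum, uint16_t part)
{
	return fold((uint32_t)csum + (uint16_t)~part);
}

/*
 * Checksum of data[0 .. len - trimlen) derived from the checksum of the
 * full buffer, without re-reading the bytes that are kept.
 */
static uint16_t csum_after_trim(uint16_t full, const uint8_t *data,
				size_t len, size_t trimlen)
{
	return csum16_sub(full, csum16(data + len - trimlen, trimlen));
}
```

Only the trimmed bytes are read, so a device-provided checksum over the
whole packet can stay usable after the trailer is removed.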

Signed-off-by: Ariel Levkovich 
Signed-off-by: Ilan Tayari 
Signed-off-by: Steffen Klassert 
---
 net/ipv4/esp4.c | 14 +++---
 net/ipv4/esp4_offload.c |  4 +++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 0cbee0a..741acd7 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -510,7 +510,8 @@ int esp_input_done2(struct sk_buff *skb, int err)
int elen = skb->len - hlen;
int ihl;
u8 nexthdr[2];
-   int padlen;
+   int padlen, trimlen;
+   __wsum csumdiff;
 
if (!xo || (xo && !(xo->flags & CRYPTO_DONE)))
kfree(ESP_SKB_CB(skb)->tmp);
@@ -568,8 +569,15 @@ int esp_input_done2(struct sk_buff *skb, int err)
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
 
-   pskb_trim(skb, skb->len - alen - padlen - 2);
-   __skb_pull(skb, hlen);
+   trimlen = alen + padlen + 2;
+   if (skb->ip_summed == CHECKSUM_COMPLETE) {
+   csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
+   skb->csum = csum_block_sub(skb->csum, csumdiff,
+  skb->len - trimlen);
+   }
+   pskb_trim(skb, skb->len - trimlen);
+
+   skb_pull_rcsum(skb, hlen);
if (x->props.mode == XFRM_MODE_TUNNEL)
skb_reset_transport_header(skb);
else
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index e066601..05831de 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -182,11 +182,13 @@ static struct sk_buff *esp4_gso_segment(struct sk_buff *skb,
 static int esp_input_tail(struct xfrm_state *x, struct sk_buff *skb)
 {
struct crypto_aead *aead = x->data;
+   struct xfrm_offload *xo = xfrm_offload(skb);
 
if (!pskb_may_pull(skb, sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead)))
return -EINVAL;
 
-   skb->ip_summed = CHECKSUM_NONE;
+   if (!(xo->flags & CRYPTO_DONE))
+   skb->ip_summed = CHECKSUM_NONE;
 
return esp_input_done2(skb, 0);
 }
-- 
2.7.4

[PATCH 6/8] xfrm: Clear RX SKB secpath xfrm_offload

2017-08-20 Thread Steffen Klassert

From: Ilan Tayari 

If an incoming packet undergoes XFRM crypto-offload, its secpath is
filled with xfrm_offload struct denoting offload information.

If the SKB is then forwarded to a device which supports crypto-
offload, the stack wrongfully attempts to offload it (even though
the output SA may not exist on the device) due to the leftover
secpath xo.

Clear the ingress xo by zeroizing secpath->olen just before
delivering the decapsulated packet to the network stack.

Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Ilan Tayari 
Signed-off-by: Steffen Klassert 
---
 net/xfrm/xfrm_input.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 923205e..f07eec5 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -424,6 +424,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
nf_reset(skb);
 
if (decaps) {
+   skb->sp->olen = 0;
skb_dst_drop(skb);
gro_cells_receive(&gro_cells, skb);
return 0;
@@ -434,6 +435,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 
err = x->inner_mode->afinfo->transport_finish(skb, xfrm_gro || async);
if (xfrm_gro) {
+   skb->sp->olen = 0;
skb_dst_drop(skb);
gro_cells_receive(&gro_cells, skb);
return err;
-- 
2.7.4

[PATCH 7/8] net: Allow IPsec GSO for local sockets

2017-08-20 Thread Steffen Klassert

This patch allows local sockets to make use of XFRM GSO code path.

Signed-off-by: Steffen Klassert 
Signed-off-by: Ilan Tayari 
---
 include/net/xfrm.h | 19 +++
 net/core/sock.c|  2 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 5a36010..18d7de3 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1858,6 +1858,20 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
   struct xfrm_user_offload *xuo);
 bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
 
+static inline bool xfrm_dst_offload_ok(struct dst_entry *dst)
+{
+   struct xfrm_state *x = dst->xfrm;
+
+   if (!x || !x->type_offload)
+   return false;
+
+   if (x->xso.offload_handle && (x->xso.dev == dst->path->dev) &&
+   !dst->child->xfrm)
+   return true;
+
+   return false;
+}
+
 static inline void xfrm_dev_state_delete(struct xfrm_state *x)
 {
struct xfrm_state_offload *xso = &x->xso;
@@ -1900,6 +1914,11 @@ static inline bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x
 {
return false;
 }
+
+static inline bool xfrm_dst_offload_ok(struct dst_entry *dst)
+{
+   return false;
+}
 #endif
 
 static inline int xfrm_mark_get(struct nlattr **attrs, struct xfrm_mark *m)
diff --git a/net/core/sock.c b/net/core/sock.c
index 742f68c..564f835 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1757,7 +1757,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
sk->sk_route_caps &= ~sk->sk_route_nocaps;
if (sk_can_gso(sk)) {
-   if (dst->header_len) {
+   if (dst->header_len && !xfrm_dst_offload_ok(dst)) {
sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
} else {
sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
-- 
2.7.4

[PATCH 2/8] esp6: Support RX checksum with crypto offload

2017-08-20 Thread Steffen Klassert

From: Ilan Tayari 

Keep the device's reported ip_summed indication in case crypto
was offloaded by the device. Subtract the csum values of the
stripped parts (esp header+iv, esp trailer+auth_data) to keep
value correct.

Note: CHECKSUM_COMPLETE should be indicated only if skb->csum
has the post-decryption offload csum value.

Signed-off-by: Ariel Levkovich 
Signed-off-by: Ilan Tayari 
Signed-off-by: Steffen Klassert 
---
 net/ipv6/esp6.c | 14 +++---
 net/ipv6/esp6_offload.c |  4 +++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 9ed3547..0ca1db6 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -470,7 +470,8 @@ int esp6_input_done2(struct sk_buff *skb, int err)
int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
int elen = skb->len - hlen;
int hdr_len = skb_network_header_len(skb);
-   int padlen;
+   int padlen, trimlen;
+   __wsum csumdiff;
u8 nexthdr[2];
 
if (!xo || (xo && !(xo->flags & CRYPTO_DONE)))
@@ -492,8 +493,15 @@ int esp6_input_done2(struct sk_buff *skb, int err)
 
/* ... check padding bits here. Silly. :-) */
 
-   pskb_trim(skb, skb->len - alen - padlen - 2);
-   __skb_pull(skb, hlen);
+   trimlen = alen + padlen + 2;
+   if (skb->ip_summed == CHECKSUM_COMPLETE) {
+   csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
+   skb->csum = csum_block_sub(skb->csum, csumdiff,
+  skb->len - trimlen);
+   }
+   pskb_trim(skb, skb->len - trimlen);
+
+   skb_pull_rcsum(skb, hlen);
if (x->props.mode == XFRM_MODE_TUNNEL)
skb_reset_transport_header(skb);
else
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index f02f131..eec3add 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -209,11 +209,13 @@ static struct sk_buff *esp6_gso_segment(struct sk_buff *skb,
 static int esp6_input_tail(struct xfrm_state *x, struct sk_buff *skb)
 {
struct crypto_aead *aead = x->data;
+   struct xfrm_offload *xo = xfrm_offload(skb);
 
if (!pskb_may_pull(skb, sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead)))
return -EINVAL;
 
-   skb->ip_summed = CHECKSUM_NONE;
+   if (!(xo->flags & CRYPTO_DONE))
+   skb->ip_summed = CHECKSUM_NONE;
 
return esp6_input_done2(skb, 0);
 }
-- 
2.7.4
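
The checksum adjustment in the patch above relies on a property of the
16-bit ones'-complement sum: the sum over a buffer minus the sum over a
stripped part equals the sum over what remains. Below is a minimal
user-space sketch of that arithmetic. The helper names (csum16,
csum16_add, csum16_sub, toy_strip_csum) are made up for this example and
are not the kernel API, and the sketch assumes the stripped parts start at
even offsets. The kernel's csum_block_sub() takes an offset argument so it
can also cope with odd alignment, which is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded 16-bit ones'-complement sum over a byte buffer (not inverted). */
static uint16_t csum16(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (i < len)
		sum += (uint32_t)p[i] << 8;   /* pad a trailing odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones'-complement addition with end-around carry. */
static uint16_t csum16_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Subtraction is addition of the complement. */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
	return csum16_add(a, (uint16_t)~b);
}

/*
 * Start from the sum over the whole packet, then remove the trailer
 * (trimlen bytes at the end) and the header (hlen bytes at the start),
 * in the same order as the patch: trim first, then pull.
 */
static uint16_t toy_strip_csum(const uint8_t *pkt, size_t len,
			       size_t hlen, size_t trimlen)
{
	uint16_t csum;

	assert(hlen + trimlen <= len);
	assert(hlen % 2 == 0 && (len - trimlen) % 2 == 0);

	csum = csum16(pkt, len);
	csum = csum16_sub(csum, csum16(pkt + len - trimlen, trimlen));
	csum = csum16_sub(csum, csum16(pkt, hlen));
	return csum;
}
```

The result of toy_strip_csum() should match a fresh csum16() over only the
inner payload, which is why the device-reported CHECKSUM_COMPLETE value can
be kept instead of being reset to CHECKSUM_NONE.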

[PATCH net-next v7 05/10] landlock: Add LSM hooks related to filesystem

2017-08-20 Thread Mickaël Salaün

Handle 33 filesystem-related LSM hooks for the Landlock filesystem
event: LANDLOCK_SUBTYPE_EVENT_FS.

A Landlock event wraps LSM hooks for similar kernel object types (e.g.
struct file, struct path...). Multiple LSM hooks can trigger the same
Landlock event.

Landlock handles nine coarse-grained actions: read, write, execute, new,
get, remove, ioctl, lock and fcntl. Each of them abstracts LSM hook
access control in a way that can be extended in the future.

The Landlock LSM hook registration is done after other LSMs to only run
actions from user-space, via eBPF programs, if the access was granted by
major (privileged) LSMs.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v6:
* add 3 more sub-events: IOCTL, LOCK, FCNTL
  https://lkml.kernel.org/r/2fbc99a6-f190-f335-bd14-04bdeed35...@digikod.net
* use the new security_add_hooks()
* explain the -Werror=unused-function
* constify pointers
* cleanup headers

Changes since v5:
* split hooks.[ch] into hooks.[ch] and hooks_fs.[ch]
* add more documentation
* cosmetic fixes
* rebase (SCALAR_VALUE)

Changes since v4:
* add LSM hook abstraction called Landlock event
  * use the compiler type checking to verify hooks use by an event
  * handle all filesystem related LSM hooks (e.g. file_permission,
mmap_file, sb_mount...)
* register BPF programs for Landlock just after LSM hooks registration
* move hooks registration after other LSMs
* add failsafes to check if a hook is not used by the kernel
* allow partial raw value access from the context (needed for programs
  generated by LLVM)

Changes since v3:
* split commit
* add hooks dealing with struct inode and struct path pointers:
  inode_permission and inode_getattr
* add abstraction over eBPF helper arguments thanks to wrapping structs
---
 include/linux/lsm_hooks.h|   5 +
 security/landlock/Makefile   |   7 +-
 security/landlock/common.h   |   2 +
 security/landlock/hooks.c|  83 ++
 security/landlock/hooks.h| 177 +
 security/landlock/hooks_fs.c | 586 +++
 security/landlock/hooks_fs.h |  19 ++
 security/landlock/init.c |  10 +
 security/security.c  |  12 +-
 9 files changed, 899 insertions(+), 2 deletions(-)
 create mode 100644 security/landlock/hooks.c
 create mode 100644 security/landlock/hooks.h
 create mode 100644 security/landlock/hooks_fs.c
 create mode 100644 security/landlock/hooks_fs.h

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 3a90febadbe2..7614c3d66265 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1982,5 +1982,10 @@ void __init loadpin_add_hooks(void);
 #else
 static inline void loadpin_add_hooks(void) { };
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+extern void __init landlock_add_hooks(void);
+#else
+static inline void __init landlock_add_hooks(void) { }
+#endif /* CONFIG_SECURITY_LANDLOCK */
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index 7205f9a7a2ee..b382be409b3b 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,8 @@
+# Catch defined but unused hooks, e.g. error out if a HOOK_NEW_FS(foo) is not
+# used with a HOOK_INIT_FS(foo) in the struct security_hook_list
+# landlock_hooks.
+ccflags-$(CONFIG_SECURITY_LANDLOCK) += -Werror=unused-function
+
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
 
-landlock-y := init.o
+landlock-y := init.o hooks.o hooks_fs.o
diff --git a/security/landlock/common.h b/security/landlock/common.h
index c82cbd3fb640..a69c35231d35 100644
--- a/security/landlock/common.h
+++ b/security/landlock/common.h
@@ -18,4 +18,6 @@
  */
 #define LANDLOCK_ABI 1
 
+#define LANDLOCK_NAME "landlock"
+
 #endif /* _SECURITY_LANDLOCK_COMMON_H */
diff --git a/security/landlock/hooks.c b/security/landlock/hooks.c
new file mode 100644
index ..b48caeb0a49a
--- /dev/null
+++ b/security/landlock/hooks.c
@@ -0,0 +1,83 @@
+/*
+ * Landlock LSM - hook helpers
+ *
+ * Copyright © 2016-2017 Mickaël Salaün 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include  /* enum bpf_access_type, struct landlock_context */
+#include 
+#include  /* BPF_PROG_RUN() */
+#include  /* list_add_tail_rcu */
+#include  /* offsetof */
+
+#include "hooks.h" /* CTX_ARG_NB */
+
+
+bool landlock_is_valid_access(int off, int size, enum bpf_access_type type,
+   enum bpf_reg_type *reg_type,
+   enum bpf_reg_type ctx_types[CTX_ARG_NB],
+   const union bpf_prog_subtype

[PATCH net-next v7 03/10] bpf,landlock: Define an eBPF program type for a Landlock rule

2017-08-20 Thread Mickaël Salaün

Add a new type of eBPF program used by Landlock rules.

This new BPF program type will be registered with the Landlock LSM
initialization.

Add an initial Landlock Kconfig.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
---

Changes since v6:
* add 3 more sub-events: IOCTL, LOCK, FCNTL
  https://lkml.kernel.org/r/2fbc99a6-f190-f335-bd14-04bdeed35...@digikod.net
* rename LANDLOCK_VERSION to LANDLOCK_ABI to better reflect its purpose,
  and move it from landlock.h to common.h
* rename BPF_PROG_TYPE_LANDLOCK to BPF_PROG_TYPE_LANDLOCK_RULE: an eBPF
  program could be used for something else than a rule
* simplify struct landlock_context by removing the arch and syscall_nr fields
* remove all eBPF map functions call, remove ABILITY_WRITE
* refactor bpf_landlock_func_proto() (suggested by Kees Cook)
* constify pointers
* fix doc inclusion

Changes since v5:
* rename file hooks.c to init.c
* fix spelling

Changes since v4:
* merge a minimal (not enabled) LSM code and Kconfig in this commit

Changes since v3:
* split commit
* revamp the landlock_context:
  * add arch, syscall_nr and syscall_cmd (ioctl, fcntl…) to be able to
cross-check action with the event type
  * replace args array with dedicated fields to ease the addition of new
fields
---
 include/linux/bpf_types.h  |  3 ++
 include/uapi/linux/bpf.h   | 97 +
 security/Kconfig   |  1 +
 security/Makefile  |  2 +
 security/landlock/Kconfig  | 18 
 security/landlock/Makefile |  3 ++
 security/landlock/common.h | 21 +
 security/landlock/init.c   | 98 ++
 tools/include/uapi/linux/bpf.h | 97 +
 9 files changed, 340 insertions(+)
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/common.h
 create mode 100644 security/landlock/init.c

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 6f1a567667b8..8bac93970a47 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -18,6 +18,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_KPROBE, kprobe_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_TRACEPOINT, tracepoint_prog_ops)
 BPF_PROG_TYPE(BPF_PROG_TYPE_PERF_EVENT, perf_event_prog_ops)
 #endif
+#ifdef CONFIG_SECURITY_LANDLOCK
+BPF_PROG_TYPE(BPF_PROG_TYPE_LANDLOCK_RULE, bpf_landlock_ops)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8541ab85e432..20da634da941 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -129,6 +129,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_LWT_XMIT,
BPF_PROG_TYPE_SOCK_OPS,
BPF_PROG_TYPE_SK_SKB,
+   BPF_PROG_TYPE_LANDLOCK_RULE,
 };
 
 enum bpf_attach_type {
@@ -879,4 +880,100 @@ enum {
 #define TCP_BPF_IW 1001/* Set TCP initial congestion window */
 #define TCP_BPF_SNDCWND_CLAMP  1002/* Set sndcwnd_clamp */
 
+/**
+ * enum landlock_subtype_event - event occurring when an action is performed on
+ * a particular kernel object
+ *
+ * An event is a policy decision point which exposes the same context type
+ * (especially the same arg[0-9] field types) for each rule execution.
+ *
+ * @LANDLOCK_SUBTYPE_EVENT_UNSPEC: invalid value
+ * @LANDLOCK_SUBTYPE_EVENT_FS: generic filesystem event
+ * @LANDLOCK_SUBTYPE_EVENT_FS_IOCTL: custom IOCTL sub-event
+ * @LANDLOCK_SUBTYPE_EVENT_FS_LOCK: custom LOCK sub-event
+ * @LANDLOCK_SUBTYPE_EVENT_FS_FCNTL: custom FCNTL sub-event
+ */
+enum landlock_subtype_event {
+   LANDLOCK_SUBTYPE_EVENT_UNSPEC,
+   LANDLOCK_SUBTYPE_EVENT_FS,
+   LANDLOCK_SUBTYPE_EVENT_FS_IOCTL,
+   LANDLOCK_SUBTYPE_EVENT_FS_LOCK,
+   LANDLOCK_SUBTYPE_EVENT_FS_FCNTL,
+};
+#define _LANDLOCK_SUBTYPE_EVENT_LAST LANDLOCK_SUBTYPE_EVENT_FS_FCNTL
+
+/**
+ * DOC: landlock_subtype_ability
+ *
+ * eBPF context and functions allowed for a rule
+ *
+ * - LANDLOCK_SUBTYPE_ABILITY_DEBUG: allows to do debug actions (e.g. writing
+ *   logs), which may be dangerous and should only be used for rule testing
+ */
+#define LANDLOCK_SUBTYPE_ABILITY_DEBUG (1ULL << 0)
+#define _LANDLOCK_SUBTYPE_ABILITY_NB   1
+#define _LANDLOCK_SUBTYPE_ABILITY_MASK ((1ULL << _LANDLOCK_SUBTYPE_ABILITY_NB) - 1)
+
+/*
+ * Future options for a Landlock rule (e.g. run even if a previous rule denied
+ * an action).
+ */
+#define _LANDLOCK_SUBTYPE_OPTION_NB 0
+#define _LANDLOCK_SUBTYPE_OPTION_MASK  ((1ULL << _LANDLOCK_SUBTYPE_OPTION_NB) - 1)
+
+/*
+ *

[PATCH net-next v7 01/10] selftest: Enhance kselftest_harness.h with a step mechanism

2017-08-20 Thread Mickaël Salaün

This step mechanism may be useful to return information about the
error without being able to write to TH_LOG_STREAM.

Set _metadata->no_print to true to print this counter.
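
A minimal user-space sketch of the idea, under stated assumptions: each
check bumps a saturating step counter while the test is still passing, the
child exits with that counter on failure, and the parent recovers it from
the exit status. The toy_ names below are hypothetical and are not part of
kselftest_harness.h; only fork(), waitpid(), _exit() and the W* macros are
real interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct toy_metadata {
	bool passed;
	uint8_t step;   /* number of the last check that was started */
};

/* Count the check, saturating at 255 so it fits in an exit status. */
static void toy_expect(struct toy_metadata *m, bool cond)
{
	if (m->passed && m->step < 255)
		m->step++;
	if (!cond)
		m->passed = false;
}

/*
 * Run fn in a child process. Returns 0 if every check passed, the
 * number of the first failing check otherwise, or -1 if the child
 * could not be run or did not exit normally.
 */
static int toy_run(void (*fn)(struct toy_metadata *))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct toy_metadata m = { .passed = true, .step = 0 };

		fn(&m);
		_exit(m.passed ? 0 : m.step);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/* Two example tests: one passes, one fails at its third check. */
static void toy_test_ok(struct toy_metadata *m)
{
	toy_expect(m, 1 + 1 == 2);
	toy_expect(m, 2 * 2 == 4);
}

static void toy_test_fails_third(struct toy_metadata *m)
{
	toy_expect(m, true);
	toy_expect(m, true);
	toy_expect(m, false);   /* step 3 fails */
	toy_expect(m, true);    /* no longer counted */
}
```

Because the counter stops advancing once a check has failed, the exit
status identifies the first failing check even when nothing can be
written to a log stream.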

Signed-off-by: Mickaël Salaün 
Cc: Andy Lutomirski 
Cc: Arnaldo Carvalho de Melo 
Cc: Kees Cook 
Cc: Shuah Khan 
Cc: Will Drewry 
Link: https://lkml.kernel.org/r/cagxu5j+d-fp8kt9unnoqkrqjp4dytpmgkjxwykzyryivpz3...@mail.gmail.com
---

This patch is intended for the kselftest tree:
https://lkml.kernel.org/r/20170806232337.4191-1-...@digikod.net

Changes since v6:
* add the step counter in assert/expect macros and use _metadata to
  enable the counter (suggested by Kees Cook)
---
 tools/testing/selftests/kselftest_harness.h   | 31 ++-
 tools/testing/selftests/seccomp/seccomp_bpf.c |  2 +-
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index c56f72e07cd7..850ff6946027 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -51,6 +51,9 @@
 #define __KSELFTEST_HARNESS_H
 
 #define _GNU_SOURCE
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -555,12 +558,18 @@
  * return while still providing an optional block to the API consumer.
  */
 #define OPTIONAL_HANDLER(_assert) \
-   for (; _metadata->trigger;  _metadata->trigger = __bail(_assert))
+   for (; _metadata->trigger; _metadata->trigger = \
+   __bail(_assert, _metadata->no_print, _metadata->step))
+
+#define __INC_STEP(_metadata) \
+   if (_metadata->passed && _metadata->step < 255) \
+   _metadata->step++;
 
 #define __EXPECT(_expected, _seen, _t, _assert) do { \
/* Avoid multiple evaluation of the cases */ \
__typeof__(_expected) __exp = (_expected); \
__typeof__(_seen) __seen = (_seen); \
+   __INC_STEP(_metadata); \
if (!(__exp _t __seen)) { \
unsigned long long __exp_print = (uintptr_t)__exp; \
unsigned long long __seen_print = (uintptr_t)__seen; \
@@ -576,6 +585,7 @@
 #define __EXPECT_STR(_expected, _seen, _t, _assert) do { \
const char *__exp = (_expected); \
const char *__seen = (_seen); \
+   __INC_STEP(_metadata); \
if (!(strcmp(__exp, __seen) _t 0))  { \
__TH_LOG("Expected '%s' %s '%s'.", __exp, #_t, __seen); \
_metadata->passed = 0; \
@@ -590,6 +600,8 @@ struct __test_metadata {
int termsig;
int passed;
int trigger; /* extra handler after the evaluation */
+   __u8 step;
+   bool no_print; /* manual trigger when TH_LOG_STREAM is not available */
struct __test_metadata *prev, *next;
 };
 
@@ -634,10 +646,13 @@ static inline void __register_test(struct __test_metadata *t)
}
 }
 
-static inline int __bail(int for_realz)
+static inline int __bail(int for_realz, bool no_print, __u8 step)
 {
-   if (for_realz)
+   if (for_realz) {
+   if (no_print)
+   _exit(step);
abort();
+   }
return 0;
 }
 
@@ -655,18 +670,24 @@ void __run_test(struct __test_metadata *t)
t->passed = 0;
} else if (child_pid == 0) {
t->fn(t);
-   _exit(t->passed);
+   /* return the step that failed or 0 */
+   _exit(t->passed ? 0 : t->step);
} else {
/* TODO(wad) add timeout support. */
waitpid(child_pid, &status, 0);
if (WIFEXITED(status)) {
-   t->passed = t->termsig == -1 ? WEXITSTATUS(status) : 0;
+   t->passed = t->termsig == -1 ? !WEXITSTATUS(status) : 0;
if (t->termsig != -1) {
fprintf(TH_LOG_STREAM,
"%s: Test exited normally "
"instead of by signal (code: %d)\n",
t->name,
WEXITSTATUS(status));
+   } else if (!t->passed) {
+   fprintf(TH_LOG_STREAM,
+   "%s: Test failed at step #%d\n",
+   t->name,
+   WEXITSTATUS(status));
}
} else if (WIFSIGNALED(status)) {
t->passed = 0;
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 73f5ea6778ce..4d6f92a9df6b 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -107,7 +107,7 @@ TEST(mode_strict_support)
ASSERT_EQ(0, ret) {

[PATCH net-next v7 04/10] bpf: Define handle_fs and add a new helper bpf_handle_fs_get_mode()

2017-08-20 Thread Mickaël Salaün

Add an eBPF function bpf_handle_fs_get_mode(handle_fs) to get the mode
of an abstract object wrapping either a file, a dentry, a path, or an
inode.
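
To make the "abstract object" idea concrete, here is a small user-space
sketch of a tagged union that resolves any of the four wrapped kinds down
to an inode-like object and returns its mode. All toy_ types are
hypothetical simplifications of the kernel structs, and the negative errno
return values are an assumption made for this example only; the actual
return convention of the helper is not shown in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel objects. */
struct toy_inode  { unsigned int mode; };
struct toy_dentry { const struct toy_inode *d_inode; };
struct toy_path   { const struct toy_dentry *dentry; };
struct toy_file   { const struct toy_inode *f_inode; };

enum toy_handle_type {
	TOY_HANDLE_NONE,
	TOY_HANDLE_FILE,
	TOY_HANDLE_INODE,
	TOY_HANDLE_PATH,
	TOY_HANDLE_DENTRY,
};

/* Tagged union: 'type' says which union member is valid. */
struct toy_handle {
	enum toy_handle_type type;
	union {
		const struct toy_file *file;
		const struct toy_inode *inode;
		const struct toy_path *path;
		const struct toy_dentry *dentry;
	};
};

/* Returns the mode (>= 0) or a negative errno value (assumed convention). */
static int64_t toy_handle_get_mode(const struct toy_handle *h)
{
	const struct toy_inode *inode = NULL;

	if (!h)
		return -EINVAL;

	switch (h->type) {
	case TOY_HANDLE_FILE:
		if (h->file)
			inode = h->file->f_inode;
		break;
	case TOY_HANDLE_INODE:
		inode = h->inode;
		break;
	case TOY_HANDLE_PATH:
		if (h->path && h->path->dentry)
			inode = h->path->dentry->d_inode;
		break;
	case TOY_HANDLE_DENTRY:
		if (h->dentry)
			inode = h->dentry->d_inode;
		break;
	default:
		return -EINVAL;   /* unknown or unset handle type */
	}

	if (!inode)
		return -ENOENT;   /* e.g. a dentry without an inode */
	return inode->mode;
}
```

The point of the wrapper is that a rule only ever sees one opaque handle
type, regardless of which LSM hook (file, inode, path or dentry based)
triggered it.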

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andy Lutomirski 
Cc: Daniel Borkmann 
Cc: David S. Miller 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Cc: Jann Horn 
---

Changes since v6:
* remove WARN_ON() for missing dentry->d_inode
* refactor bpf_landlock_func_proto() (suggested by Kees Cook)

Changes since v5:
* cosmetic fixes and rebase

Changes since v4:
* use a file abstraction (handle) to wrap inode, dentry, path and file
  structs
* remove bpf_landlock_cmp_fs_beneath()
* rename the BPF helper and move it to kernel/bpf/
* tighten helpers accessible by a Landlock rule

Changes since v3:
* remove bpf_landlock_cmp_fs_prop() (suggested by Alexei Starovoitov)
* add hooks dealing with struct inode and struct path pointers:
  inode_permission and inode_getattr
* add abstraction over eBPF helper arguments thanks to wrapping structs
* add bpf_landlock_get_fs_mode() helper to check file type and mode
* merge WARN_ON() (suggested by Kees Cook)
* fix and update bpf_helpers.h
* use BPF_CALL_* for eBPF helpers (suggested by Alexei Starovoitov)
* make handle arraymap safe (RCU) and remove buggy synchronize_rcu()
* factor out the arraymap walk
* use size_t to index array (suggested by Jann Horn)

Changes since v2:
* add MNT_INTERNAL check to only add file handle from user-visible FS
  (e.g. no anonymous inode)
* replace struct file* with struct path* in map_landlock_handle
* add BPF protos
* fix bpf_landlock_cmp_fs_prop_with_struct_file()
---
 include/linux/bpf.h   | 31 ++
 include/uapi/linux/bpf.h  |  8 +
 kernel/bpf/Makefile   |  2 +-
 kernel/bpf/helpers_fs.c   | 52 +++
 kernel/bpf/verifier.c |  6 
 security/landlock/init.c  | 17 ++
 tools/include/uapi/linux/bpf.h| 10 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  2 ++
 8 files changed, 126 insertions(+), 2 deletions(-)
 create mode 100644 kernel/bpf/helpers_fs.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index aef2e6f6d763..5316393150e1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -16,6 +16,11 @@
 #include 
 #include 
 
+/* FS helpers */
+#include  /* struct dentry */
+#include  /* struct file, struct inode */
+#include  /* struct path */
+
 struct perf_event;
 struct bpf_prog;
 struct bpf_map;
@@ -85,6 +90,8 @@ enum bpf_arg_type {
 
ARG_PTR_TO_CTX, /* pointer to context */
ARG_ANYTHING,   /* any (initialized) argument is ok */
+
+   ARG_CONST_PTR_TO_HANDLE_FS, /* pointer to an abstract FS struct */
 };
 
 /* type of values returned from helper functions */
@@ -141,6 +148,7 @@ enum bpf_reg_type {
PTR_TO_STACK,/* reg == frame_pointer + offset */
PTR_TO_PACKET,   /* reg points to skb->data */
PTR_TO_PACKET_END,   /* skb->data + headlen */
+   CONST_PTR_TO_HANDLE_FS,  /* FS helpers */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -223,6 +231,26 @@ struct bpf_event_entry {
struct rcu_head rcu;
 };
 
+/* FS helpers */
+enum bpf_handle_fs_type {
+   BPF_HANDLE_FS_TYPE_NONE,
+   BPF_HANDLE_FS_TYPE_FILE,
+   BPF_HANDLE_FS_TYPE_INODE,
+   BPF_HANDLE_FS_TYPE_PATH,
+   BPF_HANDLE_FS_TYPE_DENTRY,
+};
+
+struct bpf_handle_fs {
+   enum bpf_handle_fs_type type;
+   union {
+   struct file *file;
+   struct inode *inode;
+   const struct path *path;
+   struct dentry *dentry;
+   };
+};
+
+
 u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
 u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
@@ -415,6 +443,9 @@ extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 
+/* FS helpers */
+extern const struct bpf_func_proto bpf_handle_fs_get_mode_proto;
+
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 20da634da941..1624c0bbdf33 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -600,6 +600,13 @@ union bpf_attr {
  * @map_flags: sock map specific flags
  *bit 1: Enable strparser
  *other bits: reserved
+ *
+ * s64 bpf_handle_fs_get_mode(handle_fs)
+ * Get the mode of a struct bpf_handle_fs
+ * fs: struct bpf_handle_fs address
+ * Return:
+ *   >= 0

[PATCH net-next v7 06/10] seccomp,landlock: Handle Landlock events per process hierarchy

2017-08-20 Thread Mickaël Salaün

The seccomp(2) syscall can be used by a task to apply a Landlock rule to
itself. As a seccomp filter, a Landlock rule is enforced for the current
task and all its future children. A rule is immutable and a task can
only add new restricting rules to itself, forming a chain of rules.

A Landlock rule is tied to a Landlock event. If the action on a kernel
object is allowed by the other Linux security mechanisms (e.g. DAC,
capabilities, other LSMs), then a Landlock event related to this kind of
object is triggered. The chain of rules for this event is then
evaluated. Each rule returns a 32-bit value which can deny the action on
a kernel object with a non-zero value. If every rule of the chain
returns zero, then the action on the object is allowed.
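
The evaluation described above can be sketched in plain C as a walk over
a singly linked chain of rules. This is only an illustration under stated
assumptions: the toy_ types are hypothetical, a rule is modelled as a
function pointer rather than an eBPF program, and the walk stops at the
first non-zero result, which is a choice made for this sketch rather than
something the description spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_action { TOY_READ = 1, TOY_WRITE = 2, TOY_EXEC = 4 };

/*
 * A rule is immutable once created. A task adds a rule by creating a
 * new node whose 'prev' points at the chain it inherited, so children
 * share their parent's rules without copying them.
 */
struct toy_rule {
	uint32_t (*run)(enum toy_action action);   /* non-zero means deny */
	const struct toy_rule *prev;
};

static uint32_t toy_deny_write(enum toy_action action)
{
	return action == TOY_WRITE ? 1 : 0;
}

static uint32_t toy_deny_exec(enum toy_action action)
{
	return action == TOY_EXEC ? 1 : 0;
}

/* Allowed only if every rule in the chain returns zero. */
static bool toy_chain_allows(const struct toy_rule *newest,
			     enum toy_action action)
{
	const struct toy_rule *rule;

	for (rule = newest; rule; rule = rule->prev) {
		if (rule->run(action) != 0)
			return false;
	}
	return true;
}
```

Since rules can only be added and never removed, a child's chain is
always at least as restrictive as its parent's.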

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: James Morris 
Cc: Kees Cook 
Cc: Serge E. Hallyn 
Cc: Will Drewry 
Link: https://lkml.kernel.org/r/c10a503d-5e35-7785-2f3d-25ed8dd63...@digikod.net
---

Changes since v6:
* rename some functions with more accurate names to reflect that an eBPF
  program for Landlock could be used for something else than a rule
* reword rule "appending" to "prepending" and explain it
* remove the superfluous no_new_privs check, only check global
  CAP_SYS_ADMIN when prepending a Landlock rule (needed for containers)
* create and use {get,put}_seccomp_landlock() (suggested by Kees Cook)
* replace ifdef with static inlined function (suggested by Kees Cook)
* use get_user() (suggested by Kees Cook)
* replace atomic_t with refcount_t (requested by Kees Cook)
* move struct landlock_{rule,events} from landlock.h to common.h
* cleanup headers

Changes since v5:
* remove struct landlock_node and use a similar inheritance mechanism
  as seccomp-bpf (requested by Andy Lutomirski)
* rename SECCOMP_ADD_LANDLOCK_RULE to SECCOMP_APPEND_LANDLOCK_RULE
* rename file manager.c to providers.c
* add comments
* typo and cosmetic fixes

Changes since v4:
* merge manager and seccomp patches
* return -EFAULT in seccomp(2) when user_bpf_fd is null to easily check
  if Landlock is supported
* only allow a process with the global CAP_SYS_ADMIN to use Landlock
  (will be lifted in the future)
* add an early check to exit as soon as possible if the current process
  does not have Landlock rules

Changes since v3:
* remove the hard link with seccomp (suggested by Andy Lutomirski and
  Kees Cook):
  * remove the cookie which could imply multiple evaluation of Landlock
rules
  * remove the origin field in struct landlock_data
* remove documentation fix (merged upstream)
* rename the new seccomp command to SECCOMP_ADD_LANDLOCK_RULE
* internal renaming
* split commit
* new design to be able to inherit on the fly the parent rules

Changes since v2:
* Landlock programs can now be run without seccomp filter but for any
  syscall (from the process) or interruption
* move Landlock related functions and structs into security/landlock/*
  (to manage cgroups as well)
* fix seccomp filter handling: run Landlock programs for each of their
  legitimate seccomp filter
* properly clean up all seccomp results
* cosmetic changes to ease the understanding
* fix some ifdef
---
 include/linux/landlock.h  |  42 +++
 include/linux/seccomp.h   |   5 +
 include/uapi/linux/seccomp.h  |   1 +
 kernel/fork.c |   8 +-
 kernel/seccomp.c  |   3 +
 security/landlock/Makefile|   2 +-
 security/landlock/common.h|  42 +++
 security/landlock/hooks.c |  46 
 security/landlock/hooks.h |   5 +
 security/landlock/init.c  |   3 +-
 security/landlock/providers.c | 261 ++
 11 files changed, 415 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/landlock.h
 create mode 100644 security/landlock/providers.c

diff --git a/include/linux/landlock.h b/include/linux/landlock.h
new file mode 100644
index ..c5c929931a1f
--- /dev/null
+++ b/include/linux/landlock.h
@@ -0,0 +1,42 @@
+/*
+ * Landlock LSM - public kernel headers
+ *
+ * Copyright © 2016-2017 Mickaël Salaün 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2, as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_LANDLOCK_H
+#define _LINUX_LANDLOCK_H
+
+#include 
+#include  /* task_struct */
+
+#ifdef CONFIG_SECURITY_LANDLOCK
+struct landlock_events;
+#endif /* CONFIG_SECURITY_LANDLOCK */
+
+#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_SECURITY_LANDLOCK)
+extern int landlock_seccomp_prepend_rule(unsigned int flags,
+   const char __user *user_bpf_fd);
+extern void put_seccomp_landlock(struct task_struct *tsk);
+extern void get_seccomp_landlock(struct

Re: [PATCH net] ipv6: repair fib6 tree in failure case

2017-08-20 Thread David Miller

From: Wei Wang 
Date: Fri, 18 Aug 2017 17:14:49 -0700

> From: Wei Wang 
> 
> In fib6_add(), it is possible that fib6_add_1() picks an intermediate
> node and sets the node's fn->leaf to NULL in order to add this new
> route. However, if fib6_add_rt2node() fails to add the new
> route for some reason, fn->leaf will be left as NULL and could
> potentially cause crash when fn->leaf is accessed in fib6_locate().
> This patch makes sure fib6_repair_tree() is called to properly repair
> fn->leaf in the above failure case.
> 
> Here is the syzkaller reported general protection fault in fib6_locate:
 ...
> Note: there is no "Fixes" tag as this seems to be a bug introduced
> very early.
> 
> Signed-off-by: Wei Wang 
> Acked-by: Eric Dumazet 

Applied and queued up for -stable.

Re: [PATCH 0/3] MIPS,bpf: Improvements for MIPS eBPF JIT

2017-08-20 Thread David Miller

From: David Daney 
Date: Fri, 18 Aug 2017 16:40:30 -0700

> I suggest that the whole thing go via the BPF/net-next path as there
> are dependencies on code that is not yet merged to Linus' tree.

What kind of dependency?  On networking or MIPS changes?

If the dependency is on MIPS changes, then I cannot apply this as
it will break the net-next build on MIPS.  You should merge this
via the MIPS tree, where the dependencies are, in that case.

Please clarify what is specifically happening here.

Thanks.

Re: [PATCH 2/2] vhost-net: revert vhost_exceeds_maxpend logic to its original

2017-08-20 Thread Jason Wang




On 2017-08-19 14:41, Koichiro Den wrote:

To depend on vq.num and the usage of VHOST_MAX_PEND is not succinct
and in some case unexpected, so revert its logic part only.


Hi:

Could you explain a bit more about the case in which it was not sufficient?

Thanks



Signed-off-by: Koichiro Den 
---
  drivers/vhost/net.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 06d044862e58..99cf99b308a7 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -433,11 +433,15 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
  
  static bool vhost_exceeds_maxpend(struct vhost_net *net)

  {
+   int num_pends;
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
struct vhost_virtqueue *vq = &nvq->vq;
  
-	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV

-   == nvq->done_idx;
+   num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
+   (nvq->upend_idx - nvq->done_idx) :
+   (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx);
+
+   return num_pends > VHOST_MAX_PEND;
  }
  
  /* Expects to be always run from workqueue - which acts as
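
For readers following the discussion, the proposed replacement logic in
the quoted patch is a wrap-around distance between two indices in a ring
of UIO_MAXIOV slots. A minimal user-space sketch follows. The TOY_
constants are assumptions for this example (1024 and 128 are the values I
believe UIO_MAXIOV and VHOST_MAX_PEND had at the time, but treat them as
illustrative), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_UIO_MAXIOV 1024   /* assumed ring size */
#define TOY_MAX_PEND   128    /* assumed pending limit */

/* Number of slots between done_idx and upend_idx, allowing wrap-around. */
static int toy_num_pends(int upend_idx, int done_idx)
{
	assert(upend_idx >= 0 && upend_idx < TOY_UIO_MAXIOV);
	assert(done_idx >= 0 && done_idx < TOY_UIO_MAXIOV);

	return upend_idx >= done_idx ?
		upend_idx - done_idx :
		upend_idx + TOY_UIO_MAXIOV - done_idx;
}

/* Same comparison as the "+" lines of the quoted patch. */
static bool toy_exceeds_maxpend(int upend_idx, int done_idx)
{
	return toy_num_pends(upend_idx, done_idx) > TOY_MAX_PEND;
}
```

Unlike the expression being removed, this form does not depend on vq->num;
it only compares the number of outstanding entries against a fixed limit.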

[GIT] Networking

2017-08-20 Thread David Miller


1) Fix IGMP handling wrt VRF, from David Ahern.

2) Fix timer access to freed object in dccp, from Eric Dumazet.

3) Use kmalloc_array() in ptr_ring to avoid overflow cases which
   are triggerable by userspace.  Also from Eric Dumazet.

4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin
   Ian King.

5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.

6) Fix use after free in TIPC, from Eric Dumazet.

7) When replacing a route in ipv6 we need to reset the round robin
   pointer, from Wei Wang.

8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
   relaxed ordering changes, from Thierry Reding.  I made sure to get
   an explicit ACK from Bjorn this time around :-)

Please pull, thanks a lot!

The following changes since commit 510c8a899caf095cb13d09d203573deef15db2fe:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-08-15 18:52:28 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to 348a4002729ccab8b888b38cbc099efa2f2a2036:

  ipv6: repair fib6 tree in failure case (2017-08-20 20:06:56 -0700)


Alexander Potapenko (1):
  sctp: fully initialize the IPv6 address in sctp_v6_to_addr()

Chris Packham (1):
  switchdev: documentation: minor typo fixes

Colin Ian King (3):
  nfp: fix infinite loop on umapping cleanup
  netxen: fix incorrect loop counter decrement
  irda: do not leak initialized list.dev to userspace

Daniel Borkmann (2):
  bpf, doc: improve sysctl knob description
  bpf, doc: also add s390x as arch to sysctl description

David Ahern (1):
  net: igmp: Use ingress interface rather than vrf device

David Howells (1):
  rxrpc: Fix oops when discarding a preallocated service call

Eric Dumazet (5):
  dccp: defer ccid_hc_tx_delete() at dismantle time
  ptr_ring: use kmalloc_array()
  ipv4: better IP_MAX_MTU enforcement
  tun: handle register_netdevice() failures properly
  tipc: fix use-after-free

Eric Leblond (1):
  tools lib bpf: improve warning

Huy Nguyen (1):
  net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled

Jiri Pirko (1):
  net: sched: fix p_filter_chain check in tcf_chain_flush

Konstantin Khlebnikov (1):
  net_sched: fix order of queue length updates in qdisc_replace()

Liping Zhang (1):
  openvswitch: fix skb_panic due to the incorrect actions attrlen

Matthew Dawson (1):
  datagram: When peeking datagrams with offset < 0 don't skip empty skbs

Michael Ellerman (1):
  bpf: Update sysctl documentation to list all supported architectures

Neal Cardwell (1):
  tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP

Roopa Prabhu (1):
  net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set

Thierry Reding (1):
  PCI: Allow PCI express root ports to find themselves

Wei Wang (2):
  ipv6: reset fn->rr_ptr when replacing route
  ipv6: repair fib6 tree in failure case

Xin Long (1):
  net: sched: fix NULL pointer dereference when action calls some targets

 Documentation/networking/switchdev.txt  |  4 ++--
 Documentation/sysctl/net.txt| 47 
---
 drivers/net/ethernet/mellanox/mlx4/main.c   |  4 ++--
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c |  3 +--
 drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c  |  2 +-
 drivers/net/tun.c   |  3 +++
 drivers/pci/pci.c   |  9 -
 include/linux/ptr_ring.h|  9 +
 include/linux/skb_array.h   |  3 ++-
 include/net/ip.h|  4 ++--
 include/net/sch_generic.h   |  5 -
 include/net/sock.h  |  4 +---
 net/core/datagram.c | 12 +---
 net/dccp/proto.c| 14 --
 net/ipv4/igmp.c | 10 +-
 net/ipv4/route.c| 13 ++---
 net/ipv4/tcp_input.c|  3 +--
 net/ipv4/udp.c  |  3 ++-
 net/ipv6/ip6_fib.c  | 28 +++-
 net/ipv6/udp.c  |  3 ++-
 net/irda/af_irda.c  |  2 +-
 net/openvswitch/actions.c   |  1 +
 net/openvswitch/datapath.c  |  7 ---
 net/openvswitch/datapath.h  |  2 ++
 net/rxrpc/call_accept.c |  1 +
 net/sched/act_ipt.c |  2 ++
 net/sched/cls_api.c |  2 +-

Re: [PATCH V7 net-next 00/22] Huawei HiNIC Ethernet Driver

2017-08-20 Thread Aviad Krawczyk

Got it

On 8/17/2017 10:33 PM, David Miller wrote:
> 
> You've posted this series 3 times today.
> 
> That's way too fast.
> 
> You must wait at least one full day for more feedback to come
> your way.
> 
> If you just respin your series for every little small change nobody is
> going to perform a thorough review of your patches because you keep
> respinning them so often that people feel like their review efforts are
> going to be wasted.
> 
> You must be patient.  I can see that the reason you are respinning so
> often is because you want your patches integrated more quickly.
> 
> But what you are doing is having the opposite effect.  It is draining
> on people and will make your changes take longer to be integrated.
> 
> .
>

Re: [PATCH V5 net-next 01/21] net-next/hinic: Initialize hw interface

2017-08-20 Thread Aviad Krawczyk

We will remove all the casting from void *.

Thanks
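
The review point about the cast can be illustrated with a minimal sketch. The types and the demo_netdev_priv() helper below are hypothetical stand-ins, not the real hinic or netdev code; they only mirror the shape of a function that returns void *.

```c
/* Hypothetical stand-ins for the driver types; not the real hinic code. */
struct demo_netdev {
	void *priv;
};

struct demo_dev {
	int id;
};

/* Mirrors the shape of netdev_priv(): hands back private data as void *. */
static void *demo_netdev_priv(struct demo_netdev *netdev)
{
	return netdev->priv;
}

static int demo_get_id(struct demo_netdev *netdev)
{
	/* No cast: in C, void * converts implicitly to any object pointer type. */
	struct demo_dev *nic_dev = demo_netdev_priv(netdev);

	return nic_dev->id;
}
```

Dropping the cast also means the compiler will complain if the helper's return type ever changes to something incompatible, which an explicit cast would hide.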

On 8/18/2017 8:03 AM, David Miller wrote:
> From: Stephen Hemminger 
> Date: Thu, 17 Aug 2017 17:45:40 -0700
> 
>> On Thu, 17 Aug 2017 19:52:42 +0800
>> Aviad Krawczyk  wrote:
>>
>>> +   nic_dev = (struct hinic_dev *)netdev_priv(netdev);
>>
>> Since netdev_priv() returns void *, a cast is not necessary here.
> 
> Agreed.
> 
> .
>

[PATCH v6 iproute2 3/8] rdma: Add dev object

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

Device (dev) object represents struct ib_device to the user space.

Device properties:
 * Device capabilities
 * FW version to the device output
 * node_guid and sys_image_guid
 * node_type

Signed-off-by: Leon Romanovsky 
---
 rdma/Makefile |   2 +-
 rdma/dev.c| 230 ++
 rdma/rdma.c   |   3 +-
 rdma/rdma.h   |  17 +
 rdma/utils.c  |  54 +-
 5 files changed, 303 insertions(+), 3 deletions(-)
 create mode 100644 rdma/dev.c

diff --git a/rdma/Makefile b/rdma/Makefile
index 64da2142..123d7ac5 100644
--- a/rdma/Makefile
+++ b/rdma/Makefile
@@ -2,7 +2,7 @@ include ../Config
 
 ifeq ($(HAVE_MNL),y)
 
-RDMA_OBJ = rdma.o utils.o
+RDMA_OBJ = rdma.o utils.o dev.o
 
 TARGETS=rdma
 CFLAGS += $(shell $(PKG_CONFIG) libmnl --cflags)
diff --git a/rdma/dev.c b/rdma/dev.c
new file mode 100644
index ..f6b55bae
--- /dev/null
+++ b/rdma/dev.c
@@ -0,0 +1,230 @@
+/*
+ * dev.c   RDMA tool
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ * Authors: Leon Romanovsky 
+ */
+
+#include "rdma.h"
+
+static int dev_help(struct rd *rd)
+{
+   pr_out("Usage: %s dev show [DEV]\n", rd->filename);
+   return 0;
+}
+
+static const char *dev_caps_to_str(uint32_t idx)
+{
+#define RDMA_DEV_FLAGS(x) \
+   x(RESIZE_MAX_WR, 0) \
+   x(BAD_PKEY_CNTR, 1) \
+   x(BAD_QKEY_CNTR, 2) \
+   x(RAW_MULTI, 3) \
+   x(AUTO_PATH_MIG, 4) \
+   x(CHANGE_PHY_PORT, 5) \
+   x(UD_AV_PORT_ENFORCE_PORT_ENFORCE, 6) \
+   x(CURR_QP_STATE_MOD, 7) \
+   x(SHUTDOWN_PORT, 8) \
+   x(INIT_TYPE, 9) \
+   x(PORT_ACTIVE_EVENT, 10) \
+   x(SYS_IMAGE_GUID, 11) \
+   x(RC_RNR_NAK_GEN, 12) \
+   x(SRQ_RESIZE, 13) \
+   x(N_NOTIFY_CQ, 14) \
+   x(LOCAL_DMA_LKEY, 15) \
+   x(MEM_WINDOW, 17) \
+   x(UD_IP_CSUM, 18) \
+   x(UD_TSO, 19) \
+   x(XRC, 20) \
+   x(MEM_MGT_EXTENSIONS, 21) \
+   x(BLOCK_MULTICAST_LOOPBACK, 22) \
+   x(MEM_WINDOW_TYPE_2A, 23) \
+   x(MEM_WINDOW_TYPE_2B, 24) \
+   x(RC_IP_CSUM, 25) \
+   x(RAW_IP_CSUM, 26) \
+   x(CROSS_CHANNEL, 27) \
+   x(MANAGED_FLOW_STEERING, 29) \
+   x(SIGNATURE_HANDOVER, 30) \
+   x(ON_DEMAND_PAGING, 31) \
+   x(SG_GAPS_REG, 32) \
+   x(VIRTUAL_FUNCTION, 33) \
+   x(RAW_SCATTER_FCS, 34) \
+   x(RDMA_NETDEV_OPA_VNIC, 35)
+
+   enum { RDMA_DEV_FLAGS(RDMA_BITMAP_ENUM) };
+
+   static const char * const
+   rdma_dev_names[] = { RDMA_DEV_FLAGS(RDMA_BITMAP_NAMES) };
+   #undef RDMA_DEV_FLAGS
+
+   if (idx < ARRAY_SIZE(rdma_dev_names) && rdma_dev_names[idx])
+   return rdma_dev_names[idx];
+   return "UNKNOWN";
+}
+
+static void dev_print_caps(struct nlattr **tb)
+{
+   uint64_t caps;
+   uint32_t idx;
+
+   if (!tb[RDMA_NLDEV_ATTR_CAP_FLAGS])
+   return;
+
+   caps = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_CAP_FLAGS]);
+
+   pr_out("\ncaps: <");
+   for (idx = 0; caps; idx++) {
+   if (caps & 0x1) {
+   pr_out("%s", dev_caps_to_str(idx));
+   if (caps >> 0x1)
+   pr_out(", ");
+   }
+   caps >>= 0x1;
+   }
+
+   pr_out(">");
+}
+
+static void dev_print_fw(struct nlattr **tb)
+{
+   if (!tb[RDMA_NLDEV_ATTR_FW_VERSION])
+   return;
+
+   pr_out("fw %s ",
+  mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_FW_VERSION]));
+}
+
+static void dev_print_node_guid(struct nlattr **tb)
+{
+   uint64_t node_guid;
+
+   if (!tb[RDMA_NLDEV_ATTR_NODE_GUID])
+   return;
+
+   node_guid = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_NODE_GUID]);
+   rd_print_u64("node_guid", node_guid);
+}
+
+static void dev_print_sys_image_guid(struct nlattr **tb)
+{
+   uint64_t sys_image_guid;
+
+   if (!tb[RDMA_NLDEV_ATTR_SYS_IMAGE_GUID])
+   return;
+
+   sys_image_guid = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_SYS_IMAGE_GUID]);
+   rd_print_u64("sys_image_guid", sys_image_guid);
+}
+
+static const char *node_type_to_str(uint8_t node_type)
+{
+   static const char * const node_type_str[] = { "unknown", "ca",
+ "switch", "router",
+ "rnic", "usnic",
+ "usnic_dp" };
+   if (node_type < ARRAY_SIZE(node_type_str))
+   return node_type_str[node_type];
+   return "unknown";
+}
+
+static void dev_print_node_type(struct nlattr **tb)
+{
+   uint8_t node_type;
+
+
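
The name table in dev_caps_to_str() relies on an X-macro list that is expanded twice, once into an enum and once into a sparse array of strings. The sketch below shows that pattern in isolation. The DEMO_* expander macros are assumptions made for illustration; the real RDMA_BITMAP_ENUM and RDMA_BITMAP_NAMES definitions are not part of this excerpt and may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical expanders, named after the ones used in the patch. */
#define DEMO_BITMAP_ENUM(name, bit_no) DEMO_CAP_##name = (1 << (bit_no)),
#define DEMO_BITMAP_NAMES(name, bit_no) [bit_no] = #name,

/* One list of (name, bit) pairs; gaps in the numbering are allowed. */
#define DEMO_FLAGS(x) \
	x(RESIZE_MAX_WR, 0) \
	x(BAD_PKEY_CNTR, 1) \
	x(MEM_WINDOW, 17)

/* First expansion: enum constants holding the bit masks. */
enum { DEMO_FLAGS(DEMO_BITMAP_ENUM) };

static const char *demo_caps_to_str(uint32_t idx)
{
	/* Second expansion: designated initializers, unlisted slots stay NULL. */
	static const char * const names[] = { DEMO_FLAGS(DEMO_BITMAP_NAMES) };

	if (idx < DEMO_ARRAY_SIZE(names) && names[idx])
		return names[idx];
	return "UNKNOWN";
}
```

Because unlisted indexes are left NULL by the designated initializers, the lookup needs both the bounds check and the NULL check, as the patch does.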

[PATCH v6 iproute2 6/8] rdma: Implement json output for dev object

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

Example output for a machine with two devices:

root@mtr-leonro:~# rdma dev -j -p
[{
"ifindex": 1,
"ifname": "mlx5_0",
"node_type": "ca",
"fw": "2.8.",
"node_guid": "5254:00c0:fe12:3457",
"sys_image_guid": "5254:00c0:fe12:3457",
"caps": [ "BAD_PKEY_CNTR", "BAD_QKEY_CNTR", "CHANGE_PHY_PORT",
  "PORT_ACTIVE_EVENT", "SYS_IMAGE_GUID", "RC_RNR_NAK_GEN",
  "MEM_WINDOW", "UD_IP_CSUM", "UD_TSO", "XRC",
  "MEM_MGT_EXTENSIONS", "BLOCK_MULTICAST_LOOPBACK",
  "MEM_WINDOW_TYPE_2B", "RAW_IP_CSUM",
  "MANAGED_FLOW_STEERING", "RESIZE_MAX_WR" ]
},{
"ifindex": 2,
"ifname": "mlx5_1",
"node_type": "ca",
"fw": "2.8.",
"node_guid": "5254:00c0:fe12:3458",
"sys_image_guid": "5254:00c0:fe12:3458",
"caps": [ "BAD_PKEY_CNTR", "BAD_QKEY_CNTR", "CHANGE_PHY_PORT",
  "PORT_ACTIVE_EVENT", "SYS_IMAGE_GUID", "RC_RNR_NAK_GEN",
  "MEM_WINDOW", "UD_IP_CSUM", "UD_TSO", "XRC",
  "MEM_MGT_EXTENSIONS", "BLOCK_MULTICAST_LOOPBACK",
  "MEM_WINDOW_TYPE_2B", "RAW_IP_CSUM",
  "MANAGED_FLOW_STEERING", "RESIZE_MAX_WR" ]
}
]

Signed-off-by: Leon Romanovsky 
---
 rdma/dev.c | 110 +
 1 file changed, 82 insertions(+), 28 deletions(-)

diff --git a/rdma/dev.c b/rdma/dev.c
index f6b55bae..9fadf3ac 100644
--- a/rdma/dev.c
+++ b/rdma/dev.c
@@ -66,7 +66,7 @@ static const char *dev_caps_to_str(uint32_t idx)
return "UNKNOWN";
 }
 
-static void dev_print_caps(struct nlattr **tb)
+static void dev_print_caps(struct rd *rd, struct nlattr **tb)
 {
uint64_t caps;
uint32_t idx;
@@ -76,48 +76,78 @@ static void dev_print_caps(struct nlattr **tb)
 
caps = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_CAP_FLAGS]);
 
-   pr_out("\ncaps: <");
+   if (rd->json_output) {
+   jsonw_name(rd->jw, "caps");
+   jsonw_start_array(rd->jw);
+   } else {
+   pr_out("\ncaps: <");
+   }
for (idx = 0; caps; idx++) {
if (caps & 0x1) {
-   pr_out("%s", dev_caps_to_str(idx));
-   if (caps >> 0x1)
-   pr_out(", ");
+   if (rd->json_output) {
+   jsonw_string(rd->jw, dev_caps_to_str(idx));
+   } else {
+   pr_out("%s", dev_caps_to_str(idx));
+   if (caps >> 0x1)
+   pr_out(", ");
+   }
}
caps >>= 0x1;
}
 
-   pr_out(">");
+   if (rd->json_output)
+   jsonw_end_array(rd->jw);
+   else
+   pr_out(">");
 }
 
-static void dev_print_fw(struct nlattr **tb)
+static void dev_print_fw(struct rd *rd, struct nlattr **tb)
 {
+   const char *str;
if (!tb[RDMA_NLDEV_ATTR_FW_VERSION])
return;
 
-   pr_out("fw %s ",
-  mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_FW_VERSION]));
+   str = mnl_attr_get_str(tb[RDMA_NLDEV_ATTR_FW_VERSION]);
+   if (rd->json_output)
+   jsonw_string_field(rd->jw, "fw", str);
+   else
+   pr_out("fw %s ", str);
 }
 
-static void dev_print_node_guid(struct nlattr **tb)
+static void dev_print_node_guid(struct rd *rd, struct nlattr **tb)
 {
uint64_t node_guid;
+   uint16_t vp[4];
+   char str[32];
 
if (!tb[RDMA_NLDEV_ATTR_NODE_GUID])
return;
 
node_guid = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_NODE_GUID]);
-   rd_print_u64("node_guid", node_guid);
+   memcpy(vp, &node_guid, sizeof(uint64_t));
+   snprintf(str, 32, "%04x:%04x:%04x:%04x", vp[3], vp[2], vp[1], vp[0]);
+   if (rd->json_output)
+   jsonw_string_field(rd->jw, "node_guid", str);
+   else
+   pr_out("node_guid %s ", str);
 }
 
-static void dev_print_sys_image_guid(struct nlattr **tb)
+static void dev_print_sys_image_guid(struct rd *rd, struct nlattr **tb)
 {
uint64_t sys_image_guid;
+   uint16_t vp[4];
+   char str[32];
 
if (!tb[RDMA_NLDEV_ATTR_SYS_IMAGE_GUID])
return;
 
sys_image_guid = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_SYS_IMAGE_GUID]);
-   rd_print_u64("sys_image_guid", sys_image_guid);
+   memcpy(vp, &sys_image_guid, sizeof(uint64_t));
+   snprintf(str, 32, "%04x:%04x:%04x:%04x", vp[3], vp[2], vp[1], vp[0]);
+   if (rd->json_output)
+   jsonw_string_field(rd->jw, "sys_image_guid", str);
+   else
+   pr_out("sys_image_guid %s ", str);
 }
 
 static const char *node_type_to_str(uint8_t node_type)
@@ -131,37 +161,51 @@ static const char
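
The GUID formatting in this patch copies the 64-bit value into four 16-bit words and prints them from index 3 down to 0, which yields the most significant group first on a little-endian host. A minimal sketch of the same output format using shifts, so that it does not depend on host byte order, could look as follows. The helper name demo_guid_to_str() is hypothetical and not part of the series.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a 64-bit GUID as four 16-bit hex groups. */
static void demo_guid_to_str(uint64_t guid, char *buf, size_t len)
{
	/* Shifts pick the groups by value, most significant group first. */
	snprintf(buf, len, "%04x:%04x:%04x:%04x",
		 (unsigned int)((guid >> 48) & 0xffff),
		 (unsigned int)((guid >> 32) & 0xffff),
		 (unsigned int)((guid >> 16) & 0xffff),
		 (unsigned int)(guid & 0xffff));
}
```

A 32-byte buffer, as used in the patch, is more than enough: the formatted string is 19 characters plus the terminating NUL.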

[PATCH v6 iproute2 4/8] rdma: Add link object

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

Link (port) object represents struct ib_port to the user space.

Link properties:
 * Port capabilities
 * IB subnet prefix
 * LID, SM_LID and LMC
 * Port state
 * Physical state

Signed-off-by: Leon Romanovsky 
---
 rdma/Makefile |   2 +-
 rdma/link.c   | 277 ++
 rdma/rdma.c   |   3 +-
 rdma/utils.c  |   5 ++
 4 files changed, 285 insertions(+), 2 deletions(-)
 create mode 100644 rdma/link.c

diff --git a/rdma/Makefile b/rdma/Makefile
index 123d7ac5..1a9e4b1a 100644
--- a/rdma/Makefile
+++ b/rdma/Makefile
@@ -2,7 +2,7 @@ include ../Config
 
 ifeq ($(HAVE_MNL),y)
 
-RDMA_OBJ = rdma.o utils.o dev.o
+RDMA_OBJ = rdma.o utils.o dev.o link.o
 
 TARGETS=rdma
 CFLAGS += $(shell $(PKG_CONFIG) libmnl --cflags)
diff --git a/rdma/link.c b/rdma/link.c
new file mode 100644
index ..b0e5bee0
--- /dev/null
+++ b/rdma/link.c
@@ -0,0 +1,277 @@
+/*
+ * link.c  RDMA tool
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ * Authors: Leon Romanovsky 
+ */
+
+#include "rdma.h"
+
+static int link_help(struct rd *rd)
+{
+   pr_out("Usage: %s link show [DEV/PORT_INDEX]\n", rd->filename);
+   return 0;
+}
+
+static const char *caps_to_str(uint32_t idx)
+{
+#define RDMA_PORT_FLAGS(x) \
+   x(SM, 1) \
+   x(NOTICE, 2) \
+   x(TRAP, 3) \
+   x(OPT_IPD, 4) \
+   x(AUTO_MIGR, 5) \
+   x(SL_MAP, 6) \
+   x(MKEY_NVRAM, 7) \
+   x(PKEY_NVRAM, 8) \
+   x(LED_INFO, 9) \
+   x(SM_DISABLED, 10) \
+   x(SYS_IMAGE_GUID, 11) \
+   x(PKEY_SW_EXT_PORT_TRAP, 12) \
+   x(EXTENDED_SPEEDS, 14) \
+   x(CM, 16) \
+   x(SNMP_TUNNEL, 17) \
+   x(REINIT, 18) \
+   x(DEVICE_MGMT, 19) \
+   x(VENDOR_CLASS, 20) \
+   x(DR_NOTICE, 21) \
+   x(CAP_MASK_NOTICE, 22) \
+   x(BOOT_MGMT, 23) \
+   x(LINK_LATENCY, 24) \
+   x(CLIENT_REG, 25) \
+   x(IP_BASED_GIDS, 26)
+
+   enum { RDMA_PORT_FLAGS(RDMA_BITMAP_ENUM) };
+
+   static const char * const
+   rdma_port_names[] = { RDMA_PORT_FLAGS(RDMA_BITMAP_NAMES) };
+   #undef RDMA_PORT_FLAGS
+
+   if (idx < ARRAY_SIZE(rdma_port_names) && rdma_port_names[idx])
+   return rdma_port_names[idx];
+   return "UNKNOWN";
+}
+
+static void link_print_caps(struct nlattr **tb)
+{
+   uint64_t caps;
+   uint32_t idx;
+
+   if (!tb[RDMA_NLDEV_ATTR_CAP_FLAGS])
+   return;
+
+   caps = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_CAP_FLAGS]);
+
+   pr_out("\ncaps: <");
+   for (idx = 0; caps; idx++) {
+   if (caps & 0x1) {
+   pr_out("%s", caps_to_str(idx));
+   if (caps >> 0x1)
+   pr_out(", ");
+   }
+   caps >>= 0x1;
+   }
+
+   pr_out(">");
+}
+
+static void link_print_subnet_prefix(struct nlattr **tb)
+{
+   uint64_t subnet_prefix;
+
+   if (!tb[RDMA_NLDEV_ATTR_SUBNET_PREFIX])
+   return;
+
+   subnet_prefix = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_SUBNET_PREFIX]);
+   rd_print_u64("subnet_prefix", subnet_prefix);
+}
+
+static void link_print_lid(struct nlattr **tb)
+{
+   if (!tb[RDMA_NLDEV_ATTR_LID])
+   return;
+
+   pr_out("lid %u ",
+  mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_LID]));
+}
+
+static void link_print_sm_lid(struct nlattr **tb)
+{
+   if (!tb[RDMA_NLDEV_ATTR_SM_LID])
+   return;
+
+   pr_out("sm_lid %u ",
+  mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_SM_LID]));
+}
+
+static void link_print_lmc(struct nlattr **tb)
+{
+   if (!tb[RDMA_NLDEV_ATTR_LMC])
+   return;
+
+   pr_out("lmc %u ", mnl_attr_get_u8(tb[RDMA_NLDEV_ATTR_LMC]));
+}
+
+static const char *link_state_to_str(uint8_t link_state)
+{
+   static const char * const link_state_str[] = { "NOP", "DOWN",
+  "INIT", "ARMED",
+  "ACTIVE",
+  "ACTIVE_DEFER" };
+   if (link_state < ARRAY_SIZE(link_state_str))
+   return link_state_str[link_state];
+   return "UNKNOWN";
+}
+
+static void link_print_state(struct nlattr **tb)
+{
+   uint8_t state;
+
+   if (!tb[RDMA_NLDEV_ATTR_PORT_STATE])
+   return;
+
+   state = mnl_attr_get_u8(tb[RDMA_NLDEV_ATTR_PORT_STATE]);
+   pr_out("state %s ", link_state_to_str(state));
+}
+
+static const char *phys_state_to_str(uint8_t phys_state)
+{
+   static const char * const phys_state_str[] = { "NOP", "SLEEP",
+
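
The capability printing loop in link_print_caps() walks the mask from bit 0 upward and only emits a separator while higher bits remain set. The sketch below reproduces that loop against a string buffer so the result can be inspected. The name table and both helpers are hypothetical and stand in for caps_to_str() and pr_out().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name lookup; stands in for caps_to_str() from the patch. */
static const char *demo_bit_name(uint32_t idx)
{
	static const char * const names[] = { "A", "B", "C", "D" };

	if (idx < sizeof(names) / sizeof(names[0]))
		return names[idx];
	return "UNKNOWN";
}

/* Appends s to the NUL-terminated string in buf, truncating if needed. */
static void demo_append(char *buf, size_t len, const char *s)
{
	size_t used = strlen(buf);

	if (used < len)
		snprintf(buf + used, len - used, "%s", s);
}

/* Writes the set bits of caps into buf as "<X, Y>". */
static void demo_format_caps(uint64_t caps, char *buf, size_t len)
{
	uint32_t idx;

	if (len == 0)
		return;
	buf[0] = '\0';
	demo_append(buf, len, "<");
	for (idx = 0; caps; idx++) {
		if (caps & 0x1) {
			demo_append(buf, len, demo_bit_name(idx));
			/* A separator is needed only while higher bits remain. */
			if (caps >> 0x1)
				demo_append(buf, len, ", ");
		}
		caps >>= 0x1;
	}
	demo_append(buf, len, ">");
}
```

The loop terminates as soon as the shifted mask reaches zero, so it never scans past the highest set bit.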

[PATCH v6 iproute2 0/8] RDMAtool

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

This is the fifth revision of the series implementing RDMAtool, the tool
to configure RDMA devices.

It looks like everyone who was interested in reading the cover letter has
already done so, so I'll start with the changelog:

Changelog:
v5->v6:
 * Removed double includes
 * Copied rdma_netlink.h from the kernel to include/rdma folder, so the
   tool can be built as a standalone.
v4->v5:
 * Rebased to latest net-next branch
 * Moved BIT() macro from devlink to general utils.h file - Patch #1.
 * Changed the order of patches - moved man pages to be last patch.
 * Rewrote all switch->case->return_string constructions to be static
   tables with help of David's macro magic. Thanks a lot.
 * Dropped dependency on exported device and port properties. Now tool depends
   on RDMA netlink only and all needed code is already in Doug's for-next.
 * Added two OPA-specific physical link states. Because their names, TEST and
   OFFLINE, are too broad, I named them OPA_TEST and OPA_OFFLINE.
v3->v4:
 * Rebased to latest net-next branch
 * Added JSON output -j (json) and -p (pretty output)
 * Exported and reused kernel UAPIs and defines instead of hard coded
   version.
v2->v3:
 * Removed MAX()
 * Reduced scope of rd_argv_match
 * Removed return from rdma_free_devmap
 * Added extra break at rdma_send_msg
v1->v2:
 * Squashed multiple (and similar) patches to be one patch for dev object
   and one patch for link object.
 * Removed port_map struct
 * Removed global netlink dump during initialization; this removes the need to
   store intermediate variables and reuses netlink's ability to signal whether
   a variable exists.
 * Added "-d" --details option and put all CAPs under it.

v0->v1:
 * Moved hunk with changes in man/Makefile from first patch to the last patch
 * Removed the "unknown command" from the examples in commit messages
 * Removed the special "caps" parsing command and made it part of the general
   "show" command
 * Changed parsed capability format to be similar to iproute2 suite
 * Added FW version as an output of show command.
 * Added forgotten CAP_FLAGS to the nla_policy list
RFC->v0:
 * Removed everything that is not implemented yet.
 * Abandoned sysfs interfaces in favor of netlink.

-
The initial proposal was sent as RFC [1] and was based on sysfs entries as POC.

The current series was rewritten completely to work with RDMA netlinks as
a source of user<->kernel communications. In order to achieve that, the
RDMA netlinks were extensively refactored and modernized [2, 3, 4 and 5].

Doug's for-next tag includes most of the needed patches for this tool.

The following is an example of various runs on my machine with 5 devices
(4 in IB mode and one in Ethernet mode).

### Without parameters
$ rdma
Usage: rdma [ OPTIONS ] OBJECT { COMMAND | help }
where  OBJECT := { dev | link | help }
   OPTIONS := { -V[ersion] | -d[etails] | -j[son] | -p[retty]}

### With unspecified device name
$ rdma dev
1: mlx5_0: node_type ca fw 2.8. node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457
2: mlx5_1: node_type ca fw 2.8. node_guid 5254:00c0:fe12:3458 sys_image_guid 5254:00c0:fe12:3458
3: mlx5_2: node_type ca fw 2.8. node_guid 5254:00c0:fe12:3459 sys_image_guid 5254:00c0:fe12:3459
4: mlx5_3: node_type ca fw 2.8. node_guid 5254:00c0:fe12:345a sys_image_guid 5254:00c0:fe12:345a
5: mlx5_4: node_type ca fw 2.8. node_guid 5254:00c0:fe12:345b sys_image_guid 5254:00c0:fe12:345b

### Detailed mode
$ rdma -d dev
1: mlx5_0: node_type ca fw 2.8. node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457
caps: 
2: mlx5_1: node_type ca fw 2.8. node_guid 5254:00c0:fe12:3458 sys_image_guid 5254:00c0:fe12:3458
caps: 
3: mlx5_2: node_type ca fw 2.8. node_guid 5254:00c0:fe12:3459 sys_image_guid 5254:00c0:fe12:3459
caps: 
4: mlx5_3: node_type ca fw 2.8. node_guid 5254:00c0:fe12:345a sys_image_guid 5254:00c0:fe12:345a
caps:
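
The v4->v5 changelog entry about replacing switch->case->return_string constructions with static tables can be sketched as a before and after pair. Both functions below are illustrative only. The state names are taken from the link patch in this series, while the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: one case per value, each returning a string literal. */
static const char *demo_state_switch(uint8_t state)
{
	switch (state) {
	case 0: return "NOP";
	case 1: return "DOWN";
	case 2: return "INIT";
	case 3: return "ARMED";
	case 4: return "ACTIVE";
	case 5: return "ACTIVE_DEFER";
	default: return "UNKNOWN";
	}
}

/* After: the value indexes a static table, guarded by a bounds check. */
static const char *demo_state_table(uint8_t state)
{
	static const char * const names[] = {
		"NOP", "DOWN", "INIT", "ARMED", "ACTIVE", "ACTIVE_DEFER"
	};

	if (state < sizeof(names) / sizeof(names[0]))
		return names[state];
	return "UNKNOWN";
}
```

The table form keeps the names in one place and makes adding a state a one-line change, at the cost of requiring the values to be small and dense.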

[PATCH v6 iproute2 5/8] rdma: Add json and pretty outputs

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

Signed-off-by: Leon Romanovsky 
---
 rdma/rdma.c | 31 ---
 rdma/rdma.h |  4 
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/rdma/rdma.c b/rdma/rdma.c
index 74c09e8b..f9f4f2a2 100644
--- a/rdma/rdma.c
+++ b/rdma/rdma.c
@@ -16,7 +16,7 @@ static void help(char *name)
 {
pr_out("Usage: %s [ OPTIONS ] OBJECT { COMMAND | help }\n"
   "where  OBJECT := { dev | link | help }\n"
-  "   OPTIONS := { -V[ersion] | -d[etails]}\n", name);
+  "   OPTIONS := { -V[ersion] | -d[etails] | -j[son] | -p[retty]}\n", name);
 }
 
 static int cmd_help(struct rd *rd)
@@ -47,6 +47,16 @@ static int rd_init(struct rd *rd, int argc, char **argv, char *filename)
rd->argc = argc;
rd->argv = argv;
INIT_LIST_HEAD(&rd->dev_map_list);
+
+   if (rd->json_output) {
+   rd->jw = jsonw_new(stdout);
+   if (!rd->jw) {
+   pr_err("Failed to create JSON writer\n");
+   return -ENOMEM;
+   }
+   jsonw_pretty(rd->jw, rd->pretty_output);
+   }
+
rd->buff = malloc(MNL_SOCKET_BUFFER_SIZE);
if (!rd->buff)
return -ENOMEM;
@@ -62,6 +72,8 @@ static int rd_init(struct rd *rd, int argc, char **argv, char *filename)
 
 static void rd_free(struct rd *rd)
 {
+   if (rd->json_output)
+   jsonw_destroy(&rd->jw);
free(rd->buff);
rd_free_devmap(rd);
 }
@@ -71,10 +83,14 @@ int main(int argc, char **argv)
static const struct option long_options[] = {
{ "version",no_argument,NULL, 'V' },
{ "help",   no_argument,NULL, 'h' },
+   { "json",   no_argument,NULL, 'j' },
+   { "pretty", no_argument,NULL, 'p' },
{ "details",no_argument,NULL, 'd' },
{ NULL, 0, NULL, 0 }
};
+   bool pretty_output = false;
bool show_details = false;
+   bool json_output = false;
char *filename;
struct rd rd;
int opt;
@@ -82,16 +98,22 @@ int main(int argc, char **argv)
 
filename = basename(argv[0]);
 
-   while ((opt = getopt_long(argc, argv, "Vhd",
+   while ((opt = getopt_long(argc, argv, "Vhdpj",
  long_options, NULL)) >= 0) {
switch (opt) {
case 'V':
printf("%s utility, iproute2-ss%s\n",
   filename, SNAPSHOT);
return EXIT_SUCCESS;
+   case 'p':
+   pretty_output = true;
+   break;
case 'd':
show_details = true;
break;
+   case 'j':
+   json_output = true;
+   break;
case 'h':
help(filename);
return EXIT_SUCCESS;
@@ -105,11 +127,14 @@ int main(int argc, char **argv)
argc -= optind;
argv += optind;
 
+   rd.show_details = show_details;
+   rd.json_output = json_output;
+   rd.pretty_output = pretty_output;
+
err = rd_init(&rd, argc, argv, filename);
if (err)
goto out;
 
-   rd.show_details = show_details;
err = rd_cmd(&rd);
 out:
/* Always cleanup */
diff --git a/rdma/rdma.h b/rdma/rdma.h
index 36b047d3..4c564fef 100644
--- a/rdma/rdma.h
+++ b/rdma/rdma.h
@@ -21,6 +21,7 @@
 
 #include "list.h"
 #include "utils.h"
+#include "json_writer.h"
 
 #define pr_err(args...) fprintf(stderr, ##args)
 #define pr_out(args...) fprintf(stdout, ##args)
@@ -46,6 +47,9 @@ struct rd {
struct mnl_socket *nl;
struct nlmsghdr *nlh;
char *buff;
+   json_writer_t *jw;
+   bool json_output;
+   bool pretty_output;
 };
 
 struct rd_cmd {
-- 
2.14.1
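
The option handling added in this patch can be sketched on its own as follows. This is a simplified illustration, assuming a hypothetical demo_opts structure and demo_parse() helper; it covers only the three boolean flags and leaves out -V and -h.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the parsed flags. */
struct demo_opts {
	bool json_output;
	bool pretty_output;
	bool show_details;
};

/* Returns 0 on success, -1 on an unknown option. */
static int demo_parse(int argc, char **argv, struct demo_opts *opts)
{
	static const struct option long_options[] = {
		{ "json",    no_argument, NULL, 'j' },
		{ "pretty",  no_argument, NULL, 'p' },
		{ "details", no_argument, NULL, 'd' },
		{ NULL, 0, NULL, 0 }
	};
	int opt;

	while ((opt = getopt_long(argc, argv, "dpj",
				  long_options, NULL)) >= 0) {
		switch (opt) {
		case 'j':
			opts->json_output = true;
			break;
		case 'p':
			opts->pretty_output = true;
			break;
		case 'd':
			opts->show_details = true;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

As in the patch, the flags have to be stored before the initialization step runs, because creating the JSON writer depends on json_output and pretty_output already being set.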

[PATCH v6 iproute2 2/8] rdma: Add basic infrastructure for RDMA tool

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

RDMA devices are cross-functional devices on the one hand,
but closely tailored to specific markets on the other.

This diversity has spread RDMA-related configuration
across various tools, e.g. devlink, ip, ethtool, IB-specific and
vendor-specific solutions.

This patch adds the ability to fill in device and port information
by reading RDMA netlink.

Signed-off-by: Leon Romanovsky 
---
 Makefile|   2 +-
 include/rdma/rdma_netlink.h | 307 
 rdma/.gitignore |   1 +
 rdma/Makefile   |  22 
 rdma/rdma.c | 116 +
 rdma/rdma.h |  71 ++
 rdma/utils.c| 217 +++
 7 files changed, 735 insertions(+), 1 deletion(-)
 create mode 100644 include/rdma/rdma_netlink.h
 create mode 100644 rdma/.gitignore
 create mode 100644 rdma/Makefile
 create mode 100644 rdma/rdma.c
 create mode 100644 rdma/rdma.h
 create mode 100644 rdma/utils.c

diff --git a/Makefile b/Makefile
index 1f88f7f5..dbb4a4af 100644
--- a/Makefile
+++ b/Makefile
@@ -49,7 +49,7 @@ WFLAGS += -Wmissing-declarations -Wold-style-definition -Wformat=2
 CFLAGS := $(WFLAGS) $(CCOPTS) -I../include $(DEFINES) $(CFLAGS)
 YACCFLAGS = -d -t -v
 
-SUBDIRS=lib ip tc bridge misc netem genl tipc devlink man
+SUBDIRS=lib ip tc bridge misc netem genl tipc devlink rdma man
 
 LIBNETLINK=../lib/libnetlink.a ../lib/libutil.a
 LDLIBS += $(LIBNETLINK)
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
new file mode 100644
index ..861440a8
--- /dev/null
+++ b/include/rdma/rdma_netlink.h
@@ -0,0 +1,307 @@
+#ifndef _UAPI_RDMA_NETLINK_H
+#define _UAPI_RDMA_NETLINK_H
+
+#include <linux/types.h>
+
+enum {
+   RDMA_NL_RDMA_CM = 1,
+   RDMA_NL_IWCM,
+   RDMA_NL_RSVD,
+   RDMA_NL_LS, /* RDMA Local Services */
+   RDMA_NL_NLDEV,  /* RDMA device interface */
+   RDMA_NL_NUM_CLIENTS
+};
+
+enum {
+   RDMA_NL_GROUP_CM = 1,
+   RDMA_NL_GROUP_IWPM,
+   RDMA_NL_GROUP_LS,
+   RDMA_NL_NUM_GROUPS
+};
+
+#define RDMA_NL_GET_CLIENT(type) ((type & (((1 << 6) - 1) << 10)) >> 10)
+#define RDMA_NL_GET_OP(type) (type & ((1 << 10) - 1))
+#define RDMA_NL_GET_TYPE(client, op) ((client << 10) + op)
+
+enum {
+   RDMA_NL_RDMA_CM_ID_STATS = 0,
+   RDMA_NL_RDMA_CM_NUM_OPS
+};
+
+enum {
+   RDMA_NL_RDMA_CM_ATTR_SRC_ADDR = 1,
+   RDMA_NL_RDMA_CM_ATTR_DST_ADDR,
+   RDMA_NL_RDMA_CM_NUM_ATTR,
+};
+
+/* iwarp port mapper op-codes */
+enum {
+   RDMA_NL_IWPM_REG_PID = 0,
+   RDMA_NL_IWPM_ADD_MAPPING,
+   RDMA_NL_IWPM_QUERY_MAPPING,
+   RDMA_NL_IWPM_REMOVE_MAPPING,
+   RDMA_NL_IWPM_REMOTE_INFO,
+   RDMA_NL_IWPM_HANDLE_ERR,
+   RDMA_NL_IWPM_MAPINFO,
+   RDMA_NL_IWPM_MAPINFO_NUM,
+   RDMA_NL_IWPM_NUM_OPS
+};
+
+struct rdma_cm_id_stats {
+   __u32   qp_num;
+   __u32   bound_dev_if;
+   __u32   port_space;
+   __s32   pid;
+   __u8    cm_state;
+   __u8    node_type;
+   __u8    port_num;
+   __u8    qp_type;
+};
+
+enum {
+   IWPM_NLA_REG_PID_UNSPEC = 0,
+   IWPM_NLA_REG_PID_SEQ,
+   IWPM_NLA_REG_IF_NAME,
+   IWPM_NLA_REG_IBDEV_NAME,
+   IWPM_NLA_REG_ULIB_NAME,
+   IWPM_NLA_REG_PID_MAX
+};
+
+enum {
+   IWPM_NLA_RREG_PID_UNSPEC = 0,
+   IWPM_NLA_RREG_PID_SEQ,
+   IWPM_NLA_RREG_IBDEV_NAME,
+   IWPM_NLA_RREG_ULIB_NAME,
+   IWPM_NLA_RREG_ULIB_VER,
+   IWPM_NLA_RREG_PID_ERR,
+   IWPM_NLA_RREG_PID_MAX
+
+};
+
+enum {
+   IWPM_NLA_MANAGE_MAPPING_UNSPEC = 0,
+   IWPM_NLA_MANAGE_MAPPING_SEQ,
+   IWPM_NLA_MANAGE_ADDR,
+   IWPM_NLA_MANAGE_MAPPED_LOC_ADDR,
+   IWPM_NLA_RMANAGE_MAPPING_ERR,
+   IWPM_NLA_RMANAGE_MAPPING_MAX
+};
+
+#define IWPM_NLA_MANAGE_MAPPING_MAX 3
+#define IWPM_NLA_QUERY_MAPPING_MAX  4
+#define IWPM_NLA_MAPINFO_SEND_MAX   3
+
+enum {
+   IWPM_NLA_QUERY_MAPPING_UNSPEC = 0,
+   IWPM_NLA_QUERY_MAPPING_SEQ,
+   IWPM_NLA_QUERY_LOCAL_ADDR,
+   IWPM_NLA_QUERY_REMOTE_ADDR,
+   IWPM_NLA_RQUERY_MAPPED_LOC_ADDR,
+   IWPM_NLA_RQUERY_MAPPED_REM_ADDR,
+   IWPM_NLA_RQUERY_MAPPING_ERR,
+   IWPM_NLA_RQUERY_MAPPING_MAX
+};
+
+enum {
+   IWPM_NLA_MAPINFO_REQ_UNSPEC = 0,
+   IWPM_NLA_MAPINFO_ULIB_NAME,
+   IWPM_NLA_MAPINFO_ULIB_VER,
+   IWPM_NLA_MAPINFO_REQ_MAX
+};
+
+enum {
+   IWPM_NLA_MAPINFO_UNSPEC = 0,
+   IWPM_NLA_MAPINFO_LOCAL_ADDR,
+   IWPM_NLA_MAPINFO_MAPPED_ADDR,
+   IWPM_NLA_MAPINFO_MAX
+};
+
+enum {
+   IWPM_NLA_MAPINFO_NUM_UNSPEC = 0,
+   IWPM_NLA_MAPINFO_SEQ,
+   IWPM_NLA_MAPINFO_SEND_NUM,
+   IWPM_NLA_MAPINFO_ACK_NUM,
+   IWPM_NLA_MAPINFO_NUM_MAX
+};
+
+enum {
+   IWPM_NLA_ERR_UNSPEC = 0,
+   IWPM_NLA_ERR_SEQ,
+   IWPM_NLA_ERR_CODE,
+   IWPM_NLA_ERR_MAX
+};
+
+/*
+ * Local service operations:
+ *   RESOLVE - The client requests the local service to
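
The RDMA_NL_GET_TYPE, RDMA_NL_GET_CLIENT and RDMA_NL_GET_OP macros in the header above pack a client index into the bits above bit 9 and an operation into the low 10 bits. A minimal sketch of the same arithmetic as inline functions follows. The demo_* names are hypothetical and are not part of the header.

```c
#include <stdint.h>

/* Hypothetical inline equivalent of RDMA_NL_GET_TYPE(client, op). */
static inline uint16_t demo_nl_type(unsigned int client, unsigned int op)
{
	return (uint16_t)((client << 10) + op);
}

/* Hypothetical equivalent of RDMA_NL_GET_CLIENT(type): bits 10..15. */
static inline unsigned int demo_nl_client(uint16_t type)
{
	return (type & (((1u << 6) - 1) << 10)) >> 10;
}

/* Hypothetical equivalent of RDMA_NL_GET_OP(type): bits 0..9. */
static inline unsigned int demo_nl_op(uint16_t type)
{
	return type & ((1u << 10) - 1);
}
```

With RDMA_NL_NLDEV being the fifth enumerator, and therefore 5, a message of operation 3 for that client would carry type 5 * 1024 + 3 = 5123, and both fields can be recovered from it.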

[PATCH v6 iproute2 1/8] utils: Move BIT macro to common header

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

The BIT() macro has so far been implemented and used only by devlink, but
the following rdmatool patches will reuse it, so move it to a common
header file.

Signed-off-by: Leon Romanovsky 
---
 devlink/devlink.c | 2 +-
 include/utils.h   | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index f9bc16c3..7602970b 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -25,6 +25,7 @@
 #include "list.h"
 #include "mnlg.h"
 #include "json_writer.h"
+#include "utils.h"
 
 #define ESWITCH_MODE_LEGACY "legacy"
 #define ESWITCH_MODE_SWITCHDEV "switchdev"
@@ -160,7 +161,6 @@ static void ifname_map_free(struct ifname_map *ifname_map)
free(ifname_map);
 }
 
-#define BIT(nr) (1UL << (nr))
 #define DL_OPT_HANDLE  BIT(0)
 #define DL_OPT_HANDLEP BIT(1)
 #define DL_OPT_PORT_TYPE   BIT(2)
diff --git a/include/utils.h b/include/utils.h
index 565bda60..1bb6d6a2 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -196,6 +196,8 @@ static inline void __jiffies_to_tv(struct timeval *tv, unsigned long jiffies)
 int print_timestamp(FILE *fp);
 void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n);
 
+#define BIT(nr) (1UL << (nr))
+
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 
 #define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))
-- 
2.14.1
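
As a small illustration of how a shared BIT() macro is typically used for option masks, consider the sketch below. The DEMO_* names and demo_has_all() are hypothetical; only the macro body matches the one moved by this patch.

```c
/* Same body as the BIT() macro moved to include/utils.h. */
#define DEMO_BIT(nr) (1UL << (nr))

/* Hypothetical option flags in the style of devlink's DL_OPT_* defines. */
#define DEMO_OPT_HANDLE    DEMO_BIT(0)
#define DEMO_OPT_HANDLEP   DEMO_BIT(1)
#define DEMO_OPT_PORT_TYPE DEMO_BIT(2)

/* Returns 1 when every bit in 'required' is present in 'found'. */
static int demo_has_all(unsigned long found, unsigned long required)
{
	return (found & required) == required;
}
```

One point to keep in mind: since the macro shifts 1UL, bit numbers of 32 and above are only usable where unsigned long is 64 bits wide.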

[PATCH v6 iproute2 8/8] rdma: Add initial manual for the tool

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

Signed-off-by: Leon Romanovsky 
---
 man/man8/rdma-dev.8  |  55 +++
 man/man8/rdma-link.8 |  55 +++
 man/man8/rdma.8  | 102 +++
 3 files changed, 212 insertions(+)
 create mode 100644 man/man8/rdma-dev.8
 create mode 100644 man/man8/rdma-link.8
 create mode 100644 man/man8/rdma.8

diff --git a/man/man8/rdma-dev.8 b/man/man8/rdma-dev.8
new file mode 100644
index ..461681b6
--- /dev/null
+++ b/man/man8/rdma-dev.8
@@ -0,0 +1,55 @@
+.TH RDMA\-DEV 8 "06 Jul 2017" "iproute2" "Linux"
+.SH NAME
+rdma-dev \- RDMA device configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B rdma
+.RI "[ " OPTIONS " ]"
+.B dev
+.RI  " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] |
+\fB\-d\fR[\fIetails\fR] }
+
+.ti -8
+.B rdma dev show
+.RI "[ " DEV " ]"
+
+.ti -8
+.B rdma dev help
+
+.SH "DESCRIPTION"
+.SS rdma dev show - display rdma device attributes
+
+.PP
+.I "DEV"
+- specifies the RDMA device to show.
+If this argument is omitted all devices are listed.
+
+.SH "EXAMPLES"
+.PP
+rdma dev
+.RS 4
+Shows the state of all RDMA devices on the system.
+.RE
+.PP
+rdma dev show mlx5_3
+.RS 4
+Shows the state of specified RDMA device.
+.RE
+.PP
+
+.SH SEE ALSO
+.BR rdma (8),
+.BR rdma-link (8),
+.br
+
+.SH AUTHOR
+Leon Romanovsky 
diff --git a/man/man8/rdma-link.8 b/man/man8/rdma-link.8
new file mode 100644
index ..8ed049ef
--- /dev/null
+++ b/man/man8/rdma-link.8
@@ -0,0 +1,55 @@
+.TH RDMA\-LINK 8 "06 Jul 2017" "iproute2" "Linux"
+.SH NAME
+rdma-link \- rdma link configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B rdma
+.RI "[ " OPTIONS " ]"
+.B link
+.RI  " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] |
+\fB\-d\fR[\fIetails\fR] }
+
+.ti -8
+.B rdma link show
+.RI "[ " DEV/PORT_INDEX " ]"
+
+.ti -8
+.B rdma link help
+
+.SH "DESCRIPTION"
+.SS rdma link show - display rdma link attributes
+
+.PP
+.I "DEV/PORT_INDEX"
+- specifies the RDMA link to show.
+If this argument is omitted all links are listed.
+
+.SH "EXAMPLES"
+.PP
+rdma link show
+.RS 4
+Shows the state of all rdma links on the system.
+.RE
+.PP
+rdma link show mlx5_2/1
+.RS 4
+Shows the state of specified rdma link.
+.RE
+.PP
+
+.SH SEE ALSO
+.BR rdma (8),
+.BR rdma-dev (8),
+.br
+
+.SH AUTHOR
+Leon Romanovsky 
diff --git a/man/man8/rdma.8 b/man/man8/rdma.8
new file mode 100644
index ..798b33d3
--- /dev/null
+++ b/man/man8/rdma.8
@@ -0,0 +1,102 @@
+.TH RDMA 8 "28 Mar 2017" "iproute2" "Linux"
+.SH NAME
+rdma \- RDMA tool
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B rdma
+.RI "[ " OPTIONS " ] " OBJECT " { " COMMAND " | "
+.BR help " }"
+.sp
+
+.ti -8
+.IR OBJECT " := { "
+.BR dev " | " link " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-V\fR[\fIersion\fR] |
+\fB\-d\fR[\fIetails\fR] |
+\fB\-j\fR[\fIson\fR] |
+\fB\-p\fR[\fIretty\fR] }
+
+.SH OPTIONS
+
+.TP
+.BR "\-V" , " -Version"
+Print the version of the
+.B rdma
+tool and exit.
+
+.TP
+.BR "\-d" , " --details"
+Output detailed information.
+
+.TP
+.BR "\-p" , " --pretty"
+When combined with -j generate a pretty JSON output.
+
+.TP
+.BR "\-j" , " --json"
+Generate JSON output.
+
+.SS
+.I OBJECT
+
+.TP
+.B dev
+- RDMA device.
+
+.TP
+.B link
+- RDMA port related.
+
+.PP
+The names of all objects may be written in full or
+abbreviated form, for example
+.B stats
+can be abbreviated as
+.B stat
+or just
+.B s.
+
+.SS
+.I COMMAND
+
+Specifies the action to perform on the object.
+The set of possible actions depends on the object type.
+As a rule, it is possible to
+.B show
+(or
+.B list
+) objects, but some objects do not allow all of these operations
+or have some additional commands. The
+.B help
+command is available for all objects. It prints
+out a list of available commands and argument syntax conventions.
+.sp
+If no command is given, some default command is assumed.
+Usually it is
+.B list
+or, if the objects of this class cannot be listed,
+.BR "help" .
+
+.SH EXIT STATUS
+Exit status is 0 if command was successful or a positive integer upon failure.
+
+.SH SEE ALSO
+.BR rdma-dev (8),
+.BR rdma-link (8),
+.br
+
+.SH REPORTING BUGS
+Report any bugs to the Linux RDMA mailing list
+.B 
+where the development and maintenance is primarily done.
+You do not have to be subscribed to the list to send a message there.
+
+.SH AUTHOR
+Leon Romanovsky 
-- 
2.14.1

[PATCH v6 iproute2 7/8] rdma: Add json output to link object

2017-08-20 Thread Leon Romanovsky

From: Leon Romanovsky 

An example of the JSON output on a system with two devices.

root@mtr-leonro:~# rdma link -d -p -j
[{
"ifindex": 1,
"port": 1,
"ifname": "mlx5_0/1",
"subnet_prefix": "fe80:::",
"lid": 13399,
"sm_lid": 49151,
"lmc": 0,
"state": "ACTIVE",
"physical_state": "LINK_UP",
"caps": ["AUTO_MIG"
]
},{
"ifindex": 2,
"port": 1,
"ifname": "mlx5_1/1",
"subnet_prefix": "fe80:::",
"lid": 13400,
"sm_lid": 49151,
"lmc": 0,
"state": "ACTIVE",
"physical_state": "LINK_UP",
"caps": ["AUTO_MIG"
]
}
]

Signed-off-by: Leon Romanovsky 
---
 rdma/link.c  | 144 +++
 rdma/rdma.h  |   1 -
 rdma/utils.c |   8 
 3 files changed, 105 insertions(+), 48 deletions(-)

diff --git a/rdma/link.c b/rdma/link.c
index b0e5bee0..eae96cd8 100644
--- a/rdma/link.c
+++ b/rdma/link.c
@@ -56,7 +56,7 @@ static const char *caps_to_str(uint32_t idx)
return "UNKNOWN";
 }
 
-static void link_print_caps(struct nlattr **tb)
+static void link_print_caps(struct rd *rd, struct nlattr **tb)
 {
uint64_t caps;
uint32_t idx;
@@ -66,54 +66,89 @@ static void link_print_caps(struct nlattr **tb)
 
caps = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_CAP_FLAGS]);
 
-   pr_out("\ncaps: <");
+   if (rd->json_output) {
+   jsonw_name(rd->jw, "caps");
+   jsonw_start_array(rd->jw);
+   } else {
+   pr_out("\ncaps: <");
+   }
for (idx = 0; caps; idx++) {
if (caps & 0x1) {
-   pr_out("%s", caps_to_str(idx));
-   if (caps >> 0x1)
-   pr_out(", ");
+   if (rd->json_output) {
+   jsonw_string(rd->jw, caps_to_str(idx));
+   } else {
+   pr_out("%s", caps_to_str(idx));
+   if (caps >> 0x1)
+   pr_out(", ");
+   }
}
caps >>= 0x1;
}
 
-   pr_out(">");
+   if (rd->json_output)
+   jsonw_end_array(rd->jw);
+   else
+   pr_out(">");
 }
 
-static void link_print_subnet_prefix(struct nlattr **tb)
+static void link_print_subnet_prefix(struct rd *rd, struct nlattr **tb)
 {
uint64_t subnet_prefix;
+   uint16_t vp[4];
+   char str[32];
 
if (!tb[RDMA_NLDEV_ATTR_SUBNET_PREFIX])
return;
 
subnet_prefix = mnl_attr_get_u64(tb[RDMA_NLDEV_ATTR_SUBNET_PREFIX]);
-   rd_print_u64("subnet_prefix", subnet_prefix);
+   memcpy(vp, &subnet_prefix, sizeof(uint64_t));
+   snprintf(str, 32, "%04x:%04x:%04x:%04x", vp[3], vp[2], vp[1], vp[0]);
+   if (rd->json_output)
+   jsonw_string_field(rd->jw, "subnet_prefix", str);
+   else
+   pr_out("subnet_prefix %s ", str);
 }
 
-static void link_print_lid(struct nlattr **tb)
+static void link_print_lid(struct rd *rd, struct nlattr **tb)
 {
+   uint32_t lid;
+
if (!tb[RDMA_NLDEV_ATTR_LID])
return;
 
-   pr_out("lid %u ",
-  mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_LID]));
+   lid = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_LID]);
+   if (rd->json_output)
+   jsonw_uint_field(rd->jw, "lid", lid);
+   else
+   pr_out("lid %u ", lid);
 }
 
-static void link_print_sm_lid(struct nlattr **tb)
+static void link_print_sm_lid(struct rd *rd, struct nlattr **tb)
 {
+   uint32_t sm_lid;
+
if (!tb[RDMA_NLDEV_ATTR_SM_LID])
return;
 
-   pr_out("sm_lid %u ",
-  mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_SM_LID]));
+   sm_lid = mnl_attr_get_u32(tb[RDMA_NLDEV_ATTR_SM_LID]);
+   if (rd->json_output)
+   jsonw_uint_field(rd->jw, "sm_lid", sm_lid);
+   else
+   pr_out("sm_lid %u ", sm_lid);
 }
 
-static void link_print_lmc(struct nlattr **tb)
+static void link_print_lmc(struct rd *rd, struct nlattr **tb)
 {
+   uint8_t lmc;
+
if (!tb[RDMA_NLDEV_ATTR_LMC])
return;
 
-   pr_out("lmc %u ", mnl_attr_get_u8(tb[RDMA_NLDEV_ATTR_LMC]));
+   lmc = mnl_attr_get_u8(tb[RDMA_NLDEV_ATTR_LMC]);
+   if (rd->json_output)
+   jsonw_uint_field(rd->jw, "lmc", lmc);
+   else
+   pr_out("lmc %u ", lmc);
 }
 
 static const char *link_state_to_str(uint8_t link_state)
@@ -127,7 +162,7 @@ static const char *link_state_to_str(uint8_t link_state)
return "UNKNOWN";
 }
 
-static void link_print_state(struct nlattr **tb)
+static void link_print_state(struct rd *rd, struct nlattr **tb)
 {
uint8_t state;
 
@@ -135,7 +170,10 @@ static void link_print_state(struct nlattr
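The subnet_prefix formatting in link_print_subnet_prefix() above can be sketched as a standalone function. This is only an illustration, not the patch's code: format_subnet_prefix() is a hypothetical name, and it uses shifts rather than memcpy() into a uint16_t array, so the group order does not depend on host byte order (the memcpy() form prints the most significant group first only on a little-endian host).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: format a 64-bit subnet
 * prefix as four colon-separated 16-bit hex groups, most significant
 * group first. Returns the snprintf() result.
 */
static int format_subnet_prefix(uint64_t prefix, char *buf, size_t len)
{
	return snprintf(buf, len, "%04x:%04x:%04x:%04x",
			(unsigned int)((prefix >> 48) & 0xffff),
			(unsigned int)((prefix >> 32) & 0xffff),
			(unsigned int)((prefix >> 16) & 0xffff),
			(unsigned int)(prefix & 0xffff));
}
```

A 32-byte buffer, as used in the patch, leaves room for the 19 characters plus the terminating NUL.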

[PATCH v3 net 1/2 RESEND] Revert commit 1a8b6d76dc5b ("net:add one common config...")

2017-08-20 Thread Ding Tianhong

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLPs) targeted toward
the affected Root Ports. When the flag is set, bit 4 in the PCIe
Device Control register is cleared, so PCIe device drivers can
query PCIe configuration space to determine whether they may send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

With this new flag we no longer need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe driver,
as commit 1a8b6d76dc5b ("net:add one common config...") did,
so revert that commit.

Signed-off-by: Ding Tianhong 
---
 arch/Kconfig| 3 ---
 arch/sparc/Kconfig  | 1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089..00cfc63 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -928,9 +928,6 @@ config STRICT_MODULE_RWX
  and non-text memory will be made non-executable. This provides
  protection against certain security exploits (e.g. writing to text)
 
-config ARCH_WANT_RELAX_ORDER
-   bool
-
 config REFCOUNT_FULL
bool "Perform full reference count validation at the expense of speed"
help
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a4a6261..987a575 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,7 +44,6 @@ config SPARC
select ARCH_HAS_SG_CHAIN
select CPU_NO_EFFICIENT_FFS
select LOCKDEP_SMALL if LOCKDEP
-   select ARCH_WANT_RELAX_ORDER
 
 config SPARC32
def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 4e35e70..d4933d2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
+#ifndef CONFIG_SPARC
/* Disable relaxed ordering */
for (i = 0; i < hw->mac.max_tx_queues; i++) {
u32 regval;
-- 
1.8.3.1

[PATCH v3 net 0/2 RESEND] net: ixgbe: Use new flag to disable Relaxed Ordering

2017-08-20 Thread Ding Tianhong

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLPs) targeted toward
the affected Root Ports. When the flag is set, bit 4 in the PCIe
Device Control register is cleared, so PCIe device drivers can
query PCIe configuration space to determine whether they may send
TLPs to the Root Port with the Relaxed Ordering Attributes set.

The ixgbe driver can use this flag to determine whether it may
send TLPs to the Root Port with the Relaxed Ordering Attributes set.

v2: Simplify the original code according to Alex's suggestion,
remove the new ixgbe flag2 and only check bit 4 in the
PCIe Device Control register.

v3: Remove the code that clears the bits in DCA_T/RXCTRL; relaxed
ordering should be enabled by the HW when the bus allows it.

Ding Tianhong (2):
  Revert commit 1a8b6d76dc5b ("net:add one common config...")
  net: ixgbe: Use new IXGBE_FLAG2_ROOT_NO_RELAXED_ORDERING flag

 arch/Kconfig|  3 --
 arch/sparc/Kconfig  |  1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 37 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 32 +++--
 4 files changed, 35 insertions(+), 38 deletions(-)

-- 
1.8.3.1
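The bit 4 check described in this cover letter can be sketched in plain C. This is a userspace illustration under stated assumptions, not the driver or PCI core code: the helper names are hypothetical, and DEVCTL_RELAX_EN simply mirrors the value of PCI_EXP_DEVCTL_RELAX_EN (0x0010) from the kernel's pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors PCI_EXP_DEVCTL_RELAX_EN from the kernel's pci_regs.h (bit 4). */
#define DEVCTL_RELAX_EN 0x0010u

/* Hypothetical stand-in: does this Device Control value allow RO? */
static bool relaxed_ordering_enabled(uint16_t devctl)
{
	return (devctl & DEVCTL_RELAX_EN) != 0;
}

/*
 * Hypothetical stand-in for clearing the enable bit when the Root Port
 * is flagged as unable to handle Relaxed Ordering. Other bits are kept.
 */
static uint16_t clear_relaxed_ordering(uint16_t devctl)
{
	return (uint16_t)(devctl & ~DEVCTL_RELAX_EN);
}
```

A driver would then only need the first kind of check at run time, rather than a compile-time #ifdef per architecture.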

[PATCH v3 net 2/2 RESEND] net: ixgbe: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-20 Thread Ding Tianhong

The ixgbe driver uses a compile-time check to determine whether it can
send TLPs to the Root Port with the Relaxed Ordering Attribute set,
which is inconvenient. Now that the new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING
has been added to the kernel, we can check bit 4 in the PCIe
Device Control register to determine whether to use the Relaxed
Ordering Attributes, so use this new approach in the ixgbe driver.

Signed-off-by: Ding Tianhong 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c  | 22 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 19 ---
 2 files changed, 41 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
index 523f9d0..8a32eb7 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_82598.c
@@ -175,31 +175,9 @@ static s32 ixgbe_init_phy_ops_82598(struct ixgbe_hw *hw)
  **/
 static s32 ixgbe_start_hw_82598(struct ixgbe_hw *hw)
 {
-#ifndef CONFIG_SPARC
-   u32 regval;
-   u32 i;
-#endif
s32 ret_val;
 
ret_val = ixgbe_start_hw_generic(hw);
-
-#ifndef CONFIG_SPARC
-   /* Disable relaxed ordering */
-   for (i = 0; ((i < hw->mac.max_tx_queues) &&
-(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(i));
-   regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(i), regval);
-   }
-
-   for (i = 0; ((i < hw->mac.max_rx_queues) &&
-(i < IXGBE_DCA_MAX_QUEUES_82598)); i++) {
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-   regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
-   }
-#endif
if (ret_val)
return ret_val;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index d4933d2..96c324f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,25 +350,6 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_SPARC
-   /* Disable relaxed ordering */
-   for (i = 0; i < hw->mac.max_tx_queues; i++) {
-   u32 regval;
-
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(i));
-   regval &= ~IXGBE_DCA_TXCTRL_DESC_WRO_EN;
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(i), regval);
-   }
-
-   for (i = 0; i < hw->mac.max_rx_queues; i++) {
-   u32 regval;
-
-   regval = IXGBE_READ_REG(hw, IXGBE_DCA_RXCTRL(i));
-   regval &= ~(IXGBE_DCA_RXCTRL_DATA_WRO_EN |
-   IXGBE_DCA_RXCTRL_HEAD_WRO_EN);
-   IXGBE_WRITE_REG(hw, IXGBE_DCA_RXCTRL(i), regval);
-   }
-#endif
return 0;
 }
 
-- 
1.8.3.1

[RFC PATCH] dt-binding: net: sfp binding documentation

2017-08-20 Thread Baruch Siach

Add device-tree binding documentation for SFP transceivers. Support for SFP
transceivers has been recently introduced (drivers/net/phy/sfp.c).

Signed-off-by: Baruch Siach 
---

The SFP driver is on net-next.

Not sure about the rate-select-gpio property name. The SFP+ standard
(not supported yet) uses two signals, RS0 and RS1. RS0 is compatible
with the SFP rate select signal, while RS1 controls the Tx rate.
---
 Documentation/devicetree/bindings/net/sff-sfp.txt | 24 +++
 1 file changed, 24 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/sff-sfp.txt

diff --git a/Documentation/devicetree/bindings/net/sff-sfp.txt b/Documentation/devicetree/bindings/net/sff-sfp.txt
new file mode 100644
index ..f0c27bc3925e
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/sff-sfp.txt
@@ -0,0 +1,24 @@
+Small Form Factor (SFF) Committee Small Form-factor Pluggable (SFP)
+Transceiver
+
+Required properties:
+
+- compatible : must be "sff,sfp"
+
+Optional Properties:
+
+- i2c-bus : phandle of an I2C bus controller for the SFP two wire serial
+  interface
+
+- moddef0-gpio : phandle of the MOD-DEF0 (AKA Mod_ABS) module presence input
+  gpio signal
+
+- los-gpio : phandle of the Receiver Loss of Signal Indication input gpio
+  signal
+
+- tx-fault-gpio : phandle of the Module Transmitter Fault input gpio signal
+
+- tx-disable-gpio : phandle of the Transmitter Disable output gpio signal
+
+- rate-select-gpio : phandle of the Rx Signaling Rate Select (AKA RS0) output
+  gpio
-- 
2.14.1

Re: [PATCH net-next 4/4] mlx4: sizeof style usage

2017-08-20 Thread Tariq Toukan


Thanks Stephen.
Sorry for the late reply, I was on vacation.
I know this is already accepted, but still I have one comment.

On 15/08/2017 8:29 PM, Stephen Hemminger wrote:

The kernel coding style is to treat sizeof as a function
(ie. with parenthesis) not as an operator.

Also use kcalloc and kmalloc_array

Signed-off-by: Stephen Hemminger 
---
@@ -726,7 +726,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct mlx4_eq *eq)
}
 memcpy(&priv->mfunc.master.comm_arm_bit_vector,
   eqe->event.comm_channel_arm.bit_vec,
-  sizeof eqe->event.comm_channel_arm.bit_vec);
+  sizeof(eqe)->event.comm_channel_arm.bit_vec);


I think the brackets here are misplaced.
Shouldn't they be as follows?

sizeof(eqe->event.comm_channel_arm.bit_vec));


queue_work(priv->mfunc.master.comm_wq,
   &priv->mfunc.master.comm_work);
break;


Thanks,
Tariq
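For reference, a small sketch of how the two spellings parse. As far as I can tell, because the operand in parentheses is an expression and not a type name, sizeof(eqe)->field is parsed as sizeof applied to (eqe)->field, so both forms should yield the same value and the issue is one of readability. The struct and function names below are hypothetical stand-ins, not the mlx4 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the event entry; the array size is arbitrary. */
struct demo_eqe {
	unsigned int bit_vec[4];
};

/* Intended spelling: the whole member expression is inside the parentheses. */
static size_t size_intended(const struct demo_eqe *eqe)
{
	return sizeof(eqe->bit_vec);
}

/*
 * Spelling from the patch: eqe is an expression rather than a type name,
 * so this parses as sizeof ((eqe)->bit_vec), not (sizeof(eqe))->bit_vec.
 */
static size_t size_as_patched(const struct demo_eqe *eqe)
{
	return sizeof(eqe)->bit_vec;
}
```

Both functions return the size of the array member, but the first form makes that obvious to the reader.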

Re: [net-next:master 1184/1189] include/linux/bpf.h:324:21: error: 'NUMA_NO_NODE' undeclared

2017-08-20 Thread Martin KaFai Lau

On Sun, Aug 20, 2017 at 01:43:54PM +0800, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
> master
> head:   228498596c44041c710f5a633904205bc1cd9177
> commit: 96eabe7a40aa17e613cf3db2c742ee8b1fc764d0 [1184/1189] bpf: Allow 
> selecting numa node during map creation
> config: i386-randconfig-s1-201734 (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> git checkout 96eabe7a40aa17e613cf3db2c742ee8b1fc764d0
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All error/warnings (new ones prefixed by >>):
>
>In file included from net/bpf/test_run.c:7:0:
>include/linux/bpf.h: In function 'bpf_map_attr_numa_node':
> >> include/linux/bpf.h:324:21: error: 'NUMA_NO_NODE' undeclared (first use in 
> >> this function)
>   attr->numa_node : NUMA_NO_NODE;
> ^~~~
>include/linux/bpf.h:324:21: note: each undeclared identifier is reported 
> only once for each function it appears in
> --
>In file included from kernel/bpf/syscall.c:12:0:
>include/linux/bpf.h: In function 'bpf_map_attr_numa_node':
> >> include/linux/bpf.h:324:21: error: 'NUMA_NO_NODE' undeclared (first use in 
> >> this function)
>   attr->numa_node : NUMA_NO_NODE;
> ^~~~
>include/linux/bpf.h:324:21: note: each undeclared identifier is reported 
> only once for each function it appears in
>In file included from kernel/bpf/syscall.c:12:0:
> >> include/linux/bpf.h:325:1: warning: control reaches end of non-void 
> >> function [-Wreturn-type]
> }
> ^
I will post a fix shortly.

>
> vim +/NUMA_NO_NODE +324 include/linux/bpf.h
>
>319
>320/* Return map's numa specified by userspace */
>321static inline int bpf_map_attr_numa_node(const union bpf_attr 
> *attr)
>322{
>323return (attr->map_flags & BPF_F_NUMA_NODE) ?
>  > 324attr->numa_node : NUMA_NO_NODE;
>  > 325}
>326
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.01.org_pipermail_kbuild-2Dall=DwIBAg=5VD0RTtNlTh3ycd41b3MUw=VQnoQ7LvghIj0gVEaiQSUw=36OvwN5FPWpOV3AefLVno0fEHrg6mcU_F6ErzV-KPlc=EDsqfhvtP3I95wiIrIqYE8CEAk6wnfUAuVWkCb3iXP4=
> Intel Corporation

Re: [net-next:master 1184/1189] include/linux/bpf.h:324:21: error: 'NUMA_NO_NODE' undeclared

2017-08-20 Thread Martin KaFai Lau

On Sat, Aug 19, 2017 at 11:33:13PM -0700, David Miller wrote:
> From: kbuild test robot 
> Date: Sun, 20 Aug 2017 13:43:54 +0800
>
> > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
> > master
> > head:   228498596c44041c710f5a633904205bc1cd9177
> > commit: 96eabe7a40aa17e613cf3db2c742ee8b1fc764d0 [1184/1189] bpf: Allow 
> > selecting numa node during map creation
> > config: i386-randconfig-s1-201734 (attached as .config)
> > compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> > reproduce:
> > git checkout 96eabe7a40aa17e613cf3db2c742ee8b1fc764d0
> > # save the attached .config to linux build tree
> > make ARCH=i386
> >
> > All error/warnings (new ones prefixed by >>):
> >
> >In file included from net/bpf/test_run.c:7:0:
> >include/linux/bpf.h: In function 'bpf_map_attr_numa_node':
> >>> include/linux/bpf.h:324:21: error: 'NUMA_NO_NODE' undeclared (first use 
> >>> in this function)
> >   attr->numa_node : NUMA_NO_NODE;
> > ^~~~
>
> I'll add the linux/numa.h include to linux/bpf.h
Thanks for fixing it.  Sorry for the mistake.

Re: [PATCH v3 3/4] net: stmmac: register parent MDIO node for sun8i-h3-emac

2017-08-20 Thread Corentin Labbe

On Sat, Aug 19, 2017 at 10:38:36PM +0200, Andrew Lunn wrote:
> On Sat, Aug 19, 2017 at 08:50:25PM +0200, Corentin Labbe wrote:
> > On Sat, Aug 19, 2017 at 01:05:21AM +0800, Chen-Yu Tsai wrote:
> > > On Fri, Aug 18, 2017 at 8:21 PM, Corentin Labbe
> > >  wrote:
> > > > In case of a MDIO switch, the registered MDIO node should be
> > > > the parent of the PHY. Otherwise of_phy_connect will fail.
> 
> Hi Corentin
> 
> Sorry, I missed this patch series. Looking at patchwork...

That's my fault, I forgot to set you in recipient like in last send.

> 
> Can you represent the MDIO mux using 
> 
> Documentation/devicetree/bindings/net/mdio-mux-mmioreg.txt
> 
> It would be better if you could reuse existing infrastructure than
> invent something new.
> 

I think we cannot use mdio-mux-mmioreg since the register for doing the switch
is in the middle of the "System Control" registers and shared with other functions.
This is why we use a syscon/regmap for selecting the MDIO.

Regards

Re: [net-next:master 1184/1189] include/linux/bpf.h:324:21: error: 'NUMA_NO_NODE' undeclared

2017-08-20 Thread David Miller

From: kbuild test robot 
Date: Sun, 20 Aug 2017 13:43:54 +0800

> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 
> master
> head:   228498596c44041c710f5a633904205bc1cd9177
> commit: 96eabe7a40aa17e613cf3db2c742ee8b1fc764d0 [1184/1189] bpf: Allow 
> selecting numa node during map creation
> config: i386-randconfig-s1-201734 (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> git checkout 96eabe7a40aa17e613cf3db2c742ee8b1fc764d0
> # save the attached .config to linux build tree
> make ARCH=i386 
> 
> All error/warnings (new ones prefixed by >>):
> 
>In file included from net/bpf/test_run.c:7:0:
>include/linux/bpf.h: In function 'bpf_map_attr_numa_node':
>>> include/linux/bpf.h:324:21: error: 'NUMA_NO_NODE' undeclared (first use in 
>>> this function)
>   attr->numa_node : NUMA_NO_NODE;
> ^~~~

I'll add the linux/numa.h include to linux/bpf.h
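The failing helper only needs NUMA_NO_NODE to be visible, which is what the added include provides. A standalone sketch of the same logic follows, assuming local stand-in definitions: the two macros copy the kernel values (NUMA_NO_NODE is -1 in linux/numa.h, BPF_F_NUMA_NODE is 1U << 2 in the uapi bpf.h), while struct demo_map_attr and demo_map_attr_numa_node() are hypothetical reductions of union bpf_attr and bpf_map_attr_numa_node().

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel definitions, with the same values. */
#define NUMA_NO_NODE    (-1)
#define BPF_F_NUMA_NODE (1U << 2)

/* Hypothetical reduced version of the map-creation attributes. */
struct demo_map_attr {
	uint32_t map_flags;
	uint32_t numa_node;
};

/* Return the node requested by userspace, or NUMA_NO_NODE if none was. */
static int demo_map_attr_numa_node(const struct demo_map_attr *attr)
{
	return (attr->map_flags & BPF_F_NUMA_NODE) ?
		(int)attr->numa_node : NUMA_NO_NODE;
}
```

Without a definition of NUMA_NO_NODE in scope, the ternary fails to compile, which matches the error and the follow-on -Wreturn-type warning in the robot's report.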

Re: [PATCH V2 net-next] net: hns3: Add support to change MTU in HNS3 hardware

2017-08-20 Thread Leon Romanovsky

On Fri, Aug 18, 2017 at 05:57:59PM +0100, Salil Mehta wrote:
> This patch adds the following support to the HNS3 driver:
> 1. Support to change the Maximum Transmission Unit of a
>of a port in the HNS NIC hardware .

Extra space before dot.

> 2. Initializes the supported MTU range for the netdevice.
>
> Signed-off-by: lipeng 

Does "lipeng" have name and surname?

> Signed-off-by: Salil Mehta 
> ---
> PATCH V2: Addresses comments given by Andrew Lunn
>   1. https://lkml.org/lkml/2017/8/18/282
> PATCH V1: Initial Submit
> ---
>  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 38 
> ++
>  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h |  1 +
>  2 files changed, 39 insertions(+)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c 
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> index e731f87..d905ea1 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
> @@ -1278,11 +1278,46 @@ static int hns3_ndo_set_vf_vlan(struct net_device 
> *netdev, int vf, u16 vlan,
>   return ret;
>  }
>
> +static int hns3_nic_change_mtu(struct net_device *netdev, int new_mtu)
> +{
> + struct hns3_nic_priv *priv = netdev_priv(netdev);
> + struct hnae3_handle *h = priv->ae_handle;
> + bool if_running = netif_running(netdev);
> + int ret;
> +
> + if (!h->ae_algo->ops->set_mtu)
> + return -ENOTSUPP;
> +
> + /* if this was called with netdev up then bring netdevice down */
> + if (if_running) {
> + (void)hns3_nic_net_stop(netdev);
> + msleep(100);
> + }
> +
> + ret = h->ae_algo->ops->set_mtu(h, new_mtu);
> + if (ret) {
> + netdev_err(netdev, "failed to change MTU in hardware %d\n",
> +ret);
> + return ret;
> + }
> +
> + /* if the netdev was running earlier, bring it up again */
> + if (if_running) {
> + if (hns3_nic_net_open(netdev)) {
> + netdev_err(netdev, "MTU, couldnt up netdev again\n");

"couldnt" -> "couldn't"

and you don't actually need this print.
If the function hns3_nic_net_open fails, you will print this error there.

> + ret = -EINVAL;
> + }
> + }
> +
> + return ret;
> +}
> +
>  static const struct net_device_ops hns3_nic_netdev_ops = {
>   .ndo_open   = hns3_nic_net_open,
>   .ndo_stop   = hns3_nic_net_stop,
>   .ndo_start_xmit = hns3_nic_net_xmit,
>   .ndo_set_mac_address= hns3_nic_net_set_mac_address,
> + .ndo_change_mtu = hns3_nic_change_mtu,
>   .ndo_set_features   = hns3_nic_set_features,
>   .ndo_get_stats64= hns3_nic_get_stats64,
>   .ndo_setup_tc   = hns3_nic_setup_tc,
> @@ -2752,6 +2787,9 @@ static int hns3_client_init(struct hnae3_handle *handle)
>   goto out_reg_netdev_fail;
>   }
>
> + /* MTU range: (ETH_MIN_MTU(kernel default) - 9706) */
> + netdev->max_mtu = HNS3_MAX_MTU - (ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN);
> +
>   return ret;
>
>  out_reg_netdev_fail:
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h 
> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> index a6e8f15..7e87461 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.h
> @@ -76,6 +76,7 @@ enum hns3_nic_state {
>  #define HNS3_RING_NAME_LEN   16
>  #define HNS3_BUFFER_SIZE_2048 2048
>  #define HNS3_RING_MAX_PENDING 32768
> +#define HNS3_MAX_MTU 9728
>
>  #define HNS3_BD_SIZE_512_TYPE 0
>  #define HNS3_BD_SIZE_1024_TYPE   1
> --
> 2.7.4
>
>
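The 9706 in the quoted "MTU range" comment follows from the max_mtu expression in the patch. A minimal sketch of that arithmetic, assuming the usual kernel header values for the three overhead constants; demo_max_mtu() is a hypothetical helper name, not something from the driver.

```c
#include <assert.h>

/* Values as defined in the kernel headers and in the quoted patch. */
#define ETH_HLEN     14   /* Ethernet header */
#define ETH_FCS_LEN  4    /* frame check sequence */
#define VLAN_HLEN    4    /* one 802.1Q tag */
#define HNS3_MAX_MTU 9728 /* from the quoted hns3_enet.h hunk */

/* Hypothetical helper: largest MTU that fits within max_frame bytes. */
static int demo_max_mtu(int max_frame)
{
	return max_frame - (ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN);
}
```

With these values the overhead is 22 bytes, so 9728 - 22 = 9706.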



Re: [PATCH net 0/2] netfilter: ipvs: some fixes in sctp_conn_schedule

2017-08-20 Thread Julian Anastasov


Hello,

On Sun, 20 Aug 2017, Xin Long wrote:

> Patch 1/2 fixes the regression introduced by commit 5e26b1b3abce.
> Patch 2/2 makes ipvs not create conn for sctp ABORT packet.
> 
> Xin Long (2):
>   netfilter: ipvs: fix the issue that sctp_conn_schedule drops non-INIT
> packet
>   netfilter: ipvs: do not create conn for ABORT packet in
> sctp_conn_schedule
> 
>  net/netfilter/ipvs/ip_vs_proto_sctp.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Patchset looks ok to me,

Acked-by: Julian Anastasov

[PATCH net-next v2] cxgb4/cxgbvf: Handle 32-bit fw port capabilities

2017-08-20 Thread Ganesh Goudar

Implement new 32-bit Firmware Port Capabilities in order to
handle new speeds which couldn't be represented in the old 16-bit
Firmware Port Capabilities values.

Based on the original work of Casey Leedom 

Signed-off-by: Ganesh Goudar 
---
v2: Fixes build error when DCB is enabled
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  43 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c |  98 ++--
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c|  88 ++--
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 580 -
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h  | 175 ++-
 .../net/ethernet/chelsio/cxgb4vf/cxgb4vf_main.c|  50 +-
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_common.h |  86 +--
 drivers/net/ethernet/chelsio/cxgb4vf/t4vf_hw.c | 456 +---
 8 files changed, 1220 insertions(+), 356 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index b9bff1d..ea72d2d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -104,13 +104,13 @@ enum dev_state {
DEV_STATE_ERR
 };
 
-enum {
+enum cc_pause {
PAUSE_RX  = 1 << 0,
PAUSE_TX  = 1 << 1,
PAUSE_AUTONEG = 1 << 2
 };
 
-enum {
+enum cc_fec {
FEC_AUTO  = 1 << 0,  /* IEEE 802.3 "automatic" */
FEC_RS= 1 << 1,  /* Reed-Solomon */
FEC_BASER_RS  = 1 << 2   /* BaseR/Reed-Solomon */
@@ -366,6 +366,7 @@ struct adapter_params {
unsigned int max_ordird_qp;   /* Max read depth per RDMA QP */
unsigned int max_ird_adapter; /* Max read depth per adapter */
bool fr_nsmr_tpte_wr_support; /* FW support for FR_NSMR_TPTE_WR */
+   u8 fw_caps_support; /* 32-bit Port Capabilities */
 
/* MPS Buffer Group Map[per Port].  Bit i is set if buffer group i is
 * used by the Port
@@ -439,18 +440,34 @@ struct trace_params {
unsigned char port;
 };
 
+/* Firmware Port Capabilities types. */
+
+typedef u16 fw_port_cap16_t;   /* 16-bit Port Capabilities integral value */
+typedef u32 fw_port_cap32_t;   /* 32-bit Port Capabilities integral value */
+
+enum fw_caps {
+   FW_CAPS_UNKNOWN = 0,/* 0'ed out initial state */
+   FW_CAPS16   = 1,/* old Firmware: 16-bit Port Capabilities */
+   FW_CAPS32   = 2,/* new Firmware: 32-bit Port Capabilities */
+};
+
 struct link_config {
-   unsigned short supported;/* link capabilities */
-   unsigned short advertising;  /* advertised capabilities */
-   unsigned short lp_advertising;   /* peer advertised capabilities */
-   unsigned int   requested_speed;  /* speed user has requested */
-   unsigned int   speed;/* actual link speed */
-   unsigned char  requested_fc; /* flow control user has requested */
-   unsigned char  fc;   /* actual link flow control */
-   unsigned char  auto_fec; /* Forward Error Correction: */
-   unsigned char  requested_fec;/* "automatic" (IEEE 802.3), */
-   unsigned char  fec;  /* requested, and actual in use */
+   fw_port_cap32_t pcaps;   /* link capabilities */
+   fw_port_cap32_t def_acaps;   /* default advertised capabilities */
+   fw_port_cap32_t acaps;   /* advertised capabilities */
+   fw_port_cap32_t lpacaps; /* peer advertised capabilities */
+
+   fw_port_cap32_t speed_caps;  /* speed(s) user has requested */
+   unsigned int   speed;/* actual link speed (Mb/s) */
+
+   enum cc_pause  requested_fc; /* flow control user has requested */
+   enum cc_pause  fc;   /* actual link flow control */
+
+   enum cc_fecrequested_fec;/* Forward Error Correction: */
+   enum cc_fecfec;  /* requested and actual in use */
+
unsigned char  autoneg;  /* autonegotiating? */
+
unsigned char  link_ok;  /* link up? */
unsigned char  link_down_rc; /* link down reason */
 };
@@ -1580,6 +1597,8 @@ int t4_ofld_eq_free(struct adapter *adap, unsigned int mbox, unsigned int pf,
 int t4_sge_ctxt_flush(struct adapter *adap, unsigned int mbox);
 void t4_handle_get_port_info(struct port_info *pi, const __be64 *rpl);
 int t4_update_port_info(struct port_info *pi);
+int t4_get_link_params(struct port_info *pi, unsigned int *link_okp,
+  unsigned int *speedp, unsigned int *mtup);
 int t4_handle_fw_rpl(struct adapter *adap, const __be64 *rpl);
 void t4_db_full(struct adapter *adapter);
 void t4_db_dropped(struct adapter *adapter);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
index 03f593e..a71af1e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ethtool.c
+++
