date:20180604

Re: [PATCH net-next 3/3] mlxsw: Add extack messages for port_{un,}split failures

2018-06-04 Thread Ido Schimmel

On Mon, Jun 04, 2018 at 03:15:03PM -0700, dsah...@kernel.org wrote:
> From: David Ahern 
> 
> Return messages in extack for port split/unsplit errors. e.g.,
> $ devlink port split swp1s1 count 4
> Error: mlxsw_spectrum: Port cannot be split further.
> devlink answers: Invalid argument
> 
> $ devlink port unsplit swp4
> Error: mlxsw_spectrum: Port was not split.
> devlink answers: Invalid argument
> 
> Signed-off-by: David Ahern 

Reviewed-by: Ido Schimmel 

Thanks!

Re: [PATCH net-next] net: metrics: add proper netlink validation

2018-06-04 Thread Eric Dumazet




On 06/04/2018 04:46 PM, Eric Dumazet wrote:
> Before using nla_get_u32(), better make sure the attribute
> is of the proper size.
> 
> 

> Fixes: a919525ad832 ("net: Move fib_convert_metrics to metrics file")
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Eric Dumazet 
> Reported-by: syzbot 
> Cc: David Ahern 
> ---
>  net/ipv4/fib_semantics.c | 2 ++
>  net/ipv4/metrics.c   | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 6608db23f54b6afdac0455650b47d64b1b22b255..9a890be8a0265edb78da225a82e2cac120f2150f 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -717,6 +717,8 @@ bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi)
>   nla_strlcpy(tmp, nla, sizeof(tmp));
>   val = tcp_ca_get_key_by_name(fi->fib_net, tmp, &ecn_ca);
>   } else {
> + if (nla_len(nla) != sizeof(u32)

Oh well, stupid typo.

> + return false;
>   val = nla_get_u32(nla);

I will send a V2.
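
For readers following the thread, the length check discussed above can be modelled in user space. This is only a minimal sketch under stated assumptions: `struct demo_attr`, `DEMO_HDRLEN`, `demo_attr_len()` and `demo_get_u32_checked()` are hypothetical stand-ins and not the kernel's `struct nlattr`, `nla_len()` or `nla_get_u32()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a netlink attribute: a small header followed
 * by a payload. Not the kernel's struct nlattr. */
#define DEMO_HDRLEN 4

struct demo_attr {
	uint16_t len;              /* header plus payload length, in bytes */
	uint16_t type;
	unsigned char payload[8];
};

/* Payload length only, similar in spirit to nla_len(). */
int demo_attr_len(const struct demo_attr *a)
{
	return (int)a->len - DEMO_HDRLEN;
}

/* Copy out a u32 only when the payload is exactly four bytes long.
 * Returns false (and leaves *out untouched) otherwise. */
bool demo_get_u32_checked(const struct demo_attr *a, uint32_t *out)
{
	assert(a != NULL && out != NULL);

	if (demo_attr_len(a) != (int)sizeof(uint32_t))
		return false;      /* wrong size: never read the payload */

	memcpy(out, a->payload, sizeof(*out));
	return true;
}
```

A caller that receives `false` rejects the request instead of consuming bytes the sender never supplied, which is the situation the KMSAN report points at.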

Re: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq

2018-06-04 Thread Harini Katakam

Hi Jennifer,

On Mon, Jun 4, 2018 at 8:35 PM, Nicolas Ferre wrote:
> Jennifer,
>
> On 25/05/2018 at 23:44, Jennifer Dahm wrote:
>>
>> During testing, I discovered that the Zynq GEM hardware overwrites all
>> outgoing UDP packet checksums, which is illegal in packet forwarding
>> cases. This happens both with and without the checksum-zeroing
>> behavior introduced in 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6
>> ("net: macb: initialize checksum when using checksum offloading"). The
>> only solution to both the small packet bug and the packet forwarding
>> bug that I can find is to disable TX checksum offloading entirely.
>
>

Thanks for the extensive testing.
I'll try to reproduce and see if it is something to be fixed in the driver.

> Are the bugs listed above present in all revisions of the GEM IP, or only in
> some revisions?
> Is there an erratum that describes this issue for the Zynq GEM?

@Nicolas, AFAIK, there is no erratum for this in either Cadence or
Zynq documentation.

Regards,
Harini

Re: [PATCH net-next] net/mlx5e: Make function mlx5e_change_rep_mtu() static

2018-06-04 Thread Leon Romanovsky

On Tue, Jun 05, 2018 at 02:42:45AM +, Wei Yongjun wrote:
> Fixes the following sparse warning:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:903:5: warning:
>  symbol 'mlx5e_change_rep_mtu' was not declared. Should it be static?
>
> Signed-off-by: Wei Yongjun 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>

Thanks,
Reviewed-by: Leon Romanovsky 



[PATCH net] sctp: not allow transport timeout value less than HZ/5 for hb_timer

2018-06-04 Thread Xin Long

syzbot reported an rcu_sched self-detected stall on CPU which is caused
by too small a value set on rto_min with the SCTP_RTOINFO sockopt. With this
value, hb_timer will get stuck there, as in its timer handler it starts
this timer again with this value, then goes to the timer handler again.

This problem has been there since the very beginning, and thanks to Eric for
the reproducer shared from a syzbot mail.

This patch fixes it by not allowing sctp_transport_timeout to return a
smaller value than HZ/5 for hb_timer, which is based on TCP's min rto.

Note that it doesn't fix this issue by limiting rto_min, as some users
are still using small rto and no proper value was found for it yet.

Reported-by: syzbot+3dcd59a1f907245f8...@syzkaller.appspotmail.com
Suggested-by: Marcelo Ricardo Leitner 
Signed-off-by: Xin Long 
---
 net/sctp/transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 47f82bd..03fc2c4 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -634,7 +634,7 @@ unsigned long sctp_transport_timeout(struct sctp_transport *trans)
trans->state != SCTP_PF)
timeout += trans->hbinterval;
 
-   return timeout;
+   return max_t(unsigned long, timeout, HZ / 5);
 }
 
 /* Reset transport variables to their initial values */
-- 
2.1.0
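
The lower bound introduced by the patch above can be illustrated outside the kernel. A minimal sketch, assuming a tick rate of 1000 (`DEMO_HZ` is an assumption for illustration; the kernel's HZ is a build-time setting) and a hypothetical helper name `demo_clamp_hb_timeout()`:

```c
/* Assumed tick rate for this illustration only. */
#define DEMO_HZ 1000UL

/* Never return less than DEMO_HZ / 5 ticks (200 ms at this rate), so a
 * timer that re-arms itself with this value cannot fire back to back. */
unsigned long demo_clamp_hb_timeout(unsigned long timeout)
{
	unsigned long min_ticks = DEMO_HZ / 5;

	return timeout > min_ticks ? timeout : min_ticks;
}
```

With this clamp a timeout computed from a tiny rto_min is raised to the floor, while normal values pass through unchanged.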

[PATCH net] failover: eliminate callback hell

2018-06-04 Thread Stephen Hemminger

The net failover should be a simple library, not a virtual
object with function callbacks (see callback hell).
The code is simpler and smaller for both the netvsc and virtio use cases.

The code is restructured in many ways. I should have given these
as review comments to net_failover during review
but did not want to overwhelm the original submitter.
Therefore it was merged prematurely.

Some of the many items changed are:

  * The support routines should just be selected as needed in
kernel config, no need for them to be visible config items.

  * Both netvsc and net_failover should keep their list of their
own devices. Not a common list.

  * The matching of secondary device to primary device policy
is up to the network device. Both net_failover and netvsc
will use MAC for now but can change separately.

  * The match policy is only used during initial discovery; after
that the secondary device knows what the upper device is because
of the parent/child relationship; no searching is required.

  * Now, netvsc and net_failover use the same delayed work type
mechanism for setup. Previously, net_failover code was triggering off
name change but a similar policy was rejected for netvsc.
"what is good for the goose is good for the gander"

  * The net_failover private device info 'struct net_failover_info'
should have been private to the driver file, not a visible
API.

  * The net_failover device should not use SET_NETDEV_DEV;
that is intended only for physical devices, not virtual devices.

  * No point in having DocBook style comments on a driver file.
They only make sense on an external exposed API.

  * net_failover only supports Ethernet, so use ether_addr_copy.

  * Set permanent and current address of net_failover device
to match the primary.

  * Carrier should be marked off before registering
the net_failover device.

  * Use netdev_XXX for log messages, in net_failover (not dev_xxx)

  * Since failover infrastructure is about linking devices, just
use RTNL; no need for other locking in init and teardown.

  * Don't bother with ERR_PTR() style return if only possible
return is success or no memory.

  * As much as possible, the terms master and slave should be avoided
because of their cultural connotations.

Note: this code has been tested on Hyper-V
but is compile tested only on virtio.

Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
Signed-off-by: Stephen Hemminger 
---

Although this patch needs to go into 4.18 (linux-net),
this version is based against net-next because net-next
hasn't been merged into linux-net yet.


 drivers/net/hyperv/hyperv_net.h |   3 +-
 drivers/net/hyperv/netvsc_drv.c | 173 +++--
 drivers/net/net_failover.c  | 312 ---
 drivers/net/virtio_net.c|   9 +-
 include/net/failover.h  |  31 +---
 include/net/net_failover.h  |  32 +---
 net/Kconfig |  13 +-
 net/core/failover.c | 316 
 8 files changed, 373 insertions(+), 516 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 99d8e7398a5b..c7d25d10765e 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -902,6 +902,8 @@ struct net_device_context {
struct hv_device *device_ctx;
/* netvsc_device */
struct netvsc_device __rcu *nvdev;
+   /* list of netvsc net_devices */
+   struct list_head list;
/* reconfigure work */
struct delayed_work dwork;
/* last reconfig time */
@@ -933,7 +935,6 @@ struct net_device_context {
/* Serial number of the VF to team with */
u32 vf_serial;
 
-   struct failover *failover;
 };
 
 /* Per channel data */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index bef4d55a108c..074e6b8578df 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -70,6 +70,8 @@ static int debug = -1;
 module_param(debug, int, 0444);
 MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 
+static LIST_HEAD(netvsc_dev_list);
+
 static void netvsc_change_rx_flags(struct net_device *net, int change)
 {
struct net_device_context *ndev_ctx = netdev_priv(net);
@@ -1846,101 +1848,120 @@ static void netvsc_vf_setup(struct work_struct *w)
}
 
vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
-   if (vf_netdev)
+   if (vf_netdev) {
__netvsc_vf_setup(ndev, vf_netdev);
-
+   dev_put(vf_netdev);
+   }
rtnl_unlock();
 }
 
-static int netvsc_pre_register_vf(struct net_device *vf_netdev,
- struct net_device *ndev)
+static struct net_device *get_netvsc_bymac(const u8 *mac)
 {
-   struct net_device_context *net_device_ctx;
-   struct netvsc_device *netvsc_dev;
+   struct net_device_context *ndev_ctx;
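
The archived diff is cut off at this point. As a rough illustration of the "own list plus match by MAC" idea from the changelog above, here is a user-space sketch. All names (`struct demo_netdev`, `demo_find_by_mac()`, `DEMO_ETH_ALEN`) are hypothetical and are not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical device record kept on a driver-private singly linked list. */
struct demo_netdev {
	unsigned char perm_addr[DEMO_ETH_ALEN];
	struct demo_netdev *next;
};

/* Walk the driver's own list and return the first device whose permanent
 * address equals mac, or NULL when nothing matches. */
struct demo_netdev *demo_find_by_mac(struct demo_netdev *head,
				     const unsigned char *mac)
{
	for (struct demo_netdev *d = head; d != NULL; d = d->next) {
		if (memcmp(d->perm_addr, mac, DEMO_ETH_ALEN) == 0)
			return d;
	}
	return NULL;
}
```

The lookup is only needed at discovery time; afterwards the parent/child relationship described in the changelog makes searching unnecessary.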

[PATCH net-next] net/mlx5e: fix error return code in mlx5e_alloc_rq()

2018-06-04 Thread Wei Yongjun

Fix to return error code -ENOMEM from the kvzalloc_node() error handling
case instead of 0, as done elsewhere in this function.

Fixes: 069d11465a80 ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 333d4ed..89c96a0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -566,8 +566,10 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
kvzalloc_node((wq_sz << rq->wqe.info.log_num_frags) *
  sizeof(*rq->wqe.frags),
  GFP_KERNEL, cpu_to_node(c->cpu));
-   if (!rq->wqe.frags)
+   if (!rq->wqe.frags) {
+   err = -ENOMEM;
goto err_free;
+   }
 
err = mlx5e_init_di_list(rq, params, wq_sz, c->cpu);
if (err)
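
The bug class fixed above (jumping to a shared error label while the error variable still holds 0) can be reproduced in a few lines of user-space C. This is a sketch only: `struct demo_rq`, `demo_alloc_rq()` and the `force_fail` switch are hypothetical and exist purely to exercise the error path.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_rq {
	int *frags;
};

/* force_fail is a hypothetical switch used to simulate allocation failure. */
int demo_alloc_rq(struct demo_rq *rq, size_t n, int force_fail)
{
	int err = 0;

	rq->frags = force_fail ? NULL : calloc(n, sizeof(*rq->frags));
	if (!rq->frags) {
		/* Without this assignment the function would fall through
		 * the error label and report success (0) to the caller. */
		err = -ENOMEM;
		goto err_free;
	}

	return 0;

err_free:
	free(rq->frags);           /* free(NULL) is a no-op */
	rq->frags = NULL;
	return err;
}
```

A caller that only checks the return value would otherwise continue with a NULL `frags` pointer.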

[PATCH net-next] net/mlx5e: Make function mlx5e_change_rep_mtu() static

2018-06-04 Thread Wei Yongjun

Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:903:5: warning:
 symbol 'mlx5e_change_rep_mtu' was not declared. Should it be static?

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 3857f22..57987f6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -900,7 +900,7 @@ int mlx5e_get_offload_stats(int attr_id, const struct net_device *dev,
.switchdev_port_attr_get= mlx5e_attr_get,
 };
 
-int mlx5e_change_rep_mtu(struct net_device *netdev, int new_mtu)
+static int mlx5e_change_rep_mtu(struct net_device *netdev, int new_mtu)
 {
return mlx5e_change_mtu(netdev, new_mtu, NULL);
 }

[RFC PATCH] net: aquantia: hw_atl_utils_mpi_set_state() can be static

2018-06-04 Thread kbuild test robot



Fixes: 45c5c36aa288 ("net: aquantia: Improve adapter init/deinit logic")
Signed-off-by: kbuild test robot 
---
 hw_atl_utils.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 9d0a96d..3d60a48 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -533,8 +533,8 @@ int hw_atl_utils_mpi_set_speed(struct aq_hw_s *self, u32 speed)
return 0;
 }
 
-int hw_atl_utils_mpi_set_state(struct aq_hw_s *self,
-  enum hal_atl_utils_fw_state_e state)
+static int hw_atl_utils_mpi_set_state(struct aq_hw_s *self,
+ enum hal_atl_utils_fw_state_e state)
 {
int err = 0;
u32 transaction_id = 0;

Re: [PATCH net-next 2/5] net: aquantia: Improve adapter init/deinit logic

2018-06-04 Thread kbuild test robot

Hi Igor,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Igor-Russkikh/net-aquantia-Ethtool-based-ring-size-configuration/20180601-05
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

   drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:525:5: sparse: 
symbol 'hw_atl_utils_mpi_set_speed' was not declared. Should it be static?
>> drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:536:5: sparse: symbol 'hw_atl_utils_mpi_set_state' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

Re: [PATCH net-next] net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default.

2018-06-04 Thread Florian Fainelli

On 06/04/18 at 13:17, Kun Yi wrote:
> BCM54612E have 4 multi-functional LED pins that can be configured
> through register setting; the LED4 pin can be configured to a 125MHz
> reference clock output by setting the spare register. Since the dedicated
> CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4
> pin is the only pin to provide such function in this package, and therefore
> it is beneficial to just enable the reference clock by default.

Checked the data sheet and this appears to be absolutely correct:

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: AF_XDP. Was: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04

2018-06-04 Thread Alexander Duyck

On Mon, Jun 4, 2018 at 4:32 PM, Alexei Starovoitov wrote:
> On Mon, Jun 04, 2018 at 03:02:31PM -0700, Alexander Duyck wrote:
>> On Mon, Jun 4, 2018 at 2:27 PM, David Miller  wrote:
>> > From: Or Gerlitz 
>> > Date: Tue, 5 Jun 2018 00:11:35 +0300
>> >
>> >> Just to make sure, is the AF_XDP ZC (Zero Copy) UAPI going to be
>> >> merged for this window -- AFAIU from [1], it's still under
>> >> examination/development/research for non Intel HWs, am I correct or
>> >> this is going to get in now?
>> >
>> > All of the pending AF_XDP changes will be merged this merge window.
>> >
>> > I think Intel folks need to review things as fast as possible because
>> > I pretty much refuse to revert the series or disable it in Kconfig at
>> > this point.
>> >
>> > Thank you.
>>
>> My understanding of things is that the current AF_XDP patches were
>> going to be updated to have more of a model agnostic API such that
>> they would work for either the "typewriter" mode or the descriptor
>> ring based approach. The current plan was to have the zero copy
>> patches be a follow-on after the vendor agnostic API bits in the
>> descriptors and such had been sorted out. I believe you guys have the
>> descriptor fixes already right?
>>
>> In my opinion the i40e code isn't mature enough yet to really go into
>> anything other than maybe net-next in a couple weeks. We are going to
>> need a while to get adequate testing in order to flush out all the
>> bugs and performance regressions we are likely to see coming out of
>> this change.
>
> I think the work everyone did in this release cycle increased my confidence
> that the way descriptors are defined and the rest of uapi are stable enough
> and i40e zero copy bits can land in the next release without uapi changes.
> In that sense even if we merge i40e parts now, the other nic vendors
> will be in the same situation and may find things that they would like
> to improve in uapi.
> So I propose we merge the first 7 patches of the last series now and
> let 3 remaining i40e patches go via intel trees for the next release.
> In the mean time other NIC vendors should start actively working
> on AF_XDP support as well.
> If somehow uapi would need tweaks, we can still do minor adjustments
> since 4.18 won't be released for ~10 weeks.
>

That works for me. Actually I think patch 11 can probably be included
as well since that is just sample code and could probably be used by
whatever drivers end up implementing this.

Thanks.

- Alex

[PATCH net-next v2] net: qualcomm: rmnet: Fix use after free while sending command ack

2018-06-04 Thread Subash Abhinov Kasiviswanathan

When sending an ack to a command packet, the skb is still referenced
after it is sent to the real device. Since the real device could
free the skb, the device pointer would be invalid.
Also, remove an unnecessary variable.

Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan 

---
v1->v2: Rebase change on net-next instead as mentioned by David.
Also remove an unnecessary variable.
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
index 56a93df..3ee8ae9 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
@@ -67,7 +67,7 @@ static void rmnet_map_send_ack(struct sk_buff *skb,
   struct rmnet_port *port)
 {
struct rmnet_map_control_command *cmd;
-   int xmit_status;
+   struct net_device *dev = skb->dev;
 
if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV4)
skb_trim(skb,
@@ -78,9 +78,9 @@ static void rmnet_map_send_ack(struct sk_buff *skb,
cmd = RMNET_MAP_GET_CMD_START(skb);
cmd->cmd_type = type & 0x03;
 
-   netif_tx_lock(skb->dev);
-   xmit_status = skb->dev->netdev_ops->ndo_start_xmit(skb, skb->dev);
-   netif_tx_unlock(skb->dev);
+   netif_tx_lock(dev);
+   dev->netdev_ops->ndo_start_xmit(skb, dev);
+   netif_tx_unlock(dev);
 }
 
 /* Process MAP command frame and send N/ACK message as appropriate. Message cmd
-- 
1.9.1
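
The pattern behind the rmnet fix above (copy out what you need before handing a buffer to a consumer that may free it) can be shown in user space. A minimal sketch, with hypothetical types and helpers (`struct demo_dev`, `struct demo_buf`, `demo_xmit()`, `demo_send_ack()`) that only imitate the shape of the driver code:

```c
#include <stdlib.h>

struct demo_dev {
	int tx_count;
	int tx_locked;             /* stands in for the TX lock state */
};

struct demo_buf {
	struct demo_dev *dev;
};

/* Hypothetical transmit hook: takes ownership of buf and frees it. */
void demo_xmit(struct demo_buf *buf, struct demo_dev *dev)
{
	dev->tx_count++;
	free(buf);
}

void demo_send_ack(struct demo_buf *buf)
{
	/* Cache the device pointer while buf is still valid. */
	struct demo_dev *dev = buf->dev;

	dev->tx_locked = 1;
	demo_xmit(buf, dev);
	/* buf may already be freed here, so only dev is touched. */
	dev->tx_locked = 0;
}
```

Writing `buf->dev->tx_locked = 0` after the transmit call would be the use-after-free that the patch removes.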

[PATCH net-next v2] net: ipv6: Generate random IID for addresses on RAWIP devices

2018-06-04 Thread Subash Abhinov Kasiviswanathan

RAWIP devices such as rmnet do not have a hardware address and
instead require the kernel to generate a random IID for the
IPv6 addresses.

Signed-off-by: Sean Tranchetti 
Signed-off-by: Subash Abhinov Kasiviswanathan 

---
v1->v2: Yoshfuji suggested to update the I/G and G/L bit.
Similar functionality is already implemented by addrconf_ifid_ip6tnl()
so use it.
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 4 
 net/ipv6/addrconf.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index cb02e1a..b9a7548 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -221,6 +221,10 @@ void rmnet_vnd_setup(struct net_device *rmnet_dev)
 
rmnet_dev->needs_free_netdev = true;
rmnet_dev->ethtool_ops = &rmnet_ethtool_ops;
+
+   /* This perm addr will be used as interface identifier by IPv6 */
+   rmnet_dev->addr_assign_type = NET_ADDR_RANDOM;
+   eth_random_addr(rmnet_dev->perm_addr);
 }
 
 /* Exposed API */
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f09afc2..5596d87 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2251,6 +2251,7 @@ static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
return addrconf_ifid_ieee1394(eui, dev);
case ARPHRD_TUNNEL6:
case ARPHRD_IP6GRE:
+   case ARPHRD_RAWIP:
return addrconf_ifid_ip6tnl(eui, dev);
}
return -1;
@@ -3286,7 +3287,8 @@ static void addrconf_dev_config(struct net_device *dev)
(dev->type != ARPHRD_IP6GRE) &&
(dev->type != ARPHRD_IPGRE) &&
(dev->type != ARPHRD_TUNNEL) &&
-   (dev->type != ARPHRD_NONE)) {
+   (dev->type != ARPHRD_NONE) &&
+   (dev->type != ARPHRD_RAWIP)) {
/* Alas, we support only Ethernet autoconfiguration. */
return;
}
-- 
1.9.1

Re: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04

2018-06-04 Thread David Miller

From: Jeff Kirsher 
Date: Mon,  4 Jun 2018 10:56:32 -0700

> This series contains a smorgasbord of updates to documentation, e1000e,
> igb, ixgbe, ixgbevf and i40e.
 ...
> The following are changes since commit 8284fd4cb85577eecca024fe1e7a35b39ed0f3f5:
>   Merge branch 'selftests-net-various'
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Pulled, thanks Jeff.

Re: [PATCH net-next v11 04/10] netdev: cavium: octeon: Add Octeon III BGX Ports

2018-06-04 Thread Andrew Lunn

> + if (status.link) {
> + /* Always full duplex */
> + status.duplex = DUPLEX_FULL;
> +
> + /* Speed */
> + speed = bgx_port_get_qlm_speed(priv, priv->qlm);
> + data = oct_csr_read(BGX_CMR_CONFIG(priv->node, priv->bgx,
> +priv->index));
> + switch ((data >> 8) & 7) {
> + default:
> + case 1:
> + speed = (speed * 8 + 5) / 10;
> + lanes = 4;
> + break;

Hi Steven

Here you add 5, which you did not in the other function dealing with
speed...

> + priv->phydev = of_phy_connect(netdev, priv->phy_np,
> +   bgx_port_adjust_link, 0,
> +   PHY_INTERFACE_MODE_SGMII);
> + if (!priv->phydev)
> + return -ENODEV;
> +
> + netif_carrier_off(netdev);
> +
> + if (priv->phydev)
> + phy_start_aneg(priv->phydev);
> + }

If you are using phylib, you should not need to make calls to
netif_carrier_*(). The phylib will do it for you.

Why hard code passing PHY_INTERFACE_MODE_SGMII? You also support
RGMII? It would be better to use of_get_phy_mode().

> +int bgx_port_change_mtu(struct net_device *netdev, int new_mtu)
> +{
> + struct bgx_port_priv *priv = bgx_port_netdev2priv(netdev);
> + int max_frame;
> +
> + if (new_mtu < 60 || new_mtu > 65392) {
> + netdev_warn(netdev, "Maximum MTU supported is 65392\n");
> + return -EINVAL;
> + }

The core can check this for you, if you tell it the MAX and Min.

Andrew
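
To make the rounding remark above concrete: the two integer expressions differ only in whether the division by 10 truncates or rounds to nearest. A small sketch with hypothetical helper names, using the same `* 8 / 10` scaling as the quoted code:

```c
/* Scale by 8/10 with plain integer division (truncates toward zero). */
unsigned int demo_scale_trunc(unsigned int speed)
{
	return speed * 8 / 10;
}

/* Same scaling, but adding 5 before dividing rounds to the nearest unit. */
unsigned int demo_scale_round(unsigned int speed)
{
	return (speed * 8 + 5) / 10;
}
```

For an input of 10312 the truncating form yields 8249 and the rounding form yields 8250, so mixing the two in one driver can report slightly different speeds for the same link.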

Re: [PATCH net-next v11 03/10] netdev: cavium: octeon: Add Octeon III BGX Ethernet Nexus

2018-06-04 Thread Andrew Lunn

> + /* Connect to PKI/PKO */
> + data = oct_csr_read(BGX_CMR_CONFIG(numa_node, interface, port));
> + if (is_mix)
> + data |= BIT(11);
> + else
> + data &= ~BIT(11);
> + oct_csr_write(data, BGX_CMR_CONFIG(numa_node, interface, port));
> +

Hi Steven

This driver has quite a lot of magic BIT macros. Can you add some
#defines with useful names?

 Thanks
Andrew
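
One possible shape for the named constants requested above is sketched below. The macro name `DEMO_CMR_CONFIG_MIX_EN` is hypothetical: the quoted code only shows that bit 11 is set when `is_mix` is true, and the real register field name is not given in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT(n)             (UINT64_C(1) << (n))

/* Hypothetical name for the bit that the quoted code toggles via BIT(11). */
#define DEMO_CMR_CONFIG_MIX_EN  DEMO_BIT(11)

/* Return data with the MIX bit set or cleared according to is_mix. */
uint64_t demo_cmr_config_set_mix(uint64_t data, bool is_mix)
{
	if (is_mix)
		data |= DEMO_CMR_CONFIG_MIX_EN;
	else
		data &= ~DEMO_CMR_CONFIG_MIX_EN;
	return data;
}
```

The behaviour is unchanged; the gain is that a reader no longer needs the hardware manual open to see what the bit selects.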

Re: [PATCH net-next] net: metrics: add proper netlink validation

2018-06-04 Thread Eric Dumazet

On 06/04/2018 04:54 PM, David Ahern wrote:
> On 6/4/18 4:46 PM, Eric Dumazet wrote:
>> Before using nla_get_u32(), better make sure the attribute
>> is of the proper size.
>>
>> Code recently was changed, but bug has been there from beginning
>> of git.
>>
> ...
>>
>> Fixes: a919525ad832 ("net: Move fib_convert_metrics to metrics file")
> 
> That commit just moved the code from 1 file to another. The previous
> commit id is 6cf9dfd3bd62e, but it just moved code to a helper. The
> originating commit id for the ip_metrics_convert bug is:
> 

Please read what I wrote.

I simply wanted to warn stable teams that this patch is based on a recent
tree, but the bug has been there forever.

The Fixes: tag might help them to cook proper backports, that is all.

A Fixes: tag does not blame the code, it simply gives some hints.

> ea697639992d9 ("net: tcp: add RTAX_CC_ALGO fib handling")
> 

This patch has not added any bug, it was there already.

I can put a (long) list of tags, but ultimately the bug has been there forever.

Re: [PATCH net-next] net: metrics: add proper netlink validation

2018-06-04 Thread David Ahern

On 6/4/18 4:46 PM, Eric Dumazet wrote:
> Before using nla_get_u32(), better make sure the attribute
> is of the proper size.
> 
> Code recently was changed, but bug has been there from beginning
> of git.
> 
...
> 
> Fixes: a919525ad832 ("net: Move fib_convert_metrics to metrics file")

That commit just moved the code from 1 file to another. The previous
commit id is 6cf9dfd3bd62e, but it just moved code to a helper. The
originating commit id for the ip_metrics_convert bug is:

ea697639992d9 ("net: tcp: add RTAX_CC_ALGO fib handling")


> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Eric Dumazet 
> Reported-by: syzbot 
> Cc: David Ahern 
> ---
>  net/ipv4/fib_semantics.c | 2 ++
>  net/ipv4/metrics.c   | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 6608db23f54b6afdac0455650b47d64b1b22b255..9a890be8a0265edb78da225a82e2cac120f2150f 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -717,6 +717,8 @@ bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi)
>   nla_strlcpy(tmp, nla, sizeof(tmp));
>   val = tcp_ca_get_key_by_name(fi->fib_net, tmp, &ecn_ca);
>   } else {
> + if (nla_len(nla) != sizeof(u32)
> + return false;
>   val = nla_get_u32(nla);
>   }
>  
> diff --git a/net/ipv4/metrics.c b/net/ipv4/metrics.c
> index 5121c6475e6b0e9a9a158d4cee473f52cd4d8efe..04311f7067e2e9e3dafb89aa4f8e30dab0fde854 100644
> --- a/net/ipv4/metrics.c
> +++ b/net/ipv4/metrics.c
> @@ -32,6 +32,8 @@ int ip_metrics_convert(struct net *net, struct nlattr *fc_mx, int fc_mx_len,
>   if (val == TCP_CA_UNSPEC)
>   return -EINVAL;
>   } else {
> + if (nla_len(nla) != sizeof(u32))
> + return -EINVAL;
>   val = nla_get_u32(nla);
>   }
>   if (type == RTAX_ADVMSS && val > 65535 - 40)
>

[PATCH net-next] net: metrics: add proper netlink validation

2018-06-04 Thread Eric Dumazet

Before using nla_get_u32(), better make sure the attribute
is of the proper size.

Code recently was changed, but bug has been there from beginning
of git.

BUG: KMSAN: uninit-value in rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746
CPU: 1 PID: 14139 Comm: syz-executor6 Not tainted 4.17.0-rc5+ #103
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084
 __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686
 rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746
 fib_dump_info+0xc42/0x2190 net/ipv4/fib_semantics.c:1361
 rtmsg_fib+0x65f/0x8c0 net/ipv4/fib_semantics.c:419
 fib_table_insert+0x2314/0x2b50 net/ipv4/fib_trie.c:1287
 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779
 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg net/socket.c:639 [inline]
 ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
 __sys_sendmsg net/socket.c:2155 [inline]
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x455a09
RSP: 002b:7faae5fd8c68 EFLAGS: 0246 ORIG_RAX: 002e
RAX: ffda RBX: 7faae5fd96d4 RCX: 00455a09
RDX:  RSI: 2000 RDI: 0013
RBP: 0072bea0 R08:  R09: 
R10:  R11: 0246 R12: 
R13: 05d0 R14: 006fdc20 R15: 

Uninit was stored to memory at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
 kmsan_save_stack mm/kmsan/kmsan.c:294 [inline]
 kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685
 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:529
 fib_convert_metrics net/ipv4/fib_semantics.c:1056 [inline]
 fib_create_info+0x2d46/0x9dc0 net/ipv4/fib_semantics.c:1150
 fib_table_insert+0x3e4/0x2b50 net/ipv4/fib_trie.c:1146
 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779
 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646
 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448
 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg net/socket.c:639 [inline]
 ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
 __sys_sendmsg net/socket.c:2155 [inline]
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
 kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
 slab_post_alloc_hook mm/slab.h:446 [inline]
 slab_alloc_node mm/slub.c:2753 [inline]
 __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395
 __kmalloc_reserve net/core/skbuff.c:138 [inline]
 __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
 alloc_skb include/linux/skbuff.h:988 [inline]
 netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
 netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg net/socket.c:639 [inline]
 ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117
 __sys_sendmsg net/socket.c:2155 [inline]
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: a919525ad832 ("net: Move fib_convert_metrics to metrics file")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Cc: David Ahern 
---
 net/ipv4/fib_semantics.c | 2 ++
 net/ipv4/metrics.c   | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 6608db23f54b6afdac0455650b47d64b1b22b255..9a890be8a0265edb78da225a82e2cac120f2150f 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -717,6 +717,8 @@ bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi)

Re: [PATCH bpf-next v2 2/2] samples/bpf: Add xdp_sample_pkts example

2018-06-04 Thread Daniel Borkmann

On 06/05/2018 12:32 AM, Jakub Kicinski wrote:
> On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
>> +if (load_bpf_file(filename)) {
> 
> Would you mind using libbpf instead of bpf_load.o?  I converted some
> samples in be5bca44aa6b ("samples: bpf: convert some XDP samples from
> bpf_load to libbpf"), it's pretty straight forward.  Maybe we can kill
> bpf_load.o one day :)

Agreed, we should only be using libbpf going forward.

Re: [PATCH bpf-next v2 1/2] trace_helpers.c: Add helpers to poll multiple perf FDs for events

2018-06-04 Thread Daniel Borkmann

On 06/05/2018 12:26 AM, Jakub Kicinski wrote:
> On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
>> This adds two new helper functions to trace_helpers that support polling
>> multiple perf file descriptors for events. These are used by the XDP
>> perf_event_output example, which needs to work with one perf fd per CPU.
>>
>> Signed-off-by: Toke Høiland-Jørgensen 
> 
> Did you take a look at tools/bpf/bpftool/map_perf_ring.c ?
> 
> I think the ability to poll multiple FDs could be generally useful and
> therefore better add it to libbpf.c than
> tools/testing/selftests/bpf/trace_helpers.c?  I'm not 100% sure myself...

I think for it to land in libbpf this code needs to be more generalized
than it is right now, allowing for more flexibility such as pinning RB
processing threads to CPUs, poll handling, etc.
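
For readers following the "poll multiple FDs" discussion, here is a minimal userspace sketch of the basic idea using plain poll(2). It is not the trace_helpers.c or bpftool code; the helper name poll_ready_fds and its signature are made up for illustration. A real perf ring reader would additionally mmap one ring per CPU and consume the records of every ready descriptor.

```c
#include <poll.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of libbpf or trace_helpers.c): wait up to
 * timeout_ms for any of nfds descriptors to become readable and store the
 * indices of the readable ones in ready_idx (caller provides nfds slots).
 * Returns the number of readable descriptors, 0 on timeout, -1 on error.
 */
int poll_ready_fds(const int *fds, int nfds, int timeout_ms, int *ready_idx)
{
	struct pollfd *pfds;
	int i, n, nready = 0;

	if (nfds <= 0)
		return -1;

	pfds = calloc(nfds, sizeof(*pfds));
	if (!pfds)
		return -1;

	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
	}

	n = poll(pfds, nfds, timeout_ms);
	if (n > 0) {
		/* Collect the index of every descriptor with data pending. */
		for (i = 0; i < nfds; i++)
			if (pfds[i].revents & POLLIN)
				ready_idx[nready++] = i;
	}

	free(pfds);
	return n < 0 ? -1 : nready;
}
```

The generalization asked for above (CPU pinning of reader threads, custom poll handling) would sit on top of a loop like this one; none of that is sketched here.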

Re: [PATCH net-next v2 1/5] net: aquantia: Ethtool based ring size configuration

2018-06-04 Thread Igor Russkikh



>> +mutex_lock(&self->aq_mutex);
>> +
>>  if (aq_utils_obj_test(&self->flags, AQ_NIC_FLAGS_IS_NOT_READY))
>>  goto err_exit;
>>  
>> @@ -175,6 +177,7 @@ static void aq_nic_service_timer_cb(struct timer_list *t)
>>  ctimer = max(ctimer / 2, 1);
>>  
>>  err_exit:
>> +mutex_unlock(&self->aq_mutex);
>>  mod_timer(&self->service_timer, jiffies + ctimer);
>>  }
>>  
> 
> This looks like a timer callback from the prototype, I don't think you
> can take mutexes in timer callbacks.

True as well. Eventually, I think we may just get rid of the mutex inside this
callback.

The mutex will then only serve to prevent parallel `ethtool -G` invocations
from colliding.

BR, Igor

AF_XDP. Was: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04

2018-06-04 Thread Alexei Starovoitov

On Mon, Jun 04, 2018 at 03:02:31PM -0700, Alexander Duyck wrote:
> On Mon, Jun 4, 2018 at 2:27 PM, David Miller  wrote:
> > From: Or Gerlitz 
> > Date: Tue, 5 Jun 2018 00:11:35 +0300
> >
> >> Just to make sure, is the AF_XDP ZC (Zero Copy) UAPI going to be
> >> merged for this window -- AFAIU from [1], it's still under
> >> examination/development/research for non Intel HWs, am I correct or
> >> this is going to get in now?
> >
> > All of the pending AF_XDP changes will be merged this merge window.
> >
> > I think Intel folks need to review things as fast as possible because
> > I pretty much refuse to revert the series or disable it in Kconfig at
> > this point.
> >
> > Thank you.
> 
> My understanding of things is that the current AF_XDP patches were
> going to be updated to have more of a model agnostic API such that
> they would work for either the "typewriter" mode or the descriptor
> ring based approach. The current plan was to have the zero copy
> patches be a follow-on after the vendor agnostic API bits in the
> descriptors and such had been sorted out. I believe you guys have the
> descriptor fixes already right?
> 
> In my opinion the i40e code isn't mature enough yet to really go into
> anything other than maybe net-next in a couple weeks. We are going to
> need a while to get adequate testing in order to flush out all the
> bugs and performance regressions we are likely to see coming out of
> this change.

I think the work everyone did in this release cycle increased my confidence
that the way descriptors are defined and the rest of uapi are stable enough
and i40e zero copy bits can land in the next release without uapi changes.
In that sense even if we merge i40e parts now, the other nic vendors
will be in the same situation and may find things that they would like
to improve in uapi.
So I propose we merge the first 7 patches of the last series now and
let 3 remaining i40e patches go via intel trees for the next release.
In the mean time other NIC vendors should start actively working
on AF_XDP support as well.
If somehow uapi would need tweaks, we can still do minor adjustments
since 4.18 won't be released for ~10 weeks.

Re: [bpf-next PATCH] bpf: sockmap, fix crash when ipv6 sock is added

2018-06-04 Thread Daniel Borkmann

On 06/05/2018 01:08 AM, John Fastabend wrote:
> On 06/04/2018 12:59 PM, Daniel Borkmann wrote:
>> On 06/04/2018 05:21 PM, John Fastabend wrote:
>>> This fixes a crash where we assign tcp_prot to IPv6 sockets instead
>>> of tcpv6_prot.
>>>
>>> Previously we overwrote the sk->prot field with tcp_prot even in the
>>> AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
>>> are used. Further, only allow ESTABLISHED connections to join the
>>> map per note in TLS ULP,
>>>
>>>/* The TLS ulp is currently supported only for TCP sockets
>>> * in ESTABLISHED state.
>>> * Supporting sockets in LISTEN state will require us
>>> * to modify the accept implementation to clone rather then
>>> * share the ulp context.
>>> */
>>>
>>> Also tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
>>> 'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
>>> crashing case here.
>>>
>>> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
>>> Reported-by: syzbot+5c063698bdbfac19f...@syzkaller.appspotmail.com
>>> Signed-off-by: John Fastabend 
>>> Signed-off-by: Wei Wang 
>>
>> Applied to bpf-next, thanks everyone!
> 
> Thanks Daniel, this has the unfortunate side-effect though of
> making it hard to add sockets transitioning from LISTEN into
> ESTABLISHED states to a sockmap. Before this patch we could add
> sockets from the sock_ops event BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
> to a sock{map|hash}. However, this is before a socket is in the ESTABLISHED
> state, so it risked crashing and wasn't valid at all per this thread. So
> I believe it's correct to block this action, seeing as it will crash a
> system in many (most!) cases.
> 
> That said we still would like to support pushing sockets into a
> sock{map|hash} in this case. I thought about adding a new hook but
> we already have a few sock op hooks in the TCP stack, so it's too bad we
> don't have one that fires after the transition to ESTABLISHED has happened.
> Right now I'm looking into seeing if the following would have any
> issues,
> 
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2206,9 +2206,6 @@ void tcp_set_state(struct sock *sk, int state)
> BUILD_BUG_ON((int)BPF_TCP_NEW_SYN_RECV != (int)TCP_NEW_SYN_RECV);
> BUILD_BUG_ON((int)BPF_TCP_MAX_STATES != (int)TCP_MAX_STATES);
>  
> -   if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_STATE_CB_FLAG))
> -   tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_STATE_CB, oldstate, state);
> -
> switch (state) {
> case TCP_ESTABLISHED:
> if (oldstate != TCP_ESTABLISHED)
> @@ -2234,6 +2231,9 @@ void tcp_set_state(struct sock *sk, int state)
>  */
> inet_sk_state_store(sk, state);
>  
> +   if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_STATE_CB_FLAG))
> +   tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_STATE_CB, oldstate, state);
> +
>  #ifdef STATE_TRACE
> SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk, 
> statename[oldstate], statename[state]);
>  #endif
> 
> This would change the call hook slightly, moving it to after the state
> change. However, unless the unhash is somehow visible from the BPF program,
> I don't think it should impact existing BPF programs.

Hmm, the current fix also breaks compilation when IPv6 is compiled out, so I had
to take it out for now. :-( I think this needs a similar workaround as in the
kTLS case in tls_init(). Given this and the side-effect you describe above,
let's respin everything with a clean fix.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
head:   4e1f687e302835e45e2f296392f21cfeb5671303
commit: 4e1f687e302835e45e2f296392f21cfeb5671303 [3/3] bpf: sockmap, fix crash when ipv6 sock is added
config: i386-randconfig-a0-06041847 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 4e1f687e302835e45e2f296392f21cfeb5671303
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

   kernel/bpf/sockmap.o: In function `bpf_tcp_ulp_register':
>> kernel/bpf/sockmap.c:1127: undefined reference to `tcpv6_prot'
>> kernel/bpf/sockmap.c:1127: undefined reference to `tcpv6_prot'

vim +1127 kernel/bpf/sockmap.c

  1122  
  1123  static int bpf_tcp_ulp_register(void)
  1124  {
  1125  tcp_bpf_proto = tcp_prot;
  1126  tcp_bpf_proto.close = bpf_tcp_close;
> 1127  tcpv6_bpf_proto = tcpv6_prot;
  1128  tcpv6_bpf_proto.close = bpf_tcp_close;
  1129  /* Once BPF TX ULP is registered it is never unregistered. It
  1130   * will be in the ULP list for the lifetime of the system. Doing
  1131   * duplicate registers is not a problem.
  1132   */
  1133  return tcp_register_ulp(&bpf_tcp_ulp_ops);
  1134  }
  1135  

Thanks,
Daniel

Re: [PATCH net] ipmr: fix error path when mr_table_alloc fails

2018-06-04 Thread kbuild test robot

Hi Sabrina,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:
https://github.com/0day-ci/linux/commits/Sabrina-Dubroca/ipmr-fix-error-path-when-mr_table_alloc-fails/20180605-060837
config: x86_64-randconfig-x006-201822 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/current.h:5:0,
from include/linux/sched.h:12,
from include/linux/uaccess.h:5,
from net/ipv6/ip6mr.c:19:
   net/ipv6/ip6mr.c: In function 'ip6_mroute_setsockopt':
>> include/linux/compiler.h:177:26: warning: 'mrt' may be used uninitialized in this function [-Wmaybe-uninitialized]
 case 8: *(__u64 *)res = *(volatile __u64 *)p; break;  \
 ^
   net/ipv6/ip6mr.c:1751:20: note: 'mrt' was declared here
  struct mr_table *mrt;
   ^~~
--
   In file included from arch/x86/include/asm/current.h:5:0,
from include/linux/sched.h:12,
from include/linux/uaccess.h:5,
from net//ipv6/ip6mr.c:19:
   net//ipv6/ip6mr.c: In function 'ip6_mroute_setsockopt':
>> include/linux/compiler.h:177:26: warning: 'mrt' may be used uninitialized in this function [-Wmaybe-uninitialized]
 case 8: *(__u64 *)res = *(volatile __u64 *)p; break;  \
 ^
   net//ipv6/ip6mr.c:1751:20: note: 'mrt' was declared here
  struct mr_table *mrt;
   ^~~

vim +/mrt +177 include/linux/compiler.h

230fa253 Christian Borntraeger 2014-11-25  170  
d976441f Andrey Ryabinin   2015-10-19  171  #define __READ_ONCE_SIZE \
d976441f Andrey Ryabinin   2015-10-19  172  ({ \
d976441f Andrey Ryabinin   2015-10-19  173  switch (size) { \
d976441f Andrey Ryabinin   2015-10-19  174  case 1: *(__u8 *)res = *(volatile __u8 *)p; break; \
d976441f Andrey Ryabinin   2015-10-19  175  case 2: *(__u16 *)res = *(volatile __u16 *)p; break; \
d976441f Andrey Ryabinin   2015-10-19  176  case 4: *(__u32 *)res = *(volatile __u32 *)p; break; \
d976441f Andrey Ryabinin   2015-10-19 @177  case 8: *(__u64 *)res = *(volatile __u64 *)p; break; \
d976441f Andrey Ryabinin   2015-10-19  178  default: \
d976441f Andrey Ryabinin   2015-10-19  179  barrier(); \
d976441f Andrey Ryabinin   2015-10-19  180  __builtin_memcpy((void *)res, (const void *)p, size); \
d976441f Andrey Ryabinin   2015-10-19  181  barrier(); \
d976441f Andrey Ryabinin   2015-10-19  182  } \
d976441f Andrey Ryabinin   2015-10-19  183  })
d976441f Andrey Ryabinin   2015-10-19  184  

:: The code at line 177 was first introduced by commit
:: d976441f44bc5d48635d081d277aa76556ffbf8b compiler, atomics, kasan: Provide READ_ONCE_NOCHECK()

:: TO: Andrey Ryabinin 
:: CC: Ingo Molnar 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [bpf-next PATCH] bpf: sockmap, fix crash when ipv6 sock is added

2018-06-04 Thread John Fastabend

On 06/04/2018 12:59 PM, Daniel Borkmann wrote:
> On 06/04/2018 05:21 PM, John Fastabend wrote:
>> This fixes a crash where we assign tcp_prot to IPv6 sockets instead
>> of tcpv6_prot.
>>
>> Previously we overwrote the sk->prot field with tcp_prot even in the
>> AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
>> are used. Further, only allow ESTABLISHED connections to join the
>> map per note in TLS ULP,
>>
>>/* The TLS ulp is currently supported only for TCP sockets
>> * in ESTABLISHED state.
>> * Supporting sockets in LISTEN state will require us
>> * to modify the accept implementation to clone rather then
>> * share the ulp context.
>> */
>>
>> Also tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
>> 'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
>> crashing case here.
>>
>> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
>> Reported-by: syzbot+5c063698bdbfac19f...@syzkaller.appspotmail.com
>> Signed-off-by: John Fastabend 
>> Signed-off-by: Wei Wang 
> 
> Applied to bpf-next, thanks everyone!
> 

Thanks Daniel, this has the unfortunate side-effect though of
making it hard to add sockets transitioning from LISTEN into
ESTABLISHED states to a sockmap. Before this patch we could add
sockets from the sock_ops event BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
to a sock{map|hash}. However, this is before a socket is in the ESTABLISHED
state, so it risked crashing and wasn't valid at all per this thread. So
I believe it's correct to block this action, seeing as it will crash a
system in many (most!) cases.

That said we still would like to support pushing sockets into a
sock{map|hash} in this case. I thought about adding a new hook but
we already have a few sock op hooks in the TCP stack, so it's too bad we
don't have one that fires after the transition to ESTABLISHED has happened.
Right now I'm looking into seeing if the following would have any
issues,

--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2206,9 +2206,6 @@ void tcp_set_state(struct sock *sk, int state)
BUILD_BUG_ON((int)BPF_TCP_NEW_SYN_RECV != (int)TCP_NEW_SYN_RECV);
BUILD_BUG_ON((int)BPF_TCP_MAX_STATES != (int)TCP_MAX_STATES);

-   if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_STATE_CB_FLAG))
-   tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_STATE_CB, oldstate, state);
-
switch (state) {
case TCP_ESTABLISHED:
if (oldstate != TCP_ESTABLISHED)
@@ -2234,6 +2231,9 @@ void tcp_set_state(struct sock *sk, int state)
 */
inet_sk_state_store(sk, state);

+   if (BPF_SOCK_OPS_TEST_FLAG(tcp_sk(sk), BPF_SOCK_OPS_STATE_CB_FLAG))
+   tcp_call_bpf_2arg(sk, BPF_SOCK_OPS_STATE_CB, oldstate, state);
+
 #ifdef STATE_TRACE
SOCK_DEBUG(sk, "TCP sk=%p, State %s -> %s\n", sk, statename[oldstate], 
statename[state]);
 #endif

This would change the call hook slightly, moving it to after the state
change. However, unless the unhash is somehow visible from the BPF program,
I don't think it should impact existing BPF programs.

Thanks,
John

Re: [PATCH net-next v11 04/10] netdev: cavium: octeon: Add Octeon III BGX Ports

2018-06-04 Thread Andrew Lunn

> +static int bgx_port_get_qlm_speed(struct bgx_port_priv   *priv, int qlm)
> +{
> + enum lane_mode lmode;
> + u64 data;
> +
> + data = oct_csr_read(GSER_LANE_MODE(priv->node, qlm));
> + lmode = data & 0xf;
> +
> + switch (lmode) {
> + case R_25G_REFCLK100:
> + return 2500;
> + case R_5G_REFCLK100:
> + return 5000;
> + case R_8G_REFCLK100:
> + return 8000;
> + case R_125G_REFCLK15625_KX:
> + return 1250;
> + case R_3125G_REFCLK15625_XAUI:
> + return 3125;
> + case R_103125G_REFCLK15625_KR:
> + return 10312;
> + case R_125G_REFCLK15625_SGMII:
> + return 1250;
> + case R_5G_REFCLK15625_QSGMII:
> + return 5000;
> + case R_625G_REFCLK15625_RXAUI:
> + return 6250;
> + case R_25G_REFCLK125:
> + return 2500;
> + case R_5G_REFCLK125:
> + return 5000;
> + case R_8G_REFCLK125:
> + return 8000;
> + default:
> + return 0;
> + }


> + struct port_status status;
> + int speed;
> +
> + /* The simulator always uses a 1Gbps full duplex port */
> + if (octeon_is_simulation()) {
> + status.link = 1;
> + status.duplex = DUPLEX_FULL;
> + status.speed = 1000;
> + } else {
> + /* Use the qlm speed */
> + speed = bgx_port_get_qlm_speed(priv, priv->qlm);
> + status.link = 1;
> + status.duplex = DUPLEX_FULL;
> + status.speed = speed * 8 / 10;
> + }

Hi Steve

That looks like it gives some odd speeds.

2500 * 8 / 10 = 2Gbps
5000 * 8 / 10 = 4Gbps
8000 * 8 / 10 = 6.4Gbps
10312 * 8 / 10 = 8.249Gbps

Is this correct? We are more used to 10/100/1000/2.5G/5G/10G/40G/100G.

   Andrew
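
The arithmetic above can be checked with a few lines of C. This only mirrors the integer expression from the quoted patch (status.speed = speed * 8 / 10); the function name is made up, and reading the 8/10 factor as 8b/10b line-coding overhead is an assumption, not something the patch states.

```c
/*
 * Mirrors the expression in the quoted patch. qlm_speed is the lane rate
 * returned by bgx_port_get_qlm_speed() (e.g. 2500, 5000, 10312); the
 * result uses integer division, exactly as the driver code would.
 */
int qlm_to_link_speed(int qlm_speed)
{
	return qlm_speed * 8 / 10;
}
```

For the lane rates in the switch statement this yields 1000 for 1250, 2500 for 3125 and 5000 for 6250, but also 2000, 4000, 6400 and 8249 for 2500, 5000, 8000 and 10312, which are the values being questioned.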

Re: [PATCH net-next v2 1/5] net: aquantia: Ethtool based ring size configuration

2018-06-04 Thread Jakub Kicinski

On Tue,  5 Jun 2018 01:30:15 +0300, Igor Russkikh wrote:
> @@ -158,6 +158,8 @@ static void aq_nic_service_timer_cb(struct timer_list *t)
>   int ctimer = AQ_CFG_SERVICE_TIMER_INTERVAL;
>   int err = 0;
>  
> + mutex_lock(&self->aq_mutex);
> +
>   if (aq_utils_obj_test(&self->flags, AQ_NIC_FLAGS_IS_NOT_READY))
>   goto err_exit;
>  
> @@ -175,6 +177,7 @@ static void aq_nic_service_timer_cb(struct timer_list *t)
>   ctimer = max(ctimer / 2, 1);
>  
>  err_exit:
> + mutex_unlock(&self->aq_mutex);
>   mod_timer(&self->service_timer, jiffies + ctimer);
>  }
>  

This looks like a timer callback from the prototype, I don't think you
can take mutexes in timer callbacks.

Re: [PATCH net-next 2/3] netdevsim: Add extack error message for devlink reload

2018-06-04 Thread Jakub Kicinski

On Mon,  4 Jun 2018 15:15:02 -0700, dsah...@kernel.org wrote:
> From: David Ahern 
> 
> The devlink reload command can fail if a FIB resource limit is set to a value
> lower than the current occupancy. Return a proper message indicating the
> reason for the failure.
> 
> $ devlink resource sh netdevsim/netdevsim0
> netdevsim/netdevsim0:
>   name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
> resources:
>   name fib size unlimited occ 43 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
>   name fib-rules size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
>   name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
> resources:
>   name fib size unlimited occ 54 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
>   name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
> 
> $ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 40
> 
> $ devlink dev reload netdevsim/netdevsim0
> Error: netdevsim: New size is less than current occupancy.
> devlink answers: Invalid argument
> 
> Signed-off-by: David Ahern 

Acked-by: Jakub Kicinski 

The entire set looks very useful, thanks!

Re: [PATCH bpf-next v3 05/11] bpf: avoid retpoline for lookup/update/delete calls on maps

2018-06-04 Thread David Ahern

On 6/4/18 11:25 AM, Jakub Kicinski wrote:
> that, and others can use completions.  I personally think Quentin did
> an awesome job on the completions, they cover the entire syntax unlike
> the iproute2 ones and we intend to keep them that way!

iproute2 patches for completions would be welcomed if anyone has the time.

Re: [PATCH bpf-next v2 2/2] samples/bpf: Add xdp_sample_pkts example

2018-06-04 Thread Jakub Kicinski

On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
> + if (load_bpf_file(filename)) {

Would you mind using libbpf instead of bpf_load.o?  I converted some
samples in be5bca44aa6b ("samples: bpf: convert some XDP samples from
bpf_load to libbpf"), it's pretty straightforward.  Maybe we can kill
bpf_load.o one day :)

[PATCH net-next v2 1/5] net: aquantia: Ethtool based ring size configuration

2018-06-04 Thread Igor Russkikh

From: Anton Mikaev 

Implement ring size setup, min/max validation and reconfiguration at
runtime. A NIC-level lock is used to prevent collisions between parallel
reconfigurations and interference with the periodic service timer job.
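
As an illustration of the min/max validation step, the following standalone sketch clamps a requested ring size to a range and then rounds it up to a multiple, in the same order as aq_set_ringparam() in this patch. The helper name, the ALIGN_UP macro and all numeric limits used with it are illustrative assumptions, not the driver's actual values.

```c
/* Round x up to the next multiple of a (generic, a need not be 2^n). */
#define ALIGN_UP(x, a)	((((x) + (a) - 1) / (a)) * (a))

/*
 * Hypothetical helper: clamp the requested descriptor count to [lo, hi],
 * then align it up to the hardware granularity.
 */
unsigned int ring_size_fixup(unsigned int requested, unsigned int lo,
			     unsigned int hi, unsigned int multiple)
{
	unsigned int n = requested;

	if (n < lo)
		n = lo;
	if (n > hi)
		n = hi;

	return ALIGN_UP(n, multiple);
}
```

With made-up limits of 32..8184 and a multiple of 8, a request for 100 becomes 104 and a request for 10 becomes 32. Note that in this sketch rounding up after clamping can exceed the upper limit when that limit is not itself a multiple of the alignment.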

Signed-off-by: Anton Mikaev 
Signed-off-by: Igor Russkikh 
---
 .../net/ethernet/aquantia/atlantic/aq_ethtool.c| 65 ++
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |  9 ++-
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c|  9 ++-
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h|  2 +
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c  | 46 +++
 .../aquantia/atlantic/hw_atl/hw_atl_a0_internal.h  |  8 +++
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  | 50 +
 .../aquantia/atlantic/hw_atl/hw_atl_b0_internal.h  |  8 +++
 8 files changed, 147 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
index f2d8063..97f42d1 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
@@ -11,6 +11,7 @@
 
 #include "aq_ethtool.h"
 #include "aq_nic.h"
+#include "aq_vec.h"
 
 static void aq_ethtool_get_regs(struct net_device *ndev,
struct ethtool_regs *regs, void *p)
@@ -284,6 +285,68 @@ static int aq_ethtool_set_coalesce(struct net_device *ndev,
return aq_nic_update_interrupt_moderation_settings(aq_nic);
 }
 
+static void aq_get_ringparam(struct net_device *ndev,
+struct ethtool_ringparam *ring)
+{
+   struct aq_nic_s *aq_nic = netdev_priv(ndev);
+   struct aq_nic_cfg_s *aq_nic_cfg = aq_nic_get_cfg(aq_nic);
+
+   ring->rx_pending = aq_nic_cfg->rxds;
+   ring->tx_pending = aq_nic_cfg->txds;
+
+   ring->rx_max_pending = aq_nic_cfg->aq_hw_caps->rxds_max;
+   ring->tx_max_pending = aq_nic_cfg->aq_hw_caps->txds_max;
+}
+
+static int aq_set_ringparam(struct net_device *ndev,
+   struct ethtool_ringparam *ring)
+{
+   int err = 0;
+   bool ndev_running = false;
+   struct aq_nic_s *aq_nic = netdev_priv(ndev);
+   struct aq_nic_cfg_s *aq_nic_cfg = aq_nic_get_cfg(aq_nic);
+   const struct aq_hw_caps_s *hw_caps = aq_nic_cfg->aq_hw_caps;
+
+   if (ring->rx_mini_pending || ring->rx_jumbo_pending) {
+   err = -EOPNOTSUPP;
+   goto err_exit;
+   }
+
+   mutex_lock(&aq_nic->aq_mutex);
+
+   if (netif_running(ndev)) {
+   ndev_running = true;
+   dev_close(ndev);
+   }
+
+   aq_nic_free_vectors(aq_nic);
+
+   aq_nic_cfg->rxds = max(ring->rx_pending, hw_caps->rxds_min);
+   aq_nic_cfg->rxds = min(aq_nic_cfg->rxds, hw_caps->rxds_max);
+   aq_nic_cfg->rxds = ALIGN(aq_nic_cfg->rxds, AQ_HW_RXD_MULTIPLE);
+
+   aq_nic_cfg->txds = max(ring->tx_pending, hw_caps->txds_min);
+   aq_nic_cfg->txds = min(aq_nic_cfg->txds, hw_caps->txds_max);
+   aq_nic_cfg->txds = ALIGN(aq_nic_cfg->txds, AQ_HW_TXD_MULTIPLE);
+
+   for (aq_nic->aq_vecs = 0; aq_nic->aq_vecs < aq_nic_cfg->vecs;
+aq_nic->aq_vecs++) {
+   aq_nic->aq_vec[aq_nic->aq_vecs] =
+   aq_vec_alloc(aq_nic, aq_nic->aq_vecs, aq_nic_cfg);
+   if (unlikely(!aq_nic->aq_vec[aq_nic->aq_vecs])) {
+   err = -ENOMEM;
+   goto err_unlock;
+   }
+   }
+   if (ndev_running)
+   err = dev_open(ndev);
+
+err_unlock:
+   mutex_unlock(&aq_nic->aq_mutex);
+err_exit:
+   return err;
+}
+
 const struct ethtool_ops aq_ethtool_ops = {
.get_link= aq_ethtool_get_link,
.get_regs_len= aq_ethtool_get_regs_len,
@@ -291,6 +354,8 @@ const struct ethtool_ops aq_ethtool_ops = {
.get_drvinfo = aq_ethtool_get_drvinfo,
.get_strings = aq_ethtool_get_strings,
.get_rxfh_indir_size = aq_ethtool_get_rss_indir_size,
+   .get_ringparam   = aq_get_ringparam,
+   .set_ringparam   = aq_set_ringparam,
.get_rxfh_key_size   = aq_ethtool_get_rss_key_size,
.get_rxfh= aq_ethtool_get_rss,
.get_rxnfc   = aq_ethtool_get_rxnfc,
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_hw.h b/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
index a2d416b..904cdfd 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
@@ -24,8 +24,10 @@ struct aq_hw_caps_s {
u64 link_speed_msk;
unsigned int hw_priv_flags;
u32 media_type;
-   u32 rxds;
-   u32 txds;
+   u32 rxds_max;
+   u32 txds_max;
+   u32 rxds_min;
+   u32 txds_min;
u32 txhwb_alignment;
u32 irq_mask;
u32 vecs;
@@ -98,6 +100,9 @@ struct aq_stats_s {
 #define AQ_HW_MEDIA_TYPE_TP    1U
 #define AQ_HW_MEDIA_TYPE_FIBRE 2U
 
+#define

[PATCH net-next v2 0/5] net: aquantia: various ethtool ops implementation

2018-06-04 Thread Igor Russkikh

In this patchset Anton Mikaev and I added some useful ethtool operations:
- ring size changes
- link renegotiation
- flow control management

The patchset also improves the init/deinit sequence.

V2 changes:
- using mutex to secure simultaneous dev close/open
- using state var to store/restore dev state

Igor Russkikh (5):
  net: aquantia: Ethtool based ring size configuration
  net: aquantia: Improve adapter init/deinit logic
  net: aquantia: Implement rx/tx flow control ethtools callback
  net: aquantia: Add renegotiate ethtool operation support
  net: aquantia: bump driver version

 .../net/ethernet/aquantia/atlantic/aq_ethtool.c| 121 +
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |  20 +++-
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c|  17 ++-
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h|   2 +
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c  |  47 
 .../aquantia/atlantic/hw_atl/hw_atl_a0_internal.h  |   8 ++
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  |  51 -
 .../aquantia/atlantic/hw_atl/hw_atl_b0_internal.h  |   8 ++
 .../aquantia/atlantic/hw_atl/hw_atl_utils.c|  54 +
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h|  35 ++
 .../aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c   |  69 +++-
 drivers/net/ethernet/aquantia/atlantic/ver.h   |   4 +-
 12 files changed, 352 insertions(+), 84 deletions(-)

-- 
2.7.4

[PATCH net-next v2 4/5] net: aquantia: Add renegotiate ethtool operation support

2018-06-04 Thread Igor Russkikh

From: Anton Mikaev 

Adds ethtool -r|--negotiate operation support. It sets a special
control bit on the FW interface, causing the FW to restart link negotiation.
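
The "control bit" mechanism is a plain read-modify-write of a 32-bit register. The sketch below models it in userspace with a variable standing in for the firmware control register; every toy_* name is invented for illustration. The bit position 31 follows from CTRL_FORCE_RECONNECT being the 32nd enumerator of enum hw_atl_fw2x_ctrl added by this patch.

```c
#include <stdint.h>

#define TOY_BIT(n)			(UINT32_C(1) << (n))
#define TOY_CTRL_FORCE_RECONNECT	31

/* Stand-in for the firmware control register (not real hardware access). */
static uint32_t toy_reg;

uint32_t toy_read_reg(void)
{
	return toy_reg;
}

void toy_write_reg(uint32_t val)
{
	toy_reg = val;
}

/* Set the reconnect bit while preserving all other control bits. */
void toy_renegotiate(void)
{
	uint32_t opts = toy_read_reg();

	opts |= TOY_BIT(TOY_CTRL_FORCE_RECONNECT);
	toy_write_reg(opts);
}
```

Using an unsigned 32-bit constant for the shift matters here: shifting a signed 1 into bit 31 would be undefined behaviour in C.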

Signed-off-by: Igor Russkikh 
Signed-off-by: Anton Mikaev 
---
 .../net/ethernet/aquantia/atlantic/aq_ethtool.c| 14 +
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |  2 ++
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h| 35 ++
 .../aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c   | 12 
 4 files changed, 63 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
index ec9ed44..7058f130 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
@@ -285,6 +285,19 @@ static int aq_ethtool_set_coalesce(struct net_device *ndev,
return aq_nic_update_interrupt_moderation_settings(aq_nic);
 }
 
+static int aq_ethtool_nway_reset(struct net_device *ndev)
+{
+   struct aq_nic_s *aq_nic = netdev_priv(ndev);
+
+   if (unlikely(!aq_nic->aq_fw_ops->renegotiate))
+   return -EOPNOTSUPP;
+
+   if (netif_running(ndev))
+   return aq_nic->aq_fw_ops->renegotiate(aq_nic->aq_hw);
+
+   return 0;
+}
+
 static void aq_ethtool_get_pauseparam(struct net_device *ndev,
  struct ethtool_pauseparam *pause)
 {
@@ -394,6 +407,7 @@ const struct ethtool_ops aq_ethtool_ops = {
.get_drvinfo = aq_ethtool_get_drvinfo,
.get_strings = aq_ethtool_get_strings,
.get_rxfh_indir_size = aq_ethtool_get_rss_indir_size,
+   .nway_reset  = aq_ethtool_nway_reset,
.get_ringparam   = aq_get_ringparam,
.set_ringparam   = aq_set_ringparam,
.get_pauseparam  = aq_ethtool_get_pauseparam,
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_hw.h b/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
index 3aa36d5..1a51152 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
@@ -212,6 +212,8 @@ struct aq_fw_ops {
 
int (*reset)(struct aq_hw_s *self);
 
+   int (*renegotiate)(struct aq_hw_s *self);
+
int (*get_mac_permanent)(struct aq_hw_s *self, u8 *mac);
 
int (*set_link_speed)(struct aq_hw_s *self, u32 speed);
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
index cd8f18f..b875590 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
@@ -239,6 +239,41 @@ enum hw_atl_fw2x_caps_hi {
CAPS_HI_TRANSACTION_ID,
 };
 
+enum hw_atl_fw2x_ctrl {
+   CTRL_RESERVED1 = 0x00,
+   CTRL_RESERVED2,
+   CTRL_RESERVED3,
+   CTRL_PAUSE,
+   CTRL_ASYMMETRIC_PAUSE,
+   CTRL_RESERVED4,
+   CTRL_RESERVED5,
+   CTRL_RESERVED6,
+   CTRL_1GBASET_FD_EEE,
+   CTRL_2P5GBASET_FD_EEE,
+   CTRL_5GBASET_FD_EEE,
+   CTRL_10GBASET_FD_EEE,
+   CTRL_THERMAL_SHUTDOWN,
+   CTRL_PHY_LOGS,
+   CTRL_EEE_AUTO_DISABLE,
+   CTRL_PFC,
+   CTRL_WAKE_ON_LINK,
+   CTRL_CABLE_DIAG,
+   CTRL_TEMPERATURE,
+   CTRL_DOWNSHIFT,
+   CTRL_PTP_AVB,
+   CTRL_RESERVED7,
+   CTRL_LINK_DROP,
+   CTRL_SLEEP_PROXY,
+   CTRL_WOL,
+   CTRL_MAC_STOP,
+   CTRL_EXT_LOOPBACK,
+   CTRL_INT_LOOPBACK,
+   CTRL_RESERVED8,
+   CTRL_WOL_TIMER,
+   CTRL_STATISTICS,
+   CTRL_FORCE_RECONNECT,
+};
+
 struct aq_hw_s;
 struct aq_fw_ops;
 struct aq_hw_caps_s;
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
index d2d030a..1935fd6 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
@@ -215,6 +215,17 @@ static int aq_fw2x_update_stats(struct aq_hw_s *self)
return hw_atl_utils_update_stats(self);
 }
 
+static int aq_fw2x_renegotiate(struct aq_hw_s *self)
+{
+   u32 mpi_opts = aq_hw_read_reg(self, HW_ATL_FW2X_MPI_CONTROL2_ADDR);
+
+   mpi_opts |= BIT(CTRL_FORCE_RECONNECT);
+
+   aq_hw_write_reg(self, HW_ATL_FW2X_MPI_CONTROL2_ADDR, mpi_opts);
+
+   return 0;
+}
+
 static int aq_fw2x_set_flow_control(struct aq_hw_s *self)
 {
u32 mpi_state = aq_hw_read_reg(self, HW_ATL_FW2X_MPI_CONTROL2_ADDR);
@@ -230,6 +241,7 @@ const struct aq_fw_ops aq_fw_2x_ops = {
.init = aq_fw2x_init,
.deinit = aq_fw2x_deinit,
.reset = NULL,
+   .renegotiate = aq_fw2x_renegotiate,
.get_mac_permanent = aq_fw2x_get_mac_permanent,
.set_link_speed = aq_fw2x_set_link_speed,
.set_state = aq_fw2x_set_state,
-- 
2.7.4

[PATCH net-next v2 3/5] net: aquantia: Implement rx/tx flow control ethtools callback

2018-06-04 Thread Igor Russkikh

Support runtime changes of the pause frame configuration (rx/tx flow control)
via ethtool.

Signed-off-by: Igor Russkikh 
---
 .../net/ethernet/aquantia/atlantic/aq_ethtool.c| 42 ++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c|  6 +++-
 .../aquantia/atlantic/hw_atl/hw_atl_utils.c|  1 +
 .../aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c   | 26 ++
 4 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
index 97f42d1..ec9ed44 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c
@@ -285,6 +285,46 @@ static int aq_ethtool_set_coalesce(struct net_device *ndev,
return aq_nic_update_interrupt_moderation_settings(aq_nic);
 }
 
+static void aq_ethtool_get_pauseparam(struct net_device *ndev,
+ struct ethtool_pauseparam *pause)
+{
+   struct aq_nic_s *aq_nic = netdev_priv(ndev);
+
+   pause->autoneg = 0;
+
+   if (aq_nic->aq_hw->aq_nic_cfg->flow_control & AQ_NIC_FC_RX)
+   pause->rx_pause = 1;
+   if (aq_nic->aq_hw->aq_nic_cfg->flow_control & AQ_NIC_FC_TX)
+   pause->tx_pause = 1;
+}
+
+static int aq_ethtool_set_pauseparam(struct net_device *ndev,
+struct ethtool_pauseparam *pause)
+{
+   struct aq_nic_s *aq_nic = netdev_priv(ndev);
+   int err = 0;
+
+   if (!aq_nic->aq_fw_ops->set_flow_control)
+   return -EOPNOTSUPP;
+
+   if (pause->autoneg == AUTONEG_ENABLE)
+   return -EOPNOTSUPP;
+
+   if (pause->rx_pause)
+   aq_nic->aq_hw->aq_nic_cfg->flow_control |= AQ_NIC_FC_RX;
+   else
+   aq_nic->aq_hw->aq_nic_cfg->flow_control &= ~AQ_NIC_FC_RX;
+
+   if (pause->tx_pause)
+   aq_nic->aq_hw->aq_nic_cfg->flow_control |= AQ_NIC_FC_TX;
+   else
+   aq_nic->aq_hw->aq_nic_cfg->flow_control &= ~AQ_NIC_FC_TX;
+
+   err = aq_nic->aq_fw_ops->set_flow_control(aq_nic->aq_hw);
+
+   return err;
+}
+
 static void aq_get_ringparam(struct net_device *ndev,
 struct ethtool_ringparam *ring)
 {
@@ -356,6 +396,8 @@ const struct ethtool_ops aq_ethtool_ops = {
.get_rxfh_indir_size = aq_ethtool_get_rss_indir_size,
.get_ringparam   = aq_get_ringparam,
.set_ringparam   = aq_set_ringparam,
+   .get_pauseparam  = aq_ethtool_get_pauseparam,
+   .set_pauseparam  = aq_ethtool_set_pauseparam,
.get_rxfh_key_size   = aq_ethtool_get_rss_key_size,
.get_rxfh= aq_ethtool_get_rss,
.get_rxnfc   = aq_ethtool_get_rxnfc,
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 6721ffa..f97e0ba 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -766,10 +766,14 @@ void aq_nic_get_link_ksettings(struct aq_nic_s *self,
ethtool_link_ksettings_add_link_mode(cmd, advertising,
 100baseT_Full);
 
-   if (self->aq_nic_cfg.flow_control)
+   if (self->aq_nic_cfg.flow_control & AQ_NIC_FC_RX)
ethtool_link_ksettings_add_link_mode(cmd, advertising,
 Pause);
 
+   if (self->aq_nic_cfg.flow_control & AQ_NIC_FC_TX)
+   ethtool_link_ksettings_add_link_mode(cmd, advertising,
+Asym_Pause);
+
if (self->aq_nic_cfg.aq_hw_caps->media_type == AQ_HW_MEDIA_TYPE_FIBRE)
ethtool_link_ksettings_add_link_mode(cmd, advertising, FIBRE);
else
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index 9d0a96d..e1feba5 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -834,4 +834,5 @@ const struct aq_fw_ops aq_fw_1x_ops = {
.set_state = hw_atl_utils_mpi_set_state,
.update_link_status = hw_atl_utils_mpi_get_link_status,
.update_stats = hw_atl_utils_update_stats,
+   .set_flow_control = NULL,
 };
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
index a4ac592..d2d030a 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
@@ -87,6 +87,19 @@ static int aq_fw2x_set_link_speed(struct aq_hw_s *self, u32 speed)
return 0;
 }
 
+static void aq_fw2x_set_mpi_flow_control(struct aq_hw_s *self, u32 *mpi_state)
+{
+   if (self->aq_nic_cfg->flow_control & AQ_NIC_FC_RX)
+

[PATCH net-next v2 5/5] net: aquantia: bump driver version

2018-06-04 Thread Igor Russkikh

Signed-off-by: Igor Russkikh 
---
 drivers/net/ethernet/aquantia/atlantic/ver.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/ver.h b/drivers/net/ethernet/aquantia/atlantic/ver.h
index a445de6..94efc64 100644
--- a/drivers/net/ethernet/aquantia/atlantic/ver.h
+++ b/drivers/net/ethernet/aquantia/atlantic/ver.h
@@ -12,8 +12,8 @@
 
 #define NIC_MAJOR_DRIVER_VERSION   2
 #define NIC_MINOR_DRIVER_VERSION   0
-#define NIC_BUILD_DRIVER_VERSION   2
-#define NIC_REVISION_DRIVER_VERSION 1
+#define NIC_BUILD_DRIVER_VERSION   3
+#define NIC_REVISION_DRIVER_VERSION 0
 
 #define AQ_CFG_DRV_VERSION_SUFFIX "-kern"
 
-- 
2.7.4
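
As an illustration of what the bump amounts to, the sketch below formats the
four fields and the suffix into one string. The field values are the ones the
patch sets; the "major.minor.build.revision" layout and the
example_format_version() helper are assumptions for this example, since the
macro that builds the real version string is not part of the hunk shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Values as they stand after this patch. */
#define NIC_MAJOR_DRIVER_VERSION    2
#define NIC_MINOR_DRIVER_VERSION    0
#define NIC_BUILD_DRIVER_VERSION    3
#define NIC_REVISION_DRIVER_VERSION 0

#define AQ_CFG_DRV_VERSION_SUFFIX "-kern"

/* Hypothetical helper, not part of the driver: writes
 * "major.minor.build.revision" followed by the suffix into buf.
 * Returns the number of characters snprintf() would have written.
 */
static int example_format_version(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%d.%d.%d.%d%s",
			NIC_MAJOR_DRIVER_VERSION,
			NIC_MINOR_DRIVER_VERSION,
			NIC_BUILD_DRIVER_VERSION,
			NIC_REVISION_DRIVER_VERSION,
			AQ_CFG_DRV_VERSION_SUFFIX);
}
```

Under these assumptions the string moves from 2.0.2.1-kern to 2.0.3.0-kern.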

[PATCH net-next v2 2/5] net: aquantia: Improve adapter init/deinit logic

2018-06-04 Thread Igor Russkikh

We now pass the link drop status to the FW on init/deinit. This is
required to inform the FW that the driver has taken or released control
of the link. The FW then manages its own state and the device power
profile based on this information. To improve management, we remove the
mpi_set function, which ambiguously took both state and speed parameters.

The deinit callback is now part of the FW ops, as it actually manages the FW.

Signed-off-by: Igor Russkikh 
---
 drivers/net/ethernet/aquantia/atlantic/aq_hw.h |  9 ++--
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c|  2 +-
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c  |  1 -
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c  |  1 -
 .../aquantia/atlantic/hw_atl/hw_atl_utils.c| 53 --
 .../aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c   | 31 -
 6 files changed, 66 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_hw.h b/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
index 904cdfd..3aa36d5 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_hw.h
@@ -202,25 +202,28 @@ struct aq_hw_ops {
 
int (*hw_get_fw_version)(struct aq_hw_s *self, u32 *fw_version);
 
-   int (*hw_deinit)(struct aq_hw_s *self);
-
int (*hw_set_power)(struct aq_hw_s *self, unsigned int power_state);
 };
 
 struct aq_fw_ops {
int (*init)(struct aq_hw_s *self);
 
+   int (*deinit)(struct aq_hw_s *self);
+
int (*reset)(struct aq_hw_s *self);
 
int (*get_mac_permanent)(struct aq_hw_s *self, u8 *mac);
 
int (*set_link_speed)(struct aq_hw_s *self, u32 speed);
 
-   int (*set_state)(struct aq_hw_s *self, enum hal_atl_utils_fw_state_e state);
+   int (*set_state)(struct aq_hw_s *self,
+enum hal_atl_utils_fw_state_e state);
 
int (*update_link_status)(struct aq_hw_s *self);
 
int (*update_stats)(struct aq_hw_s *self);
+
+   int (*set_flow_control)(struct aq_hw_s *self);
 };
 
 #endif /* AQ_HW_H */
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index a5ccfde..6721ffa 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -884,7 +884,7 @@ void aq_nic_deinit(struct aq_nic_s *self)
aq_vec_deinit(aq_vec);
 
if (self->power_state == AQ_HW_POWER_STATE_D0) {
-   (void)self->aq_hw_ops->hw_deinit(self->aq_hw);
+   (void)self->aq_fw_ops->deinit(self->aq_hw);
} else {
(void)self->aq_hw_ops->hw_set_power(self->aq_hw,
   self->power_state);
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
index 7fd6a7e..ed7fe6f 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_a0.c
@@ -877,7 +877,6 @@ static int hw_atl_a0_hw_ring_rx_stop(struct aq_hw_s *self,
 const struct aq_hw_ops hw_atl_ops_a0 = {
.hw_set_mac_address   = hw_atl_a0_hw_mac_addr_set,
.hw_init  = hw_atl_a0_hw_init,
-   .hw_deinit= hw_atl_utils_hw_deinit,
.hw_set_power = hw_atl_utils_hw_set_power,
.hw_reset = hw_atl_a0_hw_reset,
.hw_start = hw_atl_a0_hw_start,
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index 4ea15b9..9dd4f49 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -935,7 +935,6 @@ static int hw_atl_b0_hw_ring_rx_stop(struct aq_hw_s *self,
 const struct aq_hw_ops hw_atl_ops_b0 = {
.hw_set_mac_address   = hw_atl_b0_hw_mac_addr_set,
.hw_init  = hw_atl_b0_hw_init,
-   .hw_deinit= hw_atl_utils_hw_deinit,
.hw_set_power = hw_atl_utils_hw_set_power,
.hw_reset = hw_atl_b0_hw_reset,
.hw_start = hw_atl_b0_hw_start,
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index e652d86..9d0a96d 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -30,10 +30,11 @@
 #define HW_ATL_MPI_CONTROL_ADR  0x0368U
 #define HW_ATL_MPI_STATE_ADR    0x036CU
 
-#define HW_ATL_MPI_STATE_MSK    0x00FFU
-#define HW_ATL_MPI_STATE_SHIFT  0U
-#define HW_ATL_MPI_SPEED_MSK    0xU
-#define HW_ATL_MPI_SPEED_SHIFT  16U
+#define HW_ATL_MPI_STATE_MSK      0x00FFU
+#define HW_ATL_MPI_STATE_SHIFT    0U
+#define HW_ATL_MPI_SPEED_MSK      0x00FFU
+#define HW_ATL_MPI_SPEED_SHIFT    16U
+#define HW_ATL_MPI_DIRTY_WAKE_MSK 0x0200U

[PATCH net-next] tcp: refactor tcp_ecn_check_ce to remove sk type cast

2018-06-04 Thread Yousuk Seung

Refactor tcp_ecn_check_ce and __tcp_ecn_check_ce to accept struct sock*
instead of tcp_sock* to clean up type casts. This is a pure refactor
patch.

Signed-off-by: Yousuk Seung 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Acked-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_input.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d5ffb573ca4d..355d3dffd021 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -254,8 +254,10 @@ static void tcp_ecn_withdraw_cwr(struct tcp_sock *tp)
tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR;
 }
 
-static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb)
+static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb)
 {
+   struct tcp_sock *tp = tcp_sk(sk);
+
switch (TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK) {
case INET_ECN_NOT_ECT:
/* Funny extension: if ECT is not set on a segment,
@@ -263,31 +265,31 @@ static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb)
 * it is probably a retransmit.
 */
if (tp->ecn_flags & TCP_ECN_SEEN)
-   tcp_enter_quickack_mode((struct sock *)tp, 1);
+   tcp_enter_quickack_mode(sk, 1);
break;
case INET_ECN_CE:
-   if (tcp_ca_needs_ecn((struct sock *)tp))
-   tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_IS_CE);
+   if (tcp_ca_needs_ecn(sk))
+   tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
 
if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
/* Better not delay acks, sender can have a very low cwnd */
-   tcp_enter_quickack_mode((struct sock *)tp, 1);
+   tcp_enter_quickack_mode(sk, 1);
tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
}
tp->ecn_flags |= TCP_ECN_SEEN;
break;
default:
-   if (tcp_ca_needs_ecn((struct sock *)tp))
-   tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_NO_CE);
+   if (tcp_ca_needs_ecn(sk))
+   tcp_ca_event(sk, CA_EVENT_ECN_NO_CE);
tp->ecn_flags |= TCP_ECN_SEEN;
break;
}
 }
 
-static void tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb)
+static void tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb)
 {
-   if (tp->ecn_flags & TCP_ECN_OK)
-   __tcp_ecn_check_ce(tp, skb);
+   if (tcp_sk(sk)->ecn_flags & TCP_ECN_OK)
+   __tcp_ecn_check_ce(sk, skb);
 }
 
 static void tcp_ecn_rcv_synack(struct tcp_sock *tp, const struct tcphdr *th)
@@ -710,7 +712,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
}
icsk->icsk_ack.lrcvtime = now;
 
-   tcp_ecn_check_ce(tp, skb);
+   tcp_ecn_check_ce(sk, skb);
 
if (skb->len >= 128)
tcp_grow_window(sk, skb);
@@ -4434,7 +4436,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
u32 seq, end_seq;
bool fragstolen;
 
-   tcp_ecn_check_ce(tp, skb);
+   tcp_ecn_check_ce(sk, skb);
 
if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFODROP);
-- 
2.17.1.1185.g55be947832-goog
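
The shape of the refactor can be shown with a reduced userspace mock: the
helper takes the generic socket pointer and derives the TCP view through an
accessor, so callees that want a struct sock * get it without a cast back.
Everything below is a simplified stand-in written for illustration. The
structures, the flag values and the example_* functions are assumptions, not
the kernel definitions, and the congestion-control callbacks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for the kernel types; the real ones are far larger. */
struct sock {
	int sk_state;
};

struct tcp_sock {
	struct sock sk;		/* kept first so the accessor's cast is valid */
	uint8_t ecn_flags;
};

/* Illustrative flag values for this mock. */
#define TCP_ECN_OK		1
#define TCP_ECN_DEMAND_CWR	4
#define TCP_ECN_SEEN		8

/* Counts calls so the behaviour can be observed in a test. */
static int example_quickack_calls;

static struct tcp_sock *tcp_sk(struct sock *sk)
{
	assert(sk != NULL);
	return (struct tcp_sock *)sk;
}

/* Stand-in for a callee that takes the generic socket pointer. */
static void example_enter_quickack(struct sock *sk)
{
	(void)sk;
	example_quickack_calls++;
}

/* After the refactor: accept struct sock *, derive tp once, and pass sk
 * through unchanged to callees that expect it.
 */
static void example_ecn_check(struct sock *sk, int is_ce)
{
	struct tcp_sock *tp = tcp_sk(sk);

	if (!(tp->ecn_flags & TCP_ECN_OK))
		return;

	if (is_ce && !(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
		example_enter_quickack(sk);
		tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
	}
	tp->ecn_flags |= TCP_ECN_SEEN;
}
```

The cast now lives in one accessor instead of being repeated at every call
site, which is the clean-up the commit message describes.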

Re: [PATCH bpf-next v2 1/2] trace_helpers.c: Add helpers to poll multiple perf FDs for events

2018-06-04 Thread Jakub Kicinski

On Mon, 04 Jun 2018 18:33:56 +0200, Toke Høiland-Jørgensen wrote:
> This adds two new helper functions to trace_helpers that support polling
> multiple perf file descriptors for events. These are used by the XDP
> perf_event_output example, which needs to work with one perf fd per CPU.
> 
> Signed-off-by: Toke Høiland-Jørgensen 

Did you take a look at tools/bpf/bpftool/map_perf_ring.c ?

I think the ability to poll multiple FDs could be generally useful and
therefore better add it to libbpf.c than
tools/testing/selftests/bpf/trace_helpers.c?  I'm not 100% sure myself...
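
For reference, the general idea of waiting on several descriptors at once can
be sketched with plain poll(2). This is only an illustration of the polling
step under simple assumptions: example_poll_fds() and EXAMPLE_MAX_FDS are
made-up names, it is not the helper from the patch or from bpftool, and
reading events out of the per-CPU perf ring buffers is not shown.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define EXAMPLE_MAX_FDS 64

/* Hypothetical helper: waits up to timeout_ms for any of the n descriptors
 * to become readable and stores the indexes of the readable ones in ready[].
 * Returns the number of readable descriptors, 0 on timeout, -1 on error.
 */
static int example_poll_fds(const int *fds, int n, int timeout_ms, int *ready)
{
	struct pollfd pfds[EXAMPLE_MAX_FDS];
	int i, ret, count = 0;

	assert(n > 0 && n <= EXAMPLE_MAX_FDS);

	for (i = 0; i < n; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}

	ret = poll(pfds, (nfds_t)n, timeout_ms);
	if (ret <= 0)
		return ret;

	/* Collect every descriptor that reported readable data. */
	for (i = 0; i < n; i++) {
		if (pfds[i].revents & POLLIN)
			ready[count++] = i;
	}

	return count;
}
```

A caller with one perf fd per CPU would pass the whole array and then drain
only the buffers whose indexes come back in ready[].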

[PATCH net-next v11 08/10] netdev: cavium: octeon: Add Octeon III BGX Ethernet core

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

This is the main core of the BGX Ethernet driver.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 drivers/net/ethernet/cavium/octeon/octeon3-core.c | 2380 +
 1 file changed, 2380 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-core.c

diff --git a/drivers/net/ethernet/cavium/octeon/octeon3-core.c b/drivers/net/ethernet/cavium/octeon/octeon3-core.c
new file mode 100644
index 000..1e2f68d
--- /dev/null
+++ b/drivers/net/ethernet/cavium/octeon/octeon3-core.c
@@ -0,0 +1,2380 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Octeon III BGX Nexus Ethernet driver core
+ *
+ * Copyright (C) 2018 Cavium, Inc.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "octeon3.h"
+
+/*  First buffer:
+ *
+ *+---SKB-+
+ *|   |
+ *|   |
+ * +--+--*data|
+ * |  |   |
+ * |  |   |
+ * |  +---+
+ * |   /|\
+ * ||
+ * ||
+ *\|/   |
+ * WQE - 128 -+-> +-+---+ -+-
+ *|   |*skb +   |  |
+ *|   | |  |
+ *|   | |  |
+ *  WQE_SKIP = 128| |  |
+ *|   | |  |
+ *|   | |  |
+ *|   | |  |
+ *|   | |  First Skip
+ * WQE   -+-> +-+  |
+ *|   word 0|  |
+ *|   word 1|  |
+ *|   word 2|  |
+ *|   word 3|  |
+ *|   word 4|  |
+ *+-+ -+-
+ *   ++- packet link|
+ *   ||  packet data|
+ *   || |
+ *   || |
+ *   || .   |
+ *   || .   |
+ *   || .   |
+ *   |+-+
+ *   |
+ *   |
+ * Later buffers:|
+ *   |
+ *   |
+ *   |
+ *   |
+ *   |
+ *   |+---SKB-+
+ *   ||   |
+ *   ||   |
+ *   | +--+--*data|
+ *   | |  |   |
+ *   | |  |   |
+ *   | |  +---+
+ *   | |   /|\
+ *   | ||
+ *   | ||
+ *   |\|/   |
+ * WQE - 128 +--> +-+---+ -+-
+ *   ||*skb +   |  |
+ *   || |  |
+ *   || |  |
+ *   || |  |
+ *   || |  LATER_SKIP = 128
+ *   || |  |
+ *   || |  |
+ *   || |  |
+ *   |+-+ -+-
+ *   ||  packet link|
+ *   +--> |  packet data|
+ *| |
+ *| |
+ *| .   |
+ *| .   |
+ *| .   |
+ *+-+
+ */
+
+#define MAX_TX_QUEUE_DEPTH 512
+#define SSO_INTSN_EXE 0x61
+#define MAX_RX_CONTEXTS 32
+
+#define SKB_PTR_OFFSET 0
+#define SKB_AURA_OFFSET 1
+#define SKB_AURA_MAGIC 0xbadc0ffee4dad000ULL
+
+#define MAX_CORES  48
+#define FPA3_NUM_AURAS 1024
+
+#define USE_ASYNC_IOBDMA   1
+#define SCR_SCRATCH 0ull
+#define SSO_NO_WAIT 0ull
+#define DID_TAG_SWTAG  0x60ull
+#define IOBDMA_SENDSINGLE  0xa200ull
+
+/* Values for the value of wqe word2 [ERRLEV] */
+#define PKI_ERRLEV_LA  0x01
+
+/* Values for the value of wqe word2 [OPCODE] */
+#define PKI_OPCODE_NONE 0x00
+#define PKI_OPCODE_JABBER  0x02
+#define PKI_OPCODE_FCS 0x07
+
+/* Values for the layer type in the wqe */

[PATCH net-next v11 09/10] netdev: cavium: octeon: Add Octeon III BGX Ethernet building

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

Add the build and configuration files for the BGX Ethernet.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 drivers/net/ethernet/cavium/Kconfig | 22 +-
 drivers/net/ethernet/cavium/octeon/Makefile |  8 +++-
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
index 043e3c1..3b9709d 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -4,7 +4,7 @@
 
 config NET_VENDOR_CAVIUM
bool "Cavium ethernet drivers"
-   depends on PCI
+   depends on PCI || CAVIUM_OCTEON_SOC
default y
---help---
  Select this option if you want enable Cavium network support.
@@ -100,4 +100,24 @@ config LIQUIDIO_VF
  will be called liquidio_vf. MSI-X interrupt support is required
  for this driver to work correctly
 
+config OCTEON3_BGX_PORT
+   tristate "Cavium Octeon III BGX port support"
+   depends on CAVIUM_OCTEON_SOC
+   ---help---
+ This driver adds support for Cavium Octeon III BGX ports. BGX ports
+ support sgmii, rgmii, xaui, rxaui, xlaui, xfi, 10KR and 40KR modes.
+
+ Say Y to use the management port on Octeon III boards or to use
+ any other ethernet port.
+
+config OCTEON3_ETHERNET
+   tristate "Cavium OCTEON III PKI/PKO Ethernet support"
+   depends on CAVIUM_OCTEON_SOC
+   select OCTEON3_BGX_PORT
+   select OCTEON_FPA3
+   select FW_LOADER
+   ---help---
+ Support for 'BGX' Ethernet via PKI/PKO units. No support for
+ cn70xx chips, use OCTEON_ETHERNET instead.
+
 endif # NET_VENDOR_CAVIUM
diff --git a/drivers/net/ethernet/cavium/octeon/Makefile b/drivers/net/ethernet/cavium/octeon/Makefile
index efa41c1..1939c84 100644
--- a/drivers/net/ethernet/cavium/octeon/Makefile
+++ b/drivers/net/ethernet/cavium/octeon/Makefile
@@ -1,5 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
 #
 # Makefile for the Cavium network device drivers.
 #
 
-obj-$(CONFIG_OCTEON_MGMT_ETHERNET) += octeon_mgmt.o
+obj-$(CONFIG_OCTEON_MGMT_ETHERNET) += octeon_mgmt.o
+obj-$(CONFIG_OCTEON3_BGX_PORT) += octeon3-bgx-nexus.o octeon3-bgx-port.o
+obj-$(CONFIG_OCTEON3_ETHERNET) += octeon3-ethernet.o
+
+octeon3-ethernet-objs += octeon3-core.o octeon3-pki.o octeon3-pko.o\
+octeon3-sso.o
-- 
2.1.4

[PATCH net-next v11 06/10] netdev: cavium: octeon: Add Octeon III PKO Support

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

Add support for Octeon III PKO logic block for BGX Ethernet.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 drivers/net/ethernet/cavium/octeon/octeon3-pko.c | 1619 ++
 1 file changed, 1619 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-pko.c

diff --git a/drivers/net/ethernet/cavium/octeon/octeon3-pko.c b/drivers/net/ethernet/cavium/octeon/octeon3-pko.c
new file mode 100644
index 000..eb4c016
--- /dev/null
+++ b/drivers/net/ethernet/cavium/octeon/octeon3-pko.c
@@ -0,0 +1,1619 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Octeon III Packet-Output Processing Unit (PKO)
+ *
+ * Copyright (C) 2018 Cavium, Inc.
+ */
+
+#include "octeon3.h"
+
+#define MAX_OUTPUT_MAC 28
+#define MAX_FIFO_GRP   8
+
+#define FIFO_SIZE  2560
+
+/* Registers are accessed via xkphys */
+#define PKO_BASE   0x15400ull
+#define PKO_ADDR(node) (SET_XKPHYS + NODE_OFFSET(node) +  \
+PKO_BASE)
+
+#define PKO_L1_SQ_SHAPE(n, q)  (PKO_ADDR(n) + ((q) << 9)+ 0x10)
+#define PKO_L1_SQ_LINK(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x38)
+#define PKO_DQ_WM_CTL(n, q) (PKO_ADDR(n) + ((q) << 9)+ 0x40)
+#define PKO_L1_SQ_TOPOLOGY(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x08)
+#define PKO_L2_SQ_SCHEDULE(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x080008)
+#define PKO_L3_L2_SQ_CHANNEL(n, q) (PKO_ADDR(n) + ((q) << 9)+ 0x080038)
+#define PKO_CHANNEL_LEVEL(n)   (PKO_ADDR(n) + 0x0800f0)
+#define PKO_SHAPER_CFG(n)  (PKO_ADDR(n) + 0x0800f8)
+#define PKO_L2_SQ_TOPOLOGY(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x10)
+#define PKO_L3_SQ_SCHEDULE(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x18)
+#define PKO_L3_SQ_TOPOLOGY(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x18)
+#define PKO_L4_SQ_SCHEDULE(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x180008)
+#define PKO_L4_SQ_TOPOLOGY(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x20)
+#define PKO_L5_SQ_SCHEDULE(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x28)
+#define PKO_L5_SQ_TOPOLOGY(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x28)
+#define PKO_DQ_SCHEDULE(n, q)  (PKO_ADDR(n) + ((q) << 9)+ 0x280008)
+#define PKO_DQ_SW_XOFF(n, q)   (PKO_ADDR(n) + ((q) << 9)+ 0x2800e0)
+#define PKO_DQ_TOPOLOGY(n, q)  (PKO_ADDR(n) + ((q) << 9)+ 0x30)
+#define PKO_PDM_CFG(n) (PKO_ADDR(n) + 0x80)
+#define PKO_PDM_DQ_MINPAD(n, q) (PKO_ADDR(n) + ((q) << 3)+ 0x8f)
+#define PKO_MAC_CFG(n, m)  (PKO_ADDR(n) + ((m) << 3)+ 0x90)
+#define PKO_PTF_STATUS(n, f)   (PKO_ADDR(n) + ((f) << 3)+ 0x900100)
+#define PKO_PTGF_CFG(n, g) (PKO_ADDR(n) + ((g) << 3)+ 0x900200)
+#define PKO_PTF_IOBP_CFG(n) (PKO_ADDR(n) + 0x900300)
+#define PKO_MCI0_MAX_CRED(n, m) (PKO_ADDR(n) + ((m) << 3)+ 0xa0)
+#define PKO_MCI1_MAX_CRED(n, m) (PKO_ADDR(n) + ((m) << 3)+ 0xa8)
+#define PKO_LUT(n, c)  (PKO_ADDR(n) + ((c) << 3)+ 0xb0)
+#define PKO_DPFI_STATUS(n) (PKO_ADDR(n) + 0xc0)
+#define PKO_DPFI_FLUSH(n)  (PKO_ADDR(n) + 0xc8)
+#define PKO_DPFI_FPA_AURA(n)   (PKO_ADDR(n) + 0xc00010)
+#define PKO_DPFI_ENA(n) (PKO_ADDR(n) + 0xc00018)
+#define PKO_STATUS(n)  (PKO_ADDR(n) + 0xd0)
+#define PKO_ENABLE(n)  (PKO_ADDR(n) + 0xd8)
+
+/* These levels mimic the PKO internal linked queue structure */
+enum queue_level {
+   PQ = 1,
+   L2_SQ = 2,
+   L3_SQ = 3,
+   L4_SQ = 4,
+   L5_SQ = 5,
+   DQ = 6
+};
+
+enum pko_dqop_e {
+   DQOP_SEND,
+   DQOP_OPEN,
+   DQOP_CLOSE,
+   DQOP_QUERY
+};
+
+enum pko_dqstatus_e {
+   PASS = 0,
+   BADSTATE = 0x8,
+   NOFPABUF = 0x9,
+   NOPKOBUF = 0xa,
+   FAILRTNPTR = 0xb,
+   ALREADY = 0xc,
+   NOTCREATED = 0xd,
+   NOTEMPTY = 0xe,
+   SENDPKTDROP = 0xf
+};
+
+struct mac_info {
+   int fifo_cnt;
+   int prio;
+   int speed;
+   int fifo;
+   int num_lmacs;
+};
+
+struct fifo_grp_info {
+   int speed;
+   int size;
+};
+
+static const int lut_index_78xx[] = {
+   0x200,
+   0x240,
+   0x280,
+   0x2c0,
+   0x300,
+   0x340
+};
+
+static const int lut_index_73xx[] = {
+   0x000,
+   0x040,
+   0x080
+};
+
+static enum queue_level max_sq_level(void)
+{
+   /* 73xx and 75xx only have 3 scheduler queue levels */
+   if (OCTEON_IS_MODEL(OCTEON_CN73XX) || OCTEON_IS_MODEL(OCTEON_CNF75XX))
+   return L3_SQ;
+

[PATCH net-next v11 10/10] MAINTAINERS: Add entry for drivers/net/ethernet/cavium/octeon/octeon3-*

2018-06-04 Thread Steven J. Hill

From: David Daney 

Signed-off-by: David Daney 
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 70d61c2..9ab8b69 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3249,6 +3249,12 @@ W:   http://www.cavium.com
 S: Supported
 F: drivers/mmc/host/cavium*
 
+CAVIUM OCTEON-III NETWORK DRIVER
+M: Steven J. Hill 
+L: netdev@vger.kernel.org
+S: Supported
+F: drivers/net/ethernet/cavium/octeon/octeon3-*
+
 CAVIUM OCTEON-TX CRYPTO DRIVER
 M: George Cherian 
 L: linux-cry...@vger.kernel.org
-- 
2.1.4

[PATCH net-next v11 03/10] netdev: cavium: octeon: Add Octeon III BGX Ethernet Nexus

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

Add the BGX nexus architecture for Octeon III BGX Ethernet.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 .../net/ethernet/cavium/octeon/octeon3-bgx-nexus.c | 673 +
 1 file changed, 673 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-bgx-nexus.c

diff --git a/drivers/net/ethernet/cavium/octeon/octeon3-bgx-nexus.c b/drivers/net/ethernet/cavium/octeon/octeon3-bgx-nexus.c
new file mode 100644
index 000..f9c45d7
--- /dev/null
+++ b/drivers/net/ethernet/cavium/octeon/octeon3-bgx-nexus.c
@@ -0,0 +1,673 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Octeon III BGX Nexus Ethernet driver
+ *
+ * Copyright (C) 2018 Cavium, Inc.
+ */
+
+#include 
+#include 
+#include 
+
+#include "octeon3.h"
+
+static atomic_t request_mgmt_once;
+static atomic_t load_driver_once;
+static atomic_t pki_id;
+
+static char *mix_port;
+module_param(mix_port, charp, 0444);
+MODULE_PARM_DESC(mix_port, "Specifies which ports connect to MIX interfaces.");
+
+static char *pki_port;
+module_param(pki_port, charp, 0444);
+MODULE_PARM_DESC(pki_port, "Specifies which ports connect to the PKI.");
+
+#define MAX_MIX_PER_NODE   2
+#define MAX_MIX (MAX_NODES * MAX_MIX_PER_NODE)
+
+/* struct mix_port_lmac - Describes a lmac that connects to a mix port. The lmac
+ *   must be on the same node as the mix.
+ * @node: Node of the lmac.
+ * @bgx: Bgx of the lmac.
+ * @lmac: Lmac index.
+ */
+struct mix_port_lmac {
+   int node;
+   int bgx;
+   int lmac;
+};
+
+/* mix_port_lmacs contains all the lmacs connected to mix ports */
+static struct mix_port_lmac mix_port_lmacs[MAX_MIX];
+
+/* pki_ports keeps track of the lmacs connected to the pki */
+static bool pki_ports[MAX_NODES][MAX_BGX_PER_NODE][MAX_LMAC_PER_BGX];
+
+/* Created platform devices get added to this list */
+static struct list_head pdev_list;
+static struct mutex pdev_list_lock;
+
+/* Created platform devices use this structure to add themselves to the list */
+struct pdev_list_item {
+   struct list_head list;
+   struct platform_device *pdev;
+};
+
+/* is_lmac_to_mix - Search the list of lmacs connected to mix'es for a match.
+ * @node: Numa node of lmac to search for.
+ * @bgx: Bgx of lmac to search for.
+ * @lmac: Lmac index to search for.
+ *
+ * Returns true if the lmac is connected to a mix.
+ * Returns false if the lmac is not connected to a mix.
+ */
+static bool is_lmac_to_mix(int node, int bgx, int lmac)
+{
+   int i;
+
+   for (i = 0; i < MAX_MIX; i++) {
+   if (mix_port_lmacs[i].node == node &&
+   mix_port_lmacs[i].bgx == bgx &&
+   mix_port_lmacs[i].lmac == lmac)
+   return true;
+   }
+
+   return false;
+}
+
+/* is_lmac_to_pki - Search the list of lmacs connected to the pki for a match.
+ * @node: Numa node of lmac to search for.
+ * @bgx: Bgx of lmac to search for.
+ * @lmac: Lmac index to search for.
+ *
+ * Returns true if the lmac is connected to the pki.
+ * Returns false if the lmac is not connected to the pki.
+ */
+static bool is_lmac_to_pki(int node, int bgx, int lmac)
+{
+   return pki_ports[node][bgx][lmac];
+}
+
+/* is_lmac_to_xcv - Check if this lmac is connected to the xcv block (rgmii).
+ * @of_node: Device node to check.
+ *
+ * Returns true if the lmac is connected to the xcv port.
+ * Returns false if the lmac is not connected to the xcv port.
+ */
+static bool is_lmac_to_xcv(struct device_node *of_node)
+{
+   return of_device_is_compatible(of_node, "cavium,octeon-7360-xcv");
+}
+
+static int bgx_probe(struct platform_device *pdev)
+{
+   struct platform_device *new_dev, *pki_dev;
+   struct mac_platform_data platform_data;
+   int i, interface, numa_node, r = 0;
+   struct device_node *child;
+   const __be32 *reg;
+   u64 addr, data;
+   char id[64];
+   u32 port;
+
+   reg = of_get_property(pdev->dev.of_node, "reg", NULL);
+   addr = of_translate_address(pdev->dev.of_node, reg);
+   interface = (addr >> 24) & 0xf;
+   numa_node = (addr >> 36) & 0x7;
+
+   /* Assign 8 CAM entries per LMAC */
+   for (i = 0; i < 32; i++) {
+   data = i >> 3;
+   oct_csr_write(data,
+ BGX_CMR_RX_ADRX_CAM(numa_node, interface, i));
+   }
+
+   for_each_available_child_of_node(pdev->dev.of_node, child) {
+   struct pdev_list_item *pdev_item;
+   bool is_mix = false;
+   bool is_pki = false;
+   bool is_xcv = false;
+
+   if (!of_device_is_compatible(child, "cavium,octeon-7890-bgx-port") &&
+   !of_device_is_compatible(child, "cavium,octeon-7360-xcv"))
+   continue;
+   r = of_property_read_u32(child, "reg", &port);
+   if (r)
+   return -ENODEV;
+
+   is_mix =

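The address decoding and CAM assignment at the top of bgx_probe() above come
down to a few shifts and masks. The standalone sketch below repeats that
arithmetic with the same shift and mask constants as the patch; the example_*
helper names are made up for illustration and the register write itself is
omitted.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_CAM_ENTRIES 32	/* loop bound used in bgx_probe() */

/* BGX interface number: bits 27..24 of the translated address. */
static unsigned int example_bgx_interface(uint64_t addr)
{
	return (unsigned int)((addr >> 24) & 0xf);
}

/* NUMA node: bits 38..36 of the translated address. */
static unsigned int example_bgx_node(uint64_t addr)
{
	return (unsigned int)((addr >> 36) & 0x7);
}

/* Value written for CAM entry 'cam': with 8 entries per LMAC, entries
 * 0..7 map to 0, 8..15 to 1, and so on.
 */
static unsigned int example_cam_owner(unsigned int cam)
{
	assert(cam < EXAMPLE_CAM_ENTRIES);
	return cam >> 3;
}
```

With 32 entries and a shift of 3, the values written cover 0 through 3, one
per LMAC.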
[PATCH net-next v11 00/10] netdev: octeon-ethernet: Add Cavium Octeon III support.

2018-06-04 Thread Steven J. Hill

Add the Cavium OCTEON III network driver. There are some corresponding
MIPS architecture support changes which will be upstreamed separately.

Changes in v11:

o Massive clean-up of files, split big patch into smaller pieces,
  and some minor rework.

Carlos Munoz (9):
  dt-bindings: Add Cavium Octeon Common Ethernet Interface.
  netdev: cavium: octeon: Header for Octeon III BGX Ethernet
  netdev: cavium: octeon: Add Octeon III BGX Ethernet Nexus
  netdev: cavium: octeon: Add Octeon III BGX Ports
  netdev: cavium: octeon: Add Octeon III PKI Support
  netdev: cavium: octeon: Add Octeon III PKO Support
  netdev: cavium: octeon: Add Octeon III SSO Support
  netdev: cavium: octeon: Add Octeon III BGX Ethernet core
  netdev: cavium: octeon: Add Octeon III BGX Ethernet building

David Daney (1):
  MAINTAINERS: Add entry for
drivers/net/ethernet/cavium/octeon/octeon3-*

 .../devicetree/bindings/net/cavium-bgx.txt |   59 +
 MAINTAINERS|6 +
 drivers/net/ethernet/cavium/Kconfig|   22 +-
 drivers/net/ethernet/cavium/octeon/Makefile|8 +-
 .../net/ethernet/cavium/octeon/octeon3-bgx-nexus.c |  673 ++
 .../net/ethernet/cavium/octeon/octeon3-bgx-port.c  | 2196 ++
 drivers/net/ethernet/cavium/octeon/octeon3-core.c  | 2380 
 drivers/net/ethernet/cavium/octeon/octeon3-pki.c   |  781 +++
 drivers/net/ethernet/cavium/octeon/octeon3-pko.c   | 1619 +
 drivers/net/ethernet/cavium/octeon/octeon3-sso.c   |  244 ++
 drivers/net/ethernet/cavium/octeon/octeon3.h   |  409 
 11 files changed, 8395 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/cavium-bgx.txt
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-bgx-nexus.c
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-bgx-port.c
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-core.c
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-pki.c
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-pko.c
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-sso.c
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3.h

-- 
2.1.4

[PATCH net-next v11 02/10] netdev: cavium: octeon: Header for Octeon III BGX Ethernet

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

Add the common header file used by the Octeon III BGX Ethernet
driver.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 drivers/net/ethernet/cavium/octeon/octeon3.h | 409 +++
 1 file changed, 409 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3.h

diff --git a/drivers/net/ethernet/cavium/octeon/octeon3.h b/drivers/net/ethernet/cavium/octeon/octeon3.h
new file mode 100644
index 000..2a64e1e
--- /dev/null
+++ b/drivers/net/ethernet/cavium/octeon/octeon3.h
@@ -0,0 +1,409 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Octeon III BGX Ethernet Driver 
+ *
+ * Copyright (C) 2018 Cavium, Inc.
+ */
+#ifndef _OCTEON3_H_
+#define _OCTEON3_H_
+
+#include 
+#include 
+#include 
+
+#include 
+
+#define MAX_NODES  2
+#define NODE_MASK  (MAX_NODES - 1)
+#define MAX_BGX_PER_NODE   6
+#define MAX_LMAC_PER_BGX   4
+
+#define IOBDMA_ORDERED_IO_ADDR 0xa200ull
+#define LMTDMA_ORDERED_IO_ADDR 0xa400ull
+
+#define SCRATCH_BASE   0x8000ull
+#define PKO_LMTLINE 2ull
+#define LMTDMA_SCR_OFFSET  (PKO_LMTLINE * CVMX_CACHE_LINE_SIZE)
+
+/* Pko sub-command three bit codes (SUBDC3) */
+#define PKO_SENDSUBDC_GATHER   0x1
+
+/* Pko sub-command four bit codes (SUBDC4) */
+#define PKO_SENDSUBDC_TSO  0x8
+#define PKO_SENDSUBDC_FREE 0x9
+#define PKO_SENDSUBDC_WORK 0xa
+#define PKO_SENDSUBDC_MEM  0xc
+#define PKO_SENDSUBDC_EXT  0xd
+
+#define BGX_RX_FIFO_SIZE   (64 * 1024)
+#define BGX_TX_FIFO_SIZE   (32 * 1024)
+
+/* Registers are accessed via xkphys */
+#define SET_XKPHYS BIT_ULL(63)
+#define NODE_OFFSET(node)  ((node) * 0x10ull)
+
+/* Bgx register definitions */
+#define BGX_BASE   0x11800e000ull
+#define BGX_OFFSET(bgx) (BGX_BASE + ((bgx) << 24))
+#define INDEX_OFFSET(index) ((index) << 20)
+#define INDEX_ADDR(n, b, i) (SET_XKPHYS + NODE_OFFSET(n) + \
+BGX_OFFSET(b) + INDEX_OFFSET(i))
+#define CAM_OFFSET(mac) ((mac) << 3)
+#define CAM_ADDR(n, b, m)  (INDEX_ADDR(n, b, 0) + CAM_OFFSET(m))
+
+#define BGX_CMR_CONFIG(n, b, i) (INDEX_ADDR(n, b, i)  + 0x0)
+#define BGX_CMR_GLOBAL_CONFIG(n, b) (INDEX_ADDR(n, b, 0)  + 0x8)
+#define BGX_CMR_RX_ID_MAP(n, b, i) (INDEX_ADDR(n, b, i)  + 0x00028)
+#define BGX_CMR_RX_BP_ON(n, b, i)  (INDEX_ADDR(n, b, i)  + 0x00088)
+#define BGX_CMR_RX_ADR_CTL(n, b, i) (INDEX_ADDR(n, b, i)  + 0x000a0)
+#define BGX_CMR_RX_FIFO_LEN(n, b, i)   (INDEX_ADDR(n, b, i)  + 0x000c0)
+#define BGX_CMR_RX_ADRX_CAM(n, b, m)   (CAM_ADDR(n, b, m)+ 0x00100)
+#define BGX_CMR_CHAN_MSK_AND(n, b) (INDEX_ADDR(n, b, 0)  + 0x00200)
+#define BGX_CMR_CHAN_MSK_OR(n, b)  (INDEX_ADDR(n, b, 0)  + 0x00208)
+#define BGX_CMR_TX_FIFO_LEN(n, b, i)   (INDEX_ADDR(n, b, i)  + 0x00418)
+#define BGX_CMR_TX_LMACS(n, b) (INDEX_ADDR(n, b, 0)  + 0x01000)
+
+#define BGX_SPU_CONTROL1(n, b, i)  (INDEX_ADDR(n, b, i)  + 0x1)
+#define BGX_SPU_STATUS1(n, b, i)   (INDEX_ADDR(n, b, i)  + 0x10008)
+#define BGX_SPU_STATUS2(n, b, i)   (INDEX_ADDR(n, b, i)  + 0x10020)
+#define BGX_SPU_BX_STATUS(n, b, i) (INDEX_ADDR(n, b, i)  + 0x10028)
+#define BGX_SPU_BR_STATUS1(n, b, i) (INDEX_ADDR(n, b, i)  + 0x10030)
+#define BGX_SPU_BR_STATUS2(n, b, i) (INDEX_ADDR(n, b, i)  + 0x10038)
+#define BGX_SPU_BR_BIP_ERR_CNT(n, b, i) (INDEX_ADDR(n, b, i)  + 0x10058)
+#define BGX_SPU_BR_PMD_CONTROL(n, b, i) (INDEX_ADDR(n, b, i)  + 0x10068)
+#define BGX_SPU_BR_PMD_LP_CUP(n, b, i) (INDEX_ADDR(n, b, i)  + 0x10078)
+#define BGX_SPU_BR_PMD_LD_CUP(n, b, i) (INDEX_ADDR(n, b, i)  + 0x10088)
+#define BGX_SPU_BR_PMD_LD_REP(n, b, i) (INDEX_ADDR(n, b, i)  + 0x10090)
+#define BGX_SPU_FEC_CONTROL(n, b, i)   (INDEX_ADDR(n, b, i)  + 0x100a0)
+#define BGX_SPU_AN_CONTROL(n, b, i) (INDEX_ADDR(n, b, i)  + 0x100c8)
+#define BGX_SPU_AN_STATUS(n, b, i) (INDEX_ADDR(n, b, i)  + 0x100d0)
+#define BGX_SPU_AN_ADV(n, b, i) (INDEX_ADDR(n, b, i)  + 0x100d8)
+#define BGX_SPU_MISC_CONTROL(n, b, i)  (INDEX_ADDR(n, b, i)  + 0x10218)
+#define BGX_SPU_INT(n, b, i)   (INDEX_ADDR(n, b, i)  + 0x10220)
+#define BGX_SPU_DBG_CONTROL(n, b)  (INDEX_ADDR(n, b, 0)  + 0x10300)
+
+#define BGX_SMU_RX_INT(n, b, i) (INDEX_ADDR(n, b, i)  + 0x2)
+#define BGX_SMU_RX_FRM_CTL(n, b, i) (INDEX_ADDR(n, b, i)  + 0x20008)
+#define

[PATCH net-next v11 05/10] netdev: cavium: octeon: Add Octeon III PKI Support

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

Add support for Octeon III PKI logic block for BGX Ethernet.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 drivers/net/ethernet/cavium/octeon/octeon3-pki.c | 781 +++
 1 file changed, 781 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-pki.c

diff --git a/drivers/net/ethernet/cavium/octeon/octeon3-pki.c b/drivers/net/ethernet/cavium/octeon/octeon3-pki.c
new file mode 100644
index 000..63e136b
--- /dev/null
+++ b/drivers/net/ethernet/cavium/octeon/octeon3-pki.c
@@ -0,0 +1,781 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Octeon III Packet Input Unit (PKI)
+ *
+ * Copyright (C) 2018 Cavium, Inc.
+ */
+
+#include 
+
+#include "octeon3.h"
+
+#define PKI_CLUSTER_FIRMWARE   "cavium/pki-cluster.bin"
+#define VERSION_LEN8
+
+#define MAX_CLUSTERS   4
+#define MAX_BANKS  2
+#define MAX_BANK_ENTRIES   192
+#define PKI_NUM_QPG_ENTRY  2048
+#define PKI_NUM_STYLE  256
+#define PKI_NUM_FINAL_STYLE64
+#define MAX_PKNDS  64
+
+/* Registers are accessed via xkphys */
+#define PKI_BASE   0x118004400ull
+#define PKI_ADDR(node) (SET_XKPHYS + NODE_OFFSET(node) +  \
+PKI_BASE)
+
+#define PKI_SFT_RST(n) (PKI_ADDR(n) + 0x10)
+#define PKI_BUF_CTL(n) (PKI_ADDR(n) + 0x000100)
+#define PKI_STAT_CTL(n) (PKI_ADDR(n) + 0x000110)
+#define PKI_ICG_CFG(n) (PKI_ADDR(n) + 0x00a000)
+
+#define CLUSTER_OFFSET(c)  ((c) << 16)
+#define CL_ADDR(n, c)  (PKI_ADDR(n) + CLUSTER_OFFSET(c))
+#define PKI_CL_ECC_CTL(n, c)   (CL_ADDR(n, c)   + 0x00c020)
+
+#define PKI_STYLE_BUF(n, s)(PKI_ADDR(n) + ((s) << 3)+ 0x024000)
+
+#define PKI_LTYPE_MAP(n, l)(PKI_ADDR(n) + ((l) << 3)+ 0x005000)
+#define PKI_IMEM(n, i) (PKI_ADDR(n) + ((i) << 3)+ 0x10)
+
+#define PKI_CL_PKIND_CFG(n, c, p)  (CL_ADDR(n, c) + ((p) << 8)  + 0x300040)
+#define PKI_CL_PKIND_STYLE(n, c, p)(CL_ADDR(n, c) + ((p) << 8)  + 0x300048)
+#define PKI_CL_PKIND_SKIP(n, c, p) (CL_ADDR(n, c) + ((p) << 8)  + 0x300050)
+#define PKI_CL_PKIND_L2_CUSTOM(n, c, p) (CL_ADDR(n, c) + ((p) << 8)  + 0x300058)
+#define PKI_CL_PKIND_LG_CUSTOM(n, c, p) (CL_ADDR(n, c) + ((p) << 8)  + 0x300060)
+
+#define STYLE_OFFSET(s)((s) << 3)
+#define STYLE_ADDR(n, c, s)(PKI_ADDR(n) + CLUSTER_OFFSET(c) + \
+STYLE_OFFSET(s))
+#define PKI_CL_STYLE_CFG(n, c, s)  (STYLE_ADDR(n, c, s) + 0x50)
+#define PKI_CL_STYLE_CFG2(n, c, s) (STYLE_ADDR(n, c, s) + 0x500800)
+#define PKI_CLX_STYLEX_ALG(n, c, s)(STYLE_ADDR(n, c, s) + 0x501000)
+
+#define PCAM_OFFSET(bank)  ((bank) << 12)
+#define PCAM_ENTRY_OFFSET(entry)   ((entry) << 3)
+#define PCAM_ADDR(n, c, b, e)  (PKI_ADDR(n) + CLUSTER_OFFSET(c) + \
+PCAM_OFFSET(b) + PCAM_ENTRY_OFFSET(e))
+#define PKI_CL_PCAM_TERM(n, c, b, e)   (PCAM_ADDR(n, c, b, e)   + 0x70)
+#define PKI_CL_PCAM_MATCH(n, c, b, e)  (PCAM_ADDR(n, c, b, e)   + 0x704000)
+#define PKI_CL_PCAM_ACTION(n, c, b, e) (PCAM_ADDR(n, c, b, e)   + 0x708000)
+
+#define PKI_QPG_TBLX(n, i) (PKI_ADDR(n) + ((i) << 3)+ 0x80)
+#define PKI_AURAX_CFG(n, a)(PKI_ADDR(n) + ((a) << 3)+ 0x90)
+#define PKI_STATX_STAT0(n, p)  (PKI_ADDR(n) + ((p) << 8)+ 0xe00038)
+#define PKI_STATX_STAT1(n, p)  (PKI_ADDR(n) + ((p) << 8)+ 0xe00040)
+#define PKI_STATX_STAT3(n, p)  (PKI_ADDR(n) + ((p) << 8)+ 0xe00050)
+
+enum pcam_term {
+   NONE,
+   L2_CUSTOM = 0x2,
+   HIGIGD = 0x4,
+   HIGIG = 0x5,
+   SMACH = 0x8,
+   SMACL = 0x9,
+   DMACH = 0xa,
+   DMACL = 0xb,
+   GLORT = 0x12,
+   DSA = 0x13,
+   ETHTYPE0 = 0x18,
+   ETHTYPE1 = 0x19,
+   ETHTYPE2 = 0x1a,
+   ETHTYPE3 = 0x1b,
+   MPLS0 = 0x1e,
+   L3_SIPHH = 0x1f,
+   L3_SIPMH = 0x20,
+   L3_SIPML = 0x21,
+   L3_SIPLL = 0x22,
+   L3_FLAGS = 0x23,
+   L3_DIPHH = 0x24,
+   L3_DIPMH = 0x25,
+   L3_DIPML = 0x26,
+   L3_DIPLL = 0x27,
+   LD_VNI = 0x28,
+   IL3_FLAGS = 0x2b,
+   LF_SPI = 0x2e,
+   L4_SPORT = 0x2f,
+   L4_PORT = 0x30,
+   LG_CUSTOM = 0x39
+};
+
+enum pki_ltype {
+   LTYPE_NONE,
+   LTYPE_ENET,
+   LTYPE_VLAN,
+   LTYPE_SNAP_PAYLD = 0x05,
+   LTYPE_ARP = 0x06,
+   LTYPE_RARP = 0x07,
+   LTYPE_IP4 = 0x08,
+   LTYPE_IP4_OPT = 0x09,
+   LTYPE_IP6 = 0x0a,
+   LTYPE_IP6_OPT = 0x0b,
+   LTYPE_IPSEC_ESP =

[PATCH net-next v11 07/10] netdev: cavium: octeon: Add Octeon III SSO Support

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

Add support for Octeon III SSO logic block for BGX Ethernet.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 drivers/net/ethernet/cavium/octeon/octeon3-sso.c | 244 +++
 1 file changed, 244 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-sso.c

diff --git a/drivers/net/ethernet/cavium/octeon/octeon3-sso.c b/drivers/net/ethernet/cavium/octeon/octeon3-sso.c
new file mode 100644
index 000..51d67a8
--- /dev/null
+++ b/drivers/net/ethernet/cavium/octeon/octeon3-sso.c
@@ -0,0 +1,244 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Octeon III Schedule/Synchronize/Order Unit (SSO)
+ *
+ * Copyright (C) 2018 Cavium, Inc.
+ */
+
+#include "octeon3.h"
+
+/* Registers are accessed via xkphys. */
+#define SSO_BASE   0x16700ull
+#define SSO_ADDR(node) (SET_XKPHYS + NODE_OFFSET(node) + SSO_BASE)
+
+#define SSO_AW_STATUS(n)   (SSO_ADDR(n) + 0x10e0)
+#define SSO_AW_CFG(n)  (SSO_ADDR(n) + 0x10f0)
+#define SSO_ERR0(n)(SSO_ADDR(n) + 0x1240)
+#define SSO_TAQ_ADD(n) (SSO_ADDR(n) + 0x20e0)
+#define SSO_XAQ_AURA(n)(SSO_ADDR(n) + 0x2100)
+
+#define AQ_OFFSET(g)   ((g) << 3)
+#define AQ_ADDR(n, g)  (SSO_ADDR(n) + AQ_OFFSET(g))
+#define SSO_XAQ_HEAD_PTR(n, g) (AQ_ADDR(n, g) + 0x0008)
+#define SSO_XAQ_TAIL_PTR(n, g) (AQ_ADDR(n, g) + 0x0009)
+#define SSO_XAQ_HEAD_NEXT(n, g)(AQ_ADDR(n, g) + 0x000a)
+#define SSO_XAQ_TAIL_NEXT(n, g)(AQ_ADDR(n, g) + 0x000b)
+
+#define GRP_OFFSET(grp)((grp) << 16)
+#define GRP_ADDR(n, g) (SSO_ADDR(n)  + GRP_OFFSET(g))
+#define SSO_GRP_TAQ_THR(n, g)  (GRP_ADDR(n, g) + 0x2100)
+#define SSO_GRP_PRI(n, g)  (GRP_ADDR(n, g) + 0x2200)
+#define SSO_GRP_INT(n, g)  (GRP_ADDR(n, g) + 0x2400)
+#define SSO_GRP_INT_THR(n, g)  (GRP_ADDR(n, g) + 0x2500)
+#define SSO_GRP_AQ_CNT(n, g)   (GRP_ADDR(n, g) + 0x2700)
+
+static int octeon3_sso_get_num_groups(void)
+{
+   if (OCTEON_IS_MODEL(OCTEON_CN78XX))
+   return 256;
+   if (OCTEON_IS_MODEL(OCTEON_CNF75XX) || OCTEON_IS_MODEL(OCTEON_CN73XX))
+   return 64;
+   return 0;
+}
+
+void octeon3_sso_irq_set(int node, int group, bool enable)
+{
+   if (enable)
+   oct_csr_write(1, SSO_GRP_INT_THR(node, group));
+   else
+   oct_csr_write(0, SSO_GRP_INT_THR(node, group));
+
+   oct_csr_write(BIT(1), SSO_GRP_INT(node, group));
+}
+EXPORT_SYMBOL(octeon3_sso_irq_set);
+
+/* octeon3_sso_alloc_groups - Allocate a range of SSO groups.
+ * @node: Node where SSO resides.
+ * @groups: Pointer to allocated groups.
+ * @cnt: Number of groups to allocate.
+ * @start: Group number to start sequential allocation from. -1 for don't care.
+ *
+ * Returns 0 if successful, error code otherwise.
+ */
+int octeon3_sso_alloc_groups(int node, int *groups, int cnt, int start)
+{
+   struct global_resource_tag tag;
+   int group, ret;
+   char buf[16];
+
+   strncpy((char *), "cvm_sso_", 8);
+   snprintf(buf, 16, "0%d..", node);
+   memcpy(, buf, 8);
+
+   res_mgr_create_resource(tag, octeon3_sso_get_num_groups());
+
+   if (!groups)
+   ret = res_mgr_alloc_range(tag, start, cnt, false, );
+   if (!ret)
+   ret = group;
+   else
+   ret = res_mgr_alloc_range(tag, start, cnt, false, groups);
+
+   return ret;
+}
+EXPORT_SYMBOL(octeon3_sso_alloc_groups);
+
+/* octeon3_sso_free_groups - Free SSO groups.
+ * @node: Node where SSO resides.
+ * @groups: Array of groups to free.
+ * @cnt: Number of groups to free.
+ */
+void octeon3_sso_free_groups(int node, int *groups, int cnt)
+{
+   struct global_resource_tag tag;
+   char buf[16];
+
+   /* Build the resource tag, then free the requested groups. */
+   strncpy((char *), "cvm_sso_", 8);
+   snprintf(buf, 16, "0%d..", node);
+   memcpy(, buf, 8);
+
+   res_mgr_free_range(tag, groups, cnt);
+}
+EXPORT_SYMBOL(octeon3_sso_free_groups);
+
+/* octeon3_sso_pass1_limit - When the Transitory Admission Queue (TAQ) is
+ *   almost full, it is possible for the SSO to hang. We work around this
+ *   by ensuring that the sum of SSO_GRP(0..255)_TAQ_THR[MAX_THR] of all
+ *   used groups is <= 1264. This may reduce single group performance when
+ *   many groups are in use.
+ * @node: Node to update.
+ * @group: SSO group to update.
+ */
+void octeon3_sso_pass1_limit(int node, int group)
+{
+   u64 max_thr, rsvd_thr, taq_add, taq_thr;
+
+   /* Ideally we would like to divide the maximum number of TAQ buffers
+* (1264) among the SSO groups in use. However, since we do not know
+* how many SSO groups are used by code outside this driver, we take
+* the worst case approach.
+*/
+   max_thr = 1264 / octeon3_sso_get_num_groups();
+   if (max_thr < 4)
+   max_thr =
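
The worst-case split described in the comment above is plain integer division of the 1264-buffer budget by the group count. A minimal sketch of that arithmetic follows, assuming a hypothetical helper name (taq_share) that is not part of the patch. The zero-group guard is an addition of this sketch; the excerpt divides by the return value of octeon3_sso_get_num_groups() directly, and that function returns 0 for unrecognized models. The minimum value applied when the share drops below 4 is cut off in the excerpt, so it is not modeled here.

```c
#include <stdint.h>

/* Total TAQ buffer budget quoted in the patch comment. */
#define TAQ_BUDGET 1264u

/* Hypothetical helper (not in the patch): worst-case per-group share of
 * the TAQ budget. Integer division truncates, so the shares of all groups
 * can never sum to more than the budget. Returns 0 when num_groups is 0
 * so that the division is never attempted with a zero divisor.
 */
static uint64_t taq_share(unsigned int num_groups)
{
	if (num_groups == 0)
		return 0;
	return TAQ_BUDGET / num_groups;
}
```

With the group counts from octeon3_sso_get_num_groups(), this gives 1264 / 256 = 4 for CN78XX and 1264 / 64 = 19 for CN73XX and CNF75XX.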

[PATCH net-next v11 01/10] dt-bindings: Add Cavium Octeon Common Ethernet Interface.

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

Add bindings for Common Ethernet Interface (BGX) block.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 .../devicetree/bindings/net/cavium-bgx.txt | 59 ++
 1 file changed, 59 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/cavium-bgx.txt

diff --git a/Documentation/devicetree/bindings/net/cavium-bgx.txt b/Documentation/devicetree/bindings/net/cavium-bgx.txt
new file mode 100644
index 000..21c9606
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/cavium-bgx.txt
@@ -0,0 +1,59 @@
+* Common Ethernet Interface (BGX) block
+
+Properties:
+
+- compatible: "cavium,octeon-7890-bgx": Compatibility with all cn7xxx SOCs.
+
+- reg: The base address of the BGX block.
+
+- #address-cells: Must be <1>.
+
+- #size-cells: Must be <0>.  BGX addresses have no size component.
+
+Typically a BGX block has several children, each representing an Ethernet
+interface.
+
+Example:
+
+   ethernet-mac-nexus@11800e000 {
+   compatible = "cavium,octeon-7890-bgx";
+   reg = <0x00011800 0xe000 0x 0x0100>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ethernet-mac@0 {
+   ...
+   reg = <0>;
+   };
+   };
+
+
+* Ethernet Interface (BGX port) connects to PKI/PKO
+
+Properties:
+
+- compatible: "cavium,octeon-7890-bgx-port": Compatibility with all cn7xxx
+  SOCs.
+
+- reg: The index of the interface within the BGX block.
+
+- local-mac-address: MAC address for the interface.
+
+- phy-handle: phandle to the phy node connected to the interface.
+
+
+* Ethernet Interface (BGX port) connects to XCV
+
+
+Properties:
+
+- compatible: "cavium,octeon-7360-xcv": Compatibility with cn73xx SOCs.
+
+- reg: The index of the interface within the BGX block.
+
+- local-mac-address: MAC address for the interface.
+
+- phy-handle: phandle to the phy node connected to the interface.
+
+- cavium,rx-clk-delay-bypass: Set to <1> to bypass the rx clock delay setting.
+  Needed by the Micrel PHY.
-- 
2.1.4

[PATCH net-next v11 04/10] netdev: cavium: octeon: Add Octeon III BGX Ports

2018-06-04 Thread Steven J. Hill

From: Carlos Munoz 

Add individual BGX nexus port support for Octeon III BGX Ethernet.

Signed-off-by: Carlos Munoz 
Signed-off-by: Steven J. Hill 
---
 .../net/ethernet/cavium/octeon/octeon3-bgx-port.c  | 2196 
 1 file changed, 2196 insertions(+)
 create mode 100644 drivers/net/ethernet/cavium/octeon/octeon3-bgx-port.c

diff --git a/drivers/net/ethernet/cavium/octeon/octeon3-bgx-port.c b/drivers/net/ethernet/cavium/octeon/octeon3-bgx-port.c
new file mode 100644
index 000..c96254f
--- /dev/null
+++ b/drivers/net/ethernet/cavium/octeon/octeon3-bgx-port.c
@@ -0,0 +1,2196 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Octeon III BGX Nexus Ethernet driver
+ *
+ * Copyright (C) 2018 Cavium, Inc.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "octeon3.h"
+
+struct bgx_port_priv {
+   int node;
+   int bgx;
+   int index; /* Port index on BGX block*/
+   enum port_mode mode;
+   int pknd;
+   int qlm;
+   const u8 *mac_addr;
+   struct phy_device *phydev;
+   struct device_node *phy_np;
+   bool mode_1000basex;
+   bool bgx_as_phy;
+   struct net_device *netdev;
+   struct mutex lock;  /* Serializes delayed work */
+   struct port_status (*get_link)(struct bgx_port_priv *priv);
+   int (*set_link)(struct bgx_port_priv *priv, struct port_status status);
+   struct port_status last_status;
+   struct delayed_work dwork;
+   bool work_queued;
+};
+
+/* lmac_pknd keeps track of the port kinds assigned to the lmacs */
+static int lmac_pknd[MAX_NODES][MAX_BGX_PER_NODE][MAX_LMAC_PER_BGX];
+
+static struct workqueue_struct *check_state_wq;
+static DEFINE_MUTEX(check_state_wq_mutex);
+
+int bgx_port_get_qlm(int node, int bgx, int index)
+{
+   int qlm = -1;
+   u64 data;
+
+   if (OCTEON_IS_MODEL(OCTEON_CN78XX)) {
+   if (bgx < 2) {
+   data = oct_csr_read(BGX_CMR_GLOBAL_CONFIG(node, bgx));
+   if (data & 1)
+   qlm = bgx + 2;
+   else
+   qlm = bgx;
+   } else {
+   qlm = bgx + 2;
+   }
+   } else if (OCTEON_IS_MODEL(OCTEON_CN73XX)) {
+   if (bgx < 2) {
+   qlm = bgx + 2;
+   } else {
+   /* Ports on bgx2 can be connected to qlm5 or qlm6 */
+   if (index < 2)
+   qlm = 5;
+   else
+   qlm = 6;
+   }
+   } else if (OCTEON_IS_MODEL(OCTEON_CNF75XX)) {
+   /* Ports on bgx0 can be connected to qlm4 or qlm5 */
+   if (index < 2)
+   qlm = 4;
+   else
+   qlm = 5;
+   }
+
+   return qlm;
+}
+EXPORT_SYMBOL(bgx_port_get_qlm);
+
+/* Returns the mode of the bgx port */
+enum port_mode bgx_port_get_mode(int node, int bgx, int index)
+{
+   enum port_mode mode;
+   u64 data;
+
+   data = oct_csr_read(BGX_CMR_CONFIG(node, bgx, index));
+
+   switch ((data >> 8) & 7) {
+   case 0:
+   mode = PORT_MODE_SGMII;
+   break;
+   case 1:
+   mode = PORT_MODE_XAUI;
+   break;
+   case 2:
+   mode = PORT_MODE_RXAUI;
+   break;
+   case 3:
+   data = oct_csr_read(BGX_SPU_BR_PMD_CONTROL(node, bgx, index));
+   /* The use of training differentiates 10G_KR from xfi */
+   if (data & BIT(1))
+   mode = PORT_MODE_10G_KR;
+   else
+   mode = PORT_MODE_XFI;
+   break;
+   case 4:
+   data = oct_csr_read(BGX_SPU_BR_PMD_CONTROL(node, bgx, index));
+   /* The use of training differentiates 40G_KR4 from xlaui */
+   if (data & BIT(1))
+   mode = PORT_MODE_40G_KR4;
+   else
+   mode = PORT_MODE_XLAUI;
+   break;
+   case 5:
+   mode = PORT_MODE_RGMII;
+   break;
+   default:
+   mode = PORT_MODE_DISABLED;
+   break;
+   }
+
+   return mode;
+}
+EXPORT_SYMBOL(bgx_port_get_mode);
+
+int bgx_port_allocate_pknd(int node)
+{
+   struct global_resource_tag tag;
+   char buf[16];
+   int pknd;
+
+   strncpy((char *), "cvm_pknd", 8);
+   snprintf(buf, 16, "_%d..", node);
+   memcpy(, buf, 8);
+
+   res_mgr_create_resource(tag, 64);
+   pknd = res_mgr_alloc(tag, -1, false);
+   if (pknd < 0) {
+   pr_err("bgx-port: Failed to allocate pknd\n");
+   return -ENODEV;
+   }
+
+   return pknd;
+}
+EXPORT_SYMBOL(bgx_port_allocate_pknd);
+
+int bgx_port_get_pknd(int node, int bgx, int index)
+{
+   return lmac_pknd[node][bgx][index];
+}
+EXPORT_SYMBOL(bgx_port_get_pknd);
+
+/*

[PATCH net-next 1/3] devlink: Add extack to reload and port_{un,}split operations

2018-06-04 Thread dsahern

From: David Ahern 

Add extack argument to reload, port_split and port_unsplit operations.

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c   |  9 ++---
 drivers/net/ethernet/netronome/nfp/nfp_devlink.c |  5 +++--
 drivers/net/netdevsim/devlink.c  |  3 ++-
 include/net/devlink.h|  7 ---
 net/core/devlink.c   | 18 ++
 5 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 8a766fe28fa0..7ed38d80bc08 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -770,7 +770,8 @@ static void mlxsw_core_driver_put(const char *kind)
 
 static int mlxsw_devlink_port_split(struct devlink *devlink,
unsigned int port_index,
-   unsigned int count)
+   unsigned int count,
+   struct netlink_ext_ack *extack)
 {
struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
 
@@ -782,7 +783,8 @@ static int mlxsw_devlink_port_split(struct devlink *devlink,
 }
 
 static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
- unsigned int port_index)
+ unsigned int port_index,
+ struct netlink_ext_ack *extack)
 {
struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
 
@@ -963,7 +965,8 @@ mlxsw_devlink_sb_occ_tc_port_bind_get(struct devlink_port *devlink_port,
 pool_type, p_cur, p_max);
 }
 
-static int mlxsw_devlink_core_bus_device_reload(struct devlink *devlink)
+static int mlxsw_devlink_core_bus_device_reload(struct devlink *devlink,
+   struct netlink_ext_ack *extack)
 {
struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
int err;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
index 71c2edd83031..db463e20a876 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_devlink.c
@@ -92,7 +92,7 @@ nfp_devlink_set_lanes(struct nfp_pf *pf, unsigned int idx, unsigned int lanes)
 
 static int
 nfp_devlink_port_split(struct devlink *devlink, unsigned int port_index,
-  unsigned int count)
+  unsigned int count, struct netlink_ext_ack *extack)
 {
struct nfp_pf *pf = devlink_priv(devlink);
struct nfp_eth_table_port eth_port;
@@ -123,7 +123,8 @@ nfp_devlink_port_split(struct devlink *devlink, unsigned int port_index,
 }
 
 static int
-nfp_devlink_port_unsplit(struct devlink *devlink, unsigned int port_index)
+nfp_devlink_port_unsplit(struct devlink *devlink, unsigned int port_index,
+struct netlink_ext_ack *extack)
 {
struct nfp_pf *pf = devlink_priv(devlink);
struct nfp_eth_table_port eth_port;
diff --git a/drivers/net/netdevsim/devlink.c b/drivers/net/netdevsim/devlink.c
index bef7db5d129a..e8366cf372ff 100644
--- a/drivers/net/netdevsim/devlink.c
+++ b/drivers/net/netdevsim/devlink.c
@@ -147,7 +147,8 @@ static int devlink_resources_register(struct devlink *devlink)
return err;
 }
 
-static int nsim_devlink_reload(struct devlink *devlink)
+static int nsim_devlink_reload(struct devlink *devlink,
+  struct netlink_ext_ack *extack)
 {
enum nsim_resource_id res_ids[] = {
NSIM_RESOURCE_IPV4_FIB, NSIM_RESOURCE_IPV4_FIB_RULES,
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 9686a1aa4ec9..e336ea9c73df 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -296,12 +296,13 @@ struct devlink_resource {
 #define DEVLINK_RESOURCE_ID_PARENT_TOP 0
 
 struct devlink_ops {
-   int (*reload)(struct devlink *devlink);
+   int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
int (*port_type_set)(struct devlink_port *devlink_port,
 enum devlink_port_type port_type);
int (*port_split)(struct devlink *devlink, unsigned int port_index,
- unsigned int count);
-   int (*port_unsplit)(struct devlink *devlink, unsigned int port_index);
+ unsigned int count, struct netlink_ext_ack *extack);
+   int (*port_unsplit)(struct devlink *devlink, unsigned int port_index,
+   struct netlink_ext_ack *extack);
int (*sb_pool_get)(struct devlink *devlink, unsigned int sb_index,
   u16 pool_index,
   struct devlink_sb_pool_info *pool_info);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index f75ee022e6b2..22099705cc41 100644
---

[PATCH net-next 3/3] mlxsw: Add extack messages for port_{un,}split failures

2018-06-04 Thread dsahern

From: David Ahern 

Return messages in extack for port split/unsplit errors. e.g.,
$ devlink port split swp1s1 count 4
Error: mlxsw_spectrum: Port cannot be split further.
devlink answers: Invalid argument

$ devlink port unsplit swp4
Error: mlxsw_spectrum: Port was not split.
devlink answers: Invalid argument

Signed-off-by: David Ahern 
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 14 ++
 drivers/net/ethernet/mellanox/mlxsw/core.h |  5 +++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 13 +++--
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 7ed38d80bc08..f9c724752a32 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -775,11 +775,14 @@ static int mlxsw_devlink_port_split(struct devlink *devlink,
 {
struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
 
-   if (port_index >= mlxsw_core->max_ports)
+   if (port_index >= mlxsw_core->max_ports) {
+   NL_SET_ERR_MSG_MOD(extack, "Port index exceeds maximum number of ports");
return -EINVAL;
+   }
if (!mlxsw_core->driver->port_split)
return -EOPNOTSUPP;
-   return mlxsw_core->driver->port_split(mlxsw_core, port_index, count);
+   return mlxsw_core->driver->port_split(mlxsw_core, port_index, count,
+ extack);
 }
 
 static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
@@ -788,11 +791,14 @@ static int mlxsw_devlink_port_unsplit(struct devlink *devlink,
 {
struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
 
-   if (port_index >= mlxsw_core->max_ports)
+   if (port_index >= mlxsw_core->max_ports) {
+   NL_SET_ERR_MSG_MOD(extack, "Port index exceeds maximum number of ports");
return -EINVAL;
+   }
if (!mlxsw_core->driver->port_unsplit)
return -EOPNOTSUPP;
-   return mlxsw_core->driver->port_unsplit(mlxsw_core, port_index);
+   return mlxsw_core->driver->port_unsplit(mlxsw_core, port_index,
+   extack);
 }
 
 static int
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 4a8d4c7f89d9..552cfa29c2f7 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -274,8 +274,9 @@ struct mlxsw_driver {
int (*port_type_set)(struct mlxsw_core *mlxsw_core, u8 local_port,
 enum devlink_port_type new_type);
int (*port_split)(struct mlxsw_core *mlxsw_core, u8 local_port,
- unsigned int count);
-   int (*port_unsplit)(struct mlxsw_core *mlxsw_core, u8 local_port);
+ unsigned int count, struct netlink_ext_ack *extack);
+   int (*port_unsplit)(struct mlxsw_core *mlxsw_core, u8 local_port,
+   struct netlink_ext_ack *extack);
int (*sb_pool_get)(struct mlxsw_core *mlxsw_core,
   unsigned int sb_index, u16 pool_index,
   struct devlink_sb_pool_info *pool_info);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index fc39f22e5c70..1b6d930e452d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3092,7 +3092,8 @@ static void mlxsw_sp_port_unsplit_create(struct mlxsw_sp *mlxsw_sp,
 }
 
 static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
-  unsigned int count)
+  unsigned int count,
+  struct netlink_ext_ack *extack)
 {
struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
struct mlxsw_sp_port *mlxsw_sp_port;
@@ -3104,6 +3105,7 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
if (!mlxsw_sp_port) {
dev_err(mlxsw_sp->bus_info->dev, "Port number \"%d\" does not exist\n",
local_port);
+   NL_SET_ERR_MSG_MOD(extack, "Port number does not exist");
return -EINVAL;
}
 
@@ -3112,11 +3114,13 @@ static int mlxsw_sp_port_split(struct mlxsw_core *mlxsw_core, u8 local_port,
 
if (count != 2 && count != 4) {
netdev_err(mlxsw_sp_port->dev, "Port can only be split into 2 or 4 ports\n");
+   NL_SET_ERR_MSG_MOD(extack, "Port can only be split into 2 or 4 ports");
return -EINVAL;
}
 
if (cur_width != MLXSW_PORT_MODULE_MAX_WIDTH) {
netdev_err(mlxsw_sp_port->dev, "Port cannot be split further\n");
+   NL_SET_ERR_MSG_MOD(extack, "Port cannot be split further");

[PATCH net-next 2/3] netdevsim: Add extack error message for devlink reload

2018-06-04 Thread dsahern

From: David Ahern 

devlink reload command can fail if a FIB resource limit is set to a value
lower than the current occupancy. Return a proper message indicating the
reason for the failure.

$ devlink resource sh netdevsim/netdevsim0
netdevsim/netdevsim0:
  name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
  name fib size unlimited occ 43 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
  name fib-rules size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
  name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
resources:
  name fib size unlimited occ 54 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
  name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none

$ devlink resource set netdevsim/netdevsim0 path /IPv4/fib size 40

$ devlink dev reload netdevsim/netdevsim0
Error: netdevsim: New size is less than current occupancy.
devlink answers: Invalid argument

Signed-off-by: David Ahern 
---
 drivers/net/netdevsim/devlink.c   | 4 ++--
 drivers/net/netdevsim/fib.c   | 9 ++---
 drivers/net/netdevsim/netdevsim.h | 3 ++-
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/netdevsim/devlink.c b/drivers/net/netdevsim/devlink.c
index e8366cf372ff..ba663e5af168 100644
--- a/drivers/net/netdevsim/devlink.c
+++ b/drivers/net/netdevsim/devlink.c
@@ -163,7 +163,7 @@ static int nsim_devlink_reload(struct devlink *devlink,
 
err = devlink_resource_size_get(devlink, res_ids[i], );
if (!err) {
-   err = nsim_fib_set_max(net, res_ids[i], val);
+   err = nsim_fib_set_max(net, res_ids[i], val, extack);
if (err)
return err;
}
@@ -181,7 +181,7 @@ static void nsim_devlink_net_reset(struct net *net)
int i;
 
for (i = 0; i < ARRAY_SIZE(res_ids); ++i) {
-   if (nsim_fib_set_max(net, res_ids[i], (u64)-1)) {
+   if (nsim_fib_set_max(net, res_ids[i], (u64)-1, NULL)) {
pr_err("Failed to reset limit for resource %u\n",
   res_ids[i]);
}
diff --git a/drivers/net/netdevsim/fib.c b/drivers/net/netdevsim/fib.c
index 9bfe9e151e13..f61d094746c0 100644
--- a/drivers/net/netdevsim/fib.c
+++ b/drivers/net/netdevsim/fib.c
@@ -64,7 +64,8 @@ u64 nsim_fib_get_val(struct net *net, enum nsim_resource_id res_id, bool max)
return max ? entry->max : entry->num;
 }
 
-int nsim_fib_set_max(struct net *net, enum nsim_resource_id res_id, u64 val)
+int nsim_fib_set_max(struct net *net, enum nsim_resource_id res_id, u64 val,
+struct netlink_ext_ack *extack)
 {
struct nsim_fib_data *fib_data = net_generic(net, nsim_fib_net_id);
struct nsim_fib_entry *entry;
@@ -90,10 +91,12 @@ int nsim_fib_set_max(struct net *net, enum nsim_resource_id res_id, u64 val)
/* not allowing a new max to be less than curren occupancy
 * --> no means of evicting entries
 */
-   if (val < entry->num)
+   if (val < entry->num) {
+   NL_SET_ERR_MSG_MOD(extack, "New size is less than current occupancy");
err = -EINVAL;
-   else
+   } else {
entry->max = val;
+   }
 
return err;
 }
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 3a8581af3b85..8ca50b72c328 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -126,7 +126,8 @@ void nsim_devlink_exit(void);
 int nsim_fib_init(void);
 void nsim_fib_exit(void);
 u64 nsim_fib_get_val(struct net *net, enum nsim_resource_id res_id, bool max);
-int nsim_fib_set_max(struct net *net, enum nsim_resource_id res_id, u64 val);
+int nsim_fib_set_max(struct net *net, enum nsim_resource_id res_id, u64 val,
+struct netlink_ext_ack *extack);
 #else
 static inline int nsim_devlink_setup(struct netdevsim *ns)
 {
-- 
2.11.0
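
The occupancy check in this patch can be modeled outside the kernel. The following is a userspace sketch only: struct ext_ack, struct fib_entry, set_err_msg() and fib_set_max() are hypothetical stand-ins for struct netlink_ext_ack, the netdevsim FIB entry, NL_SET_ERR_MSG_MOD() and nsim_fib_set_max(). It shows the two properties the patch relies on: the message is recorded only on failure, and a NULL extack (as passed by nsim_devlink_net_reset()) is tolerated.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct netlink_ext_ack: one message slot. */
struct ext_ack {
	const char *msg;
};

/* Hypothetical stand-in for a FIB resource entry. */
struct fib_entry {
	uint64_t max;	/* configured limit */
	uint64_t num;	/* current occupancy */
};

/* Record the message only if the caller supplied an extack. */
static void set_err_msg(struct ext_ack *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}

/* Reject a new limit below the current occupancy, since there is no
 * means of evicting entries; otherwise store the new limit.
 */
static int fib_set_max(struct fib_entry *entry, uint64_t val,
		       struct ext_ack *extack)
{
	if (val < entry->num) {
		set_err_msg(extack, "New size is less than current occupancy");
		return -EINVAL;
	}
	entry->max = val;
	return 0;
}
```

With an occupancy of 43 entries, as in the IPv4 fib output above, a requested size of 40 fails with -EINVAL and leaves the old limit untouched.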

[PATCH net-next 0/3] devlink: Add extack messages for reload and port split/unsplit

2018-06-04 Thread dsahern

From: David Ahern 

Patch 1 adds extack arg to reload, port_split and port_unsplit devlink
operations.

Patch 2 adds extack messages for reload operation in netdevsim.

Patch 3 adds extack messages to port split/unsplit in mlxsw driver.

David Ahern (3):
  devlink: Add extack to reload and port_{un,}split operations
  netdevsim: Add extack error message for devlink reload
  mlxsw: Add extack messages for port_{un,}split failures

 drivers/net/ethernet/mellanox/mlxsw/core.c   | 23 ---
 drivers/net/ethernet/mellanox/mlxsw/core.h   |  5 +++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c   | 13 +++--
 drivers/net/ethernet/netronome/nfp/nfp_devlink.c |  5 +++--
 drivers/net/netdevsim/devlink.c  |  7 ---
 drivers/net/netdevsim/fib.c  |  9 ++---
 drivers/net/netdevsim/netdevsim.h|  3 ++-
 include/net/devlink.h|  7 ---
 net/core/devlink.c   | 18 ++
 9 files changed, 59 insertions(+), 31 deletions(-)

-- 
2.11.0
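
As a rough illustration of how the three patches layer together, the sketch below models a core wrapper that validates the port index, checks that the driver implements the callback, and forwards extack down to it. All names (struct ext_ack, struct core, struct driver_ops, core_port_split, demo_port_split) are hypothetical userspace stand-ins, not the devlink or mlxsw definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct netlink_ext_ack. */
struct ext_ack {
	const char *msg;
};

struct core;

/* Hypothetical driver callback table; extack is the new last argument. */
struct driver_ops {
	int (*port_split)(struct core *core, unsigned int port_index,
			  unsigned int count, struct ext_ack *extack);
};

struct core {
	unsigned int max_ports;
	const struct driver_ops *driver;
};

/* Core-level wrapper: validate, then hand extack to the driver. */
static int core_port_split(struct core *core, unsigned int port_index,
			   unsigned int count, struct ext_ack *extack)
{
	if (port_index >= core->max_ports) {
		if (extack)
			extack->msg = "Port index exceeds maximum number of ports";
		return -EINVAL;
	}
	if (!core->driver->port_split)
		return -EOPNOTSUPP;
	return core->driver->port_split(core, port_index, count, extack);
}

/* Example driver callback: only a split into 2 or 4 ports is accepted. */
static int demo_port_split(struct core *core, unsigned int port_index,
			   unsigned int count, struct ext_ack *extack)
{
	(void)core;
	(void)port_index;
	if (count != 2 && count != 4) {
		if (extack)
			extack->msg = "Port can only be split into 2 or 4 ports";
		return -EINVAL;
	}
	return 0;
}
```

The point of threading extack through every layer is that whichever layer rejects the request can say why, while the numeric return code stays unchanged.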

Re: [PATCH net-next 0/2] net: phy: improve PM handling of PHY/MDIO

2018-06-04 Thread Florian Fainelli

On 06/04/2018 02:48 PM, Andrew Lunn wrote:
> On Sat, Jun 02, 2018 at 10:33:36PM +0200, Heiner Kallweit wrote:
>> Current implementation of MDIO bus PM ops doesn't actually implement
>> bus-specific PM ops but just calls PM ops defined on a device level
>> what doesn't seem to be fully in line with the core PM model.
>>
>> When looking e.g. at __device_suspend() the PM core looks for PM ops
>> of a device in a specific order:
>> 1. device PM domain
>> 2. device type
>> 3. device class
>> 4. device bus
>>
>> I think it has good reason that there's no PM ops on device level.
>> The situation can be improved by modeling PHY's as device type of
>> a MDIO device. If for some other type of MDIO device PM ops are
>> needed, it could be modeled as struct device_type as well.
> 
> Hi Heiner
> 
> I tested that the files in /sys/class/bus/mdio/devices/* are still
> there. And also not there for MDIO devices which are not PHYs,
> e.g. Ethernet switches.
> 
> I don't have any boards which do PM. So I cannot test suspend/resume.
> 
> I also took a look at drivers/net/dsa/qca8k.c. This is an MDIO switch
> which has PM operations. I don't think this change will break it.

I don't think so, but I will give it a spin on a board that has system
wide suspend/resume support. Might take a few hours.

> 
> I would prefer a bit more testing, but I guess that is what -rc
> kernels are for.
> 
> Tested-by: Andrew Lunn 
> 
> Andrew
> 


-- 
Florian

Re: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04

2018-06-04 Thread Alexander Duyck

On Mon, Jun 4, 2018 at 2:27 PM, David Miller  wrote:
> From: Or Gerlitz 
> Date: Tue, 5 Jun 2018 00:11:35 +0300
>
>> Just to make sure, is the AF_XDP ZC (Zero Copy) UAPI going to be
>> merged for this window -- AFAIU from [1], it's still under
>> examination/development/research for non Intel HWs, am I correct or
>> this is going to get in now?
>
> All of the pending AF_XDP changes will be merged this merge window.
>
> I think Intel folks need to review things as fast as possible because
> I pretty much refuse to revert the series or disable it in Kconfig at
> this point.
>
> Thank you.

My understanding of things is that the current AF_XDP patches were
going to be updated to have more of a model agnostic API such that
they would work for either the "typewriter" mode or the descriptor
ring based approach. The current plan was to have the zero copy
patches be a follow-on after the vendor agnostic API bits in the
descriptors and such had been sorted out. I believe you guys have the
descriptor fixes already right?

In my opinion the i40e code isn't mature enough yet to really go into
anything other than maybe net-next in a couple weeks. We are going to
need a while to get adequate testing in order to flush out all the
bugs and performance regressions we are likely to see coming out of
this change.

- Alex

Re: [PATCH net-next 0/2] net: phy: improve PM handling of PHY/MDIO

2018-06-04 Thread Andrew Lunn

On Sat, Jun 02, 2018 at 10:33:36PM +0200, Heiner Kallweit wrote:
> Current implementation of MDIO bus PM ops doesn't actually implement
> bus-specific PM ops but just calls PM ops defined on a device level
> what doesn't seem to be fully in line with the core PM model.
> 
> When looking e.g. at __device_suspend() the PM core looks for PM ops
> of a device in a specific order:
> 1. device PM domain
> 2. device type
> 3. device class
> 4. device bus
> 
> I think there is good reason that there are no PM ops at the device level.
> The situation can be improved by modeling PHYs as a device type of
> an MDIO device. If for some other type of MDIO device PM ops are
> needed, it could be modeled as struct device_type as well.
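
The lookup order described in the quoted text can be sketched in plain C. This is only a userspace illustration of "first match wins"; `struct pm_ops`, `struct dev` and `pick_pm_ops()` are hypothetical stand-ins, not the kernel's structures or the body of __device_suspend().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a set of PM callbacks. */
struct pm_ops {
	const char *name;
};

/* Hypothetical device with one optional set of PM ops per level. */
struct dev {
	const struct pm_ops *pm_domain_ops; /* 1. device PM domain */
	const struct pm_ops *type_ops;      /* 2. device type */
	const struct pm_ops *class_ops;     /* 3. device class */
	const struct pm_ops *bus_ops;       /* 4. device bus */
};

/* Return the first PM ops found, following the order in the list above. */
static const struct pm_ops *pick_pm_ops(const struct dev *d)
{
	assert(d != NULL);

	if (d->pm_domain_ops)
		return d->pm_domain_ops;
	if (d->type_ops)
		return d->type_ops;
	if (d->class_ops)
		return d->class_ops;
	return d->bus_ops; /* may be NULL if no level provides ops */
}
```

With this ordering, ops attached to a device type (as proposed for PHYs) take precedence over ops attached to the bus.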

Hi Heiner

I tested that the files in /sys/class/bus/mdio/devices/* are still
there. And also not there for MDIO devices which are not PHYs,
e.g. Ethernet switches.

I don't have any boards which do PM. So I cannot test suspend/resume.

I also took a look at drivers/net/dsa/qca8k.c. This is an MDIO switch
which has PM operations. I don't think this change will break it.

I would prefer a bit more testing, but I guess that is what -rc
kernels are for.

Tested-by: Andrew Lunn 

Andrew

Re: [PATCH net-next] net: sched: return error code when tcf proto is not found

2018-06-04 Thread David Miller

From: Vlad Buslov 
Date: Mon,  4 Jun 2018 18:32:23 +0300

> If requested tcf proto is not found, get and del filter netlink protocol
> handlers output error message to extack, but do not return actual error
> code. Add check to return ENOENT when result of tp find function is NULL
> pointer.
> 
> Fixes: c431f89b18a2 ("net: sched: split tc_ctl_tfilter into three
> handlers")

Please do not split up a Fixes: tag into multiple lines.  I fixed it
up for you this time.

> Reported-by: Dan Carpenter 
> Signed-off-by: Vlad Buslov 

Applied, thanks.

Re: [PATCH 1/2 v2 net-next] net_failover: Use netdev_features_t instead of u32

2018-06-04 Thread David Miller

From: Dan Carpenter 
Date: Mon, 4 Jun 2018 17:43:21 +0300

> The features mask needs to be a netdev_features_t (u64) because a u32
> is not big enough.
> 
> Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
> Signed-off-by: Dan Carpenter 
> ---
> v2: In the original patch, I thought that the & should be | and I
> introduced a bug.

Applied.
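
For readers following along, the truncation described in the commit message can be shown with a small userspace sketch. `features_t` stands in for netdev_features_t, and `feature_bit()`, `survives_u32()` and `survives_u64()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for netdev_features_t, which is a 64-bit type. */
typedef uint64_t features_t;

/* Build a mask with a single feature bit set. */
static features_t feature_bit(unsigned int bit)
{
	assert(bit < 64);
	return (features_t)1 << bit;
}

/* Store the features in a u32 first: every bit above 31 is silently lost. */
static int survives_u32(features_t features, unsigned int bit)
{
	uint32_t narrow = (uint32_t)features;

	return (narrow & feature_bit(bit)) != 0;
}

/* Store the features in the full 64-bit type: no bits are lost. */
static int survives_u64(features_t features, unsigned int bit)
{
	features_t wide = features;

	return (wide & feature_bit(bit)) != 0;
}
```

Any feature whose bit number is 32 or higher disappears in the u32 variant, which is the class of bug both patches in this series address.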

Re: [PATCH 2/2 net] team: use netdev_features_t instead of u32

2018-06-04 Thread David Miller

From: Dan Carpenter 
Date: Mon, 4 Jun 2018 17:46:01 +0300

> This code was introduced in 2011 around the same time that we made
> netdev_features_t a u64 type.  These days a u32 is not big enough to
> hold all the potential features.
> 
> Signed-off-by: Dan Carpenter 

Applied and queued up for -stable.

Re: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04

2018-06-04 Thread David Miller

From: Or Gerlitz 
Date: Tue, 5 Jun 2018 00:11:35 +0300

> Just to make sure, is the AF_XDP ZC (Zero Copy) UAPI going to be
> merged for this window -- AFAIU from [1], it's still under
> examination/development/research for non Intel HWs, am I correct or
> this is going to get in now?

All of the pending AF_XDP changes will be merged this merge window.

I think Intel folks need to review things as fast as possible because
I pretty much refuse to revert the series or disable it in Kconfig at
this point.

Thank you.

Re: [PATCH net] ipmr: fix error path when mr_table_alloc fails

2018-06-04 Thread David Miller

From: Sabrina Dubroca 
Date: Mon,  4 Jun 2018 13:55:54 +0200

> commit 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
> refactored ipmr_new_table, so that it now returns NULL when
> mr_table_alloc fails. Unfortunately, all callers of ipmr_new_table
> expect an ERR_PTR. commit 66fb33254f45 ("ipmr: properly check
> rhltable_init() return value") followed suit.
> 
> This can result in NULL deref, when ipmr_rules_exit calls
> ipmr_free_table with NULL net->ipv4.mrt in the
> !CONFIG_IP_MROUTE_MULTIPLE_TABLES version.
> 
> This patch makes mr_table_alloc return errors, and changes
> ip6mr_new_table and its callers to return/expect error pointers as
> well. It also removes the version of mr_table_alloc defined under
> !CONFIG_IP_MROUTE_COMMON, since it is never used.
> 
> Fixes: 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
> Fixes: 66fb33254f45 ("ipmr: properly check rhltable_init() return value")
> Signed-off-by: Sabrina Dubroca 

This adds a new warning with gcc-8.1.1 on Fedora 28

  CC [M]  net/ipv6/ip6mr.o
In file included from ./arch/x86/include/asm/current.h:5,
 from ./include/linux/sched.h:12,
 from ./include/linux/uaccess.h:5,
 from net/ipv6/ip6mr.c:19:
net/ipv6/ip6mr.c: In function ‘ip6_mroute_setsockopt’:
./include/linux/compiler.h:177:26: warning: ‘mrt’ may be used uninitialized in 
this function [-Wmaybe-uninitialized]
  case 8: *(__u64 *)res = *(volatile __u64 *)p; break;  \
  ^
net/ipv6/ip6mr.c:1752:20: note: ‘mrt’ was declared here
   struct mr_table *mrt;
^~~
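
As background to the NULL versus ERR_PTR mismatch described in the quoted commit message, here is a userspace sketch of the error-pointer convention. `err_ptr()`, `ptr_err()` and `is_err()` are simplified re-implementations written for this illustration, and `table_alloc_old()` / `table_alloc_new()` are hypothetical allocators, not the ipmr code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True only for pointers in the top MAX_ERRNO addresses; NULL is not one. */
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that reports failure with NULL. */
static void *table_alloc_old(int fail)
{
	static int table;

	return fail ? NULL : (void *)&table;
}

/* Hypothetical allocator that reports failure with an error pointer. */
static void *table_alloc_new(int fail)
{
	static int table;

	return fail ? err_ptr(-ENOMEM) : (void *)&table;
}
```

A caller that only checks `is_err()` treats the NULL from `table_alloc_old(1)` as success and later dereferences it, which is the failure mode the commit message describes.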

Re: pull request: bluetooth-next 2018-06-04

2018-06-04 Thread David Miller

From: Johan Hedberg 
Date: Mon, 4 Jun 2018 12:15:32 +0200

> Here's one last bluetooth-next pull request for the 4.18 kernel:
> 
>  - New USB device IDs for Realtek 8822BE and 8723DE
>  - reset/resume fix for Dell Inspiron 5565
>  - Fix HCI_UART_INIT_PENDING flag behavior
>  - Fix patching behavior for some ATH3012 models
>  - A few other minor cleanups & fixes
> 
> Please let me know if there are any issues pulling. Thanks.

Pulled, thanks Johan.

Re: [PATCH] docs: networking: fix minor typos in various documentation files

2018-06-04 Thread David Miller

From: Olivier Gayot 
Date: Mon,  4 Jun 2018 12:07:37 +0200

> This patch fixes some typos/misspelling errors in the
> Documentation/networking files.
> 
> Signed-off-by: Olivier Gayot 

Applied, thank you.

Re: [PATCH net] net: qualcomm: rmnet: Fix use after free while sending command ack

2018-06-04 Thread David Miller

From: Subash Abhinov Kasiviswanathan 
Date: Sun,  3 Jun 2018 16:17:55 -0600

> When sending an ack to a command packet, the skb is still referenced
> after it is sent to the real device. Since the real device could
> free the skb, the device pointer would be invalid.
> 
> Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial 
> implementation")
> Signed-off-by: Subash Abhinov Kasiviswanathan 

This doesn't apply cleanly to the current net-next tree, please respin.

Re: [PATCH iproute2 1/2] ip: display netns name instead of nsid

2018-06-04 Thread Stephen Hemminger

On Mon,  4 Jun 2018 14:12:52 +0200
Nicolas Dichtel  wrote:

> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index c7c7e7df4e81..aee09c7ff6df 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -819,6 +819,9 @@ int print_linkinfo(const struct sockaddr_nl *who,
>   unsigned int m_flag = 0;
>   SPRINT_BUF(b1);
>  
> + netns_nsid_socket_init();
> + netns_map_init();
> +

The idea of printing network namespace is good but I am concerned that
setting up yet another netlink socket and scanning the netns directory on
each ip link command will impact some users.

Can this setup be deferred until the first net ns lookup happens?
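
A minimal sketch of the kind of deferral meant here, assuming a single-threaded command. `expensive_netns_setup()`, `netns_ensure_init()` and `lookup_netns_name()` are hypothetical names used only for illustration; they are not existing iproute2 functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int init_calls;   /* counts how often the costly setup ran */
static bool netns_ready; /* set once the setup has been done */

/* Hypothetical stand-in for opening the socket and scanning the netns dir. */
static void expensive_netns_setup(void)
{
	init_calls++;
}

/* Run the setup on first use only. */
static void netns_ensure_init(void)
{
	if (netns_ready)
		return;
	expensive_netns_setup();
	netns_ready = true;
}

/* Hypothetical lookup: pays the setup cost only when a nsid is resolved. */
static const char *lookup_netns_name(int nsid)
{
	assert(nsid >= 0);
	netns_ensure_init();
	return nsid == 0 ? "example-ns" : NULL;
}
```

Commands that never print a netns name would then never create the extra socket or scan the directory.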

Re: [PATCH net-next] net: ipv6: Generate random IID for addresses on RAWIP devices

2018-06-04 Thread David Miller

From: Subash Abhinov Kasiviswanathan 
Date: Sun,  3 Jun 2018 15:54:34 -0600

> RAWIP devices such as rmnet do not have a hardware address and
> instead require the kernel to generate a random IID for the
> temporary addresses. For permanent addresses, the device IID is
> used along with prefix received.
> 
> Signed-off-by: Subash Abhinov Kasiviswanathan 

Please address yoshfuji's feedback, thank you.

Re: [PATCH] net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets

2018-06-04 Thread David Miller

From: "Maciej Żenczykowski" 
Date: Sun,  3 Jun 2018 10:47:05 -0700

> From: Maciej Żenczykowski 
> 
> It is not safe to do so because such sockets are already in the
> hash tables and changing these options can result in invalidating
> the tb->fastreuse(port) caching.
> 
> This can have later far reaching consequences wrt. bind conflict checks
> which rely on these caches (for optimization purposes).
> 
> Not to mention that you can currently end up with two identical
> non-reuseport listening sockets bound to the same local ip:port
> by clearing reuseport on them after they've already both been bound.
> 
> There is unfortunately no EISBOUND error or anything similar,
> and EISCONN seems to be misleading for a bound-but-not-connected
> socket, so use EUCLEAN 'Structure needs cleaning' which AFAICT
> is the closest you can get to meaning 'socket in bad state'.
> (although perhaps EINVAL wouldn't be a bad choice either?)
> 
> This does unfortunately run the risk of breaking buggy
> userspace programs...
> 
> Signed-off-by: Maciej Żenczykowski 
> Change-Id: I77c2b3429b2fdf42671eee0fa7a8ba721c94963b

Applied and queued up for -stable.

Re: [PATCH] net-tcp: extend tcp_tw_reuse sysctl to enable loopback only optimization

2018-06-04 Thread David Miller

From: "Maciej Żenczykowski" 
Date: Sun,  3 Jun 2018 10:41:17 -0700

> From: Maciej Żenczykowski 
> 
> This changes the /proc/sys/net/ipv4/tcp_tw_reuse from a boolean
> to an integer.
> 
> It now takes the values 0, 1 and 2, where 0 and 1 behave as before,
> while 2 enables timewait socket reuse only for sockets that we can
> prove are loopback connections:
>   ie. bound to 'lo' interface or where one of source or destination
>   IPs is 127.0.0.0/8, ::ffff:127.0.0.0/104 or ::1.
> 
> This enables quicker reuse of ephemeral ports for loopback connections
> - where tcp_tw_reuse is 100% safe from a protocol perspective
> (this assumes no artificially induced packet loss on 'lo').
> 
> This also makes establishing many loopback connections *much* faster
> (allocating ports out of the first half of the ephemeral port range
> is significantly faster than allocating from the second half).
> 
> Without this change in a 32K ephemeral port space my sample program
> (it just establishes and closes [::1]:ephemeral -> [::1]:server_port
> connections in a tight loop) fails after 32765 connections in 24 seconds.
> With it enabled 5 connections only take 4.7 seconds.
> 
> This is particularly problematic for IPv6 where we only have one local
> address and cannot play tricks with varying source IP from 127.0.0.0/8
> pool.
> 
> Signed-off-by: Maciej Żenczykowski 

Applied, thank you.
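
The address part of the "provably loopback" test from the commit message can be sketched in userspace C as follows. This is an illustration only, not the kernel implementation: it covers 127.0.0.0/8, ::1 and the IPv4-mapped form ::ffff:127.0.0.0/104, and leaves out the "bound to 'lo'" case. `ipv4_is_loopback()` and `ipv6_is_loopback()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IPv4 address in host byte order: loopback if the top octet is 127. */
static int ipv4_is_loopback(uint32_t addr_host_order)
{
	return (addr_host_order >> 24) == 127;
}

/* IPv6 address as 16 bytes in network order. */
static int ipv6_is_loopback(const uint8_t a[16])
{
	/* ::1 */
	static const uint8_t loopback[16] = { [15] = 1 };
	/* First 13 bytes (104 bits) of ::ffff:127.0.0.0 */
	static const uint8_t mapped_prefix[13] = {
		[10] = 0xff, [11] = 0xff, [12] = 127
	};

	assert(a != NULL);
	return memcmp(a, loopback, sizeof(loopback)) == 0 ||
	       memcmp(a, mapped_prefix, sizeof(mapped_prefix)) == 0;
}
```

The commit message treats a connection as loopback when either the source or the destination address passes such a test.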

Re: [PATCH net-next v2] qed: Add srq core support for RoCE and iWARP

2018-06-04 Thread David Miller

From: Yuval Bason 
Date: Sun, 3 Jun 2018 19:13:07 +0300

> This patch adds support for configuring SRQ and provides the necessary
> APIs for rdma upper layer driver (qedr) to enable the SRQ feature.
> 
> Signed-off-by: Michal Kalderon 
> Signed-off-by: Ariel Elior 
> Signed-off-by: Yuval Bason 
> ---
> Changes from v1:
>   - sparse warnings
>   - replace memset with ={}

Applied, thanks.

Re: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04

2018-06-04 Thread Or Gerlitz

On Mon, Jun 4, 2018 at 11:30 PM, David Miller  wrote:
> It's open a day or two more to deal with the AF_XDP issues...

Dave,

Just to make sure, is the AF_XDP ZC (Zero Copy) UAPI going to be merged for
this window -- AFAIU from [1], it's still under
examination/development/research for
non Intel HWs, am I correct or this is going to get in now?

Or

[1] https://marc.info/?l=linux-netdev&m=152810546108060&w=2

Re: [PATCH net-next V2 2/2] cls_flower: Fix comparing of old filter mask with new filter

2018-06-04 Thread David Miller

From: Paul Blakey 
Date: Sun,  3 Jun 2018 10:06:14 +0300

> We incorrectly compare the mask and the result is that we can't modify
> an already existing rule.
> 
> Fix that by comparing correctly.
> 
> Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
> Reported-by: Vlad Buslov 
> Reviewed-by: Roi Dayan 
> Reviewed-by: Jiri Pirko 
> Signed-off-by: Paul Blakey 

Applied.

Re: [PATCH net-next V2 1/2] cls_flower: Fix missing free of rhashtable

2018-06-04 Thread David Miller

From: Paul Blakey 
Date: Sun,  3 Jun 2018 10:06:13 +0300

> When destroying the instance, destroy the head rhashtable.
> 
> Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
> Reported-by: Vlad Buslov 
> Reviewed-by: Roi Dayan 
> Reviewed-by: Jiri Pirko 
> Signed-off-by: Paul Blakey 

Applied.

Re: [PATCH bpf-next 10/11] i40e: implement AF_XDP zero-copy support for Tx

2018-06-04 Thread Alexander Duyck

On Mon, Jun 4, 2018 at 5:06 AM, Björn Töpel  wrote:
> From: Magnus Karlsson 
>
> Here, ndo_xsk_async_xmit is implemented. As a shortcut, the existing
> XDP Tx rings are used for zero-copy. As a result, packets from other
> devices doing XDP_REDIRECT to an AF_XDP enabled queue will be
> dropped.
>
> Signed-off-by: Magnus Karlsson 
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c |   7 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c |  93 +++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h |  23 +
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 140 
> 
>  drivers/net/ethernet/intel/i40e/i40e_xsk.h  |   2 +
>  include/net/xdp_sock.h  |  14 +++
>  6 files changed, 242 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 8c602424d339..98c18c41809d 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -3073,8 +3073,12 @@ static int i40e_configure_tx_ring(struct i40e_ring 
> *ring)
> i40e_status err = 0;
> u32 qtx_ctl = 0;
>
> -   if (ring_is_xdp(ring))
> +   ring->clean_tx_irq = i40e_clean_tx_irq;
> +   if (ring_is_xdp(ring)) {
> ring->xsk_umem = i40e_xsk_umem(ring);
> +   if (ring->xsk_umem)
> +   ring->clean_tx_irq = i40e_clean_tx_irq_zc;

Again, I am worried what the performance penalty on this will be given
the retpoline penalty for function pointers.

> +   }
>
> /* some ATR related tx ring init */
> if (vsi->back->flags & I40E_FLAG_FD_ATR_ENABLED) {
> @@ -12162,6 +12166,7 @@ static const struct net_device_ops i40e_netdev_ops = {
> .ndo_bpf= i40e_xdp,
> .ndo_xdp_xmit   = i40e_xdp_xmit,
> .ndo_xdp_flush  = i40e_xdp_flush,
> +   .ndo_xsk_async_xmit = i40e_xsk_async_xmit,
>  };
>
>  /**
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
> b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 6b1142fbc697..923bb84a93ab 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -10,16 +10,6 @@
>  #include "i40e_trace.h"
>  #include "i40e_prototype.h"
>
> -static inline __le64 build_ctob(u32 td_cmd, u32 td_offset, unsigned int size,
> -   u32 td_tag)
> -{
> -   return cpu_to_le64(I40E_TX_DESC_DTYPE_DATA |
> -  ((u64)td_cmd  << I40E_TXD_QW1_CMD_SHIFT) |
> -  ((u64)td_offset << I40E_TXD_QW1_OFFSET_SHIFT) |
> -  ((u64)size  << I40E_TXD_QW1_TX_BUF_SZ_SHIFT) |
> -  ((u64)td_tag  << I40E_TXD_QW1_L2TAG1_SHIFT));
> -}
> -
>  #define I40E_TXD_CMD (I40E_TX_DESC_CMD_EOP | I40E_TX_DESC_CMD_RS)
>  /**
>   * i40e_fdir - Generate a Flow Director descriptor based on fdata
> @@ -649,9 +639,13 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
> if (!tx_ring->tx_bi)
> return;
>
> -   /* Free all the Tx ring sk_buffs */
> -   for (i = 0; i < tx_ring->count; i++)
> -   i40e_unmap_and_free_tx_resource(tx_ring, &tx_ring->tx_bi[i]);
> +   /* Cleanup only needed for non XSK TX ZC rings */
> +   if (!tx_ring->xsk_umem) {
> +   /* Free all the Tx ring sk_buffs */
> +   for (i = 0; i < tx_ring->count; i++)
> +   i40e_unmap_and_free_tx_resource(tx_ring,
> +   &tx_ring->tx_bi[i]);
> +   }
>
> bi_size = sizeof(struct i40e_tx_buffer) * tx_ring->count;
> memset(tx_ring->tx_bi, 0, bi_size);
> @@ -768,8 +762,40 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
> }
>  }
>
> +void i40e_update_tx_stats(struct i40e_ring *tx_ring,
> + unsigned int total_packets,
> + unsigned int total_bytes)
> +{
> +   u64_stats_update_begin(&tx_ring->syncp);
> +   tx_ring->stats.bytes += total_bytes;
> +   tx_ring->stats.packets += total_packets;
> +   u64_stats_update_end(&tx_ring->syncp);
> +   tx_ring->q_vector->tx.total_bytes += total_bytes;
> +   tx_ring->q_vector->tx.total_packets += total_packets;
> +}
> +
>  #define WB_STRIDE 4
>
> +void i40e_arm_wb(struct i40e_ring *tx_ring,
> +struct i40e_vsi *vsi,
> +int budget)
> +{
> +   if (tx_ring->flags & I40E_TXR_FLAGS_WB_ON_ITR) {
> +   /* check to see if there are < 4 descriptors
> +* waiting to be written back, then kick the hardware to force
> +* them to be written back in case we stay in NAPI.
> +* In this mode on X722 we do not enable Interrupt.
> +*/
> +   unsigned int j = i40e_get_tx_pending(tx_ring, false);
> +
> +   if (budget &&
> +

Re: [PATCH net] net/ipv6: prevent use after free in ip6_route_mpath_notify

2018-06-04 Thread Eric Dumazet




On 06/04/2018 01:41 PM, dsah...@kernel.org wrote:
> From: David Ahern 
> 
> syzbot reported a use-after-free:
> 
> BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 
> net/ipv6/route.c:4180
> Read of size 4 at addr 8801bf789cf0 by task syz-executor756/4555
> 
> Fix by not setting rt_last until it is verified that the insert succeeded.
> 
> Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to 
> RTA_MULTIPATH")
> Cc: Eric Dumazet 
> Reported-by: syzbot 
> Signed-off-by: David Ahern 
> ---

Reviewed-by: Eric Dumazet 

Thanks David !

[PATCH net] net/ipv6: prevent use after free in ip6_route_mpath_notify

2018-06-04 Thread dsahern

From: David Ahern 

syzbot reported a use-after-free:

BUG: KASAN: use-after-free in ip6_route_mpath_notify+0xe9/0x100 
net/ipv6/route.c:4180
Read of size 4 at addr 8801bf789cf0 by task syz-executor756/4555

CPU: 1 PID: 4555 Comm: syz-executor756 Not tainted 4.17.0-rc7+ #78
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
 ip6_route_mpath_notify+0xe9/0x100 net/ipv6/route.c:4180
 ip6_route_multipath_add+0x615/0x1910 net/ipv6/route.c:4303
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

Allocated by task 4555:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
 dst_alloc+0xbb/0x1d0 net/core/dst.c:104
 __ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:361
 ip6_dst_alloc+0x29/0xb0 net/ipv6/route.c:376
 ip6_route_info_create+0x4d4/0x3a30 net/ipv6/route.c:2834
 ip6_route_multipath_add+0xc7e/0x1910 net/ipv6/route.c:4240
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

Freed by task 4555:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
 dst_destroy+0x267/0x3c0 net/core/dst.c:140
 dst_release_immediate+0x71/0x9e net/core/dst.c:205
 fib6_add+0xa40/0x1650 net/ipv6/ip6_fib.c:1305
 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1011
 ip6_route_multipath_add+0x513/0x1910 net/ipv6/route.c:4267
 inet6_rtm_newroute+0xe3/0x160 net/ipv6/route.c:4391
 ...

The problem is that rt_last can point to a deleted route if the insert
fails.

One reproducer is to insert a route and then add a multipath route that
has a duplicate nexthop, e.g.:
$ ip -6 ro add vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::2
$ ip -6 ro append vrf red 2001:db8:101::/64 nexthop via 2001:db8:1::4 
nexthop via 2001:db8:1::2

Fix by not setting rt_last until it is verified that the insert succeeded.

Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to 
RTA_MULTIPATH")
Cc: Eric Dumazet 
Reported-by: syzbot 
Signed-off-by: David Ahern 
---
 net/ipv6/route.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f4d61736c41a..c516f8556dbe 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -4263,11 +4263,15 @@ static int ip6_route_multipath_add(struct fib6_config 
*cfg,
 
err_nh = NULL;
list_for_each_entry(nh, &rt6_nh_list, next) {
-   rt_last = nh->rt6_info;
err = __ip6_ins_rt(nh->rt6_info, info, &nh->mxc, extack);
-   /* save reference to first route for notification */
-   if (!rt_notif && !err)
-   rt_notif = nh->rt6_info;
+   if (!err) {
+   /* save reference to last route successfully inserted */
+   rt_last = nh->rt6_info;
+
+   /* save reference to first route for notification */
+   if (!rt_notif)
+   rt_notif = nh->rt6_info;
+   }
 
/* nh->rt6_info is used or freed at this point, reset to NULL*/
nh->rt6_info = NULL;
-- 
2.11.0
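
The pattern behind the fix above can be sketched outside the kernel. This is a simplified illustration under stated assumptions: `struct route`, `insert_route()` and `insert_all()` are hypothetical, and the real function does more work on the error path than a plain `break`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route entry; will_fail simulates a rejected insert. */
struct route {
	int id;
	int will_fail;
};

/* Hypothetical insert: on failure the entry must be treated as gone. */
static int insert_route(struct route *rt)
{
	return rt->will_fail ? -1 : 0;
}

/* Insert routes in order; return the last one that was actually inserted. */
static struct route *insert_all(struct route *routes, size_t n)
{
	struct route *last = NULL;
	size_t i;

	assert(routes != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int err = insert_route(&routes[i]);

		if (err)
			break; /* never remember an entry whose insert failed */
		/* record the pointer only after the insert succeeded */
		last = &routes[i];
	}
	return last;
}
```

Recording the pointer before calling the insert, as the old code did, leaves `last` pointing at an entry that the failed insert may already have released.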

Re: [bug] cxgb4: vrf stopped working with cxgb4 card

2018-06-04 Thread David Ahern

On 6/4/18 1:14 PM, AMG Zollner Robert wrote:
> Yes, I was enslaving while the interface was up.
> 
> Just tested some of the builds that were not working earlier and they
> are working if I keep the interface down when enslaving as you suggested.
> 
> Is this the expected behavior?

Not expected from my perspective.

The VRF device cycles interfaces when they are enslaved or unenslaved to
clean up route and neighbor tables. This is a day 1 property of VRF.

I guessed that was the problem based on the commit you bisected the
problem to. If nothing else, it gives you a workaround until it is fixed.

Re: [PATCH bpf-next 09/11] i40e: implement AF_XDP zero-copy support for Rx

2018-06-04 Thread Alexander Duyck

On Mon, Jun 4, 2018 at 5:05 AM, Björn Töpel  wrote:
> From: Björn Töpel 
>
> This commit adds initial AF_XDP zero-copy support for i40e-based
> NICs. First we add support for the new XDP_QUERY_XSK_UMEM and
> XDP_SETUP_XSK_UMEM commands in ndo_bpf. This allows the AF_XDP socket
> to pass a UMEM to the driver. The driver will then DMA map all the
> frames in the UMEM for the driver. Next, the Rx code will allocate
> frames from the UMEM fill queue, instead of the regular page
> allocator.
>
> Externally, for the rest of the XDP code, the driver internal UMEM
> allocator will appear as a MEM_TYPE_ZERO_COPY.
>
> The commit also introduces completely new clean_rx_irq/allocator
> functions for zero-copy, and a means (function pointers) to set
> the allocator and clean_rx functions.
>
> This first version does not support:
> * passing frames to the stack via XDP_PASS (clone/copy to skb).
> * doing XDP redirect to other than AF_XDP sockets
>   (convert_to_xdp_frame does not clone the frame yet).
>
> Signed-off-by: Björn Töpel 
> ---
>  drivers/net/ethernet/intel/i40e/Makefile|   3 +-
>  drivers/net/ethernet/intel/i40e/i40e.h  |  23 ++
>  drivers/net/ethernet/intel/i40e/i40e_main.c |  35 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 163 ++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h | 128 ++-
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c  | 537 
> 
>  drivers/net/ethernet/intel/i40e/i40e_xsk.h  |  17 +
>  include/net/xdp_sock.h  |  19 +
>  net/xdp/xdp_umem.h  |  10 -
>  9 files changed, 789 insertions(+), 146 deletions(-)
>  create mode 100644 drivers/net/ethernet/intel/i40e/i40e_xsk.c
>  create mode 100644 drivers/net/ethernet/intel/i40e/i40e_xsk.h
>
> diff --git a/drivers/net/ethernet/intel/i40e/Makefile 
> b/drivers/net/ethernet/intel/i40e/Makefile
> index 14397e7e9925..50590e8d1fd1 100644
> --- a/drivers/net/ethernet/intel/i40e/Makefile
> +++ b/drivers/net/ethernet/intel/i40e/Makefile
> @@ -22,6 +22,7 @@ i40e-objs := i40e_main.o \
> i40e_txrx.o \
> i40e_ptp.o  \
> i40e_client.o   \
> -   i40e_virtchnl_pf.o
> +   i40e_virtchnl_pf.o \
> +   i40e_xsk.o
>
>  i40e-$(CONFIG_I40E_DCB) += i40e_dcb.o i40e_dcb_nl.o
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
> b/drivers/net/ethernet/intel/i40e/i40e.h
> index 7a80652e2500..20955e5dce02 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -786,6 +786,12 @@ struct i40e_vsi {
>
> /* VSI specific handlers */
> irqreturn_t (*irq_handler)(int irq, void *data);
> +
> +   /* AF_XDP zero-copy */
> +   struct xdp_umem **xsk_umems;
> +   u16 num_xsk_umems_used;
> +   u16 num_xsk_umems;
> +
>  } cacheline_internodealigned_in_smp;
>
>  struct i40e_netdev_priv {
> @@ -1090,6 +1096,20 @@ static inline bool i40e_enabled_xdp_vsi(struct 
> i40e_vsi *vsi)
> return !!vsi->xdp_prog;
>  }
>
> +static inline struct xdp_umem *i40e_xsk_umem(struct i40e_ring *ring)
> +{
> +   bool xdp_on = i40e_enabled_xdp_vsi(ring->vsi);
> +   int qid = ring->queue_index;
> +
> +   if (ring_is_xdp(ring))
> +   qid -= ring->vsi->alloc_queue_pairs;
> +
> +   if (!ring->vsi->xsk_umems || !ring->vsi->xsk_umems[qid] || !xdp_on)
> +   return NULL;
> +
> +   return ring->vsi->xsk_umems[qid];
> +}
> +
>  int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
>  int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 seid, u64 max_tx_rate);
>  int i40e_add_del_cloud_filter(struct i40e_vsi *vsi,
> @@ -1098,4 +1118,7 @@ int i40e_add_del_cloud_filter(struct i40e_vsi *vsi,
>  int i40e_add_del_cloud_filter_big_buf(struct i40e_vsi *vsi,
>   struct i40e_cloud_filter *filter,
>   bool add);
> +int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
> +int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair);
> +
>  #endif /* _I40E_H_ */
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
> b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 369a116edaa1..8c602424d339 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -5,6 +5,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  /* Local includes */
>  #include "i40e.h"
> @@ -16,6 +17,7 @@
>   */
>  #define CREATE_TRACE_POINTS
>  #include "i40e_trace.h"
> +#include "i40e_xsk.h"
>
>  const char i40e_driver_name[] = "i40e";
>  static const char i40e_driver_string[] =
> @@ -3071,6 +3073,9 @@ static int i40e_configure_tx_ring(struct i40e_ring 
> *ring)
> i40e_status err = 0;
> u32 qtx_ctl = 0;
>
> +   if (ring_is_xdp(ring))
> +   ring->xsk_umem = i40e_xsk_umem(ring);
> +
> /* some ATR related tx ring init */
> if (vsi->back->flags & I40E_FLAG_FD_ATR_ENABLED) {
>

Re: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04

2018-06-04 Thread David Miller

From: Or Gerlitz 
Date: Mon, 4 Jun 2018 23:27:57 +0300

> On Mon, Jun 4, 2018 at 8:56 PM, Jeff Kirsher
>  wrote:
>> This series contains a smorgasbord of updates to documentation, e1000e,
>> igb, ixgbe, ixgbevf and i40e.
> 
> Dave,
> 
> Did you forget to flip the sign on the shop's door [1]?
> 
> Or.
> 
> [1] http://vger.kernel.org/~davem/net-next.html

It's open a day or two more to deal with the AF_XDP issues...

Re: [net-next 00/12][pull request] Intel Wired LAN Driver Updates 2018-06-04

2018-06-04 Thread Or Gerlitz

On Mon, Jun 4, 2018 at 8:56 PM, Jeff Kirsher
 wrote:
> This series contains a smorgasbord of updates to documentation, e1000e,
> igb, ixgbe, ixgbevf and i40e.

Dave,

Did you forget to flip the sign on the shop's door [1]?

Or.

[1] http://vger.kernel.org/~davem/net-next.html

Re: [Intel-wired-lan] [PATCH bpf-next 00/11] AF_XDP: introducing zero-copy support

2018-06-04 Thread Jeff Kirsher

On Mon, 2018-06-04 at 09:38 -0700, Alexei Starovoitov wrote:
> On Mon, Jun 04, 2018 at 02:05:50PM +0200, Björn Töpel wrote:
> > From: Björn Töpel 
> > 
> > This patch series introduces zerocopy (ZC) support for
> > AF_XDP. Programs using AF_XDP sockets will now receive RX packets
> > without any copies and can also transmit packets without incurring
> > any
> > copies. No modifications to the application are needed, but the NIC
> > driver needs to be modified to support ZC. If ZC is not supported
> > by
> > the driver, the modes introduced in the AF_XDP patch will be
> > used. Using ZC in our micro benchmarks results in significantly
> > improved performance as can be seen in the performance section
> > later
> > in this cover letter.
> > 
> > Note that for an untrusted application, HW packet steering to a
> > specific queue pair (the one associated with the application) is a
> > requirement when using ZC, as the application would otherwise be
> > able
> > to see other user space processes' packets. If the HW cannot
> > support
> > the required packet steering you need to use the XDP_SKB mode or
> > the
> > XDP_DRV mode without ZC turned on. The XSKMAP introduced in the
> > AF_XDP
> > patch set can be used to do load balancing in that case.
> > 
> > For benchmarking, you can use the xdpsock application from the
> > AF_XDP
> > patch set without any modifications. Say that you would like your
> > UDP
> > traffic from port 4242 to end up in queue 16, that we will enable
> > AF_XDP on. Here, we use ethtool for this:
> > 
> >   ethtool -N p3p2 rx-flow-hash udp4 fn
> >   ethtool -N p3p2 flow-type udp4 src-port 4242 dst-port 4242 \
> >   action 16
> > 
> > Running the rxdrop benchmark in XDP_DRV mode with zerocopy can then
> > be
> > done using:
> > 
> >   samples/bpf/xdpsock -i p3p2 -q 16 -r -N
> > 
> > We have run some benchmarks on a dual socket system with two
> > Broadwell
> > E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has
> > 14
> > cores which gives a total of 28, but only two cores are used in
> > these
> > experiments. One for TX/RX and one for the user space application.
> > The
> > memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> > 8192MB and with 8 of those DIMMs in the system we have 64 GB of
> > total
> > memory. The compiler used is gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0.
> > The
> > NIC is Intel I40E 40Gbit/s using the i40e driver.
> > 
> > Below are the results in Mpps of the I40E NIC benchmark runs for 64
> > and 1500 byte packets, generated by a commercial packet generator
> > HW
> > outputting packets at full 40 Gbit/s line rate. The results are
> > without
> > retpoline so that we can compare against previous numbers. 
> > 
> > AF_XDP performance 64 byte packets. Results from the AF_XDP V3
> > patch
> > set are also reported for ease of reference. The numbers within
> > parentheses are from the RFC V1 ZC patch set.
> > Benchmark   XDP_SKB   XDP_DRV   XDP_DRV with zerocopy
> > rxdrop      2.9*      9.6*      21.1(21.5)
> > txpush      2.6*      -         22.0(21.6)
> > l2fwd       1.9*      2.5*      15.3(15.0)
> > 
> > AF_XDP performance 1500 byte packets:
> > Benchmark   XDP_SKB   XDP_DRV   XDP_DRV with zerocopy
> > rxdrop      2.1*      3.3*      3.3(3.3)
> > l2fwd       1.4*      1.8*      3.1(3.1)
> > 
> > * From AF_XDP V3 patch set and cover letter.
> > 
> > So why do we not get higher values for RX similar to the 34 Mpps we
> > had in AF_PACKET V4? We made an experiment running the rxdrop
> > benchmark without using the xdp_do_redirect/flush infrastructure
> > nor
> > using an XDP program (all traffic on a queue goes to one
> > socket). Instead the driver acts directly on the AF_XDP socket.
> > With
> > this we got 36.9 Mpps, a significant improvement without any change
> > to
> > the uapi. So not forcing users to have an XDP program if they do
> > not
> > need it, might be a good idea. This measurement is actually higher
> > than what we got with AF_PACKET V4.
> > 
> > XDP performance on our system as a base line:
> > 
> > 64 byte packets:
> > XDP stats   CPU   pps     issue-pps
> > XDP-RX CPU  16    32.3M   0
> > 
> > 1500 byte packets:
> > XDP stats   CPU   pps     issue-pps
> > XDP-RX CPU  16    3.3M    0
> > 
> > The structure of the patch set is as follows:
> > 
> > Patches 1-3: Plumbing for AF_XDP ZC support
> > Patches 4-5: AF_XDP ZC for RX
> > Patches 6-7: AF_XDP ZC for TX
> 
> Acked-by: Alexei Starovoitov 
> for above patches
> 
> > Patch 8-10: ZC support for i40e.
> 
> these also look good to me.
> would be great if i40e experts take a look at them asap.
> 
> If there are no major objections we'd like to merge all of it
> for this merge window.

We would like a bit more time to review and test the changes, I
understand your eagerness for wanting this to get into 4.18 but this
change is large enough that a 24-48 hour review time is not prudent,
IMHO.

Alex also has requested
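
As a rough sanity check on the Mpps figures quoted in the cover letter above, the theoretical Ethernet line rate can be computed as below. This is back-of-the-envelope arithmetic, not part of the patch set. It assumes the quoted packet sizes are full frame sizes and that each frame costs an extra 20 bytes on the wire (8 bytes of preamble/SFD plus a 12-byte inter-frame gap); `line_rate_mpps()` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed per-frame wire overhead: 8 bytes preamble/SFD + 12 bytes gap. */
#define WIRE_OVERHEAD_BYTES 20.0

/* Maximum frames per second on the link, in millions of packets. */
static double line_rate_mpps(double link_bits_per_sec, double frame_bytes)
{
	assert(frame_bytes > 0.0);
	return link_bits_per_sec /
	       ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0) / 1e6;
}
```

Under these assumptions a 40 Gbit/s link carries about 59.5 Mpps of 64-byte frames and about 3.29 Mpps of 1500-byte frames. The 3.3 Mpps results in the 1500-byte table are therefore consistent with the link itself being the limit, while the 64-byte results are well below it.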

[PATCH net-next] net: phy: broadcom: Enable 125 MHz clock on LED4 pin for BCM54612E by default.

2018-06-04 Thread Kun Yi

The BCM54612E has 4 multi-functional LED pins that can be configured
through register settings; the LED4 pin can be configured as a 125MHz
reference clock output by setting the spare register. Since the dedicated
CLK125 reference clock pin is not brought out on the 48-Pin MLP, the LED4
pin is the only pin that provides this function in this package, and
therefore it is beneficial to just enable the reference clock by default.

Signed-off-by: Kun Yi 
---
 drivers/net/phy/broadcom.c | 16 ++--
 include/linux/brcmphy.h|  4 
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index f9c25912eb98..e86ea105c802 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -54,6 +54,8 @@ static int bcm54210e_config_init(struct phy_device *phydev)
 
 static int bcm54612e_config_init(struct phy_device *phydev)
 {
+   int reg;
+
/* Clear TX internal delay unless requested. */
if ((phydev->interface != PHY_INTERFACE_MODE_RGMII_ID) &&
(phydev->interface != PHY_INTERFACE_MODE_RGMII_TXID)) {
@@ -65,8 +67,6 @@ static int bcm54612e_config_init(struct phy_device *phydev)
/* Clear RX internal delay unless requested. */
if ((phydev->interface != PHY_INTERFACE_MODE_RGMII_ID) &&
(phydev->interface != PHY_INTERFACE_MODE_RGMII_RXID)) {
-   u16 reg;
-
reg = bcm54xx_auxctl_read(phydev,
  MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
/* Disable RXD to RXC delay (default set) */
@@ -77,6 +77,18 @@ static int bcm54612e_config_init(struct phy_device *phydev)
 MII_BCM54XX_AUXCTL_MISC_WREN | reg);
}
 
+   /* Enable CLK125 MUX on LED4 if ref clock is enabled. */
+   if (!(phydev->dev_flags & PHY_BRCM_RX_REFCLK_UNUSED)) {
+   int err;
+
+   reg = bcm_phy_read_exp(phydev, BCM54612E_EXP_SPARE0);
+   err = bcm_phy_write_exp(phydev, BCM54612E_EXP_SPARE0,
+   BCM54612E_LED4_CLK125OUT_EN | reg);
+
+   if (err < 0)
+   return err;
+   }
+
return 0;
 }
 
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index b324e01ccf2d..daa9234a9baf 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -85,6 +85,7 @@
 #define MII_BCM54XX_EXP_SEL     0x17    /* Expansion register select */
 #define MII_BCM54XX_EXP_SEL_SSD 0x0e00  /* Secondary SerDes select */
 #define MII_BCM54XX_EXP_SEL_ER  0x0f00  /* Expansion register select */
+#define MII_BCM54XX_EXP_SEL_ETC 0x0d00  /* Expansion register spare + 2k mem */
 
 #define MII_BCM54XX_AUX_CTL     0x18    /* Auxiliary control register */
 #define MII_BCM54XX_ISR         0x1a    /* BCM54xx interrupt status register */
@@ -219,6 +220,9 @@
 #define BCM54810_SHD_CLK_CTL   0x3
 #define BCM54810_SHD_CLK_CTL_GTXCLK_EN (1 << 9)
 
+/* BCM54612E Registers */
+#define BCM54612E_EXP_SPARE0   (MII_BCM54XX_EXP_SEL_ETC + 0x34)
+#define BCM54612E_LED4_CLK125OUT_EN     (1 << 1)
 
 /*/
 /* Fast Ethernet Transceiver definitions. */
-- 
2.17.1.1185.g55be947832-goog
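
For readers following the hunk above: the new code is a read-modify-write
of one expansion register. Below is a minimal user-space sketch of that
step, not the driver code. struct fake_phy, fake_read_exp() and
fake_write_exp() are hypothetical stand-ins for the PHY and for
bcm_phy_read_exp()/bcm_phy_write_exp(); only the three constants are taken
from the patch. Unlike the hunk, the sketch also checks the read result
before OR-ing the bit in. Whether the driver needs that check is a question
for review, not something the patch states.

```c
#include <stdint.h>

#define EXP_SEL_ETC        0x0d00               /* value from the patch */
#define EXP_SPARE0         (EXP_SEL_ETC + 0x34) /* value from the patch */
#define LED4_CLK125OUT_EN  (1 << 1)             /* value from the patch */

/* Hypothetical stand-in for a PHY: one 16-bit register plus a fault switch. */
struct fake_phy {
	uint16_t spare0;
	int fail_io; /* non-zero makes every access fail */
};

/* Hypothetical accessor: returns the register value, or a negative error. */
static int fake_read_exp(struct fake_phy *phy, int reg)
{
	if (phy->fail_io || reg != EXP_SPARE0)
		return -5;
	return phy->spare0;
}

/* Hypothetical accessor: returns 0 on success, or a negative error. */
static int fake_write_exp(struct fake_phy *phy, int reg, uint16_t val)
{
	if (phy->fail_io || reg != EXP_SPARE0)
		return -5;
	phy->spare0 = val;
	return 0;
}

/* Set the CLK125 enable bit and leave every other bit as it was. */
int enable_clk125_on_led4(struct fake_phy *phy)
{
	int reg = fake_read_exp(phy, EXP_SPARE0);

	if (reg < 0)
		return reg; /* do not OR a flag into an error code */

	return fake_write_exp(phy, EXP_SPARE0,
			      (uint16_t)(reg | LED4_CLK125OUT_EN));
}
```

With a register that starts at 0x0100, the call leaves 0x0102 behind: the
existing bit survives and only bit 1 is added.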

Re: [bug] cxgb4: vrf stopped working with cxgb4 card

2018-06-04 Thread AMG Zollner Robert


Yes, I was enslaving while the interface was up.

Just tested some of the builds that were not working earlier and they 
are working if I keep the interface down when enslaving as you suggested.


Is this the expected behavior?

Thank you,
Zollner Robert


On 04.06.2018 21:17, David Ahern wrote:

On 6/4/18 8:03 AM, AMG Zollner Robert wrote:

I have noticed that vrf is not working with kernel v4.15.0 but was
working with v4.13.0 when using cxgb4 Chelsio driver (T520-cr)

Setup:
Two metal servers with a T520-cr card each, directly connected without a
switch in between.

    SVR1  only ipfwd           SVR2     with vrf
.----------------------------. .--------------------------.
|                            | |                          |
|    192.168.8.1 [  ens2f4]--|-|--[ens1f4] 192.168.8.2    |
|    192.168.9.1 [ens2f4d1]--|-|-- 192.168.9.2 VRF=10     |
`----------------------------' `--------------------------'

When vrf is not working there are no error messages (dmesg or iproute
commands), tcpdump on the interface (SVR2.ens1f4d1) enslaved in vrf 10
shows packets(arp req/reply) coming in and going out, but outgoing
packets(arp reply) do not reach the other server SVR1.ens2f4d1


Bisect:
Found this commit to be the problem after doing a git bisect between
v4.13..v4.15:

commit ba581f77df23c8ee70b372966e69cf10bc5453d8
Author: Ganesh Goudar 
Date:   Sat Sep 23 16:07:28 2017 +0530

     cxgb4: do DCB state reset in couple of places

     reset the driver's DCB state in couple of places
     where it was missing.


A bisect step was considered good when:
- successful ping from SVR1 to SVR2.ens1f4d1 vrf interface
- successful ping from SVR2 global to SVR2 vrf interface through SVR1 (l3
forwarding) (this check was redundant, both tests fail or pass simultaneously)

The problem is still present on recent kernels also, checked v4.16.0 and
v4.17-rc7

Disabling DCB support for the card fixes the problem (compiling the kernel
with "CONFIG_CHELSIO_T4_DCB=n")


Are you doing the VRF enslave while it is up?

If so, does it work ok if you change the sequence:

ip li set ens1f4d1 down
ip li set ens1f4d1 master 
ip li set ens1f4d1 up

Re: [PATCH net-next] Allow ethtool to change tun link settings

2018-06-04 Thread David Miller

From: Chas Williams <3ch...@gmail.com>
Date: Sat,  2 Jun 2018 17:49:53 -0400

> Let user space set whatever it would like to advertise for the
> tun interface.  Preserve the existing defaults.
> 
> Signed-off-by: Chas Williams <3ch...@gmail.com>

This looks fine, applied.

Re: [bpf-next PATCH] bpf: sockmap, fix crash when ipv6 sock is added

2018-06-04 Thread Daniel Borkmann

On 06/04/2018 05:21 PM, John Fastabend wrote:
> This fixes a crash where we assign tcp_prot to IPv6 sockets instead
> of tcpv6_prot.
> 
> Previously we overwrote the sk->prot field with tcp_prot even in the
> AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
> are used. Further, only allow ESTABLISHED connections to join the
> map per note in TLS ULP,
> 
>/* The TLS ulp is currently supported only for TCP sockets
> * in ESTABLISHED state.
> * Supporting sockets in LISTEN state will require us
> * to modify the accept implementation to clone rather then
> * share the ulp context.
> */
> 
> Also tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
> 'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
> crashing case here.
> 
> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
> Reported-by: syzbot+5c063698bdbfac19f...@syzkaller.appspotmail.com
> Signed-off-by: John Fastabend 
> Signed-off-by: Wei Wang 

Applied to bpf-next, thanks everyone!
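
As a reading aid for the fix described above, here is a small user-space
model of the two checks the commit message names: pick the base protocol
by address family, and refuse sockets that are not ESTABLISHED. It is a
sketch under assumptions, not the sockmap code. Every type and name in it
(sock_model, proto_ops_model, pick_base_proto, the enums) is hypothetical.

```c
#include <stddef.h>

enum sock_family { FAMILY_INET, FAMILY_INET6 };                  /* hypothetical */
enum sock_state { STATE_ESTABLISHED, STATE_LISTEN, STATE_CLOSE }; /* hypothetical */

/* Hypothetical stand-in for a protocol operations table. */
struct proto_ops_model {
	const char *name;
};

static const struct proto_ops_model tcp_ops_v4 = { "tcp" };
static const struct proto_ops_model tcp_ops_v6 = { "tcpv6" };

struct sock_model {
	enum sock_family family;
	enum sock_state state;
};

/*
 * Return the base ops matching the socket's family, or NULL when the
 * socket is not allowed to join the map.
 */
const struct proto_ops_model *pick_base_proto(const struct sock_model *sk)
{
	/* Only established connections may join. */
	if (sk->state != STATE_ESTABLISHED)
		return NULL;

	switch (sk->family) {
	case FAMILY_INET:
		return &tcp_ops_v4;
	case FAMILY_INET6:
		return &tcp_ops_v6; /* the case that used to get the IPv4 ops */
	}
	return NULL;
}
```

In this model an established IPv6 socket gets the v6 table, and a
listening socket of either family is rejected.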

Re: [PATCH bpf-next] bpf: guard bpf_get_current_cgroup_id() with CONFIG_CGROUPS

2018-06-04 Thread Daniel Borkmann

On 06/04/2018 05:53 PM, Yonghong Song wrote:
> Commit bf6fa2c893c5 ("bpf: implement bpf_get_current_cgroup_id()
> helper") introduced a new helper bpf_get_current_cgroup_id().
> The helper has a dependency on CONFIG_CGROUPS.
> 
> When CONFIG_CGROUPS is not defined, using the helper will result in
> the following verifier error:
>   kernel subsystem misconfigured func bpf_get_current_cgroup_id#80
> which is hard for users to interpret.
> Guarding the reference to bpf_get_current_cgroup_id_proto with
> CONFIG_CGROUPS will result in the clearer message below:
>   unknown func bpf_get_current_cgroup_id#80
> 
> Fixes: bf6fa2c893c5 ("bpf: implement bpf_get_current_cgroup_id() helper")
> Suggested-by: Daniel Borkmann 
> Signed-off-by: Yonghong Song 

Applied to bpf-next, thanks Yonghong!
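
The guard pattern from the commit message can be sketched in plain C as
below. This is a model, not the kernel lookup function.
MODEL_HAVE_CGROUPS is a hypothetical stand-in for CONFIG_CGROUPS, and
helper_proto, lookup_helper and HELPER_OTHER are made up for the example.
The only values taken from the thread are the helper name and its id 80.
The point it illustrates: with the case label compiled out, the lookup
reports "no such helper" instead of handing back a descriptor that cannot
work.

```c
#include <stddef.h>

/*
 * Hypothetical build switch standing in for CONFIG_CGROUPS.
 * Left undefined here, which models a kernel built without cgroups.
 */
/* #define MODEL_HAVE_CGROUPS 1 */

enum helper_id {
	HELPER_OTHER = 1,                  /* hypothetical unrelated helper */
	HELPER_GET_CURRENT_CGROUP_ID = 80, /* id taken from the error message */
};

struct helper_proto {
	const char *name;
};

#ifdef MODEL_HAVE_CGROUPS
static const struct helper_proto cgroup_id_proto = {
	"bpf_get_current_cgroup_id"
};
#endif
static const struct helper_proto other_proto = { "other" };

/* Return the helper descriptor, or NULL so the caller can say "unknown func". */
const struct helper_proto *lookup_helper(enum helper_id id)
{
	switch (id) {
#ifdef MODEL_HAVE_CGROUPS
	case HELPER_GET_CURRENT_CGROUP_ID:
		return &cgroup_id_proto;
#endif
	case HELPER_OTHER:
		return &other_proto;
	default:
		return NULL;
	}
}
```

Defining MODEL_HAVE_CGROUPS before the block makes the same lookup return
the cgroup helper descriptor.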

Re: [PATCH bpf-next 0/5] AF_XDP: bug fixes and descriptor changes

2018-06-04 Thread Daniel Borkmann

On 06/04/2018 06:24 PM, Alexei Starovoitov wrote:
> On Mon, Jun 04, 2018 at 01:57:10PM +0200, Björn Töpel wrote:
>> From: Björn Töpel 
>>
>> An issue with the current AF_XDP uapi raised by Mykyta Iziumtsev (see
>> https://www.spinics.net/lists/netdev/msg503664.html) is that it does
>> not support NICs that have a "type-writer" model in an efficient
>> way. In this model, a memory window is passed to the hardware and
>> multiple frames might be filled into that window, instead of just one
>> that we have in the current fixed frame-size model.
>>
>> This patch set fixes two bugs in the current implementation and then
>> changes the uapi so that the type-writer model can be supported
>> efficiently by a possible future extension of AF_XDP.
>>
>> These are the uapi changes in this patch:
>>
>> * Change the "u32 idx" in the descriptors to "u64 addr". The current
>>   idx based format does NOT work for the type-writer model (as packets
>>   can start anywhere within a frame), whereas a relative address
>>   pointer (the u64 addr) works well for both models in the prototype
>>   code we have that supports both models. We increased it from u32 to
>>   u64 to support umems larger than 4G. We have also removed the u16
>>   offset when having a "u64 addr" since that information is already
>>   carried in the least significant bits of the address.
>>
>> * We want to use "u8 padding[5]" for something useful in the future
>>   (since we are not allowed to change its name), so we now call it
>>   just options so it can be extended for various purposes in the
>>   future. It is a u32 as that is what is left of the 16 byte
>>   descriptor.
>>
>> * We changed the name of frame_size in the UMEM_REG setsockopt to
>>   chunk_size since this naming also makes sense to the type-writer
>>   model.
>>
>> With these changes to the uapi, we believe the type-writer model can
>> be supported without having to resort to a new descriptor format. The
>> type-writer model could then be supported, from the uapi point of
>> view, by setting a flag at bind time and providing a new flag bit in
>> the options field of the descriptor that signals to user space that
>> all packets have been written in a chunk. Or with a new chunk
>> completion queue as suggested by Mykyta in his latest feedback mail on
>> the list.
> 
> for the set:
> Acked-by: Alexei Starovoitov 
> Thank you for these fixes.
> According to unofficial feedback from brcm and netronome folks
> the descriptor format should work for these nics too.
> At some point we may consider second format, but I think SW
> should drive HW requirements and not the other way around.

LGTM as well, applied to bpf-next, thanks!
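
To make the descriptor arithmetic in the cover letter concrete, here is a
small sketch. It is an illustration under assumptions, not the uapi
header. The struct name desc_model and both helper functions are
hypothetical. The cover letter names a u64 addr and a u32 options field in
a 16 byte descriptor; the u32 len field is an assumption that fills the
remaining four bytes. The offset helper further assumes that chunk_size is
a power of two, which is what lets the offset live in the least
significant bits of the address.

```c
#include <stdint.h>

/* Hypothetical model of the 16 byte descriptor described above. */
struct desc_model {
	uint64_t addr;    /* relative address into the umem */
	uint32_t len;     /* assumption: frame length */
	uint32_t options; /* formerly padding, reserved for future use */
};

_Static_assert(sizeof(struct desc_model) == 16,
	       "descriptor model must stay 16 bytes");

/* Which chunk of the umem the address falls into. */
uint64_t chunk_index(uint64_t addr, uint32_t chunk_size)
{
	return addr / chunk_size;
}

/*
 * Where the packet starts inside that chunk.
 * Assumes chunk_size is a power of two.
 */
uint64_t chunk_offset(uint64_t addr, uint32_t chunk_size)
{
	return addr & ((uint64_t)chunk_size - 1);
}
```

For a chunk size of 2048, the address 5 * 2048 + 100 resolves to chunk 5 at
offset 100, with no separate offset field in the descriptor. The same
helpers work for addresses above 4G, which an index of type u32 multiplied
by a frame size could not always express.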

Re: [PATCH bpf-next v3 05/11] bpf: avoid retpoline for lookup/update/delete calls on maps

2018-06-04 Thread Daniel Borkmann

On 06/04/2018 08:25 PM, Jakub Kicinski wrote:
[...]
> We prefer to have both :)  Those of us who like to abbreviate can do
> that, and others can use completions.  I personally think Quentin did
> an awesome job on the completions, they cover the entire syntax unlike
> the iproute2 ones and we intend to keep them that way!

Fully agree, both make sense. Personally, I only use abbreviations on
bpftool so far. :)

Re: [PATCH net-next 0/2] net: phy: improve PM handling of PHY/MDIO

2018-06-04 Thread David Miller

From: Heiner Kallweit 
Date: Sat, 2 Jun 2018 22:33:36 +0200

> Current implementation of MDIO bus PM ops doesn't actually implement
> bus-specific PM ops but just calls PM ops defined on a device level,
> which doesn't seem to be fully in line with the core PM model.
> 
> When looking e.g. at __device_suspend() the PM core looks for PM ops
> of a device in a specific order:
> 1. device PM domain
> 2. device type
> 3. device class
> 4. device bus
> 
> I think there is good reason that there are no PM ops on device level.
> The situation can be improved by modeling PHYs as a device type of
> an MDIO device. If for some other type of MDIO device PM ops are
> needed, it could be modeled as struct device_type as well.

Andrew and Florian, it would be nice if one of you would review this
patch series.

Thank you.
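
The lookup order listed in the cover letter can be written down as a short
first-match search. The sketch below is a user-space model of that list
only, not the PM core. pm_ops_model, device_model and pick_pm_ops are
hypothetical names, and the model says nothing about what the real core
does when none of the four levels provides ops.

```c
#include <stddef.h>

/* Hypothetical stand-in for a set of PM callbacks. */
struct pm_ops_model {
	const char *owner;
};

/* Hypothetical device: one optional ops pointer per level from the list. */
struct device_model {
	const struct pm_ops_model *pm_domain_ops;
	const struct pm_ops_model *type_ops;
	const struct pm_ops_model *class_ops;
	const struct pm_ops_model *bus_ops;
};

/* Walk the levels in the order from the cover letter; first hit wins. */
const struct pm_ops_model *pick_pm_ops(const struct device_model *dev)
{
	if (dev->pm_domain_ops)
		return dev->pm_domain_ops; /* 1. device PM domain */
	if (dev->type_ops)
		return dev->type_ops;      /* 2. device type */
	if (dev->class_ops)
		return dev->class_ops;     /* 3. device class */
	if (dev->bus_ops)
		return dev->bus_ops;       /* 4. device bus */
	return NULL;
}
```

In this model, giving a PHY its ops at the device type level means they
are found at step 2, before any ops the MDIO bus itself provides.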

Re: [PATCH net-next 0/2] mlxsw: Fixes in offloading of mirror-to-gretap

2018-06-04 Thread David Miller

From: Ido Schimmel 
Date: Sat,  2 Jun 2018 21:09:33 +0300

> Petr says:
> 
> These two patches fix issues in offloading of mirror-to-gretap when
> bridge is present in the underlay.
> 
> In patch #1, reconsideration of SPAN configuration is not done right at
> the point that SWITCHDEV_OBJ_ID_PORT_VLAN deletion notification is
> distributed, but is postponed, because the notifications are actually
> distributed before the relevant change is implemented in the bridge.
> 
> Patch #2 fixes a problem in configuring VLAN tagging in situations when a
> VLAN device is on top of an 802.1Q bridge whose egress port is marked as
> "egress untagged". In that case, mlxsw would neglect to suppress the
> tagging implicitly assumed after the VLAN device was seen.

Series applied, thank you.

Re: [PATCH bpf-next v3 05/11] bpf: avoid retpoline for lookup/update/delete calls on maps

2018-06-04 Thread Jakub Kicinski

On Mon, 4 Jun 2018 13:02:25 +0200, Phil Sutter wrote:
> On Sun, Jun 03, 2018 at 07:08:55PM +0200, Jesper Dangaard Brouer wrote:
> > Secondly I personally *hate* how 'ip' does its short options
> > parsing and especially order/precedence ambiguity.  Phil Sutter
> > (Fedora/RHEL iproute2 maintainer) has a funny quiz illustrating the
> > ambiguity issues.  
> 
> Hehe, yes. It's a classical case of something smart evolving into a
> pain: At first there's only 'ip link', so you allow 'ip l' as a
> shortcut. Then someone implements 'ip l2tp' - so what do you do?

Good example, I like that "ip l" shows me the links because that's what
99.99% of people want when they type that command ;)

> Establish a policy of abbreviation having to be unique and break
> existing behaviour or accept the mess and head on.

Commands are tested in order of addition so older ones take precedence.

The iproute2 behaviour was replicated in bpftool on purpose, because
it should be very familiar to people.  It is to me at least.  And IMHO
it's better to be consistent with a well known tool than have our own
quirks and rules...

> My suggestion would be to not get into the abbreviated subcommands
> business at all but instead ship and maintain a bash-completion script.

We prefer to have both :)  Those of us who like to abbreviate can do
that, and others can use completions.  I personally think Quentin did
an awesome job on the completions, they cover the entire syntax unlike
the iproute2 ones and we intend to keep them that way!
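
The rule discussed in this thread, where commands are tested in order of
addition so that older ones take precedence, amounts to a first-match
prefix search over an ordered table. Here is a minimal sketch of that
idea. It is not the iproute2 or bpftool code: is_prefix() and
resolve_subcommand() are hypothetical names, and the two-entry table in
the usage note is only the link/l2tp example from the thread.

```c
#include <stddef.h>
#include <string.h>

/* True when abbrev is a non-empty prefix of full. */
static int is_prefix(const char *abbrev, const char *full)
{
	size_t n = strlen(abbrev);

	return n > 0 && strncmp(abbrev, full, n) == 0;
}

/*
 * Resolve what the user typed against a table kept in order of addition.
 * The first entry that the input is a prefix of wins, so older commands
 * shadow newer ones that share the same leading letters.
 */
const char *resolve_subcommand(const char *const *table, size_t count,
			       const char *typed)
{
	for (size_t i = 0; i < count; i++) {
		if (is_prefix(typed, table[i]))
			return table[i];
	}
	return NULL;
}
```

With the table { "link", "l2tp" }, the input "l" resolves to "link" and
"l2" resolves to "l2tp". A uniqueness policy would instead have to reject
"l" once the second entry exists, which is the behaviour change the thread
wants to avoid.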

Re: [bug] cxgb4: vrf stopped working with cxgb4 card

2018-06-04 Thread David Ahern

On 6/4/18 8:03 AM, AMG Zollner Robert wrote:
> I have noticed that vrf is not working with kernel v4.15.0 but was
> working with v4.13.0 when using cxgb4 Chelsio driver (T520-cr)
> 
> Setup:
> Two metal servers with a T520-cr card each, directly connected without a
> switch in between.
> 
>     SVR1  only ipfwd           SVR2     with vrf
> .----------------------------. .--------------------------.
> |                            | |                          |
> |    192.168.8.1 [  ens2f4]--|-|--[ens1f4] 192.168.8.2    |
> |    192.168.9.1 [ens2f4d1]--|-|-- 192.168.9.2 VRF=10     |
> `----------------------------' `--------------------------'
> 
> When vrf is not working there are no error messages (dmesg or iproute
> commands), tcpdump on the interface (SVR2.ens1f4d1) enslaved in vrf 10
> shows packets(arp req/reply) coming in and going out, but outgoing
> packets(arp reply) do not reach the other server SVR1.ens2f4d1
> 
> 
> Bisect:
> Found this commit to be the problem after doing a git bisect between
> v4.13..v4.15:
> 
> commit ba581f77df23c8ee70b372966e69cf10bc5453d8
> Author: Ganesh Goudar 
> Date:   Sat Sep 23 16:07:28 2017 +0530
> 
>     cxgb4: do DCB state reset in couple of places
> 
>     reset the driver's DCB state in couple of places
>     where it was missing.
> 
> 
> A bisect step was considered good when:
> - successful ping from SVR1 to SVR2.ens1f4d1 vrf interface
> - successful ping from SVR2 global to SVR2 vrf interface through SVR1 (l3
> forwarding) (this check was redundant, both tests fail or pass simultaneously)
> 
> The problem is still present on recent kernels also, checked v4.16.0 and
> v4.17-rc7
> 
> Disabling DCB support for the card fixes the problem (compiling the kernel
> with "CONFIG_CHELSIO_T4_DCB=n")
> 

Are you doing the VRF enslave while it is up?

If so, does it work ok if you change the sequence:

ip li set ens1f4d1 down
ip li set ens1f4d1 master 
ip li set ens1f4d1 up

[Patch net-next] netdev-FAQ: clarify DaveM's position for stable backports

2018-06-04 Thread Cong Wang

Per discussion with David at netconf 2018, let's clarify
DaveM's position of handling stable backports in netdev-FAQ.

This is important for people relying on upstream -stable
releases.

Cc: sta...@vger.kernel.org
Cc: Greg Kroah-Hartman 
Signed-off-by: Cong Wang 
---
 Documentation/networking/netdev-FAQ.txt | 9 +
 1 file changed, 9 insertions(+)

diff --git a/Documentation/networking/netdev-FAQ.txt b/Documentation/networking/netdev-FAQ.txt
index 2a3278d5cf35..6dde6686c870 100644
--- a/Documentation/networking/netdev-FAQ.txt
+++ b/Documentation/networking/netdev-FAQ.txt
@@ -179,6 +179,15 @@ A: No.  See above answer.  In short, if you think it really belongs in
    dash marker line as described in Documentation/process/submitting-patches.rst to
    temporarily embed that information into the patch that you send.
 
+Q: Are all networking bug fixes backported to all stable releases?
+
+A: Due to capacity, Dave could only take care of the backports for the last
+   3 stable releases. For earlier stable releases, each stable branch maintainer
+   is supposed to take care of them. If you find any patch is missing from an
+   earlier stable branch, please notify sta...@vger.kernel.org with either a
+   commit ID or a formal patch backported, and CC Dave and other relevant
+   networking developers.
+
 Q: Someone said that the comment style and coding convention is different
for the networking content.  Is this true?
 
-- 
2.13.0

[net-next 02/12] Documentation: e100: Update the Intel 10/100 driver doc

2018-06-04 Thread Jeff Kirsher

Over the years, several of the links have changed or are no longer valid
so update them.  In addition, the default values were incorrect for a
couple of parameters.

Converted the text file to the reStructuredText (RST) format, since the
Linux kernel documentation now uses this format for documentation.

Signed-off-by: Jeff Kirsher 
Tested-by: Aaron Brown 
---
 .../networking/{e100.txt => e100.rst} | 60 +--
 Documentation/networking/index.rst|  1 +
 MAINTAINERS   |  2 +-
 3 files changed, 29 insertions(+), 34 deletions(-)
 rename Documentation/networking/{e100.txt => e100.rst} (79%)

diff --git a/Documentation/networking/e100.txt b/Documentation/networking/e100.rst
similarity index 79%
rename from Documentation/networking/e100.txt
rename to Documentation/networking/e100.rst
index 54810b82c01a..d4d837027925 100644
--- a/Documentation/networking/e100.txt
+++ b/Documentation/networking/e100.rst
@@ -1,7 +1,7 @@
 Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters
 ==
 
-March 15, 2011
+June 1, 2018
 
 Contents
 
@@ -36,16 +36,9 @@ Channel Bonding documentation can be found in the Linux kernel source:
 Identifying Your Adapter
 
 
-For more information on how to identify your adapter, go to the Adapter &
-Driver ID Guide at:
-
-  http://support.intel.com/support/network/adapter/pro100/21397.htm
-
-For the latest Intel network drivers for Linux, refer to the following
-website. In the search field, enter your adapter name or type, or use the
-networking link on the left to search for your adapter:
-
-  http://downloadfinder.intel.com/scripts-df/support_intel.asp
+For information on how to identify your adapter, and for the latest Intel
+network drivers, refer to the Intel Support website:
+http://www.intel.com/support
 
 Driver Configuration Parameters
 ===
@@ -57,22 +50,26 @@ Rx Descriptors: Number of receive descriptors. A receive descriptor is a data
structure that describes a receive buffer and its attributes to the network
controller. The data in the descriptor is used by the controller to write
data from the controller to host memory. In the 3.x.x driver the valid range
-   for this parameter is 64-256. The default value is 64. This parameter can be
-   changed using the command:
+   for this parameter is 64-256. The default value is 256. This parameter can be
+   changed using the command::
 
-   ethtool -G eth? rx n, where n is the number of desired rx descriptors.
+   ethtool -G eth? rx n
+
+   Where n is the number of desired Rx descriptors.
 
 Tx Descriptors: Number of transmit descriptors. A transmit descriptor is a data
structure that describes a transmit buffer and its attributes to the network
controller. The data in the descriptor is used by the controller to read
data from the host memory to the controller. In the 3.x.x driver the valid
-   range for this parameter is 64-256. The default value is 64. This parameter
-   can be changed using the command:
+   range for this parameter is 64-256. The default value is 128. This parameter
+   can be changed using the command::
+
+   ethtool -G eth? tx n
 
-   ethtool -G eth? tx n, where n is the number of desired tx descriptors.
+   Where n is the number of desired Tx descriptors.
 
 Speed/Duplex: The driver auto-negotiates the link speed and duplex settings by
-   default. The ethtool utility can be used as follows to force speed/duplex.
+   default. The ethtool utility can be used as follows to force speed/duplex.::
 
ethtool -s eth?  autoneg off speed {10|100} duplex {full|half}
 
@@ -81,7 +78,7 @@ Speed/Duplex: The driver auto-negotiates the link speed and duplex settings by
 
 Event Log Message Level:  The driver uses the message level flag to log events
to syslog. The message level can be set at driver load time. It can also be
-   set using the command:
+   set using the command::
 
ethtool -s eth? msglvl n
 
@@ -112,9 +109,9 @@ Additional Configurations
   -
   In order to see link messages and other Intel driver information on your
   console, you must set the dmesg level up to six. This can be done by
-  entering the following on the command line before loading the e100 driver:
+  entering the following on the command line before loading the e100 driver::
 
-   dmesg -n 8
+   dmesg -n 6
 
   If you wish to see all messages issued by the driver, including debug
   messages, set the dmesg level to eight.
@@ -146,7 +143,8 @@ Additional Configurations
 
   NAPI (Rx polling mode) is supported in the e100 driver.
 
-  See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
+  See https://wiki.linuxfoundation.org/networking/napi for more information
+  on NAPI.
 
   Multiple Interfaces on Same Ethernet Broadcast Network

[net-next 01/12] e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

2018-06-04 Thread Jeff Kirsher

From: Benjamin Poirier 

There have been multiple reports of crashes that look like
kernel: RIP: 0010:[] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [] process_one_work+0x155/0x440
kernel:  [] worker_thread+0x116/0x4b0
kernel:  [] kthread+0xd2/0xf0
kernel:  [] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.

Reported-by: Achim Mildenberger 
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index d3fef7fefea8..acf1e8b52b8e 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3527,15 +3527,12 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca)
}
break;
case e1000_pch_spt:
-   if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
-   /* Stable 24MHz frequency */
-   incperiod = INCPERIOD_24MHZ;
-   incvalue = INCVALUE_24MHZ;
-   shift = INCVALUE_SHIFT_24MHZ;
-   adapter->cc.shift = shift;
-   break;
-   }
-   return -EINVAL;
+   /* Stable 24MHz frequency */
+   incperiod = INCPERIOD_24MHZ;
+   incvalue = INCVALUE_24MHZ;
+   shift = INCVALUE_SHIFT_24MHZ;
+   adapter->cc.shift = shift;
+   break;
case e1000_pch_cnp:
if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
/* Stable 24MHz frequency */
-- 
2.17.1

[net-next 08/12] ixgbe: introduce a helper to simplify code

2018-06-04 Thread Jeff Kirsher

From: YueHaibing 

ixgbe_dbg_reg_ops_read and ixgbe_dbg_netdev_ops_read copy-pasting
the same code except for ixgbe_dbg_netdev_ops_buf/ixgbe_dbg_reg_ops_buf,
so introduce a helper ixgbe_dbg_common_ops_read to remove redundant code.

Signed-off-by: YueHaibing 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ixgbe/ixgbe_debugfs.c  | 57 +++
 1 file changed, 21 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
index 55fe8114fe99..50dfb02fa34c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_debugfs.c
@@ -10,15 +10,9 @@ static struct dentry *ixgbe_dbg_root;
 
 static char ixgbe_dbg_reg_ops_buf[256] = "";
 
-/**
- * ixgbe_dbg_reg_ops_read - read for reg_ops datum
- * @filp: the opened file
- * @buffer: where to write the data for the user to read
- * @count: the size of the user's buffer
- * @ppos: file position offset
- **/
-static ssize_t ixgbe_dbg_reg_ops_read(struct file *filp, char __user *buffer,
-   size_t count, loff_t *ppos)
+static ssize_t ixgbe_dbg_common_ops_read(struct file *filp, char __user *buffer,
+size_t count, loff_t *ppos,
+char *dbg_buf)
 {
struct ixgbe_adapter *adapter = filp->private_data;
char *buf;
@@ -29,8 +23,7 @@ static ssize_t ixgbe_dbg_reg_ops_read(struct file *filp, char __user *buffer,
return 0;
 
buf = kasprintf(GFP_KERNEL, "%s: %s\n",
-   adapter->netdev->name,
-   ixgbe_dbg_reg_ops_buf);
+   adapter->netdev->name, dbg_buf);
if (!buf)
return -ENOMEM;
 
@@ -45,6 +38,20 @@ static ssize_t ixgbe_dbg_reg_ops_read(struct file *filp, char __user *buffer,
return len;
 }
 
+/**
+ * ixgbe_dbg_reg_ops_read - read for reg_ops datum
+ * @filp: the opened file
+ * @buffer: where to write the data for the user to read
+ * @count: the size of the user's buffer
+ * @ppos: file position offset
+ **/
+static ssize_t ixgbe_dbg_reg_ops_read(struct file *filp, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+   return ixgbe_dbg_common_ops_read(filp, buffer, count, ppos,
+ixgbe_dbg_reg_ops_buf);
+}
+
 /**
  * ixgbe_dbg_reg_ops_write - write into reg_ops datum
  * @filp: the opened file
@@ -121,33 +128,11 @@ static char ixgbe_dbg_netdev_ops_buf[256] = "";
  * @count: the size of the user's buffer
  * @ppos: file position offset
  **/
-static ssize_t ixgbe_dbg_netdev_ops_read(struct file *filp,
-char __user *buffer,
+static ssize_t ixgbe_dbg_netdev_ops_read(struct file *filp, char __user *buffer,
 size_t count, loff_t *ppos)
 {
-   struct ixgbe_adapter *adapter = filp->private_data;
-   char *buf;
-   int len;
-
-   /* don't allow partial reads */
-   if (*ppos != 0)
-   return 0;
-
-   buf = kasprintf(GFP_KERNEL, "%s: %s\n",
-   adapter->netdev->name,
-   ixgbe_dbg_netdev_ops_buf);
-   if (!buf)
-   return -ENOMEM;
-
-   if (count < strlen(buf)) {
-   kfree(buf);
-   return -ENOSPC;
-   }
-
-   len = simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf));
-
-   kfree(buf);
-   return len;
+   return ixgbe_dbg_common_ops_read(filp, buffer, count, ppos,
+ixgbe_dbg_netdev_ops_buf);
 }
 
 /**
-- 
2.17.1

[net-next 07/12] ixgbevf: fix possible race in the reset subtask

2018-06-04 Thread Jeff Kirsher

From: Emil Tantilov 

Extend the RTNL lock in ixgbevf_reset_subtask() to protect the state bits
check in addition to the call to ixgbevf_reinit_locked().

This is to make sure that we get the most up-to-date values for the bits
and avoid a possible race when going down.

Suggested-by: Zhiping du 
Signed-off-by: Emil Tantilov 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 2d5a706c3c29..59416eddd840 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -3141,15 +3141,17 @@ static void ixgbevf_reset_subtask(struct ixgbevf_adapter *adapter)
if (!test_and_clear_bit(__IXGBEVF_RESET_REQUESTED, &adapter->state))
return;
 
+   rtnl_lock();
/* If we're already down or resetting, just bail */
if (test_bit(__IXGBEVF_DOWN, &adapter->state) ||
test_bit(__IXGBEVF_REMOVING, &adapter->state) ||
-   test_bit(__IXGBEVF_RESETTING, &adapter->state))
+   test_bit(__IXGBEVF_RESETTING, &adapter->state)) {
+   rtnl_unlock();
return;
+   }
 
adapter->tx_timeout_count++;
 
-   rtnl_lock();
ixgbevf_reinit_locked(adapter);
rtnl_unlock();
 }
-- 
2.17.1

[net-next 06/12] ixgbevf: Fix coexistence of malicious driver detection with XDP

2018-06-04 Thread Jeff Kirsher

From: Alexander Duyck 

In the case of the VF driver it is supposed to provide a context descriptor
that allows us to provide information about the header offsets inside of
the frame. However in the case of XDP we don't really have any of that
information since the data is minimally processed. As a result we were
seeing malicious driver detection (MDD) events being triggered when the PF
had that functionality enabled.

To address this I have added a bit of new code that will "prime" the XDP
ring by providing one context descriptor that assumes the minimal setup of
an Ethernet frame which is an L2 header length of 14. With just that we can
provide enough information to make the hardware happy so that we don't
trigger MDD events.

Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf.h  |  1 +
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 36 +++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 70c75681495f..56a1031dcc07 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -76,6 +76,7 @@ enum ixgbevf_ring_state_t {
__IXGBEVF_TX_DETECT_HANG,
__IXGBEVF_HANG_CHECK_ARMED,
__IXGBEVF_TX_XDP_RING,
+   __IXGBEVF_TX_XDP_RING_PRIMED,
 };
 
 #define ring_is_xdp(ring) \
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 083041129539..2d5a706c3c29 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -991,24 +991,45 @@ static int ixgbevf_xmit_xdp_ring(struct ixgbevf_ring *ring,
return IXGBEVF_XDP_CONSUMED;
 
/* record the location of the first descriptor for this packet */
-   tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
-   tx_buffer->bytecount = len;
-   tx_buffer->gso_segs = 1;
-   tx_buffer->protocol = 0;
-
i = ring->next_to_use;
-   tx_desc = IXGBEVF_TX_DESC(ring, i);
+   tx_buffer = &ring->tx_buffer_info[i];
 
dma_unmap_len_set(tx_buffer, len, len);
dma_unmap_addr_set(tx_buffer, dma, dma);
tx_buffer->data = xdp->data;
-   tx_desc->read.buffer_addr = cpu_to_le64(dma);
+   tx_buffer->bytecount = len;
+   tx_buffer->gso_segs = 1;
+   tx_buffer->protocol = 0;
+
+   /* Populate minimal context descriptor that will provide for the
+* fact that we are expected to process Ethernet frames.
+*/
+   if (!test_bit(__IXGBEVF_TX_XDP_RING_PRIMED, &ring->state)) {
+   struct ixgbe_adv_tx_context_desc *context_desc;
+
+   set_bit(__IXGBEVF_TX_XDP_RING_PRIMED, &ring->state);
+
+   context_desc = IXGBEVF_TX_CTXTDESC(ring, 0);
+   context_desc->vlan_macip_lens   =
+   cpu_to_le32(ETH_HLEN << IXGBE_ADVTXD_MACLEN_SHIFT);
+   context_desc->seqnum_seed   = 0;
+   context_desc->type_tucmd_mlhl   =
+   cpu_to_le32(IXGBE_TXD_CMD_DEXT |
+   IXGBE_ADVTXD_DTYP_CTXT);
+   context_desc->mss_l4len_idx = 0;
+
+   i = 1;
+   }
 
/* put descriptor type bits */
cmd_type = IXGBE_ADVTXD_DTYP_DATA |
   IXGBE_ADVTXD_DCMD_DEXT |
   IXGBE_ADVTXD_DCMD_IFCS;
cmd_type |= len | IXGBE_TXD_CMD;
+
+   tx_desc = IXGBEVF_TX_DESC(ring, i);
+   tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
tx_desc->read.olinfo_status =
cpu_to_le32((len << IXGBE_ADVTXD_PAYLEN_SHIFT) |
@@ -1688,6 +1709,7 @@ static void ixgbevf_configure_tx_ring(struct ixgbevf_adapter *adapter,
   sizeof(struct ixgbevf_tx_buffer) * ring->count);
 
clear_bit(__IXGBEVF_HANG_CHECK_ARMED, &ring->state);
+   clear_bit(__IXGBEVF_TX_XDP_RING_PRIMED, &ring->state);
 
IXGBE_WRITE_REG(hw, IXGBE_VFTXDCTL(reg_idx), txdctl);
 
-- 
2.17.1
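
For readers following the patch above, the "prime once" pattern can be sketched outside the driver as plain C. This is only an illustration under stated assumptions: every sketch_* name, the 8-entry ring and the bool flag are hypothetical stand-ins (the driver uses the __IXGBEVF_TX_XDP_RING_PRIMED state bit and real hardware descriptors), and the sketch omits the free-descriptor check and DMA handling that the driver performs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RING_SIZE 8
#define SKETCH_ETH_HLEN  14	/* L2 header length of an untagged Ethernet frame */

enum sketch_desc_type {
	SKETCH_DESC_EMPTY,
	SKETCH_DESC_CONTEXT,
	SKETCH_DESC_DATA
};

/* Hypothetical descriptor; real descriptors are packed hardware words. */
struct sketch_desc {
	enum sketch_desc_type type;
	uint32_t maclen;	/* only meaningful for context descriptors */
	uint32_t len;		/* only meaningful for data descriptors */
};

struct sketch_ring {
	struct sketch_desc desc[SKETCH_RING_SIZE];
	unsigned int next_to_use;
	bool primed;		/* stands in for __IXGBEVF_TX_XDP_RING_PRIMED */
};

/* Plays the role of ring (re)configuration: a reset forgets the priming. */
static void sketch_ring_reset(struct sketch_ring *ring)
{
	memset(ring, 0, sizeof(*ring));
}

/* Queue one frame and return the slot used for its data descriptor. */
static unsigned int sketch_xmit(struct sketch_ring *ring, uint32_t len)
{
	unsigned int i = ring->next_to_use;

	/* First transmit after a reset: write a minimal context descriptor
	 * into slot 0 and move the data descriptor to slot 1.
	 */
	if (!ring->primed) {
		ring->primed = true;
		ring->desc[0].type = SKETCH_DESC_CONTEXT;
		ring->desc[0].maclen = SKETCH_ETH_HLEN;
		i = 1;
	}

	ring->desc[i].type = SKETCH_DESC_DATA;
	ring->desc[i].len = len;
	ring->next_to_use = (i + 1) % SKETCH_RING_SIZE;

	return i;
}
```

The first call after sketch_ring_reset() consumes two slots (context plus data); later calls consume one each until the ring is reset again, which matches the clear_bit() added to the ring configuration path in the patch.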

[net-next 11/12] ixgbe: check ipsec ip addr against mgmt filters

2018-06-04 Thread Jeff Kirsher

From: Shannon Nelson 

Make sure we don't try to offload the decryption of an incoming
packet that should get delivered to the management engine.  This
is a corner case that will likely be very seldom seen, but could
really confuse someone if they were to hit it.

Suggested-by: Jesse Brandeburg 
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ixgbe/ixgbe_ipsec.c| 88 +++
 1 file changed, 88 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
index 99b170f1efd1..e1c976271bbd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
@@ -444,6 +444,89 @@ static int ixgbe_ipsec_parse_proto_keys(struct xfrm_state *xs,
return 0;
 }
 
+/**
+ * ixgbe_ipsec_check_mgmt_ip - make sure there is no clash with mgmt IP filters
+ * @xs: pointer to transformer state struct
+ **/
+static int ixgbe_ipsec_check_mgmt_ip(struct xfrm_state *xs)
+{
+   struct net_device *dev = xs->xso.dev;
+   struct ixgbe_adapter *adapter = netdev_priv(dev);
+   struct ixgbe_hw *hw = &adapter->hw;
+   u32 mfval, manc, reg;
+   int num_filters = 4;
+   bool manc_ipv4;
+   u32 bmcipval;
+   int i, j;
+
+#define MANC_EN_IPV4_FILTER  BIT(24)
+#define MFVAL_IPV4_FILTER_SHIFT  16
+#define MFVAL_IPV6_FILTER_SHIFT  24
+#define MIPAF_ARR(_m, _n)(IXGBE_MIPAF + ((_m) * 0x10) + ((_n) * 4))
+
+#define IXGBE_BMCIP(_n)  (0x5050 + ((_n) * 4))
+#define IXGBE_BMCIPVAL   0x5060
+#define BMCIP_V4 0x2
+#define BMCIP_V6 0x3
+#define BMCIP_MASK   0x3
+
+   manc = IXGBE_READ_REG(hw, IXGBE_MANC);
+   manc_ipv4 = !!(manc & MANC_EN_IPV4_FILTER);
+   mfval = IXGBE_READ_REG(hw, IXGBE_MFVAL);
+   bmcipval = IXGBE_READ_REG(hw, IXGBE_BMCIPVAL);
+
+   if (xs->props.family == AF_INET) {
+   /* are there any IPv4 filters to check? */
+   if (manc_ipv4) {
+   /* the 4 ipv4 filters are all in MIPAF(3, i) */
+   for (i = 0; i < num_filters; i++) {
+   if (!(mfval & BIT(MFVAL_IPV4_FILTER_SHIFT + i)))
+   continue;
+
+   reg = IXGBE_READ_REG(hw, MIPAF_ARR(3, i));
+   if (reg == xs->id.daddr.a4)
+   return 1;
+   }
+   }
+
+   if ((bmcipval & BMCIP_MASK) == BMCIP_V4) {
+   reg = IXGBE_READ_REG(hw, IXGBE_BMCIP(3));
+   if (reg == xs->id.daddr.a4)
+   return 1;
+   }
+
+   } else {
+   /* if there are ipv4 filters, they are in the last ipv6 slot */
+   if (manc_ipv4)
+   num_filters = 3;
+
+   for (i = 0; i < num_filters; i++) {
+   if (!(mfval & BIT(MFVAL_IPV6_FILTER_SHIFT + i)))
+   continue;
+
+   for (j = 0; j < 4; j++) {
+   reg = IXGBE_READ_REG(hw, MIPAF_ARR(i, j));
+   if (reg != xs->id.daddr.a6[j])
+   break;
+   }
+   if (j == 4)   /* did we match all 4 words? */
+   return 1;
+   }
+
+   if ((bmcipval & BMCIP_MASK) == BMCIP_V6) {
+   for (j = 0; j < 4; j++) {
+   reg = IXGBE_READ_REG(hw, IXGBE_BMCIP(j));
+   if (reg != xs->id.daddr.a6[j])
+   break;
+   }
+   if (j == 4)   /* did we match all 4 words? */
+   return 1;
+   }
+   }
+
+   return 0;
+}
+
 /**
  * ixgbe_ipsec_add_sa - program device with a security association
  * @xs: pointer to transformer state struct
@@ -465,6 +548,11 @@ static int ixgbe_ipsec_add_sa(struct xfrm_state *xs)
return -EINVAL;
}
 
+   if (ixgbe_ipsec_check_mgmt_ip(xs)) {
+   netdev_err(dev, "IPsec IP addr clash with mgmt filters\n");
+   return -EINVAL;
+   }
+
if (xs->xso.flags & XFRM_OFFLOAD_INBOUND) {
struct rx_sa rsa;
 
-- 
2.17.1
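
The IPv6 branch of the check above amounts to "for each enabled filter, compare all four 32-bit words and report a clash only on a full match". A minimal standalone sketch of that loop follows. It assumes the filter contents have already been read into a plain struct: sketch_mgmt_filters and sketch_ipv6_clashes are hypothetical names, not part of the ixgbe driver, and byte-order handling of the register values is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_FILTERS 4
#define SKETCH_IPV6_WORDS  4

/* Hypothetical snapshot of the filter state, standing in for register reads. */
struct sketch_mgmt_filters {
	uint32_t enable_mask;	/* bit i set => filter i is valid */
	uint32_t addr[SKETCH_NUM_FILTERS][SKETCH_IPV6_WORDS];
};

/* Return true if daddr equals any enabled filter among the first num_filters. */
static bool sketch_ipv6_clashes(const struct sketch_mgmt_filters *f,
				const uint32_t daddr[SKETCH_IPV6_WORDS],
				int num_filters)
{
	int i, j;

	for (i = 0; i < num_filters; i++) {
		/* skip filters that are not marked valid */
		if (!(f->enable_mask & (1u << i)))
			continue;

		for (j = 0; j < SKETCH_IPV6_WORDS; j++) {
			if (f->addr[i][j] != daddr[j])
				break;
		}

		/* the inner loop ran to completion: all four words matched */
		if (j == SKETCH_IPV6_WORDS)
			return true;
	}

	return false;
}
```

Passing num_filters = 3 instead of 4 mirrors the patch's handling of the case where the last IPv6 slot is occupied by IPv4 filters.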
