date:20171011

[RFC] Support for UNARP (RFC 1868)

2017-10-11 Thread Girish Moodalbail

Add support for UNARP, as detailed in IETF RFC 1868 (ARP Extension -
UNARP). The central idea is for a node to announce that it is leaving
the network so that all the nodes on the L2 broadcast domain can update
their ARP tables accordingly (i.e., mark the neighbor entry state as
FAILED). Even though the ARP timers on those nodes would eventually mark
such entries as FAILED, it is more robust if the entries get marked
FAILED sooner, with help from the host that is going away.
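
The receive-side effect described above can be sketched in user space.
This is only an illustration under stated assumptions: the table layout,
the state names and nbr_handle_unarp() are hypothetical and are not the
kernel's neighbour API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified neighbour states (not the kernel's NUD_* values). */
enum nbr_state { NBR_REACHABLE, NBR_FAILED };

struct nbr_entry {
	uint32_t ip;		/* IPv4 address, kept as a plain integer here */
	uint8_t mac[6];
	enum nbr_state state;
};

/*
 * Handle an UNARP announcement for 'ip': if we hold an entry for that
 * address, mark it FAILED so the next transmit triggers a fresh ARP
 * request. Returns 1 if an entry was found, 0 otherwise.
 */
static int nbr_handle_unarp(struct nbr_entry *tbl, size_t n, uint32_t ip)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].ip == ip) {
			tbl[i].state = NBR_FAILED;
			return 1;
		}
	}
	return 0;	/* no entry: nothing to do, and never create one */
}
```

Entries for other addresses are left untouched, and an UNARP for an
unknown address is a no-op.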

Besides providing a solution for the use case captured in the RFC, an
IP address moving across a proxy server, this feature is even more
important for certain use cases in the Cloud. Imagine a tenant who is
bringing VM instances up and down for some workload of theirs. If these
instances are part of a small subnet, then new VM instances may be
assigned the same IP address (since the subnet pool is small) but with a
different MAC address. So, if a client has a stale mapping of the IP
address to the old MAC address, then that client will fail to
communicate with the new VM instance for some time.

Another use case that comes to mind is live VM migration. Imagine a
client that is communicating with a VM. Now, let us migrate this VM to a
destination machine. The IP address to MAC address mapping for the VM
doesn't change after the live migration. However, there will be a small
window (until the VM sends a gratuitous ARP from the destination
machine) during which packets from the client will be forwarded to the
source machine. This occurs because:

 - the ARP entry in the client is not invalidated yet, so it continues
   to use the same MAC address, and

 - the MAC address tables of the intermediate switches between the
   client and the source machine are not yet updated for the MAC address
   move.

This issue of forwarding packets to the wrong target could be avoided by
sending UNARP packets from the source machine. This would invalidate the
ARP entry on the client and force it to resolve the IP address again by
broadcasting an ARP request to the network. The VM on the destination
machine would then send an ARP response, which should also update the
MAC address tables of the intermediate switches.

The following changes implement the UNARP receive processing in the
kernel. Once the changes are in the kernel, the arping(8) program can be
updated to send UNARP packets.
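
As a rough user-space illustration of the receive-side parsing in the
patch below: an UNARP packet is recognised by a zero hardware address
length, so the hardware address fields are absent and the pointer must
advance by ar_hln rather than by the device address length. The names
arp_view and arp_parse() are hypothetical, and the sketch assumes IPv4
(protocol address length 4).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parsed view of an ARP packet; sha/tha point into the caller's buffer. */
struct arp_view {
	int is_unarp;
	const uint8_t *sha;	/* NULL for UNARP (no hardware address) */
	const uint8_t *tha;	/* NULL for UNARP */
	uint8_t sip[4];
	uint8_t tip[4];
};

/*
 * Wire layout: hrd(2) pro(2) hln(1) pln(1) op(2), then
 * sha(hln) sip(pln) tha(hln) tip(pln).
 * Returns 0 on success, -1 if the buffer is too short or pln != 4.
 */
static int arp_parse(const uint8_t *buf, size_t len, struct arp_view *v)
{
	if (len < 8)
		return -1;

	uint8_t hln = buf[4];
	uint8_t pln = buf[5];

	if (pln != 4 || len < 8 + 2 * (size_t)hln + 8)
		return -1;

	const uint8_t *p = buf + 8;

	v->is_unarp = (hln == 0);
	v->sha = v->is_unarp ? NULL : p;
	p += hln;		/* advance by the length in the packet */
	memcpy(v->sip, p, 4);
	p += 4;
	v->tha = v->is_unarp ? NULL : p;
	p += hln;
	memcpy(v->tip, p, 4);
	return 0;
}
```

Unlike this sketch, the kernel path relies on arp_rcv having validated
the header before arp_process runs.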

Any Thoughts/Comments?

Signed-off-by: Girish Moodalbail 
---

Compile-tested only.

 net/ipv4/arp.c | 46 +++---
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 7c45b88..8cb9aa1 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -686,6 +686,7 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
struct neighbour *n;
struct dst_entry *reply_dst = NULL;
bool is_garp = false;
+   bool is_unarp;
 
/* arp_rcv below verifies the ARP header and verifies the device
 * is ARP'able.
@@ -695,6 +696,8 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
goto out_free_skb;
 
arp = arp_hdr(skb);
+   /* arp_rcv has already verified the header for the UNARP case */
+   is_unarp = arp->ar_hln == 0;
 
switch (dev_type) {
default:
@@ -741,8 +744,8 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
  * Extract fields
  */
arp_ptr = (unsigned char *)(arp + 1);
-   sha = arp_ptr;
-   arp_ptr += dev->addr_len;
+   sha = is_unarp ? NULL : arp_ptr;
+   arp_ptr += arp->ar_hln;
memcpy(&sip, arp_ptr, 4);
arp_ptr += 4;
switch (dev_type) {
@@ -751,8 +754,8 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
break;
 #endif
default:
-   tha = arp_ptr;
-   arp_ptr += dev->addr_len;
+   tha = is_unarp ? NULL : arp_ptr;
+   arp_ptr += arp->ar_hln;
}
memcpy(&tip, arp_ptr, 4);
 /*
@@ -874,7 +877,10 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
   It is possible, that this option should be enabled for some
   devices (strip is candidate)
 */
-   if (!n &&
+   /* If the packet is UNARP and we don't have the corresponding
+* neighbour entry, then there is nothing to do.
+*/
+   if (!n && !is_unarp &&
(is_garp ||
 (arp->ar_op == htons(ARPOP_REPLY) &&
  (addr_type == RTN_UNICAST ||
@@ -899,12 +905,15 @@ static int arp_process(struct net *net, struct sock *sk, 
struct sk_buff *skb)
  NEIGH_VAR(n->parms, LOCKTIME)) ||

Re: [PATCH] rtl8xxxu: mark expected switch fall-throughs

2017-10-11 Thread Kalle Valo

Jes Sorensen  writes:

> On 10/11/2017 04:41 AM, Kalle Valo wrote:
>> Jes Sorensen  writes:
>>
>>> On 10/10/2017 03:30 PM, Gustavo A. R. Silva wrote:
>>>> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
>>>> where we are expecting to fall through.
>>>
>>> While this isn't harmful, to me this looks like pointless patch churn
>>> for zero gain and it's just ugly.
>>
>> In general I find it useful to mark fall through cases. And it's just a
>> comment with two words, so they cannot hurt your eyes that much.
>
> I don't see them being harmful in the code, but I don't see them of
> much use either. If it happened as part of natural code development,
> fine. My objection is to people running around doing this
> systematically causing patch churn for little to zero gain.

We do receive quite a lot of these kinds of cleanup patches found with
various analysers and tools. I guess one could classify those as churn
but I think the net result is still very much on the positive side. And
this patch in particular seems useful to me and I think we should take
it.

-- 
Kalle Valo
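
For readers unfamiliar with the annotation being debated, here is a
minimal sketch of what "marking an expected fall-through" looks like.
The function stages_from() is a made-up example, not code from
rtl8xxxu. GCC's -Wimplicit-fallthrough, at its default level, is
documented to accept comments of this form as a marker; other compilers
may require an attribute instead.

```c
/*
 * Hypothetical example: count how many setup stages run when starting
 * at a given stage. Each case deliberately continues into the next one.
 */
static int stages_from(int start)
{
	int ran = 0;

	switch (start) {
	case 0:
		ran++;
		/* fall through */
	case 1:
		ran++;
		/* fall through */
	case 2:
		ran++;
		break;
	default:
		break;
	}

	return ran;
}
```

Without the comments the behaviour is identical; they only tell the
compiler, and the reader, that the missing break is intentional.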

[PATCH net-next] tcp: remove obsolete helpers

2017-10-11 Thread Eric Dumazet

From: Eric Dumazet 

Remove three inline helpers that are no longer needed.

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h |   17 -
 1 file changed, 17 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 
15163454174babdcb465904f725b919268dd1bc7..3b3b9b968e2d4b3469aa4f69d708a440bda578e7
 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1629,18 +1629,6 @@ static inline struct sk_buff *tcp_write_queue_tail(const 
struct sock *sk)
return skb_peek_tail(&sk->sk_write_queue);
 }
 
-static inline struct sk_buff *tcp_write_queue_next(const struct sock *sk,
-  const struct sk_buff *skb)
-{
-   return skb_queue_next(&sk->sk_write_queue, skb);
-}
-
-static inline struct sk_buff *tcp_write_queue_prev(const struct sock *sk,
-  const struct sk_buff *skb)
-{
-   return skb_queue_prev(&sk->sk_write_queue, skb);
-}
-
 #define tcp_for_write_queue_from_safe(skb, tmp, sk)\
skb_queue_walk_from_safe(&(sk)->sk_write_queue, skb, tmp)
 
@@ -1697,11 +1685,6 @@ static inline void tcp_add_write_queue_tail(struct sock 
*sk, struct sk_buff *skb
}
 }
 
-static inline void __tcp_add_write_queue_head(struct sock *sk, struct sk_buff 
*skb)
-{
-   __skb_queue_head(&sk->sk_write_queue, skb);
-}
-
 /* Insert new before skb on the write queue of sk.  */
 static inline void tcp_insert_write_queue_before(struct sk_buff *new,
  struct sk_buff *skb,

[PATCH v2] net: ftgmac100: Request clock and set speed

2017-10-11 Thread Joel Stanley

According to the ASPEED datasheet, gigabit speeds require a clock of
100MHz or higher. Other speeds require 25MHz or higher. This patch
configures a 100MHz clock if the system has a direct-attached
PHY, or 25MHz if the system is running NC-SI, which is limited to
100Mbit.

There appear to be no other upstream users of the FTGMAC100 driver, so
it is hard to know the clocking requirements of other platforms.
Therefore a conservative approach was taken to enabling clocks: if the
platform is not ASPEED, both requesting the clock and configuring the
speed are skipped.
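
The selection rule described above can be written as a small pure
function. This is a sketch only: ftgmac_pick_rate() and the DEMO_
constants are hypothetical, the rates are spelled out in Hz on the
assumption that this is what the clock framework expects, and a return
value of 0 is used here to mean "leave the clock alone".

```c
#include <stdbool.h>

#define DEMO_100MHZ	100000000UL	/* direct-attached PHY, up to 1000Mbit */
#define DEMO_25MHZ	25000000UL	/* NC-SI, limited to 100Mbit */

/*
 * Pick the MAC clock rate in Hz, or 0 if the platform is not ASPEED and
 * the clock must not be touched.
 */
static unsigned long ftgmac_pick_rate(bool is_aspeed, bool is_ncsi)
{
	if (!is_aspeed)
		return 0;

	return is_ncsi ? DEMO_25MHZ : DEMO_100MHZ;
}
```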

Signed-off-by: Joel Stanley 
---
Andrew, as I'm travelling can you please test this on the evb and a
palmetto? Use my wip/aspeed-v4.14-clk branch, or OpenBMC's dev-4.13.

David, please wait for Andrew's tested-by before applying.

Cheers!

v2:
 - only touch the clocks on Aspeed platforms
 - unconditionally call clk_disable_unprepare

 drivers/net/ethernet/faraday/ftgmac100.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index 9ed8e4b81530..cd352bf41da1 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -21,6 +21,7 @@
 
 #define pr_fmt(fmt)KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -59,6 +60,9 @@
 /* Min number of tx ring entries before stopping queue */
 #define TX_THRESHOLD   (MAX_SKB_FRAGS + 1)
 
+#define FTGMAC_100MHZ  100000000
+#define FTGMAC_25MHZ   25000000
+
 struct ftgmac100 {
/* Registers */
struct resource *res;
@@ -96,6 +100,7 @@ struct ftgmac100 {
struct napi_struct napi;
struct work_struct reset_task;
struct mii_bus *mii_bus;
+   struct clk *clk;
 
/* Link management */
int cur_speed;
@@ -1734,6 +1739,22 @@ static void ftgmac100_ncsi_handler(struct ncsi_dev *nd)
nd->link_up ? "up" : "down");
 }
 
+static void ftgmac100_setup_clk(struct ftgmac100_priv *priv)
+{
+   priv->clk = devm_clk_get(>dev, NULL);
+   if (IS_ERR(priv->clk))
+   return;
+
+   clk_prepare_enable(priv->clk);
+
+   /* Aspeed specifies a 100MHz clock is required for up to
+* 1000Mbit link speeds. As NCSI is limited to 100Mbit, 25MHz
+* is sufficient
+*/
+   clk_set_rate(priv->clk, priv->is_ncsi ? FTGMAC_25MHZ :
+   FTGMAC_100MHZ);
+}
+
 static int ftgmac100_probe(struct platform_device *pdev)
 {
struct resource *res;
@@ -1830,6 +1851,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
goto err_setup_mdio;
}
 
+   if (priv->is_aspeed)
+   ftgmac100_setup_clk(priv);
+
/* Default ring sizes */
priv->rx_q_entries = priv->new_rx_q_entries = DEF_RX_QUEUE_ENTRIES;
priv->tx_q_entries = priv->new_tx_q_entries = DEF_TX_QUEUE_ENTRIES;
@@ -1883,6 +1907,8 @@ static int ftgmac100_remove(struct platform_device *pdev)
 
unregister_netdev(netdev);
 
+   clk_disable_unprepare(priv->clk);
+
/* There's a small chance the reset task will have been re-queued,
 * during stop, make sure it's gone before we free the structure.
 */
-- 
2.14.1

Re: [PATCH][bpf-next] bpf: remove redundant variable old_flags

2017-10-11 Thread David Miller

From: Colin King 
Date: Wed, 11 Oct 2017 11:56:23 +0100

> From: Colin Ian King 
> 
> Variable old_flags is being assigned but is never read; it is redundant
> and can be removed.
> 
> Cleans up clang warning: Value stored to 'old_flags' is never read
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH net-next 0/3] mlx4_en XDP TX improvements

2017-10-11 Thread David Miller

From: Tariq Toukan 
Date: Wed, 11 Oct 2017 13:17:24 +0300

> This patchset contains performance improvements
> to the XDP_TX use case in the mlx4 Eth driver.
> 
> Patch 1 is a simple change in a function parameter type.
> Patch 2 replaces a call to a generic function with the
>   relevant parts inlined.
> Patch 3 moves the write of descriptors' constant values
>   from data path to control path.
> 
> Series generated against net-next commit:
> 833e0e2f24fd net: dst: move cpu inside ifdef to avoid compilation warning

Series applied, thanks.

Re: [PATCH][net-next] net: mpls: make function ipgre_mpls_encap_hlen static

2017-10-11 Thread David Miller

From: Colin King 
Date: Wed, 11 Oct 2017 10:53:28 +0100

> From: Colin Ian King 
> 
> The function ipgre_mpls_encap_hlen is local to the source and
> does not need to be in global scope, so make it static.
> 
> Cleans up sparse warning:
> symbol 'ipgre_mpls_encap_hlen' was not declared. Should it be static?
> 
> Signed-off-by: Colin Ian King 

Applied, thanks Colin.

Re: [PATCH][next] sctp: make array sctp_sched_ops static

2017-10-11 Thread David Miller

From: Colin King 
Date: Wed, 11 Oct 2017 11:17:57 +0100

> From: Colin Ian King 
> 
> The array sctp_sched_ops  is local to the source and
> does not need to be in global scope, so make it static.
> 
> Cleans up sparse warning:
> symbol 'sctp_sched_ops' was not declared. Should it be static?
> 
> Signed-off-by: Colin Ian King 

Applied.

Re: [PATCH net-next 2/2] ipv6: addrconf: don't use rtnl mutex in RTM_GETADDR

2017-10-11 Thread David Miller

From: Florian Westphal 
Date: Wed, 11 Oct 2017 10:28:01 +0200

> Similar to the previous patch, use the device lookup functions
> that bump device refcount and flag this as DOIT_UNLOCKED to avoid
> rtnl mutex.
> 
> Signed-off-by: Florian Westphal 

Applied.

Re: [PATCH net-next 1/2] ipv6: addrconf: don't use rtnl mutex in RTM_GETNETCONF

2017-10-11 Thread David Miller

From: Florian Westphal 
Date: Wed, 11 Oct 2017 10:28:00 +0200

> Instead of relying on rtnl mutex, bump device reference count.
> After this change, values reported can change in parallel, but that's not
> much different from current state, as anyone can change the settings
> right after rtnl_unlock (and before userspace processed reply).
> 
> While at it, switch to GFP_KERNEL allocation.
> 
> Signed-off-by: Florian Westphal 

Applied.

Re: [patch net-next v2 0/4] net: sched: get rid of cls_flower->egress_dev

2017-10-11 Thread David Miller

From: Jiri Pirko 
Date: Wed, 11 Oct 2017 09:41:06 +0200

> From: Jiri Pirko 
> 
> Introduction of cls_flower->egress_dev was a workaround. It turned out
> to be a bit of an ugly hack, so replace it with more generic and
> reusable infrastructure.
> 
> This is a dependency of the shared block introduction that will be sent
> as a follow-up patchset group.

Series applied, thanks.

Re: [PATCH net] net/ncsi: Don't limit vids based on hot_channel

2017-10-11 Thread David Miller

From: Samuel Mendoza-Jonas 
Date: Wed, 11 Oct 2017 16:54:27 +1100

> Currently we drop any new VLAN ids if there are more than the current
> (or last used) channel can support. Most importantly this is a problem
> if no channel has been selected yet, resulting in a segfault.
> 
> Secondly this does not necessarily reflect the capabilities of any other
> channels. Instead only drop a new VLAN id if we are already tracking the
> maximum allowed by the NCSI specification. Per-channel limits are
> already handled by ncsi_add_filter(), but add a message to set_one_vid()
> to make it obvious that the channel can not support any more VLAN ids.
> 
> Signed-off-by: Samuel Mendoza-Jonas 

Applied, thanks.

Re: [PATCH net-next v2 0/7] net: qualcomm: rmnet: Rewrite some existing functionality

2017-10-11 Thread David Miller

From: Subash Abhinov Kasiviswanathan 
Date: Wed, 11 Oct 2017 18:43:51 -0600

> This series fixes some of the broken rmnet functionality.
> Bridge mode is re-written and made useable and the muxed_ep is converted to 
> hlist.
> 
> Patches 1-5 are cleanups in preparation for these changes.
> Patch 6 does the hlist conversion.
> Patch 7 has the implementation of the rmnet bridge mode.
> 
> v1->v2: Fix the warning and code style issue in rmnet_rx_handler as
> mentioned by David.

This looks better, series applied, thanks!

[PATCH] ip: update policy routing config help

2017-10-11 Thread Stephen Hemminger

The kernel config help for policy routing was still pointing at
an ancient document from 2000 that refers to Linux 2.1. Update it
to point to something that is at least occasionally updated.

Signed-off-by: Stephen Hemminger 
---
 net/ipv4/Kconfig | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 91a2557942fa..f48fe6fc7e8c 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -70,11 +70,9 @@ config IP_MULTIPLE_TABLES
  address into account. Furthermore, the TOS (Type-Of-Service) field
  of the packet can be used for routing decisions as well.
 
- If you are interested in this, please see the preliminary
- documentation at 
- and .
- You will need supporting software from
- .
+ If you need more information, see the Linux Advanced
+ Routing and Traffic Control documentation at
+ 
 
  If unsure, say N.
 
-- 
2.11.0

Re: [PATCH] net: ftgmac100: Request clock and set speed

2017-10-11 Thread Joel Stanley

On Tue, Oct 10, 2017 at 4:14 PM, Benjamin Herrenschmidt
 wrote:
> On Tue, 2017-10-10 at 15:19 +1030, Joel Stanley wrote:
>> According to the ASPEED datasheet, gigabit speeds require a clock of
>> 100MHz or higher. Other speeds require 25MHz or higher.
>
> Did you try "live" changing by either using ethtool or plugging into
> switches/hubs at different speed ?
>
> Also this is aspeed'isms, we should probably keep that under an
> is_aspeed test.
>
> My assumption is that we wouldn't bother, and just leave the freq
> set based on whether there's a physical gigabit capable connection or
> not (ie, real gigabit PHY vs. NC-SI really). But if it can help save a
> few milliwatts..

I didn't try changing the link speed at runtime. I don't have a setup
that lets me precisely measure power consumption, so it's hard to know
what the benefits are. In the future I can revisit this and do those
measurements.

I'll change it to be as you suggest; 100MHz for PHY and 50MHz for NC-SI.

Cheers,

Joel

Re: [PATCH net-next 1/1] bridge: return error code when deleting Vlan

2017-10-11 Thread David Ahern

On 10/11/17 3:29 PM, Roman Mashak wrote:
> Signed-off-by: Roman Mashak 
> ---
>  net/bridge/br_netlink.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index f0e8268..a1e1ca8 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -527,11 +527,11 @@ static int br_vlan_info(struct net_bridge *br, struct 
> net_bridge_port *p,
>  
>   case RTM_DELLINK:
>   if (p) {
> - nbp_vlan_delete(p, vinfo->vid);
> + err = nbp_vlan_delete(p, vinfo->vid);
>   if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER)
> - br_vlan_delete(p->br, vinfo->vid);
> + err = br_vlan_delete(p->br, vinfo->vid);

err is reset here. What if nbp_vlan_delete fails and br_vlan_delete
succeeds?


>   } else {
> - br_vlan_delete(br, vinfo->vid);
> + err = br_vlan_delete(br, vinfo->vid);
>   }
>   break;
>   }
>
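
One possible way to address the concern raised above is to keep the
first failure instead of letting the second call overwrite it. The
sketch below uses stand-in function pointers; vlan_delete_both() and the
demo_* stubs are hypothetical and are not the bridge code.

```c
#include <errno.h>

/* Hypothetical stand-in type for nbp_vlan_delete() / br_vlan_delete(). */
typedef int (*del_fn)(int vid);

/* Demo stubs so the sketch is self-contained. */
static int demo_del_ok(int vid)
{
	(void)vid;
	return 0;
}

static int demo_del_missing(int vid)
{
	(void)vid;
	return -ENOENT;
}

/*
 * Run both deletions and report the first failure; a later success must
 * not hide an earlier error.
 */
static int vlan_delete_both(del_fn del_port, del_fn del_master, int vid)
{
	int err = del_port(vid);
	int err2 = del_master(vid);

	return err ? err : err2;
}
```

Whether the second deletion should still be attempted after the first
one fails is a separate policy question that this sketch does not
settle.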

[PATCH net-next 0/2] Add mqprio hardware offload support in hns3 driver

2017-10-11 Thread Yunsheng Lin

This patchset adds a new hardware offload type in mqprio before adding
mqprio hardware offload support in hns3 driver.

Yunsheng Lin (2):
  mqprio: Add a new hardware offload type in mqprio
  net: hns3: Add mqprio hardware offload support in hns3 driver

 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  1 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 23 +++
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 46 ++
 include/uapi/linux/pkt_sched.h |  1 +
 4 files changed, 55 insertions(+), 16 deletions(-)

-- 
1.9.1

[PATCH net-next 1/2] mqprio: Add a new hardware offload type in mqprio

2017-10-11 Thread Yunsheng Lin

When a driver supports both DCB and hardware-offloaded mqprio, and the
user is running the mqprio and dcb tools concurrently, the
configurations set by the two tools may conflict, because dcb and mqprio
may be using the same hardware offload component and share the tc system
in the network stack.

This patch adds a new offload type to indicate that the underlying
driver offloads the prio mapping as part of DCB. If the driver is
incapable of that, it refuses the offload, and the user then has to
explicitly request that qdisc offload.

Signed-off-by: Yunsheng Lin 
Suggested-by: Yuval Mintz 
---
 include/uapi/linux/pkt_sched.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf55..8016027 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -620,6 +620,7 @@ struct tc_drr_stats {
 enum {
TC_MQPRIO_HW_OFFLOAD_NONE,  /* no offload requested */
TC_MQPRIO_HW_OFFLOAD_TCS,   /* offload TCs, no queue counts */
+   TC_MQPRIO_HW_OFFLOAD_DCB,   /* offload shared by DCB */
__TC_MQPRIO_HW_OFFLOAD_MAX
 };
 
-- 
1.9.1

[PATCH net-next 2/2] net: hns3: Add mqprio hardware offload support in hns3 driver

2017-10-11 Thread Yunsheng Lin

When using tc qdisc, dcb_ops->setup_tc is used to tell the hclge_dcb
module to do the TM-related setup. Only the TC_MQPRIO_HW_OFFLOAD_DCB
offload type is supported.
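
A minimal sketch of the acceptance rule stated above, assuming the
driver rejects every offload type other than the DCB one and bounds the
TC count. The enum is redeclared locally with DEMO_ names so the block
is self-contained; demo_check_mqprio() and its error codes are
illustrative choices, not the hns3 code.

```c
#include <errno.h>

/* Local mirror of the offload types, in the order used by the uapi enum. */
enum demo_mqprio_hw {
	DEMO_OFFLOAD_NONE,	/* no offload requested */
	DEMO_OFFLOAD_TCS,	/* offload TCs, no queue counts */
	DEMO_OFFLOAD_DCB,	/* offload shared by DCB */
};

/*
 * Accept only the DCB-shared offload type and a TC count within the
 * hardware limit. Returns 0 if the request can be offloaded.
 */
static int demo_check_mqprio(unsigned int hw, unsigned int tc,
			     unsigned int tc_max)
{
	if (hw != DEMO_OFFLOAD_DCB)
		return -EOPNOTSUPP;

	if (tc > tc_max)
		return -EINVAL;

	return 0;
}
```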

Signed-off-by: Yunsheng Lin 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.h|  1 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 23 +++
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 46 ++
 3 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.h 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
index 575f50d..3acd8db 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.h
@@ -381,6 +381,7 @@ struct hnae3_dcb_ops {
u8   (*setdcbx)(struct hnae3_handle *, u8);
 
int (*map_update)(struct hnae3_handle *);
+   int (*setup_tc)(struct hnae3_handle *, u8, u8 *);
 };
 
 struct hnae3_ae_algo {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index 1b30a6f..7ec9484 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -276,6 +276,28 @@ static u8 hclge_setdcbx(struct hnae3_handle *h, u8 mode)
return 0;
 }
 
+static int hclge_setup_tc(struct hnae3_handle *h, u8 tc, u8 *prio_tc)
+{
+   struct hclge_vport *vport = hclge_get_vport(h);
+   struct hclge_dev *hdev = vport->back;
+   int ret;
+
+   if (tc > hdev->tc_max) {
+   dev_err(&hdev->pdev->dev,
+   "setup tc failed, tc(%u) > tc_max(%u)\n",
+   tc, hdev->tc_max);
+   return -EINVAL;
+   }
+
+   hclge_tm_schd_info_update(hdev, tc);
+
+   ret = hclge_tm_prio_tc_info_update(hdev, prio_tc);
+   if (ret)
+   return ret;
+
+   return hclge_tm_init_hw(hdev);
+}
+
 static const struct hnae3_dcb_ops hns3_dcb_ops = {
.ieee_getets= hclge_ieee_getets,
.ieee_setets= hclge_ieee_setets,
@@ -284,6 +306,7 @@ static u8 hclge_setdcbx(struct hnae3_handle *h, u8 mode)
.getdcbx= hclge_getdcbx,
.setdcbx= hclge_setdcbx,
.map_update = hclge_map_update,
+   .setup_tc   = hclge_setup_tc,
 };
 
 void hclge_dcb_ops_set(struct hclge_dev *hdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
index ba550c1..79d8d6b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -1186,42 +1186,56 @@ static void hns3_nic_udp_tunnel_del(struct net_device 
*netdev,
}
 }
 
-static int hns3_setup_tc(struct net_device *netdev, u8 tc)
+static int hns3_setup_tc(struct net_device *netdev, u8 tc, u8 *prio_tc)
 {
struct hnae3_handle *h = hns3_get_handle(netdev);
struct hnae3_knic_private_info *kinfo = &h->kinfo;
+   bool if_running;
unsigned int i;
int ret;
 
if (tc > HNAE3_MAX_TC)
return -EINVAL;
 
-   if (kinfo->num_tc == tc)
-   return 0;
-
if (!netdev)
return -EINVAL;
 
-   if (!tc) {
-   netdev_reset_tc(netdev);
-   return 0;
+   if_running = netif_running(netdev);
+   if (if_running) {
+   hns3_nic_net_stop(netdev);
+   msleep(100);
}
 
-   /* Set num_tc for netdev */
-   ret = netdev_set_num_tc(netdev, tc);
+   ret = (kinfo->dcb_ops && kinfo->dcb_ops->setup_tc) ?
+   kinfo->dcb_ops->setup_tc(h, tc, prio_tc) : -EOPNOTSUPP;
if (ret)
-   return ret;
+   goto out;
+
+   if (tc <= 1) {
+   netdev_reset_tc(netdev);
+   } else {
+   ret = netdev_set_num_tc(netdev, tc);
+   if (ret)
+   goto out;
+
+   for (i = 0; i < HNAE3_MAX_TC; i++) {
+   if (!kinfo->tc_info[i].enable)
+   continue;
 
-   /* Set per TC queues for the VSI */
-   for (i = 0; i < HNAE3_MAX_TC; i++) {
-   if (kinfo->tc_info[i].enable)
netdev_set_tc_queue(netdev,
kinfo->tc_info[i].tc,
kinfo->tc_info[i].tqp_count,
kinfo->tc_info[i].tqp_offset);
+   }
}
 
-   return 0;
+   ret = hns3_nic_set_real_num_queue(netdev);
+
+out:
+   if (if_running)
+   hns3_nic_net_open(netdev);
+
+   return ret;
 }
 
 static int hns3_nic_setup_tc(struct net_device *dev, enum tc_setup_type type,
@@ -1229,10 +1243,10 @@ static int hns3_nic_setup_tc(struct net_device *dev, 
enum tc_setup_type type,
 {
struct tc_mqprio_qopt *mqprio = type_data;
 
-   if (type !=

[PATCH net-next v4 3/5] security: bpf: Add LSM hooks for bpf object related syscall

2017-10-11 Thread Chenbo Feng

From: Chenbo Feng 

Introduce several LSM hooks for the syscalls that allow userspace to
access eBPF objects such as eBPF programs and eBPF maps. The security
check is aimed at enforcing per-object security protection for eBPF
objects, so that only processes with the right privileges can read/write
a specific map or use a specific eBPF program. Besides that, a general
security hook is added before the multiplexer of the bpf syscall to
check the cmd and the attribute used for the command. The actual
security module can decide which commands need to be checked and how
each cmd should be checked.
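
To illustrate the "general hook before the multiplexer" idea, here is a
user-space sketch of a hook chain in which the first module to return a
non-zero value denies the command. Everything here is hypothetical:
demo_call_bpf_hooks(), the stub hooks and DEMO_CMD_RESTRICTED are made
up for the example and do not reflect the LSM framework's real
structures.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_CMD_RESTRICTED 5	/* arbitrary command number for the demo */

/* A hook inspects the bpf command and returns 0 to allow it. */
typedef int (*demo_bpf_hook)(int cmd);

/* Stub module that allows every command. */
static int demo_allow_all(int cmd)
{
	(void)cmd;
	return 0;
}

/* Stub module that only cares about one command and denies it. */
static int demo_deny_restricted(int cmd)
{
	return cmd == DEMO_CMD_RESTRICTED ? -EPERM : 0;
}

/*
 * Call each registered hook in order; the first non-zero result is
 * returned and the remaining hooks are skipped.
 */
static int demo_call_bpf_hooks(const demo_bpf_hook *hooks, size_t n, int cmd)
{
	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](cmd);

		if (rc)
			return rc;
	}
	return 0;
}
```

Each module decides for itself which commands it inspects, which matches
the description above of the security module choosing what to check.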

Signed-off-by: Chenbo Feng 
---
 include/linux/bpf.h   |  6 ++
 include/linux/lsm_hooks.h | 54 +++
 include/linux/security.h  | 45 +++
 kernel/bpf/syscall.c  | 34 +++--
 security/security.c   | 32 
 5 files changed, 169 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0e9ca2555d7f..225740688ab7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -57,6 +57,9 @@ struct bpf_map {
atomic_t usercnt;
struct bpf_map *inner_map_meta;
char name[BPF_OBJ_NAME_LEN];
+#ifdef CONFIG_SECURITY
+   void *security;
+#endif
 };
 
 /* function argument constraints */
@@ -190,6 +193,9 @@ struct bpf_prog_aux {
struct user_struct *user;
u64 load_time; /* ns since boottime */
char name[BPF_OBJ_NAME_LEN];
+#ifdef CONFIG_SECURITY
+   void *security;
+#endif
union {
struct work_struct work;
struct rcu_head rcu;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c9258124e417..7161d8e7ee79 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1351,6 +1351,40 @@
  * @inode we wish to get the security context of.
  * @ctx is a pointer in which to place the allocated security context.
  * @ctxlen points to the place to put the length of @ctx.
+ *
+ * Security hooks for using the eBPF maps and programs functionalities through
+ * eBPF syscalls.
+ *
+ * @bpf:
+ * Do a initial check for all bpf syscalls after the attribute is copied
+ * into the kernel. The actual security module can implement their own
+ * rules to check the specific cmd they need.
+ *
+ * @bpf_map:
+ * Do a check when the kernel generate and return a file descriptor for
+ * eBPF maps.
+ *
+ * @map: bpf map that we want to access
+ * @mask: the access flags
+ *
+ * @bpf_prog:
+ * Do a check when the kernel generate and return a file descriptor for
+ * eBPF programs.
+ *
+ * @prog: bpf prog that userspace want to use.
+ *
+ * @bpf_map_alloc_security:
+ * Initialize the security field inside bpf map.
+ *
+ * @bpf_map_free_security:
+ * Clean up the security information stored inside bpf map.
+ *
+ * @bpf_prog_alloc_security:
+ * Initialize the security field inside bpf program.
+ *
+ * @bpf_prog_free_security:
+ * Clean up the security information stored inside bpf prog.
+ *
  */
 union security_list_options {
int (*binder_set_context_mgr)(struct task_struct *mgr);
@@ -1682,6 +1716,17 @@ union security_list_options {
struct audit_context *actx);
void (*audit_rule_free)(void *lsmrule);
 #endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_BPF_SYSCALL
+   int (*bpf)(int cmd, union bpf_attr *attr,
+unsigned int size);
+   int (*bpf_map)(struct bpf_map *map, fmode_t fmode);
+   int (*bpf_prog)(struct bpf_prog *prog);
+   int (*bpf_map_alloc_security)(struct bpf_map *map);
+   void (*bpf_map_free_security)(struct bpf_map *map);
+   int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
+   void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
+#endif /* CONFIG_BPF_SYSCALL */
 };
 
 struct security_hook_heads {
@@ -1901,6 +1946,15 @@ struct security_hook_heads {
struct list_head audit_rule_match;
struct list_head audit_rule_free;
 #endif /* CONFIG_AUDIT */
+#ifdef CONFIG_BPF_SYSCALL
+   struct list_head bpf;
+   struct list_head bpf_map;
+   struct list_head bpf_prog;
+   struct list_head bpf_map_alloc_security;
+   struct list_head bpf_map_free_security;
+   struct list_head bpf_prog_alloc_security;
+   struct list_head bpf_prog_free_security;
+#endif /* CONFIG_BPF_SYSCALL */
 } __randomize_layout;
 
 /*
diff --git a/include/linux/security.h b/include/linux/security.h
index ce6265960d6c..18800b0911e5 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct linux_binprm;
 struct cred;
@@ -1730,6 +1731,50 @@ static inline void securityfs_remove(struct dentry 
*dentry)
 
 #endif
 
+#ifdef CONFIG_BPF_SYSCALL
+#ifdef

[PATCH net-next v4 0/5] bpf: security: New file mode and LSM hooks for eBPF object permission control

2017-10-11 Thread Chenbo Feng

From: Chenbo Feng 

Much like files and sockets, eBPF objects are accessed, controlled, and
shared via a file descriptor (FD). Unlike files and sockets, the
existing mechanism for eBPF object access control is very limited.
Currently there are two options for granting access to eBPF
operations: grant access to all processes, or only to CAP_SYS_ADMIN
processes. The CAP_SYS_ADMIN-only mode is not ideal because most users
do not have this capability and granting a user CAP_SYS_ADMIN grants too
many other security-sensitive permissions. It also unnecessarily allows
all CAP_SYS_ADMIN processes access to eBPF functionality. Allowing all
processes to access eBPF objects is also undesirable since it has the
potential to allow unprivileged processes to consume kernel memory, and
opens up attack surface to the kernel.

Adding LSM hooks maintains the status quo for systems which do not use
an LSM, preserving compatibility with userspace, while allowing security
modules to choose how best to handle permissions on eBPF objects. Here
is a possible use case for the lsm hooks with selinux module:

The network-control daemon (netd) creates and loads an eBPF object for
network packet filtering and analysis. It passes the object FD to an
unprivileged network monitor app (netmonitor), which is not allowed to
create, modify or load eBPF objects, but is allowed to read the traffic
stats from the map.

Selinux could use these hooks to grant the following permissions:
allow netd self:bpf_map { create read write};
allow netmonitor netd:fd use;
allow netmonitor netd:bpf_map read;

In this patch series, a file mode is added to the bpf map to store the
access mode. With these file mode flags, the map can be obtained
read-only, write-only, or read-write. With the help of this file mode,
several security hooks can be added to the eBPF syscall implementations
to do permission checks. These LSM hooks mainly focus on checking the
process privileges before it obtains the fd for a specific bpf object,
whether from a file location or from an eBPF id. Besides that, a general
check hook is also implemented at the start of the bpf syscall so that
each security module can have its own implementation for the rest of the
bpf object related functionalities.
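
The three access modes can be pictured as a mapping from map flags to
an open mode. The sketch below is an assumption-laden illustration: the
DEMO_F_* flag values and demo_map_open_mode() are hypothetical, and
rejecting a request that sets both flags is a design choice made for
this example rather than something stated in this cover letter.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical flag bits; the real values live in the uapi header. */
#define DEMO_F_RDONLY	(1U << 0)
#define DEMO_F_WRONLY	(1U << 1)

/*
 * Translate map flags into an open(2)-style access mode.
 * Returns O_RDONLY, O_WRONLY or O_RDWR, or -EINVAL if both
 * restrictions are requested at once.
 */
static int demo_map_open_mode(unsigned int flags)
{
	if ((flags & DEMO_F_RDONLY) && (flags & DEMO_F_WRONLY))
		return -EINVAL;
	if (flags & DEMO_F_RDONLY)
		return O_RDONLY;
	if (flags & DEMO_F_WRONLY)
		return O_WRONLY;

	return O_RDWR;	/* no restriction requested */
}
```

A security hook can then compare the resulting mode against the policy
for the calling process before the fd is handed out.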

In order to store the ownership and security information about eBPF
maps, a security field pointer is added to struct bpf_map. The last two
patches implement the SELinux checks on the hooks introduced, plus an
additional check when an eBPF object is passed between processes using
a unix socket as well as binder IPC.

Change since V1:

 - Whitelist the new bpf flags in the map allocate check.
 - Added bpf selftest for the new flags.
 - Added two new security hooks for copying the security information from
   the bpf object security struct to file security struct
 - Simplified the checking action when bpf fd is passed between processes.

Changes since V2:

 - Fixed the line break problem for map flags check
 - Fixed the typo in selinux check of file mode.
 - Merge bpf_map and bpf_prog into one selinux class
 - Added bpf_type and bpf_sid into file security struct to store the
   security information when generating the fd.
 - Add the hook to bpf_map_new_fd and bpf_prog_new_fd.

Changes since V3:

 - Return the actual error from security check instead of -EPERM
 - Move the hooks into anon_inode_getfd() to avoid getting the file again
   after the bpf object file is installed with an fd.
 - Removed the bpf_sid field inside file_security_struct to reduce the
   cache size.

Chenbo Feng (5):
  bpf: Add file mode configuration into bpf maps
  bpf: Add tests for eBPF file mode
  security: bpf: Add LSM hooks for bpf object related syscall
  selinux: bpf: Add selinux check for eBPF syscall operations
  selinux: bpf: Add additional check for bpf object file receive

 fs/anon_inodes.c                        |   7 ++
 include/linux/bpf.h                     |  12 ++-
 include/linux/lsm_hooks.h               |  71 +
 include/linux/security.h                |  53 ++
 include/uapi/linux/bpf.h                |   6 ++
 kernel/bpf/arraymap.c                   |   6 +-
 kernel/bpf/devmap.c                     |   5 +-
 kernel/bpf/hashtab.c                    |   5 +-
 kernel/bpf/inode.c                      |  15 ++-
 kernel/bpf/lpm_trie.c                   |   3 +-
 kernel/bpf/sockmap.c                    |   5 +-
 kernel/bpf/stackmap.c                   |   5 +-
 kernel/bpf/syscall.c                    | 108 +--
 security/security.c                     |  40 +++
 security/selinux/hooks.c                | 182
 security/selinux/include/classmap.h     |   2 +
 security/selinux/include/objsec.h       |  12 +++
 tools/testing/selftests/bpf/test_maps.c |  48 +
 18 files changed, 561 insertions(+), 24 deletions(-)

-- 
2.15.0.rc0.271.g36b669edcc-goog

[PATCH net-next v4 2/5] bpf: Add tests for eBPF file mode

2017-10-11 Thread Chenbo Feng

From: Chenbo Feng 

Two tests are added to the bpf selftests to exercise read-only and
write-only maps. They verify that the read-only and write-only flags
work on hash maps.
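
For illustration only, the access rules these tests exercise can be
modelled in userspace as below. This is a sketch, not code from the
series: sketch_map_op_allowed() and the SKETCH_OP_* names are
hypothetical, and only the two flag values are taken from patch 1/5.

```c
#include <errno.h>

/* Flag values as defined in patch 1/5 of this series. */
#define BPF_F_RDONLY (1U << 3)
#define BPF_F_WRONLY (1U << 4)

/* Hypothetical operation classes, for illustration only. */
enum sketch_op {
	SKETCH_OP_LOOKUP,	/* reads the map: lookup, get_next_key */
	SKETCH_OP_UPDATE,	/* writes the map: update */
};

/*
 * Return 0 if an fd created with create_flags may perform op,
 * -EPERM otherwise. No flag at all is treated as read/write.
 */
int sketch_map_op_allowed(unsigned int create_flags, enum sketch_op op)
{
	int can_read = !(create_flags & BPF_F_WRONLY);
	int can_write = !(create_flags & BPF_F_RDONLY);

	if (op == SKETCH_OP_LOOKUP)
		return can_read ? 0 : -EPERM;
	return can_write ? 0 : -EPERM;
}
```

The selftests see the same outcome through the syscall wrappers, which
return -1 and set errno to EPERM instead of returning -EPERM directly.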

Signed-off-by: Chenbo Feng 
---
 tools/testing/selftests/bpf/test_maps.c | 48 +
 1 file changed, 48 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index fe3a443a1102..896f23cfe918 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -1033,6 +1033,51 @@ static void test_map_parallel(void)
 	assert(bpf_map_get_next_key(fd, &key, &key) == -1 && errno == ENOENT);
 }
 
+static void test_map_rdonly(void)
+{
+	int fd, key = 0, value = 0;
+
+	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
+			    MAP_SIZE, map_flags | BPF_F_RDONLY);
+	if (fd < 0) {
+		printf("Failed to create map for read only test '%s'!\n",
+		       strerror(errno));
+		exit(1);
+	}
+
+	key = 1;
+	value = 1234;
+	/* Inserting key=1 must fail: the map fd is read only. */
+	assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == -1 &&
+	       errno == EPERM);
+
+	/* Reads are allowed, but key=1 was never inserted. */
+	assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == ENOENT);
+	assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == ENOENT);
+}
+
+static void test_map_wronly(void)
+{
+	int fd, key = 0, value = 0;
+
+	fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
+			    MAP_SIZE, map_flags | BPF_F_WRONLY);
+	if (fd < 0) {
+		printf("Failed to create map for write only test '%s'!\n",
+		       strerror(errno));
+		exit(1);
+	}
+
+	key = 1;
+	value = 1234;
+	/* Insert key=1 element. */
+	assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == 0);
+
+	/* Reads must fail: the map fd is write only. */
+	assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == EPERM);
+	assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == EPERM);
+}
+
 static void run_all_tests(void)
 {
test_hashmap(0, NULL);
@@ -1050,6 +1095,9 @@ static void run_all_tests(void)
test_map_large();
test_map_parallel();
test_map_stress();
+
+   test_map_rdonly();
+   test_map_wronly();
 }
 
 int main(void)
-- 
2.15.0.rc0.271.g36b669edcc-goog

[PATCH net-next v4 1/5] bpf: Add file mode configuration into bpf maps

2017-10-11 Thread Chenbo Feng

From: Chenbo Feng 

Introduce map read/write flags for the eBPF syscalls that return a
map fd. The flags are used to set up the file mode when constructing a
new file descriptor for a bpf map. To preserve backward compatibility,
f_flags is set to O_RDWR if the flag passed by the syscall is 0.
Otherwise it should be O_RDONLY or O_WRONLY. When userspace wants to
modify or read the map content, the kernel checks the file mode to see
whether the operation is allowed.
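
The flag-to-file-mode mapping described above could look roughly like
the userspace sketch below. The body of bpf_get_file_flag() is not part
of the quoted diff, so sketch_get_file_flag() is a hypothetical
stand-in, and rejecting both flags at once with -EINVAL is an
assumption rather than something this message states.

```c
#include <errno.h>
#include <fcntl.h>

/* Flag values as added to include/uapi/linux/bpf.h by this patch. */
#define BPF_F_RDONLY (1U << 3)
#define BPF_F_WRONLY (1U << 4)

/*
 * Map the bpf access flags to open(2)-style file flags.
 * Hypothetical helper, for illustration only.
 */
int sketch_get_file_flag(unsigned int flags)
{
	/* Assumption: asking for both at once is treated as an error. */
	if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY))
		return -EINVAL;
	if (flags & BPF_F_RDONLY)
		return O_RDONLY;
	if (flags & BPF_F_WRONLY)
		return O_WRONLY;
	/* No flag given: keep the old read/write behaviour. */
	return O_RDWR;
}
```

Existing callers that pass 0 keep getting a read/write fd, which is the
backward compatibility property the paragraph above relies on.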

Signed-off-by: Chenbo Feng 
Acked-by: Alexei Starovoitov 
---
 include/linux/bpf.h  |  6 ++--
 include/uapi/linux/bpf.h |  6 
 kernel/bpf/arraymap.c|  6 +++-
 kernel/bpf/devmap.c  |  5 ++-
 kernel/bpf/hashtab.c |  5 +--
 kernel/bpf/inode.c   | 15 ++---
 kernel/bpf/lpm_trie.c|  3 +-
 kernel/bpf/sockmap.c |  5 ++-
 kernel/bpf/stackmap.c|  5 ++-
 kernel/bpf/syscall.c | 80 +++-
 10 files changed, 114 insertions(+), 22 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bc7da2ddfcaf..0e9ca2555d7f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -308,11 +308,11 @@ void bpf_map_area_free(void *base);
 
 extern int sysctl_unprivileged_bpf_disabled;
 
-int bpf_map_new_fd(struct bpf_map *map);
+int bpf_map_new_fd(struct bpf_map *map, int flags);
 int bpf_prog_new_fd(struct bpf_prog *prog);
 
 int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
-int bpf_obj_get_user(const char __user *pathname);
+int bpf_obj_get_user(const char __user *pathname, int flags);
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
@@ -331,6 +331,8 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
void *key, void *value, u64 map_flags);
 int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);
 
+int bpf_get_file_flag(int flags);
+
 /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
  * forced to use 'long' read/writes to try to atomically copy long counters.
  * Best-effort only.  No barriers here, since it _will_ race with concurrent
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6db9e1d679cd..9cb50a228c39 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -217,6 +217,10 @@ enum bpf_attach_type {
 
 #define BPF_OBJ_NAME_LEN 16U
 
+/* Flags for accessing BPF object */
+#define BPF_F_RDONLY   (1U << 3)
+#define BPF_F_WRONLY   (1U << 4)
+
 union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32   map_type;   /* one of enum bpf_map_type */
@@ -259,6 +263,7 @@ union bpf_attr {
struct { /* anonymous struct used by BPF_OBJ_* commands */
__aligned_u64   pathname;
__u32   bpf_fd;
+   __u32   file_flags;
};
 
struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
@@ -286,6 +291,7 @@ union bpf_attr {
__u32   map_id;
};
__u32   next_id;
+   __u32   open_flags;
};
 
struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 68d866628be0..988c04c91e10 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -19,6 +19,9 @@
 
 #include "map_in_map.h"
 
+#define ARRAY_CREATE_FLAG_MASK \
+   (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
int i;
@@ -56,7 +59,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
-   attr->value_size == 0 || attr->map_flags & ~BPF_F_NUMA_NODE ||
+   attr->value_size == 0 ||
+   attr->map_flags & ~ARRAY_CREATE_FLAG_MASK ||
(percpu && numa_node != NUMA_NO_NODE))
return ERR_PTR(-EINVAL);
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e093d9a2c4dd..e5d3de7cff2e 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -50,6 +50,9 @@
 #include 
 #include 
 
+#define DEV_CREATE_FLAG_MASK \
+   (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
 struct bpf_dtab_netdev {
struct net_device *dev;
struct bpf_dtab *dtab;
@@ -80,7 +83,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
 
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
-   attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
+   attr->value_size != 4 || attr->map_flags & ~DEV_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
 
dtab =

[PATCH net-next v4 4/5] selinux: bpf: Add selinux check for eBPF syscall operations

2017-10-11 Thread Chenbo Feng

From: Chenbo Feng 

Implement the actual checks introduced to eBPF related syscalls. This
implementation uses the security field inside the bpf object to store a
sid that identifies the bpf object. When a process tries to access the
object, selinux checks whether the process has the right privileges.
The creation of eBPF objects is also checked in the general bpf check
hook, and new cmds introduced to the eBPF domain can be checked there
as well.

Signed-off-by: Chenbo Feng 
Acked-by: Alexei Starovoitov 
---
 security/selinux/hooks.c            | 111
 security/selinux/include/classmap.h |   2 +
 security/selinux/include/objsec.h   |   4 ++
 3 files changed, 117 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index f5d304736852..94e473b9c884 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -85,6 +85,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "avc.h"
 #include "objsec.h"
@@ -6252,6 +6253,106 @@ static void selinux_ib_free_security(void *ib_sec)
 }
 #endif
 
+#ifdef CONFIG_BPF_SYSCALL
+static int selinux_bpf(int cmd, union bpf_attr *attr,
+unsigned int size)
+{
+   u32 sid = current_sid();
+   int ret;
+
+   switch (cmd) {
+   case BPF_MAP_CREATE:
+   ret = avc_has_perm(sid, sid, SECCLASS_BPF, BPF__MAP_CREATE,
+  NULL);
+   break;
+   case BPF_PROG_LOAD:
+   ret = avc_has_perm(sid, sid, SECCLASS_BPF, BPF__PROG_LOAD,
+  NULL);
+   break;
+   default:
+   ret = 0;
+   break;
+   }
+
+   return ret;
+}
+
+static u32 bpf_map_fmode_to_av(fmode_t fmode)
+{
+   u32 av = 0;
+
+   if (fmode & FMODE_READ)
+   av |= BPF__MAP_READ;
+   if (fmode & FMODE_WRITE)
+   av |= BPF__MAP_WRITE;
+   return av;
+}
+
+static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+   u32 sid = current_sid();
+   struct bpf_security_struct *bpfsec;
+
+   bpfsec = map->security;
+   return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF,
+   bpf_map_fmode_to_av(fmode), NULL);
+}
+
+static int selinux_bpf_prog(struct bpf_prog *prog)
+{
+   u32 sid = current_sid();
+   struct bpf_security_struct *bpfsec;
+
+   bpfsec = prog->aux->security;
+   return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF,
+   BPF__PROG_USE, NULL);
+}
+
+static int selinux_bpf_map_alloc(struct bpf_map *map)
+{
+   struct bpf_security_struct *bpfsec;
+
+   bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+   if (!bpfsec)
+   return -ENOMEM;
+
+   bpfsec->sid = current_sid();
+   map->security = bpfsec;
+
+   return 0;
+}
+
+static void selinux_bpf_map_free(struct bpf_map *map)
+{
+   struct bpf_security_struct *bpfsec = map->security;
+
+   map->security = NULL;
+   kfree(bpfsec);
+}
+
+static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+   struct bpf_security_struct *bpfsec;
+
+   bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+   if (!bpfsec)
+   return -ENOMEM;
+
+   bpfsec->sid = current_sid();
+   aux->security = bpfsec;
+
+   return 0;
+}
+
+static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
+{
+   struct bpf_security_struct *bpfsec = aux->security;
+
+   aux->security = NULL;
+   kfree(bpfsec);
+}
+#endif
+
 static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(binder_set_context_mgr, selinux_binder_set_context_mgr),
LSM_HOOK_INIT(binder_transaction, selinux_binder_transaction),
@@ -6471,6 +6572,16 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
 #endif
+
+#ifdef CONFIG_BPF_SYSCALL
+   LSM_HOOK_INIT(bpf, selinux_bpf),
+   LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
+   LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
+   LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
+   LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
+   LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
+   LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
+#endif
 };
 
 static __init int selinux_init(void)
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 35ffb29a69cb..a91fa46a789f 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -237,6 +237,8 @@ struct security_class_mapping secclass_map[] = {
  { "access", NULL } },
{ "infiniband_endport",
  { "manage_subnet", NULL } },
+   { "bpf",
+ {"map_create", "map_read",

[PATCH net-next v4 5/5] selinux: bpf: Add additional check for bpf object file receive

2017-10-11 Thread Chenbo Feng

From: Chenbo Feng 

Introduce a bpf object related check when sending and receiving files
through a unix domain socket as well as binder. It checks whether the
receiving process has the privilege to read/write the bpf map or use the
bpf program. This check is necessary because bpf maps and programs use
an anonymous inode as their shared inode, so the normal way of checking
files and sockets passed between processes does not work properly on
eBPF objects. This check only works when BPF_SYSCALL is configured. The
information stored inside the file security struct is the same as the
information in the bpf object security struct.

Signed-off-by: Chenbo Feng 
---
 fs/anon_inodes.c  |  7 
 include/linux/lsm_hooks.h | 17 ++
 include/linux/security.h  |  8 +
 security/security.c   |  8 +
 security/selinux/hooks.c  | 71 +++
 security/selinux/include/objsec.h |  8 +
 6 files changed, 119 insertions(+)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 3168ee4e77f4..7a950978622c 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -152,6 +153,12 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops,
error = PTR_ERR(file);
goto err_put_unused_fd;
}
+#ifdef CONFIG_BPF_SYSCALL
+   if (!strcmp(name, "bpf-map"))
+   security_bpf_map_file(file);
+   else if (!strcmp(name, "bpf-prog"))
+   security_bpf_prog_file(file);
+#endif
fd_install(fd, file);
 
return fd;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7161d8e7ee79..fdeadb4ba590 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1385,6 +1385,19 @@
  * @bpf_prog_free_security:
  * Clean up the security information stored inside bpf prog.
  *
+ * @bpf_map_file:
+ * When creating a bpf map fd, set up the file security information with
+ * the bpf security information stored in the map struct. So when the map
+ * fd is passed between processes, the security module can directly read
+ * the security information from file security struct rather than the bpf
+ * security struct.
+ *
+ * @bpf_prog_file:
+ * When creating a bpf prog fd, set up the file security information with
+ * the bpf security information stored in the prog struct. So when the prog
+ * fd is passed between processes, the security module can directly read
+ * the security information from file security struct rather than the bpf
+ * security struct.
  */
 union security_list_options {
int (*binder_set_context_mgr)(struct task_struct *mgr);
@@ -1726,6 +1739,8 @@ union security_list_options {
void (*bpf_map_free_security)(struct bpf_map *map);
int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
+   void (*bpf_map_file)(struct file *file);
+   void (*bpf_prog_file)(struct file *file);
 #endif /* CONFIG_BPF_SYSCALL */
 };
 
@@ -1954,6 +1969,8 @@ struct security_hook_heads {
struct list_head bpf_map_free_security;
struct list_head bpf_prog_alloc_security;
struct list_head bpf_prog_free_security;
+   struct list_head bpf_map_file;
+   struct list_head bpf_prog_file;
 #endif /* CONFIG_BPF_SYSCALL */
 } __randomize_layout;
 
diff --git a/include/linux/security.h b/include/linux/security.h
index 18800b0911e5..ebb0cca5eef1 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1740,6 +1740,8 @@ extern int security_bpf_map_alloc(struct bpf_map *map);
 extern void security_bpf_map_free(struct bpf_map *map);
 extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
 extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
+extern void security_bpf_map_file(struct file *file);
+extern void security_bpf_prog_file(struct file *file);
 #else
 static inline int security_bpf(int cmd, union bpf_attr *attr,
 unsigned int size)
@@ -1772,6 +1774,12 @@ static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
 
 static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
 { }
+
+static inline void security_bpf_map_file(struct file *file)
+{ }
+
+static inline void security_bpf_prog_file(struct file *file)
+{ }
 #endif /* CONFIG_SECURITY */
 #endif /* CONFIG_BPF_SYSCALL */
 
diff --git a/security/security.c b/security/security.c
index 1cd8526cb0b7..2ee6ba5cd690 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1734,4 +1734,12 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
 {
call_void_hook(bpf_prog_free_security, aux);
 }
+void security_bpf_map_file(struct file *file)
+{
+   call_void_hook(bpf_map_file, file);
+}
+void

Re: [next-queue PATCH v6 3/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc

2017-10-11 Thread Eric Dumazet

On Wed, 2017-10-11 at 17:54 -0700, Vinicius Costa Gomes wrote:
> This queueing discipline implements the shaper algorithm defined by
> the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.

...

> +static s64 delay_from_credits(s64 credits, s32 slope)
> +{
> + s64 rate = slope * BYTES_PER_KBIT;
> +
> + if (unlikely(rate == 0))
> + return S64_MAX;
> +
> + return ((-credits * NSEC_PER_SEC) / rate);
> +}

Have you tried compiling this on a 32-bit arch?

make ARCH=i386
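
Presumably the concern is the s64 by s64 division: 32-bit kernel builds
do not link against the compiler's 64-bit division helper, so code
there normally goes through div64_s64() from <linux/math64.h>. A
second thing worth checking is that slope is s32 and BYTES_PER_KBIT is
a plain int, so the multiplication is done in 32 bits before being
widened. The userspace sketch below only illustrates the widened
arithmetic; sketch_delay_from_credits() is a hypothetical name and this
is not the fix that was eventually applied.

```c
#include <stdint.h>

#define BYTES_PER_KBIT	(1000LL / 8)	/* 64-bit constant on purpose */
#define NSEC_PER_SEC	1000000000LL

/*
 * Time in ns needed to recover from negative credits (in bytes) at a
 * given slope (in kbit/s). The cast forces a 64-bit multiplication.
 */
int64_t sketch_delay_from_credits(int64_t credits, int32_t slope)
{
	int64_t rate = (int64_t)slope * BYTES_PER_KBIT;

	if (rate == 0)
		return INT64_MAX;

	/* In kernel code this division would be div64_s64(). */
	return (-credits * NSEC_PER_SEC) / rate;
}
```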

[RFC] iproute2: reject zero length strings for matches

2017-10-11 Thread Stephen Hemminger

The ip commands use the function matches() to allow for abbreviations like:
  $ ip l
  $ ip r

But the function does not check for zero length strings, which is potentially
error prone (but might also break some power users' assumptions). For example:

$ ip ""
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
   valid_lft forever preferred_lft forever
inet6 ::1/128 scope host 
   valid_lft forever preferred_lft forever

This patch checks for zero length strings and never matches the option.

$ ip ""
Object "" is unknown, try "ip help".
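
A standalone sketch of the patched prefix match, for trying the
behaviour outside of iproute2. The name matches_sketch() is made up
for this example, and size_t is used for the length where the original
uses int; the logic otherwise follows the diff below.

```c
#include <string.h>

/*
 * Return 0 if cmd is a non-empty prefix of pattern, non-zero otherwise
 * (same convention as matches() in lib/utils.c).
 */
int matches_sketch(const char *cmd, const char *pattern)
{
	size_t len = strlen(cmd);

	/* An empty cmd used to compare zero bytes and so match anything. */
	if (len == 0 || len > strlen(pattern))
		return -1;
	return memcmp(pattern, cmd, len);
}
```

With this, matches_sketch("l", "link") still returns 0, while
matches_sketch("", "link") returns -1.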

diff --git a/lib/utils.c b/lib/utils.c
index ac155bf5a044..b68a2e0f4a07 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -733,7 +733,7 @@ int matches(const char *cmd, const char *pattern)
 {
int len = strlen(cmd);
 
-   if (len > strlen(pattern))
+   if (len == 0 || len > strlen(pattern))
return -1;
return memcmp(pattern, cmd, len);
 }

[next-queue PATCH v6 2/5] mqprio: Implement select_queue class_ops

2017-10-11 Thread Vinicius Costa Gomes

From: Jesus Sanchez-Palencia 

When replacing a child qdisc from mqprio, tc_modify_qdisc() must fetch
the netdev_queue pointer that the current child qdisc is associated
with before creating the new qdisc.

Currently, when using mqprio as root qdisc, the kernel will end up
getting the queue #0 pointer from the mqprio (root qdisc), which leaves
any new child qdisc with a possibly wrong netdev_queue pointer.

Implementing the Qdisc_class_ops select_queue() on mqprio fixes this
issue and avoids an inconsistent state when child qdiscs are replaced.
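
As a small illustration of the lookup key used here: a tc handle such
as 8001:5 is a 32-bit value with the major number in the upper 16 bits
and the minor number in the lower 16 bits, and TC_H_MIN() extracts the
latter. The macros below repeat the definitions from
include/uapi/linux/pkt_sched.h; sketch_parent_minor() is a hypothetical
helper, and how mqprio_queue_get() turns the minor number into a queue
index is not shown in this patch.

```c
#include <stdint.h>

/* Same definitions as in include/uapi/linux/pkt_sched.h. */
#define TC_H_MAJ_MASK	0xFFFF0000U
#define TC_H_MIN_MASK	0x0000FFFFU
#define TC_H_MAJ(h)	((h) & TC_H_MAJ_MASK)
#define TC_H_MIN(h)	((h) & TC_H_MIN_MASK)

/* Minor number of the parent handle, i.e. the value handed to the lookup. */
uint32_t sketch_parent_minor(uint32_t tcm_parent)
{
	return TC_H_MIN(tcm_parent);
}
```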

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_mqprio.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 6bcdfe6e7b63..8c042ae323e3 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -396,6 +396,12 @@ static void mqprio_walk(struct Qdisc *sch, struct qdisc_walker *arg)
}
 }
 
+static struct netdev_queue *mqprio_select_queue(struct Qdisc *sch,
+   struct tcmsg *tcm)
+{
+   return mqprio_queue_get(sch, TC_H_MIN(tcm->tcm_parent));
+}
+
 static const struct Qdisc_class_ops mqprio_class_ops = {
.graft  = mqprio_graft,
.leaf   = mqprio_leaf,
@@ -403,6 +409,7 @@ static const struct Qdisc_class_ops mqprio_class_ops = {
.walk   = mqprio_walk,
.dump   = mqprio_dump_class,
.dump_stats = mqprio_dump_class_stats,
+   .select_queue   = mqprio_select_queue,
 };
 
 static struct Qdisc_ops mqprio_qdisc_ops __read_mostly = {
-- 
2.14.2

[next-queue PATCH v6 3/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc

2017-10-11 Thread Vinicius Costa Gomes

This queueing discipline implements the shaper algorithm defined by
the 802.1Q-2014 Section 8.6.8.2 and detailed in Annex L.

Its primary usage is to apply some bandwidth reservation to user
defined traffic classes, which are mapped to different queues via the
mqprio qdisc.

Only a simple software implementation is added for now.
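
As a worked example of the two formulas quoted in the header comment of
sch_cbs.c in this patch (sendslope and hicredit), the sketch below
evaluates them with integer arithmetic. The helper names and the
sample numbers (a 1 Gbit/s port, 20 Mbit/s reserved, one 1522-byte
interfering frame) are assumptions chosen for illustration.

```c
#include <stdint.h>

/* sendslope = idleslope - port_transmit_rate (both in kbit/s). */
int64_t sketch_cbs_sendslope(int64_t idleslope_kbps, int64_t port_rate_kbps)
{
	return idleslope_kbps - port_rate_kbps;
}

/*
 * hicredit = max_interference_size * (idleslope / port_transmit_rate),
 * in bytes. Multiply first so the integer division keeps precision.
 * port_rate_kbps must be non-zero.
 */
int64_t sketch_cbs_hicredit(int64_t max_interference_bytes,
			    int64_t idleslope_kbps, int64_t port_rate_kbps)
{
	return max_interference_bytes * idleslope_kbps / port_rate_kbps;
}
```

For idleslope = 20000 and port_transmit_rate = 1000000, this gives
sendslope = -980000 and, with max_interference_size = 1522,
hicredit = 30 (30.44 truncated).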

Signed-off-by: Vinicius Costa Gomes 
Signed-off-by: Jesus Sanchez-Palencia 
---
 include/uapi/linux/pkt_sched.h |  18 +++
 net/sched/Kconfig  |  11 ++
 net/sched/Makefile |   1 +
 net/sched/sch_cbs.c| 302 +
 4 files changed, 332 insertions(+)
 create mode 100644 net/sched/sch_cbs.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf5528fed..41e349df4bf4 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -871,4 +871,22 @@ struct tc_pie_xstats {
__u32 maxq; /* maximum queue size */
__u32 ecn_mark; /* packets marked with ecn*/
 };
+
+/* CBS */
+struct tc_cbs_qopt {
+   __u8 offload;
+   __s32 hicredit;
+   __s32 locredit;
+   __s32 idleslope;
+   __s32 sendslope;
+};
+
+enum {
+   TCA_CBS_UNSPEC,
+   TCA_CBS_PARMS,
+   __TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index e70ed26485a2..c03d86a7775e 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -172,6 +172,17 @@ config NET_SCH_TBF
  To compile this code as a module, choose M here: the
  module will be called sch_tbf.
 
+config NET_SCH_CBS
+   tristate "Credit Based Shaper (CBS)"
+   ---help---
+ Say Y here if you want to use the Credit Based Shaper (CBS) packet
+ scheduling algorithm.
+
+ See the top of <file:net/sched/sch_cbs.c> for more details.
+
+ To compile this code as a module, choose M here: the
+ module will be called sch_cbs.
+
 config NET_SCH_GRED
tristate "Generic Random Early Detection (GRED)"
---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 7b915d226de7..80c8f92d162d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_NET_SCH_FQ_CODEL)+= sch_fq_codel.o
 obj-$(CONFIG_NET_SCH_FQ)   += sch_fq.o
 obj-$(CONFIG_NET_SCH_HHF)  += sch_hhf.o
 obj-$(CONFIG_NET_SCH_PIE)  += sch_pie.o
+obj-$(CONFIG_NET_SCH_CBS)  += sch_cbs.o
 
 obj-$(CONFIG_NET_CLS_U32)  += cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)   += cls_route.o
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
new file mode 100644
index ..29de3e2bc33f
--- /dev/null
+++ b/net/sched/sch_cbs.c
@@ -0,0 +1,302 @@
+/*
+ * net/sched/sch_cbs.c Credit Based Shaper
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Vinicius Costa Gomes 
+ *
+ */
+
+/* Credit Based Shaper (CBS)
+ * =========================
+ *
+ * This is a simple rate-limiting shaper aimed at TSN applications on
+ * systems with known traffic workloads.
+ *
+ * Its algorithm is defined by the IEEE 802.1Q-2014 Specification,
+ * Section 8.6.8.2, and explained in more detail in the Annex L of the
+ * same specification.
+ *
+ * There are four tunables to be considered:
+ *
+ * 'idleslope': Idleslope is the rate of credits that is
+ * accumulated (in kilobits per second) when there is at least
+ * one packet waiting for transmission. Packets are transmitted
+ * when the current value of credits is equal or greater than
+ * zero. When there is no packet to be transmitted the amount of
+ * credits is set to zero. This is the main tunable of the CBS
+ * algorithm.
+ *
+ * 'sendslope':
+ * Sendslope is the rate of credits that is depleted (it should be a
+ * negative number of kilobits per second) when a transmission is
+ * occurring. It can be calculated as follows, (IEEE 802.1Q-2014 Section
+ * 8.6.8.2 item g):
+ *
+ * sendslope = idleslope - port_transmit_rate
+ *
+ * 'hicredit': Hicredit defines the maximum amount of credits (in
+ * bytes) that can be accumulated. Hicredit depends on the
+ * characteristics of interfering traffic,
+ * 'max_interference_size' is the maximum size of any burst of
+ * traffic that can delay the transmission of a frame that is
+ * available for transmission for this traffic class, (IEEE
+ * 802.1Q-2014 Annex L, Equation L-3):
+ *
+ * hicredit = max_interference_size * (idleslope / port_transmit_rate)
+ *
+ * 'locredit': Locredit is the minimum amount of credits that can
+ * be reached. It is a

[next-queue PATCH v6 1/5] net/sched: Check for null dev_queue on create flow

2017-10-11 Thread Vinicius Costa Gomes

From: Jesus Sanchez-Palencia 

In qdisc_alloc() the dev_queue pointer was used without any checks
being performed. If qdisc_create() gets a null dev_queue pointer, it
just passes it along to qdisc_alloc(), leading to a crash. That
happens if a root qdisc implements select_queue() and returns a null
dev_queue pointer for an "invalid handle", for example, or if the
dev_queue associated with the parent qdisc is null.

This patch is in preparation for the next one in this series, where
select_queue() is added to mqprio and may return a null dev_queue.

Signed-off-by: Jesus Sanchez-Palencia 
---
 net/sched/sch_generic.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a0a198768aad..de2408f1ccd3 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -603,8 +603,14 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
struct Qdisc *sch;
unsigned int size = QDISC_ALIGN(sizeof(*sch)) + ops->priv_size;
int err = -ENOBUFS;
-   struct net_device *dev = dev_queue->dev;
+   struct net_device *dev;
+
+   if (!dev_queue) {
+   err = -EINVAL;
+   goto errout;
+   }
 
+   dev = dev_queue->dev;
p = kzalloc_node(size, GFP_KERNEL,
 netdev_queue_numa_node_read(dev_queue));
 
-- 
2.14.2

[next-queue PATCH v6 5/5] igb: Add support for CBS offload

2017-10-11 Thread Vinicius Costa Gomes

From: Andre Guedes 

This patch adds support for Credit-Based Shaper (CBS) qdisc offload
from the Traffic Control system. This support enables us to leverage
the Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
of the Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
standard, which was merged into 802.1Q in 2014. It enables traffic
prioritization and bandwidth reservation via the Credit-Based Shaper,
which is implemented in hardware by the i210 controller.

The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.

The FQTSS transmission mode of the i210 controller is automatically
enabled by the IGB driver when CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing the FQTSS mode requires
a NIC reset.

The FQTSS feature is supported by the i210 controller only.
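
The enable-on-first / disable-on-last rule described above can be
modelled as in the sketch below. It is a simplified userspace model
under stated assumptions, not the driver code: struct sketch_adapter,
sketch_set_cbs() and the resets counter are hypothetical, and only the
queue count mirrors I210_SR_QUEUES_NUM from the diff.

```c
#include <stdbool.h>

#define SKETCH_SR_QUEUES 2	/* mirrors I210_SR_QUEUES_NUM in this patch */

struct sketch_adapter {
	bool cbs_enable[SKETCH_SR_QUEUES];	/* per-queue CBS state */
	bool fqtss;				/* current transmission mode */
	int resets;				/* NIC resets caused by mode changes */
};

static bool sketch_any_cbs_enabled(const struct sketch_adapter *a)
{
	int i;

	for (i = 0; i < SKETCH_SR_QUEUES; i++)
		if (a->cbs_enable[i])
			return true;
	return false;
}

/* Returns 0 on success, -1 if the queue index is out of range. */
int sketch_set_cbs(struct sketch_adapter *a, int queue, bool enable)
{
	bool want_fqtss;

	if (queue < 0 || queue >= SKETCH_SR_QUEUES)
		return -1;

	a->cbs_enable[queue] = enable;

	/* FQTSS follows "is CBS enabled on at least one queue". */
	want_fqtss = sketch_any_cbs_enabled(a);
	if (want_fqtss != a->fqtss) {
		a->fqtss = want_fqtss;
		a->resets++;	/* a mode change requires a NIC reset */
	}
	return 0;
}
```

In this model, enabling CBS on a second queue or disabling it on one of
two active queues leaves the mode alone, so no extra reset is counted.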

Signed-off-by: Andre Guedes 
---
 drivers/net/ethernet/intel/igb/e1000_defines.h |  23 ++
 drivers/net/ethernet/intel/igb/e1000_regs.h    |   8 +
 drivers/net/ethernet/intel/igb/igb.h           |   6 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 347 +
 4 files changed, 384 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h
index 1de82f247312..83cabff1e0ab 100644
--- a/drivers/net/ethernet/intel/igb/e1000_defines.h
+++ b/drivers/net/ethernet/intel/igb/e1000_defines.h
@@ -353,7 +353,18 @@
 #define E1000_RXPBS_CFG_TS_EN   0x8000
 
 #define I210_RXPBSIZE_DEFAULT  0x00A2 /* RXPBSIZE default */
+#define I210_RXPBSIZE_MASK 0x003F
+#define I210_RXPBSIZE_PB_32KB  0x0020
 #define I210_TXPBSIZE_DEFAULT  0x0414 /* TXPBSIZE default */
+#define I210_TXPBSIZE_MASK 0xC0FF
+#define I210_TXPBSIZE_PB0_8KB  (8 << 0)
+#define I210_TXPBSIZE_PB1_8KB  (8 << 6)
+#define I210_TXPBSIZE_PB2_4KB  (4 << 12)
+#define I210_TXPBSIZE_PB3_4KB  (4 << 18)
+
+#define I210_DTXMXPKTSZ_DEFAULT 0x0098
+
+#define I210_SR_QUEUES_NUM 2
 
 /* SerDes Control */
 #define E1000_SCTL_DISABLE_SERDES_LOOPBACK 0x0400
@@ -1051,4 +1062,16 @@
 #define E1000_VLAPQF_P_VALID(_n)   (0x1 << (3 + (_n) * 4))
 #define E1000_VLAPQF_QUEUE_MASK 0x03
 
+/* TX Qav Control fields */
+#define E1000_TQAVCTRL_XMIT_MODE   BIT(0)
+#define E1000_TQAVCTRL_DATAFETCHARB BIT(4)
+#define E1000_TQAVCTRL_DATATRANARB BIT(8)
+
+/* TX Qav Credit Control fields */
+#define E1000_TQAVCC_IDLESLOPE_MASK 0x
+#define E1000_TQAVCC_QUEUEMODE BIT(31)
+
+/* Transmit Descriptor Control fields */
+#define E1000_TXDCTL_PRIORITY  BIT(27)
+
 #endif
diff --git a/drivers/net/ethernet/intel/igb/e1000_regs.h b/drivers/net/ethernet/intel/igb/e1000_regs.h
index 58adbf234e07..8eee081d395f 100644
--- a/drivers/net/ethernet/intel/igb/e1000_regs.h
+++ b/drivers/net/ethernet/intel/igb/e1000_regs.h
@@ -421,6 +421,14 @@ do { \
 
 #define E1000_I210_FLA 0x1201C
 
+#define E1000_I210_DTXMXPKTSZ  0x355C
+
+#define E1000_I210_TXDCTL(_n)  (0x0E028 + ((_n) * 0x40))
+
+#define E1000_I210_TQAVCTRL 0x3570
+#define E1000_I210_TQAVCC(_n)  (0x3004 + ((_n) * 0x40))
+#define E1000_I210_TQAVHC(_n)  (0x300C + ((_n) * 0x40))
+
 #define E1000_INVM_DATA_REG(_n) (0x12120 + 4*(_n))
 #define E1000_INVM_SIZE 64 /* Number of INVM Data Registers */
 
diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 06ffb2bc713e..92845692087a 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -281,6 +281,11 @@ struct igb_ring {
u16 count;  /* number of desc. in the ring */
u8 queue_index; /* logical index of the ring*/
u8 reg_idx; /* physical index of the ring */
+   bool cbs_enable;/* indicates if CBS is enabled */
+   s32 idleslope;  /* idleSlope in kbps */
+   s32 sendslope;  /* sendSlope in kbps */
+   s32 hicredit;   /* hiCredit in bytes */
+   s32 locredit;   /* loCredit in bytes */
 
/* everything past this point are written often */
u16 next_to_clean;
@@ -621,6 +626,7 @@ struct igb_adapter {
 #define IGB_FLAG_EEE   BIT(14)
 #define IGB_FLAG_VLAN_PROMISC  BIT(15)
 #define IGB_FLAG_RX_LEGACY BIT(16)
+#define IGB_FLAG_FQTSS BIT(17)
 
 /* Media Auto Sense */
 #define IGB_MAS_ENABLE_0   0X0001
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 837d9b46a390..be2cf263efa9 100644
---

[next-queue PATCH v6 4/5] net/sched: Add support for HW offloading for CBS

2017-10-11 Thread Vinicius Costa Gomes

This adds support for offloading the CBS algorithm to the controller,
if supported. Drivers wanting to support CBS offload must implement
the .ndo_setup_tc callback and handle the TC_SETUP_CBS (introduced
here) type.
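
From a driver's point of view, handling the new type might look like
the userspace model below. All sketch_* names are hypothetical; the
parameter struct copies the fields of struct tc_cbs_qopt_offload from
this patch, and the real .ndo_setup_tc callback also receives the
net_device, which is left out here. Returning -EOPNOTSUPP for other
types and -EINVAL for a bad queue are assumptions about reasonable
driver behaviour.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for enum tc_setup_type, reduced to two entries. */
enum sketch_tc_setup_type {
	SKETCH_TC_SETUP_CLSBPF,
	SKETCH_TC_SETUP_CBS,
};

/* Same fields as struct tc_cbs_qopt_offload in this patch. */
struct sketch_cbs_offload {
	uint8_t enable;
	int32_t queue;
	int32_t hicredit;
	int32_t locredit;
	int32_t idleslope;
	int32_t sendslope;
};

#define SKETCH_NUM_QUEUES 2

/* Pretend hardware state: last parameters programmed per queue. */
static struct sketch_cbs_offload sketch_hw[SKETCH_NUM_QUEUES];

static int sketch_offload_cbs(const struct sketch_cbs_offload *cbs)
{
	if (cbs->queue < 0 || cbs->queue >= SKETCH_NUM_QUEUES)
		return -EINVAL;

	sketch_hw[cbs->queue] = *cbs;
	return 0;
}

/* Dispatch on the setup type; only CBS is supported in this model. */
int sketch_setup_tc(enum sketch_tc_setup_type type, void *type_data)
{
	switch (type) {
	case SKETCH_TC_SETUP_CBS:
		return sketch_offload_cbs(type_data);
	default:
		return -EOPNOTSUPP;
	}
}
```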

Signed-off-by: Vinicius Costa Gomes 
---
 include/linux/netdevice.h |   1 +
 include/net/pkt_sched.h   |   9 
 net/sched/sch_cbs.c   | 104 --
 3 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 31bb3010c69b..1f6c44ef5b21 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -775,6 +775,7 @@ enum tc_setup_type {
TC_SETUP_CLSFLOWER,
TC_SETUP_CLSMATCHALL,
TC_SETUP_CLSBPF,
+   TC_SETUP_CBS,
 };
 
 /* These structures hold the attributes of xdp state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 259bc191ba59..7c597b050b36 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -146,4 +146,13 @@ static inline bool is_classid_clsact_egress(u32 classid)
   TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_EGRESS);
 }
 
+struct tc_cbs_qopt_offload {
+   u8 enable;
+   s32 queue;
+   s32 hicredit;
+   s32 locredit;
+   s32 idleslope;
+   s32 sendslope;
+};
+
 #endif
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index 29de3e2bc33f..610a72529b72 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -68,6 +68,8 @@
 #define BYTES_PER_KBIT (1000 / 8)
 
 struct cbs_sched_data {
+   bool offload;
+   int queue;
s64 port_rate; /* in bytes/s */
s64 last; /* timestamp in ns */
s64 credits; /* in bytes */
@@ -80,6 +82,11 @@ struct cbs_sched_data {
struct sk_buff *(*dequeue)(struct Qdisc *sch);
 };
 
+static int cbs_enqueue_offload(struct sk_buff *skb, struct Qdisc *sch)
+{
+   return qdisc_enqueue_tail(skb, sch);
+}
+
 static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch)
 {
struct cbs_sched_data *q = qdisc_priv(sch);
@@ -178,6 +185,11 @@ static struct sk_buff *cbs_dequeue_soft(struct Qdisc *sch)
return skb;
 }
 
+static struct sk_buff *cbs_dequeue_offload(struct Qdisc *sch)
+{
+   return qdisc_dequeue_head(sch);
+}
+
 static struct sk_buff *cbs_dequeue(struct Qdisc *sch)
 {
struct cbs_sched_data *q = qdisc_priv(sch);
@@ -189,14 +201,66 @@ static const struct nla_policy cbs_policy[TCA_CBS_MAX + 1] = {
[TCA_CBS_PARMS] = { .len = sizeof(struct tc_cbs_qopt) },
 };
 
+static void cbs_disable_offload(struct net_device *dev,
+   struct cbs_sched_data *q)
+{
+   struct tc_cbs_qopt_offload cbs = { };
+   const struct net_device_ops *ops;
+   int err;
+
+   if (!q->offload)
+   return;
+
+   q->enqueue = cbs_enqueue_soft;
+   q->dequeue = cbs_dequeue_soft;
+
+   ops = dev->netdev_ops;
+   if (!ops->ndo_setup_tc)
+   return;
+
+   cbs.queue = q->queue;
+   cbs.enable = 0;
+
+   err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+   if (err < 0)
+   pr_warn("Couldn't disable CBS offload for queue %d\n",
+   cbs.queue);
+}
+
+static int cbs_enable_offload(struct net_device *dev, struct cbs_sched_data *q,
+ const struct tc_cbs_qopt *opt)
+{
+   const struct net_device_ops *ops = dev->netdev_ops;
+   struct tc_cbs_qopt_offload cbs = { };
+   int err;
+
+   if (!ops->ndo_setup_tc)
+   return -EOPNOTSUPP;
+
+   cbs.queue = q->queue;
+
+   cbs.enable = 1;
+   cbs.hicredit = opt->hicredit;
+   cbs.locredit = opt->locredit;
+   cbs.idleslope = opt->idleslope;
+   cbs.sendslope = opt->sendslope;
+
+   err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
+   if (err < 0)
+   return err;
+
+   q->enqueue = cbs_enqueue_offload;
+   q->dequeue = cbs_dequeue_offload;
+
+   return 0;
+}
+
 static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
 {
struct cbs_sched_data *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
struct nlattr *tb[TCA_CBS_MAX + 1];
-   struct ethtool_link_ksettings ecmd;
struct tc_cbs_qopt *qopt;
-   s64 link_speed;
int err;
 
err = nla_parse_nested(tb, TCA_CBS_MAX, opt, cbs_policy, NULL);
@@ -208,23 +272,30 @@ static int cbs_change(struct Qdisc *sch, struct nlattr *opt)
 
qopt = nla_data(tb[TCA_CBS_PARMS]);
 
-   if (qopt->offload)
-   return -EOPNOTSUPP;
+   if (!qopt->offload) {
+   struct ethtool_link_ksettings ecmd;
+   s64 link_speed;
 
-   if (!__ethtool_get_link_ksettings(dev, &ecmd))
-   link_speed = ecmd.base.speed;
-   else
-   link_speed = SPEED_1000;
+   if (!__ethtool_get_link_ksettings(dev, &ecmd))
+

[next-queue PATCH v6 0/5] TSN: Add qdisc based config interface for CBS

2017-10-11 Thread Vinicius Costa Gomes

Hi,

Changes since v5:
 - Fixed comments from Jiri Pirko;

Changes since v4:
 - Added a software implementation of the CBS algorithm;

Changes since v3:
 - None, only a clean patchset without old patches;

Changes since v2:
 - squashed the patch introducing the userspace API into the patch
   implementing CBS;

Changes since v1:
 - Solved the mqprio dependency;
 - Fixed a mqprio bug, that caused the inner qdisc to have a wrong
   dev_queue associated with it;

Changes from the RFC:
 - Fixed comments from Henrik Austad;
 - Simplified the Qdisc, using the generic implementation of callbacks
   where possible;
 - Small refactor on the driver (igb) code;

This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.

As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).

Overview


Time-sensitive Networking (TSN) is a set of standards that aim to
address resource availability in order to provide bandwidth reservation
and bounded latency on Ethernet-based LANs. The proposal described here
aims to cover mainly what is needed to enable the following standards:
802.1Qat and 802.1Qav.

The initial target of this work is the Intel i210 NIC, but the
datasheets of other controllers were also taken into account, such as
the Renesas RZ/A1H RZ/A1M group and the Synopsys DesignWare Ethernet QoS
controller.


Proposal


Feature-wise, what is covered here are the configuration interfaces for
HW implementations of the Credit-Based Shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.
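The shaping behaviour that the hardware is asked to perform can be pictured with a small userspace model. This is only a sketch of the credit rules as commonly described for 802.1Qav (credits grow at idleslope while a frame waits, drain at sendslope while transmitting, and a frame may start only when credits are non-negative); it is not the kernel's sch_cbs.c code. The struct and function names are hypothetical, and rates are assumed to be in bytes per second with time in nanoseconds.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical userspace model of one shaped queue; not kernel code. */
struct cbs_model {
	int64_t credits;   /* current credits, in bytes */
	int64_t idleslope; /* bytes/s gained while a frame is waiting */
	int64_t sendslope; /* bytes/s (negative) applied while transmitting */
	int64_t hicredit;  /* upper bound on credits, in bytes */
	int64_t locredit;  /* lower bound on credits, in bytes */
};

/* A frame may start transmission only when credits are non-negative. */
bool cbs_model_can_send(const struct cbs_model *q)
{
	return q->credits >= 0;
}

/* Frames wait for wait_ns: credits grow at idleslope, capped at hicredit. */
void cbs_model_wait(struct cbs_model *q, int64_t wait_ns)
{
	q->credits += q->idleslope * wait_ns / NSEC_PER_SEC;
	if (q->credits > q->hicredit)
		q->credits = q->hicredit;
}

/* Send len bytes at port_rate bytes/s: credits drain at sendslope,
 * floored at locredit.
 */
void cbs_model_transmit(struct cbs_model *q, int64_t len, int64_t port_rate)
{
	int64_t tx_ns = len * NSEC_PER_SEC / port_rate;

	q->credits += q->sendslope * tx_ns / NSEC_PER_SEC;
	if (q->credits < q->locredit)
		q->credits = q->locredit;
}
```

With an assumed 100 Mbit/s port (12500000 bytes/s) and an idleslope of 2 Mbit/s (250000 bytes/s), sending one 1500-byte frame from zero credits leaves the model at -1470 bytes, and the queue has to wait 5.88 ms before it may send again. That is how the shaper holds the queue to its reserved bandwidth.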

For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.
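The priority to traffic class to Tx queue lookup can be sketched in plain C. The code below is a userspace illustration only, with hypothetical struct and function names; it does not use any kernel or iproute2 API. The example table reproduces the 'map' and 'queues' arguments of the mqprio command used in the testing section of this letter.

```c
#include <stddef.h>

#define SKETCH_NUM_PRIO 16
#define SKETCH_MAX_TC   16

/* One "count@offset" entry from the mqprio 'queues' argument. */
struct sketch_tc_queues {
	unsigned int count;
	unsigned int offset;
};

/* Hypothetical container for an mqprio-style configuration. */
struct sketch_mqprio {
	unsigned int num_tc;
	unsigned char prio_tc_map[SKETCH_NUM_PRIO];
	struct sketch_tc_queues queues[SKETCH_MAX_TC];
};

/* Mirrors: num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 */
const struct sketch_mqprio sketch_example = {
	.num_tc = 3,
	.prio_tc_map = { 2, 2, 1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 },
	.queues = { { 1, 0 }, { 1, 1 }, { 2, 2 } },
};

/* Look up the Tx queue range [*first, *last] used by a priority.
 * Returns 0 on success, -1 if the priority or the table is invalid.
 */
int sketch_queues_for_prio(const struct sketch_mqprio *cfg, unsigned int prio,
			   unsigned int *first, unsigned int *last)
{
	unsigned int tc;

	if (prio >= SKETCH_NUM_PRIO)
		return -1;

	tc = cfg->prio_tc_map[prio];
	if (tc >= cfg->num_tc || cfg->queues[tc].count == 0)
		return -1;

	*first = cfg->queues[tc].offset;
	*last = *first + cfg->queues[tc].count - 1;
	return 0;
}
```

For the example table, priority 3 resolves to queue 0, priority 2 to queue 1, and every other priority to queues 2 and 3.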

As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:

$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
 idleslope I

   Note that the parameters for this qdisc are the ones defined by the
   802.1Q-2014 spec, so no hardware specific functionality is exposed here.

Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is
not yet covered by this proposal.
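As a rough guide to how the four qdisc parameters relate to each other, the sketch below derives sendslope, hicredit and locredit from an idleslope using the relations usually quoted from the 802.1Q credit-based shaper description: sendSlope = idleSlope - portTransmitRate, hiCredit = maxInterferenceSize * idleSlope / portTransmitRate, and loCredit = maxFrameSize * sendSlope / portTransmitRate. It is not the attached Python script. The function names are hypothetical, the rounding direction is an assumption, and all rates must simply share one unit.

```c
#include <stdint.h>

struct cbs_params {
	int64_t idleslope;
	int64_t sendslope;
	int64_t hicredit;
	int64_t locredit;
};

/* Integer division rounding towards +infinity; b must be positive. */
int64_t sketch_div_round_up(int64_t a, int64_t b)
{
	int64_t q = a / b;

	if (a % b != 0 && a > 0)
		q++;
	return q;
}

/* Integer division rounding towards -infinity; b must be positive. */
int64_t sketch_div_round_down(int64_t a, int64_t b)
{
	int64_t q = a / b;

	if (a % b != 0 && a < 0)
		q--;
	return q;
}

/* Derive the CBS parameters for the highest-priority shaped class.
 * max_frame and max_interference are in bytes; idleslope and port_rate
 * use the same (arbitrary) rate unit.
 */
struct cbs_params sketch_cbs_derive(int64_t idleslope, int64_t port_rate,
				    int64_t max_frame, int64_t max_interference)
{
	struct cbs_params p;

	p.idleslope = idleslope;
	p.sendslope = idleslope - port_rate;
	p.hicredit = sketch_div_round_up(max_interference * idleslope, port_rate);
	p.locredit = sketch_div_round_down(max_frame * p.sendslope, port_rate);
	return p;
}
```

Calling sketch_cbs_derive(2, 100, 1500, 1500) gives sendslope -98, hicredit 30 and locredit -1470, which are the class A values used in the test steps of this letter. The port rate of 100 is inferred from those numbers rather than stated by them. The class B hicredit also depends on interference from class A, which this sketch does not model.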


Testing this RFC


Attached to this cover letter are:
 - calculate_cbs_params.py: A Python script to calculate the
   parameters to the CBS queueing discipline;
 - tsn-talker.c: A sample C implementation of the talker side of a stream;
 - tsn-listener.c: A sample C implementation of the listener side of a
   stream;

For testing the patches of this series, you may want to use the
samples attached to this cover letter and use the 'mqprio' qdisc to
set up the priority to Tx queue mapping, together with the 'cbs'
qdisc to configure the HW shaper of the i210 controller:

1) Set up the mapping of priorities to traffic classes to hardware queues
$ tc qdisc replace dev ens4 handle 100: parent root mqprio num_tc 3 \
 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

For a more detailed explanation, see mqprio(8). In short, this command
maps traffic with priority 3 to hardware queue 0, traffic with
priority 2 to hardware queue 1, and the rest to hardware queues 2
and 3.

2) Check the scheme. You want to get the IDs of the inner qdiscs from the bottom up
$ tc -g class show dev ens4

Ex.:
+---(100:3) mqprio
|+---(100:6) mqprio
|+---(100:7) mqprio
|
+---(100:2) mqprio
|+---(100:5) mqprio
|
+---(100:1) mqprio
 +---(100:4) mqprio

* Here '100:4' is Tx Queue #0 and '100:5' is Tx Queue #1.

3) Calculate CBS parameters for classes A and B, e.g. BW for A is 20Mbps and
   for B is 10Mbps:
$ calculate_cbs_params.py -A 2 -a 1500 -B 1 -b 1500

4) Configure CBS for traffic class A (priority 3) as provided by the script:
$ tc qdisc replace dev ens4 parent 100:4 cbs locredit -1470 \
 hicredit 30 sendslope -98 idleslope 2

5) Configure CBS for traffic class B (priority 2):
$ tc qdisc replace dev ens4 parent 100:5 cbs \
 locredit -1485 hicredit 31 sendslope -99 idleslope 1

6) Run Listener:
$ ./tsn-listener -d 01:AA:AA:AA:AA:AA -i ens4 -s 1500

7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c
$ ./tsn-talker -d 01:AA:AA:AA:AA:AA -i ens4 -p 3 -s 1500

 * The bandwidth displayed on the listener output at this stage

[PATCH net-next v2 7/7] net: qualcomm: rmnet: Implement bridge mode

2017-10-11 Thread Subash Abhinov Kasiviswanathan

Add support to bridge two devices which can send multiplexing and
aggregation (MAP) data. This is done only when the data itself is
not going to be consumed in the stack but is being passed on to a
different endpoint. This is mainly used for testing.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 93 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  7 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 26 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c|  2 +
 4 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index b5fe3f4..71bee1a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -109,6 +109,36 @@ static int rmnet_register_real_device(struct net_device *real_dev)
return 0;
 }
 
+static void rmnet_unregister_bridge(struct net_device *dev,
+   struct rmnet_port *port)
+{
+   struct net_device *rmnet_dev, *bridge_dev;
+   struct rmnet_port *bridge_port;
+
+   if (port->rmnet_mode != RMNET_EPMODE_BRIDGE)
+   return;
+
+   /* bridge slave handling */
+   if (!port->nr_rmnet_devs) {
+   rmnet_dev = netdev_master_upper_dev_get_rcu(dev);
+   netdev_upper_dev_unlink(dev, rmnet_dev);
+
+   bridge_dev = port->bridge_ep;
+
+   bridge_port = rmnet_get_port_rtnl(bridge_dev);
+   bridge_port->bridge_ep = NULL;
+   bridge_port->rmnet_mode = RMNET_EPMODE_VND;
+   } else {
+   bridge_dev = port->bridge_ep;
+
+   bridge_port = rmnet_get_port_rtnl(bridge_dev);
+   rmnet_dev = netdev_master_upper_dev_get_rcu(bridge_dev);
+   netdev_upper_dev_unlink(bridge_dev, rmnet_dev);
+
+   rmnet_unregister_real_device(bridge_dev, bridge_port);
+   }
+}
+
 static int rmnet_newlink(struct net *src_net, struct net_device *dev,
 struct nlattr *tb[], struct nlattr *data[],
 struct netlink_ext_ack *extack)
@@ -190,10 +220,10 @@ static void rmnet_dellink(struct net_device *dev, struct list_head *head)
ep = rmnet_get_endpoint(port, mux_id);
if (ep) {
hlist_del_init_rcu(&ep->hlnode);
+   rmnet_unregister_bridge(dev, port);
rmnet_vnd_dellink(mux_id, port, ep);
kfree(ep);
}
-
rmnet_unregister_real_device(real_dev, port);
 
unregister_netdevice_queue(dev, head);
@@ -237,6 +267,8 @@ static void rmnet_force_unassociate_device(struct net_device *dev)
d.port = port;
 
rcu_read_lock();
+   rmnet_unregister_bridge(dev, port);
+
netdev_walk_all_lower_dev_rcu(real_dev, rmnet_dev_walk_unreg, &d);
rcu_read_unlock();
unregister_netdevice_many(&list);
@@ -321,6 +353,65 @@ struct rmnet_endpoint *rmnet_get_endpoint(struct rmnet_port *port, u8 mux_id)
return NULL;
 }
 
+int rmnet_add_bridge(struct net_device *rmnet_dev,
+struct net_device *slave_dev,
+struct netlink_ext_ack *extack)
+{
+   struct rmnet_priv *priv = netdev_priv(rmnet_dev);
+   struct net_device *real_dev = priv->real_dev;
+   struct rmnet_port *port, *slave_port;
+   int err;
+
+   port = rmnet_get_port(real_dev);
+
+   /* If there is more than one rmnet dev attached, it's probably being
+* used for muxing. Skip the bridging in that case
+*/
+   if (port->nr_rmnet_devs > 1)
+   return -EINVAL;
+
+   if (rmnet_is_real_dev_registered(slave_dev))
+   return -EBUSY;
+
+   err = rmnet_register_real_device(slave_dev);
+   if (err)
+   return -EBUSY;
+
+   err = netdev_master_upper_dev_link(slave_dev, rmnet_dev, NULL, NULL,
+  extack);
+   if (err)
+   return -EINVAL;
+
+   slave_port = rmnet_get_port(slave_dev);
+   slave_port->rmnet_mode = RMNET_EPMODE_BRIDGE;
+   slave_port->bridge_ep = real_dev;
+
+   port->rmnet_mode = RMNET_EPMODE_BRIDGE;
+   port->bridge_ep = slave_dev;
+
+   netdev_dbg(slave_dev, "registered with rmnet as slave\n");
+   return 0;
+}
+
+int rmnet_del_bridge(struct net_device *rmnet_dev,
+struct net_device *slave_dev)
+{
+   struct rmnet_priv *priv = netdev_priv(rmnet_dev);
+   struct net_device *real_dev = priv->real_dev;
+   struct rmnet_port *port, *slave_port;
+
+   port = rmnet_get_port(real_dev);
+   port->rmnet_mode = RMNET_EPMODE_VND;
+   port->bridge_ep = NULL;
+
+   netdev_upper_dev_unlink(slave_dev, rmnet_dev);
+   slave_port = rmnet_get_port(slave_dev);
+

[PATCH net-next v2 1/7] net: qualcomm: rmnet: Remove existing logic for bridge mode

2017-10-11 Thread Subash Abhinov Kasiviswanathan

This will be rewritten in the following patches.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  1 -
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 77 +++---
 2 files changed, 9 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index dde4e9f..0b0c5a7 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -34,7 +34,6 @@ struct rmnet_endpoint {
  */
 struct rmnet_port {
struct net_device *dev;
-   struct rmnet_endpoint local_ep;
struct rmnet_endpoint muxed_ep[RMNET_MAX_LOGICAL_EP];
u32 ingress_data_format;
u32 egress_data_format;
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 540c762..b50f401 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -44,56 +44,18 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
 /* Generic handler */
 
 static rx_handler_result_t
-rmnet_bridge_handler(struct sk_buff *skb, struct rmnet_endpoint *ep)
+rmnet_deliver_skb(struct sk_buff *skb)
 {
-   if (!ep->egress_dev)
-   kfree_skb(skb);
-   else
-   rmnet_egress_handler(skb, ep);
+   skb_reset_transport_header(skb);
+   skb_reset_network_header(skb);
+   rmnet_vnd_rx_fixup(skb, skb->dev);
 
+   skb->pkt_type = PACKET_HOST;
+   skb_set_mac_header(skb, 0);
+   netif_receive_skb(skb);
return RX_HANDLER_CONSUMED;
 }
 
-static rx_handler_result_t
-rmnet_deliver_skb(struct sk_buff *skb, struct rmnet_endpoint *ep)
-{
-   switch (ep->rmnet_mode) {
-   case RMNET_EPMODE_NONE:
-   return RX_HANDLER_PASS;
-
-   case RMNET_EPMODE_BRIDGE:
-   return rmnet_bridge_handler(skb, ep);
-
-   case RMNET_EPMODE_VND:
-   skb_reset_transport_header(skb);
-   skb_reset_network_header(skb);
-   rmnet_vnd_rx_fixup(skb, skb->dev);
-
-   skb->pkt_type = PACKET_HOST;
-   skb_set_mac_header(skb, 0);
-   netif_receive_skb(skb);
-   return RX_HANDLER_CONSUMED;
-
-   default:
-   kfree_skb(skb);
-   return RX_HANDLER_CONSUMED;
-   }
-}
-
-static rx_handler_result_t
-rmnet_ingress_deliver_packet(struct sk_buff *skb,
-struct rmnet_port *port)
-{
-   if (!port) {
-   kfree_skb(skb);
-   return RX_HANDLER_CONSUMED;
-   }
-
-   skb->dev = port->local_ep.egress_dev;
-
-   return rmnet_deliver_skb(skb, &port->local_ep);
-}
-
 /* MAP handler */
 
 static rx_handler_result_t
@@ -130,7 +92,7 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
skb_pull(skb, sizeof(struct rmnet_map_header));
skb_trim(skb, len);
rmnet_set_skb_proto(skb);
-   return rmnet_deliver_skb(skb, ep);
+   return rmnet_deliver_skb(skb);
 }
 
 static rx_handler_result_t
@@ -204,29 +166,8 @@ rx_handler_result_t rmnet_rx_handler(struct sk_buff **pskb)
dev = skb->dev;
port = rmnet_get_port(dev);
 
-   if (port->ingress_data_format & RMNET_INGRESS_FORMAT_MAP) {
+   if (port->ingress_data_format & RMNET_INGRESS_FORMAT_MAP)
rc = rmnet_map_ingress_handler(skb, port);
-   } else {
-   switch (ntohs(skb->protocol)) {
-   case ETH_P_MAP:
-   if (port->local_ep.rmnet_mode ==
-   RMNET_EPMODE_BRIDGE) {
-   rc = rmnet_ingress_deliver_packet(skb, port);
-   } else {
-   kfree_skb(skb);
-   rc = RX_HANDLER_CONSUMED;
-   }
-   break;
-
-   case ETH_P_IP:
-   case ETH_P_IPV6:
-   rc = rmnet_ingress_deliver_packet(skb, port);
-   break;
-
-   default:
-   rc = RX_HANDLER_PASS;
-   }
-   }
 
return rc;
 }
-- 
1.9.1

[PATCH net-next v2 5/7] net: qualcomm: rmnet: Remove duplicate setting of rmnet_devices

2017-10-11 Thread Subash Abhinov Kasiviswanathan

The rmnet_devices information is already stored in muxed_ep, so
storing this in rmnet_devices[] again is redundant.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h | 1 -
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 8 
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index c5f5c6d..123ccf4 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -33,7 +33,6 @@ struct rmnet_port {
struct rmnet_endpoint muxed_ep[RMNET_MAX_LOGICAL_EP];
u32 ingress_data_format;
u32 egress_data_format;
-   struct net_device *rmnet_devices[RMNET_MAX_LOGICAL_EP];
u8 nr_rmnet_devs;
u8 rmnet_mode;
 };
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 4ca59a4..8b8497b 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -105,12 +105,12 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
struct rmnet_priv *priv;
int rc;
 
-   if (port->rmnet_devices[id])
+   if (port->muxed_ep[id].egress_dev)
return -EINVAL;
 
rc = register_netdevice(rmnet_dev);
if (!rc) {
-   port->rmnet_devices[id] = rmnet_dev;
+   port->muxed_ep[id].egress_dev = rmnet_dev;
port->nr_rmnet_devs++;
 
rmnet_dev->rtnl_link_ops = &rmnet_link_ops;
@@ -127,10 +127,10 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev,
 
 int rmnet_vnd_dellink(u8 id, struct rmnet_port *port)
 {
-   if (id >= RMNET_MAX_LOGICAL_EP || !port->rmnet_devices[id])
+   if (id >= RMNET_MAX_LOGICAL_EP || !port->muxed_ep[id].egress_dev)
return -EINVAL;
 
-   port->rmnet_devices[id] = NULL;
+   port->muxed_ep[id].egress_dev = NULL;
port->nr_rmnet_devs--;
return 0;
 }
-- 
1.9.1

[PATCH net-next v2 2/7] net: qualcomm: rmnet: Remove some unused defines

2017-10-11 Thread Subash Abhinov Kasiviswanathan

Most of these constants were used in the initial patchset where
custom netlink configuration was used and hence are no longer relevant.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
index 7967198..49102f9 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
@@ -19,23 +19,15 @@
 #define RMNET_TX_QUEUE_LEN 1000
 
 /* Constants */
-#define RMNET_EGRESS_FORMAT__RESERVED__ BIT(0)
 #define RMNET_EGRESS_FORMAT_MAP BIT(1)
 #define RMNET_EGRESS_FORMAT_AGGREGATION BIT(2)
 #define RMNET_EGRESS_FORMAT_MUXING  BIT(3)
-#define RMNET_EGRESS_FORMAT_MAP_CKSUMV3 BIT(4)
-#define RMNET_EGRESS_FORMAT_MAP_CKSUMV4 BIT(5)
 
-#define RMNET_INGRESS_FIX_ETHERNET  BIT(0)
 #define RMNET_INGRESS_FORMAT_MAP BIT(1)
 #define RMNET_INGRESS_FORMAT_DEAGGREGATION  BIT(2)
 #define RMNET_INGRESS_FORMAT_DEMUXING   BIT(3)
 #define RMNET_INGRESS_FORMAT_MAP_COMMANDS   BIT(4)
-#define RMNET_INGRESS_FORMAT_MAP_CKSUMV3 BIT(5)
-#define RMNET_INGRESS_FORMAT_MAP_CKSUMV4 BIT(6)
 
-/* Pass the frame up the stack with no modifications to skb->dev */
-#define RMNET_EPMODE_NONE (0)
 /* Replace skb->dev to a virtual rmnet device and pass up the stack */
 #define RMNET_EPMODE_VND (1)
 /* Pass the frame directly to another device with dev_queue_xmit() */
-- 
1.9.1

[PATCH net-next v2 6/7] net: qualcomm: rmnet: Convert the muxed endpoint to hlist

2017-10-11 Thread Subash Abhinov Kasiviswanathan

Rather than using a static array, use an hlist to store the muxed
endpoints and use the mux id to query the rmnet_device.
This is useful because usually very few mux ids are used.

Signed-off-by: Subash Abhinov Kasiviswanathan 
Cc: Dan Williams 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 75 --
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  4 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 17 +++--
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c|  4 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 15 +++--
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h|  6 +-
 6 files changed, 68 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 96058bb..b5fe3f4 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -61,18 +61,6 @@ static int rmnet_is_real_dev_registered(const struct net_device *real_dev)
return rtnl_dereference(real_dev->rx_handler_data);
 }
 
-static struct rmnet_endpoint*
-rmnet_get_endpoint(struct net_device *dev, int config_id)
-{
-   struct rmnet_endpoint *ep;
-   struct rmnet_port *port;
-
-   port = rmnet_get_port_rtnl(dev);
-   ep = &port->muxed_ep[config_id];
-
-   return ep;
-}
-
 static int rmnet_unregister_real_device(struct net_device *real_dev,
struct rmnet_port *port)
 {
@@ -93,7 +81,7 @@ static int rmnet_unregister_real_device(struct net_device *real_dev,
 static int rmnet_register_real_device(struct net_device *real_dev)
 {
struct rmnet_port *port;
-   int rc;
+   int rc, entry;
 
ASSERT_RTNL();
 
@@ -114,26 +102,13 @@ static int rmnet_register_real_device(struct net_device *real_dev)
/* hold on to real dev for MAP data */
dev_hold(real_dev);
 
+   for (entry = 0; entry < RMNET_MAX_LOGICAL_EP; entry++)
+   INIT_HLIST_HEAD(&port->muxed_ep[entry]);
+
netdev_dbg(real_dev, "registered with rmnet\n");
return 0;
 }
 
-static void rmnet_set_endpoint_config(struct net_device *dev,
- u8 mux_id, struct net_device *egress_dev)
-{
-   struct rmnet_endpoint *ep;
-
-   netdev_dbg(dev, "id %d dev %s\n", mux_id, egress_dev->name);
-
-   ep = rmnet_get_endpoint(dev, mux_id);
-   /* This config is cleared on every set, so its ok to not
-* clear it on a device delete.
-*/
-   memset(ep, 0, sizeof(struct rmnet_endpoint));
-   ep->egress_dev = egress_dev;
-   ep->mux_id = mux_id;
-}
-
 static int rmnet_newlink(struct net *src_net, struct net_device *dev,
 struct nlattr *tb[], struct nlattr *data[],
 struct netlink_ext_ack *extack)
@@ -145,6 +120,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
RMNET_EGRESS_FORMAT_MAP;
struct net_device *real_dev;
int mode = RMNET_EPMODE_VND;
+   struct rmnet_endpoint *ep;
struct rmnet_port *port;
int err = 0;
u16 mux_id;
@@ -156,6 +132,10 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
if (!data[IFLA_VLAN_ID])
return -EINVAL;
 
+   ep = kzalloc(sizeof(*ep), GFP_ATOMIC);
+   if (!ep)
+   return -ENOMEM;
+
mux_id = nla_get_u16(data[IFLA_VLAN_ID]);
 
err = rmnet_register_real_device(real_dev);
@@ -163,7 +143,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
goto err0;
 
port = rmnet_get_port_rtnl(real_dev);
-   err = rmnet_vnd_newlink(mux_id, dev, port, real_dev);
+   err = rmnet_vnd_newlink(mux_id, dev, port, real_dev, ep);
if (err)
goto err1;
 
@@ -177,11 +157,11 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
port->ingress_data_format = ingress_format;
port->rmnet_mode = mode;
 
-   rmnet_set_endpoint_config(real_dev, mux_id, dev);
+   hlist_add_head_rcu(&ep->hlnode, &port->muxed_ep[mux_id]);
return 0;
 
 err2:
-   rmnet_vnd_dellink(mux_id, port);
+   rmnet_vnd_dellink(mux_id, port, ep);
 err1:
rmnet_unregister_real_device(real_dev, port);
 err0:
@@ -191,6 +171,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
 static void rmnet_dellink(struct net_device *dev, struct list_head *head)
 {
struct net_device *real_dev;
+   struct rmnet_endpoint *ep;
struct rmnet_port *port;
u8 mux_id;
 
@@ -204,8 +185,15 @@ static void rmnet_dellink(struct net_device *dev, struct list_head *head)
port = rmnet_get_port_rtnl(real_dev);
 
mux_id = rmnet_vnd_get_mux(dev);
-   rmnet_vnd_dellink(mux_id, port);
netdev_upper_dev_unlink(dev, real_dev);
+
+   ep =

[PATCH net-next v2 4/7] net: qualcomm: rmnet: Remove duplicate setting of rmnet private info

2017-10-11 Thread Subash Abhinov Kasiviswanathan

The endpoint information is stored twice: in local_ep, and as mux_id and
real_dev in the rmnet private structure. Remove the local_ep.
While these elements are equivalent, rmnet_endpoint will be
used only as part of the rmnet_port for muxed scenarios in VND mode.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c   | 10 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h   |  4 
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 18 ++
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h |  3 +--
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c  | 19 ++-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h  |  1 -
 6 files changed, 15 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 85fce9c..96058bb 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -67,13 +67,8 @@ static int rmnet_is_real_dev_registered(const struct net_device *real_dev)
struct rmnet_endpoint *ep;
struct rmnet_port *port;
 
-   if (!rmnet_is_real_dev_registered(dev)) {
-   ep = rmnet_vnd_get_endpoint(dev);
-   } else {
-   port = rmnet_get_port_rtnl(dev);
-
-   ep = &port->muxed_ep[config_id];
-   }
+   port = rmnet_get_port_rtnl(dev);
+   ep = &port->muxed_ep[config_id];
 
return ep;
 }
@@ -183,7 +178,6 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
port->rmnet_mode = mode;
 
rmnet_set_endpoint_config(real_dev, mux_id, dev);
-   rmnet_set_endpoint_config(dev, mux_id, real_dev);
return 0;
 
 err2:
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index 03d473f..c5f5c6d 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -20,9 +20,6 @@
 
 #define RMNET_MAX_LOGICAL_EP 255
 
-/* Information about the next device to deliver the packet to.
- * Exact usage of this parameter depends on the rmnet_mode.
- */
 struct rmnet_endpoint {
u8 mux_id;
struct net_device *egress_dev;
@@ -44,7 +41,6 @@ struct rmnet_port {
 extern struct rtnl_link_ops rmnet_link_ops;
 
 struct rmnet_priv {
-   struct rmnet_endpoint local_ep;
u8 mux_id;
struct net_device *real_dev;
 };
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 86e37cc..e0802d3 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -116,8 +116,7 @@ static void rmnet_set_skb_proto(struct sk_buff *skb)
 }
 
 static int rmnet_map_egress_handler(struct sk_buff *skb,
-   struct rmnet_port *port,
-   struct rmnet_endpoint *ep,
+   struct rmnet_port *port, u8 mux_id,
struct net_device *orig_dev)
 {
int required_headroom, additional_header_len;
@@ -136,10 +135,10 @@ static int rmnet_map_egress_handler(struct sk_buff *skb,
return RMNET_MAP_CONSUMED;
 
if (port->egress_data_format & RMNET_EGRESS_FORMAT_MUXING) {
-   if (ep->mux_id == 0xff)
+   if (mux_id == 0xff)
map_header->mux_id = 0;
else
-   map_header->mux_id = ep->mux_id;
+   map_header->mux_id = mux_id;
}
 
skb->protocol = htons(ETH_P_MAP);
@@ -176,14 +175,17 @@ rx_handler_result_t rmnet_rx_handler(struct sk_buff **pskb)
  * for egress device configured in logical endpoint. Packet is then transmitted
  * on the egress device.
  */
-void rmnet_egress_handler(struct sk_buff *skb,
- struct rmnet_endpoint *ep)
+void rmnet_egress_handler(struct sk_buff *skb)
 {
struct net_device *orig_dev;
struct rmnet_port *port;
+   struct rmnet_priv *priv;
+   u8 mux_id;
 
orig_dev = skb->dev;
-   skb->dev = ep->egress_dev;
+   priv = netdev_priv(orig_dev);
+   skb->dev = priv->real_dev;
+   mux_id = priv->mux_id;
 
port = rmnet_get_port(skb->dev);
if (!port) {
@@ -192,7 +194,7 @@ void rmnet_egress_handler(struct sk_buff *skb,
}
 
if (port->egress_data_format & RMNET_EGRESS_FORMAT_MAP) {
-   switch (rmnet_map_egress_handler(skb, port, ep, orig_dev)) {
+   switch (rmnet_map_egress_handler(skb, port, mux_id, orig_dev)) {
case RMNET_MAP_CONSUMED:
return;
 
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
index

[PATCH net-next v2 3/7] net: qualcomm: rmnet: Move rmnet_mode to rmnet_port

2017-10-11 Thread Subash Abhinov Kasiviswanathan

Mode information on the real device makes it easier to route packets
to the rmnet device or the bridged device based on the configuration.

Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c   | 12 +---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h   |  2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c |  3 +--
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 8403eea..85fce9c 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -124,20 +124,17 @@ static int rmnet_register_real_device(struct net_device *real_dev)
 }
 
 static void rmnet_set_endpoint_config(struct net_device *dev,
- u8 mux_id, u8 rmnet_mode,
- struct net_device *egress_dev)
+ u8 mux_id, struct net_device *egress_dev)
 {
struct rmnet_endpoint *ep;
 
-   netdev_dbg(dev, "id %d mode %d dev %s\n",
-  mux_id, rmnet_mode, egress_dev->name);
+   netdev_dbg(dev, "id %d dev %s\n", mux_id, egress_dev->name);
 
ep = rmnet_get_endpoint(dev, mux_id);
/* This config is cleared on every set, so its ok to not
 * clear it on a device delete.
 */
memset(ep, 0, sizeof(struct rmnet_endpoint));
-   ep->rmnet_mode = rmnet_mode;
ep->egress_dev = egress_dev;
ep->mux_id = mux_id;
 }
@@ -183,9 +180,10 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
   ingress_format, egress_format);
port->egress_data_format = egress_format;
port->ingress_data_format = ingress_format;
+   port->rmnet_mode = mode;
 
-   rmnet_set_endpoint_config(real_dev, mux_id, mode, dev);
-   rmnet_set_endpoint_config(dev, mux_id, mode, real_dev);
+   rmnet_set_endpoint_config(real_dev, mux_id, dev);
+   rmnet_set_endpoint_config(dev, mux_id, real_dev);
return 0;
 
 err2:
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index 0b0c5a7..03d473f 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -24,7 +24,6 @@
  * Exact usage of this parameter depends on the rmnet_mode.
  */
 struct rmnet_endpoint {
-   u8 rmnet_mode;
u8 mux_id;
struct net_device *egress_dev;
 };
@@ -39,6 +38,7 @@ struct rmnet_port {
u32 egress_data_format;
struct net_device *rmnet_devices[RMNET_MAX_LOGICAL_EP];
u8 nr_rmnet_devs;
+   u8 rmnet_mode;
 };
 
 extern struct rtnl_link_ops rmnet_link_ops;
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index b50f401..86e37cc 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -205,8 +205,7 @@ void rmnet_egress_handler(struct sk_buff *skb,
}
}
 
-   if (ep->rmnet_mode == RMNET_EPMODE_VND)
-   rmnet_vnd_tx_fixup(skb, orig_dev);
+   rmnet_vnd_tx_fixup(skb, orig_dev);
 
dev_queue_xmit(skb);
 }
-- 
1.9.1

[PATCH net-next v2 0/7] net: qualcomm: rmnet: Rewrite some existing functionality

2017-10-11 Thread Subash Abhinov Kasiviswanathan

This series fixes some of the broken rmnet functionality.
Bridge mode is re-written and made useable and the muxed_ep is converted to 
hlist.

Patches 1-5 are cleanups in preparation for these changes.
Patch 6 does the hlist conversion.
Patch 7 has the implementation of the rmnet bridge mode.

v1->v2: Fix the warning and code style issue in rmnet_rx_handler as
mentioned by David.

Subash Abhinov Kasiviswanathan (7):
  net: qualcomm: rmnet: Remove existing logic for bridge mode
  net: qualcomm: rmnet: Remove some unused defines
  net: qualcomm: rmnet: Move rmnet_mode to rmnet_port
  net: qualcomm: rmnet: Remove duplicate setting of rmnet private info
  net: qualcomm: rmnet: Remove duplicate setting of rmnet_devices
  net: qualcomm: rmnet: Convert the muxed endpoint to hlist
  net: qualcomm: rmnet: Implement bridge mode

 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 166 -
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  19 +--
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 137 +++--
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.h   |   3 +-
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c|   4 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h|   8 -
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c|  36 ++---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h|   7 +-
 8 files changed, 207 insertions(+), 173 deletions(-)

-- 
1.9.1

Re: [PATCH net-next 2/4] security: bpf: Add LSM hooks for bpf object related syscall

2017-10-11 Thread James Morris

On Wed, 4 Oct 2017, Chenbo Feng wrote:

>  int bpf_map_new_fd(struct bpf_map *map, int flags)
>  {
> + if (security_bpf_map(map, OPEN_FMODE(flags)))
> + return -EPERM;
> +

Don't hardcode -EPERM here, return the actual error from 
security_bpf_map().

> + if (security_bpf_prog(prog))
> + return -EPERM;
> +

Same.

> + err = security_bpf(cmd, &attr, size);
> + if (err)
> + return -EPERM;

Same.

- James
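
The review point above can be illustrated in plain C. This is a user-space sketch only: hook_check() is a hypothetical stand-in for an LSM hook such as security_bpf_map(), and the returned value 3 is a pretend file descriptor. The two wrappers contrast hardcoding -EPERM with handing back whatever the hook returned.

```c
#include <errno.h>

/* Hypothetical stand-in for an LSM hook: returns 0 on success or a
 * negative errno chosen by the security module. */
static int hook_check(int verdict)
{
	return verdict;
}

/* Pattern the review objects to: every failure collapses to -EPERM. */
static int new_fd_hardcoded(int verdict)
{
	if (hook_check(verdict))
		return -EPERM;
	return 3;	/* pretend file descriptor */
}

/* Suggested pattern: hand the hook's error back unchanged. */
static int new_fd_propagated(int verdict)
{
	int err = hook_check(verdict);

	if (err)
		return err;
	return 3;	/* pretend file descriptor */
}
```

With the second form, a module that wants to report -EACCES (or any other errno) is not silently overridden by the caller.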

Re: BUG:af_packet fails to TX TSO frames

2017-10-11 Thread Willem de Bruijn

On Wed, Oct 11, 2017 at 6:01 PM, Anton Ivanov wrote:
> [snip]
>
>> This will be tomorrow though, it is late here.
>>
>> The only obvious difference I can see at this point is that I am using
>> iovs and sending the vnet header as iov[0] and the data in pieces after
>> that while your code is doing a send() for the whole frame. This should
>> not make any difference though - it all ends up as an iov internally in
>> the kernel.
>
> Spoke too soon. It is not reporting any errors, but there is nothing
> coming out on the actual Ethernet.

It works for me on various platforms. On the receiver, drop these fake
tcp packets in iptables and read them with tcpdump

   iptables -A PREROUTING -t raw -p tcp --dport 9 -j DROP
   tcpdump src $src_ip

Note that not all combinations of flags are supported by the kernel
and that some flags have non-obvious behavior (they disable a feature
rather than enable it).

Specifically, mtu sized packets either must not pass a vnet_hdr or
must pass one with gso explicitly disabled ('-G').

  psock_txring_vnet -s $src_ip $dst_ip -l 1400
  psock_txring_vnet -s $src_ip $dst_ip -l 1400 -v -G
  psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N
  psock_txring_vnet -s $src_ip $dst_ip -l 1400 -N -v -G

Conversely, packets that exceed mtu have to have the gso flags in the
virtio_net_hdr:

  psock_txring_vnet -s $src_ip $dst_ip -l 4400 -v
  psock_txring_vnet -s $src_ip $dst_ip -l 4400 -N -v

When sending a large packet, but not passing a virtio_net_hdr along
('-v'), the test fails with

  psock_txring_vnet: send: Message too long

When passing a header along, but not disabling gso, the packet is
indeed dropped silently.

I verified correct segmentation with three modes of ethtool

  ethtool -K eth0 tso off gso off
  ethtool -K eth0 tso off gso on
  ethtool -K eth0 tso on gso on

by reading tcpdump on the sender.

The receive side results are the same with dev_queue_xmit and
packet_direct_xmit ('-q') mode. With direct_xmit, the packets are not
observed on the send side.
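
The rule described above (MTU-sized packets need GSO off, larger packets need the GSO fields set) can be sketched as follows. This is a minimal illustration and not the code of psock_txring_vnet: vnet_hdr_for() and ETH_HDR_LEN are hypothetical names, only TCP over IPv4 is handled, and the checksum fields are left at zero, which a real sender may need to fill in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/virtio_net.h>

#define ETH_HDR_LEN 14	/* Ethernet header without VLAN tag (assumption) */

/* Hypothetical helper: describe one TCP/IPv4 frame to the kernel.
 * ip_len is the IP packet length, mtu the device MTU and l3l4_hdr_len
 * the combined IP + TCP header length, all in bytes. */
static struct virtio_net_hdr vnet_hdr_for(size_t ip_len, size_t mtu,
					  uint16_t l3l4_hdr_len)
{
	struct virtio_net_hdr vh;

	memset(&vh, 0, sizeof(vh));
	if (ip_len <= mtu) {
		/* Fits in one frame: GSO is explicitly off. */
		vh.gso_type = VIRTIO_NET_HDR_GSO_NONE;
		return vh;
	}
	/* Larger than the MTU: ask for TCPv4 segmentation. */
	vh.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
	vh.hdr_len = (uint16_t)(ETH_HDR_LEN + l3l4_hdr_len);
	vh.gso_size = (uint16_t)(mtu - l3l4_hdr_len);
	return vh;
}
```

For a 1500-byte MTU and 40 bytes of IP + TCP headers, an oversized packet would get gso_size 1460 and hdr_len 54 in this sketch.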

[PATCH next] ipvlan: always use the current L2 addr of the master

2017-10-11 Thread Mahesh Bandewar

From: Mahesh Bandewar 

If the underlying master ever changes its L2 address (e.g. a bonding
device), then make sure that the IPvlan slaves always emit packets with
the current L2 address of the master instead of the stale MAC address
which was copied during device creation. The problem can be seen with
the following script:

  #!/bin/bash
  # Create a vEth pair
  ip link add dev veth0 type veth peer name veth1
  ip link set veth0 up
  ip link set veth1 up
  ip link show veth0
  ip link show veth1
  # Create an IPvlan device on one end of this vEth pair.
  ip link add link veth0 dev ipvl0 type ipvlan mode l2
  ip link show ipvl0
  # Change the mac-address of the vEth master.
  ip link set veth0 address 02:11:22:33:44:55

Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Mahesh Bandewar 
---
 drivers/net/ipvlan/ipvlan_main.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index c74893c1e620..5832091680f4 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -407,7 +407,7 @@ static int ipvlan_hard_header(struct sk_buff *skb, struct net_device *dev,
 * while the packets use the mac-addr on the physical device.
 */
return dev_hard_header(skb, phy_dev, type, daddr,
-  saddr ? : dev->dev_addr, len);
+  saddr ? : phy_dev->dev_addr, len);
 }
 
 static const struct header_ops ipvlan_header_ops = {
@@ -730,6 +730,11 @@ static int ipvlan_device_event(struct notifier_block *unused,
ipvlan_adjust_mtu(ipvlan, dev);
break;
 
+   case NETDEV_CHANGEADDR:
+   list_for_each_entry(ipvlan, &port->ipvlans, pnode)
+   ether_addr_copy(ipvlan->dev->dev_addr, dev->dev_addr);
+   break;
+
case NETDEV_PRE_TYPE_CHANGE:
/* Forbid underlying device to change its type. */
return NOTIFY_BAD;
-- 
2.15.0.rc0.271.g36b669edcc-goog
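
The two halves of the fix above can be modeled in user space. This is only a sketch under stated assumptions: struct model_dev, pick_saddr() and master_addr_changed() are hypothetical names, and the model keeps nothing but the 6-byte L2 address of each device.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_ALEN 6	/* Ethernet address length */

/* Hypothetical user-space model of a net device: only the L2 address. */
struct model_dev {
	unsigned char dev_addr[MODEL_ALEN];
};

/* Mirrors the "saddr ? : phy_dev->dev_addr" fallback: with no explicit
 * source address, read the master's address at transmit time. */
static const unsigned char *pick_saddr(const unsigned char *saddr,
				       const struct model_dev *phy_dev)
{
	return saddr ? saddr : phy_dev->dev_addr;
}

/* Mirrors the NETDEV_CHANGEADDR case: copy the master's new address
 * into every slave so that what they report stays in sync. */
static void master_addr_changed(const struct model_dev *master,
				struct model_dev *slaves, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		memcpy(slaves[i].dev_addr, master->dev_addr, MODEL_ALEN);
}
```

Reading the master's address at transmit time means a packet cannot carry a stale copy, while the notifier-style copy keeps the address shown for each slave current.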

Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v4 5/6] i40e: Clean up of cloud filters

2017-10-11 Thread Shannon Nelson


On 10/10/2017 5:24 PM, Amritha Nambiar wrote:

Introduce the cloud filter datastructure and cleanup of cloud
filters associated with the device.

v2: Moved field comments in struct i40e_cloud_filter to the right.
Removed hlist_empty check from i40e_cloud_filter_exit()

Signed-off-by: Amritha Nambiar 
---
  drivers/net/ethernet/intel/i40e/i40e.h  |9 +
  drivers/net/ethernet/intel/i40e/i40e_main.c |   24 
  2 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f3c501e..b938bb4a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -253,6 +253,12 @@ struct i40e_fdir_filter {
u32 fd_id;
  };
  
+struct i40e_cloud_filter {

+   struct hlist_node cloud_node;
+   unsigned long cookie;
+   u16 seid;   /* filter control */
+};
+
  #define I40E_ETH_P_LLDP   0x88cc
  
  #define I40E_DCB_PRIO_TYPE_STRICT	0

@@ -420,6 +426,9 @@ struct i40e_pf {
struct i40e_udp_port_config udp_ports[I40E_MAX_PF_UDP_OFFLOAD_PORTS];
u16 pending_udp_bitmap;
  
+	struct hlist_head cloud_filter_list;

+   u16 num_cloud_filters;
+
enum i40e_interrupt_policy int_policy;
u16 rx_itr_default;
u16 tx_itr_default;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 0539d43..bcdb16a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6937,6 +6937,26 @@ static void i40e_fdir_filter_exit(struct i40e_pf *pf)
  }
  
  /**

+ * i40e_cloud_filter_exit - Cleans up the Cloud Filters
+ * @pf: Pointer to PF
+ *
+ * This function destroys the hlist where all the Cloud Filters
+ * filters were saved.


Redundant "Cloud Filters filters"


+ **/
+static void i40e_cloud_filter_exit(struct i40e_pf *pf)
+{
+   struct i40e_cloud_filter *cfilter;
+   struct hlist_node *node;
+
+   hlist_for_each_entry_safe(cfilter, node,
+ &pf->cloud_filter_list, cloud_node) {
+   hlist_del(&cfilter->cloud_node);
+   kfree(cfilter);
+   }
+   pf->num_cloud_filters = 0;
+}
+
+/**
   * i40e_close - Disables a network interface
   * @netdev: network interface device structure
   *
@@ -12195,6 +12215,7 @@ static int i40e_setup_pf_switch(struct i40e_pf *pf, bool reinit)
vsi = i40e_vsi_reinit_setup(pf->vsi[pf->lan_vsi]);
if (!vsi) {
dev_info(&pf->pdev->dev, "setup of MAIN VSI failed\n");
+   i40e_cloud_filter_exit(pf);
i40e_fdir_teardown(pf);
return -EAGAIN;
}
@@ -13029,6 +13050,8 @@ static void i40e_remove(struct pci_dev *pdev)
if (pf->vsi[pf->lan_vsi])
i40e_vsi_release(pf->vsi[pf->lan_vsi]);
  
+	i40e_cloud_filter_exit(pf);

+
/* remove attached clients */
if (pf->flags & I40E_FLAG_IWARP_ENABLED) {
ret_code = i40e_lan_del_device(pf);
@@ -13260,6 +13283,7 @@ static void i40e_shutdown(struct pci_dev *pdev)
  
  	del_timer_sync(&pf->service_timer);
cancel_work_sync(&pf->service_task);
+   i40e_cloud_filter_exit(pf);
i40e_fdir_teardown(pf);
  
  	/* Client close must be called explicitly here because the timer


___
Intel-wired-lan mailing list
intel-wired-...@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
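
The cleanup pattern in i40e_cloud_filter_exit() above (walk a list and free each entry without touching freed memory) can be sketched in user space. This is a model under stated assumptions: the driver uses struct hlist_node and hlist_for_each_entry_safe(), while the hypothetical types below use a plain singly linked list.

```c
#include <stdlib.h>

/* Hypothetical user-space model; the driver uses struct hlist_node. */
struct model_filter {
	struct model_filter *next;
	unsigned long cookie;
};

struct model_pf {
	struct model_filter *head;
	unsigned int num_filters;
};

/* Add one filter at the head of the list; returns 0, or -1 on OOM. */
static int model_filter_add(struct model_pf *pf, unsigned long cookie)
{
	struct model_filter *f = malloc(sizeof(*f));

	if (!f)
		return -1;
	f->cookie = cookie;
	f->next = pf->head;
	pf->head = f;
	pf->num_filters++;
	return 0;
}

/* Free every filter. The successor is saved before the current entry
 * is freed, which is the role "node" plays in the _safe iterator. */
static void model_filter_exit(struct model_pf *pf)
{
	struct model_filter *f = pf->head;
	struct model_filter *next;

	while (f) {
		next = f->next;
		free(f);
		f = next;
	}
	pf->head = NULL;
	pf->num_filters = 0;
}
```

Because the walk tolerates an empty list, no separate emptiness check is needed before calling the exit function, which matches the v2 note about dropping the hlist_empty check.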

Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v4 4/6] i40e: Admin queue definitions for cloud filters

2017-10-11 Thread Shannon Nelson


On 10/10/2017 5:24 PM, Amritha Nambiar wrote:

Add new admin queue definitions and extended fields for cloud
filter support. Define big buffer for extended general fields
in Add/Remove Cloud filters command.

v3: Shortened some lengthy struct names.
v2: Added I40E_CHECK_STRUCT_LEN check to AQ command structs and
added AQ definitions to i40evf for consistency based on Shannon's
feedback.

Signed-off-by: Amritha Nambiar 
Signed-off-by: Kiran Patil 
Signed-off-by: Jingjing Wu 
---
  drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |  110 
  .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h|  110 
  2 files changed, 216 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 729976b..bcc7986 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1371,14 +1371,16 @@ struct i40e_aqc_add_remove_cloud_filters {
  #define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT 0
  #define I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_MASK  (0x3FF << \
I40E_AQC_ADD_CLOUD_CMD_SEID_NUM_SHIFT)
-   u8  reserved2[4];
+   u8  big_buffer_flag;
+#define I40E_AQC_ADD_CLOUD_CMD_BB  1
+   u8  reserved2[3];
__le32  addr_high;
__le32  addr_low;
  };
  
  I40E_CHECK_CMD_LENGTH(i40e_aqc_add_remove_cloud_filters);
  
-struct i40e_aqc_add_remove_cloud_filters_element_data {

+struct i40e_aqc_cloud_filters_element_data {
u8  outer_mac[6];
u8  inner_mac[6];
__le16  inner_vlan;
@@ -1408,6 +1410,13 @@ struct i40e_aqc_add_remove_cloud_filters_element_data {
  #define I40E_AQC_ADD_CLOUD_FILTER_IMAC 0x000A
  #define I40E_AQC_ADD_CLOUD_FILTER_OMAC_TEN_ID_IMAC 0x000B
  #define I40E_AQC_ADD_CLOUD_FILTER_IIP 0x000C
+/* 0x0010 to 0x0017 is for custom filters */
+/* flag to be used when adding cloud filter: IP + L4 Port */
+#define I40E_AQC_ADD_CLOUD_FILTER_IP_PORT  0x0010
+/* flag to be used when adding cloud filter: Dest MAC + L4 Port */
+#define I40E_AQC_ADD_CLOUD_FILTER_MAC_PORT 0x0011
+/* flag to be used when adding cloud filter: Dest MAC + VLAN + L4 Port */
+#define I40E_AQC_ADD_CLOUD_FILTER_MAC_VLAN_PORT 0x0012


Short description comments to the side of each line would be more
readable, and it is probably fine to ignore the 80-column limit here:


#define I40E_AQC_ADD_CLOUD_FILTER_IP_PORT	0x0010 /* Dest IP + L4 Port */
#define I40E_AQC_ADD_CLOUD_FILTER_MAC_PORT	0x0011 /* Dest MAC + L4 Port */
#define I40E_AQC_ADD_CLOUD_FILTER_MAC_VLAN_PORT	0x0012 /* Dest MAC + VLAN + L4 Port */



  
  #define I40E_AQC_ADD_CLOUD_FLAGS_TO_QUEUE		0x0080

  #define I40E_AQC_ADD_CLOUD_VNK_SHIFT  6
@@ -1442,6 +1451,49 @@ struct i40e_aqc_add_remove_cloud_filters_element_data {
u8  response_reserved[7];
  };
  
+I40E_CHECK_STRUCT_LEN(0x40, i40e_aqc_cloud_filters_element_data);

+
+/* i40e_aqc_cloud_filters_element_bb is used when
+ * I40E_AQC_CLOUD_CMD_BB flag is set.
+ */
+struct i40e_aqc_cloud_filters_element_bb {
+   struct i40e_aqc_cloud_filters_element_data element;
+   u16 general_fields[32];
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X10_WORD0   0
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X10_WORD1   1
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X10_WORD2   2
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X11_WORD0   3
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X11_WORD1   4
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X11_WORD2   5
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X12_WORD0   6
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X12_WORD1   7
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X12_WORD2   8
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X13_WORD0   9
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X13_WORD1   10
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X13_WORD2   11
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X14_WORD0   12
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X14_WORD1   13
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X14_WORD2   14
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X16_WORD0   15
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X16_WORD1   16
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X16_WORD2   17
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X16_WORD3   18
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X16_WORD4   19
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X16_WORD5   20
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X16_WORD6   21
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X16_WORD7   22
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X17_WORD0   23
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X17_WORD1   24
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X17_WORD2   25
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X17_WORD3   26
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X17_WORD4   27
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X17_WORD5   28
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X17_WORD6   29
+#define I40E_AQC_ADD_CLOUD_FV_FLU_0X17_WORD7   30
+};
+
+I40E_CHECK_STRUCT_LEN(0x80,

Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v4 6/6] i40e: Enable cloud filters via tc-flower

2017-10-11 Thread Shannon Nelson


On 10/10/2017 5:24 PM, Amritha Nambiar wrote:

This patch enables tc-flower based hardware offloads. tc flower
filter provided by the kernel is configured as driver specific
cloud filter. The patch implements functions and admin queue
commands needed to support cloud filters in the driver and
adds cloud filters to configure these tc-flower filters.

The classification function of the filter is to direct matched
packets to a traffic class which is set based on the offloaded
tc-flower classid. The approach here is similar to the tc 'prio'
qdisc which uses the classid for band selection. The ingress qdisc
is called ffff:0, so traffic classes are ffff:1 to ffff:8 (i40e
has max of 8 TCs). TC0 is minor number 1, TC1 is minor number 2 etc.
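
The classid convention in the paragraph above (minor number 1 is TC0, minor number 2 is TC1, and so on up to 8 traffic classes) can be sketched as a small mapping function. This is an assumption-laden illustration, not the driver's code: classid_minor_to_tc() and MODEL_MAX_TCS are hypothetical names, and the function relies only on the general tc convention that the minor number is the low 16 bits of a handle.

```c
#include <stdint.h>

#define MODEL_MAX_TCS 8	/* "i40e has max of 8 TCs" per the text above */

/* Hypothetical helper: map the minor number of a tc classid to a
 * traffic class index. Minor 0 is the qdisc itself, minors 1..8 are
 * TC0..TC7. Returns -1 when the minor does not name a traffic class. */
static int classid_minor_to_tc(uint32_t classid)
{
	uint32_t minor = classid & 0xffff;	/* low 16 bits of a tc handle */

	if (minor < 1 || minor > MODEL_MAX_TCS)
		return -1;
	return (int)minor - 1;
}
```
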

# tc qdisc add dev eth0 ingress
# ethtool -K eth0 hw-tc-offload on

Match Dst MAC and route to TC0:
# tc filter add dev eth0 protocol ip parent ffff:\
   prio 1 flower dst_mac 3c:fd:fe:a0:d6:70 skip_sw\
   classid ffff:1

Match Dst IPv4,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ip parent ffff:\
   prio 2 flower dst_ip 192.168.3.5/32\
   ip_proto udp dst_port 25 skip_sw\
   classid ffff:2

Match Dst IPv6,Dst Port and route to TC1:
# tc filter add dev eth0 protocol ipv6 parent ffff:\
   prio 3 flower dst_ip fe8::200:1\
   ip_proto udp dst_port 66 skip_sw\
   classid ffff:2

Delete tc flower filter:
Example:

# tc filter del dev eth0 parent ffff: prio 3 handle 0x1 flower
# tc filter del dev eth0 parent ffff:

Flow Director Sideband is disabled while configuring cloud filters
via tc-flower and remains disabled as long as any cloud filter exists.

Unsupported matches when cloud filters are added using enhanced
big buffer cloud filter mode of underlying switch include:
1. source port and source IP
2. Combined MAC address and IP fields.
3. Not specifying L4 port

These filter matches can however be used to redirect traffic to
the main VSI (tc 0) which does not require the enhanced big buffer
cloud filter support.

v4: Use classid to set traffic class for matched packets. Do not
allow disabling hw-tc-offloads when offloaded tc filters are active.
v3: Cleaned up some lengthy function names. Changed ipv6 address to
__be32 array instead of u8 array. Used macro for IP version. Minor
formatting changes.
v2:
1. Moved I40E_SWITCH_MODE_MASK definition to i40e_type.h
2. Moved dev_info for add/deleting cloud filters in else condition
3. Fixed some format specifier in dev_err logs
4. Refactored i40e_get_capabilities to take an additional
list_type parameter and use it to query device and function
level capabilities.
5. Fixed parsing tc redirect action to check for the is_tcf_mirred_tc()
to verify if redirect to a traffic class is supported.
6. Added comments for Geneve fix in cloud filter big buffer AQ
function definitions.
7. Cleaned up setup_tc interface to rebase and work with Jiri's
updates, separate function to process tc cls flower offloads.
8. Changes to make Flow Director Sideband and Cloud filters mutually
exclusive.

Signed-off-by: Amritha Nambiar 
Signed-off-by: Kiran Patil 
Signed-off-by: Anjali Singhai Jain 
Signed-off-by: Jingjing Wu 
---
  drivers/net/ethernet/intel/i40e/i40e.h |   45 +
  drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |3
  drivers/net/ethernet/intel/i40e/i40e_common.c  |  189 
  drivers/net/ethernet/intel/i40e/i40e_main.c|  913 +++-
  drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   16
  drivers/net/ethernet/intel/i40e/i40e_type.h|1
  .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h|3
  7 files changed, 1140 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index b938bb4a..c3f1312 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -55,6 +55,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include "i40e_type.h"
  #include "i40e_prototype.h"
  #include "i40e_client.h"
@@ -253,9 +255,48 @@ struct i40e_fdir_filter {
u32 fd_id;
  };
  
+#define IPV4_VERSION 4

+#define IPV6_VERSION 6


Why bother with yet-another-ip-type name?  Just use the existing 
ETH_P_IP and ETH_P_IPV6.



+
+#define I40E_CLOUD_FIELD_OMAC  0x01
+#define I40E_CLOUD_FIELD_IMAC  0x02
+#define I40E_CLOUD_FIELD_IVLAN 0x04
+#define I40E_CLOUD_FIELD_TEN_ID 0x08
+#define I40E_CLOUD_FIELD_IIP   0x10
+
+#define I40E_CLOUD_FILTER_FLAGS_OMAC   I40E_CLOUD_FIELD_OMAC
+#define I40E_CLOUD_FILTER_FLAGS_IMAC   I40E_CLOUD_FIELD_IMAC
+#define I40E_CLOUD_FILTER_FLAGS_IMAC_IVLAN (I40E_CLOUD_FIELD_IMAC | \
+I40E_CLOUD_FIELD_IVLAN)
+#define I40E_CLOUD_FILTER_FLAGS_IMAC_TEN_ID (I40E_CLOUD_FIELD_IMAC | \
+I40E_CLOUD_FIELD_TEN_ID)
+#define

Re: [Intel-wired-lan] [jkirsher/next-queue PATCH v4 3/6] i40e: Cloud filter mode for set_switch_config command

2017-10-11 Thread Shannon Nelson


On 10/10/2017 5:24 PM, Amritha Nambiar wrote:

Add definitions for L4 filters and switch modes based on cloud filters
modes and extend the set switch config command to include the
additional cloud filter mode.

Signed-off-by: Amritha Nambiar 
Signed-off-by: Kiran Patil 
---
  drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h |   30 -
  drivers/net/ethernet/intel/i40e/i40e_common.c |4 ++-
  drivers/net/ethernet/intel/i40e/i40e_ethtool.c|2 +
  drivers/net/ethernet/intel/i40e/i40e_main.c   |2 +
  drivers/net/ethernet/intel/i40e/i40e_prototype.h  |2 +
  drivers/net/ethernet/intel/i40e/i40e_type.h   |9 ++
  6 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index 6a5db1b..729976b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -790,7 +790,35 @@ struct i40e_aqc_set_switch_config {
 */
__le16  first_tag;
__le16  second_tag;
-   u8  reserved[6];
+   /* Next byte is split into following:
+* Bit 7 : 0: No action, 1: Switch to mode defined by bits 6:0
+* Bit 6: 0 : Destination Port, 1: source port
+* Bit 5..4: L4 type


Can you tweak the formatting on these comments to line up the first 
couple of ':'s?



+* 0: rsvd
+* 1: TCP
+* 2: UDP
+* 3: Both TCP and UDP
+* Bits 3:0 Mode
+* 0: default mode
+* 1: L4 port only mode
+* 2: non-tunneled mode
+* 3: tunneled mode
+*/
+#define I40E_AQ_SET_SWITCH_BIT7_VALID  0x80
+
+#define I40E_AQ_SET_SWITCH_L4_SRC_PORT 0x40
+
+#define I40E_AQ_SET_SWITCH_L4_TYPE_RSVD 0x00
+#define I40E_AQ_SET_SWITCH_L4_TYPE_TCP 0x10
+#define I40E_AQ_SET_SWITCH_L4_TYPE_UDP 0x20
+#define I40E_AQ_SET_SWITCH_L4_TYPE_BOTH 0x30
+
+#define I40E_AQ_SET_SWITCH_MODE_DEFAULT 0x00
+#define I40E_AQ_SET_SWITCH_MODE_L4_PORT 0x01
+#define I40E_AQ_SET_SWITCH_MODE_NON_TUNNEL 0x02
+#define I40E_AQ_SET_SWITCH_MODE_TUNNEL 0x03
+   u8  mode;
+   u8  rsvd5[5];
  };
  
  I40E_CHECK_CMD_LENGTH(i40e_aqc_set_switch_config);

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 1b85eb3..0b3c5b7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -2402,13 +2402,14 @@ i40e_status i40e_aq_get_switch_config(struct i40e_hw *hw,
   * @hw: pointer to the hardware structure
   * @flags: bit flag values to set
   * @valid_flags: which bit flags to set
+ * @mode: cloud filter mode
   * @cmd_details: pointer to command details structure or NULL
   *
   * Set switch configuration bits
   **/
  enum i40e_status_code i40e_aq_set_switch_config(struct i40e_hw *hw,
u16 flags,
-   u16 valid_flags,
+   u16 valid_flags, u8 mode,
struct i40e_asq_cmd_details *cmd_details)
  {
struct i40e_aq_desc desc;
@@ -2420,6 +2421,7 @@ enum i40e_status_code i40e_aq_set_switch_config(struct i40e_hw *hw,
  i40e_aqc_opc_set_switch_config);
scfg->flags = cpu_to_le16(flags);
scfg->valid_flags = cpu_to_le16(valid_flags);
+   scfg->mode = mode;
if (hw->flags & I40E_HW_FLAG_802_1AD_CAPABLE) {
scfg->switch_tag = cpu_to_le16(hw->switch_tag);
scfg->first_tag = cpu_to_le16(hw->first_tag);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index a760d75..37ca294 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -4341,7 +4341,7 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
sw_flags = I40E_AQ_SET_SWITCH_CFG_PROMISC;
valid_flags = I40E_AQ_SET_SWITCH_CFG_PROMISC;
ret = i40e_aq_set_switch_config(&pf->hw, sw_flags, valid_flags,
-   NULL);
+   0, NULL);
if (ret && pf->hw.aq.asq_last_status != I40E_AQ_RC_ESRCH) {
dev_info(&pf->pdev->dev,
 "couldn't set switch config bits, err %s aq_err %s\n",
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 33a8f429..0539d43 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -12165,7 +12165,7 @@ static int i40e_setup_pf_switch(struct
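
The bit layout of the new mode byte in struct i40e_aqc_set_switch_config, as documented in the comment earlier in this patch, can be exercised with a small helper. The define values are copied from the patch; model_switch_mode() is a hypothetical function written for illustration and is not part of the driver.

```c
#include <stdint.h>

/* Values copied from the defines in the patch above. */
#define I40E_AQ_SET_SWITCH_BIT7_VALID		0x80
#define I40E_AQ_SET_SWITCH_L4_SRC_PORT		0x40
#define I40E_AQ_SET_SWITCH_L4_TYPE_TCP		0x10
#define I40E_AQ_SET_SWITCH_L4_TYPE_UDP		0x20
#define I40E_AQ_SET_SWITCH_MODE_L4_PORT		0x01
#define I40E_AQ_SET_SWITCH_MODE_NON_TUNNEL	0x02

/* Hypothetical helper: build the mode byte. Bit 7 requests the switch,
 * bit 6 selects source vs destination port, bits 5..4 carry the L4
 * type and bits 3..0 the mode. */
static uint8_t model_switch_mode(uint8_t l4_type, uint8_t mode, int src_port)
{
	uint8_t val = I40E_AQ_SET_SWITCH_BIT7_VALID;

	if (src_port)
		val |= I40E_AQ_SET_SWITCH_L4_SRC_PORT;
	val |= l4_type & 0x30;
	val |= mode & 0x0f;
	return val;
}
```

For example, TCP with L4-port-only mode on the destination port gives 0x80 | 0x10 | 0x01 = 0x91.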

Re: [PATCH net-next 0/7] Rewrite some existing functionality

2017-10-11 Thread Subash Abhinov Kasiviswanathan

On 2017-10-11 16:25, David Miller wrote:

> From: David Miller
> Date: Wed, 11 Oct 2017 15:22:59 -0700 (PDT)
>
>> From: Subash Abhinov Kasiviswanathan
>> Date: Tue, 10 Oct 2017 22:17:29 -0600
>>
>>> This series fixes some of the broken rmnet functionality.
>>> Bridge mode is re-written and made useable and the muxed_ep is
>>> converted to hlist.
>>>
>>> Patches 1-5 are cleanups in preparation for these changes.
>>> Patch 6 does the hlist conversion.
>>> Patch 7 has the implementation of the rmnet bridge mode.
>>
>> Series applied, thank you.
>
> Actually, I reverted:
>
> drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c: In function ‘rmnet_rx_handler’:
> drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:174:6: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   int rc;
>   ^~
>
> Also, the indentation of the switch statement is wrong, the break
> statements need to be indented the same as the rest of the code
> in their switch statements.

Hi David

I'll fix this and upload v2.
Somehow my compiler didn't throw this warning even though I have -Wall
set.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH 0/4] net: qcom/emac: various minor fixes

2017-10-11 Thread David Miller

From: Timur Tabi 
Date: Wed, 11 Oct 2017 14:52:22 -0500

> A set of patches for 4.15 that clean up some code, apply minor fixes,
> and so on.  Some of the code also prepares the driver for a future 
> version of the EMAC controller.

Series applied, thank you.

Re: [PATCH] i40e/i40evf: actually use u32 for feature flags

2017-10-11 Thread Jeff Kirsher

On Wed, 2017-10-11 at 16:02 +0200, Arnd Bergmann wrote:
> A previous cleanup intended to change the flags variable to 32
> bit instead of 64, but accidentally left out the important
> part of that change, leading to a build error:
> 
> drivers/net/ethernet/intel/i40e/i40e_ethtool.o: In function
> `i40e_set_priv_flags':
> i40e_ethtool.c:(.text+0x1a94): undefined reference to
> `wrong_size_cmpxchg'
> 
> This adds the missing modification.
> 
> Fixes: b74f571f59a8 ("i40e/i40evf: organize and re-number feature
> flags")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Too slow... :-) I had already sent a fix for this on Monday to David.
Check David Miller's net-next tree, it is already there.

Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e

2017-10-11 Thread Nambiar, Amritha

On 10/11/2017 5:42 AM, Jamal Hadi Salim wrote:
> On 17-10-10 08:24 PM, Amritha Nambiar wrote:
>> This patch series enables configuring cloud filters in i40e
>> using the tc-flower classifier. The classification function
>> of the filter is to match a packet to a class. cls_flower is
>> extended to offload classid to hardware. The offloaded classid
>> is used to direct matched packets to a traffic class on the device.
>> The approach here is similar to the tc 'prio' qdisc which uses
>> the classid for band selection. The ingress qdisc is called ffff:0,
>> so traffic classes are ffff:1 to ffff:8 (i40e has max of 8 TCs).
>> TC0 is minor number 1, TC1 is minor number 2 etc.
>>
>> The cloud filters are added for a VSI and are cleaned up when
>> the VSI is deleted. The filters that match on L4 ports need
>> enhanced admin queue functions with big buffer support for
>> extended fields in cloud filter commands.
>>
>> Example:
>> # tc qdisc add dev eth0 ingress
>> # ethtool -K eth0 hw-tc-offload on
>>
>> Match Dst IPv4,Dst Port and route to TC1:
>> # tc filter add dev eth0 protocol ip parent ffff: prio 1 flower\
>>dst_ip 192.168.1.1/32 ip_proto udp dst_port 22\
>>skip_sw classid ffff:2
>>
>> # tc filter show dev eth0 parent ffff:
>> filter pref 1 flower chain 0
>> filter pref 1 flower chain 0 handle 0x1 classid ffff:2
>>eth_type ipv4
>>ip_proto udp
>>dst_ip 192.168.1.1
>>dst_port 22
>>skip_sw
>>in_hw
>>
> 
> Much, much better semantics. Thank you.
> Have you tested many filters mapping to the same classid?

Yes, I have tested mapping different filters to the same classID;
packets matching the flows were assigned the same classID and routed to
the same traffic class in HW.

filter pref 1 flower chain 0
filter pref 1 flower chain 0 handle 0x1 classid ffff:2
  dst_mac 3c:fd:fe:a0:d6:70
  eth_type ipv4
  ip_proto udp
  dst_port 12000
  in_hw
filter pref 5 flower chain 0
filter pref 5 flower chain 0 handle 0x1 classid ffff:2
  eth_type ipv4
  ip_proto udp
  dst_ip 192.168.1.1
  dst_port 12000
  in_hw

> 
> cheers,
> jamal
>

Re: [PATCH v2 0/7] net: qrtr: Fixes and support receiving version 2 packets

2017-10-11 Thread David Miller

From: Bjorn Andersson 
Date: Tue, 10 Oct 2017 23:45:16 -0700

> On the latest Qualcomm platforms remote processors are sending packets with
> version 2 of the message header. This series starts off with some fixes and
> then refactors the qrtr code to support receiving messages of both version 1
> and version 2.
> 
> As all remotes are backwards compatible, transmitted packets continue to be
> sent as version 1, but some groundwork has been done to make this a per-link
> property.

Series applied, thanks.

Re: [PATCH] r8169: only enable PCI wakeups when WOL is active

2017-10-11 Thread David Miller

From: Daniel Drake 
Date: Wed, 11 Oct 2017 12:56:52 +0800

> rtl_init_one() currently enables PCI wakeups if the ethernet device
> is found to be WOL-capable. There is no need to do this when
> rtl8169_set_wol() will correctly enable or disable the same wakeup flag
> when WOL is activated/deactivated.
> 
> This works around an ACPI DSDT bug which prevents the Acer laptop models
> Aspire ES1-533, Aspire ES1-732, PackardBell ENTE69AP and Gateway NE533
> from entering S3 suspend - even when no ethernet cable is connected.
> 
> On these platforms, the DSDT says that GPE08 is a wakeup source for
> ethernet, but this GPE fires as soon as the system goes into suspend,
> waking the system up immediately. Having the wakeup normally disabled
> avoids this issue in the default case.
> 
> With this change, WOL will continue to be unusable on these platforms
> (it will instantly wake up if WOL is later enabled by the user) but we
> do not expect this to be a commonly used feature on these consumer
> laptops. We have separately determined that WOL works fine without any
> ACPI GPEs enabled during sleep, so a DSDT fix or override would be
> possible to make WOL work.
> 
> Signed-off-by: Daniel Drake 

Applied, thank you.

Re: [PATCH net-next 0/7] Rewrite some existing functionality

2017-10-11 Thread David Miller

From: David Miller 
Date: Wed, 11 Oct 2017 15:22:59 -0700 (PDT)

> From: Subash Abhinov Kasiviswanathan 
> Date: Tue, 10 Oct 2017 22:17:29 -0600
> 
>> This series fixes some of the broken rmnet functionality.
>> Bridge mode is re-written and made useable and the muxed_ep is converted to 
>> hlist.
>> 
>> Patches 1-5 are cleanups in preparation for these changes.
>> Patch 6 does the hlist conversion.
>> Patch 7 has the implementation of the rmnet bridge mode.
> 
> Series applied, thank you.

Actually, I reverted:

drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c: In function ‘rmnet_rx_handler’:
drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:174:6: warning: ‘rc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  int rc;
  ^~

Also, the indentation of the switch statement is wrong, the break
statements need to be indented the same as the rest of the code
in their switch statements.
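
Both review points can be shown in a few lines of C. This is a generic sketch and not the rmnet_rx_handler() code: the enum and function names are hypothetical. The return variable gets a defined value on every path, and the switch follows kernel style, with case labels at the same level as the switch and each break indented with the code of its case.

```c
/* Hypothetical actions, for illustration only. */
enum model_action {
	MODEL_PASS,
	MODEL_DROP,
	MODEL_OTHER,
};

static int model_rx_result(enum model_action action)
{
	int rc = 0;	/* initialized, so no path returns an unset value */

	switch (action) {
	case MODEL_PASS:
		rc = 1;
		break;
	case MODEL_DROP:
		rc = -1;
		break;
	default:
		break;
	}

	return rc;
}
```

Initializing at the declaration is one common way to silence -Wmaybe-uninitialized; assigning rc in every branch, including the default one, would work as well.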

Re: [PATCH net-next 0/7] Rewrite some existing functionality

2017-10-11 Thread David Miller

From: Subash Abhinov Kasiviswanathan 
Date: Tue, 10 Oct 2017 22:17:29 -0600

> This series fixes some of the broken rmnet functionality.
> Bridge mode is re-written and made useable and the muxed_ep is converted to 
> hlist.
> 
> Patches 1-5 are cleanups in preparation for these changes.
> Patch 6 does the hlist conversion.
> Patch 7 has the implementation of the rmnet bridge mode.

Series applied, thank you.

Re: [PATCH net-next] net: hns3: make local functions static

2017-10-11 Thread David Miller

From: Wei Yongjun 
Date: Wed, 11 Oct 2017 02:35:23 +

> Fixes the following sparse warnings:
> 
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:464:5: warning:
>  symbol 'hns3_change_all_ring_bd_num' was not declared. Should it be static?
> drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c:477:5: warning:
>  symbol 'hns3_set_ringparam' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied.

RE: [PATCH] i40e/i40evf: actually use u32 for feature flags

2017-10-11 Thread Keller, Jacob E

> -----Original Message-----
> From: Arnd Bergmann [mailto:a...@arndb.de]
> Sent: Wednesday, October 11, 2017 7:03 AM
> To: Kirsher, Jeffrey T 
> Cc: Arnd Bergmann ; Keller, Jacob E
> ; Duyck, Alexander H
> ; Williams, Mitch A
> ; Sadowski, Filip ; 
> intel-
> wired-...@lists.osuosl.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH] i40e/i40evf: actually use u32 for feature flags
> 
> A previous cleanup intended to change the flags variable to 32
> bit instead of 64, but accidentally left out the important
> part of that change, leading to a build error:
> 
> drivers/net/ethernet/intel/i40e/i40e_ethtool.o: In function 
> `i40e_set_priv_flags':
> i40e_ethtool.c:(.text+0x1a94): undefined reference to `wrong_size_cmpxchg'
> 
> This adds the missing modification.
> 

Hah good eyes. I'm guessing this got messed up in a patch re-ordering.

Acked-by: Jacob Keller 

> Fixes: b74f571f59a8 ("i40e/i40evf: organize and re-number feature flags")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h
> b/drivers/net/ethernet/intel/i40e/i40e.h
> index 18c453a3e728..7baf6d8a84dd 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -424,7 +424,7 @@ struct i40e_pf {
>  #define I40E_HW_PORT_ID_VALID    BIT(17)
>  #define I40E_HW_RESTART_AUTONEG  BIT(18)
> 
> - u64 flags;
> + u32 flags;
>  #define I40E_FLAG_RX_CSUM_ENABLED    BIT(0)
>  #define I40E_FLAG_MSI_ENABLED    BIT(1)
>  #define I40E_FLAG_MSIX_ENABLED   BIT(2)
> --
> 2.9.0

Re: [PATCH][bpf-next] bpf: remove redundant variable old_flags

2017-10-11 Thread Daniel Borkmann


On 10/11/2017 12:56 PM, Colin King wrote:

From: Colin Ian King 

Variable old_flags is being assigned but is never read; it is redundant
and can be removed.

Cleans up clang warning: Value stored to 'old_flags' is never read

Signed-off-by: Colin Ian King 


Acked-by: Daniel Borkmann

RE: [PATCH] fm10k: mark PM functions as __maybe_unused

2017-10-11 Thread Keller, Jacob E



> -----Original Message-----
> From: Arnd Bergmann [mailto:a...@arndb.de]
> Sent: Wednesday, October 11, 2017 6:58 AM
> To: Kirsher, Jeffrey T 
> Cc: Arnd Bergmann ; Keller, Jacob E
> ; Kwan, Ngai-mint ;
> David S. Miller ; Florian Westphal ;
> intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH] fm10k: mark PM functions as __maybe_unused
> 
> A cleanup of the PM code left an incorrect #ifdef in place, leading
> to a harmless build warning:
> 
> drivers/net/ethernet/intel/fm10k/fm10k_pci.c:2502:12: error: 'fm10k_suspend'
> defined but not used [-Werror=unused-function]
> drivers/net/ethernet/intel/fm10k/fm10k_pci.c:2475:12: error: 'fm10k_resume'
> defined but not used [-Werror=unused-function]
> 
> It's easier to use __maybe_unused attributes here, since you
> can't pick the wrong one.
> 

Acked-by: Jacob Keller 

> Fixes: 8249c47c6ba4 ("fm10k: use generic PM hooks instead of legacy PCIe power
> hooks")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 9 ++---
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> index 1e9ae3197b17..52f8eb3c470e 100644
> --- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> +++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
> @@ -2463,7 +2463,6 @@ static int fm10k_handle_resume(struct fm10k_intfc
> *interface)
>   return err;
>  }
> 
> -#ifdef CONFIG_PM
>  /**
>   * fm10k_resume - Generic PM resume hook
>   * @dev: generic device structure
> @@ -2472,7 +2471,7 @@ static int fm10k_handle_resume(struct fm10k_intfc
> *interface)
>   * suspend or hibernation. This function does not need to handle lower PCIe
>   * device state as the stack takes care of that for us.
>   **/
> -static int fm10k_resume(struct device *dev)
> +static int __maybe_unused fm10k_resume(struct device *dev)
>  {
>   struct fm10k_intfc *interface = pci_get_drvdata(to_pci_dev(dev));
>   struct net_device *netdev = interface->netdev;
> @@ -2499,7 +2498,7 @@ static int fm10k_resume(struct device *dev)
>   * system suspend or hibernation. This function does not need to handle lower
>   * PCIe device state as the stack takes care of that for us.
>   **/
> -static int fm10k_suspend(struct device *dev)
> +static int __maybe_unused fm10k_suspend(struct device *dev)
>  {
>   struct fm10k_intfc *interface = pci_get_drvdata(to_pci_dev(dev));
>   struct net_device *netdev = interface->netdev;
> @@ -2511,8 +2510,6 @@ static int fm10k_suspend(struct device *dev)
>   return 0;
>  }
> 
> -#endif /* CONFIG_PM */
> -
>  /**
>   * fm10k_io_error_detected - called when PCI error is detected
>   * @pdev: Pointer to PCI device
> @@ -2643,11 +2640,9 @@ static struct pci_driver fm10k_driver = {
>   .id_table   = fm10k_pci_tbl,
>   .probe  = fm10k_probe,
>   .remove = fm10k_remove,
> -#ifdef CONFIG_PM
>   .driver = {
>   .pm = &fm10k_pm_ops,
>   },
> -#endif /* CONFIG_PM */
>   .sriov_configure= fm10k_iov_configure,
>   .err_handler= &fm10k_err_handler
>  };
> --
> 2.9.0

Re: [PATCH] hdlc: Convert timers to use timer_setup()

2017-10-11 Thread David Miller

From: Kees Cook 
Date: Tue, 10 Oct 2017 15:08:33 -0700

> In preparation for unconditionally passing the struct timer_list pointer to
> all timer callbacks, switch to using the new timer_setup() and from_timer()
> to pass the timer pointer explicitly. This adds a pointer back to the
> net_device, and drops needless open-coded resetting of the .function and
> .data fields.
> 
> Cc: David S. Miller 
> Cc: Krzysztof Halasa 
> Cc: netdev@vger.kernel.org
> Signed-off-by: Kees Cook 
> ---
> This requires commit 686fef928bba ("timer: Prepare to change timer
> callback argument type") in v4.14-rc3, but should be otherwise
> stand-alone.

This doesn't apply cleanly to net-next, please respin.

Re: [PATCH net-next 1/1] veth: tweak creation of veth device

2017-10-11 Thread David Miller

From: Roman Mashak 
Date: Tue, 10 Oct 2017 16:08:44 -0400

> When creating a veth pair, rtnl_new_link() first creates veth_dev, i.e.
> one end of the veth pipe, but does not register it; then veth_newlink() is
> invoked, where the peer dev is created _and_ registered, followed by veth_dev
> registration, which may fail if the peer information, that is the
> VETH_INFO_PEER attribute, has not been provided and the kernel allocates a
> unique veth name.
> 
> So, we should ask the kernel to allocate unique name for veth_dev only
> when peer info is not available.
> 
> Example:
> 
> % ip link add dev veth0 type veth
> RTNETLINK answers: File exists
> 
> After fix:
> % ip link add dev veth0 type veth
> % ip link show dev veth0
> 5: veth0@veth1:  mtu 1500 qdisc noop state DOWN mode 
> DEFAULT group default qlen 1000
> link/ether f6:ef:8b:96:f4:ec brd ff:ff:ff:ff:ff:ff
> %
> 
> Signed-off-by: Roman Mashak 

I'm not so sure about this.

If we specify an explicit tb[IFLA_NAME], we shouldn't completely ignore that
request from the user just because they didn't give any peer information.

I see what happens in this case, the peer gets 'veth0' and then since
the user asked for 'veth0' for the non-peer it conflicts.

Well, too bad.  The user must work to orchestrate things such that
this doesn't happen.  That means either providing the IFLA_NAME for
both the peer and the non-peer, or specifying neither.

I'm not applying this, sorry.
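
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `NameTable` and this Python `veth_newlink()` are hypothetical stand-ins, not kernel code, and the "lowest free index" rule for a `veth%d` pattern is a simplification. It shows the peer being registered first, taking 'veth0', and the explicitly requested 'veth0' then colliding.

```python
import errno


class NameTable:
    """Toy model of the device-name table of one network namespace."""

    def __init__(self):
        self.names = set()

    def alloc(self, pattern):
        # Pick the lowest free index for a "veth%d"-style pattern.
        i = 0
        while pattern % i in self.names:
            i += 1
        return pattern % i

    def register(self, name):
        # A duplicate name is rejected, like "RTNETLINK answers: File exists".
        if name in self.names:
            raise OSError(errno.EEXIST, "File exists")
        self.names.add(name)
        return name


def veth_newlink(table, name=None, peer_name=None):
    """Hypothetical model: the peer is registered before the requested device."""
    peer = table.register(peer_name or table.alloc("veth%d"))
    try:
        dev = table.register(name or table.alloc("veth%d"))
    except OSError:
        # Undo the peer registration when the second step fails.
        table.names.discard(peer)
        raise
    return dev, peer
```

In this model, `veth_newlink(NameTable(), name="veth0")` raises EEXIST, while naming both ends (`name="veth0", peer_name="veth1"`) or naming neither succeeds, which matches the two options given above.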

Re: BUG:af_packet fails to TX TSO frames

2017-10-11 Thread Anton Ivanov

[snip]

> This will be tomorrow though, it is late here.
>
> The only obvious difference I can see at this point is that I am using
> iovs and sending the vnet header as iov[0] and the data in pieces after
> that while your code is doing a send() for the whole frame. This should
> not make any difference though - it all ends up as an iov internally in
> the kernel.

Spoke too soon. It is not reporting any errors, but there is nothing
coming out on the actual Ethernet.

Leaving it till tomorrow.

A.

>
> A.
>
>


-- 
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661

Re: [RFC net-next 1/4] net: ipv6: Make inet6addr_validator a blocking notifier

2017-10-11 Thread David Ahern

On 10/11/17 3:13 PM, David Miller wrote:
> From: David Ahern 
> Date: Tue, 10 Oct 2017 09:41:02 -0700
> 
>> +/* validator notifier needs to be blocking;
>> + * do not call in softirq context
>> + */
>> +if (!in_softirq()) {
> 
> I think we can test this better.

The callchain we are protecting against is
7fff8149d0dd ipv6_add_addr ([kernel.kallsyms])
7fff814a161b addrconf_prefix_rcv ([kernel.kallsyms])
7fff814afb8a ndisc_router_discovery ([kernel.kallsyms])
7fff814b0310 ndisc_rcv ([kernel.kallsyms])
7fff814b62da icmpv6_rcv ([kernel.kallsyms])
7fff81499c37 ip6_input_finish ([kernel.kallsyms])
7fff81499e96 ip6_input ([kernel.kallsyms])
7fff8149a519 ip6_mc_input ([kernel.kallsyms])
7fff81499f9d ip6_rcv_finish ([kernel.kallsyms])
7fff8149a349 ipv6_rcv ([kernel.kallsyms])
7fff813fbe12 __netif_receive_skb_core ([kernel.kallsyms])
7fff813fc04c __netif_receive_skb ([kernel.kallsyms])
7fff813ff97c netif_receive_skb_internal ([kernel.kallsyms])

> 
> You should be able to audit the call sites and for each one set the
> value of a new boolean argument properly, and this way you can also
> give the boolean argument a descriptive name.

The safest is an in_atomic() check, but to your point I'll see if the
caller can pass in atomic vs blocking option as a bool.

> 
> Furthermore, we can also then pull the inet6_addr allocation out of
> the locking paths and thus use GFP_KERNEL when possible.
> 

Yes, I was thinking about that as a follow on -- how far down can the
rcu_read_lock_bh be pushed.

Re: BUG:af_packet fails to TX TSO frames

2017-10-11 Thread Anton Ivanov

[snip]

> The test can be run both with and without ring:
>
>   psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v
>   psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v -N
>
> both with and without qdisc bypass ('-q').

Thanks, apologies, I was being impatient. Started reading the source,
saw the tpacket bits and stopped there.

>
>>  - this goes via the tpacket_snd
>> which allocs via sock_alloc_send_skb. That results in a non-fragged skb
>> as it calls pskb after that with data_len = 0 asking for a contiguous one.
> but attached the ring slot as fragments in tpacket_fill_skb.
>
>> My stuff is using sendmmsg which ends up via packet_snd which allocs
>> via  sock_alloc_send_pskb which is invoked in a way which always creates
>> 2 segments - one for the linear section and one for the rest (and more
>> if needed). It is faster than tpacket by the way (several times).
>>
>> As a comparison tap and other virtual drivers use sock_alloc_send_pskb
>> with non-zero data length which results in multiple frags. The code in
>> packet_snd is in fact identical with tap (+/- some cosmetic differences).
>>
>> That is the difference between the tests and that is why your test works
>> and mine fails.
> All the above test cases work for me, including those that build skbs
> with fragments. Could you try those.

Tried it, works on all of the adapters and hosts where mine fails. I
will step by step hack-in the differences so it behaves same as mine
until I find the culprit.

This will be tomorrow though, it is late here.

The only obvious difference I can see at this point is that I am using
iovs and sending the vnet header as iov[0] and the data in pieces after
that while your code is doing a send() for the whole frame. This should
not make any difference though - it all ends up as an iov internally in
the kernel.

A.

>

Re: [PATCH RFC 0/3] tun zerocopy stats

2017-10-11 Thread Willem de Bruijn

On Tue, Oct 10, 2017 at 11:15 PM, Jason Wang  wrote:
>
>
> On 2017-10-11 03:11, Willem de Bruijn wrote:
>>
>> On Tue, Oct 10, 2017 at 1:39 PM, David Miller  wrote:
>>>
>>> From: Willem de Bruijn 
>>> Date: Tue, 10 Oct 2017 11:29:33 -0400
>>>
>>>> If there is a way to expose these stats through vhost_net directly,
>>>> instead of through tun, that may be better. But I did not see a
>>>> suitable interface. Perhaps debugfs.
>>>
>>> Please don't use debugfs, thank you :-)
>>
>> Okay. I'll take a look at tracing for on-demand measurement.
>
>
> This reminds me of a past series that added tracepoints to vhost/net[1]. It
> can count zero/datacopy independently and even contains a sample program to
> show the stats.

Interesting, thanks!

For occasional evaluation, we can also use a bpf kprobe for the time being:

  bpf_program = """
  #include 
  #include 

  BPF_ARRAY(count, u64, 2);

  void inc_counter(struct pt_regs *ctx) {
  bool success;
  int key;
  u64 *val;

  success = PT_REGS_PARM2(ctx);
  key = success ? 0 : 1;
  val = count.lookup(&key);
  if (val)
  lock_xadd(val, 1);
  }
  """

  b = bcc.BPF(text=bpf_program)
  b.attach_kprobe(event="vhost_zerocopy_callback", fn_name="inc_counter")

  time.sleep(5)

  print("vhost_zerocopy_callback: Y:%d N:%d" %
(b["count"][ctypes.c_int(0)].value,
 b["count"][ctypes.c_int(1)].value))

Re: [next-queue PATCH v5 4/5] net/sched: Add support for HW offloading for CBS

2017-10-11 Thread Vinicius Costa Gomes

Jiri Pirko  writes:

[...]

>>+static void disable_cbs_offload(struct net_device *dev,
>>+ struct cbs_sched_data *q)
>>+{
>>+ struct tc_cbs_qopt_offload cbs = { };
>>+ const struct net_device_ops *ops;
>>+ int err;
>>+
>>+ if (!q->offload)
>>+ return;
>>+
>>+ ops = dev->netdev_ops;
>>+ if (!ops->ndo_setup_tc)
>>+ return;
>>+
>>+ cbs.queue = q->queue;
>>+ cbs.enable = 0;
>>+
>>+ err = ops->ndo_setup_tc(dev, TC_SETUP_CBS, &cbs);
>>+ if (err < 0)
>>+ pr_warn("Couldn't disable CBS offload for queue %d\n",
>>+ cbs.queue);
>
> Hmm, you have a separate helper for disable, yet you have enable spread
> over cbs_change. Please push the enable code into enable_cbs_offload.
> While you are at it, change the names to cbs_ to maintain the qdisc
> prefix in function names: cbs_offload_enable/cbs_offload_disable
>

Sure.


Cheers,
--
Vinicius

Re: [next-queue PATCH v5 3/5] net/sched: Introduce Credit Based Shaper (CBS) qdisc

2017-10-11 Thread Vinicius Costa Gomes

Jiri Pirko  writes:

[...]

>>+struct tc_cbs_qopt_offload {
>>+ u8 enable;
>>+ s32 queue;
>>+ s32 hicredit;
>>+ s32 locredit;
>>+ s32 idleslope;
>>+ s32 sendslope;
>
> Please introduce the qdisc in one patch, then offload it in second. That
> is what I requested already. 2 patches please.
>
> [...]

Will move these declarations to the offload patch.

>
>
>>+static struct Qdisc_ops cbs_qdisc_ops __read_mostly = {
>>+ .next   =   NULL,
>
> It is already 0, no need to re-init.

Will fix.


Cheers,
--
Vinicius

[PATCH net-next 1/1] bridge: return error code when deleting Vlan

2017-10-11 Thread Roman Mashak


Signed-off-by: Roman Mashak 
---
 net/bridge/br_netlink.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index f0e8268..a1e1ca8 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -527,11 +527,11 @@ static int br_vlan_info(struct net_bridge *br, struct 
net_bridge_port *p,
 
case RTM_DELLINK:
if (p) {
-   nbp_vlan_delete(p, vinfo->vid);
+   err = nbp_vlan_delete(p, vinfo->vid);
if (vinfo->flags & BRIDGE_VLAN_INFO_MASTER)
-   br_vlan_delete(p->br, vinfo->vid);
+   err = br_vlan_delete(p->br, vinfo->vid);
} else {
-   br_vlan_delete(br, vinfo->vid);
+   err = br_vlan_delete(br, vinfo->vid);
}
break;
}
-- 
1.9.1

Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e

2017-10-11 Thread Jiri Pirko

Wed, Oct 11, 2017 at 11:19:29PM CEST, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Wed, 11 Oct 2017 22:58:30 +0200
>
>> Well if I see classid, I expect it should refer to qdisc instance. So
>> far, this has been always a case. But for some drivers, this would mean
>> something totally different and unrelated. So what should I think?
>> What's next? Classid could be abused to identify something else. I don't
>> understand why.
>> 
>> classid in kernel and tclass in hw are 2 completely unrelated things.
>
>Why do they need to be different?
>
>It's qdisc instance in both cases.  The driver is just using it to
>refer to the qdisc as offloaded in the hardware.  It's a key, nothing
>more.  The context in which it is used doesn't change its meaning.
>
>> Why they should share the same userspace api? What am I missing that
>> indicates this is not an abuse?
>
>Why invent a completely new ID space to refer to something we exactly
>have an ID for already?
>
>This duplication for the sake of "API" makes no sense to me.
>
>The handle is not going away.  It is not going to stop referring to
>a specific qdisc.
>
>So it's stable and appropriate to use to refer to a qdisc, whatever
>operation is being performed, or whatever offload we are going to
>perform on it.

Okay, fair enough. Yet, I can't say I'm happy with it :/ But I guess
that what you say makes sense.


>
>I notice you are quite feisty lately in your reviews of other people's
>work, so I have to ask if things are very stressful in your life?

:) Yeah, that is probably coincidental. Lots of odd offloading stuff is
happening lately.


>Please drink a nice warm cup of tea and calm down :-)

I'm perfectly calm. But thanks for showing the care :)

Re: [PATCH][next] sctp: make array sctp_sched_ops static

2017-10-11 Thread Marcelo Ricardo Leitner

On Wed, Oct 11, 2017 at 03:44:50AM -0700, Joe Perches wrote:
> On Wed, 2017-10-11 at 11:17 +0100, Colin King wrote:
> > From: Colin Ian King 
> > 
> > The array sctp_sched_ops  is local to the source and
> > does not need to be in global scope, so make it static.
> > 
> > Cleans up sparse warning:
> > symbol 'sctp_sched_ops' was not declared. Should it be static?
> > 
> > Signed-off-by: Colin Ian King 
> > ---
> >  net/sctp/stream_sched.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/sctp/stream_sched.c b/net/sctp/stream_sched.c
> > index 03513a9fa110..0b83ec51e43b 100644
> > --- a/net/sctp/stream_sched.c
> > +++ b/net/sctp/stream_sched.c
> > @@ -124,7 +124,7 @@ static struct sctp_sched_ops sctp_sched_fcfs = {
> >  extern struct sctp_sched_ops sctp_sched_prio;
> >  extern struct sctp_sched_ops sctp_sched_rr;
> >  
> > -struct sctp_sched_ops *sctp_sched_ops[] = {
> > +static struct sctp_sched_ops *sctp_sched_ops[] = {
> > &sctp_sched_fcfs,
> > &sctp_sched_prio,
> > &sctp_sched_rr,
> 
> Perhaps these should also be const to move more data to text

Yes. There are no plans on supporting any sort of dynamic updates on
this, at least for now.

  Marcelo

Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e

2017-10-11 Thread David Miller

From: Jiri Pirko 
Date: Wed, 11 Oct 2017 22:58:30 +0200

> Well if I see classid, I expect it should refer to qdisc instance. So
> far, this has been always a case. But for some drivers, this would mean
> something totally different and unrelated. So what should I think?
> What's next? Classid could be abused to identify something else. I don't
> understand why.
> 
> classid in kernel and tclass in hw are 2 completely unrelated things.

Why do they need to be different?

It's qdisc instance in both cases.  The driver is just using it to
refer to the qdisc as offloaded in the hardware.  It's a key, nothing
more.  The context in which it is used doesn't change its meaning.

> Why they should share the same userspace api? What am I missing that
> indicates this is not an abuse?

Why invent a completely new ID space to refer to something we exactly
have an ID for already?

This duplication for the sake of "API" makes no sense to me.

The handle is not going away.  It is not going to stop referring to
a specific qdisc.

So it's stable and appropriate to use to refer to a qdisc, whatever
operation is being performed, or whatever offload we are going to
perform on it.

I notice you are quite feisty lately in your reviews of other people's
work, so I have to ask if things are very stressful in your life?
Please drink a nice warm cup of tea and calm down :-)

Re: [PATCH RFC] Add other KSZ switch support so that patch check does not complain

2017-10-11 Thread Florian Fainelli

On 10/06/2017 01:33 PM, tristram...@microchip.com wrote:
> From: Tristram Ha 
> 
> Add other KSZ switch support so that patch check does not complain.

You are not doing this just so checkpatch.pl stops complaining, what you
are doing here is to properly document the possible models supported by
this binding document.

Please also use a proper subject for this patch:

dt-bindings: dsa: Document additional Micrel KSZ family switches

or something along those lines.

With that:

Reviewed-by: Florian Fainelli 
-- 
Florian

[PATCH net-next 0/4] tc-testing: Test suite updates

2017-10-11 Thread Lucas Bates

This patch series is a roundup of changes to the tc-testing
suite:

 - Add test cases for police and mirred modules and some coverage
   in already-submitted test categories
 - Break the test case files down into more user-friendly sizes
 - Bug fix to the tdc.py script's handling of the -l argument


Lucas Bates (4):
  tc-testing: Add test cases for flushing actions
  tc-testing: Split test case files into smaller chunks
  tc-testing: Add test cases for police and skbmod
  tc-testing: fix the -l argument bug in tdc.py script

 .../tc-testing/tc-tests/actions/gact.json  |  469 
 .../selftests/tc-testing/tc-tests/actions/ife.json |   52 +
 .../tc-testing/tc-tests/actions/mirred.json|  223 
 .../tc-testing/tc-tests/actions/police.json|  527 +
 .../tc-testing/tc-tests/actions/simple.json|  130 +++
 .../tc-testing/tc-tests/actions/skbedit.json   |  320 ++
 .../tc-testing/tc-tests/actions/skbmod.json|  372 +++
 .../tc-testing/tc-tests/actions/tests.json | 1165 
 tools/testing/selftests/tc-testing/tdc.py  |8 +-
 9 files changed, 2097 insertions(+), 1169 deletions(-)
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/ife.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/police.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/simple.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/skbmod.json
 delete mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/tests.json

--
2.7.4

[PATCH net-next 1/4] tc-testing: Add test cases for flushing actions

2017-10-11 Thread Lucas Bates

Tests for flushing gact and mirred were missing. This patch
adds test cases to explicitly test the flush of any installed
gact/mirred actions.

Signed-off-by: Lucas Bates 
Acked-by: Jamal Hadi Salim 
---
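For readers unfamiliar with the test case format, here is a minimal sketch of how the "matchPattern" and "matchCount" fields could be checked against the output of "verifyCmd". It assumes the runner counts regex matches and compares the count with matchCount; the function name `verify` and the sample output strings are hypothetical and are not taken from tdc.py or from real tc output.

```python
import re


def verify(test_case, verify_output):
    """Return True when the number of regex matches equals matchCount."""
    pattern = re.compile(test_case["matchPattern"], re.DOTALL)
    matches = pattern.findall(verify_output)
    return len(matches) == int(test_case["matchCount"])


# Fields copied from the "Flush gact actions" test case in this patch.
flush_gact = {
    "matchPattern": "action order [0-9]*: gact action reclassify",
    "matchCount": "0",
}

# Hypothetical listing output before and after a successful flush.
before_flush = (
    "action order 0: gact action reclassify\n"
    "action order 1: gact action reclassify\n"
)
after_flush = ""

print(verify(flush_gact, before_flush))  # two matches, expected zero
print(verify(flush_gact, after_flush))   # no matches, expected zero
```

With a matchCount of "0", the test passes only when no action of the flushed type is left in the listing.
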
 .../tc-testing/tc-tests/actions/tests.json | 49 +-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/tests.json 
b/tools/testing/selftests/tc-testing/tc-tests/actions/tests.json
index 6973bdc..2ea0065 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/tests.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/tests.json
@@ -246,6 +246,27 @@
 ]
 },
 {
+"id": "3edf",
+"name": "Flush gact actions",
+"category": [
+"actions",
+"gact"
+],
+"setup": [
+"$TC actions add action reclassify index 101",
+"$TC actions add action reclassify index 102",
+"$TC actions add action reclassify index 103",
+"$TC actions add action reclassify index 104",
+"$TC actions add action reclassify index 105"
+],
+"cmdUnderTest": "$TC actions flush action gact",
+"expExitCode": "0",
+"verifyCmd": "$TC actions list action gact",
+"matchPattern": "action order [0-9]*: gact action reclassify",
+"matchCount": "0",
+"teardown": []
+},
+{
 "id": "63ec",
 "name": "Delete pass action",
 "category": [
@@ -469,6 +490,32 @@
 ]
 },
 {
+"id": "58c3",
+"name": "Flush mirred actions",
+"category": [
+"actions",
+"mirred"
+],
+"setup": [
+[
+"$TC actions flush action mirred",
+0,
+1,
+255
+],
+"$TC actions add action mirred egress mirror index 1 dev lo",
+"$TC actions add action mirred egress redirect index 2 dev lo"
+],
+"cmdUnderTest": "$TC actions show action mirred",
+"expExitCode": "0",
+"verifyCmd": "$TC actions list action mirred",
+"matchPattern": "[Mirror|Redirect] to device lo",
+"matchCount": "0",
+"teardown": [
+"$TC actions flush action mirred"
+]
+},
+{
 "id": "d7c0",
 "name": "Add invalid mirred direction",
 "category": [
@@ -1162,4 +1209,4 @@
 "$TC actions flush action ife"
 ]
 }
-]
\ No newline at end of file
+]
--
2.7.4

[PATCH net-next 2/4] tc-testing: Split test case files into smaller chunks

2017-10-11 Thread Lucas Bates

The original submission had the test cases stored in one
monolithic file. This can be unwieldy to edit, especially as more
test cases are added. This patch removes the original tests.json
file in favour of individual ones broken down by category.

Signed-off-by: Lucas Bates 
Acked-by: Jamal Hadi Salim 
---
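As an illustration only, a split like this could be scripted along the following lines. This is a sketch, not necessarily how the files in this patch were produced: the helper names `split_by_category` and `render` are hypothetical, and it assumes the last entry of each "category" list names the target file.

```python
import json
from collections import defaultdict


def split_by_category(test_cases):
    """Group test cases by the last entry of their "category" list."""
    groups = defaultdict(list)
    for case in test_cases:
        groups[case["category"][-1]].append(case)
    return dict(groups)


def render(groups):
    """Map each category to a file name and its JSON text (newline-terminated)."""
    return {
        name + ".json": json.dumps(cases, indent=4) + "\n"
        for name, cases in groups.items()
    }


# Abbreviated test cases using ids and categories that appear in this series.
monolithic = [
    {"id": "e89a", "name": "Add valid pass action", "category": ["actions", "gact"]},
    {"id": "58c3", "name": "Flush mirred actions", "category": ["actions", "mirred"]},
    {"id": "3edf", "name": "Flush gact actions", "category": ["actions", "gact"]},
]

files = render(split_by_category(monolithic))
print(sorted(files))
```

Each resulting file is itself a JSON array, so it can be loaded the same way as the original tests.json.
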
 .../tc-testing/tc-tests/actions/gact.json  |  469 
 .../selftests/tc-testing/tc-tests/actions/ife.json |   52 +
 .../tc-testing/tc-tests/actions/mirred.json|  223 
 .../tc-testing/tc-tests/actions/simple.json|  130 +++
 .../tc-testing/tc-tests/actions/skbedit.json   |  320 ++
 .../tc-testing/tc-tests/actions/tests.json | 1212 
 6 files changed, 1194 insertions(+), 1212 deletions(-)
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/ife.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/simple.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/skbedit.json
 delete mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/tests.json

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json 
b/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
new file mode 100644
index 0000000..e2187b6
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/gact.json
@@ -0,0 +1,469 @@
+[
+{
+"id": "e89a",
+"name": "Add valid pass action",
+"category": [
+"actions",
+"gact"
+],
+"setup": [
+[
+"$TC actions flush action gact",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action pass index 8",
+"expExitCode": "0",
+"verifyCmd": "$TC actions list action gact",
+"matchPattern": "action order [0-9]*: gact action pass.*index 8 ref",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action gact"
+]
+},
+{
+"id": "a02c",
+"name": "Add valid pipe action",
+"category": [
+"actions",
+"gact"
+],
+"setup": [
+[
+"$TC actions flush action gact",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action pipe index 6",
+"expExitCode": "0",
+"verifyCmd": "$TC actions list action gact",
+"matchPattern": "action order [0-9]*: gact action pipe.*index 6 ref",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action gact"
+]
+},
+{
+"id": "feef",
+"name": "Add valid reclassify action",
+"category": [
+"actions",
+"gact"
+],
+"setup": [
+[
+"$TC actions flush action gact",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action reclassify index 5",
+"expExitCode": "0",
+"verifyCmd": "$TC actions list action gact",
+"matchPattern": "action order [0-9]*: gact action reclassify.*index 5 
ref",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action gact"
+]
+},
+{
+"id": "8a7a",
+"name": "Add valid drop action",
+"category": [
+"actions",
+"gact"
+],
+"setup": [
+[
+"$TC actions flush action gact",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action drop index 30",
+"expExitCode": "0",
+"verifyCmd": "$TC actions list action gact",
+"matchPattern": "action order [0-9]*: gact action drop.*index 30 ref",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action gact"
+]
+},
+{
+"id": "9a52",
+"name": "Add valid continue action",
+"category": [
+"actions",
+"gact"
+],
+"setup": [
+[
+"$TC actions flush action gact",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action continue index 432",
+"expExitCode": "0",
+"verifyCmd": "$TC actions list action gact",
+"matchPattern": "action order [0-9]*: gact action continue.*index 432 
ref",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action gact"
+]
+},
+{
+"id": "d700",
+

[PATCH net-next 3/4] tc-testing: Add test cases for police and skbmod

2017-10-11 Thread Lucas Bates

Add basic unit tests for police and skbmod actions in tc.

Signed-off-by: Lucas Bates 
Acked-by: Jamal Hadi Salim 
---
 .../tc-testing/tc-tests/actions/police.json| 527 +
 .../tc-testing/tc-tests/actions/skbmod.json| 372 +++
 2 files changed, 899 insertions(+)
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/police.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/skbmod.json

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/police.json 
b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
new file mode 100644
index 0000000..82d0990
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/police.json
@@ -0,0 +1,527 @@
+[
+{
+"id": "49aa",
+"name": "Add valid basic police action",
+"category": [
+"actions",
+"police"
+],
+"setup": [
+[
+"$TC actions flush action police",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action police rate 1kbit burst 10k 
index 1",
+"expExitCode": "0",
+"verifyCmd": "$TC actions ls action police",
+"matchPattern": "action order [0-9]*:  police 0x1 rate 1Kbit burst 
10Kb",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action police"
+]
+},
+{
+"id": "3abe",
+"name": "Add police action with duplicate index",
+"category": [
+"actions",
+"police"
+],
+"setup": [
+[
+"$TC actions flush action police",
+0,
+1,
+255
+],
+"$TC actions add action police rate 4Mbit burst 120k index 9"
+],
+"cmdUnderTest": "$TC actions add action police rate 8kbit burst 24k 
index 9",
+"expExitCode": "255",
+"verifyCmd": "$TC actions ls action police",
+"matchPattern": "action order [0-9]*:  police 0x9",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action police"
+]
+},
+{
+"id": "49fa",
+"name": "Add valid police action with mtu",
+"category": [
+"actions",
+"police"
+],
+"setup": [
+[
+"$TC actions flush action police",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action police rate 90kbit burst 10k 
mtu 1k index 98",
+"expExitCode": "0",
+"verifyCmd": "$TC actions get action police index 98",
+"matchPattern": "action order [0-9]*:  police 0x62 rate 90Kbit burst 
10Kb mtu 1Kb",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action police"
+]
+},
+{
+"id": "7943",
+"name": "Add valid police action with peakrate",
+"category": [
+"actions",
+"police"
+],
+"setup": [
+[
+"$TC actions flush action police",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action police rate 90kbit burst 10k 
mtu 2kb peakrate 100kbit index 3",
+"expExitCode": "0",
+"verifyCmd": "$TC actions ls action police",
+"matchPattern": "action order [0-9]*:  police 0x3 rate 90Kbit burst 
10Kb mtu 2Kb peakrate 100Kbit",
+"matchCount": "1",
+"teardown": [
+"$TC actions flush action police"
+]
+},
+{
+"id": "055e",
+"name": "Add police action with peakrate and no mtu",
+"category": [
+"actions",
+"police"
+],
+"setup": [
+[
+"$TC actions flush action police",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action police rate 5kbit burst 6kb 
peakrate 10kbit index 9",
+"expExitCode": "255",
+"verifyCmd": "$TC actions ls action police",
+"matchPattern": "action order [0-9]*:  police 0x9 rate 5Kb burst 10Kb",
+"matchCount": "0",
+"teardown": [
+"$TC actions flush action police"
+]
+},
+{
+"id": "f057",
+"name": "Add police action with valid overhead",
+"category": [
+"actions",
+"police"
+],
+"setup": [
+[
+"$TC actions flush action police",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC actions add action police rate 1mbit burst 100k overhead 64 index 64",
+

[PATCH net-next 4/4] tc-testing: fix the -l argument bug in tdc.py script

2017-10-11 Thread Lucas Bates

This patch fixes a bug in the tdc script, where executing tdc
with the -l argument would cause the tests to start running
as opposed to listing all the known test cases.

Signed-off-by: Lucas Bates 
Acked-by: Jamal Hadi Salim 
---
 tools/testing/selftests/tc-testing/tdc.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/tc-testing/tdc.py b/tools/testing/selftests/tc-testing/tdc.py
index cd61b78..d2391df 100755
--- a/tools/testing/selftests/tc-testing/tdc.py
+++ b/tools/testing/selftests/tc-testing/tdc.py
@@ -49,7 +49,7 @@ def exec_cmd(command, nsonly=True):
 stderr=subprocess.PIPE)
 (rawout, serr) = proc.communicate()

-if proc.returncode != 0:
+if proc.returncode != 0 and len(serr) > 0:
 foutput = serr.decode("utf-8")
 else:
 foutput = rawout.decode("utf-8")
@@ -203,7 +203,7 @@ def set_args(parser):
 help='Run tests only from the specified category, or if no category is specified, list known categories.')
 parser.add_argument('-f', '--file', type=str,
 help='Run tests from the specified file')
-parser.add_argument('-l', '--list', type=str, nargs='?', const="", metavar='CATEGORY',
+parser.add_argument('-l', '--list', type=str, nargs='?', const="++", metavar='CATEGORY',
 help='List all test cases, or those only within the specified category')
 parser.add_argument('-s', '--show', type=str, nargs=1, metavar='ID', dest='showID',
 help='Display the test case with specified id')
@@ -357,10 +357,10 @@ def set_operation_mode(args):
 testcases = get_categorized_testlist(alltests, ucat)

 if args.list:
-if (len(args.list) == 0):
+if (args.list == "++"):
 list_test_cases(alltests)
 exit(0)
-elif(len(args.list > 0)):
+elif(len(args.list) > 0):
 if (args.list not in ucat):
 print("Unknown category " + args.list)
 print("Available categories:")
--
2.7.4
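
Note on the -l fix above: with nargs='?', argparse stores the const value
when the flag is given without an argument. An empty-string const is falsy,
so a guard like "if args.list:" is skipped and the tests run instead. The
following is a minimal sketch of that behaviour, not the tdc.py code itself;
the helper names build_parser and list_mode are hypothetical and exist only
for this illustration.

```python
import argparse


def build_parser(const_value):
    """Build a tiny parser with a tdc-style '-l [CATEGORY]' option."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-l', '--list', type=str, nargs='?',
                        const=const_value, metavar='CATEGORY')
    return parser


def list_mode(args):
    """Classify the request the way a truthiness check on args.list would."""
    if args.list:                 # an empty string is falsy and skips this branch
        if args.list == "++":
            return "list-all"
        return "list-category"
    return "run-tests"


# Old const: bare '-l' yields "", so the listing branch is never taken.
old = build_parser("").parse_args(['-l'])
# New const: bare '-l' yields the "++" sentinel, which is truthy.
new = build_parser("++").parse_args(['-l'])

print(list_mode(old))   # prints "run-tests"
print(list_mode(new))   # prints "list-all"
```

The "++" sentinel works because it is truthy and unlikely to collide with a
real category name; any other non-empty marker would behave the same way in
this sketch.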

[PATCH net-next 2/3] sched: act: ife: migrate to use per-cpu counters

2017-10-11 Thread Alexander Aring

This patch migrates the current counter handling, which is protected by a
spinlock, to per-cpu counter handling. This reduces the time during which the
spinlock is held.

Signed-off-by: Alexander Aring 
---
 net/sched/act_ife.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index efac8a32c30a..f0d86b182387 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -463,7 +463,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a, &act_ife_ops,
-bind, false);
+bind, true);
if (ret)
return ret;
ret = ACT_P_CREATED;
@@ -624,19 +624,15 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
u8 *tlv_data;
u16 metalen;
 
-   spin_lock(&ife->tcf_lock);
-   bstats_update(&ife->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(ife->common.cpu_bstats), skb);
tcf_lastuse_update(&ife->tcf_tm);
-   spin_unlock(&ife->tcf_lock);
 
if (skb_at_tc_ingress(skb))
skb_push(skb, skb->dev->hard_header_len);
 
tlv_data = ife_decode(skb, &metalen);
if (unlikely(!tlv_data)) {
-   spin_lock(&ife->tcf_lock);
-   ife->tcf_qstats.drops++;
-   spin_unlock(&ife->tcf_lock);
+   qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
return TC_ACT_SHOT;
}
 
@@ -654,14 +650,12 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
 */
pr_info_ratelimited("Unknown metaid %d dlen %d\n",
mtype, dlen);
-   ife->tcf_qstats.overlimits++;
+   qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
}
}
 
if (WARN_ON(tlv_data != ifehdr_end)) {
-   spin_lock(&ife->tcf_lock);
-   ife->tcf_qstats.drops++;
-   spin_unlock(&ife->tcf_lock);
+   qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
return TC_ACT_SHOT;
}
 
@@ -713,23 +707,20 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
exceed_mtu = true;
}
 
-   spin_lock(&ife->tcf_lock);
-   bstats_update(&ife->tcf_bstats, skb);
+   bstats_cpu_update(this_cpu_ptr(ife->common.cpu_bstats), skb);
tcf_lastuse_update(&ife->tcf_tm);
 
if (!metalen) { /* no metadata to send */
/* abuse overlimits to count when we allow packet
 * with no metadata
 */
-   ife->tcf_qstats.overlimits++;
-   spin_unlock(&ife->tcf_lock);
+   qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
return action;
}
/* could be stupid policy setup or mtu config
 * so lets be conservative.. */
if ((action == TC_ACT_SHOT) || exceed_mtu) {
-   ife->tcf_qstats.drops++;
-   spin_unlock(&ife->tcf_lock);
+   qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
return TC_ACT_SHOT;
}
 
@@ -738,6 +729,8 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
 
ife_meta = ife_encode(skb, metalen);
 
+   spin_lock(&ife->tcf_lock);
+
/* XXX: we dont have a clever way of telling encode to
 * not repeat some of the computations that are done by
 * ops->presence_check...
@@ -749,8 +742,8 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
}
if (err < 0) {
/* too corrupt to keep around if overwritten */
-   ife->tcf_qstats.drops++;
spin_unlock(&ife->tcf_lock);
+   qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
return TC_ACT_SHOT;
}
skboff += err;
-- 
2.11.0

[PATCH net-next 3/3] sched: act: ife: update parameters via rcu handling

2017-10-11 Thread Alexander Aring

This patch updates the parameters via RCU instead of protecting them with a
spinlock. This reduces the time during which the spinlock is held.

Signed-off-by: Alexander Aring 
---
 include/net/tc_act/tc_ife.h | 10 --
 net/sched/act_ife.c | 87 ++---
 2 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/include/net/tc_act/tc_ife.h b/include/net/tc_act/tc_ife.h
index 30ba459ddd34..16a84f6d43e2 100644
--- a/include/net/tc_act/tc_ife.h
+++ b/include/net/tc_act/tc_ife.h
@@ -6,12 +6,18 @@
 #include 
 #include 
 
-struct tcf_ife_info {
-   struct tc_action common;
+struct tcf_ife_params {
u8 eth_dst[ETH_ALEN];
u8 eth_src[ETH_ALEN];
u16 eth_type;
u16 flags;
+
+   struct rcu_head rcu;
+};
+
+struct tcf_ife_info {
+   struct tc_action common;
+   struct tcf_ife_params __rcu *params;
/* list of metaids allowed */
struct list_head metalist;
 };
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index f0d86b182387..2ef25c5582bb 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -392,10 +392,14 @@ static void _tcf_ife_cleanup(struct tc_action *a, int bind)
 static void tcf_ife_cleanup(struct tc_action *a, int bind)
 {
struct tcf_ife_info *ife = to_ife(a);
+   struct tcf_ife_params *p;
 
spin_lock_bh(&ife->tcf_lock);
_tcf_ife_cleanup(a, bind);
spin_unlock_bh(&ife->tcf_lock);
+
+   p = rcu_dereference_protected(ife->params, 1);
+   kfree_rcu(p, rcu);
 }
 
 /* under ife->tcf_lock for existing action */
@@ -432,6 +436,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
struct tc_action_net *tn = net_generic(net, ife_net_id);
struct nlattr *tb[TCA_IFE_MAX + 1];
struct nlattr *tb2[IFE_META_MAX + 1];
+   struct tcf_ife_params *p, *p_old;
struct tcf_ife_info *ife;
u16 ife_type = ETH_P_IFE;
struct tc_ife *parm;
@@ -457,24 +462,34 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
if (parm->flags & ~IFE_ENCODE)
return -EINVAL;
 
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p)
+   return -ENOMEM;
+
exists = tcf_idr_check(tn, parm->index, a, bind);
-   if (exists && bind)
+   if (exists && bind) {
+   kfree(p);
return 0;
+   }
 
if (!exists) {
ret = tcf_idr_create(tn, parm->index, est, a, &act_ife_ops,
 bind, true);
-   if (ret)
+   if (ret) {
+   kfree(p);
return ret;
+   }
ret = ACT_P_CREATED;
} else {
tcf_idr_release(*a, bind);
-   if (!ovr)
+   if (!ovr) {
+   kfree(p);
return -EEXIST;
+   }
}
 
ife = to_ife(*a);
-   ife->flags = parm->flags;
+   p->flags = parm->flags;
 
if (parm->flags & IFE_ENCODE) {
if (tb[TCA_IFE_TYPE])
@@ -485,24 +500,25 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
saddr = nla_data(tb[TCA_IFE_SMAC]);
}
 
-   if (exists)
-   spin_lock_bh(&ife->tcf_lock);
ife->tcf_action = parm->action;
 
if (parm->flags & IFE_ENCODE) {
if (daddr)
-   ether_addr_copy(ife->eth_dst, daddr);
+   ether_addr_copy(p->eth_dst, daddr);
else
-   eth_zero_addr(ife->eth_dst);
+   eth_zero_addr(p->eth_dst);
 
if (saddr)
-   ether_addr_copy(ife->eth_src, saddr);
+   ether_addr_copy(p->eth_src, saddr);
else
-   eth_zero_addr(ife->eth_src);
+   eth_zero_addr(p->eth_src);
 
-   ife->eth_type = ife_type;
+   p->eth_type = ife_type;
}
 
+   if (exists)
+   spin_lock_bh(&ife->tcf_lock);
+
if (ret == ACT_P_CREATED)
INIT_LIST_HEAD(&ife->metalist);
 
@@ -518,6 +534,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 
if (exists)
spin_unlock_bh(&ife->tcf_lock);
+   kfree(p);
return err;
}
 
@@ -538,6 +555,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 
if (exists)
spin_unlock_bh(&ife->tcf_lock);
+   kfree(p);
return err;
}
}
@@ -545,6 +563,11 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
if (exists)
spin_unlock_bh(&ife->tcf_lock);
 
+   p_old = rtnl_dereference(ife->params);
+   rcu_assign_pointer(ife->params, p);
+

[PATCH net-next 0/3] sched: act: ife: UAPI checks and performance tweaks

2017-10-11 Thread Alexander Aring

Hi,

this patch series starts with a patch that adds a check for IFE_ENCODE and
IFE_DECODE when an ife act gets created or updated, so that the act callback
only has to handle these two cases.

The second patch uses per-cpu counters and moves the spinlock around so that
the spinlock is held for less time in the act callback.

The last patch uses RCU for parameter updates and also moves the spinlock for
the same purpose as in patch 2.

Notes:
 - There is still a spinlock around for protecting the metalist and a
   rw-lock for another list. These should be migrated to an RCU list, if
   possible.

 - I still use a dereference in the dump callback. What I have not
   understood is what rcu_assign_pointer does while the RCU read lock is
   held. I suppose the pointer will simply be updated, in which case we
   don't have any issue here.

Alexander Aring (3):
  sched: act: ife: move encode/decode check to init
  sched: act: ife: migrate to use per-cpu counters
  sched: act: ife: update parameters via rcu handling

 include/net/tc_act/tc_ife.h |  10 +++-
 net/sched/act_ife.c | 135 +---
 2 files changed, 86 insertions(+), 59 deletions(-)

-- 
2.11.0

[PATCH net-next 1/3] sched: act: ife: move encode/decode check to init

2017-10-11 Thread Alexander Aring

This patch adds a check for the two possible ife modes, encode and decode,
to the init callback. The decode value exists for usability and is used in
userspace code only. The current code supports either encode or decode and
nothing else. This patch rejects any other option.

Signed-off-by: Alexander Aring 
---
 net/sched/act_ife.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 8ccd35825b6b..efac8a32c30a 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -450,6 +450,13 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 
parm = nla_data(tb[TCA_IFE_PARMS]);
 
+   /* IFE_DECODE is 0 and indicates the opposite of IFE_ENCODE because
+* they cannot run at the same time. Check on all other values which
+* are not supported right now.
+*/
+   if (parm->flags & ~IFE_ENCODE)
+   return -EINVAL;
+
exists = tcf_idr_check(tn, parm->index, a, bind);
if (exists && bind)
return 0;
@@ -772,17 +779,7 @@ static int tcf_ife_act(struct sk_buff *skb, const struct tc_action *a,
if (ife->flags & IFE_ENCODE)
return tcf_ife_encode(skb, a, res);
 
-   if (!(ife->flags & IFE_ENCODE))
-   return tcf_ife_decode(skb, a, res);
-
-   pr_info_ratelimited("unknown failure(policy neither de/encode\n");
-   spin_lock(&ife->tcf_lock);
-   bstats_update(&ife->tcf_bstats, skb);
-   tcf_lastuse_update(&ife->tcf_tm);
-   ife->tcf_qstats.drops++;
-   spin_unlock(&ife->tcf_lock);
-
-   return TC_ACT_SHOT;
+   return tcf_ife_decode(skb, a, res);
 }
 
 static int tcf_ife_walker(struct net *net, struct sk_buff *skb,
-- 
2.11.0
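
Note on the flag check in the patch above: masking with ~IFE_ENCODE rejects
every bit except the encode bit, so after validation a value is either
"encode" or "decode" and the old fallthrough error path is unreachable. A
minimal userspace sketch of the same logic follows. IFE_DECODE being 0 is
stated in the patch comment; the value 1 for IFE_ENCODE and the helper names
check_ife_flags and select_handler are assumptions made for this
illustration.

```python
import errno

IFE_DECODE = 0  # stated in the patch comment
IFE_ENCODE = 1  # assumed value for this sketch


def check_ife_flags(flags):
    """Return 0 if no bit other than IFE_ENCODE is set, else -EINVAL."""
    if flags & ~IFE_ENCODE:
        return -errno.EINVAL
    return 0


def select_handler(flags):
    """After validation, anything that is not encode must be decode."""
    return "encode" if flags & IFE_ENCODE else "decode"


for flags in (IFE_DECODE, IFE_ENCODE, 2, 3):
    print(flags, check_ife_flags(flags))
```

Only the first two values pass the check in this sketch; 2 and 3 both carry
an unsupported bit and are rejected.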

[PATCH] rtlwifi: Remove unused cur_rfstate variables

2017-10-11 Thread Christos Gkekas

Clean up unused cur_rfstate variables in rtl8188ee, rtl8723ae, rtl8723be
and rtl8821ae.

Signed-off-by: Christos Gkekas 
---
 drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c | 4 +---
 drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c | 4 +---
 drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c | 4 +---
 drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c | 4 +---
 4 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c
index 0ba26d2..4d843e6 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/hw.c
@@ -2235,7 +2235,7 @@ bool rtl88ee_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
 {
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
-   enum rf_pwrstate e_rfpowerstate_toset, cur_rfstate;
+   enum rf_pwrstate e_rfpowerstate_toset;
u32 u4tmp;
bool b_actuallyset = false;
 
@@ -2254,8 +2254,6 @@ bool rtl88ee_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
spin_unlock(&rtlpriv->locks.rf_ps_lock);
}
 
-   cur_rfstate = ppsc->rfpwr_state;
-
u4tmp = rtl_read_dword(rtlpriv, REG_GPIO_OUTPUT);
e_rfpowerstate_toset = (u4tmp & BIT(31)) ? ERFON : ERFOFF;
 
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
index 5ac7b81..4e1e1f8 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
@@ -2103,7 +2103,7 @@ bool rtl8723e_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
struct rtl_phy *rtlphy = &(rtlpriv->phy);
-   enum rf_pwrstate e_rfpowerstate_toset, cur_rfstate;
+   enum rf_pwrstate e_rfpowerstate_toset;
u8 u1tmp;
bool b_actuallyset = false;
 
@@ -2122,8 +2122,6 @@ bool rtl8723e_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
spin_unlock(&rtlpriv->locks.rf_ps_lock);
}
 
-   cur_rfstate = ppsc->rfpwr_state;
-
rtl_write_byte(rtlpriv, REG_GPIO_IO_SEL_2,
   rtl_read_byte(rtlpriv, REG_GPIO_IO_SEL_2)&~(BIT(1)));
 
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c
index 4d47b97..2ad1013 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723be/hw.c
@@ -2486,7 +2486,7 @@ bool rtl8723be_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
struct rtl_phy *rtlphy = &(rtlpriv->phy);
-   enum rf_pwrstate e_rfpowerstate_toset, cur_rfstate;
+   enum rf_pwrstate e_rfpowerstate_toset;
u8 u1tmp;
bool b_actuallyset = false;
 
@@ -2505,8 +2505,6 @@ bool rtl8723be_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
spin_unlock(&rtlpriv->locks.rf_ps_lock);
}
 
-   cur_rfstate = ppsc->rfpwr_state;
-
rtl_write_byte(rtlpriv, REG_GPIO_IO_SEL_2,
   rtl_read_byte(rtlpriv, REG_GPIO_IO_SEL_2) & ~(BIT(1)));
 
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
index 1d431d4..e756f94 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
@@ -3845,7 +3845,7 @@ bool rtl8821ae_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
struct rtl_phy *rtlphy = &rtlpriv->phy;
-   enum rf_pwrstate e_rfpowerstate_toset, cur_rfstate;
+   enum rf_pwrstate e_rfpowerstate_toset;
u8 u1tmp = 0;
bool b_actuallyset = false;
 
@@ -3864,8 +3864,6 @@ bool rtl8821ae_gpio_radio_on_off_checking(struct ieee80211_hw *hw, u8 *valid)
spin_unlock(&rtlpriv->locks.rf_ps_lock);
}
 
-   cur_rfstate = ppsc->rfpwr_state;
-
rtl_write_byte(rtlpriv, REG_GPIO_IO_SEL_2,
rtl_read_byte(rtlpriv,
REG_GPIO_IO_SEL_2) & ~(BIT(1)));
-- 
2.7.4

Re: [PATCH v5 2/2] net: phy: at803x: Change error to EINVAL for invalid MAC

2017-10-11 Thread David Miller

From: Dan Murphy 
Date: Tue, 10 Oct 2017 12:42:56 -0500

> Change the return error code to EINVAL if the MAC
> address is not valid in the set_wol function.
> 
> Signed-off-by: Dan Murphy 

Applied to net-next.

Re: [PATCH 1/3] atm: idt77105: Drop needless setup_timer()

2017-10-11 Thread David Miller

From: Kees Cook 
Date: Tue, 10 Oct 2017 12:25:48 -0700

> Calling setup_timer() is redundant when DEFINE_TIMER() has been used.
> 
> Cc: Chas Williams <3ch...@gmail.com>
> Cc: linux-atm-gene...@lists.sourceforge.net
> Cc: netdev@vger.kernel.org
> Signed-off-by: Kees Cook 

Applied to net-next, thanks.

Re: [PATCH v5 1/2] net: phy: DP83822 initial driver submission

2017-10-11 Thread David Miller

From: Dan Murphy 
Date: Tue, 10 Oct 2017 12:42:55 -0500

> Add support for the TI DP83822 10/100Mbit ethernet phy.
> 
> The DP83822 provides flexibility to connect to a MAC through a
> standard MII, RMII or RGMII interface.
> 
> In addition the DP83822 needs to be removed from the DP83848 driver
> as the WoL support is added here for this device.
> 
> Datasheet:
> http://www.ti.com/product/DP83822I/datasheet
> 
> Signed-off-by: Dan Murphy 

Applied to net-next.

Re: [RFC net-next 1/4] net: ipv6: Make inet6addr_validator a blocking notifier

2017-10-11 Thread David Miller

From: David Ahern 
Date: Tue, 10 Oct 2017 09:41:02 -0700

> + /* validator notifier needs to be blocking;
> +  * do not call in softirq context
> +  */
> + if (!in_softirq()) {

I think we can test this better.

You should be able to audit the call sites and for each one set the
value of a new boolean argument properly, and this way you can also
give the boolean argument a descriptive name.

Furthermore, we can also then pull the inet6_addr allocation out of
the locking paths and thus use GFP_KERNEL when possible.

Re: [PATCH net] macsec: fix memory leaks when skb_to_sgvec fails

2017-10-11 Thread David Miller

From: Sabrina Dubroca 
Date: Tue, 10 Oct 2017 17:07:12 +0200

> Fixes: cda7ea690350 ("macsec: check return value of skb_to_sgvec always")
> Signed-off-by: Sabrina Dubroca 

Applied and queued up for -stable.

Re: [PATCH net-next 1/1] net/smc: add SMC rendezvous protocol

2017-10-11 Thread David Miller

From: Ursula Braun 
Date: Tue, 10 Oct 2017 16:14:19 +0200

> The goal of this patch is to leave common TCP code unmodified. Thus,
> it uses netfilter hooks to intercept TCP SYN and SYN/ACK
> packets. For outgoing packets originating from SMC sockets, the
> experimental option is added. For inbound packets destined for SMC
> sockets, the experimental option is checked.

I think this really isn't going to pass.

It's a user experience nightmare when the kernel inserts and
deletes filtering rules outside of what the user configures
on their system.

This approach was also considered for ipv6 ILA, and the same
pushback was given.

Why not add support for these new options as a normal TCP
socket option based feature?  Then normal userspace as well
as the SMC stack can make use of it.

Re: BUG:af_packet fails to TX TSO frames

2017-10-11 Thread Willem de Bruijn

On Wed, Oct 11, 2017 at 3:39 PM, Anton Ivanov
 wrote:
> On 11/10/17 19:57, Willem de Bruijn wrote:
>> On Wed, Oct 11, 2017 at 2:39 PM, Anton Ivanov
>>  wrote:
>>> The check as it stands now insists that the actual driver supports GSO_ROBUST, because
>>> we have marked the skb dodgy.
>>>
>>> The specific bit which does this check is in net_gso_ok()
>>>
>>> Now, let's see how many Ethernet drivers set GSO_ROBUST.
>>>
>>> find drivers/net/ethernet -type f -name "*.[c,h]" -exec grep -H GSO_ROBUST
>>> {} \;
>>>
>>> That returns nothing in 4.x
>>>
>>> IMHO - af_packet allocates the skb, does all checks (and extra may be added)
>>> on the gso, why is this set dodgy in the first place?
>> It is set when the header has to be validated.
>>
>> The segmentation logic will validate and fixup gso_segs. See for
>> instance tcp_gso_segment:
>>
>> if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
>> /* Packet is from an untrusted source, reset gso_segs. */
>>
>> skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
>>
>> segs = NULL;
>> goto out;
>> }
>>
>> If the device would have the robust bit set and otherwise supports the
>> required features, fix up gso_segs and pass the large packet to the
>> device.
>>
>> Else it continues to the software gso path.
>>
>> Large packets generated with psock_txring_vnet.c pass this test. I
>
> That test is indeed a different path

The test can be run both with and without ring:

  psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v
  psock_txring_vnet -l 8000 -s $src_ip -d $dst_ip -v -N

both with and without qdisc bypass ('-q').

>  - this goes via the tpacket_snd
> which allocs via sock_alloc_send_skb. That results in a non-fragged skb
> as it calls pskb after that with data_len = 0 asking for a contiguous one.

but attaches the ring slot as fragments in tpacket_fill_skb.

> My stuff is using sendmmsg which ends up via packet_snd which allocs
> via  sock_alloc_send_pskb which is invoked in a way which always creates
> 2 segments - one for the linear section and one for the rest (and more
> if needed). It is faster than tpacket by the way (several times).
>
> As a comparison tap and other virtual drivers use sock_alloc_send_pskb
> with non-zero data length which results in multiple frags. The code in
> packet_snd is in fact identical with tap (+/- some cosmetic differences).
>
> That is the difference between the tests and that is why your test works
> and mine fails.

All the above test cases work for me, including those that build skbs
with fragments. Could you try those?
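
Note on the tcp_gso_segment excerpt quoted above: the gso_segs fixup is a
ceiling division of the packet length by the MSS. A minimal sketch of that
arithmetic follows, in Python for brevity. div_round_up mirrors the kernel's
DIV_ROUND_UP macro; fixup_gso_segs and the sample lengths are hypothetical
and only serve this illustration.

```python
def div_round_up(n, d):
    """Integer ceiling division, like the kernel's DIV_ROUND_UP(n, d)."""
    return (n + d - 1) // d


def fixup_gso_segs(length, mss):
    """Recompute the segment count for an skb from an untrusted source."""
    return div_round_up(length, mss)


print(fixup_gso_segs(8000, 1448))   # prints 6
print(fixup_gso_segs(2896, 1448))   # prints 2
print(fixup_gso_segs(2897, 1448))   # prints 3
```

A length that is an exact multiple of the MSS needs no extra segment, while a
single byte beyond it adds one.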

Re: [PATCH RFC] Add Microchip KSZ8895 DSA driver

2017-10-11 Thread Pavel Machek

Hi!

> +static void ksz8895_set_prio_queue(struct ksz_device *dev, int port, int queue)
> +{
> + u8 hi;
> + u8 lo;
> +
> + /* Number of queues can only be 1, 2, or 4. */
> + switch (queue) {
> + case 4:
> + case 3:
> + queue = PORT_QUEUE_SPLIT_4;
> + break;
> + case 2:
> + queue = PORT_QUEUE_SPLIT_2;
> + break;
> + default:
> + queue = PORT_QUEUE_SPLIT_1;
> + }
> + ksz_pread8(dev, port, REG_PORT_CTRL_0, &lo);
> + ksz_pread8(dev, port, P_DROP_TAG_CTRL, &hi);
> + lo &= ~PORT_QUEUE_SPLIT_L;
> + if (queue & PORT_QUEUE_SPLIT_2)
> + lo |= PORT_QUEUE_SPLIT_L;
> + hi &= ~PORT_QUEUE_SPLIT_H;
> + if (queue & PORT_QUEUE_SPLIT_4)
> + hi |= PORT_QUEUE_SPLIT_H;
> + ksz_pwrite8(dev, port, REG_PORT_CTRL_0, lo);
> + ksz_pwrite8(dev, port, P_DROP_TAG_CTRL, hi);
> +
> + /* Default is port based for egress rate limit. */
> + if (queue != PORT_QUEUE_SPLIT_1)
> + ksz_cfg(dev, REG_SW_CTRL_19, SW_OUT_RATE_LIMIT_QUEUE_BASED,
> + true);
> +}

This is the same as the other driver, right? Same comments apply here, and
please find a way to make it shared.


> +static void ksz8895_r_mib_cnt(struct ksz_device *dev, int port, u16 addr,
> +   u64 *cnt)
> +{
> + u32 data;
> + u16 ctrl_addr;
> + u8 check;
> + int loop;
> +
> + ctrl_addr = addr + SWITCH_COUNTER_NUM * port;
> + ctrl_addr |= IND_ACC_TABLE(TABLE_MIB | TABLE_READ);
> +
> + mutex_lock(&dev->alu_mutex);
> + ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr);
> +
> + /* It is almost guaranteed to always read the valid bit because of
> +  * slow SPI speed.
> +  */
> + for (loop = 2; loop > 0; loop--) {
> + ksz_read8(dev, REG_IND_MIB_CHECK, &check);
> +
> + if (check & MIB_COUNTER_VALID) {
> + ksz_read32(dev, REG_IND_DATA_LO, &data);
> + if (check & MIB_COUNTER_OVERFLOW)
> + *cnt += MIB_COUNTER_VALUE + 1;
> + *cnt += data & MIB_COUNTER_VALUE;
> + break;
> + }
> + }
> + mutex_unlock(&dev->alu_mutex);

Again, same function, same review comments.

> +static void ksz8895_r_table(struct ksz_device *dev, int table, u16 addr,
> + u64 *data)
> +{
> + u16 ctrl_addr;
> +
> + ctrl_addr = IND_ACC_TABLE(table | TABLE_READ) | addr;
> +
> + mutex_lock(&dev->alu_mutex);
> + ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr);
> + ksz_get(dev, REG_IND_DATA_HI, data, sizeof(u64));
> + mutex_unlock(&dev->alu_mutex);
> + *data = be64_to_cpu(*data);
> +}

I've seen this before; this is duplicated code, it does not make sense
to review.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature

Re: [PATCH 3/4] dpaa_eth: change device used

2017-10-11 Thread David Miller

From: Madalin Bucur 
Date: Tue, 10 Oct 2017 17:10:17 +0300

> @@ -2696,7 +2681,13 @@ static int dpaa_eth_probe(struct platform_device *pdev)
>   int err = 0, i, channel;
>   struct device *dev;
>  
> - dev = &pdev->dev;
> + /* device used for DMA mapping */
> + dev = pdev->dev.parent;
> + err = dma_coerce_mask_and_coherent(dev, DMA_BIT_MASK(40));
> + if (err) {
> + dev_err(dev, "dma_coerce_mask_and_coherent() failed\n");
> + goto dev_mask_failed;
> + }
>  
>   /* Allocate this early, so we can store relevant information in
>* the private area

Since you are moving this code up before the netdev allocation, you must
adjust the failure path goto label used.

Your change as-is will cause an OOPS because we'll pass a NULL pointer
to free_netdev().

Re: [PATCH v1 RFC 1/1] Add Microchip KSZ8795 DSA driver

2017-10-11 Thread Pavel Machek

Hi!

> +static void ksz8795_set_prio_queue(struct ksz_device *dev, int port, int queue)
> +{
> + u8 hi;
> + u8 lo;
> +
> + /* Number of queues can only be 1, 2, or 4. */
> + switch (queue) {
> + case 4:
> + case 3:
> + queue = PORT_QUEUE_SPLIT_4;
> + break;
> + case 2:
> + queue = PORT_QUEUE_SPLIT_2;
> + break;
> + default:
> + queue = PORT_QUEUE_SPLIT_1;
> + }

If only 1, 2 and 4 are valid, it probably should not accept other
values?

> +static void ksz8795_r_mib_cnt(struct ksz_device *dev, int port, u16 addr,
> +   u64 *cnt)
> +{
> + u32 data;
> + u16 ctrl_addr;
> + u8 check;
> + int loop;
> +
> + ctrl_addr = addr + SWITCH_COUNTER_NUM * port;
> + ctrl_addr |= IND_ACC_TABLE(TABLE_MIB | TABLE_READ);
> +
> + mutex_lock(&dev->alu_mutex);
> + ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr);
> +
> + /* It is almost guaranteed to always read the valid bit because of
> +  * slow SPI speed.
> +  */
> + for (loop = 2; loop > 0; loop--) {
> + ksz_read8(dev, REG_IND_MIB_CHECK, &check);
> +
> + if (check & MIB_COUNTER_VALID) {
> + ksz_read32(dev, REG_IND_DATA_LO, &data);
> + if (check & MIB_COUNTER_OVERFLOW)
> + *cnt += MIB_COUNTER_VALUE + 1;
> + *cnt += data & MIB_COUNTER_VALUE;
> + break;
> + }
> + }

Hmm. Maybe, but should this not at least warn if it cannot get a
valid counter?

> + /* It is almost guaranteed to always read the valid bit because of
> +  * slow SPI speed.
> +  */
> + for (loop = 2; loop > 0; loop--) {
> + ksz_read8(dev, REG_IND_MIB_CHECK, &check);
> +
> + if (check & MIB_COUNTER_VALID) {
> + ksz_read32(dev, REG_IND_DATA_LO, &data);
> + if (addr < 2) {
> + u64 total;
> +
> + total = check & MIB_TOTAL_BYTES_H;
> + total <<= 32;
> + *cnt += total;
> + *cnt += data;
> + if (check & MIB_COUNTER_OVERFLOW) {
> + total = MIB_TOTAL_BYTES_H + 1;
> + total <<= 32;
> + *cnt += total;
> + }
> + } else {
> + if (check & MIB_COUNTER_OVERFLOW)
> + *cnt += MIB_PACKET_DROPPED + 1;
> + *cnt += data & MIB_PACKET_DROPPED;
> + }
> + break;
> + }
> + }

Same here. Plus, is overflow handling correct? There may be more than
MIB_PACKET_DROPPED + 1 packets dropped between the checks. 

> +static void ksz8795_r_table(struct ksz_device *dev, int table, u16 addr,
> + u64 *data)
> +{
> + u16 ctrl_addr;
> +
> + ctrl_addr = IND_ACC_TABLE(table | TABLE_READ) | addr;
> +
> + mutex_lock(&dev->alu_mutex);
> + ksz_write16(dev, REG_IND_CTRL_0, ctrl_addr);
> + ksz_get(dev, REG_IND_DATA_HI, data, sizeof(u64));
> + mutex_unlock(&dev->alu_mutex);
> + *data = be64_to_cpu(*data);
> +}

It would be a tiny bit nicer to have a be64 temporary variable and use
it; having *data change endianness at runtime is "interesting".

> +static int ksz8795_valid_dyn_entry(struct ksz_device *dev, u8 *data)
> +{
> + int timeout = 100;
> +
> + do {
> + ksz_read8(dev, REG_IND_DATA_CHECK, data);
> + timeout--;
> + } while ((*data & DYNAMIC_MAC_TABLE_NOT_READY) && timeout);
> +
> + /* Entry is not ready for accessing. */
> + if (*data & DYNAMIC_MAC_TABLE_NOT_READY) {
> + return -EAGAIN;
> + /* Entry is ready for accessing. */
> + } else {
> + ksz_read8(dev, REG_IND_DATA_8, data);
> +
> + /* There is no valid entry in the table. */
> + if (*data & DYNAMIC_MAC_TABLE_MAC_EMPTY)
> + return -ENXIO;
> + }

You can drop else and one indentation level.

> + /* At least one valid entry in the table. */
> + } else {
> + u64 buf;
> + int cnt;
> +
> + ksz_get(dev, REG_IND_DATA_HI, &buf, sizeof(buf));
> + buf = be64_to_cpu(buf);

Would it make sense to convert endianness inside ksz_get?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature
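
Note on the overflow question raised in the review above: a single overflow
bit can record at most one wrap, so any further wraps between two reads are
lost. The sketch below models this under stated assumptions: the counter is
treated as clear-on-read, the mask value 0x3FFFFFFF is a placeholder for
MIB_COUNTER_VALUE, and hw_read and accumulate are hypothetical helpers, not
driver functions.

```python
MIB_COUNTER_VALUE = 0x3FFFFFFF  # assumed counter mask for this sketch


def hw_read(events):
    """Model a clear-on-read counter with one sticky overflow bit."""
    raw = events & MIB_COUNTER_VALUE
    overflow = events > MIB_COUNTER_VALUE
    return raw, overflow


def accumulate(total, raw, overflow):
    """Mirror the quoted logic: add one full wrap if the overflow bit is set."""
    if overflow:
        total += MIB_COUNTER_VALUE + 1
    return total + (raw & MIB_COUNTER_VALUE)


one_wrap = (MIB_COUNTER_VALUE + 1) + 5
two_wraps = 2 * (MIB_COUNTER_VALUE + 1) + 5

print(accumulate(0, *hw_read(one_wrap)) == one_wrap)    # prints True
print(accumulate(0, *hw_read(two_wraps)) == two_wraps)  # prints False
```

In this model the second case undercounts by exactly one full wrap, which is
the scenario the review comment describes. Whether it can occur in practice
depends on the polling interval and the traffic rate.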

Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e

2017-10-11 Thread Jiri Pirko

Wed, Oct 11, 2017 at 10:46:52PM CEST, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Wed, 11 Oct 2017 22:38:32 +0200
>
>> Wed, Oct 11, 2017 at 07:46:27PM CEST, alexander.du...@gmail.com wrote:
>>>On Wed, Oct 11, 2017 at 5:56 AM, Jiri Pirko  wrote:
>>>> Wed, Oct 11, 2017 at 02:24:12AM CEST, amritha.namb...@intel.com wrote:
>>>>>This patch series enables configuring cloud filters in i40e
>>>>>using the tc-flower classifier. The classification function
>>>>>of the filter is to match a packet to a class. cls_flower is
>>>>>extended to offload classid to hardware. The offloaded classid
>>>>>is used to direct matched packets to a traffic class on the device.
>>>>>The approach here is similar to the tc 'prio' qdisc which uses
>>>>>the classid for band selection. The ingress qdisc is called :0,
>>>>>so traffic classes are :1 to :8 (i40e has max of 8 TCs).
>>>>
>>>>
>>>> NACK. This clearly looks like abuse of classid to something
>>>> else. Classid is here to identify qdisc instance. However, you use it
>>>> for hw tclass identification. This is mixing of apples and oranges.
>>>>
>>>> Why?
>>>>
>>>> Please don't try to abuse things! This is not nice.
>>>
>>>This isn't an abuse. This is reproducing in hardware what is already
>>>the behavior for software. Isn't that how offloads are supposed to
>>>work?
>> 
>> What is the meaning of classid in HW? Classid is SW only identification of
>> qdisc instances. No relation to HW instances = abuse.
>
>Jiri I really don't see what the problem is.
>
>As long as the driver does the right thing when changes are made to the
>qdisc, it doesn't really matter what "key" they use to refer to it.
>
>It could have just as easily used the qdisc pointer and then internally
>use some IDR allocated ID to refer to it in the driver and hardware.
>
>But that's such a waste, we have a unique handle already so why can't
>the driver just use that?

Well, if I see classid, I expect it to refer to a qdisc instance. So
far, this has always been the case. But for some drivers, this would mean
something totally different and unrelated. So what should I think?
What's next? Classid could be abused to identify something else. I don't
understand why.

classid in the kernel and tclass in hw are 2 completely unrelated things.
Why should they share the same userspace api? What am I missing that
indicates this is not an abuse?

There should be a clean and well-defined userspace api:
1) classid to identify qdisc instances
2) something else to identify HW tclasses

Re: [PATCH v2 net-next 0/2] lan9303: Add basic offloading of unicast traffic

2017-10-11 Thread David Miller

From: Egil Hjelmeland 
Date: Tue, 10 Oct 2017 14:49:51 +0200

> This series adds basic offloading of unicast traffic to the lan9303
> DSA driver.
> 
> Review welcome!
>  
> Changes v1 -> v2:
>  - Patch 1: Codestyle linting.
>  - Patch 2: Remember SWE_PORT_STATE while not bridged.
>             Added constant LAN9303_SWE_PORT_MIRROR_DISABLED.

Series applied, thanks.

Re: Ethtool question

2017-10-11 Thread David Miller

From: "John W. Linville" 
Date: Wed, 11 Oct 2017 16:44:07 -0400

> On Wed, Oct 11, 2017 at 09:51:56AM -0700, Ben Greear wrote:
>> I noticed today that setting some ethtool settings to the same value
>> returns an error code.  I would think this should silently return
>> success instead?  Makes it easier to call it from scripts this way:
>> 
>> [root@lf0313-6477 lanforge]# ethtool -L eth3 combined 1
>> combined unmodified, ignoring
>> no channel parameters changed, aborting
>> current values: tx 0 rx 0 other 1 combined 1
>> [root@lf0313-6477 lanforge]# echo $?
>> 1
> 
> I just had this discussion a couple of months ago with someone. My
> initial feeling was like you, a no-op is not a failure. But someone
> convinced me otherwise...I will now endeavour to remember who that
> was and how they convinced me...
> 
> Anyone else have input here?

I guess this usually happens when drivers don't support changing the
settings at all.  So they just make their ethtool operation for the
'set' always return an error.

We could have a generic ethtool helper that does "get" and then if the
"set" request is identical just return zero.

But from another perspective, the error returned from the "set" in this
situation also indicates to the user that the driver does not support
the "set" operation which has value and meaning in and of itself.  And
we'd lose that with the given suggestion.
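
A minimal sketch of the generic "get, then skip identical set" helper
described above. This is not existing ethtool code: struct chan_cfg,
struct chan_ops, chan_set_if_changed() and the read-only toy driver are
hypothetical names made up for illustration, and the channel values are
the ones from the ethtool output quoted earlier.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for the channel counts shown by ethtool. */
struct chan_cfg {
	unsigned int rx, tx, other, combined;
};

/* Hypothetical driver callbacks; set is NULL if changes are unsupported. */
struct chan_ops {
	int (*get)(struct chan_cfg *cur);
	int (*set)(const struct chan_cfg *req);
};

/*
 * Generic helper: read the current settings first and report success
 * without calling the driver's set hook if nothing would change.
 */
static int chan_set_if_changed(const struct chan_ops *ops,
			       const struct chan_cfg *req)
{
	struct chan_cfg cur;
	int err;

	memset(&cur, 0, sizeof(cur));
	err = ops->get(&cur);
	if (err)
		return err;

	/* Compare field by field so struct padding never matters. */
	if (cur.rx == req->rx && cur.tx == req->tx &&
	    cur.other == req->other && cur.combined == req->combined)
		return 0;	/* no-op request */

	if (!ops->set)
		return -EOPNOTSUPP;

	return ops->set(req);
}

/* Toy driver that can report its settings but not change them. */
static int ro_get(struct chan_cfg *cur)
{
	cur->rx = 0;
	cur->tx = 0;
	cur->other = 1;
	cur->combined = 1;
	return 0;
}

static const struct chan_ops ro_ops = { .get = ro_get, .set = NULL };
```

With ro_ops, a request for "combined 1" returns 0 and a request for
"combined 2" returns -EOPNOTSUPP. That is the trade-off noted above: the
identical request no longer tells the caller that the driver cannot
change the setting at all.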

Re: [jkirsher/next-queue PATCH v4 0/6] tc-flower based cloud filters in i40e

2017-10-11 Thread David Miller

From: Jiri Pirko 
Date: Wed, 11 Oct 2017 22:38:32 +0200

> Wed, Oct 11, 2017 at 07:46:27PM CEST, alexander.du...@gmail.com wrote:
>>On Wed, Oct 11, 2017 at 5:56 AM, Jiri Pirko  wrote:
>>> Wed, Oct 11, 2017 at 02:24:12AM CEST, amritha.namb...@intel.com wrote:
>>>> This patch series enables configuring cloud filters in i40e
>>>> using the tc-flower classifier. The classification function
>>>> of the filter is to match a packet to a class. cls_flower is
>>>> extended to offload classid to hardware. The offloaded classid
>>>> is used to direct matched packets to a traffic class on the device.
>>>> The approach here is similar to the tc 'prio' qdisc which uses
>>>> the classid for band selection. The ingress qdisc is called :0,
>>>> so traffic classes are :1 to :8 (i40e has max of 8 TCs).
>>>
>>>
>>> NACK. This clearly looks like abuse of classid to something
>>> else. Classid is here to identify qdisc instance. However, you use it
>>> for hw tclass identification. This is mixing of apples and oranges.
>>>
>>> Why?
>>>
>>> Please don't try to abuse things! This is not nice.
>>
>>This isn't an abuse. This is reproducing in hardware what is already
>>the behavior for software. Isn't that how offloads are supposed to
>>work?
> 
> What is meaning of classid in HW? Classid is SW only identification of
> qdisc instances. No relation to HW instances = abuse.

Jiri I really don't see what the problem is.

As long as the driver does the right thing when changes are made to the
qdisc, it doesn't really matter what "key" they use to refer to it.

It could have just as easily used the qdisc pointer and then internally
use some IDR allocated ID to refer to it in the driver and hardware.

But that's such a waste, we have a unique handle already so why can't
the driver just use that?
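
A rough sketch of what "just use the handle" could look like, assuming
the numbering from the cover letter (ingress qdisc at :0, traffic
classes :1 to :8). The macro and function names are hypothetical and are
not taken from the i40e patches. The thread does not say whether the
device counts its classes from 0 or from 1, so the minor number is
returned unchanged.

```c
#include <errno.h>
#include <stdint.h>

/* A tc handle is 32 bits: major number in the high half, minor in the low. */
#define CLASSID_MAJ(h)	(((h) >> 16) & 0xFFFFu)
#define CLASSID_MIN(h)	((h) & 0xFFFFu)

#define MAX_HW_TCS	8	/* i40e limit quoted in the cover letter */

/*
 * Hypothetical helper: use the minor number of the classid as the
 * traffic class selector. Minor 0 is the qdisc itself, so only
 * 1..MAX_HW_TCS are accepted.
 */
static int classid_to_tclass(uint32_t classid)
{
	uint32_t minor = CLASSID_MIN(classid);

	if (minor < 1 || minor > MAX_HW_TCS)
		return -EINVAL;
	return (int)minor;
}
```

For example, classid ffff:1 (0xffff0001) yields 1, while ffff:0 and
ffff:9 are rejected with -EINVAL.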

Re: [PATCH RFC] Add other KSZ switch support so that patch check does not complain

2017-10-11 Thread Pavel Machek

On Fri 2017-10-06 13:33:29, tristram...@microchip.com wrote:
> From: Tristram Ha 
> 
> Add other KSZ switch support so that patch check does not complain.
> 
> Signed-off-by: Tristram Ha 

Reviewed-by: Pavel Machek 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [PATCH v1 RFC 7/7] Modify tag_ksz.c so that tail tag code can be used by other KSZ switch drivers

2017-10-11 Thread Pavel Machek

Hi!

> +#define  KSZ_INGRESS_TAG_LEN 1

This define is now (or should be) unused, so you can delete it, no?

>  #define  KSZ_EGRESS_TAG_LEN  1

And I'd delete this define, too. Having constant for something that's
variable is quite confusing :-).

Plus you are really doing too much inside a single patch.

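> + * For Egress (KSZ9477 -> Host), 1 byte is added before FCS.
> + * ---------------------------------------------------------------------------
> + * DA(6bytes)|SA(6bytes)||Data(nbytes)|tag0(1byte)|FCS(4bytes)
> + * ---------------------------------------------------------------------------
> + * tag0 : zero-based value represents port
> + *	  (eg, 0x00=port1, 0x02=port3, 0x06=port7)
> + */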
> +
> +#define KSZ9477_INGRESS_TAG_LEN  2
> +#define KSZ9477_PTP_TAG_LEN  4
> +#define KSZ9477_PTP_TAG_INDICATION   0x80
> +
> +#define KSZ9477_TAIL_TAG_OVERRIDE	BIT(9)
> +#define KSZ9477_TAIL_TAG_LOOKUP  BIT(10)
> +
> +static int ksz9477_get_tag(u8 *tag, int *port)
> +{
> +	int len = KSZ_EGRESS_TAG_LEN;
> +
> +	/* Extra 4-bytes PTP timestamp */
> +	if (tag[0] & KSZ9477_PTP_TAG_INDICATION)
> +		len += KSZ9477_PTP_TAG_LEN;
> +	*port = tag[0] & 7;
> +	return len;
> +}
> +
> +static void ksz9477_set_tag(void *ptr, u8 *addr, int p)
> +{
> +	u16 *tag = (u16 *)ptr;
> +
> +	*tag = 1 << p;
> +	if (!memcmp(addr, special_mult_addr, ETH_ALEN))
> +		*tag |= KSZ9477_TAIL_TAG_OVERRIDE;
> +	*tag = cpu_to_be16(*tag);
> +}

These are new features that were not there before, right?
Pavel
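
For reference, the egress parsing in the quoted hunk can be tried out in
userspace with a small sketch like the one below. It mirrors the logic
of ksz9477_get_tag() (1-byte tag, 4 more bytes when bit 0x80 is set,
port in the low 3 bits) but is not the driver code: tail_tag_parse() and
the constant names are hypothetical, and it assumes the FCS has already
been stripped so that the tag is the last byte of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_LEN			1	/* one-byte tail tag */
#define PTP_TAG_LEN		4	/* optional PTP timestamp */
#define PTP_TAG_INDICATION	0x80	/* tag bit: timestamp present */
#define PORT_MASK		0x07	/* zero-based port number */

/*
 * Parse the tail tag of a received frame (FCS already removed).
 * Stores the zero-based port and the frame length without the tag
 * bytes. Returns 0 on success, -1 if the frame is too short.
 */
static int tail_tag_parse(const uint8_t *frame, size_t len,
			  int *port, size_t *payload_len)
{
	size_t tag_len = TAG_LEN;
	uint8_t tag;

	if (len < TAG_LEN)
		return -1;
	tag = frame[len - 1];		/* tag byte is the last byte */

	if (tag & PTP_TAG_INDICATION)
		tag_len += PTP_TAG_LEN;	/* timestamp bytes are trimmed too */
	if (len < tag_len)
		return -1;

	*port = tag & PORT_MASK;
	*payload_len = len - tag_len;
	return 0;
}
```

A tag byte of 0x02 gives port 2 (port3 in the comment's numbering) and
trims one byte. A tag byte of 0x86 gives port 6 and trims five bytes.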


Re: Ethtool question

2017-10-11 Thread John W. Linville

On Wed, Oct 11, 2017 at 09:51:56AM -0700, Ben Greear wrote:
> I noticed today that setting some ethtool settings to the same value
> returns an error code.  I would think this should silently return
> success instead?  Makes it easier to call it from scripts this way:
> 
> [root@lf0313-6477 lanforge]# ethtool -L eth3 combined 1
> combined unmodified, ignoring
> no channel parameters changed, aborting
> current values: tx 0 rx 0 other 1 combined 1
> [root@lf0313-6477 lanforge]# echo $?
> 1

I just had this discussion a couple of months ago with someone. My
initial feeling was like you, a no-op is not a failure. But someone
convinced me otherwise...I will now endeavour to remember who that
was and how they convinced me...

Anyone else have input here?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com			might be all we have.  Be ready.
