Re: Fw: [Bug 199109] New: pptp: kernel printk "recursion detected", and then reboot itself
On Wed, 21 Mar 2018 16:35:28 +0800 Guillaume Nault wrote:
> On Wed, Mar 21, 2018 at 09:03:57AM +0800, xu heng wrote:
> > Yes, i have tested it for 146390 seconds in my board, it's ok now. Thanks!
>
> Feel free to add your Tested-by tag to the patch if you want to.
> Thanks for your report.
>
> Guillaume
>
> BTW, for your future exchanges on the list, please avoid top-posting.

I'm sorry for that, will never do that again. Thanks.

xuheng
Re: [bpf-next V2 PATCH 10/15] xdp: rhashtable with allocator ID to pointer mapping
On 2018年03月20日 22:27, Jesper Dangaard Brouer wrote:
> On Tue, 20 Mar 2018 10:26:50 +0800 Jason Wang wrote:
>> On 2018年03月19日 17:48, Jesper Dangaard Brouer wrote:
>>> On Fri, 16 Mar 2018 16:45:30 +0800 Jason Wang wrote:
>>>> On 2018年03月10日 00:07, Jesper Dangaard Brouer wrote:
>>>>> On Fri, 9 Mar 2018 21:07:36 +0800 Jason Wang wrote:
>>>>>>> Use the IDA infrastructure for getting a cyclic increasing ID
>>>>>>> number, that is used for keeping track of each registered
>>>>>>> allocator per RX-queue xdp_rxq_info.
>>>>>>>
>>>>>>> Signed-off-by: Jesper Dangaard Brouer
>>>>>>
>>>>>> A stupid question is, can we manage to unify this ID with NAPI id?
>>>>>
>>>>> Sorry I don't understand the question?
>>>>
>>>> I mean can we associate the page pool pointer to napi_struct, record
>>>> the NAPI id in xdp_mem_info and do the lookup through the NAPI id?
>>>
>>> No. The driver can unreg/reg a new XDP memory model,
>>
>> Is there an actual use case for this?
>
> I believe this is the common use case. When attaching an XDP/bpf prog,
> then the driver usually wants to change the RX-ring memory model
> (different performance trade off).
>
>> Right, but a single driver should only have one XDP memory model.
>
> No! -- a driver can have multiple XDP memory models, based on different
> performance trade offs and hardware capabilities. The mlx5 (100Gbit/s)
> driver/hardware is a good example, which needs different memory models.
> It already supports multiple RX memory models, depending on HW support.

So let me correct my question: I'm not familiar with the mlx5e driver, but if I understand correctly, the driver (mlx5) will not change the memory model during runtime for each NAPI. So does the NAPI id still work in this case?

> So, I predict that we hit a performance limit around 42Mpps on PCIe (I
> can measure 36Mpps); this is due to a PCI-express transactions/sec
> limit. The mlx5 HW supports a compressed descriptor format which
> delivers packets in several pages (based on offset and len), thus
> lowering the needed PCIe transactions. The pitfall is that this comes
> with tail room limitations, which can be okay if e.g. the user's
> use-case does not involve cpumap.
>
> Plus, when a driver needs to support AF_XDP zero-copy, that also counts
> as another XDP memory model...

Yes, or TAP zero-copy XDP. But it looks to me that we don't even need to care about the recycling here, since the pages belong to userspace.

Thanks
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
On Thu, Mar 22, 2018 at 01:43:49AM +0100, Andrew Lunn wrote:
> On Wed, Mar 21, 2018 at 03:47:02PM -0700, Richard Cochran wrote:
> > I'm happy to improve the modeling, but the solution should be generic
> > and work for every MAC driver.

Let me qualify that a bit...

> Something else to think about. There are a number of MAC drivers which
> don't use phylib. All the intel drivers for example. They have their
> own MDIO and PHY code. And recently there have been a number of MAC
> drivers for hardware capable of > 1GBps which do all the PHY control
> in firmware.
>
> A phydev is optional, the MAC is mandatory.

So MACs that have a built-in PHY won't work, but we don't care because there is no way to hang another MII device in there anyhow. We already require phylib for NETWORK_PHY_TIMESTAMPING, and so we expect that here, too. Many of these IP core things will be targeting arm with device tree, and I want that to "just work" without MAC changes. (This is exactly the same situation with DSA, BTW.)

If someone attaches an MII time stamper to a MAC whose driver does its own thing without phylib, then they are going to have to hack the MAC driver in any case. Such hacks will never be acceptable for mainline because they are design specific. We really don't have to worry about this case.

Thanks,
Richard
[PATCH net-next v4 1/5] net: qualcomm: rmnet: Fix casting issues
Fix warnings which were reported when running with sparse
(make C=1 CF=-D__CHECK_ENDIAN__):

drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c:81:15: warning: cast to restricted __be16
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:271:37: warning: incorrect type in assignment (different base types)
    expected unsigned short [unsigned] [usertype] pkt_len
    got restricted __be16 [usertype]
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:287:29: warning: incorrect type in assignment (different base types)
    expected unsigned short [unsigned] [usertype] pkt_len
    got restricted __be16 [usertype]
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:310:22: warning: cast to restricted __be16
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:319:13: warning: cast to restricted __be16
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:49:18: warning: cast to restricted __be16
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:50:18: warning: cast to restricted __be32
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:74:21: warning: cast to restricted __be16

Signed-off-by: Subash Abhinov Kasiviswanathan
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index 6ce31e2..4f362df 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -23,8 +23,8 @@ struct rmnet_map_control_command {
 		struct {
 			u16 ip_family:2;
 			u16 reserved:14;
-			u16 flow_control_seq_num;
-			u32 qos_id;
+			__be16 flow_control_seq_num;
+			__be32 qos_id;
 		} flow_control;
 		u8 data[0];
 	};
@@ -44,7 +44,7 @@ struct rmnet_map_header {
 	u8  reserved_bit:1;
 	u8  cd_bit:1;
 	u8  mux_id;
-	u16 pkt_len;
+	__be16 pkt_len;
 }  __aligned(1);

 struct rmnet_map_dl_csum_trailer {
-- 
1.9.1
[PATCH net-next v4 4/5] net: qualcomm: rmnet: Export mux_id and flags to netlink
Define new netlink attributes for rmnet mux_id and flags. These flags / mux_id were earlier using vlan flags / id respectively. The flag bits are also moved to uapi and are renamed with prefix RMNET_FLAGS_*. Also add the rmnet policy to handle the new netlink attributes.

Signed-off-by: Subash Abhinov Kasiviswanathan
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 41 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 10 +++---
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c    |  2 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   |  2 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h    |  6
 include/uapi/linux/if_link.h                       | 21 +++
 6 files changed, 53 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index 096301a..c5b7b2a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -43,6 +43,11 @@
 /* Local Definitions and Declarations */

+static const struct nla_policy rmnet_policy[IFLA_RMNET_MAX + 1] = {
+	[IFLA_RMNET_MUX_ID]	= { .type = NLA_U16 },
+	[IFLA_RMNET_FLAGS]	= { .len = sizeof(struct ifla_rmnet_flags) },
+};
+
 static int rmnet_is_real_dev_registered(const struct net_device *real_dev)
 {
 	return rcu_access_pointer(real_dev->rx_handler) == rmnet_rx_handler;
@@ -131,7 +136,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[],
 			 struct netlink_ext_ack *extack)
 {
-	u32 data_format = RMNET_INGRESS_FORMAT_DEAGGREGATION;
+	u32 data_format = RMNET_FLAGS_INGRESS_DEAGGREGATION;
 	struct net_device *real_dev;
 	int mode = RMNET_EPMODE_VND;
 	struct rmnet_endpoint *ep;
@@ -143,14 +148,14 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,
 	if (!real_dev || !dev)
 		return -ENODEV;

-	if (!data[IFLA_VLAN_ID])
+	if (!data[IFLA_RMNET_MUX_ID])
 		return -EINVAL;

 	ep = kzalloc(sizeof(*ep), GFP_ATOMIC);
 	if (!ep)
 		return -ENOMEM;

-	mux_id = nla_get_u16(data[IFLA_VLAN_ID]);
+	mux_id = nla_get_u16(data[IFLA_RMNET_MUX_ID]);

 	err = rmnet_register_real_device(real_dev);
 	if (err)
@@ -165,10 +170,10 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev,

 	hlist_add_head_rcu(&ep->hlnode, &port->muxed_ep[mux_id]);

-	if (data[IFLA_VLAN_FLAGS]) {
-		struct ifla_vlan_flags *flags;
+	if (data[IFLA_RMNET_FLAGS]) {
+		struct ifla_rmnet_flags *flags;

-		flags = nla_data(data[IFLA_VLAN_FLAGS]);
+		flags = nla_data(data[IFLA_RMNET_FLAGS]);
 		data_format = flags->flags & flags->mask;
 	}

@@ -276,10 +281,10 @@ static int rmnet_rtnl_validate(struct nlattr *tb[], struct nlattr *data[],
 {
 	u16 mux_id;

-	if (!data || !data[IFLA_VLAN_ID])
+	if (!data || !data[IFLA_RMNET_MUX_ID])
 		return -EINVAL;

-	mux_id = nla_get_u16(data[IFLA_VLAN_ID]);
+	mux_id = nla_get_u16(data[IFLA_RMNET_MUX_ID]);
 	if (mux_id > (RMNET_MAX_LOGICAL_EP - 1))
 		return -ERANGE;

@@ -304,8 +309,8 @@ static int rmnet_changelink(struct net_device *dev, struct nlattr *tb[],

 	port = rmnet_get_port_rtnl(real_dev);

-	if (data[IFLA_VLAN_ID]) {
-		mux_id = nla_get_u16(data[IFLA_VLAN_ID]);
+	if (data[IFLA_RMNET_MUX_ID]) {
+		mux_id = nla_get_u16(data[IFLA_RMNET_MUX_ID]);
 		ep = rmnet_get_endpoint(port, priv->mux_id);

 		hlist_del_init_rcu(&ep->hlnode);
@@ -315,10 +320,10 @@ static int rmnet_changelink(struct net_device *dev, struct nlattr *tb[],
 		priv->mux_id = mux_id;
 	}

-	if (data[IFLA_VLAN_FLAGS]) {
-		struct ifla_vlan_flags *flags;
+	if (data[IFLA_RMNET_FLAGS]) {
+		struct ifla_rmnet_flags *flags;

-		flags = nla_data(data[IFLA_VLAN_FLAGS]);
+		flags = nla_data(data[IFLA_RMNET_FLAGS]);
 		port->data_format = flags->flags & flags->mask;
 	}

@@ -327,13 +332,16 @@ static int rmnet_changelink(struct net_device *dev, struct nlattr *tb[],

 static size_t rmnet_get_size(const struct net_device *dev)
 {
-	return nla_total_size(2) /* IFLA_VLAN_ID */ +
-	       nla_total_size(sizeof(struct ifla_vlan_flags)); /* IFLA_VLAN_FLAGS */
+	return
+		/* IFLA_RMNET_MUX_ID */
+		nla_total_size(2) +
+		/* IFLA_RMNET_FLAGS */
+		nla_total_size(sizeof(struct ifla_rmnet_flags));
 }

 struct rtnl_link_ops rmnet_link_ops __read_mostly = {
 	.kind		= "rmnet",
-	.maxtype=
[PATCH net-next v4 0/5] net: qualcomm: rmnet: Updates 2018-03-12
This series contains some minor updates for the rmnet driver.

Patch 1 contains fixes for sparse warnings.
Patch 2 updates the copyright date to 2018.
Patch 3 is a cleanup in the receive path.
Patch 4 has the new rmnet netlink attributes in uapi and updates the usage.
Patch 5 has the implementation of the fill_info operation.

v1->v2: Remove the force casts since the data type is changed to __be types, as mentioned by David.
v2->v3: Update copyright in files which actually had changes, as mentioned by Joe.
v3->v4: Add new netlink attributes for mux_id and flags instead of using the vlan attributes, as mentioned by David. The rmnet-specific flags are also moved to uapi. The netlink updates are done as part of #4, and #5 has the fill_info operation.

Subash Abhinov Kasiviswanathan (5):
  net: qualcomm: rmnet: Fix casting issues
  net: qualcomm: rmnet: Update copyright year to 2018
  net: qualcomm: rmnet: Remove unnecessary device assignment
  net: qualcomm: rmnet: Export mux_id and flags to netlink
  net: qualcomm: rmnet: Implement fill_info

 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 73 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h |  2 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c   | 12 ++--
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h    |  8 +--
 .../ethernet/qualcomm/rmnet/rmnet_map_command.c    |  4 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c   |  5 +-
 .../net/ethernet/qualcomm/rmnet/rmnet_private.h    |  8 +--
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c    |  2 +-
 include/uapi/linux/if_link.h                       | 21 +++
 9 files changed, 94 insertions(+), 41 deletions(-)

-- 
1.9.1
[PATCH net-next v4 2/5] net: qualcomm: rmnet: Update copyright year to 2018
Signed-off-by: Subash Abhinov Kasiviswanathan
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c      | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h      | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c    | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h         | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c    | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h     | 2 +-
 drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c         | 2 +-
 8 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index c494918..096301a 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
index 00e4634..0b5b5da 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2014, 2016-2017 The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2014, 2016-2018 The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
index 601edec..c758248 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
index 4f362df..884f1f5 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
index b0dbca0..afa2b86 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index c74a6c5..49e420e 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017, The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2018, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
index de0143e..98365ef 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2014, 2016-2017 The Linux Foundation. All rights reserved.
+/* Copyright (c) 2013-2014, 2016-2018 The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
index 346d310..2ea16a0 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
@@ -1,4 +1,4 @@
-/* Copyright (c) 2013-2017,
[PATCH net-next v4 3/5] net: qualcomm: rmnet: Remove unnecessary device assignment
Device of the de-aggregated skb is correctly assigned after inspecting the mux_id, so remove the assignment in rmnet_map_deaggregate().

Signed-off-by: Subash Abhinov Kasiviswanathan
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index 49e420e..e8f6c79 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -323,7 +323,6 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb,
 	if (!skbn)
 		return NULL;

-	skbn->dev = skb->dev;
 	skb_reserve(skbn, RMNET_MAP_DEAGGR_HEADROOM);
 	skb_put(skbn, packet_len);
 	memcpy(skbn->data, skb->data, packet_len);
-- 
1.9.1
[PATCH net-next v4 5/5] net: qualcomm: rmnet: Implement fill_info
This is needed to query the mux_id and flags of a rmnet device.

Signed-off-by: Subash Abhinov Kasiviswanathan
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
index c5b7b2a..38d9356 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
@@ -339,6 +339,35 @@ static size_t rmnet_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(struct ifla_rmnet_flags));
 }

+static int rmnet_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct rmnet_priv *priv = netdev_priv(dev);
+	struct net_device *real_dev;
+	struct ifla_rmnet_flags f;
+	struct rmnet_port *port;
+
+	real_dev = priv->real_dev;
+
+	if (!rmnet_is_real_dev_registered(real_dev))
+		return -ENODEV;
+
+	if (nla_put_u16(skb, IFLA_RMNET_MUX_ID, priv->mux_id))
+		goto nla_put_failure;
+
+	port = rmnet_get_port_rtnl(real_dev);
+
+	f.flags = port->data_format;
+	f.mask  = ~0;
+
+	if (nla_put(skb, IFLA_RMNET_FLAGS, sizeof(f), &f))
+		goto nla_put_failure;
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 struct rtnl_link_ops rmnet_link_ops __read_mostly = {
 	.kind		= "rmnet",
 	.maxtype	= __IFLA_RMNET_MAX,
@@ -350,6 +379,7 @@ struct rtnl_link_ops rmnet_link_ops __read_mostly = {
 	.get_size	= rmnet_get_size,
 	.changelink	= rmnet_changelink,
 	.policy		= rmnet_policy,
+	.fill_info	= rmnet_fill_info,
 };

 /* Needs either rcu_read_lock() or rtnl lock */
-- 
1.9.1
Re: [RFC PATCH 2/3] x86/io: implement 256-bit IO read and write
On Tue, Mar 20, 2018 at 7:42 AM, Alexander Duyck wrote:
>
> Instead of framing this as an enhanced version of the read/write ops
> why not look at replacing or extending something like the
> memcpy_fromio or memcpy_toio operations?

Yes, doing something like "memcpy_fromio_avx()" is much more palatable, in that it works like the crypto functions do - if you do big chunks, the "kernel_fpu_begin/end()" isn't nearly the issue it can be otherwise.

Note that we definitely have seen hardware that *depends* on the regular memcpy_fromio() not doing big reads. I don't know how hardware people screw it up, but it's clearly possible.

So it really needs to be an explicitly named function that basically a driver can use to say "my hardware really likes big aligned accesses" and explicitly ask for some AVX version if possible.

Linus
linux-next: manual merge of the net-next tree with the mac80211 tree
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  net/mac80211/debugfs.c
  include/net/mac80211.h

between commit:

  7c181f4fcdc6 ("mac80211: add ieee80211_hw flag for QoS NDP support")

from the mac80211 tree and commit:

  94ba92713f83 ("mac80211: Call mgd_prep_tx before transmitting deauthentication")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/net/mac80211.h
index 2b581bd93812,2fd59ed3be00..
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@@ -2063,9 -2070,14 +2070,17 @@@ struct ieee80211_txq
  * @IEEE80211_HW_SUPPORTS_TDLS_BUFFER_STA: Hardware supports buffer STA on
  *	TDLS links.
  *
 + * @IEEE80211_HW_DOESNT_SUPPORT_QOS_NDP: The driver (or firmware) doesn't
 + *	support QoS NDP for AP probing - that's most likely a driver bug.
 + *
+  * @IEEE80211_HW_DEAUTH_NEED_MGD_TX_PREP: The driver requires the
+  *	mgd_prepare_tx() callback to be called before transmission of a
+  *	deauthentication frame in case the association was completed but no
+  *	beacon was heard. This is required in multi-channel scenarios, where the
+  *	virtual interface might not be given air time for the transmission of
+  *	the frame, as it is not synced with the AP/P2P GO yet, and thus the
+  *	deauthentication frame might not be transmitted.
+  *
  * @NUM_IEEE80211_HW_FLAGS: number of hardware flags, used for sizing arrays
  */
 enum ieee80211_hw_flags {
@@@ -2109,7 -2121,7 +2124,8 @@@
 	IEEE80211_HW_REPORTS_LOW_ACK,
 	IEEE80211_HW_SUPPORTS_TX_FRAG,
 	IEEE80211_HW_SUPPORTS_TDLS_BUFFER_STA,
 +	IEEE80211_HW_DOESNT_SUPPORT_QOS_NDP,
+ 	IEEE80211_HW_DEAUTH_NEED_MGD_TX_PREP,

 	/* keep last, obviously */
 	NUM_IEEE80211_HW_FLAGS
diff --cc net/mac80211/debugfs.c
index 94c7ee9df33b,a75653affbf7..
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@@ -212,7 -212,7 +212,8 @@@ static const char *hw_flag_names[] =
 	FLAG(REPORTS_LOW_ACK),
 	FLAG(SUPPORTS_TX_FRAG),
 	FLAG(SUPPORTS_TDLS_BUFFER_STA),
 +	FLAG(DOESNT_SUPPORT_QOS_NDP),
+ 	FLAG(DEAUTH_NEED_MGD_TX_PREP),
 #undef FLAG
 };
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
On Wed, Mar 21, 2018 at 03:47:02PM -0700, Richard Cochran wrote:
> On Wed, Mar 21, 2018 at 11:16:52PM +0100, Andrew Lunn wrote:
> > The MAC drivers are clients of this device. They then use a phandle
> > and specifier:
> >
> > eth0: ethernet-controller@72000 {
> > 	compatible = "marvell,kirkwood-eth";
> > 	#address-cells = <1>;
> > 	#size-cells = <0>;
> > 	reg = <0x72000 0x4000>;
> >
> > 	timerstamper = < 2>
> > }
> >
> > The 2 indicates this MAC is using port 2.
> >
> > The MAC driver can then do the standard device tree things to follow
> > the phandle to get access to the device and use the API it exports.
>
> But that would require hacking every last MAC driver.
>
> I'm happy to improve the modeling, but the solution should be generic
> and work for every MAC driver.

Something else to think about. There are a number of MAC drivers which don't use phylib. All the intel drivers for example. They have their own MDIO and PHY code. And recently there have been a number of MAC drivers for hardware capable of > 1GBps which do all the PHY control in firmware.

A phydev is optional, the MAC is mandatory.

Andrew
[next-queue PATCH v5 1/9] igb: Fix not adding filter elements to the list
Because the order of the parameters passed to 'hlist_add_behind()' was inverted, the 'parent' node was added "behind" the 'input'; as 'input' is not in the list, this causes the 'input' node to be lost.

Fixes: 0e71def25281 ("igb: add support of RX network flow classification")
Signed-off-by: Vinicius Costa Gomes
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 606e6761758f..143f0bb34e4d 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2864,7 +2864,7 @@ static int igb_update_ethtool_nfc_entry(struct igb_adapter *adapter,

 	/* add filter to the list */
 	if (parent)
-		hlist_add_behind(&parent->nfc_node, &input->nfc_node);
+		hlist_add_behind(&input->nfc_node, &parent->nfc_node);
 	else
 		hlist_add_head(&input->nfc_node, &adapter->nfc_filter_list);
-- 
2.16.2
[next-queue PATCH v5 3/9] igb: Enable the hardware traffic class feature bit for igb models
This will allow functionality depending on the hardware being traffic class aware to work. In particular, the tc-flower offloading check verifies that this bit is set.

Signed-off-by: Vinicius Costa Gomes
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index d0e8e796c6fa..9ce29b8bb7da 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2806,6 +2806,9 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (hw->mac.type >= e1000_82576)
 		netdev->features |= NETIF_F_SCTP_CRC;

+	if (hw->mac.type >= e1000_i350)
+		netdev->features |= NETIF_F_HW_TC;
+
 #define IGB_GSO_PARTIAL_FEATURES (NETIF_F_GSO_GRE | \
 				  NETIF_F_GSO_GRE_CSUM | \
 				  NETIF_F_GSO_IPXIP4 | \
-- 
2.16.2
[next-queue PATCH v5 6/9] igb: Enable nfc filters to specify MAC addresses
This allows igb_add_filter()/igb_erase_filter() to work on filters that include MAC addresses (both source and destination). For now, this only exposes the functionality; the next commit glues ethtool into this. Later in this series, these APIs are used to allow offloading of cls_flower filters.

Signed-off-by: Vinicius Costa Gomes
---
 drivers/net/ethernet/intel/igb/igb.h         |  4
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 28
 2 files changed, 32 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index dfef1702ba21..66165879f12b 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -441,6 +441,8 @@ struct hwmon_buff {
 enum igb_filter_match_flags {
 	IGB_FILTER_FLAG_ETHER_TYPE = 0x1,
 	IGB_FILTER_FLAG_VLAN_TCI = 0x2,
+	IGB_FILTER_FLAG_SRC_MAC_ADDR = 0x4,
+	IGB_FILTER_FLAG_DST_MAC_ADDR = 0x8,
 };

 #define IGB_MAX_RXNFC_FILTERS 16
@@ -455,6 +457,8 @@ struct igb_nfc_input {
 	u8 match_flags;
 	__be16 etype;
 	__be16 vlan_tci;
+	u8 src_addr[ETH_ALEN];
+	u8 dst_addr[ETH_ALEN];
 };

 struct igb_nfc_filter {
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 143f0bb34e4d..4c6a1b78c413 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2775,6 +2775,25 @@ int igb_add_filter(struct igb_adapter *adapter, struct igb_nfc_filter *input)
 		return err;
 	}

+	if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) {
+		err = igb_add_mac_steering_filter(adapter,
+						  input->filter.dst_addr,
+						  input->action, 0);
+		err = min_t(int, err, 0);
+		if (err)
+			return err;
+	}
+
+	if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) {
+		err = igb_add_mac_steering_filter(adapter,
+						  input->filter.src_addr,
+						  input->action,
+						  IGB_MAC_STATE_SRC_ADDR);
+		err = min_t(int, err, 0);
+		if (err)
+			return err;
+	}
+
 	if (input->filter.match_flags & IGB_FILTER_FLAG_VLAN_TCI)
 		err = igb_rxnfc_write_vlan_prio_filter(adapter, input);

@@ -2823,6 +2842,15 @@ int igb_erase_filter(struct igb_adapter *adapter, struct igb_nfc_filter *input)
 		igb_clear_vlan_prio_filter(adapter,
 					   ntohs(input->filter.vlan_tci));

+	if (input->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR)
+		igb_del_mac_steering_filter(adapter, input->filter.src_addr,
+					    input->action,
+					    IGB_MAC_STATE_SRC_ADDR);
+
+	if (input->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR)
+		igb_del_mac_steering_filter(adapter, input->filter.dst_addr,
+					    input->action, 0);
+
 	return 0;
 }
-- 
2.16.2
[next-queue PATCH v5 8/9] igb: Add the skeletons for tc-flower offloading
This adds basic functions needed to implement offloading for filters created by tc-flower.

Signed-off-by: Vinicius Costa Gomes
---
 drivers/net/ethernet/intel/igb/igb_main.c | 66 +++
 1 file changed, 66 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 52cd891aa579..150231e4db9d 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include <net/pkt_cls.h>
 #include
 #include
 #include
@@ -2497,6 +2498,69 @@ static int igb_offload_cbs(struct igb_adapter *adapter,
 	return 0;
 }

+static int igb_configure_clsflower(struct igb_adapter *adapter,
+				   struct tc_cls_flower_offload *cls_flower)
+{
+	return -EOPNOTSUPP;
+}
+
+static int igb_delete_clsflower(struct igb_adapter *adapter,
+				struct tc_cls_flower_offload *cls_flower)
+{
+	return -EOPNOTSUPP;
+}
+
+static int igb_setup_tc_cls_flower(struct igb_adapter *adapter,
+				   struct tc_cls_flower_offload *cls_flower)
+{
+	switch (cls_flower->command) {
+	case TC_CLSFLOWER_REPLACE:
+		return igb_configure_clsflower(adapter, cls_flower);
+	case TC_CLSFLOWER_DESTROY:
+		return igb_delete_clsflower(adapter, cls_flower);
+	case TC_CLSFLOWER_STATS:
+		return -EOPNOTSUPP;
+	default:
+		return -EINVAL;
+	}
+}
+
+static int igb_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+				 void *cb_priv)
+{
+	struct igb_adapter *adapter = cb_priv;
+
+	if (!tc_cls_can_offload_and_chain0(adapter->netdev, type_data))
+		return -EOPNOTSUPP;
+
+	switch (type) {
+	case TC_SETUP_CLSFLOWER:
+		return igb_setup_tc_cls_flower(adapter, type_data);
+
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static int igb_setup_tc_block(struct igb_adapter *adapter,
+			      struct tc_block_offload *f)
+{
+	if (f->binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
+		return -EOPNOTSUPP;
+
+	switch (f->command) {
+	case TC_BLOCK_BIND:
+		return tcf_block_cb_register(f->block, igb_setup_tc_block_cb,
+					     adapter, adapter);
+	case TC_BLOCK_UNBIND:
+		tcf_block_cb_unregister(f->block, igb_setup_tc_block_cb,
+					adapter);
+		return 0;
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			void *type_data)
 {
@@ -2505,6 +2569,8 @@ static int igb_setup_tc(struct net_device *dev, enum tc_setup_type type,
 	switch (type) {
 	case TC_SETUP_QDISC_CBS:
 		return igb_offload_cbs(adapter, type_data);
+	case TC_SETUP_BLOCK:
+		return igb_setup_tc_block(adapter, type_data);

 	default:
 		return -EOPNOTSUPP;
-- 
2.16.2
[next-queue PATCH v5 0/9] igb: offloading of receive filters
Hi,

Changes from v4:
 - Added a new bit to the internal representation of MAC address filters to mark some of them as steering filters (i.e. they direct traffic to queues);
 - This is only supported on i210;
 - Added a "Known issue" section;

Changes from v3:
 - Addressed review comments from Aaron F. Brown and Jakub Kicinski;

Changes from v2:
 - Addressed review comments from Jakub Kicinski, mostly coding style adjustments and more consistent error reporting;

Changes from v1:
 - Addressed review comments from Alexander Duyck and Florian Fainelli;
 - Adding and removing cls_flower filters are now proposed in the same patch;
 - cls_flower filters are kept in a separate list from "ethtool" filters (so that section of the original cover letter is no longer valid);
 - The patch adding support for ethtool filters is now independent from the rest of the series;

Known issue:
 - The QSEL bits in the RAH registers seem to have no effect for source addresses (i.e. steering doesn't work for source address filters); everything points to a hardware (or documentation) issue;

Original cover letter:

This series enables some ethtool and tc-flower filters to be offloaded to igb-based network controllers. This is useful when the system configurator wants to steer certain kinds of traffic to a specific hardware queue.

The first two commits are bug fixes.

The basis of this series is to export the internal API used to configure address filters, so they can be used by ethtool, and to extend the functionality so that source addresses can be handled as well. Then, the tc-flower offloading implementation is enabled to reuse the same infrastructure as ethtool, storing its filters in the per-adapter "nfc" (Network Filter Config?) list. For consistency, destructive access is kept separate: a filter added by tc-flower can only be removed by tc-flower, but ethtool can read them all.
Only support for VLAN priority, source and destination MAC address, and Ethertype is enabled for now.

Open question:
 - igb is initialized with the number of traffic classes set to 1; if we want to use multiple traffic classes, this value needs to be increased, and the only way I could find to do that is via mqprio (for example). Should igb be initialized with, say, its number of queues as "num_tc"?

Vinicius Costa Gomes (9):
  igb: Fix not adding filter elements to the list
  igb: Fix queue selection on MAC filters on i210
  igb: Enable the hardware traffic class feature bit for igb models
  igb: Add support for MAC address filters specifying source addresses
  igb: Add support for enabling queue steering in filters
  igb: Enable nfc filters to specify MAC addresses
  igb: Add MAC address support for ethtool nftuple filters
  igb: Add the skeletons for tc-flower offloading
  igb: Add support for adding offloaded clsflower filters

 drivers/net/ethernet/intel/igb/e1000_defines.h |   2 +
 drivers/net/ethernet/intel/igb/igb.h           |  13 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c   |  65 -
 drivers/net/ethernet/intel/igb/igb_main.c      | 332 -
 4 files changed, 398 insertions(+), 14 deletions(-)

-- 
2.16.2
[next-queue PATCH v5 4/9] igb: Add support for MAC address filters specifying source addresses
Makes it possible to direct packets to queues based on their source address. Documents the expected usage of the 'flags' parameter. Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/e1000_defines.h | 1 + drivers/net/ethernet/intel/igb/igb.h | 1 + drivers/net/ethernet/intel/igb/igb_main.c | 40 ++ 3 files changed, 37 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h index 83cabff1e0ab..a3e5514b044e 100644 --- a/drivers/net/ethernet/intel/igb/e1000_defines.h +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h @@ -490,6 +490,7 @@ * manageability enabled, allowing us room for 15 multicast addresses. */ #define E1000_RAH_AV 0x8000/* Receive descriptor valid */ +#define E1000_RAH_ASEL_SRC_ADDR 0x0001 #define E1000_RAL_MAC_ADDR_LEN 4 #define E1000_RAH_MAC_ADDR_LEN 2 #define E1000_RAH_POOL_MASK 0x03FC diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 55d6f17d5799..4501b28ff7c5 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -473,6 +473,7 @@ struct igb_mac_addr { #define IGB_MAC_STATE_DEFAULT 0x1 #define IGB_MAC_STATE_IN_USE 0x2 +#define IGB_MAC_STATE_SRC_ADDR 0x4 /* board specific private data structure */ struct igb_adapter { diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 9ce29b8bb7da..a5a681f7fbb2 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6843,8 +6843,14 @@ static void igb_set_default_mac_filter(struct igb_adapter *adapter) igb_rar_set_index(adapter, 0); } -static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, - const u8 queue) +/* Add a MAC filter for 'addr' directing matching traffic to 'queue', + * 'flags' is used to indicate what kind of match is made, match is by + * default for the destination address, if matching by source 
address + * is desired the flag IGB_MAC_STATE_SRC_ADDR can be used. + */ +static int igb_add_mac_filter_flags(struct igb_adapter *adapter, + const u8 *addr, const u8 queue, + const u8 flags) { struct e1000_hw *hw = >hw; int rar_entries = hw->mac.rar_entry_count - @@ -6864,7 +6870,7 @@ static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, ether_addr_copy(adapter->mac_table[i].addr, addr); adapter->mac_table[i].queue = queue; - adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE; + adapter->mac_table[i].state |= IGB_MAC_STATE_IN_USE | flags; igb_rar_set_index(adapter, i); return i; @@ -6873,8 +6879,21 @@ static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, return -ENOSPC; } -static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, +static int igb_add_mac_filter(struct igb_adapter *adapter, const u8 *addr, const u8 queue) +{ + return igb_add_mac_filter_flags(adapter, addr, queue, 0); +} + +/* Remove a MAC filter for 'addr' directing matching traffic to + * 'queue', 'flags' is used to indicate what kind of match need to be + * removed, match is by default for the destination address, if + * matching by source address is to be removed the flag + * IGB_MAC_STATE_SRC_ADDR can be used. 
+ */ +static int igb_del_mac_filter_flags(struct igb_adapter *adapter, + const u8 *addr, const u8 queue, + const u8 flags) { struct e1000_hw *hw = >hw; int rar_entries = hw->mac.rar_entry_count - @@ -6891,12 +6910,14 @@ static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, for (i = 0; i < rar_entries; i++) { if (!(adapter->mac_table[i].state & IGB_MAC_STATE_IN_USE)) continue; + if ((adapter->mac_table[i].state & flags) != flags) + continue; if (adapter->mac_table[i].queue != queue) continue; if (!ether_addr_equal(adapter->mac_table[i].addr, addr)) continue; - adapter->mac_table[i].state &= ~IGB_MAC_STATE_IN_USE; + adapter->mac_table[i].state = 0; memset(adapter->mac_table[i].addr, 0, ETH_ALEN); adapter->mac_table[i].queue = 0; @@ -6907,6 +6928,12 @@ static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, return -ENOENT; } +static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, + const u8 queue) +{ + return
[next-queue PATCH v5 7/9] igb: Add MAC address support for ethtool nftuple filters
This adds the capability of configuring the queue steering of arriving packets based on their source and destination MAC addresses. In practical terms this adds support for the following use cases, characterized by these examples: $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0 (this will direct packets with destination address "aa:aa:aa:aa:aa:aa" to the RX queue 0) $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 action 3 (this will direct packets with source address "44:44:44:44:44:44" to the RX queue 3) Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/igb_ethtool.c | 35 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 4c6a1b78c413..27caa413ade2 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -2494,6 +2494,23 @@ static int igb_get_ethtool_nfc_entry(struct igb_adapter *adapter, fsp->h_ext.vlan_tci = rule->filter.vlan_tci; fsp->m_ext.vlan_tci = htons(VLAN_PRIO_MASK); } + if (rule->filter.match_flags & IGB_FILTER_FLAG_DST_MAC_ADDR) { + ether_addr_copy(fsp->h_u.ether_spec.h_dest, + rule->filter.dst_addr); + /* As we only support matching by the full +* mask, return the mask to userspace +*/ + eth_broadcast_addr(fsp->m_u.ether_spec.h_dest); + } + if (rule->filter.match_flags & IGB_FILTER_FLAG_SRC_MAC_ADDR) { + ether_addr_copy(fsp->h_u.ether_spec.h_source, + rule->filter.src_addr); + /* As we only support matching by the full +* mask, return the mask to userspace +*/ + eth_broadcast_addr(fsp->m_u.ether_spec.h_source); + } + return 0; } return -EINVAL; @@ -2932,10 +2949,6 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter *adapter, if ((fsp->flow_type & ~FLOW_EXT) != ETHER_FLOW) return -EINVAL; - if (fsp->m_u.ether_spec.h_proto != ETHER_TYPE_FULL_MASK && - fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) - return -EINVAL; - input = 
kzalloc(sizeof(*input), GFP_KERNEL); if (!input) return -ENOMEM; @@ -2945,6 +2958,20 @@ static int igb_add_ethtool_nfc_entry(struct igb_adapter *adapter, input->filter.match_flags = IGB_FILTER_FLAG_ETHER_TYPE; } + /* Only support matching addresses by the full mask */ + if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_source)) { + input->filter.match_flags |= IGB_FILTER_FLAG_SRC_MAC_ADDR; + ether_addr_copy(input->filter.src_addr, + fsp->h_u.ether_spec.h_source); + } + + /* Only support matching addresses by the full mask */ + if (is_broadcast_ether_addr(fsp->m_u.ether_spec.h_dest)) { + input->filter.match_flags |= IGB_FILTER_FLAG_DST_MAC_ADDR; + ether_addr_copy(input->filter.dst_addr, + fsp->h_u.ether_spec.h_dest); + } + if ((fsp->flow_type & FLOW_EXT) && fsp->m_ext.vlan_tci) { if (fsp->m_ext.vlan_tci != htons(VLAN_PRIO_MASK)) { err = -EINVAL; -- 2.16.2
[next-queue PATCH v5 2/9] igb: Fix queue selection on MAC filters on i210
On the RAH registers there are semantic differences in the meaning of the "queue" parameter for traffic steering depending on the controller model: there is the 82575 meaning, in which "queue" is an RX hardware queue, and the i350 meaning, in which it is a reception pool.

The previous behaviour had no effect on i210 based controllers because the QSEL bit of the RAH register wasn't being set.

This patch separates the condition into discrete cases, so the different handling is clearer.

Fixes: 83c21335c876 ("igb: improve MAC filter handling")
Signed-off-by: Vinicius Costa Gomes
---
 drivers/net/ethernet/intel/igb/igb_main.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 715bb32e6901..d0e8e796c6fa 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8747,12 +8747,17 @@ static void igb_rar_set_index(struct igb_adapter *adapter, u32 index)
 		if (is_valid_ether_addr(addr))
 			rar_high |= E1000_RAH_AV;
 
-		if (hw->mac.type == e1000_82575)
+		switch (hw->mac.type) {
+		case e1000_82575:
+		case e1000_i210:
 			rar_high |= E1000_RAH_POOL_1 *
-				adapter->mac_table[index].queue;
-		else
+					adapter->mac_table[index].queue;
+			break;
+		default:
 			rar_high |= E1000_RAH_POOL_1 <<
-				adapter->mac_table[index].queue;
+					adapter->mac_table[index].queue;
+			break;
+		}
 	}
 
 	wr32(E1000_RAL(index), rar_low);
-- 
2.16.2
[next-queue PATCH v5 5/9] igb: Add support for enabling queue steering in filters
On some igb models (82575 and i210) the MAC address filters can control to which queue the packet will be assigned. This extends the 'state' with one more state to signify that queue selection should be enabled for that filter. As 82575 parts are no longer easily obtained (and this was developed against i210), only support for the i210 model is enabled. These functions are exported and will be used in the next patch. Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/e1000_defines.h | 1 + drivers/net/ethernet/intel/igb/igb.h | 6 ++ drivers/net/ethernet/intel/igb/igb_main.c | 26 ++ 3 files changed, 33 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/e1000_defines.h b/drivers/net/ethernet/intel/igb/e1000_defines.h index a3e5514b044e..c6f552de30dd 100644 --- a/drivers/net/ethernet/intel/igb/e1000_defines.h +++ b/drivers/net/ethernet/intel/igb/e1000_defines.h @@ -491,6 +491,7 @@ */ #define E1000_RAH_AV 0x8000/* Receive descriptor valid */ #define E1000_RAH_ASEL_SRC_ADDR 0x0001 +#define E1000_RAH_QSEL_ENABLE 0x1000 #define E1000_RAL_MAC_ADDR_LEN 4 #define E1000_RAH_MAC_ADDR_LEN 2 #define E1000_RAH_POOL_MASK 0x03FC diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 4501b28ff7c5..dfef1702ba21 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -474,6 +474,7 @@ struct igb_mac_addr { #define IGB_MAC_STATE_DEFAULT 0x1 #define IGB_MAC_STATE_IN_USE 0x2 #define IGB_MAC_STATE_SRC_ADDR 0x4 +#define IGB_MAC_STATE_QUEUE_STEERING 0x8 /* board specific private data structure */ struct igb_adapter { @@ -739,4 +740,9 @@ int igb_add_filter(struct igb_adapter *adapter, int igb_erase_filter(struct igb_adapter *adapter, struct igb_nfc_filter *input); +int igb_add_mac_steering_filter(struct igb_adapter *adapter, + const u8 *addr, u8 queue, u8 flags); +int igb_del_mac_steering_filter(struct igb_adapter *adapter, + const u8 *addr, u8 queue, u8 flags); + #endif /* _IGB_H_ */ diff 
--git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index a5a681f7fbb2..52cd891aa579 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -6934,6 +6934,28 @@ static int igb_del_mac_filter(struct igb_adapter *adapter, const u8 *addr, return igb_del_mac_filter_flags(adapter, addr, queue, 0); } +int igb_add_mac_steering_filter(struct igb_adapter *adapter, + const u8 *addr, u8 queue, u8 flags) +{ + struct e1000_hw *hw = >hw; + + /* In theory, this should be supported on 82575 as well, but +* that part wasn't easily accessible during development. +*/ + if (hw->mac.type != e1000_i210) + return -EOPNOTSUPP; + + return igb_add_mac_filter_flags(adapter, addr, queue, + IGB_MAC_STATE_QUEUE_STEERING | flags); +} + +int igb_del_mac_steering_filter(struct igb_adapter *adapter, + const u8 *addr, u8 queue, u8 flags) +{ + return igb_del_mac_filter_flags(adapter, addr, queue, + IGB_MAC_STATE_QUEUE_STEERING | flags); +} + static int igb_uc_sync(struct net_device *netdev, const unsigned char *addr) { struct igb_adapter *adapter = netdev_priv(netdev); @@ -8783,6 +8805,10 @@ static void igb_rar_set_index(struct igb_adapter *adapter, u32 index) switch (hw->mac.type) { case e1000_82575: case e1000_i210: + if (adapter->mac_table[index].state & + IGB_MAC_STATE_QUEUE_STEERING) + rar_high |= E1000_RAH_QSEL_ENABLE; + rar_high |= E1000_RAH_POOL_1 * adapter->mac_table[index].queue; break; -- 2.16.2
[next-queue PATCH v5 9/9] igb: Add support for adding offloaded clsflower filters
This allows filters added by tc-flower and specifying MAC addresses, Ethernet types, and the VLAN priority field, to be offloaded to the controller. This reuses most of the infrastructure used by ethtool, but clsflower filters are kept in a separated list, so they are invisible to ethtool. Signed-off-by: Vinicius Costa Gomes--- drivers/net/ethernet/intel/igb/igb.h | 2 + drivers/net/ethernet/intel/igb/igb_main.c | 188 +- 2 files changed, 188 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h index 66165879f12b..adfef068e866 100644 --- a/drivers/net/ethernet/intel/igb/igb.h +++ b/drivers/net/ethernet/intel/igb/igb.h @@ -464,6 +464,7 @@ struct igb_nfc_input { struct igb_nfc_filter { struct hlist_node nfc_node; struct igb_nfc_input filter; + unsigned long cookie; u16 etype_reg_index; u16 sw_idx; u16 action; @@ -603,6 +604,7 @@ struct igb_adapter { /* RX network flow classification support */ struct hlist_head nfc_filter_list; + struct hlist_head cls_flower_list; unsigned int nfc_filter_count; /* lock for RX network flow classification filter */ spinlock_t nfc_lock; diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index 150231e4db9d..cc580b17dab3 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -2498,16 +2498,197 @@ static int igb_offload_cbs(struct igb_adapter *adapter, return 0; } +#define ETHER_TYPE_FULL_MASK ((__force __be16)~0) +#define VLAN_PRIO_FULL_MASK (0x07) + +static int igb_parse_cls_flower(struct igb_adapter *adapter, + struct tc_cls_flower_offload *f, + int traffic_class, + struct igb_nfc_filter *input) +{ + struct netlink_ext_ack *extack = f->common.extack; + + if (f->dissector->used_keys & + ~(BIT(FLOW_DISSECTOR_KEY_BASIC) | + BIT(FLOW_DISSECTOR_KEY_CONTROL) | + BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS) | + BIT(FLOW_DISSECTOR_KEY_VLAN))) { + NL_SET_ERR_MSG_MOD(extack, + "Unsupported key 
used, only BASIC, CONTROL, ETH_ADDRS and VLAN are supported"); + return -EOPNOTSUPP; + } + + if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_ETH_ADDRS)) { + struct flow_dissector_key_eth_addrs *key, *mask; + + key = skb_flow_dissector_target(f->dissector, + FLOW_DISSECTOR_KEY_ETH_ADDRS, + f->key); + mask = skb_flow_dissector_target(f->dissector, +FLOW_DISSECTOR_KEY_ETH_ADDRS, +f->mask); + + if (!is_zero_ether_addr(mask->dst)) { + if (!is_broadcast_ether_addr(mask->dst)) { + NL_SET_ERR_MSG_MOD(extack, "Only full masks are supported for destination MAC address"); + return -EINVAL; + } + + input->filter.match_flags |= + IGB_FILTER_FLAG_DST_MAC_ADDR; + ether_addr_copy(input->filter.dst_addr, key->dst); + } + + if (!is_zero_ether_addr(mask->src)) { + if (!is_broadcast_ether_addr(mask->src)) { + NL_SET_ERR_MSG_MOD(extack, "Only full masks are supported for source MAC address"); + return -EINVAL; + } + + input->filter.match_flags |= + IGB_FILTER_FLAG_SRC_MAC_ADDR; + ether_addr_copy(input->filter.src_addr, key->src); + } + } + + if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) { + struct flow_dissector_key_basic *key, *mask; + + key = skb_flow_dissector_target(f->dissector, + FLOW_DISSECTOR_KEY_BASIC, + f->key); + mask = skb_flow_dissector_target(f->dissector, +FLOW_DISSECTOR_KEY_BASIC, +f->mask); + + if (mask->n_proto) { + if (mask->n_proto != ETHER_TYPE_FULL_MASK) { + NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for EtherType filter"); + return -EINVAL; + } + + input->filter.match_flags |= IGB_FILTER_FLAG_ETHER_TYPE; +
Re: [PATCH net-next v6 1/2] net: permit skb_segment on head_frag frag_list skb
On Wed, Mar 21, 2018 at 4:31 PM, Yonghong Songwrote: > One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at > function skb_segment(), line 3667. The bpf program attaches to > clsact ingress, calls bpf_skb_change_proto to change protocol > from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect > to send the changed packet out. > > 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, > 3473 netdev_features_t features) > 3474 { > 3475 struct sk_buff *segs = NULL; > 3476 struct sk_buff *tail = NULL; > ... > 3665 while (pos < offset + len) { > 3666 if (i >= nfrags) { > 3667 BUG_ON(skb_headlen(list_skb)); > 3668 > 3669 i = 0; > 3670 nfrags = skb_shinfo(list_skb)->nr_frags; > 3671 frag = skb_shinfo(list_skb)->frags; > 3672 frag_skb = list_skb; > ... > > call stack: > ... > #1 [883ffef03558] __crash_kexec at 8110c525 > #2 [883ffef03620] crash_kexec at 8110d5cc > #3 [883ffef03640] oops_end at 8101d7e7 > #4 [883ffef03668] die at 8101deb2 > #5 [883ffef03698] do_trap at 8101a700 > #6 [883ffef036e8] do_error_trap at 8101abfe > #7 [883ffef037a0] do_invalid_op at 8101acd0 > #8 [883ffef037b0] invalid_op at 81a00bab > [exception RIP: skb_segment+3044] > RIP: 817e4dd4 RSP: 883ffef03860 RFLAGS: 00010216 > RAX: 2bf6 RBX: 883feb7aaa00 RCX: 0011 > RDX: 883fb87910c0 RSI: 0011 RDI: 883feb7ab500 > RBP: 883ffef03928 R8: 2ce2 R9: 27da > R10: 01ea R11: 2d82 R12: 883f90a1ee80 > R13: 883fb8791120 R14: 883feb7abc00 R15: 2ce2 > ORIG_RAX: CS: 0010 SS: 0018 > #9 [883ffef03930] tcp_gso_segment at 818713e7 > --- --- > ... > > The triggering input skb has the following properties: > list_skb = skb->frag_list; > skb->nfrags != NULL && skb_headlen(list_skb) != 0 > and skb_segment() is not able to handle a frag_list skb > if its headlen (list_skb->len - list_skb->data_len) is not 0. > > This patch addressed the issue by handling skb_headlen(list_skb) != 0 > case properly if list_skb->head_frag is true, which is expected in > most cases. 
> The head frag is processed before list_skb->frags
> are processed.
>
> Reported-by: Diptanu Gon Choudhury
> Signed-off-by: Yonghong Song

This looks good to me.

Reviewed-by: Alexander Duyck
Re: [PATCH v4 12/17] net: cxgb4/cxgb4vf: Eliminate duplicate barriers on weakly-ordered archs
On 2018-03-21 19:03, Casey Leedom wrote:
> [[ Apologies for the DUPLICATE email. I forgot to tell my Mail Agent to
>    use Plain Text. -- Casey ]]
>
> I feel very uncomfortable with these proposed changes. Our team is right
> in the middle of trying to tease our way through the various platform
> implementations of writel(), writel_relaxed(), __raw_writel(), etc. in
> order to support x86, PowerPC, ARM, etc. with a single code base. This is
> complicated by the somewhat ... "fuzzily defined" semantics and varying
> platform implementations of all of these APIs. (And note that I'm just
> picking writel() as an example.)
>
> Additionally, many of the changes aren't even in fast paths and are thus
> unneeded for performance.
>
> Please don't make these changes. We're trying to get this all sussed out.

I was also given the feedback to look at performance critical paths only. I am in the process of revisiting the patches.

If you can point me to the ones that are important, I can try to limit the changes to those only. If your team wants to do it, I can drop this patch as well.

I think the semantics of the write API are clear. What was actually implemented is another story. I can share a few of my findings.

A portable driver needs to do this:

	descriptor update in mem
	wmb()
	writel_relaxed()
	mmiowb()

Using __raw_write() is wrong as it can get reordered. Using wmb()+writel() is also wrong for performance reasons. If something is unclear, please ask.
RE: [PATCH net-next RFC V1 1/5] net: Introduce peer to peer one step PTP time stamping.
> -Original Message- > From: Richard Cochran [mailto:richardcoch...@gmail.com] > Sent: Wednesday, March 21, 2018 2:26 PM > To: Keller, Jacob E> Cc: netdev@vger.kernel.org; devicet...@vger.kernel.org; Andrew Lunn > ; David Miller ; Florian Fainelli > ; Mark Rutland ; Miroslav > Lichvar ; Rob Herring ; Willem de > Bruijn > Subject: Re: [PATCH net-next RFC V1 1/5] net: Introduce peer to peer one step > PTP time stamping. > > On Wed, Mar 21, 2018 at 08:05:36PM +, Keller, Jacob E wrote: > > I am guessing that we expect all devices which support onestep P2P messages, > will always support onestep SYNC as well? > > Yes. Anything else doesn't make sense, don't you think? > > Also, reading 1588, it isn't clear whether supporting only 1-step Sync > without 1-step P2P is even intended. There is only a "one-step > clock", and it is described as doing both. > > Thanks, > Richard This was my understanding as well, but given the limited hardware which can do sync but not pdelay messages, I just wanted to make sure we were on the same page. Thanks, Jake
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
On Wed, Mar 21, 2018 at 03:47:02PM -0700, Richard Cochran wrote:
> On Wed, Mar 21, 2018 at 11:16:52PM +0100, Andrew Lunn wrote:
> > The MAC drivers are clients of this device. They then use a phandle
> > and specifier:
> >
> > 	eth0: ethernet-controller@72000 {
> > 		compatible = "marvell,kirkwood-eth";
> > 		#address-cells = <1>;
> > 		#size-cells = <0>;
> > 		reg = <0x72000 0x4000>;
> >
> > 		timerstamper = < 2>
> > 	}
> >
> > The 2 indicates this MAC is using port 2.
> >
> > The MAC driver can then do the standard device tree things to follow
> > the phandle to get access to the device and use the API it exports.
>
> But that would require hacking every last MAC driver.
>
> I'm happy to improve the modeling, but the solution should be generic
> and work for every MAC driver.

Well, the solution is generic, in that the phandle can point to a device anywhere. It could be MMIO, it could be on an MDIO bus, etc. You just need to make sure your API makes no assumption about how the device driver talks to the hardware.

How clever is this device? Can it tell the difference between 1000Base-X and SGMII? Can it figure out that the MAC is repeating every bit 100 times and so has dropped to 10Mbits? Does it understand EEE? Does it need to know if RGMII or RGMII-ID is being used?

Can such a device really operate without the MAC being involved? My feeling is it needs to understand how the MII bus is being used. It might also be that the device is less capable than the MAC, so you need to turn off some of the MAC's features. I think you are going to need the MAC actively involved in this.

	Andrew
[PATCH net-next v6 0/2] net: permit skb_segment on head_frag frag_list skb
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change the protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out.

    ...
    3665                 while (pos < offset + len) {
    3666                         if (i >= nfrags) {
    3667                                 BUG_ON(skb_headlen(list_skb));
    ...

The triggering input skb has the following properties:

    list_skb = skb->frag_list;
    skb->nfrags != NULL && skb_headlen(list_skb) != 0

and skb_segment() is not able to handle a frag_list skb if its headlen (list_skb->len - list_skb->data_len) is not 0.

Patch #1 provides a simple solution to avoid the BUG_ON: if list_skb->head_frag is true, its page-backed head fragment will be processed before list_skb->frags.

Patch #2 provides a test case in the test_bpf module which constructs an skb and calls skb_segment() directly. The test case is able to trigger the BUG_ON without Patch #1.

The patch has been tested in the following setup:

    ipv6_host <-> nat_server <-> ipv4_host

where nat_server has a bpf program, attached through the clsact hook, doing ipv4<->ipv6 translation and forwarding with bpf_skb_change_proto.

Changelog:
v5 -> v6:
  . Added back missed BUG_ON(!nfrags) for the zero skb_headlen(skb) case, plus a couple of cosmetic changes, from Alexander.
v4 -> v5:
  . Replace local variable head_frag with a static inline function skb_head_frag_to_page_desc which gets the head_frag on demand. This makes the code more readable and also does not increase the stack size, from Alexander.
  . Remove the "if (nfrags)" guard for skb_orphan_frags and skb_zerocopy_clone as I found that they can handle a zero-frag skb (with non-zero skb_headlen(skb)) properly.
  . Properly release the segment list from skb_segment() in the test, from Eric.
v3 -> v4:
  . Remove dynamic memory allocation and use rewinding for both index and frag to remove one branch in the fast path, from Alexander.
  . Fix a bunch of issues in the test_bpf skb_segment() test, including the proper way to allocate the skb, the proper function argument for skb_add_rx_frag, and not freeing the skb, etc., from Eric.
v2 -> v3:
  . Use starting frag index -1 (instead of 0) to specially process the head_frag before other frags in the skb, from Alexander Duyck.
v1 -> v2:
  . Removed never-hit BUG_ON, spotted by Linyu Yuan.

Yonghong Song (2):
  net: permit skb_segment on head_frag frag_list skb
  net: bpf: add a test for skb_segment in test_bpf module

 lib/test_bpf.c    | 93 +--
 net/core/skbuff.c | 27 +---
 2 files changed, 113 insertions(+), 7 deletions(-)

-- 
2.9.5
[PATCH net-next v6 2/2] net: bpf: add a test for skb_segment in test_bpf module
Without the previous commit, "modprobe test_bpf" will have the following errors: ... [ 98.149165] [ cut here ] [ 98.159362] kernel BUG at net/core/skbuff.c:3667! [ 98.169756] invalid opcode: [#1] SMP PTI [ 98.179370] Modules linked in: [ 98.179371] test_bpf(+) ... which triggers the bug the previous commit intends to fix. The skbs are constructed to mimic what mlx5 may generate. The packet size/header may not mimic real cases in production. But the processing flow is similar. Signed-off-by: Yonghong Song--- lib/test_bpf.c | 93 -- 1 file changed, 91 insertions(+), 2 deletions(-) diff --git a/lib/test_bpf.c b/lib/test_bpf.c index 2efb213..a468b5c 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -6574,6 +6574,93 @@ static bool exclude_test(int test_id) return test_id < test_range[0] || test_id > test_range[1]; } +static __init struct sk_buff *build_test_skb(void) +{ + u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN; + struct sk_buff *skb[2]; + struct page *page[2]; + int i, data_size = 8; + + for (i = 0; i < 2; i++) { + page[i] = alloc_page(GFP_KERNEL); + if (!page[i]) { + if (i == 0) + goto err_page0; + else + goto err_page1; + } + + /* this will set skb[i]->head_frag */ + skb[i] = dev_alloc_skb(headroom + data_size); + if (!skb[i]) { + if (i == 0) + goto err_skb0; + else + goto err_skb1; + } + + skb_reserve(skb[i], headroom); + skb_put(skb[i], data_size); + skb[i]->protocol = htons(ETH_P_IP); + skb_reset_network_header(skb[i]); + skb_set_mac_header(skb[i], -ETH_HLEN); + + skb_add_rx_frag(skb[i], 0, page[i], 0, 64, 64); + // skb_headlen(skb[i]): 8, skb[i]->head_frag = 1 + } + + /* setup shinfo */ + skb_shinfo(skb[0])->gso_size = 1448; + skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4; + skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb[0])->gso_segs = 0; + skb_shinfo(skb[0])->frag_list = skb[1]; + + /* adjust skb[0]'s len */ + skb[0]->len += skb[1]->len; + skb[0]->data_len += skb[1]->data_len; + skb[0]->truesize += skb[1]->truesize; + + return 
skb[0]; + +err_skb1: + __free_page(page[1]); +err_page1: + kfree_skb(skb[0]); +err_skb0: + __free_page(page[0]); +err_page0: + return NULL; +} + +static __init int test_skb_segment(void) +{ + netdev_features_t features; + struct sk_buff *skb, *segs; + int ret = -1; + + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | + NETIF_F_IPV6_CSUM; + features |= NETIF_F_RXCSUM; + skb = build_test_skb(); + if (!skb) { + pr_info("%s: failed to build_test_skb", __func__); + goto done; + } + + segs = skb_segment(skb, features); + if (segs) { + kfree_skb_list(segs); + ret = 0; + pr_info("%s: success in skb_segment!", __func__); + } else { + pr_info("%s: failed in skb_segment!", __func__); + } + kfree_skb(skb); +done: + return ret; +} + static __init int test_bpf(void) { int i, err_cnt = 0, pass_cnt = 0; @@ -6632,9 +6719,11 @@ static int __init test_bpf_init(void) return ret; ret = test_bpf(); - destroy_bpf_tests(); - return ret; + if (ret) + return ret; + + return test_skb_segment(); } static void __exit test_bpf_exit(void) -- 2.9.5
[PATCH net-next v6 1/2] net: permit skb_segment on head_frag frag_list skb
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out. 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, 3473 netdev_features_t features) 3474 { 3475 struct sk_buff *segs = NULL; 3476 struct sk_buff *tail = NULL; ... 3665 while (pos < offset + len) { 3666 if (i >= nfrags) { 3667 BUG_ON(skb_headlen(list_skb)); 3668 3669 i = 0; 3670 nfrags = skb_shinfo(list_skb)->nr_frags; 3671 frag = skb_shinfo(list_skb)->frags; 3672 frag_skb = list_skb; ... call stack: ... #1 [883ffef03558] __crash_kexec at 8110c525 #2 [883ffef03620] crash_kexec at 8110d5cc #3 [883ffef03640] oops_end at 8101d7e7 #4 [883ffef03668] die at 8101deb2 #5 [883ffef03698] do_trap at 8101a700 #6 [883ffef036e8] do_error_trap at 8101abfe #7 [883ffef037a0] do_invalid_op at 8101acd0 #8 [883ffef037b0] invalid_op at 81a00bab [exception RIP: skb_segment+3044] RIP: 817e4dd4 RSP: 883ffef03860 RFLAGS: 00010216 RAX: 2bf6 RBX: 883feb7aaa00 RCX: 0011 RDX: 883fb87910c0 RSI: 0011 RDI: 883feb7ab500 RBP: 883ffef03928 R8: 2ce2 R9: 27da R10: 01ea R11: 2d82 R12: 883f90a1ee80 R13: 883fb8791120 R14: 883feb7abc00 R15: 2ce2 ORIG_RAX: CS: 0010 SS: 0018 #9 [883ffef03930] tcp_gso_segment at 818713e7 --- --- ... The triggering input skb has the following properties: list_skb = skb->frag_list; skb->nfrags != NULL && skb_headlen(list_skb) != 0 and skb_segment() is not able to handle a frag_list skb if its headlen (list_skb->len - list_skb->data_len) is not 0. This patch addressed the issue by handling skb_headlen(list_skb) != 0 case properly if list_skb->head_frag is true, which is expected in most cases. The head frag is processed before list_skb->frags are processed. 
Reported-by: Diptanu Gon ChoudhurySigned-off-by: Yonghong Song --- net/core/skbuff.c | 27 ++- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 715c134..4e1d4e7 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3460,6 +3460,19 @@ void *skb_pull_rcsum(struct sk_buff *skb, unsigned int len) } EXPORT_SYMBOL_GPL(skb_pull_rcsum); +static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb) +{ + skb_frag_t head_frag; + struct page *page; + + page = virt_to_head_page(frag_skb->head); + head_frag.page.p = page; + head_frag.page_offset = frag_skb->data - + (unsigned char *)page_address(page); + head_frag.size = skb_headlen(frag_skb); + return head_frag; +} + /** * skb_segment - Perform protocol segmentation on skb. * @head_skb: buffer to segment @@ -3664,15 +3677,19 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, while (pos < offset + len) { if (i >= nfrags) { - BUG_ON(skb_headlen(list_skb)); - i = 0; nfrags = skb_shinfo(list_skb)->nr_frags; frag = skb_shinfo(list_skb)->frags; frag_skb = list_skb; + if (!skb_headlen(list_skb)) { + BUG_ON(!nfrags); + } else { + BUG_ON(!list_skb->head_frag); - BUG_ON(!nfrags); - + /* to make room for head_frag. */ + i--; + frag--; + } if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC)) @@ -3689,7 +3706,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, goto err; } - *nskb_frag = *frag; + *nskb_frag = (i < 0) ? skb_head_frag_to_page_desc(frag_skb) : *frag; __skb_frag_ref(nskb_frag); size = skb_frag_size(nskb_frag); -- 2.9.5
Re: [PATCH v4 12/17] net: cxgb4/cxgb4vf: Eliminate duplicate barriers on weakly-ordered archs
[[ Apologies for the DUPLICATE email. I forgot to tell my Mail Agent to use Plain Text. -- Casey ]]

I feel very uncomfortable with these proposed changes. Our team is right in the middle of trying to tease our way through the various platform implementations of writel(), writel_relaxed(), __raw_writel(), etc. in order to support x86, PowerPC, ARM, etc. with a single code base. This is complicated by the somewhat ... "fuzzily defined" semantics and varying platform implementations of all of these APIs. (And note that I'm just picking writel() as an example.)

Additionally, many of the changes aren't even in fast paths and are thus unneeded for performance.

Please don't make these changes. We're trying to get this all sussed out.

Casey

From: Sinan Kaya
Sent: Monday, March 19, 2018 7:42:27 PM
To: netdev@vger.kernel.org; ti...@codeaurora.org; sulr...@codeaurora.org
Cc: linux-arm-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; Sinan Kaya; Ganesh GR; Casey Leedom; linux-ker...@vger.kernel.org
Subject: [PATCH v4 12/17] net: cxgb4/cxgb4vf: Eliminate duplicate barriers on weakly-ordered archs

Code includes wmb() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up with the CPU observing two barriers back to back before executing the register write.

Create a new wrapper function with a relaxed write operator. Use the new wrapper when a write follows a wmb().
Signed-off-by: Sinan Kaya --- drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 6 ++ drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 13 +++-- drivers/net/ethernet/chelsio/cxgb4/sge.c | 12 ++-- drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 2 +- drivers/net/ethernet/chelsio/cxgb4vf/adapter.h | 14 ++ drivers/net/ethernet/chelsio/cxgb4vf/sge.c | 18 ++ 6 files changed, 44 insertions(+), 21 deletions(-) diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index 9040e13..6bde0b9 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -1202,6 +1202,12 @@ static inline void t4_write_reg(struct adapter *adap, u32 reg_addr, u32 val) writel(val, adap->regs + reg_addr); } +static inline void t4_write_reg_relaxed(struct adapter *adap, u32 reg_addr, + u32 val) +{ + writel_relaxed(val, adap->regs + reg_addr); +} + #ifndef readq static inline u64 readq(const volatile void __iomem *addr) { diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index 7b452e8..276472d 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -1723,8 +1723,8 @@ int cxgb4_sync_txq_pidx(struct net_device *dev, u16 qid, u16 pidx, else val = PIDX_T5_V(delta); wmb(); - t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL_A), - QID_V(qid) | val); + t4_write_reg_relaxed(adap, MYPF_REG(SGE_PF_KDOORBELL_A), + QID_V(qid) | val); } out: return ret; @@ -1902,8 +1902,9 @@ static void enable_txq_db(struct adapter *adap, struct sge_txq *q) * are committed before we tell HW about them. 
*/ wmb(); - t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL_A), - QID_V(q->cntxt_id) | PIDX_V(q->db_pidx_inc)); + t4_write_reg_relaxed(adap, MYPF_REG(SGE_PF_KDOORBELL_A), + QID_V(q->cntxt_id) | + PIDX_V(q->db_pidx_inc)); q->db_pidx_inc = 0; } q->db_disabled = 0; @@ -2003,8 +2004,8 @@ static void sync_txq_pidx(struct adapter *adap, struct sge_txq *q) else val = PIDX_T5_V(delta); wmb(); - t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL_A), - QID_V(q->cntxt_id) | val); + t4_write_reg_relaxed(adap, MYPF_REG(SGE_PF_KDOORBELL_A), + QID_V(q->cntxt_id) | val); } out: q->db_disabled = 0; diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c index 6e310a0..7388aac 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c @@ -530,11 +530,11 @@ static inline void ring_fl_db(struct adapter *adap, struct sge_fl *q) * mechanism. */ if (unlikely(q->bar2_addr == NULL)) { - t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL_A), - val |
Re: [PATCH net-next v5 1/2] net: permit skb_segment on head_frag frag_list skb
On 3/21/18 2:51 PM, Alexander Duyck wrote: On Wed, Mar 21, 2018 at 1:36 PM, Yonghong Songwrote: One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out. 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, 3473 netdev_features_t features) 3474 { 3475 struct sk_buff *segs = NULL; 3476 struct sk_buff *tail = NULL; ... 3665 while (pos < offset + len) { 3666 if (i >= nfrags) { 3667 BUG_ON(skb_headlen(list_skb)); 3668 3669 i = 0; 3670 nfrags = skb_shinfo(list_skb)->nr_frags; 3671 frag = skb_shinfo(list_skb)->frags; 3672 frag_skb = list_skb; ... call stack: ... #1 [883ffef03558] __crash_kexec at 8110c525 #2 [883ffef03620] crash_kexec at 8110d5cc #3 [883ffef03640] oops_end at 8101d7e7 #4 [883ffef03668] die at 8101deb2 #5 [883ffef03698] do_trap at 8101a700 #6 [883ffef036e8] do_error_trap at 8101abfe #7 [883ffef037a0] do_invalid_op at 8101acd0 #8 [883ffef037b0] invalid_op at 81a00bab [exception RIP: skb_segment+3044] RIP: 817e4dd4 RSP: 883ffef03860 RFLAGS: 00010216 RAX: 2bf6 RBX: 883feb7aaa00 RCX: 0011 RDX: 883fb87910c0 RSI: 0011 RDI: 883feb7ab500 RBP: 883ffef03928 R8: 2ce2 R9: 27da R10: 01ea R11: 2d82 R12: 883f90a1ee80 R13: 883fb8791120 R14: 883feb7abc00 R15: 2ce2 ORIG_RAX: CS: 0010 SS: 0018 #9 [883ffef03930] tcp_gso_segment at 818713e7 --- --- ... The triggering input skb has the following properties: list_skb = skb->frag_list; skb->nfrags != NULL && skb_headlen(list_skb) != 0 and skb_segment() is not able to handle a frag_list skb if its headlen (list_skb->len - list_skb->data_len) is not 0. This patch addressed the issue by handling skb_headlen(list_skb) != 0 case properly if list_skb->head_frag is true, which is expected in most cases. The head frag is processed before list_skb->frags are processed. 
Reported-by: Diptanu Gon Choudhury Signed-off-by: Yonghong Song --- net/core/skbuff.c | 26 -- 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 715c134..23b317a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3460,6 +3460,19 @@ void *skb_pull_rcsum(struct sk_buff *skb, unsigned int len) } EXPORT_SYMBOL_GPL(skb_pull_rcsum); +static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb) +{ + skb_frag_t head_frag; + struct page *page; + + page = virt_to_head_page(frag_skb->head); + head_frag.page.p = page; + head_frag.page_offset = frag_skb->data - + (unsigned char *)page_address(page); + head_frag.size = skb_headlen(frag_skb); + return head_frag; +} + /** * skb_segment - Perform protocol segmentation on skb. * @head_skb: buffer to segment @@ -3664,15 +3677,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, while (pos < offset + len) { if (i >= nfrags) { - BUG_ON(skb_headlen(list_skb)); - i = 0; nfrags = skb_shinfo(list_skb)->nr_frags; frag = skb_shinfo(list_skb)->frags; - frag_skb = list_skb; You could probably leave this line in place. No point in moving it. The only reason I moved it is to make define more close to the use. But I am totally fine with leaving it as it. - - BUG_ON(!nfrags); + if (skb_headlen(list_skb)) { + BUG_ON(!list_skb->head_frag); + /* to make room for head_frag. */ + i--; frag--; Normally these should be two separate lines one for "i--;" and one for "frag--;". Will change. Surprised that checkpatch.pl did not complain about this. + } You could probably place the BUG_ON(!nfrags) in an else statement here to handle the case where we have a potentially empty skb which would be a bug. Yes, this makes sense. Will add this BUG_ON. + frag_skb = list_skb;
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
On Wed, Mar 21, 2018 at 11:16:52PM +0100, Andrew Lunn wrote:
> The MAC drivers are clients of this device. They then use a phandle
> and specifier:
>
> eth0: ethernet-controller@72000 {
>         compatible = "marvell,kirkwood-eth";
>         #address-cells = <1>;
>         #size-cells = <0>;
>         reg = <0x72000 0x4000>;
>
>         timestamper = < 2>
> }
>
> The 2 indicates this MAC is using port 2.
>
> The MAC driver can then do the standard device tree things to follow
> the phandle to get access to the device and use the API it exports.

But that would require hacking every last MAC driver. I'm happy to improve the modeling, but the solution should be generic and work for every MAC driver.

Thanks,
Richard
Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
On Wed, 21 Mar 2018, Thomas Gleixner wrote:
> If you look at the use cases of TDM in various fields then FIFO mode is
> pretty much useless. In industrial/automotive fieldbus applications the
> various time slices are filled by different threads or even processes.

That brings me to a related question. The TDM cases I'm familiar with which aim to use this utilize multiple periodic time slices, aka 802.1Qbv time-aware scheduling.

Simple example:

[1a][1b][1c][1d][1a][1b][1c][1d]...
[2a][2b][2c][2d]
[3a][3b]
[4a][4b]
--> t

where 1-4 is the slice level and a-d are network nodes. In most cases the slice levels on a node are handled by different applications or threads. Some of the protocols utilize dedicated time slice levels - let's assume '4' in the above example - to run general network traffic which might even be allowed to have collisions, i.e. [4a-d] would become [4] and any node can send; the involved components like switches are supposed to handle that.

I'm not seeing how TBS is going to assist with any of that. It requires everything to be handled at the application level. Not really useful, especially not for general traffic which does not know about the scheduling bands at all.

If you look at an industrial control node, it basically does:

	queue_first_packet(tx, slice1);
	while (!stop) {
		if (wait_for_packet(rx) == ERROR)
			goto errorhandling;
		tx = do_computation(rx);
		queue_next_tx(tx, slice1);
	}

That's a pretty common pattern for these kinds of applications. For audio sources queue_next() might be triggered by the input sampler, which needs to be synchronized to the network slices anyway in order to work properly.

TBS per the current implementation is nice as a proof of concept, but it solves just a small portion of the complete problem space. I have the suspicion that this was 'designed' to replace the user space hack in the AVNU stack with something close to it. Not really a good plan, to be honest.
I think what we really want is a strict periodic scheduler which supports multiple slices as shown above, because that's what all relevant TDM use cases need: A/V, industrial fieldbusses ...

               |---------------|
               |               |
               |      TAS      |<- Config
               |  1   2   3   4|
               |---------------|
                  |   |   |   |
                  |   |   |   |
                  |   |   |   |
                  |   |   |   |
   [DirectSocket] [Qdisc FIFO] [Qdisc Prio] [Qdisc FIFO]
         |            |    |       |    |       |
     [Socket]     [Socket]      [General traffic]

The interesting thing here is that it does not require any time stamp information brought in from the application. That's especially good for general network traffic which is routed through a dedicated time slot. If we don't have that, then we need a user space scheduler which does exactly the same thing, and we have to route the general traffic out to user space and back into the kernel, which is obviously a pointless exercise.

There are all kinds of TDM schemes out there which are not directly driven by applications, but rather route categorized traffic like VLANs through dedicated time slices. That works pretty well with the above scheme because in that case the applications might be completely oblivious about the tx time schedule.

Surely there are protocols which do not utilize every time slice they could use, so we need a way to tell the number of empty slices between two consecutive packets. There are also different policies vs. the unused time slices, like sending dummy frames or just nothing, which want to be addressed, but I don't think that changes the general approach.

There might be some special cases for setup or node hotplug, but the protocols I'm familiar with handle these in dedicated time slices or through general traffic, so it should just fit in.

I'm surely missing some details, but from my knowledge about the protocols which want to utilize this, the general direction should be fine. Feel free to tell me that I'm missing the point completely though :)

Thoughts?

Thanks,

	tglx
Re: [PATCH V2 net-next 06/14] net/tls: Add generic NIC offload infrastructure
Hi Saeed, thanks for fixing some of my remarks, but I've dived into the code more deeply and found, sadly, that the patch lacks readability. It is too big and does not fit the kernel coding style. Please see some comments below.

Can we do something about the patch length? Is there a way to split it into several small patches? It's difficult to review the logic of the changes.

On 22.03.2018 00:01, Saeed Mahameed wrote:
> From: Ilya Lesokhin
>
> This patch adds a generic infrastructure to offload TLS crypto to a
> network device. It enables the kernel TLS socket to skip encryption
> and authentication operations on the transmit side of the data path,
> leaving those computationally expensive operations to the NIC.
>
> The NIC offload infrastructure builds TLS records and pushes them to
> the TCP layer just like the SW KTLS implementation and using the same API.
> TCP segmentation is mostly unaffected. Currently the only exception is
> that we prevent mixed SKBs where only part of the payload requires
> offload. In the future we are likely to add a similar restriction
> following a change cipher spec record.
>
> The notable differences between the SW KTLS and NIC offloaded TLS
> implementations are as follows:
> 1. The offloaded implementation builds "plaintext TLS records"; those
> records contain plaintext instead of ciphertext and placeholder bytes
> instead of authentication tags.
> 2. The offloaded implementation maintains a mapping from TCP sequence
> number to TLS records. Thus given a TCP SKB sent from a NIC offloaded
> TLS socket, we can use the tls NIC offload infrastructure to obtain
> enough context to encrypt the payload of the SKB.
> A TLS record is released when the last byte of the record is ack'ed;
> this is done through the new icsk_clean_acked callback.
>
> The infrastructure should be extendable to support various NIC offload
> implementations.
However it is currently written with the > implementation below in mind: > The NIC assumes that packets from each offloaded stream are sent as > plaintext and in-order. It keeps track of the TLS records in the TCP > stream. When a packet marked for offload is transmitted, the NIC > encrypts the payload in-place and puts authentication tags in the > relevant place holders. > > The responsibility for handling out-of-order packets (i.e. TCP > retransmission, qdisc drops) falls on the netdev driver. > > The netdev driver keeps track of the expected TCP SN from the NIC's > perspective. If the next packet to transmit matches the expected TCP > SN, the driver advances the expected TCP SN, and transmits the packet > with TLS offload indication. > > If the next packet to transmit does not match the expected TCP SN. The > driver calls the TLS layer to obtain the TLS record that includes the > TCP of the packet for transmission. Using this TLS record, the driver > posts a work entry on the transmit queue to reconstruct the NIC TLS > state required for the offload of the out-of-order packet. It updates > the expected TCP SN accordingly and transmit the now in-order packet. > The same queue is used for packet transmission and TLS context > reconstruction to avoid the need for flushing the transmit queue before > issuing the context reconstruction request. 
> > Signed-off-by: Ilya Lesokhin > Signed-off-by: Boris Pismenny > Signed-off-by: Aviad Yehezkel > Signed-off-by: Saeed Mahameed > --- > include/net/tls.h | 74 +++- > net/tls/Kconfig | 10 + > net/tls/Makefile | 2 + > net/tls/tls_device.c | 793 > ++ > net/tls/tls_device_fallback.c | 415 ++ > net/tls/tls_main.c| 33 +- > 6 files changed, 1320 insertions(+), 7 deletions(-) > create mode 100644 net/tls/tls_device.c > create mode 100644 net/tls/tls_device_fallback.c > > diff --git a/include/net/tls.h b/include/net/tls.h > index 4913430ab807..0bfb1b0a156a 100644 > --- a/include/net/tls.h > +++ b/include/net/tls.h > @@ -77,6 +77,37 @@ struct tls_sw_context { > struct scatterlist sg_aead_out[2]; > }; > > +struct tls_record_info { > + struct list_head list; > + u32 end_seq; > + int len; > + int num_frags; > + skb_frag_t frags[MAX_SKB_FRAGS]; > +}; > + > +struct tls_offload_context { > + struct crypto_aead *aead_send; > + spinlock_t lock;/* protects records list */ > + struct list_head records_list; > + struct tls_record_info *open_record; > + struct tls_record_info *retransmit_hint; > + u64 hint_record_sn; > + u64 unacked_record_sn; > + > + struct scatterlist sg_tx_data[MAX_SKB_FRAGS]; > + void (*sk_destruct)(struct sock *sk); > + u8 driver_state[]; > + /* The TLS layer reserves room for driver specific state > + * Currently the belief is that there is not enough > + * driver specific state
[PATCH net-next 1/1] tc-testing: updated police, mirred, skbedit and skbmod with more tests
Added extra test cases for control actions (reclassify, pipe etc.), cookies, max index value and police args sanity check. Signed-off-by: Roman Mashak--- .../tc-testing/tc-tests/actions/mirred.json| 192 + .../tc-testing/tc-tests/actions/police.json| 144 .../tc-testing/tc-tests/actions/skbedit.json | 168 ++ .../tc-testing/tc-tests/actions/skbmod.json| 26 ++- 4 files changed, 529 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json b/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json index 0fcccf18399b..443c9b3c8664 100644 --- a/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/mirred.json @@ -171,6 +171,198 @@ ] }, { +"id": "8917", +"name": "Add mirred mirror action with control pass", +"category": [ +"actions", +"mirred" +], +"setup": [ +[ +"$TC actions flush action mirred", +0, +1, +255 +] +], +"cmdUnderTest": "$TC actions add action mirred ingress mirror dev lo pass index 1", +"expExitCode": "0", +"verifyCmd": "$TC actions get action mirred index 1", +"matchPattern": "action order [0-9]*: mirred \\(Ingress Mirror to device lo\\) pass.*index 1 ref", +"matchCount": "1", +"teardown": [ +"$TC actions flush action mirred" +] +}, +{ +"id": "1054", +"name": "Add mirred mirror action with control pipe", +"category": [ +"actions", +"mirred" +], +"setup": [ +[ +"$TC actions flush action mirred", +0, +1, +255 +] +], +"cmdUnderTest": "$TC actions add action mirred ingress mirror dev lo pipe index 15", +"expExitCode": "0", +"verifyCmd": "$TC actions get action mirred index 15", +"matchPattern": "action order [0-9]*: mirred \\(Ingress Mirror to device lo\\) pipe.*index 15 ref", +"matchCount": "1", +"teardown": [ +"$TC actions flush action mirred" +] +}, +{ +"id": "9887", +"name": "Add mirred mirror action with control continue", +"category": [ +"actions", +"mirred" +], +"setup": [ +[ +"$TC actions flush action mirred", +0, +1, +255 +] +], 
+"cmdUnderTest": "$TC actions add action mirred ingress mirror dev lo continue index 15", +"expExitCode": "0", +"verifyCmd": "$TC actions get action mirred index 15", +"matchPattern": "action order [0-9]*: mirred \\(Ingress Mirror to device lo\\) continue.*index 15 ref", +"matchCount": "1", +"teardown": [ +"$TC actions flush action mirred" +] +}, +{ +"id": "e4aa", +"name": "Add mirred mirror action with control reclassify", +"category": [ +"actions", +"mirred" +], +"setup": [ +[ +"$TC actions flush action mirred", +0, +1, +255 +] +], +"cmdUnderTest": "$TC actions add action mirred ingress mirror dev lo reclassify index 150", +"expExitCode": "0", +"verifyCmd": "$TC actions get action mirred index 150", +"matchPattern": "action order [0-9]*: mirred \\(Ingress Mirror to device lo\\) reclassify.*index 150 ref", +"matchCount": "1", +"teardown": [ +"$TC actions flush action mirred" +] +}, +{ +"id": "ece9", +"name": "Add mirred mirror action with control drop", +"category": [ +"actions", +"mirred" +], +"setup": [ +[ +"$TC actions flush action mirred", +0, +1, +255 +] +], +"cmdUnderTest": "$TC actions add action mirred ingress mirror dev lo drop index 99", +"expExitCode": "0", +"verifyCmd": "$TC actions get action mirred index 99", +"matchPattern": "action order [0-9]*: mirred \\(Ingress Mirror to device lo\\) drop.*index 99 ref", +"matchCount": "1", +"teardown": [ +"$TC actions flush action mirred" +] +}, +{ +"id": "0031", +"name": "Add mirred mirror action with control jump", +"category": [ +"actions", +"mirred" +], +"setup": [ +[ +"$TC actions flush action mirred",
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
On Wed, Mar 21, 2018 at 02:57:29PM -0700, Richard Cochran wrote:
> On Wed, Mar 21, 2018 at 10:44:36PM +0100, Andrew Lunn wrote:
> > O.K., so let's do the 20 questions approach.
>
> :)
>
> > As far as I can see, this is not an MDIO device. It is not connected
> > to the MDIO bus, it has no MDIO registers, you don't even pass a valid
> > MDIO address in device tree.
>
> Right.

O.K., so I suggest we stop trying to model this thing as an MDIO device. It is really an MMIO device.

> There might very well be other products out there that *do*
> use MDIO commands. I know that there are MII time stamping asics and
> ip cores on the market, but I don't know all of their creative design
> details.

So I suggest we leave the design for those until we actually see one.

> > Is it actually an MII bus snooper? Does it snoop, or is it actually in
> > the MII bus, and can modify packets, i.e. insert time stamps as frames
> > pass over the MII bus?
>
> It acts like a "snooper" to provide out of band time stamps, but it
> also can modify packets for the one-step functionality.
>
> > When the driver talks about having three ports, does that mean it can
> > be on three different MII busses?

O.K., so here is how I think it should be done. It is a device which offers services to other devices. It is not that different from an interrupt controller, a GPIO controller, etc. Let's follow how they work in device tree.

The device itself is just another MMIO mapped device in the SoC:

timestamper@6000 {
	compatible = "ines,ptp-ctrl";
	reg = <0x6000 0x80>;
	#address-cells = <1>;
	#size-cells = <0>;
};

The MAC drivers are clients of this device. They then use a phandle and specifier:

eth0: ethernet-controller@72000 {
	compatible = "marvell,kirkwood-eth";
	#address-cells = <1>;
	#size-cells = <0>;
	reg = <0x72000 0x4000>;

	timestamper = < 2>
}

The 2 indicates this MAC is using port 2.
The MAC driver can then do the standard device tree things to follow the phandle to get access to the device and use the API it exports. Andrew
Re: [PATCH REPOST v4 5/7] ixgbevf: keep writel() closer to wmb()
On 2018-03-21 17:54, David Miller wrote:
> From: Jeff Kirsher
> Date: Wed, 21 Mar 2018 14:48:08 -0700
>
> > On Wed, 2018-03-21 at 14:56 -0400, Sinan Kaya wrote:
> > > Remove ixgbevf_write_tail() in favor of moving writel() close to
> > > wmb().
> > >
> > > Signed-off-by: Sinan Kaya
> > > Reviewed-by: Alexander Duyck
> > > ---
> > >  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 5 -
> > >  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
> > >  2 files changed, 2 insertions(+), 7 deletions(-)
> >
> > This patch fails to compile because there is a call to
> > ixgbevf_write_tail() which you missed cleaning up.
>
> For a change with delicate side effects, it doesn't create much
> confidence if the code does not even compile.
>
> Sinan, please put more care into the changes you are making.

I think the issue is that the tree the code was tested against is missing the code Alex mentioned. I was using linux-next 4.16-rc4 for testing. I will rebase to Jeff's tree.

Thank you.
Re: [PATCH net v2 0/7] fix idr leak in actions
From: Davide Caratti
Date: Mon, 19 Mar 2018 15:31:21 +0100

> This series fixes situations where a temporary failure to install a TC
> action results in the permanent impossibility to reuse the configured
> value of 'index'.
>
> Thanks to Cong Wang for the initial review.
>
> v2: fix build error in act_ipt.c, reported by kbuild test robot

Series applied, thanks Davide.
Re: [PATCH] qede: fix spelling mistake: "registeration" -> "registration"
From: Colin King
Date: Mon, 19 Mar 2018 14:57:11 +0000

> From: Colin Ian King
>
> Trivial fix to spelling mistakes in DP_ERR error message text and
> comments
>
> Signed-off-by: Colin Ian King

Applied.
Re: [PATCH] bnx2x: fix spelling mistake: "registeration" -> "registration"
From: Colin King
Date: Mon, 19 Mar 2018 14:32:59 +0000

> From: Colin Ian King
>
> Trivial fix to spelling mistake in BNX2X_ERR error message text
>
> Signed-off-by: Colin Ian King

Applied.
[trivial PATCH V2] treewide: Align function definition open/close braces
Some function definitions have either the initial open brace and/or the closing brace outside of column 1. Move those braces to column 1. This allows various function analyzers like gnu complexity to work properly for these modified functions.

Signed-off-by: Joe Perches
Acked-by: Andy Shevchenko
Acked-by: Paul Moore
Acked-by: Alex Deucher
Acked-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Acked-by: Alexandre Belloni
Acked-by: Martin K. Petersen
Acked-by: Takashi Iwai
Acked-by: Mauro Carvalho Chehab
---
git diff -w still shows no difference. This patch was sent back in December and not applied. As the trivial maintainer seems to be inactive, it'd be nice if Andrew Morton picked this up.

V2: Remove fs/xfs/libxfs/xfs_alloc.c as it's updated, and remerge the rest

 arch/x86/include/asm/atomic64_32.h                   |  2 +-
 drivers/acpi/custom_method.c                         |  2 +-
 drivers/acpi/fan.c                                   |  2 +-
 drivers/gpu/drm/amd/display/dc/core/dc.c             |  2 +-
 drivers/media/i2c/msp3400-kthreads.c                 |  2 +-
 drivers/message/fusion/mptsas.c                      |  2 +-
 drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c |  2 +-
 drivers/net/wireless/ath/ath9k/xmit.c                |  2 +-
 drivers/platform/x86/eeepc-laptop.c                  |  2 +-
 drivers/rtc/rtc-ab-b5ze-s3.c                         |  2 +-
 drivers/scsi/dpt_i2o.c                               |  2 +-
 drivers/scsi/sym53c8xx_2/sym_glue.c                  |  2 +-
 fs/locks.c                                           |  2 +-
 fs/ocfs2/stack_user.c                                |  2 +-
 fs/xfs/xfs_export.c                                  |  2 +-
 kernel/audit.c                                       |  6 +++---
 kernel/trace/trace_printk.c                          |  4 ++--
 lib/raid6/sse2.c                                     | 14 +++---
 sound/soc/fsl/fsl_dma.c                              |  2 +-
 19 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 46e1ef17d92d..92212bf0484f 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -123,7 +123,7 @@ static inline long long arch_atomic64_read(const atomic64_t *v)
 	long long r;
 	alternative_atomic64(read, "=" (r), "c" (v) : "memory");
 	return r;
- }
+}
 
 /**
  * arch_atomic64_add_return - add and return
diff --git a/drivers/acpi/custom_method.c b/drivers/acpi/custom_method.c
index b33fba70ec51..a07fbe999eb6 100644
--- a/drivers/acpi/custom_method.c
+++ b/drivers/acpi/custom_method.c
@@ -97,7 +97,7 @@ static void __exit acpi_custom_method_exit(void)
 {
 	if (cm_dentry)
 		debugfs_remove(cm_dentry);
- }
+}
 
 module_init(acpi_custom_method_init);
 module_exit(acpi_custom_method_exit);
diff --git a/drivers/acpi/fan.c b/drivers/acpi/fan.c
index 6cf4988206f2..3563103590c6 100644
--- a/drivers/acpi/fan.c
+++ b/drivers/acpi/fan.c
@@ -219,7 +219,7 @@ fan_set_cur_state(struct thermal_cooling_device *cdev, unsigned long state)
 		return fan_set_state_acpi4(device, state);
 	else
 		return fan_set_state(device, state);
- }
+}
 
 static const struct thermal_cooling_device_ops fan_cooling_ops = {
 	.get_max_state = fan_get_max_state,
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 8394d69b963f..e934326a95d3 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -588,7 +588,7 @@ static void disable_dangling_plane(struct dc *dc, struct dc_state *context)
  **/
 
 struct dc *dc_create(const struct dc_init_data *init_params)
- {
+{
 	struct dc *dc = kzalloc(sizeof(*dc), GFP_KERNEL);
 	unsigned int full_pipe_count;
diff --git a/drivers/media/i2c/msp3400-kthreads.c b/drivers/media/i2c/msp3400-kthreads.c
index 4dd01e9f553b..dc6cb8d475b3 100644
--- a/drivers/media/i2c/msp3400-kthreads.c
+++ b/drivers/media/i2c/msp3400-kthreads.c
@@ -885,7 +885,7 @@ static int msp34xxg_modus(struct i2c_client *client)
 }
 
 static void msp34xxg_set_source(struct i2c_client *client, u16 reg, int in)
- {
+{
 	struct msp_state *state = to_state(i2c_get_clientdata(client));
 	int source, matrix;
diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index 439ee9c5f535..231f3a1e27bf 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -2967,7 +2967,7 @@ mptsas_exp_repmanufacture_info(MPT_ADAPTER *ioc,
 	mutex_unlock(>sas_mgmt.mutex);
 out:
Re: [PATCH v2 bpf-next 4/8] tracepoint: compute num_args at build time
On 3/21/18 12:44 PM, Linus Torvalds wrote:
> On Wed, Mar 21, 2018 at 11:54 AM, Alexei Starovoitov wrote:
>> add fancy macro to compute number of arguments passed into tracepoint
>> at compile time and store it as part of 'struct tracepoint'.
>
> We should probably do this __COUNT() thing in some generic header, we
> just talked last week about another use case entirely.

ok. Not sure which generic header though.
Should I move it to include/linux/kernel.h ?

> And wouldn't it be nice to just have some generic infrastructure like this:
>
> /*
>  * This counts to ten.
>  *
>  * Any more than that, and we'd need to take off our shoes
>  */
> #define __GET_COUNT(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9,_10,_n,...) _n
> #define __COUNT(...) \
> 	__GET_COUNT(__VA_ARGS__,10,9,8,7,6,5,4,3,2,1,0)
> #define COUNT(...) __COUNT(dummy,##__VA_ARGS__)

since it will be a build time error, it's a good time to discuss how many
arguments we want to support in tracepoints and in general in other places
that would want to use this macro.
Like the only reason my patch is counting till 17 is because of
trace_iwlwifi_dev_ucode_error().
The next offenders are using 12 arguments:
trace_mc_event()
trace_mm_vmscan_lru_shrink_inactive()
Clearly not a very efficient use of it:
	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id, nr_scanned,
			nr_reclaimed, stat.nr_dirty, stat.nr_writeback,
			stat.nr_congested, stat.nr_immediate,
			stat.nr_activate, stat.nr_ref_keep,
			stat.nr_unmap_fail, sc->priority, file);
could have passed &stat instead.
I'd like to refactor that trace_iwlwifi_dev_ucode_error() and from now on
set the limit to 12. Any offenders should be using tracepoints with <= 12
args instead of extending the macro. Does it sound reasonable ?

> #define __CONCAT(a,b) a##b
> #define __CONCATENATE(a,b) __CONCAT(a,b)
>
> and then you can do things like:
>
> #define fn(...) __CONCATENATE(fn,COUNT(__VA_ARGS__))(__VA_ARGS__)
>
> which turns "fn(x,y,z..)" into "fn<count>(x,y,z..)".
> That can be useful for things like "max(a,b,c,d)" expanding to "max4()",
> and then you can just have the trivial
>
> #define max3(a,b,c) max2(a,max2(b,c))

I can try that. Not sure my macro-fu is up to that level.
__CAST_TO_U64() macro from the next patch was difficult to make work
across compilers and architectures.
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
On Wed, Mar 21, 2018 at 10:44:36PM +0100, Andrew Lunn wrote:
> O.K, so let's do the 20 questions approach.

:)

> As far as I can see, this is not an MDIO device. It is not connected
> to the MDIO bus, it has no MDIO registers, you don't even pass a valid
> MDIO address in device tree.

Right. There might very well be other products out there that *do* use
MDIO commands. I know that there are MII time stamping ASICs and IP
cores on the market, but I don't know all of their creative design
details.

> Is it actually an MII bus snooper? Does it snoop, or is it actually in
> the MII bus, and can modify packets, i.e. insert time stamps as frames
> pass over the MII bus?

It acts like a "snooper" to provide out of band time stamps, but it can
also modify packets for the one-step functionality.

> When the driver talks about having three ports, does that mean it can
> be on three different MII busses?

Yes.

HTH,
Richard
Re: [PATCH REPOST v4 5/7] ixgbevf: keep writel() closer to wmb()
From: Jeff Kirsher
Date: Wed, 21 Mar 2018 14:48:08 -0700

> On Wed, 2018-03-21 at 14:56 -0400, Sinan Kaya wrote:
>> Remove ixgbevf_write_tail() in favor of moving writel() close to
>> wmb().
>>
>> Signed-off-by: Sinan Kaya
>> Reviewed-by: Alexander Duyck
>> ---
>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -
>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
>>  2 files changed, 2 insertions(+), 7 deletions(-)
>
> This patch fails to compile because there is a call to
> ixgbevf_write_tail() which you missed cleaning up.

For a change with delicate side effects, it doesn't create much
confidence if the code does not even compile.

Sinan, please put more care into the changes you are making.

Thank you.
Re: [Intel-wired-lan] [PATCH REPOST v4 5/7] ixgbevf: keep writel() closer to wmb()
On Wed, Mar 21, 2018 at 2:51 PM,wrote: > On 2018-03-21 17:48, Jeff Kirsher wrote: >> >> On Wed, 2018-03-21 at 14:56 -0400, Sinan Kaya wrote: >>> >>> Remove ixgbevf_write_tail() in favor of moving writel() close to >>> wmb(). >>> >>> Signed-off-by: Sinan Kaya >>> Reviewed-by: Alexander Duyck >>> --- >>> drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 5 - >>> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++-- >>> 2 files changed, 2 insertions(+), 7 deletions(-) >> >> >> This patch fails to compile because there is a call to >> ixgbevf_write_tail() which you missed cleaning up. > > > Hah, I did a compile test but maybe I missed something. I will get v6 of > this patch only and leave the rest of the series as it is. Actually you might want to just pull Jeff's tree and rebase before you submit your patches. I suspect the difference is the ixgbevf XDP code that is present in Jeff's tree and not in Dave's. The alternative is to wait for Jeff to push the ixgbevf code and then once Dave has pulled it you could rebase your patches. Thanks. - Alex
Re: [PATCH net-next RFC V1 3/5] net: Introduce field for the MII time stamper.
On Wed, Mar 21, 2018 at 12:12:00PM -0700, Florian Fainelli wrote:
> > +static int mdiobus_netdev_notification(struct notifier_block *nb,
> > +				       unsigned long msg, void *ptr)
> > +{
> > +	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
> > +	struct phy_device *phydev = netdev->phydev;
> > +	struct mdio_device *mdev;
> > +	struct mii_bus *bus;
> > +	int i;
> > +
> > +	if (netdev->mdiots || msg != NETDEV_UP || !phydev)
> > +		return NOTIFY_DONE;
>
> You are still assuming that we have a phy_device somehow, whereas your
> patch series wants to solve that for generic MDIO devices, that is a bit
> confusing.

The phydev is the only thing that associates a netdev with an MII bus.

> > +
> > +	/*
> > +	 * Examine the MII bus associated with the PHY that is
> > +	 * attached to the MAC. If there is a time stamping device
> > +	 * on the bus, then connect it to the network device.
> > +	 */
> > +	bus = phydev->mdio.bus;
> > +
> > +	for (i = 0; i < PHY_MAX_ADDR; i++) {
> > +		mdev = bus->mdio_map[i];
> > +		if (!mdev)
> > +			continue;
> > +		if (mdiodev_supports_timestamping(mdev)) {
> > +			netdev->mdiots = mdev;
> > +			return NOTIFY_OK;
>
> What guarantees that netdev->mdiots gets cleared?

Why would it need to be cleared?

> Also, why is this done
> with a notifier instead of through phy_{connect,attach,disconnect}?

We have no guarantee the mdio device has been probed yet.

> It
> looks like we still have this requirement of the mdio TS device being a
> phy_device somehow, I am confused here...

We only need the phydev to get from the netdev to the mii bus.
> > +		}
> > +	}
> > +
> > +	return NOTIFY_DONE;
> > +}
> > +
> >  #ifdef CONFIG_PM
> >  static int mdio_bus_suspend(struct device *dev)
> >  {

> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 5fbb9f1da7fd..223d691aa0b0 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1943,6 +1943,7 @@ struct net_device {
> >  	struct netprio_map __rcu	*priomap;
> >  #endif
> >  	struct phy_device	*phydev;
> > +	struct mdio_device	*mdiots;
>
> phy_device embeds a mdio_device, can you find a way to rework the PHY
> PTP code to utilize the phy_device's mdio instance so do not introduce
> yet another pointer in that big structure that net_device already is?

It would be strange and wrong to "steal" the phy's mdio struct, IMHO.
After all, we just got support for non-PHY mdio devices. The natural
solution is to use it.

Thanks,
Richard
Re: [PATCH net-next v5 1/2] net: permit skb_segment on head_frag frag_list skb
On Wed, Mar 21, 2018 at 1:36 PM, Yonghong Songwrote: > One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at > function skb_segment(), line 3667. The bpf program attaches to > clsact ingress, calls bpf_skb_change_proto to change protocol > from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect > to send the changed packet out. > > 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, > 3473 netdev_features_t features) > 3474 { > 3475 struct sk_buff *segs = NULL; > 3476 struct sk_buff *tail = NULL; > ... > 3665 while (pos < offset + len) { > 3666 if (i >= nfrags) { > 3667 BUG_ON(skb_headlen(list_skb)); > 3668 > 3669 i = 0; > 3670 nfrags = skb_shinfo(list_skb)->nr_frags; > 3671 frag = skb_shinfo(list_skb)->frags; > 3672 frag_skb = list_skb; > ... > > call stack: > ... > #1 [883ffef03558] __crash_kexec at 8110c525 > #2 [883ffef03620] crash_kexec at 8110d5cc > #3 [883ffef03640] oops_end at 8101d7e7 > #4 [883ffef03668] die at 8101deb2 > #5 [883ffef03698] do_trap at 8101a700 > #6 [883ffef036e8] do_error_trap at 8101abfe > #7 [883ffef037a0] do_invalid_op at 8101acd0 > #8 [883ffef037b0] invalid_op at 81a00bab > [exception RIP: skb_segment+3044] > RIP: 817e4dd4 RSP: 883ffef03860 RFLAGS: 00010216 > RAX: 2bf6 RBX: 883feb7aaa00 RCX: 0011 > RDX: 883fb87910c0 RSI: 0011 RDI: 883feb7ab500 > RBP: 883ffef03928 R8: 2ce2 R9: 27da > R10: 01ea R11: 2d82 R12: 883f90a1ee80 > R13: 883fb8791120 R14: 883feb7abc00 R15: 2ce2 > ORIG_RAX: CS: 0010 SS: 0018 > #9 [883ffef03930] tcp_gso_segment at 818713e7 > --- --- > ... > > The triggering input skb has the following properties: > list_skb = skb->frag_list; > skb->nfrags != NULL && skb_headlen(list_skb) != 0 > and skb_segment() is not able to handle a frag_list skb > if its headlen (list_skb->len - list_skb->data_len) is not 0. > > This patch addressed the issue by handling skb_headlen(list_skb) != 0 > case properly if list_skb->head_frag is true, which is expected in > most cases. 
The head frag is processed before list_skb->frags > are processed. > > Reported-by: Diptanu Gon Choudhury > Signed-off-by: Yonghong Song > --- > net/core/skbuff.c | 26 -- > 1 file changed, 20 insertions(+), 6 deletions(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 715c134..23b317a 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -3460,6 +3460,19 @@ void *skb_pull_rcsum(struct sk_buff *skb, unsigned int > len) > } > EXPORT_SYMBOL_GPL(skb_pull_rcsum); > > +static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb) > +{ > + skb_frag_t head_frag; > + struct page *page; > + > + page = virt_to_head_page(frag_skb->head); > + head_frag.page.p = page; > + head_frag.page_offset = frag_skb->data - > + (unsigned char *)page_address(page); > + head_frag.size = skb_headlen(frag_skb); > + return head_frag; > +} > + > /** > * skb_segment - Perform protocol segmentation on skb. > * @head_skb: buffer to segment > @@ -3664,15 +3677,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, > > while (pos < offset + len) { > if (i >= nfrags) { > - BUG_ON(skb_headlen(list_skb)); > - > i = 0; > nfrags = skb_shinfo(list_skb)->nr_frags; > frag = skb_shinfo(list_skb)->frags; > - frag_skb = list_skb; You could probably leave this line in place. No point in moving it. > - > - BUG_ON(!nfrags); > + if (skb_headlen(list_skb)) { > + BUG_ON(!list_skb->head_frag); > > + /* to make room for head_frag. */ > + i--; frag--; Normally these should be two separate lines one for "i--;" and one for "frag--;". > + } You could probably place the BUG_ON(!nfrags) in an else statement here to handle the case where we have a potentially empty skb which would be a bug. > + frag_skb = list_skb; > if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || > skb_zerocopy_clone(nskb, frag_skb,
Re: [PATCH REPOST v4 5/7] ixgbevf: keep writel() closer to wmb()
On 2018-03-21 17:48, Jeff Kirsher wrote:
> On Wed, 2018-03-21 at 14:56 -0400, Sinan Kaya wrote:
>> Remove ixgbevf_write_tail() in favor of moving writel() close to
>> wmb().
>>
>> Signed-off-by: Sinan Kaya
>> Reviewed-by: Alexander Duyck
>> ---
>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -
>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
>>  2 files changed, 2 insertions(+), 7 deletions(-)
>
> This patch fails to compile because there is a call to
> ixgbevf_write_tail() which you missed cleaning up.

Hah, I did a compile test but maybe I missed something. I will get v6 of
this patch only and leave the rest of the series as it is.
Re: [PATCH REPOST v4 5/7] ixgbevf: keep writel() closer to wmb()
On Wed, 2018-03-21 at 14:56 -0400, Sinan Kaya wrote:
> Remove ixgbevf_write_tail() in favor of moving writel() close to
> wmb().
>
> Signed-off-by: Sinan Kaya
> Reviewed-by: Alexander Duyck
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf.h      | 5 -
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 4 ++--
>  2 files changed, 2 insertions(+), 7 deletions(-)

This patch fails to compile because there is a call to
ixgbevf_write_tail() which you missed cleaning up.
Re: [PATCH net-next RFC V1 2/5] net: phy: Move time stamping interface into the generic mdio layer.
On Wed, Mar 21, 2018 at 12:10:07PM -0700, Florian Fainelli wrote:
> > +	phydev->mdio.ts_info = dp83640_ts_info;
> > +	phydev->mdio.hwtstamp = dp83640_hwtstamp;
> > +	phydev->mdio.rxtstamp = dp83640_rxtstamp;
> > +	phydev->mdio.txtstamp = dp83640_txtstamp;
>
> Why is this implemented at the mdio_device level and not at the
> mdio_driver level? This looks like the wrong level at which this is done.

The question could be asked of:

struct mdio_device {
	int (*bus_match)(struct device *dev, struct device_driver *drv);
	void (*device_free)(struct mdio_device *mdiodev);
	void (*device_remove)(struct mdio_device *mdiodev);
}

I saw how this is done for the phy, etc, but I don't see any benefit of
doing it that way. It would add an extra layer (or two) of indirection
and save the space of four function pointers. Is that trade-off worth it?

Thanks,
Richard
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
Hi Richard

> The only other docs that I have is a PDF of the register layout, but I
> don't think I can redistribute that. Actually, there really isn't any
> detail in that doc at all.

O.K, so let's do the 20 questions approach.

As far as I can see, this is not an MDIO device. It is not connected
to the MDIO bus, it has no MDIO registers, you don't even pass a valid
MDIO address in device tree.

Is it actually an MII bus snooper? Does it snoop, or is it actually in
the MII bus, and can modify packets, i.e. insert time stamps as frames
pass over the MII bus?

When the driver talks about having three ports, does that mean it can
be on three different MII busses?

Thanks
	Andrew
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
On Wed, Mar 21, 2018 at 08:33:15PM +0100, Andrew Lunn wrote:
> Can you point us at some documentation for this.

The overall one-step functionality is described in IEEE 1588.

> I think Florian and I want to better understand how this device works,
> in order to understand your other changes.

The device is from here:

https://www.zhaw.ch/en/engineering/institutes-centres/ines/products-and-services/ptp-ieee-1588/ptp-hardware/#c43991

The only other docs that I have is a PDF of the register layout, but I
don't think I can redistribute that. Actually, there really isn't any
detail in that doc at all.

Thanks,
Richard
Re: [PATCH net-next RFC V1 1/5] net: Introduce peer to peer one step PTP time stamping.
On Wed, Mar 21, 2018 at 08:05:36PM +, Keller, Jacob E wrote: > I am guessing that we expect all devices which support onestep P2P messages, > will always support onestep SYNC as well? Yes. Anything else doesn't make sense, don't you think? Also, reading 1588, it isn't clear whether supporting only 1-step Sync without 1-step P2P is even intended. There is only a "one-step clock", and it is described as doing both. Thanks, Richard
Re: [PATCH][next] gre: fix TUNNEL_SEQ bit check on sequence numbering
On Wed, Mar 21, 2018 at 12:34 PM, Colin King wrote:
> From: Colin Ian King
>
> The current logic of flags | TUNNEL_SEQ is always non-zero and hence
> sequence numbers are always incremented no matter the setting of the
> TUNNEL_SEQ bit. Fix this by using & instead of |.
>
> Detected by CoverityScan, CID#1466039 ("Operands don't affect result")
>
> Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.")
> Signed-off-by: Colin Ian King

Thanks for the fix! btw, how can I access the CoverityScan result with
this CID?

Acked-by: William Tu

> ---
>  net/ipv4/ip_gre.c  | 2 +-
>  net/ipv6/ip6_gre.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 2fa2ef2e2af9..9ab1aa2f7660 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -550,7 +550,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
>  			 (TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);
>  	gre_build_header(skb, tunnel_hlen, flags, proto,
>  			 tunnel_id_to_key32(tun_info->key.tun_id),
> -			 (flags | TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
> +			 (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
>
>  	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
>
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index 0bcefc480aeb..3a98c694da5f 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -725,7 +725,7 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
>  		gre_build_header(skb, tunnel->tun_hlen,
>  				 flags, protocol,
>  				 tunnel_id_to_key32(tun_info->key.tun_id),
> -				 (flags | TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
> +				 (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
>  				 : 0);
>
>  } else {
> --
> 2.15.1
>
Re: [PATCH net-next v2] net: mvpp2: Don't use dynamic allocs for local variables
Hello Yan,

On Wed, 21 Mar 2018 19:57:47 +, Yan Markman wrote:

> Hi Maxime

Please avoid top-posting on this list.

> Please check the TWO points:
>
> 1). The mvpp2_prs_flow_find() returns TID if found
>     The TID=0 is valid FOUND value
>     For Not-found use -ENOENT (just like your mvpp2_prs_vlan_find)

This is actually what is used in this patch. You might be referring to a
previous draft version of this patch.

> 2). The original code always uses "mvpp2_prs_entry *pe" storage
>     Zero-Allocated. Please check the correctness of new "mvpp2_prs_entry
>     pe" without memset(pe, 0, sizeof(pe));
>     in all procedures where pe=kzalloc() has been replaced

I think we're good on that regard. On places where I didn't memset the
prs_entry, the pe.index field is set, and this is followed by a read from
TCAM that will initialize the prs_entry to the correct value:

	pe.index = tid;
	mvpp2_prs_hw_read(priv, &pe);

> Thanks
> Yan Markman

[...]

Thanks,
Maxime
Re: [PATCH V2 net-next 06/14] net/tls: Add generic NIC offload infrastructure
On 03/21/2018 02:01 PM, Saeed Mahameed wrote:
> From: Ilya Lesokhin
>
> This patch adds a generic infrastructure to offload TLS crypto to a
...
> +
> +static inline int tls_push_record(struct sock *sk,
> +				  struct tls_context *ctx,
> +				  struct tls_offload_context *offload_ctx,
> +				  struct tls_record_info *record,
> +				  struct page_frag *pfrag,
> +				  int flags,
> +				  unsigned char record_type)
> +{
> +	skb_frag_t *frag;
> +	struct tcp_sock *tp = tcp_sk(sk);
> +	struct page_frag fallback_frag;
> +	struct page_frag *tag_pfrag = pfrag;
> +	int i;
> +
> +	/* fill prepend */
> +	frag = &record->frags[0];
> +	tls_fill_prepend(ctx,
> +			 skb_frag_address(frag),
> +			 record->len - ctx->prepend_size,
> +			 record_type);
> +
> +	if (unlikely(!skb_page_frag_refill(ctx->tag_size, pfrag, GFP_KERNEL))) {
> +		/* HW doesn't care about the data in the tag
> +		 * so in case pfrag has no room
> +		 * for a tag and we can't allocate a new pfrag
> +		 * just use the page in the first frag
> +		 * rather than write a complicated fall back code.
> +		 */
> +		tag_pfrag = &fallback_frag;
> +		tag_pfrag->page = skb_frag_page(frag);
> +		tag_pfrag->offset = 0;
> +	}
> +

If HW does not care, why even try to call skb_page_frag_refill() ?

If you remove it, then we remove one seldom used path and might uncover
bugs.

This part looks very suspect to me, to be honest.
[PATCH V2 net-next 03/14] net: Add Software fallback infrastructure for socket dependent offloads
From: Ilya LesokhinWith socket dependent offloads we rely on the netdev to transform the transmitted packets before sending them to the wire. When a packet from an offloaded socket is rerouted to a different device we need to detect it and do the transformation in software. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- include/net/sock.h | 21 + net/Kconfig| 4 net/core/dev.c | 4 3 files changed, 29 insertions(+) diff --git a/include/net/sock.h b/include/net/sock.h index b9624581d639..92a0e0c54ac1 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -481,6 +481,11 @@ struct sock { void(*sk_error_report)(struct sock *sk); int (*sk_backlog_rcv)(struct sock *sk, struct sk_buff *skb); +#ifdef CONFIG_SOCK_VALIDATE_XMIT + struct sk_buff* (*sk_validate_xmit_skb)(struct sock *sk, + struct net_device *dev, + struct sk_buff *skb); +#endif void(*sk_destruct)(struct sock *sk); struct sock_reuseport __rcu *sk_reuseport_cb; struct rcu_head sk_rcu; @@ -2323,6 +2328,22 @@ static inline bool sk_fullsock(const struct sock *sk) return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV); } +/* Checks if this SKB belongs to an HW offloaded socket + * and whether any SW fallbacks are required based on dev. 
+ */ +static inline struct sk_buff *sk_validate_xmit_skb(struct sk_buff *skb, + struct net_device *dev) +{ +#ifdef CONFIG_SOCK_VALIDATE_XMIT + struct sock *sk = skb->sk; + + if (sk && sk_fullsock(sk) && sk->sk_validate_xmit_skb) + skb = sk->sk_validate_xmit_skb(sk, dev, skb); +#endif + + return skb; +} + /* This helper checks if a socket is a LISTEN or NEW_SYN_RECV * SYNACK messages can be attached to either ones (depending on SYNCOOKIE) */ diff --git a/net/Kconfig b/net/Kconfig index 0428f12c25c2..fe84cfe3260e 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -407,6 +407,10 @@ config GRO_CELLS bool default n +config SOCK_VALIDATE_XMIT + bool + default n + config NET_DEVLINK tristate "Network physical/parent device Netlink interface" help diff --git a/net/core/dev.c b/net/core/dev.c index d8887cc38e7b..244a4c7ab266 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3086,6 +3086,10 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device if (unlikely(!skb)) goto out_null; + skb = sk_validate_xmit_skb(skb, dev); + if (unlikely(!skb)) + goto out_null; + if (netif_needs_gso(skb, features)) { struct sk_buff *segs; -- 2.14.3
[PATCH V2 net-next 02/14] net: Rename and export copy_skb_header
From: Ilya Lesokhincopy_skb_header is renamed to skb_copy_header and exported. Exposing this function give more flexibility in copying SKBs. skb_copy and skb_copy_expand do not give enough control over which parts are copied. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- include/linux/skbuff.h | 1 + net/core/skbuff.c | 9 + 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d8340e6e8814..dc0f81277723 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1031,6 +1031,7 @@ static inline struct sk_buff *alloc_skb_fclone(unsigned int size, struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src); int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask); struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t priority); +void skb_copy_header(struct sk_buff *new, const struct sk_buff *old); struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t priority); struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom, gfp_t gfp_mask, bool fclone); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 715c13495ba6..9ae1812fb705 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1304,7 +1304,7 @@ static void skb_headers_offset_update(struct sk_buff *skb, int off) skb->inner_mac_header += off; } -static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old) +void skb_copy_header(struct sk_buff *new, const struct sk_buff *old) { __copy_skb_header(new, old); @@ -1312,6 +1312,7 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old) skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs; skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type; } +EXPORT_SYMBOL(skb_copy_header); static inline int skb_alloc_rx_flag(const struct sk_buff *skb) { @@ -1354,7 +1355,7 @@ struct sk_buff *skb_copy(const struct sk_buff *skb, gfp_t gfp_mask) BUG_ON(skb_copy_bits(skb, -headerlen, n->head, headerlen + 
skb->len)); - copy_skb_header(n, skb); + skb_copy_header(n, skb); return n; } EXPORT_SYMBOL(skb_copy); @@ -1418,7 +1419,7 @@ struct sk_buff *__pskb_copy_fclone(struct sk_buff *skb, int headroom, skb_clone_fraglist(n); } - copy_skb_header(n, skb); + skb_copy_header(n, skb); out: return n; } @@ -1598,7 +1599,7 @@ struct sk_buff *skb_copy_expand(const struct sk_buff *skb, BUG_ON(skb_copy_bits(skb, -head_copy_len, n->head + head_copy_off, skb->len + head_copy_len)); - copy_skb_header(n, skb); + skb_copy_header(n, skb); skb_headers_offset_update(n, newheadroom - oldheadroom); -- 2.14.3
[PATCH V2 net-next 04/14] net: Add TLS offload netdev ops
From: Ilya LesokhinAdd new netdev ops to add and delete tls context Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- include/linux/netdevice.h | 24 1 file changed, 24 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 913b1cc882cf..e1fef7bb6ed4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -864,6 +864,26 @@ struct xfrmdev_ops { }; #endif +#if IS_ENABLED(CONFIG_TLS_DEVICE) +enum tls_offload_ctx_dir { + TLS_OFFLOAD_CTX_DIR_RX, + TLS_OFFLOAD_CTX_DIR_TX, +}; + +struct tls_crypto_info; +struct tls_context; + +struct tlsdev_ops { + int (*tls_dev_add)(struct net_device *netdev, struct sock *sk, + enum tls_offload_ctx_dir direction, + struct tls_crypto_info *crypto_info, + u32 start_offload_tcp_sn); + void (*tls_dev_del)(struct net_device *netdev, + struct tls_context *ctx, + enum tls_offload_ctx_dir direction); +}; +#endif + struct dev_ifalias { struct rcu_head rcuhead; char ifalias[]; @@ -1748,6 +1768,10 @@ struct net_device { const struct xfrmdev_ops *xfrmdev_ops; #endif +#if IS_ENABLED(CONFIG_TLS_DEVICE) + const struct tlsdev_ops *tlsdev_ops; +#endif + const struct header_ops *header_ops; unsigned intflags; -- 2.14.3
[PATCH V2 net-next 12/14] net/mlx5e: TLS, Add error statistics
From: Ilya LesokhinAdd statistics for rare TLS related errors. Since the errors are rare we have a counter per netdev rather then per SQ. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 + .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 22 ++ .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h | 22 ++ .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 24 +++--- .../mellanox/mlx5/core/en_accel/tls_stats.c| 89 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 + drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 22 ++ 8 files changed, 178 insertions(+), 10 deletions(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_stats.c diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index ec785f589666..a7135f5d5cf6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -28,6 +28,6 @@ mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o ipoib/ipoib mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \ en_accel/ipsec_stats.o -mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o en_accel/tls_rxtx.o +mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o en_accel/tls_rxtx.o en_accel/tls_stats.o CFLAGS_tracepoint.o := -I$(src) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 7d8696fca826..d397be0b5885 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -795,6 +795,9 @@ struct mlx5e_priv { #ifdef CONFIG_MLX5_EN_IPSEC struct mlx5e_ipsec*ipsec; #endif +#ifdef CONFIG_MLX5_EN_TLS + struct mlx5e_tls *tls; +#endif }; struct mlx5e_profile { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c index aa6981c98bdc..d167845271c3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c @@ -173,3 +173,25 @@ void mlx5e_tls_build_netdev(struct mlx5e_priv *priv) netdev->hw_features |= NETIF_F_HW_TLS_TX; netdev->tlsdev_ops = _tls_ops; } + +int mlx5e_tls_init(struct mlx5e_priv *priv) +{ + struct mlx5e_tls *tls = kzalloc(sizeof(*tls), GFP_KERNEL); + + if (!tls) + return -ENOMEM; + + priv->tls = tls; + return 0; +} + +void mlx5e_tls_cleanup(struct mlx5e_priv *priv) +{ + struct mlx5e_tls *tls = priv->tls; + + if (!tls) + return; + + kfree(tls); + priv->tls = NULL; +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h index f7216b9b98e2..b6162178f621 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h @@ -38,6 +38,17 @@ #include #include "en.h" +struct mlx5e_tls_sw_stats { + atomic64_t tx_tls_drop_metadata; + atomic64_t tx_tls_drop_resync_alloc; + atomic64_t tx_tls_drop_no_sync_data; + atomic64_t tx_tls_drop_bypass_required; +}; + +struct mlx5e_tls { + struct mlx5e_tls_sw_stats sw_stats; +}; + struct mlx5e_tls_offload_context { struct tls_offload_context base; u32 expected_seq; @@ -55,10 +66,21 @@ mlx5e_get_tls_tx_context(struct tls_context *tls_ctx) } void mlx5e_tls_build_netdev(struct mlx5e_priv *priv); +int mlx5e_tls_init(struct mlx5e_priv *priv); +void mlx5e_tls_cleanup(struct mlx5e_priv *priv); + +int mlx5e_tls_get_count(struct mlx5e_priv *priv); +int mlx5e_tls_get_strings(struct mlx5e_priv *priv, uint8_t *data); +int mlx5e_tls_get_stats(struct mlx5e_priv *priv, u64 *data); #else static inline void mlx5e_tls_build_netdev(struct mlx5e_priv *priv) { } +static inline int mlx5e_tls_init(struct mlx5e_priv *priv) { return 0; } +static inline void mlx5e_tls_cleanup(struct mlx5e_priv *priv) { } 
+static inline int mlx5e_tls_get_count(struct mlx5e_priv *priv) { return 0; } +static inline int mlx5e_tls_get_strings(struct mlx5e_priv *priv, uint8_t *data) { return 0; } +static inline int mlx5e_tls_get_stats(struct mlx5e_priv *priv, u64 *data) { return 0; } #endif diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c index 49e8d455ebc3..ad2790fb5966 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c @@ -164,7 +164,8 @@ static struct sk_buff *
[PATCH V2 net-next 11/14] net/mlx5e: TLS, Add Innova TLS TX offload data path
From: Ilya LesokhinImplement the TLS tx offload data path according to the requirements of the TLS generic NIC offload infrastructure. Special metadata ethertype is used to pass information to the hardware. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en.h | 15 ++ .../mellanox/mlx5/core/en_accel/en_accel.h | 72 ++ .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 2 + .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 272 + .../mellanox/mlx5/core/en_accel/tls_rxtx.h | 50 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 + drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 10 + drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 9 + drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 37 +-- 10 files changed, 455 insertions(+), 16 deletions(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 50872ed30c0b..ec785f589666 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -28,6 +28,6 @@ mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o ipoib/ipoib mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \ en_accel/ipsec_stats.o -mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o +mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o en_accel/tls_rxtx.o CFLAGS_tracepoint.o := -I$(src) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 6660986285bf..7d8696fca826 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ 
-340,6 +340,7 @@ struct mlx5e_sq_dma { enum { MLX5E_SQ_STATE_ENABLED, MLX5E_SQ_STATE_IPSEC, + MLX5E_SQ_STATE_TLS, }; struct mlx5e_sq_wqe_info { @@ -824,6 +825,8 @@ void mlx5e_build_ptys2ethtool_map(void); u16 mlx5e_select_queue(struct net_device *dev, struct sk_buff *skb, void *accel_priv, select_queue_fallback_t fallback); netdev_tx_t mlx5e_xmit(struct sk_buff *skb, struct net_device *dev); +netdev_tx_t mlx5e_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb, + struct mlx5e_tx_wqe *wqe, u16 pi); void mlx5e_completion_event(struct mlx5_core_cq *mcq); void mlx5e_cq_error_event(struct mlx5_core_cq *mcq, enum mlx5_event event); @@ -929,6 +932,18 @@ static inline bool mlx5e_tunnel_inner_ft_supported(struct mlx5_core_dev *mdev) MLX5_CAP_FLOWTABLE_NIC_RX(mdev, ft_field_support.inner_ip_version)); } +static inline void mlx5e_sq_fetch_wqe(struct mlx5e_txqsq *sq, + struct mlx5e_tx_wqe **wqe, + u16 *pi) +{ + struct mlx5_wq_cyc *wq; + + wq = >wq; + *pi = sq->pc & wq->sz_m1; + *wqe = mlx5_wq_cyc_get_wqe(wq, *pi); + memset(*wqe, 0, sizeof(**wqe)); +} + static inline struct mlx5e_tx_wqe *mlx5e_post_nop(struct mlx5_wq_cyc *wq, u32 sqn, u16 *pc) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h new file mode 100644 index ..68fcb40a2847 --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/en_accel.h @@ -0,0 +1,72 @@ +/* + * Copyright (c) 2018 Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A
[PATCH V2 net-next 09/14] net/mlx5: Accel, Add TLS tx offload interface
From: Ilya LesokhinAdd routines for manipulating TLS TX offload contexts. In Innova TLS, TLS contexts are added or deleted via a command message over the SBU connection. The HW then sends a response message over the same connection. Add implementation for Innova TLS (FPGA-based) hardware. These routines will be used by the TLS offload support in a later patch mlx5/accel is a middle acceleration layer to allow mlx5e and other ULPs to work directly with mlx5_core rather than Innova FPGA or other mlx5 acceleration providers. In the future, when IPSec/TLS or any other acceleration gets integrated into ConnectX chip, mlx5/accel layer will provide the integrated acceleration, rather than the Innova one. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/Makefile | 4 +- .../net/ethernet/mellanox/mlx5/core/accel/tls.c| 71 +++ .../net/ethernet/mellanox/mlx5/core/accel/tls.h| 86 .../net/ethernet/mellanox/mlx5/core/fpga/core.h| 1 + drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c | 563 + drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h | 68 +++ drivers/net/ethernet/mellanox/mlx5/core/main.c | 11 + include/linux/mlx5/mlx5_ifc.h | 16 - include/linux/mlx5/mlx5_ifc_fpga.h | 77 +++ 9 files changed, 879 insertions(+), 18 deletions(-) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index c805769d92a9..9989e5265a45 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -8,10 +8,10 @@ mlx5_core-y :=main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \ fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o 
lib/clock.o \ diag/fs_tracepoint.o -mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o +mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o accel/tls.o mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \ - fpga/ipsec.o + fpga/ipsec.o fpga/tls.o mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \ en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c new file mode 100644 index ..77ac19f38cbe --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/accel/tls.c @@ -0,0 +1,71 @@ +/* + * Copyright (c) 2018 Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#include + +#include "accel/tls.h" +#include "mlx5_core.h" +#include "fpga/tls.h" + +int mlx5_accel_tls_add_tx_flow(struct mlx5_core_dev *mdev, void *flow, + struct tls_crypto_info *crypto_info, + u32 start_offload_tcp_sn, u32 *p_swid) +{ + return mlx5_fpga_tls_add_tx_flow(mdev, flow, crypto_info, +start_offload_tcp_sn, p_swid); +} + +void mlx5_accel_tls_del_tx_flow(struct mlx5_core_dev *mdev, u32 swid) +{ + mlx5_fpga_tls_del_tx_flow(mdev, swid,
[PATCH V2 net-next 06/14] net/tls: Add generic NIC offload infrastructure
From: Ilya Lesokhin

This patch adds a generic infrastructure to offload TLS crypto to network devices. It enables the kernel TLS socket to skip encryption and authentication operations on the transmit side of the data path, leaving those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to the TCP layer just like the SW KTLS implementation and using the same API. TCP segmentation is mostly unaffected. Currently the only exception is that we prevent mixed SKBs where only part of the payload requires offload. In the future we are likely to add a similar restriction following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows:

1. The offloaded implementation builds "plaintext TLS records": these records contain plaintext instead of ciphertext and placeholder bytes instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence number to TLS records. Thus, given a TCP SKB sent from a NIC offloaded TLS socket, we can use the TLS NIC offload infrastructure to obtain enough context to encrypt the payload of the SKB.

A TLS record is released when the last byte of the record is ack'ed; this is done through the new icsk_clean_acked callback.

The infrastructure should be extendable to support various NIC offload implementations. However, it is currently written with the implementation below in mind:

The NIC assumes that packets from each offloaded stream are sent as plaintext and in-order. It keeps track of the TLS records in the TCP stream. When a packet marked for offload is transmitted, the NIC encrypts the payload in-place and puts authentication tags in the relevant placeholders.

The responsibility for handling out-of-order packets (i.e. TCP retransmission, qdisc drops) falls on the netdev driver. The netdev driver keeps track of the expected TCP SN from the NIC's perspective.
If the next packet to transmit matches the expected TCP SN, the driver advances the expected TCP SN and transmits the packet with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN, the driver calls the TLS layer to obtain the TLS record that includes the TCP SN of the packet for transmission. Using this TLS record, the driver posts a work entry on the transmit queue to reconstruct the NIC TLS state required for the offload of the out-of-order packet. It updates the expected TCP SN accordingly and transmits the now in-order packet. The same queue is used for packet transmission and TLS context reconstruction to avoid the need for flushing the transmit queue before issuing the context reconstruction request.

Signed-off-by: Ilya Lesokhin
Signed-off-by: Boris Pismenny
Signed-off-by: Aviad Yehezkel
Signed-off-by: Saeed Mahameed
---
include/net/tls.h | 74 +++- net/tls/Kconfig | 10 + net/tls/Makefile | 2 + net/tls/tls_device.c | 793 ++ net/tls/tls_device_fallback.c | 415 ++ net/tls/tls_main.c| 33 +- 6 files changed, 1320 insertions(+), 7 deletions(-) create mode 100644 net/tls/tls_device.c create mode 100644 net/tls/tls_device_fallback.c diff --git a/include/net/tls.h b/include/net/tls.h index 4913430ab807..0bfb1b0a156a 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -77,6 +77,37 @@ struct tls_sw_context { struct scatterlist sg_aead_out[2]; }; +struct tls_record_info { + struct list_head list; + u32 end_seq; + int len; + int num_frags; + skb_frag_t frags[MAX_SKB_FRAGS]; +}; + +struct tls_offload_context { + struct crypto_aead *aead_send; + spinlock_t lock;/* protects records list */ + struct list_head records_list; + struct tls_record_info *open_record; + struct tls_record_info *retransmit_hint; + u64 hint_record_sn; + u64 unacked_record_sn; + + struct scatterlist sg_tx_data[MAX_SKB_FRAGS]; + void (*sk_destruct)(struct sock *sk); + u8 driver_state[]; + /* The TLS layer reserves room for driver specific state +* Currently the
belief is that there is not enough +* driver specific state to justify another layer of indirection +*/ +#define TLS_DRIVER_STATE_SIZE (max_t(size_t, 8, sizeof(void *))) +}; + +#define TLS_OFFLOAD_CONTEXT_SIZE \ + (ALIGN(sizeof(struct tls_offload_context), sizeof(void *)) + \ +TLS_DRIVER_STATE_SIZE) + enum { TLS_PENDING_CLOSED_RECORD }; @@ -87,6 +118,10 @@ struct tls_context { struct tls12_crypto_info_aes_gcm_128 crypto_send_aes_gcm_128; }; + struct list_head list; + struct net_device *netdev; +
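The TCP-sequence-to-TLS-record mapping described in point 2 above can be sketched in plain C. This is an illustrative toy only: `record` loosely mirrors `tls_record_info`, but it is not the kernel implementation, and the lookup ignores the `retransmit_hint` optimization the real code keeps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch of the TCP-sequence -> TLS-record mapping;
 * `record` loosely mirrors tls_record_info from the patch above. */
struct record {
    struct record *next;
    uint32_t end_seq;   /* TCP sequence one past the record's last byte */
    int len;
};

/* Wrap-safe 32-bit sequence comparison, in the spirit of TCP's before(). */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Return the record containing TCP sequence `seq`: the first record in
 * the in-order list whose end_seq lies strictly after seq. */
static struct record *find_record(struct record *head, uint32_t seq)
{
    struct record *r;

    for (r = head; r; r = r->next)
        if (seq_before(seq, r->end_seq))
            return r;
    return NULL;  /* seq is past the last unacked record */
}
```

The real implementation additionally caches a retransmit hint (`retransmit_hint`/`hint_record_sn`) so the common retransmission case does not walk the whole list from the head.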
[PATCH V2 net-next 13/14] MAINTAINERS: Update mlx5 innova driver maintainers
From: Boris Pismenny

Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- MAINTAINERS | 17 - 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 214c9bca232a..cd4067ccf959 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8913,26 +8913,17 @@ W: http://www.mellanox.com Q: http://patchwork.ozlabs.org/project/netdev/list/ F: drivers/net/ethernet/mellanox/mlx5/core/en_* -MELLANOX ETHERNET INNOVA DRIVER -M: Ilan Tayari -R: Boris Pismenny +MELLANOX ETHERNET INNOVA DRIVERS +M: Boris Pismenny L: netdev@vger.kernel.org S: Supported W: http://www.mellanox.com Q: http://patchwork.ozlabs.org/project/netdev/list/ +F: drivers/net/ethernet/mellanox/mlx5/core/en_accel/* +F: drivers/net/ethernet/mellanox/mlx5/core/accel/* F: drivers/net/ethernet/mellanox/mlx5/core/fpga/* F: include/linux/mlx5/mlx5_ifc_fpga.h -MELLANOX ETHERNET INNOVA IPSEC DRIVER -M: Ilan Tayari -R: Boris Pismenny -L: netdev@vger.kernel.org -S: Supported -W: http://www.mellanox.com -Q: http://patchwork.ozlabs.org/project/netdev/list/ -F: drivers/net/ethernet/mellanox/mlx5/core/en_ipsec/* -F: drivers/net/ethernet/mellanox/mlx5/core/ipsec* - MELLANOX ETHERNET SWITCH DRIVERS M: Jiri Pirko M: Ido Schimmel -- 2.14.3
[PATCH V2 net-next 14/14] MAINTAINERS: Update TLS maintainers
From: Boris Pismenny

Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index cd4067ccf959..285ea4e6c580 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9711,7 +9711,7 @@ F:net/netfilter/xt_CONNSECMARK.c F: net/netfilter/xt_SECMARK.c NETWORKING [TLS] -M: Ilya Lesokhin +M: Boris Pismenny M: Aviad Yehezkel M: Dave Watson L: netdev@vger.kernel.org -- 2.14.3
[PATCH V2 net-next 10/14] net/mlx5e: TLS, Add Innova TLS TX support
From: Ilya LesokhinAdd NETIF_F_HW_TLS_TX capability and expose tlsdev_ops to work with the TLS generic NIC offload infrastructure. The NETIF_F_HW_TLS_TX capability will be added in the next patch. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/Kconfig| 11 ++ drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 + .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 173 + .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h | 65 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 + 5 files changed, 254 insertions(+) create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.h diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig index 25deaa5a534c..6befd2c381b8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig +++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig @@ -85,3 +85,14 @@ config MLX5_EN_IPSEC Build support for IPsec cryptography-offload accelaration in the NIC. Note: Support for hardware with this capability needs to be selected for this option to become available. + +config MLX5_EN_TLS + bool "TLS cryptography-offload accelaration" + depends on MLX5_CORE_EN + depends on TLS_DEVICE + depends on MLX5_ACCEL + default n + ---help--- + Build support for TLS cryptography-offload accelaration in the NIC. + Note: Support for hardware with this capability needs to be selected + for this option to become available. 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile index 9989e5265a45..50872ed30c0b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile +++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile @@ -28,4 +28,6 @@ mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o ipoib/ipoib mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \ en_accel/ipsec_stats.o +mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o + CFLAGS_tracepoint.o := -I$(src) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c new file mode 100644 index ..38d88108a55a --- /dev/null +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c @@ -0,0 +1,173 @@ +/* + * Copyright (c) 2018 Mellanox Technologies. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + *copyright notice, this list of conditions and the following + *disclaimer. + * + * - Redistributions in binary form must reproduce the above + *copyright notice, this list of conditions and the following + *disclaimer in the documentation and/or other materials + *provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + */ + +#include +#include +#include "en_accel/tls.h" +#include "accel/tls.h" + +static void mlx5e_tls_set_ipv4_flow(void *flow, struct sock *sk) +{ + struct inet_sock *inet = inet_sk(sk); + + MLX5_SET(tls_flow, flow, ipv6, 0); + memcpy(MLX5_ADDR_OF(tls_flow, flow, dst_ipv4_dst_ipv6.ipv4_layout.ipv4), + >inet_daddr, MLX5_FLD_SZ_BYTES(ipv4_layout, ipv4)); + memcpy(MLX5_ADDR_OF(tls_flow, flow, src_ipv4_src_ipv6.ipv4_layout.ipv4), + >inet_rcv_saddr, MLX5_FLD_SZ_BYTES(ipv4_layout, ipv4)); +} + +#if IS_ENABLED(CONFIG_IPV6) +static void mlx5e_tls_set_ipv6_flow(void *flow, struct sock *sk) +{ + struct ipv6_pinfo *np = inet6_sk(sk); + + MLX5_SET(tls_flow, flow, ipv6, 1); + memcpy(MLX5_ADDR_OF(tls_flow, flow, dst_ipv4_dst_ipv6.ipv6_layout.ipv6), + >sk_v6_daddr, MLX5_FLD_SZ_BYTES(ipv6_layout,
[PATCH V2 net-next 07/14] net/tls: Support TLS device offload with IPv6
From: Ilya LesokhinPreviously get_netdev_for_sock worked only with IPv4. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- net/tls/tls_device.c | 49 - 1 file changed, 48 insertions(+), 1 deletion(-) diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index e623280ea019..c35fc107d9c5 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -34,6 +34,11 @@ #include #include #include +#include +#include +#include +#include +#include #include #include @@ -99,13 +104,55 @@ static void tls_device_queue_ctx_destruction(struct tls_context *ctx) spin_unlock_irqrestore(_device_lock, flags); } +static inline struct net_device *ipv6_get_netdev(struct sock *sk) +{ + struct net_device *dev = NULL; +#if IS_ENABLED(CONFIG_IPV6) + struct inet_sock *inet = inet_sk(sk); + struct ipv6_pinfo *np = inet6_sk(sk); + struct flowi6 _fl6, *fl6 = &_fl6; + struct dst_entry *dst; + + memset(fl6, 0, sizeof(*fl6)); + fl6->flowi6_proto = sk->sk_protocol; + fl6->daddr = sk->sk_v6_daddr; + fl6->saddr = np->saddr; + fl6->flowlabel = np->flow_label; + IP6_ECN_flow_xmit(sk, fl6->flowlabel); + fl6->flowi6_oif = sk->sk_bound_dev_if; + fl6->flowi6_mark = sk->sk_mark; + fl6->fl6_sport = inet->inet_sport; + fl6->fl6_dport = inet->inet_dport; + fl6->flowi6_uid = sk->sk_uid; + security_sk_classify_flow(sk, flowi6_to_flowi(fl6)); + + if (ipv6_stub->ipv6_dst_lookup(sock_net(sk), sk, , fl6) < 0) + return NULL; + + dev = dst->dev; + dev_hold(dev); + dst_release(dst); + +#endif + return dev; +} + /* We assume that the socket is already connected */ static struct net_device *get_netdev_for_sock(struct sock *sk) { struct inet_sock *inet = inet_sk(sk); struct net_device *netdev = NULL; - netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif); + if (sk->sk_family == AF_INET) + netdev = dev_get_by_index(sock_net(sk), + inet->cork.fl.flowi_oif); + else if (sk->sk_family == AF_INET6) { + netdev = ipv6_get_netdev(sk); + if (!netdev && !sk->sk_ipv6only && + 
ipv6_addr_type(&sk->sk_v6_daddr) == IPV6_ADDR_MAPPED) + netdev = dev_get_by_index(sock_net(sk), + inet->cork.fl.flowi_oif); + } return netdev; } -- 2.14.3
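The fallback in this patch hinges on recognizing a v4-mapped IPv6 address (::ffff:a.b.c.d): an AF_INET6 socket talking to such an address should retry the IPv4 netdev lookup. As a rough illustration, the address-form test boils down to a 12-byte prefix check (the helper name below is made up; the kernel reaches the same decision via ipv6_addr_type()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hedged sketch: true iff the 16-byte IPv6 address is v4-mapped,
 * i.e. has the form ::ffff:a.b.c.d. Name is illustrative only. */
static bool ipv6_addr_v4mapped_sketch(const uint8_t addr[16])
{
    static const uint8_t prefix[12] =
        { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff };

    /* Bytes 0..9 must be zero, bytes 10..11 must be 0xffff;
     * bytes 12..15 then carry the embedded IPv4 address. */
    return memcmp(addr, prefix, sizeof(prefix)) == 0;
}
```

For a v4-mapped peer the routing decision is an IPv4 one, which is why the patch falls back to `dev_get_by_index()` on the IPv4 flow information in that case.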
[PATCH V2 net-next 08/14] net/mlx5e: Move defines out of ipsec code
From: Ilya LesokhinThe defines are not IPSEC specific. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 +++ drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 3 --- drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 5 + drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h | 2 ++ 4 files changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 4c9360b25532..6660986285bf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -53,6 +53,9 @@ #include "mlx5_core.h" #include "en_stats.h" +#define MLX5E_METADATA_ETHER_TYPE (0x8CE4) +#define MLX5E_METADATA_ETHER_LEN 8 + #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v) #define MLX5E_ETH_HARD_MTU (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h index 1198fc1eba4c..93bf10e6508c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.h @@ -45,9 +45,6 @@ #define MLX5E_IPSEC_SADB_RX_BITS 10 #define MLX5E_IPSEC_ESN_SCOPE_MID 0x8000L -#define MLX5E_METADATA_ETHER_TYPE (0x8CE4) -#define MLX5E_METADATA_ETHER_LEN 8 - struct mlx5e_priv; struct mlx5e_ipsec_sw_stats { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c index 4f1568528738..a6b672840e34 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c @@ -43,9 +43,6 @@ #include "fpga/sdk.h" #include "fpga/core.h" -#define SBU_QP_QUEUE_SIZE 8 -#define MLX5_FPGA_IPSEC_CMD_TIMEOUT_MSEC (60 * 1000) - enum mlx5_fpga_ipsec_cmd_status { MLX5_FPGA_IPSEC_CMD_PENDING, 
MLX5_FPGA_IPSEC_CMD_SEND_FAIL, @@ -258,7 +255,7 @@ static int mlx5_fpga_ipsec_cmd_wait(void *ctx) { struct mlx5_fpga_ipsec_cmd_context *context = ctx; unsigned long timeout = - msecs_to_jiffies(MLX5_FPGA_IPSEC_CMD_TIMEOUT_MSEC); + msecs_to_jiffies(MLX5_FPGA_CMD_TIMEOUT_MSEC); int res; res = wait_for_completion_timeout(>complete, timeout); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h index baa537e54a49..a0573cc2fc9b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.h @@ -41,6 +41,8 @@ * DOC: Innova SDK * This header defines the in-kernel API for Innova FPGA client drivers. */ +#define SBU_QP_QUEUE_SIZE 8 +#define MLX5_FPGA_CMD_TIMEOUT_MSEC (60 * 1000) enum mlx5_fpga_access_type { MLX5_FPGA_ACCESS_TYPE_I2C = 0x0, -- 2.14.3
[PATCH V2 net-next 05/14] net: Add TLS TX offload features
From: Ilya LesokhinThis patch adds a netdev feature to configure TLS TX offloads. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- include/linux/netdev_features.h | 2 ++ net/core/ethtool.c | 1 + 2 files changed, 3 insertions(+) diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index db84c516bcfb..18dc34202080 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -77,6 +77,7 @@ enum { NETIF_F_HW_ESP_BIT, /* Hardware ESP transformation offload */ NETIF_F_HW_ESP_TX_CSUM_BIT, /* ESP with TX checksum offload */ NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */ + NETIF_F_HW_TLS_TX_BIT, /* Hardware TLS TX offload */ NETIF_F_GRO_HW_BIT, /* Hardware Generic receive offload */ @@ -145,6 +146,7 @@ enum { #define NETIF_F_HW_ESP __NETIF_F(HW_ESP) #define NETIF_F_HW_ESP_TX_CSUM __NETIF_F(HW_ESP_TX_CSUM) #defineNETIF_F_RX_UDP_TUNNEL_PORT __NETIF_F(RX_UDP_TUNNEL_PORT) +#define NETIF_F_HW_TLS_TX __NETIF_F(HW_TLS_TX) #define for_each_netdev_feature(mask_addr, bit)\ for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT) diff --git a/net/core/ethtool.c b/net/core/ethtool.c index 157cd9efa4be..9f07f9fe39ca 100644 --- a/net/core/ethtool.c +++ b/net/core/ethtool.c @@ -107,6 +107,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] [NETIF_F_HW_ESP_BIT] = "esp-hw-offload", [NETIF_F_HW_ESP_TX_CSUM_BIT] = "esp-tx-csum-hw-offload", [NETIF_F_RX_UDP_TUNNEL_PORT_BIT] = "rx-udp_tunnel-port-offload", + [NETIF_F_HW_TLS_TX_BIT] ="tls-hw-tx-offload", }; static const char -- 2.14.3
[PATCH V2 net-next 01/14] tcp: Add clean acked data hook
From: Ilya LesokhinCalled when a TCP segment is acknowledged. Could be used by application protocols who hold additional metadata associated with the stream data. This is required by TLS device offload to release metadata associated with acknowledged TLS records. Signed-off-by: Ilya Lesokhin Signed-off-by: Boris Pismenny Signed-off-by: Aviad Yehezkel Signed-off-by: Saeed Mahameed --- include/net/inet_connection_sock.h | 2 ++ include/net/tcp.h | 5 + net/ipv4/tcp.c | 5 + net/ipv4/tcp_input.c | 6 ++ 4 files changed, 18 insertions(+) diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index b68fea022a82..2ab6667275df 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -77,6 +77,7 @@ struct inet_connection_sock_af_ops { * @icsk_af_ops Operations which are AF_INET{4,6} specific * @icsk_ulp_ops Pluggable ULP control hook * @icsk_ulp_data ULP private data + * @icsk_clean_acked Clean acked data hook * @icsk_listen_portaddr_node hash to the portaddr listener hashtable * @icsk_ca_state:Congestion control state * @icsk_retransmits: Number of unrecovered [RTO] timeouts @@ -102,6 +103,7 @@ struct inet_connection_sock { const struct inet_connection_sock_af_ops *icsk_af_ops; const struct tcp_ulp_ops *icsk_ulp_ops; void *icsk_ulp_data; + void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq); struct hlist_node icsk_listen_portaddr_node; unsigned int (*icsk_sync_mss)(struct sock *sk, u32 pmtu); __u8 icsk_ca_state:6, diff --git a/include/net/tcp.h b/include/net/tcp.h index 9c9b3768b350..dba03b205680 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2101,4 +2101,9 @@ static inline bool tcp_bpf_ca_needs_ecn(struct sock *sk) #if IS_ENABLED(CONFIG_SMC) extern struct static_key_false tcp_have_smc; #endif + +#if IS_ENABLED(CONFIG_TLS_DEVICE) +extern struct static_key_false clean_acked_data_enabled; +#endif + #endif /* _TCP_H */ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index e553f84bde83..70056bb760d2 
100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -297,6 +297,11 @@ DEFINE_STATIC_KEY_FALSE(tcp_have_smc); EXPORT_SYMBOL(tcp_have_smc); #endif +#if IS_ENABLED(CONFIG_TLS_DEVICE) +DEFINE_STATIC_KEY_FALSE(clean_acked_data_enabled); +EXPORT_SYMBOL_GPL(clean_acked_data_enabled); +#endif + /* * Current number of TCP sockets. */ diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 451ef3012636..21f5c647f4be 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3542,6 +3542,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) if (after(ack, prior_snd_una)) { flag |= FLAG_SND_UNA_ADVANCED; icsk->icsk_retransmits = 0; + +#if IS_ENABLED(CONFIG_TLS_DEVICE) + if (static_branch_unlikely(&clean_acked_data_enabled)) + if (icsk->icsk_clean_acked) + icsk->icsk_clean_acked(sk, ack); +#endif } prior_fack = tcp_is_sack(tp) ? tcp_highest_sack_seq(tp) : tp->snd_una; -- 2.14.3
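The shape of the hook this patch adds — a per-socket callback, gated by a global enable flag (the kernel uses a static key, i.e. runtime code patching, to keep the disabled case free), fired only when an ACK advances snd_una — can be modeled in a few lines. All names below are illustrative stand-ins, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: `clean_acked_enabled` stands in for the
 * clean_acked_data_enabled static key; `fake_sock` stands in for
 * inet_connection_sock with its icsk_clean_acked member. */
static int clean_acked_enabled;

struct fake_sock {
    uint32_t snd_una;       /* lowest unacknowledged sequence number */
    uint32_t last_cleaned;  /* written by the callback, for demo only */
    void (*clean_acked)(struct fake_sock *sk, uint32_t acked_seq);
};

static void tls_clean_acked(struct fake_sock *sk, uint32_t acked_seq)
{
    /* A TLS-device user would free all records fully acknowledged
     * below acked_seq here; we just record the high-water mark. */
    sk->last_cleaned = acked_seq;
}

/* Mirrors the tcp_ack() hunk: fire only when the ACK advances snd_una. */
static void on_ack(struct fake_sock *sk, uint32_t ack)
{
    if ((int32_t)(ack - sk->snd_una) > 0) {  /* after(ack, snd_una) */
        sk->snd_una = ack;
        if (clean_acked_enabled && sk->clean_acked)
            sk->clean_acked(sk, ack);
    }
}
```

The static key matters because this sits on the TCP fast path: when no TLS-device socket is registered, the branch costs nothing.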
[PATCH V2 net-next 00/14] TLS offload, netdev & MLX5 support
Hi Dave,

The following series from Ilya and Boris provides TLS TX inline crypto offload.

v1->v2:
- Added IS_ENABLED(CONFIG_TLS_DEVICE) and a STATIC_KEY for icsk_clean_acked
- File license fix
- Fix spelling, comment by DaveW
- Move memory allocations out of tls_set_device_offload and other misc fixes, comments by Kiril.

Boris says:
===
This series adds a generic infrastructure to offload TLS crypto to network devices. It enables the kernel TLS socket to skip encryption and authentication operations on the transmit side of the data path, leaving those computationally expensive operations to the NIC.

The NIC offload infrastructure builds TLS records and pushes them to the TCP layer just like the SW KTLS implementation and using the same API. TCP segmentation is mostly unaffected. Currently the only exception is that we prevent mixed SKBs where only part of the payload requires offload. In the future we are likely to add a similar restriction following a change cipher spec record.

The notable differences between SW KTLS and NIC offloaded TLS implementations are as follows:

1. The offloaded implementation builds "plaintext TLS records": these records contain plaintext instead of ciphertext and placeholder bytes instead of authentication tags.
2. The offloaded implementation maintains a mapping from TCP sequence number to TLS records. Thus, given a TCP SKB sent from a NIC offloaded TLS socket, we can use the TLS NIC offload infrastructure to obtain enough context to encrypt the payload of the SKB.

A TLS record is released when the last byte of the record is ack'ed; this is done through the new icsk_clean_acked callback.

The infrastructure should be extendable to support various NIC offload implementations. However, it is currently written with the implementation below in mind:

The NIC assumes that packets from each offloaded stream are sent as plaintext and in-order. It keeps track of the TLS records in the TCP stream.
When a packet marked for offload is transmitted, the NIC encrypts the payload in-place and puts authentication tags in the relevant placeholders.

The responsibility for handling out-of-order packets (i.e. TCP retransmission, qdisc drops) falls on the netdev driver. The netdev driver keeps track of the expected TCP SN from the NIC's perspective. If the next packet to transmit matches the expected TCP SN, the driver advances the expected TCP SN and transmits the packet with TLS offload indication.

If the next packet to transmit does not match the expected TCP SN, the driver calls the TLS layer to obtain the TLS record that includes the TCP SN of the packet for transmission. Using this TLS record, the driver posts a work entry on the transmit queue to reconstruct the NIC TLS state required for the offload of the out-of-order packet. It updates the expected TCP SN accordingly and transmits the now in-order packet. The same queue is used for packet transmission and TLS context reconstruction to avoid the need for flushing the transmit queue before issuing the context reconstruction request.

Expected TCP SN is accessed without a lock, under the assumption that TCP doesn't transmit SKBs from different TX queues concurrently. We assume that packets are not rerouted to a different network device.

Paper: https://www.netdevconf.org/1.2/papers/netdevconf-TLS.pdf
===

===
The series is based on latest net-next: 0466080c751e ("Merge branch 'dsa-mv88e6xxx-some-fixes'")

Thanks,
Saeed.
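The driver-side decision described above — transmit in-order packets directly, post a context-reconstruction work entry first for out-of-order ones — reduces to a small state machine. The following is a hedged sketch under invented names (no mlx5 specifics; the real driver's resync posts a WQE built from the TLS record):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the driver's expected-TCP-SN tracking. */
enum tx_action {
    TX_INLINE,              /* NIC state matches: send with offload flag */
    TX_RESYNC_THEN_INLINE,  /* post a resync entry, then send */
};

struct tx_state {
    uint32_t expected_seq;  /* next TCP sequence the NIC expects to see */
};

static enum tx_action handle_xmit(struct tx_state *st,
                                  uint32_t seq, uint32_t len)
{
    enum tx_action act = (seq == st->expected_seq)
        ? TX_INLINE               /* in-order: NIC state already valid */
        : TX_RESYNC_THEN_INLINE;  /* out-of-order: rebuild NIC state first */

    /* Either way, after this packet the NIC expects the next byte. */
    st->expected_seq = seq + len;
    return act;
}
```

Because transmission and resync share one queue, the resync entry is naturally ordered before the packet that needs it, which is exactly the property the cover letter calls out.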
--- Boris Pismenny (2): MAINTAINERS: Update mlx5 innova driver maintainers MAINTAINERS: Update TLS maintainers Ilya Lesokhin (12): tcp: Add clean acked data hook net: Rename and export copy_skb_header net: Add Software fallback infrastructure for socket dependent offloads net: Add TLS offload netdev ops net: Add TLS TX offload features net/tls: Add generic NIC offload infrastructure net/tls: Support TLS device offload with IPv6 net/mlx5e: Move defines out of ipsec code net/mlx5: Accel, Add TLS tx offload interface net/mlx5e: TLS, Add Innova TLS TX support net/mlx5e: TLS, Add Innova TLS TX offload data path net/mlx5e: TLS, Add error statistics MAINTAINERS| 19 +- drivers/net/ethernet/mellanox/mlx5/core/Kconfig| 11 + drivers/net/ethernet/mellanox/mlx5/core/Makefile | 6 +- .../net/ethernet/mellanox/mlx5/core/accel/tls.c| 71 ++ .../net/ethernet/mellanox/mlx5/core/accel/tls.h| 86 +++ drivers/net/ethernet/mellanox/mlx5/core/en.h | 21 + .../mellanox/mlx5/core/en_accel/en_accel.h | 72 ++ .../ethernet/mellanox/mlx5/core/en_accel/ipsec.h | 3 - .../net/ethernet/mellanox/mlx5/core/en_accel/tls.c | 197 + .../net/ethernet/mellanox/mlx5/core/en_accel/tls.h | 87 +++ .../mellanox/mlx5/core/en_accel/tls_rxtx.c | 278 +++ .../mellanox/mlx5/core/en_accel/tls_rxtx.h | 50 ++ .../mellanox/mlx5/core/en_accel/tls_stats.c
Re: [bug, bisected] pfifo_fast causes packet reordering
On 03/21/2018 12:44 PM, Jakob Unterwurzacher wrote: > On 21.03.18 19:43, John Fastabend wrote: >> Thats my theory at least. Are you able to test a patch if I generate >> one to fix this? > > Yes, no problem. Can you try this, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index d4907b5..1e596bd 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -30,6 +30,7 @@ struct qdisc_rate_table { enum qdisc_state_t { __QDISC_STATE_SCHED, __QDISC_STATE_DEACTIVATED, + __QDISC_STATE_RUNNING, }; struct qdisc_size_table { diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 190570f..cf7c37d 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -377,20 +377,26 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets) struct netdev_queue *txq; struct net_device *dev; struct sk_buff *skb; - bool validate; + bool more, validate; /* Dequeue packet */ + if (test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) + return false; + skb = dequeue_skb(q, &validate, packets); - if (unlikely(!skb)) + if (unlikely(!skb)) { + clear_bit(__QDISC_STATE_RUNNING, &q->state); return false; + } if (!(q->flags & TCQ_F_NOLOCK)) root_lock = qdisc_lock(q); dev = qdisc_dev(q); txq = skb_get_tx_queue(dev, skb); - - return sch_direct_xmit(skb, q, dev, txq, root_lock, validate); + more = sch_direct_xmit(skb, q, dev, txq, root_lock, validate); + clear_bit(__QDISC_STATE_RUNNING, &q->state); + return more; } > > I just tested with the flag change you suggested (see below, I had to keep > TCQ_F_CPUSTATS to prevent a crash) and I have NOT seen OOO so far. > Right, because the code expects per-cpu stats; if the CPUSTATS flag is removed it will crash.
> Thanks, > Jakob > > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index 190570f21b20..51b68ef4977b 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -792,7 +792,7 @@ struct Qdisc_ops pfifo_fast_ops __read_mostly = { > .dump = pfifo_fast_dump, > .change_tx_queue_len = pfifo_fast_change_tx_queue_len, > .owner = THIS_MODULE, > - .static_flags = TCQ_F_NOLOCK | TCQ_F_CPUSTATS, > + .static_flags = TCQ_F_CPUSTATS, > }; > EXPORT_SYMBOL(pfifo_fast_ops);
Re: [PATCH net-next 06/14] net/tls: Add generic NIC offload infrastructure
On Wed, 2018-03-21 at 19:31 +0300, Kirill Tkhai wrote: > On 21.03.2018 18:53, Boris Pismenny wrote: > > ... > > > > > > Other patches have two licenses in header. Can I distribute this > > > file under GPL license terms? > > > > > > > Sure, I'll update the license to match other files under net/tls. > > > > > > +#include > > > > +#include > > > > +#include > > > > +#include > > > > +#include > > > > + > > > > +#include > > > > +#include > > > > + > > > > +/* device_offload_lock is used to synchronize tls_dev_add > > > > + * against NETDEV_DOWN notifications. > > > > + */ > > > > +DEFINE_STATIC_PERCPU_RWSEM(device_offload_lock); > > > > + > > > > +static void tls_device_gc_task(struct work_struct *work); > > > > + > > > > +static DECLARE_WORK(tls_device_gc_work, tls_device_gc_task); > > > > +static LIST_HEAD(tls_device_gc_list); > > > > +static LIST_HEAD(tls_device_list); > > > > +static DEFINE_SPINLOCK(tls_device_lock); > > > > + > > > > +static void tls_device_free_ctx(struct tls_context *ctx) > > > > +{ > > > > +struct tls_offload_context *offlad_ctx = > > > > tls_offload_ctx(ctx); > > > > + > > > > +kfree(offlad_ctx); > > > > +kfree(ctx); > > > > +} > > > > + > > > > +static void tls_device_gc_task(struct work_struct *work) > > > > +{ > > > > +struct tls_context *ctx, *tmp; > > > > +struct list_head gc_list; > > > > +unsigned long flags; > > > > + > > > > +spin_lock_irqsave(&tls_device_lock, flags); > > > > +INIT_LIST_HEAD(&gc_list); > > > This is stack variable, and it should be initialized outside of > > > global spinlock. > > > There is LIST_HEAD() primitive for that in kernel. > > > There is one more similar place below. > > > > Sure.
> > > > > > +list_splice_init(&tls_device_gc_list, &gc_list); > > > > +spin_unlock_irqrestore(&tls_device_lock, flags); > > > > + > > > > +list_for_each_entry_safe(ctx, tmp, &gc_list, list) { > > > > +struct net_device *netdev = ctx->netdev; > > > > + > > > > +if (netdev) { > > > > +netdev->tlsdev_ops->tls_dev_del(netdev, ctx, > > > > +TLS_OFFLOAD_CTX_DIR_TX); > > > > +dev_put(netdev); > > > > +} > > > How is it possible that we meet a NULL netdev here? > > > > > This can happen in tls_device_down. tls_device_down is called > > whenever a netdev that is used for TLS inline crypto offload goes > > down. It gets called via the NETDEV_DOWN event of the netdevice > > notifier. > > > > This flow is somewhat similar to the xfrm_device netdev notifier. > > However, we do not destroy the socket (as in destroying the > > xfrm_state in xfrm_device). Instead, we clean up the netdev state > > and allow software fallback to handle the rest of the traffic. > > > > > > + > > > > +list_del(&ctx->list); > > > > +tls_device_free_ctx(ctx); > > > > +} > > > > +} > > > > + > > > > +static void tls_device_queue_ctx_destruction(struct > > > > tls_context *ctx) > > > > +{ > > > > +unsigned long flags; > > > > + > > > > +spin_lock_irqsave(&tls_device_lock, flags); > > > > +list_move_tail(&ctx->list, &tls_device_gc_list); > > > > + > > > > +/* schedule_work inside the spinlock > > > > + * to make sure tls_device_down waits for that work.
> > > > + */ > > > > +schedule_work(&tls_device_gc_work); > > > > + > > > > +spin_unlock_irqrestore(&tls_device_lock, flags); > > > > +} > > > > + > > > > +/* We assume that the socket is already connected */ > > > > +static struct net_device *get_netdev_for_sock(struct sock *sk) > > > > +{ > > > > +struct inet_sock *inet = inet_sk(sk); > > > > +struct net_device *netdev = NULL; > > > > + > > > > +netdev = dev_get_by_index(sock_net(sk), inet->cork.fl.flowi_oif); > > > > + > > > > +return netdev; > > > > +} > > > > + > > > > +static int attach_sock_to_netdev(struct sock *sk, struct > > > > net_device *netdev, > > > > + struct tls_context *ctx) > > > > +{ > > > > +int rc; > > > > + > > > > +rc = netdev->tlsdev_ops->tls_dev_add(netdev, sk, > > > > TLS_OFFLOAD_CTX_DIR_TX, > > > > + &ctx->crypto_send, > > > > + tcp_sk(sk)->write_seq); > > > > +if (rc) { > > > > +pr_err_ratelimited("The netdev has refused to offload > > > > this socket\n"); > > > > +goto out; > > > > +} > > > > + > > > > +rc = 0; > > > > +out: > > > > +return rc; > > > > +} > > > > + > > > > +static void destroy_record(struct tls_record_info *record) > > > > +{ > > > > +skb_frag_t *frag; > > > > +int nr_frags = record->num_frags; > > > > + > > > > +while (nr_frags > 0) { > > > > +frag = &record->frags[nr_frags - 1]; > > > > +__skb_frag_unref(frag); > > > > +--nr_frags; > > > > +} > > > > +kfree(record); > > > > +} > > > > + > > > > +static void delete_all_records(struct tls_offload_context > > > > *offload_ctx) > > > > +{ > > > > +struct tls_record_info *info, *temp; > > > > +
[PATCH net-next v5 2/2] net: bpf: add a test for skb_segment in test_bpf module
Without the previous commit, "modprobe test_bpf" will have the following errors: ... [ 98.149165] [ cut here ] [ 98.159362] kernel BUG at net/core/skbuff.c:3667! [ 98.169756] invalid opcode: [#1] SMP PTI [ 98.179370] Modules linked in: [ 98.179371] test_bpf(+) ... which triggers the bug the previous commit intends to fix. The skbs are constructed to mimic what mlx5 may generate. The packet size/header may not mimic real cases in production. But the processing flow is similar. Signed-off-by: Yonghong Song--- lib/test_bpf.c | 93 -- 1 file changed, 91 insertions(+), 2 deletions(-) diff --git a/lib/test_bpf.c b/lib/test_bpf.c index 2efb213..a468b5c 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -6574,6 +6574,93 @@ static bool exclude_test(int test_id) return test_id < test_range[0] || test_id > test_range[1]; } +static __init struct sk_buff *build_test_skb(void) +{ + u32 headroom = NET_SKB_PAD + NET_IP_ALIGN + ETH_HLEN; + struct sk_buff *skb[2]; + struct page *page[2]; + int i, data_size = 8; + + for (i = 0; i < 2; i++) { + page[i] = alloc_page(GFP_KERNEL); + if (!page[i]) { + if (i == 0) + goto err_page0; + else + goto err_page1; + } + + /* this will set skb[i]->head_frag */ + skb[i] = dev_alloc_skb(headroom + data_size); + if (!skb[i]) { + if (i == 0) + goto err_skb0; + else + goto err_skb1; + } + + skb_reserve(skb[i], headroom); + skb_put(skb[i], data_size); + skb[i]->protocol = htons(ETH_P_IP); + skb_reset_network_header(skb[i]); + skb_set_mac_header(skb[i], -ETH_HLEN); + + skb_add_rx_frag(skb[i], 0, page[i], 0, 64, 64); + // skb_headlen(skb[i]): 8, skb[i]->head_frag = 1 + } + + /* setup shinfo */ + skb_shinfo(skb[0])->gso_size = 1448; + skb_shinfo(skb[0])->gso_type = SKB_GSO_TCPV4; + skb_shinfo(skb[0])->gso_type |= SKB_GSO_DODGY; + skb_shinfo(skb[0])->gso_segs = 0; + skb_shinfo(skb[0])->frag_list = skb[1]; + + /* adjust skb[0]'s len */ + skb[0]->len += skb[1]->len; + skb[0]->data_len += skb[1]->data_len; + skb[0]->truesize += skb[1]->truesize; + + return 
skb[0]; + +err_skb1: + __free_page(page[1]); +err_page1: + kfree_skb(skb[0]); +err_skb0: + __free_page(page[0]); +err_page0: + return NULL; +} + +static __init int test_skb_segment(void) +{ + netdev_features_t features; + struct sk_buff *skb, *segs; + int ret = -1; + + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | + NETIF_F_IPV6_CSUM; + features |= NETIF_F_RXCSUM; + skb = build_test_skb(); + if (!skb) { + pr_info("%s: failed to build_test_skb", __func__); + goto done; + } + + segs = skb_segment(skb, features); + if (segs) { + kfree_skb_list(segs); + ret = 0; + pr_info("%s: success in skb_segment!", __func__); + } else { + pr_info("%s: failed in skb_segment!", __func__); + } + kfree_skb(skb); +done: + return ret; +} + static __init int test_bpf(void) { int i, err_cnt = 0, pass_cnt = 0; @@ -6632,9 +6719,11 @@ static int __init test_bpf_init(void) return ret; ret = test_bpf(); - destroy_bpf_tests(); - return ret; + if (ret) + return ret; + + return test_skb_segment(); } static void __exit test_bpf_exit(void) -- 2.9.5
[PATCH net-next v5 1/2] net: permit skb_segment on head_frag frag_list skb
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out. 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, 3473 netdev_features_t features) 3474 { 3475 struct sk_buff *segs = NULL; 3476 struct sk_buff *tail = NULL; ... 3665 while (pos < offset + len) { 3666 if (i >= nfrags) { 3667 BUG_ON(skb_headlen(list_skb)); 3668 3669 i = 0; 3670 nfrags = skb_shinfo(list_skb)->nr_frags; 3671 frag = skb_shinfo(list_skb)->frags; 3672 frag_skb = list_skb; ... call stack: ... #1 [883ffef03558] __crash_kexec at 8110c525 #2 [883ffef03620] crash_kexec at 8110d5cc #3 [883ffef03640] oops_end at 8101d7e7 #4 [883ffef03668] die at 8101deb2 #5 [883ffef03698] do_trap at 8101a700 #6 [883ffef036e8] do_error_trap at 8101abfe #7 [883ffef037a0] do_invalid_op at 8101acd0 #8 [883ffef037b0] invalid_op at 81a00bab [exception RIP: skb_segment+3044] RIP: 817e4dd4 RSP: 883ffef03860 RFLAGS: 00010216 RAX: 2bf6 RBX: 883feb7aaa00 RCX: 0011 RDX: 883fb87910c0 RSI: 0011 RDI: 883feb7ab500 RBP: 883ffef03928 R8: 2ce2 R9: 27da R10: 01ea R11: 2d82 R12: 883f90a1ee80 R13: 883fb8791120 R14: 883feb7abc00 R15: 2ce2 ORIG_RAX: CS: 0010 SS: 0018 #9 [883ffef03930] tcp_gso_segment at 818713e7 --- --- ... The triggering input skb has the following properties: list_skb = skb->frag_list; skb->nfrags != NULL && skb_headlen(list_skb) != 0 and skb_segment() is not able to handle a frag_list skb if its headlen (list_skb->len - list_skb->data_len) is not 0. This patch addressed the issue by handling skb_headlen(list_skb) != 0 case properly if list_skb->head_frag is true, which is expected in most cases. The head frag is processed before list_skb->frags are processed. 
Reported-by: Diptanu Gon ChoudhurySigned-off-by: Yonghong Song --- net/core/skbuff.c | 26 -- 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 715c134..23b317a 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3460,6 +3460,19 @@ void *skb_pull_rcsum(struct sk_buff *skb, unsigned int len) } EXPORT_SYMBOL_GPL(skb_pull_rcsum); +static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb) +{ + skb_frag_t head_frag; + struct page *page; + + page = virt_to_head_page(frag_skb->head); + head_frag.page.p = page; + head_frag.page_offset = frag_skb->data - + (unsigned char *)page_address(page); + head_frag.size = skb_headlen(frag_skb); + return head_frag; +} + /** * skb_segment - Perform protocol segmentation on skb. * @head_skb: buffer to segment @@ -3664,15 +3677,16 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, while (pos < offset + len) { if (i >= nfrags) { - BUG_ON(skb_headlen(list_skb)); - i = 0; nfrags = skb_shinfo(list_skb)->nr_frags; frag = skb_shinfo(list_skb)->frags; - frag_skb = list_skb; - - BUG_ON(!nfrags); + if (skb_headlen(list_skb)) { + BUG_ON(!list_skb->head_frag); + /* to make room for head_frag. */ + i--; frag--; + } + frag_skb = list_skb; if (skb_orphan_frags(frag_skb, GFP_ATOMIC) || skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC)) @@ -3689,7 +3703,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, goto err; } - *nskb_frag = *frag; + *nskb_frag = (i < 0) ? skb_head_frag_to_page_desc(frag_skb) : *frag; __skb_frag_ref(nskb_frag); size = skb_frag_size(nskb_frag); -- 2.9.5
[PATCH net-next v5 0/2] net: permit skb_segment on head_frag frag_list skb
One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out. ... 3665 while (pos < offset + len) { 3666 if (i >= nfrags) { 3667 BUG_ON(skb_headlen(list_skb)); ... The triggering input skb has the following properties: list_skb = skb->frag_list; skb->nfrags != NULL && skb_headlen(list_skb) != 0 and skb_segment() is not able to handle a frag_list skb if its headlen (list_skb->len - list_skb->data_len) is not 0. Patch #1 provides a simple solution to avoid BUG_ON. If list_skb->head_frag is true, its page-backed frag will be processed before the list_skb->frags. Patch #2 provides a test case in test_bpf module which constructs a skb and calls skb_segment() directly. The test case is able to trigger the BUG_ON without Patch #1. The patch has been tested in the following setup: ipv6_host <-> nat_server <-> ipv4_host where nat_server has a bpf program doing ipv4<->ipv6 translation and forwarding through clsact hook bpf_skb_change_proto. Changelog: v4 -> v5: . Replace local variable head_frag with a static inline function skb_head_frag_to_page_desc which gets the head_frag on-demand. This makes code more readable and also does not increase the stack size, from Alexander. . Remove the "if(nfrags)" guard for skb_orphan_frags and skb_zerocopy_clone as I found that they can handle zero-frag skb (with non-zero skb_headlen(skb)) properly. . Properly release segment list from skb_segment() in the test, from Eric. v3 -> v4: . Remove dynamic memory allocation and use rewinding for both index and frag to remove one branch in fast path, from Alexander. . Fix a bunch of issues in test_bpf skb_segment() test, including proper way to allocate skb, proper function argument for skb_add_rx_frag and not freeing skb, etc., from Eric. v2 -> v3: .
Use starting frag index -1 (instead of 0) to special process head_frag before other frags in the skb, from Alexander Duyck. v1 -> v2: . Removed never-hit BUG_ON, spotted by Linyu Yuan. Yonghong Song (2): net: permit skb_segment on head_frag frag_list skb net: bpf: add a test for skb_segment in test_bpf module lib/test_bpf.c| 93 +-- net/core/skbuff.c | 26 2 files changed, 111 insertions(+), 8 deletions(-) -- 2.9.5
Re: [PATCH net-next V2] Documentation/networking: Add net DIM documentation
On Wed, Mar 21, 2018 at 12:44:29PM -0700, Florian Fainelli wrote: > On 03/21/2018 12:37 PM, Randy Dunlap wrote: > > On 03/21/2018 11:33 AM, Tal Gilboa wrote: > >> Net DIM is a generic algorithm, purposed for dynamically > >> optimizing network devices interrupt moderation. This > >> document describes how it works and how to use it. > >> > >> Signed-off-by: Tal Gilboa> >> --- > >> Documentation/networking/net_dim.txt | 174 > >> +++ > >> 1 file changed, 174 insertions(+) > >> create mode 100644 Documentation/networking/net_dim.txt > >> > >> diff --git a/Documentation/networking/net_dim.txt > >> b/Documentation/networking/net_dim.txt > >> new file mode 100644 > >> index 000..9cb31c5 > >> --- /dev/null > >> +++ b/Documentation/networking/net_dim.txt > >> @@ -0,0 +1,174 @@ > >> +Net DIM - Generic Network Dynamic Interrupt Moderation > >> +== > >> + > >> +Author: > >> + Tal Gilboa > >> + > >> + > >> +Contents > >> += > >> + > >> +- Assumptions > >> +- Introduction > >> +- The Net DIM Algorithm > >> +- Registering a Network Device to DIM > >> +- Example > >> + > >> +Part 0: Assumptions > >> +== > >> + > >> +This document assumes the reader has basic knowledge in network drivers > >> +and in general interrupt moderation. > >> + > >> + > >> +Part I: Introduction > >> +== > >> + > >> +Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the > >> +interrupt moderation configuration of a channel in order to optimize > >> packet > >> +processing. The mechanism includes an algorithm which decides if and how > >> to > >> +change moderation parameters for a channel, usually by performing an > >> analysis on > >> +runtime data sampled from the system. Net DIM is such a mechanism. In each > >> +iteration of the algorithm, it analyses a given sample of the data, > >> compares it > >> +to the previous sample and if required, it can decide to change some of > >> the > >> +interrupt moderation configuration fields. 
The data sample is composed of > >> data > >> +bandwidth, the number of packets and the number of events. The time > >> between > >> +samples is also measured. Net DIM compares the current and the previous > >> data and > >> +returns an adjusted interrupt moderation configuration object. In some > >> cases, > >> +the algorithm might decide not to change anything. The configuration > >> fields are > >> +the minimum duration (microseconds) allowed between events and the maximum > >> +number of wanted packets per event. The Net DIM algorithm ascribes > >> importance to > >> +increase bandwidth over reducing interrupt rate. > >> + > >> + > >> +Part II: The Net DIM Algorithm > >> +=== > >> + > >> +Each iteration of the Net DIM algorithm follows these steps: > >> +1. Calculates new data sample. > >> +2. Compares it to previous sample. > >> +3. Makes a decision - suggests interrupt moderation configuration fields. > >> +4. Applies a schedule work function, which applies suggested > >> configuration. > >> + > >> +The first two steps are straightforward, both the new and the previous > >> data are > >> +supplied by the driver registered to Net DIM. The previous data is the > >> new data > >> +supplied to the previous iteration. The comparison step checks the > >> difference > >> +between the new and previous data and decides on the result of the last > >> step. > >> +A step would result as "better" if bandwidth increases and as "worse" if > >> +bandwidth reduces. If there is no change in bandwidth, the packet rate is > >> +compared in a similar fashion - increase == "better" and decrease == > >> "worse". > >> +In case there is no change in the packet rate as well, the interrupt rate > >> is > >> +compared. Here the algorithm tries to optimize for lower interrupt rate > >> so an > >> +increase in the interrupt rate is considered "worse" and a decrease is > >> +considered "better". 
Step #2 has an optimization for avoiding false > >> results: it > >> +only considers a difference between samples as valid if it is greater > >> than a > >> +certain percentage. Also, since Net DIM does not measure anything by > >> itself, it > >> +assumes the data provided by the driver is valid. > >> + > >> +Step #3 decides on the suggested configuration based on the result from > >> step #2 > >> +and the internal state of the algorithm. The states reflect the > >> "direction" of > >> +the algorithm: is it going left (reducing moderation), right (increasing > >> +moderation) or standing still. Another optimization is that if a decision > >> +to stay still is made multiple times, the interval between iterations of > >> the > >> +algorithm would increase in order to reduce calculation overhead. Also, > >> after > >> +"parking" on one of the most left or most right decisions, the algorithm > >>
Re: [PATCH net-next V2] Documentation/networking: Add net DIM documentation
On Wed, Mar 21, 2018 at 08:33:45PM +0200, Tal Gilboa wrote: > Net DIM is a generic algorithm, purposed for dynamically > optimizing network devices interrupt moderation. This > document describes how it works and how to use it. > > Signed-off-by: Tal Gilboa> --- > Documentation/networking/net_dim.txt | 174 > +++ > 1 file changed, 174 insertions(+) > create mode 100644 Documentation/networking/net_dim.txt > > diff --git a/Documentation/networking/net_dim.txt > b/Documentation/networking/net_dim.txt > new file mode 100644 > index 000..9cb31c5 > --- /dev/null > +++ b/Documentation/networking/net_dim.txt > @@ -0,0 +1,174 @@ > +Net DIM - Generic Network Dynamic Interrupt Moderation > +== > + > +Author: > + Tal Gilboa > + > + > +Contents > += > + > +- Assumptions > +- Introduction > +- The Net DIM Algorithm > +- Registering a Network Device to DIM > +- Example > + > +Part 0: Assumptions > +== > + > +This document assumes the reader has basic knowledge in network drivers > +and in general interrupt moderation. > + > + > +Part I: Introduction > +== > + > +Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the > +interrupt moderation configuration of a channel in order to optimize packet > +processing. The mechanism includes an algorithm which decides if and how to > +change moderation parameters for a channel, usually by performing an > analysis on > +runtime data sampled from the system. Net DIM is such a mechanism. In each > +iteration of the algorithm, it analyses a given sample of the data, compares > it > +to the previous sample and if required, it can decide to change some of the > +interrupt moderation configuration fields. The data sample is composed of > data > +bandwidth, the number of packets and the number of events. The time between > +samples is also measured. Net DIM compares the current and the previous data > and > +returns an adjusted interrupt moderation configuration object. 
In some cases, > +the algorithm might decide not to change anything. The configuration fields > are > +the minimum duration (microseconds) allowed between events and the maximum > +number of wanted packets per event. The Net DIM algorithm ascribes > importance to > +increase bandwidth over reducing interrupt rate. > + > + > +Part II: The Net DIM Algorithm > +=== > + > +Each iteration of the Net DIM algorithm follows these steps: > +1. Calculates new data sample. > +2. Compares it to previous sample. > +3. Makes a decision - suggests interrupt moderation configuration fields. > +4. Applies a schedule work function, which applies suggested configuration. > + > +The first two steps are straightforward, both the new and the previous data > are > +supplied by the driver registered to Net DIM. The previous data is the new > data > +supplied to the previous iteration. The comparison step checks the difference > +between the new and previous data and decides on the result of the last step. > +A step would result as "better" if bandwidth increases and as "worse" if > +bandwidth reduces. If there is no change in bandwidth, the packet rate is > +compared in a similar fashion - increase == "better" and decrease == "worse". > +In case there is no change in the packet rate as well, the interrupt rate is > +compared. Here the algorithm tries to optimize for lower interrupt rate so an > +increase in the interrupt rate is considered "worse" and a decrease is > +considered "better". Step #2 has an optimization for avoiding false results: > it > +only considers a difference between samples as valid if it is greater than a > +certain percentage. Also, since Net DIM does not measure anything by itself, > it > +assumes the data provided by the driver is valid. > + > +Step #3 decides on the suggested configuration based on the result from step > #2 > +and the internal state of the algorithm. 
The states reflect the "direction" > of > +the algorithm: is it going left (reducing moderation), right (increasing > +moderation) or standing still. Another optimization is that if a decision > +to stay still is made multiple times, the interval between iterations of the > +algorithm would increase in order to reduce calculation overhead. Also, after I wonder if this increased interval can lead to packet drops due to some impulse? Like, the card is receiving a low volume of packets and suddenly a new flow starts at line rate, for example. If the max interval is not too aggressive, this wouldn't be a problem. (sorry, I didn't read much of the implementation nor the drivers already using it) > +"parking" on one of the most left or most right decisions, the algorithm may > +decide to verify this decision by taking a step in the other direction. This > is > +done in order to avoid getting stuck in a "deep sleep"
Re: [PATCH net-next v4 2/2] net: bpf: add a test for skb_segment in test_bpf module
On 3/21/18 8:26 AM, Eric Dumazet wrote: On 03/20/2018 11:47 PM, Yonghong Song wrote: +static __init int test_skb_segment(void) +{ + netdev_features_t features; + struct sk_buff *skb; + int ret = -1; + + features = NETIF_F_SG | NETIF_F_GSO_PARTIAL | NETIF_F_IP_CSUM | + NETIF_F_IPV6_CSUM; + features |= NETIF_F_RXCSUM; + skb = build_test_skb(); + if (!skb) { + pr_info("%s: failed to build_test_skb", __func__); + goto done; + } + + if (skb_segment(skb, features)) { + ret = 0; + pr_info("%s: success in skb_segment!", __func__); + } else { + pr_info("%s: failed in skb_segment!", __func__); + } + kfree_skb(skb); If skb_segment() was successful (original) skb was already freed. kfree_skb(old_skb) should thus panic the box, if you run this code on a kernel having some debugging features like KASAN I tried with KASAN. It does not panic. Looking at the code in net/core/dev.c: validate_xmit_skb: static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device *dev, bool *again) ... if (netif_needs_gso(skb, features)) { struct sk_buff *segs; segs = skb_gso_segment(skb, features); if (IS_ERR(segs)) { goto out_kfree_skb; } else if (segs) { consume_skb(skb); skb = segs; } ... out_kfree_skb: kfree_skb(skb); which also indicates kfree_skb/consume_skb probably is the right way to free skb after skb_gso_segment/skb_segment. This probably explains why my above kfree_skb(skb) does not crash. So you must store in a variable the return of skb_segment(), to be able to free skb(s), using kfree_skb_list() Totally agree. Will make the change. Thanks! +done: + return ret; +} +
Re: [PATCH net-next v3 1/2] net: permit skb_segment on head_frag frag_list skb
On 3/21/18 7:59 AM, Alexander Duyck wrote: On Tue, Mar 20, 2018 at 10:02 PM, Yonghong Songwrote: On 3/20/18 4:50 PM, Alexander Duyck wrote: On Tue, Mar 20, 2018 at 4:21 PM, Yonghong Song wrote: One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at function skb_segment(), line 3667. The bpf program attaches to clsact ingress, calls bpf_skb_change_proto to change protocol from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect to send the changed packet out. 3472 struct sk_buff *skb_segment(struct sk_buff *head_skb, 3473 netdev_features_t features) 3474 { 3475 struct sk_buff *segs = NULL; 3476 struct sk_buff *tail = NULL; ... 3665 while (pos < offset + len) { 3666 if (i >= nfrags) { 3667 BUG_ON(skb_headlen(list_skb)); 3668 3669 i = 0; 3670 nfrags = skb_shinfo(list_skb)->nr_frags; 3671 frag = skb_shinfo(list_skb)->frags; 3672 frag_skb = list_skb; ... call stack: ... #1 [883ffef03558] __crash_kexec at 8110c525 #2 [883ffef03620] crash_kexec at 8110d5cc #3 [883ffef03640] oops_end at 8101d7e7 #4 [883ffef03668] die at 8101deb2 #5 [883ffef03698] do_trap at 8101a700 #6 [883ffef036e8] do_error_trap at 8101abfe #7 [883ffef037a0] do_invalid_op at 8101acd0 #8 [883ffef037b0] invalid_op at 81a00bab [exception RIP: skb_segment+3044] RIP: 817e4dd4 RSP: 883ffef03860 RFLAGS: 00010216 RAX: 2bf6 RBX: 883feb7aaa00 RCX: 0011 RDX: 883fb87910c0 RSI: 0011 RDI: 883feb7ab500 RBP: 883ffef03928 R8: 2ce2 R9: 27da R10: 01ea R11: 2d82 R12: 883f90a1ee80 R13: 883fb8791120 R14: 883feb7abc00 R15: 2ce2 ORIG_RAX: CS: 0010 SS: 0018 #9 [883ffef03930] tcp_gso_segment at 818713e7 --- --- ... The triggering input skb has the following properties: list_skb = skb->frag_list; skb->nfrags != NULL && skb_headlen(list_skb) != 0 and skb_segment() is not able to handle a frag_list skb if its headlen (list_skb->len - list_skb->data_len) is not 0. 
This patch addressed the issue by handling skb_headlen(list_skb) != 0 case properly if list_skb->head_frag is true, which is expected in most cases. The head frag is processed before list_skb->frags are processed. Reported-by: Diptanu Gon Choudhury Signed-off-by: Yonghong Song --- net/core/skbuff.c | 51 +-- 1 file changed, 37 insertions(+), 14 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 715c134..59bbc06 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3475,7 +3475,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, struct sk_buff *segs = NULL; struct sk_buff *tail = NULL; struct sk_buff *list_skb = skb_shinfo(head_skb)->frag_list; - skb_frag_t *frag = skb_shinfo(head_skb)->frags; + skb_frag_t *frag = skb_shinfo(head_skb)->frags, *head_frag = NULL; I think you misunderstood me. I wasn't saying you allocate head_frag. I was saying you could move the declaration down. Sorry for my misunderstanding. I did understand your intention of moving the declaration down in order to save stack space. I thought that we cannot really move declaration down (although it works in C, but semantically it is not quite right, more later), so I moved on to use runtime allocation. But indeed skb_frag_t is not big (16 bytes), it could live on the stack. unsigned int mss = skb_shinfo(head_skb)->gso_size; unsigned int doffset = head_skb->data - skb_mac_header(head_skb); struct sk_buff *frag_skb = head_skb; @@ -3664,19 +3664,39 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, while (pos < offset + len) { So right here in the loop you could add a "skb_frag_t head_frag;" just so we declare it here and save ourselves the stack space. I actually tried to move "skb_frag_t head_frag". The stack size remains the same, 0xc0. This is related to how C compiler allocates stack space. The declaration place won't decide the stack size as long as the declaration dictates the usage. The stack size is really determined by liveness analysis. 
Further, we have code like: do { while (pos < offset + len) { if (i >= nfrags) { ... head_frag = ... } ... = head_frag; // head_frag access guaranteed after
RE: [PATCH net-next RFC V1 1/5] net: Introduce peer to peer one step PTP time stamping.
> -Original Message- > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] > On Behalf Of Richard Cochran > Sent: Wednesday, March 21, 2018 11:58 AM > To: netdev@vger.kernel.org > Cc: devicet...@vger.kernel.org; Andrew Lunn; David Miller > ; Florian Fainelli ; Mark Rutland > ; Miroslav Lichvar ; Rob > Herring ; Willem de Bruijn > Subject: [PATCH net-next RFC V1 1/5] net: Introduce peer to peer one step PTP > + > + /* > + * Same as HWTSTAMP_TX_ONESTEP_SYNC, but also enables time > + * stamp insertion directly into PDelay_Resp packets. In this > + * case, neither transmitted Sync nor PDelay_Resp packets will > + * receive a time stamp via the socket error queue. > + */ > + HWTSTAMP_TX_ONESTEP_P2P, > }; > I am guessing that we expect all devices which support onestep P2P messages will always support onestep SYNC as well? Thanks, Jake
RE: [PATCH net-next v2] net: mvpp2: Don't use dynamic allocs for local variables
Hi Maxime,

Please check two points:

1) mvpp2_prs_flow_find() returns the TID if found, and TID=0 is a valid "found" value. For not-found, use -ENOENT (just like your mvpp2_prs_vlan_find).

2) The original code always used zero-allocated "mvpp2_prs_entry *pe" storage. Please check the correctness of the new "mvpp2_prs_entry pe" without memset(&pe, 0, sizeof(pe)); in all procedures where pe = kzalloc() has been replaced.

Thanks,
Yan Markman
Tel. 05-44732819

-----Original Message-----
From: Maxime Chevallier [mailto:maxime.chevall...@bootlin.com]
Sent: Wednesday, March 21, 2018 5:14 PM
To: da...@davemloft.net
Cc: Maxime Chevallier; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Antoine Tenart ; thomas.petazz...@bootlin.com; gregory.clem...@bootlin.com; miquel.ray...@bootlin.com; Nadav Haklai ; Stefan Chulski ; Yan Markman ; m...@semihalf.com
Subject: [PATCH net-next v2] net: mvpp2: Don't use dynamic allocs for local variables

Some helper functions that search for given entries in the TCAM filter on the PPv2 controller make use of dynamically allocated temporary variables, allocated with GFP_KERNEL. These functions can be called in atomic context, and dynamic alloc is not really needed in these cases anyway.

This commit gets rid of dynamic allocs and uses stack allocation in the following functions, and where they're used:
 - mvpp2_prs_flow_find
 - mvpp2_prs_vlan_find
 - mvpp2_prs_double_vlan_find
 - mvpp2_prs_mac_da_range_find

For all these functions, instead of returning a temporary object representing the TCAM entry, we simply return the TCAM id that matches the requested entry.

Signed-off-by: Maxime Chevallier
---
V2: Remove unnecessary brackets, following Antoine Tenart's review.
 drivers/net/ethernet/marvell/mvpp2.c | 289 +++
 1 file changed, 127 insertions(+), 162 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 9bd35f2291d6..28e33e139178 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -1913,16 +1913,11 @@ static void mvpp2_prs_sram_offset_set(struct mvpp2_prs_entry *pe,
 }
 
 /* Find parser flow entry */
-static struct mvpp2_prs_entry *mvpp2_prs_flow_find(struct mvpp2 *priv, int flow)
+static int mvpp2_prs_flow_find(struct mvpp2 *priv, int flow)
 {
-	struct mvpp2_prs_entry *pe;
+	struct mvpp2_prs_entry pe;
 	int tid;
 
-	pe = kzalloc(sizeof(*pe), GFP_KERNEL);
-	if (!pe)
-		return NULL;
-	mvpp2_prs_tcam_lu_set(pe, MVPP2_PRS_LU_FLOWS);
-
 	/* Go through the all entires with MVPP2_PRS_LU_FLOWS */
 	for (tid = MVPP2_PRS_TCAM_SRAM_SIZE - 1; tid >= 0; tid--) {
 		u8 bits;
@@ -1931,17 +1926,16 @@ static struct mvpp2_prs_entry *mvpp2_prs_flow_find(struct mvpp2 *priv, int flow)
 		    priv->prs_shadow[tid].lu != MVPP2_PRS_LU_FLOWS)
 			continue;
 
-		pe->index = tid;
-		mvpp2_prs_hw_read(priv, pe);
-		bits = mvpp2_prs_sram_ai_get(pe);
+		pe.index = tid;
+		mvpp2_prs_hw_read(priv, &pe);
+		bits = mvpp2_prs_sram_ai_get(&pe);
 
 		/* Sram store classification lookup ID in AI bits [5:0] */
 		if ((bits & MVPP2_PRS_FLOW_ID_MASK) == flow)
-			return pe;
+			return tid;
 	}
 
-	kfree(pe);
-	return NULL;
+	return -ENOENT;
 }
 
 /* Return first free tcam index, seeking from start to end */
@@ -2189,16 +2183,12 @@ static void mvpp2_prs_dsa_tag_ethertype_set(struct mvpp2 *priv, int port,
 }
 
 /* Search for existing single/triple vlan entry */
-static struct mvpp2_prs_entry *mvpp2_prs_vlan_find(struct mvpp2 *priv,
-						   unsigned short tpid, int ai)
+static int mvpp2_prs_vlan_find(struct mvpp2 *priv, unsigned short tpid,
+			       int ai)
 {
-	struct mvpp2_prs_entry *pe;
+	struct mvpp2_prs_entry pe;
 	int tid;
 
-	pe = kzalloc(sizeof(*pe), GFP_KERNEL);
-	if (!pe)
-		return NULL;
-	mvpp2_prs_tcam_lu_set(pe, MVPP2_PRS_LU_VLAN);
+	memset(&pe, 0, sizeof(pe));
 
 	/* Go through the all entries with MVPP2_PRS_LU_VLAN */
 	for (tid = MVPP2_PE_FIRST_FREE_TID;
@@ -2210,19 +2200,19 @@ static struct mvpp2_prs_entry *mvpp2_prs_vlan_find(struct mvpp2 *priv,
 		    priv->prs_shadow[tid].lu != MVPP2_PRS_LU_VLAN)
 			continue;
 
-		pe->index = tid;
+		pe.index = tid;
 
-		mvpp2_prs_hw_read(priv, pe);
-		match = mvpp2_prs_tcam_data_cmp(pe, 0, swab16(tpid));
+		mvpp2_prs_hw_read(priv, &pe);
+		match = mvpp2_prs_tcam_data_cmp(&pe, 0,
Re: [PATCH net-next V2] Documentation/networking: Add net DIM documentation
On 03/21/2018 12:37 PM, Randy Dunlap wrote: > On 03/21/2018 11:33 AM, Tal Gilboa wrote: >> Net DIM is a generic algorithm, purposed for dynamically >> optimizing network devices interrupt moderation. This >> document describes how it works and how to use it. >> >> Signed-off-by: Tal Gilboa>> --- >> Documentation/networking/net_dim.txt | 174 >> +++ >> 1 file changed, 174 insertions(+) >> create mode 100644 Documentation/networking/net_dim.txt >> >> diff --git a/Documentation/networking/net_dim.txt >> b/Documentation/networking/net_dim.txt >> new file mode 100644 >> index 000..9cb31c5 >> --- /dev/null >> +++ b/Documentation/networking/net_dim.txt >> @@ -0,0 +1,174 @@ >> +Net DIM - Generic Network Dynamic Interrupt Moderation >> +== >> + >> +Author: >> +Tal Gilboa >> + >> + >> +Contents >> += >> + >> +- Assumptions >> +- Introduction >> +- The Net DIM Algorithm >> +- Registering a Network Device to DIM >> +- Example >> + >> +Part 0: Assumptions >> +== >> + >> +This document assumes the reader has basic knowledge in network drivers >> +and in general interrupt moderation. >> + >> + >> +Part I: Introduction >> +== >> + >> +Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the >> +interrupt moderation configuration of a channel in order to optimize packet >> +processing. The mechanism includes an algorithm which decides if and how to >> +change moderation parameters for a channel, usually by performing an >> analysis on >> +runtime data sampled from the system. Net DIM is such a mechanism. In each >> +iteration of the algorithm, it analyses a given sample of the data, >> compares it >> +to the previous sample and if required, it can decide to change some of the >> +interrupt moderation configuration fields. The data sample is composed of >> data >> +bandwidth, the number of packets and the number of events. The time between >> +samples is also measured. 
Net DIM compares the current and the previous >> data and >> +returns an adjusted interrupt moderation configuration object. In some >> cases, >> +the algorithm might decide not to change anything. The configuration fields >> are >> +the minimum duration (microseconds) allowed between events and the maximum >> +number of wanted packets per event. The Net DIM algorithm ascribes >> importance to >> +increase bandwidth over reducing interrupt rate. >> + >> + >> +Part II: The Net DIM Algorithm >> +=== >> + >> +Each iteration of the Net DIM algorithm follows these steps: >> +1. Calculates new data sample. >> +2. Compares it to previous sample. >> +3. Makes a decision - suggests interrupt moderation configuration fields. >> +4. Applies a schedule work function, which applies suggested configuration. >> + >> +The first two steps are straightforward, both the new and the previous data >> are >> +supplied by the driver registered to Net DIM. The previous data is the new >> data >> +supplied to the previous iteration. The comparison step checks the >> difference >> +between the new and previous data and decides on the result of the last >> step. >> +A step would result as "better" if bandwidth increases and as "worse" if >> +bandwidth reduces. If there is no change in bandwidth, the packet rate is >> +compared in a similar fashion - increase == "better" and decrease == >> "worse". >> +In case there is no change in the packet rate as well, the interrupt rate is >> +compared. Here the algorithm tries to optimize for lower interrupt rate so >> an >> +increase in the interrupt rate is considered "worse" and a decrease is >> +considered "better". Step #2 has an optimization for avoiding false >> results: it >> +only considers a difference between samples as valid if it is greater than a >> +certain percentage. Also, since Net DIM does not measure anything by >> itself, it >> +assumes the data provided by the driver is valid. 
>> + >> +Step #3 decides on the suggested configuration based on the result from >> step #2 >> +and the internal state of the algorithm. The states reflect the "direction" >> of >> +the algorithm: is it going left (reducing moderation), right (increasing >> +moderation) or standing still. Another optimization is that if a decision >> +to stay still is made multiple times, the interval between iterations of the >> +algorithm would increase in order to reduce calculation overhead. Also, >> after >> +"parking" on one of the most left or most right decisions, the algorithm may >> +decide to verify this decision by taking a step in the other direction. >> This is >> +done in order to avoid getting stuck in a "deep sleep" scenario. Once a >> +decision is made, an interrupt moderation configuration is selected from >> +the predefined profiles. > > I think a short description of the predefined profiles could
Re: [PATCH v2 bpf-next 4/8] tracepoint: compute num_args at build time
On Wed, Mar 21, 2018 at 11:54 AM, Alexei Starovoitov wrote:
>
> add fancy macro to compute number of arguments passed into tracepoint
> at compile time and store it as part of 'struct tracepoint'.

We should probably do this __COUNT() thing in some generic header, we just talked last week about another use case entirely.

And wouldn't it be nice to just have some generic infrastructure like this:

  /*
   * This counts to ten.
   *
   * Any more than that, and we'd need to take off our shoes
   */
  #define __GET_COUNT(_0,_1,_2,_3,_4,_5,_6,_7,_8,_9,_10,_n,...) _n
  #define __COUNT(...) \
	__GET_COUNT(__VA_ARGS__,10,9,8,7,6,5,4,3,2,1,0)
  #define COUNT(...) __COUNT(dummy,##__VA_ARGS__)

  #define __CONCAT(a,b) a##b
  #define __CONCATENATE(a,b) __CONCAT(a,b)

and then you can do things like:

  #define fn(...) __CONCATENATE(fn,COUNT(__VA_ARGS__))(__VA_ARGS__)

which turns "fn(x,y,z)" into "fn3(x,y,z)". That can be useful for things like "max(a,b,c,d)" expanding to "max4()", and then you can just have the trivial

  #define max3(a,b,c) max2(a,max2(b,c))

etc (with proper parentheses, of course).

And I'd rather not have that function name concatenation be part of the counting logic, because we actually may have different ways of counting, so the concatenation is separate. In particular, in the other situation this came up for, the counting was in argument _pairs_, so you'd use a "COUNT_PAIR()" instead of "COUNT()".

NOTE NOTE NOTE! The above was slightly tested and then cut-and-pasted. I might have screwed up at any point. Think of it as pseudo-code.

Linus
Re: [bug, bisected] pfifo_fast causes packet reordering
On 21.03.18 19:43, John Fastabend wrote:
> That's my theory at least. Are you able to test a patch if I generate
> one to fix this?

Yes, no problem. I just tested with the flag change you suggested (see below; I had to keep TCQ_F_CPUSTATS to prevent a crash) and I have NOT seen OOO so far.

Thanks,
Jakob

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 190570f21b20..51b68ef4977b 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -792,7 +792,7 @@ struct Qdisc_ops pfifo_fast_ops __read_mostly = {
 	.dump			= pfifo_fast_dump,
 	.change_tx_queue_len	= pfifo_fast_change_tx_queue_len,
 	.owner			= THIS_MODULE,
-	.static_flags		= TCQ_F_NOLOCK | TCQ_F_CPUSTATS,
+	.static_flags		= TCQ_F_CPUSTATS,
 };
 EXPORT_SYMBOL(pfifo_fast_ops);
Re: [PATCH net-next V2] Documentation/networking: Add net DIM documentation
On 03/21/2018 11:33 AM, Tal Gilboa wrote: > Net DIM is a generic algorithm, purposed for dynamically > optimizing network devices interrupt moderation. This > document describes how it works and how to use it. > > Signed-off-by: Tal Gilboa> --- > Documentation/networking/net_dim.txt | 174 > +++ > 1 file changed, 174 insertions(+) > create mode 100644 Documentation/networking/net_dim.txt > > diff --git a/Documentation/networking/net_dim.txt > b/Documentation/networking/net_dim.txt > new file mode 100644 > index 000..9cb31c5 > --- /dev/null > +++ b/Documentation/networking/net_dim.txt > @@ -0,0 +1,174 @@ > +Net DIM - Generic Network Dynamic Interrupt Moderation > +== > + > +Author: > + Tal Gilboa > + > + > +Contents > += > + > +- Assumptions > +- Introduction > +- The Net DIM Algorithm > +- Registering a Network Device to DIM > +- Example > + > +Part 0: Assumptions > +== > + > +This document assumes the reader has basic knowledge in network drivers > +and in general interrupt moderation. > + > + > +Part I: Introduction > +== > + > +Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the > +interrupt moderation configuration of a channel in order to optimize packet > +processing. The mechanism includes an algorithm which decides if and how to > +change moderation parameters for a channel, usually by performing an > analysis on > +runtime data sampled from the system. Net DIM is such a mechanism. In each > +iteration of the algorithm, it analyses a given sample of the data, compares > it > +to the previous sample and if required, it can decide to change some of the > +interrupt moderation configuration fields. The data sample is composed of > data > +bandwidth, the number of packets and the number of events. The time between > +samples is also measured. Net DIM compares the current and the previous data > and > +returns an adjusted interrupt moderation configuration object. In some cases, > +the algorithm might decide not to change anything. 
The configuration fields > are > +the minimum duration (microseconds) allowed between events and the maximum > +number of wanted packets per event. The Net DIM algorithm ascribes > importance to > +increase bandwidth over reducing interrupt rate. > + > + > +Part II: The Net DIM Algorithm > +=== > + > +Each iteration of the Net DIM algorithm follows these steps: > +1. Calculates new data sample. > +2. Compares it to previous sample. > +3. Makes a decision - suggests interrupt moderation configuration fields. > +4. Applies a schedule work function, which applies suggested configuration. > + > +The first two steps are straightforward, both the new and the previous data > are > +supplied by the driver registered to Net DIM. The previous data is the new > data > +supplied to the previous iteration. The comparison step checks the difference > +between the new and previous data and decides on the result of the last step. > +A step would result as "better" if bandwidth increases and as "worse" if > +bandwidth reduces. If there is no change in bandwidth, the packet rate is > +compared in a similar fashion - increase == "better" and decrease == "worse". > +In case there is no change in the packet rate as well, the interrupt rate is > +compared. Here the algorithm tries to optimize for lower interrupt rate so an > +increase in the interrupt rate is considered "worse" and a decrease is > +considered "better". Step #2 has an optimization for avoiding false results: > it > +only considers a difference between samples as valid if it is greater than a > +certain percentage. Also, since Net DIM does not measure anything by itself, > it > +assumes the data provided by the driver is valid. > + > +Step #3 decides on the suggested configuration based on the result from step > #2 > +and the internal state of the algorithm. The states reflect the "direction" > of > +the algorithm: is it going left (reducing moderation), right (increasing > +moderation) or standing still. 
Another optimization is that if a decision > +to stay still is made multiple times, the interval between iterations of the > +algorithm would increase in order to reduce calculation overhead. Also, after > +"parking" on one of the most left or most right decisions, the algorithm may > +decide to verify this decision by taking a step in the other direction. This > is > +done in order to avoid getting stuck in a "deep sleep" scenario. Once a > +decision is made, an interrupt moderation configuration is selected from > +the predefined profiles. I think a short description of the predefined profiles could help. > + > +The last step is to notify the registered driver that it should apply the > +suggested configuration. This is done by scheduling a work function, defined > by > +the Net
[PATCH][next] gre: fix TUNNEL_SEQ bit check on sequence numbering
From: Colin Ian King

The current logic of flags | TUNNEL_SEQ is always non-zero and hence sequence numbers are always incremented no matter the setting of the TUNNEL_SEQ bit. Fix this by using & instead of |.

Detected by CoverityScan, CID#1466039 ("Operands don't affect result")

Fixes: 77a5196a804e ("gre: add sequence number for collect md mode.")
Signed-off-by: Colin Ian King
---
 net/ipv4/ip_gre.c  | 2 +-
 net/ipv6/ip6_gre.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 2fa2ef2e2af9..9ab1aa2f7660 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -550,7 +550,7 @@ static void gre_fb_xmit(struct sk_buff *skb, struct net_device *dev,
 			 (TUNNEL_CSUM | TUNNEL_KEY | TUNNEL_SEQ);
 	gre_build_header(skb, tunnel_hlen, flags, proto,
 			 tunnel_id_to_key32(tun_info->key.tun_id),
-			 (flags | TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
+			 (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++) : 0);
 
 	df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 0bcefc480aeb..3a98c694da5f 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -725,7 +725,7 @@ static netdev_tx_t __gre6_xmit(struct sk_buff *skb,
 		gre_build_header(skb, tunnel->tun_hlen, flags, protocol,
 				 tunnel_id_to_key32(tun_info->key.tun_id),
-				 (flags | TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
+				 (flags & TUNNEL_SEQ) ? htonl(tunnel->o_seqno++)
 						      : 0);
 	} else {
-- 
2.15.1
Re: [PATCH net-next V2] Documentation/networking: Add net DIM documentation
Hi Tal, On 03/21/2018 11:33 AM, Tal Gilboa wrote: > Net DIM is a generic algorithm, purposed for dynamically > optimizing network devices interrupt moderation. This > document describes how it works and how to use it. Thanks a lot for providing this documentation, this is very helpful! A few things that could be good to be expanded upon a little bit: - HW must support configuring a timeout per channel - HW must support configuring a number of packets before getting an interrupt per channel Does that sound about right? [snip] > +In order to use Net DIM from a networking driver, the driver needs to call > the > +main net_dim() function. The recommended method is to call net_dim() on each > +interrupt. I would make it a bit clearer that this is on each invocation of the interrupt service routine function. With NAPI + net DIM working in concert you would not actually get one packet per interrupt consistently, it would largely depend on the rate, right? > Since Net DIM has a built-in moderation and it might decide to skip > +iterations under certain conditions, there is no need to moderate the > net_dim() > +calls as well. As mentioned above, the driver needs to provide an object of > type > +struct net_dim to the net_dim() function call. It is advised for each entity > +using Net DIM to hold a struct net_dim as part of its data structure and use > it > +as the main Net DIM API object. The struct net_dim_sample should hold the > latest > +bytes, packets and interrupts count. No need to perform any calculations, > just > +include the raw data. > + > +The net_dim() call itself does not return anything. Instead Net DIM relies on > +the driver to provide a callback function, which is called when the algorithm > +decides to make a change in the interrupt moderation parameters. This > callback > +will be scheduled and run in a separate thread in order not to add overhead > to > +the data flow. 
> +After the work is done, Net DIM algorithm needs to be set to
> +the proper state in order to move to the next iteration.
> +
> +
> +Part IV: Example
> +=
> +
> +The following code demonstrates how to register a driver to Net DIM. The actual
> +usage is not complete but it should make the outline of the usage clear.

It could be worth to touch a word or two about reflecting the use of Net DIM within the driver into ethtool_coalesce::use_adaptive_rx_coalesce and ethtool_coalesce::use_adaptive_tx_coalesce?

> +
> +my_driver.c:
> +
> +#include 
> +
> +/* Callback for net DIM to schedule on a decision to change moderation */
> +void my_driver_do_dim_work(struct work_struct *work)
> +{
> +	/* Get struct net_dim from struct work_struct */
> +	struct net_dim *dim = container_of(work, struct net_dim,
> +					   work);
> +	/* Do interrupt moderation related stuff */
> +	...
> +
> +	/* Signal net DIM work is done and it should move to next iteration */
> +	dim->state = NET_DIM_START_MEASURE;
> +}
> +
> +/* My driver's interrupt handler */
> +int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...)
> +{
> +	...
> +	/* A struct to hold current measured data */
> +	struct net_dim_sample dim_sample;
> +	...
> +	/* Initiate data sample struct with current data */
> +	net_dim_sample(my_entity->events,
> +		       my_entity->packets,
> +		       my_entity->bytes,
> +		       &dim_sample);
> +	/* Call net DIM */
> +	net_dim(&my_entity->dim, dim_sample);
> +	...
> +}
> +
> +/* My entity's initialization function (my_entity was already allocated) */
> +int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...)
> +{
> +	...
> +	/* Initiate struct work_struct with my driver's callback function */
> +	INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work);
> +	...
> +}

--
Florian
Re: [PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
On Wed, Mar 21, 2018 at 11:58:18AM -0700, Richard Cochran wrote: > The InES at the ZHAW offers a PTP time stamping IP core. The FPGA > logic recognizes and time stamps PTP frames on the MII bus. This > patch adds a driver for the core along with a device tree binding to > allow hooking the driver to MAC devices. Hi Richard Can you point us at some documentation for this. I think Florian and I want to better understand how this device works, in order to understand your other changes. Andrew
Re: [PATCH net-next RFC V1 3/5] net: Introduce field for the MII time stamper.
On 03/21/2018 11:58 AM, Richard Cochran wrote:
> This patch adds a new field to the network device structure to reference
> a time stamping device on the MII bus. By decoupling the time stamping
> function from the PHY device, we pave the way to allowing a non-PHY
> device to take this role.
>
> Signed-off-by: Richard Cochran
> ---
> drivers/net/phy/mdio_bus.c | 51 +-
> include/linux/netdevice.h  |  1 +
> 2 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index 24b5511222c8..fdac8c8ac272 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -717,6 +717,47 @@ static int mdio_uevent(struct device *dev, struct kobj_uevent_env *env)
>  	return 0;
>  }
>
> +static bool mdiodev_supports_timestamping(struct mdio_device *mdiodev)
> +{
> +	if (mdiodev->ts_info && mdiodev->hwtstamp &&
> +	    mdiodev->rxtstamp && mdiodev->txtstamp)
> +		return true;
> +	else
> +		return false;
> +}
> +
> +static int mdiobus_netdev_notification(struct notifier_block *nb,
> +				       unsigned long msg, void *ptr)
> +{
> +	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
> +	struct phy_device *phydev = netdev->phydev;
> +	struct mdio_device *mdev;
> +	struct mii_bus *bus;
> +	int i;
> +
> +	if (netdev->mdiots || msg != NETDEV_UP || !phydev)
> +		return NOTIFY_DONE;

You are still assuming that we have a phy_device somehow, whereas your patch series wants to solve that for generic MDIO devices; that is a bit confusing.

> +
> +	/*
> +	 * Examine the MII bus associated with the PHY that is
> +	 * attached to the MAC. If there is a time stamping device
> +	 * on the bus, then connect it to the network device.
> +	 */
> +	bus = phydev->mdio.bus;
> +
> +	for (i = 0; i < PHY_MAX_ADDR; i++) {
> +		mdev = bus->mdio_map[i];
> +		if (!mdev)
> +			continue;
> +		if (mdiodev_supports_timestamping(mdev)) {
> +			netdev->mdiots = mdev;
> +			return NOTIFY_OK;

What guarantees that netdev->mdiots gets cleared?
Also, why is this done with a notifier instead of through phy_{connect,attach,disconnect}? It looks like we still have this requirement of the mdio TS device being a phy_device somehow, I am confused here...

> +		}
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> #ifdef CONFIG_PM
> static int mdio_bus_suspend(struct device *dev)
> {
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5fbb9f1da7fd..223d691aa0b0 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1943,6 +1943,7 @@ struct net_device {
> 	struct netprio_map __rcu	*priomap;
> #endif
> 	struct phy_device	*phydev;
> +	struct mdio_device	*mdiots;

phy_device embeds a mdio_device; can you find a way to rework the PHY PTP code to utilize the phy_device's mdio instance so we do not introduce yet another pointer in that big structure that net_device already is?

> 	struct lock_class_key	*qdisc_tx_busylock;
> 	struct lock_class_key	*qdisc_running_key;
> 	bool			proto_down;

--
Florian
Re: [PATCH net-next RFC V1 2/5] net: phy: Move time stamping interface into the generic mdio layer.
On 03/21/2018 11:58 AM, Richard Cochran wrote:
> There are different ways of obtaining hardware time stamps on network
> packets. The ingress and egress times can be measured in the MAC, in
> the PHY, or by a device listening on the MII bus. Up until now, the
> kernel has support for MAC and PHY time stamping, but not for other
> MII bus devices.
>
> This patch moves the PHY time stamping interface into the generic
> mdio device in order to support MII time stamping hardware.
>
> Signed-off-by: Richard Cochran
> ---
> drivers/net/phy/dp83640.c | 29 -
> drivers/net/phy/phy.c     |  4 ++--
> include/linux/mdio.h      | 23 +++
> include/linux/phy.h       | 23 ---
> net/core/ethtool.c        |  4 ++--
> net/core/timestamping.c   |  8 
> 6 files changed, 51 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
> index 654f42d00092..79aeb5eb471a 100644
> --- a/drivers/net/phy/dp83640.c
> +++ b/drivers/net/phy/dp83640.c
> @@ -215,6 +215,10 @@ static LIST_HEAD(phyter_clocks);
> static DEFINE_MUTEX(phyter_clocks_lock);
>
> static void rx_timestamp_work(struct work_struct *work);
> +static int dp83640_ts_info(struct mdio_device *m, struct ethtool_ts_info *i);
> +static int dp83640_hwtstamp(struct mdio_device *m, struct ifreq *i);
> +static bool dp83640_rxtstamp(struct mdio_device *m, struct sk_buff *s, int t);
> +static void dp83640_txtstamp(struct mdio_device *m, struct sk_buff *s, int t);
>
> /* extended register access functions */
>
> @@ -1162,6 +1166,12 @@ static int dp83640_probe(struct phy_device *phydev)
> 	list_add_tail(&dp83640->list, &clock->phylist);
>
> 	dp83640_clock_put(clock);
> +
> +	phydev->mdio.ts_info  = dp83640_ts_info;
> +	phydev->mdio.hwtstamp = dp83640_hwtstamp;
> +	phydev->mdio.rxtstamp = dp83640_rxtstamp;
> +	phydev->mdio.txtstamp = dp83640_txtstamp;

Why is this implemented at the mdio_device level and not at the mdio_driver level? This looks like the wrong level at which this is done.

--
Florian
Re: [PATCH v4 00/17] netdev: Eliminate duplicate barriers on weakly-ordered archs
On 3/21/2018 10:56 AM, David Miller wrote: > From: Sinan Kaya> Date: Mon, 19 Mar 2018 22:42:15 -0400 > >> Code includes wmb() followed by writel() in multiple places. writel() >> already has a barrier on some architectures like arm64. >> >> This ends up CPU observing two barriers back to back before executing the >> register write. >> >> Since code already has an explicit barrier call, changing writel() to >> writel_relaxed(). >> >> I did a regex search for wmb() followed by writel() in each drivers >> directory. >> I scrubbed the ones I care about in this series. >> >> I considered "ease of change", "popular usage" and "performance critical >> path" as the determining criteria for my filtering. > > I agree that for performance sensitive operations, specifically writing > doorbell registers in the hot paths or RX and TX packet processing, this > is a good change. > > However, in configuration paths and whatnot, it is much less urgent and > useful. > > Therefore I think it would work better if you concentrated solely on > hot code path cases. > > You can, on a driver by driver basis, submit the other transformations > in the slow paths, and let the driver maintainers decide whether to > take those on or not. > > Also, please stick exactly to the case where we have: > > wmb/mb/etc. > writel() > OK > Because I see some changes where we have: > > writel() > > barrier() > > writel() > barrier() on ARM is a write barrier. Apparently, it is a compiler barrier on Intel. I briefly discussed the barrier() behavior in rdma mailing list [1]. Our conclusion is that code should have used wmb() if it really needed to synchronize memory contents to the device and barrier() is already wrong. It just guarantees that code doesn't move. writel() already has a compiler barrier inside. It won't move to begin with. Like you suggested, we decided to leave these changes alone and even skip those drivers. I'll take another look at the patches. 
> for example, and you are turning that second writel() into a relaxed
> one as well. The above is using a compile barrier, not a memory
> barrier, so effectively it is two writel()'s in sequence which is
> not what this patch set is about.
>
> If anything, that compile barrier() is superfluous and could be
> removed. But that is also a separate change from what this patch
> series is doing here.

Agreed, I'll remove such changes.

> Finally, it makes it that much easier if we can see the preceding
> memory barrier in the context of the patch that adjusts the writel
> into a writel_relaxed.
>
> In one case, a macro DOORBELL() is changed to use writel(). This
> makes it so that the patch reviewer has to scan over the entire
> driver in question to see exactly how DOORBELL() is used and whether
> it fits the criteria for the writel_relaxed() transformation.
>
> I would suggest that you adjust the name of the macro in a situation
> like this, f.e. to DOORBELL_RELAXED().

Makes sense.

> Thank you.

[1] https://patchwork.kernel.org/project/LKML/list/?submitter=145491

--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH REPOST v4 4/7] igb: eliminate duplicate barriers on weakly-ordered archs
Code includes wmb() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan KayaReviewed-by: Alexander Duyck --- drivers/net/ethernet/intel/igb/igb_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index b88fae7..82aea92 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -5671,7 +5671,7 @@ static int igb_tx_map(struct igb_ring *tx_ring, igb_maybe_stop_tx(tx_ring, DESC_NEEDED); if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) { - writel(i, tx_ring->tail); + writel_relaxed(i, tx_ring->tail); /* we need this if more than one processor can write to our tail * at a time, it synchronizes IO on IA64/Altix systems @@ -8072,7 +8072,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count) * such as IA-64). */ wmb(); - writel(i, rx_ring->tail); + writel_relaxed(i, rx_ring->tail); } } -- 2.7.4
[PATCH net-next RFC V1 0/5] Peer to Peer One-Step time stamping
This series adds support for PTP (IEEE 1588) P2P one-step time stamping along with a driver for a hardware device that supports this.

If the hardware supports P2P one-step, it subtracts the ingress time stamp value from the Pdelay_Request correction field. The user space software stack then simply copies the correction field into the Pdelay_Response, and on transmission the hardware adds the egress time stamp into the correction field.

- Patch 1 adds the new option.
- Patches 2-4 add support for MII time stamping in non-PHY devices.
- Patch 5 adds a driver implementing the new option.

Earlier today I posted user space support as an RFC on the linuxptp-devel list. Comments and review are most welcome.

Thanks,
Richard

Richard Cochran (5):
  net: Introduce peer to peer one step PTP time stamping.
  net: phy: Move time stamping interface into the generic mdio layer.
  net: Introduce field for the MII time stamper.
  net: Use the generic MII time stamper when available.
  net: mdio: Add a driver for InES time stamping IP core.

 Documentation/devicetree/bindings/net/ines-ptp.txt |  42 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c   |   1 +
 drivers/net/phy/Makefile                           |   1 +
 drivers/net/phy/dp83640.c                          |  29 +-
 drivers/net/phy/ines_ptp.c                         | 857 +
 drivers/net/phy/mdio_bus.c                         |  51 +-
 drivers/net/phy/phy.c                              |   6 +-
 drivers/ptp/Kconfig                                |  10 +
 include/linux/mdio.h                               |  23 +
 include/linux/netdevice.h                          |   1 +
 include/linux/phy.h                                |  23 -
 include/uapi/linux/net_tstamp.h                    |   8 +
 net/Kconfig                                        |   8 +-
 net/core/dev_ioctl.c                               |   1 +
 net/core/ethtool.c                                 |   5 +-
 net/core/timestamping.c                            |  36 +-
 16 files changed, 1034 insertions(+), 68 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/ines-ptp.txt
 create mode 100644 drivers/net/phy/ines_ptp.c
-- 
2.11.0
[PATCH REPOST v4 1/7] i40e/i40evf: Eliminate duplicate barriers on weakly-ordered archs
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures, such as arm64, so the CPU ends up observing two barriers
back to back before executing the register write. Since the code already has
an explicit barrier call, change writel() to writel_relaxed().

Signed-off-by: Sinan Kaya
Reviewed-by: Alexander Duyck
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 8 ++++----
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e554aa6cf..9455869 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -185,7 +185,7 @@ static int i40e_program_fdir_filter(struct i40e_fdir_filter *fdir_data,
 	/* Mark the data descriptor to be watched */
 	first->next_to_watch = tx_desc;

-	writel(tx_ring->next_to_use, tx_ring->tail);
+	writel_relaxed(tx_ring->next_to_use, tx_ring->tail);
 	return 0;

 dma_fail:
@@ -1375,7 +1375,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }

 /**
@@ -2258,7 +2258,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		 */
 		wmb();
-		writel(xdp_ring->next_to_use, xdp_ring->tail);
+		writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 	}

 	rx_ring->skb = skb;
@@ -3286,7 +3286,7 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);

 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 357d605..56eea20 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -667,7 +667,7 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 	 * such as IA-64).
 	 */
 	wmb();
-	writel(val, rx_ring->tail);
+	writel_relaxed(val, rx_ring->tail);
 }

 /**
@@ -2243,7 +2243,7 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	/* notify HW of packet */
 	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more) {
-		writel(i, tx_ring->tail);
+		writel_relaxed(i, tx_ring->tail);

 		/* we need this if more than one processor can write to our tail
 		 * at a time, it synchronizes IO on IA64/Altix systems
--
2.7.4
[PATCH net-next RFC V1 5/5] net: mdio: Add a driver for InES time stamping IP core.
The InES at the ZHAW offers a PTP time stamping IP core. The FPGA logic
recognizes and time stamps PTP frames on the MII bus. This patch adds a
driver for the core along with a device tree binding to allow hooking the
driver to MAC devices.

Signed-off-by: Richard Cochran
---
 Documentation/devicetree/bindings/net/ines-ptp.txt |  42 +
 drivers/net/phy/Makefile                           |   1 +
 drivers/net/phy/ines_ptp.c                         | 857 +
 drivers/ptp/Kconfig                                |  10 +
 4 files changed, 910 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/ines-ptp.txt
 create mode 100644 drivers/net/phy/ines_ptp.c

diff --git a/Documentation/devicetree/bindings/net/ines-ptp.txt b/Documentation/devicetree/bindings/net/ines-ptp.txt
new file mode 100644
index ..ed7b1d773ded
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ines-ptp.txt
@@ -0,0 +1,42 @@
+ZHAW InES PTP time stamping IP core
+
+The IP core needs two different kinds of nodes. The control node
+lives somewhere in the memory map and specifies the address of the
+control registers. There can be up to three port nodes placed on the
+mdio bus. They associate a particular MAC with a port index within
+the IP core.
+
+Required properties of the control node:
+
+- compatible: "ines,ptp-ctrl"
+- reg: physical address and size of the register bank
+- phandle: globally unique handle for the ports to point to
+
+Required properties of the port nodes:
+
+- compatible: "ines,ptp-port"
+- ctrl-handle: points to the control node
+- port-index: port channel within the IP core
+- reg: phy address. This is required even though the
+  device does not respond to mdio operations
+
+Example:
+
+	timestamper@6000 {
+		compatible = "ines,ptp-ctrl";
+		reg = <0x6000 0x80>;
+		phandle = <0x10>;
+	};
+
+	ethernet@8000 {
+		...
+		mdio {
+			...
+			timestamper@1f {
+				compatible = "ines,ptp-port";
+				ctrl-handle = <0x10>;
+				port-index = <0>;
+				reg = <0x1f>;
+			};
+		};
+	};
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb2c798..e286bb822295 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -61,6 +61,7 @@ obj-$(CONFIG_DP83848_PHY)	+= dp83848.o
 obj-$(CONFIG_DP83867_PHY)	+= dp83867.o
 obj-$(CONFIG_FIXED_PHY)		+= fixed_phy.o
 obj-$(CONFIG_ICPLUS_PHY)	+= icplus.o
+obj-$(CONFIG_INES_PTP_TSTAMP)	+= ines_ptp.o
 obj-$(CONFIG_INTEL_XWAY_PHY)	+= intel-xway.o
 obj-$(CONFIG_LSI_ET1011C_PHY)	+= et1011c.o
 obj-$(CONFIG_LXT_PHY)		+= lxt.o
diff --git a/drivers/net/phy/ines_ptp.c b/drivers/net/phy/ines_ptp.c
new file mode 100644
index ..4f66459d4417
--- /dev/null
+++ b/drivers/net/phy/ines_ptp.c
@@ -0,0 +1,857 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 MOSER-BAER AG
+ */
+#define pr_fmt(fmt) "InES_PTP: " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+MODULE_DESCRIPTION("Driver for the ZHAW InES PTP time stamping IP core");
+MODULE_AUTHOR("Richard Cochran ");
+MODULE_VERSION("1.0");
+MODULE_LICENSE("GPL");
+
+/* GLOBAL register */
+#define MCAST_MAC_SELECT_SHIFT	2
+#define MCAST_MAC_SELECT_MASK	0x3
+#define IO_RESET		BIT(1)
+#define PTP_RESET		BIT(0)
+
+/* VERSION register */
+#define IF_MAJOR_VER_SHIFT	12
+#define IF_MAJOR_VER_MASK	0xf
+#define IF_MINOR_VER_SHIFT	8
+#define IF_MINOR_VER_MASK	0xf
+#define FPGA_MAJOR_VER_SHIFT	4
+#define FPGA_MAJOR_VER_MASK	0xf
+#define FPGA_MINOR_VER_SHIFT	0
+#define FPGA_MINOR_VER_MASK	0xf
+
+/* INT_STAT register */
+#define RX_INTR_STATUS_3	BIT(5)
+#define RX_INTR_STATUS_2	BIT(4)
+#define RX_INTR_STATUS_1	BIT(3)
+#define TX_INTR_STATUS_3	BIT(2)
+#define TX_INTR_STATUS_2	BIT(1)
+#define TX_INTR_STATUS_1	BIT(0)
+
+/* INT_MSK register */
+#define RX_INTR_MASK_3		BIT(5)
+#define RX_INTR_MASK_2		BIT(4)
+#define RX_INTR_MASK_1		BIT(3)
+#define TX_INTR_MASK_3		BIT(2)
+#define TX_INTR_MASK_2		BIT(1)
+#define TX_INTR_MASK_1		BIT(0)
+
+/* BUF_STAT register */
+#define RX_FIFO_NE_3		BIT(5)
+#define RX_FIFO_NE_2		BIT(4)
+#define RX_FIFO_NE_1		BIT(3)
+#define TX_FIFO_NE_3		BIT(2)
+#define TX_FIFO_NE_2		BIT(1)
+#define TX_FIFO_NE_1		BIT(0)
+
+/* PORT_CONF register */
+#define CM_ONE_STEP		BIT(6)
+#define
[PATCH net-next RFC V1 1/5] net: Introduce peer to peer one step PTP time stamping.
The 1588 standard defines one step operation for both Sync and
PDelay_Resp messages. Up until now, hardware with P2P one step has been
rare, and kernel support was lacking. This patch adds support of the mode
in anticipation of new hardware developments.

Signed-off-by: Richard Cochran
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 +
 include/uapi/linux/net_tstamp.h                  | 8 ++++++++
 net/core/dev_ioctl.c                             | 1 +
 3 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 74fc9af4aadb..c6295e5c16af 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -15379,6 +15379,7 @@ int bnx2x_configure_ptp_filters(struct bnx2x *bp)
 			NIG_REG_P0_TLLH_PTP_RULE_MASK, 0x3EEE);
 		break;
 	case HWTSTAMP_TX_ONESTEP_SYNC:
+	case HWTSTAMP_TX_ONESTEP_P2P:
 		BNX2X_ERR("One-step timestamping is not supported\n");
 		return -ERANGE;
 	}
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 4fe104b2411f..f89b5a836c2a 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -90,6 +90,14 @@ enum hwtstamp_tx_types {
 	 * queue.
 	 */
 	HWTSTAMP_TX_ONESTEP_SYNC,
+
+	/*
+	 * Same as HWTSTAMP_TX_ONESTEP_SYNC, but also enables time
+	 * stamp insertion directly into PDelay_Resp packets. In this
+	 * case, neither transmitted Sync nor PDelay_Resp packets will
+	 * receive a time stamp via the socket error queue.
+	 */
+	HWTSTAMP_TX_ONESTEP_P2P,
 };

 /* possible values for hwtstamp_config->rx_filter */
diff --git a/net/core/dev_ioctl.c b/net/core/dev_ioctl.c
index 0ab1af04296c..cdda085e4b47 100644
--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -187,6 +187,7 @@ static int net_hwtstamp_validate(struct ifreq *ifr)
 	case HWTSTAMP_TX_OFF:
 	case HWTSTAMP_TX_ON:
 	case HWTSTAMP_TX_ONESTEP_SYNC:
+	case HWTSTAMP_TX_ONESTEP_P2P:
 		tx_type_valid = 1;
 		break;
 	}
--
2.11.0