Re: ath9k gpio request
Done, https://patchwork.kernel.org/patch/9151847/.

Thanks,
Miaoqing

From: Kalle Valo
Sent: Friday, June 3, 2016 1:33 PM
To: Pan, Miaoqing
Cc: Sudip Mukherjee; Stephen Rothwell; ath9k-devel; linux-n...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-wirel...@vger.kernel.org; ath9k-de...@lists.ath9k.org; netdev@vger.kernel.org; Miaoqing Pan
Subject: Re: ath9k gpio request

Sudip Mukherjee writes:

> On Thursday 02 June 2016 01:32 PM, Pan, Miaoqing wrote:
>> Seems there are something wrong in the datasheet, try
>>
>> --- a/drivers/net/wireless/ath/ath9k/reg.h
>> +++ b/drivers/net/wireless/ath/ath9k/reg.h
>> @@ -1122,8 +1122,8 @@ enum {
>>  #define AR9300_NUM_GPIO 16
>>  #define AR9330_NUM_GPIO 16
>>  #define AR9340_NUM_GPIO 23
>> -#define AR9462_NUM_GPIO 10
>> -#define AR9485_NUM_GPIO 12
>> +#define AR9462_NUM_GPIO 14
>> +#define AR9485_NUM_GPIO 11
>>  #define AR9531_NUM_GPIO 18
>>  #define AR9550_NUM_GPIO 24
>>  #define AR9561_NUM_GPIO 23
>> @@ -1139,8 +1139,8 @@ enum {
>>  #define AR9300_GPIO_MASK 0xF4FF
>>  #define AR9330_GPIO_MASK 0xF4FF
>>  #define AR9340_GPIO_MASK 0x000F
>> -#define AR9462_GPIO_MASK 0x03FF
>> -#define AR9485_GPIO_MASK 0x0FFF
>> +#define AR9462_GPIO_MASK 0x3FFF
>> +#define AR9485_GPIO_MASK 0x07FF
>>  #define AR9531_GPIO_MASK 0x000F
>>  #define AR9550_GPIO_MASK 0x000F
>>  #define AR9561_GPIO_MASK 0x000F
>
> solves the problem.
>
> Tested-by: Sudip Mukherjee

Great, thanks for testing everyone. Miaoqing, please send a proper patch
ASAP and I'll push it to 4.7.

--
Kalle Valo
Re: ath9k gpio request
Sudip Mukherjee writes:

> On Thursday 02 June 2016 01:32 PM, Pan, Miaoqing wrote:
>> Seems there are something wrong in the datasheet, try
>>
>> --- a/drivers/net/wireless/ath/ath9k/reg.h
>> +++ b/drivers/net/wireless/ath/ath9k/reg.h
>> @@ -1122,8 +1122,8 @@ enum {
>>  #define AR9300_NUM_GPIO 16
>>  #define AR9330_NUM_GPIO 16
>>  #define AR9340_NUM_GPIO 23
>> -#define AR9462_NUM_GPIO 10
>> -#define AR9485_NUM_GPIO 12
>> +#define AR9462_NUM_GPIO 14
>> +#define AR9485_NUM_GPIO 11
>>  #define AR9531_NUM_GPIO 18
>>  #define AR9550_NUM_GPIO 24
>>  #define AR9561_NUM_GPIO 23
>> @@ -1139,8 +1139,8 @@ enum {
>>  #define AR9300_GPIO_MASK 0xF4FF
>>  #define AR9330_GPIO_MASK 0xF4FF
>>  #define AR9340_GPIO_MASK 0x000F
>> -#define AR9462_GPIO_MASK 0x03FF
>> -#define AR9485_GPIO_MASK 0x0FFF
>> +#define AR9462_GPIO_MASK 0x3FFF
>> +#define AR9485_GPIO_MASK 0x07FF
>>  #define AR9531_GPIO_MASK 0x000F
>>  #define AR9550_GPIO_MASK 0x000F
>>  #define AR9561_GPIO_MASK 0x000F
>
> solves the problem.
>
> Tested-by: Sudip Mukherjee

Great, thanks for testing everyone. Miaoqing, please send a proper patch
ASAP and I'll push it to 4.7.

--
Kalle Valo
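The two values changed per chip in the diff above stay consistent with each other: 14 pins pairs with 0x3FFF and 11 pins with 0x07FF. A minimal sketch of that relationship, assuming the pins are contiguous from bit 0 (true for the patched AR9462/AR9485 values, not for every mask in the table, e.g. 0xF4FF). The helper names are hypothetical and not part of ath9k.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `num_gpio` bits set.
 * Assumes pins are contiguous starting at bit 0. */
static uint32_t gpio_mask_from_count(unsigned int num_gpio)
{
	assert(num_gpio > 0 && num_gpio < 32);
	return (UINT32_C(1) << num_gpio) - 1;
}

/* Hypothetical check: is pin `gpio` usable under `mask`? */
static int gpio_is_valid(uint32_t mask, unsigned int gpio)
{
	return gpio < 32 && ((mask >> gpio) & 1u);
}
```

With the old AR9462 mask (0x03FF) a request for pin 13 would be rejected; with the new mask (0x3FFF) it is accepted.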
Re: [PATCH] tipc: fix an infoleak in tipc_nl_compat_link_dump
From: Kangjie Lu
Date: Thu, 2 Jun 2016 04:04:56 -0400

> link_info.str is a char array of size 60. Memory after the NULL
> byte is not initialized. Sending the whole object out can cause
> a leak.
>
> Signed-off-by: Kangjie Lu

Applied.
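The class of bug described here can be illustrated in user space. This is a minimal sketch, not the actual tipc fix: the struct and function names are hypothetical, and only the buffer size of 60 comes from the commit message. It zeroes the whole object before copying the name, so no stale bytes follow the terminator when the full struct is sent out.

```c
#include <assert.h>
#include <string.h>

#define LINK_NAME_LEN 60 /* size of link_info.str per the commit message */

/* Hypothetical stand-in for the object that is copied out whole. */
struct link_info_model {
	char str[LINK_NAME_LEN];
};

/* Copy `name` and make sure every byte after it is zero. */
static void link_info_set_name(struct link_info_model *info, const char *name)
{
	memset(info, 0, sizeof(*info));
	/* leave the last byte untouched so the string stays terminated */
	strncpy(info->str, name, sizeof(info->str) - 1);
}
```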
Re: [PATCH] rds: fix an infoleak in rds_inc_info_copy
From: Kangjie Lu
Date: Thu, 2 Jun 2016 04:11:20 -0400

> The last field "flags" of object "minfo" is not initialized.
> Copying this object out may leak kernel stack data.
> Assign 0 to it to avoid leak.
>
> Signed-off-by: Kangjie Lu

Applied.
Re: [PATCH net-next] qed: Utilize FW 8.10.3.0
From: Yuval Mintz
Date: Thu, 2 Jun 2016 10:23:29 +0300

> The New QED firmware contains several fixes, including:
>  - Wrong classification of packets in 4-port devices.
>  - Anti-spoof interoperability with encapsulated packets.
>  - Tx-switching of encapsulated packets.
> It also slightly improves Tx performance of the device.
>
> In addition, this firmware contains the necessary logic for
> supporting iscsi & rdma, for which we plan on pushing protocol
> drivers in the imminent future.
>
> Signed-off-by: Yuval Mintz

Applied, thanks.
Re: Possible problem with e6afc8ac ("udp: remove headers from UDP packets before queueing")
From: Eric Dumazet
Date: Thu, 02 Jun 2016 19:58:26 -0700

> Arg, I totally messed up the patch title :(

I noticed it was odd, but it's not a big deal.
Re: Possible problem with e6afc8ac ("udp: remove headers from UDP packets before queueing")
On Thu, 2016-06-02 at 18:31 -0400, David Miller wrote:
> From: Eric Dumazet
> Date: Thu, 02 Jun 2016 14:52:43 -0700
>
> > From: Eric Dumazet
> >
> > Paul Moore tracked a regression caused by a recent commit, which
> > mistakenly assumed that sk_filter() could be avoided if socket
> > had no current BPF filter.
> >
> > The intent was to avoid udp_lib_checksum_complete() overhead.
> >
> > But sk_filter() also checks skb_pfmemalloc() and
> > security_sock_rcv_skb(), so better call it.
> >
> > Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
> > Signed-off-by: Eric Dumazet
> > Reported-by: Paul Moore
> > Tested-by: Paul Moore
> > Tested-by: Stephen Smalley
> > Cc: samanthakumar
>
> Applied, thanks Eric.

Arg, I totally messed up the patch title :(
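The regression in this thread can be modelled in a few lines. This is a simplified user-space sketch, not kernel code: every type and function name below is hypothetical, and the model only mirrors the three checks the commit message attributes to sk_filter(). It shows why skipping the call for sockets without a BPF filter also skips the other two checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced models of a socket and a packet. */
struct sock_model { bool has_bpf_filter; bool lsm_denies; };
struct skb_model { bool pfmemalloc; };

/* Models the checks named in the commit message.
 * Returns 0 to accept the packet, nonzero to drop it. */
static int sk_filter_model(const struct sock_model *sk,
			   const struct skb_model *skb)
{
	if (skb->pfmemalloc)
		return -1;	/* stands in for the skb_pfmemalloc() check */
	if (sk->lsm_denies)
		return -1;	/* stands in for security_sock_rcv_skb() */
	/* a BPF program, if attached, would run here */
	return 0;
}

/* Regressed behaviour: skip the call when no BPF filter is attached. */
static int rcv_skip_without_bpf(const struct sock_model *sk,
				const struct skb_model *skb)
{
	if (!sk->has_bpf_filter)
		return 0;
	return sk_filter_model(sk, skb);
}

/* Fixed behaviour: always call it. */
static int rcv_always_filter(const struct sock_model *sk,
			     const struct skb_model *skb)
{
	return sk_filter_model(sk, skb);
}
```

For a socket with no BPF filter but a denying security policy, the first receive path accepts the packet and the second drops it.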
[PATCH v4 net-next 06/13] net: hns: use platform_get_irq instead of irq_of_parse_and_map
From: Kejian Yan

As irq_of_parse_and_map() is only usable in the DT case, a uniform
interface is expected. So use platform_get_irq() instead.

Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index 4ef6d23..3ce2409 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -458,7 +458,6 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 	u32 i;
 	u32 ring_num = rcb_common->ring_num;
 	int base_irq_idx = hns_rcb_get_base_irq_idx(rcb_common);
-	struct device_node *np = rcb_common->dsaf_dev->dev->of_node;
 	struct platform_device *pdev =
 		to_platform_device(rcb_common->dsaf_dev->dev);
 	bool is_ver1 = AE_IS_VER1(rcb_common->dsaf_dev->dsaf_ver);
@@ -473,10 +472,10 @@ void hns_rcb_get_cfg(struct rcb_common_cb *rcb_common)
 		ring_pair_cb->port_id_in_comm =
 			hns_rcb_get_port_in_comm(rcb_common, i);
 		ring_pair_cb->virq[HNS_RCB_IRQ_IDX_TX] =
-		is_ver1 ? irq_of_parse_and_map(np, base_irq_idx + i * 2) :
+		is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2) :
 			  platform_get_irq(pdev, base_irq_idx + i * 3 + 1);
 		ring_pair_cb->virq[HNS_RCB_IRQ_IDX_RX] =
-		is_ver1 ? irq_of_parse_and_map(np, base_irq_idx + i * 2 + 1) :
+		is_ver1 ? platform_get_irq(pdev, base_irq_idx + i * 2 + 1) :
 			  platform_get_irq(pdev, base_irq_idx + i * 3);
 		ring_pair_cb->q.phy_base =
 			RCB_COMM_BASE_TO_RING_BASE(rcb_common->phy_base, i);
--
1.9.1
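The index arithmetic in the hunk above is easy to misread in ternary form. A small sketch that isolates it, assuming nothing beyond the expressions shown in the diff; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct ring_irq_idx {
	int tx;
	int rx;
};

/* Mirrors the index expressions in hns_rcb_get_cfg() from the diff:
 * v1 uses a stride of 2 per ring pair with TX first,
 * v2 uses a stride of 3 with RX at +0 and TX at +1. */
static struct ring_irq_idx ring_irq_index(bool is_ver1, int base_irq_idx, int i)
{
	struct ring_irq_idx idx;

	assert(i >= 0);
	if (is_ver1) {
		idx.tx = base_irq_idx + i * 2;
		idx.rx = base_irq_idx + i * 2 + 1;
	} else {
		idx.tx = base_irq_idx + i * 3 + 1;
		idx.rx = base_irq_idx + i * 3;
	}
	return idx;
}
```

For ring pair 2 with a base of 0, v1 yields TX 4 / RX 5 and v2 yields TX 7 / RX 6. Both versions now pass the resulting index to platform_get_irq().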
[PATCH v4 net-next 09/13] net: hns: add dsaf misc operation method
From: Kejian Yan

The misc operations may differ between hw platforms. With the current
implementation, every new hw platform would add a new branch to each
function, so we add a method for these operations.

Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |  4 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c |  6 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  | 14 ++--
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |  2 -
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 11 ++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h | 33 ++---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 79 +++---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.h |  7 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  | 15 ++--
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c    | 10 +--
 10 files changed, 111 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index 8e009f4..d37b778 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -637,13 +637,15 @@ static int hns_ae_config_loopback(struct hnae_handle *handle,
 	int ret;
 	struct hnae_vf_cb *vf_cb = hns_ae_get_vf_cb(handle);
 	struct hns_mac_cb *mac_cb = hns_get_mac_cb(handle);
+	struct dsaf_device *dsaf_dev = mac_cb->dsaf_dev;

 	switch (loop) {
 	case MAC_INTERNALLOOP_PHY:
 		ret = 0;
 		break;
 	case MAC_INTERNALLOOP_SERDES:
-		ret = hns_mac_config_sds_loopback(vf_cb->mac_cb, en);
+		ret = dsaf_dev->misc_op->cfg_serdes_loopback(vf_cb->mac_cb,
+							     !!en);
 		break;
 	case MAC_INTERNALLOOP_MAC:
 		ret = hns_mac_config_mac_loopback(vf_cb->mac_cb, loop, en);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 44abb08..1235c7f 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -110,7 +110,7 @@ static void hns_gmac_free(void *mac_drv)

 	u32 mac_id = drv->mac_id;

-	hns_dsaf_ge_srst_by_port(dsaf_dev, mac_id, 0);
+	dsaf_dev->misc_op->ge_srst(dsaf_dev, mac_id, 0);
 }

 static void hns_gmac_set_tx_auto_pause_frames(void *mac_drv, u16 newval)
@@ -317,9 +317,9 @@ static void hns_gmac_init(void *mac_drv)

 	port = drv->mac_id;

-	hns_dsaf_ge_srst_by_port(dsaf_dev, port, 0);
+	dsaf_dev->misc_op->ge_srst(dsaf_dev, port, 0);
 	mdelay(10);
-	hns_dsaf_ge_srst_by_port(dsaf_dev, port, 1);
+	dsaf_dev->misc_op->ge_srst(dsaf_dev, port, 1);
 	mdelay(10);
 	hns_gmac_disable(mac_drv, MAC_COMM_MODE_RX_AND_TX);
 	hns_gmac_tx_loop_pkt_dis(mac_drv);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index 527b49d..2ebf14a 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -95,7 +95,7 @@ void hns_mac_get_link_status(struct hns_mac_cb *mac_cb, u32 *link_status)
 	else
 		*link_status = 0;

-	ret = hns_mac_get_sfp_prsnt(mac_cb, &sfp_prsnt);
+	ret = mac_cb->dsaf_dev->misc_op->get_sfp_prsnt(mac_cb, &sfp_prsnt);
 	if (!ret)
 		*link_status = *link_status && sfp_prsnt;

@@ -512,7 +512,7 @@ void hns_mac_stop(struct hns_mac_cb *mac_cb)
 	mac_ctrl_drv->mac_en_flg = 0;
 	mac_cb->link = 0;
-	cpld_led_reset(mac_cb);
+	mac_cb->dsaf_dev->misc_op->cpld_reset_led(mac_cb);
 }

 /**
@@ -804,7 +804,7 @@ int hns_mac_get_cfg(struct dsaf_device *dsaf_dev, struct hns_mac_cb *mac_cb)
 	else
 		mac_cb->mac_type = HNAE_PORT_DEBUG;

-	mac_cb->phy_if = hns_mac_get_phy_if(mac_cb);
+	mac_cb->phy_if = dsaf_dev->misc_op->get_phy_if(mac_cb);

 	ret = hns_mac_get_mode(mac_cb->phy_if);
 	if (ret < 0) {
@@ -819,7 +819,7 @@ int hns_mac_get_cfg(struct dsaf_device *dsaf_dev, struct hns_mac_cb *mac_cb)
 	if (ret)
 		return ret;

-	cpld_led_reset(mac_cb);
+	mac_cb->dsaf_dev->misc_op->cpld_reset_led(mac_cb);
 	mac_cb->vaddr = hns_mac_get_vaddr(dsaf_dev, mac_cb, mac_mode_idx);

 	return 0;
@@ -906,7 +906,7 @@ void hns_mac_uninit(struct dsaf_device *dsaf_dev)
 	int max_port_num = hns_mac_get_max_port_num(dsaf_dev);

 	for (i = 0; i < max_port_num; i++) {
-		cpld_led_reset(dsaf_dev->mac_cb[i]);
+		dsaf_dev->misc_op->cpld_reset_led(dsaf_dev->mac_cb[i]);
 		dsaf_dev->mac_cb[i] = NULL;
 	}
 }
@@ -989,7 +989,7 @@ void
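The idea behind the misc_op method is an operations table: callers go through function pointers chosen once, instead of branching per platform in every function. A reduced user-space sketch of that pattern follows. All names are hypothetical; only the single ge_srst member echoes the diff, and the two backends here are placeholders rather than the driver's real implementations.

```c
#include <assert.h>
#include <stddef.h>

struct dsaf_model;

/* Hypothetical, reduced ops table: one pointer per platform-specific op. */
struct misc_op_model {
	void (*ge_srst)(struct dsaf_model *dev, unsigned int port, int dereset);
};

struct dsaf_model {
	const struct misc_op_model *misc_op;
	int last_path;	/* records which backend ran, for illustration only */
};

enum { PATH_NONE = 0, PATH_DT = 1, PATH_ACPI = 2 };

static void ge_srst_dt(struct dsaf_model *dev, unsigned int port, int dereset)
{
	(void)port;
	(void)dereset;
	dev->last_path = PATH_DT;	/* a real backend would write registers */
}

static void ge_srst_acpi(struct dsaf_model *dev, unsigned int port, int dereset)
{
	(void)port;
	(void)dereset;
	dev->last_path = PATH_ACPI;	/* a real backend would call firmware */
}

static const struct misc_op_model dt_ops = { .ge_srst = ge_srst_dt };
static const struct misc_op_model acpi_ops = { .ge_srst = ge_srst_acpi };

/* Pick the table once at init time; callers never branch on platform. */
static void dsaf_model_init(struct dsaf_model *dev, int is_acpi)
{
	assert(dev != NULL);
	dev->misc_op = is_acpi ? &acpi_ops : &dt_ops;
	dev->last_path = PATH_NONE;
}
```

A caller then writes dev.misc_op->ge_srst(&dev, port, 0) regardless of platform, which is the shape of the replacement lines in the hunks above.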
[PATCH v4 net-next 08/13] net: hns: add uniform interface for phy connection
From: Kejian Yan

As device_node is only used in the DT case, HNS needs to handle the
other cases, including ACPI, and it needs a uniform way to handle both
DT and ACPI. This patch chooses phy_device, since of_phy_connect and
of_phy_attach are only used in the DT case. A uniform interface is
needed to handle that sequence for both DT and ACPI.

Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
change log:
v2:
1. remove the redundant functions, and
2. adds fwnode match method beside DT and ACPI.
v1: first submit
link: https://lkml.org/lkml/2016/5/13/100
---
 drivers/net/ethernet/hisilicon/hns/hnae.c          |  8 -
 drivers/net/ethernet/hisilicon/hns/hnae.h          |  3 +-
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  | 34 +++---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c      | 21 +++--
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |  2 +-
 8 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c b/drivers/net/ethernet/hisilicon/hns/hnae.c
index d630acd..5d3047c 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -96,7 +96,13 @@ static int __ae_match(struct device *dev, const void *data)
 {
 	struct hnae_ae_dev *hdev = cls_to_ae_dev(dev);

-	return (data == &hdev->dev->of_node->fwnode);
+	if (dev_of_node(hdev->dev))
+		return (data == &hdev->dev->of_node->fwnode);
+	else if (is_acpi_node(hdev->dev->fwnode))
+		return (data == hdev->dev->fwnode);
+
+	dev_err(dev, "__ae_match cannot read cfg data from OF or acpi\n");
+	return 0;
 }

 static struct hnae_ae_dev *find_ae(const struct fwnode_handle *fwnode)
diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index f5f8140..529cb13 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -27,6 +27,7 @@
  * "cb" means control block
  */

+#include
 #include
 #include
 #include
@@ -512,7 +513,7 @@ struct hnae_ae_dev {
 struct hnae_handle {
 	struct device *owner_dev; /* the device which make use of this handle */
 	struct hnae_ae_dev *dev;  /* the device who provides this handle */
-	struct device_node *phy_node;
+	struct phy_device *phy_dev;
 	phy_interface_t phy_if;
 	u32 if_support;
 	int q_num;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index 7a757e8..8e009f4 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -131,7 +131,7 @@ struct hnae_handle *hns_ae_get_handle(struct hnae_ae_dev *dev,
 	vf_cb->mac_cb = dsaf_dev->mac_cb[port_id];

 	ae_handle->phy_if = vf_cb->mac_cb->phy_if;
-	ae_handle->phy_node = vf_cb->mac_cb->phy_node;
+	ae_handle->phy_dev = vf_cb->mac_cb->phy_dev;
 	ae_handle->if_support = vf_cb->mac_cb->if_support;
 	ae_handle->port_type = vf_cb->mac_cb->mac_type;
 	ae_handle->dport_id = port_id;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index 611581f..527b49d 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -15,7 +15,8 @@
 #include
 #include
 #include
-#include
+#include
+#include
 #include

 #include "hns_dsaf_main.h"
@@ -645,7 +646,7 @@ free_mac_drv:
  */
 static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
 {
-	struct device_node *np = mac_cb->dev->of_node;
+	struct device_node *np;
 	struct regmap *syscon;
 	struct of_phandle_args cpld_args;
 	u32 ret;
@@ -672,21 +673,34 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb)
 	 * from dsaf node
 	 */
 	if (!mac_cb->fw_port) {
-		mac_cb->phy_node = of_parse_phandle(np, "phy-handle",
-						    mac_cb->mac_id);
-		if (mac_cb->phy_node)
+		np = of_parse_phandle(mac_cb->dev->of_node, "phy-handle",
+				      mac_cb->mac_id);
+		mac_cb->phy_dev = of_phy_find_device(np);
+		if (mac_cb->phy_dev) {
+			/* refcount is held by of_phy_find_device()
+			 * if the phy_dev is found
+			 */
+			put_device(&mac_cb->phy_dev->mdio.dev);
+
 			dev_dbg(mac_cb->dev, "mac%d phy_node: %s\n",
-				mac_cb->mac_id, mac_cb->phy_node->name);
+
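The __ae_match() change in this patch compares handles by address: in the DT case against the fwnode embedded in the of_node, in the ACPI case against the device's own fwnode pointer. A simplified user-space model of that comparison is sketched below. The types are hypothetical stand-ins, and the model tests for non-NULL pointers where the driver uses dev_of_node() and is_acpi_node().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for fwnode_handle, device_node and device. */
struct fwnode_model { int unused; };
struct of_node_model { struct fwnode_model fwnode; };	/* DT node embeds it */
struct dev_model {
	struct of_node_model *of_node;	/* set in the DT case */
	struct fwnode_model *fwnode;	/* set in the ACPI case */
};

/* Returns nonzero if `data` identifies this device's firmware node. */
static int ae_match_model(const struct dev_model *dev, const void *data)
{
	if (dev->of_node)
		return data == &dev->of_node->fwnode;
	if (dev->fwnode)
		return data == dev->fwnode;
	return 0;	/* neither DT nor ACPI: nothing to match against */
}
```

Because matching is by address, two distinct nodes never match each other even if their contents are identical.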
[PATCH v4 net-next 00/13] net: hns: add support of ACPI
From: Kejian Yan

This series adds ACPI support to HNS. The routines call some ACPI helper
functions, like acpi_dev_found() and acpi_evaluate_dsm(), which are not
available in the other cases. To keep the kernel compiling in the
non-ACPI cases, the related stub functions need to be added to
linux/acpi.h. We also use the device property functions instead of the
of_* helper functions to support both the DT and ACPI cases, and then
add ACPI support for HNS.

change log:
v3->v4:
mii-id gets from dev-name instead of address

v2->v3:
1. add Review-by: Andy Shevchenko
2. fix the potential memory leak

v1 -> v2:
1. use acpi_dev_found() instead of acpi_match_device_ids() to check if
   it is a acpi node.
2. use is_of_node() instead of IS_ENABLED() to check if it is a DT node.
3. split the patch("add support of acpi for hns-mdio") into two patches:
   3.1 Move to use fwnode_handle
   3.2 Add ACPI
4. add the patch which subject is dsaf misc operation method
5. fix the comments by Andy Shevchenko

Kejian Yan (13):
  ACPI: bus: add stub acpi_dev_found() to linux/acpi.h
  ACPI: bus: add stub acpi_evaluate_dsm() to linux/acpi.h
  net: hisilicon: cleanup to prepare for other cases
  net: hisilicon: add support of acpi for hns-mdio
  net: hns: use device_* APIs instead of of_* APIs
  net: hns: use platform_get_irq instead of irq_of_parse_and_map
  net: hns: enet specify a reference to dsaf by fwnode_handle
  net: hns: add uniform interface for phy connection
  net: hns: add dsaf misc operation method
  net: hns: dsaf adds support of acpi
  net: hns: register phy device in each mac initial sequence
  net: hns: implement the miscellaneous operation by asl
  net: hns: net: hns: enet adds support of acpi

 drivers/net/ethernet/hisilicon/hns/hnae.c          |  18 +-
 drivers/net/ethernet/hisilicon/hns/hnae.h          |   5 +-
 drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c  |   6 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c |   6 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c  | 247 +++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h  |   4 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 105 ++---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h |  33 ++-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 250 ++---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.h |   7 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c  |  15 +-
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c  |   5 +-
 .../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c    |  10 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c      |  90 +---
 drivers/net/ethernet/hisilicon/hns/hns_enet.h      |   2 +-
 drivers/net/ethernet/hisilicon/hns/hns_ethtool.c   |   2 +-
 drivers/net/ethernet/hisilicon/hns_mdio.c          | 150 +++--
 include/linux/acpi.h                               |  13 ++
 18 files changed, 706 insertions(+), 262 deletions(-)

--
1.9.1
[PATCH v4 net-next 01/13] ACPI: bus: add stub acpi_dev_found() to linux/acpi.h
From: Kejian Yan

acpi_dev_found() will be used to detect if a given ACPI device is in the
system. The callers are also compiled in the non-ACPI case, but the
function is declared in acpi_bus.h, which can only be used in the ACPI
case. So this patch adds a stub function to linux/acpi.h so that the
code compiles successfully in non-ACPI cases.

Cc: Rafael J. Wysocki
Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
 include/linux/acpi.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 288fac5..3025d19 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -543,6 +543,11 @@ struct platform_device *acpi_create_platform_device(struct acpi_device *);

 struct fwnode_handle;

+static inline bool acpi_dev_found(const char *hid)
+{
+	return false;
+}
+
 static inline bool is_acpi_node(struct fwnode_handle *fwnode)
 {
 	return false;
--
1.9.1
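The stub pattern used by this patch can be shown outside the kernel. In the sketch below, HAVE_ACPI_MODEL is a hypothetical stand-in for CONFIG_ACPI, and the _model names are illustrative only. The point is that the caller compiles unchanged whether or not the real implementation exists.

```c
#include <assert.h>
#include <stdbool.h>

/* HAVE_ACPI_MODEL is a hypothetical stand-in for CONFIG_ACPI. */
#ifdef HAVE_ACPI_MODEL
bool acpi_dev_found_model(const char *hid);	/* real implementation elsewhere */
#else
/* Stub: with ACPI compiled out, no device can ever be found. */
static inline bool acpi_dev_found_model(const char *hid)
{
	(void)hid;
	return false;
}
#endif

/* Caller needs no #ifdef of its own. */
static int probe_model(void)
{
	if (acpi_dev_found_model("HISI00C1"))
		return 1;	/* ACPI path */
	return 0;		/* non-ACPI path */
}
```

Since the stub is static inline and returns a constant, a compiler can typically drop the ACPI branch entirely in non-ACPI builds.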
[PATCH v4 net-next 12/13] net: hns: implement the miscellaneous operation by asl
From: Kejian Yan

The miscellaneous operations are implemented in the BIOS; in the ACPI
case the kernel can call the _DSM method to invoke that implementation.
Here is a patch to do that.

Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
change log:
v2: use a serial function to implement the reset sequence
v1: first submit
link: https://lkml.org/lkml/2016/5/13/94
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 167 +
 1 file changed, 167 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
index f21177b..96cb628 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c
@@ -12,6 +12,27 @@
 #include "hns_dsaf_ppe.h"
 #include "hns_dsaf_reg.h"

+enum _dsm_op_index {
+	HNS_OP_RESET_FUNC		= 0x1,
+	HNS_OP_SERDES_LP_FUNC		= 0x2,
+	HNS_OP_LED_SET_FUNC		= 0x3,
+	HNS_OP_GET_PORT_TYPE_FUNC	= 0x4,
+	HNS_OP_GET_SFP_STAT_FUNC	= 0x5,
+};
+
+enum _dsm_rst_type {
+	HNS_DSAF_RESET_FUNC	= 0x1,
+	HNS_PPE_RESET_FUNC	= 0x2,
+	HNS_XGE_CORE_RESET_FUNC	= 0x3,
+	HNS_XGE_RESET_FUNC	= 0x4,
+	HNS_GE_RESET_FUNC	= 0x5,
+};
+
+const u8 hns_dsaf_acpi_dsm_uuid[] = {
+	0x1A, 0xAA, 0x85, 0x1A, 0x93, 0xE2, 0x5E, 0x41,
+	0x8E, 0x28, 0x8D, 0x69, 0x0A, 0x0F, 0x82, 0x0A
+};
+
 static void dsaf_write_sub(struct dsaf_device *dsaf_dev, u32 reg, u32 val)
 {
 	if (dsaf_dev->sub_ctrl)
@@ -109,6 +130,34 @@ static int cpld_set_led_id(struct hns_mac_cb *mac_cb,

 #define RESET_REQ_OR_DREQ 1

+static void hns_dsaf_acpi_srst_by_port(struct dsaf_device *dsaf_dev, u8 op_type,
+				       u32 port_type, u32 port, u32 val)
+{
+	union acpi_object *obj;
+	union acpi_object obj_args[3], argv4;
+
+	obj_args[0].integer.type = ACPI_TYPE_INTEGER;
+	obj_args[0].integer.value = port_type;
+	obj_args[1].integer.type = ACPI_TYPE_INTEGER;
+	obj_args[1].integer.value = port;
+	obj_args[2].integer.type = ACPI_TYPE_INTEGER;
+	obj_args[2].integer.value = val;
+
+	argv4.type = ACPI_TYPE_PACKAGE;
+	argv4.package.count = 3;
+	argv4.package.elements = obj_args;
+
+	obj = acpi_evaluate_dsm(ACPI_HANDLE(dsaf_dev->dev),
+				hns_dsaf_acpi_dsm_uuid, 0, op_type, &argv4);
+	if (!obj) {
+		dev_warn(dsaf_dev->dev, "reset port_type%d port%d fail!",
+			 port_type, port);
+		return;
+	}
+
+	ACPI_FREE(obj);
+}
+
 static void hns_dsaf_rst(struct dsaf_device *dsaf_dev, bool dereset)
 {
 	u32 xbar_reg_addr;
@@ -126,6 +175,13 @@ static void hns_dsaf_rst(struct dsaf_device *dsaf_dev, bool dereset)
 	dsaf_write_sub(dsaf_dev, nt_reg_addr, RESET_REQ_OR_DREQ);
 }

+static void hns_dsaf_rst_acpi(struct dsaf_device *dsaf_dev, bool dereset)
+{
+	hns_dsaf_acpi_srst_by_port(dsaf_dev, HNS_OP_RESET_FUNC,
+				   HNS_DSAF_RESET_FUNC,
+				   0, dereset);
+}
+
 static void hns_dsaf_xge_srst_by_port(struct dsaf_device *dsaf_dev, u32 port,
 				      bool dereset)
 {
@@ -146,6 +202,13 @@ static void hns_dsaf_xge_srst_by_port(struct dsaf_device *dsaf_dev, u32 port,
 	dsaf_write_sub(dsaf_dev, reg_addr, reg_val);
 }

+static void hns_dsaf_xge_srst_by_port_acpi(struct dsaf_device *dsaf_dev,
+					   u32 port, bool dereset)
+{
+	hns_dsaf_acpi_srst_by_port(dsaf_dev, HNS_OP_RESET_FUNC,
+				   HNS_XGE_RESET_FUNC, port, dereset);
+}
+
 static void hns_dsaf_xge_core_srst_by_port(struct dsaf_device *dsaf_dev,
 					   u32 port, bool dereset)
 {
@@ -166,6 +229,14 @@ static void hns_dsaf_xge_core_srst_by_port(struct dsaf_device *dsaf_dev,
 	dsaf_write_sub(dsaf_dev, reg_addr, reg_val);
 }

+static void
+hns_dsaf_xge_core_srst_by_port_acpi(struct dsaf_device *dsaf_dev,
+				    u32 port, bool dereset)
+{
+	hns_dsaf_acpi_srst_by_port(dsaf_dev, HNS_OP_RESET_FUNC,
+				   HNS_XGE_CORE_RESET_FUNC, port, dereset);
+}
+
 static void hns_dsaf_ge_srst_by_port(struct dsaf_device *dsaf_dev, u32 port,
 				     bool dereset)
 {
@@ -218,6 +289,13 @@ static void hns_dsaf_ge_srst_by_port(struct dsaf_device *dsaf_dev, u32 port,
 	}
 }

+static void hns_dsaf_ge_srst_by_port_acpi(struct dsaf_device *dsaf_dev,
+					  u32 port, bool dereset)
+{
+	hns_dsaf_acpi_srst_by_port(dsaf_dev, HNS_OP_RESET_FUNC,
+				   HNS_GE_RESET_FUNC, port, dereset);
+}
+
 static void hns_ppe_srst_by_port(struct dsaf_device
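The 16-byte hns_dsaf_acpi_dsm_uuid array in this patch is the raw buffer form of a UUID. A small sketch that renders it as text, under the assumption that the buffer uses the mixed-endian GUID layout (first three fields little-endian, remaining bytes in order) that firmware-side UUID buffers normally use. The formatting helper is hypothetical; only the byte values come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Byte sequence copied from the patch. */
static const uint8_t hns_dsaf_acpi_dsm_uuid[16] = {
	0x1A, 0xAA, 0x85, 0x1A, 0x93, 0xE2, 0x5E, 0x41,
	0x8E, 0x28, 0x8D, 0x69, 0x0A, 0x0F, 0x82, 0x0A
};

/* Format 16 raw bytes as GUID text, assuming the mixed-endian layout:
 * fields 1-3 are byte-swapped, the last 8 bytes are printed in order. */
static void guid_to_string(const uint8_t u[16], char out[37])
{
	snprintf(out, 37,
		 "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
		 "%02X%02X%02X%02X%02X%02X",
		 u[3], u[2], u[1], u[0], u[5], u[4], u[7], u[6],
		 u[8], u[9], u[10], u[11], u[12], u[13], u[14], u[15]);
}
```

Under that layout assumption the bytes read as 1A85AA1A-E293-415E-8E28-8D690A0F820A, which is the form one would expect to see on the firmware side of the _DSM.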
[PATCH v4 net-next 04/13] net: hisilicon: add support of acpi for hns-mdio
From: Kejian Yan

hns-mdio needs to register itself to mii-bus. The info of the device can
be read by both DT and ACPI. HNS tries to call the Linux PHY driver to
help access PHY-devices; the HNS hardware topology is as below. The MDIO
controller may control several PHY-devices, and each PHY-device connects
to a MAC device. The MDIO will be registered to mdiobus, then
PHY-devices will register when each mac finds its PHY device.

[topology diagram: cpu; dsaf, MDIO, MDIO; MAC MAC MAC MAC; PHY PHY PHY PHY]

And the driver can handle the reset sequence by the _RST method in DSDT
in the ACPI case.

Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
change log:
v2:
1. use dev_of_node instead of IS_ENABLED macro
2. Add ACPI bits
v1: first submit
Link: https://lkml.org/lkml/2016/5/13/93
---
 drivers/net/ethernet/hisilicon/hns_mdio.c | 106 +++---
 1 file changed, 69 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns_mdio.c b/drivers/net/ethernet/hisilicon/hns_mdio.c
index 297edc4..761a32f 100644
--- a/drivers/net/ethernet/hisilicon/hns_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns_mdio.c
@@ -7,6 +7,7 @@
  * (at your option) any later version.
  */

+#include
 #include
 #include
 #include
@@ -354,48 +355,60 @@ static int hns_mdio_reset(struct mii_bus *bus)
 	struct hns_mdio_device *mdio_dev = (struct hns_mdio_device *)bus->priv;
 	int ret;

-	if (!dev_of_node(bus->parent))
-		return -ENOTSUPP;
+	if (dev_of_node(bus->parent)) {
+		if (!mdio_dev->subctrl_vbase) {
+			dev_err(&bus->dev, "mdio sys ctl reg has not maped\n");
+			return -ENODEV;
+		}

-	if (!mdio_dev->subctrl_vbase) {
-		dev_err(&bus->dev, "mdio sys ctl reg has not maped\n");
-		return -ENODEV;
-	}
+		/* 1. reset req, and read reset st check */
+		ret = mdio_sc_cfg_reg_write(mdio_dev, MDIO_SC_RESET_REQ, 0x1,
+					    MDIO_SC_RESET_ST, 0x1,
+					    MDIO_CHECK_SET_ST);
+		if (ret) {
+			dev_err(&bus->dev, "MDIO reset fail\n");
+			return ret;
+		}

-	/*1. reset req, and read reset st check*/
-	ret = mdio_sc_cfg_reg_write(mdio_dev, MDIO_SC_RESET_REQ, 0x1,
-				    MDIO_SC_RESET_ST, 0x1,
-				    MDIO_CHECK_SET_ST);
-	if (ret) {
-		dev_err(&bus->dev, "MDIO reset fail\n");
-		return ret;
-	}
+		/* 2. dis clk, and read clk st check */
+		ret = mdio_sc_cfg_reg_write(mdio_dev, MDIO_SC_CLK_DIS,
+					    0x1, MDIO_SC_CLK_ST, 0x1,
+					    MDIO_CHECK_CLR_ST);
+		if (ret) {
+			dev_err(&bus->dev, "MDIO dis clk fail\n");
+			return ret;
+		}

-	/*2. dis clk, and read clk st check*/
-	ret = mdio_sc_cfg_reg_write(mdio_dev, MDIO_SC_CLK_DIS,
-				    0x1, MDIO_SC_CLK_ST, 0x1,
-				    MDIO_CHECK_CLR_ST);
-	if (ret) {
-		dev_err(&bus->dev, "MDIO dis clk fail\n");
-		return ret;
-	}
+		/* 3. reset dreq, and read reset st check */
+		ret = mdio_sc_cfg_reg_write(mdio_dev, MDIO_SC_RESET_DREQ, 0x1,
+					    MDIO_SC_RESET_ST, 0x1,
+					    MDIO_CHECK_CLR_ST);
+		if (ret) {
+			dev_err(&bus->dev, "MDIO dis clk fail\n");
+			return ret;
+		}

-	/*3. reset dreq, and read reset st check*/
-	ret = mdio_sc_cfg_reg_write(mdio_dev, MDIO_SC_RESET_DREQ, 0x1,
-				    MDIO_SC_RESET_ST, 0x1,
-				    MDIO_CHECK_CLR_ST);
-	if (ret) {
-		dev_err(&bus->dev, "MDIO dis clk fail\n");
-		return ret;
+		/* 4. en clk, and read clk st check */
+		ret = mdio_sc_cfg_reg_write(mdio_dev, MDIO_SC_CLK_EN,
+
[PATCH v4 net-next 03/13] net: hisilicon: cleanup to prepare for other cases
From: Kejian Yan

hns-mdio only supports the DT case now. Do some cleanup to prepare for
introducing other cases later; no functional change.

Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
change log:
v4: mii-id gets from dev_name instead of address
v3: first submit
Link: https://lkml.org/lkml/2016/5/30/298
---
 drivers/net/ethernet/hisilicon/hns_mdio.c | 48 ---
 1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns_mdio.c b/drivers/net/ethernet/hisilicon/hns_mdio.c
index 765ddb3..297edc4 100644
--- a/drivers/net/ethernet/hisilicon/hns_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns_mdio.c
@@ -354,6 +354,9 @@ static int hns_mdio_reset(struct mii_bus *bus)
 	struct hns_mdio_device *mdio_dev = (struct hns_mdio_device *)bus->priv;
 	int ret;

+	if (!dev_of_node(bus->parent))
+		return -ENOTSUPP;
+
 	if (!mdio_dev->subctrl_vbase) {
 		dev_err(&bus->dev, "mdio sys ctl reg has not maped\n");
 		return -ENODEV;
@@ -397,24 +400,6 @@ static int hns_mdio_reset(struct mii_bus *bus)
 }

 /**
- * hns_mdio_bus_name - get mdio bus name
- * @name: mdio bus name
- * @np: mdio device node pointer
- */
-static void hns_mdio_bus_name(char *name, struct device_node *np)
-{
-	const u32 *addr;
-	u64 taddr = OF_BAD_ADDR;
-
-	addr = of_get_address(np, 0, NULL, NULL);
-	if (addr)
-		taddr = of_translate_address(np, addr);
-
-	snprintf(name, MII_BUS_ID_SIZE, "%s@%llx", np->name,
-		 (unsigned long long)taddr);
-}
-
-/**
  * hns_mdio_probe - probe mdio device
  * @pdev: mdio platform device
  *
@@ -422,17 +407,16 @@ static void hns_mdio_bus_name(char *name, struct device_node *np)
  */
 static int hns_mdio_probe(struct platform_device *pdev)
 {
-	struct device_node *np;
 	struct hns_mdio_device *mdio_dev;
 	struct mii_bus *new_bus;
 	struct resource *res;
-	int ret;
+	int ret = -ENODEV;

 	if (!pdev) {
 		dev_err(NULL, "pdev is NULL!\r\n");
 		return -ENODEV;
 	}
-	np = pdev->dev.of_node;
+
 	mdio_dev = devm_kzalloc(&pdev->dev, sizeof(*mdio_dev), GFP_KERNEL);
 	if (!mdio_dev)
 		return -ENOMEM;
@@ -448,7 +432,7 @@ static int hns_mdio_probe(struct platform_device *pdev)
 	new_bus->write = hns_mdio_write;
 	new_bus->reset = hns_mdio_reset;
 	new_bus->priv = mdio_dev;
-	hns_mdio_bus_name(new_bus->id, np);
+	new_bus->parent = &pdev->dev;

 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	mdio_dev->vbase = devm_ioremap_resource(&pdev->dev, res);
@@ -457,16 +441,20 @@ static int hns_mdio_probe(struct platform_device *pdev)
 		return ret;
 	}

-	mdio_dev->subctrl_vbase =
-		syscon_node_to_regmap(of_parse_phandle(np, "subctrl-vbase", 0));
-	if (IS_ERR(mdio_dev->subctrl_vbase)) {
-		dev_warn(&pdev->dev, "no syscon hisilicon,peri-c-subctrl\n");
-		mdio_dev->subctrl_vbase = NULL;
-	}
-	new_bus->parent = &pdev->dev;
 	platform_set_drvdata(pdev, new_bus);
+	snprintf(new_bus->id, MII_BUS_ID_SIZE, "%s-%s", "Mii",
+		 dev_name(&pdev->dev));
+	if (dev_of_node(&pdev->dev)) {
+		mdio_dev->subctrl_vbase = syscon_node_to_regmap(
+				of_parse_phandle(pdev->dev.of_node,
+						 "subctrl-vbase", 0));
+		if (IS_ERR(mdio_dev->subctrl_vbase)) {
+			dev_warn(&pdev->dev, "no syscon hisilicon,peri-c-subctrl\n");
+			mdio_dev->subctrl_vbase = NULL;
+		}
+		ret = of_mdiobus_register(new_bus, pdev->dev.of_node);
+	}

-	ret = of_mdiobus_register(new_bus, np);
 	if (ret) {
 		dev_err(&pdev->dev, "Cannot register as MDIO bus!\n");
 		platform_set_drvdata(pdev, NULL);
--
1.9.1
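The v4 change builds the bus id from the device name with snprintf() into a fixed-size buffer. A short user-space sketch of that call follows. The buffer size of 17 is an assumed value for illustration (the kernel's MII_BUS_ID_SIZE lives in linux/phy.h and should be checked for the tree in use), and the device names in the usage are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed size for illustration only; check linux/phy.h for the
 * real MII_BUS_ID_SIZE in a given kernel tree. */
#define MII_BUS_ID_SIZE_MODEL 17

/* Build the id the way the patch does: "Mii-" followed by the device name.
 * Returns nonzero if the result had to be truncated to fit. */
static int make_bus_id(char id[MII_BUS_ID_SIZE_MODEL], const char *dev_name)
{
	int n = snprintf(id, MII_BUS_ID_SIZE_MODEL, "%s-%s", "Mii", dev_name);

	return n >= MII_BUS_ID_SIZE_MODEL;
}
```

snprintf() always terminates the buffer, so a long device name is silently cut rather than overflowing; two devices whose names differ only past the cut would end up with the same id, which is worth keeping in mind when device names are long.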
[PATCH v4 net-next 07/13] net: hns: enet specify a reference to dsaf by fwnode_handle
From: Kejian Yan

As device_node is only used in the DT case, a uniform way to reference
dsaf is expected. fwnode_handle is the suitable method.

Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
change log:
v2: remove the redundant line
v1: first submit
link: https://lkml.org/lkml/2016/5/13/98
---
 drivers/net/ethernet/hisilicon/hns/hnae.c     | 12 ++--
 drivers/net/ethernet/hisilicon/hns/hnae.h     |  2 +-
 drivers/net/ethernet/hisilicon/hns/hns_enet.c | 14 --
 drivers/net/ethernet/hisilicon/hns/hns_enet.h |  2 +-
 4 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.c b/drivers/net/ethernet/hisilicon/hns/hnae.c
index 3bfe36f..d630acd 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.c
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.c
@@ -96,16 +96,16 @@ static int __ae_match(struct device *dev, const void *data)
 {
 	struct hnae_ae_dev *hdev = cls_to_ae_dev(dev);

-	return hdev->dev->of_node == data;
+	return (data == &hdev->dev->of_node->fwnode);
 }

-static struct hnae_ae_dev *find_ae(const struct device_node *ae_node)
+static struct hnae_ae_dev *find_ae(const struct fwnode_handle *fwnode)
 {
 	struct device *dev;

-	WARN_ON(!ae_node);
+	WARN_ON(!fwnode);

-	dev = class_find_device(hnae_class, NULL, ae_node, __ae_match);
+	dev = class_find_device(hnae_class, NULL, fwnode, __ae_match);

 	return dev ? cls_to_ae_dev(dev) : NULL;
 }
@@ -312,7 +312,7 @@ EXPORT_SYMBOL(hnae_reinit_handle);
  * return handle ptr or ERR_PTR
  */
 struct hnae_handle *hnae_get_handle(struct device *owner_dev,
-				    const struct device_node *ae_node,
+				    const struct fwnode_handle *fwnode,
 				    u32 port_id,
 				    struct hnae_buf_ops *bops)
 {
@@ -321,7 +321,7 @@ struct hnae_handle *hnae_get_handle(struct device *owner_dev,
 	int i, j;
 	int ret;

-	dev = find_ae(ae_node);
+	dev = find_ae(fwnode);
 	if (!dev)
 		return ERR_PTR(-ENODEV);

diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index e8d36aa..f5f8140 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -528,7 +528,7 @@ struct hnae_handle {
 #define ring_to_dev(ring) ((ring)->q->dev->dev)

 struct hnae_handle *hnae_get_handle(struct device *owner_dev,
-				    const struct device_node *ae_node,
+				    const struct fwnode_handle *fwnode,
 				    u32 port_id,
 				    struct hnae_buf_ops *bops);

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 8851420..93f6ccb 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1807,7 +1807,7 @@ static int hns_nic_try_get_ae(struct net_device *ndev)
 	int ret;

 	h = hnae_get_handle(&priv->netdev->dev,
-			    priv->ae_node, priv->port_id, NULL);
+			    priv->fwnode, priv->port_id, NULL);
 	if (IS_ERR_OR_NULL(h)) {
 		ret = -ENODEV;
 		dev_dbg(priv->dev, "has not handle, register notifier!\n");
@@ -1867,7 +1867,7 @@ static int hns_nic_dev_probe(struct platform_device *pdev)
 	struct device *dev = &pdev->dev;
 	struct net_device *ndev;
 	struct hns_nic_priv *priv;
-	struct device_node *node = dev->of_node;
+	struct device_node *ae_node;
 	u32 port_id;
 	int ret;

@@ -1881,17 +1881,19 @@ static int hns_nic_dev_probe(struct platform_device *pdev)
 	priv->dev = dev;
 	priv->netdev = ndev;

-	if (of_device_is_compatible(node, "hisilicon,hns-nic-v1"))
+	if (of_device_is_compatible(dev->of_node, "hisilicon,hns-nic-v1"))
 		priv->enet_ver = AE_VERSION_1;
 	else
 		priv->enet_ver = AE_VERSION_2;

-	priv->ae_node = (void *)of_parse_phandle(node, "ae-handle", 0);
-	if (IS_ERR_OR_NULL(priv->ae_node)) {
-		ret = PTR_ERR(priv->ae_node);
+	ae_node = of_parse_phandle(dev->of_node, "ae-handle", 0);
+	if (IS_ERR_OR_NULL(ae_node)) {
+		ret = PTR_ERR(ae_node);
 		dev_err(dev, "not find ae-handle\n");
 		goto out_read_prop_fail;
 	}
+	priv->fwnode = &ae_node->fwnode;
+
 	/* try to find port-idx-in-ae first */
 	ret = device_property_read_u32(dev, "port-idx-in-ae", &port_id);
 	if (ret) {
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.h b/drivers/net/ethernet/hisilicon/hns/hns_enet.h
index 337efa5..44bb301 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.h
+++
[PATCH v4 net-next 13/13] net: hns: net: hns: enet adds support of acpi
From: Kejian YanEnet needs to get configration parameter by acpi. This patch adds support of ACPI for enet. The configuration parameter will be configed in BIOS. Signed-off-by: Kejian Yan Signed-off-by: Yisen Zhuang --- change log: v2: 1. use acpi_dev_found() instead of acpi_match_device_ids() 2. use is_acpi_node() to check if it works by ACPI case 3. use dev_of_node() to check if it works by DT case v1: first submit link: https://lkml.org/lkml/2016/5/13/99 --- drivers/net/ethernet/hisilicon/hns/hns_enet.c | 56 +-- 1 file changed, 44 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c index 3ec3c27..ad742a6 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c @@ -132,6 +132,13 @@ static void fill_v2_desc(struct hnae_ring *ring, void *priv, ring_ptr_move_fw(ring, next_to_use); } +static const struct acpi_device_id hns_enet_acpi_match[] = { + { "HISI00C1", 0 }, + { "HISI00C2", 0 }, + { }, +}; +MODULE_DEVICE_TABLE(acpi, hns_enet_acpi_match); + static void fill_desc(struct hnae_ring *ring, void *priv, int size, dma_addr_t dma, int frag_end, int buf_num, enum hns_desc_type type, int mtu) @@ -1870,7 +1877,6 @@ static int hns_nic_dev_probe(struct platform_device *pdev) struct device *dev = >dev; struct net_device *ndev; struct hns_nic_priv *priv; - struct device_node *ae_node; u32 port_id; int ret; @@ -1884,20 +1890,45 @@ static int hns_nic_dev_probe(struct platform_device *pdev) priv->dev = dev; priv->netdev = ndev; - if (of_device_is_compatible(dev->of_node, "hisilicon,hns-nic-v1")) - priv->enet_ver = AE_VERSION_1; - else - priv->enet_ver = AE_VERSION_2; + if (dev_of_node(dev)) { + struct device_node *ae_node; - ae_node = of_parse_phandle(dev->of_node, "ae-handle", 0); - if (IS_ERR_OR_NULL(ae_node)) { - ret = PTR_ERR(ae_node); - dev_err(dev, "not find ae-handle\n"); - goto out_read_prop_fail; + if 
(of_device_is_compatible(dev->of_node, + "hisilicon,hns-nic-v1")) + priv->enet_ver = AE_VERSION_1; + else + priv->enet_ver = AE_VERSION_2; + + ae_node = of_parse_phandle(dev->of_node, "ae-handle", 0); + if (IS_ERR_OR_NULL(ae_node)) { + ret = PTR_ERR(ae_node); + dev_err(dev, "not find ae-handle\n"); + goto out_read_prop_fail; + } + priv->fwnode = _node->fwnode; + } else if (is_acpi_node(dev->fwnode)) { + struct acpi_reference_args args; + + if (acpi_dev_found(hns_enet_acpi_match[0].id)) + priv->enet_ver = AE_VERSION_1; + else if (acpi_dev_found(hns_enet_acpi_match[1].id)) + priv->enet_ver = AE_VERSION_2; + else + return -ENXIO; + + /* try to find port-idx-in-ae first */ + ret = acpi_node_get_property_reference(dev->fwnode, + "ae-handle", 0, ); + if (ret) { + dev_err(dev, "not find ae-handle\n"); + goto out_read_prop_fail; + } + priv->fwnode = acpi_fwnode_handle(args.adev); + } else { + dev_err(dev, "cannot read cfg data from OF or acpi\n"); + return -ENXIO; } - priv->fwnode = _node->fwnode; - /* try to find port-idx-in-ae first */ ret = device_property_read_u32(dev, "port-idx-in-ae", _id); if (ret) { /* only for old code compatible */ @@ -2014,6 +2045,7 @@ static struct platform_driver hns_nic_dev_driver = { .driver = { .name = "hns-nic", .of_match_table = hns_enet_of_match, + .acpi_match_table = ACPI_PTR(hns_enet_acpi_match), }, .probe = hns_nic_dev_probe, .remove = hns_nic_dev_remove, -- 1.9.1
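A minimal userspace sketch of the version selection in the ACPI branch above, assuming the hardware version can be keyed directly off the matched ID string. The names `sketch_hid_entry`, `sketch_version_from_hid` and the `SKETCH_AE_*` values are hypothetical and are not part of the driver; the patch itself calls acpi_dev_found() on the two table entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical version values; the driver uses its own AE_VERSION_* constants. */
enum sketch_ae_version {
	SKETCH_AE_UNKNOWN = 0,
	SKETCH_AE_V1,
	SKETCH_AE_V2,
};

/* Hypothetical table entry pairing a hardware ID with a version. */
struct sketch_hid_entry {
	const char *hid;
	enum sketch_ae_version ver;
};

/* The two IDs are the ones listed in the patch; the table is NULL-terminated. */
static const struct sketch_hid_entry sketch_hid_table[] = {
	{ "HISI00C1", SKETCH_AE_V1 },
	{ "HISI00C2", SKETCH_AE_V2 },
	{ NULL, SKETCH_AE_UNKNOWN },
};

/* Walk the table and return the version for hid, or SKETCH_AE_UNKNOWN. */
static enum sketch_ae_version
sketch_version_from_hid(const struct sketch_hid_entry *table, const char *hid)
{
	assert(table != NULL);

	if (!hid)
		return SKETCH_AE_UNKNOWN;

	for (; table->hid; table++)
		if (strcmp(table->hid, hid) == 0)
			return table->ver;

	return SKETCH_AE_UNKNOWN;
}
```

Keeping the version next to the ID avoids indexing the match table by position, at the cost of one extra field per entry.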
[PATCH v4 net-next 10/13] net: hns: dsaf adds support of acpi
From: Kejian YanDsaf needs to get configuration parameter by ACPI, so this patch add support of ACPI. Signed-off-by: Kejian Yan Signed-off-by: Yisen Zhuang --- change log: v2: 1. use dev_of_node() instead of IS_ENABLED() to check if it is in DT case, 2. split a new patch to implement misc operation method, 3. use acpi_dev_found() instead of acpi_match_device_ids() to check which hw version it is, 4. use is_acpi_node instead of ACPI_COMPANION to check if it is work in ACPI case. v1: first submit link: https://lkml.org/lkml/2016/5/13/108 --- drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 80 ++-- drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 85 +++--- drivers/net/ethernet/hisilicon/hns/hns_dsaf_misc.c | 32 3 files changed, 114 insertions(+), 83 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c index 2ebf14a..3ef0c9b 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c @@ -689,9 +689,7 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb) return 0; } - if (!is_of_node(mac_cb->fw_port)) - return -EINVAL; - + if (is_of_node(mac_cb->fw_port)) { /* parse property from port subnode in dsaf */ np = of_parse_phandle(to_of_node(mac_cb->fw_port), "phy-handle", 0); mac_cb->phy_dev = of_phy_find_device(np); @@ -701,47 +699,49 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb) mac_cb->mac_id, np->name); } - syscon = syscon_node_to_regmap( - of_parse_phandle(to_of_node(mac_cb->fw_port), -"serdes-syscon", 0)); - if (IS_ERR_OR_NULL(syscon)) { - dev_err(mac_cb->dev, "serdes-syscon is needed!\n"); - return -EINVAL; - } - mac_cb->serdes_ctrl = syscon; - - ret = fwnode_property_read_u32(mac_cb->fw_port, - "port-rst-offset", - _cb->port_rst_off); - if (ret) { - dev_dbg(mac_cb->dev, - "mac%d port-rst-offset not found, use default value.\n", - mac_cb->mac_id); - } + syscon = syscon_node_to_regmap( + 
of_parse_phandle(to_of_node(mac_cb->fw_port), +"serdes-syscon", 0)); + if (IS_ERR_OR_NULL(syscon)) { + dev_err(mac_cb->dev, "serdes-syscon is needed!\n"); + return -EINVAL; + } + mac_cb->serdes_ctrl = syscon; - ret = fwnode_property_read_u32(mac_cb->fw_port, - "port-mode-offset", - _cb->port_mode_off); - if (ret) { - dev_dbg(mac_cb->dev, - "mac%d port-mode-offset not found, use default value.\n", - mac_cb->mac_id); - } + ret = fwnode_property_read_u32(mac_cb->fw_port, + "port-rst-offset", + _cb->port_rst_off); + if (ret) { + dev_dbg(mac_cb->dev, + "mac%d port-rst-offset not found, use default value.\n", + mac_cb->mac_id); + } - ret = of_parse_phandle_with_fixed_args(to_of_node(mac_cb->fw_port), - "cpld-syscon", 1, 0, _args); - if (ret) { - dev_dbg(mac_cb->dev, "mac%d no cpld-syscon found.\n", - mac_cb->mac_id); - mac_cb->cpld_ctrl = NULL; - } else { - syscon = syscon_node_to_regmap(cpld_args.np); - if (IS_ERR_OR_NULL(syscon)) { - dev_dbg(mac_cb->dev, "no cpld-syscon found!\n"); + ret = fwnode_property_read_u32(mac_cb->fw_port, + "port-mode-offset", + _cb->port_mode_off); + if (ret) { + dev_dbg(mac_cb->dev, + "mac%d port-mode-offset not found, use default value.\n", + mac_cb->mac_id); + } + + ret = of_parse_phandle_with_fixed_args( + to_of_node(mac_cb->fw_port), "cpld-syscon", 1, 0, + _args); + if (ret) { + dev_dbg(mac_cb->dev, "mac%d no cpld-syscon found.\n", + mac_cb->mac_id); mac_cb->cpld_ctrl = NULL;
[PATCH v4 net-next 11/13] net: hns: register phy device in each mac initial sequence
From: Kejian YanIn ACPI case, there is no interface to register phy device to mdio-bus. Phy device has to be registered itself to mdio-bus, and then enet can get the phy device's info so that it can config the phy-device to help to trasmit and receive data. HNS hardware topology is as below. The MDIO controller may control several PHY-devices, and each PHY-device connects to a MAC device. PHY-devices will register when each mac find PHY device in initial sequence. cpu | | --- | | | | | | | dsaf | MDIO | MDIO | --- | | | | | | | | | | | | | |MAC MAC MAC MAC| | | | | | | | | | | |||||| || PHY PHY PHY PHY Signed-off-by: Kejian Yan Signed-off-by: Yisen Zhuang --- change log: v2: fix the build error by kbuild test robot v1: first submit link: https://lkml.org/lkml/2016/5/13/97 --- drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 133 -- 1 file changed, 126 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c index 3ef0c9b..c526558 100644 --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c @@ -7,6 +7,7 @@ * (at your option) any later version. 
*/ +#include #include #include #include @@ -638,6 +639,115 @@ free_mac_drv: return ret; } +static int +hns_mac_phy_parse_addr(struct device *dev, struct fwnode_handle *fwnode) +{ + u32 addr; + int ret; + + ret = fwnode_property_read_u32(fwnode, "phy-addr", ); + if (ret) { + dev_err(dev, "has invalid PHY address ret:%d\n", ret); + return ret; + } + + if (addr >= PHY_MAX_ADDR) { + dev_err(dev, "PHY address %i is too large\n", addr); + return -EINVAL; + } + + return addr; +} + +static int hns_mac_phydev_match(struct device *dev, void *fwnode) +{ + return dev->fwnode == fwnode; +} + +static struct +platform_device *hns_mac_find_platform_device(struct fwnode_handle *fwnode) +{ + struct device *dev; + + dev = bus_find_device(_bus_type, NULL, + fwnode, hns_mac_phydev_match); + return dev ? to_platform_device(dev) : NULL; +} + +static int +hns_mac_register_phydev(struct mii_bus *mdio, struct hns_mac_cb *mac_cb, + u32 addr) +{ + struct phy_device *phy; + const char *phy_type; + bool is_c45; + int rc; + + rc = fwnode_property_read_string(mac_cb->fw_port, +"phy-mode", _type); + if (rc < 0) + return rc; + + if (!strcmp(phy_type, phy_modes(PHY_INTERFACE_MODE_XGMII))) + is_c45 = 1; + else if (!strcmp(phy_type, phy_modes(PHY_INTERFACE_MODE_SGMII))) + is_c45 = 0; + else + return -ENODATA; + + phy = get_phy_device(mdio, addr, is_c45); + if (!phy || IS_ERR(phy)) + return -EIO; + + if (mdio->irq) + phy->irq = mdio->irq[addr]; + + /* All data is now stored in the phy struct; +* register it +*/ + rc = phy_device_register(phy); + if (rc) { + phy_device_free(phy); + return -ENODEV; + } + + mac_cb->phy_dev = phy; + + dev_dbg(>dev, "registered phy at address %i\n", addr); + + return 0; +} + +static void hns_mac_register_phy(struct hns_mac_cb *mac_cb) +{ + struct acpi_reference_args args; + struct platform_device *pdev; + struct mii_bus *mii_bus; + int rc; + int addr; + + /* Loop over the child nodes and register a phy_device for each one */ + if (!to_acpi_device_node(mac_cb->fw_port)) + 
return; + + rc = acpi_node_get_property_reference( + mac_cb->fw_port, "mdio-node", 0, ); + if (rc) + return; + + addr = hns_mac_phy_parse_addr(mac_cb->dev, mac_cb->fw_port); + if (addr < 0) + return; + + /* dev address in adev */ + pdev = hns_mac_find_platform_device(acpi_fwnode_handle(args.adev)); + mii_bus = platform_get_drvdata(pdev); + rc = hns_mac_register_phydev(mii_bus, mac_cb, addr); + if (!rc) + dev_dbg(mac_cb->dev, "mac%d register phy addr:%d\n",
[PATCH v4 net-next 02/13] ACPI: bus: add stub acpi_evaluate_dsm() to linux/acpi.h
From: Kejian Yan

acpi_evaluate_dsm() will be used to handle the _DSM method in the ACPI case. Code that calls it is also compiled in the non-ACPI case, but the function is declared in acpi_bus.h, which can only be used in the ACPI case. This patch therefore adds a stub function to linux/acpi.h so that the build succeeds in non-ACPI cases.

Cc: Rafael J. Wysocki
Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
 include/linux/acpi.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 3025d19..4d4bb49 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -659,6 +659,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
 	return false;
 }
 
+static inline union acpi_object *acpi_evaluate_dsm(acpi_handle handle,
+						   const u8 *uuid,
+						   int rev, int func,
+						   union acpi_object *argv4)
+{
+	return NULL;
+}
+
 static inline int acpi_device_uevent_modalias(struct device *dev,
 					       struct kobj_uevent_env *env)
 {
-- 
1.9.1
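The stub pattern used by this patch can be sketched outside the kernel as follows. This is only an illustration: `SKETCH_HAVE_ACPI`, `sketch_object`, `sketch_evaluate_dsm` and `sketch_reset_port` are hypothetical names standing in for CONFIG_ACPI, the real object type, the real function and a caller.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque result type; only pointers to it are passed around here. */
union sketch_object;

/*
 * SKETCH_HAVE_ACPI is a hypothetical switch standing in for CONFIG_ACPI.
 * It is left undefined, so the inline stub below is what gets compiled.
 */
#ifdef SKETCH_HAVE_ACPI
union sketch_object *sketch_evaluate_dsm(void *handle, int rev, int func);
#else
static inline union sketch_object *sketch_evaluate_dsm(void *handle,
						       int rev, int func)
{
	(void)handle;
	(void)rev;
	(void)func;
	return NULL;	/* no firmware method available in this build */
}
#endif

/*
 * The caller compiles unchanged in both configurations and treats a
 * NULL result as "method not available".
 */
static int sketch_reset_port(void *handle, int func)
{
	union sketch_object *obj;

	assert(func >= 0);

	obj = sketch_evaluate_dsm(handle, 0, func);
	if (!obj)
		return -1;

	return 0;
}
```

With the stub in the shared header, callers need no #ifdef of their own; they only have to handle the NULL return.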
[PATCH v4 net-next 05/13] net: hns: use device_* APIs instead of of_* APIs
From: Kejian Yan

The OF series of functions can be used only in the DT case. Use the unified device property functions instead to support both DT and ACPI.

Signed-off-by: Kejian Yan
Signed-off-by: Yisen Zhuang
---
 drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c |  9 +++++----
 drivers/net/ethernet/hisilicon/hns/hns_enet.c      | 11 +++--------
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index 1c2ddb2..9afc5e6 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -50,7 +50,7 @@ int hns_dsaf_get_cfg(struct dsaf_device *dsaf_dev)
 	else
 		dsaf_dev->dsaf_ver = AE_VERSION_2;
 
-	ret = of_property_read_string(np, "mode", &mode_str);
+	ret = device_property_read_string(dsaf_dev->dev, "mode", &mode_str);
 	if (ret) {
 		dev_err(dsaf_dev->dev, "get dsaf mode fail, ret=%d!\n", ret);
 		return ret;
@@ -142,7 +142,7 @@ int hns_dsaf_get_cfg(struct dsaf_device *dsaf_dev)
 		}
 	}
 
-	ret = of_property_read_u32(np, "desc-num", &desc_num);
+	ret = device_property_read_u32(dsaf_dev->dev, "desc-num", &desc_num);
 	if (ret < 0 || desc_num < HNS_DSAF_MIN_DESC_CNT ||
 	    desc_num > HNS_DSAF_MAX_DESC_CNT) {
 		dev_err(dsaf_dev->dev, "get desc-num(%d) fail, ret=%d!\n",
@@ -151,14 +151,15 @@ int hns_dsaf_get_cfg(struct dsaf_device *dsaf_dev)
 	}
 	dsaf_dev->desc_num = desc_num;
 
-	ret = of_property_read_u32(np, "reset-field-offset", &reset_offset);
+	ret = device_property_read_u32(dsaf_dev->dev, "reset-field-offset",
+				       &reset_offset);
 	if (ret < 0) {
 		dev_dbg(dsaf_dev->dev,
 			"get reset-field-offset fail, ret=%d!\r\n", ret);
 	}
 	dsaf_dev->reset_offset = reset_offset;
 
-	ret = of_property_read_u32(np, "buf-size", &buf_size);
+	ret = device_property_read_u32(dsaf_dev->dev, "buf-size", &buf_size);
 	if (ret < 0) {
 		dev_err(dsaf_dev->dev,
 			"get buf-size fail, ret=%d!\r\n", ret);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index e621636..8851420 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -1067,13 +1067,8 @@ void hns_nic_update_stats(struct net_device *netdev)
 static void hns_init_mac_addr(struct net_device *ndev)
 {
 	struct hns_nic_priv *priv = netdev_priv(ndev);
-	struct device_node *node = priv->dev->of_node;
-	const void *mac_addr_temp;
 
-	mac_addr_temp = of_get_mac_address(node);
-	if (mac_addr_temp && is_valid_ether_addr(mac_addr_temp)) {
-		memcpy(ndev->dev_addr, mac_addr_temp, ndev->addr_len);
-	} else {
+	if (!device_get_mac_address(priv->dev, ndev->dev_addr, ETH_ALEN)) {
 		eth_hw_addr_random(ndev);
 		dev_warn(priv->dev, "No valid mac, use random mac %pM",
 			 ndev->dev_addr);
@@ -1898,10 +1893,10 @@ static int hns_nic_dev_probe(struct platform_device *pdev)
 		goto out_read_prop_fail;
 	}
 	/* try to find port-idx-in-ae first */
-	ret = of_property_read_u32(node, "port-idx-in-ae", &port_id);
+	ret = device_property_read_u32(dev, "port-idx-in-ae", &port_id);
 	if (ret) {
 		/* only for old code compatible */
-		ret = of_property_read_u32(node, "port-id", &port_id);
+		ret = device_property_read_u32(dev, "port-id", &port_id);
 		if (ret)
 			goto out_read_prop_fail;
 		/* for old dts, we need to caculate the port offset */
-- 
1.9.1
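The MAC address handling in hns_init_mac_addr() above follows a common shape: use the firmware-provided address if it is a valid unicast address, otherwise fall back. A minimal userspace sketch of that shape, assuming a caller-supplied fallback address instead of the random one the driver generates; all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6

/* Valid here means: not multicast/broadcast and not all zeroes. */
static bool sketch_is_valid_ether_addr(const uint8_t *addr)
{
	static const uint8_t zero[SKETCH_ETH_ALEN];

	assert(addr != NULL);

	if (addr[0] & 0x01)	/* group bit set: multicast or broadcast */
		return false;

	return memcmp(addr, zero, SKETCH_ETH_ALEN) != 0;
}

/*
 * Copy fw_addr into dev_addr when it is present and valid; otherwise
 * copy the fallback. Returns true when the firmware address was used.
 */
static bool sketch_init_mac(uint8_t *dev_addr, const uint8_t *fw_addr,
			    const uint8_t *fallback)
{
	if (fw_addr && sketch_is_valid_ether_addr(fw_addr)) {
		memcpy(dev_addr, fw_addr, SKETCH_ETH_ALEN);
		return true;
	}

	memcpy(dev_addr, fallback, SKETCH_ETH_ALEN);
	return false;
}
```

Passing the fallback in as a parameter keeps the sketch deterministic; the driver instead calls eth_hw_addr_random() and logs a warning.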
Re: [PATCH v2] r8152: Add support for setting MAC to system's Auxiliary MAC address
On Thu, Jun 02, 2016 at 06:32:42PM +0000, mario_limoncie...@dell.com wrote:
> > And you want to check this for all Dell devices? Please be model
> > specific, I doubt a bunch of Dell servers wants to run this code...
> >
>
> Tracking model specific is really going to turn into a giant list never
> ending list.
> To drill down more specifically, I can match on chassis too.

Yes, as this is a vendor/platform-specific "quirk", you will have to update it for each and every individual device you want it enabled as it is so different from what all other drivers do.

thanks,

greg k-h
Re: [PATCH v2 6/7] Binding:PHY: Binding doc for NS2 PCIe PHYs.
On Tue, May 31, 2016 at 07:06:40PM +0530, Pramod Kumar wrote:
> Binding doc for NS2 PCIe PHYs.
>
> Signed-off-by: Jon Mason
> Signed-off-by: Pramod Kumar
> ---
>  .../bindings/phy/brcm,mdio-mux-bus-pci.txt | 27 ++
>  1 file changed, 27 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/phy/brcm,mdio-mux-bus-pci.txt

Acked-by: Rob Herring
Re: [PATCH v2 3/7] binding: mdio-mux: Add DT binding doc for Broadcom MDIO bus mutiplexer
On Tue, May 31, 2016 at 07:06:37PM +0530, Pramod Kumar wrote:
> Add DT binding doc for Broadcom MDIO bus mutiplexer driver.
>
> Signed-off-by: Pramod Kumar
> ---
>  .../bindings/net/brcm,mdio-mux-iproc.txt | 60 ++
>  1 file changed, 60 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/brcm,mdio-mux-iproc.txt
>
> diff --git a/Documentation/devicetree/bindings/net/brcm,mdio-mux-iproc.txt b/Documentation/devicetree/bindings/net/brcm,mdio-mux-iproc.txt
> new file mode 100644
> index 0000000..f270b41
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/brcm,mdio-mux-iproc.txt
> @@ -0,0 +1,60 @@
> +Properties for an MDIO bus mutiplexer found in Broadcom iProc based SoCs.
> +
> +This MDIO bus multiplexer defines buses that could be internal as well as
> +external to SoCs and could accept MDIO transaction compatible to C-22 or
> +C-45 Clause. When Child bus is selected, one need to select these two

s/Child/child/

s/need/needs/

> +properties as well to generate desired MDIO trascation on appropriate bus.
> +
> +Required properties in addition to the generic multiplexer properties:
> +
> +MDIO multiplexer node:
> +- complatible: brcm,mdio-mux-iproc.

typo

> +
> +Every non-ethernet PHY requires a compatible so that it could be probed based
> +on this compatible string.
> +
> +Additional information regarding generic multiplexer properties could be found

s/could/can/

> +at- Documentation/devicetree/bindings/net/mdio-mux.txt
> +
> +
> +for example:
> +	mdio_mux_iproc: mdio_mux_iproc@6602023c {

No '_' in node names. mdio-mux@...

> +		compatible = "brcm,mdio-mux-iproc";
> +		reg = <0x6602023c 0x14>;
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +		mdio-integrated-mux;
> +
> +		mdio@0 {
> +			reg = <0x0>;
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +
> +			pci_phy0: pci-phy@0 {
> +				compatible = "brcm,ns2-pcie-phy";
> +				reg = <0x0>;
> +				#phy-cells = <0>;
> +			};
> +		};
> +
> +		mdio@7 {
> +			reg = <0x7>;
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +
> +			pci_phy1: pci-phy@0 {
> +				compatible = "brcm,ns2-pcie-phy";
> +				reg = <0x0>;
> +				#phy-cells = <0>;
> +			};
> +		};
> +		mdio@10 {
> +			reg = <0x10>;
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +
> +			gphy0: eth-phy@10 {
> +				reg = <0x10>;
> +			};
> +		};
> +	};
> --
> 1.9.1
>
[PATCH net] ethernet/sfc: report supported link speeds on SFP connections
My solarflare cards connected to a 10GbE switch with an SFP+ module/cable don't currently report any supported link speeds:

$ ethtool ens4f0
Settings for ens4f0:
	Supported ports: [ FIBRE ]
	Supported link modes:   Not reported
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Advertised link modes:  Not reported
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: Yes
	Link partner advertised link modes:  10000baseKX4/Full
	Link partner advertised pause frame use: Symmetric
	Link partner advertised auto-negotiation: No
	Speed: 10000Mb/s
	Duplex: Full
	Port: FIBRE
	PHYAD: 255
	Transceiver: internal
	Auto-negotiation: on
Cannot get wake-on-lan settings: Operation not permitted
	Current message level: 0x20f7 (8439)
			       drv probe link ifdown ifup rx_err tx_err hw
	Link detected: yes

I've navigated my way through the sfc code down to mcdi_to_ethtool_cap's switch on media's MC_CMD_MEDIA_SFP_PLUS case, where no speeds are set. If we just do some cap checks similar to the MC_CMD_MEDIA_KX4 case, I get the expected output:

$ ethtool ens4f0
Settings for ens4f0:
	Supported ports: [ FIBRE ]
	Supported link modes:   1000baseKX/Full
	                        10000baseKX4/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Advertised link modes:  Not reported
	Advertised pause frame use: Symmetric
	Advertised auto-negotiation: Yes
	Link partner advertised link modes:  10000baseKX4/Full
	Link partner advertised pause frame use: Symmetric
	Link partner advertised auto-negotiation: No
	Speed: 10000Mb/s
	Duplex: Full
	Port: FIBRE
	PHYAD: 255
	Transceiver: internal
	Auto-negotiation: on
Cannot get wake-on-lan settings: Operation not permitted
	Current message level: 0x20f7 (8439)
			       drv probe link ifdown ifup rx_err tx_err hw
	Link detected: yes

This is from an sfc9120 interface here. It also applies to a 9140 with a 10GbE breakout cable.

Side note: wiring up Advertised by simply copying Supported seems to be a thing many other drivers do. Worth doing here?...
CC: Solarflare linux maintainers
CC: Edward Cree
CC: Bert Kenward
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson
---
 drivers/net/ethernet/sfc/mcdi_port.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/sfc/mcdi_port.c b/drivers/net/ethernet/sfc/mcdi_port.c
index 7f295c4..6516471 100644
--- a/drivers/net/ethernet/sfc/mcdi_port.c
+++ b/drivers/net/ethernet/sfc/mcdi_port.c
@@ -189,6 +189,10 @@ static u32 mcdi_to_ethtool_cap(u32 media, u32 cap)
 
 	case MC_CMD_MEDIA_XFP:
 	case MC_CMD_MEDIA_SFP_PLUS:
+		if (cap & (1 << MC_CMD_PHY_CAP_1000FDX_LBN))
+			result |= SUPPORTED_1000baseKX_Full;
+		if (cap & (1 << MC_CMD_PHY_CAP_10000FDX_LBN))
+			result |= SUPPORTED_10000baseKX4_Full;
 		result |= SUPPORTED_FIBRE;
 		break;
 
-- 
1.8.3.1
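The capability translation added by this patch amounts to testing firmware capability bits and OR-ing the matching "supported" flags into the result. A minimal userspace sketch of that mapping for the SFP+ case; the bit positions and flag values below are made up for illustration and are not the MCDI or ethtool constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical firmware capability bit numbers (not the MCDI values). */
#define SKETCH_CAP_1G_FDX_LBN	5
#define SKETCH_CAP_10G_FDX_LBN	6

/* Hypothetical "supported" flags (not the ethtool values). */
#define SKETCH_SUPPORTED_1G_FULL	(1u << 0)
#define SKETCH_SUPPORTED_10G_FULL	(1u << 1)
#define SKETCH_SUPPORTED_FIBRE		(1u << 2)

/* Translate firmware capability bits into a supported-modes mask. */
static uint32_t sketch_sfp_cap_to_supported(uint32_t cap)
{
	uint32_t result = 0;

	if (cap & (1u << SKETCH_CAP_1G_FDX_LBN))
		result |= SKETCH_SUPPORTED_1G_FULL;
	if (cap & (1u << SKETCH_CAP_10G_FDX_LBN))
		result |= SKETCH_SUPPORTED_10G_FULL;

	/* The port type is reported regardless of which speeds are set. */
	result |= SKETCH_SUPPORTED_FIBRE;

	assert(result & SKETCH_SUPPORTED_FIBRE);
	return result;
}
```

Before the patch, only the last step ran for this media type, which is why ethtool showed the port as FIBRE with no link modes.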
[PATCH net-next v2 1/2] net: Add l3mdev rule
Currently, VRFs require 1 oif and 1 iif rule per address family per VRF. As the number of VRF devices increases it brings scalability issues with the increasing rule list. All of the VRF rules have the same format with the exception of the specific table id to direct the lookup. Since the table id is available from the oif or iif in the loopup, the VRF rules can be consolidated to a single rule that pulls the table from the VRF device. This patch introduces a new rule attribute l3mdev. The l3mdev rule means the table id used for the lookup is pulled from the L3 master device (e.g., VRF) rather than being statically defined. With the l3mdev rule all of the basic VRF FIB rules are reduced to 1 l3mdev rule per address family (IPv4 and IPv6). If an admin wishes to insert higher priority rules for specific VRFs those rules will co-exist with the l3mdev rule. This capability means current VRF scripts will co-exist with this new simpler implementation. Currently, the rules list for both ipv4 and ipv6 look like this: $ ip ru ls 1000: from all oif vrf1 lookup 1001 1000: from all iif vrf1 lookup 1001 1000: from all oif vrf2 lookup 1002 1000: from all iif vrf2 lookup 1002 1000: from all oif vrf3 lookup 1003 1000: from all iif vrf3 lookup 1003 1000: from all oif vrf4 lookup 1004 1000: from all iif vrf4 lookup 1004 1000: from all oif vrf5 lookup 1005 1000: from all iif vrf5 lookup 1005 1000: from all oif vrf6 lookup 1006 1000: from all iif vrf6 lookup 1006 1000: from all oif vrf7 lookup 1007 1000: from all iif vrf7 lookup 1007 1000: from all oif vrf8 lookup 1008 1000: from all iif vrf8 lookup 1008 ... 
32765: from all lookup local 32766: from all lookup main 32767: from all lookup default With the l3mdev rule the list is just the following regardless of the number of VRFs: $ ip ru ls 1000: from all lookup [l3mdev table] 32765: from all lookup local 32766: from all lookup main 32767: from all lookup default (Note: the above pretty print of the rule is based on an iproute2 prototype. Actual verbage may change) Signed-off-by: David Ahern--- v2 - if CONFIG_NET_L3_MASTER_DEV is not enabled changed the inline l3mdev_fib_rule_match function to return 1 rather than 0 allowing the compiler to completely drop the check: if (rule->l3mdev && !l3mdev_fib_rule_match()) - moved setting of tb_id down to its use in fib4_rule_action which addresses Dave's comment about reverse xmas tree order. Same change for ipv6 version. include/net/fib_rules.h| 24 ++-- include/net/l3mdev.h | 12 include/uapi/linux/fib_rules.h | 1 + net/core/fib_rules.c | 33 - net/ipv4/fib_rules.c | 6 -- net/ipv6/fib6_rules.c | 6 -- net/l3mdev/l3mdev.c| 38 ++ 7 files changed, 109 insertions(+), 11 deletions(-) diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h index 59160de702b6..456e4a6006ab 100644 --- a/include/net/fib_rules.h +++ b/include/net/fib_rules.h @@ -17,7 +17,8 @@ struct fib_rule { u32 flags; u32 table; u8 action; - /* 3 bytes hole, try to use */ + u8 l3mdev; + /* 2 bytes hole, try to use */ u32 target; __be64 tun_id; struct fib_rule __rcu *ctarget; @@ -36,6 +37,7 @@ struct fib_lookup_arg { void*lookup_ptr; void*result; struct fib_rule *rule; + u32 table; int flags; #define FIB_LOOKUP_NOREF 1 #define FIB_LOOKUP_IGNORE_LINKSTATE2 @@ -89,7 +91,8 @@ struct fib_rules_ops { [FRA_TABLE] = { .type = NLA_U32 }, \ [FRA_SUPPRESS_PREFIXLEN] = { .type = NLA_U32 }, \ [FRA_SUPPRESS_IFGROUP] = { .type = NLA_U32 }, \ - [FRA_GOTO] = { .type = NLA_U32 } + [FRA_GOTO] = { .type = NLA_U32 }, \ + [FRA_L3MDEV]= { .type = NLA_U8 } static inline void fib_rule_get(struct fib_rule *rule) { @@ -102,6 +105,20 @@ 
static inline void fib_rule_put(struct fib_rule *rule) kfree_rcu(rule, rcu); } +#ifdef CONFIG_NET_L3_MASTER_DEV +static inline u32 fib_rule_get_table(struct fib_rule *rule, +struct fib_lookup_arg *arg) +{ + return rule->l3mdev ? arg->table : rule->table; +} +#else +static inline u32 fib_rule_get_table(struct fib_rule *rule, +struct fib_lookup_arg *arg) +{ + return rule->table; +} +#endif + static inline u32
[PATCH net-next v2 2/2] net: vrf: Add l3mdev rules on first device create
Add l3mdev rule per address family when the first VRF device is created. Remove them when the last is deleted. Signed-off-by: David Ahern--- v2 - added EXCL flag and EEXISTS check. Appropriate once the exclude fib rule patch is accepted - changed 3rd arg to vrf_fib_rule from 0/1 to false/true per Dave's comment drivers/net/vrf.c | 119 +- 1 file changed, 118 insertions(+), 1 deletion(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index d356f5d0f7b0..1d13c95cab97 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -35,6 +35,7 @@ #include #include #include +#include #define RT_FL_TOS(oldflp4) \ ((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK)) @@ -42,6 +43,11 @@ #define DRV_NAME "vrf" #define DRV_VERSION"1.0" +static atomic_t num_vrfs; + +static u32 rule_pref = 1000; +module_param(rule_pref, uint, S_IRUGO); + struct net_vrf { struct rtable __rcu *rth; struct rt6_info __rcu *rt6; @@ -729,6 +735,98 @@ static const struct ethtool_ops vrf_ethtool_ops = { .get_drvinfo= vrf_get_drvinfo, }; +static inline size_t vrf_fib_rule_nl_size(void) +{ + size_t sz; + + sz = NLMSG_ALIGN(sizeof(struct fib_rule_hdr)); + sz += nla_total_size(sizeof(u8)); /* FRA_L3MDEV */ + sz += nla_total_size(sizeof(u32)); /* FRA_PRIORITY */ + + return sz; +} + +static int vrf_fib_rule(const struct net_device *dev, __u8 family, bool add_it) +{ + struct fib_rule_hdr *frh; + struct nlmsghdr *nlh; + struct sk_buff *skb; + int err; + + skb = nlmsg_new(vrf_fib_rule_nl_size(), GFP_KERNEL); + if (!skb) + return -ENOMEM; + + nlh = nlmsg_put(skb, 0, 0, 0, sizeof(*frh), 0); + if (!nlh) + goto nla_put_failure; + + /* rule only needs to appear once */ + nlh->nlmsg_flags &= NLM_F_EXCL; + + frh = nlmsg_data(nlh); + memset(frh, 0, sizeof(*frh)); + frh->family = family; + frh->action = FR_ACT_TO_TBL; + + if (nla_put_u32(skb, FRA_L3MDEV, 1)) + goto nla_put_failure; + + if (nla_put_u32(skb, FRA_PRIORITY, rule_pref)) + goto nla_put_failure; + + nlmsg_end(skb, nlh); + + /* fib_nl_{new,del}rule handling 
looks for net from skb->sk */ + skb->sk = dev_net(dev)->rtnl; + if (add_it) { + err = fib_nl_newrule(skb, nlh); + if (err == -EEXIST) + err = 0; + } else { + err = fib_nl_delrule(skb, nlh); + if (err == -ENOENT) + err = 0; + } + nlmsg_free(skb); + + return err; + +nla_put_failure: + nlmsg_free(skb); + + return -EMSGSIZE; +} + +static void vrf_del_fib_rules(const struct net_device *dev) +{ + if (vrf_fib_rule(dev, AF_INET, false) || + vrf_fib_rule(dev, AF_INET6, false)) { + netdev_err(dev, "Failed to delete FIB rules.\n"); + } +} + +static int vrf_add_fib_rules(const struct net_device *dev) +{ + int err; + + err = vrf_fib_rule(dev, AF_INET, true); + if (err < 0) + goto out_err; + + err = vrf_fib_rule(dev, AF_INET6, true); + if (err < 0) + goto out_err; + + return 0; + +out_err: + netdev_err(dev, "Failed to add FIB rules.\n"); + vrf_del_fib_rules(dev); + + return err; +} + static void vrf_setup(struct net_device *dev) { ether_setup(dev); @@ -763,12 +861,17 @@ static int vrf_validate(struct nlattr *tb[], struct nlattr *data[]) static void vrf_dellink(struct net_device *dev, struct list_head *head) { unregister_netdevice_queue(dev, head); + + atomic_dec(_vrfs); + if (!atomic_read(_vrfs)) + vrf_del_fib_rules(dev); } static int vrf_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[]) { struct net_vrf *vrf = netdev_priv(dev); + int err; if (!data || !data[IFLA_VRF_TABLE]) return -EINVAL; @@ -777,7 +880,21 @@ static int vrf_newlink(struct net *src_net, struct net_device *dev, dev->priv_flags |= IFF_L3MDEV_MASTER; - return register_netdevice(dev); + err = register_netdevice(dev); + if (err) + goto out; + + if (!atomic_read(_vrfs)) { + err = vrf_add_fib_rules(dev); + if (err) { + unregister_netdevice(dev); + goto out; + } + } + + atomic_inc(_vrfs); +out: + return err; } static size_t vrf_nl_getsize(const struct net_device *dev) -- 2.1.4
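The "install on first create, remove on last delete" bookkeeping in this patch can be sketched in userspace as below. Note that this is a different technique from the patch: the patch reads and updates its counter in separate steps (presumably relying on the callers being serialized, which is an assumption here), while the sketch folds the update and the test into one C11 atomic operation. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of live devices sharing the common rules; starts at zero. */
static atomic_int sketch_num_devs;

/*
 * Record a newly created device. Returns true when it is the first
 * one, in which case the caller should install the shared rules.
 */
static bool sketch_dev_created(void)
{
	return atomic_fetch_add(&sketch_num_devs, 1) == 0;
}

/*
 * Record a deleted device. Returns true when it was the last one,
 * in which case the caller should remove the shared rules.
 */
static bool sketch_dev_deleted(void)
{
	int prev = atomic_fetch_sub(&sketch_num_devs, 1);

	assert(prev > 0);	/* a delete without a matching create is a bug */
	return prev == 1;
}
```

A caller would install the rules when sketch_dev_created() returns true and, if installation fails, undo the count with sketch_dev_deleted().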
[PATCH net-next v2 0/2] net: vrf: Improve use of FIB rules
Currently, VRFs require 1 oif and 1 iif rule per address family per VRF. As the number of VRF devices increases, the growing rule list brings scalability issues. All of the VRF rules have the same format with the exception of the specific table id to direct the lookup. Since the table id is available from the oif or iif in the lookup, the VRF rules can be consolidated to a single rule that pulls the table from the VRF device.

This solution still allows a user to insert their own rules for VRFs, including rules with additional attributes. Accordingly, it is backwards compatible with existing setups and allows other policy routing as desired.

David Ahern (2):
  net: Add l3mdev rule
  net: vrf: Add l3mdev rules on first device create

 drivers/net/vrf.c              | 119 -
 include/net/fib_rules.h        |  24 -
 include/net/l3mdev.h           |  12 +
 include/uapi/linux/fib_rules.h |   1 +
 net/core/fib_rules.c           |  33 ++--
 net/ipv4/fib_rules.c           |   6 ++-
 net/ipv6/fib6_rules.c          |   6 ++-
 net/l3mdev/l3mdev.c            |  38 +
 8 files changed, 227 insertions(+), 12 deletions(-)

-- 
2.1.4
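The consolidation described in this cover letter comes down to where the table id is read from: a rule flagged as l3mdev takes it from the lookup context (filled in from the L3 master device), any other rule uses its own static table. A minimal userspace sketch, with hypothetical `sketch_` types standing in for the kernel structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a FIB rule: a static table plus the l3mdev flag. */
struct sketch_rule {
	uint32_t table;
	bool l3mdev;
};

/* Hypothetical lookup context: table id resolved from the oif/iif device. */
struct sketch_lookup_arg {
	uint32_t table;
};

/* Pick the table for this lookup: per-device for l3mdev rules, static otherwise. */
static uint32_t sketch_rule_get_table(const struct sketch_rule *rule,
				      const struct sketch_lookup_arg *arg)
{
	assert(rule != NULL && arg != NULL);

	return rule->l3mdev ? arg->table : rule->table;
}
```

One l3mdev rule therefore resolves to table 1001 for a lookup through one VRF device and to table 1002 for another, while ordinary rules keep their fixed table.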
Re: [PATCH 0/2] Quiet noisy LSM denial when accessing net sysctl
On Thu, 2 Jun 2016, Tyler Hicks wrote:

> On 05/17/2016 09:13 AM, Tyler Hicks wrote:
> > On 05/08/2016 10:56 PM, David Miller wrote:
> >> From: Tyler Hicks
> >> Date: Fri, 6 May 2016 18:04:12 -0500
> >>
> >>> This pair of patches does away with what I believe is a useless denial
> >>> audit message when a privileged process initially accesses a net sysctl.
> >>
> >> The LSM folks can apply this if they agree with you.
> >
> > Hi James - Could you pick up these two bug fix patches? Thanks!
>
> Hello - Just checking in again to see if you plan on taking these
> through the security tree?

Sure, please resend.

-- 
James Morris
Re: [PATCH v2 2/7] DT: phy.txt: Add mdio-integrated-mux property
On Thu, Jun 02, 2016 at 06:27:03PM -0500, Rob Herring wrote:
> On Tue, May 31, 2016 at 07:06:36PM +0530, Pramod Kumar wrote:
> > This property is used by integrated MDIO multiplexer
> > which has bus selection and mdio transaction generation logic,
> > integrated inside.
> >
> > Signed-off-by: Pramod Kumar
> > ---
> >  Documentation/devicetree/bindings/net/mdio-mux.txt | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/devicetree/bindings/net/mdio-mux.txt b/Documentation/devicetree/bindings/net/mdio-mux.txt
> > index 491f5bd..b5ad83e 100644
> > --- a/Documentation/devicetree/bindings/net/mdio-mux.txt
> > +++ b/Documentation/devicetree/bindings/net/mdio-mux.txt
> > @@ -5,13 +5,20 @@ numbered uniquely in a device dependent manner. The nodes for an MDIO
> >  bus multiplexer/switch will have one child node for each child bus.
> >
> >  Required properties:
> > -- mdio-parent-bus : phandle to the parent MDIO bus.
> >  - #address-cells = <1>;
> >  - #size-cells = <0>;
> >
> >  Optional properties:
> > +- mdio-parent-bus : phandle to the parent MDIO bus. Should be used
> > +  if parent mdio bus is not part of multiplexer.
>
> You don't appear to be using this. When would you?

He is moving it to optional. The mdio-mux-mmio and mdio-mux-gpio do however use it, which follow this binding.

Andrew
RE: [PATCH] net: fjes: fjes_main: Remove create_workqueue
Dear Bhaktipriya, Thanks. Looks good to me. Sincerely, Taku Izumi > -----Original Message----- > From: Bhaktipriya Shridhar [mailto:bhaktipriy...@gmail.com] > Sent: Thursday, June 02, 2016 6:31 PM > To: David S. Miller; Izumi, Taku/泉 拓; Florian Westphal; Bhaktipriya Shridhar > Cc: Tejun Heo; netdev@vger.kernel.org; linux-ker...@vger.kernel.org > Subject: [PATCH] net: fjes: fjes_main: Remove create_workqueue > > alloc_workqueue replaces deprecated create_workqueue(). > > The workqueue adapter->txrx_wq has workitem > &adapter->raise_intr_rxdata_task per adapter. Extended Socket Network > Device is shared memory based, so someone's transmission denotes other's > reception. raise_intr_rxdata_task raises interruption of receivers from > the sender in order to notify receivers. > > The workqueue adapter->control_wq has workitem > &adapter->interrupt_watch_task per adapter. interrupt_watch_task is used > to prevent delay of interrupts. > > Dedicated workqueues have been used in both cases since the workitems > on the workqueues are involved in normal device operation and require > forward progress under memory pressure. > > max_active has been set to 0 since there is no need for throttling > the number of active work items. > > Since network devices may be used for memory reclaim, > WQ_MEM_RECLAIM has been set to guarantee forward progress.
> > Signed-off-by: Bhaktipriya Shridhar > --- > drivers/net/fjes/fjes_main.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c > index 86c331b..9006877 100644 > --- a/drivers/net/fjes/fjes_main.c > +++ b/drivers/net/fjes/fjes_main.c > @@ -1187,8 +1187,9 @@ static int fjes_probe(struct platform_device *plat_dev) > adapter->force_reset = false; > adapter->open_guard = false; > > - adapter->txrx_wq = create_workqueue(DRV_NAME "/txrx"); > - adapter->control_wq = create_workqueue(DRV_NAME "/control"); > + adapter->txrx_wq = alloc_workqueue(DRV_NAME "/txrx", WQ_MEM_RECLAIM, 0); > + adapter->control_wq = alloc_workqueue(DRV_NAME "/control", > + WQ_MEM_RECLAIM, 0); > > INIT_WORK(&adapter->tx_stall_task, fjes_tx_stall_task); > INIT_WORK(&adapter->raise_intr_rxdata_task, > -- > 2.1.4 >
Re: [PATCH v2 2/7] DT: phy.txt: Add mdio-integrated-mux property
On Tue, May 31, 2016 at 07:06:36PM +0530, Pramod Kumar wrote: > This property is used by integrated MDIO multiplexer > which has bus selection and mdio transaction generation logic, > integrated inside. > > Signed-off-by: Pramod Kumar > --- > Documentation/devicetree/bindings/net/mdio-mux.txt | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/Documentation/devicetree/bindings/net/mdio-mux.txt > b/Documentation/devicetree/bindings/net/mdio-mux.txt > index 491f5bd..b5ad83e 100644 > --- a/Documentation/devicetree/bindings/net/mdio-mux.txt > +++ b/Documentation/devicetree/bindings/net/mdio-mux.txt > @@ -5,13 +5,20 @@ numbered uniquely in a device dependent manner. The nodes > for an MDIO > bus multiplexer/switch will have one child node for each child bus. > > Required properties: > -- mdio-parent-bus : phandle to the parent MDIO bus. > - #address-cells = <1>; > - #size-cells = <0>; > > Optional properties: > +- mdio-parent-bus : phandle to the parent MDIO bus. Should be used > + if parent mdio bus is not part of multiplexer. You don't appear to be using this. When would you? > +- mdio-integrated-mux: boolean property indicateing that the hardware > + is an integrated multiplex supporting muxed bus selection > + and MDIO transaction logic generation. > - Other properties specific to the multiplexer/switch hardware. > > +Note: one of mdio-parent-bus and mdio-integrated-mux is mandatory to > +get parent bus regsitered. > + > Required properties for child nodes: > - #address-cells = <1>; > - #size-cells = <0>; > -- > 1.9.1 >
[PATCH] net: ethernet: ti: cpsw: remove rx_descs property
There is no reason to hold a software-dependent parameter in the device tree. Moreover, the parameter itself serves no purpose, because the davinci_cpdma driver splits the pool of descriptors equally between tx and rx channels. That is, if the pool holds 256 descriptors, 128 of them are for rx channels. While receiving, a descriptor is freed to the pool and then allocated again with a new skb. If "rx_descs" is set to 64 in DT, then 128 - 64 = 64 descriptors always stay in the pool and cannot be used, for tx, for instance. That is not correct resource usage; it is better to set the rx limit to half of the pool, so that the rx pool can be used in full. This has no impact on performance, as the "redundant" descriptors were unused anyway. Signed-off-by: Ivan Khoronzhuk --- Based on master Documentation/devicetree/bindings/net/cpsw.txt | 3 --- arch/arm/boot/dts/am33xx.dtsi | 1 - arch/arm/boot/dts/am4372.dtsi | 1 - arch/arm/boot/dts/dm814x.dtsi | 1 - arch/arm/boot/dts/dra7.dtsi| 1 - drivers/net/ethernet/ti/cpsw.c | 13 +++-- drivers/net/ethernet/ti/cpsw.h | 1 - drivers/net/ethernet/ti/davinci_cpdma.c| 6 ++ drivers/net/ethernet/ti/davinci_cpdma.h| 1 + 9 files changed, 10 insertions(+), 18 deletions(-) diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt index 0ae0649..5fe6239 100644 --- a/Documentation/devicetree/bindings/net/cpsw.txt +++ b/Documentation/devicetree/bindings/net/cpsw.txt @@ -15,7 +15,6 @@ Required properties: - cpdma_channels : Specifies number of channels in CPDMA - ale_entries : Specifies No of entries ALE can hold - bd_ram_size : Specifies internal descriptor RAM size -- rx_descs : Specifies number of Rx descriptors - mac_control : Specifies Default MAC control register content for the specific platform - slaves : Specifies number for slaves @@ -70,7 +69,6 @@ Examples: ale_entries = <1024>; bd_ram_size = <0x2000>; no_bd_ram = <0>; - rx_descs = <64>; mac_control = <0x20>; slaves = <2>; active_slave = <0>; @@ -99,7 +97,6 @@ Examples: 
ale_entries = <1024>; bd_ram_size = <0x2000>; no_bd_ram = <0>; - rx_descs = <64>; mac_control = <0x20>; slaves = <2>; active_slave = <0>; diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi index 52be48b..702126f 100644 --- a/arch/arm/boot/dts/am33xx.dtsi +++ b/arch/arm/boot/dts/am33xx.dtsi @@ -766,7 +766,6 @@ ale_entries = <1024>; bd_ram_size = <0x2000>; no_bd_ram = <0>; - rx_descs = <64>; mac_control = <0x20>; slaves = <2>; active_slave = <0>; diff --git a/arch/arm/boot/dts/am4372.dtsi b/arch/arm/boot/dts/am4372.dtsi index 12fcde4..a10fa7f 100644 --- a/arch/arm/boot/dts/am4372.dtsi +++ b/arch/arm/boot/dts/am4372.dtsi @@ -626,7 +626,6 @@ ale_entries = <1024>; bd_ram_size = <0x2000>; no_bd_ram = <0>; - rx_descs = <64>; mac_control = <0x20>; slaves = <2>; active_slave = <0>; diff --git a/arch/arm/boot/dts/dm814x.dtsi b/arch/arm/boot/dts/dm814x.dtsi index d4537dc..f23cae0c 100644 --- a/arch/arm/boot/dts/dm814x.dtsi +++ b/arch/arm/boot/dts/dm814x.dtsi @@ -509,7 +509,6 @@ ale_entries = <1024>; bd_ram_size = <0x2000>; no_bd_ram = <0>; - rx_descs = <64>; mac_control = <0x20>; slaves = <2>; active_slave = <0>; diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi index e007401..b7ddc64 100644 --- a/arch/arm/boot/dts/dra7.dtsi +++ b/arch/arm/boot/dts/dra7.dtsi @@ -1626,7 +1626,6 @@ ale_entries = <1024>; bd_ram_size = <0x2000>; no_bd_ram = <0>; - rx_descs = <64>; mac_control = <0x20>; slaves = <2>; active_slave = <0>; diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 4b08a2f..635be3e 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -1277,6 +1277,7 @@ static int cpsw_ndo_open(struct net_device *ndev) ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0); if (!cpsw_common_res_usage_state(priv)) {
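The arithmetic in the rx_descs commit message above can be checked with a small standalone sketch. This is not driver code: `rx_share` and `idle_rx_descs` are hypothetical names that do not exist in cpsw or davinci_cpdma, and the equal tx/rx split is taken from the commit message rather than from the source.

```c
/* Hypothetical model of the descriptor accounting described in the
 * rx_descs commit message. None of these names exist in the driver.
 */

/* The pool is split equally between tx and rx channels. */
static unsigned int rx_share(unsigned int pool_size)
{
	return pool_size / 2;
}

/* Number of rx descriptors that can never be used when a device-tree
 * value caps rx below its half of the pool.
 */
static unsigned int idle_rx_descs(unsigned int pool_size, unsigned int rx_descs)
{
	unsigned int share = rx_share(pool_size);

	/* A cap at or above the share leaves nothing idle. */
	return rx_descs >= share ? 0 : share - rx_descs;
}
```

With a pool of 256 and rx_descs = 64, `idle_rx_descs(256, 64)` evaluates to 64, which matches the 128 - 64 = 64 in the message. Raising the limit to the full rx share brings the idle count to 0.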
[PATCH] net: ethernet: ti: cpsw: remove unused priv lock
There is no reason for this lock, at least for now. Signed-off-by: Ivan Khoronzhuk --- Based on master drivers/net/ethernet/ti/cpsw.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 9919cb3..8d1d373 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -365,7 +365,6 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset) } struct cpsw_priv { - spinlock_t lock; struct platform_device *pdev; struct net_device *ndev; struct napi_struct napi_rx; @@ -2413,7 +2412,6 @@ static int cpsw_probe_dual_emac(struct platform_device *pdev, } priv_sl2 = netdev_priv(ndev); - spin_lock_init(&priv_sl2->lock); priv_sl2->data = *data; priv_sl2->pdev = pdev; priv_sl2->ndev = ndev; @@ -2533,7 +2531,6 @@ static int cpsw_probe(struct platform_device *pdev) platform_set_drvdata(pdev, ndev); priv = netdev_priv(ndev); - spin_lock_init(&priv->lock); priv->pdev = pdev; priv->ndev = ndev; priv->dev = &ndev->dev; -- 1.9.1
Re: Possible problem with e6afc8ac ("udp: remove headers from UDP packets before queueing")
From: Eric Dumazet Date: Thu, 02 Jun 2016 14:52:43 -0700 > From: Eric Dumazet > > Paul Moore tracked a regression caused by a recent commit, which > mistakenly assumed that sk_filter() could be avoided if socket > had no current BPF filter. > > The intent was to avoid udp_lib_checksum_complete() overhead. > > But sk_filter() also checks skb_pfmemalloc() and > security_sock_rcv_skb(), so better call it. > > Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") > Signed-off-by: Eric Dumazet > Reported-by: Paul Moore > Tested-by: Paul Moore > Tested-by: Stephen Smalley > Cc: samanthakumar Applied, thanks Eric.
Re: [PATCH net-next] net: vrf: set operstate and mtu at link create
From: David Ahern Date: Wed, 1 Jun 2016 21:16:39 -0700 > The VRF device exists to define L3 domains and guide FIB lookups. As > such its operstate is not relevant. Seeing 'state UNKNOWN' in the > output of 'ip link show' can be confusing, so set operstate at link > create. > > Similarly, the MTU for a VRF device is not used; any fragmentation > of the payload is done on the output path based on the real egress > device. An MTU of 1500 on the VRF device while enslaved devices > have a higher MTU can lead to confusion. Since the VRF MTU is not > relevant set to 64k similar to what is done for loopback. > > Signed-off-by: David Ahern Applied, thanks.
Re: [PATCH net-next 2/2] net: vrf: Add l3mdev rules on first device create
From: David AhernDate: Wed, 1 Jun 2016 21:14:54 -0700 > Add l3mdev rule per address family when the first VRF device is > created. Remove them when the last is deleted. > > Signed-off-by: David Ahern > --- > drivers/net/vrf.c | 114 > +- > 1 file changed, 113 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c > index dff08842f26d..b3cb80e84ea7 100644 > --- a/drivers/net/vrf.c > +++ b/drivers/net/vrf.c > @@ -35,6 +35,7 @@ > #include > #include > #include > +#include > > #define RT_FL_TOS(oldflp4) \ > ((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK)) > @@ -42,6 +43,11 @@ > #define DRV_NAME "vrf" > #define DRV_VERSION "1.0" > > +static atomic_t num_vrfs; > + > +static u32 rule_pref = 1000; > +module_param(rule_pref, uint, S_IRUGO); > + > struct net_vrf { > struct rtable __rcu *rth; > struct rt6_info __rcu *rt6; > @@ -723,6 +729,93 @@ static const struct ethtool_ops vrf_ethtool_ops = { > .get_drvinfo= vrf_get_drvinfo, > }; > > +static inline size_t vrf_fib_rule_nl_size(void) > +{ > + size_t sz; > + > + sz = NLMSG_ALIGN(sizeof(struct fib_rule_hdr)); > + sz += nla_total_size(sizeof(u8)); /* FRA_L3MDEV */ > + sz += nla_total_size(sizeof(u32)); /* FRA_PRIORITY */ > + > + return sz; > +} > + > +static int vrf_fib_rule(const struct net_device *dev, __u8 family, bool > add_it) ... > +static void vrf_del_fib_rules(const struct net_device *dev) > +{ > + if (vrf_fib_rule(dev, AF_INET, 0) || > + vrf_fib_rule(dev, AF_INET6, 0)) { ... > +static int vrf_add_fib_rules(const struct net_device *dev) > +{ > + int err; > + > + err = vrf_fib_rule(dev, AF_INET, 1); > + if (err < 0) > + goto out_err; > + > + err = vrf_fib_rule(dev, AF_INET6, 1); Since the third arg to vrf_fib_rule() is a bool, pass true/false.
Re: [PATCH net-next 1/2] net: Add l3mdev rule
From: David Ahern Date: Wed, 1 Jun 2016 21:14:53 -0700 > @@ -76,6 +76,7 @@ static int fib4_rule_action(struct fib_rule *rule, struct > flowi *flp, > { > int err = -EAGAIN; > struct fib_table *tbl; > + u32 tb_id = fib_rule_get_table(rule, arg); Please order local variable lines from longest to shortest.
Re: [net-next] ovs: set name assign type of internal port
From: Zhang Shengju Date: Tue, 31 May 2016 13:41:02 + > Set name_assign_type of internal port to NET_NAME_USER. > > Signed-off-by: Zhang Shengju Applied, thanks.
Re: [PATCH net-next v10 2/5] openvswitch: set skb protocol and mac_len when receiving on internal device
On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman wrote: > * Set skb protocol based on contents of packet. I have observed this is > necessary to get actual protocol of a packet when it is injected into an > internal device e.g. by libnet in which case skb protocol will be set to > ETH_ALL. > > * Set the mac_len which has been observed to not be set up correctly when > an ARP packet is generated and sent via an openvswitch bridge. > My test case is a scenario where there are two open vswtich bridges. > One outputs to a tunnel port which egresses on the other. > > The motivation for this is that support for outputting to layer 3 (non-tap) > GRE tunnels as implemented by a subsequent patch depends on protocol and > mac_len being set correctly on receive. > > Signed-off-by: Simon Horman > > --- > v10 > * Set mac_len > > v9 > * New patch > --- > net/openvswitch/vport-internal_dev.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/net/openvswitch/vport-internal_dev.c > b/net/openvswitch/vport-internal_dev.c > index 2ee48e447b72..f89b1efa88f1 100644 > --- a/net/openvswitch/vport-internal_dev.c > +++ b/net/openvswitch/vport-internal_dev.c > @@ -48,6 +48,10 @@ static int internal_dev_xmit(struct sk_buff *skb, struct > net_device *netdev) > { > int len, err; > > + skb->protocol = eth_type_trans(skb, netdev); > + skb_push(skb, ETH_HLEN); > + skb_reset_mac_len(skb); > + Resetting mac_len breaks the assumption about mac_len that is relied on to locate the MPLS header; see skb_mpls_header().
Re: [PATCH net-next v10 4/5] openvswitch: add layer 3 flow/port support
On Wed, Jun 1, 2016 at 11:24 PM, Simon Hormanwrote: > From: Lorand Jakab > > Implementation of the pop_eth and push_eth actions in the kernel, and > layer 3 flow support. > > This doesn't actually do anything yet as no layer 2 tunnel ports are > supported yet. The original patch by Lorand was against the Open vSwitch > tree which has L2 LISP tunnels but that is not supported in mainline Linux. > I (Simon) plan to follow up with support for non-TEB GRE ports based on > work by Thomas Morin. > > Cc: Thomas Morin > Signed-off-by: Lorand Jakab > Signed-off-by: Simon Horman > > --- > v10 [Simon Horman] > * Move outermost VLAN into skb metadata in pop_eth and > leave any VLAN as-is in push_eth. The effect is to allow the presence > of a vlan to be independent of pushing and popping ethernet headers. > * Omit unnecessary type field from push_eth action > * Squash with the following patches to make a more complete patch: > "openvswitch: add layer 3 support to ovs_packet_cmd_execute()" > "openvswitch: extend layer 3 support to cover non-IP packets" > > v9 [Simon Horman] > * Rebase > * Minor coding style updates > * Prohibit push/pop MPLS on l3 packets > * There are no layer 3 ports supported at this time so only > send and receive layer 2 packets: that is don't actually > use this new infrastructure yet > * Expect that vports that can handle layer 3 packets will: have > a type other than ARPHRD_IPETHER; can also handle layer 2 packets; > and that packets can be differentiated by layer 2 packets having > skb->protocol set to htons(ETH_P_TEB) > > v1 - v8 [Lorand Jakub] > > wip: fix: openvswitch: add support to push and pop > > * Consistently use skb_hdr() in push_eth() by assigning > its value to a local variable. > * Limit scope of hdr in push_mpls() > * Recalculate csum for protocl change in push_mpls. > - Also needed for pop_mpls? > - Break out into a fix-patch > > Signed-off-by: Simon Horman ... 
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index 15f130e4c22b..5567529904fa 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -300,6 +300,51 @@ static int set_eth_addr(struct sk_buff *skb, struct > sw_flow_key *flow_key, > return 0; > } > > +static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key) > +{ > + /* Pop outermost VLAN tag to skb metadata unless a VLAN tag > +* is already present there. > +*/ > + if ((skb->protocol == htons(ETH_P_8021Q) || > +skb->protocol == htons(ETH_P_8021AD)) && > + !skb_vlan_tag_present(skb)) { > + int err = skb_vlan_accel(skb); > + if (unlikely(err)) > + return err; > + } > + I do not think we can keep just the VLAN tag and pop the Ethernet header. There are multiple issues with this. First, the networking stack cannot handle such a packet. Second, even after this patch OVS cannot parse this type of packet. Third, this patch does not allow the pop-eth action on a VLAN-tagged packet. There are already separate VLAN-related actions in OVS, so let's keep it simple. > + skb_pull_rcsum(skb, ETH_HLEN); > + skb_reset_mac_header(skb); > + skb->mac_len -= ETH_HLEN; > + > + invalidate_flow_key(key); > + return 0; > +} > + ... ... > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c > index 0ea128eeeab2..2d9777abcfc9 100644 > --- a/net/openvswitch/flow.c > +++ b/net/openvswitch/flow.c > @@ -468,28 +468,31 @@ static int key_extract(struct sk_buff *skb, struct > sw_flow_key *key) > > skb_reset_mac_header(skb); > > - /* Link layer. We are guaranteed to have at least the 14 byte > Ethernet > -* header in the linear data area. > -*/ > - eth = eth_hdr(skb); > - ether_addr_copy(key->eth.src, eth->h_source); > - ether_addr_copy(key->eth.dst, eth->h_dest); > + /* Link layer. */ > + key->eth.tci = 0; > + if (key->phy.is_layer3) { > + if (skb_vlan_tag_present(skb)) > + key->eth.tci = htons(skb->vlan_tci); > + } else { > + eth = eth_hdr(skb); eth can be moved into this block. 
> + ether_addr_copy(key->eth.src, eth->h_source); > + ether_addr_copy(key->eth.dst, eth->h_dest); > > - __skb_pull(skb, 2 * ETH_ALEN); > - /* We are going to push all headers that we pull, so no need to > -* update skb->csum here. > -*/ > + __skb_pull(skb, 2 * ETH_ALEN); > + /* We are going to push all headers that we pull, so no need > to > +* update skb->csum here. > +*/ > > - key->eth.tci = 0; > - if (skb_vlan_tag_present(skb)) > - key->eth.tci = htons(skb->vlan_tci); > - else if
Re: [PATCH net-next v10 3/5] openvswitch: add support to push and pop mpls for layer3 packets
On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman wrote: > Allow push and pop mpls actions to act on layer 3 packets by teaching > them not to access non-existent L2 headers of such packets. > > Signed-off-by: Simon Horman > --- > v10 > * Limit scope of hdr in {push,pop}_mpls() > > v9 > * New Patch > --- > net/openvswitch/actions.c | 19 --- > 1 file changed, 12 insertions(+), 7 deletions(-) > > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > index 9a3eb7a0ebf4..15f130e4c22b 100644 > --- a/net/openvswitch/actions.c > +++ b/net/openvswitch/actions.c > @@ -172,7 +172,8 @@ static int push_mpls(struct sk_buff *skb, struct > sw_flow_key *key, > > skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN); > > - update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype); > + if (skb->mac_len) > + update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype); We can move all Ethernet-related code into this if block, for example the memmove(). > if (!skb->inner_protocol) > skb_set_inner_protocol(skb, skb->protocol); > skb->protocol = mpls->mpls_ethertype; > @@ -184,7 +185,6 @@ static int push_mpls(struct sk_buff *skb, struct > sw_flow_key *key, > static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key, > const __be16 ethertype) > { > - struct ethhdr *hdr; > int err; > > err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN); > @@ -199,11 +199,16 @@ static int pop_mpls(struct sk_buff *skb, struct > sw_flow_key *key, > __skb_pull(skb, MPLS_HLEN); > skb_reset_mac_header(skb); > > - /* skb_mpls_header() is used to locate the ethertype > -* field correctly in the presence of VLAN tags. > -*/ > - hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN); > - update_ethertype(skb, hdr, ethertype); > + if (skb->mac_len) { > + struct ethhdr *hdr; > + > + /* skb_mpls_header() is used to locate the ethertype > +* field correctly in the presence of VLAN tags. 
> +*/ > + hdr = (struct ethhdr *)(skb_mpls_header(skb) - ETH_HLEN); > + update_ethertype(skb, hdr, ethertype); > + } Same here.
Re: [PATCH net-next v10 1/5] net: add skb_vlan_accel helper
On Wed, Jun 1, 2016 at 11:24 PM, Simon Horman wrote: > This breaks out some of skb_vlan_pop into a separate helper. > This new helper moves the outer-most vlan tag present in packet data > into metadata. > > The motivation is to allow acceleration VLAN tags without adding a new > one. This is in preparation for a push ethernet header support in Open > vSwitch. > > Signed-off-by: Simon Horman > I am not sure we need this function at this point. I will post a comment on patch 4, where it is used.
Re: Possible problem with e6afc8ac ("udp: remove headers from UDP packets before queueing")
From: Eric Dumazet Paul Moore tracked a regression caused by a recent commit, which mistakenly assumed that sk_filter() could be avoided if socket had no current BPF filter. The intent was to avoid udp_lib_checksum_complete() overhead. But sk_filter() also checks skb_pfmemalloc() and security_sock_rcv_skb(), so better call it. Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Signed-off-by: Eric Dumazet Reported-by: Paul Moore Tested-by: Paul Moore Tested-by: Stephen Smalley Cc: samanthakumar --- net/ipv4/udp.c | 10 +- net/ipv6/udp.c | 12 ++-- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index d56c0559b477..0ff31d97d485 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1618,12 +1618,12 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) } } - if (rcu_access_pointer(sk->sk_filter)) { - if (udp_lib_checksum_complete(skb)) + if (rcu_access_pointer(sk->sk_filter) && + udp_lib_checksum_complete(skb)) goto csum_error; - if (sk_filter(sk, skb)) - goto drop; - } + + if (sk_filter(sk, skb)) + goto drop; udp_csum_pull_header(skb); if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) { diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 2da1896af934..f421c9f23c5b 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -653,12 +653,12 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb) } } - if (rcu_access_pointer(sk->sk_filter)) { - if (udp_lib_checksum_complete(skb)) - goto csum_error; - if (sk_filter(sk, skb)) - goto drop; - } + if (rcu_access_pointer(sk->sk_filter) && + udp_lib_checksum_complete(skb)) + goto csum_error; + + if (sk_filter(sk, skb)) + goto drop; udp_csum_pull_header(skb); if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
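The control-flow change in the UDP patch above can be illustrated with a userspace model. This is only a sketch and not kernel code: `struct rx_model`, `model_sk_filter`, `rx_before_fix` and `rx_after_fix` are hypothetical names, each flag stands in for the outcome of one kernel check, and the skb_pfmemalloc() part of sk_filter() is left out for brevity.

```c
#include <stdbool.h>

/* Each flag models the outcome of one check in the real receive path. */
struct rx_model {
	bool has_bpf_filter;	/* models sk->sk_filter being non-NULL */
	bool csum_bad;		/* models udp_lib_checksum_complete() failing */
	bool lsm_denies;	/* models security_sock_rcv_skb() rejecting */
};

enum rx_verdict { RX_QUEUE, RX_CSUM_ERROR, RX_DROP };

/* Models sk_filter(): it covers the LSM hook as well as any BPF program,
 * so it can reject a packet even when no BPF filter is attached.
 */
static bool model_sk_filter(const struct rx_model *m)
{
	return m->lsm_denies;
}

/* Before the fix: both checks are skipped when no BPF filter is attached. */
static enum rx_verdict rx_before_fix(const struct rx_model *m)
{
	if (m->has_bpf_filter) {
		if (m->csum_bad)
			return RX_CSUM_ERROR;
		if (model_sk_filter(m))
			return RX_DROP;
	}
	return RX_QUEUE;
}

/* After the fix: only the early checksum is conditional on a BPF filter;
 * the sk_filter() step always runs.
 */
static enum rx_verdict rx_after_fix(const struct rx_model *m)
{
	if (m->has_bpf_filter && m->csum_bad)
		return RX_CSUM_ERROR;
	if (model_sk_filter(m))
		return RX_DROP;
	return RX_QUEUE;
}
```

For a socket with no BPF filter and a denying LSM, the old flow queues the packet while the new flow drops it, which is the regression described in the report.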
Re: Possible problem with e6afc8ac ("udp: remove headers from UDP packets before queueing")
On Thu, 2016-06-02 at 17:36 -0400, Paul Moore wrote: > On Wed, Jun 1, 2016 at 4:44 PM, Stephen Smalleywrote: > > On 06/01/2016 03:18 PM, Eric Dumazet wrote: > >> On Wed, 2016-06-01 at 15:01 -0400, Paul Moore wrote: > >>> Hello, > >>> > >>> I'm currently trying to debug a problem with 4.7-rc1 and labeled > >>> networking over UDP. I'm having some difficulty with the latest > >>> 4.7-rc1 builds on my test system at the moment so I haven't been able > >>> to concisely identify the problem, but looking through the commits in > >>> 4.7-rc1 I think there may be a problem with the following: > >>> > >>> commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac > >>> Author: samanthakumar > >>> Date: Tue Apr 5 12:41:15 2016 -0400 > >>> > >>>udp: remove headers from UDP packets before queueing > >>> > >>>Remove UDP transport headers before queueing packets for reception. > >>>This change simplifies a follow-up patch to add MSG_PEEK support. > >>> > >>>Signed-off-by: Sam Kumar > >>>Signed-off-by: Willem de Bruijn > >>>Signed-off-by: David S. Miller > >>> > >>> ... it appears that this commit changes things so that sk_filter() is > >>> only called when sk->sk_filter is not NULL. While this is fine for > >>> the traditional socket filter case, it causes problems with LSMs that > >>> make use of security_sock_rcv_skb() to enforce per-packet access > >>> controls. > >>> > >>> Hopefully I'll get 4.7-rc1 booting soon and I can do a proper > >>> bisection test around this patch, but I wanted to mention this now in > >>> case others are seeing the same problem. > >> > >> Thanks for the report. Please try following fix. > >> > >> sk_filter() got additional features like the skb_pfmemalloc() things and > >> security_sock_rcv_skb() > > > > This resolved the SELinux regression for me. > > > > Tested-by: Stephen Smalley > > The patch works for me too. Eric, are you going to send this to DaveM > (assuming he isn't listening in on this thread and picking it up > himself)? 
> > Tested-by: Paul Moore I am going to send the official patch right away, thanks!
Re: Possible problem with e6afc8ac ("udp: remove headers from UDP packets before queueing")
On Wed, Jun 1, 2016 at 4:44 PM, Stephen Smalleywrote: > On 06/01/2016 03:18 PM, Eric Dumazet wrote: >> On Wed, 2016-06-01 at 15:01 -0400, Paul Moore wrote: >>> Hello, >>> >>> I'm currently trying to debug a problem with 4.7-rc1 and labeled >>> networking over UDP. I'm having some difficulty with the latest >>> 4.7-rc1 builds on my test system at the moment so I haven't been able >>> to concisely identify the problem, but looking through the commits in >>> 4.7-rc1 I think there may be a problem with the following: >>> >>> commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac >>> Author: samanthakumar >>> Date: Tue Apr 5 12:41:15 2016 -0400 >>> >>>udp: remove headers from UDP packets before queueing >>> >>>Remove UDP transport headers before queueing packets for reception. >>>This change simplifies a follow-up patch to add MSG_PEEK support. >>> >>>Signed-off-by: Sam Kumar >>>Signed-off-by: Willem de Bruijn >>>Signed-off-by: David S. Miller >>> >>> ... it appears that this commit changes things so that sk_filter() is >>> only called when sk->sk_filter is not NULL. While this is fine for >>> the traditional socket filter case, it causes problems with LSMs that >>> make use of security_sock_rcv_skb() to enforce per-packet access >>> controls. >>> >>> Hopefully I'll get 4.7-rc1 booting soon and I can do a proper >>> bisection test around this patch, but I wanted to mention this now in >>> case others are seeing the same problem. >> >> Thanks for the report. Please try following fix. >> >> sk_filter() got additional features like the skb_pfmemalloc() things and >> security_sock_rcv_skb() > > This resolved the SELinux regression for me. > > Tested-by: Stephen Smalley The patch works for me too. Eric, are you going to send this to DaveM (assuming he isn't listening in on this thread and picking it up himself)? 
Tested-by: Paul Moore >> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c >> index d56c0559b477..0ff31d97d485 100644 >> --- a/net/ipv4/udp.c >> +++ b/net/ipv4/udp.c >> @@ -1618,12 +1618,12 @@ int udp_queue_rcv_skb(struct sock *sk, struct >> sk_buff *skb) >> } >> } >> >> - if (rcu_access_pointer(sk->sk_filter)) { >> - if (udp_lib_checksum_complete(skb)) >> + if (rcu_access_pointer(sk->sk_filter) && >> + udp_lib_checksum_complete(skb)) >> goto csum_error; >> - if (sk_filter(sk, skb)) >> - goto drop; >> - } >> + >> + if (sk_filter(sk, skb)) >> + goto drop; >> >> udp_csum_pull_header(skb); >> if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) { >> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c >> index 2da1896af934..f421c9f23c5b 100644 >> --- a/net/ipv6/udp.c >> +++ b/net/ipv6/udp.c >> @@ -653,12 +653,12 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct >> sk_buff *skb) >> } >> } >> >> - if (rcu_access_pointer(sk->sk_filter)) { >> - if (udp_lib_checksum_complete(skb)) >> - goto csum_error; >> - if (sk_filter(sk, skb)) >> - goto drop; >> - } >> + if (rcu_access_pointer(sk->sk_filter) && >> + udp_lib_checksum_complete(skb)) >> + goto csum_error; >> + >> + if (sk_filter(sk, skb)) >> + goto drop; >> >> udp_csum_pull_header(skb); >> if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) { >> >> -- paul moore www.paul-moore.com
[PATCH v2 -next] virtio-net: Add initial MTU advice feature
This commit adds the feature bit and associated mtu device entry for the virtio network device. When a virtio device comes up, the driver checks for the VIRTIO_NET_F_MTU feature bit. If that bit is set, the driver reads the advised MTU and uses it as the initial value. Signed-off-by: Aaron Conole --- v1->v2: * Fixed omitted hunk from virtio_net.h * Squashed to a single commit * Fixed commit message. drivers/net/virtio_net.c| 7 +++ include/uapi/linux/virtio_net.h | 3 +++ 2 files changed, 10 insertions(+) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index e0638e5..ef5ee01 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1896,6 +1896,12 @@ static int virtnet_probe(struct virtio_device *vdev) if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) vi->has_cvq = true; + if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { + dev->mtu = virtio_cread16(vdev, + offsetof(struct virtio_net_config, + mtu)); + } + if (vi->any_header_sg) dev->needed_headroom = vi->hdr_len; @@ -2067,6 +2073,7 @@ static unsigned int features[] = { VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, VIRTIO_NET_F_CTRL_MAC_ADDR, VIRTIO_F_ANY_LAYOUT, + VIRTIO_NET_F_MTU, }; static struct virtio_driver virtio_net_driver = { diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h index ec32293..1ab4ea6 100644 --- a/include/uapi/linux/virtio_net.h +++ b/include/uapi/linux/virtio_net.h @@ -55,6 +55,7 @@ #define VIRTIO_NET_F_MQ22 /* Device supports Receive Flow * Steering */ #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ +#define VIRTIO_NET_F_MTU 25/* Initial MTU advice */ #ifndef VIRTIO_NET_NO_LEGACY #define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */ @@ -73,6 +74,8 @@ struct virtio_net_config { * Legal values are between 1 and 0x8000 */ __u16 max_virtqueue_pairs; + /* Default maximum transmit unit advice */ + __u16 mtu; } __attribute__((packed)); /* -- 2.5.5
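The probe-time decision in the virtio-net patch above can be sketched outside the kernel as a plain bit test. The names `MODEL_F_MTU`, `model_has_feature` and `model_initial_mtu` are hypothetical; the bit number 25 is the value proposed in this version of the patch, and the fallback MTU is simply whatever the caller already had, which is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number used for the MTU feature in this version of the patch. */
#define MODEL_F_MTU 25

/* Models virtio_has_feature(): test one bit of the negotiated features. */
static bool model_has_feature(uint64_t features, unsigned int bit)
{
	return (features >> bit) & 1;
}

/* Use the device's advised MTU only when the feature was negotiated;
 * otherwise keep the caller's existing value.
 */
static uint16_t model_initial_mtu(uint64_t features, uint16_t advised,
				  uint16_t fallback)
{
	return model_has_feature(features, MODEL_F_MTU) ? advised : fallback;
}
```

The advised value is only an initial setting in this model; nothing here prevents it from being changed later.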
Re: [PATCH v2 5/6] ethernet/intel: Use pci_(request|release)_mem_regions
On Thu, 2016-06-02 at 09:30 +0200, Johannes Thumshirn wrote: > Now that we do have pci_request_mem_regions() and > pci_release_mem_regions() at > hand, use it in the Intel ethernet drivers. > > Suggested-by: Christoph Hellwig > Signed-off-by: Johannes Thumshirn > Cc: Jeff Kirsher > Cc: David S. Miller > Cc: netdev@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Cc: intel-wired-...@lists.osuosl.org Acked-by: Jeff Kirsher > --- > drivers/net/ethernet/intel/e1000e/netdev.c | 6 ++ > drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 11 +++ > drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +++-- > drivers/net/ethernet/intel/igb/igb_main.c | 10 +++--- > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 9 +++-- > 5 files changed, 14 insertions(+), 31 deletions(-)
[PATCH net-next] net: disable fragment reassembly if high_thresh is zero
Before commit 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting"), setting the reassembly high threshold to 0 prevented fragment reassembly as first fragment would be always evicted before second could be added to the queue. While inefficient, some users apparently relied on this method. Since the commit mentioned above, a percpu counter is used for reassembly memory accounting and high batch size avoids taking slow path in most common scenarios. As a result, a whole full sized packet can be reassembled without the percpu counter's main counter changing its value so that even with high_thresh set to 0, fragmented packets can be still reassembled and processed. Add explicit check preventing reassembly if high threshold is zero. Signed-off-by: Michal Kubecek --- net/ipv4/inet_fragment.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 3a88b0c73797..b5e9317eaf9e 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -355,7 +355,7 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf, { struct inet_frag_queue *q; - if (frag_mem_limit(nf) > nf->high_thresh) { + if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) { inet_frag_schedule_worker(f); return NULL; } -- 2.8.3
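The effect of the one-line change in the high_thresh patch above can be shown with a standalone model of the admission check. `refuse_old` and `refuse_new` are hypothetical names, and `approx_mem` stands in for the value returned by frag_mem_limit(), which per the commit message can still read 0 while a full-sized packet is being reassembled.

```c
#include <stdbool.h>

/* Old check: refuse a new fragment queue only when the (possibly stale)
 * counter reading exceeds the threshold.
 */
static bool refuse_old(int approx_mem, int high_thresh)
{
	return approx_mem > high_thresh;
}

/* New check: a zero threshold refuses unconditionally, regardless of
 * what the counter currently reads.
 */
static bool refuse_new(int approx_mem, int high_thresh)
{
	return !high_thresh || approx_mem > high_thresh;
}
```

With high_thresh set to 0 and the counter still reading 0, the old check admits the queue and the new check refuses it. For any non-zero threshold the two checks agree.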
[PATCH net-next 2/3] net: vrf: ipv4 support for local traffic to local addresses
Add support for locally originated traffic to VRF-local addresses. If
destination device for an skb is the loopback or VRF device then set
its dst to a local version of the VRF cached dst_entry and call
netif_rx to insert the packet onto the rx queue - similar to what is
done for loopback. This patch handles IPv4 support; follow on patch
handles IPv6.

With this patch, ping, tcp and udp packets to a local IPv4 address are
successfully routed:

    $ ip addr show dev eth1
    4: eth1: mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
        link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
        inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet6 2100:1::1/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
           valid_lft forever preferred_lft forever

    $ ping -c1 -I red 10.100.1.1
    ping: Warning: source address might be selected on device other than red.
    PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
    64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

This patch also enables use of IPv4 loopback address on the VRF device:

    $ ip addr add dev red 127.0.0.1/8

    $ ping -c1 -I red 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms

Signed-off-by: David Ahern
---
 drivers/net/vrf.c | 100 --
 1 file changed, 98 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index d678aaeba572..7df065456893 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -50,6 +50,7 @@ module_param(rule_pref, uint, S_IRUGO);

 struct net_vrf {
 	struct rtable __rcu *rth;
+	struct rtable __rcu *rth_local;
 	struct rt6_info __rcu *rt6;
 	u32 tb_id;
 };
@@ -60,9 +61,20 @@ struct pcpu_dstats {
 	u64 tx_drps;
 	u64 rx_pkts;
 	u64 rx_bytes;
+	u64 rx_drps;
 	struct u64_stats_sync syncp;
 };

+static void vrf_rx_stats(struct net_device *dev, int len)
+{
+	struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
+
+	u64_stats_update_begin(&dstats->syncp);
+	dstats->rx_pkts++;
+	dstats->rx_bytes += len;
+	u64_stats_update_end(&dstats->syncp);
+}
+
 static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
 {
 	vrf_dev->stats.tx_errors++;
@@ -97,6 +109,34 @@ static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev,
 	return stats;
 }

+/* Local traffic destined to local address. Reinsert the packet to rx
+ * path, similar to loopback handling.
+ */
+static int vrf_local_xmit(struct sk_buff *skb, struct net_device *dev,
+			  struct dst_entry *dst)
+{
+	int len = skb->len;
+
+	skb_orphan(skb);
+
+	skb_dst_set(skb, dst);
+	skb_dst_force(skb);
+
+	/* set pkt_type to avoid skb hitting packet taps twice -
+	 * once on Tx and again in Rx processing
+	 */
+	skb->pkt_type = PACKET_LOOPBACK;
+
+	skb->protocol = eth_type_trans(skb, dev);
+
+	if (likely(netif_rx(skb) == NET_RX_SUCCESS))
+		vrf_rx_stats(dev, len);
+	else
+		this_cpu_inc(dev->dstats->rx_drps);
+
+	return NETDEV_TX_OK;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 					   struct net_device *dev)
@@ -175,6 +215,34 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 	}

 	skb_dst_drop(skb);
+
+	/* if dst.dev is loopback or the VRF device again this is locally
+	 * originated traffic destined to a local address. Short circuit
+	 * to Rx path using our local dst
+	 */
+	if (rt->dst.dev == net->loopback_dev || rt->dst.dev == vrf_dev) {
+		struct net_vrf *vrf = netdev_priv(vrf_dev);
+		struct rtable *rth_local;
+		struct dst_entry *dst = NULL;
+
+		ip_rt_put(rt);
+
+		rcu_read_lock();
+
+		rth_local = rcu_dereference(vrf->rth_local);
+		if (likely(rth_local)) {
+			dst = &rth_local->dst;
+			dst_hold(dst);
+		}
+
+		rcu_read_unlock();
+
+		if (unlikely(!dst))
+			goto err;
+
+		return vrf_local_xmit(skb, vrf_dev, dst);
+	}
+
 	skb_dst_set(skb, &rt->dst);

 	/* strip the ethernet header added for pass through VRF device */
@@ -381,29 +449,48 @@ static int vrf_output(struct net *net, struct sock *sk, struct sk_buff *skb)

 static void
[PATCH net-next 0/3] net: vrf: Add support for local traffic to local addresses
Add support for locally originated traffic to VRF-local addresses, be
it addresses on enslaved devices or addresses on the VRF device:

    $ ip addr show dev red
    33: red: mtu 65536 qdisc pfifo_fast state UP group default qlen 1000
        link/ether be:00:53:b5:e4:25 brd ff:ff:ff:ff:ff:ff
        inet 1.1.1.1/32 scope global red
           valid_lft forever preferred_lft forever
        inet6 :1::1/128 scope global
           valid_lft forever preferred_lft forever

    $ ip addr show dev eth1
    3: eth1: mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
        link/ether 02:e0:f9:79:34:bd brd ff:ff:ff:ff:ff:ff
        inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet6 2100:1::1/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever

    $ ping -c1 -I red 10.100.1.1
    ping: Warning: source address might be selected on device other than red.
    PING 10.100.1.1 (10.100.1.1) from 10.100.1.1 red: 56(84) bytes of data.
    64 bytes from 10.100.1.1: icmp_seq=1 ttl=64 time=0.057 ms

    $ ping -c1 -I red 1.1.1.1
    PING 1.1.1.1 (1.1.1.1) from 1.1.1.1 red: 56(84) bytes of data.
    64 bytes from 1.1.1.1: icmp_seq=1 ttl=64 time=0.136 ms

    --- 1.1.1.1 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.136/0.136/0.136/0.000 ms

    $ ping6 -c1 -I red 2100:1::1
    ping6: Warning: source address might be selected on device other than red.
    PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
    64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.167 ms

    --- 2100:1::1 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.167/0.167/0.167/0.000 ms

    $ ping6 -c1 -I red ::1
    PING ::1(::1) from :1::1 red: 56 data bytes
    64 bytes from ::1: icmp_seq=1 ttl=64 time=0.187 ms

    --- ::1 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.187/0.187/0.187/0.000 ms

This change also enables use of loopback address on the VRF device:

    $ ip addr add dev red 127.0.0.1/8

    $ ping -c1 -I red 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 red: 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.058 ms

David Ahern (3):
  net: vrf: Minor refactoring for local address patches
  net: vrf: ipv4 support for local traffic to local addresses
  net: vrf: ipv6 support for local traffic to local addresses

 drivers/net/vrf.c | 234 ++
 1 file changed, 201 insertions(+), 33 deletions(-)

--
2.1.4
[PATCH net-next 1/3] net: vrf: Minor refactoring for local address patches
Move the stripping of the ethernet header from is_ip_tx_frame into the
ipv4 and ipv6 outbound functions. If the packet is destined to a local
address the header is retained since the packet is sent back to
netif_rx.

Collapse vrf_send_v4_prep into vrf_process_v4_outbound.

Signed-off-by: David Ahern
---
 drivers/net/vrf.c | 45 ++---
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index aaac4c779047..d678aaeba572 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -125,6 +125,9 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 	skb_dst_drop(skb);
 	skb_dst_set(skb, dst);

+	/* strip the ethernet header added for pass through VRF device */
+	__skb_pull(skb, skb_network_offset(skb));
+
 	ret = ip6_local_out(net, skb->sk, skb);
 	if (unlikely(net_xmit_eval(ret)))
 		dev->stats.tx_errors++;
@@ -145,29 +148,6 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 }
 #endif

-static int vrf_send_v4_prep(struct sk_buff *skb, struct flowi4 *fl4,
-			    struct net_device *vrf_dev)
-{
-	struct rtable *rt;
-	int err = 1;
-
-	rt = ip_route_output_flow(dev_net(vrf_dev), fl4, NULL);
-	if (IS_ERR(rt))
-		goto out;
-
-	/* TO-DO: what about broadcast ? */
-	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
-		ip_rt_put(rt);
-		goto out;
-	}
-
-	skb_dst_drop(skb);
-	skb_dst_set(skb, &rt->dst);
-	err = 0;
-out:
-	return err;
-}
-
 static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 					   struct net_device *vrf_dev)
 {
@@ -182,10 +162,24 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 					FLOWI_FLAG_SKIP_NH_OIF,
 		.daddr = ip4h->daddr,
 	};
+	struct net *net = dev_net(vrf_dev);
+	struct rtable *rt;

-	if (vrf_send_v4_prep(skb, &fl4, vrf_dev))
+	rt = ip_route_output_flow(net, &fl4, NULL);
+	if (IS_ERR(rt))
 		goto err;

+	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+		ip_rt_put(rt);
+		goto err;
+	}
+
+	skb_dst_drop(skb);
+	skb_dst_set(skb, &rt->dst);
+
+	/* strip the ethernet header added for pass through VRF device */
+	__skb_pull(skb, skb_network_offset(skb));
+
 	if (!ip4h->saddr) {
 		ip4h->saddr = inet_select_addr(skb_dst(skb)->dev, 0,
 					       RT_SCOPE_LINK);
@@ -206,9 +200,6 @@ static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,

 static netdev_tx_t is_ip_tx_frame(struct sk_buff *skb, struct net_device *dev)
 {
-	/* strip the ethernet header added for pass through VRF device */
-	__skb_pull(skb, skb_network_offset(skb));
-
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
 		return vrf_process_v4_outbound(skb, dev);
--
2.1.4
[PATCH net-next 3/3] net: vrf: ipv6 support for local traffic to local addresses
Add support for locally originated traffic to VRF-local IPv6 addresses.
Similar to IPv4 a local dst is set on the skb and the packet is
reinserted with a call to netif_rx.

With this patch, ping, tcp and udp packets to a local IPv6 address are
successfully routed:

    $ ip addr show dev eth1
    4: eth1: mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
        link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
        inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
           valid_lft forever preferred_lft forever
        inet6 2100:1::1/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
           valid_lft forever preferred_lft forever

    $ ping6 -c1 -I red 2100:1::1
    ping6: Warning: source address might be selected on device other than red.
    PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
    64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms

ip6_input is exported so the VRF driver can use it for the dst input
function. The dst_alloc function for IPv4 defaults to setting the input
and output functions; IPv6's does not. VRF does not need to duplicate
the Rx path so just export the ipv6 input function.

Signed-off-by: David Ahern
---
 drivers/net/vrf.c | 89 ---
 1 file changed, 85 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 7df065456893..a0ca158f6ad9 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -52,6 +52,7 @@ struct net_vrf {
 	struct rtable __rcu *rth;
 	struct rtable __rcu *rth_local;
 	struct rt6_info __rcu *rt6;
+	struct rt6_info __rcu *rt6_local;
 	u32 tb_id;
 };

@@ -163,6 +164,46 @@ static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 		goto err;

 	skb_dst_drop(skb);
+
+	/* if dst.dev is loopback or the VRF device again this is locally
+	 * originated traffic destined to a local address. Short circuit
+	 * to Rx path using our local dst
+	 */
+	if (dst->dev == net->loopback_dev || dst->dev == dev) {
+		struct net_vrf *vrf = netdev_priv(dev);
+		struct rt6_info *rt6_local;
+
+		/* release looked up dst and use cached local dst */
+		dst_release(dst);
+
+		rcu_read_lock();
+
+		rt6_local = rcu_dereference(vrf->rt6_local);
+		if (unlikely(!rt6_local)) {
+			rcu_read_unlock();
+			goto err;
+		}
+
+		/* Ordering issue: cached local dst is created on newlink
+		 * before the IPv6 initialization. Using the local dst
+		 * requires rt6i_idev to be set so make sure it is.
+		 */
+		if (unlikely(!rt6_local->rt6i_idev)) {
+			rt6_local->rt6i_idev = in6_dev_get(dev);
+			if (!rt6_local->rt6i_idev) {
+				rcu_read_unlock();
+				goto err;
+			}
+		}
+
+		dst = &rt6_local->dst;
+		dst_hold(dst);
+
+		rcu_read_unlock();
+
+		return vrf_local_xmit(skb, dev, &rt6_local->dst);
+	}
+
 	skb_dst_set(skb, dst);

 	/* strip the ethernet header added for pass through VRF device */
@@ -342,27 +383,38 @@ static int vrf_output6(struct net *net, struct sock *sk, struct sk_buff *skb)
 static void vrf_rt6_release(struct net_vrf *vrf)
 {
 	struct rt6_info *rt6 = rtnl_dereference(vrf->rt6);
+	struct rt6_info *rt6_local = rtnl_dereference(vrf->rt6_local);

-	rcu_assign_pointer(vrf->rt6, NULL);
+	RCU_INIT_POINTER(vrf->rt6, NULL);
+	RCU_INIT_POINTER(vrf->rt6_local, NULL);
+	synchronize_rcu();

 	if (rt6)
 		dst_release(&rt6->dst);
+
+	if (rt6_local) {
+		if (rt6_local->rt6i_idev)
+			in6_dev_put(rt6_local->rt6i_idev);
+
+		dst_release(&rt6_local->dst);
+	}
 }

 static int vrf_rt6_create(struct net_device *dev)
 {
+	int flags = DST_HOST | DST_NOPOLICY | DST_NOXFRM | DST_NOCACHE;
 	struct net_vrf *vrf = netdev_priv(dev);
 	struct net *net = dev_net(dev);
 	struct fib6_table *rt6i_table;
-	struct rt6_info *rt6;
+	struct rt6_info *rt6, *rt6_local;
 	int rc = -ENOMEM;

 	rt6i_table = fib6_new_table(net, vrf->tb_id);
 	if (!rt6i_table)
 		goto out;

-	rt6 = ip6_dst_alloc(net, dev,
-			    DST_HOST | DST_NOPOLICY | DST_NOXFRM | DST_NOCACHE);
+	/* create a dst for routing packets out a VRF device */
+	rt6 = ip6_dst_alloc(net, dev, flags);
 	if (!rt6)
 		goto out;

@@ -370,7 +422,25 @@ static int
Re: [net-next] ovs: set name assign type of internal port
On Tue, May 31, 2016 at 6:41 AM, Zhang Shengju wrote:
> Set name_assign_type of internal port to NET_NAME_USER.
>
> Signed-off-by: Zhang Shengju
> ---
>  net/openvswitch/vport-internal_dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
> index 2ee48e4..434e04c 100644
> --- a/net/openvswitch/vport-internal_dev.c
> +++ b/net/openvswitch/vport-internal_dev.c
> @@ -195,7 +195,7 @@ static struct vport *internal_dev_create(const struct vport_parms *parms)
>  	}
>
>  	vport->dev = alloc_netdev(sizeof(struct internal_dev),
> -				  parms->name, NET_NAME_UNKNOWN, do_setup);
> +				  parms->name, NET_NAME_USER, do_setup);

Looks good.

Acked-by: Pravin B Shelar
Re: [PATCH v2 2/5] fsl/qe: setup clock source for TDM mode
From: Zhao Qiang
Date: Thu, 2 Jun 2016 09:44:58 +0800

> +static int ucc_get_tdm_sync_source(u32 tdm_num, enum qe_clock clock,
> + enum comm_dir mode)
> +{
> + int source = -EINVAL;
> +
> + if (mode == COMM_DIR_RX && clock == QE_RSYNC_PIN) {
> + source = 0;
> + return source;
> + }
> + if (mode == COMM_DIR_TX && clock == QE_TSYNC_PIN) {
> + source = 0;
> + return source;
> + }
> +
> + switch (tdm_num) {
> + case 0:
> + case 1:
> + switch (clock) {
> + case QE_BRG9:
> + source = 1;
> + break;
> + case QE_BRG10:
> + source = 2;
> + break;

These switch case bodies are over indented.

Same goes for the rest of this function.
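
For reference, kernel coding style puts each case label in the same column as its switch and indents the case body by exactly one tab from the label. A minimal sketch of that layout follows, assuming this is the layout the review comment is asking for. The enum and the function name are stand-ins invented for the example, not the definitions from the QE headers.

```c
#include <errno.h>

/* Stand-in for enum qe_clock; the values are placeholders only. */
enum demo_clock {
	DEMO_BRG9,
	DEMO_BRG10,
	DEMO_OTHER,
};

/* Nested switch with case labels aligned to their switch and bodies
 * indented by a single tab from the label.
 */
static int demo_sync_source(unsigned int tdm_num, enum demo_clock clock)
{
	int source = -EINVAL;

	switch (tdm_num) {
	case 0:
	case 1:
		switch (clock) {
		case DEMO_BRG9:
			source = 1;
			break;
		case DEMO_BRG10:
			source = 2;
			break;
		default:
			break;
		}
		break;
	default:
		break;
	}

	return source;
}
```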
RE: [PATCH] r8152: Add support for setting MAC to system's Auxiliary MAC address
> I have some other questions which answers should we know:
>
> 1) Is that AUX MAC address implemented only in customized windows Dell
> driver? Or also in "upstream" windows Realtek driver and all users of
> Realtek hw can install it (or update via next driver update)?
>

I don't have the information on this. Realtek will have to comment here
as this part is a black box to me. I'm asking my internal colleagues
about this too.

> 2) Can you share pseudo code or description of algorithm which decide
> MAC address for newly connected r8152 device on windows? This could help
> us to decide if something similar/same cannot be implemented also on
> linux (either in kernel or userspace). What I would like to know are
> those situations when you connect more r8152 devices (some Dell and some
> non-Dell).
>

This is another thing I don't have the information for right now. I can
install Windows on a laptop, install the Realtek driver and experiment,
but it would be better to get this directly from Realtek if at all
possible.

> > I do have a way to query if a dock is plugged in via SMM, but I doubt
> > that's what Realtek is using on the Windows side.
>
> So there is some way to check if Dell dock is plugged, right? But what
> happen if you connect Dell dock and also non-Dell r8152 device? Can you
> distinguish which device is Dell and which non-Dell?

Yes, when querying if a Dell dock is plugged in, a "location" and
"count" parameter is returned. I'd have to figure out how to translate
that into what the Linux kernel sees.

Actually the information for how to do this is already public too. It's
in a pull request for Dock FW updating in the fwupd project.
https://github.com/hughsie/fwupd/pull/49/files#diff-81b55c87ce1542a18b0a4b2b228b9129R189

>
> Anyway, I think that by SMM you mean dell smbios API call. Cannot you
> guys in Dell release documentation of all smbios calls to community?

Well dell SMBIOS API call really means to use dcdbas kernel module which
does SMM..

> Time to time you release some small parts in libsmbios project which
> then we can use for implementing useful parts in kernel (e.g. LED driver
> for controlling keyboard backlight). But there are couple of
> undocumented APIs and maybe some can also help with this problem...
>

Releasing different bits of our SMBIOS document requires approvals. We
can't just release the whole thing as there are lots of interfaces that
aren't intended for the OS to be using. They're used only by Dell tools.

For example we just had approval for information about querying TPM and
dock information and those are present in the fwupd pull request for
dock and TPM FW updates you see above.

If you have some API's in particular you would like more information on,
I'm happy to have internal discussion to see if we can release
information on those.

> > I'd leave that as
> > a second to last resort (last resort being move back to userspace
> > again).
> >
> > > What you definitely should not do is to change the mac for some
> > > arbitrary "first" device. Then you are better off with the
> > > userspace proposal where you and your users have some chance to
> > > implement a sensible policy based on e.g. usb port numbers.
> >
> > OK, if I can't come up with a way to key on the device being a Dell
> > dock I'll scrap this entirely kernel approach.
>
> E.g. PCI devices have ordinary PCI device & vendor IDs, but have Dell
> specific subsystem IDs. And via subsystem IDs we can distinguish between
> Intel graphics card on Dell laptop and on non-Dell laptop.
>
> Does not you have some special/modified firmware in those Dell realtek
> docks (and ability to check from OS some registers)?

I think so. Otherwise there would be all the same concerns you have
outlined with generic devices. Like I said this part is currently a
black box to me. I hope Realtek can publicly comment on this, or I can
get some information from my colleagues.
Re: [PATCH v2] r8152: Add support for setting MAC to system's Auxiliary MAC address
On Thu, Jun 02, 2016 at 07:04:32PM +, mario_limoncie...@dell.com wrote:
> > -----Original Message-----
> > From: Andrew Lunn [mailto:and...@lunn.ch]
> > Sent: Thursday, June 2, 2016 2:03 PM
> > To: Limonciello, Mario
> > Cc: gre...@linuxfoundation.org; hayesw...@realtek.com;
> > linux-ker...@vger.kernel.org; netdev@vger.kernel.org;
> > linux-u...@vger.kernel.org; pali.ro...@gmail.com;
> > anthony.w...@canonical.com
> > Subject: Re: [PATCH v2] r8152: Add support for setting MAC to system's
> > Auxiliary MAC address
> >
> > > > And you want to check this for all Dell devices? Please be model
> > > > specific, I doubt a bunch of Dell servers wants to run this code...
> > > >
> > >
> > > Tracking model specific is really going to turn into a giant list
> > > never ending list.
> > > To drill down more specifically, I can match on chassis too.
> >
> > Does Dell happen to use its own USB Vendor ID for the USB device in
> > the dock? You could go at this problem from the other direction if it
> > does have a unique vendor ID.
> >
> > Andrew
>
> Unfortunately it's not a Dell specific VID/PID. I'm asking around to
> find out if there is something else identifiable about this dock's NIC
> (maybe that r8152 can query).

lsusb -v

I assume there is a USB hub in the dock, maybe that has a Dell VID?
Going one level up the USB tree hierarchy should not be too hard.

	Andrew
Re: [PATCH] net: ethernet: wiznet: Remove create_workqueue
From: Bhaktipriya Shridhar
Date: Wed, 1 Jun 2016 23:29:15 +0530

> alloc_workqueue replaces deprecated create_workqueue().
>
> A dedicated workqueue has been used since the workitems are involved
> in normal device operation. Workitems &priv->rx_work and &priv->tx_work,
> map to w5100_rx_work and w5100_tx_work respectively and are involved in
> receiving and transmitting packets. Forward progress under
> memory pressure is a requirement here.
>
> create_workqueue has been replaced with alloc_workqueue with max_active
> as 0 since there is no need for throttling the number of active work
> items.
>
> Since the driver may be used in memory reclaim path,
> WQ_MEM_RECLAIM has been set to guarantee forward progress.
>
> flush_workqueue is unnecessary since destroy_workqueue() itself calls
> drain_workqueue() which flushes repeatedly till the workqueue
> becomes empty. Hence the call to flush_workqueue() has been dropped.
>
> Signed-off-by: Bhaktipriya Shridhar

Applied to net-next, thanks.
Re: [PATCH] stmmac: do not sleep in atomic context for mdio_reset
From: Vincent Palatin
Date: Wed, 1 Jun 2016 08:53:48 -0700

> stmmac_mdio_reset() has been updated to use msleep rather than udelay
> (as some PHYs require a one second delay there).
> It is called from stmmac_resume() within the spin_lock_irqsave block,
> an atomic context, triggering 'scheduling while atomic'.
>
> The stmmac_priv lock usage is not fully documented, but it seems
> to protect the access to the MAC registers / DMA structures rather
> than the MDIO bus or the PHY (which have separate locking),
> so we can push the spin_lock after the stmmac_mdio_reset call.
>
> Signed-off-by: Vincent Palatin

Applied, thanks.
Re: [PATCH 0/2] Software workaround for i.MX6Q/DL ERR006687
From: Lucas Stach
Date: Wed, 1 Jun 2016 17:29:41 +0200

> I would prefer if this series gets merged through the imx architecture
> tree with acks for the FEC changes from the network people.

Sure, this is fine:

Acked-by: David S. Miller
[PATCH] rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB
Use the more common kernel logging style and reduce object size.

The logging message prefix changes from a mixture of "RxRPC:" and
"RXRPC:" to "af_rxrpc: ".

$ size net/rxrpc/built-in.o*
   text    data     bss     dec     hex filename
  64172    1972    8304   74448   122d0 net/rxrpc/built-in.o.new
  67512    1972    8304   77788   12fdc net/rxrpc/built-in.o.old

Miscellanea:

o Consolidate the ASSERT macros to use a single pr_err call with
  decimal and hexadecimal output and a stringified #OP argument

Signed-off-by: Joe Perches
---
 net/rxrpc/af_rxrpc.c | 18 ++
 net/rxrpc/ar-accept.c | 2 ++
 net/rxrpc/ar-ack.c | 2 ++
 net/rxrpc/ar-call.c | 12 ++--
 net/rxrpc/ar-connection.c | 2 ++
 net/rxrpc/ar-connevent.c | 2 ++
 net/rxrpc/ar-input.c | 2 ++
 net/rxrpc/ar-internal.h | 30 --
 net/rxrpc/ar-key.c | 4 +++-
 net/rxrpc/ar-local.c | 2 ++
 net/rxrpc/ar-output.c | 2 ++
 net/rxrpc/ar-peer.c | 2 ++
 net/rxrpc/ar-recvmsg.c | 4 +++-
 net/rxrpc/ar-skbuff.c | 2 ++
 net/rxrpc/ar-transport.c | 2 ++
 net/rxrpc/rxkad.c | 2 ++
 16 files changed, 56 insertions(+), 34 deletions(-)

diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index e45e94c..7840b8e 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -9,6 +9,8 @@
  * 2 of the License, or (at your option) any later version.
  */

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include
 #include
 #include
@@ -796,49 +798,49 @@ static int __init af_rxrpc_init(void)
 		"rxrpc_call_jar", sizeof(struct rxrpc_call), 0,
 		SLAB_HWCACHE_ALIGN, NULL);
 	if (!rxrpc_call_jar) {
-		printk(KERN_NOTICE "RxRPC: Failed to allocate call jar\n");
+		pr_notice("Failed to allocate call jar\n");
 		goto error_call_jar;
 	}

 	rxrpc_workqueue = alloc_workqueue("krxrpcd", 0, 1);
 	if (!rxrpc_workqueue) {
-		printk(KERN_NOTICE "RxRPC: Failed to allocate work queue\n");
+		pr_notice("Failed to allocate work queue\n");
 		goto error_work_queue;
 	}

 	ret = rxrpc_init_security();
 	if (ret < 0) {
-		printk(KERN_CRIT "RxRPC: Cannot initialise security\n");
+		pr_crit("Cannot initialise security\n");
 		goto error_security;
 	}

 	ret = proto_register(&rxrpc_proto, 1);
 	if (ret < 0) {
-		printk(KERN_CRIT "RxRPC: Cannot register protocol\n");
+		pr_crit("Cannot register protocol\n");
 		goto error_proto;
 	}

 	ret = sock_register(&rxrpc_family_ops);
 	if (ret < 0) {
-		printk(KERN_CRIT "RxRPC: Cannot register socket family\n");
+		pr_crit("Cannot register socket family\n");
 		goto error_sock;
 	}

 	ret = register_key_type(&key_type_rxrpc);
 	if (ret < 0) {
-		printk(KERN_CRIT "RxRPC: Cannot register client key type\n");
+		pr_crit("Cannot register client key type\n");
 		goto error_key_type;
 	}

 	ret = register_key_type(&key_type_rxrpc_s);
 	if (ret < 0) {
-		printk(KERN_CRIT "RxRPC: Cannot register server key type\n");
+		pr_crit("Cannot register server key type\n");
 		goto error_key_type_s;
 	}

 	ret = rxrpc_sysctl_init();
 	if (ret < 0) {
-		printk(KERN_CRIT "RxRPC: Cannot register sysctls\n");
+		pr_crit("Cannot register sysctls\n");
 		goto error_sysctls;
 	}

diff --git a/net/rxrpc/ar-accept.c b/net/rxrpc/ar-accept.c
index e7a7f05..eea5f4a 100644
--- a/net/rxrpc/ar-accept.c
+++ b/net/rxrpc/ar-accept.c
@@ -9,6 +9,8 @@
  * 2 of the License, or (at your option) any later version.
  */

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include
 #include
 #include
diff --git a/net/rxrpc/ar-ack.c b/net/rxrpc/ar-ack.c
index 374478e..1838178 100644
--- a/net/rxrpc/ar-ack.c
+++ b/net/rxrpc/ar-ack.c
@@ -9,6 +9,8 @@
  * 2 of the License, or (at your option) any later version.
  */

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include
 #include
 #include
diff --git a/net/rxrpc/ar-call.c b/net/rxrpc/ar-call.c
index 571a41f..1fbaae1 100644
--- a/net/rxrpc/ar-call.c
+++ b/net/rxrpc/ar-call.c
@@ -9,6 +9,8 @@
  * 2 of the License, or (at your option) any later version.
  */

+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include
 #include
 #include
@@ -669,8 +671,7 @@ void rxrpc_release_call(struct rxrpc_call *call)
 				conn->channels[3] == NULL);
 			break;
 		default:
-			printk(KERN_ERR "RxRPC: conn->avail_calls=%d\n",
-			       conn->avail_calls);
+			pr_err("conn->avail_calls=%d\n", conn->avail_calls);
 			BUG();
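
The prefixing done by pr_fmt is plain compile-time string literal concatenation, which is also why the object shrinks: the prefix is no longer repeated by hand in every call site's source. The userspace sketch below illustrates the mechanism only. KBUILD_MODNAME is defined by hand here (kbuild supplies it in a real build), and demo_pr_err_avail_calls() is a hypothetical stand-in that formats into a buffer instead of calling the kernel's pr_err().

```c
#include <stddef.h>
#include <stdio.h>

/* In a kernel build kbuild provides KBUILD_MODNAME; set by hand here. */
#define KBUILD_MODNAME "af_rxrpc"

/* Same definition the patch adds near the top of each source file. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical stand-in for a pr_err() call site: the format string is
 * wrapped in pr_fmt(), so the module prefix is glued on at compile time.
 */
static int demo_pr_err_avail_calls(char *buf, size_t size, int avail_calls)
{
	return snprintf(buf, size, pr_fmt("conn->avail_calls=%d\n"),
			avail_calls);
}
```

Called with a 64 byte buffer and the value 3, the helper produces the text "af_rxrpc: conn->avail_calls=3" followed by a newline, matching the prefix named in the commit message.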
Re: [PATCH V3 0/2] vhost_net polling optimization
From: Jason Wang
Date: Wed, 1 Jun 2016 01:56:32 -0400

> This series tries to optimize vhost_net polling at two points:
>
> - Stop rx polling for reducing the unnecessary wakeups during
>   handle_rx().
> - Conditionally enable tx polling for reducing the unnecessary
>   traversing and spinlock touching.
>
> Test shows about 17% improvement on rx pps.
>
> Please review
>
> Changes from V2:
> - Don't enable rx vq if we meet an err or rx vq is empty
> Changes from V1:
> - use vhost_net_disable_vq()/vhost_net_enable_vq() instead of open
>   coding.
> - Add a new patch for conditionally enable tx polling.

Michael, please review this patch series.

Thanks.
Re: [net-next] ovs: set name assign type of internal port
From: Zhang Shengju
Date: Tue, 31 May 2016 13:41:02 +

> Set name_assign_type of internal port to NET_NAME_USER.
>
> Signed-off-by: Zhang Shengju

Pravin or some other OVS expert, please review this.
RE: [PATCH v2] r8152: Add support for setting MAC to system's Auxiliary MAC address
> -----Original Message-----
> From: Andrew Lunn [mailto:and...@lunn.ch]
> Sent: Thursday, June 2, 2016 2:03 PM
> To: Limonciello, Mario
> Cc: gre...@linuxfoundation.org; hayesw...@realtek.com;
> linux-ker...@vger.kernel.org; netdev@vger.kernel.org;
> linux-u...@vger.kernel.org; pali.ro...@gmail.com;
> anthony.w...@canonical.com
> Subject: Re: [PATCH v2] r8152: Add support for setting MAC to system's
> Auxiliary MAC address
>
> > > And you want to check this for all Dell devices? Please be model
> > > specific, I doubt a bunch of Dell servers wants to run this code...
> > >
> >
> > Tracking model specific is really going to turn into a giant list
> > never ending list.
> > To drill down more specifically, I can match on chassis too.
>
> Does Dell happen to use its own USB Vendor ID for the USB device in
> the dock? You could go at this problem from the other direction if it
> does have a unique vendor ID.
>
> Andrew

Unfortunately it's not a Dell specific VID/PID. I'm asking around to
find out if there is something else identifiable about this dock's NIC
(maybe that r8152 can query).
Re: [PATCH] r8152: Add support for setting MAC to system's Auxiliary MAC address
On Thursday 02 June 2016 20:28:33 mario_limoncie...@dell.com wrote:
> > -----Original Message-----
> > From: Bjørn Mork [mailto:bj...@mork.no]
> > Sent: Thursday, June 2, 2016 1:04 PM
> > To: Limonciello, Mario
> > Cc: gre...@linuxfoundation.org; hayesw...@realtek.com;
> > linux-ker...@vger.kernel.org; netdev@vger.kernel.org;
> > linux-u...@vger.kernel.org; pali.ro...@gmail.com;
> > anthony.w...@canonical.com
> > Subject: Re: [PATCH] r8152: Add support for setting MAC to system's
> > Auxiliary MAC address
> >
> > writes:
> > >> > 2) Track whether this is the first or second USB NIC plugged
> > >> > in. Only offer it on the first NIC detected by r8152. When the
> > >> > second NIC is plugged in don't match from ACPI.
> > >> >
> > >> > There would be a question of what to do if the first NIC is
> > >> > removed and added back if it should get the persistent system
> > >> > MAC or not.
> > >> >
> > >> > I'd say yes, just make sure that only one NIC can have it at a
> > >> > time.
> > >>
> > >> You are going to get things very complex very quickly if you try
> > >> to do this.
> > >
> > > It's really not that hard, track a module wide static variable
> > > whether the feature is in use. Track in each device whether the
> > > feature was in use. If it in use, don't assign the next device
> > > plugged in via the ACPI string. If a device is removed that has
> > > the feature activated, change the module wide static variable.
> >
> > Having the mac address jump around in an arbitrary way like this is
> > going to confuse the hell out of your users. Consider what happens
> > if the user docks a laptop with an r8152 usb dongle already
> > plugged in... How are you going to explain that the dock gets some
> > other mac address in this case? How are you going to explain the
> > difference between using an r8152 based dongle and some other
> > ethernet usb dongle with your systems?
>
> Yeah I understand the concern. I agree that would be very confusing
> to a user. This does need to match only on Dell docks then.
>
> > Make it behave consistently if you're going to add this. Which can
> > be done by specifically matching the Dell dock (doesn't it have an
> > unique Dell device ID?) and ignoring any other r8152 device. You
> > could also choose to set the same mac for all r8152 devices.
> > Which is fine, but will probably confuse many users.
>
> Unfortunately there is no Dell specific VID/PID. I checked a no-name
> dongle that used r8152 and it was the same (0bda:8153). Maybe Hayes
> Wang can check with his Windows driver colleagues if there was
> anything else to key off when this was implemented on the Windows
> Realtek driver. If there is something else to key off of, I'm not
> aware what it is. I'll check with some of my colleagues too.

I have some other questions which answers should we know:

1) Is that AUX MAC address implemented only in customized windows Dell
driver? Or also in "upstream" windows Realtek driver and all users of
Realtek hw can install it (or update via next driver update)?

2) Can you share pseudo code or description of algorithm which decide
MAC address for newly connected r8152 device on windows? This could
help us to decide if something similar/same cannot be implemented also
on linux (either in kernel or userspace). What I would like to know are
those situations when you connect more r8152 devices (some Dell and
some non-Dell).

> I do have a way to query if a dock is plugged in via SMM, but I doubt
> that's what Realtek is using on the Windows side.

So there is some way to check if Dell dock is plugged, right? But what
happen if you connect Dell dock and also non-Dell r8152 device? Can you
distinguish which device is Dell and which non-Dell?

Anyway, I think that by SMM you mean dell smbios API call. Cannot you
guys in Dell release documentation of all smbios calls to community?
Time to time you release some small parts in libsmbios project which
then we can use for implementing useful parts in kernel (e.g. LED
driver for controlling keyboard backlight). But there are couple of
undocumented APIs and maybe some can also help with this problem...

> I'd leave that as
> a second to last resort (last resort being move back to userspace
> again).
>
> > What you definitely should not do is to change the mac for some
> > arbitrary "first" device. Then you are better off with the
> > userspace proposal where you and your users have some chance to
> > implement a sensible policy based on e.g. usb port numbers.
>
> OK, if I can't come up with a way to key on the device being a Dell
> dock I'll scrap this entirely kernel approach.

E.g. PCI devices have ordinary PCI device & vendor IDs, but have Dell
specific subsystem IDs. And via subsystem IDs we can distinguish
between Intel graphics card on Dell laptop and on non-Dell laptop.

Does not you have some
Re: [PATCH v2] r8152: Add support for setting MAC to system's Auxiliary MAC address
> > And you want to check this for all Dell devices? Please be model
> > specific, I doubt a bunch of Dell servers wants to run this code...
> >
>
> Tracking model specific is really going to turn into a giant list never
> ending list.
> To drill down more specifically, I can match on chassis too.

Does Dell happen to use its own USB Vendor ID for the USB device in the dock? You could go at this problem from the other direction if it does have a unique vendor ID.

Andrew
Re: [PATCH] r8152: Add support for setting MAC to system's Auxiliary MAC address
On Thursday 02 June 2016 20:04:02 Bjørn Mork wrote:
> writes:
> >> > 2) Track whether this is the first or second USB NIC plugged in.
> >> > Only offer it on the first NIC detected by r8152. When the second
> >> > NIC is plugged in don't match from ACPI.
> >> >
> >> > There would be a question of what to do if the first NIC is
> >> > removed and added back if it should get the persistent system MAC
> >> > or not.
> >> >
> >> > I'd say yes, just make sure that only one NIC can have it at a
> >> > time.
> >>
> >> You are going to get things very complex very quickly if you try
> >> to do this.
> >
> > It's really not that hard, track a module wide static variable
> > whether the feature is in use. Track in each device whether the
> > feature was in use. If it in use, don't assign the next device
> > plugged in via the ACPI string. If a device is removed that has
> > the feature activated, change the module wide static variable.
>
> Having the mac address jump around in an arbitrary way like this is
> going to confuse the hell out of your users. Consider what happens
> if the user docks a laptop with an r8152 usb dongle already plugged
> in... How are you going to explain that the dock gets some other mac
> address in this case? How are you going to explain the difference
> between using an r8152 based dongle and some other ethernet usb
> dongle with your systems?
>
> Make it behave consistently if you're going to add this. Which can
> be done by specifically matching the Dell dock (doesn't it have an
> unique Dell device ID?) and ignoring any other r8152 device. You
> could also choose to set the same mac for all r8152 devices. Which
> is fine, but will probably confuse many users.
>
> What you definitely should not do is to change the mac for some
> arbitrary "first" device. Then you are better off with the userspace
> proposal where you and your users have some chance to implement a
> sensible policy based on e.g. usb port numbers.

This is exactly what I wanted to write, but you were faster :-)

You can connect several Dell docks (with r8152 devices) and several non-Dell r8152 devices to a Dell laptop in random order. Regardless of the connect and disconnect order, each device must always get exactly the same MAC address. Otherwise there will be problems! It confuses users and also network admins...

So if the kernel approach is chosen, then I think there are only two solutions that satisfy the above conditions:

First one is:

* all non-Dell devices have own MAC address
* all Dell devices have (one, same) AUX MAC address

Second one is:

* all devices (Dell and also non-Dell) have own address
* AUX MAC address is never used

So what do you (netdev maintainers) think about it?

-- 
Pali Rohár
pali.ro...@gmail.com
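To make the first option concrete, here is a minimal userspace sketch (not driver code). It assumes some reliable way to recognize a Dell dock exists, which the thread has not established; the `struct nic` type, the `is_dell_dock` flag and `pick_mac()` are hypothetical names used only for illustration. The point is that the chosen address is a function of the device alone, so plug order cannot change it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical model of one r8152 device. is_dell_dock stands in for
 * whatever identification method turns out to be available. */
struct nic {
	uint8_t own_mac[MAC_LEN];
	bool is_dell_dock;
};

/* Option 1: every Dell dock gets the AUX MAC, everything else keeps its
 * own address. No global "first device" state is consulted. */
static void pick_mac(const struct nic *nic, const uint8_t aux[MAC_LEN],
		     uint8_t out[MAC_LEN])
{
	memcpy(out, nic->is_dell_dock ? aux : nic->own_mac, MAC_LEN);
}
```

The second option is the degenerate case where `is_dell_dock` is always false.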
Re: [PATCH v5 2/2] skb_array: ring test
On Tue, 24 May 2016 23:34:14 +0300 "Michael S. Tsirkin" wrote:

> On Tue, May 24, 2016 at 07:03:20PM +0200, Jesper Dangaard Brouer wrote:
> >
> > On Tue, 24 May 2016 12:28:09 +0200
> > Jesper Dangaard Brouer wrote:
> >
> > > I do like perf, but it does not answer my questions about the
> > > performance of this queue. I will code something up in my own
> > > framework[2] to answer my own performance questions.
> > >
> > > Like what is be minimum overhead (in cycles) achievable with this type
> > > of queue, in the most optimal situation (e.g. same CPU enq+deq cache hot)
> > > for fastpath usage.
> >
> > Coded it up here:
> >  https://github.com/netoptimizer/prototype-kernel/commit/b16a3332184
> >  https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_bench01.c
> >
> > This is a really fake benchmark, but it sort of shows the
> > overhead achievable with this type of queue, where it is the same
> > CPU enqueuing and dequeuing, and cache is guaranteed to be hot.
> >
> > Measured on a i7-4790K CPU @ 4.00GHz, the average cost of
> > enqueue+dequeue of a single object is around 102 cycles(tsc).
> >
> > To compare this with below, where enq and deq is measured separately:
> >  102 / 2 = 51 cycles

The alf_queue[1] baseline is 26 cycles in this minimum-overhead benchmark, using the MPMC (Multi-Producer/Multi-Consumer) variant, which uses a locked cmpxchg. (The SPSC variant is 5 cycles, thus most of the cost comes from the locked cmpxchg.)

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/include/linux/alf_queue.h

> > > Then I also want to know how this performs when two CPUs are involved.
> > > As this is also a primary use-case, for you when sending packets into a
> > > guest.
> >
> > Coded it up here:
> >  https://github.com/netoptimizer/prototype-kernel/commit/75fe31ef62e
> >  https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_parallel01.c
> >
> > This parallel benchmark try to keep two (or more) CPUs busy enqueuing or
> > dequeuing on the same skb_array queue. It prefills the queue,
> > and stops the test as soon as queue is empty or full, or
> > completes a number of "loops"/cycles.
> >
> > For two CPUs the results are really good:
> >  enqueue: 54 cycles(tsc)
> >  dequeue: 53 cycles(tsc)

As MST points out, a scheme like the alf_queue[1] has the issue that it "reads" the opposite cacheline of the consumer.tail/producer.tail to determine if space-is-left/queue-is-empty. This causes an expensive transition in the cache coherency protocol.

Coded up a similar test for alf_queue:
 https://github.com/netoptimizer/prototype-kernel/commit/b3ff2624f1
 https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/alf_queue_parallel01.c

For two CPUs the MPMC results are significantly worse, and demonstrate MST's point:
 enqueue: 227 cycles(tsc)
 dequeue: 231 cycles(tsc)

Alf_queue also has a SPSC (Single-Producer/Single-Consumer) variant:
 enqueue: 24 cycles(tsc)
 dequeue: 23 cycles(tsc)

> > Going to 4 CPUs, things break down (but it was not primary use-case?):
> >  CPU(0) 927 cycles(tsc) enqueue
> >  CPU(1) 921 cycles(tsc) dequeue
> >  CPU(2) 927 cycles(tsc) enqueue
> >  CPU(3) 898 cycles(tsc) dequeue
>
> It's mostly the spinlock contention I guess.
> Maybe we don't need fair spinlocks in this case.
> Try replacing spinlocks with simple cmpxchg
> and see what happens?

The alf_queue uses a cmpxchg scheme, and it does scale better when the number of CPUs increases:

 CPUs:4 Average: 586 cycles(tsc)
 CPUs:6 Average: 744 cycles(tsc)
 CPUs:8 Average: 1578 cycles(tsc)

Notice the alf_queue was designed with the purpose of bulking, to mitigate the effect of this cacheline bouncing, but that was not covered in this test.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
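For readers following the cacheline argument, the sketch below shows one way a ring can avoid reading the other side's index: a NULL slot means "free", so the producer only looks at its own index plus the slot it is about to write, and the consumer does the same on its side. This is a single-threaded illustration under that assumption, with hypothetical names (`struct ring`, `ring_produce`, `ring_consume`); it is not the skb_array or alf_queue code, and it omits the locking and memory barriers a real multi-CPU queue needs.

```c
#include <stddef.h>

#define RING_SIZE 4

struct ring {
	void *slot[RING_SIZE];	/* NULL means the slot is free */
	unsigned int head;	/* only touched by the producer */
	unsigned int tail;	/* only touched by the consumer */
};

/* Returns 0 on success, -1 if the ring is full (or ptr is NULL). */
static int ring_produce(struct ring *r, void *ptr)
{
	/* Fullness is detected from the slot itself, not from r->tail. */
	if (!ptr || r->slot[r->head])
		return -1;
	r->slot[r->head] = ptr;
	r->head = (r->head + 1) % RING_SIZE;
	return 0;
}

/* Returns the oldest entry, or NULL if the ring is empty. */
static void *ring_consume(struct ring *r)
{
	void *ptr = r->slot[r->tail];

	/* Emptiness is detected from the slot itself, not from r->head. */
	if (ptr) {
		r->slot[r->tail] = NULL;
		r->tail = (r->tail + 1) % RING_SIZE;
	}
	return ptr;
}
```

In contrast, an index-comparing design has each side load the other side's index on every operation, which is the cross-CPU cacheline traffic discussed above.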
RE: [PATCH v2] r8152: Add support for setting MAC to system's Auxiliary MAC address
> -Original Message-
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Thursday, June 2, 2016 12:48 PM
> To: Limonciello, Mario
> Cc: hayesw...@realtek.com; LKML; Netdev; Linux USB;
> pali.ro...@gmail.com; anthony.w...@canonical.com
> Subject: Re: [PATCH v2] r8152: Add support for setting MAC to system's
> Auxiliary MAC address
>
> On Thu, Jun 02, 2016 at 11:58:07AM -0500, Mario Limonciello wrote:
> > Dell systems with Type-C ports have support for a persistent system
> > specific MAC address when used with Dell Type-C docks and dongles.
> > This means a dock plugged into two different systems will show different
> > (but persistent) MAC addresses. Dell Type-C docks and dongles use the
> > r8152 driver.
> >
> > This information for the system's persistent MAC address is burned in
> > when the HW is built and available under _SB\AMAC in the DSDT at runtime.
> >
> > More information about the technology is available here:
> > http://www.dell.com/support/article/us/en/04/SLN301147
> >
> > Signed-off-by: Mario Limonciello
> > ---
> >  drivers/net/usb/r8152.c | 53 +
> >  1 file changed, 53 insertions(+)
> >
> > diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> > index 3f9f6ed..6dea542 100644
> > --- a/drivers/net/usb/r8152.c
> > +++ b/drivers/net/usb/r8152.c
> > @@ -26,6 +26,8 @@
> >  #include
> >  #include
> >  #include
> > +#include
> > +#include
> >
> >  /* Information for net-next */
> >  #define NETNEXT_VERSION	"08"
> > @@ -500,6 +502,7 @@ enum rtl8152_flags {
> >  	SELECTIVE_SUSPEND,
> >  	PHY_RESET,
> >  	SCHEDULE_NAPI,
> > +	MAC_PASSTHRU = 0,
>
> Does setting that to 0 really work? You just did this for two enum
> values, what is the compiler supposed to do?

Very silly of me. I was rushing to send a v2. I'm surprised this worked. It shouldn't be assigned to anything.

> >  };
> >
> >  /* Define these values to match your device */
> > @@ -653,6 +656,7 @@ enum tx_csum_stat {
> >   */
> >  static const int multicast_filter_limit = 32;
> >  static unsigned int agg_buf_sz = 16384;
> > +static bool mac_passthru_active;
>
> very generic name for a platform-specific feature :(

Once this is broken up into an x86 platform provided method I'll rename this to platform_mac_active (or something similar).

> >
> >  #define RTL_LIMITED_TSO_SIZE	(agg_buf_sz - sizeof(struct tx_desc) - \
> >  				 VLAN_ETH_HLEN - VLAN_HLEN)
> > @@ -1030,6 +1034,49 @@ out1:
> >  	return ret;
> >  }
> >
> > +static int get_auxiliary_addr(struct r8152 *tp, struct sockaddr *sa)
>
> What about the platform mac address api that was pointed out?

I mentioned this in the cover letter - I haven't gotten a chance to move it over there yet. I sent v2 before doing so, so that you can see what I've been doing, as it was relevant to your other comments.

> > +{
> > +	acpi_status status;
> > +	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
> > +	union acpi_object *obj;
> > +	int ret = -1;
> > +	unsigned char buf[6];
> > +
> > +	if (!dmi_name_in_vendors("Dell Inc.") || mac_passthru_active)
> > +		return -1;
>
> Don't make up random error values, please use "real" ones.

OK.

> And you want to check this for all Dell devices? Please be model
> specific, I doubt a bunch of Dell servers wants to run this code...
>

Tracking specific models is really going to turn into a giant never-ending list.
To drill down more specifically, I can match on chassis too.

> > +
> > +	/* returns _AUXMAC_#AABBCCDDEEFF# */
> > +	status = acpi_evaluate_object(NULL, "\\_SB.AMAC", NULL, &buffer);
> > +	obj = (union acpi_object *)buffer.pointer;
> > +	if (ACPI_SUCCESS(status)) {
> > +		if (obj->type != ACPI_TYPE_BUFFER ||
> > +		    obj->string.length != 0x17) {
> > +			pr_warn("r8152: get_auxiliary_addr: Invalid buffer");
> > +			goto amacout;
> > +		}
> > +		if (strncmp(obj->string.pointer, "_AUXMAC_#", 9) != 0) {
> > +			pr_warn("r8152: get_auxiliary_addr: Invalid header");
> > +			goto amacout;
> > +		}
> > +		ret = hex2bin(buf, obj->string.pointer + 9, 6);
> > +		if (ret < 0) {
> > +			pr_warn("r8152: get_auxiliary_addr: Invalid MAC");
> > +			goto amacout;
> > +		}
> > +		memcpy(sa->sa_data, buf, 6);
> > +		ether_addr_copy(tp->netdev->dev_addr, sa->sa_data);
> > +		netdev_info(tp->netdev, "Using system MAC address
> %pM\n",
> > +			    sa->sa_data);
> > +		set_bit(MAC_PASSTHRU, &tp->flags);
> > +		mac_passthru_active = true;
> > +		ret = 1;
>
> 1 is not a "all is good" return value.

OK will switch
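As a standalone illustration of the string handling under review, the following userspace sketch parses the `_AUXMAC_#AABBCCDDEEFF#` format named in the patch comment and follows the two review points about return values: 0 on success and a real errno value on failure. The helper names `parse_auxmac()` and `hex_nibble()` are made up for this example, and the ACPI lookup itself is not modeled.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define AUXMAC_PREFIX	"_AUXMAC_#"
#define AUXMAC_STRLEN	22	/* 9-byte prefix + 12 hex digits + '#' */

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = (char)tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Returns 0 and fills mac[] on success, -EINVAL on malformed input. */
static int parse_auxmac(const char *s, uint8_t mac[6])
{
	const size_t plen = strlen(AUXMAC_PREFIX);
	int i;

	if (strlen(s) != AUXMAC_STRLEN ||
	    strncmp(s, AUXMAC_PREFIX, plen) != 0 ||
	    s[AUXMAC_STRLEN - 1] != '#')
		return -EINVAL;

	for (i = 0; i < 6; i++) {
		int hi = hex_nibble(s[plen + 2 * i]);
		int lo = hex_nibble(s[plen + 2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -EINVAL;
		mac[i] = (uint8_t)((hi << 4) | lo);
	}
	return 0;
}
```

The length of 22 excludes the terminating NUL; the 0x17 (23) checked in the patch would be consistent with a buffer that includes it, though that is an inference from the comment rather than something stated in the thread.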
RE: [PATCH] r8152: Add support for setting MAC to system's Auxiliary MAC address
> -Original Message-
> From: Bjørn Mork [mailto:bj...@mork.no]
> Sent: Thursday, June 2, 2016 1:04 PM
> To: Limonciello, Mario
> Cc: gre...@linuxfoundation.org; hayesw...@realtek.com;
> linux-ker...@vger.kernel.org; netdev@vger.kernel.org;
> linux-u...@vger.kernel.org; pali.ro...@gmail.com; anthony.w...@canonical.com
> Subject: Re: [PATCH] r8152: Add support for setting MAC to system's
> Auxiliary MAC address
>
> writes:
>
> >> > 2) Track whether this is the first or second USB NIC plugged in. Only
> >> > offer it on the first NIC detected by r8152. When the second NIC is
> >> > plugged in don't match from ACPI.
> >> > There would be a question of what to do if the first NIC is removed and
> >> > added back if it should get the persistent system MAC or not.
> >> > I'd say yes, just make sure that only one NIC can have it at a time.
> >>
> >> You are going to get things very complex very quickly if you try to do
> >> this.
> >
> > It's really not that hard, track a module wide static variable whether
> > the feature is in use. Track in each device whether the feature was
> > in use. If it in use, don't assign the next device plugged in via the
> > ACPI string. If a device is removed that has the feature activated,
> > change the module wide static variable.
>
> Having the mac address jump around in an arbitrary way like this is
> going to confuse the hell out of your users. Consider what happens if
> the user docks a laptop with an r8152 usb dongle already plugged in...
> How are you going to explain that the dock gets some other mac address
> in this case? How are you going to explain the difference between using
> an r8152 based dongle and some other ethernet usb dongle with your
> systems?

Yeah I understand the concern. I agree that would be very confusing to a user. This does need to match only on Dell docks then.

>
> Make it behave consistently if you're going to add this. Which can be
> done by specifically matching the Dell dock (doesn't it have an unique
> Dell device ID?) and ignoring any other r8152 device. You could also
> choose to set the same mac for all r8152 devices. Which is fine, but
> will probably confuse many users.

Unfortunately there is no Dell specific VID/PID. I checked a no-name dongle that used r8152 and it was the same (0bda:8153). Maybe Hayes Wang can check with his Windows driver colleagues if there was anything else to key off when this was implemented on the Windows Realtek driver. If there is something else to key off of, I'm not aware what it is. I'll check with some of my colleagues too.

I do have a way to query if a dock is plugged in via SMM, but I doubt that's what Realtek is using on the Windows side. I'd leave that as a second to last resort (last resort being move back to userspace again).

>
> What you definitely should not do is to change the mac for some
> arbitrary "first" device. Then you are better off with the userspace
> proposal where you and your users have some chance to implement a
> sensible policy based on e.g. usb port numbers.

OK, if I can't come up with a way to key on the device being a Dell dock I'll scrap this entirely kernel approach.
Re: [PATCH -next 2/2] virtio_net: Read the advised MTU
On 06/02/2016 10:06 AM, Aaron Conole wrote:
> Rick Jones writes:
>> One of the things I've been doing has been setting-up a cluster
>> (OpenStack) with JumboFrames, and then setting MTUs on instance vNICs
>> by hand to measure different MTU sizes. It would be a shame if such a
>> thing were not possible in the future. Keeping a warning if shrinking
>> the MTU would be good, leave the error (perhaps) to if an attempt is
>> made to go beyond the advised value.
>
> This was cut because it didn't make sense for such a warning to
> be issued, but it seems like perhaps you may want such a feature? I
> agree with Michael, after thinking about it, that I don't know what sort
> of use the warning would serve. After all, if you're changing the MTU,
> you must have wanted such a change to occur?

I don't need a warning, was simply willing to live with one when shrinking the MTU. Didn't want an error.

happy benchmarking,

rick jones
[PATCH v3 6/7] sctp: Add GSO support
SCTP has this peculiarity that its packets cannot be just segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big skb, set gso_size to the fragmentation point and deliver it to IP layer.

This patch takes a different approach. SCTP will now build a skb as it would be if it was received using GRO. That is, there will be a cover skb with protocol headers and children ones containing the actual segments, already segmented in a way that respects SCTP RFCs.

With that, we can tell skb_segment() to just split based on frag_list, trusting its sizes are already in accordance.

This way SCTP can benefit from GSO and instead of passing several packets through the stack, it can pass a single large packet.

v2:
- Added support for receiving GSO frames, as requested by Dave Miller.
- Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
- Added heuristics similar to what we have in TCP for not generating
  single GSO packets that fill cwnd.
v3:
- consider sctphdr size in skb_gso_transport_seglen()
- rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for unsupported GSO")

Signed-off-by: Marcelo Ricardo Leitner
Tested-by: Xin Long
---
 include/linux/netdev_features.h | 7 +-
 include/linux/netdevice.h | 1 +
 include/linux/skbuff.h | 2 +
 include/net/sctp/sctp.h | 4 +
 include/net/sctp/structs.h | 5 +
 net/core/ethtool.c | 1 +
 net/core/skbuff.c | 3 +
 net/sctp/Makefile | 3 +-
 net/sctp/input.c | 12 +-
 net/sctp/inqueue.c | 51 +-
 net/sctp/offload.c | 98 +++
 net/sctp/output.c | 363 +++-
 net/sctp/protocol.c | 3 +
 net/sctp/socket.c | 2 +
 14 files changed, 429 insertions(+), 126 deletions(-)
 create mode 100644 net/sctp/offload.c

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index aa7b2400f98c584d29e83f0eddf7bf13766cedd1..9c6c8ef2e9e704513cc4272b0a3ee2fec6809d46 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -53,8 +53,9 @@ enum {
 					 * headers in software.
 					 */
 	NETIF_F_GSO_TUNNEL_REMCSUM_BIT, /* ... TUNNEL with TSO & REMCSUM */
+	NETIF_F_GSO_SCTP_BIT,		/* ... SCTP fragmentation */
 	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
-		NETIF_F_GSO_TUNNEL_REMCSUM_BIT,
+		NETIF_F_GSO_SCTP_BIT,
 
 	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
 	NETIF_F_SCTP_CRC_BIT,		/* SCTP checksum offload */
@@ -128,6 +129,7 @@ enum {
 #define NETIF_F_TSO_MANGLEID	__NETIF_F(TSO_MANGLEID)
 #define NETIF_F_GSO_PARTIAL	 __NETIF_F(GSO_PARTIAL)
 #define NETIF_F_GSO_TUNNEL_REMCSUM __NETIF_F(GSO_TUNNEL_REMCSUM)
+#define NETIF_F_GSO_SCTP	__NETIF_F(GSO_SCTP)
 #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
 #define NETIF_F_HW_VLAN_STAG_RX	__NETIF_F(HW_VLAN_STAG_RX)
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
@@ -166,7 +168,8 @@ enum {
 				 NETIF_F_FSO)
 
 /* List of features with software fallbacks. */
-#define NETIF_F_GSO_SOFTWARE	(NETIF_F_ALL_TSO | NETIF_F_UFO)
+#define NETIF_F_GSO_SOFTWARE	(NETIF_F_ALL_TSO | NETIF_F_UFO | \
+				 NETIF_F_GSO_SCTP)
 
 /*
  * If one device supports one of these features, then enable them
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929ce815725d868261e9a2585ac53d0c8f128..fa6df2699532e4ad6deb37f1bdcfafc71d2580cb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4012,6 +4012,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_PARTIAL != (NETIF_F_GSO_PARTIAL >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != (NETIF_F_GSO_TUNNEL_REMCSUM >> NETIF_F_GSO_SHIFT));
+	BUILD_BUG_ON(SKB_GSO_SCTP    != (NETIF_F_GSO_SCTP >> NETIF_F_GSO_SHIFT));
 
 	return (features & feature) == feature;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index aa3f9d7e8d5ca455387efa22d4a9d3a079a56f0c..dc0fca747c5e1c5b23b1e52ce3e354667eb2a994 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -487,6 +487,8 @@ enum {
 	SKB_GSO_PARTIAL = 1 << 13,
 
 	SKB_GSO_TUNNEL_REMCSUM = 1 << 14,
+
+	SKB_GSO_SCTP = 1 << 15,
 };
 
 #if BITS_PER_LONG > 32
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index b392ac8382f2bf0be118f797acc0eb4ddeb5..632e205ca54bfe85124753e09445251056e19aa7 100644
--- a/include/net/sctp/sctp.h
+++
[PATCH v3 5/7] sctp: delay as much as possible skb_linearize
This patch is a preparation for the GSO one. In order to successfully handle GSO packets on rx path we must not call skb_linearize, otherwise it defeats any gain GSO may have had.

This patch thus delays as much as possible the call to skb_linearize, leaving it to sctp_inq_pop() moment. For that the sanity checks performed now know how to deal with fragments.

One positive side-effect of this is that if the socket is backlogged it will have the chance of doing it on backlog processing instead of during softirq.

With this move, it's evident that a check for non-linearity in sctp_inq_pop was ineffective and is now removed. Note that a similar check is performed a bit below this one.

Signed-off-by: Marcelo Ricardo Leitner
Tested-by: Xin Long
---
 net/sctp/input.c | 45 +
 net/sctp/inqueue.c | 29 ++---
 2 files changed, 43 insertions(+), 31 deletions(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index a701527a9480faff1b8d91257e1dbf3c0f09ed68..5cff2546c3dd6d3823b5a28bac1e72880cd57756 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -112,7 +112,6 @@ int sctp_rcv(struct sk_buff *skb)
 	struct sctp_ep_common *rcvr;
 	struct sctp_transport *transport = NULL;
 	struct sctp_chunk *chunk;
-	struct sctphdr *sh;
 	union sctp_addr src;
 	union sctp_addr dest;
 	int family;
@@ -124,15 +123,18 @@ int sctp_rcv(struct sk_buff *skb)
 
 	__SCTP_INC_STATS(net, SCTP_MIB_INSCTPPACKS);
 
-	if (skb_linearize(skb))
+	/* If packet is too small to contain a single chunk, let's not
+	 * waste time on it anymore.
+	 */
+	if (skb->len < sizeof(struct sctphdr) + sizeof(struct sctp_chunkhdr) +
+	       skb_transport_offset(skb))
 		goto discard_it;
 
-	sh = sctp_hdr(skb);
+	if (!pskb_may_pull(skb, sizeof(struct sctphdr)))
+		goto discard_it;
 
-	/* Pull up the IP and SCTP headers. */
+	/* Pull up the IP header. */
 	__skb_pull(skb, skb_transport_offset(skb));
-	if (skb->len < sizeof(struct sctphdr))
-		goto discard_it;
 
 	skb->csum_valid = 0; /* Previous value not applicable */
 	if (skb_csum_unnecessary(skb))
@@ -141,11 +143,7 @@ int sctp_rcv(struct sk_buff *skb)
 		goto discard_it;
 	skb->csum_valid = 1;
 
-	skb_pull(skb, sizeof(struct sctphdr));
-
-	/* Make sure we at least have chunk headers worth of data left. */
-	if (skb->len < sizeof(struct sctp_chunkhdr))
-		goto discard_it;
+	__skb_pull(skb, sizeof(struct sctphdr));
 
 	family = ipver2af(ip_hdr(skb)->version);
 	af = sctp_get_af_specific(family);
@@ -230,7 +228,7 @@ int sctp_rcv(struct sk_buff *skb)
 	chunk->rcvr = rcvr;
 
 	/* Remember the SCTP header. */
-	chunk->sctp_hdr = sh;
+	chunk->sctp_hdr = sctp_hdr(skb);
 
 	/* Set the source and destination addresses of the incoming chunk. */
 	sctp_init_addrs(chunk, &src, &dest);
@@ -660,19 +658,23 @@ out_unlock:
  */
 static int sctp_rcv_ootb(struct sk_buff *skb)
 {
-	sctp_chunkhdr_t *ch;
-	__u8 *ch_end;
-
-	ch = (sctp_chunkhdr_t *) skb->data;
+	sctp_chunkhdr_t *ch, _ch;
+	int ch_end, offset = 0;
 
 	/* Scan through all the chunks in the packet. */
 	do {
+		/* Make sure we have at least the header there */
+		if (offset + sizeof(sctp_chunkhdr_t) > skb->len)
+			break;
+
+		ch = skb_header_pointer(skb, offset, sizeof(*ch), &_ch);
+
 		/* Break out if chunk length is less then minimal. */
 		if (ntohs(ch->length) < sizeof(sctp_chunkhdr_t))
 			break;
 
-		ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
-		if (ch_end > skb_tail_pointer(skb))
+		ch_end = offset + WORD_ROUND(ntohs(ch->length));
+		if (ch_end > skb->len)
 			break;
 
 		/* RFC 8.4, 2) If the OOTB packet contains an ABORT chunk, the
@@ -697,8 +699,8 @@ static int sctp_rcv_ootb(struct sk_buff *skb)
 		if (SCTP_CID_INIT == ch->type && (void *)ch != skb->data)
 			goto discard;
 
-		ch = (sctp_chunkhdr_t *) ch_end;
-	} while (ch_end < skb_tail_pointer(skb));
+		offset = ch_end;
+	} while (ch_end < skb->len);
 
 	return 0;
 
@@ -1173,6 +1175,9 @@ static struct sctp_association *__sctp_rcv_lookup_harder(struct net *net,
 {
 	sctp_chunkhdr_t *ch;
 
+	if (skb_linearize(skb))
+		return NULL;
+
 	ch = (sctp_chunkhdr_t *) skb->data;
 
 	/* The code below will attempt to walk the chunk and extract
diff --git a/net/sctp/inqueue.c b/net/sctp/inqueue.c
index
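The offset-based walk in sctp_rcv_ootb() can be tried out in userspace with a flat byte buffer standing in for the skb. The sketch below is a simplified model, not the kernel function: `count_chunks()` and `word_round()` are hypothetical names, and it assumes the usual SCTP chunk header layout of one type byte, one flags byte and a 16-bit big-endian length that includes the header, with chunks padded to a 4-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_HDR_LEN 4u	/* type (1) + flags (1) + length (2) */

/* Round a chunk length up to the next multiple of 4. */
static size_t word_round(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/* Count the well-formed chunks in buf, stopping at the first chunk whose
 * header or padded body would run past len. Bounds are checked using
 * offsets only, never by comparing pointers. */
static size_t count_chunks(const uint8_t *buf, size_t len)
{
	size_t offset = 0, n = 0;

	while (offset + CHUNK_HDR_LEN <= len) {
		/* Length field is in network byte order. */
		size_t clen = ((size_t)buf[offset + 2] << 8) | buf[offset + 3];
		size_t end;

		if (clen < CHUNK_HDR_LEN)
			break;
		end = offset + word_round(clen);
		if (end > len)
			break;
		n++;
		offset = end;
	}
	return n;
}
```

Working with offsets rather than raw pointers is what lets the kernel version operate on a non-linear skb, where the bytes of one chunk header may not be contiguous in memory.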
[PATCH v3 7/7] sctp: improve debug message to also log curr pkt and new chunk size
This is useful for debugging packet sizes.

Signed-off-by: Marcelo Ricardo Leitner
Tested-by: Xin Long
---
 net/sctp/output.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 60499a69179d255c47da1fa19b73147917a050bf..90d2e125c2f5e0e1ecb33a7eab10772e5b39567c 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -182,7 +182,8 @@ sctp_xmit_t sctp_packet_transmit_chunk(struct sctp_packet *packet,
 	sctp_xmit_t retval;
 	int error = 0;
 
-	pr_debug("%s: packet:%p chunk:%p\n", __func__, packet, chunk);
+	pr_debug("%s: packet:%p size:%lu chunk:%p size:%d\n", __func__,
+		 packet, packet->size, chunk, chunk->skb ? chunk->skb->len : -1);
 
 	switch ((retval = (sctp_packet_append_chunk(packet, chunk)))) {
 	case SCTP_XMIT_PMTU_FULL:
-- 
2.5.5
[PATCH v3 3/7] sk_buff: allow segmenting based on frag sizes
This patch allows segmenting a skb based on its frags sizes instead of based on a fixed value.

Signed-off-by: Marcelo Ricardo Leitner
Tested-by: Xin Long
---
 include/linux/skbuff.h | 5 +
 net/core/skbuff.c | 10 +++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ee38a41274759f279be1c0752a7fab63fac517c8..329a0a9ef67115cae03b7c1304de031116384148 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -301,6 +301,11 @@ struct sk_buff;
 #endif
 extern int sysctl_max_skb_frags;
 
+/* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
+ * segment using its current segmentation instead.
+ */
+#define GSO_BY_FRAGS	0x
+
 typedef struct skb_frag_struct skb_frag_t;
 
 struct skb_frag_struct {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4724bcf9b0cae1cecbe5bc2c04e308bb70b3232a..97c32c75e704af1f31b064e8f1e0475ff1505d67 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3116,9 +3116,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		int hsize;
 		int size;
 
-		len = head_skb->len - offset;
-		if (len > mss)
-			len = mss;
+		if (unlikely(mss == GSO_BY_FRAGS)) {
+			len = list_skb->len;
+		} else {
+			len = head_skb->len - offset;
+			if (len > mss)
+				len = mss;
+		}
 
 		hsize = skb_headlen(head_skb) - offset;
 		if (hsize < 0)
-- 
2.5.5
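The length selection changed in skb_segment() can be modeled in a few lines of plain C. This is only an illustration of the branch logic: `next_seg_len()` is a made-up helper, and `SKETCH_GSO_BY_FRAGS` is a placeholder sentinel chosen for this example because the macro's value is cut off in the archived patch.

```c
/* Placeholder sentinel for this sketch only; the value of the real
 * GSO_BY_FRAGS macro is truncated in the archived patch. */
#define SKETCH_GSO_BY_FRAGS 0xFFFFu

/* Pick the payload length of the next segment.
 *   mss       - gso_size, or the sentinel to request per-frag sizing
 *   remaining - bytes of the head skb not yet segmented
 *   frag_len  - length of the current frag_list member
 */
static unsigned int next_seg_len(unsigned int mss, unsigned int remaining,
				 unsigned int frag_len)
{
	if (mss == SKETCH_GSO_BY_FRAGS)
		return frag_len;	/* trust the pre-built segmentation */

	return remaining > mss ? mss : remaining;
}
```

With a fixed mss every segment except possibly the last has the same size; with the sentinel each segment simply inherits the size of the frag it was built from, which is what lets SCTP keep its chunks intact.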
[PATCH v3 2/7] skbuff: export skb_gro_receive
sctp GSO requires it and sctp can be compiled as a module, so we need to export this function.

Signed-off-by: Marcelo Ricardo Leitner
Tested-by: Xin Long
---
 net/core/skbuff.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f2b77e549c03a771909cd9c87c40ec2b7826cd31..4724bcf9b0cae1cecbe5bc2c04e308bb70b3232a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3438,6 +3438,7 @@ done:
 	NAPI_GRO_CB(skb)->same_flow = 1;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(skb_gro_receive);
 
 void __init skb_init(void)
 {
-- 
2.5.5
[PATCH v3 4/7] skbuff: introduce skb_gso_validate_mtu
skb_gso_network_seglen is not enough for checking fragment sizes if skb is using GSO_BY_FRAGS as we have to check frag per frag.

This patch introduces skb_gso_validate_mtu, based on the former, which will wrap the use case inside it as all calls to skb_gso_network_seglen were to validate if it fits on a given MTU, and improve the check.

Signed-off-by: Marcelo Ricardo Leitner
Tested-by: Xin Long
---
 include/linux/skbuff.h | 1 +
 net/core/skbuff.c | 31 +++
 net/ipv4/ip_forward.c | 2 +-
 net/ipv4/ip_output.c | 2 +-
 net/ipv6/ip6_output.c | 2 +-
 net/mpls/af_mpls.c | 2 +-
 6 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 329a0a9ef67115cae03b7c1304de031116384148..aa3f9d7e8d5ca455387efa22d4a9d3a079a56f0c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2992,6 +2992,7 @@ void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 unsigned int skb_gso_transport_seglen(const struct sk_buff *skb);
+bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 97c32c75e704af1f31b064e8f1e0475ff1505d67..5ca562b56ec39d39e1225d96547e242732518ffe 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4392,6 +4392,37 @@ unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
 
+/**
+ * skb_gso_validate_mtu - Return in case such skb fits a given MTU
+ *
+ * @skb: GSO skb
+ *
+ * skb_gso_validate_mtu validates if a given skb will fit a wanted MTU
+ * once split.
+ */
+bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+	const struct skb_shared_info *shinfo = skb_shinfo(skb);
+	const struct sk_buff *iter;
+	unsigned int hlen;
+
+	hlen = skb_gso_network_seglen(skb);
+
+	if (shinfo->gso_size != GSO_BY_FRAGS)
+		return hlen <= mtu;
+
+	/* Undo this so we can re-use header sizes */
+	hlen -= GSO_BY_FRAGS;
+
+	skb_walk_frags(skb, iter) {
+		if (hlen + skb_headlen(iter) > mtu)
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(skb_gso_validate_mtu);
+
 static struct sk_buff *skb_reorder_vlan_header(struct sk_buff *skb)
 {
 	if (skb_cow(skb, skb_headroom(skb)) < 0) {
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index cbfb1808fcc490b94dc0bbdab6142acb8fa37815..9f0a7b96646f368021d9cd51bc3f728ba49eed0d 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -54,7 +54,7 @@ static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
 	if (skb->ignore_df)
 		return false;
 
-	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+	if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
 		return false;
 
 	return true;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 124bf0a663283502deb03397343160d493a378b1..cbac493c913ac37b57a97314f9e7099b14b8246c 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -225,7 +225,7 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 
 	/* common case: locally created skb or seglen is <= mtu */
 	if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
-	      skb_gso_network_seglen(skb) <= mtu)
+	      skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
 	/* Slowpath - GSO segment length is exceeding the dst MTU.
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index cbf127ae7c676650cc626cbf12cd61b6b570ea43..6b2f60a5c1de3063bb65c07b2b77c13f33890af8 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -368,7 +368,7 @@ static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 	if (skb->ignore_df)
 		return false;
 
-	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+	if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
 		return false;
 
 	return true;
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0b80a7140cc494d8c39bd3efba2423272d1b8844..7a4aa3450dd71039e73516bd711ba7392493eb5e 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -91,7 +91,7 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 	if (skb->len <= mtu)
 		return false;
 
-	if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
+	if (skb_is_gso(skb) && skb_gso_validate_mtu(skb,
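The per-frag branch of skb_gso_validate_mtu() boils down to "header bytes plus each segment's payload must fit the MTU". A small userspace model of that check follows; `frags_fit_mtu()` is a hypothetical helper that takes the header length and the segment lengths as plain numbers instead of walking an skb's frag_list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if every pre-built segment, once its headers are added,
 * fits within mtu. hdr_len models the network + transport header bytes
 * that get replicated in front of each segment. */
static bool frags_fit_mtu(unsigned int hdr_len, const unsigned int *frag_len,
			  size_t nfrags, unsigned int mtu)
{
	size_t i;

	for (i = 0; i < nfrags; i++) {
		if (hdr_len + frag_len[i] > mtu)
			return false;
	}
	return true;
}
```

A single oversized segment is enough to fail the whole skb, which is why checking one nominal gso_size is not sufficient when segments have different lengths.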
[PATCH v3 0/7] sctp: Add GSO support
This patchset adds sctp GSO support.

Performance tests indicate that it increases throughput by 10% when using
bigger chunk sizes, especially when they are bigger than the MTU. For small
chunks, it doesn't help much unless heavy firewall rules are in use. For
small chunks it will probably be of more use once we get something like
MSG_MORE, as David Laight had suggested.

overall changes:
v1->v2:
  Added support for receiving GSO frames on SCTP stack, as requested by
  Dave Miller.
v2->v3:
  Consider sctphdr size in skb_gso_transport_seglen()
  rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for
  unsupported GSO")

Marcelo Ricardo Leitner (7):
  loopback: make use of NETIF_F_GSO_SOFTWARE
  skbuff: export skb_gro_receive
  sk_buff: allow segmenting based on frag sizes
  skbuff: introduce skb_gso_validate_mtu
  sctp: delay as much as possible skb_linearize
  sctp: Add GSO support
  sctp: improve debug message to also log curr pkt and new chunk size

 drivers/net/loopback.c          |   5 +-
 include/linux/netdev_features.h |   7 +-
 include/linux/netdevice.h       |   1 +
 include/linux/skbuff.h          |   8 +
 include/net/sctp/sctp.h         |   4 +
 include/net/sctp/structs.h      |   5 +
 net/core/ethtool.c              |   1 +
 net/core/skbuff.c               |  45 -
 net/ipv4/ip_forward.c           |   2 +-
 net/ipv4/ip_output.c            |   2 +-
 net/ipv6/ip6_output.c           |   2 +-
 net/mpls/af_mpls.c              |   2 +-
 net/sctp/Makefile               |   3 +-
 net/sctp/input.c                |  57 ---
 net/sctp/inqueue.c              |  78 +++--
 net/sctp/offload.c              |  98 +++
 net/sctp/output.c               | 366 +++-
 net/sctp/protocol.c             |   3 +
 net/sctp/socket.c               |   2 +
 19 files changed, 524 insertions(+), 167 deletions(-)
 create mode 100644 net/sctp/offload.c

--
2.5.5
[PATCH v3 1/7] loopback: make use of NETIF_F_GSO_SOFTWARE
NETIF_F_GSO_SOFTWARE was defined to list all GSO software types, so let's
make use of it in loopback code. Note that veth/vxlan/others already use
it.

Within this patch series, this patch causes lo to pick up the SCTP GSO
feature automatically (as it's added to NETIF_F_GSO_SOFTWARE), thus
avoiding segmentation if possible.

Signed-off-by: Marcelo Ricardo Leitner
Tested-by: Xin Long
---
 drivers/net/loopback.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index a400288cb37b9bfb6190f1bd7c64d02e97713956..6255973e3dda35fd41464ce51f0f9fb9f0b8364b 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -169,10 +169,9 @@ static void loopback_setup(struct net_device *dev)
 	dev->flags = IFF_LOOPBACK;
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_QUEUE;
 	netif_keep_dst(dev);
-	dev->hw_features = NETIF_F_ALL_TSO | NETIF_F_UFO;
+	dev->hw_features = NETIF_F_GSO_SOFTWARE;
 	dev->features = NETIF_F_SG | NETIF_F_FRAGLIST
-		| NETIF_F_ALL_TSO
-		| NETIF_F_UFO
+		| NETIF_F_GSO_SOFTWARE
 		| NETIF_F_HW_CSUM
 		| NETIF_F_RXCSUM
 		| NETIF_F_SCTP_CRC
--
2.5.5
Re: [PATCH] r8152: Add support for setting MAC to system's Auxiliary MAC address
writes:

>> > 2) Track whether this is the first or second USB NIC plugged in. Only
>> > offer it on the first NIC detected by r8152. When the second NIC is
>> > plugged in don't match from ACPI.
>> > There would be a question of what to do if the first NIC is removed and
>> > added back if it should get the persistent system MAC or not.
>> > I'd say yes, just make sure that only one NIC can have it at a time.
>>
>> You are going to get things very complex very quickly if you try to do this.
>
> It's really not that hard, track a module wide static variable whether
> the feature is in use. Track in each device whether the feature was
> in use. If it in use, don't assign the next device plugged in via the
> ACPI string. If a device is removed that has the feature activated,
> change the module wide static variable.

Having the mac address jump around in an arbitrary way like this is
going to confuse the hell out of your users. Consider what happens if
the user docks a laptop with an r8152 usb dongle already plugged in...
How are you going to explain that the dock gets some other mac address
in this case? How are you going to explain the difference between using
an r8152 based dongle and some other ethernet usb dongle with your
systems?

Make it behave consistently if you're going to add this. Which can be
done by specifically matching the Dell dock (doesn't it have a unique
Dell device ID?) and ignoring any other r8152 device.

You could also choose to set the same mac for all r8152 devices. Which
is fine, but will probably confuse many users.

What you definitely should not do is to change the mac for some
arbitrary "first" device. Then you are better off with the userspace
proposal where you and your users have some chance to implement a
sensible policy based on e.g. usb port numbers.

Bjørn
Re: [PATCH -next 2/2] virtio_net: Read the advised MTU
kbuild test robot <l...@intel.com> writes: > Hi, > > [auto build test ERROR on next-20160602] > > url: > https://github.com/0day-ci/linux/commits/Aaron-Conole/virtio-net-Advised-MTU-feature/20160603-000714 > config: i386-allmodconfig (attached as .config) > compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430 > reproduce: > # save the attached .config to linux build tree > make ARCH=i386 > > Note: the > linux-review/Aaron-Conole/virtio-net-Advised-MTU-feature/20160603-000714 HEAD > d909da4df3c52f78b4f5fcccd89aea5e38722d10 builds fine. > It only hurts bisectibility. > > All errors (new ones prefixed by >>): > >drivers/net/virtio_net.c: In function 'virtnet_probe': >>> drivers/net/virtio_net.c:1899:31: error: 'VIRTIO_NET_F_MTU' >>> undeclared (first use in this function) > if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { > ^~~~ >drivers/net/virtio_net.c:1899:31: note: each undeclared identifier is > reported only once for each function it appears in >drivers/net/virtio_net.c: At top level: >>> drivers/net/virtio_net.c:2076:2: error: 'VIRTIO_NET_F_MTU' >>> undeclared here (not in a function) > VIRTIO_NET_F_MTU, > ^~~~ Oops, hunk was dropped during rebase. Sorry for this, v2 will fix this error, as well and I'll do a boot test before submission. Thanks kbuild robot! 
> vim +/VIRTIO_NET_F_MTU +1899 drivers/net/virtio_net.c > > 1893virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) > 1894vi->any_header_sg = true; > 1895 > 1896if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) > 1897vi->has_cvq = true; > 1898 >> 1899 if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { > 1900dev->mtu = virtio_cread16(vdev, > 1901 offsetof(struct virtio_net_config, > 1902 mtu)); > 1903} > 1904 > 1905if (vi->any_header_sg) > 1906dev->needed_headroom = vi->hdr_len; > 1907 > 1908/* Use single tx/rx queue pair as default */ > 1909vi->curr_queue_pairs = 1; > 1910vi->max_queue_pairs = max_queue_pairs; > 1911 > 1912/* Allocate/initialize the rx/tx queues, and invoke > find_vqs */ > 1913err = init_vqs(vi); > 1914if (err) > 1915goto free_stats; > 1916 > 1917#ifdef CONFIG_SYSFS > 1918if (vi->mergeable_rx_bufs) > 1919dev->sysfs_rx_queue_group = > _net_mrg_rx_group; > 1920#endif > 1921netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs); > 1922netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs); > 1923 > 1924virtnet_init_settings(dev); > 1925 > 1926err = register_netdev(dev); > 1927if (err) { > 1928pr_debug("virtio_net: registering device > failed\n"); > 1929goto free_vqs; > 1930} > 1931 > 1932virtio_device_ready(vdev); > 1933 > 1934vi->nb.notifier_call = _cpu_callback; > 1935err = register_hotcpu_notifier(>nb); > 1936if (err) { > 1937pr_debug("virtio_net: registering cpu notifier > failed\n"); > 1938goto free_unregister_netdev; > 1939} > 1940 > 1941/* Assume link up if device can't report link status, > 1942 otherwise get link status from config. */ > 1943if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) { > 1944netif_carrier_off(dev); > 1945schedule_work(>config_work); > 1946} else { > 1947vi->status = VIRTIO_NET_S_LINK_UP; > 1948netif_carrier_on(dev); > 1949} > 1950 > 1951pr_debug("virtnet: registered device %s with %d RX and > TX vq's\n", > 1952 dev->name, max_queue_pairs); > 1953 > 1954return 0; > 1955 > 1956free_unregister_netdev: > 1957
Re: [PATCH v2] r8152: Add support for setting MAC to system's Auxiliary MAC address
On Thu, Jun 02, 2016 at 11:58:07AM -0500, Mario Limonciello wrote:
> Dell systems with Type-C ports have support for a persistent system
> specific MAC address when used with Dell Type-C docks and dongles.
> This means a dock plugged into two different systems will show different
> (but persistent) MAC addresses. Dell Type-C docks and dongles use the
> r8152 driver.
>
> This information for the system's persistent MAC address is burned in when
> the HW is built and available under _SB\AMAC in the DSDT at runtime.
>
> More information about the technology is available here:
> http://www.dell.com/support/article/us/en/04/SLN301147
>
> Signed-off-by: Mario Limonciello
> ---
>  drivers/net/usb/r8152.c | 53 +
>  1 file changed, 53 insertions(+)
>
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 3f9f6ed..6dea542 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -26,6 +26,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  /* Information for net-next */
>  #define NETNEXT_VERSION "08"
> @@ -500,6 +502,7 @@ enum rtl8152_flags {
>  	SELECTIVE_SUSPEND,
>  	PHY_RESET,
>  	SCHEDULE_NAPI,
> +	MAC_PASSTHRU = 0,

Does setting that to 0 really work? You just did this for two enum
values, what is the compiler supposed to do?

>  };
>
>  /* Define these values to match your device */
> @@ -653,6 +656,7 @@ enum tx_csum_stat {
>   */
>  static const int multicast_filter_limit = 32;
>  static unsigned int agg_buf_sz = 16384;
> +static bool mac_passthru_active;

very generic name for a platform-specific feature :(

>
>  #define RTL_LIMITED_TSO_SIZE (agg_buf_sz - sizeof(struct tx_desc) - \
>  	VLAN_ETH_HLEN - VLAN_HLEN)
> @@ -1030,6 +1034,49 @@ out1:
>  	return ret;
>  }
>
> +static int get_auxiliary_addr(struct r8152 *tp, struct sockaddr *sa)

What about the platform mac address api that was pointed out?

> +{
> +	acpi_status status;
> +	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
> +	union acpi_object *obj;
> +	int ret = -1;
> +	unsigned char buf[6];
> +
> +	if (!dmi_name_in_vendors("Dell Inc.") || mac_passthru_active)
> +		return -1;

Don't make up random error values, please use "real" ones.

And you want to check this for all Dell devices? Please be model
specific, I doubt a bunch of Dell servers wants to run this code...

> +
> +	/* returns _AUXMAC_#AABBCCDDEEFF# */
> +	status = acpi_evaluate_object(NULL, "\\_SB.AMAC", NULL, );
> +	obj = (union acpi_object *)buffer.pointer;
> +	if (ACPI_SUCCESS(status)) {
> +		if (obj->type != ACPI_TYPE_BUFFER ||
> +		    obj->string.length != 0x17) {
> +			pr_warn("r8152: get_auxiliary_addr: Invalid buffer");
> +			goto amacout;
> +		}
> +		if (strncmp(obj->string.pointer, "_AUXMAC_#", 9) != 0) {
> +			pr_warn("r8152: get_auxiliary_addr: Invalid header");
> +			goto amacout;
> +		}
> +		ret = hex2bin(buf, obj->string.pointer + 9, 6);
> +		if (ret < 0) {
> +			pr_warn("r8152: get_auxiliary_addr: Invalid MAC");
> +			goto amacout;
> +		}
> +		memcpy(sa->sa_data, buf, 6);
> +		ether_addr_copy(tp->netdev->dev_addr, sa->sa_data);
> +		netdev_info(tp->netdev, "Using system MAC address %pM\n",
> +			    sa->sa_data);
> +		set_bit(MAC_PASSTHRU, >flags);
> +		mac_passthru_active = true;
> +		ret = 1;

1 is not a "all is good" return value.

> +	}
> +
> +amacout:
> +	kfree(obj);
> +	return ret;
> +}
> +
>  static int set_ethernet_addr(struct r8152 *tp)
>  {
>  	struct net_device *dev = tp->netdev;
> @@ -1041,6 +1088,10 @@ static int set_ethernet_addr(struct r8152 *tp)
>  	else
>  		ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa.sa_data);
>
> +	/* if system provides auxiliary MAC address */
> +	if (get_auxiliary_addr(tp, ))
> +		ret = 0;

	ret = my_dell_specific_function();

But again, I don't like this, but I'm not the network subsystem
maintainer, I'll defer to them as to if this is something they want in
individual drivers...

thanks,

greg k-h
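Two of the review points above, the duplicated enum value and the ad hoc return codes, can be illustrated with a small userspace sketch. This is not the r8152 code: the enum names and parse_aux_mac() are hypothetical, and the "_AUXMAC_#AABBCCDDEEFF#" layout is assumed from the comment in the quoted patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag enums. Enumerators count up from 0, so giving a
 * later entry an explicit "= 0" makes it share a value (and therefore
 * a bit number) with the first entry. */
enum demo_flags_bad {
	BAD_FIRST_FLAG,		/* implicitly 0 */
	BAD_SECOND_FLAG,	/* implicitly 1 */
	BAD_MAC_PASSTHRU = 0,	/* collides with BAD_FIRST_FLAG */
};

enum demo_flags_good {
	GOOD_FIRST_FLAG,	/* 0 */
	GOOD_SECOND_FLAG,	/* 1 */
	GOOD_MAC_PASSTHRU,	/* 2, unique */
};

/* Convert one hex digit, or return -1 if it is not a hex digit. */
static int hex_val(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical parser for a string shaped like "_AUXMAC_#AABBCCDDEEFF#".
 * Returns 0 on success and a negative errno value on failure, instead
 * of ad hoc values such as -1 or 1.
 */
int parse_aux_mac(const char *s, size_t len, unsigned char mac[6])
{
	static const char prefix[] = "_AUXMAC_#";
	const size_t plen = sizeof(prefix) - 1;
	unsigned char tmp[6];
	size_t i;

	/* prefix + 12 hex digits + trailing '#' */
	if (!s || !mac || len != plen + 12 + 1)
		return -EINVAL;
	if (strncmp(s, prefix, plen) != 0 || s[len - 1] != '#')
		return -EINVAL;

	for (i = 0; i < 6; i++) {
		int hi = hex_val(s[plen + 2 * i]);
		int lo = hex_val(s[plen + 2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -EINVAL;
		tmp[i] = (unsigned char)((hi << 4) | lo);
	}

	/* Only touch the caller's buffer once the whole string is valid. */
	memcpy(mac, tmp, sizeof(tmp));
	return 0;
}
```

With this convention a caller can test `parse_aux_mac(...) == 0` for success and pass any other value straight up as an error code.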
Re: [PATCH -next 2/2] virtio_net: Read the advised MTU
Hi, [auto build test ERROR on next-20160602] url: https://github.com/0day-ci/linux/commits/Aaron-Conole/virtio-net-Advised-MTU-feature/20160603-000714 config: i386-allmodconfig (attached as .config) compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430 reproduce: # save the attached .config to linux build tree make ARCH=i386 Note: the linux-review/Aaron-Conole/virtio-net-Advised-MTU-feature/20160603-000714 HEAD d909da4df3c52f78b4f5fcccd89aea5e38722d10 builds fine. It only hurts bisectibility. All errors (new ones prefixed by >>): drivers/net/virtio_net.c: In function 'virtnet_probe': >> drivers/net/virtio_net.c:1899:31: error: 'VIRTIO_NET_F_MTU' undeclared >> (first use in this function) if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { ^~~~ drivers/net/virtio_net.c:1899:31: note: each undeclared identifier is reported only once for each function it appears in drivers/net/virtio_net.c: At top level: >> drivers/net/virtio_net.c:2076:2: error: 'VIRTIO_NET_F_MTU' undeclared here >> (not in a function) VIRTIO_NET_F_MTU, ^~~~ vim +/VIRTIO_NET_F_MTU +1899 drivers/net/virtio_net.c 1893 virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) 1894 vi->any_header_sg = true; 1895 1896 if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) 1897 vi->has_cvq = true; 1898 > 1899 if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { 1900 dev->mtu = virtio_cread16(vdev, 1901offsetof(struct virtio_net_config, 1902 mtu)); 1903 } 1904 1905 if (vi->any_header_sg) 1906 dev->needed_headroom = vi->hdr_len; 1907 1908 /* Use single tx/rx queue pair as default */ 1909 vi->curr_queue_pairs = 1; 1910 vi->max_queue_pairs = max_queue_pairs; 1911 1912 /* Allocate/initialize the rx/tx queues, and invoke find_vqs */ 1913 err = init_vqs(vi); 1914 if (err) 1915 goto free_stats; 1916 1917 #ifdef CONFIG_SYSFS 1918 if (vi->mergeable_rx_bufs) 1919 dev->sysfs_rx_queue_group = _net_mrg_rx_group; 1920 #endif 1921 netif_set_real_num_tx_queues(dev, vi->curr_queue_pairs); 1922 netif_set_real_num_rx_queues(dev, vi->curr_queue_pairs); 
1923 1924 virtnet_init_settings(dev); 1925 1926 err = register_netdev(dev); 1927 if (err) { 1928 pr_debug("virtio_net: registering device failed\n"); 1929 goto free_vqs; 1930 } 1931 1932 virtio_device_ready(vdev); 1933 1934 vi->nb.notifier_call = _cpu_callback; 1935 err = register_hotcpu_notifier(>nb); 1936 if (err) { 1937 pr_debug("virtio_net: registering cpu notifier failed\n"); 1938 goto free_unregister_netdev; 1939 } 1940 1941 /* Assume link up if device can't report link status, 1942 otherwise get link status from config. */ 1943 if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) { 1944 netif_carrier_off(dev); 1945 schedule_work(>config_work); 1946 } else { 1947 vi->status = VIRTIO_NET_S_LINK_UP; 1948 netif_carrier_on(dev); 1949 } 1950 1951 pr_debug("virtnet: registered device %s with %d RX and TX vq's\n", 1952 dev->name, max_queue_pairs); 1953 1954 return 0; 1955 1956 free_unregister_netdev: 1957 vi->vdev->config->reset(vdev); 1958 1959 unregister_netdev(dev); 1960 free_vqs: 1961 cancel_delayed_work_sync(>refill); 1962 free_receive_page_frags(vi); 1963 virtnet_del_vqs(vi); 1964 free_stats: 1965 free_percpu(vi->stats); 1966 free: 1967 free_netdev(dev); 1968 return err; 1969 } 1970 1971 static void remove_vq_common(struct virtnet_info *vi) 1972 { 1973 vi->vdev->config->reset(vi->vdev); 1974 1975 /* Free unused buffers in both send and recv, if any. */ 1976 free_unused_bufs(vi); 1977 1978 free_receive_bufs(vi); 1979 1980 free_receive_page_frags(vi); 1981 1982 virtnet_del_vqs(vi); 1983 } 1984 1985 static void virtnet_remove(struct virtio_device *vdev) 1986 { 1987 struct virtnet_info *vi = vdev->priv; 1988 1989 unregister_hotcpu_notifier(>nb); 1990 1991
[PATCH net-next] hv_netvsc: Fix VF register on vlan devices
Added a condition to avoid vlan devices with same MAC registering as VF.

Signed-off-by: Haiyang Zhang
Reviewed-by: K. Y. Srinivasan
---
 drivers/net/hyperv/netvsc_drv.c |    4
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 6a69b5c..5ac1267 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1500,6 +1500,10 @@ static int netvsc_netdev_event(struct notifier_block *this,
 {
 	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);

+	/* Avoid Vlan dev with same MAC registering as VF */
+	if (event_dev->priv_flags & IFF_802_1Q_VLAN)
+		return NOTIFY_DONE;
+
 	switch (event) {
 	case NETDEV_REGISTER:
 		return netvsc_register_vf(event_dev);
--
1.7.4.1
[PATCH iputils v3] ping6: allow disabling of openssl/libgcrypt support
Signed-off-by: Mike Frysinger--- Makefile | 5 - iputils_md5dig.h | 2 +- ping6.c | 28 +++- 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index b6cf512f22a5..8b9e2aa232e6 100644 --- a/Makefile +++ b/Makefile @@ -36,7 +36,7 @@ ARPING_DEFAULT_DEVICE= # Libgcrypt (for MD5) for ping6 [yes|no|static] USE_GCRYPT=yes -# Crypto library for ping6 [shared|static] +# Crypto library for ping6 [shared|static|no] USE_CRYPTO=shared # Resolv library for ping6 [yes|static] USE_RESOLV=yes @@ -66,7 +66,10 @@ ifneq ($(USE_GCRYPT),no) LIB_CRYPTO = $(call FUNC_LIB,$(USE_GCRYPT),$(LDFLAG_GCRYPT)) DEF_CRYPTO = -DUSE_GCRYPT else +ifneq ($(USE_CRYPTO),no) LIB_CRYPTO = $(call FUNC_LIB,$(USE_CRYPTO),$(LDFLAG_CRYPTO)) + DEF_CRYPTO = -DUSE_OPENSSL +endif endif # USE_RESOLV: LIB_RESOLV diff --git a/iputils_md5dig.h b/iputils_md5dig.h index 4cec86699465..9f09ba0a8c60 100644 --- a/iputils_md5dig.h +++ b/iputils_md5dig.h @@ -5,7 +5,7 @@ # include # include # define IPUTILS_MD5DIG_LEN16 -#else +#elif defined(USE_OPENSSL) # include #endif diff --git a/ping6.c b/ping6.c index 6d1a6db37146..95568ec4fbaf 100644 --- a/ping6.c +++ b/ping6.c @@ -85,6 +85,12 @@ char copyright[] = #include "ping6_niquery.h" #include "in6_flowlabel.h" +#if defined(USE_GCRYPT) || defined(USE_OPENSSL) +# define ENABLE_NIQUERY 1 +#else +# define ENABLE_NIQUERY 0 +#endif + #ifndef SOL_IPV6 #define SOL_IPV6 IPPROTO_IPV6 #endif @@ -238,6 +244,8 @@ unsigned int if_name2index(const char *ifname) return i; } +#if ENABLE_NIQUERY + struct niquery_option { char *name; int namelen; @@ -669,6 +677,12 @@ int niquery_option_handler(const char *opt_arg) return ret; } +#else + +# define niquery_is_enabled() 0 + +#endif /* ENABLE_NIQUERY */ + static int hextoui(const char *str) { unsigned long val; @@ -790,6 +804,7 @@ int main(int argc, char *argv[]) printf("ping6 utility, iputils-%s\n", SNAPSHOT); exit(0); case 'N': +#if ENABLE_NIQUERY if (using_ping_socket) { fprintf(stderr, "ping: -N requires raw socket 
permissions\n"); exit(2); @@ -798,6 +813,10 @@ int main(int argc, char *argv[]) usage(); break; } +#else + fprintf(stderr, "ping: function not available; crypto disabled\n"); + exit(2); +#endif break; COMMON_OPTIONS common_options(ch); @@ -891,6 +910,7 @@ int main(int argc, char *argv[]) } #endif +#if ENABLE_NIQUERY if (niquery_is_enabled()) { niquery_init_nonce(); @@ -900,6 +920,7 @@ int main(int argc, char *argv[]) ni_subject_type = NI_SUBJ_IPV6; } } +#endif if (argc > 1) { #ifndef ENABLE_PING6_RTHDR @@ -1369,7 +1390,7 @@ int build_echo(__u8 *_icmph) return cc; } - +#if ENABLE_NIQUERY int build_niquery(__u8 *_nih) { struct ni_hdr *nih; @@ -1391,6 +1412,7 @@ int build_niquery(__u8 *_nih) return cc; } +#endif int send_probe(void) { @@ -1398,9 +1420,11 @@ int send_probe(void) rcvd_clear(ntransmitted + 1); +#if ENABLE_NIQUERY if (niquery_is_enabled()) len = build_niquery(outpack); else +#endif len = build_echo(outpack); if (cmsglen == 0) { @@ -1619,6 +1643,7 @@ parse_reply(struct msghdr *msg, int cc, void *addr, struct timeval *tv) hops, 0, tv, pr_addr(>sin6_addr), pr_echo_reply)) return 0; +#if ENABLE_NIQUERY } else if (icmph->icmp6_type == ICMPV6_NI_REPLY) { struct ni_hdr *nih = (struct ni_hdr *)icmph; int seq = niquery_check_nonce(nih->ni_nonce); @@ -1629,6 +1654,7 @@ parse_reply(struct msghdr *msg, int cc, void *addr, struct timeval *tv) hops, 0, tv, pr_addr(>sin6_addr), pr_niquery_reply)) return 0; +#endif } else { int nexthdr; struct ip6_hdr *iph1 = (struct ip6_hdr*)(icmph+1); -- 2.8.2
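The compile-time gating used by the patch above can be sketched in isolation as follows. The ENABLE_NIQUERY logic and the niquery_is_enabled() stub follow the patch; everything prefixed with demo_ is a hypothetical stand-in and not part of iputils.

```c
#include <stdio.h>

/* Same gating idea as the patch: the feature exists only when a crypto
 * backend was selected at build time (-DUSE_GCRYPT or -DUSE_OPENSSL). */
#if defined(USE_GCRYPT) || defined(USE_OPENSSL)
# define ENABLE_NIQUERY 1
#else
# define ENABLE_NIQUERY 0
#endif

#if ENABLE_NIQUERY
static int demo_ni_query = -1;	/* hypothetical state, -1 means "not requested" */

static int niquery_is_enabled(void)
{
	return demo_ni_query >= 0;
}
#else
/* Compiled-out stub, so simple callers need no #if of their own. */
# define niquery_is_enabled() 0
#endif

/* Hypothetical -N handler: 0 if accepted, 2 if the feature is compiled out. */
int demo_handle_option_N(const char *arg)
{
	(void)arg;	/* a real handler would parse the argument */
#if ENABLE_NIQUERY
	demo_ni_query = 0;
	return 0;
#else
	fprintf(stderr, "ping: function not available; crypto disabled\n");
	return 2;
#endif
}

/* Hypothetical stand-in for the build_niquery()/build_echo() choice. */
const char *demo_probe_kind(void)
{
	return niquery_is_enabled() ? "niquery" : "echo";
}
```

Built without either define, demo_handle_option_N() reports the feature as unavailable and demo_probe_kind() always selects the plain echo path; adding -DUSE_OPENSSL to the compiler flags switches both over.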
Re: [PATCH -next 2/2] virtio_net: Read the advised MTU
"Michael S. Tsirkin"writes: > On Thu, Jun 02, 2016 at 11:43:31AM -0400, Aaron Conole wrote: >> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it >> exists, read the advised MTU and use it. >> >> No proper error handling is provided for the case where a user changes the >> negotiated MTU. A future commit will add proper error handling. Instead, a >> warning is emitted if the guest changes the device MTU after previously >> being given advice. > > I don't see a warning and I don't think it's needed. > Patch is ok commit log isn't. Okay, I'll fix it when I submit v2. >> Signed-off-by: Aaron Conole >> --- >> drivers/net/virtio_net.c | 7 +++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> index e0638e5..ef5ee01 100644 >> --- a/drivers/net/virtio_net.c >> +++ b/drivers/net/virtio_net.c >> @@ -1896,6 +1896,12 @@ static int virtnet_probe(struct virtio_device *vdev) >> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) >> vi->has_cvq = true; >> >> +if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) { >> +dev->mtu = virtio_cread16(vdev, >> + offsetof(struct virtio_net_config, >> + mtu)); >> +} >> + >> if (vi->any_header_sg) >> dev->needed_headroom = vi->hdr_len; >> >> @@ -2067,6 +2073,7 @@ static unsigned int features[] = { >> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, >> VIRTIO_NET_F_CTRL_MAC_ADDR, >> VIRTIO_F_ANY_LAYOUT, >> +VIRTIO_NET_F_MTU, >> }; >> >> static struct virtio_driver virtio_net_driver = { >> -- >> 2.5.5
Re: [PATCH -next 1/2] virtio: Start feature MTU support
"Michael S. Tsirkin"writes: > On Thu, Jun 02, 2016 at 11:43:30AM -0400, Aaron Conole wrote: >> This commit adds the feature bit and associated mtu device entry for the >> virtio network device. Future commits will make use of these bits to >> support negotiated MTU. > > why split it out? Pls squash with the next patch. Okay. I thought the usual method was make a commit which introduces the data structure changes, and then make a commit which hooks it up. I'll squash this in the future. >> Signed-off-by: Aaron Conole >> --- >> include/uapi/linux/virtio_net.h | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/include/uapi/linux/virtio_net.h >> b/include/uapi/linux/virtio_net.h >> index ec32293..751ff59 100644 >> --- a/include/uapi/linux/virtio_net.h >> +++ b/include/uapi/linux/virtio_net.h >> @@ -73,6 +73,8 @@ struct virtio_net_config { >> * Legal values are between 1 and 0x8000 >> */ >> __u16 max_virtqueue_pairs; >> +/* Default maximum transmit unit advice */ >> +__u16 mtu; >> } __attribute__((packed)); >> >> /* >> -- >> 2.5.5
Re: [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns
Nikolay please see my question for you at the end.

Jan Kara writes:

> On Wed 01-06-16 11:00:06, Eric W. Biederman wrote:
>> Cc'd the containers list.
>>
>> Nikolay Borisov writes:
>>
>> > Currently the inotify instances/watches are being accounted in the
>> > user_struct structure. This means that in setups where multiple
>> > users in unprivileged containers map to the same underlying
>> > real user (e.g. user_struct) the inotify limits are going to be
>> > shared as well which can lead to unplesantries. This is a problem
>> > since any user inside any of the containers can potentially exhaust
>> > the instance/watches limit which in turn might prevent certain
>> > services from other containers from starting.
>>
>> On a high level this is a bit problematic as it appears to escapes the
>> current limits and allows anyone creating a user namespace to have their
>> own fresh set of limits. Given that anyone should be able to create a
>> user namespace whenever they feel like escaping limits is a problem.
>> That however is solvable.
>>
>> A practical question. What kind of limits are we looking at here?
>>
>> Are these loose limits for detecting buggy programs that have gone
>> off their rails?
>>
>> Are these tight limits to ensure multitasking is possible?
>
> The original motivation for these limits is to limit resource usage. There
> is in-kernel data structure that is associated with each notification mark
> you create and we don't want users to be able to DoS the system by creating
> too many of them. Thus we limit number of notification marks for each user.
> There is also a limit on the number of notification instances - those are
> naturally limited by the number of open file descriptors but admin may want
> to limit them more...
>
> So cgroups would be probably the best fit for this but I'm not sure whether
> it is not an overkill...

There is some level of kernel memory accounting in the memory cgroup.

That said my experience with cgroups is that while they are good for
some things the semantics that derive from the userspace API are
problematic.

In the cgroup model objects in the kernel don't belong to a cgroup they
belong to a task/process. Those processes belong to a cgroup.
Processes under control of a sufficiently privileged parent are allowed
to switch cgroups. This causes implementation challenges and semantic
mismatch in a world where things are typically considered to have an
owner.

Right now fs_notify groups (upon which all of the rest of the inotify
accounting is built) belong to a user. So there is a semantic
mismatch with cgroups right out of the gate.

Given that cgroups have not chosen to account for individual kernel
objects or give that level of control, I think it reasonable to look to
other possible solutions. Assuming the overhead can be kept under
control.

The implementation of a hierarchical counter in mm/page_counter.c
strongly suggests to me that the overhead can be kept under control.

And yes. I am thinking of the problem space where you have a limit
based on the problem domain where if an application consumes more than
the limit, the application is likely bonkers. Which does prevent a DOS
situation in kernel memory. But is different from the problem I have
seen cgroups solve.

The problem I have seen cgroups solve looks like. Hmm. I have 8GB of
ram. I have 3 containers. Container A can have 4GB, Container B can
have 1GB and container C can have 3GB. Then I know one container won't
push the other containers into swap.

Perhaps that would tend to be a top down vs. a bottom up approach to
coming up with limits. As DOS prevention limits like the inotify ones
are generally written from the perspective of if you have more than X
you are crazy. While cgroup limits tend to be thought about top down
from a total system management point of view.

So I think there is definitely something to look at.

All of that said there is definitely a practical question that needs to
be asked. Nikolay how did you get into this situation? A typical user
namespace configuration will set up uid and gid maps with the help of a
privileged program and not map the uid of the user who created the user
namespace. Thus avoiding exhausting the limits of the user who created
the container.

Which makes me personally more worried about escaping the existing
limits than exhausting the limits of a particular user.

Eric
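The hierarchical counter idea mentioned above can be sketched in a few lines of userspace C. This is only an illustration of the concept: struct hcounter and its functions are hypothetical names, the sketch is single-threaded, and it makes no claim to match the data layout or locking of mm/page_counter.c. Each level (for example one per user namespace) has its own limit, and a charge must fit at every level up to the root, so a child cannot escape the limit of its parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counter: one per level, chained to its parent. */
struct hcounter {
	long count;			/* objects currently charged here */
	long limit;			/* maximum allowed at this level */
	struct hcounter *parent;	/* NULL at the root */
};

/*
 * Try to charge n objects to c and to every ancestor of c.
 * Either all levels are charged, or none are.
 */
bool hcounter_try_charge(struct hcounter *c, long n)
{
	struct hcounter *iter, *undo;

	for (iter = c; iter; iter = iter->parent) {
		if (iter->count + n > iter->limit)
			goto fail;
		iter->count += n;
	}
	return true;

fail:
	/* Roll back the levels below the one that refused the charge. */
	for (undo = c; undo != iter; undo = undo->parent)
		undo->count -= n;
	return false;
}

/* Release n objects from c and from every ancestor of c. */
void hcounter_uncharge(struct hcounter *c, long n)
{
	for (; c; c = c->parent)
		c->count -= n;
}
```

The cost of a charge is one step per nesting level, which is the overhead that would have to be kept under control for deep hierarchies.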
Re: [PATCH -next 2/2] virtio_net: Read the advised MTU
Hi Rick,

In the future, please don't cut the list.

Rick Jones writes:

> On 06/02/2016 08:43 AM, Aaron Conole wrote:
>> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
>> exists, read the advised MTU and use it.
>>
>> No proper error handling is provided for the case where a user changes the
>> negotiated MTU. A future commit will add proper error handling. Instead, a
>> warning is emitted if the guest changes the device MTU after previously
>> being given advice.
>
> One of the things I've been doing has been setting-up a cluster
> (OpenStack) with JumboFrames, and then setting MTUs on instance vNICs
> by hand to measure different MTU sizes. It would be a shame if such a
> thing were not possible in the future. Keeping a warning if shrinking
> the MTU would be good, leave the error (perhaps) to if an attempt is
> made to go beyond the advised value.

This was cut because it didn't make sense for such a warning to
be issued, but it seems like perhaps you may want such a feature? I
agree with Michael, after thinking about it, that I don't know what sort
of use the warning would serve. After all, if you're changing the MTU,
you must have wanted such a change to occur?

-Aaron

> happy benchmarking,
>
> rick jones
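For readers following the thread, the "check the feature bit, then read the advised MTU from the config space" step can be sketched in userspace as below. All demo_ names are hypothetical. The struct mirrors what I understand the layout of struct virtio_net_config to be with the new mtu field appended, the feature bit number is only an illustrative value, and the config space is assumed to be little-endian; the real driver goes through virtio_has_feature() and virtio_cread16() instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit number only, not taken from the patch. */
#define DEMO_NET_F_MTU 3

/* Assumed layout: existing fields followed by the new mtu field.
 * __attribute__((packed)) is a GCC/Clang extension. */
struct demo_virtio_net_config {
	uint8_t  mac[6];
	uint16_t status;
	uint16_t max_virtqueue_pairs;
	uint16_t mtu;			/* default maximum transmit unit advice */
} __attribute__((packed));

/* Read a 16-bit little-endian value at byte offset off. */
static uint16_t demo_cread16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Return the advised MTU if the feature was negotiated, else the default. */
unsigned int demo_probe_mtu(uint64_t features, const uint8_t *cfg,
			    unsigned int default_mtu)
{
	if (features & (1ULL << DEMO_NET_F_MTU))
		return demo_cread16(cfg,
				    offsetof(struct demo_virtio_net_config, mtu));
	return default_mtu;
}
```

Nothing in this sketch stops the guest from later setting a smaller MTU by hand, which is the use case described in the quoted message.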
Re: [PATCH net-next] tcp: accept RST if SEQ matches right edge of SACK block
On Thu, Jun 2, 2016 at 3:14 PM, Randall Stewartwrote: > > Pau: > > Hopefully me setting the “plain text” in my Mac-Mail preferences will make > this > plain text :-) > > > >> > >> > >> Well yes the probability is increased but definitely not assured :-) > >> Your scenario is specific to a very high loss path. Which is why > >> the challenge ack is lost... > >> > > > > Correct. But still an improvement in this particular situation with > > the only drawback of checking against 5 (4+1) SEQ numbers instead of > > 1. > > > > > >> > >> > >> Possibly another alternative is to change the client where when sending a > >> RST > >> with a TCB instead of using snd_nxt you could use snd_una. Of course that > >> could > >> also result in a challenge ACK if the receiver has not yet received a > >> ACK that is in flight (that was the whole purpose of the challenge ack). I > >> think overall > >> you will always have this problem i.e. the sender of the RST may not know > >> precisely > >> the state of the receiver. > >> > >> > >> Indeed, I guess there will still be same problem in other scenarios. > >> On top of that, it seems a bit weird to me to send a RST packet using > >> a SEQ number which was already used to send a different packet (that's > >> what would happen in this case right?). > >> > >> > >> Why the trick here is you want to RST the connection. You need to use > >> a seq number that is valid.. the seq numbers once a RST is being sent > >> mean nothing the app will get the same thing. > >> > >> In fact snd_nxt in the scenario above is *also* a re-used sequence number. > >> You > >> retransmitted from snd_una for one segment, snd_nxt got left 1 segment up, > >> so > >> when you sent the RST it would be snd_nxt which was previously a data > >> segment being marked with RST. 
> >> > >> If you wanted to assure that no other segment had been sent with that > >> sequence you would have to put snd_max as the value, but of course > >> that *would* return a challenge ack for sure. > >> > >> The trick here is you are trying to “guess” where the peer is. The only > >> thing you know for sure is snd_una. Anything else in flight won’t reset > >> the peer. In fact in your scenario if you had sent snd_una instead of > >> snd_nxt it would have worked. If you changed the sender to use > >> snd_una then it would be interesting to see if that also gave you > >> similar results... > >> > >> > >> > >> > >> Your fix happens to work since the receiver happens to have the SACK blocks > >> in question.. this is fine and if you don’t mind *weakening the security* > >> of > >> the > >> RST you could do that. I think for stack I am working on for FreeBSD I will > >> change > >> the stack I am working on to recognize the RST going out and use snd_una. > >> > >> > >> You mean here you are always going to use snd_una or that you are > >> going to try to figure out some heuristics to use either snd_next or > >> snd_una depending on the scenario? > >> > >> > >> For this scenario I will always use snd_una I am not sure you can reliably > >> develop any heuristics to tell you to use snd_una/snd_nxt or some other > >> block that as been sent :-) > >> > >> snd_una is actually the most actuate as to what you know at the time not > >> snd_nxt. I am sure using snd_nxt is a hold over from before the RFC got > >> implemented. > >> > > > > As, as far as I understand now, it could be useful to improve the > > situation in the sender too by checking if we recently received SACK > > blocks from the receiver and in that case sending the RST using > > snd_una instead of snd_nxt because in that case we will almost surely > > receive a challenge_ack if we use snd_nxt as SEQ for the RST. > > > > Checking the scoreboard and using snd_una instead of snd_nxt > might help things. 
>
> > On the other hand, using always snd_una as SEQ for the RST would
> > cause other (even more usual) cases to be discarded or answered with a
> > challenge ACK which are accepted right now. I'm thinking for instance
> > any case in which you send packets (so in flight packets
> > sender->receiver) just before the RST is sent (with the snd_una).
> > Packets are received by the receiver and RCV_NXT is updated and then
> > you receive the RST which is lower than the RCV_NXT just updated. Am I
> > missing something? Please correct me if I'm wrong.
> >
>
> I think if packets are in flight either way you are taking a gamble on
> what you are sending, snd_nxt/snd_una and snd_max.
>
> If you are idle and send a RST then those should all be the same and
> you will win :)
>
> snd_nxt will be questionable if you are in the middle of retransmitting
> (your case) since you really don’t know where it is.
>
> I do like your idea of using the scoreboard to tell if you need to use
> snd_una or snd_nxt. In theory if the scoreboard is empty using
> snd_nxt should be the equivalent to using snd_max.. but if they
> are not equal then you are doing a retransmit and it becomes a crap
>
[ANNOUNCE] nftables 0.6 release
Hi!

The Netfilter project proudly presents:

        nftables 0.6

This release contains many accumulated bug fixes and new features
available up to the Linux 4.7-rc1 kernel release.

New features

* Rule replacement: You can replace any rule from the unique 64-bits
  handle. You have to retrieve the handle from the ruleset listing.

  # nft list ruleset -a
  table ip filter {
        chain input {
                ...
                ct state new tcp dport ssh accept counter packets 0 bytes 0 # handle 4
        }
  }

  Then, indicate this handle from the new rule that you want to
  replace, eg.

  # nft replace rule filter input handle 4 ct state new \
        tcp dport { 22, 80} counter accept

* Flow table support: This provides a native replacement for the
  hashlimit match in iptables. The rule below creates an 'ssh' flow
  table and declares a ratelimit of 10 packets per second for each
  source IP address:

  # nft add rule filter input tcp dport 22 ct state new \
        flow table ssh { ip saddr limit rate 10/second } accept

  This is actually way more than hashlimit since you can use any
  selector and build your own tuple of selectors through
  concatenations, eg.

  # nft add rule filter input \
        flow table acct { iif . ip saddr timeout 60s counter }

  Then, if you want to list the content of the 'acct' flow table:

  # nft list flow table acct
  table ip filter {
        flow table acct {
                type iface_index . ipv4_addr
                flags timeout
                elements = { eth0 . 218.68.110.274 expires 3m56s : counter packets 1 bytes 98,
                             eth0 . 180.29.103.19 expires 3m57s : counter packets 2 bytes 80,
                             eth0 . 8.8.8.8 expires 3m44s : counter packets 1 bytes 84}
        }
  }

  Note that this listing format is still unstable though, so don't
  make tools to parse this output yet. Commands to empty flow tables
  and remove specific entries are still missing. Moreover, flow tables
  require a Linux kernel >= 4.3.

* New tracing infrastructure: Useful for ruleset debugging, you have
  to enable tracing via:

  # nft filter input tcp dport 1 nftrace set 1
  # nft filter input icmp type echo-request nftrace set 1

  Then, you can monitor traces through:

  # nft -nn monitor trace

  That generates the following outputs:

  trace id e1f5055f ip filter input packet: iif eth0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 ip saddr 192.168.122.1 ip daddr 192.168.122.83 ip tos 0 ip ttl 64 ip id 32315 ip length 84 icmp type echo-request icmp code 0 icmp id 10087 icmp sequence 1
  trace id e1f5055f ip filter input rule icmp type echo-request nftrace set 1 (verdict continue)
  trace id e1f5055f ip filter input verdict continue
  trace id e1f5055f ip filter input

  trace id 74e47ad2 ip filter input packet: iif vlan0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 vlan pcp 0 vlan cfi 1 vlan id 1000 ip saddr 10.0.0.1 ip daddr 10.0.0.2 ip tos 0 ip ttl 64 ip id 49030 ip length 84 icmp type echo-request icmp code 0 icmp id 10095 icmp sequence 1
  trace id 74e47ad2 ip filter input rule icmp type echo-request nftrace set 1 (verdict continue)
  trace id 74e47ad2 ip filter input verdict continue
  trace id 74e47ad2 ip filter input

  trace id 3030de23 ip filter input packet: iif vlan0 ether saddr 63:f6:4b:00:54:52 ether daddr c9:4b:a9:00:54:52 vlan pcp 0 vlan cfi 1 vlan id 1000 ip saddr 10.0.0.1 ip daddr 10.0.0.2 ip tos 16 ip ttl 64 ip id 59062 ip length 60 tcp sport 55438 tcp dport 1 tcp flags == syn tcp window 29200
  trace id 3030de23 ip filter input rule tcp dport 1 nftrace set 1 (verdict continue)
  trace id 3030de23 ip filter input verdict continue
  trace id 3030de23 ip filter input

  The trace id is unique for each packet; above you can see the path
  each packet takes through the nft packet classifier.

* Ratelimiting enhancements: You can now specify ratelimits in terms
  of bytes/second, eg.

  # nft add rule filter forward \
        limit rate 1024 mbytes/second counter accept

  The rule above matches packets under the specified ratelimit. This
  requires a Linux kernel >= 4.3 btw.

  You can also indicate the amount of traffic that can go over the
  threshold via 'burst', eg.

  # nft add rule filter forward \
        limit rate 1024 mbytes/second burst 10240 bytes counter accept

  You may also need to match based on inverted logic, eg.

  # nft add rule filter forward \
        limit rate over 1024 mbytes/second log prefix "OVERLIMIT: " drop

* VLAN matching: You can match any vlan header field and combine this
  with any of the existing upper layer header selectors, eg.

  # nft add rule bridge filter prerouting vlan id 24 \
        ip saddr 192.168.1.0/24 counter accept

* Packet duplication: When used from any of the supported layer 3
  families, this allows you to clone packets to a given destination
  address, eg. duplicate all packets whose mark is 0x:

  # nft add rule filter
[PATCH v2] r8152: Add support for setting MAC to system's Auxiliary MAC address
This addresses a lot of the concerns that have been raised on LKML.

Changes from v1:
 * Lots of error checking around bad ACPI data
 * Only activate on Dell system vendor DMI string
 * Use hex2bin instead of inventing a wheel
 * Copy MAC to both dev_addr and sa_data
 * Track and set only for one NIC at a time
 * If MAC lookup failed (bad ACPI data or bad return) fall back to
   regular r8152 MAC setting routine.

This has been tested with TB15, WD15 docks and Dell P/N 96NP5 dongle.
It's also been tested with two devices that use r8152 simultaneously.

Remaining discussion points:
 * Greg KH had asked for this to be enabled only on machines that are
   known to have \\_SB.AMAC
   - I would rather avoid doing this because the list will just grow
     every year. I've added lots of error checking and restricted
     this to Dell.
 * There was also a request to move this to an x86
   arch_get_platform_mac_address() implementation.
   - I haven't yet done this. If this is the right approach, I would
     like to know the proper place in arch/x86 to put this code. My
     initial thought was a new file in arch/x86/platform/intel

Mario Limonciello (1):
  r8152: Add support for setting MAC to system's Auxiliary MAC address

 drivers/net/usb/r8152.c | 53 +
 1 file changed, 53 insertions(+)

-- 
2.7.4
RE: [PATCH] r8152: Add support for setting MAC to system's Auxiliary MAC address
Some of my comments are getting stale with what I've done in response
to all these emails. Let me send a v2 that we can better iterate on, a
few comments below though.

> -----Original Message-----
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Thursday, June 2, 2016 11:09 AM
> To: Limonciello, Mario
> Cc: hayesw...@realtek.com; linux-ker...@vger.kernel.org;
> netdev@vger.kernel.org; linux-...@vger.kernel.org; pali.ro...@gmail.com;
> anthony.w...@canonical.com
> Subject: Re: [PATCH] r8152: Add support for setting MAC to system's
> Auxiliary MAC address
>
> On Thu, Jun 02, 2016 at 03:46:41PM +, mario_limoncie...@dell.com
> wrote:
> > > >
> > > > This isn't something part of ACPI - it's been added specifically for a
> > > > selection of Dell machines.
> > >
> > > Ah, but isn't ACPI supposed to be a "standard"? :)
> > >
> >
> > Heh.
> > It's also possible to get this from an SMM routine. Lesser of two evils to
> fetch the information this way, right? :)
>
> Yes, but again, please only do this for machines you _know_ this value
> will be present on. Otherwise you will end up with problems.

I'm going to send a V2, I'd like to know where and how this could still
break. I am having a hard time grasping this.

>
> > > And please wrap your email lines, there is a "standard" for that...
> >
> > I'm unfortunately limited to an evil mail client at my workplace since
> > our
> mail server migration. My apologies, I've got it set to wrap at 76
> characters
> and I'm trying to make it as LKML friendly as possible.
>
> It's not working as you can see here :(

Ugh, sorry. Stupid outlook. It seems to only be doing it on replies.
I'll manually just chop the lines when they're around that size until
I've got a better solution.

>
> > > > I would rather not hardcode to the specific DMI model strings of those
> > > > Dell machines as it's certainly going to be a feature that expands to
> > > > more machines. Since it is Dell specific though, if you would rather
> > > > me just match to the sys vendor Dell Inc., that seems like a pretty
> > > > good compromise to me.
> > >
> > > You need to only do this on machines you "know" have this set to a
> correct
> > > value, otherwise if some other random BIOS happens to set that field to
> > > some random value, you will have problems.
> >
> > Pali had recommended in another message to check the buffer header. I
> was intending to do this along with check ACPI buffer output type, and
> output size in the next revision I submitted. By switching to hex2bin, I'll
> also
> validate that the string has correct values (0-F or 0-f). If somehow all of
> that
> fails, the set_ethernet_addr checks if the address is valid. If it's
> invalid it will
> generate a random one.
>
> Why generate a random one and not just use the one that the network
> controller already provides?

That's how the flow works in r8152 already and I'm not overriding it.
Again, I'll send V2 and you'll see what I did.

>
> > It's really not that hard, track a module wide static variable whether the
> feature is in use. Track in each device whether the feature was in use. If
> it is in
> use, don't assign the next device plugged in via the ACPI string. If a
> device is
> removed that has the feature activated, change the module wide static
> variable.
>
> Ok, let's see the code before I say anymore about this.
>
> > > What's wrong with a "simple" script to set the mac address from
> userspace if
> > > the user wants something like this? Provide it as a system package and
> then
> > > no kernel changes are needed at all. Much easier to support on your end
> > > (you don't have to maintain this odd kernel code for
> > > 10+ years), the default behavior is as Linux users expect, and your
> > > limited number of people who want this crazy behaviour can install your
> > > script if they want it.
> > >
> >
> > This was my original approach. It involved a network manager script,
> network manager code changes to support this, and exposing this
> somewhere in a platform module (like dell-laptop). I was told I'm better off
> doing it directly in the network module, so here I am.
>
> Why not a small systemd unit file for this that sets things up when the
> device is found in the system? Why mess with network manager and a
> platform kernel driver at all? That seems very complex for such a
> simple operation where the kernel doesn't need to be involved at all,
> especially for such a "niche" product.
>
> See this link:
> https://wiki.archlinux.org/index.php/MAC_address_spoofing#Automatically
>

The ACPI subsystem doesn't create a sysfs node for a random buffer under
_SB. I don't think the ACPI guys would be crazy about this either. So
you need a platform kernel driver to pull this out of ACPI (or SMM) and
expose into userspace somewhere in the first place. I was putting it
into a random sysfs attribute when I did my first attempts
[PATCH v2] r8152: Add support for setting MAC to system's Auxiliary MAC address
Dell systems with Type-C ports have support for a persistent system
specific MAC address when used with Dell Type-C docks and dongles.
This means a dock plugged into two different systems will show different
(but persistent) MAC addresses. Dell Type-C docks and dongles use the
r8152 driver.

This information for the system's persistent MAC address is burned in when
the HW is built and available under \_SB.AMAC in the DSDT at runtime.

More information about the technology is available here:
http://www.dell.com/support/article/us/en/04/SLN301147

Signed-off-by: Mario Limonciello
---
 drivers/net/usb/r8152.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 3f9f6ed..6dea542 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -26,6 +26,8 @@
 #include
 #include
 #include
+#include
+#include
 
 /* Information for net-next */
 #define NETNEXT_VERSION		"08"
@@ -500,6 +502,7 @@ enum rtl8152_flags {
 	SELECTIVE_SUSPEND,
 	PHY_RESET,
 	SCHEDULE_NAPI,
+	MAC_PASSTHRU,
 };
 
 /* Define these values to match your device */
@@ -653,6 +656,7 @@ enum tx_csum_stat {
  */
 static const int multicast_filter_limit = 32;
 static unsigned int agg_buf_sz = 16384;
+static bool mac_passthru_active;
 
 #define RTL_LIMITED_TSO_SIZE	(agg_buf_sz - sizeof(struct tx_desc) - \
 				 VLAN_ETH_HLEN - VLAN_HLEN)
@@ -1030,6 +1034,49 @@ out1:
 	return ret;
 }
 
+static int get_auxiliary_addr(struct r8152 *tp, struct sockaddr *sa)
+{
+	acpi_status status;
+	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+	union acpi_object *obj;
+	int ret = -1;
+	unsigned char buf[6];
+
+	if (!dmi_name_in_vendors("Dell Inc.") || mac_passthru_active)
+		return -1;
+
+	/* returns _AUXMAC_#AABBCCDDEEFF# */
+	status = acpi_evaluate_object(NULL, "\\_SB.AMAC", NULL, &buffer);
+	obj = (union acpi_object *)buffer.pointer;
+	if (ACPI_SUCCESS(status)) {
+		if (obj->type != ACPI_TYPE_BUFFER ||
+		    obj->string.length != 0x17) {
+			pr_warn("r8152: get_auxiliary_addr: Invalid buffer");
+			goto amacout;
+		}
+		if (strncmp(obj->string.pointer, "_AUXMAC_#", 9) != 0) {
+			pr_warn("r8152: get_auxiliary_addr: Invalid header");
+			goto amacout;
+		}
+		ret = hex2bin(buf, obj->string.pointer + 9, 6);
+		if (ret < 0) {
+			pr_warn("r8152: get_auxiliary_addr: Invalid MAC");
+			goto amacout;
+		}
+		memcpy(sa->sa_data, buf, 6);
+		ether_addr_copy(tp->netdev->dev_addr, sa->sa_data);
+		netdev_info(tp->netdev, "Using system MAC address %pM\n",
+			    sa->sa_data);
+		set_bit(MAC_PASSTHRU, &tp->flags);
+		mac_passthru_active = true;
+		ret = 1;
+	}
+
+amacout:
+	kfree(obj);
+	return ret;
+}
+
 static int set_ethernet_addr(struct r8152 *tp)
 {
 	struct net_device *dev = tp->netdev;
@@ -1041,6 +1088,10 @@ static int set_ethernet_addr(struct r8152 *tp)
 	else
 		ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa.sa_data);
 
+	/* if system provides auxiliary MAC address */
+	if (get_auxiliary_addr(tp, &sa))
+		ret = 0;
+
 	if (ret < 0) {
 		netif_err(tp, probe, dev, "Get ether addr fail\n");
 	} else if (!is_valid_ether_addr(sa.sa_data)) {
@@ -4268,6 +4319,8 @@ static void rtl8152_disconnect(struct usb_interface *intf)
 		if (udev->state == USB_STATE_NOTATTACHED)
 			set_bit(RTL8152_UNPLUG, &tp->flags);
 
+		if (test_bit(MAC_PASSTHRU, &tp->flags))
+			mac_passthru_active = false;
 		netif_napi_del(&tp->napi);
 		unregister_netdev(tp->netdev);
 		tp->rtl_ops.unload(tp);
-- 
2.7.4
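The string handling in the patch can be tried outside the kernel. The following is a rough userspace sketch, not the driver code: it assumes the "_AUXMAC_#AABBCCDDEEFF#" layout given in the patch comment (9-byte header, 12 hex digits, trailing '#') and replaces the kernel's hex2bin() with a small local helper. The names parse_auxmac, hex_val and mac_is_usable are hypothetical, and the usability check only approximates what is_valid_ether_addr() tests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AUXMAC_HDR      "_AUXMAC_#"
#define AUXMAC_HDR_LEN  9
#define MAC_LEN         6

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Parse "_AUXMAC_#AABBCCDDEEFF#" into mac[]. `len` is the number of
 * characters available in s. Returns 0 on success, -1 on any error;
 * mac[] is left untouched on error.
 */
int parse_auxmac(const char *s, size_t len, uint8_t mac[MAC_LEN])
{
    uint8_t tmp[MAC_LEN];
    size_t i;

    /* header + 12 hex digits + closing '#' */
    if (len < AUXMAC_HDR_LEN + 2 * MAC_LEN + 1)
        return -1;
    if (memcmp(s, AUXMAC_HDR, AUXMAC_HDR_LEN) != 0)
        return -1;

    for (i = 0; i < MAC_LEN; i++) {
        int hi = hex_val(s[AUXMAC_HDR_LEN + 2 * i]);
        int lo = hex_val(s[AUXMAC_HDR_LEN + 2 * i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        tmp[i] = (uint8_t)((hi << 4) | lo);
    }
    if (s[AUXMAC_HDR_LEN + 2 * MAC_LEN] != '#')
        return -1;

    memcpy(mac, tmp, MAC_LEN);
    return 0;
}

/* Rough equivalent of a unicast, non-zero address check. */
bool mac_is_usable(const uint8_t mac[MAC_LEN])
{
    static const uint8_t zero[MAC_LEN];

    return (mac[0] & 1) == 0 && memcmp(mac, zero, MAC_LEN) != 0;
}
```

Returning 0 on success and a negative value on every failure keeps a caller's `if (parse_auxmac(...) == 0)` test unambiguous, and writing to mac[] only after all checks pass means a failed lookup cannot clobber an address read earlier from the device.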
Re: [RFC PATCH 0/4] Make inotify instance/watches be accounted per userns
Nikolay Borisov writes:

> On 06/01/2016 07:00 PM, Eric W. Biederman wrote:
>> Cc'd the containers list.
>>
>>
>> Nikolay Borisov writes:
>>
>>> Currently the inotify instances/watches are being accounted in the
>>> user_struct structure. This means that in setups where multiple
>>> users in unprivileged containers map to the same underlying
>>> real user (e.g. user_struct) the inotify limits are going to be
>>> shared as well which can lead to unpleasantries. This is a problem
>>> since any user inside any of the containers can potentially exhaust
>>> the instance/watches limit which in turn might prevent certain
>>> services from other containers from starting.
>>
>> On a high level this is a bit problematic as it appears to escape the
>> current limits and allows anyone creating a user namespace to have their
>> own fresh set of limits. Given that anyone should be able to create a
>> user namespace whenever they feel like, escaping limits is a problem.
>> That however is solvable.
>
> This is indeed a problem and the presented solution is rather dumb in
> that regard. I'm happy to work with you on suggestions so that I arrive
> at a solution that is upstreamable.

The one in-kernel solution to hierarchical resource limits that I am
aware of is the current include/linux/page_counter.h which evolved from
include/linux/res_counter.h

>> A practical question. What kind of limits are we looking at here?
>>
>> Are these loose limits for detecting buggy programs that have gone
>> off their rails?
>
> Loose limits.
>
>>
>> Are these tight limits to ensure multitasking is possible?
>>
>>
>>
>> For tight limits where something is actively controlling the limits you
>> probably want a cgroup based solution.
>>
>> For loose limits that are the kind where you set a good default and
>> forget about it I think a user namespace based solution is reasonable.
>
> That's exactly the use case I had in mind.
>
>>
>>> The solution I propose is rather simple, instead of accounting the
>>> watches/instances per user_struct, start accounting them in a hashtable,
>>> where the index used is the hashed pointer of the userns. This way
>>> the administrator needn't set the inotify limits very high and also
>>> the risk of one container breaching the limits and affecting every
>>> other container is alleviated.
>>
>> I don't think this is the right data structure for a user namespace
>> based solution, at least in part because it does not account for users
>> escaping.
>
> Admittedly this is a naive solution, what are your ideas on something
> which achieves my initial aim of having limits per users, yet not
> allowing them to just create another namespace and escape them. The
> current namespace code has a hard-coded limit of 32 for nesting user
> namespaces. So currently at the worst case one can escape the limits up
> to 32 * current_limits.

32 is the nesting depth not the width of the tree. But see above.

Eric
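The idea of hierarchical limits mentioned above can be illustrated with a small userspace sketch. It is loosely modelled on the charge-every-ancestor approach of page_counter but is not that API: struct hcounter, hcounter_try_charge and hcounter_uncharge are hypothetical names, and the code is single-threaded with no atomics or locking.

```c
#include <stddef.h>

/* One counter per namespace; parent is NULL for the root. */
struct hcounter {
    long usage;
    long limit;
    struct hcounter *parent;
};

/*
 * Charge n units to c and to every ancestor. If any level would go
 * over its limit, undo the partial charge and return -1; otherwise
 * return 0.
 */
int hcounter_try_charge(struct hcounter *c, long n)
{
    struct hcounter *it, *fail = NULL;

    for (it = c; it != NULL; it = it->parent) {
        if (it->usage + n > it->limit) {
            fail = it;
            break;
        }
        it->usage += n;
    }
    if (fail == NULL)
        return 0;

    /* Roll back the levels charged before the failing one. */
    for (it = c; it != fail; it = it->parent)
        it->usage -= n;
    return -1;
}

/* Release n units from c and from every ancestor. */
void hcounter_uncharge(struct hcounter *c, long n)
{
    struct hcounter *it;

    for (it = c; it != NULL; it = it->parent)
        it->usage -= n;
}
```

Because every charge is also applied to the ancestors, creating one more child namespace does not hand out a fresh budget: the child is still bounded by whatever remains at the parent, regardless of how many siblings exist.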
Re: [PATCH 0/2] Quiet noisy LSM denial when accessing net sysctl
On 05/17/2016 09:13 AM, Tyler Hicks wrote:
> On 05/08/2016 10:56 PM, David Miller wrote:
>> From: Tyler Hicks
>> Date: Fri, 6 May 2016 18:04:12 -0500
>>
>>> This pair of patches does away with what I believe is a useless denial
>>> audit message when a privileged process initially accesses a net sysctl.
>>
>> The LSM folks can apply this if they agree with you.
>
> Hi James - Could you pick up these two bug fix patches? Thanks!

Hello - Just checking in again to see if you plan on taking these
through the security tree?

Tyler
Re: [RFC 05/12] nfp: add BPF to NFP code translator
On 16-06-01 01:15 PM, Alexei Starovoitov wrote:
> On Wed, Jun 01, 2016 at 10:03:04PM +0200, Daniel Borkmann wrote:
>> On 06/01/2016 06:50 PM, Jakub Kicinski wrote:
>>> Add translator for JITing eBPF to operations which
>>> can be executed on NFP's programmable engines.
>>>
>>> Signed-off-by: Jakub Kicinski
>>> Reviewed-by: Dinan Gunawardena
>>> Reviewed-by: Simon Horman
>> [...]
>>> +int
>>> +nfp_bpf_jit(struct bpf_prog *filter, void *prog_mem, unsigned int prog_start,
>>> +	    unsigned int tgt_out, unsigned int tgt_abort,
>>> +	    unsigned int prog_sz, struct nfp_bpf_result *res)
>>> +{
>>> +	struct nfp_prog *nfp_prog;
>>> +	int ret;
>>> +
>>> +	/* TODO: maybe make this dependent on bpf_jit_enable? */
>>
>> Probably makes sense to leave it independent from this.
>>
>> Maybe that would rather be an ethtool flag/setting?
>
> Agree that it should be independent of bpf_jit_enable,
> since that's very different JIT. The whole point of hw offload
> is that bpf is translated into something hw understands natively.
> Gating it by sysctl or another flag doesn't make much sense to me.
> In this case the user will say 'do offload tc+cls_bpf into a nic'
> and nic should either do it or not. No need for ethtool flag either.
> One can argue that bpf_jit_enable=2 was useful for debugging
> of JIT itself, but looks like it was only used by jit developers
> like us, but we would be fine with temp printk while debugging.
> At least there was never a case where jit had a bug and we would
> ask a person reporting a bug to send us back jit_enable=2 output.
> We cannot remove it now, but I wouldn't simply copy the behavior here.
> So I'm suggesting not to use bpf_jit_enable either 1 or 2 at all.
>

In the default case (no flags to the tc command) the tc filter tries to
load itself in the hardware. The ethtool flag is there to enable/disable
this default behavior.

The alternative to the default load-into-hardware behavior is to specify
it explicitly via userspace using the 'do offload tc+cls_bpf' as you
note. This was the default behavior folks wanted at netdev conference so
I added it even though for many of my use cases users specify explicitly
if they want offload or not.

Thanks,
John