Re: [PATCH] iproute2: Extend bridge command to configure ageing interval on bridge devices.
On Fri, Aug 14, 2015 at 09:50:02AM +, Premkumar Jonnala wrote:
> Extend bridge command to configure and retrieve ageing interval for bridge
> devices. Netlink messaging is used to configure and retrieve the ageing
> interval.
>
> Signed-off-by: Premkumar Jonnala <pjonn...@broadcom.com>
...
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index a78f0b3..abc9617 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
> @@ -139,6 +139,13 @@ enum {
>  	RTM_GETNSID = 90,
>  #define RTM_GETNSID RTM_GETNSID
>
> +	RTM_SETAGEING = 92,
> +#define RTM_SETAGEING RTM_SETAGEING
> +	RTM_SETDEFAULTAGEING = 93,
> +#define RTM_SETDEFAULTAGEING RTM_SETDEFAULTAGEING
> +	RTM_GETAGEING = 94,
> +#define RTM_GETAGEING RTM_GETAGEING
> +
>  	__RTM_MAX,
>  #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
>  };
> --

As far as I can see, this depends on a kernel patch which is still under review (in particular, adding these new message types was objected to). I would suggest waiting to submit the iproute2 patch until the relevant kernel changes are accepted.

Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
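The RTM_MAX expression in the quoted hunk rounds the sentinel up to a multiple of four and then subtracts one, so the maximum always closes a full group of four message types. A minimal user-space sketch of that arithmetic follows; `toy_rtm_max` is a hypothetical helper written for illustration and is not part of rtnetlink.h.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in rtnetlink.h): applies the same expression
 * as the RTM_MAX macro, (((__RTM_MAX + 3) & ~3) - 1), to a sentinel value.
 * The sentinel is one past the last defined message type.
 */
static int toy_rtm_max(int sentinel)
{
	assert(sentinel >= 0);
	/* Round up to the next multiple of 4, then step back by one. */
	return ((sentinel + 3) & ~3) - 1;
}
```

With RTM_GETNSID = 90 as the last type, the sentinel is 91 and the result is 91. With RTM_GETAGEING = 94 as the last type, the sentinel is 95 and the result is 95.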
pull-request: mac80211 2015-08-14
Hi Dave,

I'm back from vacation, and found a single bugfix waiting. It's in this pull request, but I'm not quite up to speed as to what's happening with the release. If it goes in, great; if not I've already tagged it with Cc stable anyway.

Thanks,
johannes

The following changes since commit 923b352f19d9ea971ae2536eab55f5fc9e95fedf:

  cfg80211: use RTNL locked reg_can_beacon for IR-relaxation (2015-07-17 15:02:02 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git tags/mac80211-for-davem-2015-08-14

for you to fetch changes up to f5eeb5fa191fd7b634cbc4883ac58f3b2184dbc5:

  mac80211: fix invalid read in minstrel_sort_best_tp_rates() (2015-08-13 13:52:34 +0200)

We have a single bugfix for an invalid memory read.

Adrien Schildknecht (1):
      mac80211: fix invalid read in minstrel_sort_best_tp_rates()

 net/mac80211/rc80211_minstrel.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
Re: [PATCH 1/6] net/bonding: enable LRO if one device supports it
On Thu, Aug 13, 2015 at 02:02:55PM -0400, Jarod Wilson wrote:
> Currently, all bonding devices come up, and claim to have LRO support,
> which ethtool will let you toggle on and off, even if none of the
> underlying hardware devices actually support it. While the bonding driver
> takes precautions for slaves that don't support all features, this is at
> least a little bit misleading to users.
>
> If we add NETIF_F_LRO to the NETIF_F_ONE_FOR_ALL flags in
> netdev_features.h, then netdev_features_increment() will only enable LRO
> if 1) it's listed in the device's feature mask and 2) if there's actually a
> slave present that supports the feature.
>
> Note that this is going to require some follow-up patches, as not all LRO
> capable device drivers are currently properly reporting LRO support in
> their vlan_features, which is where the bonding driver picks up
> device-specific features.
>
> CC: David S. Miller <da...@davemloft.net>
> CC: Jiri Pirko <j...@resnulli.us>
> CC: Tom Herbert <therb...@google.com>
> CC: Scott Feldman <sfel...@gmail.com>
> CC: netdev@vger.kernel.org
> Signed-off-by: Jarod Wilson <ja...@redhat.com>
> ---
>  include/linux/netdev_features.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> index 9672781..6440bf1 100644
> --- a/include/linux/netdev_features.h
> +++ b/include/linux/netdev_features.h
> @@ -159,7 +159,8 @@ enum {
>   */
>  #define NETIF_F_ONE_FOR_ALL	(NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ROBUST | \
>  				 NETIF_F_SG | NETIF_F_HIGHDMA | \
> -				 NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED)
> +				 NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED | \
> +				 NETIF_F_LRO)
>
>  /*
>   * If one device doesn't support one of these features, then disable it
> --

I don't think this is going to work the way you expect. Assume we have a non-LRO eth1 and LRO capable eth2. If we enslave eth1 first, bond will lose NETIF_F_LRO so that while enslaving eth2, bond_enslave() does run

	if (!(bond_dev->features & NETIF_F_LRO))
		dev_disable_lro(slave_dev);

and disable LRO on eth2 even before computing the bond features so that in the end, all three interfaces end up with disabled LRO. If you add the slaves in the opposite order, you end up with eth2 and bond having LRO enabled. IMHO features should not depend on the order in which slaves are added into the bond.

You would need to remove the code quoted above to make things work the way you want (or move it after the call to bond_compute_features() which is effectively the same). But then the result would be even worse: adding an LRO-capable slave to a bond having dev_disable_lro() called on it would not disable LRO on that slave, possibly (or rather likely) causing communication breakage.

I believe NETIF_F_LRO in its original sense should be only considered for physical devices; even if it's not explicitly said in the commit message, the logic behind fbe168ba91f7 ("net: generic dev_disable_lro() stacked device handling") is that for stacked devices like bond or team, NETIF_F_LRO means "allow slaves to use LRO if they can and want" while its absence means "disable LRO on all slaves".

If you wanted NETIF_F_LRO for a bond to mean "there is at least one LRO capable slave", you would need a new flag for the "LRO should be disabled for all lower devices" state. I don't think it's worth the effort.

Michal Kubecek
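The order dependence described in this reply can be reproduced with a small user-space model. This is only a sketch under stated assumptions: `toy_dev`, `toy_bond`, `toy_bond_init` and `toy_enslave` are hypothetical names, the model tracks a single boolean instead of real feature masks, and it is not the bonding driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SLAVES 8

/* Toy model of a physical device; not the kernel's struct net_device. */
struct toy_dev {
	bool lro_on;	/* LRO currently enabled on this device */
};

/* Toy model of a bond; "lro" stands in for NETIF_F_LRO on the bond. */
struct toy_bond {
	bool lro;
	size_t nslaves;
	struct toy_dev *slaves[TOY_MAX_SLAVES];
};

static void toy_bond_init(struct toy_bond *bond)
{
	bond->lro = true;	/* a fresh bond claims LRO support */
	bond->nslaves = 0;
}

static void toy_enslave(struct toy_bond *bond, struct toy_dev *slave)
{
	size_t i;
	bool any = false;

	assert(bond->nslaves < TOY_MAX_SLAVES);

	/* Mirrors the quoted check: it runs before features are recomputed. */
	if (!bond->lro)
		slave->lro_on = false;

	bond->slaves[bond->nslaves++] = slave;

	/* "One for all" semantics: the bond keeps LRO if any slave has it. */
	for (i = 0; i < bond->nslaves; i++)
		any = any || bond->slaves[i]->lro_on;
	bond->lro = any;
}
```

Enslaving the non-LRO device first leaves the bond and both slaves without LRO, while the opposite order leaves LRO enabled on the bond and on the capable slave, which is the asymmetry the reply objects to.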
[PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.
Bridge devices have ageing interval used to age out MAC addresses from FDB. This ageing interval was not configuratble. Enable netlink based configuration of ageing interval for bridges and switch devices. The ageing interval changes the timer used to purge inactive FDB entries in bridges. The ageing interval config is propagated to switch devices, so that platform or hardware based ageing works according to configuration. Signed-off-by: Premkumar Jonnala pjonn...@broadcom.com --- diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 607b5f4..e3b0c45 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1053,7 +1053,16 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev, * This function is used to pass protocol port error state information * to the switch driver. The switch driver can react to the proto_down * by doing a phys down on the associated switch port. - * + * int (*ndo_bridge_setageing)(const struct net_device *dev, + *int ageing_interval); + * Called to set FDB aging interval for a given bridge device. + * int (*ndo_bridge_getageing_nl)(struct sk_buff *skb, + * const struct net_device *dev, + * struct netlink_callback *cb); + * Called to return the ageing interval for the given bridge device, + * in a format suitable for netlink messaging. + * int (*ndo_bridge_getageing)(const struct net_device *dev); + * Called to retrieve the ageing interval for the given bridge device. 
*/ struct net_device_ops { int (*ndo_init)(struct net_device *dev); @@ -1226,6 +1235,13 @@ struct net_device_ops { int (*ndo_get_iflink)(const struct net_device *dev); int (*ndo_change_proto_down)(struct net_device *dev, bool proto_down); + int (*ndo_bridge_setageing)(const struct net_device *dev, + int ageing_interval); + int (*ndo_bridge_getageing_nl)(struct sk_buff *skb, + const struct net_device *dev, + struct netlink_callback *cb); + + int (*ndo_bridge_getageing)(const struct net_device *dev); }; /** diff --git a/include/net/switchdev.h b/include/net/switchdev.h index 89da893..7186fea 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -129,6 +129,10 @@ int switchdev_port_attr_get(struct net_device *dev, struct switchdev_attr *attr); int switchdev_port_attr_set(struct net_device *dev, struct switchdev_attr *attr); +int netdev_switch_ageing_set(struct net_device *dev, int ageing_interval); +int netdev_switch_ageing_get(struct sk_buff *skb, +const struct net_device *dev, +struct netlink_callback *cb); int switchdev_port_obj_add(struct net_device *dev, struct switchdev_obj *obj); int switchdev_port_obj_del(struct net_device *dev, struct switchdev_obj *obj); int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj); @@ -163,6 +167,17 @@ void switchdev_port_fwd_mark_set(struct net_device *dev, #else +static inline int netdev_switch_ageing_set(struct net_device *dev, + int ageing_interval) +{ + return -EOPNOTSUPP; +} + +static inline int netdev_switch_ageing_get(struct net_device *dev) +{ + return -EOPNOTSUPP; +} + static inline int switchdev_port_attr_get(struct net_device *dev, struct switchdev_attr *attr) { diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h index 3635b77..a32ab4d 100644 --- a/include/uapi/linux/if_bridge.h +++ b/include/uapi/linux/if_bridge.h @@ -199,4 +199,23 @@ enum { }; #define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1) +struct admsg { + __u8 adm_family; + __u8 adm_pad1; + 
__u16 adm_pad2; + __s32 adm_ifindex; + __u16 adm_ageing_interval; +}; + +/* The value of this macro is based on the value recommended by IEEE + * standard 802.1d. + */ +#define MIN_AGEING_INTERVAL_SECS (10) + +/* The value of DEFAULT_AGEING_INTERVAL_SECS is the default ageing + * interval that was used in br_device.c. This default value is also + * recommended by IEEE Standard 802.1d. + */ +#define DEFAULT_AGEING_INTERVAL_SECS (300) + #endif /* _UAPI_LINUX_IF_BRIDGE_H */ diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 47d24cb..9321818 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -139,6 +139,13 @@ enum { RTM_GETNSID = 90, #define RTM_GETNSID RTM_GETNSID + RTM_SETAGEING = 92, +#define RTM_SETAGEING RTM_SETAGEING + RTM_SETDEFAULTAGEING = 93, +#define
Re: [PATCH] bridge: Enable configuration of ageing interval for bridges and switch devices.
On 8/13/15, 11:23 PM, Premkumar Jonnala wrote: Bridge devices have ageing interval used to age out MAC addresses from FDB. This ageing interval was not configuratble. Enable netlink based configuration of ageing interval for bridges and switch devices. The ageing interval changes the timer used to purge inactive FDB entries in bridges. The ageing interval config is propagated to switch devices, so that platform or hardware based ageing works according to configuration. Signed-off-by: Premkumar Jonnala pjonn...@broadcom.com How is this different from netlink attribute IFLA_BR_AGEING_TIME ? --- diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 607b5f4..e3b0c45 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1053,7 +1053,16 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev, *This function is used to pass protocol port error state information *to the switch driver. The switch driver can react to the proto_down * by doing a phys down on the associated switch port. - * + * int (*ndo_bridge_setageing)(const struct net_device *dev, + *int ageing_interval); + * Called to set FDB aging interval for a given bridge device. + * int (*ndo_bridge_getageing_nl)(struct sk_buff *skb, + * const struct net_device *dev, + * struct netlink_callback *cb); + * Called to return the ageing interval for the given bridge device, + * in a format suitable for netlink messaging. + * int (*ndo_bridge_getageing)(const struct net_device *dev); + * Called to retrieve the ageing interval for the given bridge device. 
*/ struct net_device_ops { int (*ndo_init)(struct net_device *dev); @@ -1226,6 +1235,13 @@ struct net_device_ops { int (*ndo_get_iflink)(const struct net_device *dev); int (*ndo_change_proto_down)(struct net_device *dev, bool proto_down); + int (*ndo_bridge_setageing)(const struct net_device *dev, + int ageing_interval); + int (*ndo_bridge_getageing_nl)(struct sk_buff *skb, + const struct net_device *dev, + struct netlink_callback *cb); + + int (*ndo_bridge_getageing)(const struct net_device *dev); }; you cannot add new ndo's for each of these. It should be covered as part of existing br_link_ops /** diff --git a/include/net/switchdev.h b/include/net/switchdev.h index 89da893..7186fea 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -129,6 +129,10 @@ int switchdev_port_attr_get(struct net_device *dev, struct switchdev_attr *attr); int switchdev_port_attr_set(struct net_device *dev, struct switchdev_attr *attr); +int netdev_switch_ageing_set(struct net_device *dev, int ageing_interval); +int netdev_switch_ageing_get(struct sk_buff *skb, +const struct net_device *dev, +struct netlink_callback *cb); int switchdev_port_obj_add(struct net_device *dev, struct switchdev_obj *obj); int switchdev_port_obj_del(struct net_device *dev, struct switchdev_obj *obj); int switchdev_port_obj_dump(struct net_device *dev, struct switchdev_obj *obj); @@ -163,6 +167,17 @@ void switchdev_port_fwd_mark_set(struct net_device *dev, #else +static inline int netdev_switch_ageing_set(struct net_device *dev, + int ageing_interval) +{ + return -EOPNOTSUPP; +} + +static inline int netdev_switch_ageing_get(struct net_device *dev) +{ + return -EOPNOTSUPP; +} + static inline int switchdev_port_attr_get(struct net_device *dev, struct switchdev_attr *attr) { diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h index 3635b77..a32ab4d 100644 --- a/include/uapi/linux/if_bridge.h +++ b/include/uapi/linux/if_bridge.h @@ -199,4 +199,23 @@ enum { }; #define 
MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1) +struct admsg { + __u8 adm_family; + __u8 adm_pad1; + __u16 adm_pad2; + __s32 adm_ifindex; + __u16 adm_ageing_interval; +}; + +/* The value of this macro is based on the value recommended by IEEE + * standard 802.1d. + */ +#define MIN_AGEING_INTERVAL_SECS (10) + +/* The value of DEFAULT_AGEING_INTERVAL_SECS is the default ageing + * interval that was used in br_device.c. This default value is also + * recommended by IEEE Standard 802.1d. + */ +#define DEFAULT_AGEING_INTERVAL_SECS (300) + #endif /* _UAPI_LINUX_IF_BRIDGE_H */ diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 47d24cb..9321818 100644 --- a/include/uapi/linux/rtnetlink.h +++
Re: [PATCH net-next 1/3] lwt: Add support to redirect dst.input
On 8/13/15, 9:54 AM, Tom Herbert wrote:
> This patch adds the capability to redirect dst input in the same way
> that dst output is redirected by LWT.
>
> Also, save the original dst.input and dst.out when setting up
> lwtunnel redirection. These can be called by the client as a
> pass-through.
>
> Signed-off-by: Tom Herbert <t...@herbertland.com>
> ---

LGTM.

Acked-by: Roopa Prabhu <ro...@cumulusnetworks.com>

thanks,
Roopa
Re: [PATCH] iwlwifi: out-of-bounds access in iwl_init_sband_channels
Adrien Schildknecht <adrien+...@schischi.me> writes:

> Hi,
>
>> On 08/14/2015 03:36 AM, Adrien Schildknecht wrote:
>>> Both loops of this function compare data from the 'chan' array and then
>>> check if the index is valid. The 2 conditions should be inverted to
>>> avoid an out-of-bounds access.
>>
>> Was that found by a static analyzer or any other automated tool, or
>> was that the result of your very careful review?
>
> The error has been reported by KASan:
> ==
> BUG: KASan: out of bounds access in iwl_init_sband_channels+0x207/0x260 [iwlwifi] at addr 8800c2d0aac8
> Read of size 4 by task modprobe/329
> ==

Always try to add information like this to the commit log, it's very useful.

--
Kalle Valo
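The fix discussed in this thread amounts to testing the index before reading the array element. The following is a minimal sketch of that ordering, not the iwlwifi code itself; `toy_band`, `toy_channel` and `toy_skip_other_bands` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum toy_band { TOY_BAND_2GHZ, TOY_BAND_5GHZ };

struct toy_channel {
	enum toy_band band;
};

/*
 * Advance idx past channels that do not belong to `band`.
 * The bounds test comes first, so chan[idx] is only read when idx < n.
 * Writing the condition the other way round would read chan[n] once
 * idx reaches the end of the array.
 */
static size_t toy_skip_other_bands(const struct toy_channel *chan, size_t n,
				   size_t idx, enum toy_band band)
{
	assert(chan != NULL || n == 0);

	while (idx < n && chan[idx].band != band)
		idx++;
	return idx;
}
```

Because `&&` short-circuits, the element access is never evaluated for an out-of-range index, which is the kind of read KASan flagged in the report above.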
[PATCH 1/1] Revert net: fec: Ensure clocks are enabled while using mdio bus
It causes the i.mx6sx sdb board hang when using nfsroot during boots up at v4.2-rc6. This reverts commit 8fff755e9f8d0f70a595e79f248695ce6aef5cc3. Cc: netdev@vger.kernel.org Cc: Fugang Duan b38...@freescale.com Cc: shawn@linaro.org Cc: fabio.este...@freescale.com Cc: tyler.ba...@linaro.org Cc: Lucas Stach l.st...@pengutronix.de Cc: Andrew Lunn and...@lunn.ch Signed-off-by: Peter Chen peter.c...@freescale.com --- According to Fugang Duan, the i.mx series has different clock control sequence among SoCs, this patch may only consider certain SoCs. drivers/net/ethernet/freescale/fec_main.c | 89 +-- 1 file changed, 13 insertions(+), 76 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 32e3807c..5e8b837 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -24,7 +24,6 @@ #include linux/module.h #include linux/kernel.h #include linux/string.h -#include linux/pm_runtime.h #include linux/ptrace.h #include linux/errno.h #include linux/ioport.h @@ -78,7 +77,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev); #define FEC_ENET_RAEM_V0x8 #define FEC_ENET_RAFL_V0x8 #define FEC_ENET_OPD_V 0xFFF0 -#define FEC_MDIO_PM_TIMEOUT 100 /* ms */ static struct platform_device_id fec_devtype[] = { { @@ -1769,13 +1767,7 @@ static void fec_enet_adjust_link(struct net_device *ndev) static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1791,30 +1783,18 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO read timeout\n); - ret = -ETIMEDOUT; - goto out; + return -ETIMEDOUT; } - ret = 
FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); - -out: - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + /* return value */ + return FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); } static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, u16 value) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1831,13 +1811,10 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO write timeout\n); - ret = -ETIMEDOUT; + return -ETIMEDOUT; } - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + return 0; } static int fec_enet_clk_enable(struct net_device *ndev, bool enable) @@ -1849,6 +1826,9 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) ret = clk_prepare_enable(fep-clk_ahb); if (ret) return ret; + ret = clk_prepare_enable(fep-clk_ipg); + if (ret) + goto failed_clk_ipg; if (fep-clk_enet_out) { ret = clk_prepare_enable(fep-clk_enet_out); if (ret) @@ -1872,6 +1852,7 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) } } else { clk_disable_unprepare(fep-clk_ahb); + clk_disable_unprepare(fep-clk_ipg); if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); if (fep-clk_ptp) { @@ -1893,6 +1874,8 @@ failed_clk_ptp: if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); failed_clk_enet_out: + clk_disable_unprepare(fep-clk_ipg); +failed_clk_ipg: clk_disable_unprepare(fep-clk_ahb); return ret; @@ -2864,14 +2847,10 @@ fec_enet_open(struct net_device *ndev) struct fec_enet_private *fep = netdev_priv(ndev); int ret; - ret = pm_runtime_get_sync(fep-pdev-dev); - if (IS_ERR_VALUE(ret)) - return ret; - pinctrl_pm_select_default_state(fep-pdev-dev); ret = 
fec_enet_clk_enable(ndev, true); if (ret) - goto clk_enable; + return ret; /*
Re: [PATCH 1/2] average: provide macro to create static EWMA
On Thu, 2015-08-13 at 17:26 -0700, David Miller wrote:
> From: Johannes Berg <johan...@sipsolutions.net>
> Date: Thu, 13 Aug 2015 11:11:48 +0200
>
>> From: Johannes Berg <johannes.b...@intel.com>
>>
>> Having the EWMA parameters stored in the runtime struct imposes
>> memory requirements for the constant values that could just be
>> inlined in the code. This particularly makes sense if there are
>> a lot of such structs, for example in mac80211 in the station
>> table where each station has a number of these in an array, and
>> there can be many stations.
>>
>> Provide a macro DECLARE_EWMA() that declares the necessary struct
>> and inline functions to access it with the parameters hard-coded;
>> using this also means the user no longer needs to 'select AVERAGE'
>> as it's entirely self-contained.
>>
>> In the mac80211 case, on x86-64, this actually slightly *reduces*
>> code size, while also saving 80 bytes of runtime memory per sta.
>>
>> Signed-off-by: Johannes Berg <johannes.b...@intel.com>
>> ---
>> As the next patch relies on this, I'll take this through my tree
>> unless I hear objections.
>
> This looks fine to me.

Thanks, I've applied both.

johannes
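The idea in the quoted commit message, generating a struct and accessors with the averaging parameters baked in at compile time, can be sketched in user space as follows. This is an illustrative approximation and not the kernel's DECLARE_EWMA(): `TOY_DECLARE_EWMA` and the `toy_ewma_*` names are hypothetical, and the parameters are passed as shift counts (log2 of factor and weight) as a simplifying assumption.

```c
#include <assert.h>

/*
 * Toy variant of the idea: factor and weight are macro arguments, given
 * here as shift counts, so the generated struct holds only the running
 * value and no per-instance parameters.
 */
#define TOY_DECLARE_EWMA(name, factor_shift, weight_shift)		\
struct toy_ewma_##name {						\
	unsigned long internal;						\
};									\
static inline void toy_ewma_##name##_init(struct toy_ewma_##name *e)	\
{									\
	e->internal = 0;						\
}									\
static inline unsigned long						\
toy_ewma_##name##_read(const struct toy_ewma_##name *e)		\
{									\
	return e->internal >> (factor_shift);				\
}									\
static inline void							\
toy_ewma_##name##_add(struct toy_ewma_##name *e, unsigned long val)	\
{									\
	unsigned long cur = e->internal;				\
									\
	assert(val <= (~0UL >> (factor_shift)));			\
	e->internal = cur ?						\
		(((cur << (weight_shift)) - cur) +			\
		 (val << (factor_shift))) >> (weight_shift) :		\
		(val << (factor_shift));				\
}

/* Factor 8 (shift 3), weight 4 (shift 2): new = 3/4 * old + 1/4 * sample. */
TOY_DECLARE_EWMA(signal, 3, 2)
```

Each `struct toy_ewma_signal` is a single `unsigned long`, which illustrates where the per-station memory saving mentioned in the commit message would come from when many such averages are kept.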
RE: [PATCH 1/1] Revert net: fec: Ensure clocks are enabled while using mdio bus
From: Peter Chen <peter.c...@freescale.com> Sent: Friday, August 14, 2015 1:48 PM
> To: da...@davemloft.net
> Cc: Chen Peter-B29397; netdev@vger.kernel.org; Duan Fugang-B38611;
> shawn@linaro.org; Estevam Fabio-R49496; tyler.ba...@linaro.org; Lucas
> Stach; Andrew Lunn
> Subject: [PATCH 1/1] Revert net: fec: Ensure clocks are enabled while using
> mdio bus
>
> It causes the i.mx6sx sdb board hang when using nfsroot during boots up at
> v4.2-rc6.
>
> This reverts commit 8fff755e9f8d0f70a595e79f248695ce6aef5cc3.
>
> Cc: netdev@vger.kernel.org
> Cc: Fugang Duan <b38...@freescale.com>
> Cc: shawn@linaro.org
> Cc: fabio.este...@freescale.com
> Cc: tyler.ba...@linaro.org
> Cc: Lucas Stach <l.st...@pengutronix.de>
> Cc: Andrew Lunn <and...@lunn.ch>
> Signed-off-by: Peter Chen <peter.c...@freescale.com>
> ---
> According to Fugang Duan, the i.mx series has different clock control
> sequence among SoCs, this patch may only consider certain SoCs.
>
>  drivers/net/ethernet/freescale/fec_main.c | 89 +--
>  1 file changed, 13 insertions(+), 76 deletions(-)

I suggest reverting the patch. The current patch doesn't consider the i.MX6sx/i.MX7d... chips.

Since some customers require that the MDIO bus can be used independently of the MAC itself, I will submit an MDIO driver that separates the MDIO bus from the MAC driver.

Regards,
Andy
Re: [PATCH v2 4/4] Added getsynctime64() callback
On Thu, Aug 13, 2015 at 09:10:36PM +, Hall, Christopher S wrote:
>>> +	if (!cpu_has_art)
>>> +		return -EOPNOTSUPP;
>>
>> Perform this check before registration, setting .getsynctime64
>> accordingly.
>
> The problem here is that ART initialization doesn't happen until we
> install TSC as a clocksource. This design is per Thomas' suggestion.
> That occurs after the driver is loaded (as a module).

So that 'cpu_has_art' actually means 'cpu_has_art_and_has_been_initialized'?

In any case, returning EOPNOTSUPP early on, but OK later seems mean to me. If the clocks aren't ready yet, the error should be EBUSY so that user space knows it can try again.

Thanks,
Richard
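The distinction suggested here, a permanent "not supported" versus a transient "not ready yet", might look roughly like the sketch below. It assumes the capability and the initialization state can be tracked separately; `toy_art_state` and `toy_getsynctime` are hypothetical names and do not come from the patch under review.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state split: capability versus "already initialized". */
struct toy_art_state {
	bool cpu_has_art;	/* hardware capability, never changes */
	bool art_initialized;	/* becomes true once the clocksource is set up */
};

/*
 * Returns 0 when a synchronized time could be read, -EOPNOTSUPP when the
 * hardware can never provide it, and -EBUSY when it cannot provide it yet,
 * so that a caller knows a retry may succeed.
 */
static int toy_getsynctime(const struct toy_art_state *st)
{
	assert(st != NULL);

	if (!st->cpu_has_art)
		return -EOPNOTSUPP;	/* permanent condition */
	if (!st->art_initialized)
		return -EBUSY;		/* transient condition */
	return 0;
}
```

A user-space caller could then treat EBUSY as a cue to retry later and EOPNOTSUPP as a reason to fall back to another method.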
Re: [PATCH] iwlwifi: out-of-bounds access in iwl_init_sband_channels
Hi,

> On 08/14/2015 03:36 AM, Adrien Schildknecht wrote:
>> Both loops of this function compare data from the 'chan' array and then
>> check if the index is valid. The 2 conditions should be inverted to
>> avoid an out-of-bounds access.
>
> Was that found by a static analyzer or any other automated tool, or
> was that the result of your very careful review?

The error has been reported by KASan:
==
BUG: KASan: out of bounds access in iwl_init_sband_channels+0x207/0x260 [iwlwifi] at addr 8800c2d0aac8
Read of size 4 by task modprobe/329
==

--
Adrien Schildknecht
[net-next:master 751/762] DockBook: Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   d52736e24fe2e927c26817256f8d1a3c8b5d51a0
commit: 4e3c89920cd3a6cfce22c6f537690747c26128dd [751/762] net: Introduce VRF related flags and helpers
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

   Warning(include/linux/skbuff.h:833): No description found for parameter 'sk'
   Warning(net/core/skbuff.c:407): No description found for parameter 'len'
   Warning(net/core/skbuff.c:407): Excess function parameter 'length' description in '__netdev_alloc_skb'
   Warning(net/core/skbuff.c:476): No description found for parameter 'len'
   Warning(net/core/skbuff.c:476): Excess function parameter 'length' description in '__napi_alloc_skb'
   Warning(net/core/gen_stats.c:155): No description found for parameter 'cpu'
   Warning(net/core/gen_estimator.c:212): No description found for parameter 'cpu_bstats'
   Warning(net/core/gen_estimator.c:303): No description found for parameter 'cpu_bstats'
   Warning(net/core/dev.c:2921): No description found for parameter 'sk'
   Warning(net/core/dev.c:3986): No description found for parameter 'sk'
   Warning(net/core/dev.c:6078): No description found for parameter 'len'
   Warning(include/linux/netdevice.h:1293): Enum value 'IFF_XMIT_DST_RELEASE_PERM' not described in enum 'netdev_priv_flags'
   Warning(include/linux/netdevice.h:1293): Enum value 'IFF_IPVLAN_MASTER' not described in enum 'netdev_priv_flags'
   Warning(include/linux/netdevice.h:1293): Enum value 'IFF_IPVLAN_SLAVE' not described in enum 'netdev_priv_flags'
>> Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'
   Warning(include/linux/netdevice.h:1795): No description found for parameter 'ptype_all'
   Warning(include/linux/netdevice.h:1795): No description found for parameter 'ptype_specific'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.2.0-rc6 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT=elf32-i386 CONFIG_ARCH_DEFCONFIG=arch/x86/configs/i386_defconfig CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_HWEIGHT_CFLAGS=-fcall-saved-ecx -fcall-saved-edx CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE= # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION= # CONFIG_LOCALVERSION_AUTO is not set CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_GZIP=y # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME=(none) # CONFIG_SYSVIPC is not set # CONFIG_CROSS_MEMORY_ATTACH is not set # CONFIG_FHANDLE is not set # CONFIG_USELIB is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y 
CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not set # CONFIG_NO_HZ is not set # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # # RCU Subsystem # CONFIG_TINY_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set # CONFIG_RCU_STALL_COMMON is not set # CONFIG_TREE_RCU_TRACE is not set # CONFIG_RCU_EXPEDITE_BOOT is not set # CONFIG_BUILD_BIN2C is not set # CONFIG_IKCONFIG is not set CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y # CONFIG_CGROUPS is not set # CONFIG_CHECKPOINT_RESTORE is not set # CONFIG_SCHED_AUTOGROUP is not set # CONFIG_RELAY is
Re: [PATCH net-next 1/3] lwt: Add support to redirect dst.input
On 13/08/15 17:54, Tom Herbert wrote:
> This patch adds the capability to redirect dst input in the same way
> that dst output is redirected by LWT.
>
> Also, save the original dst.input and dst.out when setting up
> lwtunnel redirection. These can be called by the client as a
> pass-through.

The lwt state is refcounted so it can be shared by different dst contexts, so is it safe to be storing per-dst state in the lwt state?

Otherwise, it looks good.

Thanks,
Rob
4.1.5 oops in alloc_skb_with_frags
Hi, I got a series of (same) Oopses on a fresh 4.1.5 on KDE startup: Aug 14 08:45:38 gandalf kernel: PGD 0 Aug 14 08:45:38 gandalf kernel: Oops: [#1] PREEMPT SMP Aug 14 08:45:38 gandalf kernel: Modules linked in: radeon cfbfillrect cfbimgblt cfbcopyarea fbcon i2c_algo_bit bit blit softcursor font drm_kms_helper ttm drm fb fbdev Aug 14 08:45:38 gandalf kernel: CPU: 2 PID: 2726 Comm: X Not tainted 4.1.5 #1 Aug 14 08:45:38 gandalf kernel: Hardware name: Apple Inc. iMac11,2/Mac-F2238AC8, BIOS IM112.88Z.0057.B00.100503 1455 05/03/10 Aug 14 08:45:38 gandalf kernel: task: 880092e8b020 ti: 880135248000 task.ti: 880135248000 Aug 14 08:45:38 gandalf kernel: RIP: 0010:[811b41a6] [811b41a6] __kmalloc_track_caller+0x76/0 x190 Aug 14 08:45:38 gandalf kernel: RSP: 0018:88013524ba88 EFLAGS: 00010202 Aug 14 08:45:38 gandalf kernel: RAX: RBX: 8800812e2f00 RCX: 9042 Aug 14 08:45:38 gandalf kernel: RDX: 903a RSI: 903a RDI: 02bf Aug 14 08:45:38 gandalf kernel: RBP: 88013524bac8 R08: 00018d80 R09: 0003 Aug 14 08:45:38 gandalf kernel: R10: 7000 R11: 0160 R12: 02c0 Aug 14 08:45:38 gandalf kernel: R13: 000106d0 R14: 01d6800b R15: 880137001780 Aug 14 08:45:38 gandalf kernel: FS: 7f5233ea1880() GS:88013bc8() knlGS: Aug 14 08:45:38 gandalf kernel: CS: 0010 DS: ES: CR0: 80050033 Aug 14 08:45:38 gandalf kernel: CR2: 01d6800b CR3: 00013528b000 CR4: 06e0 Aug 14 08:45:38 gandalf kernel: Stack: Aug 14 08:45:38 gandalf kernel: 8801 8164defd 88013524bae8 8800812e2f00 Aug 14 08:45:38 gandalf kernel: 88013524bb27 04d0 02c0 Aug 14 08:45:38 gandalf kernel: 88013524bb08 8164de3c 880136799300 8800812e2f00 Aug 14 08:45:38 gandalf kernel: Call Trace: Aug 14 08:45:38 gandalf kernel: [8164defd] ? __alloc_skb+0x6d/0x1c0 Aug 14 08:45:38 gandalf kernel: [8164de3c] __kmalloc_reserve.isra.43+0x2c/0x80 Aug 14 08:45:38 gandalf kernel: [8164defd] __alloc_skb+0x6d/0x1c0 Aug 14 08:45:38 gandalf kernel: [8164e0a7] alloc_skb_with_frags+0x57/0x200 Aug 14 08:45:38 gandalf kernel: [81649e83] ? 
sock_wfree+0x53/0x60 Aug 14 08:45:38 gandalf kernel: [81647dd6] sock_alloc_send_pskb+0x196/0x240 Aug 14 08:45:38 gandalf kernel: [816531bf] ? skb_copy_datagram_from_iter+0x4f/0x1f0 Aug 14 08:45:38 gandalf kernel: [8174b55a] unix_stream_sendmsg+0x25a/0x3a0 Aug 14 08:45:38 gandalf kernel: [81644c52] sock_sendmsg+0x12/0x20 Aug 14 08:45:38 gandalf kernel: [81644cd3] sock_write_iter+0x73/0xd0 Aug 14 08:45:38 gandalf kernel: [811bc3d4] do_iter_readv_writev+0x54/0x70 Aug 14 08:45:38 gandalf kernel: [811bca96] do_readv_writev+0x196/0x230 Aug 14 08:45:38 gandalf kernel: [811d7f80] ? __fget_light+0x20/0x70 Aug 14 08:45:38 gandalf kernel: [811d7f0d] ? __fget+0x6d/0xa0 Aug 14 08:45:38 gandalf kernel: [811bcba4] vfs_writev+0x34/0x50 Aug 14 08:45:38 gandalf kernel: [811bd875] SyS_writev+0x45/0xd0 Aug 14 08:45:38 gandalf kernel: [8189f197] system_call_fastpath+0x12/0x6a Aug 14 08:45:38 gandalf kernel: Code: 48 89 c8 65 48 03 05 8a 5f e5 7e 48 8b 70 08 48 39 f2 75 e7 4c 8b 30 4d 85 f6 0f 84 bc 00 00 00 49 63 47 20 48 8d 4a 08 4d 8b 07 49 8b 1c 06 4c 89 f0 65 49 0f c7 08 0f 94 c0 84 c0 74 ba 49 63 Aug 14 08:45:38 gandalf kernel: RSP 88013524ba88 Aug 14 08:45:38 gandalf kernel: CR2: 01d6800b Aug 14 08:45:38 gandalf kernel: ---[ end trace 3cf0da471519df5e ]--- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] mm: make page pfmemalloc check more robust
On 08/13/2015 04:40 PM, Eric Dumazet wrote:
On Thu, 2015-08-13 at 11:13 +0200, Vlastimil Babka wrote:
Given that this apparently isn't the first case of this localhost issue, I wonder if network code should just clear skb->pfmemalloc during send (or maybe just send over localhost). That would be probably easier than distinguish the __skb_fill_page_desc() callers for send vs receive.

Would this still be needed after this patch?

Not until another corner case is discovered :) Or something passes a genuine pfmemalloc page to a socket (sending contents of some slab objects perhaps, where the slab page was allocated as pfmemalloc? Dunno if that can happen right now).

It is sad we do not have a SNMP counter to at least count how often we drop skb because pfmemalloc is set. I'll provide such a patch.
Re: [PATCH] IGMP: Inhibit reports for local multicast groups
Hi Philip

So with a bit of poking and prodding, we have a much better understanding as to why this is O.K. Maybe your next patch can quote the relevant RFCs and have a much fuller commit message?

Thanks
Andrew
Re: [PATCH 6/6] MIPS: net: BPF: Introduce BPF ASM helpers
On Thu, Aug 13, 2015 at 10:42:46PM +0200, Aurelien Jarno wrote:
This patch relies on R2 instructions, and thus the Linux kernel fails to build when targeting non-R2 CPUs. See for example:
https://buildd.debian.org/status/fetch.php?pkg=linux&arch=mipsel&ver=4.2%7Erc6-1%7Eexp1&stamp=143948
--
Aurelien Jarno GPG: 4096R/1DDD8C9B aurel...@aurel32.net http://www.aurel32.net

Hi,

I think Ralf may have a fix for R1 cores but I am not sure about the status of that patch. Ralf?

--
markos
Re: [PATCH] net/fsl: simplify Kconfig dependency list for fsl networking
On Fri, Aug 14, 2015 at 12:01 AM, Stuart Yoder stuart.yo...@freescale.com wrote:
make the list of Kconfig dependencies for Freescale networking more general. Simplify to supported architectures: ARM, ARM64, PPC, M68K

Signed-off-by: Stuart Yoder stuart.yo...@freescale.com
---
 drivers/net/ethernet/freescale/Kconfig | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index ff76d4e..70782d7 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -5,9 +5,7 @@
 config NET_VENDOR_FREESCALE
 	bool "Freescale devices"
 	default y
-	depends on FSL_SOC || QUICC_ENGINE || CPM1 || CPM2 || PPC_MPC512x || \
-		   M523x || M527x || M5272 || M528x || M520x || M532x || \
-		   ARCH_MXC || ARCH_MXS || (PPC_MPC52xx && PPC_BESTCOMM)
+	depends on M68K || PPC || ARM || ARM64
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y.

This breaks m68k/allmodconfig:

drivers/net/ethernet/freescale/gianfar.c: In function ‘gfar_parse_group’:
drivers/net/ethernet/freescale/gianfar.c:684: error: ‘NO_IRQ’ undeclared (first use in this function)
drivers/net/ethernet/freescale/gianfar.c:684: error: (Each undeclared identifier is reported only once
drivers/net/ethernet/freescale/gianfar.c:684: error: for each function it appears in.)

P.S. Hint: Would have been caught earlier if the NET_VENDOR_* symbol had || COMPILE_TEST among its dependencies.

Gr{oetje,eeting}s,
Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say programmer or something like that.
-- Linus Torvalds
Re: [PATCH 1/1] Revert net: fec: Ensure clocks are enabled while using mdio bus
Am Freitag, den 14.08.2015, 08:25 + schrieb Peter Chen: Am Freitag, den 14.08.2015, 13:47 +0800 schrieb Peter Chen: It causes the i.mx6sx sdb board hang when using nfsroot during boots up at v4.2-rc6. This reverts commit 8fff755e9f8d0f70a595e79f248695ce6aef5cc3. Cc: netdev@vger.kernel.org Cc: Fugang Duan b38...@freescale.com Cc: shawn@linaro.org Cc: fabio.este...@freescale.com Cc: tyler.ba...@linaro.org Cc: Lucas Stach l.st...@pengutronix.de Cc: Andrew Lunn and...@lunn.ch Signed-off-by: Peter Chen peter.c...@freescale.com --- According to Fugang Duan, the i.mx series has different clock control sequence among SoCs, this patch may only consider certain SoCs. Sorry, but NACK. Please test current mainline (what will become v4.2-rc7). There is already a patch in that fixes i.MX27 and probably fixes the same problem on i.MX6SX. Would you help point to me which commit and at which tree? Mainline, so Linus Torvalds tree. http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=14d2b7c1a96ef37eb571599c73d4a1a606b964d6 Regards, Lucas Peter drivers/net/ethernet/freescale/fec_main.c | 89 +-- 1 file changed, 13 insertions(+), 76 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 32e3807c..5e8b837 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -24,7 +24,6 @@ #include linux/module.h #include linux/kernel.h #include linux/string.h -#include linux/pm_runtime.h #include linux/ptrace.h #include linux/errno.h #include linux/ioport.h @@ -78,7 +77,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev); #define FEC_ENET_RAEM_V 0x8 #define FEC_ENET_RAFL_V 0x8 #define FEC_ENET_OPD_V 0xFFF0 -#define FEC_MDIO_PM_TIMEOUT 100 /* ms */ static struct platform_device_id fec_devtype[] = { { @@ -1769,13 +1767,7 @@ static void fec_enet_adjust_link(struct net_device *ndev) static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) { 
struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1791,30 +1783,18 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO read timeout\n); - ret = -ETIMEDOUT; - goto out; + return -ETIMEDOUT; } - ret = FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); - -out: - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + /* return value */ + return FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); } static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, u16 value) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1831,13 +1811,10 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO write timeout\n); - ret = -ETIMEDOUT; + return -ETIMEDOUT; } - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + return 0; } static int fec_enet_clk_enable(struct net_device *ndev, bool enable) @@ -1849,6 +1826,9 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) ret = clk_prepare_enable(fep-clk_ahb); if (ret) return ret; + ret = clk_prepare_enable(fep-clk_ipg); + if (ret) + goto failed_clk_ipg; if (fep-clk_enet_out) { ret = clk_prepare_enable(fep-clk_enet_out); if (ret) @@ -1872,6 +1852,7 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) } } else { clk_disable_unprepare(fep-clk_ahb); + clk_disable_unprepare(fep-clk_ipg); if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); if 
(fep-clk_ptp) { @@ -1893,6 +1874,8 @@ failed_clk_ptp: if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); failed_clk_enet_out: +
[PATCH v2] iwlwifi: out-of-bounds access in iwl_init_sband_channels
KASan error report:
==
BUG: KASan: out of bounds access in iwl_init_sband_channels+0x207/0x260 [iwlwifi] at addr 8800c2d0aac8
Read of size 4 by task modprobe/329
==

Both loops of this function compare data from the 'chan' array and then check if the index is valid. The 2 conditions should be inverted to avoid an out-of-bounds access.

Signed-off-by: Adrien Schildknecht adrien+...@schischi.me
---
 drivers/net/wireless/iwlwifi/iwl-eeprom-parse.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-eeprom-parse.c b/drivers/net/wireless/iwlwifi/iwl-eeprom-parse.c
index 21302b6..acc3d18 100644
--- a/drivers/net/wireless/iwlwifi/iwl-eeprom-parse.c
+++ b/drivers/net/wireless/iwlwifi/iwl-eeprom-parse.c
@@ -713,12 +713,12 @@ int iwl_init_sband_channels(struct iwl_nvm_data *data,
 	struct ieee80211_channel *chan = &data->channels[0];
 	int n = 0, idx = 0;

-	while (chan->band != band && idx < n_channels)
+	while (idx < n_channels && chan->band != band)
 		chan = &data->channels[++idx];

 	sband->channels = &data->channels[idx];

-	while (chan->band == band && idx < n_channels) {
+	while (idx < n_channels && chan->band == band) {
 		chan = &data->channels[++idx];
 		n++;
 	}
--
2.5.0
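The ordering issue described in this patch can be illustrated outside the kernel. The following is only a userspace sketch under stated assumptions: struct channel and count_band_channels() are hypothetical stand-ins and not iwlwifi code. It shows the index test placed before the element read, so that short-circuit evaluation of && keeps the read inside the array.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ieee80211_channel; only the band matters here. */
struct channel {
	int band;
};

/*
 * Count the contiguous run of entries belonging to `band`, starting at the
 * first match. *first receives the index where that run begins, or
 * n_channels if the band never appears.
 */
size_t count_band_channels(const struct channel *chans, size_t n_channels,
			   int band, size_t *first)
{
	size_t idx = 0, n = 0;

	/* Index test first: chans[idx] is only read when idx is in range. */
	while (idx < n_channels && chans[idx].band != band)
		idx++;

	*first = idx;

	while (idx < n_channels && chans[idx].band == band) {
		idx++;
		n++;
	}

	return n;
}
```

With the operands swapped, the band comparison would be evaluated once with idx equal to n_channels, which is the kind of read KASan reports above.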
Re: GCOV_PROFILE_ALL breaks BUILD_BUG_ON(!is_power_of_2(8))
On Fri, 2015-08-14 at 11:00 +0200, Michal Kubecek wrote:
but should I have expected this?

It might have something to do with the fact that is_power_of_2() being an inline function, perhaps with this compiler option it translates to something that can't be used in the context BUILD_BUG_ON() uses it in.

Evidently, yeah.

There is a BUILD_BUG_ON_NOT_POWER_OF_2() macro you could use.

Good point, I'll do that, thanks.

johannes
Re: Linux kernel commit breaks IPMI on iface downing
The BNX2 firmware has already been updated to the latest version available from Dell.

root@debian:~# ethtool -i eth0 | grep firmware
firmware-version: 6.4.5 bc 5.2.3 NCSI 2.0.11

2015-08-14 3:30 GMT+02:00 Michael Chan mc...@broadcom.com:
+netdev and Harish who is the current maintainer of bnx2 at qlogic.

The patch in question effectively just removes the bnx2_set_power_state() call during ip link set down. If there is IPMI, the firmware should know the link needs to stay up when the driver resets the device during bnx2_close(). This should be a very common scenario.

Please provide the firmware versions to Harish with ethtool -i. Perhaps upgrading the firmware can resolve this issue.

On Thu, 2015-08-13 at 15:47 +0200, Sébastien Bocahu wrote:
Hi,

Being unable to install Debian Jessie via IPMI on mainstream Dell R410 servers that used to be well supported by Debian Wheezy, I tracked the problem down to a specific commit in the Linux kernel, specifically in the bnx2 driver.

The issue is that ip link set eth0 down takes the Ethernet part of the BMC down (shared NIC for BMC+eth0), cutting off the IPMI session. The BMC gets back only after power cycling.

Hardware: Dell R410 w/ a Broadcom 5716 NIC:
Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
Part number: BCM95716C1
Vendor specific: 6.4.5
(Firmware has been updated to the latest version available by Dell's support website)

I built many kernels to track which change caused the issue and it seems to be:
25bfb1dd4ba3b2d9a49ce9d9b0cd7be1840e15ed (bnx2: Add pci shutdown handler.)

Before, 'ip link set eth0 down' would just cause 2/3s of packet loss but IPMI would still be working afterwards.

I'm available for more information and/or testing if needed.

Thanks!
--
Sébastien Bocahu
IT infrastructure manager
4, Rue Montrochet - 69002 - Lyon, France
+33 (0)437651704 - Phone
ReportLinker.com
Re: [PATCH 1/1] Revert net: fec: Ensure clocks are enabled while using mdio bus
Am Freitag, den 14.08.2015, 13:47 +0800 schrieb Peter Chen: It causes the i.mx6sx sdb board hang when using nfsroot during boots up at v4.2-rc6. This reverts commit 8fff755e9f8d0f70a595e79f248695ce6aef5cc3. Cc: netdev@vger.kernel.org Cc: Fugang Duan b38...@freescale.com Cc: shawn@linaro.org Cc: fabio.este...@freescale.com Cc: tyler.ba...@linaro.org Cc: Lucas Stach l.st...@pengutronix.de Cc: Andrew Lunn and...@lunn.ch Signed-off-by: Peter Chen peter.c...@freescale.com --- According to Fugang Duan, the i.mx series has different clock control sequence among SoCs, this patch may only consider certain SoCs. Sorry, but NACK. Please test current mainline (what will become v4.2-rc7). There is already a patch in that fixes i.MX27 and probably fixes the same problem on i.MX6SX. drivers/net/ethernet/freescale/fec_main.c | 89 +-- 1 file changed, 13 insertions(+), 76 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 32e3807c..5e8b837 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -24,7 +24,6 @@ #include linux/module.h #include linux/kernel.h #include linux/string.h -#include linux/pm_runtime.h #include linux/ptrace.h #include linux/errno.h #include linux/ioport.h @@ -78,7 +77,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev); #define FEC_ENET_RAEM_V 0x8 #define FEC_ENET_RAFL_V 0x8 #define FEC_ENET_OPD_V 0xFFF0 -#define FEC_MDIO_PM_TIMEOUT 100 /* ms */ static struct platform_device_id fec_devtype[] = { { @@ -1769,13 +1767,7 @@ static void fec_enet_adjust_link(struct net_device *ndev) static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1791,30 +1783,18 @@ static int 
fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO read timeout\n); - ret = -ETIMEDOUT; - goto out; + return -ETIMEDOUT; } - ret = FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); - -out: - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + /* return value */ + return FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); } static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, u16 value) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1831,13 +1811,10 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO write timeout\n); - ret = -ETIMEDOUT; + return -ETIMEDOUT; } - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + return 0; } static int fec_enet_clk_enable(struct net_device *ndev, bool enable) @@ -1849,6 +1826,9 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) ret = clk_prepare_enable(fep-clk_ahb); if (ret) return ret; + ret = clk_prepare_enable(fep-clk_ipg); + if (ret) + goto failed_clk_ipg; if (fep-clk_enet_out) { ret = clk_prepare_enable(fep-clk_enet_out); if (ret) @@ -1872,6 +1852,7 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) } } else { clk_disable_unprepare(fep-clk_ahb); + clk_disable_unprepare(fep-clk_ipg); if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); if (fep-clk_ptp) { @@ -1893,6 +1874,8 @@ failed_clk_ptp: if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); failed_clk_enet_out: + clk_disable_unprepare(fep-clk_ipg); +failed_clk_ipg: clk_disable_unprepare(fep-clk_ahb); return ret; @@ -2864,14 +2847,10 @@ fec_enet_open(struct 
net_device *ndev) struct fec_enet_private *fep = netdev_priv(ndev); int ret; - ret = pm_runtime_get_sync(fep-pdev-dev); -
[PATCH net-next]r8169.c: Force transmission when nic refuse to start.
Brute force transmission when Rx interrupt exist on interface is up. Guaranteed to start on full duplex and not maximum speed.When set half duplex working same without this patch. For apply this patch set --whitespace=warn Signed-off-by: Corcodel Marian corcodel.mar...@gmail.com diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index eb2d2a4..6882eab 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -7470,15 +7470,22 @@ static int rtl8169_poll(struct napi_struct *napi, int budget) u16 enable_mask = RTL_EVENT_NAPI | tp-event_slow; int work_done= 0; u16 status; +int tx_force = 1; status = rtl_get_events(tp); rtl_ack_events(tp, status ~tp-event_slow); - + if (netif_running(dev)) { if (status RTL_EVENT_NAPI_RX) work_done = rtl_rx(dev, tp, (u32) budget); + if (status RTL_EVENT_NAPI_TX) rtl_tx(dev, tp); + else if (tx_force == 1) { +mdelay(10); +rtl_tx(dev, tp); + } +} if (status tp-event_slow) { enable_mask = ~tp-event_slow; -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
Dear Michael, Hi Igor, Am Donnerstag, 13. August 2015, 22:18:34 schrieben Sie: * Due to HW bug, LAN8700 sometimes does not detect presence of energy in the Ethernet cable in Energy Detect Power-Down mode (e.g while EDPWRDOWN bit is set, the ENERGYON bit does not asserted sometimes). This is a common bug of LAN87xx family of PHY chips. Is there any offical errata sheet for this PHY family? How do you know, that this is a common HW bug? The LAN8700, LAN8710, LAN8720 is a product of the SMSC company. Microchip acquired SMSC in August 2012. The LAN8700 is a legacy product for Microchip and they will not update anything about it. So, even if Microchip know about HW bug, then there is no chance to have Errata sheet or any new documents about LAN8700. I think same history is for LAN8710/LAN8720 even if they are not marked as legacy. They are SMSC products. The workarounds for same issue in LAN8710/LAN8720 was committed by: * Marek Vasut ma...@denx.de as b629820d18fa65cc598390e4b9712fd5f83ee693. * Patrick Trantham patrick.trant...@fuel7.com as 4223dbffed9f89596177ff2b256ef3258b20fa46. Me too, I think that this family has some problems with this mode, however, without hard evidence, I would put it softer. I have discovered this bug by just monitoring of data to/from MDIO registers of LAN8700. And HW issue is proven on 100 % by rare absence of ENERGYON bit when cable is plugged in. Sometimes, it is required to make 2-20 tests to catch this issue. The configuration of CPU pins, responsible for the MDIO interface, was checked carefully by oscilloscope and they are fine (no spikes, no garbage, good shape of edges). * The lan87xx_read_status() was improved to acquire ENERGYON bit. Its previous algorythm still not reliable on 100 % and sometimes skip cable plugging. 
Signed-off-by: Igor Plyatov plya...@gmail.com
---
 drivers/net/phy/smsc.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index c0f6479..8559ff1 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -104,6 +104,7 @@ static int lan911x_config_init(struct phy_device *phydev)
 static int lan87xx_read_status(struct phy_device *phydev)
 {
 	int err = genphy_read_status(phydev);
+	int i;

 	if (!phydev->link) {
 		/* Disable EDPD to wake up PHY */
@@ -116,8 +117,16 @@ static int lan87xx_read_status(struct phy_device *phydev)
 		if (rc < 0)
 			return rc;

-		/* Sleep 64 ms to allow ~5 link test pulses to be sent */
-		msleep(64);
+		/* Wait max 640 ms to detect energy */

Why 640ms and not e.g. 650ms? I'm no PHY expert, but this looks like an ugly workaround.

Such a value was adopted after many trial and probes. It allows to detect cable plugging on 100 %. Ugly or not, but it works and reliable.

Maybe it would be better to avoid this power saving mode at all, when it is not reliable, but this are just my 2cts. :-)

Power saving mode allow to save around 220 mW of energy consumed from power supply, when Ethernet cable is not plugged in. This is a good value for embedded devices. Better to keep power save mode on.

Anyway, I guess you should also update the explanation on top of the function to reflect your new approach.

I propose following comment for the lan87xx_read_status():

/*
 * The LAN87xx suffers from rare absence of the ENERGYON-bit when Ethernet cable
 * plugs in while LAN87xx is in Energy Detect Power-Down mode. This leads to
 * unstable detection of plugging in Ethernet cable.
 * This workaround disables Energy Detect Power-Down mode and waiting for
 * response on link pulses to detect presence of plugged Ethernet cable.
 * The Energy Detect Power-Down mode enabled again in the end of procedure to
 * save approximately 220 mW of power if cable is unplugged.
 */

+		for (i = 0; i < 64; i++) {
+			/* Sleep to allow link test pulses to be sent */
+			msleep(10);
+			rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
+			if (rc < 0)
+				return rc;
+			if (rc & MII_LAN83C185_ENERGYON)
+				break;
+		};

 		/* Re-enable EDPD */
 		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
@@ -191,7 +200,7 @@ static struct phy_driver smsc_phy_driver[] = {
 	/* basic functions */
 	.config_aneg = genphy_config_aneg,
-	.read_status = genphy_read_status,
+	.read_status = lan87xx_read_status,

This one makes sense, since I really guess, that the whole PHY family behave very similar. But this change alone does not solve your problem, right?

Yes, use of non modified lan87xx_read_status() only reduce amount of false cable detections, but does not resolve issue completely.

 	.config_init = smsc_phy_config_init,
 	.soft_reset = smsc_phy_reset,

Regards,
Michael

Best wishes.
--
Igor Plyatov
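The bounded wait discussed in this thread (up to 64 polls, 10 ms apart, 640 ms at most) can be sketched in userspace roughly as follows. Everything here is an assumption made for illustration: struct fake_phy, the fake_* helpers and ENERGYON_BIT are hypothetical and stand in for phy_read(), msleep() and the real register bit. It is not the driver code.

```c
#define ENERGYON_BIT 0x0002 /* placeholder value, assumed for this sketch only */

/* Fake device for illustration: reports the bit on the Nth status read. */
struct fake_phy {
	unsigned int reads;       /* number of status reads so far */
	unsigned int ready_after; /* read count at which the bit appears */
	unsigned int slept_ms;    /* total simulated sleep time */
};

/* Stand-in for phy_read(): returns a status word (never negative here). */
static int fake_read_status(struct fake_phy *phy)
{
	phy->reads++;
	return phy->reads >= phy->ready_after ? ENERGYON_BIT : 0;
}

/* Stand-in for msleep(): only accounts for the time that would pass. */
static void fake_sleep_ms(struct fake_phy *phy, unsigned int ms)
{
	phy->slept_ms += ms;
}

/*
 * Poll up to max_tries times, sleeping step_ms before each read.
 * Returns 1 if the bit was seen, 0 on timeout, or a negative error
 * propagated from the read.
 */
int wait_for_energy(struct fake_phy *phy, unsigned int max_tries,
		    unsigned int step_ms)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		int rc;

		fake_sleep_ms(phy, step_ms);
		rc = fake_read_status(phy);
		if (rc < 0)
			return rc;
		if (rc & ENERGYON_BIT)
			return 1;
	}
	return 0;
}
```

Compared with one fixed 64 ms sleep, a loop of this shape returns as soon as the bit shows up and only pays the full 640 ms when nothing is detected.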
GCOV_PROFILE_ALL breaks BUILD_BUG_ON(!is_power_of_2(8))
+linux-kernel

+#define DECLARE_EWMA(name, _factor, _weight)				\
+	struct ewma_##name {						\
+		unsigned long internal;					\
+	};								\
+	static inline void ewma_##name##_init(struct ewma_##name *e)	\
+	{								\
+		BUILD_BUG_ON(!__builtin_constant_p(_factor));		\
+		BUILD_BUG_ON(!__builtin_constant_p(_weight));		\
+		BUILD_BUG_ON(!is_power_of_2(_factor));			\
+		BUILD_BUG_ON(!is_power_of_2(_weight));			\

So this seemed fine to me, but for some reason the compiler is saying the BUILD_BUG_ON(!is_power_of_2(x)) fails, if and only if (!) CONFIG_GCOV_PROFILE_ALL is enabled, which seems to boil down to the compiler option -fprofile-arcs.

I'm going to replace this with just the code itself, i.e.

	/* both must be a power of 2 */
	BUILD_BUG_ON(_factor & (_factor - 1));
	BUILD_BUG_ON(_weight & (_weight - 1));

but should I have expected this?

johannes
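A userspace sketch of the open-coded check may help here. It assumes C11 and uses _Static_assert in place of BUILD_BUG_ON(); NOT_POWER_OF_2(), STATIC_REQUIRE_POWER_OF_2() and is_pow2() are hypothetical names, not kernel macros. The point is that a pure arithmetic expression stays an integer constant expression, so the check does not rely on the compiler folding a call to an inline function.

```c
#include <stdbool.h>

/*
 * Nonzero when x is not a power of two. Plain integer arithmetic, so the
 * result is an integer constant expression whenever x is one.
 * This sketch also rejects 0.
 */
#define NOT_POWER_OF_2(x) ((x) == 0 || (((x) & ((x) - 1)) != 0))

/* Hypothetical userspace analogue of BUILD_BUG_ON(), using C11 _Static_assert. */
#define STATIC_REQUIRE_POWER_OF_2(x) \
	_Static_assert(!NOT_POWER_OF_2(x), #x " must be a power of 2")

/* Both compile; changing 8 to 6 would stop the build at this line. */
STATIC_REQUIRE_POWER_OF_2(8);
STATIC_REQUIRE_POWER_OF_2(1024);

/* Runtime counterpart, comparable in spirit to is_power_of_2(). */
static inline bool is_pow2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

Whether a call such as is_pow2(8) gets folded to a constant depends on inlining and optimization, which would be consistent with the behaviour change seen under -fprofile-arcs.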
[PATCH net v2] ppp: fix device unregistration upon netns deletion
PPP devices may get automatically unregistered when their network namespace is getting removed. This happens if the ppp control plane daemon (e.g. pppd) exits while it is the last user of this namespace.

This leads to several races:

 * ppp_exit_net() may destroy the per namespace idr (pn->units_idr) before all file descriptors were released. Successive ppp_release() calls may then cleanup PPP devices with ppp_shutdown_interface() and try to use the already destroyed idr.

 * Automatic device unregistration may also happen before the ppp_release() call for that device gets executed. Once called on the file owning the device, ppp_release() will then clean it up and try to unregister it a second time.

To fix these issues, operations defined in ppp_shutdown_interface() are moved to the PPP device's ndo_uninit() callback. This allows PPP devices to be properly cleaned up by unregister_netdev() and friends. So checking for ppp->owner is now an accurate test to decide if a PPP device should be unregistered.

Setting ppp->owner is done in ppp_create_interface(), before device registration, in order to avoid unprotected modification of this field.

Finally, ppp_exit_net() now starts by unregistering all remaining PPP devices to ensure that none will get unregistered after the call to idr_destroy().
Signed-off-by: Guillaume Nault g.na...@alphalink.fr --- v2: remove unnecessary curly braces in idr_for_each_entry() drivers/net/ppp/ppp_generic.c | 78 +++ 1 file changed, 42 insertions(+), 36 deletions(-) diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index 9d15566..fa8f504 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -269,9 +269,9 @@ static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound); static void ppp_ccp_closed(struct ppp *ppp); static struct compressor *find_compressor(int type); static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st); -static struct ppp *ppp_create_interface(struct net *net, int unit, int *retp); +static struct ppp *ppp_create_interface(struct net *net, int unit, + struct file *file, int *retp); static void init_ppp_file(struct ppp_file *pf, int kind); -static void ppp_shutdown_interface(struct ppp *ppp); static void ppp_destroy_interface(struct ppp *ppp); static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit); static struct channel *ppp_find_channel(struct ppp_net *pn, int unit); @@ -392,8 +392,10 @@ static int ppp_release(struct inode *unused, struct file *file) file-private_data = NULL; if (pf-kind == INTERFACE) { ppp = PF_TO_PPP(pf); + rtnl_lock(); if (file == ppp-owner) - ppp_shutdown_interface(ppp); + unregister_netdevice(ppp-dev); + rtnl_unlock(); } if (atomic_dec_and_test(pf-refcnt)) { switch (pf-kind) { @@ -593,8 +595,10 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg) mutex_lock(ppp_mutex); if (pf-kind == INTERFACE) { ppp = PF_TO_PPP(pf); + rtnl_lock(); if (file == ppp-owner) - ppp_shutdown_interface(ppp); + unregister_netdevice(ppp-dev); + rtnl_unlock(); } if (atomic_long_read(file-f_count) 2) { ppp_release(NULL, file); @@ -838,11 +842,10 @@ static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf, /* Create a new ppp unit */ if (get_user(unit, p)) break; - ppp = 
ppp_create_interface(net, unit, err); + ppp = ppp_create_interface(net, unit, file, err); if (!ppp) break; file-private_data = ppp-file; - ppp-owner = file; err = -EFAULT; if (put_user(ppp-file.index, p)) break; @@ -916,6 +919,16 @@ static __net_init int ppp_init_net(struct net *net) static __net_exit void ppp_exit_net(struct net *net) { struct ppp_net *pn = net_generic(net, ppp_net_id); + struct ppp *ppp; + LIST_HEAD(list); + int id; + + rtnl_lock(); + idr_for_each_entry(pn-units_idr, ppp, id) + unregister_netdevice_queue(ppp-dev, list); + + unregister_netdevice_many(list); + rtnl_unlock(); idr_destroy(pn-units_idr); } @@ -1088,8 +1101,28 @@ static int ppp_dev_init(struct net_device *dev) return 0; } +static void ppp_dev_uninit(struct net_device *dev) +{ + struct ppp *ppp = netdev_priv(dev); +
Re: [PATCH 0/2] net: introduce IFF_NO_QUEUE as successor of zero tx_queue_len
On Thu, Aug 13, 2015 at 12:11:57PM -0700, Stephen Hemminger wrote:
On Thu, 13 Aug 2015 20:40:37 +0200 Jesper Dangaard Brouer bro...@redhat.com wrote:
On Thu, 13 Aug 2015 10:49:50 -0700 Stephen Hemminger step...@networkplumber.org wrote:
On Thu, 13 Aug 2015 19:01:05 +0200 Phil Sutter p...@nwl.cc wrote:
Up to now, drivers being aware of the above applying to them set dev->tx_queue_len to zero to indicate no qdisc should be attached to the interface they drive and the kernel reacts upon this by assigning the noop qdisc instead of the default pfifo_fast. This implicit agreement though leads to an inconvenient situation once a user tries to attach a real qdisc to these devices, as the formerly special tx_queue_len value becomes a regular one,

So this is a workaround for user ignorance by introducing kernel API complexity. Before user sets qdisc, why don't they set tx queue length?

Please don't insist on keeping this broke interface... how should users know that BEFORE adding a qdisc they MUST change the _device_ tx queue length (not zero).

Before setting any qdisc, they should set queue length anyway.

Probably, yes. But if they don't, it depends on the interface driver whether they're screwed or not. In my opinion, this inconsistency alone is worth fixing.

Getting back to the original state, they MUST change the device tx queue len back to zero BEFORE deleting the qdisc, such that when assigning the default queue qdisc the system detects this device can work without a qdisc. Changing the tx queue len to zero after the qdisc is deleted will have no effect.

Listen to the description, that interface is broken. The kernel really needs to hide these details from userspace. It even allows you to misconfigure the kernel, by tricking the kernel into assigning noqueue to physical devices that really need it.

But adding a flag risks breaking external scripts.

Could you please elaborate on this? As far as I can tell, introducing a separate flag is the only solution *not* breaking existing scripts. So if you see the rub, I would like to know where exactly it is.

Cheers, Phil
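The difference between the two conventions argued about in this thread can be sketched in a few lines of userspace C. The names below (struct sketch_dev, SKETCH_IFF_NO_QUEUE, enum sketch_qdisc and both functions) are hypothetical and only model the decision; they are not the kernel's data structures or the code of the proposed patches.

```c
#define SKETCH_IFF_NO_QUEUE 0x1u /* hypothetical flag bit for this sketch */

enum sketch_qdisc {
	SKETCH_QDISC_NONE,    /* device runs without a queueing discipline */
	SKETCH_QDISC_DEFAULT  /* device gets the default qdisc */
};

struct sketch_dev {
	unsigned int priv_flags;
	unsigned int tx_queue_len;
};

/* Implicit convention: a zero queue length doubles as "no qdisc wanted". */
enum sketch_qdisc default_qdisc_by_len(const struct sketch_dev *dev)
{
	return dev->tx_queue_len == 0 ? SKETCH_QDISC_NONE : SKETCH_QDISC_DEFAULT;
}

/* Explicit convention: the flag carries the intent, the length is only a length. */
enum sketch_qdisc default_qdisc_by_flag(const struct sketch_dev *dev)
{
	return (dev->priv_flags & SKETCH_IFF_NO_QUEUE) ?
		SKETCH_QDISC_NONE : SKETCH_QDISC_DEFAULT;
}
```

In this model, once a user raises tx_queue_len on a queueless device in order to attach a qdisc, the length-based rule forgets what the driver originally asked for, while the flag-based rule still gives the same answer after the qdisc is removed again.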
Re: [PATCH net-next 0/2] ppp: implement x-netns support
On Thu, Aug 13, 2015 at 09:20:04PM -0700, David Miller wrote:
> From: Guillaume Nault g.na...@alphalink.fr
> Date: Thu, 13 Aug 2015 15:28:02 +0200
>
>> This series allows PPP devices to reside in a different netns from the
>> PPP unit/channels. Packets only cross netns boundaries when they're
>> transmitted between the net_device and the PPP unit (units and channels
>> always remain in their creation namespace). So only PPP units need to
>> handle cross namespace operations. Channels and lower layer protocols
>> aren't affected.
>>
>> Patch #1 is a bug fix for an existing namespace deletion bug and has
>> been separately sent to net. Patch #2 is the actual x-netns
>> implementation.
>
> Patch #1 needs to be respun with the change I requested.

Ok, done.

> And this is not the way to submit things that have dependencies upon
> bug fixes.

Will do. I was actually unsure about how to handle this case. Thanks.
[patch] net: ethernet: micrel: fix an error code
The dma_mapping_error() function returns true or false. We should return -ENOMEM if there is a dma mapping error.

Signed-off-by: Dan Carpenter dan.carpen...@oracle.com

diff --git a/drivers/net/ethernet/micrel/ks8842.c b/drivers/net/ethernet/micrel/ks8842.c
index f78909a..09d2e16 100644
--- a/drivers/net/ethernet/micrel/ks8842.c
+++ b/drivers/net/ethernet/micrel/ks8842.c
@@ -952,9 +952,8 @@ static int ks8842_alloc_dma_bufs(struct net_device *netdev)
 	sg_dma_address(&tx_ctl->sg) = dma_map_single(adapter->dev,
 		tx_ctl->buf, DMA_BUFFER_SIZE, DMA_TO_DEVICE);
-	err = dma_mapping_error(adapter->dev,
-		sg_dma_address(&tx_ctl->sg));
-	if (err) {
+	if (dma_mapping_error(adapter->dev, sg_dma_address(&tx_ctl->sg))) {
+		err = -ENOMEM;
 		sg_dma_address(&tx_ctl->sg) = 0;
 		goto err;
 	}
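The change boils down to not passing a boolean-style result on as an error code. A minimal userspace sketch of that pattern follows. `fake_mapping_error()` and `alloc_buf()` are hypothetical stand-ins, not the kernel DMA API, and the sketch assumes (as the patch description does) that the check only reports true or false.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a check that only answers "failed or not". */
static bool fake_mapping_error(unsigned long addr)
{
	return addr == 0;
}

/* Returns 0 on success or a negative errno, never the raw boolean. */
int alloc_buf(unsigned long addr)
{
	if (fake_mapping_error(addr))
		return -ENOMEM;	/* translate "true" into a real error code */
	return 0;
}

void demo_alloc_buf(void)
{
	assert(alloc_buf(0x1000) == 0);
	assert(alloc_buf(0) == -ENOMEM);
}
```

Had the boolean been returned directly, a caller testing for a negative value would have seen the positive 1 and treated the failed mapping as success.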
RE: [PATCH] IGMP: Inhibit reports for local multicast groups
Sorry for the duplication - I responded in a similar manner before seeing this. Thanks Philip -Original Message- From: Thadeu Lima de Souza Cascardo [mailto:casca...@redhat.com] Sent: Thursday, August 13, 2015 7:08 PM To: Andrew Lunn Cc: Philip Downey; David Miller; kuz...@ms2.inr.ac.ru; jmor...@namei.org; yoshf...@linux-ipv6.org; ka...@trash.net; linux-ker...@vger.kernel.org; netdev@vger.kernel.org Subject: Re: [PATCH] IGMP: Inhibit reports for local multicast groups On Thu, Aug 13, 2015 at 07:01:37PM +0200, Andrew Lunn wrote: On Thu, Aug 13, 2015 at 04:52:32PM +, Philip Downey wrote: Hi Andrew IGMP snooping is designed to prevent hosts on a local network from receiving traffic for a multicast group they have not explicitly joined. Link- Local multicast traffic should not have an IGMP client since it is reserved for routing protocols. One would expect that IGMP snooping needs to ignore local multicast traffic in the reserved range intended for routers since there should be no IGMP client to make join requests. The point of this patch is that Linux is sending out group membership for these addresses, it is acting as a client. What happens with a switch which is applying IGMP snooping to link-local multicast groups? You turn on this feature, and you no longer get your routing protocol messages. I had a quick look at RFC 3376. The only mention i spotted for not sending IGMP messages is: The all-systems multicast address, 224.0.0.1, is handled as a special case. On all systems -- that is all hosts and routers, including multicast routers -- reception of packets destined to the all-systems multicast address, from all sources, is permanently enabled on all interfaces on which multicast reception is supported. No IGMP messages are ever sent regarding the all-systems multicast address. IGMP v2 has something similar: The all-systems group (address 224.0.0.1) is handled as a special case. 
The host starts in Idle Member state for that group on every interface, never transitions to another state, and never sends a report for that group. But i did not find anything which says all other link-local addresses don't need member reports. Did i miss something? Andrew From RFC 4541 (Considerations for Internet Group Management Protocol (IGMP) and Multicast Listener Discovery (MLD) Snooping Switches): 2) Packets with a destination IP (DIP) address in the 224.0.0.X range which are not IGMP must be forwarded on all ports. This recommendation is based on the fact that many host systems do not send Join IP multicast addresses in this range before sending or listening to IP multicast packets. Furthermore, since the 224.0.0.X address range is defined as link-local (not to be routed), it seems unnecessary to keep the state for each address in this range. Additionally, some routers operate in the 224.0.0.X address range without issuing IGMP Joins, and these applications would break if the switch were to prune them due to not having seen a Join Group message from the router. So, it looks like some hosts and routers out there in the field do not send joins for those local addresses. In fact, IPv4 local multicast addresses are ignored when Linux bridge multicast snooping adds a new group. static int br_ip4_multicast_add_group(struct net_bridge *br, ... if (ipv4_is_local_multicast(group)) return 0; Cascardo. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
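The `ipv4_is_local_multicast()` test quoted above covers the same 224.0.0.X range that RFC 4541 singles out. A userspace sketch of an equivalent mask-and-compare check, assuming the address is held in network byte order; the helper name `is_local_multicast()` is made up here and is not the kernel function itself:

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* True for 224.0.0.0 through 224.0.0.255; addr is in network byte order. */
bool is_local_multicast(uint32_t addr)
{
	return (addr & htonl(0xffffff00)) == htonl(0xe0000000);
}

void demo_is_local_multicast(void)
{
	assert(is_local_multicast(inet_addr("224.0.0.1")));	/* all-systems group */
	assert(is_local_multicast(inet_addr("224.0.0.5")));	/* inside 224.0.0.X */
	assert(!is_local_multicast(inet_addr("224.0.1.1")));	/* outside 224.0.0.X */
	assert(!is_local_multicast(inet_addr("239.1.2.3")));
}
```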
RE: [PATCH] IGMP: Inhibit reports for local multicast groups
Hi Andrew Answers inline... -Original Message- From: Andrew Lunn [mailto:and...@lunn.ch] Sent: Thursday, August 13, 2015 6:02 PM To: Philip Downey Cc: David Miller; kuz...@ms2.inr.ac.ru; jmor...@namei.org; yoshfuji@linux- ipv6.org; ka...@trash.net; linux-ker...@vger.kernel.org; netdev@vger.kernel.org Subject: Re: [PATCH] IGMP: Inhibit reports for local multicast groups On Thu, Aug 13, 2015 at 04:52:32PM +, Philip Downey wrote: Hi Andrew IGMP snooping is designed to prevent hosts on a local network from receiving traffic for a multicast group they have not explicitly joined. Link- Local multicast traffic should not have an IGMP client since it is reserved for routing protocols. One would expect that IGMP snooping needs to ignore local multicast traffic in the reserved range intended for routers since there should be no IGMP client to make join requests. The point of this patch is that Linux is sending out group membership for these addresses, it is acting as a client. What happens with a switch which is applying IGMP snooping to link-local multicast groups? You turn on this feature, and you no longer get your routing protocol messages. It is expected that link-local multicast is always forwarded by switches otherwise routers may not function correctly. From the relevant RFC: RFC 4541 IGMP and MLD Snooping Switches Considerations May 2006 2.1.2. Data Forwarding Rules 1) Packets with a destination IP address outside 224.0.0.X which are not IGMP should be forwarded according to group-based port membership tables and must also be forwarded on router ports. This is the main IGMP snooping functionality for the data path. One approach that an implementation could take would be to maintain separate membership and multicast router tables in software and then merge these tables into a forwarding cache. 2) Packets with a destination IP (DIP) address in the 224.0.0.X range which are not IGMP must be forwarded on all ports. 
This recommendation is based on the fact that many host systems do not send Join IP multicast addresses in this range before sending or listening to IP multicast packets. Furthermore, since the 224.0.0.X address range is defined as link-local (not to be routed), it seems unnecessary to keep the state for each address in this range. Additionally, some routers operate in the 224.0.0.X address range without issuing IGMP Joins, and these applications would break if the switch were to prune them due to not having seen a Join Group message from the router. I had a quick look at RFC 3376. The only mention i spotted for not sending IGMP messages is: The all-systems multicast address, 224.0.0.1, is handled as a special case. On all systems -- that is all hosts and routers, including multicast routers -- reception of packets destined to the all-systems multicast address, from all sources, is permanently enabled on all interfaces on which multicast reception is supported. No IGMP messages are ever sent regarding the all-systems multicast address. IGMP v2 has something similar: The all-systems group (address 224.0.0.1) is handled as a special case. The host starts in Idle Member state for that group on every interface, never transitions to another state, and never sends a report for that group. But i did not find anything which says all other link-local addresses don't need member reports. Did i miss something? No you did not miss anything - that is correct. However, the RFCs don't really cover the behavior of routers well in some areas. Routing protocols which use the 224.0.0.x address space do not need IGMP therefore it makes no sense to distribute membership reports for these groups. A router which receives an IGMP membership report which includes groups from this reserved address range will ignore it -and probably generate debug messages highlighting an invalid address. 
Regards Philip Andrew
Re: [PATCHv1 net-next 0/5] netlink: mmap: kernel panic and some issues
Hi,

Thank you for taking the time. Let me explain these with code samples on gist; sorry that I cannot describe and arrange it well.

normal socket nflog sample:
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/2d6da8fff31ef61af77e68713fdb1d71978746a6/nflog.c

Set up iptables:

iptables -A INPUT -p icmp --icmp-type echo-request \
	-j NFLOG --nflog-group 2 --nflog-threshold 4

Monitor nlmon (e.g. with netsniff-ng), run this sample and ping -i 0.2 -c 10 from another host. This sample only shows the receive size and nlmsg_type. The same can be done with an rx mmaped socket.

rx only mmaped nflog sample:
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/2d6da8fff31ef61af77e68713fdb1d71978746a6/rxring-nflog.c

This sample panics while nlmon is being monitored.

panic message:
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/2d6da8fff31ef61af77e68713fdb1d71978746a6/mmaped_netlink_panic

I think this is because skb_shared_info is accessed when the skb is released, although a mmaped netlink skb does not have a skb_shared_info. I tried to fix this in patches 1 and 2 by introducing a helper function which does not access skb_shared_info. I also think nm_status should be set to UNUSED when the skb is released, so I tried to fix that in patch 3.

With both tx and rx mmaped:

both tx/rx mmaped nflog sample:
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/2d6da8fff31ef61af77e68713fdb1d71978746a6/ring-nflog.c

This sample does not work, since msg->msg_iter.type in netlink_sendmsg() is set to 1 (WRITE) when the sample calls sendto(). Patch 4 fixes this by accepting it.

After applying patches 1 and 2, the rx only sample works but behaves differently from the normal one. Patch 5 may fix this. It also works well, even when the ring becomes full, with another program of mine which sets the frame nm_status to SKIP and passes the frame to worker threads; the worker threads then set the status to UNUSED. That program may set the UNUSED status in random order, not sequentially, so it seems I need to check the whole ring.

Thanks,
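The last point, frames being released out of order so that the whole ring has to be checked, can be illustrated with a toy ring. This is only a sketch: the enum, the function name and the idea of scanning from a head index are assumptions made for illustration, not the code from patch 5 or the actual netlink mmap structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame states, named after the ones mentioned in the mail. */
enum frame_status { FRAME_UNUSED, FRAME_VALID, FRAME_SKIP };

/*
 * Look for an UNUSED frame starting at 'head' and wrapping around once.
 * Returns the frame index, or -1 if no frame in the ring is free.
 */
int find_unused_frame(const enum frame_status *ring, size_t nframes, size_t head)
{
	for (size_t i = 0; i < nframes; i++) {
		size_t idx = (head + i) % nframes;

		if (ring[idx] == FRAME_UNUSED)
			return (int)idx;
	}
	return -1;
}

void demo_find_unused_frame(void)
{
	/* Worker threads released frame 2 before frames 0 and 1. */
	enum frame_status ring[4] = { FRAME_SKIP, FRAME_SKIP, FRAME_UNUSED, FRAME_SKIP };

	assert(find_unused_frame(ring, 4, 0) == 2);	/* head is busy, a later frame is free */
	ring[2] = FRAME_SKIP;
	assert(find_unused_frame(ring, 4, 0) == -1);	/* the ring really is full */
}
```

Checking only the frame at the head would report a full ring in the first case even though frame 2 is available.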
RE: [PATCH 1/1] Revert net: fec: Ensure clocks are enabled while using mdio bus
Am Freitag, den 14.08.2015, 13:47 +0800 schrieb Peter Chen: It causes the i.mx6sx sdb board hang when using nfsroot during boots up at v4.2-rc6. This reverts commit 8fff755e9f8d0f70a595e79f248695ce6aef5cc3. Cc: netdev@vger.kernel.org Cc: Fugang Duan b38...@freescale.com Cc: shawn@linaro.org Cc: fabio.este...@freescale.com Cc: tyler.ba...@linaro.org Cc: Lucas Stach l.st...@pengutronix.de Cc: Andrew Lunn and...@lunn.ch Signed-off-by: Peter Chen peter.c...@freescale.com --- According to Fugang Duan, the i.mx series has different clock control sequence among SoCs, this patch may only consider certain SoCs. Sorry, but NACK. Please test current mainline (what will become v4.2-rc7). There is already a patch in that fixes i.MX27 and probably fixes the same problem on i.MX6SX. Would you help point to me which commit and at which tree? Peter drivers/net/ethernet/freescale/fec_main.c | 89 +-- 1 file changed, 13 insertions(+), 76 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 32e3807c..5e8b837 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -24,7 +24,6 @@ #include linux/module.h #include linux/kernel.h #include linux/string.h -#include linux/pm_runtime.h #include linux/ptrace.h #include linux/errno.h #include linux/ioport.h @@ -78,7 +77,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev); #define FEC_ENET_RAEM_V0x8 #define FEC_ENET_RAFL_V0x8 #define FEC_ENET_OPD_V 0xFFF0 -#define FEC_MDIO_PM_TIMEOUT 100 /* ms */ static struct platform_device_id fec_devtype[] = { { @@ -1769,13 +1767,7 @@ static void fec_enet_adjust_link(struct net_device *ndev) static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; 
init_completion(fep-mdio_done); @@ -1791,30 +1783,18 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO read timeout\n); - ret = -ETIMEDOUT; - goto out; + return -ETIMEDOUT; } - ret = FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); - -out: - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + /* return value */ + return FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); } static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, u16 value) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1831,13 +1811,10 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO write timeout\n); - ret = -ETIMEDOUT; + return -ETIMEDOUT; } - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + return 0; } static int fec_enet_clk_enable(struct net_device *ndev, bool enable) @@ -1849,6 +1826,9 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) ret = clk_prepare_enable(fep-clk_ahb); if (ret) return ret; + ret = clk_prepare_enable(fep-clk_ipg); + if (ret) + goto failed_clk_ipg; if (fep-clk_enet_out) { ret = clk_prepare_enable(fep-clk_enet_out); if (ret) @@ -1872,6 +1852,7 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) } } else { clk_disable_unprepare(fep-clk_ahb); + clk_disable_unprepare(fep-clk_ipg); if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); if (fep-clk_ptp) { @@ -1893,6 +1874,8 @@ failed_clk_ptp: if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); failed_clk_enet_out: + clk_disable_unprepare(fep-clk_ipg); +failed_clk_ipg: 
clk_disable_unprepare(fep-clk_ahb); return ret; @@ -2864,14 +2847,10 @@ fec_enet_open(struct net_device *ndev) struct fec_enet_private *fep = netdev_priv(ndev); int ret; -
Re: GCOV_PROFILE_ALL breaks BUILD_BUG_ON(!is_power_of_2(8))
On Fri, Aug 14, 2015 at 10:29:04AM +0200, Johannes Berg wrote:
> +linux-kernel
>
> +#define DECLARE_EWMA(name, _factor, _weight)				\
> +	struct ewma_##name {						\
> +		unsigned long internal;					\
> +	};								\
> +	static inline void ewma_##name##_init(struct ewma_##name *e)	\
> +	{								\
> +		BUILD_BUG_ON(!__builtin_constant_p(_factor));		\
> +		BUILD_BUG_ON(!__builtin_constant_p(_weight));		\
> +		BUILD_BUG_ON(!is_power_of_2(_factor));			\
> +		BUILD_BUG_ON(!is_power_of_2(_weight));			\
>
> So this seemed fine to me, but for some reason the compiler is saying
> the BUILD_BUG_ON(!is_power_of_2(x)) fails, if and only if (!)
> CONFIG_GCOV_PROFILE_ALL is enabled, which seems to boil down to the
> compiler option -fprofile-arcs.
>
> I'm going to replace this with just the code itself, i.e.
>
>	/* both must be a power of 2 */
>	BUILD_BUG_ON(_factor & (_factor - 1));
>	BUILD_BUG_ON(_weight & (_weight - 1));
>
> but should I have expected this? It might have something to do with the
> fact that is_power_of_2() is an inline function, perhaps with this
> compiler option it translates to something that can't be used in the
> context BUILD_BUG_ON() uses it in.

There is a BUILD_BUG_ON_NOT_POWER_OF_2() macro you could use.

Michal Kubecek
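A macro built from a plain expression keeps the argument an integer constant expression, so the check does not rely on the compiler folding an inline function call. The sketch below is a userspace analogue using C11 `static_assert`. `IS_POWER_OF_2` and `ASSERT_POWER_OF_2` are hypothetical names, not the kernel macros, and whether this is what actually goes wrong under -fprofile-arcs is only the guess made in the thread.

```c
#include <assert.h>	/* assert() and C11 static_assert */

/* Pure-expression test: non-zero n with exactly one bit set. */
#define IS_POWER_OF_2(n) ((n) != 0 && (((n) & ((n) - 1)) == 0))

/* Hypothetical userspace analogue of a build-time power-of-two check. */
#define ASSERT_POWER_OF_2(n) \
	static_assert(IS_POWER_OF_2(n), #n " must be a power of 2")

ASSERT_POWER_OF_2(8);
ASSERT_POWER_OF_2(1024);
/* ASSERT_POWER_OF_2(12); would stop the build */

void demo_is_power_of_2(void)
{
	assert(IS_POWER_OF_2(8));
	assert(!IS_POWER_OF_2(12));
	assert(!IS_POWER_OF_2(0));	/* x & (x - 1) alone would accept 0 */
}
```

Note that the bare `x & (x - 1)` form proposed in the mail also accepts 0, which is why the sketch adds the explicit non-zero test.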
Re: [PATCH 1/1] Revert net: fec: Ensure clocks are enabled while using mdio bus
On Fri, Aug 14, 2015 at 10:27:33AM +0200, Lucas Stach wrote: Am Freitag, den 14.08.2015, 08:25 + schrieb Peter Chen: Am Freitag, den 14.08.2015, 13:47 +0800 schrieb Peter Chen: It causes the i.mx6sx sdb board hang when using nfsroot during boots up at v4.2-rc6. This reverts commit 8fff755e9f8d0f70a595e79f248695ce6aef5cc3. Cc: netdev@vger.kernel.org Cc: Fugang Duan b38...@freescale.com Cc: shawn@linaro.org Cc: fabio.este...@freescale.com Cc: tyler.ba...@linaro.org Cc: Lucas Stach l.st...@pengutronix.de Cc: Andrew Lunn and...@lunn.ch Signed-off-by: Peter Chen peter.c...@freescale.com --- According to Fugang Duan, the i.mx series has different clock control sequence among SoCs, this patch may only consider certain SoCs. Sorry, but NACK. Please test current mainline (what will become v4.2-rc7). There is already a patch in that fixes i.MX27 and probably fixes the same problem on i.MX6SX. Would you help point to me which commit and at which tree? Mainline, so Linus Torvalds tree. http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=14d2b7c1a96ef37eb571599c73d4a1a606b964d6 It fixes my imx6sx-sdb board. It is interesting that there was no problem for some platforms, but with problem for others. Your fix is a common runtime PM fix. Again, why we need this as a bug-fix, not but as new feature for next rc1? -- Best Regards, Peter Chen -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] IGMP: Inhibit reports for local multicast groups
Hi Andrew

Will resubmit with the information you suggest. There might be a slight delay for this as I am on holiday now for a week.

Best regards
Philip

-----Original Message-----
From: Andrew Lunn [mailto:and...@lunn.ch]
Sent: Friday, August 14, 2015 2:35 PM
To: Philip Downey
Cc: David Miller; kuz...@ms2.inr.ac.ru; jmor...@namei.org; yoshfuji@linux-ipv6.org; ka...@trash.net; linux-ker...@vger.kernel.org; netdev@vger.kernel.org
Subject: Re: [PATCH] IGMP: Inhibit reports for local multicast groups

Hi Philip

So with a bit of poking and prodding, we have a much better understanding as to why this is O.K. Maybe your next patch can quote the relevant RFCs and have a much fuller commit message?

Thanks
Andrew
[PATCH net-next] lwtunnel: rename ip lwtunnel attributes
We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look very similar, yet they serve very different purpose. This is confusing for anyone trying to implement a user space tool supporting lwt. As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more logical to have them in lwtunnel.h together with the encap enum. Fixes: 3093fbe7ff4b (route: Per route IP tunnel metadata via lightweight tunnel) Signed-off-by: Jiri Benc jb...@redhat.com --- These are still in net-next only, thus it's safe to change them. It's still a bit weird these attributes are in RTA_ENCAP, perhaps we should also rename RTA_ENCAP to RTA_LWT_ENCAP or similar? --- include/uapi/linux/lwtunnel.h | 14 +++ include/uapi/linux/rtnetlink.h | 15 net/ipv4/ip_tunnel_core.c | 86 +- 3 files changed, 57 insertions(+), 58 deletions(-) diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h index 31377bbea3f8..3bf223bc2367 100644 --- a/include/uapi/linux/lwtunnel.h +++ b/include/uapi/linux/lwtunnel.h @@ -12,5 +12,19 @@ enum lwtunnel_encap_types { #define LWTUNNEL_ENCAP_MAX (__LWTUNNEL_ENCAP_MAX - 1) +enum lwtunnel_ip_t { + LWTUNNEL_IP_UNSPEC, + LWTUNNEL_IP_ID, + LWTUNNEL_IP_DST, + LWTUNNEL_IP_SRC, + LWTUNNEL_IP_TTL, + LWTUNNEL_IP_TOS, + LWTUNNEL_IP_SPORT, + LWTUNNEL_IP_DPORT, + LWTUNNEL_IP_FLAGS, + __LWTUNNEL_IP_MAX, +}; + +#define LWTUNNEL_IP_MAX (__LWTUNNEL_IP_MAX - 1) #endif /* _UAPI_LWTUNNEL_H_ */ diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 47d24cb3fbc1..0d3d3cc43356 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -286,21 +286,6 @@ enum rt_class_t { /* Routing message attributes */ -enum ip_tunnel_t { - IP_TUN_UNSPEC, - IP_TUN_ID, - IP_TUN_DST, - IP_TUN_SRC, - IP_TUN_TTL, - IP_TUN_TOS, - IP_TUN_SPORT, - IP_TUN_DPORT, - IP_TUN_FLAGS, - __IP_TUN_MAX, -}; - -#define IP_TUN_MAX (__IP_TUN_MAX - 1) - enum 
rtattr_type_t { RTA_UNSPEC, RTA_DST, diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c index 5512f4e4ec1b..fd6319681c50 100644 --- a/net/ipv4/ip_tunnel_core.c +++ b/net/ipv4/ip_tunnel_core.c @@ -192,15 +192,15 @@ struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev, } EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64); -static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = { - [IP_TUN_ID] = { .type = NLA_U64 }, - [IP_TUN_DST]= { .type = NLA_U32 }, - [IP_TUN_SRC]= { .type = NLA_U32 }, - [IP_TUN_TTL]= { .type = NLA_U8 }, - [IP_TUN_TOS]= { .type = NLA_U8 }, - [IP_TUN_SPORT] = { .type = NLA_U16 }, - [IP_TUN_DPORT] = { .type = NLA_U16 }, - [IP_TUN_FLAGS] = { .type = NLA_U16 }, +static const struct nla_policy ip_tun_policy[LWTUNNEL_IP_MAX + 1] = { + [LWTUNNEL_IP_ID]= { .type = NLA_U64 }, + [LWTUNNEL_IP_DST] = { .type = NLA_U32 }, + [LWTUNNEL_IP_SRC] = { .type = NLA_U32 }, + [LWTUNNEL_IP_TTL] = { .type = NLA_U8 }, + [LWTUNNEL_IP_TOS] = { .type = NLA_U8 }, + [LWTUNNEL_IP_SPORT] = { .type = NLA_U16 }, + [LWTUNNEL_IP_DPORT] = { .type = NLA_U16 }, + [LWTUNNEL_IP_FLAGS] = { .type = NLA_U16 }, }; static int ip_tun_build_state(struct net_device *dev, struct nlattr *attr, @@ -208,10 +208,10 @@ static int ip_tun_build_state(struct net_device *dev, struct nlattr *attr, { struct ip_tunnel_info *tun_info; struct lwtunnel_state *new_state; - struct nlattr *tb[IP_TUN_MAX + 1]; + struct nlattr *tb[LWTUNNEL_IP_MAX + 1]; int err; - err = nla_parse_nested(tb, IP_TUN_MAX, attr, ip_tun_policy); + err = nla_parse_nested(tb, LWTUNNEL_IP_MAX, attr, ip_tun_policy); if (err 0) return err; @@ -223,29 +223,29 @@ static int ip_tun_build_state(struct net_device *dev, struct nlattr *attr, tun_info = lwt_tun_info(new_state); - if (tb[IP_TUN_ID]) - tun_info-key.tun_id = nla_get_u64(tb[IP_TUN_ID]); + if (tb[LWTUNNEL_IP_ID]) + tun_info-key.tun_id = nla_get_u64(tb[LWTUNNEL_IP_ID]); - if (tb[IP_TUN_DST]) - tun_info-key.ipv4_dst = nla_get_be32(tb[IP_TUN_DST]); + if 
(tb[LWTUNNEL_IP_DST]) + tun_info-key.ipv4_dst = nla_get_be32(tb[LWTUNNEL_IP_DST]); - if (tb[IP_TUN_SRC]) - tun_info-key.ipv4_src = nla_get_be32(tb[IP_TUN_SRC]); + if (tb[LWTUNNEL_IP_SRC]) + tun_info-key.ipv4_src = nla_get_be32(tb[LWTUNNEL_IP_SRC]); - if (tb[IP_TUN_TTL]) - tun_info-key.ipv4_ttl =
Re: [PATCH net-next 2/4] packet: add eBPF fanout mode
[ @Willem: RH email doesn't exist anymore, I took it out, otherwise every reply gets a bounce. ;) ] On 08/14/2015 07:03 PM, Alexei Starovoitov wrote: On 8/14/15 8:50 AM, Willem de Bruijn wrote: ... all looks great except in the above the check: if (new-type != BPF_PROG_TYPE_SOCKET_FILTER) { bpf_prog_put(new); return -EINVAL; } is missing. Otherwise user will be able to attach programs of wrong types to fanout. Also instead of: #define PACKET_FANOUT_BPF6 #define PACKET_FANOUT_EBPF7 I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous. This is how bpf manpage distinguishes them. We have SO_ATTACH_FILTER and SO_ATTACH_BPF, could also be analogous for fanout, if we want to be consistent with the API? But C/E prefix seems okay too, how you want ... Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED), perhaps we should also apply it on fanout? Thanks, Daniel -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/4] packet: add eBPF fanout mode
[ @Willem: RH email doesn't exist anymore, I took it out, otherwise every reply gets a bounce. ;) ] Sorry for using the wrong address, Daniel. Also instead of: #define PACKET_FANOUT_BPF6 #define PACKET_FANOUT_EBPF7 I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous. This is how bpf manpage distinguishes them. We have SO_ATTACH_FILTER and SO_ATTACH_BPF, could also be analogous for fanout, if we want to be consistent with the API? But C/E prefix seems okay too, how you want ... I don't feel very strongly, either. But CBPF/EBPF is a bit more descriptive, so let's do that. Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED), perhaps we should also apply it on fanout? Good point. With classic bpf, packet access control is fully enforced in per-socket filters, but playing with load balancing filters could allow an adversary to infer some information about the dropped packets*. With eBPF and maps, access is even more direct. Let's support locking of fanout filters in place. I intend to test the existing socket flag. No need to add a separate flag for the fanout group, as far as I can see. (*) I noticed that a similar unintended effect also causes the PACKET_FANOUT_LB selftest to be flaky: filters on the sockets ensure that the test only reads expected packets. But, all traffic makes it through packet_rcv_fanout. Packets that are later dropped by sk_filter have already incremented rr_cur. Worst case, with 2 sockets and each accepted packet interleaved with a dropped packet, all packets are queued on only one socket. Test flakiness is fixed, e.g., by running in a private network namespace. The implementation behavior may be unexpected in other, production, environments. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
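The worst case described in the footnote can be reproduced with a toy model. This is a sketch, not the packet socket code: `simulate_rr()` is a made-up function, and it assumes only what the mail states, namely that the round-robin counter advances for every packet before the per-socket filter decides whether to keep it.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_SOCKS 2

/*
 * Toy model of round-robin fanout: the counter advances for every packet,
 * and the per-socket filter verdict is applied only afterwards.
 */
void simulate_rr(const bool *accepted, int npackets, int queued[NUM_SOCKS])
{
	unsigned int rr_cur = 0;

	for (int i = 0; i < NUM_SOCKS; i++)
		queued[i] = 0;

	for (int i = 0; i < npackets; i++) {
		unsigned int target = rr_cur++ % NUM_SOCKS;	/* chosen before filtering */

		if (accepted[i])	/* stand-in for the socket filter verdict */
			queued[target]++;
	}
}

void demo_simulate_rr(void)
{
	/* accepted, dropped, accepted, dropped, ... */
	const bool pattern[8] = { true, false, true, false, true, false, true, false };
	int queued[NUM_SOCKS];

	simulate_rr(pattern, 8, queued);
	assert(queued[0] == 4);	/* every accepted packet landed on socket 0 */
	assert(queued[1] == 0);
}
```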
Re: [PATCH net-next]r8169.c: Force transmission when nic refuse to start.
Corcodel Marian corcodel.mar...@gmail.com : [...] diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index eb2d2a4..6882eab 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -7470,15 +7470,22 @@ static int rtl8169_poll(struct napi_struct *napi, int budget) u16 enable_mask = RTL_EVENT_NAPI | tp-event_slow; int work_done= 0; u16 status; +int tx_force = 1; status = rtl_get_events(tp); rtl_ack_events(tp, status ~tp-event_slow); - + if (netif_running(dev)) { if (status RTL_EVENT_NAPI_RX) work_done = rtl_rx(dev, tp, (u32) budget); + if (status RTL_EVENT_NAPI_TX) rtl_tx(dev, tp); + else if (tx_force == 1) { +mdelay(10); +rtl_tx(dev, tp); + } +} Please try to use TimerInt instead of this ugly hack. -- Ueimor -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/4] packet: add eBPF fanout mode
On 08/14/2015 09:27 PM, Willem de Bruijn wrote: ... Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED), perhaps we should also apply it on fanout? Good point. With classic bpf, packet access control is fully enforced in per-socket filters, but playing with load balancing filters could allow an adversary to infer some information about the dropped packets*. With eBPF and maps, access is even more direct. Let's support locking of fanout filters in place. Right, a process could share a map between the fanout lb filter and actual sk filter, i.e. to look up how much actually passed through on the later sk level filter, and use that information in addition for its lb decisions. I intend to test the existing socket flag. No need to add a separate flag for the fanout group, as far as I can see. Agreed, should be okay. Thanks Willem! (*) I noticed that a similar unintended effect also causes the PACKET_FANOUT_LB selftest to be flaky: filters on the sockets ensure that the test only reads expected packets. But, all traffic makes it through packet_rcv_fanout. Packets that are later dropped by sk_filter have already incremented rr_cur. Worst case, with 2 sockets and each accepted packet interleaved with a dropped packet, all packets are queued on only one socket. Test flakiness is fixed, e.g., by running in a private network namespace. The implementation behavior may be unexpected in other, production, environments. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/4] packet: add eBPF fanout mode
On Fri, Aug 14, 2015 at 1:03 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> On 8/14/15 8:50 AM, Willem de Bruijn wrote:
>> +static int fanout_set_data_ebpf(struct packet_fanout *f, char __user *data,
>> +				unsigned int len)
>> +{
>> +	struct bpf_prog *new;
>> +	u32 fd;
>> +
>> +	if (len != sizeof(fd))
>> +		return -EINVAL;
>> +	if (copy_from_user(&fd, data, len))
>> +		return -EFAULT;
>> +
>> +	new = bpf_prog_get(fd);
>> +	if (IS_ERR(new))
>> +		return PTR_ERR(new);
>> +
>> +	__fanout_set_data_bpf(f, new);
>> +	return 0;
>> +}
>
> all looks great except in the above the check:
>
>	if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) {
>		bpf_prog_put(new);
>		return -EINVAL;
>	}
>
> is missing. Otherwise user will be able to attach programs of wrong
> types to fanout.

Ai, good point!

> Also instead of:
>
>	#define PACKET_FANOUT_BPF	6
>	#define PACKET_FANOUT_EBPF	7
>
> I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous.
> This is how bpf manpage distinguishes them.

Sounds good. I'll make both changes in v2. Thanks for reviewing, Alexei.
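The requested check follows a common shape: validate the object's type after taking a reference, and drop that reference again on rejection. A userspace sketch of the shape follows. `struct prog`, `prog_put()` and `fanout_attach()` are simplified, hypothetical stand-ins and do not reflect the real bpf_prog API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified program object with a reference count. */
enum prog_type { PROG_SOCKET_FILTER, PROG_OTHER };

struct prog {
	enum prog_type type;
	int refcnt;
};

static void prog_put(struct prog *p)
{
	p->refcnt--;
}

/* Attach only socket-filter programs; drop the caller's reference otherwise. */
int fanout_attach(struct prog **slot, struct prog *new)
{
	if (new->type != PROG_SOCKET_FILTER) {
		prog_put(new);
		return -EINVAL;
	}
	*slot = new;
	return 0;
}

void demo_fanout_attach(void)
{
	struct prog good = { PROG_SOCKET_FILTER, 1 };
	struct prog bad = { PROG_OTHER, 1 };
	struct prog *slot = NULL;

	assert(fanout_attach(&slot, &good) == 0 && slot == &good);
	assert(fanout_attach(&slot, &bad) == -EINVAL);
	assert(bad.refcnt == 0);	/* reference released on rejection */
	assert(slot == &good);		/* previously attached program left in place */
}
```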
[PATCH net] be2net: avoid vxlan offloading on multichannel configs
VxLAN offloading is not functional if the NIC is running in multichannel mode
(UMC, FLEX-10, VNIC...). Enabling it additionally kills all connectivity
through the NIC, and the device needs to be brought down and up to restore it.

The firmware should take care of this and either not allow the conversion of
the interface to tunnel type (be_cmd_manage_iface) or support VxLAN offloading
when a multichannel config is enabled. I have tested this on the latest
available firmware (10.6.144.21).

Result:
[root@sm-04 ~]# ip link set enp5s0f0 up
[root@sm-04 ~]# ip addr add 172.30.10.50/24 dev enp5s0f0
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.317 ms
64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.187 ms
64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.188 ms

--- 172.30.10.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.187/0.230/0.317/0.063 ms
[root@sm-04 ~]# ip link add link enp5s0f0 vxlan10 type vxlan id 10 remote 172.30.10.60 dstport 4789
[root@sm-04 ~]# ip link set vxlan10 up
[ 7900.442811] be2net :05:00.0: Enabled VxLAN offloads for UDP port 4789
[ 7900.455722] be2net :05:00.1: Enabled VxLAN offloads for UDP port 4789
[ 7900.468635] be2net :05:00.2: Enabled VxLAN offloads for UDP port 4789
[ 7900.481553] be2net :05:00.3: Enabled VxLAN offloads for UDP port 4789
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.

--- 172.30.10.254 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms
[root@sm-04 ~]# ip link set vxlan10 down
[ 7959.434093] be2net :05:00.0: Disabled VxLAN offloads for UDP port 4789
[ 7959.444792] be2net :05:00.1: Disabled VxLAN offloads for UDP port 4789
[ 7959.455592] be2net :05:00.2: Disabled VxLAN offloads for UDP port 4789
[ 7959.466416] be2net :05:00.3: Disabled VxLAN offloads for UDP port 4789
[root@sm-04 ~]# ip link del vxlan10
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.

--- 172.30.10.254 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms
[root@sm-04 ~]# ip link set enp5s0f0 down
[root@sm-04 ~]# ip link set enp5s0f0 up
[ 8071.019003] be2net :05:00.0 enp5s0f0: Link is Up
[root@sm-04 ~]# ping -c 3 172.30.10.254
PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data.
64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.318 ms
64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.196 ms
64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.194 ms

--- 172.30.10.254 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.194/0.236/0.318/0.057 ms

Cc: Sathya Perla sathya.pe...@avagotech.com
Cc: Ajit Khaparde ajit.khapa...@avagotech.com
Cc: Padmanabh Ratnakar padmanabh.ratna...@avagotech.com
Cc: Sriharsha Basavapatna sriharsha.basavapa...@avagotech.com
Signed-off-by: Ivan Vecera ivec...@redhat.com
---
 drivers/net/ethernet/emulex/benet/be_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index c28e3bf..6ca693b 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5174,7 +5174,7 @@ static void be_add_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
 	struct device *dev = &adapter->pdev->dev;
 	int status;

-	if (lancer_chip(adapter) || BEx_chip(adapter))
+	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
 		return;

 	if (adapter->flags & BE_FLAGS_VXLAN_OFFLOADS) {
@@ -5221,7 +5221,7 @@ static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
 {
 	struct be_adapter *adapter = netdev_priv(netdev);

-	if (lancer_chip(adapter) || BEx_chip(adapter))
+	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
 		return;

 	if (adapter->vxlan_port != port)
-- 
2.4.6
Re: Fw: [Bug 102861] New: soft lockup with inet: fix races with reqsk timers
On Fri, 2015-08-14 at 08:57 -0700, Stephen Hemminger wrote:
>
> Begin forwarded message:
>
> Date: Fri, 14 Aug 2015 14:32:56 +
> From: bugzilla-dae...@bugzilla.kernel.org
> To: shemmin...@linux-foundation.org
> Subject: [Bug 102861] New: soft lockup with inet: fix races with reqsk timers
>
> https://bugzilla.kernel.org/show_bug.cgi?id=102861
>
> Bug ID: 102861
> Summary: soft lockup with inet: fix races with reqsk timers
> Product: Networking
> Version: 2.5
> Kernel Version: net-next d52736e2
> Hardware: All
> OS: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: IPV4
> Assignee: shemmin...@linux-foundation.org
> Reporter: andreas.r...@gmail.com
> Regression: No
>
> Created attachment 184921
>   --> https://bugzilla.kernel.org/attachment.cgi?id=184921&action=edit
> dmesg picture
>
> Happens within 10min while eg. torrenting. net-next from a few days ago was
> fine.
> Picture as ssh hangs and dmesg to disk lands in nirvana instead.
>
> Checked via reverting this commit, which allowed me to merrily torrent without
> lockup again. This does not 100% prove that it was actually the cause, ofc.

Fixed with:
http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
[PATCH v2 net 2/3] ipv6: Add rt6_make_pcpu_route()
This is prep work for fixing a potential deadlock when creating a pcpu rt.

The current rt6_get_pcpu_route() will also create a pcpu rt if one does not
exist. This patch moves the pcpu rt creation logic into another function,
rt6_make_pcpu_route().

Signed-off-by: Martin KaFai Lau ka...@fb.com
CC: Hannes Frederic Sowa han...@stressinduktion.org
---
 net/ipv6/route.c | 20
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c95c319..0a82653 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -993,13 +993,21 @@ static struct rt6_info *ip6_rt_pcpu_alloc(struct rt6_info *rt)
 /* It should be called with read_lock_bh(&tb6_lock) acquired */
 static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt)
 {
-	struct rt6_info *pcpu_rt, *prev, **p;
+	struct rt6_info *pcpu_rt, **p;

 	p = this_cpu_ptr(rt->rt6i_pcpu);
 	pcpu_rt = *p;

-	if (pcpu_rt)
-		goto done;
+	if (pcpu_rt) {
+		dst_hold(&pcpu_rt->dst);
+		rt6_dst_from_metrics_check(pcpu_rt);
+	}
+	return pcpu_rt;
+}
+
+static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt)
+{
+	struct rt6_info *pcpu_rt, *prev, **p;

 	pcpu_rt = ip6_rt_pcpu_alloc(rt);
 	if (!pcpu_rt) {
@@ -1009,6 +1017,7 @@ static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt)
 		goto done;
 	}

+	p = this_cpu_ptr(rt->rt6i_pcpu);
 	prev = cmpxchg(p, NULL, pcpu_rt);
 	if (prev) {
 		/* If someone did it before us, return prev instead */
@@ -1093,8 +1102,11 @@ redo_rt6_select:
 		rt->dst.lastuse = jiffies;
 		rt->dst.__use++;
 		pcpu_rt = rt6_get_pcpu_route(rt);
-		read_unlock_bh(&table->tb6_lock);

+		if (!pcpu_rt)
+			pcpu_rt = rt6_make_pcpu_route(rt);
+
+		read_unlock_bh(&table->tb6_lock);
 		return pcpu_rt;
 	}
 }
-- 
1.8.1
[PATCH v2 net 0/3] ipv6: Fix a potential deadlock when creating pcpu rt
v1 -> v2:
A minor change in the commit message of patch 2.

This patch series fixes a potential deadlock when creating a pcpu rt.
It happens when dst_alloc() decides to run gc. Something like this:

read_lock(&table->tb6_lock);
ip6_rt_pcpu_alloc()
=> dst_alloc()
=> ip6_dst_gc()
=> write_lock(&table->tb6_lock); /* oops */

Patches 1 and 2 are prep work.
Patch 3 is the fix.

Original report: https://bugzilla.kernel.org/show_bug.cgi?id=102291
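The lock ordering in this cover letter can be modelled in userspace. The following is only a sketch, not kernel code: `toy_rwlock`, `toy_table`, `toy_alloc()` and the two `lookup_alloc_*()` helpers are hypothetical names, and the toy lock merely counts holders so that the write lock which would hang in the kernel is reported as a failure instead.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy single-threaded model of a rwlock. It only counts holders, so a
 * write lock attempted while a read lock is held can be reported rather
 * than hanging. All names here are hypothetical, not kernel API. */
struct toy_rwlock {
	int readers;
	bool writer;
};

static void toy_read_lock(struct toy_rwlock *l)   { l->readers++; }
static void toy_read_unlock(struct toy_rwlock *l) { l->readers--; }

/* Returns false in the situation where a real rwlock would spin forever. */
static bool toy_write_lock(struct toy_rwlock *l)
{
	if (l->readers > 0 || l->writer)
		return false;
	l->writer = true;
	return true;
}

static void toy_write_unlock(struct toy_rwlock *l) { l->writer = false; }

struct toy_table {
	struct toy_rwlock lock;
	int gc_runs;
};

/* Stand-in for an allocator that may run gc, which needs the write lock. */
static int *toy_alloc(struct toy_table *t)
{
	if (!toy_write_lock(&t->lock))
		return NULL;		/* models the deadlock */
	t->gc_runs++;
	toy_write_unlock(&t->lock);
	return calloc(1, sizeof(int));
}

/* Problematic ordering: allocate while the read lock is still held. */
static int *lookup_alloc_locked(struct toy_table *t)
{
	int *p;

	toy_read_lock(&t->lock);
	p = toy_alloc(t);
	toy_read_unlock(&t->lock);
	return p;
}

/* Safer ordering: drop the read lock first, then allocate. */
static int *lookup_alloc_unlocked(struct toy_table *t)
{
	toy_read_lock(&t->lock);
	/* ... the lookup itself would happen here ... */
	toy_read_unlock(&t->lock);
	return toy_alloc(t);
}
```

In this model `lookup_alloc_locked()` always fails while `lookup_alloc_unlocked()` succeeds. The model leaves out what the real code must also handle once the lock is dropped, such as the looked-up entry changing in the meantime.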
Re: [PATCH net-next 2/4] packet: add eBPF fanout mode
On 8/14/15 8:50 AM, Willem de Bruijn wrote:
> +static int fanout_set_data_ebpf(struct packet_fanout *f, char __user *data,
> +				unsigned int len)
> +{
> +	struct bpf_prog *new;
> +	u32 fd;
> +
> +	if (len != sizeof(fd))
> +		return -EINVAL;
> +	if (copy_from_user(&fd, data, len))
> +		return -EFAULT;
> +
> +	new = bpf_prog_get(fd);
> +	if (IS_ERR(new))
> +		return PTR_ERR(new);
> +
> +	__fanout_set_data_bpf(f, new);
> +	return 0;
> +}

all looks great except in the above the check:
	if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) {
		bpf_prog_put(new);
		return -EINVAL;
	}
is missing.
Otherwise user will be able to attach programs of wrong types to fanout.

Also instead of:
#define PACKET_FANOUT_BPF		6
#define PACKET_FANOUT_EBPF		7
I would call them FANOUT_CBPF and FANOUT_EBPF to be unambiguous.
This is how bpf manpage distinguishes them.
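The check requested in this review follows a common get/validate/put shape. Below is a minimal userspace sketch of that shape, assuming made-up types: `toy_prog`, `toy_fanout`, `toy_prog_get()`, `toy_prog_put()` and `toy_fanout_set_prog()` are hypothetical stand-ins and not the kernel's BPF API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real BPF API. */
enum toy_prog_type { TOY_PROG_SOCKET_FILTER, TOY_PROG_OTHER };

struct toy_prog {
	enum toy_prog_type type;
	int refcnt;
};

struct toy_fanout {
	struct toy_prog *prog;
};

/* Take a reference on the program, or return NULL if there is none. */
static struct toy_prog *toy_prog_get(struct toy_prog *p)
{
	if (p)
		p->refcnt++;
	return p;
}

/* Drop a reference taken with toy_prog_get(). */
static void toy_prog_put(struct toy_prog *p)
{
	p->refcnt--;
}

/* Attach a program only if it has the expected type. On a type mismatch
 * the reference is dropped again before returning the error. */
static int toy_fanout_set_prog(struct toy_fanout *f, struct toy_prog *candidate)
{
	struct toy_prog *prog = toy_prog_get(candidate);

	if (!prog)
		return -EBADF;

	if (prog->type != TOY_PROG_SOCKET_FILTER) {
		toy_prog_put(prog);
		return -EINVAL;
	}

	if (f->prog)
		toy_prog_put(f->prog);	/* release the previously attached one */
	f->prog = prog;
	return 0;
}
```

The point of the sketch is the error path: rejecting the program without the matching put would leak the reference taken by the get.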
[PATCH v3] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
* Due to a HW bug, the LAN8700 sometimes does not detect the presence of
  energy on the Ethernet cable in Energy Detect Power-Down mode (i.e. while
  the EDPWRDOWN bit is set, the ENERGYON bit is sometimes not asserted).
  This is a common bug of the LAN87xx family of PHY chips.
* lan87xx_read_status() was improved to poll for the ENERGYON bit. The
  previous algorithm was still not 100 % reliable and sometimes missed cable
  plugging.

Signed-off-by: Igor Plyatov plya...@gmail.com
---
 drivers/net/phy/smsc.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index c0f6479..d64f016 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -91,19 +91,18 @@ static int lan911x_config_init(struct phy_device *phydev)
 }

 /*
- * The LAN8710/LAN8720 requires a minimum of 2 link pulses within 64ms of each
- * other in order to set the ENERGYON bit and exit EDPD mode. If a link partner
- * does send the pulses within this interval, the PHY will remained powered
- * down.
- *
- * This workaround will manually toggle the PHY on/off upon calls to read_status
- * in order to generate link test pulses if the link is down. If a link partner
- * is present, it will respond to the pulses, which will cause the ENERGYON bit
- * to be set and will cause the EDPD mode to be exited.
+ * The LAN87xx suffers from rare absence of the ENERGYON-bit when Ethernet cable
+ * plugs in while LAN87xx is in Energy Detect Power-Down mode. This leads to
+ * unstable detection of plugging in Ethernet cable.
+ * This workaround disables Energy Detect Power-Down mode and waiting for
+ * response on link pulses to detect presence of plugged Ethernet cable.
+ * The Energy Detect Power-Down mode is enabled again in the end of procedure to
+ * save approximately 220 mW of power if cable is unplugged.
  */
 static int lan87xx_read_status(struct phy_device *phydev)
 {
 	int err = genphy_read_status(phydev);
+	int i;

 	if (!phydev->link) {
 		/* Disable EDPD to wake up PHY */
@@ -116,8 +115,16 @@ static int lan87xx_read_status(struct phy_device *phydev)
 		if (rc < 0)
 			return rc;

-		/* Sleep 64 ms to allow ~5 link test pulses to be sent */
-		msleep(64);
+		/* Wait max 640 ms to detect energy */
+		for (i = 0; i < 64; i++) {
+			/* Sleep to allow link test pulses to be sent */
+			msleep(10);
+			rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
+			if (rc < 0)
+				return rc;
+			if (rc & MII_LAN83C185_ENERGYON)
+				break;
+		};

 		/* Re-enable EDPD */
 		rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
@@ -191,7 +198,7 @@ static struct phy_driver smsc_phy_driver[] = {

 	/* basic functions */
 	.config_aneg	= genphy_config_aneg,
-	.read_status	= genphy_read_status,
+	.read_status	= lan87xx_read_status,
 	.config_init	= smsc_phy_config_init,
 	.soft_reset	= smsc_phy_reset,

-- 
1.7.9.5
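The bounded polling loop in this patch can be tried outside the kernel with a fake register. This is a sketch under assumptions: `toy_phy`, `toy_wait_for_energy()`, `toy_fake_read()`, `toy_no_sleep()` and the `TOY_ENERGYON` bit value are all hypothetical and chosen only for illustration; they are not the smsc driver's definitions.

```c
#include <stddef.h>

#define TOY_ENERGYON (1u << 1)	/* hypothetical status bit, for illustration */

struct toy_phy {
	int (*read_status)(struct toy_phy *phy);	/* <0 on error, else register value */
	void (*sleep_ms)(unsigned int ms);
	int reads;		/* number of register reads performed so far */
	int energy_after;	/* fake register reports energy from this read on */
};

/* Poll the status register up to max_polls times, step_ms apart.
 * Returns 1 if energy was seen, 0 on timeout, or a negative read error. */
static int toy_wait_for_energy(struct toy_phy *phy, int max_polls,
			       unsigned int step_ms)
{
	int i, rc;

	for (i = 0; i < max_polls; i++) {
		phy->sleep_ms(step_ms);
		rc = phy->read_status(phy);
		if (rc < 0)
			return rc;
		if (rc & TOY_ENERGYON)
			return 1;
	}
	return 0;
}

/* Fake register: sets the energy bit once enough reads have happened. */
static int toy_fake_read(struct toy_phy *phy)
{
	phy->reads++;
	return phy->reads >= phy->energy_after ? (int)TOY_ENERGYON : 0;
}

/* No-op sleep so the sketch runs instantly. */
static void toy_no_sleep(unsigned int ms)
{
	(void)ms;
}
```

With 64 polls of 10 ms each the worst case is the 640 ms mentioned in the patch comment, while a cable that is detected early ends the loop after only a few reads.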
Re: Several races in usbnet module (kernel 4.1.x)
Hi,

21.07.2015 17:22, Oliver Neukum wrote:
> On Mon, 2015-07-20 at 21:13 +0300, Eugene Shatokhin wrote:
>
>> And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
>> execute concurrently with the above operation:
>> #0 clear_bit (bitops.h:113, inlined)
>> #1 usbnet_bh (usbnet.c:1475)
>> 	/* restart RX again after disabling due to high error rate */
>> 	clear_bit(EVENT_RX_KILL, &dev->flags);
>>
>> If clear_bit() is atomic w.r.t. setting dev->flags to 0, this race is
>> not a problem, I guess. Otherwise, it may be.
>
> clear_bit is atomic with respect to other atomic operations.
> So how about this:
>
> Regards
> Oliver
>
> From 1c4e685b3a9c183e04c46b661830e5c7ed35b513 Mon Sep 17 00:00:00 2001
> From: Oliver Neukum oneu...@suse.com
> Date: Tue, 21 Jul 2015 16:19:40 +0200
> Subject: [PATCH] usbnet: fix race between usbnet_stop() and the BH
>
> Does this do the job?
>
> Signed-off-by: Oliver Neukum oneu...@suse.com
> ---
>  drivers/net/usb/usbnet.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 3c86b10..77a9a86 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -778,7 +778,7 @@ int usbnet_stop (struct net_device *net)
>  {
>  	struct usbnet *dev = netdev_priv(net);
>  	struct driver_info *info = dev->driver_info;
> -	int retval, pm;
> +	int retval, pm, mpn;
>
>  	clear_bit(EVENT_DEV_OPEN, &dev->flags);
>  	netif_stop_queue (net);
> @@ -813,14 +813,17 @@ int usbnet_stop (struct net_device *net)
>  	 * can't flush_scheduled_work() until we drop rtnl (later),
>  	 * else workers could deadlock; so make workers a NOP.
>  	 */
> +	mpn = !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
>  	dev->flags = 0;
>  	del_timer_sync (&dev->delay);
>  	tasklet_kill (&dev->bh);
> +	mpn |= !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags);
> +	/* in case the bh reset a flag */
> +	dev->flags = 0;
>  	if (!pm)
>  		usb_autopm_put_interface(dev->intf);
>
> -	if (info->manage_power &&
> -	    !test_and_clear_bit(EVENT_NO_RUNTIME_PM, &dev->flags))
> +	if (info->manage_power && mpn)
>  		info->manage_power(dev, 0);
>  	else
>  		usb_autopm_put_interface(dev->intf);

Based on what we have discussed here, I have put together a patch that fixes
race #1 in usbnet_stop() and makes #4 harmless by using atomics. I will send
it shortly.

I had to make some adjustments (e.g. using spin_lock_nested in one place so
that lockdep sees it is OK to take dev->done.lock there).

I have tested the patch on the mainline kernel 4.2-rc6 built for x86-64, with
the same USB modem. So far, lockdep, Kmemleak (just in case) and my tools have
not detected problems in the relevant parts of the code. The device and the
driver seem to work well.

So, what is your opinion?

Regards,
Eugene
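One way to picture "clear all flags atomically, but still learn whether a given bit was set" is with C11 atomics in userspace. This is only an illustrative sketch and not the code of either patch in this thread: `toy_dev`, `toy_set_bit()`, `toy_clear_bit()`, `toy_stop_flags()` and the `TOY_EVENT_*` values are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, chosen only for this sketch. */
enum { TOY_EVENT_RX_KILL = 0, TOY_EVENT_NO_RUNTIME_PM = 1 };

struct toy_dev {
	atomic_ulong flags;
};

/* Atomic read-modify-write helpers, analogous in spirit to bit operations. */
static void toy_set_bit(int nr, struct toy_dev *d)
{
	atomic_fetch_or(&d->flags, 1UL << nr);
}

static void toy_clear_bit(int nr, struct toy_dev *d)
{
	atomic_fetch_and(&d->flags, ~(1UL << nr));
}

/* Clear every flag in one atomic step and report whether NO_RUNTIME_PM was
 * set just before the clear. Because the old value comes from the same
 * atomic exchange, a concurrent toy_clear_bit() cannot fall between the
 * check and the clear. */
static bool toy_stop_flags(struct toy_dev *d)
{
	unsigned long old = atomic_exchange(&d->flags, 0UL);

	return (old & (1UL << TOY_EVENT_NO_RUNTIME_PM)) != 0;
}
```

The design choice shown is to derive the "was the bit set" answer from the value returned by the clearing operation itself, instead of testing the bit after a plain store of 0.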
Re: [PATCH net-next] lwtunnel: rename ip lwtunnel attributes
On 08/14/15 at 04:40pm, Jiri Benc wrote:
> We already have IFLA_IPTUN_ netlink attributes. The IP_TUN_ attributes look
> very similar, yet they serve very different purpose. This is confusing for
> anyone trying to implement a user space tool supporting lwt.
>
> As the IP_TUN_ attributes are used only for the lightweight tunnels, prefix
> them with LWTUNNEL_IP_ instead to make their purpose clear. Also, it's more
> logical to have them in lwtunnel.h together with the encap enum.
>
> Fixes: 3093fbe7ff4b (route: Per route IP tunnel metadata via lightweight tunnel)
> Signed-off-by: Jiri Benc jb...@redhat.com

Acked-by: Thomas Graf tg...@suug.ch

> ---
> These are still in net-next only, thus it's safe to change them.
>
> It's still a bit weird these attributes are in RTA_ENCAP, perhaps we should
> also rename RTA_ENCAP to RTA_LWT_ENCAP or similar?

I think RTA_ENCAP is fine but I don't mind changing it either.
Re: [PATCH] net: phy: fix PHY_RUNNING in phy_state_machine
On 08/13/15 21:23, shh@gmail.com wrote:
> From: Shaohui Xie shaohui@freescale.com
>
> Currently, if phy state is PHY_RUNNING, we always register a CHANGE
> when phy works in polling or interrupt ignored, this will make the
> adjust_link being called even the phy link did Not changed.

Right, which is why most drivers do implement a caching scheme.

> checking the phy link to make sure the link did changed before we
> register a CHANGE, if link did not changed, we do nothing.

With your change we will end up virtually polling a PHY twice as fast as
we used to with the RUNNING -> CHANGELINK -> RUNNING transition (current
state transitions), which is probably fine, but puts a bit more pressure
on the (slow) MDIO bus since we end up with two additional reads to
latch the link status register.

PS: I would appreciate it if you could CC me on future libphy submissions.

> Signed-off-by: Shaohui Xie shaohui@freescale.com
> ---
>  drivers/net/phy/phy.c | 16
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index 84b1fba..d972851 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -814,6 +814,7 @@ void phy_state_machine(struct work_struct *work)
>  	bool needs_aneg = false, do_suspend = false;
>  	enum phy_state old_state;
>  	int err = 0;
> +	int old_link;
>
>  	mutex_lock(&phydev->lock);
>
> @@ -899,11 +900,18 @@ void phy_state_machine(struct work_struct *work)
>  		phydev->adjust_link(phydev->attached_dev);
>  		break;
>  	case PHY_RUNNING:
> -		/* Only register a CHANGE if we are
> -		 * polling or ignoring interrupts
> +		/* Only register a CHANGE if we are polling or ignoring
> +		 * interrupts and link changed since latest checking.
>  		 */
> -		if (!phy_interrupt_is_valid(phydev))
> -			phydev->state = PHY_CHANGELINK;
> +		if (!phy_interrupt_is_valid(phydev)) {
> +			old_link = phydev->link;
> +			err = phy_read_status(phydev);
> +			if (err)
> +				break;
> +
> +			if (old_link != phydev->link)
> +				phydev->state = PHY_CHANGELINK;
> +		}
>  		break;
>  	case PHY_CHANGELINK:
>  		err = phy_read_status(phydev);

-- 
Florian
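The "caching scheme" mentioned in this reply usually means that a driver's link-change callback remembers the last link parameters it acted on and returns early when nothing changed. A minimal sketch of that idea follows, assuming made-up types: `toy_phydev`, `toy_priv` and `toy_adjust_link()` are hypothetical and do not mirror any particular driver.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the PHY state a driver looks at. */
struct toy_phydev {
	int link;
	int speed;
	int duplex;
};

/* Driver-private cache of the last values that were acted upon. */
struct toy_priv {
	int old_link;
	int old_speed;
	int old_duplex;
	int reprogram_count;	/* how often the (imaginary) MAC was reprogrammed */
};

/* Called on every poll; only does work when something actually changed.
 * Returns true if the cached state was updated. */
static bool toy_adjust_link(struct toy_priv *priv, const struct toy_phydev *phy)
{
	bool changed = false;

	if (phy->link != priv->old_link) {
		priv->old_link = phy->link;
		changed = true;
	}

	/* Speed and duplex only matter while the link is up. */
	if (phy->link &&
	    (phy->speed != priv->old_speed || phy->duplex != priv->old_duplex)) {
		priv->old_speed = phy->speed;
		priv->old_duplex = phy->duplex;
		changed = true;
	}

	if (changed)
		priv->reprogram_count++;	/* stand-in for touching hardware */
	return changed;
}
```

With such a cache in the driver, repeated callbacks for an unchanged link cost a few comparisons and no extra MDIO traffic, which is the trade-off the reply weighs against re-reading the status in the state machine.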
Re: [PATCH v2] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
Hi Igor, Am Freitag, 14. August 2015, 11:03:04 schrieb Igor Plyatov: Dear Michael, Hi Igor, Am Donnerstag, 13. August 2015, 22:18:34 schrieben Sie: * Due to HW bug, LAN8700 sometimes does not detect presence of energy in the Ethernet cable in Energy Detect Power-Down mode (e.g while EDPWRDOWN bit is set, the ENERGYON bit does not asserted sometimes). This is a common bug of LAN87xx family of PHY chips. Is there any offical errata sheet for this PHY family? How do you know, that this is a common HW bug? The LAN8700, LAN8710, LAN8720 is a product of the SMSC company. Microchip acquired SMSC in August 2012. The LAN8700 is a legacy product for Microchip and they will not update anything about it. So, even if Microchip know about HW bug, then there is no chance to have Errata sheet or any new documents about LAN8700. Long time ago, I worked on a custom device with a PHY of the same family. Errata sheet existed but was only available by signing a NDA. So I simply wondered whether this changed since SMSC is now Microchip or if they keep it still so covered... I think same history is for LAN8710/LAN8720 even if they are not marked as legacy. They are SMSC products. The workarounds for same issue in LAN8710/LAN8720 was committed by: * Marek Vasut ma...@denx.de as b629820d18fa65cc598390e4b9712fd5f83ee693. * Patrick Trantham patrick.trant...@fuel7.com as 4223dbffed9f89596177ff2b256ef3258b20fa46. Me too, I think that this family has some problems with this mode, however, without hard evidence, I would put it softer. I have discovered this bug by just monitoring of data to/from MDIO registers of LAN8700. And HW issue is proven on 100 % by rare absence of ENERGYON bit when cable is plugged in. Sometimes, it is required to make 2-20 tests to catch this issue. The configuration of CPU pins, responsible for the MDIO interface, was checked carefully by oscilloscope and they are fine (no spikes, no garbage, good shape of edges). 
* The lan87xx_read_status() was improved to acquire ENERGYON bit. Its previous algorythm still not reliable on 100 % and sometimes skip cable plugging. Signed-off-by: Igor Plyatov plya...@gmail.com --- drivers/net/phy/smsc.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c index c0f6479..8559ff1 100644 --- a/drivers/net/phy/smsc.c +++ b/drivers/net/phy/smsc.c @@ -104,6 +104,7 @@ static int lan911x_config_init(struct phy_device *phydev) static int lan87xx_read_status(struct phy_device *phydev) { int err = genphy_read_status(phydev); + int i; if (!phydev-link) { /* Disable EDPD to wake up PHY */ @@ -116,8 +117,16 @@ static int lan87xx_read_status(struct phy_device *phydev) if (rc 0) return rc; - /* Sleep 64 ms to allow ~5 link test pulses to be sent */ - msleep(64); + /* Wait max 640 ms to detect energy */ Why 640ms and not e.g. 650ms? I'm no PHY expert, but this looks like an ugly workaround. Such a value was adopted after many trial and probes. It allows to detect cable plugging on 100 %. Ugly or not, but it works and reliable. Maybe it would be better to avoid this power saving mode at all, when it is not reliable, but this are just my 2cts. :-) Power saving mode allow to save around 220 mW of energy consumed from power supply, when Ethernet cable is not plugged in. This is a good value for embedded devices. Better to keep power save mode on. Ok, I was not aware, that this is so much. Anyway, I guess you should also update the explanation on top of the function to reflect your new approach. I propose following comment for the lan87xx_read_status(): /* * The LAN87xx suffers from rare absence of the ENERGYON-bit when Ethernet cable * plugs in while LAN87xx is in Energy Detect Power-Down mode. This leads to * unstable detection of plugging in Ethernet cable. 
* This workaround disables Energy Detect Power-Down mode and waiting for * response on link pulses to detect presence of plugged Ethernet cable. * The Energy Detect Power-Down mode enabled again in the end of procedure to * save approximately 220 mW of power if cable is unplugged. */ Nice. Only one nitpick: ... _is_ enabled again... + for (i = 0; i 64; i++) { + /* Sleep to allow link test pulses to be sent */ + msleep(10); + rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS); + if (rc 0) + return rc; + if (rc MII_LAN83C185_ENERGYON) + break; + }; /* Re-enable EDPD */ rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS); @@ -191,7 +200,7 @@ static struct phy_driver smsc_phy_driver[] = { /* basic functions */ .config_aneg = genphy_config_aneg, -
Re: [PATCH v2] net: phy: workaround for buggy cable detection by LAN8700 after cable plugging
Dear Michael,

>> The LAN8700, LAN8710, LAN8720 is a product of the SMSC company.
>> Microchip acquired SMSC in August 2012.
>> The LAN8700 is a legacy product for Microchip and they will not update
>> anything about it. So, even if Microchip know about HW bug, then there is
>> no chance to have Errata sheet or any new documents about LAN8700.
> Long time ago, I worked on a custom device with a PHY of the same family.
> Errata sheet existed but was only available by signing a NDA. So I simply
> wondered whether this changed since SMSC is now Microchip or if they keep
> it still so covered...

The Microchip website does not contain an errata sheet for LAN87xx devices,
while it contains many errata sheets for PIC and dsPIC devices.
So the situation is the same as many years ago.

>> I propose following comment for the lan87xx_read_status():
>>
>> /*
>>  * The LAN87xx suffers from rare absence of the ENERGYON-bit when Ethernet cable
>>  * plugs in while LAN87xx is in Energy Detect Power-Down mode. This leads to
>>  * unstable detection of plugging in Ethernet cable.
>>  * This workaround disables Energy Detect Power-Down mode and waiting for
>>  * response on link pulses to detect presence of plugged Ethernet cable.
>>  * The Energy Detect Power-Down mode enabled again in the end of procedure to
>>  * save approximately 220 mW of power if cable is unplugged.
>>  */
> Nice. Only one nitpick: ... _is_ enabled again...

Changed in [PATCH v3].

Best wishes.
-- 
Igor Plyatov
Re: [PATCH net-next 04/11] udp: Handle VRF device in sendmsg
On Thu, Aug 13, 2015 at 1:59 PM, David Ahern d...@cumulusnetworks.com wrote:
> For unconnected UDP sockets using a VRF device lookup source address
> based on VRF table. This allows the UDP header to be properly setup
> before showing up at the VRF device via the dst.
>
> Signed-off-by: Shrijeet Mukherjee s...@cumulusnetworks.com
> Signed-off-by: David Ahern d...@cumulusnetworks.com
> ---
>  net/ipv4/udp.c | 22 +-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 83aa604f9273..7af5052e3b1f 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1013,11 +1013,31 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>
>  	if (!rt) {
>  		struct net *net = sock_net(sk);
> +		__u8 flow_flags = inet_sk_flowi_flags(sk);
>
>  		fl4 = &fl4_stack;
> +
> +		/* unconnected socket. If output device is enslaved to a VRF
> +		 * device lookup source address from VRF table. This mimics
> +		 * behavior of ip_route_connect{_init}.
> +		 */
> +		if (netif_index_is_vrf(net, ipc.oif)) {
> +			flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
> +					   RT_SCOPE_UNIVERSE, sk->sk_protocol,
> +					   (flow_flags | FLOWI_FLAG_VRFSRC),
> +					   faddr, saddr, dport,
> +					   inet->inet_sport);
> +
> +			rt = ip_route_output_flow(net, fl4, sk);
> +			if (!IS_ERR(rt)) {
> +				saddr = fl4->saddr;
> +				ip_rt_put(rt);
> +			}
> +		}
> +

I really don't like this. It seems like you're putting device-specific
code in a critical L4 data path function. Also, does ipv6/udp.c need to be
updated similarly? Why can't VRF be abstracted out in routing lookups?

Tom

>  		flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
>  				   RT_SCOPE_UNIVERSE, sk->sk_protocol,
> -				   inet_sk_flowi_flags(sk),
> +				   flow_flags,
>  				   faddr, saddr, dport, inet->inet_sport);
>
>  		security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
> --
> 2.3.2 (Apple Git-55)
[PATCH] usbnet: Fix two races between usbnet_stop() and the BH
Both races may happen when a device (e.g. YOTA 4G LTE Modem) is
unplugged while the system is downloading a large file from the Net.

Hardware breakpoints and Kprobes with delays were used to confirm that
the races do actually happen.

1. The first race is on skb_queue ('next' pointer) between usbnet_stop()
and rx_complete(), which, in turn, calls usbnet_bh().

Here is a part of the call stack with the code where the changes to the
queue happen. The line numbers are for the kernel 4.1.0:

*0 __skb_unlink (skbuff.h:1517)
	prev->next = next;
*1 defer_bh (usbnet.c:430)
	spin_lock_irqsave(&list->lock, flags);
	old_state = entry->state;
	entry->state = state;
	__skb_unlink(skb, list);
	spin_unlock(&list->lock);
	spin_lock(&dev->done.lock);
	__skb_queue_tail(&dev->done, skb);
	if (dev->done.qlen == 1)
		tasklet_schedule(&dev->bh);
	spin_unlock_irqrestore(&dev->done.lock, flags);
*2 rx_complete (usbnet.c:640)
	state = defer_bh(dev, skb, &dev->rxq, state);

At the same time, the following code repeatedly checks if the queue is
empty and reads these values concurrently with the above changes:

*0 usbnet_terminate_urbs (usbnet.c:765)
	/* maybe wait for deletions to finish. */
	while (!skb_queue_empty(&dev->rxq)
		&& !skb_queue_empty(&dev->txq)
		&& !skb_queue_empty(&dev->done)) {
			schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
			set_current_state(TASK_UNINTERRUPTIBLE);
			netif_dbg(dev, ifdown, dev->net,
				  "waited for %d urb completions\n", temp);
	}
*1 usbnet_stop (usbnet.c:806)
	if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
		usbnet_terminate_urbs(dev);

As a result, it is possible, for example, that the skb is removed from
dev->rxq by __skb_unlink() before the check
"!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
also possible in this case that the skb is added to dev->done queue
after "!skb_queue_empty(&dev->done)" is checked. So
usbnet_terminate_urbs() may stop waiting and return while dev->done
queue still has an item.

Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
this race.

2. The second race is on dev->flags.

dev->flags is set to 0 here:
*0 usbnet_stop (usbnet.c:816)
	/* deferred work (task, timer, softirq) must also stop.
	 * can't flush_scheduled_work() until we drop rtnl (later),
	 * else workers could deadlock; so make workers a NOP.
	 */
	dev->flags = 0;
	del_timer_sync (&dev->delay);
	tasklet_kill (&dev->bh);

And here, the code clears EVENT_RX_KILL bit in dev->flags, which may
execute concurrently with the above operation:
*0 clear_bit (bitops.h:113, inlined)
*1 usbnet_bh (usbnet.c:1475)
	/* restart RX again after disabling due to high error rate */
	clear_bit(EVENT_RX_KILL, &dev->flags);

It seems that setting dev->flags to 0 is not necessarily atomic w.r.t.
clear_bit() and other bit operations on dev->flags. It is safer to make
it atomic and, this way, make the race harmless.

While at it, the checking of EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop() was fixed too: the bit should be checked before dev->flags
is cleared.

Signed-off-by: Eugene Shatokhin eugene.shatok...@rosalab.ru
---
 drivers/net/usb/usbnet.c   | 49 --
 include/linux/usb/usbnet.h | 33 +++
 2 files changed, 54 insertions(+), 28 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 3c86b10..a53124c 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -428,12 +428,18 @@ static enum skb_state defer_bh(struct usbnet *dev, struct sk_buff *skb,
 	old_state = entry->state;
 	entry->state = state;
 	__skb_unlink(skb, list);
-	spin_unlock(&list->lock);
-	spin_lock(&dev->done.lock);
+
+	/* defer_bh() is never called with list == &dev->done.
+	 * spin_lock_nested() tells lockdep that it is OK to take
+	 * dev->done.lock here with list->lock held.
+	 */
+	spin_lock_nested(&dev->done.lock, SINGLE_DEPTH_NESTING);
+
 	__skb_queue_tail(&dev->done, skb);
 	if (dev->done.qlen == 1)
 		tasklet_schedule(&dev->bh);
-	spin_unlock_irqrestore(&dev->done.lock, flags);
+	spin_unlock(&dev->done.lock);
+	spin_unlock_irqrestore(&list->lock, flags);
 	return old_state;
 }

@@ -749,6 +755,20 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);

 /*-------------------------------------------------------------------------*/

+static void wait_skb_queue_empty(struct sk_buff_head *q)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->lock, flags);
+	while (!skb_queue_empty(q)) {
+		spin_unlock_irqrestore(&q->lock, flags);
+		schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		spin_lock_irqsave(&q->lock, flags);
+	}
+	spin_unlock_irqrestore(&q->lock, flags);
+}
+
 // precondition: never
Re: [PATCH RFC net 0/3] ipv6: Fix potential deadlock when creating pcpu rt
On Thu, Aug 13, 2015 at 05:29:09PM -0700, David Miller wrote:
> From: Martin KaFai Lau ka...@fb.com
> Date: Thu, 13 Aug 2015 00:58:00 -0700
>
>> This patch series fixes a potential deadlock when creating a pcpu rt.
>> It happens when dst_alloc() decided to run gc. Something like this:
>>
>> read_lock(&table->tb6_lock);
>> ip6_rt_pcpu_alloc()
>> => dst_alloc()
>> => ip6_dst_gc()
>> => write_lock(&table->tb6_lock); /* oops */
>>
>> Patch 1 and 2 are some prep works.
>> Patch 3 is the fix.
>>
>> Original report: https://bugzilla.kernel.org/show_bug.cgi?id=102291
>>
>> Steinar, the patches can also be applied to 4.2-rc5 (I just tried).
>> Can you help to test them? Thanks!
>
> This series looks fine to me.

Thanks. I will repost it with a minor change in one of the commit messages.
[PATCH v2 net 3/3] ipv6: Fix a potential deadlock when creating pcpu rt
rt6_make_pcpu_route() is called under read_lock(table-tb6_lock). rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt) which then calls dst_alloc(). dst_alloc() _may_ call ip6_dst_gc() which takes the write_lock(tabl-tb6_lock). A visualized version: read_lock(table-tb6_lock); rt6_make_pcpu_route(); = ip6_rt_pcpu_alloc(); = dst_alloc(); = ip6_dst_gc(); = write_lock(table-tb6_lock); /* oops */ The fix is to do a read_unlock first before calling ip6_rt_pcpu_alloc(). A reported stack: [141625.537638] INFO: rcu_sched self-detected stall on CPU { 27} (t=6 jiffies g=4159086 c=4159085 q=2139) [141625.547469] Task dump for CPU 27: [141625.550881] mtr R running task0 22121 22081 0x0008 [141625.558069] 88103f363d98 8106e488 001b [141625.565641] 81684900 88103f363db8 810702b0 0800 [141625.573220] 81684900 88103f363de8 8108df9f 88103f375a00 [141625.580803] Call Trace: [141625.583345] IRQ [8106e488] sched_show_task+0xc1/0xc6 [141625.589650] [810702b0] dump_cpu_task+0x35/0x39 [141625.595144] [8108df9f] rcu_dump_cpu_stacks+0x6a/0x8c [141625.601320] [81090606] rcu_check_callbacks+0x1f6/0x5d4 [141625.607669] [810940c8] update_process_times+0x2a/0x4f [141625.613925] [8109fbee] tick_sched_handle+0x32/0x3e [141625.619923] [8109fc2f] tick_sched_timer+0x35/0x5c [141625.625830] [81094a1f] __hrtimer_run_queues+0x8f/0x18d [141625.632171] [81094c9e] hrtimer_interrupt+0xa0/0x166 [141625.638258] [8102bf2a] local_apic_timer_interrupt+0x4e/0x52 [141625.645036] [8102c36f] smp_apic_timer_interrupt+0x39/0x4a [141625.651643] [8140b9e8] apic_timer_interrupt+0x68/0x70 [141625.657895] EOI [81346ee8] ? dst_destroy+0x7c/0xb5 [141625.664188] [813d45b5] ? fib6_flush_trees+0x20/0x20 [141625.670272] [81082b45] ? queue_write_lock_slowpath+0x60/0x6f [141625.677140] [8140aa33] _raw_write_lock_bh+0x23/0x25 [141625.683218] [813d4553] __fib6_clean_all+0x40/0x82 [141625.689124] [813d45b5] ? 
fib6_flush_trees+0x20/0x20 [141625.695207] [813d6058] fib6_clean_all+0xe/0x10 [141625.700854] [813d60d3] fib6_run_gc+0x79/0xc8 [141625.706329] [813d0510] ip6_dst_gc+0x85/0xf9 [141625.711718] [81346d68] dst_alloc+0x55/0x159 [141625.717105] [813d09b5] __ip6_dst_alloc.isra.32+0x19/0x63 [141625.723620] [813d1830] ip6_pol_route+0x36a/0x3e8 [141625.729441] [813d18d6] ip6_pol_route_output+0x11/0x13 [141625.735700] [813f02c8] fib6_rule_action+0xa7/0x1bf [141625.741698] [813d18c5] ? ip6_pol_route_input+0x17/0x17 [141625.748043] [81357c48] fib_rules_lookup+0xb5/0x12a [141625.754050] [81141628] ? poll_select_copy_remaining+0xf9/0xf9 [141625.761002] [813f0535] fib6_rule_lookup+0x37/0x5c [141625.766914] [813d18c5] ? ip6_pol_route_input+0x17/0x17 [141625.773260] [813d008c] ip6_route_output+0x7a/0x82 [141625.779177] [813c44c8] ip6_dst_lookup_tail+0x53/0x112 [141625.785437] [813c45c3] ip6_dst_lookup_flow+0x2a/0x6b [141625.791604] [813ddaab] rawv6_sendmsg+0x407/0x9b6 [141625.797423] [813d7914] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2 [141625.804464] [8139d4b4] inet_sendmsg+0x57/0x8e [141625.810028] [81329ba3] sock_sendmsg+0x2e/0x3c [141625.815588] [8132be57] SyS_sendto+0xfe/0x143 [141625.821063] [813dd551] ? rawv6_setsockopt+0x5e/0x67 [141625.827146] [8132c9f8] ? sock_common_setsockopt+0xf/0x11 [141625.833660] [8132c08c] ? SyS_setsockopt+0x81/0xa2 [141625.839565] [8140ac17] entry_SYSCALL_64_fastpath+0x12/0x6a Fixes: d52d3997f843 (pv6: Create percpu rt6_info) Signed-off-by: Martin KaFai Lau ka...@fb.com CC: Hannes Frederic Sowa han...@stressinduktion.org Reported-by: Steinar H. 
Gunderson sgunder...@bigfoot.com --- net/ipv6/ip6_fib.c | 2 ++ net/ipv6/route.c | 44 +--- 2 files changed, 35 insertions(+), 11 deletions(-) diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 55d1986..548c623 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -172,6 +172,8 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt) *ppcpu_rt = NULL; } } + + non_pcpu_rt-rt6i_pcpu = NULL; } static void rt6_release(struct rt6_info *rt) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 0a82653..d155864 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1007,27 +1007,39 @@ static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt) static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt) { + struct fib6_table *table = rt-rt6i_table; struct rt6_info *pcpu_rt, *prev, **p; pcpu_rt = ip6_rt_pcpu_alloc(rt); if (!pcpu_rt) {
[PATCH v2 net 1/3] ipv6: Remove un-used argument from ip6_dst_alloc()
After 4b32b5ad31a6 (ipv6: Stop rt6_info from using inet_peer's metrics), ip6_dst_alloc() does not need the 'table' argument. This patch cleans it up. Signed-off-by: Martin KaFai Lau ka...@fb.com CC: Hannes Frederic Sowa han...@stressinduktion.org --- net/ipv6/route.c | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 9de4d2b..c95c319 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -318,8 +318,7 @@ static const struct rt6_info ip6_blk_hole_entry_template = { /* allocate dst with ip6_dst_ops */ static struct rt6_info *__ip6_dst_alloc(struct net *net, struct net_device *dev, - int flags, - struct fib6_table *table) + int flags) { struct rt6_info *rt = dst_alloc(net-ipv6.ip6_dst_ops, dev, 0, DST_OBSOLETE_FORCE_CHK, flags); @@ -336,10 +335,9 @@ static struct rt6_info *__ip6_dst_alloc(struct net *net, static struct rt6_info *ip6_dst_alloc(struct net *net, struct net_device *dev, - int flags, - struct fib6_table *table) + int flags) { - struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags, table); + struct rt6_info *rt = __ip6_dst_alloc(net, dev, flags); if (rt) { rt-rt6i_pcpu = alloc_percpu_gfp(struct rt6_info *, GFP_ATOMIC); @@ -950,8 +948,7 @@ static struct rt6_info *ip6_rt_cache_alloc(struct rt6_info *ort, if (ort-rt6i_flags (RTF_CACHE | RTF_PCPU)) ort = (struct rt6_info *)ort-dst.from; - rt = __ip6_dst_alloc(dev_net(ort-dst.dev), ort-dst.dev, -0, ort-rt6i_table); + rt = __ip6_dst_alloc(dev_net(ort-dst.dev), ort-dst.dev, 0); if (!rt) return NULL; @@ -983,8 +980,7 @@ static struct rt6_info *ip6_rt_pcpu_alloc(struct rt6_info *rt) struct rt6_info *pcpu_rt; pcpu_rt = __ip6_dst_alloc(dev_net(rt-dst.dev), - rt-dst.dev, rt-dst.flags, - rt-rt6i_table); + rt-dst.dev, rt-dst.flags); if (!pcpu_rt) return NULL; @@ -1555,7 +1551,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev, if (unlikely(!idev)) return ERR_PTR(-ENODEV); - rt = ip6_dst_alloc(net, dev, 0, NULL); + rt = ip6_dst_alloc(net, dev, 0); if 
(unlikely(!rt)) { in6_dev_put(idev); dst = ERR_PTR(-ENOMEM); @@ -1742,7 +1738,8 @@ int ip6_route_add(struct fib6_config *cfg) if (!table) goto out; - rt = ip6_dst_alloc(net, NULL, (cfg-fc_flags RTF_ADDRCONF) ? 0 : DST_NOCOUNT, table); + rt = ip6_dst_alloc(net, NULL, + (cfg-fc_flags RTF_ADDRCONF) ? 0 : DST_NOCOUNT); if (!rt) { err = -ENOMEM; @@ -2399,7 +2396,7 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev, { struct net *net = dev_net(idev-dev); struct rt6_info *rt = ip6_dst_alloc(net, net-loopback_dev, - DST_NOCOUNT, NULL); + DST_NOCOUNT); if (!rt) return ERR_PTR(-ENOMEM); -- 1.8.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux 4.2-rc6 regression: RIP: e030:[ffffffff8110fb18] [ffffffff8110fb18] detach_if_pending+0x18/0x80
On 2015-08-13 00:41, Eric Dumazet wrote: On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote: Thanks for the reminder, but luckily i was aware of that, seen enough of your replies asking for patches to be resubmitted against the other tree ;) Kernel with patch is currently running so fingers crossed. Thanks for testing. I am definitely interested knowing your results. Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is breaking things (have to test if a revert helps) i get this in some guests: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 506s! [swapper/0:0] [ 6620.282805] Modules linked in: [ 6620.282805] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150814-linus-doflr-apicrevert+ #1 [ 6620.282805] task: 8221a580 ti: 8220 task.ti: 8220 [ 6620.282805] RIP: e030:[8100122a] [8100122a] xen_hypercall_xen_version+0xa/0x20 [ 6620.282805] RSP: e02b:88000fc03d48 EFLAGS: 0246 [ 6620.282805] RAX: 00040006 RBX: 0200 RCX: 8100122a [ 6620.282805] RDX: 0001 RSI: deadbeef RDI: deadbeef [ 6620.282805] RBP: 88000fc03d60 R08: 88000fc03ee0 R09: 00ee [ 6620.282805] R10: 8220a0c0 R11: 0246 R12: [ 6620.282805] R13: 0001 R14: 880003b53054 R15: 0005 [ 6620.282805] FS: 7fec747ad800() GS:88000fc0() knlGS: [ 6620.282805] CS: e033 DS: ES: CR0: 8005003b [ 6620.282805] CR2: 7ffcb7a7a6d8 CR3: 03164000 CR4: 0660 [ 6620.282805] Stack: [ 6620.282805] 0068 0007 81008dbd 88000fc03dd8 [ 6620.282805] 81009592 0068 8220a0c0 00ee [ 6620.282805] 88000fc03ee0 0200 0200 0001 [ 6620.282805] Call Trace: [ 6620.282805] IRQ [ 6620.282805] [81008dbd] ? xen_force_evtchn_callback+0xd/0x10 [ 6620.282805] [81009592] check_events+0x12/0x20 [ 6620.282805] [8100957f] ? xen_restore_fl_direct_reloc+0x4/0x4 [ 6620.282805] [81af79a5] ? 
_raw_spin_unlock_irqrestore+0x25/0x30 [ 6620.282805] [8110ed43] try_to_del_timer_sync+0x43/0x60 [ 6620.282805] [8110eda7] del_timer_sync+0x47/0x60 [ 6620.282805] [81a2b698] inet_csk_reqsk_queue_drop+0x118/0x1f0 [ 6620.282805] [81a2b8c6] reqsk_timer_handler+0x156/0x260 [ 6620.282805] [81a2b770] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0 [ 6620.282805] [8110f3c7] call_timer_fn.isra.27+0x17/0x80 [ 6620.282805] [81a2b770] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0 [ 6620.282805] [8110f55d] run_timer_softirq+0x12d/0x200 [ 6620.282805] [810ca6c3] __do_softirq+0x103/0x210 [ 6620.282805] [810ca9cb] irq_exit+0x4b/0xa0 [ 6620.282805] [814f05d4] xen_evtchn_do_upcall+0x34/0x50 [ 6620.282805] [81af932e] xen_do_hypervisor_callback+0x1e/0x40 [ 6620.282805] EOI [ 6620.282805] [810013aa] ? xen_hypercall_sched_op+0xa/0x20 [ 6620.282805] [810013aa] ? xen_hypercall_sched_op+0xa/0x20 [ 6620.282805] [81008d60] ? xen_safe_halt+0x10/0x20 [ 6620.282805] [810188d3] ? default_idle+0x13/0x20 [ 6620.282805] [81018e1a] ? arch_cpu_idle+0xa/0x10 [ 6620.282805] [810f8e7e] ? default_idle_call+0x2e/0x50 [ 6620.282805] [810f9112] ? cpu_startup_entry+0x272/0x2e0 [ 6620.282805] [81ae7967] ? rest_init+0x77/0x80 [ 6620.282805] [82312f58] ? start_kernel+0x43b/0x448 [ 6620.282805] [823124ef] ? x86_64_start_reservations+0x2a/0x2c [ 6620.282805] [82316008] ? xen_start_kernel+0x550/0x55c [ 6620.282805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] rhashtable-test: extend to test concurrency
After having tested insertion, lookup, table walk and removal, spawn a number of threads running operations on the same rhashtable. Each of them will: 1) insert it's own set of objects, 2) lookup every successfully inserted object and finally 3) remove objects in several rounds until all of them have been removed, making sure the remaining ones are still found after each round. This should put a good amount of load onto the system and due to synchronising thread startup via two semaphores also extensive concurrent table access. The default number of ten threads returned within half a second on my local VM with two cores. Running 200 threads took about four seconds. If slow systems suffer too much from this though, the default could be lowered or even set to zero so this extended test does not run at all by default. Signed-off-by: Phil Sutter p...@nwl.cc --- lib/test_rhashtable.c | 155 +- 1 file changed, 154 insertions(+), 1 deletion(-) diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c index 9af7cef..a26d76f 100644 --- a/lib/test_rhashtable.c +++ b/lib/test_rhashtable.c @@ -16,9 +16,11 @@ #include linux/init.h #include linux/jhash.h #include linux/kernel.h +#include linux/kthread.h #include linux/module.h #include linux/rcupdate.h #include linux/rhashtable.h +#include linux/semaphore.h #include linux/slab.h #include linux/sched.h @@ -45,11 +47,21 @@ static int size = 8; module_param(size, int, 0); MODULE_PARM_DESC(size, Initial size hint of table (default: 8)); +static int tcount = 10; +module_param(tcount, int, 0); +MODULE_PARM_DESC(tcount, Number of threads to spawn (default: 10)); + struct test_obj { int value; struct rhash_head node; }; +struct thread_data { + int id; + struct task_struct *task; + struct test_obj *objs; +}; + static struct test_obj array[MAX_ENTRIES]; static struct rhashtable_params test_rht_params = { @@ -60,6 +72,9 @@ static struct rhashtable_params test_rht_params = { .nulls_base = (3U RHT_BASE_SHIFT), }; +static struct semaphore 
prestart_sem; +static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0); + static int __init test_rht_lookup(struct rhashtable *ht) { unsigned int i; @@ -200,10 +215,97 @@ static s64 __init test_rhashtable(struct rhashtable *ht) static struct rhashtable ht; +static int thread_lookup_test(struct thread_data *tdata) +{ + int i, err = 0; + + for (i = 0; i entries; i++) { + struct test_obj *obj; + int key = (tdata-id 16) | i; + + obj = rhashtable_lookup_fast(ht, key, test_rht_params); + if (obj (tdata-objs[i].value == TEST_INSERT_FAIL)) { + pr_err( found unexpected object %d\n, key); + err++; + } else if (!obj (tdata-objs[i].value != TEST_INSERT_FAIL)) { + pr_err( object %d not found!\n, key); + err++; + } else if (obj (obj-value != key)) { + pr_err( wrong object returned (got %d, expected %d)\n, + obj-value, key); + err++; + } + } + return err; +} + +static int threadfunc(void *data) +{ + int i, step, err = 0, insert_fails = 0; + struct thread_data *tdata = data; + + up(prestart_sem); + if (down_interruptible(startup_sem)) + pr_err( thread[%d]: down_interruptible failed\n, tdata-id); + + for (i = 0; i entries; i++) { + tdata-objs[i].value = (tdata-id 16) | i; + err = rhashtable_insert_fast(ht, tdata-objs[i].node, +test_rht_params); + if (err == -ENOMEM || err == -EBUSY) { + tdata-objs[i].value = TEST_INSERT_FAIL; + insert_fails++; + } else if (err) { + pr_err( thread[%d]: rhashtable_insert_fast failed\n, + tdata-id); + goto out; + } + } + if (insert_fails) + pr_info( thread[%d]: %d insert failures\n, + tdata-id, insert_fails); + + err = thread_lookup_test(tdata); + if (err) { + pr_err( thread[%d]: rhashtable_lookup_test failed\n, + tdata-id); + goto out; + } + + for (step = 10; step 0; step--) { + for (i = 0; i entries; i += step) { + if (tdata-objs[i].value == TEST_INSERT_FAIL) + continue; + err = rhashtable_remove_fast(ht, tdata-objs[i].node, +test_rht_params); + if (err) { + pr_err(
Re: [PATCH net-next 1/3] lwt: Add support to redirect dst.input
I will send out a v2 short, this breaks compilation when CONFIG_LWTUNNEL is not defined. On Thu, Aug 13, 2015 at 9:54 AM, Tom Herbert t...@herbertland.com wrote: This patch adds the capability to redirect dst input in the same way that dst output is redirected by LWT. Also, save the original dst.input and and dst.out when setting up lwtunnel redirection. These can be called by the client as a pass- through. Signed-off-by: Tom Herbert t...@herbertland.com --- include/net/lwtunnel.h | 25 ++- net/core/lwtunnel.c| 55 ++ net/ipv4/route.c | 8 +++- net/ipv6/route.c | 8 +++- 4 files changed, 93 insertions(+), 3 deletions(-) diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h index 33bd309..3db87d7 100644 --- a/include/net/lwtunnel.h +++ b/include/net/lwtunnel.h @@ -11,12 +11,15 @@ #define LWTUNNEL_HASH_SIZE (1 LWTUNNEL_HASH_BITS) /* lw tunnel state flags */ -#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1 +#define LWTUNNEL_STATE_OUTPUT_REDIRECT BIT(0) +#define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) struct lwtunnel_state { __u16 type; __u16 flags; atomic_trefcnt; + int (*orig_output)(struct sock *sk, struct sk_buff *skb); + int (*orig_input)(struct sk_buff *); int len; __u8data[0]; }; @@ -25,6 +28,7 @@ struct lwtunnel_encap_ops { int (*build_state)(struct net_device *dev, struct nlattr *encap, struct lwtunnel_state **ts); int (*output)(struct sock *sk, struct sk_buff *skb); + int (*input)(struct sk_buff *skb); int (*fill_encap)(struct sk_buff *skb, struct lwtunnel_state *lwtstate); int (*get_encap_size)(struct lwtunnel_state *lwtstate); @@ -58,6 +62,13 @@ static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate) return false; } +static inline bool lwtunnel_input_redirect(struct lwtunnel_state *lwtstate) +{ + if (lwtstate (lwtstate-flags LWTUNNEL_STATE_INPUT_REDIRECT)) + return true; + + return false; +} int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op, unsigned int num); int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op, @@ 
-72,6 +83,8 @@ struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len); int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b); int lwtunnel_output(struct sock *sk, struct sk_buff *skb); int lwtunnel_output6(struct sock *sk, struct sk_buff *skb); +int lwtunnel_input(struct sk_buff *skb); +int lwtunnel_input6(struct sk_buff *skb); #else @@ -142,6 +155,16 @@ static inline int lwtunnel_output6(struct sock *sk, struct sk_buff *skb) return -EOPNOTSUPP; } +static inline int lwtunnel_input(struct sock *sk, struct sk_buff *skb) +{ + return -EOPNOTSUPP; +} + +static inline int lwtunnel_input6(struct sock *sk, struct sk_buff *skb) +{ + return -EOPNOTSUPP; +} + #endif #endif /* __NET_LWTUNNEL_H */ diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c index 5d6d8e3..3331585 100644 --- a/net/core/lwtunnel.c +++ b/net/core/lwtunnel.c @@ -241,3 +241,58 @@ int lwtunnel_output(struct sock *sk, struct sk_buff *skb) return __lwtunnel_output(sk, skb, lwtstate); } EXPORT_SYMBOL(lwtunnel_output); + +int __lwtunnel_input(struct sk_buff *skb, +struct lwtunnel_state *lwtstate) +{ + const struct lwtunnel_encap_ops *ops; + int ret = -EINVAL; + + if (!lwtstate) + goto drop; + + if (lwtstate-type == LWTUNNEL_ENCAP_NONE || + lwtstate-type LWTUNNEL_ENCAP_MAX) + return 0; + + ret = -EOPNOTSUPP; + rcu_read_lock(); + ops = rcu_dereference(lwtun_encaps[lwtstate-type]); + if (likely(ops ops-input)) + ret = ops-input(skb); + rcu_read_unlock(); + + if (ret == -EOPNOTSUPP) + goto drop; + + return ret; + +drop: + kfree_skb(skb); + + return ret; +} + +int lwtunnel_input6(struct sk_buff *skb) +{ + struct rt6_info *rt = (struct rt6_info *)skb_dst(skb); + struct lwtunnel_state *lwtstate = NULL; + + if (rt) + lwtstate = rt-rt6i_lwtstate; + + return __lwtunnel_input(skb, lwtstate); +} +EXPORT_SYMBOL(lwtunnel_input6); + +int lwtunnel_input(struct sk_buff *skb) +{ + struct rtable *rt = (struct rtable *)skb_dst(skb); + struct lwtunnel_state *lwtstate = NULL; + + if (rt) + lwtstate = 
rt-rt_lwtstate; + + return __lwtunnel_input(skb, lwtstate); +} +EXPORT_SYMBOL(lwtunnel_input); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 18fd7c9..051d834 100644 ---
Re: [BUG net-next] infamous dev refcnt leak... again.
On Fri, 2015-08-14 at 17:19 -0600, David Ahern wrote:

On 8/14/15 5:14 PM, Eric Dumazet wrote:

On Fri, 2015-08-14 at 14:14 -0700, Eric Dumazet wrote:

While rebooting host running latest net-next

unregister_netdevice: waiting for eth0 to become free. Usage count = 4

Oh well...

It looks like David Ahern's recent changes uncover a bug? Not clear which commit is at fault.

Maybe 3bfd847203c6d89532f836ad3f5b4ff4ced26dd9 ?

Somehow a down device can be found.

Can you elaborate on what you are doing to see the refcnt leak? I have not seen that at all. I have to leave for soccer carpool in 45 minutes or so, but can take a look this weekend.

I simply reboot my host.

The eth0 device cannot be dismantled and blocks the reboot; I have to reset the host.

I get the issue every time.

I confirm reverting 3bfd847203c6d89532f836ad3f5b4ff4ced26dd9 removes the issue for me.
Re: Linux 4.2-rc6 regression: RIP: e030:[ffffffff8110fb18] [ffffffff8110fb18] detach_if_pending+0x18/0x80
On Sat, 2015-08-15 at 00:09 +0200, Sander Eikelenboom wrote:

On 2015-08-13 00:41, Eric Dumazet wrote:

On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote:

Thanks for the reminder, but luckily I was aware of that, seen enough of your replies asking for patches to be resubmitted against the other tree ;) Kernel with patch is currently running so fingers crossed.

Thanks for testing. I am definitely interested in knowing your results.

Hmm, it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is breaking things (have to test if a revert helps). I get this in some guests:

Yes, this was fixed by:

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
Re: [BUG net-next] infamous dev refcnt leak... again.
On Fri, 2015-08-14 at 14:14 -0700, Eric Dumazet wrote:

While rebooting host running latest net-next

unregister_netdevice: waiting for eth0 to become free. Usage count = 4

Oh well...

It looks like David Ahern recent changes uncover a bug ? Not clear which commit is at fault.

Maybe 3bfd847203c6d89532f836ad3f5b4ff4ced26dd9 ?

Somehow a down device can be found.

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index b7f1d20..675a3b6 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -725,10 +725,14 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 		nh->nh_dev = dev = FIB_RES_DEV(res);
 		if (!dev)
 			goto out;
-		dev_hold(dev);
 		if (!netif_carrier_ok(dev))
 			nh->nh_flags |= RTNH_F_LINKDOWN;
-		err = (dev->flags & IFF_UP) ? 0 : -ENETDOWN;
+		if (dev->flags & IFF_UP) {
+			err = 0;
+			dev_hold(dev);
+		} else {
+			err = -ENETDOWN;
+		}
 	} else {
 		struct in_device *in_dev;
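The idea behind the diff above (only take the device reference on the path that returns success) can be illustrated with a small user-space sketch. Everything here is hypothetical: `toy_dev`, `toy_dev_hold()` and the two `toy_check_nh_*()` functions are stand-ins, not the kernel's fib code, and the sketch assumes the caller does not drop the reference when an error is returned. Whether that assumption matches the real call chain is exactly what the thread leaves open.

```c
#include <assert.h>
#include <errno.h>

#define TOY_IFF_UP 0x1	/* stand-in for IFF_UP */

/* Minimal stand-in for a net_device with a reference count. */
struct toy_dev {
	unsigned int flags;
	int refcnt;
};

static void toy_dev_hold(struct toy_dev *dev)
{
	dev->refcnt++;
}

/* Old ordering: the reference is taken before the IFF_UP check. */
static int toy_check_nh_old(struct toy_dev *dev)
{
	toy_dev_hold(dev);
	return (dev->flags & TOY_IFF_UP) ? 0 : -ENETDOWN;
}

/* Ordering from the proposed diff: hold only when returning success. */
static int toy_check_nh_new(struct toy_dev *dev)
{
	if (dev->flags & TOY_IFF_UP) {
		toy_dev_hold(dev);
		return 0;
	}
	return -ENETDOWN;
}
```

With a device that is down, `toy_check_nh_old()` returns -ENETDOWN but leaves the count one higher than before, while `toy_check_nh_new()` leaves it untouched. An extra count of this kind is what keeps an "unregister_netdevice: waiting for eth0 to become free" message repeating.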
RE: [Intel-wired-lan] [PATCH v2] e1000e: Modify tx/rx configurations to avoid null pointer dereferences in e1000_open
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On Behalf Of Jia-Ju Bai
Sent: Wednesday, August 05, 2015 3:16 AM
To: Kirsher, Jeffrey T; Brandeburg, Jesse
Cc: netdev@vger.kernel.org; Jia-Ju Bai; intel-wired-...@lists.osuosl.org; linux-ker...@vger.kernel.org
Subject: [Intel-wired-lan] [PATCH v2] e1000e: Modify tx/rx configurations to avoid null pointer dereferences in e1000_open

When e1000e_setup_rx_resources fails in e1000_open, e1000e_free_tx_resources in the err_setup_rx segment is executed. The writel(0, tx_ring->head) statement in e1000_clean_tx_ring in e1000e_free_tx_resources will cause a null pointer dereference (crash), because tx_ring->head is only assigned in e1000_configure_tx in e1000_configure, but that runs after e1000e_setup_rx_resources.

This patch moves head/tail register writing to e1000_configure_tx/rx, which can fix this problem. It is inspired by igb_configure_tx_ring in the igb driver.

Special thanks to Alexander Duyck for his valuable suggestion.

Signed-off-by: Jia-Ju Bai baijiaju1...@163.com
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Tested-by: Aaron Brown aaron.f.br...@intel.com
Re: [B.A.T.M.A.N.] [PATCH 03/10] batman-adv: Make DAT capability changes atomic
On 11/08/15 21:36, Sergei Shtylyov wrote:

 /* check if orig node candidate is running DAT */
-if (!(candidate->capabilities & BATADV_ORIG_CAPA_HAS_DAT))
+if (!(test_bit(BATADV_ORIG_CAPA_HAS_DAT, &candidate->capabilities)))

() around the test_bit() call not needed.

Thanks for the hint, Sergei, even though I don't remember having seen any complaint from checkpatch.pl about this. I'll resend the pull request with the fixed patches.

Cheers,

--
Antonio Quartulli
pull request: batman-adv 20150814
Hi David,

this is our first batch intended for net-next/linux-4.3 (resent after fixing the parenthesis as reported by Sergei).

Here you have all those non-critical fixes/changes that we couldn't merge into the net tree as it was already too late in the release cycle.

This is a summary of what each patch does:
- patch 1 by Sven Eckelmann is changing the way the GW metric is computed so that the resulting operation does not make use of divisions and also does not lead to any data type promotion. This is a requirement for patch 2;
- patch 2 by Ruben Wisniewski is changing the type of the variable used in the same GW metric computation as patch 1 to uint64_t so that potential integer overflows are prevented. Thanks to Sven's patch above no 64bit division will be involved;
- patches 3, 4, 5 and 6 by Linus Lüssing are converting plain bitwise operations on capability bits to set/clear/test_bit() in order to ensure their atomicity and prevent potential race conditions;
- patch 7, also by Linus, is making the multicast TVLV parsing routine thread-safe in order to prevent potential race conditions upon reception of two OGMs from the same originator at the same time;
- patch 8 by Marek Lindner prevents potential double deletions of TT Request objects from its lists, which would lead to a kernel crash;
- patch 9 by Simon Wunderlich is ensuring that no enqueued packet is leaked when an interface is deactivated;
- patch 10 by Linus Lüssing is setting the network header in the skb struct right after a packet was delivered to the batman virtual interface so that subsequent calls to ip/ipv6_hdr() do not crash.

Please pull or let me know of any problem!

Thanks a lot David,
Antonio

The following changes since commit 07a51cd3794960548627a27aae68c1446341db32:

  vxlan: fix fdb_dump index calculation (2015-08-10 21:15:18 -0700)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batman-adv-for-davem

for you to fetch changes up to 53cf037bf846417fd92dc92ddf97267f69b110f4:

  batman-adv: Fix potentially broken skb network header access (2015-08-14 22:52:10 +0200)

Included changes:
- avoid integer overflow in GW selection routine
- prevent race condition by making capability bit changes atomic (use clear/set/test_bit)
- fix synchronization issue in mcast tvlv handler
- fix crash on double list removal of TT Request objects
- fix leak by purging packets enqueued for sending upon iface removal
- ensure network header pointer is set in skb

Linus Lüssing (6):
      batman-adv: Make DAT capability changes atomic
      batman-adv: Make NC capability changes atomic
      batman-adv: Make TT capability changes atomic
      batman-adv: Make MCAST capability changes atomic
      batman-adv: Fix potential synchronization issues in mcast tvlv handler
      batman-adv: Fix potentially broken skb network header access

Marek Lindner (1):
      batman-adv: protect tt request from double deletion

Ruben Wisniewski (1):
      batman-adv: Avoid u32 overflow during gateway select

Simon Wunderlich (1):
      batman-adv: remove broadcast packets scheduled for purged outgoing if

Sven Eckelmann (1):
      batman-adv: Replace gw_reselect divisor with simple shift

 net/batman-adv/distributed-arp-table.c |  7 +--
 net/batman-adv/gateway_client.c        |  8 +---
 net/batman-adv/multicast.c             | 81 +-
 net/batman-adv/network-coding.c        |  7 +--
 net/batman-adv/originator.c            |  5 +++
 net/batman-adv/send.c                  |  3 +-
 net/batman-adv/soft-interface.c        |  7 ++-
 net/batman-adv/translation-table.c     | 17 ---
 net/batman-adv/types.h                 | 15 ---
 9 files changed, 102 insertions(+), 48 deletions(-)
[PATCH 04/10] batman-adv: Make NC capability changes atomic
From: Linus Lüssing linus.luess...@c0d3.blue

Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 3f4841ffb336 ("batman-adv: tvlv - add network coding container")
Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue
Signed-off-by: Marek Lindner mareklind...@neomailbox.ch
Signed-off-by: Antonio Quartulli anto...@meshcoding.com
---
 net/batman-adv/network-coding.c | 7 ++++---
 net/batman-adv/types.h          | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index f0a50f3..4660401 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -19,6 +19,7 @@
 #include "main.h"
 
 #include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/byteorder/generic.h>
 #include <linux/compiler.h>
 #include <linux/debugfs.h>
@@ -134,9 +135,9 @@ static void batadv_nc_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
 					  uint16_t tvlv_value_len)
 {
 	if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND)
-		orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_NC;
+		clear_bit(BATADV_ORIG_CAPA_HAS_NC, &orig->capabilities);
 	else
-		orig->capabilities |= BATADV_ORIG_CAPA_HAS_NC;
+		set_bit(BATADV_ORIG_CAPA_HAS_NC, &orig->capabilities);
 }
 
 /**
@@ -894,7 +895,7 @@ void batadv_nc_update_nc_node(struct batadv_priv *bat_priv,
 		goto out;
 
 	/* check if orig node is network coding enabled */
-	if (!(orig_node->capabilities & BATADV_ORIG_CAPA_HAS_NC))
+	if (!test_bit(BATADV_ORIG_CAPA_HAS_NC, &orig_node->capabilities))
 		goto out;
 
 	/* accept ogms from 'good' neighbors and single hop neighbors */
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 29fd625..ed4aec5 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -314,7 +314,7 @@ struct batadv_orig_node {
  */
 enum batadv_orig_capabilities {
 	BATADV_ORIG_CAPA_HAS_DAT,
-	BATADV_ORIG_CAPA_HAS_NC = BIT(1),
+	BATADV_ORIG_CAPA_HAS_NC,
 	BATADV_ORIG_CAPA_HAS_TT = BIT(2),
 	BATADV_ORIG_CAPA_HAS_MCAST = BIT(3),
 };
--
2.5.0
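The reason the enum value changes from a mask (BIT(1)) to a plain index is that set_bit(), clear_bit() and test_bit() take a bit number rather than a mask. A rough user-space analogue, using C11 atomics instead of the kernel helpers, might look like the sketch below. The `toy_*` names are hypothetical and this is not the kernel implementation; it only shows the index-based calling convention and a read-modify-write that cannot be torn by a concurrent update to another bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit indexes (not masks), mirroring how the converted enum is used. */
enum toy_capa {
	TOY_CAPA_HAS_DAT,	/* bit 0 */
	TOY_CAPA_HAS_NC,	/* bit 1 */
};

/* Atomically set bit number nr in *addr. */
static void toy_set_bit(unsigned int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

/* Atomically clear bit number nr in *addr. */
static void toy_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/* Read bit number nr from *addr. */
static bool toy_test_bit(unsigned int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) >> nr) & 1UL;
}
```

A plain `caps |= mask` compiles to a separate load, OR and store, so two handlers touching different bits of the same word can overwrite each other's update; the atomic fetch-or/fetch-and variants avoid that. Passing a mask such as BIT(1) where an index is expected would silently address bit 2 instead of bit 1, which is why the enum has to be converted together with the call sites.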
[BUG net-next] infamous dev refcnt leak... again.
While rebooting host running latest net-next

unregister_netdevice: waiting for eth0 to become free. Usage count = 4

Oh well...
[PATCH 10/10] batman-adv: Fix potentially broken skb network header access
From: Linus Lüssing linus.luess...@c0d3.blue

The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They need a correctly set skb network header.

Unfortunately we cannot rely on the device drivers to set it for us. Therefore set it at the beginning of the corresponding ndo_start_xmit handler.

Fixes: 1d8ab8d3c176 ("batman-adv: Modified forwarding behaviour for multicast packets")
Fixes: ab49886e3da7 ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support")
Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue
Signed-off-by: Marek Lindner mareklind...@neomailbox.ch
Signed-off-by: Antonio Quartulli anto...@meshcoding.com
---
 net/batman-adv/soft-interface.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index c002961..926292d 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -202,6 +202,7 @@ static int batadv_interface_tx(struct sk_buff *skb,
 	int gw_mode;
 	enum batadv_forw_mode forw_mode;
 	struct batadv_orig_node *mcast_single_orig = NULL;
+	int network_offset = ETH_HLEN;
 
 	if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE)
 		goto dropped;
@@ -214,14 +215,18 @@ static int batadv_interface_tx(struct sk_buff *skb,
 	case ETH_P_8021Q:
 		vhdr = vlan_eth_hdr(skb);
 
-		if (vhdr->h_vlan_encapsulated_proto != ethertype)
+		if (vhdr->h_vlan_encapsulated_proto != ethertype) {
+			network_offset += VLAN_HLEN;
 			break;
+		}
 
 		/* fall through */
 	case ETH_P_BATMAN:
 		goto dropped;
 	}
 
+	skb_set_network_header(skb, network_offset);
+
 	if (batadv_bla_tx(bat_priv, skb, vid))
 		goto dropped;
--
2.5.0
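The offset calculation in this patch can be tried out on a raw frame buffer in user space. The sketch below is only an approximation of the switch statement in the diff: `toy_network_offset()` and the `TOY_*` constants are hypothetical names, the constants are local copies of the values commonly defined for ETH_HLEN, VLAN_HLEN, ETH_P_8021Q and ETH_P_BATMAN, and the length checks are an addition for safety that the diff itself does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN     14	/* dst MAC + src MAC + ethertype */
#define TOY_VLAN_HLEN    4	/* one 802.1Q tag */
#define TOY_ETH_P_8021Q  0x8100
#define TOY_ETH_P_BATMAN 0x4305

/* Read a big-endian 16-bit field from the frame. */
static uint16_t toy_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the offset of the network header, or -1 when the frame would
 * be dropped (batman-adv ethertype, directly or inside a VLAN tag) or
 * is too short to inspect.
 */
static int toy_network_offset(const uint8_t *frame, size_t len)
{
	int offset = TOY_ETH_HLEN;

	if (len < TOY_ETH_HLEN)
		return -1;

	switch (toy_get_be16(frame + 12)) {
	case TOY_ETH_P_8021Q:
		if (len < TOY_ETH_HLEN + TOY_VLAN_HLEN)
			return -1;
		/* the encapsulated ethertype follows the 2-byte TCI */
		if (toy_get_be16(frame + 16) != TOY_ETH_P_BATMAN) {
			offset += TOY_VLAN_HLEN;
			break;
		}
		/* fall through */
	case TOY_ETH_P_BATMAN:
		return -1;
	default:
		break;
	}

	return offset;
}
```

For an untagged IPv4 frame the result is 14, and for a frame carrying a single VLAN tag it is 18, which corresponds to the value the patch passes to skb_set_network_header().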
[PATCH net-next] ipv6: trivial whitespace fix
Change brace placement to be in line with coding standards

Signed-off-by: Ian Morris i...@chirality.org.uk
---
 net/ipv6/udp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e51fc3e..0aba654 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1496,7 +1496,8 @@ int __net_init udp6_proc_init(struct net *net)
 	return udp_proc_register(net, &udp6_seq_afinfo);
 }
 
-void udp6_proc_exit(struct net *net) {
+void udp6_proc_exit(struct net *net)
+{
 	udp_proc_unregister(net, &udp6_seq_afinfo);
 }
 #endif /* CONFIG_PROC_FS */
--
1.9.1
Re: [PATCH 1/6] net/bonding: enable LRO if one device supports it
On 2015-08-14 2:56 AM, Michal Kubecek wrote: On Thu, Aug 13, 2015 at 02:02:55PM -0400, Jarod Wilson wrote: Currently, all bonding devices come up, and claim to have LRO support, which ethtool will let you toggle on and off, even if none of the underlying hardware devices actually support it. While the bonding driver takes precautions for slaves that don't support all features, this is at least a little bit misleading to users. If we add NETIF_F_LRO to the NETIF_F_ONE_FOR_ALL flags in netdev_features.h, then netdev_features_increment() will only enable LRO if 1) its listed in the device's feature mask and 2) if there's actually a slave present that supports the feature. Note that this is going to require some follow-up patches, as not all LRO capable device drivers are currently properly reporting LRO support in their vlan_features, which is where the bonding driver picks up device-specific features. CC: David S. Miller da...@davemloft.net CC: Jiri Pirko j...@resnulli.us CC: Tom Herbert therb...@google.com CC: Scott Feldman sfel...@gmail.com CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson ja...@redhat.com --- include/linux/netdev_features.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 9672781..6440bf1 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -159,7 +159,8 @@ enum { */ #define NETIF_F_ONE_FOR_ALL (NETIF_F_GSO_SOFTWARE | NETIF_F_GSO_ROBUST | \ NETIF_F_SG | NETIF_F_HIGHDMA | \ -NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED) +NETIF_F_FRAGLIST | NETIF_F_VLAN_CHALLENGED | \ +NETIF_F_LRO) /* * If one device doesn't support one of these features, then disable it -- I don't think this is going to work the way you expect. Assume we have a non-LRO eth1 and LRO capable eth2. 
If we enslave eth1 first, bond will lose NETIF_F_LRO so that while enslaving eth2, bond_enslave() does run

	if (!(bond_dev->features & NETIF_F_LRO))
		dev_disable_lro(slave_dev);

and disable LRO on eth2 even before computing the bond features so that in the end, all three interfaces end up with disabled LRO. If you add the slaves in the opposite order, you end up with eth2 and bond having LRO enabled. IMHO features should not depend on the order in which slaves are added into the bond.

Crap, you're right. Hadn't tried inverting the order of added devices, as it didn't occur to me that it would make a difference.

You would need to remove the code quoted above to make things work the way you want (or move it after the call to bond_compute_features() which is effectively the same). But then the result would be even worse: adding a LRO-capable slave to a bond having dev_disable_lro() called on it would not disable LRO on that slave, possibly (or rather likely) causing communication breakage.

I believe NETIF_F_LRO in its original sense should be only considered for physical devices; even if it's not explicitly said in the commit message, the logic behind fbe168ba91f7 ("net: generic dev_disable_lro() stacked device handling") is that for stacked devices like bond or team, NETIF_F_LRO means "allow slaves to use LRO if they can and want" while its absence means "disable LRO on all slaves".

If you wanted NETIF_F_LRO for a bond to mean "there is at least one LRO capable slave", you would need a new flag for the "LRO should be disabled for all lower devices" state. I don't think it's worth the effort.

Yeah, my thinking was that it should mean "there's at least one lro capable slave". If we just leave things the way they are though, I think it's confusing on the user side -- it was one of our QE people who reported confusion being able to toggle lro on a bond when none of the slaves supported it. And there's also the inconsistency among devices that support lro in their vlan_features.
So I think *something* should still be done here to make things clearer and more consistent, but I'll have to ponder that next week, since it's beyond quitting time on Friday already. :)

Oh, last thought: the comment above #define NETIF_F_ONE_FOR_ALL is partly to blame for my not thinking harder and trying inverted ordering of slave additions:

/*
 * If one device supports one of these features, then enable them
 * for all in netdev_increment_features.
 */

This clearly seems to fall down in the lro case. :)

-- 
Jarod Wilson
ja...@redhat.com
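The ordering problem discussed in this thread can be reproduced with a small userspace toy model. This is only a sketch under stated assumptions: the toy_* names are hypothetical, a single boolean stands in for NETIF_F_LRO, the bond is assumed to start with LRO set, and the feature recomputation is reduced to "the bond has LRO if any slave has it". It is not the bonding driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one boolean stands in for NETIF_F_LRO on each device. */
struct toy_dev {
	bool lro;
};

struct toy_bond {
	bool lro;
	struct toy_dev *slaves[4];
	size_t nslaves;
};

struct toy_result {
	bool bond, eth1, eth2;
};

/* Assumed rule from the proposed patch: bond has LRO only if a slave has it. */
void toy_compute_features(struct toy_bond *bond)
{
	bool any = false;

	for (size_t i = 0; i < bond->nslaves; i++)
		any = any || bond->slaves[i]->lro;
	bond->lro = any;
}

/* Same order as described in the thread: disable check first, recompute after. */
void toy_enslave(struct toy_bond *bond, struct toy_dev *slave)
{
	if (!bond->lro)
		slave->lro = false;	/* stands in for dev_disable_lro() */
	bond->slaves[bond->nslaves++] = slave;
	toy_compute_features(bond);
}

/* eth1 has no LRO, eth2 has LRO; the bond starts out with LRO set. */
struct toy_result toy_run(bool eth2_first)
{
	struct toy_dev eth1 = { .lro = false }, eth2 = { .lro = true };
	struct toy_bond bond = { .lro = true, .nslaves = 0 };

	toy_enslave(&bond, eth2_first ? &eth2 : &eth1);
	toy_enslave(&bond, eth2_first ? &eth1 : &eth2);
	return (struct toy_result){ bond.lro, eth1.lro, eth2.lro };
}
```

Calling toy_run(false) enslaves eth1 first and leaves all three flags cleared, while toy_run(true) leaves the bond and eth2 with LRO set, which is the order dependence objected to above.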
Re: [BUG net-next] infamous dev refcnt leak... again.
On Fri, 2015-08-14 at 16:31 -0700, Eric Dumazet wrote:

I simply reboot my host. The eth0 device cannot be dismantled and blocks the reboot; I had to reset the host. I get the issue every time.

I confirm reverting 3bfd847203c6d89532f836ad3f5b4ff4ced26dd9 removes the issue for me.

Also, netif_index_is_vrf() is supposed to be called under rcu, but that is not the case for the calls from net/ipv4/udp.c and ip_route_connect_init().
Re: [PATCH 1/5] net: add Hisilicon Network Subsystem support (config and documents)
On Friday 14 August 2015 18:30:18 Kenneth Lee wrote:

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt b/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
new file mode 100644
index 000..5ab6969
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hns-nic.txt
@@ -0,0 +1,14 @@
+Hisilicon Network Subsystem NIC controller
+
+Required properties:
+- compatible: "hisilicon,hns-nic"
+- ae-name: accelerator name who provide this interface
+- ae-opts: options (string) to the accelerator. e.g. the index interface
+
+Example:
+
+	ethernet@0{
+		compatible = "hisilicon,hns-nic";
+		ae-name = "soc0-n4";
+		ae-opts = "0";
+	};

These properties look very unconventional. What are the valid strings for ae-name and ae-opts? It looks like the latter is just a number, so why not use an integer property?

	Arnd
[PATCH 01/10] batman-adv: Replace gw_reselect divisor with simple shift
From: Sven Eckelmann <s...@narfation.org>

The gw_factor is divided by BATADV_TQ_LOCAL_WINDOW_SIZE ** 2 * 64. But the rest of the calculation has nothing to do with the tq window size and therefore the calculation is just (tmp_gw_factor / (64 ** 3)).

Replace it with a simple shift to avoid a costly 64-bit divide when the max_gw_factor is changed from u32 to u64. This type change is necessary to avoid an overflow bug.

Signed-off-by: Sven Eckelmann <s...@narfation.org>
Signed-off-by: Marek Lindner <mareklind...@neomailbox.ch>
Signed-off-by: Antonio Quartulli <anto...@meshcoding.com>
---
 net/batman-adv/gateway_client.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
index bb015862..e1e1f31 100644
--- a/net/batman-adv/gateway_client.c
+++ b/net/batman-adv/gateway_client.c
@@ -154,14 +154,10 @@ batadv_gw_get_best_gw_node(struct batadv_priv *bat_priv)
 	struct batadv_neigh_ifinfo *router_ifinfo;
 	struct batadv_gw_node *gw_node, *curr_gw = NULL;
 	uint32_t max_gw_factor = 0, tmp_gw_factor = 0;
-	uint32_t gw_divisor;
 	uint8_t max_tq = 0;
 	uint8_t tq_avg;
 	struct batadv_orig_node *orig_node;
 
-	gw_divisor = BATADV_TQ_LOCAL_WINDOW_SIZE * BATADV_TQ_LOCAL_WINDOW_SIZE;
-	gw_divisor *= 64;
-
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(gw_node, &bat_priv->gw.list, list) {
 		if (gw_node->deleted)
@@ -187,7 +183,7 @@ batadv_gw_get_best_gw_node(struct batadv_priv *bat_priv)
 			tmp_gw_factor = tq_avg * tq_avg;
 			tmp_gw_factor *= gw_node->bandwidth_down;
 			tmp_gw_factor *= 100 * 100;
-			tmp_gw_factor /= gw_divisor;
+			tmp_gw_factor >>= 18;
 
 			if ((tmp_gw_factor > max_gw_factor) ||
 			    ((tmp_gw_factor == max_gw_factor) &&
-- 
2.5.0
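As a quick check of the arithmetic in this commit message: 64 ** 3 is 2 ** 18, so for unsigned values dividing by it and shifting right by 18 give the same result. The helper names below are hypothetical and exist only for this sketch; they are not batman-adv functions.

```c
#include <stdint.h>

/* Hypothetical helper: divide by 64 ** 3, as the removed gw_divisor did. */
uint64_t toy_gw_scale_div(uint64_t factor)
{
	return factor / (64ULL * 64ULL * 64ULL);
}

/* Hypothetical helper: the replacement shift, since 64 ** 3 == 2 ** 18. */
uint64_t toy_gw_scale_shift(uint64_t factor)
{
	return factor >> 18;
}
```

Both helpers truncate toward zero for unsigned input, so they agree for every 64-bit value, and the shift avoids a 64-bit division on 32-bit targets.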
[PATCH 02/10] batman-adv: Avoid u32 overflow during gateway select
From: Ruben Wisniewski <ru...@freifunk-nrw.de>

The gateway selection based on fast connections is using a single value calculated from the average tq (0-255) and the download bandwidth (in 100Kibit). The formula for the first step (tq ** 2 * 10000 * bandwidth) tends to overflow a u32 with low bandwidth settings like 50 [100KiBit] and a tq value of over 92.

Changing this to a 64 bit unsigned integer makes it possible to support a bandwidth_down of up to ~2.8e10 [100KiBit] with a perfect tq of 255. This is ~6.6 times higher than the maximum possible value of the gateway announcement TVLV.

This problem only affects the non-default gw_sel_class 1.

Signed-off-by: Ruben Wisniewsi <ru...@vfn-nrw.de>
[s...@narfation.org: rewritten commit message]
Signed-off-by: Sven Eckelmann <s...@narfation.org>
Signed-off-by: Marek Lindner <mareklind...@neomailbox.ch>
Signed-off-by: Antonio Quartulli <anto...@meshcoding.com>
---
 net/batman-adv/gateway_client.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/gateway_client.c b/net/batman-adv/gateway_client.c
index e1e1f31..4ac24d8 100644
--- a/net/batman-adv/gateway_client.c
+++ b/net/batman-adv/gateway_client.c
@@ -153,7 +153,7 @@ batadv_gw_get_best_gw_node(struct batadv_priv *bat_priv)
 	struct batadv_neigh_node *router;
 	struct batadv_neigh_ifinfo *router_ifinfo;
 	struct batadv_gw_node *gw_node, *curr_gw = NULL;
-	uint32_t max_gw_factor = 0, tmp_gw_factor = 0;
+	uint64_t max_gw_factor = 0, tmp_gw_factor = 0;
 	uint8_t max_tq = 0;
 	uint8_t tq_avg;
 	struct batadv_orig_node *orig_node;
-- 
2.5.0
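The overflow threshold can be checked in userspace with a minimal sketch. It assumes the computation order used in gateway_client.c (tq squared, times bandwidth_down, times 100 * 100, then the shift by 18); the function names are hypothetical and only differ in the width of the accumulator.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old computation with a 32-bit accumulator. */
uint32_t toy_gw_factor_u32(uint8_t tq_avg, uint32_t bandwidth_down)
{
	uint32_t factor = (uint32_t)tq_avg * tq_avg;

	factor *= bandwidth_down;
	factor *= 100 * 100;	/* wraps modulo 2 ** 32 when too large */
	return factor >> 18;
}

/* Same steps with a 64-bit accumulator, as after the type change. */
uint64_t toy_gw_factor_u64(uint8_t tq_avg, uint32_t bandwidth_down)
{
	uint64_t factor = (uint64_t)tq_avg * tq_avg;

	factor *= bandwidth_down;
	factor *= 100 * 100;
	return factor >> 18;
}
```

With bandwidth_down = 50, a tq of 92 gives 92 * 92 * 50 * 10000 = 4232000000, which still fits below 2 ** 32 = 4294967296, while a tq of 93 gives 4324500000 and wraps in the 32-bit variant. A better link would then score lower than a worse one.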
[PATCH net-next] ipv6: remove unnecessary include
printk.h does not need to be explicitly included as we include kernel.h, which already includes it.

Signed-off-by: Ian Morris <i...@chirality.org.uk>
---
 net/ipv6/ip6_offload.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 08b6204..1cb2dc7 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -12,7 +12,6 @@
 #include <linux/socket.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
-#include <linux/printk.h>
 #include <net/protocol.h>
 #include <net/ipv6.h>
 
-- 
1.9.1
Re: Linux 4.2-rc6 regression: RIP: e030:[ffffffff8110fb18] [ffffffff8110fb18] detach_if_pending+0x18/0x80
On 2015-08-15 00:09, Sander Eikelenboom wrote: On 2015-08-13 00:41, Eric Dumazet wrote: On Wed, 2015-08-12 at 23:46 +0200, Sander Eikelenboom wrote: Thanks for the reminder, but luckily i was aware of that, seen enough of your replies asking for patches to be resubmitted against the other tree ;) Kernel with patch is currently running so fingers crossed. Thanks for testing. I am definitely interested knowing your results. Hmm it seems now commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af is breaking things (have to test if a revert helps) i get this in some guests: Should have done that before, because it wasn't in yet .. and likely to fix the issue, also pulled and compiling now. -- Sander NMI watchdog: BUG: soft lockup - CPU#0 stuck for 506s! [swapper/0:0] [ 6620.282805] Modules linked in: [ 6620.282805] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6-20150814-linus-doflr-apicrevert+ #1 [ 6620.282805] task: 8221a580 ti: 8220 task.ti: 8220 [ 6620.282805] RIP: e030:[8100122a] [8100122a] xen_hypercall_xen_version+0xa/0x20 [ 6620.282805] RSP: e02b:88000fc03d48 EFLAGS: 0246 [ 6620.282805] RAX: 00040006 RBX: 0200 RCX: 8100122a [ 6620.282805] RDX: 0001 RSI: deadbeef RDI: deadbeef [ 6620.282805] RBP: 88000fc03d60 R08: 88000fc03ee0 R09: 00ee [ 6620.282805] R10: 8220a0c0 R11: 0246 R12: [ 6620.282805] R13: 0001 R14: 880003b53054 R15: 0005 [ 6620.282805] FS: 7fec747ad800() GS:88000fc0() knlGS: [ 6620.282805] CS: e033 DS: ES: CR0: 8005003b [ 6620.282805] CR2: 7ffcb7a7a6d8 CR3: 03164000 CR4: 0660 [ 6620.282805] Stack: [ 6620.282805] 0068 0007 81008dbd 88000fc03dd8 [ 6620.282805] 81009592 0068 8220a0c0 00ee [ 6620.282805] 88000fc03ee0 0200 0200 0001 [ 6620.282805] Call Trace: [ 6620.282805] IRQ [ 6620.282805] [81008dbd] ? xen_force_evtchn_callback+0xd/0x10 [ 6620.282805] [81009592] check_events+0x12/0x20 [ 6620.282805] [8100957f] ? xen_restore_fl_direct_reloc+0x4/0x4 [ 6620.282805] [81af79a5] ? 
_raw_spin_unlock_irqrestore+0x25/0x30 [ 6620.282805] [8110ed43] try_to_del_timer_sync+0x43/0x60 [ 6620.282805] [8110eda7] del_timer_sync+0x47/0x60 [ 6620.282805] [81a2b698] inet_csk_reqsk_queue_drop+0x118/0x1f0 [ 6620.282805] [81a2b8c6] reqsk_timer_handler+0x156/0x260 [ 6620.282805] [81a2b770] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0 [ 6620.282805] [8110f3c7] call_timer_fn.isra.27+0x17/0x80 [ 6620.282805] [81a2b770] ? inet_csk_reqsk_queue_drop+0x1f0/0x1f0 [ 6620.282805] [8110f55d] run_timer_softirq+0x12d/0x200 [ 6620.282805] [810ca6c3] __do_softirq+0x103/0x210 [ 6620.282805] [810ca9cb] irq_exit+0x4b/0xa0 [ 6620.282805] [814f05d4] xen_evtchn_do_upcall+0x34/0x50 [ 6620.282805] [81af932e] xen_do_hypervisor_callback+0x1e/0x40 [ 6620.282805] EOI [ 6620.282805] [810013aa] ? xen_hypercall_sched_op+0xa/0x20 [ 6620.282805] [810013aa] ? xen_hypercall_sched_op+0xa/0x20 [ 6620.282805] [81008d60] ? xen_safe_halt+0x10/0x20 [ 6620.282805] [810188d3] ? default_idle+0x13/0x20 [ 6620.282805] [81018e1a] ? arch_cpu_idle+0xa/0x10 [ 6620.282805] [810f8e7e] ? default_idle_call+0x2e/0x50 [ 6620.282805] [810f9112] ? cpu_startup_entry+0x272/0x2e0 [ 6620.282805] [81ae7967] ? rest_init+0x77/0x80 [ 6620.282805] [82312f58] ? start_kernel+0x43b/0x448 [ 6620.282805] [823124ef] ? x86_64_start_reservations+0x2a/0x2c [ 6620.282805] [82316008] ? xen_start_kernel+0x550/0x55c [ 6620.282805] Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net] be2net: avoid vxlan offloading on multichannel configs
On Aug 14, 2015, at 3:30 PM, Ivan Vecera ivec...@redhat.com wrote: VxLAN offloading is not functional if the NIC is running in multichannel mode (UMC, FLEX-10, VNIC...). Enabling this additionally kills whole connectivity through the NIC and the device needs to be down and up to restore it. The firmware should take care about it and does not allow the conversion of interface to tunnel type (be_cmd_manage_iface) or should support VxLAN offloading if multichannel config is enabled. I have tested this on the latest available firmware (10.6.144.21). Result: [root@sm-04 ~]# ip link set enp5s0f0 up[root@sm-04 ~]# ip addr add 172.30.10.50/24 dev enp5s0f0 [root@sm-04 ~]# ping -c 3 172.30.10.254PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data. 64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.317 ms 64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.187 ms 64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.188 ms --- 172.30.10.254 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.187/0.230/0.317/0.063 ms [root@sm-04 ~]# ip link add link enp5s0f0 vxlan10 type vxlan id 10 remote 172.30.10.60 dstport 4789 [root@sm-04 ~]# ip link set vxlan10 up [ 7900.442811] be2net :05:00.0: Enabled VxLAN offloads for UDP port 4789 [ 7900.455722] be2net :05:00.1: Enabled VxLAN offloads for UDP port 4789 [ 7900.468635] be2net :05:00.2: Enabled VxLAN offloads for UDP port 4789 [ 7900.481553] be2net :05:00.3: Enabled VxLAN offloads for UDP port 4789 [root@sm-04 ~]# ping -c 3 172.30.10.254 PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data. 
--- 172.30.10.254 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 1999ms [root@sm-04 ~]# ip link set vxlan10 down [ 7959.434093] be2net :05:00.0: Disabled VxLAN offloads for UDP port 4789 [ 7959.444792] be2net :05:00.1: Disabled VxLAN offloads for UDP port 4789 [ 7959.455592] be2net :05:00.2: Disabled VxLAN offloads for UDP port 4789 [ 7959.466416] be2net :05:00.3: Disabled VxLAN offloads for UDP port 4789 [root@sm-04 ~]# ip link del vxlan10 [root@sm-04 ~]# ping -c 3 172.30.10.254 PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data. --- 172.30.10.254 ping statistics --- 3 packets transmitted, 0 received, 100% packet loss, time 1999ms [root@sm-04 ~]# ip link set enp5s0f0 down [root@sm-04 ~]# ip link set enp5s0f0 up [ 8071.019003] be2net :05:00.0 enp5s0f0: Link is Up [root@sm-04 ~]# ping -c 3 172.30.10.254 PING 172.30.10.254 (172.30.10.254) 56(84) bytes of data. 64 bytes from 172.30.10.254: icmp_seq=1 ttl=64 time=0.318 ms 64 bytes from 172.30.10.254: icmp_seq=2 ttl=64 time=0.196 ms 64 bytes from 172.30.10.254: icmp_seq=3 ttl=64 time=0.194 ms --- 172.30.10.254 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.194/0.236/0.318/0.057 ms Cc: Sathya Perla sathya.pe...@avagotech.com Cc: Ajit Khaparde ajit.khapa...@avagotech.com Cc: Padmanabh Ratnakar padmanabh.ratna...@avagotech.com Cc: Sriharsha Basavapatna sriharsha.basavapa...@avagotech.com Signed-off-by: Ivan Vecera ivec...@redhat.com Acked-by: Ajit Khaparde ajit.khapa...@avagotech.com --- drivers/net/ethernet/emulex/benet/be_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index c28e3bf..6ca693b 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -5174,7 +5174,7 @@ static void be_add_vxlan_port(struct net_device *netdev, sa_family_t sa_family, struct 
device *dev = &adapter->pdev->dev;
 	int status;
 
-	if (lancer_chip(adapter) || BEx_chip(adapter))
+	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
 		return;
 
 	if (adapter->flags & BE_FLAGS_VXLAN_OFFLOADS) {
@@ -5221,7 +5221,7 @@ static void be_del_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
 
-	if (lancer_chip(adapter) || BEx_chip(adapter))
+	if (lancer_chip(adapter) || BEx_chip(adapter) || be_is_mc(adapter))
 		return;
 
 	if (adapter->vxlan_port != port)
-- 
2.4.6
[PATCH 09/10] batman-adv: remove broadcast packets scheduled for purged outgoing if
From: Simon Wunderlich <si...@open-mesh.com>

When an interface is purged, the broadcast packets scheduled for this interface should get purged as well.

Signed-off-by: Simon Wunderlich <si...@open-mesh.com>
Signed-off-by: Marek Lindner <mareklind...@neomailbox.ch>
Signed-off-by: Antonio Quartulli <anto...@meshcoding.com>
---
 net/batman-adv/send.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c
index 0a01992..191076e 100644
--- a/net/batman-adv/send.c
+++ b/net/batman-adv/send.c
@@ -616,7 +616,8 @@ batadv_purge_outstanding_packets(struct batadv_priv *bat_priv,
 		 * we delete only packets belonging to the given interface
 		 */
 		if ((hard_iface) &&
-		    (forw_packet->if_incoming != hard_iface))
+		    (forw_packet->if_incoming != hard_iface) &&
+		    (forw_packet->if_outgoing != hard_iface))
 			continue;
 
 		spin_unlock_bh(&bat_priv->forw_bcast_list_lock);
-- 
2.5.0
Re: [PATCH 3/5] net: add Hisilicon Network Subsystem MDIO support
On Friday 14 August 2015 18:30:20 Kenneth Lee wrote:

+#define MDIO_BASE_ADDR 0x403C

Does not belong in here (and is not used)

+#define MDIO_COMMAND_REG 0x0
+#define MDIO_ADDR_REG 0x4
+#define MDIO_WDATA_REG 0x8
+#define MDIO_RDATA_REG 0xc
+#define MDIO_STA_REG 0x10

These look suspiciously similar to definitions from drivers/net/ethernet/hisilicon/hip04_mdio.c. Could the hardware be related? If so, please try to share the common parts.

+static inline void mdio_write_reg(void *base, u32 reg, u32 value)
+{
+	u8 __iomem *reg_addr = ACCESS_ONCE(base);
+
+	writel(value, reg_addr + reg);
+}
+
+#define MDIO_WRITE_REG(a, reg, value) \
+	mdio_write_reg((a)->vbase, (reg), (value))

Something seems wrong here: why do you have an ACCESS_ONCE() on a local variable? Doesn't this just make the code less efficient without providing lockless access to shared variables?

The types are inconsistent here, you should get a warning from running this through 'make C=1' because of the missing __iomem annotation of the pointer.

Also, why both a macro and an inline function? Just use an inline function.

	Arnd
[PATCH 05/10] batman-adv: Make TT capability changes atomic
From: Linus Lüssing <linus.luess...@c0d3.blue>

Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: e17931d1a61d ("batman-adv: introduce capability initialization bitfield")
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
Signed-off-by: Marek Lindner <mareklind...@neomailbox.ch>
Signed-off-by: Antonio Quartulli <anto...@meshcoding.com>
---
 net/batman-adv/translation-table.c | 8 +++++---
 net/batman-adv/types.h             | 4 ++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index b482495..1573489 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -19,6 +19,7 @@
 #include "main.h"
 
 #include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/bug.h>
 #include <linux/byteorder/generic.h>
 #include <linux/compiler.h>
@@ -1862,7 +1863,7 @@ void batadv_tt_global_del_orig(struct batadv_priv *bat_priv,
 		}
 		spin_unlock_bh(list_lock);
 	}
-	orig_node->capa_initialized &= ~BATADV_ORIG_CAPA_HAS_TT;
+	clear_bit(BATADV_ORIG_CAPA_HAS_TT, &orig_node->capa_initialized);
 }
 
 static bool batadv_tt_global_to_purge(struct batadv_tt_global_entry *tt_global,
@@ -2821,7 +2822,7 @@ static void _batadv_tt_update_changes(struct batadv_priv *bat_priv,
 			return;
 		}
 	}
-	orig_node->capa_initialized |= BATADV_ORIG_CAPA_HAS_TT;
+	set_bit(BATADV_ORIG_CAPA_HAS_TT, &orig_node->capa_initialized);
 }
 
 static void batadv_tt_fill_gtable(struct batadv_priv *bat_priv,
@@ -3321,7 +3322,8 @@ static void batadv_tt_update_orig(struct batadv_priv *bat_priv,
 	bool has_tt_init;
 
 	tt_vlan = (struct batadv_tvlv_tt_vlan_data *)tt_buff;
-	has_tt_init = orig_node->capa_initialized & BATADV_ORIG_CAPA_HAS_TT;
+	has_tt_init = test_bit(BATADV_ORIG_CAPA_HAS_TT,
+			       &orig_node->capa_initialized);
 
 	/* orig table not initialised AND first diff is in the OGM OR the ttvn
 	 * increased by one -> we can apply the attached changes
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index ed4aec5..6f801ef 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -274,7 +274,7 @@ struct batadv_orig_node {
 	struct hlist_node mcast_want_all_ipv6_node;
 #endif
 	unsigned long capabilities;
-	uint8_t capa_initialized;
+	unsigned long capa_initialized;
 	atomic_t last_ttvn;
 	unsigned char *tt_buff;
 	int16_t tt_buff_len;
@@ -315,7 +315,7 @@ struct batadv_orig_node {
 enum batadv_orig_capabilities {
 	BATADV_ORIG_CAPA_HAS_DAT,
 	BATADV_ORIG_CAPA_HAS_NC,
-	BATADV_ORIG_CAPA_HAS_TT = BIT(2),
+	BATADV_ORIG_CAPA_HAS_TT,
 	BATADV_ORIG_CAPA_HAS_MCAST = BIT(3),
 };
 
-- 
2.5.0
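The capability patches in this series swap mask-based read-modify-write statements for helpers that take a bit number, which is also why the enum values change from BIT(n) to plain enumerators. A rough userspace analogue can be written with C11 atomics. This is only a sketch: the toy_* names are hypothetical, and the kernel's set_bit()/clear_bit()/test_bit() are implemented differently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers, not masks, mirroring the converted enum in types.h. */
enum toy_capa {
	TOY_CAPA_HAS_DAT,
	TOY_CAPA_HAS_NC,
	TOY_CAPA_HAS_TT,
	TOY_CAPA_HAS_MCAST,
};

/* Atomically set bit nr; concurrent updates to other bits are not lost. */
void toy_set_bit(unsigned int nr, atomic_ulong *addr)
{
	atomic_fetch_or(addr, 1UL << nr);
}

/* Atomically clear bit nr. */
void toy_clear_bit(unsigned int nr, atomic_ulong *addr)
{
	atomic_fetch_and(addr, ~(1UL << nr));
}

/* Read bit nr from a single atomic load of the word. */
bool toy_test_bit(unsigned int nr, atomic_ulong *addr)
{
	return (atomic_load(addr) >> nr) & 1UL;
}
```

A plain `flags |= mask` compiles to a separate load, OR and store, so two handlers touching different bits of the same word can overwrite each other's update. The fetch-or and fetch-and forms perform the whole update as one atomic operation.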
[PATCH 07/10] batman-adv: Fix potential synchronization issues in mcast tvlv handler
From: Linus Lüssing linus.luess...@c0d3.blue So far the mcast tvlv handler did not anticipate the processing of multiple incoming OGMs from the same originator at the same time. This can lead to various issues: * Broken refcounting: For instance two mcast handlers might both assume that an originator just got multicast capabilities and will together wrongly decrease mcast.num_disabled by two, potentially leading to an integer underflow. * Potential kernel panic on hlist_del_rcu(): Two mcast handlers might one after another try to do an hlist_del_rcu(orig-mcast_want_all_*_node). The second one will cause memory corruption / crashes. (Reported by: Sven Eckelmann s...@narfation.org) Right in the beginning the code path makes assumptions about the current multicast related state of an originator and bases all updates on that. The easiest and least error prune way to fix the issues in this case is to serialize multiple mcast handler invocations with a spinlock. Fixes: 60432d756cf0 (batman-adv: Announce new capability via multicast TVLV) Signed-off-by: Linus Lüssing linus.luess...@c0d3.blue Signed-off-by: Marek Lindner mareklind...@neomailbox.ch Signed-off-by: Antonio Quartulli anto...@meshcoding.com --- net/batman-adv/multicast.c | 63 +++-- net/batman-adv/originator.c | 5 net/batman-adv/types.h | 3 +++ 3 files changed, 58 insertions(+), 13 deletions(-) diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c index 8f1ec21..68a9554 100644 --- a/net/batman-adv/multicast.c +++ b/net/batman-adv/multicast.c @@ -20,6 +20,7 @@ #include linux/atomic.h #include linux/bitops.h +#include linux/bug.h #include linux/byteorder/generic.h #include linux/errno.h #include linux/etherdevice.h @@ -589,19 +590,26 @@ batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb, * * If the BATADV_MCAST_WANT_ALL_UNSNOOPABLES flag of this originator, * orig, has toggled then this method updates counter and list accordingly. 
+ * + * Caller needs to hold orig-mcast_handler_lock. */ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv, struct batadv_orig_node *orig, uint8_t mcast_flags) { + struct hlist_node *node = orig-mcast_want_all_unsnoopables_node; + struct hlist_head *head = bat_priv-mcast.want_all_unsnoopables_list; + /* switched from flag unset to set */ if (mcast_flags BATADV_MCAST_WANT_ALL_UNSNOOPABLES !(orig-mcast_flags BATADV_MCAST_WANT_ALL_UNSNOOPABLES)) { atomic_inc(bat_priv-mcast.num_want_all_unsnoopables); spin_lock_bh(bat_priv-mcast.want_lists_lock); - hlist_add_head_rcu(orig-mcast_want_all_unsnoopables_node, - bat_priv-mcast.want_all_unsnoopables_list); + /* flag checks above + mcast_handler_lock prevents this */ + WARN_ON(!hlist_unhashed(node)); + + hlist_add_head_rcu(node, head); spin_unlock_bh(bat_priv-mcast.want_lists_lock); /* switched from flag set to unset */ } else if (!(mcast_flags BATADV_MCAST_WANT_ALL_UNSNOOPABLES) @@ -609,7 +617,10 @@ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv, atomic_dec(bat_priv-mcast.num_want_all_unsnoopables); spin_lock_bh(bat_priv-mcast.want_lists_lock); - hlist_del_rcu(orig-mcast_want_all_unsnoopables_node); + /* flag checks above + mcast_handler_lock prevents this */ + WARN_ON(hlist_unhashed(node)); + + hlist_del_init_rcu(node); spin_unlock_bh(bat_priv-mcast.want_lists_lock); } } @@ -622,19 +633,26 @@ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv, * * If the BATADV_MCAST_WANT_ALL_IPV4 flag of this originator, orig, has * toggled then this method updates counter and list accordingly. + * + * Caller needs to hold orig-mcast_handler_lock. 
*/ static void batadv_mcast_want_ipv4_update(struct batadv_priv *bat_priv, struct batadv_orig_node *orig, uint8_t mcast_flags) { + struct hlist_node *node = orig-mcast_want_all_ipv4_node; + struct hlist_head *head = bat_priv-mcast.want_all_ipv4_list; + /* switched from flag unset to set */ if (mcast_flags BATADV_MCAST_WANT_ALL_IPV4 !(orig-mcast_flags BATADV_MCAST_WANT_ALL_IPV4)) { atomic_inc(bat_priv-mcast.num_want_all_ipv4); spin_lock_bh(bat_priv-mcast.want_lists_lock); - hlist_add_head_rcu(orig-mcast_want_all_ipv4_node, - bat_priv-mcast.want_all_ipv4_list); + /* flag checks above + mcast_handler_lock prevents this */ +
[PATCH 08/10] batman-adv: protect tt request from double deletion
From: Marek Lindner <mareklind...@neomailbox.ch>

The list_del() calls were changed to list_del_init() to prevent an accidental double deletion in batadv_tt_req_node_new().

Signed-off-by: Marek Lindner <mareklind...@neomailbox.ch>
Signed-off-by: Antonio Quartulli <anto...@meshcoding.com>
---
 net/batman-adv/translation-table.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 1573489..cd35bb8 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -2196,7 +2196,7 @@ static void batadv_tt_req_list_free(struct batadv_priv *bat_priv)
 	spin_lock_bh(&bat_priv->tt.req_list_lock);
 
 	list_for_each_entry_safe(node, safe, &bat_priv->tt.req_list, list) {
-		list_del(&node->list);
+		list_del_init(&node->list);
 		kfree(node);
 	}
 
@@ -2232,7 +2232,7 @@ static void batadv_tt_req_purge(struct batadv_priv *bat_priv)
 	list_for_each_entry_safe(node, safe, &bat_priv->tt.req_list, list) {
 		if (batadv_has_timed_out(node->issued_at,
 					 BATADV_TT_REQUEST_TIMEOUT)) {
-			list_del(&node->list);
+			list_del_init(&node->list);
 			kfree(node);
 		}
 	}
@@ -2514,7 +2514,8 @@ out:
 		batadv_hardif_free_ref(primary_if);
 	if (ret && tt_req_node) {
 		spin_lock_bh(&bat_priv->tt.req_list_lock);
-		list_del(&tt_req_node->list);
+		/* list_del_init() verifies tt_req_node still is in the list */
+		list_del_init(&tt_req_node->list);
 		spin_unlock_bh(&bat_priv->tt.req_list_lock);
 		kfree(tt_req_node);
 	}
@@ -2951,7 +2952,7 @@ static void batadv_handle_tt_response(struct batadv_priv *bat_priv,
 	list_for_each_entry_safe(node, safe, &bat_priv->tt.req_list, list) {
 		if (!batadv_compare_eth(node->addr, resp_src))
 			continue;
-		list_del(&node->list);
+		list_del_init(&node->list);
 		kfree(node);
 	}
 
-- 
2.5.0
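Why list_del_init() tolerates a second deletion can be seen in a minimal userspace sketch of a circular doubly linked list. The toy_list names are hypothetical and this is not the kernel's include/linux/list.h; it only mirrors the idea that the deleted node is re-pointed at itself.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list node, modelled after list_head. */
struct toy_list {
	struct toy_list *next, *prev;
};

/* A node or head pointing at itself is an empty list of its own. */
void toy_list_init(struct toy_list *node)
{
	node->next = node;
	node->prev = node;
}

/* Insert node right after head. */
void toy_list_add(struct toy_list *node, struct toy_list *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink node, then re-initialise it so unlinking again is a no-op. */
void toy_list_del_init(struct toy_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	toy_list_init(node);
}

bool toy_list_empty(const struct toy_list *head)
{
	return head->next == head;
}
```

After the first toy_list_del_init() the node's pointers refer only to itself, so a second call rewrites the node's own fields and leaves the rest of the list untouched. The kernel's plain list_del() instead leaves poison values in the removed entry, so deleting it a second time would dereference invalid pointers.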
[PATCH 06/10] batman-adv: Make MCAST capability changes atomic
From: Linus Lüssing <linus.luess...@c0d3.blue>

Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One OGM handler might undo the set/clear of a specific bit from another handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 60432d756cf0 ("batman-adv: Announce new capability via multicast TVLV")
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
Signed-off-by: Marek Lindner <mareklind...@neomailbox.ch>
Signed-off-by: Antonio Quartulli <anto...@meshcoding.com>
---
 net/batman-adv/multicast.c | 18 ++++++++++--------
 net/batman-adv/types.h     | 2 +-
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
index 7aa480b..8f1ec21 100644
--- a/net/batman-adv/multicast.c
+++ b/net/batman-adv/multicast.c
@@ -19,6 +19,7 @@
 #include "main.h"
 
 #include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/byteorder/generic.h>
 #include <linux/errno.h>
 #include <linux/etherdevice.h>
@@ -697,29 +698,30 @@ static void batadv_mcast_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
 	uint8_t mcast_flags = BATADV_NO_FLAGS;
 	bool orig_initialized;
 
-	orig_initialized = orig->capa_initialized & BATADV_ORIG_CAPA_HAS_MCAST;
+	orig_initialized = test_bit(BATADV_ORIG_CAPA_HAS_MCAST,
+				    &orig->capa_initialized);
 
 	/* If mcast support is turned on decrease the disabled mcast node
 	 * counter only if we had increased it for this node before. If this
 	 * is a completely new orig_node no need to decrease the counter.
 	 */
 	if (orig_mcast_enabled &&
-	    !(orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST)) {
+	    !test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities)) {
 		if (orig_initialized)
 			atomic_dec(&bat_priv->mcast.num_disabled);
-		orig->capabilities |= BATADV_ORIG_CAPA_HAS_MCAST;
+		set_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities);
 	/* If mcast support is being switched off or if this is an initial
 	 * OGM without mcast support then increase the disabled mcast
 	 * node counter.
 	 */
 	} else if (!orig_mcast_enabled &&
-		   (orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST ||
+		   (test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities) ||
 		    !orig_initialized)) {
 		atomic_inc(&bat_priv->mcast.num_disabled);
-		orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_MCAST;
+		clear_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities);
 	}
 
-	orig->capa_initialized |= BATADV_ORIG_CAPA_HAS_MCAST;
+	set_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capa_initialized);
 
 	if (orig_mcast_enabled && tvlv_value &&
 	    (tvlv_value_len >= sizeof(mcast_flags)))
@@ -763,8 +765,8 @@ void batadv_mcast_purge_orig(struct batadv_orig_node *orig)
 {
 	struct batadv_priv *bat_priv = orig->bat_priv;
 
-	if (!(orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST) &&
-	    orig->capa_initialized & BATADV_ORIG_CAPA_HAS_MCAST)
+	if (!test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities) &&
+	    test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capa_initialized))
 		atomic_dec(&bat_priv->mcast.num_disabled);
 
 	batadv_mcast_want_unsnoop_update(bat_priv, orig, BATADV_NO_FLAGS);
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 6f801ef..1eeed18 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -316,7 +316,7 @@ enum batadv_orig_capabilities {
 	BATADV_ORIG_CAPA_HAS_DAT,
 	BATADV_ORIG_CAPA_HAS_NC,
 	BATADV_ORIG_CAPA_HAS_TT,
-	BATADV_ORIG_CAPA_HAS_MCAST = BIT(3),
+	BATADV_ORIG_CAPA_HAS_MCAST,
 };
 
 /**
-- 
2.5.0
[PATCH 03/10] batman-adv: Make DAT capability changes atomic
From: Linus Lüssing <linus.luess...@c0d3.blue>

Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
OGM handler might undo the set/clear of a specific bit from another
handler run in between.

Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

Fixes: 17cf0ea455f1 ("batman-adv: tvlv - add distributed arp table container")
Signed-off-by: Linus Lüssing <linus.luess...@c0d3.blue>
Signed-off-by: Marek Lindner <mareklind...@neomailbox.ch>
Signed-off-by: Antonio Quartulli <anto...@meshcoding.com>
---
 net/batman-adv/distributed-arp-table.c | 7 ++++---
 net/batman-adv/types.h                 | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index fb54e6a..1cfba20 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -19,6 +19,7 @@
 #include "main.h"
 
 #include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/byteorder/generic.h>
 #include <linux/errno.h>
 #include <linux/etherdevice.h>
@@ -453,7 +454,7 @@ static bool batadv_is_orig_node_eligible(struct batadv_dat_candidate *res,
 	int j;
 
 	/* check if orig node candidate is running DAT */
-	if (!(candidate->capabilities & BATADV_ORIG_CAPA_HAS_DAT))
+	if (!test_bit(BATADV_ORIG_CAPA_HAS_DAT, &candidate->capabilities))
 		goto out;
 
 	/* Check if this node has already been selected... */
@@ -713,9 +714,9 @@ static void batadv_dat_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
 					   uint16_t tvlv_value_len)
 {
 	if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND)
-		orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_DAT;
+		clear_bit(BATADV_ORIG_CAPA_HAS_DAT, &orig->capabilities);
 	else
-		orig->capabilities |= BATADV_ORIG_CAPA_HAS_DAT;
+		set_bit(BATADV_ORIG_CAPA_HAS_DAT, &orig->capabilities);
 }
 
 /**
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 67d6348..29fd625 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -273,7 +273,7 @@ struct batadv_orig_node {
 	struct hlist_node mcast_want_all_ipv4_node;
 	struct hlist_node mcast_want_all_ipv6_node;
 #endif
-	uint8_t capabilities;
+	unsigned long capabilities;
 	uint8_t capa_initialized;
 	atomic_t last_ttvn;
 	unsigned char *tt_buff;
@@ -313,7 +313,7 @@ struct batadv_orig_node {
  *  (= orig node announces a tvlv of type BATADV_TVLV_MCAST)
  */
 enum batadv_orig_capabilities {
-	BATADV_ORIG_CAPA_HAS_DAT = BIT(0),
+	BATADV_ORIG_CAPA_HAS_DAT,
 	BATADV_ORIG_CAPA_HAS_NC = BIT(1),
 	BATADV_ORIG_CAPA_HAS_TT = BIT(2),
 	BATADV_ORIG_CAPA_HAS_MCAST = BIT(3),
-- 
2.5.0
Global protection fault in netfilter code
Hi,

When doing NFS stress tests in a VM with a recent kernel (yesterday's
commit 7ddab73346a1 Merge branch 'fixes' of
git://ftp.arm.linux.org.uk/~rmk/linux-arm), I've been seeing the
following General Protection Fault code apparently in the nf_conntrack
code:

PID: 358  TASK: 88003630cb80  CPU: 0  COMMAND: kworker/0:1H
 #0 [88013a603680] die at 81007608
 #1 [88013a6036b0] do_general_protection at 8100407a
 #2 [88013a6036e0] general_protection at 817888c8
    [exception RIP: detach_if_pending+103]
    RIP: 81101b37  RSP: 88013a603798  RFLAGS: 00010086
    RAX: dead00200200  RBX: 8800b9771bd8  RCX: 000f
    RDX: 88013a60e818  RSI: 88013a60d980  RDI: 0046
    RBP: 88013a6037b8   R8:   R9: 0001
    R10: 88013a60d998  R11: 0001  R12: 8800b9771bd8
    R13: 88013a60d980  R14:  R15: 0001
    ORIG_RAX:  CS: 0010  SS: 0018
 #3 [88013a6037c0] mod_timer_pending at 81101fd2
 #4 [88013a603820] __nf_ct_refresh_acct at a055891b [nf_conntrack]
 #5 [88013a603850] tcp_packet at a056232e [nf_conntrack]
 #6 [88013a603970] nf_conntrack_in at a055b70a [nf_conntrack]
 #7 [88013a603a40] ipv4_conntrack_in at a0576326 [nf_conntrack_ipv4]
 #8 [88013a603a50] nf_iterate at 81688dad
 #9 [88013a603aa0] nf_hook_slow at 81688e42
#10 [88013a603af0] ip_rcv at 81695ca3
#11 [88013a603b60] __netif_receive_skb_core at 8164d688
#12 [88013a603c00] __netif_receive_skb at 8164e108
#13 [88013a603c20] netif_receive_skb_internal at 8164f7f6
#14 [88013a603c60] napi_gro_complete at 8164fbf7
#15 [88013a603cb0] dev_gro_receive at 81650508
#16 [88013a603d20] napi_gro_receive at 81650a6b
#17 [88013a603d50] e1000_clean_rx_irq at a00308db [e1000]
#18 [88013a603e00] e1000_clean at a0030f3d [e1000]
#19 [88013a603ec0] net_rx_action at 8164ffda
#20 [88013a603f40] __do_softirq at 81087a18
#21 [88013a603fb0] do_softirq_own_stack at 8178875c
--- IRQ stack ---
#22 [880035fabaf0] do_softirq_own_stack at 8178875c
    [exception RIP: unknown or invalid address]
    RIP: 88007d7a6108  RSP: 88007d7a60d0  RFLAGS: a02827e5
    RAX: 810868a9  RBX: 880035fabb38  RCX: 88007d7a6108
    RDX: 880136e56a00  RSI: 880035fabb78  RDI: 81786209
    RBP: 810d662d   R8: 880035fabb58   R9: fe00
    R10: 0046  R11: 810867e5  R12: 0046
    R13: 810ce565  R14: 880035fabb18  R15:
    ORIG_RAX: 8800a4408840  CS: 880035fabbc8  SS: 0001
WARNING: possibly bogus exception frame
#23 [880035fabbd0] nfs41_wake_and_assign_slot at a06c7a9d [nfsv4]
#24 [880035fabbe0] nfs41_sequence_done at a069c7c0 [nfsv4]
#25 [880035fabc30] nfs4_sequence_done at a069caaf [nfsv4]
#26 [880035fabc40] nfs4_read_done at a06a2c0e [nfsv4]
#27 [880035fabc60] nfs_readpage_done at a0654736 [nfs]
#28 [880035fabc90] nfs_pgio_result at a0653414 [nfs]
#29 [880035fabcc0] rpc_exit_task at a027f10c [sunrpc]
#30 [880035fabce0] __rpc_execute at a02820dd [sunrpc]
#31 [880035fabd60] rpc_async_schedule at a0282725 [sunrpc]
#32 [880035fabd70] process_one_work at 8109fe89
#33 [880035fabdf0] worker_thread at 810a04ae
#34 [880035fabe60] kthread at 810a6aef
#35 [880035fabf50] ret_from_fork at 81786f5f

I do not see that in vanilla Linux-4.1, so it seems to be a 4.2 cycle
thing. Is anyone else seeing this, and is it being looked at by the
netfilter folks?

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.mykleb...@primarydata.com
Re: [BUG net-next] infamous dev refcnt leak... again.
On 8/14/15 5:14 PM, Eric Dumazet wrote:
> On Fri, 2015-08-14 at 14:14 -0700, Eric Dumazet wrote:
>> While rebooting host running latest net-next
>>
>> unregister_netdevice: waiting for eth0 to become free. Usage count = 4
>
> Oh well...
>
> It looks like David Ahern recent changes uncover a bug ?
>
> Not clear which commit is at fault.
>
> Maybe 3bfd847203c6d89532f836ad3f5b4ff4ced26dd9 ?
>
> Somehow a down device can be found.

Can you elaborate on what you are doing to see the refcnt leak? I have
not seen that at all.

I have to leave for soccer carpool in 45 minutes or so, but can take a
look this weekend.

David

> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index b7f1d20..675a3b6 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -725,10 +725,14 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
>  		nh->nh_dev = dev = FIB_RES_DEV(res);
>  		if (!dev)
>  			goto out;
> -		dev_hold(dev);
>  		if (!netif_carrier_ok(dev))
>  			nh->nh_flags |= RTNH_F_LINKDOWN;
> -		err = (dev->flags & IFF_UP) ? 0 : -ENETDOWN;
> +		if (dev->flags & IFF_UP) {
> +			err = 0;
> +			dev_hold(dev);
> +		} else {
> +			err = -ENETDOWN;
> +		}
>  	} else {
>  		struct in_device *in_dev;
>
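The ordering change in the quoted diff can be illustrated with a toy reference counter. This is a sketch of the hypothesis under discussion, not of fib_check_nh() itself: struct toy_dev and both functions are hypothetical, and whether the real caller leaks depends on error handling that is not shown in this thread.

```c
#include <errno.h>

/* Toy stand-in for a net_device with a reference counter. */
struct toy_dev {
	int refcnt;
	int up;		/* non-zero models IFF_UP */
};

/* Hold first, then fail: the error path returns with the reference
 * still held. If the caller treats a non-zero return as "no reference
 * taken", that reference is never dropped. */
static int check_nh_hold_first(struct toy_dev *dev)
{
	dev->refcnt++;
	return dev->up ? 0 : -ENETDOWN;
}

/* Ordering from the quoted diff: take the reference only when the
 * function is going to report success. */
static int check_nh_hold_on_success(struct toy_dev *dev)
{
	if (!dev->up)
		return -ENETDOWN;
	dev->refcnt++;
	return 0;
}
```

With a down device, the first variant returns -ENETDOWN and leaves refcnt at 1, while the second returns the same error with refcnt still 0.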
[PATCH net-next v2 1/4] packet: add classic BPF fanout mode
From: Willem de Bruijn <will...@google.com>

Add fanout mode PACKET_FANOUT_CBPF that accepts a classic BPF program
to select a socket.

This avoids having to keep adding special case fanout modes. One
example use case is application layer load balancing. The QUIC
protocol, for instance, encodes a connection ID in UDP payload.

Also add socket option SOL_PACKET/PACKET_FANOUT_DATA that updates data
associated with the socket group. Fanout mode PACKET_FANOUT_CBPF is the
only user so far.

Signed-off-by: Willem de Bruijn <will...@google.com>
---
 include/uapi/linux/if_packet.h |  2 +
 net/packet/af_packet.c         | 99 +-
 net/packet/internal.h          |  5 ++-
 3 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index d3d715f8c..a4bb16f 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -55,6 +55,7 @@ struct sockaddr_ll {
 #define PACKET_TX_HAS_OFF		19
 #define PACKET_QDISC_BYPASS		20
 #define PACKET_ROLLOVER_STATS		21
+#define PACKET_FANOUT_DATA		22
 
 #define PACKET_FANOUT_HASH		0
 #define PACKET_FANOUT_LB		1
@@ -62,6 +63,7 @@ struct sockaddr_ll {
 #define PACKET_FANOUT_ROLLOVER		3
 #define PACKET_FANOUT_RND		4
 #define PACKET_FANOUT_QM		5
+#define PACKET_FANOUT_CBPF		6
 #define PACKET_FANOUT_FLAG_ROLLOVER	0x1000
 #define PACKET_FANOUT_FLAG_DEFRAG	0x8000
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index b5afe53..8869d07 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -92,6 +92,7 @@
 #ifdef CONFIG_INET
 #include <net/inet_common.h>
 #endif
+#include <linux/bpf.h>
 
 #include "internal.h"
 
@@ -1410,6 +1411,22 @@ static unsigned int fanout_demux_qm(struct packet_fanout *f,
 	return skb_get_queue_mapping(skb) % num;
 }
 
+static unsigned int fanout_demux_bpf(struct packet_fanout *f,
+				     struct sk_buff *skb,
+				     unsigned int num)
+{
+	struct bpf_prog *prog;
+	unsigned int ret = 0;
+
+	rcu_read_lock();
+	prog = rcu_dereference(f->bpf_prog);
+	if (prog)
+		ret = BPF_PROG_RUN(prog, skb) % num;
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static bool fanout_has_flag(struct packet_fanout *f, u16 flag)
 {
 	return f->flags & (flag >> 8);
@@ -1454,6 +1471,9 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 	case PACKET_FANOUT_ROLLOVER:
 		idx = fanout_demux_rollover(f, skb, 0, false, num);
 		break;
+	case PACKET_FANOUT_CBPF:
+		idx = fanout_demux_bpf(f, skb, num);
+		break;
 	}
 
 	if (fanout_has_flag(f, PACKET_FANOUT_FLAG_ROLLOVER))
@@ -1502,6 +1522,74 @@ static bool match_fanout_group(struct packet_type *ptype, struct sock *sk)
 	return false;
 }
 
+static void fanout_init_data(struct packet_fanout *f)
+{
+	switch (f->type) {
+	case PACKET_FANOUT_LB:
+		atomic_set(&f->rr_cur, 0);
+		break;
+	case PACKET_FANOUT_CBPF:
+		RCU_INIT_POINTER(f->bpf_prog, NULL);
+		break;
+	}
+}
+
+static void __fanout_set_data_bpf(struct packet_fanout *f, struct bpf_prog *new)
+{
+	struct bpf_prog *old;
+
+	spin_lock(&f->lock);
+	old = rcu_dereference_protected(f->bpf_prog, lockdep_is_held(&f->lock));
+	rcu_assign_pointer(f->bpf_prog, new);
+	spin_unlock(&f->lock);
+
+	if (old) {
+		synchronize_net();
+		bpf_prog_destroy(old);
+	}
+}
+
+static int fanout_set_data_cbpf(struct packet_sock *po, char __user *data,
+				unsigned int len)
+{
+	struct bpf_prog *new;
+	struct sock_fprog fprog;
+	int ret;
+
+	if (sock_flag(&po->sk, SOCK_FILTER_LOCKED))
+		return -EPERM;
+	if (len != sizeof(fprog))
+		return -EINVAL;
+	if (copy_from_user(&fprog, data, len))
+		return -EFAULT;
+
+	ret = bpf_prog_create_from_user(&new, &fprog, NULL);
+	if (ret)
+		return ret;
+
+	__fanout_set_data_bpf(po->fanout, new);
+	return 0;
+}
+
+static int fanout_set_data(struct packet_sock *po, char __user *data,
+			   unsigned int len)
+{
+	switch (po->fanout->type) {
+	case PACKET_FANOUT_CBPF:
+		return fanout_set_data_cbpf(po, data, len);
+	default:
+		return -EINVAL;
+	};
+}
+
+static void fanout_release_data(struct packet_fanout *f)
+{
+	switch (f->type) {
+	case PACKET_FANOUT_CBPF:
+		__fanout_set_data_bpf(f, NULL);
+	};
+}
+
 static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 {
 	struct packet_sock *po = pkt_sk(sk);
@@ -1519,6 +1607,7 @@ static int
[PATCH net-next v2 2/4] packet: add extended BPF fanout mode
From: Willem de Bruijn <will...@google.com>

Add fanout mode PACKET_FANOUT_EBPF that accepts an extended BPF
program to select a socket.

Update the internal eBPF program by passing to socket option
SOL_PACKET/PACKET_FANOUT_DATA a file descriptor returned by bpf().

Signed-off-by: Willem de Bruijn <will...@google.com>
---
 include/uapi/linux/if_packet.h |  1 +
 net/packet/af_packet.c         | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index a4bb16f..9e7edfd 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -64,6 +64,7 @@ struct sockaddr_ll {
 #define PACKET_FANOUT_RND		4
 #define PACKET_FANOUT_QM		5
 #define PACKET_FANOUT_CBPF		6
+#define PACKET_FANOUT_EBPF		7
 #define PACKET_FANOUT_FLAG_ROLLOVER	0x1000
 #define PACKET_FANOUT_FLAG_DEFRAG	0x8000
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8869d07..7b8e39a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1472,6 +1472,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
 		idx = fanout_demux_rollover(f, skb, 0, false, num);
 		break;
 	case PACKET_FANOUT_CBPF:
+	case PACKET_FANOUT_EBPF:
 		idx = fanout_demux_bpf(f, skb, num);
 		break;
 	}
@@ -1529,6 +1530,7 @@ static void fanout_init_data(struct packet_fanout *f)
 		atomic_set(&f->rr_cur, 0);
 		break;
 	case PACKET_FANOUT_CBPF:
+	case PACKET_FANOUT_EBPF:
 		RCU_INIT_POINTER(f->bpf_prog, NULL);
 		break;
 	}
@@ -1571,12 +1573,39 @@ static int fanout_set_data_cbpf(struct packet_sock *po, char __user *data,
 	return 0;
 }
 
+static int fanout_set_data_ebpf(struct packet_sock *po, char __user *data,
+				unsigned int len)
+{
+	struct bpf_prog *new;
+	u32 fd;
+
+	if (sock_flag(&po->sk, SOCK_FILTER_LOCKED))
+		return -EPERM;
+	if (len != sizeof(fd))
+		return -EINVAL;
+	if (copy_from_user(&fd, data, len))
+		return -EFAULT;
+
+	new = bpf_prog_get(fd);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+	if (new->type != BPF_PROG_TYPE_SOCKET_FILTER) {
+		bpf_prog_put(new);
+		return -EINVAL;
+	}
+
+	__fanout_set_data_bpf(po->fanout, new);
+	return 0;
+}
+
 static int fanout_set_data(struct packet_sock *po, char __user *data,
 			   unsigned int len)
 {
 	switch (po->fanout->type) {
 	case PACKET_FANOUT_CBPF:
 		return fanout_set_data_cbpf(po, data, len);
+	case PACKET_FANOUT_EBPF:
+		return fanout_set_data_ebpf(po, data, len);
 	default:
 		return -EINVAL;
 	};
@@ -1586,6 +1615,7 @@ static void fanout_release_data(struct packet_fanout *f)
 {
 	switch (f->type) {
 	case PACKET_FANOUT_CBPF:
+	case PACKET_FANOUT_EBPF:
 		__fanout_set_data_bpf(f, NULL);
 	};
 }
@@ -1608,6 +1638,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 	case PACKET_FANOUT_RND:
 	case PACKET_FANOUT_QM:
 	case PACKET_FANOUT_CBPF:
+	case PACKET_FANOUT_EBPF:
 		break;
 	default:
 		return -EINVAL;
-- 
2.5.0.276.gf5e568e
[PATCH net-next v2 3/4] selftests/net: test classic bpf fanout mode
From: Willem de Bruijn will...@google.com Test PACKET_FANOUT_CBPF by inserting a cBPF program that selects a socket by payload. Requires modifying the test program to send packets with multiple payloads. Also fix a bug in testing the return value of mmap() Signed-off-by: Willem de Bruijn will...@google.com --- tools/testing/selftests/net/psock_fanout.c | 16 tools/testing/selftests/net/psock_lib.h| 29 + 2 files changed, 33 insertions(+), 12 deletions(-) diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c index 08c2a36..baf46a2 100644 --- a/tools/testing/selftests/net/psock_fanout.c +++ b/tools/testing/selftests/net/psock_fanout.c @@ -19,6 +19,7 @@ * - PACKET_FANOUT_LB * - PACKET_FANOUT_CPU * - PACKET_FANOUT_ROLLOVER + * - PACKET_FANOUT_CBPF * * Todo: * - functionality: PACKET_FANOUT_FLAG_DEFRAG @@ -115,8 +116,8 @@ static char *sock_fanout_open_ring(int fd) ring = mmap(0, req.tp_block_size * req.tp_block_nr, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); - if (!ring) { - fprintf(stderr, packetsock ring mmap\n); + if (ring == MAP_FAILED) { + perror(packetsock ring mmap); exit(1); } @@ -209,6 +210,7 @@ static int test_datapath(uint16_t typeflags, int port_off, { const int expect0[] = { 0, 0 }; char *rings[2]; + uint8_t type = typeflags 0xFF; int fds[2], fds_udp[2][2], ret; fprintf(stderr, test: datapath 0x%hx\n, typeflags); @@ -219,6 +221,9 @@ static int test_datapath(uint16_t typeflags, int port_off, fprintf(stderr, ERROR: failed open\n); exit(1); } + if (type == PACKET_FANOUT_CBPF) + sock_setfilter(fds[0], SOL_PACKET, PACKET_FANOUT_DATA); + rings[0] = sock_fanout_open_ring(fds[0]); rings[1] = sock_fanout_open_ring(fds[1]); pair_udp_open(fds_udp[0], PORT_BASE); @@ -227,11 +232,11 @@ static int test_datapath(uint16_t typeflags, int port_off, /* Send data, but not enough to overflow a queue */ pair_udp_send(fds_udp[0], 15); - pair_udp_send(fds_udp[1], 5); + pair_udp_send_char(fds_udp[1], 5, DATA_CHAR_1); ret = 
sock_fanout_read(fds, rings, expect1); /* Send more data, overflow the queue */ - pair_udp_send(fds_udp[0], 15); + pair_udp_send_char(fds_udp[0], 15, DATA_CHAR_1); /* TODO: ensure consistent order between expect1 and expect2 */ ret |= sock_fanout_read(fds, rings, expect2); @@ -275,6 +280,7 @@ int main(int argc, char **argv) const int expect_rb[2][2] = { { 15, 5 }, { 20, 15 } }; const int expect_cpu0[2][2] = { { 20, 0 }, { 20, 0 } }; const int expect_cpu1[2][2] = { { 0, 20 }, { 0, 20 } }; + const int expect_bpf[2][2] = { { 15, 5 }, { 15, 20 } }; int port_off = 2, tries = 5, ret; test_control_single(); @@ -295,6 +301,8 @@ int main(int argc, char **argv) port_off, expect_lb[0], expect_lb[1]); ret |= test_datapath(PACKET_FANOUT_ROLLOVER, port_off, expect_rb[0], expect_rb[1]); + ret |= test_datapath(PACKET_FANOUT_CBPF, +port_off, expect_bpf[0], expect_bpf[1]); set_cpuaffinity(0); ret |= test_datapath(PACKET_FANOUT_CPU, port_off, diff --git a/tools/testing/selftests/net/psock_lib.h b/tools/testing/selftests/net/psock_lib.h index 37da54a..24bc7ec 100644 --- a/tools/testing/selftests/net/psock_lib.h +++ b/tools/testing/selftests/net/psock_lib.h @@ -30,6 +30,7 @@ #define DATA_LEN 100 #define DATA_CHAR 'a' +#define DATA_CHAR_1'b' #define PORT_BASE 8000 @@ -37,29 +38,36 @@ # define __maybe_unused__attribute__ ((__unused__)) #endif -static __maybe_unused void pair_udp_setfilter(int fd) +static __maybe_unused void sock_setfilter(int fd, int lvl, int optnum) { struct sock_filter bpf_filter[] = { { 0x80, 0, 0, 0x }, /* LD pktlen*/ - { 0x35, 0, 5, DATA_LEN }, /* JGE DATA_LEN [f goto nomatch]*/ + { 0x35, 0, 4, DATA_LEN }, /* JGE DATA_LEN [f goto nomatch]*/ { 0x30, 0, 0, 0x0050 }, /* LD ip[80]*/ - { 0x15, 0, 3, DATA_CHAR }, /* JEQ DATA_CHAR [f goto nomatch]*/ - { 0x30, 0, 0, 0x0051 }, /* LD ip[81]*/ - { 0x15, 0, 1, DATA_CHAR }, /* JEQ DATA_CHAR [f goto nomatch]*/ + { 0x15, 1, 0, DATA_CHAR }, /* JEQ DATA_CHAR [t goto match]*/ + { 0x15, 0, 1, DATA_CHAR_1}, /* JEQ DATA_CHAR_1 [t goto 
match]*/ { 0x06, 0, 0, 0x0060 }, /* RET match */
[PATCH net-next v2 4/4] selftests/net: test extended BPF fanout mode
From: Willem de Bruijn <will...@google.com>

Test PACKET_FANOUT_EBPF by inserting a program into the kernel with
bpf(), then attaching it to the fanout group. Observe the same
payload-based distribution as in the PACKET_FANOUT_CBPF test.

Signed-off-by: Willem de Bruijn <will...@google.com>
---
 tools/testing/selftests/net/psock_fanout.c | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/tools/testing/selftests/net/psock_fanout.c b/tools/testing/selftests/net/psock_fanout.c
index baf46a2..4124593 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -20,6 +20,7 @@
  * - PACKET_FANOUT_CPU
  * - PACKET_FANOUT_ROLLOVER
  * - PACKET_FANOUT_CBPF
+ * - PACKET_FANOUT_EBPF
  *
  * Todo:
  * - functionality: PACKET_FANOUT_FLAG_DEFRAG
@@ -45,7 +46,9 @@
 #include <arpa/inet.h>
 #include <errno.h>
 #include <fcntl.h>
+#include <linux/unistd.h>	/* for __NR_bpf */
 #include <linux/filter.h>
+#include <linux/bpf.h>
 #include <linux/if_packet.h>
 #include <net/ethernet.h>
 #include <netinet/ip.h>
@@ -92,6 +95,51 @@ static int sock_fanout_open(uint16_t typeflags, int num_packets)
 	return fd;
 }
 
+static void sock_fanout_set_ebpf(int fd)
+{
+	const int len_off = __builtin_offsetof(struct __sk_buff, len);
+	struct bpf_insn prog[] = {
+		{ BPF_ALU64 | BPF_MOV | BPF_X,   6, 1, 0, 0 },
+		{ BPF_LDX   | BPF_W   | BPF_MEM, 0, 6, len_off, 0 },
+		{ BPF_JMP   | BPF_JGE | BPF_K,   0, 0, 1, DATA_LEN },
+		{ BPF_JMP   | BPF_JA  | BPF_K,   0, 0, 4, 0 },
+		{ BPF_LD    | BPF_B   | BPF_ABS, 0, 0, 0, 0x50 },
+		{ BPF_JMP   | BPF_JEQ | BPF_K,   0, 0, 2, DATA_CHAR },
+		{ BPF_JMP   | BPF_JEQ | BPF_K,   0, 0, 1, DATA_CHAR_1 },
+		{ BPF_ALU   | BPF_MOV | BPF_K,   0, 0, 0, 0 },
+		{ BPF_JMP   | BPF_EXIT,          0, 0, 0, 0 }
+	};
+	char log_buf[512];
+	union bpf_attr attr;
+	int pfd;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+	attr.insns = (unsigned long) prog;
+	attr.insn_cnt = sizeof(prog) / sizeof(prog[0]);
+	attr.license = (unsigned long) "GPL";
+	attr.log_buf = (unsigned long) log_buf,
+	attr.log_size = sizeof(log_buf),
+	attr.log_level = 1,
+
+	pfd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	if (pfd < 0) {
+		perror("bpf");
+		fprintf(stderr, "bpf verifier:\n%s\n", log_buf);
+		exit(1);
+	}
+
+	if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT_DATA, &pfd, sizeof(pfd))) {
+		perror("fanout data ebpf");
+		exit(1);
+	}
+
+	if (close(pfd)) {
+		perror("close ebpf");
+		exit(1);
+	}
+}
+
 static char *sock_fanout_open_ring(int fd)
 {
 	struct tpacket_req req = {
@@ -223,6 +271,8 @@ static int test_datapath(uint16_t typeflags, int port_off,
 	}
 	if (type == PACKET_FANOUT_CBPF)
 		sock_setfilter(fds[0], SOL_PACKET, PACKET_FANOUT_DATA);
+	else if (type == PACKET_FANOUT_EBPF)
+		sock_fanout_set_ebpf(fds[0]);
 
 	rings[0] = sock_fanout_open_ring(fds[0]);
 	rings[1] = sock_fanout_open_ring(fds[1]);
@@ -301,8 +351,11 @@ int main(int argc, char **argv)
 			     port_off, expect_lb[0], expect_lb[1]);
 	ret |= test_datapath(PACKET_FANOUT_ROLLOVER,
 			     port_off, expect_rb[0], expect_rb[1]);
+
 	ret |= test_datapath(PACKET_FANOUT_CBPF,
 			     port_off, expect_bpf[0], expect_bpf[1]);
+	ret |= test_datapath(PACKET_FANOUT_EBPF,
+			     port_off, expect_bpf[0], expect_bpf[1]);
 
 	set_cpuaffinity(0);
 	ret |= test_datapath(PACKET_FANOUT_CPU, port_off,
-- 
2.5.0.276.gf5e568e
[PATCH net-next v2 0/4] packet: add cBPF and eBPF fanout modes
From: Willem de Bruijn <will...@google.com>

Allow programmable fanout modes. Support both classical BPF programs
passed directly and extended BPF programs passed by file descriptor.

One use case is packet steering by deep packet inspection, for
instance for packet steering by application layer header fields.

Separate the configuration of the fanout mode and the configuration
of the program, to allow dynamic updates to the latter at runtime.

Changes v1 -> v2:
  - follow SO_LOCK_FILTER semantics on filter updates
  - only accept eBPF programs of type BPF_PROG_TYPE_SOCKET_FILTER
  - rename PACKET_FANOUT_BPF to PACKET_FANOUT_CBPF to match
    man 2 bpf usage: classic vs. extended BPF.

Willem de Bruijn (4):
  packet: add classic BPF fanout mode
  packet: add extended BPF fanout mode
  selftests/net: test classic bpf fanout mode
  selftests/net: test extended BPF fanout mode

 include/uapi/linux/if_packet.h             |   3 +
 net/packet/af_packet.c                     | 130 -
 net/packet/internal.h                      |   5 +-
 tools/testing/selftests/net/psock_fanout.c |  69 ++-
 tools/testing/selftests/net/psock_lib.h    |  29 +--
 5 files changed, 222 insertions(+), 14 deletions(-)

-- 
2.5.0.276.gf5e568e
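The split between choosing the fanout mode and supplying the program can be sketched from user space as follows. This is a minimal illustration under stated assumptions: fanout_arg() and fanout_pick() are hypothetical helper names, the mode constants are copied from the uapi hunks in this series, and the packing of group id and mode into one integer follows the existing PACKET_FANOUT convention rather than anything new in these patches.

```c
#include <stdint.h>

/* Values from the uapi hunks in this series, guarded in case the
 * installed <linux/if_packet.h> already provides them. */
#ifndef PACKET_FANOUT_CBPF
#define PACKET_FANOUT_CBPF 6
#endif
#ifndef PACKET_FANOUT_EBPF
#define PACKET_FANOUT_EBPF 7
#endif

/* Hypothetical helper: pack the PACKET_FANOUT option value, group id
 * in the low 16 bits, mode and flags in the high 16 bits. */
static inline uint32_t fanout_arg(uint16_t group_id, uint16_t type_flags)
{
	return (uint32_t)group_id | ((uint32_t)type_flags << 16);
}

/* Model of the selection step in fanout_demux_bpf(): the program's
 * return value, reduced modulo the group size, picks the socket. */
static inline unsigned int fanout_pick(uint32_t prog_ret, unsigned int num)
{
	return prog_ret % num;
}
```

Step one is then setsockopt(fd, SOL_PACKET, PACKET_FANOUT, ...) with the packed value. Step two is setsockopt(fd, SOL_PACKET, PACKET_FANOUT_DATA, ...) with a struct sock_fprog for the classic mode or a program file descriptor for the extended mode, and it can be repeated later to swap the program without rebuilding the group.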
Re: [PATCH net-next 04/11] udp: Handle VRF device in sendmsg
On 8/14/15 9:16 PM, Tom Herbert wrote:
> At least collect this code into one (static inline) function to better
> minimize the code churn in udp. If this is general functionality that
> can be used by other drivers then abstract it out as such. Also, if
> the VRF driver is not configured it seems like this code should be
> compiled out. As it stands now "if (netif_index_is_vrf(net, ipc.oif)) {"
> adds a conditional to every call of udp_sendmsg whether or not we are
> using VRF :-(.

Sure. I wanted to make sure all of the VRF related changes compiled out
when the VRF driver is not enabled. This one slipped by me. I'll send a
patch next week along with a couple of others per Eric D's comments.

David
Re: [PATCH net-next 2/4] packet: add eBPF fanout mode
On Fri, Aug 14, 2015 at 3:46 PM, Daniel Borkmann <dan...@iogearbox.net> wrote:
On 08/14/2015 09:27 PM, Willem de Bruijn wrote: ...

Btw, in case someone sets sock_flag(sk, SOCK_FILTER_LOCKED), perhaps we
should also apply it on fanout?

Good point. With classic bpf, packet access control is fully enforced
in per-socket filters, but playing with load balancing filters could
allow an adversary to infer some information about the dropped
packets*. With eBPF and maps, access is even more direct. Let's support
locking of fanout filters in place.

Right, a process could share a map between the fanout lb filter and
actual sk filter, i.e. to look up how much actually passed through on
the later sk level filter, and use that information in addition for its
lb decisions.

I intend to test the existing socket flag. No need to add a separate
flag for the fanout group, as far as I can see.

Agreed, should be okay.

Great. Thanks for the suggestion, Daniel! I'll send a v2 with the three
suggested changes in a minute.

Thanks Willem!

(*) I noticed that a similar unintended effect also causes the
PACKET_FANOUT_LB selftest to be flaky: filters on the sockets ensure
that the test only reads expected packets. But, all traffic makes it
through packet_rcv_fanout. Packets that are later dropped by sk_filter
have already incremented rr_cur. Worst case, with 2 sockets and each
accepted packet interleaved with a dropped packet, all packets are
queued on only one socket. Test flakiness is fixed, e.g., by running in
a private network namespace. The implementation behavior may be
unexpected in other, production, environments.
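The worst case described in the footnote can be reproduced with a small model. Everything here is hypothetical scaffolding: rr_fanout() only imitates a shared round-robin cursor that advances for every packet, with a per-socket accept decision applied afterwards. It is not the kernel's fanout_demux_lb().

```c
/* Toy model of round-robin fanout with a per-socket filter applied
 * after socket selection. All names are hypothetical. */
#define NSOCK 2

/* Every packet advances the shared cursor, as rr_cur does, even when
 * the chosen socket's filter later drops it. */
static void rr_fanout(const int *accepted, int npkts, int queued[NSOCK])
{
	unsigned int cur = 0;
	int i;

	for (i = 0; i < NSOCK; i++)
		queued[i] = 0;

	for (i = 0; i < npkts; i++) {
		unsigned int idx = cur++ % NSOCK;

		if (accepted[i])	/* stand-in for sk_filter() passing */
			queued[idx]++;
	}
}
```

Feeding it eight packets that alternate accepted and dropped leaves all four accepted packets on the first socket and none on the second, which matches the "all packets are queued on only one socket" scenario.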
Re: [PATCHv1 net-next 0/5] netlink: mmap: kernel panic and some issues
Hi,

Thank you for taking your time and trying to understand, even though
one of the samples is wrong. The correct one is:

rx only mmaped nflog sample:
https://gist.github.com/chamaken/dc0f80c14862e8061c06/raw/365c8a106840368f313a3791958da9be0f5fbed0/rxring-nflog.c

Currently, what happens is that the shared info accesses whatever
memory is there in the mmaped region. So when you already do an
skb_clone() you should already get into trouble right there f.e. when
we test for orphaning frags etc (if at the right offset in the mmap
buffer, the tx_flags member would contain a SKBTX_DEV_ZEROCOPY bit).

And I'm afraid of a skb which does not have shared info can be released
by kfree_skb or not if the next frame is valid. i.e. the current
skb->end, shared info points to the next frame's nm_status, say
NL_MMAP_STATUS_SKIP, and handle it as shared info pointer.

Ken-ichirou, have you observed this issue only in relation to nlmon?
Yes, if taps are indeed the only ones affected, it might probably not
be worth adding that much complexity for a fix itself, but to keep it
simple instead.

I don't know if there are any real users of netlink mmap, but You mean
mmaped skb can not be monitored by nlmon for a while? I'll follow you,
it's tough for me to fix this issue.

It seems you have some other, separate fixes in your series, so you
might want to submit them separately against the net tree, instead?

I'll follow you too. Thank you, I appreciate.

 include/linux/netlink.h  |  4 ++++
 net/netlink/af_netlink.c | 12 +++++++-----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index 9120edb..42cdcd8 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -35,6 +35,10 @@ struct netlink_skb_parms {
 #define NETLINK_CB(skb)		(*(struct netlink_skb_parms*)&((skb)->cb))
 #define NETLINK_CREDS(skb)	(&NETLINK_CB((skb)).creds)
 
+static inline bool netlink_skb_is_mmaped(const struct sk_buff *skb)
+{
+	return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
+}
 
 extern void netlink_table_grab(void);
 extern void netlink_table_ungrab(void);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 67d2104..4307446 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -238,6 +238,13 @@ static void __netlink_deliver_tap(struct sk_buff *skb)
 
 static void netlink_deliver_tap(struct sk_buff *skb)
 {
+	/* Netlink mmaped skbs must not access shared info, and thus
+	 * are not allowed to be cloned. For now, just don't allow
+	 * them to get inspected by taps.
+	 */
+	if (netlink_skb_is_mmaped(skb))
+		return;
+
 	rcu_read_lock();
 
 	if (unlikely(!list_empty(&netlink_tap_all)))
@@ -278,11 +285,6 @@ static void netlink_rcv_wake(struct sock *sk)
 }
 
 #ifdef CONFIG_NETLINK_MMAP
-static bool netlink_skb_is_mmaped(const struct sk_buff *skb)
-{
-	return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
-}
-
 static bool netlink_rx_is_mmaped(struct sock *sk)
 {
 	return nlk_sk(sk)->rx_ring.pg_vec != NULL;
-- 
1.9.3
Re: [PATCH net-next 04/11] udp: Handle VRF device in sendmsg
On Fri, Aug 14, 2015 at 10:58 AM, Shrijeet Mukherjee <s...@cumulusnetworks.com> wrote:
> On Fri, Aug 14, 2015 at 9:27 AM, Tom Herbert <t...@herbertland.com> wrote:
>> On Thu, Aug 13, 2015 at 1:59 PM, David Ahern <d...@cumulusnetworks.com> wrote:
>>> For unconnected UDP sockets using a VRF device lookup source address
>>> based on VRF table. This allows the UDP header to be properly setup
>>> before showing up at the VRF device via the dst.
>>>
>>> Signed-off-by: Shrijeet Mukherjee <s...@cumulusnetworks.com>
>>> Signed-off-by: David Ahern <d...@cumulusnetworks.com>
>>> ---
>>>  net/ipv4/udp.c | 22 +++++++++++++++++++++-
>>>  1 file changed, 21 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>>> index 83aa604f9273..7af5052e3b1f 100644
>>> --- a/net/ipv4/udp.c
>>> +++ b/net/ipv4/udp.c
>>> @@ -1013,11 +1013,31 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>>>
>>>  	if (!rt) {
>>>  		struct net *net = sock_net(sk);
>>> +		__u8 flow_flags = inet_sk_flowi_flags(sk);
>>>
>>>  		fl4 = &fl4_stack;
>>> +
>>> +		/* unconnected socket. If output device is enslaved to a VRF
>>> +		 * device lookup source address from VRF table. This mimics
>>> +		 * behavior of ip_route_connect{_init}.
>>> +		 */
>>> +		if (netif_index_is_vrf(net, ipc.oif)) {
>>> +			flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
>>> +					   RT_SCOPE_UNIVERSE, sk->sk_protocol,
>>> +					   (flow_flags | FLOWI_FLAG_VRFSRC),
>>> +					   faddr, saddr, dport,
>>> +					   inet->inet_sport);
>>> +
>>> +			rt = ip_route_output_flow(net, fl4, sk);
>>> +			if (!IS_ERR(rt)) {
>>> +				saddr = fl4->saddr;
>>> +				ip_rt_put(rt);
>>> +			}
>>> +		}
>>> +
>>
>> I really don't like this. It seems like you're putting device specific
>> code in a critical L4 data path function. Also, does ipv6/udp.c need
>> to be updated similarly? Why can't VRF be abstracted out in routing
>> lookups?
>
> Tom,
>
> Did not have a better way to make this work. The point of the VRF
> driver was to be completely transparent for anything other than
> routing lookups. Modifying the header in the driver means that
> fragmentation etc will have trouble. So this code really just makes
> the saddr evaluation before we enter the udp code path and is similar
> to what the tcp side does.
>
> If you have a suggestion on a different and hopefully consistent way
> to do with tcp and ipv6, that would be preferable.

At least collect this code into one (static inline) function to better
minimize the code churn in udp. If this is general functionality that
can be used by other drivers then abstract it out as such. Also, if the
VRF driver is not configured it seems like this code should be compiled
out. As it stands now "if (netif_index_is_vrf(net, ipc.oif)) {" adds a
conditional to every call of udp_sendmsg whether or not we are using
VRF :-(.

Tom
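The shape of the request, one static inline helper that collapses to a constant when the driver is not built, might look like the sketch below. It is not the code that was eventually merged: CONFIG_NET_VRF_SKETCH, sketch_oif_is_vrf() and sketch_pick_saddr() are hypothetical names, and the integer "addresses" stand in for the flow lookup in the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical wrapper. CONFIG_NET_VRF_SKETCH stands in for whatever
 * Kconfig symbol gates the VRF driver. */
#ifdef CONFIG_NET_VRF_SKETCH
bool sketch_oif_is_vrf(int oif);	/* real lookup would live with the driver */
#else
static inline bool sketch_oif_is_vrf(int oif)
{
	(void)oif;
	return false;	/* constant, so the caller's branch can fold away */
}
#endif

/* Caller side: one helper call instead of open-coded VRF logic. */
static inline int sketch_pick_saddr(int oif, int saddr, int vrf_saddr)
{
	if (sketch_oif_is_vrf(oif))
		return vrf_saddr;
	return saddr;
}
```

With the symbol undefined, the compiler sees a branch on a constant false, so the send path pays nothing for a feature that is not configured.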
RE: [PATCH net] igb: Fix oops caused by missing queue pairing
From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Shota Suzuki
Sent: Tuesday, June 30, 2015 5:26 PM
To: Kirsher, Jeffrey T; Brandeburg, Jesse; Nelson, Shannon; Wyborny, Carolyn; Skidmore, Donald C; Vick, Matthew; Ronciak, John; Williams, Mitch A; intel-wired-...@lists.osuosl.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org
Cc: Shota Suzuki
Subject: [PATCH net] igb: Fix oops caused by missing queue pairing

When initializing the igb driver (e.g. 82576, I350), IGB_FLAG_QUEUE_PAIRS is set if adapter->rss_queues exceeds half of max_rss_queues in igb_init_queue_configuration(). On the other hand, IGB_FLAG_QUEUE_PAIRS is not set even if the number of queues exceeds half of max_combined in igb_set_channels() when changing the number of queues by "ethtool -L".

In this case, if numvecs is larger than MAX_MSIX_ENTRIES (10), the size of adapter->msix_entries[], an overflow can occur in igb_set_interrupt_capability(), which in turn leads to an oops.

Fix this problem as follows:
- When changing the number of queues by "ethtool -L", set IGB_FLAG_QUEUE_PAIRS in the same way as when initializing the igb driver.
- When increasing the size of q_vector, reallocate it appropriately. (With IGB_FLAG_QUEUE_PAIRS set, the size of q_vector gets larger.)

Another possible way to fix this problem is to cap the number of queues at its initial value, which is the number of CPUs online at initialization. But this is not optimal because we cannot increase the number of queues when another CPU comes online.

Note that before commit cd14ef54d25b (igb: Change to use statically allocated array for MSIx entries), this problem did not cause an oops but just reduced the number of queues to 1, because igb_set_interrupt_capability() entered msi_only mode.

Fixes: 907b7835799f (igb: Add ethtool support to configure number of channels)
Signed-off-by: Shota Suzuki suzuki_shota...@lab.ntt.co.jp
---
Although we might be able to additionally unset IGB_FLAG_QUEUE_PAIRS when it is not needed, this patch doesn't change existing behaviour because such a change is not a bug fix.

 drivers/net/ethernet/intel/igb/igb.h         |  1 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c |  5 -
 drivers/net/ethernet/intel/igb/igb_main.c    | 16 ++--
 3 files changed, 19 insertions(+), 3 deletions(-)

Tested-by: Aaron Brown aaron.f.br...@intel.com
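To make the arithmetic in the description concrete, here is a small standalone model of the vector count. It is a sketch under stated assumptions, not the driver code: all demo_* names are hypothetical, the accounting (one vector per Rx queue, one more per Tx queue when queues are not paired, plus one for link/other interrupts) is assumed, and the maximum of 8 queues used in the example is an assumed value. Only the table size of 10 and the "exceeds half" pairing rule come from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Size of the static MSI-X entry array, as stated in the description. */
#define DEMO_MAX_MSIX_ENTRIES 10

/* Pairing rule from the description: pair Rx/Tx queues once the queue
 * count exceeds half of the maximum.
 */
static bool demo_needs_queue_pairs(unsigned int rss_queues,
				   unsigned int max_rss_queues)
{
	return rss_queues > max_rss_queues / 2;
}

/* Assumed accounting: one vector per Rx queue, one more per Tx queue when
 * the queues are not paired, plus one vector for link/other interrupts.
 */
static unsigned int demo_num_vectors(unsigned int rss_queues, bool paired)
{
	unsigned int numvecs;

	assert(rss_queues > 0);

	numvecs = rss_queues;		/* Rx queues */
	if (!paired)
		numvecs += rss_queues;	/* separate Tx vectors */
	return numvecs + 1;		/* link/other interrupt */
}

/* True when the requested vectors fit in the fixed-size table. */
static bool demo_fits_msix_table(unsigned int numvecs)
{
	return numvecs <= DEMO_MAX_MSIX_ENTRIES;
}
```

Under these assumptions, 5 queues without pairing need 5 + 5 + 1 = 11 vectors, one more than the table holds, while the same 5 queues with pairing need 5 + 1 = 6. With an assumed maximum of 8, demo_needs_queue_pairs(5, 8) is true, which is the case the ethtool path missed.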