Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users
On Fri, Jul 10, 2015 at 06:21:14PM -0700, David Miller wrote:
> From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
> Date: Thu, 9 Jul 2015 11:15:19 -0300
>
> > SCTP has this operation to peel off associations from a given socket
> > and create a new socket using this association. We currently have two
> > ways to use this operation:
> > - via getsockopt(), on which it will also create and return a file
> >   descriptor for this new socket
> > - via sctp_do_peeloff(), which is for kernel only
> >
> > The caveat with using sctp_do_peeloff() directly is that it creates a
> > dependency on the SCTP module, while all other operations are handled
> > via the kernel_{socket,sendmsg,getsockopt...}() interface. This causes
> > the kernel to load the SCTP module even when it's not directly used.
> >
> > This patch then updates SCTP_SOCKOPT_PEELOFF so that for kernel users
> > of this protocol it will not allocate a file descriptor but instead
> > just return the socket pointer directly.
> >
> > If called by a user application it will work as before.
> >
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
>
> I do not like this at all.
>
> Socket option implementations should not change their behavior or what
> datastructures they consume or return just because the socket happens
> to be a kernel socket.
>
But in this case it's necessary, as the kernel here can't allocate an
fd, due to serious leakage (see commit
2f2d76cc3e938389feee671b46252dde6880b3b7). Initially Marcelo had created
duplicate code paths, one to return an fd, one to return a file struct.
If you would rather go in that direction, I'm sure he can propose it
again, but that seems less correct to me than this solution.

> I'm not applying this series, sorry.
>
> Also, your patch series lacked an initial PATCH 0/N posting, so you
> could at least spend the time to discuss this patch series at a high
> level and explain your overall motivations.
>
That was in the initial posting. It should have been reposted, but if
you're interested:
http://marc.info/?l=linux-sctp&m=143449456219518&w=2

Regards
Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[net-next 02/16] i40e/i40evf: Add stats to track FD ATR and SB dynamic enable state
From: Anjali Singhai Jain <anjali.sing...@intel.com>

Since the driver can dynamically enable/disable FD ATR and SB features,
these stats help keep track of the current state and along with
fd_flush count provide a means to debug what could be going on with the
flow director filters.

This will take away the need for being verbose in our debug logs with
respect to FD.

Change-ID: I29224f750fe6602391043655d18996570720377d
Signed-off-by: Anjali Singhai Jain <anjali.sing...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 12 ++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_type.h    |  2 ++
 drivers/net/ethernet/intel/i40evf/i40e_type.h  |  2 ++
 4 files changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 9a68c65..0b68f61 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -148,7 +148,9 @@ static struct i40e_stats i40e_gstrings_stats[] = {
 	I40E_PF_STAT("fdir_flush_cnt", fd_flush_cnt),
 	I40E_PF_STAT("fdir_atr_match", stats.fd_atr_match),
 	I40E_PF_STAT("fdir_atr_tunnel_match", stats.fd_atr_tunnel_match),
+	I40E_PF_STAT("fdir_atr_status", stats.fd_atr_status),
 	I40E_PF_STAT("fdir_sb_match", stats.fd_sb_match),
+	I40E_PF_STAT("fdir_sb_status", stats.fd_sb_status),
 
 	/* LPI stats */
 	I40E_PF_STAT("tx_lpi_status", stats.tx_lpi_status),
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b44eb35..b5fc654 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1123,6 +1123,18 @@ static void i40e_update_pf_stats(struct i40e_pf *pf)
 			   pf->stat_offsets_loaded,
 			   &osd->rx_lpi_count, &nsd->rx_lpi_count);
 
+	if (pf->flags & I40E_FLAG_FD_SB_ENABLED &&
+	    !(pf->auto_disable_flags & I40E_FLAG_FD_SB_ENABLED))
+		nsd->fd_sb_status = true;
+	else
+		nsd->fd_sb_status = false;
+
+	if (pf->flags & I40E_FLAG_FD_ATR_ENABLED &&
+	    !(pf->auto_disable_flags & I40E_FLAG_FD_ATR_ENABLED))
+		nsd->fd_atr_status = true;
+	else
+		nsd->fd_atr_status = false;
+
 	pf->stat_offsets_loaded = true;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 9a5a75b..350c5ee 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -1134,6 +1134,8 @@ struct i40e_hw_port_stats {
 	u64 fd_atr_match;
 	u64 fd_sb_match;
 	u64 fd_atr_tunnel_match;
+	u32 fd_atr_status;
+	u32 fd_sb_status;
 	/* EEE LPI */
 	u32 tx_lpi_status;
 	u32 rx_lpi_status;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index c463ec4..068813d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -1109,6 +1109,8 @@ struct i40e_hw_port_stats {
 	u64 fd_atr_match;
 	u64 fd_sb_match;
 	u64 fd_atr_tunnel_match;
+	u32 fd_atr_status;
+	u32 fd_sb_status;
 	/* EEE LPI */
 	u32 tx_lpi_status;
 	u32 rx_lpi_status;
--
2.4.3
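The status logic in the patch above boils down to "configured on and not auto-disabled". A minimal user-space sketch of that test follows; the DEMO_* flag bits and demo_fd_status() are hypothetical names for illustration and do not match the driver's I40E_FLAG_* values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the driver's I40E_FLAG_* values differ. */
#define DEMO_FLAG_FD_SB_ENABLED		(1u << 0)
#define DEMO_FLAG_FD_ATR_ENABLED	(1u << 1)

/*
 * A feature counts as active only if it is configured on in 'flags'
 * and has not been turned off in 'auto_disable_flags'.
 * Returns 1 when active, 0 otherwise.
 */
static uint32_t demo_fd_status(uint32_t flags, uint32_t auto_disable_flags,
			       uint32_t feature)
{
	assert(feature != 0);
	return (flags & feature) && !(auto_disable_flags & feature);
}
```

Calling demo_fd_status() once per feature gives the two values the patch stores in fd_sb_status and fd_atr_status.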
[net-next 04/16] i40e/i40evf: improve Tx performance with a small tweak
From: Jesse Brandeburg <jesse.brandeb...@intel.com>

Add a prefetch for the next Tx descriptor to be used when we know there
are more coming.

Change-ID: Ibb9acab11d508eec2db7da795df74debc16eeacb
Signed-off-by: Jesse Brandeburg <jesse.brandeb...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 2 ++
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 9a4f2bc..1fe230d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2616,6 +2616,8 @@ static inline void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	    netif_xmit_stopped(netdev_get_tx_queue(tx_ring->netdev,
 						   tx_ring->queue_index)))
 		writel(i, tx_ring->tail);
+	else
+		prefetchw(tx_desc + 1);
 
 	return;
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 395f32f..0f0e185 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1841,6 +1841,8 @@ static inline void i40evf_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	    netif_xmit_stopped(netdev_get_tx_queue(tx_ring->netdev,
 						   tx_ring->queue_index)))
 		writel(i, tx_ring->tail);
+	else
+		prefetchw(tx_desc + 1);
 
 	return;
 
--
2.4.3
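The idea in the patch above (ring the doorbell when nothing else is queued, otherwise warm the cache for the next descriptor) can be sketched in user space. This is only an illustration: struct demo_ring, struct demo_tx_desc and demo_tx_map() are hypothetical, the 'tail' field stands in for the doorbell register, and __builtin_prefetch is the GCC/Clang builtin used here in place of the kernel's prefetchw().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware Tx descriptor. */
struct demo_tx_desc {
	uint64_t buffer_addr;
	uint64_t cmd_type_offset_bsz;
};

/* Hypothetical ring; 'tail' models the doorbell register. */
struct demo_ring {
	struct demo_tx_desc *desc;
	size_t count;
	size_t next_to_use;
	size_t tail;
};

/*
 * Fill one descriptor. If more packets are pending, skip the doorbell
 * write and hint the CPU to fetch the next descriptor for writing.
 * Returns 1 when the doorbell was rung, 0 otherwise.
 */
static int demo_tx_map(struct demo_ring *ring, uint64_t addr, int more_coming)
{
	size_t i;

	assert(ring && ring->count > 0);
	i = ring->next_to_use;
	ring->desc[i].buffer_addr = addr;
	ring->desc[i].cmd_type_offset_bsz = 1;
	i = (i + 1) % ring->count;
	ring->next_to_use = i;

	if (!more_coming) {
		ring->tail = i;	/* stands in for writel(i, tx_ring->tail) */
		return 1;
	}
	/* second argument 1 = prefetch for write, like prefetchw() */
	__builtin_prefetch(&ring->desc[i], 1);
	return 0;
}
```

The sketch prefetches the wrapped next index rather than a raw `desc + 1`; that is a choice made for the illustration, not a statement about the driver.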
[net-next 01/16] i40e: Implement ndo_features_check()
From: Joe Stringer <joestrin...@nicira.com>

i40e supports UDP tunnel headers up to 80 bytes in length, so this adds
a check to ensure that it doesn't try to offload packets that exceed
that.

Signed-off-by: Joe Stringer <joestrin...@nicira.com>
Signed-off-by: Jesse Gross <je...@nicira.com>
Acked-by: Jesse Brandeburg <jesse.brandeb...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 48a52b3..b44eb35 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8099,6 +8099,25 @@ static int i40e_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 }
 #endif /* HAVE_BRIDGE_ATTRIBS */
 
+#define I40E_MAX_TUNNEL_HDR_LEN 80
+/**
+ * i40e_features_check - Validate encapsulated packet conforms to limits
+ * @skb: skb buff
+ * @netdev: This physical port's netdev
+ * @features: Offload features that the stack believes apply
+ **/
+static netdev_features_t i40e_features_check(struct sk_buff *skb,
+					     struct net_device *dev,
+					     netdev_features_t features)
+{
+	if (skb->encapsulation &&
+	    (skb_inner_mac_header(skb) - skb_transport_header(skb) >
+	     I40E_MAX_TUNNEL_HDR_LEN))
+		return features & ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);
+
+	return features;
+}
+
 static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_open		= i40e_open,
 	.ndo_stop		= i40e_close,
@@ -8133,6 +8152,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 #endif
 	.ndo_get_phys_port_id	= i40e_get_phys_port_id,
 	.ndo_fdb_add		= i40e_ndo_fdb_add,
+	.ndo_features_check	= i40e_features_check,
 #ifdef HAVE_BRIDGE_ATTRIBS
 	.ndo_bridge_getlink	= i40e_ndo_bridge_getlink,
 	.ndo_bridge_setlink	= i40e_ndo_bridge_setlink,
--
2.4.3
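The check in the patch above measures the distance from the outer transport header to the inner MAC header and, if it exceeds 80 bytes, strips the checksum and GSO offload bits. A user-space sketch under stated assumptions: struct demo_pkt replaces struct sk_buff, the header positions are plain byte offsets, and the DEMO_F_* bits are hypothetical rather than the kernel's NETIF_F_* values.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_TUNNEL_HDR_LEN 80	/* limit taken from the patch */

/* Hypothetical feature bits, not the kernel's NETIF_F_* values. */
#define DEMO_F_CSUM	(1u << 0)
#define DEMO_F_GSO	(1u << 1)
#define DEMO_F_SG	(1u << 2)

/* Hypothetical packet summary standing in for struct sk_buff. */
struct demo_pkt {
	int encapsulation;	/* nonzero for tunnelled packets */
	uint32_t transport_off;	/* offset of the outer transport header */
	uint32_t inner_mac_off;	/* offset of the inner MAC header */
};

/* Return 'features' with checksum and GSO cleared for oversized tunnels. */
static uint32_t demo_features_check(const struct demo_pkt *pkt,
				    uint32_t features)
{
	assert(pkt);
	if (pkt->encapsulation &&
	    pkt->inner_mac_off - pkt->transport_off > DEMO_MAX_TUNNEL_HDR_LEN)
		return features & ~(DEMO_F_CSUM | DEMO_F_GSO);
	return features;
}
```

A tunnel header of exactly 80 bytes keeps its offloads, since the comparison is strictly greater-than, as in the patch.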
[net-next 05/16] i40evf: Allow for an abundance of vectors
From: Mitch Williams <mitch.a.willi...@intel.com>

The driver currently only maps TX and RX queues to a single MSI-X vector
per queue pair if there are exactly enough vectors for this.
Unfortunately, if we have too many vectors it will fail and allocate
queues to vectors in a suboptimal manner. Change the condition check to
allow for excess vectors. In this case, the extras just won't be used.

Change-ID: I23e1e2955c64739c86612db88a25583e6a7e0b17
Signed-off-by: Mitch Williams <mitch.a.willi...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 4ab4ebb..94eff4a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -406,7 +406,7 @@ static int i40evf_map_rings_to_vectors(struct i40evf_adapter *adapter)
 	/* The ideal configuration...
 	 * We have enough vectors to map one per queue.
 	 */
-	if (q_vectors == (rxr_remaining * 2)) {
+	if (q_vectors >= (rxr_remaining * 2)) {
 		for (; rxr_idx < rxr_remaining; v_start++, rxr_idx++)
 			i40evf_map_vector_to_rxq(adapter, v_start, rxr_idx);
 
--
2.4.3
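The one-character change above is easy to misread, so here is a small sketch of the two conditions side by side. demo_one_vector_per_ring() is a hypothetical helper written for this note; it only mirrors the comparison, not the mapping loop.

```c
#include <assert.h>

/*
 * Returns 1 when every Rx and Tx ring can get its own vector.
 * 'strict' selects the old "exactly enough" test; otherwise the
 * patched "at least enough" test is used.
 */
static int demo_one_vector_per_ring(int q_vectors, int ring_pairs, int strict)
{
	assert(q_vectors >= 0 && ring_pairs >= 0);
	if (strict)
		return q_vectors == ring_pairs * 2;
	return q_vectors >= ring_pairs * 2;
}
```

With 10 vectors and 4 ring pairs, the strict test fails and the driver would fall back to the shared mapping, while the relaxed test succeeds and leaves two vectors idle.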
Re: [PATCH net-next v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface.
On 13 July 2015 at 15:32, YOSHIFUJI Hideaki <hideaki.yoshif...@miraclelinux.com> wrote:
> Hi,
>
> Erik Kline wrote:
>> Hmm, when I run a UML linux with this patch (which, I'm ashamed to
>> say, I failed to do before) I get these kinds of errors:
>>
>> unregister_netdevice: waiting for TAPdevice to become free. Usage count = 1
>> unregister_netdevice: waiting for TAPdevice to become free. Usage count = 1
>>
>> Perhaps they're unrelated... I'm still investigating.
>
> Would you test attached patch please?

That does look logically correct, so +1 to it regardless, but it does
not seem to have fixed the issue I'm seeing.

I still haven't produced the smallest possible demo test program.
Re: [PATCH] netlink: enable skb header refcounting before sending first broadcast
On 13.07.2015 10:23, Herbert Xu wrote:
> On Fri, Jul 10, 2015 at 02:51:41PM +0300, Konstantin Khlebnikov wrote:
>> This fixes race between non-atomic updates of adjacent bit-fields:
>> skb->cloned could be lost because netlink broadcast clones skb after
>> sending it to the first listener who sets skb->peeked at the same skb.
>> As a result atomic refcounting of skb header stays disabled and
>> skb_release_data() frees it twice. Race leads to double-free in
>> kmalloc-xxx.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebni...@yandex-team.ru>
>> Fixes: b19372273164 ("net: reorganize sk_buff for faster __copy_skb_header()")
>> ---
>>  net/netlink/af_netlink.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
>> index dea925388a5b..921e0d8dfe3a 100644
>> --- a/net/netlink/af_netlink.c
>> +++ b/net/netlink/af_netlink.c
>> @@ -2028,6 +2028,12 @@ int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 portid
>>  	info.tx_filter = filter;
>>  	info.tx_data = filter_data;
>>
>> +	/* Enable atomic refcounting in skb_release_data() before first send:
>> +	 * non-atomic set of that bit-field in __skb_clone() could race with
>> +	 * __skb_recv_datagram() which touches the same set of bit-fields.
>> +	 */
>> +	skb->cloned = 1;
>> +
>>  	/* While we sleep in clone, do not allow to change socket list */
>>
>>  	netlink_lock_table();
>
> Your effort in finding this bug is wonderful. However I think
> the fix is a bit dirty.
>
> The real issue here is that the recv path no longer handles shared
> skbs. So either we need to fix the recv path to not touch skbs
> without cloning them, or we need to get rid of the use of shared
> skbs in netlink.

I don't think that recv path should care about shared skb -- skb can be
delivered into only one socket anyway.

Less dirty fix for that: do not send original skb.
That adds one extra clone but makes code much cleaner.

--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1957,17 +1957,16 @@ static void do_one_broadcast(struct sock *sk,
 	}
 
 	sock_hold(sk);
-	if (p->skb2 == NULL) {
-		if (skb_shared(p->skb)) {
-			p->skb2 = skb_clone(p->skb, p->allocation);
-		} else {
-			p->skb2 = skb_get(p->skb);
-			/*
-			 * skb ownership may have been set when
-			 * delivered to a previous socket.
-			 */
-			skb_orphan(p->skb2);
-		}
+	if (p->skb2 == NULL || skb_shared(p->skb2)) {
+		kfree_skb(p->skb2);
+		p->skb2 = skb_clone(p->skb, p->allocation);
+	} else {
+		skb_get(p->skb2);
+		/*
+		 * skb ownership may have been set when
+		 * delivered to a previous socket.
+		 */
+		skb_orphan(p->skb2);
 	}
 	if (p->skb2 == NULL) {
 		netlink_overrun(sk);
@@ -1997,7 +1996,6 @@ static void do_one_broadcast(struct sock *sk,
 	} else {
 		p->congested |= val;
 		p->delivered = 1;
-		p->skb2 = NULL;
 	}
 out:
 	sock_put(sk);

> In fact it looks I introduced the bug way back in
>
> commit a59322be07c964e916d15be3df473fb7ba20c41e
> Author: Herbert Xu <herb...@gondor.apana.org.au>
> Date:   Wed Dec 5 01:53:40 2007 -0800
>
>     [UDP]: Only increment counter on first peek/recv
>
> I will try to mend this error :)
>
> Cheers,

--
Konstantin
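The race described in this thread comes from two bit-fields sharing one storage unit: a plain bit-field store is a load, modify, store of the whole unit, so two writers can overwrite each other. The sketch below replays that interleaving deterministically in a single thread. It is a model only; the DEMO_BIT_* positions and helper names are hypothetical and say nothing about the real sk_buff layout.

```c
#include <assert.h>

#define DEMO_BIT_CLONED	0	/* hypothetical bit positions */
#define DEMO_BIT_PEEKED	1

/*
 * Model a non-atomic bit-field store: take the byte as the writer
 * loaded it, set one bit, and return the byte that gets written back.
 */
static unsigned char demo_store_bit(unsigned char loaded, unsigned int bit)
{
	assert(bit < 8);
	return (unsigned char)(loaded | (1u << bit));
}

/* Both writers load before either stores: the first update is lost. */
static unsigned char demo_lost_update(void)
{
	unsigned char byte = 0;
	unsigned char seen_by_a = byte;	/* CPU A loads */
	unsigned char seen_by_b = byte;	/* CPU B loads */

	byte = demo_store_bit(seen_by_a, DEMO_BIT_CLONED);
	byte = demo_store_bit(seen_by_b, DEMO_BIT_PEEKED);	/* drops A's bit */
	return byte;
}

/* A's store completes before B loads: both bits survive. */
static unsigned char demo_ordered_update(void)
{
	unsigned char byte = 0;

	byte = demo_store_bit(byte, DEMO_BIT_CLONED);
	byte = demo_store_bit(byte, DEMO_BIT_PEEKED);
	return byte;
}
```

The ordered variant corresponds to the first patch's approach of setting the cloned bit before the skb is handed to any listener; the second proposal avoids the shared object altogether.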
RE: [PATCH] bnx2x: Update to FW version 7.12.30
The new FW will allow us to utilize some new features in our driver,
mainly adding vlan filtering offload and vxlan offload support.

In addition, this fixes several issues:
1. Packets from a VF with pvid configured which were sent with a
   different vlan were transmitted instead of being discarded.
2. FCoE traffic might not recover after a failure while there's traffic
   to another function.

Signed-off-by: Yuval Mintz <yuval.mi...@qlogic.com>

Hi, any news about this one?

Thanks,
Yuval

Any updates? I sent this 3 weeks ago and haven't seen any reply.
Apparently the destination E-mail has changed and I was unaware.

Is anyone here? ;-)
[net-next 03/16] i40e/i40evf: Update Flex-10 related device/function capabilities
From: Pawel Orlowski <pawel.orlow...@intel.com>

The Flex10 device/function capability has been upgraded to include
information needed to support Flex-10 configurations. This patch adds
new fields to the i40e_hw_capabilities structure and updates
i40e_parse_discover_capabilities functions to extract them from the AQ
response. Naming convention has changed to use flex10 mode instead of
existing mfp_mode_1.

Change-ID: I305dd66985a30293acb3fb14fa43ca6b79ea
Signed-off-by: Pawel Orlowski <pawel.orlow...@intel.com>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodun...@intel.com>
Signed-off-by: Shannon Nelson <shannon.nel...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_common.c | 24 +++++++++++++++++++-----
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h   | 12 +++++++++++-
 drivers/net/ethernet/intel/i40evf/i40e_type.h | 12 +++++++++++-
 4 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 0bae22d..0703222 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -2391,7 +2391,7 @@ i40e_aq_erase_nvm_exit:
 #define I40E_DEV_FUNC_CAP_MSIX_VF	0x44
 #define I40E_DEV_FUNC_CAP_FLOW_DIRECTOR	0x45
 #define I40E_DEV_FUNC_CAP_IEEE_1588	0x46
-#define I40E_DEV_FUNC_CAP_MFP_MODE_1	0xF1
+#define I40E_DEV_FUNC_CAP_FLEX10	0xF1
 #define I40E_DEV_FUNC_CAP_CEM		0xF2
 #define I40E_DEV_FUNC_CAP_IWARP		0x51
 #define I40E_DEV_FUNC_CAP_LED		0x61
@@ -2416,6 +2416,7 @@ static void i40e_parse_discover_capabilities(struct i40e_hw *hw, void *buff,
 	u32 valid_functions, num_functions;
 	u32 number, logical_id, phys_id;
 	struct i40e_hw_capabilities *p;
+	u8 major_rev;
 	u32 i = 0;
 	u16 id;
 
@@ -2433,6 +2434,7 @@ static void i40e_parse_discover_capabilities(struct i40e_hw *hw, void *buff,
 		number = le32_to_cpu(cap->number);
 		logical_id = le32_to_cpu(cap->logical_id);
 		phys_id = le32_to_cpu(cap->phys_id);
+		major_rev = cap->major_rev;
 
 		switch (id) {
 		case I40E_DEV_FUNC_CAP_SWITCH_MODE:
@@ -2507,9 +2509,21 @@ static void i40e_parse_discover_capabilities(struct i40e_hw *hw, void *buff,
 		case I40E_DEV_FUNC_CAP_MSIX_VF:
 			p->num_msix_vectors_vf = number;
 			break;
-		case I40E_DEV_FUNC_CAP_MFP_MODE_1:
-			if (number == 1)
-				p->mfp_mode_1 = true;
+		case I40E_DEV_FUNC_CAP_FLEX10:
+			if (major_rev == 1) {
+				if (number == 1) {
+					p->flex10_enable = true;
+					p->flex10_capable = true;
+				}
+			} else {
+				/* Capability revision >= 2 */
+				if (number & 1)
+					p->flex10_enable = true;
+				if (number & 2)
+					p->flex10_capable = true;
+			}
+			p->flex10_mode = logical_id;
+			p->flex10_status = phys_id;
 			break;
 		case I40E_DEV_FUNC_CAP_CEM:
 			if (number == 1)
@@ -2557,7 +2571,7 @@ static void i40e_parse_discover_capabilities(struct i40e_hw *hw, void *buff,
 	/* Software override ensuring FCoE is disabled if npar or mfp
 	 * mode because it is not supported in these modes.
 	 */
-	if (p->npar_enable || p->mfp_mode_1)
+	if (p->npar_enable || p->flex10_enable)
 		p->fcoe = false;
 
 	/* count the enabled ports (aka the not disabled ports) */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b5fc654..ed6fc52 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7685,7 +7685,7 @@ static int i40e_sw_init(struct i40e_pf *pf)
 	}
 
 	/* MFP mode enabled */
-	if (pf->hw.func_caps.npar_enable || pf->hw.func_caps.mfp_mode_1) {
+	if (pf->hw.func_caps.npar_enable || pf->hw.func_caps.flex10_enable) {
 		pf->flags |= I40E_FLAG_MFP_ENABLED;
 		dev_info(&pf->pdev->dev, "MFP mode Enabled\n");
 		if (i40e_get_npar_bw_setting(pf))
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 350c5ee..220371e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -213,7 +213,17 @@ struct
[net-next 06/16] i40e: ignore duplicate port VLAN requests
From: Mitch Williams <mitch.a.willi...@intel.com>

If user attempts to set a port VLAN on a VF that already has the same
port VLAN configured, the driver will go through a completely
unnecessary flurry of filter removals and filter adds. Just check for
this condition and return success instead of doing a bunch of busywork.

Change-ID: Ia1a9e83e6ed48b3f4658bc20dfc6af0cf525d54a
Signed-off-by: Mitch Williams <mitch.a.willi...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 23f95cd..433e803 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2088,6 +2088,10 @@ int i40e_ndo_set_vf_port_vlan(struct net_device *netdev,
 		goto error_pvid;
 	}
 
+	if (vsi->info.pvid == (vlan_id | (qos << I40E_VLAN_PRIORITY_SHIFT)))
+		/* duplicate request, so just return success */
+		goto error_pvid;
+
 	if (vsi->info.pvid == 0 && i40e_is_vsi_in_vlan(vsi)) {
 		dev_err(&pf->pdev->dev,
 			"VF %d has already configured VLAN filters and the administrator is requesting a port VLAN override.\nPlease unload and reload the VF driver for this change to take effect.\n",
--
2.4.3
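The early-return test above compares the stored port VLAN word against the value the request would produce. A sketch of that comparison follows; DEMO_VLAN_PRIORITY_SHIFT is an assumed value chosen for illustration (the driver's I40E_VLAN_PRIORITY_SHIFT is not shown in the patch), and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shift for the priority bits; the driver's constant may differ. */
#define DEMO_VLAN_PRIORITY_SHIFT 12

/* Combine a 12-bit VLAN ID and a 3-bit priority into one 16-bit word. */
static uint16_t demo_pack_pvid(uint16_t vlan_id, uint8_t qos)
{
	assert(vlan_id < 4096 && qos < 8);
	return (uint16_t)(vlan_id | (qos << DEMO_VLAN_PRIORITY_SHIFT));
}

/* Returns 1 if the request would change nothing, so the caller can stop. */
static int demo_is_duplicate_pvid(uint16_t current_pvid, uint16_t vlan_id,
				  uint8_t qos)
{
	return current_pvid == demo_pack_pvid(vlan_id, qos);
}
```

Note that a change in priority alone is not a duplicate, because the priority bits are part of the compared word.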
[net-next 12/16] i40evf: don't delete all the filters
From: Mitch Williams <mitch.a.willi...@intel.com>

Due to an inverted conditional, the driver was marking all of its MAC
filters for deletion every time set_rx_mode was called. Depending upon
the timing of the calls to set_rx_mode and the processing of the admin
queue, the driver would (accidentally) end up with a varying number of
functional filters.

Correct this logic so that MAC filters are added and removed correctly.
Add a check for the driver's hardware MAC address so that this filter
doesn't get removed incorrectly.

Change-ID: Ib3e7c4a5b53df6835f164fe44cb778cb71f8aff8
Signed-off-by: Mitch Williams <mitch.a.willi...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 94eff4a..07f6052 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -892,8 +892,10 @@ static void i40evf_set_rx_mode(struct net_device *netdev)
 					break;
 				}
 			}
+			if (ether_addr_equal(f->macaddr, adapter->hw.mac.addr))
+				found = true;
 		}
-		if (found) {
+		if (!found) {
 			f->remove = true;
 			adapter->aq_required |= I40EVF_FLAG_AQ_DEL_MAC_FILTER;
 		}
--
2.4.3
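The corrected logic above can be sketched as a stand-alone filter sync: a filter is kept if the stack still wants it or if it is the device's own address, and marked for removal otherwise. Everything here is hypothetical (struct demo_filter, demo_sync_filters(), the flat array of wanted addresses); it illustrates the conditional, not the driver's list handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical filter entry; 'remove' marks it for deletion. */
struct demo_filter {
	unsigned char mac[DEMO_ETH_ALEN];
	int remove;
};

/*
 * Mark for removal every filter that is neither in the wanted list
 * ('nwanted' addresses stored back to back) nor equal to the device's
 * own hardware address. Returns the number of filters marked.
 */
static size_t demo_sync_filters(struct demo_filter *filters, size_t nfilters,
				const unsigned char *wanted, size_t nwanted,
				const unsigned char *hw_addr)
{
	size_t i, j, marked = 0;

	assert(filters && hw_addr);
	for (i = 0; i < nfilters; i++) {
		int found = 0;

		for (j = 0; j < nwanted; j++) {
			if (!memcmp(wanted + j * DEMO_ETH_ALEN,
				    filters[i].mac, DEMO_ETH_ALEN)) {
				found = 1;
				break;
			}
		}
		if (!memcmp(filters[i].mac, hw_addr, DEMO_ETH_ALEN))
			found = 1;	/* never drop the device's own address */
		if (!found) {		/* the buggy code tested 'found' here */
			filters[i].remove = 1;
			marked++;
		}
	}
	return marked;
}
```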
[net-next 00/16][pull request] Intel Wired LAN Driver Updates 2015-07-13
This series contains updates to i40e and i40evf only.

Joe Stringer and Jesse Gross add a ndo_features_check function to ensure
that the i40e driver does not try to offload packets whose UDP tunnel
headers exceed 80 bytes in length.

Anjali adds additional stats to track flow director ATR and SB current
state and flow director flush count, which will reduce the need for
verbose debug logs with respect to flow director. Also refines an error
message to avoid confusion, so that it indicates what may have really
happened when the init_shared_code() call possibly fails.

Pawel adds new fields to the capabilities structures to handle Flex-10
device/function capabilities, which is needed to support Flex-10
configs.

Jesse improves the transmit performance by adding a prefetch for the
next transmit descriptor to be used when we know there are more coming.

Mitch modifies the i40evf driver to handle/allow an abundance of
vectors. Currently the driver only maps transmit and receive queues to
a single MSI-X vector per queue if there are exactly enough vectors for
this, but if we have too many vectors, it will fail and allocate queues
to vectors in a suboptimal manner. So change the condition check to
allow for an excess number of vectors; the extras simply won't be used.
Also update the driver to just return success if the user attempts to
set a port VLAN on a VF that already has the same port VLAN configured,
instead of going through unnecessary filter removals and adds. Fix the
MAC filters for VFs, which were being programmed with 0 for the VLAN
value when there was no VLAN assigned. Instead, we must use -1 to
indicate that no VLAN is in use. Fix the VF disable code, which was not
properly cleaning up the VF and would leave the VF in an indeterminate
state, so fix this by notifying the VF and then calling the normal VF
reset routine. Fix the logic in the driver so that MAC filters are
added and removed correctly, and add a check for the driver's hardware
MAC address so that this filter does not get removed incorrectly.

Carolyn removes incorrect #ifdef's which should not have been added in
the first place and, with the #ifdef's removed, makes the necessary
changes in the driver to resolve compile errors.

Greg updates the admin queue command header defines.

The following are changes since commit 14fe22e334623e451b5592193415c644005461ea:
  Revert "ipv4: use skb coalescing in defragmentation"
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue master

Anjali Singhai Jain (2):
  i40e/i40evf: Add stats to track FD ATR and SB dynamic enable state
  i40e: Refine an error message to avoid confusion

Carolyn Wyborny (1):
  i40e: Remove incorrect #ifdef's

Catherine Sullivan (1):
  i40e/i40evf: Bump version to 1.3.6 for i40e and 1.3.2 for i40evf

Faisal Latif (1):
  i40e/i40evf: Add support for pre-allocated pages for PD

Greg Rose (1):
  i40e/i40evf: Update the admin queue command header

Jesse Brandeburg (1):
  i40e/i40evf: improve Tx performance with a small tweak

Joe Stringer (1):
  i40e: Implement ndo_features_check()

Mitch Williams (7):
  i40evf: Allow for an abundance of vectors
  i40e: ignore duplicate port VLAN requests
  i40e: correctly program filters for VFs
  i40e: do a proper reset when disabling a VF
  i40e: un-disable VF after reset
  i40evf: don't delete all the filters
  i40evf: add MAC address filter in open, not init

Pawel Orlowski (1):
  i40e/i40evf: Update Flex-10 related device/function capabilities

 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  | 24 +-
 drivers/net/ethernet/intel/i40e/i40e_common.c      | 24 +++---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |  2 +
 drivers/net/ethernet/intel/i40e/i40e_hmc.c         | 30 -
 drivers/net/ethernet/intel/i40e/i40e_hmc.h         |  4 +-
 drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c     |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c        | 51 --
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  2 +
 drivers/net/ethernet/intel/i40e/i40e_type.h        | 14 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 23 +-
 .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h    | 18 +++-
 drivers/net/ethernet/intel/i40evf/i40e_hmc.h       |  4 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c      |  2 +
 drivers/net/ethernet/intel/i40evf/i40e_type.h      | 14 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    | 20 +++--
 15 files changed, 155 insertions(+), 79 deletions(-)

--
2.4.3
[net-next 10/16] i40e: do a proper reset when disabling a VF
From: Mitch Williams <mitch.a.willi...@intel.com>

The VF disable code was just whanging on the reset bit without properly
cleaning up the VF, which would leave the VF in an indeterminate state
from which it could not recover.

Fix this by notifying the VF and then by calling the normal VF reset
routine.

Change-ID: I862b9dfa919368773cbdc212b805b520db2f7430
Signed-off-by: Mitch Williams <mitch.a.willi...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 4070a22..55b19f5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -160,13 +160,8 @@ void i40e_vc_notify_vf_reset(struct i40e_vf *vf)
  **/
 static inline void i40e_vc_disable_vf(struct i40e_pf *pf, struct i40e_vf *vf)
 {
-	struct i40e_hw *hw = &pf->hw;
-	u32 reg;
-
-	reg = rd32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id));
-	reg |= I40E_VPGEN_VFRTRIG_VFSWR_MASK;
-	wr32(hw, I40E_VPGEN_VFRTRIG(vf->vf_id), reg);
-	i40e_flush(hw);
+	i40e_vc_notify_vf_reset(vf);
+	i40e_reset_vf(vf, false);
 }
 
 /**
--
2.4.3
[net-next 14/16] i40e/i40evf: Add support for pre-allocated pages for PD
From: Faisal Latif <faisal.la...@intel.com>

The i40e_add_pd_table_entry() routine is being modified to handle both
cases where a backing page is passed and where backing page is allocated
in i40e_add_pd_table_entry().

For PBLE resource management, it is more efficient for it to manage its
backing pages. For VF, PBLE backing page addresses will be sent to PF
driver for PBLE resource.

The i40e_remove_pd_bp() is also modified to not free pre-allocated pages
and free only ones which were allocated in i40e_add_pd_table_entry().

Change-ID: Ie673f0403f22979e9406f5a94048dceb91bcf9a8
Signed-off-by: Faisal Latif <faisal.la...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_hmc.c     | 30 ++++++++++++++++++++----------
 drivers/net/ethernet/intel/i40e/i40e_hmc.h     |  4 +++-
 drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c |  2 +-
 drivers/net/ethernet/intel/i40evf/i40e_hmc.h   |  4 +++-
 4 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
index 9b987cc..b89856a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
@@ -116,6 +116,7 @@ exit:
  * @hw: pointer to our HW structure
  * @hmc_info: pointer to the HMC configuration information structure
  * @pd_index: which page descriptor index to manipulate
+ * @rsrc_pg: if not NULL, use preallocated page instead of allocating new one.
  *
  * This function:
  *	1. Initializes the pd entry
@@ -129,12 +130,14 @@ exit:
  **/
 i40e_status i40e_add_pd_table_entry(struct i40e_hw *hw,
 					      struct i40e_hmc_info *hmc_info,
-					      u32 pd_index)
+					      u32 pd_index,
+					      struct i40e_dma_mem *rsrc_pg)
 {
 	i40e_status ret_code = 0;
 	struct i40e_hmc_pd_table *pd_table;
 	struct i40e_hmc_pd_entry *pd_entry;
 	struct i40e_dma_mem mem;
+	struct i40e_dma_mem *page = &mem;
 	u32 sd_idx, rel_pd_idx;
 	u64 *pd_addr;
 	u64 page_desc;
@@ -155,18 +158,24 @@ i40e_status i40e_add_pd_table_entry(struct i40e_hw *hw,
 	pd_table = &hmc_info->sd_table.sd_entry[sd_idx].u.pd_table;
 	pd_entry = &pd_table->pd_entry[rel_pd_idx];
 	if (!pd_entry->valid) {
-		/* allocate a 4K backing page */
-		ret_code = i40e_allocate_dma_mem(hw, &mem, i40e_mem_bp,
-						 I40E_HMC_PAGED_BP_SIZE,
-						 I40E_HMC_PD_BP_BUF_ALIGNMENT);
-		if (ret_code)
-			goto exit;
+		if (rsrc_pg) {
+			pd_entry->rsrc_pg = true;
+			page = rsrc_pg;
+		} else {
+			/* allocate a 4K backing page */
+			ret_code = i40e_allocate_dma_mem(hw, page, i40e_mem_bp,
+						I40E_HMC_PAGED_BP_SIZE,
+						I40E_HMC_PD_BP_BUF_ALIGNMENT);
+			if (ret_code)
+				goto exit;
+			pd_entry->rsrc_pg = false;
+		}
 
-		pd_entry->bp.addr = mem;
+		pd_entry->bp.addr = *page;
 		pd_entry->bp.sd_pd_index = pd_index;
 		pd_entry->bp.entry_type = I40E_SD_TYPE_PAGED;
 		/* Set page address and valid bit */
-		page_desc = mem.pa | 0x1;
+		page_desc = page->pa | 0x1;
 
 		pd_addr = (u64 *)pd_table->pd_page_addr.va;
 		pd_addr += rel_pd_idx;
@@ -240,7 +249,8 @@ i40e_status i40e_remove_pd_bp(struct i40e_hw *hw,
 	I40E_INVALIDATE_PF_HMC_PD(hw, sd_idx, idx);
 
 	/* free memory here */
-	ret_code = i40e_free_dma_mem(hw, &(pd_entry->bp.addr));
+	if (!pd_entry->rsrc_pg)
+		ret_code = i40e_free_dma_mem(hw, &pd_entry->bp.addr);
 	if (ret_code)
 		goto exit;
 	if (!pd_table->ref_cnt)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.h b/drivers/net/ethernet/intel/i40e/i40e_hmc.h
index 732a026..386416b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.h
@@ -62,6 +62,7 @@ struct i40e_hmc_bp {
 struct i40e_hmc_pd_entry {
 	struct i40e_hmc_bp bp;
 	u32 sd_index;
+	bool rsrc_pg;
 	bool valid;
 };
 
@@ -218,7 +219,8 @@ i40e_status i40e_add_sd_table_entry(struct i40e_hw *hw,
 
 i40e_status i40e_add_pd_table_entry(struct i40e_hw *hw,
 					      struct i40e_hmc_info *hmc_info,
-					      u32 pd_index);
+					      u32 pd_index,
+
[net-next 11/16] i40e: un-disable VF after reset
From: Mitch Williams <mitch.a.willi...@intel.com>

When a VF is disabled, there is no way for it to recover until either
the PF driver is reloaded or SR-IOV is disabled and enabled. To correct
this, enable the VF after a successful reset.

Change-ID: I9e0788476c4d53d5407961b503febdfff2b8a7c6
Signed-off-by: Mitch Williams <mitch.a.willi...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 55b19f5..fdd7f5e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -832,6 +832,7 @@ complete_reset:
 	i40e_alloc_vf_res(vf);
 	i40e_enable_vf_mappings(vf);
 	set_bit(I40E_VF_STAT_ACTIVE, &vf->vf_states);
+	clear_bit(I40E_VF_STAT_DISABLED, &vf->vf_states);
 
 	/* tell the VF the reset is done */
 	wr32(hw, I40E_VFGEN_RSTAT1(vf->vf_id), I40E_VFR_VFACTIVE);
-- 
2.4.3
[net-next 08/16] i40e/i40evf: Update the admin queue command header
From: Greg Rose gregory.v.r...@intel.com Make the necessary updates to i40e_adminq_cmd.h. Change-ID: Ib031c86cc6cab78e5aa44c64d8ce5474be8d7e42 Signed-off-by: Greg Rose gregory.v.r...@intel.com Tested-by: Jim Young james.m.yo...@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com --- drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 24 -- .../net/ethernet/intel/i40evf/i40e_adminq_cmd.h| 18 +++- 2 files changed, 20 insertions(+), 22 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h index 929e3d7..9101f5c 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h @@ -34,7 +34,7 @@ */ #define I40E_FW_API_VERSION_MAJOR 0x0001 -#define I40E_FW_API_VERSION_MINOR 0x0002 +#define I40E_FW_API_VERSION_MINOR 0x0004 struct i40e_aq_desc { __le16 flags; @@ -132,12 +132,7 @@ enum i40e_admin_queue_opc { i40e_aqc_opc_list_func_capabilities = 0x000A, i40e_aqc_opc_list_dev_capabilities = 0x000B, - i40e_aqc_opc_set_cppm_configuration = 0x0103, - i40e_aqc_opc_set_arp_proxy_entry= 0x0104, - i40e_aqc_opc_set_ns_proxy_entry = 0x0105, - /* LAA */ - i40e_aqc_opc_mng_laa= 0x0106, /* AQ obsolete */ i40e_aqc_opc_mac_address_read = 0x0107, i40e_aqc_opc_mac_address_write = 0x0108, @@ -262,7 +257,6 @@ enum i40e_admin_queue_opc { /* Tunnel commands */ i40e_aqc_opc_add_udp_tunnel = 0x0B00, i40e_aqc_opc_del_udp_tunnel = 0x0B01, - i40e_aqc_opc_tunnel_key_structure = 0x0B10, /* Async Events */ i40e_aqc_opc_event_lan_overflow = 0x1001, @@ -274,8 +268,6 @@ enum i40e_admin_queue_opc { i40e_aqc_opc_oem_ocbb_initialize= 0xFE03, /* debug commands */ - i40e_aqc_opc_debug_get_deviceid = 0xFF00, - i40e_aqc_opc_debug_set_mode = 0xFF01, i40e_aqc_opc_debug_read_reg = 0xFF03, i40e_aqc_opc_debug_write_reg= 0xFF04, i40e_aqc_opc_debug_modify_reg = 0xFF07, @@ -509,7 +501,8 @@ struct i40e_aqc_mac_address_read { #define I40E_AQC_SAN_ADDR_VALID0x20 #define 
I40E_AQC_PORT_ADDR_VALID 0x40 #define I40E_AQC_WOL_ADDR_VALID0x80 -#define I40E_AQC_ADDR_VALID_MASK 0xf0 +#define I40E_AQC_MC_MAG_EN_VALID 0x100 +#define I40E_AQC_ADDR_VALID_MASK 0x1F0 u8 reserved[6]; __le32 addr_high; __le32 addr_low; @@ -532,7 +525,9 @@ struct i40e_aqc_mac_address_write { #define I40E_AQC_WRITE_TYPE_LAA_ONLY 0x #define I40E_AQC_WRITE_TYPE_LAA_WOL0x4000 #define I40E_AQC_WRITE_TYPE_PORT 0x8000 -#define I40E_AQC_WRITE_TYPE_MASK 0xc000 +#define I40E_AQC_WRITE_TYPE_UPDATE_MC_MAG 0xC000 +#define I40E_AQC_WRITE_TYPE_MASK 0xC000 + __le16 mac_sah; __le32 mac_sal; u8 reserved[8]; @@ -1068,6 +1063,7 @@ struct i40e_aqc_set_vsi_promiscuous_modes { __le16 seid; #define I40E_AQC_VSI_PROM_CMD_SEID_MASK0x3FF __le16 vlan_tag; +#define I40E_AQC_SET_VSI_VLAN_MASK 0x0FFF #define I40E_AQC_SET_VSI_VLAN_VALID0x8000 u8 reserved[8]; }; @@ -2064,6 +2060,12 @@ I40E_CHECK_CMD_LENGTH(i40e_aqc_lldp_start); #define I40E_AQC_CEE_PFC_STATUS_MASK (0x7 I40E_AQC_CEE_PFC_STATUS_SHIFT) #define I40E_AQC_CEE_APP_STATUS_SHIFT 0x8 #define I40E_AQC_CEE_APP_STATUS_MASK (0x7 I40E_AQC_CEE_APP_STATUS_SHIFT) +#define I40E_AQC_CEE_FCOE_STATUS_SHIFT 0x8 +#define I40E_AQC_CEE_FCOE_STATUS_MASK (0x7 I40E_AQC_CEE_FCOE_STATUS_SHIFT) +#define I40E_AQC_CEE_ISCSI_STATUS_SHIFT0xA +#define I40E_AQC_CEE_ISCSI_STATUS_MASK (0x7 I40E_AQC_CEE_ISCSI_STATUS_SHIFT) +#define I40E_AQC_CEE_FIP_STATUS_SHIFT 0x10 +#define I40E_AQC_CEE_FIP_STATUS_MASK (0x7 I40E_AQC_CEE_FIP_STATUS_SHIFT) struct i40e_aqc_get_cee_dcb_cfg_v1_resp { u8 reserved1; u8 oper_num_tc; diff --git a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h index e715bcc..d5bd6f0 100644 --- a/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h +++ b/drivers/net/ethernet/intel/i40evf/i40e_adminq_cmd.h @@ -34,7 +34,7 @@ */ #define I40E_FW_API_VERSION_MAJOR 0x0001 -#define I40E_FW_API_VERSION_MINOR 0x0002 +#define I40E_FW_API_VERSION_MINOR 0x0004 #define I40E_FW_API_VERSION_A0_MINOR 0x struct i40e_aq_desc 
{ @@ -133,12 +133,7 @@ enum i40e_admin_queue_opc { i40e_aqc_opc_list_func_capabilities = 0x000A, i40e_aqc_opc_list_dev_capabilities = 0x000B, - i40e_aqc_opc_set_cppm_configuration = 0x0103, - i40e_aqc_opc_set_arp_proxy_entry= 0x0104, - i40e_aqc_opc_set_ns_proxy_entry
[net-next 16/16] i40e/i40evf: Bump version to 1.3.6 for i40e and 1.3.2 for i40evf
From: Catherine Sullivan <catherine.sulli...@intel.com>

Bump.

Change-ID: I84573d9fa51effc5b29bf5b8c74e3cc8b2673f48
Signed-off-by: Catherine Sullivan <catherine.sulli...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 9ec6fa2..6ce9086 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -39,7 +39,7 @@ static const char i40e_driver_string[] =
 
 #define DRV_VERSION_MAJOR 1
 #define DRV_VERSION_MINOR 3
-#define DRV_VERSION_BUILD 4
+#define DRV_VERSION_BUILD 6
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 	     __stringify(DRV_VERSION_MINOR) "." \
 	     __stringify(DRV_VERSION_BUILD)    DRV_KERN
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 526cc8d..ec1eaa5 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -34,7 +34,7 @@ char i40evf_driver_name[] = "i40evf";
 static const char i40evf_driver_string[] =
 	"Intel(R) XL710/X710 Virtual Function Network Driver";
 
-#define DRV_VERSION "1.2.25"
+#define DRV_VERSION "1.3.2"
 const char i40evf_driver_version[] = DRV_VERSION;
 static const char i40evf_copyright[] =
 	"Copyright (c) 2013 - 2014 Intel Corporation.";
-- 
2.4.3
[net-next 13/16] i40evf: add MAC address filter in open, not init
From: Mitch Williams <mitch.a.willi...@intel.com>

During close, all of the MAC filters are cleared, so the driver would be
unable to receive unicast packets after being closed and reopened. Add
the adapter's hardware MAC address filter in open, not init. This
ensures that the correct filter is present each time.

Change-ID: I51a11e9c1200139dab6f66a5353bd38c7d26f875
Signed-off-by: Mitch Williams <mitch.a.willi...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 07f6052..526cc8d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1858,6 +1858,7 @@ static int i40evf_open(struct net_device *netdev)
 	if (err)
 		goto err_req_irq;
 
+	i40evf_add_filter(adapter, adapter->hw.mac.addr);
 	i40evf_configure(adapter);
 
 	err = i40evf_up_complete(adapter);
@@ -1998,7 +1999,6 @@ static void i40evf_init_task(struct work_struct *work)
 						      struct i40evf_adapter,
 						      init_task.work);
 	struct net_device *netdev = adapter->netdev;
-	struct i40evf_mac_filter *f;
 	struct i40e_hw *hw = &adapter->hw;
 	struct pci_dev *pdev = adapter->pdev;
 	int i, err, bufsz;
@@ -2132,16 +2132,6 @@ static void i40evf_init_task(struct work_struct *work)
 	ether_addr_copy(netdev->dev_addr, adapter->hw.mac.addr);
 	ether_addr_copy(netdev->perm_addr, adapter->hw.mac.addr);
 
-	f = kzalloc(sizeof(*f), GFP_ATOMIC);
-	if (!f)
-		goto err_sw_init;
-
-	ether_addr_copy(f->macaddr, adapter->hw.mac.addr);
-	f->add = true;
-	adapter->aq_required |= I40EVF_FLAG_AQ_ADD_MAC_FILTER;
-
-	list_add(&f->list, &adapter->mac_filter_list);
-
 	init_timer(&adapter->watchdog_timer);
 	adapter->watchdog_timer.function = &i40evf_watchdog_timer;
 	adapter->watchdog_timer.data = (unsigned long)adapter;
-- 
2.4.3
[PATCH] NET: AX.25: Stop heartbeat timer on disconnect.
This may result in a kernel panic. The bug has always existed but
somehow we've run out of luck now and it bites.

Signed-off-by: Richard Stearn <rich...@rns-stearn.demon.co.uk>
Cc: sta...@vger.kernel.org # all branches
Signed-off-by: Ralf Baechle <r...@linux-mips.org>
---
 net/ax25/ax25_subr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ax25/ax25_subr.c b/net/ax25/ax25_subr.c
index 1997538..3b78e84 100644
--- a/net/ax25/ax25_subr.c
+++ b/net/ax25/ax25_subr.c
@@ -264,6 +264,7 @@ void ax25_disconnect(ax25_cb *ax25, int reason)
 {
 	ax25_clear_queues(ax25);
 
+	ax25_stop_heartbeat(ax25);
 	ax25_stop_t1timer(ax25);
 	ax25_stop_t2timer(ax25);
 	ax25_stop_t3timer(ax25);
[PATCH] Revert net: fec: Ensure clocks are enabled while using mdio bus
This reverts commit 6c3e921b18edca290099adfddde8a50236bf2d80. commit 6c3e921b18ed (net: fec: Ensure clocks are enabled while using mdio bus) prevents the kernel to boot on mx6 boards, so let's revert it. Reported-by: Tyler Baker tyler.ba...@linaro.org Signed-off-by: Fabio Estevam fabio.este...@freescale.com --- drivers/net/ethernet/freescale/fec_main.c | 88 +-- 1 file changed, 13 insertions(+), 75 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 42e20e5..1f89c59 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -24,7 +24,6 @@ #include linux/module.h #include linux/kernel.h #include linux/string.h -#include linux/pm_runtime.h #include linux/ptrace.h #include linux/errno.h #include linux/ioport.h @@ -78,7 +77,6 @@ static void fec_enet_itr_coal_init(struct net_device *ndev); #define FEC_ENET_RAEM_V0x8 #define FEC_ENET_RAFL_V0x8 #define FEC_ENET_OPD_V 0xFFF0 -#define FEC_MDIO_PM_TIMEOUT 100 /* ms */ static struct platform_device_id fec_devtype[] = { { @@ -1769,13 +1767,7 @@ static void fec_enet_adjust_link(struct net_device *ndev) static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1791,30 +1783,18 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum) if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO read timeout\n); - ret = -ETIMEDOUT; - goto out; + return -ETIMEDOUT; } - ret = FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); - -out: - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + /* return value */ + return FEC_MMFR_DATA(readl(fep-hwp + FEC_MII_DATA)); } static int fec_enet_mdio_write(struct 
mii_bus *bus, int mii_id, int regnum, u16 value) { struct fec_enet_private *fep = bus-priv; - struct device *dev = fep-pdev-dev; unsigned long time_left; - int ret = 0; - - ret = pm_runtime_get_sync(dev); - if (IS_ERR_VALUE(ret)) - return ret; fep-mii_timeout = 0; init_completion(fep-mdio_done); @@ -1831,13 +1811,10 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum, if (time_left == 0) { fep-mii_timeout = 1; netdev_err(fep-netdev, MDIO write timeout\n); - ret = -ETIMEDOUT; + return -ETIMEDOUT; } - pm_runtime_mark_last_busy(dev); - pm_runtime_put_autosuspend(dev); - - return ret; + return 0; } static int fec_enet_clk_enable(struct net_device *ndev, bool enable) @@ -1849,6 +1826,9 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) ret = clk_prepare_enable(fep-clk_ahb); if (ret) return ret; + ret = clk_prepare_enable(fep-clk_ipg); + if (ret) + goto failed_clk_ipg; if (fep-clk_enet_out) { ret = clk_prepare_enable(fep-clk_enet_out); if (ret) @@ -1872,6 +1852,7 @@ static int fec_enet_clk_enable(struct net_device *ndev, bool enable) } } else { clk_disable_unprepare(fep-clk_ahb); + clk_disable_unprepare(fep-clk_ipg); if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); if (fep-clk_ptp) { @@ -1893,6 +1874,8 @@ failed_clk_ptp: if (fep-clk_enet_out) clk_disable_unprepare(fep-clk_enet_out); failed_clk_enet_out: + clk_disable_unprepare(fep-clk_ipg); +failed_clk_ipg: clk_disable_unprepare(fep-clk_ahb); return ret; @@ -2864,14 +2847,10 @@ fec_enet_open(struct net_device *ndev) struct fec_enet_private *fep = netdev_priv(ndev); int ret; - ret = pm_runtime_get_sync(fep-pdev-dev); - if (IS_ERR_VALUE(ret)) - return ret; - pinctrl_pm_select_default_state(fep-pdev-dev); ret = fec_enet_clk_enable(ndev, true); if (ret) - goto clk_enable; + return ret; /* I should reset the ring buffers here, but I don't yet know * a simple way to do that. 
@@ -2902,9 +2881,6 @@ err_enet_mii_probe: fec_enet_free_buffers(ndev); err_enet_alloc: fec_enet_clk_enable(ndev, false);
net: Fix skb csum races when peeking
On Mon, Jul 13, 2015 at 04:31:00PM +0800, Herbert Xu wrote:
> On Mon, Jul 13, 2015 at 10:28:19AM +0200, Eric Dumazet wrote:
> >
> > Except that udp checksum are checked outside of spinlock protection.
>
> Good point. I wonder when this got broken. I'll do some digging.

OK looks like I can claim credit for this bug too :)

commit fb286bb2990a107009dbf25f6ffebeb7df77f9be
Author: Herbert Xu <herb...@gondor.apana.org.au>
Date:   Thu Nov 10 13:01:24 2005 -0800

    [NET]: Detect hardware rx checksum faults correctly

Although others have made the hole bigger more recently.

PS we seem to no longer use the hardware checksum in case of
CHECKSUM_COMPLETE, I wonder why that is?

---8<---
When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking. So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared. This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.

Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>

diff --git a/net/core/datagram.c b/net/core/datagram.c
index b80fb91..4967262 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -622,7 +657,8 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len)
 		    !skb->csum_complete_sw)
 			netdev_rx_csum_fault(skb->dev);
 	}
-	skb->csum_valid = !sum;
+	if (!skb_shared(skb))
+		skb->csum_valid = !sum;
 	return sum;
 }
 EXPORT_SYMBOL(__skb_checksum_complete_head);
@@ -642,11 +678,13 @@ __sum16 __skb_checksum_complete(struct sk_buff *skb)
 			netdev_rx_csum_fault(skb->dev);
 	}
 
-	/* Save full packet checksum */
-	skb->csum = csum;
-	skb->ip_summed = CHECKSUM_COMPLETE;
-	skb->csum_complete_sw = 1;
-	skb->csum_valid = !sum;
+	if (!skb_shared(skb)) {
+		/* Save full packet checksum */
+		skb->csum = csum;
+		skb->ip_summed = CHECKSUM_COMPLETE;
+		skb->csum_complete_sw = 1;
+		skb->csum_valid = !sum;
+	}
 
 	return sum;
 }
-- 
Email: Herbert Xu <herb...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
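The idea behind the fix, caching a computed result in an object only when no one else holds a reference to it, can be sketched outside the kernel. The following is a minimal userspace illustration, not the kernel code: struct buf, buf_shared() and verify() are hypothetical names, and the plain int reference count stands in for the atomic counter a real implementation would need.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an skb: a reference count plus a cached result. */
struct buf {
	int  users;       /* number of holders; more than one means shared */
	bool csum_valid;  /* cached "checksum verified" flag */
};

/* Shared means someone else may be reading or writing the same object. */
static bool buf_shared(const struct buf *b)
{
	return b->users != 1;
}

/* Return the folded checksum; cache the verdict only for a sole owner,
 * so concurrent peekers never write to the same object. */
static unsigned int verify(struct buf *b, unsigned int sum)
{
	if (!buf_shared(b))
		b->csum_valid = !sum;
	return sum;
}
```

A shared buffer still gets a correct return value from verify(); it only loses the cached shortcut, which is the trade-off the commit message describes.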
Re: [PATCH v2 0/2] net: enable inband link state negotiation only when explicitly requested
Hi Stas,

On Fri, 10 Jul 2015, Stas Sergeev wrote:

> Those who were affected by the change, please send your Tested-by,
> Thanks!

I also confirm that this version of the patch solves the issue:

Tested-by: Sebastien Rannou <m...@sbrk.org>

-- 
Sébastien
linux-4.2-rc2/drivers/net/ethernet/brocade/bna/bfa_ioc.c:2843: out of bounds string access ?
Hello there,

[linux-4.2-rc2/drivers/net/ethernet/brocade/bna/bfa_ioc.c:2843]: (error) Buffer is accessed out of bounds.

Source code is

    memcpy(manufacturer, BFA_MFG_NAME, BFA_ADAPTER_MFG_NAME_LEN);

and

#define BFA_MFG_NAME "QLogic"

and

$ fgrep BFA_ADAPTER_MFG_NAME_LEN `find linux-4.2-rc2/drivers/net/ethernet/brocade/ -name \*.h -print`
linux-4.2-rc2/drivers/net/ethernet/brocade/bna/bfa_defs.h:      BFA_ADAPTER_MFG_NAME_LEN    = 8,   /*!< manufacturer name length */
$

so the code attempts to read eight bytes from a seven byte string.

Suggest code rework.

Regards

David Binderman
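One possible shape for the suggested rework, shown as a standalone sketch rather than a patch against the driver: bound the copy by the source length instead of the destination size, and zero-fill the remainder. The helper name copy_fixed_name() and the MFG_NAME macros are hypothetical; only the string "QLogic" and the length 8 come from the report.

```c
#include <stddef.h>
#include <string.h>

#define MFG_NAME     "QLogic"  /* 6 characters + NUL = 7 bytes of storage */
#define MFG_NAME_LEN 8         /* size of the destination field */

/* Copy a NUL-terminated name into a fixed-size field without reading
 * past the end of the source; unused bytes of the field are zeroed. */
static void copy_fixed_name(char *dst, size_t dst_len, const char *src)
{
	size_t n = strlen(src);

	if (n > dst_len)
		n = dst_len;  /* truncate; the field then has no terminator */
	memset(dst, 0, dst_len);
	memcpy(dst, src, n);
}
```

Called as copy_fixed_name(manufacturer, MFG_NAME_LEN, MFG_NAME), this reads only the six name bytes and leaves the last two bytes of the field zero.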
[PATCH net] bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave
From: Satish Ashok <sas...@cumulusnetworks.com>

A report with INCLUDE/Change_to_include and empty source list should be
treated as a leave, specified by RFC 3376, section 3.1:
"If the requested filter mode is INCLUDE *and* the requested source
 list is empty, then the entry corresponding to the requested
 interface and multicast address is deleted if present. If no such
 entry is present, the request is ignored."

Signed-off-by: Satish Ashok <sas...@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
---
 net/bridge/br_multicast.c | 37 ++---
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 742a6c27d7a2..79db489cdade 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -39,6 +39,16 @@ static void br_multicast_start_querier(struct net_bridge *br,
 					 struct bridge_mcast_own_query *query);
 static void br_multicast_add_router(struct net_bridge *br,
 				    struct net_bridge_port *port);
+static void br_ip4_multicast_leave_group(struct net_bridge *br,
+					 struct net_bridge_port *port,
+					 __be32 group,
+					 __u16 vid);
+#if IS_ENABLED(CONFIG_IPV6)
+static void br_ip6_multicast_leave_group(struct net_bridge *br,
+					 struct net_bridge_port *port,
+					 const struct in6_addr *group,
+					 __u16 vid);
+#endif
 unsigned int br_mdb_rehash_seq;
 
 static inline int br_ip_equal(const struct br_ip *a, const struct br_ip *b)
@@ -1010,9 +1020,15 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
 			continue;
 		}
 
-		err = br_ip4_multicast_add_group(br, port, group, vid);
-		if (err)
-			break;
+		if ((type == IGMPV3_CHANGE_TO_INCLUDE ||
+		     type == IGMPV3_MODE_IS_INCLUDE) &&
+		    ntohs(grec->grec_nsrcs) == 0) {
+			br_ip4_multicast_leave_group(br, port, group, vid);
+		} else {
+			err = br_ip4_multicast_add_group(br, port, group, vid);
+			if (err)
+				break;
+		}
 	}
 
 	return err;
@@ -1071,10 +1087,17 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
 			continue;
 		}
 
-		err = br_ip6_multicast_add_group(br, port, &grec->grec_mca,
-						 vid);
-		if (err)
-			break;
+		if ((grec->grec_type == MLD2_CHANGE_TO_INCLUDE ||
+		     grec->grec_type == MLD2_MODE_IS_INCLUDE) &&
+		    ntohs(*nsrcs) == 0) {
+			br_ip6_multicast_leave_group(br, port, &grec->grec_mca,
+						     vid);
+		} else {
+			err = br_ip6_multicast_add_group(br, port,
+							 &grec->grec_mca, vid);
+			if (err)
+				break;
+		}
 	}
 
 	return err;
-- 
1.9.3
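The rule quoted from RFC 3376 reduces to a small predicate on the group record type and the source count. A minimal standalone sketch follows; the enum names and record_is_leave() are hypothetical, the numeric values are the IGMPv3 group record types from RFC 3376, and nsrcs is assumed to be already converted to host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* IGMPv3 group record types (RFC 3376); names here are illustrative. */
enum {
	REC_MODE_IS_INCLUDE   = 1,
	REC_MODE_IS_EXCLUDE   = 2,
	REC_CHANGE_TO_INCLUDE = 3,
	REC_CHANGE_TO_EXCLUDE = 4,
};

/* An INCLUDE-type record with an empty source list means "no sources
 * wanted", which is equivalent to leaving the group. */
static bool record_is_leave(uint8_t type, uint16_t nsrcs)
{
	return (type == REC_MODE_IS_INCLUDE ||
		type == REC_CHANGE_TO_INCLUDE) && nsrcs == 0;
}
```

Every other combination, including INCLUDE records that do list sources, goes down the existing add-group path.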
Re: [PATCH net-next v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface.
Yoshifuji-san,

At Mon, 13 Jul 2015 17:38:48 +0900,
Erik Kline wrote:
>
> On 13 July 2015 at 15:32, YOSHIFUJI Hideaki
> <hideaki.yoshif...@miraclelinux.com> wrote:
> > Hi,
> >
> > Erik Kline wrote:
> >> Hmm, when I run a UML linux with this patch (which, I'm ashamed to
> >> say, I failed to do before) I get these kinds of errors:
> >>
> >> unregister_netdevice: waiting for TAPdevice to become free. Usage count = 1
> >> unregister_netdevice: waiting for TAPdevice to become free. Usage count = 1
> >>
> >> Perhaps they're unrelated... I'm still investigating.
> >
> > Would you test attached patch please?
>
> That does look logically correct, so +1 to it regardless, but it does
> not seem to have fixed the issue I'm seeing. I still haven't produced
> the smallest possible demo test program.

Sorry to jump in, but there is a side effect with this patch: my tcp
and dccp tests (ipv6) fail.

Because the newly added function (__ipv6_dev_get_saddr) doesn't update
the caller's 'hiscore' variable (it swaps it with 'score' in some
cases), the caller (ipv6_dev_get_saddr) can't fill in an appropriate
saddr in the end.

I don't know if this is a good patch, but the following diff makes my
tests happy.

-- Hajime

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ab74d5..c4e9416 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1363,7 +1363,8 @@ static void __ipv6_dev_get_saddr(struct net *net,
 				 unsigned int prefs,
 				 const struct in6_addr *saddr,
 				 struct inet6_dev *idev,
-				 struct ipv6_saddr_score *scores)
+				 struct ipv6_saddr_score *scores,
+				 struct ipv6_saddr_score **in_hiscore)
 {
 	struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
 
@@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
 			in6_ifa_hold(score->ifa);
 
 			swap(hiscore, score);
+			*in_hiscore = hiscore;
 
 			/* restore our iterator */
 			score->ifa = hiscore->ifa;
@@ -1480,13 +1482,15 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 	}
 
 	if (use_oif_addr) {
-		__ipv6_dev_get_saddr(net, dst, prefs, saddr, idev, scores);
+		__ipv6_dev_get_saddr(net, dst, prefs, saddr, idev,
+				     scores, &hiscore);
 	} else {
 		for_each_netdev_rcu(net, dev) {
 			idev = __in6_dev_get(dev);
 			if (!idev)
 				continue;
-			__ipv6_dev_get_saddr(net, dst, prefs, saddr, idev, scores);
+			__ipv6_dev_get_saddr(net, dst, prefs, saddr, idev,
+					     scores, &hiscore);
 		}
 	}
 	rcu_read_unlock();
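The underlying C issue is that a helper which reassigns its own pointer variables changes nothing in the caller; the caller has to pass the address of its pointer. A minimal sketch of that out-parameter pattern, with hypothetical names (struct score, consider()) that are not taken from addrconf.c:

```c
#include <stddef.h>

struct score {
	int rank;
};

/* Evaluate one candidate. If it beats the current best, report it
 * through *best so that the caller's pointer is actually updated. */
static void consider(struct score *cand, struct score **best)
{
	if (*best == NULL || cand->rank > (*best)->rank)
		*best = cand;
}
```

Had consider() taken a plain struct score *best and assigned to it, the caller would still see its old value after the call, which mirrors the symptom described above.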
Re: [PATCH] KVM: Add Kconfig option to signal cross-endian guests
On Thu, 9 Jul 2015 09:49:05 +0200
Thomas Huth <th...@redhat.com> wrote:

> The option for supporting cross-endianness legacy guests in
> the vhost and tun code should only be available on systems
> that support cross-endian guests.
>
> Signed-off-by: Thomas Huth <th...@redhat.com>

Acked-by: Greg Kurz <gk...@linux.vnet.ibm.com>
[PATCH v2] NET: AX.25: Stop heartbeat timer on disconnect.
From: Richard Stearn <rich...@rns-stearn.demon.co.uk>

This may result in a kernel panic. The bug has always existed but
somehow we've run out of luck now and it bites.

Signed-off-by: Richard Stearn <rich...@rns-stearn.demon.co.uk>
Cc: sta...@vger.kernel.org # all branches
Signed-off-by: Ralf Baechle <r...@linux-mips.org>
---
 v2: Correctly attribute the patch to Richard Stearn in the From: line

 net/ax25/ax25_subr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ax25/ax25_subr.c b/net/ax25/ax25_subr.c
index 1997538..3b78e84 100644
--- a/net/ax25/ax25_subr.c
+++ b/net/ax25/ax25_subr.c
@@ -264,6 +264,7 @@ void ax25_disconnect(ax25_cb *ax25, int reason)
 {
 	ax25_clear_queues(ax25);
 
+	ax25_stop_heartbeat(ax25);
 	ax25_stop_t1timer(ax25);
 	ax25_stop_t2timer(ax25);
 	ax25_stop_t3timer(ax25);

- End forwarded message -

  Ralf
[net-next 07/16] i40e: Remove incorrect #ifdef's
From: Carolyn Wyborny <carolyn.wybo...@intel.com>

This patch removes some #ifdef's that should not be there. They were
stopping code that is needed from being compiled in. With these #ifdef's
removed, changes are needed in the driver to fix some compile errors:
adding missing parameters to the definition of ndo_bridge_setlink and
a ndo_dflt_bridge_getlink call.

Change-ID: I5516614e1bc50b6bca0647cef971bc96161ba2de
Signed-off-by: Carolyn Wyborny <carolyn.wybo...@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sulli...@intel.com>
Tested-by: Jim Young <james.m.yo...@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ed6fc52..c7f2a0a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7993,7 +7993,6 @@ static int i40e_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 	return err;
 }
 
-#ifdef HAVE_BRIDGE_ATTRIBS
 /**
  * i40e_ndo_bridge_setlink - Set the hardware bridge mode
  * @dev: the netdev being configured
@@ -8007,7 +8006,8 @@ static int i40e_ndo_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
  * bridge mode enabled.
  **/
 static int i40e_ndo_bridge_setlink(struct net_device *dev,
-				   struct nlmsghdr *nlh)
+				   struct nlmsghdr *nlh,
+				   u16 flags)
 {
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	struct i40e_vsi *vsi = np->vsi;
@@ -8078,14 +8078,9 @@ static int i40e_ndo_bridge_setlink(struct net_device *dev,
  * Return the mode in which the hardware bridge is operating in
  * i.e VEB or VEPA.
  **/
-#ifdef HAVE_BRIDGE_FILTER
 static int i40e_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 				   struct net_device *dev,
 				   u32 filter_mask, int nlflags)
-#else
-static int i40e_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
-				   struct net_device *dev, int nlflags)
-#endif /* HAVE_BRIDGE_FILTER */
 {
 	struct i40e_netdev_priv *np = netdev_priv(dev);
 	struct i40e_vsi *vsi = np->vsi;
@@ -8109,7 +8104,6 @@ static int i40e_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, veb->bridge_mode,
 				       nlflags, 0, 0, filter_mask, NULL);
 }
-#endif /* HAVE_BRIDGE_ATTRIBS */
 
 #define I40E_MAX_TUNNEL_HDR_LEN 80
 /**
@@ -8165,10 +8159,8 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_get_phys_port_id	= i40e_get_phys_port_id,
 	.ndo_fdb_add		= i40e_ndo_fdb_add,
 	.ndo_features_check	= i40e_features_check,
-#ifdef HAVE_BRIDGE_ATTRIBS
 	.ndo_bridge_getlink	= i40e_ndo_bridge_getlink,
 	.ndo_bridge_setlink	= i40e_ndo_bridge_setlink,
-#endif /* HAVE_BRIDGE_ATTRIBS */
 };
 
 /**
-- 
2.4.3
linux-4.2-rc2/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1993: possible bad error checking ?
Hello there,

[linux-4.2-rc2/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1993]: (style) Checking if unsigned variable 'entry' is less than zero.

Source code is

        entry = priv->hw->mode->jumbo_frm(priv, skb, csum_insertion);
        if (unlikely(entry < 0))
            goto dma_map_err;

but

    unsigned int entry;

So the error checking from the function call looks broken to me.

If the return value from the function call to jumbo_frm is a plain signed
integer, suggest sanity check that *before* assigning into an unsigned integer.

Regards

David Binderman
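The suggestion above, checking the signed return value before it is stored in an unsigned variable, can be sketched in isolation. This is only an illustration under assumptions: fake_jumbo_frm() is a hypothetical stand-in that returns a ring index on success or a negative value on failure, and map_entry() is not a function from the stmmac driver.

```c
#include <stdbool.h>

/* Hypothetical callback: ring index on success, negative on failure. */
static int fake_jumbo_frm(bool fail)
{
	return fail ? -1 : 5;
}

/* Keep the result signed until the error check has been done, and only
 * then convert it to the unsigned index type. */
static int map_entry(bool fail, unsigned int *entry)
{
	int ret = fake_jumbo_frm(fail);

	if (ret < 0)
		return ret;  /* the error is still visible here */
	*entry = (unsigned int)ret;
	return 0;
}
```

Assigning the raw result straight to an unsigned int would turn -1 into a very large positive index, so a later "less than zero" test could never fire.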
Re: [PATCH v2 0/2] net: enable inband link state negotiation only when explicitly requested
13.07.2015 12:54, Sebastien Rannou wrote:
> Hi Stas,
>
> On Fri, 10 Jul 2015, Stas Sergeev wrote:
>
>> Those who were affected by the change, please send your Tested-by,
>> Thanks!
>
> I also confirm that this version of the patch solves the issue:
>
> Tested-by: Sebastien Rannou <m...@sbrk.org>

Thanks Sebastien!
Unfortunately, there will be v3 in a few days.
Perhaps you should not rush with the tests until the
things are settled, or who knows how many iterations
you'll have to test...
Re: [PATCH 4/6] net: ieee802154: Remove redundant spi driver bus initialization
Hi Antonio,

> In ancient times it was necessary to manually initialize the bus
> field of an spi_driver to spi_bus_type. These days this is done in
> spi_register_driver(), so we can drop the manual assignment.
>
> Signed-off-by: Antonio Borneo <borneo.anto...@gmail.com>
> To: Alan Ott <a...@signal11.us>
> To: Alexander Aring <alex.ar...@gmail.com>
> To: Varka Bhadram <varkabhad...@gmail.com>
> To: linux-w...@vger.kernel.org
> To: netdev@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> ---
>  drivers/net/ieee802154/cc2520.c   | 1 -
>  drivers/net/ieee802154/mrf24j40.c | 1 -
>  2 files changed, 2 deletions(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel
Re: [RFC PATCH 0/2] net: macb: Add mdio driver for accessing multiple phy devices
On Tue, Jul 14, 2015 at 12:13 AM, Florian Fainelli f.faine...@gmail.com wrote: On 12/07/15 21:48, Punnaiah Choudary Kalluri wrote: This patch is to add support for the design that has multiple ethernet mac controllers and single mdio bus connected to multiple phy devices. i.e mdio lines are connected to any of the ethernet mac controller and all the phy devices will be accessed using the phy maintenance interface in that mac controller. __ _ | | |PHY0 | | MAC0 |-| | |__| | |_| | __| _ | | | | | | MAC1 | |_|PHY1 | |__| | | So, i come up with two implementations for addressing the above configuration. Implementation 1: Have separate driver for mdio bus Create a DT node for all the PHY devices connected to the mdio bus This driver will share the register space of the mac controller that has mdio bus connected. That is the best design implementation, MDIO in itself is a sub-piece of your Ethernet MAC controller the fact that it is within the Ethernet MAC core is just coincidental, but there is no reason why it could not be taken apart and made a separate block in itself. Thanks Florian for suggesting this. No idea on why the mdio block was not made a separate block. regards, Punnaiah Implementation 2: Add new property has-mdio and it should be 1 for the mac that has mdio bus connected. Create the mdio bus only when the has-mdio property is 1 Please review the two implementations and suggest which one is better to proceed further. In my opinion implementation 1 will be the ideal one. Agreed. Currently i have tested the patches with single mac and single phy configuration. I need to take care of few more cases before releasing the final patch but before that i would like to have your opinion on the above implementations and finalize one implementation. so that i can enhance it further. 
Punnaiah Choudary Kalluri (1): net: macb: Add mdio driver for accessing multiple phy devices net: macb: Add support for single mac managing more than one phy drivers/net/ethernet/cadence/Makefile|2 +- drivers/net/ethernet/cadence/macb.c | 93 +- drivers/net/ethernet/cadence/macb.h |3 +- drivers/net/ethernet/cadence/macb_mdio.c | 204 ++ 4 files changed, 211 insertions(+), 91 deletions(-) create mode 100644 drivers/net/ethernet/cadence/macb_mdio.c -- Florian
Re: [PATCH net] bridge: mdb: fix double add notification
On Mon, Jul 13, 2015 at 6:36 AM, Nikolay Aleksandrov niko...@cumulusnetworks.com wrote: Since the mdb add/del code was introduced there have been 2 br_mdb_notify calls when doing br_mdb_add() resulting in 2 notifications on each add. Example: Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent Before patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent [MDB]dev br0 port eth1 grp 239.0.0.1 permanent After patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com Fixes: cfd567543590 (bridge: add support of adding and deleting mdb entries) --- net/bridge/br_mdb.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c index c11cf2611db0..1198a3dbad95 100644 --- a/net/bridge/br_mdb.c +++ b/net/bridge/br_mdb.c @@ -351,7 +351,6 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port, if (state == MDB_TEMPORARY) mod_timer(&p->timer, now + br->multicast_membership_interval); - br_mdb_notify(br->dev, port, group, RTM_NEWMDB); return 0; } Looks good to me. And probably we can convert existing __br_mdb_notify() to using non-atomic allocation too, but that is for net-next.
[PATCH] e1000e: Move e1000e_disable_aspm_locked() inside CONFIG_PM
e1000e_disable_aspm_locked() is only used in __e1000_resume() which is inside CONFIG_PM. So when CONFIG_PM=n we get a defined but not used warning for e1000e_disable_aspm_locked(). Move it inside the existing CONFIG_PM block to avoid the warning. Signed-off-by: Michael Ellerman m...@ellerman.id.au --- drivers/net/ethernet/intel/e1000e/netdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 89d788d8f263..f1d7fe2ea183 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -6439,6 +6439,7 @@ static void e1000e_disable_aspm(struct pci_dev *pdev, u16 state) __e1000e_disable_aspm(pdev, state, 0); } +#ifdef CONFIG_PM /** * e1000e_disable_aspm_locked Disable ASPM states. * @pdev: pointer to PCI device struct @@ -6452,7 +6453,6 @@ static void e1000e_disable_aspm_locked(struct pci_dev *pdev, u16 state) __e1000e_disable_aspm(pdev, state, 1); } -#ifdef CONFIG_PM static int __e1000_resume(struct pci_dev *pdev) { struct net_device *netdev = pci_get_drvdata(pdev); -- 2.1.0
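The guard placement described in this patch can be illustrated outside the kernel. What follows is only a minimal userspace sketch, not the driver code: DEMO_PM is a hypothetical stand-in for CONFIG_PM, and every demo_* name is invented for illustration. The point it shows is that a static helper whose only caller is conditionally compiled has to sit under the same guard, otherwise the compiler reports it as defined but not used.

```c
#include <assert.h>

/* DEMO_PM is a hypothetical stand-in for CONFIG_PM (build with -DDEMO_PM). */

/* Common helper, used in every configuration. */
static int demo_disable_states(int state, int locked)
{
	assert(locked == 0 || locked == 1);
	return state | (locked << 8);
}

#ifdef DEMO_PM
/*
 * Only the resume path calls this wrapper, so it lives inside the same
 * guard as its caller. Defined outside the guard, it would be an unused
 * static function whenever DEMO_PM is off.
 */
static int demo_disable_states_locked(int state)
{
	return demo_disable_states(state, 1);
}

static int demo_resume(int state)
{
	return demo_disable_states_locked(state);
}
#endif

/* Always-built entry point, so the common helper has a user in both builds. */
int demo_probe(int state)
{
#ifdef DEMO_PM
	return demo_resume(state);
#else
	return demo_disable_states(state, 0);
#endif
}
```

Compiling this with -Wall, with and without -DDEMO_PM, should produce no unused-function warning in either case, which is the effect the one-line move in the patch is after.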
[rhashtable] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:301 __debug_object_init()
-netfront: refactor skb slot counting git bisect bad 829a3ada9cc7d4c30fa61f8033403fb6c8f8092a # 09:38 0- 1 geneve: Simplify locking. git bisect good a4c9ea5e8fec680134d22aa99b54d1cd8c226ebd # 09:42 22+ 12 geneve: Add Geneve GRO support git bisect good 255047b0dca31e6b8ce254481a0b65d559d2ebb8 # 09:46 20+ 0 Bluetooth: Add timing information to SMP test case runs git bisect good 354f473ee2c5d01c1cf90f747f95218ee3e73e95 # 09:52 22+ 0 ath9k: fix typo git bisect good d312da293f787e1b19c57acb58e8c1b171c4a04a # 09:59 22+ 0 ixgbe: convert to CYCLECOUNTER_MASK macro. git bisect good b8e1943e9f754219bcfb40bac4a605b5348acb25 # 10:03 22+ 8 rhashtable: Factor out bucket_tail() function git bisect bad f89bd6f87a53ce5a7d60662429591ebac2745c10 # 10:08 0- 22 rhashtable: Supports for nulls marker git bisect good 113948d841e8d78039e5dbbb5248f5b73e99eafa # 10:12 22+ 13 spinlock: Add spin_lock_bh_nested() git bisect bad 97defe1ecf868b8127f8e62395499d6a06e4c4b1 # 10:16 0- 22 rhashtable: Per bucket locks deferred expansion/shrinking # first bad commit: [97defe1ecf868b8127f8e62395499d6a06e4c4b1] rhashtable: Per bucket locks deferred expansion/shrinking git bisect good 113948d841e8d78039e5dbbb5248f5b73e99eafa # 10:19 66+ 27 spinlock: Add spin_lock_bh_nested() # extra tests with DEBUG_INFO git bisect bad 97defe1ecf868b8127f8e62395499d6a06e4c4b1 # 10:25 0- 66 rhashtable: Per bucket locks deferred expansion/shrinking # extra tests on HEAD of linux-devel/devel-spot-201507122014 git bisect good 3afd2c3f65a385c405a084d80431c84b103cb6df # 10:28 66+ 49 0day head guard for 'devel-spot-201507122014' # extra tests on tree/branch linus/master git bisect good f760b87f8f12eb262f14603e65042996fe03720e # 10:33 66+ 0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net # extra tests on tree/branch linus/master git bisect good f760b87f8f12eb262f14603e65042996fe03720e # 10:33 66+ 0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net # extra tests on tree/branch next/master git bisect good 
2eb62d762a2112579f259903e62ba18d16c51f66 # 10:36 66+ 20 Add linux-next specific files for 20150713 This script may reproduce the error. #!/bin/bash kernel=$1 initrd=yocto-minimal-x86_64.cgz wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd kvm=( qemu-system-x86_64 -enable-kvm -cpu Haswell,+smep,+smap -kernel $kernel -initrd $initrd -m 256 -smp 1 -device e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog i6300esb -rtc base=localtime -serial stdio -display none -monitor null ) append=( hung_task_panic=1 earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw drbd.minor_count=8 ) ${kvm[@]} --append ${append[*]} --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/lkp Intel Corporation early console in setup code [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Linux version 3.19.0-rc2-00323-g97defe1 (kbuild@lkp-ib03) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Tue Jul 14 10:14:59 CST 2015 [0.00] Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-a0-07122340/linux-devel:devel-spot-201507122014:97defe1ecf868b8127f8e62395499d6a06e4c4b1:bisect-linux-1/.vmlinuz-97defe1ecf868b8127f8e62395499d6a06e4c4b1-20150714101515-19-ivb41 branch=linux-devel/devel-spot-201507122014 BOOT_IMAGE=/pkg/linux/x86_64-randconfig-a0-07122340/gcc-4.9/97defe1ecf868b8127f8e62395499d6a06e4c4b1/vmlinuz-3.19.0-rc2-00323-g97defe1 
drbd.minor_count=8 [0.00] KERNEL supported cpus: [0.00] AMD AuthenticAMD [0.00] Centaur CentaurHauls [0.00] CPU: vendor_id 'GenuineIntel' unknown, using generic init. [0.00] CPU: Your system may be unstable. [0.00] e820: BIOS-provided physical RAM map
Re: mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings
On Mon, Jul 13, 2015 at 6:18 AM, Kirill A. Shutemov kir...@shutemov.name wrote: Hi, This simple test-case triggers a few locking asserts in the kernel: #define _GNU_SOURCE #include <stdlib.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <sys/socket.h> #include <sys/types.h> #include <linux/netlink.h> #define SOL_NETLINK 270 int main(int argc, char **argv) { unsigned int block_size = 16 * 4096; struct nl_mmap_req req = { .nm_block_size = block_size, .nm_block_nr = 64, .nm_frame_size = 16384, .nm_frame_nr = 64 * block_size / 16384, }; unsigned int ring_size; int fd; fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) exit(1); if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1); ring_size = req.nm_block_nr * req.nm_block_size; mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); return 0; } +++ exited with 0 +++ [2.500126] BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616 [2.501328] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init [2.501997] 3 locks held by init/1: [2.502380] #0: (reboot_mutex){+.+...}, at: [81080959] SyS_reboot+0xa9/0x220 [2.503328] #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [8107f379] __blocking_notifier_call_chain+0x39/0x70 [2.504659] #2: (rcu_callback){..}, at: [810d32e0] rcu_do_batch.isra.49+0x160/0x10c0 [2.505724] Preemption disabled at:[8145365f] __delay+0xf/0x20 [2.506443] [2.506612] CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-9-gbddf4c4818e0 #253 [2.507378] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 [2.508386] 88017b3d8000 88027bc03c38 81929ceb 0102 [2.509233] 88027bc03c68 81085a9d 0002 [2.510057] 81ca2a20 0268 88027bc03c98 [2.510882] Call Trace: [2.511146] IRQ [81929ceb] dump_stack+0x4f/0x7b [2.511763] [81085a9d] ___might_sleep+0x16d/0x270 [2.512476] [81085bed] __might_sleep+0x4d/0x90 [2.513071] [8192e96f] 
mutex_lock_nested+0x2f/0x430 [2.513683] [81932fed] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [2.514385] [81464143] ? __this_cpu_preempt_check+0x13/0x20 [2.515066] [8182fc3d] netlink_set_ring+0x1ed/0x350 [2.515694] [8182e000] ? netlink_undo_bind+0x70/0x70 [2.516411] [8182fe20] netlink_sock_destruct+0x80/0x150 [2.517070] [817e484d] __sk_free+0x1d/0x160 [2.517607] [817e49a9] sk_free+0x19/0x20 [2.518118] [8182e020] deferred_put_nlk_sk+0x20/0x30 [2.518735] [810d391c] rcu_do_batch.isra.49+0x79c/0x10c0 Caused by: commit 21e4902aea80ef35afc00ee8d2abdea4f519b7f7 Author: Thomas Graf tg...@suug.ch Date: Fri Jan 2 23:00:22 2015 +0100 netlink: Lockless lookup with RCU grace period in socket release Defers the release of the socket reference using call_rcu() to allow using an RCU read-side protected call to rhashtable_lookup() This restores behaviour and performance gains as previously introduced by e341694 (netlink: Convert netlink_lookup() to use RCU protected hash table) without the side effect of severely delayed socket destruction. Signed-off-by: Thomas Graf tg...@suug.ch Signed-off-by: David S. Miller da...@davemloft.net We can't hold mutex lock in a rcu callback, perhaps we could defer the mmap ring cleanup to a workqueue. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
add some more information RE: Issue with active-backup mode bond and bridge
I tested this issue on kernel 3.0.93, and it is reproducible. Step 1. Create an active-backup mode bond with two NICs and make sure the IP address is on the bond. Step 2. Create a bridge with the brctl command. Step 3. Add the bond to the bridge and move the IP address to the bridge device. Step 4. Use tcpdump -i bond to confirm that packets cross the bond. Step 5. Use ifconfig ethX down to bring the active slave down, and check whether any gratuitous ARPs are sent. -Original Message- From: pengyi Peng(Yi) Sent: Thursday, July 02, 2015 11:05 AM To: 'netdev@vger.kernel.org' Cc: Lichunhe; Zhangwei (FF) Subject: Issue with active-backup mode bond and bridge I find that the kernel does not seem to handle the combination of the bonding and bridge modules well. I have a physical host with two NICs that are bonded together (active-backup mode). Each NIC is connected to a separate L2 switch, and the two L2 switches are connected to an L3 switch. If the host only has the bond device, when I manually bring the active slave down, bonding will issue one or more gratuitous ARPs on the newly active slave. One gratuitous ARP is issued for the bonding master interface, provided that the interface has at least one IP address configured. However, if there is a bridge named br0 and the bond device joins the bridge br0, the IP address of the bond moves to the br0 device. First, I bring both NICs up. But this time, when I again bring the active slave down, I can't capture the gratuitous ARP on the bond device with tcpdump. This can break connectivity to the host: because no ARP packet is sent out of the host, the L3 switch may keep sending packets from outside to the old L2 switch, which connects to the NIC that is now the backup. These packets can't get any responses. I read the kernel code. 
When changing the active slave to the specified one, in the bond_change_active_slave function, bond will send the NETDEV_NOTIFY_PEERS event: netdev_bonding_change(bond->dev, NETDEV_BONDING_FAILOVER); if (should_notify_peers) netdev_bonding_change(bond->dev, NETDEV_NOTIFY_PEERS); And in the inetdev_event function, if the event is NETDEV_NOTIFY_PEERS, it will call inetdev_send_gratuitous_arp to send a gratuitous ARP. case NETDEV_NOTIFY_PEERS: /* Send gratuitous ARP to notify of link change */ inetdev_send_gratuitous_arp(dev, in_dev); break; But when the bond is in the bridge, the code won't change the dev to the bridge device, and there is no IP address on the bond device, so there is no gratuitous ARP. My question is: why does the latest kernel (4.1) still not consider this condition?
[PATCH] ipv6: Fix finding best source address in ipv6_dev_get_saddr().
Commit 9131f3de2 (ipv6: Do not iterate over all interfaces when finding source address on specific interface.) did not properly update best source address available. Plus, it introduced possible NULL pointer dereference. Bug was reported by Erik Kline e...@google.com. Based on patch proposed by Hajime Tazaki thehaj...@gmail.com. Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b (ipv6: Do not iterate over all interfaces when finding source address on specific interface.) Signed-off-by: YOSHIFUJI Hideaki hideaki.yoshif...@miraclelinux.com --- net/ipv6/addrconf.c | 30 ++ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 4ab74d5..4c9a024 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1358,14 +1358,15 @@ out: return ret; } -static void __ipv6_dev_get_saddr(struct net *net, -struct ipv6_saddr_dst *dst, -unsigned int prefs, -const struct in6_addr *saddr, -struct inet6_dev *idev, -struct ipv6_saddr_score *scores) +static int __ipv6_dev_get_saddr(struct net *net, + struct ipv6_saddr_dst *dst, + unsigned int prefs, + const struct in6_addr *saddr, + struct inet6_dev *idev, + struct ipv6_saddr_score *scores, + int hiscore_idx) { - struct ipv6_saddr_score *score = scores[0], *hiscore = scores[1]; + struct ipv6_saddr_score *score = scores[1 - hiscore_idx], *hiscore = scores[hiscore_idx]; read_lock_bh(idev-lock); list_for_each_entry(score-ifa, idev-addr_list, if_list) { @@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net, in6_ifa_hold(score-ifa); swap(hiscore, score); + hiscore_idx = 1 - hiscore_idx; /* restore our iterator */ score-ifa = hiscore-ifa; @@ -1434,18 +1436,20 @@ static void __ipv6_dev_get_saddr(struct net *net, } out: read_unlock_bh(idev-lock); + return hiscore_idx; } int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev, const struct in6_addr *daddr, unsigned int prefs, struct in6_addr *saddr) { - struct ipv6_saddr_score scores[2], *hiscore = scores[1]; + 
struct ipv6_saddr_score scores[2], *hiscore; struct ipv6_saddr_dst dst; struct inet6_dev *idev; struct net_device *dev; int dst_type; bool use_oif_addr = false; + int hiscore_idx = 0; dst_type = __ipv6_addr_type(daddr); dst.addr = daddr; @@ -1454,8 +1458,8 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev, dst.label = ipv6_addr_label(net, daddr, dst_type, dst.ifindex); dst.prefs = prefs; - hiscore-rule = -1; - hiscore-ifa = NULL; + scores[hiscore_idx].rule = -1; + scores[hiscore_idx].ifa = NULL; rcu_read_lock(); @@ -1480,17 +1484,19 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev, } if (use_oif_addr) { - __ipv6_dev_get_saddr(net, dst, prefs, saddr, idev, scores); + if (idev) + hiscore_idx = __ipv6_dev_get_saddr(net, dst, prefs, saddr, idev, scores, hiscore_idx); } else { for_each_netdev_rcu(net, dev) { idev = __in6_dev_get(dev); if (!idev) continue; - __ipv6_dev_get_saddr(net, dst, prefs, saddr, idev, scores); + hiscore_idx = __ipv6_dev_get_saddr(net, dst, prefs, saddr, idev, scores, hiscore_idx); } } rcu_read_unlock(); + hiscore = scores[hiscore_idx]; if (!hiscore-ifa) return -EADDRNOTAVAIL; -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
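The core idea of the fix, as far as the diff shows, is that the helper swaps which of the two score slots holds the best candidate, so the caller must learn the final slot index rather than assume it is still scores[1]. The following is a simplified userspace sketch of that pattern only. The demo_* names and the toy scoring rule (the candidate value is its own score, and values are assumed non-negative) are assumptions made for illustration; none of this is the kernel's address selection logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ipv6_saddr_score. */
struct demo_score {
	int rule;         /* higher is better; -1 means nothing found yet */
	const int *cand;  /* candidate that produced this score, or NULL */
};

/*
 * Scan one list of candidates. scores[hiscore_idx] holds the best entry so
 * far and scores[1 - hiscore_idx] is scratch space. Returns the index of
 * the slot that holds the best entry once the scan is done.
 */
static int demo_scan(const int *cands, size_t n,
		     struct demo_score scores[2], int hiscore_idx)
{
	assert(hiscore_idx == 0 || hiscore_idx == 1);

	for (size_t i = 0; i < n; i++) {
		struct demo_score *score = &scores[1 - hiscore_idx];
		struct demo_score *hiscore = &scores[hiscore_idx];

		score->rule = cands[i];  /* toy scoring: the value is the score */
		score->cand = &cands[i];

		/* A better candidate turns the scratch slot into the best slot. */
		if (score->rule > hiscore->rule)
			hiscore_idx = 1 - hiscore_idx;
	}
	return hiscore_idx;
}

/* Caller: threads the index through several scans, one per list. */
int demo_best(const int *a, size_t na, const int *b, size_t nb)
{
	struct demo_score scores[2];
	int hiscore_idx = 0;

	scores[hiscore_idx].rule = -1;
	scores[hiscore_idx].cand = NULL;

	hiscore_idx = demo_scan(a, na, scores, hiscore_idx);
	hiscore_idx = demo_scan(b, nb, scores, hiscore_idx);

	return scores[hiscore_idx].cand ? *scores[hiscore_idx].cand : -1;
}
```

If the caller ignored the returned index and always read one fixed slot, it could pick up a stale or scratch entry after an odd number of swaps, which matches the failure described in the commit message.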
[PATCH net-next 4/4 v2] gianfar: Add paged allocation and Rx S/G
The eTSEC h/w is capable of scatter/gather on the receive side too if MAXFRM MRBLR, when the allowed maximum Rx frame size is set to be greater than the maximum Rx buffer size (MRBLR). It's about time the driver makes use of this h/w capability, by supporting fixed buffer sizes and Rx S/G. The buffer size given to eTSEC for reception is fixed to 1536B (must be multiple of 64), which is the same default buffer size as before, used to accommodate standard MTU (1500B) size frames. As before, eTSEC can receive frames of up to 9600B. Individual Rx buffers are mapped to page halves (page size for eTSEC systems is 4KB). The skb is built around the first buffer of a frame (using build_skb()). In case the frame spans multiple buffers, the trailing buffers are added as Rx fragments to the skb. The last buffer in frame is marked by the L status flag. A mechanism is in place to reuse the pages owned by the driver (for Rx) for subsequent receptions. Supporting fixed size buffers allows the implementation of Rx S/G, which in turn removes the memory pressure issues the driver had before when MTU was set for jumbo frame reception. Also, in most cases, the Rx path becomes faster due to Rx page reusal, since the overhead of allocating new rx buffers is removed from the fast path. 
Signed-off-by: Claudiu Manoil claudiu.man...@freescale.com --- v2: use lstatus as u32 consistently drivers/net/ethernet/freescale/gianfar.c | 320 ++- drivers/net/ethernet/freescale/gianfar.h | 31 ++- drivers/net/ethernet/freescale/gianfar_ethtool.c | 1 - 3 files changed, 208 insertions(+), 144 deletions(-) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index 7654d5e..648ca85 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -109,7 +109,7 @@ #define TX_TIMEOUT (1*HZ) -const char gfar_driver_version[] = 1.3; +const char gfar_driver_version[] = 2.0; static int gfar_enet_open(struct net_device *dev); static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev); @@ -207,6 +207,7 @@ static void gfar_init_bds(struct net_device *ndev) rx_queue-next_to_clean = 0; rx_queue-next_to_use = 0; + rx_queue-next_to_alloc = 0; /* make sure next_to_clean != next_to_use after this * by leaving at least 1 unused descriptor @@ -222,7 +223,7 @@ static int gfar_alloc_skb_resources(struct net_device *ndev) { void *vaddr; dma_addr_t addr; - int i, j, k; + int i, j; struct gfar_private *priv = netdev_priv(ndev); struct device *dev = priv-dev; struct gfar_priv_tx_q *tx_queue = NULL; @@ -262,6 +263,7 @@ static int gfar_alloc_skb_resources(struct net_device *ndev) rx_queue-rx_bd_base = vaddr; rx_queue-rx_bd_dma_base = addr; rx_queue-ndev = ndev; + rx_queue-dev = dev; addr += sizeof(struct rxbd8) * rx_queue-rx_ring_size; vaddr += sizeof(struct rxbd8) * rx_queue-rx_ring_size; } @@ -276,21 +278,17 @@ static int gfar_alloc_skb_resources(struct net_device *ndev) if (!tx_queue-tx_skbuff) goto cleanup; - for (k = 0; k tx_queue-tx_ring_size; k++) - tx_queue-tx_skbuff[k] = NULL; + for (j = 0; j tx_queue-tx_ring_size; j++) + tx_queue-tx_skbuff[j] = NULL; } for (i = 0; i priv-num_rx_queues; i++) { rx_queue = priv-rx_queue[i]; - rx_queue-rx_skbuff = - kmalloc_array(rx_queue-rx_ring_size, - 
sizeof(*rx_queue-rx_skbuff), - GFP_KERNEL); - if (!rx_queue-rx_skbuff) + rx_queue-rx_buff = kcalloc(rx_queue-rx_ring_size, + sizeof(*rx_queue-rx_buff), + GFP_KERNEL); + if (!rx_queue-rx_buff) goto cleanup; - - for (j = 0; j rx_queue-rx_ring_size; j++) - rx_queue-rx_skbuff[j] = NULL; } gfar_init_bds(ndev); @@ -335,10 +333,8 @@ static void gfar_init_rqprm(struct gfar_private *priv) } } -static void gfar_rx_buff_size_config(struct gfar_private *priv) +static void gfar_rx_offload_en(struct gfar_private *priv) { - int frame_size = priv-ndev-mtu + ETH_HLEN + ETH_FCS_LEN; - /* set this when rx hw offload (TOE) functions are being used */ priv-uses_rxfcb = 0; @@ -347,16 +343,6 @@ static void gfar_rx_buff_size_config(struct gfar_private *priv) if (priv-hwts_rx_en) priv-uses_rxfcb = 1; - - if (priv-uses_rxfcb) - frame_size += GMAC_FCB_LEN; - - frame_size
[PATCH net-next 3/4 v2] gianfar: Use ndev, more Rx path cleanup
Use ndev instead of dev, as the rx queue back pointer to a net_device struct, to avoid name clashing with a struct device reference. This prepares the addition of a struct device back pointer to the rx queue structure. Remove duplicated rxq registration in the process. Move napi_gro_receive() outside gfar_process_frame(). Signed-off-by: Claudiu Manoil claudiu.man...@freescale.com --- v2: merge lstatus as u32 drivers/net/ethernet/freescale/gianfar.c | 54 ++-- drivers/net/ethernet/freescale/gianfar.h | 4 +-- 2 files changed, 26 insertions(+), 32 deletions(-) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index c839e76..7654d5e 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -141,8 +141,7 @@ static void gfar_netpoll(struct net_device *dev); #endif int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit); static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue); -static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb, - struct napi_struct *napi); +static void gfar_process_frame(struct net_device *ndev, struct sk_buff *skb); static void gfar_halt_nodisable(struct gfar_private *priv); static void gfar_clear_exact_match(struct net_device *dev); static void gfar_set_mac_for_addr(struct net_device *dev, int num, @@ -262,7 +261,7 @@ static int gfar_alloc_skb_resources(struct net_device *ndev) rx_queue = priv-rx_queue[i]; rx_queue-rx_bd_base = vaddr; rx_queue-rx_bd_dma_base = addr; - rx_queue-dev = ndev; + rx_queue-ndev = ndev; addr += sizeof(struct rxbd8) * rx_queue-rx_ring_size; vaddr += sizeof(struct rxbd8) * rx_queue-rx_ring_size; } @@ -593,7 +592,7 @@ static int gfar_alloc_rx_queues(struct gfar_private *priv) priv-rx_queue[i]-rx_skbuff = NULL; priv-rx_queue[i]-qindex = i; - priv-rx_queue[i]-dev = priv-ndev; + priv-rx_queue[i]-ndev = priv-ndev; } return 0; } @@ -1913,7 +1912,7 @@ static void free_skb_tx_queue(struct 
gfar_priv_tx_q *tx_queue) static void free_skb_rx_queue(struct gfar_priv_rx_q *rx_queue) { struct rxbd8 *rxbdp; - struct gfar_private *priv = netdev_priv(rx_queue-dev); + struct gfar_private *priv = netdev_priv(rx_queue-ndev); int i; rxbdp = rx_queue-rx_bd_base; @@ -2709,17 +2708,17 @@ static struct sk_buff *gfar_new_skb(struct net_device *ndev, static void gfar_rx_alloc_err(struct gfar_priv_rx_q *rx_queue) { - struct gfar_private *priv = netdev_priv(rx_queue-dev); + struct gfar_private *priv = netdev_priv(rx_queue-ndev); struct gfar_extra_stats *estats = priv-extra_stats; - netdev_err(rx_queue-dev, Can't alloc RX buffers\n); + netdev_err(rx_queue-ndev, Can't alloc RX buffers\n); atomic64_inc(estats-rx_alloc_err); } static void gfar_alloc_rx_buffs(struct gfar_priv_rx_q *rx_queue, int alloc_cnt) { - struct net_device *ndev = rx_queue-dev; + struct net_device *ndev = rx_queue-ndev; struct rxbd8 *bdp, *base; dma_addr_t bufaddr; int i; @@ -2756,10 +2755,10 @@ static void gfar_alloc_rx_buffs(struct gfar_priv_rx_q *rx_queue, rx_queue-next_to_use = i; } -static void count_errors(u32 lstatus, struct net_device *dev) +static void count_errors(u32 lstatus, struct net_device *ndev) { - struct gfar_private *priv = netdev_priv(dev); - struct net_device_stats *stats = dev-stats; + struct gfar_private *priv = netdev_priv(ndev); + struct net_device_stats *stats = ndev-stats; struct gfar_extra_stats *estats = priv-extra_stats; /* If the packet was truncated, none of the other errors matter */ @@ -2854,10 +2853,9 @@ static inline void gfar_rx_checksum(struct sk_buff *skb, struct rxfcb *fcb) } /* gfar_process_frame() -- handle one incoming packet if skb isn't NULL. 
*/ -static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb, - struct napi_struct *napi) +static void gfar_process_frame(struct net_device *ndev, struct sk_buff *skb) { - struct gfar_private *priv = netdev_priv(dev); + struct gfar_private *priv = netdev_priv(ndev); struct rxfcb *fcb = NULL; /* fcb is at the beginning if exists */ @@ -2866,10 +2864,8 @@ static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb, /* Remove the FCB from the skb * Remove the padded bytes, if there are any */ - if (priv-uses_rxfcb) { - skb_record_rx_queue(skb, fcb-rq); + if (priv-uses_rxfcb) skb_pull(skb, GMAC_FCB_LEN); - } /* Get receive timestamp from the skb */ if
[PATCH net-next 2/4 v2] gianfar: Fix and cleanup rxbd status handling
There are several (long standing) problems about how the status field of the rx buffer descriptor (rxbd) is currently handled on the error path: - too many unnecessary 16bit reads of the two halves of the rxbd status field (32bit), also resulting in overuse of endianness convesion macros; - bdp-status = RXBD_LARGE makes no sense, since the large flag is read only (only eTSEC can write it), and trying to clear the other status bits is also error prone in this context (most of the rx status bits are read only anyway). This is fixed with a single 32bit read of the status field, and then the appropriate 16bit shifting is applied to access the various status bits or the rx frame length. Also corrected the use of the RXBD_LARGE flag. Additional fix: rx_over_errors stat is incremented instead of rx_crc_errors in case of RXBD_OVERRUN occurrence. Signed-off-by: Claudiu Manoil claudiu.man...@freescale.com --- v2: lstatus is u32, not unsigned long drivers/net/ethernet/freescale/gianfar.c | 34 +--- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c index b35bf3d..c839e76 100644 --- a/drivers/net/ethernet/freescale/gianfar.c +++ b/drivers/net/ethernet/freescale/gianfar.c @@ -2756,14 +2756,14 @@ static void gfar_alloc_rx_buffs(struct gfar_priv_rx_q *rx_queue, rx_queue-next_to_use = i; } -static inline void count_errors(unsigned short status, struct net_device *dev) +static void count_errors(u32 lstatus, struct net_device *dev) { struct gfar_private *priv = netdev_priv(dev); struct net_device_stats *stats = dev-stats; struct gfar_extra_stats *estats = priv-extra_stats; /* If the packet was truncated, none of the other errors matter */ - if (status RXBD_TRUNCATED) { + if (lstatus BD_LFLAG(RXBD_TRUNCATED)) { stats-rx_length_errors++; atomic64_inc(estats-rx_trunc); @@ -2771,25 +2771,25 @@ static inline void count_errors(unsigned short status, struct net_device *dev) return; } /* Count the 
errors, if there were any */ - if (status (RXBD_LARGE | RXBD_SHORT)) { + if (lstatus BD_LFLAG(RXBD_LARGE | RXBD_SHORT)) { stats-rx_length_errors++; - if (status RXBD_LARGE) + if (lstatus BD_LFLAG(RXBD_LARGE)) atomic64_inc(estats-rx_large); else atomic64_inc(estats-rx_short); } - if (status RXBD_NONOCTET) { + if (lstatus BD_LFLAG(RXBD_NONOCTET)) { stats-rx_frame_errors++; atomic64_inc(estats-rx_nonoctet); } - if (status RXBD_CRCERR) { + if (lstatus BD_LFLAG(RXBD_CRCERR)) { atomic64_inc(estats-rx_crcerr); stats-rx_crc_errors++; } - if (status RXBD_OVERRUN) { + if (lstatus BD_LFLAG(RXBD_OVERRUN)) { atomic64_inc(estats-rx_overrun); - stats-rx_crc_errors++; + stats-rx_over_errors++; } } @@ -2921,6 +2921,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit) i = rx_queue-next_to_clean; while (rx_work_limit--) { + u32 lstatus; if (cleaned_cnt = GFAR_RX_BUFF_ALLOC) { gfar_alloc_rx_buffs(rx_queue, cleaned_cnt); @@ -2928,7 +2929,8 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit) } bdp = rx_queue-rx_bd_base[i]; - if (be16_to_cpu(bdp-status) RXBD_EMPTY) + lstatus = be32_to_cpu(bdp-lstatus); + if (lstatus BD_LFLAG(RXBD_EMPTY)) break; /* order rx buffer descriptor reads */ @@ -2940,13 +2942,13 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit) dma_unmap_single(priv-dev, be32_to_cpu(bdp-bufPtr), priv-rx_buffer_size, DMA_FROM_DEVICE); - if (unlikely(!(be16_to_cpu(bdp-status) RXBD_ERR) -be16_to_cpu(bdp-length) priv-rx_buffer_size)) - bdp-status = cpu_to_be16(RXBD_LARGE); + if (unlikely(!(lstatus BD_LFLAG(RXBD_ERR)) +(lstatus BD_LENGTH_MASK) priv-rx_buffer_size)) + lstatus |= BD_LFLAG(RXBD_LARGE); - if (unlikely(!(be16_to_cpu(bdp-status) RXBD_LAST) || -be16_to_cpu(bdp-status) RXBD_ERR)) { - count_errors(be16_to_cpu(bdp-status), dev); + if (unlikely(!(lstatus BD_LFLAG(RXBD_LAST)) || +(lstatus BD_LFLAG(RXBD_ERR { + count_errors(lstatus, dev); /* discard faulty buffer */
[PATCH net-next 0/4 v2] gianfar: Add Rx S/G
Hi David,

This patch-set introduces scatter/gather support on the Rx side,
addressing Rx path performance issues in the driver.

Thanks.

As an example, two boards connected back-to-back were used to measure
the throughput, running the same kernel 4.1, before and after applying
these patches. The netperf UDP_STREAM results below show that the
bottleneck lies on the Rx side BEFORE applying the patches, and that
the Rx throughput is even lower with a larger MTU. AFTER applying the
patches the Rx bottleneck is gone (Rx throughput matches the Tx one)
and the Rx throughput is not influenced by MTU size any longer
(as expected).

BEFORE:

1) MTU 1500 (default)

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840     512   150.00     20119124      0      549.4   100.00   14.911
163840           150.00     14057349             383.9   100.00   14.911

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840      64   150.00     23654013      0       80.7   100.00   101.463
163840           150.00     15875288              54.2   100.00   101.463

2) MTU 8000

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840     512   150.00     20067232      0      548.0   100.00   14.950
163840           150.00      6113498             166.9    99.95   14.942

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840      64   150.00     23621279      0       80.6   100.00   101.604
163840           150.00      5868602              20.0    99.96   101.563

AFTER:
(both MTU 1500 and MTU 8000)

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840     512   150.00     19914969      0      543.8   100.00   15.064
163840           150.00     19914969             543.8    99.35   14.966

root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64
MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET
Socket  Message  Elapsed      Messages                   CPU      Service
Size    Size     Time         Okay Errors   Throughput   Util     Demand
bytes   bytes    secs            #      #   10^6bits/sec % SS     us/KB

163840      64   150.00     23433989      0       80.0   100.00   102.416
163840           150.00     23433989              80.0    99.62   102.023

Claudiu Manoil (4):
  gianfar: Bundle Rx allocation, cleanup
  gianfar: Fix and cleanup rxbd status handling
  gianfar: Use ndev, more Rx path cleanup
  gianfar: Add paged allocation and Rx S/G

 drivers/net/ethernet/freescale/gianfar.c         | 496 +--
 drivers/net/ethernet/freescale/gianfar.h         |  72 ++--
 drivers/net/ethernet/freescale/gianfar_ethtool.c |   4 +-
 3 files changed, 331 insertions(+), 241 deletions(-)

-- 
1.7.11.7
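As a sanity check on the netperf figures in this posting, the throughput column can be re-derived from the message count, message size and elapsed time. This is only a sketch: it assumes the column is simply messages x bytes x 8 / seconds / 10^6, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: UDP_STREAM throughput in 10^6 bits/sec. */
static double udp_stream_mbps(uint64_t messages, unsigned int msg_bytes,
			      double elapsed_secs)
{
	assert(elapsed_secs > 0.0);
	return (double)messages * msg_bytes * 8.0 / elapsed_secs / 1e6;
}

/* Share of the sent messages that the receiver actually got, in percent. */
static double rx_share_percent(uint64_t rx_messages, uint64_t tx_messages)
{
	assert(tx_messages > 0);
	return 100.0 * (double)rx_messages / (double)tx_messages;
}
```

For the MTU 1500, 512-byte run before the patches, 20119124 sent messages work out to about 549.4 and 14057349 received messages to about 383.9, so roughly 70% of the traffic reached the receiver. In the AFTER runs both counts are equal.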
Re: [PATCH net-next v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface.
Hi,

Hajime Tazaki wrote:
> Yoshifuji-san,
>
> At Mon, 13 Jul 2015 17:38:48 +0900,
> Erik Kline wrote:
>>
>> On 13 July 2015 at 15:32, YOSHIFUJI Hideaki
>> hideaki.yoshif...@miraclelinux.com wrote:
>>> Hi,
>>>
>>> Erik Kline wrote:
>>>> Hmm, when I run a UML linux with this patch (which, I'm ashamed to
>>>> say, I failed to do before) I get these kinds of errors:
>>>>
>>>>     unregister_netdevice: waiting for TAPdevice to become free. Usage count = 1
>>>>     unregister_netdevice: waiting for TAPdevice to become free. Usage count = 1
>>>>
>>>> Perhaps they're unrelated... I'm still investigating.
>>>
>>> Would you test attached patch please?
>>
>> That does look logically correct, so +1 to it regardless, but it does
>> not seem to have fixed the issue I'm seeing.
>>
>> I still haven't produced the smallest possible demo test program.
>
> Sorry to jump in, but this patch has a side effect: my TCP and DCCP
> tests (IPv6) fail.
>
> Because the newly added function (__ipv6_dev_get_saddr) doesn't update
> the caller's 'hiscore' variable (it swaps it with 'score' in some
> cases), the caller (ipv6_dev_get_saddr) can't fill in an appropriate
> saddr in the end.
>
> I don't know if this is a good patch, but the following diff makes my
> test happy.

We should update score as well...

> -- Hajime
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4ab74d5..c4e9416 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1363,7 +1363,8 @@ static void __ipv6_dev_get_saddr(struct net *net,
>  				 unsigned int prefs,
>  				 const struct in6_addr *saddr,
>  				 struct inet6_dev *idev,
> -				 struct ipv6_saddr_score *scores)
> +				 struct ipv6_saddr_score *scores,
> +				 struct ipv6_saddr_score **in_hiscore)
>  {
>  	struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
>
> @@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
>  				in6_ifa_hold(score->ifa);
>
>  				swap(hiscore, score);
> +				*in_hiscore = hiscore;
>
>  				/* restore our iterator */
>  				score->ifa = hiscore->ifa;
> @@ -1480,13 +1482,15 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
>  	}
>
>  	if (use_oif_addr) {
> -		__ipv6_dev_get_saddr(net, dst, prefs, saddr, idev, scores);
> +		__ipv6_dev_get_saddr(net, dst, prefs, saddr, idev,
> +				     scores, &hiscore);
>  	} else {
>  		for_each_netdev_rcu(net, dev) {
>  			idev = __in6_dev_get(dev);
>  			if (!idev)
>  				continue;
> -			__ipv6_dev_get_saddr(net, dst, prefs, saddr, idev, scores);
> +			__ipv6_dev_get_saddr(net, dst, prefs, saddr, idev,
> +					     scores, &hiscore);
>  		}
>  	}
>  	rcu_read_unlock();

-- 
Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
Technical Division, MIRACLE LINUX CORPORATION
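The side effect Hajime describes looks like the usual C pitfall of swapping pointers that only exist as locals in the callee. A minimal userland sketch, not the kernel code: `struct score`, `pick_by_value()` and `pick_with_out_param()` are made-up names that only mimic the shape of __ipv6_dev_get_saddr().

```c
#include <assert.h>
#include <stddef.h>

struct score {
	int rule;	/* stand-in for the real scoring state */
};

/* Swaps its local pointers only; the caller's notion of "hiscore" is untouched. */
static void pick_by_value(struct score *scores)
{
	struct score *score = &scores[0], *hiscore = &scores[1];

	if (score->rule > hiscore->rule) {
		struct score *tmp = hiscore;

		hiscore = score;
		score = tmp;
	}
	(void)score;
	(void)hiscore;
}

/* Same logic, but the winner is reported back through an out-parameter. */
static void pick_with_out_param(struct score *scores, struct score **in_hiscore)
{
	struct score *score = &scores[0], *hiscore = &scores[1];

	assert(in_hiscore != NULL);
	if (score->rule > hiscore->rule) {
		struct score *tmp = hiscore;

		hiscore = score;
		score = tmp;
		*in_hiscore = hiscore;
	}
	(void)score;
}
```

With scores[0] holding the better candidate, the first variant leaves the caller pointing at scores[1], while the second one moves the caller's pointer to scores[0], which is what the extra `in_hiscore` argument in the diff is for.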
Re: [PATCH net v4] rtnl/bond: don't send rtnl msg for unregistered iface
Hello,

I have a quick question about this patch.

On Wed, May 13, 2015 at 2:19 PM, Nicolas Dichtel nicolas.dich...@6wind.com wrote:
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 837d30b5ffed..7b25f1ef3d75 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -2415,6 +2415,9 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned int change,
>  {
>  	struct sk_buff *skb;
>
> +	if (dev->reg_state != NETREG_REGISTERED)
> +		return;
> +

Is this check correct, and is it placed at the correct location? The
reason I am asking is as follows. In rollback_registered_many(),
dev->reg_state is set to NETREG_UNREGISTERING for devices that will be
unregistered. When rtmsg_ifinfo_build_skb(RTM_DELLINK, ...) is called in
the following loop in rollback_registered_many(), this comparison will
always be true and no DELLINK event is generated.

This change caused some of my applications to stop behaving as expected,
because DELLINK is missing when network devices are removed. I also see
no DELLINK with "ip mon link". Removing the check restores the old
behavior (DELLINK events are generated). My machine is running 3.18.18,
which includes this fix.

Thanks in advance for any help,
Kristian
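The ordering problem described here can be modelled in a few lines of userland C. This is only a sketch of the reasoning, not kernel code: the enum is a reduced stand-in for the kernel's registration states, and `struct fake_dev`, `fake_rtmsg_ifinfo()` and `fake_unregister_sends_dellink()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-in for the kernel's device registration states. */
enum reg_state {
	NETREG_UNINITIALIZED,
	NETREG_REGISTERED,
	NETREG_UNREGISTERING,
};

struct fake_dev {
	enum reg_state reg_state;
};

/* Models the patched check: returns true if a message would be sent. */
static bool fake_rtmsg_ifinfo(const struct fake_dev *dev)
{
	if (dev->reg_state != NETREG_REGISTERED)
		return false;
	return true;
}

/* Models the unregister path: the state changes first, then the notification runs. */
static bool fake_unregister_sends_dellink(struct fake_dev *dev)
{
	dev->reg_state = NETREG_UNREGISTERING;
	return fake_rtmsg_ifinfo(dev);
}
```

A registered device still produces a message, but once the state has been flipped to NETREG_UNREGISTERING the early return fires, which matches the missing DELLINK seen with "ip mon link".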
Re: net: Fix skb csum races when peeking
On Mon, Jul 13, 2015 at 08:01:42PM +0800, Herbert Xu wrote:
>
> PS we seem to no longer use the hardware checksum in case of
> CHECKSUM_COMPLETE, I wonder why that is?

Nevermind, it's still there. I was just looking in the wrong place.
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH v2] add stealth mode
On 2015-07-12 19:13, Matteo Croce wrote:
> 2015-07-08 15:32 GMT+02:00 Austin S Hemmelgarn ahferro...@gmail.com:
>> On 2015-07-06 15:44, Matteo Croce wrote:
>> Just to name a few that I know of off the top of my head:
>> 1. IP packets with any protocol number not supported by your current
>> kernel (these return a special ICMP message).
>
> Right, I'll handle them
>
>> 2. SCTP INIT and COOKIE_ECHO chunks when you have SCTP enabled in the
>> kernel.
>
> Well, I've never played with SCTP before

It should still be checked, as should DCCP and RDS (those are the only
other Layer 4 protocols that I have ever actually seen people try to
scan hosts with besides TCP/UDP/SCTP). SCTP itself is not hugely
prevalent outside of some clustering uses, but it is still seen on the
internet sometimes (for example, Gentoo has optional patches for OpenSSH
to use SCTP).

>> 3. Theoretically, some IGMP messages.
>> 4. NDP messages.
>> 5. ARP queries looking for the machine's IP addresses.
>
> Yes I know, but it's unlikely to receive these packets from WAN, right?
> My flag is intended to be used mostly on WAN interfaces,
> machines in LAN should be easily discoverable IMHO.

In theory it's unlikely, but if you use any kind of IPv4 multicast on
the WAN you will get IGMP (and MLD for IPv6 multicast). You may also get
some NDP queries if you are using IPv6 and your WAN is itself behind a
NAT router (and yes, there are ISPs who do that).

>> 6. Certain odd flag combinations on single TCP packets (check the
>> documentation for Nmap for more info regarding these), which I believe
>> (although I may be reading the code wrong) you aren't accounting for.
>
> I've tried many TCP flag combinations with hping3: NUL, SYN/ACK, ACK,
> SYN/FIN, etc. They don't get any response when the flag is set

How about FIN/ACK and FIN/PSH/URG?

>> 7. DAD queries.
>
> Never looked at these packets, are they a subset of NDP?

Kind of, it's an ICMPv6 extension for detecting if a SLAAC-configured
address is already in use. Most distros have support for it enabled by
default.

>> 8. ICMP address mask queries (which you also don't appear to account
>> for).
>
> It's deprecated and actually it doesn't get any response already

Just because it's deprecated doesn't mean you shouldn't account for it,
although it does appear to get dropped by default by the kernel.

You should also test how different combinations of sysctls under
/proc/sys/net affect this (there are for example already sysctls for
ignoring certain types of ICMP packets).
[PATCH net] bridge: mdb: fix double add notification
Since the mdb add/del code was introduced there have been 2
br_mdb_notify calls when doing br_mdb_add() resulting in 2 notifications
on each add.

Example:
 Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
 Before patch:
 root@debian:~# bridge monitor all
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent

 After patch:
 root@debian:~# bridge monitor all
 [MDB]dev br0 port eth1 grp 239.0.0.1 permanent

Signed-off-by: Nikolay Aleksandrov niko...@cumulusnetworks.com
Fixes: cfd567543590 (bridge: add support of adding and deleting mdb entries)
---
 net/bridge/br_mdb.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index c11cf2611db0..1198a3dbad95 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -351,7 +351,6 @@ static int br_mdb_add_group(struct net_bridge *br, struct net_bridge_port *port,
 	if (state == MDB_TEMPORARY)
 		mod_timer(&p->timer, now + br->multicast_membership_interval);

-	br_mdb_notify(br->dev, port, group, RTM_NEWMDB);
 	return 0;
 }

-- 
1.9.1
Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users
On 13-07-2015 07:39, Neil Horman wrote:
> On Fri, Jul 10, 2015 at 06:21:14PM -0700, David Miller wrote:
>> From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
>> Date: Thu, 9 Jul 2015 11:15:19 -0300
>>
>>> SCTP has this operation to peel off associations from a given socket
>>> and create a new socket using this association. We currently have two
>>> ways to use this operation:
>>> - via getsockopt(), on which it will also create and return a file
>>>   descriptor for this new socket
>>> - via sctp_do_peeloff(), which is for kernel only
>>>
>>> The caveat with using sctp_do_peeloff() directly is that it creates a
>>> dependency to SCTP module, while all other operations are handled via
>>> kernel_{socket,sendmsg,getsockopt...}() interface. This causes the
>>> kernel to load SCTP module even when it's not directly used
>>>
>>> This patch then updates SCTP_SOCKOPT_PEELOFF so that for kernel users
>>> of this protocol it will not allocate a file descriptor but instead
>>> just return the socket pointer directly.
>>>
>>> If called by a user application it will work as before.
>>>
>>> Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
>>
>> I do not like this at all.
>>
>> Socket option implementations should not change their behavior or what
>> data structures they consume or return just because the socket happens
>> to be a kernel socket.
>
> But in this case it's necessary, as the kernel here can't allocate an
> fd, due to serious leakage (see commit
> 2f2d76cc3e938389feee671b46252dde6880b3b7). Initially Marcelo had created
> duplicate code paths, one to return an fd, one to return a file struct.
> If you would rather go in that direction, I'm sure he can propose it
> again, but that seems less correct to me than this solution.

Yes. dlm is the only user of this option within the kernel today and it
causes serious problems, as Neil just referenced. Another good result of
this implementation is that we are preventing such leakage from
happening again in the future.

>> I'm not applying this series, sorry.
>>
>> Also, your patch series lacked an initial PATCH 0/N posting, so you
>> could at least spend the time to discuss this patch series at a high
>> level and explain your overall motivations.
>
> That was in the initial posting. It should have been reposted, but if
> you're interested:
> http://marc.info/?l=linux-sctp&m=143449456219518&w=2

My bad. Won't happen again.

Thanks,
Marcelo
Re: [PATCH] nf: IDLETIMER: fix lockdep warning
On Mon, Jul 13, 2015 at 6:20 AM, Pablo Neira Ayuso pa...@netfilter.org wrote:
> On Thu, Jul 09, 2015 at 05:15:01PM -0700, Dmitry Torokhov wrote:
>> Dynamically allocated sysfs attributes should be initialized with
>> sysfs_attr_init() otherwise lockdep will be angry with us:
>>
>> [ 45.468653] BUG: key ffc030fad4e0 not in .data!
>> [ 45.468655] [ cut here ]
>> [ 45.468666] WARNING: CPU: 0 PID: 1176 at /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 lockdep_init_map+0x12c/0x490()
>> [ 45.468672] DEBUG_LOCKS_WARN_ON(1)
>> [ 45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U W 3.18.0 #43
>> [ 45.468674] Hardware name: XXX
>> [ 45.468675] Call trace:
>> [ 45.468680] [ffc0002072b4] dump_backtrace+0x0/0x10c
>> [ 45.468683] [ffc0002073d0] show_stack+0x10/0x1c
>> [ 45.468688] [ffc000a86cd4] dump_stack+0x74/0x94
>> [ 45.468692] [ffc000217ae0] warn_slowpath_common+0x84/0xb0
>> [ 45.468694] [ffc000217b84] warn_slowpath_fmt+0x4c/0x58
>> [ 45.468697] [ffc0002530a4] lockdep_init_map+0x128/0x490
>> [ 45.468701] [ffc000367ef0] __kernfs_create_file+0x80/0xe4
>> [ 45.468704] [ffc00036862c] sysfs_add_file_mode_ns+0x104/0x170
>> [ 45.468706] [ffc00036870c] sysfs_create_file_ns+0x58/0x64
>> [ 45.468711] [ffc000930430] idletimer_tg_checkentry+0x14c/0x324
>> [ 45.468714] [ffc00092a728] xt_check_target+0x170/0x198
>> [ 45.468717] [ffc000993efc] check_target+0x58/0x6c
>> [ 45.468720] [ffc000994c64] translate_table+0x30c/0x424
>> [ 45.468723] [ffc00099529c] do_ipt_set_ctl+0x144/0x1d0
>> [ 45.468728] [ffc0009079f0] nf_setsockopt+0x50/0x60
>> [ 45.468732] [ffc000946870] ip_setsockopt+0x8c/0xb4
>> [ 45.468735] [ffc0009661c0] raw_setsockopt+0x10/0x50
>> [ 45.468739] [ffc0008c1550] sock_common_setsockopt+0x14/0x20
>> [ 45.468742] [ffc0008bd190] SyS_setsockopt+0x88/0xb8
>> [ 45.468744] ---[ end trace 41d156354d18c039 ]---
>
> Applied, thanks.
>
> One question:
>
>> Change-Id: I1da5cd96fc8e1e1e4209e81eba1165a42d4d45e9
>
> BTW, does this gerrit change ID provide any public information?
>
> Thanks.

Argh, I am sorry, I forgot to clean this out when mailing the patch. In
this particular case you can find the change in AOSP gerrit at
https://android-review.googlesource.com but without such context this
change-id is of course useless.

Thanks,
Dmitry
mmap()ed AF_NETLINK: lockdep and sleep-in-atomic warnings
Hi,

This simple test case triggers a few locking asserts in the kernel:

#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>

#define SOL_NETLINK 270

int main(int argc, char **argv)
{
	unsigned int block_size = 16 * 4096;
	struct nl_mmap_req req = {
		.nm_block_size	= block_size,
		.nm_block_nr	= 64,
		.nm_frame_size	= 16384,
		.nm_frame_nr	= 64 * block_size / 16384,
	};
	unsigned int ring_size;
	int fd;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
	if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
		exit(1);
	if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
		exit(1);

	ring_size = req.nm_block_nr * req.nm_block_size;
	mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	return 0;
}

+++ exited with 0 +++
[2.500126] BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
[2.501328] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
[2.501997] 3 locks held by init/1:
[2.502380] #0: (reboot_mutex){+.+...}, at: [81080959] SyS_reboot+0xa9/0x220
[2.503328] #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [8107f379] __blocking_notifier_call_chain+0x39/0x70
[2.504659] #2: (rcu_callback){..}, at: [810d32e0] rcu_do_batch.isra.49+0x160/0x10c0
[2.505724] Preemption disabled at:[8145365f] __delay+0xf/0x20
[2.506443]
[2.506612] CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-9-gbddf4c4818e0 #253
[2.507378] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
[2.508386] 88017b3d8000 88027bc03c38 81929ceb 0102
[2.509233] 88027bc03c68 81085a9d 0002
[2.510057] 81ca2a20 0268 88027bc03c98
[2.510882] Call Trace:
[2.511146] IRQ [81929ceb] dump_stack+0x4f/0x7b
[2.511763] [81085a9d] ___might_sleep+0x16d/0x270
[2.512476] [81085bed] __might_sleep+0x4d/0x90
[2.513071] [8192e96f] mutex_lock_nested+0x2f/0x430
[2.513683] [81932fed] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[2.514385] [81464143] ? __this_cpu_preempt_check+0x13/0x20
[2.515066] [8182fc3d] netlink_set_ring+0x1ed/0x350
[2.515694] [8182e000] ? netlink_undo_bind+0x70/0x70
[2.516411] [8182fe20] netlink_sock_destruct+0x80/0x150
[2.517070] [817e484d] __sk_free+0x1d/0x160
[2.517607] [817e49a9] sk_free+0x19/0x20
[2.518118] [8182e020] deferred_put_nlk_sk+0x20/0x30
[2.518735] [810d391c] rcu_do_batch.isra.49+0x79c/0x10c0
[2.519386] [810d32e0] ? rcu_do_batch.isra.49+0x160/0x10c0
[2.520101] [810d787b] rcu_process_callbacks+0xdb/0x6d0
[2.520790] [8105dd52] __do_softirq+0x152/0x630
[2.521370] [8105e3be] irq_exit+0x8e/0xb0
[2.521895] [81936366] smp_apic_timer_interrupt+0x46/0x60
[2.522558] [8145365f] ? __delay+0xf/0x20
[2.523079] [81934a00] apic_timer_interrupt+0x70/0x80
[2.523705] EOI [8145365f] ? __delay+0xf/0x20
[2.524366] [810b26ab] ? in_lock_functions+0x1b/0x20
[2.524995] [8108ab81] get_parent_ip+0x11/0x50
[2.525562] [8108ad1f] preempt_count_sub+0x9f/0xf0
[2.526179] [81453778] delay_tsc+0x68/0xc0
[2.526706] [8145365f] __delay+0xf/0x20
[2.527207] [8145369a] __const_udelay+0x2a/0x30
[2.527781] [8172d05a] md_notify_reboot+0xea/0x100
[2.528489] [8107f379] ? __blocking_notifier_call_chain+0x39/0x70
[2.529236] [8107efc6] notifier_call_chain+0x66/0x90
[2.529856] [8107f391] __blocking_notifier_call_chain+0x51/0x70
[2.530570] [810ae8c6] ? __lock_acquire+0x606/0xf50
[2.531172] [8107f3c6] blocking_notifier_call_chain+0x16/0x20
[2.531869] [8108061d] kernel_restart_prepare+0x1d/0x40
[2.532593] [810806e6] kernel_restart+0x16/0x60
[2.533183] [81080a07] SyS_reboot+0x157/0x220
[2.533738] [81010778] ? __restore_xstate_sig+0xf8/0x720
[2.534390] [81464127] ? debug_smp_processor_id+0x17/0x20
[2.535051] [810a836e] ? put_lock_stats.isra.19+0xe/0x30
[2.535707] [81933040] ? _raw_spin_unlock_irq+0x30/0x60
[2.536446] [8108ad2b] ? preempt_count_sub+0xab/0xf0
[2.537112] [81933daa] ? syscall_return+0x11/0x54
[2.537709] [81464143] ? __this_cpu_preempt_check+0x13/0x20
[2.538399]
[PATCH net-next 1/4 v2] gianfar: Bundle Rx allocation, cleanup
Use a more common consumer/producer index design to improve
rx buffer allocation. Instead of allocating a single new buffer
(skb) on each iteration, bundle the allocation of several rx
buffers at a time. This also opens the path for further memory
optimizations.

Remove useless check of rxq->rfbptr, since this patch touches
rx pause frame handling code as well. rxq->rfbptr is always
initialized as part of Rx BD ring init.
Remove redundant (and misleading) 'amount_pull' parameter.

Signed-off-by: Claudiu Manoil claudiu.man...@freescale.com
---
v2: none

 drivers/net/ethernet/freescale/gianfar.c         | 201 ---
 drivers/net/ethernet/freescale/gianfar.h         |  39 +++--
 drivers/net/ethernet/freescale/gianfar_ethtool.c |   3 +
 3 files changed, 136 insertions(+), 107 deletions(-)

diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index ff87502..b35bf3d 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -116,8 +116,8 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev);
 static void gfar_reset_task(struct work_struct *work);
 static void gfar_timeout(struct net_device *dev);
 static int gfar_close(struct net_device *dev);
-static struct sk_buff *gfar_new_skb(struct net_device *dev,
-				    dma_addr_t *bufaddr);
+static void gfar_alloc_rx_buffs(struct gfar_priv_rx_q *rx_queue,
+				int alloc_cnt);
 static int gfar_set_mac_address(struct net_device *dev);
 static int gfar_change_mtu(struct net_device *dev, int new_mtu);
 static irqreturn_t gfar_error(int irq, void *dev_id);
@@ -142,7 +142,7 @@ static void gfar_netpoll(struct net_device *dev);
 int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit);
 static void gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue);
 static void gfar_process_frame(struct net_device *dev, struct sk_buff *skb,
-			       int amount_pull, struct napi_struct *napi);
+			       struct napi_struct *napi);
 static void gfar_halt_nodisable(struct gfar_private *priv);
 static void gfar_clear_exact_match(struct net_device *dev);
 static void gfar_set_mac_for_addr(struct net_device *dev, int num,
@@ -169,17 +169,15 @@ static void gfar_init_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
 	bdp->lstatus = cpu_to_be32(lstatus);
 }

-static int gfar_init_bds(struct net_device *ndev)
+static void gfar_init_bds(struct net_device *ndev)
 {
 	struct gfar_private *priv = netdev_priv(ndev);
 	struct gfar __iomem *regs = priv->gfargrp[0].regs;
 	struct gfar_priv_tx_q *tx_queue = NULL;
 	struct gfar_priv_rx_q *rx_queue = NULL;
 	struct txbd8 *txbdp;
-	struct rxbd8 *rxbdp;
 	u32 __iomem *rfbptr;
 	int i, j;
-	dma_addr_t bufaddr;

 	for (i = 0; i < priv->num_tx_queues; i++) {
 		tx_queue = priv->tx_queue[i];
@@ -207,33 +205,18 @@ static int gfar_init_bds(struct net_device *ndev)
 	rfbptr = &regs->rfbptr0;
 	for (i = 0; i < priv->num_rx_queues; i++) {
 		rx_queue = priv->rx_queue[i];
-		rx_queue->cur_rx = rx_queue->rx_bd_base;
-		rx_queue->skb_currx = 0;
-		rxbdp = rx_queue->rx_bd_base;
-
-		for (j = 0; j < rx_queue->rx_ring_size; j++) {
-			struct sk_buff *skb = rx_queue->rx_skbuff[j];

-			if (skb) {
-				bufaddr = be32_to_cpu(rxbdp->bufPtr);
-			} else {
-				skb = gfar_new_skb(ndev, &bufaddr);
-				if (!skb) {
-					netdev_err(ndev, "Can't allocate RX buffers\n");
-					return -ENOMEM;
-				}
-				rx_queue->rx_skbuff[j] = skb;
-			}
+		rx_queue->next_to_clean = 0;
+		rx_queue->next_to_use = 0;

-			gfar_init_rxbdp(rx_queue, rxbdp, bufaddr);
-			rxbdp++;
-		}
+		/* make sure next_to_clean != next_to_use after this
+		 * by leaving at least 1 unused descriptor
+		 */
+		gfar_alloc_rx_buffs(rx_queue, gfar_rxbd_unused(rx_queue));

 		rx_queue->rfbptr = rfbptr;
 		rfbptr += 2;
 	}
-
-	return 0;
 }

 static int gfar_alloc_skb_resources(struct net_device *ndev)
@@ -311,8 +294,7 @@ static int gfar_alloc_skb_resources(struct net_device *ndev)
 			rx_queue->rx_skbuff[j] = NULL;
 	}

-	if (gfar_init_bds(ndev))
-		goto cleanup;
+	gfar_init_bds(ndev);

 	return 0;

@@ -1639,10 +1621,7 @@ static int gfar_restore(struct device *dev)
 		return 0;
 	}

-	if (gfar_init_bds(ndev)) {
-
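The "leave at least 1 unused descriptor" comment in gfar_init_bds() reflects the usual way a producer/consumer ring tells "empty" from "full" when both ends are plain indices. A minimal userland sketch of that idea follows. It assumes a ring of `size` descriptors; `struct ring`, `ring_unused()` and `ring_refill()` are made-up names for illustration, and the body of gfar_rxbd_unused() itself is not shown in this excerpt.

```c
#include <assert.h>

struct ring {
	int size;		/* number of descriptors in the ring */
	int next_to_use;	/* producer index: next slot to refill */
	int next_to_clean;	/* consumer index: next slot to process */
};

/* Slots the producer may still fill, always keeping one slot unused. */
static int ring_unused(const struct ring *r)
{
	if (r->next_to_clean > r->next_to_use)
		return r->next_to_clean - r->next_to_use - 1;

	return r->size + r->next_to_clean - r->next_to_use - 1;
}

/* Refill alloc_cnt slots in one go, advancing the producer index with wrap-around. */
static int ring_refill(struct ring *r, int alloc_cnt)
{
	int filled = 0;

	assert(alloc_cnt >= 0 && alloc_cnt <= ring_unused(r));
	while (filled < alloc_cnt) {
		r->next_to_use = (r->next_to_use + 1) % r->size;
		filled++;
	}
	return filled;
}
```

With both indices at 0 on an 8-entry ring, 7 slots can be filled; after that the indices still differ, so equal indices can only mean the state right after init. Once the consumer has cleaned 3 slots, 3 more can be refilled in a single bundle.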
Re: [PATCH net-next] tc: fix tc actions in case of shared skb
On 07/10/15 20:10, Alexei Starovoitov wrote:
> TC actions need to check for very unlikely event skb->users != 1,
> otherwise subsequent pskb_may_pull/pskb_expand_head will crash.
> When skb_shared() just drop the packet, since in the middle of actions
> it's too late to call skb_share_check(), since classifiers/actions
> assume the same skb pointer.

Alexei,
To add to what Dave said - are the rules specified here:
Documentation/networking/tc-actions-env-rules.txt
insufficient?

cheers,
jamal
Re: [PATCH] nf: IDLETIMER: fix lockdep warning
On Thu, Jul 09, 2015 at 05:15:01PM -0700, Dmitry Torokhov wrote:
> Dynamically allocated sysfs attributes should be initialized with
> sysfs_attr_init() otherwise lockdep will be angry with us:
>
> [ 45.468653] BUG: key ffc030fad4e0 not in .data!
> [ 45.468655] [ cut here ]
> [ 45.468666] WARNING: CPU: 0 PID: 1176 at /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 lockdep_init_map+0x12c/0x490()
> [ 45.468672] DEBUG_LOCKS_WARN_ON(1)
> [ 45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U W 3.18.0 #43
> [ 45.468674] Hardware name: XXX
> [ 45.468675] Call trace:
> [ 45.468680] [ffc0002072b4] dump_backtrace+0x0/0x10c
> [ 45.468683] [ffc0002073d0] show_stack+0x10/0x1c
> [ 45.468688] [ffc000a86cd4] dump_stack+0x74/0x94
> [ 45.468692] [ffc000217ae0] warn_slowpath_common+0x84/0xb0
> [ 45.468694] [ffc000217b84] warn_slowpath_fmt+0x4c/0x58
> [ 45.468697] [ffc0002530a4] lockdep_init_map+0x128/0x490
> [ 45.468701] [ffc000367ef0] __kernfs_create_file+0x80/0xe4
> [ 45.468704] [ffc00036862c] sysfs_add_file_mode_ns+0x104/0x170
> [ 45.468706] [ffc00036870c] sysfs_create_file_ns+0x58/0x64
> [ 45.468711] [ffc000930430] idletimer_tg_checkentry+0x14c/0x324
> [ 45.468714] [ffc00092a728] xt_check_target+0x170/0x198
> [ 45.468717] [ffc000993efc] check_target+0x58/0x6c
> [ 45.468720] [ffc000994c64] translate_table+0x30c/0x424
> [ 45.468723] [ffc00099529c] do_ipt_set_ctl+0x144/0x1d0
> [ 45.468728] [ffc0009079f0] nf_setsockopt+0x50/0x60
> [ 45.468732] [ffc000946870] ip_setsockopt+0x8c/0xb4
> [ 45.468735] [ffc0009661c0] raw_setsockopt+0x10/0x50
> [ 45.468739] [ffc0008c1550] sock_common_setsockopt+0x14/0x20
> [ 45.468742] [ffc0008bd190] SyS_setsockopt+0x88/0xb8
> [ 45.468744] ---[ end trace 41d156354d18c039 ]---

Applied, thanks.

One question:

> Change-Id: I1da5cd96fc8e1e1e4209e81eba1165a42d4d45e9

BTW, does this gerrit change ID provide any public information?

Thanks.
Re: [net-next 12/16] i40evf: don't delete all the filters
Hello.

On 7/13/2015 12:08 PM, Jeff Kirsher wrote:

> From: Mitch Williams mitch.a.willi...@intel.com
>
> Due to an inverted conditional, the driver was marking all of its MAC
> filters for deletion every time set_rx_mode was called. Depending upon
> the timing of the calls to set_rx_mode and the processing of the admin
> queue, the driver would (accidentally) end up with a varying number of
> functional filters.
>
> Correct this logic so that MAC filters are added and removed correctly.
> Add a check for the driver's hardware MAC address so that this filter
> doesn't get removed incorrectly.
>
> Change-ID: Ib3e7c4a5b53df6835f164fe44cb778cb71f8aff8
> Signed-off-by: Mitch Williams mitch.a.willi...@intel.com
> Tested-by: Jim Young james.m.yo...@intel.com
> Signed-off-by: Jeff Kirsher jeffrey.t.kirs...@intel.com
> ---
>  drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
> index 94eff4a..07f6052 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
> @@ -892,8 +892,10 @@ static void i40evf_set_rx_mode(struct net_device *netdev)
>  				break;
>  			}
>  		}
> +		if (ether_addr_equal(f->macaddr, adapter->hw.mac.addr))
> +				found = true;

   This line is indented too much.

[...]

WBR, Sergei
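For readers who have not seen this pattern, the filter bookkeeping the commit message describes can be sketched in userland C. This is a rough model only, not the i40evf code: `struct mac_filter`, `mac_equal()`, `sync_filters()` and the array-based lists are hypothetical, and the excerpt does not show which conditional was originally inverted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

struct mac_filter {
	unsigned char macaddr[ETH_ALEN];
	bool remove;	/* set when the filter should be deleted */
};

static bool mac_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, ETH_ALEN) == 0;
}

/*
 * Mark for removal every filter that is neither in the wanted list
 * nor the device's own hardware address. Returns the number marked.
 */
static size_t sync_filters(struct mac_filter *filters, size_t n_filters,
			   const unsigned char (*wanted)[ETH_ALEN],
			   size_t n_wanted, const unsigned char *hw_addr)
{
	size_t marked = 0;

	for (size_t i = 0; i < n_filters; i++) {
		bool found = false;

		for (size_t j = 0; j < n_wanted; j++) {
			if (mac_equal(filters[i].macaddr, wanted[j])) {
				found = true;
				break;
			}
		}
		/* never drop the filter for our own MAC address */
		if (mac_equal(filters[i].macaddr, hw_addr))
			found = true;

		if (!found) {
			filters[i].remove = true;
			marked++;
		}
	}
	return marked;
}
```

Given three filters (the hardware address, one wanted multicast address and one stale address), only the stale one ends up marked for removal.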
Re: [rhashtable] WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:301 __debug_object_init()
22+ 0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
git bisect bad fd3e646c87ab3f2ba98aa25394581af27cc78dc5 # 09:27 0- 22 net: act_bpf: fix size mismatch on filter preparation
git bisect bad e84448d52190413400663736067f826f28a04ad6 # 09:32 0- 22 xen-netfront: refactor skb slot counting
git bisect bad 829a3ada9cc7d4c30fa61f8033403fb6c8f8092a # 09:38 0- 1 geneve: Simplify locking.
git bisect good a4c9ea5e8fec680134d22aa99b54d1cd8c226ebd # 09:42 22+ 12 geneve: Add Geneve GRO support
git bisect good 255047b0dca31e6b8ce254481a0b65d559d2ebb8 # 09:46 20+ 0 Bluetooth: Add timing information to SMP test case runs
git bisect good 354f473ee2c5d01c1cf90f747f95218ee3e73e95 # 09:52 22+ 0 ath9k: fix typo
git bisect good d312da293f787e1b19c57acb58e8c1b171c4a04a # 09:59 22+ 0 ixgbe: convert to CYCLECOUNTER_MASK macro.
git bisect good b8e1943e9f754219bcfb40bac4a605b5348acb25 # 10:03 22+ 8 rhashtable: Factor out bucket_tail() function
git bisect bad f89bd6f87a53ce5a7d60662429591ebac2745c10 # 10:08 0- 22 rhashtable: Supports for nulls marker
git bisect good 113948d841e8d78039e5dbbb5248f5b73e99eafa # 10:12 22+ 13 spinlock: Add spin_lock_bh_nested()
git bisect bad 97defe1ecf868b8127f8e62395499d6a06e4c4b1 # 10:16 0- 22 rhashtable: Per bucket locks deferred expansion/shrinking
# first bad commit: [97defe1ecf868b8127f8e62395499d6a06e4c4b1] rhashtable: Per bucket locks deferred expansion/shrinking
git bisect good 113948d841e8d78039e5dbbb5248f5b73e99eafa # 10:19 66+ 27 spinlock: Add spin_lock_bh_nested()
# extra tests with DEBUG_INFO
git bisect bad 97defe1ecf868b8127f8e62395499d6a06e4c4b1 # 10:25 0- 66 rhashtable: Per bucket locks deferred expansion/shrinking
# extra tests on HEAD of linux-devel/devel-spot-201507122014
git bisect good 3afd2c3f65a385c405a084d80431c84b103cb6df # 10:28 66+ 49 0day head guard for 'devel-spot-201507122014'
# extra tests on tree/branch linus/master
git bisect good f760b87f8f12eb262f14603e65042996fe03720e # 10:33 66+ 0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
# extra tests on tree/branch linus/master
git bisect good f760b87f8f12eb262f14603e65042996fe03720e # 10:33 66+ 0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
# extra tests on tree/branch next/master
git bisect good 2eb62d762a2112579f259903e62ba18d16c51f66 # 10:36 66+ 20 Add linux-next specific files for 20150713

This script may reproduce the error.

#!/bin/bash

kernel=$1
initrd=yocto-minimal-x86_64.cgz

wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd

kvm=(
	qemu-system-x86_64
	-enable-kvm
	-cpu Haswell,+smep,+smap
	-kernel $kernel
	-initrd $initrd
	-m 256
	-smp 1
	-device e1000,netdev=net0
	-netdev user,id=net0
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-rtc base=localtime
	-serial stdio
	-display none
	-monitor null
)

append=(
	hung_task_panic=1
	earlyprintk=ttyS0,115200
	systemd.log_level=err
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	console=ttyS0,115200
	console=tty0
	vga=normal
	root=/dev/ram0
	rw
	drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/lkp                          Intel Corporation

early console in setup code
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.19.0-rc2-00323-g97defe1 (kbuild@lkp-ib03) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Tue Jul 14 10:14:59 CST 2015
[0.00] Command line: hung_task_panic=1 earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw
link=/kbuild-tests/run-queue/kvm/x86_64-randconfig-a0-07122340/linux-devel:devel-spot-201507122014:97defe1ecf868b8127f8e62395499d6a06e4c4b1:bisect-linux-1/.vmlinuz-97defe1ecf868b8127f8e62395499d6a06e4c4b1-20150714101515-19-ivb41 branch=linux-devel/devel-spot-201507122014 BOOT_IMAGE=/pkg/linux/x86_64-randconfig-a0-07122340/gcc-4.9
RE: [PATCH v3] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl
I am assuming the patch is rejected at this point. I will re-submit later as soon as I am able to post a full end to end solution.

Chris

-----Original Message-----
From: Richard Cochran [mailto:richardcoch...@gmail.com]
Sent: Thursday, July 09, 2015 7:58 AM
To: Hall, Christopher S
Cc: t...@linutronix.de; john.stu...@linaro.org; Ronciak, John; linux-ker...@vger.kernel.org; netdev@vger.kernel.org
Subject: Re: [PATCH v3] Add support for driver cross-timestamp to PTP_SYS_OFFSET ioctl

On Wed, Jul 08, 2015 at 01:46:41PM -0700, Christopher Hall wrote:
> This patch allows system and device time (cross-timestamp) to be performed by the driver. Currently, the cross-timestamping is performed in the PTP_SYS_OFFSET ioctl. The PTP clock driver reads gettimeofday() and the gettime64() callback provided by the driver. The cross-timestamp is best effort where the latency between the capture of system time (getnstimeofday()) and the device time (driver callback) may be significant.

The interface looks okay to me. Now all we need is a user of it...

Acked-by: Richard Cochran richardcoch...@gmail.com
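The "best effort" nature of the cross-timestamp described above can be illustrated with a small user-space sketch. It assumes each sample brackets one device-time read between two system-time reads and that the device read happened at the midpoint of that window; the struct and function names are hypothetical and are not part of the PTP code.

```c
#include <assert.h>
#include <stdint.h>

/* One best-effort sample: system time read before and after the device time. */
struct xts_sample {
	int64_t sys_before_ns;
	int64_t dev_ns;
	int64_t sys_after_ns;
};

/* Width of the window in which the device clock was read (the latency). */
static int64_t xts_delay_ns(const struct xts_sample *s)
{
	return s->sys_after_ns - s->sys_before_ns;
}

/* Device-minus-system offset, assuming the device read sat at the midpoint. */
static int64_t xts_offset_ns(const struct xts_sample *s)
{
	int64_t mid = s->sys_before_ns + xts_delay_ns(s) / 2;

	return s->dev_ns - mid;
}

/* Prefer the sample with the narrowest window: its midpoint guess is tightest. */
static const struct xts_sample *xts_best(const struct xts_sample *v, int n)
{
	assert(n > 0);
	const struct xts_sample *best = &v[0];

	for (int i = 1; i < n; i++)
		if (xts_delay_ns(&v[i]) < xts_delay_ns(best))
			best = &v[i];
	return best;
}
```

The wider the window, the larger the uncertainty in the offset, which is the motivation for letting a driver capture both clocks itself.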
Re: [PATCH 4/6] net: ieee802154: Remove redundant spi driver bus initialization
On Tue, Jun 23, 2015 at 10:52:52PM +0800, Antonio Borneo wrote:
> In ancient times it was necessary to manually initialize the bus field of an spi_driver to spi_bus_type. These days this is done in spi_register_driver(), so we can drop the manual assignment.

Marcel, I don't see this patch in any of the linux-next, net-next or bluetooth-next trees. Could you please apply this patch with the acks by Alan and Varka?

- Alex
[PATCH V4 1/2] pci: Add dev_flags bit to access VPD through function 0
From: Mark Rustad mark.d.rus...@intel.com

Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through function 0 to provide VPD access on other functions. This is for hardware devices that provide copies of the same VPD capability registers in multiple functions. Because the kernel expects that each function has its own registers, both the locking and the state tracking are affected by VPD accesses to different functions.

On such devices for example, if a VPD write is performed on function 0, *any* later attempt to read VPD from any other function of that device will hang. This has to do with how the kernel tracks the expected value of the F bit per function.

Concurrent accesses to different functions of the same device can not only hang but also corrupt both read and write VPD data.

When hangs occur, typically the error message:

  vpd r/w failed. This is likely a firmware bug on this device.

will be seen.

Never set this bit on function 0 or there will be an infinite recursion.

Signed-off-by: Mark Rustad mark.d.rus...@intel.com
---
Changes in V2:
- Corrected spelling in log message
- Added checks to see that the referenced function 0 is reasonable
Changes in V3:
- Don't leak a device reference
- Check that function 0 has VPD
- Make a helper for the function 0 checks
- Do multifunction check in the quirk
Changes in V4:
- Provide a much more detailed explanation in the commit log
---
 drivers/pci/access.c |   61 +-
 include/linux/pci.h  |    2 ++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index d9b64a175990..b965c12168b7 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -439,6 +439,56 @@ static const struct pci_vpd_ops pci_vpd_pci22_ops = {
 	.release = pci_vpd_pci22_release,
 };
 
+static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
+			       void *arg)
+{
+	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
+	ssize_t ret;
+
+	if (!tdev)
+		return -ENODEV;
+
+	ret = pci_read_vpd(tdev, pos, count, arg);
+	pci_dev_put(tdev);
+	return ret;
+}
+
+static ssize_t pci_vpd_f0_write(struct pci_dev *dev, loff_t pos, size_t count,
+				const void *arg)
+{
+	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
+	ssize_t ret;
+
+	if (!tdev)
+		return -ENODEV;
+
+	ret = pci_write_vpd(tdev, pos, count, arg);
+	pci_dev_put(tdev);
+	return ret;
+}
+
+static const struct pci_vpd_ops pci_vpd_f0_ops = {
+	.read = pci_vpd_f0_read,
+	.write = pci_vpd_f0_write,
+	.release = pci_vpd_pci22_release,
+};
+
+static int pci_vpd_f0_dev_check(struct pci_dev *dev)
+{
+	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
+	int ret = 0;
+
+	if (!tdev)
+		return -ENODEV;
+	if (!tdev->vpd || !tdev->multifunction ||
+	    dev->class != tdev->class || dev->vendor != tdev->vendor ||
+	    dev->device != tdev->device)
+		ret = -ENODEV;
+
+	pci_dev_put(tdev);
+	return ret;
+}
+
 int pci_vpd_pci22_init(struct pci_dev *dev)
 {
 	struct pci_vpd_pci22 *vpd;
@@ -447,12 +497,21 @@ int pci_vpd_pci22_init(struct pci_dev *dev)
 	cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
 	if (!cap)
 		return -ENODEV;
+	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
+		int ret = pci_vpd_f0_dev_check(dev);
+
+		if (ret)
+			return ret;
+	}
 	vpd = kzalloc(sizeof(*vpd), GFP_ATOMIC);
 	if (!vpd)
 		return -ENOMEM;
 
 	vpd->base.len = PCI_VPD_PCI22_SIZE;
-	vpd->base.ops = &pci_vpd_pci22_ops;
+	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0)
+		vpd->base.ops = &pci_vpd_f0_ops;
+	else
+		vpd->base.ops = &pci_vpd_pci22_ops;
 	mutex_init(&vpd->lock);
 	vpd->cap = cap;
 	vpd->busy = false;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8a0321a8fb59..8edb125db13a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -180,6 +180,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6),
 	/* Do not use PM reset even if device advertises NoSoftRst- */
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
+	/* Get VPD from function 0 VPD */
+	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
 };
 
 enum pci_irq_reroute_variant {
[PATCH V4 2/2] pci: Add VPD quirk for Intel Ethernet devices
From: Mark Rustad mark.d.rus...@intel.com

This quirk sets the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel Ethernet device functions other than function 0.

Signed-off-by: Mark Rustad mark.d.rus...@intel.com
---
Changes in V3:
- Added a multifunction device check
---
 drivers/pci/quirks.c |    9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index e9fd0e90fa3b..08c04e4f5ab2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1894,6 +1894,15 @@ static void quirk_netmos(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
 			 PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
 
+static void quirk_f0_vpd_link(struct pci_dev *dev)
+{
+	if (!dev->multifunction || !PCI_FUNC(dev->devfn))
+		return;
+	dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
+}
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
+			      PCI_CLASS_NETWORK_ETHERNET, 8, quirk_f0_vpd_link);
+
 static void quirk_e100_interrupt(struct pci_dev *dev)
 {
 	u16 command, pmcsr;
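A minimal user-space sketch of the addressing behind the two VPD patches above, assuming the usual devfn encoding (5-bit slot, 3-bit function). The three macros mirror the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC definitions; f0_devfn() and wants_vpd_ref_f0() are hypothetical helper names used only for illustration and do not appear in the patches.

```c
#include <assert.h>

/* Same bit layout as the kernel's devfn helpers. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/* Hypothetical helper: devfn of function 0 in the same slot. */
static unsigned int f0_devfn(unsigned int devfn)
{
	assert(devfn <= 0xff);
	return PCI_DEVFN(PCI_SLOT(devfn), 0);
}

/*
 * Hypothetical helper mirroring the quirk's condition: only non-zero
 * functions of a multifunction device are redirected, so function 0
 * never refers to itself (which would recurse).
 */
static int wants_vpd_ref_f0(unsigned int devfn, int multifunction)
{
	return multifunction && PCI_FUNC(devfn) != 0;
}
```

Note that pci_get_slot() is declared to take an encoded devfn rather than a bare slot number, so a lookup of function 0 would presumably pass a value like f0_devfn(dev->devfn); that is an observation about this sketch, not a statement about what was eventually merged.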
[Patch net] fq_codel: fix a use-after-free
Fixes: 25331d6ce42b ("net: sched: implement qstat helper routines")
Cc: John Fastabend john.fastab...@gmail.com
Signed-off-by: Cong Wang xiyou.wangc...@gmail.com
Signed-off-by: Cong Wang cw...@twopensource.com
---
 net/sched/sch_fq_codel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index d75993f..06e7c84 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -155,10 +155,10 @@ static unsigned int fq_codel_drop(struct Qdisc *sch)
 	skb = dequeue_head(flow);
 	len = qdisc_pkt_len(skb);
 	q->backlogs[idx] -= len;
-	kfree_skb(skb);
 	sch->q.qlen--;
 	qdisc_qstats_drop(sch);
 	qdisc_qstats_backlog_dec(sch, skb);
+	kfree_skb(skb);
 	flow->dropped++;
 	return idx;
 }
-- 
1.8.3.1
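The ordering problem fixed above can be shown with a small user-space model. This is only a sketch: struct pkt and struct queue_stats are hypothetical stand-ins for the skb and the qdisc statistics, and free() stands in for kfree_skb(). The point is that the backlog accounting still reads the packet length, so the packet must be released last.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for an skb: only the length matters for this model. */
struct pkt {
	unsigned int len;
};

/* Stand-in for the qdisc counters touched on a drop. */
struct queue_stats {
	unsigned int qlen;
	unsigned int drops;
	unsigned int backlog;
};

/* Account for a dropped packet, then free it. */
static void drop_head(struct queue_stats *st, struct pkt *p)
{
	assert(p != NULL);
	st->qlen--;
	st->drops++;
	st->backlog -= p->len;	/* last read of *p */
	free(p);		/* safe only after every read above */
}
```

Moving the free() above the backlog update would reproduce the bug: the length would be read from memory that has already been released.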
Re: [PATCH v2 18/22] fjes: unshare_watch_task
Hi Izumi-san, On Wed, 24 Jun 2015 11:55:50 +0900 Taku Izumi izumi.t...@jp.fujitsu.com wrote: This patch adds unshare_watch_task. Shared buffer's status can be changed into unshared. This task is used to monitor shared buffer's status. Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com --- drivers/net/fjes/fjes.h | 3 + drivers/net/fjes/fjes_main.c | 130 +++ 2 files changed, 133 insertions(+) diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h index d31d4c3..57feee8 100644 --- a/drivers/net/fjes/fjes.h +++ b/drivers/net/fjes/fjes.h @@ -59,6 +59,9 @@ struct fjes_adapter { struct work_struct tx_stall_task; struct work_struct raise_intr_rxdata_task; + struct work_struct unshare_watch_task; + unsigned long unshare_watch_bitmask; + struct delayed_work interrupt_watch_task; bool interrupt_watch_enable; diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c index 1ddb9d3..69a238c 100644 --- a/drivers/net/fjes/fjes_main.c +++ b/drivers/net/fjes/fjes_main.c @@ -73,6 +73,7 @@ static int fjes_remove(struct platform_device *); static int fjes_sw_init(struct fjes_adapter *); static void fjes_netdev_setup(struct net_device *); static void fjes_irq_watch_task(struct work_struct *); +static void fjes_watch_unshare_task(struct work_struct *); static void fjes_rx_irq(struct fjes_adapter *, int); static int fjes_poll(struct napi_struct *, int); @@ -312,6 +313,8 @@ static int fjes_close(struct net_device *netdev) fjes_free_irq(adapter); cancel_delayed_work_sync(adapter-interrupt_watch_task); + cancel_work_sync(adapter-unshare_watch_task); + adapter-unshare_watch_bitmask = 0; cancel_work_sync(adapter-raise_intr_rxdata_task); cancel_work_sync(adapter-tx_stall_task); @@ -1032,6 +1035,8 @@ static int fjes_probe(struct platform_device *plat_dev) INIT_WORK(adapter-tx_stall_task, fjes_tx_stall_task); INIT_WORK(adapter-raise_intr_rxdata_task, fjes_raise_intr_rxdata_task); + INIT_WORK(adapter-unshare_watch_task, fjes_watch_unshare_task); + 
adapter-unshare_watch_bitmask = 0; INIT_DELAYED_WORK(adapter-interrupt_watch_task, fjes_irq_watch_task); adapter-interrupt_watch_enable = false; @@ -1077,6 +1082,7 @@ static int fjes_remove(struct platform_device *plat_dev) struct fjes_hw *hw = adapter-hw; cancel_delayed_work_sync(adapter-interrupt_watch_task); + cancel_work_sync(adapter-unshare_watch_task); cancel_work_sync(adapter-raise_intr_rxdata_task); cancel_work_sync(adapter-tx_stall_task); if (adapter-control_wq) @@ -1136,6 +1142,130 @@ static void fjes_irq_watch_task(struct work_struct *work) } } +static void fjes_watch_unshare_task(struct work_struct *work) +{ + struct fjes_adapter *adapter = + container_of(work, struct fjes_adapter, unshare_watch_task); + + struct fjes_hw *hw = adapter-hw; + struct net_device *netdev = adapter-netdev; + int epidx; + int max_epid, my_epid; + unsigned long unshare_watch_bitmask; + int wait_time = 0; + int is_shared; + int stop_req, stop_req_done; + int unshare_watch, unshare_reserve; + int ret; + + my_epid = hw-my_epid; + max_epid = hw-max_epid; + + unshare_watch_bitmask = adapter-unshare_watch_bitmask; + adapter-unshare_watch_bitmask = 0; + + while ((unshare_watch_bitmask || hw-txrx_stop_req_bit) +(wait_time 3000)) { + for (epidx = 0; epidx hw-max_epid; epidx++) { + if (epidx == hw-my_epid) + continue; + + is_shared = + fjes_hw_epid_is_shared(hw-hw_info.share, epidx); + + stop_req = + test_bit(epidx, hw-txrx_stop_req_bit); + + stop_req_done = + hw-ep_shm_info[epidx].rx.info-v1i.rx_status + FJES_RX_STOP_REQ_DONE; + + unshare_watch = + test_bit(epidx, unshare_watch_bitmask); + + unshare_reserve = + test_bit(epidx, + hw-hw_info.buffer_unshare_reserve_bit); + + if ((!stop_req || + (is_shared (!is_shared || !stop_req_done))) + (is_shared || !unshare_watch || !unshare_reserve)) + continue; + + mutex_lock(hw-hw_info.lock); + ret = fjes_hw_unregister_buff_addr(hw, epidx); + switch (ret) { + case 0: + break; + case -ENOMSG: + case -EBUSY: + default: + if
Re: [PATCH net-next] ebpf: remove self-assignment in interpreter's tail call
On 7/13/15 11:49 AM, Daniel Borkmann wrote:
> ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1] and thus has no effect. Add a comment instead, explaining what happens and why it's okay to just remove it.
>
> Since from user space side, a tail call is invoked as a pseudo helper function via bpf_tail_call_proto, the verifier checks the arguments just like with any other helper function and makes sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX.
>
> Signed-off-by: Daniel Borkmann dan...@iogearbox.net

Thanks!

Acked-by: Alexei Starovoitov a...@plumgrid.com
Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users
From: Neil Horman nhor...@tuxdriver.com
Date: Mon, 13 Jul 2015 06:39:11 -0400

> Initially Marcelo had created duplicate code paths, one to return an fd, one to return a file struct. If you would rather go in that direction, I'm sure he can propose it again, but that seems less correct to me than this solution.

That's much better.
[PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one
From: Carol L Soto cls...@linux.vnet.ibm.com

Add function bond_remove_proc_entry at __bond_release_one to avoid stack trace at rmmod bonding.

[68830.202239] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0'
[68830.202257] [ cut here ]
[68830.202260] WARNING: at fs/proc/generic.c:562
[68830.202412] NIP [c02abf6c] .remove_proc_entry+0x1fc/0x240
[68830.202416] LR [c02abf68] .remove_proc_entry+0x1f8/0x240
[68830.202419] PACATMSCRATCH [80009032]
[68830.202421] Call Trace:
[68830.202424] [c00179277940] [c02abf68] .remove_proc_entry+0x1f8/0x240 (unreliable)
[68830.202434] [c001792779f0] [d53229a4] .bond_destroy_proc_dir+0x34/0x54 [bonding]
[68830.202440] [c00179277a70] [d53130e0] .bond_net_exit+0x90/0x120 [bonding]
[68830.202445] [c00179277b10] [c059944c] .ops_exit_list.isra.0+0x6c/0xd0
[68830.202450] [c00179277ba0] [c0599774] .unregister_pernet_operations+0x94/0x100
[68830.202454] [c00179277c40] [c0599814] .unregister_pernet_subsys+0x34/0x60
[68830.202460] [c00179277cc0] [d5323758] .bonding_exit+0x48/0x2328 [bonding]
[68830.202466] [c00179277d30] [c010dcc4] .SyS_delete_module+0x1f4/0x340
[68830.202471] [c00179277e30] [c0009e7c] syscall_exit+0x0/0x7c
[68830.202491] ---[ end trace 9bd1d810219c9875 ]---

Signed-off-by: Carol L Soto cls...@linux.vnet.ibm.com
---
 drivers/net/bonding/bond_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990..ace105a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
 		dev_set_mac_address(slave_dev, &addr);
 	}
 
+	bond_remove_proc_entry(bond);
+
 	dev_set_mtu(slave_dev, slave->original_mtu);
 
 	slave_dev->priv_flags &= ~IFF_BONDING;
-- 
1.8.3.1
Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users
On 13-07-2015 15:59, David Miller wrote:
> From: Neil Horman nhor...@tuxdriver.com
> Date: Mon, 13 Jul 2015 06:39:11 -0400
>
>> Initially Marcelo had created duplicate code paths, one to return an fd, one to return a file struct. If you would rather go in that direction, I'm sure he can propose it again, but that seems less correct to me than this solution.
>
> That's much better.

I'm not sure what you mean. Is the new option better or the history/description?

Marcelo
Re: Fighting out-of-order reception with RPS?
On Sun, Jul 12, 2015 at 12:15 PM, Oliver Hartkopp socket...@hartkopp.net wrote:
> Hello Eric,
>
> On 07/11/2015 06:35 AM, Eric Dumazet wrote:
>> On Fri, 2015-07-10 at 22:36 +0200, Oliver Hartkopp wrote:
>>
>>> Hm. Doesn't sound like a good solution when there's a difference between NAPI and non-NAPI drivers in matters of OOO, right?
>>
>> Isn't OOO a problem for you ? Then you either have to :
>>
>> 1) Use a single CPU to handle IRQ from the device
>>
>> 2) Use NAPI
>
> See below ...
>
>>> What about checking in netif_rx() if the non-NAPI driver has set a hash (aka the driver is OOO sensitive)? And if so we could automatically set rps_cpus for this interface in a way that all CPUs are enabled to take skbs following the hash.
>>
>> Wow, netif_rx() is packet processing fast path, certainly not the place to add controlling path decisions.
>>
>>> My only requirement is to be able to pick CAN frames (contained in skbs) from the socket in the same order they have been received.
>>
>> Please convert your driver to NAPI. You might then even benefit from GRO.
>
> Just some remarks about CAN and CAN frames as you suggest GRO which is completely pointless for CAN.
>
> CAN frames have a 11 or 29 bit CAN Identifier (no MAC but content addressing) and 0 to 64 bytes of payload. Therefore the MTU for CAN interfaces is 16 or 72 byte (see struct can(fd)_frame). Each skbuff contains a single CAN frame.
>
> There are CAN controllers which have a FIFO for up to 32 CAN frames, e.g. flexcan.c which also implements NAPI. Others (e.g. sja1000.c) don't have any FIFO and the reading of the CAN frame from the memory mapped registers needs to be processed in the irq context instantly. So 'fast path' netif_rx() is reasonable, right?
>
> So why is it not possible to pass netif_rx() skbs from a specific CAN network interface to whatever queue where they are processed in order?
>
> E.g. with
>
>     skb_set_hash(skb, dev->ifindex, PKT_HASH_TYPE_L2);
>
> and
>
>     echo f > /sys/class/net/can0/queues/rx-0/rps_cpus
>
> I get properly ordered CAN frames - even with netif_rx() processed skbs.
>
> I just want to have this stuff to be enabled by default for CAN interfaces to kill the OOO frame issue.

If you really must process the CAN FIFO in the hard interrupt then create a private sk_buff queue. In the interrupt, dequeue from the FIFO and enqueue on the sk_buff queue. Then schedule NAPI, and when that runs, process the sk_buff queue, calling netif_receive_skb for each enqueued skb. Pretty simple actually :-)

> Regards,
> Oliver
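The private-queue suggestion above can be modelled in user space as a FIFO that is filled from the interrupt side and drained, oldest first, from the poll side. This is a sketch only: the queue size, struct frame, and the rxq_* names are hypothetical, and the model ignores the locking a real driver would need between the hard interrupt and the NAPI poll.

```c
#include <assert.h>
#include <stddef.h>

#define RXQ_SIZE 32	/* assumed capacity, for illustration only */

struct frame {
	unsigned int can_id;
};

struct rx_queue {
	struct frame slots[RXQ_SIZE];
	size_t head;	/* next frame to deliver */
	size_t tail;	/* next free slot */
	size_t count;
};

static void rxq_init(struct rx_queue *q)
{
	q->head = q->tail = q->count = 0;
}

/* Interrupt side: copy a frame out of the controller into the queue. */
static int rxq_enqueue(struct rx_queue *q, struct frame f)
{
	if (q->count == RXQ_SIZE)
		return -1;	/* queue full, frame is dropped */
	q->slots[q->tail] = f;
	q->tail = (q->tail + 1) % RXQ_SIZE;
	q->count++;
	return 0;
}

/* Poll side: hand over up to `budget` frames in arrival order. */
static int rxq_poll(struct rx_queue *q, struct frame *out, int budget)
{
	int done = 0;

	assert(budget >= 0);
	while (done < budget && q->count > 0) {
		out[done++] = q->slots[q->head];
		q->head = (q->head + 1) % RXQ_SIZE;
		q->count--;
	}
	return done;
}
```

Because a single consumer drains the queue in order, frames reach the stack in the order the interrupt handler saw them, which is the property the thread is after.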
[PATCH net] tcp: don't use F-RTO on non-recurring timeouts
Currently F-RTO may repeatedly send new data packets on non-recurring timeouts in CA_Loss mode. This is a bug because F-RTO (RFC5682) should only be used on either new recovery or recurring timeouts.

This exacerbates the recovery progress during frequent timeout repair, because we prioritize sending new data packets instead of repairing the holes when the bandwidth is already scarce.

Fix it by correcting the test of a new recovery episode.

Signed-off-by: Yuchung Cheng ych...@google.com
Signed-off-by: Neal Cardwell ncardw...@google.com
---
 net/ipv4/tcp_input.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1578fc2..0cef1af 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1920,14 +1920,13 @@ void tcp_enter_loss(struct sock *sk)
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
-	bool new_recovery = false;
+	bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
 	bool is_reneg;			/* is receiver reneging on SACKs? */
 
 	/* Reduce ssthresh if it has not yet been made inside this window. */
 	if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
 	    !after(tp->high_seq, tp->snd_una) ||
 	    (icsk->icsk_ca_state == TCP_CA_Loss && !icsk->icsk_retransmits)) {
-		new_recovery = true;
 		tp->prior_ssthresh = tcp_current_ssthresh(sk);
 		tp->snd_ssthresh = icsk->icsk_ca_ops->ssthresh(sk);
 		tcp_ca_event(sk, CA_EVENT_LOSS);
-- 
2.4.3.573.g4eafbef
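To make the changed test easier to compare, here is a small user-space sketch of the two predicates. The enum copies the ordering of the kernel's congestion-avoidance states (Open, Disorder, CWR, Recovery, Loss); the function names and the boolean parameter standing in for !after(tp->high_seq, tp->snd_una) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same ordering as the kernel's TCP_CA_* states. */
enum ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/* New test: a recovery episode is new only if we were not recovering yet. */
static bool new_recovery_fixed(enum ca_state s)
{
	assert(s >= CA_OPEN && s <= CA_LOSS);
	return s < CA_RECOVERY;
}

/*
 * Old test: tied to the ssthresh-reduction condition, so a first
 * (non-recurring) timeout while already in Loss also counted as new.
 */
static bool new_recovery_old(enum ca_state s, bool high_seq_acked,
			     unsigned int retransmits)
{
	return s <= CA_DISORDER || high_seq_acked ||
	       (s == CA_LOSS && retransmits == 0);
}
```

In this model, a timeout that arrives in CA_LOSS with no retransmits outstanding is "new" under the old predicate but not under the fixed one, which matches the case the commit message calls out.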
RE: [PATCH] bnx2:Make various functions to have a return type of void in the file bnx2.c
Sony, I also sent this patch and was wondering if I can get a reply on it. From 4a607447562bec161fd947caae5eb02c2365c58a Mon Sep 17 00:00:00 2001 From: Nicholas Krause xerofo...@gmail.com Date: Wed, 8 Jul 2015 08:29:07 -0400 Subject: [PATCH] bnx2i:Fix backwards locking scenario in the function bnx2i_cleanup_task This fixes the backwards locking scenario for unlocking the bottom half spinlock before calling the wait_for_completion_timeout on the structure pointer bnx2i_conn's member cmd_cleanup_cmpl for the critical region of this function to lock the spin_lock bottom half before unlocking it after the call to this function in order to have actual protection for the function bnx2i_cleanup_task's critical region. Signed-off-by: Nicholas Krause xerofo...@gmail.com --- drivers/scsi/bnx2i/bnx2i_iscsi.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c index 7289437..619a26f 100644 --- a/drivers/scsi/bnx2i/bnx2i_iscsi.c +++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c @@ -1172,12 +1172,12 @@ static void bnx2i_cleanup_task(struct iscsi_task *task) if (task-state == ISCSI_TASK_ABRT_TMF) { bnx2i_send_cmd_cleanup_req(hba, task-dd_data); - spin_unlock_bh(conn-session-back_lock); - spin_unlock_bh(conn-session-frwd_lock); + spin_lock_bh(conn-session-back_lock); + spin_lock_bh(conn-session-frwd_lock); wait_for_completion_timeout(bnx2i_conn- cmd_cleanup_cmpl, msecs_to_jiffies(ISCSI_CMD_CLEANUP_TIMEOUT)); - spin_lock_bh(conn-session-frwd_lock); - spin_lock_bh(conn-session-back_lock); + spin_unlock_bh(conn-session-frwd_lock); + spin_unlock_bh(conn-session-back_lock); } bnx2i_iscsi_unmap_sg_list(task-dd_data); } -- 2.1.4 I am assuming it's wrong but you never known. Nick Nick, I have included the Qlogic ISCSI engineer to the mailing list to review and ACK the patch. I will also follow it up with the ISCSI team. 
Thanks, Sony -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V4 0/2] pci: Provide a flag to access VPD through function 0
Many multi-function devices provide shared registers in extended config space for accessing VPD. The behavior of these registers means that the state must be tracked and access locked correctly for accesses not to hang or worse. One way to meet these needs is to always perform the accesses through function 0, thereby using the state tracking and mutex that already exists.

To provide this behavior, add a dev_flags bit to indicate that this should be done. This bit can then be set for any non-zero function that needs to redirect such VPD access to function 0. Do not set this bit on the zero function or there will be an infinite recursion.

The second patch uses this new flag to invoke this behavior on all multi-function Intel Ethernet devices.

Any hardware that shares VPD registers with multiple functions has been suffering these problems forever. The hangs result in the log message:

  vpd r/w failed. This is likely a firmware bug on this device.

Both read and write data corruption are also possible during overlapping accesses in addition to hangs.

Signed-off-by: Mark Rustad mark.d.rus...@intel.com
---
Changes in V2:
- Corrected a spelling error in a log message
- Added checks to see that the referenced function 0 is reasonable
Changes in V3:
- Don't leak a device reference
- Check that function 0 has VPD
- Make a helper for the function 0 checks
- Moved a multifunction check to the quirk patch
Changes in V4:
- Provide a more extensive commit log for patch 1
---
Mark Rustad (2):
      pci: Add dev_flags bit to access VPD through function 0
      pci: Add VPD quirk for Intel Ethernet devices

 drivers/pci/access.c |   61 +-
 drivers/pci/quirks.c |    9 +++
 include/linux/pci.h  |    2 ++
 3 files changed, 71 insertions(+), 1 deletion(-)

-- 
Mark Rustad, Network Division, Intel Corporation
[PATCH net-next] hv_netvsc: Add close of RNDIS filter into change mtu call
The current change mtu call only stops tx before removing RNDIS filter. In case the ring buffer is not empty, the rndis_filter_device_remove() may hang on removing the buffers.

This patch adds close of RNDIS filter before removing it, also a gradual waiting loop until the ring is empty. The change_mtu hang issue under heavy traffic is solved by this patch.

Signed-off-by: Haiyang Zhang haiya...@microsoft.com
Reviewed-by: K. Y. Srinivasan k...@microsoft.com
---
 drivers/net/hyperv/netvsc_drv.c | 58 +++
 1 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index b855ba9..7b36d5f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -106,7 +106,7 @@ static int netvsc_open(struct net_device *net)
 		return ret;
 	}
 
-	netif_tx_start_all_queues(net);
+	netif_tx_wake_all_queues(net);
 
 	nvdev = hv_get_drvdata(device_obj);
 	rdev = nvdev->extension;
@@ -120,15 +120,56 @@ static int netvsc_close(struct net_device *net)
 {
 	struct net_device_context *net_device_ctx = netdev_priv(net);
 	struct hv_device *device_obj = net_device_ctx->device_ctx;
+	struct netvsc_device *nvdev = hv_get_drvdata(device_obj);
 	int ret;
+	u32 aread, awrite, i, msec = 10, retry = 0, retry_max = 20;
+	struct vmbus_channel *chn;
 
 	netif_tx_disable(net);
 
 	/* Make sure netvsc_set_multicast_list doesn't re-enable filter! */
 	cancel_work_sync(&net_device_ctx->work);
 	ret = rndis_filter_close(device_obj);
-	if (ret != 0)
+	if (ret != 0) {
 		netdev_err(net, "unable to close device (ret %d).\n", ret);
+		return ret;
+	}
+
+	/* Ensure pending bytes in ring are read */
+	while (true) {
+		aread = 0;
+		for (i = 0; i < nvdev->num_chn; i++) {
+			chn = nvdev->chn_table[i];
+			if (!chn)
+				continue;
+
+			hv_get_ringbuffer_availbytes(&chn->inbound, &aread,
+						     &awrite);
+
+			if (aread)
+				break;
+
+			hv_get_ringbuffer_availbytes(&chn->outbound, &aread,
+						     &awrite);
+
+			if (aread)
+				break;
+		}
+
+		retry++;
+		if (retry > retry_max || aread == 0)
+			break;
+
+		msleep(msec);
+
+		if (msec < 1000)
+			msec *= 2;
+	}
+
+	if (aread) {
+		netdev_err(net, "Ring buffer not empty after closing rndis\n");
+		ret = -ETIMEDOUT;
+	}
 
 	return ret;
 }
@@ -736,6 +777,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	struct netvsc_device *nvdev = hv_get_drvdata(hdev);
 	struct netvsc_device_info device_info;
 	int limit = ETH_DATA_LEN;
+	int ret = 0;
 
 	if (nvdev == NULL || nvdev->destroy)
 		return -ENODEV;
@@ -746,9 +788,11 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	if (mtu < NETVSC_MTU_MIN || mtu > limit)
 		return -EINVAL;
 
+	ret = netvsc_close(ndev);
+	if (ret)
+		goto out;
+
 	nvdev->start_remove = true;
-	cancel_work_sync(&ndevctx->work);
-	netif_tx_disable(ndev);
 	rndis_filter_device_remove(hdev);
 
 	ndev->mtu = mtu;
@@ -758,9 +802,11 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	device_info.ring_size = ring_size;
 	device_info.max_num_vrss_chns = max_num_vrss_chns;
 	rndis_filter_device_add(hdev, &device_info);
-	netif_tx_wake_all_queues(ndev);
 
-	return 0;
+out:
+	netvsc_open(ndev);
+
+	return ret;
 }
 
 static struct rtnl_link_stats64 *netvsc_get_stats64(struct net_device *net,
-- 
1.7.4.1
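The "gradual waiting loop" in the patch above can be modelled in user space to see how long it may wait. This sketch uses the same constants as the patch (10 ms initial sleep, doubling while below 1000 ms, at most 20 retries); the busy_polls parameter is a hypothetical stand-in for "the rings still hold data on the first N checks", and no real ring buffer is involved.

```c
#include <assert.h>
#include <stdbool.h>

struct drain_result {
	unsigned int slept_ms;	/* total time spent sleeping */
	bool timed_out;		/* data was still pending when we gave up */
};

/* Model of the drain loop: check, maybe sleep with backoff, repeat. */
static struct drain_result wait_for_drain(unsigned int busy_polls,
					  unsigned int retry_max)
{
	struct drain_result r = { 0, false };
	unsigned int msec = 10, retry = 0;

	assert(retry_max > 0);
	for (;;) {
		/* Stand-in for reading the available bytes of each ring. */
		bool pending = retry < busy_polls;

		retry++;
		if (retry > retry_max || !pending) {
			r.timed_out = pending;
			break;
		}

		r.slept_ms += msec;	/* msleep(msec) in the driver */
		if (msec < 1000)
			msec *= 2;
	}
	return r;
}
```

Under this model the sleeps run 10, 20, 40, ... 640 ms and then stay at 1280 ms, so with 20 retries a ring that never drains is given up on after roughly 18 seconds of sleeping.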
[PATCH net-next] ebpf: remove self-assignment in interpreter's tail call
ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1] and thus has no effect. Add a comment instead, explaining what happens and why it's okay to just remove it.

Since from user space side, a tail call is invoked as a pseudo helper function via bpf_tail_call_proto, the verifier checks the arguments just like with any other helper function and makes sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX.

Signed-off-by: Daniel Borkmann dan...@iogearbox.net
---
 kernel/bpf/core.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index c5bedc8..bf38f5e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -453,7 +453,11 @@ select_insn:
 		if (unlikely(!prog))
 			goto out;
 
-		ARG1 = BPF_R1;
+		/* ARG1 at this point is guaranteed to point to CTX from
+		 * the verifier side due to the fact that the tail call is
+		 * handeled like a helper, that is, bpf_tail_call_proto,
+		 * where arg1_type is ARG_PTR_TO_CTX.
+		 */
 		insn = prog->insnsi;
 		goto select_insn;
 out:
-- 
1.9.3
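The self-assignment described above can be reproduced in a few lines of user-space C. The macro definitions follow the expansion given in the commit message (both names resolve to regs[BPF_REG_1]); NUM_REGS and tail_call_ctx() are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

enum { BPF_REG_0, BPF_REG_1, NUM_REGS = 11 };	/* NUM_REGS is assumed */

#define BPF_REG_ARG1	BPF_REG_1
#define BPF_R1		regs[BPF_REG_1]
#define ARG1		regs[BPF_REG_ARG1]

/* Performs the removed statement and returns the context register. */
static uint64_t tail_call_ctx(uint64_t *regs)
{
	assert(regs != 0);
	ARG1 = BPF_R1;	/* expands to regs[1] = regs[1], i.e. a no-op */
	return ARG1;
}
```

Since both macros name the same array slot, dropping the statement cannot change what the next program sees in its first argument register.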
Re: [PATCH v1 08/12] IB/cma: Add net_dev and private data checks to RDMA CM
On Mon, Jun 22, 2015 at 03:42:37PM +0300, Haggai Eran wrote: + switch (ib_event-event) { + case IB_CM_REQ_RECEIVED: + req-device = req_param-listen_id-device; + req-port = req_param-port; + req-local_gid = req_param-primary_path-sgid; + req-service_id = req_param-primary_path-service_id; + req-pkey = be16_to_cpu(req_param-primary_path-pkey); I feel pretty strongly that we should be using the pkey from the work completion, not the pkey in the message. The reason, if someone is using pkey like vlan, and expecting a container to never receive packets outside the assigned pkey, then we need to check each and every packet for the correct pkey before associating it with that container. When doing the namespace patches you should probably also look at other CM GMPs than just the REQ and how the paths are setup and consider what to do with the pkey. I'd probably suggest that the pkey should be forced throughout the entire process to ensure it always matches the ip device - at least for containers that is the right thing.. I probably wouldn't turn it on for the root namespace though.. Jason -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Linux 4.2 build error in net/netfilter/ipset/ip_set_hash_netnet.c
On Mon, Jul 13, 2015 at 9:13 AM, Akemi Yagi amy...@gmail.com wrote:
> On Sun, 05 Jul 2015 08:35:20 -0700, Guenter Roeck wrote:
>> On Sat, Jul 04, 2015 at 12:44:36AM -0700, Vinson Lee wrote:
>>> Hi.
>>>
>>> With the latest Linux 4.2-rc1, I am hitting this build error with GCC 4.4.7 on CentOS 6.
>>>
>>>   CC net/netfilter/ipset/ip_set_hash_netnet.o
>>> net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet4_uadt’:
>>> net/netfilter/ipset/ip_set_hash_netnet.c:163: error: unknown field ‘cidr’ specified in initializer
>>> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: missing braces around initializer
>>> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: (near initialization for ‘e.anonymous.ip’)
>>> net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet6_uadt’:
>>> net/netfilter/ipset/ip_set_hash_netnet.c:388: error: unknown field ‘cidr’ specified in initializer
>>> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: missing braces around initializer
>>> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: (near initialization for ‘e.ip[0]’)
>>
>> Previously fixed with commit 1a869205c75cb ("netfilter: ipset: The unnamed union initialization may lead to compilation error"), reintroduced with commit aff227581ed1a ("netfilter: ipset: Check CIDR value only when attribute is given").
>>
>> Guenter
>
> I wonder what can be done to get this issue fixed. This problem was seen in 4.2-rc1 and now in 4.2-rc2 on RHEL-6.6.

Just revert the initializer piece.
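For readers unfamiliar with the failure, the error above comes from naming a member of an unnamed union in a designated initializer, which the older compiler rejects with "unknown field". A common way around it is to zero the object and assign the fields afterwards. The sketch below uses a simplified, hypothetical stand-in for the ipset element (the real structure has more fields), so it only illustrates the pattern.

```c
#include <assert.h>
#include <string.h>

#define HOST_MASK 32

/* Simplified stand-in; the layout is illustrative, not the ipset one. */
struct netnet_elem {
	union {
		unsigned int ip[2];
		unsigned long long ipcmp;
	};
	union {
		unsigned char cidr[2];
		unsigned short ccmp;
	};
};

/*
 * Avoids "= { .cidr = { HOST_MASK, HOST_MASK } }", which is the form
 * that trips the old compiler, by assigning after zeroing.
 */
static struct netnet_elem make_elem(void)
{
	struct netnet_elem e;

	memset(&e, 0, sizeof(e));
	e.cidr[0] = HOST_MASK;
	e.cidr[1] = HOST_MASK;
	assert(e.ipcmp == 0);
	return e;
}
```

Assigning the members one by one is slightly more verbose but does not depend on how a given compiler version treats anonymous members in initializers.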
request for -stable: route: Use ipv4_mtu instead of raw rt_pmtu
Hi,

Can you queue for active older -stables up to 3.18:

commit 3cdaa5be9e81 ipv4: Don't increase PMTU with Datagram Too Big message
commit cb6ccf09d6b9 route: Use ipv4_mtu instead of raw rt_pmtu

Commit 3cdaa5be9e81 made it to 3.19.y and was later fixed additionally with the conversion to ipv4_mtu() in the second referenced commit.

However, these patches together also fix another case that is not so obvious: the case where the original route had an MTU set on it. Previously it was ignored, but using ipv4_mtu as the first check will also check RTAX_MTU on metrics. This fixes the nasty issue that PMTU can trigger sending larger packets than what was explicitly configured via a static route mtu.

Thanks,
Timo
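The case described above can be sketched as a small user-space model. This is not the kernel's ipv4_mtu(); the struct route_model and both function names are hypothetical, and the real function does more (clamping, for instance). The only point shown is that comparing a reported MTU against the effective value, which falls back to the route's configured MTU, stops a "Datagram Too Big" message from raising the MTU above a static route setting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a route entry. */
struct route_model {
	unsigned int learned_pmtu; /* 0 means nothing learned yet */
	bool pmtu_expired;         /* learned value no longer valid */
	unsigned int route_mtu;    /* 0 means no mtu set on the route (models RTAX_MTU) */
	unsigned int dev_mtu;      /* device MTU, always non-zero */
};

/* Learned PMTU first, then the route metric, then the device. */
static unsigned int effective_mtu(const struct route_model *rt)
{
	assert(rt->dev_mtu > 0);
	if (rt->learned_pmtu && !rt->pmtu_expired)
		return rt->learned_pmtu;
	if (rt->route_mtu)
		return rt->route_mtu;
	return rt->dev_mtu;
}

/* Accept a reported MTU only if it lowers the effective value.
 * Returns true when the report was accepted. */
static bool handle_too_big(struct route_model *rt, unsigned int reported_mtu)
{
	if (reported_mtu >= effective_mtu(rt))
		return false;
	rt->learned_pmtu = reported_mtu;
	rt->pmtu_expired = false;
	return true;
}
```

With a route MTU of 1400 on a 1500-byte device, a report of 1450 is ignored in this model, while a report of 1300 is accepted.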
Re: [PATCH v1 05/12] IB/cm: Share listening CM IDs
On Mon, Jun 22, 2015 at 03:42:34PM +0300, Haggai Eran wrote:
>  	spin_lock_irq(&cm.lock);
> +	if (--cm_id_priv->listen_sharecount > 0) {
> +		/* The id is still shared. */
> +		atomic_dec(&cm_id_priv->refcount);

Nit: This looks very strange not to be cm_deref_id .. Looks OK as is because we are sure refcount cannot be 0 here?

> @@ -958,8 +988,10 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
>  	}
>
>  	cm_id->state = IB_CM_LISTEN;
> +	++cm_id_priv->listen_sharecount;
>
> -	spin_lock_irqsave(&cm.lock, flags);
> +	if (lock)
> +		spin_lock_irqsave(&cm.lock, flags);

Hmm, I'd like to see the listen_sharecount consistently locked, so it should be manipulated only while cm.lock is held..

>  	if (service_id == IB_CM_ASSIGN_SERVICE_ID) {
>  		cm_id->service_id = cpu_to_be64(cm.listen_service_id++);
>  		cm_id->service_mask = ~cpu_to_be64(0);
> @@ -968,18 +1000,98 @@ int ib_cm_listen(struct ib_cm_id *cm_id, __be64 service_id, __be64 service_mask,
>  		cm_id->service_mask = service_mask;
>  	}
>  	cur_cm_id_priv = cm_insert_listen(cm_id_priv);
> -	spin_unlock_irqrestore(&cm.lock, flags);
> +	if (lock)
> +		spin_unlock_irqrestore(&cm.lock, flags);
>  	if (cur_cm_id_priv) {
>  		cm_id->state = IB_CM_IDLE;
> +		--cm_id_priv->listen_sharecount;

Ditto

Otherwise I don't see any other mechanical problems with this. Sean said he was happy with the idea right?

Reviewed-By: Jason Gunthorpe jguntho...@obsidianresearch.com

Jason
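The locking request in the review can be illustrated with a small single-threaded model. Everything here is hypothetical (struct listen_model, the helper names, the boolean that stands in for cm.lock); it is a sketch of the discipline being asked for, not the CM code. Each helper that touches the counter asserts that the lock is held, which is roughly what lockdep_assert_held() is used for in kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: "locked" stands in for cm.lock. */
struct listen_model {
	bool locked;
	int sharecount;
};

static void model_lock(struct listen_model *l)
{
	assert(!l->locked);
	l->locked = true;
}

static void model_unlock(struct listen_model *l)
{
	assert(l->locked);
	l->locked = false;
}

/* Take one more share; only legal while the lock is held. */
static void share_locked(struct listen_model *l)
{
	assert(l->locked);
	++l->sharecount;
}

/* Drop one share; returns true when the last sharer is gone. */
static bool unshare_locked(struct listen_model *l)
{
	assert(l->locked);
	assert(l->sharecount > 0);
	return --l->sharecount == 0;
}
```

Routing every increment and decrement through helpers like these makes an unlocked update fail loudly instead of racing silently.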
Re: [PATCH v1 11/12] IB/cma: Share ib_cm_ids between rdma_cm_ids
On Mon, Jun 22, 2015 at 03:42:40PM +0300, Haggai Eran wrote:
> Use ib_cm_id_create_and_listen to create listening IB CM IDs or share
      ^^^

Is that the wrong name? ib_cm_insert_listen perhaps?

I think I've looked at the details in this series I was concerned about, Sean should OK the rest of the changes to the CM code, but nothing much stood out to me.

Jason
Re: [PATCH] ipv6: Fix finding best source address in ipv6_dev_get_saddr().
At Mon, 13 Jul 2015 23:28:10 +0900, YOSHIFUJI Hideaki/吉藤英明 wrote:
> Commit 9131f3de2 (ipv6: Do not iterate over all interfaces when finding source address on specific interface.) did not properly update best source address available. Plus, it introduced possible NULL pointer dereference.
>
> Bug was reported by Erik Kline e...@google.com.
> Based on patch proposed by Hajime Tazaki thehaj...@gmail.com.
>
> Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b (ipv6: Do not iterate over all interfaces when finding source address on specific interface.)
> Signed-off-by: YOSHIFUJI Hideaki hideaki.yoshif...@miraclelinux.com

all of my tests passed with the patch on 14fe22e: Revert ipv4: use skb coalescing in defragmentation.

thanks for the prompt fix !

Acked-by: Hajime Tazaki thehaj...@gmail.com
[PATCH net-next] net: Build IPv6 into kernel by default
This patch makes the default to build IPv6 into the kernel. IPv6 now has significant traction and any remaining vestiges of IPv6 not being provided parity with IPv4 should be swept away. IPv6 is now core to the Internet and kernel.

Points on IPv6 adoption:

- Per Google statistics, IPv6 usage has reached 7% on the Internet and continues to exhibit an exponential growth rate
  https://www.google.com/intl/en/ipv6/statistics.html
- Just a few days ago ARIN officially depleted its IPv4 pool
- IPv6 only data centers are being successfully built (e.g. at Facebook)

This patch changes the IPv6 Kconfig for IPV6. Default for CONFIG_IPV6 is set to y and the text has been updated to reflect the maturity of IPv6.

Impact:

Under some circumstances building modules in to kernel might have a performance advantage. In my testing, I did notice a very slight improvement.

This will obviously increase the size of the kernel image. In my configuration I see:

IPv6 as module:

   text    data     bss      dec    hex filename
9703666 1899288  933888 12536842 bf4c0a vmlinux

IPv6 built into kernel:

   text    data     bss      dec    hex filename
9436490 1879600  913408 12229498 ba9b7a vmlinux

Which increases text size by ~270K (2.8% increase in size for me). If image size is an issue, presumably for a device which does not do IP networking (IMO we should be discouraging IPv4-only devices), IPV6 can be disabled or still built as a module.

Acked-by: YOSHIFUJI Hideaki yoshf...@linux-ipv6.org
Signed-off-by: Tom Herbert t...@herbertland.com
---
 net/ipv6/Kconfig | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index 438a73a..643f613 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -5,16 +5,15 @@
 # IPv6 as module will cause a CRASH if you try to unload it
 menuconfig IPV6
 	tristate "The IPv6 protocol"
-	default m
+	default y
 	---help---
-	  This is complemental support for the IP version 6.
-	  You will still be able to do traditional IPv4 networking as well.
+	  Support for IP version 6 (IPv6).
 
 	  For general information about IPv6, see
 	  <https://en.wikipedia.org/wiki/IPv6>.
-	  For Linux IPv6 development information, see <http://www.linux-ipv6.org>.
-	  For specific information about IPv6 under Linux, read the HOWTO at
-	  <http://www.bieringer.de/linux/IPv6/>.
+	  For specific information about IPv6 under Linux, see
+	  Documentation/networking/ipv6.txt and read the HOWTO at
+	  <http://www.tldp.org/HOWTO/Linux+IPv6-HOWTO/>
 
 	  To compile this protocol support as a module, choose M here: the
 	  module will be called ipv6.
-- 
1.8.1
Re: [PATCH] ipv6: Fix finding best source address in ipv6_dev_get_saddr().
I am testing this patch which may be a little simpler. Also idev needs to be checked after __in6_dev_get

Tom

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 4ab74d5..d631ac3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1363,9 +1363,10 @@ static void __ipv6_dev_get_saddr(struct net *net,
 				 unsigned int prefs,
 				 const struct in6_addr *saddr,
 				 struct inet6_dev *idev,
-				 struct ipv6_saddr_score *scores)
+				 struct ipv6_saddr_score **in_score,
+				 struct ipv6_saddr_score **in_hiscore)
 {
-	struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
+	struct ipv6_saddr_score *score = *in_score, *hiscore = *in_hiscore;
 
 	read_lock_bh(&idev->lock);
 	list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
@@ -1434,13 +1435,16 @@ static void __ipv6_dev_get_saddr(struct net *net,
 	}
 out:
 	read_unlock_bh(&idev->lock);
+	*in_hiscore = hiscore;
+	*in_score = score;
 }
 
 int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 		       const struct in6_addr *daddr, unsigned int prefs,
 		       struct in6_addr *saddr)
 {
-	struct ipv6_saddr_score scores[2], *hiscore = &scores[1];
+	struct ipv6_saddr_score scores[2];
+	struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
 	struct ipv6_saddr_dst dst;
 	struct inet6_dev *idev;
 	struct net_device *dev;
@@ -1475,18 +1479,19 @@ int ipv6_dev_get_saddr(struct net *net, const struct net_device *dst_dev,
 		if ((dst_type & IPV6_ADDR_MULTICAST) ||
 		    dst.scope <= IPV6_ADDR_SCOPE_LINKLOCAL) {
 			idev = __in6_dev_get(dst_dev);
-			use_oif_addr = true;
+			if (idev)
+				use_oif_addr = true;
 		}
 	}
 
 	if (use_oif_addr) {
-		__ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
+		__ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, &score, &hiscore);
 	} else {
 		for_each_netdev_rcu(net, dev) {
 			idev = __in6_dev_get(dev);
 			if (!idev)
 				continue;
-			__ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, scores);
+			__ipv6_dev_get_saddr(net, &dst, prefs, saddr, idev, &score, &hiscore);
 		}
 	}
 	rcu_read_unlock();

On Mon, Jul 13, 2015 at 7:28 AM, YOSHIFUJI Hideaki/吉藤英明 hideaki.yoshif...@miraclelinux.com wrote:
> Commit 9131f3de2 (ipv6: Do not iterate over all interfaces when finding source address on specific interface.) did not properly update best source address available. Plus, it introduced possible NULL pointer dereference.
>
> Bug was reported by Erik Kline e...@google.com.
> Based on patch proposed by Hajime Tazaki thehaj...@gmail.com.
>
> Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b (ipv6: Do not iterate over all interfaces when finding source address on specific interface.)
> Signed-off-by: YOSHIFUJI Hideaki hideaki.yoshif...@miraclelinux.com
> ---
>  net/ipv6/addrconf.c | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4ab74d5..4c9a024 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -1358,14 +1358,15 @@ out:
>  	return ret;
>  }
>
> -static void __ipv6_dev_get_saddr(struct net *net,
> -				 struct ipv6_saddr_dst *dst,
> -				 unsigned int prefs,
> -				 const struct in6_addr *saddr,
> -				 struct inet6_dev *idev,
> -				 struct ipv6_saddr_score *scores)
> +static int __ipv6_dev_get_saddr(struct net *net,
> +				struct ipv6_saddr_dst *dst,
> +				unsigned int prefs,
> +				const struct in6_addr *saddr,
> +				struct inet6_dev *idev,
> +				struct ipv6_saddr_score *scores,
> +				int hiscore_idx)
>  {
> -	struct ipv6_saddr_score *score = &scores[0], *hiscore = &scores[1];
> +	struct ipv6_saddr_score *score = &scores[1 - hiscore_idx], *hiscore = &scores[hiscore_idx];
>
>  	read_lock_bh(&idev->lock);
>  	list_for_each_entry(score->ifa, &idev->addr_list, if_list) {
> @@ -1424,6 +1425,7 @@ static void __ipv6_dev_get_saddr(struct net *net,
>  			in6_ifa_hold(score->ifa);
>
>  			swap(hiscore, score);
> +			hiscore_idx = 1 - hiscore_idx;
>
>  			/* restore our iterator */
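Both patches in this message deal with the same underlying issue: a helper that swaps its local score and hiscore pointers has to report the result back, or the caller keeps looking at the old "best" slot. A minimal sketch of that difference follows. The type score_model and both function names are hypothetical, and the comparison is reduced to a single integer; it is an illustration of the pointer handling only, not of the address selection rules.

```c
#include <assert.h>

/* Hypothetical stand-in for a score slot. */
struct score_model {
	int rank;
};

/* Swaps only the local copies of the pointers: the caller's view of
 * which slot holds the best score is left unchanged. */
static void pick_best_by_value(struct score_model *score,
			       struct score_model *hiscore)
{
	if (score->rank > hiscore->rank) {
		struct score_model *tmp = hiscore;

		hiscore = score;
		score = tmp;
	}
	(void)score;
	(void)hiscore;
}

/* Takes pointers to the caller's pointers and writes the result back,
 * in the style of the in_score/in_hiscore parameters above. */
static void pick_best_by_ref(struct score_model **in_score,
			     struct score_model **in_hiscore)
{
	struct score_model *score, *hiscore;

	assert(in_score && in_hiscore);
	score = *in_score;
	hiscore = *in_hiscore;

	if (score->rank > hiscore->rank) {
		struct score_model *tmp = hiscore;

		hiscore = score;
		score = tmp;
	}
	*in_hiscore = hiscore;
	*in_score = score;
}
```

The quoted patch reaches the same end by passing an index (hiscore_idx) in and out instead of a pair of pointer-to-pointer parameters.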
Re: [PATCH v2 15/22] fjes: net_device_ops.ndo_vlan_rx_add/kill_vid
Hi Izumi-san,

On Wed, 24 Jun 2015 11:55:47 +0900 Taku Izumi izumi.t...@jp.fujitsu.com wrote:
> This patch adds net_device_ops.ndo_vlan_rx_add_vid and net_device_ops.ndo_vlan_rx_kill_vid callback.
>
> Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
> ---
>  drivers/net/fjes/fjes_hw.c | 27 +++
>  drivers/net/fjes/fjes_hw.h | 2 ++
>  drivers/net/fjes/fjes_main.c | 40
>  3 files changed, 69 insertions(+)
>
> diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
> index 5e3f847..8363e22 100644
> --- a/drivers/net/fjes/fjes_hw.c
> +++ b/drivers/net/fjes/fjes_hw.c
> @@ -827,6 +827,33 @@ bool fjes_hw_check_vlan_id(struct epbuf_handler *epbh, u16 vlan_id)
>  	return ret;
>  }
>
> +bool fjes_hw_set_vlan_id(struct epbuf_handler *epbh, u16 vlan_id)
> +{
> +	union ep_buffer_info *info = epbh->info;
> +	int i;
> +
> +	for (i = 0; i < EP_BUFFER_SUPPORT_VLAN_MAX; i++) {
> +		if (info->v1i.vlan_id[i] == 0) {
> +			info->v1i.vlan_id[i] = vlan_id;
> +			return true;
> +		}
> +	}
> +	return false;
> +}
> +
> +void fjes_hw_del_vlan_id(struct epbuf_handler *epbh, u16 vlan_id)
> +{
> +	union ep_buffer_info *info = epbh->info;
> +	int i;
> +
> +	if (0 != vlan_id) {

How about using the following if statement so that you can delete the indent?

	if (vlan_id == 0)
		return;

> +		for (i = 0; i < EP_BUFFER_SUPPORT_VLAN_MAX; i++) {
> +			if (vlan_id == info->v1i.vlan_id[i])
> +				info->v1i.vlan_id[i] = 0;
> +		}
> +	}
> +}
> +
>  bool fjes_hw_epbuf_rx_is_empty(struct epbuf_handler *epbh)
>  {
>  	union ep_buffer_info *info = epbh->info;
> diff --git a/drivers/net/fjes/fjes_hw.h b/drivers/net/fjes/fjes_hw.h
> index ea30aeb..afad03e 100644
> --- a/drivers/net/fjes/fjes_hw.h
> +++ b/drivers/net/fjes/fjes_hw.h
> @@ -321,6 +321,8 @@ int fjes_hw_epid_is_shared(struct fjes_device_shared_info *, int);
>  bool fjes_hw_check_epbuf_version(struct epbuf_handler *, u32);
>  bool fjes_hw_check_mtu(struct epbuf_handler *, u32);
>  bool fjes_hw_check_vlan_id(struct epbuf_handler *, u16);
> +bool fjes_hw_set_vlan_id(struct epbuf_handler *, u16);
> +void fjes_hw_del_vlan_id(struct epbuf_handler *, u16);
>  bool fjes_hw_epbuf_rx_is_empty(struct epbuf_handler *);
>  void *fjes_hw_epbuf_rx_curpkt_get_addr(struct epbuf_handler *, size_t *);
>  void fjes_hw_epbuf_rx_curpkt_drop(struct epbuf_handler *);
> diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
> index e2e69e0..bb4c8e4 100644
> --- a/drivers/net/fjes/fjes_main.c
> +++ b/drivers/net/fjes/fjes_main.c
> @@ -58,6 +58,8 @@ static irqreturn_t fjes_intr(int, void*);
>  static struct rtnl_link_stats64
>  *fjes_get_stats64(struct net_device *, struct rtnl_link_stats64 *);
>  static int fjes_change_mtu(struct net_device *, int);
> +static int fjes_vlan_rx_add_vid(struct net_device *, __be16 proto, u16);
> +static int fjes_vlan_rx_kill_vid(struct net_device *, __be16 proto, u16);
>  static void fjes_tx_retry(struct net_device *);
>
>  static int fjes_acpi_add(struct acpi_device *);
> @@ -229,6 +231,8 @@ static const struct net_device_ops fjes_netdev_ops = {
>  	.ndo_get_stats64	= fjes_get_stats64,
>  	.ndo_change_mtu		= fjes_change_mtu,
>  	.ndo_tx_timeout		= fjes_tx_retry,
> +	.ndo_vlan_rx_add_vid	= fjes_vlan_rx_add_vid,
> +	.ndo_vlan_rx_kill_vid	= fjes_vlan_rx_kill_vid,
>  };
>
>  /* fjes_open - Called when a network interface is made active */
> @@ -757,6 +761,42 @@ static int fjes_change_mtu(struct net_device *netdev, int new_mtu)
>  	return -EINVAL;
>  }
>
> +static int fjes_vlan_rx_add_vid(struct net_device *netdev,
> +				__be16 proto, u16 vid)
> +{
> +	struct fjes_adapter *adapter = netdev_priv(netdev);
> +	bool ret = true;
> +	int epid;
> +
> +	for (epid = 0; epid < adapter->hw.max_epid; epid++) {
> +		if (epid == adapter->hw.my_epid)
> +			continue;
> +
> +		if (!fjes_hw_check_vlan_id(
> +			&adapter->hw.ep_shm_info[epid].tx, vid))
> +			ret = fjes_hw_set_vlan_id(
> +				&adapter->hw.ep_shm_info[epid].tx, vid);
> +	}
> +
> +	return ret ? 0 : -ENOSPC;
> +}
> +
> +static int fjes_vlan_rx_kill_vid(struct net_device *netdev,
> +				 __be16 proto, u16 vid)

The function always returns 0. So how about defining the function as void?

Thanks,
Yasuaki Ishimatsu

> +{
> +	struct fjes_adapter *adapter = netdev_priv(netdev);
> +	int epid;
> +
> +	for (epid = 0; epid < adapter->hw.max_epid; epid++) {
> +		if (epid == adapter->hw.my_epid)
> +			continue;
> +
> +		fjes_hw_del_vlan_id(&adapter->hw.ep_shm_info[epid].tx, vid);
> +	}
> +
> +	return 0;
> +}
> +
>  static irqreturn_t fjes_intr(int irq, void *data)
>  {
>  	struct fjes_adapter *adapter = data;
> --
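The early-return shape suggested in the review can be tried out in user space with a small model of the VLAN ID table. The names vlan_table_model, vlan_set, vlan_del and the slot count VLAN_SLOTS are hypothetical stand-ins (VLAN_SLOTS plays the role of EP_BUFFER_SUPPORT_VLAN_MAX, whose value is not given in the thread); the behaviour mirrors the quoted helpers, with 0 meaning "free slot".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_SLOTS 4 /* hypothetical; real limit not shown in the thread */

struct vlan_table_model {
	uint16_t vlan_id[VLAN_SLOTS]; /* 0 marks a free slot */
};

/* Store vlan_id in the first free slot; false when the table is full. */
static bool vlan_set(struct vlan_table_model *t, uint16_t vlan_id)
{
	int i;

	assert(t);
	for (i = 0; i < VLAN_SLOTS; i++) {
		if (t->vlan_id[i] == 0) {
			t->vlan_id[i] = vlan_id;
			return true;
		}
	}
	return false;
}

/* Clear every slot holding vlan_id. The early return replaces the
 * enclosing "if (0 != vlan_id)" block and saves one indent level. */
static void vlan_del(struct vlan_table_model *t, uint16_t vlan_id)
{
	int i;

	assert(t);
	if (vlan_id == 0)
		return;

	for (i = 0; i < VLAN_SLOTS; i++) {
		if (t->vlan_id[i] == vlan_id)
			t->vlan_id[i] = 0;
	}
}
```

Without the guard, deleting ID 0 would still be harmless here because it only rewrites free slots with 0, so the check is about intent and readability rather than correctness.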
Re: [PATCH] nf: IDLETIMER: fix lockdep warning
On Mon, Jul 13, 2015 at 08:02:36AM -0700, Dmitry Torokhov wrote:
> On Mon, Jul 13, 2015 at 6:20 AM, Pablo Neira Ayuso pa...@netfilter.org wrote:
>> On Thu, Jul 09, 2015 at 05:15:01PM -0700, Dmitry Torokhov wrote:
>>> Dynamically allocated sysfs attributes should be initialized with sysfs_attr_init() otherwise lockdep will be angry with us:
>>>
>>> [ 45.468653] BUG: key ffc030fad4e0 not in .data!
>>> [ 45.468655] [ cut here ]
>>> [ 45.468666] WARNING: CPU: 0 PID: 1176 at /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 lockdep_init_map+0x12c/0x490()
>>> [ 45.468672] DEBUG_LOCKS_WARN_ON(1)
>>> [ 45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U W 3.18.0 #43
>>> [ 45.468674] Hardware name: XXX
>>> [ 45.468675] Call trace:
>>> [ 45.468680] [ffc0002072b4] dump_backtrace+0x0/0x10c
>>> [ 45.468683] [ffc0002073d0] show_stack+0x10/0x1c
>>> [ 45.468688] [ffc000a86cd4] dump_stack+0x74/0x94
>>> [ 45.468692] [ffc000217ae0] warn_slowpath_common+0x84/0xb0
>>> [ 45.468694] [ffc000217b84] warn_slowpath_fmt+0x4c/0x58
>>> [ 45.468697] [ffc0002530a4] lockdep_init_map+0x128/0x490
>>> [ 45.468701] [ffc000367ef0] __kernfs_create_file+0x80/0xe4
>>> [ 45.468704] [ffc00036862c] sysfs_add_file_mode_ns+0x104/0x170
>>> [ 45.468706] [ffc00036870c] sysfs_create_file_ns+0x58/0x64
>>> [ 45.468711] [ffc000930430] idletimer_tg_checkentry+0x14c/0x324
>>> [ 45.468714] [ffc00092a728] xt_check_target+0x170/0x198
>>> [ 45.468717] [ffc000993efc] check_target+0x58/0x6c
>>> [ 45.468720] [ffc000994c64] translate_table+0x30c/0x424
>>> [ 45.468723] [ffc00099529c] do_ipt_set_ctl+0x144/0x1d0
>>> [ 45.468728] [ffc0009079f0] nf_setsockopt+0x50/0x60
>>> [ 45.468732] [ffc000946870] ip_setsockopt+0x8c/0xb4
>>> [ 45.468735] [ffc0009661c0] raw_setsockopt+0x10/0x50
>>> [ 45.468739] [ffc0008c1550] sock_common_setsockopt+0x14/0x20
>>> [ 45.468742] [ffc0008bd190] SyS_setsockopt+0x88/0xb8
>>> [ 45.468744] ---[ end trace 41d156354d18c039 ]---
>>
>> Applied, thanks.
>>
>> One question:
>>
>>> Change-Id: I1da5cd96fc8e1e1e4209e81eba1165a42d4d45e9
>>
>> BTW, does this gerrit change ID provide any public information? Thanks.
>
> Argh, I am sorry, I forgot to clean this out when mailing the patch. In this particular case you can find the change in AOSP gerrit at https://android-review.googlesource.com but without such context this change-id is of course useless.

No problem, I'll remove it. Thanks Dmitry.
Re: Linux 4.2 build error in net/netfilter/ipset/ip_set_hash_netnet.c
On Sun, 05 Jul 2015 08:35:20 -0700, Guenter Roeck wrote:
> On Sat, Jul 04, 2015 at 12:44:36AM -0700, Vinson Lee wrote:
>> Hi.
>>
>> With the latest Linux 4.2-rc1, I am hitting this build error with GCC 4.4.7 on CentOS 6.
>>
>> CC net/netfilter/ipset/ip_set_hash_netnet.o
>> net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet4_uadt’:
>> net/netfilter/ipset/ip_set_hash_netnet.c:163: error: unknown field ‘cidr’ specified in initializer
>> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: missing braces around initializer
>> net/netfilter/ipset/ip_set_hash_netnet.c:163: warning: (near initialization for ‘e.<anonymous>.ip’)
>> net/netfilter/ipset/ip_set_hash_netnet.c: In function ‘hash_netnet6_uadt’:
>> net/netfilter/ipset/ip_set_hash_netnet.c:388: error: unknown field ‘cidr’ specified in initializer
>> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: missing braces around initializer
>> net/netfilter/ipset/ip_set_hash_netnet.c:388: warning: (near initialization for ‘e.ip[0]’)
>
> Previously fixed with commit 1a869205c75cb (netfilter: ipset: The unnamed union initialization may lead to compilation error), reintroduced with commit aff227581ed1a (netfilter: ipset: Check CIDR value only when attribute is given).
>
> Guenter

I wonder what can be done to get this issue fixed. This problem was seen in 4.2-rc1 and now in 4.2-rc2 on RHEL-6.6.

$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)

Please advise.

Akemi
Re: [PATCH v2 10/22] fjes: tx_stall_task
Hi Izumi-san,

On Wed, 24 Jun 2015 11:55:42 +0900 Taku Izumi izumi.t...@jp.fujitsu.com wrote:
> This patch adds tx_stall_task. When receiver's buffer is full, sender stops its tx queue. This task is used to monitor receiver's status and when receiver's buffer is available, it resumes tx queue.
>
> Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
> ---
>  drivers/net/fjes/fjes.h | 2 ++
>  drivers/net/fjes/fjes_main.c | 63
>  2 files changed, 65 insertions(+)
>
> diff --git a/drivers/net/fjes/fjes.h b/drivers/net/fjes/fjes.h
> index 8e9899e..b04ea9d 100644
> --- a/drivers/net/fjes/fjes.h
> +++ b/drivers/net/fjes/fjes.h
> @@ -30,6 +30,7 @@
>  #define FJES_MAX_QUEUES 1
>  #define FJES_TX_RETRY_INTERVAL (20 * HZ)
>  #define FJES_TX_RETRY_TIMEOUT (100)
> +#define FJES_TX_TX_STALL_TIMEOUT (FJES_TX_RETRY_INTERVAL / 2)
>  #define FJES_OPEN_ZONE_UPDATE_WAIT (300) /* msec */
>
>  /* board specific private data structure */
> @@ -52,6 +53,7 @@ struct fjes_adapter {
>
>  	struct workqueue_struct *txrx_wq;
>
> +	struct work_struct tx_stall_task;
>  	struct work_struct raise_intr_rxdata_task;
>
>  	struct fjes_hw hw;
> diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
> index 735aa5e..f4c2445 100644
> --- a/drivers/net/fjes/fjes_main.c
> +++ b/drivers/net/fjes/fjes_main.c
> @@ -53,6 +53,7 @@ static int fjes_setup_resources(struct fjes_adapter *);
>  static void fjes_free_resources(struct fjes_adapter *);
>  static netdev_tx_t fjes_xmit_frame(struct sk_buff *, struct net_device *);
>  static void fjes_raise_intr_rxdata_task(struct work_struct *);
> +static void fjes_tx_stall_task(struct work_struct *);
>  static irqreturn_t fjes_intr(int, void*);
>
>  static int fjes_acpi_add(struct acpi_device *);
> @@ -281,6 +282,7 @@ static int fjes_close(struct net_device *netdev)
>  	fjes_free_irq(adapter);
>
>  	cancel_work_sync(&adapter->raise_intr_rxdata_task);
> +	cancel_work_sync(&adapter->tx_stall_task);
>
>  	fjes_hw_wait_epstop(hw);
>
> @@ -410,6 +412,61 @@ static void fjes_free_resources(struct fjes_adapter *adapter)
>  	}
>  }
>
> +static void fjes_tx_stall_task(struct work_struct *work)
> +{
> +	struct fjes_adapter *adapter = container_of(work,
> +			struct fjes_adapter, tx_stall_task);
> +	struct fjes_hw *hw = &adapter->hw;
> +	struct net_device *netdev = adapter->netdev;
> +	enum ep_partner_status pstatus;
> +	int epid;
> +	int max_epid, my_epid;
> +	union ep_buffer_info *info;
> +	int all_queue_available;
> +	int i;
> +	int sendable;
> +
> +	if (((long)jiffies -
> +	     (long)(netdev->trans_start)) > FJES_TX_TX_STALL_TIMEOUT) {
> +		netif_wake_queue(netdev);
> +		return;
> +	}
> +
> +	my_epid = hw->my_epid;
> +	max_epid = hw->max_epid;
> +
> +	for (i = 0; i < 5; i++) {

Why do you loop 5 times?

Thanks,
Yasuaki Ishimatsu

> +		all_queue_available = 1;
> +
> +		for (epid = 0; epid < max_epid; epid++) {
> +			if (my_epid == epid)
> +				continue;
> +
> +			pstatus = fjes_hw_get_partner_ep_status(hw, epid);
> +			sendable = (pstatus == EP_PARTNER_SHARED);
> +			if (!sendable)
> +				continue;
> +
> +			info = adapter->hw.ep_shm_info[epid].tx.info;
> +
> +			if (EP_RING_FULL(info->v1i.head, info->v1i.tail,
> +					 info->v1i.count_max)) {
> +				all_queue_available = 0;
> +				break;
> +			}
> +		}
> +
> +		if (all_queue_available) {
> +			netif_wake_queue(netdev);
> +			return;
> +		}
> +	}
> +
> +	usleep_range(50, 100);
> +
> +	queue_work(adapter->txrx_wq, &adapter->tx_stall_task);
> +}
> +
>  static void fjes_raise_intr_rxdata_task(struct work_struct *work)
>  {
>  	struct fjes_adapter *adapter = container_of(work,
> @@ -606,6 +663,10 @@ fjes_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>  				netdev->trans_start = jiffies;
>  				netif_tx_stop_queue(cur_queue);
>
> +				if (!work_pending(&adapter->tx_stall_task))
> +					queue_work(adapter->txrx_wq,
> +						   &adapter->tx_stall_task);
> +
>  				ret = NETDEV_TX_BUSY;
>  			}
>  		} else {
> @@ -690,6 +751,7 @@ static int fjes_probe(struct platform_device *plat_dev)
>
>  	adapter->txrx_wq = create_workqueue(DRV_NAME "/txrx");
>
> +	INIT_WORK(&adapter->tx_stall_task, fjes_tx_stall_task);
>  	INIT_WORK(&adapter->raise_intr_rxdata_task,
>  		  fjes_raise_intr_rxdata_task);
>
> @@ -734,6
[PATCH] net: qlcnic: Deletion of unnecessary memset
There is no need to memset memory allocated with vzalloc.

Signed-off-by: Christophe JAILLET christophe.jail...@wanadoo.fr
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 2f6cc42..7dbab3c 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -2403,7 +2403,6 @@ int qlcnic_alloc_tx_rings(struct qlcnic_adapter *adapter,
 			qlcnic_free_tx_rings(adapter);
 			return -ENOMEM;
 		}
-		memset(cmd_buf_arr, 0, TX_BUFF_RINGSIZE(tx_ring));
 		tx_ring->cmd_buf_arr = cmd_buf_arr;
 		spin_lock_init(&tx_ring->tx_clean_lock);
 	}
-- 
2.1.4
virtio-net TSO Lockup
We've been encountering an issue in the virtio-net driver that causes it to become unresponsive after a period of high load. This issue goes away if we disable TSO on the interface.

Once this issue has been triggered, the interface can still receive traffic, but will not transmit anything. Specifically:

* Initially the machine will still try to respond to packets (I say try, because I see the packets in tcpdump, but the counters shown by 'ip -s -d link show eth1' do not increment. I also do not see the packets make it to the upstream network interface)
* After a little while (1-2 minutes), I stop seeing the response packets in tcpdump. (In this case I'm looking for ARP request/replies, so the requests still come in, but the responses do not go out. This is not limited to just ARP, the interface will not respond at all)
* If I leave a ping running while the interface is broken, eventually I start seeing 'ping: sendmsg: No buffer space available'

I've reproduced this on a few Ubuntu kernel builds (3.13.0-53-generic and 4.0.7-040007-generic), and a few CentOS kernels (2.6.32-504.16.2.el6.x86_64, 4.1.1-1.el6.elrepo.x86_64), so I do not believe this to be distribution specific.

If I restart the machine (just issuing a server level 'reboot' command, not restarting qemu itself), the adapter starts working properly again.

Interestingly, these machines have two virtio NICs, and this only seems to occur for one of them (by this, I mean eth0 always works, and eth1 always breaks. If I remove eth0 from the machine, eth1 still breaks). On the host level, the broken one is a macvtap interface, while the working one is a tap device. We've seen this in the past with a different interface type (the qemu multicast NIC type), so I do not believe this is really relevant.

If I switch the machines to using emulated e1000 NICs, I can no longer reproduce the issue.

Reproduction is fairly easy: with two machines, run `nc -lk 1818 | pv > /dev/null` on one, and `cat /dev/zero | pv | nc 10.99.0.100 1818` on the other (the machine sending traffic will break within a minute or two).

I can easily provide access to machines where the problem manifests, if that would be helpful.

I'm not really sure where to go from here. Tracking down a bug in the virtio driver is a bit above my skill level.
Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one
On 07/13/2015 11:05 PM, Nikolay Aleksandrov wrote: On 07/13/2015 08:57 PM, cls...@linux.vnet.ibm.com wrote: From: Carol L Soto cls...@linux.vnet.ibm.com Add function bond_remove_proc_entry at __bond_release_one to avoid stack trace at rmmod bonding. [68830.202239] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0' [68830.202257] [ cut here ] [68830.202260] WARNING: at fs/proc/generic.c:562 [68830.202412] NIP [c02abf6c] .remove_proc_entry+0x1fc/0x240 [68830.202416] LR [c02abf68] .remove_proc_entry+0x1f8/0x240 [68830.202419] PACATMSCRATCH [80009032] [68830.202421] Call Trace: [68830.202424] [c00179277940] [c02abf68] .remove_proc_entry+0x1f8/0x240 (unreliable) [68830.202434] [c001792779f0] [d53229a4] .bond_destroy_proc_dir+0x34/0x54 [bonding] [68830.202440] [c00179277a70] [d53130e0] .bond_net_exit+0x90/0x120 [bonding] [68830.202445] [c00179277b10] [c059944c] .ops_exit_list.isra.0+0x6c/0xd0 [68830.202450] [c00179277ba0] [c0599774] .unregister_pernet_operations+0x94/0x100 [68830.202454] [c00179277c40] [c0599814] .unregister_pernet_subsys+0x34/0x60 [68830.202460] [c00179277cc0] [d5323758] .bonding_exit+0x48/0x2328 [bonding] [68830.202466] [c00179277d30] [c010dcc4] .SyS_delete_module+0x1f4/0x340 [68830.202471] [c00179277e30] [c0009e7c] syscall_exit+0x0/0x7c [68830.202491] ---[ end trace 9bd1d810219c9875 ]--- Signed-off-by: Carol L Soto cls...@linux.vnet.ibm.com --- drivers/net/bonding/bond_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 19eb990..ace105a 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev, dev_set_mac_address(slave_dev, addr); } +bond_remove_proc_entry(bond); + dev_set_mtu(slave_dev, slave-original_mtu); slave_dev-priv_flags = ~IFF_BONDING; This is incorrect, it tries to remove the bond entry on every slave release so if we have 
a bonding device with = 2 slaves and release one of them then the whole bond device entry will be removed from /proc/net/bonding. You can hit this case only if you had created a bonding device while doing the rmmod bonding (it's an old race condition which was fixed long time ago, but the procfs was apparently missed) and only after the notifier has been unregistered but before the sysfs has been removed. Scratch this part, it should be triggered in a different way. Could you provide a way to reproduce ? Since the bonding netdevice notifier is handling the procfs creation/destruction we could try moving the unregister after the pernet destruction which should help avoid such problems. Could you try the following patch: diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 19eb990d398c..d515ee38b77f 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4682,12 +4682,10 @@ err_link: static void __exit bonding_exit(void) { - unregister_netdevice_notifier(bond_netdev_notifier); - bond_destroy_debugfs(); - bond_netlink_fini(); unregister_pernet_subsys(bond_net_ops); + unregister_netdevice_notifier(bond_netdev_notifier); #ifdef CONFIG_NET_POLL_CONTROLLER /* Make sure we don't have an imbalance on our netpoll blocking */ -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users
From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
Date: Mon, 13 Jul 2015 16:05:27 -0300

> On 13-07-2015 15:59, David Miller wrote:
>> From: Neil Horman nhor...@tuxdriver.com
>> Date: Mon, 13 Jul 2015 06:39:11 -0400
>>
>>> Initially Marcelo had created duplicate code paths, one to return an fd, one to return a file struct. If you would rather go in that direction, I'm sure he can propose it again, but that seems less correct to me than this solution.
>>
>> That's much better.
>
> I'm not sure what you mean. Is the new option better or the history/description?

I mean that adding an explicit function for these internal kernel users to call is better.
Re: [PATCH net-next] tc: fix tc actions in case of shared skb
On 7/11/15 9:29 PM, David Miller wrote:
> From: Alexei Starovoitov a...@plumgrid.com
> Date: Fri, 10 Jul 2015 17:10:11 -0700
>
>> TC actions need to check for very unlikely event skb->users != 1, otherwise subsequent pskb_may_pull/pskb_expand_head will crash. When skb_shared() just drop the packet, since in the middle of actions it's too late to call skb_share_check(), since classifiers/actions assume the same skb pointer.
>>
>> Signed-off-by: Alexei Starovoitov a...@plumgrid.com
>
> I think whatever creates this skb->users != 1 situation should be fixed, they should clone the packet.

In all normal cases skb->users == 1, but pktgen is using trick:

  atomic_add(burst, &skb->users);

so when testing something like:

  tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
    action vlan push id 2 action drop

it will crash:

[ 31.999519] kernel BUG at ../net/core/skbuff.c:1130!
[ 31.999519] invalid opcode: [#1] PREEMPT SMP
[ 31.999519] Modules linked in: act_gact act_vlan cls_u32 sch_ingress veth pktgen
[ 31.999519] CPU: 0 PID: 339 Comm: kpktgend_0 Not tainted 4.1.0+ #730
[ 31.999519] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
[ 31.999519] Call Trace:
[ 31.999519] [8160eea7] skb_vlan_push+0x1d7/0x200
[ 31.999519] [a0017108] tcf_vlan+0x108/0x110 [act_vlan]
[ 31.999519] [81650d26] tcf_action_exec+0x46/0x80
[ 31.999519] [a001f4fe] u32_classify+0x30e/0x740 [cls_u32]
[ 31.999519] [810bcc6f] ? __lock_acquire+0xbcf/0x1e80
[ 31.999519] [810bcc6f] ? __lock_acquire+0xbcf/0x1e80
[ 31.999519] [8161f392] ? __netif_receive_skb_core+0x1b2/0xce0
[ 31.999519] [8164c0c3] tc_classify_compat+0xa3/0xb0
[ 31.999519] [8164ca03] tc_classify+0x33/0x90
[ 31.999519] [8161f674] __netif_receive_skb_core+0x494/0xce0
[ 31.999519] [8161f274] ? __netif_receive_skb_core+0x94/0xce0
[ 31.999519] [810bf10d] ? trace_hardirqs_on_caller+0xad/0x1d0
[ 31.999519] [8161fee1] __netif_receive_skb+0x21/0x70
[ 31.999519] [81620b43] netif_receive_skb_internal+0x23/0x1c0
[ 31.999519] [816219a9] netif_receive_skb_sk+0x49/0x1e0
[ 31.999519] [a0006e8d] pktgen_thread_worker+0x111d/0x1fa0 [pktgen]

> In fact, it would really help enormously if you could explain in detail how this situation can actually arise. Especially since I do not consider it acceptable to drop the packet in this situation.

It's not pretty to drop, but it's better than crash. I don't think we can get rid of 'skb->users += burst' trick, since that's where all performance comes from (for both TX and RX testing). So the only cheap way I see to avoid crash is to do this if (unlikely(skb_shared(skb))) check in actions that call pskb_expand_head. In all normal scenarios it won't be triggered and pktgen tests won't be crashing. Yes. pktgen numbers will be a bit meaningless, since act_vlan will be dropping instead of adding vlan, so users cannot make any performance conclusions, but still better than crash.

> the rules specified here: Documentation/networking/tc-actions-env-rules.txt insufficient?

Jamal, that doc definitely needs updating. :) It says:

  "If you munge any packet thou shalt call pskb_expand_head in the case someone else is referencing the skb. After that you own the skb."

that's incorrect. If somebody 'referencing' skb via skb->users > 1 it's too late to call pskb_expand_head. As you can see in the crash trace above.
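The check proposed in this message can be sketched in userspace C. This is only an illustration under stated assumptions: `struct mock_skb`, `mock_skb_shared()` and `mock_vlan_push()` are hypothetical stand-ins, not kernel APIs, and the reference count is a plain `int` rather than an atomic. The point is the ordering: test for sharing before any in-place modification, and drop instead of modifying when the buffer is not exclusively owned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; only two fields matter here. */
struct mock_skb {
	int users;       /* number of holders; 1 means exclusively owned */
	size_t headroom; /* bytes still free in front of the payload */
};

/* Same idea as skb_shared(): true when the count is anything but 1. */
static bool mock_skb_shared(const struct mock_skb *skb)
{
	return skb->users != 1;
}

enum mock_verdict { MOCK_OK, MOCK_DROP };

/* Hypothetical action that consumes `len` bytes of headroom. */
static enum mock_verdict mock_vlan_push(struct mock_skb *skb, size_t len)
{
	assert(skb != NULL);

	/* A shared buffer must not be modified in place, so bail out first. */
	if (mock_skb_shared(skb))
		return MOCK_DROP;

	if (skb->headroom < len)
		return MOCK_DROP;

	skb->headroom -= len;
	return MOCK_OK;
}
```

With `users == 1` the push succeeds and consumes headroom; with `users == 3` (the situation a burst increment would create in this model) the buffer is left untouched and the verdict is a drop.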
Re: [PATCH net-next] net: Build IPv6 into kernel by default
From: Tom Herbert t...@herbertland.com
Date: Mon, 13 Jul 2015 08:48:00 -0700

> This patch makes the default to build IPv6 into the kernel. IPv6 now has significant traction and any remaining vestiges of IPv6 not being provided parity with IPv4 should be swept away. IPv6 is now core to the Internet and kernel.
>
> Points on IPv6 adoption: ...
>
> Acked-by: YOSHIFUJI Hideaki yoshf...@linux-ipv6.org
> Signed-off-by: Tom Herbert t...@herbertland.com

Applied, thanks.
Re: [PATCH net-next] ebpf: remove self-assignment in interpreter's tail call
From: Daniel Borkmann dan...@iogearbox.net
Date: Mon, 13 Jul 2015 20:49:32 +0200

> ARG1 = BPF_R1 as it stands, evaluates to regs[BPF_REG_1] = regs[BPF_REG_1] and thus has no effect. Add a comment instead, explaining what happens and why it's okay to just remove it.
>
> Since from user space side, a tail call is invoked as a pseudo helper function via bpf_tail_call_proto, the verifier checks the arguments just like with any other helper function and makes sure that the first argument (regs[BPF_REG_1])'s type is ARG_PTR_TO_CTX.
>
> Signed-off-by: Daniel Borkmann dan...@iogearbox.net

Applied, thanks Daniel.
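The self-assignment described in the commit message can be reproduced with a reduced userspace model. All `MOCK_*` names below are hypothetical and only mimic the shape of the interpreter's macros; the assumption, taken from the commit message, is that the argument-1 alias and register 1 share one index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced register numbering for illustration only. */
enum { MOCK_REG_0, MOCK_REG_1, MOCK_REG_2, MOCK_MAX_REG };

/* Assumption from the commit message: argument 1 aliases register 1. */
#define MOCK_REG_ARG1 MOCK_REG_1

#define MOCK_R1   regs[MOCK_REG_1]
#define MOCK_ARG1 regs[MOCK_REG_ARG1]

/* Performs the "assignment" and returns the value of register 1. */
static uint64_t mock_tail_call_setup(uint64_t regs[MOCK_MAX_REG])
{
	assert(MOCK_REG_ARG1 == MOCK_REG_1);

	/* Expands to regs[1] = regs[1], which changes nothing. */
	MOCK_ARG1 = MOCK_R1;

	return regs[MOCK_REG_1];
}
```

Because both macros expand to the same array element, the register file is identical before and after the statement, which is why dropping it and leaving a comment is safe in this model.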
Re: [PATCH 1/2] sctp: SCTP_SOCKOPT_PEELOFF return socket pointer for kernel users
On 13-07-2015 16:58, David Miller wrote:
> From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
> Date: Mon, 13 Jul 2015 16:05:27 -0300
>
>> On 13-07-2015 15:59, David Miller wrote:
>>> From: Neil Horman nhor...@tuxdriver.com
>>> Date: Mon, 13 Jul 2015 06:39:11 -0400
>>>
>>>> Initially Marcelo had created duplicate code paths, one to return an fd, one to return a file struct. If you would rather go in that direction, I'm sure he can propose it again, but that seems less correct to me than this solution.
>>>
>>> That's much better.
>>
>> I'm not sure what you mean. Is the new option better or the history/description?
>
> I mean that adding an explicit function for these internal kernel users to call is better.

Okay. I'll try to minimize that code duplication then.

Thanks
Marcelo
Re: [PATCH] Logically DeadCode
On 3 July 2015 at 06:52, Rahul Jain rahul.j...@samsung.com wrote:
> From 0c34030166a150d6d9f1ab52e7bb40a5440a68c2 Mon Sep 17 00:00:00 2001
> From: Rahul Jain rahul.j...@samsung.com
> Date: Fri, 3 Jul 2015 10:19:12 +0530
> Subject: [PATCH] Logically DeadCode

You didn't use any prefix for the commit message, it's unclear (Logically DeadCode what?), no description, you touch two code places at once. Please fix above problems and resend.
Re: [PATCH] Revert net: fec: Ensure clocks are enabled while using mdio bus
From: Fabio Estevam fabio.este...@freescale.com
Date: Mon, 13 Jul 2015 08:13:52 -0300

> This reverts commit 6c3e921b18edca290099adfddde8a50236bf2d80.
>
> commit 6c3e921b18ed (net: fec: Ensure clocks are enabled while using mdio bus) prevents the kernel to boot on mx6 boards, so let's revert it.
>
> Reported-by: Tyler Baker tyler.ba...@linaro.org
> Signed-off-by: Fabio Estevam fabio.este...@freescale.com

Andrew, please review.
Re: [PATCH net-next] tc: fix tc actions in case of shared skb
On 7/13/15 1:04 PM, David Miller wrote:
> From: Alexei Starovoitov a...@plumgrid.com
> Date: Mon, 13 Jul 2015 12:47:42 -0700
>
>> In all normal cases skb->users == 1, but pktgen is using trick:
>>   atomic_add(burst, &skb->users);
>> so when testing something like:
>
> You can want pktgen rx (which is the only buggy case as far as I can see, TX is fine) to run fast, but you must do so by abiding by the appropriate SKB sharing rules.
>
> You can't do an optimization in pktgen for RX processing that works some of the time. We have shared SKB rules for a reason. And I don't want to have to explain to someone in the future why that drop check is there, and have to tell them because pktgen is broken and we decided to add a hack here rather than make pktgen send properly formed SKBs into the RX path.
>
> Ok?

in general all makes sense, but it is both RX and TX. Without burst hack we cannot achieve line rate TX.

  atomic_add(burst, &pkt_dev->skb->users);
  xmit_more:
  ret = netdev_start_xmit(pkt_dev->skb, odev, txq, --burst > 0);

in pktgen we check that driver can work with users > 1 via:

  pkt_dev->odev->priv_flags & IFF_TX_SKB_SHARING

so real hw drivers are mostly ready for users > 1, it's only a few tc actions that struggle a bit. We cannot check tc actions from pktgen, since they can be added dynamically.

So I see three options:
1. get rid of burst hack for both RX and TX in pktgen (kills performance)
2. add unlikely(skb_shared) check to few tc actions
3. do nothing

I think 2 isn't that bad after all if properly documented with "because pktgen is doing this hack for performance"? I'm fine with 3 too, since the whole pktgen business is for root and for kernel hackers who are supposed to know what they're doing.
Re: [PATCH net-next] tc: fix tc actions in case of shared skb
From: Alexei Starovoitov a...@plumgrid.com
Date: Mon, 13 Jul 2015 12:47:42 -0700

> In all normal cases skb->users == 1, but pktgen is using trick:
>   atomic_add(burst, &skb->users);
> so when testing something like:

You can want pktgen rx (which is the only buggy case as far as I can see, TX is fine) to run fast, but you must do so by abiding by the appropriate SKB sharing rules.

You can't do an optimization in pktgen for RX processing that works some of the time. We have shared SKB rules for a reason. And I don't want to have to explain to someone in the future why that drop check is there, and have to tell them because pktgen is broken and we decided to add a hack here rather than make pktgen send properly formed SKBs into the RX path.

Ok?
Re: [RFC PATCH 0/2] net: macb: Add mdio driver for accessing multiple phy devices
On 12/07/15 21:48, Punnaiah Choudary Kalluri wrote:
> This patch is to add support for the design that has multiple ethernet mac controllers and single mdio bus connected to multiple phy devices. i.e mdio lines are connected to any of the ethernet mac controller and all the phy devices will be accessed using the phy maintenance interface in that mac controller.
>
> [ASCII diagram lost in extraction: MAC0 and MAC1 on the left, PHY0 and PHY1 on the right, with the MDIO lines of one MAC wired to both PHYs]
>
> So, i come up with two implementations for addressing the above configuration.
>
> Implementation 1:
> Have separate driver for mdio bus
> Create a DT node for all the PHY devices connected to the mdio bus
> This driver will share the register space of the mac controller that has mdio bus connected.

That is the best design implementation. MDIO in itself is a sub-piece of your Ethernet MAC controller; the fact that it is within the Ethernet MAC core is just coincidental, and there is no reason why it could not be taken apart and made a separate block in itself.

> Implementation 2:
> Add new property has-mdio and it should be 1 for the mac that has mdio bus connected.
> Create the mdio bus only when the has-mdio property is 1
>
> Please review the two implementations and suggest which one is better to proceed further. In my opinion implementation 1 will be the ideal one.

Agreed.

> Currently i have tested the patches with single mac and single phy configuration. I need to take care of few more cases before releasing the final patch but before that i would like to have your opinion on the above implementations and finalize one implementation. so that i can enhance it further.
>
> Punnaiah Choudary Kalluri (1):
>   net: macb: Add mdio driver for accessing multiple phy devices
>   net: macb: Add support for single mac managing more than one phy
>
>  drivers/net/ethernet/cadence/Makefile    |   2 +-
>  drivers/net/ethernet/cadence/macb.c      |  93 +-
>  drivers/net/ethernet/cadence/macb.h      |   3 +-
>  drivers/net/ethernet/cadence/macb_mdio.c | 204 ++
>  4 files changed, 211 insertions(+), 91 deletions(-)
>  create mode 100644 drivers/net/ethernet/cadence/macb_mdio.c

--
Florian
Re: [PATCH net-next] tc: fix tc actions in case of shared skb
On 07/13/2015 10:17 PM, Alexei Starovoitov wrote:
...
> We cannot check tc actions from pktgen, since they can be added dynamically.
>
> So I see three options:
> 1. get rid of burst hack for both RX and TX in pktgen (kills performance)
> 2. add unlikely(skb_shared) check to few tc actions
> 3. do nothing
>
> I think 2 isn't that bad after all if properly documented with "because pktgen is doing this hack for performance"? I'm fine with 3 too, since the whole pktgen business is for root and for kernel hackers who are supposed to know what they're doing.

Hmm, one thing for option 3 could be that we add a modinfo tag experimental, so that on loading of pktgen module, we trigger (like in case of staging) ...

  add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);

... and add a pr_warn() to the user, it may be more visible/clear than the "Packet Generator (USE WITH CAUTION)" Kconfig title? ;)

It'd be a pity that we'd need the extra atomic read only for the pktgen case. :/

With regards to option 2, you could hide that behind a static inline helper wrapped in IS_ENABLED(CONFIG_NET_PKTGEN), but that is a very ugly workaround/hack as well (and distros might even ship it nevertheless). I wouldn't be surprised if there are other usage combinations with pktgen that would crash your box. :/

Best,
Daniel
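The "static inline helper wrapped in IS_ENABLED(...)" idea from option 2 could look roughly like the following userspace sketch. Everything here is an assumption for illustration: `MOCK_PKTGEN_ENABLED` is a plain build-time macro standing in for the kernel's config test, and `struct mock_skb` and `mock_skb_shared_by_pktgen()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: build-time switch standing in for the kernel config test. */
#ifndef MOCK_PKTGEN_ENABLED
#define MOCK_PKTGEN_ENABLED 1
#endif

/* Hypothetical stand-in for struct sk_buff. */
struct mock_skb {
	int users; /* number of holders; 1 means exclusively owned */
};

/*
 * When the switch is 0 the condition is a compile-time constant, so the
 * compiler can reduce the helper to "return false" and the reference
 * count is never read on the hot path.
 */
static inline bool mock_skb_shared_by_pktgen(const struct mock_skb *skb)
{
	assert(skb != NULL);

	if (!MOCK_PKTGEN_ENABLED)
		return false;

	return skb->users != 1;
}
```

The trade-off raised in the message still applies to this sketch: if the option is enabled in a shipped build, every caller pays for the extra read.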
Re: [PATCH] net/bonding: Add function bond_remove_proc_entry at __bond_release_one
On 07/13/2015 08:57 PM, cls...@linux.vnet.ibm.com wrote:
> From: Carol L Soto cls...@linux.vnet.ibm.com
>
> Add function bond_remove_proc_entry at __bond_release_one to avoid stack trace at rmmod bonding.
>
> [68830.202239] remove_proc_entry: removing non-empty directory 'net/bonding', leaking at least 'bond0'
> [68830.202257] ------------[ cut here ]------------
> [68830.202260] WARNING: at fs/proc/generic.c:562
> [68830.202412] NIP [c02abf6c] .remove_proc_entry+0x1fc/0x240
> [68830.202416] LR [c02abf68] .remove_proc_entry+0x1f8/0x240
> [68830.202419] PACATMSCRATCH [80009032]
> [68830.202421] Call Trace:
> [68830.202424] [c00179277940] [c02abf68] .remove_proc_entry+0x1f8/0x240 (unreliable)
> [68830.202434] [c001792779f0] [d53229a4] .bond_destroy_proc_dir+0x34/0x54 [bonding]
> [68830.202440] [c00179277a70] [d53130e0] .bond_net_exit+0x90/0x120 [bonding]
> [68830.202445] [c00179277b10] [c059944c] .ops_exit_list.isra.0+0x6c/0xd0
> [68830.202450] [c00179277ba0] [c0599774] .unregister_pernet_operations+0x94/0x100
> [68830.202454] [c00179277c40] [c0599814] .unregister_pernet_subsys+0x34/0x60
> [68830.202460] [c00179277cc0] [d5323758] .bonding_exit+0x48/0x2328 [bonding]
> [68830.202466] [c00179277d30] [c010dcc4] .SyS_delete_module+0x1f4/0x340
> [68830.202471] [c00179277e30] [c0009e7c] syscall_exit+0x0/0x7c
> [68830.202491] ---[ end trace 9bd1d810219c9875 ]---
>
> Signed-off-by: Carol L Soto cls...@linux.vnet.ibm.com
> ---
>  drivers/net/bonding/bond_main.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 19eb990..ace105a 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1870,6 +1870,8 @@ static int __bond_release_one(struct net_device *bond_dev,
>  		dev_set_mac_address(slave_dev, &addr);
>  	}
>
> +	bond_remove_proc_entry(bond);
> +
>  	dev_set_mtu(slave_dev, slave->original_mtu);
>
>  	slave_dev->priv_flags &= ~IFF_BONDING;

This is incorrect, it tries to remove the bond entry on every slave release so if we have a bonding device with >= 2 slaves and release one of them then the whole bond device entry will be removed from /proc/net/bonding.

You can hit this case only if you had created a bonding device while doing the rmmod bonding (it's an old race condition which was fixed long time ago, but the procfs was apparently missed) and only after the notifier has been unregistered but before the sysfs has been removed. Since the bonding netdevice notifier is handling the procfs creation/destruction we could try moving the unregister after the pernet destruction which should help avoid such problems.

Could you try the following patch:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 19eb990d398c..d515ee38b77f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4682,12 +4682,10 @@ err_link:
 static void __exit bonding_exit(void)
 {
-	unregister_netdevice_notifier(&bond_netdev_notifier);
-
 	bond_destroy_debugfs();
-
 	bond_netlink_fini();
 	unregister_pernet_subsys(&bond_net_ops);
+	unregister_netdevice_notifier(&bond_netdev_notifier);

 #ifdef CONFIG_NET_POLL_CONTROLLER
 	/* Make sure we don't have an imbalance on our netpoll blocking */
Re: [RFC net-next 18/22] openvswitch: Make tunnel set action attach a metadata dst
Hi Thomas,

On 10 July 2015 at 07:19, Thomas Graf tg...@suug.ch wrote:
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index ecfa530..05fe46b 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -1548,11 +1548,45 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
>  	return sfa;
>  }
>
> +static void ovs_nla_free_set_action(const struct nlattr *a)
> +{
> +	const struct nlattr *ovs_key = nla_data(a);
> +	struct ovs_tunnel_info *ovs_tun;
> +
> +	switch (nla_type(ovs_key)) {
> +	case OVS_KEY_ATTR_TUNNEL_INFO:
> +		ovs_tun = nla_data(ovs_key);
> +		dst_release((struct dst_entry *)ovs_tun->tun_dst);
> +		break;
> +	}
> +}
> +
> +void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
> +{
> +	const struct nlattr *a;
> +	int rem;
> +
> +	nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
> +		switch (nla_type(a)) {
> +		case OVS_ACTION_ATTR_SET:
> +			ovs_nla_free_set_action(a);
> +			break;
> +		}
> +	}
> +
> +	kfree(sf_acts);
> +}

It doesn't look like flow_free() is using this new function to properly free the actions.

Also, some of the error cases that hit this code have sf_acts=NULL.
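The second review point (callers may pass NULL) can be illustrated with a small userspace model. This is a sketch under assumptions, not the patch's eventual fix: `struct mock_dst`, `struct mock_actions` and `mock_free_actions()` are hypothetical, a fixed array replaces the netlink attribute stream, and `free()` replaces `kfree()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MOCK_MAX_ACTS 4

/* Hypothetical reference-counted object held by some actions. */
struct mock_dst {
	int refcnt;
};

struct mock_action {
	int is_tunnel_set;        /* nonzero if this action holds a reference */
	struct mock_dst *tun_dst; /* the object whose reference is held */
};

struct mock_actions {
	size_t n;
	struct mock_action acts[MOCK_MAX_ACTS];
};

static void mock_dst_release(struct mock_dst *dst)
{
	if (dst)
		dst->refcnt--;
}

/*
 * Drops every held reference, then frees the container. Returning early
 * on NULL lets error paths that never allocated the actions call this
 * unconditionally, without dereferencing a NULL pointer in the loop.
 */
static void mock_free_actions(struct mock_actions *sfa)
{
	if (!sfa)
		return;

	assert(sfa->n <= MOCK_MAX_ACTS);

	for (size_t i = 0; i < sfa->n; i++) {
		if (sfa->acts[i].is_tunnel_set)
			mock_dst_release(sfa->acts[i].tun_dst);
	}

	free(sfa);
}
```

In this model, freeing a container with one tunnel action lowers that object's count by one, and calling the function with NULL is a no-op.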