[PATCH] openvswitch: make for_each_node loops work with sparse numa systems
Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and assumes the node variable returned by for_each_node will index into flow->stats[node].

For example, if node_possible_map is 0x30003, this patch will map node to node_cnt as follows:
0,1,16,17 => 0,1,2,3

The crash was noticed after 3af229f2 was applied as it changed the node_possible_map to match node_online_map on boot.

Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 net/openvswitch/flow.c       | 10 ++++++----
 net/openvswitch/flow_table.c | 18 +++++++++++-------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index bc7b0ab..425d45d 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -134,14 +134,14 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
 			struct ovs_flow_stats *ovs_stats,
 			unsigned long *used, __be16 *tcp_flags)
 {
-	int node;
+	int node, node_cnt = 0;
 
 	*used = 0;
 	*tcp_flags = 0;
 	memset(ovs_stats, 0, sizeof(*ovs_stats));
 
 	for_each_node(node) {
-		struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node]);
+		struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[node_cnt]);
 
 		if (stats) {
 			/* Local CPU may write on non-local stats, so we must
@@ -155,16 +155,17 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
 			ovs_stats->n_bytes += stats->byte_count;
 			spin_unlock_bh(&stats->lock);
 		}
+		node_cnt++;
 	}
 }
 
 /* Called with ovs_mutex. */
 void ovs_flow_stats_clear(struct sw_flow *flow)
 {
-	int node;
+	int node, node_cnt = 0;
 
 	for_each_node(node) {
-		struct flow_stats *stats = ovsl_dereference(flow->stats[node]);
+		struct flow_stats *stats = ovsl_dereference(flow->stats[node_cnt]);
 
 		if (stats) {
 			spin_lock_bh(&stats->lock);
@@ -174,6 +175,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
 			stats->tcp_flags = 0;
 			spin_unlock_bh(&stats->lock);
 		}
+		node_cnt++;
 	}
 }
 
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4613df8..5d10c54 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -77,7 +77,7 @@ struct sw_flow *ovs_flow_alloc(void)
 {
 	struct sw_flow *flow;
 	struct flow_stats *stats;
-	int node;
+	int node, node_cnt = 0;
 
 	flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
 	if (!flow)
@@ -99,9 +99,11 @@ struct sw_flow *ovs_flow_alloc(void)
 
 	RCU_INIT_POINTER(flow->stats[0], stats);
 
-	for_each_node(node)
+	for_each_node(node) {
 		if (node != 0)
-			RCU_INIT_POINTER(flow->stats[node], NULL);
+			RCU_INIT_POINTER(flow->stats[node_cnt], NULL);
+		node_cnt++;
+	}
 
 	return flow;
 err:
@@ -139,15 +141,17 @@ static struct flex_array *alloc_buckets(unsigned int n_buckets)
 
 static void flow_free(struct sw_flow *flow)
 {
-	int node;
+	int node, node_cnt = 0;
 
 	if (ovs_identifier_is_key(&flow->id))
 		kfree(flow->id.unmasked_key);
 	kfree((struct sw_flow_actions __force *)flow->sf_acts);
-	for_each_node(node)
-		if (flow->stats[node])
+	for_each_node(node) {
+		if (flow->stats[node_cnt])
 			kmem_cache_free(flow_stats_cache,
-					(struct flow_stats __force *)flow->stats[node]);
+					(struct flow_stats __force *)flow->stats[node_cnt]);
+		node_cnt++;
+	}
 	kmem_cache_free(flow_cache, flow);
 }
 
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
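
For readers following the 0x30003 example, here is a minimal userspace sketch of the node to node_cnt mapping the patch relies on. It is not kernel code: node_mask_t, count_possible_nodes() and dense_index() are hypothetical names, and a 32-bit mask stands in for node_possible_map.

```c
#include <stdint.h>

/* Hypothetical stand-in for node_possible_map: bit n set means node n is possible. */
typedef uint32_t node_mask_t;

/* Number of set bits in the mask, analogous to num_possible_nodes(). */
static unsigned int count_possible_nodes(node_mask_t mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit on each pass. */
	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Dense index of `node` among the possible nodes, i.e. the value node_cnt
 * has when a for_each_node style walk reaches `node`.
 * Returns -1 if `node` is not in the mask.
 */
static int dense_index(node_mask_t mask, unsigned int node)
{
	if (node >= 32 || !(mask & (UINT32_C(1) << node)))
		return -1;
	/* Count the possible nodes with a smaller ID. */
	return (int)count_possible_nodes(mask & ((UINT32_C(1) << node) - 1));
}
```

With mask 0x30003, dense_index() maps nodes 0, 1, 16 and 17 to 0, 1, 2 and 3, so an array of count_possible_nodes() entries is enough as long as every user indexes by the dense value rather than the raw node ID.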
Re: [PATCH 1/1] ath10k: fixing wrong initialization of struct channel
Maninder Singh maninder...@samsung.com writes:

>>> chandef is initialized with NULL and on the very next line, we are using it to get channel, which is not correct. channel should be initialized after obtaining chandef.
>>>
>>> Signed-off-by: Maninder Singh maninder...@samsung.com
>>
>> How did you find this bug?
>
> Static analysis reports this bug, e.g. Coverity or any other static tool like cppcheck:
>
> drivers/net/wireless/ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Thanks. This is always good to add to the commit log so I did that:

ath10k: fix wrong initialization of struct channel

chandef is initialized with NULL and on the very next line, we are using it to get channel, which is not correct. Channel should be initialized after obtaining chandef.

Found by cppcheck:

ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef

Signed-off-by: Maninder Singh maninder...@samsung.com
Signed-off-by: Kalle Valo kv...@qca.qualcomm.com

-- 
Kalle Valo
[PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring
So it gets freed when the device is going away. This fixes a DMA memory leak on driver probe() fail and driver remove().

Signed-off-by: Lucas Stach l.st...@pengutronix.de
---
v2: Fix indentation of second line to fix alignment with opening bracket.
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 349365d85b92..a7f1bdf718f8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3142,8 +3142,8 @@ static int fec_enet_init(struct net_device *ndev)
 			fep->bufdesc_size;
 
 	/* Allocate memory for buffer descriptors. */
-	cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
-				      GFP_KERNEL);
+	cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
+				       GFP_KERNEL);
 	if (!cbd_base) {
 		return -ENOMEM;
 	}
-- 
2.1.4
[PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path
This function frees resources and cancels delayed work item that have been initialized in fec_ptp_init().

Use this to do proper error handling if something goes wrong in probe function after fec_ptp_init has been called.

Signed-off-by: Lucas Stach l.st...@pengutronix.de
---
 drivers/net/ethernet/freescale/fec.h      |  1 +
 drivers/net/ethernet/freescale/fec_main.c |  5 ++---
 drivers/net/ethernet/freescale/fec_ptp.c  | 10 ++++++++++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 1eee73cccdf5..99d33e2d35e6 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -562,6 +562,7 @@ struct fec_enet_private {
 };
 
 void fec_ptp_init(struct platform_device *pdev);
+void fec_ptp_stop(struct platform_device *pdev);
 void fec_ptp_start_cyclecounter(struct net_device *ndev);
 int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
 int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index a7f1bdf718f8..32e3807c650e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3494,6 +3494,7 @@ failed_register:
 failed_mii_init:
 failed_irq:
 failed_init:
+	fec_ptp_stop(pdev);
 	if (fep->reg_phy)
 		regulator_disable(fep->reg_phy);
 failed_regulator:
@@ -3515,14 +3516,12 @@ fec_drv_remove(struct platform_device *pdev)
 	struct net_device *ndev = platform_get_drvdata(pdev);
 	struct fec_enet_private *fep = netdev_priv(ndev);
 
-	cancel_delayed_work_sync(&fep->time_keep);
 	cancel_work_sync(&fep->tx_timeout_work);
+	fec_ptp_stop(pdev);
 	unregister_netdev(ndev);
 	fec_enet_mii_remove(fep);
 	if (fep->reg_phy)
 		regulator_disable(fep->reg_phy);
-	if (fep->ptp_clock)
-		ptp_clock_unregister(fep->ptp_clock);
 	of_node_put(fep->phy_node);
 	free_netdev(ndev);
 
diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
index a15663ad7f5e..f457a23d0bfb 100644
--- a/drivers/net/ethernet/freescale/fec_ptp.c
+++ b/drivers/net/ethernet/freescale/fec_ptp.c
@@ -604,6 +604,16 @@ void fec_ptp_init(struct platform_device *pdev)
 	schedule_delayed_work(&fep->time_keep, HZ);
 }
 
+void fec_ptp_stop(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct fec_enet_private *fep = netdev_priv(ndev);
+
+	cancel_delayed_work_sync(&fep->time_keep);
+	if (fep->ptp_clock)
+		ptp_clock_unregister(fep->ptp_clock);
+}
+
 /**
  * fec_ptp_check_pps_event
  * @fep: the fec_enet_private structure handle
-- 
2.1.4
Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems
On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote:
> On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> > Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and
>
> Couldn't this also be fixed by just allocating with a multiple of nr_node_ids (which seems to have been the original intent all along)? You could then make your stats array be sparse or not.

Yea originally this is what I did, but I thought it would be wasting memory.

> > assumes the node variable returned by for_each_node will index into flow->stats[node].
> >
> > For example, if node_possible_map is 0x30003, this patch will map node to node_cnt as follows:
> > 0,1,16,17 => 0,1,2,3
> >
> > The crash was noticed after 3af229f2 was applied as it changed the node_possible_map to match node_online_map on boot.
> >
> > Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> My concern with this version of the fix is that you're relying on, implicitly, the order of for_each_node's iteration corresponding to the entries in stats 1:1. But what about node hotplug? It seems better to have the enumeration of the stats array match the topology accurately, rather, or to maintain some sort of internal map in the OVS code between the NUMA node and the entry in the stats array?
>
> I'm willing to be convinced otherwise, though :)
>
> -Nish

Nish,

The method I described should work for hotplug since it's using possible map which AFAIK is static rather than the online map. Regardless, the more simple solution to solve this issue would be to just allocate nr_node_ids number of entries and use up extra memory. I'll send a v2 after testing it.

--chris
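
To put a number on the memory trade-off discussed in this reply, here is a rough userspace sketch comparing the two sizing strategies. node_id_limit() and possible_node_count() are hypothetical helpers over a 32-bit mask; in the kernel the corresponding values would be nr_node_ids and num_possible_nodes().

```c
#include <stdint.h>

/* One past the highest set bit of the mask, analogous to nr_node_ids. */
static unsigned int node_id_limit(uint32_t mask)
{
	unsigned int limit = 0;

	while (mask) {
		limit++;
		mask >>= 1;
	}
	return limit;
}

/* Number of set bits, analogous to num_possible_nodes(). */
static unsigned int possible_node_count(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Pointer slots that stay unused when the array is indexed by raw node ID. */
static unsigned int unused_slots(uint32_t mask)
{
	return node_id_limit(mask) - possible_node_count(mask);
}
```

For 0x30003 this gives 18 slots with 14 of them never used, but stats[node] can be indexed directly with the node ID and no ordering assumption about for_each_node is needed.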
Re: [PATCH] netcp:Fix error handling in the function netcp_xgbe_serdes_config
On 07/20/2015 11:54 AM, Nicholas Krause wrote:
> This fixes error handling in the function netcp_xgbe_serdes_config by putting the return value of netcp_xgbe_serdes_check_lane into the variable ret and return this value to the caller as this function can fail when called by returning the error code -ETIMEDOUT.
>
> Signed-off-by: Nicholas Krause xerofo...@gmail.com
> ---
>  drivers/net/ethernet/ti/netcp_xgbepcsr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/ti/netcp_xgbepcsr.c b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
> index 33571ac..0c79e3d 100644
> --- a/drivers/net/ethernet/ti/netcp_xgbepcsr.c
> +++ b/drivers/net/ethernet/ti/netcp_xgbepcsr.c
> @@ -483,7 +483,7 @@ static int netcp_xgbe_serdes_config(void __iomem *serdes_regs,
>  		return ret;
>  
>  	netcp_xgbe_serdes_enable_xgmii_port(sw_regs);
> -	netcp_xgbe_serdes_check_lane(serdes_regs, sw_regs);
> +	ret = netcp_xgbe_serdes_check_lane(serdes_regs, sw_regs);
>  	return ret;
>  }
>

Nicholas,

Thanks for the patch.

Acked-by: Murali Karicheri m-kariche...@ti.com

-- 
Murali Karicheri
Linux Kernel, Keystone
Re: [2/2] iwlegacy: convert hex_dump_to_buffer() to %*ph
> There is no need to use hex_dump_to_buffer() in the cases like this:
>
>	hex_dump_to_buffer(buf, len, 16, 1, outbuf, outlen, false); /* len <= 16 */
>	sprintf("%s\n", outbuf);
>
> since it may be easily converted to simple:
>
>	sprintf("%*ph\n", len, buf);
>
> Note: it seems in the case the output is grouped by 2 bytes and looks like a typo. Thus, patch changes that to plain byte stream.
>
> Signed-off-by: Andy Shevchenko andriy.shevche...@linux.intel.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
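
As an illustration of the "plain byte stream" output, here is a small userspace sketch that formats a buffer the way the kernel's %*ph specifier does: two lowercase hex digits per byte, separated by single spaces. format_ph() is a hypothetical helper, not a kernel API, and it does not model the kernel's 64-byte limit for %*ph.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write `len` bytes of `buf` into `out` as space-separated hex pairs.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if `out` is too small.
 */
static int format_ph(char *out, size_t outlen,
		     const unsigned char *buf, size_t len)
{
	size_t pos = 0;
	size_t i;

	if (outlen == 0)
		return -1;
	out[0] = '\0';

	for (i = 0; i < len; i++) {
		/* No leading space before the first byte. */
		int n = snprintf(out + pos, outlen - pos,
				 i ? " %02x" : "%02x", buf[i]);

		if (n < 0 || (size_t)n >= outlen - pos)
			return -1;
		pos += (size_t)n;
	}
	return (int)pos;
}
```

Given the bytes de ad 00 0f, the helper produces the string "de ad 00 0f", one group per byte, which is the grouping the patch switches to.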
[PATCH] packet: Allow packets with only a header (but no payload)
9c70776 added validation for the packet size in packet_snd. This change enforced that every packet needs a long enough header and at least one byte payload.

However, when trying to establish a PPPoE connection the following message is printed every time a PPPoE discovery packet is sent:
pppd: packet size is too short (24 <= 24)

From what I can see in the PPPoE code the PADI discovery packet can consist of only a header with no payload (when there is neither a service name nor a Host-Uniq configured).

Signed-off-by: Martin Blumenstingl martin.blumensti...@googlemail.com
---
 net/packet/af_packet.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9e8741..d983f8f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2199,18 +2199,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }
 
-static bool ll_header_truncated(const struct net_device *dev, int len)
-{
-	/* net device doesn't like empty head */
-	if (unlikely(len <= dev->hard_header_len)) {
-		net_warn_ratelimited("%s: packet size is too short (%d <= %d)\n",
-				     current->comm, len, dev->hard_header_len);
-		return true;
-	}
-
-	return false;
-}
-
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		void *frame, struct net_device *dev, int size_max,
 		__be16 proto, unsigned char *addr, int hlen)
@@ -2286,8 +2274,14 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 		if (unlikely(err < 0))
 			return -EINVAL;
 	} else if (dev->hard_header_len) {
-		if (ll_header_truncated(dev, tp_len))
+		/* net device doesn't like empty head */
+		if (unlikely(len <= dev->hard_header_len)) {
+			net_warn_ratelimited("%s: packet size is too short "
+					     "(%d <= %d)\n",
+					     current->comm, len,
+					     dev->hard_header_len);
 			return -EINVAL;
+		}
 
 		skb_push(skb, dev->hard_header_len);
 		err = skb_store_bits(skb, 0, data,
@@ -2624,8 +2618,13 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 		if (unlikely(offset < 0))
 			goto out_free;
 	} else {
-		if (ll_header_truncated(dev, len))
+		if (unlikely(len < dev->hard_header_len)) {
+			net_warn_ratelimited("%s: packet size is shorter than "
+					     "minimum header size (%d < %d)\n",
+					     current->comm, len,
+					     dev->hard_header_len);
 			goto out_free;
+		}
 	}
 
 	/* Returns -EFAULT on error */
-- 
2.4.6
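
The whole behavioural change in packet_snd comes down to one comparison operator. A minimal sketch of the two predicates, with hypothetical function names and plain integers in place of the skb and net_device state:

```c
#include <stdbool.h>
#include <stddef.h>

/* Check as introduced by 9c70776: a header-only packet is rejected. */
static bool rejected_by_old_check(size_t len, size_t hard_header_len)
{
	return len <= hard_header_len;
}

/* Check proposed for packet_snd: only a truncated header is rejected. */
static bool rejected_by_new_check(size_t len, size_t hard_header_len)
{
	return len < hard_header_len;
}
```

For the PADI case from the log (len 24, hard_header_len 24) the old predicate rejects the packet and the new one lets it through; anything shorter than the header is still rejected by both.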
Re: [PATCH] packet: Allow packets with only a header (but no payload)
On Tue, Jul 21, 2015 at 12:14 PM, Martin Blumenstingl martin.blumensti...@googlemail.com wrote:
> 9c70776 added validation for the packet size in packet_snd. This change enforced that every packet needs a long enough header and at least one byte payload.
>
> However, when trying to establish a PPPoE connection the following message is printed every time a PPPoE discovery packet is sent:
> pppd: packet size is too short (24 <= 24)
>
> From what I can see in the PPPoE code the PADI discovery packet can consist of only a header with no payload (when there is neither a service name nor a Host-Uniq configured).

Interesting. 9c7077622dd9 only extended the check from tpacket_snd to packet_snd to make the two paths equivalent. The existing check had the ominous statement

  /* net device doesn't like empty head */

so allowing a header-only packet while correct in your case may not be safe in some edge cases (specific device drivers?). This was also discussed previously

  http://www.spinics.net/lists/netdev/msg309677.html

In any case, I don't think that reverting the patch and restoring the old inconsistent state is a fix.
Re: rtlwifi: rtl8821ae: Fix an expression that is always false
> In routine _rtl8821ae_set_media_status(), an incorrect mask results in a test for AP status to always be false. Similar bugs were fixed in rtl8192cu and rtl8192de, but this instance was missed at that time.
>
> Reported-by: David Binderman dcb...@hotmail.com
> Signed-off-by: Larry Finger larry.fin...@lwfinger.net
> Cc: Stable sta...@vger.kernel.org [3.18+]
> Cc: David Binderman dcb...@hotmail.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
Re: [PATCH V2 net-next 0/3] ARM BPF JIT features
On 7/21/15 5:16 AM, Nicolas Schichan wrote:
> This series adds support for more instructions to the ARM BPF JIT namely skb netdevice type retrieval, skb payload offset retrieval, and skb packet type retrieval.
>
> This allows 35 tests to use the JIT instead of 29 before.
>
> This series depends on the "BPF JIT fixes for ARM" series sent earlier.

Actually in these patches I don't see a strong dependency on 'net' set, but since you're saying there is, you'd need to resubmit this set after your 'net' set is merged, whole 'net' sent to Linus and merged into net-next.
Re: [PATCH] packet: Allow packets with only a header (but no payload)
Hi Willem,

On Tue, Jul 21, 2015 at 6:28 PM, Willem de Bruijn will...@google.com wrote:
> Interesting. 9c7077622dd9 only extended the check from tpacket_snd to packet_snd to make the two paths equivalent. The existing check had the ominous statement
>
>   /* net device doesn't like empty head */

OK, I guess it's best to find out what the purpose of this comment is.

> so allowing a header-only packet while correct in your case may not be safe in some edge cases (specific device drivers?).

I'm wondering what a good fix would look like (I can think of a few things, like renaming hard_header_len to something like min_packet_size)? I am open for suggestions since I have zero knowledge about the inner workings of the packet framework.

> This was also discussed previously
>
>   http://www.spinics.net/lists/netdev/msg309677.html
>
> In any case, I don't think that reverting the patch and restoring the old inconsistent state is a fix.

I totally agree with you that it's a bad fix if this means that we could break other drivers. My primary goal was to fix PPPoE connections - I guess I should have simply added "RFC" to the subject.

Regards,
Martin
Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems
On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and

Couldn't this also be fixed by just allocating with a multiple of nr_node_ids (which seems to have been the original intent all along)? You could then make your stats array be sparse or not.

> assumes the node variable returned by for_each_node will index into flow->stats[node].
>
> For example, if node_possible_map is 0x30003, this patch will map node to node_cnt as follows:
> 0,1,16,17 => 0,1,2,3
>
> The crash was noticed after 3af229f2 was applied as it changed the node_possible_map to match node_online_map on boot.
>
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861

My concern with this version of the fix is that you're relying on, implicitly, the order of for_each_node's iteration corresponding to the entries in stats 1:1. But what about node hotplug? It seems better to have the enumeration of the stats array match the topology accurately, rather, or to maintain some sort of internal map in the OVS code between the NUMA node and the entry in the stats array?

I'm willing to be convinced otherwise, though :)

-Nish
[net-next:master 187/208] net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces)
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: e3e4712ec0961ed586a8db340bd994c4ad7f5dba [187/208] mpls: ip tunnel support
reproduce:
  # apt-get install sparse
  git checkout e3e4712ec0961ed586a8db340bd994c4ad7f5dba
  make ARCH=x86_64 allmodconfig
  make C=1 CF=-D__CHECK_ENDIAN__

sparse warnings: (new ones prefixed by >>)

>> net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces)

vim +73 net/mpls/mpls_iptunnel.c

    57		/* Obtain the ttl */
    58		if (skb->protocol == htons(ETH_P_IP)) {
    59			ttl = ip_hdr(skb)->ttl;
    60			rt = (struct rtable *)dst;
    61			lwtstate = rt->rt_lwtstate;
    62		} else if (skb->protocol == htons(ETH_P_IPV6)) {
    63			ttl = ipv6_hdr(skb)->hop_limit;
    64			rt6 = (struct rt6_info *)dst;
    65			lwtstate = rt6->rt6i_lwtstate;
    66		} else {
    67			goto drop;
    68		}
    69	
    70		skb_orphan(skb);
    71	
    72		/* Find the output device */
  > 73		out_dev = rcu_dereference(dst->dev);
    74		if (!mpls_output_possible(out_dev) ||
    75		    !lwtstate || skb_warn_if_lro(skb))
    76			goto drop;
    77	
    78		skb_forward_csum(skb);
    79	
    80		tun_encap_info = mpls_lwtunnel_encap(lwtstate);
    81	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path
Hi,

Martin KaFai Lau wrote:
> The patch checks neigh->nud_state before acquiring the writer lock.
> Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

You have to take some lock when accessing neigh->nud_state theoretically.

> I also take this chance to re-arrange the code.

No, please do not mix multiple changes.

> 40 udpflood processes and a /64 gateway route are used. The gateway has NUD_PERMANENT. Each of them is run for 30s. At the end, the total number of finished sendto():
>
> Before    After
> 55M       95M
>
> Signed-off-by: Martin KaFai Lau ka...@fb.com
> Cc: Hannes Frederic Sowa han...@stressinduktion.org
> ---
>  net/ipv6/route.c | 41 ++++++++++++++++++++---------------------
>  1 file changed, 20 insertions(+), 21 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 6090969..a6c6b5a 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w)
>  
>  static void rt6_probe(struct rt6_info *rt)
>  {
> +	struct __rt6_probe_work *work;
>  	struct neighbour *neigh;
>  	/*
>  	 * Okay, this does not seem to be appropriate
> @@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt)
>  	rcu_read_lock_bh();
>  	neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
>  	if (neigh) {
> -		write_lock(&neigh->lock);
>  		if (neigh->nud_state & NUD_VALID)
>  			goto out;
> -	}
> -
> -	if (!neigh ||
> -	    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
> -		struct __rt6_probe_work *work;
>  
> +		work = NULL;
> +		write_lock(&neigh->lock);
> +		if (!(neigh->nud_state & NUD_VALID) &&
> +		    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
> +			work = kmalloc(sizeof(*work), GFP_ATOMIC);
> +			if (work) {
> +				__neigh_set_probe_once(neigh);
> +			}
> +		}
> +		write_unlock(&neigh->lock);
> +	} else {
>  		work = kmalloc(sizeof(*work), GFP_ATOMIC);
> +	}
>  
> -		if (neigh && work)
> -			__neigh_set_probe_once(neigh);
> -
> -		if (neigh)
> -			write_unlock(&neigh->lock);
> +	if (work) {
> +		INIT_WORK(&work->work, rt6_probe_deferred);
> +		work->target = rt->rt6i_gateway;
> +		dev_hold(rt->dst.dev);
> +		work->dev = rt->dst.dev;
> +		schedule_work(&work->work);
> +	}
>  
> -		if (work) {
> -			INIT_WORK(&work->work, rt6_probe_deferred);
> -			work->target = rt->rt6i_gateway;
> -			dev_hold(rt->dst.dev);
> -			work->dev = rt->dst.dev;
> -			schedule_work(&work->work);
> -		}
> -	} else {
>  out:
> -		write_unlock(&neigh->lock);
> -	}
>  	rcu_read_unlock_bh();
>  }
>  #else

-- 
Hideaki Yoshifuji hideaki.yoshif...@miraclelinux.com
Technical Division, MIRACLE LINUX CORPORATION
Re: [RFC PATCH net-next] ebpf: Allow dereferences of PTR_TO_STACK registers
On Tue, Jul 21, 2015 at 07:00:40PM -0700, Alex Gartrell wrote:
> mov %rsp, %r1           ; r1 = rsp
> add $-8, %r1            ; r1 = rsp - 8
> store_q $123, -8(%rsp)  ; *(u64*)r1 = 123 <- valid
> store_q $123, (%r1)     ; *(u64*)r1 = 123 <- previously invalid
> mov $0, %r0
> exit                    ; Always need to exit

Is this your new eBPF assembler syntax? :)
imo gnu style looks ugly... ;)
It's great to see such in-depth understanding of verifier!!

> And we'd get the following error:
>
>	0: (bf) r1 = r10
>	1: (07) r1 += -8
>	2: (7a) *(u64 *)(r10 -8) = 999
>	3: (7a) *(u64 *)(r1 +0) = 999
>	R1 invalid mem access 'fp'
>
>	Unable to load program
>
> We already know that a register is a stack address and the appropriate offset, so we should be able to validate those references as well.

yes, we can teach verifier to do that. Though llvm doesn't generate such code. It's small enough change.

> Signed-off-by: Alex Gartrell agartr...@fb.com
> ---
>  kernel/bpf/verifier.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 039d866..5dfbece 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -676,6 +676,15 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
>  			err = check_stack_write(state, off, size, value_regno);
>  		else
>  			err = check_stack_read(state, off, size, value_regno);
> +	} else if (state->regs[regno].type == PTR_TO_STACK) {
> +		int real_off = state->regs[regno].imm + off;

real_off is missing alignment and bounds checks.
something like:
if (state->regs[regno].type == PTR_TO_STACK)
	off += state->regs[regno].imm;
if (off % size != 0)
...
else if (state->regs[regno].type == FRAME_PTR || == PTR_TO_STACK)
	.. as-is here ...
would fix it.

please add few accept and reject tests for this to test_verifier.c as well.
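
The alignment and bounds checks asked for above can be sketched in userspace roughly as follows. This is only an illustration of the idea, not the verifier's actual code: stack_access_ok() is a hypothetical helper, reg_imm stands for state->regs[regno].imm, and the 512-byte limit is assumed to match the kernel's MAX_BPF_STACK.

```c
#include <stdbool.h>

/* Assumed to match the kernel's MAX_BPF_STACK. */
#define SKETCH_MAX_BPF_STACK 512

/*
 * Decide whether an access of `size` bytes at `off` relative to a
 * PTR_TO_STACK register is acceptable. `reg_imm` is the (negative)
 * distance of that register from the frame pointer.
 */
static bool stack_access_ok(int reg_imm, int off, int size)
{
	long long real_off;

	if (size <= 0)
		return false;

	/* Fold the register's own offset into the access offset. */
	real_off = (long long)reg_imm + off;

	/* Alignment: the effective offset must be a multiple of the size. */
	if (real_off % size != 0)
		return false;

	/* Bounds: the access must lie strictly below the frame pointer
	 * and within the stack area. */
	if (real_off >= 0 || real_off < -SKETCH_MAX_BPF_STACK)
		return false;

	return true;
}
```

The store from the example, *(u64 *)(r1 +0) with r1 = r10 - 8, corresponds to stack_access_ok(-8, 0, 8) and is accepted, while a misaligned offset or one that reaches past the frame pointer is rejected. These cases are the kind of accept and reject tests requested for test_verifier.c.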
[PATCH] mac80211_hwsim: unregister genetlink family properly
During hwsim_init_netlink(), we should call genl_unregister_family() if
netlink_register_notifier() fails, since the genetlink family is already
registered at that point.

Signed-off-by: Su Kang Yin cant...@cantona.net
---
 drivers/net/wireless/mac80211_hwsim.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 99e873d..16d953e 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -3120,8 +3120,10 @@ static int hwsim_init_netlink(void)
 		goto failure;
 
 	rc = netlink_register_notifier(&hwsim_netlink_notifier);
-	if (rc)
+	if (rc) {
+		genl_unregister_family(&hwsim_genl_family);
 		goto failure;
+	}
 
 	return 0;
 
-- 
1.9.1
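The unwind rule behind this fix (undo the first registration when the second one fails) can be shown with a small userspace model. This is a sketch only: struct registry and the four helper functions are hypothetical stand-ins, not the genetlink or netlink notifier APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state touched by the two registration steps. */
struct registry {
	bool family_registered;
	bool notifier_registered;
	bool fail_notifier;	/* knob: force the second step to fail */
};

static int register_family(struct registry *r)
{
	r->family_registered = true;
	return 0;
}

static void unregister_family(struct registry *r)
{
	r->family_registered = false;
}

static int register_notifier(struct registry *r)
{
	if (r->fail_notifier)
		return -1;
	r->notifier_registered = true;
	return 0;
}

/* Register both pieces; on failure of the second, undo the first. */
static int init_both(struct registry *r)
{
	int rc;

	assert(r);

	rc = register_family(r);
	if (rc)
		return rc;

	rc = register_notifier(r);
	if (rc) {
		unregister_family(r);	/* the step the patch adds */
		return rc;
	}
	return 0;
}
```

Without the unregister_family() call, a failed init_both() would return an error while leaving family_registered set, which is the leak the patch closes.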
[PATCH nf] netfilter: Support expectations in different zones
When zones were originally introduced, the expectation functions were all
extended to perform lookup using the zone. However, insertion was not
modified to check the zone. This means that two expectations which are
intended to apply for different connections that have the same tuple but
exist in different zones cannot both be tracked.

Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for conntrack zones)
Signed-off-by: Joe Stringer joestrin...@nicira.com
---
 net/netfilter/nf_conntrack_expect.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_expect.c b/net/netfilter/nf_conntrack_expect.c
index 7a17070..b45a422 100644
--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -219,7 +219,8 @@ static inline int expect_clash(const struct nf_conntrack_expect *a,
 			a->mask.src.u3.all[count] & b->mask.src.u3.all[count];
 	}
 
-	return nf_ct_tuple_mask_cmp(&a->tuple, &b->tuple, &intersect_mask);
+	return nf_ct_tuple_mask_cmp(&a->tuple, &b->tuple, &intersect_mask) &&
+	       nf_ct_zone(a->master) == nf_ct_zone(b->master);
 }
 
 static inline int expect_matches(const struct nf_conntrack_expect *a,
-- 
2.1.4
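To illustrate the effect of adding the zone comparison, here is a heavily reduced model of the clash test. It is a sketch, not the conntrack code: struct expect_sketch collapses the whole tuple into one masked 32-bit address, and clash_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, much-reduced expectation: one masked address plus a zone. */
struct expect_sketch {
	uint32_t addr;
	uint32_t mask;
	uint16_t zone;
};

/*
 * Two expectations clash when they agree on every bit covered by both
 * masks AND live in the same zone. Dropping the zone test reproduces the
 * behaviour the patch description complains about.
 */
static bool clash_sketch(const struct expect_sketch *a,
			 const struct expect_sketch *b)
{
	uint32_t intersect;

	assert(a && b);

	intersect = a->mask & b->mask;
	return ((a->addr ^ b->addr) & intersect) == 0 && a->zone == b->zone;
}
```

Two entries with the same address and mask but zones 1 and 2 no longer clash, so both can be inserted.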
[net-next:master 200/208] drivers/net/vxlan.c:1739:21: sparse: incorrect type in assignment (different base types)
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master head: 16040894b26af9f85d9395f072c53d76a44eba21 commit: 614732eaa12dd462c0ab274700bed14f36afea5e [200/208] openvswitch: Use regular VXLAN net_device device reproduce: # apt-get install sparse git checkout 614732eaa12dd462c0ab274700bed14f36afea5e make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by ) include/net/checksum.h:166:35: sparse: incorrect type in argument 1 (different base types) include/net/checksum.h:166:35:expected restricted __wsum [usertype] csum include/net/checksum.h:166:35:got restricted __sum16 include/net/checksum.h:166:43: sparse: incorrect type in argument 2 (different base types) include/net/checksum.h:166:43:expected restricted __wsum [usertype] addend include/net/checksum.h:166:43:got restricted __sum16 [usertype] noident include/net/checksum.h:174:43: sparse: incorrect type in argument 2 (different base types) include/net/checksum.h:174:43:expected restricted __wsum [usertype] addend include/net/checksum.h:174:43:got restricted __sum16 [usertype] noident include/net/checksum.h:166:35: sparse: incorrect type in argument 1 (different base types) include/net/checksum.h:166:35:expected restricted __wsum [usertype] csum include/net/checksum.h:166:35:got restricted __sum16 include/net/checksum.h:166:43: sparse: incorrect type in argument 2 (different base types) include/net/checksum.h:166:43:expected restricted __wsum [usertype] addend include/net/checksum.h:166:43:got restricted __sum16 [usertype] noident drivers/net/vxlan.c:1739:21: sparse: incorrect type in assignment (different base types) drivers/net/vxlan.c:1739:21:expected restricted __be32 [usertype] vx_vni drivers/net/vxlan.c:1739:21:got unsigned int [unsigned] [usertype] vni drivers/net/vxlan.c:1818:21: sparse: incorrect type in assignment (different base types) drivers/net/vxlan.c:1818:21:expected restricted __be32 [usertype] vx_vni drivers/net/vxlan.c:1818:21:got 
unsigned int [unsigned] [usertype] vni drivers/net/vxlan.c:2014:58: sparse: incorrect type in argument 11 (different base types) drivers/net/vxlan.c:2014:58:expected unsigned int [unsigned] [usertype] vni drivers/net/vxlan.c:2014:58:got restricted __be32 [usertype] noident drivers/net/vxlan.c:2072:67: sparse: incorrect type in argument 11 (different base types) drivers/net/vxlan.c:2072:67:expected unsigned int [unsigned] [usertype] vni drivers/net/vxlan.c:2072:67:got restricted __be32 [usertype] noident vim +1739 drivers/net/vxlan.c 1723 } 1724 1725 skb = vlan_hwaccel_push_inside(skb); 1726 if (WARN_ON(!skb)) { 1727 err = -ENOMEM; 1728 goto err; 1729 } 1730 1731 skb = iptunnel_handle_offloads(skb, udp_sum, type); 1732 if (IS_ERR(skb)) { 1733 err = -EINVAL; 1734 goto err; 1735 } 1736 1737 vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh)); 1738 vxh-vx_flags = htonl(VXLAN_HF_VNI); 1739 vxh-vx_vni = vni; 1740 1741 if (type SKB_GSO_TUNNEL_REMCSUM) { 1742 u32 data = (skb_checksum_start_offset(skb) - hdrlen) 1743 VXLAN_RCO_SHIFT; 1744 1745 if (skb-csum_offset == offsetof(struct udphdr, check)) 1746 data |= VXLAN_RCO_UDP; 1747 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] cgroup: net_cls: fix false-positive suspicious RCU usage
From: Konstantin Khlebnikov khlebni...@yandex-team.ru
Date: Tue, 21 Jul 2015 19:46:29 +0300

> @@ -23,7 +23,8 @@ static inline struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state
>  struct cgroup_cls_state *task_cls_state(struct task_struct *p)
>  {
> -	return css_cls_state(task_css(p, net_cls_cgrp_id));
> +	  return css_cls_state(task_css_check(p, net_cls_cgrp_id,
> +rcu_read_lock_bh_held()));

You've made a serious mess of the indentation here.

First of all, you've changed the correct plain TAB before the 'return'
line into a TAB and two SPACE characters.

Secondly, the second line needs to be precisely indented to the exact
column following the opening parenthesis of the task_css_check() call
on the previous line.
Re: [PATCH] net: phy: dp83867: Fix warning check for setting the internal delay
From: Dan Murphy dmur...@ti.com
Date: Tue, 21 Jul 2015 12:06:45 -0500

> Fix warning: logical ‘or’ of collectively exhaustive tests is always true
>
> Change the internal delay check from an 'or' condition to an 'and'
> condition.
>
> Reported-by: David Binderman dcb...@hotmail.com
> Signed-off-by: Dan Murphy dmur...@ti.com

Applied, thanks.
Re: [PATCH net v2] macvtap: fix network header pointer for VLAN tagged pkts
On 15/07/21 (Tue) 16:18, Ivan Vecera wrote:
> Network header is set with offset ETH_HLEN but it is not true for VLAN
> (multiple-)tagged and results in checksum issues in lower devices.
>
> v2: leave skb->protocol untouched (thx Vlad), comment added
>
> Signed-off-by: Ivan Vecera ivec...@redhat.com
> ---
>  drivers/net/macvtap.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 3b933bb..b75776b 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -796,6 +796,13 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>  	skb_reset_mac_header(skb);
>  	skb->protocol = eth_hdr(skb)->h_proto;
>
> +	/* Move network header to the right position for VLAN tagged packets */
> +	if (skb_vlan_tagged(skb)) {

I guess you don't need the condition skb_vlan_tag_present(skb), i.e.,

	if (skb->protocol == htons(ETH_P_8021Q) ||
	    skb->protocol == htons(ETH_P_8021AD))

> +		int depth;
> +		__vlan_get_protocol(skb, skb->protocol, &depth);

__vlan_get_protocol() can fail, and then, depth will not be initialized.

> +		skb_set_network_header(skb, depth);

I think you should set network_header after skb_probe_transport_header().
It calls skb_flow_dissect_flow_keys(), which seems to expect
network_header to be ETH_HLEN.

Toshiaki Makita
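The two review points about depth (it varies with the number of tags, and the lookup can fail) can be made concrete with a userspace walk over a raw Ethernet frame. This is a sketch under assumptions: network_header_offset() and the SK_* constants are hypothetical names, and the function operates on a byte buffer rather than an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETH_HLEN	14	/* dst MAC + src MAC + ethertype */
#define SK_VLAN_HLEN	4	/* TCI + encapsulated ethertype */
#define SK_P_8021Q	0x8100
#define SK_P_8021AD	0x88A8

/*
 * Return the offset of the network header, skipping any number of
 * stacked VLAN tags, or -1 when the frame is too short. The explicit
 * error return is what keeps the caller from using an unset depth.
 */
static int network_header_offset(const uint8_t *frame, size_t len)
{
	size_t off = SK_ETH_HLEN;
	uint16_t proto;

	assert(frame != NULL);

	if (len < SK_ETH_HLEN)
		return -1;

	proto = (uint16_t)(frame[12] << 8 | frame[13]);
	while (proto == SK_P_8021Q || proto == SK_P_8021AD) {
		if (len < off + SK_VLAN_HLEN)
			return -1;
		/* the encapsulated ethertype is the last two bytes of the tag */
		proto = (uint16_t)(frame[off + 2] << 8 | frame[off + 3]);
		off += SK_VLAN_HLEN;
	}
	return (int)off;
}
```

An untagged frame yields 14, a single-tagged frame 18 and a QinQ frame 22, so a fixed ETH_HLEN offset is only right in the first case.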
Re: [v2] brcmsmac: Use kstrdup to simplify code
> Replace a kmalloc+strcpy by an equivalent kstrdup in order to improve
> readability.
>
> Signed-off-by: Christophe JAILLET christophe.jail...@wanadoo.fr
> Acked-by: Arend van Spriel ar...@broadcom.com

Thanks, applied to wireless-drivers-next.git.

Kalle Valo
[PATCH net-next] mpls: make RTA_OIF optional
From: Roopa Prabhu ro...@cumulusnetworks.com If user did not specify an oif, try and get it from the via address. If failed to get device, return with -ENODEV. Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com --- net/mpls/af_mpls.c | 67 +++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c index 1f93a59..4cd3789 100644 --- a/net/mpls/af_mpls.c +++ b/net/mpls/af_mpls.c @@ -15,6 +15,7 @@ #include net/ip_fib.h #include net/netevent.h #include net/netns/generic.h +#include net/ip6_route.h #include internal.h #define LABEL_NOT_SPECIFIED (120) @@ -327,6 +328,70 @@ static unsigned find_free_label(struct net *net) return LABEL_NOT_SPECIFIED; } +static struct net_device *inet_fib_lookup_dev(struct net *net, void *addr) +{ + struct net_device *dev = NULL; + struct rtable *rt; + struct in_addr daddr; + + memcpy(daddr, addr, sizeof(struct in_addr)); + rt = ip_route_output(net, daddr.s_addr, 0, 0, 0); + if (IS_ERR(rt)) + goto errout; + + dev = rt-dst.dev; + dev_hold(dev); + + ip_rt_put(rt); + +errout: + return dev; +} + +static struct net_device *inet6_fib_lookup_dev(struct net *net, void *addr) +{ + struct net_device *dev = NULL; + struct dst_entry *dst; + struct flowi6 fl6; + + memset(fl6, 0, sizeof(fl6)); + memcpy(fl6.daddr, addr, sizeof(struct in6_addr)); + dst = ip6_route_output(net, NULL, fl6); + if (dst-error) + goto errout; + + dev = dst-dev; + dev_hold(dev); + +errout: + dst_release(dst); + + return dev; +} + +static struct net_device *find_outdev(struct net *net, + struct mpls_route_config *cfg) +{ + struct net_device *dev = NULL; + + if (!cfg-rc_ifindex) { + switch (cfg-rc_via_table) { + case NEIGH_ARP_TABLE: + dev = inet_fib_lookup_dev(net, cfg-rc_via); + break; + case NEIGH_ND_TABLE: + dev = inet6_fib_lookup_dev(net, cfg-rc_via); + break; + case NEIGH_LINK_TABLE: + break; + } + } else { + dev = dev_get_by_index(net, cfg-rc_ifindex); + } + + return dev; +} + static int mpls_route_add(struct mpls_route_config 
*cfg) { struct mpls_route __rcu **platform_label; @@ -358,7 +423,7 @@ static int mpls_route_add(struct mpls_route_config *cfg) goto errout; err = -ENODEV; - dev = dev_get_by_index(net, cfg-rc_ifindex); + dev = find_outdev(net, cfg); if (!dev) goto errout; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] packet: Allow packets with only a header (but no payload)
On Tue, Jul 21, 2015 at 12:38 PM, Martin Blumenstingl martin.blumensti...@googlemail.com wrote: Hi Willem, On Tue, Jul 21, 2015 at 6:28 PM, Willem de Bruijn will...@google.com wrote: Interesting. 9c7077622dd9 only extended the check from tpacket_snd to packet_snd to make the two paths equivalent. The existing check had the ominous statement /* net device doesn't like empty head */ OK, I guess it's best to find out what the purpose of this comment is. so allowing a header-only packet while correct in your case may not be safe in some edge cases (specific device drivers?). I'm wondering how a good fix would look like (I can think of a few things, like renaming hard_header_len to something min_packet_size)? I am open for suggestions since I have zero knowledge about the inner workings of the packet framework. I don't see a simple way of verifying the safety of allowing packets without data short of a code audit, which would be huge, especially when taking device driver logic into account. Perhaps someone remembers why that statement was added and what edge case(s) it refers to. I'm afraid that I don't. It was added in 69e3c75f4d54. I added the author to this thread. This was also discussed previously http://www.spinics.net/lists/netdev/msg309677.html In any case, I don't think that reverting the patch and restoring the old inconsistent state is a fix. I totally agree with you that it's a bad fix if this means that we could break other drivers. My primary goal was to fix PPPoE connections - I guess I should have simply added RFC to the subject. Regards, Martin -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] net: phy: dp83867: Fix warning check for setting the internal delay
Fix warning: logical ‘or’ of collectively exhaustive tests is always true

Change the internal delay check from an 'or' condition to an 'and'
condition.

Reported-by: David Binderman dcb...@hotmail.com
Signed-off-by: Dan Murphy dmur...@ti.com
---
 drivers/net/phy/dp83867.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index c7a12e2..8a3bf54 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -164,7 +164,7 @@ static int dp83867_config_init(struct phy_device *phydev)
 			return ret;
 	}
 
-	if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) ||
+	if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
 	    (phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
 		val = phy_read_mmd_indirect(phydev, DP83867_RGMIICTL,
 					    DP83867_DEVADDR, phydev->addr);
-- 
1.9.1
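Why the compiler calls the 'or' form "collectively exhaustive" is easy to see on a toy enum. The sketch below uses hypothetical mode names whose only relevant property is their ordering; it is not the PHY interface enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical modes; only the relative order matters for the example. */
enum sketch_mode {
	SK_MODE_MII,
	SK_MODE_RGMII,
	SK_MODE_RGMII_ID,
	SK_MODE_RGMII_RXID,
	SK_MODE_RGMII_TXID,
	SK_MODE_MAX
};

/* Buggy form: every value is either >= the low bound or <= the high bound. */
static bool in_range_or(enum sketch_mode m)
{
	return m >= SK_MODE_RGMII_ID || m <= SK_MODE_RGMII_RXID;
}

/* Fixed form: true only for values between the two bounds, inclusive. */
static bool in_range_and(enum sketch_mode m)
{
	assert(m < SK_MODE_MAX);
	return m >= SK_MODE_RGMII_ID && m <= SK_MODE_RGMII_RXID;
}
```

in_range_or() returns true for every mode, so the guarded block always runs; in_range_and() selects only SK_MODE_RGMII_ID and SK_MODE_RGMII_RXID.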
Re: [PATCH] net: phy: dp83867: Fix warning check for setting the internal delay
On 21/07/15 10:06, Dan Murphy wrote:
> Fix warning: logical ‘or’ of collectively exhaustive tests is always true
>
> Change the internal delay check from an 'or' condition to an 'and'
> condition.
>
> Reported-by: David Binderman dcb...@hotmail.com
> Signed-off-by: Dan Murphy dmur...@ti.com

Acked-by: Florian Fainelli f.faine...@gmail.com

> ---
>  drivers/net/phy/dp83867.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
> index c7a12e2..8a3bf54 100644
> --- a/drivers/net/phy/dp83867.c
> +++ b/drivers/net/phy/dp83867.c
> @@ -164,7 +164,7 @@ static int dp83867_config_init(struct phy_device *phydev)
>  			return ret;
>  	}
>
> -	if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) ||
> +	if ((phydev->interface >= PHY_INTERFACE_MODE_RGMII_ID) &&
>  	    (phydev->interface <= PHY_INTERFACE_MODE_RGMII_RXID)) {
>  		val = phy_read_mmd_indirect(phydev, DP83867_RGMIICTL,
>  					    DP83867_DEVADDR, phydev->addr);

-- 
Florian
Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Tue, Jul 21, 2015 at 3:52 AM, Eric Dumazet eric.duma...@gmail.com wrote: On Tue, 2015-07-21 at 06:04 -0400, Jamal Hadi Salim wrote: It is worrisome to fix the core code for this. The root cause seems to be codel. Dont have time but in general, reset would be something like: struct fq_codel_sched_data *q = qdisc_priv(sch); qdisc_reset(q) This only works for very simple qdisc with one queue. or something along those lines... But certainly dequeue semantics dont seem right there.. Well, reset() is trivial to implement like this while (skb = local_dequeue(sch)) { kfree_skb(skb); } And I guess I copy/pasted sfq code here, because I was lazy. But yes, qdisc_tree_decrease_qlen() would have to be not called. Hmm, so the semantic is each qdisc resets qlen for its own and calls qdisc_reset() to reset its leaf qdisc's, that makes sense for me. It seems I coded fq_reset() differently. Alex, please try instead : diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c index 21ca33c9f036..3f0320ab6029 100644 --- a/net/sched/sch_fq_codel.c +++ b/net/sched/sch_fq_codel.c @@ -288,10 +288,21 @@ begin: static void fq_codel_reset(struct Qdisc *sch) { - struct sk_buff *skb; + struct fq_codel_sched_data *q = qdisc_priv(sch); + int i; - while ((skb = fq_codel_dequeue(sch)) != NULL) - kfree_skb(skb); + INIT_LIST_HEAD(q-new_flows); + INIT_LIST_HEAD(q-old_flows); + for (i = 0; i q-flows_cnt; i++) { + struct fq_codel_flow *flow = q-flows + i; + + while (flow-head) + kfree_skb(dequeue_head(flow)); + + INIT_LIST_HEAD(flow-flowchain); You probably need to call codel_vars_init(flow-cvars) as well. + } + memset(q-backlogs, 0, q-flows_cnt * sizeof(u32)); + sch-q.qlen = 0; } static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = { Thanks. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 net] inet: frags: fix defragmented packet's IP header for af_packet
From: Eric Dumazet eric.duma...@gmail.com
Date: Tue, 21 Jul 2015 09:43:59 +0200

> From: Edward Hyunkoo Jee ed...@google.com
>
> When ip_frag_queue() computes positions, it assumes that the passed
> sk_buff does not contain L2 headers.
>
> However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
> functions can be called on outgoing packets that contain L2 headers.
>
> Also, IPv4 checksum is not corrected after reassembly.
>
> Fixes: 7736d33f4262 (packet: Add pre-defragmentation support for ipv4 fanouts.)
> Signed-off-by: Edward Hyunkoo Jee ed...@google.com
> Signed-off-by: Eric Dumazet eduma...@google.com

Applied and queued up for -stable, thanks.
Re: [PATCH net-next 14/22] vxlan: Flow based tunneling
On 07/21/15 at 10:30am, Alexei Starovoitov wrote:
> RX:
> +	info->mode = IP_TUNNEL_INFO_RX;
> +	info->key.tun_flags = TUNNEL_KEY;
> +	info->key.tun_id = cpu_to_be64(vni >> 8);
> ...
> TX:
> +	dst_port = info->key.tp_dst ? : vxlan->dst_port;
> +	vni = be64_to_cpu(info->key.tun_id);
>
> I think the copy paste of ovs_tunnel_info into ip_tunnel_info can be
> improved. In particular instead of '__be64 tun_id' we can use
> '__u64 tun_id' which will avoid extra byteswaps for rx/tx paths.
>
> netlink for this part also seems inconsistent. In the patch 16:
> +static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = {
> +	[IP_TUN_ID] = { .type = NLA_U64 },
> ...
> +	if (tb[IP_TUN_ID])
> +		tun_info->key.tun_id = nla_get_u64(tb[IP_TUN_ID]);
>
> I think nla_get_be64 should be there?
> and with my suggestion we can add be64_to_cpu() here instead of
> doing it per packet.
> Thoughts?

I like this. The be64 originates from how OVS stores the tun_id in the
flow key. I agree that it makes sense to limit and delay the byteswaps
to when OVS inherits the flow key from the ip_tunnel_info. I will send
a follow-up.
[PATCH] cgroup: net_cls: fix false-positive suspicious RCU usage
In dev_queue_xmit() net_cls protected with rcu-bh. [ 270.730026] === [ 270.730029] [ INFO: suspicious RCU usage. ] [ 270.730033] 4.2.0-rc3+ #2 Not tainted [ 270.730036] --- [ 270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage! [ 270.730041] other info that might help us debug this: [ 270.730043] rcu_scheduler_active = 1, debug_locks = 1 [ 270.730045] 2 locks held by dhclient/748: [ 270.730046] #0: (rcu_read_lock_bh){..}, at: [81682b70] __dev_queue_xmit+0x50/0x960 [ 270.730085] #1: (qdisc_tx_lock){+.}, at: [81682d60] __dev_queue_xmit+0x240/0x960 [ 270.730090] stack backtrace: [ 270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2 [ 270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011 [ 270.730100] 0001 8800bafeba58 817ad487 0007 [ 270.730103] 880232a0a780 8800bafeba88 810ca4f2 88022fb23e00 [ 270.730105] 880232a0a780 8800bafebb68 8800bafebb68 8800bafebaa8 [ 270.730108] Call Trace: [ 270.730121] [817ad487] dump_stack+0x4c/0x65 [ 270.730148] [810ca4f2] lockdep_rcu_suspicious+0xe2/0x120 [ 270.730153] [816a62d2] task_cls_state+0x92/0xa0 [ 270.730158] [a00b534f] cls_cgroup_classify+0x4f/0x120 [cls_cgroup] [ 270.730164] [816aac74] tc_classify_compat+0x74/0xc0 [ 270.730166] [816ab573] tc_classify+0x33/0x90 [ 270.730170] [a00bcb0a] htb_enqueue+0xaa/0x4a0 [sch_htb] [ 270.730172] [81682e26] __dev_queue_xmit+0x306/0x960 [ 270.730174] [81682b70] ? __dev_queue_xmit+0x50/0x960 [ 270.730176] [816834a3] dev_queue_xmit_sk+0x13/0x20 [ 270.730185] [81787770] dev_queue_xmit+0x10/0x20 [ 270.730187] [8178b91c] packet_snd.isra.62+0x54c/0x760 [ 270.730190] [8178be25] packet_sendmsg+0x2f5/0x3f0 [ 270.730203] [81665245] ? sock_def_readable+0x5/0x190 [ 270.730210] [817b64bb] ? _raw_spin_unlock+0x2b/0x40 [ 270.730216] [8173bcbc] ? 
unix_dgram_sendmsg+0x5cc/0x640 [ 270.730219] [8165f367] sock_sendmsg+0x47/0x50 [ 270.730221] [8165f42f] sock_write_iter+0x7f/0xd0 [ 270.730232] [811fd4c7] __vfs_write+0xa7/0xf0 [ 270.730234] [811fe5b8] vfs_write+0xb8/0x190 [ 270.730236] [811fe8c2] SyS_write+0x52/0xb0 [ 270.730239] [817b6bae] entry_SYSCALL_64_fastpath+0x12/0x76 Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru --- net/core/netclassid_cgroup.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/netclassid_cgroup.c b/net/core/netclassid_cgroup.c index 1f2a126f4ffa..515939034298 100644 --- a/net/core/netclassid_cgroup.c +++ b/net/core/netclassid_cgroup.c @@ -23,7 +23,8 @@ static inline struct cgroup_cls_state *css_cls_state(struct cgroup_subsys_state struct cgroup_cls_state *task_cls_state(struct task_struct *p) { - return css_cls_state(task_css(p, net_cls_cgrp_id)); + return css_cls_state(task_css_check(p, net_cls_cgrp_id, + rcu_read_lock_bh_held())); } EXPORT_SYMBOL_GPL(task_cls_state); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Identifying underlying interface from struct sock
Hi,

First, I apologize for posting on the netdev forum. Majordomo did not
list any other network related mailing list.

Is there a way to identify the underlying network interface from an
instance of struct sock? I realize that the socket is abstract and
shouldn't/doesn't necessarily depend on the underlying interface, but
say, with TCP, where the connection is endpoint oriented, shouldn't this
mean that the socket maintains a reference to the interface to which it
is associated?

I tried

	dev = dev_get_by_index(sock_net(sk), skb->skb_iif);

and

	dev = skb->dev;

but in both cases, dev was NULL. I'm trying to reference the underlying
interface to determine whether the conditions present in that interface
are acceptable for transmission.
Re: [PATCH net-next 14/22] vxlan: Flow based tunneling
On 7/21/15 1:43 AM, Thomas Graf wrote:
> This prepares the VXLAN device to be steered by the routing and other
> subsystems which allows to support encapsulation for a large number
> of tunnel endpoints and tunnel ids through a single net_device which
> improves the scalability.

+1. looks very useful.

RX:
+	info->mode = IP_TUNNEL_INFO_RX;
+	info->key.tun_flags = TUNNEL_KEY;
+	info->key.tun_id = cpu_to_be64(vni >> 8);
...
TX:
+	dst_port = info->key.tp_dst ? : vxlan->dst_port;
+	vni = be64_to_cpu(info->key.tun_id);

I think the copy paste of ovs_tunnel_info into ip_tunnel_info can be
improved. In particular instead of '__be64 tun_id' we can use
'__u64 tun_id' which will avoid extra byteswaps for rx/tx paths.

netlink for this part also seems inconsistent. In the patch 16:
+static const struct nla_policy ip_tun_policy[IP_TUN_MAX + 1] = {
+	[IP_TUN_ID] = { .type = NLA_U64 },
...
+	if (tb[IP_TUN_ID])
+		tun_info->key.tun_id = nla_get_u64(tb[IP_TUN_ID]);

I think nla_get_be64 should be there?
and with my suggestion we can add be64_to_cpu() here instead of doing it
per packet.
Thoughts?
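The idea of keeping tun_id in host byte order and converting only where bytes meet the wire can be sketched in userspace. Everything here is an assumption made for illustration: rx_tun_id(), tx_vx_vni() and the byte helpers are hypothetical names, and the only format fact relied on is that the 24-bit VNI sits in the upper three bytes of the 32-bit big-endian header field.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit big-endian field independent of host endianness. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit big-endian field independent of host endianness. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* RX: one conversion at the wire boundary, result kept in host order. */
static uint64_t rx_tun_id(const uint8_t vx_vni[4])
{
	return load_be32(vx_vni) >> 8;
}

/* TX: host-order tun_id goes straight back into the header field. */
static void tx_vx_vni(uint64_t tun_id, uint8_t vx_vni[4])
{
	assert(tun_id <= 0xFFFFFFu);	/* VNI is 24 bits wide */
	store_be32(vx_vni, (uint32_t)tun_id << 8);
}
```

In this model no 64-bit byteswap happens per packet; a big-endian representation would only be produced where a consumer explicitly needs one.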
[PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.

Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 net/openvswitch/flow_table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4613df8..6552394 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -752,7 +752,7 @@ int ovs_flow_init(void)
 	BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
 
 	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
-				       + (num_possible_nodes()
+				       + (nr_node_ids
 					  * sizeof(struct flow_stats *)),
 				       0, 0, NULL);
 	if (flow_cache == NULL)
-- 
1.9.1
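The difference between the two sizing rules shows up as soon as the possible-node map has holes. The sketch below models the map as a plain 32-bit mask; possible_node_count() stands in for counting the possible nodes and node_id_limit() for "highest possible node id plus one". Both names are hypothetical and neither is a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of set bits: how many nodes are possible. */
static unsigned int possible_node_count(uint32_t map)
{
	unsigned int n = 0;

	for (; map != 0; map >>= 1)
		n += map & 1u;
	return n;
}

/* Highest set bit plus one: the smallest array size every node id fits in. */
static unsigned int node_id_limit(uint32_t map)
{
	unsigned int limit = 0;

	for (; map != 0; map >>= 1)
		limit++;
	return limit;
}

/* True when stats[node] stays inside an array of the given size. */
static bool index_fits(unsigned int node, unsigned int slots)
{
	assert(slots > 0);
	return node < slots;
}
```

For a map of 0x30003 (nodes 0, 1, 16 and 17) the count is 4 but the id limit is 18, so indexing by node id into a 4-slot array overruns it for nodes 16 and 17.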
Re: [PATCH 1/1] drivers: net: cpsw: remove tx event processing in rx napi poll
From: Mugunthan V N mugunthan...@ti.com
Date: Tue, 21 Jul 2015 16:00:42 +0530

> With commit c03abd84634d (net: ethernet: cpsw: don't requests IRQs we
> don't use) common isr and napi are separated into separate tx isr
> and rx isr/napi, but still in rx napi tx events are handled. So
> removing the tx event handling in rx napi.
>
> Signed-off-by: Mugunthan V N mugunthan...@ti.com

Applied, thanks.
Re: [PATCH net-next 1/2] tcp: don't extend RTO on failed loss probe attempts
On Fri, Jul 17, 2015 at 10:27 PM, Eric Dumazet eric.duma...@gmail.com wrote:
> On Fri, 2015-07-17 at 14:22 -0700, Yuchung Cheng wrote:
>> If TLP was unable to send a probe, it extended the RTO to now +
>> icsk_rto. But extending the RTO makes little sense if no TLP probe
>> went out. With this commit, instead of extending the RTO we re-arm it
>> relative to the transmit time of the write queue head.
>
> But what was the reason the probe could not be sent ?
>
> If it is local congestion or memory allocation error, it does make
> sense to not add fuel to the fire.

Good point. We can identify those so we don't attempt to retransmit on
these errors, but will retransmit on receive-window limit. I'll re-spin
the patch.
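The timer arithmetic being discussed can be written down as a small sketch. This is not the TCP code: rto_delay_extend() and rto_delay_rearm() are hypothetical names, time is a free-running tick counter, and the one-tick minimum for an already overdue timer is an assumption made here.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour as described: the timer simply restarts from now. */
static uint32_t rto_delay_extend(uint32_t rto)
{
	return rto;
}

/*
 * Described new behaviour: the deadline is head_xmit_time + rto, so the
 * remaining delay shrinks by the time the head has already been in flight.
 */
static uint32_t rto_delay_rearm(uint32_t now, uint32_t head_xmit_time,
				uint32_t rto)
{
	uint32_t elapsed;

	assert(rto > 0);

	elapsed = now - head_xmit_time;	/* unsigned math tolerates wraparound */
	if (elapsed >= rto)
		return 1;		/* overdue: fire on the next tick (assumed) */
	return rto - elapsed;
}
```

With now = 1000, a head segment sent at tick 900 and an RTO of 300 ticks, extending waits another 300 ticks while re-arming waits 200.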
Re: [PATCH net-next 00/22 v2] Lightweight flow based encapsulation
From: Thomas Graf tg...@suug.ch
Date: Tue, 21 Jul 2015 10:43:44 +0200

> This series combines the work previously posted by Roopa, Robert and
> myself. It's according to what we discussed at NFWS. The motivation
> of this series is to:
>
>  * Consolidate code between OVS and the rest of the kernel and get
>    rid of OVS vports and instead represent them as pure net_devices.
>  * Introduce a lightweight tunneling mechanism which enables flow
>    based encapsulation to improve scalability on both RX and TX.
>  * Do the above in an encapsulation unspecific way so that the
>    encapsulation type is eventually abstracted away from the user.
>  * Use the same forwarding decision for both native forwarding and
>    encapsulation thus allowing to switch between native IPv6 and
>    UDP encapsulation based on endpoint without requiring additional
>    logic
>
> The fundamental changes introduced in this series are:
>
>  * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
>    instructions. Depending on the specified type, the instructions
>    apply to UDP encapsulations, MPLS and possible other in the future.
>  * Depending on the encapsulation type, the output function of the
>    dst is directly overwritten or the dst merely attaches metadata and
>    relies on a subsequent net_device to apply it to the packet. The
>    latter is typically used if an inner and outer IP header exist which
>    require two subsequent routing lookups to be performed.
>  * A new metadata_dst structure which can be attached to skbs to
>    carry metadata in between subsystems. This new metadata transport
>    is used to provide a single interface for VXLAN, routing and OVS to
>    communicate through metadata.

Series applied, but please take Alexei's endianness feedback into
consideration.

Thanks!
Re: [PATCH iproute2 v2] ss: Fix crash when dump stats from /proc with '-p'
On Tue, 21 Jul 2015 16:18:36 +0300
Vadim Kochan vadi...@gmail.com wrote:

> From: Vadim Kochan vadi...@gmail.com
>
> It really partially reverts:
>
>     ec4d0d8a9def35 (ss: Replace unixstat struct by new sockstat struct)
>
> but adds few fields (name peer_name) from removed unixstat to sockstat
> struct to easy return original code.
>
> Fixes: ec4d0d8a9def35 (ss: Replace unixstat struct by new sockstat struct)
> Reported-by: Marc Dietrich marvi...@gmx.de
> Signed-off-by: Vadim Kochan vadi...@gmail.com

I applied this one after resolving merge conflicts.
[PATCH net] tcp: suppress a division by zero warning
From: Eric Dumazet eduma...@google.com

Andrew Morton reported following warning on one ARM build
with gcc-4.4 :

net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero

Even guarded with a test on sizeof(spinlock_t), compiler does not
like current construct on a !CONFIG_SMP build.

Remove the warning by using a temporary variable.

Fixes: 095dc8e0c368 (tcp: fix/cleanup inet_ehash_locks_alloc())
Reported-by: Andrew Morton a...@linux-foundation.org
Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/ipv4/inet_hashtables.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 5f9b063bbe8a..0cb9165421d4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -624,22 +624,21 @@ EXPORT_SYMBOL_GPL(inet_hashinfo_init);
 
 int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
 {
+	unsigned int locksz = sizeof(spinlock_t);
 	unsigned int i, nblocks = 1;
 
-	if (sizeof(spinlock_t) != 0) {
+	if (locksz != 0) {
 		/* allocate 2 cache lines or at least one spinlock per cpu */
-		nblocks = max_t(unsigned int,
-				2 * L1_CACHE_BYTES / sizeof(spinlock_t),
-				1);
+		nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U);
 		nblocks = roundup_pow_of_two(nblocks * num_possible_cpus());
 
 		/* no more locks than number of hash buckets */
 		nblocks = min(nblocks, hashinfo->ehash_mask + 1);
 
-		hashinfo->ehash_locks = kmalloc_array(nblocks, sizeof(spinlock_t),
+		hashinfo->ehash_locks = kmalloc_array(nblocks, locksz,
 						      GFP_KERNEL | __GFP_NOWARN);
 		if (!hashinfo->ehash_locks)
-			hashinfo->ehash_locks = vmalloc(nblocks * sizeof(spinlock_t));
+			hashinfo->ehash_locks = vmalloc(nblocks * locksz);
 
 		if (!hashinfo->ehash_locks)
 			return -ENOMEM;
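The sizing logic in this function, with the lock size passed in as an ordinary variable the way the patch does it, can be tried out in userspace. This is a sketch: ehash_lock_count(), roundup_pow2() and the 64-byte SKETCH_L1_CACHE_BYTES are assumptions made for illustration, and the CPU and bucket counts are plain parameters.

```c
#include <assert.h>

/* Assumed cache line size for the sketch. */
#define SKETCH_L1_CACHE_BYTES 64u

/* Smallest power of two that is >= v. */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	assert(v != 0 && v <= 0x80000000u);
	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Number of locks to allocate: two cache lines' worth (at least one) per
 * CPU, rounded up to a power of two, capped at the bucket count. A zero
 * lock size skips the division and leaves the count at 1.
 */
static unsigned int ehash_lock_count(unsigned int locksz, unsigned int ncpus,
				     unsigned int nbuckets)
{
	unsigned int nblocks = 1;

	assert(ncpus > 0 && nbuckets > 0);

	if (locksz != 0) {
		nblocks = 2u * SKETCH_L1_CACHE_BYTES / locksz;
		if (nblocks < 1u)
			nblocks = 1u;
		nblocks = roundup_pow2(nblocks * ncpus);
		if (nblocks > nbuckets)
			nblocks = nbuckets;
	}
	return nblocks;
}
```

Because locksz is a runtime value here, the zero case is just another input rather than a constant the compiler folds into a division by zero.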
Re: [PATCH net] tcp: suppress a division by zero warning
From: Eric Dumazet eric.duma...@gmail.com
Date: Wed, 22 Jul 2015 07:02:00 +0200

> From: Eric Dumazet eduma...@google.com
>
> Andrew Morton reported following warning on one ARM build
> with gcc-4.4 :
>
> net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
> net/ipv4/inet_hashtables.c:617: warning: division by zero
>
> Even guarded with a test on sizeof(spinlock_t), compiler does not
> like current construct on a !CONFIG_SMP build.
>
> Remove the warning by using a temporary variable.
>
> Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
> Reported-by: Andrew Morton a...@linux-foundation.org
> Signed-off-by: Eric Dumazet eduma...@google.com

Applied, thanks.
Re: [PATCHv2 net-next] cxgb4: Add debugfs entry to enable backdoor access
From: Hariprasad Shenai haripra...@chelsio.com
Date: Tue, 21 Jul 2015 22:39:40 +0530

> Add debugfs entry 'use_backdoor' to enable backdoor access to read sge
> context. By default, we read sge contexts via firmware. In case of FW
> issues, one can enable backdoor access via debugfs to dump sge context
> for debugging purposes.
>
> Signed-off-by: Hariprasad Shenai haripra...@chelsio.com
> ---
> V2: Remove unnecessary braces as per comments by
>     Sergei Shtylyov sergei.shtyl...@cogentembedded.com

Applied.
RE: [PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path
From: Lucas Stach l.st...@pengutronix.de Sent: Tuesday, July 21, 2015 11:11 PM
> To: David S. Miller
> Cc: Duan Fugang-B38611; Li Frank-B20596; netdev@vger.kernel.org;
> ker...@pengutronix.de; patchwork-...@pengutronix.de
> Subject: [PATCH v2 2/2] net: fec: introduce fec_ptp_stop and use in probe fail path
>
> This function frees resources and cancels delayed work item that have been
> initialized in fec_ptp_init().
>
> Use this to do proper error handling if something goes wrong in probe function
> after fec_ptp_init has been called.
>
> Signed-off-by: Lucas Stach l.st...@pengutronix.de
> ---
>  drivers/net/ethernet/freescale/fec.h      |  1 +
>  drivers/net/ethernet/freescale/fec_main.c |  5 ++---
>  drivers/net/ethernet/freescale/fec_ptp.c  | 10 ++++++++++
>  3 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
> index 1eee73cccdf5..99d33e2d35e6 100644
> --- a/drivers/net/ethernet/freescale/fec.h
> +++ b/drivers/net/ethernet/freescale/fec.h
> @@ -562,6 +562,7 @@ struct fec_enet_private {
>  };
>
>  void fec_ptp_init(struct platform_device *pdev);
> +void fec_ptp_stop(struct platform_device *pdev);
>  void fec_ptp_start_cyclecounter(struct net_device *ndev);
>  int fec_ptp_set(struct net_device *ndev, struct ifreq *ifr);
>  int fec_ptp_get(struct net_device *ndev, struct ifreq *ifr);
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index a7f1bdf718f8..32e3807c650e 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3494,6 +3494,7 @@ failed_register:
>  failed_mii_init:
>  failed_irq:
>  failed_init:
> +	fec_ptp_stop(pdev);
>  	if (fep->reg_phy)
>  		regulator_disable(fep->reg_phy);
>  failed_regulator:
> @@ -3515,14 +3516,12 @@ fec_drv_remove(struct platform_device *pdev)
>  	struct net_device *ndev = platform_get_drvdata(pdev);
>  	struct fec_enet_private *fep = netdev_priv(ndev);
>
> -	cancel_delayed_work_sync(&fep->time_keep);
>  	cancel_work_sync(&fep->tx_timeout_work);
> +	fec_ptp_stop(pdev);
>  	unregister_netdev(ndev);
>  	fec_enet_mii_remove(fep);
>  	if (fep->reg_phy)
>  		regulator_disable(fep->reg_phy);
> -	if (fep->ptp_clock)
> -		ptp_clock_unregister(fep->ptp_clock);
>  	of_node_put(fep->phy_node);
>  	free_netdev(ndev);
>
> diff --git a/drivers/net/ethernet/freescale/fec_ptp.c b/drivers/net/ethernet/freescale/fec_ptp.c
> index a15663ad7f5e..f457a23d0bfb 100644
> --- a/drivers/net/ethernet/freescale/fec_ptp.c
> +++ b/drivers/net/ethernet/freescale/fec_ptp.c
> @@ -604,6 +604,16 @@ void fec_ptp_init(struct platform_device *pdev)
>  	schedule_delayed_work(&fep->time_keep, HZ);
>  }
>
> +void fec_ptp_stop(struct platform_device *pdev)
> +{
> +	struct net_device *ndev = platform_get_drvdata(pdev);
> +	struct fec_enet_private *fep = netdev_priv(ndev);
> +
> +	cancel_delayed_work_sync(&fep->time_keep);
> +	if (fep->ptp_clock)
> +		ptp_clock_unregister(fep->ptp_clock);
> +}
> +
>  /**
>   * fec_ptp_check_pps_event
>   * @fep: the fec_enet_private structure handle
> --
> 2.1.4

Acked-by: Fugang Duan b38...@freescale.com
Re: [PATCH] e1000e: Move e1000e_disable_aspm_locked() inside CONFIG_PM
On Wed, 2015-07-15 at 03:30 -0700, Jeff Kirsher wrote:
> On Tue, 2015-07-14 at 13:54 +1000, Michael Ellerman wrote:
> > e1000e_disable_aspm_locked() is only used in __e1000_resume() which is
> > inside CONFIG_PM. So when CONFIG_PM=n we get a "defined but not used"
> > warning for e1000e_disable_aspm_locked().
> >
> > Move it inside the existing CONFIG_PM block to avoid the warning.
> >
> > Signed-off-by: Michael Ellerman m...@ellerman.id.au
> > ---
> >  drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> NACK, this is already fixed in my next-queue tree. Raanan submitted a
> patch back on July 6th to resolve this issue, see commit id
> a75787d2246a93d256061db602f252703559af65 in my dev-queue branch of my
> next-queue tree.

OK.

I take it your next-queue is destined for 4.3, so we'll just have to suck on the
warning until then?

cheers
Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Tue, Jul 21, 2015 at 1:57 PM, Eric Dumazet eric.duma...@gmail.com wrote:
> On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:
>
> > > -		kfree_skb(skb);
> > > +	INIT_LIST_HEAD(&q->new_flows);
> > > +	INIT_LIST_HEAD(&q->old_flows);
> > > +	for (i = 0; i < q->flows_cnt; i++) {
> > > +		struct fq_codel_flow *flow = q->flows + i;
> > > +
> > > +		while (flow->head)
> > > +			kfree_skb(dequeue_head(flow));
> > > +
> > > +		INIT_LIST_HEAD(&flow->flowchain);
> >
> > You probably need to call codel_vars_init(&flow->cvars) as well.
>
> It is not necessary : flow->cvars only matter in the event of a dequeue,
> but whole qdisc is dismantled and no packet will be dequeued.

But it will affect the next dequeue _after_ reset? which is not supposed
to happen as we expect a fresh start after reset?
[RFC PATCH v2 net-next 2/3] tcp: add in_flight to tcp_skb_cb
Based on comments by Neal Cardwell to tcp_nv patch:

  AFAICT this patch would not require an increase in the size of
  sk_buff cb[] if it were to take advantage of the fact that the
  tcp_skb_cb header.h4 and header.h6 fields are only used in the packet
  reception code path, and this in_flight field is only used on the
  transmit side.

  So the in_flight field could be placed in a struct that is itself
  placed in a union with the "header" union.

  That way the sender code can remember the in_flight value
  without requiring any extra space. And in the future other
  sender-side info could be stored in the "tx" struct, if needed.

Signed-off-by: Lawrence Brakmo bra...@fb.com
---
 include/net/tcp.h     | 13 ++++++++++---
 net/ipv4/tcp_input.c  |  5 ++++-
 net/ipv4/tcp_output.c |  4 +++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 26e7651..2e62efe 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -755,11 +755,17 @@ struct tcp_skb_cb {
 	/* 1 byte hole */
 	__u32		ack_seq;	/* Sequence number ACK'd	*/
 	union {
-		struct inet_skb_parm	h4;
+		struct {
+			/* bytes in flight when this packet was sent */
+			__u32 in_flight;
+		} tx;   /* only used for outgoing skbs */
+		union {
+			struct inet_skb_parm	h4;
 #if IS_ENABLED(CONFIG_IPV6)
-		struct inet6_skb_parm	h6;
+			struct inet6_skb_parm	h6;
 #endif
-	} header;	/* For incoming frames		*/
+		} header;	/* For incoming skbs */
+	};
 };
 
 #define TCP_SKB_CB(__skb)	((struct tcp_skb_cb *)&((__skb)->cb[0]))
@@ -837,6 +843,7 @@ union tcp_cc_info;
 struct ack_sample {
 	u32 pkts_acked;
 	s32 rtt_us;
+	u32 in_flight;
 };
 
 struct tcp_congestion_ops {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4f641f6..aca4ae5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3068,6 +3068,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 	long ca_rtt_us = -1L;
 	struct sk_buff *skb;
 	u32 pkts_acked = 0;
+	u32 last_in_flight = 0;
 	bool rtt_update;
 	int flag = 0;
 
@@ -3107,6 +3108,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 			if (!first_ackt.v64)
 				first_ackt = last_ackt;
 
+			last_in_flight = TCP_SKB_CB(skb)->tx.in_flight;
 			reord = min(pkts_acked, reord);
 			if (!after(scb->end_seq, tp->high_seq))
 				flag |= FLAG_ORIG_SACK_ACKED;
@@ -3196,7 +3198,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 	}
 
 	if (icsk->icsk_ca_ops->pkts_acked) {
-		struct ack_sample sample = {pkts_acked, ca_rtt_us};
+		struct ack_sample sample = {pkts_acked, ca_rtt_us,
+					    last_in_flight};
 
 		icsk->icsk_ca_ops->pkts_acked(sk, sample);
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7105784..e9deab5 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -920,9 +920,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	int err;
 
 	BUG_ON(!skb || !tcp_skb_pcount(skb));
+	tp = tcp_sk(sk);
 
 	if (clone_it) {
 		skb_mstamp_get(&skb->skb_mstamp);
+		TCP_SKB_CB(skb)->tx.in_flight = TCP_SKB_CB(skb)->end_seq
+			- tp->snd_una;
 
 		if (unlikely(skb_cloned(skb)))
 			skb = pskb_copy(skb, gfp_mask);
@@ -933,7 +936,6 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	}
 
 	inet = inet_sk(sk);
-	tp = tcp_sk(sk);
 	tcb = TCP_SKB_CB(skb);
 	memset(&opts, 0, sizeof(opts));
 
-- 
1.8.1
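The space argument quoted from Neal Cardwell can be checked with a small user-space sketch. This is only an illustration under stated assumptions: rx_parm4, rx_parm6, cb_before, cb_after and record_in_flight() are hypothetical names, and the 20 and 24 byte sizes are placeholders rather than the real sizes of inet_skb_parm and inet6_skb_parm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for inet_skb_parm / inet6_skb_parm. */
struct rx_parm4 { uint8_t opaque[20]; };
struct rx_parm6 { uint8_t opaque[24]; };

/* Layout before the patch: only the receive-side "header" union. */
struct cb_before {
	uint32_t ack_seq;
	union {
		struct rx_parm4 h4;
		struct rx_parm6 h6;
	} header;
};

/* Layout after the patch: "tx" shares storage with "header". */
struct cb_after {
	uint32_t ack_seq;
	union {
		struct {
			uint32_t in_flight;	/* bytes in flight at send time */
		} tx;				/* outgoing skbs only */
		union {
			struct rx_parm4 h4;
			struct rx_parm6 h6;
		} header;			/* incoming skbs only */
	};
};

/* The overlay must not make the control block any larger. */
static_assert(sizeof(struct cb_after) == sizeof(struct cb_before),
	      "tx overlay must not grow the control block");

/* Mirrors the assignment in tcp_transmit_skb(): end_seq - snd_una. */
static uint32_t record_in_flight(struct cb_after *cb, uint32_t end_seq,
				 uint32_t snd_una)
{
	cb->tx.in_flight = end_seq - snd_una;
	return cb->tx.in_flight;
}
```

Because the subtraction is done on unsigned 32-bit values, the result stays correct when the sequence space wraps, as long as the true distance fits in 32 bits.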
[RFC PATCH v2 net-next 0/3] tcp: add NV congestion control
This patchset adds support for NV congestion control.

The first patch replaces two arguments in the pkts_acked() function
of the congestion control modules with a struct, making it easier to
add more parameters later without modifying the existing congestion
control modules.

The second patch adds the number of bytes in_flight when a packet is sent to
the tcp_skb_cb without increasing its size.

The third patch adds NV congestion control support.

[RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()
[RFC PATCH v2 net-next 2/3] tcp: add in_flight to tcp_skb_cb
[RFC PATCH v2 net-next 3/3] tcp: add NV congestion control

Signed-off-by: Lawrence Brakmo bra...@fb.com

 include/net/tcp.h          |  21 ++-
 net/ipv4/Kconfig           |  16 ++
 net/ipv4/Makefile          |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_bic.c         |   6 +-
 net/ipv4/tcp_cdg.c         |  14 +-
 net/ipv4/tcp_cubic.c       |   6 +-
 net/ipv4/tcp_htcp.c        |  10 +-
 net/ipv4/tcp_illinois.c    |  20 +-
 net/ipv4/tcp_input.c       |  12 +-
 net/ipv4/tcp_lp.c          |   6 +-
 net/ipv4/tcp_nv.c          | 479
 net/ipv4/tcp_output.c      |   4 +-
 net/ipv4/tcp_vegas.c       |   6 +-
 net/ipv4/tcp_vegas.h       |   2 +-
 net/ipv4/tcp_veno.c        |   6 +-
 net/ipv4/tcp_westwood.c    |   6 +-
 net/ipv4/tcp_yeah.c        |   6 +-
 18 files changed, 579 insertions(+), 51 deletions(-)
Re: [PATCH net-next 0/9] sfc: support for cascaded multicast filtering
From: Edward Cree ec...@solarflare.com
Date: Tue, 21 Jul 2015 15:07:44 +0100

> Recent versions of firmware for SFC9100 adapters add support for filter
> chaining, in which packets matching multiple filters are delivered to all
> filters' recipients, rather than only the highest match-priority filter as
> was previously the case.
> This patch series enables this feature and redesigns the filter handling
> code to make use of it; in particular, subscribing to a multicast address
> on one function no longer prevents traffic to that address reaching another
> function which is in promiscuous or allmulti mode.
> If the firmware does not support filter chaining, the driver will fall back
> to the old behaviour.

Series applied, thanks.
[RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()
Replace 2 arguments (cnt and rtt) in the congestion control modules'
pkts_acked() function with a struct. This will allow adding more
information without having to modify existing congestion control
modules (tcp_nv in particular needs bytes in flight when packet
was sent).

This was proposed by Neal Cardwell in his comments to the tcp_nv patch.

Signed-off-by: Lawrence Brakmo lawre...@brakmo.org
---
 include/net/tcp.h       |  7 ++++++-
 net/ipv4/tcp_bic.c      |  6 +++---
 net/ipv4/tcp_cdg.c      | 14 +++++++-------
 net/ipv4/tcp_cubic.c    |  6 +++---
 net/ipv4/tcp_htcp.c     | 10 +++++-----
 net/ipv4/tcp_illinois.c | 20 ++++++++++----------
 net/ipv4/tcp_input.c    |  7 +++++--
 net/ipv4/tcp_lp.c       |  6 +++---
 net/ipv4/tcp_vegas.c    |  6 +++---
 net/ipv4/tcp_vegas.h    |  2 +-
 net/ipv4/tcp_veno.c     |  6 +++---
 net/ipv4/tcp_westwood.c |  6 +++---
 net/ipv4/tcp_yeah.c     |  6 +++---
 13 files changed, 55 insertions(+), 47 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 364426a..26e7651 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -834,6 +834,11 @@ enum tcp_ca_ack_event_flags {
 
 union tcp_cc_info;
 
+struct ack_sample {
+	u32 pkts_acked;
+	s32 rtt_us;
+};
+
 struct tcp_congestion_ops {
 	struct list_head	list;
 	u32 key;
@@ -857,7 +862,7 @@ struct tcp_congestion_ops {
 	/* new value of cwnd after loss (optional) */
 	u32  (*undo_cwnd)(struct sock *sk);
 	/* hook for packet ack accounting (optional) */
-	void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
+	void (*pkts_acked)(struct sock *sk, struct ack_sample);
 	/* get info for inet_diag (optional) */
 	size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
 			   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_bic.c b/net/ipv4/tcp_bic.c
index fd1405d..6a873f7 100644
--- a/net/ipv4/tcp_bic.c
+++ b/net/ipv4/tcp_bic.c
@@ -197,15 +197,15 @@ static void bictcp_state(struct sock *sk, u8 new_state)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt)
+static void bictcp_acked(struct sock *sk, struct ack_sample sample)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 
 	if (icsk->icsk_ca_state == TCP_CA_Open) {
 		struct bictcp *ca = inet_csk_ca(sk);
 
-		cnt -= ca->delayed_ack >> ACK_RATIO_SHIFT;
-		ca->delayed_ack += cnt;
+		ca->delayed_ack += sample.pkts_acked -
+			(ca->delayed_ack >> ACK_RATIO_SHIFT);
 	}
 }
 
diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index 167b6a3..ef64106 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -294,12 +294,12 @@ static void tcp_cdg_cong_avoid(struct sock *sk, u32 ack, u32 acked)
 	ca->shadow_wnd = max(ca->shadow_wnd, ca->shadow_wnd + incr);
 }
 
-static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
+static void tcp_cdg_acked(struct sock *sk, struct ack_sample sample)
 {
 	struct cdg *ca = inet_csk_ca(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (rtt_us <= 0)
+	if (sample.rtt_us <= 0)
 		return;
 
 	/* A heuristic for filtering delayed ACKs, adapted from:
@@ -307,20 +307,20 @@ static void tcp_cdg_acked(struct sock *sk, u32 num_acked, s32 rtt_us)
 	 * delay and rate based TCP mechanisms. TR 100219A. CAIA, 2010.
 	 */
 	if (tp->sacked_out == 0) {
-		if (num_acked == 1 && ca->delack) {
+		if (sample.pkts_acked == 1 && ca->delack) {
 			/* A delayed ACK is only used for the minimum if it is
 			 * provenly lower than an existing non-zero minimum.
 			 */
-			ca->rtt.min = min(ca->rtt.min, rtt_us);
+			ca->rtt.min = min(ca->rtt.min, sample.rtt_us);
 			ca->delack--;
 			return;
-		} else if (num_acked > 1 && ca->delack < 5) {
+		} else if (sample.pkts_acked > 1 && ca->delack < 5) {
 			ca->delack++;
 		}
 	}
 
-	ca->rtt.min = min_not_zero(ca->rtt.min, rtt_us);
-	ca->rtt.max = max(ca->rtt.max, rtt_us);
+	ca->rtt.min = min_not_zero(ca->rtt.min, sample.rtt_us);
+	ca->rtt.max = max(ca->rtt.max, sample.rtt_us);
 }
 
 static u32 tcp_cdg_ssthresh(struct sock *sk)
diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 28011fb..070d629 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -416,21 +416,21 @@ static void hystart_update(struct sock *sk, u32 delay)
 /* Track delayed acknowledgment ratio using sliding window
  * ratio = (15*ratio + sample) / 16
  */
-static void bictcp_acked(struct sock *sk, u32 cnt, s32 rtt_us)
+static void bictcp_acked(struct sock *sk, struct ack_sample sample)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
[RFC PATCH v2 net-next 3/3] tcp: add NV congestion control
This is a request for comments.

TCP-NV (New Vegas) is a major update to TCP-Vegas. An earlier version of
NV was presented at 2010's LPC (slides). It is a delay-based
congestion avoidance for the data center. This version has been tested
within a 10G rack where the HW RTTs are 20-50us.

A description of TCP-NV, including implementation and experimental
results, can be found at:
http://www.brakmo.org/networking/tcp-nv/TCPNV.html

The current version includes many module parameters to support
experimentation with the parameters.

Signed-off-by: Lawrence Brakmo bra...@fb.com
---
 include/net/tcp.h          |   1 +
 net/ipv4/Kconfig           |  16 ++
 net/ipv4/Makefile          |   1 +
 net/ipv4/sysctl_net_ipv4.c |   9 +
 net/ipv4/tcp_input.c       |   2 +
 net/ipv4/tcp_nv.c          | 479 +
 6 files changed, 508 insertions(+)
 create mode 100644 net/ipv4/tcp_nv.c

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2e62efe..c0690ae 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
 extern int sysctl_tcp_min_tso_segs;
 extern int sysctl_tcp_autocorking;
 extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_nv_enable;
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 6fb3c90..c37b374 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -539,6 +539,22 @@ config TCP_CONG_VEGAS
 	window.  TCP Vegas should provide less packet loss, but it is
 	not as aggressive as TCP Reno.
 
+config TCP_CONG_NV
+	tristate "TCP NV"
+	default m
+	---help---
+	TCP NV is a follow up to TCP Vegas. It has been modified to deal with
+	10G networks, measurement noise introduced by LRO, GRO and interrupt
+	coalescence. In addition, it will decrease its cwnd multiplicative
+	instead of linearly.
+
+	Note that in general congestion avoidance (cwnd decreased when # packets
+	queued grows) cannot coexist with congestion control (cwnd decreased only
+	when there is packet loss) due to fairness issues. One scenario when the
+	can coexist safely is when the CA flows have RTTs << CC flows RTTs.
+
+	For further details see http://www.brakmo.org/networking/tcp-nv/
+
 config TCP_CONG_SCALABLE
 	tristate "Scalable TCP"
 	default n
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index efc43f3..06f335f 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
 obj-$(CONFIG_TCP_CONG_HYBLA) += tcp_hybla.o
 obj-$(CONFIG_TCP_CONG_HTCP) += tcp_htcp.o
 obj-$(CONFIG_TCP_CONG_VEGAS) += tcp_vegas.o
+obj-$(CONFIG_TCP_CONG_NV) += tcp_nv.o
 obj-$(CONFIG_TCP_CONG_VENO) += tcp_veno.o
 obj-$(CONFIG_TCP_CONG_SCALABLE) += tcp_scalable.o
 obj-$(CONFIG_TCP_CONG_LP) += tcp_lp.o
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 433231c..31846d5 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -730,6 +730,15 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec_ms_jiffies,
 	},
 	{
+		.procname	= "tcp_nv_enable",
+		.data		= &sysctl_tcp_nv_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{
 		.procname	= "icmp_msgs_per_sec",
 		.data		= &sysctl_icmp_msgs_per_sec,
 		.maxlen		= sizeof(int),
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index aca4ae5..87560d9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_early_retrans __read_mostly = 3;
 int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_nv_enable __read_mostly = 1;
+EXPORT_SYMBOL(sysctl_tcp_nv_enable);
 
 #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
 #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
diff --git a/net/ipv4/tcp_nv.c b/net/ipv4/tcp_nv.c
new file mode 100644
index 0000000..af451b6
--- /dev/null
+++ b/net/ipv4/tcp_nv.c
@@ -0,0 +1,479 @@
+/*
+ * TCP NV: TCP with Congestion Avoidance
+ *
+ * TCP-NV is a successor of TCP-Vegas that has been developed to
+ * deal with the issues that occur in modern networks.
+ * Like TCP-Vegas, TCP-NV supports true congestion avoidance,
+ * the ability to detect congestion before packet losses occur.
+ * When congestion (queue buildup) starts to occur, TCP-NV
+ * predicts what the cwnd size should be for the current
+ * throughput and it
Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
From: Chris J Arges chris.j.ar...@canonical.com
Date: Tue, 21 Jul 2015 12:36:33 -0500

> Some architectures like POWER can have a NUMA node_possible_map that
> contains sparse entries. This causes memory corruption with openvswitch
> since it allocates flow_cache with a multiple of num_possible_nodes() and
> assumes the node variable returned by for_each_node will index into
> flow->stats[node].
>
> Use nr_node_ids to allocate a maximal sparse array instead of
> num_possible_nodes().
>
> The crash was noticed after 3af229f2 was applied as it changed the
> node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> Signed-off-by: Chris J Arges chris.j.ar...@canonical.com

Applied, thanks.
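To make the difference between the two sizes concrete, here is a small user-space sketch using the sparse map 0x30003 (nodes 0, 1, 16, 17) from the first version of the patch. The helpers count_possible(), node_id_limit() and all_ids_in_bounds() are hypothetical names for this illustration; they only approximate what num_possible_nodes() and nr_node_ids represent in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in the map, analogous to num_possible_nodes(). */
static unsigned int count_possible(uint64_t map)
{
	unsigned int n = 0;

	for (; map; map &= map - 1)
		n++;
	return n;
}

/* Highest set bit plus one, analogous to nr_node_ids. */
static unsigned int node_id_limit(uint64_t map)
{
	unsigned int n = 0;

	for (; map; map >>= 1)
		n++;
	return n;
}

/* Would indexing an array of 'slots' entries by node id stay in bounds? */
static int all_ids_in_bounds(uint64_t map, unsigned int slots)
{
	unsigned int id;

	for (id = 0; id < 64; id++)
		if (((map >> id) & 1) && id >= slots)
			return 0;
	return 1;
}
```

For 0x30003 the bit count is 4 while the id limit is 18, so an array sized by the bit count is overrun as soon as node 16 or 17 is used as an index; sizing by the id limit wastes a few slots but keeps every index valid.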
Re: [PATCH net-next] mpls: make RTA_OIF optional
From: Roopa Prabhu ro...@cumulusnetworks.com
Date: Tue, 21 Jul 2015 09:16:24 -0700

> From: Roopa Prabhu ro...@cumulusnetworks.com
>
> If user did not specify an oif, try and get it from the via address.
> If failed to get device, return with -ENODEV.
>
> Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com

Applied, thanks.
[PATCH net-next] be2net: support ndo_get_phys_port_id()
From: Sriharsha Basavapatna sriharsha.basavapa...@avagotech.com

Add be_get_phys_port_id() function to report physical port id. The port id
should be unique across different be2net devices in the system. We use the
chip serial number along with the physical port number for this.

Signed-off-by: Sriharsha Basavapatna sriharsha.basavapa...@avagotech.com
---
 drivers/net/ethernet/emulex/benet/be.h      |  3 +++
 drivers/net/ethernet/emulex/benet/be_cmds.c |  7 ++++++-
 drivers/net/ethernet/emulex/benet/be_cmds.h |  8 +++++---
 drivers/net/ethernet/emulex/benet/be_main.c | 22 ++++++++++++++++++++++
 4 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index cb5777b..8cd384d 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -105,6 +105,8 @@
 
 #define MAX_VFS			30 /* Max VFs supported by BE3 FW */
 #define FW_VER_LEN		32
+#define	CNTL_SERIAL_NUM_WORDS	8  /* Controller serial number words */
+#define	CNTL_SERIAL_NUM_WORD_SZ	(sizeof(u16)) /* Byte-sz of serial num word */
 
 #define	RSS_INDIR_TABLE_LEN	128
 #define RSS_HASH_KEY_LEN	40
@@ -590,6 +592,7 @@ struct be_adapter {
 	struct rss_info rss_info;
 	/* Filters for packets that need to be sent to BMC */
 	u32 bmc_filt_mask;
+	u16 serial_num[CNTL_SERIAL_NUM_WORDS];
 };
 
 #define be_physfn(adapter)		(!adapter->virtfn)
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index ecad46f..3be1fbd 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -2852,10 +2852,11 @@ int be_cmd_get_cntl_attributes(struct be_adapter *adapter)
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_cntl_attribs *req;
 	struct be_cmd_resp_cntl_attribs *resp;
-	int status;
+	int status, i;
 	int payload_len = max(sizeof(*req), sizeof(*resp));
 	struct mgmt_controller_attrib *attribs;
 	struct be_dma_mem attribs_cmd;
+	u32 *serial_num;
 
 	if (mutex_lock_interruptible(&adapter->mbox_lock))
 		return -1;
@@ -2886,6 +2887,10 @@ int be_cmd_get_cntl_attributes(struct be_adapter *adapter)
 	if (!status) {
 		attribs = attribs_cmd.va + sizeof(struct be_cmd_resp_hdr);
 		adapter->hba_port_num = attribs->hba_attribs.phy_port;
+		serial_num = attribs->hba_attribs.controller_serial_number;
+		for (i = 0; i < CNTL_SERIAL_NUM_WORDS; i++)
+			adapter->serial_num[i] = le32_to_cpu(serial_num[i]) &
+				(BIT_MASK(16) - 1);
 	}
 
 err:
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index a4479f7..36d835b 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1637,10 +1637,12 @@ struct be_cmd_req_set_qos {
 struct mgmt_hba_attribs {
 	u32 rsvd0[24];
 	u8 controller_model_number[32];
-	u32 rsvd1[79];
-	u8 rsvd2[3];
+	u32 rsvd1[16];
+	u32 controller_serial_number[8];
+	u32 rsvd2[55];
+	u8 rsvd3[3];
 	u8 phy_port;
-	u32 rsvd3[13];
+	u32 rsvd4[13];
 } __packed;
 
 struct mgmt_controller_attrib {
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index c996dd7..5e92db8 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5219,6 +5219,27 @@ static netdev_features_t be_features_check(struct sk_buff *skb,
 }
 #endif
 
+static int be_get_phys_port_id(struct net_device *dev,
+			       struct netdev_phys_item_id *ppid)
+{
+	int i, id_len = CNTL_SERIAL_NUM_WORDS * CNTL_SERIAL_NUM_WORD_SZ + 1;
+	struct be_adapter *adapter = netdev_priv(dev);
+	u8 *id;
+
+	if (MAX_PHYS_ITEM_ID_LEN < id_len)
+		return -ENOSPC;
+
+	ppid->id[0] = adapter->hba_port_num + 1;
+	id = &ppid->id[1];
+	for (i = CNTL_SERIAL_NUM_WORDS - 1; i >= 0;
+	     i--, id += CNTL_SERIAL_NUM_WORD_SZ)
+		memcpy(id, &adapter->serial_num[i], CNTL_SERIAL_NUM_WORD_SZ);
+
+	ppid->id_len = id_len;
+
+	return 0;
+}
+
 static const struct net_device_ops be_netdev_ops = {
 	.ndo_open		= be_open,
 	.ndo_stop		= be_close,
@@ -5249,6 +5270,7 @@ static const struct net_device_ops be_netdev_ops = {
 	.ndo_del_vxlan_port	= be_del_vxlan_port,
 	.ndo_features_check	= be_features_check,
 #endif
+	.ndo_get_phys_port_id	= be_get_phys_port_id,
 };
 
 static void be_netdev_init(struct net_device *netdev)
-- 
1.8.3.1
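The id layout described in the patch (one byte of port number followed by the serial number words in reverse order) can be sketched in user space as follows. build_port_id() and SERIAL_WORDS are hypothetical names for this illustration; the buffer length check stands in for the MAX_PHYS_ITEM_ID_LEN test and returns -1 where the driver returns -ENOSPC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SERIAL_WORDS 8	/* mirrors CNTL_SERIAL_NUM_WORDS in the patch */

/*
 * Build a port id: byte 0 is port_num + 1, followed by the 16-bit serial
 * words copied from the last word down to the first.
 * Returns the id length, or -1 if the output buffer is too small.
 */
static int build_port_id(uint8_t *out, size_t out_len, uint8_t port_num,
			 const uint16_t serial[SERIAL_WORDS])
{
	size_t id_len = SERIAL_WORDS * sizeof(uint16_t) + 1;
	uint8_t *p;
	int i;

	if (out_len < id_len)
		return -1;

	out[0] = (uint8_t)(port_num + 1);
	p = &out[1];
	for (i = SERIAL_WORDS - 1; i >= 0; i--, p += sizeof(uint16_t))
		memcpy(p, &serial[i], sizeof(uint16_t));

	return (int)id_len;
}
```

Two ports on the same adapter share the serial number bytes and differ only in byte 0, while two adapters differ in the serial part, which is what makes the id unique across devices.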
Re: [RFC PATCH v2 net-next 1/3] tcp: replace cnt & rtt with struct in pkts_acked()
On Tue, 2015-07-21 at 21:21 -0700, Lawrence Brakmo wrote:
> Replace 2 arguments (cnt and rtt) in the congestion control modules'
> pkts_acked() function with a struct. This will allow adding more
> information without having to modify existing congestion control
> modules (tcp_nv in particular needs bytes in flight when packet
> was sent).
>
> This was proposed by Neal Cardwell in his comments to the tcp_nv patch.

Are you sure Neal suggested to pass a struct as argument ?

It was probably a struct pointer instead.
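For illustration only, a pointer-based variant of the hook might look like the user-space sketch below. The struct toy_ca and the function toy_pkts_acked() are hypothetical and do not correspond to any module in the series; only the two ack_sample fields are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same two fields as the struct introduced by the patch. */
struct ack_sample {
	uint32_t pkts_acked;
	int32_t rtt_us;
};

/* Hypothetical per-connection state for a toy congestion control module. */
struct toy_ca {
	int32_t min_rtt_us;	/* 0 means no sample recorded yet */
	uint32_t acked_total;
};

/*
 * The hook takes a const pointer: the sample is not copied on each call,
 * and fields added to struct ack_sample later do not change the signature.
 */
static void toy_pkts_acked(struct toy_ca *ca, const struct ack_sample *sample)
{
	ca->acked_total += sample->pkts_acked;

	if (sample->rtt_us <= 0)
		return;		/* no valid RTT measurement */

	if (ca->min_rtt_us == 0 || sample->rtt_us < ca->min_rtt_us)
		ca->min_rtt_us = sample->rtt_us;
}
```

Passing by value also keeps the signature stable, but the whole struct is copied at every call, and that cost grows with each field added.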
Re: [PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().
On 7/21/15, 9:57 PM, Rami Rosen wrote:
> This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range() method. The
> assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no effect and is
> unneeded, as vinfo.flags value is overridden by the immediately following
> vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END assignment.
>
> Signed-off-by: Rami Rosen rami.ro...@intel.com

Acked-by: Roopa Prabhu ro...@cumulusnetworks.com

Thanks.
Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path
Hello,

On Tue, 21 Jul 2015, Martin KaFai Lau wrote:

> The patch checks neigh->nud_state before acquiring the writer lock.
> Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

Locking usage is absolutely correct.

> +		if (!(neigh->nud_state & NUD_VALID) &&
> +		    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {

but this line is too long...

> +			work = kmalloc(sizeof(*work), GFP_ATOMIC);
> +			if (work) {
> +				__neigh_set_probe_once(neigh);
> +			}

scripts/checkpatch.pl --strict /tmp/file.patch

Regards

--
Julian Anastasov j...@ssi.bg
Re: [PATCH V2 net 0/3] BPF JIT fixes for ARM
From: Nicolas Schichan nschic...@freebox.fr
Date: Tue, 21 Jul 2015 14:14:11 +0200

> These patches are fixing bugs in the ARM JIT and should probably find
> their way to a stable kernel. All 60 test_bpf tests in Linux 4.1 release
> are now passing OK (was 54 out of 60 before).

Series applied, thanks.
[PATCH net-next] mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereference
From: Roopa Prabhu ro...@cumulusnetworks.com

fix for:
net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison
expression (different address spaces)

remove incorrect rcu_dereference possibly left over from
earlier revisions of the code.

Reported-by: kbuild test robot fengguang...@intel.com
Signed-off-by: Roopa Prabhu ro...@cumulusnetworks.com
---
 net/mpls/mpls_iptunnel.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
index eea096f..276f8c9 100644
--- a/net/mpls/mpls_iptunnel.c
+++ b/net/mpls/mpls_iptunnel.c
@@ -70,7 +70,7 @@ int mpls_output(struct sock *sk, struct sk_buff *skb)
 	skb_orphan(skb);
 
 	/* Find the output device */
-	out_dev = rcu_dereference(dst->dev);
+	out_dev = dst->dev;
 	if (!mpls_output_possible(out_dev) ||
 	    !lwtstate || skb_warn_if_lro(skb))
 		goto drop;
-- 
1.7.10.4
[RFC PATCH net-next] ebpf: Allow dereferences of PTR_TO_STACK registers
  mov %rsp, %r1           ; r1 = rsp
  add $-8, %r1            ; r1 = rsp - 8
  store_q $123, -8(%rsp)  ; *(u64*)r1 = 123  <- valid
  store_q $123, (%r1)     ; *(u64*)r1 = 123  <- previously invalid
  mov $0, %r0
  exit                    ; Always need to exit

And we'd get the following error:

  0: (bf) r1 = r10
  1: (07) r1 += -8
  2: (7a) *(u64 *)(r10 -8) = 999
  3: (7a) *(u64 *)(r1 +0) = 999
  R1 invalid mem access 'fp'

  Unable to load program

We already know that a register is a stack address and the appropriate
offset, so we should be able to validate those references as well.

Signed-off-by: Alex Gartrell agartr...@fb.com
---
 kernel/bpf/verifier.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 039d866..5dfbece 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -676,6 +676,15 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
 			err = check_stack_write(state, off, size, value_regno);
 		else
 			err = check_stack_read(state, off, size, value_regno);
+	} else if (state->regs[regno].type == PTR_TO_STACK) {
+		int real_off = state->regs[regno].imm + off;
+
+		if (t == BPF_WRITE)
+			err = check_stack_write(
+				state, real_off, size, value_regno);
+		else
+			err = check_stack_read(
+				state, real_off, size, value_regno);
 	} else {
 		verbose("R%d invalid mem access '%s'\n",
 			regno, reg_type_str[state->regs[regno].type]);
-- 
Alex Gartrell agartr...@fb.com
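The offset arithmetic in the new branch can be illustrated with a small user-space sketch. This is not the verifier's code: struct stack_reg, stack_access_ok() and SKETCH_STACK_SIZE are hypothetical, the 512-byte frame size is an assumption made for the example, and the bounds test is just one plausible way to decide whether the combined offset stays inside the frame.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_STACK_SIZE 512	/* assumed frame size for this sketch */

/* Hypothetical register state: a stack pointer with a known offset. */
struct stack_reg {
	int imm;	/* offset from the frame pointer, e.g. -8 */
};

/*
 * Combine the register's known offset with the instruction offset, as the
 * patch does with real_off, then check that the access
 * [real_off, real_off + size) lies between -SKETCH_STACK_SIZE and 0.
 */
static bool stack_access_ok(const struct stack_reg *reg, int off, int size)
{
	int real_off = reg->imm + off;

	if (size <= 0)
		return false;

	return real_off >= -SKETCH_STACK_SIZE && real_off + size <= 0;
}
```

In the example program above, r1 holds fp - 8 and the store uses offset 0, so real_off is -8 and an 8-byte write ends exactly at the frame pointer.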
Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Tue, 2015-07-21 at 19:03 -0700, Cong Wang wrote:
> On Tue, Jul 21, 2015 at 1:57 PM, Eric Dumazet eric.duma...@gmail.com wrote:
> > On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:
> > > > -	kfree_skb(skb);
> > > > +	INIT_LIST_HEAD(&q->new_flows);
> > > > +	INIT_LIST_HEAD(&q->old_flows);
> > > > +	for (i = 0; i < q->flows_cnt; i++) {
> > > > +		struct fq_codel_flow *flow = q->flows + i;
> > > > +
> > > > +		while (flow->head)
> > > > +			kfree_skb(dequeue_head(flow));
> > > > +
> > > > +		INIT_LIST_HEAD(&flow->flowchain);
> > >
> > > You probably need to call codel_vars_init(&flow->cvars) as well.
> >
> > It is not necessary : flow->cvars only matter in the event of a dequeue, but whole qdisc is dismantled and no packet will be dequeued.
>
> But it will affect the next dequeue _after_ reset? which is not supposed to happen as we expect a fresh start after reset?

Hmm... I thought reset() was only done at queue dismantle, so no new packet should be added later, and since no packet should be left after reset, no dequeue should happen.

For completeness, we still can add the codel_vars_init(), no problem.

Thanks.
[PATCH net-next v2] bridge: Fix setting a flag in br_fill_ifvlaninfo_range().
This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range() method. The assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no effect and is unneeded, as the vinfo.flags value is overridden by the immediately following vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END assignment.

Signed-off-by: Rami Rosen rami.ro...@intel.com
---
 net/bridge/br_netlink.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 364bdc9..793d247 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -164,8 +164,6 @@ static int br_fill_ifvlaninfo_range(struct sk_buff *skb, u16 vid_start,
 			    sizeof(vinfo), &vinfo))
 			goto nla_put_failure;
-		vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN;
-
 		vinfo.vid = vid_end;
 		vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END;
 		if (nla_put(skb, IFLA_BRIDGE_VLAN_INFO,
-- 
1.9.3
Re: [PATCH net] netlink: don't hold mutex in rcu callback when releasing mmapd ring
From: Florian Westphal f...@strlen.de
Date: Tue, 21 Jul 2015 16:33:50 +0200

> Kirill A. Shutemov says:
>
> This simple test-case triggers few locking asserts in kernel:
> ...
> Cong Wang says:
>
> We can't hold mutex lock in a rcu callback, [..]
>
> Thomas Graf says:
>
> The socket should be dead at this point. It might be simpler to add a netlink_release_ring() function which doesn't require locking at all.
>
> Reported-by: Kirill A. Shutemov kir...@shutemov.name
> Diagnosed-by: Cong Wang cw...@twopensource.com
> Suggested-by: Thomas Graf tg...@suug.ch
> Signed-off-by: Florian Westphal f...@strlen.de

Applied, thanks everyone.
Re: [PATCH net-next] net: track success and failure of TCP PMTU probing
From: r...@tardy.usa.hp.com (Rick Jones)
Date: Tue, 21 Jul 2015 16:14:13 -0700 (PDT)

> From: Rick Jones rick.jon...@hp.com
>
> Track success and failure of TCP PMTU probing.
>
> Signed-off-by: Rick Jones rick.jon...@hp.com

Seems reasonable, applied, thanks Rick.
Re: [PATCH] ravb: fix ring memory allocation
From: Sergei Shtylyov sergei.shtyl...@cogentembedded.com
Date: Wed, 22 Jul 2015 01:31:59 +0300

> The driver is written as if it can adapt to a low memory situation allocating fewer RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In reality though the driver would malfunction in this case. Stop being overly smart and just fail in such a situation -- this is achieved by moving the memory allocation from ravb_ring_format() to ravb_ring_init().
>
> We leave dma_map_single() calls in place but make their failure non-fatal by marking the corresponding RX descriptors with zero data size which should prevent DMA to an invalid address.
>
> Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com

Applied.

But the real way to handle this is to allocate all of the necessary resources for the replacement RX SKB before unmapping and passing the original SKB up into the stack.

That way you _NEVER_ starve the device of RX packets to receive into, since if you fail the memory allocation or the DMA mapping, you just put the original SKB back into the ring.
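The refill order described in the reply can be sketched in user space as follows. This is only an illustration of "allocate the replacement first, recycle the original on failure"; the `sketch_` names, the buffer size, and the allocator hook are assumptions, and a real driver would also have to DMA-map the replacement before committing to the swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BUF_SZ 2048	/* assumed buffer size for this sketch */

struct sketch_rx_slot {
	void *buf;		/* buffer currently owned by the ring slot */
};

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*sketch_alloc_fn)(size_t);

/* Handle one received buffer.
 * Returns the buffer to hand up the stack, or NULL if the packet was
 * dropped and the original buffer was left in the ring.
 * In both cases slot->buf stays non-NULL, so the ring never shrinks.
 */
static void *sketch_rx_one(struct sketch_rx_slot *slot, sketch_alloc_fn alloc)
{
	void *fresh = alloc(SKETCH_BUF_SZ);
	void *received;

	if (!fresh)
		return NULL;	/* drop: the original buffer is reused */

	received = slot->buf;
	slot->buf = fresh;	/* replacement is in place before hand-off */
	return received;
}

/* Allocator that always fails, to simulate memory pressure. */
static void *sketch_alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}
```

The cost of an allocation failure in this scheme is one dropped packet rather than a permanently empty ring slot.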
Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
On Tue, Jul 21, 2015 at 10:36 AM, Chris J Arges chris.j.ar...@canonical.com wrote:
> Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and assumes the node variable returned by for_each_node will index into flow->stats[node].
>
> Use nr_node_ids to allocate a maximal sparse array instead of num_possible_nodes().
>
> The crash was noticed after 3af229f2 was applied as it changed the node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> Signed-off-by: Chris J Arges chris.j.ar...@canonical.com

Acked-by: Pravin B Shelar pshe...@nicira.com

Thanks.
Re: ARP response with link local IP, why not broadcast
On Tue, Jul 21, 2015 at 4:38 PM, Sebastian Fett db_ext...@gmx.de wrote:
> Hello!
> According to RFC3927 every ARP packet (reply and request) should be sent as link layer broadcast as long as the sender IP is a link local address. (see chapter 2.5).

Because broadcast replies are noisy and should be avoided if possible: it creates a broadcast flood that would wake up all receivers, and is especially undesirable in today's world, where bcast would wake up sleepy devices, or require other inefficient processes in a cloud env.

See also https://www.ietf.org/id/draft-nordmark-6man-dad-approaches-01.txt

> That functionality would help me a lot with a use case I have with our application.

what is your use case?

> But it is not implemented in the kernel that way. Does anyone know why?

--Sowmini
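For reference, the rule under discussion keys off whether the sender address falls in 169.254.0.0/16. A minimal sketch of that test, and of the broadcast policy as RFC 3927 section 2.5 states it, is below. The `sketch_` helpers are hypothetical and take addresses in host byte order; as the thread notes, the kernel does not behave this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from its four octets. */
static uint32_t sketch_ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* True if addr (host byte order) lies in 169.254.0.0/16. */
static bool sketch_is_ipv4_link_local(uint32_t addr)
{
	return (addr & 0xffff0000u) == 0xa9fe0000u;
}

/* Policy from RFC 3927 section 2.5: ARP packets whose sender IP is
 * link-local are sent as link-layer broadcast, replies included.
 */
static bool sketch_arp_should_broadcast(uint32_t sender_ip)
{
	return sketch_is_ipv4_link_local(sender_ip);
}
```

The trade-off raised in the reply is visible here: every reply from a 169.254.x.x sender would become a broadcast, which is exactly the extra traffic the responder wants to avoid.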
Re: [PATCH V4 0/2] pci: Provide a flag to access VPD through function 0
[+cc Alex]

On Mon, Jul 13, 2015 at 11:39:54AM -0700, Mark D Rustad wrote:
> Many multi-function devices provide shared registers in extended config space for accessing VPD. The behavior of these registers means that the state must be tracked and access locked correctly for accesses not to hang or worse. One way to meet these needs is to always perform the accesses through function 0, thereby using the state tracking and mutex that already exists.
>
> To provide this behavior, add a dev_flags bit to indicate that this should be done. This bit can then be set for any non-zero function that needs to redirect such VPD access to function 0. Do not set this bit on the zero function or there will be an infinite recursion.
>
> The second patch uses this new flag to invoke this behavior on all multi-function Intel Ethernet devices.
>
> Any hardware that shares VPD registers with multiple functions has been suffering these problems forever. The hangs result in the log message:
>
> vpd r/w failed. This is likely a firmware bug on this device.
>
> Both read and write data corruption are also possible during overlapping accesses in addition to hangs.
>
> Signed-off-by: Mark Rustad mark.d.rus...@intel.com
>
> ---
> Changes in V2:
> - Corrected a spelling error in a log message
> - Added checks to see that the referenced function 0 is reasonable
> Changes in V3:
> - Don't leak a device reference
> - Check that function 0 has VPD
> - Make a helper for the function 0 checks
> - Moved a multifunction check to the quirk patch
> Changes in V4:
> - Provide a more extensive commit log for patch 1

I applied these to pci/misc for v4.3 with changelogs as follows. I added Alex's ack, since he acked v3 and the only difference here is the changelog. I also added a stable tag. Thanks!

Bjorn

commit 932c435caba8a2ce473a91753bad0173269ef334
Author: Mark Rustad mark.d.rus...@intel.com
Date: Mon Jul 13 11:40:02 2015 -0700

    PCI: Add dev_flags bit to access VPD through function 0

    Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through function 0 to provide VPD access on other functions. This is for hardware devices that provide copies of the same VPD capability registers in multiple functions. Because the kernel expects that each function has its own registers, both the locking and the state tracking are affected by VPD accesses to different functions.

    On such devices for example, if a VPD write is performed on function 0, *any* later attempt to read VPD from any other function of that device will hang. This has to do with how the kernel tracks the expected value of the F bit per function.

    Concurrent accesses to different functions of the same device can not only hang but also corrupt both read and write VPD data.

    When hangs occur, typically the error message:

      vpd r/w failed. This is likely a firmware bug on this device.

    will be seen.

    Never set this bit on function 0 or there will be an infinite recursion.

    Signed-off-by: Mark Rustad mark.d.rus...@intel.com
    Signed-off-by: Bjorn Helgaas bhelg...@google.com
    Acked-by: Alexander Duyck alexander.h.du...@redhat.com
    CC: sta...@vger.kernel.org

commit 7aa6ca4d39edf01f997b9e02cf6d2fdeb224f351
Author: Mark Rustad mark.d.rus...@intel.com
Date: Mon Jul 13 11:40:07 2015 -0700

    PCI: Add VPD function 0 quirk for Intel Ethernet devices

    Set the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel Ethernet device functions other than function 0, so that on multi-function devices, we will always read VPD from function 0 instead of from the other functions.

    [bhelgaas: changelog]
    Signed-off-by: Mark Rustad mark.d.rus...@intel.com
    Signed-off-by: Bjorn Helgaas bhelg...@google.com
    Acked-by: Alexander Duyck alexander.h.du...@redhat.com
    CC: sta...@vger.kernel.org
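The redirect and the "never on function 0" rule from the changelog can be illustrated with a small user-space model. This is a sketch under assumptions, not the PCI core code: `SKETCH_FLAG_VPD_REF_F0` stands in for PCI_DEV_FLAGS_VPD_REF_F0, the `sketch_` types are hypothetical, and the function number is taken from the low three bits of devfn.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FLAG_VPD_REF_F0 (1u << 0)	/* stand-in for the real flag */

struct sketch_pci_fn {
	unsigned int devfn;	/* low 3 bits hold the function number */
	unsigned int flags;
	int has_vpd;
};

/* Pick the function whose VPD registers should actually be accessed.
 * slot_fns[0] must be function 0 of the same device.
 * Returns NULL when no sane target exists.
 */
static const struct sketch_pci_fn *
sketch_vpd_target(const struct sketch_pci_fn *slot_fns,
		  const struct sketch_pci_fn *dev)
{
	const struct sketch_pci_fn *f0 = &slot_fns[0];

	if (!(dev->flags & SKETCH_FLAG_VPD_REF_F0))
		return dev->has_vpd ? dev : NULL;

	/* The flag on function 0 would redirect to itself forever. */
	if ((dev->devfn & 7) == 0)
		return NULL;

	/* Function 0 must have VPD and must not redirect again. */
	if (!f0->has_vpd || (f0->flags & SKETCH_FLAG_VPD_REF_F0))
		return NULL;

	return f0;
}
```

Funnelling every access through one function means a single lock and a single copy of the tracked state, which is the property the changelog relies on.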
Re: reproducable panic eviction work queue
Frank Schreuder fschreu...@transip.nl wrote:

[ inet frag evictor crash ]

We believe we found the bug. This patch should fix it.

We cannot share list for buckets and evictor, the flag member is subject to race conditions so flags & INET_FRAG_EVICTED test is not reliable.

It would be great if you could confirm that this fixes the problem for you, we'll then make formal patch submission.

Please apply this on kernel without previous test patches, whether you use affected -stable or net-next kernel shouldn't matter since those are similar enough.

Many thanks!

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -45,6 +45,7 @@ enum {
  * @flags: fragment queue flags
  * @max_size: maximum received fragment size
  * @net: namespace that this frag belongs to
+ * @list_evictor: list of queues to forcefully evict (e.g. due to low memory)
  */
 struct inet_frag_queue {
 	spinlock_t		lock;
@@ -59,6 +60,7 @@ struct inet_frag_queue {
 	__u8			flags;
 	u16			max_size;
 	struct netns_frags	*net;
+	struct hlist_node	list_evictor;
 };
 #define INETFRAGS_HASHSZ	1024
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index 5e346a0..1722348 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -151,14 +151,13 @@ evict_again:
 		}
 		fq->flags |= INET_FRAG_EVICTED;
-		hlist_del(&fq->list);
-		hlist_add_head(&fq->list, &expired);
+		hlist_add_head(&fq->list_evictor, &expired);
 		++evicted;
 	}
 	spin_unlock(&hb->chain_lock);
-	hlist_for_each_entry_safe(fq, n, &expired, list)
+	hlist_for_each_entry_safe(fq, n, &expired, list_evictor)
 		f->frag_expire((unsigned long) fq);
 	return evicted;
@@ -284,8 +283,7 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
 	struct inet_frag_bucket *hb;
 	hb = get_frag_bucket_locked(fq, f);
-	if (!(fq->flags & INET_FRAG_EVICTED))
-		hlist_del(&fq->list);
+	hlist_del(&fq->list);
 	spin_unlock(&hb->chain_lock);
 }
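The fix gives each queue a second link so it can sit on the eviction list while staying on its hash-bucket chain. A minimal user-space sketch of an object embedded in two intrusive lists follows; it uses a simple circular doubly linked list rather than the kernel's hlist, and all `sketch_` names are hypothetical.

```c
#include <assert.h>

struct sketch_node {
	struct sketch_node *prev, *next;
};

static void sketch_list_init(struct sketch_node *head)
{
	head->prev = head->next = head;
}

/* Insert node right after head. */
static void sketch_list_add(struct sketch_node *node, struct sketch_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Unlink node and leave it pointing at itself. */
static void sketch_list_del(struct sketch_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

static int sketch_list_len(const struct sketch_node *head)
{
	const struct sketch_node *p;
	int n = 0;

	for (p = head->next; p != head; p = p->next)
		n++;
	return n;
}

struct sketch_queue {
	struct sketch_node bucket_link;	/* membership in the hash bucket */
	struct sketch_node evict_link;	/* membership in the eviction list */
};

/* Queue the object for eviction without touching the bucket chain. */
static void sketch_mark_evicted(struct sketch_queue *q,
				struct sketch_node *expired)
{
	sketch_list_add(&q->evict_link, expired);
}

/* Bucket removal no longer needs to consult an "evicted" flag. */
static void sketch_unlink(struct sketch_queue *q)
{
	sketch_list_del(&q->bucket_link);
}
```

Because the two memberships are independent, the unlink path can always delete from the bucket chain, which removes the racy flag test the message describes.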
Re: [PATCH v2] openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
On 21.07.2015 [12:36:33 -0500], Chris J Arges wrote:
> Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and assumes the node variable returned by for_each_node will index into flow->stats[node].
>
> Use nr_node_ids to allocate a maximal sparse array instead of num_possible_nodes().
>
> The crash was noticed after 3af229f2 was applied as it changed the node_possible_map to match node_online_map on boot.
> Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
>
> Signed-off-by: Chris J Arges chris.j.ar...@canonical.com

Acked-by: Nishanth Aravamudan n...@linux.vnet.ibm.com

> ---
>  net/openvswitch/flow_table.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index 4613df8..6552394 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -752,7 +752,7 @@ int ovs_flow_init(void)
>  	BUILD_BUG_ON(sizeof(struct sw_flow_key) % sizeof(long));
>  	flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow)
> -				       + (num_possible_nodes()
> +				       + (nr_node_ids
>  					  * sizeof(struct flow_stats *)),
>  				       0, 0, NULL);
>  	if (flow_cache == NULL)
> --
> 1.9.1
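The difference between the two sizes is easy to see on the sparse map quoted earlier in the thread, 0x30003 (nodes 0, 1, 16, 17). The sketch below models the possible-node map as a plain bitmask; the `sketch_` helpers are hypothetical stand-ins that mirror what num_possible_nodes() (count of set bits) and nr_node_ids (highest possible id plus one) represent.

```c
#include <assert.h>

/* Number of set bits: what a "count of possible nodes" yields. */
static unsigned int sketch_num_possible(unsigned long map)
{
	unsigned int n = 0;

	for (; map; map >>= 1)
		n += map & 1;
	return n;
}

/* Highest set bit plus one: the size needed to index by node id. */
static unsigned int sketch_nr_ids(unsigned long map)
{
	unsigned int n = 0;

	for (; map; map >>= 1)
		n++;
	return n;
}

/* Can an array with `slots` entries be indexed by every possible id? */
static int sketch_index_by_id_is_safe(unsigned long map, unsigned int slots)
{
	return sketch_nr_ids(map) <= slots;
}
```

For 0x30003 the count is 4 but the largest id is 17, so an array of 4 entries indexed by node id is overrun, while an array of 18 entries is safe at the cost of 14 unused pointer slots.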
Re: [PATCH] openvswitch: make for_each_node loops work with sparse numa systems
On 21.07.2015 [11:30:58 -0500], Chris J Arges wrote:
> On Tue, Jul 21, 2015 at 09:24:18AM -0700, Nishanth Aravamudan wrote:
> > On 21.07.2015 [10:32:34 -0500], Chris J Arges wrote:
> > > Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and
> >
> > Couldn't this also be fixed by just allocating with a multiple of nr_node_ids (which seems to have been the original intent all along)? You could then make your stats array be sparse or not.
>
> Yea originally this is what I did, but I thought it would be wasting memory.
>
> > > assumes the node variable returned by for_each_node will index into flow->stats[node].
> > >
> > > For example, if node_possible_map is 0x30003, this patch will map node to node_cnt as follows:
> > > 0,1,16,17 => 0,1,2,3
> > >
> > > The crash was noticed after 3af229f2 was applied as it changed the node_possible_map to match node_online_map on boot.
> > > Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
> >
> > My concern with this version of the fix is that you're relying on, implicitly, the order of for_each_node's iteration corresponding to the entries in stats 1:1. But what about node hotplug? It seems better to have the enumeration of the stats array match the topology accurately, rather, or to maintain some sort of internal map in the OVS code between the NUMA node and the entry in the stats array?
> >
> > I'm willing to be convinced otherwise, though :)
> >
> > -Nish
>
> Nish,
>
> The method I described should work for hotplug since it's using possible map which AFAIK is static rather than the online map.

Oh you're right, I'm sorry!
Re: [PATCH iproute2] Use PATH_MAX instead of MAXPATHLEN
On Wed, Apr 29, 2015 at 6:52 PM, Felix Janda felix.ja...@posteo.de wrote:
> Florian Fainelli wrote:
> > On 27/04/15 09:13, Stephen Hemminger wrote:
> > > On Sat, 25 Apr 2015 22:33:28 +0200
> > > Felix Janda felix.ja...@posteo.de wrote:
> > > > They are equivalent but the former is more common. PATH_MAX is specified by POSIX and needs limits.h while MAXPATHLEN has BSD origin and needs sys/param.h. PATH_MAX has already been in use in misc/lnstat.h.
> > > >
> > > > Signed-off-by: Felix Janda felix.ja...@posteo.de
> > >
> > > Iproute2 is intended for use on Linux. It makes more sense to align with Posix than using leftover BSD stuff. Therefore I don't see any point in doing this.
> >
> > My reading from Felix's commit message is that he is attempting to do exactly that: conform to POSIX rather than BSD, which seems to be the direction you are also suggesting here.
> > --
> > Florian
>
> This is correct. (In fact I misread the end of Stephen's message, thought that the patch was merged and wanted to thank for that.)

What's the status of this patch? This is one of the reasons iproute2 cannot be compiled against musl C library.

After fixing this I get tons of redefine errors:

In file included from ../include/linux/xfrm.h:4:0,
                 from xfrm_state.c:31:
../include/linux/in6.h:32:8: error: redefinition of ‘struct in6_addr’
 struct in6_addr {
        ^
In file included from /home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netdb.h:9:0,
                 from xfrm_state.c:30:
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netinet/in.h:24:8: note: originally defined here
 struct in6_addr
        ^
In file included from ../include/linux/xfrm.h:4:0,
                 from xfrm_state.c:31:
../include/linux/in6.h:40:0: warning: "s6_addr" redefined
 #define s6_addr in6_u.u6_addr8
 ^
In file included from /home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netdb.h:9:0,
                 from xfrm_state.c:30:
/home/user/Documents/versioned/buildroot/output/host/usr/arm-buildroot-linux-musleabi/sysroot/usr/include/netinet/in.h:32:0: note: this is the location of the previous definition
 #define s6_addr __in6_union.__s6_addr
 ^

Yegor
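As a small illustration of the portable form argued for in the thread, the sketch below sizes a path buffer with PATH_MAX from limits.h and needs no sys/param.h. The `sketch_join_path` helper is hypothetical, and the fallback value is an assumption for systems that leave PATH_MAX undefined.

```c
#include <assert.h>
#include <limits.h>	/* PATH_MAX, as specified by POSIX */
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096	/* assumed fallback for this sketch only */
#endif

/* Join dir and name into out, which must hold PATH_MAX bytes.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int sketch_join_path(char *out, const char *dir, const char *name)
{
	int n = snprintf(out, PATH_MAX, "%s/%s", dir, name);

	return (n < 0 || n >= PATH_MAX) ? -1 : 0;
}
```

Checking the snprintf() return value matters as much as the constant: an over-long component is reported to the caller instead of being silently truncated.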
Re: Why return E2BIG from bpf map update?
On 7/21/15 3:13 AM, Alex Gartrell wrote:
> But, the EINVAL errno has similarly been abused to death

There was a thread a few months ago trying to come up with a generic solution for aliased error codes, but unfortunately nothing concrete came out of it. The one I liked suggested that the kernel might extend the syscall interface to return a string together with errno, but that is quite hard to do at present. Maybe extensions to vdso data writable by the kernel can improve the situation.
Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested
Hi guys,

Florian Fainelli f.faine...@gmail.com writes:

> Changes in v5:
>
> - removed an invalid use of the link_update callback in the SF2 driver that appeared after merging "net: phy: fixed_phy: handle link-down case"
>
> - reworded the commit message for patch 2 to make it clear what it fixes and why this is required
>
> Initial cover letter from Stas:
>
> Hello.
>
> Currently the link status auto-negotiation is enabled for any SGMII link with fixed-link DT binding. The regression was reported:
> https://lkml.org/lkml/2015/7/8/865
> Apparently not all HW that implements the SGMII protocol generates the inband status needed for the auto-negotiation to work. More details here:
> https://lkml.org/lkml/2015/7/10/206
>
> The following patches revert to the old behavior by default, which is to not enable the auto-negotiation for fixed-link. A new DT property is added that allows the auto-negotiation to be requested explicitly.

FWIW, I tested this v5 series on mirabox (2 mvneta interfaces using RGMII); both interfaces still work as expected, i.e. no regression on my side.

a+
Re: [PATCH,v2 net] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Tue, 2015-07-21 at 11:12 -0700, Cong Wang wrote:
> > -	kfree_skb(skb);
> > +	INIT_LIST_HEAD(&q->new_flows);
> > +	INIT_LIST_HEAD(&q->old_flows);
> > +	for (i = 0; i < q->flows_cnt; i++) {
> > +		struct fq_codel_flow *flow = q->flows + i;
> > +
> > +		while (flow->head)
> > +			kfree_skb(dequeue_head(flow));
> > +
> > +		INIT_LIST_HEAD(&flow->flowchain);
>
> You probably need to call codel_vars_init(&flow->cvars) as well.

It is not necessary : flow->cvars only matter in the event of a dequeue, but whole qdisc is dismantled and no packet will be dequeued.
RE: [PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring
From: Lucas Stach l.st...@pengutronix.de Sent: Tuesday, July 21, 2015 11:11 PM
> To: David S. Miller
> Cc: Duan Fugang-B38611; Li Frank-B20596; netdev@vger.kernel.org; ker...@pengutronix.de; patchwork-...@pengutronix.de
> Subject: [PATCH v2 1/2] net: fec: use managed DMA API functions to allocate BD ring
>
> So it gets freed when the device is going away.
> This fixes a DMA memory leak on driver probe() fail and driver remove().
>
> Signed-off-by: Lucas Stach l.st...@pengutronix.de
> ---
> v2: Fix indentation of second line to fix alignment with opening bracket.
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 349365d85b92..a7f1bdf718f8 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3142,8 +3142,8 @@ static int fec_enet_init(struct net_device *ndev)
>  			fep->bufdesc_size;
>  	/* Allocate memory for buffer descriptors. */
> -	cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
> -				      GFP_KERNEL);
> +	cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
> +				       GFP_KERNEL);
>  	if (!cbd_base) {
>  		return -ENOMEM;
>  	}
> --

Can you also replace the below position with dma_alloc_coherent() ?

	txq->tso_hdrs = dma_alloc_coherent(NULL,
				txq->tx_ring_size * TSO_HEADER_SIZE,
				&txq->tso_hdrs_dma,
				GFP_KERNEL);

Regards,
Andy
[PATCH] ravb: fix ring memory allocation
The driver is written as if it can adapt to a low memory situation allocating less RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In reality though the driver would malfunction in this case. Stop being overly smart and just fail in such situation -- this is achieved by moving the memory allocation from ravb_ring_format() to ravb_ring_init(). We leave dma_map_single() calls in place but make their failure non-fatal by marking the corresponding RX descriptors with zero data size which should prevent DMA to an invalid addresses. Signed-off-by: Sergei Shtylyov sergei.shtyl...@cogentembedded.com --- The patch is against Dave Miller's 'net.git' repo. drivers/net/ethernet/renesas/ravb_main.c | 59 +-- 1 file changed, 34 insertions(+), 25 deletions(-) Index: net/drivers/net/ethernet/renesas/ravb_main.c === --- net.orig/drivers/net/ethernet/renesas/ravb_main.c +++ net/drivers/net/ethernet/renesas/ravb_main.c @@ -228,9 +228,7 @@ static void ravb_ring_format(struct net_ struct ravb_desc *desc = NULL; int rx_ring_size = sizeof(*rx_desc) * priv-num_rx_ring[q]; int tx_ring_size = sizeof(*tx_desc) * priv-num_tx_ring[q]; - struct sk_buff *skb; dma_addr_t dma_addr; - void *buffer; int i; priv-cur_rx[q] = 0; @@ -241,41 +239,28 @@ static void ravb_ring_format(struct net_ memset(priv-rx_ring[q], 0, rx_ring_size); /* Build RX ring buffer */ for (i = 0; i priv-num_rx_ring[q]; i++) { - priv-rx_skb[q][i] = NULL; - skb = netdev_alloc_skb(ndev, PKT_BUF_SZ + RAVB_ALIGN - 1); - if (!skb) - break; - ravb_set_buffer_align(skb); /* RX descriptor */ rx_desc = priv-rx_ring[q][i]; /* The size of the buffer should be on 16-byte boundary. 
*/ rx_desc-ds_cc = cpu_to_le16(ALIGN(PKT_BUF_SZ, 16)); - dma_addr = dma_map_single(ndev-dev, skb-data, + dma_addr = dma_map_single(ndev-dev, priv-rx_skb[q][i]-data, ALIGN(PKT_BUF_SZ, 16), DMA_FROM_DEVICE); - if (dma_mapping_error(ndev-dev, dma_addr)) { - dev_kfree_skb(skb); - break; - } - priv-rx_skb[q][i] = skb; + /* We just set the data size to 0 for a failed mapping which +* should prevent DMA from happening... +*/ + if (dma_mapping_error(ndev-dev, dma_addr)) + rx_desc-ds_cc = cpu_to_le16(0); rx_desc-dptr = cpu_to_le32(dma_addr); rx_desc-die_dt = DT_FEMPTY; } rx_desc = priv-rx_ring[q][i]; rx_desc-dptr = cpu_to_le32((u32)priv-rx_desc_dma[q]); rx_desc-die_dt = DT_LINKFIX; /* type */ - priv-dirty_rx[q] = (u32)(i - priv-num_rx_ring[q]); memset(priv-tx_ring[q], 0, tx_ring_size); /* Build TX ring buffer */ for (i = 0; i priv-num_tx_ring[q]; i++) { - priv-tx_skb[q][i] = NULL; - priv-tx_buffers[q][i] = NULL; - buffer = kmalloc(PKT_BUF_SZ + RAVB_ALIGN - 1, GFP_KERNEL); - if (!buffer) - break; - /* Aligned TX buffer */ - priv-tx_buffers[q][i] = buffer; tx_desc = priv-tx_ring[q][i]; tx_desc-die_dt = DT_EEMPTY; } @@ -298,7 +283,10 @@ static void ravb_ring_format(struct net_ static int ravb_ring_init(struct net_device *ndev, int q) { struct ravb_private *priv = netdev_priv(ndev); + struct sk_buff *skb; int ring_size; + void *buffer; + int i; /* Allocate RX and TX skb rings */ priv-rx_skb[q] = kcalloc(priv-num_rx_ring[q], @@ -308,12 +296,28 @@ static int ravb_ring_init(struct net_dev if (!priv-rx_skb[q] || !priv-tx_skb[q]) goto error; + for (i = 0; i priv-num_rx_ring[q]; i++) { + skb = netdev_alloc_skb(ndev, PKT_BUF_SZ + RAVB_ALIGN - 1); + if (!skb) + goto error; + ravb_set_buffer_align(skb); + priv-rx_skb[q][i] = skb; + } + /* Allocate rings for the aligned buffers */ priv-tx_buffers[q] = kcalloc(priv-num_tx_ring[q], sizeof(*priv-tx_buffers[q]), GFP_KERNEL); if (!priv-tx_buffers[q]) goto error; + for (i = 0; i priv-num_tx_ring[q]; i++) { + buffer = kmalloc(PKT_BUF_SZ + 
RAVB_ALIGN - 1, GFP_KERNEL); + if (!buffer) + goto error; + /* Aligned TX buffer */ + priv-tx_buffers[q][i] = buffer; + } + /* Allocate all RX
Re: [PATCHv2 net-next] net: #ifdefify sk_classid member of struct sock
From: Mathias Krause mini...@googlemail.com
Date: Sun, 19 Jul 2015 22:21:13 +0200

> The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is enabled. #ifdefify it to reduce the size of struct sock on 32 bit systems, at least.
>
> Signed-off-by: Mathias Krause mini...@googlemail.com
> ---
> v2:
> - ensure we'll error out in nft_meta_get_init() if CONFIG_CGROUP_NET_CLASSID is not set

Applied, thanks.
Re: pull-request: wireless-drivers 2015-07-20
From: Kalle Valo kv...@codeaurora.org
Date: Mon, 20 Jul 2015 18:36:30 +0300

> here are few fixes for 4.2, should not have anything out of ordinary. Please let me know if there are any issues.

Pulled, thanks.
[PATCH net-next] net: track success and failure of TCP PMTU probing
From: Rick Jones rick.jon...@hp.com

Track success and failure of TCP PMTU probing.

Signed-off-by: Rick Jones rick.jon...@hp.com

---

Tested by loading-up into an OpenStack instance and kicking the MTU out from under it in the corresponding router namespace.

diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index eee8968..25a9ad8 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -278,6 +278,8 @@ enum
 	LINUX_MIB_TCPACKSKIPPEDCHALLENGE,	/* TCPACKSkippedChallenge */
 	LINUX_MIB_TCPWINPROBE,			/* TCPWinProbe */
 	LINUX_MIB_TCPKEEPALIVE,			/* TCPKeepAlive */
+	LINUX_MIB_TCPMTUPFAIL,			/* TCPMTUPFail */
+	LINUX_MIB_TCPMTUPSUCCESS,		/* TCPMTUPSuccess */
 	__LINUX_MIB_MAX
 };
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index da5d483..3abd9d7 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -300,6 +300,8 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPACKSkippedChallenge", LINUX_MIB_TCPACKSKIPPEDCHALLENGE),
 	SNMP_MIB_ITEM("TCPWinProbe", LINUX_MIB_TCPWINPROBE),
 	SNMP_MIB_ITEM("TCPKeepAlive", LINUX_MIB_TCPKEEPALIVE),
+	SNMP_MIB_ITEM("TCPMTUPFail", LINUX_MIB_TCPMTUPFAIL),
+	SNMP_MIB_ITEM("TCPMTUPSuccess", LINUX_MIB_TCPMTUPSUCCESS),
 	SNMP_MIB_SENTINEL
 };
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1578fc2..cda3ffe 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2593,6 +2593,7 @@ static void tcp_mtup_probe_failed(struct sock *sk)
 	icsk->icsk_mtup.search_high = icsk->icsk_mtup.probe_size - 1;
 	icsk->icsk_mtup.probe_size = 0;
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPMTUPFAIL);
 }
 static void tcp_mtup_probe_success(struct sock *sk)
@@ -2612,6 +2613,7 @@ static void tcp_mtup_probe_success(struct sock *sk)
 	icsk->icsk_mtup.search_low = icsk->icsk_mtup.probe_size;
 	icsk->icsk_mtup.probe_size = 0;
 	tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
+	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPMTUPSUCCESS);
 }
 /* Do a simple retransmit without using the backoff mechanisms in
Re: [PATCH net-next v4] ipv6: sysctl to restrict candidate source addresses
From: Erik Kline e...@google.com
Date: Mon, 20 Jul 2015 16:06:34 +0200

> I thought perhaps use_oif_addr_only was a slightly clearer sysctl name. (Maybe it should be plural, use_oif_addrs_only?)

I think plural would be better too, please respin with that change.

Thanks.
Re: [PATCH net v5 0/4] net: enable inband link state negotiation only when explicitly requested
From: Florian Fainelli f.faine...@gmail.com
Date: Mon, 20 Jul 2015 17:49:54 -0700

> Changes in v5:
>
> - removed an invalid use of the link_update callback in the SF2 driver that appeared after merging "net: phy: fixed_phy: handle link-down case"
>
> - reworded the commit message for patch 2 to make it clear what it fixes and why this is required

Series applied, thanks Florian.
Re: [PATCH net-next 1/1] tipc: fix compatibility bug
From: Jon Maloy jon.ma...@ericsson.com
Date: Tue, 21 Jul 2015 06:42:28 -0400

> In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6 ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_link_proto_rcv(). This function contains a bug, so that it sometimes by error sends out a non-zero link priority value in created protocol messages.
>
> The bug may lead to an extra link reset at initial link establishing with older nodes. This will never happen more than once, whereafter the link will work as intended.
>
> We fix this bug in this commit.
>
> Signed-off-by: Jon Maloy jon.ma...@ericsson.com

Applied.
[Patch net] sch_plug: purge buffered packets during reset
Otherwise the skbuff related structures are not correctly
refcount'ed.

Cc: Jamal Hadi Salim j...@mojatatu.com
Signed-off-by: Cong Wang xiyou.wangc...@gmail.com
---
 net/sched/sch_plug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/sch_plug.c b/net/sched/sch_plug.c
index 89f8fcf..ade9445 100644
--- a/net/sched/sch_plug.c
+++ b/net/sched/sch_plug.c
@@ -216,6 +216,7 @@ static struct Qdisc_ops plug_qdisc_ops __read_mostly = {
 	.peek        = qdisc_peek_head,
 	.init        = plug_init,
 	.change      = plug_change,
+	.reset       = qdisc_reset_queue,
 	.owner       = THIS_MODULE,
 };
-- 
1.8.3.1
[net-next:master 194/208] include/net/dst_metadata.h:39:4: error: implicit declaration of function 'lwt_tun_info'
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: 3093fbe7ff4bc7d1571fc217dade1cf80330a714 [194/208] route: Per route IP tunnel metadata via lightweight tunnel
config: i386-randconfig-i0-201529 (attached as .config)
reproduce:
  git checkout 3093fbe7ff4bc7d1571fc217dade1cf80330a714
  # save the attached .config to linux build tree
  make ARCH=i386

All error/warnings (new ones prefixed by >>):

   In file included from net/core/dst.c:25:0:
   include/net/dst_metadata.h: In function 'skb_tunnel_info':
   include/net/dst_metadata.h:39:4: error: implicit declaration of function 'lwt_tun_info' [-Werror=implicit-function-declaration]
       return lwt_tun_info(rt->rt_lwtstate);
       ^
   include/net/dst_metadata.h:39:4: warning: return makes pointer from integer without a cast
   cc1: some warnings being treated as errors

vim +/lwt_tun_info +39 include/net/dst_metadata.h

    33			return &md_dst->u.tun_info;
    34
    35		switch (family) {
    36		case AF_INET:
    37			rt = (struct rtable *)skb_dst(skb);
    38			if (rt && rt->rt_lwtstate)
    39				return lwt_tun_info(rt->rt_lwtstate);
    40			break;
    41		}
    42

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.2.0-rc2 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_OUTPUT_FORMAT=elf32-i386 CONFIG_ARCH_DEFCONFIG=arch/x86/configs/i386_defconfig CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_LAZY_GS=y CONFIG_ARCH_HWEIGHT_CFLAGS=-fcall-saved-ecx -fcall-saved-edx CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config CONFIG_CONSTRUCTORS=y CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_BROKEN_ON_SMP=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE= # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION= CONFIG_LOCALVERSION_AUTO=y CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set # CONFIG_KERNEL_LZMA is not set # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set CONFIG_KERNEL_LZ4=y CONFIG_DEFAULT_HOSTNAME=(none) # CONFIG_SWAP is not set CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y # CONFIG_POSIX_MQUEUE is not set CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_FHANDLE=y CONFIG_USELIB=y # CONFIG_AUDIT is not set CONFIG_HAVE_ARCH_AUDITSYSCALL=y # # IRQ 
subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_DEBUG=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not set CONFIG_NO_HZ=y # CONFIG_HIGH_RES_TIMERS is not set # # CPU/Task time and stats accounting # CONFIG_TICK_CPU_ACCOUNTING=y # CONFIG_IRQ_TIME_ACCOUNTING is not set # CONFIG_BSD_PROCESS_ACCT is not set # CONFIG_TASKSTATS is not set # # RCU Subsystem # CONFIG_TINY_RCU=y # CONFIG_RCU_EXPERT is not set CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set # CONFIG_RCU_STALL_COMMON is not set # CONFIG_TREE_RCU_TRACE is not set # CONFIG_RCU_EXPEDITE_BOOT is not set CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y # CONFIG_IKCONFIG_PROC is not set CONFIG_LOG_BUF_SHIFT=17 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_CGROUPS=y # CONFIG_CGROUP_DEBUG is not set CONFIG_CGROUP_FREEZER=y CONFIG_CGROUP_DEVICE=y # CONFIG_CPUSETS is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_MEMCG is not set # CONFIG_CGROUP_PERF is not set CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y CONFIG_RT_GROUP_SCHED=y CONFIG_BLK_CGROUP=y CONFIG_DEBUG_BLK_CGROUP=y # CONFIG_CHECKPOINT_RESTORE is not set
[net-next:master 195/208] net/core/fib_rules.c:418:3: error: implicit declaration of function 'ip_tunnel_need_metadata'
tree:   git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   16040894b26af9f85d9395f072c53d76a44eba21
commit: e7030878fc8448492b6e5cecd574043f63271298 [195/208] fib: Add fib rule match on tunnel id
config: i386-randconfig-r0-201529 (attached as .config)
reproduce:
  git checkout e7030878fc8448492b6e5cecd574043f63271298
  # save the attached .config to linux build tree
  make ARCH=i386

All error/warnings (new ones prefixed by >>):

   net/core/fib_rules.c: In function 'fib_nl_newrule':
   net/core/fib_rules.c:418:3: error: implicit declaration of function 'ip_tunnel_need_metadata' [-Werror=implicit-function-declaration]
      ip_tunnel_need_metadata();
      ^
   net/core/fib_rules.c: In function 'fib_nl_delrule':
   net/core/fib_rules.c:505:4: error: implicit declaration of function 'ip_tunnel_unneed_metadata' [-Werror=implicit-function-declaration]
       ip_tunnel_unneed_metadata();
       ^
   cc1: some warnings being treated as errors

vim +/ip_tunnel_need_metadata +418 net/core/fib_rules.c

   412			ops->nr_goto_rules++;
   413
   414		if (unresolved)
   415			ops->unresolved_rules++;
   416
   417		if (rule->tun_id)
   418			ip_tunnel_need_metadata();
   419
   420		notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).portid);
   421		flush_route_cache(ops);
   422		rules_ops_put(ops);
   423		return 0;
   424
   425	errout_free:
   426		kfree(rule);
   427	errout:
   428		rules_ops_put(ops);
   429		return err;
   430	}
   431
   432	static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
   433	{
   434		struct net *net = sock_net(skb->sk);
   435		struct fib_rule_hdr *frh = nlmsg_data(nlh);
   436		struct fib_rules_ops *ops = NULL;
   437		struct fib_rule *rule, *tmp;
   438		struct nlattr *tb[FRA_MAX+1];
   439		int err = -EINVAL;
   440
   441		if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
   442			goto errout;
   443
   444		ops = lookup_rules_ops(net, frh->family);
   445		if (ops == NULL) {
   446			err = -EAFNOSUPPORT;
   447			goto errout;
   448		}
   449
   450		err = nlmsg_parse(nlh, sizeof(*frh), tb, FRA_MAX, ops->policy);
   451		if (err < 0)
   452			goto errout;
   453
   454		err = validate_rulemsg(frh, tb, ops);
   455		if (err < 0)
   456			goto errout;
   457
   458		list_for_each_entry(rule, &ops->rules_list, list) {
   459			if (frh->action && (frh->action != rule->action))
   460				continue;
   461
   462			if (frh_get_table(frh, tb) &&
   463			    (frh_get_table(frh, tb) != rule->table))
   464				continue;
   465
   466			if (tb[FRA_PRIORITY] &&
   467			    (rule->pref != nla_get_u32(tb[FRA_PRIORITY])))
   468				continue;
   469
   470			if (tb[FRA_IIFNAME] &&
   471			    nla_strcmp(tb[FRA_IIFNAME], rule->iifname))
   472				continue;
   473
   474			if (tb[FRA_OIFNAME] &&
   475			    nla_strcmp(tb[FRA_OIFNAME], rule->oifname))
   476				continue;
   477
   478			if (tb[FRA_FWMARK] &&
   479			    (rule->mark != nla_get_u32(tb[FRA_FWMARK])))
   480				continue;
   481
   482			if (tb[FRA_FWMASK] &&
   483			    (rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
   484				continue;
   485
   486			if (tb[FRA_TUN_ID] &&
   487			    (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
   488				continue;
   489
   490			if (!ops->compare(rule, frh, tb))
   491				continue;
   492
   493			if (rule->flags & FIB_RULE_PERMANENT) {
   494				err = -EPERM;
   495				goto errout;
   496			}
   497
   498			if (ops->delete) {
   499				err = ops->delete(rule);
   500				if (err)
   501					goto errout;
   502			}
   503
   504			if (rule->tun_id)
   505				ip_tunnel_unneed_metadata();
   506
   507			list_del_rcu(&rule->list);
   508

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.2.0-rc2 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
[PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path
The patch checks neigh->nud_state before acquiring the writer lock.
Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

I also take this chance to re-arrange the code.

40 udpflood processes and a /64 gateway route are used.
The gateway has NUD_PERMANENT. Each of them is run for 30s.
At the end, the total number of finished sendto():

Before: 55M
After:  95M

Signed-off-by: Martin KaFai Lau ka...@fb.com
Cc: Hannes Frederic Sowa han...@stressinduktion.org
---
 net/ipv6/route.c | 41 ++++++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6090969..a6c6b5a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -544,6 +544,7 @@ static void rt6_probe_deferred(struct work_struct *w)
 
 static void rt6_probe(struct rt6_info *rt)
 {
+	struct __rt6_probe_work *work;
 	struct neighbour *neigh;
 	/*
 	 * Okay, this does not seem to be appropriate
@@ -558,34 +559,32 @@ static void rt6_probe(struct rt6_info *rt)
 	rcu_read_lock_bh();
 	neigh = __ipv6_neigh_lookup_noref(rt->dst.dev, &rt->rt6i_gateway);
 	if (neigh) {
-		write_lock(&neigh->lock);
 		if (neigh->nud_state & NUD_VALID)
 			goto out;
-	}
-
-	if (!neigh ||
-	    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
-		struct __rt6_probe_work *work;
 
+		work = NULL;
+		write_lock(&neigh->lock);
+		if (!(neigh->nud_state & NUD_VALID) &&
+		    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
+			work = kmalloc(sizeof(*work), GFP_ATOMIC);
+			if (work) {
+				__neigh_set_probe_once(neigh);
+			}
+		}
+		write_unlock(&neigh->lock);
+	} else {
 		work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	}
 
-		if (neigh && work)
-			__neigh_set_probe_once(neigh);
-
-		if (neigh)
-			write_unlock(&neigh->lock);
+	if (work) {
+		INIT_WORK(&work->work, rt6_probe_deferred);
+		work->target = rt->rt6i_gateway;
+		dev_hold(rt->dst.dev);
+		work->dev = rt->dst.dev;
+		schedule_work(&work->work);
+	}
 
-		if (work) {
-			INIT_WORK(&work->work, rt6_probe_deferred);
-			work->target = rt->rt6i_gateway;
-			dev_hold(rt->dst.dev);
-			work->dev = rt->dst.dev;
-			schedule_work(&work->work);
-		}
-	} else {
 out:
-		write_unlock(&neigh->lock);
-	}
 	rcu_read_unlock_bh();
 }
 #else
-- 
1.8.1
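The locking change in the patch above can be illustrated outside the kernel. The following is a minimal single-threaded userspace sketch, not the kernel code: `struct neigh_model`, `NUD_VALID_MODEL`, `time_after_model()` and `should_probe()` are hypothetical names, and a C11 `atomic_flag` stands in for `neigh->lock`. It only shows the shape of the idea: test the state without the lock first, and take the writer lock only when a probe might actually be needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUD_VALID_MODEL 0x1u	/* hypothetical stand-in for NUD_VALID */

/* Hypothetical model of the few neighbour fields the patch touches. */
struct neigh_model {
	atomic_flag lock;		/* stand-in for neigh->lock */
	unsigned int nud_state;
	unsigned long updated;		/* time of last update or probe */
	unsigned long write_locks;	/* how often the lock was taken */
};

static void neigh_model_init(struct neigh_model *n, unsigned int state,
			     unsigned long updated)
{
	assert(n != NULL);
	atomic_flag_clear(&n->lock);
	n->nud_state = state;
	n->updated = updated;
	n->write_locks = 0;
}

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static bool time_after_model(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Returns true when the caller should schedule a probe. */
static bool should_probe(struct neigh_model *n, unsigned long now,
			 unsigned long interval)
{
	bool probe = false;

	if (!n)
		return true;		/* no neighbour entry: always probe */

	/* Fast path: a valid entry never needs the writer lock. */
	if (n->nud_state & NUD_VALID_MODEL)
		return false;

	while (atomic_flag_test_and_set(&n->lock))
		;			/* spin until the lock is free */
	n->write_locks++;
	/* Re-check under the lock; the state may have changed meanwhile. */
	if (!(n->nud_state & NUD_VALID_MODEL) &&
	    time_after_model(now, n->updated + interval)) {
		n->updated = now;	/* models __neigh_set_probe_once() */
		probe = true;
	}
	atomic_flag_clear(&n->lock);
	return probe;
}
```

With a valid entry, `should_probe()` returns without touching the lock, which is the case the udpflood numbers in the commit message exercise.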
[Patch net] sch_choke: drop all packets in queue during reset
Signed-off-by: Cong Wang xiyou.wangc...@gmail.com
---
 net/sched/sch_choke.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 93d5742..6a783af 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -385,6 +385,19 @@ static void choke_reset(struct Qdisc *sch)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 
+	while (q->head != q->tail) {
+		struct sk_buff *skb = q->tab[q->head];
+
+		q->head = (q->head + 1) & q->tab_mask;
+		if (!skb)
+			continue;
+		qdisc_qstats_backlog_dec(sch, skb);
+		--sch->q.qlen;
+		qdisc_drop(skb, sch);
+	}
+
+	memset(q->tab, 0, (q->tab_mask + 1) * sizeof(struct sk_buff *));
+	q->head = q->tail = 0;
 	red_restart(&q->vars);
 }
 
-- 
1.8.3.1
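The drain loop in this patch walks a power-of-two ring from head to tail, skips empty slots, and then zeroes the table. A rough userspace sketch of the same loop follows; `struct ring`, `ring_push()` and `ring_reset()` are hypothetical names, plain pointers stand in for skbs, and a `dropped` counter stands in for `qdisc_drop()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8u	/* must be a power of two so the mask works */

/* Hypothetical model of the head/tail/tab_mask fields used by the patch. */
struct ring {
	void *tab[RING_SIZE];
	unsigned int head, tail, tab_mask;
	unsigned int qlen;	/* number of non-empty slots */
	unsigned int dropped;	/* stands in for qdisc_drop() accounting */
};

static void ring_init(struct ring *r)
{
	assert(r != NULL);
	memset(r, 0, sizeof(*r));
	r->tab_mask = RING_SIZE - 1;
}

/* Append an item; a NULL item models a hole left in the table. */
static int ring_push(struct ring *r, void *item)
{
	unsigned int next = (r->tail + 1) & r->tab_mask;

	if (next == r->head)
		return -1;	/* ring is full */
	r->tab[r->tail] = item;
	r->tail = next;
	if (item)
		r->qlen++;
	return 0;
}

/* Drop everything between head and tail, then clear the table. */
static void ring_reset(struct ring *r)
{
	while (r->head != r->tail) {
		void *item = r->tab[r->head];

		r->head = (r->head + 1) & r->tab_mask;
		if (!item)
			continue;	/* hole: nothing to account for */
		r->qlen--;
		r->dropped++;
	}

	memset(r->tab, 0, (r->tab_mask + 1) * sizeof(r->tab[0]));
	r->head = r->tail = 0;
}
```

The mask-based increment only wraps correctly because the table size is a power of two, which this sketch assumes by construction.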
Re: [PATCH net] caif: fix leaks and race in caif_queue_rcv_skb()
From: Eric Dumazet eric.duma...@gmail.com
Date: Fri, 17 Jul 2015 10:19:23 +0200

> From: Eric Dumazet eduma...@google.com
>
> 1) If sk_filter() is applied, skb was leaked (not freed)
> 2) Testing SOCK_DEAD twice is racy :
>    packet could be freed while already queued.
> 3) Remove obsolete comment about caching skb->len
>
> Signed-off-by: Eric Dumazet eduma...@google.com

Applied, thanks Eric.
Re: [PATCH] sctp: fix cut and paste issue in comment
From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
Date: Fri, 17 Jul 2015 13:50:21 -0300

> Cookie ACK is always received by the association initiator, so fix the
> comment to avoid confusion.
>
> Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com

Applied, thanks.
Re: [PATCH 1/2] net: fec: use managed DMA API functions to allocate BD ring
From: Lucas Stach l.st...@pengutronix.de
Date: Mon, 20 Jul 2015 15:51:37 +0200

> So it gets freed when the device is going away.
> This fixes a DMA memory leak on driver probe() fail
> and driver remove().
>
> Signed-off-by: Lucas Stach l.st...@pengutronix.de
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 349365d85b92..b3287c6b069b 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3142,7 +3142,7 @@ static int fec_enet_init(struct net_device *ndev)
>  			fep->bufdesc_size;
>
>  	/* Allocate memory for buffer descriptors. */
> -	cbd_base = dma_alloc_coherent(NULL, bd_size, &bd_dma,
> +	cbd_base = dmam_alloc_coherent(&fep->pdev->dev, bd_size, &bd_dma,
>  				      GFP_KERNEL);

When you change the column of the opening parenthesis of a function
call, you must fix up the indentation of the second and subsequent
lines so that they all properly start at the first column after that
opening parenthesis.
Re: [PATCH] net/mdio: fix mdio_bus_match for c45 PHY
From: shh@gmail.com
Date: Fri, 17 Jul 2015 18:07:19 +0800

> From: Shaohui Xie shaohui@freescale.com
>
> We store c45 PHY's id information in c45_ids, so it should be used to
> check the matching between PHY driver and PHY device for c45 PHY.
>
> Signed-off-by: Shaohui Xie shaohui@freescale.com

Applied, thanks.
Re: [PATCH v2] net: mvneta: fix refilling for Rx DMA buffers
From: Simon Guinot simon.gui...@sequanux.org
Date: Sun, 19 Jul 2015 13:00:53 +0200

> With the current code, if a memory allocation error happens while
> refilling a Rx descriptor, then the original Rx buffer is both passed
> to the networking stack (in a SKB) and left in the Rx ring. This leads
> to various kernel oopses and crashes.
>
> As a fix, this patch moves Rx descriptor refilling ahead of building
> the SKB with the associated Rx buffer. In case of a memory allocation
> failure, data is dropped and the original DMA buffer is put back into
> the Rx ring.
>
> Signed-off-by: Simon Guinot simon.gui...@sequanux.org
> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
> Cc: sta...@vger.kernel.org # v3.8+
> Tested-by: Yoann Sculo yo...@sculo.fr

Applied, thanks.
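The ordering described in that commit message (refill first, hand the old buffer up only if the refill succeeded) can be sketched in a few lines of userspace C. This is an illustration under assumptions, not the mvneta code: `struct rx_slot`, `rx_process()`, `alloc_ok()` and `alloc_fail()` are hypothetical names, and a static pool replaces real DMA buffer allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one Rx descriptor and the buffer it owns. */
struct rx_slot {
	void *buf;
};

typedef void *(*alloc_fn)(void);

/* Hypothetical allocators: one that always succeeds, one that fails. */
static void *alloc_ok(void)
{
	static char pool[4][64];
	static unsigned int next;

	return pool[next++ % 4];
}

static void *alloc_fail(void)
{
	return NULL;
}

/*
 * Refill the slot before handing its buffer to the stack.
 * Returns the filled buffer for the stack, or NULL if the packet was
 * dropped; in the drop case the slot keeps its original buffer, so no
 * buffer is ever owned by both the ring and the stack.
 */
static void *rx_process(struct rx_slot *slot, alloc_fn alloc_buf)
{
	void *fresh, *filled;

	assert(slot != NULL && alloc_buf != NULL);

	fresh = alloc_buf();
	if (!fresh)
		return NULL;	/* drop the data, reuse the old buffer */

	filled = slot->buf;
	slot->buf = fresh;
	return filled;
}
```

The design point is ownership: after `rx_process()` returns, each buffer belongs either to the ring or to the caller, never to both.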
Re: [PATCH] inet: frags: fix defragmented packet's IP header for af_packet
This doesn't compile.

net/ipv4/ip_fragment.c: In function ‘ip_frag_reasm’:
net/ipv4/ip_fragment.c:644:23: error: ‘skb’ undeclared (first use in this function)
  ip_send_check(ip_hdr(skb));

This was meant to be ip_send_check(iph);

Sorry, will send a v2
Re: [PATCH] phylib: add driver for Teranetics TN2020
From: shh@gmail.com
Date: Fri, 17 Jul 2015 11:19:46 +0800

> From: Shaohui Xie shaohui@freescale.com
>
> Teranetics TN2020 is compliant with IEEE 802.3an 10 Gigabit.
>
> Signed-off-by: Shaohui Xie shaohui@freescale.com

Applied to net-next, thanks.
Re: [PATCH net-next v2] rhashtable: Allow other tasks to be scheduled in large lookup loops
From: Thomas Graf tg...@suug.ch
Date: Fri, 17 Jul 2015 10:52:48 +0200

> Depending on system speed, the large lookup/insert/delete loops of
> the testsuite can take a considerable amount of time to complete,
> causing watchdog warnings to appear. Allow other tasks to be
> scheduled throughout the loops.
>
> Reported-by: Meelis Roos mr...@linux.ee
> Signed-off-by: Thomas Graf tg...@suug.ch
> ---
> v2: Use cond_resched() instead of schedule()

Applied, thanks Thomas.
Re: [PATCH net-next] inet: Always increment refcount in inet_twsk_schedule
On Mon, 2015-07-20 at 19:14 +, subas...@codeaurora.org wrote:
> >> //Initialize time wait socket and setup timer
> >> inet_twsk_alloc()        tw_refcnt = 0
> >> __inet_twsk_hashdance()  tw_refcnt = 3
> >> inet_twsk_schedule()     tw_refcnt = 4
> >> inet_twsk_put()          tw_refcnt = 3
> >>
> >> //Receive packet 1 in timewait state
> >> tcp_timewait_state_process() -> inet_twsk_schedule  tw_refcnt = 3 (no change)
> >
> > This is obviously wrong.
> >
> > If a timewait socket is found, we do increment its refcnt before
> > proceeding.
>
> We do not increment refcount currently when we find a timewait socket.

Actually we do increment refcnt, for every socket found in ehash.

Carefully read again __inet_lookup_established()

This code is generic for ESTABLISH and TIME-WAIT sockets

If you found code that performed the lookup without taking the refcnt,
please point me at it, this would be a serious bug.

> > I've received some private mails about tw issues, that turned out to be
> > caused by buggy drivers or buggy arch specific code.
> >
> > Are your crashes observed on x86 ?
>
> This is observed on ARM devices. In the current debug, all time wait
> socket refcount changes were happening in TCP stack only and there was no
> platform / driver code involved. According to my understanding, we would
> need to increment the time wait socket refcount first before proceeding
> with any subsequent operations. However, I request your expert opinion on
> this.

Is it some Android kernel ?

Android had private modules that needed an update in 3.18
Re: [PATCH] bonding: correct the MAC address for follow fail_over_mac policy
On 2015/7/21 11:30, David Miller wrote:
> From: Ding Tianhong dingtianh...@huawei.com
> Date: Thu, 16 Jul 2015 16:30:02 +0800
>
>> The follow fail_over_mac policy is useful for multiport devices that
>> either become confused or incur a performance penalty when multiple
>> ports are programmed with the same MAC address, but duplicate MAC
>> addresses can still occur under this policy through the following steps:
>>
>> 1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
>>    bond0 has the same MAC address as eth0, which is MAC1.
>>
>> 2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
>>    eth1 is the backup and has MAC2.
>>
>> 3) ifconfig eth0 down
>>    eth1 becomes the active slave; the bond swaps the MACs of eth0 and
>>    eth1, so eth1 has MAC1 and eth0 has MAC2.
>>
>> 4) ifconfig eth1 down
>>    there is no active slave; eth1 still has MAC1 and eth0 has MAC2.
>>
>> 5) ifconfig eth0 up
>>    eth0 becomes the active slave again, and the bond sets eth0 to MAC1.
>>
>> Something is wrong here: if you then set eth1 up, eth0 and eth1 will
>> have the same MAC address, which breaks this policy for ACTIVE_BACKUP
>> mode.
>>
>> This patch fixes the problem by finding the old active slave and
>> swapping the two MAC addresses before changing the active slave.
>>
>> Signed-off-by: Ding Tianhong dingtianh...@huawei.com
>
> Applied and queued up for -stable, thanks.

Thanks David.

hi zefan:
Could you please apply this patch to 3.4 stable tree, I think it will
fix the same problem for this version.

Ding
Re: [PATCH net] macvtap: fix network header pointer for VLAN tagged pkts
On 07/20/2015 06:42 PM, Vlad Yasevich wrote:
> On 07/20/2015 11:44 AM, Ivan Vecera wrote:
>> Network header is set with offset ETH_HLEN but it is not true for VLAN
>> (multiple-)tagged and results in checksum issues in lower devices.
>>
>> Signed-off-by: Ivan Vecera ivec...@redhat.com
>> ---
>>  drivers/net/macvtap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 3b933bb..cdcbab4 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -796,6 +796,12 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
>>  	skb_reset_mac_header(skb);
>>  	skb->protocol = eth_hdr(skb)->h_proto;
>> +	if (skb_vlan_tagged(skb)) {
>> +		int depth;
>> +		skb->protocol = __vlan_get_protocol(skb, skb->protocol, &depth);
>
> I don't think this is right. This would reset the protocol to the
> encapsulated protocol which isn't really the case since you are not
> really stripping vlan encapsulations.
>
> -vlad

Yup, you are right, skb->protocol should be untouched. Will post v2.

Ivan
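To make the offset problem in this thread concrete: with stacked VLAN tags the network header starts 4 bytes later per tag than ETH_HLEN suggests. The following userspace sketch computes that offset from a raw frame while leaving the outer protocol alone, in line with the review comment. It is an illustration only; `network_header_offset()` and the `*_MODEL` constants are hypothetical names, and only the 0x8100 and 0x88A8 TPIDs are treated as VLAN tags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_MODEL     14	/* dst MAC + src MAC + ethertype */
#define VLAN_HLEN_MODEL    4	/* TCI + encapsulated ethertype */
#define ETH_P_8021Q_MODEL  0x8100
#define ETH_P_8021AD_MODEL 0x88A8

static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the offset of the network header in an Ethernet frame, skipping
 * any number of stacked VLAN tags. Returns 0 if the frame is too short.
 */
static size_t network_header_offset(const uint8_t *frame, size_t len)
{
	size_t off = ETH_HLEN_MODEL;
	uint16_t type;

	assert(frame != NULL);
	if (len < off)
		return 0;

	type = read_be16(frame + 12);
	while (type == ETH_P_8021Q_MODEL || type == ETH_P_8021AD_MODEL) {
		if (len < off + VLAN_HLEN_MODEL)
			return 0;	/* truncated tag */
		/* off points at the TCI; the next ethertype follows it. */
		type = read_be16(frame + off + 2);
		off += VLAN_HLEN_MODEL;
	}
	return off;
}
```

An untagged frame yields 14, a single-tagged frame 18, and a double-tagged (802.1ad over 802.1Q) frame 22.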
Re: [PATCH v2] net: ratelimit warnings about dst entry refcount underflow or overflow
From: Konstantin Khlebnikov khlebni...@yandex-team.ru
Date: Fri, 17 Jul 2015 14:01:11 +0300

> Kernel generates a lot of warnings when the dst entry reference counter
> overflows and becomes negative. That bug was seen several times on
> machines with outdated 3.10.y kernels. Most likely it's already fixed
> upstream. Anyway, that flood completely kills the machine and makes
> further debugging impossible.
>
> Signed-off-by: Konstantin Khlebnikov khlebni...@yandex-team.ru

Applied.
Re: [PATCH v2 net-next] net/vxlan: Fix kernel unaligned access in __vxlan_find_mac
From: Sowmini Varadhan sowmini.varad...@oracle.com
Date: Mon, 20 Jul 2015 09:54:50 +0200

> __vxlan_find_mac invokes ether_addr_equal on the eth_addr field,
> which triggers unaligned access messages, so rearrange vxlan_fdb
> to avoid this in the most non-intrusive way.
>
> Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
> ---
> v2: Alexander Duyck comments: place eth_addr[] to be 64b aligned

Applied, thanks.
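How member order decides the alignment of a MAC address field can be shown with `offsetof`. The two structs below are hypothetical and are not the real `vxlan_fdb` layout; they only assume a 64-bit member followed by small fields, which is enough to show an odd offset turning into an 8-byte-aligned one when the array is moved up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN_MODEL 6

/* Hypothetical layout: small fields first push eth_addr to offset 11. */
struct fdb_unaligned {
	uint64_t updated;
	uint16_t state;
	uint8_t  flags;
	uint8_t  eth_addr[ETH_ALEN_MODEL];
};

/* Same members, with eth_addr moved directly after the 64-bit field. */
struct fdb_aligned {
	uint64_t updated;
	uint8_t  eth_addr[ETH_ALEN_MODEL];
	uint16_t state;
	uint8_t  flags;
};

/* True when offset is a multiple of alignment. */
static int offset_is_aligned(size_t offset, size_t alignment)
{
	assert(alignment != 0);
	return offset % alignment == 0;
}
```

A comparison helper that reads the address as 16-bit or wider words needs at least 2-byte alignment, which the first layout cannot provide and the second does.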
Re: [PATCH net] stmmac: fix setting of driver data in stmmac_dvr_probe
From: Joachim Eastwood manab...@gmail.com
Date: Fri, 17 Jul 2015 23:48:17 +0200

> Commit 803f8fc46274b ("stmmac: move driver data setting into
> stmmac_dvr_probe") mistakenly set priv and not priv->dev as
> driver data. This meant that the remove, resume and suspend
> callbacks that fetched and tried to use this data would most
> likely explode. Fix the issue by using the correct variable.
>
> Fixes: 803f8fc46274b ("stmmac: move driver data setting into stmmac_dvr_probe")
> Signed-off-by: Joachim Eastwood manab...@gmail.com

Applied, thanks.