Re: [RFC PATCH 06/10] ipv6: Avoid deleting RTF_CACHE route from ip6_route_del()

2015-04-20 Thread Martin KaFai Lau
On Mon, Apr 20, 2015 at 02:23:05PM -0400, David Miller wrote: From: Martin KaFai Lau ka...@fb.com Date: Fri, 10 Apr 2015 18:54:09 -0700 Before patch 'Allow pmtu update on /128 via gateway route', RTF_CACHE route was not created for DST_HOST. It also requires changes on both delete

[PATCH net-next 3/5] ipv6: Stop /128 route from disappearing after pmtu update

2015-04-28 Thread Martin KaFai Lau
), all routes that allow pmtu update should have a RTF_CACHE clone. Hence, stop updating MTU for any non RTF_CACHE route. Signed-off-by: Martin KaFai Lau ka...@fb.com Signed-off-by: Steffen Klassert steffen.klass...@secunet.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org --- net

[PATCH net-next 2/5] ipv6: Extend the route lookups to low priority metrics.

2015-04-28 Thread Martin KaFai Lau
the garbage collector deletes the invalid route. This typically happens if a host route expires after a pmtu event. Fix this by also searching for routes with a lower priority metric. Signed-off-by: Steffen Klassert steffen.klass...@secunet.com Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed

[PATCH net-next 4/5] ipv6: Stop rt6_info from using inet_peer's metrics

2015-04-28 Thread Martin KaFai Lau
KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com --- include/net/ip6_fib.h | 10 + net/ipv6/route.c | 102 +- 2 files changed, 60 insertions(+), 52

[PATCH net-next 0/5] ipv6: Stop /128 route from disappearing after pmtu update

2015-04-28 Thread Martin KaFai Lau
The series is separated from another patch series, 'ipv6: Only create RTF_CACHE route after encountering pmtu exception', which can be found here: http://thread.gmane.org/gmane.linux.network/359140 This series focuses on fixing the /128 route issues. It is currently targeted for net-next due to

[PATCH net-next 5/5] ipv6: Remove DST_METRICS_FORCE_OVERWRITE and _rt6i_peer

2015-04-28 Thread Martin KaFai Lau
. Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Cc: Michal Kubeček mkube...@suse.cz Cc: Steffen Klassert steffen.klass...@secunet.com --- include/net/dst.h | 6 -- include/net/ip6_fib.h | 31 --- net

[PATCH net-next 1/5] ipv6: Consider RTF_CACHE when searching the fib6 tree

2015-04-28 Thread Martin KaFai Lau
it. Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com --- net/ipv6/addrconf.c | 2 ++ net/ipv6/route.c| 6 ++ 2 files changed, 8 insertions(+) diff --git a/net/ipv6/addrconf.c b/net/ipv6

[PATCH net-next 1/6] ipv6: Remove external dependency on rt6i_dst and rt6i_src

2015-04-28 Thread Martin KaFai Lau
) later. Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com --- drivers/scsi/cxgbi/libcxgbi.c | 2 +- include/net/ipv6.h | 3 ++- net/ipv6/icmp.c | 2 +- net

[PATCH net-next 4/6] ipv6: Only create RTF_CACHE routes after encountering pmtu exception

2015-04-28 Thread Martin KaFai Lau
This patch creates RTF_CACHE routes only after encountering a pmtu exception. After ip6_rt_update_pmtu() has inserted the RTF_CACHE route into the fib6 tree, rt->rt6i_node->fn_sernum is bumped, which will fail the ip6_dst_check() and trigger a relookup. Signed-off-by: Martin KaFai Lau ka
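
The serial-number check described in this message can be pictured with a small userspace sketch. It is only an illustration, not kernel code: the toy_* names and struct layouts are hypothetical stand-ins for the fib6 node serial number and the cookie kept by a cached dst.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fib6 node carrying a serial number. */
struct toy_node {
    unsigned int sernum;
};

/* Hypothetical stand-in for a cached route that remembers the serial
 * number it saw at lookup time (the "cookie"). */
struct toy_cached {
    struct toy_node *node;
    unsigned int cookie;
};

/* Lookup: remember the node and its current serial number. */
static void toy_cache(struct toy_cached *c, struct toy_node *n)
{
    c->node = n;
    c->cookie = n->sernum;
}

/* Validity check: the cached entry is usable only while the serial
 * number it recorded still matches the node's serial number. */
static bool toy_check(const struct toy_cached *c)
{
    return c->node != NULL && c->cookie == c->node->sernum;
}

/* Inserting an exception route under the node bumps the serial number,
 * which makes every previously cached cookie stale. */
static void toy_insert_exception(struct toy_node *n)
{
    n->sernum++;
}
```

A caller whose toy_check() fails would simply repeat the lookup, which is the relookup the message refers to.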

[PATCH net-next 0/6 v2] ipv6: Only create RTF_CACHE route after encountering pmtu exception

2015-04-28 Thread Martin KaFai Lau
v1 -> v2: - Move the /128 route bug fixes to another series (posted). - Create a function for checking (rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)). - Avoid shuffling the skb network_header. Instead, change the function signature to take iph instead of skb. The perf numbers do not change much

[PATCH net-next 3/6] ipv6: Combine rt6_alloc_cow and rt6_alloc_clone

2015-04-28 Thread Martin KaFai Lau
A prep work for creating RTF_CACHE on exception only. After this patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)) is checked twice. This redundancy will be removed in the later patch. Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han

[PATCH net-next 2/6] ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST

2015-04-28 Thread Martin KaFai Lau
on rt6i_gateway and RTF_ANYCAST. Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com --- include/net/ip6_route.h| 14 +- net/bluetooth/6lowpan.c| 2

[PATCH net-next 6/6] ipv6: Create percpu rt6_info

2015-04-28 Thread Martin KaFai Lau
After the patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception', we need to compensate for the performance hit (bouncing dst->__refcnt). Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass

[PATCH net-next 5/6] ipv6: Break up ip6_rt_copy()

2015-04-28 Thread Martin KaFai Lau
This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and ip6_rt_cache_alloc(). In the later patch, we need to create a percpu rt6_info copy. Hence, refactor the common rt6_info init codes to ip6_rt_copy_init(). Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa

Re: [PATCH net-next 2/6] ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST

2015-04-29 Thread Martin KaFai Lau
Hi, On Wed, Apr 29, 2015 at 11:28:46AM +0300, Julian Anastasov wrote: Hello, On Tue, 28 Apr 2015, Martin KaFai Lau wrote: -static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt) +static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt

[RFC PATCH net-next v3 8/9] ipv6: Break up ip6_rt_copy()

2015-05-01 Thread Martin KaFai Lau
This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and ip6_rt_cache_alloc(). In the later patch, we need to create a percpu rt6_info copy. Hence, refactor the common rt6_info init codes to ip6_rt_copy_init(). Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han

[PATCH net-next v4 0/10] ipv6: Only create RTF_CACHE route after encountering pmtu exception

2015-05-20 Thread Martin KaFai Lau
v3 -> v4: - Patch 8 is new. It keeps track of the DST_NOCACHE routes in a list to handle the iface down/unregister event. - Remove rcu from the newly added rt6i_pcpu variable. It is not needed because it has already been protected by the existing reader/writer lock. - Thanks to 'Julian

[PATCH net-next v4 07/10] ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set

2015-05-20 Thread Martin KaFai Lau
This patch always creates RTF_CACHE clone with DST_NOCACHE when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to the fl6->daddr. Signed-off-by: Martin KaFai Lau ka...@fb.com Acked-by: Julian Anastasov j...@ssi.bg Tested-by: Julian Anastasov j...@ssi.bg Cc: Hannes Frederic Sowa han

[PATCH net-next v4 03/10] ipv6: Combine rt6_alloc_cow and rt6_alloc_clone

2015-05-20 Thread Martin KaFai Lau
A prep work for creating RTF_CACHE on exception only. After this patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)) is checked twice. This redundancy will be removed in the later patch. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han

[PATCH net-next v4 01/10] ipv6: Remove external dependency on rt6i_dst and rt6i_src

2015-05-20 Thread Martin KaFai Lau
) later. Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com Cc: Julian Anastasov j...@ssi.bg --- drivers/scsi/cxgbi/libcxgbi.c | 2 +- include/net/ipv6.h | 3 ++- net/ipv6

[PATCH net-next v4 05/10] ipv6: Add rt6_get_cookie() function

2015-05-20 Thread Martin KaFai Lau
Instead of doing the rt6->rt6i_node check whenever we need to get the route's cookie, refactor it into rt6_get_cookie(). It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also percpu rt6_info later. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org

[PATCH net-next v4 02/10] ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST

2015-05-20 Thread Martin KaFai Lau
on rt6i_gateway and RTF_ANYCAST. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com Cc: Julian Anastasov j...@ssi.bg --- include/net/ip6_route.h| 19 ++- net/bluetooth

[PATCH net-next v4 10/10] ipv6: Create percpu rt6_info

2015-05-20 Thread Martin KaFai Lau
After the patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception', we need to compensate for the performance hit (bouncing dst->__refcnt). Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass

[PATCH net-next v4 04/10] ipv6: Only create RTF_CACHE routes after encountering pmtu exception

2015-05-20 Thread Martin KaFai Lau
This patch creates RTF_CACHE routes only after encountering a pmtu exception. After ip6_rt_update_pmtu() has inserted the RTF_CACHE route into the fib6 tree, rt->rt6i_node->fn_sernum is bumped, which will fail the ip6_dst_check() and trigger a relookup. Signed-off-by: Martin KaFai Lau ka

[PATCH net-next v4 09/10] ipv6: Break up ip6_rt_copy()

2015-05-20 Thread Martin KaFai Lau
This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and ip6_rt_cache_alloc(). In the later patch, we need to create a percpu rt6_info copy. Hence, refactor the common rt6_info init codes to ip6_rt_copy_init(). Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han

[PATCH net-next v4 08/10] ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister

2015-05-20 Thread Martin KaFai Lau
This patch keeps track of the DST_NOCACHE routes in a list and replaces its dev with loopback during the iface down/unregister event. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com Cc: Julian

[PATCH net-next v4 06/10] ipv6: Set FLOWI_FLAG_KNOWN_NH at flowi6_flags

2015-05-20 Thread Martin KaFai Lau
. Signed-off-by: Martin KaFai Lau ka...@fb.com Acked-by: Julian Anastasov j...@ssi.bg Tested-by: Julian Anastasov j...@ssi.bg Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com --- net/ipv6/raw.c | 3 +++ net/netfilter/ipvs

Re: Recurring trace from tcp_fragment()

2015-06-04 Thread Martin KaFai Lau
On Thu, Jun 04, 2015 at 01:10:26PM -0700, Grant Zhang wrote: Hi Martin, Thank you! My net.ipv4.tcp_mtu_probing is 1. After turning it off, the WARN_ON stack is gone. Thanks for confirming it. Could you elaborate a bit on why this setting relates to the WARN_ON trace? The WARN_ON is

Re: [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing

2015-06-08 Thread Martin KaFai Lau
On Fri, Jun 05, 2015 at 06:11:33PM -0700, Eric Dumazet wrote: On Fri, 2015-06-05 at 17:46 -0700, Martin KaFai Lau wrote: The problem is caught by this WARN_ON(len > skb->len) in tcp_fragment(): [810510ca] warn_slowpath_null+0x1a/0x20 [8160ec90] tcp_fragment+0x2a0/0x2b0

[RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing

2015-06-05 Thread Martin KaFai Lau
. This patch is to set the pcount after skb_pull() was called in tcp_mtu_probe(). Signed-off-by: Martin KaFai Lau ka...@fb.com Reported-by: Grant Zhang gzh...@fastly.com Cc: Eric Dumazet eduma...@google.com Cc: Neal Cardwell ncardw...@google.com Cc: Yuchung Cheng ych...@google.com --- net/ipv4

Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing

2015-06-05 Thread Martin KaFai Lau
On Fri, Jun 05, 2015 at 09:53:51AM -0700, Eric Dumazet wrote: Sounds good, although I would simply get rid of all this complexity in this very unlikely path. Would you instead try the following ? diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index

[PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing

2015-06-05 Thread Martin KaFai Lau
- Replace the skb slicing codes by the existing tcp_trim_head(), suggested by Eric Dumazet. v1 - Call tcp_set_skb_tso_segs() for all slicing cases. Signed-off-by: Martin KaFai Lau ka...@fb.com Reported-by: Grant Zhang gzh...@fastly.com Cc: Grant Zhang gzh...@fastly.com Cc: Eric Dumazet eduma

Re: [RFC PATCH net] tcp: Update pcount after skb_pull() during mtu probing

2015-06-05 Thread Martin KaFai Lau
On Fri, Jun 05, 2015 at 02:23:55PM -0700, Eric Dumazet wrote: On Fri, 2015-06-05 at 11:02 -0700, Martin KaFai Lau wrote: tcp_trim_head() does not take the mss_now. Is it fine to have mss_now = tcp_skb_mss(skb)? or we can depend on the tcp_init_tso_segs() in the tcp_write_xmit() to take

Re: [PATCH net v2] tcp: Force updating pcount after skb_pull() during mtu probing

2015-06-09 Thread Martin KaFai Lau
On Tue, Jun 09, 2015 at 10:06:25AM -0700, Eric Dumazet wrote: I've been working on this, but still can get the bug triggering in tcp_fragment(), no matter what (Neal patch , yours, mine...) Can you describe the test case that can reproduce it?

Re: Recurring trace from tcp_fragment()

2015-06-04 Thread Martin KaFai Lau
Hi Grant, On Thu, Jun 04, 2015 at 09:35:04AM -0700, Grant Zhang wrote: Hi Neal, Unfortunately with the patch we still see the same stack trace. Attached is the TcpExtTCPSACKReneging with the patch, captured with 60 seconds interval. Its value is incremented at a similar speed as before,

Re: [PATCH net-next] ipv6: ipv6_select_ident() returns a __be32

2015-05-25 Thread Martin KaFai Lau
Reported-by: kbuild test robot fengguang...@intel.com Thanks for fixing it. Acked-by: Martin KaFai Lau ka...@fb.com

[PATCH net-next v5 05/11] ipv6: Only create RTF_CACHE routes after encountering pmtu exception

2015-05-22 Thread Martin KaFai Lau
This patch creates RTF_CACHE routes only after encountering a pmtu exception. After ip6_rt_update_pmtu() has inserted the RTF_CACHE route into the fib6 tree, rt->rt6i_node->fn_sernum is bumped, which will fail the ip6_dst_check() and trigger a relookup. Signed-off-by: Martin KaFai Lau ka

[PATCH net-next v5 03/11] ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST

2015-05-22 Thread Martin KaFai Lau
on rt6i_gateway and RTF_ANYCAST. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com Cc: Julian Anastasov j...@ssi.bg --- include/net/ip6_route.h| 19 ++- net/bluetooth

[PATCH net-next v5 07/11] ipv6: Set FLOWI_FLAG_KNOWN_NH at flowi6_flags

2015-05-22 Thread Martin KaFai Lau
. Signed-off-by: Martin KaFai Lau ka...@fb.com Acked-by: Julian Anastasov j...@ssi.bg Tested-by: Julian Anastasov j...@ssi.bg Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com --- net/ipv6/raw.c | 3 +++ net/netfilter/ipvs

[PATCH net-next v5 11/11] ipv6: Create percpu rt6_info

2015-05-22 Thread Martin KaFai Lau
After the patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception', we need to compensate for the performance hit (bouncing dst->__refcnt). Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass

[PATCH net-next v5 06/11] ipv6: Add rt6_get_cookie() function

2015-05-22 Thread Martin KaFai Lau
Instead of doing the rt6->rt6i_node check whenever we need to get the route's cookie, refactor it into rt6_get_cookie(). It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also percpu rt6_info later. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org

[PATCH net-next v5 01/11] ipv6: Clean up ipv6_select_ident() and ip6_fragment()

2015-05-22 Thread Martin KaFai Lau
has been generated or not. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com Cc: Julian Anastasov j...@ssi.bg --- include/net/ipv6.h | 3 +-- net/ipv6/ip6_output.c | 17 ++--- net/ipv6

[PATCH net-next v5 09/11] ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister

2015-05-22 Thread Martin KaFai Lau
This patch keeps track of the DST_NOCACHE routes in a list and replaces its dev with loopback during the iface down/unregister event. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com Cc: Julian

[PATCH net-next v5 10/11] ipv6: Break up ip6_rt_copy()

2015-05-22 Thread Martin KaFai Lau
This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and ip6_rt_cache_alloc(). In the later patch, we need to create a percpu rt6_info copy. Hence, refactor the common rt6_info init codes to ip6_rt_copy_init(). Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han

[PATCH net-next v5 02/11] ipv6: Remove external dependency on rt6i_dst and rt6i_src

2015-05-22 Thread Martin KaFai Lau
) later. Signed-off-by: Martin KaFai Lau ka...@fb.com Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Cc: Steffen Klassert steffen.klass...@secunet.com Cc: Julian Anastasov j...@ssi.bg --- drivers/scsi/cxgbi/libcxgbi.c | 2 +- include/net/ipv6.h | 4 +++- net/ipv6

[PATCH net-next v5 08/11] ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set

2015-05-22 Thread Martin KaFai Lau
This patch always creates RTF_CACHE clone with DST_NOCACHE when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to the fl6->daddr. Signed-off-by: Martin KaFai Lau ka...@fb.com Acked-by: Julian Anastasov j...@ssi.bg Tested-by: Julian Anastasov j...@ssi.bg Cc: Hannes Frederic Sowa han

[PATCH net-next v5 00/11] ipv6: Only create RTF_CACHE route after encountering pmtu exception

2015-05-22 Thread Martin KaFai Lau
v4 -> v5: - Patch 1 is new. Clean up the ipv6_select_ident() and ip6_fragment(). - Further simplify the newly added rt6_get_pcpu_route(). If there is a 'prev' after cmpxchg, return prev instead of the newly created percpu clone. v3 -> v4: - Patch 8 is new. It keeps track of the DST_NOCACHE
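
The "return prev after cmpxchg" idea mentioned in the changelog above can be sketched in userspace with C11 atomics. This is a hedged illustration only: toy_route and toy_get_or_install() are hypothetical names, the kernel's cmpxchg() is replaced here by atomic_compare_exchange_strong(), and freeing the losing clone is an assumption of this sketch rather than something the changelog states.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-cpu route clone. */
struct toy_route {
    int id;
};

/* Try to install `fresh` into an empty slot. If another caller already
 * installed a clone, discard ours (an assumption of this sketch) and
 * return the previously installed one instead. */
static struct toy_route *toy_get_or_install(_Atomic(struct toy_route *) *slot,
                                            struct toy_route *fresh)
{
    struct toy_route *prev = NULL;

    /* Succeeds only if *slot is still NULL; on failure `prev` is
     * updated to whatever is currently stored in the slot. */
    if (atomic_compare_exchange_strong(slot, &prev, fresh))
        return fresh;

    free(fresh);
    return prev;
}
```

Either way the caller ends up with the single clone that actually sits in the slot, so two racing callers never use different copies.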

[PATCH net-next v5 04/11] ipv6: Combine rt6_alloc_cow and rt6_alloc_clone

2015-05-22 Thread Martin KaFai Lau
A prep work for creating RTF_CACHE on exception only. After this patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)) is checked twice. This redundancy will be removed in the later patch. Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han

Re: [PATCH net-next v5 00/11] ipv6: Only create RTF_CACHE route after encountering pmtu exception

2015-05-26 Thread Martin KaFai Lau
On Tue, May 26, 2015 at 11:20:53PM +0200, Hannes Frederic Sowa wrote: I also went over the changes to the last version and such, albeit a bit late: Reviewed-by: Hannes Frederic Sowa han...@stressinduktion.org Thanks for your help and review, Hannes! --Martin

[PATCH RFC net 0/3] ipv6: Fix potential deadlock when creating pcpu rt

2015-08-13 Thread Martin KaFai Lau
This patch series fixes a potential deadlock when creating a pcpu rt. It happens when dst_alloc() decided to run gc. Something like this: read_lock(&table->tb6_lock); ip6_rt_pcpu_alloc() => dst_alloc() => ip6_dst_gc() => write_lock(&table->tb6_lock); /* oops */ Patch 1 and 2 are some prep works. Patch
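
The call chain in this cover letter can be modelled with a toy, single-threaded reader/writer lock that only counts its holders. Everything here is hypothetical (the toy_* helpers are not kernel APIs), and alloc_after_unlock() shows one general way to avoid the pattern, not necessarily what patch 3 does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a reader/writer lock: it only tracks who holds it. */
struct toy_rwlock {
    int readers;
    bool writer;
};

static void toy_read_lock(struct toy_rwlock *l)
{
    assert(!l->writer);
    l->readers++;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

/* Returns false in the situation where a real write_lock() would wait
 * forever, because a reader (possibly the caller itself) holds the lock. */
static bool toy_write_trylock(struct toy_rwlock *l)
{
    if (l->readers > 0 || l->writer)
        return false;
    l->writer = true;
    return true;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
    l->writer = false;
}

/* Stand-in for an allocation that decides to run gc, which in turn
 * needs the table's write lock. */
static bool toy_alloc_with_gc(struct toy_rwlock *table_lock)
{
    if (!toy_write_trylock(table_lock))
        return false; /* the real code would deadlock here */
    toy_write_unlock(table_lock);
    return true;
}

/* Problem pattern: allocate while still holding the read lock. */
static bool alloc_under_read_lock(struct toy_rwlock *l)
{
    toy_read_lock(l);
    bool ok = toy_alloc_with_gc(l);
    toy_read_unlock(l);
    return ok;
}

/* One way out: finish the lookup, drop the read lock, then allocate. */
static bool alloc_after_unlock(struct toy_rwlock *l)
{
    toy_read_lock(l);
    /* ... the route lookup would happen here ... */
    toy_read_unlock(l);
    return toy_alloc_with_gc(l);
}
```

In the toy model the first function reports failure where the kernel would hang, while the second one completes because no reader is left when the gc path asks for the write lock.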

[PATCH RFC net 3/3] ipv6: Fix potential deadlock when creating pcpu rt

2015-08-13 Thread Martin KaFai Lau
+0x5e/0x67 [141625.827146] [8132c9f8] ? sock_common_setsockopt+0xf/0x11 [141625.833660] [8132c08c] ? SyS_setsockopt+0x81/0xa2 [141625.839565] [8140ac17] entry_SYSCALL_64_fastpath+0x12/0x6a Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau

[PATCH RFC net 2/3] ipv6: Add rt6_make_pcpu_route()

2015-08-13 Thread Martin KaFai Lau
It is a prep work for the potential deadlock. The current rt6_get_pcpu_route() will also create a pcpu rt if one does not exist. This patch moves the pcpu rt creation logic into another function, rt6_make_pcpu_route(). Signed-off-by: Martin KaFai Lau ka...@fb.com CC: Hannes Frederic Sowa han

[PATCH RFC net 1/3] ipv6: Remove un-used argument from ip6_dst_alloc()

2015-08-13 Thread Martin KaFai Lau
After 4b32b5ad31a6 (ipv6: Stop rt6_info from using inet_peer's metrics), ip6_dst_alloc() does not need the 'table' argument. This patch cleans it up. Signed-off-by: Martin KaFai Lau ka...@fb.com CC: Hannes Frederic Sowa han...@stressinduktion.org --- net/ipv6/route.c | 21

Re: kernel warning in tcp_fragment

2015-08-12 Thread Martin KaFai Lau
On Mon, Aug 10, 2015 at 02:35:37PM -0400, Neal Cardwell wrote: On Mon, Aug 10, 2015 at 2:10 PM, Jovi Zhangwei j...@cloudflare.com wrote: Ping? We saw a lot of this warnings in our production system. It would be great appreciate if someone can give us the fix on this warnings. :) What

[PATCH v2 net 2/3] ipv6: Add rt6_make_pcpu_route()

2015-08-14 Thread Martin KaFai Lau
It is a prep work for fixing a potential deadlock when creating a pcpu rt. The current rt6_get_pcpu_route() will also create a pcpu rt if one does not exist. This patch moves the pcpu rt creation logic into another function, rt6_make_pcpu_route(). Signed-off-by: Martin KaFai Lau ka...@fb.com CC

[PATCH v2 net 0/3] ipv6: Fix a potential deadlock when creating pcpu rt

2015-08-14 Thread Martin KaFai Lau
v1 -> v2: A minor change in the commit message of patch 2. This patch series fixes a potential deadlock when creating a pcpu rt. It happens when dst_alloc() decided to run gc. Something like this: read_lock(&table->tb6_lock); ip6_rt_pcpu_alloc() => dst_alloc() => ip6_dst_gc() =>

Re: [PATCH RFC net 0/3] ipv6: Fix potential deadlock when creating pcpu rt

2015-08-14 Thread Martin KaFai Lau
On Thu, Aug 13, 2015 at 05:29:09PM -0700, David Miller wrote: From: Martin KaFai Lau ka...@fb.com Date: Thu, 13 Aug 2015 00:58:00 -0700 This patch series fixes a potential deadlock when creating a pcpu rt. It happens when dst_alloc() decided to run gc. Something like this: read_lock

[PATCH v2 net 3/3] ipv6: Fix a potential deadlock when creating pcpu rt

2015-08-14 Thread Martin KaFai Lau
+0x5e/0x67 [141625.827146] [8132c9f8] ? sock_common_setsockopt+0xf/0x11 [141625.833660] [8132c08c] ? SyS_setsockopt+0x81/0xa2 [141625.839565] [8140ac17] entry_SYSCALL_64_fastpath+0x12/0x6a Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau

[PATCH v2 net 1/3] ipv6: Remove un-used argument from ip6_dst_alloc()

2015-08-14 Thread Martin KaFai Lau
After 4b32b5ad31a6 (ipv6: Stop rt6_info from using inet_peer's metrics), ip6_dst_alloc() does not need the 'table' argument. This patch cleans it up. Signed-off-by: Martin KaFai Lau ka...@fb.com CC: Hannes Frederic Sowa han...@stressinduktion.org --- net/ipv6/route.c | 21

Re: kernel warning in tcp_fragment

2015-07-27 Thread Martin KaFai Lau
On Wed, Jul 22, 2015 at 11:55:35AM -0700, Jovi Zhangwei wrote: Sorry for disturbing, our production system(3.14 and 3.18 stable kernel) have many tcp_fragment warnings, the trace is same as below one which you discussed before.

Re: [PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-22 Thread Martin KaFai Lau
On Wed, Jul 22, 2015 at 11:10:59AM +0900, YOSHIFUJI Hideaki wrote: You have to take some lock when accessing neigh->nud_state theoretically. I don't think read_lock can buy us a lot of extra protection either. If it has missed the train, the next ip6_pol_route() call will trigger rt6_probe().

[PATCH net-next v2 1/2] ipv6: Re-arrange code in rt6_probe()

2015-07-24 Thread Martin KaFai Lau
KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org Cc: Julian Anastasov j...@ssi.bg Cc: YOSHIFUJI Hideaki hideaki.yoshif...@miraclelinux.com --- net/ipv6/route.c | 44 1 file changed, 20 insertions(+), 24 deletions(-) diff --git

[PATCH net-next v2 2/2] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-24 Thread Martin KaFai Lau
(): Before: 55M After: 95M Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org CC: Julian Anastasov j...@ssi.bg CC: YOSHIFUJI Hideaki hideaki.yoshif...@miraclelinux.com --- net/ipv6/route.c | 4 1 file changed, 4 insertions(+) diff --git a/net/ipv6

[PATCH net-next v2 0/2] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-24 Thread Martin KaFai Lau
v1 -> v2: 1. Separate the code re-arrangement into another patch 2. Fix style

[PATCH net-next] ipv6: Avoid rt6_probe() taking writer lock in the fast path

2015-07-21 Thread Martin KaFai Lau
. At the end, the total number of finished sendto(): Before: 55M After: 95M Signed-off-by: Martin KaFai Lau ka...@fb.com Cc: Hannes Frederic Sowa han...@stressinduktion.org --- net/ipv6/route.c | 41 - 1 file changed, 20 insertions(+), 21 deletions

[PATCH net 0/3] ipv6: Fixes for pmtu update and DST_NOCACHE route

2015-11-11 Thread Martin KaFai Lau
This patchset fixes: 1. An oops during IPv6 pmtu update on an IPv4 GRE running in an IPSec setup 2. Misc fixes on DST_NOCACHE route

[PATCH net 2/3] ipv6: Check expire on DST_NOCACHE route

2015-11-11 Thread Martin KaFai Lau
__rt6_check_expired() as one of the condition check. Signed-off-by: Martin KaFai Lau <ka...@fb.com> Cc: Hannes Frederic Sowa <han...@stressinduktion.org> --- net/ipv6/route.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/r

[PATCH net 3/3] ipv6: Check rt->dst.from for the DST_NOCACHE route

2015-11-11 Thread Martin KaFai Lau
Fixes: 8e3d5be73681 ("ipv6: Avoid double dst_free") Signed-off-by: Martin KaFai Lau <ka...@fb.com> Cc: Hannes Frederic Sowa <han...@stressinduktion.org> --- include/net/ip6_fib.h | 3 ++- net/ipv6/route.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --

Re: NULL pointer dereference in rt6_get_cookie()

2015-10-14 Thread Martin KaFai Lau
On Tue, Oct 13, 2015 at 09:26:41PM +0200, Phil Sutter wrote: > I have backed up the rt pointer at top of the function and restored it > before pr_err, this is the output: > > | rt6i_dst:2001:4dd0:ff3b:13::/64 rt6i_gateway::: rt6i_flags:4001 > dst.flags: Hi Phil, Can you try the

[PATCH net 0/2] ipv6: Initialize rt6_info properly in ip6_blackhole_route()

2015-10-15 Thread Martin KaFai Lau
This patchset ensures the rt6_info's fields are initialized properly in ip6_blackhole_route() where xfrm_policy is the primary user. The first patch is a prep work. The second patch is the fix. It fixes d52d3997f843 ("ipv6: Create percpu rt6_info"). Here is the oops reported by Phil Sutter

[PATCH net 2/2] ipv6: Initialize rt6_info properly in ip6_blackhole_route()

2015-10-15 Thread Martin KaFai Lau
allocated blackhole route. This bug triggers an oops (reported by Phil Sutter) in rt6_get_cookie(). It is because RTF_PCPU is set while rt->dst.from is NULL. Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Martin KaFai Lau <ka...@fb.com> Reported-by: Phil

[PATCH net 1/2] ipv6: Move common init code for rt6_info to a new function rt6_info_init()

2015-10-15 Thread Martin KaFai Lau
Introduce rt6_info_init() to do the common init work for 'struct rt6_info' (after calling dst_alloc). It is a prep work to fix the rt6_info init logic in the ip6_blackhole_route(). Signed-off-by: Martin KaFai Lau <ka...@fb.com> Cc: Hannes Frederic Sowa <han...@stressinduktion.org>

Re: NULL pointer dereference in rt6_get_cookie()

2015-10-14 Thread Martin KaFai Lau
On Thu, Oct 15, 2015 at 12:34:13AM +0200, Phil Sutter wrote: > Hi Martin, > > On Tue, Oct 13, 2015 at 11:14:21PM -0700, Martin KaFai Lau wrote: > > On Tue, Oct 13, 2015 at 09:26:41PM +0200, Phil Sutter wrote: > > > I have backed up the rt pointer at top of the function and

Re: NULL pointer dereference in rt6_get_cookie()

2015-10-12 Thread Martin KaFai Lau
On Sat, Oct 10, 2015 at 03:24:37PM +0200, Phil Sutter wrote: > Using printk-debugging I could track down the problem to > rt6_get_cookie() function in include/net/ip6_fib.h: > > The conditional at the start of the function evaluates true, since > 'rt->rt6i_flags & RTF_PCPU' is non-zero. Due to

Re: [PATCH net] ipv6: Don't call with rt6_uncached_list_flush_dev

2015-10-12 Thread Martin KaFai Lau
> incorrect behavior. Thanks for fixing it. Reviewed-by: Martin KaFai Lau <ka...@fb.com> I also tested the following cases with the presence of DST_NOCACHE entries: 1. rmmod e1000.ko while running netperf 2. unshare(CLONE_NEWNET) as reported by Dmitry Tested-by: Martin KaFai Lau <ka...@f

Re: [bug report or not] ping6 will lost packets when ping6 lots of ipv6 address

2015-10-13 Thread Martin KaFai Lau
On Tue, Oct 13, 2015 at 08:46:49PM +0800, Li RongQing wrote: > 1. in a machine, configure 3000 ipv6 address in one interface > > for i in {1..3000}; do ip -6 addr add 4001:5013::$i/0 dev eth0; done > > > 2. in other machine, ping6 the upper configured ipv6 address, then > lots of lost packets > >

Re: NULL pointer dereference in rt6_get_cookie()

2015-10-13 Thread Martin KaFai Lau
On Tue, Oct 13, 2015 at 09:10:39PM +0200, Phil Sutter wrote: > Hi Martin, > > On Tue, Oct 13, 2015 at 11:14:43AM -0700, Martin KaFai Lau wrote: > > On Sat, Oct 10, 2015 at 03:24:37PM +0200, Phil Sutter wrote: > > > The conditional at the start of the function evaluat

Re: NULL pointer dereference in rt6_get_cookie()

2015-10-13 Thread Martin KaFai Lau
On Sat, Oct 10, 2015 at 03:24:37PM +0200, Phil Sutter wrote: > The conditional at the start of the function evaluates true, since > 'rt->rt6i_flags & RTF_PCPU' is non-zero. Hi Phil, can you try the following patch and capture the dmesg output to confirm the value of rt->rt6i_flags and the

Re: [PATCH net] net/ip6_tunnel: fix dst leak

2015-11-18 Thread Martin KaFai Lau
cached > dst on non current cpu are not actually reset. > > This patch replaces raw_cpu_ptr with per_cpu_ptr, properly cleaning > such storage. Thanks for fixing it. Acked-by: Martin KaFai Lau <ka...@fb.com> > > Fixes: cdf3464e6c6b ("ipv6: Fix dst_entry refcnt bugs in

Re: [PATCH net-next v5 00/11] ipv6: Only create RTF_CACHE route after encountering pmtu exception

2015-08-28 Thread Martin KaFai Lau
On Mon, Aug 17, 2015 at 11:43:20AM +0200, Alexander Holler wrote: That's why I vote to check out if it's possible/reasonable to backport this series to the stable kernels. I have backported to 4.0.y without major issue, so possible. I did try on 3.1x and gave up. It is a lot of changes, so I

[PATCH net 2/3] ipv6: Rename the dst_cache helper functions in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
with ip6_tnl_dst_get(). Signed-off-by: Martin KaFai Lau <ka...@fb.com> --- include/net/ip6_tunnel.h | 4 ++-- net/ipv6/ip6_gre.c | 4 ++-- net/ipv6/ip6_tunnel.c| 12 ++-- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/net/ip6_tunnel.h b/inclu

[PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
y's refcnt. This patch: 1. Create a percpu dst_entry cache in ip6_tnl 2. Use a spinlock to protect the dst_cache operations 3. The outgoing skb always holds the dst_entry's refcnt Signed-off-by: Martin KaFai Lau <ka...@fb.com> --- include/net/ip6_tunnel.h | 11 - net/ipv6/ip6_gre.c

[PATCH net 0/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
This patch series is to fix the dst refcnt bugs in ip6_tunnel. Patch 1 and 2 are the prep works. Patch 3 is the fix. I can reproduce the bug by adding and removing the ip6gre tunnel while running a super_netperf TCP_CRR test. I get the following trace by adding WARN_ON_ONCE(newrefcnt < 0) to

[PATCH net 1/3] ipv6: Refactor common ip6gre_tunnel_init codes

2015-09-01 Thread Martin KaFai Lau
It is a prep work to fix the dst_entry refcnt bugs in ip6_tunnel. This patch refactors some common init codes used by both ip6gre_tunnel_init and ip6gre_tap_init. Signed-off-by: Martin KaFai Lau <ka...@fb.com> --- net/ipv6/ip6_gre.c | 37 - 1 file chang

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
On Tue, Sep 01, 2015 at 03:38:36PM -0700, Eric Dumazet wrote: > On Tue, 2015-09-01 at 15:25 -0700, Martin KaFai Lau wrote: > > On Tue, Sep 01, 2015 at 02:26:58PM -0700, Eric Dumazet wrote: > > > On Tue, 2015-09-01 at 13:55 -0700, Martin KaFai Lau wrote: > > > > On T

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
On Tue, Sep 01, 2015 at 01:14:20PM -0700, Eric Dumazet wrote: > On Tue, 2015-09-01 at 11:55 -0700, Martin KaFai Lau wrote: > > Problems in the current dst_entry cache in the ip6_tunnel: > > > > 1. ip6_tnl_dst_set is racy. There is no lock to protect it: > >- One ma

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
On Tue, Sep 01, 2015 at 01:14:20PM -0700, Eric Dumazet wrote: > It should not be a problem. refcnt is taken when/if necessary (skb > queued on a qdisc for example) > > We have other uses of skb_dst_set_noref() > > Please describe the problem ? The current ip6_tnl_dst_get() does not take the dst

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
On Tue, Sep 01, 2015 at 05:31:44PM -0700, Martin KaFai Lau wrote: > On Tue, Sep 01, 2015 at 03:38:36PM -0700, Eric Dumazet wrote: > > On Tue, 2015-09-01 at 15:25 -0700, Martin KaFai Lau wrote: > > > On Tue, Sep 01, 2015 at 02:26:58PM -0700, Eric Dumazet wrote: > > > >

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
On Tue, Sep 01, 2015 at 05:42:00PM -0700, Martin KaFai Lau wrote: > I took a closer look at dst_rcu_free() and your commit pointers. I can see > your point > for DST_NOCACHE. > > However, dst_free() for not DST_NOCACHE is still an issue, I think. oops. Ignore this ema

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-01 Thread Martin KaFai Lau
On Tue, Sep 01, 2015 at 02:26:58PM -0700, Eric Dumazet wrote: > On Tue, 2015-09-01 at 13:55 -0700, Martin KaFai Lau wrote: > > On Tue, Sep 01, 2015 at 01:14:20PM -0700, Eric Dumazet wrote: > > > It should not be a problem. refcnt is taken when/if necessary (skb > > > qu

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-02 Thread Martin KaFai Lau
On Tue, Sep 01, 2015 at 01:14:20PM -0700, Eric Dumazet wrote: > > 2. Use a spinlock to protect the dst_cache operations > > Well, a seqlock would be better : No need for an atomic operation in > fast path. > seqlock can ensure consistency between idst->dst and idst->cookie. However, IPv6 dst

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-02 Thread Martin KaFai Lau
On Wed, Sep 02, 2015 at 03:48:57PM -0700, Eric Dumazet wrote: > On Wed, 2015-09-02 at 14:52 -0700, Martin KaFai Lau wrote: > > On Wed, Sep 02, 2015 at 02:30:45PM -0700, Eric Dumazet wrote: > > > Object cannot be freed until all cpus have exited their RCU sections. > > Y

Re: [PATCH net 3/3] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-02 Thread Martin KaFai Lau
On Wed, Sep 02, 2015 at 02:30:45PM -0700, Eric Dumazet wrote: > Object cannot be freed until all cpus have exited their RCU sections. You meant the dst_destroy() here will wait for all cpus exited their RCU sections? static inline void dst_free(struct dst_entry *dst) { if (dst->obsolete

Re: [PATCH RFC v2 net 4/5] ipv6: Avoid double dst_free

2015-09-04 Thread Martin KaFai Lau
On Fri, Sep 04, 2015 at 04:12:41PM -0700, Martin KaFai Lau wrote: > @@ -1962,6 +1961,9 @@ static int __ip6_del_rt(struct rt6_info *rt, struct > nl_info *info) > if (rt == net->ipv6.ip6_null_entry) { > err = -ENOENT; > goto out; > + } e

[PATCH RFC v2 net 4/5] ipv6: Avoid double dst_free

2015-09-04 Thread Martin KaFai Lau
destroyed already. 3. If rt is a DST_NOCACHE, dst_free(rt) should not be called. 4. It is a stopper to make dst freeing from fib tree undergo a rcu grace period. This patch is to use a DST_NOCACHE flag to indicate a rt is managed by the fib tree or not. Signed-off-by: Martin KaFai Lau <ka...@fb.

[PATCH RFC v2 net 2/5] ipv6: Rename the dst_cache helper functions in ip6_tunnel

2015-09-04 Thread Martin KaFai Lau
with ip6_tnl_dst_get(). Signed-off-by: Martin KaFai Lau <ka...@fb.com> --- include/net/ip6_tunnel.h | 4 ++-- net/ipv6/ip6_gre.c | 4 ++-- net/ipv6/ip6_tunnel.c| 12 ++-- 3 files changed, 10 insertions(+), 10 deletions(-) diff --git a/include/net/ip6_tunnel.h b/inclu

[PATCH RFC v2 net 5/5] ipv6: Replace spinlock with seqlock and rcu in ip6_tunnel

2015-09-04 Thread Martin KaFai Lau
This patch uses a seqlock to ensure consistency between idst->dst and idst->cookie. It also makes dst freeing from fib tree to undergo a rcu grace period. Signed-off-by: Martin KaFai Lau <ka...@fb.com> --- include/net/ip6_tunnel.h | 4 ++-- net/ipv6/ip6_fib.c | 9 +++
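
To illustrate what "consistency between idst->dst and idst->cookie" means, here is a minimal user-space sketch of a sequence-counter protocol using C11 atomics. It is not the kernel's seqlock API and not the code from the patch: `struct dst_cache_entry`, `cache_init`, `cache_set` and `cache_get` are hypothetical names, the default sequentially consistent atomics are used for simplicity, and writers are assumed to be serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one cached (dst, cookie) pair. */
struct dst_cache_entry {
    atomic_uint seq;             /* even: stable, odd: write in progress */
    _Atomic(void *) dst;         /* cached route, opaque in this sketch */
    atomic_uint_fast32_t cookie; /* value that must match the cached dst */
};

static void cache_init(struct dst_cache_entry *e)
{
    atomic_init(&e->seq, 0u);
    atomic_init(&e->dst, NULL);
    atomic_init(&e->cookie, 0u);
}

/* Writer: assumes the caller already serializes writers. */
static void cache_set(struct dst_cache_entry *e, void *dst, uint32_t cookie)
{
    unsigned int s = atomic_load(&e->seq);

    assert((s & 1u) == 0);          /* no other write may be in progress */
    atomic_store(&e->seq, s + 1u);  /* odd: readers will retry */
    atomic_store(&e->dst, dst);
    atomic_store(&e->cookie, cookie);
    atomic_store(&e->seq, s + 2u);  /* even again: snapshot is stable */
}

/* Reader: retries until both fields were read under one even sequence. */
static void cache_get(struct dst_cache_entry *e, void **dst, uint32_t *cookie)
{
    unsigned int s1, s2;

    do {
        s1 = atomic_load(&e->seq);
        *dst = atomic_load(&e->dst);
        *cookie = (uint32_t)atomic_load(&e->cookie);
        s2 = atomic_load(&e->seq);
    } while ((s1 & 1u) != 0 || s1 != s2);
}
```

The reader takes no lock and performs no atomic read-modify-write; it only re-reads the counter, which is the property a sequence lock offers on the fast path. The sketch does not cover the lifetime of the object behind `dst`, which is a separate concern (the snippet above mentions an RCU grace period for that).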

[PATCH RFC v2 net 0/5] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-04 Thread Martin KaFai Lau
v2: - Add patch 4 and 5 to remove the spinlock v1: This patch series is to fix the dst refcnt bugs in ip6_tunnel. Patch 1 and 2 are the prep works. Patch 3 is the fix. I can reproduce the bug by adding and removing the ip6gre tunnel while running a super_netperf TCP_CRR test. I get the

[PATCH RFC v2 net 3/5] ipv6: Fix dst_entry refcnt bugs in ip6_tunnel

2015-09-04 Thread Martin KaFai Lau
: 1. Create a percpu dst_entry cache in ip6_tnl 2. Use a spinlock to protect the dst_cache operations 3. ip6_tnl_dst_get always takes the dst refcnt before returning Signed-off-by: Martin KaFai Lau <ka...@fb.com> Conflicts: net/ipv6/ip6_gre.c net/ipv6/ip6_tunnel.c --- in
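
Point 3 above can be pictured with a small single-threaded sketch in which the lookup helper takes a reference before handing the entry to its caller. The names `struct route`, `struct tunnel_cache`, `cache_get_ref` and `route_put` are hypothetical; the locking that the patch description pairs with the lookup is deliberately left out, so this shows only the ownership rule, not a concurrency-safe implementation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted route entry. */
struct route {
    atomic_int refcnt;
};

/* Hypothetical stand-in for the tunnel's cache slot. */
struct tunnel_cache {
    struct route *dst;
};

/* Lookup that always takes a reference before returning the entry. */
static struct route *cache_get_ref(struct tunnel_cache *c)
{
    struct route *rt = c->dst;

    if (rt != NULL)
        atomic_fetch_add(&rt->refcnt, 1);
    return rt;
}

/* The caller drops its reference when it is done with the entry. */
static void route_put(struct route *rt)
{
    if (rt != NULL)
        atomic_fetch_sub(&rt->refcnt, 1);
}
```

Every successful `cache_get_ref()` is expected to be balanced by one `route_put()`, so the count never drops below the reference held by the cache itself.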

[PATCH RFC v2 net 1/5] ipv6: Refactor common ip6gre_tunnel_init codes

2015-09-04 Thread Martin KaFai Lau
It is a prep work to fix the dst_entry refcnt bugs in ip6_tunnel. This patch refactors some common init codes used by both ip6gre_tunnel_init and ip6gre_tap_init. Signed-off-by: Martin KaFai Lau <ka...@fb.com> --- net/ipv6/ip6_gre.c | 42 +-
