RE: [PATCH net] udp: fix dst races with multicast early demux
Hi, I have included this patch into my code and re-run our tests overnight, out of 644 iterations we did not see the kernel crash. Previous reproduction rate we would have expected 4-6 crashes in this time. So I think this fixes the issue we are seeing. Thanks, Greg From: netdev-ow...@vger.kernel.org netdev-ow...@vger.kernel.org on behalf of Eric Dumazet eric.duma...@gmail.com Sent: Saturday, 1 August 2015 10:14 p.m. To: Gregory Hoggarth Cc: Shawn Bohrer; netdev@vger.kernel.org; alexgartr...@gmail.com; Michal Kubeček Subject: [PATCH net] udp: fix dst races with multicast early demux From: Eric Dumazet eduma...@google.com Multicast dst are not cached. They carry DST_NOCACHE. As mentioned in commit f8864972126899 (ipv4: fix dst race in sk_dst_get()), these dst need special care before caching them into a socket. Caching them is allowed only if their refcnt was not 0, ie we must use atomic_inc_not_zero() Also, we must use READ_ONCE() to fetch sk-sk_rx_dst, as mentioned in commit d0c294c53a771 (tcp: prevent fetching dst twice in early demux code) Fixes: 421b3885bf6d (udp: ipv4: Add udp early demux) Signed-off-by: Eric Dumazet eduma...@google.com Reported-by: Gregory Hoggarth gregory.hogga...@alliedtelesis.co.nz Reported-by: Alex Gartrell agartr...@fb.com Cc: Michal Kubeček mkube...@suse.cz --- David : I will be on vacation for following 7 days, no internet access. Please wait for tests done by Gregory Alex before merging this ? Thanks ! net/ipv4/udp.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 83aa604f9273..1b8c5ba7d5f7 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1995,12 +1995,19 @@ void udp_v4_early_demux(struct sk_buff *skb) skb-sk = sk; skb-destructor = sock_efree; - dst = sk-sk_rx_dst; + dst = READ_ONCE(sk-sk_rx_dst); if (dst) dst = dst_check(dst, 0); - if (dst) - skb_dst_set_noref(skb, dst); + if (dst) { + /* DST_NOCACHE can not be used without taking a reference */ + if (dst-flags DST_NOCACHE) { + if (likely(atomic_inc_not_zero(dst-__refcnt))) + skb_dst_set(skb, dst); + } else { + skb_dst_set_noref(skb, dst); + } + } } int udp_rcv(struct sk_buff *skb) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Panic with demuxed ipv4 multicast udp sockets on 4.0.4
Hi Eric, The previous fix, which included the change to __udp4_lib_rcv did not fix the issue. The reproduction rate may perhaps have been slightly lower with that fix in place, hard to tell as our automated tests unfortunately failed after 600 runs due to another issue and didn't automatically resume themselves (3 kernel crash observed, so 1/200 instead of previous 1/100). I will apply the new suggested patch, reverting previous patch, and test overnight and update tomorrow. Thanks, Greg From: Eric Dumazet eric.duma...@gmail.com Sent: Friday, 31 July 2015 11:17 p.m. To: Gregory Hoggarth Cc: Shawn Bohrer; netdev@vger.kernel.org; alexgartr...@gmail.com Subject: Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4 On Fri, 2015-07-31 at 12:17 +0200, Eric Dumazet wrote: On Thu, 2015-07-30 at 22:29 +, Gregory Hoggarth wrote: Hi Eric, Thanks for the patch. I have put this into our code and we'll run our test over tonight and the weekend and report back on Monday. Thanks, Oh well, real fix will be : (Or at least test if DST_NOCACHE is set : those dst need special care) Here is a more complete patch, not destroying performance of unicast UDP. Please guys test it, I am confident it should solve your problems. Thanks diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 83aa604f9273..1b8c5ba7d5f7 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1995,12 +1995,19 @@ void udp_v4_early_demux(struct sk_buff *skb) skb-sk = sk; skb-destructor = sock_efree; - dst = sk-sk_rx_dst; + dst = READ_ONCE(sk-sk_rx_dst); if (dst) dst = dst_check(dst, 0); - if (dst) - skb_dst_set_noref(skb, dst); + if (dst) { + /* DST_NOCACHE can not be used without taking a reference */ + if (dst-flags DST_NOCACHE) { + if (likely(atomic_inc_not_zero(dst-__refcnt))) + skb_dst_set(skb, dst); + } else { + skb_dst_set_noref(skb, dst); + } + } } int udp_rcv(struct sk_buff *skb) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Panic with demuxed ipv4 multicast udp sockets on 4.0.4
Hi Eric, Thanks for the patch. I have put this into our code and we'll run our test over tonight and the weekend and report back on Monday. Thanks, Greg From: Eric Dumazet eric.duma...@gmail.com Sent: Thursday, 30 July 2015 5:45 p.m. To: Gregory Hoggarth Cc: Shawn Bohrer; netdev@vger.kernel.org; alexgartr...@gmail.com Subject: Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4 On Thu, 2015-07-30 at 07:42 +0200, Eric Dumazet wrote: On Thu, 2015-07-30 at 01:41 +, Gregory Hoggarth wrote: Hi, My company has also started having what appears to be the same problem, since we upgraded our embedded system to linux kernel 3.16. I tried applying the suggested fix of READ_ONCE (and also had to add in the necessary code to compiler.h as 3.16 didn't have it) and unfortunately it did not fix the issue at all. Unfortunately we do not have an easy reproduction method, and do not know precisely what is going on in the system when the issue occurs. We know it is a multicast UDP packet but that is about it. For us, the crash happens during a critical stage in our system initialisation, making additional debugging and instrumentation difficult. Our reproduction rate is approximately 1 out of 100 test runs; testing overnight we will usually see 3-5 instances of the crash happening. All our attempts to increase the reproduction rate, or reproduce the issue in a simpler/more controlled way have failed. Because we have customised the linux kernel, in some places radically, we assumed this was just a problem only we were seeing, so we were trying to fix it ourselves. Now that this appears to be a generic problem upstream, we've simply disabled UDP early demux in our system (since it's a new optimisation that we have lived without up till now) and will wait for this issue to be fixed upstream instead. So I'm sharing the debug patch I've written to help gather data on what is going on in the system, and some of the output we've gotten from the debug, in case this is useful for anyone else who is seeing this problem or would like to try and fix it. Feel free to ask questions, I'm not sure how much help I can be but will do my best. We'll be happy to assist in testing any proposed fixes. I also have some more examples of kernel oops and debug output if that could be useful, although the debug is from earlier iterations of the patch so that historical output is not as detailed as the output generated by the latest version of the patch attached here. Thanks, Greg Hoggarth CC UDP early demux author : Shawn Bohrer I believe this is a race condition with a dst escaping RCU protected region. I will send a patch. Please try following fixes : diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 83aa604f9273..02baaa6d97b3 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1778,9 +1778,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, struct dst_entry *dst = skb_dst(skb); int ret; - if (unlikely(sk-sk_rx_dst != dst)) + if (unlikely(sk-sk_rx_dst != dst)) { + skb_dst_force(skb); udp_sk_rx_dst_set(sk, dst); - + } ret = udp_queue_rcv_skb(sk, skb); sock_put(sk); /* a return value 0 means to resubmit the input, but @@ -1995,7 +1996,7 @@ void udp_v4_early_demux(struct sk_buff *skb) skb-sk = sk; skb-destructor = sock_efree; - dst = sk-sk_rx_dst; + dst = READ_ONCE(sk-sk_rx_dst); if (dst) dst = dst_check(dst, 0); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Panic with demuxed ipv4 multicast udp sockets on 4.0.4
Hi, My company has also started having what appears to be the same problem, since we upgraded our embedded system to linux kernel 3.16. I tried applying the suggested fix of READ_ONCE (and also had to add in the necessary code to compiler.h as 3.16 didn't have it) and unfortunately it did not fix the issue at all. Unfortunately we do not have an easy reproduction method, and do not know precisely what is going on in the system when the issue occurs. We know it is a multicast UDP packet but that is about it. For us, the crash happens during a critical stage in our system initialisation, making additional debugging and instrumentation difficult. Our reproduction rate is approximately 1 out of 100 test runs; testing overnight we will usually see 3-5 instances of the crash happening. All our attempts to increase the reproduction rate, or reproduce the issue in a simpler/more controlled way have failed. Because we have customised the linux kernel, in some places radically, we assumed this was just a problem only we were seeing, so we were trying to fix it ourselves. Now that this appears to be a generic problem upstream, we've simply disabled UDP early demux in our system (since it's a new optimisation that we have lived without up till now) and will wait for this issue to be fixed upstream instead. So I'm sharing the debug patch I've written to help gather data on what is going on in the system, and some of the output we've gotten from the debug, in case this is useful for anyone else who is seeing this problem or would like to try and fix it. Feel free to ask questions, I'm not sure how much help I can be but will do my best. We'll be happy to assist in testing any proposed fixes. I also have some more examples of kernel oops and debug output if that could be useful, although the debug is from earlier iterations of the patch so that historical output is not as detailed as the output generated by the latest version of the patch attached here. Thanks, Greg HoggarthFrom: Gregory Hoggarth gregory.hogga...@alliedtelesis.co.nz dst tracking debug patch The debugging works by adding poisoned function pointers to a dst structure when it is cleaned up by the garbage collector. Then, at several points where these pointers are de-referenced, check if they hold the poisoned value. If so, it means the dst that is trying to be referenced has gone through the garbage collector, ie a 'use-after-free' scenario. Instead of letting the system access the garbage pointer and crash, pre-empty the de-reference and panic allowing us to print out the debug information. At many places in the code where dst structures are manipulated, add tracking information about the current dst structure to an external array that can store 100 dst structures. This method of storing the data is not actually very safe, as each new dst structure allocated will consume a slot in the array. If many dst structures are created quickly, this will quickly wrap-around and destroy valid data. A linked-list implementation would be better. --- include/net/dst.h | 145 +--- include/net/sock.h |2 + net/core/dst.c | 109 +++ net/ipv4/udp.c |2 + 4 files changed, 248 insertions(+), 10 deletions(-) diff --git a/include/net/dst.h b/include/net/dst.h index 273c15b..f746999 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -103,8 +103,106 @@ struct dst_entry { struct rt6_info *rt6_next; struct dn_route __rcu *dn_next; }; + + // dst tracking ID + int track_id; +}; + +#include linux/fs.h +#include asm/uaccess.h +#include linux/buffer_head.h +#include linux/time.h + +#define DST_LIST_SZ 25 +extern int dst_latest; +static char dst_filename[50]; + +struct dst_track { + unsigned long address[DST_LIST_SZ]; + unsigned int action[DST_LIST_SZ]; + unsigned int refCount[DST_LIST_SZ]; + struct timespec tvs[DST_LIST_SZ]; + + unsigned int index; + unsigned int reuseMagic; }; +extern struct dst_track track_array[100]; + +static char *dst_list_action[16] = +{ + Free, Hold, Clon, Aloc, AlNr, + ReAl, ReNr, Grbg, NUse, NoRf, + Chld, Zero, SkBf, Sach, Gotn, + NSwp +}; + +struct file* file_open1(const char* path, int flags, int rights); +void file_close1(struct file* file); +int file_write1(struct file* file, unsigned long long offset, unsigned char* data, unsigned int size); + +#define DST_TRACKING(TYPE) \ +do { \ + struct dst_track *track = track_array[dst-track_id]; \ + int index = track-index; \ + if (index DST_LIST_SZ) \ + {\ + track-address[index] = \ + (unsigned long) __builtin_return_address(0); \ + track-action[index] = (TYPE);\ + track-refCount[index] = dst-__refcnt.counter; \ + getrawmonotonic (track-tvs[index]); \ + track-index++; \ + }\ +} while (0) + +#define DST_PANIC \ +do { \ + int k, i, len = 0; \ + struct dst_track *track